Scatter plot Python examples

Contents

Simple Scatter Plots
Scatter Plots with plt.plot
Scatter Plots with plt.scatter
plot Versus scatter: A Note on Efficiency
matplotlib.pyplot.scatter

Simple Scatter Plots

Another commonly used plot type is the simple scatter plot, a close cousin of the line plot. Instead of points being joined by line segments, here the points are represented individually with a dot, circle, or other shape. We’ll start by setting up the notebook for plotting and importing the functions we will use:

    %matplotlib inline
    import matplotlib.pyplot as plt
    import numpy as np
    # On Matplotlib 3.6 and later this style is named 'seaborn-v0_8-whitegrid'
    plt.style.use('seaborn-whitegrid')

Scatter Plots with plt.plot

In the previous section we looked at plt.plot / ax.plot to produce line plots. It turns out that this same function can produce scatter plots as well:

    x = np.linspace(0, 10, 30)
    y = np.sin(x)
    plt.plot(x, y, 'o', color='black');

The third argument in the function call is a character that represents the type of symbol used for the plotting. Just as you can specify options such as '-' or '--' to control the line style, the marker style has its own set of short string codes. The full list of available symbols can be seen in the documentation of plt.plot, or in Matplotlib's online documentation. Most of the possibilities are fairly intuitive, and we'll show a number of the more common ones here:

    rng = np.random.RandomState(0)
    for marker in ['o', '.', ',', 'x', '+', 'v', '^', '<', '>', 's', 'd']:
        # draw five random points per marker code and label them in the legend
        plt.plot(rng.rand(5), rng.rand(5), marker,
                 label="marker='{0}'".format(marker))
    plt.legend(numpoints=1)
    plt.xlim(0, 1.8);

For even more possibilities, these character codes can be used together with line and color codes to plot points along with a line connecting them:
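As a minimal sketch, reusing the sine data from the earlier example, the format string '-ok' combines a solid line ('-'), a circle marker ('o'), and the color black ('k'). The variable names fig, ax and line are illustrative only.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line inside a notebook
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 30)
y = np.sin(x)

fig, ax = plt.subplots()
# '-ok' = solid line ('-') + circle marker ('o') + black ('k')
(line,) = ax.plot(x, y, '-ok')
```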

Additional keyword arguments to plt.plot specify a wide range of properties of the lines and markers:

    plt.plot(x, y, '-p', color='gray',
             markersize=15, linewidth=4,
             markerfacecolor='white',
             markeredgecolor='gray',
             markeredgewidth=2)
    plt.ylim(-1.2, 1.2);

This type of flexibility in the plt.plot function allows for a wide variety of possible visualization options. For a full description of the options available, refer to the plt.plot documentation.

Scatter Plots with plt.scatter

A second, more powerful method of creating scatter plots is the plt.scatter function, which can be used very similarly to the plt.plot function:
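A minimal sketch of the equivalent call, assuming the same sine data as in the plt.plot examples. The names fig, ax and points are illustrative. Note that the call returns a PathCollection rather than a list of lines.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line inside a notebook
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 30)
y = np.sin(x)

fig, ax = plt.subplots()
# same data as before, drawn as individual circle markers
points = ax.scatter(x, y, marker='o')
```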

The primary difference between plt.scatter and plt.plot is that plt.scatter can create scatter plots where the properties of each individual point (size, face color, edge color, etc.) are individually controlled or mapped to data.

Let’s show this by creating a random scatter plot with points of many colors and sizes. In order to better see the overlapping results, we’ll also use the alpha keyword to adjust the transparency level:

    rng = np.random.RandomState(0)
    x = rng.randn(100)
    y = rng.randn(100)
    colors = rng.rand(100)
    sizes = 1000 * rng.rand(100)

    plt.scatter(x, y, c=colors, s=sizes, alpha=0.3,
                cmap='viridis')
    plt.colorbar();  # show color scale

Notice that the color argument is automatically mapped to a color scale (shown here by the colorbar() command), and that the size argument is given as a marker area in points squared, not in pixels. In this way, the color and size of points can be used to convey information in the visualization, in order to visualize multidimensional data.

For example, we might use the Iris data from Scikit-Learn, where each sample is one of three types of flowers that has had the size of its petals and sepals carefully measured:

    from sklearn.datasets import load_iris
    iris = load_iris()
    features = iris.data.T

    plt.scatter(features[0], features[1], alpha=0.2,
                s=100*features[3], c=iris.target, cmap='viridis')
    plt.xlabel(iris.feature_names[0])
    plt.ylabel(iris.feature_names[1]);

We can see that this scatter plot has given us the ability to simultaneously explore four different dimensions of the data: the (x, y) location of each point corresponds to the sepal length and width, the size of the point is related to the petal width, and the color is related to the particular species of flower. Multicolor and multifeature scatter plots like this can be useful for both exploration and presentation of data.

plot Versus scatter: A Note on Efficiency

Aside from the different features available in plt.plot and plt.scatter, why might you choose to use one over the other? While it doesn't matter as much for small amounts of data, as datasets get larger than a few thousand points, plt.plot can be noticeably more efficient than plt.scatter. The reason is that plt.scatter has the capability to render a different size and/or color for each point, so the renderer must do the extra work of constructing each point individually. In plt.plot, on the other hand, the points are always essentially clones of each other, so the work of determining the appearance of the points is done only once for the entire set of data. For large datasets, the difference between these two can lead to vastly different performance, and for this reason, plt.plot should be preferred over plt.scatter for large datasets.
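One rough way to check this claim on your own machine is to time both calls on the same data. The sketch below is only an illustration: the helper time_draw and the choice of 50,000 points are assumptions, and the measured gap depends heavily on the Matplotlib version and backend.

```python
import time

import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line inside a notebook
import matplotlib.pyplot as plt
import numpy as np


def time_draw(draw, n_points=50_000):
    """Return the seconds needed to build and render one figure (hypothetical helper)."""
    rng = np.random.RandomState(0)
    x, y = rng.randn(n_points), rng.randn(n_points)
    fig, ax = plt.subplots()
    start = time.perf_counter()
    draw(ax, x, y)
    fig.canvas.draw()  # force the renderer to actually draw the markers
    elapsed = time.perf_counter() - start
    plt.close(fig)
    return elapsed


# identical markers: one Line2D whose marker is stamped at every point
t_plot = time_draw(lambda ax, x, y: ax.plot(x, y, 'o', markersize=3))
# per-point capable markers: a PathCollection
t_scatter = time_draw(lambda ax, x, y: ax.scatter(x, y, s=9))

print(f"plot: {t_plot:.3f} s, scatter: {t_scatter:.3f} s")
```

Treat the printed numbers as indicative only; a single run is noisy, so repeat the measurement several times before drawing conclusions.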


matplotlib.pyplot.scatter

matplotlib.pyplot.scatter(x, y, s=None, c=None, marker=None, cmap=None, norm=None, vmin=None, vmax=None, alpha=None, linewidths=None, *, edgecolors=None, plotnonfinite=False, data=None, **kwargs)

A scatter plot of y vs. x with varying marker size and/or color.

Parameters:

x, y float or array-like, shape (n, )

The data positions.

s float or array-like, shape (n, ), optional

The marker size in points**2 (typographic points are 1/72 in.). Default is rcParams['lines.markersize'] ** 2.
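Because s is an area while the markersize argument of plt.plot is a length in points, a marker drawn with markersize=m is roughly matched by s=m**2. A minimal sketch, with illustrative variable names:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line inside a notebook
import matplotlib.pyplot as plt

fig, ax = plt.subplots()

markersize = 12  # a length, in points
(line,) = ax.plot([0], [0], 'o', markersize=markersize)

# s is an area in points**2, so square the length to get a comparable marker
points = ax.scatter([1], [0], s=markersize ** 2)

# when s is omitted, the default area comes from rcParams
default_s = plt.rcParams['lines.markersize'] ** 2
default_points = ax.scatter([2], [0])
```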

c array-like or list of colors or color, optional

The marker colors. Possible values:

A scalar or sequence of n numbers to be mapped to colors using cmap and norm.
A 2D array in which the rows are RGB or RGBA.
A sequence of colors of length n.
A single color format string.

Note that c should not be a single numeric RGB or RGBA sequence because that is indistinguishable from an array of values to be colormapped. If you want to specify the same RGB or RGBA value for all points, use a 2D array with a single row. Otherwise, value-matching will have precedence in case of a size matching with x and y.

If you wish to specify a single color for all points, prefer the color keyword argument.
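The ambiguity described above is easiest to see with exactly three points, where a bare RGB triple has the same length as x and y. The sketch below uses illustrative names (rgb, mapped, single_row, by_keyword) and shows the three spellings side by side.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line inside a notebook
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(3)
y = np.zeros(3)
rgb = (0.2, 0.4, 0.6)

fig, ax = plt.subplots()

# Ambiguous: len(rgb) == len(x), so the numbers are treated as values
# to be colormapped, not as one RGB color.
mapped = ax.scatter(x, y, c=rgb)

# Unambiguous: a 2D array with a single row is one RGB color for all points.
single_row = ax.scatter(x, y + 1, c=[rgb])

# Preferred: the color keyword always means "one color for all points".
by_keyword = ax.scatter(x, y + 2, color=rgb)
```

Calling get_array() on each result shows the difference: it returns the three numbers for mapped, and None for the other two.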

Defaults to None. In that case the marker color is determined by the value of color, facecolor or facecolors. In case those are not specified or None, the marker color is determined by the next color of the Axes' current "shape and fill" color cycle. This cycle defaults to rcParams["axes.prop_cycle"] (default: cycler('color', ['#1f77b4', '#ff7f0e', '#2ca02c', '#d62728', '#9467bd', '#8c564b', '#e377c2', '#7f7f7f', '#bcbd22', '#17becf'])).

marker MarkerStyle, default: rcParams["scatter.marker"] (default: 'o')

The marker style. marker can be either an instance of the MarkerStyle class or the text shorthand for a particular marker. See matplotlib.markers for more information about marker styles.

cmap str or Colormap, default: rcParams["image.cmap"] (default: 'viridis')

The Colormap instance or registered colormap name used to map scalar data to colors.

This parameter is ignored if c is RGB(A).

norm str or Normalize , optional

The normalization method used to scale scalar data to the [0, 1] range before mapping to colors using cmap. By default, a linear scaling is used, mapping the lowest value to 0 and the highest to 1.

If given, this can be one of the following:

An instance of Normalize or one of its subclasses (see Colormap Normalization).
A scale name, i.e. one of "linear", "log", "symlog", "logit", etc. For a list of available scales, call matplotlib.scale.get_scale_names(). In that case, a suitable Normalize subclass is dynamically generated and instantiated.

This parameter is ignored if c is RGB(A).
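As an illustration of a non-linear norm, the sketch below passes a LogNorm instance so that values spanning several decades spread evenly over the colormap. The data are synthetic and the variable names are illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line inside a notebook
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import LogNorm

rng = np.random.RandomState(0)
x, y = rng.rand(50), rng.rand(50)
values = 10 ** rng.uniform(0, 3, size=50)  # positive, spans about three decades

fig, ax = plt.subplots()
# LogNorm maps log10(values) linearly onto the [0, 1] colormap range
points = ax.scatter(x, y, c=values, cmap='viridis', norm=LogNorm())
fig.colorbar(points, ax=ax)
```

With the default linear norm, most of these points would fall into the lowest part of the colormap.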

vmin, vmax float, optional

When using scalar data and no explicit norm, vmin and vmax define the data range that the colormap covers. By default, the colormap covers the complete value range of the supplied data. It is an error to use vmin/vmax when a norm instance is given (but using a str norm name together with vmin/vmax is acceptable).

This parameter is ignored if c is RGB(A).
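A small sketch of vmin and vmax, using made-up values from 0 to 10 and restricting the colormap to the range 2 to 8. Values outside that range are drawn with the colors at the ends of the colormap by default.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line inside a notebook
import matplotlib.pyplot as plt
import numpy as np

values = np.linspace(0, 10, 11)

fig, ax = plt.subplots()
# the colormap covers only [2, 8]; values outside that range saturate
points = ax.scatter(values, values, c=values, cmap='viridis', vmin=2, vmax=8)
fig.colorbar(points, ax=ax)

color_limits = points.get_clim()
```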

alpha float, default: None

The alpha blending value, between 0 (transparent) and 1 (opaque).

linewidths float or array-like, default: rcParams["lines.linewidth"] (default: 1.5)

The linewidth of the marker edges. Note: The default edgecolors is 'face'. You may want to change this as well.

edgecolors {'face', 'none', None} or color or sequence of color, default: rcParams["scatter.edgecolors"] (default: 'face')

The edge color of the marker. Possible values:

'face': The edge color will always be the same as the face color.
'none': No patch boundary will be drawn.
A color or sequence of colors.

For non-filled markers, edgecolors is ignored. Instead, the color is determined like with 'face', i.e. from c, colors, or facecolors.

plotnonfinite bool, default: False

Whether to plot points with nonfinite c (i.e. inf, -inf or nan). If True the points are drawn with the bad colormap color (see Colormap.set_bad).
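A minimal sketch of the difference, using a made-up color array that contains nan and inf. The names dropped and kept are illustrative. The mask inspection at the end assumes a Matplotlib version that keeps masked offsets on the returned collection.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line inside a notebook
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(5.0)
c = np.array([0.0, 1.0, np.nan, 3.0, np.inf])

fig, ax = plt.subplots()

# default: points whose c is nonfinite are masked out and not drawn
dropped = ax.scatter(x, np.zeros(5), c=c)

# plotnonfinite=True: every point is drawn; nonfinite ones use the "bad" color
kept = ax.scatter(x, np.ones(5), c=c, plotnonfinite=True)

hidden_default = np.ma.getmaskarray(dropped.get_offsets()).any(axis=1).sum()
hidden_kept = np.ma.getmaskarray(kept.get_offsets()).any(axis=1).sum()
print(hidden_default, hidden_kept)
```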

Returns: PathCollection

Other Parameters:

data indexable object, optional

If given, the following parameters also accept a string s, which is interpreted as data[s] (unless this raises an exception):

x, y, s, linewidths, edgecolors, c, facecolor, facecolors, color

**kwargs Collection properties
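The data parameter lets you pass column names instead of arrays. The sketch below uses a plain dictionary with made-up keys ('a', 'b', 'size', 'value'); any indexable object such as a pandas DataFrame works the same way.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line inside a notebook
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.RandomState(0)
table = {
    'a': np.arange(10.0),
    'b': rng.rand(10),
    'size': 50 + 200 * rng.rand(10),  # marker areas in points**2
    'value': rng.rand(10),            # numbers to be colormapped
}

fig, ax = plt.subplots()
# each string is looked up as table[<string>]
points = ax.scatter('a', 'b', s='size', c='value', data=table)
```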

See also:

plot: To plot scatter plots when markers are identical in size and color.

Notes:

The plot function will be faster for scatterplots where markers don't vary in size or color.
Any or all of x, y, s, and c may be masked arrays, in which case all masks will be combined and only unmasked points will be plotted.
Fundamentally, scatter works with 1D arrays; x, y, s, and c may be input as N-D arrays, but within scatter they will be flattened. The exception is c, which will be flattened only if its size matches the size of x and y.
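The last two notes can be illustrated with a short sketch. The arrays are made up, and the mask inspection assumes a Matplotlib version that keeps masked offsets on the returned collection.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line inside a notebook
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots()

# Masks are combined: x hides indices 4 and 5, y hides index 0.
x = np.ma.masked_greater(np.arange(6.0), 3)
y = np.ma.masked_less(np.arange(6.0), 1)
points = ax.scatter(x, y)
visible = ~np.ma.getmaskarray(points.get_offsets()).any(axis=1)
print(visible.sum(), "of", visible.size, "points are drawn")

# N-D inputs are flattened: a 2x3 grid becomes six separate points.
grid_x, grid_y = np.meshgrid(np.arange(3.0), np.arange(2.0))
grid = ax.scatter(grid_x, grid_y)
print(grid.get_offsets().shape)
```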
