Box whisker plot python

Boxplot in Matplotlib

In this tutorial, we cover box plots and how to create them with the boxplot() function of the matplotlib library. The article demonstrates how to use the matplotlib package to create several types of boxplots. In addition, we’ll discuss why some parameters of matplotlib’s boxplot function are essential.

What is a boxplot in matplotlib?

Matplotlib’s boxplot mainly provides a graphical summary of a data set with features such as minimum, first quartile, median, third quartile, and maximum.

Note: Quartile is a statistical term for the values (Q1, Q2, and Q3) that divide sorted observations into four intervals, each holding about a quarter of the data.

  • The box plot is also known as the box-and-whisker plot.
  • The whiskers are the two lines that run from the minimum to the lower quartile (the box’s beginning) and from the upper quartile (the box’s end) to the maximum. When outliers are drawn separately, the whiskers stop at the most extreme data points that are not outliers.
  • In the box plot, the box is drawn from the first to the third quartile, with a line running across it at the median.
  • In a vertical box plot, the x-axis identifies the data sets being plotted, while the y-axis shows the data values. A horizontal box plot swaps the two.
  • Box plots are useful for visualizing the distribution of numerical values in a field. They come in handy for making comparisons across categorical variables and spotting outliers, if any exist in a dataset.
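To make the five summary values concrete, here is a minimal sketch that computes them with NumPy. The sample values and variable names are made up for illustration, and np.percentile is used with its default linear interpolation, which is only one of several quartile conventions.

```python
import numpy as np

# Hypothetical sample data; any 1-D numeric sequence would do.
values = np.array([2, 4, 4, 5, 7, 8, 9, 11, 12, 15])

# Quartiles computed with NumPy's default (linear interpolation) method.
q1, median, q3 = np.percentile(values, [25, 50, 75])

# The five numbers a box plot summarizes.
summary = {
    "minimum": values.min(),
    "first quartile": q1,
    "median": median,
    "third quartile": q3,
    "maximum": values.max(),
}

for name, stat in summary.items():
    print(f"{name}: {stat}")
```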

Parameters of Matplotlib boxplot

Attribute       Value
data            The input data: an array or a sequence of arrays (named x in the boxplot() signature).
notch           Optional boolean; if True, a notched box plot is drawn.
vert            Optional boolean; True draws vertical boxes and False draws horizontal boxes.
bootstrap       Optional int; the number of bootstrap resamples used for the confidence intervals around the median of a notched box plot.
usermedians     Optional array or sequence whose dimensions are compatible with the data; overrides the computed medians.
positions       Optional array; sets the positions of the boxes.
widths          Optional scalar or array; sets the width of the boxes.
patch_artist    Optional boolean; if True, the boxes are drawn as filled patches instead of lines.
labels          Optional sequence of strings; gives each data set a label (renamed tick_labels in Matplotlib 3.9).
meanline        Optional boolean; if True (and showmeans is enabled), tries to render the mean as a line spanning the full width of the box.
zorder          Optional number; sets the drawing order (z-order) of the boxplot.

How to Create a Boxplot in Matplotlib?

The boxplot() method in the matplotlib library is usually used to produce a box plot.

  • The numpy.random.normal() function generates the random data used in the box plot examples. Its arguments are the mean, the standard deviation, and the desired number of values.
  • A Numpy array, a Python list, or a tuple of arrays can be used as data values for the ax.boxplot() method.

The boxplot() method requires the following syntax:
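As a rough sketch of the call shape, the snippet below passes a data set and a few of the keyword arguments from the parameter table. The data, the seed, and the chosen argument values are illustrative assumptions, not required defaults.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; remove to use an interactive backend
import matplotlib.pyplot as plt
import numpy as np

# Illustrative data: 200 values drawn from a normal distribution.
rng = np.random.default_rng(0)
data = rng.normal(loc=100, scale=20, size=200)

fig, ax = plt.subplots()

# General shape: ax.boxplot(data, <optional keyword arguments>)
result = ax.boxplot(
    data,
    notch=False,         # plain rectangular box
    patch_artist=False,  # boxes drawn as lines, not filled patches
    widths=0.5,          # box width in data units
    positions=[1],       # where the box sits on the axis
)

# boxplot() returns a dict of the artists it created.
print(sorted(result.keys()))
```

Call plt.show() afterwards to display the figure in an interactive session.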

How to Customize Matplotlib Boxplot?

The matplotlib.pyplot.boxplot() function offers extensive customization of the box plot. patch_artist = True fills the boxes with color, and notch = True draws the boxes in notched form. Distinct colors can be assigned to different boxes. A horizontal box plot is created with the vert = 0 parameter (equivalently, vert = False). The number of labels must match the number of data sets.

Matplotlib Boxplot Examples

Example 1: Simple Matplotlib Boxplot

In this example, we will look at how to use the matplotlib boxplot function. We will start by using the numpy library to generate random data, which will then be passed to the matplotlib boxplot function. Remember that we are using the NumPy normal function to draw a random sample from a normal distribution.
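A minimal sketch of such an example could look like the following. The seed, the distribution parameters, the figure size, and the title are assumptions chosen for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; remove to use an interactive backend
import matplotlib.pyplot as plt
import numpy as np

# Random sample: mean 100, standard deviation 20, 200 values (assumed numbers).
np.random.seed(10)
data = np.random.normal(100, 20, 200)

fig, ax = plt.subplots(figsize=(8, 5))

# Draw a single box plot of the sample.
bp = ax.boxplot(data)
ax.set_title("Simple box plot")
```

Call plt.show() afterwards to display the figure in an interactive session.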

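In the resulting boxplot, the box displays the central bulk of the data (the interquartile range), while the whiskers at both ends, i.e., the top and bottom, show how far the remaining data extends. Outliers, if any, are drawn as individual points beyond the whiskers.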

Boxplot in Matplotlib Example

Example 2: Multiple Box Plots in Matplotlib

We can also create several box plots on the same axes to aid in comparing data from various groups. In this example, we produce normally distributed data for each box and feed it to the boxplot function.
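One way to sketch this is to pass a list of arrays, one per box. The group parameters and the tick labels below are made up for illustration. The labels are set with set_xticklabels() after plotting, which is one of several ways to label the boxes.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; remove to use an interactive backend
import matplotlib.pyplot as plt
import numpy as np

np.random.seed(10)

# Four hypothetical groups with different means and spreads.
data = [
    np.random.normal(100, 10, 200),
    np.random.normal(90, 20, 200),
    np.random.normal(80, 30, 200),
    np.random.normal(70, 40, 200),
]

fig, ax = plt.subplots(figsize=(8, 5))

# A sequence of arrays produces one box per array.
bp = ax.boxplot(data)

# Label the four boxes (placed at x = 1, 2, 3, 4 by default).
ax.set_xticklabels(["Group A", "Group B", "Group C", "Group D"])
ax.set_title("Multiple box plots")
```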

Multiple Boxplots in Matplotlib

Example 3: Matplotlib boxplot Color Customization

The patch_artist argument of the matplotlib boxplot function enables color customization. When it is set to True, the boxes are drawn as Patch artists rather than Line2D artists, so they can be given a fill colour. In contrast to the preceding examples, where the boxes were empty, the boxes here are filled with colour.
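The following sketch shows the idea under assumed data and colours: with patch_artist=True each entry of the returned "boxes" list is a patch, so set_facecolor() can be called on it.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; remove to use an interactive backend
import matplotlib.pyplot as plt
import numpy as np

np.random.seed(10)

# Three hypothetical groups.
data = [np.random.normal(mean, 15, 200) for mean in (100, 90, 80)]
colors = ["lightblue", "lightgreen", "salmon"]  # illustrative colour choices

fig, ax = plt.subplots(figsize=(8, 5))

# patch_artist=True makes the boxes fillable patches instead of outlines.
bp = ax.boxplot(data, patch_artist=True)

# Assign a distinct fill colour to each box.
for box, color in zip(bp["boxes"], colors):
    box.set_facecolor(color)
```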

Color Customization in Matplotlib

Example 4: Box Plot with Notches

This example explores how to add notches to our box plots. A notch marks a confidence interval around the median, so it conveys useful information about whether the medians of two box plots differ significantly: if the notches of two boxes do not overlap, their medians probably differ.

Again, we use the boxplot method, this time with the notch argument set to True. We produce random data first, then draw the notched boxes.
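A minimal sketch with two assumed groups is shown below. The means, spreads, and sample sizes are illustrative only.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; remove to use an interactive backend
import matplotlib.pyplot as plt
import numpy as np

np.random.seed(10)

# Two hypothetical groups whose medians we want to compare.
data = [
    np.random.normal(100, 20, 200),
    np.random.normal(110, 20, 200),
]

fig, ax = plt.subplots(figsize=(8, 5))

# notch=True narrows each box around its median to show a confidence interval.
bp = ax.boxplot(data, notch=True)
ax.set_title("Box plot with notches")
```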

Boxplot with Notches

Example 5: Horizontal Box Plots with Whiskers of Various Lengths and Colors

The final example in this matplotlib box plot tutorial shows how to create horizontal boxplots; the previous examples all focused on vertical boxplots. We create the figure and axes for this graph once we’ve generated random data. Setting the vert option to 0 produces horizontal boxplots. Using a loop, we can modify the appearance and linewidth of the whiskers, caps, medians, and fliers.
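The sketch below illustrates the approach under assumed data, colours, and line widths. It loops over the artists in the dict returned by boxplot() and restyles each group.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; remove to use an interactive backend
import matplotlib.pyplot as plt
import numpy as np

np.random.seed(10)

# Four hypothetical data sets.
data = [np.random.normal(mean, sd, 200)
        for mean, sd in ((100, 10), (90, 20), (80, 30), (70, 40))]

fig, ax = plt.subplots(figsize=(8, 5))

# vert=False (the same as vert=0) draws the boxes horizontally.
bp = ax.boxplot(data, vert=False, patch_artist=True)

# Restyle each group of artists in a loop (styles are illustrative).
for whisker in bp["whiskers"]:
    whisker.set(color="purple", linewidth=1.5, linestyle=":")
for cap in bp["caps"]:
    cap.set(color="purple", linewidth=2)
for median in bp["medians"]:
    median.set(color="red", linewidth=3)
for flier in bp["fliers"]:
    flier.set(marker="D", color="orange", alpha=0.5)

# With horizontal boxes the data sets are listed along the y-axis.
ax.set_yticklabels(["data 1", "data 2", "data 3", "data 4"])
ax.set_title("Customized horizontal box plot")
```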

Customized Boxplot with Labels

Conclusion:

  • This matplotlib boxplot tutorial offered a comprehensive overview of how to use matplotlib to create a variety of boxplots.
  • We learned how to use the boxplot function’s various arguments. This tutorial also demonstrated how to make vertical and horizontal boxplots.
  • We can create several boxplots on the same axes by defining as many data sets as needed.
  • In addition, the Matplotlib boxplot allows for several customization options. Different examples of customization have also been considered.

pandas.DataFrame.boxplot

Make a box-and-whisker plot from DataFrame columns, optionally grouped by some other columns. A box plot is a method for graphically depicting groups of numerical data through their quartiles. The box extends from the Q1 to Q3 quartile values of the data, with a line at the median (Q2). The whiskers extend from the edges of the box to show the range of the data. By default, they extend no more than 1.5 * IQR (IQR = Q3 - Q1) from the edges of the box, ending at the farthest data point within that interval. Outliers are plotted as separate dots.
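The whisker rule described above can be worked through by hand. The sketch below uses made-up values and NumPy's default percentile method, so the exact quartiles may differ slightly from other conventions.

```python
import numpy as np

# Hypothetical sample with one unusually large value.
values = np.array([2, 4, 4, 5, 7, 8, 9, 11, 12, 30])

q1, q3 = np.percentile(values, [25, 75])
iqr = q3 - q1

# Whiskers may reach at most 1.5 * IQR beyond the box edges.
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr

# Each whisker ends at the farthest data point inside its limit.
inside = values[(values >= lower_limit) & (values <= upper_limit)]
whisker_low, whisker_high = inside.min(), inside.max()

# Points beyond the limits are the outliers drawn as separate dots.
outliers = values[(values < lower_limit) | (values > upper_limit)]

print("whiskers:", whisker_low, whisker_high)
print("outliers:", outliers)
```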

For further details see Wikipedia’s entry for boxplot.

Parameters

column str or list of str, optional

Column name or list of names, or vector. Can be any valid input to pandas.DataFrame.groupby().

by str or array-like, optional

Column in the DataFrame to pass to pandas.DataFrame.groupby(). One box plot is drawn per value of the columns in by.

ax object of class matplotlib.axes.Axes, optional

The matplotlib axes to be used by boxplot.

fontsize float or str

Tick label font size in points or as a string (e.g., large).

rot float, default 0

The rotation angle of labels (in degrees) with respect to the screen coordinate system.

grid bool, default True

Setting this to True will show the grid.

figsize A tuple (width, height) in inches

The size of the figure to create in matplotlib.

layout tuple (rows, columns), optional

For example, (3, 5) will display the subplots using 3 rows and 5 columns, starting from the top-left.

return_type {'axes', 'dict', 'both'} or None, default 'axes'

The kind of object to return. The default is axes.

  • ‘axes’ returns the matplotlib axes the boxplot is drawn on.
  • ‘dict’ returns a dictionary whose values are the matplotlib Lines of the boxplot.
  • ‘both’ returns a namedtuple with the axes and dict.
  • When grouping with by, a Series mapping columns to return_type is returned. If return_type is None, a NumPy array of axes with the same shape as layout is returned.

backend str, default None

Backend to use instead of the backend specified in the option plotting.backend. For instance, 'matplotlib'. Alternatively, to specify the plotting.backend for the whole session, set pd.options.plotting.backend.

**kwargs

All other plotting keyword arguments to be passed to matplotlib.pyplot.boxplot().


See also: matplotlib.pyplot.boxplot, the Matplotlib equivalent plot.

The return type depends on the return_type parameter:

  • ‘axes’ : object of class matplotlib.axes.Axes
  • ‘dict’ : dict of matplotlib.lines.Line2D objects
  • ‘both’ : a namedtuple with structure (ax, lines)

For data grouped with by, a Series of the above is returned, or a NumPy array of axes when return_type is None.

Use return_type='dict' when you want to tweak the appearance of the lines after plotting. In this case a dict containing the Lines making up the boxes, caps, fliers, medians, and whiskers is returned.
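As an illustration of that workflow, the sketch below requests the dict and restyles the median lines. The DataFrame contents, the seed, and the chosen colour and width are assumptions made for the example.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; remove to use an interactive backend
import numpy as np
import pandas as pd

# Hypothetical data: 10 rows, 4 columns of standard normal values.
rng = np.random.default_rng(1234)
df = pd.DataFrame(rng.standard_normal((10, 4)),
                  columns=["Col1", "Col2", "Col3", "Col4"])

# return_type='dict' hands back the Line2D objects grouped by role.
lines = df.boxplot(column=["Col1", "Col2"], return_type="dict")

# Tweak the median lines after plotting.
for median in lines["medians"]:
    median.set(color="red", linewidth=2)

print(sorted(lines.keys()))
```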

Boxplots can be created for every column in the dataframe by df.boxplot() or indicating the columns to be used:

>>> import numpy as np
>>> import pandas as pd
>>> np.random.seed(1234)
>>> df = pd.DataFrame(np.random.randn(10, 4),
...                   columns=['Col1', 'Col2', 'Col3', 'Col4'])
>>> boxplot = df.boxplot(column=['Col1', 'Col2', 'Col3'])


Boxplots of variable distributions grouped by the values of a third variable can be created using the option by. For instance:

>>> df = pd.DataFrame(np.random.randn(10, 2),
...                   columns=['Col1', 'Col2'])
>>> df['X'] = pd.Series(['A', 'A', 'A', 'A', 'A',
...                      'B', 'B', 'B', 'B', 'B'])
>>> boxplot = df.boxplot(by='X')


A list of strings (e.g. ['X', 'Y']) can be passed to boxplot in order to group the data by each combination of those variables along the x-axis:

>>> df = pd.DataFrame(np.random.randn(10, 3),
...                   columns=['Col1', 'Col2', 'Col3'])
>>> df['X'] = pd.Series(['A', 'A', 'A', 'A', 'A',
...                      'B', 'B', 'B', 'B', 'B'])
>>> df['Y'] = pd.Series(['A', 'B', 'A', 'B', 'A',
...                      'B', 'A', 'B', 'A', 'B'])
>>> boxplot = df.boxplot(column=['Col1', 'Col2'], by=['X', 'Y'])


The layout of the boxplots can be adjusted by passing a tuple to layout:

>>> boxplot = df.boxplot(column=['Col1', 'Col2'], by='X',
...                      layout=(2, 1))


Additional formatting can be applied to the boxplot, such as suppressing the grid (grid=False), rotating the labels on the x-axis (e.g. rot=45), or changing the font size (e.g. fontsize=15):

>>> boxplot = df.boxplot(grid=False, rot=45, fontsize=15) 


The parameter return_type can be used to select the type of element returned by boxplot. When return_type='axes' is selected, the matplotlib axes on which the boxplot is drawn are returned:

>>> boxplot = df.boxplot(column=['Col1', 'Col2'], return_type='axes')
>>> type(boxplot)

When grouping with by, a Series mapping columns to return_type is returned:

>>> boxplot = df.boxplot(column=['Col1', 'Col2'], by='X',
...                      return_type='axes')
>>> type(boxplot)

If return_type is None, a NumPy array of axes with the same shape as layout is returned:

>>> boxplot = df.boxplot(column=['Col1', 'Col2'], by='X',
...                      return_type=None)
>>> type(boxplot)
