Seven ways to plot whiskers in box-and-whisker plots.
The box always extends from the 25th to the 75th percentile. These limits are sometimes called the hinges of the plot. Surprisingly, there are several ways to compute these percentiles. Prism uses a standard method, but not the same one Excel uses.
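To show how much the percentile definition matters, the short Python sketch below computes the quartiles of the same eight values with the two methods built into the standard library's `statistics.quantiles` (Python 3.8 or later). The `"inclusive"` method interpolates at rank 1 + (n - 1)p, the rule used by Excel's `PERCENTILE` function. The `"exclusive"` method interpolates at rank (n + 1)p. The sketch makes no claim about which definition Prism uses; it only shows that reasonable definitions disagree.

```python
import statistics

# Eight sorted values; small samples make the difference easy to see.
data = [1, 2, 3, 4, 5, 6, 7, 8]

# n=4 returns the three cut points: 25th, 50th and 75th percentiles.
inclusive = statistics.quantiles(data, n=4, method="inclusive")  # rank 1 + (n - 1)p, as in Excel's PERCENTILE
exclusive = statistics.quantiles(data, n=4, method="exclusive")  # rank (n + 1)p

print("inclusive:", inclusive)  # inclusive: [2.75, 4.5, 6.25]
print("exclusive:", exclusive)  # exclusive: [2.25, 4.5, 6.75]
```

The medians agree, but the hinges do not: the box would span 2.75 to 6.25 under one definition and 2.25 to 6.75 under the other.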
The line in the middle of the box is plotted at the median. You cannot choose a different value, but Prism can also mark the mean with a "+".
Prism offers seven choices for drawing the whiskers in box-and-whisker plots of column and grouped data:
- Min to max. The whiskers go down to the smallest value and up to the largest. For box-and-whisker plots of XY data, Prism always plots like this and offers no choice.
- Tukey. See details below.
- 10 to 90 percentiles. The whiskers are drawn down to the 10th percentile and up to the 90th. Points below and above the whiskers are drawn as individual dots.
- 5 to 95 percentiles
- 2.5 to 97.5 percentiles
- 1 to 99 percentiles
- Min to max, show all points. The whiskers go down to the minimum and up to the maximum value, and each individual value is also plotted as a dot superimposed on the graph.
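The percentile-based options and the min-to-max option can be sketched in Python as below. The `percentile` helper is hypothetical. It interpolates at rank (n + 1)p and clamps to the smallest or largest value when that rank falls outside the data, which is one common definition and not necessarily Prism's. The 10 to 90 percentile option above draws values beyond the whiskers as individual dots, so the sketch returns those values separately, and it assumes the other percentile options behave the same way.

```python
import math


def percentile(sorted_values, p):
    """Hypothetical p-th percentile (0 < p < 100): linear interpolation at
    1-based rank (n + 1) * p / 100, clamped to the data range (assumed definition)."""
    n = len(sorted_values)
    rank = (n + 1) * p / 100
    if rank <= 1:
        return sorted_values[0]
    if rank >= n:
        return sorted_values[-1]
    lower = math.floor(rank)
    frac = rank - lower
    a, b = sorted_values[lower - 1], sorted_values[lower]
    return a + frac * (b - a)


def percentile_whiskers(values, low_pct):
    """Whiskers at the low_pct and (100 - low_pct) percentiles, e.g. 10 -> 10th and 90th.

    Returns (lower_whisker, upper_whisker, points_to_draw_individually)."""
    s = sorted(values)
    low = percentile(s, low_pct)
    high = percentile(s, 100 - low_pct)
    dots = [v for v in s if v < low or v > high]
    return low, high, dots


def min_max_whiskers(values):
    """Min-to-max whiskers: no value lies beyond them, so none is drawn separately."""
    return min(values), max(values), []


if __name__ == "__main__":
    data = list(range(1, 20))  # 19 values: 1, 2, ..., 19
    print(percentile_whiskers(data, 10))  # (2.0, 18.0, [1, 19])
    print(percentile_whiskers(data, 1))   # (1, 19, [])
    print(min_max_whiskers(data))         # (1, 19, [])
```

With 19 values, the 10th and 90th percentiles land exactly on the 2nd and 18th values, so 1 and 19 become dots. Under this assumed definition, the ranks for the 1st and 99th percentiles fall outside the data and are clamped, so for a sample this small those whiskers coincide with min to max.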
Because whiskers can be drawn in so many ways, it is essential to state in the graph's figure legend which method you chose.
Details of the Tukey method for plotting the whiskers and outliers:
- Calculate the interquartile range (the difference between the 25th and 75th percentiles). Call this the IQR.
- Compute the 75th percentile plus 1.5 times the IQR. If this value is greater than or equal to the largest value in the data set, draw the upper whisker to the largest value. Otherwise, stop the upper whisker at the largest value that is less than or equal to the 75th percentile plus 1.5IQR, and plot any values greater than this limit as individual points.
- Compute the 25th percentile minus 1.5IQR. If this value is less than or equal to the smallest value in the data set, draw the lower whisker to the smallest value. Otherwise, stop the lower whisker at the smallest value that is greater than or equal to the 25th percentile minus 1.5IQR, and plot any values less than this limit as individual points.
- Why 1.5IQR? There is no statistical rationale; it is simply how Tukey decided to do it, and he invented the idea of box-and-whisker plots.
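- With the Tukey method, the ends of the whiskers are sometimes called the inner fences. Strictly, the inner fences are the cutoffs themselves (the 25th percentile minus 1.5IQR and the 75th percentile plus 1.5IQR), and the whiskers end at the most extreme data values within them.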
- In Prism up to 6.01 and 6.0b, a largest value precisely equal to the 75th percentile plus 1.5IQR was not plotted as an outlier, but a smallest value precisely equal to the 25th percentile minus 1.5IQR was. We fixed this inconsistency in 6.02 and 6.0c, which do not plot such a point individually.
- The values that are plotted individually are sometimes called outliers, but Grubbs' test and other outlier tests define "outlier" differently. The chance of finding one or more "outliers" by Tukey's rule in data sampled from a Gaussian distribution depends on sample size.
- If you only enter three values per group (n=3), Prism will plot the median and range. It will not plot the percentiles and will ignore your choice for how to plot the whiskers.
- The second and third steps above were corrected in November 2013 to explain accurately what Prism does. The Tukey whiskers always stop at a data value; they do not necessarily extend all the way up to the 75th percentile plus 1.5IQR or down to the 25th percentile minus 1.5IQR.
- With Tukey's method, each whisker ends at one of the values in the sample, so the two whiskers are often different lengths.
- The terms boxplot and box-and-whisker plot are often used interchangeably, although originally "boxplot" described a plot with Tukey whiskers (fences), and "box-and-whisker plot" described one whose whiskers extend down to the minimum value and up to the maximum value.
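The Tukey steps described above can be sketched in Python as follows. The quartiles come from `statistics.quantiles(..., method="exclusive")`, which is an assumed percentile definition and may not match Prism's exactly, so whisker positions could differ slightly from Prism's for the same data. As in Prism 6.02 and 6.0c, a value exactly equal to a fence stays on the whisker. For three or fewer values the sketch falls back to the range, following the n = 3 rule above; applying the same fallback to n < 3 is an assumption.

```python
import statistics


def tukey_whiskers(values, k=1.5):
    """Return (lower_whisker, upper_whisker, individual_points) by Tukey's rule.

    Quartiles use statistics.quantiles(method="exclusive"); this percentile
    definition is an assumption and may differ from Prism's."""
    s = sorted(values)
    if len(s) <= 3:
        # Prism draws the range for n = 3; treating n < 3 the same way is an assumption.
        return s[0], s[-1], []

    q1, _median, q3 = statistics.quantiles(s, n=4, method="exclusive")
    iqr = q3 - q1
    lower_fence = q1 - k * iqr  # 25th percentile minus 1.5 IQR
    upper_fence = q3 + k * iqr  # 75th percentile plus 1.5 IQR

    # Whiskers stop at actual data values: the most extreme values inside the fences.
    # A value exactly equal to a fence counts as inside, so it is not drawn individually.
    inside = [v for v in s if lower_fence <= v <= upper_fence]
    individual = [v for v in s if v < lower_fence or v > upper_fence]
    return inside[0], inside[-1], individual


if __name__ == "__main__":
    print(tukey_whiskers([1, 2, 3, 4, 5, 6, 7, 8, 30]))  # (1, 8, [30])
    print(tukey_whiskers([1, 2, 3, 4, 5, 6, 7, 8, 15]))  # (1, 15, [])
```

In the first call the quartiles are 2.5 and 7.5, so the upper fence is 15. The upper whisker stops at 8, the largest data value inside that fence, and 30 is drawn as an individual point. In the second call, 15 lies exactly on the upper fence, so it ends the whisker rather than being plotted individually.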
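To illustrate why the chance of flagging "outliers" depends on sample size, the sketch below computes Tukey's fences for an idealized Gaussian population using Python's `statistics.NormalDist`. About 0.7% of values from such a population lie beyond those fences. The function `chance_any_outside` is a hypothetical, rough approximation. It treats the fences as known and the values as independent, ignoring the sampling error in quartiles estimated from the data, so it is only a guide for large samples.

```python
from statistics import NormalDist

std_normal = NormalDist()  # Gaussian with mean 0 and SD 1
q1, q3 = std_normal.inv_cdf(0.25), std_normal.inv_cdf(0.75)
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr  # about -2.70
upper_fence = q3 + 1.5 * iqr  # about +2.70

# Probability that one value lies beyond either fence (both tails).
p_outside = std_normal.cdf(lower_fence) + (1 - std_normal.cdf(upper_fence))


def chance_any_outside(n):
    """Hypothetical large-sample approximation of P(at least one of n values lies
    beyond the fences), treating the fences as fixed and the values as independent."""
    return 1 - (1 - p_outside) ** n


if __name__ == "__main__":
    print(f"per value: {p_outside:.4f}")
    for n in (10, 100, 1000):
        print(f"n = {n}: {chance_any_outside(n):.3f}")
```

Under this approximation, the chance of at least one point beyond the fences is roughly 7% for 10 values and roughly 50% for 100 values.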