Interpreting results: Quartiles and the interquartile range 

Percentiles are useful for giving the relative standing of an individual in a group. Percentiles are essentially normalized ranks. The 80th percentile is a value where you'll find 80% of the values lower and 20% of the values higher. Percentiles are expressed in the same units as the data.
The median is the 50th percentile. Half the values are higher; half are lower. To find it, rank the values from low to high. If there is an odd number of points, the median is the one in the middle. If there is an even number of points, the median is the average of the two middle values.
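The rule above can be sketched in a few lines of Python. This is only an illustration of the odd/even rule described here, not Prism's own code; the function name median_by_rank is made up for this example.

```python
def median_by_rank(values):
    """Median by the rule above: middle value, or mean of the two middle values."""
    ordered = sorted(values)          # rank the values from low to high
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:                    # odd count: the single middle value
        return ordered[mid]
    # even count: average of the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2
```

For example, median_by_rank([3, 1, 2]) picks the middle value, while median_by_rank([4, 1, 3, 2]) averages 2 and 3.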
Quartiles divide the data into four groups, each containing an equal number of values. The groups are separated by the 25th, 50th, and 75th percentiles, also called the first, second, and third quartiles. One quarter of the values are less than or equal to the 25th percentile. Three quarters of the values are less than or equal to the 75th percentile.
The difference between the 75th and 25th percentile is called the interquartile range. It is a useful way to quantify scatter.
Computing a percentile other than the median is not straightforward. Believe it or not, there are at least eight different methods to compute percentiles. Here is another explanation of different methods (scroll down to "plotting positions").
Prism computes percentile values by first evaluating this expression:
R = P * (n + 1)/100
P is the desired percentile (25 or 75 for quartiles) and n is the number of values in the data set. The result is the rank that corresponds to the percentile value. If there are 68 values, the 25th percentile corresponds to a rank equal to:
0.25 * 69 = 17.25
Prism (since version 5) interpolates one quarter of the way between the 17th and 18th values. This is the method most commonly used in stats programs. It is definition 6 in Hyndman and Fan (1). With this method, the percentile of any point is k/(n+1), where k is the rank (starting at 1) and n is the sample size. This is not the way Excel computes percentiles, so percentiles computed by Prism and Excel will not match when sample sizes are small.
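As a rough sketch of the calculation just described, the following Python function computes the rank R = P * (n + 1)/100 and interpolates between the two neighboring ranked values. It follows the formula in the text and is not taken from Prism; the function name percentile_n_plus_1 is hypothetical, and returning the smallest or largest value when R falls outside 1..n is an assumption based on the behavior this page describes for tiny data sets.

```python
import math

def percentile_n_plus_1(values, p):
    """Percentile using rank R = P * (n + 1) / 100 with linear interpolation."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("cannot compute a percentile of an empty data set")
    rank = p * (n + 1) / 100          # 1-based, possibly fractional rank
    if rank <= 1:                     # assumption: clamp to the lowest value
        return ordered[0]
    if rank >= n:                     # assumption: clamp to the highest value
        return ordered[-1]
    lower = math.floor(rank)          # rank of the value just below R
    fraction = rank - lower           # how far to move toward the next value
    below = ordered[lower - 1]        # convert 1-based rank to 0-based index
    above = ordered[lower]
    return below + fraction * (above - below)

# With the 68 values 1, 2, ..., 68 the rank for P = 25 is 17.25, so the
# result lies one quarter of the way between the 17th and 18th values.
q1 = percentile_n_plus_1(range(1, 69), 25)
```

If you use NumPy, its documentation lists method="weibull" for numpy.percentile as Hyndman and Fan definition 6, which should correspond to the same rule.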
Beware of percentiles of tiny data sets. Consider this example: What is the 90th percentile of six values? Using the formula above, R equals 6.3. Since the largest value has a rank of 6, it is not really possible to compute a 90th percentile. Prism reports the largest value as the 90th percentile. A similar problem occurs if you try to compute the 10th percentile of six values. R equals 0.7, but the lowest value has a rank of 1. Prism reports the lowest value as the 10th percentile.
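The arithmetic in this example is easy to check. The short Python sketch below computes R for the 10th and 90th percentiles of six values and then limits R to the available ranks, which mirrors the behavior described above. The function names and the six data values are invented for illustration.

```python
def percentile_rank(p, n):
    """Rank R = P * (n + 1) / 100 for percentile p in a sample of size n."""
    return p * (n + 1) / 100

def clamped_rank(p, n):
    """Limit R to the ranks that exist (1 through n)."""
    return min(max(percentile_rank(p, n), 1), n)

values = sorted([2.1, 3.4, 4.0, 5.6, 7.3, 9.8])   # hypothetical data, n = 6
n = len(values)

r90 = percentile_rank(90, n)   # 6.3, beyond the largest rank of 6
r10 = percentile_rank(10, n)   # 0.7, below the smallest rank of 1

# After clamping, the ranks are whole numbers, so no interpolation is needed.
p90 = values[int(clamped_rank(90, n)) - 1]   # the largest value
p10 = values[int(clamped_rank(10, n)) - 1]   # the smallest value
```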
Note that there is no ambiguity about how to compute the median. All definitions of percentiles lead to the same result for the median.
The term five-number summary is used to describe a list of five values: the minimum, the 25th percentile, the median, the 75th percentile, and the maximum. These are the same values plotted in a box-and-whiskers plot (when the whiskers extend to the minimum and maximum; Prism offers other ways to define the whiskers).
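A five-number summary can be sketched with Python's standard library. The statistics.quantiles function with method="exclusive" is documented as placing the i-th of m sorted points at i/(m + 1), which is the same k/(n+1) convention described on this page, so the quartiles should agree with the formula above for data sets that are not tiny. The function name five_number_summary is hypothetical, and its handling of very small samples (fewer than three values) may differ from Prism's.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, 25th percentile, median, 75th percentile, maximum)."""
    data = list(values)
    # n=4 asks for the three cut points that split the data into quartiles.
    q1, med, q3 = statistics.quantiles(data, n=4, method="exclusive")
    return (min(data), q1, med, q3, max(data))

summary = five_number_summary(range(1, 69))   # the 68 values 1, 2, ..., 68
iqr = summary[3] - summary[1]                 # 75th minus 25th percentile
```

The interquartile range falls out of the same result as the difference between the fourth and second entries.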
1. R.J. Hyndman and Y. Fan, Sample quantiles in statistical packages, The American Statistician, 50: 361-365, 1996.