How Prism computes percentiles
The median (50th percentile)
Computing the median, the 50th percentile, is straightforward. First rank the values. If there are an odd number of values, the median is the middle one. If there are an even number of values, average the two middle ones.
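The rule above can be sketched in a few lines of Python. This is only an illustration of the odd/even rule as described, not Prism's own code; the function name `median` is an assumption made for the example.

```python
def median(values):
    """Median by the rank-then-pick-the-middle rule described above."""
    ranked = sorted(values)              # rank the values from low to high
    n = len(ranked)
    if n == 0:
        raise ValueError("need at least one value")
    middle = n // 2
    if n % 2 == 1:                       # odd count: the single middle value
        return ranked[middle]
    # even count: average the two middle values
    return (ranked[middle - 1] + ranked[middle]) / 2
```

For example, `median([7, 1, 5])` picks the middle value 5, while `median([4, 1, 3, 2])` averages 2 and 3.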
Computing the value that corresponds to the 25th or 75th or some other percentile
Computing other percentiles is not so straightforward. Believe it or not, there are at least eight different methods to compute percentiles. When you arrive on that page, click on Method 1, then Method 2, and so on. Here is another explanation of different methods (scroll down to "plotting positions"). And another.
Prism computes percentile values by first evaluating this expression:
R = P * (n + 1)/100
P is the desired percentile (25 or 75 for quartiles) and n is the number of values in the data set. The result is the rank that corresponds to the percentile value. If there are 68 values, the 25th percentile corresponds to a rank equal to:
0.25 * 69 = 17.25
To compute the percentile, Prism (version 5 and later) interpolates one quarter of the way between the 17th and 18th value. This is the method most commonly used in stats programs. It is definition 6 in Hyndman and Fan "Sample quantiles in statistical packages", The American Statistician, 50: 361-365, 1996. With this method, the percentile of any point is 100*k/(n+1), where k is the rank (starting at 1) and n is the sample size. This is not the same way that Excel computes percentiles, so results from Prism and Excel will not match when sample sizes are small.
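As a rough sketch of the calculation just described (rank R = P * (n + 1)/100, then linear interpolation between the two neighboring ranked values), here is a small Python function. It is not Prism's implementation; the name `percentile_value` is hypothetical, and the handling of ranks that fall below 1 or above n follows the behavior described under "Percentiles with small sample sizes".

```python
import math

def percentile_value(values, p):
    """Value at the p-th percentile using the rank R = p * (n + 1) / 100."""
    ranked = sorted(values)              # rank 1 is the smallest value
    n = len(ranked)
    if n == 0:
        raise ValueError("need at least one value")
    rank = p * (n + 1) / 100             # 1-based rank, possibly fractional
    if rank <= 1:                        # rank falls below the lowest value
        return ranked[0]
    if rank >= n:                        # rank falls above the largest value
        return ranked[-1]
    lower = math.floor(rank)             # e.g. 17 when rank is 17.25
    fraction = rank - lower              # e.g. 0.25
    below = ranked[lower - 1]            # value with rank `lower`
    above = ranked[lower]                # value with rank `lower + 1`
    return below + fraction * (above - below)
```

With 68 values equal to 1 through 68, `percentile_value(list(range(1, 69)), 25)` interpolates one quarter of the way between the 17th and 18th values. If NumPy is available, `numpy.percentile` (version 1.22 and later) documents a `method="weibull"` option that corresponds to the same Hyndman and Fan definition and can serve as a cross-check.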
Earlier versions of Prism did it differently:
- Prism 3.03 to 4.03 average the 17th- and 18th-ranked values (ranking from low to high) to get the 25th percentile.
- There is a bug in how older versions of Prism (2.00 - 3.02 Windows, 2.0a - 3.0a Mac) compute the 25th and 75th percentiles. The problem only occurs if the number of points plus one is evenly divisible by four, so it occurs when N=7,11,15,19, etc. In these cases, Prism adds 0.5 to the 25th and 75th percentile values that are reported in the Column Statistics analysis and plotted in box and whisker plots. If your values are large, an extra 0.5 is not noticeable. If your values are small, the error can be more serious.
Computing the percentile of a value
First rank the values so the smallest is 1 and the largest is n. To find the percentile of a value of rank R, compute 100*R/(n + 1).
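A minimal Python sketch of this reverse calculation follows. The function name `percentile_of_value` is hypothetical, and the treatment of tied values (taking the lowest matching rank) is an assumption of the example, since the text above does not say how ties are ranked.

```python
def percentile_of_value(values, x):
    """Percentile of x within values, computed as 100 * R / (n + 1)."""
    ranked = sorted(values)              # smallest value gets rank 1
    n = len(ranked)
    # Assumption: x must be one of the values; ties take the lowest rank.
    rank = ranked.index(x) + 1           # raises ValueError if x is absent
    return 100 * rank / (n + 1)
```

For nine values, the middle one (rank 5) sits at 100*5/10, the 50th percentile, and the smallest (rank 1) sits at the 10th.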
Percentiles with small sample sizes
Beware of percentiles of tiny data sets. Consider this example: What is the 90th percentile of six values? Using the formula above, R equals 6.3. Since the largest value has a rank of 6, it is not really possible to compute a 90th percentile. Prism reports the largest value as the 90th percentile. A similar problem occurs if you try to compute the 10th percentile of six values. R equals 0.7, but the lowest value has a rank of 1. Prism reports the lowest value as the 10th percentile.
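To see when this happens for a given sample size, one can solve R = P * (n + 1)/100 for the percentiles at which R equals 1 and n. The sketch below does that; both function names are hypothetical and exist only for this illustration.

```python
def rank_for_percentile(p, n):
    """Rank R = p * (n + 1) / 100 for percentile p and sample size n."""
    return p * (n + 1) / 100

def interpolable_range(n):
    """Lowest and highest percentile whose rank lies between 1 and n."""
    lowest = 100 * 1 / (n + 1)           # percentile at which R equals 1
    highest = 100 * n / (n + 1)          # percentile at which R equals n
    return lowest, highest

low, high = interpolable_range(6)        # roughly 14.3 and 85.7 for six values
```

With six values, any percentile below about 14.3 or above about 85.7 gives a rank outside 1 to 6, which is why the 10th and 90th percentiles fall back to the lowest and largest values.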