Interpreting results: Quartiles and the interquartile range

What are percentiles?

Percentiles are useful for giving the relative standing of an individual in a group. Percentiles are essentially normalized ranks. The 80th percentile is the value with 80% of the values below it and 20% of the values above it. Percentiles are expressed in the same units as the data.

The median

The median is the 50th percentile. Half the values are higher; half are lower. To find it, rank the values from low to high. If there is an odd number of values, the median is the one in the middle. If there is an even number of values, the median is the average of the two middle values.
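
The rule above can be written out in a few lines of Python. This is only an illustrative sketch; the function name median is our own choice, and raising an error for an empty data set is an assumption.

```python
def median(values):
    """Return the median of a non-empty sequence of numbers."""
    ranked = sorted(values)          # rank the values from low to high
    n = len(ranked)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd number of values: the median is the one in the middle.
        return ranked[mid]
    # Even number of values: average the two middle values.
    return (ranked[mid - 1] + ranked[mid]) / 2


print(median([3, 1, 2]))     # odd number of values
print(median([4, 1, 3, 2]))  # even number of values
```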

Quartiles

Quartiles divide the data into four groups, each containing an equal number of values. The groups are separated by the 25th, 50th, and 75th percentiles. One quarter of the values are less than or equal to the 25th percentile. Three quarters of the values are less than or equal to the 75th percentile.

Interquartile range

The difference between the 75th and 25th percentiles is called the interquartile range. It is a useful way to quantify scatter.
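
As a rough illustration, the interquartile range can be computed with Python's standard library. The helper name interquartile_range is hypothetical. The sketch assumes statistics.quantiles with method="exclusive", which interpolates between ranked values; other programs may use a different convention and report a slightly different result.

```python
import statistics


def interquartile_range(values):
    """Return the 75th percentile minus the 25th percentile."""
    # With n=4, quantiles() returns three cut points:
    # the 25th, 50th, and 75th percentiles (needs at least two values).
    q1, _median, q3 = statistics.quantiles(values, n=4, method="exclusive")
    return q3 - q1


data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
print(interquartile_range(data))
```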

Computing percentiles

To compute a percentile value, first compute P*(N+1)/100, where P is the percentile (for example 25, 50, or 75) and N is the number of values in the data set. The result is the rank that corresponds to that percentile. If there are 68 values, the 25th percentile corresponds to a rank equal to 25*(68+1)/100 = 17.25. Therefore, the 25th percentile lies between the 17th and 18th values (when ranked from low to high). But where exactly? There is no clear answer, so not all programs compute the percentile the same way. Prism 5 computes the 25th percentile in this example as the value at 25% of the distance from the 17th to the 18th value (earlier versions of Prism averaged the 17th and 18th values).
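
The calculation above can be sketched in Python. This is a minimal sketch of the rank-and-interpolate approach described for Prism 5, not Prism's actual code. The function names percentile_rank and percentile are hypothetical, and returning the smallest or largest value when the rank falls outside 1..N is our assumption, since the text does not cover that case.

```python
import math


def percentile_rank(p, n):
    """Rank corresponding to percentile p in a data set of n values."""
    return p * (n + 1) / 100


def percentile(values, p):
    """Percentile p (0-100) by linear interpolation between ranked values."""
    ranked = sorted(values)
    n = len(ranked)
    if n == 0:
        raise ValueError("percentile of an empty data set is undefined")
    rank = percentile_rank(p, n)
    # Assumption: ranks outside 1..N are clamped to the extreme values.
    if rank <= 1:
        return ranked[0]
    if rank >= n:
        return ranked[-1]
    lower = math.floor(rank)      # e.g. 17 for a rank of 17.25
    fraction = rank - lower       # e.g. 0.25
    low_value = ranked[lower - 1]   # ranks are 1-based, list indexes 0-based
    high_value = ranked[lower]
    # Move `fraction` of the way from the lower to the higher value.
    return low_value + fraction * (high_value - low_value)


print(percentile_rank(25, 68))  # 17.25, as in the example above
```

With 68 values, percentile(values, 25) returns the value one quarter of the way from the 17th to the 18th ranked value.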

Because different methods for computing the 25th and 75th percentiles give different results with small data sets, we suggest that you report the 25th and 75th percentiles only for large data sets (N>100 is a reasonable cutoff). For smaller data sets, we suggest showing a column scatter graph that shows every value.
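
To see how much the choice of method can matter for a small data set, the following sketch compares two interpolation conventions offered by Python's statistics.quantiles ("exclusive" and "inclusive"). These two methods are used only as stand-ins for "different programs"; they are not claimed to match any particular version of Prism. The function name quartile_methods is hypothetical.

```python
import statistics


def quartile_methods(values):
    """Return {method: (25th percentile, 75th percentile)} for two conventions."""
    results = {}
    for method in ("exclusive", "inclusive"):
        q1, _median, q3 = statistics.quantiles(values, n=4, method=method)
        results[method] = (q1, q3)
    return results


small = [1, 2, 3, 4, 5, 6, 7, 8]
for method, (q1, q3) in quartile_methods(small).items():
    print(method, q1, q3)
```

With only eight values, the two conventions report different 25th and 75th percentiles, which is the reason for preferring a graph of every value when N is small.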

Note that there is no ambiguity about how to compute the median. All programs do it the same way.