Comparing the Grubbs' and ROUT methods of identifying outliers.
Detecting multiple outliers is a challenge.
Grubbs' test, also called the ESD (extreme studentized deviate) method, is a common method to remove outliers. While it was designed to detect one outlier, it is often extended to detect multiple outliers using a simple iterative scheme: if an outlier is found, it is removed and the remaining values are tested again. If that second test finds an outlier, that value is removed and the test is run a third time, and so on.
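That iterative scheme is easy to sketch in Python. This is a minimal illustration, not Prism's implementation; the critical value comes from the standard t-distribution formula for the two-sided Grubbs' test, and the example data are hypothetical:

```python
import numpy as np
from scipy import stats

def grubbs_critical(n, alpha=0.05):
    # Standard two-sided critical value for the Grubbs statistic
    t = stats.t.ppf(1 - alpha / (2 * n), n - 2)
    return (n - 1) / np.sqrt(n) * np.sqrt(t**2 / (n - 2 + t**2))

def iterative_grubbs(values, alpha=0.05):
    """Repeatedly apply Grubbs' test, removing one outlier per pass."""
    data = list(values)
    outliers = []
    while len(data) > 2:
        arr = np.array(data, dtype=float)
        g = np.abs(arr - arr.mean()) / arr.std(ddof=1)
        i = int(np.argmax(g))            # most extreme value
        if g[i] <= grubbs_critical(len(arr), alpha):
            break                        # nothing left to flag
        outliers.append(data.pop(i))     # remove it and test again
    return outliers

print(iterative_grubbs([1, 2, 3, 4, 5, 100]))  # → [100.0]
```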
While Grubbs' test does a good job of finding one outlier in a data set, it does not work so well with multiple outliers. The presence of a second outlier in a small data set can prevent the first one from being detected. This is called masking. Grubbs' method identifies an outlier by calculating the difference between the value and the mean, and then dividing that difference by the standard deviation. When that ratio is too large, the value is defined to be an outlier. The problem is that the standard deviation is computed from all the values, including the outliers. With two outliers, the standard deviation can become so large that the ratio drops below the critical value used to define outliers.
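A small numeric example makes the masking effect concrete. The values below are hypothetical, and the critical value again uses the standard two-sided formula: with one outlier the Grubbs ratio exceeds the critical value, but adding a second, similar-sized outlier inflates the standard deviation enough to mask both.

```python
import numpy as np
from scipy import stats

def grubbs_stat(values):
    arr = np.asarray(values, dtype=float)
    return np.max(np.abs(arr - arr.mean())) / arr.std(ddof=1)

def grubbs_critical(n, alpha=0.05):
    t = stats.t.ppf(1 - alpha / (2 * n), n - 2)
    return (n - 1) / np.sqrt(n) * np.sqrt(t**2 / (n - 2 + t**2))

one = [1, 2, 3, 4, 5, 6, 7, 8, 31]       # one outlier
two = [1, 2, 3, 4, 5, 6, 7, 8, 30, 31]   # two outliers inflate the SD

print(grubbs_stat(one), grubbs_critical(len(one)))  # ~2.58 > ~2.21: detected
print(grubbs_stat(two), grubbs_critical(len(two)))  # ~1.91 < ~2.29: both masked
```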
The ROUT method was developed to identify outliers from nonlinear regression. This method can detect any number of outliers (up to 30% of the sample size). While it was designed to find outliers from curves fit by nonlinear regression, the ROUT method can easily be extended to identify outliers in a column of numbers (the details appear at the bottom of this page).
With the Grubbs' test, you specify alpha. If there are no outliers, alpha is the chance of mistakenly identifying one or more outliers. The ROUT method is based on the False Discovery Rate (FDR), so you specify Q, which is the maximum desired FDR. When there are no outliers (and the distribution is Gaussian), Q is very similar to alpha -- the chance of identifying one or more outliers when in fact there aren't any. When there are outliers in the data, Q is the desired false discovery rate. If you set Q to 1%, then you are aiming for no more than 1% of the identified outliers to be false (values that are in fact just in the tail of a Gaussian distribution) and at least 99% to be actual outliers (from a different distribution).
Simulations comparing the ROUT method with Grubbs' method
I performed simulations to compare the Grubbs' and ROUT methods of detecting outliers. Download the details of the simulations and results. Briefly, the data were sampled from a Gaussian distribution. In most cases, outliers (drawn from a uniform distribution with specified limits) were added. Each experimental design was simulated 25,000 times, and I tabulated the number of simulations with zero, one, two, or more than two outliers.
- When there are no outliers, the ROUT and Grubbs' tests perform almost identically. The value of Q specified for the ROUT method is equivalent to the value of alpha you set for the Grubbs' test.
- When there is a single outlier, the Grubbs' test is slightly better able to detect it. The ROUT method results in both more false negatives and more false positives. It is slightly more likely to miss the outlier, and is also more likely to find two outliers even when the simulation only included one. This is not so surprising, as Grubbs' test was really designed to detect a single outlier (although it can be used iteratively to detect more). While the difference between the two methods is clear, it is not substantial.
- When there are two outliers in a small data set, the ROUT test does a much better job. The iterative Grubbs' test is subject to masking, while the ROUT test is not. Whether or not masking is an issue depends on how large the sample is and how far the outliers are from the mean of the other values. In situations where masking is a real possibility, the ROUT test works much better than Grubbs' test. For example, when n=10 with two outliers, the Grubbs' test never found both outliers and missed both in 98.8% of the simulations (in the remaining 1.2% of simulations, the Grubbs' test found one of the two outliers). In contrast, the ROUT method identified both outliers in 92.8% of those simulations and missed both in only 6% of simulations.
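The masking half of that result can be reproduced in miniature. The sketch below is not the original simulation code; the sample size, outlier limits, and trial count are assumptions chosen to make masking likely, and only the iterative Grubbs' side is simulated (ROUT is not reimplemented here):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def grubbs_critical(n, alpha=0.05):
    t = stats.t.ppf(1 - alpha / (2 * n), n - 2)
    return (n - 1) / np.sqrt(n) * np.sqrt(t**2 / (n - 2 + t**2))

def count_grubbs_outliers(values, alpha=0.05):
    """How many values the iterative Grubbs' test flags."""
    data = list(values)
    found = 0
    while len(data) > 2:
        arr = np.array(data, dtype=float)
        g = np.abs(arr - arr.mean()) / arr.std(ddof=1)
        i = int(np.argmax(g))
        if g[i] <= grubbs_critical(len(arr), alpha):
            break
        data.pop(i)
        found += 1
    return found

trials = 2000
counts = {0: 0, 1: 0, 2: 0}
for _ in range(trials):
    # n=10: eight Gaussian values plus two similar-sized outliers
    sample = np.concatenate([rng.normal(size=8), rng.uniform(5, 6, size=2)])
    counts[min(count_grubbs_outliers(sample), 2)] += 1

print(counts)  # masking: the test usually flags neither outlier
```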
Bottom line: Grubbs' is slightly better than the ROUT method for the task it was designed for: detecting a single outlier from a Gaussian distribution. But the Grubbs' test is much worse at detecting two outliers in some situations. I can't imagine any scientific situation where you know for sure that there are either no outliers or only one outlier, with no possibility of two or more outliers. Whenever the presence of two (or more) outliers is possible, we recommend that the ROUT method be used instead of the Grubbs' test.
Reminder. Don't delete outliers without thinking.
Once an outlier is detected, stop and think. Don't just delete it.
- Think about the assumptions. Both the Grubbs' and ROUT methods assume that the data (except for any outliers) are sampled from a Gaussian distribution. If that assumption is violated, the "outliers" may be from the same distribution as the rest. Beware of lognormal distributions. These distributions have values in the tails that will often be incorrectly flagged as outliers by methods that assume a Gaussian distribution.
- Even if the value truly is an outlier from the rest, it may be an important value. It may not be a mistake. It may tell you about biological variability.
How the ROUT method was adapted to find outlier(s) in a stack of values
Enter the values you are testing as Y values on an XY table. Enter an arbitrary X value for each (the row number). Then fit the XY data to the model, Y=0*X + M. The X values are needed to use nonlinear regression, and the equation must include X. But since X is multiplied by zero, the X values have no impact on the results. And while the method is called "nonlinear" regression, it fits this equation just fine. Choose outlier elimination, and Prism, via the ROUT method, will fit a robust mean (M), eliminate outliers (assuming the rest of the values are sampled from a Gaussian distribution), and then fit the equation to the outlier-depleted data.
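The "X multiplied by zero" trick can be demonstrated outside Prism. The sketch below uses scipy's curve_fit with hypothetical Y values; note that plain curve_fit does ordinary least squares, so M comes out equal to the ordinary mean. Prism's outlier elimination additionally fits robustly and removes outliers, which this sketch does not reproduce:

```python
import numpy as np
from scipy.optimize import curve_fit

y = np.array([9.8, 10.1, 10.4, 9.7, 10.0, 25.0])  # hypothetical values; 25 is suspect
x = np.arange(1, len(y) + 1)                      # arbitrary X values (row numbers)

def model(x, m):
    return 0 * x + m   # X is multiplied by zero, so only M is actually fit

(m_fit,), _ = curve_fit(model, x, y)
print(m_fit)  # equals the ordinary mean of y, 12.5
```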