Since the ROUT method is not yet a standard method, we ran simulations to compare it with Grubbs' test. We compared the two methods on data with no outliers, with one outlier, and with two outliers.
•All simulations assumed a Gaussian distribution with a mean of 100 and SD of 15 for the bulk of the values.
•A specified number of outliers were added. These were selected from a uniform distribution whose limits are specified.
•How the false discovery rate (FDR) was computed: For each simulated data set, the FDR was defined to be 0.0 if no outliers were detected. If any outliers were detected, the FDR for that data set is the fraction of detected outliers that were false positives, that is, values simulated from the Gaussian distribution rather than added as outliers by the simulation. The overall FDR is the average of these per-data-set FDR values over the simulations.
•In each case, 25,000 simulations were done.
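The steps above can be sketched in code. This is a minimal illustration of the simulation design, not the actual Prism simulation; the function and parameter names (`simulate_dataset`, `fdr_for_dataset`, `out_lo`, `out_hi`) are ours:

```python
import random

def simulate_dataset(n, n_outliers, out_lo, out_hi, rng):
    """One simulated data set: Gaussian bulk (mean 100, SD 15)
    plus n_outliers values drawn uniformly from [out_lo, out_hi]."""
    bulk = [rng.gauss(100, 15) for _ in range(n - n_outliers)]
    added = [rng.uniform(out_lo, out_hi) for _ in range(n_outliers)]
    return bulk, added

def fdr_for_dataset(detected, added):
    """Per-data-set FDR: 0.0 if nothing was detected, otherwise the
    fraction of detected values that came from the Gaussian bulk."""
    if not detected:
        return 0.0
    false_positives = sum(1 for v in detected if v not in added)
    return false_positives / len(detected)

rng = random.Random(42)
# Design I below: n = 10 with 2 outliers drawn from 50-75.
bulk, added = simulate_dataset(10, 2, 50, 75, rng)
print(len(bulk), len(added))        # 8 2
print(fdr_for_dataset([], added))   # 0.0
```

Repeating this 25,000 times per design and averaging the per-data-set FDR values gives the overall FDR reported in the results table.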
The table below shows the ten simulated experimental designs, which differ in sample size (n), the number of outliers included in the sample, and the range of values from which those outliers were selected.
Design   n     # of outliers   Outlier range
A        100   0               -
B        10    0               -
C        10    1               50-75
D        10    1               100-125
E        100   1               100-125
F        100   1               50-75
G        100   2               50-75
H        100   2               100-125
I        10    2               50-75
J        25    2               50-75
Here are the results. Each set of simulated data was analyzed by both the Grubbs and ROUT methods.




                                     Number of outliers identified
#    Design   # outliers   Method        0         1         2         >2        FDR
1    A        0            Grubbs 5%     95.104%   4.69%     0.19%     0.20%     4.90%
2    A        0            ROUT 5%       94.31%    4.68%     0.74%     0.10%     5.69%
3    A        0            Grubbs 1%     99.10%    0.90%     0.00%     0.00%     0.90%
4    A        0            ROUT 1%       98.70%    1.21%     0.00%     0.08%     1.21%
5    B        0            Grubbs 5%     94.99%    5.01%     0.00%     0.00%     5.01%
6    B        0            ROUT 5%       95.13%    3.87%     0.98%     0.02%     4.87%
7    B        0            Grubbs 1%     98.92%    1.08%     0.00%     0.00%     1.08%
8    B        0            ROUT 1%       98.65%    1.14%     0.21%     0.00%     1.35%
9    C        1            Grubbs 1%     74.33%    25.41%    0.26%     0.00%     0.13%
10   C        1            ROUT 1%       78.11%    21.29%    0.60%     0.00%     0.31%
11   D        1            Grubbs 1%     5.50%     93.51%    0.99%     0.00%     0.50%
12   D        1            ROUT 1%       15.38%    84.01%    0.60%     0.00%     0.30%
13   D        1            Grubbs 5%     0.20%     94.86%    4.75%     0.18%     2.51%
14   D        1            ROUT 5%       2.30%     94.96%    2.70%     0.04%     2.73%
15   E        1            Grubbs 1%     0.00%     98.94%    1.05%     0.01%     0.53%
16   E        1            ROUT 1%       0.00%     97.92%    1.94%     0.14%     1.07%
17   F        1            Grubbs 1%     43.94%    55.47%    0.57%     0.02%     0.40%
18   F        1            ROUT 1%       47.08%    51.16%    1.63%     0.11%     1.05%
19   G        2            Grubbs 1%     39.70%    29.84%    30.72%    0.38%     0.16%
20   G        2            ROUT 1%       29.08%    26.61%    40.37%    1.88%     0.82%
21   G        2            Grubbs 5%     10.82%    21.29%    64.23%    3.66%     1.40%
22   G        2            ROUT 5%       7.52%     15.50%    66.54%    10.43%    3.96%
23   H        2            Grubbs 1%     0.00%     0.00%     98.89%    1.11%     0.37%
24   H        2            ROUT 1%       0.00%     0.00%     97.57%    2.43%     0.84%
25   I        2            Grubbs 5%     98.80%    1.20%     0.00%     0.00%     0.00%
26   I        2            ROUT 5%       6.06%     0.97%     92.80%    0.16%     0.05%
27   I        2            ROUT 1%       27.46%    2.58%     69.95%    0.01%     0.004%
28   J        2            Grubbs 5%     49.16%    7.86%     40.85%    2.14%     0.737%
29   J        2            ROUT 5%       24.57%    13.27%    57.46%    0.71%     1.74%
30   J        2            Grubbs 1%     90.21%    3.51%     6.20%     0.72%     0.24%
31   J        2            ROUT 1%       54.47%    15.08%    29.46%    0.98%     0.36%
When the simulations added no outliers to the data sets, the ROUT and Grubbs' tests perform almost identically. The value of Q specified for the ROUT method is equivalent to the value of alpha you set for the Grubbs' test. If you set alpha to 0.05 or Q to 5%, then you'll detect a single outlier in about 5% of simulations, even though all data in these simulations came from a Gaussian distribution.
When the simulations include a single outlier not drawn from the same Gaussian distribution as the rest, Grubbs' test is slightly better at detecting it. The ROUT method has both more false negatives and more false positives: it is slightly more likely to miss the outlier, and also more likely to report two outliers when the simulation actually included only one.
This is not so surprising, as Grubbs' test was really designed to detect a single outlier (although it can be used iteratively to detect more). While the difference between the two methods is consistent, it is not substantial.
When simulations include two outliers in a small data set, the ROUT test does a much better job. The iterative Grubbs' test is subject to masking, while the ROUT test is not. Whether or not masking is an issue depends on how large the sample is and how far the outliers are from the mean of the other values. In situations where masking is a real possibility, the ROUT test works much better than Grubbs' test. For example, when n=10 with two outliers (experimental design I), the Grubbs test never found both outliers and missed both outliers in 98.8% of the simulations. In the remaining 1.2% of simulations, the Grubbs' test found one of the two outliers. In contrast, the ROUT method identified both outliers in 92.8% of those simulations, and missed both in only 6% of simulations.
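Masking is easy to see directly from the Grubbs statistic, G = max |x - mean| / SD: a second far outlier inflates the sample SD enough that neither value looks extreme. Below is a minimal sketch with made-up n = 10 data; the threshold 2.29 is the published two-sided Grubbs critical value for n = 10 at alpha = 0.05:

```python
import statistics

def grubbs_statistic(values):
    """Two-sided Grubbs statistic: G = max |x - mean| / SD."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)   # sample SD (n - 1 denominator)
    return max(abs(v - mean) for v in values) / sd

G_CRIT = 2.29  # published two-sided critical value for n = 10, alpha = 0.05

# Same Gaussian-looking bulk around 100; one vs. two low outliers near 60.
one_outlier  = [95, 102, 98, 110, 91, 105, 99, 100, 96, 60]
two_outliers = [95, 102, 98, 110, 91, 105, 99, 100, 60, 62]

print(grubbs_statistic(one_outlier))   # exceeds 2.29: the outlier is flagged
print(grubbs_statistic(two_outliers))  # below 2.29: both outliers are masked
```

With one outlier the statistic clears the threshold, but adding a second outlier roughly as far from the mean inflates the SD so much that G drops below 2.29 and the iterative Grubbs procedure stops before removing either value.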
Once an outlier (or several outliers) is detected, stop and think. Don't just delete it.
Think about the assumptions. Both the Grubbs' and ROUT methods assume that the data (except for any outliers) are sampled from a Gaussian distribution. If that assumption is violated, the "outliers" may be from the same distribution as the rest. Beware of lognormal distributions: their long tails contain values that methods assuming a Gaussian distribution will often incorrectly flag as outliers.
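To see why lognormal data is dangerous, the sketch below (our own illustration, using an approximate published Grubbs critical value of about 3.38 for n = 100 at alpha = 0.05) draws samples that contain no outliers at all, yet the largest z-score exceeds the Grubbs threshold far more often than the nominal 5%:

```python
import random
import statistics

def max_abs_z(values):
    """Largest |x - mean| / SD in the sample (the Grubbs statistic)."""
    m = statistics.mean(values)
    s = statistics.stdev(values)
    return max(abs(v - m) for v in values) / s

G_CRIT_N100 = 3.38  # approximate two-sided critical value, n = 100, alpha = 0.05

rng = random.Random(1)
trials = 500
flagged = 0
for _ in range(trials):
    # Every value comes from the same lognormal distribution: no true outliers.
    sample = [rng.lognormvariate(0, 1) for _ in range(100)]
    if max_abs_z(sample) > G_CRIT_N100:
        flagged += 1

print(flagged / trials)  # well above the nominal 0.05 false-positive rate
```

None of these "outliers" are mistakes; they are ordinary values from the long right tail of the lognormal distribution. Transforming such data to logarithms before testing is usually the better approach.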
Even if the value truly is an outlier from the rest, it may be an important value. It may not be a mistake. It may tell you about biological variability.
Grubbs' is slightly better than the ROUT method for the task it was designed for: Detecting a single outlier from a Gaussian distribution.
The Grubbs' test is much worse than the ROUT method at detecting two outliers. I can't imagine any scientific situation where you know for sure that there are either no outliers, or only one outlier, with no possibility of two or more outliers. Whenever the presence of two (or more) outliers is possible, we recommend that the ROUT method be used instead of the Grubbs' test.
More details, with links to the Prism file used to do these simulations