Grubbs Test for Detecting Outliers

Statisticians have devised several ways to detect outliers. Grubbs' test is particularly easy to follow. This method is also called the ESD method (extreme studentized deviate).

The first step is to quantify how far the outlier is from the others. Calculate the ratio Z as the difference between the outlier and the mean divided by the SD. If Z is large, the value is far from the others. Note that you calculate the mean and SD from all values, including the outlier.
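As a minimal sketch of this step, the following computes Z for the value farthest from the mean. The function name `extreme_z` and the sample measurements are made up for illustration, and the SD is assumed to be the sample SD (N-1 denominator).

```python
import statistics

def extreme_z(values):
    """Return (Z, suspect) for the value farthest from the mean.

    The mean and SD are computed from all values, including the suspect.
    Assumption: SD is the sample SD (N-1 denominator).
    """
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    suspect = max(values, key=lambda v: abs(v - mean))
    return abs(suspect - mean) / sd, suspect

data = [9.8, 10.1, 10.0, 9.9, 10.2, 14.5]  # hypothetical measurements
z, suspect = extreme_z(data)
print(f"suspect = {suspect}, Z = {z:.3f}")
```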

Since 5% of the values in a Gaussian population are more than 1.96 standard deviations from the mean, your first thought might be to conclude that the outlier comes from a different population if Z is greater than 1.96. This approach only works if you know the population mean and SD from other data. Although this is rarely the case in experimental science, it is often the case in quality control. You know the overall mean and SD from historical data, and want to know whether the latest value matches the others. This is the basis for quality control charts.
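The quality control case can be sketched in a few lines. The function name, the historical mean and SD, and the latest reading below are all hypothetical; the only thing taken from the text is the 1.96 cutoff.

```python
def exceeds_known_limits(value, population_mean, population_sd, cutoff=1.96):
    """Compare one new value against a mean and SD known from other data.

    Only valid when the mean and SD come from historical data,
    not from the sample that contains the value being tested.
    """
    z = abs(value - population_mean) / population_sd
    return z > cutoff, z

# Hypothetical process with historical mean 100.0 and SD 1.5.
flagged, z_known = exceeds_known_limits(103.1, population_mean=100.0, population_sd=1.5)
print(f"Z = {z_known:.2f}, flagged = {flagged}")
```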

When analyzing experimental data, you don't know the SD of the population. Instead, you calculate the SD from the data. The presence of an outlier increases the calculated SD. Since the outlier increases both the numerator (difference between the value and the mean) and the denominator (SD of all values), Z does not get very large. In fact, no matter how the data are distributed, Z cannot get larger than (N-1)/sqrt(N), where N is the number of values. For example, if N=3, Z cannot be larger than 1.155 for any set of values.
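A short check of this upper bound, offered as a sketch: for N=3 the formula (N-1)/sqrt(N) evaluates to about 1.155, and even a wildly extreme value reaches that limit without passing it. The function names and the test values are illustrative, and the SD is assumed to be the sample SD (N-1 denominator).

```python
import math
import statistics

def max_possible_z(n):
    """Upper bound on Z for a sample of n values: (n-1)/sqrt(n)."""
    return (n - 1) / math.sqrt(n)

def largest_z(values):
    """Largest Z in the sample, using the mean and sample SD of all values."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return max(abs(v - mean) for v in values) / sd

bound = max_possible_z(3)
observed = largest_z([0.0, 0.0, 1_000_000.0])  # one absurdly extreme value
print(f"bound = {bound:.4f}, observed = {observed:.4f}")
```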

Grubbs and others have calculated critical values for Z, which are tabulated below. The critical value increases with sample size, as expected.

If your calculated value of Z is greater than the critical value in the table, then the P value is less than 0.05. This means that there is less than a 5% chance that you'd encounter an outlier so far from the others (in either direction) by chance alone, if all the data were really sampled from a single Gaussian distribution. Note that the method only works for testing the most extreme value in the sample. If in doubt, calculate Z for all values, but only calculate a P value for Grubbs' test from the largest value of Z.
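The comparison against the table might be sketched as follows. The dictionary holds only the N=3 to N=10 entries of the critical value table on this page; the function name `grubbs_check` and the data are hypothetical, and the SD is assumed to be the sample SD (N-1 denominator).

```python
import statistics

# Subset of the critical values (P = 0.05) tabulated on this page: N -> critical Z.
CRITICAL_Z = {3: 1.15, 4: 1.48, 5: 1.71, 6: 1.89, 7: 2.02, 8: 2.13, 9: 2.21, 10: 2.29}

def grubbs_check(values, critical_z=CRITICAL_Z):
    """Test only the most extreme value; return (suspect, Z, is_outlier)."""
    n = len(values)
    if n not in critical_z:
        raise ValueError(f"no critical value stored for N={n}")
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    if sd == 0:
        raise ValueError("all values are identical, so Z is undefined")
    z_scores = [abs(v - mean) / sd for v in values]
    z_max = max(z_scores)  # only the largest Z is tested
    suspect = values[z_scores.index(z_max)]
    return suspect, z_max, z_max > critical_z[n]

suspect, z_max, is_outlier = grubbs_check([9.8, 10.1, 10.0, 9.9, 10.2, 14.5])
print(f"suspect = {suspect}, Z = {z_max:.3f}, P < 0.05: {is_outlier}")
```

With these made-up values, Z for 14.5 is compared with the N=6 critical value of 1.89.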

Once you've identified an outlier, you may choose to exclude that value from your analyses. Or you may choose to keep the outlier, but use robust analysis techniques that do not assume that data are sampled from Gaussian populations.

If you decide to remove the outlier, you then may be tempted to run Grubbs' test again to see if there is a second outlier in your data. If you do this, you cannot use the same table. Rosner has extended the method to detecting several outliers in one sample. See the first reference below for details.

References:

How to Detect and Handle Outliers, by B. Iglewicz and D. C. Hoaglin

Outliers in Statistical Data (3rd edition), by V. Barnett and T. Lewis

Critical values for Z

Calculate Z as shown above. Look up the critical value of Z in the table below, where N is the number of values in the group. If your value of Z is higher than the tabulated value, the P value is less than 0.05.

  N   Critical Z        N   Critical Z
  3   1.15             27   2.86
  4   1.48             28   2.88
  5   1.71             29   2.89
  6   1.89             30   2.91
  7   2.02             31   2.92
  8   2.13             32   2.94
  9   2.21             33   2.95
 10   2.29             34   2.97
 11   2.34             35   2.98
 12   2.41             36   2.99
 13   2.46             37   3.00
 14   2.51             38   3.01
 15   2.55             39   3.03
 16   2.59             40   3.04
 17   2.62             50   3.13
 18   2.65             60   3.20
 19   2.68             70   3.26
 20   2.71             80   3.31
 21   2.73             90   3.35
 22   2.76            100   3.38
 23   2.78            110   3.42
 24   2.80            120   3.44
 25   2.82            130   3.47
 26   2.84            140   3.49

Computing an approximate P value

You can also calculate an approximate P value as follows.

  1. Calculate T = sqrt( N(N-2)Z^2 / ((N-1)^2 - N*Z^2) ).
    N is the number of values in the sample, and Z is calculated for the suspected outlier as shown above.
  2. Look up the two-tailed P value for Student's t distribution with the calculated value of T and N-2 degrees of freedom. Using Excel, the formula is =TDIST(T,DF,2) (the '2' is for a two-tailed P value).
  3. Multiply the P value you obtain in step 2 by N. The result is an approximate P value for the outlier test. This P value is the chance of observing one point so far from the others if the data were all sampled from a Gaussian distribution. If Z is large, this P value will be very accurate. With smaller values of Z, the calculated P value may be too large.
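The three steps above can be sketched in Python using only the standard library. This assumes T = sqrt( N(N-2)Z^2 / ((N-1)^2 - N*Z^2) ), the standard relation between Grubbs' statistic and Student's t. In place of Excel's TDIST, the two-tailed P value is obtained here by numerical integration (Simpson's rule), which is a swapped-in technique and not part of the original method. The function names, the step count, and the cap at 1.0 are illustrative choices.

```python
import math

def two_tailed_t_p(t, df, steps=2000):
    """Two-tailed P value for Student's t with integer df, by Simpson's rule.

    Substituting t = sqrt(df) * tan(theta) turns the tail area into a
    finite integral of cos(theta)**(df - 1) from theta0 to pi/2.
    """
    theta0 = math.atan(abs(t) / math.sqrt(df))
    h = (math.pi / 2 - theta0) / steps  # steps must be even for Simpson's rule
    total = 0.0
    for i in range(steps + 1):
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * math.cos(theta0 + i * h) ** (df - 1)
    integral = total * h / 3
    # Normalizing constant: Gamma((df+1)/2) / (sqrt(pi) * Gamma(df/2))
    log_ratio = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    return 2 * math.exp(log_ratio) / math.sqrt(math.pi) * integral

def grubbs_approx_p(z, n):
    """Approximate P value for Grubbs' test (steps 1 to 3), for n >= 3."""
    denom = (n - 1) ** 2 - n * z ** 2
    if denom <= 0:
        return 0.0  # Z is at its upper bound (N-1)/sqrt(N), so T is infinite
    t = math.sqrt(n * (n - 2) * z ** 2 / denom)   # step 1
    p = two_tailed_t_p(t, n - 2)                  # step 2
    return min(1.0, n * p)                        # step 3 (cap at 1.0 is an addition)

print(f"{grubbs_approx_p(2.29, 10):.3f}")
```

As a consistency check, Z = 2.29 is the tabulated critical value for N = 10, so the approximate P value should come out close to 0.05.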



Copyright © 2000 by GraphPad Software, Inc. All rights reserved.