Interpreting a normality test

The result of a normality test is expressed as a P value that answers this question:

If your model is correct and all scatter around the model follows a Gaussian distribution, what is the probability of obtaining data whose residuals deviate from a Gaussian distribution as much as (or more than) those of your data do?

If the P value is large, then the residuals pass the normality test. If the P value is small, the residuals fail the normality test and you have evidence that your data don't follow one of the assumptions of the regression. Things to consider:

• Fit a different model.

• Weight the data differently.

A large P value means that your data are consistent with the assumptions of regression (but certainly does not prove that the model is correct). With small numbers of data points, normality tests have little power to detect modest deviations from a Gaussian distribution.
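The effect of sample size on power can be illustrated with a small simulation. The sketch below is only an illustration, not part of Prism: it assumes SciPy's `normaltest` as the normality test, a Student t distribution with 5 degrees of freedom as a stand-in for a "modest deviation", and arbitrary sample sizes and simulation counts.

```python
import numpy as np
from scipy import stats

def rejection_rate(n, n_sims=2000, alpha=0.05, seed=0):
    """Fraction of simulated samples of size n that fail the normality test."""
    rng = np.random.default_rng(seed)
    # Student t with 5 degrees of freedom: symmetric, with modestly heavy tails.
    samples = rng.standard_t(df=5, size=(n_sims, n))
    # One test per row; each row is one simulated data set.
    p_values = stats.normaltest(samples, axis=1).pvalue
    return float(np.mean(p_values < alpha))

power_small = rejection_rate(n=20)
power_large = rejection_rate(n=1000)
print(f"n=20:   fraction failing the test = {power_small:.2f}")
print(f"n=1000: fraction failing the test = {power_large:.2f}")
```

In this setup the same non-Gaussian population is flagged far less often with 20 points per sample than with 1000, which is why a large P value from a small data set is weak reassurance.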

We recommend relying on the D'Agostino-Pearson normality test. It first computes the skewness and kurtosis to quantify how far from Gaussian the distribution is in terms of asymmetry and shape. It then calculates how far each of these values differs from the value expected with a Gaussian distribution, and computes a single P value from the sum of the squares of these discrepancies. It is versatile and more powerful than some other normality tests. Note that D'Agostino developed several normality tests. The one used by Prism is the "omnibus K2" test.
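The steps of the omnibus K2 test can be sketched with SciPy, whose `normaltest` function is documented as implementing the D'Agostino and Pearson test. This is a sketch for illustration, not Prism's own code, and the residuals are made-up random numbers.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
residuals = rng.normal(loc=0.0, scale=1.0, size=50)  # made-up residuals

# Step 1: z-score for how far the skewness is from the Gaussian expectation.
z_skew, _ = stats.skewtest(residuals)
# Step 2: z-score for how far the kurtosis is from the Gaussian expectation.
z_kurt, _ = stats.kurtosistest(residuals)
# Step 3: combine the two discrepancies into the single K2 statistic.
k2 = z_skew**2 + z_kurt**2
# If the residuals are Gaussian, K2 approximately follows a chi-square
# distribution with 2 degrees of freedom.
p_manual = stats.chi2.sf(k2, df=2)

# The same computation in a single call.
k2_scipy, p_scipy = stats.normaltest(residuals)
print(f"K2 = {k2:.4f}, P = {p_manual:.4f}")
```

The manual and single-call results should agree, which shows that the P value depends on the data only through the skewness and kurtosis.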

An alternative is the Shapiro-Wilk normality test. We prefer the D'Agostino-Pearson test for two reasons. One reason is that, while the Shapiro-Wilk test works very well if every residual is unique, it does not work well when several residuals are identical. The other reason is that the basis of the Shapiro-Wilk test is hard for nonmathematicians to understand.

Earlier versions of Prism offered only the Kolmogorov-Smirnov test. We still offer this test (for consistency) but no longer recommend it. This test compares the cumulative distribution of the data with the expected cumulative Gaussian distribution, and bases its P value simply on the largest discrepancy. This is not a very sensitive way to assess normality, and we now agree with this statement (1): "The Kolmogorov-Smirnov test is only a historical curiosity. It should never be used."
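To make "the largest discrepancy" concrete, the following sketch computes the Kolmogorov-Smirnov distance by hand and checks it against SciPy's `kstest`. The data are made up, and using the sample mean and SD for the expected Gaussian curve is an assumption of this example. Only the distance is shown; no P value is computed.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
data = rng.normal(loc=10.0, scale=2.0, size=30)  # made-up sample

x = np.sort(data)
n = x.size
mean, sd = x.mean(), x.std(ddof=1)

# Expected cumulative Gaussian distribution at each sorted data point.
expected = stats.norm.cdf(x, loc=mean, scale=sd)

# The cumulative distribution of the data is a staircase that jumps by 1/n
# at each point, so check the gap just after and just before every jump.
gap_above = np.max(np.arange(1, n + 1) / n - expected)
gap_below = np.max(expected - np.arange(0, n) / n)
d = max(gap_above, gap_below)

d_scipy = stats.kstest(data, "norm", args=(mean, sd)).statistic
print(f"largest discrepancy D = {d:.4f}")
```

Because only the single largest gap enters the statistic, deviations elsewhere along the curve do not contribute at all.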

The Kolmogorov-Smirnov method as originally published assumes that you know the mean and SD of the overall population (perhaps from prior work). When analyzing data, you rarely know the overall population mean and SD. You only know the mean and SD of your sample. To compute the P value, therefore, Prism uses the Dallal and Wilkinson approximation to Lilliefors’ method (Am. Statistician, 40:294-296, 1986). Since that method is only accurate with small P values, Prism simply reports “P>0.10” for large P values.
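The reason a correction is needed can be seen by simulation. The sketch below does not implement the Dallal and Wilkinson approximation. Instead it assumes a brute-force Monte Carlo approach to the same problem: it builds the distribution of the Kolmogorov-Smirnov distance when the mean and SD are estimated from each sample, then compares the resulting P value with the one from the original theory. The helper `ks_distance`, the sample size, and the number of simulations are all choices made for this example.

```python
import numpy as np
from scipy import stats

def ks_distance(samples):
    """KS distance of each row from a Gaussian with that row's own mean and SD."""
    x = np.sort(samples, axis=1)
    n = x.shape[1]
    mean = x.mean(axis=1, keepdims=True)
    sd = x.std(axis=1, ddof=1, keepdims=True)
    expected = stats.norm.cdf((x - mean) / sd)
    gap_above = np.arange(1, n + 1) / n - expected
    gap_below = expected - np.arange(0, n) / n
    return np.maximum(gap_above.max(axis=1), gap_below.max(axis=1))

rng = np.random.default_rng(3)
n = 20

# Distribution of D for truly Gaussian samples when the mean and SD are
# estimated from each sample rather than known in advance.
null_d = ks_distance(rng.normal(size=(5000, n)))

observed = rng.normal(loc=5.0, scale=1.5, size=n)  # made-up sample
d_obs = ks_distance(observed[np.newaxis, :])[0]

# P value from the simulated distribution (sample mean and SD).
p_simulated = float(np.mean(null_d >= d_obs))
# P value from the original theory, which assumes known population parameters.
p_naive = stats.kstest(
    observed, "norm", args=(observed.mean(), observed.std(ddof=1))
).pvalue
print(f"D = {d_obs:.3f}, simulated P = {p_simulated:.3f}, naive P = {p_naive:.3f}")
```

Fitting the Gaussian curve to the sample pulls it toward the data, so the distances tend to be smaller than the original theory expects, and the uncorrected P value is typically too large.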

Prism 8 adds the Anderson-Darling test. While the Kolmogorov-Smirnov test only looks at the largest discrepancy between the actual distribution and the Gaussian distribution, the Anderson-Darling test sums all the discrepancies. Prism uses the form of the Anderson-Darling test that corrects for the fact that the data are compared with a Gaussian distribution defined by the sample mean and sample SD, because the population mean and SD are not known.
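A minimal sketch of the Anderson-Darling statistic follows, using the standard computational formula with the sample mean and SD. The function name `anderson_darling_normal` is hypothetical, the data are made up, and the small-sample adjustment factor shown is one commonly cited form (attributed to Stephens); whether Prism uses exactly this factor is an assumption, and no P value is computed here.

```python
import numpy as np
from scipy import stats

def anderson_darling_normal(data):
    """Return (A2, adjusted A2) for a Gaussian with estimated mean and SD."""
    x = np.sort(np.asarray(data, dtype=float))
    n = x.size
    z = (x - x.mean()) / x.std(ddof=1)  # standardize with sample mean and SD
    i = np.arange(1, n + 1)
    # Every ordered point contributes to the sum, not just the worst one.
    # logcdf(z) is ln F(z_i); logsf of the reversed array is ln(1 - F(z_(n+1-i))).
    terms = (2 * i - 1) * (stats.norm.logcdf(z) + stats.norm.logsf(z[::-1]))
    a2 = -n - terms.sum() / n
    # Assumed adjustment for estimating both mean and SD from the sample.
    a2_adjusted = a2 * (1 + 0.75 / n + 2.25 / n**2)
    return float(a2), float(a2_adjusted)

rng = np.random.default_rng(4)
gaussian_sample = rng.normal(size=200)      # made-up Gaussian data
skewed_sample = rng.exponential(size=200)   # made-up skewed data

a2_gauss, adj_gauss = anderson_darling_normal(gaussian_sample)
a2_skew, adj_skew = anderson_darling_normal(skewed_sample)
print(f"Gaussian sample: A2 = {a2_gauss:.3f}")
print(f"Skewed sample:   A2 = {a2_skew:.3f}")
```

A larger statistic indicates a larger overall departure from the Gaussian curve, so the skewed sample should score much higher than the Gaussian one.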

1. RB D'Agostino, "Tests for Normal Distribution" in Goodness-of-Fit Techniques, edited by RB D'Agostino and MA Stephens, Marcel Dekker, 1986.