# Interpreting results: Normality tests

## What question does the normality test answer?

The normality tests all report a P value. To understand any P value, you need to know the null hypothesis. In this case, the null hypothesis is that all the values were sampled from a population that follows a Gaussian distribution.

The P value answers the question:

If that null hypothesis were true, what is the chance that a random sample of data would deviate from the Gaussian ideal as much as these data do?

Prism also uses the traditional 0.05 cutoff to decide whether the data passed the normality test. If the P value is greater than 0.05, the answer is Yes. If the P value is less than or equal to 0.05, the answer is No.
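To make the definition of the P value concrete, here is a minimal simulation sketch. It is not how Prism computes its normality tests. It assumes the Jarque-Bera statistic (based on skewness and kurtosis) as the measure of deviation from the Gaussian ideal, and it estimates the P value by drawing many Gaussian samples of the same size and counting how often they deviate at least as much as the data. The function names, the number of simulations, and the example data are all illustrative assumptions.

```python
import random


def jarque_bera(values):
    """Jarque-Bera statistic: grows as skewness and excess kurtosis move away from 0."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    return n / 6.0 * (skewness ** 2 + excess_kurtosis ** 2 / 4.0)


def simulated_p_value(values, n_sim=1000, seed=0):
    """Estimate the chance that a Gaussian sample deviates at least as much as `values`."""
    rng = random.Random(seed)
    observed = jarque_bera(values)
    n = len(values)
    as_extreme = 0
    for _ in range(n_sim):
        # The statistic ignores location and scale, so a standard Gaussian is enough.
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if jarque_bera(sample) >= observed:
            as_extreme += 1
    # Add 1 to both counts so the estimate is never exactly zero.
    return (as_extreme + 1) / (n_sim + 1)


def passed_normality_test(p_value, alpha=0.05):
    """The pass/fail rule described above: Yes only if P is greater than alpha."""
    return p_value > alpha


# Illustrative data: one Gaussian sample and one strongly skewed (exponential) sample.
data_rng = random.Random(123)
gaussian_like = [data_rng.gauss(50.0, 10.0) for _ in range(200)]
skewed = [data_rng.expovariate(1.0) for _ in range(200)]

p_gaussian = simulated_p_value(gaussian_like)
p_skewed = simulated_p_value(skewed)
print("Gaussian-like sample: P =", p_gaussian, "passed:", passed_normality_test(p_gaussian))
print("Skewed sample:        P =", p_skewed, "passed:", passed_normality_test(p_skewed))
```

Note that a P value of exactly 0.05 counts as not passing, matching the rule stated above.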

## What should I conclude if the P value from the normality test is high?

All you can say is that the data are not inconsistent with a Gaussian distribution. A normality test cannot prove the data were sampled from a Gaussian distribution. All the normality test can do is demonstrate that the deviation from the Gaussian ideal is not more than you’d expect to see with chance alone. With large data sets, this is reassuring. With smaller data sets, the normality tests don’t have much power to detect modest deviations from the Gaussian ideal.
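The point about power can be checked with a small simulation. The sketch below is only an illustration under stated assumptions: it uses the Jarque-Bera statistic rather than any of Prism's tests, it takes a mildly skewed lognormal distribution (sigma = 0.3) as the "modest deviation", and the sample sizes of 10 and 200 are arbitrary. It estimates how often the test flags the non-Gaussian data at the 0.05 level for each sample size.

```python
import random


def jarque_bera(values):
    """Jarque-Bera statistic: grows as skewness and excess kurtosis move away from 0."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    return n / 6.0 * (skewness ** 2 + excess_kurtosis ** 2 / 4.0)


def critical_value(n, rng, n_sim=1000, alpha=0.05):
    """Approximate the (1 - alpha) quantile of the statistic for Gaussian samples of size n."""
    null_stats = sorted(
        jarque_bera([rng.gauss(0.0, 1.0) for _ in range(n)]) for _ in range(n_sim)
    )
    index = min(n_sim - 1, round((1 - alpha) * n_sim))
    return null_stats[index]


def mildly_skewed(rng):
    """Hypothetical 'modest deviation': a lognormal distribution with small sigma."""
    return rng.lognormvariate(0.0, 0.3)


def power(n, draw, rng, n_sim=1000):
    """Fraction of simulated non-Gaussian samples of size n that the test flags."""
    cutoff = critical_value(n, rng)
    rejections = sum(
        jarque_bera([draw(rng) for _ in range(n)]) > cutoff for _ in range(n_sim)
    )
    return rejections / n_sim


rng = random.Random(1)
power_small = power(10, mildly_skewed, rng)
power_large = power(200, mildly_skewed, rng)
print("Estimated power with n = 10: ", power_small)
print("Estimated power with n = 200:", power_large)
```

With the small sample, most of the non-Gaussian data sets pass the test, which is why a high P value from a small data set is weak evidence of normality.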

## What should I conclude if the P value from the normality test is low?

The null hypothesis is that the data are sampled from a Gaussian distribution. If the P value is small enough, you reject that null hypothesis and so accept the alternative hypothesis that the data are not sampled from a Gaussian population. The distribution could be close to Gaussian (with large data sets) or very far from it. The normality test tells you nothing about the alternative distributions.

If your P value is small enough to declare the deviations from the Gaussian ideal to be "statistically significant", you then have four choices:

- The data may come from another identifiable distribution. If so, you may be able to transform your values to create a Gaussian distribution. For example, if the data come from a lognormal distribution, transform all values to their logarithms.

- The presence of one or a few outliers might be causing the normality test to fail. Run an outlier test. Consider excluding the outlier(s).

- If the departure from normality is small, you may choose to do nothing. Statistical tests tend to be quite robust to mild violations of the Gaussian assumption.

- Switch to nonparametric tests that don’t assume a Gaussian distribution. But the decision to use (or not use) nonparametric tests is a big decision. It should not be based on a single normality test and should not be automated.
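As an illustration of the first choice above (transforming the values), the following sketch generates hypothetical lognormal data and compares the sample skewness before and after taking logarithms. The data, the lognormal parameters, and the use of skewness as a quick check of asymmetry are assumptions made for this example. Natural logarithms are used here, but any base gives the same shape.

```python
import math
import random


def sample_skewness(values):
    """Sample skewness: near 0 for symmetric data, positive for a long right tail."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical measurements drawn from a lognormal distribution.
rng = random.Random(42)
raw = [rng.lognormvariate(2.0, 0.8) for _ in range(500)]

# Transform every value to its logarithm (all values must be positive).
logged = [math.log(v) for v in raw]

skew_raw = sample_skewness(raw)
skew_logged = sample_skewness(logged)
print("Skewness of raw values:   ", skew_raw)
print("Skewness of log(values):  ", skew_logged)
```

If the data really are lognormal, the logarithms should look far more symmetric than the raw values, and you can run the normality test again on the transformed values.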