GraphPad Statistics Guide

Advice: Don't automate the decision to use a nonparametric test

Don't use this approach:

First perform a normality test. If the P value is low, demonstrating that the data do not follow a Gaussian distribution, choose a nonparametric test. Otherwise choose a conventional test.

Prism does not use this approach, because the choice of parametric vs. nonparametric is more complicated than that.

Often, the analysis will be one of a series of experiments. Since you want to analyze all the experiments the same way, you cannot rely on the results from a single normality test.

Many biological variables follow lognormal distributions. If your data are sampled from a lognormal distribution, the best way to analyze the data is to first transform to logarithms and then analyze the logs. It would be a mistake to jump right to nonparametric tests, without considering transforming.
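
As a rough illustration of the log transform, the following Python sketch draws simulated lognormal values and compares the skewness before and after taking logarithms. The data, seed, and the `skewness` helper are assumptions made for this example only. Skewness is used here as a simple stand-in for asymmetry, not as one of Prism's normality tests.

```python
import math
import random
import statistics

def skewness(values):
    """Sample skewness g1 = m3 / m2**1.5, using moments about the mean."""
    mean = statistics.fmean(values)
    m2 = statistics.fmean([(v - mean) ** 2 for v in values])
    m3 = statistics.fmean([(v - mean) ** 3 for v in values])
    return m3 / m2 ** 1.5

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical data: 2000 draws from a lognormal distribution (mu=0, sigma=1).
raw = [rng.lognormvariate(0.0, 1.0) for _ in range(2000)]
logs = [math.log(v) for v in raw]

skew_raw = skewness(raw)   # strongly right-skewed
skew_log = skewness(logs)  # close to zero, as expected for Gaussian data

print(f"skewness of raw values: {skew_raw:.2f}")
print(f"skewness of log values: {skew_log:.2f}")
```

The raw values are strongly skewed, while their logarithms are nearly symmetric, so a conventional test applied to the logs is a reasonable choice for data like these.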

Other transforms, such as the reciprocal, can also be useful, depending on the distribution of the data.

Data can fail a normality test because of the presence of an outlier. Removing that outlier can restore normality.
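
To see how much a single value can matter, this small Python sketch compares the skewness of a made-up sample with and without one extreme value. The numbers and the `skewness` helper are hypothetical and chosen only for illustration. Whether removing an outlier is justified is a separate scientific question that the code does not address.

```python
import statistics

def skewness(values):
    """Sample skewness g1 = m3 / m2**1.5, using moments about the mean."""
    mean = statistics.fmean(values)
    m2 = statistics.fmean([(v - mean) ** 2 for v in values])
    m3 = statistics.fmean([(v - mean) ** 3 for v in values])
    return m3 / m2 ** 1.5

# Hypothetical measurements: ten values scattered symmetrically around 5.0 ...
clean = [4.8, 5.1, 4.9, 5.3, 5.0, 4.7, 5.2, 5.0, 4.9, 5.1]
# ... plus one extreme value.
with_outlier = clean + [9.0]

skew_with = skewness(with_outlier)
skew_without = skewness(clean)

print(f"skewness with the outlier:    {skew_with:.2f}")
print(f"skewness without the outlier: {skew_without:.2f}")
```

One extreme value is enough to make an otherwise symmetric sample look strongly skewed.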

The decision of whether to use a parametric or nonparametric test matters most with small data sets, because nonparametric tests have little power when samples are small. But with small data sets, normality tests also have little power to detect non-Gaussian distributions, so an automatic approach would give you false confidence.
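
The low power of normality tests with small samples can be checked by simulation. The Python sketch below uses a simple Monte Carlo test based on the absolute sample skewness, which is an assumption of this example and not the test Prism runs. It estimates how often clearly non-Gaussian (exponential) data are flagged at n = 10 and at n = 100. All function names, seeds, and sample sizes are illustrative.

```python
import random
import statistics

def skewness(values):
    """Sample skewness g1 = m3 / m2**1.5, using moments about the mean."""
    mean = statistics.fmean(values)
    m2 = statistics.fmean([(v - mean) ** 2 for v in values])
    m3 = statistics.fmean([(v - mean) ** 3 for v in values])
    return m3 / m2 ** 1.5

def critical_abs_skew(n, rng, n_sim=2000, alpha=0.05):
    """Monte Carlo critical value of |skewness| for Gaussian samples of size n."""
    sims = sorted(
        abs(skewness([rng.gauss(0.0, 1.0) for _ in range(n)]))
        for _ in range(n_sim)
    )
    return sims[int((1 - alpha) * n_sim)]

def power_vs_exponential(n, rng, n_sim=2000):
    """Fraction of exponential samples of size n that the skewness test rejects."""
    crit = critical_abs_skew(n, rng)
    rejected = sum(
        abs(skewness([rng.expovariate(1.0) for _ in range(n)])) > crit
        for _ in range(n_sim)
    )
    return rejected / n_sim

rng = random.Random(1)  # fixed seed so the sketch is repeatable
power_small = power_vs_exponential(10, rng)
power_large = power_vs_exponential(100, rng)

print(f"estimated power at n=10:  {power_small:.2f}")
print(f"estimated power at n=100: {power_large:.2f}")
```

With only ten values, the test misses the exponential shape in a large share of simulated experiments, so "passing" a normality test says little about whether a small sample is Gaussian.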

With large data sets, normality tests can be too sensitive. A low P value from a normality test tells you that there is strong evidence that the data are not sampled from an ideal Gaussian distribution. But you already know that, as almost no scientifically relevant variables form an ideal Gaussian distribution. What you want to know is whether the distribution deviates enough from the Gaussian ideal to invalidate conventional statistical tests (that assume a Gaussian distribution). A normality test does not answer this question. With large data sets, trivial deviations from the ideal can lead to a small P value.
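
The sensitivity of normality tests with large samples can also be sketched in Python. This example draws from a gamma distribution with shape 400, which is very nearly Gaussian (true skewness 2/sqrt(400) = 0.1), and applies a z-test on skewness with the large-sample approximation SE = sqrt(6/n). The test, the distribution, and the sample sizes are assumptions chosen for illustration and do not reproduce Prism's normality tests.

```python
import math
import random
import statistics

def skewness(values):
    """Sample skewness g1 = m3 / m2**1.5, using moments about the mean."""
    mean = statistics.fmean(values)
    m2 = statistics.fmean([(v - mean) ** 2 for v in values])
    m3 = statistics.fmean([(v - mean) ** 3 for v in values])
    return m3 / m2 ** 1.5

def skewness_p_value(values):
    """Two-sided P value from a z-test on skewness.

    Uses the large-sample approximation SE = sqrt(6 / n) under a Gaussian
    null, so it is only a rough guide for small n.
    """
    n = len(values)
    z = skewness(values) / math.sqrt(6.0 / n)
    return 2.0 * (1.0 - statistics.NormalDist().cdf(abs(z)))

rng = random.Random(2)  # fixed seed so the sketch is repeatable

# Nearly Gaussian data: gamma with shape 400 has true skewness 0.1.
big = [rng.gammavariate(400.0, 1.0) for _ in range(200_000)]
small = big[:50]

skew_big = skewness(big)
p_big = skewness_p_value(big)
p_small = skewness_p_value(small)

print(f"n=200000: skewness {skew_big:.3f}, P = {p_big:.2g}")
print(f"n=50:     P = {p_small:.2g}")
```

The large sample yields a tiny P value even though the deviation from the Gaussian ideal is too small to matter for most conventional tests, which is why the P value alone cannot drive the choice of test.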

The decision of when to use a parametric test and when to use a nonparametric test is a difficult one, requiring thinking and perspective. This decision should not be automated.