
# Nonparametric tests with small and large samples

## Small samples

Your choice between a parametric and a nonparametric test matters most when samples are small (say, fewer than a dozen values).

If you choose a parametric test and your data do not come from a Gaussian distribution, the results can be misleading. Parametric tests are not very robust to deviations from a Gaussian distribution when samples are tiny.

If you choose a nonparametric test but actually do have Gaussian data, you are likely to get a P value that is too large, because nonparametric tests have less power than parametric tests, and the difference is noticeable with tiny samples.
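To make this loss of power concrete, here is a minimal sketch that enumerates the exact null distribution of the Mann-Whitney rank-sum statistic (one common nonparametric test, picked here only for illustration) and reports the smallest two-tailed P value the test can ever produce. It assumes no tied values; the function name `min_two_tailed_p` is hypothetical.

```python
from itertools import combinations


def min_two_tailed_p(n1, n2):
    """Smallest two-tailed P value of the exact Mann-Whitney test (no ties).

    Under the null hypothesis, every way of assigning ranks 1..n to the
    first group is equally likely, so we enumerate all of them.
    """
    n = n1 + n2
    rank_sums = [sum(c) for c in combinations(range(1, n + 1), n1)]
    lowest = min(rank_sums)
    # Probability of the most extreme arrangement in one direction.
    one_tailed = sum(1 for s in rank_sums if s <= lowest) / len(rank_sums)
    # Double it for a two-tailed test, capped at 1.
    return min(1.0, 2 * one_tailed)


for n in (3, 4, 5):
    print(n, round(min_two_tailed_p(n, n), 4))
```

With three values per group there are only 20 equally likely rank arrangements, so the smallest possible two-tailed P value is 2/20 = 0.1. The test can never reach P < 0.05, however far apart the two groups are, whereas a t test on the same data could.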

Unfortunately, normality tests have little power to detect whether or not a sample comes from a Gaussian population when the sample is tiny. Small samples simply don't contain enough information to let you make reliable inferences about the shape of the distribution in the entire population.

## Large samples

The choice between a parametric and a nonparametric test matters less with huge samples (say, more than 100 or so).

If you choose a parametric test and your data are not really Gaussian, you haven't lost much, because parametric tests are robust to violations of the Gaussian assumption, especially if the sample sizes are equal (or nearly so).

If you choose a nonparametric test but actually do have Gaussian data, you haven't lost much, because nonparametric tests have nearly as much power as parametric tests when the sample size is large.
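One way to quantify "nearly as much power" is the asymptotic relative efficiency (ARE) of the Mann-Whitney test compared with the t test, which is 3/π (about 0.955) when the data really are Gaussian. The sketch below uses that large-sample figure to estimate how many values the rank test needs to match a t test; the helper `nonparametric_n_needed` is hypothetical, and the approximation is only meaningful for large samples.

```python
import math

# Asymptotic relative efficiency of the Mann-Whitney (Wilcoxon rank-sum)
# test relative to the t test when the population is Gaussian.
ARE_GAUSSIAN = 3 / math.pi


def nonparametric_n_needed(parametric_n):
    """Approximate sample size the rank test needs to match the power of a
    t test using parametric_n values (large-sample approximation only)."""
    return math.ceil(parametric_n / ARE_GAUSSIAN)


print(round(ARE_GAUSSIAN, 3))        # 0.955
print(nonparametric_n_needed(100))   # 105
```

In other words, with Gaussian data a nonparametric test needs only about 5% more values to achieve the same power, a small price when samples are already large.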

Normality tests work well with large samples, which contain enough data to let you make reliable inferences about the shape of the distribution of the population from which the data were drawn. However, normality tests don't answer the question you care about. What you want to know is whether the distribution differs from Gaussian by enough to cast doubt on the usefulness of parametric tests. A normality test instead asks whether there is any evidence that the distribution differs from Gaussian. With huge samples, it will detect tiny deviations from Gaussian, deviations small enough that they shouldn't sway the decision between parametric and nonparametric testing.
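A quick way to see this sample-size effect is the Jarque-Bera normality test, used here only as one convenient example because its statistic has a closed form: JB = n/6 × (S² + K²/4), where S is the sample skewness and K the sample excess kurtosis. The sketch below holds a mild departure from Gaussian fixed (S = 0.1, K = 0.2, values chosen for illustration) and varies only n. It assumes the large-sample chi-square approximation with 2 degrees of freedom, whose upper-tail probability is exp(-x/2); the function name `jarque_bera_p` is hypothetical.

```python
import math


def jarque_bera_p(n, skewness, excess_kurtosis):
    """Approximate P value of the Jarque-Bera normality test.

    Uses the chi-square (2 df) large-sample approximation, whose
    survival function is exp(-x / 2).
    """
    jb = n / 6 * (skewness ** 2 + excess_kurtosis ** 2 / 4)
    return math.exp(-jb / 2)


# Identical, mild departure from Gaussian; only the sample size changes.
for n in (100, 1000, 10000):
    print(n, f"{jarque_bera_p(n, 0.1, 0.2):.3g}")
```

The shape of the distribution is the same in every row, yet the P value drops from about 0.85 at n = 100 to below 1e-7 at n = 10,000. The test becomes "significant" because the sample grew, not because the departure from Gaussian became large enough to matter.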

## Summary

| | Large samples (>100 or so) | Small samples (<12 or so) |
|---|---|---|
| Parametric tests on non-Gaussian data | OK. Tests are robust. | Misleading. Not robust. |
| Nonparametric tests on Gaussian data | OK. Tests have good power. | Misleading. Too little power. |
| Usefulness of normality testing | A bit useful. | Not very useful. |