Frequently Asked Questions
The meaning of the term "nonparametric".
FAQ# 1582 Last Modified 1-March-2010
The term nonparametric is used inconsistently.
Nonparametric method or nonparametric data?
The term nonparametric should only refer to an analysis method. A statistical test can be nonparametric or not, although the distinction is not as crisp as you'd guess.
It makes no sense to describe data as being nonparametric, and the phrase "nonparametric data" should never ever be used. The term nonparametric simply does not describe data, or distributions of data. That term should only be used to describe the method used to analyze data.
Which methods are nonparametric?
Methods that analyze ranks are uniformly called nonparametric. These tests are all named after their inventors, including: Mann-Whitney, Wilcoxon, Kruskal-Wallis, Friedman, and Spearman.
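To make "analyzing ranks" concrete, here is a minimal sketch of how the Mann-Whitney U statistic can be computed from ranks alone. The function names (average_ranks, mann_whitney_u) and the data values are made up for illustration; this is not the code of any particular program, and it computes only the statistic, not a P value.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of tied values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(group_a, group_b):
    """Return the U statistic for group_a, using only the pooled ranks."""
    ranks = average_ranks(list(group_a) + list(group_b))
    n_a = len(group_a)
    rank_sum_a = sum(ranks[:n_a])
    return rank_sum_a - n_a * (n_a + 1) / 2


# Hypothetical example data, not taken from any real experiment.
control = [1.1, 2.3, 2.9, 3.4]
treated = [3.0, 4.2, 5.1, 6.8]

u_control = mann_whitney_u(control, treated)  # 1.0
u_treated = mann_whitney_u(treated, control)  # 15.0
```

Once the values are replaced by their ranks, the original values play no further part in the calculation. The two U values always add up to the product of the two sample sizes (here 4 x 4 = 16).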
Beyond that, the definition gets slippery.
What about modern statistical methods including randomization, resampling and bootstrapping? These methods do not necessarily make any assumption about the population. They do not assume sampling from a Gaussian distribution. They analyze the actual data, and not the ranks. Are these methods nonparametric? Wilcox and Manly have each written texts about modern methods, but they do not refer to these methods as "nonparametric". Four texts of nonparametric statistics (by Conover, Gibbons, Lehman, and Daniel) don't mention randomization, resampling or bootstrapping at all, but the texts by Hollander and Wasserman do.
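As an illustration of a method that analyzes the actual values rather than ranks, here is a minimal sketch of an exact randomization (permutation) test on the difference between two means. The function name permutation_p_value and the data are hypothetical, and enumerating every relabeling is only practical for small samples; real software typically samples relabelings at random instead.

```python
from itertools import combinations


def permutation_p_value(group_a, group_b):
    """Exact two-sided randomization test on the difference in means."""
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    n_b = len(pooled) - n_a
    total = sum(pooled)

    def abs_mean_diff(sum_a):
        return abs(sum_a / n_a - (total - sum_a) / n_b)

    observed = abs_mean_diff(sum(group_a))
    at_least_as_extreme = 0
    n_splits = 0
    # Enumerate every way of relabeling n_a of the pooled values as "group A".
    for idx in combinations(range(len(pooled)), n_a):
        n_splits += 1
        # The small tolerance guards against floating-point round-off.
        if abs_mean_diff(sum(pooled[i] for i in idx)) >= observed - 1e-12:
            at_least_as_extreme += 1
    return at_least_as_extreme / n_splits


# Hypothetical example data, not taken from any real experiment.
control = [11, 23, 29, 34]
treated = [30, 42, 51, 68]

p_value = permutation_p_value(control, treated)  # 4 of 70 splits, about 0.057
```

The P value comes entirely from reshuffling the observed values between the two groups. No Gaussian distribution is assumed, and no ranks are computed.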
What about the chi-square test and Fisher's exact test? Are they nonparametric? Daniel and Gibbons include a chapter on these tests in their texts of nonparametric statistics, but Lehman and Hollander do not.
What about survival data? Are the methods used to create a survival curve (Kaplan-Meier) and to compare survival curves (log-rank or Mantel-Haenszel) nonparametric? Hollander includes survival data in his text of nonparametric statistics, but the other texts of nonparametric statistics don't mention survival data at all. I think everyone would agree that fancier methods of analyzing survival curves (which involve fitting a model to the data) are not nonparametric.
Rank-based methods can be used for two purposes
I think the confusion arises because there are two distinct reasons to choose rank-based tests (like the Mann-Whitney test):
- To avoid making assumptions about the distribution of the population. This also implies that there is no strong model describing the population.
- To create a method that is robust to outliers.
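The second goal in the list above can be illustrated with a small sketch. It compares a rank-based statistic (the Mann-Whitney U, written here in its equivalent pair-counting form) with a difference between means, before and after one value is turned into an extreme outlier. The function names and the data are made up for illustration.

```python
def mann_whitney_u(group_a, group_b):
    """U for group_a: count pairs where a > b, counting ties as one half."""
    return sum(
        1.0 if a > b else 0.5 if a == b else 0.0
        for a in group_a
        for b in group_b
    )


def mean_difference(group_a, group_b):
    """Mean of group_a minus mean of group_b, using the actual values."""
    return sum(group_a) / len(group_a) - sum(group_b) / len(group_b)


# Hypothetical example data, not taken from any real experiment.
control = [11, 23, 29, 34]
treated = [30, 42, 51, 68]
treated_outlier = [30, 42, 51, 6800]  # same data, largest value made extreme

u_before = mann_whitney_u(treated, control)           # 15.0
u_after = mann_whitney_u(treated_outlier, control)    # 15.0, unchanged
diff_before = mean_difference(treated, control)       # 23.5
diff_after = mean_difference(treated_outlier, control)  # 1706.5
```

The outlier is still the largest value, so its rank does not change and neither does U. The difference between means, which uses the actual values, changes enormously.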
Once you get beyond the rank-based tests, these two goals do not always go together. Modern methods can be distribution free, but not robust to outliers. And some robust methods are not distribution free (they can assume that all but a few values are sampled from a Gaussian distribution).
The term 'nonparametric' can be confusing because it can be used as a synonym for three different phrases:
- Distribution free (the method makes no assumption, or at least no strong assumption, about the distribution of the population)
- Robust (the method is not much influenced by one or a few outliers)
- Rank based (the method works by first ranking the values, and then analyzing those ranks)
Because of these ambiguities, I would suggest avoiding the term nonparametric when possible. Instead, write up your analyses with the name of the test used.