GraphPad Statistics Guide

Results: Wilcoxon matched pairs test

Interpreting the P value

The Wilcoxon test is a nonparametric test that compares two paired groups. Prism first computes the difference within each pair and ranks the absolute values of the differences from low to high. Prism then sums the ranks of the differences where column A was higher (positive ranks), sums the ranks where column B was higher (it calls these negative ranks), and reports the two sums. If the two sums of ranks are very different, the P value will be small.
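
The ranking procedure described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not Prism's code: the function names and the example data are hypothetical, tied absolute differences are assumed to share their average rank, and pairs with a zero difference are simply dropped.

```python
def average_ranks(values):
    """Rank values from low to high (1-based); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def signed_rank_sums(column_a, column_b):
    """Return (sum of positive ranks, sum of negative ranks) for paired data."""
    # One difference per pair; zero differences are dropped in this sketch
    diffs = [a - b for a, b in zip(column_a, column_b) if a != b]
    ranks = average_ranks([abs(d) for d in diffs])
    positive = sum(r for d, r in zip(diffs, ranks) if d > 0)  # column A higher
    negative = sum(r for d, r in zip(diffs, ranks) if d < 0)  # column B higher
    return positive, negative


# Hypothetical before/after measurements for six subjects
before = [125, 132, 118, 140, 129, 135]
after = [118, 130, 121, 128, 120, 131]
print(signed_rank_sums(before, after))  # (19.0, 2.0)
```

With six nonzero differences the two sums always add up to 6 × 7 / 2 = 21, so sums of 19 and 2 are about as lopsided as this sample size allows.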

The P value answers this question:

If the median difference in the entire population is zero (the treatment is ineffective), what is the chance that random sampling would result in a median change as far from zero (or further) as observed in this experiment?

If the P value is small, you can reject the idea that the difference is due to chance, and conclude instead that the populations have different medians.

If the P value is large, the data do not give you any reason to conclude that the overall medians differ. This is not the same as saying that the medians are the same. You just have no compelling evidence that they differ. If you have small samples, the Wilcoxon test has little power to detect small differences.

How the P value is calculated

If there are fewer than 200 pairs, Prism calculates an exact P value. See more details on the page about the Wilcoxon signed rank test. Prism 6 and later can do this even if there are ties. With more than 200 pairs, it calculates the P value from a Gaussian approximation. The term Gaussian, as used here, has to do with the distribution of the sum of ranks and does not imply that your data need to follow a Gaussian distribution.
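
To make the difference between the two approaches concrete, here is a minimal Python sketch. It assumes n pairs with no zero differences and no tied ranks, and it uses the plain textbook normal approximation without a tie or continuity correction. The function names are hypothetical, and nothing here is claimed to match Prism's internal algorithm.

```python
import math


def exact_two_sided_p(n, positive_rank_sum):
    """Exact two-sided P value for the sum of positive ranks (no ties, no zeros)."""
    total = n * (n + 1) // 2
    # counts[s] = number of ways to assign signs so the positive ranks sum to s
    counts = [0] * (total + 1)
    counts[0] = 1
    for rank in range(1, n + 1):
        for s in range(total, rank - 1, -1):
            counts[s] += counts[s - rank]
    # Under the null hypothesis all 2**n sign patterns are equally likely,
    # and the distribution is symmetric about total / 2
    distance = abs(positive_rank_sum - total / 2)
    extreme = sum(c for s, c in enumerate(counts) if abs(s - total / 2) >= distance)
    return extreme / 2 ** n


def gaussian_two_sided_p(n, positive_rank_sum):
    """Gaussian approximation to the same P value (no continuity correction)."""
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = abs(positive_rank_sum - mean) / sd
    return math.erfc(z / math.sqrt(2))  # two-sided tail area of a standard normal


print(exact_two_sided_p(6, 19))     # 0.09375, i.e. 6 of the 64 sign patterns
print(gaussian_two_sided_p(6, 19))
```

With only six pairs the approximate value differs noticeably from the exact one. With hundreds of pairs the two are expected to agree closely, because it is the sum of ranks that is approximately Gaussian, regardless of how the raw data are distributed.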

How Prism deals with pairs that have exactly the same value

What happens if some of the subjects have exactly the same value before and after the intervention (same value in both columns)?

When Wilcoxon developed this test, he recommended that those data simply be ignored. Imagine there are ten pairs. Nine of the pairs have distinct before and after values, but the tenth pair has identical values, so the difference equals zero. Using Wilcoxon's original method, that tenth pair would be ignored and the other nine pairs would be analyzed. This is how InStat and previous versions of Prism (up to version 5) handle the situation.

Pratt (1,2) proposed a different method that accounts for the tied values. Prism 6 and later offer the choice of using this method.

Which method should you choose? Obviously, if no pairs have identical before and after values, it doesn't matter. Nor does it matter much if there is, for example, only one such identical pair out of 200.

It makes intuitive sense that data should not be ignored, so it might seem that Pratt's method must be better. However, Conover (3) has shown that the relative merits of the two methods depend on the underlying distribution of the data, which you don't know.
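
The following Python sketch shows how the two ways of handling zero differences change the rank sums. It follows the usual textbook description of Pratt's method: zero differences take part in the ranking, but their ranks are not added to either sum. The function names and the ten differences are hypothetical, and the source does not describe Prism's implementation at this level of detail.

```python
def average_ranks(values):
    """1-based ranks, low to high; tied values share their average rank."""
    sorted_vals = sorted(values)
    ranks = []
    for v in values:
        first = sorted_vals.index(v) + 1   # first 1-based position holding v
        count = sorted_vals.count(v)       # number of values tied with v
        ranks.append(first + (count - 1) / 2)
    return ranks


def rank_sums(diffs, method="wilcoxon"):
    """Return (sum of positive ranks, sum of negative ranks)."""
    if method == "wilcoxon":
        # Wilcoxon's original method: discard zero differences before ranking
        diffs = [d for d in diffs if d != 0]
    elif method != "pratt":
        raise ValueError("method must be 'wilcoxon' or 'pratt'")
    # With method="pratt" the zeros are ranked too, which pushes up the ranks
    # of the nonzero differences; the zeros' own ranks are then left out
    ranks = average_ranks([abs(d) for d in diffs])
    positive = sum(r for d, r in zip(diffs, ranks) if d > 0)
    negative = sum(r for d, r in zip(diffs, ranks) if d < 0)
    return positive, negative


# Ten hypothetical paired differences; the last pair has identical values
diffs = [7, 2, -3, 12, 9, 4, -1, 5, 8, 0]
print(rank_sums(diffs, "wilcoxon"))  # (41.0, 4.0)
print(rank_sums(diffs, "pratt"))     # (48.0, 6.0)
```

Because the rank sums change, the reference distribution used to compute the P value also has to change under Pratt's method. That step is not shown here.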

95% Confidence interval for the median difference

Prism can compute a 95% confidence interval for the median of the paired differences (choose this on the options tab). This can only be interpreted when you assume that the distribution of differences is symmetrical. Prism 6 and later use the method explained on pages 234-235 of Sheskin (Fourth Edition) and pages 302-303 of Klotz.
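
For readers who want to see how such an interval can be built, the Python sketch below uses one common textbook construction based on Walsh averages, the means of all pairs of differences, including each difference paired with itself. Whether this matches the Sheskin and Klotz procedure in every detail is an assumption. The function name is hypothetical, and the sketch assumes no tied or zero differences.

```python
def signed_rank_ci(diffs, alpha=0.05):
    """Confidence interval for the median difference from Walsh averages.

    Returns None when there are too few pairs to reach the requested level.
    """
    n = len(diffs)
    total = n * (n + 1) // 2
    # Null distribution of the sum of positive ranks: counts[s] sign patterns give s
    counts = [0] * (total + 1)
    counts[0] = 1
    for rank in range(1, n + 1):
        for s in range(total, rank - 1, -1):
            counts[s] += counts[s - rank]
    # Find the largest c with P(sum of positive ranks <= c) <= alpha / 2
    c = -1
    cumulative = 0
    for s, count in enumerate(counts):
        cumulative += count
        if cumulative / 2 ** n > alpha / 2:
            break
        c = s
    if c < 0:
        return None
    # Walsh averages: (d_i + d_j) / 2 for all i <= j, sorted from low to high
    walsh = sorted((diffs[i] + diffs[j]) / 2 for i in range(n) for j in range(i, n))
    # The interval runs from the (c+1)th smallest to the (c+1)th largest average
    return walsh[c], walsh[total - c - 1]


# Hypothetical paired differences for six subjects
print(signed_rank_ci([7, 2, -3, 12, 9, 4]))  # (-3.0, 12.0)
```

Because the distribution of the rank sum is discrete, the actual coverage of an interval built this way is at least, and usually somewhat more than, the nominal 95%. With five or fewer pairs no interval reaches 95%, which the sketch signals by returning None.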

Test for effective pairing

The whole point of using a paired test is to control for experimental variability. Some factors you don't control in the experiment will affect the before and the after measurements equally, so they will not affect the difference between before and after. By analyzing only the differences, therefore, a paired test corrects for these sources of scatter.

If pairing is effective, you expect the before and after measurements to vary together. Prism quantifies this by calculating the nonparametric Spearman correlation coefficient, rs. From rs, Prism calculates a P value that answers this question: If the two groups really are not correlated at all, what is the chance that randomly selected subjects would have a correlation coefficient as large as (or larger than) that observed in your experiment? The P value is one-tailed, as you are not interested in the possibility of observing a strong negative correlation.

If the pairing was effective, rs will be positive and the P value will be small. This means that the two groups are significantly correlated, so it made sense to choose a paired test.

If the P value is large (say larger than 0.05), you should question whether it made sense to use a paired test. Your choice of whether to use a paired test or not should not be based solely on this one P value, but also on the experimental design and the results you have seen in other similar experiments (assuming you have repeated the experiments several times).

If rs is negative, it means that the pairing was counterproductive! You expect the values of the pairs to move together – if one is higher, so is the other. Here the opposite is true – if one has a higher value, the other has a lower value. Most likely this is just a matter of chance. If rs is close to -1, you should review your procedures, as the data are unusual.
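
The pairing check can be illustrated with a short Python sketch that computes rs as the Pearson correlation of the ranks and obtains a one-tailed P value by enumerating every possible re-pairing of the data. The permutation approach is an assumption made for illustration, because the source does not say how Prism derives the P value from rs. It is only practical for small samples, and the function names and data are hypothetical.

```python
from itertools import permutations


def average_ranks(values):
    """1-based ranks, low to high; tied values share their average rank."""
    sorted_vals = sorted(values)
    return [sorted_vals.index(v) + 1 + (sorted_vals.count(v) - 1) / 2 for v in values]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman_one_tailed(column_a, column_b):
    """Return (rs, one-tailed P) where P is the chance of rs this large or larger."""
    ranks_a = average_ranks(column_a)
    ranks_b = average_ranks(column_b)
    rs = pearson(ranks_a, ranks_b)
    # Under the null hypothesis every re-pairing of the ranks is equally likely
    n_perms = 0
    as_large = 0
    for shuffled in permutations(ranks_b):
        n_perms += 1
        if pearson(ranks_a, shuffled) >= rs - 1e-12:  # tolerance for rounding
            as_large += 1
    return rs, as_large / n_perms


# Hypothetical before/after measurements for six subjects
before = [125, 132, 118, 140, 129, 135]
after = [118, 130, 121, 128, 120, 131]
rs, p = spearman_one_tailed(before, after)
print(round(rs, 3), p)
```

Only the upper tail is counted, which matches the one-tailed question: a strongly negative rs produces a P value near 1 rather than a small one.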

Why results might differ from those reported by earlier versions of Prism

Results from Prism 6 and later can differ from prior versions because Prism now does exact calculations in two situations where Prism 5 did approximate calculations. All versions of Prism report whether they use an approximate or exact method.

Prism can perform the exact calculations much faster than Prism 5 did, so it does exact calculations with some sample sizes for which earlier versions of Prism could only do approximate calculations.

If the before-after differences for two pairs are the same, prior versions of Prism always used the approximate method. Prism 6 uses the exact method unless the sample is huge.  

Prism reports whether it uses an approximate or exact method, so it is easy to tell if this is the reason for different results.

Descriptive statistics

The analysis tab of descriptive statistics summarizes only the data that were used for the Wilcoxon test. If you had any data in one column, but not the other, those values are not included in the descriptive statistics results that are included with the Wilcoxon test. But of course, the general descriptive statistics analysis analyzes all the data.


1. Pratt JW (1959) Remarks on zeros and ties in the Wilcoxon signed rank procedures. Journal of the American Statistical Association 54(287): 655-667.

2. Pratt JW, Gibbons JD (1981) Concepts of Nonparametric Theory. New York: Springer-Verlag.

3. Conover WJ (1973) On methods of handling ties in the Wilcoxon signed-rank test. Journal of the American Statistical Association 68(344): 985-988.