# Interpreting results: Friedman test

## P value

The Friedman test is a nonparametric test that compares three or more matched or paired groups. The Friedman test first ranks the values in each matched set (each row) from low to high. Each row is ranked separately. It then sums the ranks in each group (column). If the sums are very different, the P value will be small. Prism reports the value of the Friedman statistic, which is calculated from the sums of ranks and the sample sizes.
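The ranking and summing steps described above can be sketched in a few lines of Python. This is a minimal illustration, not Prism's implementation: the helper names and the sample table are hypothetical, and the statistic uses the common textbook formula without a tie correction (the text does not say which form Prism uses).

```python
def rank_row(row):
    """Rank the values in one row from low to high (1-based), averaging ties."""
    order = sorted(range(len(row)), key=lambda i: row[i])
    ranks = [0.0] * len(row)
    i = 0
    while i < len(order):
        j = i
        # Extend j over any run of tied values.
        while j + 1 < len(order) and row[order[j + 1]] == row[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sums(table):
    """Rank each row separately, then sum the ranks in each column."""
    ranked = [rank_row(row) for row in table]
    return [sum(col) for col in zip(*ranked)]


def friedman_statistic(table):
    """Textbook Friedman statistic (no tie correction; an assumption here)."""
    s = len(table)      # number of subjects (rows)
    t = len(table[0])   # number of treatments (columns)
    sums = rank_sums(table)
    return 12.0 / (s * t * (t + 1)) * sum(r * r for r in sums) - 3.0 * s * (t + 1)


# Hypothetical data: 4 subjects (rows) measured under 3 treatments (columns).
table = [
    [10, 12, 15],
    [20, 25, 22],
    [5, 9, 8],
    [30, 31, 35],
]
sums = rank_sums(table)
statistic = friedman_statistic(table)
print(sums, statistic)
```

The further apart the column rank sums are, the larger the statistic and the smaller the P value.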

The whole point of using a matched test is to control for experimental variability between subjects, thus increasing the power of the test. Some factors you don't control in the experiment will increase (or decrease) all the measurements in a subject. Since the Friedman test ranks the values in each row, it is not affected by sources of variability that equally affect all values in a row (since that factor won't change the ranks within the row).
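A short sketch can make this concrete. The data and the per-subject offsets below are hypothetical; the point is only that shifting every value in a row by the same amount leaves the within-row ranks unchanged.

```python
def rank_row(row):
    """Rank the values in one row from low to high (assumes no ties)."""
    order = sorted(range(len(row)), key=lambda i: row[i])
    ranks = [0] * len(row)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return ranks


# Hypothetical measurements: 3 subjects (rows), 3 treatments (columns).
baseline = [
    [10, 12, 15],
    [20, 25, 22],
    [5, 9, 8],
]

# Hypothetical subject-level effects that shift all values in a row equally.
offsets = [0, 100, -3]
shifted = [[value + off for value in row] for row, off in zip(baseline, offsets)]

ranks_before = [rank_row(row) for row in baseline]
ranks_after = [rank_row(row) for row in shifted]

# The ranks, and therefore the Friedman test, are unaffected by the offsets.
same_ranks = ranks_before == ranks_after
print(same_ranks)
```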

The P value answers this question: If the different treatments (columns) really are identical, what is the chance that random sampling would result in sums of ranks as far apart (or more so) as observed in this experiment?

If the P value is small, you can reject the idea that all of the differences between columns are due to random sampling, and conclude instead that at least one of the treatments (columns) differs from the rest. Then look at post test results to see which groups differ from which other groups.

If the P value is large, the data do not give you any reason to conclude that the overall medians differ. This is not the same as saying that the medians are the same. You just have no compelling evidence that they differ. If you have small samples, Friedman's test has little power.

## Exact or approximate P value?

With a fairly small table, Prism does an exact calculation. When the table is larger, Prism uses a standard approximation. To decide when to use the approximate method, Prism computes $(T!)^S$ (T factorial to the S power), where T is the number of treatments (data sets) and S is the number of subjects (rows). When that value exceeds $10^9$, Prism uses the approximate method. For example, if there are 3 treatments and 12 rows, then $(T!)^S$ equals $6^{12}$, which is about $2.2 \times 10^9$, so Prism uses an approximate method.
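The decision rule can be checked with a few lines of Python. The function names are hypothetical; only the threshold and the T-factorial-to-the-S-power quantity come from the text above.

```python
from math import factorial

EXACT_LIMIT = 10**9  # threshold quoted in the text


def arrangement_count(treatments, subjects):
    """Number of possible within-row rearrangements: (T!) to the S power."""
    return factorial(treatments) ** subjects


def uses_approximation(treatments, subjects):
    """True when the count exceeds the limit, so the approximate method applies."""
    return arrangement_count(treatments, subjects) > EXACT_LIMIT


print(arrangement_count(3, 12))   # 6 to the 12th power
print(uses_approximation(3, 12))  # above the limit
print(uses_approximation(3, 11))  # below the limit
```

With 3 treatments, 11 rows still fall under the limit, while 12 rows cross it.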

The approximate method is sometimes called a Gaussian approximation. The term Gaussian refers to the distribution of the sums of ranks, and does not imply that your data need to be sampled from a Gaussian distribution. With medium-sized samples, Prism can take a long time to calculate the exact P value. You can interrupt the calculations if an approximate P value meets your needs.

The exact method works by examining all possible rearrangements of the values, keeping each value in the same row (same subject, since this is a repeated measures design) but allowing the column (treatment) assignment to vary.
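As a rough sketch of this idea (not Prism's actual algorithm), the following Python enumerates every within-row rearrangement of a small table of ranks and counts how often the column rank sums are at least as spread out as observed. The function names and data are hypothetical. The sum of squared rank sums is used as the measure of spread, on the assumption that it orders arrangements the same way as the Friedman statistic.

```python
from itertools import permutations, product


def sum_sq_rank_sums(ranked_rows):
    """Spread of the column rank sums, measured as the sum of their squares."""
    sums = [sum(col) for col in zip(*ranked_rows)]
    return sum(r * r for r in sums)


def exact_p_value(ranked_rows):
    """Fraction of within-row rearrangements at least as extreme as observed."""
    observed = sum_sq_rank_sums(ranked_rows)
    # Each row keeps its own values; only the column assignment varies.
    row_options = [list(permutations(row)) for row in ranked_rows]
    total = 0
    as_extreme = 0
    for arrangement in product(*row_options):
        total += 1
        if sum_sq_rank_sums(arrangement) >= observed - 1e-9:
            as_extreme += 1
    return as_extreme / total


# Hypothetical ranks: 3 subjects who all rank the 3 treatments the same way.
ranked = [
    [1, 2, 3],
    [1, 2, 3],
    [1, 2, 3],
]
p = exact_p_value(ranked)
print(p)
```

Because the loop visits $(T!)^S$ arrangements, this brute-force approach is practical only for small tables.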

If two or more values in the same row are tied, previous versions of Prism were not able to calculate the exact P value, so Prism computed an approximate P value even with tiny samples. Prism 6 can compute an exact P value even in the presence of ties, so it uses an approximation only when the sample size is fairly large, as explained above. This means that with some data sets, Prism 6 will report different results than prior versions did.

## Dunn's post test

Following Friedman's test, Prism can perform Dunn's post test. For details, see Applied Nonparametric Statistics by WW Daniel, published by PWS-Kent publishing company in 1990 or Nonparametric Statistics for Behavioral Sciences by S Siegel and NJ Castellan, 1988. The original reference is O.J. Dunn, Technometrics, 5:241-252, 1964. Note that some books and programs simply refer to this test as the post test following a Friedman test and don't give it an exact name.

Dunn's post test compares the difference in the sum of ranks between two columns with the expected average difference (based on the number of groups and their size). For each pair of columns, Prism reports the P value as >0.05, <0.05, <0.01, or <0.001. The calculation of the P value takes into account the number of comparisons you are making. If the null hypothesis is true (all data are sampled from populations with identical distributions, so all differences between groups are due to random sampling), then there is a 5% chance that at least one of the post tests will have P<0.05. The 5% chance does not apply to each comparison but rather to the entire family of comparisons.
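The following sketch shows one common textbook form of this comparison, in the style of Siegel and Castellan. It is an assumption that Prism computes it this way: the text above does not give the formula. Each difference in rank sums is divided by $\sqrt{S\,T\,(T+1)/6}$, where S is the number of subjects and T the number of treatments. The result is converted to a two-sided P value from the Gaussian distribution and multiplied by the number of comparisons. The function name and input values are hypothetical.

```python
from itertools import combinations
from math import sqrt
from statistics import NormalDist


def dunn_after_friedman(rank_sums, subjects):
    """Multiplicity-adjusted P values for every pair of columns (a sketch)."""
    t = len(rank_sums)
    # Assumed standard error of a difference between two column rank sums.
    se = sqrt(subjects * t * (t + 1) / 6.0)
    pairs = list(combinations(range(t), 2))
    results = {}
    for i, j in pairs:
        z = abs(rank_sums[i] - rank_sums[j]) / se
        p = 2.0 * (1.0 - NormalDist().cdf(z))  # two-sided P value
        # Multiply by the number of comparisons, capping the result at 1.
        results[(i, j)] = min(1.0, p * len(pairs))
    return results


# Hypothetical rank sums for 3 treatments measured in 4 subjects.
results = dunn_after_friedman([4.0, 10.0, 10.0], subjects=4)
for pair, adjusted_p in results.items():
    print(pair, round(adjusted_p, 4))
```

Because each P value is adjusted for the number of comparisons, the significance level applies to the whole family of comparisons rather than to each pair.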