# Interpreting results: One-way ANOVA

One-way ANOVA compares three or more unmatched groups, based on the assumption that the populations are Gaussian.

## P value

The P value tests the null hypothesis that data from all groups are drawn from populations with identical means. Therefore, the P value answers this question:

If all the populations really have the same mean (the treatments are ineffective), what is the chance that random sampling would result in means as far apart (or more so) as observed in this experiment?

If the overall P value is large, the data do not give you any reason to conclude that the means differ. Even if the population means were equal, you would not be surprised to find sample means this far apart just by chance. This is not the same as saying that the true means are the same. You just don't have compelling evidence that they differ.

If the overall P value is small, then it is unlikely that the differences you observed are due to random sampling. You can reject the idea that all the populations have identical means. This doesn't mean that every mean differs from every other mean, only that at least one differs from the rest. Look at the results of post tests to identify where the differences are.

## F ratio and ANOVA table

The P value is computed from the F ratio, which in turn is computed from the ANOVA table.

ANOVA partitions the variability among all the values into one component that is due to variability among group means (due to the treatment) and another component that is due to variability within the groups (also called residual variation). Variability within groups (within the columns) is quantified as the sum of squares of the differences between each value and its group mean. This is the residual sum-of-squares. Variation among groups (due to treatment) is quantified as the sum of the squares of the differences between the group means and the grand mean (the mean of all values in all groups). Adjusted for the size of each group, this becomes the treatment sum-of-squares.
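The partition described above can be sketched in a few lines of Python. This is only an illustration using the standard library. The function name `sums_of_squares` and the three small groups are made up for the example and are not Prism output.

```python
from statistics import fmean

def sums_of_squares(groups):
    """Partition total variability into treatment and residual sums of squares."""
    all_values = [x for g in groups for x in g]
    grand_mean = fmean(all_values)
    # Residual (within groups): each value versus its own group mean.
    ss_residual = sum((x - fmean(g)) ** 2 for g in groups for x in g)
    # Treatment (among groups): each group mean versus the grand mean,
    # weighted by the number of values in that group.
    ss_treatment = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # Total: each value versus the grand mean.
    ss_total = sum((x - grand_mean) ** 2 for x in all_values)
    return ss_treatment, ss_residual, ss_total

# Hypothetical data: three unmatched groups of three values each.
groups = [[4, 5, 6], [6, 7, 8], [9, 10, 11]]
ss_treatment, ss_residual, ss_total = sums_of_squares(groups)
print(ss_treatment, ss_residual, ss_total)
```

For these values the treatment, residual, and total sums-of-squares come out at about 38, 6, and 44, so the two components add up to the total.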

Each sum-of-squares is associated with a certain number of degrees of freedom (df, computed from number of subjects and number of groups), and the mean square (MS) is computed by dividing the sum-of-squares by the appropriate number of degrees of freedom. These can be thought of as variances. The square root of the mean square residual can be thought of as the pooled standard deviation.
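As a rough sketch of that arithmetic, the snippet below turns two sums-of-squares into degrees of freedom, mean squares, and a pooled standard deviation. It assumes the usual one-way ANOVA bookkeeping (number of groups minus one for the treatment, number of values minus number of groups for the residual). The function name `anova_table` and the input numbers are hypothetical.

```python
import math

def anova_table(ss_treatment, ss_residual, n_values, n_groups):
    """Return degrees of freedom and mean squares for a one-way ANOVA."""
    df_treatment = n_groups - 1          # among groups
    df_residual = n_values - n_groups    # within groups
    ms_treatment = ss_treatment / df_treatment
    ms_residual = ss_residual / df_residual
    return {
        "df_treatment": df_treatment,
        "df_residual": df_residual,
        "ms_treatment": ms_treatment,
        "ms_residual": ms_residual,
        # Square root of the residual mean square: the pooled standard deviation.
        "pooled_sd": math.sqrt(ms_residual),
    }

# Hypothetical sums-of-squares from 9 values in 3 groups.
table = anova_table(ss_treatment=38.0, ss_residual=6.0, n_values=9, n_groups=3)
print(table)
```

With these inputs the treatment mean square is 19 (38 divided by 2 df) and the residual mean square is 1 (6 divided by 6 df), so the pooled standard deviation is 1.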

The F ratio is the ratio of two mean square values: the treatment mean square divided by the residual mean square. If the null hypothesis is true, you expect F to have a value close to 1.0 most of the time. A large F ratio means that the variation among group means is more than you'd expect to see by chance. You'll see a large F ratio both when the null hypothesis is wrong (the data are not sampled from populations with the same mean) and when random sampling happened to end up with large values in some groups and small values in others.

The P value is determined from the F ratio and the two values for degrees of freedom shown in the ANOVA table.
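To make that last step concrete, here is a sketch of how a P value can be obtained from an F ratio and its two df values using only Python's standard library. It evaluates the upper tail of the F distribution through the regularized incomplete beta function, computed with a textbook continued-fraction expansion. This is not a description of how Prism does the calculation, and the names `f_upper_tail`, `regularized_incomplete_beta`, and `_betacf` are made up for the example. In practice you would normally call a statistics library instead.

```python
import math

def _betacf(a, b, x, max_iter=200, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    if abs(d) < tiny:
        d = tiny
    d = 1.0 / d
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the recurrence.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        if abs(d) < tiny:
            d = tiny
        c = 1.0 + aa / c
        if abs(c) < tiny:
            c = tiny
        d = 1.0 / d
        h *= d * c
        # Odd step of the recurrence.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        if abs(d) < tiny:
            d = tiny
        c = 1.0 + aa / c
        if abs(c) < tiny:
            c = tiny
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h

def regularized_incomplete_beta(a, b, x):
    """I_x(a, b) for a, b > 0 and 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log(1.0 - x))
    front = math.exp(log_front)
    # Use the symmetry relation so the continued fraction converges quickly.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b

def f_upper_tail(f_ratio, df_treatment, df_residual):
    """P value: chance of an F ratio this large or larger under the null hypothesis."""
    x = df_residual / (df_residual + df_treatment * f_ratio)
    return regularized_incomplete_beta(df_residual / 2.0, df_treatment / 2.0, x)

# Hypothetical ANOVA table: MS treatment = 19, MS residual = 1, df = 2 and 6.
f_ratio = 19.0 / 1.0
print(f_ratio, f_upper_tail(f_ratio, 2, 6))
```

For this made-up table the F ratio is 19 and the P value is about 0.0025.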

## Tests for equal variances

ANOVA is based on the assumption that the data are sampled from populations that all have the same standard deviations. Prism tests this assumption with two tests. It computes the Brown-Forsythe test and also (if every group has at least five values) computes Bartlett's test. There are no options for whether to run these tests. Prism automatically does so and always reports the results.

Both these tests compute a P value designed to answer this question:

If the populations really have the same standard deviations, what is the chance that you'd randomly select samples whose standard deviations are as different from one another (or more different) as they are in your experiment?

### Bartlett's test

Prism reports the results of the "corrected" Bartlett's test as explained in section 10.6 of Zar (1). Bartlett's test works great if the data really are sampled from Gaussian distributions. But if the distributions deviate even slightly from the Gaussian ideal, Bartlett's test may report a small P value even when the differences among standard deviations are trivial. For this reason, many do not recommend that test. That's why we added the test of Brown and Forsythe. It has the same goal as Bartlett's test, but is less sensitive to minor deviations from normality. We suggest that you pay attention to the Brown-Forsythe result, and ignore Bartlett's test (which we left in to be consistent with prior versions of Prism).

### Brown-Forsythe test

The Brown-Forsythe test is conceptually simple. Each value in the data table is transformed by subtracting from it the median of that column, and then taking the absolute value of that difference. One-way ANOVA is run on these values, and the P value from that ANOVA is reported as the result of the Brown-Forsythe test.

How does it work? By subtracting the medians, any differences between medians have been subtracted away, so the only distinction between groups is their variability.

Why subtract the median and not the mean of each group? If you subtract the column mean instead of the column median, the test is called the Levene test for equal variances. Which is better? If the distributions are not quite Gaussian, it depends on what the distributions are. Simulations from several groups of statisticians show that using the median works well with many types of non-Gaussian data. Prism only uses the median (Brown-Forsythe) and not the mean (Levene).
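The two steps of the Brown-Forsythe test (transform, then run ordinary one-way ANOVA) can be sketched as follows. This is a minimal standard-library illustration rather than Prism's implementation. The function name `brown_forsythe_f` and the data are hypothetical, and the sketch stops at the F ratio and its degrees of freedom, from which the P value would then be computed.

```python
from statistics import fmean, median

def brown_forsythe_f(groups):
    """F ratio of a one-way ANOVA run on absolute deviations from each group median."""
    # Step 1: replace each value with its absolute distance from its column median.
    deviations = [[abs(x - median(g)) for x in g] for g in groups]
    # Step 2: ordinary one-way ANOVA on the transformed values.
    all_dev = [d for g in deviations for d in g]
    grand_mean = fmean(all_dev)
    n_groups, n_values = len(deviations), len(all_dev)
    ss_treatment = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in deviations)
    ss_residual = sum((d - fmean(g)) ** 2 for g in deviations for d in g)
    df_treatment, df_residual = n_groups - 1, n_values - n_groups
    f_ratio = (ss_treatment / df_treatment) / (ss_residual / df_residual)
    return f_ratio, df_treatment, df_residual

# Hypothetical data: the middle group is much more scattered than the others.
groups = [[4, 5, 6], [3, 7, 11], [9, 10, 11]]
print(brown_forsythe_f(groups))
```

For these values the transformed columns are 1, 0, 1; 4, 0, 4; and 1, 0, 1, which gives an F ratio of about 2.0 with 2 and 6 degrees of freedom. Swapping `median` for `fmean` in step 1 would give the Levene variant mentioned above.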

### Interpreting the results

If the P value is small, you must decide whether you will conclude that the standard deviations of the populations are different. Obviously the tests of equal variances are based only on the values in this one experiment. Think about data from other similar experiments before drawing a conclusion.

If you conclude that the populations have different variances, you have four choices:

- Conclude that the populations are different. In many experimental contexts, the finding of different standard deviations is as important as the finding of different means. If the standard deviations are truly different, then the populations are different regardless of what ANOVA concludes about differences among the means. This may be the most important conclusion from the experiment.

- Transform the data to equalize the standard deviations, and then rerun the ANOVA. Often you'll find that converting values to their reciprocals or logarithms will equalize the standard deviations and also make the distributions more Gaussian.

- Use the Welch or Brown-Forsythe versions of one-way ANOVA that do not assume that all standard deviations are equal.

- Switch to the nonparametric Kruskal-Wallis test. The problem with this is that if your groups have very different standard deviations, it is difficult to interpret the results of the Kruskal-Wallis test. If the standard deviations are very different, then the shapes of the distributions are very different, and the Kruskal-Wallis results cannot be interpreted as comparing medians.
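To illustrate the transformation option in the list above, the short example below uses made-up data in which the scatter grows in proportion to the mean. It is only a sketch of the idea. Real data rarely behave this neatly, and logarithms require that all values be positive.

```python
import math
from statistics import stdev

# Hypothetical data in which the scatter grows in proportion to the mean.
groups = [[1, 2, 4], [10, 20, 40], [100, 200, 400]]

# Standard deviation of each group before and after a log10 transform.
raw_sds = [stdev(g) for g in groups]
log_sds = [stdev([math.log10(x) for x in g]) for g in groups]

print("SD of raw values:  ", raw_sds)
print("SD of log10 values:", log_sds)
```

The standard deviations of the raw values differ a hundredfold between the first and last group, while the standard deviations of the logarithms are essentially identical, so ANOVA on the transformed values no longer violates the equal-variance assumption.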

## R squared

R² is the fraction of the overall variance (of all the data, pooling all the groups) attributable to differences among the group means. It compares the variability among group means with the variability within the groups. A large value means that a large fraction of the variation is due to the treatment that defines the groups. The R² value is calculated from the ANOVA table and equals the between group sum-of-squares divided by the total sum-of-squares. Some programs (and books) don't bother reporting this value. Others refer to it as η² (eta squared) rather than R². It is a descriptive statistic that quantifies the strength of the relationship between group membership and the variable you measured.
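As a small worked example of that ratio, the sketch below computes the between group and total sums-of-squares directly from raw values. The function name `r_squared` and the data are hypothetical.

```python
from statistics import fmean

def r_squared(groups):
    """Fraction of total variation attributable to differences among group means."""
    all_values = [x for g in groups for x in g]
    grand_mean = fmean(all_values)
    # Between group sum-of-squares, weighted by group size.
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # Total sum-of-squares, pooling all groups.
    ss_total = sum((x - grand_mean) ** 2 for x in all_values)
    return ss_between / ss_total

# Hypothetical data: three unmatched groups of three values each.
groups = [[4, 5, 6], [6, 7, 8], [9, 10, 11]]
print(round(r_squared(groups), 3))
```

Here the between group sum-of-squares is 38 and the total is 44, so the result is about 0.864: roughly 86% of the variation in these made-up values is associated with group membership.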

## Reference

1. J.H. Zar, Biostatistical Analysis, Fifth edition, 2010. ISBN: 0131008463.