KNOWLEDGEBASE - ARTICLE #1533

t tests after one-way ANOVA, without correction for multiple comparisons

Correcting for multiple comparisons is not essential

Testing multiple hypotheses at once creates a dilemma that cannot be escaped. If you do not make any corrections for multiple comparisons, it becomes 'too easy' to find 'significant' findings by chance -- it is too easy to make a Type I error. But if you do correct for multiple comparisons, you lose power to detect real differences -- it is too easy to make a Type II error.

The only way to escape this dilemma is to focus your analyses, and thus avoid making multiple comparisons. For example, if your treatments are ordered, don't compare each mean with every other mean (multiple comparisons); instead, do a single test for trend to ask whether the outcome is linearly related to treatment number. Another example: if some of the groups are simply positive and negative controls needed to verify that an experiment 'worked', don't include them in the ANOVA or in the multiple comparisons. Once you have verified that the experiment worked, throw away those controls and analyze only the data that relate to your experimental hypothesis, which might require only a single comparison.

If you need to test multiple hypotheses at once, there is simply no way to escape the dilemma. If you use multiple comparisons procedures to reduce the risk of making a Type I error, you will increase your risk of making a Type II error. If you don't make corrections for multiple comparisons, you increase your risk of making a Type I error and lower the chance of making a Type II error.
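To put a rough number on the Type I side of this dilemma, the short Python sketch below computes the chance of at least one false 'significant' result among k comparisons when every null hypothesis is true. It assumes the comparisons are independent, which pairwise comparisons after ANOVA are not (they share groups and a common MSresidual), so treat the numbers as an approximation. The function name is hypothetical.

```python
def familywise_error_rate(alpha, k):
    """Chance of at least one Type I error among k independent tests,
    each run at significance level alpha, when every null hypothesis is true.
    Assumes independence, which only approximates post-ANOVA comparisons."""
    return 1 - (1 - alpha) ** k


# Show how the risk grows with the number of comparisons at alpha = 0.05
for k in (1, 3, 10):
    print(k, round(familywise_error_rate(0.05, k), 3))
```

With alpha = 0.05, the chance is 5% for one comparison, about 14% for three, and about 40% for ten.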

How to compute individual P values without correcting for multiple comparisons

Saville suggests that corrections for multiple comparisons not be performed, but rather that you simply report all your data and let your readers draw the conclusions (D. J. Saville, Multiple Comparison Procedures: The Practical Solution. The American Statistician, 44:174-180, 1990). This requires you to alert your readers to the fact that you have not done any correction for multiple comparisons, and to honestly report all the comparisons you did make, so that readers can informally adjust for multiple comparisons while reviewing the data.

A t test compares the difference between two means with the standard error of that difference, which is computed from the pooled standard deviation of the two groups and their sample sizes. One-way ANOVA assumes that all the data are sampled from populations that follow a Gaussian distribution, and that all of these populations have the same standard deviation. If you accept this assumption (which is standard), then you can do better than performing multiple t tests. Rather than computing the pooled standard deviation from just the two groups you are comparing, you can compute it from all the groups. This gives you a more accurate assessment of variation, which shows up as more degrees of freedom, and thus a bit more power to detect differences. In fact, one-way ANOVA already does this calculation for you: the Mean Square for Residuals is the square of that pooled standard deviation. The same idea applies to two-way ANOVA.
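As a sketch of that pooling for one-way ANOVA, assuming equal population standard deviations, the hypothetical Python function below computes MSresidual and its degrees of freedom from the raw data of every group. The square root of MSresidual is the pooled standard deviation. For two-way ANOVA, take MSresidual and its DF from the ANOVA table instead; this function does not cover that case.

```python
from statistics import mean


def pooled_residual(groups):
    """Return (ms_residual, df_residual) pooled across every group,
    as one-way ANOVA does. `groups` is a list of lists of values."""
    ss_residual = 0.0
    n_total = 0
    for values in groups:
        group_mean = mean(values)
        # Sum of squared deviations of each value from its own group mean
        ss_residual += sum((x - group_mean) ** 2 for x in values)
        n_total += len(values)
    # Degrees of freedom: number of values minus number of groups
    df_residual = n_total - len(groups)
    return ss_residual / df_residual, df_residual
```

With the made-up data [[1, 2, 3], [4, 5, 6], [7, 8, 9]] it returns (1.0, 6). Pooling only the first two groups gives the same MSresidual but only 4 degrees of freedom, which is where the extra power of using all the groups comes from.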

To compare any two means after one-way or two-way ANOVA:

1. Compute the difference between the two means.
2. Compute (1/Na + 1/Nb) * MSresidual, where Na and Nb are the sample sizes of the two groups you are comparing, and MSresidual is the Mean Square for Residuals (sometimes called "Error") from the one-way or two-way ANOVA results.
3. Compute the square root of the value computed in step 2. The result is the pooled standard error of the difference.
4. Divide the difference computed in step 1 by the pooled standard error computed in step 3. The result is the t ratio.
5. Determine the P value from a table of the t distribution or with the free QuickCalc. This requires the number of degrees of freedom, which comes from the ANOVA table (the DF for residual or error; for one-way ANOVA, this is the total number of values minus the number of groups). The Excel formula =TDIST(ABS(t), DF, 2) also gives the answer. The first argument is the absolute value of the t ratio computed in step 4 (TDIST returns an error for negative values), the second is the number of degrees of freedom from the ANOVA, and the third is 2 because you want a two-tailed P value.
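The five steps can be sketched in Python as follows; all function names are hypothetical. Because Excel and QuickCalcs are not available from code, the P value here comes from the standard identity P = I_x(DF/2, 1/2) with x = DF/(DF + t^2), where I is the regularized incomplete beta function, evaluated with the continued-fraction method described in Numerical Recipes using only the standard library. As in the steps above, the result is NOT corrected for multiple comparisons.

```python
import math


def _betacf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the continued fraction
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def _betai(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log(1.0 - x))
    front = math.exp(log_front)
    # Use whichever form of the continued fraction converges quickly
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def two_tailed_p(t, df):
    """Two-tailed P value for a t ratio with df degrees of freedom."""
    return _betai(df / 2.0, 0.5, df / (df + t * t))


def uncorrected_t_test(mean_a, mean_b, n_a, n_b, ms_residual, df_residual):
    """Steps 1-5: t ratio and uncorrected two-tailed P value."""
    diff = mean_a - mean_b                            # step 1
    variance = (1.0 / n_a + 1.0 / n_b) * ms_residual  # step 2
    se_diff = math.sqrt(variance)                     # step 3
    t_ratio = diff / se_diff                          # step 4
    p = two_tailed_p(t_ratio, df_residual)            # step 5
    return t_ratio, p
```

For example, uncorrected_t_test(5.0, 3.0, 2, 2, 1.0, 2) (made-up inputs) gives a t ratio of 2.0 and a two-tailed P value of about 0.18. Because t is squared inside two_tailed_p, a negative difference needs no equivalent of ABS().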

Remember this P value is NOT corrected for multiple comparisons.

These individual P values can be calculated whether or not the overall ANOVA yielded a P value less than 0.05. This method is known as the unprotected Fisher Least Significant Difference (LSD) test.

But I thought Fisher's LSD test is outmoded and never recommended!

The protected (also called restricted) Fisher's LSD test was the first multiple comparison test invented. The word "protected" means that you first look at the P value for the overall ANOVA. If it is greater than 0.05, you state that none of the differences are 'significant', and don't look at individual comparisons. If that P value is less than 0.05, you go on to divide the comparisons into those that are 'significant' and those that are not. Better methods have since been developed, and the protected Fisher's LSD test is not recommended. The whole idea of multiple comparison post tests is to set the family-wise error rate to 5%, and the protected Fisher's LSD test does not do this very well.

The method described in the previous section is the unprotected Fisher's LSD test. It treats each P value as an individual P value, with no accounting for multiple comparisons.

Confidence intervals too

When you perform multiple comparison tests, most programs also report multiple comparison confidence intervals. This means that the 95% confidence level doesn't apply to each individual interval, but rather to the entire family of intervals: you can be 95% sure that all of the confidence intervals contain the true differences.

If you report P values without correcting for multiple comparisons, as recommended above, you should also report the corresponding 95% confidence intervals. To do so, follow these steps (the first three steps are the same as before):

1. Compute the difference between the two means.
2. Compute (1/Na + 1/Nb) * MSresidual, where Na and Nb are the sample sizes of the two groups you are comparing, and MSresidual is the Mean Square for Residuals (sometimes called "Error") from the one-way or two-way ANOVA results.
3. Compute the square root of the value computed in step 2. The result is the pooled standard error of the difference.
4. Look up (in a table, or with QuickCalcs) the critical t value for 95% confidence, using the number of degrees of freedom from the ANOVA results (DF residual or error).
5. Multiply the critical t value from step 4 by the pooled standard error from step 3.
6. Add the value computed in step 5 to, and subtract it from, the difference computed in step 1. The range between the two results is the 95% confidence interval for the difference, not adjusted for multiple comparisons.
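The confidence interval steps can be sketched in Python as below; the function name is hypothetical. It takes the critical t value from step 4 as an input rather than computing it. In Excel, =TINV(0.05, DF) returns that two-tailed 95% critical value.

```python
import math


def uncorrected_ci(mean_a, mean_b, n_a, n_b, ms_residual, t_critical):
    """Steps 1-6: 95% CI for mean_a - mean_b, NOT adjusted for multiple
    comparisons. t_critical is the two-tailed 95% value for DF residual."""
    diff = mean_a - mean_b                            # step 1
    variance = (1.0 / n_a + 1.0 / n_b) * ms_residual  # step 2
    se_diff = math.sqrt(variance)                     # step 3
    margin = t_critical * se_diff                     # step 5 (t from step 4)
    return diff - margin, diff + margin               # step 6
```

For example, with made-up inputs (a difference of 2, MSresidual of 1, two values per group) and the critical t for 2 degrees of freedom, about 4.303, the interval runs from about -2.30 to 6.30.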

Example
This Excel file works through an example.

How can one interpret a set of P values, which are not adjusted for multiple comparisons?

One has to be very cautious when interpreting a set of P values. It is very easy to get distracted by the smallest P value, and not take into account how large the set of P values is. It is often best to consider these multiple comparisons as a tool to generate hypotheses -- hypotheses that can be tested with future, more focussed, experiments.

Will Prism 5 do these calculations? Is there a free QuickCalc that does them?

No.
