## Frequently Asked Questions

# Planned comparisons after one-way ANOVA

FAQ# 1092 Last Modified 1-January-2009

**What are planned comparisons?**

In the context of one-way ANOVA, the term *planned comparison* is used when:

- You focus on a few scientifically sensible comparisons rather than every possible comparison.
- The choice of which comparisons to make was part of the experimental design.
- You did not succumb to the temptation to do more comparisons after looking at the data.

It is important to distinguish between comparisons that are preplanned and those that are not (post hoc). It is not a planned comparison if you first look at the data, and based on that peek decide to make only two comparisons. In that case, you implicitly compared all the groups.

**The advantage of planned comparisons**

By making only a limited number of comparisons, you increase the statistical power of each comparison.

**Choices when doing planned comparisons. A. Correct for multiple comparisons?**

There are two approaches to analyzing planned comparisons:

- Use the Bonferroni correction for multiple comparisons, but only correct for the number of comparisons that were planned. Don't count other possible comparisons that were not planned, and so not performed. In this case, the significance level (often set to 5%) applies to the family of comparisons, rather than to each individual comparison.
- Set the significance level (or the meaning of the confidence interval) for each individual comparison. The traditional 5% significance level applies to each individual comparison, rather than to the whole family of comparisons as it does when you correct for multiple comparisons.

The second approach has more power to detect true differences, but also has a higher chance of falsely declaring a difference to be "significant". In other words, the second approach has a higher chance of making a Type I error but a lower chance of making a Type II error.
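The tradeoff between the two approaches can be put into numbers. The following is a minimal sketch, not part of the original FAQ. The function names are hypothetical, and the family-wise calculation makes the simplifying assumption that the comparisons are independent and that every null hypothesis is true. Comparisons that share a group are not strictly independent, so treat the family-wise figures as approximate.

```python
def bonferroni_alpha(family_alpha: float, n_planned: int) -> float:
    """Per-comparison threshold that keeps the family-wise level at or
    below family_alpha. Only the planned comparisons are counted."""
    return family_alpha / n_planned


def familywise_error_rate(per_comparison_alpha: float, n_planned: int) -> float:
    """Chance of at least one Type I error across the family.
    ASSUMPTION: comparisons are independent and all null hypotheses are true."""
    return 1.0 - (1.0 - per_comparison_alpha) ** n_planned


if __name__ == "__main__":
    family_alpha = 0.05
    for k in (1, 2, 3, 5):
        corrected = bonferroni_alpha(family_alpha, k)
        print(
            f"{k} planned comparison(s): "
            f"Bonferroni threshold per comparison = {corrected:.4f}, "
            f"family-wise rate without correction = "
            f"{familywise_error_rate(family_alpha, k):.4f}, "
            f"with correction = {familywise_error_rate(corrected, k):.4f}"
        )
```

With two planned comparisons, for example, testing each at 5% gives roughly a 9.75% chance of at least one false positive under these assumptions, while the Bonferroni threshold of 2.5% per comparison keeps the family-wise rate just under 5%.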

What is the logic of not correcting for multiple comparisons? It seems that some statisticians think this extra power is a deserved bonus for planning the experiment carefully and focussing on only a few scientifically sensible comparisons. Keppel and Wickens advocate this approach (reference below). But they also warn it is not fair to "plan" to make all comparisons, and thus not correct for multiple comparisons.

I don't really understand the logic of that second approach. It makes perfect sense that if you only plan to make two comparisons, the multiple comparisons correction should account for only those two comparisons and not the many others you could have made. I don't see how it makes sense to get rid of the whole idea of multiple comparisons just because they were preplanned. There is an inherent tradeoff between protecting against Type I errors (declaring differences "statistically significant" when they were in fact just due to a coincidence of random sampling) and Type II errors (declaring a difference "not statistically significant" even when there really is a difference). There is no way to avoid that tradeoff. Creating arbitrary rules just for preplanned comparisons does not seem justified to me.

**Choices when doing planned comparisons. B. Include all the groups when computing scatter?**

Each comparison is made by dividing the difference between means by the standard error of that difference. Two alternative approaches can be used to compute that standard error:

- Do an ordinary t test, only using the data in the two groups being compared.
- Use the ANOVA results to account for the scatter in all the groups. ANOVA assumes that all the data are sampled from Gaussian populations and that the SD of each of those populations is identical. That latter assumption is called *homoscedasticity*. If that assumption is true, the scatter (variability) from all the groups can be pooled. The Mean Square Residual (also called Mean Square Error) of the ANOVA is used to compute the standard error of the difference.

If the assumption of homoscedasticity is valid, the second approach has more power. The calculation of the standard error is based on more data so is more precise. This shows up in the calculations as more degrees of freedom. But if that assumption is wrong, then pooling the scatter will give you an invalid measure of scatter.
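To make the two ways of computing the standard error concrete, here is a minimal sketch using only the Python standard library. The data values, group names, and function names are hypothetical and chosen for illustration. The sketch also assumes that "ordinary t test" means the equal-variance (Student) t test. It computes the t ratio and degrees of freedom for one comparison both ways, and leaves the P value calculation aside.

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical measurements for three groups (illustration only).
groups = {
    "control": [4.1, 5.0, 4.6, 5.3],
    "low": [5.9, 6.4, 5.2, 6.1],
    "high": [7.8, 6.9, 7.4, 8.1],
}


def two_group_t(a, b):
    """t ratio and df using only the scatter in the two groups compared.
    ASSUMPTION: equal-variance (Student) t test."""
    df = len(a) + len(b) - 2
    pooled_var = ((len(a) - 1) * variance(a) + (len(b) - 1) * variance(b)) / df
    se = sqrt(pooled_var * (1 / len(a) + 1 / len(b)))
    return (mean(a) - mean(b)) / se, df


def anova_pooled_t(all_groups, name_a, name_b):
    """t ratio and df when the standard error of the difference is built
    from the ANOVA Mean Square Residual, which pools scatter from all groups."""
    df = sum(len(g) - 1 for g in all_groups.values())
    ms_residual = sum((len(g) - 1) * variance(g) for g in all_groups.values()) / df
    a, b = all_groups[name_a], all_groups[name_b]
    se = sqrt(ms_residual * (1 / len(a) + 1 / len(b)))
    return (mean(a) - mean(b)) / se, df


if __name__ == "__main__":
    t1, df1 = two_group_t(groups["control"], groups["low"])
    t2, df2 = anova_pooled_t(groups, "control", "low")
    print(f"two groups only:  t = {t1:.3f}, df = {df1}")
    print(f"pooled via ANOVA: t = {t2:.3f}, df = {df2}")
```

The difference between means is the same in both calculations. Only the standard error and the degrees of freedom change, which is where the extra power of the pooled approach comes from when the groups really do share a common SD.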