
# When it makes sense to not correct for multiple comparisons

Multiple comparisons can be accounted for with Bonferroni and other corrections, or by controlling the False Discovery Rate. But these approaches are not always needed. Here are three situations where special calculations are not needed.

## Account for multiple comparisons when interpreting the results rather than in the calculations

Some statisticians recommend never correcting for multiple comparisons while analyzing data (1,2). Instead, report all of the individual P values and confidence intervals, and make it clear that no mathematical correction was made for multiple comparisons. This approach requires that all comparisons be reported. When you interpret these results, you need to informally account for multiple comparisons. If all the null hypotheses are true, you’d expect 5% of the comparisons to have uncorrected P values less than 0.05. Compare this expected number with the number of small P values you actually observed.
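The informal check described above can be sketched in a few lines of Python. This is only an illustration, not part of the cited recommendations: the function name `expected_small_p` and the example P values are hypothetical, and the binomial tail probability assumes the comparisons are independent, which is often not the case in practice.

```python
from math import comb

def expected_small_p(p_values, alpha=0.05):
    """Compare the observed count of small P values with the count
    expected if every null hypothesis were true."""
    k = len(p_values)
    observed = sum(p < alpha for p in p_values)
    expected = alpha * k
    # Chance of seeing at least `observed` small P values when all nulls
    # are true. ASSUMPTION: comparisons are independent (binomial model).
    tail = sum(
        comb(k, i) * alpha**i * (1 - alpha) ** (k - i)
        for i in range(observed, k + 1)
    )
    return observed, expected, tail

# Hypothetical uncorrected P values from 20 comparisons.
p_values = [0.001, 0.02, 0.03, 0.04, 0.08, 0.12, 0.15, 0.21, 0.25, 0.33,
            0.38, 0.41, 0.47, 0.52, 0.58, 0.66, 0.71, 0.79, 0.86, 0.94]

observed, expected, tail = expected_small_p(p_values)
print(f"observed: {observed}, expected under the null: {expected:.1f}")
print(f"chance of at least {observed} by luck alone: {tail:.3f}")
```

In this made-up example, four P values fall below 0.05 where about one would be expected, so chance alone is a poor explanation. The judgment remains informal; the calculation only supports it.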

Following ANOVA, the unprotected Fisher's Least Significant Difference test follows this approach.

## Corrections for multiple comparisons may not be needed if you make only a few planned comparisons

The term planned comparison is used when:

- You focus on a few scientifically sensible comparisons rather than every possible comparison.
- The choice of which comparisons to make was part of the experimental design.
- You did not succumb to the temptation to do more comparisons after looking at the data.

It is important to distinguish between comparisons that are preplanned and those that are not (post hoc). It is not a planned comparison if you first look at the data, and based on that peek decide to make only two comparisons. In that case, you implicitly compared all the groups.
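To see how many comparisons are implicitly in play when the choice is made after looking at the data, it can help to count the possible pairs. The sketch below is an illustration only; the group names and the helper `n_pairwise` are hypothetical.

```python
from itertools import combinations

def n_pairwise(n_groups):
    """Number of distinct pairwise comparisons among n_groups groups."""
    return n_groups * (n_groups - 1) // 2

# Hypothetical group labels for a four-group experiment.
groups = ["control", "dose A", "dose B", "dose C"]
all_pairs = list(combinations(groups, 2))

for first, second in all_pairs:
    print(f"{first} vs. {second}")
print(f"{len(all_pairs)} pairwise comparisons among {len(groups)} groups")
```

With four groups there are six possible pairs. Picking the two most striking differences after seeing the data is effectively a choice among all six.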

If you make only a few planned comparisons, some statistical texts recommend setting the significance level (or the meaning of the confidence interval) for each individual comparison without correction for multiple comparisons. In this case, the traditional 5% significance level applies to each individual comparison, rather than to the whole family of comparisons.
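The difference between a per-comparison and a familywise significance level can be made concrete with a short calculation. This is a minimal sketch: the function name `familywise_error_rate` is hypothetical, and the formula assumes that all null hypotheses are true and that the comparisons are independent.

```python
def familywise_error_rate(n_comparisons, alpha=0.05):
    """Chance of at least one false positive across the family when each
    comparison is tested at `alpha` with no correction.
    ASSUMPTION: all null hypotheses are true and comparisons are independent."""
    return 1 - (1 - alpha) ** n_comparisons

for k in (1, 3, 10):
    rate = familywise_error_rate(k)
    print(f"{k} comparison(s): familywise error rate = {rate:.3f}")
```

Under these assumptions, three uncorrected comparisons carry a familywise error rate of about 14%, and ten carry about 40%. This is the cost that is accepted when the 5% level is applied to each planned comparison individually.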

The rationale for not correcting for multiple comparisons seems to be that the extra power is a deserved bonus for planning the experiment carefully and focusing on only a few scientifically sensible comparisons. Keppel and Wickens advocate this approach (reference below). But they also warn that it is not fair to "plan" to make all comparisons and thus avoid correcting for multiple comparisons.

## Corrections for multiple comparisons are not needed when the comparisons are complementary

Ridker and colleagues (3) asked whether lowering LDL cholesterol would prevent heart disease in patients who did not have high LDL concentrations and did not have a prior history of heart disease (but did have an abnormal blood test suggesting the presence of some inflammatory disease). The study included almost 18,000 people. Half received a statin drug to lower LDL cholesterol and half received placebo.

The investigators' primary goal (planned as part of the protocol) was to compare the number of “end points” that occurred in the two groups, including deaths from a heart attack or stroke, nonfatal heart attacks or strokes, and hospitalization for chest pain. These events happened about half as often in people treated with the drug as in people taking placebo. The drug worked.

The investigators also analyzed each of the endpoints separately. Those taking the drug (compared to those taking placebo) had fewer deaths, and fewer heart attacks, and fewer strokes, and fewer hospitalizations for chest pain.

The data from various demographic groups were then analyzed separately. Separate analyses were done for men and women, old and young, smokers and nonsmokers, people with hypertension and without, people with a family history of heart disease and those without. In each of 25 subgroups, patients receiving the drug experienced fewer primary endpoints than those taking placebo, and all these effects were statistically significant.

The investigators made no correction for multiple comparisons for all these separate analyses of outcomes and subgroups. No corrections were needed, because the results are so consistent. The multiple comparisons each ask the same basic question in a different way (does the drug prevent disease?), and all the comparisons point to the same conclusion: people taking the drug had less cardiovascular disease than those taking placebo.

## References

1. Rothman, K.J. (1990). No adjustments are needed for multiple comparisons. Epidemiology, 1: 43-46.

2. Saville, D.J. (1990). Multiple Comparison Procedures: The Practical Solution. The American Statistician, 44: 174-180.

3. Ridker, P.M., et al. (2008). Rosuvastatin to Prevent Vascular Events in Men and Women with Elevated C-Reactive Protein. N Engl J Med, 359: 2195-2207.