Multiple comparisons
Interpreting an individual P value is easy. If the null hypothesis is true, the P value is the chance that random selection of subjects would result in a difference (or correlation or association...) as large as (or larger than) the one observed in your study. If the null hypothesis is true and you use the conventional threshold of 0.05, there is a 5% chance of randomly selecting subjects such that the trend is statistically significant.
However, many scientific studies generate more than one P value. Some studies in fact generate hundreds of P values. Interpreting multiple P values can be difficult.
If you test several independent null hypotheses, and leave the threshold at 0.05 for each comparison, there is greater than a 5% chance of obtaining at least one "statistically significant" result by chance. The second column in the table below shows you how much greater.
| Number of independent null hypotheses | Probability of obtaining one or more P values less than 0.05 by chance | Threshold to keep overall risk of type I error equal to 0.05 |
|---|---|---|
| 1 | 5% | 0.0500 |
| 2 | 10% | 0.0253 |
| 3 | 14% | 0.0170 |
| 4 | 19% | 0.0127 |
| 5 | 23% | 0.0102 |
| 6 | 26% | 0.0085 |
| 7 | 30% | 0.0073 |
| 8 | 34% | 0.0064 |
| 9 | 37% | 0.0057 |
| 10 | 40% | 0.0051 |
| 20 | 64% | 0.0026 |
| 50 | 92% | 0.0010 |
| 100 | 99% | 0.0005 |
| N | 100(1.00 - 0.95^N) | 1.00 - 0.95^(1/N) |

Note: "0.95^N" means 0.95 to the Nth power.
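The two formulas in the last row of the table can be checked with a few lines of Python. This is a minimal sketch, assuming independent comparisons as the table does; the function names `familywise_error_rate` and `per_comparison_threshold` are illustrative and not part of any statistics package.

```python
def familywise_error_rate(n_tests, alpha=0.05):
    """Chance of one or more P values below alpha when all n_tests
    independent null hypotheses are true: 1 - (1 - alpha)^N."""
    return 1.0 - (1.0 - alpha) ** n_tests


def per_comparison_threshold(n_tests, overall_alpha=0.05):
    """Threshold for each comparison that keeps the overall risk of a
    type I error at overall_alpha: 1 - (1 - overall_alpha)^(1/N)."""
    return 1.0 - (1.0 - overall_alpha) ** (1.0 / n_tests)


# Print a few rows in the same layout as the table (hypothetical selection of N).
for n in (1, 2, 3, 5, 10, 20, 50, 100):
    print(f"{n:>3}  {familywise_error_rate(n):6.0%}  {per_comparison_threshold(n):.4f}")
```

Rounded to the precision used in the table, the printed values should agree with the second and third columns.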
To maintain the chance of randomly obtaining at least one statistically significant result at 5%, you need to set a stricter (lower) threshold for each individual comparison. This is tabulated in the third column of the table. If you only conclude that a difference is statistically significant when a P value is less than this value, then you'll have only a 5% chance of finding any "significant" difference by chance among all the comparisons.
For example, if you test three null hypotheses and use the traditional cutoff of alpha=0.05 for declaring each P value to be significant, there would be a 14% chance of observing one or more significant P values, even if all three null hypotheses were true. To keep the overall chance at 5%, you need to lower the threshold for significance to 0.0170.
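The three-hypothesis example can also be checked by simulation. The sketch below assumes that all three null hypotheses are true and that the tests are independent, so each P value is drawn uniformly between 0 and 1. The function name, the number of simulated studies, and the seed are arbitrary choices made for illustration.

```python
import random


def simulate_any_significant(n_tests=3, threshold=0.05, n_studies=100_000, seed=0):
    """Fraction of simulated studies with at least one P value below threshold.

    Assumes every null hypothesis is true and the tests are independent,
    so each P value is uniform on [0, 1).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_studies):
        # A study counts as a "hit" if any of its P values is below the threshold.
        if any(rng.random() < threshold for _ in range(n_tests)):
            hits += 1
    return hits / n_studies


print(simulate_any_significant(threshold=0.05))    # close to 0.14
print(simulate_any_significant(threshold=0.0170))  # close to 0.05
```

Because the result is a random estimate, it will differ slightly from the exact values of 14% and 5%, and the gap shrinks as `n_studies` grows.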
If you compare three or more groups, account for multiple comparisons using post tests following one-way ANOVA. These methods account both for multiple comparisons and for the fact that the comparisons are not independent. See How post tests work.
You can only account for multiple comparisons when you know about all the comparisons made by the investigators. If you report only "significant" differences, without reporting the total number of comparisons, others will not be able to properly evaluate your results. Ideally, you should plan all your analyses before collecting data, and then report all the results.
Distinguish studies that test a hypothesis from studies that generate one. Exploratory analyses of large databases can generate hundreds of P values, and scanning these can generate intriguing research hypotheses. To test these hypotheses, you'll need a different set of data.
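To see why an exploratory scan needs confirmation on new data, it helps to count how many "significant" P values chance alone would produce. The sketch below assumes that every null hypothesis is true; the function name and the figure of 200 comparisons are hypothetical and chosen only for illustration.

```python
def expected_chance_findings(n_tests, alpha=0.05):
    """Average number of P values below alpha when every null hypothesis is true."""
    return n_tests * alpha


# A hypothetical exploratory scan of 200 comparisons.
print(expected_chance_findings(200))  # about 10
```

An exploratory analysis of this size would therefore be expected to turn up roughly ten "significant" results even if no real effect existed.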