Frequently Asked Questions
Multiple comparisons of survival curves
FAQ# 1655 Last Modified 17-September-2010
When you compare three or more survival curves at once, Prism reports a single P value testing the null hypothesis that all the samples come from populations with identical survival, and that all differences are due to chance. You may also want to drill down and compare curves two at a time.
The need to adjust for multiple comparisons
If you don't adjust for multiple comparisons, it is easy to fool yourself. If you compare many groups, the chances are high that one or more pairs of groups will be 'significantly different' purely due to chance. To protect yourself from making this mistake, you probably should correct for multiple comparisons. Probably? There certainly are arguments for not adjusting for multiple comparisons.
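To get a feel for how quickly the risk grows, here is a rough sketch in Python. It assumes the comparisons are independent, which is only an approximation for pairwise comparisons that share groups, and the function name is purely illustrative.

```python
def chance_of_false_positive(k, alpha=0.05):
    """Chance that at least one of k comparisons has P < alpha when every
    null hypothesis is true. Assumes the k comparisons are independent,
    which is only approximately true for pairwise comparisons."""
    return 1.0 - (1.0 - alpha) ** k


# Print the chance of at least one false positive for several values of k.
for k in (1, 3, 6, 10):
    print(f"{k:2d} comparisons: {chance_of_false_positive(k):.3f}")
```

Under that independence assumption, six unadjusted comparisons at the 0.05 level carry roughly a 26% chance of at least one 'significant' result arising by chance alone.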
How multiple comparisons of survival curves work
Multiple comparison tests after ANOVA are complicated because they not only use a stricter threshold for significance, but also include data from all groups when computing scatter (pooled SD, or Mean Square within), and use this value with every comparison. By quantifying variability from all groups, not just the two you are comparing, you gain some degrees of freedom and thus some power.
Multiple comparison tests for comparing survival curves are simpler. You simply have to adjust the definition of significance, and don't need to take into account any information about the groups not in the comparison (as that information would not be helpful).
Comparing survival curves two at a time with Prism
For each pair of groups you wish to compare, follow these steps:
- Start from the results sheet that compares all groups.
- Click New, and then Duplicate Current Sheet.
- The Analyze dialog will pop up. On the right side, select the two groups you wish to compare and make sure all other data sets are unselected. Then click OK.
- The parameters dialog for survival analysis pops up. Click OK without changing anything.
- Note the P value (from the logrank or Gehan-Breslow-Wilcoxon test), but don't interpret it until you correct for multiple comparisons, as explained in the next section.
- Repeat the steps for each comparison if you want each to be in its own results sheet. Or click Change.. data analyzed, and choose a different pair of data sets.
Which comparisons are 'statistically significant'?
When you are comparing multiple pairs of groups at once, you can't interpret the individual P values in the usual way. Instead, you set a significance level, and ask which comparisons are 'statistically significant' using that threshold.
The simplest approach is to use the Bonferroni method:
- Define the significance level that you want to apply to the entire family of comparisons. This is conventionally set to 0.05.
- Count the number of comparisons you are making, and call this value K. See the next section, which discusses some ambiguities.
- Compute the Bonferroni-corrected threshold that you will use for each individual comparison. This equals the family-wise significance level (defined in the first step above, usually 0.05) divided by K.
- If a P value is less than this Bonferroni-corrected threshold, then the comparison can be said to be 'statistically significant'.
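The Bonferroni steps above can be sketched in a few lines of Python. This is only an illustration: the function name is made up, and the P values are hypothetical numbers standing in for the ones you would note from each pairwise results sheet.

```python
def bonferroni_significant(p_values, family_alpha=0.05):
    """Apply the Bonferroni method to a dict of {comparison: P value}.

    Returns the per-comparison threshold and a dict saying which
    comparisons fall below it.
    """
    k = len(p_values)  # number of comparisons, K
    if k == 0:
        raise ValueError("need at least one comparison")
    threshold = family_alpha / k  # Bonferroni-corrected threshold
    verdicts = {name: p < threshold for name, p in p_values.items()}
    return threshold, verdicts


# Hypothetical P values from three pairwise logrank tests.
example = {"A vs B": 0.004, "A vs C": 0.030, "B vs C": 0.200}
threshold, verdicts = bonferroni_significant(example)
print(f"threshold = {threshold:.4f}")
for name, significant in verdicts.items():
    print(name, "significant" if significant else "not significant")
```

With three comparisons the threshold is 0.05 / 3, or about 0.0167, so in this made-up example only the first comparison would be called 'statistically significant', even though the second has a P value below 0.05.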
How many comparisons are you making?
You must be honest about the number of comparisons you are making. Say there are four treatment groups (including control), and after looking at the results you compare the group with the longest survival with the group with the shortest survival. It is not fair to say that you are only making one comparison, since you couldn't decide which comparison to make without looking at all the data. With four groups, there are six pairwise comparisons you could make. You have implicitly made all these comparisons, so you should set K in the Bonferroni calculation above equal to 6.
If you were only interested in comparing each of three treatments to the control, and weren't interested in comparing the treatments with each other, then you would be making three comparisons, so should set K equal to 3.
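The two ways of counting can be checked with a short Python sketch. The group names are placeholders, not names from any particular data table.

```python
from itertools import combinations
from math import comb

# Placeholder group names: one control plus three treatments.
groups = ["Control", "Treatment 1", "Treatment 2", "Treatment 3"]

# Every possible pairwise comparison among the four groups.
all_pairs = list(combinations(groups, 2))

# Only the comparisons of each treatment against the control.
vs_control = [("Control", g) for g in groups if g != "Control"]

print("All pairwise comparisons: K =", len(all_pairs))
print("Each treatment vs. control: K =", len(vs_control))
print("Check with n(n-1)/2:", comb(len(groups), 2))
```

In general, n groups allow n(n-1)/2 pairwise comparisons, while comparing each treatment only to the control gives n-1 comparisons.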