Multiple comparisons of survival curves
When you compare three or more survival curves at once, Prism reports a single P value testing the null hypothesis that all the samples come from populations with identical survival, and that all differences are due to chance. You may also want to drill down and perform pairwise comparisons (comparing curves two at a time). The term "multiple comparisons" refers to performing many such pairwise comparisons: comparing many survival curves to a single "control" curve, comparing each curve to every other curve, or comparing specific pairs of curves from among the entire set. Starting with version 10.5, Prism can perform all of these kinds of comparisons and provides the controls needed to adjust (correct) the results for multiple comparisons*.
The need to adjust for multiple comparisons
If you don't adjust for multiple comparisons, it is easy to fool yourself. If you compare many groups, the chances are high that one or more pairs of groups will be 'significantly different' purely due to chance. To protect yourself from making this mistake, you probably should correct for multiple comparisons. Probably? There certainly are arguments for not adjusting for multiple comparisons. However, under most circumstances, correcting for multiple comparisons is the standard and most widely accepted approach.
How multiple comparisons of survival curves work
Multiple comparison tests after ANOVA are complicated because they not only use a stricter threshold for significance, but also include data from all groups when computing scatter (pooled SD, or Mean Square within), and use this value with every comparison. By quantifying variability from all groups, not just the two you are comparing, you gain some degrees of freedom and thus some power.
Multiple comparison tests for comparing survival curves are simpler. Each comparison uses only the data from the two groups being compared (information from the other groups would not be helpful), so all you need to do is adjust the definition of statistical significance.
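To make this concrete, here is a minimal sketch in Python using the open-source lifelines library (not Prism's own implementation; the group names and data are invented for illustration). Each logrank test uses only the two groups being compared, and the only adjustment for multiplicity is a stricter per-comparison threshold.

```python
# A sketch of pairwise survival comparisons: each logrank test uses only the
# data from the two groups involved; multiplicity is handled by tightening the
# per-comparison significance threshold (Bonferroni).
from itertools import combinations
from lifelines.statistics import logrank_test

# Hypothetical durations (months) and event indicators (1 = death, 0 = censored)
groups = {
    "Control":     ([5, 8, 12, 16, 20, 24], [1, 1, 1, 0, 1, 0]),
    "Treatment A": ([9, 14, 18, 22, 30, 36], [1, 1, 0, 1, 0, 0]),
    "Treatment B": ([6, 10, 11, 15, 19, 25], [1, 1, 1, 1, 0, 1]),
}

pairs = list(combinations(groups, 2))
alpha_per_comparison = 0.05 / len(pairs)   # Bonferroni: family-wise alpha divided by K

for name1, name2 in pairs:
    t1, e1 = groups[name1]
    t2, e2 = groups[name2]
    result = logrank_test(t1, t2, event_observed_A=e1, event_observed_B=e2)
    verdict = "significant" if result.p_value < alpha_per_comparison else "not significant"
    print(f"{name1} vs {name2}: P = {result.p_value:.4f} ({verdict} at threshold {alpha_per_comparison:.4f})")
```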
Performing multiple comparisons of survival curves automatically in Prism
Beginning with Prism version 10.5, you can specify which multiple comparisons you'd like to perform as part of Kaplan-Meier survival analysis, along with the correction method that you'd like to use*. You can choose whether to correct for multiple comparisons using statistical hypothesis testing or with methods designed to control the false discovery rate (FDR). Simply specify which comparisons to make, and Prism will carry out all of the calculations appropriately.
Details on each of the options on this dialog tab are given in the Prism Statistics Guide.
Performing calculations manually
While we do not recommend adjusting P values for multiple comparisons manually, it is possible to use Prism to calculate the individual P values and then apply the correction method yourself (a sketch of that final correction step appears after the list below). For each pair of groups you wish to compare, follow these steps:
- Start from the results sheet that compares all groups.
- Click New, and then Duplicate Current Sheet.
- The Analyze dialog will pop up. On the right side, select the two groups you wish to compare and make sure all other data sets are unselected. Then click OK.
- The parameters dialog for survival analysis pops up. Click OK without changing anything.
- Note the P value (from the logrank or Gehan-Breslow-Wilcoxon test), but don't interpret it until you correct for multiple comparisons, as explained in the next section.
- Repeat these steps for each comparison if you want each to be in its own results sheet, or click Change... data analyzed and choose a different pair of data sets.
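Once you have noted the uncorrected P value from each pairwise analysis, the correction itself is simple arithmetic. A hedged sketch in Python, assuming the statsmodels library and using invented P values, shows both a family-wise correction (Bonferroni, described in the next section) and a false discovery rate approach (Benjamini-Hochberg):

```python
# Manual correction of the P values noted from each pairwise Prism analysis.
# The P values below are invented for illustration.
from statsmodels.stats.multitest import multipletests

p_values = [0.003, 0.021, 0.180]   # one uncorrected P value per comparison (hypothetical)

# Control the family-wise error rate with the Bonferroni method
reject_bonf, p_bonf, _, _ = multipletests(p_values, alpha=0.05, method="bonferroni")

# Or control the false discovery rate with the Benjamini-Hochberg method
reject_fdr, p_fdr, _, _ = multipletests(p_values, alpha=0.05, method="fdr_bh")

print("Bonferroni-adjusted P values:", p_bonf, "reject null:", reject_bonf)
print("FDR (BH)-adjusted P values:  ", p_fdr, "reject null:", reject_fdr)
```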
Which comparisons are 'statistically significant'?
When you are comparing multiple pairs of groups at once, you can't interpret the individual P values in the usual way. Instead, you set a significance level for the entire family of comparisons, and ask which comparisons are 'statistically significant' using that threshold.
The simplest approach is to use the Bonferroni method (a worked example follows the steps below):
- Define the significance level that you want to apply to the entire family of comparisons. This is conventionally set to 0.05.
- Count the number of comparisons you are making, and call this value K. See the next section, which discusses some ambiguities in counting comparisons.
- Compute the Bonferroni-corrected threshold that you will use for each individual comparison. This equals the family-wise significance level (defined in step 1 above, usually 0.05) divided by K.
- If a P value is less than this Bonferroni-corrected threshold, then the comparison can be said to be 'statistically significant'.
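As a worked illustration of these steps, assume a family-wise significance level of 0.05 and K = 6 comparisons among four groups; the P values below are invented:

```python
# Bonferroni threshold approach: divide the family-wise alpha by the number of comparisons
family_alpha = 0.05
K = 6                          # number of pairwise comparisons among four groups
threshold = family_alpha / K   # 0.05 / 6, about 0.0083

# Hypothetical uncorrected P values from the six pairwise logrank tests
p_values = {"A vs B": 0.0021, "A vs C": 0.0410, "A vs D": 0.0079,
            "B vs C": 0.2300, "B vs D": 0.0500, "C vs D": 0.6100}

for comparison, p in p_values.items():
    verdict = "significant" if p < threshold else "not significant"
    print(f"{comparison}: P = {p:.4f} -> {verdict} (threshold {threshold:.4f})")
```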
How many comparisons are you making?
You must be honest about the number of comparisons you are making. Say there are four treatment groups (including the control). After looking at the data, you compare the group with the longest survival to the group with the shortest survival. It is not fair to say that you are only making one comparison, since you couldn't have decided which comparison to make without looking at all the data. With four groups, there are six pairwise comparisons you could make. You have implicitly made all of these comparisons, so you should set K (counted in step 2 above) equal to 6.
If you were only interested in comparing each of three treatments to the control, and weren't interested in comparing the treatments with each other, then you would be making three comparisons, so you should set K equal to 3.
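The two counting rules above follow directly from the study design. A short sketch (group names are hypothetical) showing both counts for a four-group study:

```python
# Counting comparisons for four groups: every pair, or each treatment against the control
from itertools import combinations

groups = ["Control", "Treatment 1", "Treatment 2", "Treatment 3"]   # hypothetical names

all_pairs = list(combinations(groups, 2))
print("All pairwise comparisons, K =", len(all_pairs))      # 4 * 3 / 2 = 6

vs_control = [(g, "Control") for g in groups if g != "Control"]
print("Each treatment vs. control, K =", len(vs_control))   # 4 - 1 = 3
```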
* Requires a named-user Prism license (not available for serial number-based Prism licenses)