Interpreting survival analyses
How to think about survival curve results
As with any analysis, you need to look both at the P value and at the confidence intervals.
This section applies only if you are comparing two survival curves. The logrank test computes a P value that answers this question: If the two populations have identical survival curves overall, what is the chance that random sampling of subjects would lead to a difference in survival as big as (or bigger than) the one you observed?
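To make the question behind the P value concrete, here is a minimal sketch of a two-group logrank test in plain Python. It uses the standard textbook form (observed minus expected deaths, with a hypergeometric variance, compared against a chi-square distribution with one degree of freedom). The function name `logrank_test`, the data layout, and the sample values are assumptions for illustration, not a description of how Prism implements the test.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group logrank test. events: 1 = death observed, 0 = censored."""
    # Pool all subjects, tagging each with its group (0 = A, 1 = B).
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e == 1})

    observed_a = expected_a = variance = 0.0
    for t in event_times:
        # Subjects at risk at time t (a subject censored at t still counts).
        n_a = sum(1 for ti, _, g in data if ti >= t and g == 0)
        n_b = sum(1 for ti, _, g in data if ti >= t and g == 1)
        d_a = sum(1 for ti, e, g in data if ti == t and e == 1 and g == 0)
        d_b = sum(1 for ti, e, g in data if ti == t and e == 1 and g == 1)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n  # deaths expected in A if the curves are identical
        if n > 1:
            variance += n_a * n_b * d * (n - d) / (n * n * (n - 1))

    # Raises ZeroDivisionError if the variance is zero (for example, no deaths).
    chi2 = (observed_a - expected_a) ** 2 / variance
    p_value = math.erfc(math.sqrt(chi2 / 2))  # chi-square upper tail, 1 df
    return chi2, p_value


# Made-up survival times (in months); every subject died in this toy example.
chi2, p = logrank_test([1, 2, 3, 4, 5], [1, 1, 1, 1, 1],
                       [10, 11, 12, 13, 14], [1, 1, 1, 1, 1])
print(f"chi-square = {chi2:.2f}, P = {p:.4f}")
```

A small P value here says only that such a large gap between observed and expected deaths would be rare if the two populations had identical survival curves.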
To interpret the P value, you also need to consider a confidence interval. Prism reports two confidence intervals: the confidence interval for the ratio of median survival and the confidence interval for the hazard ratio. The discussion below assumes that you are focusing on the ratio of median survival times, but you can use the same logic if you focus on the hazard ratio.
If the difference is statistically significant - the P value is small (logrank test)
If the P value is small, then it is unlikely that the difference you observed is due to a coincidence of random sampling. You can reject the idea that the two populations have identical survival characteristics.
Because of random variation, the ratio of median survival times (or the hazard ratio) between the two groups in this experiment is unlikely to equal the true ratio. There is no way to know what that true ratio is. Prism presents the uncertainty as a 95% confidence interval for the ratio of median survival times (and the hazard ratio). You can be 95% sure that this interval contains the true ratio.
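As an illustration of where such an interval can come from, the sketch below computes a hazard ratio and an approximate 95% confidence interval from the summary quantities of a logrank test. It uses a common one-step (Mantel-Haenszel or Peto style) approximation, where the log hazard ratio is (observed - expected) / variance with standard error 1 / sqrt(variance). The function name and the input values are assumptions, and this text does not state which formula Prism uses.

```python
import math


def hazard_ratio_ci(observed, expected, variance, z=1.96):
    """Approximate hazard ratio and CI from logrank summary statistics.

    observed, expected: deaths observed and expected in one group.
    variance: the logrank variance of (observed - expected).
    z: normal quantile; 1.96 gives an approximate 95% interval.
    """
    log_hr = (observed - expected) / variance
    se = 1.0 / math.sqrt(variance)
    lower = math.exp(log_hr - z * se)
    upper = math.exp(log_hr + z * se)
    return math.exp(log_hr), lower, upper


# Hypothetical summary values, for illustration only.
hr, lower, upper = hazard_ratio_ci(observed=12, expected=8.5, variance=4.2)
print(f"hazard ratio = {hr:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

The interval is symmetric on the log scale, which is why the limits are not equally far from the point estimate on the ratio scale.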
To interpret the results in a scientific context, look at both ends of the confidence interval and ask whether they represent a ratio far enough from 1.0 to be scientifically important. How far is "far"? How close is "close"? That depends on the reasons you did the experiment. In some cases, you may think that a 10% increase or decrease in median survival would be very important. In other cases you may care only about a 50% change.
| Lower confidence limit | Upper confidence limit | Conclusion |
| Close to 1.0 | Close to 1.0 | Although the true ratio is not 1.0 (since the P value is low), the difference between median survival times is tiny and uninteresting. The treatment had an effect, but a small one. |
| Close to 1.0 | Far from 1.0 | Since the confidence interval ranges from a ratio that you think is trivially different from 1.0 to one you think is far from 1.0, you can't reach a strong conclusion from your data. You can conclude that the median survival times are different, but you don't know whether the size of that difference is scientifically trivial or important. You'll need more data to obtain a clear conclusion. |
| Far from 1.0 | Far from 1.0 | Since even the low end of the confidence interval represents a ratio far from 1.0 and thus considered biologically important, you can conclude that there is a difference between median survival times, and that the difference is large enough to be scientifically relevant. |
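The decision logic for a statistically significant result can be sketched as a small function. The `threshold` parameter is a hypothetical stand-in for your own scientific judgment about how far from 1.0 is "far" (0.10 corresponds to a 10% change); neither the function nor the cutoff is part of Prism.

```python
def interpret_significant(lower, upper, threshold=0.10):
    """Classify a 95% CI for a ratio when the P value is small.

    A limit counts as "close to 1.0" if it lies within `threshold` of 1.0.
    The interval is expected to exclude 1.0, since the result is significant.
    """
    if lower <= 1.0 <= upper:
        raise ValueError("interval includes 1.0; result is not significant")
    nearest = min(abs(lower - 1.0), abs(upper - 1.0))
    farthest = max(abs(lower - 1.0), abs(upper - 1.0))
    if farthest < threshold:
        return "real but trivial effect"
    if nearest < threshold:
        return "real effect of uncertain size; more data needed"
    return "real effect, large enough to be scientifically relevant"


print(interpret_significant(1.02, 1.08))
print(interpret_significant(1.05, 1.90))
print(interpret_significant(1.50, 2.40))
```

Working with the nearest and farthest limit, rather than lower and upper, lets the same logic handle ratios below 1.0, where the upper limit is the one closest to 1.0.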
If the difference is not statistically significant - the P value is large (logrank test)
If the P value from the logrank test is large, the data do not give you any reason to conclude that the median survival times of the two groups are really different. Even if the true median survival times were equal, you would not be surprised to find medians this far apart just by coincidence. This is not the same as saying that the true median times are the same. You just don't have evidence that they differ.
How large could the true difference really be? Because of random variation, the ratio of median survival times in this experiment is unlikely to equal the true ratio in the entire population. There is no way to know what that true ratio is. Prism presents the uncertainty as a 95% confidence interval. You can be 95% sure that this interval contains the true ratio of median survival times. When the P value is larger than 0.05, the 95% confidence interval will start with a ratio below 1.0 (representing a decrease) and go up to a ratio greater than 1.0 (representing an increase).
To interpret the results in a scientific context, look at both ends of the confidence interval, and ask whether they are close to or far from 1.0.
| Lower confidence limit | Upper confidence limit | Conclusion |
| Close to 1.0 | Close to 1.0 | You can reach a crisp conclusion. Either the median survival times really are the same, or they are different by a trivial amount. At most, the true difference between median survival times is tiny and uninteresting. |
| Close to 1.0 | Far from 1.0 | You can't reach a strong conclusion. The data are consistent with the treatment causing a trivial decrease, no change, or a large increase. To reach a clear conclusion, you need to repeat the experiment with more subjects. |
| Far from 1.0 | Close to 1.0 | You can't reach a strong conclusion. The data are consistent with a trivial increase, no change, or a decrease that may be large enough to be important. You can't make a clear conclusion without repeating the experiment with more subjects. |
| Far from 1.0 | Far from 1.0 | You cannot reach any conclusion. Your data are consistent with no change, or a difference in either direction that may be large enough to be scientifically relevant. You can't make a clear conclusion without repeating the experiment with more subjects. |
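For a result that is not statistically significant, the four cases can be written out as a short function. This is only a sketch: `threshold` is a hypothetical cutoff for "close to 1.0" that you would set from the goals of your experiment, and the returned strings paraphrase the conclusions listed here.

```python
def interpret_not_significant(lower, upper, threshold=0.10):
    """Classify a 95% CI for a ratio when the P value is large.

    The interval is expected to span 1.0. A limit is "far from 1.0"
    if it differs from 1.0 by at least `threshold`.
    """
    if not lower < 1.0 < upper:
        raise ValueError("expected a confidence interval that spans 1.0")
    lower_far = (1.0 - lower) >= threshold
    upper_far = (upper - 1.0) >= threshold
    if not lower_far and not upper_far:
        return "no difference, or only a trivial one"
    if not lower_far and upper_far:
        return "trivial decrease, no change, or a large increase; need more subjects"
    if lower_far and not upper_far:
        return "trivial increase, no change, or a large decrease; need more subjects"
    return "no conclusion possible; need more subjects"


print(interpret_not_significant(0.95, 1.06))
print(interpret_not_significant(0.50, 1.90))
```

Only the first case supports a firm negative conclusion; in the other three, the interval is too wide to rule out an important effect.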
Checklist for interpreting survival analyses
| Question | Discussion |
| Are the subjects independent? | Factors that influence survival should either affect all subjects in a group, or just one subject. If the survival of several subjects is linked, then you don't have independent observations. For example, if the study pools data from two hospitals, the subjects are not independent, because subjects from one hospital may have different average survival times than subjects from the other. You could alter the median survival curve by choosing more subjects from one hospital and fewer from the other. To analyze these data, use Cox proportional hazards regression, which Prism cannot perform. |
| Were the entry criteria consistent? | Typically, subjects are enrolled over a period of months or years. In these studies, it is important that the starting criteria don't change during the enrollment period. Imagine a cancer survival curve starting from the date that the first metastasis was detected. What would happen if improved diagnostic technology detected metastases earlier? Even with no change in therapy or in the natural history of the disease, survival time will apparently increase. Here's why: Patients die at the same age they otherwise would, but are diagnosed when they are younger, and so live longer with the diagnosis. |
| Was the end point defined consistently? | If the curve is plotting time to death, then there can be ambiguity about which deaths to count. In a cancer trial, for example, what happens to subjects who die in an automobile accident? Some investigators count these as deaths, others count them as censored subjects. Both approaches can be justified, but the approach should be decided before the study begins. If there is any ambiguity about which deaths to count, the decision should be made by someone who doesn't know which patient is in which treatment group. If the curve plots time to an event other than death, it is crucial that the event be assessed consistently throughout the study. |
| Is time of censoring unrelated to survival? | The survival analysis is valid only when the survival times of censored patients are identical to the survival of subjects who stayed with the study. If a large fraction of subjects are censored, the validity of this assumption is critical to the integrity of the results. There is no reason to doubt that assumption for patients still alive at the end of the study. When patients drop out of the study, you should ask whether the reason could affect survival. A survival curve would be misleading, for example, if many patients quit the study because they were too sick to come to clinic, or because they felt too well to take medication. |
| Does average survival stay constant during the course of the study? | Many survival studies enroll subjects over a period of several years. The analysis is only meaningful if you can assume that the average survival of the first few subjects is not different from the average survival of the last few subjects. If the nature of the disease or the treatment changes during the study, the results will be difficult to interpret. |
| Is the assumption of proportional hazards reasonable? | The logrank test is only strictly valid when the survival curves have proportional hazards. This means that the rate of dying in one group is a constant fraction of the rate of dying in the other group. This assumption has proven to be reasonable for many situations. It would not be reasonable, for example, if you are comparing a medical therapy with a risky surgical therapy. At early times, the rate of dying might be much higher in the surgical group. At later times the rate of dying might be greater in the medical group. Since the hazard ratio is not consistent over time (the assumption of proportional hazards is not reasonable), these data should not be analyzed with a logrank test. |
| Were the treatment groups defined before data collection began? | It is not valid to divide a single group of patients (all treated the same) into two groups based on whether they responded to treatment (tumor got smaller, lab tests got better). By definition, the responders must have lived long enough to see the response. And they may have lived longer anyway, regardless of treatment. When you compare groups, the groups must be defined before data collection begins. |
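The proportional hazards question in the checklist can be explored with a rough descriptive check: estimate a crude hazard (deaths per unit of person-time) for each group in an early and a late period, and see whether the ratio stays roughly constant. The sketch below is not a formal test, and the function names, the split point, and the surgical-versus-medical data are all made up for illustration.

```python
def crude_hazard(times, events, start, end):
    """Deaths per unit of person-time in the window (start, end]."""
    deaths = 0
    person_time = 0.0
    for t, died in zip(times, events):
        if t <= start:
            continue  # subject died or was censored before the window opened
        person_time += min(t, end) - start
        if died and t <= end:
            deaths += 1
    return deaths / person_time


def hazard_ratio_by_period(times_a, events_a, times_b, events_b, split, end):
    """Crude hazard ratio (group A / group B) before and after `split`.

    Raises ZeroDivisionError if group B has no deaths in a period.
    """
    early = (crude_hazard(times_a, events_a, 0, split)
             / crude_hazard(times_b, events_b, 0, split))
    late = (crude_hazard(times_a, events_a, split, end)
            / crude_hazard(times_b, events_b, split, end))
    return early, late


# Made-up data (months): surgery has early deaths, then long survival.
surgery_t = [1, 1, 2, 30, 36, 40, 40, 40]
surgery_e = [1, 1, 1, 1, 0, 0, 0, 0]
medical_t = [4, 14, 18, 22, 25, 28, 33, 40]
medical_e = [1, 1, 1, 1, 1, 1, 1, 0]

early, late = hazard_ratio_by_period(surgery_t, surgery_e,
                                     medical_t, medical_e, split=6, end=40)
print(f"early hazard ratio = {early:.2f}, late hazard ratio = {late:.2f}")
```

If the early and late ratios fall on opposite sides of 1.0, as in this toy example, the hazards are clearly not proportional and a single hazard ratio would be misleading.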