This page describes the first edition of Intuitive Biostatistics. A completely revised second edition is now available.

Intuitive Biostatistics: Survival Curves

This is chapter 6 of Intuitive Biostatistics (ISBN 0-19-508607-4) by Harvey Motulsky. Copyright © 1995 by Oxford University Press Inc. All rights reserved. You may order the book from GraphPad Software with a software purchase, from any academic bookstore, or from amazon.com.

We have discussed how to quantify uncertainty for outcomes expressed as proportions (Chapter 2) or as measurements (Chapters 3 to 5). However, in many clinical studies, the outcome is survival time. Analysis of survival data is trickier than you might first imagine.
What's wrong with calculating the mean survival time and its confidence interval (CI)? This approach is rarely useful. One problem is that you can't calculate the mean survival time until you know the survival time for each patient, which means you can't analyze the data until the last patient has died. Another problem is that survival times are unlikely to follow a Gaussian distribution. For these and other reasons, special methods must be used for analysis of survival data.

A SIMPLE SURVIVAL CURVE

Survival curves plot percent survival as a function of time. Figure 6.1 shows a simple survival curve. Fifteen patients were followed for 36 months. Nine patients died at known times, and six were still alive at the end of the study.
Time zero is not some specified calendar date; rather, it is the time that each patient entered the study. In many clinical studies, "time zero" spans several calendar years as patients are enrolled. At time zero, by definition, all patients are alive, so Y = 100%. Whenever a patient dies, the percent surviving decreases. If the study (and thus the X axis) were extended far enough, Y would eventually reach 0. This study ended at 36 months with 40% (6/15) of the patients still alive.

Each patient's death is clearly visible as a downward jump in the curve. When the first patient died, the percent survival dropped from 100.0% to 93.3% (14/15). When the next patient died, the percent survival dropped again to 86.7%. At 19 months, two patients died, so the downward step is larger.

The term survival curve is a bit misleading, as "survival" curves can plot time to any well-defined end point, such as occlusion of a vascular graft, date of first metastasis, or rejection of a transplanted kidney. The event does not have to be dire. The event could be restoration of renal function, discharge from a hospital, or graduation. The event must be a one-time event. Recurring events should not be analyzed with survival curves.

CENSORED SURVIVAL DATA

In the previous example, we knew that all subjects either died before 36 months or survived longer than 36 months (the right end of our curve). Real data are rarely so simple. In most survival studies, some surviving subjects are not followed for the entire span of the curve. This can happen in two ways:

Some subjects are still alive at the end of the study but were not followed for the entire span of the curve. Many studies enroll patients over a period of several years. The patients who enroll later are not followed for as many years as patients who enroll early. Imagine a study that enrolls patients between 1985 and 1989, and that ends in 1991. Patient A enrolled in 1989 and is still alive at the end of the study.
Even though the study lasted 6 years, we only know that patient A survived at least 3 years. Other subjects stop following the experimental protocol before the study ends.

In either case, you know that the subject survived up to a certain time but have no useful information about what happened after that. Information about these patients is said to be censored. Before the censored time, you know they were alive and following the experimental protocol, so these subjects contribute useful information. After they are censored, you can't use any information on the subjects. Either we don't have information beyond the censoring day (because the data weren't or can't be collected) or we have information but can't use it (because the patient was no longer following the experimental protocol).

The word censor has a negative ring to it. It sounds like the subject has done something bad. Not so. It's the data that have been censored, not the subject!

CREATING A SURVIVAL CURVE

There are two slightly different methods to create a survival curve. With the actuarial method, the X axis is divided up into regular intervals, perhaps months or years, and survival is calculated for each interval. With the Kaplan-Meier method, survival is recalculated every time a patient dies. This method is preferred, unless the number of patients is huge. The term life-table analysis is used inconsistently, but usually includes both methods. You should recognize all three names.

The Kaplan-Meier method is logically simple but tedious. Since computer programs can do the calculations for you, the details will not be presented here. The idea is pretty simple. To calculate the fraction of patients who survived on a particular day, simply divide the number alive at the end of the day by the number alive at the beginning of the day (excluding any who were censored on that day from both the numerator and denominator). This gives you the fraction of patients who were alive at the beginning of a particular day who were still alive at the beginning of the next day.
To calculate the fraction of patients who survive from day 0 until a particular day, multiply the fraction of patients who survive day 1, times the fraction of those patients who survive day 2, times the fraction of those patients who survive day 3 ... times the fraction who survive day k. This method automatically accounts for censored patients, as both the numerator and denominator are reduced on the day a patient is censored. Because we calculate the product of many survival fractions, this method is also called the product-limit method. Note that day refers to day of the study, not a particular day on the calendar. Day 1 is the first day of the study for each subject.

Figure 6.2 shows a survival curve with censored data. The study started with 15 patients. Nine died during the study (same as the previous example) and six were censored at various times during the study. On the left panel, each censored patient is denoted by an upward blip in the survival curve. On the right panel, each censored patient is denoted by a symbol in the middle of a horizontal part of the survival curve. At the time a patient is censored, the survival curve does not dip down, as no one has died. When the next patient dies, the step downward is larger because the denominator (the number of patients still being followed) has shrunk.
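The product-limit idea can be sketched in a few lines of Python. This is only an illustration, not the calculation behind Figure 6.2: the function name `kaplan_meier` and the five-subject data set are hypothetical, invented here so the arithmetic is easy to follow by hand.

```python
def kaplan_meier(times, died):
    """Product-limit estimate of the fraction surviving.

    times: follow-up time for each subject (time of death or of censoring).
    died:  True if the subject died at that time, False if censored.
    Returns a list of (time, fraction_surviving) steps, starting at (0, 1.0).
    """
    at_risk = len(times)   # subjects still alive and being followed
    surviving = 1.0        # running product of the survival fractions
    curve = [(0, 1.0)]
    for t in sorted(set(times)):
        deaths = sum(1 for ti, d in zip(times, died) if ti == t and d)
        censored = sum(1 for ti, d in zip(times, died) if ti == t and not d)
        # As described in the text, subjects censored at t are removed from
        # both numerator and denominator. The example data have no death and
        # censoring at the same time, so this choice does not affect them.
        at_risk -= censored
        if deaths:
            surviving *= (at_risk - deaths) / at_risk
            curve.append((t, surviving))
        at_risk -= deaths
    return curve


# Hypothetical data: deaths at times 1, 3 and 4; censoring at times 2 and 5.
times = [1, 2, 3, 4, 5]
died = [True, False, True, True, False]
curve = kaplan_meier(times, died)
print([(t, round(s, 3)) for t, s in curve])
# [(0, 1.0), (1, 0.8), (3, 0.533), (4, 0.267)]
```

The first death lowers survival by 0.20 (one of five subjects). After one subject is censored at time 2, the next death lowers it by about 0.27, because only three subjects are still being followed.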
Figure 6.2. A survival curve with censored subjects. A subject is censored at a certain time for one of two reasons. (1) He stopped following the study protocol at that time. (2) The trial ended with the subject still alive. In the left panel, censored subjects are shown as upward blips. In the right panel, censored subjects are shown as solid circles in a horizontal portion of the curve. You'll see both kinds of graphs frequently.

CONFIDENCE INTERVAL OF A SURVIVAL CURVE

In order to extrapolate from our knowledge of a sample to the overall population, a survival curve is far more informative when it includes a 95% CI. Calculating CIs is not straightforward and is best left to computer programs. The interpretation of the 95% CI for a survival curve should be clear to you by now. We have measured survival exactly in a sample but don't know what the survival curve for the entire population looks like. We can be 95% sure that the true population survival curve lies within the 95% CI shown on our graph at all times.

Unfortunately, many published survival curves do not include CIs. Assuming that you can figure out how many patients are still alive at any given time, you can use Equation 6.1 to calculate an approximate 95% CI for the fraction surviving up to any time t (p is the fraction surviving up to time t, and N is the number of patients still alive and following the protocol at time t):

$$p \pm 1.96 \, p \sqrt{\frac{1 - p}{N}} \qquad (6.1)$$

Equation 6.1 is not well known. I got it from page 378 of D. G. Altman, Practical Statistics for Medical Research, Chapman & Hall, London, 1991.
Let's use this equation to figure out approximate CIs for the example at 24 months. We started with 15 patients. Between 0 and 24 months, eight patients have died (just count the downward steps, remembering that the big step at 19 months represents two patients). Four patients have been censored before 24 months (count the ticks on the left panel of Figure 6.2 between 0 and 24 months). Thus three patients (15 - 8 - 4) are still alive and being followed at 24 months, so N = 3. Reading off the curve, p = 0.35. Plugging p and N into the equation, the 95% CI is approximately 0.03 to 0.67.

Equation 6.1 is only an approximation and sometimes calculates values that are nonsense. It can calculate a lower confidence limit less than 0. In this case, set the lower limit to 0. It can also calculate an upper confidence limit greater than 100%. In this case, set the upper limit to 100%. Figure 6.3 shows more exact CIs calculated by computer. If there are censored patients, the right side of a survival curve represents fewer patients than the left side, and the CIs become wider as time progresses (until survival converges on 0).
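The worked example can be checked with a short Python sketch. It assumes Equation 6.1 has the form p ± 1.96 · p · sqrt((1 - p) / N), which reproduces the 0.03 to 0.67 interval quoted above. The function name `approx_survival_ci` is hypothetical.

```python
import math


def approx_survival_ci(p, n):
    """Approximate 95% CI for the fraction surviving.

    p: fraction surviving up to time t (between 0 and 1).
    n: number of patients still alive and being followed at time t.
    Assumes the form p +/- 1.96 * p * sqrt((1 - p) / n) for Equation 6.1.
    """
    half_width = 1.96 * p * math.sqrt((1 - p) / n)
    # The approximation can give nonsense limits, so clip them to 0 and 100%.
    lower = max(0.0, p - half_width)
    upper = min(1.0, p + half_width)
    return lower, upper


lower, upper = approx_survival_ci(0.35, 3)
print(round(lower, 2), round(upper, 2))  # 0.03 0.67
```

With so few patients still being followed (N = 3), the interval spans most of the possible range, which is why the right side of a survival curve deserves cautious interpretation.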
MEDIAN SURVIVAL

It is easy to derive the median survival time from the survival curve. Simply draw a horizontal line at 50% survival and see where it crosses the curve. Then look down at the X axis to read off the median survival time. Figure 6.4 shows that the median survival in the example is about 22 months.

Sometimes the survival curve is horizontal at 50% survival. In this case, the median encompasses a range of survival times. Most people define the median survival in this case as the average of the first and last time point at which the survival curve equals 50%.

If the survival curve includes 95% CIs, you can determine the 95% CI of the median by seeing where the upper and lower confidence limits cross the horizontal line where survival equals 50%. From Figure 6.3 you can estimate that the 95% CI of median survival ranges from about 19 to 33 months. Obviously, you can't determine median survival if more than half the subjects are still alive when the study ends.
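Reading the median off a curve can also be sketched in Python. This is a minimal illustration, assuming the curve is stored as a list of (time, fraction surviving) steps; the function name `median_survival` and the example curves are hypothetical, not the data behind Figure 6.4.

```python
import math


def median_survival(curve):
    """Median survival time from a step curve.

    curve: list of (time, fraction_surviving) steps sorted by time; each
    fraction holds from its time until the time of the next step.
    Returns None when the curve never drops to 50%.
    """
    for i, (t, s) in enumerate(curve):
        if math.isclose(s, 0.5):
            # Curve is flat at exactly 50%: average the first and last
            # time points of the flat stretch, as described in the text.
            if i + 1 < len(curve):
                return (t + curve[i + 1][0]) / 2
            # Assumption: a curve that ends flat at 50% is left undetermined.
            return None
        if s < 0.5:
            return t  # first time the curve falls below 50%
    return None


print(median_survival([(0, 1.0), (5, 0.8), (9, 0.4)]))               # 9
print(median_survival([(0, 1.0), (5, 0.75), (9, 0.5), (14, 0.25)]))  # 11.5
print(median_survival([(0, 1.0), (5, 0.8)]))                         # None
```

The last call returns None because more than half the subjects are still alive when that curve ends.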
ASSUMPTIONS

The interpretation of survival curves (and their CIs) depends on these assumptions:

Random sample. If your sample is not randomly selected from a population, then you must assume that your sample is representative of that population.

PROBLEMS WITH SURVIVAL STUDIES

Since the survival curve plots time until death, you have to decide when to "start the clock." The starting point should be an objective date, perhaps the date of first diagnosis or first hospital admission. You may be tempted to use instead an earlier starting criterion, such as the time that a patient remembers first observing symptoms. Don't do it. Such data are invalid because a patient's recollection of early symptoms may be altered by later events.

If the curve is plotting deaths due to a particular form of cancer, you need to decide what to do with patients who die of another cause, say, an automobile accident. Some investigators count these as deaths, and others count them as censored subjects. Both approaches are sensible, but the approach should be decided before the study is started.

SUMMARY

You will frequently encounter survival curves in the medical literature. Survival curves can be used to plot time to any nonrecurrent event. The event does not have to be death, so the term survival can be misleading.

Creating a survival curve is a bit tricky, because you need to account for censored subjects. Subjects can be censored because they stop following the experimental protocol, or because they are still alive when the protocol ends. These subjects contribute data up until the time of censoring but contribute no data after that.

It is easiest to interpret a survival curve when you plot 95% confidence limits for survival at various times. You can be 95% sure that the survival curve for the overall population lies somewhere within those limits.

Chapter 33 explains how to compare survival curves.