KNOWLEDGEBASE - ARTICLE #1226

Hazard ratio from survival analysis.

Definition of the hazard ratio

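Hazard is, loosely, the slope of the survival curve: a measure of how rapidly subjects are dying. More precisely, it is the rate of death at a given moment among the subjects who are still alive at that moment.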

The hazard ratio compares two treatments. If the hazard ratio is 2.0, then the rate of deaths in one treatment group is twice the rate in the other group.
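The same definitions can be written formally. In standard notation (T is the time of death and S(t) = Pr(T > t) is the survival curve; these symbols are introduced here for illustration and are not part of Prism's output), the hazard is the negated slope of the survival curve divided by the fraction still alive, and the hazard ratio compares groups 1 and 2 at the same time t. The last relation follows from integrating the hazards, and holds only if the hazard ratio is the same at every time point:

```latex
h(t) = \lim_{\Delta t \to 0} \frac{\Pr(t \le T < t + \Delta t \mid T \ge t)}{\Delta t}
     = -\frac{S'(t)}{S(t)}

\mathrm{HR} = \frac{h_1(t)}{h_2(t)}, \qquad
S_1(t) = S_2(t)^{\mathrm{HR}} \quad \text{if HR is the same at every } t
```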

As part of the survival analysis of two data sets, Prism reports the hazard ratio with its 95% confidence interval.

Interpreting the hazard ratio

The hazard ratio is not computed at any one time point, but includes all the data in the survival curve.

Since only one hazard ratio is reported, it can only be interpreted if you assume that the population hazard ratio is constant over time (the assumption of proportional hazards), and that any differences you see over time in your sample are due to random sampling.

If the hazard ratio is not consistent over time, the value that Prism reports for the hazard ratio will not be useful. If two survival curves cross, the hazard ratios are certainly not consistent (unless they cross at late time points, when there are few subjects still being followed so there is a lot of uncertainty in the true position of the survival curves).

Note that a hazard ratio of two does not mean that the median survival time is doubled (or halved). A hazard ratio of two means that a patient in one treatment group who has not died (or progressed, or reached whatever end point is tracked) at a certain time point has approximately twice the probability of having died (or progressed...) by the next time point, compared with a patient in the other treatment group. The approximation is good when the interval between time points is short.
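A small numerical sketch makes both points concrete. It assumes, purely for illustration, that survival in the reference group follows a Weibull distribution, S(t) = exp(-(t/scale)^shape), and that the other group's hazard is exactly twice as high at every time point. The helper functions are hypothetical, not part of Prism. With shape = 1 (a constant hazard, i.e. exponential survival) a hazard ratio of two does halve the median, but with other shapes it does not. The second loop shows that the conditional probability of dying is close to twice as high only over a short interval.

```python
import math


def weibull_survival(t, scale, shape, hr=1.0):
    """S(t) for a Weibull baseline whose hazard is multiplied by hr."""
    # Proportional hazards: S_hr(t) = S_baseline(t) ** hr.
    return math.exp(-hr * (t / scale) ** shape)


def weibull_median(scale, shape, hr=1.0):
    """Median survival: solve hr * (t / scale) ** shape = ln 2 for t."""
    return scale * (math.log(2) / hr) ** (1 / shape)


def median_ratio(shape, hr=2.0, scale=10.0):
    """Baseline median divided by the median when the hazard is multiplied by hr."""
    return weibull_median(scale, shape) / weibull_median(scale, shape, hr)


def conditional_death_prob(t, dt, scale, shape, hr=1.0):
    """P(death before t + dt | alive at t)."""
    s_now = weibull_survival(t, scale, shape, hr)
    s_later = weibull_survival(t + dt, scale, shape, hr)
    return 1 - s_later / s_now


# Same hazard ratio (2), different baseline shapes: median ratio is 2 ** (1 / shape).
for shape in (0.5, 1.0, 2.0):
    print(shape, round(median_ratio(shape), 3))
# 0.5 4.0
# 1.0 2.0
# 2.0 1.414

# Ratio of conditional death probabilities: near 2 only for a short interval.
for dt in (0.01, 10.0):
    p_base = conditional_death_prob(5.0, dt, 10.0, 2.0)
    p_hr2 = conditional_death_prob(5.0, dt, 10.0, 2.0, hr=2.0)
    print(dt, round(p_hr2 / p_base, 3))
# 0.01 1.999
# 10.0 1.135
```

Under proportional hazards the conditional probability in the higher-hazard group is p(2 - p), where p is the reference group's probability, so the ratio is 2 - p: close to two only when p is small.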

For other cautions about interpreting hazard ratios, see these two review papers:

Hernán MA. Hazards of Hazard Ratios. Epidemiology 21:13-15, 2010.
Spruance et al. Hazard ratio in clinical trials. Antimicrobial Agents and Chemotherapy 48(8):2787, 2004.

How the hazard ratio is computed

There are two very similar ways of computing the hazard ratio: logrank and Mantel-Haenszel. Both are explained in chapter 3 of Machin, Cheung and Parmar, Survival Analysis.

The Mantel-Haenszel approach uses these steps:

Compute the total variance, V, as explained on pages 38-40 of a handout by Michael Vaeth. Note that he calls the test "log-rank", but in a note he explains that this is the more accurate test, and he also gives the equation for the simpler approximation that we call log-rank.
Compute L = (O1 - E1) / V, where O1 is the total observed number of events in group 1 and E1 is the total expected number of events in group 1. If you used the other group instead, you'd get the same value of L with the opposite sign (the logarithm of the reciprocal hazard ratio).
Note that L is the natural logarithm of the HR, so HR = EXP(L).
The lower 95% confidence limit of the hazard ratio equals:
EXP(L - 1.96/sqrt(V))
The upper 95% confidence limit equals:
EXP(L + 1.96/sqrt(V))
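The Mantel-Haenszel steps above can be sketched in Python as follows. This is a minimal illustration, not Prism's code: the function name and data layout (a follow-up time plus an event flag, 1 = event and 0 = censored, for each subject) are hypothetical, subjects censored at an event time are assumed to still be at risk at that time, and V is assumed to be the usual hypergeometric variance summed over distinct event times, which is our reading of the variance the handout describes.

```python
import math


def mantel_haenszel_hr(times1, events1, times2, events2, z=1.96):
    """Hazard ratio (group 1 vs group 2) via L = (O1 - E1) / V.

    times*: follow-up time for each subject; events*: 1 = event, 0 = censored.
    Returns (hr, lower, upper); z = 1.96 gives a 95% interval.
    """
    # Pool all subjects as (time, event, group) tuples.
    data = [(t, e, 1) for t, e in zip(times1, events1)]
    data += [(t, e, 2) for t, e in zip(times2, events2)]
    event_times = sorted({t for t, e, _ in data if e})

    o1 = e1 = v = 0.0
    for t in event_times:
        # At risk at t: everyone whose time is >= t (censored-at-t included).
        n1 = sum(1 for s, _, g in data if s >= t and g == 1)
        n2 = sum(1 for s, _, g in data if s >= t and g == 2)
        # Events exactly at t in each group (ties allowed).
        d1 = sum(1 for s, e, g in data if s == t and e and g == 1)
        d2 = sum(1 for s, e, g in data if s == t and e and g == 2)
        n, d = n1 + n2, d1 + d2
        o1 += d1
        e1 += d * n1 / n  # expected group-1 events if hazards were equal
        if n > 1:
            # Hypergeometric variance of d1 at this event time.
            v += d * (n1 / n) * (n2 / n) * (n - d) / (n - 1)

    log_hr = (o1 - e1) / v  # this is L; assumes v > 0
    se = 1 / math.sqrt(v)
    return math.exp(log_hr), math.exp(log_hr - z * se), math.exp(log_hr + z * se)


# Toy data: group 1 dies at t = 1 and 3, group 2 at t = 2 and 4.
hr, lo, hi = mantel_haenszel_hr([1, 3], [1, 1], [2, 4], [1, 1])
print(round(hr, 2))  # 2.52
```

For this toy data O1 = 2, E1 = 4/3 and V = 13/18, so L = 12/13 and HR = EXP(12/13), about 2.52. Swapping the two groups flips the sign of L and returns the reciprocal hazard ratio.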

The logrank approach uses these steps:

As part of the Kaplan-Meier calculations, compute the number of observed events (deaths, usually) in each group (Oa and Ob), and the number of expected events assuming a null hypothesis of no difference in survival (Ea and Eb).
The hazard ratio then is:
HR= (Oa/Ea)/(Ob/Eb)
The standard error of the natural logarithm of the hazard ratio is: sqrt(1/Ea + 1/Eb)
Calculate L = ln(HR). (Natural logarithm)
The lower 95% confidence limit of the hazard ratio equals:
EXP(L - 1.96*sqrt(1/Ea + 1/Eb))
The upper 95% confidence limit equals:
EXP(L + 1.96*sqrt(1/Ea + 1/Eb))
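The logrank steps above can be sketched in Python as follows. As before, this is a minimal illustration under stated assumptions rather than Prism's implementation: the function name and data layout (a follow-up time plus an event flag, 1 = event and 0 = censored, per subject) are hypothetical, and subjects censored at an event time are counted as still at risk at that time.

```python
import math


def logrank_hr(times_a, events_a, times_b, events_b, z=1.96):
    """Hazard ratio (group a vs group b) via HR = (Oa/Ea) / (Ob/Eb).

    Returns (hr, lower, upper); z = 1.96 gives a 95% interval.
    """
    # Pool all subjects as (time, event, group) tuples.
    data = [(t, e, "a") for t, e in zip(times_a, events_a)]
    data += [(t, e, "b") for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})

    o_a = o_b = e_a = e_b = 0.0
    for t in event_times:
        # At risk at t: everyone whose time is >= t.
        n_a = sum(1 for s, _, g in data if s >= t and g == "a")
        n_b = sum(1 for s, _, g in data if s >= t and g == "b")
        # Observed events exactly at t in each group.
        d_a = sum(1 for s, e, g in data if s == t and e and g == "a")
        d_b = sum(1 for s, e, g in data if s == t and e and g == "b")
        n, d = n_a + n_b, d_a + d_b
        o_a += d_a
        o_b += d_b
        # Expected events if both groups shared the same hazard at t.
        e_a += d * n_a / n
        e_b += d * n_b / n

    hr = (o_a / e_a) / (o_b / e_b)
    log_hr = math.log(hr)  # this is L
    se = math.sqrt(1 / e_a + 1 / e_b)  # SE of ln(HR)
    return hr, math.exp(log_hr - z * se), math.exp(log_hr + z * se)


# Toy data: group a dies at t = 1 and 3, group b at t = 2 and 4.
hr, lo, hi = logrank_hr([1, 3], [1, 1], [2, 4], [1, 1])
print(round(hr, 2))  # 2.0
```

For this toy data Oa = Ob = 2, Ea = 4/3 and Eb = 8/3, so HR = 1.5 / 0.75 = 2.0 and the standard error of ln(HR) is sqrt(3/4 + 3/8).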

The two methods compared

The two usually give identical (or nearly identical) results. But the results can differ when several subjects die at the same time or when the hazard ratio is far from 1.0.

Bernstein and colleagues analyzed simulated data with both methods (1). In all their simulations, the assumption of proportional hazards was true. The two methods gave very similar values. The logrank method (which they refer to as the O/E method) reports values that are closer to 1.0 than the true hazard ratio, especially when the hazard ratio is large or the sample size is large.

When there are ties, both methods are less accurate. The logrank method tends to report hazard ratios that are even closer to 1.0 (so the reported hazard ratio is too small when the hazard ratio is greater than 1.0, and too large when the hazard ratio is less than 1.0). The Mantel-Haenszel method, in contrast, reports hazard ratios that are further from 1.0 (so the reported hazard ratio is too large when the hazard ratio is greater than 1.0, and too small when the hazard ratio is less than 1.0).

They did not test the two methods with data simulated where the assumption of proportional hazards is not true. I have seen one data set where the two estimates of HR were very different (by a factor of three), and the assumption of proportional hazards was dubious for those data (Excel file). It seems that the Mantel-Haenszel method gives more weight to differences in the hazard at late time points, while the logrank method gives equal weight everywhere (but I have not explored this in detail). If you see very different HR values with the two methods, think about whether the assumption of proportional hazards is reasonable. If that assumption is not reasonable, then of course the entire concept of a single hazard ratio describing the entire curve is not meaningful.

A bug in Prism 6

Note that both methods use the natural logarithm of the HR in their calculations; we call this value L above. The bug in Prism 6 was that, when computing the confidence interval for the logrank hazard ratio, Prism calculated L using the Mantel-Haenszel approach. Usually the two HR values are nearly identical, so this bug was mostly trivial. It only affects the calculations when the two HR values are very different. In this situation, one has to wonder if either definition is very helpful. I suspect this discrepancy happens when the data simply don't comply with the assumption of proportional hazards. The bug was fixed in Prism 7.00 and 7.0a.

How older versions of Prism compute the hazard ratio

Prism 4 uses the logrank method to compute the hazard ratio, but uses the Mantel-Haenszel approach to calculate its confidence interval. The results can be inconsistent: in rare cases, the hazard ratio reported by Prism 4 can lie outside the confidence interval that Prism 4 reports for it.

Prism 5 computes both the hazard ratio and its confidence interval using the Mantel-Haenszel approach.

References

1. Bernstein L, Anderson J, Pike MC. Estimation of the proportional hazard in two-treatment-group clinical trials. Biometrics 37(3):513-519, 1981.
