# Interpreting results: The hazard ratio

## Key facts about the hazard ratio

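Hazard measures how rapidly subjects are dying. Loosely, it is the slope of the survival curve. More precisely, it is that downward slope divided by the fraction of subjects still alive, so it is the death rate among the subjects still at risk.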

The hazard ratio compares two treatments. If the hazard ratio is 2.0, then the rate of deaths in one treatment group is twice the rate in the other group.

The hazard ratio is not computed at any one time point, but is computed from all the data in the survival curve.

Because only one hazard ratio is reported, it can only be interpreted if you assume that the population hazard ratio is constant over time, and that any variation you see in the data is due to random sampling. This is called the assumption of proportional hazards.

If the hazard ratio is not constant over time, the value that Prism reports for the hazard ratio will not be useful. If two survival curves cross, the hazard ratio is certainly not constant. The exception is when the curves cross only at late time points, when few subjects are still being followed, so there is a lot of uncertainty in the true positions of the survival curves.

The hazard ratio is not directly related to the ratio of median survival times. A hazard ratio of 2.0 does not mean that the median survival time is doubled (or halved). Rather, it means that a patient in one treatment group who has not died (or progressed, or reached whatever end point is tracked) at a certain time point has (approximately, over a short interval) twice the probability of having died (or progressed...) by the next time point, compared to a patient in the other treatment group.
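How a hazard ratio translates into median survival depends on the shape of the survival curve. The sketch below assumes Weibull survival curves, S(t) = exp(-(t/scale)^shape), a model chosen purely for illustration (Prism's hazard ratio does not assume it), and the function names are hypothetical. Under proportional hazards with hazard ratio c, the higher-risk curve is S(t)^c, so the ratio of the two medians works out to c^(1/shape). Only when shape = 1 (an exponential curve, whose hazard is constant) does a hazard ratio of 2.0 exactly halve the median.

```python
import math


def weibull_median(scale: float, shape: float, hazard_multiplier: float = 1.0) -> float:
    """Median of S(t) = exp(-hazard_multiplier * (t / scale) ** shape).

    Multiplying the hazard by a constant c raises the survival curve to the power c,
    so the median solves c * (t / scale) ** shape = ln 2.
    """
    return scale * (math.log(2) / hazard_multiplier) ** (1.0 / shape)


def median_ratio(shape: float, hazard_ratio: float) -> float:
    """Median survival of the reference group divided by that of the higher-hazard group."""
    reference = weibull_median(scale=1.0, shape=shape)
    higher_hazard = weibull_median(scale=1.0, shape=shape, hazard_multiplier=hazard_ratio)
    return reference / higher_hazard


for shape in (0.5, 1.0, 2.0):
    print(f"shape = {shape}: HR 2.0 divides the median by {median_ratio(shape, 2.0):.3f}")
```

For shapes 0.5, 1.0 and 2.0 this prints factors of 4.000, 2.000 and 1.414: the same hazard ratio of 2.0 can divide the median by 4 or by only about 1.41.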

Prism computes the hazard ratio, and its confidence interval, using two methods, explained below. For each method it reports both the hazard ratio and its reciprocal. If people in group A die at twice the rate of people in group B (HR=2.0), then people in group B die at half the rate of people in group A (HR=0.5).
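Because taking a reciprocal reverses the order of numbers, the confidence limits swap places when you switch the reference group: the new lower limit is the reciprocal of the old upper limit. A minimal sketch (the function name and the numbers are made up for illustration):

```python
def reciprocal_hr(hr: float, lower: float, upper: float) -> tuple[float, float, float]:
    """Re-express a hazard ratio and its confidence interval with the groups swapped."""
    # 1/x is decreasing, so the old upper limit becomes the new lower limit.
    return 1.0 / hr, 1.0 / upper, 1.0 / lower


# Group A vs group B: HR 2.0, 95% CI 1.25 to 4.0 (hypothetical values)
print(reciprocal_hr(2.0, 1.25, 4.0))  # → (0.5, 0.25, 0.8), i.e. group B vs group A
```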

For other cautions about interpreting hazard ratios, see two reviews, by Hernán (1) and Spruance (2).

Duerden (6) wrote a good, easy-to-follow explanation of hazard ratios.

## The two methods compared

Prism reports the hazard ratio computed by two methods: logrank and Mantel-Haenszel. The two usually give identical (or nearly identical) results. But the results can differ when several subjects die at the same time or when the hazard ratio is far from 1.0.

Bernstein and colleagues analyzed simulated data with both methods (3). In all their simulations, the assumption of proportional hazards was true. The two methods gave very similar values, but the logrank method (which they call the O/E method) reported values closer to 1.0 than the true hazard ratio, especially when the hazard ratio or the sample size was large.

When there are ties (several subjects dying at the same time), both methods are less accurate. The logrank method tends to report hazard ratios that are even closer to 1.0 (too small when the true hazard ratio is greater than 1.0, and too large when it is less than 1.0). The Mantel-Haenszel method, in contrast, reports hazard ratios that are further from 1.0 (too large when the true hazard ratio is greater than 1.0, and too small when it is less than 1.0).

## What does it mean when the two hazard ratios are very different?

The simulations in reference 3 did not compare the two methods on data simulated without proportional hazards. I have seen one data set where the two estimates of the HR differed by a factor of three, and the assumption of proportional hazards was dubious for those data. It seems that the Mantel-Haenszel method gives more weight to differences in the hazard at late time points, while the logrank method weights all time points equally (but I have not explored this in detail).

If you see very different HR values with the two methods, think about whether the assumption of proportional hazards is reasonable. If it is not, then no single hazard ratio can meaningfully describe the entire curve.

## How the hazard ratio is computed

There are two very similar ways to compute the hazard ratio: logrank and Mantel-Haenszel. Both are explained in chapter 3 of Machin, Cheung and Parmar, Survival Analysis (4).

### The Mantel-Haenszel approach

1. Compute the total variance, V, as explained on pages 38-40 of a handout by Michael Vaeth (5). He calls the test "logrank", but a note explains that this is the more accurate test; the handout also gives the equation for the simpler approximation that we call logrank.

2. Compute L = (O1 - E1) / V, where O1 is the total observed number of events in group 1, and E1 is the total expected number of events in group 1. If you use group 2 instead, you get the same magnitude with the opposite sign, because O2 - E2 = -(O1 - E1) and V is unchanged; exp(L) then becomes the reciprocal hazard ratio.

3. L estimates the natural logarithm of the hazard ratio (group 1 relative to group 2), so the hazard ratio equals exp(L).

4. The lower 95% confidence limit of the hazard ratio equals:

exp(L - 1.96/sqrt(V))

5. The upper 95% confidence limit equals:

exp(L + 1.96/sqrt(V))
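The Mantel-Haenszel steps can be sketched in Python, starting from raw survival data. This is an illustration, not Prism's code: the function name and its inputs are hypothetical, and we assume V is the standard hypergeometric variance summed over the distinct event times, which is how we understand the variance described in Vaeth's handout (5).

```python
import math


def mantel_haenszel_hr(times, events, groups, z=1.96):
    """Hazard ratio of group 1 relative to group 2, with its confidence interval.

    times:  follow-up time for each subject
    events: 1 if the subject died (had the event) at that time, 0 if censored
    groups: 1 or 2 for each subject
    """
    subjects = list(zip(times, events, groups))
    o1 = e1 = v = 0.0
    for t in sorted({ti for ti, di, _ in subjects if di}):
        # Risk set: everyone still followed just before t (censored at t counts as at risk).
        n1 = sum(1 for ti, _, g in subjects if ti >= t and g == 1)
        n2 = sum(1 for ti, _, g in subjects if ti >= t and g == 2)
        # Deaths at exactly t in each group (ties allowed).
        d1 = sum(1 for ti, di, g in subjects if ti == t and di and g == 1)
        d2 = sum(1 for ti, di, g in subjects if ti == t and di and g == 2)
        n, d = n1 + n2, d1 + d2
        o1 += d1
        e1 += d * n1 / n  # expected deaths in group 1 under the null hypothesis
        if n > 1:
            # Hypergeometric variance of d1 (assumed form of V).
            v += d * (n1 / n) * (n2 / n) * (n - d) / (n - 1)
    log_hr = (o1 - e1) / v  # L in the steps above
    half_width = z / math.sqrt(v)
    return math.exp(log_hr), math.exp(log_hr - half_width), math.exp(log_hr + half_width)


# Toy data: group 1 dies at times 1 and 3, group 2 at times 2 and 4.
hr, lower, upper = mantel_haenszel_hr([1, 3, 2, 4], [1, 1, 1, 1], [1, 1, 2, 2])
print(f"HR = {hr:.3f}, 95% CI {lower:.3f} to {upper:.3f}")
```

For the toy data, O1 = 2, E1 = 4/3 and V = 13/18, so L = 12/13 and the estimated HR is exp(12/13), about 2.52. Swapping the group labels returns the reciprocal.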

### The logrank approach

1. As part of the Kaplan-Meier calculations, compute the number of observed events (usually deaths) in each group (Oa and Ob), and the number of expected events assuming a null hypothesis of no difference in survival (Ea and Eb).

2. The hazard ratio (group A relative to group B) is then:

HR = (Oa/Ea)/(Ob/Eb)

3. The standard error of the natural logarithm of the hazard ratio is S = sqrt(1/Ea + 1/Eb).

4. Calculate L = ln(HR), the natural logarithm of the hazard ratio.

5. The lower 95% confidence limit of the hazard ratio equals:

exp(L - 1.96*S)

6. The upper 95% confidence limit equals:

exp(L + 1.96*S)
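The logrank steps need only the observed and expected counts. A minimal sketch, assuming those counts have already been tallied from the Kaplan-Meier risk sets (the function name is hypothetical):

```python
import math


def logrank_hr(o_a, e_a, o_b, e_b, z=1.96):
    """Hazard ratio of group A relative to group B from observed (O) and expected (E) events."""
    hr = (o_a / e_a) / (o_b / e_b)
    s = math.sqrt(1.0 / e_a + 1.0 / e_b)  # standard error of ln(HR)
    log_hr = math.log(hr)                 # L
    # Lower and upper 95% confidence limits, back-transformed from the log scale.
    return hr, math.exp(log_hr - z * s), math.exp(log_hr + z * s)


hr, lower, upper = logrank_hr(o_a=2, e_a=4 / 3, o_b=2, e_b=8 / 3)
print(f"HR = {hr:.3f}, 95% CI {lower:.3f} to {upper:.3f}")
```

These counts come from a toy data set in which group A has deaths at times 1 and 3 and group B at times 2 and 4. The logrank HR is exactly 2.0, whereas the Mantel-Haenszel formula L = (O1 - E1)/V gives exp(12/13), about 2.52, for the same data; with so few subjects the two estimates need not agree.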

### Prior versions of Prism

Prism 6 reports the hazard ratio twice, once computed with the Mantel-Haenszel method and again using the logrank method.

Prism 6 had a bug. Both methods use the natural logarithm of the HR in their calculations, defined as L above. When computing the confidence interval for the logrank hazard ratio, Prism 6 mistakenly used the value of L from the Mantel-Haenszel approach. Usually the two HR values are nearly identical, so this bug was mostly trivial; it only mattered when the two HR values were very different. In that situation, one has to wonder whether either definition is very helpful. I suspect this discrepancy happens when the data simply don't comply with the assumption of proportional hazards. The bug was fixed in 7.00 and 7.0a.

Prism 5 computed the hazard ratio and its confidence interval using the Mantel-Haenszel approach. Prism 4 used the logrank method to compute the hazard ratio, but the Mantel-Haenszel approach to calculate its confidence interval, so the results could be inconsistent. In rare cases, the hazard ratio reported by Prism 4 could lie outside the confidence interval that Prism 4 reported for it.

## References

1. M. A. Hernán. Hazards of hazard ratios. Epidemiology 21:13-15, 2010.

2. S. L. Spruance et al. Hazard ratio in clinical trials. Antimicrobial Agents and Chemotherapy 48(8):2787, 2004.

3. L. Bernstein, J. Anderson and M. C. Pike. Estimation of the proportional hazard in two-treatment-group clinical trials. Biometrics 37(3):513-519, 1981.

4. D. Machin, Y. B. Cheung and M. Parmar. Survival Analysis: A Practical Approach, 2nd edition. ISBN 0470870400.

5. M. Vaeth. Statistical analysis of survival data in clinical research, 2004.

6. M. Duerden. What are hazard ratios? 2009.