When the time-to-event response in a survival analysis is accompanied by multiple predictor variables (categorical or continuous), nonparametric approaches such as the Kaplan-Meier (product limit) estimator cannot be used. An alternative is Cox proportional hazards regression, a semiparametric technique. The following pages of this guide cover the background and mathematical theory involved in Cox regression. If you're just looking for a guide on how to run this analysis in Prism, skip to this how-to page instead.
Cox proportional hazards regression was introduced in Prism 9.3.0 as the newest (and arguably most advanced) Prism Labs feature. This analysis is very well established as the industry standard for survival analysis, and allows for complex investigations of multiple kinds of predictor variables (both categorical and continuous) and their effects on survival. We've gone to great lengths to ensure that the results Prism generates are accurate, and within these guide pages, you'll find numerous explanations of how these results are generated, as well as basic guidance on how to interpret many of them.
HOWEVER, Cox regression is advanced, arguably more advanced than any other analysis available within Prism. Before analyzing your data with Cox regression, be sure that you understand the fundamentals of survival analysis (i.e., Kaplan-Meier survival estimation and the various tests available for comparing the resulting survival curves: the logrank test, the logrank test for trend, and the Gehan-Breslow-Wilcoxon test). Cox regression also relies heavily on the statistical concepts that power other forms of multiple regression (like multiple linear and multiple logistic regression). Even with knowledge of all of these concepts, the best advice is always to seek guidance or assistance from a statistician when dealing with these complex techniques.
First, let’s consider what is meant by ‘semiparametric’. In an earlier section, we looked at reasons why linear regression couldn’t be used to analyze survival data. One of those reasons was that the data (survival times) are highly skewed and must, by definition, be positive (survival time can’t be negative). Linear regression relies heavily on the normal (Gaussian) distribution, but this distribution doesn’t do a great job of describing survival data. Notably, the normal distribution is symmetric and can contain negative values. Instead, other distributions (such as the Weibull, exponential, or lognormal) can be used to analyze survival data. In all of these cases where a distribution is specified, the analyses are considered ‘parametric’ because they assume that the data come from a distribution that can be defined using a fixed set of parameters (to be a bit more accurate, these analyses make an assumption about the form of the hazard function, which will be discussed later). Cox proportional hazards regression doesn’t make such an assumption about the distribution of the time data, but it does make a parametric assumption about the effect of the predictor variables on survival time. Hence, it is a ‘semiparametric’ technique.
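To make the two halves of ‘semiparametric’ a bit more concrete, here is a minimal Python sketch. It is not how Prism performs the analysis, and everything in it is hypothetical: the function names, the two baseline hazard shapes, and the coefficient value were made up for illustration. It assumes the standard Cox form, in which the hazard at time t equals an unspecified baseline hazard at time t multiplied by exp(beta × x) for a single predictor x (hazards themselves are discussed later in this guide).

```python
import math

def cox_hazard(baseline_hazard, t, x, beta):
    """Hazard at time t for predictor value x under a Cox-type model."""
    # Nonparametric part: baseline_hazard may be any non-negative function of time.
    # Parametric part: the predictor multiplies the hazard by exp(beta * x).
    return baseline_hazard(t) * math.exp(beta * x)

# Two arbitrary, hypothetical baseline hazards with very different shapes.
def rising(t):
    return 0.02 + 0.01 * t

def falling(t):
    return 0.5 / (1.0 + t)

beta = 0.7  # hypothetical coefficient for a single predictor
times = [0.5, 1.0, 2.0, 5.0]

hazard_ratios = {}
for name, h0 in [("rising", rising), ("falling", falling)]:
    # Ratio of the hazard for x = 1 to the hazard for x = 0 at each time point.
    hazard_ratios[name] = [
        cox_hazard(h0, t, 1, beta) / cox_hazard(h0, t, 0, beta) for t in times
    ]
    print(name, [round(r, 4) for r in hazard_ratios[name]])

print("exp(beta) =", round(math.exp(beta), 4))
```

Whichever baseline shape is used, the ratio between the two groups comes out as exp(beta) at every time point. The shape of the baseline is left unspecified (the nonparametric part), while the effect of the predictor is captured by a single parameter (the parametric part).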
So then, if Cox proportional hazards regression doesn’t make an assumption about the distribution of the survival data, how is it able to estimate a survival curve (a survival function that provides survival probability as a function of time)? A subsequent section goes into some of the mathematics behind this technique, but the short answer is right in the name of the analysis itself: ‘proportional hazards’. To understand what this means, let’s first look at what hazard rates are.