This guide is for an old version of Prism.

# Overview of multiple logistic regression

Multiple logistic regression is an extension of multiple linear regression. Logistic regression is used to model a binary dependent variable, such as yes/no or presence/absence, coded as 0 or 1. The goal of logistic regression is to predict, or draw inferences about, the probability of observing a 0 or a 1 given a set of X values. In the following, we write the probability of Y = 1 as P(Y=1).

Logistic regression fits a linear model to the log odds. The odds are defined mathematically as P(Y=1) / P(Y=0). Odds are familiar to many people from betting. For example, 3 to 1 odds is another way of saying that P(Y=1) is 0.75, because 0.75 / 0.25 = 3. The log odds are then just the natural log (ln) of the odds.
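The definitions above can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the function names `odds` and `log_odds` are chosen here for illustration and are not part of Prism.

```python
import math

def odds(p):
    """Odds of Y=1 given P(Y=1) = p, i.e. P(Y=1) / P(Y=0)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must satisfy 0 <= p < 1")
    return p / (1.0 - p)

def log_odds(p):
    """Natural log of the odds; defined only for 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must satisfy 0 < p < 1")
    return math.log(odds(p))

p = 0.75
print(odds(p))      # 3.0, i.e. "3 to 1"
print(log_odds(p))  # ln(3), roughly 1.0986
```

A probability of 0.5 corresponds to odds of 1 and log odds of 0, which is why the sign of the log odds indicates which outcome is more likely.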

Why not just fit a multiple linear regression (MLR) instead of a multiple logistic regression? There are two main reasons:

1. The standard statistical tests for MLR (goodness of fit, parameter estimates, standard errors, etc.) assume a Gaussian distribution for the residuals. This assumption is violated when the response can only be 0 or 1.

2. For interpretability, it’s preferable to talk about the probability of observing a “success” or a “failure.” This can be done with logistic regression, but not with linear regression, whose fitted values can fall below 0 or above 1 and so cannot be read as probabilities.
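To illustrate the second reason, here is a small sketch that fits an ordinary least-squares line to a 0/1 response. The data are made up for illustration, and a single predictor is used instead of several to keep the arithmetic short. The point is only that a straight-line fit is free to return values outside the 0 to 1 range.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor: returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Hypothetical data: a binary response that switches from 0 to 1 as X grows.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 0, 0, 1, 1, 1, 1]

intercept, slope = fit_line(xs, ys)
fitted = [intercept + slope * x for x in xs]

# The smallest fitted value is negative and the largest exceeds 1,
# so neither can be interpreted as a probability.
print(min(fitted), max(fitted))
```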

## So why do we model the log odds?

It's essentially a mathematical technicality that lets us use much of the framework developed for MLR models. While probability is constrained between 0 and 1, odds can be any positive number, and log odds can therefore be any real number. Thus, we can use a familiar form for our model equation: log odds (Y) = β0 + β1*X1 + β2*X2 + ... We can then back-transform estimates as needed to provide inference about odds (by exponentiating the log odds) or predictions for probabilities (by dividing the odds by 1 + odds).
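As a rough sketch of the back-transformation, the code below evaluates the model equation for one set of X values and converts the resulting log odds to a probability. The coefficient values are hypothetical, chosen only to make the arithmetic easy to follow, and the function names are not Prism's.

```python
import math

def linear_predictor(intercept, coefs, xs):
    """Log odds for one observation: b0 + b1*x1 + b2*x2 + ..."""
    if len(coefs) != len(xs):
        raise ValueError("need exactly one coefficient per X value")
    return intercept + sum(b * x for b, x in zip(coefs, xs))

def log_odds_to_probability(z):
    """Back-transform log odds z to P(Y=1) = odds / (1 + odds)."""
    # Two algebraically equal forms, chosen so exp() never overflows.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

# Hypothetical estimates: b0 = -1.0, b1 = 0.5, b2 = 0.25
z = linear_predictor(-1.0, [0.5, 0.25], [2.0, 4.0])
print(z)                           # log odds: 1.0
print(math.exp(z))                 # odds: e, roughly 2.718
print(log_odds_to_probability(z))  # probability, roughly 0.731
```

Whatever real value the linear predictor takes, the back-transformed result stays between 0 and 1, which is the property a linear regression on the raw 0/1 response lacks.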