Multiple logistic regression is an extension of multiple linear regression. Logistic regression is used to model a dependent variable with binary responses such as yes/no or presence/absence. The goal of logistic regression is to perform predictions or inference on the probability of observing a 0 or a 1 given a set of X values. In the following, we write the probability of Y = 1 as P(Y=1).
Logistic regression fits a linear regression model to the log odds. The odds are defined mathematically as P(Y=1) / P(Y=0). Many people have seen odds used in reference to betting. For example, 3 to 1 odds is another way of saying that P(Y=1) is 0.75, since 0.75 / 0.25 = 3. The log odds are then just the natural log (ln) of the odds.
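As a quick check of the arithmetic above, the conversion from probability to odds and log odds can be sketched in a few lines of Python. The function names are illustrative and do not come from any particular library.

```python
import math

def odds_from_probability(p):
    """Odds = P(Y=1) / P(Y=0); defined for 0 <= p < 1."""
    return p / (1.0 - p)

def log_odds_from_probability(p):
    """Natural log of the odds; defined for 0 < p < 1."""
    return math.log(odds_from_probability(p))

print(odds_from_probability(0.75))     # 3.0, i.e. 3 to 1 odds
print(log_odds_from_probability(0.5))  # 0.0, i.e. even odds
```

Probabilities below 0.5 give odds below 1 and therefore negative log odds, while probabilities above 0.5 give positive log odds.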
Why not just fit a multiple linear regression (MLR) instead of a multiple logistic regression? There are two main reasons:
1. The standard statistical tests for MLR (goodness of fit, parameter estimates, standard errors, etc.) assume a Gaussian distribution for the residuals. With a binary response, the residuals can take only two values at any given set of X values, so this assumption is violated.
2. For interpretability, it’s preferable to talk about the probability of observing a “success” or a “failure.” This can be done with logistic regression, but not with linear regression, whose fitted values are not constrained to lie between 0 and 1.
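To illustrate the second point, here is a minimal sketch that fits an ordinary least-squares line to a small binary response. The data are made up purely for illustration, and a single predictor is used so the closed-form fit stays short.

```python
def fit_simple_linear_regression(x, y):
    """Closed-form least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical toy data: a binary response that switches from 0 to 1.
x = [1, 2, 3, 4, 5, 6]
y = [0, 0, 0, 1, 1, 1]

intercept, slope = fit_simple_linear_regression(x, y)
fitted = [intercept + slope * xi for xi in x]

# The fitted values at the extremes fall outside the 0-to-1 range,
# so they cannot be read as probabilities.
print(min(fitted) < 0, max(fitted) > 1)  # True True
```

A straight line has no way of respecting the bounds on a probability, which is part of why the model is fit on a different scale.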
Why model the log odds rather than the probability itself? It's essentially a mathematical technicality that lets us use much of the framework developed for MLR models. While probability is constrained between 0 and 1, odds can be any positive number, and the log odds can therefore be any real number. Thus, we can use a familiar form for our model equation: log odds(Y=1) = β0 + β1*X1 + β2*X2 + ... We can then back-transform estimates as needed to provide inference about odds or predictions for probabilities.
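The back-transformation can be sketched as follows. Exponentiating the log odds gives the odds, and odds / (1 + odds) gives the probability; the two steps combine into the inverse logit, 1 / (1 + exp(-log odds)). The coefficient values below are hypothetical and chosen only to make the arithmetic easy to follow; in practice they would come from a fitted model.

```python
import math

def inverse_logit(log_odds):
    """Convert log odds back to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-log_odds))

def predict_probability(intercept, coefficients, x_values):
    """Evaluate log odds = b0 + b1*X1 + b2*X2 + ..., then back-transform."""
    log_odds = intercept + sum(b * x for b, x in zip(coefficients, x_values))
    return inverse_logit(log_odds)

# Hypothetical estimates: b0 = -1.0, b1 = 0.5, b2 = 0.25
b0, betas = -1.0, [0.5, 0.25]

# With X1 = 2 and X2 = 0 the log odds are -1.0 + 1.0 + 0.0 = 0.0.
print(predict_probability(b0, betas, [2.0, 0.0]))  # 0.5

# exp(b1) is the multiplicative change in the odds for a one-unit
# increase in X1 with X2 held fixed.
odds_ratio_x1 = math.exp(betas[0])
```

For inference about odds, exp(β1) is the usual quantity to report: a value above 1 means the odds of Y = 1 increase as X1 increases.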