
Logistic regression is used when the outcome variable (Y variable, dependent variable, response variable, etc.) can take on only two possible values, and its goal is to model the probability of observing a "success." Here, "success" simply refers to one of those two possible outcomes and is defined by your experimental design. As with many terms in statistics, "success" in this context carries a slightly different meaning than it does in everyday use. For example, while studying the incidence of a rare disease in a population, you may be interested in the probability that an individual will get this disease. In this case, you would consider getting the disease a "success," if only for the sake of constructing the model.

Looking at another example, suppose you were given a dataset containing the length of time that students studied for a test, along with whether or not each student passed. You would probably expect that the longer a student studies, the more probable it is that the student will pass. Here, the "success" would be that the student passes. However, the Y variable for logistic regression can be just about anything, so long as it can only take on one of two possible values: yes/no, pass/fail, alive/dead, etc. Another way to say this is that the outcome variable must be "binary." Usually, these outcomes are encoded as a "1" (indicating a "success") or a "0" (indicating a "failure"). Note that in our example, if you had been given each student's grade (as a percent), you might have considered performing linear or nonlinear regression. However, because our outcome is binary, logistic regression is the appropriate choice.
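The model underlying this idea is the logistic function, which converts any value of b0 + b1*X into a probability between 0 and 1. A minimal sketch in Python (the coefficient values and study-hour inputs below are hypothetical, chosen only for illustration; in practice, software such as Prism estimates the coefficients from your data):

```python
import math

def success_probability(x, b0, b1):
    """Simple logistic model: P(success) = 1 / (1 + e^-(b0 + b1*x))."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

# Hypothetical coefficients: intercept b0 = -4, slope b1 = 0.8 per hour studied.
# The predicted probability of passing rises smoothly with hours studied,
# but always stays between 0 and 1 (unlike a straight line).
for hours in (0, 2, 5, 10):
    p = success_probability(hours, b0=-4.0, b1=0.8)
    print(f"{hours:2d} h studied -> P(pass) = {p:.3f}")
```

Because the output is squeezed through the logistic function, predictions can never fall below 0 or above 1, which is exactly why this model suits binary outcomes where a straight line does not.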

In a sense, simple logistic regression can be thought of as an extension of simple linear regression that handles cases with binary outcomes: both build models with which you can predict an outcome value (Y) from a single input value (X). Because of this, there are two very important things to remember when thinking about the similarities and differences of linear and logistic regression:

1. Linear regression works when the outcome is continuous, while logistic regression works when the outcome is binary. Trying to use linear regression on a binary outcome variable simply won't work (well).

2. Logistic regression generates a model that allows you to predict the probability of success, given a certain X value. The data that you put into the model, however, will only include actual outcomes (at a given X value, a success was either observed or it wasn't).

These two topics are discussed in more detail in the following sections.
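The second point above — that the fitted model returns probabilities even though the input data are only 0s and 1s — can be sketched with a maximum-likelihood fit. The dataset, learning rate, and iteration count below are hypothetical and purely illustrative; this hand-rolled gradient ascent is a toy stand-in for what statistical software does for you:

```python
import math

# Hypothetical data: hours studied (X) and pass/fail outcome (Y, 1 = pass).
hours  = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
passed = [0,   0,   0,   1,   0,   1,   1,   1]

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

# Estimate b0 and b1 by gradient ascent on the log-likelihood of the
# observed 0/1 outcomes (a simple form of maximum-likelihood fitting).
b0, b1 = 0.0, 0.0
lr = 0.1
for _ in range(5000):
    g0 = sum(y - logistic(b0 + b1 * x) for x, y in zip(hours, passed))
    g1 = sum((y - logistic(b0 + b1 * x)) * x for x, y in zip(hours, passed))
    b0 += lr * g0 / len(hours)
    b1 += lr * g1 / len(hours)

# Although every Y we fed in was exactly 0 or 1, the fitted model
# outputs a probability of success for any X value.
p6 = logistic(b0 + b1 * 6.0)
print(f"b0 = {b0:.2f}, b1 = {b1:.2f}, P(pass | 6 h studied) = {p6:.2f}")
```

Note the asymmetry: the model consumes observed successes and failures, but what it produces is a smooth probability curve over X.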