# Choosing a model for multiple logistic regression

Multiple logistic regression is used when the dependent (Y) variable is dichotomous (yes/no, success/fail, etc.). The dependent (Y) variable must have exactly two values: it can be a continuous variable with the values "0" and "1", or a categorical variable with two (text) levels. Multiple logistic regression fits the probability that Y equals 1 (or the probability that Y equals the assigned categorical level), given particular values of one or more X variables.

## Choose dependent variable

From the drop-down menu, select the column that contains your dependent (Y) variable. This variable can contain only two values (it must be a continuous variable with values "1" and "0" or a categorical variable with two levels). If a continuous variable (with values "0" and "1") is used as the dependent variable, labels identifying these values can be added on the options tab of the parameters dialog.
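If you prepare data outside Prism, the two-value restriction can be checked before import. The sketch below is only an illustration: `encode_dichotomous` is a hypothetical helper, and its rule for deciding which level becomes "1" (the later one in sorted order) is an assumption, not a description of how Prism assigns levels.

```python
def encode_dichotomous(values):
    """Map a two-valued column to 0/1 codes; raise if it is not dichotomous."""
    levels = sorted(set(values), key=str)
    if len(levels) != 2:
        raise ValueError(f"expected exactly 2 distinct values, found {len(levels)}")
    # Assumption: the level that sorts last is treated as the "1" outcome.
    mapping = {levels[0]: 0, levels[1]: 1}
    return [mapping[v] for v in values], mapping


# A categorical Y column with two text levels.
codes, mapping = encode_dichotomous(["no", "yes", "yes", "no"])
```

A column with three or more distinct values raises `ValueError`, which mirrors the requirement that Y have exactly two values.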

## Define model

Other than the restriction on the Y values, data are entered in the same way as for multiple linear regression. Prism makes it easy to choose which variables to include in the model, including two- and three-way interactions and transformations. However, Prism cannot automatically choose a set of variables or interactions for you.

### Intercept

For logistic regression, the intercept is the expected log odds when all of the X values are zero. If that doesn't provide you with much intuitive value, you're not alone. This value is much less interpretable than with linear regression because of the log odds transformation of the dependent variable used in logistic regression. Perhaps the easiest way to understand it is to think in terms of the logistic S-curve that can be plotted from simple logistic regression. The intercept term tells you the value of that S-curve when X is zero. Using some simple algebra, it can be shown that the value of this S-curve at X=0 is given by:

$$P(Y = 1 \mid X = 0) = \frac{e^{\beta_0}}{1 + e^{\beta_0}}$$

Another way to think of this is that the intercept term allows you to determine the probability of success when all of your predictor variables are equal to zero.
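As a quick numeric check of this conversion, here is a minimal sketch that turns an intercept on the log odds scale into a probability. The function name and the example intercept values are illustrative assumptions, not Prism output.

```python
import math


def intercept_to_probability(beta0):
    """Convert an intercept on the log-odds scale to a probability."""
    return math.exp(beta0) / (1.0 + math.exp(beta0))


# Hypothetical intercept values, chosen only for illustration.
p_zero = intercept_to_probability(0.0)       # log odds of 0
p_negative = intercept_to_probability(-2.0)  # negative log odds
```

A log odds of 0 corresponds to a probability of 0.5, and an intercept of -2 corresponds to a probability of roughly 0.12.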

One important thing to keep in mind about the intercept of a logistic regression model is that there’s almost never a reason to exclude it. As discussed above, the intercept (β0) is the predicted log odds when all of the predictor variables (X variables) are equal to zero. Converted from the log odds scale to the probability scale, it gives the probability of observing a “success” (Y=1) when all X variables are zero. Excluding the intercept from the model fixes the log odds at zero when all X variables are zero. Because e^0 = 1, the formula above then gives 1/(1+1), so you are assuming that the probability of observing a “success” at that point is 0.5 (or 50%). It’s quite uncommon that you would be able to assume (or that you’d want to assume) that this probability is 0.5. Therefore, you’ll almost always want to include the intercept term in your model.
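The effect of dropping the intercept can be seen in a small sketch of the logistic model itself. The coefficients and the intercept below are made-up values, and `predicted_probability` is a hypothetical helper rather than anything Prism exposes.

```python
import math


def predicted_probability(x_values, coefficients, intercept=0.0):
    """P(Y=1) from a logistic model; intercept=0.0 mimics excluding the intercept."""
    log_odds = intercept + sum(b * x for b, x in zip(coefficients, x_values))
    return 1.0 / (1.0 + math.exp(-log_odds))


coefficients = [0.8, -1.5]  # hypothetical regression coefficients
all_zero = [0.0, 0.0]       # every X variable set to zero

without_intercept = predicted_probability(all_zero, coefficients)
with_intercept = predicted_probability(all_zero, coefficients, intercept=-2.0)
```

Whatever coefficients are used, `without_intercept` is 0.5 when every X is zero, while `with_intercept` is free to take the value implied by the intercept.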

### Main Effects

Use the checkboxes to select the desired X variables to include in the model. When fitting the model, Prism will determine a regression coefficient for each of the selected main effects (along with each of the interactions and transforms selected). By default, all of the main effects are selected. If you uncheck one of the main effects, that X variable will essentially not be part of the analysis (unless that variable is part of an interaction or transform as explained below).

### Interactions

Prism makes it easy to include two- and three-way interactions of the independent variables in the model. A two-way interaction multiplies two variables together to create a new variable for which the model will determine a regression coefficient. Similarly, three-way interactions multiply three variables together. Three-way interactions are used less commonly than two-way interactions.
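To make the multiplication concrete, the following sketch builds interaction columns by hand. The data and the helper name `interaction_column` are illustrative assumptions. Prism creates these terms for you when you select them.

```python
import math


def interaction_column(*columns):
    """Multiply the given columns row by row to form an interaction term."""
    return [math.prod(row) for row in zip(*columns)]


# Hypothetical X variables with three rows each.
x1 = [1.0, 2.0, 3.0]
x2 = [0.0, 1.0, 2.0]
x3 = [2.0, 2.0, 2.0]

two_way = interaction_column(x1, x2)        # x1 * x2 for each row
three_way = interaction_column(x1, x2, x3)  # x1 * x2 * x3 for each row
```

Each resulting column is treated like any other X variable: the model estimates one regression coefficient for it.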

### Transforms

Prism lets you use the square, the cube, or the square root of any variable in the model. Let us know if you’d like Prism to offer other transforms when defining multiple regression models.
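For readers who want to see what these transforms do to a column of values, here is a minimal sketch. The `TRANSFORMS` table and `transform_column` helper are hypothetical names used only for illustration.

```python
import math

# The three transforms named above, keyed by illustrative labels.
TRANSFORMS = {
    "square": lambda x: x ** 2,
    "cube": lambda x: x ** 3,
    "sqrt": math.sqrt,  # raises ValueError for negative inputs
}


def transform_column(column, name):
    """Apply the named transform to every value in a column."""
    return [TRANSFORMS[name](x) for x in column]


x = [4.0, 9.0]  # hypothetical X values
squared = transform_column(x, "square")
cubed = transform_column(x, "cube")
rooted = transform_column(x, "sqrt")
```

As with interactions, each transformed column enters the model as its own term with its own regression coefficient.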