
 Analysis checklist: Simple logistic regression

To check that simple logistic regression is an appropriate analysis for your data, ask yourself these questions:

Is the outcome (Y) variable binary (dichotomous)?

The dependent (Y) variable may take on only two values, and in Prism these must be coded as 0 and 1.
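Before entering data, it can help to confirm that the outcome column really contains only 0s and 1s. The following is a minimal sketch, not part of Prism; the function name `check_binary_outcome` and the sample values are hypothetical.

```python
def check_binary_outcome(y_values):
    """Return (ok, offending), where offending lists each value that is not 0 or 1."""
    offending = []
    for value in y_values:
        # Anything other than 0 or 1 cannot be used as a logistic outcome.
        if value not in (0, 1) and value not in offending:
            offending.append(value)
    return (len(offending) == 0, offending)


# Hypothetical outcome column with two coding mistakes.
ok, offending = check_binary_outcome([0, 1, 2, "yes", 2])
print(ok, offending)
```

Values flagged here would need to be recoded as 0 or 1 (or excluded) before running the analysis.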

Are the observations (rows) independent?

One of the fundamental assumptions of logistic regression is that each row of data is a unique, independent observation. An example of independent observations is a study on 100 randomly selected people where a 1 indicates a positive outcome and a 0 a negative outcome, and each person is recorded on a single row. If each person was measured more than once (say, at various time points in the study), then the observations are not independent and logistic regression isn't appropriate. Independence would also be questionable if some of the participants were in the same family, because almost any outcome is likely to be more similar between two individuals from one family than between two unrelated individuals.

Does the model fit and predict the data well?

All models are wrong, but some are useful…

Prism offers a variety of metrics to evaluate how well the simple logistic model fits the entered data. However, keep in mind that fitting models to data and interpreting model fits is, to some extent, subjective. Some possibilities to consider when evaluating a given model include:

Does the model classify data well? In other words, does the model correctly predict the observed 0s and 1s? You can evaluate this in Prism in several ways, including Tjur’s R squared, an ROC plot (with area under the ROC curve), and the row classification table.
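To make these three metrics concrete, here is a minimal sketch of how each can be computed from observed outcomes and model-predicted probabilities. It illustrates the standard definitions and is not Prism's implementation. The function names, the example data, and the 0.5 classification cutoff are all assumptions made for illustration.

```python
def split_by_outcome(y, p):
    """Separate predicted probabilities by observed outcome (1s first, then 0s)."""
    p1 = [pi for yi, pi in zip(y, p) if yi == 1]
    p0 = [pi for yi, pi in zip(y, p) if yi == 0]
    return p1, p0


def tjur_r_squared(y, p):
    """Mean predicted probability for observed 1s minus that for observed 0s."""
    p1, p0 = split_by_outcome(y, p)
    return sum(p1) / len(p1) - sum(p0) / len(p0)


def roc_auc(y, p):
    """Fraction of (1, 0) pairs where the 1 has the higher prediction; ties count half."""
    p1, p0 = split_by_outcome(y, p)
    wins = 0.0
    for a in p1:
        for b in p0:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(p1) * len(p0))


def classification_table(y, p, cutoff=0.5):
    """Counts keyed by (observed, predicted); the cutoff is an assumed value."""
    table = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 0}
    for yi, pi in zip(y, p):
        predicted = 1 if pi >= cutoff else 0
        table[(yi, predicted)] += 1
    return table


# Hypothetical observed outcomes and predicted probabilities.
y = [0, 0, 0, 1, 0, 1, 1, 1]
p = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]

print(tjur_r_squared(y, p))
print(roc_auc(y, p))
print(classification_table(y, p))
```

A Tjur's R squared near 1 and an area under the ROC curve near 1 both indicate that the model separates the 0s from the 1s well, while an area near 0.5 indicates performance no better than chance.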

Does the logistic model out-perform an intercept-only model? Prism tests this concept in two related, but slightly different ways: using the Wald test to examine if β1 is significantly non-zero, and using the likelihood ratio test to directly compare the given model with an intercept-only model.
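The two tests can be sketched as follows. This is an illustration under stated assumptions, not Prism's code: the model is fit with a hand-written Newton-Raphson loop, the Wald P value is taken from a two-sided normal distribution, the likelihood ratio P value from a chi-square distribution with one degree of freedom, and all function names and data are hypothetical. Prism's reported values may be computed or presented differently.

```python
import math


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def score_and_information(b0, b1, x, y):
    """Gradient of the log-likelihood and entries of the 2x2 information matrix."""
    g0 = g1 = h00 = h01 = h11 = 0.0
    for xi, yi in zip(x, y):
        p = sigmoid(b0 + b1 * xi)
        w = p * (1.0 - p)
        g0 += yi - p
        g1 += (yi - p) * xi
        h00 += w
        h01 += w * xi
        h11 += w * xi * xi
    return g0, g1, h00, h01, h11


def fit_simple_logistic(x, y, iterations=25):
    """Newton-Raphson fit; assumes the data are not perfectly separated."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0, g1, h00, h01, h11 = score_and_information(b0, b1, x, y)
        det = h00 * h11 - h01 * h01
        # Newton step: add (information matrix inverse) times (gradient).
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    # Standard error of b1 from the inverse information at the final estimates.
    _, _, h00, h01, h11 = score_and_information(b0, b1, x, y)
    se_b1 = math.sqrt(h00 / (h00 * h11 - h01 * h01))
    return b0, b1, se_b1


def log_likelihood(b0, b1, x, y):
    total = 0.0
    for xi, yi in zip(x, y):
        p = sigmoid(b0 + b1 * xi)
        total += math.log(p) if yi == 1 else math.log(1.0 - p)
    return total


# Hypothetical data with some overlap between the 0s and the 1s.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]

b0, b1, se_b1 = fit_simple_logistic(x, y)

# Wald test: is the slope b1 significantly different from zero?
z = b1 / se_b1
p_wald = math.erfc(abs(z) / math.sqrt(2.0))

# Likelihood ratio test: compare against the intercept-only model.
y_mean = sum(y) / len(y)
b0_null = math.log(y_mean / (1.0 - y_mean))
lr_stat = 2.0 * (log_likelihood(b0, b1, x, y) - log_likelihood(b0_null, 0.0, x, y))
p_lrt = math.erfc(math.sqrt(lr_stat / 2.0))

print(b1, se_b1, p_wald, lr_stat, p_lrt)
```

The two P values usually agree in direction but are not identical, because the Wald test relies on the estimated standard error of the slope while the likelihood ratio test compares the log-likelihoods of the two fitted models directly.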

Do you have sufficient data to trust your results?

As with all statistical modeling, more data is generally better. At the bottom of the tabular results sheet of the analysis results, Prism reports how many observations were included in the model (Rows analyzed). For simple logistic regression, a general rule of thumb is to have at least ten observations with an outcome of 0 and ten observations with an outcome of 1.
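The rule of thumb is easy to check by counting each outcome. The sketch below assumes the outcomes are available as a list of 0s and 1s; the function name `has_enough_observations` and the sample data are hypothetical.

```python
def has_enough_observations(y_values, minimum_per_outcome=10):
    """Return (ok, n_zeros, n_ones) for the ten-per-outcome rule of thumb."""
    n_zeros = sum(1 for value in y_values if value == 0)
    n_ones = sum(1 for value in y_values if value == 1)
    ok = n_zeros >= minimum_per_outcome and n_ones >= minimum_per_outcome
    return ok, n_zeros, n_ones


# Hypothetical data set: twelve 0s but only nine 1s.
outcomes = [0] * 12 + [1] * 9
print(has_enough_observations(outcomes))
```

Here the check fails because only nine observations have an outcome of 1, even though 21 rows were analyzed in total.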

Are you underfitting?

In the case of simple logistic regression, it’s possible that your predictor (X) variable is only one of multiple variables that affect whether an outcome is a success or not. If the model's predictive performance isn’t as good as desired, perhaps you’re missing some key variable(s) that you either didn’t measure or chose not to model. If you simply chose not to model them, you should investigate their impact using multiple logistic regression, which is a natural extension of simple logistic regression. Read more about multiple logistic regression here. On the other hand, if the key variable is one that you didn’t measure, you’re out of luck. Go back and repeat the experiment with a focus on collecting more information. THEN come back and perform multiple logistic regression!