A reminder of how hypothesis tests work

Prism offers two hypothesis tests for assessing how well a model fits the entered data. Like other hypothesis tests that you’ve likely encountered, both tests start by defining a null hypothesis (H0). Each test then calculates a statistic and a corresponding P value. The P value is the probability of obtaining a test statistic as large as the one calculated (or larger) if the null hypothesis were true. Each test also requires a threshold for how small a P value must be before you reject the null hypothesis. This (admittedly arbitrary) threshold is typically set to 0.05 and is referred to as the alpha (α) level. If the obtained P value is smaller than α, then we reject the null hypothesis.

Before reading about each of the tests below, it's also important to understand what is meant by the terms "specified model" and "intercept-only model". The specified model is simply the model that was fit to the data (the one that you defined on the Model tab of the analysis). It includes the effects of the predictor variables on the probability of observing a success. The intercept-only model is a model that assumes that the contributions from the predictor variables are zero. In other words, the intercept-only model assumes that none of the independent variables help predict the outcome.

Hosmer-Lemeshow (HL) Test

The Hosmer-Lemeshow test is a classic hypothesis test for logistic regression. The null hypothesis is that the specified model is correct (that it fits well). The test works by first sorting the observations by their predicted probability and splitting them into 10 groups, each with the same number of observations (N). For each group, the average predicted probability is calculated, and that average is multiplied by N to get the expected number of 1s for that group (and, in turn, the expected number of 0s for that group). The test then calculates a Pearson goodness-of-fit statistic from the observed and expected numbers of 0s and 1s, summed across all of the groups, and uses a chi-squared distribution to calculate a P value.

As mentioned, the null hypothesis is that the specified model fits well, so contrary to many tests, a small P value indicates a poor fit of the model to the data. Another way to think about this is that a small P value indicates that there is more deviation from the expected number of 0s and 1s (given 10 bins) than you'd expect by chance. Thus, there may be some additional factor, interaction, or transform that is missing from the model.
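The procedure described above can be sketched in a few lines of Python. This is only an illustration, not Prism's implementation: the function names and the example data are made up, and the degrees of freedom (number of groups minus 2) are the conventional choice for this test, which this page does not state.

```python
import math


def chi2_sf_even(x, df):
    """Upper-tail chi-squared probability (closed form, even df only)."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive, even df")
    if x <= 0:
        return 1.0
    half = x / 2.0
    term = 1.0
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def hosmer_lemeshow(outcomes, predicted, groups=10):
    """Return (statistic, P value) for 0/1 outcomes and predicted probabilities."""
    n = len(outcomes)
    if n != len(predicted) or n < groups:
        raise ValueError("need matching lists with at least one row per group")
    # Sort observation indices by predicted probability.
    order = sorted(range(n), key=lambda i: predicted[i])
    statistic = 0.0
    for g in range(groups):
        # Groups differ in size by at most one when n is not divisible by groups.
        members = order[g * n // groups:(g + 1) * n // groups]
        size = len(members)
        # Average predicted probability times group size = sum of probabilities.
        expected_1 = sum(predicted[i] for i in members)
        observed_1 = sum(outcomes[i] for i in members)
        expected_0 = size - expected_1
        observed_0 = size - observed_1
        # Pearson terms; these fail if an expected count is exactly zero.
        statistic += (observed_1 - expected_1) ** 2 / expected_1
        statistic += (observed_0 - expected_0) ** 2 / expected_0
    # Assumption: df = groups - 2, the conventional choice for this test.
    return statistic, chi2_sf_even(statistic, groups - 2)


# Made-up data: 10 blocks of 20 rows whose observed 1s match the predictions.
predicted = [0.05 + 0.1 * g for g in range(10) for _ in range(20)]
outcomes = [1 if k < 2 * g + 1 else 0 for g in range(10) for k in range(20)]
statistic, p_value = hosmer_lemeshow(outcomes, predicted)
print(statistic, p_value)
```

Because the made-up outcomes agree with the predicted probabilities in every group, the statistic is essentially zero and the P value is large. Data that deviate from the predictions give a larger statistic and a smaller P value.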

This test has received criticism for the arbitrary number of bins (10), since it has been shown that changing this number can influence the result of the test. The test is included as an option in Prism so you can compare results obtained in Prism with results calculated elsewhere, even though this test is not recommended.

Log likelihood ratio test (LRT)

The log likelihood ratio test is also a classic test. It compares how well the specified model fits with how well the intercept-only model fits. In this case, the null hypothesis is that the intercept-only model fits best, so a small P value leads you to reject this null hypothesis and conclude that the specified model outperforms the intercept-only model. As the name implies, this test uses the log likelihoods of the specified model and the intercept-only model to calculate the associated statistic and P value. Although this test specifically compares the specified model with an intercept-only model, it is the same test that is provided on the Compare tab of the multiple logistic regression parameters dialog to compare any two nested models.

The likelihood ratio tells you how much more likely the data were to be generated by the specified model than by the intercept-only model. If the independent variables do a good job of predicting the outcome, the likelihood ratio will be high and the corresponding P value will be small.
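As a rough illustration of the calculation, the Python sketch below computes the statistic as twice the difference between the two log likelihoods and looks up the P value from a chi-squared distribution. It is not Prism's implementation. The function names and data are hypothetical, the predicted probabilities are assumed to come from a model that was already fit by maximum likelihood, and the degrees of freedom are assumed to equal the number of predictor parameters in the specified model (the standard choice, not stated on this page).

```python
import math


def chi2_sf(x, df):
    """Upper-tail chi-squared probability for a positive integer df."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        term = 1.0
        total = 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: normal-tail part plus a finite series.
    total = 0.0
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)
    return math.erfc(math.sqrt(half)) + total


def log_likelihood(outcomes, probabilities):
    """Log likelihood of 0/1 outcomes given the probability of a 1 for each row."""
    return sum(
        math.log(p) if y == 1 else math.log(1.0 - p)
        for y, p in zip(outcomes, probabilities)
    )


def likelihood_ratio_test(outcomes, predicted, n_predictors):
    """Return (statistic, P value) comparing a fitted model to intercept-only."""
    n = len(outcomes)
    # The intercept-only fit predicts the overall fraction of 1s for every row.
    base_rate = sum(outcomes) / n
    ll_intercept = log_likelihood(outcomes, [base_rate] * n)
    ll_specified = log_likelihood(outcomes, predicted)
    # Twice the log of the likelihood ratio; clamped at zero against rounding.
    statistic = max(0.0, 2.0 * (ll_specified - ll_intercept))
    # Assumption: df = number of predictor parameters in the specified model.
    return statistic, chi2_sf(statistic, n_predictors)


# Made-up example: one predictor that separates the outcomes fairly well.
outcomes = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
predicted = [0.2, 0.2, 0.2, 0.2, 0.2, 0.8, 0.8, 0.8, 0.8, 0.8]
statistic, p_value = likelihood_ratio_test(outcomes, predicted, n_predictors=1)
print(statistic, p_value)
```

If the predicted probabilities were no better than the overall fraction of 1s, the two log likelihoods would match, the statistic would be zero, and the P value would be 1.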

Note the different meaning of the P values

A small P value has opposite meanings with the two tests:

A small P value from the Hosmer-Lemeshow test means that the specified model doesn't do a good job of predicting the data. Consider whether you need additional independent variables or interactions included in the model.

A small P value from the likelihood ratio test means that the intercept-only model doesn't do a good job of predicting the data. The specified independent variables and interactions improve the fit of the model to the data.

 

 

© 1995-2019 GraphPad Software, LLC. All rights reserved.