# Classification methods for multiple logistic regression

A reasonable question to ask when evaluating a model might be, “How well does the model work for classifying the 0s and 1s observed in the data?”

Logistic regression computes the probability of receiving a "positive" result (encoded in the data table as a 1). To use logistic regression to predict whether a new observation is "positive" or "negative", specify a cutoff value: the minimum predicted probability that will be considered a "positive". The standard cutoff is 0.5, which means that if the predicted probability is greater than 0.5, that observation is classified as a "positive" (or simply as a 1). Now that we understand what cutoff values tell us, let's look at the three classification methods Prism offers.
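As a minimal sketch of how a cutoff turns predicted probabilities into 0/1 classifications (the probability values here are made up for illustration and are not Prism output):

```python
# Predicted probabilities for four hypothetical observations.
probabilities = [0.12, 0.55, 0.91, 0.49]

cutoff = 0.5
# Classify as 1 ("positive") when the predicted probability exceeds the cutoff.
classes = [1 if p > cutoff else 0 for p in probabilities]
print(classes)  # [0, 1, 1, 0]
```

Raising the cutoff makes the model more conservative about calling an observation "positive"; lowering it does the opposite.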

## Area under the ROC curve

Area under the ROC curve (AUC) provides an aggregate measure of how well the model correctly classifies the 0s and 1s across all possible cutoff values. AUC values typically range between 0.5 and 1: an area of 0.5 means that the model predicts which outcomes will be 1 or 0 no better than flipping a coin, while an area of 1 means that the model predicts perfectly. Examples of ROC curves from logistic regression at these extremes are provided.
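The AUC can be understood as the probability that a randomly chosen observed 1 receives a higher predicted probability than a randomly chosen observed 0 (counting ties as half). A small self-contained sketch of that interpretation, with made-up labels and scores (not Prism's implementation):

```python
def auc(labels, scores):
    """Fraction of (1, 0) pairs where the observed 1 outscores the
    observed 0, with ties counted as half. Equals the area under
    the ROC curve."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(auc(labels, scores))  # 0.75
```

Here three of the four (1, 0) pairs are ranked correctly, giving an AUC of 0.75.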

## Classification table

The classification table is a 2x2 table that reports the numbers of correctly classified values at the user-specified cutoff. Its four entries give the number of observed 0s (and 1s) that were correctly (and incorrectly) predicted. Additionally, the classification table provides the total number of observed 1s and 0s, the total number of predicted 1s and 0s, the percent of correctly classified 1s and 0s, the percent of total correctly classified observations, and the positive and negative predictive power.
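A sketch of how those summary values follow from the four cells of the 2x2 table, using invented observed and predicted classifications (the variable names are assumptions, not Prism's terminology):

```python
observed  = [1, 1, 1, 0, 0, 0, 1, 0]
predicted = [1, 0, 1, 0, 1, 0, 1, 0]

# The four cells of the 2x2 classification table.
tp = sum(o == 1 and p == 1 for o, p in zip(observed, predicted))  # correct 1s
fn = sum(o == 1 and p == 0 for o, p in zip(observed, predicted))  # missed 1s
fp = sum(o == 0 and p == 1 for o, p in zip(observed, predicted))  # false 1s
tn = sum(o == 0 and p == 0 for o, p in zip(observed, predicted))  # correct 0s

pct_correct_1s = 100 * tp / (tp + fn)        # percent of observed 1s classified correctly
pct_correct_0s = 100 * tn / (tn + fp)        # percent of observed 0s classified correctly
pct_total      = 100 * (tp + tn) / len(observed)
ppv = 100 * tp / (tp + fp)                   # positive predictive power
npv = 100 * tn / (tn + fn)                   # negative predictive power
print(tp, fn, fp, tn, pct_total)  # 3 1 1 3 75.0
```

Positive predictive power answers a different question than the percent of correct 1s: it asks what fraction of the *predicted* 1s were actually 1s, rather than what fraction of the *observed* 1s were found.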

## Row classification

Row Classification generates an additional table containing two columns. The first column contains a copy of the values in the selected dependent (Y) variable column as found in the data table. The second column contains the predicted probability generated by the model for each entry (each row) in the first column.
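To illustrate where those predicted probabilities come from, here is a sketch using the logistic function with a single predictor; the coefficients, X values, and observed Y values are all invented for illustration, not fitted output:

```python
import math

# Assumed fitted coefficients: intercept b0 and slope b1 (illustrative only).
b0, b1 = -1.5, 0.8

observed_y = [0, 1, 1, 0]      # copy of the dependent (Y) column
x          = [1.0, 3.2, 2.5, 0.4]

# Predicted probability of a 1 for each row, via the logistic function.
predicted = [1 / (1 + math.exp(-(b0 + b1 * xi))) for xi in x]

# The two-column Row Classification table: observed value, predicted probability.
for y, p in zip(observed_y, predicted):
    print(y, round(p, 3))
```

Comparing each row's observed 0 or 1 against its predicted probability shows at a glance which observations the model classifies confidently and which sit near the cutoff.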