Details of calculations for ROC curves.

Last modified January 1, 2009

The area under the ROC curve

The area under a ROC curve quantifies the overall ability of the test to discriminate between those individuals with the disease and those without the disease.

Prism computes the area under the entire ROC curve, starting at 0,0 and ending at 100,100. Even though Prism does not plot the ROC curve out to these extremes, it computes the area for the entire curve.

Even though the sensitivity and specificity are plotted as percentages, the AUC is computed as if they were fractions. The maximum possible AUC of an ideal test, therefore, is 1.00. The AUC doesn't really have practical units, but it can be thought of as the fraction of the maximum possible AUC.
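To make the percent-versus-fraction point concrete, here is a minimal sketch that takes ROC points in percent, extends the curve to 0,0 and 100,100, and integrates in fractions. The function name is hypothetical, and the trapezoidal rule is an assumption made for illustration; the text above does not say which integration rule Prism uses.

```python
def auc_from_percent_points(points):
    """Area under a ROC curve given (100 - specificity%, sensitivity%) points.

    Assumption: trapezoidal integration. The curve is extended to the
    extremes (0, 0) and (100, 100) before integrating.
    """
    pts = sorted(set(points) | {(0.0, 0.0), (100.0, 100.0)})
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        width = (x1 - x0) / 100.0                # percent -> fraction
        mean_height = (y0 + y1) / 2.0 / 100.0    # percent -> fraction
        area += width * mean_height
    return area


# A perfect test reaches 100% sensitivity with 0% false positives.
perfect = auc_from_percent_points([(0.0, 100.0)])
# With no points between the extremes, only the diagonal remains.
diagonal = auc_from_percent_points([])
```

Because both axes are divided by 100 before integrating, the result cannot exceed 1.0, whatever scale the graph uses.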

A truly useless test (one no better at identifying true positives than flipping a coin) has an area of 0.5. A perfect test (one that has zero false positives and zero false negatives) has an area of 1.00. Your test will have an area between those two values. Even if you choose to plot the results as percentages, Prism reports the area as a fraction.

Here is a very intuitive interpretation of the AUC.

If patients have higher test values than controls, then:

The area represents the probability that a randomly selected patient will have a higher test result than a randomly selected control.

If patients tend to have lower test results than controls:

The area represents the probability that a randomly selected patient will have a lower test result than a randomly selected control.

For example: If the area equals 0.80, on average, a patient will have a more abnormal test result than 80% of the controls. If the test were perfect, every patient would have a more abnormal test result than every control and the area would equal 1.00.

If the test were worthless, no better at identifying normal versus abnormal than chance, then one would expect that half of the controls would have a higher test value than a patient known to have the disease and half would have a lower test value. Therefore, the area under the curve would be 0.5.
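The probability interpretation can be checked directly by comparing every patient with every control. This is a minimal sketch, not Prism's implementation: the function name is hypothetical, and counting a tie as half a "win" is an assumption borrowed from the usual Mann-Whitney convention. It also assumes patients tend to have the higher values.

```python
def auc_as_probability(patients, controls):
    """Fraction of (patient, control) pairs in which the patient scores higher.

    Assumption: ties count as half a win.
    """
    wins = 0.0
    for p in patients:
        for c in controls:
            if p > c:
                wins += 1.0
            elif p == c:
                wins += 0.5
    return wins / (len(patients) * len(controls))


# Every patient is above every control: the test is perfect.
perfect = auc_as_probability([5, 6, 7], [1, 2, 3])
# One patient (3) falls below one control (3.5): 8 of the 9 pairs are wins.
overlap = auc_as_probability([3, 4, 5], [1, 2, 3.5])
```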

Prism never reports an area under a ROC curve of less than 0.50. If the area is first calculated as less than 0.50, Prism reverses the definition of abnormal from a higher test value to a lower test value. This adjustment results in an area under the curve that is greater than 0.50.
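The reversal described above can be sketched as follows. The function name and the returned labels are hypothetical. The sketch relies on the fact that swapping "higher" for "lower" turns an area A into 1 - A (assuming ties are counted as half).

```python
def orient_area(raw_area):
    """Return (area, direction) with the area never below 0.5.

    raw_area is the area computed with higher values defined as abnormal.
    """
    if raw_area < 0.5:
        # Redefine abnormal as a LOWER test value; the area becomes 1 - raw_area.
        return 1.0 - raw_area, "lower values are abnormal"
    return raw_area, "higher values are abnormal"


area, direction = orient_area(0.2)
```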

Bug: The area can be greater than 1.00

A bug in Prism 5 and 6 can lead to a reported area greater than 1.0 when you have a huge amount of data. This happens when Controls * Patients^2 >= 2,147,483,648 or Controls^2 * Patients >= 2,147,483,648. With equal numbers of patients and controls, this bug will happen when there are more than 2580 subjects in all. It will be fixed in Prism 7.
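The condition above is easy to check for a given data set. In this minimal sketch the function names are hypothetical. The limit 2,147,483,648 equals 2^31, which suggests (as an inference, not something stated above) that a signed 32-bit integer overflows.

```python
LIMIT = 2**31  # 2,147,483,648


def area_bug_possible(controls, patients):
    """True when the group sizes meet the condition quoted in the text."""
    return (controls * patients**2 >= LIMIT
            or controls**2 * patients >= LIMIT)


def smallest_equal_group_size():
    """Smallest n for which n patients and n controls meet the condition."""
    n = 1
    while not area_bug_possible(n, n):
        n += 1
    return n


n = smallest_equal_group_size()
total_subjects = 2 * n
```

With equal groups the condition reduces to n^3 >= 2^31, which is first met at n = 1291, or 2582 subjects in all. That agrees with the "more than 2580 subjects" figure quoted above.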

Computing the standard error of the area under a ROC curve

The SE of the area is calculated using this equation from Hanley JA, McNeil BJ. The meaning and use of the area under the Receiver Operating Characteristic (ROC) curve. Radiology 1982;143:29-36.

$$\mathrm{SE} = \sqrt{\frac{A(1-A) + (n_a - 1)(Q_1 - A^2) + (n_n - 1)(Q_2 - A^2)}{n_a \, n_n}}$$

Where $A$ is the area under the curve, $n_a$ and $n_n$ are the number of abnormals and normals respectively, and $Q_1$ and $Q_2$ are estimated by:

$$Q_1 = \frac{A}{2 - A}, \qquad Q_2 = \frac{2A^2}{1 + A}$$
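The equation above translates directly into code. This is a minimal sketch of the Hanley and McNeil formula as given here; the function and argument names are hypothetical.

```python
import math


def hanley_mcneil_se(area, n_abnormal, n_normal):
    """Standard error of the area under a ROC curve (Hanley & McNeil, 1982)."""
    q1 = area / (2.0 - area)
    q2 = 2.0 * area * area / (1.0 + area)
    numerator = (area * (1.0 - area)
                 + (n_abnormal - 1) * (q1 - area * area)
                 + (n_normal - 1) * (q2 - area * area))
    return math.sqrt(numerator / (n_abnormal * n_normal))


# Example with made-up inputs: area 0.80, 10 abnormals and 10 normals.
se = hanley_mcneil_se(0.80, 10, 10)
```

With an area of exactly 0.5, both Q1 - A*A and Q2 - A*A equal 1/12 (about 0.083333), which is where the simplified null-hypothesis form of the equation comes from.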

The P value tests the null hypothesis that the area under the ROC curve is really 0.5, meaning that the diagnostic test is not helpful at all.

To calculate this P value, Prism recomputes the SE assuming that the area really is 0.5, which is the null hypothesis. This simplifies the equation to:


$$\mathrm{SE} = \sqrt{\frac{0.25 + (n_a + n_n - 2)(0.083333)}{n_a \, n_n}}$$

Calculating the P value

Prism calculates z as (Area - 0.5) / SE, where SE is the standard error of the area computed under the null hypothesis (area = 0.5).

Finally, a two-tailed P value is calculated from the z ratio, using the normal distribution.
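The steps above (null-hypothesis SE, z ratio, two-tailed P value) can be sketched in a few lines. The function name is hypothetical. The two-tailed P value from the normal distribution is computed with the complementary error function, since 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).

```python
import math


def roc_z_and_p(area, n_abnormal, n_normal):
    """Return (z, two-tailed P) for the null hypothesis that the area is 0.5."""
    # SE recomputed assuming the area really is 0.5 (1/12 is about 0.083333).
    se_null = math.sqrt((0.25 + (n_abnormal + n_normal - 2) / 12.0)
                        / (n_abnormal * n_normal))
    z = (area - 0.5) / se_null
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


# Example with made-up inputs: area 0.80, 10 abnormals and 10 normals.
z, p = roc_z_and_p(0.80, 10, 10)
```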

The P value tests the null hypothesis that the test has no skill at distinguishing patients from controls, that is, that its results are essentially random.


The sensitivity and specificity at various thresholds

The list of thresholds is taken by sorting all the values in both groups (patients and controls) and averaging adjacent values in that sorted list. So each threshold value is midway between two values in the data. 
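The threshold list described above can be sketched as follows. The function name is hypothetical, and how tied values are handled is an assumption: this sketch simply leaves them in, so two equal adjacent values produce a threshold equal to that value.

```python
def candidate_thresholds(patients, controls):
    """Midpoints between adjacent values in the pooled, sorted data."""
    values = sorted(list(patients) + list(controls))
    return [(a + b) / 2.0 for a, b in zip(values, values[1:])]


thresholds = candidate_thresholds([3, 5], [1, 2])
```

Pooling N values in total yields N - 1 thresholds.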

Each sensitivity is the fraction of values in the patient group that are above the threshold. The specificity is the fraction of values in the control group that are below the threshold. Each confidence interval is computed from the observed proportion by the Clopper-Pearson (exact) method, without any correction for multiple comparisons.
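Here is a minimal sketch of these calculations for one threshold. It assumes higher values are abnormal, takes the Clopper method named above to be the Clopper-Pearson exact binomial interval, and finds the limits by bisection on the binomial distribution so that only the standard library is needed. All function names are hypothetical, and Prism's numerical method may differ.

```python
import math


def counts_at_threshold(patients, controls, threshold):
    """Return (true positives, true negatives) at one threshold."""
    true_pos = sum(1 for v in patients if v > threshold)
    true_neg = sum(1 for v in controls if v < threshold)
    return true_pos, true_neg


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1.0 - p)**(n - i)
               for i in range(k + 1))


def _bisect(func, target, increasing):
    """Solve func(p) = target for p in [0, 1], with func monotonic in p."""
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if (func(mid) < target) == increasing:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def clopper_pearson(k, n, alpha=0.05):
    """Exact confidence interval for a proportion of k successes out of n."""
    lower = 0.0 if k == 0 else _bisect(
        lambda p: 1.0 - binom_cdf(k - 1, n, p), alpha / 2.0, True)
    upper = 1.0 if k == n else _bisect(
        lambda p: binom_cdf(k, n, p), alpha / 2.0, False)
    return lower, upper


# Made-up data; 4.0 is midway between the adjacent values 3.9 and 4.1.
patients = [4.1, 5.3, 6.0, 7.2]
controls = [2.0, 3.1, 3.9, 5.0]
tp, tn = counts_at_threshold(patients, controls, 4.0)
sensitivity = tp / len(patients)
specificity = tn / len(controls)
spec_ci = clopper_pearson(tn, len(controls))
```

Each interval is computed on its own, one threshold at a time, which matches the statement that no correction for multiple comparisons is applied.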

