The first page of Bland-Altman results lists the difference and the average for each pair of measurements. These values are used to create the plot.

The second results page shows the average bias, which is the average of the differences. Each difference is computed as the value determined by one method minus the value determined by the other method. If each method gives the higher value about equally often, the average of the differences will be close to zero. If it is not close to zero, the two assay methods are systematically producing different results.

This page also shows the standard deviation (SD) of the differences between the two assay methods (labeled as the SD of bias) and the 95% limits of agreement, computed as the mean difference (bias) plus or minus 1.96 times the SD of the differences. If we assume a Normal distribution for the differences, and a sample size large enough that the sample mean and sample SD are quite close to their population values, then 95% of the differences between the two assay methods would be expected to fall within the 95% limits of agreement.
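The calculations described so far can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not Prism's own code: the function name bland_altman and the data are made up, and the sketch assumes the SD is the sample SD (n - 1 denominator).

```python
from statistics import mean, stdev

def bland_altman(method_a, method_b):
    """Return per-sample differences and averages plus bias, SD and 95% limits.

    Hypothetical helper for illustration; assumes paired measurements.
    """
    if len(method_a) != len(method_b):
        raise ValueError("both methods must measure the same samples")
    diffs = [a - b for a, b in zip(method_a, method_b)]       # method A minus method B
    avgs = [(a + b) / 2 for a, b in zip(method_a, method_b)]  # X axis of the plot
    bias = mean(diffs)   # average of the differences
    sd = stdev(diffs)    # sample SD of the differences (assumption: n - 1 denominator)
    return {
        "differences": diffs,
        "averages": avgs,
        "bias": bias,
        "sd": sd,
        "lower_limit": bias - 1.96 * sd,
        "upper_limit": bias + 1.96 * sd,
    }

# Made-up illustrative data: six samples measured by two assay methods.
method_a = [10.2, 11.5, 9.8, 12.1, 10.9, 11.2]
method_b = [10.0, 11.9, 9.5, 12.4, 10.4, 11.0]

result = bland_altman(method_a, method_b)
print(f"bias = {result['bias']:.3f}, SD of bias = {result['sd']:.3f}")
print(f"95% limits of agreement: {result['lower_limit']:.3f} to {result['upper_limit']:.3f}")
```

Swapping the order of the two methods flips the sign of the bias and of both limits, so it is worth recording which method was subtracted from which.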

Especially with small sample sizes, the sample mean and sample SD may not be close to the true population mean and SD. To account for this possible discrepancy, you can compute 95% prediction bands for the difference between the two assay methods. These 95% prediction bands are wider than the 95% limits of agreement (especially for small sample sizes), and so provide a more accurate prediction of where to expect future differences between the two assay methods. Prism does not compute the prediction bands, but they can easily be computed by hand using a formula on page 146 of a review by Giavarina (1).
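As a rough sketch of the idea, the code below uses the standard textbook prediction interval for one future observation from a Normal distribution: bias plus or minus t times SD times the square root of (1 + 1/n). This is an assumption made for illustration and is not necessarily the formula given in the cited review. The function name prediction_interval and the data are hypothetical, and the critical value of the t distribution is supplied by hand (2.262 for 95% and 9 degrees of freedom).

```python
from math import sqrt
from statistics import mean, stdev

def prediction_interval(diffs, t_crit):
    """95% prediction interval for one future difference (textbook formula).

    t_crit is the two-sided critical value of Student's t with n - 1
    degrees of freedom, looked up by the caller.
    """
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two differences")
    bias = mean(diffs)
    sd = stdev(diffs)  # sample SD of the differences
    half_width = t_crit * sd * sqrt(1 + 1 / n)
    return bias - half_width, bias + half_width

# Made-up differences between two methods for ten samples.
diffs = [0.2, -0.4, 0.3, -0.3, 0.5, 0.2, -0.1, 0.4, 0.0, -0.2]
T_CRIT_DF9 = 2.262  # t critical value, 95% two-sided, 9 degrees of freedom

low, high = prediction_interval(diffs, T_CRIT_DF9)
print(f"95% prediction interval: {low:.3f} to {high:.3f}")
```

With only ten samples, the interval is noticeably wider than the 1.96 times SD limits. As the sample size grows, t approaches 1.96 and the 1/n term shrinks, so the two ranges converge.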

Bland-Altman plots are generally interpreted informally, without further analyses. Ask yourself these questions:

•How big is the average discrepancy between methods (the bias)? Is the discrepancy large enough to be important? This is a clinical question, not a statistical one.

•How wide are the limits of agreement? If they are wide (as defined clinically), the results are ambiguous. If the limits are narrow (and the bias is tiny), then the two methods are essentially equivalent.

•Is there a trend? Does the difference between methods tend to get larger (or smaller) as the average increases?

•Is the variability consistent across the graph? Does the scatter around the bias line get larger as the average gets higher?
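The last two questions are normally answered by looking at the plot. If a rough numeric companion to that visual check is wanted, one option (an assumption here, not something Prism reports) is to fit a least-squares line to the differences as a function of the averages, and another to the absolute deviations from the bias. The helper names ols_slope and trend_and_spread and the data are hypothetical.

```python
from statistics import mean

def ols_slope(x, y):
    """Slope of the ordinary least-squares line of y on x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx

def trend_and_spread(averages, differences):
    """Return (trend slope, spread slope) for a Bland-Altman data set.

    trend:  slope of difference vs. average (nonzero suggests a trend).
    spread: slope of |difference - bias| vs. average (positive suggests
            the scatter widens as the average increases).
    """
    bias = mean(differences)
    trend = ols_slope(averages, differences)
    abs_dev = [abs(d - bias) for d in differences]
    spread = ols_slope(averages, abs_dev)
    return trend, spread

# Made-up data in which the difference grows steadily with the average.
averages = [1.0, 2.0, 3.0, 4.0, 5.0]
differences = [0.1, 0.2, 0.3, 0.4, 0.5]

trend, spread = trend_and_spread(averages, differences)
print(f"trend slope = {trend:.3f}, spread slope = {spread:.3f}")
```

Whether a given slope matters is, like the bias itself, a clinical judgment rather than a statistical one.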

1. Giavarina, D. (2015). Understanding Bland Altman analysis. Biochem Med 25: 141–151.