KNOWLEDGEBASE - ARTICLE #1362

What you can conclude when two error bars overlap (or don't)?

It is tempting to look at whether two error bars overlap or not, and try to reach a conclusion about whether the difference between means is statistically significant.

Resist that temptation (Lanzante, 2005)!

Standard Deviation Error Bars

SD error bars quantify the scatter among the values. Looking at whether the error bars overlap lets you compare the difference between the mean with the amount of scatter within the groups. But the t test also takes into account sample size. If the samples were larger with the same means and same standard deviations, the P value would be much smaller. If the samples were smaller with the same means and same standard deviations, the P value would be larger.

When the difference between two means is statistically significant (P < 0.05), the two SD error bars may or may not overlap. Likewise, when the difference between two means is not statistically significant (P > 0.05), the two SD error bars may or may not overlap.

Knowing whether SD error bars overlap or not does not let you conclude whether the difference between the means is statistically significant or not.

Standard Error of the Mean Error Bars

SEM error bars quantify how precisely you know the mean, taking into account both the SD and sample size. Looking at whether the error bars overlap, therefore, lets you compare the difference between the mean with the precision of those means. This sounds promising. But in fact, you don’t learn much by looking at whether SEM error bars overlap.

By taking into account sample size and considering how far apart two error bars are, Cumming (2007) came up with some rules for deciding when a difference is significant or not. But these rules are hard to remember and apply.

Here is a simpler rule:

If two SEM error bars do overlap, and the sample sizes are equal or nearly equal, then you know that the P value is (much) greater than 0.05, so the difference is not statistically significant. The opposite rule does not apply. If two SEM error bars do not overlap, the P value could be less than 0.05, or it could be greater than 0.05. If the sample sizes are very different, this rule of thumb does not always work.

Confidence Interval Error Bars

Error bars that show the 95% confidence interval (CI) are wider than SE error bars. It doesn’t help to observe that two 95% CI error bars overlap, as the difference between the two means may or may not be statistically significant.

A useful rule of thumb: If two 95% CI error bars do not overlap, and the sample sizes are nearly equal, the difference is statistically significant with a P value much less than 0.05 (Payton 2003).

With multiple comparisons following ANOVA, the significance level usually applies to the entire family of comparisons. With many comparisons, it takes a much larger difference to be declared "statistically significant". But the error bars are usually graphed (and calculated) individually for each treatment group, without regard to multiple comparisons. So the rule above regarding overlapping CI error bars does not apply in the context of multiple comparisons.

Rules of thumb (for when sample sizes are equal, or nearly equal).

Type of error bar	Conclusion if they overlap	Conclusion if they don’t overlap
SD	No conclusion	No conclusion
SEM	P > 0.05	No conclusion
95% CI	No conclusion	P < 0.05 (assuming no multiple comparisons)

Unequal Sample Sizes

This page was updated 4/16/2010 to point out that the rules of thumb are true only when the sample sizes are equal, or nearly equal.

Here is an example where the rule of thumb about confidence intervals is not true (and sample sizes are very different).

Sample 1: Mean=0, SD=1, n=10

Sample 2: Mean=3, SD=10, n=100

The confidence intervals do not overlap, but the P value is high (0.35).

And here is an example where the rule of thumb about SE is not true (and sample sizes are very different).

Sample 1: Mean=0, SD=1, n=100, SEM=0.1

Sample 2: Mean 3, SD=10, n=10, SEM=3.33

The SEM error bars overlap, but the P value is tiny (0.005).

References

Cumming et al. Error bars in experimental biology. J Cell Biol (2007) vol. 177 (1) pp. 7-11

Lanzante. A Cautionary Note on the Use of Error Bars. Journal of Climate (2005) vol. 18 pp. 3699-3703

Payton et al. Overlapping confidence intervals or standard error intervals: what do they mean in terms of statistical significance?. J Insect Sci (2003) vol. 3 pp. 34