KNOWLEDGEBASE - ARTICLE #1568

The unequal variance (Welch) t test

Two unpaired t tests

When you choose to compare the means of two non-paired groups with a t test, you have two choices:

Use the standard unpaired t test. It assumes that both groups of data are sampled from Gaussian populations with the same standard deviation.
Use the unequal variance t test, also called the Welch t test. It assumes that both groups of data are sampled from Gaussian populations, but does not assume those two populations have the same standard deviation.

These choices are offered by GraphPad Prism and the GraphPad free web t test QuickCalc, as well as many other programs.

The usefulness of the unequal variance t test

To interpret any P value, it is essential that the null hypothesis be carefully defined. For the unequal variance t test, the null hypothesis is that the two population means are the same but the two population variances may differ. If the P value is large, you don't reject that null hypothesis, so you conclude that the evidence does not persuade you that the two population means are different, even though you assume the two populations have (or may have) different standard deviations. What a strange set of assumptions. What would it mean for two populations to have the same mean but different standard deviations? Why would you want to test for that? Sawilowsky points out that this situation simply doesn't often come up in science (1).

I think the unequal variance t test is more useful when viewed as a way to create a confidence interval. Your prime goal is not to ask whether two populations differ, but to quantify how far apart the two means are. The unequal variance t test reports a confidence interval for the difference between two means that is usable even if the standard deviations differ.
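As a minimal sketch, assuming Python with only the standard library, the Welch interval can be computed as the difference between the sample means plus or minus a t critical value times the unpooled standard error, using the fractional Welch-Satterthwaite df. The standard library has no t distribution, so the helpers `t_pdf`, `t_cdf`, `t_quantile` and `welch_ci` (all hypothetical names) approximate it with Simpson's rule and bisection. This is an illustration only, not how Prism or the QuickCalc compute the interval; in practice you would use a statistics library.

```python
import math
import statistics

def t_pdf(x, df):
    """Density of Student's t distribution; df may be fractional."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x), integrating the density over [0, |x|] by Simpson's rule
    and using the symmetry of the t distribution about zero."""
    h = abs(x) / steps
    area = t_pdf(0.0, df) + t_pdf(abs(x), df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area *= h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

def t_quantile(p, df):
    """Inverse of t_cdf for 0.5 < p < 1, found by bisection."""
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < p:  # widen the bracket until it contains the quantile
        hi *= 2
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def welch_ci(a, b, level=0.95):
    """Confidence interval for mean(a) - mean(b) without assuming equal SDs."""
    n1, n2 = len(a), len(b)
    q1 = statistics.variance(a) / n1  # squared SE of the first mean
    q2 = statistics.variance(b) / n2  # squared SE of the second mean
    se = math.sqrt(q1 + q2)
    df = (q1 + q2) ** 2 / (q1 ** 2 / (n1 - 1) + q2 ** 2 / (n2 - 1))  # fractional df
    diff = statistics.mean(a) - statistics.mean(b)
    t_crit = t_quantile(1 - (1 - level) / 2, df)
    return diff - t_crit * se, diff + t_crit * se

lo, hi = welch_ci([1, 2, 3, 4], [2, 4, 6, 8])
print(f"95% CI for the difference: {lo:.2f} to {hi:.2f}")
```

The interval is centered on the observed difference between means; its width depends on the unpooled standard error and on the critical value, which is taken from a t distribution with the fractional df rather than a rounded one.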

How the unequal variance t test is computed

Both t tests report a P value and a confidence interval. The calculations differ in two ways:

Calculation of the standard error of the difference between means. The t ratio is computed by dividing the difference between the two sample means by the standard error of that difference. This standard error is computed from the two standard deviations and sample sizes, but the ordinary t test pools the two variances while the Welch t test does not. When the two groups have the same sample size, the standard error is identical for the two t tests. But when the two groups have different sample sizes, the standard error, and so the t ratio, for the Welch t test generally differs from that of the ordinary t test. This standard error of the difference is also used to compute the confidence interval for the difference between the two means.
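A minimal sketch of the two standard errors, assuming Python with only the standard library (`pooled_se`, `welch_se` and `t_ratio` are hypothetical names, and the data are made up for illustration). It shows that the pooled and unpooled standard errors agree when the sample sizes are equal and generally differ when they are not.

```python
import math
import statistics

def pooled_se(a, b):
    """SE of the difference between means for the ordinary unpaired t test.
    The two sample variances are pooled, weighted by their df (n - 1)."""
    n1, n2 = len(a), len(b)
    v1, v2 = statistics.variance(a), statistics.variance(b)  # n - 1 denominator
    pooled_var = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    return math.sqrt(pooled_var * (1 / n1 + 1 / n2))

def welch_se(a, b):
    """SE of the difference between means for the Welch t test (no pooling)."""
    return math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))

def t_ratio(a, b, se):
    """Difference between the sample means divided by its standard error."""
    return (statistics.mean(a) - statistics.mean(b)) / se

same_n = ([1, 2, 3, 4], [2, 4, 6, 8])
diff_n = ([1, 2, 3, 4], [2, 4, 6, 8, 10, 12])
print(pooled_se(*same_n), welch_se(*same_n))  # equal (up to rounding) when n1 == n2
print(pooled_se(*diff_n), welch_se(*diff_n))  # differ when n1 != n2
```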
Calculation of the df. For the ordinary unpaired t test, df is computed as the total sample size (both groups) minus two. The df for the unequal variance t test is computed by a more complicated formula (the Welch-Satterthwaite approximation) that takes into account the discrepancy between the two standard deviations. If the two samples have identical standard deviations and identical sample sizes, the df for the Welch t test will be identical to the df for the standard t test. In most cases, however, the df for the Welch t test is smaller than it would be for the ordinary unpaired t test (it is never larger). The calculation usually leads to a df value that is not an integer. Prism uses this fractional df value directly in its calculations, as this is the most accurate approach. Our t test QuickCalc rounds the fractional df value down to the next lower integer. This approach is common but less accurate.
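The Welch-Satterthwaite df can be sketched as follows, assuming Python with only the standard library (`welch_df` and `pooled_df` are hypothetical names; the data are made up). Each group contributes its squared standard error of the mean, and `math.floor` stands in for the round-down step described for the QuickCalc.

```python
import math
import statistics

def welch_df(a, b):
    """Welch-Satterthwaite approximation to the df of the unequal variance t test."""
    n1, n2 = len(a), len(b)
    q1 = statistics.variance(a) / n1  # squared SE of the first mean
    q2 = statistics.variance(b) / n2  # squared SE of the second mean
    return (q1 + q2) ** 2 / (q1 ** 2 / (n1 - 1) + q2 ** 2 / (n2 - 1))

def pooled_df(a, b):
    """df for the ordinary unpaired t test: total sample size minus two."""
    return len(a) + len(b) - 2

a, b = [1, 2, 3, 4], [2, 4, 6, 8]
df = welch_df(a, b)  # fractional: 1875/425, about 4.41
print(df, math.floor(df), pooled_df(a, b))
```

With equal sample sizes and equal standard deviations (for example `[1, 2, 3, 4]` and `[11, 12, 13, 14]`) the Welch df equals the pooled df of 6; when the standard deviations differ, as above, it falls below it.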

When to choose the unequal variance (Welch) t test

Deciding when to use the unequal variance t test is not straightforward.

It seems sensible to first test whether the variances are different, and then choose the ordinary or Welch t test accordingly. In fact, this is not a good plan. Instead, decide which test to use as part of the experimental planning. Perhaps it makes sense to always use the Welch t test. Ruxton (2) makes a strong case that this is the best approach, as do Delacre and colleagues (3). Using the Welch t test, you lose some power to detect differences between populations when the standard deviations are, in fact, equal, but you gain power to detect differences when they are not.

References

1. S.S. Sawilowsky. Fermat, Schubert, Einstein, and Behrens-Fisher: The Probable Difference Between Two Means With Different Variances. J. Modern Applied Statistical Methods (2002) vol. 1 pp. 461-472
2. G.D. Ruxton. The unequal variance t-test is an underused alternative to Student's t-test and the Mann-Whitney U test. Behavioral Ecology (2006) vol. 17 (4) pp. 688
3. M. Delacre, D.L. Lakens, and C. Leys. Why Psychologists Should by Default Use Welch's t-test Instead of Student's t-test. International Review of Social Psychology (2017) vol. 30 pp. 92-10
