GraphPad Statistics Guide

The unequal variance Welch t test

Two unpaired t tests

When you choose to compare the means of two nonpaired groups with a t test, you have two choices:

Use the standard unpaired t test. It assumes that both groups of data are sampled from Gaussian populations with the same standard deviation.

Use the unequal variance t test, also called the Welch t test. It assumes that both groups of data are sampled from Gaussian populations, but does not assume those two populations have the same standard deviation.

The usefulness of the unequal variance t test

To interpret any P value, it is essential that the null hypothesis be carefully defined. For the unequal variance t test, the null hypothesis is that the two population means are the same but the two population variances may differ. If the P value is large, you don't reject that null hypothesis, so you conclude that the evidence does not persuade you that the two population means are different, even though you assume the two populations may have different standard deviations. What a strange set of assumptions. What would it mean for two populations to have the same mean but different standard deviations? Why would you want to test for that? Sawilowsky points out that this situation simply doesn't often come up in science (1).

I think the unequal variance t test is more useful when you think about it as a way to create a confidence interval. Your prime goal is not to ask whether two populations differ, but to quantify how far apart the two means are. The unequal variance t test reports a confidence interval for the difference between two means that is usable even if the standard deviations differ.
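As a rough sketch, the confidence interval described above is usually written in the following textbook form. The symbols are assumptions introduced here for illustration, not notation from this guide: \bar{x}_1 and \bar{x}_2 are the sample means, s_1 and s_2 the sample standard deviations, n_1 and n_2 the sample sizes, and t^{*}_{\nu} the critical value of the t distribution with \nu degrees of freedom for the chosen confidence level (for example, 95%).

```latex
\[
(\bar{x}_1 - \bar{x}_2) \;\pm\; t^{*}_{\nu}\,
\sqrt{\frac{s_1^{2}}{n_1} + \frac{s_2^{2}}{n_2}}
\]
```

The square root term is the standard error of the difference between the means, computed without pooling the two variances, which is why the interval remains usable when the standard deviations differ.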

How the unequal variance t test is computed

Both t tests report a P value and a confidence interval. The calculations differ in two ways:

Calculation of the standard error of the difference between means

The t ratio is computed by dividing the difference between the two sample means by the standard error of the difference between the two means. This standard error is computed from the two standard deviations and sample sizes. When the two groups have the same sample size, the standard error is identical for the two t tests. But when the two groups have different sample sizes, the two tests compute different standard errors, so the t ratio for the Welch t test differs from the t ratio for the ordinary t test. This standard error of the difference is also used to compute the confidence interval for the difference between the two means.
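The comparison can be sketched in a few lines of Python. This is a minimal illustration using the usual textbook formulas (a pooled variance for the ordinary test, separate variances for the Welch test); the function names and the summary data are hypothetical, and nothing here is taken from Prism's own code.

```python
import math


def se_pooled(sd1, n1, sd2, n2):
    """Standard error of the difference for the ordinary unpaired t test."""
    # Pooled variance: the two sample variances weighted by their df.
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    return math.sqrt(pooled_var * (1 / n1 + 1 / n2))


def se_welch(sd1, n1, sd2, n2):
    """Standard error of the difference for the unequal variance (Welch) t test."""
    # Each group contributes its own variance divided by its own sample size.
    return math.sqrt(sd1**2 / n1 + sd2**2 / n2)


def t_ratio(mean1, mean2, se):
    """Difference between the sample means divided by its standard error."""
    return (mean1 - mean2) / se


# Hypothetical summary data as (sd1, n1, sd2, n2); not from any real experiment.
equal_n = (2.0, 10, 6.0, 10)
unequal_n = (2.0, 10, 6.0, 30)

for label, data in (("equal n", equal_n), ("unequal n", unequal_n)):
    print(label, "pooled SE:", se_pooled(*data), "Welch SE:", se_welch(*data))
```

With equal sample sizes the two standard errors agree even though the standard deviations differ; with unequal sample sizes they do not, so the two tests produce different t ratios from the same means.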

Calculation of the df

For the ordinary unpaired t test, df is computed as the total sample size (both groups) minus two. The df for the unequal variance t test is computed by a complicated formula that takes into account the discrepancy between the two standard deviations. If the two samples have identical standard deviations and identical sample sizes, the df for the Welch t test will be identical to the df for the standard t test. In most cases, however, the two standard deviations are not identical and the df for the Welch t test is smaller than it would be for the unpaired t test. The calculation usually leads to a df value that is not an integer. Prism reports and uses this fractional value for df. Many programs, including Prism 5, InStat, and our QuickCalc, round the df down to the next lower integer. For this reason, the P value reported by Prism can be a bit smaller than the P values reported by other programs.
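The following sketch illustrates the df calculation and the effect of rounding. It assumes the formula in question is the standard Welch-Satterthwaite approximation, which this guide does not spell out. The function names and summary data are hypothetical, and the P value is obtained by simple numerical integration of the t density rather than by whatever method Prism uses.

```python
import math


def welch_df(sd1, n1, sd2, n2):
    """Welch-Satterthwaite approximation to the degrees of freedom."""
    v1, v2 = sd1**2 / n1, sd2**2 / n2
    return (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))


def t_pdf(x, df):
    """Density of the t distribution; df may be fractional."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def two_tailed_p(t, df, steps=20000):
    """Two-tailed P value via Simpson's rule on [0, |t|] (steps must be even)."""
    a = abs(t)
    if a == 0:
        return 1.0
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = total * h / 3  # probability that T lies between 0 and |t|
    return 1 - 2 * central


# Hypothetical summary data: the smaller group has the larger SD.
sd1, n1, sd2, n2 = 2.0, 30, 6.0, 10
df_fractional = welch_df(sd1, n1, sd2, n2)
df_rounded = math.floor(df_fractional)

t = 2.1  # hypothetical t ratio
p_fractional = two_tailed_p(t, df_fractional)
p_rounded = two_tailed_p(t, df_rounded)
print("df:", df_fractional, "rounded down:", df_rounded)
print("P with fractional df:", p_fractional, "P with rounded df:", p_rounded)
```

Rounding the df down gives a t distribution with slightly heavier tails, so the P value computed with the fractional df is a little smaller. Note also that, under this formula, equal standard deviations give a df equal to the total sample size minus two only when the two sample sizes are equal as well.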

When to choose the unequal variance (Welch) t test

Deciding when to use the unequal variance t test is not straightforward.

It seems sensible to first test whether the variances are different, and then choose the ordinary or Welch t test accordingly. In fact, this is not a good plan. You should decide which t test to use as part of the experimental planning, not based on the data you collect.

What about always choosing the Welch test? Ruxton (2) and Delacre et al. (3) make a strong case that this is a good idea. You lose some power when the standard deviations are, in fact, equal, but gain power in the cases where they are not.

References

1. S.S. Sawilowsky. Fermat, Schubert, Einstein, and Behrens-Fisher: The Probable Difference Between Two Means With Different Variances. J. Modern Applied Statistical Methods (2002) vol. 1 pp. 461-472

2. Ruxton. The unequal variance t-test is an underused alternative to Student's t-test and the Mann-Whitney U test. Behavioral Ecology (2006) vol. 17 (4) pp. 688

3. Delacre, M., Lakens, D.L., and Leys, C. (2017). Why Psychologists Should by Default Use Welch's t-test Instead of Student's t-test. Rips 30: 92–10.