When you choose to compare the means of two nonpaired groups with a t test, you have two choices:
• Use the standard unpaired t test. It assumes that both groups of data are sampled from Gaussian populations with the same standard deviation.
• Use the unequal variance t test, also called the Welch t test. It assumes that both groups of data are sampled from Gaussian populations, but does not assume those two populations have the same standard deviation.
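As a rough illustration of the two choices outside of Prism, the sketch below runs both tests with SciPy's `ttest_ind`, switching between them with the `equal_var` argument. The data values are made up for the example and are not from any real experiment.

```python
from scipy import stats

# Hypothetical measurements for two unpaired groups (illustrative only).
group_a = [4.1, 5.3, 4.8, 5.9, 5.0, 4.6]
group_b = [6.2, 9.8, 4.9, 11.4, 7.7, 8.5, 10.1, 5.6]

# Standard unpaired t test: assumes equal population standard deviations.
student = stats.ttest_ind(group_a, group_b, equal_var=True)

# Unequal variance (Welch) t test: drops that assumption.
welch = stats.ttest_ind(group_a, group_b, equal_var=False)

print(f"Student: t = {student.statistic:.3f}, P = {student.pvalue:.4f}")
print(f"Welch:   t = {welch.statistic:.3f}, P = {welch.pvalue:.4f}")
```

Because the two groups here differ in both sample size and scatter, the two tests report different t ratios and P values.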
To interpret any P value, it is essential that the null hypothesis be carefully defined. For the unequal variance t test, the null hypothesis is that the two population means are the same but the two population variances may differ. If the P value is large, you don't reject that null hypothesis, and so conclude that the evidence does not persuade you that the two population means are different, even though you allow that the two populations may have different standard deviations. What a strange set of assumptions. What would it mean for two populations to have the same mean but different standard deviations? Why would you want to test for that? Sawilowsky points out that this situation simply doesn't often come up in science (1).
I think the unequal variance t test is more useful when you think about it as a way to create a confidence interval. Your prime goal is not to ask whether two populations differ, but to quantify how far apart the two means are. The unequal variance t test reports a confidence interval for the difference between two means that is usable even if the standard deviations differ.
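A minimal sketch of that confidence interval is shown here, assuming the usual Welch-Satterthwaite approximation for the degrees of freedom. The function name `welch_ci` and the data are hypothetical, chosen only for illustration; this is not Prism's own code.

```python
import math
from statistics import mean, variance

from scipy import stats


def welch_ci(a, b, confidence=0.95):
    """Confidence interval for mean(a) - mean(b) without assuming equal SDs."""
    n1, n2 = len(a), len(b)
    # Squared standard error of each sample mean.
    v1, v2 = variance(a) / n1, variance(b) / n2
    se_diff = math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    # Two-sided critical value from the t distribution (df may be fractional).
    t_crit = stats.t.ppf(1 - (1 - confidence) / 2, df)
    diff = mean(a) - mean(b)
    return diff - t_crit * se_diff, diff + t_crit * se_diff


# Hypothetical data (illustrative only).
group_a = [4.1, 5.3, 4.8, 5.9, 5.0, 4.6]
group_b = [6.2, 9.8, 4.9, 11.4, 7.7, 8.5, 10.1, 5.6]

low, high = welch_ci(group_a, group_b)
print(f"95% CI for the difference between means: {low:.2f} to {high:.2f}")
```

The interval is centered on the observed difference between the sample means, and its width reflects the scatter and size of each group separately.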
Both t tests report a P value and a confidence interval. The calculations differ in two ways:
The t ratio is computed by dividing the difference between the two sample means by the standard error of the difference between the two means. This standard error is computed from the two standard deviations and sample sizes. When the two groups have the same sample size, the standard error is identical for the two t tests. But when the two groups have different sample sizes, the standard error, and therefore the t ratio, for the Welch t test differs from that for the ordinary t test (unless the two sample standard deviations happen to be identical). This standard error of the difference is also used to compute the confidence interval for the difference between the two means.
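The two standard errors can be compared directly. The sketch below uses the textbook pooled-variance formula for the ordinary test and the separate-variance formula for the Welch test. The function names and the summary statistics are made up for illustration.

```python
import math


def se_pooled(sd1, n1, sd2, n2):
    """Standard error of the difference for the ordinary unpaired t test."""
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    return math.sqrt(pooled_var * (1 / n1 + 1 / n2))


def se_welch(sd1, n1, sd2, n2):
    """Standard error of the difference for the Welch t test."""
    # Each group contributes its own variance divided by its own sample size.
    return math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)


# Hypothetical summary statistics: SDs of 2.0 and 5.0.
equal_n = (se_pooled(2.0, 10, 5.0, 10), se_welch(2.0, 10, 5.0, 10))
unequal_n = (se_pooled(2.0, 10, 5.0, 30), se_welch(2.0, 10, 5.0, 30))

print("Equal n:   pooled = %.4f, Welch = %.4f" % equal_n)
print("Unequal n: pooled = %.4f, Welch = %.4f" % unequal_n)
```

With equal sample sizes the two values agree. With unequal sample sizes and unequal standard deviations they diverge, and so do the t ratios.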
For the ordinary unpaired t test, the number of degrees of freedom (df) is computed as the total sample size (both groups) minus two. The df for the unequal variance t test is computed by a complicated formula that takes into account the discrepancy between the two standard deviations. If the two samples have identical standard deviations and identical sample sizes, the df for the Welch t test will be identical to the df for the standard t test. In most cases, however, the two standard deviations are not identical, and the df for the Welch t test is smaller than it would be for the ordinary unpaired t test. The calculation usually leads to a df value that is not an integer. Prism reports and uses this fractional value for df. Many programs, including Prism 5, InStat, and our QuickCalc, round the df down to the next lower integer. For this reason, the P value reported by Prism can be a bit smaller than the P values reported by other programs.
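The df formula is commonly written as the Welch-Satterthwaite approximation. The sketch below assumes that form, with a hypothetical function name and made-up summary statistics, and shows both the fractional df and the rounded-down value that some programs use.

```python
import math


def welch_df(sd1, n1, sd2, n2):
    """Welch-Satterthwaite approximation to the degrees of freedom."""
    v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2  # squared SE of each mean
    return (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))


# Equal SDs and equal sample sizes: matches the ordinary df, n1 + n2 - 2.
df_equal = welch_df(3.0, 12, 3.0, 12)

# Unequal SDs with the same sample sizes: a smaller, fractional df.
df_unequal = welch_df(2.0, 12, 6.0, 12)
df_rounded_down = math.floor(df_unequal)

print(f"Equal SDs:    df = {df_equal:.2f}")
print(f"Unequal SDs:  df = {df_unequal:.2f}")
print(f"Rounded down: df = {df_rounded_down}")
```

A smaller df gives the t distribution heavier tails, so rounding the df down yields a slightly larger P value for the same t ratio.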
Deciding when to use the unequal variance t test is not straightforward.
It seems sensible to first test whether the variances are different, and then choose the ordinary or Welch t test accordingly. In fact, this is not a good plan. You should decide whether to use the Welch test as part of the experimental planning, before you collect the data.
What about always choosing the Welch test? Ruxton (2) and Delacre et al. (3) make a strong case that this is a good idea. You lose some power when the standard deviations are, in fact, equal, but gain power in the cases where they are not.
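One way to get a feel for this recommendation is a small simulation. The sketch below is not taken from the cited papers. It draws both groups from populations with the same mean, gives the smaller group the larger standard deviation, and counts how often each test reports P < 0.05. It therefore measures the false positive rate rather than power. The sample sizes, standard deviations, seed, and number of simulations are arbitrary assumptions.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable
n_sims = 2000
alpha = 0.05

# Null hypothesis is true: both populations have mean 0.
# The small group is noisy (SD 4), the large group is tight (SD 1).
small_noisy = rng.normal(0.0, 4.0, size=(n_sims, 10))
large_tight = rng.normal(0.0, 1.0, size=(n_sims, 40))

# One test per simulated experiment (each row is one experiment).
p_student = stats.ttest_ind(small_noisy, large_tight, axis=1, equal_var=True).pvalue
p_welch = stats.ttest_ind(small_noisy, large_tight, axis=1, equal_var=False).pvalue

rate_student = float(np.mean(p_student < alpha))
rate_welch = float(np.mean(p_welch < alpha))

print(f"Fraction with P < {alpha}: Student = {rate_student:.3f}, Welch = {rate_welch:.3f}")
```

In this deliberately unfavorable setup, the ordinary test rejects a true null hypothesis far more often than the nominal 5%, while the Welch test stays close to it.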
1. S.S. Sawilowsky. Fermat, Schubert, Einstein, and Behrens-Fisher: The Probable Difference Between Two Means With Different Variances. J. Modern Applied Statistical Methods (2002) vol. 1 pp. 461-472
2. Ruxton. The unequal variance t-test is an underused alternative to Student's t-test and the Mann-Whitney U test. Behavioral Ecology (2006) vol. 17 (4) pp. 688
3. Delacre, M., Lakens, D., and Leys, C. Why Psychologists Should by Default Use Welch's t-test Instead of Student's t-test. International Review of Social Psychology (2017) vol. 30 (1) pp. 92-101