Statistics with n=2
Which statistical calculations are valid when you only have two values in each group?
Is it valid to calculate the SD or SEM or CI of two values?
It seems to be common lab folklore that the calculations of SD or SEM are not valid for n=2. This folklore is wrong. The equations that calculate the SD, SEM and CI all work just fine when you have only duplicate (n=2) data.
Is it valid to compute a t test or ANOVA with only two replicates in each group?
Sure. You get more power with more data. But n=2 is enough for the results to be valid. (But of course t tests and ANOVA cannot be done with n=1.)
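As an illustration, here is a minimal sketch of an unpaired t test (pooled variance) with two values per group. The function name and the data values are hypothetical, chosen only for the example. With two values in each group there are 2 degrees of freedom, and the t distribution with 2 degrees of freedom has a simple closed-form cumulative distribution, so no statistics library is needed.

```python
import math
import statistics

def unpaired_t_test_n2(group_a, group_b):
    """Pooled-variance t test for two groups of two values each (df = 2)."""
    mean_diff = statistics.fmean(group_a) - statistics.fmean(group_b)
    # Equal group sizes, so the pooled variance is the average of the
    # two sample variances.
    pooled_var = (statistics.variance(group_a) + statistics.variance(group_b)) / 2
    # Standard error of the difference between means, with n = 2 per group.
    se_diff = math.sqrt(pooled_var * (1 / 2 + 1 / 2))
    t = mean_diff / se_diff
    # For df = 2 the t CDF is 1/2 + t / (2 * sqrt(2 + t^2)),
    # which gives this two-tailed P value.
    p = 1 - abs(t) / math.sqrt(2 + t * t)
    return t, p

# Hypothetical duplicate measurements in two groups.
t, p = unpaired_t_test_n2([10.0, 12.0], [20.0, 22.0])
print(f"t = {t:.3f}, two-tailed P = {p:.4f}")  # t is about -7.07, P about 0.019
```

The calculation runs without any special handling for n=2. The price of the small sample is the large critical t value needed to reach a given P value.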
Why display mean and SD or SEM rather than the raw data?
When you have fewer than, say, 100 values, there really is not much point in graphing a mean with SD or SEM. You can display the actual data in the same amount of space. There are better alternatives to plotting either the SD or the SEM. You may not agree with that opinion for moderately large data sets, but with n=2 it really makes no sense. It is easier to show two values, or a graph with two dots, than to tabulate or graph the mean plus/minus error.
How do the values of SD and SEM, and the width of the 95% CI, relate to the range of the data?
With n=2, there is a direct relationship between the range of the data (difference between the two values) and the value of the SD and SEM, and the width of the 95% confidence interval:
- The SD equals 0.7071 times the range.
- The SEM equals 0.50 times the range. So an error bar that extends from the mean minus one SEM to the mean plus one SEM extends from one of the two data values to the other. Connect the two dots with a vertical line and you've plotted the mean plus or minus the SEM.
- With n=2, all these are identical: the 50% CI; the range; and the mean plus/minus the SEM.
- The entire width of the 95% confidence interval equals 12.71 times the range (the critical t value for 1 degree of freedom is 12.706, and the SEM is half the range). With only n=2, you really haven't determined the population mean very precisely.
- The margin of error (distance from mean to one end of the 95% confidence interval) is half that width, so is 6.35 times the range.
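The multipliers in the list above can be checked with a few lines of code. This is a minimal sketch, not a definitive implementation: the function name and the two data values are hypothetical. It relies on the fact that the t distribution with 1 degree of freedom is a Cauchy distribution, so the critical t value can be computed with a tangent rather than looked up.

```python
import math
import statistics

def n2_summary(a, b):
    """Return range, SD, SEM and 95% CI margin of error for two values."""
    values = [a, b]
    data_range = abs(a - b)
    sd = statistics.stdev(values)   # sample SD (n-1 denominator)
    sem = sd / math.sqrt(2)         # SEM = SD / sqrt(n), with n = 2
    # With df = 1 the t distribution is a Cauchy distribution, whose
    # quantile function is tan(pi * (p - 0.5)); p = 0.975 for a 95% CI.
    t_crit = math.tan(math.pi * (0.975 - 0.5))
    margin = t_crit * sem
    return data_range, sd, sem, margin

# Hypothetical pair of values; any two distinct numbers give the same ratios.
data_range, sd, sem, margin = n2_summary(4.0, 7.0)
print(f"SD / range        = {sd / data_range:.4f}")          # about 0.7071
print(f"SEM / range       = {sem / data_range:.4f}")         # about 0.5
print(f"margin / range    = {margin / data_range:.3f}")      # about 6.353
print(f"CI width / range  = {2 * margin / data_range:.3f}")  # about 12.706
```

Replacing 0.975 with 0.75 gives a critical t value of 1, which is why the 50% CI coincides with the mean plus/minus one SEM.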
Simulations to prove that the SD and SEM calculations work for n=2
Are the results valid? It is known that the sample SD computed from small samples underestimates, on average, the true population SD. But the discrepancy is small compared to random variability inherent in collecting tiny data sets.
The discrepancy only applies to the SD. The variance, which is the SD squared, is unbiased even for n=2.
To prove the validity of n=2 calculations, I simulated ten thousand data sets with n=2, with each value randomly chosen from a Gaussian distribution (GraphPad QuickCalcs can do this, as can Excel). First I computed the 95% confidence interval for each data set and asked whether the interval included the true value. When analyzing real data, you can't answer this question. But here the data are simulated from a known population, so we know what the true population mean is. In 95.02% of these simulations, the confidence interval of the mean included the true population mean. So a confidence interval of a mean computed from an n=2 sample can be interpreted as it usually is. The only problem with having only duplicate data is that the confidence interval is so very wide.
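A simulation along these lines can be sketched as follows. This is not the code used for the results reported above. The population mean and SD, the seed, and the variable names are arbitrary assumptions, so the coverage it prints will be close to 95% but will not reproduce the 95.02% figure exactly.

```python
import math
import random

random.seed(1)                       # arbitrary seed, for reproducibility only
TRUE_MEAN, TRUE_SD = 100.0, 15.0     # hypothetical Gaussian population
N_SIMS = 10_000
# Critical t for a 95% CI with df = 1 (closed form, about 12.706).
T_CRIT = math.tan(math.pi * (0.975 - 0.5))

hits = 0
for _ in range(N_SIMS):
    x1 = random.gauss(TRUE_MEAN, TRUE_SD)
    x2 = random.gauss(TRUE_MEAN, TRUE_SD)
    mean = (x1 + x2) / 2
    sem = abs(x1 - x2) / 2           # with n = 2, the SEM is half the range
    margin = T_CRIT * sem
    # Count the simulations whose 95% CI contains the true population mean.
    if mean - margin <= TRUE_MEAN <= mean + margin:
        hits += 1

coverage = hits / N_SIMS
print(f"Fraction of 95% CIs containing the true mean: {coverage:.2%}")
```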
Using the simulated data, it does not make sense to ask whether the calculated sample SD is a good estimate of the true SD. It is known that the sample SD, on average, is too small (underestimates the population SD) when n is small. That doesn't really matter, since all statistical theory (confidence intervals, t tests, ANOVA, etc.) is actually based on the variance (the square of the SD). For these reasons, I used simulations to ask whether the sample variance from an n=2 sample is unbiased. For each of the 10,000 simulated data sets I computed the variance from the two values. The average of these 10,000 variances was within 1% of the true population variance from which the data were simulated. This shows that the variance computed from n=2 data is a valid assessment of the scatter in your data, no less valid than a variance computed from data with larger n.
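The same kind of check is easy to sketch in code. Again, this is an illustrative reconstruction under assumed settings (arbitrary seed, a hypothetical population SD of 10), not the original simulation, so the averages it prints will differ slightly from run to run and from the numbers quoted above. It averages both the sample variances and the sample SDs so the two can be compared.

```python
import math
import random
import statistics

random.seed(2)          # arbitrary seed, for reproducibility only
TRUE_SD = 10.0          # hypothetical population SD (population mean is 0)
N_SIMS = 10_000

variances = []
sds = []
for _ in range(N_SIMS):
    pair = [random.gauss(0.0, TRUE_SD), random.gauss(0.0, TRUE_SD)]
    var = statistics.variance(pair)   # sample variance, n-1 denominator
    variances.append(var)
    sds.append(math.sqrt(var))

# Ratio of the average sample variance to the true variance: near 1 if unbiased.
var_ratio = statistics.fmean(variances) / TRUE_SD**2
# Ratio of the average sample SD to the true SD: below 1 for small n.
# For n = 2 the theoretical value is sqrt(2/pi), about 0.80.
sd_ratio = statistics.fmean(sds) / TRUE_SD

print(f"mean variance / true variance = {var_ratio:.3f}")
print(f"mean SD / true SD             = {sd_ratio:.3f}")
```

The first ratio lands close to 1 and the second well below it, which is the contrast the paragraph above describes: the variance is unbiased even though the SD is not.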
The problem is that with only two values, you really don't know the SD with any accuracy. The variance is unbiased: averaged over many samples, it equals the population variance. But there is a lot of variation from one sample to the next. One way to see this is to compute the 95% CI of the standard deviation. Of course, the width of this confidence interval depends on sample size. With only n=2, the 95% CI of a standard deviation covers a huge range, from about half the SD to 32 times that SD. It takes lots of data to determine the population SD with precision. With only two values, there is a lot of uncertainty.
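The "about half" and "32 times" multipliers come from the chi-square distribution with 1 degree of freedom. Here is a minimal sketch of that calculation, assuming Gaussian data. The function name is hypothetical. Because a chi-square variable with 1 degree of freedom is the square of a standard normal variable, its quantiles can be obtained from the normal distribution in the Python standard library.

```python
from statistics import NormalDist

def sd_ci_factors_n2(confidence=0.95):
    """Multipliers that turn a sample SD (n = 2) into the limits of its CI."""
    alpha = 1 - confidence
    z = NormalDist()
    # Chi-square with 1 df is the square of a standard normal, so its
    # quantile at probability q equals (normal quantile at (1 + q) / 2) squared.
    chi2_upper = z.inv_cdf((1 + (1 - alpha / 2)) / 2) ** 2   # about 5.02
    chi2_lower = z.inv_cdf((1 + alpha / 2) / 2) ** 2         # about 0.00098
    # CI of the SD: from s * sqrt(df / chi2_upper) to s * sqrt(df / chi2_lower),
    # with df = n - 1 = 1.
    lower_factor = (1 / chi2_upper) ** 0.5
    upper_factor = (1 / chi2_lower) ** 0.5
    return lower_factor, upper_factor

lower_factor, upper_factor = sd_ci_factors_n2()
print(f"95% CI of the SD runs from {lower_factor:.2f} x SD to {upper_factor:.1f} x SD")
```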
