
What's wrong with standard values for effect size?

The appeal of using standard effect sizes

Computing sample size requires that you decide how large a difference you are looking for -- how large a difference (association, correlation...) would be scientifically interesting. You'll need a large sample size if your goal is to find tiny differences, but you can get by with smaller samples if you are only looking for larger differences.

In a very influential book (1), Jacob Cohen makes some recommendations for what to do when you don't know what effect size you are looking for. He limits these recommendations to the behavioral sciences (his area of expertise), and warns that any general recommendation is more useful in some circumstances than in others. Here are his guidelines for an unpaired t test:

A "small" difference between means equals 0.2 times the standard deviation.

A "medium" effect size equals 0.5 times the standard deviation.

A "large" effect equals 0.8 times the standard deviation.

So if you are having trouble deciding what effect size you are looking for (and therefore are stuck and can't determine a sample size), Cohen would recommend you choose whether you are looking for a "small", "medium", or "large" effect, and then use the standard definitions.
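Cohen's guidelines amount to expressing the difference between means in units of the standard deviation. As a concrete illustration, here is a short Python sketch (the function names, label boundaries, and sample data are illustrative, not from Cohen's book) that computes that standardized difference from two samples and attaches Cohen's labels:

```python
from math import sqrt
from statistics import mean, variance

def cohens_d(group1, group2):
    """Standardized difference between two independent group means:
    (mean1 - mean2) divided by the pooled standard deviation."""
    n1, n2 = len(group1), len(group2)
    pooled_sd = sqrt(((n1 - 1) * variance(group1) +
                      (n2 - 1) * variance(group2)) / (n1 + n2 - 2))
    return (mean(group1) - mean(group2)) / pooled_sd

def cohen_label(d):
    """Cohen's guideline labels for an unpaired t test
    (cutoffs 0.2, 0.5, 0.8; treatment of in-between values is a choice)."""
    d = abs(d)
    if d < 0.2:
        return "trivial"
    if d < 0.5:
        return "small"
    if d < 0.8:
        return "medium"
    return "large"

d = cohens_d([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])
print(round(abs(d), 2), cohen_label(d))
```

Note that the label depends only on the ratio of difference to scatter, which is exactly the simplification the next section objects to.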

The problem with standard effect sizes

Russell Lenth (2) argues that you should avoid these "canned" effect sizes, and I agree. You must decide how large a difference you care to detect based on your understanding of the experimental system you are using and the scientific questions you are asking. Cohen's recommendations seem like a way to avoid thinking about the point of the experiment. It doesn't make sense to think about the difference you are looking for only in terms of the scatter you expect to see (the anticipated standard deviation), without even considering what the mean value might be.

If you choose standard definitions of alpha (0.05), power (80%), and effect size (see above), then there is no need for any calculations. If you accept those standard definitions for all your studies (that use an unpaired t test to compare two groups), then every study needs a sample size of 26 in each group to detect a large effect, 65 in each group to detect a medium effect, and 400 in each group to detect a small effect.
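You can check where figures of that magnitude come from with a quick normal-approximation calculation (a stdlib-Python sketch; it is an assumption that this approximation is how you'd verify them, and it comes out a few subjects lower than the exact t-based values above because the true t-test calculation and conservative rounding push the numbers up slightly):

```python
from math import ceil
from statistics import NormalDist

def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-sided unpaired t test,
    using the normal approximation n = 2 * ((z_alpha + z_power) / d)^2."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_power = NormalDist().inv_cdf(power)          # about 0.84 for 80% power
    return ceil(2 * ((z_alpha + z_power) / d) ** 2)

for name, d in [("large", 0.8), ("medium", 0.5), ("small", 0.2)]:
    print(name, n_per_group(d))
```

The point stands either way: once alpha, power, and the effect-size label are all fixed by convention, the sample size is fixed too.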

Bottom line

Choosing standard effect sizes is really the same as picking standard sample sizes.

References

1. J. Cohen, Statistical Power Analysis for the Behavioral Sciences, 1988, ISBN 978-0805802832.

2. R. V. Lenth (2001), "Some Practical Guidelines for Effective Sample Size Determination," The American Statistician, 55, 187-193.