Many experiments and clinical trials are run with too few subjects. An underpowered study is wasted effort: even if the treatment substantially changed the outcome, the study would have only a small chance of finding a "statistically significant" effect.

When planning a study, therefore, you need to choose an appropriate sample size. The required sample size depends on your answers to these questions:

How scattered do you expect your data to be in the population you're sampling from?

In statistical terms, this asks how much variance there is in the population. To determine a sample size, you must estimate the variance (or standard deviation) in the population; if you can't estimate the standard deviation, you can't compute how many subjects you will need. If you expect lots of scatter (a large variance), it will be harder to distinguish real effects from random noise, and you'll need more subjects.
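To see why scatter matters so much, here is a common textbook approximation for the number of subjects needed in each group. It assumes two equal-sized groups, a two-sided test, and normally distributed data, and it is not necessarily the method Prism uses:

```latex
n = \frac{2\,\sigma^2\,\left(z_{1-\alpha/2} + z_{1-\beta}\right)^2}{\delta^2}
```

Here σ is the population standard deviation, δ is the difference between means you want to detect, α is the significance level, 1 − β is the power, and z_p is the p-th quantile of the standard normal distribution. Because n is proportional to σ², doubling the standard deviation quadruples the required number of subjects.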

How willing are you to risk mistakenly finding a difference by chance?

This question is answered by your definition of statistical significance. Almost all investigators choose the 5% significance level, meaning that P values less than 0.05 are considered "statistically significant". If you choose a smaller significance level (say 1%), you'll need more subjects.

How big a difference are you looking for?

This is a much trickier question than the first two. Everyone would prefer to plan a study that can detect very small differences (small effect sizes), but this requires a large sample size, and most of the time you'll be limited by resources (time, money, available participants, etc.). You must choose an effect size that is "meaningful" but that you can feasibly detect with the resources available to you.

How sure do you need to be that your study will detect a difference, if it exists?

In other words, how much statistical power do you need? As with the previous question, everyone wants a study with lots of power, one that is nearly certain to return a "statistically significant" result if the treatment actually works, but this too requires lots of subjects.
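The following is a minimal sketch of how the answers to these four questions combine into a sample size. It assumes a comparison of two means with equal group sizes, a two-sided test, and the normal (z) approximation; the function name `n_per_group` and the planning values (SD of 10 units, difference of 5 units) are hypothetical, and Prism's own calculator may use exact t-based methods that give slightly larger numbers.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(sd, difference, alpha=0.05, power=0.80):
    """Approximate number of subjects per group to compare two means.

    Hypothetical helper. Assumes a two-sided test, two equal-sized groups,
    and the normal (z) approximation; an exact t-test calculation gives
    slightly larger n.
    """
    z = NormalDist().inv_cdf        # standard normal quantile function
    z_alpha = z(1 - alpha / 2)      # critical value for the significance level
    z_power = z(power)              # quantile for the desired power
    n = 2 * ((z_alpha + z_power) * sd / difference) ** 2
    return ceil(n)                  # round up to whole subjects

# Hypothetical planning values: SD of 10 units, difference of interest 5 units.
print(n_per_group(sd=10, difference=5))              # baseline: 63
print(n_per_group(sd=10, difference=5, alpha=0.01))  # smaller alpha: 94
print(n_per_group(sd=10, difference=5, power=0.90))  # more power: 85
print(n_per_group(sd=10, difference=2.5))            # smaller effect: 252
print(n_per_group(sd=20, difference=5))              # more scatter: 252
```

Each change in the answers raises the sample size: a stricter significance level or more power adds subjects, while halving the effect size or doubling the scatter roughly quadruples them.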

Rather than asking you to answer those last two questions, Prism's power analysis calculator (available in Prism Cloud) presents results in a table that shows the tradeoffs between sample size, power, and the effect size you can detect. You can look at this table, consider the time, expense, and risk of your experiment, and decide on an appropriate sample size. Note that this table does not directly answer the question "how many subjects do I need?" but rather the related question "if I use N subjects, what information can I learn?" This approach to sample size calculations was recommended by Parker and Berman (1).
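A table in the spirit of this "if I use N subjects" approach can be sketched as follows. This is not Prism's output or algorithm; it assumes two equal-sized groups, a two-sided test at α = 0.05, and the normal approximation, and it reports the smallest true difference (in units of the standard deviation) detectable at each power. The function name and the sample sizes listed are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def detectable_difference(n, power, alpha=0.05):
    """Smallest true difference, in SD units, detectable with the given power.

    Hypothetical helper. Normal approximation, two-sided test,
    two groups of n subjects each.
    """
    z = NormalDist().inv_cdf  # standard normal quantile function
    return (z(1 - alpha / 2) + z(power)) * sqrt(2 / n)

sample_sizes = [10, 20, 50, 100]  # hypothetical subjects per group
powers = [0.80, 0.90, 0.95]

# One row per sample size, one column per power level.
print("N/group " + "".join(f"{p:>8.0%}" for p in powers))
for n in sample_sizes:
    row = "".join(f"{detectable_difference(n, p):>8.2f}" for p in powers)
    print(f"{n:>7} " + row)
```

Reading across a row shows that demanding more power means you can only reliably detect larger effects; reading down a column shows how adding subjects shrinks the detectable effect.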

In some cases, these results may convince you that it is impossible to find what you want to know with the number of subjects you are able to use. This can be very helpful. It is far better to cancel such an experiment at the planning stage than to waste time and money on a futile experiment that lacks sufficient power. If the experiment involves any clinical risk or expenditure of public money, performing such a study can even be considered unethical.

One benefit of a larger sample size is more power to detect a specified effect. Put another way, with power held constant, a larger sample size lets you detect smaller effects. But there is another reason to choose larger sample sizes when possible: with larger samples, you can better assess the distribution of the data. Is the assumption of sampling from a Gaussian, or lognormal, distribution reasonable? With larger samples, it is easier to assess

1. R. A. Parker and N. G. Berman, Sample Size: More than Calculations, Am. Statistician 57:166-170, 2003.