GraphPad Statistics Guide

Choosing alpha and beta for sample size calculations

Standard approach

When computing sample size, many scientists use standard values for alpha and beta. They always set alpha to 0.05 and beta to 0.20 (which corresponds to 80% power, since power is 1 minus beta).

The advantages of the standard approach are that everyone else does it too and it doesn't require much thinking. The disadvantage is that it doesn't do a good job of deciding sample size, because it ignores the consequences of each kind of error in your particular experiment.
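To see how alpha and beta enter a sample size calculation, here is one common normal approximation for comparing the means of two equal-sized groups with a two-sided test. This particular formula is an assumption made for illustration; this page does not specify a test. Here n is the sample size per group, d is a standardized effect size (the difference between means, delta, divided by the standard deviation, sigma), and z denotes a quantile of the standard normal distribution.

```latex
n \approx \frac{2\,\left(z_{1-\alpha/2} + z_{1-\beta}\right)^{2}}{d^{2}},
\qquad d = \frac{\delta}{\sigma}
```

With the standard values (alpha = 0.05, beta = 0.20), the two quantiles are roughly 1.96 and 0.84, so n is about 15.7 divided by d squared. Lowering alpha or beta raises the corresponding quantile, and therefore the required sample size.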

Choosing alpha and beta for the scientific context

When computing sample size, you should pick values for alpha and power according to the experimental setting and the consequences of making a Type I or Type II error.

Assume you are running a screening test to detect compounds that are active in your system. In this context, a Type I error is concluding that a drug is effective when it really is not. A Type II error is concluding that a drug is ineffective when in fact it is effective. But the consequences of making a Type I or Type II error depend on the context of the experiment. Let's consider four somewhat contrived situations.

A. Screening drugs from a huge library of compounds with no biological rationale for choosing the drugs. You know that some of the "hits" will be false positives (Type I errors), so you plan to test all those "hits" in another assay. The consequence of a Type I error is therefore only that you need to retest that compound. You don't want to retest too many compounds, so you can't make alpha huge. But it might make sense to set it to a fairly high value, perhaps 0.10. A Type II error occurs when you conclude that a drug has no statistically significant effect when in fact the drug is effective. But in this context, you have hundreds of thousands more drugs to test, and you can't possibly test them all. By choosing a low value of power (say 60%) you can use a smaller sample size. You know you'll miss some real drugs, but you'll be able to test many more with the same effort. So in this context, you can justify setting alpha to a high value and power to a low value. Summary: low power, high alpha.

B. Screening selected drugs, chosen with scientific logic. The consequences of a Type I error are as before, so you can justify setting alpha to 0.10. But the consequences of a Type II error are more serious here. You've picked these compounds with some care, so a Type II error means that a great drug might be overlooked. In this context, you want to set power to a high value. Summary: high power, high alpha.

C. Testing carefully selected drugs, with no chance for a second round of testing. Say the compounds might be unstable, so you can only use them in one experiment. The results of this experiment (the list of hits and misses) will be used to build a structure-activity relationship, which will then be used to come up with a new list of compounds for the chemists to synthesize. This will be an expensive and time-consuming task, so a lot is riding on this experiment, which can't easily be repeated. In this case, the consequences of both a Type I and a Type II error are pretty bad, so you set alpha to a small value (say 0.01) and power to a large value (perhaps 99%). Choosing these values means you'll need a larger sample size, but the cost is worth it here. Summary: high power, low alpha.

D. Rethinking scenario C. The sample size required for scenario C may be too high to be feasible. You simply can't run that many replicates. After talking to your colleagues, you decide that the consequence of making a Type I error (falsely concluding that a drug is effective) is much worse than that of making a Type II error (missing a real drug). One false hit may have a huge impact on your structure-activity studies and lead the chemists to synthesize the wrong compounds. Falsely calling a drug inactive will have less severe consequences. Therefore you choose a low value of alpha and also a low power. Summary: low power, low alpha.
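To get a feel for how much these choices change the required sample size, the four scenarios can be compared with a rough calculation. The following is a minimal sketch, not a definitive method. It assumes a two-sided comparison of two group means, a normal approximation (which slightly underestimates what a t test would need), and a hypothetical standardized effect size of 1.0. The function name `n_per_group` is made up for this illustration. The power values for scenarios B and D (95% and 60%) are also assumptions, since the text only calls them "high" and "low".

```python
from math import ceil
from statistics import NormalDist


def n_per_group(alpha, power, effect_size):
    """Approximate sample size per group for a two-sided comparison of
    two means, using the normal approximation (an assumption of this sketch)."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # quantile for two-sided alpha
    z_beta = std_normal.inv_cdf(power)           # quantile for power = 1 - beta
    return ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)


# Hypothetical standardized effect size (difference / SD); not from the guide.
EFFECT_SIZE = 1.0

# (alpha, power) per scenario. Values for B's and D's power are assumed.
scenarios = {
    "Standard (alpha 0.05, power 80%)": (0.05, 0.80),
    "A: low power, high alpha": (0.10, 0.60),
    "B: high power, high alpha": (0.10, 0.95),
    "C: high power, low alpha": (0.01, 0.99),
    "D: low power, low alpha": (0.01, 0.60),
}

for name, (alpha, power) in scenarios.items():
    n = n_per_group(alpha, power, EFFECT_SIZE)
    print(f"{name}: about {n} per group")
```

Under these assumptions, scenario A needs the fewest replicates and scenario C the most, with the standard values falling in between. The absolute numbers depend entirely on the assumed effect size and test, so only the relative comparison is meaningful here.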

Bottom line

These scenarios are contrived, and I certainly am not in a position to tell anyone how to design their efforts to screen for drugs. But these scenarios make the point that you should choose values for alpha and power after carefully considering the consequences of making a Type I or Type II error. These consequences depend on the scientific context of your experiment. It doesn't really make sense to just use standard values for alpha and power.