The confidence interval of a proportion
When an experiment has two possible outcomes, the results are expressed as a proportion. Out of N experiments (or subjects), you observed one outcome (termed "success") in S experiments (or subjects) and the alternative outcome in N-S experiments. Success occurred in S/N of the experiments (or subjects), and we will call that proportion p. Since your data are derived from random sampling, the true proportion of success in the overall population is almost certainly not p. A 95% confidence interval quantifies the uncertainty. You can be 95% sure the overall proportion of success is within the confidence interval.
How to compute the 95% CI of a proportion
Prism does not compute the confidence interval of a single proportion, but does compute the confidence interval of two proportions when analyzing a 2x2 contingency table. Prism's companion program StatMate computes a confidence interval of a single proportion. Both programs (and many others) compute a confidence interval of a proportion using a method developed by Clopper and Pearson (Biometrika 26:404-413, 1934). The result is labeled an "exact" confidence interval (in contrast to the approximate intervals you can calculate conveniently by hand).
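If you are curious what an "exact" interval involves, here is a minimal Python sketch of the Clopper and Pearson idea. It assumes the usual definition: the lower limit is the proportion at which observing S or more successes has probability 2.5%, and the upper limit is the proportion at which observing S or fewer successes has probability 2.5%. The function names (binom_cdf, clopper_pearson) are hypothetical, and this is not the code used by Prism or StatMate, so the last digit may differ slightly from what those programs report.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _solve_decreasing(f, target, iters=100):
    """Find p in [0, 1] with f(p) = target, where f decreases as p increases."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) > target:
            lo = mid  # f is still too large, so the root lies at a larger p
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(s, n, conf=0.95):
    """'Exact' confidence interval for s successes in n trials (a sketch)."""
    alpha = 1 - conf
    # Lower limit: P(X >= s) = alpha/2, i.e. P(X <= s-1) = 1 - alpha/2.
    lower = 0.0 if s == 0 else _solve_decreasing(
        lambda p: binom_cdf(s - 1, n, p), 1 - alpha / 2)
    # Upper limit: P(X <= s) = alpha/2.
    upper = 1.0 if s == n else _solve_decreasing(
        lambda p: binom_cdf(s, n, p), alpha / 2)
    return lower, upper


if __name__ == "__main__":
    print(clopper_pearson(5, 10))
```

The bisection search is used only to keep the sketch free of external libraries. Statistical packages typically get the same limits from the beta or F distribution.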
If you want to compute the 95% confidence interval by hand, most books present the Wald equation:

$$p \pm 1.96\sqrt{\frac{p(1-p)}{N}}$$
However, there is a better way. The Wald approximation is known to work well only with large N and proportions not too close to 0.0 or 1.0. Computer simulations by several investigators demonstrate that the so-called "exact" confidence intervals are also approximations: they are not, in fact, exactly correct. For all values of S and N, you can be sure that you get at least 95% confidence, but the intervals are wider than they need to be, and so generally give you more than 95% confidence. The discrepancy varies depending on the values of S and N.
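To see why the Wald approximation struggles near 0.0 or 1.0, here is a short Python sketch. It assumes the textbook form of the Wald interval, p plus or minus z times the square root of p(1-p)/N, with z = 1.96. The function name wald_interval is hypothetical.

```python
from math import sqrt


def wald_interval(s, n, z=1.96):
    """Textbook Wald interval for s successes in n trials (a sketch)."""
    p = s / n
    half_width = z * sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width


if __name__ == "__main__":
    # With a mid-range proportion and a large N the interval looks reasonable.
    print(wald_interval(50, 100))
    # With zero successes the interval collapses to a single point at zero,
    # which claims far more certainty than 10 observations can justify.
    print(wald_interval(0, 10))
```

When S is zero, p(1-p) is zero, so the computed interval has no width at all. This is one concrete way the approximation breaks down.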
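Agresti and Coull (The American Statistician. 52:119-126, 1998) recommend a method they term the modified Wald method. It is easy to compute by hand and is more accurate than the so-called "exact" method.

$$p' = \frac{S+2}{N+4}, \qquad W = 1.96\sqrt{\frac{p'(1-p')}{N+4}}$$

The 95% confidence interval extends from p' - W to p' + W.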
In some cases, the lower limit calculated using that equation is less than zero. If so, set the lower limit to 0.0. Similarly, if the calculated upper limit is greater than 1.0, set the upper limit to 1.0.
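The calculation, including the clipping to the range 0.0 to 1.0, can be sketched in a few lines of Python. This sketch assumes the modified Wald interval takes the form p' = (S+2)/(N+4) with a margin of 1.96 times the square root of p'(1-p')/(N+4). The function name modified_wald_interval is hypothetical.

```python
from math import sqrt


def modified_wald_interval(s, n, z=1.96):
    """Modified Wald (Agresti-Coull style) 95% interval, clipped to [0, 1]."""
    p_adj = (s + 2) / (n + 4)  # p': add 2 successes and 2 failures
    half_width = z * sqrt(p_adj * (1 - p_adj) / (n + 4))
    lower = max(0.0, p_adj - half_width)  # never report a limit below zero
    upper = min(1.0, p_adj + half_width)  # never report a limit above one
    return lower, upper


if __name__ == "__main__":
    for s, n in [(0, 10), (0, 250), (7, 20)]:
        lower, upper = modified_wald_interval(s, n)
        print(f"S={s}, N={n}: {100 * lower:.2f}% to {100 * upper:.2f}%")
```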
This method works very well. For any values of S and N, there is close to a 95% chance that the interval contains the true proportion. With some values of S and N, the degree of confidence can be a bit less than 95%, but it is never less than 92%.
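You can check the degree of confidence for a particular case without running a simulation. For a given N and an assumed true proportion, add up the binomial probabilities of every S whose interval contains the true proportion. The Python sketch below does this, again assuming the modified Wald interval has the form p' = (S+2)/(N+4) with a margin of 1.96 times the square root of p'(1-p')/(N+4). The function names and the example values of N and the true proportion are illustrative only.

```python
from math import comb, sqrt


def modified_wald_interval(s, n, z=1.96):
    """Modified Wald 95% interval, clipped to [0, 1] (assumed form)."""
    p_adj = (s + 2) / (n + 4)
    half_width = z * sqrt(p_adj * (1 - p_adj) / (n + 4))
    return max(0.0, p_adj - half_width), min(1.0, p_adj + half_width)


def coverage(n, true_p):
    """Probability that the interval contains true_p, summed over all S."""
    total = 0.0
    for s in range(n + 1):
        lower, upper = modified_wald_interval(s, n)
        if lower <= true_p <= upper:
            # Binomial probability of observing exactly s successes.
            total += comb(n, s) * true_p**s * (1 - true_p) ** (n - s)
    return total


if __name__ == "__main__":
    for true_p in (0.05, 0.3, 0.5):
        print(f"N=20, true proportion {true_p}: coverage {coverage(20, true_p):.3f}")
```

Because S can take only whole-number values, the coverage jumps up and down as N and the true proportion change. This is why no simple method can give exactly 95% in every case.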
Where did the numbers 2 and 4 in the equation come from? Those values are actually z²/2 and z², where z is a critical value from the Gaussian distribution. Since 95% of all values of a normal distribution lie within 1.96 standard deviations of the mean, z=1.96 (which we round to 2.0) for 95% confidence intervals, so z²/2 is about 2 and z² is about 4.
Note that the confidence interval is centered on p', which is not the same as p, the proportion of experiments that were "successful". If p is less than 0.5, p' is higher than p. If p is greater than 0.5, p' is less than p. This makes sense as the confidence interval can never extend below zero or above one. So the center of the interval is between p and 0.5.
The meaning of "95% confidence" when the numerator is zero
If the numerator of a proportion is zero, the "95% confidence interval" really gives you 97.5% confidence. Here's why. When the proportion does not equal zero, we define the 95% confidence interval so that there is a 2.5% chance that the true proportion is less than the lower limit of the interval, and a 2.5% chance that the true proportion is higher than the upper limit. This leaves a 95% chance (100% - 2.5% - 2.5%) that the interval includes the true proportion. When the numerator is zero, we know that the true proportion cannot be less than zero, so we only need to compute an upper confidence limit. If we use the usual equations, we define the upper limit so that there is only a 2.5% chance that the true proportion is higher. Since the uncertainty only goes one way, you'll actually have a 97.5% CI (100% - 2.5%). The advantage of this approach is consistency with CIs computed for proportions where the numerator is not zero.
If you don't care about consistency with other data, but really want to calculate a 95% CI, you can do that by computing a "90% CI". This is computed so that there is a 5% chance that the true proportion is higher than the upper limit. If the numerator is zero, there is no chance of the proportion being less than zero, so the "90% CI" reported by StatMate (or other programs) really gives you 95% confidence (and StatMate tells you this). For example, StatMate says that the "90% confidence interval" for a proportion with the numerator = 0 and the denominator = 41 extends from 0.00% to 7.04%.
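When the numerator is zero, the upper limit has a simple closed form. The chance of seeing zero events in N trials is (1-p) raised to the power N. Setting that equal to the tail probability and solving for p gives the limit. The Python sketch below uses a hypothetical function name and is not StatMate's code, so the last digit may differ slightly from what a program reports.

```python
def zero_numerator_upper_limit(n, tail=0.05):
    """Upper confidence limit for 0 events in n trials.

    Solves (1 - p)**n = tail for p. Use tail=0.05 for a true 95% limit
    (the "90% CI" convention) or tail=0.025 for the usual "95% CI",
    which really gives 97.5% confidence when the numerator is zero.
    """
    return 1.0 - tail ** (1.0 / n)


if __name__ == "__main__":
    n = 41
    print(f"True 95% upper limit:   {100 * zero_numerator_upper_limit(n, 0.05):.2f}%")
    print(f"Usual '95% CI' (97.5%): {100 * zero_numerator_upper_limit(n, 0.025):.2f}%")
```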
A shortcut equation for a confidence interval when the numerator equals zero
JA Hanley and A Lippman-Hand (J. Am. Med. Assoc., 249: 1743-1745, 1983) devised a simple shortcut equation for estimating the 95% confidence interval. If you observe zero events in N trials, you can be 95% sure that the true rate is less than 3/N. To compute the usual "95% confidence interval" (which really gives you 97.5% confidence), estimate the upper limit as 3.5/N. This equation is so simple that you can do it by hand in a few seconds.
Here is an example. You observe 0 dead cells in 10 cells you examined. What is the 95% confidence interval for the true proportion of dead cells? The "exact 95% CI" (calculated by StatMate) is 0.00% to 30.83%. The adjusted Wald equation gives a "95%" confidence interval of 0.00% to 32.61%. The shortcut equation computes an upper confidence limit of 35% (3.5/10). With such a small N, the shortcut equation overestimates the confidence limit, but it is useful as a ballpark estimate.
Another example: You have observed no adverse drug reactions in the first 250 patients treated with a new antibiotic. What is the confidence interval for the true rate of drug reactions? StatMate tells us that the true rate could be as high as 1.46% (95% CI). The shortcut equation computes the upper limit as 1.40% (3.5/250). The adjusted Wald equation computes the upper limit as 1.87%. With large N, the shortcut equation is reasonably accurate.
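The two examples can be reproduced with a short Python sketch that puts the shortcut limits next to the closed-form exact limit for a zero numerator, obtained by solving (1-p)^N = 0.025 for p. The function names are hypothetical, and computed values may differ in the last digit from those reported by a program. The 3 in 3/N is close to the negative natural logarithm of 0.05 (about 2.996), which is where the shortcut comes from.

```python
def rule_of_three(n):
    """Shortcut upper limit giving about 95% confidence for 0 events."""
    return 3.0 / n


def shortcut_upper_limit(n):
    """Shortcut for the usual '95% CI' upper limit (97.5% confidence)."""
    return 3.5 / n


def exact_upper_limit(n, tail=0.025):
    """Exact upper limit for 0 events in n trials: solves (1 - p)**n = tail."""
    return 1.0 - tail ** (1.0 / n)


if __name__ == "__main__":
    for n in (10, 250):
        print(
            f"N={n}: 3/N = {100 * rule_of_three(n):.2f}%, "
            f"3.5/N = {100 * shortcut_upper_limit(n):.2f}%, "
            f"exact = {100 * exact_upper_limit(n):.2f}%"
        )
```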