© 1999 GraphPad Software Inc.

The Prism Guide to Interpreting Statistical Results
This guide is excerpted from Analyzing Data with GraphPad Prism, a book that accompanies the program GraphPad Prism.

Statistical power

Type II errors and power

When a study reaches a conclusion of "no statistically significant difference", you should not necessarily conclude that the treatment was ineffective. It is possible that the study missed a real effect because the sample was small or the data were quite variable. If so, you made a Type II error: obtaining a "not significant" result when in fact there is a difference.

When interpreting the results of an experiment that found no significant difference, you need to ask yourself how much power the study had to find various hypothetical differences if they existed. The power depends on the sample size and amount of variation within the groups, where variation is quantified by the standard deviation (SD).

Here is a precise definition of power: Start with the assumption that the two population means differ by a certain amount and that the SD of the populations has a particular value. Now assume that you perform many experiments with the sample size you used, and calculate a P value for each experiment. Power is the fraction of these experiments that would have a P value less than α (the largest P value you deem "significant", usually set to 0.05). In other words, power equals the fraction of experiments that would lead to statistically significant results. Prism does not compute power, but the companion program GraphPad StatMate does.
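The definition above can be checked by brute force. The following is a minimal Python sketch, not a description of how StatMate works: it assumes Gaussian populations with equal SD, an unpaired pooled-variance t test, and hypothetical inputs (a true difference of 10, an SD of 10, and 10 subjects per group). The function name simulated_power and the tabulated critical value 2.101 (two-tailed, alpha = 0.05, 18 degrees of freedom) are illustrative choices that do not come from the guide.

```python
import math
import random
import statistics


def simulated_power(true_diff, sd, n1, n2, t_crit, n_experiments=4000, seed=1):
    """Fraction of simulated experiments whose unpaired t test is significant."""
    rng = random.Random(seed)
    df = n1 + n2 - 2
    significant = 0
    for _ in range(n_experiments):
        # Draw two samples from Gaussian populations whose means differ by true_diff.
        a = [rng.gauss(0.0, sd) for _ in range(n1)]
        b = [rng.gauss(true_diff, sd) for _ in range(n2)]
        # Pooled-variance (equal SD) unpaired t ratio.
        pooled_var = ((n1 - 1) * statistics.variance(a)
                      + (n2 - 1) * statistics.variance(b)) / df
        se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
        t = (statistics.fmean(b) - statistics.fmean(a)) / se_diff
        # |t| above the critical value is the same as a two-tailed P value below alpha.
        if abs(t) > t_crit:
            significant += 1
    return significant / n_experiments


# Hypothetical inputs, chosen only for illustration.
T_CRIT_DF18 = 2.101  # two-tailed critical t for alpha = 0.05 and 18 degrees of freedom
power = simulated_power(true_diff=10.0, sd=10.0, n1=10, n2=10, t_crit=T_CRIT_DF18)
print(f"Estimated power: {power:.2f}")
```

Setting true_diff to 0 in the same function estimates the Type I error rate instead, which should come out close to alpha.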

Example of power calculations

Motulsky et al. asked whether people with hypertension (high blood pressure) had altered numbers of α2-adrenergic receptors on their platelets (Clinical Science 64:265-272, 1983). There are many reasons to think that autonomic receptor numbers may be altered in hypertensives. They studied platelets because they are easily accessible from a blood sample. The results are shown here:

Variable                                     Hypertensives   Controls
Number of subjects                           18              17
Mean receptor number (receptors per cell)    257             263
Standard deviation                           59.4            86.6

The two means were almost identical, and a t test gave a very high P value. The authors concluded that the platelets of hypertensives do not have an altered number of α2 receptors.
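To see why the P value is so high, the t ratio can be recomputed from the summary statistics in the table. This is a sketch that assumes an unpaired t test with a pooled variance; the guide does not say which variant of the t test the authors ran, and the variable names are illustrative.

```python
import math

# Summary statistics from the table above.
n_hyp, mean_hyp, sd_hyp = 18, 257.0, 59.4
n_ctl, mean_ctl, sd_ctl = 17, 263.0, 86.6

# Pooled SD, weighting each group's variance by its degrees of freedom.
df = n_hyp + n_ctl - 2
pooled_var = ((n_hyp - 1) * sd_hyp**2 + (n_ctl - 1) * sd_ctl**2) / df
pooled_sd = math.sqrt(pooled_var)

# Standard error of the difference between the two means.
se_diff = pooled_sd * math.sqrt(1 / n_hyp + 1 / n_ctl)

# t ratio: observed difference divided by its standard error.
t_stat = (mean_ctl - mean_hyp) / se_diff

T_CRIT_DF33 = 2.0345  # two-tailed critical t for alpha = 0.05 and 33 degrees of freedom
print(f"pooled SD = {pooled_sd:.2f}, SE of difference = {se_diff:.2f}")
print(f"t = {t_stat:.2f} (significant at 0.05 only if |t| > {T_CRIT_DF33})")
```

Under these assumptions the observed difference of 6 receptors per cell is only about a quarter of its standard error, so the t ratio falls far short of the critical value.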

What was the power of this study to find a difference if there was one? The answer depends on how large the difference really is. Here are the results shown as a graph.

If the true difference between means was 50.58 receptors/cell, then this study had only 50% power to find a statistically significant difference. In other words, if hypertensives really averaged 51 more receptors per cell, you'd find a statistically significant difference in about half of studies of this size, but would not find one in the other half. This is about a 20% change (51/257), large enough that it could possibly have a physiological impact.

If the true difference between means was 84 receptors/cell, then this study had 90% power to find a statistically significant difference. If hypertensives really had such a large difference, you'd find a statistically significant difference in 90% of studies of this size and would fail to find one in the other 10%.
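The two power values quoted above can be roughly reproduced by hand. The sketch below is an approximation, not StatMate's method: it assumes a pooled-variance unpaired t test, treats the t ratio as Gaussian around its expected value, and uses the tabulated two-tailed critical t of 2.0345 (alpha = 0.05, 33 degrees of freedom). The function name approx_power is hypothetical. An exact calculation would use the noncentral t distribution, so small discrepancies from the quoted figures are expected.

```python
import math
from statistics import NormalDist


def approx_power(true_diff, se_diff, crit):
    """Approximate two-tailed power, treating the t ratio as Gaussian."""
    z = NormalDist()
    shift = abs(true_diff) / se_diff  # expected t ratio if true_diff is real
    # Probability of landing beyond the critical value in either tail.
    return z.cdf(shift - crit) + z.cdf(-shift - crit)


# Pooled SD and standard error of the difference, from the table's
# SDs (59.4, 86.6) and sample sizes (18, 17).
pooled_sd = math.sqrt((17 * 59.4**2 + 16 * 86.6**2) / 33)
se_diff = pooled_sd * math.sqrt(1 / 18 + 1 / 17)

T_CRIT_DF33 = 2.0345  # two-tailed critical t for alpha = 0.05 and 33 degrees of freedom

power_at_51 = approx_power(50.58, se_diff, T_CRIT_DF33)
power_at_84 = approx_power(84.0, se_diff, T_CRIT_DF33)
print(f"difference 50.58: power ~ {power_at_51:.2f}")
print(f"difference 84:    power ~ {power_at_84:.2f}")
```

Calling approx_power over a range of differences traces out the kind of power curve the text describes: low power for small differences, rising toward 100% for large ones.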

All studies have low power to find small differences and high power to find large differences. However, it is up to you to define "low" and "high" in the context of the experiment, and to decide whether the power was high enough for you to believe the negative results. If the power is too low, you shouldn't reach a firm conclusion until the study has been repeated with more subjects. Most investigators aim for 80% or 90% power to detect a difference.
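To get a feel for what "more subjects" means, a common textbook approximation estimates the number of subjects per group needed to reach a target power. The sketch below assumes two equal-sized groups, a Gaussian approximation, a two-tailed alpha of 0.05, the pooled SD implied by the table (about 73.85), and a true difference of 51 receptors per cell. The function name n_per_group is hypothetical, and because the Gaussian approximation ignores the extra uncertainty of the t distribution, the true requirement is slightly higher.

```python
import math
from statistics import NormalDist


def n_per_group(true_diff, sd, power, alpha=0.05):
    """Approximate subjects per group for a two-tailed, two-sample comparison."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the significance test
    z_power = z.inv_cdf(power)          # quantile matching the target power
    n = 2 * (z_alpha + z_power) ** 2 * (sd / true_diff) ** 2
    return math.ceil(n)  # round up to a whole subject


# Pooled SD from the table's SDs (59.4, 86.6) and sample sizes (18, 17).
pooled_sd = math.sqrt((17 * 59.4**2 + 16 * 86.6**2) / 33)

n80 = n_per_group(true_diff=51.0, sd=pooled_sd, power=0.80)
n90 = n_per_group(true_diff=51.0, sd=pooled_sd, power=0.90)
print(f"80% power: about {n80} subjects per group")
print(f"90% power: about {n90} subjects per group")
```

Under these assumptions, detecting a 51-receptor difference with conventional power would take roughly twice as many subjects per group as the 17 or 18 used in the study.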

Since this study had only 50% power to detect a difference of 20% in receptor number (about 50 receptors per cell, a large enough difference to possibly explain some aspects of hypertension physiology), the negative conclusion is not solid.