© 1999 GraphPad Software Inc.

The Prism Guide to Interpreting Statistical Results
This guide is excerpted from Analyzing Data with GraphPad Prism, a book that accompanies the program GraphPad Prism.

The key concept: Sampling from a population

Sampling from a population

The basic idea of statistics is simple: you want to extrapolate from the data you have collected to make general conclusions about the larger population from which the data sample was derived.

To do this, statisticians have developed methods based on a simple model: Assume that all your data are randomly sampled from an infinitely large population. Analyze this sample, and use the results to make inferences about the population.

This model is an accurate description of some situations. For example, quality control samples really are randomly selected from a large population. Clinical trials do not enroll a randomly selected sample of patients, but it is usually reasonable to extrapolate from the sample you studied to the larger population of similar patients.

In a typical experiment, you don't really sample from a population. But you do want to extrapolate from your data to a more general conclusion. The concepts of sample and population can still be used if you define the sample to be the data you collected, and the population to be the data you would have collected if you had repeated the experiment an infinite number of times.

Note that the term sample has a specific definition in statistics that is very different from its usual definition. Learning new meanings for commonly used words is part of the challenge of learning statistics.

The need for independent samples

It is not enough that your data are sampled from a population. Statistical tests are also based on the assumption that each subject (or each experimental unit) was sampled independently of the rest. The concept of independence can be difficult to grasp. Consider the following three situations.

  • You are measuring blood pressure in animals. You have five animals in each group, and measure the blood pressure three times in each animal. You do not have 15 independent measurements, because the triplicate measurements in one animal are likely to be closer to each other than to measurements from the other animals. You should average the three measurements in each animal. Now you have five mean values that are independent of each other.
  • You have done a biochemical experiment three times, each time in triplicate. You do not have nine independent values, as an error in preparing the reagents for one experiment could affect all three triplicates. If you average the triplicates, you do have three independent mean values.
  • You are doing a clinical study, and recruit ten patients from an inner-city hospital and ten more patients from a suburban clinic. You have not independently sampled 20 subjects from one population. The data from the ten inner-city patients may be closer to each other than to the data from the suburban patients. You have sampled from two populations, and need to account for this in your analysis.

Data are independent when any random factor that causes a value to be too high or too low affects only that one value. If a random factor (that you didn't account for in the analysis of the data) can affect more than one value, but not all of the values, then the data are not independent.
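The blood pressure situation above can be sketched in a few lines of Python. This is only an illustration: the readings and the animal labels are made-up values, not data from the guide. It shows how averaging each animal's triplicate turns 15 correlated measurements into five values that can be treated as independent.

```python
from statistics import mean

# Hypothetical blood pressure readings (mmHg): five animals, three readings each.
readings = {
    "animal_1": [118, 120, 119],
    "animal_2": [131, 129, 130],
    "animal_3": [110, 112, 111],
    "animal_4": [125, 124, 126],
    "animal_5": [121, 123, 122],
}

# Collapse each animal's triplicate into one mean, so that each animal
# contributes a single value to the analysis.
animal_means = [mean(values) for values in readings.values()]

n_raw = sum(len(values) for values in readings.values())
n_independent = len(animal_means)

print(f"raw measurements: {n_raw}")
print(f"independent values: {n_independent}")
print("per-animal means:", animal_means)
```

Any later test would then be run on the per-animal means, with N equal to the number of animals rather than the number of readings.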

How you can use statistics to extrapolate from sample to population

Statisticians have devised three basic approaches for drawing conclusions about populations from samples of data:

The first method is to assume that the populations follow a special distribution, known as the Gaussian (bell-shaped) distribution. Once you assume that a population is distributed in that manner, statistical tests let you make inferences about the mean (and other properties) of the population. Most commonly used statistical tests assume that the population is Gaussian.
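As one illustration of an inference that the Gaussian assumption makes possible, the sketch below computes a 95% confidence interval for a population mean from a sample of five values. The data are hypothetical, and the constant 2.776 is the standard two-sided 95% critical value of the t distribution for 4 degrees of freedom (N minus 1). This is a minimal sketch of the general idea, not a description of how Prism performs the calculation.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical sample of five values, assumed to come from a Gaussian population.
sample = [4.2, 5.1, 3.8, 6.0, 4.9]
n = len(sample)

# Two-sided 95% critical value of t for n - 1 = 4 degrees of freedom.
T_CRITICAL_DF4 = 2.776

sample_mean = mean(sample)
sem = stdev(sample) / sqrt(n)   # standard error of the mean
margin = T_CRITICAL_DF4 * sem   # half-width of the interval
ci = (sample_mean - margin, sample_mean + margin)

print(f"sample mean: {sample_mean:.2f}")
print(f"95% CI of the mean: {ci[0]:.2f} to {ci[1]:.2f}")
```

The interval is only as trustworthy as the assumptions behind it: random, independent sampling from a population that is approximately Gaussian.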

The second method is to rank all values from low to high, and then compare the distribution of ranks. This is the principle behind most commonly used nonparametric tests, which are used to analyze data from non-Gaussian distributions.
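The ranking step can be sketched as follows. The two groups are hypothetical values, and `rank_with_ties` is a helper written for this illustration. The code pools the values, ranks them from low to high (tied values share the average of their ranks), and sums the ranks in each group. It stops there: it does not compute a P value, which a full rank-based test such as the Mann-Whitney test would go on to do.

```python
def rank_with_ties(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = average_rank
        start = end + 1
    return ranks


# Hypothetical measurements from two groups.
control = [3.1, 4.7, 2.8, 5.0]
treated = [5.9, 6.3, 4.7, 7.2]

# Rank all values together, then split the ranks back into the two groups.
pooled = control + treated
ranks = rank_with_ties(pooled)
control_rank_sum = sum(ranks[:len(control)])
treated_rank_sum = sum(ranks[len(control):])

print("control rank sum:", control_rank_sum)
print("treated rank sum:", treated_rank_sum)
```

Because only the ranks enter the comparison, the result does not depend on the values following a Gaussian distribution.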

The third method is known as resampling. With this method, you create a population of sorts by repeatedly sampling values from your sample. This is best understood by an example. Assume you have a single sample of five values, and want to know how close that sample mean is likely to be to the true population mean. Write each value on a card and place the cards in a hat. Create a pseudo sample by drawing a card, recording its value, and returning it to the hat, repeating until you have recorded five values. Generate many pseudo samples of N=5 this way. Since you can draw the same value more than once, the samples won't all be the same. When randomly selecting cards gets tedious, use a computer program instead. The distribution of the means of these computer-generated samples gives you information about how accurately you know the mean of the entire population.

The idea of resampling can be difficult to grasp. To learn about this approach to statistics, read the instructional material available at www.resample.com. Prism does not perform any tests based on resampling. Resampling methods are closely linked to bootstrapping methods.
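The cards-in-a-hat procedure translates directly into a short program. The sketch below is an illustration only: the five sample values, the number of pseudo samples (10,000), the fixed random seed, and the function name `bootstrap_means` are all assumptions made for the example. It draws values with replacement, collects the mean of each pseudo sample, and reports the range that holds the middle 95% of those means.

```python
import random
from statistics import mean


def bootstrap_means(sample, n_resamples, rng):
    """Return the mean of each pseudo sample drawn with replacement."""
    n = len(sample)
    # rng.choices draws n values with replacement: a card is drawn,
    # recorded, and returned to the hat before the next draw.
    return [mean(rng.choices(sample, k=n)) for _ in range(n_resamples)]


# Hypothetical sample of five values.
sample = [4.2, 5.1, 3.8, 6.0, 4.9]

rng = random.Random(0)  # fixed seed so the run is repeatable
means = sorted(bootstrap_means(sample, 10_000, rng))

# Range covering the middle 95% of the resampled means.
lower = means[int(0.025 * len(means))]
upper = means[int(0.975 * len(means)) - 1]

print(f"sample mean: {mean(sample):.2f}")
print(f"middle 95% of resampled means: {lower:.2f} to {upper:.2f}")
```

The width of that range indicates how precisely the sample pins down the population mean. With only five values the estimate is rough, since the pseudo samples can never contain anything other than the five original values.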

Limitations of statistics

The statistical model is simple: Extrapolate from the sample you collected to a more general situation, assuming that your sample was randomly selected from a large population. The problem is that the statistical inferences can only apply to the population from which your samples were obtained, but you often want to make conclusions that extrapolate even beyond that large population.

For example, you perform an experiment in the lab three times. All the experiments used the same cell preparation, the same buffers, and the same equipment. Statistical inferences let you make conclusions about what would happen if you repeated the experiment many more times with that same cell preparation, those same buffers, and the same equipment. You probably want to extrapolate further to what would happen if someone else repeated the experiment with a different source of cells, freshly made buffer, and different instruments. Unfortunately, statistical calculations can't help with this further extrapolation. You must use scientific judgment and common sense to make inferences that go beyond the limitations of statistics. Thus, statistical logic is only part of data interpretation.