Interpreting results: Mean, geometric mean and median 


The mean is the average. Add up the values, and divide by the number of values.
The median is the 50th percentile. Half the values are higher than the median, and half are lower.
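The geometric mean is computed in three steps: compute the logarithm of every value, compute the mean of those logarithms, and then take the antilog of that mean. It can only be computed when all values are positive. It is a better measure of central tendency than the mean when data follow a lognormal distribution (long tail).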
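The three definitions above can be sketched in a few lines of Python. This is only an illustration: the data values are made up, and base-10 logarithms are an arbitrary choice (any base gives the same geometric mean as long as the antilog uses the same base).

```python
import math
import statistics

# Hypothetical data; the geometric mean requires all values to be positive.
values = [1.0, 10.0, 100.0]

# Mean: add up the values and divide by the number of values.
mean = sum(values) / len(values)

# Median: the 50th percentile.
median = statistics.median(values)

# Geometric mean: logarithms, then their mean, then the antilog.
logs = [math.log10(v) for v in values]
geometric_mean = 10 ** (sum(logs) / len(logs))

print(mean, median, geometric_mean)
```

Python's standard library also provides `statistics.geometric_mean`, which should agree with the hand-rolled version up to floating-point rounding.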
If your data are sampled from a Gaussian distribution, the mean, geometric mean and median all have similar values. But if the distribution is skewed, the values can differ a lot as this graph shows:
The graph shows one hundred values sampled from a population that follows a lognormal distribution. The left panel plots the data on a linear (ordinary) axis. Most of the data points are piled up at the bottom of the graph, where you can't really see them. The right panel plots the data with a logarithmic scale on the Y axis. On a log axis, the distribution appears symmetrical. The median and geometric mean are near the center of the data cluster (on a log scale) but the mean is much higher, being pulled up by some very large values.
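A comparison like the one in the graph can be reproduced approximately with simulated data. The sketch below draws one hundred values from a lognormal distribution; the seed and the parameters (`mu=0.0`, `sigma=1.0`) are assumptions chosen for illustration, not the values used to make the graph.

```python
import random
import statistics

# Arbitrary seed so the sketch is repeatable; not taken from the source.
rng = random.Random(0)

# One hundred values from a lognormal population (assumed mu=0, sigma=1).
sample = [rng.lognormvariate(0.0, 1.0) for _ in range(100)]

mean = statistics.fmean(sample)
median = statistics.median(sample)
geometric_mean = statistics.geometric_mean(sample)

print(f"mean           = {mean:.3f}")
print(f"median         = {median:.3f}")
print(f"geometric mean = {geometric_mean:.3f}")
```

For any set of positive values that are not all equal, the mean is larger than the geometric mean. With skewed samples like this one, the median usually lands close to the geometric mean.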
Why is there no 'geometric median'? You would compute such a value by converting all the data to logarithms, finding their median, and then taking the antilog of that median. The result would be identical to the median of the actual data, since the median works by finding percentiles (ranks) and not by manipulating the raw data.
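A small check of this argument, using made-up data with an odd number of values so that the median is a single observed value. With an even number of values the median averages the two middle values, so the round trip through logarithms would then match the ordinary median only approximately.

```python
import math
import statistics

# Hypothetical, strongly skewed data with an odd number of values.
values = [2.0, 3.0, 5.0, 40.0, 900.0]

# Ordinary median of the raw data.
median_direct = statistics.median(values)

# "Geometric median": median of the logarithms, then the antilog.
logs = [math.log(v) for v in values]
median_via_logs = math.exp(statistics.median(logs))

print(median_direct, median_via_logs)
```

Because the logarithm preserves the order of the values, the same observation sits in the middle either way; any difference between the two results is floating-point rounding.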
The idea of trimmed or Winsorized means is to not let the largest and smallest values have much impact. Before calculating a trimmed or Winsorized mean, you first have to choose how many of the largest and smallest values to ignore or down-weight. This number is called K. If you set K to 1, the largest and smallest values are treated differently. If you set K to 2, then the two largest and two smallest values are treated differently. K must be set in advance. Sometimes K is set to 1, other times to some small fraction of the number of values, so K is larger when you have lots of data.
To compute a trimmed mean, simply delete the K smallest and K largest observations, and compute the mean of the remaining data.
To compute a Winsorized mean, sort the N values, replace the K smallest values with the value at the K+1 position, and replace the K largest values with the value at the N-K position. Then take the mean of the data.
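The two procedures can be sketched as follows. The function names and the data are hypothetical, and the sketch assumes that the replacement for the K largest values is the largest value that is kept, which is the value at position N-K when the sorted values are counted from 1.

```python
def trimmed_mean(values, k):
    """Mean after deleting the k smallest and k largest values."""
    ordered = sorted(values)
    if k < 0 or 2 * k >= len(ordered):
        raise ValueError("k must satisfy 0 <= k < len(values) / 2")
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


def winsorized_mean(values, k):
    """Mean after replacing the k smallest and k largest values
    with the nearest values that are kept."""
    ordered = sorted(values)
    n = len(ordered)
    if k < 0 or 2 * k >= n:
        raise ValueError("k must satisfy 0 <= k < len(values) / 2")
    low = ordered[k]           # 1-based position K+1
    high = ordered[n - k - 1]  # 1-based position N-K
    adjusted = [low] * k + ordered[k:n - k] + [high] * k
    return sum(adjusted) / n


# Hypothetical data with one very large value.
data = [1, 2, 3, 4, 5, 6, 7, 8, 20, 1000]

print(sum(data) / len(data))     # ordinary mean: 105.6
print(trimmed_mean(data, 1))     # 6.875
print(winsorized_mean(data, 1))  # 7.7
```

With K set to 1, the single value of 1000 no longer dominates either result, while the ordinary mean is far above nearly all of the data.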
The advantage of trimmed and Winsorized means is that they are not influenced by one (or a few) very high or low values. Prism does not compute these values.
To compute the harmonic mean, first transform all the values to their reciprocals. Then take the mean of those reciprocals. The harmonic mean is the reciprocal of that mean. If the values are all positive, larger numbers effectively get less weight than smaller numbers. The harmonic mean is not often used in biology, and is not computed by Prism.
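A minimal sketch of the harmonic mean, following the steps just described. The data values are made up for illustration.

```python
import statistics

# Hypothetical positive values.
values = [1.0, 2.0, 4.0]

# Step 1: transform the values to their reciprocals.
reciprocals = [1.0 / v for v in values]

# Step 2: take the mean of the reciprocals.
mean_of_reciprocals = sum(reciprocals) / len(reciprocals)

# Step 3: the harmonic mean is the reciprocal of that mean.
harmonic_mean = 1.0 / mean_of_reciprocals

arithmetic_mean = sum(values) / len(values)
print(harmonic_mean, arithmetic_mean)
```

Here the harmonic mean is 12/7 (about 1.71), below the arithmetic mean of about 2.33, because the largest value contributes the smallest reciprocal. The standard library function `statistics.harmonic_mean` should give the same result up to rounding.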
The mode is the value that occurs most commonly. It is not useful with measured values recorded to several digits of precision, as most values will be unique. It can be useful with variables that can only have integer values. While the mode is often included in lists like this, the mode doesn't always assess the center of a distribution. Imagine a medical survey where one of the questions is "How many times have you had surgery?" In many populations, the most common answer will be zero, so that is the mode. In this case, some values will be higher than the mode, but none lower, so the mode is not a way to quantify the center of the distribution.
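The surgery example can be illustrated with a handful of invented survey answers. The numbers below are hypothetical and serve only to show that no value can fall below a mode of zero.

```python
import statistics

# Hypothetical answers to "How many times have you had surgery?"
surgeries = [0, 0, 0, 1, 0, 2, 1, 0, 3, 0]

# The mode is the most common value.
mode = statistics.mode(surgeries)

# Count how many answers fall below and above the mode.
below = sum(1 for s in surgeries if s < mode)
above = sum(1 for s in surgeries if s > mode)

print(mode, below, above)
```

In this invented sample the mode is 0, with four answers above it and none below, so it marks the edge of the distribution rather than its center.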