Three-way ANOVA may not answer your scientific questions

GraphPad Prism can compute three-way ANOVA in certain circumstances. But before using three-way ANOVA, note that it is often much less useful than most scientists hope. When three-way ANOVA is used to analyze data, the results often do not answer the questions the experiment was designed to ask. Let's work through an example:
A gene has been identified that is required for angiogenesis (growth of new blood vessels) under pathological conditions. The question is whether it is also active in the brain. Hypoxia (low oxygen levels) is known to provoke angiogenesis in the brain. So the question is whether angiogenesis stimulated by hypoxia will be reduced in animals in which that gene has been removed (knocked out; KO) compared to normal (wild type, WT) animals. In other words, the goal is to find out whether vessel growth differs significantly between KO hypoxic mice and WT hypoxic mice.
The experimental design:
•Half the animals are wild type. Half have the gene of interest knocked out.
•Half the animals are kept in normal air. Half are kept in hypoxic (low oxygen) conditions.
•Blood vessel number in a region of brain is measured at two time points (1 and 3 weeks).
The experiment has three factors: genotype (wild type vs. KO), oxygen (normal air vs. low oxygen) and time (1 and 3 weeks). So it seems logical to think that three-way ANOVA is the appropriate analysis. Three-way ANOVA will report seven P values (even before asking for multiple comparisons tests or contrasts). These P values test seven null hypotheses:
•Effect of genotype. The null hypothesis is that, averaging over both conditions (hypoxic or not) and both time points, the average result in the wild type animals equals the average result in the KO animals. This isn't very useful. You don't expect the KO to be different in the normal air condition, so averaging that with hypoxia just muddles the picture. This P value is not helpful.
•Effect of hypoxia. The null hypothesis is that, averaging over both genotypes and both time points, the average result in normal air is identical to the average result in hypoxia. We already know hypoxia will provoke angiogenesis in WT animals. The point of the experiment is to see whether hypoxia has a different effect in the KO animals. Combining the results of WT and KO animals doesn't really make sense, so this P value is not helpful.
•Effect of time. The null hypothesis is that, averaging over both genotypes and both conditions (hypoxia or not), the average result at the two time points is the same. But we already know that angiogenesis takes time, so in the WT animals treated with hypoxia there will be more vessel growth at the later time point than at the earlier one. Combining both genotypes and both conditions doesn't really make sense. This P value is not helpful.
•Interaction of genotype and hypoxia. The null hypothesis is that the effect of hypoxia is the same in wild type and KO animals, averaged over both time points. This sort of gets at the point of the study, and is the only one of the seven P values that seems to address the experimental question. But even this P value doesn't quite test the null hypothesis you care about. You really want to know whether the two genotypes have different outcomes in the presence of hypoxia. Including the data collected under normal air confuses the results rather than clarifying them. Including the data from the earlier time point, before angiogenesis has had a chance to begin, also clouds the picture.
•Interaction of genotype and time. The null hypothesis is that, averaging over both conditions (hypoxia and not), the difference between the two genotypes is consistent over time. Since the whole point of the experiment is to investigate the effect of hypoxia, it really makes no sense to average the results from hypoxic animals together with results from animals breathing regular air. This P value is not useful.
•Interaction of hypoxia and time. The null hypothesis is that, averaging over both genotypes, the effect of hypoxia is the same at both time points. It really makes no sense to average together both genotypes, so this P value won't be useful.
•Three-way interaction of genotype, hypoxia and time. This P value is not useful, because it is too hard to figure out what null hypothesis it tests!
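To make concrete what each of these seven P values averages over, the effects can be written as contrast weights on the eight cells of the 2 × 2 × 2 design. The Python sketch below assumes equal sample sizes in every cell, so that each null hypothesis is a simple ±1 contrast of cell means. The factor labels and function names are hypothetical, chosen for illustration; this is not how Prism computes anything internally.

```python
from itertools import combinations, product
from math import prod

# Hypothetical encoding of the example's 2 x 2 x 2 design (labels are illustrative only).
FACTORS = {
    "genotype": ("WT", "KO"),
    "oxygen": ("normal air", "hypoxia"),
    "time": ("1 week", "3 weeks"),
}
NAMES = list(FACTORS)
# The eight cells, e.g. ("WT", "hypoxia", "3 weeks").
CELLS = list(product(*FACTORS.values()))


def sign(factor, level):
    """+1 for a factor's first level, -1 for its second level."""
    return 1 if level == FACTORS[factor][0] else -1


def anova_contrasts():
    """Map each of the seven three-way ANOVA effects to its +/-1 weight on every cell."""
    effects = {}
    for k in (1, 2, 3):  # main effects, two-way interactions, three-way interaction
        for subset in combinations(range(len(NAMES)), k):
            label = " x ".join(NAMES[i] for i in subset)
            effects[label] = [
                prod(sign(NAMES[i], cell[i]) for i in subset) for cell in CELLS
            ]
    return effects


def question_contrast():
    """WT minus KO among hypoxic animals at 3 weeks; every other cell gets weight 0."""
    target = {("WT", "hypoxia", "3 weeks"): 1, ("KO", "hypoxia", "3 weeks"): -1}
    return [target.get(cell, 0) for cell in CELLS]


if __name__ == "__main__":
    for index, cell in enumerate(CELLS):
        print(index, cell)
    for label, weights in anova_contrasts().items():
        print(f"{label:28s}", weights)
    print(f"{'question (hypoxia, 3 weeks)':28s}", question_contrast())
```

Every weight in the genotype x oxygen row is +1 or -1, so that interaction gives the normal-air cells and the 1-week cells just as much weight as the hypoxic 3-week cells. One way to phrase the experimental question (WT vs. KO in hypoxic animals at 3 weeks) puts weight on only two cells. It can be built as a combination of the seven rows, but none of the seven P values tests it by itself.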
Why were animals exposed to ordinary air included in the experiment? As a control. We don't expect much angiogenesis over a three-week period in unstressed animals. The other half of the animals were exposed to hypoxia, which is known to provoke angiogenesis. The animals exposed to regular air are a control to show the experiment worked as expected. So I think it is reasonable to look at these results as a way to decide whether the experiment worked, and whether the hypoxic data are worth analyzing. If there were much angiogenesis in the animals exposed to regular air, you'd suspect that something else, such as a toxin, was provoking it. Once you are sure the experiment worked, those data can be ignored in the final analysis.
By analyzing the data only from the hypoxic animals, we are down to two factors, genotype and time, so the data can be analyzed by two-way ANOVA. Two-way ANOVA reports three P values, testing three null hypotheses:
•Effect of genotype. The null hypothesis is that, pooling both time points, the average result in the wild type animals equals the average result in the KO animals. That gets at the experimental question, so it is useful.
•Effect of time. The null hypothesis is that, pooling both genotypes, the average result at the two time points is the same. But we already know that there will be more blood vessels at the later time point than at the earlier one, at least in the WT animals. So this P value is likely to be small, and that doesn't help answer the experimental question.
•Interaction of genotype and time. The null hypothesis is that the difference between the two genotypes is the same at both time points. If the P value is large, you won't reject that hypothesis. In this case the P value for genotype answers the question the experiment was designed to ask. If the P value is small, you will reject the null hypothesis and conclude that the difference between genotypes is not the same at the two times. In this case, multiple comparisons tests could compare the two genotypes at each time point individually.
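The two-way analysis of the hypoxic animals can be sketched in a few lines. This is a minimal Python sketch assuming a complete, balanced design (the same number of animals in every genotype × time cell). The function name `two_way_anova` is hypothetical, and the vessel counts in the example are made up for illustration; they are not real data.

```python
from statistics import fmean


def two_way_anova(data, names=("genotype", "time")):
    """Balanced two-way ANOVA table.

    data maps (level of factor A, level of factor B) -> list of replicate values.
    Returns {source: (SS, df, F)}; the residual row has F = None.
    Only handles complete, balanced designs (same number of replicates per cell).
    """
    a_levels = sorted({a for a, _ in data})
    b_levels = sorted({b for _, b in data})
    n = len(next(iter(data.values())))
    if len(data) != len(a_levels) * len(b_levels) or any(len(v) != n for v in data.values()):
        raise ValueError("this sketch only handles complete, balanced designs")
    a, b = len(a_levels), len(b_levels)

    cell = {k: fmean(v) for k, v in data.items()}
    grand = fmean(cell.values())  # equals the mean of all values when balanced
    a_mean = {i: fmean(cell[(i, j)] for j in b_levels) for i in a_levels}
    b_mean = {j: fmean(cell[(i, j)] for i in a_levels) for j in b_levels}

    # Sums of squares for the main effects, the interaction and the residual.
    ss_a = n * b * sum((m - grand) ** 2 for m in a_mean.values())
    ss_b = n * a * sum((m - grand) ** 2 for m in b_mean.values())
    ss_ab = n * sum(
        (cell[(i, j)] - a_mean[i] - b_mean[j] + grand) ** 2
        for i in a_levels
        for j in b_levels
    )
    ss_res = sum((y - cell[k]) ** 2 for k, values in data.items() for y in values)
    df_res = a * b * (n - 1)
    ms_res = ss_res / df_res

    table = {}
    for source, ss, df in (
        (names[0], ss_a, a - 1),
        (names[1], ss_b, b - 1),
        (f"{names[0]} x {names[1]}", ss_ab, (a - 1) * (b - 1)),
    ):
        table[source] = (ss, df, (ss / df) / ms_res)
    table["residual"] = (ss_res, df_res, None)
    return table


if __name__ == "__main__":
    # Made-up vessel counts for hypoxic animals only, two animals per cell.
    hypoxic = {
        ("WT", "1 week"): [10, 12],
        ("WT", "3 weeks"): [30, 32],
        ("KO", "1 week"): [10, 12],
        ("KO", "3 weeks"): [14, 16],
    }
    for source, (ss, df, f) in two_way_anova(hypoxic).items():
        print(source, ss, df, f)
```

It prints the sum of squares, degrees of freedom and F ratio for each of the three null hypotheses listed above. To turn an F ratio into a P value, one option (if SciPy is installed) is `scipy.stats.f.sf(F, df_effect, df_residual)`. Unbalanced designs need a different sums-of-squares approach, which this sketch deliberately does not attempt.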
Bottom line: With these data, treating half the experiment as a control that proves the methods worked vastly simplifies the data analysis.
A statistician might object that those control data provide information about variability, so it isn't fair to ignore them entirely. Someone skilled with R or SAS (etc.) could find a way to analyze all the data and report P values that test the particular hypotheses of interest. But this is far from straightforward, and beyond the skills of most scientists. Blindly plugging the data into three-way ANOVA would not lead to results that answer the experimental question.
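As one small illustration of what using all the data could look like, a planned comparison of WT vs. KO hypoxic animals at 3 weeks can borrow its variance estimate from all eight groups, including the normal-air controls. This Python sketch is an assumption-laden example, not the full analysis a statistician would build: it assumes independent groups that all share the same variance (which may not hold, for instance if unstressed animals vary less), the function name is hypothetical, and the numbers are invented.

```python
from math import sqrt
from statistics import fmean, variance


def pooled_contrast(groups, weights):
    """Estimate, SE, t and df for sum(w * group mean), pooling variance over ALL groups.

    groups maps a group label -> list of values; weights maps some labels -> weight.
    Assumes independent groups with a common (equal) variance.
    """
    if abs(sum(weights.values())) > 1e-12:
        raise ValueError("contrast weights should sum to zero")
    # Pooled within-group variance uses every group, not just the two compared.
    ss_within = sum((len(v) - 1) * variance(v) for v in groups.values())
    df = sum(len(v) for v in groups.values()) - len(groups)
    ms_within = ss_within / df
    estimate = sum(w * fmean(groups[g]) for g, w in weights.items())
    se = sqrt(ms_within * sum(w * w / len(groups[g]) for g, w in weights.items()))
    return estimate, se, estimate / se, df


if __name__ == "__main__":
    # Invented vessel counts, two animals per group, for all eight groups.
    groups = {
        ("WT", "air", "1 week"): [9, 11],
        ("WT", "air", "3 weeks"): [10, 12],
        ("KO", "air", "1 week"): [9, 11],
        ("KO", "air", "3 weeks"): [10, 12],
        ("WT", "hypoxia", "1 week"): [10, 12],
        ("WT", "hypoxia", "3 weeks"): [30, 32],
        ("KO", "hypoxia", "1 week"): [10, 12],
        ("KO", "hypoxia", "3 weeks"): [14, 16],
    }
    weights = {("WT", "hypoxia", "3 weeks"): 1, ("KO", "hypoxia", "3 weeks"): -1}
    print(pooled_contrast(groups, weights))
```

Pooling gains residual degrees of freedom (8 here, rather than the 2 you would get from the two compared groups alone), which is the statistician's point. A two-sided P value could then come from `2 * scipy.stats.t.sf(abs(t), df)` if SciPy is available; if several such contrasts are tested, their P values would also need a multiple comparisons correction.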
One problem with ANOVA (even two-way) is that it treats the time points as unordered categories, exactly as it would treat two species or two alternative drugs. It makes no use of the fact that time is ordered and measured on a numerical scale.
An alternative approach is to use regression. The simplest model is linear (and with only two time points, there would be no point fitting a more complicated model). Use linear regression to look at the rate of angiogenesis in the hypoxic animals: fit one slope to the WT animals and one to the KO animals, and compare the slopes.
This approach seems best to me. Each slope is understandable on its own as a measure of the rate of angiogenesis. The null hypothesis is understandable as well (the two rates are the same). The analysis seems much closer to the biological question, and the results will be much easier for non-statisticians to interpret. Of course, it assumes that angiogenesis is linear over the time course studied, which may or may not be a reasonable assumption (and with only two time points, the data themselves cannot check it).
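A minimal sketch of the slope comparison follows, in Python. It fits an ordinary least-squares line to each genotype's hypoxic data and compares the two slopes with the common pooled-variance t test (the approach used in analysis of covariance). The function names and the data are hypothetical, and it is an assumption that this matches the exact method of any particular program.

```python
from math import sqrt
from statistics import fmean


def fit_line(x, y):
    """Ordinary least-squares line. Returns slope, intercept, residual SS and Sxx."""
    xbar, ybar = fmean(x), fmean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    rss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    return slope, intercept, rss, sxx


def compare_slopes(x1, y1, x2, y2):
    """t test that two regression slopes are equal, pooling the residual variance."""
    b1, _, rss1, sxx1 = fit_line(x1, y1)
    b2, _, rss2, sxx2 = fit_line(x2, y2)
    df = len(x1) + len(x2) - 4  # two parameters estimated per line
    s2 = (rss1 + rss2) / df
    se = sqrt(s2 * (1 / sxx1 + 1 / sxx2))
    return b1, b2, (b1 - b2) / se, df


if __name__ == "__main__":
    # Made-up vessel counts in hypoxic animals at 1 and 3 weeks, two animals each.
    weeks = [1, 1, 3, 3]
    wt = [10, 12, 30, 32]
    ko = [10, 12, 14, 16]
    slope_wt, slope_ko, t, df = compare_slopes(weeks, wt, weeks, ko)
    print(f"WT slope {slope_wt}/week, KO slope {slope_ko}/week, t = {t}, df = {df}")
```

With only two time points, each fitted line passes through the two time-point means, so this pooled-variance test is algebraically equivalent to the genotype × time interaction test from the two-way ANOVA of the hypoxic data (t² equals that F). The regression framing pays off mainly in interpretation (a rate per week), and in flexibility if more time points are added.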
•Just because an experimental design includes three factors, don't assume that three-way ANOVA is the best analysis.
•Many experiments are designed with positive or negative controls. These are important, as they let you know whether everything worked as it should. If the controls gave unexpected results, it would not be worth analyzing the rest of the data. Once you've verified that the controls worked as expected, those control data can often be removed from the data used in the key analyses. This can vastly simplify data analysis.
•When a factor is dose or time, fitting a regression model often answers an experimental question better than ANOVA does.