Two-way ANOVA is sometimes used when one of the factors is quantitative, such as when comparing time courses or dose-response curves. In these situations, one of the factors is a dose or time that you have set to one of several (or many) discrete values.

ANOVA pays no attention to the order of your time points (or doses). Think about that. The whole point of your experiment may have been to look at a trend or at a dose-response relationship. But the ANOVA calculations completely ignore the order of the time points or doses. If you randomly scramble the time points or doses, two-way ANOVA would report identical results. ANOVA treats different time points, or different doses, exactly the same way it would treat different drugs, different genotypes, or different countries.
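
The scrambling claim can be checked numerically. The sketch below is a minimal illustration, not Prism's implementation: it assumes a balanced design with no repeated measures, uses made-up data (two treatments, six time points, three replicates), and the function name two_way_anova_f is hypothetical. It computes the three F ratios with the time points in their real order and again after reordering them.

```python
import math
import random
from statistics import mean

def two_way_anova_f(data):
    """F ratios for a balanced two-way ANOVA (ordinary, not repeated measures).

    data[i][j] is the list of replicates for treatment i at time point j.
    Returns (F_treatment, F_time, F_interaction).
    """
    a = len(data)        # number of treatments
    b = len(data[0])     # number of time points
    n = len(data[0][0])  # replicates per cell (assumed equal everywhere)
    grand = mean(y for row in data for cell in row for y in cell)
    row_means = [mean(y for cell in row for y in cell) for row in data]
    col_means = [mean(y for row in data for y in row[j]) for j in range(b)]
    cell_means = [[mean(cell) for cell in row] for row in data]

    ss_treatment = b * n * sum((m - grand) ** 2 for m in row_means)
    ss_time = a * n * sum((m - grand) ** 2 for m in col_means)
    ss_interaction = n * sum(
        (cell_means[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(a) for j in range(b))
    ss_residual = sum(
        (y - cell_means[i][j]) ** 2
        for i in range(a) for j in range(b) for y in data[i][j])
    ms_residual = ss_residual / (a * b * (n - 1))
    return (ss_treatment / (a - 1) / ms_residual,
            ss_time / (b - 1) / ms_residual,
            ss_interaction / ((a - 1) * (b - 1)) / ms_residual)

# Made-up data: the response rises with time, faster for the second treatment.
rng = random.Random(0)
data = [[[t * (1 + 0.5 * trt) + rng.gauss(0, 1) for _ in range(3)]
         for t in range(6)]
        for trt in range(2)]

# Scramble the time points (same reordering for both treatments).
order = [3, 0, 5, 1, 4, 2]
scrambled = [[row[j] for j in order] for row in data]

f_original = two_way_anova_f(data)
f_scrambled = two_way_anova_f(scrambled)
same = all(math.isclose(x, y, rel_tol=1e-9)
           for x, y in zip(f_original, f_scrambled))
print("F ratios unchanged by scrambling:", same)
```

Every sum of squares is a sum over cells, so relabeling the columns only changes the order of the terms, not their total. The F ratios (and therefore the P values) agree up to floating-point rounding.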

Since ANOVA ignores the entire point of the experiment when one of the factors is quantitative, consider using alternative (regression) approaches. In some cases, you don't have enough data or enough theory to fit a curve, so ANOVA might be a reasonable analysis.

Let's imagine you compare two treatments at six time points.

The two-way ANOVA will report three P values:

• One P value tests the null hypothesis that time has no effect on the outcome. In many situations, you know that the outcome changes over time. That's why you did a time course. Since you expect a small P value for the effect of time, it doesn't tell you much.

• Another P value tests the null hypothesis that the treatment makes no difference, on average. This hypothesis might be worth testing in some situations. But in many situations, you expect no difference at early time points, and only care about differences at late time points. In these situations, testing the average treatment effect may not be so helpful.

• The third P value tests for interaction. The null hypothesis is that any difference between treatments is identical at all time points. But if you collect data at time zero, or at early time points, you don't expect to find any difference then. Your experiment really is designed to ask about later time points. In this situation, you expect an interaction, so finding a small P value for interaction does not help you understand your data. It is even less useful if the difference between treatments gets larger at some time points and then gets smaller at later time points.

What about multiple comparisons tests?

Some scientists like to ask which is the lowest dose (or time) at which the change in response is statistically significant. Multiple comparisons tests can give you the answer, but the answer depends on sample size. Run more subjects, or more doses or time points for each curve, and the answer will change. With a large enough sample size (at each dose or time point), you will find a statistically significant (but biologically trivial) effect with a tiny dose or at a very early time point. With fewer replicates at each dose or time point, you won't see statistical significance until a larger dose or later time point. Since asking for the smallest dose that gives a "significant" effect does not ask a fundamental question about the system, the results may not be helpful.

If you want to know the minimally effective dose, consider finding the minimum dose that causes an effect bigger than some threshold you set based on physiology (or some other scientific context). For example, find the minimum dose that raises the pulse rate by more than 10 beats per minute. That approach can lead to useful answers. Searching for the smallest dose that leads to a "significant" effect does not.
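
The threshold approach can be sketched in a few lines. The data below are invented, the function name minimum_effective_dose is hypothetical, and comparing the mean change against the threshold is an assumption made for illustration (you might prefer to require that a confidence limit exceed the threshold instead).

```python
from statistics import mean

# Threshold from the example in the text: a rise of more than 10 beats per minute.
THRESHOLD_BPM = 10.0

# Hypothetical data: dose -> change in pulse rate (bpm) for each subject.
pulse_change_by_dose = {
    0.1: [1.0, -0.5, 2.0],
    0.3: [4.0, 3.5, 5.5],
    1.0: [9.0, 11.5, 12.5],
    3.0: [15.0, 17.0, 14.0],
}

def minimum_effective_dose(change_by_dose, threshold):
    """Return the smallest dose whose mean change exceeds the threshold,
    or None if no dose does."""
    for dose in sorted(change_by_dose):
        if mean(change_by_dose[dose]) > threshold:
            return dose
    return None

med = minimum_effective_dose(pulse_change_by_dose, THRESHOLD_BPM)
print("Minimum effective dose:", med)
```

Adding more subjects makes the estimated mean at each dose more precise, but it does not push the answer toward ever smaller doses the way a search for "significance" does.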

If you look at all the multiple comparisons tests (and not just ask which is the lowest dose or time point that gives a "significant" effect), you can get results that make no sense. You might find that the difference is statistically significant at time points 3, 5, 6 and 9 but not at time points 1, 2, 4, 7, 8 and 10. How do you interpret that? Knowing at which doses or time points the treatment had a statistically significant effect rarely helps you understand the biology of the system and rarely helps you design new experiments.

What is the alternative to two-way ANOVA?

If you have a repeated measures design, consider using this alternative to ANOVA, which Will G Hopkins calls within-subject modeling.

First, quantify the data for each subject in some biologically meaningful way. Perhaps this would be the area under the curve. Perhaps the peak level. Perhaps the time to peak. Perhaps you can fit a curve with nonlinear regression and determine a rate constant or a slope.

Now take these values (the areas or rate constants...) and compare between groups of subjects using a t test (if two treatments) or one-way ANOVA (if three or more). Unlike two-way ANOVA, this kind of analysis follows the scientific logic of the experiment, and so leads to results that are understandable and can lead you to the next step (designing a better experiment).
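
The two steps above can be sketched as follows. This is a minimal illustration under stated assumptions: the time courses are invented, the summary measure is the area under the curve by the trapezoid rule, and the comparison is an ordinary unpaired t test with pooled variance. The function names are hypothetical, and the sketch stops at the t ratio and degrees of freedom, leaving the P value to a statistics program or table.

```python
from math import sqrt
from statistics import mean, variance

def auc_trapezoid(times, values):
    """Area under one subject's curve by the trapezoid rule."""
    return sum((t1 - t0) * (v0 + v1) / 2
               for t0, t1, v0, v1 in zip(times, times[1:], values, values[1:]))

def unpaired_t(x, y):
    """Unpaired t test with pooled variance. Returns (t ratio, degrees of freedom)."""
    nx, ny = len(x), len(y)
    df = nx + ny - 2
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / df
    t = (mean(x) - mean(y)) / sqrt(pooled_var * (1 / nx + 1 / ny))
    return t, df

# Hypothetical repeated measures data: one list of responses per subject.
times = [0, 1, 2, 4, 8]
control = [[1, 2, 2, 1, 1], [1, 3, 2, 2, 1], [0, 2, 3, 1, 1]]
treated = [[1, 4, 6, 5, 2], [1, 5, 7, 4, 3], [0, 4, 6, 6, 2]]

# Step 1: reduce each subject's time course to a single number.
auc_control = [auc_trapezoid(times, subject) for subject in control]
auc_treated = [auc_trapezoid(times, subject) for subject in treated]

# Step 2: compare those numbers between the groups.
t_ratio, df = unpaired_t(auc_treated, auc_control)
print(f"t = {t_ratio:.2f} with {df} degrees of freedom")
```

Each subject contributes exactly one value to the comparison, so the repeated measurements within a subject are never treated as independent observations.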

If you don't have a repeated measures design, you can still fit a curve for each treatment. Then compare slopes, or EC50s, or lag times as part of the linear or nonlinear regression.
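
For the simplest case, straight lines fit to two treatments, the comparison of slopes can be sketched as below. This is one common textbook approach (a t ratio built from the pooled residual variance of the two fits), offered as an assumption rather than as the method any particular program uses. The data and function names are hypothetical, and comparing EC50s or lag times would need nonlinear regression instead.

```python
from math import sqrt
from statistics import mean

def fit_line(x, y):
    """Least-squares line. Returns (slope, Sxx, residual sum of squares)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    return slope, sxx, ss_res

def compare_slopes(x1, y1, x2, y2):
    """Difference between two slopes, its t ratio, and degrees of freedom.

    Assumes both data sets share the same scatter around their lines,
    so the residual variance is pooled across the two fits.
    """
    b1, sxx1, ss1 = fit_line(x1, y1)
    b2, sxx2, ss2 = fit_line(x2, y2)
    df = len(x1) + len(x2) - 4  # two parameters estimated per line
    pooled_var = (ss1 + ss2) / df
    se_diff = sqrt(pooled_var * (1 / sxx1 + 1 / sxx2))
    return b1 - b2, (b1 - b2) / se_diff, df

# Hypothetical data: a different animal measured at each time point.
time_points = [0, 1, 2, 3, 4, 5]
control = [1.0, 1.6, 1.9, 2.6, 3.1, 3.4]
treated = [1.1, 2.2, 2.8, 4.1, 5.0, 5.9]

slope_diff, t_ratio, df = compare_slopes(time_points, treated,
                                         time_points, control)
print(f"slope difference = {slope_diff:.3f}, t = {t_ratio:.2f}, df = {df}")
```

Unlike two-way ANOVA, this comparison uses the actual values of the time points, so scrambling them would change the answer.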

Think hard about what your scientific goals are, and try to find a way to make the statistical testing match the scientific goals. In many cases, you'll find a better approach than using two-way ANOVA.

One of the choices for multiple comparisons tests following one-way ANOVA is a test for linear trend. This test, of course, does consider the order of the treatments. Other programs (but not Prism) offer polynomial post tests, which also take into account the treatment order.