The GraphPad Guide to
Part 2. Interpreting nonlinear regression results

This document contains a chapter from the manual for GraphPad Prism version 2. Although some sections are directed specifically to Prism users, most of the material is more general and will be helpful to everyone who fits curves with nonlinear regression, no matter what program they use.

This document is copyright © 1995 by GraphPad Software Inc. and was written by Dr. Harvey Motulsky, President of GraphPad Software, Inc. (hmotulsky@graphpad.com).

Assumptions of nonlinear regression

The results of nonlinear regression are meaningful only if these assumptions are true (or nearly true):

Variables, standard errors, and confidence intervals

Along with the best-fit value of each variable in the equation, Prism reports its standard error and 95% confidence interval (CI). By itself, the standard error is difficult to interpret. It is used to calculate the 95% confidence interval, which is easier to interpret.

This is what the CI is supposed to mean: If all the assumptions of nonlinear regression are true, there is a 95% chance that the true value of the variable lies within the interval. More precisely, if you perform nonlinear regression many times (on different data sets), you expect the confidence interval to include the true value 95% of the time and to exclude it the other 5% of the time.

Three factors can make the confidence interval too narrow:

Because of these problems, you shouldn't interpret the confidence intervals too rigorously. Rather than focusing on the CI reported from analysis of this one experiment, you should repeat the experiment several times.

Sum-of-squares, sy.x, and R squared

The sum-of-squares (SS) is the sum of the squares of the vertical distances of the points from the curve. Nonlinear regression works by varying the values of the variables to minimize the sum-of-squares.
The sum-of-squares is expressed in the square of the units used for the Y values.

The value sy.x is the standard deviation of the vertical distances of the points from the curve. Since the distances of the points from the curve are called residuals, sy.x is the standard deviation of the residuals. Its value is expressed in the same units as Y.

The value R² is a measure of goodness of fit. It is a fraction between 0.0 and 1.0, and has no units. When R² equals 0.0, the best-fit curve fits the data no better than a horizontal line going through the mean of all Y values. In this case, knowing X does not help you predict Y. When R² equals 1.0, all points lie exactly on the curve with no scatter. If you know X, you can calculate Y exactly. You can think of R² as the fraction of the total variance of Y that is explained by the model (equation). Mathematically, it is defined by this equation:

$$R^2 = 1.0 - \frac{SS}{SS_{tot}}$$

where SStot is the sum of the squares of the vertical distances of the points from a horizontal line through the mean of all Y values.

Residuals and the runs test

A residual is the distance of a point from the curve. A residual is positive when the point is above the curve, and is negative when the point is below the curve. The residual table has the same X values as the original data, but each Y value is replaced by the vertical distance of the point from the curve. If you selected the residuals output option, Prism creates a graph of the residuals. An example is shown below.

If you look carefully at the curve on the left, you'll see that the data points are not randomly distributed above and below the curve. There are clusters of points all above or all below. This is much easier to see on the graph of the residuals on the right. The points are not randomly scattered above and below the X axis.
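The residuals, sum-of-squares, sy.x, and R² described above can be computed directly from the observed Y values and the curve's Y values at the same X positions. The following is a minimal sketch, not Prism's implementation. The function name `fit_summary` and the data are hypothetical. It assumes that sy.x divides the sum-of-squares by the number of points minus the number of fitted variables, and that R² is normalized by the sum-of-squares around the mean of all Y values.

```python
import math

def fit_summary(y_obs, y_fit, n_variables):
    """Return (residuals, SS, sy.x, R squared) for observed Y values and curve values."""
    # A residual is positive when the point lies above the curve.
    residuals = [obs - fit for obs, fit in zip(y_obs, y_fit)]
    ss = sum(r * r for r in residuals)  # units: Y squared
    # Assumption: degrees of freedom = number of points - number of fitted variables.
    df = len(y_obs) - n_variables
    sy_x = math.sqrt(ss / df)  # units: same as Y
    # Sum-of-squares around a horizontal line through the mean of all Y values.
    mean_y = sum(y_obs) / len(y_obs)
    ss_tot = sum((obs - mean_y) ** 2 for obs in y_obs)
    r_squared = 1.0 - ss / ss_tot
    return residuals, ss, sy_x, r_squared

# Hypothetical observed values and curve values at the same X positions.
y_obs = [2.0, 4.0, 6.0, 8.0]
y_fit = [3.0, 4.0, 5.0, 8.0]
residuals, ss, sy_x, r_squared = fit_summary(y_obs, y_fit, n_variables=2)
print(residuals, ss, sy_x, r_squared)
```

Plotting `residuals` against X gives the kind of residual graph described above.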
The runs test determines whether your data differ significantly from the equation you selected. A run is a series of consecutive points that are either all above or all below the regression curve. Another way of saying this is that a run is a series of points whose residuals are either all positive or all negative.

If the data points are randomly distributed above and below the regression curve, it is possible to calculate the expected number of runs. If there are fewer runs than expected, it may mean that the regression model is wrong. The P value from the runs test answers this question: If the data really follow the linear or nonlinear equation used to create the line or curve, what is the chance of obtaining as few runs as observed in this experiment (or fewer)? If the P value is small, you'd be inclined to conclude that the data really don't follow the model.

The P values are always one-tail, asking about the probability of observing as few runs as observed (or fewer). If you observe more runs than expected, the P value will be higher than 0.50.

If the runs test reports a low P value, you should suspect that the data don't really follow the equation you have selected. In the example above, the equation does not adequately match the data. There are only six runs, and the P value for the runs test is tiny. This means that the data systematically deviate from the curve. Most likely, the data were fit to the wrong equation.

How to tell if the nonlinear regression fit is any good

Before accepting the results that Prism (or any curve fitting program) gives you, ask yourself the following questions:

Did the fit converge on a solution?

Nonlinear regression stops its iterations when it can't improve the fit by adjusting the values of any of the variables. At that point, the program is said to have converged on the best fit. In some cases, the program gets stuck. It doesn't know whether the fit would improve by increasing or decreasing the value of a variable.
When this happens, the program stops and says that it was unable to converge on a solution. No results are reported.

Does the curve come close to the points?

In rare cases, the fit may be far from the data points. This may happen, for example, if you picked the wrong equation. Look at the graph to make sure this didn't happen.

Also look at the R² value (defined above). It is the fraction of the overall variance in Y that is "explained" by the model. If R² is low, the curve does not come close to the points. If R² is high, you can conclude that the curve comes closer to the points than would a horizontal line through the mean Y value. But don't overinterpret a high R². It does not mean that you have chosen the equation that best describes the data. It also does not mean that the fit is unique: other values of the variables may generate a curve that fits just as well.

Are the results scientifically plausible?

Prism fits curves and displays the results. It is up to you to figure out what they mean. Before accepting the results, ask yourself whether they make any sense.

The mathematics of curve fitting sometimes yields results that make no scientific sense. For example, with noisy or incomplete data, Prism can calculate negative rate constants, fractions greater than 1.0, and negative Kd values. It's up to you to realize that these are nonsense. If the results make no scientific sense, you should conclude that the fit is no good, regardless of R² and regardless of how close the curve comes to the points. Try a simpler equation, or try fixing some variables to constant values.

Also check that the best-fit values of the variables make sense in light of the range of the data. Don't trust the results if the top plateau of a sigmoid curve is far larger than the highest data point. Don't trust the results if an EC50 value is not within the range of your X values.

Do the data systematically deviate from the curve?
If the data really follow the model described by your equation, the data points should randomly bounce above and below the curve. The distance of the points from the curve should also be random, and not be related to the value of X.

The best way to look for systematic deviations of the points from the curve is to inspect a graph of the residuals and to look at the runs test (discussed above). With a good fit, the residuals should be randomly distributed between positive and negative values and the P value from the runs test will be high. If the runs test reports a low P value, you should suspect that the data don't really follow the equation you have selected.

Are the confidence intervals wide?

Prism reports the standard error of each variable, and its 95% confidence interval. You can be approximately 95% sure that the true value of the variable lies within the confidence interval.

The confidence interval will be very wide (i.e. the standard error will be very large) when the fit is not unique. This means that curves generated from other values of the variables would fit nearly as well. Confidence intervals are wide in these circumstances:

Is the fit a local minimum?

The nonlinear regression procedure adjusts the variables in small steps in order to improve the goodness-of-fit. If Prism converges on an answer, you can be sure that altering any of the variables a little bit will make the fit worse. But it is theoretically possible that large changes in the variables might lead to much better goodness-of-fit. Thus the curve that Prism decides is the "best" may really not be the best.

Think of latitude and longitude as representing two variables Prism is trying to fit. Now think of altitude as the sum-of-squares. Nonlinear regression works iteratively to reduce the sum-of-squares. This is like walking downhill to find the bottom of the valley. When nonlinear regression has converged, changing any variable increases the sum-of-squares.
When you are at the bottom of the valley, every direction leads uphill. But there may be a much deeper valley over the ridge that you are unaware of. In nonlinear regression, large changes in variables might decrease the sum-of-squares.

This problem (called finding a local minimum) is intrinsic to nonlinear regression, no matter what program you use. You will rarely encounter a local minimum if your data have little scatter, you collected data over an appropriate range of X values, and you have chosen an appropriate equation.

To continue the analogy, the confidence intervals for the variables are very wide when the bottom of the valley is very flat. You can walk a great distance without changing elevation. You can change the values of the variables a great deal without changing the goodness-of-fit.

To test for the presence of a false minimum:
What to do when the fit is no good?

The last section explained how to identify a bad fit. Briefly, a fit is bad when:
If you encounter any of these situations, here are some things to try.
Comparing two equations

Sometimes you don't know which of two equations is more appropriate for your data. You want to fit both equations, and let the program compare the results. For example, you might want to fit a competitive binding curve to models with both one and two binding sites. Or you might want to fit a dissociation kinetics curve to exponential decay equations with both one and two phases.

Goodness of fit is quantified by the sum-of-squares. Therefore you might imagine that you can simply define the "best" equation as the one that gives the smaller sum-of-squares. That rule makes sense when both equations have the same number of variables. Most often, however, you wish to compare equations with different numbers of variables.

If the more complicated equation fits worse than the simpler equation, then you should clearly stick with the simpler equation. However, the curve generated by the more complicated equation (the one with more variables) will nearly always come closer to the points because it has more inflection points (it wiggles more). The question is whether this decrease in sum-of-squares is worth the "cost" of the additional variables (loss of degrees of freedom).

The F test addresses this question by calculating a P value that answers this question: If the simpler model is really correct, what is the chance that you'd randomly obtain data that fits the more complicated model so much better? If the P value is low, you conclude that the more complicated model is significantly better than the simpler model.

The results of the F test are only strictly valid when the simpler equation is a special case of the more complicated equation. For example, you can compare a one-site vs. two-site binding curve.

How the F test works

First fit the more complicated model (Model 2) and calculate its goodness-of-fit as the sum-of-squares. Now fit the simpler model (Model 1).
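Even if this simpler model is correct, you expect it to fit worse (have the higher sum-of-squares) because it has fewer inflection points (more degrees of freedom). In fact, statisticians can prove that the relative increase in the sum-of-squares is expected to equal the relative increase in degrees of freedom. In other words, if the simpler model is correct you expect that:

$$\frac{SS_1 - SS_2}{SS_2} \approx \frac{DF_1 - DF_2}{DF_2}$$

Here SS and DF are the sum-of-squares and degrees of freedom, and the subscripts refer to Model 1 (simpler) and Model 2 (more complicated).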
If the more complicated model is correct, then you expect the relative increase in sum-of-squares (going from complicated to simple model) to be greater than the relative increase in degrees of freedom:

$$\frac{SS_1 - SS_2}{SS_2} > \frac{DF_1 - DF_2}{DF_2}$$
The F ratio quantifies the relationship between the relative increase in sum-of-squares and the relative increase in degrees of freedom.

$$F = \frac{(SS_1 - SS_2)/SS_2}{(DF_1 - DF_2)/DF_2}$$
If the simpler model is correct you expect to get an F ratio near 1.0. If the ratio is much greater than 1.0, there are two possibilities:

The P value answers this question: If model 1 is really correct, what is the chance that you'd randomly obtain data that fits model 2 so much better? If the P value is low, you conclude that model 2 is significantly better than model 1.

The equation is usually presented in this more conventional form:

$$F = \frac{(SS_1 - SS_2)/(DF_1 - DF_2)}{SS_2/DF_2}$$
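The F ratio and its P value can be computed from the two sums-of-squares and their degrees of freedom. The sketch below is not Prism's code. It uses only the Python standard library and evaluates the F distribution through the regularized incomplete beta function with a continued fraction, a standard numerical approach that this chapter does not describe. The function names are hypothetical. The sums-of-squares are hypothetical values scaled so that the simpler model's sum-of-squares is 91.1% larger than the more complicated model's, matching the percentages quoted in the example later in this section; the degrees of freedom (12 and 10) are taken from that example.

```python
import math

def _beta_cf(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the recurrence.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the recurrence.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h

def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log(1.0 - x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b

def f_test(ss1, df1, ss2, df2):
    """Compare Model 1 (simpler) with Model 2 (more complicated); return (F, P)."""
    f_ratio = ((ss1 - ss2) / ss2) / ((df1 - df2) / df2)
    if f_ratio <= 0.0:
        return f_ratio, 1.0  # the more complicated model did not fit better
    d1, d2 = df1 - df2, df2
    # Upper-tail probability of an F distribution with (d1, d2) degrees of freedom.
    p_value = reg_inc_beta(d2 / 2.0, d1 / 2.0, d2 / (d2 + d1 * f_ratio))
    return f_ratio, p_value

# Hypothetical, scaled sums-of-squares: Model 1 is 91.1% larger than Model 2.
f_ratio, p_value = f_test(ss1=1.911, df1=12, ss2=1.0, df2=10)
print(round(f_ratio, 2), round(p_value, 3))
```

Only the ratio of the two sums-of-squares matters for F, which is why scaled values are enough to illustrate the calculation.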
If you are extremely familiar with analysis of variance, you'll appreciate that the F ratio is determined from this analysis of variance table:
Example
This graph compares a one-site and two-site competitive binding curve. The results are shown here:
In going from the two-site to the one-site model, we gained two degrees of freedom because the one-site model has two fewer variables. Since the two-site model has 10 degrees of freedom (15 data points minus 5 variables), the degrees of freedom increased 20%. If the one-site model were correct, you'd expect the sum-of-squares to also increase about 20% just by chance. In fact the sum-of-squares increased 91%. The percent increase was 4.56 times higher than expected (91.1/20.0=4.56). This is the F ratio (F=4.56), and it corresponds to a P value of 0.039. If the one-site model is correct, there is only a 3.9% chance that you'd randomly obtain data that fits the two-site model so much better. Since this is below the traditional threshold of 5%, you'd probably conclude that the two-site model fits significantly better than the one-site model.

Here is how Prism reports the comparison:

Comparing fits to two data sets

The previous section discussed how to compare the fits of two different equations to one set of data. Here we discuss how to compare the fit of one equation to two different sets of data, for example comparing fits to data from control and treated preparations. Although this is a common situation, there is no clear consensus for how to compare fits to different groups. Three approaches are discussed below.

1. Compare the results of repeated experiments.

If you repeat the experiment several times, you can compare the best-fit value of a variable in control and treated preparations using a paired t test (or the analogous Wilcoxon nonparametric test). For example, here are the results of a competitive binding curve performed in two groups of cells. The table shows the logKi values.
Compare the results with a paired t test in Prism (or InStat). The t ratio is 16.7, and the P value is 0.0036 (two-tail). If the treatment did not alter the logKi, there is only a 0.36% chance that you'd observe such a large difference (or larger) between logKi values by chance.

Notes:

2. Compare the results within one experiment. Simple approach.

When Prism reports the best-fit value of each variable for each data set, it also reports the standard error of the estimates. You can compare the best-fit values between two data sets using a t test.

For example, control and treated data were fit to a competitive binding curve. The control LogEC50 was -6.08 with a standard error of 0.0677. The treated LogEC50 was -6.29 with a standard error of 0.617. We want to compare the two LogEC50 values with a t test. The only trick is deciding what value to enter for N. Follow this logic:

Prism can perform the t test. The resulting P value is 0.0309. If the treatment really didn't alter the EC50, there is only a 3.09% chance that you'd observe a difference this large (or larger) by coincidence. Since the P value is so low, you conclude that the two EC50 values are statistically significantly different.

Notes:

3. Compare the results within one experiment. More complicated approach.

The method of the previous section only compared the value of the logEC50. This section describes a more general method that compares entire curves to ask whether the data sets differ at all. The idea is to first fit the two curves separately, and then combine the values and fit one curve to all the data. Follow these steps:
Notes:

Advantages and disadvantages of the three methods

If you have repeated the experiment several times, I recommend that you use the first method. There are two advantages:

The disadvantage of the first method is that you are throwing away information. The calculations are based only on the best-fit value from each experiment, and ignore the SE of those values presented by the curve fitting program.

If you have performed the experiment only once, then you probably ought to repeat the experiment. Regardless of what statistical results you obtain, you shouldn't trust results from a single experiment.

If you want to compare results in a single experiment, you can use method 2 or method 3. The advantage of method 2 is that it focuses your thinking on a single variable. Generally, you care mostly about one variable (e.g. a rate constant or EC50), and care less about the others. Method 2 compares the variable of interest.

Method 3 is the most general method. Since the method compares the entire curve, it does not force you to decide which variable(s) you wish to compare. This is both its advantage and disadvantage.