
The GraphPad Guide to
Nonlinear Regression


Part 2. Interpreting nonlinear regression results


This document contains a chapter from the manual for GraphPad Prism version 2. Although some sections are directed specifically to Prism users, most of the material is more general and will be helpful to everyone who fits curves with nonlinear regression, no matter what program they use.

This document is copyright © 1995 by GraphPad Software Inc. and was written by Dr. Harvey Motulsky, President of GraphPad Software, Inc. (hmotulsky@graphpad.com).


Assumptions of nonlinear regression

The results of nonlinear regression are meaningful only if these assumptions are true (or nearly true):

  • The model is correct. Nonlinear regression adjusts the variables in the equation you chose to minimize the sum-of-squares. It does not attempt to find a better equation.
  • The variability of values around the curve follows a Gaussian distribution. Even though no biological variable follows a Gaussian distribution exactly, it is sufficient that the variation be approximately Gaussian.
  • The SD of the variability is the same everywhere, regardless of the value of X. This assumption is termed homoscedasticity. If the SD is not constant but rather is proportional to the value of Y, you should weight the data to minimize the sum-of-squares of the relative distances.
  • The model assumes that you know X exactly. This is rarely the case, but it is sufficient to assume that any imprecision in measuring X is very small compared to the variability in Y.
  • The errors are independent. The deviation of each value from the curve should be random, and should not be correlated with the deviation of the previous or next point. If there is any carryover from one sample to the next, this assumption will be violated.

    Variables, standard errors, and confidence intervals

    Along with the best-fit value of each variable in the equation, Prism reports its standard error and 95% confidence interval.

    By itself, the standard error is difficult to interpret. It is used to calculate the 95% confidence interval, which is easier to interpret.

    This is what the CI is supposed to mean: If all the assumptions of nonlinear regression are true, there is a 95% chance that the true value of the variable lies within the interval. More precisely, if you perform nonlinear regression many times (on different data sets) you expect the confidence interval to include the true value 95% of the time, but to exclude the true value the other 5% of the time.

    Three factors can make the confidence interval too narrow:

  • The CI is based only on the scatter of data points around the curve within this one experiment. If you repeat the experiment many times, the scatter between the results is likely to be greater than predicted from the CI based on one experiment.
  • The CI can only be interpreted if you accept the assumptions of nonlinear regression.
  • The confidence intervals from linear regression are calculated using straightforward mathematical methods. If you accept the assumptions of linear regression, then you can interpret the 95% CI of slope and intercept quite rigorously. It is not straightforward to calculate the 95% CI of variables from nonlinear regression - mathematical shortcuts are needed. These shortcut intervals (the ones reported by Prism) are sometimes referred to as asymptotic confidence intervals. In some cases these intervals can be too narrow (too optimistic).

    Because of these problems, you shouldn't interpret the confidence intervals too rigorously. Rather than focusing on the CI reported from analysis of this one experiment, you should repeat the experiment several times.
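    Asymptotic intervals of this kind are usually computed as the best-fit value plus or minus a critical value of the t distribution times the standard error, with degrees of freedom equal to the number of data points minus the number of variables fit. The Python sketch below shows that arithmetic under this assumption. The function names are hypothetical, and the incomplete beta helper uses the continued-fraction method described in Numerical Recipes, not anything taken from Prism. The example inputs (logEC50 = -6.08, SE = 0.0677, 12 df) are the control values from the two-data-set example later in this chapter.

```python
import math

def _betacf(a, b, x, tiny=1e-300):
    # Continued fraction for the incomplete beta function (modified Lentz method)
    c, d = 1.0, 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, 500):
        for aa in (m * (b - m) * x / ((a + 2 * m - 1) * (a + 2 * m)),
                   -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < 1e-15:
            break
    return h

def betai(a, b, x):
    # Regularized incomplete beta function I_x(a, b)
    if x <= 0.0 or x >= 1.0:
        return 0.0 if x <= 0.0 else 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b

def t_two_tailed_p(t, df):
    # P(|T| >= |t|) for Student's t distribution with df degrees of freedom
    return betai(df / 2.0, 0.5, df / (df + t * t))

def t_critical(df, level=0.95):
    # Bisection: find the t value whose two-tailed P equals 1 - level
    lo, hi = 0.0, 1e4
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if t_two_tailed_p(mid, df) > 1.0 - level:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def asymptotic_ci(best_fit, se, df, level=0.95):
    # Symmetrical interval: best-fit value +/- t* x SE
    half_width = t_critical(df, level) * se
    return best_fit - half_width, best_fit + half_width

low, high = asymptotic_ci(-6.08, 0.0677, 12)
print(f"95% CI: {low:.3f} to {high:.3f}")
```

    With 12 df the critical t value is about 2.18, so the interval runs from about -6.23 to -5.93. Note that the interval is symmetrical by construction, which is one reason it can be misleading when the true uncertainty is not.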

    Sum-of-squares, sy.x, and R squared

    The sum-of-squares (SS) is the sum of the squares of the vertical distances of the points from the curve. Nonlinear regression works by varying the values of the variables to minimize the sum-of-squares. It is expressed in the square of the units used for the Y values.

    The value sy.x is the standard deviation of the vertical distances of the points from the curve. Since the distances of the points from the curve are called residuals, sy.x is the standard deviation of the residuals. It is calculated as the square root of SS divided by the degrees of freedom (the number of data points minus the number of variables fit), and its value is expressed in the same units as Y.

    The value R2 is a measure of goodness of fit. It has no units and is usually a fraction between 0.0 and 1.0. When R2 equals 0.0, the best-fit curve fits the data no better than a horizontal line going through the mean of all Y values. In this case, knowing X does not help you predict Y. (With nonlinear regression, R2 can even be negative if the curve fits worse than that horizontal line.) When R2=1.0, all points lie exactly on the curve with no scatter. If you know X you can calculate Y exactly.

    You can think of R2 as the fraction of the total variance of Y that is explained by the model (equation). Mathematically, it is defined by the equation below, where SStot is the sum of the squares of the vertical distances of the points from a horizontal line through the mean of all Y values:

    $$R^2 = 1.0 - \frac{SS}{SS_{tot}}$$
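    The three quantities can be computed directly from the data and the fitted curve. This is a minimal Python sketch, assuming you already have the Y values predicted by the best-fit curve at each X. The function name fit_summary and the example numbers are made up for illustration.

```python
import math

def fit_summary(ys, y_fit, n_params):
    """Return (SS, sy.x, R2) from observed Y values and best-fit curve values."""
    residuals = [y - f for y, f in zip(ys, y_fit)]
    ss = sum(r * r for r in residuals)           # in units of Y squared
    df = len(ys) - n_params                       # data points minus variables fit
    sy_x = math.sqrt(ss / df)                     # in the same units as Y
    mean_y = sum(ys) / len(ys)
    ss_tot = sum((y - mean_y) ** 2 for y in ys)   # scatter around a horizontal line at the mean
    r2 = 1.0 - ss / ss_tot
    return ss, sy_x, r2

ss, sy_x, r2 = fit_summary([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8], n_params=2)
print(f"SS = {ss:.3f}, sy.x = {sy_x:.3f}, R2 = {r2:.3f}")
```

    For these made-up values SS is 0.10, sy.x is about 0.22 and R2 is 0.98.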

    Residuals and the runs test

    A residual is the distance of a point from the curve. A residual is positive when the point is above the curve, and is negative when the point is below the curve. The residual table has the same X values as the original data, but each Y value is replaced by the vertical distance of the point from the curve.

    If you selected the residuals output option, Prism creates a graph of the residuals. An example is shown below. If you look carefully at the curve on the left, you'll see that the data points are not randomly distributed above and below the curve. There are clusters of points all above or all below. This is much easier to see on the graph of the residuals on the right. The points are not randomly scattered above and below the X axis.

    The runs test determines whether your data differ significantly from the equation you selected. A run is a series of consecutive points that are either all above or all below the regression curve. Another way of saying this is that a run is a series of points whose residuals are either all positive or all negative.

    If the data points are randomly distributed above and below the regression curve, it is possible to calculate the expected number of runs. If there are fewer runs than expected, it may mean that the regression model is wrong. The P value from the runs test answers this question: If the data really follow the linear or nonlinear equation used to create the line or curve, what is the chance of obtaining as few (or fewer) runs as observed in this experiment? If the P value is small, you'd be inclined to conclude that the data really don't follow the model.

    The P values are always one-tail, asking about the probability of observing as few runs as observed (or fewer). If you observe more runs than expected, the P value will be higher than 0.50.

    If the runs test reports a low P value, you should suspect that the data don't really follow the equation you have selected.

    In the example above, the equation does not adequately match the data. There are only six runs, and the P value for the runs test is tiny. This means that the data systematically deviate from the curve. Most likely, the data were fit to the wrong equation.
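    The number of runs and its one-tailed P value can be computed exactly from the signs of the residuals. The Python sketch below uses the classic Wald-Wolfowitz counting formulas for the probability of exactly r runs, given n1 positive and n2 negative residuals. It illustrates the idea under those assumptions and is not Prism's own code, which may differ in details. Dropping residuals that are exactly zero is also an assumption of this sketch, and the function name runs_test is hypothetical.

```python
from math import comb

def _c(n, k):
    # Binomial coefficient that is 0 outside the valid range
    return comb(n, k) if 0 <= k <= n else 0

def runs_test(residuals):
    """Return (runs, expected_runs, one_tailed_p) for a sequence of residuals."""
    signs = [r > 0 for r in residuals if r != 0]   # assumption: zero residuals are ignored
    n1, n2 = signs.count(True), signs.count(False)
    if n1 == 0 or n2 == 0:
        raise ValueError("need both positive and negative residuals")
    runs = 1 + sum(1 for s, t in zip(signs, signs[1:]) if s != t)
    expected = 1.0 + 2.0 * n1 * n2 / (n1 + n2)
    total = comb(n1 + n2, n1)

    def prob_exactly(r):
        # Probability of exactly r runs if the signs are in random order
        if r % 2 == 0:
            k = r // 2
            return 2 * _c(n1 - 1, k - 1) * _c(n2 - 1, k - 1) / total
        k = (r - 1) // 2
        return (_c(n1 - 1, k) * _c(n2 - 1, k - 1)
                + _c(n1 - 1, k - 1) * _c(n2 - 1, k)) / total

    # One-tailed: chance of observing this many runs or fewer
    p = sum(prob_exactly(r) for r in range(2, runs + 1))
    return runs, expected, min(p, 1.0)

# Made-up residuals that cluster above and below the curve
print(runs_test([0.5, 0.8, 0.3, -0.2, -0.6, -0.4, 0.2, 0.7, 0.1, -0.3, -0.5, -0.1]))
```

    For these made-up residuals there are 4 runs where 7 are expected, and P = 62/924, or about 0.067.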

    How to tell if the nonlinear regression fit is any good

    Before accepting the results that Prism (or any curve fitting program) gives you, ask yourself the following questions:

    Did the fit converge on a solution?

    Nonlinear regression stops its iterations when it can't improve the fit by adjusting the values of any of the variables. At that point, the program is said to have converged on the best fit. In some cases, the program gets stuck. It doesn't know whether the fit would improve by increasing or decreasing the value of a variable. When this happens, the program stops and says that it was unable to converge on a solution. No results are reported.

    Does the curve come close to the points?

    In rare cases, the fit may be far from the data points. This may happen, for example, if you picked the wrong equation. Look at the graph to make sure this didn't happen.

    Also look at the R2 value (defined above). It is the fraction of the overall variance in Y that is "explained" by the model. If R2 is low, the curve does not come close to the points. If R2 is high, you can conclude that the curve comes closer to the points than would a horizontal line through the mean Y value. But don't overinterpret a high R2. It does not mean that you have chosen the equation that best describes the data. It also does not mean that the fit is unique - other values of the variables may generate a curve that fits just as well.

    Are the results scientifically plausible?

    Prism fits curves and displays the results. It is up to you to figure out what they mean. Before accepting the results, ask yourself if the results make any sense.

    The mathematics of curve fitting sometimes yields results that make no scientific sense. For example, with noisy or incomplete data, Prism can calculate negative rate constants, fractions greater than 1.0, and negative Kd values. It's up to you to realize that these are nonsense.

    If the results make no scientific sense, you should conclude that the fit is no good, regardless of R2 and regardless of how close the curve comes to the points. Try a simpler equation, or try fixing some variables to constant values.

    Also check that the best-fit values of the variables make sense in light of the range of the data. Don't trust the results if the top plateau of a sigmoid curve is far larger than the highest data point. Don't trust the results if an EC50 value is not within the range of your X values.
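    Some of these checks are easy to automate. The following is a hypothetical Python helper, assuming X values are log concentrations (so a logEC50 can be compared with them directly) and Y values are positive. The factor used to decide that a plateau is "far larger" than the highest data point (1.5 here) is an arbitrary assumption, not a GraphPad rule, and an empty result does not prove that a fit is good.

```python
def plausibility_warnings(xs, ys, top=None, log_ec50=None,
                          rate_constants=(), fractions=(), kds=(),
                          top_factor=1.5):
    """Return reasons to distrust a fit (an empty list means no red flags were found).

    Assumes xs are log concentrations and ys are positive responses.
    top_factor is an illustrative threshold for "far larger", chosen arbitrarily.
    """
    warnings = []
    warnings += [f"negative rate constant: {k}" for k in rate_constants if k < 0]
    warnings += [f"fraction outside 0-1: {f}" for f in fractions if not 0.0 <= f <= 1.0]
    warnings += [f"negative Kd: {kd}" for kd in kds if kd < 0]
    if top is not None and top > top_factor * max(ys):
        warnings.append(f"top plateau {top} is far above the highest data point {max(ys)}")
    if log_ec50 is not None and not min(xs) <= log_ec50 <= max(xs):
        warnings.append(f"logEC50 {log_ec50} lies outside the X range {min(xs)} to {max(xs)}")
    return warnings

xs = [-9.0, -8.0, -7.0, -6.0, -5.0]   # made-up log concentrations
ys = [2.0, 10.0, 50.0, 90.0, 98.0]    # made-up responses
for message in plausibility_warnings(xs, ys, top=300.0, log_ec50=-4.0, rate_constants=[-0.2]):
    print(message)
```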

    Do the data systematically deviate from the curve?

    If the data really follow the model described by your equation, the data points should randomly bounce above and below the curve. The distance of the points from the curve should also be random, and not be related to the value of X.

    The best way to look for systematic deviations of the points from the curve is to inspect a graph of the residuals and to look at the runs test (discussed above). With a good fit, the residuals should be randomly distributed between positive and negative values and the P value from the runs test will be high.

    If the runs test reports a low P value, you should suspect that the data don't really follow the equation you have selected.

    Are the confidence intervals wide?

    Prism reports the standard error of each variable, and its 95% confidence interval. You can be approximately 95% sure that the true value of the variable lies within the confidence interval.

    The confidence interval will be very wide (i.e. the standard error will be very large) when the fit is not unique. This means that curves generated from other values of the variables would fit nearly as well.

    Confidence intervals are wide in these circumstances:

  • You have not collected data over a wide enough range of X values. For example, when fitting to a sigmoidal dose-response curve, you need to collect data in both plateau regions.
  • You have not collected data in an important part of the curve. When fitting to a sigmoidal curve, for example, you must collect data near the middle of the curve.
  • The data are very scattered.
  • The equation contains redundant variables. For example, the confidence intervals would be very wide if you fit this equation: Y = A + B + C*X. This equation describes a line, but the intercept is defined by the sum of A plus B. There is no way for the program to know how to apportion the value between A and B, so both will have very wide confidence intervals.

    Is the fit a local minimum?

    The nonlinear regression procedure adjusts the variables in small steps in order to improve the goodness-of-fit. If Prism converges on an answer, you can be sure that altering any of the variables a little bit will make the fit worse. But it is theoretically possible that large changes in the variables might lead to much better goodness-of-fit. Thus the curve that Prism decides is the "best" may really not be the best.

    Think of latitude and longitude as representing two variables Prism is trying to fit. Now think of altitude as the sum-of-squares. Nonlinear regression works iteratively to reduce the sum-of-squares. This is like walking downhill to find the bottom of the valley. When nonlinear regression has converged, changing any variable increases the sum-of-squares. When you are at the bottom of the valley, every direction leads uphill. But there may be a much deeper valley over the ridge that you are unaware of. In nonlinear regression, large changes in variables might decrease the sum-of-squares.

    This problem (called finding a local minimum) is intrinsic to nonlinear regression, no matter what program you use. You will rarely encounter a local minimum if your data have little scatter, you collected data over an appropriate range of X values, and you have chosen an appropriate equation.

    To continue the analogy, the confidence intervals for the variables are very wide when the bottom of the valley is very flat. You can walk a great distance without changing elevation. You can change the values of the variables a great deal without changing the goodness-of-fit.

    To test for the presence of a false minimum:

    1. Note the values of the variables and the sum-of-squares from the first fit.
    2. Make a large change to the initial values of one or more variables and run the fit again.
    3. Repeat step 2 several times.
    4. Ideally, Prism will report nearly the same sum-of-squares and same variables regardless of the initial values. If the values are different, accept the ones with the lowest sum-of-squares.
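    The procedure above can be sketched in Python. To stay self-contained, the sketch fits a hypothetical one-phase exponential decay, Y = A*exp(-k*X), with a simple damped Gauss-Newton iteration (each step is halved until the sum-of-squares goes down). That iteration is a stand-in for whatever fitting algorithm your program actually uses, the function names are hypothetical, and the data are synthetic.

```python
import math

def decay(x, a, k):
    # Hypothetical example model: one-phase exponential decay
    return a * math.exp(-k * x)

def sum_of_squares(xs, ys, a, k):
    try:
        return sum((y - decay(x, a, k)) ** 2 for x, y in zip(xs, ys))
    except OverflowError:
        return math.inf  # variables so extreme that the curve overflows

def fit_decay(xs, ys, a, k, max_iter=500, tol=1e-10):
    """Damped Gauss-Newton fit; returns (A, k, SS), or None if it fails to converge."""
    ss = sum_of_squares(xs, ys, a, k)
    if not math.isfinite(ss):
        return None
    for _ in range(max_iter):
        # Normal equations (J^T J) step = J^T r for the two variables A and k
        jaa = jak = jkk = ga = gk = 0.0
        for x, y in zip(xs, ys):
            e = math.exp(-k * x)
            r = y - a * e             # residual at this point
            da, dk = e, -a * x * e    # partial derivatives of Y with respect to A and k
            jaa += da * da
            jak += da * dk
            jkk += dk * dk
            ga += da * r
            gk += dk * r
        det = jaa * jkk - jak * jak
        if not math.isfinite(det) or det <= 1e-300:
            return None  # the program cannot tell which way to move the variables
        step_a = (jkk * ga - jak * gk) / det
        step_k = (jaa * gk - jak * ga) / det
        if abs(step_a) <= tol * (abs(a) + tol) and abs(step_k) <= tol * (abs(k) + tol):
            return a, k, ss  # converged: further steps change nothing
        lam = 1.0
        while True:  # halve the step until the sum-of-squares actually decreases
            new_a, new_k = a + lam * step_a, k + lam * step_k
            new_ss = sum_of_squares(xs, ys, new_a, new_k)
            if new_ss < ss:
                break
            lam /= 2.0
            if lam < 1e-12:
                return a, k, ss  # no downhill step left: a (possibly local) minimum
        a, k, ss = new_a, new_k, new_ss
    return None  # gave up after max_iter iterations

xs = [float(x) for x in range(10)]
scatter = [0.10, -0.20, 0.15, -0.05, 0.10, -0.10, 0.05, 0.00, -0.05, 0.05]  # made-up noise
ys = [10.0 * math.exp(-0.5 * x) + s for x, s in zip(xs, scatter)]

starts = [(8.0, 0.4), (1.0, 0.1), (30.0, 2.0)]          # steps 1-3: several initial values
fits = [fit_decay(xs, ys, a0, k0) for a0, k0 in starts]
for (a0, k0), result in zip(starts, fits):
    print(f"start A={a0}, k={k0}: {result}")
best = min((f for f in fits if f is not None), key=lambda f: f[2])  # step 4: lowest SS wins
print("accepted:", best)
```

    If the runs that converge all report nearly the same A, k and sum-of-squares, a false minimum is unlikely. If they disagree, the one with the lowest sum-of-squares is the one to keep.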

    What to do when the fit is no good?

    The last section explained how to identify a bad fit. Briefly, a fit is bad when:

    • The fit did not converge.
    • The results make no sense.
    • The confidence intervals are wide.

    If you encounter any of these situations, here are some things to try.

    Potential problem | Solution
    The equation simply does not describe the data. | Try a different equation.
    The initial values are too far from their correct values. | Enter different initial values. If you are using a user-defined equation, check the rules for initial values.
    The range of X values is too narrow to define the curve completely. | If possible, collect more data. Otherwise, hold one of the variables to a constant value.
    You have not collected enough data in a critical range of X values. | Collect more data in the important regions.
    Your data are very scattered and don't really define a curve. | Try to collect less scattered data. If you are combining several experiments, normalize the data for each experiment to an internal control.
    The equation includes more than one component, but your data don't follow a multicomponent model. | Use a simpler equation.
    Your numbers are too large. | If your Y values are huge, change the units. Don't use values greater than about 10^4.
    Your numbers are too small. | If your Y values are tiny, change the units. Don't use values less than about 10^-4.

    Comparing two equations

    Sometimes you don't know which of two equations is more appropriate for your data. You want to fit both equations, and let the program compare the results. For example, you might want to fit a competitive binding curve to models with both one and two binding sites. Or you might want to fit a dissociation kinetics curve to exponential decay equations with both one and two phases.

    Goodness of fit is quantified by the sum-of-squares. Therefore you might imagine that you can simply define the "best" equation as the one that gives the smaller sum-of-squares. That rule makes sense when both equations have the same number of variables.

    Most often, however, you wish to compare equations with different numbers of variables. If the more complicated equation fits worse than the simpler equation, then you should clearly stick with the simpler equation. However, the curve generated by the more complicated equation (the one with more variables) will nearly always come closer to the points because it has more inflection points (it wiggles more). The question is whether this decrease in sum-of-squares is worth the "cost" of the additional variables (loss of degrees of freedom). The F test addresses this by calculating a P value that answers this question: If the simpler model is really correct, what is the chance that you'd randomly obtain data that fits the more complicated model so much better? If the P value is low, you conclude that the more complicated model is significantly better than the simpler model.

    The results of the F test are only strictly valid when the simpler equation is a special case of the more complicated equation. For example, you can compare a one-site vs. two-site binding curve.

    How the F test works

    First fit the more complicated model (Model 2) and calculate its goodness-of-fit as the sum-of-squares. Now fit the simpler model (Model 1). Even if this simpler model is correct, you expect it to fit worse (have the higher sum-of-squares) because it has fewer inflection points (more degrees of freedom). In fact, statisticians can prove that the relative increase in the sum of squares is expected to equal the relative increase in degrees of freedom. In other words, if the simpler model is correct you expect that:

    $$\frac{SS_1 - SS_2}{SS_2} \approx \frac{DF_1 - DF_2}{DF_2}$$

    If the more complicated model is correct, then you expect the relative increase in sum-of-squares (going from complicated to simple model) to be greater than the relative increase in degrees of freedom:

    $$\frac{SS_1 - SS_2}{SS_2} > \frac{DF_1 - DF_2}{DF_2}$$

    The F ratio quantifies the relationship between the relative increase in sum-of-squares and the relative increase in degrees of freedom:

    $$F = \frac{(SS_1 - SS_2)/SS_2}{(DF_1 - DF_2)/DF_2}$$

    If the simpler model is correct you expect to get an F ratio near 1.0. If the ratio is much greater than 1.0, there are two possibilities:

  • The more complicated model is correct.
  • The simpler model is correct, but random scatter led the more complicated model to fit better. The P value tells you how rare this coincidence would be.

    The P value answers this question: If model 1 is really correct, what is the chance that you'd randomly obtain data that fits model 2 so much better? If the P value is low, you conclude that model 2 is significantly better than model 1.

    The equation is usually presented in this more conventional form:

    $$F = \frac{(SS_1 - SS_2)/(DF_1 - DF_2)}{SS_2/DF_2}$$

    If you are familiar with analysis of variance, you'll appreciate that the F ratio is determined from this analysis of variance table:

    Source of variation     Sum-of-squares   df          MS
    Difference              SS1 - SS2        DF1 - DF2   (SS1 - SS2)/(DF1 - DF2)
    Model 2 (complicated)   SS2              DF2         SS2/DF2
    Model 1 (simple)        SS1              DF1

    F is the MS of the difference divided by the MS of Model 2.
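    The arithmetic can be checked in a few lines. This Python sketch computes F as the ratio of the relative increases, which is algebraically the same as the conventional mean-square form, and converts F to a P value using the F distribution with DF1 - DF2 and DF2 degrees of freedom. The incomplete beta helper follows the continued fraction described in Numerical Recipes, and all function names are hypothetical. With the one-site/two-site numbers from the example below it gives F of about 4.56 and P of about 0.039.

```python
import math

def _betacf(a, b, x, tiny=1e-300):
    # Continued fraction for the incomplete beta function (modified Lentz method)
    c, d = 1.0, 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, 500):
        for aa in (m * (b - m) * x / ((a + 2 * m - 1) * (a + 2 * m)),
                   -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < 1e-15:
            break
    return h

def betai(a, b, x):
    # Regularized incomplete beta function I_x(a, b)
    if x <= 0.0 or x >= 1.0:
        return 0.0 if x <= 0.0 else 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b

def f_test(ss1, df1, ss2, df2):
    """Model 1 = simpler (larger df), model 2 = more complicated. Returns (F, P)."""
    # Relative increase in SS divided by relative increase in df
    f = ((ss1 - ss2) / ss2) / ((df1 - df2) / df2)
    # Upper tail of the F distribution with (df1 - df2, df2) degrees of freedom
    p = betai(df2 / 2.0, (df1 - df2) / 2.0, df2 / (df2 + (df1 - df2) * f))
    return f, p

f, p = f_test(ss1=248100, df1=12, ss2=129800, df2=10)
print(f"F = {f:.2f}, P = {p:.3f}")  # F = 4.56, P = 0.039
```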

    Example

    This graph compares a one-site and two-site competitive binding curve. The results are shown here:

                            Two-site   One-site   % Increase
    Degrees of freedom      10         12         20.00%
    Sum-of-squares          129800     248100     91.14%

    In going from the two-site to the one-site model, we gained two degrees of freedom because the one-site model has two fewer variables. Since the two-site model has 10 degrees of freedom (15 data points minus 5 variables), the degrees of freedom increased 20%. If the one-site model were correct, you'd expect the sum-of-squares to also increase about 20% just by chance. In fact the sum-of-squares increased 91%. The percent increase was 4.56 times higher than expected (91.1/20.0=4.56). This is the F ratio (F=4.56), and it corresponds to a P value of 0.039. If the one-site model is correct, there is only a 3.9% chance that you'd randomly obtain data that fits the two-site model so much better. Since this is below the traditional threshold of 5%, you'd probably conclude that the two-site model fits significantly better than the one-site model. Here is how Prism reports the comparison:

    Comparing fits to two data sets

    The previous section discussed how to compare the fits of two different equations to one set of data. Here we discuss how to compare the fit of one equation to two different sets of data, for example comparing fits to data from control and treated preparations. Although this is a common situation, there is no clear consensus for how to compare fits to different groups. Three approaches are discussed below.

    1. Compare the results of repeated experiments.

    If you repeat the experiment several times, you can compare the best-fit value of a variable in control and treated preparations using a paired t test (or the analogous Wilcoxon nonparametric test).

    For example, here are the results of a competitive binding curve performed in two groups of cells. The table shows the logKi values.


    Experiment Control Treated
    1 -6.13 -6.53
    2 -6.39 -6.86
    3 -5.92 -6.31

    Compare the results with a paired t test in Prism (or InStat). The t ratio is 16.7, and the P value is 0.0036 (two-tail). If the treatment did not alter the logKi, there is only a 0.36% chance that you'd observe such a large difference (or larger) between logKi values by chance.
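    The paired t test on the logKi values above can be reproduced with a short Python sketch (standard library only; function names hypothetical, incomplete beta helper as described in Numerical Recipes). It works on the treated minus control differences and gives t of about -16.7 (16.7 in absolute value) with 2 degrees of freedom and a two-tailed P of about 0.0036, matching the values quoted above.

```python
import math
import statistics

def _betacf(a, b, x, tiny=1e-300):
    # Continued fraction for the incomplete beta function (modified Lentz method)
    c, d = 1.0, 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, 500):
        for aa in (m * (b - m) * x / ((a + 2 * m - 1) * (a + 2 * m)),
                   -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < 1e-15:
            break
    return h

def betai(a, b, x):
    # Regularized incomplete beta function I_x(a, b)
    if x <= 0.0 or x >= 1.0:
        return 0.0 if x <= 0.0 else 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b

def paired_t_test(control, treated):
    """Return (t, df, two-tailed P) for paired best-fit values."""
    diffs = [t - c for c, t in zip(control, treated)]
    n = len(diffs)
    t = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
    df = n - 1
    p = betai(df / 2.0, 0.5, df / (df + t * t))  # two-tailed P from the t distribution
    return t, df, p

control = [-6.13, -6.39, -5.92]   # logKi, control
treated = [-6.53, -6.86, -6.31]   # logKi, treated
t, df, p = paired_t_test(control, treated)
print(f"t = {t:.2f}, df = {df}, P = {p:.4f}")
```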

    Notes:

  • We compare logKi values, not Ki values. When doing a paired t test, a key assumption is that the distribution of differences (treated - control) follows a Gaussian distribution. Since a competitive binding curve (similar to a dose-response curve) is conducted with X values (concentration) equally spaced on a log scale, the uncertainty of X is reasonably symmetrical (and perhaps Gaussian) when expressed on a log scale. It is equally likely that the observed logKi is 0.1 log units too high or 0.1 log units too low. In contrast, the uncertainty in Ki is not symmetrical.
  • Prism can do the paired t test. Copy the results from each nonlinear regression results page onto a new data table for the t test.

    2. Compare the results within one experiment. Simple approach.

    When Prism reports the best-fit value of each variable for each data set, it also reports the standard error of the estimates. You can compare the best-fit values between two data sets using a t test.

    For example, control and treated data were fit to a competitive binding curve. The control LogEC50 was -6.08 with a standard error of 0.0677. The treated LogEC50 was -6.29 with a standard error of 0.0617. We want to compare the two LogEC50 values with a t test. The only trick is deciding what value to enter for N. Follow this logic:

  • For nonlinear regression, the number of degrees of freedom (df) equals the number of data points minus the number of variables fit. In this example, there were 15 data points, and three variables were fit. So there are 12 df.
  • For an ordinary t test, the number of df for each sample equals one less than the number of data points.
  • The t test calculations are based on the numbers of degrees of freedom. There is no way to enter degrees of freedom into Prism - instead you enter N. Prism is programmed to always compute the df as N-1. When comparing the results of nonlinear regression, enter N as the number of degrees of freedom plus 1. Prism will subtract 1, and make the df correct. In this example, enter N=12+1=13.
  • Prism can perform the t test. The resulting P value is 0.0309. If the treatment really didn't alter the EC50, there is only a 3.09% chance that you'd observe a difference this large (or larger) by coincidence. Since the P value is so low, you conclude that the two EC50 values are statistically significantly different.
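    The calculation in this section amounts to a t test on two best-fit values, using their standard errors and the pooled nonlinear regression degrees of freedom (12 + 12 = 24 here, which is what entering N = 13 for each group produces). In this Python sketch (hypothetical function names, incomplete beta helper as described in Numerical Recipes) the standard error of the difference is the square root of the sum of the squared standard errors; that matches a pooled t test only when both fits have the same df, as here, which is an assumption to check with your own data. With a treated standard error of 0.0617 (the value consistent with the reported P of 0.0309), it gives t of about 2.29 and P of about 0.031.

```python
import math

def _betacf(a, b, x, tiny=1e-300):
    # Continued fraction for the incomplete beta function (modified Lentz method)
    c, d = 1.0, 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, 500):
        for aa in (m * (b - m) * x / ((a + 2 * m - 1) * (a + 2 * m)),
                   -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < 1e-15:
            break
    return h

def betai(a, b, x):
    # Regularized incomplete beta function I_x(a, b)
    if x <= 0.0 or x >= 1.0:
        return 0.0 if x <= 0.0 else 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b

def compare_best_fit_values(value1, se1, df1, value2, se2, df2):
    """t test on two best-fit values from separate fits; returns (t, df, two-tailed P)."""
    # SE of the difference; equals the pooled calculation when df1 == df2
    se_diff = math.sqrt(se1 ** 2 + se2 ** 2)
    t = (value1 - value2) / se_diff
    df = df1 + df2  # nonlinear regression df: points minus variables, summed over both fits
    p = betai(df / 2.0, 0.5, df / (df + t * t))
    return t, df, p

t, df, p = compare_best_fit_values(-6.08, 0.0677, 12, -6.29, 0.0617, 12)
print(f"t = {t:.2f}, df = {df}, P = {p:.4f}")
```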

    Notes:

  • This method only uses data from one experiment. The SE value is a measure of how precisely you have determined the logEC50 in this one experiment. It is not a measure of how reproducible the experiment is. Despite the impressive P value, I wouldn't trust these results until the experiment is repeated.
  • The t test assumes that the uncertainty in the values of the variables follows a Gaussian distribution. This assumption is not necessarily true with the SE values that emerge from nonlinear regression. The only way to assess the validity of this assumption is to simulate many sets of data, fit each with nonlinear regression, and examine the distribution of best-fit values. I have done this informally with commonly used equations, and it seems that the assumption is reasonable in many cases.
  • Compare LogEC50, not EC50. You want to express the variables in a form that makes the uncertainty as symmetrical and Gaussian as possible. Since a competitive binding curve (similar to a dose-response curve) is conducted with X values (concentration) equally spaced on a log scale, the uncertainty of X is reasonably symmetrical (and perhaps Gaussian) when expressed on a log scale. It is equally likely that the observed LogEC50 is 0.1 log units too high or 0.1 log units too low. In contrast, the uncertainty in EC50 is not symmetrical.

    3. Compare the results within one experiment. More complicated approach.

    The method of the previous section only compared the value of the logEC50. This section describes a more general method to compare entire curves to ask whether the data sets differ at all. The idea is to first fit the two curves separately, and then combine the values and fit one curve to all the data.

    Follow these steps:

    1. Fit the two data sets separately. We did this in the previous section.
    2. Total the sum-of-squares and df from the two fits. For this example the total sum-of-squares equals 19560+29320=48880, and the total df equals 12+12=24. Since these are the results of fitting the two data sets separately, label these values SSseparate and DFseparate.
    3. Combine the two data sets into one. For this example, the combined data set has 30 XY pairs, with each X value appearing twice.
    4. Fit the combined data set to the same equation. Note the SS and df. For this example, SS=165200, and df=27 (30 data points minus three variables). Call these values SScombined and DFcombined.
    5. You expect SSseparate to be smaller than SScombined even if the curves are really identical, simply because the separate fits have more variables (and so fewer degrees of freedom). The question is whether the SS values are more different than you'd expect to see by chance. To find out, calculate the F ratio using this equation: $F = \frac{(SS_{combined} - SS_{separate})/(DF_{combined} - DF_{separate})}{SS_{separate}/DF_{separate}}$. For this example, F = 5.63.
    6. Determine the P value from F. There are DFcombined - DFseparate degrees of freedom in the numerator, and DFseparate degrees of freedom in the denominator. GraphPad StatMate and InStat can calculate the P value (from F and the two df values), or you can use tables in the back of most statistics books.
    7. For this example, the P value is 0.0046. If the treatment were really ineffective, there is only a 0.46% chance that the two curves would differ as much as (or more than) observed in this experiment. Since the P value is low, you'll conclude that the curves are really different.
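    Steps 2 through 6 reduce to a few lines. The Python sketch below (hypothetical function names, standard library only, incomplete beta helper as described in Numerical Recipes) computes F from the separate and combined fits and converts it to a P value with DFcombined - DFseparate and DFseparate degrees of freedom. Feeding it the F ratio quoted in step 5 (5.63, with 3 and 24 df) gives a P value just under 0.005, consistent with step 7.

```python
import math

def _betacf(a, b, x, tiny=1e-300):
    # Continued fraction for the incomplete beta function (modified Lentz method)
    c, d = 1.0, 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, 500):
        for aa in (m * (b - m) * x / ((a + 2 * m - 1) * (a + 2 * m)),
                   -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < 1e-15:
            break
    return h

def betai(a, b, x):
    # Regularized incomplete beta function I_x(a, b)
    if x <= 0.0 or x >= 1.0:
        return 0.0 if x <= 0.0 else 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b

def f_from_fits(ss_separate, df_separate, ss_combined, df_combined):
    # Step 5: extra sum-of-squares F ratio, one shared curve vs. separate curves
    return (((ss_combined - ss_separate) / (df_combined - df_separate))
            / (ss_separate / df_separate))

def f_p_value(f, df_num, df_den):
    # Step 6: upper-tail probability of the F distribution
    return betai(df_den / 2.0, df_num / 2.0, df_den / (df_den + df_num * f))

p = f_p_value(5.63, 27 - 24, 24)
print(f"P = {p:.4f}")
```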

    Notes:

  • This method only uses data from one experiment. Despite the impressive P value, I wouldn't trust these results until the experiment is repeated.
  • This method compares the curves overall. It doesn't tell you which variable(s) are different. Differences might be due to something trivial like a different baseline, rather than something important like a different rate constant.

    Advantages and disadvantages of the three methods

    If you have repeated the experiment several times, I recommend that you use the first method. There are two advantages:

  • Compared to the other methods discussed here, this method is far easier to understand and communicate to others.
  • The entire test is based on the consistency of the results between repeat experiments. Since there are usually more causes for variability between experiments than within experiments, it makes sense to base the comparison on differences between experiments.

    The disadvantage of the first method is that you are throwing away information. The calculations are based only on the best-fit value from each experiment, and ignore the SE of those values presented by the curve fitting program.

    If you have performed the experiment only once, then you probably ought to repeat the experiment. Regardless of what statistical results you obtain, you shouldn't trust results from a single experiment. If you want to compare results in a single experiment, you can use method 2 or method 3.

    The advantage of method 2 is that it focuses your thinking on a single variable. Generally, you care mostly about one variable (e.g., a rate constant or EC50), and care less about the others. Method 2 compares the variable of interest.

    Method 3 is the most general method. Since the method compares the entire curve, it does not force you to decide which variable(s) you wish to compare. This is both its advantage and disadvantage.
