How certain are the best-fit values?
Standard errors of best-fit values

The standard error of each parameter tells you how certain you can be of the best-fit value. If the SE is low, the best-fit value is "tight": if you changed the parameter a little bit, the curve would fit much worse. If the SE is high, the best-fit value is not so certain. You could change the value of the parameter a lot without noticeably changing the goodness-of-fit.

The standard errors reported by Prism (and virtually all other nonlinear regression programs) are based on some mathematical simplifications. They are called "asymptotic" or "approximate" standard errors. They are calculated assuming that the equation is linear, but are applied to nonlinear equations. This simplification means that the standard errors, and the confidence intervals computed from them, can be too optimistic.

To understand how the standard errors are calculated requires mastering the matrix algebra of nonlinear regression (way beyond the scope of this manual). There is no simple formula. The standard errors are a function of the number of data points, the distance of the points from the curve, and the overall shape of the curve.

Some programs call these values standard deviations rather than standard errors. There really is no distinction between the standard error and standard deviation of a best-fit value. The term standard error refers to the standard deviation of a computed value. So the standard error of the mean is the same as the standard deviation of the mean (which is very different from the standard deviation of the data), and the standard error of a Kd or slope is the same as the standard deviation of a Kd or slope. Prism uses the term standard deviation to refer only to a measure of variability among values, and uses the term standard error to refer to the accuracy of a calculated value such as a mean or best-fit value.

By themselves, the SE values are difficult to interpret. They are used to calculate 95% confidence intervals, which are easier to interpret.
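To make the three ingredients (number of points, distance of the points from the curve, shape of the curve) concrete, here is a minimal Python sketch for the special case where only one parameter is fit. In that case the matrix algebra collapses to a single number, and the approximate SE is the scatter around the curve divided by the square root of the summed squared sensitivities of the curve to the parameter. This is an illustration of the general asymptotic approximation, not Prism's actual code. The model (a dose-response curve with Bottom=0, Top=100, slope=1), the function names, the X values, and the residuals are all assumptions made up for the example, and the logEC50 is assumed to have come from an earlier fit.

```python
import math

def response(x, log_ec50):
    """Dose-response curve with Bottom=0, Top=100, slope=1 (only logEC50 is fit)."""
    return 100.0 / (1.0 + 10.0 ** (log_ec50 - x))

def sensitivity(x, log_ec50):
    """dY/d(logEC50): how far the curve moves at x when the parameter changes."""
    u = 10.0 ** (log_ec50 - x)
    return -100.0 * math.log(10.0) * u / (1.0 + u) ** 2

def approximate_se(xs, ys, log_ec50):
    """Approximate (asymptotic) SE of a single fitted parameter."""
    n_params = 1
    ss = sum((y - response(x, log_ec50)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(ss / (len(xs) - n_params))      # distance of points from the curve
    information = sum(sensitivity(x, log_ec50) ** 2 for x in xs)  # shape of the curve
    return s / math.sqrt(information)

# Hypothetical values, for illustration only.
best_fit = -6.0                                   # logEC50 assumed to come from a fit
offsets = [3, -3, 3, -3, 3, -3, 3, -3, 3]         # made-up residuals

x_covering = [-9, -8, -7, -6.5, -6, -5.5, -5, -4, -3]   # points around the EC50
x_gap = [-9, -8.5, -8, -7.5, -7, -5, -4.5, -4, -3]      # no points near the EC50

def make_y(xs):
    return [response(x, best_fit) + e for x, e in zip(xs, offsets)]

se_covering = approximate_se(x_covering, make_y(x_covering), best_fit)
se_gap = approximate_se(x_gap, make_y(x_gap), best_fit)
print(f"SE with points near the EC50: {se_covering:.3f}")
print(f"SE with a gap near the EC50:  {se_gap:.3f}")
```

Both designs use the same number of points and the same scatter, so any difference in SE comes only from where the points sit on the curve. With more than one parameter, the sum of squared sensitivities becomes a matrix that has to be inverted, which is where the matrix algebra comes in.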
Confidence intervals of best-fit values

Confidence intervals are easier to interpret than standard error values. If your nonlinear regression program does not compute confidence intervals (Prism does), you can easily calculate them yourself. The confidence interval is always centered at the best-fit value and extends the same distance above and below it. That distance equals the SE of the best-fit value times the critical value from the t distribution, abbreviated t*. This value depends on the degree of confidence you want (usually 95%) and the number of degrees of freedom, which equals the number of data points minus the number of parameters fit by nonlinear regression. For example, if you want 95% confidence and have 15 degrees of freedom, t* equals 2.1315. The confidence intervals are computed from the best-fit values and the SE of those best-fit values using this equation:

\[ \text{BestFit} - t^{*} \cdot \text{SE} \quad \text{to} \quad \text{BestFit} + t^{*} \cdot \text{SE} \]
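The calculation described above can be sketched in a few lines of Python. This is only an illustration using the standard library: the function names are invented for the example, the critical value t* is found numerically (Simpson's rule plus bisection) rather than looked up in a table, and the best-fit value and SE in the usage lines are hypothetical numbers, not results from any real fit.

```python
import math

def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    return math.exp(log_c) / math.sqrt(df * math.pi) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """Cumulative probability for x >= 0 (Simpson's rule on [0, x], plus 0.5 by symmetry)."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_critical(confidence, df):
    """Two-sided critical value t*, found by bisection on the cumulative probability."""
    target = 0.5 + confidence / 2
    lo, hi = 0.0, 1000.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def confidence_interval(best_fit, se, n_points, n_params, confidence=0.95):
    """Return (lower, upper) = best_fit -/+ t* x SE."""
    df = n_points - n_params          # degrees of freedom
    half_width = t_critical(confidence, df) * se
    return best_fit - half_width, best_fit + half_width

# 95% confidence with 15 degrees of freedom, as in the text.
print(f"t* = {t_critical(0.95, 15):.4f}")

# Hypothetical fit: 19 points, 4 parameters, logEC50 = -6.0 with SE = 0.1.
lower, upper = confidence_interval(-6.0, 0.1, n_points=19, n_params=4)
print(f"95% CI of logEC50: {lower:.3f} to {upper:.3f}")
```

If a statistics library is available, its inverse t distribution function can replace `t_critical`; the numerical version is used here only to keep the sketch self-contained.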
If all the assumptions of nonlinear regression are true, there is a 95% chance that the true value of the parameter lies within the interval. More precisely, if you perform nonlinear regression many times (on different data sets) you expect the confidence interval to include the true value 95% of the time, but to exclude the true value the other 5% of the time (but you won't know when this happens).

Explanations for high SE and wide confidence intervals

If the SE values are very high and the confidence intervals are very wide, the results of nonlinear regression won't be useful. The following four situations can cause confidence intervals to be wide:

Data collected over a too narrow range of X values

The confidence intervals of best-fit values provided by nonlinear regression will be wide if you have not collected data over a wide enough range of X values to fully define the curve. One example is a sigmoid dose-response curve with no data defining the top and bottom plateau. When these data were fit to a sigmoid dose-response curve, the 95% confidence interval for the EC50 extended over fifteen orders of magnitude!

The explanation is simple. The data were fit to a sigmoid equation with four parameters: the top plateau, the bottom plateau, the slope, and the EC50 (the log[Dose] when response=50%). But the data do not form plateaus at either the top or the bottom, so the program is unable to fit unique values for the plateaus. The information is simply not in the data. Since the data do not define zero and one hundred, many curves (defined by different sets of parameters) would fit these data with similar sum-of-squares values.

In this example, it might make scientific sense to set the bottom plateau to 0% and the top plateau to 100% (assuming that 0% and 100% are well defined by control experiments). But Prism doesn't know this. It just finds values for Top and Bottom that make the curve fit well.
In the example above, the best-fit value of the bottom plateau is -23 and the best-fit value of the top plateau is 137. Prism doesn't know that a negative value of Y makes no sense. If you defined the Y values to be percent of control, you could set the bottom plateau to a constant of zero and the top plateau to a constant of 100. If you do this, Prism fits only the EC50 and the slope factor, and the confidence intervals will be narrower.

Note that the problem with the fit is not obvious by inspecting a graph, because the curve goes very close to the points. The value of R² (0.99) is also not helpful. That value just tells you that the curve comes close to the points, but does not tell you whether the fit is unique.

Data missing in an important part of the curve
In fitting this dose-response curve, we set three parameters to constant values. We set the bottom plateau to equal zero, the top plateau to equal 100, and the slope to equal 1.0. There is only one parameter to fit: the logEC50. But the 95% CI for the EC50 is quite wide, extending over almost an order of magnitude. The problem is simple. The EC50 is the concentration at which the response is half-maximal, and there are few data points near that point.

Scattered data
Because of the scattered data, the confidence interval for the EC50 in this experiment is wide, extending over a fiftyfold range.

The equation contains redundant variables
These data were fit to a competitive binding curve with two sites, constraining the bottom plateau to equal zero and the top plateau to equal 100. The curve comes very close to the points, so the R² is very high (0.997). But the confidence intervals for the two logEC50 values are both extremely wide, extending over several orders of magnitude. The confidence interval for the fraction of high-affinity sites extends to values that are nonsense (fractions must be between 0 and 1). The problem is that these data fit a one-site model just fine. There is no information in the data to support a two-site fit. When you choose a two-site model, the nonlinear regression procedure finds extremely wide confidence intervals.
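The widths quoted in the examples above ("almost an order of magnitude", "a fiftyfold range", "several orders of magnitude") are easiest to read off when the fit reports a confidence interval for the logEC50. The short Python sketch below shows the conversion: the limits of the EC50 interval are the antilogs of the logEC50 limits, and the fold range is 10 raised to the width of the log interval. The function names and the numerical limits are hypothetical, chosen only so that the interval spans roughly fiftyfold.

```python
def ec50_limits(log_lower, log_upper):
    """Convert confidence limits on log10(EC50) into limits on the EC50 itself."""
    return 10.0 ** log_lower, 10.0 ** log_upper

def fold_range(log_lower, log_upper):
    """Ratio of the upper EC50 limit to the lower one."""
    return 10.0 ** (log_upper - log_lower)

# Hypothetical 95% CI for logEC50 (log molar), 1.7 log units wide.
log_lower, log_upper = -6.85, -5.15
lower, upper = ec50_limits(log_lower, log_upper)
print(f"EC50 from {lower:.2e} to {upper:.2e} M")
print(f"fold range: {fold_range(log_lower, log_upper):.0f}")
```

An interval one log unit wide corresponds to a tenfold range (one order of magnitude). Note that the interval is symmetric around the best-fit value on the log scale, but not on the EC50 scale.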
All contents copyright © 1999 by GraphPad Software, Inc. All rights reserved.