1
Nonlinear regression for pharmacologists
  • Harvey Motulsky
  • hmotulsky@graphpad.com
  • www.graphpad.com
2
Curve fitting
3
Other kinds of regression
  • Linear: Y = slope*X + intercept
  • Polynomial: Y = A + BX + CX² + DX³ + …
  • Multiple: Y = A + BX₁ + CX₂ + …
  • Logistic (outcome is binary)
  • Proportional hazards (outcome is survival time)


4
What is a model?
  • A mathematical model is a description of a physical, chemical or biological state or process.
  • "A mathematical model is neither an hypothesis nor a theory. Unlike scientific hypotheses, a model is not verifiable directly by an experiment. For all models are both true and false.... The validation of a model is not that it is "true" but that it generates good testable hypotheses relevant to important problems. "
    -- R. Levins, Am. Scientist 54:421-31, 1966
  •  “All models are wrong, but some are useful.”
    -- George E. P. Box
5
Example model. Linear
6
Example model 2. Exponential
  • Exponential equations model many processes in which the rate at which something happens is proportional to the amount that remains.
  • Ligands dissociating from receptors.
  • Radioactive isotope decay.
  • Drug metabolism.
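A minimal sketch of the exponential decay model, assuming the simplest one-phase form that decays to zero, Y = Y0·exp(-k·t). The function names and the rate constant are illustrative, not taken from any particular program. Because the rate is proportional to the amount left, the half-life ln(2)/k does not depend on Y0.

```python
import math

def exp_decay(t, y0, k):
    """One-phase exponential decay to zero: Y = Y0 * exp(-k * t)."""
    return y0 * math.exp(-k * t)

def half_life(k):
    """Time for Y to fall to half its current value; independent of Y0."""
    return math.log(2) / k

# Hypothetical rate constant of 0.1 per minute
k = 0.1
t_half = half_life(k)
print(round(t_half, 2))                          # 6.93
print(round(exp_decay(t_half, 100.0, k), 6))     # 50.0 (half of Y0)
```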
7
Exponential decay
8
Example model #3. Equil. binding
9
Scatter
  • Fitting a model would be easy if the data completely followed the model.
  • Experimental error adds complexity.
  • Have to choose a model for distribution of scatter.
  • Most common assumption is Gaussian
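To make "model plus scatter" concrete, this sketch generates simulated data as ideal model values plus independent Gaussian scatter with the same SD at every X. The model (exponential decay with Y0 = 100, k = 0.1), the time points, and the SD of 3 are hypothetical choices for illustration.

```python
import math
import random

def exp_decay(t, y0, k):
    return y0 * math.exp(-k * t)

def simulate(xs, sd, seed=1):
    """Ideal model values plus independent Gaussian scatter with a constant SD."""
    rng = random.Random(seed)          # seeded so the example is reproducible
    return [exp_decay(x, 100.0, 0.1) + rng.gauss(0.0, sd) for x in xs]

xs = [0, 5, 10, 15, 20, 30, 45, 60]    # hypothetical time points (minutes)
ys = simulate(xs, sd=3.0)
for x, y in zip(xs, ys):
    # X, ideal model value, simulated "measured" value
    print(x, round(exp_decay(x, 100.0, 0.1), 1), round(y, 1))
```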
10
Model plus scatter
11
Least squares
  • If scatter really follows a Gaussian distribution, then two points each 5 units from the curve are far more likely than one point 1 unit away and another 9 units away.
  • So you can’t simply minimize the sum of distances (both examples have the same sum, 10).
  • Instead, minimize the sum of the squares of the distances: “least squares” regression.
  • Mathematicians: with Gaussian scatter, least squares gives the “maximum likelihood” fit.
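A small worked check of the two cases above, assuming Gaussian scatter with an SD of 5 (an arbitrary choice; the likelihood ratio depends on it). The sums of distances tie, the sums of squares do not, and the Gaussian likelihood favors the two moderate deviations.

```python
import math

def sum_abs(residuals):
    return sum(abs(r) for r in residuals)

def sum_squares(residuals):
    return sum(r * r for r in residuals)

def gaussian_log_likelihood(residuals, sd):
    """Log-likelihood of the residuals if scatter is Gaussian with mean 0 and SD sd."""
    n = len(residuals)
    return -n * math.log(sd * math.sqrt(2 * math.pi)) - sum_squares(residuals) / (2 * sd ** 2)

a = [5, 5]   # two points each 5 units from the curve
b = [1, 9]   # one point 1 unit away, another 9 units away

print(sum_abs(a), sum_abs(b))          # 10 10  (a tie: can't tell them apart)
print(sum_squares(a), sum_squares(b))  # 50 82  (smaller is better)

# With an assumed SD of 5, how many times more likely is case a than case b?
sd = 5.0
ratio = math.exp(gaussian_log_likelihood(a, sd) - gaussian_log_likelihood(b, sd))
print(round(ratio, 2))                 # 1.9, i.e. exp(32/50)
```

Note that the smaller sum-of-squares always corresponds to the larger Gaussian likelihood, which is why minimizing the sum-of-squares and maximizing the likelihood pick the same curve.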
12
We have:
  • A model for what’s going on
  • A model for random scatter
  • A goal (minimize sum of squares of vertical distances between points and curve)
13
Step 1. Prepare data for nonlinear regression
  • Scale to reasonable units. No huge or tiny numbers.
  • Don’t smooth
  • Transform if it makes the scatter more Gaussian. Don’t transform to make linear.
  • Prune obvious outliers? (Debatable.)
  • If X is log(conc), what about a concentration of zero? Its log is undefined, so people often enter it as a very low concentration instead.


14
Step 2. Choose a model
  • Based on physical or chemical model
    • Examples: exponential decay, equilibrium binding, enzyme velocity as a function of [substrate]
  • Empirical
    • Dose response curves
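The models named above can be written as plain functions. This is a sketch assuming common textbook forms (one-phase decay to zero, one-site specific binding, Michaelis-Menten, and a four-parameter sigmoidal dose-response with X = log(concentration)); a given program may parameterize them differently.

```python
import math

def exponential_decay(t, y0, k):
    """Physical model: Y = Y0 * exp(-k*t), one-phase decay to a plateau of zero."""
    return y0 * math.exp(-k * t)

def one_site_binding(conc, bmax, kd):
    """Chemical model: specific binding at equilibrium, Y = Bmax * X / (Kd + X)."""
    return bmax * conc / (kd + conc)

def michaelis_menten(s, vmax, km):
    """Enzyme velocity as a function of [S]: V = Vmax * [S] / (Km + [S])."""
    return vmax * s / (km + s)

def dose_response(log_conc, bottom, top, log_ec50, hill_slope):
    """Empirical sigmoidal dose-response (four-parameter logistic), X = log(conc)."""
    return bottom + (top - bottom) / (1 + 10 ** ((log_ec50 - log_conc) * hill_slope))

print(one_site_binding(2.0, 100.0, 2.0))          # 50.0: half-maximal binding at X = Kd
print(dose_response(-6.0, 0.0, 100.0, -6.0, 1.0)) # 50.0: halfway between bottom and top at logEC50
```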
15
Why log(EC50) rather than EC50?
  • You can write the equation either way.
  • The best-fit results will be the same either way.
  • You can easily convert EC50 to/from logEC50.
  • The SE and CI values differ, and the SE of one cannot simply be converted into the SE of the other.
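A worked sketch with hypothetical numbers, showing one consequence of fitting logEC50: a CI that is symmetrical on the log scale becomes asymmetrical when its limits are converted to EC50 values by taking antilogs.

```python
# Hypothetical fit results; concentrations in molar
log_ec50 = -6.0
ci_log = (-6.3, -5.7)                       # a symmetrical 95% CI for logEC50

ec50 = 10 ** log_ec50
ci_ec50 = tuple(10 ** v for v in ci_log)    # antilog of each limit

print(f"EC50 = {ec50 * 1e6:.2f} uM")                                     # EC50 = 1.00 uM
print(f"95% CI: {ci_ec50[0] * 1e6:.2f} to {ci_ec50[1] * 1e6:.2f} uM")    # 95% CI: 0.50 to 2.00 uM
# The interval is not symmetrical around the EC50:
print(f"below: {(ec50 - ci_ec50[0]) * 1e6:.2f} uM, above: {(ci_ec50[1] - ec50) * 1e6:.2f} uM")
# below: 0.50 uM, above: 1.00 uM
```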
16
Why log(EC50) rather than EC50?
17
Should some parameters be held constant?
  • A model defines Y as a function of X and one or more parameters.
  • You can ask the nonlinear regression program to find best-fit values for any number of those.
  • You can set some to constant values. Do so when you know a value more reliably than the data could determine it.
    • If the baseline HAS to be zero, then set it to a constant.
    • If the top value in a dose-response curve has to be 100%, then make that a constant.
18
Should parameters be constrained?
  • Constrain rate constants to positive values
  • Constrain fractions to be between 0 and 1
  • Etc.
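If a program offers no explicit constraint option, one common workaround (an alternative technique, not necessarily what any particular program does internally) is to fit a transformed parameter that can take any value and convert it back inside the model. This sketch assumes an exponential transform for positive rate constants and a logistic transform for fractions; the function names are hypothetical.

```python
import math

def rate_from_free(u):
    """Any real u maps to a positive rate constant k = exp(u)."""
    return math.exp(u)

def free_from_rate(k):
    return math.log(k)

def fraction_from_free(v):
    """Any real v maps to a fraction strictly between 0 and 1 (logistic function)."""
    return 1.0 / (1.0 + math.exp(-v))

def free_from_fraction(f):
    return math.log(f / (1.0 - f))

# The optimizer searches over u and v freely; the model only ever sees valid values.
for u in (-5.0, 0.0, 5.0):
    print(u, round(rate_from_free(u), 4), round(fraction_from_free(u), 4))
```

Keep in mind that an SE or CI reported for u or v describes the transformed parameter, not k or the fraction directly.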
19
Step 3. Estimate initial values
  • Usually not a big deal if you understand the equation and view the data.
  • For dose-response curve:
    • Initial value of top can be the max Y value
    • Initial value of bottom can be the min Y value
    • Initial value of logEC50 can be the middle of the X range
    • Initial value of HillSlope can be 1 or -1
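The rules above translate almost directly into code. This sketch is hypothetical in its details: in particular, choosing +1 or -1 for the Hill slope by comparing Y at the lowest and highest X is an assumption layered on the rule that it "can be 1 or -1".

```python
def initial_values(log_conc, y):
    """Rough starting values for a sigmoidal dose-response fit."""
    top = max(y)                                      # max Y value
    bottom = min(y)                                   # min Y value
    log_ec50 = (min(log_conc) + max(log_conc)) / 2    # middle of the X range
    # Assumed rule: rising curve -> +1, falling curve -> -1
    pairs = sorted(zip(log_conc, y))
    hill_slope = 1.0 if pairs[-1][1] >= pairs[0][1] else -1.0
    return {"top": top, "bottom": bottom, "log_ec50": log_ec50, "hill_slope": hill_slope}

# Hypothetical data: log(molar) concentrations and responses
x = [-9, -8, -7, -6, -5, -4]
y = [3, 8, 25, 70, 94, 98]
print(initial_values(x, y))
# {'top': 98, 'bottom': 3, 'log_ec50': -6.5, 'hill_slope': 1.0}
```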
20
Step 4. Decide on weighting.
Why minimize the sum-of-squares?
  • If variability is really Gaussian, it gives the best answer.
  • In math lingo: Results match maximum likelihood estimate (assuming Gaussian scatter).
  • If variability is Gaussian, one big deviation is much less likely than several small deviations.
    • Two deviations of 5 are much more likely than one deviation of 9 and another of 1 (sum-of-squares 50 vs. 82).
  • But, outliers can muck things up!
21
Weighting
  • If scatter increases as Y increases, then assumptions of regression aren’t met. Larger points will have larger deviations – and much larger squared deviations -- and dominate the calculations.
  • Solution: Weight the squared distances (giving less weight to points with more scatter) so that the average weighted squared distance is the same at all points along the curve.
  • Problem: It’s hard to know when to weight.
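A sketch of relative weighting, one common choice when the SD of the scatter is proportional to Y: divide each squared distance by the square of Y. Here the curve's Y value is used (an assumption; some programs weight by the data's Y instead), the function name is hypothetical, and the points are hypothetical, each 10% away from the curve.

```python
def weighted_ss(y_data, y_curve, weighting="none"):
    """Sum of squares, optionally with relative (1/Y^2) weighting."""
    total = 0.0
    for yd, yc in zip(y_data, y_curve):
        sq = (yd - yc) ** 2
        if weighting == "relative":
            sq /= yc ** 2          # divide by the square of the curve's Y value
        total += sq
    return total

# Hypothetical points, each 10% away from the curve
y_curve = [10.0, 100.0, 1000.0]
y_data = [11.0, 110.0, 1100.0]

unweighted = [(d - c) ** 2 for d, c in zip(y_data, y_curve)]
print(unweighted)                                          # [1.0, 100.0, 10000.0]: the largest point dominates
print(round(weighted_ss(y_data, y_curve, "relative"), 6))  # 0.03: each point contributes 0.01
```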
22
Weighting
23
What if you collected replicate Y values at each X??
  • Approach 1. Enter each replicate value as a separate data point.
    • Appropriate when errors are independent: when the experimental error in one replicate is no more related to the other replicates at the same X than to other data in the experiment.
  • Approach 2. Enter only the means.
    • Appropriate when replicates are not independent. Example: a dose-response curve in which each dose is given to a separate animal, so the replicates at each dose all come from that one animal.
24
Problem with minimizing sum-of-squares
  • Outliers can throw things off a lot, especially if you don’t have many points.
  • One solution is to remove outliers, but hard to decide which points to remove.
  • Another solution is to use a weighting method that gives less weight to outlying points. Robust nonlinear regression.
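One way to give outlying points less weight is to replace the squared distance with a loss that grows more slowly. This sketch uses a Lorentzian (Cauchy) loss with a hypothetical scale of 1, and, to keep it short, fits only a single location value by grid search rather than a full curve. It illustrates the idea; it is not the robust method of any specific program.

```python
import math

def squared_loss(r):
    return r * r

def lorentzian_loss(r, scale=1.0):
    """Grows only logarithmically for large residuals, so outliers pull less."""
    return math.log(1.0 + (r / scale) ** 2)

data = [9.0, 10.0, 10.0, 11.0, 50.0]        # hypothetical values with one outlier
grid = [i / 100 for i in range(0, 6001)]    # candidate locations 0.00 .. 60.00

ls_center = min(grid, key=lambda m: sum(squared_loss(d - m) for d in data))
robust_center = min(grid, key=lambda m: sum(lorentzian_loss(d - m) for d in data))

print(ls_center)       # 18.0: least squares lands on the mean, dragged up by the outlier
print(robust_center)   # close to 10: the outlier barely matters
```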
25
How nonlinear regression works
  • Method of steepest descent. Head downhill step by step. Good initially, slow later.
  • Gauss-Newton method. Assume the surface is a parabola. From the position and slopes, find the bottom. Repeat. Bad initially, fine later.
  • Marquardt-Levenberg. Blends steepest descent (far from the minimum) with Gauss-Newton (near it).
  • Simplex. Not used much. No confidence intervals.
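A minimal sketch of the Marquardt-Levenberg idea for a two-parameter exponential decay, in plain Python. The damping factor lam is multiplied by 10 after a failed step (more like steepest descent) and divided by 10 after a successful one (more like Gauss-Newton). The starting values, damping schedule, and stopping rule are illustrative assumptions; real programs add refinements and also compute SEs and CIs, which this sketch omits.

```python
import math

def model(t, y0, k):
    return y0 * math.exp(-k * t)

def jacobian_row(t, y0, k):
    """Partial derivatives of the model with respect to y0 and k."""
    e = math.exp(-k * t)
    return e, -y0 * t * e

def sum_squares(ts, ys, p):
    return sum((y - model(t, *p)) ** 2 for t, y in zip(ts, ys))

def levenberg_marquardt(ts, ys, p, lam=1e-3, max_iter=200, tol=1e-12):
    """Minimal two-parameter Levenberg-Marquardt loop; returns (params, SS)."""
    ss = sum_squares(ts, ys, p)
    for _ in range(max_iter):
        # Build J^T J (a11, a12, a22) and J^T r (g1, g2) at the current parameters
        a11 = a12 = a22 = g1 = g2 = 0.0
        for t, y in zip(ts, ys):
            j1, j2 = jacobian_row(t, *p)
            r = y - model(t, *p)
            a11 += j1 * j1
            a12 += j1 * j2
            a22 += j2 * j2
            g1 += j1 * r
            g2 += j2 * r
        while True:
            # Damped normal equations: scale the diagonal by (1 + lam) and solve the 2x2 system
            b11, b22 = a11 * (1 + lam), a22 * (1 + lam)
            det = b11 * b22 - a12 * a12
            d1 = (g1 * b22 - a12 * g2) / det
            d2 = (b11 * g2 - a12 * g1) / det
            trial = (p[0] + d1, p[1] + d2)
            trial_ss = sum_squares(ts, ys, trial)
            if trial_ss < ss:
                lam /= 10          # success: behave more like Gauss-Newton
                break
            lam *= 10              # failure: behave more like steepest descent
            if lam > 1e10:         # no downhill step can be found; stop here
                return p, ss
        done = (ss - trial_ss) <= tol * max(ss, 1e-300)
        p, ss = trial, trial_ss
        if done:
            break
    return p, ss

ts = [0, 5, 10, 15, 20, 30, 45, 60]
ys = [model(t, 100.0, 0.1) for t in ts]      # noise-free, so the answer is known
fit, final_ss = levenberg_marquardt(ts, ys, (50.0, 0.05))
print([round(v, 4) for v in fit])            # recovers Y0 = 100 and k = 0.1
```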
26
Visualizing the search for smallest sum-of-squares
27
When is curve fitting done?
  • Prism stops iterating and declares the results to have converged when two iterations in a row change the sum-of-squares by less than 0.01%.
  • If you check the box for strict convergence criteria, Prism will continue the iterations until five consecutive iterations each reduce the sum-of-squares by less than 0.000001%.
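The convergence rule described above can be sketched as a check on the history of sum-of-squares values. The function name and the choice to measure each change relative to the previous sum-of-squares are assumptions for illustration, not Prism's code. Note that 0.01% is a relative tolerance of 1e-4 and 0.000001% is 1e-8.

```python
def converged(ss_history, rel_tol=1e-4, consecutive=2):
    """True when each of the last `consecutive` iterations changed the SS by less than rel_tol."""
    if len(ss_history) < consecutive + 1:
        return False
    recent = ss_history[-(consecutive + 1):]
    return all(abs(prev - cur) < rel_tol * prev for prev, cur in zip(recent, recent[1:]))

history = [500.0, 120.0, 80.1, 80.0, 79.9999, 79.99985]   # hypothetical SS per iteration
print(converged(history))                                 # True: default rule (0.01%, twice)
print(converged(history, rel_tol=1e-8, consecutive=5))    # False: strict rule (0.000001%, five times)
```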


28
What’s special about linear regression??
  • Y = Intercept + slope*X
  • Special to mathematicians, as it is much easier to find best-fit values for slope and intercept. Initial values and iterations not needed. Local minima impossible.
  • From a scientist’s point of view, not all that different.
  • Choose when the linear model makes sense, but don’t bend over backwards to make a model linear.
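Why no initial values or iterations are needed: the least-squares slope and intercept have a closed-form solution. A minimal sketch (function name hypothetical):

```python
def linear_fit(xs, ys):
    """Closed-form least-squares slope and intercept: no initial values, no iterations."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept

print(linear_fit([1, 2, 3, 4], [3, 5, 7, 9]))   # (2.0, 1.0)
```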
29
Diversion: Other kinds of  curve fitting
  • Nonlinear regression is great when you want to fit a chemical, physiological, physical, or empirical model.
  • Consider different tools if you have a different goal:
    • Draw nice looking curve for figure.
    • Interpolate from a standard curve
    • Create “black box” for simulation

30
Free hand curve fitting. Astronomers did it first
31
Three methods you probably don’t want to use
  • Polynomial regression.
  • Transforms to make data linear (Scatchard, Lineweaver-Burk, etc.)
  • Computer programs that pick an equation for you.


32
Results of polynomial regression are hard to interpret
  • Equation: Y = A + BX + CX² + DX³ + …
  • Advantages: Easy calculations. Available in lots of programs. No need for initial values.
  • Disadvantage: Biological and chemical processes rarely follow polynomial models.
  • Take home message: Beware the term “curve fitting”.
33
Polynomial fits
34
Why not let the computer pick an equation?
35
What happens when the computer picks an equation?
36
What happens when the computer picks an equation?
37
Why not create a Scatchard plot?
38
Linear transforms vs. nonlinear regression
39
Graphical results
40
Results of nonlinear regression
  • Best-fit values of the parameters
  • SE and CI of the parameters
  • R²
  • Sum-of-squares
41
Numerical results
42
Assumptions of nonlinear regression
  • Data really follow model described by equation
  • Scatter is Gaussian, with same SD all along the curve
  • All experimental error is in Y (not X)
  • Independence. Experimental “error” in one point is not affected by its neighbors.
  • No systematic errors in your measurements.
43
Approach to nonlinear regression results (1)
  • Did the program converge on a solution?
    • An error message doesn’t mean a bug in the program, just a problem with data, model, initial values…
  • Are the best-fit values scientifically plausible?
  • Are the confidence intervals narrow?
  • Does the curve come close to the points?
    • Quantify with R²
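R² compares the scatter of the points around the curve with their scatter around a horizontal line at the mean of Y. A minimal sketch of that definition (function name hypothetical):

```python
def r_squared(y_data, y_curve):
    """R^2 = 1 - SS(residuals about the curve) / SS(data about their mean)."""
    mean_y = sum(y_data) / len(y_data)
    ss_res = sum((y - c) ** 2 for y, c in zip(y_data, y_curve))
    ss_tot = sum((y - mean_y) ** 2 for y in y_data)
    return 1.0 - ss_res / ss_tot

y = [1, 2, 3, 4]
print(r_squared(y, [1, 2, 3, 4]))       # 1.0: curve goes through every point
print(r_squared(y, [2.5] * 4))          # 0.0: no better than a horizontal line at the mean
print(r_squared(y, [4, 3, 2, 1]))       # -3.0: worse than the horizontal line
```

With nonlinear regression, R² can be negative when the curve fits worse than a horizontal line at the mean, as the last line shows.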

44
Example 1. No data to define bottom and top. Need to hold them constant.
45
Example 1 with fixed top and bottom
46
Example 2. Model too complex.
47
Example 2. With top and bottom fixed.
48
Example 2. Simpler model plus fix top and bottom
49
Example 3.
50
Example 3. With sharing.
51
Example 4
52
Example 4. Curve generated by initial values
53
Example 4. Better init values
54
Example 5.
55
Example 6.
56
 
57
Approach to nonlinear regression results (2).
  • Is the fit a local minimum?
  • Are the residuals random?
  • Are there too few runs?
58
Example of false minimum
59
The problem of false minima
  • All iterative methods of nonlinear regression stop when any small change in the values of the parameters makes the fit worse (increases the sum-of-squares).
  • No iterative method will “know” if a much better fit is possible by making some large changes in the parameters.
  • Nonlinear regression finds a valley, but may not realize there is a deeper valley over the ridge.
60
Residuals
61
What can residual plots tell you?
62
Runs test
  • A run is a series of consecutive points (may be just one) that are all on the same side of the curve.
  • If the points are randomly distributed around the curve, you can predict the number of runs expected from the number of positive (A) and negative (B) residuals: expected number of runs = 1 + 2AB/(A+B).
  • P value: the chance of observing this few runs (or fewer) if the points were randomly scattered around the curve.
  • Too few runs (low P value) suggests that the model is wrong. Systematic deviations.
  • What does it mean if there are too many runs?
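A sketch of the bookkeeping behind the runs test: counting runs in a list of residuals and computing the expected number from the formula above. The residuals are hypothetical, and residuals exactly equal to zero are skipped (an assumption). Turning the counts into a P value needs the exact distribution of runs, which this sketch does not compute.

```python
def count_runs(residuals):
    """Number of runs of consecutive residuals with the same sign (zeros skipped)."""
    signs = [r > 0 for r in residuals if r != 0]
    if not signs:
        return 0
    return 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)

def expected_runs(residuals):
    """Expected number of runs if signs were random: 1 + 2AB/(A+B)."""
    a = sum(1 for r in residuals if r > 0)
    b = sum(1 for r in residuals if r < 0)
    return 1 + 2 * a * b / (a + b)

# Hypothetical residuals with a systematic pattern (+ + + - - - - + + +)
res = [1.2, 0.8, 0.5, -0.3, -0.9, -1.1, -0.4, 0.2, 0.7, 1.0]
print(count_runs(res), expected_runs(res))   # 3 5.8: fewer runs than expected
```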
63
Comparing models.
 One set of data, two models
64
One set of data, two models. First step.
  • Check that both fits make sense
    • Reject a fit that has best-fit values that are biologically irrelevant.
      • negative rate constants
      • Fractions that are <0 or >1
      • EC50 values not in the range of the data
    • Reject a fit if the confidence intervals for the best-fit values are super wide.
65
Example data. Reject two-sites as nonsense
66
Second example
67
Second example best-fit.
Worth formal testing.
68
One set of data, two models
  • If the simpler model fits better (lower sum-of-squares)
    • Accept it. No reason to consider more complicated model.
    • No need for statistics.


69
If the more complicated model fits better (lower sum-of-squares)
    • This does not mean you should accept the fancier model.
    • If you add variables to a model, the curve wiggles more, so usually comes closer to the data.
    • Need to use statistics to ask whether the more complicated (e.g., two-site) fit is better than expected by chance.
70
Extra sum-of-squares F test to compare fits
71
Interpreting the F ratio
72
F test to compare two fits to one set of data
  • If the more complicated model fits better (lower sum-of-squares), the F test answers this question:
      • If the simpler model really were correct, what is the chance that the more complicated model would fit this much better (or even better)?
      • Answer is P value
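The F ratio itself is simple to compute from the two fits. This sketch assumes the usual extra sum-of-squares form, with df equal to the number of data points minus the number of fitted parameters in each model; the sums of squares and df below are hypothetical. The P value then comes from the F distribution with (df_simple - df_complex, df_complex) degrees of freedom, for example via Excel's FDIST or scipy.stats.f.sf.

```python
def extra_ss_f(ss_simple, df_simple, ss_complex, df_complex):
    """Extra sum-of-squares F ratio for two nested models fit to the same data."""
    numerator = (ss_simple - ss_complex) / (df_simple - df_complex)   # improvement per extra parameter
    denominator = ss_complex / df_complex                             # scatter left in the complex fit
    return numerator / denominator

# Hypothetical: 20 points; one-site fit has 2 parameters, two-site fit has 4
F = extra_ss_f(ss_simple=120.0, df_simple=18, ss_complex=90.0, df_complex=16)
print(round(F, 3))   # 2.667, with (2, 16) degrees of freedom
```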
73
General idea of P value (usual)
  • Set null hypothesis. In this example, that the simple model is correct.
  • Do computations to express deviation of your data from null hypothesis as a single value, in this case F ratio.
  • Use math to determine the chance of obtaining such a large deviation, or larger, by coincidence if the null hypothesis is true.
74
General idea of P value (alternate)
  • Set null hypothesis. In this example, that the simple model is correct.
  • Set alternative, more complicated, hypothesis.
  • Calculate the observed difference in sum-of-squares.
  • Use math to determine the chance of obtaining such a large difference, or larger, by coincidence if the null hypothesis is true.
75
Comparing two means
76
Compare two means by comparing models
  • Model 1: Y = GrandMean (enter as Y = Mean + 0*X if the program insists on an X variable)
  • Model 2: Y = GroupMean (a separate mean for each group)


77
Comparing two means by unpaired t test
  • t = (Difference between means)/(SE of difference)
  • The SE of the difference pools both SEMs, adjusting for sample size, and is larger than either SEM. In this case, the two SEMs are 1.64 and 1.28, and the SE of the difference is 2.09.
  • t = (60.2 – 51.2)/2.09 = 4.31
  • df = sum of Ns minus 2, so 5 + 5 – 2 = 8
  • Two-tailed P from Excel: =TDIST(4.3,8,2) = 0.0026
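Recomputing the t ratio from the summary values above, assuming equal sample sizes (n = 5 per group), where the pooled SE of the difference reduces to sqrt(SEM1² + SEM2²). Starting from the rounded SEMs this gives an SE of about 2.08 rather than 2.09, so t comes out near 4.3 either way. The P value needs the t distribution (Excel's TDIST as above, or a statistics library); the function name is hypothetical.

```python
import math

def unpaired_t_from_sems(mean1, sem1, mean2, sem2, n):
    """Unpaired t ratio from summary data; assumes both groups have n values."""
    se_diff = math.sqrt(sem1 ** 2 + sem2 ** 2)   # pooled SE of the difference (equal n only)
    t = (mean1 - mean2) / se_diff
    df = 2 * n - 2
    return se_diff, t, df

se_diff, t, df = unpaired_t_from_sems(60.2, 1.64, 51.2, 1.28, n=5)
print(round(se_diff, 2), round(t, 1), df)   # 2.08 4.3 8
```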
78
 
79
Interpreting the P value
  • By comparing models: if the simpler model (one population with one mean) were correct, the chance that the more complicated model (two populations with different means) would fit the data this much better (or even better) is 0.26%.
  • By t test: assume the null hypothesis that the two populations have the same mean. If so, the chance of observing sample means as far apart as we observed (or further) is 0.26%.
80
Compare standard slope (hill=1) to variable slope by F test
81
Test whether slope differs from 1.0. One-sample t test.
  • Our sample: slope = 0.7535, SE = 0.1356, df = 22
  • Hypothetical slope = 1.0 (based on theory)


82
P value
83
Interpretation
  • Assume a null hypothesis: Curve really has a standard slope factor (Hill slope) of 1.0.
  • If you performed many experiments (same size and design as ours), the observed (fitted) slope would be as far from 1.0 as we observed (or further) in 10.7% of them.
  • Either a moderately rare coincidence has occurred -- or the null hypothesis is wrong.
  • The smaller the P value, the more apt you are to question the null hypothesis.
84
What question does a P value answer (one sample t)?
  • Assume the true mean is some hypothetical value (from theory or previous data). “Null hypothesis”
  • Account for the size and variability of the sample.
  • What is the chance that random sampling would result in an observed mean difference so far (or further) from the hypothetical mean?
85
Comparing fits from repeated experiments
86
Paired t test
  • Use when data are matched or paired.
    • Before-after
    • Control-subject matched before treatment
    • Twins, sibs
  • Why? More power, so it is important to use a paired test when appropriate.
  • How? Subtract within each pair, then perform a one-sample t test on the differences (null hypothesis: mean difference = 0). This can be done in Excel.
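A minimal sketch of "subtract pairs, then one-sample t test", using Python's standard statistics module; the function name and the before/after values are hypothetical. The P value would then come from the t distribution with n - 1 degrees of freedom.

```python
import math
import statistics

def paired_t(before, after):
    """Paired t test: a one-sample t test on the within-pair differences (null mean = 0)."""
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sem_d = statistics.stdev(diffs) / math.sqrt(n)   # SEM of the differences
    return mean_d / sem_d, n - 1

# Hypothetical before/after measurements on the same five subjects
before = [10.0, 12.0, 9.0, 11.0, 13.0]
after = [12.0, 13.0, 11.0, 12.0, 15.0]
t, df = paired_t(before, after)
print(round(t, 2), df)   # 6.53 4
```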


87
Interpreting P values
88
The question a P value answers
  • If there really were no difference overall, what is the probability of observing such a large difference just by coincidence of random sampling?
  • If no difference overall, what fraction of experiments of this size would result in a difference as big or bigger than I saw?
  • Answer is P value.
89
Common misconception
90
P value as summary of data
  • P value summarizes findings in one number.
  • Similar to mean, ratio, or percent difference.
  • You don’t have to ever use the word “significant”, which often confuses more than helps.
  • Whenever you summarize the data with one number, you emphasize one aspect and ignore other aspects.
91
Hypothesis Testing
  • Reduce P value to binary result, necessary when you need to make a decision based on one experiment
  • Define an arbitrary threshold, α, usually 0.05 by tradition.
  • If P < α: “Reject null hypothesis”, “statistically significant”.
  • If P > α: “Do not reject null hypothesis.” (Never say “accept”.) “Not statistically significant”.
  • All makes sense for quality control. Not always helpful for analyzing experimental data.
92
Hypothesis testing in quality control vs. science
93
Type I vs. Type II errors
94
Analogy to trial by jury
95
“Very significant”
  • Some scientists use adjectives:
    • P<0.05  “significant”  *
    • P<0.01 “very significant” **
    • P<0.001 “extremely significant” ***
    • P<0.1 “almost significant”


96
 
97