|
1
|
- Harvey Motulsky
- hmotulsky@graphpad.com
- www.graphpad.com
|
|
2
|
|
|
3
|
- Linear Y=slope*X+intercept
- Polynomial Y = A + BX + CX^2 + DX^3 …
- Multiple (Y=A + BX1 + CX2…)
- Logistic (outcome is binary)
- Proportional hazards (outcome is survival time)
|
|
4
|
- A mathematical model is a description of a physical, chemical or
biological state or process.
- "A mathematical model is neither an hypothesis nor a theory. Unlike
scientific hypotheses, a model is not verifiable directly by an
experiment. For all models are both true and false.... The validation of
a model is not that it is "true" but that it generates good
testable hypotheses relevant to important problems."
-- R. Levins, Am. Scientist 54:421-31, 1966
- “All models are wrong, but some are useful.”
-- George E. P. Box
|
|
5
|
|
|
6
|
- Exponential equations are used to model many processes where the rate at
which something happens is proportional to the amount which is left.
- Ligands dissociating from receptors.
- Radioactive isotope decay.
- Drug metabolism.
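
As a minimal sketch of what "rate proportional to the amount left" implies, the amount remaining follows Y = Y0*exp(-k*t), and the half-life is ln(2)/k. The starting amount and rate constant below are hypothetical values chosen only for illustration.

```python
import math

def exponential_decay(y0, k, t):
    """Amount left at time t when the rate of loss is proportional to the amount left."""
    return y0 * math.exp(-k * t)

def half_life(k):
    """Time needed for the amount to fall to half of any starting value."""
    return math.log(2) / k

# Hypothetical values, not taken from the slides.
y0 = 100.0  # starting amount
k = 0.25    # rate constant, per minute

t_half = half_life(k)
remaining = exponential_decay(y0, k, t_half)
print(f"half-life = {t_half:.3f} min, amount left at that time = {remaining:.1f}")
```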
|
|
7
|
|
|
8
|
|
|
9
|
- Fitting a model would be easy if the data completely followed the model.
- Experimental error adds complexity.
- Have to choose a model for distribution of scatter.
- Most common assumption is Gaussian
|
|
10
|
|
|
11
|
- If scatter really follows a Gaussian distribution, then two points each
5 units from the curve are far more likely than one point 1 unit away
and another 9 units away.
- So minimizing the sum of distances won’t work (those two examples have
the same sum).
- Instead, minimize sum of squares of distances. “Least squares”
regression.
- Mathematicians: “Maximum likelihood”
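
The comparison above can be checked numerically. This sketch computes the sum of distances, the sum of squares, and the Gaussian log-likelihood for the two sets of deviations. The scatter SD of 5 is an assumption made only for this illustration; the ranking of the two cases is the same for any SD.

```python
import math

def gaussian_log_likelihood(residuals, sd):
    """Log-likelihood of independent residuals under Gaussian scatter with the given SD."""
    return sum(
        -0.5 * math.log(2 * math.pi * sd ** 2) - r ** 2 / (2 * sd ** 2)
        for r in residuals
    )

case_a = [5.0, 5.0]  # two points, each 5 units from the curve
case_b = [1.0, 9.0]  # one point 1 unit away, another 9 units away
sd = 5.0             # assumed SD of the scatter (hypothetical)

sum_abs_a = sum(abs(r) for r in case_a)
sum_abs_b = sum(abs(r) for r in case_b)
ss_a = sum(r ** 2 for r in case_a)
ss_b = sum(r ** 2 for r in case_b)
ll_a = gaussian_log_likelihood(case_a, sd)
ll_b = gaussian_log_likelihood(case_b, sd)

print("sum of distances:", sum_abs_a, sum_abs_b)  # identical
print("sum of squares:  ", ss_a, ss_b)            # smaller for case A
print("log-likelihood:  ", ll_a, ll_b)            # larger for case A
```

The case with the smaller sum of squares is also the case with the larger likelihood, which is why least squares and maximum likelihood agree when scatter is Gaussian.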
|
|
12
|
- A model for what’s going on
- A model for random scatter
- A goal (minimize sum of squares of vertical distances between points and
curve)
|
|
13
|
- Scale to reasonable units. No huge or tiny numbers.
- Don’t smooth
- Transform if it makes the scatter more Gaussian. Don’t transform to make
linear.
- Prune obvious outliers?
- If X is log(concentration), what about a concentration of zero? Its log
is undefined, so people often enter it as a very low concentration.
|
|
14
|
- Based on physical or chemical model
- Examples: exponential decay, equilibrium binding, enzyme velocity as a
function of [substrate]
- Empirical
|
|
15
|
- You can write the equation either way.
- The best-fit results will be the same either way.
- You can easily convert EC50 to/from logEC50.
- The SE and CI values are different, and not convertible…
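
A short sketch of the two ways of writing the equation, assuming the usual four-parameter dose-response form (the equation itself does not appear on this slide, so the form and the parameter names are assumptions). Both functions return the same Y for the same dose; only the parameter being fitted differs.

```python
import math

def response_from_log_ec50(x_log_conc, bottom, top, log_ec50, hill_slope):
    """Dose-response written with logEC50; X is the log of concentration."""
    return bottom + (top - bottom) / (1 + 10 ** ((log_ec50 - x_log_conc) * hill_slope))

def response_from_ec50(conc, bottom, top, ec50, hill_slope):
    """The same curve written with EC50; the input is concentration itself."""
    return bottom + (top - bottom) / (1 + (ec50 / conc) ** hill_slope)

# Hypothetical parameter values for illustration only.
log_ec50 = -6.0
ec50 = 10 ** log_ec50          # convert logEC50 to EC50
x = -5.5                       # log of concentration

y_log = response_from_log_ec50(x, 0.0, 100.0, log_ec50, 1.0)
y_lin = response_from_ec50(10 ** x, 0.0, 100.0, ec50, 1.0)
y_mid = response_from_log_ec50(log_ec50, 0.0, 100.0, log_ec50, 1.0)
print(y_log, y_lin, y_mid)
```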
|
|
16
|
|
|
17
|
- A model defines Y as a function of X and one or more parameters.
- You can ask the nonlinear regression program to find best-fit values for
any number of those.
- You can set some to constant values. Do so when you know more than the
data.
- If the baseline HAS to be zero, then set it to a constant.
- If the top value in a dose-response curve has to be 100%, then make
that a constant.
|
|
18
|
- Constrain rate constants to positive values
- Constrain fractions to be between 0 and 1
- Etc.
|
|
19
|
- Usually not a big deal if you understand the equation and view the data.
- For dose-response curve:
- Initial value of top can be the max Y value
- Initial value of bottom can be the min Y value
- Initial value of logEC50 can be the middle of the X range
- Initial value of HillSlope can be 1 or -1
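
The rules above are simple enough to automate. This is a sketch, not a description of what any particular program does: the function name is hypothetical, the data are made up, and choosing the sign of HillSlope from the direction of the data is an added assumption.

```python
def dose_response_initial_values(x_log_conc, y):
    """Rough starting values for top, bottom, logEC50 and HillSlope."""
    pairs = sorted(zip(x_log_conc, y))        # order the points by X
    y_at_lowest_x = pairs[0][1]
    y_at_highest_x = pairs[-1][1]
    return {
        "top": max(y),
        "bottom": min(y),
        "log_ec50": (min(x_log_conc) + max(x_log_conc)) / 2,
        # Assumption: +1 if the response rises with dose, otherwise -1.
        "hill_slope": 1.0 if y_at_highest_x >= y_at_lowest_x else -1.0,
    }

# Hypothetical data: X is log(concentration), Y is the response.
x = [-9.0, -8.0, -7.0, -6.0, -5.0, -4.0]
y = [2.0, 5.0, 20.0, 60.0, 92.0, 99.0]

start = dose_response_initial_values(x, y)
print(start)
```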
|
|
20
|
- If variability is really Gaussian, it gives the best answer.
- In math lingo: Results match maximum likelihood estimate (assuming
Gaussian scatter).
- If variability is Gaussian, one big deviation is much less likely than
several small deviations.
- Two deviations of 5, much more likely than one deviation of 9 and
another of 1. SS=50 vs 82.
- But, outliers can muck things up!
|
|
21
|
- If scatter increases as Y increases, then the assumptions of regression
aren’t met. Points with larger Y values will have larger deviations (and
much larger squared deviations) and will dominate the calculations.
- Solution: Weight (or unweight) the squared distances so that the average
weighted squared distance is the same at all points along the curve.
- Problem: It’s hard to know when to weight.
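
To make the effect concrete, here is a sketch with made-up data in which every point lies 10% above the curve. Weighting by 1/Y^2 (relative weighting) is one common choice and is an assumption here; it suits the case where the SD of the scatter is proportional to Y.

```python
def sum_of_squares(y_obs, y_fit, weights=None):
    """Sum of (optionally weighted) squared vertical distances."""
    if weights is None:
        weights = [1.0] * len(y_obs)
    return sum(w * (yo - yf) ** 2 for w, yo, yf in zip(weights, y_obs, y_fit))

# Hypothetical curve values and observations (each observation is 10% high).
y_fit = [10.0, 100.0, 1000.0]
y_obs = [11.0, 110.0, 1100.0]

unweighted = sum_of_squares(y_obs, y_fit)
share_of_largest = (y_obs[-1] - y_fit[-1]) ** 2 / unweighted

relative_weights = [1.0 / yf ** 2 for yf in y_fit]   # assumed weighting scheme
weighted = sum_of_squares(y_obs, y_fit, relative_weights)

print(f"unweighted SS = {unweighted}, largest point contributes {share_of_largest:.1%}")
print(f"weighted SS   = {weighted:.4f}, each point contributes equally")
```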
|
|
22
|
|
|
23
|
- Approach 1. Enter each replicate value as a separate data point.
- Appropriate when errors are independent… when the experimental error in
one replicate is no more related to the other replicates at the same X
than to other data in the experiment.
- Approach 2. Enter only the means.
- Appropriate when replicates are not independent. Example: Dose response
curve, where each dose is a separate animal.
|
|
24
|
- Outliers can throw things off a lot, especially if you don’t have many
points.
- One solution is to remove outliers, but hard to decide which points to
remove.
- Another solution is to use a weighting method that gives less weight to
outlying points. Robust nonlinear regression.
|
|
25
|
- Method of steepest descent. Head downhill step by step. Good initially, slow later.
- Gauss-Newton method. Assume surface is parabola. From position and
slopes, find the bottom. Repeat. Bad initially. Fine later.
- Marquardt-Levenberg. Blends steepest descent with Gauss-Newton.
- Simplex. Not used much. No confidence intervals.
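
The blending idea can be sketched in a few lines for a two-parameter exponential decay, Y = Y0*exp(-k*X). This is a teaching sketch under stated assumptions, not the algorithm of any particular program: the damping rule (scale the diagonal by 1 + lambda, change lambda tenfold after each step) is one common variant, and the data are noise-free values generated from Y0 = 100 and k = 0.5.

```python
import math

def model(x, y0, k):
    return y0 * math.exp(-k * x)

def sum_sq(xs, ys, y0, k):
    return sum((y - model(x, y0, k)) ** 2 for x, y in zip(xs, ys))

def fit_decay(xs, ys, y0, k, lam=0.001, max_iter=200, tol=1e-12):
    """Damped Gauss-Newton fit. Small lam behaves like Gauss-Newton,
    large lam like short steepest-descent steps."""
    ss = sum_sq(xs, ys, y0, k)
    for _ in range(max_iter):
        # Accumulate J^T J (2x2) and J^T r (2-vector) at the current parameters.
        a11 = a12 = a22 = g1 = g2 = 0.0
        for x, y in zip(xs, ys):
            e = math.exp(-k * x)
            j1, j2 = e, -y0 * x * e      # dY/dY0 and dY/dk
            r = y - y0 * e               # residual
            a11 += j1 * j1
            a12 += j1 * j2
            a22 += j2 * j2
            g1 += j1 * r
            g2 += j2 * r
        d11, d22 = a11 * (1 + lam), a22 * (1 + lam)   # damp the diagonal
        det = d11 * d22 - a12 * a12
        if det == 0:
            break
        step_y0 = (g1 * d22 - a12 * g2) / det
        step_k = (d11 * g2 - a12 * g1) / det
        new_ss = sum_sq(xs, ys, y0 + step_y0, k + step_k)
        if new_ss < ss:                  # better fit: accept, trust Gauss-Newton more
            done = (ss - new_ss) <= tol * ss
            y0, k, ss = y0 + step_y0, k + step_k, new_ss
            lam /= 10
            if done:
                break
        else:                            # worse fit: reject, lean toward descent
            lam *= 10
            if lam > 1e12:
                break
    return y0, k, ss

xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [model(x, 100.0, 0.5) for x in xs]   # noise-free hypothetical data
y0_fit, k_fit, ss_fit = fit_decay(xs, ys, y0=80.0, k=0.2)
print(y0_fit, k_fit, ss_fit)
```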
|
|
26
|
|
|
27
|
- Prism stops iterating and declares the results to have converged when
two iterations in a row change the sum-of-squares by less than 0.01%.
- If you check the box for strict convergence criteria, Prism will
continue the iterations until five consecutive iterations each reduce
the sum-of-squares by less than 0.000001%.
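
A sketch of how such a stopping rule might be coded, assuming the program keeps a list of the sum-of-squares after each iteration. The function and argument names are hypothetical, and the history values are made up; 0.01% corresponds to a fractional change of 1e-4 and 0.000001% to 1e-8.

```python
def has_converged(ss_history, rel_tol=1e-4, consecutive=2):
    """True when each of the last `consecutive` iterations changed the
    sum-of-squares by less than `rel_tol` (as a fraction of the previous value)."""
    if len(ss_history) < consecutive + 1:
        return False
    recent = ss_history[-(consecutive + 1):]
    for previous, current in zip(recent, recent[1:]):
        if previous == 0:
            if current != 0:
                return False
            continue
        if abs(previous - current) / previous >= rel_tol:
            return False
    return True

# Hypothetical sum-of-squares after successive iterations.
history = [120.0, 52.0, 50.001, 50.0009, 50.00089]

default_result = has_converged(history)                                # 0.01%, two in a row
strict_result = has_converged(history, rel_tol=1e-8, consecutive=5)    # stricter rule
early_result = has_converged(history[:3])
print(default_result, strict_result, early_result)
```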
|
|
28
|
- Y = Intercept + slope*X
- Special to mathematicians, as it is much easier to find best-fit values
for slope and intercept. Initial values and iterations not needed. Local
minima impossible.
- From a scientist’s point of view, not all that different.
- Choose when the linear model makes sense, but don’t bend over backwards
to make a model linear.
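
To illustrate why no initial values or iterations are needed, this sketch computes the least-squares slope and intercept directly from the standard closed-form expressions. The data are hypothetical.

```python
def linear_fit(xs, ys):
    """Least-squares slope and intercept, computed in one pass with no iteration."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = linear_fit(xs, ys)
print(f"slope = {slope:.3f}, intercept = {intercept:.3f}")
```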
|
|
29
|
- Nonlinear regression is great when you want to fit to a chemical,
physiological, physical, or empirical model.
- Consider different tools if you have a
different goal:
- Draw nice looking curve for figure.
- Interpolate from a standard curve
- Create “black box” for simulation
|
|
30
|
|
|
31
|
- Polynomial regression.
- Transform to make linear (Scatchard, Lineweaver-Burk, etc.)
- Computer programs that pick an equation for you.
|
|
32
|
- Equation: Y = A + BX + CX^2 + DX^3 …
- Advantages: Easy calculations. Available in lots of programs. No need
for initial values.
- Disadvantage: Biological and chemical processes rarely follow polynomial
models.
- Take home message: Beware the term “curve fitting”.
|
|
33
|
|
|
34
|
|
|
35
|
|
|
36
|
|
|
37
|
|
|
38
|
|
|
39
|
|
|
40
|
- Values of the variables
- SE and CI of the variables
- R squared
- Sum-of-squares
|
|
41
|
|
|
42
|
- Data really follow model described by equation
- Scatter is Gaussian, with same SD all along the curve
- All experimental error is in Y (not X)
- Independence. Experimental “error” in one point is not affected by its
neighbors.
- No systematic errors in your measurements.
|
|
43
|
- Did the program converge on a solution?
- An error message doesn’t mean a bug in the program, just a problem with
data, model, initial values…
- Are the best-fit values scientifically plausible?
- Are the confidence intervals narrow?
- Does the curve come close to the points?
|
|
44
|
|
|
45
|
|
|
46
|
|
|
47
|
|
|
48
|
|
|
49
|
|
|
50
|
|
|
51
|
|
|
52
|
|
|
53
|
|
|
54
|
|
|
55
|
|
|
56
|
|
|
57
|
- Is the fit a local minimum?
- Are the residuals random?
- Are there too few runs?
|
|
58
|
|
|
59
|
- All iterative methods of nonlinear regression stop when any small change
in the values of the parameters makes the fit worse (increases the
sum-of-squares).
- No iterative method will “know” if a much better fit is possible by
making some large changes in the parameters.
- Nonlinear regression finds a valley, but may not realize there is a
deeper valley over the ridge.
|
|
60
|
|
|
61
|
|
|
62
|
- A run is a series of consecutive points (may be just one) that are all
on the same side of the curve.
- If the points are randomly distributed around the curve, you can predict
the number of runs expected from the number of positive (A) and negative
(B) residuals. Expected # of runs = 1 + 2AB/(A+B)
- P value: Probability of obtaining so few runs (or fewer) by chance.
- Too few runs (low P value) suggests that the model is wrong. Systematic
deviations.
- What does it mean if there are too many runs?
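
Counting runs and computing the expected number takes only a few lines. In this sketch the residuals are hypothetical, and residuals of exactly zero are ignored, which is an assumption. The P value is not computed here.

```python
def count_runs(residuals):
    """Number of runs: stretches of consecutive residuals with the same sign."""
    signs = [r > 0 for r in residuals if r != 0]   # zeros ignored (assumption)
    if not signs:
        return 0
    runs = 1
    for previous, current in zip(signs, signs[1:]):
        if previous != current:
            runs += 1
    return runs

def expected_runs(residuals):
    """Expected number of runs, 1 + 2AB/(A+B), if the signs occur in random order."""
    a = sum(1 for r in residuals if r > 0)
    b = sum(1 for r in residuals if r < 0)
    return 1 + 2 * a * b / (a + b)

# Hypothetical residuals, listed in order of increasing X.
residuals = [-2.1, -1.4, -0.3, 0.8, 1.9, 2.2, 1.1, 0.4, -0.9, -1.7]

observed = count_runs(residuals)
expected = expected_runs(residuals)
print(f"observed runs = {observed}, expected runs = {expected}")
```

Here the residuals form three long stretches (below, above, below the curve), so the observed count is well under the expected count, the pattern that suggests systematic deviation.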
|
|
63
|
|
|
64
|
- Check that both fits make sense
- Reject a fit that has best-fit values that are biologically implausible.
- Negative rate constants
- Fractions that are <0 or >1
- EC50 values not in the range of the data
- Reject a fit if the confidence intervals for the best-fit values are
super wide.
|
|
65
|
|
|
66
|
|
|
67
|
|
|
68
|
- If the simpler model fits better (lower sum-of-squares)
- Accept it. No reason to consider more complicated model.
- No need for statistics.
|
|
69
|
- This does not mean you should accept the fancier model.
- If you add variables to a model, the curve wiggles more, so usually
comes closer to the data.
- Need to use statistics to ask whether the two-site fit is even better
than expected by chance.
|
|
70
|
|
|
71
|
|
|
72
|
- If the more complicated model fits better (lower sum-of-squares), the F test
answers this question:
- If the simpler model really were correct, what is the chance that the
more complicated model would fit this much better, or more so?
- Answer is P value
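
The F ratio behind that P value compares the relative drop in sum-of-squares with the relative drop in degrees of freedom. The sketch below uses the standard extra-sum-of-squares formula; the formula is not shown on this slide, so treat it as an assumption about what is meant, and the numbers are hypothetical. Converting F to a P value needs the F distribution, which is left out here.

```python
def f_ratio(ss_simple, df_simple, ss_complex, df_complex):
    """Extra sum-of-squares F ratio for two nested models.

    df is the number of data points minus the number of fitted parameters."""
    numerator = (ss_simple - ss_complex) / (df_simple - df_complex)
    denominator = ss_complex / df_complex
    return numerator / denominator

# Hypothetical fits: the complex model has two more parameters.
ss_simple, df_simple = 120.0, 10
ss_complex, df_complex = 60.0, 8

f_value = f_ratio(ss_simple, df_simple, ss_complex, df_complex)
print(f"F = {f_value} with {df_simple - df_complex} and {df_complex} degrees of freedom")
```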
|
|
73
|
- Set null hypothesis. In this example, that the simple model is correct.
- Do computations to express deviation of your data from null hypothesis
as a single value, in this case F ratio.
- Use math to determine the chance of obtaining such a large deviation, or
larger, by coincidence if the null hypothesis is true.
|
|
74
|
- Set null hypothesis. In this example, that the simple model is correct.
- Set alternative, more complicated, hypothesis.
- Calculate the observed difference in sum-of-squares.
- Use math to determine the chance of obtaining such a large difference,
or larger, by coincidence if the null hypothesis is true.
|
|
75
|
|
|
76
|
- Model 1: Y = GrandMean (enter as Y = Mean + 0*X if the program insists on X)
- Model 2: Y = GroupMean
|
|
77
|
- t= (Difference between means)/(SE of diff)
- SE of difference pools both SEMs, adjusting for sample size. SE of diff is
larger than either SEM. In this case, the two SEMs are 1.64 and 1.28 and the SE of diff is 2.09.
- t = (60.2 – 51.2)/2.09 = 4.31
- df = sum of Ns minus 2. So 5+5-2=8
- =TDIST(4.3,8,2) = 0.0026
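
The arithmetic can be reproduced from the numbers on this slide. With equal sample sizes, the pooled SE of the difference reduces to the square root of the sum of the squared SEMs; using the rounded SEMs gives about 2.08 rather than 2.09, presumably because the slide used unrounded values. The P value step (TDIST in Excel) is not reproduced here.

```python
import math

# Values quoted on the slide.
mean_a, mean_b = 60.2, 51.2
sem_a, sem_b = 1.64, 1.28
n_a = n_b = 5

# With equal n, the pooled SE of the difference is sqrt(SEM_a^2 + SEM_b^2).
se_diff = math.sqrt(sem_a ** 2 + sem_b ** 2)
t_value = (mean_a - mean_b) / se_diff
df = n_a + n_b - 2

print(f"SE of diff = {se_diff:.2f}, t = {t_value:.2f}, df = {df}")
```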
|
|
78
|
|
|
79
|
- By comparing models. If the simpler model (one population with one mean)
was correct, the chance that the more complicated model (two populations
with different means) would fit the data so much better (or more so) is
0.2%
- By t test. Assume null hypothesis that the two populations have same
mean. If so, the chance of observing sample means as far apart as we
observed (or more so) is 0.2%.
|
|
80
|
|
|
81
|
- Our sample: slope=0.7535
SE=0.1356 df=22
- Hypothetical slope = 1.0 (based on theory)
|
|
82
|
|
|
83
|
- Assume a null hypothesis: Curve really has a standard slope factor (Hill
slope) of 1.0.
- If the null hypothesis were true and you performed many experiments (same
size and design as ours), the observed (fitted) slope would be as far from
1.0 as we observed (or further) in 10.7% of them.
- Either a moderately rare coincidence has occurred -- or the null
hypothesis is wrong.
- The smaller the P value, the more apt you are to question the null
hypothesis.
|
|
84
|
- Assume the true mean is some hypothetical value (from theory or previous
data). “Null hypothesis”
- Account for the size and variability of the sample.
- What is the chance that random sampling would result in an observed mean
so far (or further) from the hypothetical mean?
|
|
85
|
|
|
86
|
- Use when data are matched or paired.
- Before-after
- Control-subject matched before treatment
- Twins, sibs
- Why? More power. So important to do paired test when appropriate.
- How? Subtract pairs, and perform one-sample t test on the differences. (excel)
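
The "subtract pairs" recipe can be sketched as follows, using hypothetical before and after measurements. Only the t ratio and degrees of freedom are computed; the P value lookup is left out.

```python
import math

def paired_t(before, after):
    """Paired t test done as a one-sample t test on the differences
    (null hypothesis: the mean difference is zero)."""
    differences = [a - b for b, a in zip(before, after)]
    n = len(differences)
    mean_diff = sum(differences) / n
    variance = sum((d - mean_diff) ** 2 for d in differences) / (n - 1)
    se_mean = math.sqrt(variance / n)
    return mean_diff / se_mean, n - 1

# Hypothetical matched measurements on four subjects.
before = [10.0, 12.0, 9.0, 11.0]
after = [12.0, 15.0, 10.0, 14.0]

t_value, df = paired_t(before, after)
print(f"t = {t_value:.2f} with {df} degrees of freedom")
```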
|
|
87
|
|
|
88
|
- If there really were no difference overall, what is the probability of
observing such a large difference just by coincidence of random
sampling?
- If no difference overall, what fraction of experiments of this size
would result in a difference as big or bigger than I saw?
- Answer is P value.
|
|
89
|
|
|
90
|
- P value summarizes findings in one number.
- Similar to mean, ratio, or percent difference.
- You never have to use the word “significant”, which often confuses
more than it helps.
- Whenever you summarize the data with one number, you emphasize one
aspect and ignore other aspects.
|
|
91
|
- Reduce P value to binary result, necessary when you need to make a
decision based on one experiment
- Define an arbitrary threshold, α, usually 0.05 by tradition.
- If P < α, “Reject null hypothesis”, “statistically significant”.
- If P > α, “Do not reject null hypothesis.” (Never say “accept”.)
“Not statistically significant”.
- This all makes sense for quality control. It is not always helpful for
analyzing experimental data.
|
|
92
|
|
|
93
|
|
|
94
|
|
|
95
|
- Some scientists use adjectives:
- P<0.05 “significant” *
- P<0.01 “very significant” **
- P<0.001 “extremely significant” ***
- P<0.1 “almost significant”
|
|
96
|
|
|
97
|
|