Unequal weighting in nonlinear regression
|
|
The idea of unequal weighting
Regression is most often done by minimizing the sum-of-squares of the vertical distances of the data from the line or curve. Points farther from the curve contribute more to the sum-of-squares; points close to the curve contribute little. This makes sense when you expect the experimental scatter to be the same, on average, in all parts of the curve.
In many experimental situations, however, you expect the average distance (or rather the average absolute value of the distance) of the points from the curve to be higher when Y is higher. The points with larger scatter then contribute much larger squared distances and thus dominate the calculations. To restore equal weighting to all the data points, you can choose a weighting method, as described below. Prism offers six choices on the Weights tab of nonlinear regression: no weighting, 1/Y², 1/Y, 1/X, 1/X², and 1/SD².
Relative weighting (weighting by 1/Y²)
The weighting method used most often is called weighting by 1/Y². It is easier to think of this method as minimizing the sum-of-squares of the relative distances of the data from the curve. This method is appropriate when you expect the average distance of the points from the curve to be higher when Y is higher, but the relative distance (distance divided by Y) to be constant. In this common situation, minimizing the ordinary (unweighted) sum-of-squares is inappropriate because points with high Y values have a large influence on the sum-of-squares, while points with small Y values have little influence. Minimizing the sum of the squares of the relative distances restores equal weighting to all points.
There are two ways to express the equation describing the quantity that nonlinear regression minimizes, shown below. The form on the left is easier to understand: you divide the distance of the data from the curve by the Y value of the data to obtain the relative distance, and then square that result.
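Reconstructed from the description above, with Y_data the observed Y value, Y_curve the Y value predicted by the curve at the same X, and each sum running over all data points, the two equivalent forms are:

$$
\sum \left( \frac{Y_{\text{data}} - Y_{\text{curve}}}{Y_{\text{data}}} \right)^2 \;=\; \sum \frac{1}{Y_{\text{data}}^2} \left( Y_{\text{data}} - Y_{\text{curve}} \right)^2
$$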
Most books on nonlinear regression use the equivalent form shown on the right: you first square the distance of the data from the curve, and then multiply that value by a weighting factor equal to 1/Y². That explains why relative weighting is often called weighting by 1/Y².
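A minimal numerical check of the two forms, sketched in Python with NumPy. The function names and the data are hypothetical: the data are constructed so that every point lies 10% above the curve, mimicking constant relative scatter. The sketch shows that both expressions give the same number, and that without weighting the point with the largest Y dominates the sum-of-squares.

```python
import numpy as np


def relative_ss(y_data, y_curve):
    """Left-hand form: divide each distance by Y_data, square, then sum."""
    return np.sum(((y_data - y_curve) / y_data) ** 2)


def weighted_ss(y_data, y_curve):
    """Right-hand form: square each distance, multiply by the weight 1/Y_data^2, then sum."""
    weights = 1.0 / y_data**2
    return np.sum(weights * (y_data - y_curve) ** 2)


# Hypothetical example: every point lies 10% above the curve, so the relative
# scatter is constant while the absolute scatter grows with Y.
y_curve = np.array([1.0, 10.0, 100.0, 1000.0])
y_data = 1.1 * y_curve

unweighted_terms = (y_data - y_curve) ** 2
relative_terms = ((y_data - y_curve) / y_data) ** 2

# Share of the unweighted sum-of-squares contributed by each point:
# the point with the largest Y supplies about 99% of it.
print(unweighted_terms / unweighted_terms.sum())
# With relative weighting every point contributes the same amount.
print(relative_terms)
# The two forms of the weighted sum-of-squares agree.
print(np.isclose(relative_ss(y_data, y_curve), weighted_ss(y_data, y_curve)))  # True
```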
Weighting by 1/Y
Weighting by 1/Y is a compromise between minimizing the actual distance squared and minimizing the relative distance squared. One situation where 1/Y weighting is appropriate is when the Y values follow a Poisson distribution. This would be the case when Y values are radioactive counts and most of the scatter is due to counting error. With the Poisson distribution, the standard deviation equals the square root of the mean, so the standard error of a count is estimated as the square root of that count. Therefore you divide the distance between the data and the curve by the square root of the value, and then square that result. The equation below shows the quantity that Prism minimizes, and shows why it is called weighting by 1/Y.
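Reconstructed from the description above, with Y_data the observed Y value, Y_curve the Y value predicted by the curve at the same X, and each sum running over all data points, the quantity can be written in two equivalent forms; the right-hand form makes the weighting factor of 1/Y explicit:

$$
\sum \left( \frac{Y_{\text{data}} - Y_{\text{curve}}}{\sqrt{Y_{\text{data}}}} \right)^2 \;=\; \sum \frac{1}{Y_{\text{data}}} \left( Y_{\text{data}} - Y_{\text{curve}} \right)^2
$$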
Weighting by 1/X
The choices to weight by 1/X or 1/X² are rarely used. They are useful when you want to give points at the left part of the graph (small X) more weight than points to the right.
Weighting by observed standard deviation
Prism also offers the choice to weight by the reciprocal of the standard deviation squared. This means that data with little scatter (smaller standard deviation) get more weight than data with lots of scatter. This option is useful if you understand how the scatter (or errors) arise in your experimental system and can calculate appropriate weighting factors based on theory. Format a data table for entry of mean and SD, and enter (or paste) the weighting factors into the SD column.
Don't use 1/SD² weighting if the SD values are computed from only a few replicates. Random scatter can cause some SD values to be high and some low, and these differences may not reflect consistent differences in variability. You want to choose a weighting scheme that accounts for systematic differences in the predicted amount of variability if you were to repeat the experiment many times. You should not choose weighting based on variability you happened to observe in one small experiment.
If you choose to weight by 1/SD², Prism minimizes this quantity:
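Reconstructed from the description above, with Y_data the observed Y value, Y_curve the Y value predicted by the curve at the same X, SD the value in the SD column for that point, and each sum running over all data points:

$$
\sum \frac{1}{\mathrm{SD}^2} \left( Y_{\text{data}} - Y_{\text{curve}} \right)^2 \;=\; \sum \left( \frac{Y_{\text{data}} - Y_{\text{curve}}}{\mathrm{SD}} \right)^2
$$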
How Prism 5 implements weighting
Prism 5 weights by the Y value of the curve. Previous versions weighted by the Y value of the data. The distinction is subtle and rarely matters much, but our simulations show that the results are sometimes more accurate when weights are based on the value of the curve rather than the data.
The situation is a bit tricky. The goal is to adjust the values of the parameters to minimize the weighted sum-of-squares. But the values of the weights depend on the values of those parameters. Here is how Prism resolves this issue:
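As a generic illustration of this circularity only (not a reproduction of Prism's internal steps), one common approach is iteratively reweighted least squares: fit, compute weights from the current curve, refit with those weights held fixed, and repeat until the parameters stop changing. The sketch uses SciPy's `curve_fit`, whose `sigma` argument makes it minimize the sum of squared residuals divided by `sigma`; passing `sigma = Y_curve` from the previous fit therefore gives relative (1/Y²) weighting based on the curve rather than the data. The exponential model, the helper name `fit_relative_weights_from_curve`, the simulated data, and the convergence tolerance are all hypothetical.

```python
import numpy as np
from scipy.optimize import curve_fit


def model(x, amplitude, rate):
    """Hypothetical model used only for illustration: one-phase exponential decay."""
    return amplitude * np.exp(-rate * x)


def fit_relative_weights_from_curve(x, y, p0, max_iter=50, rtol=1e-6):
    """Iteratively reweighted fit with 1/Y^2 weights taken from the fitted curve.

    Hypothetical helper: a sketch of one way to handle weights that depend on
    the parameters, not Prism's actual algorithm.
    """
    # An unweighted fit (sigma=None) gives a first estimate of the curve.
    params, _ = curve_fit(model, x, y, p0=p0)
    for _ in range(max_iter):
        # Compute Y_curve from the current parameters. With sigma = Y_curve,
        # curve_fit minimizes sum(((y - f(x)) / Y_curve)**2), i.e. 1/Y_curve^2
        # weights, and those weights stay fixed during this fit.
        y_curve = model(x, *params)
        new_params, _ = curve_fit(model, x, y, p0=params, sigma=np.abs(y_curve))
        # Stop once the parameters no longer change appreciably.
        if np.allclose(new_params, params, rtol=rtol, atol=0.0):
            return new_params
        params = new_params
    return params


# Simulated data with roughly constant relative (5%) scatter.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 5.0, 25)
y = model(x, 100.0, 0.8) * (1.0 + 0.05 * rng.standard_normal(x.size))

params = fit_relative_weights_from_curve(x, y, p0=[50.0, 0.5])
print("amplitude = %.2f, rate = %.3f" % tuple(params))
```

Holding the weights fixed within each inner fit keeps every step an ordinary weighted least-squares problem; the outer loop then updates the weights from the new curve.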
Weighting with robust regression or automatic outlier removal
As we explain in reference 1, it doesn't make sense to perform robust regression using unequal weights, because outliers can get too much weight. Therefore, if you choose both unequal weighting and robust fitting, Prism does the fitting assuming equal weights. However, it uses your weighting choice when creating a table of residuals and when counting the number of outliers (a choice you can make in the preferences tab).
If you choose both unequal weighting and automatic outlier removal, Prism first fits using robust regression (ignoring your weighting choice), and then uses the weighting factors to identify the outliers, as explained in reference 1.
Reference