 Much of statistics can be viewed as comparing models

Finding the mean by fitting a model

You already know how to find the mean of a set of numbers: add them up and divide the total by the sample size. That is a convenient shortcut, but you can also reach the same answer by a longer route.

You can also find the mean by fitting a model. One way to think about this is to fit a linear regression model, but with the slope constrained to equal 0 so only the intercept is fit. When you fit such a model, the best-fit value of the Y intercept is the mean of your data. The mean is simply the best-fit parameter value from a simple model fit to your data.
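The idea can be checked numerically. The following is a minimal sketch, assuming NumPy is available and using made-up values for `y`; it fits an intercept-only least-squares model (slope fixed at 0) and compares the fitted intercept with the ordinary mean.

```python
import numpy as np

# Hypothetical sample values, chosen only for illustration.
y = np.array([4.0, 7.0, 5.0, 9.0, 10.0])

# Design matrix with a single column of ones: intercept only, slope fixed at 0.
X = np.ones((y.size, 1))

# Ordinary least squares: find the intercept that minimizes
# the sum of squared residuals.
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
intercept = coef[0]

# The two numbers agree up to floating-point rounding.
print(intercept, y.mean())
```

The design matrix has no column for X values, which is what "slope constrained to 0" amounts to in a least-squares fit.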

Comparing two means by fitting a model

The usual way to compare two means is to perform an unpaired t test.

You can also compare two means by comparing two linear regression models. In both models, constrain the slope to equal zero. With a slope of zero, you are fitting horizontal lines.

The first model constrains (shares) the intercepts to be the same for both groups. You are fitting one horizontal line through all the values. The intercept of this line is the grand mean of all the values.

The second model does not constrain the intercepts, so it finds a separate intercept for each group. In other words, it fits one horizontal line to the data from one group, and another horizontal line to the data from the other group. Each intercept is the mean of the respective group of data.

The P value from this comparison (an F test on the two fits) is the same as the P value from an unpaired t test that assumes equal variances. The P value answers this question:

If the first model is really correct, what is the chance that random sampling alone would make the second model fit the data at least this much better than the first?
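The equivalence of the two approaches can be illustrated with a short sketch. It assumes NumPy, uses made-up data in `group_a` and `group_b`, and compares the two horizontal-line models with an extra sum-of-squares F ratio. The F ratio equals the square of the equal-variance unpaired t statistic, and because an F distribution with 1 and df degrees of freedom is the distribution of a squared t with df degrees of freedom, the two P values coincide.

```python
import numpy as np

# Hypothetical measurements for two groups, for illustration only.
group_a = np.array([5.1, 4.8, 6.0, 5.5, 5.9])
group_b = np.array([6.4, 7.1, 6.8, 5.9, 7.3, 6.6])
pooled = np.concatenate([group_a, group_b])

def sum_sq(values, center):
    """Sum of squared residuals around a horizontal line at `center`."""
    return float(np.sum((values - center) ** 2))

# Model 1: one shared intercept (the grand mean) for all values.
ss_shared = sum_sq(pooled, pooled.mean())
df_shared = pooled.size - 1

# Model 2: a separate intercept (the group mean) for each group.
ss_separate = sum_sq(group_a, group_a.mean()) + sum_sq(group_b, group_b.mean())
df_separate = pooled.size - 2

# Extra sum-of-squares F ratio comparing the two models.
f_ratio = ((ss_shared - ss_separate) / (df_shared - df_separate)) / (
    ss_separate / df_separate
)

# Unpaired t statistic (equal variances assumed), computed the usual way.
pooled_var = ss_separate / df_separate
std_err = np.sqrt(pooled_var * (1 / group_a.size + 1 / group_b.size))
t_stat = (group_a.mean() - group_b.mean()) / std_err

# These two numbers agree up to floating-point rounding.
print(f_ratio, t_stat ** 2)
```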

Interpreting P values via comparing models

In almost all cases, you can understand a P value as the answer to a question about comparing models. In a clinical study analyzed by logistic regression, the key question is whether the treatment made a difference. Fit one logistic regression model that ignores treatment, and another that includes treatment as one of the parameters. If the P value is small, the data give you reason to conclude that the treatment mattered. Does the treatment work equally well in men and women? Fit one model that includes a treatment-by-gender interaction term and compare it to a model that omits that term. If the P value is small, the data give you reason to conclude that the treatment effect differed between men and women.
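As a rough sketch of the treatment question, the code below fits the two logistic regression models and compares them with a likelihood ratio test. The choice of test is an assumption here, since the text does not name one, and the P value relies on the usual large-sample chi-square approximation with 1 degree of freedom. The data are invented, and `fit_logistic` is a hypothetical helper written for this example, not a library function.

```python
import math
import numpy as np

def fit_logistic(X, y, n_iter=25):
    """Fit logistic regression by Newton-Raphson.

    Returns (coefficients, maximized log-likelihood).
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        gradient = X.T @ (y - p)
        hessian = X.T @ (X * (p * (1 - p))[:, None])
        beta = beta + np.linalg.solve(hessian, gradient)
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    log_lik = float(np.sum(y * np.log(p) + (1 - y) * np.log(1 - p)))
    return beta, log_lik

# Hypothetical trial: 20 control and 20 treated subjects.
# recovered = 1 means the subject recovered, 0 means not.
treated = np.array([0] * 20 + [1] * 20, dtype=float)
recovered = np.array([1] * 8 + [0] * 12 + [1] * 15 + [0] * 5, dtype=float)

ones = np.ones(treated.size)
X_null = ones[:, None]                      # intercept only: treatment ignored
X_full = np.column_stack([ones, treated])   # intercept plus treatment

_, ll_null = fit_logistic(X_null, recovered)
_, ll_full = fit_logistic(X_full, recovered)

# Likelihood ratio statistic; the models differ by one parameter.
lr_stat = 2 * (ll_full - ll_null)

# Upper tail of a chi-square distribution with 1 degree of freedom.
p_value = math.erfc(math.sqrt(lr_stat / 2))

print(lr_stat, p_value)
```

The same pattern would apply to the men-versus-women question: fit the model with and without the extra term, then compare the two log-likelihoods.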