Prism currently offers three different multiple regression model frameworks: linear, Poisson, and logistic. This section describes options for linear and Poisson. For more information about how to perform multiple logistic regression, check out its section of the guide.
Multiple linear regression is used when Y is a continuous variable. Prism minimizes the sum-of-squares of the vertical distances between the data points and the values predicted by the model. This method is often called least squares. It is the appropriate choice if you assume that the distribution of residuals (the distances of the points from the predicted values) is Gaussian.
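As a minimal sketch of what least-squares fitting does (the data below and the use of NumPy's generic `lstsq` solver are illustrative assumptions, not Prism's implementation):

```python
import numpy as np

# Toy data: one predictor x, continuous outcome y (invented values).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Design matrix with an intercept column; lstsq finds the coefficients
# that minimize the sum of squared vertical distances between y and X @ beta.
X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

residuals = y - X @ beta
ss = np.sum(residuals ** 2)  # the quantity least squares minimizes

# Any other candidate line has a larger (or equal) sum of squares,
# e.g. intercept 0 and slope 2:
other = np.array([0.0, 2.0])
ss_other = np.sum((y - X @ other) ** 2)
```

The same idea carries over to multiple regression: more predictor columns in `X`, but still the coefficient vector that makes the sum of squared residuals as small as possible.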
Poisson regression is used when every Y value is a count (0, 1, 2, …) of objects or events. These must be the actual counts, not normalized in any way. If a machine reports that your sample had 98.5 radioactive decays per minute, but you asked the counter to count each sample for ten minutes, then it actually counted 985 radioactive decays. That is the value you should enter for Poisson regression. If the Y values are normalized counts rather than actual counts, do not choose Poisson regression.
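The arithmetic from the example above, spelled out (the variable names are just for illustration):

```python
# A counter reports a normalized rate of 98.5 decays per minute,
# but the sample was actually counted for 10 minutes.
rate_per_minute = 98.5
minutes_counted = 10

# The raw count is the rate multiplied by the counting time.
raw_count = rate_per_minute * minutes_counted  # 985.0 decays

# Poisson regression needs the actual integer count, not the rate.
y_for_poisson = round(raw_count)  # 985
```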
One variable is the dependent, Y, variable and you must tell Prism which variable it is. The goal of multiple regression is to find the model that best predicts that variable.
Note that the Y variable must be a continuous variable. If your outcome (Y) variable is binary (has only two possible values), you should use logistic regression rather than multiple regression.
The intercept is the value of the outcome variable when all the continuous predictor variables equal zero and the categorical predictor variables are set to their reference level. You will almost always want to include the intercept, so Prism fits its value. Only remove it from the model if you have a very strong reason to do so; this rarely makes sense. Removing the intercept from the model is equivalent to forcing it to equal zero.
Each main effect multiplies one predictor variable by a regression coefficient (parameter). You will almost always want to include all main effects in your model. For each continuous predictor variable, only one coefficient is required. The number of coefficients required for a categorical predictor variable is one fewer than the number of levels of that variable (due to the process of variable encoding). If you uncheck one of the main effects, that predictor variable will essentially not be part of the analysis (unless that variable is part of an interaction or transform as explained below).
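A sketch of why a categorical variable needs one fewer coefficient than it has levels: with reference-level (dummy) encoding, a variable with k levels becomes k−1 indicator columns. The helper below is hypothetical, not a Prism function, and it assumes the first level listed is the reference.

```python
def dummy_encode(values, levels):
    """Reference-level encoding: one 0/1 column per non-reference level."""
    reference, *others = levels  # first level acts as the reference
    return [[1 if v == level else 0 for level in others] for v in values]

# Two levels -> one column. With "female" as the reference,
# each observation gets a single Gender[Male] indicator.
genders = ["female", "male", "male", "female"]
columns = dummy_encode(genders, ["female", "male"])
# columns == [[0], [1], [1], [0]]
```

A three-level variable would get two columns, and so on; the reference level is represented by all indicators being zero, which is why its effect folds into the intercept.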
Each two-way interaction multiplies two predictor variables together, and multiplies that product by a regression coefficient (parameter). Two-way interactions are often, but not always, used in multiple regression. Why "interaction"? Because the model uses the product of two variables. Of course, two variables can interact in many ways, not just the way captured by multiplying them together.
Each three-way interaction multiplies three predictor variables together, and multiplies that product by a regression coefficient (parameter). Three-way interactions are used less commonly than two-way interactions.
Prism lets you use the square, the cube, or the square root of any continuous predictor variable in the model. Let us know if you'd like Prism to offer other transforms when defining a multiple regression model.
In this example, variable A is blood pressure in mmHg, variable B is age in years, variable C is weight in kg, and variable D is gender with levels “male” and “female”. If you select variable A to be the dependent (outcome) variable and include variables B, C, and D in the model, the resulting model can be represented as:
Blood pressure ~ Intercept + Age + Weight + Gender
The full mathematical model being fit to the data in this case is:
Blood pressure = β0 + β1*Age + β2*Weight + β3*Gender[Male]
Prism finds the values of the coefficients (beta values) that minimize the sum of the square of the differences between the values of the outcome variable in your data and the values predicted by the equation.
The model is very simple, and it is surprising that it turns out to be so useful. For the blood pressure example, the model assumes:
• On average, blood pressure increases (or decreases) a certain amount (the best-fit value of the beta coefficient for Age) for every year of age. This amount is the same for men and women of all ages and all weights.
• On average, blood pressure increases (or decreases) a certain amount per kilogram (the best-fit value of the beta coefficient for Weight). This amount is the same for men and women of all ages and all weights.
• On average, blood pressure is higher (or lower) by a certain amount for men compared to women (the best-fit value of the beta coefficient for “Gender[Male]”; in this case, “Female” was the reference level for the predictor variable “Gender”). This amount is the same for people of all ages and weights.
• The intercept of this model is harder to conceptualize, as it represents a female (the reference level of the Gender variable) with age and weight both equal to zero. Clearly this value doesn’t represent an observation that could exist in reality (neither age nor weight can equal zero), but it is an important value for the model, and can be used with interpolation to predict values that are more reasonable (such as the blood pressure of a female at the average values of age and weight).
In mathematical terms, the model is linear and includes no interaction. Linear means that, holding the other variables constant, the graph of blood pressure vs. age (or vs. weight) is a straight line. No interaction means that the slope of the blood pressure vs. age line is the same for all weights and for both men and women.
If you checked the option to include the interaction between age and gender, the model would be shown as:
Blood pressure ~ Intercept + Age + Weight + Gender + Age:Gender
The full mathematical model including the interaction term would be:
Blood pressure = β0 + β1*Age + β2*Weight + β3*Gender[Male] + β4*Age*Gender[Male]
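To see what the interaction term changes, here is a small sketch using made-up coefficient values (the β values below are hypothetical, not fitted to any data):

```python
# Hypothetical coefficients for the interaction model (illustration only).
b0, b_age, b_weight, b_male, b_age_male = 95.0, 0.4, 0.1, 6.0, 0.2

def predict_bp(age, weight, is_male):
    # Blood pressure = b0 + b1*Age + b2*Weight + b3*Gender[Male]
    #                  + b4*Age*Gender[Male]
    return (b0 + b_age * age + b_weight * weight
            + b_male * is_male + b_age_male * age * is_male)

# With the Age:Gender interaction, the blood-pressure-vs-age slope now
# differs by gender: b_age for females, b_age + b_age_male for males.
slope_female = predict_bp(41, 80, 0) - predict_bp(40, 80, 0)  # ≈ 0.4
slope_male   = predict_bp(41, 80, 1) - predict_bp(40, 80, 1)  # ≈ 0.6
```

Without the interaction term (β4 = 0), the two slopes would be identical, which is exactly the "no interaction" assumption described above.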