Prism currently offers three different multiple regression model frameworks: linear, Poisson, and logistic. This section describes options for linear and Poisson. For more information about how to perform multiple logistic regression, check out its section of the guide.
Multiple linear regression is used when Y is a continuous variable. Prism minimizes the sum-of-squares of the vertical distances between the data points and the curve. This method is often called a least squares method. This is the appropriate choice if you assume that the distribution of residuals (distances of the points from the predicted values) is Gaussian.
Poisson regression is used when every Y value is a count (0, 1, 2, ...) of objects or events. These must be the actual counts, not normalized in any way. If a machine says your sample had 98.5 radioactive decays per minute, but you asked the counter to count each sample for ten minutes, then it counted 985 radioactive decays. That is the value you should enter for Poisson regression. If the Y values are normalized counts rather than actual counts, you should not choose Poisson regression.
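The least squares criterion described above can be sketched in a few lines of Python. This is only an illustration of the quantity being minimized, not Prism's implementation; the function name `sum_of_squares` and the data are made up for the example.

```python
def sum_of_squares(coefs, rows, y):
    """Sum of squared vertical distances between observed and predicted Y.

    coefs[0] is the intercept; coefs[1:] pair up with the X values in each row.
    """
    total = 0.0
    for xs, observed in zip(rows, y):
        predicted = coefs[0] + sum(b * x for b, x in zip(coefs[1:], xs))
        total += (observed - predicted) ** 2
    return total


# Made-up data with two X variables, generated exactly from Y = 1 + 2*X1 + 3*X2.
rows = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
y = [1, 3, 4, 6, 8]

exact = sum_of_squares([1, 2, 3], rows, y)    # coefficients that generated the data
worse = sum_of_squares([1, 2, 2.5], rows, y)  # a slightly different candidate
```

Here `exact` is 0 because the first set of coefficients reproduces every Y value, while `worse` is larger. A least squares fit searches for the coefficients that make this sum as small as possible.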
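The conversion from a reported rate back to the actual count is simple arithmetic. The sketch below uses a hypothetical helper, `actual_count`, that also refuses inputs that cannot correspond to a whole number of events; it is an illustration, not part of Prism.

```python
def actual_count(rate_per_minute, minutes_counted):
    """Recover the raw event count from a rate and the counting time."""
    count = rate_per_minute * minutes_counted
    rounded = round(count)
    # A true count must be a whole number; a fractional result means the
    # inputs are not a simple rate-times-time pair.
    if abs(count - rounded) > 1e-6:
        raise ValueError("rate and counting time do not give a whole-number count")
    return rounded


decays = actual_count(98.5, 10)  # 985, the value to enter for Poisson regression
```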
One variable is the dependent, Y, variable and you must tell Prism which variable it is. The goal of multiple regression is to find the model that best predicts that variable.
Note that for multiple linear regression the Y variable should be continuous, and for Poisson regression it should be a count. If your outcome (Y) variable is binary (has only two possible values), you should use logistic regression instead.
Prism requires you to specify exactly what model you want to fit. It cannot automatically choose a set of variables or interactions for you.
The intercept is the value of Y when all the X values equal zero. You will almost always want to include the intercept. Only remove it from the model if you have a very strong reason, as this makes sense very rarely.
Each main effect multiplies one variable by a regression coefficient (parameter). You will almost always want to include all main effects in your model. If you uncheck one of the main effects, that X variable will essentially not be part of the analysis (unless that variable is part of an interaction or transform as explained below).
Each two-way interaction multiplies two variables together, and multiplies that product by a regression coefficient (parameter). Two-way interactions are often, but not always, used in multiple regression. Why "interaction"? Because the model uses the product of two variables. Of course, two variables can interact in many ways, not just the way captured by multiplying the two variables together.
Each three-way interaction multiplies three variables together, and multiplies that product by a regression coefficient (parameter). Three-way interactions are used less commonly than two-way interactions.
Prism lets you use the square, the cube, or the square root of any variable in the model. Let us know if you'd like Prism to offer other transforms when defining a multiple regression model.
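To make the kinds of terms concrete, here is a minimal sketch of how the intercept, main effects, interactions, and transforms turn one row of data into the quantities that each get their own regression coefficient. The function `model_terms`, the term labels, and the sample row are assumptions made for illustration; they do not reflect how Prism stores or names terms.

```python
import math

# Transforms named in the text: square, cube, and square root.
TRANSFORMS = {
    "square": lambda v: v ** 2,
    "cube": lambda v: v ** 3,
    "sqrt": math.sqrt,
}


def model_terms(x, interactions=(), transforms=(), intercept=True):
    """Return the model terms for one data row as {label: value}.

    x            -- dict mapping variable name to its value in this row
    interactions -- tuples of variable names to multiply together
    transforms   -- (variable name, transform name) pairs
    """
    terms = {}
    if intercept:
        terms["intercept"] = 1.0  # multiplied by the intercept coefficient
    for name, value in x.items():  # main effects
        terms[name] = value
    for names in interactions:  # two-way or three-way products
        terms["*".join(names)] = math.prod(x[n] for n in names)
    for name, kind in transforms:
        terms[f"{kind}({name})"] = TRANSFORMS[kind](x[name])
    return terms


row = {"age": 50, "weight": 81, "gender": 1}  # made-up values
terms = model_terms(
    row,
    interactions=[("age", "gender"), ("age", "weight", "gender")],
    transforms=[("weight", "sqrt")],
)
```

Each entry in `terms` is multiplied by its own coefficient, and the predicted Y is the sum of those products. Passing `intercept=False` corresponds to removing the intercept from the model.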
In this example, column A is blood pressure in mmHg, and the other columns hold age in years, weight in kg, and gender coded as 0=male and 1=female. If you select column A to be the dependent (outcome) variable and include age, weight, and gender in the model, the resulting equation (model) is:
Blood pressure = Beta0 + Beta1*age + Beta2*weight + Beta3*gender + random scatter
Prism finds the values of the four coefficients (beta values) that minimize the sum of the squares of the differences between the Y values in your data and the Y values predicted by the equation.
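As a rough illustration of what such a fit does, the sketch below solves the least squares problem for the blood pressure model by forming and solving the normal equations in plain Python. This is one textbook way to do it and is not a description of Prism's numerical method. The data are synthetic, generated without scatter from coefficient values chosen arbitrarily for the example, so the fit should simply recover them.

```python
def solve(a, b):
    """Solve the linear system a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_least_squares(rows, y):
    """Return [Beta0, Beta1, ...] minimizing the sum of squared residuals."""
    design = [[1.0] + list(r) for r in rows]  # leading 1 carries the intercept
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)


# Synthetic rows of (age in years, weight in kg, gender with 0=male, 1=female).
rows = [(30, 60, 0), (40, 85, 1), (50, 70, 0), (60, 90, 1), (45, 75, 1), (35, 95, 0)]
true_betas = [70.0, 0.5, 0.3, -4.0]  # arbitrary values chosen for the illustration
y = [true_betas[0] + true_betas[1] * a + true_betas[2] * w + true_betas[3] * g
     for a, w, g in rows]

betas = fit_least_squares(rows, y)  # approximately equal to true_betas
```

With real data the Y values include random scatter, so the best-fit coefficients are estimates rather than exact recoveries.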
The model is very simple, and it is surprising that it turns out to be so useful. For the blood pressure example, the model assumes:
• On average, blood pressure increases (or decreases) a certain amount (the best-fit value of Beta1) for every year of age. This amount is the same for men and women of all ages and all weights.
• On average, blood pressure increases (or decreases) a certain amount per kilogram (the best-fit value of Beta2). This amount is the same for men and women of all ages and all weights.
• On average, blood pressure differs by a certain amount between men and women (the best-fit value of Beta3). This amount is the same for people of all ages and weights.
The mathematical terms are that the model is linear and allows for no interaction. Linear means that holding other variables constant, the graph of blood pressure vs. age (or vs. weight) is a straight line. No interaction means that the slope of the blood pressure vs. age line is the same for all weights and for men and women.
If you checked the option to include the interaction between age and gender, the model would be:
Blood pressure = Beta0 + Beta1*age + Beta2*weight + Beta3*gender + Beta4*age*gender + random scatter
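The practical effect of the interaction term is that the slope of blood pressure vs. age is no longer the same for men and women. The sketch below shows this, writing the interaction coefficient as `b4` (a name assumed here so that it is distinct from the gender coefficient `b3`). The coefficient values are invented for the illustration, not fitted results.

```python
def predict_bp(age, weight, gender, b0, b1, b2, b3, b4=0.0):
    """Predicted blood pressure; b4 is the age*gender interaction coefficient."""
    return b0 + b1 * age + b2 * weight + b3 * gender + b4 * age * gender


def age_slope(weight, gender, **betas):
    """Change in predicted blood pressure for one additional year of age."""
    return predict_bp(51, weight, gender, **betas) - predict_bp(50, weight, gender, **betas)


# Hypothetical coefficients, chosen only to make the arithmetic easy to follow.
betas = dict(b0=70.0, b1=0.5, b2=0.3, b3=-4.0, b4=0.2)

slope_men = age_slope(weight=80, gender=0, **betas)    # about b1
slope_women = age_slope(weight=80, gender=1, **betas)  # about b1 + b4
```

With gender coded as 0=male and 1=female, the age slope is Beta1 for men and Beta1 plus the interaction coefficient for women. Leaving `b4` at its default of zero gives back the no-interaction model, in which the two slopes are equal.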