# Lingo

## Variables

A regression model predicts one variable Y from one or more other variables X. The Y variable is called the dependent variable, the response variable or the outcome variable. The X variables are called independent variables, explanatory variables or predictor variables.

Each X variable can be a value that the experimenter manipulated, a treatment that the experimenter selected or assigned, or a value that the experimenter measured.

Each independent variable can be continuous (e.g., age, blood pressure, weight) or binary (e.g., gender, defining zero as male and one as female). These codes, of course, are arbitrary. A variable that has only two possible values, coded in this way, is called a dummy variable.
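As a minimal sketch of the coding described above, the following Python snippet turns a binary variable into a 0/1 dummy variable. The data values and the names `genders`, `coding`, and `gender_dummy` are hypothetical and chosen only for illustration; the assignment of zero and one could just as well be reversed.

```python
# Hypothetical observations of a binary variable (illustrative data only).
genders = ["male", "female", "female", "male"]

# Arbitrary coding that mirrors the text: male -> 0, female -> 1.
coding = {"male": 0, "female": 1}

# The dummy variable is the list of codes, one per observation.
gender_dummy = [coding[g] for g in genders]

print(gender_dummy)
```

The resulting list of zeros and ones can then be used as an X variable alongside continuous variables.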

Consult advanced texts if you need to use a categorical variable with three or more categories (e.g., four medical school classes, three different countries, or four different stages of cancer). Regression with a categorical variable with more than two categories is not straightforward, and it is easy to do it incorrectly.

## Parameters

The multiple regression model defines the dependent variable as a function of the independent variables and a set of parameters, also called regression coefficients. Regression methods find the value of each parameter that makes the model predictions come as close as possible to the data. This approach is analogous to simple linear regression, which determines the values of the slope and intercept (the two parameters or regression coefficients of the model) to make the model predict Y from X as closely as possible.
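To make the idea of finding parameter values concrete, here is a rough sketch in plain Python. It assumes that "as close as possible" means minimizing the sum of squared differences between predictions and data (ordinary least squares), and it solves the resulting normal equations directly. The function name `fit_least_squares` and the data are hypothetical, and this is not a description of how Prism computes its results.

```python
def fit_least_squares(rows, y):
    """Return [intercept, b1, b2, ...] minimizing the sum of squared residuals.

    rows: one sequence of X values per observation.
    y:    one Y value per observation.
    """
    # Prepend 1.0 to each row so the first coefficient is the intercept.
    X = [[1.0] + list(r) for r in rows]
    p = len(X[0])

    # Build the normal equations (X'X) b = X'y as an augmented matrix.
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(p)]
        + [sum(x[i] * yi for x, yi in zip(X, y))]
        for i in range(p)
    ]

    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]

    return [A[i][p] / A[i][i] for i in range(p)]


# Hypothetical data generated from Y = 1 + 2*X1 + 3*X2 with no noise,
# so the fit should recover parameters close to 1, 2 and 3.
rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]
y = [9, 8, 19, 18, 26]

coefficients = fit_least_squares(rows, y)
print(coefficients)
```

With a single X value per row, the same function returns just an intercept and a slope, which is the simple linear regression case mentioned above.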

## Simple regression versus multiple regression

Simple regression refers to models with a single X variable. Multiple regression, also called multivariable regression, refers to models with two or more X variables.

## Univariate versus multivariate regression

Although they are beyond the scope of this book, methods do exist that can analyze several outcomes (Y variables) at once. These are called multivariate methods, and they include factor analysis, cluster analysis, principal components analysis, and multivariate ANOVA (MANOVA). These methods contrast with univariate methods, which deal with only a single Y variable.

Note that the terms multivariate and univariate are used inconsistently. Sometimes multivariate is used to refer to multivariable methods for which there is one outcome and several independent variables (e.g., multiple and logistic regression). And sometimes univariate is used to refer to simple regression with only one independent variable.

## Linear vs. nonlinear multiple regression

Prism performs only linear multiple regression. Here, linear means that Y is a linear function of each parameter. If you made a graph of how Y changes as you change any one parameter (while holding all the X values and all the other parameters constant), the graph would be a straight line.
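The straight-line property can be checked numerically. The sketch below uses a hypothetical two-variable model, Y = b0 + b1\*X1 + b2\*X2, holds the X values and the other parameters fixed, and changes one parameter in equal steps. For contrast, it does the same with a hypothetical exponential model whose rate parameter `k` does not enter linearly. All names and numbers are assumptions made for illustration.

```python
import math


def linear_model(b0, b1, b2, x1, x2):
    # Y is a linear function of each parameter b0, b1 and b2.
    return b0 + b1 * x1 + b2 * x2


def exponential_model(y0, k, x):
    # Y is NOT a linear function of the parameter k.
    return y0 * math.exp(k * x)


def steps(values):
    # Differences between consecutive Y values.
    return [b - a for a, b in zip(values, values[1:])]


parameter_values = [0.0, 1.0, 2.0, 3.0]

# Vary b2 only; hold X1 = 2, X2 = 3, b0 = 1 and b1 = 2 constant.
linear_y = [linear_model(1.0, 2.0, b2, 2.0, 3.0) for b2 in parameter_values]

# Vary k only; hold X = 1 and y0 = 1 constant.
nonlinear_y = [exponential_model(1.0, k, 1.0) for k in parameter_values]

linear_steps = steps(linear_y)
nonlinear_steps = steps(nonlinear_y)

print(linear_steps)     # equal steps: Y versus b2 is a straight line
print(nonlinear_steps)  # growing steps: Y versus k is curved
```

Equal steps in Y for equal steps in a parameter are what a straight-line graph looks like in numbers. The exponential model fails this check for `k`, which is why it counts as nonlinear.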

It is certainly possible to write models with one Y variable and multiple X variables related to Y via a nonlinear function. But Prism does not (yet) perform multiple nonlinear regression. Let us know, with details, if this would be helpful to you.