GraphPad Statistics Guide

Interpreting results: Correlation


Correlation coefficient

The correlation coefficient, r, ranges from -1 to +1. The nonparametric Spearman correlation coefficient, abbreviated rs, has the same range. This latter value is sometimes denoted by the Greek letter ρ (rho).

Value of r (or rs)    Interpretation
1.0                   Perfect correlation.
0 to 1                The two variables tend to increase or decrease together.
0.0                   The two variables do not vary together at all.
-1 to 0               One variable increases as the other decreases.
-1.0                  Perfect negative or inverse correlation.
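As a rough illustration of the two coefficients described above, the following Python sketch computes r and rs using only the standard library. The function names are hypothetical and this is not Prism's implementation. It assumes the usual definition of Spearman's rs as the Pearson correlation of the ranks, with tied values given their average rank.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_rs(x, y):
    """Spearman coefficient: Pearson correlation computed on the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]      # rises with x, but not along a straight line
r = pearson_r(x, y)        # below 1 because the relationship is not linear
rs = spearman_rs(x, y)     # 1 because the relationship is perfectly monotonic
```

With these illustrative data, r falls short of 1 while rs reaches it, because Spearman correlation depends only on the ordering of the values.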

If r or rs is far from zero, there are four possible explanations:

Changes in the X variable cause a change in the value of the Y variable.

Changes in the Y variable cause a change in the value of the X variable.

Changes in another variable influence both X and Y.

X and Y don’t really correlate at all, and you just happened to observe such a strong correlation by chance. The P value quantifies the likelihood that this could occur.

Notes on correlation coefficients:

If you choose Spearman nonparametric correlation, Prism computes the confidence interval of the Spearman correlation coefficient by an approximation. According to Zar (Biostatistical Analysis) this approximation should only be used when n>10. So with smaller n, Prism simply does not report the confidence interval of the Spearman correlation coefficient.

If you ask Prism to compute a correlation matrix (compute the correlation coefficient for each pair of variables), it computes a simple correlation coefficient for each pair, without regard for the other variables. It does not compute multiple regression, or partial regression, coefficients.
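The idea of a correlation matrix built from simple pairwise coefficients can be sketched as follows. This is a minimal illustration, not Prism's code: the function names, the column labels, and the data are all made up. Each pair of columns is correlated on its own, with no adjustment for the remaining columns.

```python
import math
from itertools import combinations

def pearson_r(x, y):
    """Simple (pairwise) Pearson correlation coefficient."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_matrix(columns):
    """Map each pair of column names to its simple correlation coefficient."""
    return {
        (name_a, name_b): pearson_r(columns[name_a], columns[name_b])
        for name_a, name_b in combinations(columns, 2)
    }

# Hypothetical data set with three columns.
columns = {
    "A": [1, 2, 3, 4],
    "B": [2, 4, 6, 8],
    "C": [4, 3, 2, 1],
}
matrix = correlation_matrix(columns)
for pair, value in matrix.items():
    print(pair, round(value, 3))
```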

If all Y values are the same, it is not possible to compute a correlation coefficient (parametric or nonparametric), and Prism reports "horizontal line". Correlation asks how much X and Y vary together. If Y doesn't vary at all, that question is not meaningful and the correlation calculations can't be done (division by zero).

If all the X values are the same, it is not possible to compute a correlation coefficient, and Prism reports "vertical line".
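To see why a constant X or Y makes the calculation impossible, note that the usual formula for r divides by the square root of the product of the two sums of squares, and that product is zero when either variable does not vary. The sketch below checks for that case before dividing. The function name is hypothetical, and the returned strings simply echo the messages quoted above; this is not Prism's implementation.

```python
import math

def correlation_or_message(x, y):
    """Return Pearson r, or a message when X or Y does not vary."""
    # Checking for a single distinct value avoids relying on a
    # floating-point sum of squares being exactly zero.
    if len(set(y)) == 1:
        return "horizontal line"
    if len(set(x)) == 1:
        return "vertical line"
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

print(correlation_or_message([1, 2, 3], [5, 5, 5]))   # all Y values equal
print(correlation_or_message([2, 2, 2], [1, 2, 3]))   # all X values equal
print(correlation_or_message([1, 2, 3], [1, 2, 3]))   # ordinary case
```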

r²

Perhaps the best way to interpret the value of r is to square it to calculate r². Statisticians call this quantity the coefficient of determination, but scientists call it "r squared". It is a value that ranges from zero to one, and is the fraction of the variance in the two variables that is “shared”. For example, if r²=0.59, then 59% of the variance in X can be explained by variation in Y. Likewise, 59% of the variance in Y can be explained by variation in X. More simply, 59% of the variance is shared between X and Y.

Prism only calculates an r² value from the Pearson correlation coefficient. It is not appropriate to compute r² from the nonparametric Spearman correlation coefficient.
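One way to check the "fraction of variance explained" reading is to compare the square of r with the fraction of the variance in Y accounted for by a least-squares straight line fitted to the same data. The sketch below does this for a small made-up data set; the variable names and the data are illustrative assumptions only.

```python
import math

x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]   # hypothetical data

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n
sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
sxx = sum((a - mean_x) ** 2 for a in x)
syy = sum((b - mean_y) ** 2 for b in y)

# Pearson r and its square.
r = sxy / math.sqrt(sxx * syy)
r_squared = r ** 2

# Least-squares line for Y as a function of X.
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Fraction of the variance in Y explained by the line:
# 1 - (residual sum of squares) / (total sum of squares).
ss_residual = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
explained_fraction = 1 - ss_residual / syy

print(round(r, 2), round(r_squared, 2), round(explained_fraction, 2))
```

For these data r is 0.8, and both r squared and the explained fraction come to 0.64.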

P value

The P value answers this question:

If there really is no correlation between X and Y overall, what is the chance that random sampling would result in a correlation coefficient as far from zero (or further) as observed in this experiment?

If the P value is small, you can reject the idea that the correlation is due to random sampling.

If the P value is large, the data do not give you any reason to conclude that the correlation is real. This is not the same as saying that there is no correlation at all. You just have no compelling evidence that the correlation is real and not due to chance. Look at the confidence interval for r. It will extend from a negative correlation to a positive correlation. If the entire interval consists of values near zero that you would consider biologically trivial, then you have strong evidence that either there is no correlation in the population or that there is a weak (biologically trivial) association. On the other hand, if the confidence interval contains correlation coefficients that you would consider biologically important, then you couldn't make any strong conclusion from this experiment. To make a strong conclusion, you’ll need data from a larger experiment.
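The text above does not say how the confidence interval for r is calculated. As a sketch only, the code below assumes the widely used Fisher z transformation for the Pearson r, which may or may not match what Prism does. The function name and the example values of r and n are hypothetical.

```python
import math
from statistics import NormalDist

def pearson_ci(r, n, confidence=0.95):
    """Approximate confidence interval for Pearson r via Fisher's z.

    Assumes n > 3 and -1 < r < 1.
    """
    z = math.atanh(r)                     # Fisher z transformation of r
    se = 1 / math.sqrt(n - 3)             # approximate standard error of z
    crit = NormalDist().inv_cdf(0.5 + confidence / 2)
    lower = math.tanh(z - crit * se)      # transform back to the r scale
    upper = math.tanh(z + crit * se)
    return lower, upper

# A modest r from a small sample: the interval spans zero.
print(pearson_ci(0.3, 20))

# A strong r from the same sample size: the interval stays above zero.
print(pearson_ci(0.9, 20))
```

In the first case the interval runs from a negative to a positive correlation, which is the situation described above for a large P value.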

If you entered data onto a column table and requested a correlation matrix, Prism will report a P value for the correlation of each column with every other column. These P values do not include any correction for multiple comparisons.
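To get a feel for why the lack of a multiple-comparisons correction matters, it helps to count the P values in a correlation matrix. The sketch below uses hypothetical function names and an illustrative threshold of 0.05, neither of which comes from Prism. It also assumes, for the sake of the calculation, that none of the columns are truly correlated.

```python
def n_pairwise_tests(n_columns):
    """Number of distinct column pairs, one P value per pair."""
    return n_columns * (n_columns - 1) // 2

def expected_chance_findings(n_columns, alpha=0.05):
    """Expected count of P values below alpha if no true correlation exists."""
    return n_pairwise_tests(n_columns) * alpha

for k in (3, 5, 10):
    print(k, n_pairwise_tests(k), expected_chance_findings(k))
```

With 10 columns there are 45 P values, so a couple of them would be expected to fall below 0.05 by chance alone.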

Prism always reports two-tailed (two-sided) P values.

How Prism computes the P value for Spearman nonparametric correlation

With 17 or fewer XY pairs, Prism computes an exact P value for nonparametric (Spearman) correlation, looking at all possible permutations of the data. The exact calculations handle ties with no problem. With 18 or more pairs, Prism computes an approximate P value for nonparametric correlation. This approximation is standard. It first computes a t ratio from rs, and then computes P from that.
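The two approaches can be sketched in Python as follows. This is an illustration under stated assumptions, not Prism's algorithm: the function names are hypothetical, the two-tailed exact P value is taken to be the fraction of permutations whose rs is at least as far from zero as the observed value, and the t ratio uses the common formula t = rs·sqrt((n − 2)/(1 − rs²)) with n − 2 degrees of freedom. Brute-force enumeration is only practical for small n (17 pairs would mean about 3.6 × 10^14 permutations), and the sketch does not handle the case where all X or all Y values are equal.

```python
import math
from itertools import permutations

def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def corr(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    sab = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    saa = sum((p - mean_a) ** 2 for p in a)
    sbb = sum((q - mean_b) ** 2 for q in b)
    return sab / math.sqrt(saa * sbb)

def spearman_exact_p(x, y):
    """Return (rs, two-tailed exact P) by enumerating all permutations."""
    rank_x = average_ranks(x)
    rank_y = average_ranks(y)
    observed = corr(rank_x, rank_y)
    extreme = 0
    total = 0
    for shuffled in permutations(rank_y):
        total += 1
        # Small tolerance guards against floating-point rounding.
        if abs(corr(rank_x, shuffled)) >= abs(observed) - 1e-12:
            extreme += 1
    return observed, extreme / total

def spearman_t_ratio(rs, n):
    """t ratio used by the large-sample approximation (requires |rs| < 1)."""
    return rs * math.sqrt((n - 2) / (1 - rs ** 2))

rs, p = spearman_exact_p([1, 2, 3, 4, 5], [1, 2, 3, 4, 5])
print(rs, p)
print(spearman_t_ratio(0.5, 18))
```

In the five-pair example only two of the 120 permutations (the identical and the fully reversed ordering) give a coefficient as extreme as the observed one, so the exact two-tailed P value is 2/120.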

Prism 5 switched to the approximate calculation with more than 13 pairs when there were no ties, and always used the approximation when there were ties. Prism now switches to the approximation only with more than 17 pairs. Prism 5 will therefore report different (less accurate) results for data sets with between 14 and 17 pairs, and for data sets with fewer than 17 pairs that include ties.

Prism 7 fixed a bug in Prism 6 (up to 6.05 and 6.0f, but not in earlier versions) that sometimes produced incorrect P values when rs was negative, there were tied values, and the P value was computed exactly.