Data Analysis Blog
A peculiar prevalence of p values just below .05
Does the "P<0.05" threshold lead investigators to cut some corners? To find out, Masicampo and Lalande(1) systematically reviewed many articles in three highly regarded, peer-reviewed psychology journals. They recorded each reported P value, and computed the P value themselves when it was not reported. The distribution of 3627 P values is shown below. (This figure is not in the paper, but was created by Larry Wasserman from the raw data obtained from the original authors.)
Clearly there is a discontinuity at 0.05. There are "too many" P values just a little bit less than 0.05 (to the left of the red line) and "too few" greater than 0.05. Or as the authors say, "The number of p values in the psychology literature that barely meet the criterion for statistical significance (i.e., that fall just below .05) is unusually large, given the number of p values occurring in other ranges."
Why is this? Possible reasons:
- Give up. Investigators with P values just above 0.05 may not bother to write up the results.
- Publication bias. Journals may reject manuscripts with P values greater than 0.05.
- Tweaking. The investigators may have played with the analyses. If one analysis doesn't give a P value less than 0.05, they try a different one: switch between parametric and nonparametric tests, or try including different variables in multiple regression models. Then they report only the analyses that give P values less than 0.05.
- Dynamic sample size. The investigators may have analyzed their data several times. Each time, they may stop if the P value is less than 0.05, but collect more data when the P value is above 0.05. This approach yields misleading results as it is biased towards stopping when P values are small.
- Slice and dice. The investigators may have tried analyzing various subsets of the data, and only reported the subsets that gave low P values.
- Outliers. The investigators may have tried various definitions of outliers, reanalyzed the data with each one, and only reported the analyses with low P values.
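To see why the "dynamic sample size" approach is misleading, here is a minimal simulation sketch. It is not from the paper. It assumes the simplest possible setup: data drawn from a normal distribution with a true mean of zero (so the null hypothesis is true) and a known standard deviation of 1, tested with a two-sided z test. The function names and the particular sample sizes are made up for illustration.

```python
import math
import random

def two_sided_p(z):
    """Two-sided P value for a z statistic under a standard normal null."""
    return math.erfc(abs(z) / math.sqrt(2))

def false_positive_rate(n_start, n_max, step, n_sims, rng, alpha=0.05):
    """Fraction of simulated null experiments that stop with P < alpha.

    Each experiment tests after n_start observations, then after every
    additional `step` observations, and stops as soon as P < alpha or
    once n_max observations have been collected.
    """
    hits = 0
    for _ in range(n_sims):
        total = 0.0  # running sum of the observations
        n = 0        # observations collected so far
        while n < n_max:
            add = n_start if n == 0 else step
            add = min(add, n_max - n)
            # The null is true: every observation has mean 0 and SD 1 (assumed known).
            total += sum(rng.gauss(0.0, 1.0) for _ in range(add))
            n += add
            z = total / math.sqrt(n)  # z statistic for the sample mean
            if two_sided_p(z) < alpha:
                hits += 1
                break
    return hits / n_sims

rng = random.Random(0)
fixed = false_positive_rate(100, 100, 10, 2000, rng)    # one test at n = 100
peeking = false_positive_rate(10, 100, 10, 2000, rng)   # test after every 10 observations
print(f"fixed n: {fixed:.3f}   peeking: {peeking:.3f}")
```

With a single test at a fixed sample size, about 5% of these null experiments come out "significant", as they should. When the same data are tested after every 10 observations and collection stops at the first P<0.05, the fraction is considerably higher, even though there is nothing to find.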
These P values were all collected from psychology journals, but I suspect similar (or perhaps even more "peculiar") results would be found in the biomedical literature. These findings show that P values are not quite as objective as they seem to be, and demonstrate the consequences of treating P<0.05 as a sacred cutoff.