KNOWLEDGEBASE - ARTICLE #2069

What is the chance that your "statistically significant" conclusion is a false positive?

When using the traditional P<0.05 threshold to define statistical significance, many scientists believe the answer is 5%. Wrong! The answer depends on the scientific context. The easiest way to see this is to consider the two extremes.

  • When you run negative controls, all "significant" results are false positives.
  • When you run positive controls, none of the "significant" results are false positives.

Most science is between these two extremes, of course. 

David Colquhoun has written two papers that discuss this issue in depth (1,2).  He has also created a great web app that does the calculations. 

Example. You obtained a P value of 0.05 from a two-sample t test with n=17 in each group. With that sample size, the test has 81% power to detect a difference between means equal to one standard deviation. The experimental context is that your hypothesis is not entirely speculative. There is some basis to expect a difference, but that is far from sure. You estimate the prior probability of a true difference between means equal to the SD (or larger) to be 20%.
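If you want to verify that power figure, here is a minimal Python sketch using scipy. It assumes a two-sided test with equal group sizes and a true difference between means of one SD (these assumptions match the example; the script is an illustration, not part of Colquhoun's calculator):

  from scipy import stats

  n = 17          # per-group sample size
  d = 1.0         # effect size: difference between means, in SD units
  alpha = 0.05    # two-sided significance threshold

  df = 2 * n - 2                 # degrees of freedom (32)
  nc = d * (n / 2) ** 0.5        # noncentrality parameter of the t statistic
  t_crit = stats.t.ppf(1 - alpha / 2, df)

  # Power = P(|T| > t_crit) when T follows the noncentral t distribution
  power = (1 - stats.nct.cdf(t_crit, df, nc)) + stats.nct.cdf(-t_crit, df, nc)
  print(f"power = {power:.2f}")  # prints ~0.81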

Question 1: For all P values less than 0.05, what is the false positive rate? This is easy to compute by hand. Imagine doing lots of experiments. Of the 20% where there truly is a difference (that's the prior probability), you expect 81% (the power) to have P<0.05. So these true positives will be 20% * 81% = 16% of all experiments. Of the 80% where there is really no difference, you expect P<0.05 in 5% of experiments; that follows from the definition of a P value. Of all experiments, these false positives will be 80% * 5% = 4%. The false positive rate is therefore 4%/(4%+16%) = 20%.
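The same arithmetic as a short Python sketch (the numbers are the ones from the example above):

  prior = 0.20   # prior probability that a real effect exists
  power = 0.81   # power to detect the assumed effect size
  alpha = 0.05   # significance threshold

  true_pos = prior * power           # 20% * 81% ~ 16% of all experiments
  false_pos = (1 - prior) * alpha    # 80% * 5% = 4% of all experiments
  fpr = false_pos / (false_pos + true_pos)
  print(f"FPR, P-less-than case = {fpr:.2f}")   # prints 0.20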

The calculator gives the same result: for the "P-less-than" case, the FPR is 20%.

Question 2: The calculator also computes the "P-equals" case, which is not easy to compute by hand. It asks: of all possible experiments yielding a P value equal (or very close) to 0.05 in this experimental situation, what fraction are false positives? The answer is 61%. Wow. With a prior probability of 20% and a power of 81%, most P values of 0.05 will be false positives.
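Here is a sketch of how the P-equals calculation can be done in Python, assuming the same design as above. The key idea: when the null hypothesis is true, P values are uniformly distributed (density 1); when the alternative is true, the density of the two-sided P value at 0.05 follows from the noncentral t distribution. (This reproduces the logic of the calculation, not Colquhoun's actual code.)

  from scipy import stats

  n, d, p_obs, prior = 17, 1.0, 0.05, 0.20
  df = 2 * n - 2
  nc = d * (n / 2) ** 0.5
  t_obs = stats.t.ppf(1 - p_obs / 2, df)   # |t| that gives two-sided P = 0.05

  # Density of the two-sided P value at p_obs under the alternative:
  # noncentral t density at +/- t_obs, divided by the Jacobian of the
  # P -> t transformation, which is 2 * (central t density at t_obs).
  f1 = (stats.nct.pdf(t_obs, df, nc) + stats.nct.pdf(-t_obs, df, nc)) / (
      2 * stats.t.pdf(t_obs, df))
  f0 = 1.0   # density of the P value under the null (uniform)

  fpr = (1 - prior) * f0 / ((1 - prior) * f0 + prior * f1)
  print(f"FPR, P-equals case = {fpr:.2f}")   # prints ~0.6, close to the 61% above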

When interpreting a single P value, Colquhoun makes a strong case that what you care about is the probability that a result with a P value equal to the one you obtained is a false positive.

Question 3: You are planning an experiment, so you don't know what the P value will be. Since you don't know, it can make sense to report or interpret the FPR computed with the P-less-than method. That FPR is the average FPR you'd see if you ran the experiment zillions of times. The actual FPR will depend on the P value you end up observing in a particular experiment. If that P value is tiny, the FPR will also be small. But if the P value is just a bit lower than your threshold (say 0.049 if your threshold is 0.05), the FPR will be larger. Rather than computing an average expected FPR, I think it makes more sense to compute a worst-case FPR: the FPR computed for your alpha threshold (usually 0.05) using the "P-equals" case, as in the sketch below. If your results end up being statistically significant, you know the FPR cannot be higher than that value (but it can be lower, depending on the P value you actually observe).
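To make that concrete, here is a sketch that computes the P-equals FPR for several observed P values, using the same assumed design as above:

  from scipy import stats

  def fpr_p_equals(p_obs, n=17, d=1.0, prior=0.20):
      """FPR among experiments yielding a P value equal to p_obs."""
      df = 2 * n - 2
      nc = d * (n / 2) ** 0.5
      t_obs = stats.t.ppf(1 - p_obs / 2, df)
      f1 = (stats.nct.pdf(t_obs, df, nc) + stats.nct.pdf(-t_obs, df, nc)) / (
          2 * stats.t.pdf(t_obs, df))
      return (1 - prior) / ((1 - prior) + prior * f1)

  # A tiny P value gives a small FPR; P near the threshold gives a large one.
  for p in (0.001, 0.01, 0.049):
      print(f"P = {p}: FPR = {fpr_p_equals(p):.2f}")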

Question 4: Estimating the prior probability is not easy, so the calculator lets you approach the problem backward. Let's stick with P=0.05 and power=81%. Instead of estimating a prior probability, let's specify a goal for the FPR. Since many people think the FPR is the same as alpha (the significance threshold), let's set the desired FPR to 5% and see what prior probability is needed to achieve that goal.

To get an FPR as small as 5%, the prior probability has to be 54% for the "P-less-than" case and 88% for the "P-equals" case. You have to be almost certain before doing the experiment that you'll find an effect for the FPR to be that low when you observe P=0.05.
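Here is a sketch of that reverse calculation. The P-less-than case can be inverted in closed form; the P-equals case is solved numerically (same design assumptions as above):

  from scipy import stats, optimize

  alpha, power, target = 0.05, 0.81, 0.05

  # P-less-than case: invert
  # target = (1-prior)*alpha / ((1-prior)*alpha + prior*power)
  prior_lt = alpha * (1 - target) / (alpha * (1 - target) + target * power)
  print(f"required prior, P-less-than = {prior_lt:.2f}")   # prints ~0.54

  # P-equals case: likelihood ratio at P = 0.05, then solve for the prior
  n, d, p_obs = 17, 1.0, 0.05
  df, nc = 2 * n - 2, d * (n / 2) ** 0.5
  t_obs = stats.t.ppf(1 - p_obs / 2, df)
  lr = (stats.nct.pdf(t_obs, df, nc) + stats.nct.pdf(-t_obs, df, nc)) / (
      2 * stats.t.pdf(t_obs, df))
  prior_eq = optimize.brentq(
      lambda pi: (1 - pi) / ((1 - pi) + pi * lr) - target, 1e-9, 1 - 1e-9)
  print(f"required prior, P-equals = {prior_eq:.2f}")      # prints ~0.88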

URL of calculator: http://fpr-calc.ucl.ac.uk/

Note on terminology. These three terms mean the same thing:

  • FPR: False Positive Risk. This is the term Colquhoun uses in the calculator and in his 2017 paper.
  • FPRP: False Positive Report Probability. This is the term that Lakens uses, and that I use in the fourth edition of Intuitive Biostatistics.
  • FDR: False Discovery Rate. This is the term Colquhoun used in his 2014 paper and I used in the third edition of Intuitive Biostatistics. It is a bit confusing, as FDR is usually used in the context of multiple comparisons, not interpreting P values.

References

  1. Colquhoun D (2014). An investigation of the false discovery rate and the misinterpretation of p-values. Royal Society Open Science 1(3):140216. doi: 10.1098/rsos.140216
  2. Colquhoun D (2017). The reproducibility of research and misinterpretation of P values. bioRxiv, May 31, 2017. doi: 10.1101/144337
