KNOWLEDGEBASE - ARTICLE #1114

Which test should I use to determine the statistical significance of a 2x2 contingency table? Chi-square? Fisher's exact test? Something else?

Last modified January 1, 2009
Contingency tables summarize results where you compare two or more groups when the outcome is a categorical variable (such as disease vs. no disease, pass vs. fail, artery open vs. artery obstructed). Most often, a contingency table has two rows and two columns, so it is called a '2x2 contingency table'. Each entry in the table is the number of subjects that fall into that combination of row (for example, treatment group) and column (outcome). You want to know if the association between columns and rows is more than you'd expect to see by chance.

 

Statisticians have developed many ways to analyze a 2x2 contingency table to obtain a P value. Which test should you choose?

Conventional advice

In the days before computers were readily available, people analyzed contingency tables by hand or with a calculator, using chi-square tests. But the chi-square test is only an approximation. Yates' continuity correction is designed to make the chi-square approximation better, but it overcorrects, so it gives a P value that is too large (too 'conservative'). With large sample sizes, Yates' correction makes little difference, and the chi-square test works very well. With small sample sizes, chi-square is not accurate, with or without Yates' correction.
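To see how much Yates' correction can matter with a small sample, here is a minimal Python sketch of the 2x2 chi-square calculation with and without the correction. The function name chi_square_2x2 and the table values are made up for illustration; this is not the code used by any GraphPad product.

```python
import math

def chi_square_2x2(a, b, c, d, yates=False):
    """Chi-square test for the 2x2 table [[a, b], [c, d]].

    Returns (statistic, p_value) for 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be positive")
    diff = abs(a * d - b * c)
    if yates:
        # Yates' continuity correction shrinks the discrepancy by n/2,
        # without letting it drop below zero.
        diff = max(0.0, diff - n / 2)
    stat = n * diff ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom, the chi-square upper tail is erfc(sqrt(stat/2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Hypothetical small study: 3 of 12 vs. 8 of 12 subjects with the outcome.
stat, p = chi_square_2x2(3, 9, 8, 4)
stat_y, p_y = chi_square_2x2(3, 9, 8, 4, yates=True)
print(f"uncorrected: chi-square = {stat:.3f}, P = {p:.4f}")
print(f"with Yates:  chi-square = {stat_y:.3f}, P = {p_y:.4f}")
```

With only 24 subjects, the two P values land on opposite sides of 0.05, which is the kind of disagreement that makes the approximation hard to trust for small samples.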

Fisher's exact test, as its name implies, always gives an exact P value and works fine with small sample sizes. Fisher's test (unlike chi-square) is very hard to calculate by hand, but is easy to compute with a computer. Most statistics books advise using it instead of the chi-square test. My advice: Use Fisher's test, unless someone requires you to use the chi-square test.
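To show why Fisher's test is tedious by hand but trivial for a computer, here is a minimal Python sketch. It assumes one common convention for the two-sided P value (add up the probabilities of all tables, with the same row and column totals, that are no more likely than the observed table). The function name fisher_exact_2x2 is hypothetical, and other two-sided conventions exist.

```python
from math import comb

def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact test P value for the table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x
        # when all row and column totals are held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total

    lo = max(0, col1 - row2)   # smallest possible top-left count
    hi = min(row1, col1)       # largest possible top-left count
    p_obs = prob(a)
    # Sum the tables that are no more probable than the observed one;
    # the small tolerance guards against floating-point ties.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)

# Classic 'lady tasting tea' layout: 3 of 4 correct in each row.
print(fisher_exact_2x2(3, 1, 1, 3))   # 34/70, about 0.486
```

Even for this tiny table the calculation needs five hypergeometric probabilities; larger tables need many more, which is why the test only became practical with computers.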

Controversies 

As its name implies, Fisher's exact test gives an exactly correct answer no matter what sample size you use. But some statisticians conclude that Fisher's test gives the exact answer to the wrong question, so its result is also an approximation to the answer you really want. The problem is that Fisher's test assumes that both the row and column totals are fixed by the experiment. In fact, the row totals are fixed in a prospective study or an experiment, the column totals are fixed in a retrospective case-control study, and only the overall N is fixed in a cross-sectional study. Since the constraints of your study design don't match the constraints of Fisher's test, you could question whether the exact P value produced by Fisher's test actually answers the question you had in mind.

An alternative to Fisher's test is Barnard's test. Fisher's test is said to be 'conditional' on the row and column totals, while Barnard's test is not. Mehta and Senchaudhuri explain the difference and why Barnard's test has more power.
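To make the 'conditional' vs. 'unconditional' distinction concrete, here is a rough Python sketch of one variant of an unconditional test in the spirit of Barnard's. It makes several assumptions that are ours, not the article's: the two rows are independent binomial samples (row totals fixed, column totals free), tables are ranked by a pooled z statistic, and the unknown common success rate is searched on a coarse grid. The name barnard_2x2 and its grid parameter are hypothetical, and the result is only an approximation.

```python
from math import comb, sqrt

def barnard_2x2(a, b, c, d, grid=200):
    """Approximate two-sided unconditional P value for [[a, b], [c, d]]."""
    n1, n2 = a + b, c + d

    def z(x1, x2):
        # Pooled two-proportion z statistic; taken as 0 when the pooled
        # proportion is 0 or 1 (no evidence of a difference).
        pooled = (x1 + x2) / (n1 + n2)
        if pooled in (0.0, 1.0):
            return 0.0
        return (x1 / n1 - x2 / n2) / sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))

    z_obs = abs(z(a, c))
    # All tables with the same row totals that are at least as extreme
    # as the observed one. Column totals are NOT held fixed here.
    extreme = [(x1, x2) for x1 in range(n1 + 1) for x2 in range(n2 + 1)
               if abs(z(x1, x2)) >= z_obs - 1e-12]

    def tail(pi):
        # Probability of the extreme region if both rows share success rate pi.
        return sum(comb(n1, x1) * comb(n2, x2)
                   * pi ** (x1 + x2) * (1 - pi) ** (n1 + n2 - x1 - x2)
                   for x1, x2 in extreme)

    # The P value is the largest tail probability over the unknown rate.
    return max(tail(k / grid) for k in range(1, grid))

# Hypothetical small study: 3 of 12 vs. 8 of 12 subjects with the outcome.
print(barnard_2x2(3, 9, 8, 4))
```

The key difference from Fisher's test is the set of tables considered: here every table with the same row totals counts, rather than only those that also share the observed column totals.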

A.R. Feinstein (see below) has an interesting perspective on this: "The controversy has been useful for generating graduate-school theses in statistics, but can otherwise be generally ignored... and the Fisher test calculations can be done". It is also worth noting that Fisher convinced Barnard to repudiate his test!

At this time, we do not plan to implement Barnard's test in GraphPad Prism, InStat or our free QuickCalc web calculators. There certainly does not seem to be any consensus among statisticians. But let us know if you disagree.
