bundles / scipy 1.17.1 / scipy / stats / _stats_py / fisher_exact

function

`scipy.stats._stats_py:fisher_exact`

source: /scipy/stats/_stats_py.py :4788

Signature

def fisher_exact(table, alternative=None, *, method=None)

Summary

Perform a Fisher exact test on a contingency table.

Extended Summary

For a 2x2 table, the null hypothesis is that the true odds ratio of the populations underlying the observations is one, and the observations were sampled from these populations under a condition: the marginals of the resulting table must equal those of the observed table. The statistic is the unconditional maximum likelihood estimate of the odds ratio, and the p-value is the probability under the null hypothesis of obtaining a table at least as extreme as the one that was actually observed.

For other table sizes, or if method is provided, the null hypothesis is that the rows and columns of the table have fixed sums and are independent; i.e., the table was sampled from a scipy.stats.random_table distribution with the observed marginals. The statistic is the probability mass of this distribution evaluated at table, and the p-value is the proportion of the population of tables whose statistic is at least as extreme (small) as that of table. Only one alternative hypothesis is available: the rows and columns are not independent.
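As a rough illustration of the statistic described above, the following sketch evaluates the probability mass of an observed table under scipy.stats.random_table with the observed marginals. The 2x3 table is the one used in the Examples section; the variable names are illustrative only.

```python
import numpy as np
from scipy.stats import random_table

# Observed 2x3 contingency table (same values as in the Examples section).
table = np.array([[8, 2, 3], [1, 5, 4]])

# Fix the marginals at their observed values.
row_sums = table.sum(axis=1)
col_sums = table.sum(axis=0)

# Distribution over tables with these marginals under independence.
dist = random_table(row_sums, col_sums)

# Probability mass of the observed table; smaller values are more extreme.
statistic = dist.pmf(table)
print(statistic)
```

Tables with a smaller probability mass than this value are the ones counted as more extreme when the p-value is computed.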

There are other possible choices of statistic and two-sided p-value definition associated with Fisher's exact test; please see the Notes for more information.

Parameters

table : array_like of ints

A contingency table. Elements must be non-negative integers.

alternative : {'two-sided', 'less', 'greater'}, optional

Defines the alternative hypothesis for 2x2 tables; unused for other table sizes. The following options are available (default is 'two-sided'):

'two-sided': the odds ratio of the underlying population is not one
'less': the odds ratio of the underlying population is less than one
'greater': the odds ratio of the underlying population is greater than one

See the Notes for more details.

method : ResamplingMethod, optional

Defines the method used to compute the p-value. If method is an instance of PermutationMethod/MonteCarloMethod, the p-value is computed using scipy.stats.permutation_test/scipy.stats.monte_carlo_test with the provided configuration options and other appropriate settings. Note that if method is an instance of MonteCarloMethod, the rvs attribute must be left unspecified; Monte Carlo samples are always drawn using the rvs method of scipy.stats.random_table. Otherwise, the p-value is computed as documented in the notes.

Returns

res : SignificanceResult

An object containing attributes:

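statistic: float. For a 2x2 table with the default method, the unconditional maximum likelihood estimate of the odds ratio; otherwise, the probability mass of the observed table under the null hypothesis.
pvalue: float. The probability under the null hypothesis of obtaining a table at least as extreme as the one that was actually observed.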
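The result can be read in two ways, sketched below. The tuple unpacking assumes the result object behaves like a two-element (statistic, pvalue) tuple, as SciPy's SignificanceResult objects generally do.

```python
from scipy.stats import fisher_exact

res = fisher_exact([[6, 2], [1, 4]])

# Access the results by attribute name...
odds_ratio, pvalue = res.statistic, res.pvalue

# ...or unpack the result like a (statistic, pvalue) tuple.
stat2, p2 = res
print(odds_ratio, pvalue)
```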

Raises

ValueError: If table is not two-dimensional or has negative entries.
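The two failure modes can be exercised with a short sketch like the following. The input tables are made up for illustration, and the exact error messages may differ between SciPy versions.

```python
from scipy.stats import fisher_exact

# Hypothetical invalid inputs: one is not two-dimensional, one has a negative count.
bad_tables = {
    "one-dimensional": [1, 2, 3, 4],
    "negative entry": [[6, -2], [1, 4]],
}

errors = {}
for label, bad in bad_tables.items():
    try:
        fisher_exact(bad)
    except ValueError as exc:
        # Record the message so it can be inspected or logged.
        errors[label] = str(exc)

for label, message in errors.items():
    print(f"{label}: {message}")
```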

Notes

Null hypothesis and p-values

The null hypothesis is that the true odds ratio of the populations underlying the observations is one, and the observations were sampled at random from these populations under a condition: the marginals of the resulting table must equal those of the observed table. Equivalently, the null hypothesis is that the input table is from the hypergeometric distribution with parameters (as used in hypergeom) M = a + b + c + d, n = a + b and N = a + c, where the input table is [[a, b], [c, d]]. This distribution has support max(0, N + n - M) <= x <= min(N, n), or, in terms of the values in the input table, max(0, a - d) <= x <= a + min(b, c). x can be interpreted as the upper-left element of a 2x2 table, so the tables in the distribution have the form

[  x           n - x     ]
[N - x    M - (n + N) + x]

For example, if

table = [6  2]
        [1  4]

then the support is 2 <= x <= 7, and the tables in the distribution are

[2 6]   [3 5]   [4 4]   [5 3]   [6 2]  [7 1]
[5 0]   [4 1]   [3 2]   [2 3]   [1 4]  [0 5]
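The tables listed above can be enumerated with a short sketch that builds each table from x using the general form given earlier. The names lo, hi and tables are illustrative.

```python
import numpy as np
from scipy.stats import hypergeom

table = np.array([[6, 2], [1, 4]])
M = int(table.sum())         # total count, a + b + c + d
n = int(table[0].sum())      # first-row sum, a + b
N = int(table[:, 0].sum())   # first-column sum, a + c

# Support of x, the upper-left element of the table.
lo, hi = hypergeom.support(M, n, N)

# Build each table [[x, n - x], [N - x, M - (n + N) + x]].
tables = [
    [[x, n - x], [N - x, M - (n + N) + x]]
    for x in range(int(lo), int(hi) + 1)
]
for t in tables:
    print(t)
```

Every table produced this way has the same row and column sums as the input table.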

The probability of each table is given by the hypergeometric distribution hypergeom.pmf(x, M, n, N). For this example, these are (rounded to three significant digits)

x       2      3      4      5       6        7
p  0.0163  0.163  0.408  0.326  0.0816  0.00466

These can be computed with

>>> import numpy as np
>>> from scipy.stats import hypergeom
>>> table = np.array([[6, 2], [1, 4]])
>>> M = table.sum()
>>> n = table[0].sum()
>>> N = table[:, 0].sum()
>>> start, end = hypergeom.support(M, n, N)
>>> hypergeom.pmf(np.arange(start, end+1), M, n, N)
array([0.01631702, 0.16317016, 0.40792541, 0.32634033, 0.08158508,
       0.004662  ])

The two-sided p-value is the probability that, under the null hypothesis, a random table would have a probability equal to or less than the probability of the input table. For our example, the probability of the input table (where x = 6) is 0.0816. The x values where the probability does not exceed this are 2, 6 and 7, so the two-sided p-value is 0.0163 + 0.0816 + 0.00466 ~= 0.10256

>>> from scipy.stats import fisher_exact
>>> res = fisher_exact(table, alternative='two-sided')
>>> res.pvalue
0.10256410256410257
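The same two-sided p-value can be reproduced by hand from the hypergeometric probabilities, as in the sketch below. The relative tolerance is an assumption of this sketch, added to guard against floating-point ties; it is not necessarily what fisher_exact uses internally.

```python
import numpy as np
from scipy.stats import hypergeom

table = np.array([[6, 2], [1, 4]])
M, n, N = table.sum(), table[0].sum(), table[:, 0].sum()

# Probabilities of every table in the support.
start, end = hypergeom.support(M, n, N)
x = np.arange(start, end + 1)
probs = hypergeom.pmf(x, M, n, N)

# Probability of the observed table (x = a = 6).
p_observed = hypergeom.pmf(table[0, 0], M, n, N)

# Sum the probabilities of tables no more likely than the observed one.
rel_tol = 1e-7  # assumed tolerance for near-ties
p_two_sided = probs[probs <= p_observed * (1 + rel_tol)].sum()
print(p_two_sided)
```

The result should agree with res.pvalue for alternative='two-sided' up to floating-point error.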

The one-sided p-value for alternative='greater' is the probability that a random table has x >= a, which in our example is x >= 6, or 0.0816 + 0.00466 ~= 0.08626

>>> res = fisher_exact(table, alternative='greater')
>>> res.pvalue
0.08624708624708627

This is equivalent to computing the survival function of the distribution at x = 5 (one less than x from the input table, because we want to include the probability of x = 6 in the sum)

>>> hypergeom.sf(5, M, n, N)
0.08624708624708627

For alternative='less', the one-sided p-value is the probability that a random table has x <= a (i.e., x <= 6 in our example), or 0.0163 + 0.163 + 0.408 + 0.326 + 0.0816 ~= 0.9949

>>> res = fisher_exact(table, alternative='less')
>>> res.pvalue
0.9953379953379957

This is equivalent to computing the cumulative distribution function of the distribution at x = 6:

>>> hypergeom.cdf(6, M, n, N)
0.9953379953379957

Odds ratio

The calculated odds ratio is different from the value computed by the R function fisher.test. This implementation returns the "sample" or "unconditional" maximum likelihood estimate, while fisher.test in R uses the conditional maximum likelihood estimate. To compute the conditional maximum likelihood estimate of the odds ratio, use scipy.stats.contingency.odds_ratio.
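The difference between the two estimates can be seen with a sketch like the following, which uses the example table from the Notes. It assumes scipy.stats.contingency.odds_ratio accepts kind='conditional' and returns an object with a statistic attribute.

```python
from scipy.stats import fisher_exact
from scipy.stats.contingency import odds_ratio

table = [[6, 2], [1, 4]]

# Sample (unconditional) estimate returned by fisher_exact: (a*d) / (b*c).
sample_or = fisher_exact(table).statistic

# Conditional maximum likelihood estimate, the kind R's fisher.test reports.
conditional_or = odds_ratio(table, kind='conditional').statistic

print(sample_or, conditional_or)
```

For this table the sample estimate is (6 * 4) / (2 * 1) = 12, while the conditional estimate is a different positive value.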

Array API Standard Support

fisher_exact has experimental support for Python Array API Standard compatible backends in addition to NumPy. Please consider testing these features by setting an environment variable SCIPY_ARRAY_API=1 and providing CuPy, PyTorch, JAX, or Dask arrays as array arguments. The following combinations of backend and device (or other capability) are supported.

====================  ====================  ====================
Library               CPU                   GPU
====================  ====================  ====================
NumPy                 ✅                     n/a                 
CuPy                  n/a                   ⛔                   
PyTorch               ⛔                     ⛔                   
JAX                   ⛔                     ⛔                   
Dask                  ⛔                     n/a                 
====================  ====================  ====================

See dev-arrayapi for more information.

Examples

from scipy.stats import fisher_exact
res = fisher_exact([[8, 2], [1, 5]])

res.statistic
res.pvalue

For tables with shape other than (2, 2), provide an instance of scipy.stats.MonteCarloMethod or scipy.stats.PermutationMethod for the method parameter:

import numpy as np
from scipy.stats import MonteCarloMethod
rng = np.random.default_rng(4507195762371367)
method = MonteCarloMethod(rng=rng)

fisher_exact([[8, 2, 3], [1, 5, 4]], method=method)

For a more detailed example, see hypothesis_fisher_exact.

Aliases

scipy.stats.fisher_exact

Referenced by

This package

release:1.15.0-notes