function

`scipy.stats._stats_py:combine_pvalues`

source: /scipy/stats/_stats_py.py :8741

Signature

   def combine_pvalues(pvalues, method='fisher', weights=None, *, axis=0, nan_policy='propagate', keepdims=False)

Summary

Combine p-values from independent tests that bear upon the same hypothesis.

Extended Summary

These methods are intended only for combining p-values from hypothesis tests based upon continuous distributions.

Each method assumes that under the null hypothesis, the p-values are sampled independently and uniformly from the interval [0, 1]. A test statistic (different for each method) is computed and a combined p-value is calculated based upon the distribution of this test statistic under the null hypothesis.
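The uniformity assumption can be checked empirically. The sketch below draws p-values uniformly at random (that is, under a true null hypothesis), combines each group of five with Fisher's method, and measures how often the combined p-value falls below 0.05. The group size, number of replicates, and seed are arbitrary choices made for illustration.

```python
import numpy as np
from scipy.stats import combine_pvalues

rng = np.random.default_rng(12345)  # arbitrary seed, for reproducibility only

n_replicates = 1000  # illustrative number of simulated experiments
n_tests = 5          # illustrative number of p-values combined per experiment

# Under the null hypothesis each individual p-value is Uniform(0, 1).
null_pvalues = rng.uniform(size=(n_replicates, n_tests))

# Combine each row separately with the default method (Fisher).
combined = np.array([combine_pvalues(row).pvalue for row in null_pvalues])

# If the assumptions hold, roughly 5% of combined p-values fall below 0.05.
rejection_rate = np.mean(combined < 0.05)
print(f"rejection rate at alpha=0.05: {rejection_rate:.3f}")
```

A rejection rate far from the nominal level would suggest that the inputs violate the independence or uniformity assumption.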

Parameters

pvalues : array_like

Array of p-values assumed to come from independent tests based on continuous distributions.

method : {'fisher', 'pearson', 'tippett', 'stouffer', 'mudholkar_george'}

Name of method to use to combine p-values.

The available methods are (see Notes for details):

'fisher': Fisher's method (Fisher's combined probability test)
'pearson': Pearson's method
'mudholkar_george': Mudholkar's and George's method
'tippett': Tippett's method
'stouffer': Stouffer's Z-score method

weights : array_like, optional

Optional array of weights used only for Stouffer's Z-score method. Ignored by other methods.
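To make the role of `weights` concrete, the sketch below computes a weighted Stouffer Z-score by hand and prints it next to the function's result. It assumes a common convention in which each p-value is converted with the normal inverse survival function and the weighted sum is divided by the Euclidean norm of the weights. That convention, including its sign, is an assumption of this sketch and not something this page states, so the two values are printed side by side for comparison.

```python
import numpy as np
from scipy.stats import combine_pvalues, norm

pvalues = np.array([0.1, 0.05, 0.02, 0.3])
weights = np.array([1.0, 2.0, 3.0, 4.0])  # e.g. proportional to study size

# Assumed convention: z_i = Phi^{-1}(1 - p_i), so small p-values give large z.
z = norm.isf(pvalues)

# Weighted Z-score, normalised so that it is standard normal under the null.
z_combined = np.sum(weights * z) / np.sqrt(np.sum(weights**2))
p_manual = norm.sf(z_combined)

res = combine_pvalues(pvalues, method='stouffer', weights=weights)
print("manual:", p_manual, "combine_pvalues:", res.pvalue)
```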

axis : int or None, default: 0

If an int, the axis of the input along which to compute the statistic. The statistic of each axis-slice (e.g. row) of the input will appear in a corresponding element of the output. If None, the input will be raveled before computing the statistic.

nan_policy : {'propagate', 'omit', 'raise'}

Defines how to handle input NaNs.

propagate: if a NaN is present in the axis slice (e.g. row) along which the statistic is computed, the corresponding entry of the output will be NaN.
omit: NaNs will be omitted when performing the calculation. If insufficient data remains in the axis slice along which the statistic is computed, the corresponding entry of the output will be NaN.
raise: if a NaN is present, a ValueError will be raised.

keepdims : bool, default: False

If this is set to True, the axes which are reduced are left in the result as dimensions with size one. With this option, the result will broadcast correctly against the input array.

Returns

res : SignificanceResult

An object containing attributes:

statistic: The statistic calculated by the specified method.
pvalue: The combined p-value.
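A short sketch of reading the result. Attribute access follows the description above; tuple-style unpacking is an assumption based on how SciPy result objects commonly behave, so it is worth verifying on the installed version.

```python
from scipy.stats import combine_pvalues

res = combine_pvalues([0.1, 0.05, 0.02, 0.3])

# Access by attribute name.
print(res.statistic, res.pvalue)

# Assumed to work as well: unpack the result like a 2-tuple.
statistic, pvalue = res
```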

Notes

If this function is applied to tests with discrete statistics, such as any rank test or contingency-table test, it will yield systematically wrong results, e.g. Fisher's method will systematically overestimate the p-value ^[1]. This problem becomes less severe for large sample sizes, when the discrete distributions become approximately continuous.

The differences between the methods can be best illustrated by their statistics and what aspects of a combination of p-values they emphasise when considering significance ^[2]. For example, methods emphasising large p-values are more sensitive to strong false and true negatives; conversely methods focussing on small p-values are sensitive to positives.

The statistic of Fisher's method (also known as Fisher's combined probability test) ^[3] is $-2 \sum_{i} \log(p_{i})$, which is equivalent (as a test statistic) to the product of the individual p-values: $\prod_{i} p_{i}$. Under the null hypothesis, $-2 \sum_{i} \log(p_{i})$ follows a $\chi^{2}$ distribution with $2k$ degrees of freedom, where $k$ is the number of p-values. This method emphasises small p-values.
Pearson's method uses $-2 \sum_{i} \log(1 - p_{i})$, which is equivalent to $\prod_{i} \frac{1}{1 - p_{i}}$ ^[2]. It thus emphasises large p-values.
Mudholkar and George compromise between Fisher's and Pearson's methods by averaging their statistics ^[4]. Their method emphasises extreme p-values, both close to 1 and 0.
Stouffer's method ^[5] uses Z-scores and the statistic $\sum_{i} \Phi^{-1}(p_{i})$, where $\Phi$ is the CDF of the standard normal distribution. The advantage of this method is that it is straightforward to introduce weights, which can make Stouffer's method more powerful than Fisher's method when the p-values are from studies of different size ^[6] ^[7].
Tippett's method uses the smallest p-value as a statistic. (Note that this minimum is not the combined p-value.)
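The Fisher and Tippett statistics are simple enough to reproduce by hand. The sketch below does so and prints the results next to those of `combine_pvalues`. It relies on two standard results: Fisher's statistic is chi-squared with $2k$ degrees of freedom for $k$ p-values, and the minimum of $k$ independent uniform p-values has CDF $1 - (1 - x)^{k}$, which gives Tippett's combined p-value.

```python
import numpy as np
from scipy.stats import chi2, combine_pvalues

pvalues = np.array([0.1, 0.05, 0.02, 0.3])
k = pvalues.size

# Fisher: -2 * sum(log p_i), referred to a chi-squared distribution with 2k dof.
fisher_stat = -2.0 * np.sum(np.log(pvalues))
fisher_p = chi2.sf(fisher_stat, df=2 * k)

# Tippett: the statistic is the smallest p-value; the combined p-value is the
# probability that the minimum of k uniform draws is at most that small.
tippett_stat = pvalues.min()
tippett_p = 1.0 - (1.0 - tippett_stat) ** k

res_fisher = combine_pvalues(pvalues, method='fisher')
res_tippett = combine_pvalues(pvalues, method='tippett')

print("Fisher :", fisher_p, res_fisher.pvalue)
print("Tippett:", tippett_p, res_tippett.pvalue)
```

The Tippett case shows concretely why the minimum p-value is only the statistic: the combined p-value is larger than the minimum whenever more than one p-value is combined.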

Fisher's method may be extended to combine p-values from dependent tests ^[8]. Extensions such as Brown's method and Kost's method are not currently implemented.

Beginning in SciPy 1.9, np.matrix inputs (not recommended for new code) are converted to np.ndarray before the calculation is performed. In this case, the output will be a scalar or np.ndarray of appropriate shape rather than a 2D np.matrix. Similarly, while masked elements of masked arrays are ignored, the output will be a scalar or np.ndarray rather than a masked array with mask=False.

Array API Standard Support

combine_pvalues has experimental support for Python Array API Standard compatible backends in addition to NumPy. Please consider testing these features by setting an environment variable SCIPY_ARRAY_API=1 and providing CuPy, PyTorch, JAX, or Dask arrays as array arguments. The following combinations of backend and device (or other capability) are supported.

====================  ====================  ====================
Library               CPU                   GPU
====================  ====================  ====================
NumPy                 ✅                     n/a                 
CuPy                  n/a                   ✅                   
PyTorch               ✅                     ⛔                   
JAX                   ⚠️ no JIT             ⚠️ no JIT           
Dask                  ⚠️ computes graph     n/a                 
====================  ====================  ====================

See dev-arrayapi for more information.

Examples

Suppose we wish to combine p-values from four independent tests of the same null hypothesis using Fisher's method (default).

from scipy.stats import combine_pvalues
pvalues = [0.1, 0.05, 0.02, 0.3]

combine_pvalues(pvalues)

When the individual p-values carry different weights, consider Stouffer's method.

weights = [1, 2, 3, 4]
res = combine_pvalues(pvalues, method='stouffer', weights=weights)

res.pvalue