
bundles / scipy 1.17.1 / scipy / stats / _mgc / multiscale_graphcorr

function

scipy.stats._mgc:multiscale_graphcorr

source: /scipy/stats/_mgc.py:99

Signature

def multiscale_graphcorr(x, y, compute_distance=_euclidean_dist, reps=1000, workers=1, is_twosamp=False, random_state=None)

Summary

Computes the Multiscale Graph Correlation (MGC) test statistic.

Extended Summary

Specifically, for each point, MGC finds the $k$-nearest neighbors for one property (e.g. cloud density) and the $l$-nearest neighbors for the other property (e.g. grass wetness) [1]. This pair $(k, l)$ is called the "scale". A priori, however, it is not known which scales will be most informative. So MGC computes all distance pairs and then efficiently computes the distance correlations for all scales. The local correlations illustrate which scales are relatively informative about the relationship. The key to successfully discovering and deciphering relationships between disparate data modalities is therefore to adaptively determine which scales are the most informative, and what those scales imply geometrically. Doing so not only provides an estimate of whether the modalities are related, but also provides insight into how the determination was made. This is especially important in high-dimensional data, where simple visualizations do not reveal relationships to the unaided human eye. Characterizations of this implementation in particular have been derived from and benchmarked in [2].
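To see which scales carry the signal in practice, one can inspect the local correlation map and the selected scale returned in `mgc_dict`. This is a sketch that assumes the `mgc_map` and `opt_scale` keys of `mgc_dict` (present in current SciPy sources); the noisy quadratic relationship is invented purely for illustration.

```python
import numpy as np
from scipy.stats import multiscale_graphcorr

rng = np.random.default_rng(12345)

# A noisy quadratic dependence, chosen only for illustration.
x = rng.uniform(-1, 1, size=50)
y = x**2 + 0.05 * rng.normal(size=50)

res = multiscale_graphcorr(x, y, random_state=1)

# Local correlations across all scales (assumed layout: rows follow the
# neighborhood size in x, columns the neighborhood size in y).
mgc_map = res.mgc_dict["mgc_map"]
# The (k, l) scale that MGC selected as most informative.
opt_scale = res.mgc_dict["opt_scale"]

print("map shape:", mgc_map.shape, "optimal scale:", opt_scale)
print(f"statistic={res.statistic:.3f}, pvalue={res.pvalue:.3f}")
```

A linear correlation would largely miss this symmetric relationship; the map shows where, across neighborhood sizes, the dependence is strongest.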

Parameters

x, y : ndarray

If x and y have shapes (n, p) and (n, q), where n is the number of samples and p and q are the numbers of dimensions, then the MGC independence test will be run. Alternatively, x and y can have shapes (n, n) if they are distance or similarity matrices, in which case compute_distance must be set to None. If x and y have shapes (n, p) and (m, p), an unpaired two-sample MGC test will be run.
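The three accepted input layouts can be sketched as follows. The data are synthetic and the sizes `n`, `m`, `p`, `q` are arbitrary choices; precomputed Euclidean distances come from `scipy.spatial.distance.cdist`, which we assume matches the default distance function.

```python
import numpy as np
from scipy.spatial.distance import cdist
from scipy.stats import multiscale_graphcorr

rng = np.random.default_rng(0)
n, m, p, q = 30, 25, 3, 2

# Independence test: x is (n, p), y is (n, q); only the row counts must match.
x = rng.normal(size=(n, p))
y = x[:, :q] + 0.1 * rng.normal(size=(n, q))
res_indep = multiscale_graphcorr(x, y, random_state=0)

# Same test on precomputed (n, n) distance matrices; the internal distance
# step is switched off with compute_distance=None.
res_dist = multiscale_graphcorr(cdist(x, x), cdist(y, y),
                                compute_distance=None, random_state=0)

# Unpaired two-sample test: u is (n, p), v is (m, p) with n != m.
u = rng.normal(size=(n, p))
v = rng.normal(loc=1.0, size=(m, p))
res_two = multiscale_graphcorr(u, v, random_state=0)
```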

compute_distance : callable, optional

A function that computes the distance or similarity among the samples within each data matrix. Set to None if x and y are already distance matrices. The default uses the Euclidean norm metric. If you are calling a custom function, either create the distance matrix beforehand or create a function of the form compute_distance(x), where x is the data matrix for which pairwise distances are calculated.
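Both options described above can be sketched with a Manhattan (L1) metric. `cityblock_dist` is a hypothetical helper written for this example; it simply wraps `cdist`. With the same `random_state`, the two routes are expected to give the same result.

```python
import numpy as np
from scipy.spatial.distance import cdist
from scipy.stats import multiscale_graphcorr

def cityblock_dist(data):
    """Hypothetical helper: pairwise Manhattan (L1) distances between rows of data."""
    return cdist(data, data, metric="cityblock")

rng = np.random.default_rng(1)
x = rng.normal(size=(40, 2))
y = np.abs(x) + 0.1 * rng.normal(size=(40, 2))

# Option 1: pass the callable; it is applied to x and to y separately.
res_callable = multiscale_graphcorr(x, y, compute_distance=cityblock_dist,
                                    random_state=3)

# Option 2: compute the distance matrices beforehand and disable the callable.
res_precomputed = multiscale_graphcorr(cityblock_dist(x), cityblock_dist(y),
                                       compute_distance=None, random_state=3)
```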

reps : int, optional

The number of replications used to estimate the null when using the permutation test. The default is 1000.
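The sketch below shows what `reps` controls: the size of the permutation null distribution and, assuming the add-one estimator `(1 + count) / (1 + reps)` used in current SciPy sources, the smallest p-value that can be reported. It also assumes the `null_dist` key of `mgc_dict`; the warning filter covers a possible `RuntimeWarning` about low replication counts.

```python
import warnings

import numpy as np
from scipy.stats import multiscale_graphcorr

rng = np.random.default_rng(2)
x = rng.normal(size=25)
y = x + 0.5 * rng.normal(size=25)

reps = 200
with warnings.catch_warnings():
    # SciPy may warn that fewer than 1000 replications give unreliable p-values.
    warnings.simplefilter("ignore", RuntimeWarning)
    res = multiscale_graphcorr(x, y, reps=reps, random_state=0)

null_dist = res.mgc_dict["null_dist"]
print(len(null_dist))  # one permuted statistic per replication
# Under the assumed add-one estimator, no p-value can fall below 1 / (reps + 1).
print(res.pvalue, 1 / (reps + 1))
```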

workers : int or map-like callable, optional

If workers is an int, the population is subdivided into workers sections and evaluated in parallel (uses multiprocessing.Pool). Supply -1 to use all cores available to the process. Alternatively, supply a map-like callable, such as multiprocessing.Pool.map, for evaluating the p-value in parallel. This evaluation is carried out as workers(func, iterable), which requires func to be pickleable. The default is 1.
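A minimal sketch of the callable form of `workers`: any object called as `workers(func, iterable)` qualifies, and Python's built-in `map` is the simplest (serial) example. We assume, as in current SciPy sources, that per-replication seeds are drawn before dispatch, so the result should not depend on how the work is mapped.

```python
import numpy as np
from scipy.stats import multiscale_graphcorr

rng = np.random.default_rng(3)
x = rng.normal(size=30)
y = np.sin(3 * x) + 0.2 * rng.normal(size=30)

# Default: serial evaluation of the permutations.
res_serial = multiscale_graphcorr(x, y, workers=1, random_state=7)

# Any map-like callable is accepted and called as workers(func, iterable);
# the built-in map is the simplest (still serial) example.
res_map = multiscale_graphcorr(x, y, workers=map, random_state=7)
```

For actual parallelism, use `workers=-1` or pass the `map` method of a `multiprocessing.Pool`; with process-spawning start methods (the default on Windows and macOS), such calls should sit under an `if __name__ == "__main__":` guard.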

is_twosamp : bool, optional

If True, a two-sample test will be run. If x and y have shapes (n, p) and (m, p), this option will be overridden and set to True. Set to True if x and y both have shapes (n, p) and a two-sample test is desired. The default is False. Note that a two-sample test cannot be run if the inputs are distance matrices (compute_distance=None); a ValueError is raised instead.
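A short sketch of both cases mentioned above: two equally shaped samples that must be flagged explicitly with `is_twosamp=True`, and distance-matrix inputs, which the two-sample path rejects with a `ValueError`. The shifted Gaussian samples are synthetic.

```python
import numpy as np
from scipy.spatial.distance import cdist
from scipy.stats import multiscale_graphcorr

rng = np.random.default_rng(4)
# Two unpaired samples that happen to share the shape (n, p).
u = rng.normal(size=(30, 2))
v = rng.normal(loc=1.0, size=(30, 2))

# Without is_twosamp=True these would be analysed as paired (independence) data.
res = multiscale_graphcorr(u, v, is_twosamp=True, random_state=0)

# The two-sample path needs the raw observations, so distance matrices are refused.
try:
    multiscale_graphcorr(cdist(u, u), cdist(v, v), compute_distance=None,
                         is_twosamp=True)
    rejected = False
except ValueError:
    rejected = True
```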

random_state : {None, int, numpy.random.Generator, numpy.random.RandomState}, optional

If random_state is None (or np.random), the numpy.random.RandomState singleton is used. If it is an int, a new RandomState instance is used, seeded with random_state. If it is already a Generator or RandomState instance, then that instance is used.
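A small reproducibility sketch: a fixed integer seed reproduces the permutation null distribution (read here from the `null_dist` key of `mgc_dict`), while the test statistic itself does not depend on the random state at all.

```python
import numpy as np
from scipy.stats import multiscale_graphcorr

rng = np.random.default_rng(5)
x = rng.normal(size=30)
y = x + rng.normal(size=30)

# The same integer seed reproduces the permutation null distribution exactly.
res_a = multiscale_graphcorr(x, y, random_state=42)
res_b = multiscale_graphcorr(x, y, random_state=42)

# A Generator is also accepted; the statistic itself does not depend on the seed.
res_c = multiscale_graphcorr(x, y, random_state=np.random.default_rng(42))
```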

Returns

res : MGCResult

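An object containing attributes:

statistic : float

The sample MGC test statistic, within [-1, 1].

pvalue : float

The p-value obtained via permutation.

mgc_dict : dict

Contains additional useful results:

  • mgc_map : ndarray, a 2D representation of the latent geometry of the relationship.

  • opt_scale : (int, int), the estimated optimal scale as an (x, y) pair.

  • null_dist : the null distribution derived from the permuted matrices.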
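Reading the returned object can be sketched as below; `x` and `y` are drawn independently, so no particular p-value is expected. The key names checked are those used in current SciPy sources.

```python
import numpy as np
from scipy.stats import multiscale_graphcorr

rng = np.random.default_rng(6)
x = rng.normal(size=(40, 3))
y = rng.normal(size=(40, 1))  # generated independently of x

res = multiscale_graphcorr(x, y, random_state=0)

stat = res.statistic   # float in [-1, 1]
pval = res.pvalue      # permutation p-value in (0, 1]
info = res.mgc_dict    # dict of additional results
print(stat, pval, sorted(info))
```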

Notes

A description of the process of MGC and applications to neuroscience data can be found in [1]. It is performed using the following steps:

  • Two distance matrices $D^X$ and $D^Y$ are computed and modified to be mean zero columnwise. This results in two $n \times n$ distance matrices $A$ and $B$ (the centering and unbiased modification) [3].

  • For all values $k$ and $l$ from $1, \ldots, n$,

    • The $k$-nearest neighbor and $l$-nearest neighbor graphs are calculated for each property. Here, $G_k(i, j)$ indicates the $k$ smallest values of the $i$-th row of $A$ and $H_l(i, j)$ indicates the $l$ smallest values of the $i$-th row of $B$.

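    • Let $\circ$ denote the entry-wise matrix product; then local correlations are summed and normalized using the following statistic:

      $$c^{kl} = \frac{\sum_{ij} \left[A \circ G_k \circ B \circ H_l\right]_{ij}}{\sqrt{\sum_{ij} \left[A \circ A \circ G_k\right]_{ij} \times \sum_{ij} \left[B \circ B \circ H_l\right]_{ij}}}$$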

  • The MGC test statistic is the smoothed optimal local correlation of $\{c^{kl}\}$. Denote the smoothing operation as $R(\cdot)$, which essentially sets all isolated large correlations to 0 and leaves connected large correlations unchanged (see [3]). MGC is then

    $$\mathrm{MGC}_n(x, y) = \max_{(k, l)} R\left(c^{kl}(x_n, y_n)\right)$$
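The steps above can be sketched in plain NumPy. This is a simplified illustration under stated assumptions, not SciPy's implementation: it uses Euclidean distances, centers each column by its off-diagonal mean (one reading of the "unbiased modification"), ranks neighbors by the raw distances $D^X$, $D^Y$ with ties broken by index, and omits the smoothing $R(\cdot)$, so the maximum it reports is the unsmoothed maximum of $c^{kl}$ rather than the MGC statistic. The helpers `center_columns`, `neighbor_ranks`, and `local_correlations` are hypothetical.

```python
import numpy as np
from scipy.spatial.distance import cdist


def center_columns(d):
    """Hypothetical helper: make every column of d mean zero, ignoring the zero diagonal."""
    n = d.shape[0]
    a = d - d.sum(axis=0, keepdims=True) / (n - 1)
    np.fill_diagonal(a, 0.0)
    return a


def neighbor_ranks(d):
    """Hypothetical helper: ranks[i, j] is the position of j in row i sorted ascending (1 = nearest)."""
    order = np.argsort(d, axis=1, kind="stable")
    ranks = np.empty_like(order)
    rows = np.arange(d.shape[0])[:, None]
    ranks[rows, order] = np.arange(1, d.shape[1] + 1)
    return ranks


def local_correlations(x, y):
    """Hypothetical helper: c[k - 1, ell - 1] = c^{k, ell} for k, ell = 1, ..., n."""
    dx, dy = cdist(x, x), cdist(y, y)
    a, b = center_columns(dx), center_columns(dy)
    rank_x, rank_y = neighbor_ranks(dx), neighbor_ranks(dy)
    n = dx.shape[0]
    c = np.zeros((n, n))
    for k in range(1, n + 1):
        g = rank_x <= k  # G_k: k-nearest-neighbor graph from the x distances
        for ell in range(1, n + 1):
            h = rank_y <= ell  # H_l: l-nearest-neighbor graph from the y distances
            num = np.sum(a * g * b * h)
            den = np.sqrt(np.sum(a * a * g) * np.sum(b * b * h))
            c[k - 1, ell - 1] = num / den if den > 0 else 0.0
    return c


rng = np.random.default_rng(0)
x = rng.normal(size=(20, 1))
y = x**3 + 0.1 * rng.normal(size=(20, 1))

c = local_correlations(x, y)
# Unsmoothed maximum over all scales; MGC would first apply R(.) to c.
k_best, l_best = np.unravel_index(np.argmax(c), c.shape)
print(c.max(), (k_best + 1, l_best + 1))
```

Because $G_k$ and $H_l$ are 0/1 masks, the Cauchy-Schwarz inequality keeps every $c^{kl}$ in $[-1, 1]$, and identical inputs give $c^{kk} = 1$ wherever the denominator is nonzero.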

The test statistic takes a value in $[-1, 1]$ since it is normalized.

The p-value returned is calculated using a permutation test. This process is completed by first randomly permuting $y$ to estimate the null distribution and then calculating the probability of observing a test statistic, under the null, at least as extreme as the observed test statistic.
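The permutation procedure can be sketched generically. Permuting $y$ amounts to permuting the rows and columns of its distance matrix together. To stay short, the sketch swaps in a stand-in statistic (the correlation of column-centered distances at the global scale, $k = l = n$) instead of the full MGC statistic, and uses the add-one estimate $(1 + \#\{\text{null} \ge \text{observed}\}) / (1 + \text{reps})$ found in SciPy's source. The helpers `centered`, `global_corr`, and `permutation_pvalue` are hypothetical.

```python
import numpy as np
from scipy.spatial.distance import cdist


def centered(d):
    """Hypothetical helper: column-center a distance matrix, ignoring the zero diagonal."""
    n = d.shape[0]
    a = d - d.sum(axis=0, keepdims=True) / (n - 1)
    np.fill_diagonal(a, 0.0)
    return a


def global_corr(dx, dy):
    """Stand-in statistic: correlation of centered distances at the global scale."""
    a, b = centered(dx), centered(dy)
    return np.sum(a * b) / np.sqrt(np.sum(a * a) * np.sum(b * b))


def permutation_pvalue(dx, dy, stat_fn, reps=1000, seed=None):
    """Permute the samples behind dy (rows and columns together) to build the null."""
    rng = np.random.default_rng(seed)
    observed = stat_fn(dx, dy)
    null = np.empty(reps)
    for r in range(reps):
        order = rng.permutation(dy.shape[0])
        null[r] = stat_fn(dx, dy[np.ix_(order, order)])
    # Add-one estimate of P(null statistic >= observed statistic).
    pvalue = (1 + np.sum(null >= observed)) / (1 + reps)
    return observed, pvalue, null


rng = np.random.default_rng(0)
x = rng.normal(size=(25, 1))
y = x + 0.3 * rng.normal(size=(25, 1))
obs, p, null = permutation_pvalue(cdist(x, x), cdist(y, y), global_corr,
                                  reps=500, seed=1)
print(obs, p)
```

The add-one form never returns exactly zero, which is the usual choice for Monte Carlo permutation p-values.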

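MGC requires at least 5 samples to run with reliable results. It can also handle high-dimensional data sets. In addition, by manipulating the input data matrices, the two-sample testing problem can be reduced to the independence testing problem [4]. Given sample data $U$ and $V$ of sizes $p \times n$ and $p \times m$, data matrices $X$ and $Y$ can be created as follows:

$$X = [U \mid V] \in \mathbb{R}^{p \times (n + m)}, \qquad Y = [0_{1 \times n} \mid 1_{1 \times m}] \in \mathbb{R}^{1 \times (n + m)}$$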

Then, the MGC statistic can be calculated as normal. This methodology can be extended to similar tests such as distance correlation [4].
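The reduction from a two-sample test to an independence test can be sketched by building the stacked data and the 0/1 labels by hand. Observations are stored as rows here, the transpose of the column layout used for $U$ and $V$ in the Notes. Assuming SciPy performs the same transform internally (stacking `u` above `v`), the manual call and the built-in two-sample call should agree for the same `random_state`.

```python
import numpy as np
from scipy.stats import multiscale_graphcorr

rng = np.random.default_rng(7)
n, m, p = 20, 15, 2
u = rng.normal(size=(n, p))           # first sample, one observation per row
v = rng.normal(loc=1.0, size=(m, p))  # second sample

# Stack both samples and label where each observation came from.
x_stacked = np.concatenate([u, v], axis=0)            # shape (n + m, p)
y_labels = np.concatenate([np.zeros(n), np.ones(m)])  # 0 = from u, 1 = from v

res_manual = multiscale_graphcorr(x_stacked, y_labels, random_state=0)
res_builtin = multiscale_graphcorr(u, v, random_state=0)  # n != m: two-sample test
```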

Array API Standard Support

multiscale_graphcorr has experimental support for Python Array API Standard compatible backends in addition to NumPy. Please consider testing these features by setting an environment variable SCIPY_ARRAY_API=1 and providing CuPy, PyTorch, JAX, or Dask arrays as array arguments. The following combinations of backend and device (or other capability) are supported.

====================  ====================  ====================
Library               CPU                   GPU
====================  ====================  ====================
NumPy                 ✅                     n/a                 
CuPy                  n/a                   ⛔                   
PyTorch               ⛔                     ⛔                   
JAX                   ⛔                     ⛔                   
Dask                  ⛔                     n/a                 
====================  ====================  ====================

See dev-arrayapi for more information.

Examples

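import numpy as np
from scipy.stats import multiscale_graphcorr

# Independence test on two perfectly dependent samples
x = np.arange(100)
y = x
res = multiscale_graphcorr(x, y)
print(res.statistic, res.pvalue)

To run an unpaired two-sample test, pass samples with different numbers of observations:

x = np.arange(100)
y = np.arange(79)
res = multiscale_graphcorr(x, y)
print(res.statistic, res.pvalue)

or, if the inputs have the same shape, request the two-sample test explicitly:

x = np.arange(100)
y = x
res = multiscale_graphcorr(x, y, is_twosamp=True)
print(res.statistic, res.pvalue)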

See also

kendalltau

Calculates Kendall's tau.

pearsonr

Pearson correlation coefficient and p-value for testing non-correlation.

spearmanr

Calculates a Spearman rank-order correlation coefficient.

Aliases

  • scipy.stats.multiscale_graphcorr

Referenced by