bundles / scipy 1.17.1 / scipy / stats / _stats_py / pearsonr
function
scipy.stats._stats_py:pearsonr
source: /scipy/stats/_stats_py.py :4369
Signature
def pearsonr(x, y, *, alternative='two-sided', method=None, axis=0)
Summary
Pearson correlation coefficient and p-value for testing non-correlation.
Extended Summary
The Pearson correlation coefficient [1] measures the linear relationship between two datasets. Like other correlation coefficients, this one varies between -1 and +1 with 0 implying no correlation. Correlations of -1 or +1 imply an exact linear relationship. Positive correlations imply that as x increases, so does y. Negative correlations imply that as x increases, y decreases.
This function also performs a test of the null hypothesis that the distributions underlying the samples are uncorrelated and normally distributed. (See Kowalski [3] for a discussion of the effects of non-normality of the input on the distribution of the correlation coefficient.) The p-value roughly indicates the probability of an uncorrelated system producing datasets that have a Pearson correlation at least as extreme as the one computed from these datasets.
Parameters
x: array_like
Input array.
y: array_like
Input array.
axis: int or None, default: 0
Axis along which to perform the calculation. If None, ravel both arrays before performing the calculation.
alternative: {'two-sided', 'greater', 'less'}, optional
Defines the alternative hypothesis. Default is 'two-sided'. The following options are available:
- 'two-sided': the correlation is nonzero
- 'less': the correlation is negative (less than zero)
- 'greater': the correlation is positive (greater than zero)
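As a minimal sketch of how the three alternatives relate under the default (analytic) p-value, the snippet below reuses the small sample from the Examples section. It assumes the null distribution is continuous and symmetric about zero, so the two one-sided p-values sum to 1 and the two-sided p-value is twice the smaller tail. The variable names are illustrative only.

```python
from scipy import stats

x = [1, 2, 3, 4, 5, 6, 7]
y = [10, 9, 2.5, 6, 4, 3, 2]

# Same data, three alternative hypotheses
p_two = stats.pearsonr(x, y).pvalue
p_less = stats.pearsonr(x, y, alternative='less').pvalue
p_greater = stats.pearsonr(x, y, alternative='greater').pvalue

# The sample correlation is negative here, so 'less' is the smaller tail
two_sided_from_tails = 2 * min(p_less, p_greater)
tails_sum = p_less + p_greater
```

When the direction of the effect is known in advance, a one-sided alternative gives a smaller p-value for a correlation in that direction; otherwise the two-sided default is the safer choice.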
method: ResamplingMethod, optional
Defines the method used to compute the p-value. If method is an instance of PermutationMethod/MonteCarloMethod, the p-value is computed using scipy.stats.permutation_test/scipy.stats.monte_carlo_test with the provided configuration options and other appropriate settings. Otherwise, the p-value is computed as documented in the notes.
Returns
result: `~scipy.stats._result_classes.PearsonRResult`
An object with the following attributes:
statistic
Pearson product-moment correlation coefficient.
pvalue
The p-value associated with the chosen alternative.
The object has the following method:
confidence_interval(confidence_level, method)
This computes the confidence interval of the correlation coefficient statistic for the given confidence level. The confidence interval is returned in a namedtuple with fields low and high. If method is not provided, the confidence interval is computed using the Fisher transformation [1]. If method is an instance of BootstrapMethod, the confidence interval is computed using scipy.stats.bootstrap with the provided configuration options and other appropriate settings. In some cases, confidence limits may be NaN due to a degenerate resample, and this is typical for very small samples (~6 observations).
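The default interval can be reproduced by hand, which may help when checking results. The sketch below assumes the textbook two-sided Fisher z interval: transform r with arctanh, use a standard error of 1/sqrt(n - 3), and map the limits back with tanh. The names manual_low and manual_high are illustrative, and the approach only applies for n > 3.

```python
import numpy as np
from scipy import stats

x = np.array([1, 2, 3, 4, 5, 6, 7], dtype=float)
y = np.array([10, 9, 2.5, 6, 4, 3, 2])

res = stats.pearsonr(x, y)
ci = res.confidence_interval(confidence_level=0.9)

# Manual Fisher z interval (assumed to match the default method)
n = len(x)
z = np.arctanh(res.statistic)          # Fisher transformation of r
se = 1.0 / np.sqrt(n - 3)              # approximate standard error of z
h = stats.norm.ppf(0.5 + 0.9 / 2)      # two-sided normal quantile
manual_low = np.tanh(z - h * se)
manual_high = np.tanh(z + h * se)
```

The limits are asymmetric around r because the interval is symmetric on the z scale, not on the r scale.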
Raises
ValueError
If x and y do not have length at least 2.
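A short sketch of guarding against this error when input length is not known in advance. The variable error_message is illustrative, and the exact message text is not assumed here.

```python
from scipy import stats

try:
    # A single observation per sample is too short for a correlation
    stats.pearsonr([1.0], [2.0])
except ValueError as exc:
    error_message = str(exc)
else:
    error_message = None
```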
Warns
`~scipy.stats.ConstantInputWarning`
Raised if an input is a constant array. The correlation coefficient is not defined in this case, so np.nan is returned.
`~scipy.stats.NearConstantInputWarning`
Raised if an input is "nearly" constant. The array x is considered nearly constant if norm(x - mean(x)) < 1e-13 * abs(mean(x)). Numerical errors in the calculation x - mean(x) in this case might result in an inaccurate calculation of r.
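To see the constant-input case in practice, the following sketch records warnings while calling pearsonr with one constant sample. The flag names are illustrative; other warnings (for example from NumPy) may be recorded as well, so the check looks only for the SciPy category.

```python
import warnings

import numpy as np
from scipy import stats

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # x is constant, so its variance is zero and r is undefined
    res = stats.pearsonr([3.0, 3.0, 3.0, 3.0], [1.0, 2.0, 3.0, 4.0])

saw_constant_warning = any(
    issubclass(w.category, stats.ConstantInputWarning) for w in caught
)
statistic_is_nan = bool(np.isnan(res.statistic))
```

Code that processes many columns can use the same pattern to detect and skip degenerate inputs rather than propagating NaN silently.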
Notes
The correlation coefficient is calculated as follows:
\[ r = \frac{\sum (x - m_x) (y - m_y)}{\sqrt{\sum (x - m_x)^2 \sum (y - m_y)^2}} \]
where m_x is the mean of the vector x and m_y is the mean of the vector y.
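The definition can be checked directly with NumPy: center each vector on its mean, then divide the sum of products by the square root of the product of the sums of squares. This is a minimal sketch for illustration, not the library's internal implementation, which takes extra care with scaling and precision.

```python
import numpy as np
from scipy import stats

x = np.array([1, 2, 3, 4, 5, 6, 7], dtype=float)
y = np.array([10, 9, 2.5, 6, 4, 3, 2], dtype=float)

# Center each vector on its mean
xm = x - x.mean()
ym = y - y.mean()

# Sum of products over the square root of the product of sums of squares
r_manual = np.sum(xm * ym) / np.sqrt(np.sum(xm**2) * np.sum(ym**2))
r_scipy = stats.pearsonr(x, y).statistic
```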
Under the assumption that x and y are drawn from independent normal distributions (so the population correlation coefficient is 0), the probability density function of the sample correlation coefficient r is ([1], [2]):
\[ f(r) = \frac{(1 - r^2)^{n/2 - 2}}{\mathrm{B}\left(\frac{1}{2}, \frac{n}{2} - 1\right)} \]
where n is the number of samples, and B is the beta function. This is sometimes referred to as the exact distribution of r. This is the distribution that is used in pearsonr to compute the p-value when the method parameter is left at its default value (None). The distribution is a beta distribution on the interval [-1, 1], with equal shape parameters a = b = n/2 - 1. In terms of SciPy's implementation of the beta distribution, the distribution of r is
dist = scipy.stats.beta(n/2 - 1, n/2 - 1, loc=-1, scale=2)
The default p-value returned by pearsonr is a two-sided p-value. For a given sample with correlation coefficient r, the p-value is the probability that abs(r') of a random sample x' and y' drawn from the population with zero correlation would be greater than or equal to abs(r). In terms of the object dist shown above, the p-value for a given r and length n can be computed as
p = 2*dist.cdf(-abs(r))
When n is 2, the above continuous distribution is not well-defined. One can interpret the limit of the beta distribution as the shape parameters a and b approach a = b = 0 as a discrete distribution with equal probability masses at r = 1 and r = -1. More directly, one can observe that, given the data x = [x1, x2] and y = [y1, y2], and assuming x1 != x2 and y1 != y2, the only possible values for r are 1 and -1. Because abs(r') for any sample x' and y' with length 2 will be 1, the two-sided p-value for a sample of length 2 is always 1.
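Putting the two expressions together, the sketch below recomputes the default two-sided p-value from the beta distribution and compares it with what pearsonr returns. It also evaluates the n = 2 case described above. The sample values and the names p_manual and res_n2 are illustrative.

```python
from scipy import stats

x = [1, 2, 3, 4, 5, 6, 7]
y = [10, 9, 2.5, 6, 4, 3, 2]
n = len(x)

res = stats.pearsonr(x, y)
r = res.statistic

# Null distribution of r: beta on [-1, 1] with a = b = n/2 - 1
dist = stats.beta(n/2 - 1, n/2 - 1, loc=-1, scale=2)
p_manual = 2 * dist.cdf(-abs(r))

# Length-2 samples: r can only be +1 or -1
res_n2 = stats.pearsonr([1.0, 2.0], [3.0, 5.0])
```

Because the distribution is symmetric about zero, doubling the lower-tail probability at -abs(r) is equivalent to summing both tails.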
For backwards compatibility, the object that is returned also behaves like a tuple of length two that holds the statistic and the p-value.
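For instance, both access styles below should give the same numbers; this is a brief sketch using the sample from the Examples section.

```python
from scipy import stats

x = [1, 2, 3, 4, 5, 6, 7]
y = [10, 9, 2.5, 6, 4, 3, 2]

res = stats.pearsonr(x, y)

# Tuple-style unpacking, as in older code
r, p = res

# Attribute access on the result object
r_attr, p_attr = res.statistic, res.pvalue
```

Attribute access is the only way to reach confidence_interval, so new code generally benefits from keeping the result object rather than unpacking it immediately.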
Array API Standard Support
pearsonr has experimental support for Python Array API Standard compatible backends in addition to NumPy. Please consider testing these features by setting an environment variable SCIPY_ARRAY_API=1 and providing CuPy, PyTorch, JAX, or Dask arrays as array arguments. The following combinations of backend and device (or other capability) are supported.
====================  ====================  ====================
Library               CPU                   GPU
====================  ====================  ====================
NumPy                 ✅                    n/a
CuPy                  n/a                   ✅
PyTorch               ✅                    ⛔
JAX                   ✅                    ✅
Dask                  ✅                    n/a
====================  ====================  ====================
See dev-arrayapi for more information.
Examples
import numpy as np
from scipy import stats
x, y = [1, 2, 3, 4, 5, 6, 7], [10, 9, 2.5, 6, 4, 3, 2]
res = stats.pearsonr(x, y)
res
# Exact permutation version of the test (all permutations, n_resamples=np.inf)
rng = np.random.default_rng(7796654889291491997)
method = stats.PermutationMethod(n_resamples=np.inf, random_state=rng)
stats.pearsonr(x, y, method=method)
# Monte Carlo test with both samples drawn from uniform distributions under the null
method = stats.MonteCarloMethod(rvs=(rng.uniform, rng.uniform))
stats.pearsonr(x, y, method=method)
# Confidence interval with the default (Fisher transformation) method
res.confidence_interval(confidence_level=0.9)
# Bootstrap confidence interval
method = stats.BootstrapMethod(method='BCa', rng=rng)
res.confidence_interval(confidence_level=0.9, method=method)
# Multidimensional input: the statistic is computed along the given axis
rng = np.random.default_rng(2348246935601934321)
x = rng.standard_normal((8, 15))
y = rng.standard_normal((8, 15))
stats.pearsonr(x, y, axis=0).statistic.shape  # between corresponding columns
stats.pearsonr(x, y, axis=1).statistic.shape  # between corresponding rows
# Broadcasting: correlate every row of x with every row of y
stats.pearsonr(x[:, np.newaxis, :], y, axis=-1).statistic.shape
# Linear dependence: y is x plus normal noise with standard deviation s
rng = np.random.default_rng()
s = 0.5
x = stats.norm.rvs(size=500, random_state=rng)
e = stats.norm.rvs(scale=s, size=500, random_state=rng)
y = x + e
stats.pearsonr(x, y).statistic
1/np.sqrt(1 + s**2)  # population correlation for this construction
# Nonlinear dependence: y is fully determined by x
y = np.abs(x)
stats.pearsonr(x, y)
# Dependence only for x < 0
y = np.where(x < 0, x, 0)
stats.pearsonr(x, y)
See also
- hypothesis_pearsonr
Extended example
- kendalltau
Kendall's tau, a correlation measure for ordinal data.
- spearmanr
Spearman rank-order correlation coefficient.
Aliases
- scipy.stats.pearsonr