`scipy.stats._hypotests:somersd`

source: /scipy/stats/_hypotests.py :775

Signature

def somersd(x, y=None, alternative='two-sided')

Summary

Calculates Somers' D, an asymmetric measure of ordinal association.

Extended Summary

Like Kendall's $τ$, Somers' $D$ is a measure of the correspondence between two rankings. Both statistics consider the difference between the number of concordant and discordant pairs in two rankings $X$ and $Y$, and both are normalized such that values close to 1 indicate strong agreement and values close to -1 indicate strong disagreement. They differ in how they are normalized. To show the relationship, Somers' $D$ can be defined in terms of Kendall's $τ_{a}$:

$$D(Y ∣ X) = \frac{τ_{a}(X, Y)}{τ_{a}(X, X)}$$
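The ratio above can be checked numerically. The following is a minimal sketch, not the library's implementation: `tau_a` is a hypothetical brute-force helper written for this illustration, and the rankings are made-up data.

```python
import numpy as np
from scipy.stats import somersd


def tau_a(x, y):
    """Kendall's tau-a by brute force (hypothetical helper, O(n^2))."""
    x = np.asarray(x)
    y = np.asarray(y)
    n = len(x)
    # Sign of every pairwise difference. The product is +1 for a
    # concordant pair, -1 for a discordant pair, and 0 for a tie.
    dx = np.sign(x[:, None] - x[None, :])
    dy = np.sign(y[:, None] - y[None, :])
    # Each unordered pair appears twice in the n-by-n matrix, so dividing
    # by n * (n - 1) is the same as dividing (C - D) by n * (n - 1) / 2.
    return (dx * dy).sum() / (n * (n - 1))


# Made-up rankings with ties, for illustration only.
x = [1, 1, 2, 2, 3, 3, 3]
y = [1, 2, 2, 3, 2, 4, 4]

d_manual = tau_a(x, y) / tau_a(x, x)
d_scipy = somersd(x, y).statistic
print(d_manual, d_scipy)
```

Here `tau_a(x, x)` is simply the fraction of pairs that are not tied in `x`, which is why the denominator depends only on the independent variable.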

Suppose the first ranking $X$ has $r$ distinct ranks and the second ranking $Y$ has $s$ distinct ranks. These two lists of $n$ rankings can also be viewed as an $r \times s$ contingency table in which element $i, j$ is the number of rank pairs with rank $i$ in ranking $X$ and rank $j$ in ranking $Y$. Accordingly, somersd also allows the input data to be supplied as a single, 2D contingency table instead of as two separate, 1D rankings.
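As a rough illustration of the two equivalent input forms, the sketch below builds the $r \times s$ table by hand with NumPy and passes it to somersd. The rankings are made-up data, and the manual tabulation is only one possible way to form the table.

```python
import numpy as np
from scipy.stats import somersd

# Made-up rankings: r = 3 distinct ranks in x, s = 4 distinct ranks in y.
x = np.array([1, 1, 2, 2, 3, 3, 3])
y = np.array([1, 2, 2, 3, 2, 4, 4])

# Map each ranking to 0-based indices of its sorted distinct ranks.
x_ranks, x_idx = np.unique(x, return_inverse=True)
y_ranks, y_idx = np.unique(y, return_inverse=True)

# Element (i, j) counts the pairs with rank i in x and rank j in y.
table = np.zeros((len(x_ranks), len(y_ranks)), dtype=int)
np.add.at(table, (x_idx, y_idx), 1)

res_rankings = somersd(x, y)   # two 1D rankings
res_table = somersd(table)     # one 2D contingency table
print(res_rankings.statistic, res_table.statistic)
```

The `table` field of the result returned for the 1D inputs should hold the same counts as the hand-built table.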

Note that the definition of Somers' $D$ is asymmetric: in general, $D(Y ∣ X) \neq D(X ∣ Y)$. somersd(x, y) calculates Somers' $D(Y ∣ X)$: the "row" variable $X$ is treated as an independent variable, and the "column" variable $Y$ is dependent. For Somers' $D(X ∣ Y)$, swap the input lists or transpose the input table.
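The asymmetry can be seen on a small example. This sketch uses made-up rankings and shows that swapping the inputs and transposing the table give the same $D(X ∣ Y)$, which differs from $D(Y ∣ X)$.

```python
import numpy as np
from scipy.stats import somersd

# Made-up rankings, for illustration only.
x = [1, 1, 2, 2, 3, 3, 3]
y = [1, 2, 2, 3, 2, 4, 4]

res = somersd(x, y)
d_y_given_x = res.statistic                    # D(Y | X): x is independent

# Two equivalent routes to D(X | Y).
d_x_given_y = somersd(y, x).statistic          # swap the input lists
d_x_given_y_t = somersd(np.asarray(res.table).T).statistic  # transpose table

print(d_y_given_x, d_x_given_y, d_x_given_y_t)
```

The numerator (concordant minus discordant pairs) is the same in both directions; only the normalization changes, because it counts pairs that are not tied in the independent variable.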

Parameters

x : array_like: 1D array of rankings, treated as the (row) independent variable. Alternatively, a 2D contingency table.
y : array_like, optional: If x is a 1D array of rankings, y is a 1D array of rankings of the same length, treated as the (column) dependent variable. If x is 2D, y is ignored.
alternative : {'two-sided', 'less', 'greater'}, optional: Defines the alternative hypothesis. Default is 'two-sided'. The following options are available:

* 'two-sided': the rank correlation is nonzero
* 'less': the rank correlation is negative (less than zero)
* 'greater': the rank correlation is positive (greater than zero)
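A short sketch of how the alternative parameter affects the result, using the contingency table from the Examples section. The statistic should not depend on the alternative; only the p-value changes.

```python
from scipy.stats import somersd

# Contingency table from the Examples section (cleanliness x satisfaction).
table = [[27, 25, 14, 7, 0], [7, 14, 18, 35, 12], [1, 3, 2, 7, 17]]

two_sided = somersd(table, alternative='two-sided')
greater = somersd(table, alternative='greater')
less = somersd(table, alternative='less')

for name, res in [('two-sided', two_sided),
                  ('greater', greater),
                  ('less', less)]:
    print(name, res.statistic, res.pvalue)
```

Because the sample statistic is positive here, 'greater' should give a small p-value and 'less' a p-value close to 1.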

Returns

res : SomersDResult

A SomersDResult object with the following fields:

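statistic : float: The Somers' $D$ statistic.
pvalue : float: The p-value for a hypothesis test whose null hypothesis is an absence of association, $D = 0$.
table : 2D array: The contingency table formed from rankings x and y (or the provided contingency table, if x is a 2D array).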

Notes

This function follows the contingency table approach of ^[2] and ^[3]. The p-values are computed based on an asymptotic approximation of the test statistic distribution under the null hypothesis $D = 0$.

Theoretically, hypothesis tests based on Kendall's $τ$ and Somers' $D$ should be identical. However, the p-values returned by kendalltau are based on the null hypothesis of independence between $X$ and $Y$ (i.e. the population from which pairs in $X$ and $Y$ are sampled contains equal numbers of all possible pairs), which is more specific than the null hypothesis $D = 0$ used here. If the null hypothesis of independence is desired, it is acceptable to use the p-value returned by kendalltau with the statistic returned by somersd and vice versa. For more information, see ^[2].
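To compare the two tests side by side, one could run both functions on the same data, as in the sketch below. The rankings are made-up, and the kendalltau result is unpacked as a tuple so that the example does not depend on attribute names.

```python
from scipy.stats import kendalltau, somersd

# Made-up rankings, for illustration only.
x = [1, 1, 2, 2, 3, 3, 3]
y = [1, 2, 2, 3, 2, 4, 4]

d_res = somersd(x, y)
tau_stat, tau_pvalue = kendalltau(x, y)

# Null hypothesis D = 0 (somersd) versus independence (kendalltau).
print("somersd:   ", d_res.statistic, d_res.pvalue)
print("kendalltau:", tau_stat, tau_pvalue)
```

The two p-values need not agree, since they are computed under different null hypotheses.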

Contingency tables are formatted according to the convention used by SAS and R: the first ranking supplied (x) is the "row" variable, and the second ranking supplied (y) is the "column" variable. This is opposite the convention of Somers' original paper ^[1].

Array API Standard Support

somersd has experimental support for Python Array API Standard compatible backends in addition to NumPy. Please consider testing these features by setting an environment variable SCIPY_ARRAY_API=1 and providing CuPy, PyTorch, JAX, or Dask arrays as array arguments. The following combinations of backend and device (or other capability) are supported.

====================  ====================  ====================
Library               CPU                   GPU
====================  ====================  ====================
NumPy                 ✅                     n/a                 
CuPy                  n/a                   ⛔                   
PyTorch               ⛔                     ⛔                   
JAX                   ⛔                     ⛔                   
Dask                  ⛔                     n/a                 
====================  ====================  ====================

See dev-arrayapi for more information.

Examples

We calculate Somers' D for the example given in ^[4], in which a hotel chain owner seeks to determine the association between hotel room cleanliness and customer satisfaction. The independent variable, hotel room cleanliness, is ranked on an ordinal scale: "below average (1)", "average (2)", or "above average (3)". The dependent variable, customer satisfaction, is ranked on a second scale: "very dissatisfied (1)", "moderately dissatisfied (2)", "neither dissatisfied nor satisfied (3)", "moderately satisfied (4)", or "very satisfied (5)". 189 customers respond to the survey, and the results are cast into a contingency table with the hotel room cleanliness as the "row" variable and customer satisfaction as the "column" variable.

+-----+-----+-----+-----+-----+-----+
|     | (1) | (2) | (3) | (4) | (5) |
+=====+=====+=====+=====+=====+=====+
| (1) | 27  | 25  | 14  | 7   | 0   |
+-----+-----+-----+-----+-----+-----+
| (2) | 7   | 14  | 18  | 35  | 12  |
+-----+-----+-----+-----+-----+-----+
| (3) | 1   | 3   | 2   | 7   | 17  |
+-----+-----+-----+-----+-----+-----+

For example, 27 customers assigned their room a cleanliness ranking of "below average (1)" and a corresponding satisfaction of "very dissatisfied (1)". We perform the analysis as follows.

from scipy.stats import somersd
table = [[27, 25, 14, 7, 0], [7, 14, 18, 35, 12], [1, 3, 2, 7, 17]]
res = somersd(table)

res.statistic
res.pvalue

The value of the Somers' D statistic is approximately 0.6, indicating a positive correlation between room cleanliness and customer satisfaction in the sample. The *p*-value is very small, indicating a very small probability of observing such an extreme value of the statistic under the null hypothesis that the statistic of the entire population (from which our sample of 189 customers is drawn) is zero. This supports the alternative hypothesis that the true value of Somers' D for the population is nonzero.
