`scipy.spatial.distance:cdist`

source: /scipy/spatial/distance.py:2606

Signature

   def cdist(XA, XB, metric='euclidean', *, out=None, **kwargs)

Summary

Compute distance between each pair of the two collections of inputs.

Extended Summary

See Notes for common calling conventions.

Parameters

XA : array_like

An $m_{A}$ by $n$ array of $m_{A}$ original observations in an $n$ -dimensional space. Inputs are converted to float type.

XB : array_like

An $m_{B}$ by $n$ array of $m_{B}$ original observations in an $n$ -dimensional space. Inputs are converted to float type.

metric : str or callable, optional

The distance metric to use. If a string, the distance function can be 'braycurtis', 'canberra', 'chebyshev', 'cityblock', 'correlation', 'cosine', 'dice', 'euclidean', 'hamming', 'jaccard', 'jensenshannon', 'mahalanobis', 'matching', 'minkowski', 'rogerstanimoto', 'russellrao', 'seuclidean', 'sokalsneath', 'sqeuclidean', 'yule'.

**kwargs : dict, optional

Extra arguments to metric: refer to each metric documentation for a list of all possible arguments.

Some possible arguments:

p : scalar
The p-norm to apply for Minkowski, weighted and unweighted. Default: 2.

w : array_like
The weight vector for metrics that support weights (e.g., Minkowski).

V : array_like
The variance vector for standardized Euclidean. Default: var(vstack([XA, XB]), axis=0, ddof=1)

VI : array_like
The inverse of the covariance matrix for Mahalanobis. Default: inv(cov(vstack([XA, XB]).T)).T

out : ndarray
The output array. If not None, the distance matrix Y is stored in this array.
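The keyword arguments above can be exercised as in the following minimal sketch. The random data, the seed, and the array shapes are illustrative assumptions, and the explicit `V` and `VI` values simply reproduce the defaults described above.

```python
import numpy as np
from scipy.spatial.distance import cdist

rng = np.random.default_rng(0)  # arbitrary seed, illustrative data only
XA = rng.random((4, 3))
XB = rng.random((5, 3))

# p: Minkowski with p=1 coincides with the city block metric.
d_p1 = cdist(XA, XB, 'minkowski', p=1)
d_city = cdist(XA, XB, 'cityblock')

# V: explicit variance vector for standardized Euclidean,
# computed here the same way as the documented default.
V = np.var(np.vstack([XA, XB]), axis=0, ddof=1)
d_seu = cdist(XA, XB, 'seuclidean', V=V)

# VI: explicit inverse covariance matrix for Mahalanobis,
# again mirroring the documented default.
VI = np.linalg.inv(np.cov(np.vstack([XA, XB]).T)).T
d_mah = cdist(XA, XB, 'mahalanobis', VI=VI)

# out: preallocated float64 array of shape (m_A, m_B) that receives the result.
out = np.empty((XA.shape[0], XB.shape[0]), dtype=np.float64)
Y = cdist(XA, XB, 'euclidean', out=out)
```

Passing `out` is mainly useful when the same buffer is reused across repeated calls with inputs of the same shape.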

Returns

Y : ndarray

An $m_{A}$ by $m_{B}$ distance matrix is returned. For each $i$ and $j$, the metric dist(u=XA[i], v=XB[j]) is computed and stored in the $ij$-th entry.
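A small sketch of the return value, using hypothetical input arrays, shows the output shape and checks each entry against the single-pair function `scipy.spatial.distance.euclidean`:

```python
import numpy as np
from scipy.spatial.distance import cdist, euclidean

# Hypothetical inputs: m_A = 3 and m_B = 2 points in 2-D.
XA = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
XB = np.array([[3.0, 4.0], [1.0, 1.0]])

Y = cdist(XA, XB, 'euclidean')
print(Y.shape)  # (3, 2)

# Y[i, j] is the metric applied to row i of XA and row j of XB.
for i in range(XA.shape[0]):
    for j in range(XB.shape[0]):
        assert np.isclose(Y[i, j], euclidean(XA[i], XB[j]))
```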

Raises

ValueError

Raised if XA and XB do not have the same number of columns.
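The column mismatch can be reproduced with a short sketch. The array shapes are illustrative, and the exact wording of the error message is not assumed here:

```python
import numpy as np
from scipy.spatial.distance import cdist

XA = np.zeros((3, 2))  # 2 columns
XB = np.zeros((4, 3))  # 3 columns, so the dimensions do not match

try:
    cdist(XA, XB, 'euclidean')
    raised = False
except ValueError as exc:
    raised = True
    print(f"ValueError: {exc}")
```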

Notes

The following are common calling conventions:

Y = cdist(XA, XB, 'euclidean')
Computes the distance between each point in XA and each point in XB using Euclidean distance (2-norm) as the distance metric between the points. The points are arranged as $n$-dimensional row vectors in the matrices XA and XB.
Y = cdist(XA, XB, 'minkowski', p=2.)
Computes the distances using the Minkowski distance $∥ u - v ∥_{p}$ ( $p$ -norm) where $p > 0$ (note that this is only a quasi-metric if $0 < p < 1$ ).
Y = cdist(XA, XB, 'cityblock')
Computes the city block or Manhattan distance between the points.
Y = cdist(XA, XB, 'seuclidean', V=None)
Computes the standardized Euclidean distance. The standardized Euclidean distance between two n-vectors u and v is
$\sqrt{\sum_{i} (u_{i} - v_{i})^{2} / V[i]} .$
V is the variance vector; V[i] is the variance computed over all the i'th components of the points. If not passed, it is automatically computed.
Y = cdist(XA, XB, 'sqeuclidean')
Computes the squared Euclidean distance $∥ u - v ∥_{2}^{2}$ between the vectors.
Y = cdist(XA, XB, 'cosine')
Computes the cosine distance between vectors u and v,
$1 - \frac{u \cdot v}{∥ u ∥ _{2} ∥ v ∥ _{2}}$
where $∥ * ∥_{2}$ is the 2-norm of its argument *, and $u \cdot v$ is the dot product of $u$ and $v$ .
Y = cdist(XA, XB, 'correlation')
Computes the correlation distance between vectors u and v. This is
$1 - \frac{(u - \bar{u}) \cdot (v - \bar{v})}{∥ (u - \bar{u}) ∥_{2} ∥ (v - \bar{v}) ∥_{2}}$
where $\bar{v}$ is the mean of the elements of vector v, and $x \cdot y$ is the dot product of $x$ and $y$ .
Y = cdist(XA, XB, 'hamming')
Computes the normalized Hamming distance, or the proportion of those vector elements between two n-vectors u and v which disagree. To save memory, the matrices XA and XB can be of type boolean.
Y = cdist(XA, XB, 'jaccard')
Computes the Jaccard distance between the points. Given two vectors, u and v, the Jaccard distance is the proportion of those elements u[i] and v[i] that disagree where at least one of them is non-zero.
Y = cdist(XA, XB, 'jensenshannon')
Computes the Jensen-Shannon distance between two probability arrays. Given two probability vectors, $p$ and $q$ , the Jensen-Shannon distance is
$\sqrt{\frac{D(p ∥ m) + D(q ∥ m)}{2}}$
where $m$ is the pointwise mean of $p$ and $q$ and $D$ is the Kullback-Leibler divergence.
Y = cdist(XA, XB, 'chebyshev')
Computes the Chebyshev distance between the points. The Chebyshev distance between two n-vectors u and v is the maximum absolute difference between their respective elements. More precisely, the distance is given by
$d(u, v) = \max_{i} ∣ u_{i} - v_{i} ∣ .$
Y = cdist(XA, XB, 'canberra')
Computes the Canberra distance between the points. The Canberra distance between two points u and v is
$d(u, v) = \sum_{i} \frac{∣ u_{i} - v_{i} ∣}{∣ u_{i} ∣ + ∣ v_{i} ∣} .$
Y = cdist(XA, XB, 'braycurtis')
Computes the Bray-Curtis distance between the points. The Bray-Curtis distance between two points u and v is
$d (u, v) = \frac{\sum _{i} ( ∣ u _{i} - v _{i} ∣ )}{\sum _{i} ( ∣ u _{i} + v _{i} ∣ )}$
Y = cdist(XA, XB, 'mahalanobis', VI=None)
Computes the Mahalanobis distance between the points. The Mahalanobis distance between two points u and v is $\sqrt{(u - v) (1/V) (u - v)^{T}}$ where $(1/V)$ (the VI variable) is the inverse covariance. If VI is not None, VI will be used as the inverse covariance matrix.
Y = cdist(XA, XB, 'yule')
Computes the Yule distance between the boolean vectors. (see yule function documentation)
Y = cdist(XA, XB, 'matching')
Synonym for 'hamming'.
Y = cdist(XA, XB, 'dice')
Computes the Dice distance between the boolean vectors. (see dice function documentation).
Y = cdist(XA, XB, 'rogerstanimoto')
Computes the Rogers-Tanimoto distance between the boolean vectors. (see rogerstanimoto function documentation)
Y = cdist(XA, XB, 'russellrao')
Computes the Russell-Rao distance between the boolean vectors. (see russellrao function documentation)
Y = cdist(XA, XB, 'sokalsneath')
Computes the Sokal-Sneath distance between the vectors. (see sokalsneath function documentation)
Y = cdist(XA, XB, f)
Computes the distance between each pair of vectors from XA and XB using the user-supplied 2-arity function f. For example, the Euclidean distance between the vectors could be computed as follows
```
dm = cdist(XA, XB, lambda u, v: np.sqrt(((u-v)**2).sum()))
```
Note that you should avoid passing a reference to one of the distance functions defined in this library. For example,
```
dm = cdist(XA, XB, sokalsneath)
```
would calculate the pairwise distances between the vectors in XA and XB using the Python function sokalsneath. This would result in sokalsneath being called $m_{A} m_{B}$ times, which is inefficient. Instead, use the optimized C version, which is called with the following syntax
```
dm = cdist(XA, XB, 'sokalsneath')
```
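
As an illustration of the calling conventions in these Notes, the following sketch cross-checks a few string metrics against direct NumPy evaluations of their formulas, and compares a user-supplied callable with the equivalent string metric. The random data and seed are assumptions made only for the example.

```python
import numpy as np
from scipy.spatial.distance import cdist

rng = np.random.default_rng(42)  # arbitrary seed, illustrative data only
XA = rng.random((6, 3))
XB = rng.random((4, 3))

# Pairwise differences via broadcasting, shape (m_A, m_B, n).
diff = XA[:, None, :] - XB[None, :, :]

# Chebyshev: largest absolute coordinate difference.
cheb = np.abs(diff).max(axis=-1)

# Canberra: sum of |u_i - v_i| / (|u_i| + |v_i|).
denom = np.abs(XA)[:, None, :] + np.abs(XB)[None, :, :]
canb = (np.abs(diff) / denom).sum(axis=-1)

# Cosine: one minus the normalized dot product.
norms = np.linalg.norm(XA, axis=1)[:, None] * np.linalg.norm(XB, axis=1)[None, :]
cos = 1.0 - (XA @ XB.T) / norms

# A Python callable gives the same values as the string metric,
# but it is invoked once per (row of XA, row of XB) pair.
eucl_callable = cdist(XA, XB, lambda u, v: np.sqrt(((u - v) ** 2).sum()))
eucl_string = cdist(XA, XB, 'euclidean')
```

The callable form is convenient for metrics that have no string name, while the string form avoids the per-pair Python call overhead.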

Examples

Find the Euclidean distances between four 2-D coordinates:

```
from scipy.spatial import distance
import numpy as np
coords = [(35.0456, -85.2672),
          (35.1174, -89.9711),
          (35.9728, -83.9422),
          (36.1667, -86.7833)]
distance.cdist(coords, coords, 'euclidean')
```

Find the Manhattan distance from a 3-D point to the corners of the unit cube:

```
a = np.array([[0, 0, 0],
              [0, 0, 1],
              [0, 1, 0],
              [0, 1, 1],
              [1, 0, 0],
              [1, 0, 1],
              [1, 1, 0],
              [1, 1, 1]])
b = np.array([[ 0.1,  0.2,  0.4]])
distance.cdist(a, b, 'cityblock')
```
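
As a cross-check on the unit cube example, the sketch below recomputes the Manhattan distances with plain NumPy broadcasting and picks out the nearest corner. The names `manual` and `nearest` are introduced here for illustration only.

```python
import numpy as np
from scipy.spatial.distance import cdist

# The 8 corners of the unit cube, in the same order as the example above.
a = np.array([[i, j, k] for i in (0, 1) for j in (0, 1) for k in (0, 1)])
b = np.array([[0.1, 0.2, 0.4]])

d = cdist(a, b, 'cityblock')  # shape (8, 1)

# Manual city block distance: sum of absolute coordinate differences.
manual = np.abs(a[:, None, :] - b[None, :, :]).sum(axis=-1)

# The corner closest to b under the Manhattan distance.
nearest = a[d.argmin()]
print(nearest)  # [0 0 0]
```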