function
scipy.spatial.distance:cdist
source: /scipy/spatial/distance.py :2606
Signature
def cdist(XA, XB, metric='euclidean', *, out=None, **kwargs)
Summary
Compute distance between each pair of the two collections of inputs.
Extended Summary
See Notes for common calling conventions.
Parameters
XA: array_like
    An $m_A$ by $n$ array of $m_A$ original observations in an $n$-dimensional space. Inputs are converted to float type.
XB: array_like
    An $m_B$ by $n$ array of $m_B$ original observations in an $n$-dimensional space. Inputs are converted to float type.
metric: str or callable, optional
    The distance metric to use. If a string, the distance function can be 'braycurtis', 'canberra', 'chebyshev', 'cityblock', 'correlation', 'cosine', 'dice', 'euclidean', 'hamming', 'jaccard', 'jensenshannon', 'mahalanobis', 'matching', 'minkowski', 'rogerstanimoto', 'russellrao', 'seuclidean', 'sokalsneath', 'sqeuclidean', 'yule'.
**kwargs: dict, optional
    Extra arguments to metric: refer to each metric documentation for a list of all possible arguments.
    Some possible arguments:
    p: scalar
        The p-norm to apply for Minkowski, weighted and unweighted. Default: 2.
    w: array_like
        The weight vector for metrics that support weights (e.g., Minkowski).
    V: array_like
        The variance vector for standardized Euclidean. Default: var(vstack([XA, XB]), axis=0, ddof=1)
    VI: array_like
        The inverse of the covariance matrix for Mahalanobis. Default: inv(cov(vstack([XA, XB]).T)).T
    out: ndarray
        The output array. If not None, the distance matrix Y is stored in this array.
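The keyword arguments listed above can be exercised as in the following minimal sketch. The input arrays are made up for illustration, and the preallocated `out` array is assumed to need the result's shape and a float64 dtype.

```python
import numpy as np
from scipy.spatial.distance import cdist

# Illustrative inputs (not from the documentation): 3 and 2 observations in 2-D.
XA = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
XB = np.array([[1.0, 1.0], [3.0, 4.0]])

# p selects the norm for the Minkowski metric; p=1 coincides with 'cityblock'.
Y_p1 = cdist(XA, XB, 'minkowski', p=1)

# w weights each coordinate; here the second coordinate counts double.
w = np.array([1.0, 2.0])
Y_w = cdist(XA, XB, 'minkowski', p=1, w=w)

# out is a preallocated array with one row per row of XA and one column
# per row of XB; the distances are written into it.
out = np.empty((XA.shape[0], XB.shape[0]), dtype=np.float64)
Y_out = cdist(XA, XB, 'euclidean', out=out)
```

Reusing one `out` buffer may be worthwhile when the same shapes are computed repeatedly, since it avoids allocating a new result array on every call.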
Returns
Y: ndarray
    An $m_A$ by $m_B$ distance matrix is returned, where $m_A$ and $m_B$ are the numbers of rows of XA and XB. For each $i$ and $j$, the metric dist(u=XA[i], v=XB[j]) is computed and stored in the $ij$-th entry.
Raises
ValueError
    An exception is thrown if XA and XB do not have the same number of columns.
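The return shape and the column-count check can be illustrated with a short sketch. The arrays below are invented for the example, and `XB_bad` is a hypothetical input with a deliberately wrong number of columns.

```python
import numpy as np
from scipy.spatial.distance import cdist, euclidean

XA = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]])  # 3 observations, 2 columns
XB = np.array([[0.0, 1.0], [3.0, 4.0]])              # 2 observations, 2 columns

Y = cdist(XA, XB, 'euclidean')
shape = Y.shape  # one row per row of XA, one column per row of XB

# Entry (i, j) is the metric applied to XA[i] and XB[j].
entry_matches = np.isclose(Y[2, 1], euclidean(XA[2], XB[1]))

# A column-count mismatch is rejected with ValueError.
XB_bad = np.array([[0.0, 1.0, 2.0]])  # 3 columns instead of 2
try:
    cdist(XA, XB_bad, 'euclidean')
    raised = False
except ValueError:
    raised = True
```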
Notes
The following are common calling conventions:
Y = cdist(XA, XB, 'euclidean')
    Computes the distance between each pair of points, one from XA and one from XB, using Euclidean distance (2-norm) as the distance metric. The points are arranged as $n$-dimensional row vectors in the matrices XA and XB.

Y = cdist(XA, XB, 'minkowski', p=2.)
    Computes the distances using the Minkowski distance $\|u-v\|_p$ ($p$-norm) where $p > 0$ (note that this is only a quasi-metric if $0 < p < 1$).

Y = cdist(XA, XB, 'cityblock')
    Computes the city block or Manhattan distance between the points.

Y = cdist(XA, XB, 'seuclidean', V=None)
    Computes the standardized Euclidean distance. The standardized Euclidean distance between two n-vectors u and v is

    $$\sqrt{\sum_i \frac{(u_i - v_i)^2}{V[i]}}.$$

    V is the variance vector; V[i] is the variance computed over all the i'th components of the points. If not passed, it is automatically computed.
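As a sketch of how the variance vector enters the standardized Euclidean distance, the following compares the automatic default with an explicitly passed V and with a hand-computed value. The data are illustrative, and the explicit V follows the default stated in the Parameters section (sample variance over the stacked inputs).

```python
import numpy as np
from scipy.spatial.distance import cdist

# Illustrative data: each column varies, so every variance is non-zero.
XA = np.array([[1.0, 10.0], [2.0, 30.0], [4.0, 20.0]])
XB = np.array([[0.0, 15.0], [3.0, 50.0]])

# Variance of each component over all points in XA and XB together.
V = np.var(np.vstack([XA, XB]), axis=0, ddof=1)

Y_default = cdist(XA, XB, 'seuclidean')        # V computed automatically
Y_explicit = cdist(XA, XB, 'seuclidean', V=V)  # V passed by the caller

# Hand computation for one pair: sqrt(sum((u_i - v_i)^2 / V[i])).
u, v = XA[0], XB[1]
manual = np.sqrt(np.sum((u - v) ** 2 / V))
```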

Y = cdist(XA, XB, 'sqeuclidean')
    Computes the squared Euclidean distance $\|u-v\|_2^2$ between the vectors.
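
Y = cdist(XA, XB, 'cosine')
    Computes the cosine distance between vectors u and v,

    $$1 - \frac{u \cdot v}{\|u\|_2 \|v\|_2}$$

    where $\|*\|_2$ is the 2-norm of its argument *, and $u \cdot v$ is the dot product of $u$ and $v$.

Y = cdist(XA, XB, 'correlation')
    Computes the correlation distance between vectors u and v. This is

    $$1 - \frac{(u - \bar{u}) \cdot (v - \bar{v})}{\|u - \bar{u}\|_2 \|v - \bar{v}\|_2}$$

    where $\bar{v}$ is the mean of the elements of vector v, and $x \cdot y$ is the dot product of $x$ and $y$.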
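The cosine and correlation distances are closely related: the correlation distance is the cosine distance of the mean-centered vectors. A minimal sketch with made-up data, where `cosine_manual` and `correlation_manual` are hypothetical helper names used only here:

```python
import numpy as np
from scipy.spatial.distance import cdist

XA = np.array([[1.0, 2.0, 3.0], [4.0, 0.0, 1.0]])
XB = np.array([[2.0, 1.0, 5.0], [0.0, 3.0, 1.0]])

def cosine_manual(u, v):
    # 1 - (u . v) / (||u||_2 * ||v||_2)
    return 1.0 - (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

def correlation_manual(u, v):
    # Cosine distance after subtracting each vector's own mean.
    return cosine_manual(u - u.mean(), v - v.mean())

# Evaluate both helpers for every (row of XA, row of XB) pair.
cos_ref = np.array([[cosine_manual(u, v) for v in XB] for u in XA])
corr_ref = np.array([[correlation_manual(u, v) for v in XB] for u in XA])

Y_cos = cdist(XA, XB, 'cosine')
Y_corr = cdist(XA, XB, 'correlation')
```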

Y = cdist(XA, XB, 'hamming')
    Computes the normalized Hamming distance, or the proportion of those vector elements between two n-vectors u and v which disagree. To save memory, the input matrices can be of type boolean.

Y = cdist(XA, XB, 'jaccard')
    Computes the Jaccard distance between the points. Given two vectors, u and v, the Jaccard distance is the proportion of those elements u[i] and v[i] that disagree where at least one of them is non-zero.

Y = cdist(XA, XB, 'jensenshannon')
    Computes the Jensen-Shannon distance between two probability arrays. Given two probability vectors, $p$ and $q$, the Jensen-Shannon distance is

    $$\sqrt{\frac{D(p \parallel m) + D(q \parallel m)}{2}}$$

    where $m$ is the pointwise mean of $p$ and $q$ and $D$ is the Kullback-Leibler divergence.
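The Jensen-Shannon definition can be checked against a direct computation. This is a sketch under stated assumptions: the rows are strictly positive probability vectors that already sum to 1, the Kullback-Leibler divergence uses the natural logarithm, and `kl` and `js_manual` are hypothetical helper names.

```python
import numpy as np
from scipy.spatial.distance import cdist

# Each row is a probability vector (strictly positive, sums to 1).
P = np.array([[0.5, 0.3, 0.2], [0.1, 0.6, 0.3]])
Q = np.array([[0.25, 0.25, 0.5], [0.7, 0.2, 0.1]])

def kl(p, q):
    # Kullback-Leibler divergence D(p || q) with natural logarithm.
    return np.sum(p * np.log(p / q))

def js_manual(p, q):
    m = (p + q) / 2.0  # pointwise mean of p and q
    return np.sqrt((kl(p, m) + kl(q, m)) / 2.0)

js_ref = np.array([[js_manual(p, q) for q in Q] for p in P])
Y_js = cdist(P, Q, 'jensenshannon')
```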

Y = cdist(XA, XB, 'chebyshev')
    Computes the Chebyshev distance between the points. The Chebyshev distance between two n-vectors u and v is the maximum norm-1 distance between their respective elements. More precisely, the distance is given by

    $$d(u,v) = \max_i |u_i - v_i|.$$

Y = cdist(XA, XB, 'canberra')
    Computes the Canberra distance between the points. The Canberra distance between two points u and v is

    $$d(u,v) = \sum_i \frac{|u_i - v_i|}{|u_i| + |v_i|}.$$

Y = cdist(XA, XB, 'braycurtis')
    Computes the Bray-Curtis distance between the points. The Bray-Curtis distance between two points u and v is

    $$d(u,v) = \frac{\sum_i |u_i - v_i|}{\sum_i |u_i + v_i|}.$$

Y = cdist(XA, XB, 'mahalanobis', VI=None)
    Computes the Mahalanobis distance between the points. The Mahalanobis distance between two points u and v is $\sqrt{(u-v) V^{-1} (u-v)^T}$ where $V^{-1}$ (the VI variable) is the inverse covariance. If VI is not None, VI will be used as the inverse covariance matrix.

Y = cdist(XA, XB, 'yule')
    Computes the Yule distance between the boolean vectors. (see yule function documentation)
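The Chebyshev, Canberra, Bray-Curtis and Mahalanobis definitions can be checked numerically with NumPy broadcasting. The sketch below uses invented, strictly positive data (so no Canberra term is 0/0) and passes VI explicitly, computed here as the inverse covariance of the stacked inputs; all variable names are illustrative.

```python
import numpy as np
from scipy.spatial.distance import cdist

XA = np.array([[1.0, 2.0], [3.0, 5.0], [6.0, 1.0]])
XB = np.array([[2.0, 2.0], [4.0, 7.0]])

# diff[i, j] = XA[i] - XB[j], shape (3, 2, 2).
diff = XA[:, None, :] - XB[None, :, :]
absdiff = np.abs(diff)

# Chebyshev: max_i |u_i - v_i|
cheb_ref = absdiff.max(axis=-1)

# Canberra: sum_i |u_i - v_i| / (|u_i| + |v_i|)
canb_ref = (absdiff / (np.abs(XA)[:, None, :] + np.abs(XB)[None, :, :])).sum(axis=-1)

# Bray-Curtis: sum_i |u_i - v_i| / sum_i |u_i + v_i|
bray_ref = absdiff.sum(axis=-1) / np.abs(XA[:, None, :] + XB[None, :, :]).sum(axis=-1)

# Mahalanobis: sqrt((u - v) VI (u - v)^T) with an explicit inverse covariance.
VI = np.linalg.inv(np.cov(np.vstack([XA, XB]).T))
maha_ref = np.sqrt(np.einsum('ijk,kl,ijl->ij', diff, VI, diff))

Y_cheb = cdist(XA, XB, 'chebyshev')
Y_canb = cdist(XA, XB, 'canberra')
Y_bray = cdist(XA, XB, 'braycurtis')
Y_maha = cdist(XA, XB, 'mahalanobis', VI=VI)
```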

Y = cdist(XA, XB, 'matching')
    Synonym for 'hamming'.

Y = cdist(XA, XB, 'dice')
    Computes the Dice distance between the boolean vectors. (see dice function documentation)

Y = cdist(XA, XB, 'rogerstanimoto')
    Computes the Rogers-Tanimoto distance between the boolean vectors. (see rogerstanimoto function documentation)

Y = cdist(XA, XB, 'russellrao')
    Computes the Russell-Rao distance between the boolean vectors. (see russellrao function documentation)

Y = cdist(XA, XB, 'sokalsneath')
    Computes the Sokal-Sneath distance between the vectors. (see sokalsneath function documentation)

Y = cdist(XA, XB, f)
    Computes the distance between all pairs of vectors, one from XA and one from XB, using the user supplied 2-arity function f. For example, Euclidean distance between the vectors could be computed as follows

        dm = cdist(XA, XB, lambda u, v: np.sqrt(((u-v)**2).sum()))

    Note that you should avoid passing a reference to one of the distance functions defined in this library. For example,

        dm = cdist(XA, XB, sokalsneath)

    would calculate the pair-wise distances between the vectors in XA and XB using the Python function sokalsneath. This would result in sokalsneath being called once for every pair of rows (the number of rows in XA times the number of rows in XB), which is inefficient. Instead, the optimized C version is more efficient, and we call it using the following syntax

        dm = cdist(XA, XB, 'sokalsneath')
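To make the cost of a Python callable concrete, the following sketch counts how often a user-supplied function is invoked and confirms that it agrees with the string form. The random data, the `counter` dictionary and the name `counted_euclidean` are illustrative assumptions, not part of the library.

```python
import numpy as np
from scipy.spatial.distance import cdist

rng = np.random.default_rng(0)
XA = rng.random((4, 3))  # 4 observations in 3-D
XB = rng.random((5, 3))  # 5 observations in 3-D

counter = {"calls": 0}

def counted_euclidean(u, v):
    # Plain-Python Euclidean distance that records each invocation.
    counter["calls"] += 1
    return np.sqrt(((u - v) ** 2).sum())

dm_callable = cdist(XA, XB, counted_euclidean)  # one Python call per pair
dm_string = cdist(XA, XB, 'euclidean')          # optimized built-in path
```

Both calls should produce the same matrix; the difference is that the callable version runs interpreted Python once per pair of rows, which tends to dominate the runtime for large inputs.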
Examples
Find the Euclidean distances between four 2-D coordinates:

    from scipy.spatial import distance
    import numpy as np
    coords = [(35.0456, -85.2672),
              (35.1174, -89.9711),
              (35.9728, -83.9422),
              (36.1667, -86.7833)]
    distance.cdist(coords, coords, 'euclidean')

Find the Manhattan distance from a 3-D point to the corners of the unit cube:

    a = np.array([[0, 0, 0],
                  [0, 0, 1],
                  [0, 1, 0],
                  [0, 1, 1],
                  [1, 0, 0],
                  [1, 0, 1],
                  [1, 1, 0],
                  [1, 1, 1]])
    b = np.array([[0.1, 0.2, 0.4]])
    distance.cdist(a, b, 'cityblock')
Aliases
- scipy.cluster.vq.cdist