function
scipy.spatial.distance:pdist
source: /scipy/spatial/distance.py :1816
Signature
def pdist(X, metric='euclidean', *, out=None, **kwargs)
Summary
Pairwise distances between observations in n-dimensional space.
Extended Summary
See Notes for common calling conventions.
Parameters
X: array_like
An m by n array of m original observations in an n-dimensional space.
metric: str or function, optional
The distance metric to use. The distance function can be 'braycurtis', 'canberra', 'chebyshev', 'cityblock', 'correlation', 'cosine', 'dice', 'euclidean', 'hamming', 'jaccard', 'jensenshannon', 'mahalanobis', 'matching', 'minkowski', 'rogerstanimoto', 'russellrao', 'seuclidean', 'sokalsneath', 'sqeuclidean', 'yule'.
out: ndarray, optional
The output array. If not None, the condensed distance matrix Y is stored in this array.
**kwargs: dict, optional
Extra arguments to metric: refer to each metric documentation for a list of all possible arguments. Some possible arguments:
- p: scalar. The p-norm to apply for Minkowski, weighted and unweighted. Default: 2.
- w: ndarray. The weight vector for metrics that support weights (e.g., Minkowski).
- V: ndarray. The variance vector for standardized Euclidean. Default: var(X, axis=0, ddof=1)
- VI: ndarray. The inverse of the covariance matrix for Mahalanobis. Default: inv(cov(X.T)).T
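A minimal sketch of the out and **kwargs parameters, assuming a preallocated float64 buffer of the condensed length m * (m - 1) / 2. The toy data and the names buf and ref are illustrative only. The weighted check assumes the weighted Minkowski definition (sum_i w_i |u_i - v_i|^p)^(1/p) described in the minkowski function documentation.

```python
import numpy as np
from scipy.spatial.distance import pdist

X = np.array([[0.0, 0.0],
              [3.0, 4.0],
              [6.0, 8.0]])
m = X.shape[0]

# Preallocated condensed output: one float64 slot per pair i < j.
buf = np.empty(m * (m - 1) // 2, dtype=np.float64)
pdist(X, 'euclidean', out=buf)
# buf now holds the distances for the pairs (0, 1), (0, 2), (1, 2).

# Extra keyword arguments are forwarded to the metric:
# here p and a weight vector w for a weighted Minkowski distance.
w = np.array([1.0, 4.0])
Yw = pdist(X, 'minkowski', p=2, w=w)

# Reference values: (sum_i w_i * |u_i - v_i|**p) ** (1/p) for every pair i < j.
ref = np.array([
    np.sum(w * np.abs(X[i] - X[j]) ** 2) ** 0.5
    for i in range(m) for j in range(i + 1, m)
])
```

With unit weights the weighted call reduces to the ordinary Minkowski distance, so w only matters when features should count unequally.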
Returns
Y: ndarray
Returns a condensed distance matrix Y. For each $i$ and $j$ (where $i < j < m$), where m is the number of original observations, the metric dist(u=X[i], v=X[j]) is computed and stored in entry m * i + j - ((i + 2) * (i + 1)) // 2.
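The index formula can be checked directly against squareform. This is a small sketch; condensed_index is a hypothetical helper written only for this check and is not part of SciPy.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform


def condensed_index(m, i, j):
    """Hypothetical helper: position of pair (i, j), i < j < m, in pdist's output."""
    if not 0 <= i < j < m:
        raise ValueError("expected 0 <= i < j < m")
    return m * i + j - ((i + 2) * (i + 1)) // 2


rng = np.random.default_rng(0)
X = rng.random((5, 3))   # m = 5 observations in 3 dimensions
m = X.shape[0]
Y = pdist(X)             # condensed matrix of length m * (m - 1) / 2
D = squareform(Y)        # redundant square form, shape (m, m)

for i in range(m):
    for j in range(i + 1, m):
        # The condensed entry equals the square-matrix entry for the same pair.
        assert Y[condensed_index(m, i, j)] == D[i, j]
```

The pairs are laid out row by row over the upper triangle: (0, 1), (0, 2), ..., (0, m-1), (1, 2), and so on.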
Notes
See squareform for information on how to calculate the index of this entry or to convert the condensed distance matrix to a redundant square matrix.
The following are common calling conventions.
Y = pdist(X, 'euclidean')
Computes the distance between m points using Euclidean distance (2-norm) as the distance metric between the points. The points are arranged as m n-dimensional row vectors in the matrix X.
Y = pdist(X, 'minkowski', p=2.)
Computes the distances using the Minkowski distance $\|u - v\|_p$ ($p$-norm) where $p > 0$ (note that this is only a quasi-metric if $0 < p < 1$).
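As a sanity check of the p-norm definition, the sketch below compares pdist against a plain-Python evaluation of $\|u - v\|_p$ over all pairs. minkowski_pairs is a hypothetical reference helper, not a SciPy function.

```python
import numpy as np
from scipy.spatial.distance import pdist


def minkowski_pairs(X, p):
    """Hypothetical reference: (sum_k |x_ik - x_jk|**p) ** (1/p) for all pairs i < j."""
    m = X.shape[0]
    return np.array([
        np.sum(np.abs(X[i] - X[j]) ** p) ** (1.0 / p)
        for i in range(m) for j in range(i + 1, m)
    ])


X = np.array([[2.0, 0.0, 2.0],
              [2.0, 2.0, 3.0],
              [-2.0, 4.0, 5.0]])

for p in (1.0, 2.0, 3.5):
    assert np.allclose(pdist(X, 'minkowski', p=p), minkowski_pairs(X, p))

# With p = 2 the Minkowski distance is the ordinary Euclidean distance.
assert np.allclose(pdist(X, 'minkowski', p=2.), pdist(X, 'euclidean'))
```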
Y = pdist(X, 'cityblock')
Computes the city block or Manhattan distance between the points.
Y = pdist(X, 'seuclidean', V=None)
Computes the standardized Euclidean distance. The standardized Euclidean distance between two n-vectors u and v is
$$\sqrt{\sum_i (u_i - v_i)^2 / V[i]}$$
V is the variance vector; V[i] is the variance computed over all the i'th components of the points. If not passed, it is automatically computed.
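A short sketch of the standardized Euclidean formula, assuming the default V is the per-column sample variance with ddof=1, as listed under **kwargs. The random data is illustrative only.

```python
import numpy as np
from scipy.spatial.distance import pdist

rng = np.random.default_rng(1)
X = rng.normal(size=(6, 4))
m = X.shape[0]

# Default V: variance of each column (component) over all points, ddof=1.
V = np.var(X, axis=0, ddof=1)

# sqrt(sum_i (u_i - v_i)**2 / V[i]) for every pair, in condensed order.
manual = np.array([
    np.sqrt(np.sum((X[i] - X[j]) ** 2 / V))
    for i in range(m) for j in range(i + 1, m)
])

assert np.allclose(pdist(X, 'seuclidean'), manual)       # V computed automatically
assert np.allclose(pdist(X, 'seuclidean', V=V), manual)  # V passed explicitly
```

Dividing by V puts features measured on very different scales on an equal footing before distances are summed.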
Y = pdist(X, 'sqeuclidean')
Computes the squared Euclidean distance between the vectors.
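Y = pdist(X, 'cosine')
Computes the cosine distance between vectors u and v,
$$1 - \frac{u \cdot v}{\|u\|_2 \|v\|_2}$$
where $\|*\|_2$ is the 2-norm of its argument *, and $u \cdot v$ is the dot product of u and v.
Y = pdist(X, 'correlation')
Computes the correlation distance between vectors u and v. This is
$$1 - \frac{(u - \bar{u}) \cdot (v - \bar{v})}{\|u - \bar{u}\|_2 \|v - \bar{v}\|_2}$$
where $\bar{v}$ is the mean of the elements of vector v, and $x \cdot y$ is the dot product of x and y.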
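The two formulas above differ only in mean-centring, which the following sketch makes explicit. cosine_ref and correlation_ref are hypothetical reference helpers; the third row of X is chosen as an affine transform of the first (1.5 * row0 - 2.5), so their correlation distance should be essentially zero.

```python
import numpy as np
from scipy.spatial.distance import pdist

X = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.5],
              [-1.0, 0.5, 2.0]])   # row 2 == 1.5 * row 0 - 2.5
pairs = [(0, 1), (0, 2), (1, 2)]   # condensed order for m = 3


def cosine_ref(u, v):
    # 1 - (u . v) / (||u||_2 * ||v||_2)
    return 1.0 - np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))


def correlation_ref(u, v):
    # Correlation distance is the cosine distance of the mean-centred vectors.
    return cosine_ref(u - u.mean(), v - v.mean())


Yc = pdist(X, 'cosine')
Yr = pdist(X, 'correlation')
assert np.allclose(Yc, [cosine_ref(X[i], X[j]) for i, j in pairs])
assert np.allclose(Yr, [correlation_ref(X[i], X[j]) for i, j in pairs])
```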
Y = pdist(X, 'hamming')
Computes the normalized Hamming distance, or the proportion of those vector elements between two n-vectors u and v which disagree. To save memory, the matrix X can be of type boolean.
Y = pdist(X, 'jaccard')
Computes the Jaccard distance between the points. Given two vectors, u and v, the Jaccard distance is the proportion of those elements u[i] and v[i] that disagree where at least one of them is non-zero.
Y = pdist(X, 'jensenshannon')
Computes the Jensen-Shannon distance between two probability arrays. Given two probability vectors, $p$ and $q$, the Jensen-Shannon distance is
$$\sqrt{\frac{D(p \parallel m) + D(q \parallel m)}{2}}$$
where $m$ is the pointwise mean of $p$ and $q$ and $D$ is the Kullback-Leibler divergence.
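A sketch checking the Hamming, Jaccard and Jensen-Shannon definitions against pdist. The helpers hamming_ref, jaccard_ref, kl and js_ref are hypothetical; js_ref assumes the natural logarithm, which is what the jensenshannon metric uses when no base is given.

```python
import numpy as np
from scipy.spatial.distance import pdist

# Boolean observations: Hamming and Jaccard accept boolean input directly.
B = np.array([[1, 0, 1, 1, 0],
              [1, 1, 0, 1, 0],
              [0, 0, 0, 1, 1]], dtype=bool)


def hamming_ref(u, v):
    # Fraction of positions where u and v disagree.
    return np.mean(u != v)


def jaccard_ref(u, v):
    # Disagreements counted only where at least one entry is non-zero.
    nonzero = u | v
    return np.sum((u != v) & nonzero) / np.sum(nonzero)


# Rows are probability vectors (each sums to 1).
P = np.array([[0.5, 0.5, 0.0],
              [0.1, 0.6, 0.3],
              [1 / 3, 1 / 3, 1 / 3]])


def kl(a, b):
    # Kullback-Leibler divergence D(a || b), natural log, with 0 * log(0 / b) = 0.
    mask = a > 0
    return np.sum(a[mask] * np.log(a[mask] / b[mask]))


def js_ref(p, q):
    mid = 0.5 * (p + q)   # pointwise mean of p and q
    return np.sqrt(0.5 * (kl(p, mid) + kl(q, mid)))


pairs = [(0, 1), (0, 2), (1, 2)]
Yh = pdist(B, 'hamming')
Yj = pdist(B, 'jaccard')
Yjs = pdist(P, 'jensenshannon')
```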
Y = pdist(X, 'chebyshev')
Computes the Chebyshev distance between the points. The Chebyshev distance between two n-vectors u and v is the maximum norm-1 distance between their respective elements. More precisely, the distance is given by
$$d(u, v) = \max_i |u_i - v_i|$$
Y = pdist(X, 'canberra')
Computes the Canberra distance between the points. The Canberra distance between two points u and v is
$$d(u, v) = \sum_i \frac{|u_i - v_i|}{|u_i| + |v_i|}$$
Y = pdist(X, 'braycurtis')
Computes the Bray-Curtis distance between the points. The Bray-Curtis distance between two points u and v is
$$d(u, v) = \frac{\sum_i |u_i - v_i|}{\sum_i |u_i + v_i|}$$
Y = pdist(X, 'mahalanobis', VI=None)
Computes the Mahalanobis distance between the points. The Mahalanobis distance between two points u and v is
$$\sqrt{(u - v)(1/V)(u - v)^T}$$
where $(1/V)$ (the VI variable) is the inverse covariance. If VI is not None, VI will be used as the inverse covariance matrix.
Y = pdist(X, 'yule')
Computes the Yule distance between each pair of boolean vectors. (see yule function documentation)
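The Chebyshev, Canberra, Bray-Curtis and Mahalanobis formulas can be verified together. This is a sketch with hypothetical *_ref helpers; the Mahalanobis check assumes the default VI = inv(cov(X.T)).T listed under **kwargs, which requires more observations than dimensions so the covariance is invertible.

```python
import numpy as np
from scipy.spatial.distance import pdist

rng = np.random.default_rng(2)
X = rng.normal(size=(8, 3))   # m = 8 > n = 3, so cov(X.T) is invertible
m = X.shape[0]
pairs = [(i, j) for i in range(m) for j in range(i + 1, m)]   # condensed order


def chebyshev_ref(u, v):
    return np.max(np.abs(u - v))


def canberra_ref(u, v):
    return np.sum(np.abs(u - v) / (np.abs(u) + np.abs(v)))


def braycurtis_ref(u, v):
    return np.sum(np.abs(u - v)) / np.sum(np.abs(u + v))


# Default inverse covariance used by 'mahalanobis' when VI is not given.
VI = np.linalg.inv(np.cov(X.T)).T


def mahalanobis_ref(u, v):
    d = u - v
    return np.sqrt(d @ VI @ d)


checks = {'chebyshev': chebyshev_ref, 'canberra': canberra_ref,
          'braycurtis': braycurtis_ref, 'mahalanobis': mahalanobis_ref}
results = {name: pdist(X, name) for name in checks}
expected = {name: np.array([f(X[i], X[j]) for i, j in pairs])
            for name, f in checks.items()}
```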
Y = pdist(X, 'matching')Synonym for 'hamming'.
Y = pdist(X, 'dice')Computes the Dice distance between each pair of boolean vectors. (see dice function documentation)
Y = pdist(X, 'rogerstanimoto')Computes the Rogers-Tanimoto distance between each pair of boolean vectors. (see rogerstanimoto function documentation)
Y = pdist(X, 'russellrao')Computes the Russell-Rao distance between each pair of boolean vectors. (see russellrao function documentation)
Y = pdist(X, 'sokalsneath')Computes the Sokal-Sneath distance between each pair of boolean vectors. (see sokalsneath function documentation)
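The boolean metrics are all built from the four co-occurrence counts c_TT, c_TF, c_FT and c_FF. The sketch below assumes the formulas given in the individual function documentation (for example, Dice = (c_TF + c_FT) / (2 c_TT + c_FT + c_TF)); counts and the *_ref helpers are hypothetical.

```python
import numpy as np
from scipy.spatial.distance import pdist

# Three boolean observations with 6 features each.
B = np.array([[1, 1, 0, 1, 0, 0],
              [1, 0, 1, 1, 0, 1],
              [0, 1, 1, 0, 1, 0]], dtype=bool)
pairs = [(0, 1), (0, 2), (1, 2)]   # condensed order for m = 3


def counts(u, v):
    """Return (c_TT, c_TF, c_FT, c_FF) for two boolean vectors."""
    return (np.count_nonzero(u & v), np.count_nonzero(u & ~v),
            np.count_nonzero(~u & v), np.count_nonzero(~u & ~v))


def dice_ref(u, v):
    ctt, ctf, cft, _ = counts(u, v)
    return (ctf + cft) / (2 * ctt + ctf + cft)


def rogerstanimoto_ref(u, v):
    ctt, ctf, cft, cff = counts(u, v)
    r = 2 * (ctf + cft)
    return r / (ctt + cff + r)


def russellrao_ref(u, v):
    ctt, _, _, _ = counts(u, v)
    return (len(u) - ctt) / len(u)


def sokalsneath_ref(u, v):
    ctt, ctf, cft, _ = counts(u, v)
    r = 2 * (ctf + cft)
    return r / (ctt + r)


def yule_ref(u, v):
    # Assumes c_TF * c_FT != 0; the data below satisfies this for every pair.
    ctt, ctf, cft, cff = counts(u, v)
    r = 2 * ctf * cft
    return r / (ctt * cff + r / 2)


refs = {'dice': dice_ref, 'rogerstanimoto': rogerstanimoto_ref,
        'russellrao': russellrao_ref, 'sokalsneath': sokalsneath_ref,
        'yule': yule_ref}
for name, ref in refs.items():
    assert np.allclose(pdist(B, name), [ref(B[i], B[j]) for i, j in pairs]), name
```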
Y = pdist(X, f)
Computes the distance between all pairs of vectors in X using the user-supplied 2-arity function f. For example, the Euclidean distance between the vectors could be computed as follows:
dm = pdist(X, lambda u, v: np.sqrt(((u-v)**2).sum()))
Note that you should avoid passing a reference to one of the distance functions defined in this library. For example,
dm = pdist(X, sokalsneath)
would calculate the pair-wise distances between the vectors in X using the Python function sokalsneath. This would result in sokalsneath being called $\binom{m}{2}$ times, which is inefficient. Instead, the optimized C version is more efficient, and we call it using the following syntax:
dm = pdist(X, 'sokalsneath')
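To see why a Python callable is slower, one can count how often pdist invokes it. This sketch assumes a hypothetical counting metric, CountingEuclidean; the expected count of one call per pair i < j reflects the pairwise loop pdist runs for callables.

```python
import numpy as np
from scipy.spatial.distance import pdist


class CountingEuclidean:
    """Hypothetical user-supplied 2-arity metric that counts its invocations."""

    def __init__(self):
        self.calls = 0

    def __call__(self, u, v):
        self.calls += 1
        return np.sqrt(((u - v) ** 2).sum())


X = np.random.default_rng(3).random((6, 4))
m = X.shape[0]

metric = CountingEuclidean()
dm = pdist(X, metric)              # one Python-level call per pair
dm_builtin = pdist(X, 'euclidean') # same distances via the built-in metric
```

Each of the m(m - 1)/2 Python calls carries interpreter overhead, which is why the string form of a built-in metric is preferred.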
Examples
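import numpy as np
from scipy.spatial.distance import pdist
x = np.array([[2, 0, 2], [2, 2, 3], [-2, 4, 5], [0, 1, 9], [2, 2, 4]])
Compute the Euclidean distance between all pairs of points (the default metric):
pdist(x)
Compute the Minkowski distance with p = 3.5:
pdist(x, metric='minkowski', p=3.5)
Compute the city block or Manhattan distance:
pdist(x, metric='cityblock')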
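For the same five points, the city block and squared Euclidean values can be worked out by hand from coordinate differences, which gives a concrete check of the condensed ordering. The sketch also uses squareform to find each point's nearest neighbour; the expected arrays below were computed manually for this illustration.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

x = np.array([[2, 0, 2], [2, 2, 3], [-2, 4, 5], [0, 1, 9], [2, 2, 4]])

# Hand-computed values in condensed order:
# (0,1), (0,2), (0,3), (0,4), (1,2), (1,3), (1,4), (2,3), (2,4), (3,4)
expected_cityblock = [3, 11, 10, 4, 8, 9, 1, 9, 7, 8]
expected_sq = [5, 41, 54, 8, 24, 41, 1, 29, 21, 30]

assert np.allclose(pdist(x, metric='cityblock'), expected_cityblock)
assert np.allclose(pdist(x, 'sqeuclidean'), expected_sq)
assert np.allclose(pdist(x), np.sqrt(expected_sq))

# Square form: mask the zero diagonal, then take the closest other point per row.
D = squareform(pdist(x))
np.fill_diagonal(D, np.inf)
nearest = D.argmin(axis=1)
```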
See also
- squareform: converts between condensed distance matrices and square distance matrices.
Aliases
- scipy.interpolate._rbf.pdist