bundles / scipy 1.17.1 / scipy / cluster / hierarchy / average

function

scipy.cluster.hierarchy:average

source: /scipy/cluster/hierarchy.py :338

Signature

def average(y)

Summary

Perform average/UPGMA linkage on a condensed distance matrix.
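
Under average/UPGMA linkage, the distance between two clusters is the mean of all pairwise distances between their members. The sketch below checks this numerically on the toy dataset from the Examples section. The names left, right and mean_pairwise are introduced here for illustration, and the split into the first six and last six points is an assumption about how this particular dataset merges.

```python
import numpy as np
from scipy.cluster.hierarchy import average
from scipy.spatial.distance import cdist, pdist

# Toy dataset: four groups of three points near the corners of a square.
X = np.array([[0, 0], [0, 1], [1, 0],
              [0, 4], [0, 3], [1, 4],
              [4, 0], [3, 0], [4, 1],
              [4, 4], [3, 4], [4, 3]], dtype=float)

Z = average(pdist(X))

# Assumption: the last merge joins two six-point clusters, taken here
# to be the first six and the last six points of X.
left, right = X[:6], X[6:]

# Mean of the 36 distances between a point in `left` and a point in `right`.
mean_pairwise = cdist(left, right).mean()

# The height of the final merge (column 2 of the last row) should match.
heights_match = bool(np.isclose(Z[-1, 2], mean_pairwise))
print(heights_match)
```

Every original pair of points carries the same weight in that mean, which is what the "unweighted" in UPGMA refers to.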

Parameters

y : ndarray

The upper triangle of the distance matrix, stored as a flat (condensed) array. pdist returns its result in this form.
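
As a rough illustration of the condensed form, the sketch below compares the output of pdist with the upper triangle of the full square matrix. The four sample points and the names D and iu are chosen here for illustration only.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# Four illustrative points (not part of the original documentation).
X = np.array([[0, 0], [0, 1], [1, 0], [4, 4]], dtype=float)
n = len(X)

y = pdist(X)        # condensed form: one entry per unordered pair of points
D = squareform(y)   # full symmetric n x n distance matrix

# Indices of the strict upper triangle, in row-major order.
iu = np.triu_indices(n, k=1)

print(y.shape)                  # n * (n - 1) / 2 entries
print(np.allclose(D[iu], y))    # condensed array equals the upper triangle
```

For n observations the condensed array has n * (n - 1) / 2 entries, so it is this flat array, not the square matrix, that average expects.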

Returns

Z : ndarray

A linkage matrix containing the hierarchical clustering. See linkage for more information on its structure.
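
A minimal sketch of how the returned matrix can be read, assuming the row layout described in the linkage documentation (two cluster indices, the merge distance, and the size of the new cluster). The five sample points are illustrative. The comparison with linkage(y, method='average') reflects the expectation that both calls produce the same result.

```python
import numpy as np
from scipy.cluster.hierarchy import average, linkage
from scipy.spatial.distance import pdist

# Five illustrative points (not part of the original documentation).
X = np.array([[0, 0], [0, 1], [1, 0], [4, 4], [4, 3]], dtype=float)
n = len(X)
y = pdist(X)
Z = average(y)

# Each of the n - 1 rows records one merge. Indices below n refer to
# original observations; index n + i refers to the cluster formed at step i.
for i, (a, b, dist, size) in enumerate(Z):
    print(f"step {i}: merge {int(a)} and {int(b)} at distance {dist:.3f} "
          f"-> cluster {n + i} with {int(size)} points")

# Expected to agree with the general-purpose linkage function.
same_as_linkage = bool(np.allclose(Z, linkage(y, method='average')))
print(same_as_linkage)
```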

Notes

Array API Standard Support

average has experimental support for Python Array API Standard compatible backends in addition to NumPy. Please consider testing these features by setting the environment variable SCIPY_ARRAY_API=1 and providing CuPy, PyTorch, JAX, or Dask arrays as array arguments. The following combinations of backend and device (or other capability) are supported.

====================  ====================  ====================
Library               CPU                   GPU
====================  ====================  ====================
NumPy                 ✅                     n/a                 
CuPy                  n/a                   ⛔                   
PyTorch               ✅                     ⛔                   
JAX                   ✅                     ⛔                   
Dask                  ⚠️ merges chunks      n/a                 
====================  ====================  ====================

See dev-arrayapi for more information.
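
A minimal sketch of opting in to the experimental behaviour from within Python. It assumes the variable is read when SciPy is first imported, so it is set beforehand. NumPy arrays are used here so the sketch runs without extra dependencies; arrays from another backend marked as supported in the table above could be substituted.

```python
import os

# Assumption: SciPy reads this variable at import time, so set it first.
os.environ["SCIPY_ARRAY_API"] = "1"

import numpy as np
from scipy.cluster.hierarchy import average
from scipy.spatial.distance import pdist

# Four illustrative points forming two well-separated pairs.
X = np.array([[0.0, 0.0], [0.0, 1.0], [4.0, 0.0], [4.0, 1.0]])
y = pdist(X)

Z = average(y)
print(type(Z).__name__, Z.shape)
```

Setting the variable in the shell before starting Python has the same effect and avoids any dependence on import order.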

Examples

from scipy.cluster.hierarchy import average, fcluster
from scipy.spatial.distance import pdist
First, we need a toy dataset to play with::

    x x    x x
    x        x

    x        x
    x x    x x

X = [[0, 0], [0, 1], [1, 0],
     [0, 4], [0, 3], [1, 4],
     [4, 0], [3, 0], [4, 1],
     [4, 4], [3, 4], [4, 3]]
Then, we get a condensed distance matrix from this dataset:
y = pdist(X)
Finally, we can perform the clustering:
Z = average(y)
Z
The linkage matrix ``Z`` represents a dendrogram - see `scipy.cluster.hierarchy.linkage` for a detailed explanation of its contents. We can use `scipy.cluster.hierarchy.fcluster` to see to which cluster each initial point would belong given a distance threshold:
fcluster(Z, 0.9, criterion='distance')
fcluster(Z, 1.5, criterion='distance')
fcluster(Z, 4, criterion='distance')
fcluster(Z, 6, criterion='distance')
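The effect of the threshold can be summarised by counting the flat clusters at each value. The sketch below does this for the four thresholds above and compares each count with the number of merges in Z whose height does not exceed the threshold. The names n_clusters and n_from_heights are introduced here, and the comparison assumes merge heights that never decrease, as they do in this example.

```python
import numpy as np
from scipy.cluster.hierarchy import average, fcluster
from scipy.spatial.distance import pdist

X = [[0, 0], [0, 1], [1, 0],
     [0, 4], [0, 3], [1, 4],
     [4, 0], [3, 0], [4, 1],
     [4, 4], [3, 4], [4, 3]]
Z = average(pdist(X))

thresholds = (0.9, 1.5, 4, 6)

# Number of distinct flat-cluster labels at each threshold.
n_clusters = {t: len(np.unique(fcluster(Z, t, criterion='distance')))
              for t in thresholds}

# Each merge at or below the threshold reduces the cluster count by one.
n_from_heights = {t: len(X) - int(np.sum(Z[:, 2] <= t)) for t in thresholds}

print(n_clusters)
print(n_clusters == n_from_heights)
```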
Also, `scipy.cluster.hierarchy.dendrogram` can be used to generate a plot of the dendrogram.
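As a hedged sketch, the dendrogram layout can be computed without drawing anything by passing no_plot=True, which avoids the need for matplotlib. The name R is introduced here for the returned dictionary.

```python
from scipy.cluster.hierarchy import average, dendrogram
from scipy.spatial.distance import pdist

X = [[0, 0], [0, 1], [1, 0],
     [0, 4], [0, 3], [1, 4],
     [4, 0], [3, 0], [4, 1],
     [4, 4], [3, 4], [4, 3]]
Z = average(pdist(X))

# Compute the dendrogram layout only; nothing is drawn.
R = dendrogram(Z, no_plot=True)

print(R['leaves'])        # order of the original observations along the axis
print(len(R['dcoord']))   # one set of link coordinates per merge
```

To render the plot instead, omit no_plot=True with matplotlib installed and call matplotlib.pyplot.show() afterwards.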

See also

linkage

for advanced creation of hierarchical clusterings.

scipy.spatial.distance.pdist

pairwise distance metrics

Aliases

  • scipy.cluster.hierarchy.average