Clustering

Hierarchical Clustering

A hierarchical clustering algorithm derived from the R package ‘amap’ [Amap].

class mlpy.HCluster(method='euclidean', link='complete')

Hierarchical Cluster.

Initialize Hierarchical Cluster.

Parameters:
method : string (‘euclidean’)

the distance measure to be used

link : string (‘single’, ‘complete’, ‘mcquitty’, ‘median’)

the agglomeration method to be used

Example:

>>> import numpy as np
>>> import mlpy
>>> x = np.array([[ 1. ,  1.5],
...               [ 1.1,  1.8],
...               [ 2. ,  2.8],
...               [ 3.2,  3.1],
...               [ 3.4,  3.2]])
>>> hc = mlpy.HCluster()
>>> hc.compute(x)
>>> hc.ia
array([-4, -1, -3,  2])
>>> hc.ib
array([-5, -2,  1,  3])
>>> hc.heights
array([ 0.2236068 ,  0.31622776,  1.4560219 ,  2.94108844])
>>> hc.cut(0.5)
array([0, 0, 1, 2, 2])
compute(x)

Compute Hierarchical Cluster.

Parameters:
x : ndarray

A 2-dimensional matrix (samples x features).

Returns:
self.ia : ndarray (1-dimensional vector)

first column of the merge description (see below)

self.ib : ndarray (1-dimensional vector)

second column of the merge description (see below)

self.heights : ndarray (1-dimensional vector)

a set of n-1 non-decreasing real values. The clustering height: that is, the value of the criterion associated with the clustering method for the particular agglomeration.

Element i of the merge description (the pair ia[i], ib[i]) describes the merging of clusters at step i of the clustering. If an element j is negative, then observation -j was merged at this stage. If j is positive, then the merge was with the cluster formed at the (earlier) step j of the algorithm. Thus negative entries in the merge description indicate agglomerations of singletons, and positive entries indicate agglomerations of non-singletons.
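This encoding can be unpacked with a short pure-Python sketch (independent of mlpy; the helper name is hypothetical):

```python
def merge_members(ia, ib):
    """Return, for each agglomeration step, the set of (1-based)
    observation indices contained in the cluster formed at that step."""
    clusters = []
    for a, b in zip(ia, ib):
        members = set()
        for j in (a, b):
            if j < 0:
                members.add(-j)             # negative entry: singleton observation -j
            else:
                members |= clusters[j - 1]  # positive entry: cluster formed at step j
        clusters.append(members)
    return clusters

# Using ia and ib from the example above:
steps = merge_members([-4, -1, -3, 2], [-5, -2, 1, 3])
# step 1 merges observations 4 and 5, step 2 merges 1 and 2,
# step 3 merges observation 3 with the step-1 cluster, and
# step 4 merges the step-2 and step-3 clusters.
```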

cut(ht)

Cuts the tree into several groups by specifying the cut height.

Parameters:
ht : float

height where the tree should be cut

Returns:
cl : ndarray (1-dimensional vector)

group memberships; group labels are in 0, ..., N-1
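For intuition, the behavior of cut can be sketched in pure Python from ia, ib and heights. This is a rough illustrative reimplementation under the semantics described above, not mlpy’s actual code; the function and argument names are hypothetical:

```python
def cut_tree(ia, ib, heights, ht, n):
    """Replay the merges whose height is below ht, then label the
    resulting groups 0, ..., N-1 in order of first appearance."""
    parent = list(range(n + 1))          # union-find over 1-based observations

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    reps = []                            # one representative observation per step
    for a, b, h in zip(ia, ib, heights):
        ra = find(-a) if a < 0 else find(reps[a - 1])
        rb = find(-b) if b < 0 else find(reps[b - 1])
        if h < ht:                       # only merges below the cut height apply
            parent[rb] = ra
        reps.append(ra)

    labels, cl = {}, []
    for i in range(1, n + 1):
        r = find(i)
        labels.setdefault(r, len(labels))
        cl.append(labels[r])
    return cl

# Reproduces cut(0.5) from the example above:
cut_tree([-4, -1, -3, 2], [-5, -2, 1, 3],
         [0.2236068, 0.31622776, 1.4560219, 2.94108844], 0.5, 5)
# → [0, 0, 1, 2, 2]
```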

[Amap]amap: Another Multidimensional Analysis Package, http://cran.r-project.org/web/packages/amap/index.html

k-medoids

class mlpy.Kmedoids(k, dist, maxloops=100, rs=0)

k-medoids algorithm.

Initialize Kmedoids.

Parameters:
k : int

Number of clusters/medoids

dist : class

class with a .compute(x, y) method that returns the distance between samples x and y

maxloops : int

maximum number of loops

rs : int

random seed
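Any object exposing a .compute(x, y) method can serve as dist; for instance, a plain Euclidean distance. This class is an illustrative assumption based on the interface described above, not part of mlpy:

```python
import numpy as np

class EuclideanDist:
    """Minimal distance class with the .compute(x, y) interface
    expected by the dist parameter (illustrative; not shipped with mlpy)."""
    def compute(self, x, y):
        # Euclidean distance between two feature vectors
        x = np.asarray(x, dtype=float)
        y = np.asarray(y, dtype=float)
        return float(np.sqrt(((x - y) ** 2).sum()))
```

An instance could then be passed as mlpy.Kmedoids(k=3, dist=EuclideanDist()).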

Example:

>>> import numpy as np
>>> import mlpy
>>> x = np.array([[ 1. ,  1.5],
...               [ 1.1,  1.8],
...               [ 2. ,  2.8],
...               [ 3.2,  3.1],
...               [ 3.4,  3.2]])
>>> dtw = mlpy.Dtw(onlydist=True)
>>> km = mlpy.Kmedoids(k=3, dist=dtw)
>>> km.compute(x)
(array([4, 0, 2]), array([3, 1]), array([0, 1]), 0.072499999999999981)

Samples 4, 0 and 2 are medoids and represent clusters 0, 1 and 2, respectively.

  • cluster 0: samples 4 (medoid) and 3
  • cluster 1: samples 0 (medoid) and 1
  • cluster 2: sample 2 (medoid)
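The grouping above can be recovered programmatically from the returned tuple (the helper name is hypothetical):

```python
def kmedoids_groups(m, n, cl):
    """Collect sample indices per cluster: cluster i is represented
    by medoid m[i]; each non-medoid n[j] belongs to cluster cl[j]."""
    groups = {i: [med] for i, med in enumerate(m)}
    for idx, c in zip(n, cl):
        groups[c].append(idx)
    return groups

# With the output from the example above:
kmedoids_groups([4, 0, 2], [3, 1], [0, 1])
# → {0: [4, 3], 1: [0, 1], 2: [2]}
```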

New in version 2.0.8.

compute(x)

Compute Kmedoids.

Parameters:
x : ndarray

A 2-dimensional matrix (samples x features).

Returns:
m : ndarray (1-dimensional vector)

medoids indexes

n : ndarray (1-dimensional vector)

non-medoids indexes

cl : ndarray (1-dimensional vector)

cluster memberships for non-medoids; group labels are in 0, ..., k-1

co : double

total cost of the configuration
