
Clustering Algorithms

• k-means
• Hierarchic Agglomerative Clustering (HAC)
• …
• BIRCH
• Association Rule Hypergraph Partitioning (ARHP)
• Categorical clustering (CACTUS, STIRR)
• …
• STC
• QDC

Hierarchical clustering

Given a set of N items to be clustered and an N×N distance (or similarity) matrix:

1. Start by assigning each item to its own cluster

2. Find the closest (most similar) pair of clusters and merge them into a single cluster, so that now you have one less cluster.

3. Compute distances (similarities) between the new cluster and each of the old clusters.

4. Repeat steps 2 and 3 until all items are clustered into a single cluster of size N.
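A minimal sketch of this procedure (not from the slides), assuming a plain Python list-of-lists distance matrix and single-link merging; the function names are illustrative:

```python
def cluster_distance(c1, c2, dist):
    # Single-link: distance between the closest members of the two clusters.
    return min(dist[i][j] for i in c1 for j in c2)

def agglomerative(dist):
    # Step 1: each item starts in its own cluster.
    clusters = [[i] for i in range(len(dist))]
    merges = []
    # Step 4: repeat until a single cluster of size N remains.
    while len(clusters) > 1:
        # Step 2: find the closest (most similar) pair of clusters.
        a, b = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda p: cluster_distance(clusters[p[0]], clusters[p[1]], dist),
        )
        merges.append((clusters[a], clusters[b]))
        merged = clusters[a] + clusters[b]
        # Step 3: distances to the new cluster are recomputed by
        # cluster_distance() on the next iteration.
        clusters = [c for i, c in enumerate(clusters) if i not in (a, b)] + [merged]
    return merges

# Toy 4x4 distance matrix: items 0/1 and 2/3 are close to each other.
dist = [[0, 1, 5, 6],
        [1, 0, 4, 7],
        [5, 4, 0, 2],
        [6, 7, 2, 0]]
for left, right in agglomerative(dist):
    print("merge", left, "+", right)
```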


Agglomerative hierarchical clustering


Clustering result: dendrogram


AHC variants

• Various ways of calculating cluster similarity:
  • single-link (minimum)
  • complete-link (maximum)
  • group-average (average)
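The three variants differ only in how the distance between two clusters is derived from member-to-member distances; a hedged sketch with illustrative names, over the same kind of distance matrix as above:

```python
# c1, c2 are lists of item indices; dist is an N x N distance matrix.

def single_link(c1, c2, dist):
    # Minimum pairwise distance (the most similar pair of members).
    return min(dist[i][j] for i in c1 for j in c2)

def complete_link(c1, c2, dist):
    # Maximum pairwise distance (the least similar pair of members).
    return max(dist[i][j] for i in c1 for j in c2)

def group_average(c1, c2, dist):
    # Average distance over all cross-cluster pairs.
    return sum(dist[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))
```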

Data Clustering: K-means

• Partitional clustering
• Initial number of clusters k

K-means

1. Place K points into the space represented by the objects that are being clustered. These points represent the initial group centroids.

2. Assign each object to the group that has the closest centroid.

3. When all objects have been assigned, recalculate the positions of the K centroids.

4. Repeat steps 2 and 3 until the centroids no longer move. This produces a separation of the objects into groups from which the metric to be minimized can be calculated.
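A compact sketch of these four steps (assumptions not in the slides: 2-D points as (x, y) tuples, Euclidean distance, random initial centroids drawn from the data; names are illustrative):

```python
import random

# Plain k-means on 2-D points; assumes Euclidean distance.
def kmeans(points, k, max_iters=100):
    # Step 1: place K initial centroids (here: random data points).
    centroids = random.sample(points, k)
    for _ in range(max_iters):
        # Step 2: assign each point to the group with the closest centroid.
        groups = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k),
                      key=lambda i: (p[0] - centroids[i][0]) ** 2 +
                                    (p[1] - centroids[i][1]) ** 2)
            groups[idx].append(p)
        # Step 3: recalculate each centroid as the mean of its group.
        new_centroids = []
        for i, g in enumerate(groups):
            if g:
                new_centroids.append((sum(p[0] for p in g) / len(g),
                                      sum(p[1] for p in g) / len(g)))
            else:
                new_centroids.append(centroids[i])  # keep an empty group's centroid
        # Step 4: stop when the centroids no longer move.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, groups

points = [(1, 1), (1.5, 2), (8, 8), (8, 9), (0.5, 1.5), (9, 8.5)]
centroids, groups = kmeans(points, k=2)
```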

K-means example (by Andrew W. Moore)

K-means clustering (k=3)

Single-pass clustering

• Each document is compared against the existing clusters; it joins the closest one if the similarity exceeds a threshold, otherwise it starts a new cluster.
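The slide names only the method and its threshold parameter, so the following is a generic single-pass sketch under assumed conventions: documents are {term: weight} dictionaries, cosine similarity is the measure, and each cluster keeps a running centroid.

```python
def cosine(u, v):
    # Cosine similarity of two sparse {term: weight} vectors.
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = sum(w * w for w in u.values()) ** 0.5
    nv = sum(w * w for w in v.values()) ** 0.5
    return dot / (nu * nv) if nu and nv else 0.0

def single_pass(docs, threshold):
    # Each cluster is (centroid vector, list of member vectors).
    clusters = []
    for d in docs:
        best, best_sim = None, 0.0
        for i, (centroid, _members) in enumerate(clusters):
            sim = cosine(d, centroid)
            if sim > best_sim:
                best, best_sim = i, sim
        if best is not None and best_sim >= threshold:
            # Similar enough: add the document and recompute the centroid.
            centroid, members = clusters[best]
            members.append(d)
            terms = set().union(*members)
            new_centroid = {t: sum(m.get(t, 0.0) for m in members) / len(members)
                            for t in terms}
            clusters[best] = (new_centroid, members)
        else:
            # Otherwise the document starts a new cluster of its own.
            clusters.append((dict(d), [d]))
    return clusters
```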

Document Clustering: k-means

• k-means: distance-based flat clustering

• Advantages:
  • linear time complexity
  • works relatively well in low-dimensional space

• Drawbacks:
  • distance computation in high-dimensional space
  • the centroid vector may not summarize the cluster's documents well
  • the initial k clusters affect the quality of the result

0. Input: D := {d1, d2, …, dn}; k := the number of clusters
1. Select k document vectors as the initial centroids of the k clusters
2. Repeat
3.   Select one vector d from the remaining documents
4.   Compute the similarities between d and the k centroids
5.   Put d in the closest cluster and recompute that cluster's centroid
6. Until the centroids don't change
7. Output: k clusters of documents
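A sketch of this loop over document vectors, reusing the cosine() helper from the single-pass sketch above; seeding with the first k documents, the {term: weight} representation, and batch reassignment (rather than recomputing the centroid after every document, as the steps above do) are simplifying assumptions:

```python
def kmeans_documents(docs, k, max_iters=50):
    # 1. Use the first k document vectors as the initial centroids.
    centroids = [dict(d) for d in docs[:k]]
    for _ in range(max_iters):
        groups = [[] for _ in range(k)]
        # 3-5. Assign every document to the most similar centroid.
        for d in docs:
            best = max(range(k), key=lambda i: cosine(d, centroids[i]))
            groups[best].append(d)
        # Recompute each centroid as the mean term weights of its cluster.
        new_centroids = []
        for i, g in enumerate(groups):
            if not g:
                new_centroids.append(centroids[i])
                continue
            terms = set().union(*g)
            new_centroids.append({t: sum(m.get(t, 0.0) for m in g) / len(g)
                                  for t in terms})
        # 6. Stop when the centroids no longer change.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return groups
```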

Document Clustering: HAC

• Hierarchic agglomerative clustering (HAC): distance-based hierarchic clustering

• Advantages:
  • produces better-quality clusters
  • works relatively well in low-dimensional space

• Drawbacks:
  • distance computation in high-dimensional space
  • quadratic time complexity

0. Input: D := {d1, d2, …, dn}
1. Calculate the similarity matrix SIM[i, j]
2. Repeat
3.   Merge the two most similar clusters, K and L, to form a new cluster KL
4.   Compute the similarities between KL and each of the remaining clusters and update SIM[i, j]
5. Until there is a single cluster (or a specified number of clusters)
6. Output: dendrogram of clusters
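For the document case, step 1's similarity matrix can be built directly from the vectors; a short sketch assuming cosine similarity (reusing cosine() from the single-pass sketch above), which also makes the quadratic cost explicit:

```python
def similarity_matrix(docs):
    # SIM[i][j] for every document pair; this n*n loop is the source of
    # the quadratic time complexity noted above.
    n = len(docs)
    return [[cosine(docs[i], docs[j]) for j in range(n)] for i in range(n)]
```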
