
TRANSCRIPT

Page 1: Main Clustering Algorithms §K-Means §Hierarchical §SOM
Page 2:

Main Clustering Algorithms

K-Means

Hierarchical

SOM

Page 3:

K-Means

MacQueen, 1967

clusters defined by means/centroids

Many clustering algorithms are derivatives of K-Means

Widespread use in industry and academia, despite its many problems
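A minimal sketch of the K-Means loop, for illustration only (assumptions: Euclidean distance, initial centroids drawn at random from the data; `kmeans` and its inputs are illustrative names, not from the slides):

```python
import random

def kmeans(points, k, iters=100, seed=0):
    """Minimal K-Means: alternate assignment and centroid update."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids taken from the data
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            clusters[j].append(p)
        # Update step: each centroid becomes the mean of its cluster.
        new = [tuple(sum(xs) / len(xs) for xs in zip(*cl)) if cl else centroids[j]
               for j, cl in enumerate(clusters)]
        if new == centroids:  # converged: assignments can no longer change
            break
        centroids = new
    return centroids, clusters
```

On two well-separated groups of three points each, the loop settles on the two group means within a few iterations.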

Page 4:

K-Means Example

Pages 5-16: K-Means example (sequence of figures showing the iterations)

Hierarchical Clustering

Starts by treating each point as a cluster

Iteratively links the most similar pair of clusters

A user-defined threshold parameter determines the output clusters
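The steps above can be sketched as single-linkage agglomeration (one of several linkage variants; the distance threshold plays the role of the user-defined parameter; all names here are illustrative):

```python
def hierarchical(points, threshold):
    """Agglomerative single-linkage clustering.

    Start with one cluster per point; repeatedly merge the closest
    pair of clusters until the smallest inter-cluster distance
    exceeds the threshold."""
    def dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

    clusters = [[p] for p in points]          # each point is its own cluster
    while len(clusters) > 1:
        # Find the most similar (closest) pair under single linkage.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(dist(p, q) for p in clusters[i] for q in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > threshold:                      # threshold defines the output
            break
        clusters[i] += clusters.pop(j)         # link the closest pair
    return clusters
```

Swapping the `min` in the pair search for `max` or a mean gives complete or average linkage, respectively.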

Page 17:

Hierarchical Clustering Variants in Minitab©

Linkage Methods: Average, Centroid, Complete, McQuitty, Median, Single, Ward

Distance Measures: Euclidean, Manhattan, Pearson, Squared Euclidean, Squared Pearson

Page 18:

Hierarchical Clustering Example

Page 19:

Results

Page 20:

Still There Are Problems

Page 21:

Clustering Documents: “bag of words”

Di: vector of length n (word frequencies over the vocabulary W1…Wn)

Distance between Di and Dj: <Di, Dj>

      W1   W2   W3   ...  Wi   ...  Wj   ...  Wn

D1:  f11  f21  f31   ...  fi1  ...  fj1  ...  fn1
D2:  f12  f22  f32   ...  fi2  ...  fj2  ...  fn2
...
Dm:  f1m  f2m  f3m   ...  fim  ...  fjm  ...  fnm

M: the m × n matrix whose rows are the document vectors Di
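A sketch of building the document-term matrix M and the dot-product similarity ⟨Di, Dj⟩ (the tokenization by whitespace and raw counts are simplifying assumptions; function names are illustrative):

```python
def bag_of_words(docs):
    """Build the m x n document-term matrix M from raw texts."""
    vocab = sorted({w for d in docs for w in d.split()})   # W1..Wn
    index = {w: i for i, w in enumerate(vocab)}
    M = []
    for d in docs:
        row = [0] * len(vocab)         # frequencies fij for this document
        for w in d.split():
            row[index[w]] += 1
        M.append(row)
    return vocab, M

def dot(u, v):
    """Similarity <Di, Dj> between two document vectors."""
    return sum(a * b for a, b in zip(u, v))
```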

Page 22:

Cluster Centroid

Cluster defined by distance to centroid: C

C = (1/m) Σi Di, where m is the number of vectors

Page 23:

Elevations

Elevation of D: El(D) = <C, D>

Problem / Would like: (comparison figures)
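The two definitions, C = (1/m) Σi Di and El(D) = ⟨C, D⟩, as a short sketch (names are illustrative):

```python
def centroid(vectors):
    """C = (1/m) * sum of the Di, where m is the number of vectors."""
    m = len(vectors)
    return [sum(col) / m for col in zip(*vectors)]

def elevation(C, D):
    """El(D) = <C, D>: dot product of D with the cluster centroid."""
    return sum(c * d for c, d in zip(C, D))
```

Under this similarity, vectors aligned with the bulk of the cluster get high elevation, while outliers pointing away from C get low elevation.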

Page 24:

Mapping to Higher Dimension Utilizing Kernel Function K(X,Y)

K(X,Y) = <Φ(X), Φ(Y)>, where X, Y are vectors in R^n and Φ is a mapping into R^d, d >> n

Key element in Support Vector Machines

Data needs to appear as dot products only: <Di, Dj>

Page 25:

Kernel Function Examples

Polynomial:

K(X, Y) = (<X, Y> + 1)^n

Feedforward Neural Network Classifier

K(X, Y) = tanh(β<X, Y> + b)

Radial Basis

K(X, Y) = exp(−||X − Y||^2 / (2σ^2))
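The three kernels as short Python functions; σ, β, b, and the degree n are user-chosen parameters, and the radial-basis kernel is written in its standard Gaussian form exp(−‖X−Y‖²/(2σ²)):

```python
import math

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def polynomial(x, y, n=2):
    """K(X, Y) = (<X, Y> + 1)^n"""
    return (dot(x, y) + 1) ** n

def neural(x, y, beta=1.0, b=0.0):
    """K(X, Y) = tanh(beta*<X, Y> + b): feedforward NN classifier kernel."""
    return math.tanh(beta * dot(x, y) + b)

def radial_basis(x, y, sigma=1.0):
    """K(X, Y) = exp(-||X - Y||^2 / (2*sigma^2))"""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2 * sigma ** 2))
```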

Page 26:

First Step: Penalizing Outliers

Ck = (1/m) Σi <Di, N(Ck-1)> Di    (1)

(N(·): normalization to unit length)

Convergence: C = principal eigenvector of M^T M, where M is the matrix whose rows are the Di's:

C = lim L→∞ N((M^T M)^L U)    (2)

Both (1) and (2) are efficient methods of computing C
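Equations (1) and (2) amount to power iteration on M^T M; a sketch under the assumption that N(·) normalizes to unit length (names are illustrative):

```python
def principal_centroid(M, iters=200):
    """C = principal eigenvector of M^T M, via C <- N(M^T M C).

    Each step is, up to scaling, (1/m) * sum_i <Di, N(C)> Di, i.e.
    equation (1); repeating until C stabilizes realizes equation (2)."""
    n = len(M[0])
    C = [1.0] * n                        # starting vector U
    for _ in range(iters):
        # Projections <Di, C> for each row Di of M.
        w = [sum(c * x for c, x in zip(C, row)) for row in M]
        # C' = M^T w = sum_i <Di, C> Di
        C = [sum(w[i] * M[i][j] for i in range(len(M))) for j in range(n)]
        norm = sum(c * c for c in C) ** 0.5   # N(.): unit length
        C = [c / norm for c in C]
    return C
```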

Page 27:

Cannot compute in feature space:

Fk = (1/m) Σi <Φ(Di), N(Fk-1)> Φ(Di)

since the Φ(Di) are never computed explicitly. Nor by using (2): M^T M now has unmanageable (eventually infinite) dimension, M being the matrix whose rows are Φ(D1), Φ(D2), …

So instead we use

αi^k = <Φ(Di), N(Fk-1)> = (1/m) Σj αj^(k-1) <Φ(Di), Φ(Dj)> = (1/m) Σj αj^(k-1) K(Di, Dj)    (3)

using kernels to replace the dot products

Page 28:

Theorem

F = Σi αi* Φ(Di)

where αi* = lim n→∞ αi^n and αi^n = (1/m) Σj αj^(n-1) K(Di, Dj)

El(D): elevation of vector D = Σi αi* K(Di, D)
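The theorem's fixed point can be sketched directly from the Gram matrix K(Di, Dj); the per-step normalization of the α's is an assumption standing in for N(·), and only the direction of α matters:

```python
def kernel_elevations(gram):
    """Iterate alpha_i <- (1/m) * sum_j alpha_j * K(Di, Dj) to a fixed
    direction, then return El(Di) = sum_j alpha_j * K(Dj, Di)."""
    m = len(gram)
    alpha = [1.0] * m
    for _ in range(200):
        new = [sum(alpha[j] * gram[i][j] for j in range(m)) / m
               for i in range(m)]
        norm = sum(a * a for a in new) ** 0.5
        alpha = [a / norm for a in new]   # keep the direction only
    # Elevation of each training vector Di in feature space.
    return [sum(alpha[j] * gram[i][j] for j in range(m)) for i in range(m)]
```

Note that the feature map Φ never appears: everything runs through kernel evaluations, exactly as equation (3) requires.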

Page 29:

Zoomed Clusters

Clusters are defined through peaks.

Peaks: all vectors which are the highest in their vicinity:

PEAKS = {Dj : El(Dj) ≥ El(Di)·<Di, Dj>^S for all i}

S: sharpening/smoothing parameter

Cluster: set of vectors in the vicinity of a peak
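A sketch of the peak rule, under two stated assumptions: the condition is read as El(Dj) ≥ El(Di)·⟨Di, Dj⟩^S with the vectors unit-normalized, and the hypothetical helper `assign_clusters` puts each vector with its most similar peak:

```python
def find_peaks(vectors, elev, S=1.0):
    """Peaks: vectors that are highest in their vicinity.

    Dj is a peak when El(Dj) >= El(Di) * <Di, Dj>**S for every i;
    larger S shrinks the vicinity (sharpening), smaller S widens it."""
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))
    peaks = []
    for j, Dj in enumerate(vectors):
        if all(elev[j] >= elev[i] * max(dot(Di, Dj), 0.0) ** S
               for i, Di in enumerate(vectors)):
            peaks.append(j)
    return peaks

def assign_clusters(vectors, peaks):
    """Cluster: vectors in the vicinity of a peak (most similar peak wins)."""
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))
    return [max(peaks, key=lambda p: dot(vectors[p], v)) for v in vectors]
```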

Page 30:

Clustering Example

Two scatter-plot panels showing clusters C1 and C2 (Kernel: Linear, S: Default (1))

Page 31:

Zooming Example

(figure)

Page 32:

Zoomed Clusters Results

Four scatter-plot panels showing clusters C1, C2, C3:

• Kernel: Linear, S: Default (1)

• Kernel: Polynomial Degree 2, S: 16

• Kernel: Polynomial Degree 8000, S: 1.5

• Kernel: Polynomial Degree 8000, S: Default (1)

Page 33:

Clustering MicroArray Data

Matrix of expression levels: rows are Genes, columns are Experiments; entry (i, j) is the expression level of gene i during experiment j

Page 34:

MicroArrays As Time Series

Page 35:

Clustering Time Series

Reveals groups of genes with similar reactions to experiments

Functionally related genes should cluster

Page 36:

Simulated Time Series

Simulated 180 Time Series, with 3 clusters and 9 sub-clusters (20 per sub-cluster)

Each time series is a vector with 1000 components

Each component is the expression level at a given time

Page 37:

Results

Kernel: Polynomial Degree 3, S: 6

Kernel: Polynomial Degree 3, S: 7

Kernel: Polynomial Degree 6, S: 15

Page 38:

HMM Parameter Estimation

Sequential K-Means → Initial HMM Model → Baum-Welch Algorithm or Viterbi Algorithm → Refinement of HMM Model → Final HMM Model

Page 39:

Parameter Estimation with Zoomed Clusters

Zoomed Clusters

Initial HMM Model

Advantages:

• Flexibility with number of states

• Initial Model is closer to the final one

Consequences:

• Higher accuracy and faster convergence for either Baum-Welch or Viterbi

Page 40:

Example: Coins

HHHHHTTTTTTTHHHHHHHTHTHTHTHTHTTTTTTTTHHHHHTTTTTTTHHHHHHHTHTHTHTHTHTTTTTTTT

Coin 1: 100% Heads

Coin 2: 100% Tails

Coin 3: 50% Heads, 50% Tails

• Regions with similar statistical distribution of Heads and Tails represent the states in the initial HMM Model

• Use elevation functions, separately for Heads and Tails, to represent these distributions

Page 41:

HHHHH HHHHHHH H H H H H

TTTTTTT T T T T T TTTTTTTT

Step 1: Separate the letters (Heads and Tails)

Step 2: Calculate an elevation function for each letter

Step 3: For each position i in the sequence of throws, get the elevation functions for Heads and Tails and create a point Di in R2 whose components are the elevations: Di = [Eh, Et]

Step 4: Cluster all the points obtained from each position

Page 42:

What Clustering Achieves

Each cluster defines regions of similar distributions of heads and tails

Each Cluster is a state in the initial HMM model

State transition/emission probabilities are estimated from the clusters
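A sketch of that estimation step; the representation is a made-up simplification in which `states[t]` is the cluster label at position t and `obs[t]` the emitted symbol (e.g. 'H' or 'T'), and simple counts give the initial matrices:

```python
from collections import Counter, defaultdict

def estimate_hmm(states, obs):
    """Estimate transition/emission probabilities from cluster labels.

    states[t]: cluster (state) assigned to position t
    obs[t]:    symbol emitted at position t"""
    trans = defaultdict(Counter)
    emit = defaultdict(Counter)
    for t, (s, o) in enumerate(zip(states, obs)):
        emit[s][o] += 1                       # count emissions per state
        if t + 1 < len(states):
            trans[s][states[t + 1]] += 1      # count state-to-state moves

    def norm(c):
        total = sum(c.values())
        return {k: v / total for k, v in c.items()}

    return ({s: norm(c) for s, c in trans.items()},
            {s: norm(c) for s, c in emit.items()})
```

These counts-based estimates are exactly the kind of initial model that Baum-Welch or Viterbi training then refines.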

Page 43:

References

MacQueen, J. 1967. Some methods for classification and analysis of multivariate observations. Pp. 281-297 in: L. M. Le Cam & J. Neyman [eds.], Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Vol. 1. University of California Press, Berkeley.

Jain, A. K., Murty, M. N., and Flynn, P. J. 1999. Data Clustering: A Review. ACM Computing Surveys, Vol. 31, No. 3.

http://www.gene-chips.com/ by Leming Shi, Ph.D.