

Main Clustering Algorithms

K-Means

Hierarchical

SOM

K-Means

MacQueen, 1967

Clusters defined by means/centroids

Many clustering algorithms are derivatives of K-Means

Widespread use in industry and academia, despite its many problems
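For reference, a minimal K-Means sketch in Python/NumPy; the initialization scheme, iteration cap, and convergence test are illustrative choices, not MacQueen's original formulation:

```python
import numpy as np

def k_means(X, k, n_iter=100, seed=0):
    """Minimal K-Means: assign each point to its nearest centroid,
    then recompute the centroids as cluster means."""
    rng = np.random.default_rng(seed)
    # Initialize centroids with k distinct data points chosen at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: label each point by its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each centroid becomes the mean of its points;
        # an empty cluster keeps its previous centroid.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # centroids stopped moving
        centroids = new_centroids
    return labels, centroids
```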

K-Means Example

Hierarchical Clustering

Starts by treating each point as its own cluster

Iteratively links the most similar pair of clusters

A user-defined threshold parameter determines the output clusters
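A short sketch of the procedure using SciPy's agglomerative clustering; the toy data, linkage method, and cut threshold are illustrative assumptions:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Toy 2-D data: two visually separated groups (illustrative assumption).
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
              [1.0, 1.0], [1.1, 0.9], [0.9, 1.1]])

# Build the dendrogram bottom-up: every point starts as its own cluster,
# and the most similar pair (here, average linkage on Euclidean distance)
# is merged at each step.
Z = linkage(X, method="average", metric="euclidean")

# The user-defined threshold cuts the dendrogram into the output clusters.
labels = fcluster(Z, t=0.5, criterion="distance")
print(labels)  # e.g. [1 1 1 2 2 2]
```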

Hierarchical Clustering Variants In Minitab©

Linkage methods: Average, Centroid, Complete, McQuitty, Median, Single, Ward

Distance measures: Euclidean, Manhattan, Pearson, Squared Euclidean, Squared Pearson

Hierarchical Clustering Example

Results

Still There are Problems

Clustering Documents

"Bag of words" representation

Di: vector of word frequencies, of length n

Distance between Di and Dj: the dot product <Di, Dj>

Term-by-document matrix M (one row per document, one column per word):

        W1   W2   W3   ...  Wi   ...  Wj   ...  Wn
D1:    f11  f21  f31  ...  fi1  ...  fj1  ...  fn1
D2:    f12  f22  f32  ...  fi2  ...  fj2  ...  fn2
...
Dm:    f1m  f2m  f3m  ...  fim  ...  fjm  ...  fnm

Cluster Centroid

Cluster defined by distance to centroid: C

C = (1/m) Σi Di, where m is the number of vectors

Elevations

Elevation of D: El(D) = <C, D>
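A minimal NumPy sketch of the centroid and elevation definitions above; the toy frequency matrix is an illustrative stand-in for real document vectors:

```python
import numpy as np

# Toy term-by-document matrix M: one row per document D_i,
# one column per word (illustrative values).
M = np.array([[2.0, 1.0, 0.0],
              [1.0, 2.0, 0.0],
              [0.0, 0.0, 3.0]])

# Centroid: C = (1/m) * sum_i D_i
C = M.mean(axis=0)

# Elevation of each document: El(D_i) = <C, D_i>
elevations = M @ C
print(C, elevations)
```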

[Figure: "Problem" vs. "Would like", contrasting the clusters actually found with the desired clusters]

Mapping to a Higher Dimension

Utilizing a kernel function K(X, Y)

K(X, Y) = <Φ(X), Φ(Y)>, where X, Y are vectors in R^n and Φ is a mapping into R^d, d >> n

Key element in Support Vector Machines

Data needs to appear only as dot products: <Di, Dj>

Kernel Function Examples

Polynomial:
K(X, Y) = (<X, Y> + 1)^n

Sigmoid (feedforward neural network classifier):
K(X, Y) = tanh(β<X, Y> + b)

Radial basis:
K(X, Y) = exp(−‖X − Y‖² / 2σ²)
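For concreteness, the three example kernels as plain Python functions; the default parameter values are arbitrary illustrative choices:

```python
import numpy as np

# Polynomial kernel: K(X, Y) = (<X, Y> + 1)^n
def polynomial_kernel(x, y, n=2):
    return (np.dot(x, y) + 1.0) ** n

# Sigmoid ("neural network") kernel: K(X, Y) = tanh(beta*<X, Y> + b)
def sigmoid_kernel(x, y, beta=1.0, b=0.0):
    return np.tanh(beta * np.dot(x, y) + b)

# Radial basis kernel: K(X, Y) = exp(-||X - Y||^2 / (2*sigma^2))
def rbf_kernel(x, y, sigma=1.0):
    return np.exp(-np.sum((x - y) ** 2) / (2.0 * sigma ** 2))
```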

First Step: Penalizing Outliers

Ck = (1/m) Σi <Di, N(Ck−1)> Di   (1)

(N(·): normalization)

Convergence: C is the principal eigenvector of MᵀM, where M is the matrix of the Di's:

C = lim_{L→∞} (MᵀM)ᴸ U   (2)

Both (1) and (2) are efficient methods of computing C.

In feature space, this cannot be done with

Fk = (1/m) Σi <Φ(Di), N(Fk−1)> Φ(Di)

or by using (2): MΦᵀMΦ, where MΦ is the matrix of the Φ(Di)'s, has unmanageable (eventually infinite) dimension.

So instead we iterate on the coefficients:

αiᵏ = <Φ(Di), N(Fk−1)> = (1/m) Σj αjᵏ⁻¹ <Φ(Di), Φ(Dj)>   (3)

[Figure: MΦ as the matrix whose rows are Φ(D1), Φ(D2), …]

Using kernels to replace <Φ(Di), Φ(Dj)> with K(Di, Dj)

Theorem

F = Σi αi* Φ(Di)

where αi* = lim_{n→∞} αiⁿ and αiⁿ = (1/m) Σj αjⁿ⁻¹ K(Di, Dj)

El(D), the elevation of a vector D: El(D) = Σi αi* K(Di, D)
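A sketch of the iteration behind the theorem, assuming N(·) is unit-length normalization and a uniform starting α (neither is spelled out in the slides); K is the Gram matrix with K[i, j] = K(Di, Dj):

```python
import numpy as np

def elevation_weights(K, n_iter=200, tol=1e-9):
    """Iterate alpha_i^n = (1/m) * sum_j alpha_j^(n-1) * K(D_i, D_j),
    normalizing each step, until the weights converge."""
    m = K.shape[0]
    alpha = np.full(m, 1.0 / m)        # uniform start (an assumption)
    for _ in range(n_iter):
        new = (K @ alpha) / m
        new /= np.linalg.norm(new)     # N(.): keep the iterate unit length
        if np.linalg.norm(new - alpha) < tol:
            break                      # alpha* reached (to tolerance)
        alpha = new
    return alpha

def elevation(K_D, alpha):
    """El(D) = sum_i alpha_i* K(D_i, D); K_D holds K(D_i, D) for each i."""
    return K_D @ alpha
```

This is kernelized power iteration: it computes the coefficients of the principal direction in feature space without ever forming Φ(Di) explicitly.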

Zoomed Clusters

Clusters are defined through peaks

Peaks: all vectors that are the highest in their vicinity:

PEAKS = { Dj : El(Dj) ≥ El(Di) · <Di, Dj>^S for all i }

S: sharpening/smoothing parameter

Cluster: the set of vectors in the vicinity of a peak
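A sketch of the peak test, treating the Gram matrix entries as the similarities <Di, Dj> (exact for the linear kernel) and assuming unit-normalized vectors so each similarity lies in [0, 1]:

```python
import numpy as np

def find_peaks(K, alpha, S=1.0):
    """Return indices j with El(D_j) >= El(D_i) * <D_i, D_j>**S for all i."""
    el = K @ alpha                      # elevations El(D_i)
    sim = np.clip(K, 0.0, None) ** S    # <D_i, D_j>**S (clipped: assumption)
    # D_j is a peak if its elevation dominates every other point's
    # elevation discounted by its similarity to D_j.
    return [j for j in range(len(el)) if np.all(el[j] >= el * sim[:, j])]
```

Raising S sharpens the similarity discount, so more local maxima survive as peaks; lowering it smooths them away, which matches the role of the sharpening/smoothing parameter above.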

[Figure: clusters C1 and C2 found with a linear kernel, S = default (1)]

Clustering Example

Zooming Example

[Figure: clusters C1, C2, and C3 found with a linear kernel, S = default (1)]

[Figure: zoomed clusters with a polynomial kernel of degree 2, S = 16]

Zoomed Clusters Results

[Figure: clusters C1, C2, and C3 with a polynomial kernel of degree 8000, S = 1.5]

[Figure: clusters with a polynomial kernel of degree 8000, S = default (1)]

Clustering MicroArray Data

[Figure: expression matrix with genes as rows and experiments as columns; entry (i, j) is the expression level of gene i during experiment j]

MicroArrays As Time Series

Clustering Time Series

Reveals groups of genes that have similar reactions to the experiments

Functionally related genes should cluster

Simulated Time Series

Simulated 180 Time Series, with 3 clusters and 9 sub-clusters (20 per sub-cluster)

Each time series is a vector with 1000 components; each component is the expression level at a given time
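One way such data could be simulated (the prototype shapes and noise level are illustrative assumptions, not the slides' actual generator):

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 1000)               # 1000 time points per series

series = []
for c in range(3):                            # 3 main clusters
    base = np.sin(2.0 * np.pi * (c + 1) * t)  # cluster prototype (assumption)
    for s in range(3):                        # 3 sub-clusters per cluster
        proto = base + 0.3 * np.sin(2.0 * np.pi * (c + 1) * t + s)
        for _ in range(20):                   # 20 series per sub-cluster
            series.append(proto + 0.1 * rng.standard_normal(t.size))

X = np.vstack(series)                         # 180 x 1000 data matrix
print(X.shape)                                # (180, 1000)
```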

Results

[Figures: recovered clusters with polynomial kernels of degree 3 (S = 6), degree 3 (S = 7), and degree 6 (S = 15)]

HMM Parameter Estimation

Sequential K-Means → Initial HMM Model → Baum-Welch Algorithm / Viterbi Algorithm → Refinement of HMM Model → Final HMM Model

Parameter Estimation with Zoomed Clusters

Zoomed Clusters → Initial HMM Model → Baum-Welch Algorithm / Viterbi Algorithm → Refinement of HMM Model → Final HMM Model

Advantages:

• Flexibility with number of states

• Initial Model is closer to the final one

Consequences:

• Higher accuracy and faster convergence for either Baum-Welch or Viterbi

Example: Coins

HHHHHTTTTTTTHHHHHHHTHTHTHTHTHTTTTTTTT

Coin 1: 100% Heads

Coin 2: 100% Tails

Coin 3: 50% Heads, 50% Tails

• Regions with similar statistical distribution of Heads and Tails represent the states in the initial HMM Model

• Use elevation functions, separately for Heads and Tails, to represent these distributions

Heads: HHHHH HHHHHHH H H H H H

Tails: TTTTTTT T T T T T TTTTTTTT

Step 1: Separate the letters (Heads and Tails), as above.

Step 2: Calculate an elevation function for each letter.

Step 3: For each position i in the sequence of throws, get the elevation functions for Heads and Tails and create a point Di = [Eh, Et] in R2 whose components are the two elevations.

Step 4: Cluster all the points obtained from the positions.
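A sketch of Steps 2-3, assuming the per-letter elevation at position i is a Gaussian-smoothed local frequency of that letter (the slides do not specify the exact elevation function):

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

seq = "HHHHHTTTTTTTHHHHHHHTHTHTHTHTHTTTTTTTT"

# Indicator sequences for each letter.
heads = np.array([c == "H" for c in seq], dtype=float)
tails = 1.0 - heads

# Assumed elevation: smoothed local frequency of each letter around position i.
Eh = gaussian_filter1d(heads, sigma=3.0)
Et = gaussian_filter1d(tails, sigma=3.0)

# One 2-D point per position: D_i = [Eh(i), Et(i)].
D = np.column_stack([Eh, Et])
print(D.shape)   # (number of positions, 2)
```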

What Clustering Achieves

Each cluster defines regions of similar distributions of heads and tails

Each Cluster is a state in the initial HMM model

State transition and emission probabilities are estimated from the clusters
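A sketch of that estimation, treating each position's cluster label as its state and using count-based maximum-likelihood estimates (not necessarily the slides' exact procedure):

```python
import numpy as np

def estimate_hmm(labels, seq, n_states, symbols="HT"):
    """Count-based estimates of transition and emission probabilities
    from a per-position state (cluster) labeling of the sequence."""
    A = np.zeros((n_states, n_states))       # transition counts
    B = np.zeros((n_states, len(symbols)))   # emission counts
    for i, c in enumerate(seq):
        B[labels[i], symbols.index(c)] += 1
        if i + 1 < len(seq):
            A[labels[i], labels[i + 1]] += 1
    # Normalize each row into a probability distribution
    # (rows with no counts are left as zeros).
    A /= np.maximum(A.sum(axis=1, keepdims=True), 1.0)
    B /= np.maximum(B.sum(axis=1, keepdims=True), 1.0)
    return A, B
```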

References

MacQueen, J. 1967. Some methods for classification and analysis of multivariate observations. Pp. 281-297 in: L. M. Le Cam & J. Neyman (eds.), Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Vol. 1. University of California Press, Berkeley.

Jain, A. K., Murty, M. N., and Flynn, P. J. 1999. Data clustering: a review. ACM Computing Surveys, Vol. 31, No. 3, September 1999.

http://www.gene-chips.com/ by Leming Shi, Ph.D.
