Clustering Part 1
Abdul Kawsar Tushar, Nadeem Ahmed
CSE, UAP

Uploaded to the Education category on 11-Apr-2017

TRANSCRIPT

Page 1: Clustering part 1

Clustering Part 1
Abdul Kawsar Tushar
Nadeem Ahmed
CSE, UAP

Page 2: Clustering part 1

What is Clustering

• Visualization of data
• Hypothesis generation

Page 3: Clustering part 1

Overview of Clustering

• Feature Selection
• Feature Extraction: transformations of the input features to produce new salient features
• Inter-pattern Similarity
• Grouping

Page 4: Clustering part 1

Formal Definition

• Clustering is the classification of objects into different groups, or more precisely, the partitioning of a data set into subsets (clusters), so that the data in each subset (ideally) share some common trait, often according to some defined distance measure.

Page 5: Clustering part 1

Notion of a Cluster can be Ambiguous

How many clusters? The same points can plausibly be grouped in several ways.

[Panels: Two Clusters, Four Clusters, Six Clusters]

Page 6: Clustering part 1

Hierarchical Clustering: Example

Page 7: Clustering part 1

Hierarchical Clustering: Example Using Single Linkage

Page 8: Clustering part 1

Hierarchical Clustering: Forming Clusters

• Forming clusters from dendrograms

Page 9: Clustering part 1

Hierarchical Clustering

• Advantages
  • Dendrograms are great for visualization
  • Provides hierarchical relations between clusters
  • Shown to be able to capture concentric clusters
• Disadvantages
  • Not easy to define levels for clusters
  • Experiments have shown that other clustering techniques outperform hierarchical clustering
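The single-linkage procedure from the earlier slides can be sketched in plain Python. This is a minimal illustration, not the slides' own code; the five points are hypothetical, chosen so the merge order is easy to verify. The sequence of merge distances is exactly what a dendrogram records.

```python
import math

# Hypothetical points chosen so the merge order is easy to verify.
points = [(0.0, 0.0), (0.5, 0.0), (4.0, 0.0), (4.2, 0.0), (9.0, 0.0)]

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

# Start with every point in its own cluster, then repeatedly merge the
# two clusters whose closest members are nearest (single link).
clusters = [[p] for p in points]
merges = []                                   # merge distances, in order
while len(clusters) > 1:
    best = None
    for i in range(len(clusters)):
        for j in range(i + 1, len(clusters)):
            d = min(dist(p, q) for p in clusters[i] for q in clusters[j])
            if best is None or d < best[0]:
                best = (d, i, j)
    d, i, j = best
    merges.append(d)
    merged = clusters[i] + clusters[j]
    del clusters[j], clusters[i]              # j > i, so delete j first
    clusters.append(merged)
```

Cutting the resulting dendrogram at a chosen height yields the flat clusters; as the slide notes, picking that level is the hard part.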

Page 10: Clustering part 1

How to Define Inter-Cluster Similarity

Similarity between two clusters can be measured in several ways:

• Single Link: the distance between the closest pair of points, one from each cluster
• Complete Link: the distance between the farthest pair of points
• Average Link: the mean distance over all pairs of points across the two clusters
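A sketch of the three linkage criteria, using two small hypothetical clusters A and B:

```python
import math

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

# Two hypothetical clusters, used only to illustrate the three criteria.
A = [(0.0, 0.0), (1.0, 0.0)]
B = [(4.0, 0.0), (6.0, 0.0)]

# All pairwise distances between the clusters.
pairs = [dist(p, q) for p in A for q in B]

single_link   = min(pairs)               # closest pair of points
complete_link = max(pairs)               # farthest pair of points
average_link  = sum(pairs) / len(pairs)  # mean over all pairs
```

Single link tends to produce elongated, chain-like clusters, while complete link favors compact ones; average link is a compromise.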


Page 14: Clustering part 1

Common Similarity Measures

• The distance measure determines how the similarity of two elements is calculated, and it influences the shape of the clusters. Common choices include:

1. The Euclidean distance (also called 2-norm distance):
   d(p, q) = sqrt( (p1 - q1)^2 + ... + (pn - qn)^2 )

2. The Manhattan distance (also called taxicab norm or 1-norm):
   d(p, q) = |p1 - q1| + ... + |pn - qn|
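Both measures are straightforward to compute; a small sketch with two hypothetical 2-D points:

```python
import math

# Two hypothetical 2-D points used only for illustration.
p, q = (1.0, 1.0), (4.0, 5.0)

# Euclidean (2-norm): square root of the summed squared differences.
euclidean = math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# Manhattan (1-norm / taxicab): sum of the absolute differences.
manhattan = sum(abs(a - b) for a, b in zip(p, q))
```

For the differences (3, 4) this gives a Euclidean distance of 5.0 and a Manhattan distance of 7.0.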

Page 15: Clustering part 1

A simple example showing the implementation of the k-means algorithm (using K = 2)

Page 16: Clustering part 1

Step 1: Initialization

Randomly choose two initial centroids (K = 2) for the two clusters. In this case the two centroids are m1 = (1.0, 1.0) and m2 = (5.0, 7.0).

Page 17: Clustering part 1

Step 2:

• Assigning each object to its nearest centroid, we obtain two clusters: {1, 2, 3} and {4, 5, 6, 7}.
• Their new centroids are recomputed as the means of the clusters (shown in the slide).

Page 18: Clustering part 1

Step 3:

• Using these centroids, we compute the Euclidean distance of each object, as shown in the table.
• The new clusters are {1, 2} and {3, 4, 5, 6, 7}.
• The next centroids are m1 = (1.25, 1.5) and m2 = (3.9, 5.1).

Page 19: Clustering part 1

Step 4:

• The clusters obtained are {1, 2} and {3, 4, 5, 6, 7}.
• Since there is no change in the clusters, the algorithm halts; the final result consists of the two clusters {1, 2} and {3, 4, 5, 6, 7}.
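The seven data points themselves are not printed in the transcript. As an assumption, the sketch below uses the classic seven-point data set that reproduces exactly the centroids the slides report, m1 = (1.25, 1.5) and m2 = (3.9, 5.1):

```python
import math

# Assumed data: not printed in the transcript, but chosen because it
# reproduces the centroids the slides report.
points = [(1.0, 1.0), (1.5, 2.0), (3.0, 4.0), (5.0, 7.0),
          (3.5, 5.0), (4.5, 5.0), (3.5, 4.5)]

def euclidean(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def kmeans(points, centroids, max_iter=100):
    for _ in range(max_iter):
        # Assignment step: each point joins the nearest centroid's cluster.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: euclidean(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid becomes the mean of its cluster.
        new_centroids = [(sum(x for x, _ in c) / len(c),
                          sum(y for _, y in c) / len(c)) for c in clusters]
        if new_centroids == centroids:   # no change: converged
            break
        centroids = new_centroids
    return centroids, clusters

# Same initialization as Step 1: m1 = (1.0, 1.0), m2 = (5.0, 7.0).
centroids, clusters = kmeans(points, [(1.0, 1.0), (5.0, 7.0)])
```

Run with the slide's initialization, this passes through the intermediate clusters {1, 2, 3} and {4, 5, 6, 7} and halts at {1, 2} and {3, 4, 5, 6, 7} with centroids (1.25, 1.5) and (3.9, 5.1), matching Steps 2 through 4.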

Page 20: Clustering part 1

PLOT

[Scatter plot of the resulting clusters]

Page 21: Clustering part 1

The same example with K = 3.

[Plots showing Step 1 and Step 2]

Page 22: Clustering part 1

PLOT

[Scatter plot of the resulting clusters]

Page 23: Clustering part 1

Two different K-means Clustering

[Three scatter plots over x ∈ [-2, 2], y ∈ [0, 3]: Original Points, Optimal Clustering, and Sub-optimal Clustering]

Page 24: Clustering part 1

Importance of Choosing Initial Centroids

[Six scatter plots over x ∈ [-2, 2], y ∈ [0, 3] showing the cluster assignments at Iterations 1 through 6]

Page 25: Clustering part 1

Importance of Choosing Initial Centroids

[Five scatter plots over x ∈ [-2, 2], y ∈ [0, 3] showing the cluster assignments at Iterations 1 through 5]

Page 26: Clustering part 1

Clustering Non-clustered Data

Page 27: Clustering part 1

Getting Stuck In A Local Minimum
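The effect can be reproduced deterministically. The sketch below uses four hypothetical points forming two natural left/right groups and two hand-picked initializations, comparing the sum of squared errors (SSE) at convergence:

```python
import math

# Four hypothetical points: two natural groups, left and right.
points = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0)]

def kmeans(points, centroids, iters=50):
    # Minimal k-means; assumes no cluster ever becomes empty.
    for _ in range(iters):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        centroids = [(sum(x for x, _ in c) / len(c),
                      sum(y for _, y in c) / len(c)) for c in clusters]
    # SSE: total squared distance of points to their assigned centroids.
    sse = sum(math.dist(p, centroids[i]) ** 2
              for i, c in enumerate(clusters) for p in c)
    return centroids, sse

# A good initialization separates the two natural groups...
_, sse_good = kmeans(points, [(0.0, 0.5), (4.0, 0.5)])
# ...while a poor one splits top from bottom and stays stuck there:
# every point is already nearest its centroid, so nothing changes.
_, sse_bad = kmeans(points, [(2.0, 0.0), (2.0, 1.0)])
```

Both runs converge, but to different local minima; this is why k-means is usually run several times from different random initializations and the lowest-SSE result kept.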

Page 28: Clustering part 1

Can k-means Handle Non-spherical Clusters?

Page 29: Clustering part 1

…Maybe not.

Page 30: Clustering part 1

Let’s Try Single Linkage Hierarchical Clustering

Page 31: Clustering part 1

K-means with Polar Coordinates
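The slides do not include code for this, but the idea can be sketched as follows: for concentric clusters, converting each point to polar coordinates makes the radius alone separate the rings, so even 1-D k-means on r with K = 2 succeeds. The ring data below is hypothetical:

```python
import math

# Hypothetical concentric-ring data: plain k-means in (x, y) cannot
# separate the rings, but in polar coordinates the radius alone can.
def ring(radius, n):
    return [(radius * math.cos(2 * math.pi * k / n),
             radius * math.sin(2 * math.pi * k / n)) for k in range(n)]

points = ring(1.0, 12) + ring(5.0, 12)

# Cartesian-to-polar transform; only the radius r is needed here.
radii = [math.hypot(x, y) for x, y in points]

# 1-D k-means on r with K = 2 and a simple deterministic start.
centroids = [min(radii), max(radii)]
for _ in range(20):
    clusters = [[], []]
    for r in radii:
        idx = 0 if abs(r - centroids[0]) <= abs(r - centroids[1]) else 1
        clusters[idx].append(r)
    centroids = [sum(c) / len(c) for c in clusters]
```

The centroids settle at the two ring radii, recovering exactly the concentric clusters that k-means in Cartesian coordinates could not.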

Page 32: Clustering part 1