Clustering Part 1
TRANSCRIPT
![Page 1: Clustering part 1](https://reader036.vdocuments.mx/reader036/viewer/2022062503/58ecae0b1a28ab8b058b4623/html5/thumbnails/1.jpg)
Clustering Part 1
Abdul Kawsar Tushar
Nadeem Ahmed
CSE, UAP
![Page 2: Clustering part 1](https://reader036.vdocuments.mx/reader036/viewer/2022062503/58ecae0b1a28ab8b058b4623/html5/thumbnails/2.jpg)
What is Clustering?
• Visualization of data
• Hypothesis generation
![Page 3: Clustering part 1](https://reader036.vdocuments.mx/reader036/viewer/2022062503/58ecae0b1a28ab8b058b4623/html5/thumbnails/3.jpg)
Overview of Clustering
• Feature Selection
• Feature Extraction: transformations of the input features to produce new salient features.
• Inter-pattern Similarity
• Grouping
![Page 4: Clustering part 1](https://reader036.vdocuments.mx/reader036/viewer/2022062503/58ecae0b1a28ab8b058b4623/html5/thumbnails/4.jpg)
Formal Definition
• Clustering is the classification of objects into different groups or, more precisely, the partitioning of a data set into subsets (clusters), so that the data in each subset (ideally) share some common trait, often according to some defined distance measure.
![Page 5: Clustering part 1](https://reader036.vdocuments.mx/reader036/viewer/2022062503/58ecae0b1a28ab8b058b4623/html5/thumbnails/5.jpg)
Notion of a Cluster can be Ambiguous
How many clusters? The slide shows the same set of points grouped three ways: Two Clusters, Four Clusters, and Six Clusters.
![Page 6: Clustering part 1](https://reader036.vdocuments.mx/reader036/viewer/2022062503/58ecae0b1a28ab8b058b4623/html5/thumbnails/6.jpg)
Hierarchical Clustering: Example
![Page 7: Clustering part 1](https://reader036.vdocuments.mx/reader036/viewer/2022062503/58ecae0b1a28ab8b058b4623/html5/thumbnails/7.jpg)
Hierarchical Clustering: Example Using Single Linkage
![Page 8: Clustering part 1](https://reader036.vdocuments.mx/reader036/viewer/2022062503/58ecae0b1a28ab8b058b4623/html5/thumbnails/8.jpg)
Hierarchical Clustering: Forming Clusters
• Forming clusters from dendrograms
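Forming clusters from a dendrogram amounts to stopping the merge sequence once the desired number of clusters remains. As a minimal sketch (the five points and the cluster count below are hypothetical, not from the slides), bottom-up agglomerative clustering with single linkage can be written as:

```python
def single_link_merge(points, k):
    """Agglomerative clustering sketch: start with singleton clusters and
    repeatedly merge the two clusters whose closest members are nearest
    (single linkage) until only k clusters remain."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # single linkage: squared distance of the closest cross pair
                d = min((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] += clusters.pop(j)  # merge the two closest clusters
    return clusters

# Hypothetical data: a tight group of three points and a tight pair far away.
pts = [(0.0, 0.0), (0.2, 0.0), (0.1, 0.1), (5.0, 5.0), (5.2, 5.1)]
print(single_link_merge(pts, 2))
```

Each merge records one level of the dendrogram; cutting at k = 2 recovers the two natural groups.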
![Page 9: Clustering part 1](https://reader036.vdocuments.mx/reader036/viewer/2022062503/58ecae0b1a28ab8b058b4623/html5/thumbnails/9.jpg)
Hierarchical Clustering
• Advantages
  • Dendrograms are great for visualization
  • Provides hierarchical relations between clusters
  • Shown to be able to capture concentric clusters
• Disadvantages
  • Not easy to define cut levels for clusters
  • Experiments have shown that other clustering techniques outperform hierarchical clustering
![Page 10: Clustering part 1](https://reader036.vdocuments.mx/reader036/viewer/2022062503/58ecae0b1a28ab8b058b4623/html5/thumbnails/10.jpg)
How to Define Inter-Cluster Similarity
Three common linkage choices: Single Link, Complete Link, and Average Link.
![Page 11: Clustering part 1](https://reader036.vdocuments.mx/reader036/viewer/2022062503/58ecae0b1a28ab8b058b4623/html5/thumbnails/11.jpg)
![Page 12: Clustering part 1](https://reader036.vdocuments.mx/reader036/viewer/2022062503/58ecae0b1a28ab8b058b4623/html5/thumbnails/12.jpg)
![Page 13: Clustering part 1](https://reader036.vdocuments.mx/reader036/viewer/2022062503/58ecae0b1a28ab8b058b4623/html5/thumbnails/13.jpg)
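The three linkage criteria differ only in which pairwise distance they report between two clusters: the closest pair, the farthest pair, or the mean over all pairs. A small sketch (the two example clusters are hypothetical):

```python
from itertools import product

def pairwise_dists(a, b):
    """All Euclidean distances between points of cluster a and cluster b."""
    return [((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5
            for p, q in product(a, b)]

def single_link(a, b):      # distance between the closest pair
    return min(pairwise_dists(a, b))

def complete_link(a, b):    # distance between the farthest pair
    return max(pairwise_dists(a, b))

def average_link(a, b):     # mean distance over all cross-cluster pairs
    d = pairwise_dists(a, b)
    return sum(d) / len(d)

c1 = [(0.0, 0.0), (0.0, 1.0)]
c2 = [(3.0, 0.0), (4.0, 0.0)]
print(single_link(c1, c2))    # 3.0
print(complete_link(c1, c2))  # ~4.123
print(average_link(c1, c2))   # ~3.571
```

Single link tends to chain clusters together, complete link favors compact clusters, and average link sits between the two.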
![Page 14: Clustering part 1](https://reader036.vdocuments.mx/reader036/viewer/2022062503/58ecae0b1a28ab8b058b4623/html5/thumbnails/14.jpg)
Common Similarity Measures
• The distance measure determines how the similarity of two elements is calculated, and it influences the shape of the clusters. Common measures include:
1. The Euclidean distance (also called 2-norm distance), given by: $d(p, q) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2}$
2. The Manhattan distance (also called taxicab norm or 1-norm), given by: $d(p, q) = \sum_{i=1}^{n} |p_i - q_i|$
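Both measures can be sketched in a few lines (the example points are hypothetical):

```python
def euclidean(p, q):
    # 2-norm: square root of the summed squared coordinate differences
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

def manhattan(p, q):
    # 1-norm: summed absolute coordinate differences
    return sum(abs(a - b) for a, b in zip(p, q))

print(euclidean((1.0, 1.0), (4.0, 5.0)))  # 5.0
print(manhattan((1.0, 1.0), (4.0, 5.0)))  # 7.0
```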
![Page 15: Clustering part 1](https://reader036.vdocuments.mx/reader036/viewer/2022062503/58ecae0b1a28ab8b058b4623/html5/thumbnails/15.jpg)
A simple example showing the operation of the k-means algorithm (using k = 2)
![Page 16: Clustering part 1](https://reader036.vdocuments.mx/reader036/viewer/2022062503/58ecae0b1a28ab8b058b4623/html5/thumbnails/16.jpg)
Step 1: Initialization
• We randomly choose the following two centroids (k = 2) for the two clusters.
• In this case the two centroids are m1 = (1.0, 1.0) and m2 = (5.0, 7.0).
![Page 17: Clustering part 1](https://reader036.vdocuments.mx/reader036/viewer/2022062503/58ecae0b1a28ab8b058b4623/html5/thumbnails/17.jpg)
Step 2:
• Thus, we obtain two clusters containing {1, 2, 3} and {4, 5, 6, 7}.
• Their new centroids are:
![Page 18: Clustering part 1](https://reader036.vdocuments.mx/reader036/viewer/2022062503/58ecae0b1a28ab8b058b4623/html5/thumbnails/18.jpg)
Step 3:
• Now, using these centroids, we compute the Euclidean distance of each object, as shown in the table.
• Therefore, the new clusters are {1, 2} and {3, 4, 5, 6, 7}.
• The next centroids are m1 = (1.25, 1.5) and m2 = (3.9, 5.1).
![Page 19: Clustering part 1](https://reader036.vdocuments.mx/reader036/viewer/2022062503/58ecae0b1a28ab8b058b4623/html5/thumbnails/19.jpg)
Step 4:
• The clusters obtained are {1, 2} and {3, 4, 5, 6, 7}.
• Therefore, there is no change in the clusters.
• Thus, the algorithm halts here, and the final result consists of the two clusters {1, 2} and {3, 4, 5, 6, 7}.
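The coordinates of the seven objects appear only in the slide images, not in the transcript text, so the point set below is an assumption; it is chosen to be consistent with every centroid reported above (starting at m1 = (1.0, 1.0) and m2 = (5.0, 7.0), ending at m1 = (1.25, 1.5) and m2 = (3.9, 5.1)). A minimal k-means sketch of the steps above:

```python
def kmeans(points, centroids, iters=100):
    """Plain k-means: assign each point to its nearest centroid, then
    recompute each centroid as the mean of its assigned points."""
    for _ in range(iters):
        clusters = [[] for _ in centroids]
        for p in points:
            d = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centroids]
            clusters[d.index(min(d))].append(p)  # ties go to the first centroid
        new = [(sum(p[0] for p in c) / len(c), sum(p[1] for p in c) / len(c))
               for c in clusters]
        if new == centroids:  # assignments stable -> converged
            break
        centroids = new
    return centroids, clusters

# Assumed coordinates for objects 1..7 (not given in the transcript text).
points = [(1.0, 1.0), (1.5, 2.0), (3.0, 4.0), (5.0, 7.0),
          (3.5, 5.0), (4.5, 5.0), (3.5, 4.5)]
centroids, clusters = kmeans(points, [(1.0, 1.0), (5.0, 7.0)])
print(centroids)  # [(1.25, 1.5), (3.9, 5.1)]
```

Breaking distance ties in favor of the first centroid reproduces the slide's Step 2 assignment of object 3 (equidistant from both initial centroids) to cluster {1, 2, 3}.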
![Page 20: Clustering part 1](https://reader036.vdocuments.mx/reader036/viewer/2022062503/58ecae0b1a28ab8b058b4623/html5/thumbnails/20.jpg)
PLOT
![Page 21: Clustering part 1](https://reader036.vdocuments.mx/reader036/viewer/2022062503/58ecae0b1a28ab8b058b4623/html5/thumbnails/21.jpg)
(with k = 3)
Panels: Step 1 and Step 2
![Page 22: Clustering part 1](https://reader036.vdocuments.mx/reader036/viewer/2022062503/58ecae0b1a28ab8b058b4623/html5/thumbnails/22.jpg)
PLOT
![Page 23: Clustering part 1](https://reader036.vdocuments.mx/reader036/viewer/2022062503/58ecae0b1a28ab8b058b4623/html5/thumbnails/23.jpg)
Two Different K-means Clusterings
[Three panels on the same axes (x, y): Original Points, Sub-optimal Clustering, and Optimal Clustering]
![Page 24: Clustering part 1](https://reader036.vdocuments.mx/reader036/viewer/2022062503/58ecae0b1a28ab8b058b4623/html5/thumbnails/24.jpg)
Importance of Choosing Initial Centroids
[Six panels (x, y) showing Iterations 1 through 6 of k-means for one choice of initial centroids]
![Page 25: Clustering part 1](https://reader036.vdocuments.mx/reader036/viewer/2022062503/58ecae0b1a28ab8b058b4623/html5/thumbnails/25.jpg)
Importance of Choosing Initial Centroids
[Five panels (x, y) showing Iterations 1 through 5 of k-means for a different choice of initial centroids]
![Page 26: Clustering part 1](https://reader036.vdocuments.mx/reader036/viewer/2022062503/58ecae0b1a28ab8b058b4623/html5/thumbnails/26.jpg)
Clustering Non-clustered Data
![Page 27: Clustering part 1](https://reader036.vdocuments.mx/reader036/viewer/2022062503/58ecae0b1a28ab8b058b4623/html5/thumbnails/27.jpg)
Getting Stuck In A Local Minimum
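The slide illustrates this with plots only. As a sketch, the effect can be reproduced on a tiny hypothetical data set (the four points and both initializations below are assumptions, not from the slides): one initialization converges to the optimal top/bottom split, while the other converges immediately to a much worse left/right split.

```python
def kmeans(points, centroids, iters=100):
    # Plain k-means: nearest-centroid assignment, then centroid update.
    for _ in range(iters):
        clusters = [[] for _ in centroids]
        for p in points:
            d = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centroids]
            clusters[d.index(min(d))].append(p)
        new = [(sum(p[0] for p in c) / len(c), sum(p[1] for p in c) / len(c))
               for c in clusters]
        if new == centroids:
            break
        centroids = new
    return centroids, clusters

def sse(centroids, clusters):
    # Total within-cluster sum of squared distances (the k-means objective).
    return sum((p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2
               for c, cl in zip(centroids, clusters) for p in cl)

# Hypothetical data: two tight pairs, one at y=0 and one at y=10.
pts = [(-1.0, 0.0), (1.0, 0.0), (-1.0, 10.0), (1.0, 10.0)]
good = kmeans(pts, [(0.0, 1.0), (0.0, 9.0)])   # finds the top/bottom split
bad = kmeans(pts, [(-1.0, 5.0), (1.0, 5.0)])   # stuck on a left/right split
print(sse(*good), sse(*bad))  # 4.0 100.0
```

The bad initialization is a fixed point of the update rule, so k-means stops there even though the objective is 25 times worse; this is why k-means is typically restarted from several random initializations.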
![Page 28: Clustering part 1](https://reader036.vdocuments.mx/reader036/viewer/2022062503/58ecae0b1a28ab8b058b4623/html5/thumbnails/28.jpg)
Can k-means Handle Non-spherical Clusters?
![Page 29: Clustering part 1](https://reader036.vdocuments.mx/reader036/viewer/2022062503/58ecae0b1a28ab8b058b4623/html5/thumbnails/29.jpg)
…Maybe not.
![Page 30: Clustering part 1](https://reader036.vdocuments.mx/reader036/viewer/2022062503/58ecae0b1a28ab8b058b4623/html5/thumbnails/30.jpg)
Let’s Try Single Linkage Hierarchical Clustering
![Page 31: Clustering part 1](https://reader036.vdocuments.mx/reader036/viewer/2022062503/58ecae0b1a28ab8b058b4623/html5/thumbnails/31.jpg)
K-means with Polar Coordinates
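The slides show this idea with plots. A minimal sketch on hypothetical data: two concentric rings cannot be separated by k-means in Cartesian coordinates, but after converting to polar coordinates their radii form two tight one-dimensional groups that 1-D k-means separates easily.

```python
import math

def kmeans_1d(values, centroids, iters=100):
    # One-dimensional k-means on a list of scalars.
    for _ in range(iters):
        clusters = [[] for _ in centroids]
        for v in values:
            d = [abs(v - c) for c in centroids]
            clusters[d.index(min(d))].append(v)
        new = [sum(c) / len(c) for c in clusters]
        if new == centroids:
            break
        centroids = new
    return centroids, clusters

# Hypothetical data: two concentric rings of 12 points around the origin.
angles = [2 * math.pi * k / 12 for k in range(12)]
inner = [(math.cos(t), math.sin(t)) for t in angles]            # radius 1
outer = [(5 * math.cos(t), 5 * math.sin(t)) for t in angles]    # radius 5

# The polar radius r = sqrt(x^2 + y^2) ignores the angle entirely,
# so clustering on r alone recovers the two rings.
radii = [math.hypot(x, y) for x, y in inner + outer]
centroids, clusters = kmeans_1d(radii, [2.0, 4.0])
print(len(clusters[0]), len(clusters[1]))  # 12 12
```

Only the radius coordinate is clustered here; the angle carries no information about ring membership, which is exactly why the Cartesian version fails.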
![Page 32: Clustering part 1](https://reader036.vdocuments.mx/reader036/viewer/2022062503/58ecae0b1a28ab8b058b4623/html5/thumbnails/32.jpg)