Machine Learning for Data Mining
Unsupervised Learning - Clustering

Andres Mendez-Vazquez
July 22, 2015



Outline

1 Supervised Learning vs. Unsupervised Learning
  - Supervised vs Unsupervised
  - Clustering
  - Pattern Recognition
  - Why Clustering?

2 K-Means Clustering
  - K-Means Clustering
  - Convergence Criterion
  - The Distance Function
  - Example
  - Properties of K-Means

3 The K-Center Criterion Clustering
  - Introduction
  - Comparison with K-means
  - The Greedy K-Center Algorithm
  - Pseudo-Code
  - K-Center Algorithm Properties
  - K-Center Algorithm Proof of Correctness


Supervised Learning vs. Unsupervised Learning

Supervised learning: Discover patterns in the data that relate data attributes to a target (class) attribute.

Unsupervised learning: The data have no target attribute.



Clustering

Clustering is a technique for finding similarity groups in data, called clusters.

It is called an unsupervised learning task because no class values denoting an a priori grouping of the data instances are given, as is the case in supervised learning.

Due to historical reasons, clustering is often considered synonymous with unsupervised learning. In fact, association rule mining is also unsupervised.


Pattern Recognition

Definition: Search for structure in data.

Elements of Numerical Pattern Recognition:
1 Process Description: Feature Nomination, Test Data, Design Data
2 Feature Analysis: Preprocessing, Extraction, Selection, ...
3 Cluster Analysis: Labeling, Validity, ...
4 Classifier Design: Classification, Estimation, Prediction, Control, ...


An illustration

The data set has three natural groups of data points, i.e., 3 natural clusters.


Examples

Example 1: Group people of similar sizes together to make “small”, “medium”, and “large” T-shirts.

Example 2: In marketing, segment customers according to their similarities.


How do we create these classes?

For this, we use the following concept: Clustering!!!

Basically, we want to “reveal” the organization of patterns into “sensible” clusters (groups).

Actually, clustering is one of the most primitive mental activities of humans, used to handle the huge amount of information they receive every day.


What is clustering for?

Data Reduction: We can use clustering to reduce the number of samples in each group and thus the amount of information to process.

Example:
- In data transmission, a representative for each cluster is defined.
- Then, instead of transmitting the data samples, we transmit a code number corresponding to the representative.
- Thus, data compression is achieved.

Hypothesis Generation: We apply cluster analysis to a data set in order to infer hypotheses concerning the nature of the data.


What is clustering for?

Hypothesis Testing: In this context, cluster analysis is used to verify the validity of a specific hypothesis.

Example: “Big companies invest abroad.”

Prediction based on groups:
- First: We apply cluster analysis to the available data set.
- Second: The resulting clusters are characterized based on the characteristics of the patterns by which they are formed.
- Third: Given an unknown pattern, we can determine the cluster to which it most likely belongs.


Aspects of clustering

A clustering algorithm (there are many!!!):
- Partitional clustering.
- Hierarchical clustering.
- etc.

Based on a function: a distance (similarity, or dissimilarity) function.

Clustering quality:
- Inter-cluster distance → maximized.
- Intra-cluster distance → minimized.


K-means clustering

K-means is a partitional clustering algorithm.

Definition: Let the set of data points (or instances) be $D = \{x_1, \dots, x_n\}$, where $x_i = (x_{i1}, \dots, x_{ir})^T$:
- The K-means algorithm partitions the given data into K clusters.
- Each cluster has a cluster center, called the centroid.
- K is specified by the user.


K-means algorithm

The K-means algorithm works as follows. Given K as the desired number of clusters:

1 Randomly choose K data points (seeds) to be the initial centroids (cluster centers) $\{v_1, \dots, v_K\}$.
2 Assign each data point to the closest centroid:
$$c_i = \arg\min_j \, \mathrm{dist}(x_i, v_j)$$
3 Re-compute the centroids using the current cluster memberships:
$$v_j = \frac{\sum_{i=1}^{n} I(c_i = j)\, x_i}{\sum_{i=1}^{n} I(c_i = j)}$$
where $I(\cdot)$ is the indicator function.
4 If a convergence criterion is not met, go to 2.
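To make these four steps concrete, here is a minimal NumPy sketch (an illustrative implementation, not the author's code; the names X, K, max_iter, and seed are assumptions):

import numpy as np

def k_means(X, K, max_iter=100, seed=0):
    """Minimal K-means sketch; X is an (n, r) data matrix."""
    rng = np.random.default_rng(seed)
    # Step 1: randomly choose K data points as the initial centroids.
    centroids = X[rng.choice(len(X), size=K, replace=False)]
    for _ in range(max_iter):
        # Step 2: assign each point to its closest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: re-compute each centroid as the mean of its members.
        centroids_new = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(K)
        ])
        # Step 4: stop once the centroids no longer move.
        if np.allclose(centroids_new, centroids):
            break
        centroids = centroids_new
    return labels, centroids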


What is the code trying to do?

K-means tries to find a partition S that minimizes:

$$\min_S \sum_{k=1}^{K} \sum_{i:\, x_i \in C_k} (x_i - \mu_k)^T (x_i - \mu_k) \quad (1)$$

where $\mu_k$ is the centroid of cluster $C_k$:

$$\mu_k = \frac{1}{N_k} \sum_{i:\, x_i \in C_k} x_i \quad (2)$$

where $N_k$ is the number of samples in cluster $C_k$.
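Equations (1)-(2) can be evaluated directly for any given partition; a small sketch (a hypothetical helper, reusing the X, labels, and K names from the sketch above):

import numpy as np

def kmeans_objective(X, labels, K):
    """Evaluate equation (1): the total within-cluster squared scatter."""
    total = 0.0
    for k in range(K):
        members = X[labels == k]
        if len(members) == 0:
            continue
        mu_k = members.mean(axis=0)      # equation (2)
        diff = members - mu_k
        total += np.sum(diff * diff)     # sum of (x_i - mu_k)^T (x_i - mu_k)
    return total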


What stopping/convergence criterion should we use?

First: No (or minimal) re-assignment of data points to different clusters.

Second: No (or minimal) change of the centroids.

Third: Minimal decrease in the sum of squared errors (SSE),

$$SSE = \sum_{k=1}^{K} \sum_{x \in C_k} \mathrm{dist}(x, v_k)^2$$

where $C_k$ is cluster k and $v_k$ is the centroid of cluster $C_k$.


The distance function

Actually, we have the following distance functions:

Euclidean:
$$\mathrm{dist}(x, y) = \|x - y\| = \sqrt{(x - y)^T (x - y)}$$

Manhattan:
$$\mathrm{dist}(x, y) = \|x - y\|_1 = \sum_{i=1}^{n} |x_i - y_i|$$

Mahalanobis:
$$\mathrm{dist}(x, y) = \|x - y\|_A = \sqrt{(x - y)^T A (x - y)}$$
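Each of these is a one-liner in NumPy; a small illustrative sketch (for the Mahalanobis case, A is assumed to be a positive-definite matrix, e.g. an inverse covariance matrix):

import numpy as np

def euclidean(x, y):
    d = x - y
    return np.sqrt(d @ d)

def manhattan(x, y):
    return np.sum(np.abs(x - y))

def mahalanobis(x, y, A):
    # A is assumed positive definite, e.g. the inverse covariance matrix.
    d = x - y
    return np.sqrt(d @ A @ d)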


An example

The following sequence of figures (omitted in this transcript) illustrates K-means step by step:

1 Drop two possible centroids.
2 Calculate the memberships.
3 Re-calculate the centroids.
4 Re-calculate the memberships.
5 Re-calculate the centroids and keep going until convergence.


Strengths of K-means

Strengths:
- Simple: easy to understand and to implement.
- Efficient: the time complexity is O(tKN), where N is the number of data points, K is the number of clusters, and t is the number of iterations. Since both K and t are typically small, K-means is considered a linear-time algorithm.

Popularity: K-means is the most popular clustering algorithm.

Note that it terminates at a local optimum when SSE is used; the global optimum is hard to find due to the complexity of the problem.


Weaknesses of K-means

Important: The algorithm is only applicable when the mean is defined. For categorical data, K-modes can be used instead: the centroid is represented by the most frequent values.

In addition: The user needs to specify K.

Outliers: The algorithm is sensitive to outliers.
- Outliers are data points that are very far away from other data points.
- Outliers could be errors in the data recording, or special data points with very different values.


Weaknesses of K-means: Problems with outliers

A single outlier can pull a centroid away from the natural group, producing undesirable clusters instead of the ideal ones (figure omitted: "Undesirable Clusters" vs. "Ideal Clusters").


Weaknesses of K-means: How to deal with outliers

One method is to remove, during the clustering process, data points that are much farther away from the centroids than other data points. To be safe, we may want to monitor these possible outliers over a few iterations before deciding to remove them.

Another method is to perform random sampling:
- Since sampling chooses only a small subset of the data points, the chance of selecting an outlier is very small.
- The remaining data points are then assigned to the clusters by distance or similarity comparison, or by classification.
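A minimal sketch of the first method, assuming the X, labels, and centroids produced by the K-means sketch above; the cutoff rule (a factor times the median distance) is a hypothetical choice, not part of the original slides:

import numpy as np

def flag_outliers(X, labels, centroids, factor=3.0):
    """Flag points that are much farther from their centroid than the rest."""
    d = np.linalg.norm(X - centroids[labels], axis=1)
    cutoff = factor * np.median(d)   # hypothetical rule of thumb
    return d > cutoff                # boolean mask of candidate outliers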


Weaknesses of K-means (cont.)

The algorithm is sensitive to the initial seeds: different initial centroids can lead to different final clusters (figure omitted: "Initial Centroids" vs. "Final Clusters").

Weaknesses of K-means: Different Densities

Even though the data contain three natural clusters, K-means groups the points poorly when the clusters have different densities (figure omitted: "Original Points" vs. "K-Means (3 Clusters)").

Weaknesses of K-means: Non-globular Shapes

Here, we notice that K-means can only detect globular shapes (figure omitted).


We change the Criterion

Instead of using the K-means criterion, we use the K-center criterion.

New criterion: The K-center criterion partitions the points into K clusters so as to minimize the maximum distance of any point to its cluster center.


Explanation

Setup: Suppose we have a data set A that contains N objects.

We want to partition A into K sets labeled $C_1, C_2, \dots, C_K$.

Now define the cluster size of any cluster $C_k$ as the smallest value D for which all points in $C_k$ are:
- within distance D of each other, or
- within distance D/2 of some point called the cluster center.


Another Way

Another way to define the cluster size: D is the maximum pairwise distance between any pair of points in the cluster, or twice the maximum distance between a data point and a chosen centroid.

Thus, denoting the cluster size of $C_k$ by $D_k$, the cluster size of a partition S (the way the points are grouped) is:

$$D = \max_{k=1,\dots,K} D_k \quad (3)$$

In other words, the cluster size of a partition composed of multiple clusters is the maximum size of these clusters.
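Under the pairwise definition, equation (3) can be computed directly; a small illustrative sketch (X, labels, and K are assumed names, as in the earlier sketches):

import numpy as np

def partition_size(X, labels, K):
    """Equation (3): the partition size D is the largest cluster size D_k,
    where D_k is the maximum pairwise distance inside cluster k."""
    D = 0.0
    for k in range(K):
        members = X[labels == k]
        if len(members) < 2:
            continue
        diffs = members[:, None, :] - members[None, :, :]
        D_k = np.linalg.norm(diffs, axis=2).max()   # max pairwise distance
        D = max(D, D_k)
    return D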


Comparison with K-means

In order to compare these methods, we use the Euclidean distance.

In K-means we take the distance between vectors to be the squared Euclidean distance, so K-means tries to find a partition S that minimizes:

$$\min_S \sum_{k=1}^{K} \sum_{i:\, x_i \in C_k} (x_i - \mu_k)^T (x_i - \mu_k) \quad (4)$$

where $\mu_k$ is the centroid of cluster $C_k$:

$$\mu_k = \frac{1}{N_k} \sum_{i:\, x_i \in C_k} x_i \quad (5)$$


Now, what about K-centers?

K-center, on the other hand, minimizes the worst-case distance to the centroid:

$$\min_S \max_{k=1,\dots,K} \; \max_{i:\, x_i \in C_k} (x_i - \mu_k)^T (x_i - \mu_k) \quad (6)$$

where $\mu_k$ is called the “centroid”, although it may not be the mean vector.

Properties:
- For each cluster, only the worst case matters, that is, the data point farthest from the centroid.
- Among the clusters, only the worst cluster matters: the one whose farthest data point yields the maximum distance to its centroid, compared with the farthest data points of the other clusters.
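For contrast with equation (4), here is a sketch of the minimax objective (6) (illustrative only; same assumed names as before, with centroids an array of K center vectors):

import numpy as np

def kcenter_objective(X, labels, centroids, K):
    """Equation (6): the worst squared distance to a centroid,
    taken over all clusters and all of their points."""
    worst = 0.0
    for k in range(K):
        members = X[labels == k]
        if len(members) == 0:
            continue
        d2 = np.sum((members - centroids[k]) ** 2, axis=1)
        worst = max(worst, d2.max())
    return worst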


In other words

First, this minimax type of problem is much harder to solve than the K-means objective.

Another formulation of K-center minimizes the worst-case pairwise distance instead of using centroids:

$$\min_S \max_{k=1,\dots,K} \; \max_{i,j:\, x_i, x_j \in C_k} L(x_i, x_j) \quad (7)$$

where $L(x_i, x_j)$ denotes any distance between a pair of objects in the same cluster.


Greedy Algorithm for K-Center

Main idea: Choose a subset H of the original data set S consisting of K points that are farthest apart from each other.

Intuition: Since the points in H are far apart, the worst-case scenario has been taken care of, and hopefully the cluster size of the partition is small.

Thus, each point $h_k \in H$ represents one cluster (subset of points) $C_k$.


Then

Something Notable
We can think of each h_k as a centroid.

However
Technically it is not a centroid, because it tends to lie on the boundary of its cluster; conceptually, though, we can still treat it as one.

Important
The way we partition the points given the centroids is the same as in K-means: the nearest-neighbor rule.


Specifically

We do the following
For every point x_i, in order to decide which cluster C_k it belongs to, we compute its distance to each cluster centroid and find the closest one:

L\left(x_i, h_k\right) = \min_{k'=1,\dots,K} L\left(x_i, h_{k'}\right) \quad (8)

Thus
Whichever centroid attains the minimum distance determines the cluster of x_i.

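The rule in Eq. (8) is essentially a one-liner in practice; a minimal sketch (assign_to_centers is a hypothetical name):

def assign_to_centers(points, centers, distance):
    """Nearest-neighbor rule of Eq. (8): each point joins its closest center."""
    return [min(range(len(centers)), key=lambda k: distance(x, centers[k]))
            for x in points]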

Important

For K-center clustering
We only need the pairwise distance L(x_i, x_j) for any x_i, x_j ∈ S.

Where
x_i can be a non-vector representation of the objects.

As long as we can calculate
L(x_i, x_j), the algorithm applies, which makes K-center more general than K-means.

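A toy illustration of this generality: clustering strings, assuming the Hamming distance on equal-length strings as L (the words, centers, and labels below are invented for the example).

def hamming(a, b):
    """Number of differing positions; a metric on equal-length strings."""
    return sum(c1 != c2 for c1, c2 in zip(a, b))

# Non-vector objects: K-center only ever touches them through L(., .).
words = ["karma", "karms", "tarps", "harps", "house", "mouse"]
centers = ["karma", "house"]
labels = [min(range(len(centers)), key=lambda k: hamming(w, centers[k]))
          for w in words]
print(labels)  # -> [0, 0, 0, 0, 1, 1]

K-means, by contrast, would need a way to average strings into a mean vector, which is exactly what K-center avoids.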

Properties

Something Notable
The greedy algorithm achieves an approximation factor of 2, provided the distance measure L satisfies the triangle inequality.

Thus, let the optimal value be

D^* = \min_{S} \max_{k=1,\dots,K} \max_{i,j:\, x_i, x_j \in C_k} L\left(x_i, x_j\right) \quad (9)

Then, we have the following guarantee for the greedy algorithm

D \leq 2D^* \quad (10)


Nevertheless

We have that
K-center does not provide a locally optimal solution.

We only get a solution
Guaranteed to be within a certain performance range of the theoretical optimum.


Setup

First
The set H denotes the set of cluster centroids, or representative objects, {h_1, ..., h_K} ⊂ S.

Second
Let cluster(x_i) be the identity of the cluster that x_i ∈ S belongs to.

Third
The distance dist(x_i) is the distance between x_i and its closest cluster representative (centroid):

dist\left(x_i\right) = \min_{h_j \in H} L\left(x_i, h_j\right) \quad (11)


The Main Idea

Something Notable
We always assign x_i to the closest centroid; therefore dist(x_i) is the minimum distance between x_i and any centroid.

The Algorithm is Iterative
We generate one cluster centroid first and then add the others one by one until we have K clusters.
The set of centroids H starts with a single centroid, h_1.


Algorithm

Step 1
Randomly select an object x_j from S; let h_1 = x_j and H = {h_1}.
It does not matter how x_j is selected.

Step 2
For i = 1 to N:

1 dist(x_i) = L(x_i, h_1)
2 cluster(x_i) = 1

In other words
dist(x_i) is the computed distance between x_i and h_1.
Because so far we only have one cluster, we assign the cluster label 1 to every x_i.


Algorithm

Step 3
For i = 2 to K:

1 D = max_{x_j ∈ S − H} dist(x_j)
2 Choose h_i ∈ S − H such that dist(h_i) == D
3 H = H ∪ {h_i}
4 for j = 1 to N
5     if L(x_j, h_i) ≤ dist(x_j)
6         dist(x_j) = L(x_j, h_i)
7         cluster(x_j) = i

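Putting Steps 1-3 together, here is a minimal runnable sketch in Python (the language, the function name greedy_k_center, and the use of random.randrange are all assumptions; the slides only give pseudo-code). L may be any pairwise distance.

import random

def greedy_k_center(S, K, L):
    """Greedy K-center following Steps 1-3 above."""
    n = len(S)
    # Step 1: pick an arbitrary first center.
    centers = [random.randrange(n)]                 # indices into S of H
    # Step 2: with a single center, every point gets the same cluster label.
    dist = [L(S[i], S[centers[0]]) for i in range(n)]
    cluster = [0] * n
    # Step 3: repeatedly promote the current worst point to a new center.
    for i in range(1, K):
        chosen = set(centers)
        far = max((j for j in range(n) if j not in chosen),
                  key=lambda j: dist[j])            # lines 1-2: the worst point
        centers.append(far)                         # line 3: H = H ∪ {h_i}
        for j in range(n):                          # lines 4-7: relax distances
            d = L(S[j], S[far])
            if d <= dist[j]:
                dist[j] = d
                cluster[j] = i
    return [S[j] for j in centers], cluster, max(dist)

# Example usage on scalars with the absolute-difference metric:
# H, labels, D = greedy_k_center([1.0, 2.0, 9.0, 10.0, 50.0], 2,
#                                lambda a, b: abs(a - b))

Per iteration the sketch performs O(N) distance computations, matching the O(KN) running time discussed below.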

Thus

As the algorithm progresses
We gradually add more and more cluster centroids, beginning with 2 and continuing until we reach K.
At each iteration, we find, among all of the points not yet included in the set, a worst point:

  Worst in the sense that this point has the maximum distance to its corresponding centroid.

This worst point
It is added to the set H.
To stress again: points already included in H are not under consideration.


Implementation

We can use the following for the implementation
The disjoint-set data structure with the following operations:

  MakeSet
  Find
  Union
  Remove, applied iteratively through the disjoint trees.

Of course, extra fields are needed for the necessary distances
Thus, each node-element of the sets must carry a field dist(x_j) such that

dist\left(x_j\right) = L\left(x_j, Find\left(x_j\right)\right) \quad (12)

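One possible shape for such an augmented structure, sketched in Python (an assumption; the slides leave the details open). Here each element carries the dist field of Eq. (12), with the caller responsible for updating it whenever a point changes cluster.

class Node:
    """Disjoint-set element augmented with the dist field of Eq. (12)."""
    def __init__(self, x):
        self.x = x
        self.parent = self        # roots represent clusters
        self.dist = 0.0           # L(x, center of Find(x)), maintained by caller

def find(node):
    """Find with path halving; returns the root (cluster representative)."""
    while node.parent is not node:
        node.parent = node.parent.parent
        node = node.parent
    return node

def union(a, b):
    """Merge the trees containing a and b (naive linking by root)."""
    ra, rb = find(a), find(b)
    if ra is not rb:
        rb.parent = ra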

Although

Other things need to be taken into consideration
I will leave them for you to think about.

Example

Running the algorithm above obtains better results on one of the problematic data sets:

[Figure: Original Points vs. the K-center partition (3 clusters)]


The Running Time

We have that
The running time of the algorithm is O(KN), where K is the number of clusters generated and N is the size of the data set.

Why?
Because K-center only requires the pairwise distances between each point and the current centroids: each of the K − 1 iterations of Step 3 performs O(N) distance computations and updates.


Main Bound of the K-center Algorithm

Lemma
Assume the distance measure L satisfies the triangle inequality. Let S̃ be the partition obtained by the greedy algorithm and S^* the optimal partition, with cluster sizes D̃ and D^* respectively. Then

D̃ \leq 2D^* \quad (13)
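Before the proof, the bound is easy to check empirically for the version of the problem in which centers are chosen among the data points, where brute force over all K-subsets gives the exact optimum (feasible only for tiny inputs). Everything below is a hedged sketch with invented names and data.

import itertools, random

def cost(points, centers, L):
    """Cluster size D: max over points of the distance to the nearest center."""
    return max(min(L(x, c) for c in centers) for x in points)

def greedy_centers(points, K, L):
    """Farthest-point selection, as in Steps 1-3."""
    H = [points[0]]
    while len(H) < K:
        H.append(max(points, key=lambda x: min(L(x, h) for h in H)))
    return H

L = lambda a, b: abs(a - b)                  # a metric on scalars
pts = [random.uniform(0.0, 100.0) for _ in range(12)]
K = 3

D_greedy = cost(pts, greedy_centers(pts, K, L), L)
D_opt = min(cost(pts, C, L) for C in itertools.combinations(pts, K))
assert D_greedy <= 2 * D_opt + 1e-9          # the guarantee of Eq. (13)
print(D_greedy, D_opt)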


Proof

If we look only at the first j centroids
They generate a partition with cluster size D_j, where:

  The cluster size is the size of the biggest cluster in the current partition.
  The size of a cluster is defined as the maximum distance between a point and the cluster centroid.

Thus

D_1 \geq D_2 \geq D_3 \geq \cdots \quad (14)

Why?
Because D = \max_{x_j \in S - H} dist\left(x_j\right), and lines 5-7 of Step 3 can only decrease each dist(x_j).


Now

We have that

\forall i < j, \quad L\left(h_i, h_j\right) \geq D_{j-1} \quad (15)

Proof of the Previous Statement

First, did you notice the following?

D_{j-1} = L\left(h_{j-1}, h_j\right) \quad (16)

In addition, it is possible to see that

L\left(h_{j-2}, h_j\right) \geq L\left(h_{j-1}, h_j\right) \quad (17)

How?
Assume it is not true; then L(h_{j-2}, h_j) < L(h_{j-1}, h_j).


Then

We would have that
h_{j-2} ∈ C̃_{j-2} and h_{j-2} ∈ C̃_j, which is a contradiction: h_{j-2} is generated by the algorithm as a centroid, so it cannot belong to any other cluster.

Thus, iteratively

L\left(h_1, h_j\right) \geq L\left(h_2, h_j\right) \geq L\left(h_3, h_j\right) \geq \cdots \geq L\left(h_{j-1}, h_j\right) = D_{j-1} \quad (18)


Not only that

Furthermore

\forall j, \exists i < j, \quad L\left(h_i, h_j\right) = D_{j-1} \quad (19)

Therefore
D_{j-1} is not only a lower bound for the distance between h_i and h_j; it is attained, i.e., it is the exact bound.


Proof (continued)

Now
Let us consider the optimal partition S^* with K clusters and minimum size D^*.

  Suppose the greedy algorithm generates the centroids H̃ = {h_1, h_2, ..., h_K}.
  For the proof, we add one more point generated the same way, h_{K+1}.
  This can be assumed without loss of generality.

According to the pigeonhole principle, at least two of the centroids among {h_1, h_2, ..., h_K, h_{K+1}} will fall into the same cluster, say the k-th cluster, of the partition S^*.

Thus
Let the two centroids be h_i and h_j with 1 ≤ i < j ≤ K + 1. Then, using the triangle inequality through the representative h_k of that shared optimal cluster:

L\left(h_i, h_j\right) \leq L\left(h_i, h_k\right) + L\left(h_k, h_j\right) \leq D^* + D^* = 2D^* \quad (20)


Then

In addition
Also L(h_i, h_j) \geq D_{j-1} \geq D_K, and therefore D_K \leq 2D^*.

Now define, with S̃ the partition generated by the greedy algorithm,

\Delta = \max_{x_j \in \tilde{S} - \tilde{H}} \; \min_{h_k \in \tilde{H}} L\left(x_j, h_k\right) \quad (21)

Basically
Δ is the largest, over all non-centroid points, of each point's distance to its nearest centroid in the partition generated by the greedy algorithm.


Now, we are ready for the proof

Let h_{K+1} be an element of S̃ − H̃
Such that

\min_{h_k \in \tilde{H}} L\left(h_{K+1}, h_k\right) = \Delta \quad (22)

By definition

L\left(h_{K+1}, h_k\right) \geq \Delta, \quad \forall k = 1, \dots, K \quad (23)

Thus, we have the following sets
Let H_k = {h_1, ..., h_k} for k = 1, 2, ..., K.


Now

Consider the distance between h_i and h_j for i < j ≤ K
According to the greedy algorithm,

\min_{h_k \in H_{j-1}} L\left(h_j, h_k\right) \geq \min_{h_k \in H_{j-1}} L\left(x_l, h_k\right)

for any x_l ∈ S̃ − H_j, because h_j was chosen as the farthest remaining point.

Since
h_{K+1} ∈ S̃ − H̃ and S̃ − H̃ ⊂ S̃ − H_j, the inequality applies in particular to x_l = h_{K+1}.


We have

The following chain

L\left(h_j, h_i\right) \geq \min_{h_k \in H_{j-1}} L\left(h_j, h_k\right) \geq \min_{h_k \in H_{j-1}} L\left(h_{K+1}, h_k\right) \geq \min_{h_k \in \tilde{H}} L\left(h_{K+1}, h_k\right) = \Delta

We have shown that, for any i < j ≤ K + 1,

L\left(h_j, h_i\right) \geq \Delta \quad (24)


Now

Consider the optimal partition
S^* = {C^*_1, C^*_2, ..., C^*_K}. At least two of the K + 1 elements h_1, h_2, ..., h_{K+1} will be covered by one of its clusters.

Assume that
h_i and h_j belong to the same cluster in S^*. Then L(h_i, h_j) ≤ D^*.

In addition
Since L(h_i, h_j) ≥ Δ, it follows that Δ ≤ D^*.


In addition

Consider elements x_m and x_n in any cluster represented by h_k

L\left(x_m, h_k\right) \leq \Delta \quad \text{and} \quad L\left(x_n, h_k\right) \leq \Delta \quad (25)

By the Triangle Inequality

L\left(x_m, x_n\right) \leq L\left(x_m, h_k\right) + L\left(x_n, h_k\right) \leq 2\Delta \quad (26)

Finally, because this holds for any pair x_m, x_n, and D̃ = L(x'_m, x'_n) for the worst such pair,

\tilde{D} = \max_k D_k \leq 2\Delta \leq 2D^* \quad (27)

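For reference, the whole argument compresses into one chain; this is only a restatement of Eqs. (24)-(27) in display form, not new material:

\begin{align*}
  \Delta \;\le\; L(h_i, h_j) \;\le\; D^*
      &\qquad \text{(two greedy centers share an optimal cluster, by pigeonhole)} \\
  \tilde{D} \;\le\; L(x_m, h_k) + L(x_n, h_k) \;\le\; 2\Delta \;\le\; 2D^*
      &\qquad \text{(triangle inequality within a greedy cluster)}
\end{align*}

which completes the proof of the 2-approximation guarantee.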