Methods for Clustering: K-means, Soft K-means, DBSCAN

APPLIED MACHINE LEARNING - MSc Course (EPFL)
Source: lasa.epfl.ch/teaching/lectures/ML_Msc/Slides/Clustering.pdf

Page 1

APPLIED MACHINE LEARNING

Methods for Clustering

K-means, Soft K-means

DBSCAN

Page 2

Objectives

Learn basic techniques for data clustering

• K-means and soft K-means, GMM (next lecture)

• DBSCAN

Understand the issues and major challenges in clustering

• Choice of metric

• Choice of number of clusters

Page 3

What is clustering?

Clustering is a type of multivariate statistical analysis also known as cluster analysis, unsupervised classification analysis, or numerical taxonomy.

Clustering is the process of partitioning a set of data (or objects) into a set of meaningful sub-classes, called clusters.

Cluster: a collection of data objects that are "similar" to one another and can thus be treated collectively as one group.

Page 4

Classification versus Clustering

Supervised Classification = Classification
We know the class labels and the number of classes.

Unsupervised Classification = Clustering
We do not know the class labels and may not know the number of classes.

Page 5

Classification versus Clustering

Unsupervised Classification = Clustering

A hard problem when no pair of objects has exactly the same features: we need to determine how similar two or more objects are to one another.

Page 6

Which clusters can you create?

Which two subgroups of pictures are similar and why?

Page 7

Which clusters can you create?

Which two subgroups of pictures are similar and why?

Page 8

What is Good Clustering?

A good clustering method produces high-quality clusters when:

• The intra-class (that is, intra-cluster) similarity is high.
• The inter-class similarity is low.

The quality measure of a cluster depends on the similarity measure used!

Page 9

Exercise:

Intra-class similarity is the highest when:
a) you choose to classify images with and without glasses
b) you choose to classify images of person 1 against person 2

(Image groups: person 1 with glasses, person 1 without glasses, person 2 without glasses, person 2 with glasses.)

Page 10

Exercise:

Projection onto the first two principal components after PCA.
(Legend: person 1 with glasses, person 1 without glasses, person 2 without glasses, person 2 with glasses.)

Intra-class similarity is the highest when:
a) you choose to classify images with and without glasses
b) you choose to classify images of person 1 against person 2

Page 11

Exercise:

The eigenvector e1 is composed of a mix of the main characteristics of the two faces and is hence explanatory of both. However, since the two faces have little in common, the two groups have different coordinates on e1, but have quasi-identical coordinates for the glasses within each subgroup. Projecting onto e1 hence offers a means to compute a metric of similarity across the two persons.

Projection onto e1 against e2.
(Legend: person 1 with glasses, person 1 without glasses, person 2 without glasses, person 2 with glasses.)

Page 12

Exercise:

When projecting onto e1 and e3, we can separate the images of person 1 with and without glasses, as the eigenvector e3 embeds features distinctive primarily of person 1.

Projection onto e1 against e3.
(Legend: person 1 with glasses, person 1 without glasses, person 2 without glasses, person 2 with glasses.)

Page 13

Exercise:

Design a method to find the groups when you no longer have the class labels.

Projection onto the first two principal components after PCA.

Page 14

Sensitivity to Prior Knowledge

Priors:
• Data cluster within a circle
• There are 2 clusters

(Figure: datapoints in x1, x2, x3; legend: outliers (noise) vs. relevant data.)

Page 15

Sensitivity to Prior Knowledge

Priors:
• Data follow a complex distribution
• There are 3 clusters

(Figure: datapoints in x1, x2, x3.)

Page 16

Clusters' Types

Globular clusters: K-means produces globular clusters.
Non-globular clusters: DBSCAN produces non-globular clusters.

Page 17

What is Good Clustering?

Requirements for good clustering:
• Discovery of clusters with arbitrary shape
• Ability to deal with noise and outliers
• Insensitivity to the ordering of input records
• Scalability
• Ability to handle high dimensionality
• Interpretability and reusability

Page 18

How to cluster?

What choice of model (circle, ellipse) for the cluster?
How many models?

(Figure: unlabeled datapoints in x1, x2.)

Page 19

What choice of model (circle, ellipse) for the cluster? A circle.
How many models? A fixed number: K = 2.
Where to place them for optimal clustering?

K-means Clustering

K-means clustering generates a number K of disjoint clusters so as to minimize:

$J(\mu_1, \ldots, \mu_K) = \sum_{k=1}^{K} \sum_{i \in c_k} \| x^i - \mu_k \|^2$

$x^i$: i-th data point
$\mu_k$: geometric centroid
$c_k$: cluster label or number

Page 20

K-means Clustering

Initialization: initialize the positions of the centers of the clusters at random.

In mldemos, centroids are initialized on one datapoint each, with no overlap across centroids.

Page 21

K-means Clustering

Assignment Step:
• Calculate the distance from each data point to each centroid.
• Assign the responsibility of each data point to its "closest" centroid. If a tie happens (i.e., two centroids are equidistant from a data point), one assigns the data point to the winning centroid with the smallest index.

$k^i = \arg\min_k d(x^i, \mu_k)$

Responsibility of cluster k for point $x^i$:
$r_k^i = 1$ if $k = k^i$, $0$ otherwise

$x^i$: i-th data point
$\mu_k$: geometric centroid

Page 22

K-means Clustering

Update Step (M-Step):
Recompute the position of each centroid based on the assignment of the points:

$\mu_k = \dfrac{\sum_i r_k^i\, x^i}{\sum_i r_k^i}$

with $k^i = \arg\min_k d(x^i, \mu_k)$ and the responsibility of cluster k for point $x^i$:
$r_k^i = 1$ if $k = k^i$, $0$ otherwise

Page 23

K-means Clustering

Assignment Step:
• Calculate the distance from each data point to each centroid.
• Assign the responsibility of each data point to its "closest" centroid. If a tie happens (i.e., two centroids are equidistant from a data point), one assigns the data point to the winning centroid with the smallest index.

$k^i = \arg\min_k d(x^i, \mu_k)$,  $r_k^i = 1$ if $k = k^i$, $0$ otherwise

$\mu_k = \dfrac{\sum_i r_k^i\, x^i}{\sum_i r_k^i}$

Page 24

K-means Clustering

Update Step (M-Step):
Recompute the position of each centroid based on the assignment of the points.

Stopping Criterion: go back to step 2 and repeat the process until the clusters are stable.

Page 25

K-means Clustering

K-means creates a hard partitioning of the dataset.

(Figure: partition boundaries in x1, x2, with the intersection points marked.)

Page 26

Effect of the distance metric on K-means

(Figures: partitions obtained with the L1-norm, L2-norm, L3-norm, and L8-norm.)

Page 27

K-means Clustering: Algorithm

1. Initialization: Pick K arbitrary centroids and set their geometric means to random values (in mldemos, centroids are initialized on one datapoint each, with no overlap across centroids).

2. Calculate the distance from each data point to each centroid.

3. Assignment Step (E-step): Assign the responsibility of each data point to its "closest" centroid: $k^i = \arg\min_k d(x^i, \mu_k)$, with $r_k^i = 1$ if $k = k^i$ and $0$ otherwise. If a tie happens (i.e., two centroids are equidistant from a data point), one assigns the data point to the winning centroid with the smallest index.

4. Update Step (M-step): Adjust the centroids to be the means of all data points assigned to them: $\mu_k = \dfrac{\sum_i r_k^i\, x^i}{\sum_i r_k^i}$.

5. Go back to step 2 and repeat the process until the clusters are stable.
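The slides give no code, so here is a minimal numpy sketch of the five steps above; the function name kmeans, the Euclidean choice for d(x, μ), the convergence test, and the handling of empty clusters are our assumptions, not part of the lecture.

```python
import numpy as np

def kmeans(X, K, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1: centroids initialized on K distinct datapoints (mldemos-style).
    centroids = X[rng.choice(len(X), size=K, replace=False)].astype(float)
    for _ in range(max_iter):
        # Steps 2-3 (E-step): point-to-centroid distances; argmin breaks
        # ties in favor of the centroid with the smallest index.
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Step 4 (M-step): each centroid becomes the mean of its points
        # (an empty cluster keeps its previous position - an assumption).
        new_centroids = np.array([X[labels == k].mean(axis=0)
                                  if np.any(labels == k) else centroids[k]
                                  for k in range(K)])
        # Step 5: stop when the centroids (hence the clusters) are stable.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels
```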

Page 28

K-means Clustering

The algorithm of K-means is a simple version of

Expectation-Maximization applied to a model

composed of isotropic Gauss functions

(see next lecture)

Page 29

K-means Clustering: Properties

• There are always K clusters.
• The clusters do not overlap (soft K-means relaxes this assumption, see next slides).
• Each member of a cluster is closer to its own cluster than to any other cluster.

The algorithm is guaranteed to converge in a finite number of iterations, but it converges to a local optimum! It is hence very sensitive to the initialization of the centroids.

Page 30

Soft K-means Clustering

Assignment Step (E-step):
• Calculate the distance from each data point to each centroid.
• Assign the responsibility of each data point to its "closest" centroid.

Each data point $x^i$ is given a soft 'degree of assignment' to each of the means $\mu_k$:

$r_k^i = \dfrac{e^{-\beta\, d(x^i, \mu_k)}}{\sum_{k'} e^{-\beta\, d(x^i, \mu_{k'})}}$

$r_k^i$: responsibility of cluster k for point $x^i$; $r_k^i \in [0,1]$, normalized over clusters: $\sum_k r_k^i = 1$.

Page 31

Soft K-means Clustering

Update Step (M-Step):
Recompute the position of each centroid based on the assignment of the points. The model parameters, i.e. the means, are adjusted to match the weighted sample means of the data points that they are responsible for:

$\mu_k = \dfrac{\sum_i r_k^i\, x^i}{\sum_i r_k^i}$,  with  $r_k^i = \dfrac{e^{-\beta\, d(x^i, \mu_k)}}{\sum_{k'} e^{-\beta\, d(x^i, \mu_{k'})}} \in [0,1]$,  $\sum_k r_k^i = 1$

The update algorithm of soft K-means is identical to that of hard K-means, apart from the fact that the responsibilities to a particular cluster are now real numbers varying between 0 and 1.
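A minimal numpy sketch of the full soft K-means loop, under the same assumptions as the hard K-means sketch above (Euclidean d, names ours); the stiffness β is a user-chosen hyperparameter.

```python
import numpy as np

def soft_kmeans(X, K, beta=5.0, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=K, replace=False)].astype(float)
    for _ in range(max_iter):
        # E-step: r_k^i = exp(-beta d(x^i,mu_k)) / sum_k' exp(-beta d(x^i,mu_k'))
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        # Subtracting the row-wise minimum is for numerical stability only;
        # it cancels in the normalization below.
        r = np.exp(-beta * (d - d.min(axis=1, keepdims=True)))
        r /= r.sum(axis=1, keepdims=True)    # each row sums to 1 over clusters
        # M-step: means move to the weighted sample means of their points.
        new_centroids = (r.T @ X) / r.sum(axis=0)[:, None]
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, r                      # r holds the soft assignments
```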

Page 32

Soft K-means Clustering

$\beta$ is the stiffness; $1/\beta$ measures the disparity across clusters:
small $\beta$ ~ large $1/\beta$; large $\beta$ ~ small $1/\beta$.

$r_k^i = \dfrac{e^{-\beta\, d(x^i, \mu_k)}}{\sum_{k'} e^{-\beta\, d(x^i, \mu_{k'})}}$

$r_k^i$: responsibility of cluster k for point $x^i$; $r_k^i \in [0,1]$, normalized over clusters: $\sum_k r_k^i = 1$.

Page 33

Soft K-means Clustering

Soft K-means algorithm with a small (left), medium (center) and large (right) stiffness $\beta$ (panel values: 10, 5, 1).

Page 34

Soft K-means Clustering

Iterations of the soft K-means algorithm from the random initialization (left) to convergence (right). Computed with $\beta = 10$.

Page 35

(soft) K-means Clustering: Properties

Advantages:
• Computationally faster than other clustering techniques.
• Produces tighter clusters, especially if the clusters are globular.
• Guaranteed to converge.

Drawbacks:
• Does not work well with non-globular clusters.
• Sensitive to the choice of initial partitions: different initial partitions can result in different final clusters.
• Assumes a fixed number K of clusters. It is therefore good practice to run the algorithm several times using different K values, to determine the optimal number of clusters.


Page 37

K-means Clustering: Weaknesses

• Unbalanced clusters: K-means takes into account only the distance between the means and the data points; it has no representation of the variance of the data within each cluster.

• Elongated clusters: K-means imposes a fixed shape (sphere) for each cluster.

Page 38

K-means Clustering: Weaknesses

Very sensitive to the choice of the number of clusters K and to the initialization (mldemos example).

Page 39

K-means: Limitations

K-means would not be able to reject outliers.

(Figure: datapoints in x1, x2, x3; legend: outliers (noise) vs. relevant data.)

Page 40

K-means: Limitations

K-means would not be able to reject outliers: K-means assigns all datapoints to a cluster, so outliers get assigned to the closest cluster.

DBSCAN can determine outliers and can generate non-globular clusters.

Page 41

Density-Based Spatial Clustering of Applications with Noise (DBSCAN)

1. Pick a datapoint at random.
2. Compute the number of datapoints within a distance ε of it.
3. If this number is < m_data, mark the datapoint as an outlier (noise).
4. Go back to 1.

Page 42

Density-Based Spatial Clustering of Applications with Noise (DBSCAN)

1. Pick a datapoint at random.
2. Compute the number of datapoints within a distance ε of it.
3. For each datapoint found, assign it to the same cluster (Cluster 1).
4. Go back to 1.

Page 43

Density-Based Spatial Clustering of Applications with Noise (DBSCAN)

1. Pick a datapoint at random.
2. Compute the number of datapoints within a distance ε of it.
3. For each datapoint found, assign it to the same cluster.
4. Merge two clusters if the distance between them is < ε (e.g., Cluster 1 and Cluster 2 merge into Cluster 1).

Page 44

Density-Based Spatial Clustering of Applications with Noise (DBSCAN)

Hyperparameters:
• ε: size of the neighborhood
• m_data: minimum number of datapoints

(Figure: final Cluster 1 and Cluster 2 in x1, x2, x3, with outliers (noise) left unassigned.)
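In practice one rarely codes this from scratch; a hedged sketch using scikit-learn's DBSCAN (assuming scikit-learn is available), where eps plays the role of ε and min_samples the role of m_data:

```python
import numpy as np
from sklearn.cluster import DBSCAN

X = np.random.default_rng(0).normal(size=(300, 2))   # toy data, stand-in only
db = DBSCAN(eps=0.3, min_samples=5).fit(X)
labels = db.labels_        # one cluster index per point; -1 marks outliers
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
print(f"{n_clusters} clusters, {np.sum(labels == -1)} outliers")
```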

Page 46

Comparison: K-means / DBSCAN

                      K-means                  DBSCAN
Hyperparameters       K: number of clusters    ε: neighborhood size; m_data: min. number of datapoints
Computational cost    O(K·M)                   O(M·log(M)), M: number of datapoints
Type of cluster       Globular                 Non-globular (arbitrary shapes, non-linear boundaries)
Robustness to noise   Not robust               Robust to outliers within ε

K-means is computationally cheap. However, it is not robust to noise and produces only globular clusters. DBSCAN is computationally intensive, but it can automatically detect noise and produces clusters of arbitrary shape.

Both K-means and DBSCAN depend on choosing the hyperparameters well. To determine the hyperparameters, use evaluation methods for clustering (next).

Page 47

Evaluation of Clustering Methods

Clustering methods rely on hyperparameters:
• number of clusters, elements in the cluster, distance metric.
We need to determine the goodness of these choices.

Clustering is unsupervised classification: we do not know the real number of clusters or the data labels. It is difficult to evaluate these choices without ground truth.

Page 48

Evaluation of Clustering Methods

Two types of measures: internal versus external measures.

Internal measures rely on measures of similarity: (low) intra-cluster distance versus (high) inter-cluster distance. Internal measures are problematic, as the metric of similarity is often already optimized by the clustering algorithm.

External measures rely on ground truth (class labels): given a (sub)set of known class labels, compute the similarity of the clusters to the class labels. In real-world data, it is often hard or infeasible to gather ground truth.

Page 49

Internal Measure: RSS

The Residual Sum of Squares (RSS) is an internal measure (available in mldemos). It computes the distance (in norm 2) of each datapoint from its centroid, summed over all clusters:

$\mathrm{RSS} = \sum_{k=1}^{K} \sum_{x \in C_k} \| x - \mu_k \|^2$
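A one-function numpy sketch of this measure (the name rss is ours); it plugs directly into the kmeans() sketch given earlier:

```python
import numpy as np

def rss(X, centroids, labels):
    # Sum over clusters of squared L2 distances of each point to its centroid.
    return sum(np.sum((X[labels == k] - mu) ** 2)
               for k, mu in enumerate(centroids))
```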

Page 50

RSS for K-Means

The goal of K-means is to find cluster centers $\mu_k$ which minimize the distortion:

$\mathrm{RSS} = \sum_{k=1}^{K} \sum_{x \in C_k} \| x - \mu_k \|^2$  (measure of distortion)

By increasing $K$ we decrease $\mathrm{RSS}$; what is the optimal $K$ such that $\mathrm{RSS} \to 0$? $\mathrm{RSS} = 0$ when $K = M$: one has as many clusters as datapoints!

(Figure: M = 100 datapoints, N = 2 dimensions; with K = M clusters, RSS = 0.)

However, RSS can still be used to determine an 'optimal' $K$ by monitoring the slope of the decrease of the measure as $K$ increases.

Page 51

K-means Clustering: Examples

Procedure: run K-means, increasing the number of clusters monotonically; for each number of clusters, run K-means with several initializations and take the best run; use the RSS measure to quantify the improvement in clustering and determine a plateau.

The optimal $k$ is at the 'elbow' of the RSS curve.
(Figure: M = 100 datapoints, N = 2 dimensions, k = 4 clusters.)
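A sketch of this procedure, reusing the kmeans() and rss() sketches above and assuming a dataset X (an M×N numpy array); the range of K and the number of restarts are arbitrary illustrative choices:

```python
# For each K: several random restarts, keep the lowest-RSS run.
best_rss = []
for K in range(1, 11):                                   # increase K monotonically
    runs = [kmeans(X, K, seed=s) for s in range(10)]     # several initializations
    best_rss.append(min(rss(X, c, l) for c, l in runs))  # best run for this K
# Plot best_rss against K and pick K at the 'elbow', where the curve flattens.
```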

Page 52

K-means with RSS: Examples

Cluster Analysis of Hedge Funds (fonds spéculatifs) [N. Das, 9th Int. Conf. on Computing in Economics and Finance, 2011]

There is no legal definition of hedge funds; they consist of a wide category of investment funds with high risk and high returns, with a variety of strategies guiding the investment.

Research question: classify the type of hedge funds based on the information provided to the client.

Data dimensions (features): e.g., asset class, size of the hedge fund, incentive fee, risk level, and liquidity of the hedge funds.

Page 53

K-means with RSS: Examples

Cluster Analysis of Hedge Funds (fonds spéculatifs) [N. Das, 9th Int. Conf. on Computing in Economics and Finance, 2011]

Procedure: run K-means, increasing the number of clusters monotonically; run K-means with several initializations and take the best run; use the RSS measure to quantify the improvement in clustering and determine a plateau.

(Figure: RSS against the number of clusters K, with a cutoff at the plateau. Optimal results are found with 7 clusters.)

Page 54

K-means Clustering: Examples

The 'elbow' or 'plateau' method for choosing the optimal $k$ from the RSS curve can be unreliable for certain datasets. Which one is the 'optimal' $k$: $k = 11$ or $k = 2$? We don't know! We need an additional penalty or criterion!

(Figure: M = 100 datapoints, N = 3 dimensions.)

Page 55

Other Metrics to Evaluate Clustering Methods

AIC and BIC determine how well the model fits the dataset in a probabilistic sense (a maximum-likelihood measure). The measure is balanced by how many parameters are needed to get a good fit:

- Akaike Information Criterion: $\mathrm{AIC} = -2\ln(L) + 2B$
- Bayesian Information Criterion: $\mathrm{BIC} = -2\ln(L) + \ln(M)\,B$

$L$: maximum likelihood of the model
$B$: number of free parameters
$M$: number of datapoints

The second term is a penalty for an increase in computational cost due to the number of parameters and the number of datapoints. As the number of datapoints (observations) increases, BIC assigns more weight to simpler models than AIC does. A low BIC implies either fewer explanatory variables, a better fit, or both.

Choosing AIC versus BIC depends on the application: is the purpose of the analysis to make predictions, or to decide which model best represents reality? AIC may have better predictive ability than BIC, but BIC finds a computationally more efficient solution.

Page 56

AIC for K-Means

For the particular case of K-means, we do not have a maximum-likelihood estimate of the model, so $\mathrm{AIC} = -2\ln(L) + 2B$ cannot be evaluated directly. However, we can formulate a metric based on the RSS that penalizes for model complexity (the number K of clusters), conceptually following AIC:

$\mathrm{AIC}_{\mathrm{RSS}} = \mathrm{RSS} + B$

with $\mathrm{RSS} = \sum_{k=1}^{K} \sum_{x \in C_k} \| x - \mu_k \|^2$ and the number of free parameters $B = K \cdot N$ ($K$: number of clusters, $N$: number of dimensions) acting as the weighting factor.

Page 57

BIC for K-Means

For the particular case of K-means, we do not have a maximum-likelihood estimate of the model, so $\mathrm{BIC} = -2\ln(L) + \ln(M)\,B$ cannot be evaluated directly. However, we can formulate a metric based on the RSS that penalizes for model complexity (the number K of clusters and the number M of datapoints), conceptually following BIC:

$\mathrm{BIC}_{\mathrm{RSS}} = \mathrm{RSS} + \ln(M)\,B$

with $\mathrm{RSS} = \sum_{k=1}^{K} \sum_{x \in C_k} \| x - \mu_k \|^2$ and $B = K \cdot N$ free parameters ($K$: number of clusters, $N$: number of dimensions). The weighting factor $\ln(M)$ penalizes with respect to the number of datapoints (i.e., computational complexity).
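A small sketch of both RSS-based criteria exactly as defined above (function name ours); it plugs into the best_rss values collected by the elbow procedure sketched earlier:

```python
import numpy as np

def aic_bic_rss(rss_value, K, N, M):
    # B = K*N free parameters; returns (AIC_RSS, BIC_RSS) as in the slides.
    B = K * N
    return rss_value + B, rss_value + np.log(M) * B

# e.g. pick the K minimizing BIC_RSS over the runs collected earlier:
# bics = [aic_bic_rss(r, K, X.shape[1], len(X))[1]
#         for K, r in enumerate(best_rss, start=1)]
# K_opt = 1 + int(np.argmin(bics))
```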

Page 58

K-means Clustering: Examples

Procedure: run K-means, increasing the number of clusters monotonically; run K-means with several initializations and take the best run; use the AIC/BIC curves to find the optimal $k$, which is at $\min(\mathrm{AIC})$ or $\min(\mathrm{BIC})$.

Both $\min(\mathrm{BIC})$ and $\min(\mathrm{AIC})$ → $k = 2$.
(Figure: M = 100 datapoints, N = 3 dimensions, k = 2 clusters.)

Page 59

BIC for K-Means

$\mathrm{BIC}_{\mathrm{RSS}} = \mathrm{RSS} + \ln(M)\,(K \cdot N)$

(Figure: M = 100 datapoints, N = 2 dimensions; the BIC curve selects K = 14 clusters.)

Page 60

BIC for K-Means

$\mathrm{BIC}_{\mathrm{RSS}} = \mathrm{RSS} + \ln(M)\,(K \cdot N)$

(Figure: M = 100 datapoints, N = 2 dimensions; the BIC curve selects K = 4 clusters.)

Page 61

AIC / BIC for DBSCAN

Compute the centroid of each cluster and apply the AIC/BIC of K-means.

        DBSCAN large ε   DBSCAN medium ε   DBSCAN small ε
RSS     43               26                0.5
BIC     42               34                78
AIC     69               51                24
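A sketch of this recipe for a DBSCAN labeling (function name ours). The slides do not say how outliers should enter the score; ignoring the noise points (label -1) is our assumption.

```python
import numpy as np

def dbscan_bic_rss(X, labels):
    # Compute each cluster's centroid, then apply the RSS-based BIC of K-means.
    ks = sorted(set(labels) - {-1})                  # -1 = noise, ignored here
    cents = [X[labels == k].mean(axis=0) for k in ks]
    rss_val = sum(np.sum((X[labels == k] - c) ** 2) for k, c in zip(ks, cents))
    B = len(ks) * X.shape[1]                         # B = K*N free parameters
    return rss_val + np.log(len(X)) * B
```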

Page 62

AIC / BIC for DBSCAN

Compute the centroid of each cluster and apply the AIC/BIC of K-means.

        K-means   DBSCAN large ε   DBSCAN medium ε   DBSCAN small ε
RSS     51        95               59                0.6
BIC     65        118              88                331
AIC     55        102              67                93

Page 63

Evaluation of Clustering Methods

Two types of measures: internal versus external measures.

External measures assume that a subset of datapoints has class labels → semi-supervised learning. They measure how well these labeled datapoints are clustered. This requires an idea of the number of existing classes and some labeled datapoints. It is interesting mainly in cases where labeling is highly time-consuming, e.g. when the data is very large (as in speech recognition).

Page 64

Semi-Supervised Learning

Clustering F1-measure (careful: similar to, but not the same as, the F-measure we will see for classification!)

Trade-off between clustering all datapoints of the same class correctly in the same cluster and making sure that each cluster contains points of only one class:

$F(C, K) = \sum_{c_i \in C} \dfrac{|c_i|}{M} \max_k F(c_i, k)$

$F(c_i, k) = \dfrac{2\, R(c_i, k)\, P(c_i, k)}{R(c_i, k) + P(c_i, k)}$

$R(c_i, k) = \dfrac{n_{ik}}{|c_i|}$,   $P(c_i, k) = \dfrac{n_{ik}}{|k|}$

$M$: number of labeled datapoints
$C$: the set of classes, with classes $c_i$
$K$: number of clusters
$n_{ik}$: number of members of class $c_i$ in cluster $k$
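A direct numpy transcription of this measure over the labeled subset (function name ours; computing the precision denominator |k| over the labeled members of each cluster is our assumption):

```python
import numpy as np

def clustering_f1(y_class, y_cluster):
    # y_class, y_cluster: label and cluster arrays over the M labeled points.
    M, total = len(y_class), 0.0
    for c in np.unique(y_class):
        in_c = (y_class == c)
        best = 0.0
        for k in np.unique(y_cluster):
            in_k = (y_cluster == k)
            n_ik = np.sum(in_c & in_k)
            if n_ik == 0:
                continue
            R = n_ik / in_c.sum()          # recall:    n_ik / |c_i|
            P = n_ik / in_k.sum()          # precision: n_ik / |k|
            best = max(best, 2 * R * P / (R + P))
        total += (in_c.sum() / M) * best   # weight by class size, max over k
    return total
```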

Page 65

Recall: proportion of datapoints correctly classified/clusterized: $R(c_i, k) = n_{ik} / |c_i|$.
Precision: proportion of datapoints of the same class in the cluster: $P(c_i, k) = n_{ik} / |k|$.

(Figure: two classes, Class 1 and Class 2, with labeled and unlabeled datapoints in two clusters.)

Example:
$R(c_1, k=1) = \frac{2}{2} = 1$,  $R(c_2, k=2) = \frac{4}{4} = 1$
$P(c_1, k=1) = \frac{2}{6}$,  $P(c_2, k=2) = \frac{4}{6}$

Page 66

The weight $|c_i|/M$ penalizes by the fraction of labeled points in each class, and $\max_k$ picks for each class the cluster with the maximal F1 measure:

$F(C, K) = \frac{2}{6} F(c_1, k=1) + \frac{4}{6} F(c_2, k=2) = 0.7$

(Figure: same two classes, Class 1 and Class 2, with labeled and unlabeled datapoints.)

Page 67

Summary of F1-Measure

$F(C, K) = \sum_{c_i \in C} \dfrac{|c_i|}{M} \max_k F(c_i, k)$,   $F(c_i, k) = \dfrac{2\, R(c_i, k)\, P(c_i, k)}{R(c_i, k) + P(c_i, k)}$

• Recall $R(c_i, k) = n_{ik}/|c_i|$: proportion of datapoints correctly classified/clusterized.
• Precision $P(c_i, k) = n_{ik}/|k|$: proportion of datapoints of the same class in the cluster.
• The $\max_k$ picks for each class the cluster with the maximal F1 measure.
• The weight $|c_i|/M$ penalizes by the fraction of labeled points in each class.

($M$: number of labeled datapoints; $C$: the set of classes; $K$: number of clusters; $n_{ik}$: number of members of class $c_i$ in cluster $k$.)

The clustering F1-measure (careful: similar to, but not the same as, the F-measure we will see for classification!) trades off clustering all datapoints of the same class correctly in the same cluster against making sure that each cluster contains points of only one class.

Page 68

Summary of Lecture

We introduced two clustering techniques, K-means and DBSCAN, and discussed their pros and cons in terms of computational time and power of representation (globular/non-globular clusters).

We introduced metrics to evaluate clustering and help choose the hyperparameters:
• Internal measures (RSS, AIC, BIC)
• External measures: F1-measure (also called F-measure for clustering)

Next week, practical on clustering: you will compare the performance of K-means and DBSCAN on your datasets, and use the internal and external measures to assess this performance and choose the hyperparameters.

Page 69

Robotic Application of Clustering Method

A variety of hand postures is observed when grasping objects. How do we generate the correct hand posture on robots?

El-Khoury, S., Li, M., and Billard, A. (2013) On the Generation of a Variety of Grasps. Robotics and Autonomous Systems Journal.

Page 70

Robotic Application of Clustering Method

4-DOF industrial hand (Barrett Technology) and 9-DOF humanoid hand (iCub robot).

Problem: choose the points of contact and generate a feasible posture for the fingers to touch the object at the correct points and with the desired force.

Difficulty: high degrees of freedom (large number of possible points of contact, large number of DOFs to control).

Page 71

Formulate the problem as constraint-based optimization: minimize the generated torques at the fingertips under the constraints of:
• force closure
• kinematic feasibility
• collision avoidance

The nonconvex optimization yields several local / feasible solutions: from 1890 trials it converges to 791 feasible solutions in one case (taking ~2.65 s for each solution) and to 612 feasible solutions in the other (taking ~12.14 s for each solution). This took too long for a realistic application.

Page 72

Apply K-means to all solutions and group them into clusters.

(Figures: solutions grouped into 11 clusters and into 20 clusters.)

Page 73

A. Shukla and A. Billard, NIPS 2012
