Methods for Clustering: K-means, Soft K-means, DBSCAN



Objectives

Learn basic techniques for data clustering

• K-means and soft K-means, GMM (next lecture)

• DBSCAN

Understand the issues and major challenges in clustering

• Choice of metric

• Choice of number of clusters

What is clustering?

Clustering is a type of multivariate statistical analysis, also known as cluster analysis, unsupervised classification analysis, or numerical taxonomy.

Clustering is the process of partitioning a set of data (or objects) into a set of meaningful sub-classes, called clusters.

Cluster: a collection of data objects that are "similar" to one another and can thus be treated collectively as one group.

Classification versus Clustering

Supervised classification (= classification): we know the class labels and the number of classes.

Unsupervised classification (= clustering): we do not know the class labels and may not know the number of classes.

Classification versus Clustering

Unsupervised classification (= clustering) is a hard problem when no pair of objects has exactly the same features: we need to determine how similar two or more objects are to one another.


Which clusters can you create?

Which two subgroups of pictures are similar and why?


What is Good Clustering?

A good clustering method produces high-quality clusters in which:

• the intra-class (that is, intra-cluster) similarity is high;

• the inter-class similarity is low.

Note: the quality measure of a cluster depends on the similarity measure used!

Exercise:

Intra-class similarity is the highest when:

a) you choose to classify images with and without glasses

b) you choose to classify images of person 1 against person 2

(Images: person 1 with and without glasses; person 2 with and without glasses.)

Exercise (continued):

Projection onto the first two principal components after PCA. (Plot: the four groups, person 1 with/without glasses and person 2 with/without glasses.)

Intra-class similarity is the highest when:

a) you choose to classify images with and without glasses

b) you choose to classify images of person 1 against person 2

Exercise:

The eigenvector e1 is composed of a mix of the main characteristics of the two faces and is hence explanatory of both. However, since the two faces have little in common, the two groups have different coordinates on e1 but quasi-identical coordinates for the glasses within each subgroup. Projecting onto e1 hence offers a means to compute a metric of similarity across the two persons.

(Plot: projection onto e1 against e2 for the four groups of images.)

Exercise:

When projecting onto e1 and e3, we can separate the images of person 1 with and without glasses, as the eigenvector e3 embeds features distinctive primarily of person 1.

(Plot: projection onto e1 against e3 for the four groups of images.)

Exercise:

Design a method to find the groups when you no longer have the class labels.

(Plot: projection onto the first two principal components after PCA, without labels.)
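One possible answer, sketched below in Python: project the images onto the first principal components, then run a clustering algorithm (such as the K-means introduced next) on the low-dimensional coordinates. This is only a minimal sketch; the `images` array is a hypothetical stand-in for the face dataset, and the number of components and clusters are assumptions.

```python
# Minimal sketch, assuming `images` is an (n_samples, n_pixels) array of
# flattened face images (a placeholder here, not the slides' actual data).
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

images = np.random.rand(40, 64 * 64)                 # hypothetical data

proj = PCA(n_components=2).fit_transform(images)     # coordinates on e1, e2
labels = KMeans(n_clusters=2, n_init=10).fit_predict(proj)  # groups found without class labels
```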

Sensitivity to Prior Knowledge

Priors:

• The data cluster within a circle.

• There are 2 clusters.

(Figure: relevant data versus outliers/noise.)

Sensitivity to Prior Knowledge

Priors:

• The data follow a complex distribution.

• There are 3 clusters.

Clusters' Types

Globular clusters versus non-globular clusters: K-means produces globular clusters; DBSCAN produces non-globular clusters.

What is Good Clustering?

Requirements for good clustering:

• Discovery of clusters with arbitrary shape

• Ability to deal with noise and outliers

• Insensitivity to the ordering of input records

• Scalability

• Ability to handle high dimensionality

• Interpretability and reusability

How to cluster?

What choice of model (circle, ellipse) for the cluster? How many models?

K-means Clustering

What choice of model for the cluster? A circle. How many models? A fixed number, K = 2. Where to place them for optimal clustering?

K-means clustering generates a number K of disjoint clusters so as to minimize

J = \sum_{k=1}^{K} \sum_{i \in c_k} \| x^i - \mu_k \|^2

where x^i is the i-th data point, \mu_k is the geometric centroid of cluster k, and c_k is the cluster label (the set of points assigned to cluster k).

K-means Clustering

Initialization: initialize the positions of the centers of the clusters at random.

(In mldemos, each centroid is initialized on one data point, with no overlap across centroids.)

K-means Clustering

Assignment Step:

• Calculate the distance from each data point to each centroid.

• Assign each data point to its "closest" centroid. If a tie occurs (i.e., two centroids are equidistant from a data point), assign the data point to the equidistant centroid with the smallest index.

\hat{k}^i = \arg\min_k d(x^i, \mu_k)

Responsibility of cluster k for point x^i: r_k^i = 1 if k = \hat{k}^i, and 0 otherwise,

with x^i the i-th data point and \mu_k the geometric centroid of cluster k.
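As an illustration, a minimal NumPy sketch of this assignment step follows; the array names are illustrative, not from the slides. Note that `argmin` resolves ties by returning the smallest index, which matches the tie-breaking rule above.

```python
# Assignment-step sketch: X is (M, N) data, centroids is (K, N).
import numpy as np

def assign(X, centroids):
    # Squared Euclidean distance from every point to every centroid: shape (M, K)
    d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    k_hat = d.argmin(axis=1)            # closest centroid; ties -> smallest index
    r = np.zeros_like(d)                # one-hot responsibilities r_k^i
    r[np.arange(len(X)), k_hat] = 1.0
    return k_hat, r
```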

K-means Clustering

Update Step (M-step):

Recompute the position of each centroid from the points assigned to it:

\mu_k = \frac{\sum_i r_k^i x^i}{\sum_i r_k^i}

K-means Clustering

Assignment Step (repeated): with the updated centroids, recompute the distance from each data point to each centroid and re-assign each point to its closest centroid, as above.

K-means Clustering

Update Step (M-step): recompute the centroid positions from the new assignments.

Stopping Criterion: go back to the assignment step and repeat the process until the clusters are stable.

K-means Clustering

K-means creates a hard partitioning of the dataset. (Figure: the partition boundaries between clusters and their intersection points.)

Effect of the distance metric on K-means

(Panels: clustering results under the L1-norm, L2-norm, L3-norm, and L8-norm.)
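The effect shown in the panels can be reproduced by swapping the distance used in the assignment step. Below is a sketch of a pairwise Lp (Minkowski) distance helper, an assumption of how such an experiment could be coded; only the assignment metric is varied, while the centroid update is kept as the mean.

```python
# Sketch: pairwise Lp distance between points X (M, N) and centroids (K, N).
import numpy as np

def minkowski(X, centroids, p):
    return (np.abs(X[:, None, :] - centroids[None, :, :]) ** p).sum(axis=2) ** (1.0 / p)

# p = 1, 2, 3, 8 correspond to the four panels: the cluster boundaries change
# shape with p even though the algorithm is otherwise unchanged.
```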

K-means Clustering: Algorithm

1. Initialization: pick K arbitrary centroids and set their positions to random values (in mldemos, each centroid is initialized on one data point, with no overlap across centroids).

2. Calculate the distance from each data point x^i to each centroid \mu_k.

3. Assignment Step (E-step): assign each data point to its "closest" centroid: \hat{k}^i = \arg\min_k d(x^i, \mu_k), i.e. r_k^i = 1 if k = \hat{k}^i and 0 otherwise. If a tie occurs (two centroids equidistant from a data point), assign the point to the equidistant centroid with the smallest index.

4. Update Step (M-step): adjust each centroid to be the mean of the data points assigned to it: \mu_k = \sum_i r_k^i x^i / \sum_i r_k^i.

5. Go back to step 2 and repeat the process until the clusters are stable.
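A minimal, self-contained Python sketch of steps 1 to 5 is given below. It is one straightforward reading of the algorithm, not mldemos' implementation; names and defaults (e.g., `max_iter`) are assumptions.

```python
# K-means sketch (NumPy), following steps 1-5 above.
import numpy as np

def kmeans(X, K, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # 1. Initialization: each centroid on a distinct random data point
    centroids = X[rng.choice(len(X), size=K, replace=False)].copy()
    labels = None
    for _ in range(max_iter):
        # 2.-3. Assignment (E-step): closest centroid; ties -> smallest index
        d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = d.argmin(axis=1)
        # 5. Stop when the clusters are stable
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # 4. Update (M-step): each centroid becomes the mean of its points
        for k in range(K):
            if (labels == k).any():
                centroids[k] = X[labels == k].mean(axis=0)
    return centroids, labels
```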

K-means Clustering

The K-means algorithm is a simple version of Expectation-Maximization applied to a mixture of isotropic Gaussian functions (see next lecture).

K-means Clustering: Properties

• There are always K clusters.

• The clusters do not overlap (soft K-means relaxes this assumption, see the next slides).

• Each member of a cluster is closer to its own cluster's centroid than to any other cluster's centroid.

The algorithm is guaranteed to converge in a finite number of iterations, but it converges to a local optimum! It is hence very sensitive to the initialization of the centroids.

Soft K-means Clustering

Assignment Step (E-step):

• Calculate the distance from each data point to each centroid.

• Give each data point a soft "degree of assignment" to each of the means \mu_k:

r_k^i = \frac{e^{-\beta d(x^i, \mu_k)}}{\sum_{k'} e^{-\beta d(x^i, \mu_{k'})}}

r_k^i \in [0, 1] is the responsibility of cluster k for point x^i, normalized over the clusters: \sum_k r_k^i = 1.
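Numerically, this soft assignment is a softmax over negative scaled distances. A minimal sketch, assuming squared Euclidean distance for d:

```python
# Soft-responsibility sketch: X (M, N), centroids (K, N), beta = stiffness.
import numpy as np

def soft_responsibilities(X, centroids, beta):
    d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)  # (M, K)
    logits = -beta * d
    # subtract the row-max before exponentiating, for numerical stability
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)    # each row sums to 1
```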

Soft K-means Clustering

Update Step (M-step):

Recompute the position of each centroid based on the soft assignment of the points:

\mu_k = \frac{\sum_i r_k^i x^i}{\sum_i r_k^i}

The model parameters, i.e. the means, are adjusted to match the weighted sample means of the data points that they are responsible for. The update algorithm of soft K-means is identical to that of hard K-means, aside from the fact that the responsibilities to a particular cluster are now real numbers varying between 0 and 1.

Soft K-means Clustering

\beta is the stiffness; the associated length scale \sigma (with \beta = 1/\sigma^2) measures the disparity across clusters:

• small \beta ~ large \sigma

• large \beta ~ small \sigma

Soft K-means Clustering

(Figure: the soft K-means algorithm run with a small (left), medium (center), and large (right) stiffness \beta; the \beta values used are 10, 5, and 1.)

Soft K-means Clustering

(Figure: iterations of the soft K-means algorithm from the random initialization (left) to convergence (right), computed with \beta = 10.)

(Soft) K-means Clustering: Properties

Advantages:

• Computationally faster than other clustering techniques.

• Produces tighter clusters, especially if the clusters are globular.

• Guaranteed to converge.

Drawbacks:

• Does not work well with non-globular clusters.

• Sensitive to the choice of initial partition: different initial partitions can result in different final clusters.

• Assumes a fixed number K of clusters. It is therefore good practice to run the algorithm several times with different values of K, to determine the optimal number of clusters.


K-means Clustering: Weaknesses

• Unbalanced clusters: K-means takes into account only the distance between the means and the data points; it has no representation of the variance of the data within each cluster.

• Elongated clusters: K-means imposes a fixed shape (a sphere) on each cluster.

K-means Clustering: Weaknesses

K-means is very sensitive to the choice of the number of clusters K and to the initialization. (mldemos example)

K-means: Limitations

K-means is not able to reject outliers. (Figure: relevant data versus outliers/noise.)

K-means: Limitations

K-means is not able to reject outliers: it assigns all data points to a cluster, so outliers get assigned to the closest cluster. DBSCAN, in contrast, can detect outliers and can generate non-globular clusters.

Density-Based Spatial Clustering of Applications with Noise (DBSCAN)

Outlier detection:

1. Pick a data point at random.

2. Count the data points within a distance ε of it.

3. If this count is below mdata, mark the data point as an outlier.

4. Go back to 1.

(Figure: relevant data versus outliers/noise; ε is the neighborhood radius.)

DBSCAN (continued)

Growing clusters:

1. Pick a data point at random.

2. Find the data points within ε of it.

3. Assign each data point found to the same cluster.

4. Go back to 1.

(Figure: Cluster 1 forming.)

DBSCAN (continued)

1. Pick a data point at random.

2. Find the data points within ε of it.

3. Assign each data point found to the same cluster.

4. Merge two clusters if the distance between them is < ε.

(Figure: Cluster 1 and Cluster 2.)

DBSCAN (continued)

Hyperparameters:

• ε: size of the neighborhood

• mdata: minimum number of data points within the neighborhood
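For illustration, a minimal usage sketch with scikit-learn's DBSCAN, whose `eps` and `min_samples` parameters play the role of ε and mdata; the data array and parameter values are placeholders.

```python
# DBSCAN usage sketch; eps ~ the slides' epsilon, min_samples ~ mdata.
import numpy as np
from sklearn.cluster import DBSCAN

X = np.random.rand(200, 2)                       # placeholder data
db = DBSCAN(eps=0.1, min_samples=5).fit(X)

labels = db.labels_        # cluster index per point; -1 marks outliers (noise)
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
```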

Comparison: K-means / DBSCAN

                      K-means                  DBSCAN
Hyperparameters       K: number of clusters    ε: neighborhood size; mdata: min. number of data points
Computational cost    O(K·M)                   O(M·log(M)), with M the number of data points
Type of cluster       Globular                 Non-globular (arbitrary shapes, non-linear boundaries)
Robustness to noise   Not robust               Robust to outliers (within ε)

K-means is computationally cheap. However, it is not robust to noise and produces only globular clusters. DBSCAN is computationally more intensive, but it can automatically detect noise and produces clusters of arbitrary shape.

Both K-means and DBSCAN depend on a good choice of the hyperparameters. To determine the hyperparameters, use evaluation methods for clustering (next).

Evaluation of Clustering Methods

Clustering methods rely on hyperparameters: the number of clusters, the minimum number of elements in a cluster, the distance metric. We need to determine the goodness of these choices. But clustering is unsupervised classification: we do not know the real number of clusters or the data labels, so these choices are difficult to evaluate without ground truth.

Evaluation of Clustering Methods

Two types of measures: internal versus external.

Internal measures rely on measures of similarity: (low) intra-cluster distance versus (high) inter-cluster distance. Internal measures are problematic, as the metric of similarity is often already optimized by the clustering algorithm.

External measures rely on ground truth (class labels): given a (sub)set of known class labels, compute the similarity of the clusters to the class labels. For real-world data, it is often hard or infeasible to gather ground truth.

Internal Measure: RSS

The Residual Sum of Squares (RSS) is an internal measure (available in mldemos). It sums the squared distance (in 2-norm) of each data point from its centroid, over all clusters:

RSS = \sum_{k=1}^{K} \sum_{x \in C_k} \| x - \mu_k \|^2
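A direct translation of this formula, as a sketch with illustrative names:

```python
# RSS sketch: X (M, N) data, labels (M,) cluster indices, centroids (K, N).
import numpy as np

def rss(X, labels, centroids):
    return sum(((X[labels == k] - centroids[k]) ** 2).sum()
               for k in range(len(centroids)))
```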

RSS for K-Means

The goal of K-means is to find the cluster centers \mu_k which minimize the distortion, measured by the RSS.

By increasing K we decrease RSS; what, then, is the optimal K, given that RSS → 0? RSS = 0 when K = M: one has as many clusters as data points! (Figure: M = 100 data points, N = 2 dimensions; with K = M clusters, RSS = 0.)

However, RSS can still be used to determine an "optimal" K, by monitoring the slope of the decrease of the measure as K increases.

K-means Clustering: Examples

Procedure: run K-means while increasing the number of clusters monotonically; for each value, run K-means with several initializations and take the best run; use the RSS measure to quantify the improvement in clustering and determine a plateau. The optimal k is at the "elbow" of the curve.

(Figure: RSS versus k; M = 100 data points, N = 2 dimensions, elbow at k = 4 clusters.)
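A sketch of this procedure using scikit-learn, where `inertia_` is exactly the RSS and `n_init` gives the several-initializations-take-the-best behaviour; the data and the range of K are placeholders.

```python
# Elbow-method sketch: RSS (inertia_) as a function of K.
import numpy as np
from sklearn.cluster import KMeans

X = np.random.rand(100, 2)                       # placeholder: M = 100, N = 2
rss_curve = []
for K in range(1, 11):
    km = KMeans(n_clusters=K, n_init=10).fit(X)  # best of 10 random restarts
    rss_curve.append(km.inertia_)
# Plot rss_curve against K and look for the elbow where the slope flattens.
```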

K-means with RSS: Examples

Cluster analysis of hedge funds (fonds spéculatifs) [N. Das, 9th Int. Conf. on Computing in Economics and Finance, 2011].

There is no legal definition of hedge funds: they form a wide category of investment funds with high risk and high returns, and a variety of strategies guiding the investment.

Research question: classify the type of hedge fund based on the information provided to the client.

Data dimensions (features): e.g., asset class, size of the hedge fund, incentive fee, risk level, and liquidity of the hedge fund.

K-means with RSS: Examples (continued)

Procedure: run K-means while increasing the number of clusters monotonically; run K-means with several initializations and take the best run; use the RSS measure to quantify the improvement in clustering and determine a plateau (cutoff).

(Figure: RSS versus the number of clusters K; optimal results are found with 7 clusters.)

K-means Clustering: Examples

The "elbow" or "plateau" method for choosing the optimal k from the RSS curve can be unreliable for certain datasets. (Figure: M = 100 data points, N = 3 dimensions; plausible elbows at both k = 2 and k = 11.) Which one is the "optimal" k? We don't know! We need an additional penalty or criterion.

Other Metrics to Evaluate Clustering Methods

AIC and BIC determine how well the model fits the dataset in a probabilistic sense (a maximum-likelihood measure). The measure is balanced by how many parameters are needed to get a good fit:

• Akaike Information Criterion: AIC = -2 ln(L) + 2B

• Bayesian Information Criterion: BIC = -2 ln(L) + ln(M) B

with L the maximum likelihood of the model, B the number of free parameters, and M the number of data points. The second term is a penalty for the increase in computational cost due to the number of parameters and the number of data points.

As the number of data points (observations) increases, BIC assigns more weight to simpler models than AIC does. A low BIC implies either fewer explanatory variables, a better fit, or both.

Choosing AIC versus BIC depends on the application: is the purpose of the analysis to make predictions, or to decide which model best represents reality? AIC may have better predictive ability than BIC, but BIC finds a computationally more efficient solution.

AIC for K-Means

For the particular case of K-means, we do not have a maximum-likelihood estimate of the model, so AIC = -2 ln(L) + 2B cannot be computed directly. However, we can formulate a metric based on the RSS that penalizes model complexity (the number K of clusters), conceptually following AIC:

AIC_RSS = RSS + B

with RSS = \sum_{k=1}^{K} \sum_{x \in C_k} \|x - \mu_k\|^2 and the number of free parameters B = K·N (K: number of clusters, N: number of dimensions) acting as the weighting factor.

BIC for K-Means

Likewise, lacking a maximum-likelihood estimate for K-means, BIC = -2 ln(L) + ln(M) B becomes an RSS-based metric that penalizes model complexity (K clusters, M data points), conceptually following BIC:

BIC_RSS = RSS + ln(M) B

with B = K·N as before. The weighting factor ln(M) penalizes with respect to the number of data points (i.e., computational complexity).
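Both RSS-based criteria are one-liners; a sketch with illustrative names, using B = K·N as on the slides:

```python
# AIC_RSS and BIC_RSS sketches, with B = K * N free parameters.
import numpy as np

def aic_rss(rss_val, K, N):
    return rss_val + K * N                   # AIC_RSS = RSS + B

def bic_rss(rss_val, K, N, M):
    return rss_val + np.log(M) * K * N       # BIC_RSS = RSS + ln(M) * B

# Choose the K that minimizes the criterion, e.g.
# best_K = min(candidate_Ks, key=lambda K: bic_rss(rss_of[K], K, N, M))
```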

K-means Clustering: Examples

Procedure: run K-means while increasing the number of clusters monotonically; run K-means with several initializations and take the best run; use the AIC/BIC curves to find the optimal k, which is at min(AIC) or min(BIC).

(Figure: M = 100 data points, N = 3 dimensions; both min(BIC) and min(AIC) give k = 2 clusters.)

BIC for K-Means

BIC_RSS = RSS + ln(M)(K·N)

(Figure: M = 100 data points, N = 2 dimensions; a run with K = 14 clusters.)

BIC for K-Means

BIC_RSS = RSS + ln(M)(K·N)

(Figure: M = 100 data points, N = 2 dimensions; a run with K = 4 clusters.)

AIC / BIC for DBSCAN

Compute the centroid of each cluster and apply the AIC/BIC of K-means:

        DBSCAN, large ε   DBSCAN, medium ε   DBSCAN, small ε
RSS     43                26                 0.5
BIC     42                34                 78
AIC     69                51                 24
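A sketch of that recipe: run DBSCAN, take the mean of each cluster as its centroid (ignoring the noise label -1), and plug the resulting RSS into the K-means criteria. The data and parameter values are placeholders.

```python
# Applying the K-means BIC_RSS to DBSCAN clusters (sketch).
import numpy as np
from sklearn.cluster import DBSCAN

X = np.random.rand(200, 2)                          # placeholder data
labels = DBSCAN(eps=0.1, min_samples=5).fit_predict(X)

ks = [k for k in np.unique(labels) if k != -1]      # clusters, noise excluded
centroids = np.array([X[labels == k].mean(axis=0) for k in ks])

rss_val = sum(((X[labels == k] - centroids[i]) ** 2).sum()
              for i, k in enumerate(ks))
K, N, M = len(ks), X.shape[1], len(X)
bic = rss_val + np.log(M) * K * N                   # BIC_RSS with B = K * N
```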

AIC / BIC for DBSCAN

Compute the centroid of each cluster and apply the AIC/BIC of K-means:

        K-means   DBSCAN, large ε   DBSCAN, medium ε   DBSCAN, small ε
RSS     51        95                59                 0.6
BIC     65        118               88                 331
AIC     55        102               67                 93

Evaluation of Clustering Methods

Two types of measures: internal versus external.

External measures assume that a subset of the data points has class labels (semi-supervised learning). They measure how well these data points are clustered. One needs an idea of the number of existing classes and must have labeled some data points. This is of interest mainly when labeling is highly time-consuming and the dataset is very large (e.g., in speech recognition).

Semi-Supervised Learning

Clustering F1-measure (careful: similar to, but not the same as, the F-measure we will see for classification!)

It trades off clustering all data points of the same class into the same cluster against making sure that each cluster contains points of only one class:

F(C, K) = \sum_{c_i \in C} \frac{|c_i|}{M} \max_k F(c_i, k)

F(c_i, k) = \frac{2 R(c_i, k) P(c_i, k)}{R(c_i, k) + P(c_i, k)}

R(c_i, k) = \frac{n_{ik}}{|c_i|}, \qquad P(c_i, k) = \frac{n_{ik}}{|k|}

with M the number of labeled data points, C = \{c_i\} the set of classes, K the number of clusters, and n_{ik} the number of members of class c_i that are in cluster k.
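A sketch of this measure over the labeled subset only; it assumes that both |c_i| and |k| are counted among the labeled points (adapt |k| if cluster sizes should also include unlabeled members, as in the example on the next slides).

```python
# Clustering F1 sketch: y = true class labels, z = cluster indices (length M).
import numpy as np

def clustering_f1(y, z):
    M = len(y)
    score = 0.0
    for c in np.unique(y):
        in_c = (y == c)
        best = 0.0
        for k in np.unique(z):
            in_k = (z == k)
            n_ck = np.logical_and(in_c, in_k).sum()
            if n_ck == 0:
                continue
            R = n_ck / in_c.sum()            # recall R(c, k)
            P = n_ck / in_k.sum()            # precision P(c, k)
            best = max(best, 2 * R * P / (R + P))
        score += in_c.sum() / M * best       # weight class by |c| / M
    return score
```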

Recall R(c_i, k): the proportion of the data points of a class that are correctly clustered.

Precision P(c_i, k): the proportion of data points in the cluster that belong to the same class.

(Figure: two classes, Class 1 and Class 2, with labeled and unlabeled points.)

In the example: R(c_1, k = 1) = 2/2, R(c_2, k = 2) = 4/4; P(c_1, k = 1) = 2/6, P(c_2, k = 2) = 4/6.

The measure picks, for each class, the cluster with the maximal F1 score, and weighs each class by its fraction of the labeled points:

F(C, K) = (2/6) F(c_1, 1) + (4/6) F(c_2, 2) ≈ 0.7

(Figure: the same two classes, Class 1 and Class 2, with labeled and unlabeled points.)

Summary of F1-Measure

The clustering F1-measure (careful: similar to, but not the same as, the F-measure we will see for classification) trades off clustering all data points of the same class into the same cluster against making sure that each cluster contains points of only one class. For each class it picks the cluster with the maximal F1 score; recall is the proportion of a class's data points correctly clustered, precision is the proportion of points of the same class within the cluster, and each class's contribution is weighted by its fraction of the labeled points.

Summary of Lecture

Introduced two clustering techniques: K-means and DBSCAN. Discussed their pros and cons in terms of computational time and power of representation (globular/non-globular clusters).

Introduced metrics to evaluate clustering and to help choose the hyperparameters:

• Internal measures: RSS, AIC, BIC

• External measures: F1-measure (also called the F-measure for clustering)

Next week, practical on clustering: you will compare the performance of K-means and DBSCAN on your datasets and use the internal and external measures to assess this performance and to choose the hyperparameters.

Robotic Application of Clustering Methods

Humans display a variety of hand postures when grasping objects. How can we generate correct hand postures on robots?

El-Khoury, S., Li, M. and Billard, A. (2013) On the Generation of a Variety of Grasps. Robotics and Autonomous Systems.

Robotic Application of Clustering Methods

Platforms: a 4-DOF industrial hand (Barrett Technology) and a 9-DOF humanoid hand (the iCub robot).

Problem: choose the points of contact and generate a feasible posture for the fingers, to touch the object at the correct points and with the desired force.

Difficulty: many degrees of freedom (a large number of possible contact points and a large number of DOFs to control).

Formulate the problem as constraint-based optimization: minimize the generated torques at the fingertips under the constraints of:

• force closure

• kinematic feasibility

• collision avoidance

The non-convex optimization yields several local/feasible solutions. For one hand, 1890 trials converged to 791 feasible solutions at roughly 2.65 s per solution; for the other, 1890 trials converged to 612 feasible solutions at roughly 12.14 s per solution. This took too long for a realistic application.

Apply K-means to all solutions and group them into clusters. (Figures: groupings with 11 clusters and with 20 clusters.)

A. Shukla and A. Billard, NIPS 2012
