
Page 1: Clustering

Sriram Sankararaman

(Adapted from slides by Junming Yin)

Page 2: Outline

• Introduction
  • Unsupervised learning
  • What is cluster analysis?
  • Applications of clustering
• Dissimilarity (similarity) of samples
• Clustering algorithms
  • K-means
  • Gaussian mixture model (GMM)
  • Hierarchical clustering
  • Spectral clustering

Page 3: Unsupervised Learning

• Recall that in the setting of classification and regression, the training data are represented as pairs $\{(x_i, y_i)\}_{i=1}^{n}$, and the goal is to learn a function $f$ that predicts $y$ given $x$ (supervised learning).
• In the unsupervised setting, we only have unlabelled data $\{x_i\}_{i=1}^{n}$. Can we infer some properties of the distribution of X?

Page 4: Why do Unsupervised Learning?

• Raw data are cheap, but labeling them can be costly.
• The data may lie in a high-dimensional space. We might find some low-dimensional features that are sufficient to describe the samples (next lecture).
• In the early stages of an investigation, it may be valuable to perform exploratory data analysis and gain some insight into the nature or structure of the data.
• Cluster analysis is one method for unsupervised learning.

Page 5: What is Cluster Analysis?

• Cluster analysis aims to discover clusters, or groups of samples, such that samples within the same group are more similar to each other than they are to the samples of other groups. It requires:
  • A dissimilarity (similarity) function between samples.
  • A loss function to evaluate a grouping of samples into clusters.
  • An algorithm that optimizes this loss function.

Page 6: Outline

• Introduction
  • Unsupervised learning
  • What is cluster analysis?
  • Applications of clustering
• Dissimilarity (similarity) of samples
• Clustering algorithms
  • K-means
  • Gaussian mixture model (GMM)
  • Hierarchical clustering
  • Spectral clustering

Page 7: Image Segmentation

http://people.cs.uchicago.edu/~pff/segment/

Page 8: Clustering Search Results

Page 9: Clustering Gene Expression Data

Eisen et al., PNAS 1998

Page 10: Vector Quantization to Compress Images

Bishop, PRML

Page 11: Outline

• Introduction
  • Unsupervised learning
  • What is cluster analysis?
  • Applications of clustering
• Dissimilarity (similarity) of samples
• Clustering algorithms
  • K-means
  • Gaussian mixture model (GMM)
  • Hierarchical clustering
  • Spectral clustering

Page 12: Dissimilarity of Samples

• The natural question now is: how should we measure the dissimilarity between samples?
• The clustering results depend on the choice of dissimilarity.
  • It usually comes from subject-matter considerations.
• We need to consider the type of the features:
  • quantitative, ordinal, categorical.
• It is possible to learn the dissimilarity from data for a particular application (later).

Page 13: Dissimilarity Based on Features

• Most of the time, data have measurements on features.
• A common choice of dissimilarity function between samples is the Euclidean distance.
• Clusters defined by Euclidean distance are invariant to translations and rotations in feature space, but not invariant to scaling of the features.
• One way to standardize the data: translate and scale the features so that all of them have zero mean and unit variance (sketched below).
• BE CAREFUL! This is not always desirable.
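This standardization is easy to state in code. A minimal NumPy sketch (the function name and the zero-variance guard are illustrative choices, not from the slides):

```python
import numpy as np

def standardize(X):
    """Translate and scale each feature (column) of X to zero mean, unit variance."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    sigma = np.where(sigma == 0, 1.0, sigma)  # guard: leave constant features alone
    return (X - mu) / sigma
```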

Page 14: Standardization Not Always Helpful

[Figure: simulated data, 2-means without standardization]
[Figure: simulated data, 2-means with standardization]

Page 15: Outline

• Introduction
  • Unsupervised learning
  • What is cluster analysis?
  • Applications of clustering
• Dissimilarity (similarity) of samples
• Clustering algorithms
  • K-means
  • Gaussian mixture model (GMM)
  • Hierarchical clustering
  • Spectral clustering

Page 16: K-means: Idea

• Represent the data set in terms of K clusters, each of which is summarized by a prototype $\mu_k$.
• Each data point is assigned to one of the K clusters.
  • Assignments are represented by responsibilities $r_{ik} \in \{0, 1\}$ such that $\sum_{k=1}^{K} r_{ik} = 1$ for all data indices $i$.
• Example: 4 data points and 3 clusters.

Page 17: K-means: Idea

• Loss function: the sum of squared distances from each data point to its assigned prototype (equivalent to the within-cluster scatter):

$$J = \sum_{i=1}^{n} \sum_{k=1}^{K} r_{ik} \, \| x_i - \mu_k \|^2$$

where the $\mu_k$ are the prototypes, the $r_{ik}$ are the responsibilities, and the $x_i$ are the data.

Page 18: Minimizing the Loss Function

• A chicken-and-egg problem:
  • If the prototypes are known, we can assign responsibilities.
  • If the responsibilities are known, we can compute the optimal prototypes.
• We minimize the loss function by an iterative procedure.
• Other ways to minimize the loss function include a merge-split approach.

Page 19: Minimizing the Loss Function

• E-step: fix the values of $\{\mu_k\}$ and minimize $J$ w.r.t. $\{r_{ik}\}$.
  • Assign each data point to its nearest prototype.
• M-step: fix the values of $\{r_{ik}\}$ and minimize $J$ w.r.t. $\{\mu_k\}$. This gives

$$\mu_k = \frac{\sum_i r_{ik} \, x_i}{\sum_i r_{ik}}$$

  • Each prototype is set to the mean of the points in that cluster (the full procedure is sketched in code below).
• Convergence is guaranteed since there is a finite number of possible settings for the responsibilities.
• The algorithm can only find a local minimum, so we should start it with many different initial settings.
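The two alternating steps can be written compactly in NumPy. This is an illustrative sketch, not the lecture's reference code; it uses a simple random initialization and stops when the prototypes stop moving:

```python
import numpy as np

def kmeans(X, K, n_iters=100, seed=0):
    """Minimize J by alternating the E-step and M-step described above."""
    rng = np.random.default_rng(seed)
    mu = X[rng.choice(len(X), size=K, replace=False)]  # random initial prototypes
    z = np.zeros(len(X), dtype=int)
    for _ in range(n_iters):
        # E-step: assign each point to its nearest prototype
        d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
        z = d2.argmin(axis=1)
        # M-step: move each prototype to the mean of its assigned points
        new_mu = np.array([X[z == k].mean(axis=0) if np.any(z == k) else mu[k]
                           for k in range(K)])
        if np.allclose(new_mu, mu):
            break
        mu = new_mu
    return mu, z
```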

Pages 20-28: [Figures: successive E and M steps of K-means on example data]

Page 29: The Cost Function After Each E and M Step

[Figure: the value of the loss function after each E step and M step]

Page 30: How to Choose K?

• In some cases it is known a priori from the problem domain.
• Generally, it has to be estimated from the data; in practice it is usually selected by some heuristic.
  • Recall the choice of the parameter K in nearest-neighbor methods.
• The loss function J generally decreases with increasing K.
• Idea: assume that K* is the right number.
  • We assume that for K < K* each estimated cluster contains a subset of the true underlying groups.
  • For K > K* some natural groups must be split.
  • Thus we assume that for K < K* the cost function falls substantially, and afterwards not by much more (checked numerically below).
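This "substantial fall, then flattening" heuristic can be checked numerically. A sketch reusing the kmeans function above, on made-up data with three true groups:

```python
import numpy as np

def kmeans_loss(X, mu, z):
    """Within-cluster scatter: the K-means loss J."""
    return sum(((X[z == k] - mu[k]) ** 2).sum() for k in range(len(mu)))

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 0.3, size=(50, 2)) for c in ((0, 0), (3, 0), (0, 3))])
for K in range(1, 7):
    mu, z = kmeans(X, K)
    print(K, round(kmeans_loss(X, mu, z), 1))  # J should drop sharply up to K* = 3
```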

Page 31: How to Choose K?

• The gap statistic provides a more principled way of setting K.

[Figure: the gap statistic on example data, suggesting K = 2]

Page 32: Initializing K-means

• K-means converges to a local optimum.
• The clusters produced will depend on the initialization.
• Some heuristics:
  • Randomly pick K points as prototypes.
  • A greedy strategy: pick each new prototype $\mu_k$ so that it is farthest from the already-chosen prototypes $\mu_1, \ldots, \mu_{k-1}$ (sketched below).
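The greedy strategy is sometimes called farthest-first traversal. A sketch (the function name is illustrative):

```python
import numpy as np

def farthest_first_init(X, K, seed=0):
    """Pick the first prototype at random; each subsequent prototype is the
    data point farthest from all prototypes chosen so far."""
    rng = np.random.default_rng(seed)
    idx = [int(rng.integers(len(X)))]
    for _ in range(K - 1):
        d2 = ((X[:, None, :] - X[idx][None, :, :]) ** 2).sum(axis=2)
        idx.append(int(d2.min(axis=1).argmax()))  # farthest from its nearest prototype
    return X[idx]
```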

Page 33: Limitations of K-means

• Hard assignment of data points to clusters.
  • A small shift of a data point can flip it to a different cluster.
  • Solution: replace the hard assignments of K-means with soft probabilistic assignments (GMM).
• Assumes spherical clusters and equal probabilities for each cluster.
  • Solution: GMM.
• Clusterings for different values of K can be arbitrarily different.
  • As K is increased, cluster memberships can change in an arbitrary way; the clusters are not necessarily nested.
  • Solution: hierarchical clustering.
• Sensitive to outliers.
  • Solution: use a different loss function (K-medoids).
• Works poorly on non-convex clusters.
  • Solution: spectral clustering.

Page 34: Outline

• Introduction
  • Unsupervised learning
  • What is cluster analysis?
  • Applications of clustering
• Dissimilarity (similarity) of samples
• Clustering algorithms
  • K-means
  • Gaussian mixture model (GMM)
  • Hierarchical clustering
  • Spectral clustering

Page 35: The Gaussian Distribution

• Multivariate Gaussian with mean $\mu$ and covariance $\Sigma$:

$$\mathcal{N}(x \mid \mu, \Sigma) = \frac{1}{(2\pi)^{d/2} |\Sigma|^{1/2}} \exp\left( -\frac{1}{2} (x - \mu)^\top \Sigma^{-1} (x - \mu) \right)$$

• Maximum likelihood estimation (sketched in code below):

$$\hat{\mu} = \frac{1}{n} \sum_{i=1}^{n} x_i, \qquad \hat{\Sigma} = \frac{1}{n} \sum_{i=1}^{n} (x_i - \hat{\mu})(x_i - \hat{\mu})^\top$$
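The two estimates in NumPy, as a minimal sketch:

```python
import numpy as np

def gaussian_mle(X):
    """ML estimates of the mean and covariance of a multivariate Gaussian."""
    mu = X.mean(axis=0)
    centered = X - mu
    Sigma = centered.T @ centered / len(X)  # note the 1/n factor, not 1/(n-1)
    return mu, Sigma
```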

Page 36: Gaussian Mixture

• Linear combination of Gaussians:

$$p(x) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k)$$

where the mixing coefficients satisfy $0 \le \pi_k \le 1$ and $\sum_{k=1}^{K} \pi_k = 1$.

• The parameters to be estimated are $\{\pi_k, \mu_k, \Sigma_k\}_{k=1}^{K}$.

Page 37: Gaussian Mixture

• To generate a data point (sketched in code below):
  • first pick one of the K components with probability $\pi_k$;
  • then draw a sample from that component's distribution $\mathcal{N}(\mu_k, \Sigma_k)$.
• Each data point $x_i$ is generated by one of the K components; a latent variable $z_i \in \{1, \ldots, K\}$ is associated with each $x_i$.
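This two-stage generative process translates directly into code. A sketch with made-up parameters for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
pi = np.array([0.5, 0.3, 0.2])                        # mixing coefficients
mus = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 3.0]])  # component means
Sigmas = np.array([np.eye(2) * 0.2] * 3)              # component covariances

def sample_gmm(n):
    """First pick a component per point with probability pi, then sample from it."""
    z = rng.choice(len(pi), size=n, p=pi)              # latent variables
    X = np.array([rng.multivariate_normal(mus[k], Sigmas[k]) for k in z])
    return X, z
```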

Page 38: Synthetic Data Set Without Colours

[Figure: samples from a Gaussian mixture shown without the latent component labels]

Page 39: Gaussian Mixture

• Loss function: the negative log likelihood of the data. Equivalently, maximize the log likelihood:

$$\ell(\theta) = \sum_{i=1}^{n} \log \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x_i \mid \mu_k, \Sigma_k)$$

• Without knowing the values of the latent variables, we have to maximize this incomplete log likelihood.
  • The sum over components appears inside the logarithm, so there is no closed-form solution.

Page 40: Fitting the Gaussian Mixture

• Given the complete data set $\{(x_i, z_i)\}_{i=1}^{n}$, maximize the complete log likelihood.
  • Trivial closed-form solution: fit each component to the corresponding set of data points.
• Observe that if all the $\pi_k$ and all the $\Sigma_k$ are equal, then the complete log likelihood is exactly the loss function used in K-means.
• We need a procedure that lets us optimize the incomplete log likelihood by working with the (easier) complete log likelihood instead.

Page 41: The Expectation-Maximization (EM) Algorithm

• E-step: for given parameter values we can compute the expected values of the latent variables (the responsibilities of the data points), via Bayes' rule:

$$\gamma(z_{ik}) = p(z_i = k \mid x_i) = \frac{\pi_k \, \mathcal{N}(x_i \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j \, \mathcal{N}(x_i \mid \mu_j, \Sigma_j)}$$

• Note that $\gamma(z_{ik}) \in [0, 1]$ instead of $\{0, 1\}$, but we still have $\sum_{k=1}^{K} \gamma(z_{ik}) = 1$.

Page 42: The EM Algorithm

• M-step: maximize the expected complete log likelihood.
• Parameter updates (the full EM loop is sketched in code below):

$$\mu_k = \frac{\sum_i \gamma(z_{ik}) \, x_i}{\sum_i \gamma(z_{ik})}, \qquad \Sigma_k = \frac{\sum_i \gamma(z_{ik}) \, (x_i - \mu_k)(x_i - \mu_k)^\top}{\sum_i \gamma(z_{ik})}, \qquad \pi_k = \frac{1}{n} \sum_i \gamma(z_{ik})$$
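The E- and M-steps above, as a compact NumPy/SciPy sketch. This is illustrative: it initializes from random data points and omits the numerical-stability tricks (e.g., log-sum-exp, covariance regularization) a robust implementation would need:

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, K, n_iters=100, seed=0):
    """EM for a Gaussian mixture: alternate responsibilities and parameter updates."""
    n, _ = X.shape
    rng = np.random.default_rng(seed)
    pi = np.full(K, 1.0 / K)
    mu = X[rng.choice(n, size=K, replace=False)]
    Sigma = np.array([np.cov(X.T) for _ in range(K)])
    for _ in range(n_iters):
        # E-step: responsibilities via Bayes' rule
        dens = np.column_stack([pi[k] * multivariate_normal.pdf(X, mu[k], Sigma[k])
                                for k in range(K)])
        gamma = dens / dens.sum(axis=1, keepdims=True)
        # M-step: weighted means, covariances, and mixing coefficients
        Nk = gamma.sum(axis=0)
        mu = (gamma.T @ X) / Nk[:, None]
        for k in range(K):
            diff = X - mu[k]
            Sigma[k] = (gamma[:, k, None] * diff).T @ diff / Nk[k]
        pi = Nk / n
    return pi, mu, Sigma, gamma
```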

Page 43: The EM Algorithm

• Iterate the E-step and M-step until the log likelihood of the data no longer increases.
• Converges to a local optimum.
  • Need to restart the algorithm with different initial guesses of the parameters (as in K-means).
• Relation to K-means:
  • Consider a GMM with common covariance $\Sigma_k = \sigma^2 I$.
  • As $\sigma^2 \to 0$, the two methods coincide.

Pages 44-49: [Figures: successive EM iterations of a Gaussian mixture on example data]

Page 50: K-means vs GMM

K-means:
• Loss function: minimize the sum of squared Euclidean distances.
• Can be optimized by an EM-style algorithm.
  • E-step: assign points to clusters.
  • M-step: optimize the clusters.
• Performs hard assignment during the E-step.
• Assumes spherical clusters with equal probability of each cluster.

GMM:
• Loss function: minimize the negative log likelihood.
• EM algorithm.
  • E-step: compute the posterior probability of membership.
  • M-step: optimize the parameters.
• Performs soft assignment during the E-step.
• Can be used for non-spherical clusters, and can generate clusters with different probabilities.

Page 51: K-medoids

• K-means is not robust.
  • The squared Euclidean distance gives greater weight to more distant points.
• Only the dissimilarity matrix may be given, not the attributes.
• The attributes may not be quantitative.

Page 52: K-medoids

• Restrict each prototype to be one of the data points assigned to its cluster.
• E-step: fix the prototypes and minimize w.r.t. the responsibilities $\{r_{ik}\}$.
  • Assign each data point to its nearest prototype.
• M-step: fix the values of $\{r_{ik}\}$ and minimize w.r.t. the prototypes, searching only over the data points assigned to each cluster.

Page 53: K-medoids: Example

• Use the L1 distance instead of the squared Euclidean distance (sketched in code below).
• The prototype is then the median of the points in a cluster.
• Recall the connection between linear and L1 regression.
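A sketch of K-medoids with the L1 distance, under the deck's restriction that prototypes are data points (the pairwise-distance matrix D could equally come from any dissimilarity):

```python
import numpy as np

def kmedoids_l1(X, K, n_iters=100, seed=0):
    """K-medoids: prototypes are data points; here with L1 (Manhattan) distance."""
    rng = np.random.default_rng(seed)
    medoids = rng.choice(len(X), size=K, replace=False)
    D = np.abs(X[:, None, :] - X[None, :, :]).sum(axis=2)  # pairwise L1 distances
    for _ in range(n_iters):
        z = D[:, medoids].argmin(axis=1)  # E-step: assign to the nearest medoid
        new_medoids = medoids.copy()
        for k in range(K):                # M-step: best in-cluster point as medoid
            members = np.where(z == k)[0]
            if len(members) > 0:
                within = D[np.ix_(members, members)].sum(axis=1)
                new_medoids[k] = members[within.argmin()]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    return medoids, z
```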

Page 54: Outline

• Introduction
  • Unsupervised learning
  • What is cluster analysis?
  • Applications of clustering
• Dissimilarity (similarity) of samples
• Clustering algorithms
  • K-means
  • Gaussian mixture model (GMM)
  • Hierarchical clustering
  • Spectral clustering

Page 55: Hierarchical Clustering

• Organize the clusters in a hierarchical way.
• Produces a rooted (binary) tree (dendrogram).

[Figure: dendrogram over points a-e; agglomerative clustering merges from Step 0 to Step 4, divisive clustering splits in the reverse direction]

Page 56: Hierarchical Clustering

• Two kinds of strategy:
  • Bottom-up (agglomerative): recursively merge the two groups with the smallest between-cluster dissimilarity (defined later on).
  • Top-down (divisive): at each step, split the least coherent cluster (e.g., the one with the largest diameter); splitting a cluster is itself a clustering problem (usually done in a greedy way); less popular than the bottom-up approach.

[Figure: the same dendrogram over points a-e as on the previous slide]

Page 57: Hierarchical Clustering

• The user can choose a cut through the hierarchy to represent the most natural division into clusters.
  • E.g., choose the cut where the intergroup dissimilarity exceeds some threshold.

[Figure: the same dendrogram, with cuts yielding 3 and 2 clusters]

Page 58: Hierarchical Clustering

• We have to measure the dissimilarity $d(G, H)$ of two disjoint groups G and H; it is computed from the pairwise dissimilarities $d_{ii'}$ with $i \in G$ and $i' \in H$ (examples below).
• Single linkage: $d_{SL}(G, H) = \min_{i \in G,\, i' \in H} d_{ii'}$. Tends to yield extended clusters.
• Complete linkage: $d_{CL}(G, H) = \max_{i \in G,\, i' \in H} d_{ii'}$. Tends to yield round clusters.
• Group average: $d_{GA}(G, H) = \frac{1}{|G|\,|H|} \sum_{i \in G} \sum_{i' \in H} d_{ii'}$. A tradeoff between the two. Not invariant under monotone increasing transforms.
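All three linkage rules are available off the shelf, e.g. in SciPy. A small sketch on made-up two-group data:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (20, 2)), rng.normal(3, 0.3, (20, 2))])

for method in ("single", "complete", "average"):
    Z = linkage(X, method=method)                    # the merge tree (dendrogram)
    labels = fcluster(Z, t=2, criterion="maxclust")  # cut the tree into 2 clusters
    print(method, np.bincount(labels)[1:])           # cluster sizes
```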

Page 59: Example: Human Tumor Microarray Data

• A 6830 × 64 matrix of real numbers.
• Rows correspond to genes, columns to tissue samples.
• Clustering the rows (genes) can help deduce the functions of unknown genes from known genes with similar expression profiles.
• Clustering the columns (samples) can identify disease profiles: tissues with similar diseases should yield similar expression profiles.

[Figure: gene expression matrix]

Page 60: Example: Human Tumor Microarray Data

• A 6830 × 64 matrix of real numbers.
• Group-average (GA) clustering of the microarray data:
  • Applied separately to the rows and the columns.
  • Subtrees with tighter clusters are placed on the left.
  • Produces a more informative picture of genes and samples than randomly ordered rows and columns.

Page 61: Outline

• Introduction
  • Unsupervised learning
  • What is cluster analysis?
  • Applications of clustering
• Dissimilarity (similarity) of samples
• Clustering algorithms
  • K-means
  • Gaussian mixture model (GMM)
  • Hierarchical clustering
  • Spectral clustering

Page 62: Spectral Clustering

• Represent the data points as the vertices V of a graph G.
• All pairs of vertices are connected by an edge E.
• Edges have weights W.
  • Large weights mean that the adjacent vertices are very similar; small weights imply dissimilarity.

Page 63: Graph Partitioning

• Clustering on a graph is equivalent to partitioning the vertices of the graph.
• A loss function for a partition of V into sets A and B:

$$\mathrm{cut}(A, B) = \sum_{i \in A,\, j \in B} w_{ij}$$

• In a good partition, vertices in different partitions will be dissimilar.
• Mincut criterion: find the partition that minimizes $\mathrm{cut}(A, B)$.

Page 64: Graph Partitioning

• The mincut criterion ignores the sizes of the subgraphs formed.
• The normalized cut criterion favors balanced partitions:

$$\mathrm{Ncut}(A, B) = \frac{\mathrm{cut}(A, B)}{\mathrm{vol}(A)} + \frac{\mathrm{cut}(A, B)}{\mathrm{vol}(B)}, \qquad \mathrm{vol}(A) = \sum_{i \in A} \sum_{j \in V} w_{ij}$$

• Minimizing the normalized cut criterion exactly is NP-hard.

Page 65: Spectral Clustering

• One way of approximately optimizing the normalized cut criterion leads to spectral clustering.
• Spectral clustering:
  • Find a new representation of the original data points.
  • Cluster the points in this representation using any clustering scheme (say, 2-means).
• The representation is formed by row-normalizing the matrix whose columns are the largest 2 eigenvectors of the matrix $D^{-1/2} W D^{-1/2}$, where $D$ is the diagonal degree matrix with $D_{ii} = \sum_j w_{ij}$ (sketched below).
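A compact sketch of this pipeline, assuming a Gaussian affinity for W (the bandwidth sigma is an illustrative choice) and reusing the kmeans sketch from earlier:

```python
import numpy as np

def spectral_clustering(X, K=2, sigma=0.5):
    """Embed points via the top K eigenvectors of D^{-1/2} W D^{-1/2}, then cluster."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    W = np.exp(-d2 / (2 * sigma ** 2))                 # Gaussian affinity
    np.fill_diagonal(W, 0.0)
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    L = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]  # D^{-1/2} W D^{-1/2}
    _, vecs = np.linalg.eigh(L)                        # eigenvalues ascending
    U = vecs[:, -K:]                                   # largest K eigenvectors
    U = U / np.linalg.norm(U, axis=1, keepdims=True)   # row-normalize
    _, z = kmeans(U, K)                                # any clustering scheme works
    return z
```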

Page 66: Example: 2-means

[Figure: result of 2-means on an example data set]

Page 67: Example: Spectral Clustering

[Figure: result of spectral clustering on an example data set]

Page 68: Learning Dissimilarity

• Suppose a user indicates that certain objects are considered by them to be "similar": a set $S = \{(x_i, x_j) : x_i \text{ and } x_j \text{ are similar}\}$.
• Consider learning a dissimilarity that respects this subjective judgment:

$$d_A(x, y) = \sqrt{(x - y)^\top A \, (x - y)}$$

• If A is the identity matrix, this corresponds to the Euclidean distance.
• Generally, A parameterizes a family of Mahalanobis distances.
• Learning such a dissimilarity is equivalent to finding a rescaling of the data, replacing $x$ with $A^{1/2} x$, and then applying the standard Euclidean distance (checked below).
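The rescaling equivalence, checked numerically (A here is an arbitrary positive-definite example):

```python
import numpy as np
from scipy.linalg import sqrtm

A = np.array([[2.0, 0.5],
              [0.5, 1.0]])                        # an example positive-definite A
x, y = np.array([1.0, 2.0]), np.array([3.0, 0.0])

d_A = np.sqrt((x - y) @ A @ (x - y))              # distance under A

A_half = sqrtm(A).real                            # rescale the space by A^{1/2}
d_euclid = np.linalg.norm(A_half @ x - A_half @ y)

print(np.isclose(d_A, d_euclid))                  # True: the two agree
```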

Page 69: Learning Dissimilarity

• A simple way to define a criterion for the desired dissimilarity (with D a set of pairs known to be dissimilar):

$$\min_{A \succeq 0} \; \sum_{(x_i, x_j) \in S} \| x_i - x_j \|_A^2 \quad \text{s.t.} \quad \sum_{(x_i, x_j) \in D} \| x_i - x_j \|_A \ge 1$$

• This is a convex optimization problem; it can be solved by gradient descent and iterative projection.
• For details, see [Xing, Ng, Jordan, Russell '03].

Page 70: Learning Dissimilarity

Page 71: References

• Hastie, Tibshirani and Friedman, The Elements of Statistical Learning, Chapter 14.
• Bishop, Pattern Recognition and Machine Learning, Chapter 9.