![Page 1: lecture8 - MIT CSAILpeople.csail.mit.edu/dsontag/courses/ml14/slides/lecture8.pdf · Lecture8 David&Sontag& New&York&University& Slides adapted from Luke Zettlemoyer, Vibhav Gogate,](https://reader033.vdocuments.mx/reader033/viewer/2022043017/5f39b5e1fc77d34ccb580f3e/html5/thumbnails/1.jpg)
Clustering Lecture 8
David Sontag New York University
Slides adapted from Luke Zettlemoyer, Vibhav Gogate, Carlos Guestrin, Andrew Moore, Dan Klein
![Page 2: lecture8 - MIT CSAILpeople.csail.mit.edu/dsontag/courses/ml14/slides/lecture8.pdf · Lecture8 David&Sontag& New&York&University& Slides adapted from Luke Zettlemoyer, Vibhav Gogate,](https://reader033.vdocuments.mx/reader033/viewer/2022043017/5f39b5e1fc77d34ccb580f3e/html5/thumbnails/2.jpg)
Clustering
Clustering:
– Unsupervised learning
– Requires data, but no labels
– Detect patterns, e.g. in
• Group emails or search results
• Customer shopping patterns
• Regions of images
– Useful when you don’t know what you’re looking for
– But: can get gibberish
![Page 3: lecture8 - MIT CSAILpeople.csail.mit.edu/dsontag/courses/ml14/slides/lecture8.pdf · Lecture8 David&Sontag& New&York&University& Slides adapted from Luke Zettlemoyer, Vibhav Gogate,](https://reader033.vdocuments.mx/reader033/viewer/2022043017/5f39b5e1fc77d34ccb580f3e/html5/thumbnails/3.jpg)
Clustering
• Basic idea: group together similar instances
• Example: 2D point patterns
![Page 4: lecture8 - MIT CSAILpeople.csail.mit.edu/dsontag/courses/ml14/slides/lecture8.pdf · Lecture8 David&Sontag& New&York&University& Slides adapted from Luke Zettlemoyer, Vibhav Gogate,](https://reader033.vdocuments.mx/reader033/viewer/2022043017/5f39b5e1fc77d34ccb580f3e/html5/thumbnails/4.jpg)
![Page 5: lecture8 - MIT CSAILpeople.csail.mit.edu/dsontag/courses/ml14/slides/lecture8.pdf · Lecture8 David&Sontag& New&York&University& Slides adapted from Luke Zettlemoyer, Vibhav Gogate,](https://reader033.vdocuments.mx/reader033/viewer/2022043017/5f39b5e1fc77d34ccb580f3e/html5/thumbnails/5.jpg)
Clustering
• What could “similar” mean?
– One option: small (squared) Euclidean distance, $\mathrm{dist}(\vec{x}, \vec{y}) = \|\vec{x} - \vec{y}\|_2^2$
– Clustering results are crucially dependent on the measure of similarity (or distance) between the “points” to be clustered
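As a small illustration of the squared Euclidean distance, here is a minimal Python sketch. The function name `squared_euclidean` and the sample points are illustrative assumptions, not part of the slides.

```python
def squared_euclidean(x, y):
    """Squared Euclidean distance ||x - y||_2^2 between two equal-length vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return sum((a - b) ** 2 for a, b in zip(x, y))


# Two 2D points that are 3 apart horizontally and 4 apart vertically.
print(squared_euclidean([0.0, 0.0], [3.0, 4.0]))  # 25.0
```

Rescaling one feature (for example, changing its units) changes these distances, which is one concrete way the choice of similarity measure can change a clustering.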
![Page 6: lecture8 - MIT CSAILpeople.csail.mit.edu/dsontag/courses/ml14/slides/lecture8.pdf · Lecture8 David&Sontag& New&York&University& Slides adapted from Luke Zettlemoyer, Vibhav Gogate,](https://reader033.vdocuments.mx/reader033/viewer/2022043017/5f39b5e1fc77d34ccb580f3e/html5/thumbnails/6.jpg)
Clustering algorithms
• Hierarchical algorithms
– Bottom up: agglomerative
– Top down: divisive
• Partition algorithms (flat)
– K-means
– Mixture of Gaussians
– Spectral clustering
![Page 7: lecture8 - MIT CSAILpeople.csail.mit.edu/dsontag/courses/ml14/slides/lecture8.pdf · Lecture8 David&Sontag& New&York&University& Slides adapted from Luke Zettlemoyer, Vibhav Gogate,](https://reader033.vdocuments.mx/reader033/viewer/2022043017/5f39b5e1fc77d34ccb580f3e/html5/thumbnails/7.jpg)
Clustering examples
Image segmentation
Goal: Break up the image into meaningful or perceptually similar regions
[Slide from James Hays]
![Page 8: lecture8 - MIT CSAILpeople.csail.mit.edu/dsontag/courses/ml14/slides/lecture8.pdf · Lecture8 David&Sontag& New&York&University& Slides adapted from Luke Zettlemoyer, Vibhav Gogate,](https://reader033.vdocuments.mx/reader033/viewer/2022043017/5f39b5e1fc77d34ccb580f3e/html5/thumbnails/8.jpg)
Clustering examples
Clustering gene expression data
Eisen et al, PNAS 1998
![Page 9: lecture8 - MIT CSAILpeople.csail.mit.edu/dsontag/courses/ml14/slides/lecture8.pdf · Lecture8 David&Sontag& New&York&University& Slides adapted from Luke Zettlemoyer, Vibhav Gogate,](https://reader033.vdocuments.mx/reader033/viewer/2022043017/5f39b5e1fc77d34ccb580f3e/html5/thumbnails/9.jpg)
Clustering examples
Cluster news articles
![Page 10: lecture8 - MIT CSAILpeople.csail.mit.edu/dsontag/courses/ml14/slides/lecture8.pdf · Lecture8 David&Sontag& New&York&University& Slides adapted from Luke Zettlemoyer, Vibhav Gogate,](https://reader033.vdocuments.mx/reader033/viewer/2022043017/5f39b5e1fc77d34ccb580f3e/html5/thumbnails/10.jpg)
Clustering examples
Cluster people by space and time
[Image from Pilho Kim]
![Page 11: lecture8 - MIT CSAILpeople.csail.mit.edu/dsontag/courses/ml14/slides/lecture8.pdf · Lecture8 David&Sontag& New&York&University& Slides adapted from Luke Zettlemoyer, Vibhav Gogate,](https://reader033.vdocuments.mx/reader033/viewer/2022043017/5f39b5e1fc77d34ccb580f3e/html5/thumbnails/11.jpg)
Clustering examples
Clustering languages
[Image from scienceinschool.org]
![Page 12: lecture8 - MIT CSAILpeople.csail.mit.edu/dsontag/courses/ml14/slides/lecture8.pdf · Lecture8 David&Sontag& New&York&University& Slides adapted from Luke Zettlemoyer, Vibhav Gogate,](https://reader033.vdocuments.mx/reader033/viewer/2022043017/5f39b5e1fc77d34ccb580f3e/html5/thumbnails/12.jpg)
Clustering examples
Clustering languages
[Image from dhushara.com]
![Page 13: lecture8 - MIT CSAILpeople.csail.mit.edu/dsontag/courses/ml14/slides/lecture8.pdf · Lecture8 David&Sontag& New&York&University& Slides adapted from Luke Zettlemoyer, Vibhav Gogate,](https://reader033.vdocuments.mx/reader033/viewer/2022043017/5f39b5e1fc77d34ccb580f3e/html5/thumbnails/13.jpg)
Clustering examples
Clustering species
(“phylogeny”)
[Lindblad-Toh et al., Nature 2005]
![Page 14: lecture8 - MIT CSAILpeople.csail.mit.edu/dsontag/courses/ml14/slides/lecture8.pdf · Lecture8 David&Sontag& New&York&University& Slides adapted from Luke Zettlemoyer, Vibhav Gogate,](https://reader033.vdocuments.mx/reader033/viewer/2022043017/5f39b5e1fc77d34ccb580f3e/html5/thumbnails/14.jpg)
Clustering examples
Clustering search queries
![Page 15: lecture8 - MIT CSAILpeople.csail.mit.edu/dsontag/courses/ml14/slides/lecture8.pdf · Lecture8 David&Sontag& New&York&University& Slides adapted from Luke Zettlemoyer, Vibhav Gogate,](https://reader033.vdocuments.mx/reader033/viewer/2022043017/5f39b5e1fc77d34ccb580f3e/html5/thumbnails/15.jpg)
K-Means
• An iterative clustering algorithm
– Initialize: Pick K random points as cluster centers
– Alternate:
1. Assign data points to closest cluster center
2. Change the cluster center to the average of its assigned points
– Stop when no points’ assignments change
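The K-means loop can be sketched in plain Python as follows. This is a minimal sketch rather than a reference implementation: it assumes the initial centers are drawn from the data points themselves, uses squared Euclidean distance, and keeps the old center if a cluster ends up empty. The names `kmeans`, `seed`, and `max_iters` are hypothetical.

```python
import random


def squared_distance(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))


def mean(points):
    """Coordinate-wise average of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans(points, k, seed=0, max_iters=100):
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # Initialize: K random data points (an assumption)
    assignments = None
    for _ in range(max_iters):
        # Step 1: assign every point to its closest center.
        new_assignments = [
            min(range(k), key=lambda j: squared_distance(p, centers[j]))
            for p in points
        ]
        if new_assignments == assignments:  # Stop: no assignment changed.
            break
        assignments = new_assignments
        # Step 2: move each center to the average of its assigned points.
        for j in range(k):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:  # An empty cluster keeps its old center (an assumption).
                centers[j] = mean(members)
    return centers, assignments


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, labels = kmeans(points, k=2)
print(centers, labels)
```

On this toy input the two well-separated groups of three points end up in different clusters; which group receives label 0 depends on the random initialization.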
![Page 16: lecture8 - MIT CSAILpeople.csail.mit.edu/dsontag/courses/ml14/slides/lecture8.pdf · Lecture8 David&Sontag& New&York&University& Slides adapted from Luke Zettlemoyer, Vibhav Gogate,](https://reader033.vdocuments.mx/reader033/viewer/2022043017/5f39b5e1fc77d34ccb580f3e/html5/thumbnails/16.jpg)
![Page 17: lecture8 - MIT CSAILpeople.csail.mit.edu/dsontag/courses/ml14/slides/lecture8.pdf · Lecture8 David&Sontag& New&York&University& Slides adapted from Luke Zettlemoyer, Vibhav Gogate,](https://reader033.vdocuments.mx/reader033/viewer/2022043017/5f39b5e1fc77d34ccb580f3e/html5/thumbnails/17.jpg)
K-means clustering: Example
• Pick K random points as cluster centers (means)
Shown here for K=2
![Page 18: lecture8 - MIT CSAILpeople.csail.mit.edu/dsontag/courses/ml14/slides/lecture8.pdf · Lecture8 David&Sontag& New&York&University& Slides adapted from Luke Zettlemoyer, Vibhav Gogate,](https://reader033.vdocuments.mx/reader033/viewer/2022043017/5f39b5e1fc77d34ccb580f3e/html5/thumbnails/18.jpg)
K-means clustering: Example
Iterative Step 1
• Assign data points to closest cluster center
![Page 19: lecture8 - MIT CSAILpeople.csail.mit.edu/dsontag/courses/ml14/slides/lecture8.pdf · Lecture8 David&Sontag& New&York&University& Slides adapted from Luke Zettlemoyer, Vibhav Gogate,](https://reader033.vdocuments.mx/reader033/viewer/2022043017/5f39b5e1fc77d34ccb580f3e/html5/thumbnails/19.jpg)
K-means clustering: Example
Iterative Step 2
• Change the cluster center to the average of the assigned points
![Page 20: lecture8 - MIT CSAILpeople.csail.mit.edu/dsontag/courses/ml14/slides/lecture8.pdf · Lecture8 David&Sontag& New&York&University& Slides adapted from Luke Zettlemoyer, Vibhav Gogate,](https://reader033.vdocuments.mx/reader033/viewer/2022043017/5f39b5e1fc77d34ccb580f3e/html5/thumbnails/20.jpg)
K-means clustering: Example
• Repeat until convergence
![Page 21: lecture8 - MIT CSAILpeople.csail.mit.edu/dsontag/courses/ml14/slides/lecture8.pdf · Lecture8 David&Sontag& New&York&University& Slides adapted from Luke Zettlemoyer, Vibhav Gogate,](https://reader033.vdocuments.mx/reader033/viewer/2022043017/5f39b5e1fc77d34ccb580f3e/html5/thumbnails/21.jpg)
Properties of K-means algorithm
• Guaranteed to converge in a finite number of iterations
• Running time per iteration:
1. Assign data points to closest cluster center: O(KN) time
2. Change the cluster center to the average of its assigned points: O(N) time
![Page 22: lecture8 - MIT CSAILpeople.csail.mit.edu/dsontag/courses/ml14/slides/lecture8.pdf · Lecture8 David&Sontag& New&York&University& Slides adapted from Luke Zettlemoyer, Vibhav Gogate,](https://reader033.vdocuments.mx/reader033/viewer/2022043017/5f39b5e1fc77d34ccb580f3e/html5/thumbnails/22.jpg)
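K-means Convergence
Objective: $\min_{\mu} \min_{C} \sum_{i=1}^{K} \sum_{x \in C_i} \|x - \mu_i\|^2$
1. Fix $\mu$, optimize $C$: $\min_{C} \sum_{i=1}^{K} \sum_{x \in C_i} \|x - \mu_i\|^2$, which assigns each point to its closest center (Step 1 of K-means)
2. Fix $C$, optimize $\mu$: $\min_{\mu} \sum_{i=1}^{K} \sum_{x \in C_i} \|x - \mu_i\|^2$
– Take the partial derivative with respect to $\mu_i$ and set it to zero; we have $\mu_i = \frac{1}{|C_i|} \sum_{x \in C_i} x$ (Step 2 of K-means)
K-means takes an alternating optimization approach: each step is guaranteed not to increase the objective, and since there are only finitely many possible assignments, the algorithm is guaranteed to converge
[Slide from Alan Fern]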
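The convergence argument can be written out as a short worked derivation. The notation is an assumption chosen for illustration: $C_i$ is the set of points assigned to cluster $i$, $\mu_i$ is its center, $J$ is the K-means objective, and the superscript $t$ counts iterations.

```latex
\begin{align*}
J(\mu, C) &= \sum_{i=1}^{K} \sum_{x \in C_i} \|x - \mu_i\|^2 \\
\frac{\partial J}{\partial \mu_i} &= -2 \sum_{x \in C_i} (x - \mu_i) = 0
  \quad\Longrightarrow\quad
  \mu_i = \frac{1}{|C_i|} \sum_{x \in C_i} x \\
J(\mu^{t+1}, C^{t+1}) &\le J(\mu^{t}, C^{t+1}) \le J(\mu^{t}, C^{t})
\end{align*}
```

The right-hand inequality is Step 1 (reassigning points cannot increase $J$) and the left-hand one is Step 2 (the mean minimizes the within-cluster sum of squares). Since $J \ge 0$ and there are finitely many ways to assign $N$ points to $K$ clusters, the sequence of objective values must stop changing after finitely many iterations.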
![Page 23: lecture8 - MIT CSAILpeople.csail.mit.edu/dsontag/courses/ml14/slides/lecture8.pdf · Lecture8 David&Sontag& New&York&University& Slides adapted from Luke Zettlemoyer, Vibhav Gogate,](https://reader033.vdocuments.mx/reader033/viewer/2022043017/5f39b5e1fc77d34ccb580f3e/html5/thumbnails/23.jpg)
Example: K-Means for Segmentation
Goal of segmentation is to partition an image into regions, each of which has reasonably homogeneous visual appearance.
[Figure: K-means segmentations for K = 2, K = 3, and K = 10, alongside the original image]
![Page 24: lecture8 - MIT CSAILpeople.csail.mit.edu/dsontag/courses/ml14/slides/lecture8.pdf · Lecture8 David&Sontag& New&York&University& Slides adapted from Luke Zettlemoyer, Vibhav Gogate,](https://reader033.vdocuments.mx/reader033/viewer/2022043017/5f39b5e1fc77d34ccb580f3e/html5/thumbnails/24.jpg)
![Page 25: lecture8 - MIT CSAILpeople.csail.mit.edu/dsontag/courses/ml14/slides/lecture8.pdf · Lecture8 David&Sontag& New&York&University& Slides adapted from Luke Zettlemoyer, Vibhav Gogate,](https://reader033.vdocuments.mx/reader033/viewer/2022043017/5f39b5e1fc77d34ccb580f3e/html5/thumbnails/25.jpg)
![Page 26: lecture8 - MIT CSAILpeople.csail.mit.edu/dsontag/courses/ml14/slides/lecture8.pdf · Lecture8 David&Sontag& New&York&University& Slides adapted from Luke Zettlemoyer, Vibhav Gogate,](https://reader033.vdocuments.mx/reader033/viewer/2022043017/5f39b5e1fc77d34ccb580f3e/html5/thumbnails/26.jpg)
Example: Vector quantization
FIGURE 14.9. Sir Ronald A. Fisher (1890–1962) was one of the founders of modern day statistics, to whom we owe maximum-likelihood, sufficiency, and many other fundamental concepts. The image on the left is a 1024 × 1024 grayscale image at 8 bits per pixel. The center image is the result of 2 × 2 block VQ, using 200 code vectors, with a compression rate of 1.9 bits/pixel. The right image uses only four code vectors, with a compression rate of 0.50 bits/pixel.
We see that the procedure is successful at grouping together samples of the same cancer. In fact, the two breast cancers in the second cluster were later found to be misdiagnosed and were melanomas that had metastasized. However, K-means clustering has shortcomings in this application. For one, it does not give a linear ordering of objects within a cluster: we have simply listed them in alphabetic order above. Secondly, as the number of clusters K is changed, the cluster memberships can change in arbitrary ways. That is, with say four clusters, the clusters need not be nested within the three clusters above. For these reasons, hierarchical clustering (described later), is probably preferable for this application.
14.3.9 Vector Quantization
The K-means clustering algorithm represents a key tool in the apparently unrelated area of image and signal compression, particularly in vector quantization or VQ (Gersho and Gray, 1992). The left image in Figure 14.9² is a digitized photograph of a famous statistician, Sir Ronald Fisher. It consists of 1024 × 1024 pixels, where each pixel is a grayscale value ranging from 0 to 255, and hence requires 8 bits of storage per pixel. The entire image occupies 1 megabyte of storage. The center image is a VQ-compressed version of the left panel, and requires 0.239 of the storage (at some loss in quality). The right image is compressed even more, and requires only 0.0625 of the storage (at a considerable loss in quality).
The version of VQ implemented here first breaks the image into small blocks, in this case 2 × 2 blocks of pixels. Each of the 512 × 512 blocks of four
² This example was prepared by Maya Gupta.
[Figure from Hastie et al. book]
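The compression rates quoted in the caption can be checked with a few lines of arithmetic. The sketch below assumes that each 2 × 2 block is stored only as a fixed-length index into the codebook and that the cost of storing the codebook itself is ignored; the function name `vq_bits_per_pixel` is hypothetical.

```python
import math


def vq_bits_per_pixel(num_code_vectors, block_pixels):
    """Bits per pixel when each block stores only the index of its code vector."""
    return math.log2(num_code_vectors) / block_pixels


for k in (200, 4):
    rate = vq_bits_per_pixel(k, block_pixels=4)  # 2 x 2 blocks hold 4 pixels
    fraction = rate / 8                          # relative to 8 bits per pixel
    print(f"K={k}: {rate:.4f} bits/pixel, {fraction:.4f} of the original storage")
```

Under these assumptions, 200 code vectors give about 1.91 bits per pixel (about 0.239 of the original storage), and four code vectors give exactly 0.5 bits per pixel (0.0625 of the original storage), which agrees with the figures in the excerpt.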
![Page 27: lecture8 - MIT CSAILpeople.csail.mit.edu/dsontag/courses/ml14/slides/lecture8.pdf · Lecture8 David&Sontag& New&York&University& Slides adapted from Luke Zettlemoyer, Vibhav Gogate,](https://reader033.vdocuments.mx/reader033/viewer/2022043017/5f39b5e1fc77d34ccb580f3e/html5/thumbnails/27.jpg)
Initialization
• K-means algorithm is a heuristic
– Requires initial means
– It does matter what you pick!
– What can go wrong?
– Various schemes for preventing this kind of thing: variance-based split / merge, initialization heuristics
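The slides do not name a specific initialization heuristic. One widely used example is k-means++ seeding (Arthur and Vassilvitskii), which picks each new center with probability proportional to its squared distance from the nearest center chosen so far. The sketch below is an illustration of that idea, not the method used in the lecture; the name `kmeanspp_init` is hypothetical, and the code assumes the data contains at least `k` distinct points.

```python
import random


def squared_distance(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))


def kmeanspp_init(points, k, seed=0):
    """Choose k initial centers from `points`, spreading them out (k-means++ style)."""
    rng = random.Random(seed)
    centers = [rng.choice(points)]  # first center: uniform at random
    while len(centers) < k:
        # Weight of each point: squared distance to its nearest chosen center.
        weights = [min(squared_distance(p, c) for c in centers) for p in points]
        # Points far from all current centers are more likely to be picked;
        # already-chosen points have weight 0.
        centers.append(rng.choices(points, weights=weights, k=1)[0])
    return centers


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
print(kmeanspp_init(points, k=2))
```

Spreading the initial centers apart makes it less likely that two of them start inside the same true cluster, which is the situation that tends to produce poor local optima.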
![Page 28: lecture8 - MIT CSAILpeople.csail.mit.edu/dsontag/courses/ml14/slides/lecture8.pdf · Lecture8 David&Sontag& New&York&University& Slides adapted from Luke Zettlemoyer, Vibhav Gogate,](https://reader033.vdocuments.mx/reader033/viewer/2022043017/5f39b5e1fc77d34ccb580f3e/html5/thumbnails/28.jpg)
K-Means Getting Stuck
A local optimum:
Would be better to have one cluster here
… and two clusters here
![Page 29: lecture8 - MIT CSAILpeople.csail.mit.edu/dsontag/courses/ml14/slides/lecture8.pdf · Lecture8 David&Sontag& New&York&University& Slides adapted from Luke Zettlemoyer, Vibhav Gogate,](https://reader033.vdocuments.mx/reader033/viewer/2022043017/5f39b5e1fc77d34ccb580f3e/html5/thumbnails/29.jpg)
K-means not able to properly cluster
[Figure: data plotted with axes X and Y]
![Page 30: lecture8 - MIT CSAILpeople.csail.mit.edu/dsontag/courses/ml14/slides/lecture8.pdf · Lecture8 David&Sontag& New&York&University& Slides adapted from Luke Zettlemoyer, Vibhav Gogate,](https://reader033.vdocuments.mx/reader033/viewer/2022043017/5f39b5e1fc77d34ccb580f3e/html5/thumbnails/30.jpg)
Changing the features (distance function) can help
[Figure: data plotted with axes θ and R]
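The axis labels R and θ suggest a change from Cartesian to polar features. The sketch below shows that change of features on made-up data: two concentric rings centred at the origin, which is an assumption about the kind of data in the figure. The name `to_polar` is hypothetical.

```python
import math


def to_polar(x, y):
    """Map a Cartesian point (x, y) to polar features (r, theta)."""
    return math.hypot(x, y), math.atan2(y, x)


# Points on two concentric circles (radii 1 and 3), centred at the origin.
inner = [(math.cos(t), math.sin(t)) for t in (0.0, 1.0, 2.0, 3.0)]
outer = [(3 * math.cos(t), 3 * math.sin(t)) for t in (0.5, 1.5, 2.5, 3.5)]

inner_r = [to_polar(x, y)[0] for x, y in inner]
outer_r = [to_polar(x, y)[0] for x, y in outer]

# In the new features the radius alone separates the two rings.
print(max(inner_r) < min(outer_r))  # True
```

In (x, y) space no pair of cluster centers separates the rings under Euclidean distance, whereas in (r, θ) space the two groups differ clearly along the r axis.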
![Page 31: lecture8 - MIT CSAILpeople.csail.mit.edu/dsontag/courses/ml14/slides/lecture8.pdf · Lecture8 David&Sontag& New&York&University& Slides adapted from Luke Zettlemoyer, Vibhav Gogate,](https://reader033.vdocuments.mx/reader033/viewer/2022043017/5f39b5e1fc77d34ccb580f3e/html5/thumbnails/31.jpg)
Hierarchical Clustering
![Page 32: lecture8 - MIT CSAILpeople.csail.mit.edu/dsontag/courses/ml14/slides/lecture8.pdf · Lecture8 David&Sontag& New&York&University& Slides adapted from Luke Zettlemoyer, Vibhav Gogate,](https://reader033.vdocuments.mx/reader033/viewer/2022043017/5f39b5e1fc77d34ccb580f3e/html5/thumbnails/32.jpg)
Agglomerative Clustering
• Agglomerative clustering:
– First merge very similar instances
– Incrementally build larger clusters out of smaller clusters
• Algorithm:
– Maintain a set of clusters
– Initially, each instance in its own cluster
– Repeat:
• Pick the two closest clusters
• Merge them into a new cluster
• Stop when there’s only one cluster left
• Produces not one clustering, but a family of clusterings represented by a dendrogram
![Page 33: lecture8 - MIT CSAILpeople.csail.mit.edu/dsontag/courses/ml14/slides/lecture8.pdf · Lecture8 David&Sontag& New&York&University& Slides adapted from Luke Zettlemoyer, Vibhav Gogate,](https://reader033.vdocuments.mx/reader033/viewer/2022043017/5f39b5e1fc77d34ccb580f3e/html5/thumbnails/33.jpg)
Agglomerative Clustering
• How should we define “closest” for clusters with multiple elements?
![Page 34: lecture8 - MIT CSAILpeople.csail.mit.edu/dsontag/courses/ml14/slides/lecture8.pdf · Lecture8 David&Sontag& New&York&University& Slides adapted from Luke Zettlemoyer, Vibhav Gogate,](https://reader033.vdocuments.mx/reader033/viewer/2022043017/5f39b5e1fc77d34ccb580f3e/html5/thumbnails/34.jpg)
Agglomerative Clustering
• How should we define “closest” for clusters with multiple elements?
• Many options:
– Closest pair (single-link clustering)
– Farthest pair (complete-link clustering)
– Average of all pairs
• Different choices create different clustering behaviors
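Agglomerative clustering with these three notions of “closest” can be sketched as follows. This is a naive illustration that recomputes all pairwise cluster distances at every step, so it is only suitable for small inputs; the function names (`single_link`, `complete_link`, `average_link`, `agglomerate`) and the toy data are assumptions.

```python
import math
from itertools import combinations


def single_link(a, b):
    """Closest pair of points across the two clusters."""
    return min(math.dist(p, q) for p in a for q in b)


def complete_link(a, b):
    """Farthest pair of points across the two clusters."""
    return max(math.dist(p, q) for p in a for q in b)


def average_link(a, b):
    """Average distance over all cross-cluster pairs."""
    return sum(math.dist(p, q) for p in a for q in b) / (len(a) * len(b))


def agglomerate(points, linkage=single_link):
    """Return the merge history as a list of (cluster_a, cluster_b, distance)."""
    clusters = [(p,) for p in points]  # initially, each instance is its own cluster
    merges = []
    while len(clusters) > 1:
        # Pick the two closest clusters under the chosen linkage.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        a, b = clusters[i], clusters[j]
        merges.append((a, b, linkage(a, b)))
        # Merge them into a new cluster.
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(a + b)
    return merges


pts = [(0.0,), (1.0,), (3.0,), (7.0,)]  # four points on a line
for name, link in [("single", single_link), ("complete", complete_link)]:
    print(name, [d for _, _, d in agglomerate(pts, link)])
```

The merge history is the information a dendrogram draws: each entry records which two clusters were joined and at what distance. On this input both linkages merge in the same order but report different merge heights, because single-link measures the closest pair and complete-link the farthest pair.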
![Page 35: lecture8 - MIT CSAILpeople.csail.mit.edu/dsontag/courses/ml14/slides/lecture8.pdf · Lecture8 David&Sontag& New&York&University& Slides adapted from Luke Zettlemoyer, Vibhav Gogate,](https://reader033.vdocuments.mx/reader033/viewer/2022043017/5f39b5e1fc77d34ccb580f3e/html5/thumbnails/35.jpg)
Agglomerative Clustering
• How should we define “closest” for clusters with multiple elements?
– Closest pair (single-link clustering)
– Farthest pair (complete-link clustering)
[Figure: Single Link Example and Complete Link Example, each drawn on points numbered 1 to 8]
[Pictures from Thorsten Joachims]
![Page 36: lecture8 - MIT CSAILpeople.csail.mit.edu/dsontag/courses/ml14/slides/lecture8.pdf · Lecture8 David&Sontag& New&York&University& Slides adapted from Luke Zettlemoyer, Vibhav Gogate,](https://reader033.vdocuments.mx/reader033/viewer/2022043017/5f39b5e1fc77d34ccb580f3e/html5/thumbnails/36.jpg)
Clustering Behavior
[Figure panels: Average, Farthest, Nearest]
Mouse tumor data from [Hastie et al.]
![Page 37: lecture8 - MIT CSAILpeople.csail.mit.edu/dsontag/courses/ml14/slides/lecture8.pdf · Lecture8 David&Sontag& New&York&University& Slides adapted from Luke Zettlemoyer, Vibhav Gogate,](https://reader033.vdocuments.mx/reader033/viewer/2022043017/5f39b5e1fc77d34ccb580f3e/html5/thumbnails/37.jpg)
Agglomerative Clustering
When can this be expected to work?
Closest pair (single-link clustering)
[Figure: Single Link Example drawn on points numbered 1 to 8]
Strong separation property: All points are more similar to points in their own cluster than to any points in any other cluster
Then, the true clustering corresponds to some pruning of the tree obtained by single-link clustering!
Slightly weaker (stability) conditions are handled by average-link clustering
(Balcan et al., 2008)
![Page 38: lecture8 - MIT CSAILpeople.csail.mit.edu/dsontag/courses/ml14/slides/lecture8.pdf · Lecture8 David&Sontag& New&York&University& Slides adapted from Luke Zettlemoyer, Vibhav Gogate,](https://reader033.vdocuments.mx/reader033/viewer/2022043017/5f39b5e1fc77d34ccb580f3e/html5/thumbnails/38.jpg)
Spectral Clustering
Slides adapted from James Hays, Alan Fern, and Tommi Jaakkola
![Page 39: lecture8 - MIT CSAILpeople.csail.mit.edu/dsontag/courses/ml14/slides/lecture8.pdf · Lecture8 David&Sontag& New&York&University& Slides adapted from Luke Zettlemoyer, Vibhav Gogate,](https://reader033.vdocuments.mx/reader033/viewer/2022043017/5f39b5e1fc77d34ccb580f3e/html5/thumbnails/39.jpg)
Spectral clustering
[Shi & Malik ‘00; Ng, Jordan, Weiss NIPS ‘01]
[Figure: clustering results on 2D toy datasets (both axes from 0 to 5), in two groups labelled K-means and Spectral clustering. Panel titles: “nips, 8 clusters”; “lineandballs, 3 clusters”; “fourclouds, 2 clusters”; “squiggles, 4 clusters”; “twocircles, 2 clusters”; “threecircles-joined, 2 clusters”; “threecircles-joined, 3 clusters”; “Rows of Y (jittered, randomly subsampled) for twocircles”; “two circles, 2 clusters (K-means)”; “threecircles-joined, 3 clusters (connected components)”; “lineandballs, 3 clusters (Meila and Shi algorithm)”; “nips, 8 clusters (Kannan et al. algorithm)”]
![Page 40: lecture8 - MIT CSAILpeople.csail.mit.edu/dsontag/courses/ml14/slides/lecture8.pdf · Lecture8 David&Sontag& New&York&University& Slides adapted from Luke Zettlemoyer, Vibhav Gogate,](https://reader033.vdocuments.mx/reader033/viewer/2022043017/5f39b5e1fc77d34ccb580f3e/html5/thumbnails/40.jpg)
Spectral clustering
[Figures from Ng, Jordan, Weiss NIPS ‘01]
[Figure: spectral clustering results on 2D toy datasets (both axes from 0 to 5). Panel titles: “nips, 8 clusters”; “lineandballs, 3 clusters”; “fourclouds, 2 clusters”; “squiggles, 4 clusters”; “twocircles, 2 clusters”; “threecircles-joined, 2 clusters”; “threecircles-joined, 3 clusters”; “Rows of Y (jittered, randomly subsampled) for twocircles”; “two circles, 2 clusters (K-means)”; “threecircles-joined, 3 clusters (connected components)”; “lineandballs, 3 clusters (Meila and Shi algorithm)”; “nips, 8 clusters (Kannan et al. algorithm)”]
![Page 41: lecture8 - MIT CSAILpeople.csail.mit.edu/dsontag/courses/ml14/slides/lecture8.pdf · Lecture8 David&Sontag& New&York&University& Slides adapted from Luke Zettlemoyer, Vibhav Gogate,](https://reader033.vdocuments.mx/reader033/viewer/2022043017/5f39b5e1fc77d34ccb580f3e/html5/thumbnails/41.jpg)
Spectral clustering
Group points based on links in a graph
[Figure: graph with two groups of points labelled A and B]
[Slide from James Hays]
![Page 42: lecture8 - MIT CSAILpeople.csail.mit.edu/dsontag/courses/ml14/slides/lecture8.pdf · Lecture8 David&Sontag& New&York&University& Slides adapted from Luke Zettlemoyer, Vibhav Gogate,](https://reader033.vdocuments.mx/reader033/viewer/2022043017/5f39b5e1fc77d34ccb580f3e/html5/thumbnails/42.jpg)
How to create the graph?
• It is common to use a Gaussian kernel to compute similarity between objects
• One could create
– A fully connected graph
– K-nearest neighbor graph (each node is only connected to its K nearest neighbors)
[Slide from Alan Fern]
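Both graph constructions can be sketched in a few lines. The code below is an illustration under stated assumptions: the Gaussian kernel is written with $2\sigma^2$ in the denominator (other bandwidth conventions are common), self-similarities are set to zero, and the K-nearest-neighbor graph is made symmetric by keeping an edge if either endpoint lists the other among its neighbors. The names `gaussian_similarity` and `knn_graph` are hypothetical.

```python
import math


def gaussian_similarity(points, sigma=1.0):
    """Fully connected graph: W[i][j] = exp(-||x_i - x_j||^2 / (2 * sigma^2))."""
    n = len(points)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:  # no self-loops (an assumption)
                d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                W[i][j] = math.exp(-d2 / (2 * sigma ** 2))
    return W


def knn_graph(W, k):
    """Keep edge (i, j) if j is among i's k most similar nodes, or vice versa."""
    n = len(W)
    keep = [[False] * n for _ in range(n)]
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for j in sorted(others, key=lambda j: -W[i][j])[:k]:
            keep[i][j] = keep[j][i] = True  # symmetrize
    return [[W[i][j] if keep[i][j] else 0.0 for j in range(n)] for i in range(n)]


points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
W = gaussian_similarity(points, sigma=1.0)
G = knn_graph(W, k=1)
print(W[0][1], W[0][2])  # nearby pair vs. distant pair
print(G[0][2])           # the distant pair is not connected in the 1-NN graph
```

The fully connected graph keeps every pair with a (possibly tiny) weight, while the K-nearest-neighbor graph is sparse, which matters for the cost of the later eigenvector computation.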
![Page 43: lecture8 - MIT CSAILpeople.csail.mit.edu/dsontag/courses/ml14/slides/lecture8.pdf · Lecture8 David&Sontag& New&York&University& Slides adapted from Luke Zettlemoyer, Vibhav Gogate,](https://reader033.vdocuments.mx/reader033/viewer/2022043017/5f39b5e1fc77d34ccb580f3e/html5/thumbnails/43.jpg)
Spectral clustering for segmentation
[Slide from James Hays]
![Page 44: lecture8 - MIT CSAILpeople.csail.mit.edu/dsontag/courses/ml14/slides/lecture8.pdf · Lecture8 David&Sontag& New&York&University& Slides adapted from Luke Zettlemoyer, Vibhav Gogate,](https://reader033.vdocuments.mx/reader033/viewer/2022043017/5f39b5e1fc77d34ccb580f3e/html5/thumbnails/44.jpg)
Can we use minimum cut for clustering?
methods, the segmentation criteria used in most of them are based on local properties of the graph. Because perceptual grouping is about extracting the global impressions of a scene, as we saw earlier, this partitioning criterion often falls short of this main goal.
In this paper, we propose a new graph-theoretic criterion for measuring the goodness of an image partition: the normalized cut. We introduce and justify this criterion in Section 2. The minimization of this criterion can be formulated as a generalized eigenvalue problem. The eigenvectors can be used to construct good partitions of the image and the process can be continued recursively as desired (Section 2.1). Section 3 gives a detailed explanation of the steps of our grouping algorithm. In Section 4, we show experimental results. The formulation and minimization of the normalized cut criterion draws on a body of results from the field of spectral graph theory (Section 5). Relationship to work in computer vision is discussed in Section 6 and comparison with related eigenvector based segmentation methods is represented in Section 6.1. We conclude in Section 7.
The main results in this paper were first presented in [20].
2 GROUPING AS GRAPH PARTITIONING
A graph $G = (V, E)$ can be partitioned into two disjoint sets, $A$, $B$, $A \cup B = V$, $A \cap B = \emptyset$, by simply removing edges connecting the two parts. The degree of dissimilarity between these two pieces can be computed as total weight of the edges that have been removed. In graph theoretic language, it is called the cut:
$$\mathrm{cut}(A, B) = \sum_{u \in A,\, v \in B} w(u, v). \qquad (1)$$
The optimal bipartitioning of a graph is the one that minimizes this cut value. Although there are an exponential number of such partitions, finding the minimum cut of a graph is a well-studied problem and there exist efficient algorithms for solving it.
Wu and Leahy [25] proposed a clustering method based on this minimum cut criterion. In particular, they seek to partition a graph into k-subgraphs such that the maximum cut across the subgroups is minimized. This problem can be efficiently solved by recursively finding the minimum cuts that bisect the existing segments. As shown in Wu and Leahy's work, this globally optimal criterion can be used to produce good segmentation on some of the images.
However, as Wu and Leahy also noticed in their work, the minimum cut criteria favors cutting small sets of isolated nodes in the graph. This is not surprising since the cut defined in (1) increases with the number of edges going across the two partitioned parts. Fig. 1 illustrates one such case. Assuming the edge weights are inversely proportional to the distance between the two nodes, we see the cut that partitions out node $n_1$ or $n_2$ will have a very small value. In fact, any cut that partitions out individual nodes on the right half will have smaller cut value than the cut that partitions the nodes into the left and right halves.
To avoid this unnatural bias for partitioning out small sets of points, we propose a new measure of disassociation between two groups. Instead of looking at the value of total edge weight connecting the two partitions, our measure computes the cut cost as a fraction of the total edge connections to all the nodes in the graph. We call this disassociation measure the normalized cut (Ncut):
$$\mathrm{Ncut}(A, B) = \frac{\mathrm{cut}(A, B)}{\mathrm{assoc}(A, V)} + \frac{\mathrm{cut}(A, B)}{\mathrm{assoc}(B, V)}, \qquad (2)$$
where $\mathrm{assoc}(A, V) = \sum_{u \in A,\, t \in V} w(u, t)$ is the total connection from nodes in A to all nodes in the graph and $\mathrm{assoc}(B, V)$ is similarly defined. With this definition of the disassociation between the groups, the cut that partitions out small isolated points will no longer have small Ncut value, since the cut value will almost certainly be a large percentage of the total connection from that small set to all other nodes. In the case illustrated in Fig. 1, we see that the cut1 value across node $n_1$ will be 100 percent of the total connection from that node.
In the same spirit, we can define a measure for total normalized association within groups for a given partition:
$$\mathrm{Nassoc}(A, B) = \frac{\mathrm{assoc}(A, A)}{\mathrm{assoc}(A, V)} + \frac{\mathrm{assoc}(B, B)}{\mathrm{assoc}(B, V)}, \qquad (3)$$
where $\mathrm{assoc}(A, A)$ and $\mathrm{assoc}(B, B)$ are total weights of edges connecting nodes within A and B, respectively. We see again this is an unbiased measure, which reflects how tightly on average nodes within the group are connected to each other.
Another important property of this definition of association and disassociation of a partition is that they are naturally related:
$$\begin{aligned} \mathrm{Ncut}(A, B) &= \frac{\mathrm{cut}(A, B)}{\mathrm{assoc}(A, V)} + \frac{\mathrm{cut}(A, B)}{\mathrm{assoc}(B, V)} \\ &= \frac{\mathrm{assoc}(A, V) - \mathrm{assoc}(A, A)}{\mathrm{assoc}(A, V)} + \frac{\mathrm{assoc}(B, V) - \mathrm{assoc}(B, B)}{\mathrm{assoc}(B, V)} \\ &= 2 - \left( \frac{\mathrm{assoc}(A, A)}{\mathrm{assoc}(A, V)} + \frac{\mathrm{assoc}(B, B)}{\mathrm{assoc}(B, V)} \right) \\ &= 2 - \mathrm{Nassoc}(A, B). \end{aligned}$$
Hence, the two partition criteria that we seek in our grouping algorithm, minimizing the disassociation between the groups and maximizing the association within the
SHI AND MALIK: NORMALIZED CUTS AND IMAGE SEGMENTATION 889
Fig. 1. A case where minimum cut gives a bad partition.
[Shi & Malik ‘00]
![Page 45: lecture8 - MIT CSAILpeople.csail.mit.edu/dsontag/courses/ml14/slides/lecture8.pdf · Lecture8 David&Sontag& New&York&University& Slides adapted from Luke Zettlemoyer, Vibhav Gogate,](https://reader033.vdocuments.mx/reader033/viewer/2022043017/5f39b5e1fc77d34ccb580f3e/html5/thumbnails/45.jpg)
Graph partitioning
![Page 46: lecture8 - MIT CSAILpeople.csail.mit.edu/dsontag/courses/ml14/slides/lecture8.pdf · Lecture8 David&Sontag& New&York&University& Slides adapted from Luke Zettlemoyer, Vibhav Gogate,](https://reader033.vdocuments.mx/reader033/viewer/2022043017/5f39b5e1fc77d34ccb580f3e/html5/thumbnails/46.jpg)
Graph Terminologies
• Degree of nodes: $d_i = \sum_j w_{ij}$
• Volume of a set: $Vol(A) = \sum_{i \in A} d_i$
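A small sketch of the two terms, assuming the usual definitions for a weighted graph (degree = sum of incident edge weights, volume = sum of the degrees in the set). The toy graph and the helper name `volume` are illustrative only.

```python
import numpy as np

# Hypothetical toy graph: two triangles (nodes 0-2 and 3-5) joined by one weak edge.
W = np.zeros((6, 6))
for i, j, w in [(0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0),
                (3, 4, 1.0), (3, 5, 1.0), (4, 5, 1.0), (2, 3, 0.1)]:
    W[i, j] = W[j, i] = w

# Degree of each node: d_i = sum_j W[i, j]
degree = W.sum(axis=1)

def volume(nodes):
    """Volume of a set of nodes: the sum of their degrees."""
    return degree[nodes].sum()

print("degrees:", degree)
print("Vol({0, 1, 2}) =", volume([0, 1, 2]))
```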
![Page 47: lecture8 - MIT CSAILpeople.csail.mit.edu/dsontag/courses/ml14/slides/lecture8.pdf · Lecture8 David&Sontag& New&York&University& Slides adapted from Luke Zettlemoyer, Vibhav Gogate,](https://reader033.vdocuments.mx/reader033/viewer/2022043017/5f39b5e1fc77d34ccb580f3e/html5/thumbnails/47.jpg)
Graph Cut
• Consider a partition of the graph into two parts A and B
• Cut(A, B): sum of the weights of the set of edges that connect the two groups
• An intuitive goal is to find the partition that minimizes the cut
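To see what minimizing the raw cut does in practice, here is a brute-force sketch on a made-up seven-node graph: two triangles joined by a weak edge, plus one node hanging off by an even weaker edge. The graph and the exhaustive search are assumptions for illustration (exhaustive search is only feasible for tiny graphs).

```python
import itertools
import numpy as np

def cut(W, A, B):
    """Cut(A, B): sum of the weights of edges with one end in A and the other in B."""
    return W[np.ix_(A, B)].sum()

# Hypothetical graph: triangles {0,1,2} and {3,4,5}, a weak bridge 2-3,
# and node 6 attached to node 5 by an even weaker edge.
n = 7
W = np.zeros((n, n))
for i, j, w in [(0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0),
                (3, 4, 1.0), (3, 5, 1.0), (4, 5, 1.0),
                (2, 3, 0.1), (5, 6, 0.05)]:
    W[i, j] = W[j, i] = w

# Enumerate every split of the nodes into two non-empty parts and keep the smallest cut.
best = None
for size in range(1, n):
    for A in itertools.combinations(range(n), size):
        B = [v for v in range(n) if v not in A]
        value = cut(W, list(A), B)
        if best is None or value < best[0]:
            best = (value, list(A), B)

print("minimum cut value:", best[0])
print("A =", best[1], "B =", best[2])
```

On this graph the minimum cut simply peels off the weakly attached node 6 rather than separating the two triangles, which is the kind of unbalanced result that motivates normalizing the cut.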
![Page 48: lecture8 - MIT CSAILpeople.csail.mit.edu/dsontag/courses/ml14/slides/lecture8.pdf · Lecture8 David&Sontag& New&York&University& Slides adapted from Luke Zettlemoyer, Vibhav Gogate,](https://reader033.vdocuments.mx/reader033/viewer/2022043017/5f39b5e1fc77d34ccb580f3e/html5/thumbnails/48.jpg)
Normalized Cut
• Consider the connectivity between groups relative to the volume of each group

$$Ncut(A,B) = \frac{cut(A,B)}{Vol(A)} + \frac{cut(A,B)}{Vol(B)}$$

$$Ncut(A,B) = cut(A,B)\,\frac{Vol(A) + Vol(B)}{Vol(A)\,Vol(B)}$$

For a fixed cut weight and total volume, the second factor is smallest when Vol(A) and Vol(B) are equal, so Ncut encourages a balanced cut.
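The effect of the normalization can be sketched on a small example. The seven-node graph below is made up (two triangles, a weak bridge, and one weakly attached node); the function names are illustrative. It compares the Ncut of a split that isolates one node with the Ncut of a roughly balanced split, and checks that both forms of the formula agree.

```python
import numpy as np

def ncut(W, in_A):
    """Ncut(A, B) = cut/Vol(A) + cut/Vol(B), where in_A is a boolean mask for group A."""
    cut = W[in_A][:, ~in_A].sum()
    vol_A, vol_B = W[in_A].sum(), W[~in_A].sum()
    return cut / vol_A + cut / vol_B

def ncut_product_form(W, in_A):
    """Same quantity written as cut * (Vol(A) + Vol(B)) / (Vol(A) * Vol(B))."""
    cut = W[in_A][:, ~in_A].sum()
    vol_A, vol_B = W[in_A].sum(), W[~in_A].sum()
    return cut * (vol_A + vol_B) / (vol_A * vol_B)

# Hypothetical graph: triangles {0,1,2} and {3,4,5}, weak bridge 2-3, node 6 weakly tied to 5.
n = 7
W = np.zeros((n, n))
for i, j, w in [(0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0),
                (3, 4, 1.0), (3, 5, 1.0), (4, 5, 1.0),
                (2, 3, 0.1), (5, 6, 0.05)]:
    W[i, j] = W[j, i] = w

isolated = np.arange(n) == 6   # A = {6}: the smallest raw cut
balanced = np.arange(n) < 3    # A = {0, 1, 2}: separates the two triangles

print("Ncut, isolating node 6:", ncut(W, isolated))
print("Ncut, balanced split:  ", ncut(W, balanced))
```

The isolating split has the smaller raw cut (0.05 versus 0.1) but an Ncut slightly above 1, because the cut is all of node 6's volume; the balanced split has a much smaller Ncut.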
![Page 49: lecture8 - MIT CSAILpeople.csail.mit.edu/dsontag/courses/ml14/slides/lecture8.pdf · Lecture8 David&Sontag& New&York&University& Slides adapted from Luke Zettlemoyer, Vibhav Gogate,](https://reader033.vdocuments.mx/reader033/viewer/2022043017/5f39b5e1fc77d34ccb580f3e/html5/thumbnails/49.jpg)
Solving NCut
• How to minimize Ncut?

Let $W$ be the similarity matrix, $W(i,j) = W_{i,j}$;
Let $D$ be the diag. matrix, $D(i,i) = \sum_j W(i,j)$;
Let $x$ be a vector in $\{1,-1\}^N$, $x(i) = 1 \Leftrightarrow i \in A$.

• With some simplifications, we can show:

$$\min_x Ncut(x) = \min_y \frac{y^T (D - W)\, y}{y^T D\, y} \quad \text{(Rayleigh quotient)}$$

Subject to: $y^T D \mathbf{1} = 0$
(y takes discrete values)

NP-Hard!
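The equivalence can be checked numerically for one partition. The sketch below assumes the discrete encoding used by Shi & Malik, where y is 1 on A and −b on B with b = Vol(A)/Vol(B); the toy graph and variable names are illustrative assumptions.

```python
import numpy as np

# Hypothetical graph: triangles {0,1,2} and {3,4,5}, weak bridge 2-3, node 6 weakly tied to 5.
n = 7
W = np.zeros((n, n))
for i, j, w in [(0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0),
                (3, 4, 1.0), (3, 5, 1.0), (4, 5, 1.0),
                (2, 3, 0.1), (5, 6, 0.05)]:
    W[i, j] = W[j, i] = w

D = np.diag(W.sum(axis=1))   # D(i, i) = sum_j W(i, j)
in_A = np.arange(n) < 3      # A = {0, 1, 2}, B = the rest

vol_A, vol_B = W[in_A].sum(), W[~in_A].sum()
cut = W[in_A][:, ~in_A].sum()
ncut = cut / vol_A + cut / vol_B

# Discrete indicator: y_i = 1 for i in A, y_i = -b for i in B, with b = Vol(A) / Vol(B).
b = vol_A / vol_B
y = np.where(in_A, 1.0, -b)

rayleigh = (y @ (D - W) @ y) / (y @ D @ y)
constraint = y @ D @ np.ones(n)   # should be 0 for this choice of b

print("Ncut             =", ncut)
print("Rayleigh quotient =", rayleigh)
print("y^T D 1           =", constraint)
```

The problem stays hard because y may only take these two values; the Rayleigh quotient itself is easy to minimize once that restriction is dropped.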
![Page 50: lecture8 - MIT CSAILpeople.csail.mit.edu/dsontag/courses/ml14/slides/lecture8.pdf · Lecture8 David&Sontag& New&York&University& Slides adapted from Luke Zettlemoyer, Vibhav Gogate,](https://reader033.vdocuments.mx/reader033/viewer/2022043017/5f39b5e1fc77d34ccb580f3e/html5/thumbnails/50.jpg)
Solving NCut
• Relax the optimization problem into the continuous domain by solving generalized eigenvalue system:

$$\min_y \; y^T (D - W)\, y \quad \text{subject to} \quad y^T D\, y = 1$$

• Which gives: $(D - W)\, y = \lambda D\, y$
• Note that $(D - W)\,\mathbf{1} = 0$, so the first eigenvector is $y_0 = \mathbf{1}$ with eigenvalue $0$.
• The eigenvector with the second smallest eigenvalue is the real valued solution to this problem!!
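The step from the constrained minimization to the eigenvalue system can be sketched with a Lagrange multiplier $\lambda$. This is a standard argument that the slide does not spell out, and it uses the symmetry of $D - W$:

$$\begin{aligned}
\mathcal{L}(y, \lambda) &= y^T (D - W)\, y - \lambda \left( y^T D\, y - 1 \right) \\
\nabla_y \mathcal{L} = 0 \;&\Longrightarrow\; 2 (D - W)\, y - 2 \lambda D\, y = 0 \;\Longrightarrow\; (D - W)\, y = \lambda D\, y \\
y^T (D - W)\, y &= \lambda\, y^T D\, y = \lambda
\end{aligned}$$

So at any stationary point the objective equals the eigenvalue $\lambda$, and smaller eigenvalues are better. The trivial solution $y \propto \mathbf{1}$ with $\lambda = 0$ violates the constraint $y^T D \mathbf{1} = 0$, which is why the eigenvector of the second smallest eigenvalue is used instead.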
![Page 51: lecture8 - MIT CSAILpeople.csail.mit.edu/dsontag/courses/ml14/slides/lecture8.pdf · Lecture8 David&Sontag& New&York&University& Slides adapted from Luke Zettlemoyer, Vibhav Gogate,](https://reader033.vdocuments.mx/reader033/viewer/2022043017/5f39b5e1fc77d34ccb580f3e/html5/thumbnails/51.jpg)
2-way Normalized Cuts
1. Compute the affinity matrix W, compute the degree matrix (D), D is diagonal and $D(i,i) = \sum_j W(i,j)$
2. Solve $(D - W)\, y = \lambda D\, y$, where $D - W$ is called the Laplacian matrix
3. Use the eigenvector with the second smallest eigenvalue to bipartition the graph into two parts.
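The three steps can be sketched with NumPy alone. This version assumes every node has positive degree and solves the generalized problem through the equivalent symmetric problem $D^{-1/2}(D - W)D^{-1/2} z = \lambda z$ with $y = D^{-1/2} z$; the function name, the toy graph, and the split at zero are illustrative choices rather than part of the slides.

```python
import numpy as np

def second_generalized_eigenvector(W):
    """Solve (D - W) y = lambda D y; return (lambda, y) for the second smallest eigenvalue.

    Assumes W is symmetric and every node has positive degree.
    """
    d = W.sum(axis=1)                       # step 1: degrees, D = diag(d)
    d_inv_sqrt = 1.0 / np.sqrt(d)
    # Symmetric form D^{-1/2} (D - W) D^{-1/2}, built by scaling rows and columns.
    L_sym = (np.diag(d) - W) * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    eigvals, eigvecs = np.linalg.eigh(L_sym)  # step 2: eigenvalues in ascending order
    return eigvals[1], d_inv_sqrt * eigvecs[:, 1]

# Hypothetical toy graph: two triangles (nodes 0-2 and 3-5) joined by one weak edge.
W = np.zeros((6, 6))
for i, j, w in [(0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0),
                (3, 4, 1.0), (3, 5, 1.0), (4, 5, 1.0), (2, 3, 0.1)]:
    W[i, j] = W[j, i] = w

lam, y = second_generalized_eigenvector(W)
labels = y > 0                              # step 3: bipartition, here by sign

print("second smallest eigenvalue:", lam)
print("eigenvector:", np.round(y, 3))
print("group of each node:", labels.astype(int))
```

The sign of an eigenvector is arbitrary, so which triangle ends up labelled 1 can vary; only the grouping is meaningful.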
![Page 52: lecture8 - MIT CSAILpeople.csail.mit.edu/dsontag/courses/ml14/slides/lecture8.pdf · Lecture8 David&Sontag& New&York&University& Slides adapted from Luke Zettlemoyer, Vibhav Gogate,](https://reader033.vdocuments.mx/reader033/viewer/2022043017/5f39b5e1fc77d34ccb580f3e/html5/thumbnails/52.jpg)
Creating Bi-partition Using 2nd Eigenvector
• Sometimes there is not a clear threshold to split based on the second vector since it takes continuous values
• How to choose the splitting point?
  a) Pick a constant value (0, or 0.5).
  b) Pick the median value as splitting point.
  c) Look for the splitting point that has the minimum Ncut value:
    1. Choose n possible splitting points.
    2. Compute Ncut value.
    3. Pick minimum.
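Option (c) can be sketched as a simple scan over candidate thresholds. In the example below the vector `y` is a made-up stand-in for a second eigenvector (its values are illustrative, not computed), the candidates are evenly spaced between its minimum and maximum, and the function names are assumptions. The sketch assumes `y` is not constant and every node has positive degree.

```python
import numpy as np

def ncut(W, in_A):
    """Ncut(A, B) = cut/Vol(A) + cut/Vol(B), where in_A is a boolean mask for group A."""
    cut = W[in_A][:, ~in_A].sum()
    vol_A, vol_B = W[in_A].sum(), W[~in_A].sum()
    return cut / vol_A + cut / vol_B

def best_split(W, y, n_points=10):
    """Try n_points evenly spaced thresholds strictly inside (min(y), max(y)).

    Returns (lowest Ncut value, threshold); nodes with y > threshold form group A.
    """
    thresholds = np.linspace(y.min(), y.max(), n_points + 2)[1:-1]
    return min((ncut(W, y > t), t) for t in thresholds)

# Hypothetical toy graph: two triangles (nodes 0-2 and 3-5) joined by one weak edge.
W = np.zeros((6, 6))
for i, j, w in [(0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0),
                (3, 4, 1.0), (3, 5, 1.0), (4, 5, 1.0), (2, 3, 0.1)]:
    W[i, j] = W[j, i] = w

# Made-up continuous values standing in for the second eigenvector.
y = np.array([0.45, 0.40, 0.30, -0.25, -0.40, -0.45])

best_value, best_t = best_split(W, y)
print("best threshold:", best_t)
print("Ncut at best threshold:", best_value)
print("group A:", np.flatnonzero(y > best_t))
```

Several thresholds usually produce the same partition; this sketch keeps the smallest such threshold, but any value in the same gap gives the same split.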
![Page 53: lecture8 - MIT CSAILpeople.csail.mit.edu/dsontag/courses/ml14/slides/lecture8.pdf · Lecture8 David&Sontag& New&York&University& Slides adapted from Luke Zettlemoyer, Vibhav Gogate,](https://reader033.vdocuments.mx/reader033/viewer/2022043017/5f39b5e1fc77d34ccb580f3e/html5/thumbnails/53.jpg)
Spectral clustering: example
[Figure: two 2D plots (left x-axis −3 to 5, right x-axis −4 to 6, both y-axes −2 to 6)]
Tommi Jaakkola, MIT CSAIL
![Page 54: lecture8 - MIT CSAILpeople.csail.mit.edu/dsontag/courses/ml14/slides/lecture8.pdf · Lecture8 David&Sontag& New&York&University& Slides adapted from Luke Zettlemoyer, Vibhav Gogate,](https://reader033.vdocuments.mx/reader033/viewer/2022043017/5f39b5e1fc77d34ccb580f3e/html5/thumbnails/54.jpg)
Spectral clustering: example cont’d
[Figure: components of the eigenvector corresponding to the second largest eigenvalue (x-axis 0 to 40, y-axis −0.5 to 0.5)]
Tommi Jaakkola, MIT CSAIL
![Page 55: lecture8 - MIT CSAILpeople.csail.mit.edu/dsontag/courses/ml14/slides/lecture8.pdf · Lecture8 David&Sontag& New&York&University& Slides adapted from Luke Zettlemoyer, Vibhav Gogate,](https://reader033.vdocuments.mx/reader033/viewer/2022043017/5f39b5e1fc77d34ccb580f3e/html5/thumbnails/55.jpg)
K-way Partition?
• Recursive bi-partitioning (Hagen et al., '91)
– Recursively apply bi-partitioning algorithm in a hierarchical divisive manner.
– Disadvantages: Inefficient, unstable
• Cluster multiple eigenvectors
– Build a reduced space from multiple eigenvectors.
– Commonly used in recent papers
– A preferable approach: it's like doing dimension reduction then k-means
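The second approach can be sketched as "embed each node using k eigenvectors, then run k-means on the rows". Everything below is an illustrative assumption: the nine-node graph, the function names, the use of the generalized eigenvectors of $(D - W)\,y = \lambda D\,y$ as the embedding, and a deliberately simple k-means with farthest-first initialization (chosen so the example is deterministic).

```python
import numpy as np

def spectral_embedding(W, k):
    """Rows are nodes; columns are the generalized eigenvectors of the k smallest eigenvalues."""
    d = W.sum(axis=1)                     # assumes every node has positive degree
    d_inv_sqrt = 1.0 / np.sqrt(d)
    L_sym = (np.diag(d) - W) * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(L_sym)    # eigenvalues come back in ascending order
    return d_inv_sqrt[:, None] * eigvecs[:, :k]

def farthest_first_centers(X, k):
    """Deterministic initialization: start at X[0], then repeatedly add the farthest point."""
    centers = [X[0]]
    for _ in range(k - 1):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[d2.argmax()])
    return np.array(centers)

def kmeans(X, k, n_iter=20):
    """Plain Lloyd iterations: assign each point to the nearest center, then recompute centers."""
    centers = farthest_first_centers(X, k)
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = X[labels == c].mean(axis=0)
    return labels

# Hypothetical graph: three triangles chained together by two weak edges.
W = np.zeros((9, 9))
for i, j, w in [(0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0),
                (3, 4, 1.0), (3, 5, 1.0), (4, 5, 1.0),
                (6, 7, 1.0), (6, 8, 1.0), (7, 8, 1.0),
                (2, 3, 0.1), (5, 6, 0.1)]:
    W[i, j] = W[j, i] = w

labels = kmeans(spectral_embedding(W, 3), 3)
print("cluster of each node:", labels)
```

In the reduced space the nodes of each triangle land almost on top of each other, so even this basic k-means separates them; the cluster numbering itself is arbitrary.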
![Page 56: lecture8 - MIT CSAILpeople.csail.mit.edu/dsontag/courses/ml14/slides/lecture8.pdf · Lecture8 David&Sontag& New&York&University& Slides adapted from Luke Zettlemoyer, Vibhav Gogate,](https://reader033.vdocuments.mx/reader033/viewer/2022043017/5f39b5e1fc77d34ccb580f3e/html5/thumbnails/56.jpg)