L22 K-means & GMM - Virginia Tech
TRANSCRIPT

ECE 5424: Introduction to Machine Learning
Stefan Lee, Virginia Tech

Topics:
– Unsupervised Learning: K-means, GMM, EM

Readings: Barber 20.1-20.3
Tasks
(C) Dhruv Batra 2

| Task | Mapping | Output | Setting |
| --- | --- | --- | --- |
| Classification | x → y | Discrete | Supervised Learning |
| Regression | x → y | Continuous | Supervised Learning |
| Clustering | x → c | Discrete ID | Unsupervised Learning |
| Dimensionality Reduction | x → z | Continuous | Unsupervised Learning |
Unsupervised Learning
• Learning only with X
  – Y not present in training data
• Some example unsupervised learning problems:
  – Clustering / Factor Analysis
  – Dimensionality Reduction / Embeddings
  – Density Estimation with Mixture Models
(C) Dhruv Batra 3
New Topic: Clustering
Slide Credit: Carlos Guestrin 4
Synonyms
• Clustering
• Vector Quantization
• Latent Variable Models
• Hidden Variable Models
• Mixture Models
• Algorithms:
  – K-means
  – Expectation Maximization (EM)
(C) Dhruv Batra 5
Some Data
(C) Dhruv Batra 6, Slide Credit: Carlos Guestrin
K-means
1. Ask user how many clusters they'd like (e.g., k=5).
2. Randomly guess k cluster center locations.
3. Each datapoint finds out which center it's closest to. (Thus each center "owns" a set of datapoints.)
4. Each center finds the centroid of the points it owns.
5. …Repeat until terminated!
(C) Dhruv Batra 7-11, Slide Credit: Carlos Guestrin
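The five steps above can be sketched in a few lines of NumPy. This is a minimal illustration (not the course's reference code); the toy data, seed, and stopping rule are made up for the example.

```python
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    """Lloyd's algorithm: alternate assign (step 3) and recenter (step 4)."""
    rng = np.random.default_rng(seed)
    # Step 2: randomly guess k cluster centers (here: k random datapoints).
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Step 3: each datapoint finds the center it is closest to.
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        assign = d.argmin(axis=1)
        # Step 4: each center moves to the centroid of the points it owns.
        new_centers = np.array([X[assign == j].mean(axis=0) if np.any(assign == j)
                                else centers[j] for j in range(k)])
        # Step 5: repeat until terminated (here: until centers stop moving).
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, assign

# Toy data: two well-separated blobs around 0 and 5.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(5, 0.5, (50, 2))])
centers, assign = kmeans(X, k=2)
```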
K-means
• Randomly initialize k centers
  – μ^(0) = μ_1^(0), …, μ_k^(0)
• Assign: assign each point i ∈ {1, …, n} to the nearest center:
  – C(i) ← argmin_j ||x_i − μ_j||²
• Recenter: μ_j becomes the centroid of its points
(C) Dhruv Batra 12, Slide Credit: Carlos Guestrin
K-means
• Demo
  – http://mlehman.github.io/kmeans-javascript/
(C) Dhruv Batra 13
What is K-means optimizing?
• Objective F(μ, C): function of centers μ and point allocations C:
  – F(μ, C) = Σ_{i=1}^{N} ||x_i − μ_{C(i)}||²
• With a 1-of-k encoding a (a_ij = 1 iff C(i) = j):
  – F(μ, a) = Σ_{i=1}^{N} Σ_{j=1}^{k} a_ij ||x_i − μ_j||²
• Optimal K-means:
  – min_μ min_a F(μ, a)
(C) Dhruv Batra 14
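The two forms of the objective are the same quantity under the 1-of-k encoding. A quick numeric check (toy data and an arbitrary made-up allocation, purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 2))          # N=6 toy points in 2-D
mu = rng.normal(size=(3, 2))         # k=3 centers
C = np.array([0, 1, 2, 0, 1, 2])     # an arbitrary allocation C(i)

# F(mu, C) = sum_i ||x_i - mu_{C(i)}||^2
F_C = np.sum(np.linalg.norm(X - mu[C], axis=1) ** 2)

# 1-of-k encoding: a_ij = 1 iff C(i) = j
a = np.eye(3)[C]
# F(mu, a) = sum_i sum_j a_ij ||x_i - mu_j||^2
F_a = np.sum(a * np.linalg.norm(X[:, None, :] - mu[None, :, :], axis=2) ** 2)
```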
Coordinate descent algorithms
(C) Dhruv Batra 15, Slide Credit: Carlos Guestrin
• Want: min_a min_b F(a, b)
• Coordinate descent:
  – fix a, minimize over b
  – fix b, minimize over a
  – repeat
• Converges!!!
  – if F is bounded
  – to a (often good) local optimum
    • as we saw in the applet (play with it!)
• K-means is a coordinate descent algorithm!
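Coordinate descent is easy to see on a tiny two-variable objective. The function below is made up for illustration (it is not from the lecture); its joint minimum is (a, b) = (1, 2), and each coordinate update has a closed form obtained by setting the partial derivative to zero.

```python
# Coordinate descent on F(a, b) = (a - 1)^2 + (b - 2a)^2.
def F(a, b):
    return (a - 1) ** 2 + (b - 2 * a) ** 2

a, b = 0.0, 0.0
history = [F(a, b)]
for _ in range(50):
    b = 2 * a              # fix a, minimize over b:  dF/db = 0  =>  b = 2a
    a = (1 + 2 * b) / 5    # fix b, minimize over a:  dF/da = 0  =>  a = (1 + 2b)/5
    history.append(F(a, b))
```

Because each step exactly minimizes over one coordinate, `history` never increases, and the iterates converge to (1, 2): exactly the argument the slide makes for k-means, where the two "coordinates" are the assignments a and the centers μ.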
K-means as Coordinate Descent
• Optimize objective function:
  min_{μ_1,…,μ_k} min_{a_1,…,a_N} F(μ, a) = min_{μ_1,…,μ_k} min_{a_1,…,a_N} Σ_{i=1}^{N} Σ_{j=1}^{k} a_ij ||x_i − μ_j||²
• Fix μ, optimize a (or C): each point picks its nearest center
(C) Dhruv Batra 16, Slide Credit: Carlos Guestrin
K-means as Coordinate Descent
• Optimize objective function:
  min_{μ_1,…,μ_k} min_{a_1,…,a_N} F(μ, a) = min_{μ_1,…,μ_k} min_{a_1,…,a_N} Σ_{i=1}^{N} Σ_{j=1}^{k} a_ij ||x_i − μ_j||²
• Fix a (or C), optimize μ: each center moves to the mean of its assigned points
(C) Dhruv Batra 17, Slide Credit: Carlos Guestrin
One important use of K-means
• Bag-of-words models in computer vision
(C) Dhruv Batra 18
Bag of Words model

| word | count |
| --- | --- |
| aardvark | 0 |
| about | 2 |
| all | 2 |
| Africa | 1 |
| apple | 0 |
| anxious | 0 |
| ... | |
| gas | 1 |
| ... | |
| oil | 1 |
| ... | |
| Zaire | 0 |

(C) Dhruv Batra 19, Slide Credit: Carlos Guestrin
Object Bag of 'words'
Slide Credit: Fei-Fei Li
Slide Credit: Fei-Fei Li
Interest Point Features
1. Detect patches [Mikolajczyk and Schmid '02], [Matas et al. '02], [Sivic et al. '03]
2. Normalize patch
3. Compute SIFT descriptor [Lowe '99]
Slide credit: Josef Sivic
…
Patch Features
Slide credit: Josef Sivic
Dictionary formation
…
Slide credit: Josef Sivic
Clustering (usually k-means)
Vector quantization
…
Slide credit: Josef Sivic
Clustered Image Patches
Fei-Fei et al. 2005
Image representation
[Figure: histogram of codeword frequencies (y-axis: frequency; x-axis: codewords)]
Slide Credit: Fei-Fei Li
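The pipeline on the last few slides (descriptors → vector quantization → histogram) can be sketched as follows. The descriptors and codebook here are random stand-ins for illustration; in practice the codebook is the set of k-means centers learned on training descriptors, as the earlier slides describe.

```python
import numpy as np

rng = np.random.default_rng(0)
codebook = rng.normal(size=(8, 128))       # 8 codewords (stand-in for k-means centers)
descriptors = rng.normal(size=(40, 128))   # 40 SIFT-like descriptors from one image (toy data)

# Vector quantization: map each descriptor to its nearest codeword...
d = np.linalg.norm(descriptors[:, None, :] - codebook[None, :, :], axis=2)
words = d.argmin(axis=1)

# ...then the image representation is the histogram of codeword frequencies.
bow = np.bincount(words, minlength=len(codebook))
```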
(One) bad case for k-means
• Clusters may overlap
• Some clusters may be "wider" than others
• GMM to the rescue!
(C) Dhruv Batra 28, Slide Credit: Carlos Guestrin
GMM
[Figure: histogram of 1-D data on the range −25 to 25]
(C) Dhruv Batra 29, Figure Credit: Kevin Murphy
Recall Multivariate Gaussians
(C) Dhruv Batra 30
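For reference, the density being recalled is the standard multivariate Gaussian in D dimensions:

```latex
\mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}, \Sigma)
  = \frac{1}{(2\pi)^{D/2}\,|\Sigma|^{1/2}}
    \exp\!\left( -\tfrac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^\top \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu}) \right)
```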
GMM
[Figure: 2-D data with Gaussian mixture fit; both axes span roughly 0.1 to 0.9]
(C) Dhruv Batra 31, Figure Credit: Kevin Murphy
Hidden Data Causes Problems #1
• Fully observed (log) likelihood factorizes
• Marginal (log) likelihood doesn't factorize
• All parameters coupled!
(C) Dhruv Batra 32
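The coupling comes from the sum over components sitting inside the log: log Σ_k π_k N(x_i | μ_k, σ²) does not split into per-component terms. A minimal sketch for a 1-D mixture with shared variance (toy parameters, chosen for this example), using a hand-rolled log-sum-exp for stability:

```python
import numpy as np

def gmm_loglik(X, pi, mu, sigma2):
    """Marginal log-likelihood sum_i log sum_k pi_k N(x_i | mu_k, sigma2)."""
    # log N(x_i | mu_k, sigma2) for every point/component pair
    logp = (-0.5 * np.log(2 * np.pi * sigma2)
            - (X[:, None] - mu[None, :]) ** 2 / (2 * sigma2))
    # log sum_k pi_k N(...), computed stably via log-sum-exp
    a = np.log(pi)[None, :] + logp
    m = a.max(axis=1, keepdims=True)
    return np.sum(m.squeeze(1) + np.log(np.exp(a - m).sum(axis=1)))

X = np.array([-1.0, 0.0, 5.0, 6.0])
ll = gmm_loglik(X, pi=np.array([0.5, 0.5]), mu=np.array([0.0, 5.0]), sigma2=1.0)
```

Changing any one μ_k changes every term of the sum through the log, which is why no single parameter can be optimized in closed form on its own.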
GMM vs Gaussian Joint Bayes Classifier
• On board:
  – Observed Y vs Unobserved Z
  – Likelihood vs Marginal Likelihood
(C) Dhruv Batra 33
Hidden Data Causes Problems #2
[Figure: histogram of 1-D data on the range −25 to 25]
(C) Dhruv Batra 34, Figure Credit: Kevin Murphy
Hidden Data Causes Problems #2
• Identifiability: swapping the component labels (μ1 ↔ μ2) leaves the likelihood unchanged
[Figure: likelihood surface over (μ1, μ2), symmetric about the diagonal]
(C) Dhruv Batra 35, Figure Credit: Kevin Murphy
Hidden Data Causes Problems #3
• Likelihood has singularities if one Gaussian "collapses" onto a single datapoint
[Figure: p(x) vs x, with one component degenerating into a narrow spike]
(C) Dhruv Batra 36
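The collapse is easy to see numerically. In this toy 1-D illustration (made-up data and parameters), one component is centered exactly on a datapoint; as its standard deviation shrinks, that point's density grows like 1/σ while the broad component keeps the other points' likelihoods finite, so the total log-likelihood diverges.

```python
import numpy as np

X = np.array([0.0, 1.0, 2.0, 3.0])   # toy 1-D data

def mix_loglik(X, sigma_small):
    """Two-component mixture: one broad Gaussian (variance 4), and one
    centered exactly on X[0] with variance sigma_small**2."""
    broad = np.exp(-(X - X.mean()) ** 2 / (2 * 4.0)) / np.sqrt(2 * np.pi * 4.0)
    spike = (np.exp(-(X - X[0]) ** 2 / (2 * sigma_small ** 2))
             / np.sqrt(2 * np.pi * sigma_small ** 2))
    return np.sum(np.log(0.5 * broad + 0.5 * spike))

# As the spike's sigma shrinks, the log-likelihood grows without bound.
lls = [mix_loglik(X, s) for s in (1.0, 0.1, 0.01, 0.001)]
```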
Special case: spherical Gaussians and hard assignments
• If P(x | z=k) is spherical, with the same σ for all classes:
  P(x_i | z=j) ∝ exp( −(1/2σ²) ||x_i − μ_j||² )
• If each x_i belongs to one class C(i) (hard assignment), marginal likelihood:
  ∏_{i=1}^{N} Σ_{j=1}^{k} P(x_i, z=j) ∝ ∏_{i=1}^{N} exp( −(1/2σ²) ||x_i − μ_{C(i)}||² )
• M(M)LE same as K-means!!!
(C) Dhruv Batra 37, Slide Credit: Carlos Guestrin
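Taking the log of the hard-assignment likelihood makes the equivalence explicit:

```latex
\log \prod_{i=1}^{N} \exp\!\left( -\frac{1}{2\sigma^2} \| x_i - \mu_{C(i)} \|^2 \right)
  = \text{const} - \frac{1}{2\sigma^2} \sum_{i=1}^{N} \| x_i - \mu_{C(i)} \|^2
```

So maximizing this likelihood over the assignments C and means μ is exactly minimizing the K-means objective F(μ, C) from earlier in the lecture.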
The K-means GMM assumption
• There are k components
• Component i has an associated mean vector μ_i
• Each component generates data from a Gaussian with mean μ_i and covariance matrix σ²I

Each data point is generated according to the following recipe:
1. Pick a component at random: choose component i with probability P(y=i)
2. Datapoint ~ N(μ_i, σ²I)
(C) Dhruv Batra 38-41, Slide Credit: Carlos Guestrin
The General GMM assumption
• There are k components
• Component i has an associated mean vector μ_i
• Each component generates data from a Gaussian with mean μ_i and covariance matrix Σ_i

Each data point is generated according to the following recipe:
1. Pick a component at random: choose component i with probability P(y=i)
2. Datapoint ~ N(μ_i, Σ_i)
(C) Dhruv Batra 42, Slide Credit: Carlos Guestrin
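The two-step recipe above is a generative sampler, and it can be written down directly. The weights, means, and covariances here are made-up toy parameters for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy 2-component, 2-D GMM (hypothetical parameters).
weights = np.array([0.3, 0.7])                    # P(y = i)
means = np.array([[0.0, 0.0], [5.0, 5.0]])        # mu_i
covs = np.array([[[1.0, 0.0], [0.0, 1.0]],        # Sigma_i
                 [[0.5, 0.2], [0.2, 0.5]]])

def sample_gmm(n):
    # Step 1: pick a component at random with probability P(y = i).
    ys = rng.choice(len(weights), size=n, p=weights)
    # Step 2: datapoint ~ N(mu_i, Sigma_i).
    X = np.array([rng.multivariate_normal(means[y], covs[y]) for y in ys])
    return X, ys

X, ys = sample_gmm(2000)
```

Fitting a GMM with EM is exactly the inverse problem: given only X (not ys), recover the weights, means, and covariances.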