K-means and GMM

Network Intelligence and Analysis Lab | Clustering methods via EM algorithm | 2014.07.10 | Sanghyuk Chun


TRANSCRIPT

Page 1: K-means and GMM

Network Intelligence and Analysis Lab

Clustering methods via EM algorithm

2014.07.10 Sanghyuk Chun

Page 2: K-means and GMM

• Machine Learning
  • Training data
  • Learning model
• Unsupervised Learning
  • Training data without labels
  • Input data: $D = \{x_1, x_2, \dots, x_N\}$
  • Most unsupervised learning problems try to find hidden structure in unlabeled data
  • Examples: Clustering, Dimensionality Reduction (PCA, LDA), …

Machine Learning and Unsupervised Learning

Page 3: K-means and GMM

• Clustering
  • Grouping objects in such a way that objects in the same group are more similar to each other than to objects in other groups
  • Input: a set of objects (or data) without group information
  • Output: a cluster index for each object
  • Usage: customer segmentation, image segmentation, …

Unsupervised Learning and Clustering

[Diagram: Input → Clustering Algorithm → Output]

Page 4: K-means and GMM

K-means Clustering
  • Introduction
  • Optimization

Page 5: K-means and GMM

• Intuition: data points in the same cluster are closer to each other than to data points in other clusters
• Goal: minimize the distance between data points in the same cluster
• Objective function (evaluated in the code sketch below):

  $$J = \sum_{n=1}^{N} \sum_{k=1}^{K} r_{nk} \, \lVert \mathbf{x}_n - \boldsymbol{\mu}_k \rVert^2$$

  • where N is the number of data points and K is the number of clusters
  • $r_{nk} \in \{0, 1\}$ is an indicator variable describing which of the K clusters the data point $\mathbf{x}_n$ is assigned to
  • $\boldsymbol{\mu}_k$ is a prototype associated with the k-th cluster
    • Eventually $\boldsymbol{\mu}_k$ is the same as the center (mean) of cluster k

K-means Clustering
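As a side note not on the slide: the objective J above is straightforward to evaluate directly. Below is a minimal NumPy sketch (function and variable names are my own) that computes J from a data matrix, a one-hot assignment matrix r, and the prototypes μ.

```python
import numpy as np

def kmeans_objective(X, r, mu):
    """Distortion J = sum_n sum_k r_nk * ||x_n - mu_k||^2.

    X  : (N, d) data matrix
    r  : (N, K) one-hot assignment matrix, r[n, k] in {0, 1}
    mu : (K, d) cluster prototypes
    """
    # Squared Euclidean distance between every point and every prototype: (N, K)
    sq_dists = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
    return float((r * sq_dists).sum())

# Tiny example: 4 points, 2 clusters
X = np.array([[0.0, 0.0], [0.1, 0.2], [4.0, 4.0], [4.2, 3.9]])
r = np.array([[1, 0], [1, 0], [0, 1], [0, 1]])
mu = np.array([[0.05, 0.1], [4.1, 3.95]])
print(kmeans_objective(X, r, mu))  # small value: points sit close to their prototypes
```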

Page 6: K-means and GMM

• Objective function:

  $$\underset{\{r_{nk},\, \boldsymbol{\mu}_k\}}{\arg\min} \; \sum_{n=1}^{N} \sum_{k=1}^{K} r_{nk} \, \lVert \mathbf{x}_n - \boldsymbol{\mu}_k \rVert^2$$

• This function can be minimized through an iterative procedure
  • Step 1: minimize J with respect to the $r_{nk}$, keeping $\boldsymbol{\mu}_k$ fixed
  • Step 2: minimize J with respect to the $\boldsymbol{\mu}_k$, keeping $r_{nk}$ fixed
  • Repeat Steps 1 and 2 until convergence
• Does it always converge?

K-means Clustering – Optimization

Page 7: K-means and GMM

• Biconvex optimization is a generalization of convex optimization in which the objective function and the constraint set can be biconvex
• $f(x, y)$ is biconvex if, fixing x, $f_x(y) = f(x, y)$ is convex over Y and, fixing y, $f_y(x) = f(x, y)$ is convex over X
• One way to solve a biconvex optimization problem is to iteratively solve the corresponding convex problems (see the sketch below)
  • This does not guarantee the globally optimal point
  • But it always converges to some local optimum

Optional – Biconvex optimization
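As an illustration of this alternating scheme (not taken from the slides), here is a toy biconvex function of my own choosing, f(x, y) = x² + y² + (xy − 1)², which is convex in each variable separately but not jointly convex; each inner minimization has a closed form, so the two convex solves can simply be alternated.

```python
import numpy as np

def f(x, y):
    # Toy biconvex function: convex in x for fixed y and convex in y for fixed x,
    # but not jointly convex because of the (xy - 1)^2 coupling term.
    return x**2 + y**2 + (x * y - 1.0) ** 2

x, y = 2.0, -3.0                     # arbitrary starting point
for it in range(30):
    # Minimize over x with y fixed: d/dx [x^2 + (xy - 1)^2 + y^2] = 0  =>  x = y / (1 + y^2)
    x = y / (1.0 + y**2)
    # Minimize over y with x fixed (same form by symmetry)
    y = x / (1.0 + x**2)
    if it % 5 == 0:
        print(it, round(x, 4), round(y, 4), round(f(x, y), 4))  # f decreases monotonically
```

In this particular toy problem the iterates happen to reach the global minimizer at (0, 0); in general a biconvex problem can have several local optima and the alternating procedure is only guaranteed to reach one of them, which is exactly the situation for K-means.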

Page 8: K-means and GMM

• Objective function:

  $$\underset{\{r_{nk},\, \boldsymbol{\mu}_k\}}{\arg\min} \; \sum_{n=1}^{N} \sum_{k=1}^{K} r_{nk} \, \lVert \mathbf{x}_n - \boldsymbol{\mu}_k \rVert^2$$

• Step 1: minimize J with respect to the $r_{nk}$, keeping $\boldsymbol{\mu}_k$ fixed

  $$r_{nk} = \begin{cases} 1 & \text{if } k = \arg\min_j \lVert \mathbf{x}_n - \boldsymbol{\mu}_j \rVert^2 \\ 0 & \text{otherwise} \end{cases}$$

• Step 2: minimize J with respect to the $\boldsymbol{\mu}_k$, keeping $r_{nk}$ fixed
  • Setting the derivative with respect to $\boldsymbol{\mu}_k$ to zero gives $2 \sum_n r_{nk} (\mathbf{x}_n - \boldsymbol{\mu}_k) = 0$, so

  $$\boldsymbol{\mu}_k = \frac{\sum_n r_{nk} \mathbf{x}_n}{\sum_n r_{nk}}$$

  • $\boldsymbol{\mu}_k$ is equal to the mean of all the data points assigned to cluster k (the full procedure is sketched below)

K-means Clustering – Optimization
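The two steps above can be combined into a complete K-means routine. The following is a minimal sketch of my own (random initialization from the data, convergence test on unchanged assignments), not code from the slides:

```python
import numpy as np

def kmeans(X, K, max_iter=100, seed=0):
    """Minimal K-means: alternate the two minimization steps until assignments stop changing.

    X : (N, d) data matrix, K : number of clusters.
    Returns (labels, mu) where labels[n] is the index of the cluster x_n is assigned to.
    """
    rng = np.random.default_rng(seed)
    N = X.shape[0]
    mu = X[rng.choice(N, size=K, replace=False)]        # initialize prototypes with K random points
    labels = np.full(N, -1)
    for _ in range(max_iter):
        # Step 1: assign each point to its nearest prototype (minimize J over r_nk, mu fixed)
        sq_dists = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)   # (N, K)
        new_labels = sq_dists.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break                                        # assignments unchanged -> converged
        labels = new_labels
        # Step 2: move each prototype to the mean of its assigned points (minimize J over mu, r fixed)
        for k in range(K):
            members = X[labels == k]
            if len(members) > 0:                         # keep the old prototype if a cluster empties
                mu[k] = members.mean(axis=0)
    return labels, mu

# Usage on two well-separated blobs
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.5, size=(50, 2)), rng.normal(5, 0.5, size=(50, 2))])
labels, mu = kmeans(X, K=2)
print(mu)   # roughly [0, 0] and [5, 5]
```

Because each step can only decrease J and there are finitely many possible assignments, the loop always terminates, typically at a local optimum that depends on the initialization.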

Page 9: K-means and GMM

• Advantages of K-means clustering
  • Easy to implement (kmeans in Matlab, kcluster in Python)
  • In practice, it works well
• Disadvantages of K-means clustering
  • It can converge to a local optimum
  • Computing the Euclidean distance to every point is expensive
    • Solution: batch K-means
  • The Euclidean distance is not robust to outliers
    • Solution: K-medoids algorithms (use a different metric)

K-means Clustering – Conclusion

Page 10: K-means and GMM

Mixture of Gaussians
  • Mixture Model
  • EM Algorithm
  • EM for Gaussian Mixtures

Page 11: K-means and GMM

• Assumption: there are k components $\{c_i\}_{i=1}^{k}$
  • Component $c_i$ has an associated mean vector $\mu_i$
  • Each component generates data from a Gaussian with mean $\mu_i$ and covariance matrix $\Sigma_i$ (see the sampling sketch below)

Mixture of Gaussians

[Figure: five Gaussian components with means $\mu_1, \dots, \mu_5$]
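To make the generative assumption concrete, here is a small sampling sketch (my own construction with arbitrary illustrative parameters): first pick a component c_i with probability π_i, then draw the point from that component's Gaussian.

```python
import numpy as np

rng = np.random.default_rng(0)

# Example mixture with k = 3 components (illustrative parameters, not from the slides)
pi   = np.array([0.5, 0.3, 0.2])                        # mixing coefficients, sum to 1
mus  = np.array([[0.0, 0.0], [4.0, 4.0], [-3.0, 3.0]])  # component means mu_i
covs = np.array([np.eye(2), 0.5 * np.eye(2), np.diag([2.0, 0.3])])  # covariances Sigma_i

def sample_gmm(n):
    """Draw n points: choose a component c_i with probability pi_i, then sample N(mu_i, Sigma_i)."""
    comps = rng.choice(len(pi), size=n, p=pi)            # which component generated each point
    X = np.array([rng.multivariate_normal(mus[c], covs[c]) for c in comps])
    return X, comps

X, comps = sample_gmm(500)
print(X.shape, np.bincount(comps) / len(comps))          # component frequencies are roughly pi
```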

Page 12: K-means and GMM

• Represent the model as a linear combination of Gaussians
• Probability density function of a GMM (transcribed to code below):

  $$p(x) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k)$$

  $$\mathcal{N}(x \mid \mu_k, \Sigma_k) = \frac{1}{(2\pi)^{d/2} \, |\Sigma_k|^{1/2}} \exp\left\{ -\frac{1}{2} (x - \mu_k)^\top \Sigma_k^{-1} (x - \mu_k) \right\}$$

• This is called a mixture of Gaussians or a Gaussian Mixture Model
  • Each Gaussian density is called a component of the mixture and has its own mean $\mu_k$ and covariance $\Sigma_k$
  • The parameters $\pi_k$ are called mixing coefficients ($\sum_k \pi_k = 1$)

Gaussian Mixture Model
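The density above translates directly into code. A minimal NumPy sketch with my own helper names (no special numerical care taken):

```python
import numpy as np

def gaussian_pdf(x, mu, Sigma):
    """Multivariate normal density N(x | mu, Sigma) for a single point x of dimension d."""
    d = len(mu)
    diff = x - mu
    norm = 1.0 / np.sqrt((2 * np.pi) ** d * np.linalg.det(Sigma))
    return norm * np.exp(-0.5 * diff @ np.linalg.solve(Sigma, diff))

def gmm_pdf(x, pi, mus, Sigmas):
    """p(x) = sum_k pi_k * N(x | mu_k, Sigma_k)."""
    return sum(pi[k] * gaussian_pdf(x, mus[k], Sigmas[k]) for k in range(len(pi)))

# Example with two 2-D components (illustrative parameters)
pi = np.array([0.6, 0.4])
mus = [np.zeros(2), np.array([3.0, 3.0])]
Sigmas = [np.eye(2), 2.0 * np.eye(2)]
print(gmm_pdf(np.array([0.0, 0.0]), pi, mus, Sigmas))
```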

Page 13: K-means and GMM

• $p(x) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k)$, where $\sum_k \pi_k = 1$
• Input:
  • The training set: $\{x_i\}_{i=1}^{N}$
  • Number of clusters: k
• Goal: model this data using a mixture of Gaussians
  • Mixing coefficients $\pi_1, \pi_2, \dots, \pi_k$
  • Means and covariances: $\mu_1, \mu_2, \dots, \mu_k$; $\Sigma_1, \Sigma_2, \dots, \Sigma_k$

Clustering using Mixture Model

Page 14: K-means and GMM

• $p(x \mid G) = p(x \mid \pi_1, \mu_1, \dots) = \sum_i p(x \mid c_i) \, p(c_i) = \sum_i \pi_i \, \mathcal{N}(x \mid \mu_i, \Sigma_i)$
• $p(x_1, x_2, \dots, x_N \mid G) = \prod_i p(x_i \mid G)$
• The log-likelihood function is given by

  $$\ln p(\mathbf{X} \mid \boldsymbol{\pi}, \boldsymbol{\mu}, \boldsymbol{\Sigma}) = \sum_{n=1}^{N} \ln \left\{ \sum_{k=1}^{K} \pi_k \, \mathcal{N}(\mathbf{x}_n \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k) \right\}$$

• Goal: find the parameters which maximize the log-likelihood (a direct evaluation of this quantity is sketched below)
  • Problem: the maximum likelihood solution is hard to compute in closed form
  • Solution: use the EM algorithm

Maximum Likelihood of GMM
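Evaluating the log-likelihood is easy; maximizing it is the hard part handled by EM on the following slides. The sketch below (my own code, assuming SciPy is available) works in log space and uses a log-sum-exp so that very small component densities do not underflow.

```python
import numpy as np
from scipy.stats import multivariate_normal
from scipy.special import logsumexp

def gmm_log_likelihood(X, pi, mus, Sigmas):
    """ln p(X | pi, mu, Sigma) = sum_n ln sum_k pi_k N(x_n | mu_k, Sigma_k)."""
    K = len(pi)
    # log of pi_k * N(x_n | mu_k, Sigma_k) for every point and component: (N, K)
    log_terms = np.column_stack([
        np.log(pi[k]) + multivariate_normal.logpdf(X, mean=mus[k], cov=Sigmas[k])
        for k in range(K)
    ])
    return logsumexp(log_terms, axis=1).sum()

# Example: evaluate the likelihood of data under a 2-component model (illustrative parameters)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(4, 1, (100, 2))])
pi = np.array([0.5, 0.5])
mus = [np.zeros(2), np.full(2, 4.0)]
Sigmas = [np.eye(2), np.eye(2)]
print(gmm_log_likelihood(X, pi, mus, Sigmas))
```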

Page 15: K-means and GMM

• The EM algorithm is an iterative procedure for finding the MLE
  • An expectation (E) step creates a function for the expectation of the log-likelihood, evaluated using the current estimate of the parameters
  • A maximization (M) step computes parameters maximizing the expected log-likelihood found in the E step
  • These parameter estimates are then used to determine the distribution of the latent variables in the next E step
• EM always converges to one of the local optima

EM (Expectation Maximization) Algorithm

Page 16: K-means and GMM

• Objective function:

  $$\underset{\{r_{nk},\, \boldsymbol{\mu}_k\}}{\arg\min} \; \sum_{n=1}^{N} \sum_{k=1}^{K} r_{nk} \, \lVert \mathbf{x}_n - \boldsymbol{\mu}_k \rVert^2$$

• E-step: minimize J with respect to the $r_{nk}$, keeping $\boldsymbol{\mu}_k$ fixed

  $$r_{nk} = \begin{cases} 1 & \text{if } k = \arg\min_j \lVert \mathbf{x}_n - \boldsymbol{\mu}_j \rVert^2 \\ 0 & \text{otherwise} \end{cases}$$

• M-step: minimize J with respect to the $\boldsymbol{\mu}_k$, keeping $r_{nk}$ fixed

  $$\boldsymbol{\mu}_k = \frac{\sum_n r_{nk} \mathbf{x}_n}{\sum_n r_{nk}}$$

K-means revisit: EM and K-means

Page 17: K-means and GMM

• Let $z_k$ be a Bernoulli random variable with probability $\pi_k$
  • $p(z_k = 1) = \pi_k$, where $\sum_k z_k = 1$ and $\sum_k \pi_k = 1$
• Because z uses a 1-of-K representation, this distribution can be written in the form

  $$p(z) = \prod_{k=1}^{K} \pi_k^{z_k}$$

• Similarly, the conditional distribution of x given a particular value for z is a Gaussian:

  $$p(x \mid z) = \prod_{k=1}^{K} \mathcal{N}(x \mid \mu_k, \Sigma_k)^{z_k}$$

Latent variable for GMM

Page 18: K-means and GMM

• The joint distribution is given by $p(x, z) = p(z) \, p(x \mid z)$
  • $p(x) = \sum_z p(z) \, p(x \mid z) = \sum_k \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k)$
  • Thus the marginal distribution of x is a Gaussian mixture of the above form
  • Now we are able to work with the joint distribution instead of the marginal distribution
• Graphical representation of a GMM for a set of N i.i.d. data points $\{x_n\}$ with corresponding latent variables $\{z_n\}$, where n = 1, …, N

Latent variable for GMM

[Figure: graphical model with a plate over n = 1, …, N; nodes $\mathbf{z}_n \to \mathbf{x}_n$, parameters $\boldsymbol{\pi}$, $\boldsymbol{\mu}$, $\boldsymbol{\Sigma}$]

Page 19: K-means and GMM

• Conditional probability of z given x
• From Bayes' theorem,

  $$\gamma(z_k) \equiv p(z_k = 1 \mid \mathbf{x}) = \frac{p(z_k = 1) \, p(\mathbf{x} \mid z_k = 1)}{\sum_{j=1}^{K} p(z_j = 1) \, p(\mathbf{x} \mid z_j = 1)} = \frac{\pi_k \, \mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)}{\sum_{j=1}^{K} \pi_j \, \mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}_j, \boldsymbol{\Sigma}_j)}$$

• $\gamma(z_k)$ can also be viewed as the responsibility that component k takes for 'explaining' the observation x (see the sketch below)

EM for Gaussian Mixtures (E-step)
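This E-step maps directly to code: evaluate π_k N(x_n | μ_k, Σ_k) for every point and component, then normalize each row. A minimal sketch with my own function names, assuming SciPy for the Gaussian density:

```python
import numpy as np
from scipy.stats import multivariate_normal

def responsibilities(X, pi, mus, Sigmas):
    """gamma[n, k] = pi_k N(x_n | mu_k, Sigma_k) / sum_j pi_j N(x_n | mu_j, Sigma_j)."""
    K = len(pi)
    # Unnormalized posterior of each component for each point: (N, K)
    weighted = np.column_stack([
        pi[k] * multivariate_normal.pdf(X, mean=mus[k], cov=Sigmas[k]) for k in range(K)
    ])
    return weighted / weighted.sum(axis=1, keepdims=True)

# Example: responsibilities under a simple 2-component model (illustrative parameters)
X = np.array([[0.1, -0.2], [3.9, 4.2]])
pi = np.array([0.5, 0.5])
mus = [np.zeros(2), np.full(2, 4.0)]
Sigmas = [np.eye(2), np.eye(2)]
print(responsibilities(X, pi, mus, Sigmas))   # each row sums to 1; rows are near one-hot here
```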

Page 20: K-means and GMM

• Likelihood function for GMM:

  $$\ln p(\mathbf{X} \mid \boldsymbol{\pi}, \boldsymbol{\mu}, \boldsymbol{\Sigma}) = \sum_{n=1}^{N} \ln \left\{ \sum_{k=1}^{K} \pi_k \, \mathcal{N}(\mathbf{x}_n \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k) \right\}$$

• Setting the derivatives of the log-likelihood with respect to the means $\mu_k$ of the Gaussian components to zero, we obtain

  $$\mu_k = \frac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk}) \, \mathbf{x}_n, \qquad \text{where } N_k = \sum_{n=1}^{N} \gamma(z_{nk})$$

EM for Gaussian Mixtures (M-step)

Page 21: K-means and GMM

• Setting the derivatives of the likelihood with respect to $\Sigma_k$ to zero, we obtain

  $$\boldsymbol{\Sigma}_k = \frac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk}) \, (\mathbf{x}_n - \boldsymbol{\mu}_k)(\mathbf{x}_n - \boldsymbol{\mu}_k)^\top$$

• Maximizing the likelihood with respect to the mixing coefficients $\pi_k$ by using a Lagrange multiplier on

  $$\ln p(\mathbf{X} \mid \boldsymbol{\pi}, \boldsymbol{\mu}, \boldsymbol{\Sigma}) + \lambda \left( \sum_{k=1}^{K} \pi_k - 1 \right),$$

  we obtain

  $$\pi_k = \frac{N_k}{N}$$

  (the three updates are collected in the code sketch below)

EM for Gaussian Mixtures (M-step)
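Given a responsibility matrix γ of shape N × K (for example from the E-step two slides back), the three closed-form updates on these two M-step slides become a few lines of NumPy. This is my own sketch; the demo responsibilities at the bottom are random and only there to make the snippet runnable.

```python
import numpy as np

def m_step(X, gamma):
    """Re-estimate (pi, mus, Sigmas) from data X (N, d) and responsibilities gamma (N, K)."""
    N, d = X.shape
    Nk = gamma.sum(axis=0)                               # N_k = sum_n gamma(z_nk), shape (K,)
    pi = Nk / N                                          # pi_k = N_k / N
    mus = (gamma.T @ X) / Nk[:, None]                    # mu_k = (1/N_k) sum_n gamma(z_nk) x_n
    Sigmas = []
    for k in range(gamma.shape[1]):
        diff = X - mus[k]                                # (N, d)
        # Sigma_k = (1/N_k) sum_n gamma(z_nk) (x_n - mu_k)(x_n - mu_k)^T
        Sigmas.append((gamma[:, k, None] * diff).T @ diff / Nk[k])
    return pi, mus, np.array(Sigmas)

# Example with a random (but row-normalized) responsibility matrix
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
gamma = rng.random((200, 3))
gamma /= gamma.sum(axis=1, keepdims=True)
pi, mus, Sigmas = m_step(X, gamma)
print(pi, mus.shape, Sigmas.shape)
```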

Page 22: K-means and GMM

• $\mu_k$, $\Sigma_k$, and $\pi_k$ do not constitute a closed-form solution for the parameters of the mixture model, because the responsibilities $\gamma(z_{nk})$ depend on those parameters in a complex way:

  $$\gamma(z_{nk}) = \frac{\pi_k \, \mathcal{N}(\mathbf{x}_n \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)}{\sum_{j=1}^{K} \pi_j \, \mathcal{N}(\mathbf{x}_n \mid \boldsymbol{\mu}_j, \boldsymbol{\Sigma}_j)}$$

• In the EM algorithm for GMM, $\gamma(z_{nk})$ and the parameters are optimized iteratively
  • In the E step, the responsibilities (posterior probabilities) are evaluated using the current values of the parameters
  • In the M step, the means, covariances, and mixing coefficients are re-estimated using these responsibilities

EM for Gaussian Mixtures

Page 23: K-means and GMM

• Initialize the means $\mu_k$, covariances $\Sigma_k$, and mixing coefficients $\pi_k$, and evaluate the initial value of the log-likelihood
• E step: evaluate the responsibilities using the current parameters

  $$\gamma(z_{nk}) = \frac{\pi_k \, \mathcal{N}(\mathbf{x}_n \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)}{\sum_{j=1}^{K} \pi_j \, \mathcal{N}(\mathbf{x}_n \mid \boldsymbol{\mu}_j, \boldsymbol{\Sigma}_j)}$$

• M step: re-estimate the parameters using the current responsibilities

  $$\mu_k^{\text{new}} = \frac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk}) \, \mathbf{x}_n$$

  $$\boldsymbol{\Sigma}_k^{\text{new}} = \frac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk}) \, (\mathbf{x}_n - \mu_k^{\text{new}})(\mathbf{x}_n - \mu_k^{\text{new}})^\top$$

  $$\pi_k^{\text{new}} = \frac{N_k}{N}, \qquad \text{where } N_k = \sum_{n=1}^{N} \gamma(z_{nk})$$

• Repeat the E step and M step until convergence (a complete loop is sketched below)

EM for Gaussian Mixtures
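Putting the E-step and M-step together gives a complete, if unoptimized, EM routine for a GMM. The sketch below is my own minimal implementation of the procedure on this slide, assuming SciPy for the Gaussian density; it uses naive initialization (uniform weights, random data points as means, the data covariance for every component), adds a small diagonal term to keep covariances invertible, and stops when the log-likelihood stops improving. A production implementation would typically initialize from K-means instead.

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, K, max_iter=200, tol=1e-6, seed=0):
    """Fit a K-component GMM to X (N, d) by EM; returns (pi, mus, Sigmas, log_likelihoods)."""
    rng = np.random.default_rng(seed)
    N, d = X.shape
    # Initialization: uniform weights, random data points as means, shared data covariance
    pi = np.full(K, 1.0 / K)
    mus = X[rng.choice(N, size=K, replace=False)].copy()
    Sigmas = np.array([np.cov(X.T) + 1e-6 * np.eye(d) for _ in range(K)])
    lls = []
    for _ in range(max_iter):
        # E step: responsibilities gamma (N, K)
        dens = np.column_stack([
            pi[k] * multivariate_normal.pdf(X, mean=mus[k], cov=Sigmas[k]) for k in range(K)
        ])
        gamma = dens / dens.sum(axis=1, keepdims=True)
        # M step: closed-form updates
        Nk = gamma.sum(axis=0)
        pi = Nk / N
        mus = (gamma.T @ X) / Nk[:, None]
        for k in range(K):
            diff = X - mus[k]
            Sigmas[k] = (gamma[:, k, None] * diff).T @ diff / Nk[k] + 1e-6 * np.eye(d)
        # Log-likelihood (under the parameters used in the E step) and convergence check
        ll = np.log(dens.sum(axis=1)).sum()
        lls.append(ll)
        if len(lls) > 1 and abs(lls[-1] - lls[-2]) < tol:
            break
    return pi, mus, Sigmas, lls

# Usage: two well-separated clusters
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (150, 2)), rng.normal(6, 1, (150, 2))])
pi, mus, Sigmas, lls = em_gmm(X, K=2)
print(pi, mus)           # mixing weights roughly 0.5 each; means near [0, 0] and [6, 6]
```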

Page 24: K-means and GMM

• We can derive the K-means algorithm as a particular limit of EM for the Gaussian Mixture Model
• Consider a Gaussian mixture model whose covariance matrices are given by $\varepsilon I$, where $\varepsilon$ is a variance parameter and I is the identity matrix
• If we consider the limit $\varepsilon \to 0$, the expected complete-data log-likelihood of the GMM becomes

  $$\mathbb{E}_z\left[\ln p(X, Z \mid \mu, \Sigma, \pi)\right] \to -\frac{1}{2} \sum_n \sum_k r_{nk} \, \lVert \mathbf{x}_n - \boldsymbol{\mu}_k \rVert^2 + C$$

• Thus we see that in this limit, maximizing the expected complete-data log-likelihood is equivalent to the K-means algorithm (see the numerical check below)

Relationship between K-means algorithm and GMM
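A small numerical check of this limit (my own sketch): with equal mixing coefficients and shared covariances εI, the normalizing constants cancel and the responsibilities depend only on the squared distances; as ε shrinks, each row of γ collapses to a one-hot vector, i.e. the hard K-means assignment r_nk.

```python
import numpy as np

def responsibilities_isotropic(X, mus, eps):
    """Responsibilities for a GMM with equal weights and shared covariance eps * I.

    gamma_nk is proportional to exp(-||x_n - mu_k||^2 / (2 eps)); normalization constants cancel.
    """
    sq_dists = ((X[:, None, :] - mus[None, :, :]) ** 2).sum(axis=2)        # (N, K)
    # Subtract the row-wise minimum before exponentiating for numerical stability
    logits = -(sq_dists - sq_dists.min(axis=1, keepdims=True)) / (2.0 * eps)
    gamma = np.exp(logits)
    return gamma / gamma.sum(axis=1, keepdims=True)

X = np.array([[1.0, 1.0], [2.9, 3.1]])
mus = np.array([[0.0, 0.0], [3.0, 3.0]])
for eps in [10.0, 1.0, 0.1, 0.01]:
    print(eps, np.round(responsibilities_isotropic(X, mus, eps), 3))
# As eps -> 0 each row approaches a one-hot vector r_nk, i.e. the K-means assignment.
```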