Lecture 17: Gaussian Mixture Models and Expectation Maximization (Machine Learning)


Page 1: Lecture 17 Gaussian Mixture Models and Expectation Maximization Machine Learning

Lecture 17: Gaussian Mixture Models and Expectation Maximization

Machine Learning

Page 2: Lecture 17 Gaussian Mixture Models and Expectation Maximization Machine Learning

Last Time

• Logistic Regression

Page 3: Lecture 17 Gaussian Mixture Models and Expectation Maximization Machine Learning

Today

• Gaussian Mixture Models

• Expectation Maximization

Page 4: Lecture 17 Gaussian Mixture Models and Expectation Maximization Machine Learning

The Problem

• You have data that you believe is drawn from n populations

• You want to identify parameters for each population

• You don’t know anything about the populations a priori
– Except you believe that they’re Gaussian…

Page 5: Lecture 17 Gaussian Mixture Models and Expectation Maximization Machine Learning

Gaussian Mixture Models

• Rather than identifying clusters by “nearest” centroids

• Fit a set of k Gaussians to the data

• Maximum Likelihood over a mixture model

Page 6: Lecture 17 Gaussian Mixture Models and Expectation Maximization Machine Learning

GMM example

Page 7: Lecture 17 Gaussian Mixture Models and Expectation Maximization Machine Learning

Mixture Models

• Formally, a mixture model is the weighted sum of a number of pdfs, where the weights are determined by a distribution:
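In standard notation (the slide's formula is not in the transcript), a mixture of K component densities p_k(x) with mixing weights drawn from a distribution \pi can be written

p(x) = \sum_{k=1}^{K} \pi_k \, p_k(x), \qquad \sum_{k=1}^{K} \pi_k = 1, \quad \pi_k \ge 0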

Page 8: Lecture 17 Gaussian Mixture Models and Expectation Maximization Machine Learning

Gaussian Mixture Models

• GMM: the weighted sum of a number of Gaussians, where the weights are determined by a distribution:
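In the same notation, with Gaussian components this becomes

p(x) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k)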

Page 9: Lecture 17 Gaussian Mixture Models and Expectation Maximization Machine Learning

Graphical Models with unobserved variables

• What if you have variables in a Graphical model that are never observed?
– Latent Variables

• Training latent variable models is an unsupervised learning application

[Figure: example graphical model with nodes “laughing”, “amused”, “sweating”, and “uncomfortable”, illustrating latent (unobserved) variables]

Page 10: Lecture 17 Gaussian Mixture Models and Expectation Maximization Machine Learning

Latent Variable HMMs

• We can cluster sequences using an HMM with unobserved state variables

• We will train latent variable models using Expectation Maximization

Page 11: Lecture 17 Gaussian Mixture Models and Expectation Maximization Machine Learning

Expectation Maximization

• Both the training of GMMs and Graphical Models with latent variables can be accomplished using Expectation Maximization
– Step 1: Expectation (E-step)
• Evaluate the “responsibilities” of each cluster with the current parameters
– Step 2: Maximization (M-step)
• Re-estimate parameters using the existing “responsibilities”

• Similar to k-means training.

Page 12: Lecture 17 Gaussian Mixture Models and Expectation Maximization Machine Learning

Latent Variable Representation

• We can represent a GMM involving a latent variable

• What does this give us?
– TODO: plate notation
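In the standard latent-variable form (written out here since the slide's plate diagram is marked TODO), z is a one-of-K binary indicator with prior p(z_k = 1) = \pi_k and conditional p(x \mid z_k = 1) = \mathcal{N}(x \mid \mu_k, \Sigma_k); marginalizing over z recovers the mixture:

p(x) = \sum_{z} p(z)\, p(x \mid z) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k)

What this gives us is an explicit joint p(x, z), which is what EM works with.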

Page 13: Lecture 17 Gaussian Mixture Models and Expectation Maximization Machine Learning

GMM data and Latent variables

Page 14: Lecture 17 Gaussian Mixture Models and Expectation Maximization Machine Learning

One last bit

• We have representations of the joint p(x,z) and the marginal, p(x)…

• The conditional p(z|x) can be derived using Bayes rule.
– The responsibility that a mixture component takes for explaining an observation x.
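Written out, the responsibility of component k for an observation x is

\gamma(z_k) \equiv p(z_k = 1 \mid x) = \frac{\pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j \, \mathcal{N}(x \mid \mu_j, \Sigma_j)}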

Page 15: Lecture 17 Gaussian Mixture Models and Expectation Maximization Machine Learning

Maximum Likelihood over a GMM

• As usual: Identify a likelihood function

• And set partials to zero…
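The likelihood function in question is the standard GMM log likelihood

\ln p(X \mid \pi, \mu, \Sigma) = \sum_{n=1}^{N} \ln \left\{ \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x_n \mid \mu_k, \Sigma_k) \right\}

and the summation inside the logarithm is what prevents a simple closed-form solution.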

Page 16: Lecture 17 Gaussian Mixture Models and Expectation Maximization Machine Learning

Maximum Likelihood of a GMM

• Optimization of means.
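Setting the partial derivative with respect to \mu_k to zero gives the standard update

\mu_k = \frac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk}) \, x_n, \qquad N_k = \sum_{n=1}^{N} \gamma(z_{nk})

i.e. a responsibility-weighted mean of the data.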

Page 17: Lecture 17 Gaussian Mixture Models and Expectation Maximization Machine Learning

Maximum Likelihood of a GMM

• Optimization of covariance

• Note the similarity to the regular MLE without responsibility terms.
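The corresponding standard update is

\Sigma_k = \frac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk}) \, (x_n - \mu_k)(x_n - \mu_k)^T

which is the usual MLE covariance with each data point weighted by its responsibility.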

Page 18: Lecture 17 Gaussian Mixture Models and Expectation Maximization Machine Learning

Maximum Likelihood of a GMM

• Optimization of mixing term
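Maximizing with a Lagrange multiplier enforcing \sum_k \pi_k = 1 gives

\pi_k = \frac{N_k}{N}

the average responsibility the component takes over the data set.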

Page 19: Lecture 17 Gaussian Mixture Models and Expectation Maximization Machine Learning

MLE of a GMM

Page 20: Lecture 17 Gaussian Mixture Models and Expectation Maximization Machine Learning

EM for GMMs

• Initialize the parameters
– Evaluate the log likelihood

• Expectation-step: Evaluate the responsibilities

• Maximization-step: Re-estimate Parameters
– Evaluate the log likelihood
– Check for convergence
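The slides give this loop in words only; as an illustrative sketch (not from the lecture), the same loop in NumPy might look like the following. The names em_gmm and gaussian_pdf, the random-data-point initialization, and the small 1e-6 ridge added to each covariance (a crude guard against the singularities discussed later) are assumptions of this sketch; the E-step and M-step updates match the standard formulas on the next two slides.

import numpy as np

def gaussian_pdf(X, mean, cov):
    # N(x | mean, cov) evaluated at each row of X
    d = X.shape[1]
    diff = X - mean
    inv_cov = np.linalg.inv(cov)
    norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(cov))
    quad = np.einsum('ni,ij,nj->n', diff, inv_cov, diff)
    return np.exp(-0.5 * quad) / norm

def em_gmm(X, k, max_iter=200, tol=1e-6, seed=0):
    n, d = X.shape
    rng = np.random.default_rng(seed)
    # Initialize the parameters: means at k random data points,
    # identity covariances, uniform mixing weights
    mu = X[rng.choice(n, size=k, replace=False)]
    sigma = np.stack([np.eye(d) for _ in range(k)])
    pi = np.full(k, 1.0 / k)
    log_lik_old = -np.inf
    for _ in range(max_iter):
        # Weighted densities pi_k * N(x_n | mu_k, Sigma_k), shape (n, k)
        weighted = np.column_stack(
            [pi[j] * gaussian_pdf(X, mu[j], sigma[j]) for j in range(k)])
        # Evaluate the log likelihood and check for convergence
        log_lik = np.sum(np.log(weighted.sum(axis=1)))
        if abs(log_lik - log_lik_old) < tol:
            break
        log_lik_old = log_lik
        # E-step: responsibilities gamma(z_nk)
        gamma = weighted / weighted.sum(axis=1, keepdims=True)
        # M-step: re-estimate means, covariances, and mixing coefficients
        Nk = gamma.sum(axis=0)
        mu = (gamma.T @ X) / Nk[:, None]
        for j in range(k):
            diff = X - mu[j]
            # the 1e-6 ridge is an ad hoc guard against collapsing covariances
            sigma[j] = (gamma[:, j, None] * diff).T @ diff / Nk[j] + 1e-6 * np.eye(d)
        pi = Nk / n
    return pi, mu, sigma

For example, pi, mu, sigma = em_gmm(X, k=3) on an (N, D) data array X returns the fitted mixing weights, means, and covariances.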

Page 21: Lecture 17 Gaussian Mixture Models and Expectation Maximization Machine Learning

EM for GMMs

• E-step: Evaluate the Responsibilities
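In standard notation the E-step computes, for every data point n and component k,

\gamma(z_{nk}) = \frac{\pi_k \, \mathcal{N}(x_n \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j \, \mathcal{N}(x_n \mid \mu_j, \Sigma_j)}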

Page 22: Lecture 17 Gaussian Mixture Models and Expectation Maximization Machine Learning

EM for GMMs

• M-Step: Re-estimate Parameters
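Using the current responsibilities, the standard M-step updates are

\mu_k^{\text{new}} = \frac{1}{N_k} \sum_{n} \gamma(z_{nk}) \, x_n, \qquad
\Sigma_k^{\text{new}} = \frac{1}{N_k} \sum_{n} \gamma(z_{nk}) (x_n - \mu_k^{\text{new}})(x_n - \mu_k^{\text{new}})^T, \qquad
\pi_k^{\text{new}} = \frac{N_k}{N}

with N_k = \sum_n \gamma(z_{nk}).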

Page 23: Lecture 17 Gaussian Mixture Models and Expectation Maximization Machine Learning

Visual example of EM

Page 24: Lecture 17 Gaussian Mixture Models and Expectation Maximization Machine Learning

Potential Problems

• Incorrect number of Mixture Components

• Singularities

Page 25: Lecture 17 Gaussian Mixture Models and Expectation Maximization Machine Learning

Incorrect Number of Gaussians

Page 26: Lecture 17 Gaussian Mixture Models and Expectation Maximization Machine Learning

Incorrect Number of Gaussians

Page 27: Lecture 17 Gaussian Mixture Models and Expectation Maximization Machine Learning

Singularities

• A minority of the data can have a disproportionate effect on the model likelihood.

• For example…

Page 28: Lecture 17 Gaussian Mixture Models and Expectation Maximization Machine Learning

GMM example

Page 29: Lecture 17 Gaussian Mixture Models and Expectation Maximization Machine Learning

Singularities

• When a mixture component collapses on a given point, the mean becomes the point, and the variance goes to zero.

• Consider the likelihood function as the covariance goes to zero.

• The likelihood approaches infinity.
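Concretely, if component j collapses onto a single point x_n, so that \mu_j = x_n and \Sigma_j = \sigma_j^2 I, that point contributes a factor

\mathcal{N}(x_n \mid x_n, \sigma_j^2 I) = \frac{1}{(2\pi \sigma_j^2)^{D/2}}

which grows without bound as \sigma_j \to 0, so the overall likelihood can be driven to infinity.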

Page 30: Lecture 17 Gaussian Mixture Models and Expectation Maximization Machine Learning

Relationship to K-means

• K-means makes hard decisions.
– Each data point gets assigned to a single cluster.

• GMM/EM makes soft decisions.
– Each data point can yield a posterior p(z|x)

• Soft K-means is a special case of EM.

Page 31: Lecture 17 Gaussian Mixture Models and Expectation Maximization Machine Learning

Soft K-Means as GMM/EM

• Assume equal covariance matrices for every mixture component:

• Likelihood:

• Responsibilities:

• As epsilon approaches zero, the responsibility of the closest mixture component approaches unity and all others approach zero (the expressions are written out below).
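With the assumed shared covariance \Sigma_k = \epsilon I, the likelihood and responsibilities take the standard form

p(x \mid \mu_k, \Sigma_k) = \frac{1}{(2\pi\epsilon)^{D/2}} \exp\!\left( -\frac{\lVert x - \mu_k \rVert^2}{2\epsilon} \right), \qquad
\gamma(z_{nk}) = \frac{\pi_k \exp(-\lVert x_n - \mu_k \rVert^2 / 2\epsilon)}{\sum_j \pi_j \exp(-\lVert x_n - \mu_j \rVert^2 / 2\epsilon)}

and as \epsilon \to 0 the responsibilities go to the hard 0/1 assignments r_{nk} of k-means.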

Page 32: Lecture 17 Gaussian Mixture Models and Expectation Maximization Machine Learning

Soft K-Means as GMM/EM

• Overall log likelihood as epsilon approaches zero (written out below):

• In this limit, the expected complete-data log likelihood reduces to the (negative) k-means distortion measure, the within-cluster sum of squares

• Note: only the means are re-estimated in Soft K-means.
– The covariance matrices are all tied.
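In standard form, the limiting expected complete-data log likelihood is

\mathbb{E}_Z[\ln p(X, Z \mid \mu, \Sigma, \pi)] \;\to\; -\frac{1}{2} \sum_{n=1}^{N} \sum_{k=1}^{K} r_{nk} \, \lVert x_n - \mu_k \rVert^2 + \text{const}

where r_{nk} \in \{0, 1\} are the hard assignments, so maximizing it is equivalent to minimizing the k-means objective.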

Page 33: Lecture 17 Gaussian Mixture Models and Expectation Maximization Machine Learning

General form of EM

• Given a joint distribution over observed and latent variables:

• Want to maximize the likelihood p(X | θ)

1. Initialize parameters

2. E-Step: Evaluate the posterior p(Z | X, θ_old)

3. M-Step: Re-estimate parameters (based on expectation of complete-data log likelihood)

4. Check for convergence of params or likelihood
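Written out in the standard form, the quantity evaluated in the E-step is the posterior p(Z \mid X, \theta^{\text{old}}), and the M-step maximizes the expected complete-data log likelihood

Q(\theta, \theta^{\text{old}}) = \sum_{Z} p(Z \mid X, \theta^{\text{old}}) \, \ln p(X, Z \mid \theta), \qquad
\theta^{\text{new}} = \arg\max_{\theta} Q(\theta, \theta^{\text{old}})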

Page 34: Lecture 17 Gaussian Mixture Models and Expectation Maximization Machine Learning

Next Time

• Proof of Expectation Maximization in GMMs

• Hidden Markov Models