Post on 21-Dec-2015
What is it?
When would you use it?
Why does it work?
How do you implement it?
Where does it stand in relation to other methods?
EM algorithm reading group
Introduction & Motivation
Theory
Practical
Comparison with other methods
Expectation Maximization (EM)
• Iterative method for parameter estimation where you have missing data
• Has two steps: Expectation (E) and Maximization (M)
• Applicable to a wide range of problems
• An old idea (late 1950s), formalized by Dempster, Laird and Rubin in 1977
• Subject of much investigation; see the McLachlan & Krishnan (1997) book
Applications of EM (1)
• Fitting mixture models
Applications of EM (2)
• Probabilistic Latent Semantic Analysis (pLSA), a technique from the text-analysis community
[Graphical model: P(w,d) decomposed via P(w|z) and P(z|d), with a latent topic variable Z linking documents D and words W]
Applications of EM (3)
• Learning parts and structure models
Applications of EM (4)
• Automatic segmentation of layers in video
http://www.psi.toronto.edu/images/figures/cutouts_vid.gif
Motivating example
[Plot: 1-D data points on the real line, roughly spanning -4 to 5]
Data: N scalar points x_1, …, x_N
Model: mixture of Gaussians
Parameters: θ
OBJECTIVE: Fit a mixture-of-Gaussians model with C = 2 components, keeping the mixing weights and variances fixed, i.e. only estimating the means.
P(x|θ) = Σ_{c=1}^{C} π_c N(x; μ_c, σ_c²)
Likelihood function
The likelihood is a function of the parameters θ; probability is a function of the r.v. x. NOTE: this plot is over θ, DIFFERENT TO THE LAST PLOT (which was over x).
Probabilistic model
Imagine the model generating the data. We need to introduce a label, z, for each data point, indicating which component c generated it.
The label is called a latent variable (also called hidden, unobserved, or missing).
[Plot: the same 1-D data points, now coloured by component label]
This simplifies the problem: if we knew the labels, we could decouple the components and estimate the parameters separately for each one.
Intuition of EM
E-step: compute a distribution over the labels of the points, using the current parameters.
M-step: update the parameters using the current guess of the label distribution.
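The E/M alternation above can be sketched end to end for a 1-D Gaussian mixture. This is an illustrative NumPy implementation, not code from the slides; the mean-plus-random-offset initialization and the variance floor anticipate the practical issues discussed later.

```python
import numpy as np

def em_gmm_1d(x, n_components=2, n_iters=100, seed=0):
    """Illustrative EM for a 1-D Gaussian mixture (all parameters estimated)."""
    rng = np.random.default_rng(seed)
    n = len(x)
    # Initialization: mean of the data plus a random offset per component
    mu = x.mean() + rng.normal(scale=x.std(), size=n_components)
    var = np.full(n_components, x.var())
    pi = np.full(n_components, 1.0 / n_components)
    for _ in range(n_iters):
        # E-step: responsibilities r[c, n] = p(z_n = c | x_n, theta),
        # computed in log space for numerical stability
        log_p = (np.log(pi)[:, None]
                 - 0.5 * np.log(2 * np.pi * var)[:, None]
                 - 0.5 * (x[None, :] - mu[:, None]) ** 2 / var[:, None])
        r = np.exp(log_p - log_p.max(axis=0))
        r /= r.sum(axis=0)            # each column sums to 1
        # M-step: re-estimate parameters from the current responsibilities
        nk = r.sum(axis=1)
        mu = (r * x[None, :]).sum(axis=1) / nk
        var = (r * (x[None, :] - mu[:, None]) ** 2).sum(axis=1) / nk
        var = np.maximum(var, 1e-6)   # variance floor to prevent blow-up
        pi = nk / n
    return pi, mu, var
```

On well-separated clusters this converges to the two cluster means; with unlucky initialization it can instead settle in a local maximum, which is exactly the behaviour discussed under "Practical issues" below.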
Theory
Some definitions
Observed data: x = {x_1, …, x_N}, continuous, i.i.d.
Latent variables: z = {z_1, …, z_N}, discrete, z_n ∈ {1, …, C}
Iteration index: t
Complete log-likelihood (CLL): log p(x, z | θ)
Log-likelihood [incomplete log-likelihood (ILL)]: log p(x | θ) = log Σ_z p(x, z | θ)
Expected complete log-likelihood (ECLL): ⟨log p(x, z | θ)⟩_{q(z)}
Use Jensen's inequality to obtain a lower bound on the log-likelihood:
log p(x|θ) = log Σ_z q(z) p(x, z|θ)/q(z) ≥ Σ_z q(z) log [p(x, z|θ)/q(z)] ≜ F(q, θ)
F(q, θ) is the AUXILIARY FUNCTION.
Jensen's inequality
For a real continuous concave function f and weights α_i ≥ 0 with Σ_i α_i = 1:
f(Σ_i α_i x_i) ≥ Σ_i α_i f(x_i)
Equality holds when all the x_i are the same.
1. Definition of concavity: for α ∈ [0, 1], f(α x_1 + (1−α) x_2) ≥ α f(x_1) + (1−α) f(x_2).
2. By induction, this extends to weighted combinations of n points.
Recall the key result: the auxiliary function F(q, θ) is a LOWER BOUND on the log-likelihood.
EM is alternating ascent on F. Alternately improve q, then θ:
E-step: q^{t+1} = argmax_q F(q, θ^t)
M-step: θ^{t+1} = argmax_θ F(q^{t+1}, θ)
This is guaranteed to improve the likelihood itself.
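The chain of inequalities behind that guarantee can be written out explicitly (a standard argument; here F(q, θ) denotes the auxiliary function, i.e. the Jensen lower bound on the log-likelihood):

```latex
% Why one full EM sweep cannot decrease the likelihood:
\log p(x \mid \theta^{t+1})
  \;\geq\; F(q^{t+1}, \theta^{t+1})   % F lower-bounds the log-likelihood for any q
  \;\geq\; F(q^{t+1}, \theta^{t})     % M-step chose \theta^{t+1} to maximize F
  \;=\;    \log p(x \mid \theta^{t})  % E-step made the bound tight at \theta^{t}
```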
E-step: choosing the optimal q(z)
It turns out that q(z) = p(z | x, θ^t) is the best choice: it makes the bound tight, i.e. F(q, θ^t) = log p(x | θ^t).
E-step: What do we actually compute?
An nComponents × nPoints matrix whose columns sum to 1; rows are components 1 and 2, columns are points 1, 2, …, 6.
Responsibility of component c for point n:
r_cn = p(z_n = c | x_n, θ^t) = π_c N(x_n; μ_c, σ_c²) / Σ_c' π_c' N(x_n; μ_c', σ_c'²)
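As a concrete sketch of this computation (illustrative NumPy with made-up example values; `responsibilities` is our own helper name, not from the slides):

```python
import numpy as np

def responsibilities(x, pi, mu, sigma):
    """r[c, n] = p(z_n = c | x_n, theta): nComponents x nPoints, columns sum to 1."""
    # Unnormalized: pi_c * N(x_n; mu_c, sigma_c^2)
    p = (pi[:, None]
         * np.exp(-0.5 * ((x[None, :] - mu[:, None]) / sigma[:, None]) ** 2)
         / (sigma[:, None] * np.sqrt(2.0 * np.pi)))
    return p / p.sum(axis=0)  # normalize each column over components

x = np.array([-3.0, -2.5, -2.0, 2.5, 3.0, 3.5])   # 6 points
pi = np.array([0.5, 0.5])                          # 2 components
mu = np.array([-2.0, 3.0])
sigma = np.array([1.0, 1.0])
r = responsibilities(x, pi, mu, sigma)
print(r.shape)        # (2, 6): nComponents x nPoints
print(r.sum(axis=0))  # each column sums to 1
```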
M-Step
The auxiliary function separates into the ECLL and an entropy term:
F(q, θ) = ⟨log p(x, z | θ)⟩_q + H(q)
M-Step
From the previous slide, only the ECLL term depends on θ, so the M-step maximizes it:
θ^{t+1} = argmax_θ ⟨log p(x, z | θ)⟩_{q^{t+1}}
Recall the definition of the ECLL, using the responsibilities r_cn from the E-step:
⟨log p(x, z | θ)⟩ = Σ_n Σ_c r_cn [log π_c + log N(x_n; μ_c, σ_c²)]
Let's see what happens for the mean μ_c.
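For the mean, differentiating the ECLL and setting the derivative to zero gives the familiar weighted-average update (a textbook result, stated here in the responsibility notation r_cn used above):

```latex
\frac{\partial}{\partial \mu_c}
  \sum_{n}\sum_{c'} r_{c'n}\,\log \mathcal{N}(x_n;\mu_{c'},\sigma_{c'}^2)
  \;=\; \sum_{n} r_{cn}\,\frac{x_n-\mu_c}{\sigma_c^{2}} \;=\; 0
\quad\Longrightarrow\quad
\mu_c^{\,t+1} \;=\; \frac{\sum_n r_{cn}\,x_n}{\sum_n r_{cn}}
```

Each component's new mean is a responsibility-weighted average of all the data points, which is exactly the "decoupling" promised earlier: with soft labels fixed, each component is fit separately.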
Practical
Practical issues
Initialization: mean of data + random offset; K-means
Termination: max # iterations; log-likelihood change; parameter change
Convergence: local maxima; annealed methods (DAEM); birth/death process (SMEM)
Numerical issues: a single point gives infinite likelihood; inject noise into the covariance matrix to prevent blow-up
Number of components: open problem; minimum description length; Bayesian approach
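One way the log-likelihood-change termination test can be implemented (an illustrative sketch; the helper names and the tolerance value are our own choices, not from the slides):

```python
import numpy as np

def log_likelihood(x, pi, mu, sigma):
    """Incomplete log-likelihood log p(x | theta) of a 1-D Gaussian mixture."""
    p = (pi[:, None]
         * np.exp(-0.5 * ((x[None, :] - mu[:, None]) / sigma[:, None]) ** 2)
         / (sigma[:, None] * np.sqrt(2.0 * np.pi)))
    return float(np.log(p.sum(axis=0)).sum())

def converged(ll_new, ll_old, tol=1e-6):
    """EM increases the log-likelihood monotonically, so stop when the gain is tiny."""
    return ll_new - ll_old < tol
```

In the main loop one would compute `log_likelihood` after each E+M sweep and break when `converged` returns true; because EM is monotone, the difference is always non-negative (up to floating-point noise), so a one-sided test suffices.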
Local minima
Robustness of EM
What EM won't do
• Pick the structure of the model (# components, graph structure)
• Find the global maximum
• Always have nice closed-form updates (may need to optimize within the E/M step)
• Avoid computational problems (may need sampling methods for computing expectations)
Comparison with other methods
Why not use standard optimization methods?
In favour of EM:
• No step size to choose
• Works directly in the parameter space of the model, so parameter constraints are obeyed
• Fits naturally into the graphical-model framework
• Supposedly faster
[Figure: comparison of Gradient, Newton, and EM updates]
Acknowledgements
Shameless stealing of figures and equations and explanations from:
Frank Dellaert, Michael Jordan, Yair Weiss