Expectation-Maximization (EM) – Chapter 3 (Duda et al.), Section 3.9
DESCRIPTION
Lecture slides for CS479/679 Pattern Recognition, Dr. George Bebis. EM is an iterative method to perform ML estimation: it starts with an initial estimate for θ and iteratively refines it.
Expectation-Maximization (EM)
Chapter 3 (Duda et al.) – Section 3.9
CS479/679 Pattern RecognitionDr. George Bebis
Expectation-Maximization (EM)
• EM is an iterative ML estimation method:
– Starts with an initial estimate for θ.
– Refines the current estimate iteratively to increase the likelihood of the observed data, p(D/θ).
Expectation-Maximization (cont’d)
• EM represents a general framework – it works best in situations where the data is incomplete (or can be thought of as being incomplete).
– Some creativity is required to recognize where the EM algorithm can be used.
– Standard method for estimating the parameters of Mixtures of Gaussians (MoG).
Incomplete Data
• Many times, it is impossible to apply ML estimation because certain features cannot be measured directly.
• The EM algorithm is ideal for problems with unobserved (missing) data.
Example (Moon, 1996)
Assume a trinomial distribution for the complete data x = (x1, x2, x3), where x1 + x2 + x3 = k:
p(x/θ) = [k! / (x1! x2! x3!)] p1^x1 p2^x2 p3^x3
with cell probabilities p1, p2, p3 that depend on θ.
Example (Moon, 1996) (cont’d)
Observed (incomplete) data: y1, y2.
EM: Main Idea
• If x were available, we could estimate θ using ML:
θ̂ = arg max_θ ln p(Dx/θ)
• Given that only y is available, estimate θ by maximizing the expectation of ln p(Dx/θ) (with respect to the unknown variables) given Dy and an estimate of θ:
Q(θ; θ^t) = E_unobserved[ ln p(Dx/θ) / Dy, θ^t ]
EM Steps
(1) Initialization
(2) E-Step: Expectation
(3) M-Step: Maximization
(4) Test for convergence
EM Steps (cont’d)
(1) Initialization Step: initialize the algorithm with a guess θ0
(2) Expectation step: it is performed with respect to the unobserved variables, using the current estimate of parameters and conditioned upon the observations:
Q(θ; θ^t) = E_unobserved[ ln p(Dx/θ) / Dy, θ^t ]
– Note: if ln p(Dx/θ) is a linear function of the unobserved variables, the expectation step is equivalent to computing E_unobserved[ x / Dy, θ^t ] and substituting it into ln p(Dx/θ).
EM Steps (cont’d)
(3) Maximization Step: provides a new estimate of the parameters:
θ^(t+1) = arg max_θ Q(θ; θ^t)
(4) Test for Convergence: if |θ^(t+1) − θ^t| < ε, stop; otherwise, go to Step 2.
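These four steps can be sketched as a small driver loop. In the Python sketch below, the `em` helper and its toy missing-data instance are illustrative assumptions (not from the slides); they only show how steps (1)–(4) fit together:

```python
def em(theta0, e_step, m_step, eps=1e-8, max_iter=1000):
    """Generic EM loop: (1) initialize, then alternate (2) E-step and
    (3) M-step until (4) the change in theta falls below eps."""
    theta = theta0                        # (1) initialization
    for _ in range(max_iter):
        expectations = e_step(theta)      # (2) expected complete data given theta^t
        theta_new = m_step(expectations)  # (3) theta^(t+1) = argmax Q(theta; theta^t)
        if abs(theta_new - theta) < eps:  # (4) |theta^(t+1) - theta^t| < eps
            return theta_new
        theta = theta_new
    return theta

# Toy instance (an assumption for illustration): estimate the mean of data
# with missing entries. The E-step fills each missing value with its
# expectation mu^t; the M-step averages the completed data.
observed = [2.0, 4.0, 6.0]
n_missing = 2
mu = em(theta0=0.0,
        e_step=lambda mu: observed + [mu] * n_missing,
        m_step=lambda filled: sum(filled) / len(filled))
```

Here the log-likelihood is linear in the missing values, so the E-step reduces to plugging in their expected values, exactly as the note on the Expectation step describes.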
Example (Moon, 1996) (cont’d)
For n complete observations xi = (xi1, xi2, xi3), the complete-data log-likelihood is:
ln p(Dx/θ) = Σ_i ln { [k! / (xi1! xi2! xi3!)] p1^xi1 p2^xi2 p3^xi3 }, where xi = (xi1, xi2, xi3)
Example (Moon, 1996) (cont’d)
Let’s look at the M-step before completing the E-step …
• Take the expected value of the complete-data log-likelihood (the factorial terms, including k!, do not depend on θ).
Example (Moon, 1996) (cont’d)
Let’s complete the E-step now …
• We only need to estimate the expected values of the unobserved counts, Σ_i E[xi2 / Dy, θ^t] and Σ_i E[xi3 / Dy, θ^t].
Example (Moon, 1996) (cont’d)
(see Moon’s paper, page 53)
Example (Moon, 1996) (cont’d)
• Initialization: θ^0
• Expectation Step: compute the expected unobserved counts given Dy and θ^t
• Maximization Step: θ^(t+1) = arg max_θ Q(θ; θ^t)
• Convergence Step: stop when |θ^(t+1) − θ^t| < ε
Example (Moon, 1996) (cont’d)
(figure: the estimate θ^t plotted against iteration t)
Convergence properties of EM
• The solution depends on the initial estimate θ0
• At each iteration, a value of θ is computed so that the likelihood function does not decrease.
• The algorithm is guaranteed to be stable (i.e., does not oscillate).
• There is no guarantee that it will converge to a global maximum.
Mixture Models
• EM is the standard method for estimating the parameters of “mixture models”.
Example: mixture of 2D Gaussians
Mixture Model (cont’d)
(diagram: K components with mixing weights π1, π2, π3, …, πK; the mixture density is p(x) = Σ_k πk p(x/θk), with Σ_k πk = 1)
Mixture of 1D Gaussians - Example
π1=0.3
π2=0.2
π3=0.5
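A mixture like this one can be evaluated directly once component parameters are chosen. In the sketch below, the weights π = (0.3, 0.2, 0.5) come from the slide, while the means and standard deviations are made-up placeholders, since the figure does not specify them:

```python
import math

def gaussian_pdf(x, mu, sigma):
    """1D Gaussian density N(x; mu, sigma^2)."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

# Mixing weights from the slide; means/sigmas are illustrative placeholders.
pis    = [0.3, 0.2, 0.5]
mus    = [-2.0, 0.0, 3.0]
sigmas = [1.0, 0.5, 1.5]

def mixture_pdf(x):
    # p(x) = sum_k pi_k * p(x/theta_k)
    return sum(p * gaussian_pdf(x, m, s) for p, m, s in zip(pis, mus, sigmas))
```

Because the weights sum to 1 and each component density integrates to 1, the mixture density also integrates to 1.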
Mixture Model (cont’d)
Estimating the parameters of a Mixture Model
• Two fundamental problems:
(1) Estimate the number of mixture components K
(2) Estimate mixture parameters (πk, θk), k=1,2,…,K
Mixtures of Gaussians (Chapter 10)
where p(x/θk) = [1 / ((2π)^(d/2) |Σk|^(1/2))] exp[ −(1/2)(x − μk)^T Σk^(−1) (x − μk) ]
• In this case, θk = (μk, Σk)
Data Generation Process Using Mixtures of Gaussians
(diagram: pick a component k with probability πk, then draw x from p(x/θk))
Estimating Mixture Parameters Using ML – not easy!
• ML works by maximizing: p(D/θ) = Π_i p(xi/θ)
• The density function is a mixture: p(xi/θ) = Σ_k πk p(xi/θk)
• Using ML, we must maximize: ln p(D/θ) = Σ_i ln [ Σ_k πk p(xi/θk) ], a log of a sum with no closed-form maximizer.
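Although it cannot be maximized in closed form, this objective is easy to evaluate numerically. A minimal sketch for 1D Gaussian components (the function name and the sample data/parameters below are assumptions for illustration):

```python
import math

def gaussian_pdf(x, mu, sigma):
    """1D Gaussian density N(x; mu, sigma^2)."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def mixture_log_likelihood(data, pis, mus, sigmas):
    # ln p(D/theta) = sum_i ln( sum_k pi_k p(x_i/theta_k) )
    # The log acts on a sum over components, which is why direct ML
    # estimation is hard and EM is used instead.
    ll = 0.0
    for x in data:
        ll += math.log(sum(p * gaussian_pdf(x, m, s)
                           for p, m, s in zip(pis, mus, sigmas)))
    return ll

# Illustrative evaluation on hypothetical data and parameters:
ll = mixture_log_likelihood([0.0, 1.0], [0.5, 0.5], [0.0, 1.0], [1.0, 1.0])
```

Parameters that explain the data better yield a larger value of this log-likelihood, which is the quantity EM monotonically improves.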
Estimating Mixture Parameters Using EM: Case of Unknown Means
• Assumptions: only the means μk are unknown; the remaining mixture parameters are known.
• Observation: if we knew which component generated each sample xi, estimating the means would be easy … but we don’t!
Estimating Mixture Parameters Using EM: Case of Unknown Means (cont’d)
• Introduce hidden or unobserved variables zi = (zi1, …, ziK), where zik = 1 if xi was generated by the k-th component and 0 otherwise.
Estimating Mixture Parameters Using EM: Case of Unknown Means (cont’d)
• Main steps using EM
Estimating Mixture Parameters Using EM: Case of Unknown Means (cont’d)
• Expectation Step: take the expectation of the complete-data log-likelihood, which is linear in the zik, and substitute the expected values E(zik).
Estimating Mixture Parameters Using EM: Case of Unknown Means (cont’d)
• Expectation Step: E(zik) is just the probability that xi was generated by the k-th component:
E(zik) = πk p(xi/μk^t) / Σ_j πj p(xi/μj^t)
![Page 33: Expectation-Maximization (EM) Chapter 3 (Duda et al.) – Section 3.9](https://reader036.vdocuments.mx/reader036/viewer/2022062321/56813bbc550346895da4ed1f/html5/thumbnails/33.jpg)
Estimating Mixture Parameters Using EM: Case of Unknown Means (cont’d)
• Maximization Step: μk^(t+1) = Σ_i E(zik) xi / Σ_i E(zik)
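Alternating these two steps gives a short iterative procedure for the unknown-means case. The sketch below assumes 1D components with known mixing weights and variances; the function name, data, and initial values are hypothetical illustrations:

```python
import math

def gaussian_pdf(x, mu, sigma):
    """1D Gaussian density N(x; mu, sigma^2)."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def em_unknown_means(data, pis, sigmas, mus0, n_iter=200):
    """EM for a 1D Gaussian mixture where only the means are unknown
    (mixing weights and variances assumed known)."""
    mus = list(mus0)
    for _ in range(n_iter):
        # E-step: E[z_ik] = pi_k p(x_i/mu_k) / sum_j pi_j p(x_i/mu_j)
        resp = []
        for x in data:
            w = [p * gaussian_pdf(x, m, s) for p, m, s in zip(pis, mus, sigmas)]
            total = sum(w)
            resp.append([wk / total for wk in w])
        # M-step: mu_k = sum_i E[z_ik] x_i / sum_i E[z_ik]
        mus = [sum(r[k] * x for r, x in zip(resp, data)) /
               sum(r[k] for r in resp) for k in range(len(pis))]
    return mus

# Hypothetical data drawn from two well-separated clusters:
data = [-5.1, -4.9, -5.0, 4.8, 5.2, 5.0]
mus = em_unknown_means(data, pis=[0.5, 0.5], sigmas=[1.0, 1.0], mus0=[-1.0, 1.0])
```

On well-separated data like this, the estimated means settle near the two cluster centers.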
Estimating Mixture Parameters Using EM: General Case
• Need to review Lagrange Optimization first …
Lagrange Optimization
• To maximize f(x) subject to the constraint g(x) = 0, form the Lagrangian L(x, λ) = f(x) + λ g(x).
• Set ∂L/∂xi = 0 (i = 1, …, n) and ∂L/∂λ = g(x) = 0, then solve for x and λ:
n+1 equations / n+1 unknowns
Lagrange Optimization (cont’d)
• Example: maximize f(x1,x2) = x1x2 subject to the constraint g(x1,x2) = x1+x2−1 = 0
L(x1, x2, λ) = f(x1, x2) + λ g(x1, x2) = x1x2 + λ(x1 + x2 − 1)
∂L(x1, x2, λ)/∂x1 = x2 + λ = 0
∂L(x1, x2, λ)/∂x2 = x1 + λ = 0
∂L(x1, x2, λ)/∂λ = x1 + x2 − 1 = 0
3 equations / 3 unknowns, giving x1 = x2 = 1/2 and λ = −1/2
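This solution is easy to check numerically: substituting the constraint x2 = 1 − x1 reduces the problem to a 1D search, and a simple grid scan (an illustrative check, not part of the original example) recovers the same maximizer:

```python
# On the constraint x1 + x2 = 1, the objective becomes
# f(x1, 1 - x1) = x1 * (1 - x1); scan x1 over [0, 1] to locate its maximum.
best_x1 = max((i / 1000.0 for i in range(1001)),
              key=lambda x1: x1 * (1 - x1))
best_x2 = 1.0 - best_x1
```

The scan lands on x1 = x2 = 1/2, matching the stationary point given by the Lagrange conditions.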
Estimating Mixture Parameters Using EM: General Case
• Introduce hidden or unobserved variables zi
Estimating Mixture Parameters Using EM: General Case (cont’d)
• Expectation Step: compute Q(θ; θ^t) by substituting the expected values E(zik) for the unobserved zik.
Estimating Mixture Parameters Using EM: General Case (cont’d)
• Maximization Step: use Lagrange optimization to maximize Q(θ; θ^t) subject to the constraint g(π) = Σ_k πk − 1 = 0
Estimating Mixture Parameters Using EM: General Case (cont’d)
• Maximization Step (cont’d)
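In the general case the M-step updates all parameter sets: πk^(t+1) = (1/n) Σ_i E(zik) (this is where the Lagrange multiplier enforces Σ_k πk = 1), μk^(t+1) = Σ_i E(zik) xi / Σ_i E(zik), and analogous weighted updates for the covariances. A 1D sketch of one full EM iteration (the function name and the test data are illustrative assumptions):

```python
import math

def gaussian_pdf(x, mu, sigma):
    """1D Gaussian density N(x; mu, sigma^2)."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def em_step_general(data, pis, mus, sigmas):
    """One EM iteration for a 1D Gaussian mixture, all parameters unknown."""
    n, K = len(data), len(pis)
    # E-step: responsibilities E[z_ik]
    resp = []
    for x in data:
        w = [p * gaussian_pdf(x, m, s) for p, m, s in zip(pis, mus, sigmas)]
        t = sum(w)
        resp.append([wk / t for wk in w])
    # M-step: the constraint sum_k pi_k = 1 (Lagrange multiplier) gives pi_k = N_k / n
    Nk = [sum(r[k] for r in resp) for k in range(K)]
    new_pis = [Nk[k] / n for k in range(K)]
    new_mus = [sum(r[k] * x for r, x in zip(resp, data)) / Nk[k] for k in range(K)]
    new_sigmas = [math.sqrt(sum(r[k] * (x - new_mus[k]) ** 2
                                for r, x in zip(resp, data)) / Nk[k])
                  for k in range(K)]
    return new_pis, new_mus, new_sigmas

# Illustrative run on hypothetical two-cluster data:
data = [-5.0, -4.8, -5.2, 5.0, 5.1]
new_pis, new_mus, new_sigmas = em_step_general(data, [0.5, 0.5], [-1.0, 1.0], [1.0, 1.0])
```

Note that the updated weights automatically sum to 1, as required by the constrained maximization.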
Estimating the Number of Components K