1 HMM - Part 2 Review of the last lecture The EM algorithm Continuous density HMM


TRANSCRIPT

Page 1: 1 HMM - Part 2 Review of the last lecture The EM algorithm Continuous density HMM

1

HMM - Part 2

Review of the last lecture The EM algorithm Continuous density HMM

Page 2: 1 HMM - Part 2 Review of the last lecture The EM algorithm Continuous density HMM

2

Three Basic Problems for HMMs

Given an observation sequence O = (o_1, o_2, ..., o_T) and an HMM λ = (A, B, π)

– Problem 1:

How to compute P(O | λ) efficiently?

The forward algorithm

– Problem 2:

How to choose an optimal state sequence Q = (q_1, q_2, ..., q_T) which best explains the observations?

The Viterbi algorithm

– Problem 3:

How to adjust the model parameters λ = (A, B, π) to maximize P(O | λ)?

The Baum-Welch (forward-backward) algorithm

cf. The segmental K-means algorithm maximizes P(O, Q* | λ), where Q* = arg max_Q P(O, Q | λ)

cf. i* = arg max_i P(O | λ_i), e.g., P(up, up, up, up, up | λ_i)?

Page 3: 1 HMM - Part 2 Review of the last lecture The EM algorithm Continuous density HMM

3

The Forward Algorithm

The forward variable:

– α_t(i) = P(o_1, o_2, ..., o_t, q_t = i | λ): the probability of o_1, o_2, ..., o_t being observed and the state at time t being i, given model λ

The forward algorithm

1. Initialization: α_1(i) = π_i b_i(o_1), 1 ≤ i ≤ N

2. Induction: α_{t+1}(j) = [ Σ_{i=1}^{N} α_t(i) a_ij ] b_j(o_{t+1}), 1 ≤ t ≤ T-1, 1 ≤ j ≤ N

3. Termination: P(O | λ) = Σ_{i=1}^{N} α_T(i)
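To make the recursion concrete, below is a minimal NumPy sketch of the forward algorithm. The function name, the array layout (A[i, j] = a_ij, B[j, k] = b_j(v_k), pi[i] = π_i), and the toy numbers are assumptions made for illustration, not part of the slides.

import numpy as np

def forward(A, B, pi, obs):
    # A[i, j] = a_ij, B[j, k] = b_j(v_k), pi[i] = pi_i; obs is a list of symbol indices
    N, T = A.shape[0], len(obs)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]                  # initialization: alpha_1(i) = pi_i b_i(o_1)
    for t in range(1, T):                         # induction over t
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    return alpha, alpha[-1].sum()                 # termination: P(O|lambda) = sum_i alpha_T(i)

# Toy 2-state, 3-symbol model (all numbers arbitrary)
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
pi = np.array([0.6, 0.4])
alpha, prob = forward(A, B, pi, [0, 1, 2])
print(prob)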

Page 4: 1 HMM - Part 2 Review of the last lecture The EM algorithm Continuous density HMM

4

The Viterbi Algorithm

1. Initialization: δ_1(i) = π_i b_i(o_1), ψ_1(i) = 0, 1 ≤ i ≤ N

2. Induction: δ_{t+1}(j) = max_{1≤i≤N} [ δ_t(i) a_ij ] b_j(o_{t+1}), 1 ≤ t ≤ T-1, 1 ≤ j ≤ N
   ψ_{t+1}(j) = arg max_{1≤i≤N} [ δ_t(i) a_ij ], 1 ≤ t ≤ T-1, 1 ≤ j ≤ N

3. Termination: P* = max_{1≤i≤N} δ_T(i) = max_Q P(O, Q | λ), q_T* = arg max_{1≤i≤N} δ_T(i)

4. Backtracking: q_t* = ψ_{t+1}(q_{t+1}*), t = T-1, T-2, ..., 1
   Q* = (q_1*, q_2*, ..., q_T*) is the best state sequence

cf. the forward algorithm: α_{t+1}(j) = [ Σ_{i=1}^{N} α_t(i) a_ij ] b_j(o_{t+1}), P(O | λ) = Σ_{i=1}^{N} α_T(i)
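A minimal sketch of the Viterbi recursion in the log domain follows; the names and the toy model are illustrative assumptions, not taken from the slides.

import numpy as np

def viterbi(A, B, pi, obs):
    # Returns the best state sequence Q* (0-based indices) and log P* = log max_Q P(O, Q | lambda)
    N, T = A.shape[0], len(obs)
    logA, logB, logpi = np.log(A), np.log(B), np.log(pi)
    delta = np.zeros((T, N))
    psi = np.zeros((T, N), dtype=int)
    delta[0] = logpi + logB[:, obs[0]]                   # delta_1(i) = log pi_i + log b_i(o_1)
    for t in range(1, T):
        scores = delta[t - 1][:, None] + logA            # scores[i, j] = delta_{t-1}(i) + log a_ij
        psi[t] = scores.argmax(axis=0)                   # psi_t(j) = argmax_i [...]
        delta[t] = scores.max(axis=0) + logB[:, obs[t]]  # delta_t(j) = max_i [...] + log b_j(o_t)
    path = [int(delta[-1].argmax())]                     # q*_T
    for t in range(T - 1, 0, -1):                        # backtracking
        path.append(int(psi[t][path[-1]]))
    return path[::-1], float(delta[-1].max())

A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
pi = np.array([0.6, 0.4])
print(viterbi(A, B, pi, [0, 1, 2]))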

Page 5: 1 HMM - Part 2 Review of the last lecture The EM algorithm Continuous density HMM

5

The Segmental K-means Algorithm

Assume that we have a training set of observations and an initial estimate of the model parameters

– Step 1: Segment the training data
  The set of training observation sequences is segmented into states, based on the current model, by the Viterbi algorithm

– Step 2: Re-estimate the model parameters

  π̂_i = (Number of times q_1 = i) / (Number of training sequences)

  â_ij = (Number of transitions from state i to state j) / (Number of transitions from state i)

  b̂_j(k) = (Number of "v_k" in state j) / (Number of times in state j)

  with Σ_{i=1}^{N} π̂_i = 1, Σ_{j=1}^{N} â_ij = 1, and Σ_{k=1}^{M} b̂_j(k) = 1

– Step 3: Evaluate the model
  If the difference between the new and current model scores exceeds a threshold, go back to Step 1; otherwise, return
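A small counting sketch of Step 2 is given below; it assumes the sequences have already been segmented by Viterbi decoding, that every state actually occurs in the data, and all names and toy inputs are illustrative.

import numpy as np

def counts_to_model(state_seqs, obs_seqs, N, M):
    # Re-estimate (pi, A, B) of a discrete HMM from Viterbi-segmented training data.
    # state_seqs[l][t] is the decoded state and obs_seqs[l][t] the observed symbol index.
    pi = np.zeros(N)
    A = np.zeros((N, N))
    B = np.zeros((N, M))
    for states, obs in zip(state_seqs, obs_seqs):
        pi[states[0]] += 1                        # number of times q_1 = i
        for t in range(len(states) - 1):
            A[states[t], states[t + 1]] += 1      # transitions i -> j
        for s, o in zip(states, obs):
            B[s, o] += 1                          # symbol o emitted while in state s
    pi /= pi.sum()                                # / number of training sequences
    A /= A.sum(axis=1, keepdims=True)             # / transitions out of state i
    B /= B.sum(axis=1, keepdims=True)             # / times in state j
    return pi, A, B

# Two tiny segmented sequences (state and symbol indices are arbitrary)
print(counts_to_model([[0, 0, 1], [1, 1, 0]], [[0, 1, 2], [2, 1, 0]], N=2, M=3))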

Page 6: 1 HMM - Part 2 Review of the last lecture The EM algorithm Continuous density HMM

6

Segmental K-means vs. Baum-Welch

Number of " " in state ˆ Number of times in state j

k jb k

j

1Number of times ˆ

Number of training sequencesi

q i

Number of transitions from state to state ˆ

Number of transitions from state ij

i ja

i

1 number of times ˆ

Number of training sequencesi

q i

Expected

number of " " in state ˆ number of times in state j

k jb k

j

Expected

Expected

number of transitions from state to state ˆ

number of transitions from state ij

i ja

i

Expected

Expected

Page 7: 1 HMM - Part 2 Review of the last lecture The EM algorithm Continuous density HMM

7

The Backward Algorithm

The backward variable:

– β_t(i) = P(o_{t+1}, o_{t+2}, ..., o_T | q_t = i, λ): the probability of o_{t+1}, o_{t+2}, ..., o_T being observed, given the state at time t being i and model λ

The backward algorithm

1. Initialization: β_T(i) = 1, 1 ≤ i ≤ N

2. Induction: β_t(i) = Σ_{j=1}^{N} a_ij b_j(o_{t+1}) β_{t+1}(j), t = T-1, T-2, ..., 1, 1 ≤ i ≤ N

3. Termination: P(O | λ) = Σ_{i=1}^{N} π_i b_i(o_1) β_1(i)

cf. the forward algorithm: P(O | λ) = Σ_{i=1}^{N} α_T(i)
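A matching NumPy sketch of the backward recursion (same assumed array layout and toy model as in the forward sketch above):

import numpy as np

def backward(A, B, obs):
    # beta_T(i) = 1, then the induction of the slide above, run backwards in time
    N, T = A.shape[0], len(obs)
    beta = np.zeros((T, N))
    beta[-1] = 1.0                                    # initialization
    for t in range(T - 2, -1, -1):                    # induction, t = T-1, ..., 1
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    return beta

A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
pi = np.array([0.6, 0.4])
obs = [0, 1, 2]
beta = backward(A, B, obs)
print((pi * B[:, obs[0]] * beta[0]).sum())            # termination: P(O|lambda), matches the forward result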

Page 8: 1 HMM - Part 2 Review of the last lecture The EM algorithm Continuous density HMM

8

The Forward-Backward Algorithm

Relation between the forward and backward variables:

α_t(i) = P(o_1, o_2, ..., o_t, q_t = i | λ) = [ Σ_{j=1}^{N} α_{t-1}(j) a_ji ] b_i(o_t)

β_t(i) = P(o_{t+1}, o_{t+2}, ..., o_T | q_t = i, λ) = Σ_{j=1}^{N} a_ij b_j(o_{t+1}) β_{t+1}(j)

α_t(i) β_t(i) = P(O, q_t = i | λ)

P(O | λ) = Σ_{i=1}^{N} α_t(i) β_t(i)   (for any t)

(Huang et al., 2001)

Page 9: 1 HMM - Part 2 Review of the last lecture The EM algorithm Continuous density HMM

9

The Baum-Welch Algorithm (1/3)

Define two new variables:

γ_t(i) = P(q_t = i | O, λ) – probability of being in state i at time t, given O and λ

ξ_t(i, j) = P(q_t = i, q_{t+1} = j | O, λ) – probability of being in state i at time t and state j at time t+1, given O and λ

γ_t(i) = P(q_t = i, O | λ) / P(O | λ) = α_t(i) β_t(i) / Σ_{i=1}^{N} α_t(i) β_t(i)

ξ_t(i, j) = P(q_t = i, q_{t+1} = j, O | λ) / P(O | λ) = α_t(i) a_ij b_j(o_{t+1}) β_{t+1}(j) / Σ_{m=1}^{N} Σ_{n=1}^{N} α_t(m) a_mn b_n(o_{t+1}) β_{t+1}(n)

γ_t(i) = Σ_{j=1}^{N} ξ_t(i, j)
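The sketch below computes γ_t(i) and ξ_t(i, j) from the forward and backward variables for a self-contained toy model; the variable names and numbers are illustrative assumptions.

import numpy as np

A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
pi = np.array([0.6, 0.4])
obs = [0, 1, 2]
N, T = 2, len(obs)

# forward and backward variables, as in the earlier slides
alpha = np.zeros((T, N))
beta = np.ones((T, N))
alpha[0] = pi * B[:, obs[0]]
for t in range(1, T):
    alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
for t in range(T - 2, -1, -1):
    beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
prob = alpha[-1].sum()                                   # P(O | lambda)

# gamma_t(i) = alpha_t(i) beta_t(i) / P(O | lambda)
gamma = alpha * beta / prob

# xi_t(i, j) = alpha_t(i) a_ij b_j(o_{t+1}) beta_{t+1}(j) / P(O | lambda)
xi = np.zeros((T - 1, N, N))
for t in range(T - 1):
    xi[t] = alpha[t][:, None] * A * B[:, obs[t + 1]] * beta[t + 1] / prob

print(np.allclose(gamma[:-1], xi.sum(axis=2)))           # checks gamma_t(i) = sum_j xi_t(i, j)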

Page 10: 1 HMM - Part 2 Review of the last lecture The EM algorithm Continuous density HMM

10

The Baum-Welch Algorithm (2/3)

γ_t(i) = P(q_t = i | O, λ) – probability of being in state i at time t, given O and λ

ξ_t(i, j) = P(q_t = i, q_{t+1} = j | O, λ) – probability of being in state i at time t and state j at time t+1, given O and λ

Over L training sequences (the l-th of length T_l):

Σ_{l=1}^{L} γ_1^l(i) = expected number of times q_1 = i

Σ_{l=1}^{L} Σ_{t=1}^{T_l - 1} γ_t^l(i) = expected number of transitions from state i

Σ_{l=1}^{L} Σ_{t=1}^{T_l - 1} ξ_t^l(i, j) = expected number of transitions from state i to state j

Page 11: 1 HMM - Part 2 Review of the last lecture The EM algorithm Continuous density HMM

11

The Baum-Welch Algorithm (3/3)

Re-estimation formulae for π, A, and B are

π̂_i = (Expected number of times q_1 = i) / (Number of training sequences) = Σ_{l=1}^{L} γ_1^l(i) / L

â_ij = (Expected number of transitions from state i to state j) / (Expected number of transitions from state i) = Σ_{l=1}^{L} Σ_{t=1}^{T_l - 1} ξ_t^l(i, j) / Σ_{l=1}^{L} Σ_{t=1}^{T_l - 1} γ_t^l(i)

b̂_j(k) = (Expected number of "v_k" in state j) / (Expected number of times in state j) = Σ_{l=1}^{L} Σ_{t=1, s.t. o_t^l = v_k}^{T_l} γ_t^l(j) / Σ_{l=1}^{L} Σ_{t=1}^{T_l} γ_t^l(j)

How do you know that P(O | λ̂) ≥ P(O | λ)?
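One Baum-Welch re-estimation pass over several training sequences might look like the sketch below. It reuses the forward/backward and γ/ξ computations shown earlier; the function names, array layout, and toy data are assumptions for illustration only.

import numpy as np

def forward_backward(A, B, pi, obs):
    # Return gamma (T x N) and xi ((T-1) x N x N) for one observation sequence
    N, T = A.shape[0], len(obs)
    alpha = np.zeros((T, N))
    beta = np.ones((T, N))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    prob = alpha[-1].sum()
    gamma = alpha * beta / prob
    xi = np.array([alpha[t][:, None] * A * B[:, obs[t + 1]] * beta[t + 1] / prob
                   for t in range(T - 1)])
    return gamma, xi

def baum_welch_step(A, B, pi, sequences):
    # One re-estimation of (pi, A, B) from L training sequences of a discrete HMM
    N, M = B.shape
    pi_num = np.zeros(N)
    a_num, a_den = np.zeros((N, N)), np.zeros(N)
    b_num, b_den = np.zeros((N, M)), np.zeros(N)
    for obs in sequences:
        gamma, xi = forward_backward(A, B, pi, obs)
        pi_num += gamma[0]                       # expected times q_1 = i
        a_num += xi.sum(axis=0)                  # expected transitions i -> j
        a_den += gamma[:-1].sum(axis=0)          # expected transitions out of i
        for t, o in enumerate(obs):
            b_num[:, o] += gamma[t]              # expected times in j while observing v_k
        b_den += gamma.sum(axis=0)               # expected times in j
    return pi_num / len(sequences), a_num / a_den[:, None], b_num / b_den[:, None]

A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
pi = np.array([0.6, 0.4])
print(baum_welch_step(A, B, pi, [[0, 1, 2], [2, 2, 0, 1]]))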

Page 12: 1 HMM - Part 2 Review of the last lecture The EM algorithm Continuous density HMM

12

Maximum Likelihood Estimation for HMM

λ_ML = arg max_λ L(λ) = arg max_λ P(O | λ)
     = arg max_λ l(λ) = arg max_λ log P(O | λ)
     = arg max_λ log Σ_Q P(O, Q | λ)

However, we cannot find the solution directly.

An alternative way is to find a sequence λ^(0), λ^(1), ..., λ^(t), ... s.t. l(λ^(0)) ≤ l(λ^(1)) ≤ ... ≤ l(λ^(t)) ≤ ...

Page 13: 1 HMM - Part 2 Review of the last lecture The EM algorithm Continuous density HMM

13

l(λ) - l(λ^(t)) = log P(O | λ) - log P(O | λ^(t))
 = log Σ_Q P(O, Q | λ) - log P(O | λ^(t))
 = log Σ_Q [ P(Q | O, λ^(t)) · P(O, Q | λ) / P(Q | O, λ^(t)) ] - log P(O | λ^(t))
 = log Σ_Q [ P(Q | O, λ^(t)) · P(O, Q | λ) P(O | λ^(t)) / P(O, Q | λ^(t)) ] - log P(O | λ^(t))
 = log E_{P(Q|O,λ^(t))} [ P(O, Q | λ) P(O | λ^(t)) / P(O, Q | λ^(t)) ] - log P(O | λ^(t))
 ≥ E_{P(Q|O,λ^(t))} [ log ( P(O, Q | λ) P(O | λ^(t)) / P(O, Q | λ^(t)) ) ] - log P(O | λ^(t))   (Jensen's inequality)
 = E_{P(Q|O,λ^(t))} [ log ( P(O, Q | λ) / P(O, Q | λ^(t)) ) ]

Therefore, choose

λ^(t+1) = arg max_λ E_{P(Q|O,λ^(t))} [ log ( P(O, Q | λ) / P(O, Q | λ^(t)) ) ]   ... (1)
        = arg max_λ Σ_Q P(Q | O, λ^(t)) log ( P(O, Q | λ) / P(O, Q | λ^(t)) )
        = arg max_λ Σ_Q P(Q | O, λ^(t)) log P(O, Q | λ)
        = arg max_λ E_{P(Q|O,λ^(t))} [ log P(O, Q | λ) ]   ← the Q function

Jensen's inequality: if f is a concave function and X is a r.v., then E[f(X)] ≤ f(E[X]); with f = log, this gives log E[X] ≥ E[log X].

This is solvable, and it can be proved that l(λ^(t+1)) ≥ l(λ^(t)), i.e., P(O | λ^(t+1)) ≥ P(O | λ^(t)).

Page 14: 1 HMM - Part 2 Review of the last lecture The EM algorithm Continuous density HMM

14

The EM Algorithm

EM: Expectation Maximization

– Why EM?

• Simple optimization algorithms for likelihood functions rely on intermediate variables, called latent data. For HMM, the state sequence is the latent data

• Direct access to the data necessary to estimate the parameters is impossible or difficult. For HMM, it is almost impossible to estimate (A, B, π) without considering the state sequence

– Two Major Steps:

• E step: compute the expectation of the likelihood by including the latent variables as if they were observed, i.e., Σ_Q P(Q | O, λ) log P(O, Q | λ̂)

• M step: compute the maximum likelihood estimates of the parameters by maximizing the expected likelihood found in the E step

Page 15: 1 HMM - Part 2 Review of the last lecture The EM algorithm Continuous density HMM

15

Three Steps for EM

Step 1. Draw a lower bound
– Use Jensen's inequality

Step 2. Find the best lower bound (auxiliary function)
– Let the lower bound touch the objective function at the current guess

Step 3. Maximize the auxiliary function
– Obtain the new guess
– Go to Step 2 until convergence

[Minka 1998]

Page 16: 1 HMM - Part 2 Review of the last lecture The EM algorithm Continuous density HMM

16

Form an Initial Guess of λ = (A, B, π)

[Figure: the objective function F(λ), with the current guess marked and the target λ* = arg max_λ P(O | λ)]

Given the current guess λ, the goal is to find a new guess λ^NEW such that F(λ) ≤ F(λ^NEW)

Page 17: 1 HMM - Part 2 Review of the last lecture The EM algorithm Continuous density HMM

17

Step 1. Draw a Lower Bound

[Figure: a lower bound function g(λ) drawn below the objective function F(λ), i.e., g(λ) ≤ F(λ)]

Page 18: 1 HMM - Part 2 Review of the last lecture The EM algorithm Continuous density HMM

18

Step 2. Find the Best Lower Bound

[Figure: among the lower bound functions g(λ), the auxiliary function g(λ, λ') touches the objective function F(λ) at the current guess]

Page 19: 1 HMM - Part 2 Review of the last lecture The EM algorithm Continuous density HMM

19

Step 3. Maximize the Auxiliary Function

[Figure: λ^NEW maximizes the auxiliary function g(λ, λ'), and F(λ) ≤ F(λ^NEW)]

Page 20: 1 HMM - Part 2 Review of the last lecture The EM algorithm Continuous density HMM

20

Update the Model

[Figure: the objective function F(λ); λ^NEW becomes the current guess]

Page 21: 1 HMM - Part 2 Review of the last lecture The EM algorithm Continuous density HMM

21

Step 2. Find the Best Lower Bound

[Figure: a new auxiliary function g(λ, λ') is fitted at the updated current guess, touching the objective function F(λ)]

Page 22: 1 HMM - Part 2 Review of the last lecture The EM algorithm Continuous density HMM

22

Step 3. Maximize the Auxiliary Function

[Figure: the new maximizer λ^NEW again satisfies F(λ) ≤ F(λ^NEW); the procedure iterates]

Page 23: 1 HMM - Part 2 Review of the last lecture The EM algorithm Continuous density HMM

23

Step 1. Draw a Lower Bound (cont’d)

Apply Jensen's inequality to the objective function log P(O | λ):

log P(O | λ) = log Σ_Q P(O, Q | λ)
             = log Σ_Q p(Q) [ P(O, Q | λ) / p(Q) ]
             ≥ Σ_Q p(Q) log [ P(O, Q | λ) / p(Q) ]

where p(Q) is an arbitrary probability distribution. The right-hand side is a lower bound function of F(λ).

(If f is a concave function and X is a r.v., then E[f(X)] ≤ f(E[X]).)

Page 24: 1 HMM - Part 2 Review of the last lecture The EM algorithm Continuous density HMM

24

Step 2. Find the Best Lower Bound (cont’d)

– Find the p(Q) that makes the lower bound function touch the objective function at the current guess λ'

We want to maximize Σ_Q p(Q) log [ P(O, Q | λ) / p(Q) ] w.r.t. p(Q) at λ = λ'

The best p*(Q) = arg max_{p(Q)} Σ_Q p(Q) log [ P(O, Q | λ') / p(Q) ]

Page 25: 1 HMM - Part 2 Review of the last lecture The EM algorithm Continuous density HMM

25

Step 2. Find the Best Lower Bound (cont’d)

Since Σ_Q p(Q) = 1, we introduce a Lagrange multiplier h:

Λ = Σ_Q p(Q) log [ P(O, Q | λ') / p(Q) ] + h ( Σ_Q p(Q) - 1 )
  = Σ_Q p(Q) log P(O, Q | λ') - Σ_Q p(Q) log p(Q) + h ( Σ_Q p(Q) - 1 )

Take the derivative w.r.t. p(Q) and set it to zero:

log P(O, Q | λ') - log p(Q) - 1 + h = 0
⇒ p(Q) = P(O, Q | λ') e^(h-1)

Substituting into Σ_Q p(Q) = 1 gives e^(1-h) = Σ_Q P(O, Q | λ'), so the best distribution is

p*(Q) = P(O, Q | λ') / Σ_Q P(O, Q | λ') = P(O, Q | λ') / P(O | λ') = P(Q | O, λ')

Page 26: 1 HMM - Part 2 Review of the last lecture The EM algorithm Continuous density HMM

26

Step 2. Find the Best Lower Bound (cont’d)

Define the auxiliary function

g(λ, λ') = Σ_Q P(Q | O, λ') log P(O, Q | λ) - Σ_Q P(Q | O, λ') log P(Q | O, λ')

The first term, Σ_Q P(Q | O, λ') log P(O, Q | λ), is the Q function.

We can check that the bound touches the objective function at the current guess:

g(λ', λ') = Σ_Q P(Q | O, λ') log [ P(O, Q | λ') / P(Q | O, λ') ]
          = Σ_Q P(Q | O, λ') log P(O | λ')
          = log P(O | λ')

Page 27: 1 HMM - Part 2 Review of the last lecture The EM algorithm Continuous density HMM

27

EM for HMM Training

Basic idea

– Assume we have λ and the probability that each Q occurred in the generation of O, i.e., we have in fact observed a complete data pair (O, Q) with frequency proportional to the probability P(O, Q | λ)

– We then find a new λ̂ that maximizes the expectation Σ_Q P(Q | O, λ) log P(O, Q | λ̂)

– It can be guaranteed that P(O | λ̂) ≥ P(O | λ)

EM can discover parameters of model λ to maximize the log-likelihood of the incomplete data, log P(O | λ), by iteratively maximizing the expectation of the log-likelihood of the complete data, log P(O, Q | λ)

Page 28: 1 HMM - Part 2 Review of the last lecture The EM algorithm Continuous density HMM

28

Solution to Problem 3 - The EM Algorithm

The auxiliary function

Q(λ, λ̂) = Σ_Q [ P(O, Q | λ) / P(O | λ) ] log P(O, Q | λ̂)

where P(O, Q | λ) and log P(O, Q | λ̂) can be expressed as

P(O, Q | λ) = π_{q_1} Π_{t=1}^{T-1} a_{q_t q_{t+1}} Π_{t=1}^{T} b_{q_t}(o_t)

log P(O, Q | λ̂) = log π̂_{q_1} + Σ_{t=1}^{T-1} log â_{q_t q_{t+1}} + Σ_{t=1}^{T} log b̂_{q_t}(o_t)

Page 29: 1 HMM - Part 2 Review of the last lecture The EM algorithm Continuous density HMM

29

Solution to Problem 3 - The EM Algorithm (cont’d)

The auxiliary function can be rewritten as

Q(λ, λ̂) = Σ_Q [ P(O, Q | λ) / P(O | λ) ] [ log π̂_{q_1} + Σ_{t=1}^{T-1} log â_{q_t q_{t+1}} + Σ_{t=1}^{T} log b̂_{q_t}(o_t) ]
         = Q_π(λ, π̂) + Q_a(λ, â) + Q_b(λ, b̂)

where

Q_π(λ, π̂) = Σ_{i=1}^{N} [ P(O, q_1 = i | λ) / P(O | λ) ] log π̂_i

Q_a(λ, â) = Σ_{i=1}^{N} Σ_{j=1}^{N} Σ_{t=1}^{T-1} [ P(O, q_t = i, q_{t+1} = j | λ) / P(O | λ) ] log â_ij

Q_b(λ, b̂) = Σ_{j=1}^{N} Σ_{k=1}^{M} Σ_{t=1, s.t. o_t = v_k}^{T} [ P(O, q_t = j | λ) / P(O | λ) ] log b̂_j(k)

Each of the three terms has the form Σ w log y. For example, P(O, q_1 = i | λ)/P(O | λ) = γ_1(i), P(O, q_t = i, q_{t+1} = j | λ)/P(O | λ) = ξ_t(i, j), and P(O, q_t = j | λ)/P(O | λ) = γ_t(j).

Page 30: 1 HMM - Part 2 Review of the last lecture The EM algorithm Continuous density HMM

30

Solution to Problem 3 - The EM Algorithm (cont’d)

The auxiliary function is separated into three independent terms, which respectively correspond to π_i, a_ij, and b_j(k)

– Maximization of Q(λ, λ̂) can be done by maximizing the individual terms separately, subject to the probability constraints Σ_{i=1}^{N} π̂_i = 1, Σ_{j=1}^{N} â_ij = 1 (1 ≤ i ≤ N), and Σ_{k=1}^{M} b̂_j(k) = 1 (1 ≤ j ≤ N)

– All these terms have the following form:

F(y_1, y_2, ..., y_N) = Σ_{j=1}^{N} w_j log y_j, where Σ_{j=1}^{N} y_j = 1, y_j ≥ 0, and w_j ≥ 0

F has a maximum value when y_j = w_j / Σ_{n=1}^{N} w_n
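A quick numerical sanity check of this claim is sketched below; the weights w and the number of random trial points are arbitrary choices for illustration.

import numpy as np

# F(y) = sum_j w_j log y_j subject to sum_j y_j = 1 should peak at y_j = w_j / sum_n w_n
rng = np.random.default_rng(0)
w = np.array([3.0, 1.0, 2.0])
y_star = w / w.sum()

def F(y):
    return float(np.sum(w * np.log(y)))

# Compare the claimed maximizer against random points on the probability simplex
best_random = max(F(rng.dirichlet(np.ones(3))) for _ in range(10000))
print(F(y_star), best_random, F(y_star) >= best_random)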

Page 31: 1 HMM - Part 2 Review of the last lecture The EM algorithm Continuous density HMM

31

Solution to Problem 3 - The EM Algorithm (cont’d)

Proof: Apply Lagrange Multiplier

By applying the Lagrange multiplier h:

Suppose F = Σ_{j=1}^{N} w_j log y_j + h ( Σ_{j=1}^{N} y_j - 1 )

Letting ∂F/∂y_j = w_j / y_j + h = 0 for all j

Then w_j = -h y_j, so Σ_{j=1}^{N} w_j = -h Σ_{j=1}^{N} y_j = -h   (using the constraint Σ_{j=1}^{N} y_j = 1)

⇒ y_j = w_j / Σ_{n=1}^{N} w_n

(Aside, used above: d(ln x)/dx = 1/x, since
 e = lim_{h→0} (1 + h)^(1/h) ≈ 2.71828...
 d(ln x)/dx = lim_{h→0} [ln(x + h) - ln(x)] / h = lim_{h→0} (1/h) ln((x + h)/x) = lim_{h→0} ln(1 + h/x)^(1/h) = ln e^(1/x) = 1/x)

Page 32: 1 HMM - Part 2 Review of the last lecture The EM algorithm Continuous density HMM

32

Solution to Problem 3 - The EM Algorithm (cont’d)

Q_π(λ, π̂) = Σ_{i=1}^{N} [ P(O, q_1 = i | λ) / P(O | λ) ] log π̂_i

has the form Σ_i w_i log y_i with w_i = P(O, q_1 = i | λ) / P(O | λ), so it is maximized by

π̂_i = w_i / Σ_{n=1}^{N} w_n = [ P(O, q_1 = i | λ) / P(O | λ) ] / Σ_{n=1}^{N} [ P(O, q_1 = n | λ) / P(O | λ) ] = P(q_1 = i | O, λ) = γ_1(i)

where γ_t(i) = P(q_t = i | O, λ)

Page 33: 1 HMM - Part 2 Review of the last lecture The EM algorithm Continuous density HMM

33

Solution to Problem 3 - The EM Algorithm (cont’d)

Q_a(λ, â) = Σ_{i=1}^{N} Σ_{j=1}^{N} Σ_{t=1}^{T-1} [ P(O, q_t = i, q_{t+1} = j | λ) / P(O | λ) ] log â_ij

For each i, the inner sum has the form Σ_j w_j log y_j with w_j = Σ_{t=1}^{T-1} P(O, q_t = i, q_{t+1} = j | λ) / P(O | λ), so it is maximized by

â_ij = w_j / Σ_{n=1}^{N} w_n = Σ_{t=1}^{T-1} P(q_t = i, q_{t+1} = j | O, λ) / Σ_{t=1}^{T-1} P(q_t = i | O, λ) = Σ_{t=1}^{T-1} ξ_t(i, j) / Σ_{t=1}^{T-1} γ_t(i)

Page 34: 1 HMM - Part 2 Review of the last lecture The EM algorithm Continuous density HMM

34

Solution to Problem 3 - The EM Algorithm (cont’d)

Q_b(λ, b̂) = Σ_{j=1}^{N} Σ_{k=1}^{M} Σ_{t=1, s.t. o_t = v_k}^{T} [ P(O, q_t = j | λ) / P(O | λ) ] log b̂_j(k)

For each j, the inner sum has the form Σ_k w_k log y_k with w_k = Σ_{t=1, s.t. o_t = v_k}^{T} P(O, q_t = j | λ) / P(O | λ), so it is maximized by

b̂_j(k) = w_k / Σ_{n=1}^{M} w_n = Σ_{t=1, s.t. o_t = v_k}^{T} P(q_t = j | O, λ) / Σ_{t=1}^{T} P(q_t = j | O, λ) = Σ_{t=1, s.t. o_t = v_k}^{T} γ_t(j) / Σ_{t=1}^{T} γ_t(j)

Page 35: 1 HMM - Part 2 Review of the last lecture The EM algorithm Continuous density HMM

35

Solution to Problem 3 - The EM Algorithm (cont’d)

The new model parameter set can be expressed as:

λ̂ = (π̂, Â, B̂)

π̂_i = P(q_1 = i | O, λ) = γ_1(i)

â_ij = Σ_{t=1}^{T-1} P(q_t = i, q_{t+1} = j | O, λ) / Σ_{t=1}^{T-1} P(q_t = i | O, λ) = Σ_{t=1}^{T-1} ξ_t(i, j) / Σ_{t=1}^{T-1} γ_t(i)

b̂_j(k) = Σ_{t=1, s.t. o_t = v_k}^{T} P(q_t = j | O, λ) / Σ_{t=1}^{T} P(q_t = j | O, λ) = Σ_{t=1, s.t. o_t = v_k}^{T} γ_t(j) / Σ_{t=1}^{T} γ_t(j)

where γ_t(i) = P(q_t = i | O, λ) and ξ_t(i, j) = P(q_t = i, q_{t+1} = j | O, λ)

Page 36: 1 HMM - Part 2 Review of the last lecture The EM algorithm Continuous density HMM

36

Discrete vs. Continuous Density HMMs

Two major types of HMMs according to the observations

– Discrete and finite observations:

• The observations that all distinct states generate are finite in number, i.e., V = {v_1, v_2, v_3, ..., v_M}, v_k ∈ R^L

• In this case, the observation probability distribution in state j, B = {b_j(k)}, is defined as b_j(k) = P(o_t = v_k | q_t = j), 1 ≤ k ≤ M, 1 ≤ j ≤ N, where o_t is the observation at time t and q_t is the state at time t. b_j(k) consists of only M probability values

– Continuous and infinite observations:

• The observations that all distinct states generate are infinite and continuous, i.e., V = {v | v ∈ R^L}

• In this case, the observation probability distribution in state j, B = {b_j(v)}, is defined as b_j(v) = f(o_t = v | q_t = j), 1 ≤ j ≤ N, where o_t is the observation at time t and q_t is the state at time t. b_j(v) is a continuous probability density function (pdf) and is often a mixture of multivariate Gaussian (normal) distributions

Page 37: 1 HMM - Part 2 Review of the last lecture The EM algorithm Continuous density HMM

37

Gaussian Distribution

A continuous random variable X is said to have a Gaussian distribution with mean μ and variance σ^2 (σ > 0) if X has a continuous pdf of the following form:

f_X(x | μ, σ^2) = ( 1 / (2π σ^2)^(1/2) ) exp( -(x - μ)^2 / (2σ^2) )
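For reference, a tiny NumPy version of this density (the function name and the example values are arbitrary):

import numpy as np

def gaussian_pdf(x, mu, sigma2):
    # Univariate Gaussian density with mean mu and variance sigma2 (sigma2 > 0)
    return np.exp(-(x - mu) ** 2 / (2.0 * sigma2)) / np.sqrt(2.0 * np.pi * sigma2)

print(gaussian_pdf(0.5, mu=0.0, sigma2=1.0))   # approximately 0.352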

Page 38: 1 HMM - Part 2 Review of the last lecture The EM algorithm Continuous density HMM

38

Multivariate Gaussian Distribution

If X = (X_1, X_2, X_3, ..., X_d) is a d-dimensional random vector with a multivariate Gaussian distribution with mean vector μ and covariance matrix Σ, then the pdf can be expressed as

f_X(x) = N(x; μ, Σ) = ( 1 / ((2π)^(d/2) |Σ|^(1/2)) ) exp( -(1/2) (x - μ)^T Σ^(-1) (x - μ) )

where μ = E[X], Σ = E[(X - μ)(X - μ)^T], σ_ij = E[(x_i - μ_i)(x_j - μ_j)], and |Σ| is the determinant of Σ

If X_1, X_2, X_3, ..., X_d are independent random variables, the covariance matrix is reduced to a diagonal matrix, i.e., σ_ij^2 = 0 for i ≠ j, and

f_X(x | μ, Σ) = Π_{i=1}^{d} ( 1 / (2π σ_ii^2)^(1/2) ) exp( -(x_i - μ_i)^2 / (2σ_ii^2) )

Page 39: 1 HMM - Part 2 Review of the last lecture The EM algorithm Continuous density HMM

39

Multivariate Mixture Gaussian Distribution

A d-dimensional random vector X = (X_1, X_2, X_3, ..., X_d) has a multivariate mixture Gaussian distribution if

f_X(x) = Σ_{k=1}^{M} w_k N(x; μ_k, Σ_k), with w_k ≥ 0 and Σ_{k=1}^{M} w_k = 1

In CDHMM, b_j(v) is a continuous probability density function (pdf) and is often a mixture of multivariate Gaussian distributions:

b_j(v) = Σ_{k=1}^{M} c_jk ( 1 / ((2π)^(d/2) |Σ_jk|^(1/2)) ) exp( -(1/2) (v - μ_jk)^T Σ_jk^(-1) (v - μ_jk) ), with c_jk ≥ 0 and Σ_{k=1}^{M} c_jk = 1

where v is the observation vector, μ_jk is the mean vector of the k-th mixture of the j-th state, and Σ_jk is the covariance matrix of the k-th mixture of the j-th state
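A minimal NumPy sketch of such a state output density b_j(v) is given below; the helper names and the toy 2-dimensional, 2-mixture parameters are assumptions for illustration.

import numpy as np

def mvn_pdf(x, mu, cov):
    # Multivariate Gaussian density N(x; mu, cov)
    d = len(mu)
    diff = x - mu
    expo = -0.5 * diff @ np.linalg.solve(cov, diff)
    return np.exp(expo) / np.sqrt((2.0 * np.pi) ** d * np.linalg.det(cov))

def mixture_pdf(x, weights, means, covs):
    # b_j(v) as a mixture of multivariate Gaussians: sum_k c_jk N(v; mu_jk, Sigma_jk)
    return sum(c * mvn_pdf(x, mu, cov) for c, mu, cov in zip(weights, means, covs))

# Toy 2-dimensional state with 2 mixture components (all numbers arbitrary)
weights = [0.6, 0.4]
means = [np.zeros(2), np.array([2.0, 2.0])]
covs = [np.eye(2), 0.5 * np.eye(2)]
print(mixture_pdf(np.array([1.0, 1.0]), weights, means, covs))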

Page 40: 1 HMM - Part 2 Review of the last lecture The EM algorithm Continuous density HMM

40

Solution to Problem 3 – The Segmental K-means Algorithm for CDHMM

Assume that we have a training set of observations and an initial estimate of the model parameters

– Step 1: Segment the training data
  The set of training observation sequences is segmented into states, based on the current model, by the Viterbi algorithm

– Step 2: Re-estimate the model parameters

  π̂_i = (Number of times q_1 = i) / (Number of training sequences)

  â_ij = (Number of transitions from state i to state j) / (Number of transitions from state i)

  By partitioning the observation vectors within each state j into M clusters:

  ĉ_jm = (number of vectors classified into cluster m of state j) / (number of vectors in state j)

  μ̂_jm = sample mean of the vectors classified into cluster m of state j

  Σ̂_jm = sample covariance matrix of the vectors classified into cluster m of state j

– Step 3: Evaluate the model
  If the difference between the new and current model scores exceeds a threshold, go back to Step 1; otherwise, return

Page 41: 1 HMM - Part 2 Review of the last lecture The EM algorithm Continuous density HMM

41

Solution to Problem 3 – The Segmental K-means Algorithm for CDHMM

(cont'd) An example with 3 states and 4 Gaussian mixtures per state

[Figure: the observation sequence O_1, O_2, ..., O_t is aligned to states s1, s2, s3 on a state-time trellis by Viterbi segmentation; within each state, K-means splits the vectors (from the global mean into cluster 1 mean, cluster 2 mean, ...) to obtain the mixture parameters, e.g., {μ_11, Σ_11, c_11}, {μ_12, Σ_12, c_12}, {μ_13, Σ_13, c_13}, {μ_14, Σ_14, c_14} for state 1]

Page 42: 1 HMM - Part 2 Review of the last lecture The EM algorithm Continuous density HMM

42

Solution to Problem 3 – The Baum-Welch Algorithm for CDHMM

Define a new variable γ_t(j, k)

– Probability of being in state j at time t with the k-th mixture component accounting for o_t, given O and λ

γ_t(j, k) = P(q_t = j, m_t = k | O, λ)
          = P(q_t = j | O, λ) P(m_t = k | q_t = j, O, λ)
          = γ_t(j) P(m_t = k | q_t = j, o_t, λ)
          = [ α_t(j) β_t(j) / Σ_{s=1}^{N} α_t(s) β_t(s) ] · [ c_jk N(o_t; μ_jk, Σ_jk) / Σ_{m=1}^{M} c_jm N(o_t; μ_jm, Σ_jm) ]

Observation-independent assumption:

P(m_t = k | q_t = j, O, λ) = P(m_t = k | q_t = j, o_1, ..., o_{t-1}, o_t, o_{t+1}, ..., o_T, λ) = P(m_t = k | q_t = j, o_t, λ)
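The sketch below computes γ_t(j, k) for one state j at one time t, given the state posterior γ_t(j) from forward-backward; the function names and the toy parameters are illustrative assumptions.

import numpy as np

def mvn_pdf(x, mu, cov):
    # Multivariate Gaussian density N(x; mu, cov)
    d = len(mu)
    diff = x - mu
    return np.exp(-0.5 * diff @ np.linalg.solve(cov, diff)) / np.sqrt((2.0 * np.pi) ** d * np.linalg.det(cov))

def mixture_occupancy(o_t, gamma_t_j, c, means, covs):
    # gamma_t(j, k) = gamma_t(j) * c_jk N(o_t; mu_jk, Sigma_jk) / sum_m c_jm N(o_t; mu_jm, Sigma_jm)
    likes = np.array([c_k * mvn_pdf(o_t, mu, cov) for c_k, mu, cov in zip(c, means, covs)])
    return gamma_t_j * likes / likes.sum()

# Toy call: one state j with 2 mixtures and gamma_t(j) = 0.8 (all numbers arbitrary)
print(mixture_occupancy(np.array([1.0, 1.0]), 0.8,
                        [0.6, 0.4], [np.zeros(2), np.full(2, 2.0)], [np.eye(2), np.eye(2)]))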

Page 43: 1 HMM - Part 2 Review of the last lecture The EM algorithm Continuous density HMM

43

Solution to Problem 3 – The Baum-Welch Algorithm for CDHMM (cont’d)

Re-estimation formulae for c_jk, μ_jk, and Σ_jk are

ĉ_jk = Σ_{t=1}^{T} γ_t(j, k) / Σ_{t=1}^{T} Σ_{m=1}^{M} γ_t(j, m)
  (Expected number of times in state j and mixture k / Expected number of times in state j)

μ̂_jk = Σ_{t=1}^{T} γ_t(j, k) o_t / Σ_{t=1}^{T} γ_t(j, k)
  (Weighted average (mean) of the observations in state j and mixture k)

Σ̂_jk = Σ_{t=1}^{T} γ_t(j, k) (o_t - μ̂_jk)(o_t - μ̂_jk)^T / Σ_{t=1}^{T} γ_t(j, k)
  (Weighted covariance of the observations in state j and mixture k)
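A compact sketch of these updates for a single state j, assuming the occupancies γ_t(j, k) have already been computed (array shapes, names, and toy numbers are illustrative):

import numpy as np

def reestimate_mixture(obs, gamma_jk):
    # Re-estimate (c_jk, mu_jk, Sigma_jk) for one state j.
    # obs: (T, d) observation vectors; gamma_jk: (T, M) occupancies gamma_t(j, k)
    T, d = obs.shape
    M = gamma_jk.shape[1]
    c = gamma_jk.sum(axis=0) / gamma_jk.sum()                 # expected times in (j, k) / in j
    mu = (gamma_jk.T @ obs) / gamma_jk.sum(axis=0)[:, None]   # weighted mean per mixture
    Sigma = np.zeros((M, d, d))
    for k in range(M):
        diff = obs - mu[k]
        Sigma[k] = (gamma_jk[:, k, None] * diff).T @ diff / gamma_jk[:, k].sum()
    return c, mu, Sigma

# Toy data: 4 two-dimensional observations and 2 mixtures (numbers arbitrary)
obs = np.array([[0.0, 0.1], [0.2, 0.0], [2.0, 2.1], [1.9, 2.0]])
gamma_jk = np.array([[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]])
print(reestimate_mixture(obs, gamma_jk))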

Page 44: 1 HMM - Part 2 Review of the last lecture The EM algorithm Continuous density HMM

44

A Simple Example

[Figure: a two-state trellis (states s_1, s_2) over times 1, 2, ..., t with observations o_1, o_2, o_3, ...; each arc carries terms of the form α_t(i) a_ij b_j(o_{t+1}) β_{t+1}(j)]

The Forward/Backward Procedure

γ_t(i) = P(q_t = i | O, λ) = P(q_t = i, O | λ) / P(O | λ) = α_t(i) β_t(i) / Σ_{j=1}^{N} α_t(j) β_t(j)

ξ_t(i, j) = P(q_t = i, q_{t+1} = j | O, λ) = P(q_t = i, q_{t+1} = j, O | λ) / P(O | λ)
          = α_t(i) a_ij b_j(o_{t+1}) β_{t+1}(j) / Σ_{i=1}^{N} Σ_{j=1}^{N} α_t(i) a_ij b_j(o_{t+1}) β_{t+1}(j)

Page 45: 1 HMM - Part 2 Review of the last lecture The EM algorithm Continuous density HMM

45

A Simple Example (cont'd)

[Figure: a two-state HMM (states 1 and 2, with a start node and transition probabilities a_11, a_12, a_21, a_22) generating the observation sequence (v_4, v_7, v_4); there are 8 paths in total]

For each state sequence q, p(O, q | λ) is:

1. q: 1 1 1   b_1(v_4) a_11 b_1(v_7) a_11 b_1(v_4)
2. q: 1 1 2   b_1(v_4) a_11 b_1(v_7) a_12 b_2(v_4)
3. q: 1 2 1   b_1(v_4) a_12 b_2(v_7) a_21 b_1(v_4)
4. q: 1 2 2   b_1(v_4) a_12 b_2(v_7) a_22 b_2(v_4)
5. q: 2 1 1   b_2(v_4) a_21 b_1(v_7) a_11 b_1(v_4)
6. q: 2 1 2   b_2(v_4) a_21 b_1(v_7) a_12 b_2(v_4)
7. q: 2 2 1   b_2(v_4) a_22 b_2(v_7) a_21 b_1(v_4)
8. q: 2 2 2   b_2(v_4) a_22 b_2(v_7) a_22 b_2(v_4)

and log p(O, q | λ) is the corresponding sum of log terms, e.g., for path 1: log b_1(v_4) + log a_11 + log b_1(v_7) + log a_11 + log b_1(v_4)

Page 46: 1 HMM - Part 2 Review of the last lecture The EM algorithm Continuous density HMM

46

A Simple Example (cont’d)

Let φ_i denote p(O, q | λ) for the i-th of the 8 paths, and Σ_all = φ_1 + φ_2 + ... + φ_8 = P(O | λ). Then

Q(λ, λ̂) = Σ_{all Q} [ P(O, Q | λ) / P(O | λ) ] [ log π̂_{q_1} + Σ_{t=1}^{T-1} log â_{q_t q_{t+1}} + Σ_{t=1}^{T} log b̂_{q_t}(o_t) ]

collects, for example,

[(φ_1 + φ_2 + φ_3 + φ_4) / Σ_all] log π̂_1 + [(φ_5 + φ_6 + φ_7 + φ_8) / Σ_all] log π̂_2
+ [(2φ_1 + φ_2 + φ_5) / Σ_all] log â_11 + [(φ_2 + φ_3 + φ_4 + φ_6) / Σ_all] log â_12
+ [(φ_3 + φ_5 + φ_6 + φ_7) / Σ_all] log â_21 + [(φ_4 + φ_7 + 2φ_8) / Σ_all] log â_22 + ...

Note that (φ_1 + φ_2 + φ_3 + φ_4) / Σ_all = P(q_1 = 1, O | λ) / P(O | λ) = γ_1(1), and likewise the coefficient of log π̂_2 is γ_1(2), so the π̂ terms are γ_1(1) log π̂_1 + γ_1(2) log π̂_2.

Similarly, (2φ_1 + φ_2 + φ_5) / Σ_all = [ P(q_1 = 1, q_2 = 1, O | λ) + P(q_2 = 1, q_3 = 1, O | λ) ] / P(O | λ) = ξ_1(1, 1) + ξ_2(1, 1) = Σ_t ξ_t(1, 1), so the coefficient of log â_11 is Σ_t ξ_t(1, 1), and in general the coefficient of log â_ij is Σ_t ξ_t(i, j).
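A brute-force sketch of this example is given below: it enumerates all 8 paths of a two-state HMM over 3 observations and checks that Σ_Q P(O, Q | λ) equals the forward-algorithm result, and that grouping the paths by initial state yields γ_1(i). The toy numbers are arbitrary, and unlike the slide (which leaves the initial probability implicit at the start node) this sketch includes π_{q_1} explicitly.

import numpy as np
from itertools import product

A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
pi = np.array([0.6, 0.4])
obs = [0, 1, 2]                        # stands in for (v_4, v_7, v_4)

def path_prob(q):
    # P(O, Q | lambda) for one state sequence q, including pi_{q_1}
    p = pi[q[0]] * B[q[0], obs[0]]
    for t in range(1, len(obs)):
        p *= A[q[t - 1], q[t]] * B[q[t], obs[t]]
    return p

paths = list(product([0, 1], repeat=len(obs)))            # the 2^3 = 8 paths
phi = np.array([path_prob(q) for q in paths])

# Forward algorithm for comparison
alpha = pi * B[:, obs[0]]
for t in range(1, len(obs)):
    alpha = (alpha @ A) * B[:, obs[t]]

print(np.isclose(phi.sum(), alpha.sum()))                 # sum_Q P(O,Q|lambda) = P(O|lambda)
gamma_1_state1 = sum(p for q, p in zip(paths, phi) if q[0] == 0) / phi.sum()
print(gamma_1_state1)                                     # coefficient of log pi_1_hat, i.e. gamma_1(1)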