Hidden Markov Models

Page 1:

Hidden Markov Models

Page 2:

Hidden Markov Model

In some Markov processes, we may not be able to observe the states directly.

Page 3:

Hidden Markov Model

An HMM is a quintuple (S, E, π, A, B), where:

S : {s1 … sN} are the values for the hidden states

E : {e1 … eT} are the values for the observations

π : probability distribution of the initial state

A : transition probability matrix, aij = P(Xt+1 = sj | Xt = si)

B : emission probability matrix, bi,e = P(et = e | Xt = si)

[Figure: graphical model of an HMM. A chain of hidden states X1 → … → Xt-1 → Xt → Xt+1 → … → XT, where each state Xt emits an observation et.]
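To make the quintuple concrete, here is a minimal container sketch in Python/NumPy. The class, variable names, and the toy weather numbers are illustrative assumptions, not from the slides:

import numpy as np

# A minimal HMM container, following the quintuple (S, E, pi, A, B) above.
# States and observations are represented by indices 0..N-1 and 0..M-1.
class HMM:
    def __init__(self, pi, A, B):
        self.pi = np.asarray(pi)  # pi[i]   = P(X1 = s_i), initial distribution
        self.A = np.asarray(A)    # A[i, j] = P(X_{t+1} = s_j | X_t = s_i)
        self.B = np.asarray(B)    # B[i, k] = P(e_t = k | X_t = s_i)

# Hypothetical two-state weather model, for illustration only:
# states {Rainy, Sunny}, observations {Walk, Shop, Clean}.
hmm = HMM(pi=[0.6, 0.4],
          A=[[0.7, 0.3],
             [0.4, 0.6]],
          B=[[0.1, 0.4, 0.5],
             [0.6, 0.3, 0.1]])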

Page 4:

Inferences with HMM

Filtering: P(xt | e1:t). Given an observation sequence, compute the probability of the last state.

Decoding: argmaxx1:t P(x1:t | e1:t). Given an observation sequence, compute the most likely hidden state sequence.

Learning: argmaxθ Pθ(e1:t), where θ = (π, A, B) are the parameters of the HMM. Given an observation sequence, find the transition probability and emission probability tables that assign the observations the highest probability. This is unsupervised learning.

Page 5:

Filtering

P(Xt+1 | e1:t+1) = P(Xt+1 | e1:t, et+1)

= P(et+1 | Xt+1, e1:t) P(Xt+1 | e1:t) / P(et+1 | e1:t)

= P(et+1 | Xt+1) P(Xt+1 | e1:t) / P(et+1 | e1:t)

P(Xt+1 | e1:t) = Σxt P(Xt+1 | xt, e1:t) P(xt | e1:t) = Σxt P(Xt+1 | xt) P(xt | e1:t)

The result has the same form as P(xt | e1:t), so we can compute it by recursion.
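The recursion translates directly into code. A minimal NumPy sketch (array layout and names are my own, not from the slides): each step predicts forward through A, weights by the emission column of B, and normalizes.

import numpy as np

def filter_step(f, e, A, B):
    """One update: from P(X_t | e_1:t) to P(X_{t+1} | e_1:t+1).

    f : current filtered distribution, f[i] = P(X_t = s_i | e_1:t)
    e : index of the new observation e_{t+1}
    """
    predict = A.T @ f             # sum_{x_t} P(X_{t+1} | x_t) P(x_t | e_1:t)
    unnorm = B[:, e] * predict    # multiply by P(e_{t+1} | X_{t+1})
    return unnorm / unnorm.sum()  # normalizing divides by P(e_{t+1} | e_1:t)

def filtering(pi, A, B, obs):
    """P(X_t | e_1:t) for the whole sequence, by recursion."""
    f = pi * B[:, obs[0]]
    f = f / f.sum()
    for e in obs[1:]:
        f = filter_step(f, e, A, B)
    return f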

Page 6:

Filtering Example
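The slide's worked example did not survive extraction. As a stand-in, here is the filtering sketch above run on the hypothetical two-state weather model from earlier (all numbers illustrative):

import numpy as np

# Observations: 0 = Walk, 1 = Shop, 2 = Clean.
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.1, 0.4, 0.5], [0.6, 0.3, 0.1]])
print(filtering(pi, A, B, [0, 1, 2]))  # P(X3 | e1:3) over {Rainy, Sunny}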

Page 7:

Viterbi Algorithm

Compute argmaxx1:t P(x1:t | e1:t).

Since P(x1:t | e1:t) = P(x1:t, e1:t) / P(e1:t), and P(e1:t) remains constant when we consider different x1:t,

argmaxx1:t P(x1:t | e1:t) = argmaxx1:t P(x1:t, e1:t)

Since the Markov chain is a Bayes net,

P(x1:t, e1:t) = P(x0) Πi=1..t P(xi | xi-1) P(ei | xi)

Equivalently, minimize

–log P(x1:t, e1:t) = –log P(x0) + Σi=1..t (–log P(xi | xi-1) – log P(ei | xi))

Page 8:

Viterbi Algorithm

Given an HMM (S, E, π, A, B) and observations e1:t, construct a graph consisting of 1 + tN nodes: one initial node, and N nodes at each time step i, where the jth node at time i represents Xi = sj. The link between the nodes Xi-1 = sj and Xi = sk is associated with the length

–log [P(Xi = sk | Xi-1 = sj) P(ei | Xi = sk)]

Page 9:

The problem of finding argmaxx1:t P(x1:t | e1:t) becomes that of finding the shortest path from the initial node x0 to one of the N nodes at time t.
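A sketch of this shortest-path formulation in NumPy (illustrative, not from the slides): edge lengths are the negative log terms above, and backpointers recover the shortest, i.e. most probable, path.

import numpy as np

def viterbi(pi, A, B, obs):
    """argmax_{x_1:t} P(x_1:t | e_1:t), as a shortest path over the trellis.

    Edge lengths: -log[P(X_i = s_k | X_{i-1} = s_j) P(e_i | X_i = s_k)].
    """
    T, N = len(obs), len(pi)
    d = np.full((T, N), np.inf)         # d[t, k]: shortest path ending at X_t = s_k
    back = np.zeros((T, N), dtype=int)  # backpointers
    with np.errstate(divide="ignore"):  # log(0) -> -inf is fine here
        logpi, logA, logB = np.log(pi), np.log(A), np.log(B)
    d[0] = -(logpi + logB[:, obs[0]])
    for t in range(1, T):
        for k in range(N):
            lengths = d[t - 1] - (logA[:, k] + logB[k, obs[t]])
            back[t, k] = np.argmin(lengths)
            d[t, k] = lengths[back[t, k]]
    # Recover the path by following backpointers from the best final node.
    path = [int(np.argmin(d[-1]))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]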

Page 10:

Example
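The slide's example figure is lost; as a stand-in, the viterbi() sketch above can be run on the same toy model and observations as the filtering example:

print(viterbi(pi, A, B, [0, 1, 2]))  # -> [1, 0, 0]: Sunny, then Rainy, Rainy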

Page 11:

Baum-Welch Algorithm

The previous two kinds of computation need parameters θ = (π, A, B). Where do these probabilities come from? Relative frequency? But the states are not observable!

Solution: the Baum-Welch algorithm. Unsupervised learning from observations: find argmaxθ Pθ(e1:t).

Page 12:

Baum-Welch Algorithm

Start with an initial set of parameters θ0 (possibly arbitrary).

Compute pseudo counts: how many times did the transition from Xi-1 = sj to Xi = sk occur?

Use the pseudo counts to obtain another (better) set of parameters θ1.

Iterate until Pθk+1(e1:t) is no bigger than Pθk(e1:t).

This is a special case of EM (Expectation-Maximization).

Page 13:

Pseudo Counts

Given the observation sequence e1:T, the pseudo count of the link from Xt = si to Xt+1 = sj is the probability

P(Xt = si, Xt+1 = sj | e1:T)

[Figure: two trellis nodes, Xt = si and Xt+1 = sj, joined by the link being counted.]

Page 14:

Update HMM Parameters

Add P(Xt = si, Xt+1 = sj | e1:T) to count(i, j)

Add P(Xt = si | e1:T) to count(i)

Add P(Xt = si | e1:T) to count(i, et)

Updated aij = count(i, j) / count(i)

Updated bi,e = count(i, e) / count(i)

Page 15:

P(Xt = si, Xt+1 = sj | e1:T)

= P(Xt = si, Xt+1 = sj, e1:t, et+1, et+2:T) / P(e1:T)

= P(Xt = si, e1:t) P(Xt+1 = sj | Xt = si) P(et+1 | Xt+1 = sj) P(et+2:T | Xt+1 = sj) / P(e1:T)

= P(Xt = si, e1:t) aij bj,et+1 P(et+2:T | Xt+1 = sj) / P(e1:T)

= αi(t) aij bj,et+1 βj(t+1) / P(e1:T)

Page 16:

Forward Probability

αi(t) = P(e1, …, et, xt = si)

αi(1) = πi bi,e1

αj(t+1) = P(e1, …, et+1, xt+1 = sj)

= Σi=1..N P(e1, …, et, xt = si) P(xt+1 = sj | xt = si) P(et+1 | xt+1 = sj)

= bj,et+1 Σi=1..N αi(t) aij

Page 17:

Backward Probability

βi(t) = P(et+1, …, eT | xt = si)

βi(T) = 1

βi(t) = Σj=1..N aij bj,et+1 βj(t+1)
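Both recursions vectorize neatly. A NumPy sketch under the same array conventions as the earlier snippets (names and layout are assumptions, not from the slides); note that P(e1:T) = alpha[-1].sum():

import numpy as np

def forward(pi, A, B, obs):
    """alpha[t, i] = P(e_1..e_t, x_t = s_i), by the forward recursion."""
    T, N = len(obs), len(pi)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(T - 1):
        # alpha_j(t+1) = b_{j,e_{t+1}} * sum_i alpha_i(t) a_ij
        alpha[t + 1] = B[:, obs[t + 1]] * (alpha[t] @ A)
    return alpha

def backward(A, B, obs):
    """beta[t, i] = P(e_{t+1}..e_T | x_t = s_i), computed right to left."""
    T, N = len(obs), A.shape[0]
    beta = np.ones((T, N))              # base case: beta_i(T) = 1
    for t in range(T - 2, -1, -1):
        # beta_i(t) = sum_j a_ij b_{j,e_{t+1}} beta_j(t+1)
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    return beta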

Page 18:

[Figure: trellis segment over times t-1, t, t+1, t+2. The pseudo count for the transition Xt = si → Xt+1 = sj combines the forward probability αi(t), the edge weight aij bj,et+1, and the backward probability βj(t+1).]

Page 19:

P(Xt = si | e1:T)

= P(Xt = si, e1:t, et+1:T) / P(e1:T)

= P(et+1:T | Xt = si, e1:t) P(Xt = si, e1:t) / P(e1:T)

= P(et+1:T | Xt = si) P(Xt = si, e1:t) / P(e1:T)

= αi(t) βi(t) / P(e1:T)
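Putting the pieces together, one Baum-Welch re-estimation step might look as follows. This is a sketch for a single observation sequence under the array conventions above; a real implementation would rescale α and β to avoid numerical underflow on long sequences.

import numpy as np

def baum_welch_step(pi, A, B, obs):
    """One re-estimation step: pseudo counts from alpha/beta, then new pi, A, B."""
    T, N = len(obs), len(pi)
    # Forward and backward tables (the recursions from the previous slides).
    alpha = np.zeros((T, N)); beta = np.ones((T, N))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(T - 1):
        alpha[t + 1] = B[:, obs[t + 1]] * (alpha[t] @ A)
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    like = alpha[-1].sum()                      # P(e_1:T)

    # gamma[t, i] = P(X_t = s_i | e_1:T) = alpha_i(t) beta_i(t) / P(e_1:T)
    gamma = alpha * beta / like
    # xi[t, i, j] = P(X_t = s_i, X_{t+1} = s_j | e_1:T)
    #             = alpha_i(t) a_ij b_{j,e_{t+1}} beta_j(t+1) / P(e_1:T)
    xi = (alpha[:-1, :, None] * A[None, :, :]
          * (B[:, obs[1:]].T * beta[1:])[:, None, :]) / like

    # Pseudo counts, then relative frequencies, as on the update slide.
    new_pi = gamma[0]
    new_A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    new_B = np.zeros_like(B)
    for t in range(T):                          # add gamma[t, i] to count(i, e_t)
        new_B[:, obs[t]] += gamma[t]
    new_B /= gamma.sum(axis=0)[:, None]
    return new_pi, new_A, new_B, like

The outer loop simply calls this step repeatedly and stops when the returned likelihood no longer increases.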

Page 20:
Page 21:

Speech Recognition

Page 22:

Phones

Page 23:

Speech Signal

[Figure: a speech signal shown as a waveform and as a spectrogram.]

Page 24:

Feature Extraction

[Figure: the waveform is split into successive frames; Frame 1 maps to feature vector X1, Frame 2 to feature vector X2, and so on.]
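A minimal sketch of the framing step (the frame and hop sizes are illustrative assumptions; the slides do not specify them):

import numpy as np

def frames(signal, frame_len=400, hop=160):
    """Split a waveform into overlapping frames, e.g. 25 ms windows every
    10 ms at 16 kHz. Each frame would then be mapped to a feature vector X_i
    (e.g. spectral features); the slides leave the method unspecified."""
    starts = range(0, len(signal) - frame_len + 1, hop)
    return np.stack([signal[s:s + frame_len] for s in starts])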

Page 25:

Speech System Architecture

[Figure: speech system architecture. Acoustic analysis converts the speech input into feature vectors x1 … xT. A global search maximizes P(x1 … xT | w1 … wk) · P(w1 … wk) over word sequences w1 … wk, combining the acoustic model P(x1 … xT | w1 … wk), built from the phoneme inventory and pronunciation lexicon, with the language model P(w1 … wk), and outputs the recognized word sequence.]
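In miniature, the global search is an argmax over candidate word sequences of the combined acoustic and language model scores. The scores below are made-up stand-ins purely to show the combination; real systems search a huge lattice rather than enumerating candidates:

import math

# Hypothetical (log acoustic score, log LM score) per candidate sequence.
candidates = {
    "I went to a party": (-250.0, math.log(1e-8)),
    "Eye went two a bar tea": (-248.0, math.log(1e-13)),
}
best = max(candidates, key=lambda w: sum(candidates[w]))
print(best)  # the LM term lets the likelier word sequence win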

Page 26:

HMM for Speech Recognition

[Figure: word model and observation sequence. A left-to-right HMM with states start0, n1, iy2, d3, end4; transitions a01, a12, a23, a34; self-loops a11, a22, a33; and a skip transition a24. The states emit the observation sequence o1 … o6 with emission probabilities bj(ot).]
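The topology in the figure can be written down directly as a transition matrix. The probabilities below are placeholders (real values would be trained, e.g. by Baum-Welch); only the zero/nonzero pattern comes from the figure:

import numpy as np

states = ["start0", "n1", "iy2", "d3", "end4"]
A = np.zeros((5, 5))
A[0, 1] = 1.0                               # a01
A[1, 1], A[1, 2] = 0.6, 0.4                 # a11, a12
A[2, 2], A[2, 3], A[2, 4] = 0.5, 0.4, 0.1   # a22, a23, a24 (skip)
A[3, 3], A[3, 4] = 0.7, 0.3                 # a33, a34
assert np.allclose(A[:4].sum(axis=1), 1.0)  # each non-final row sums to 1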

Page 27:

Language Modeling

Goal: determine which sequence of words is more likely:

• I went to a party.
• Eye went two a bar tea.

• Rudolph the Red Nose reigned here.
• Rudolph the Red knows rain, dear.
• Rudolph the red nose reindeer.
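A toy relative-frequency bigram model shows how such a decision could be made (the corpus and code are illustrative, not from the slides):

import math
from collections import Counter

def bigram_logprob(sentence, corpus):
    """Relative-frequency bigram model; no smoothing, unseen bigrams get -inf."""
    unigrams, bigrams = Counter(), Counter()
    for sent in corpus:
        words = ["<s>"] + sent.lower().split()
        unigrams.update(words[:-1])
        bigrams.update(zip(words[:-1], words[1:]))
    words = ["<s>"] + sentence.lower().split()
    logp = 0.0
    for prev, cur in zip(words[:-1], words[1:]):
        if bigrams[(prev, cur)] == 0:
            return float("-inf")
        logp += math.log(bigrams[(prev, cur)] / unigrams[prev])
    return logp

corpus = ["i went to a party", "we went to a party", "i went home"]
print(bigram_logprob("I went to a party", corpus))       # finite log-probability
print(bigram_logprob("Eye went two a bar tea", corpus))  # -inf: unseen bigrams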

Page 28:

Summary

HMM
• Filtering
• Decoding
• Learning

Speech Recognition
• Feature extraction from the signal
• HMM for speech recognition