Hidden Markov Models
Hidden Markov Model
In some Markov processes, we may not be able to observe the states directly.
Hidden Markov Model
An HMM is a quintuple (S, E, π, A, B), where
S : {s1 … sN} are the values for the hidden states
E : {e1 … eT} are the values for the observations
π is the probability distribution of the initial state, A is the transition probability matrix, and B is the emission probability matrix.
[Diagram: hidden state chain X1 → … → Xt-1 → Xt → Xt+1 → … → XT, with each state Xt emitting an observation et]
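To make the quintuple concrete, here is a minimal sketch (not from the slides) of how such parameters could be stored as NumPy arrays; the two-state example and all of its numbers are hypothetical.

```python
import numpy as np

# Hypothetical 2-state HMM: hidden states S = {s1, s2},
# observation symbols E indexed 0..M-1 (here M = 2).
pi = np.array([0.6, 0.4])            # initial state distribution
A  = np.array([[0.7, 0.3],           # A[i, j] = P(X_{t+1} = s_j | X_t = s_i)
               [0.4, 0.6]])
B  = np.array([[0.9, 0.1],           # B[i, k] = P(e_t = k | X_t = s_i)
               [0.2, 0.8]])

# Every row of A and B must be a probability distribution.
assert np.allclose(A.sum(axis=1), 1.0) and np.allclose(B.sum(axis=1), 1.0)
```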
Inferences with HMM
Filtering: P(xt|e1:t). Given an observation sequence, compute the probability of the last state.
Decoding: argmax_x1:t P(x1:t|e1:t). Given an observation sequence, compute the most likely hidden state sequence.
Learning: argmax_λ P_λ(e1:t), where λ = (π, A, B) are the parameters of the HMM. Given an observation sequence, find the transition probability and emission probability tables that assign the observations the highest probability. This is unsupervised learning.
Filtering
P(Xt+1|e1:t+1) = P(Xt+1|e1:t, et+1)
=P(et+1|Xt+1, e1:t) P(Xt+1|e1:t)/P(et+1|e1:t)
=P(et+1|Xt+1) P(Xt+1|e1:t)/P(et+1|e1:t)
P(Xt+1|e1:t) = Σ_xt P(Xt+1|xt, e1:t) P(xt|e1:t) = Σ_xt P(Xt+1|xt) P(xt|e1:t)
Same form. Use recursion
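As a sketch only (the function name and the explicit normalization step are mine, not from the slides), the recursion can be written in a few lines of NumPy; normalizing the updated belief at each step plays the role of dividing by P(et+1|e1:t).

```python
import numpy as np

def filter_hmm(pi, A, B, observations):
    """Return P(X_t | e_{1:t}) for each t, given
    pi: initial distribution (N,), A: transition matrix (N, N),
    B: emission matrix (N, M), observations: list of symbol indices."""
    belief = pi * B[:, observations[0]]
    belief /= belief.sum()                 # divide by P(e_1)
    history = [belief]
    for e in observations[1:]:
        predicted = A.T @ belief           # sum_x P(X_{t+1} | x) P(x | e_{1:t})
        belief = B[:, e] * predicted       # multiply by P(e_{t+1} | X_{t+1})
        belief /= belief.sum()             # divide by P(e_{t+1} | e_{1:t})
        history.append(belief)
    return np.array(history)
```

With the hypothetical parameters sketched earlier, filter_hmm(pi, A, B, [0, 0, 1]) returns the belief over the two hidden states after each of the three observations.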
Filtering Example
Viterbi Algorithm
Compute argmax_x1:t P(x1:t|e1:t).
Since P(x1:t|e1:t) = P(x1:t, e1:t)/P(e1:t), and P(e1:t) remains constant when we consider different x1:t,
argmax_x1:t P(x1:t|e1:t) = argmax_x1:t P(x1:t, e1:t).
Since the HMM is a Bayes net, P(x1:t, e1:t) = P(x0) Π_{i=1…t} P(xi|xi-1) P(ei|xi).
Equivalently, minimize –log P(x1:t, e1:t) = –log P(x0) + Σ_{i=1…t} (–log P(xi|xi-1) – log P(ei|xi)).
Viterbi Algorithm
Given an HMM (S, E, π, A, B) and observations e1:t, construct a graph with 1 + tN nodes: one initial node, and N nodes at each time i, where the jth node at time i represents Xi = sj. The link between the nodes Xi-1 = sj and Xi = sk is associated with the length
–log [P(Xi = sk | Xi-1 = sj) P(ei | Xi = sk)]
The problem of finding argmax_x1:t P(x1:t|e1:t) becomes that of finding the shortest path from the initial node x0 to one of the nodes at time t.
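Below is a minimal sketch of this shortest-path computation (the naming and array layout are my own; it folds the initial node into the cost of the time-1 nodes rather than building the graph explicitly).

```python
import numpy as np

def viterbi(pi, A, B, observations):
    """Most likely state sequence argmax P(x_{1:t} | e_{1:t})."""
    N, T = A.shape[0], len(observations)
    # Work with -log probabilities so that "most probable" = "shortest path".
    logpi, logA, logB = np.log(pi), np.log(A), np.log(B)
    cost = np.full((T, N), np.inf)      # shortest path length reaching X_t = s_k
    back = np.zeros((T, N), dtype=int)  # predecessor state on that path
    cost[0] = -(logpi + logB[:, observations[0]])
    for t in range(1, T):
        for k in range(N):
            # edge length -log P(X_t = s_k | X_{t-1} = s_j) P(e_t | X_t = s_k)
            step = cost[t - 1] - logA[:, k] - logB[k, observations[t]]
            back[t, k] = np.argmin(step)
            cost[t, k] = step[back[t, k]]
    # Trace back from the cheapest node at the final time step.
    path = [int(np.argmin(cost[-1]))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```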
Example
Baum-Welch Algorithm
The previous two kinds of computation need the parameters λ = (π, A, B). Where do the probabilities come from? Relative frequency? But the states are not observable!
Solution: the Baum-Welch algorithm, unsupervised learning from observations. Find argmax_λ P_λ(e1:t).
Baum-Welch Algorithm
Start with an initial set of parameters λ0 (possibly arbitrary).
Compute pseudo counts: how many times did the transition from Xi-1 = sj to Xi = sk occur?
Use the pseudo counts to obtain another (better) set of parameters λ1.
Iterate until P_λi+1(e1:t) is no bigger than P_λi(e1:t).
A special case of EM (Expectation-Maximization)
Pseudo Counts
Given the observation sequence e1:T, the pseudo count of the link from Xt = si to Xt+1 = sj is the probability
P(Xt = si, Xt+1 = sj | e1:T)
Update HMM Parameters
Add P(Xt=si,Xt+1=sj|e1:T) to count(i,j)
Add P(Xt=si|e1:T) to count(i)
Add P(Xt=si|e1:T) to count(i,et)
Updated aij = count(i, j) / count(i)
Updated bj(et) = count(j, et) / count(j)
P(Xt=si, Xt+1=sj | e1:T)
= P(Xt=si, Xt+1=sj, e1:t, et+1, et+2:T) / P(e1:T)
= P(Xt=si, e1:t) P(Xt+1=sj | Xt=si) P(et+1 | Xt+1=sj) P(et+2:T | Xt+1=sj) / P(e1:T)
= P(Xt=si, e1:t) aij bj(et+1) P(et+2:T | Xt+1=sj) / P(e1:T)
= αi(t) aij bj(et+1) βj(t+1) / P(e1:T)
Forward Probability
αi(t) = P(e1:t, Xt = si)
αj(t+1) = P(e1:t+1, Xt+1 = sj)
= Σ_{i=1…N} P(e1:t, Xt = si) P(Xt+1 = sj | Xt = si) P(et+1 | Xt+1 = sj)
= Σ_{i=1…N} αi(t) aij bj(et+1)
Backward Probability
βi(t) = P(et+1:T | Xt = si)
βi(T) = 1
βi(t) = Σ_{j=1…N} aij bj(et+1) βj(t+1)
[Diagram: trellis segment from t-1 to t+2 showing αi(t) accumulated at node Xt = si, the arc aij bj(et+1) leading to node Xt+1 = sj, and βj(t+1) covering the remaining observations]
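A sketch of both recursions, assuming the array conventions used in the earlier sketches (these are the unnormalized α and β, so a real implementation would rescale each step to avoid underflow on long sequences).

```python
import numpy as np

def forward_backward(pi, A, B, obs):
    """Unnormalized alpha_i(t) = P(e_{1:t}, X_t = s_i) and
    beta_i(t) = P(e_{t+1:T} | X_t = s_i)."""
    N, T = A.shape[0], len(obs)
    alpha = np.zeros((T, N))
    beta = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        # alpha_j(t) = sum_i alpha_i(t-1) a_ij * b_j(e_t)
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    beta[T - 1] = 1.0
    for t in range(T - 2, -1, -1):
        # beta_i(t) = sum_j a_ij b_j(e_{t+1}) beta_j(t+1)
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    return alpha, beta
```

The sequence likelihood P(e1:T) is then alpha[-1].sum().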
P(Xt=si|e1:T)
=P(Xt=si, e1:t, et+1:T)/P(e1:T)
=P(et+1:T| Xt=si, e1:t)P(Xt=si, e1:t)/P(e1:T)
= P(et+1:T| Xt=si)P(Xt=si|e1:t)P(e1:t)/P(e1:T)
= αi(t) βi(t) / P(e1:T)
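Putting the pieces together, here is a sketch of one Baum-Welch re-estimation step that consumes the α and β tables from the previous sketch; the names gamma and xi for the two kinds of pseudo counts are mine.

```python
import numpy as np

def baum_welch_step(A, B, obs, alpha, beta):
    """One EM step: pseudo counts from (alpha, beta), then re-estimated parameters."""
    T = len(obs)
    likelihood = alpha[-1].sum()                       # P(e_{1:T})
    # gamma[t, i] = P(X_t = s_i | e_{1:T}) = alpha_i(t) beta_i(t) / P(e_{1:T})
    gamma = alpha * beta / likelihood
    # xi[t, i, j] = P(X_t = s_i, X_{t+1} = s_j | e_{1:T})
    #             = alpha_i(t) a_ij b_j(e_{t+1}) beta_j(t+1) / P(e_{1:T})
    xi = (alpha[:-1, :, None] * A[None, :, :]
          * (B[:, obs[1:]].T * beta[1:])[:, None, :]) / likelihood
    # Re-estimate: a_ij = count(i, j) / count(i), b_j(e) = count(j, e) / count(j).
    new_pi = gamma[0]
    new_A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    new_B = np.zeros_like(B)
    for t in range(T):
        new_B[:, obs[t]] += gamma[t]
    new_B /= gamma.sum(axis=0)[:, None]
    return new_pi, new_A, new_B
```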
Speech Recognition
Phones
Speech Signal
Waveform
Spectrogram
Feature Extraction
[Diagram: the speech signal is divided into frames; Frame 1 and Frame 2 are mapped to feature vectors X1 and X2]
Speech System Architecture
[Diagram: speech input → acoustic analysis → feature vectors x1 … xT → global search that maximizes P(x1 … xT | w1 … wk) · P(w1 … wk) → recognized word sequence; the acoustic model P(x1 … xT | w1 … wk) draws on the phoneme inventory and pronunciation lexicon, and the language model supplies P(w1 … wk)]
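As a toy sketch of the global search box above (the candidate list and the two scoring functions are placeholders, not any real recognizer's API), the decision rule just adds the acoustic and language-model log probabilities and takes the argmax.

```python
import math

def recognize(candidates, acoustic_logprob, language_logprob):
    """Return argmax over word sequences of P(x1..xT | w1..wk) * P(w1..wk),
    with both factors supplied as log-probability functions."""
    best, best_score = None, -math.inf
    for words in candidates:
        score = acoustic_logprob(words) + language_logprob(words)
        if score > best_score:
            best, best_score = words, score
    return best
```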
HMM for Speech Recognition
[Diagram: word model with phone states start0 → n1 → iy2 → d3 → end4, transition probabilities a01, a12, a23, a34, self-loops a11, a22, a33, and a skip arc a24; below it, an observation sequence o1 o2 o3 o4 o5 o6 is aligned to the states with emission probabilities bj(ot)]
Language Modeling
Goal: determine which sequence of words is more likely: "I went to a party" or "Eye went two a bar tea".
• Rudolph the Red Nose reigned here.
• Rudolph the Red knows rain, dear.
• Rudolph the red nose reindeer.
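As a minimal sketch of how a language model separates such acoustically similar strings, here is a bigram scorer; every probability in the table is invented purely for illustration.

```python
import math

# Hypothetical bigram probabilities P(w_i | w_{i-1}); unseen pairs fall back to a small floor.
bigram = {("i", "went"): 0.1, ("went", "to"): 0.4, ("to", "a"): 0.3, ("a", "party"): 0.05,
          ("eye", "went"): 1e-6, ("went", "two"): 1e-5, ("two", "a"): 1e-5,
          ("a", "bar"): 0.01, ("bar", "tea"): 1e-6}

def sentence_logprob(sentence, floor=1e-8):
    words = sentence.lower().split()
    return sum(math.log(bigram.get(pair, floor)) for pair in zip(words, words[1:]))

# "I went to a party" scores far higher than the homophone string.
print(sentence_logprob("I went to a party"))
print(sentence_logprob("Eye went two a bar tea"))
```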
Summary
HMM: filtering, decoding, learning
Speech recognition: feature extraction from the signal; HMMs for speech recognition