Foundations of Statistical NLP
Chapter 9. Markov Models
한 기 덕
Contents
Introduction
Markov Models
Hidden Markov Models
– Why use HMMs
– General form of an HMM
– The Three Fundamental Questions for HMMs
Implementation, Properties, and Variants
Introduction
Markov Model
– Markov processes/chains/models were first developed by Andrei A. Markov
– First used for a linguistic purpose: modeling the letter sequences in Russian literature (1913)
– Currently used as a general statistical tool
VMM (Visible Markov Model)
– words in sentences depend on their syntax
HMM (Hidden Markov Model)
– operates at a higher level of abstraction by postulating additional "hidden" structure
Markov Models
Markov assumption
– future elements of the sequence are independent of past elements, given the present element
Limited Horizon
– for a sequence of random variables X = (X_1, ..., X_T) over the state space S = {s_1, ..., s_N}:
  P(X_{t+1} = s_k | X_1, ..., X_t) = P(X_{t+1} = s_k | X_t)
Time invariant (stationary)
– the transition probabilities do not depend on the position t:
  P(X_{t+1} = s_k | X_t) = P(X_2 = s_k | X_1)
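To make the two properties concrete, here is a minimal Python sketch (not from the slides; the states and probabilities are invented for illustration) of sampling from a first-order, time-invariant Markov chain:

```python
import random

# Hypothetical two-state chain; all names and numbers are made up.
states = ["s1", "s2"]
pi = {"s1": 0.5, "s2": 0.5}              # initial state probabilities
A = {"s1": {"s1": 0.6, "s2": 0.4},       # A[i][j] = P(X_{t+1}=j | X_t=i);
     "s2": {"s1": 0.3, "s2": 0.7}}       # each row sums to 1 (stochastic matrix)

def sample_chain(T):
    """Sample X_1..X_T; each step looks only at the current state
    (Limited Horizon) and uses the same A at every t (time invariant)."""
    x = random.choices(states, weights=[pi[s] for s in states])[0]
    seq = [x]
    for _ in range(T - 1):
        x = random.choices(states, weights=[A[x][s] for s in states])[0]
        seq.append(x)
    return seq

print(sample_chain(10))
```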
Markov Models (Cont')
Notation
– stochastic transition matrix A: a_ij = P(X_{t+1} = s_j | X_t = s_i), with a_ij ≥ 0 and Σ_j a_ij = 1
– probabilities of different initial states: π_i = P(X_1 = s_i)
Applications: linear sequences of events
– modeling valid phone sequences in speech recognition
– modeling sequences of speech acts in dialog systems
Markov Chain
– circles: states, labeled with state names
– arrows connecting states: possible transitions
– arc labels: the probability of each transition
Visible Markov Model
We know what states the machine is passing through.
mth order Markov model
– for n ≥ 3, an n-gram model violates the Limited Horizon condition
– but any n-gram model can be reformulated as a visible Markov model by simply encoding the (n-1)-gram history in the state (see the sketch below)
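A small sketch of that encoding (my own illustration, with made-up probabilities): a trigram model over words becomes a first-order chain whose states are the (n-1)-gram, here bigram, histories.

```python
# Hypothetical trigram probabilities P(w | u, v); numbers are made up.
trigram = {("a", "b"): {"c": 0.9, "a": 0.1},
           ("b", "c"): {"a": 1.0},
           ("c", "a"): {"b": 1.0}}

# Encode each bigram history (u, v) as a single state. A transition
# (u, v) -> (v, w) with probability P(w | u, v) then depends only on the
# current state, so the Limited Horizon condition holds again.
transitions = {(uv, (uv[1], w)): p
               for uv, nexts in trigram.items()
               for w, p in nexts.items()}

for (src, dst), p in sorted(transitions.items()):
    print(f"{src} -> {dst}: {p}")
```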
Hidden Markov Model
We don't know the state sequence that the model passes through, only some probabilistic function of it.
Example 1: the crazy soft drink machine
– two states: cola preferring (CP), iced tea preferring (IP)
– as a VMM, the machine would always put out a cola when in state CP
– as an HMM, each state instead has emission probabilities
– the output probabilities are conditioned on the state the machine is in (the "from" state)
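For concreteness, the machine's parameters written out in Python. The numbers below are the values of the textbook example as I recall them; treat them as assumptions rather than a verified transcription.

```python
# Transition probabilities between the two preference states
# (assumed values for the textbook's crazy soft drink machine).
A = {"CP": {"CP": 0.7, "IP": 0.3},
     "IP": {"CP": 0.5, "IP": 0.5}}

# Emission probabilities for each state over the three outputs.
B = {"CP": {"cola": 0.6, "ice_t": 0.1, "lem": 0.3},
     "IP": {"cola": 0.1, "ice_t": 0.7, "lem": 0.2}}

pi = {"CP": 1.0, "IP": 0.0}  # the machine always starts in CP
```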
Crazy soft drink machine
Problem
– What is the probability of seeing the output sequence {lem, ice_t} if the machine always starts off in the cola preferring state?
Crazy soft drink machine (Cont')
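The computation on this slide survived only as an image; the following reconstruction uses the parameters assumed above, with the state-emission convention (emit from the current state, then transition):

P(lem, ice_t | start in CP)
= b_CP(lem) · [a_CP,CP · b_CP(ice_t) + a_CP,IP · b_IP(ice_t)]
= 0.3 · (0.7 · 0.1 + 0.3 · 0.7)
= 0.3 · 0.28 = 0.084

The first factor is the probability of emitting lem from CP; the bracketed sum runs over the two states the machine can move to before emitting ice_t.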
Why use HMMs?
Underlying events probabilistically generate surface events
– e.g., the words in a text are generated from underlying parts of speech
Linear interpolation of n-grams can be cast as an HMM (see the formula below)
Hidden state
– the choice of whether to use the unigram, bigram, or trigram probabilities
Two keys
– the conversion works by adding epsilon transitions, which emit nothing
– there are separate interpolation parameters for each history; they are tied rather than adjusted separately
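For reference, the model being encoded (the standard linear interpolation formula; the slide's own equation survived only as an image):

P_li(w_n | w_{n-2}, w_{n-1}) = λ_1 P_1(w_n) + λ_2 P_2(w_n | w_{n-1}) + λ_3 P_3(w_n | w_{n-1}, w_{n-2}),   where λ_1 + λ_2 + λ_3 = 1

Each hidden state of the encoding HMM corresponds to the choice of one of the three terms.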
Notation
– S = {s_1, ..., s_N}: set of states
– K = {k_1, ..., k_M}: output alphabet
– Π = {π_i}: initial state probabilities
– A = {a_ij}: state transition probabilities
– B = {b_ijk}: symbol emission probabilities
– X = (X_1, ..., X_{T+1}): state sequence
– O = (o_1, ..., o_T): output sequence
General form of an HMM
Arc-emission HMM
– the symbol emitted at time t depends on both the state at time t and the state at time t+1
State-emission HMM (e.g., the crazy drink machine)
– the symbol emitted at time t depends just on the state at time t
Figure 9.4: a program for a Markov process (see the sketch below).
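The figure referenced above gives a program for generating output from the model; here is a Python rendering in the arc-emission form (my sketch, using the notation above, not the book's own code):

```python
import random

def generate(pi, A, B, T):
    """Generate T symbols from an arc-emission HMM.
    pi[i]: initial probabilities; A[i][j]: transition probabilities;
    B[i][j][k]: probability of emitting k while moving from i to j."""
    states = list(pi)
    x = random.choices(states, weights=[pi[s] for s in states])[0]
    out = []
    for _ in range(T):
        # Choose the next state, then emit a symbol conditioned on the arc.
        nxt = random.choices(states, weights=[A[x][s] for s in states])[0]
        syms = list(B[x][nxt])
        out.append(random.choices(syms, weights=[B[x][nxt][k] for k in syms])[0])
        x = nxt
    return out
```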
The Three Fundamental Questions for HMMs
1. Given a model μ = (A, B, Π), how do we efficiently compute P(O | μ), the probability of an observation sequence?
2. Given the observation sequence O and a model μ, how do we choose a state sequence (X_1, ..., X_{T+1}) that best explains the observations?
3. Given an observation sequence O and a space of possible models, how do we find the model μ that best explains the observed data?
Finding the probability of an observation
– decompose over state sequences: P(O | μ) = Σ_X P(O | X, μ) P(X | μ)
– direct evaluation requires on the order of (2T + 1) · N^(T+1) multiplications: far too expensive
– dynamic programming over the trellis yields efficient forward and backward procedures
The forward procedure
Forward variables
– α_i(t) = P(o_1 ... o_{t-1}, X_t = i | μ), computed by dynamic programming over the trellis:
  α_i(1) = π_i,   α_j(t+1) = Σ_i α_i(t) a_ij b_ijo_t,   P(O | μ) = Σ_i α_i(T+1)
A cheap algorithm: it requires only 2N²T multiplications.
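A runnable sketch of the forward procedure in its state-emission variant (the symbol depends only on the current state), which matches the drink machine; the parameters are the assumed values from earlier:

```python
def forward(pi, A, B, obs):
    """State-emission forward procedure: returns P(obs | model)."""
    states = list(pi)
    # alpha[i] = P(o_1..o_t, X_t = i); start with the first observation.
    alpha = {s: pi[s] * B[s][obs[0]] for s in states}
    for o in obs[1:]:
        alpha = {j: sum(alpha[i] * A[i][j] for i in states) * B[j][o]
                 for j in states}
    return sum(alpha.values())

A = {"CP": {"CP": 0.7, "IP": 0.3}, "IP": {"CP": 0.5, "IP": 0.5}}
B = {"CP": {"cola": 0.6, "ice_t": 0.1, "lem": 0.3},
     "IP": {"cola": 0.1, "ice_t": 0.7, "lem": 0.2}}
pi = {"CP": 1.0, "IP": 0.0}
print(forward(pi, A, B, ["lem", "ice_t"]))  # ~0.084 under these assumptions
```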
The backward procedure
Backward variables
– β_i(t) = P(o_t ... o_T | X_t = i, μ): the total probability of seeing the rest of the observation sequence from state i at time t
– β_i(T+1) = 1,   β_i(t) = Σ_j a_ij b_ijo_t β_j(t+1)
Combining forward and backward
– P(O | μ) = Σ_i α_i(t) β_i(t), for any t
– this combination is vital for solving the third problem, parameter reestimation
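And the matching backward sketch, again in the state-emission variant; folding in the start distribution and the first emission reproduces the same total probability as the forward pass:

```python
def backward(pi, A, B, obs):
    """State-emission backward procedure: returns P(obs | model)."""
    states = list(pi)
    beta = {s: 1.0 for s in states}          # beta_i(T) = 1
    for o in reversed(obs[1:]):
        # beta_i(t) = sum_j A[i][j] * B[j](o_{t+1}) * beta_j(t+1)
        beta = {i: sum(A[i][j] * B[j][o] * beta[j] for j in states)
                for i in states}
    # Fold in the start distribution and the first emission.
    return sum(pi[s] * B[s][obs[0]] * beta[s] for s in states)
```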
Finding the best state sequence
There is more than one way to define the state sequence that best explains the observations.
– one choice: for each t, find the individually most likely state, the X_t that maximizes P(X_t | O, μ)
– this may yield a quite unlikely, or even impossible, overall state sequence
– the Viterbi algorithm, which maximizes over whole paths, is the more common and efficient choice
Viterbi algorithm
– find the most likely complete path: argmax_X P(X | O, μ)
– it is sufficient to maximize P(X, O | μ) for the fixed O
Definition
– δ_j(t) = max over X_1 ... X_{t-1} of P(o_1 ... o_{t-1}, X_1 ... X_{t-1}, X_t = j | μ)
– δ_j(t+1) = max_i δ_i(t) a_ij b_ijo_t, storing backpointers ψ_j(t+1) to recover the best path (see the sketch below)
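A compact Viterbi sketch (again in the state-emission variant, with backpointers), assuming the same pi/A/B dictionaries as in the earlier examples:

```python
def viterbi(pi, A, B, obs):
    """Return the most likely state path for obs (state-emission HMM)."""
    states = list(pi)
    delta = {s: pi[s] * B[s][obs[0]] for s in states}  # best path prob so far
    back = []                                          # backpointers psi
    for o in obs[1:]:
        psi = {j: max(states, key=lambda i: delta[i] * A[i][j]) for j in states}
        delta = {j: delta[psi[j]] * A[psi[j]][j] * B[j][o] for j in states}
        back.append(psi)
    # Trace the backpointers from the best final state.
    path = [max(states, key=lambda s: delta[s])]
    for psi in reversed(back):
        path.append(psi[path[-1]])
    return list(reversed(path))

# e.g., viterbi(pi, A, B, ["lem", "ice_t", "cola"]) for the slide's sequence
```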
Variable calculations for O = (lem, ice_t, cola) (table of trellis values shown in the original slides)
Parameter estimation
Given an observation sequence (the training data), find the values of the model parameters μ = (A, B, Π) using Maximum Likelihood Estimation.
There is no analytic solution; instead, an iterative hill-climbing algorithm (forward-backward, i.e., Baum-Welch) locally maximizes P(O | μ), and is usually effective for HMMs.
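The reestimation step itself (the two continuation slides survived only as images; these are the standard forward-backward updates in the arc-emission notation used above):

p_t(i, j) = P(X_t = i, X_{t+1} = j | O, μ) = α_i(t) a_ij b_ijo_t β_j(t+1) / Σ_m α_m(t) β_m(t)
γ_i(t) = Σ_j p_t(i, j)
π̂_i = γ_i(1)
â_ij = Σ_{t=1..T} p_t(i, j) / Σ_{t=1..T} γ_i(t)
b̂_ijk = Σ_{t : o_t = k} p_t(i, j) / Σ_{t=1..T} p_t(i, j)

Each iteration replaces μ = (A, B, Π) with the reestimated μ̂ and does not decrease P(O | μ).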
Implementation, Properties, Variants
Implementation
– the obvious issue: repeatedly multiplying very small numbers underflows, so work with log probabilities instead (see the sketch after this list)
Variants
– the large number of parameters to estimate can be reduced, e.g., by tying parameters or adding epsilon transitions
– training can use multiple input observation sequences
Initialization of parameter values
– since hill climbing only finds a local maximum, try to initialize the parameters near the global maximum
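A minimal sketch (my own illustration) of the log-probability trick: the product of many small per-step probabilities underflows, while the sum of their logs stays finite:

```python
import math

probs = [1e-5] * 200      # e.g., per-step transition * emission probabilities

product = 1.0
for p in probs:
    product *= p          # underflows to 0.0: 1e-1000 is below the smallest float

log_total = sum(math.log(p) for p in probs)   # sum of logs stays finite

print(product)            # 0.0
print(log_total)          # about -2302.6, i.e., log(1e-1000)
```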