Machine Learning CSE 6363 (Fall 2016)
Lecture 20 Conditional Random Field
Heng Huang, Ph.D.
Department of Computer Science and Engineering
Fall 2016 Heng Huang Machine Learning 2
Generative Models
• Hidden Markov models (HMMs) and stochastic grammars
– Assign a joint probability to paired observation and label sequences
– The parameters are typically trained to maximize the joint likelihood of the
training examples
Conditional Models
• Conditional probability P(label sequence y | observation
sequence x) rather than joint probability P(y, x)
– Specify the probability of possible label sequences given an
observation sequence
• Allow arbitrary, non-independent features on the observation
sequence X
• The probability of a transition between labels may depend on past
and future observations
– Relax strong independence assumptions in generative models
Implications of the model
• Does this do what we want?
• Q: Does Y[i-1] depend on X[i+1]?
– “A node is conditionally independent of its non-descendants given its parents”
Discriminative Models
Maximum Entropy Markov Models (MEMMs)
• Exponential model
• Given training set X with label sequence Y:
– Train a model θ that maximizes P(Y|X, θ)
– For a new data sequence x, the predicted label y maximizes P(y|x, θ)
– Notice the per-state normalization
$$L(\Theta_y) = \sum_i \log p_{\Theta_y}(y_i \mid x_i)$$
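A minimal sketch of what per-state normalization means in practice. The score tables and function names below are hypothetical and chosen only for illustration; they are not from the lecture. Each state's outgoing scores are exponentiated and normalized on their own, so every state's outgoing distribution sums to 1 no matter how well the observation fits.

```python
import math

def local_softmax(scores):
    """Normalize one state's outgoing scores into a probability distribution."""
    m = max(scores.values())
    exps = {nxt: math.exp(s - m) for nxt, s in scores.items()}
    total = sum(exps.values())
    return {nxt: e / total for nxt, e in exps.items()}

def memm_log_likelihood(path, scores_per_step):
    """Sum of log p(y_i | y_{i-1}, x_i) along a label path.

    scores_per_step[t][prev] maps each candidate next label to a
    hypothetical linear score; the numbers are made up.
    """
    ll = 0.0
    for t in range(1, len(path)):
        probs = local_softmax(scores_per_step[t - 1][path[t - 1]])
        ll += math.log(probs[path[t]])
    return ll

# Hypothetical scores for two labels and two transitions.
scores = [
    {"A": {"A": 1.0, "B": 0.0}, "B": {"A": 0.5, "B": 0.5}},
    {"A": {"A": 0.2, "B": 1.2}, "B": {"A": 0.0, "B": 2.0}},
]
row = local_softmax(scores[0]["A"])       # one locally normalized distribution
ll = memm_log_likelihood(["A", "A", "B"], scores)
print(row, ll)
```

Because the normalization happens inside `local_softmax`, a state with few successors hands almost all of its probability mass to them; this is the property the label bias slides examine.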
MEMM-Supervised Learning
MEMM-Linear Regression
MEMM-Linear Regression
MEMM-Logistic Regression
MEMM-Logistic Regression/Classification
MEMM-Logistic Regression/Learning
MEMM-Maximum Entropy Model
HMMs and MEMMs
Maximum Entropy Markov Models (MEMMs)
HMM vs MEMM
Maximum Entropy Markov Model (MEMM)
• Models dependence between each state and the full observation sequence
explicitly
– More expressive than HMMs
• Discriminative model
– Completely ignores modeling P(X): saves modeling effort
– Learning objective function consistent with predictive function: P(Y|X)
[Figure: MEMM graph, a label chain Y1, Y2, ..., Yn in which each label also depends on the observation sequence X1:n]
Viterbi in MEMMs
MEMM: Label Bias Problem
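[Figure: trellis of states 1 to 5 across observations 1 to 4, labeled with local transition probabilities. State 1 has two outgoing transitions per step, to state 1 and state 2 respectively: 0.4 and 0.6, then 0.45 and 0.55, then 0.5 and 0.5. State 2 has five outgoing transitions per step: 0.2, 0.2, 0.2, 0.2, 0.2; then 0.2, 0.3, 0.1, 0.1, 0.3; then 0.1, 0.3, 0.2, 0.2, 0.2.]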
What the local transition probabilities say:
• State 1 almost always prefers to go to state 2
• State 2 almost always prefers to stay in state 2
Probability of path 1-> 1-> 1-> 1:
• 0.4 x 0.45 x 0.5 = 0.09
Probability of path 2->2->2->2:
• 0.2 x 0.3 x 0.3 = 0.018
Other paths:
• 1->1->1->1: 0.09
Probability of path 1->2->1->2:
• 0.6 x 0.2 x 0.5 = 0.06
Other paths:
• 1->1->1->1: 0.09
• 2->2->2->2: 0.018
Probability of path 1->1->2->2:
• 0.4 x 0.55 x 0.3 = 0.066
Other paths:
• 1->1->1->1: 0.09
• 2->2->2->2: 0.018
• 1->2->1->2: 0.06
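The four path probabilities worked out so far can be checked with a short script. This is only a sketch: the table holds just the transition probabilities that appear in the products above, keyed by (step, from-state, to-state), and the names `P` and `path_probability` are illustrative.

```python
import math

# Local transition probabilities used in the path computations above.
# Key: (step, from_state, to_state), with steps numbered 0, 1, 2.
P = {
    (0, 1, 1): 0.4,  (0, 1, 2): 0.6,  (0, 2, 2): 0.2,
    (1, 1, 1): 0.45, (1, 1, 2): 0.55, (1, 2, 1): 0.2, (1, 2, 2): 0.3,
    (2, 1, 1): 0.5,  (2, 1, 2): 0.5,  (2, 2, 2): 0.3,
}

def path_probability(path):
    """Multiply the locally normalized transition probabilities along a path."""
    return math.prod(P[(t, a, b)] for t, (a, b) in enumerate(zip(path, path[1:])))

paths = [(1, 1, 1, 1), (2, 2, 2, 2), (1, 2, 1, 2), (1, 1, 2, 2)]
probs = {p: path_probability(p) for p in paths}
best = max(probs, key=probs.get)

for p, pr in probs.items():
    print("->".join(map(str, p)), round(pr, 3))
print("highest of these four:", best)
```

Among these four paths, the one that stays in state 1 comes out highest, matching the slide arithmetic.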
Most likely path: 1->1->1->1
• This holds even though, locally, state 1 seems to prefer moving to state 2 and state 2 seems to prefer staying in state 2.
• Why?
Most likely path: 1->1->1->1
• State 1 has only two outgoing transitions, but state 2 has five
• Because each state's outgoing probabilities must sum to 1, the average transition probability out of state 2 is lower
Label bias problem in MEMM:
• States with fewer outgoing transitions are preferred over others
Solution: Do not normalize probabilities locally
From local probabilities ….
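[Figure: the same trellis of states 1 to 5 across observations 1 to 4, now labeled with local potentials instead of probabilities. Out of state 1: 20 and 30, then 30 and 20, then 5 and 5. Out of state 2: 10, 20, 10, 20, 20; then 20, 30, 10, 10, 30; then 10, 30, 20, 20, 20.]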
From local probabilities to local potentials
• States with fewer transitions do not have an unfair advantage!
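To see the effect of dropping local normalization, the same four paths can be rescored with the local potentials. This sketch rests on an assumption: the slide text does not say which edge each potential belongs to, so the table below assigns them in the same order as the probabilities in the earlier trellis. The names `PHI` and `path_score` are illustrative.

```python
import math

# Local potentials (unnormalized scores), keyed by (step, from_state, to_state).
# ASSUMPTION: edge assignment follows the layout of the probability trellis.
PHI = {
    (0, 1, 1): 20, (0, 1, 2): 30, (0, 2, 2): 20,
    (1, 1, 1): 30, (1, 1, 2): 20, (1, 2, 1): 20, (1, 2, 2): 30,
    (2, 1, 1): 5,  (2, 1, 2): 5,  (2, 2, 2): 30,
}

def path_score(path):
    """Product of local potentials along a path (not a probability)."""
    return math.prod(PHI[(t, a, b)] for t, (a, b) in enumerate(zip(path, path[1:])))

paths = [(1, 1, 1, 1), (2, 2, 2, 2), (1, 2, 1, 2), (1, 1, 2, 2)]
scores = {p: path_score(p) for p in paths}
best = max(scores, key=scores.get)

for p, s in scores.items():
    print("->".join(map(str, p)), s)
print("highest of these four:", best)
```

Under this reading of the figure, the path that stays in state 2 now scores highest among the four. The scores are unnormalized; turning them into probabilities requires dividing by a single global normalizer over all paths.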
From MEMM ….
[Figure: MEMM graph, a label chain Y1, Y2, ..., Yn in which each label also depends on the observation sequence X1:n]
From MEMM to CRF
• CRF is a partially directed model
– Discriminative model like MEMM
– Use of the global normalizer Z(x) overcomes the label bias problem of
MEMM
– Models the dependence between each state and the entire observation
sequence (like MEMM)
[Figure: CRF graph, a label chain Y1, Y2, ..., Yn connected to the observation sequence x1:n]
Conditional Random Fields
• General parametric form:
[Figure: CRF graph, a label chain Y1, Y2, ..., Yn connected to the observation sequence x1:n]
Ramesh Nallapati
CRFs: Inference
• Given CRF parameters λ and μ, find the y* that maximizes P(y|x)
– Can ignore Z(x) because it is not a function of y
• Run the max-product algorithm on the junction-tree of CRF:
[Figure: the chain Y1, Y2, ..., Yn over x1:n and its junction tree, with cliques (Y1,Y2), (Y2,Y3), ..., (Yn-2,Yn-1), (Yn-1,Yn) and separators Y2, Y3, ..., Yn-2, Yn-1]
Same as Viterbi decoding used
in HMMs!
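On a chain, max-product reduces to the familiar Viterbi recursion. The sketch below assumes the model is given as tables of positive node and edge potentials; the argument layout, the function name `viterbi`, and the toy numbers are assumptions made for illustration. Z(x) never appears because it does not affect the argmax.

```python
def viterbi(n_labels, node_pot, edge_pot):
    """Max-product decoding on a chain.

    node_pot[i][y]    : potential of label y at position i
    edge_pot[i][a][b] : potential of label a at position i followed by
                        label b at position i + 1
    Returns the highest-scoring label sequence and its unnormalized score.
    """
    best = list(node_pot[0])   # best score of any partial path ending in each label
    back = []                  # back-pointers, one list per position after the first
    for i in range(1, len(node_pot)):
        new_best, ptr = [], []
        for b in range(n_labels):
            cand = [best[a] * edge_pot[i - 1][a][b] for a in range(n_labels)]
            a_star = max(range(n_labels), key=cand.__getitem__)
            new_best.append(cand[a_star] * node_pot[i][b])
            ptr.append(a_star)
        best = new_best
        back.append(ptr)
    # Trace the best final label back to the start.
    y = max(range(n_labels), key=best.__getitem__)
    score = best[y]
    path = [y]
    for ptr in reversed(back):
        y = ptr[y]
        path.append(y)
    return path[::-1], score

# Toy example with hypothetical potentials: 2 labels, 3 positions.
node = [[1.0, 2.0], [1.0, 1.0], [3.0, 1.0]]
edge = [[[3.0, 1.0], [1.0, 2.0]], [[3.0, 1.0], [1.0, 2.0]]]
path, score = viterbi(2, node, edge)
print(path, score)
```

For long sequences, a practical implementation would work with log-potentials and sums instead of products to avoid overflow and underflow.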
Recall: Random Field
Conditional Random Fields (CRFs)
• CRFs have all the advantages of MEMMs without the label bias problem
– An MEMM uses a per-state exponential model for the conditional
probabilities of next states given the current state
– A CRF has a single exponential model for the joint probability of the
entire sequence of labels given the observation sequence
• Undirected acyclic graph
• Allows some transitions to “vote” more strongly than others,
depending on the corresponding observations
Definition of CRFs
X is a random variable over data sequences to be labeled
Y is a random variable over corresponding label sequences
Example of CRFs
Graphical comparison among
HMMs, MEMMs and CRFs
HMM MEMM CRF
Functional Models
Conditional Distribution
If the graph G = (V, E) of Y is a tree, the conditional distribution over the
label sequence Y = y, given X = x, is, by the fundamental theorem of random
fields:
$$p(y \mid x) \propto \exp\Big( \sum_{e \in E,\, k} \lambda_k f_k(e, y|_e, x) + \sum_{v \in V,\, k} \mu_k g_k(v, y|_v, x) \Big)$$
x is a data sequence
y is a label sequence
v is a vertex from vertex set V = set of label random variables
e is an edge from edge set E over V
fk and gk are given and fixed. gk is a Boolean vertex feature; fk is a
Boolean edge feature
k is the feature index
$\theta = (\lambda_1, \lambda_2, \ldots, \lambda_n; \mu_1, \mu_2, \ldots, \mu_n)$; $\lambda_k$ and $\mu_k$ are parameters to be estimated
y|e is the set of components of y defined by edge e
y|v is the set of components of y defined by vertex v
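For a very short sequence, the conditional distribution can be computed by brute force: score every label sequence with the weighted edge and vertex features, exponentiate, and normalize. The label set, the two edge features, the two vertex features, and the weights below are all hypothetical; the sketch only illustrates the form of the model.

```python
import itertools
import math

LABELS = ("N", "V")   # hypothetical label set

def edge_features(y_prev, y_cur, x, i):
    """Boolean edge features f_k (illustrative choices)."""
    return [int(y_prev == y_cur), int(y_prev == "N" and y_cur == "V")]

def vertex_features(y_cur, x, i):
    """Boolean vertex features g_k (illustrative choices)."""
    return [int(x[i].endswith("s") and y_cur == "V"),
            int(x[i][0].isupper() and y_cur == "N")]

def log_score(y, x, lam, mu):
    """sum_{e,k} lam_k f_k + sum_{v,k} mu_k g_k for one label sequence."""
    s = 0.0
    for i in range(len(x)):
        s += sum(m * g for m, g in zip(mu, vertex_features(y[i], x, i)))
        if i > 0:
            s += sum(l * f for l, f in zip(lam, edge_features(y[i - 1], y[i], x, i)))
    return s

def conditional(x, lam, mu):
    """p(y | x) for every label sequence, by exhaustive enumeration."""
    scores = {y: math.exp(log_score(y, x, lam, mu))
              for y in itertools.product(LABELS, repeat=len(x))}
    Z = sum(scores.values())   # global normalizer Z(x)
    return {y: s / Z for y, s in scores.items()}

x = ["Time", "flies"]
p = conditional(x, lam=[0.5, 1.0], mu=[2.0, 1.0])
for y, prob in sorted(p.items(), key=lambda kv: -kv[1]):
    print(y, round(prob, 3))
```

Enumeration costs time exponential in the sequence length, so it is only useful as a reference for checking faster algorithms.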
Training of CRFs (From Prof. Dietterich)
• First, we take the log of the equation:
$$\log p(y \mid x) = \sum_{e \in E,\, k} \lambda_k f_k(e, y|_e, x) + \sum_{v \in V,\, k} \mu_k g_k(v, y|_v, x) - \log Z(x)$$
• Then, take the derivative of the above equation with respect to the parameters:
$$\frac{\partial \log p(y \mid x)}{\partial \theta} = \frac{\partial}{\partial \theta} \Big( \sum_{e \in E,\, k} \lambda_k f_k(e, y|_e, x) + \sum_{v \in V,\, k} \mu_k g_k(v, y|_v, x) - \log Z(x) \Big)$$
• For training, the first two terms are easy to get.
• For example, for each k, fk is a sequence of Boolean numbers, such
as 00101110100111. The derivative of the first term with respect to $\lambda_k$,
$\sum_{e \in E} f_k(e, y|_e, x)$, is just the total number of 1’s in the sequence.
• The hardest thing is how to calculate Z(x)
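The slide does not say how Z(x) is obtained. For a chain-structured model, one standard approach is the forward recursion, which sums over all label sequences in time linear in the sequence length. The sketch below assumes log-potentials are given as plain tables and, for simplicity, that the edge table is shared across positions; both are assumptions for illustration. A brute-force sum is included only to check the result.

```python
import itertools
import math

def logsumexp(vals):
    """Numerically stable log(sum(exp(v) for v in vals))."""
    m = max(vals)
    return m + math.log(sum(math.exp(v - m) for v in vals))

def log_partition(node, edge):
    """log Z(x) for a chain via the forward recursion.

    node[i][y] : log-potential of label y at position i
    edge[a][b] : log-potential of the transition a -> b (shared across positions)
    """
    alpha = list(node[0])
    for i in range(1, len(node)):
        alpha = [
            node[i][b] + logsumexp([alpha[a] + edge[a][b] for a in range(len(alpha))])
            for b in range(len(node[i]))
        ]
    return logsumexp(alpha)

def log_partition_brute(node, edge):
    """Same quantity by enumerating every label sequence (exponential time)."""
    n, n_labels = len(node), len(node[0])
    totals = []
    for y in itertools.product(range(n_labels), repeat=n):
        s = sum(node[i][y[i]] for i in range(n))
        s += sum(edge[y[i - 1]][y[i]] for i in range(1, n))
        totals.append(s)
    return logsumexp(totals)

# Hypothetical log-potentials: 2 labels, 4 positions.
node = [[0.0, 1.0], [0.5, -0.5], [1.0, 0.0], [0.0, 0.3]]
edge = [[0.8, -0.2], [0.1, 0.6]]
fast = log_partition(node, edge)
slow = log_partition_brute(node, edge)
print(fast, slow)
```

The derivative of log Z(x) with respect to a weight is the expected value of the corresponding feature under the model, which can be computed with the same forward quantities plus a matching backward pass.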
Another Definition
A CRF is a Markov random field.
By the Hammersley-Clifford theorem, the probability of a label sequence can be
expressed as a Gibbs distribution, so that
$$p(y \mid x, \lambda) = \frac{1}{Z(x)} \exp\Big( \sum_j \lambda_j F_j(y, x) \Big), \qquad F_j(y, x) = \sum_{i=1}^{n} f_j(y|_c, x, i)$$
where c denotes a clique.
What is a clique? A set of nodes in the graph that are all pairwise connected.
By considering only the one-node and two-node cliques, we have
$$p(y \mid x, \lambda, \mu) = \frac{1}{Z(x)} \exp\Big( \sum_i \Big[ \sum_j \lambda_j t_j(y|_e, x, i) + \sum_k \mu_k s_k(y|_s, x, i) \Big] \Big)$$
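Definition (cont.)
Moreover, if we consider the problem in a first-order chain model, we have
$$p(y \mid x, \lambda, \mu) = \frac{1}{Z(x)} \exp\Big( \sum_i \Big[ \sum_j \lambda_j t_j(y_{i-1}, y_i, x, i) + \sum_k \mu_k s_k(y_i, x, i) \Big] \Big)$$
To simplify the description, let $f_j(y|_c, x, i)$ denote either a transition feature $t_j(y_{i-1}, y_i, x, i)$ or a state feature $s_k(y_i, x, i)$, so that
$$p(y \mid x, \lambda) = \frac{1}{Z(x)} \exp\Big( \sum_j \lambda_j F_j(y, x) \Big), \qquad F_j(y, x) = \sum_{i=1}^{n} f_j(y|_c, x, i)$$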
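The global feature F_j(y, x) is just the local feature f_j summed over positions, and the exponent of the model is the weighted sum of these counts. A small sketch, with one hypothetical transition feature and one hypothetical state feature (the feature definitions, weights, and example sentence are made up for illustration):

```python
def global_features(y, x, local_features):
    """F_j(y, x) = sum over positions i of f_j(y_{i-1}, y_i, x, i)."""
    F = [0] * len(local_features)
    for i in range(len(x)):
        y_prev = y[i - 1] if i > 0 else None   # no predecessor at the first position
        for j, f in enumerate(local_features):
            F[j] += f(y_prev, y[i], x, i)
    return F

# Hypothetical features written with a common signature f(y_prev, y_cur, x, i).
features = [
    lambda yp, yc, x, i: int(yp == "N" and yc == "V"),          # transition feature t
    lambda yp, yc, x, i: int(yc == "N" and x[i][0].isupper()),  # state feature s
]

x = ["Time", "flies", "Fast"]
y = ["N", "V", "N"]
lam = [1.0, 0.5]   # one weight per feature

F = global_features(y, x, features)
score = sum(l * f for l, f in zip(lam, F))   # exponent before subtracting log Z(x)
print(F, score)
```

Giving transition and state features the same signature is what lets a single weight vector cover both kinds of feature.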