log-linear models - northeastern university · 2016-04-01 · log-linear models with structured...
TRANSCRIPT
![Page 1: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/1.jpg)
Log-Linear Models with Structured Outputs
Natural Language ProcessingCS 4120/6120—Spring 2016
Northeastern University
David Smith(some slides from Andrew McCallum)
![Page 2: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/2.jpg)
Overview
• Sequence labeling task (cf. POS tagging)
• Independent classifiers
• HMMs
• (Conditional) Maximum Entropy Markov Models
• Conditional Random Fields
• Beyond Sequence Labeling
![Page 3: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/3.jpg)
Sequence Labeling
• Inputs: x = (x1, …, xn)
• Labels: y = (y1, …, yn)
• Typical goal: Given x, predict y
• Example sequence labeling tasks
– Part-of-speech tagging
– Named-entity-recognition (NER)
• Label people, places, organizations
![Page 4: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/4.jpg)
NER Example:
![Page 5: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/5.jpg)
First Solution:
Maximum Entropy Classifier• Conditional model p(y|x).
– Do not waste effort modeling p(x), since x
is given at test time anyway.
– Allows more complicated input features,
since we do not need to model
dependencies between them.
• Feature functions f(x,y):
– f1(x,y) = { word is Boston & y=Location }
– f2(x,y) = { first letter capitalized & y=Name }
– f3(x,y) = { x is an HTML link & y=Location}
![Page 6: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/6.jpg)
First Solution: MaxEnt Classifier
• How should we choose a classifier?
• Principle of maximum entropy
– We want a classifier that:
• Matches feature constraints from training data.
• Predictions maximize entropy.
• There is a unique, exponential family
distribution that meets these criteria.
![Page 7: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/7.jpg)
First Solution: MaxEnt Classifier
• Problem with using a maximum entropy
classifier for sequence labeling:
• It makes decisions at each position
independently!
![Page 8: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/8.jpg)
Second Solution: HMM
• Defines a generative process.
• Can be viewed as a weighted finite
state machine.!
P(y,x) = P(yt | yt"1)P(x | yt )t
#
![Page 9: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/9.jpg)
Second Solution: HMM
• How can represent we multiple features
in an HMM?
– Treat them as conditionally independent
given the class label?
• The example features we talked about are not
independent.
– Try to model a more complex generative
process of the input features?
• We may lose tractability (i.e. lose a dynamic
programming for exact inference).
![Page 10: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/10.jpg)
Second Solution: HMM
• Let’s use a conditional model instead.
![Page 11: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/11.jpg)
Third Solution: MEMM
• Use a series of maximum entropy
classifiers that know the previous label.
• Define a Viterbi algorithm for inference.
!
P(y | x) = Pyt"1 (yt | x)t
#
![Page 12: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/12.jpg)
Third Solution: MEMM
• Combines the advantages of maximum
entropy and HMM!
• But there is a problem…
![Page 13: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/13.jpg)
Problem with MEMMs: Label Bias
• In some state space configurations,
MEMMs essentially completely ignore
the inputs.
• Example (ON BOARD).
• This is not a problem for HMMs,
because the input sequence is
generated by the model.
maximum entropy framework. Previously published exper-imental results show MEMMs increasing recall and dou-bling precision relative to HMMs in a FAQ segmentationtask.
MEMMs and other non-generative finite-state modelsbased on next-state classifiers, such as discriminativeMarkov models (Bottou, 1991), share a weakness we callhere the label bias problem: the transitions leaving a givenstate compete only against each other, rather than againstall other transitions in the model. In probabilistic terms,transition scores are the conditional probabilities of pos-sible next states given the current state and the observa-tion sequence. This per-state normalization of transitionscores implies a “conservation of score mass” (Bottou,1991) whereby all the mass that arrives at a state must bedistributed among the possible successor states. An obser-vation can affect which destination states get the mass, butnot how much total mass to pass on. This causes a bias to-ward states with fewer outgoing transitions. In the extremecase, a state with a single outgoing transition effectivelyignores the observation. In those cases, unlike in HMMs,Viterbi decoding cannot downgrade a branch based on ob-servations after the branch point, and models with state-transition structures that have sparsely connected chains ofstates are not properly handled. The Markovian assump-tions in MEMMs and similar state-conditional models in-sulate decisions at one state from future decisions in a waythat does not match the actual dependencies between con-secutive states.
This paper introduces conditional random fields (CRFs), asequence modeling framework that has all the advantagesof MEMMs but also solves the label bias problem in aprincipled way. The critical difference between CRFs andMEMMs is that a MEMM uses per-state exponential mod-els for the conditional probabilities of next states given thecurrent state, while a CRF has a single exponential modelfor the joint probability of the entire sequence of labelsgiven the observation sequence. Therefore, the weights ofdifferent features at different states can be traded off againsteach other.
We can also think of a CRF as a finite state model with un-normalized transition probabilities. However, unlike someother weighted finite-state approaches (LeCun et al., 1998),CRFs assign a well-defined probability distribution overpossible labelings, trained by maximum likelihood or MAPestimation. Furthermore, the loss function is convex,2 guar-anteeing convergence to the global optimum. CRFs alsogeneralize easily to analogues of stochastic context-freegrammars that would be useful in such problems as RNAsecondary structure prediction and natural language pro-cessing.
2In the case of fully observable states, as we are discussinghere; if several states have the same label, the usual local maximaof Baum-Welch arise.
0
1r:_
4r:_
2i:_
3
b:rib
5o:_b:rob
Figure 1. Label bias example, after (Bottou, 1991). For concise-ness, we place observation-label pairs o : l on transitions ratherthan states; the symbol ‘ ’ represents the null output label.
We present the model, describe two training procedures andsketch a proof of convergence. We also give experimentalresults on synthetic data showing that CRFs solve the clas-sical version of the label bias problem, and, more signifi-cantly, that CRFs perform better than HMMs and MEMMswhen the true data distribution has higher-order dependen-cies than the model, as is often the case in practice. Finally,we confirm these results as well as the claimed advantagesof conditional models by evaluating HMMs, MEMMs andCRFs with identical state structure on a part-of-speech tag-ging task.
2. The Label Bias ProblemClassical probabilistic automata (Paz, 1971), discrimina-tive Markov models (Bottou, 1991), maximum entropytaggers (Ratnaparkhi, 1996), and MEMMs, as well asnon-probabilistic sequence tagging and segmentation mod-els with independently trained next-state classifiers (Pun-yakanok & Roth, 2001) are all potential victims of the labelbias problem.
For example, Figure 1 represents a simple finite-statemodel designed to distinguish between the two words rib
and rob. Suppose that the observation sequence is r i b.In the first time step, r matches both transitions from thestart state, so the probability mass gets distributed roughlyequally among those two transitions. Next we observe i.Both states 1 and 4 have only one outgoing transition. State1 has seen this observation often in training, state 4 has al-most never seen this observation; but like state 1, state 4has no choice but to pass all its mass to its single outgoingtransition, since it is not generating the observation, onlyconditioning on it. Thus, states with a single outgoing tran-sition effectively ignore their observations. More generally,states with low-entropy next state distributions will take lit-tle notice of observations. Returning to the example, thetop path and the bottom path will be about equally likely,independently of the observation sequence. If one of thetwo words is slightly more common in the training set, thetransitions out of the start state will slightly prefer its cor-responding transition, and that word’s state sequence willalways win. This behavior is demonstrated experimentallyin Section 5.
Leon Bottou (1991) discussed two solutions for the labelbias problem. One is to change the state-transition struc-
![Page 14: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/14.jpg)
Fourth Solution:
Conditional Random Field
• Conditionally-trained, undirected
graphical model.
• For a standard linear-chain structure:
!
P(y | x) = "k (yt ,yt#1,x)t
$
"k (yt ,yt#1,x) = exp %kk
& f (yt ,yt#1,x)'
( )
*
+ ,
![Page 15: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/15.jpg)
Fourth Solution: CRF
• Have the advantages of MEMMs, but
avoid the label bias problem.
• CRFs are globally normalized, whereas
MEMMs are locally normalized.
• Widely used and applied. CRFs give
state-the-art results in many domains.
![Page 16: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/16.jpg)
Fourth Solution: CRF
• Have the advantages of MEMMs, but
avoid the label bias problem.
• CRFs are globally normalized, whereas
MEMMs are locally normalized.
• Widely used and applied. CRFs give
state-the-art results in many domains.Remember, Z is the normalization constant. How do we compute it?
![Page 17: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/17.jpg)
CRF Applications• Part-of-speech tagging
• Named entity recognition
• Document layout (e.g. table) classification
• Gene prediction
• Chinese word segmentation
• Morphological disambiguation
• Citation parsing
• Etc., etc.
![Page 18: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/18.jpg)
NER as Sequence Tagging
17
The Phoenicians came from the Red Sea
![Page 19: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/19.jpg)
NER as Sequence Tagging
17
The Phoenicians came from the Red Sea
O B-E O O O B-L I-L
![Page 20: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/20.jpg)
NER as Sequence Tagging
17
The Phoenicians came from the Red Sea
O B-E O O O B-L I-L
Capitalized word
![Page 21: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/21.jpg)
NER as Sequence Tagging
17
The Phoenicians came from the Red Sea
O B-E O O O B-L I-L
Capitalized word
Ends in “s”
![Page 22: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/22.jpg)
NER as Sequence Tagging
17
The Phoenicians came from the Red Sea
O B-E O O O B-L I-L
Capitalized word
Ends in “s” Ends in “ans”
![Page 23: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/23.jpg)
NER as Sequence Tagging
17
The Phoenicians came from the Red Sea
O B-E O O O B-L I-L
Capitalized word
Ends in “s” Ends in “ans”
Previous word “the”
![Page 24: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/24.jpg)
NER as Sequence Tagging
17
The Phoenicians came from the Red Sea
O B-E O O O B-L I-L
Capitalized word
Ends in “s” Ends in “ans”
Previous word “the”
“Phoenicians” in gazetteer
![Page 25: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/25.jpg)
NER as Sequence Tagging
18
The Phoenicians came from the Red Sea
O B-E O O O B-L I-L
![Page 26: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/26.jpg)
NER as Sequence Tagging
18
The Phoenicians came from the Red Sea
O B-E O O O B-L I-L
Not capitalized
![Page 27: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/27.jpg)
NER as Sequence Tagging
18
The Phoenicians came from the Red Sea
O B-E O O O B-L I-L
Not capitalized
Tagged as “VB”
![Page 28: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/28.jpg)
NER as Sequence Tagging
18
The Phoenicians came from the Red Sea
O B-E O O O B-L I-L
Not capitalized
Tagged as “VB”B-E to right
![Page 29: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/29.jpg)
NER as Sequence Tagging
19
The Phoenicians came from the Red Sea
O B-E O O O B-L I-L
![Page 30: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/30.jpg)
NER as Sequence Tagging
19
The Phoenicians came from the Red Sea
O B-E O O O B-L I-L
Word “sea”
![Page 31: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/31.jpg)
NER as Sequence Tagging
19
The Phoenicians came from the Red Sea
O B-E O O O B-L I-L
Word “sea”
Word “sea” preceded by “the ADJ”
![Page 32: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/32.jpg)
NER as Sequence Tagging
19
The Phoenicians came from the Red Sea
O B-E O O O B-L I-L
Word “sea”
Word “sea” preceded by “the ADJ”
Hard constraint: I-L must follow B-L or I-L
![Page 33: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/33.jpg)
Overview
• What computations do we need?
• Smoothing log-linear models
• MEMMs vs. CRFs again
• Action-based parsing and dependency parsing
![Page 34: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/34.jpg)
1.Gather constraints/features from training data
2.Initialize all parameters to zero
3.Classify training data with current parameters; calculate expectations
4.Gradient is
5.Take a step in the direction of the gradient
6.Repeat from 3 until convergence 43 43
Recipe for a Conditional
MaxEnt Classifier
1. Gather constraints from training data:
2. Initialize all parameters to zero.
3. Classify training data with current parameters. Calculateexpectations.
4. Gradient is
5. Take a step in the direction of the gradient
6. Until convergence, return to step 3.
43
43 43
Recipe for a Conditional
MaxEnt Classifier
1. Gather constraints from training data:
2. Initialize all parameters to zero.
3. Classify training data with current parameters. Calculateexpectations.
4. Gradient is
5. Take a step in the direction of the gradient
6. Until convergence, return to step 3.
4343 43
Recipe for a Conditional
MaxEnt Classifier
1. Gather constraints from training data:
2. Initialize all parameters to zero.
3. Classify training data with current parameters. Calculateexpectations.
4. Gradient is
5. Take a step in the direction of the gradient
6. Until convergence, return to step 3.
43
Recipe for Conditional Training of p(y | x)
![Page 35: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/35.jpg)
1.Gather constraints/features from training data
2.Initialize all parameters to zero
3.Classify training data with current parameters; calculate expectations
4.Gradient is
5.Take a step in the direction of the gradient
6.Repeat from 3 until convergence 43 43
Recipe for a Conditional
MaxEnt Classifier
1. Gather constraints from training data:
2. Initialize all parameters to zero.
3. Classify training data with current parameters. Calculateexpectations.
4. Gradient is
5. Take a step in the direction of the gradient
6. Until convergence, return to step 3.
43
43 43
Recipe for a Conditional
MaxEnt Classifier
1. Gather constraints from training data:
2. Initialize all parameters to zero.
3. Classify training data with current parameters. Calculateexpectations.
4. Gradient is
5. Take a step in the direction of the gradient
6. Until convergence, return to step 3.
4343 43
Recipe for a Conditional
MaxEnt Classifier
1. Gather constraints from training data:
2. Initialize all parameters to zero.
3. Classify training data with current parameters. Calculateexpectations.
4. Gradient is
5. Take a step in the direction of the gradient
6. Until convergence, return to step 3.
43
Where have we seen expected counts before?
Recipe for Conditional Training of p(y | x)
![Page 36: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/36.jpg)
1.Gather constraints/features from training data
2.Initialize all parameters to zero
3.Classify training data with current parameters; calculate expectations
4.Gradient is
5.Take a step in the direction of the gradient
6.Repeat from 3 until convergence 43 43
Recipe for a Conditional
MaxEnt Classifier
1. Gather constraints from training data:
2. Initialize all parameters to zero.
3. Classify training data with current parameters. Calculateexpectations.
4. Gradient is
5. Take a step in the direction of the gradient
6. Until convergence, return to step 3.
43
43 43
Recipe for a Conditional
MaxEnt Classifier
1. Gather constraints from training data:
2. Initialize all parameters to zero.
3. Classify training data with current parameters. Calculateexpectations.
4. Gradient is
5. Take a step in the direction of the gradient
6. Until convergence, return to step 3.
4343 43
Recipe for a Conditional
MaxEnt Classifier
1. Gather constraints from training data:
2. Initialize all parameters to zero.
3. Classify training data with current parameters. Calculateexpectations.
4. Gradient is
5. Take a step in the direction of the gradient
6. Until convergence, return to step 3.
43
Where have we seen expected counts before?
EM!
Recipe for Conditional Training of p(y | x)
![Page 37: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/37.jpg)
Gradient-Based Training
• λ := λ + rate * Gradient(F)
• After all training examples? (batch)
• After every example? (on-line)
• Use second derivative for faster learning?
• A big field: numerical optimization
![Page 38: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/38.jpg)
Parsing as Structured Prediction
![Page 39: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/39.jpg)
Shift-reduce parsing
Stack Input remaining Action() Book that flight shift(Book) that flight reduce, Verb � book, (Choice #1 of 2)(Verb) that flight shift(Verb that) flight reduce, Det � that(Verb Det) flight shift(Verb Det flight) reduce, Noun � flight(Verb Det Noun) reduce, NOM � Noun(Verb Det NOM) reduce, NP � Det NOM(Verb NP) reduce, VP � Verb NP(Verb) reduce, S � V(S) SUCCESS!
Ambiguity may lead to the need for backtracking.
![Page 40: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/40.jpg)
Shift-reduce parsing
Stack Input remaining Action() Book that flight shift(Book) that flight reduce, Verb � book, (Choice #1 of 2)(Verb) that flight shift(Verb that) flight reduce, Det � that(Verb Det) flight shift(Verb Det flight) reduce, Noun � flight(Verb Det Noun) reduce, NOM � Noun(Verb Det NOM) reduce, NP � Det NOM(Verb NP) reduce, VP � Verb NP(Verb) reduce, S � V(S) SUCCESS!
Ambiguity may lead to the need for backtracking.
![Page 41: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/41.jpg)
Shift-reduce parsing
Stack Input remaining Action() Book that flight shift(Book) that flight reduce, Verb � book, (Choice #1 of 2)(Verb) that flight shift(Verb that) flight reduce, Det � that(Verb Det) flight shift(Verb Det flight) reduce, Noun � flight(Verb Det Noun) reduce, NOM � Noun(Verb Det NOM) reduce, NP � Det NOM(Verb NP) reduce, VP � Verb NP(Verb) reduce, S � V(S) SUCCESS!
Ambiguity may lead to the need for backtracking.
Train log-linear model of p(action | context)
![Page 42: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/42.jpg)
Shift-reduce parsing
Stack Input remaining Action() Book that flight shift(Book) that flight reduce, Verb � book, (Choice #1 of 2)(Verb) that flight shift(Verb that) flight reduce, Det � that(Verb Det) flight shift(Verb Det flight) reduce, Noun � flight(Verb Det Noun) reduce, NOM � Noun(Verb Det NOM) reduce, NP � Det NOM(Verb NP) reduce, VP � Verb NP(Verb) reduce, S � V(S) SUCCESS!
Ambiguity may lead to the need for backtracking.
Train log-linear model of p(action | context)
Compare to an MEMM
![Page 43: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/43.jpg)
25
Structured Log-Linear Models
![Page 44: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/44.jpg)
25
Structured Log-Linear Models• Linear model for scoring structures
score(out, in) = � · features(out, in)
![Page 45: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/45.jpg)
25
Structured Log-Linear Models• Linear model for scoring structures
• Get a probability distribution by normalizing
p(out | in) =1Z
escore(out,in)
score(out, in) = � · features(out, in)
Z =�
out��GEN(in)
escore(out�,in)
![Page 46: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/46.jpg)
25
Structured Log-Linear Models• Linear model for scoring structures
• Get a probability distribution by normalizing✤ Viz. logistic regression, Markov random fields, undirected
graphical models
p(out | in) =1Z
escore(out,in)
score(out, in) = � · features(out, in)
Z =�
out��GEN(in)
escore(out�,in)
![Page 47: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/47.jpg)
25
Structured Log-Linear Models• Linear model for scoring structures
• Get a probability distribution by normalizing✤ Viz. logistic regression, Markov random fields, undirected
graphical models
p(out | in) =1Z
escore(out,in)
score(out, in) = � · features(out, in)Usually the bottleneck in NLP
Z =�
out��GEN(in)
escore(out�,in)
![Page 48: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/48.jpg)
25
Structured Log-Linear Models• Linear model for scoring structures
• Get a probability distribution by normalizing✤ Viz. logistic regression, Markov random fields, undirected
graphical models
• Inference: sampling, variational methods, dynamic programming, local search, ...
p(out | in) =1Z
escore(out,in)
score(out, in) = � · features(out, in)Usually the bottleneck in NLP
Z =�
out��GEN(in)
escore(out�,in)
![Page 49: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/49.jpg)
25
Structured Log-Linear Models• Linear model for scoring structures
• Get a probability distribution by normalizing✤ Viz. logistic regression, Markov random fields, undirected
graphical models
• Inference: sampling, variational methods, dynamic programming, local search, ...
• Training: maximum likelihood, minimum risk, etc.
p(out | in) =1Z
escore(out,in)
score(out, in) = � · features(out, in)Usually the bottleneck in NLP
Z =�
out��GEN(in)
escore(out�,in)
![Page 50: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/50.jpg)
26
• Several layers of linguistic structure
• Unknown correspondences
• Naturally handled by probabilistic framework
• Several inference setups, for example:
With latent variables
Structured Log-Linear Models
p(out1 | in) =�
out2,alignment
p(out1, out2, alignment | in)
![Page 51: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/51.jpg)
26
• Several layers of linguistic structure
• Unknown correspondences
• Naturally handled by probabilistic framework
• Several inference setups, for example:
With latent variables
Another computational problem
Structured Log-Linear Models
p(out1 | in) =�
out2,alignment
p(out1, out2, alignment | in)
![Page 52: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/52.jpg)
27
Edge-Factored Parsers• No global features of a parse (McDonald et al. 2005)
• Each feature is attached to some edge
• MST or CKY-like DP for fast O(n2) or O(n3) parsing
Byl jasný studený dubnový den a hodiny odbíjely třináctou
“It bright cold day April and clocks were thirteen”was a in the striking
![Page 53: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/53.jpg)
28
Edge-Factored Parsers
Byl jasný studený dubnový den a hodiny odbíjely třináctou
• Is this a good edge?
“It bright cold day April and clocks were thirteen”was a in the striking
![Page 54: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/54.jpg)
28
Edge-Factored Parsers
Byl jasný studený dubnový den a hodiny odbíjely třináctou
• Is this a good edge?
yes, lots of positive features ...
“It bright cold day April and clocks were thirteen”was a in the striking
![Page 55: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/55.jpg)
29
Edge-Factored Parsers
Byl jasný studený dubnový den a hodiny odbíjely třináctou
• Is this a good edge?
“It bright cold day April and clocks were thirteen”was a in the striking
![Page 56: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/56.jpg)
29
Edge-Factored Parsers
Byl jasný studený dubnový den a hodiny odbíjely třináctou
• Is this a good edge?
jasný ! den(“bright day”)
“It bright cold day April and clocks were thirteen”was a in the striking
![Page 57: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/57.jpg)
30
Edge-Factored Parsers
Byl jasný studený dubnový den a hodiny odbíjely třináctou
• Is this a good edge?
jasný ! den(“bright day”)
V A A A N J N V C
“It bright cold day April and clocks were thirteen”was a in the striking
![Page 58: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/58.jpg)
30
Edge-Factored Parsers
Byl jasný studený dubnový den a hodiny odbíjely třináctou
• Is this a good edge?
jasný ! den(“bright day”)
jasný ! N(“bright NOUN”)
V A A A N J N V C
“It bright cold day April and clocks were thirteen”was a in the striking
![Page 59: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/59.jpg)
31
Edge-Factored Parsers
Byl jasný studený dubnový den a hodiny odbíjely třináctou
• Is this a good edge?
jasný ! den(“bright day”)
jasný ! N(“bright NOUN”)
V A A A N J N V C
“It bright cold day April and clocks were thirteen”was a in the striking
![Page 60: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/60.jpg)
31
Edge-Factored Parsers
Byl jasný studený dubnový den a hodiny odbíjely třináctou
• Is this a good edge?
jasný ! den(“bright day”)
jasný ! N(“bright NOUN”)
V A A A N J N V C
A ! N
“It bright cold day April and clocks were thirteen”was a in the striking
![Page 61: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/61.jpg)
32
Edge-Factored Parsers
Byl jasný studený dubnový den a hodiny odbíjely třináctou
• Is this a good edge?
jasný ! den(“bright day”)
jasný ! N(“bright NOUN”)
V A A A N J N V C
A ! N
“It bright cold day April and clocks were thirteen”was a in the striking
![Page 62: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/62.jpg)
32
Edge-Factored Parsers
Byl jasný studený dubnový den a hodiny odbíjely třináctou
• Is this a good edge?
jasný ! den(“bright day”)
jasný ! N(“bright NOUN”)
V A A A N J N V C
A ! Npreceding
conjunction A ! N
“It bright cold day April and clocks were thirteen”was a in the striking
![Page 63: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/63.jpg)
33
Edge-Factored Parsers
Byl jasný studený dubnový den a hodiny odbíjely třináctou
• How about this competing edge?
V A A A N J N V C
“It bright cold day April and clocks were thirteen”was a in the striking
![Page 64: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/64.jpg)
33
Edge-Factored Parsers
Byl jasný studený dubnový den a hodiny odbíjely třináctou
• How about this competing edge?
V A A A N J N V C
not as good, lots of red ...
“It bright cold day April and clocks were thirteen”was a in the striking
![Page 65: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/65.jpg)
34
Edge-Factored Parsers
Byl jasný studený dubnový den a hodiny odbíjely třináctou
• How about this competing edge?
V A A A N J N V C
“It bright cold day April and clocks were thirteen”was a in the striking
![Page 66: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/66.jpg)
34
Edge-Factored Parsers
Byl jasný studený dubnový den a hodiny odbíjely třináctou
• How about this competing edge?
V A A A N J N V C
jasný ! hodiny(“bright clocks”)
“It bright cold day April and clocks were thirteen”was a in the striking
![Page 67: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/67.jpg)
34
Edge-Factored Parsers
Byl jasný studený dubnový den a hodiny odbíjely třináctou
• How about this competing edge?
V A A A N J N V C
jasný ! hodiny(“bright clocks”)
... undertrained ...
“It bright cold day April and clocks were thirteen”was a in the striking
![Page 68: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/68.jpg)
35
Edge-Factored Parsers
Byl jasný studený dubnový den a hodiny odbíjely třináctou
• How about this competing edge?
V A A A N J N V C
byl jasn stud dubn den a hodi odbí třin
jasný ! hodiny(“bright clocks”)
... undertrained ...
“It bright cold day April and clocks were thirteen”was a in the striking
![Page 69: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/69.jpg)
35
Edge-Factored Parsers
Byl jasný studený dubnový den a hodiny odbíjely třináctou
• How about this competing edge?
V A A A N J N V C
byl jasn stud dubn den a hodi odbí třin
jasný ! hodiny(“bright clocks”)
... undertrained ...
jasn ! hodi(“bright clock,”
stems only)
“It bright cold day April and clocks were thirteen”was a in the striking
![Page 70: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/70.jpg)
36
Edge-Factored Parsers
Byl jasný studený dubnový den a hodiny odbíjely třináctou
• How about this competing edge?
V A A A N J N V C
jasn ! hodi(“bright clock,”
stems only)
byl jasn stud dubn den a hodi odbí třin
jasný ! hodiny(“bright clocks”)
... undertrained ...
“It bright cold day April and clocks were thirteen”was a in the striking
![Page 71: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/71.jpg)
36
Edge-Factored Parsers
Byl jasný studený dubnový den a hodiny odbíjely třináctou
• How about this competing edge?
V A A A N J N V C
jasn ! hodi(“bright clock,”
stems only)
byl jasn stud dubn den a hodi odbí třin
Asingular ! Nplural
jasný ! hodiny(“bright clocks”)
... undertrained ...
“It bright cold day April and clocks were thirteen”was a in the striking
![Page 72: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/72.jpg)
37
jasný ! hodiny(“bright clocks”)
... undertrained ...
Edge-Factored Parsers
Byl jasný studený dubnový den a hodiny odbíjely třináctou
• How about this competing edge?
V A A A N J N V C
jasn ! hodi(“bright clock,”
stems only)
byl jasn stud dubn den a hodi odbí třin
Asingular ! Nplural
“It bright cold day April and clocks were thirteen”was a in the striking
![Page 73: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/73.jpg)
37
jasný ! hodiny(“bright clocks”)
... undertrained ...
Edge-Factored Parsers
Byl jasný studený dubnový den a hodiny odbíjely třináctou
• How about this competing edge?
V A A A N J N V C
jasn ! hodi(“bright clock,”
stems only)
byl jasn stud dubn den a hodi odbí třin
Asingular ! Nplural A ! N
where N follows a conjunction
“It bright cold day April and clocks were thirteen”was a in the striking
![Page 74: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/74.jpg)
38
jasný
Edge-Factored Parsers
Byl studený dubnový den a hodiny odbíjely třináctou
V A A A N J N V C
byl jasn stud dubn den a hodi odbí třin
• Which edge is better?
• “bright day” or “bright clocks”?
“It bright cold day April and clocks were thirteen”was a in the striking
![Page 75: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/75.jpg)
39
jasný
Edge-Factored Parsers
Byl studený dubnový den a hodiny odbíjely třináctou
“It bright cold day April and clocks were thirteen”was a in the striking
V A A A N J N V C
byl jasn stud dubn den a hodi odbí třin
• Which edge is better?
• Score of an edge e = θ ⋅ features(e)
• Standard algos " valid parse with max total score
![Page 76: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/76.jpg)
39
jasný
Edge-Factored Parsers
Byl studený dubnový den a hodiny odbíjely třináctou
“It bright cold day April and clocks were thirteen”was a in the striking
V A A A N J N V C
byl jasn stud dubn den a hodi odbí třin
• Which edge is better?
• Score of an edge e = θ ⋅ features(e)
• Standard algos " valid parse with max total score
our current weight vector
![Page 77: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/77.jpg)
40
# First, a familiar example $ Conditional Random Field (CRF) for POS tagging
Local factors in a graphical model
……
find preferred tags
![Page 78: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/78.jpg)
40
# First, a familiar example $ Conditional Random Field (CRF) for POS tagging
Local factors in a graphical model
……
find preferred tags
Observed input sentence (shaded)
![Page 79: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/79.jpg)
40
# First, a familiar example $ Conditional Random Field (CRF) for POS tagging
Local factors in a graphical model
……
find preferred tags
v v v
Possible tagging (i.e., assignment to remaining variables)
Observed input sentence (shaded)
![Page 80: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/80.jpg)
41
Local factors in a graphical model# First, a familiar example
$ Conditional Random Field (CRF) for POS tagging
……
find preferred tags
Possible tagging (i.e., assignment to remaining variables) Another possible tagging
Observed input sentence (shaded)
![Page 81: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/81.jpg)
41
Local factors in a graphical model# First, a familiar example
$ Conditional Random Field (CRF) for POS tagging
……
find preferred tags
v a n
Possible tagging (i.e., assignment to remaining variables) Another possible tagging
Observed input sentence (shaded)
![Page 82: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/82.jpg)
42
Local factors in a graphical model# First, a familiar example
$ Conditional Random Field (CRF) for POS tagging
……
find preferred tags
v n av 0 2 1n 2 1 0a 0 3 1
”Binary” factor that measures
compatibility of 2 adjacent tags
![Page 83: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/83.jpg)
42
Local factors in a graphical model# First, a familiar example
$ Conditional Random Field (CRF) for POS tagging
……
find preferred tags
v n av 0 2 1n 2 1 0a 0 3 1
v n av 0 2 1n 2 1 0a 0 3 1
”Binary” factor that measures
compatibility of 2 adjacent tags
Model reusessame parameters at this position
![Page 84: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/84.jpg)
43
Local factors in a graphical model# First, a familiar example
$ Conditional Random Field (CRF) for POS tagging
……
find preferred tags
v 0.2n 0.2a 0
![Page 85: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/85.jpg)
43
Local factors in a graphical model# First, a familiar example
$ Conditional Random Field (CRF) for POS tagging
……
find preferred tags
v 0.2n 0.2a 0
“Unary” factor evaluates this tag
![Page 86: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/86.jpg)
43
Local factors in a graphical model# First, a familiar example
$ Conditional Random Field (CRF) for POS tagging
……
find preferred tags
v 0.2n 0.2a 0
“Unary” factor evaluates this tagIts values depend on corresponding word
![Page 87: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/87.jpg)
43
Local factors in a graphical model# First, a familiar example
$ Conditional Random Field (CRF) for POS tagging
……
find preferred tags
v 0.2n 0.2a 0
“Unary” factor evaluates this tagIts values depend on corresponding word
can’t be adj
v 0.2n 0.2a 0
![Page 88: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/88.jpg)
44
Local factors in a graphical model# First, a familiar example
$ Conditional Random Field (CRF) for POS tagging
……
find preferred tags
v 0.2n 0.2a 0
“Unary” factor evaluates this tag Its values depend on corresponding word
![Page 89: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/89.jpg)
44
Local factors in a graphical model# First, a familiar example
$ Conditional Random Field (CRF) for POS tagging
……
find preferred tags
v 0.2n 0.2a 0
“Unary” factor evaluates this tag Its values depend on corresponding word
(could be made to depend on entire observed sentence)
![Page 90: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/90.jpg)
45
Local factors in a graphical model# First, a familiar example
$ Conditional Random Field (CRF) for POS tagging
……
find preferred tags
v 0.2n 0.2a 0
v 0.3n 0.02a 0
v 0.3n 0a 0.1
![Page 91: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/91.jpg)
45
Local factors in a graphical model# First, a familiar example
$ Conditional Random Field (CRF) for POS tagging
……
find preferred tags
v 0.2n 0.2a 0
“Unary” factor evaluates this tagDifferent unary factor at each position
v 0.3n 0.02a 0
v 0.3n 0a 0.1
![Page 92: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/92.jpg)
46
Local factors in a graphical model# First, a familiar example
$ Conditional Random Field (CRF) for POS tagging
……
find preferred tags
v n av 0 2 1n 2 1 0a 0 3 1
v 0.3n 0.02a 0
v n av 0 2 1n 2 1 0a 0 3 1
v 0.3n 0a 0.1
v 0.2n 0.2a 0
![Page 93: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/93.jpg)
46
Local factors in a graphical model# First, a familiar example
$ Conditional Random Field (CRF) for POS tagging
……
find preferred tags
v n av 0 2 1n 2 1 0a 0 3 1
v 0.3n 0.02a 0
v n av 0 2 1n 2 1 0a 0 3 1
v 0.3n 0a 0.1
v 0.2n 0.2a 0
v a n
p(v a n) is proportional to the product of all
factors’ values on v a n
![Page 94: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/94.jpg)
47
Local factors in a graphical model# First, a familiar example
$ Conditional Random Field (CRF) for POS tagging
……
find preferred tags
v n av 0 2 1n 2 1 0a 0 3 1
v 0.3n 0.02a 0
v n av 0 2 1n 2 1 0a 0 3 1
v 0.3n 0a 0.1
v 0.2n 0.2a 0
v a n
p(v a n) is proportional to the product of all
factors’ values on v a n
![Page 95: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/95.jpg)
47
Local factors in a graphical model# First, a familiar example
$ Conditional Random Field (CRF) for POS tagging
……
find preferred tags
v n av 0 2 1n 2 1 0a 0 3 1
v 0.3n 0.02a 0
v n av 0 2 1n 2 1 0a 0 3 1
v 0.3n 0a 0.1
v 0.2n 0.2a 0
v a n
= … 1*3*0.3*0.1*0.2 …p(v a n) is proportional
to the product of all factors’ values on v a n
![Page 96: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/96.jpg)
48
• First, a labeling example✤ CRF for POS tagging
• Now let’s do dependency parsing!✤ O(n2) boolean variables for the possible links
v a n
Graphical Models for Parsing
find preferred links ……
![Page 97: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/97.jpg)
48
• First, a labeling example✤ CRF for POS tagging
• Now let’s do dependency parsing!✤ O(n2) boolean variables for the possible links
v a n
Graphical Models for Parsing
find preferred links ……
![Page 98: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/98.jpg)
48
• First, a labeling example✤ CRF for POS tagging
• Now let’s do dependency parsing!✤ O(n2) boolean variables for the possible links
v a n
Graphical Models for Parsing
find preferred links ……
![Page 99: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/99.jpg)
49
• First, a labeling example✤ CRF for POS tagging
• Now let’s do dependency parsing!✤ O(n2) boolean variables for the possible links
v a n
Graphical Models for Parsing
find preferred links ……
Possible parse...
![Page 100: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/100.jpg)
49
• First, a labeling example✤ CRF for POS tagging
• Now let’s do dependency parsing!✤ O(n2) boolean variables for the possible links
v a n
Graphical Models for Parsing
find preferred links ……
tf
ff
ft
Possible parse...
... and variable
assignment
![Page 101: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/101.jpg)
50
• First, a labeling example✤ CRF for POS tagging
• Now let’s do dependency parsing!✤ O(n2) boolean variables for the possible links
v a n
Graphical Models for Parsing
find preferred links ……
Another parse...
![Page 102: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/102.jpg)
50
• First, a labeling example✤ CRF for POS tagging
• Now let’s do dependency parsing!✤ O(n2) boolean variables for the possible links
v a n
Graphical Models for Parsing
find preferred links ……
ff
tf
tf
Another parse...
... and variable
assignment
![Page 103: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/103.jpg)
51
• First, a labeling example✤ CRF for POS tagging
• Now let’s do dependency parsing!✤ O(n2) boolean variables for the possible links
v a n
Graphical Models for Parsing
find preferred links ……
ft
tf
tf
An illegal parse...
... with a cycle!
![Page 104: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/104.jpg)
52
• First, a labeling example✤ CRF for POS tagging
• Now let’s do dependency parsing!✤ O(n2) boolean variables for the possible links
v a n
Graphical Models for Parsing
find preferred links ……
ft
tf
tt
An illegal parse...
... with a cycle and multiple parents!
![Page 105: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/105.jpg)
53
• What factors determine parse probability?
Local Factors for Parsing
find preferred links ……
![Page 106: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/106.jpg)
53
• What factors determine parse probability?✤ Unary factors to score each link in isolation
Local Factors for Parsing
find preferred links ……
t 2f 1
![Page 107: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/107.jpg)
53
• What factors determine parse probability?✤ Unary factors to score each link in isolation
Local Factors for Parsing
find preferred links ……
t 2f 1
t 1f 2
t 1f 2
t 1f 6
t 1f 3
t 1f 8
![Page 108: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/108.jpg)
53
• What factors determine parse probability?✤ Unary factors to score each link in isolation
• But what if the best assignment isn’t a tree?
Local Factors for Parsing
find preferred links ……
t 2f 1
t 1f 2
t 1f 2
t 1f 6
t 1f 3
t 1f 8
![Page 109: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/109.jpg)
54
• What factors determine parse probability?✤ Unary factors to score each link in isolation
Global Factors for Parsing
find preferred links ……
![Page 110: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/110.jpg)
54
• What factors determine parse probability?✤ Unary factors to score each link in isolation
✤ Global TREE factor to require links to form a legal tree
Global Factors for Parsing
find preferred links ……
ffffff 0ffffft 0fffftf 0
… …fftfft 1
… …tttttt 0
![Page 111: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/111.jpg)
54
• What factors determine parse probability?✤ Unary factors to score each link in isolation
✤ Global TREE factor to require links to form a legal tree
• A hard constraint: potential is either 0 or 1
Global Factors for Parsing
find preferred links ……
ffffff 0ffffft 0fffftf 0
… …fftfft 1
… …tttttt 0
![Page 112: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/112.jpg)
55
• What factors determine parse probability?✤ Unary factors to score each link in isolation
✤ Global TREE factor to require links to form a legal tree
• A hard constraint: potential is either 0 or 1
Global Factors for Parsing
find preferred links ……
tf
ff
ft
ffffff 0ffffft 0fffftf 0
… …fftfft 1
… …tttttt 0
we’re legal!
![Page 113: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/113.jpg)
55
• What factors determine parse probability?✤ Unary factors to score each link in isolation
✤ Global TREE factor to require links to form a legal tree
• A hard constraint: potential is either 0 or 1
Global Factors for Parsing
find preferred links ……
tf
ff
ft
ffffff 0ffffft 0fffftf 0
… …fftfft 1
… …tttttt 0
optionally require the tree to be projective (no crossing links)
So far, this is equivalent to edge-factored parsing
we’re legal!
64 entries (0/1)
![Page 114: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/114.jpg)
55
• What factors determine parse probability?✤ Unary factors to score each link in isolation
✤ Global TREE factor to require links to form a legal tree
• A hard constraint: potential is either 0 or 1
Global Factors for Parsing
find preferred links ……
tf
ff
ft
ffffff 0ffffft 0fffftf 0
… …fftfft 1
… …tttttt 0
optionally require the tree to be projective (no crossing links)
So far, this is equivalent to edge-factored parsing
we’re legal!
64 entries (0/1)
Note: traditional parsers don’t loop through this table to consider exponentially many trees one at a time. They use combinatorial algorithms; so should we!
![Page 115: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/115.jpg)
56
• What factors determine parse probability?✤ Unary factors to score each link in isolation
✤ Global TREE factor to require links to form a legal tree
• A hard constraint: potential is either 0 or 1
✤ Second order effects: factors on 2 variables
• Grandparent–parent–child chains
Local Factors for Parsing
find preferred links ……
![Page 116: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/116.jpg)
56
• What factors determine parse probability?✤ Unary factors to score each link in isolation
✤ Global TREE factor to require links to form a legal tree
• A hard constraint: potential is either 0 or 1
✤ Second order effects: factors on 2 variables
• Grandparent–parent–child chains
Local Factors for Parsing
find preferred links ……
f t
f 1 1
t 1 3
![Page 117: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/117.jpg)
57
Local Factors for Parsing
find preferred links ……
• What factors determine parse probability?✤ Unary factors to score each link in isolation
✤ Global TREE factor to require links to form a legal tree
• A hard constraint: potential is either 0 or 1
✤ Second order effects: factors on 2 variables
• Grandparent–parent–child chains
• No crossing links
• Siblings
✤ Hidden morphological tags
✤ Word senses and subcategorization frames
![Page 118: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/118.jpg)
58
Great Ideas in ML: Message Passing
adapted from MacKay (2003) textbook
![Page 119: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/119.jpg)
58
Great Ideas in ML: Message Passing
adapted from MacKay (2003) textbook
Count the soldiers
![Page 120: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/120.jpg)
58
Great Ideas in ML: Message Passing
adapted from MacKay (2003) textbook
Count the soldiers
3 behind
you
2 behind
you
1 behind
you
![Page 121: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/121.jpg)
58
Great Ideas in ML: Message Passingthere’s 1 of me
adapted from MacKay (2003) textbook
Count the soldiers
3 behind
you
2 behind
you
1 behind
you
![Page 122: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/122.jpg)
58
Great Ideas in ML: Message Passing
4 behind
you
5 behind
you
there’s 1 of me
adapted from MacKay (2003) textbook
Count the soldiers
3 behind
you
2 behind
you
1 behind
you
![Page 123: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/123.jpg)
58
Great Ideas in ML: Message Passing
4 behind
you
5 behind
you
1 before
you
2 before
you
adapted from MacKay (2003) textbook
Count the soldiers
3 behind
you
2 behind
you
1 behind
you
![Page 124: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/124.jpg)
58
Great Ideas in ML: Message Passing
4 behind
you
5 behind
you
1 before
you
2 before
you
3 before
you
4 before
you
5 before
you
adapted from MacKay (2003) textbook
Count the soldiers
3 behind
you
2 behind
you
1 behind
you
![Page 125: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/125.jpg)
59
Great Ideas in ML: Message Passing
2 before
you
there’s 1 of me
only see my incoming
messages
Count the soldiers
adapted from MacKay (2003) textbook
3 behind
you
![Page 126: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/126.jpg)
59
Great Ideas in ML: Message Passing
2 before
you
there’s 1 of me
Belief:Must be
2 + 1 + 3 = 6 of us
only see my incoming
messages
Count the soldiers
adapted from MacKay (2003) textbook
3 behind
you
![Page 127: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/127.jpg)
59
Great Ideas in ML: Message Passing
2 before
you
there’s 1 of me
Belief:Must be
2 + 1 + 3 = 6 of us
only see my incoming
messages
2 31
Count the soldiers
adapted from MacKay (2003) textbook
3 behind
you
![Page 128: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/128.jpg)
60
Belief:Must be
2 + 1 + 3 = 6 of us
2 31
Great Ideas in ML: Message Passing
1 before
you
there’s 1 of me
only see my incoming messages
Count the soldiers
adapted from MacKay (2003) textbook
4 behind
you
![Page 129: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/129.jpg)
60
Great Ideas in ML: Message Passing
1 before
you
there’s 1 of me
only see my incoming messages
Belief:Must be
1 + 1 + 4 = 6 of us
1 41
Count the soldiers
adapted from MacKay (2003) textbook
4 behind
you
![Page 130: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/130.jpg)
61
Great Ideas in ML: Message PassingEach soldier receives reports from all branches of tree
adapted from MacKay (2003) textbook
![Page 131: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/131.jpg)
61
Great Ideas in ML: Message Passing
7 here
3 here
Each soldier receives reports from all branches of tree
adapted from MacKay (2003) textbook
![Page 132: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/132.jpg)
61
Great Ideas in ML: Message Passing
7 here
3 here
11 here (= 7+3+1)
1 of me
Each soldier receives reports from all branches of tree
adapted from MacKay (2003) textbook
![Page 133: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/133.jpg)
62
Great Ideas in ML: Message PassingEach soldier receives reports from all branches of tree
adapted from MacKay (2003) textbook
![Page 134: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/134.jpg)
62
Great Ideas in ML: Message Passing
3 here
3 here
Each soldier receives reports from all branches of tree
adapted from MacKay (2003) textbook
![Page 135: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/135.jpg)
62
Great Ideas in ML: Message Passing
3 here
3 here
7 here (= 3+3+1)
Each soldier receives reports from all branches of tree
adapted from MacKay (2003) textbook
![Page 136: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/136.jpg)
63
Great Ideas in ML: Message PassingEach soldier receives reports from all branches of tree
adapted from MacKay (2003) textbook
![Page 137: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/137.jpg)
63
Great Ideas in ML: Message Passing
7 here
3 here
Each soldier receives reports from all branches of tree
adapted from MacKay (2003) textbook
![Page 138: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/138.jpg)
63
Great Ideas in ML: Message Passing
7 here
3 here
11 here (= 7+3+1)
Each soldier receives reports from all branches of tree
adapted from MacKay (2003) textbook
![Page 139: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/139.jpg)
64
Great Ideas in ML: Message PassingEach soldier receives reports from all branches of tree
adapted from MacKay (2003) textbook
![Page 140: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/140.jpg)
64
Great Ideas in ML: Message Passing
Belief: Must be 14 of us
Each soldier receives reports from all branches of tree
adapted from MacKay (2003) textbook
![Page 141: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/141.jpg)
64
Great Ideas in ML: Message Passing
7 here
3 here
3 here
Belief: Must be 14 of us
Each soldier receives reports from all branches of tree
adapted from MacKay (2003) textbook
![Page 142: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/142.jpg)
65
Great Ideas in ML: Message PassingEach soldier receives reports from all branches of tree
7 here
3 here
3 here
Belief:Must be 14 of us
adapted from MacKay (2003) textbook
![Page 143: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/143.jpg)
65
Great Ideas in ML: Message PassingEach soldier receives reports from all branches of tree
7 here
3 here
3 here
Belief:Must be 14 of us
wouldn’t work correctlywith a “loopy” (cyclic) graph
adapted from MacKay (2003) textbook
![Page 144: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/144.jpg)
66
……
find preferred tags
Great ideas in ML: Forward-Backward# In the CRF, message passing = forward-backward=
“sum-product algorithm”
v n av 0 2 1n 2 1 0a 0 3 1
v n av 0 2 1n 2 1 0a 0 3 1
![Page 145: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/145.jpg)
66
……
find preferred tags
Great ideas in ML: Forward-Backward
v 1.8n 0
a 4.2
belief
# In the CRF, message passing = forward-backward= “sum-product algorithm”
v n av 0 2 1n 2 1 0a 0 3 1
v n av 0 2 1n 2 1 0a 0 3 1
![Page 146: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/146.jpg)
66
……
find preferred tags
Great ideas in ML: Forward-Backward
v 1.8n 0
a 4.2α β
belief
message message
v 2n 1a 7
# In the CRF, message passing = forward-backward= “sum-product algorithm”
v 3n 1a 6
v n av 0 2 1n 2 1 0a 0 3 1
v n av 0 2 1n 2 1 0a 0 3 1
![Page 147: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/147.jpg)
66
……
find preferred tags
Great ideas in ML: Forward-Backward
v 0.3n 0a 0.1
v 1.8n 0
a 4.2α β
belief
message message
v 2n 1a 7
# In the CRF, message passing = forward-backward= “sum-product algorithm”
v 3n 1a 6
v n av 0 2 1n 2 1 0a 0 3 1
v n av 0 2 1n 2 1 0a 0 3 1
![Page 148: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/148.jpg)
66
……
find preferred tags
Great ideas in ML: Forward-Backward
v 0.3n 0a 0.1
v 1.8n 0
a 4.2α βα
belief
message message
v 2n 1a 7
# In the CRF, message passing = forward-backward= “sum-product algorithm”
v 7n 2a 1
v 3n 1a 6
v n av 0 2 1n 2 1 0a 0 3 1
v n av 0 2 1n 2 1 0a 0 3 1
![Page 149: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/149.jpg)
66
……
find preferred tags
Great ideas in ML: Forward-Backward
v 0.3n 0a 0.1
v 1.8n 0
a 4.2α βα
belief
message message
v 2n 1a 7
# In the CRF, message passing = forward-backward= “sum-product algorithm”
v 7n 2a 1
v 3n 1a 6
v n av 0 2 1n 2 1 0a 0 3 1
![Page 150: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/150.jpg)
66
……
find preferred tags
Great ideas in ML: Forward-Backward
v 0.3n 0a 0.1
v 1.8n 0
a 4.2α βα
belief
message message
v 2n 1a 7
# In the CRF, message passing = forward-backward= “sum-product algorithm”
v 7n 2a 1
v 3n 1a 6
![Page 151: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/151.jpg)
66
……
find preferred tags
Great ideas in ML: Forward-Backward
v 0.3n 0a 0.1
v 1.8n 0
a 4.2α βα
belief
message message
v 2n 1a 7
# In the CRF, message passing = forward-backward= “sum-product algorithm”
v 7n 2a 1
v 3n 1a 6
v n av 0 2 1n 2 1 0a 0 3 1
![Page 152: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/152.jpg)
66
……
find preferred tags
Great ideas in ML: Forward-Backward
v 0.3n 0a 0.1
v 1.8n 0
a 4.2α βα
belief
message message
v 2n 1a 7
# In the CRF, message passing = forward-backward= “sum-product algorithm”
v 7n 2a 1
v 3n 1a 6
βv n a
v 0 2 1n 2 1 0a 0 3 1
v 3n 6a 1
![Page 153: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/153.jpg)
66
……
find preferred tags
Great ideas in ML: Forward-Backward
v 0.3n 0a 0.1
v 1.8n 0
a 4.2α βα
belief
message message
v 2n 1a 7
# In the CRF, message passing = forward-backward= “sum-product algorithm”
v 7n 2a 1
v 3n 1a 6
βv n a
v 0 2 1n 2 1 0a 0 3 1
v 3n 6a 1
v n av 0 2 1n 2 1 0a 0 3 1
![Page 154: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/154.jpg)
Sum-Product Equations
# Message from variable v to factor f
# Message from factor f to variable v
67
mv!f (x) =Y
f 02N(v)\{f}
mf 0!v(x)
mf!v(x) =X
N(f)\{v}
2
4f(xm)
Y
v02N(f)\{v}
mv0!f (y)
3
5
![Page 155: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/155.jpg)
68
……
find preferred tags
Great ideas in ML: Forward-Backward
v 3n 1a 6
v 2n 1a 7
α β
v 0.3n 0a 0.1
![Page 156: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/156.jpg)
68
……
find preferred tags
Great ideas in ML: Forward-Backward
v 3n 1a 6
v 2n 1a 7
α β
v 0.3n 0a 0.1
![Page 157: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/157.jpg)
68
# Extend CRF to “skip chain” to capture non-local factor
……
find preferred tags
Great ideas in ML: Forward-Backward
v 3n 1a 6
v 2n 1a 7
α β
v 0.3n 0a 0.1
![Page 158: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/158.jpg)
68
# Extend CRF to “skip chain” to capture non-local factor$ More influences on belief ☺
……
find preferred tags
Great ideas in ML: Forward-Backward
v 3n 1a 6
v 2n 1a 7
α β
v 3n 1a 6
v 5.4n 0a 25.2
v 0.3n 0a 0.1
![Page 159: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/159.jpg)
69
# Extend CRF to “skip chain” to capture non-local factor $ More influences on belief ☺ $ Graph becomes loopy &
……
find preferred tags
Great ideas in ML: Forward-Backward
v 3n 1a 6
v 2n 1a 7
α β
v 3n 1a 6
v 5.4`n 0a 25.2`
v 0.3n 0a 0.1
![Page 160: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/160.jpg)
69
# Extend CRF to “skip chain” to capture non-local factor $ More influences on belief ☺ $ Graph becomes loopy &
……
find preferred tags
Great ideas in ML: Forward-Backward
v 3n 1a 6
v 2n 1a 7
α β
v 3n 1a 6
v 5.4`n 0a 25.2`
v 0.3n 0a 0.1
Red messages not independent?Pretend they are!
![Page 161: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/161.jpg)
69
# Extend CRF to “skip chain” to capture non-local factor $ More influences on belief ☺ $ Graph becomes loopy &
……
find preferred tags
Great ideas in ML: Forward-Backward
v 3n 1a 6
v 2n 1a 7
α β
v 3n 1a 6
v 5.4`n 0a 25.2`
v 0.3n 0a 0.1
Red messages not independent?Pretend they are!“Loopy Belief Propagation”
![Page 162: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/162.jpg)
70
Terminological Clarification
propagation
![Page 163: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/163.jpg)
70
Terminological Clarification
belief propagation
![Page 164: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/164.jpg)
70
Terminological Clarification
beliefloopy propagation
![Page 165: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/165.jpg)
70
Terminological Clarification
beliefloopy propagation
![Page 166: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/166.jpg)
70
Terminological Clarification
beliefloopy propagation
![Page 167: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/167.jpg)
70
Terminological Clarification
beliefloopy propagation
loopy belief propagation
![Page 168: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/168.jpg)
70
Terminological Clarification
beliefloopy propagation
loopy belief propagation
![Page 169: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/169.jpg)
71
• Loopy belief propagation is easy for local factors
• How do combinatorial factors (like TREE) compute the message to the link in question?✤ “Does the TREE factor think the link is probably t given the
messages it receives from all the other links?”
Propagating Global Factors
find preferred links ……
?
![Page 170: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/170.jpg)
71
• Loopy belief propagation is easy for local factors
• How do combinatorial factors (like TREE) compute the message to the link in question?✤ “Does the TREE factor think the link is probably t given the
messages it receives from all the other links?”
Propagating Global Factors
find preferred links ……
?
![Page 171: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/171.jpg)
71
• Loopy belief propagation is easy for local factors
• How do combinatorial factors (like TREE) compute the message to the link in question?✤ “Does the TREE factor think the link is probably t given the
messages it receives from all the other links?”
Propagating Global Factors
find preferred links ……
?TREE factorffffff 0ffffft 0fffftf 0
… …fftfft 1
… …tttttt 0
![Page 172: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/172.jpg)
72
• How does the TREE factor compute the message to the link in question?✤ “Does the TREE factor think the link is probably t given the
messages it receives from all the other links?”
Propagating Global Factors
find preferred links ……
?TREE factorffffff 0ffffft 0fffftf 0
… …fftfft 1
… …tttttt 0
![Page 173: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/173.jpg)
72
• How does the TREE factor compute the message to the link in question?✤ “Does the TREE factor think the link is probably t given the
messages it receives from all the other links?”
Propagating Global Factors
find preferred links ……
?TREE factorffffff 0ffffft 0fffftf 0
… …fftfft 1
… …tttttt 0
Old-school parsing to the rescue!
This is the outside probability of the link in an edge-factored parser!
∴TREE factor computes all outgoing messages at once (given all incoming messages)
Projective case: total O(n3) time by inside-outside
Non-projective: total O(n3) time by inverting Kirchhoff matrix
![Page 174: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/174.jpg)
73
Graph Theory to the Rescue!
Tutte’s Matrix-Tree Theorem (1948)
The determinant of the Kirchoff (aka Laplacian) adjacency matrix of directed graph G without row
and column r is equal to the sum of scores of all directed spanning trees of G rooted at node r.
![Page 175: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/175.jpg)
73
Graph Theory to the Rescue!
Tutte’s Matrix-Tree Theorem (1948)
The determinant of the Kirchoff (aka Laplacian) adjacency matrix of directed graph G without row
and column r is equal to the sum of scores of all directed spanning trees of G rooted at node r.
Exactly the Z we need!
![Page 176: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/176.jpg)
73
Graph Theory to the Rescue!
Tutte’s Matrix-Tree Theorem (1948)
The determinant of the Kirchoff (aka Laplacian) adjacency matrix of directed graph G without row
and column r is equal to the sum of scores of all directed spanning trees of G rooted at node r.
Exactly the Z we need!
O(n3) time!
![Page 177: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/177.jpg)
74
Kirchoff (Laplacian) Matrix
€
0 −s(1,0) −s(2,0) ! −s(n,0)0 0 −s(2,1) ! −s(n,1)0 −s(1,2) 0 ! −s(n,2)" " " # "0 −s(1,n) −s(2,n) ! 0
#
$
% % % % % %
&
'
( ( ( ( ( (
'Negate edge scores 'Sum columns
(children) 'Strike root row/col. 'Take determinant
![Page 178: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/178.jpg)
74
Kirchoff (Laplacian) Matrix
€
0 −s(1,0) −s(2,0) ! −s(n,0)0 0 −s(2,1) ! −s(n,1)0 −s(1,2) 0 ! −s(n,2)" " " # "0 −s(1,n) −s(2,n) ! 0
#
$
% % % % % %
&
'
( ( ( ( ( (
'Negate edge scores 'Sum columns
(children) 'Strike root row/col. 'Take determinant
![Page 179: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/179.jpg)
74
Kirchoff (Laplacian) Matrix
€
0 −s(1,0) −s(2,0) ! −s(n,0)0 0 −s(2,1) ! −s(n,1)0 −s(1,2) 0 ! −s(n,2)" " " # "0 −s(1,n) −s(2,n) ! 0
#
$
% % % % % %
&
'
( ( ( ( ( (
€
0 −s(1,0) −s(2,0) ! −s(n,0)0 s(1, j)
j≠1∑ −s(2,1) ! −s(n,1)
0 −s(1,2) s(2, j)j≠2∑ ! −s(n,2)
" " " # "0 −s(1,n) −s(2,n) ! s(n, j)
j≠n∑
%
&
' ' ' ' ' ' ' '
(
)
* * * * * * * *
'Negate edge scores 'Sum columns
(children) 'Strike root row/col. 'Take determinant
![Page 180: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/180.jpg)
74
Kirchoff (Laplacian) Matrix
€
0 −s(1,0) −s(2,0) ! −s(n,0)0 0 −s(2,1) ! −s(n,1)0 −s(1,2) 0 ! −s(n,2)" " " # "0 −s(1,n) −s(2,n) ! 0
#
$
% % % % % %
&
'
( ( ( ( ( (
€
0 −s(1,0) −s(2,0) ! −s(n,0)0 s(1, j)
j≠1∑ −s(2,1) ! −s(n,1)
0 −s(1,2) s(2, j)j≠2∑ ! −s(n,2)
" " " # "0 −s(1,n) −s(2,n) ! s(n, j)
j≠n∑
%
&
' ' ' ' ' ' ' '
(
)
* * * * * * * *
€
s(1, j)j≠1∑ −s(2,1) ! −s(n,1)
−s(1,2) s(2, j)j≠2∑ ! −s(n,2)
" " # "−s(1,n) −s(2,n) ! s(n, j)
j≠n∑
'Negate edge scores 'Sum columns
(children) 'Strike root row/col. 'Take determinant
![Page 181: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/181.jpg)
74
Kirchoff (Laplacian) Matrix
€
0 −s(1,0) −s(2,0) ! −s(n,0)0 0 −s(2,1) ! −s(n,1)0 −s(1,2) 0 ! −s(n,2)" " " # "0 −s(1,n) −s(2,n) ! 0
#
$
% % % % % %
&
'
( ( ( ( ( (
€
0 −s(1,0) −s(2,0) ! −s(n,0)0 s(1, j)
j≠1∑ −s(2,1) ! −s(n,1)
0 −s(1,2) s(2, j)j≠2∑ ! −s(n,2)
" " " # "0 −s(1,n) −s(2,n) ! s(n, j)
j≠n∑
%
&
' ' ' ' ' ' ' '
(
)
* * * * * * * *
€
s(1, j)j≠1∑ −s(2,1) ! −s(n,1)
−s(1,2) s(2, j)j≠2∑ ! −s(n,2)
" " # "−s(1,n) −s(2,n) ! s(n, j)
j≠n∑
'Negate edge scores 'Sum columns
(children) 'Strike root row/col. 'Take determinant
N.B.: This allows multiple children of root, but see Koo et al. 2007.
![Page 182: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/182.jpg)
Transition-Based Parsing
• Linear time
• Online
• Train a classifier to predict next action
• Deterministic or beam-search strategies
• But... generally less accurate
![Page 183: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/183.jpg)
Transition-Based ParsingParsing Methods
Arc-Eager Shift-Reduce Parsing [Nivre 2003]
Start state: ([ ], [1, . . . , n], { })
Final state: (S, [ ], A)
Shift: (S, i |B, A) ) (S|i , B, A)
Reduce: (S|i , B, A) ) (S, B, A)
Right-Arc: (S|i , j |B, A) ) (S|i |j , B, A [ {i ! j})
Left-Arc: (S|i , j |B, A) ) (S, j |B, A [ {i j})
Bare-Bones Dependency Parsing 20(30)
Arc-eager shift-reduce parsing (Nivre, 2003)
![Page 184: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/184.jpg)
Transition-Based ParsingArc-eager shift-reduce parsing (Nivre, 2003)
Parsing Methods
Parsing Example
Stack Buffer Arcs
[ ]S [who, did, you, see]B { }
who OBJ � diddid SBJ�! youdid VG�! see }
Bare-Bones Dependency Parsing 21(30)
![Page 185: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/185.jpg)
Transition-Based ParsingArc-eager shift-reduce parsing (Nivre, 2003)
Parsing Methods
Parsing Example
Stack Buffer Arcs
[who]S [did, you, see]B { }
who OBJ � diddid SBJ�! youdid VG�! see }
Bare-Bones Dependency Parsing 21(30)
Shift
![Page 186: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/186.jpg)
Transition-Based ParsingArc-eager shift-reduce parsing (Nivre, 2003)
Parsing Methods
Parsing Example
Stack Buffer Arcs
[ ]S [did, you, see]B { who OBJ � did }
did SBJ�! youdid VG�! see }
Bare-Bones Dependency Parsing 21(30)
Left-arcOBJ
![Page 187: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/187.jpg)
Transition-Based ParsingArc-eager shift-reduce parsing (Nivre, 2003)
Parsing Methods
Parsing Example
Stack Buffer Arcs
[did]S [you, see]B { who OBJ � did }
did SBJ�! youdid VG�! see }
Bare-Bones Dependency Parsing 21(30)
Shift
![Page 188: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/188.jpg)
Transition-Based ParsingArc-eager shift-reduce parsing (Nivre, 2003)
Parsing Methods
Parsing Example
Stack Buffer Arcs
[did, you]S [see]B { who OBJ � did,did SBJ�! you }
did VG�! see }
Bare-Bones Dependency Parsing 21(30)
Right-arcSBJ
![Page 189: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/189.jpg)
Transition-Based ParsingArc-eager shift-reduce parsing (Nivre, 2003)
Parsing Methods
Parsing Example
Stack Buffer Arcs
[did]S [see]B { who OBJ � did,did SBJ�! you }
did VG�! see }
Bare-Bones Dependency Parsing 21(30)
Reduce
![Page 190: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/190.jpg)
Transition-Based ParsingArc-eager shift-reduce parsing (Nivre, 2003)
Parsing Methods
Parsing Example
Stack Buffer Arcs
[did, see]S [ ]B { who OBJ � did,did SBJ�! you,did VG�! see }
Bare-Bones Dependency Parsing 21(30)
Right-arcVG
![Page 191: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/191.jpg)
Transition-Based ParsingArc-eager shift-reduce parsing (Nivre, 2003)
Parsing Methods
Parsing Example
Stack Buffer Arcs
[did, you]S [see]B { who OBJ � did,did SBJ�! you }
did VG�! see }
Bare-Bones Dependency Parsing 21(30)
Right-arcSBJ
![Page 192: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/192.jpg)
Transition-Based ParsingArc-eager shift-reduce parsing (Nivre, 2003)
Parsing Methods
Parsing Example
Stack Buffer Arcs
[did, you]S [see]B { who OBJ � did,did SBJ�! you }
did VG�! see }
Bare-Bones Dependency Parsing 21(30)
Right-arcSBJ
Choose action w/best classifier score100k - 1M features
![Page 193: Log-Linear Models - Northeastern University · 2016-04-01 · Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120—Spring 2016 Northeastern University](https://reader035.vdocuments.mx/reader035/viewer/2022062922/5f09e38a7e708231d428fc0c/html5/thumbnails/193.jpg)
Transition-Based ParsingArc-eager shift-reduce parsing (Nivre, 2003)
Parsing Methods
Parsing Example
Stack Buffer Arcs
[did, you]S [see]B { who OBJ � did,did SBJ�! you }
did VG�! see }
Bare-Bones Dependency Parsing 21(30)
Right-arcSBJ
Choose action w/best classifier score100k - 1M features
Very fast linear-time performanceWSJ 23 (2k sentences) in 3 s