University of Alberta, Nov 30, 2006
Inversion Transduction Grammar with Linguistic Constraints
Colin Cherry
University of Alberta
Outline
• Bitext and Bitext Parsing
• Inversion Transduction Grammar (ITG)
• ITG with Linguistic Constraints
• Discriminative ITG with Linguistic Features
• Other Projects
Statistical Machine Translation
• Input:
– Source language sentence E
• Goal:
– Produce a well-formed target language sentence F with the same meaning as E
• Process:
– Decoding: search for an operation sequence O that transforms E into F
– Weights on individual operations are determined empirically from examples of translation
Bitext
• Valuable resource for training and testing statistical machine translation systems
• Large-scale examples of translation
• Needs analysis to determine small-scale operations that generalize to unseen sentences
[Figure: "Text in English" beside "Same text, in French"]
Word Alignment
• Given a sentence and its translation, find the word-to-word connections
the minister in charge of the Canadian Wheat Board
le ministre chargé de la Commission Canadienne du blé
Word Alignment
• Given a sentence and its translation, find the word-to-word connections
• Link: a single word-to-word connection
the minister in charge of the Canadian Wheat Board
le ministre chargé de la Commission Canadienne du blé
Given a Word Alignment
• Extract bilingual phrase pairs for phrasal SMT (Koehn et al. 2003)
• Add in a parse tree and:
– Extract treelet pairs for dependency translation (Quirk et al. 2005)
– Extract rules for a tree transducer (Galley et al. 2004)
• Other fun things:
– Train monolingual paraphrasers (Quirk et al. 2004, Callison-Burch et al. 2005)
Bitext Parsing
• Assume a context-free grammar generates two languages at once
• Like joint models, but position of words in both languages is controlled by grammar
Monolingual Parsing
[Parse tree for "he always verbs the adjective noun": non-terminals (S, NP, VP, Adv, V, Det, Adj, N) over terminal words; example production: NP → Adj N]
Another view
[Top-down derivation of S over "he always verbs the adjective noun"]
VP → V NP
S → NP VP
Bitext Parsing is in 2D
[2D parse chart: French "il verbe toujours le nom adjectif" along one axis, English "he always verbs the adjective noun" along the other, with non-terminals (Adj, NP, Adv, V, Det, N) spanning blocks in both dimensions]
Why Bitext Parsing?
• Established polynomial algorithms
• Flexible framework, easy to add info:
– Parse given an alignment
– Align given a parse (this work)
• Discoveries can be ported to parser-based decoders (Zens et al. 2004, Melamed 2004)
• Advances in parsing can be ported to word alignment
Outline
• Bitext and Bitext Parsing
• Inversion Transduction Grammar (ITG)
• ITG with Linguistic Constraints
• Discriminative ITG with Linguistic Features
• Other Projects
Inversion Transduction Grammar
• Introduced by Wu (1997)
– Transduction:
• N → noun / nom
– Inversion:
• NP → [Det NP]
• NP → <Adj N>
[Figure: N rewriting to "noun" and "nom"; NP over Det, Adj, N with straight vs inverted child order]
Binary Bracketing
A → [A A]
A → <A A>
A → e/f
• No linguistic meaning to “A”
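To make the bracketing grammar concrete, here is a toy sketch (all names mine, not from the talk) of the bitext-parsing DP it induces: find the best one-to-one alignment reachable by A → [A A] | <A A> | e/f, under the simplifying assumptions that the sentences have equal length and every word is aligned.

```python
from functools import lru_cache

def best_itg_alignment(score):
    """Toy bitext-parsing DP for the bracketing ITG A -> [A A] | <A A> | e/f.
    Finds the best one-to-one alignment (every word aligned, |E| == |F|)
    reachable by straight/inverted binary splits. score[i][k] is the
    link score for English word i and French word k. O(n^4)."""
    n = len(score)

    @lru_cache(maxsize=None)
    def best(i, j, k, l):
        # English span [i, j) paired with French span [k, l), equal sizes.
        if j - i == 1:
            return score[i][k], ((i, k),)
        top = (float("-inf"), ())
        for m in range(i + 1, j):
            size = m - i
            # Straight: left English half pairs with left French half.
            s1, a1 = best(i, m, k, k + size)
            s2, a2 = best(m, j, k + size, l)
            if s1 + s2 > top[0]:
                top = (s1 + s2, a1 + a2)
            # Inverted: left English half pairs with *right* French half.
            s1, a1 = best(i, m, l - size, l)
            s2, a2 = best(m, j, k, l - size)
            if s1 + s2 > top[0]:
                top = (s1 + s2, a1 + a2)
        return top

    return best(0, n, 0, n)

# Full reversal is ITG-reachable (an all-inverted tree): all 4 links found.
rev = [[1 if k == 3 - i else 0 for k in range(4)] for i in range(4)]
rev_value, rev_alignment = best_itg_alignment(rev)

# The inside-out permutation (1, 3, 0, 2) is not reachable: at most 2 of
# its 4 links can appear together in any ITG-reachable alignment.
io = [[0] * 4 for _ in range(4)]
for i, k in [(0, 1), (1, 3), (2, 0), (3, 2)]:
    io[i][k] = 1
io_value, _ = best_itg_alignment(io)
```

The two split cases correspond exactly to the straight and inverted productions; real aligners also handle unaligned words and unequal lengths, which this sketch omits.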
Pros and Cons of Bracketing
• Pros:
– Language independent
– Straightforward and fast
– Symbols are minimally restrictive
• Cons:
– Grammar is meaningless
– ITG Constraint
ITG Constraint
[Alignment of "12 are acceptable to the commission Mr Burton fully or in part" with its reordering: an "inside-out" pattern that ITG cannot produce]
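The constraint can be stated directly on permutations: a reordering is ITG-reachable exactly when it avoids the two "inside-out" patterns 2413 and 3142 (Wu 1997). A brute-force checker (function name mine):

```python
def is_itg_permutation(perm):
    """True iff this reordering is reachable by a binary ITG, i.e. it
    contains neither forbidden 'inside-out' pattern, 2413 or 3142
    (Wu 1997). Brute force over all 4-element subsequences, O(n^4)."""
    n = len(perm)
    for a in range(n):
        for b in range(a + 1, n):
            for c in range(b + 1, n):
                for d in range(c + 1, n):
                    vals = (perm[a], perm[b], perm[c], perm[d])
                    # Reduce the 4 values to their relative order (ranks 1-4).
                    ranks = tuple(sorted(vals).index(v) + 1 for v in vals)
                    if ranks in ((2, 4, 1, 3), (3, 1, 4, 2)):
                        return False
    return True
```

Monotone and fully reversed reorderings pass; the inside-out ones are rejected no matter how long the sentence is, since the pattern test looks at every 4-word subsequence.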
Outline
• Bitext and Bitext Parsing
• Inversion Transduction Grammar (ITG)
• ITG with Linguistic Constraints
• Discriminative ITG with Linguistic Features
• Other Projects
Some questions
• Those ITG constraints are kind of scary. How bad are they? Do they ever help?
• Can we inject some linguistics into this otherwise purely syntactic process?
– Linguistic grammar would limit trees that can be built - and therefore limit alignments
Alignment Spaces
• Set of feasible alignments for a sentence pair
• Described by how links interact
– If links don't interact, the problem loses its structure
• Should encourage competition between links (Guidance)
• Should not eliminate correct alignments (Expressiveness)
ITG Space
• Rules out “inside-out” alignments
• Limits how concepts can be re-ordered during translation
Permutation Space
• One-to-one: each word in at most one link
• Allows any permutation of concepts
• Reduces to weighted maximum matching if each link can be scored independently
the tax causes unrest
l’ impôt cause le malaise
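The matching reduction can be sketched with a tiny brute-force solver (names and all link scores below are invented for illustration; real systems use the Hungarian algorithm rather than enumeration):

```python
from itertools import permutations

# Hypothetical link scores for the slide's sentence pair; any pair
# absent from the dict scores 0 and is simply left unlinked.
english = ["the", "tax", "causes", "unrest"]
french = ["l'", "impôt", "cause", "le", "malaise"]
score = {
    ("the", "l'"): 0.9, ("the", "le"): 0.8,
    ("tax", "impôt"): 0.9,
    ("causes", "cause"): 0.9,
    ("unrest", "malaise"): 0.9,
}

def best_matching(english, french, score):
    """Brute-force weighted maximum matching: each word appears in at
    most one link; maximize the summed link scores."""
    best_total, best_links = 0.0, []
    # Try every assignment of English words to distinct French slots.
    for choice in permutations(range(len(french)), len(english)):
        links = [(e, french[j]) for e, j in zip(english, choice)
                 if (e, french[j]) in score]
        total = sum(score[link] for link in links)
        if total > best_total:
            best_total, best_links = total, links
    return best_total, best_links

total, links = best_matching(english, french, score)
```

Here the one-to-one constraint is what makes "the" choose between "l'" and "le": both links score well, but only one can be kept.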
Linguistic source: Dependencies
• Tree structure defines dependencies between words
• Subtrees define contiguous phrases
the minister in charge of the Canadian Wheat Board
Linguistic source: Dependencies
• Tree structure defines dependencies between words
• Subtrees define contiguous phrases
the minister in charge of the Canadian Wheat Board
Phrasal Cohesion
• Syntactic phrases in tree tend to stay together after translation (Fox 2002)
• We can use this idea to constrain an alignment given an English dependency tree
• Shown to improve alignment quality (Lin and Cherry 2003)
Example
the tax causes unrest
l’ impôt cause le malaise
We can rule out the link, even with no one-to-one violation
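The cohesion check can be sketched as follows (representation and names mine): project each subtree of the English dependency tree onto the target side, and make sure no word outside the subtree links into its span.

```python
def cohesion_violations(heads, links):
    """Count phrasal-cohesion violations against a source dependency
    tree: the target positions linked from each source subtree must form
    a span that no word outside the subtree links into.
    heads[i] is the parent of source word i (-1 for the root);
    links is a list of (source_index, target_index) pairs."""
    n = len(heads)
    children = [[] for _ in range(n)]
    for i, h in enumerate(heads):
        if h >= 0:
            children[h].append(i)

    def subtree(i):
        nodes = {i}
        for c in children[i]:
            nodes |= subtree(c)
        return nodes

    violations = 0
    for i in range(n):
        inside = subtree(i)
        targets = [t for s, t in links if s in inside]
        if not targets:
            continue
        lo, hi = min(targets), max(targets)
        # Any outside word linking into [lo, hi] breaks the phrase.
        if any(lo <= t <= hi for s, t in links if s not in inside):
            violations += 1
    return violations

# "the tax causes unrest": the -> tax, tax -> causes, unrest -> causes.
heads = [1, 2, -1, 2]
good = [(0, 0), (1, 1), (2, 2), (3, 4)]   # cohesive alignment
bad = good + [(0, 3)]                      # "the" also links to "le"
```

In the `bad` case the extra link stretches the span of "the" (and of "the tax") across the rest of the French sentence, which is exactly the violation the constraint rules out even though no word is doubly linked on the one-to-one reading above.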
ITG & Dependency• Both limit movement with phrasal cohesion
– ITG: Cohesive in some binary tree
– Dep: Cohesive in provided dependency tree
• Not subspaces of each other
[Examples: "the big red dog" and "the dog ate it", each alignable in one space but not the other]
D-ITG Space
• Force ITG to maintain phrasal cohesion with a provided dependency tree
• Intersects ITG and Dependency spaces
• Adds linguistic dependency tree to ITG parsing
Chart Modification Solution
• Eliminate structures that allow "tax" to invert away from "the"
[Three candidate bracketings of "the tax causes unrest"; those that split "the" from "tax" are eliminated]
Effect on Parser
[2D chart over English "the tax causes unrest" and French "l' impôt cause le malaise": an A span that splits "the" from "tax" is marked invalid (x)]
Effect on Parser
[The same 2D chart after modification: only cohesive A spans remain]
Experimental Setup
• English-French Parliamentary debates
• 500 sentence labeled test set – (Och and Ney, 2003)
• Dependency parses from Minipar
Guidance Test
• Does the space stop incorrect alignments?
• Use a weighted link score built from:
– Bilingual correlations between words
– Relative position of tokens
• Maximize summed link scores in all spaces, check alignment error rate
– AER: Combined precision and recall, lower is better
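For reference, AER as defined by Och and Ney (2003), where S is the set of sure links and P ⊇ S the set of possible links:

```python
def aer(predicted, sure, possible):
    """Alignment Error Rate (Och and Ney 2003), assuming sure is a
    subset of possible: AER = 1 - (|A & S| + |A & P|) / (|A| + |S|).
    Lower is better; 0 means every sure link is found and every
    predicted link is at least possible."""
    a, s, p = set(predicted), set(sure), set(possible)
    return 1.0 - (len(a & s) + len(a & p)) / (len(a) + len(s))

# Toy example: a perfect alignment, then one that misses a sure link.
perfect = aer([(0, 0), (1, 1)], [(0, 0), (1, 1)], [(0, 0), (1, 1)])
missed = aer([(0, 0)], [(0, 0), (1, 1)], [(0, 0), (1, 1)])
```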
Guidance Results
[Bar chart, Alignment Error Rate (y-axis 0-20): Permutation, ITG, D-ITG]
Expressiveness Test
• Given a strong model, does the space hold us back?
• Use a cooked link score from the gold standard:
– Only correct links are given positive scores
– Best space is the unconstrained space
• Maximize summed link scores in all spaces, check recall
Contributions
• Algorithmic:
– Method to inject ITG with linguistic constraints
• Experimental:
– ITG constraints provide guidance, with virtually no loss in expressiveness (French-English)
– Dependency cohesion constraints provide greater guidance, at the cost of some expressiveness
Outline
• Bitext and Bitext Parsing
• Inversion Transduction Grammar (ITG)
• ITG with Linguistic Constraints
• Discriminative ITG with Linguistic Features
• Other Projects
Remaining Problems
• Dependency cohesion stops correct links:
– Parse errors, paraphrase, exceptions
– Would like a soft constraint
• I'm not doing much learning: φ² competitive linking with an ITG search
Soft Constraint
• Invalid spans need not be disallowed
– Instead, the parser could incur a penalty
• Easy to incorporate the penalty into the DP
[Chart over "the tax causes unrest" / "l' impôt cause le malaise": the invalid span now carries a -5 penalty instead of being pruned]
ITG Learning
• Zhang and Gildea 2004, 2005, 2006…
• Expectation Maximization to parameterize a stochastic grammar unsupervised
– Driven by expensive 2D inside-outside
– Not doing much better than I am with φ²
• Meanwhile, EMNLP'05 is happening
– Moore 2005, Taskar et al. 2005
– Suddenly it's okay to use some training data
Discriminative matching (Taskar et al. 05)
[Link score for "causes" ↔ "cause": features φ² = 0.767, DIST = 0.050, LCSR = 0.833, HMM = 0.0, combined with learned weights into a link score of 47.2]
• Max matching finds the alignment that maximizes the sum of link scores
• An entire alignment y can be given a feature vector Φ(y) according to the features of the links in y
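The link score is just a dot product of feature values and learned weights. A minimal sketch; the feature values mirror the slide's "causes"/"cause" example, but the weights here are invented, so the resulting score is illustrative rather than the slide's 47.2:

```python
# Features for one candidate link ("causes", "cause"): phi-squared
# correlation, relative distance, longest-common-subsequence ratio,
# and an HMM posterior. Weights are made up for illustration.
features = {"phi2": 0.767, "DIST": 0.050, "LCSR": 0.833, "HMM": 0.0}
weights = {"phi2": 40.0, "DIST": -10.0, "LCSR": 20.0, "HMM": 5.0}

# One term per feature; the alignment score sums these over its links.
link_score = sum(features[name] * weights[name] for name in features)
```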
Learning objective
• Find weights w such that, for each example i:
w · Φ(yᵢ) - w · Φ(y) ≥ Δ(yᵢ, y) for every wrong answer y
(features, learned weights, structured distance)
• Can formulate as a constrained optimization problem, do max-margin training
• Problem: Exponential number of wrong answers
SVM Struct (Tsochantaridis et al. 2004)
[Loop: start with empty constraints; solve the constrained optimization for w; search for the most violated constraint; add it to the accumulated constraints; repeat]
Theory of constraint generation in constrained optimization guarantees convergence
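The shape of that loop can be sketched as follows; `solve_qp` and `most_violated` are stand-ins of my own naming for the real components (a QP solver over the accumulated constraints, and a search, here a bitext parser, for each example's most violated constraint):

```python
def constraint_generation(examples, solve_qp, most_violated, max_iters=100):
    """Schematic SVM-struct constraint-generation loop (Tsochantaridis
    et al. 2004). Convergence is guaranteed by the theory of constraint
    generation; this sketch just iterates until no example yields a new
    violated constraint."""
    constraints = []                       # start with empty constraints
    w = solve_qp(constraints)
    for _ in range(max_iters):
        new = [c for c in (most_violated(w, ex) for ex in examples)
               if c is not None and c not in constraints]
        if not new:                        # nothing violated: done
            return w
        constraints.extend(new)            # remember all past mistakes
        w = solve_qp(constraints)          # re-optimize over them
    return w

# Toy stand-ins: "w" is just the number of accumulated constraints, and
# each example stops producing violations once w reaches 2.
w = constraint_generation(
    examples=[0, 1],
    solve_qp=lambda cs: len(cs),
    most_violated=lambda w, ex: ("violated", ex) if w < 2 else None,
)
```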
Similarities to Averaged Perceptron
• Online method driven by comparisons of current output to the correct answer
• But:
– Allows a notion of structural distance
– Returns a max-margin solution (with slacks) at each step
– Remembers all of its past mistakes
SVM-ITG
• Can learn ITG parameters discriminatively
• Link productions A → e/f are scored as in discriminative matching
• Non-terminal productions A → [A A] | <A A> are scored with two features:
– Is it inverted?
– Does it cover a span that would usually be illegal?
[Production A → causes/cause scored by its weighted link features (φ², DIST, LCSR, HMM)]
Experimental Setup
• Identical to Taskar et al.:
– 100 training
– 37 development
– 347 test
• Same unsupervised text as before to derive features
– 50k Hansards data
[Bar chart, y-axis 0-16, showing AER, 1-Prec, 1-Rec for the bipartite matching SVM (Permutation) vs the SVM weights with the hard constraint (D-ITG)]
[Bar chart, y-axis 0-16, showing AER, 1-Prec, 1-Rec for the bipartite matching SVM, the SVM weights with the hard constraint, and the ITG SVM with the soft cohesion feature]
Contributions
• Algorithmic:
– Discriminative learning method for ITGs
• Experimental:
– Value of hard constraints is reduced in the presence of a strong link score
– Integrating the constraint as a feature during training can recover the value of the constraints, improving AER & recall
Other Projects
• Applying techniques from SMT to new domains:
– Unsupervised pronoun resolution
• Discriminative structured learning:
– Discriminative parsing
Unsupervised Pronoun Resolution (Cherry and Bergsma, CoNLL'05)
• The president entered the arena with his family.
• Input:
– A pronoun in context, and a list of candidates
– "his family", {arena, president}
• Output: the correct candidate: president
• Big Idea:
– Formulate a generative model, where a candidate generates the pronoun and context; run EM
– Similar to IBM-1: align pronouns to candidates
Pronoun Resolution: Innovations
• Used linguistics to limit the candidate list:
– Binding theory, known noun genders
• Used unambiguous cases to initialize EM
• Re-weighted component models discriminatively with maximum entropy
• End result:
– Within 5% of a supervised system, with the re-weighted model matching supervised performance
Discriminative Parsing (Wang, Cherry, Lizotte and Schuurmans, CoNLL'06)
• Input: segmented Chinese string
• Output: dependency parse tree
• Big Idea:
– Score each link independently, with an SVM weighting features on links (McDonald 2005), but generalize without part-of-speech tags
– Learn a weight for every word pair seen in training
[Dependency tree over "the tax causes unrest"]
Parsing Innovations
• To promote generalization:
– Altered the "large margin" portion of the SVM objective so semantically similar word pairs have similar weights
• Tried two constraint types:
– Local: link scores constrained so links present in the gold standard score higher than those absent
– Global: SVM Struct-style constraint generation
Others in brief
• Dependency treelet decoder (here)
• Sequence tagging:
– Biomedical term recognition
• Highlight gene names, proteins in medical texts
– Character-based syllabification
• Find syllable breaks in written words
Outline
• Bitext and Bitext Parsing
• Inversion Transduction Grammar (ITG)
• ITG with Linguistic Constraints
• Discriminative ITG with Linguistic Features
• Other Projects
Connecting E and F
• One language generates the other
– IBM models (Brown et al. 1993), HMM (Vogel et al. 1996), tree-to-string model (Yamada and Knight 2001)
• Both languages generated simultaneously
– Joint model (Melamed 2000), phrasal joint model (Marcu and Wong 2002)
• S and T generate an alignment
– Conditional model (Cherry and Lin 2003), discriminative models (Taskar et al. 2005, Moore 2005)
Phrases agree, not trees
he ran here quickly
Dependencies state that "ran" is modified by "here" and "quickly" separately
We allow ITG to state that "ran" is modified by "here quickly"
Also tested these additional head constraints
Effect on Parser
[2D chart over "the tax causes unrest" / "l' impôt cause le malaise" with an A span marked invalid (x)]
Custom Grammar Solution
• What trees force "the" and "tax" to stay together?
– Custom recursive grammar
– Same alignment space, canonical tree
[Canonical tree: "the tax" grouped as one unit, then combined with "causes unrest" by unconstrained ITG]
Guidance Results
[Bar chart, Alignment Error Rate (y-axis 0-20): Permutation, ITG, Dep Beam, D-ITG, HD-ITG]
Expressiveness Results
[Bar chart, Recall (y-axis 90-100): Unconstrained, Permutation, ITG, Dep Beam, D-ITG, HD-ITG]
Expressiveness Analysis
• HD-ITG has systematic violations
– Discontinuous constituents (Melamed, 2003)
– Maintains distance to head, which is not always maintained in translation
["Canadian Wheat Board" ↔ "Commission Canadienne du blé": the modifier is separated from its head in French]
Discriminative Alignment
• Alignment can be viewed as multi-class classification
• Input: the sentence pair "the tax causes unrest" / "l' impôt cause le malaise"
• Correct answer: one complete alignment of the pair
• Wrong answers: every other alignment of the pair…
Problem
• Exponential number of incorrect alignments
• One solution:
– Take advantage of properties of the matching algorithm
– Factor constraints
• Doing the same factorization on ITG could be a lot of work; need something more modular
– Averaged perceptron?
– Structured SVM
Final Challenge
• Need gold-standard trees to train on; only have gold-standard alignments
• Versatility of ITG makes this easy:
– Search for the best parse given an alignment
– Select the parse with the fewest cohesion violations and the fewest inversions
Redundancy
• Using A → [A A] | <A A> | e/f:
– Several parses produce the same alignment
– Wu provides a canonical-form grammar
– Creates only one parse per alignment
• Useful for:
– Counting methods like EM
– Detecting arbitrary bracketing decisions
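The redundancy is easy to see by enumeration (code mine): for four words the bracketing grammar has 5 binary trees with 2³ orientation choices each, i.e. 40 derivations, but they collapse to only 22 distinct alignments, and the two missing permutations are exactly the inside-out patterns.

```python
def itg_permutations(n):
    """Distinct permutations derivable by A -> [A A] | <A A> | e/f:
    a straight split keeps the two halves in order, an inverted split
    swaps them. Enumerates all derivations, then collapses duplicates."""
    def derive(lo, hi):
        if hi - lo == 1:
            yield (lo,)
            return
        for m in range(lo + 1, hi):
            for left in derive(lo, m):
                for right in derive(m, hi):
                    yield left + right    # straight: [A A]
                    yield right + left    # inverted: <A A>
    return set(derive(0, n))

reachable = itg_permutations(4)   # 22 of the 24 permutations
```

A canonical-form grammar collapses those 40 derivations to one per alignment, which is what makes counting methods like EM well behaved.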
Results Table
Method Prec Rec AER
Match 79.3 82.7 19.24
ITG 81.8 83.7 17.36
Cohesion 88.8 84.0 13.40
D-ITG 88.8 84.2 13.32
HD-ITG 89.2 84.0 13.15
Guidance Results
[Bar chart, y-axis 0-25: Prec Err, Rec Err, AER for Permutation, ITG, Dep Beam, D-ITG, HD-ITG]
Expressiveness Results
[Bar chart, y-axis 0-6: Recall Error and AER for Permutation, ITG, Dep Beam, D-ITG, HD-ITG]