TRANSCRIPT
Parsing German with Latent Variable Grammars
Slav Petrov and Dan Klein
UC Berkeley
The Game of Designing a Grammar
Annotation refines base treebank symbols to improve the statistical fit of the grammar:
- Parent annotation [Johnson '98]
- Head lexicalization [Collins '99, Charniak '00]
- Automatic clustering?
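As a concrete illustration of the first item, parent annotation refines each node label with the label of its parent, so that (for example) subject NPs under S are distinguished from object NPs under VP. A minimal sketch, assuming trees are `(label, children)` tuples with string leaves (this representation is an illustrative assumption, not the talk's own code):

```python
# A minimal sketch of parent annotation [Johnson '98]: each node label is
# refined with its parent's label, e.g. the NP under an S becomes NP^S.
# Trees are represented as (label, children) tuples; leaves are plain strings.

def parent_annotate(tree, parent="ROOT"):
    label, children = tree
    new_label = f"{label}^{parent}"
    new_children = [
        child if isinstance(child, str) else parent_annotate(child, label)
        for child in children
    ]
    return (new_label, new_children)

tree = ("S", [("NP", ["He"]), ("VP", [("VBD", ["was"]), ("ADJP", ["right"])])])
annotated = parent_annotate(tree)
# The subject NP is now distinguished as NP^S, the VBD as VBD^VP, etc.
```

Automatic clustering, the topic of this talk, replaces such hand-designed refinements with learned latent subcategories.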
Previous Work: Manual Annotation
Manually split categories:
- NP: subject vs. object
- DT: determiners vs. demonstratives
- IN: sentential vs. prepositional

Advantages: fairly compact grammar; linguistic motivations.
Disadvantages: performance leveled out; the annotation is manual.
[Klein & Manning ’03]
Model                   F1
Naïve Treebank Grammar  72.6
Klein & Manning '03     86.3
Previous Work: Automatic Annotation Induction

Label all nodes with latent variables; use the same number k of subcategories for all categories.

Advantages: automatically learned.
Disadvantages: the grammar gets too large; most categories are oversplit while others are undersplit.
[Matsuzaki et al. '05, Prescher '05]
Model                 F1
Klein & Manning '03   86.3
Matsuzaki et al. '05  86.7
[Petrov, Barrett, Thibaux & Klein in ACL '06]
[Petrov & Klein in NAACL '07]
Overview
Learning: Hierarchical Training, Adaptive Splitting, Parameter Smoothing
Inference: Coarse-to-Fine Decoding, Variational Approximation
German Analysis
Learning Latent Annotations

EM algorithm, just like Forward-Backward for HMMs:
- Brackets are known
- Base categories are known
- Only induce subcategories

[Figure: parse tree for "He was right." with latent subcategories X1 ... X7 at the nodes; forward and backward passes over the tree.]
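Before EM can differentiate subcategories, each symbol is split and the rule probabilities are divided among the variants with a small random perturbation to break symmetry (with exact copies, EM would have no gradient to separate them). A hedged sketch of that split step; the rule representation `((parent, left, right), probability)` and the noise level are illustrative assumptions:

```python
# Split every symbol in a binary rule into two subcategories, dividing the
# rule's probability among the 8 variants, each perturbed slightly so EM
# can tell the copies apart. Each parent variant's mass still sums to ~1.

import random

def split_rule(rule, prob, noise=0.01, rng=random):
    parent, left, right = rule
    new_rules = {}
    for i in (0, 1):
        for j in (0, 1):
            for k in (0, 1):
                new_rule = (f"{parent}-{i}", f"{left}-{j}", f"{right}-{k}")
                eps = rng.uniform(-noise, noise)
                # 4 child combinations per parent variant, hence prob / 4
                new_rules[new_rule] = prob / 4 * (1 + eps)
    return new_rules

rules = split_rule(("S", "NP", "VP"), 1.0)
```

EM then re-estimates these perturbed probabilities from the treebank, with brackets and base categories held fixed.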
[Figure: Parsing accuracy (F1, 60-90) vs. total number of grammar symbols (50-1650) for k = 2, 4, 8, 16 subcategories per category, from the starting point up to the limit of computational resources.]
Refinement of the DT tag

[Figure: DT split in one step into DT-1 ... DT-4.]

Hierarchical Refinement of the DT tag

[Figure: DT split gradually into a binary hierarchy of subcategories.]
Hierarchical Estimation Results
[Figure: Parsing accuracy (F1, 74-90) vs. total number of grammar symbols (100-1700), flat vs. hierarchical training.]

Model                  F1
Baseline               87.3
Hierarchical Training  88.4
Refinement of the "," tag

Splitting all categories the same amount is wasteful.

The DT tag revisited: oversplit?
Adaptive Splitting

Want to split complex categories more. Idea: split everything, then roll back the splits which were least useful.
Adaptive Splitting
Evaluate the loss in likelihood from removing each split:

    loss = (data likelihood with the split reversed) / (data likelihood with the split)

There is no loss in accuracy when 50% of the splits are reversed.
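The rollback step can be sketched as follows: score each split by how much the data likelihood drops when it is reversed, then merge back the least useful half. A simplified sketch in log space; the likelihood numbers are illustrative placeholders, not values from the talk:

```python
# Rank splits by the log-likelihood loss incurred by reversing them, then
# roll back the half with the smallest loss (the splits that barely helped).

def rank_splits(split_likelihoods):
    """split_likelihoods: {split: (loglik_with_split, loglik_reversed)}"""
    losses = {
        s: with_split - reversed_  # log-ratio; small = split barely helped
        for s, (with_split, reversed_) in split_likelihoods.items()
    }
    return sorted(losses, key=losses.get)

def rollback_half(split_likelihoods):
    ranked = rank_splits(split_likelihoods)
    return set(ranked[: len(ranked) // 2])  # splits to merge back

likelihoods = {
    "DT": (-1000.0, -1000.5),   # reversing barely hurts -> merge back
    "NP": (-1000.0, -1080.0),   # reversing hurts a lot -> keep
    ",":  (-1000.0, -1001.0),
    "VP": (-1000.0, -1050.0),
}
to_merge = rollback_half(likelihoods)
```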
Adaptive Splitting Results
[Figure: Parsing accuracy (F1, 74-90) vs. total number of grammar symbols (100-1700) for flat training, hierarchical training, and 50% merging.]

Model             F1
Previous          88.4
With 50% Merging  89.5
Number of Phrasal Subcategories

[Figure: bar chart (0-35) of subcategory counts per phrasal category; VP, NP, PP, AP and S are split the most, while rare categories such as CH, CO, DL and ROOT are barely split.]
Number of Lexical Subcategories

[Figure: bar chart (0-35) of subcategory counts per part-of-speech tag; NE, VVFIN, ADJA, NN and ADV are split the most, while rare tags such as PRELAT, PTKNEG and APZR are barely split.]
Smoothing
Heavy splitting can lead to overfitting. Idea: smoothing allows us to pool statistics across subcategories.
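The pooling named in the next slide, linear smoothing, interpolates each subcategory's rule probability with the average over all subcategories of the same base category, so rare subcategories do not overfit. A sketch; the smoothing weight `alpha` is an illustrative value:

```python
# Linear smoothing across subcategories: each p(X_i -> gamma) is pulled
# toward the mean over all subcategories X_1..X_k of the same base
# category X, sharing a little probability mass among the siblings.

def linear_smooth(probs, alpha=0.01):
    """probs: list of p(X_i -> gamma) for subcategories X_1..X_k."""
    mean = sum(probs) / len(probs)
    return [(1 - alpha) * p + alpha * mean for p in probs]

smoothed = linear_smooth([0.8, 0.0, 0.2, 0.0], alpha=0.1)
# every subcategory keeps most of its own estimate but none stays at zero
```

If the input probabilities sum to one, the smoothed ones still do: interpolation with the mean preserves total mass.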
Linear Smoothing
[Figure: Parsing accuracy (F1, 74-90) vs. total number of grammar symbols (100-1100) for flat training, hierarchical training, 50% merging, and 50% merging with smoothing.]

Model           F1
Previous        89.5
With Smoothing  90.7
Result Overview
Coarse-to-Fine Parsing [Goodman '97, Charniak & Johnson '05]

[Figure: the treebank is first parsed with a coarse grammar (NP, VP, ...), the chart is pruned, and the surviving items are parsed with the refined grammar (NP-dog, NP-cat, NP-apple, VP-run, ...; equivalently NP-17, NP-12, NP-1, VP-6, VP-31, ...).]
Hierarchical Pruning
Consider the span 5 to 12:

    coarse:          ... QP NP VP ...
    split in two:    ... QP1 QP2 NP1 NP2 VP1 VP2 ...
    split in four:   ... QP1 QP2 QP3 QP4 NP1 NP2 NP3 NP4 VP1 VP2 VP3 VP4 ...
    split in eight:  ... (and so on) ...
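One pruning pass over such a span can be sketched as: a refined chart item `(span, subcategory)` survives only if its coarser projection had sufficient posterior probability at the previous level. The projection map and threshold below are illustrative assumptions:

```python
# Hierarchical pruning sketch: keep refined (span, category) items only
# where the coarse parse assigned their projection non-negligible
# posterior probability, so whole families of subcategories are skipped.

def prune(refined_items, coarse_posteriors, project, threshold=1e-4):
    return {
        (span, cat)
        for span, cat in refined_items
        if coarse_posteriors.get((span, project(cat)), 0.0) > threshold
    }

project = lambda cat: cat.split("-")[0]          # "NP-1" -> "NP", ...
coarse = {((5, 12), "NP"): 0.6, ((5, 12), "QP"): 1e-7}
refined = {((5, 12), "NP-1"), ((5, 12), "NP-2"),
           ((5, 12), "QP-1"), ((5, 12), "QP-2")}
kept = prune(refined, coarse, project)
# the QP subcategories never enter the refined chart for this span
```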
Intermediate Grammars
X-Bar = G0 → G1 → G2 → G3 → G4 → G5 → G6 = G (learning)

[Figure: the DT tag is refined stage by stage, from DT to DT1, DT2, then DT1 ... DT4, then DT1 ... DT8.]
State Drift (DT tag)

[Figure: across the training stages G1 ... G6 of EM, the DT subcategories drift; clusters over determiners such as "some", "this", "That", "these", "the" and "that" change membership from stage to stage, so the intermediate grammars are not nested refinements of one another.]
Projected Grammars
X-Bar = G0, ..., G6 = G; each projection πi(G) maps the refined grammar G back onto the symbol set of level i, giving π0(G), π1(G), ..., π5(G).
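Such a projection collapses refined rule probabilities onto the coarser symbols: child subcategories are summed out, and parent subcategories are combined weighted by how often each occurs. A hedged sketch; the parent weights here are given as illustrative inputs, whereas the talk computes them from the grammar itself:

```python
# Project a refined binary-rule grammar onto a coarser symbol set by
# summing over child subcategories and weighting parent subcategories
# by their relative frequency.

from collections import defaultdict

def project_grammar(refined_rules, parent_weights, project):
    coarse = defaultdict(float)
    for (parent, left, right), prob in refined_rules.items():
        key = (project(parent), project(left), project(right))
        coarse[key] += parent_weights[parent] * prob
    return dict(coarse)

project = lambda cat: cat.split("-")[0]
refined = {
    ("S-0", "NP-0", "VP-0"): 0.7, ("S-0", "NP-1", "VP-0"): 0.3,
    ("S-1", "NP-0", "VP-0"): 0.5, ("S-1", "NP-1", "VP-0"): 0.5,
}
weights = {"S-0": 0.4, "S-1": 0.6}  # relative frequency of each subcategory
coarse = project_grammar(refined, weights, project)
# all four refined rules collapse onto the single coarse rule S -> NP VP
```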
Bracket Posteriors (after G0)
Bracket Posteriors (after G1)
Bracket Posteriors (Movie) (Final Chart)
Bracket Posteriors (Best Tree)
Parse Selection
Computing the most likely unsplit tree is NP-hard. Options:
- Settle for the best derivation.
- Rerank an n-best list.
- Use an alternative objective function (a variational approximation).
[Figure: derivations of the split grammar with their probabilities; several derivations correspond to the same unsplit parse, whose probability is the sum over its derivations.]
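The contrast the figure draws can be made concrete: the single best derivation of the split grammar need not belong to the most probable unsplit parse, because a parse's probability is the sum over all of its derivations. A sketch with illustrative probabilities:

```python
# Best single derivation vs. most probable parse: each derivation maps
# to an unsplit tree; summing derivation probabilities per tree can
# change the winner.

from collections import defaultdict

derivations = [
    ("tree_A", 0.20), ("tree_A", 0.15), ("tree_A", 0.15),
    ("tree_B", 0.30),
]

best_derivation_tree = max(derivations, key=lambda d: d[1])[0]

tree_probs = defaultdict(float)
for tree, p in derivations:
    tree_probs[tree] += p
best_parse_tree = max(tree_probs, key=tree_probs.get)
# the single best derivation belongs to tree_B, yet tree_A is more probable
```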
Efficiency Results
Berkeley Parser: 15 min (implemented in Java)
Charniak & Johnson '05 Parser: 19 min (implemented in C)
Accuracy Results
Lang  Model                                ≤ 40 words F1   all F1
ENG   Charniak & Johnson '05 (generative)  90.1            89.6
ENG   This Work                            90.6            90.1
GER   Dubey '05                            76.3            -
GER   This Work                            80.8            80.1
CHN   Chiang et al. '02                    80.0            76.6
CHN   This Work                            86.3            83.4
Parsing German Shared Task
Two-pass parsing: determine the constituency structure (F1: 85/94), then assign grammatical functions.
One-pass approach: treat categories + grammatical functions as single labels.
Development Set Results
Shared Task Results
Part-of-speech splits
Linguistic Candy
Conclusions
Split & Merge Learning: Hierarchical Training, Adaptive Splitting, Parameter Smoothing
Hierarchical Coarse-to-Fine Inference: Projections, Marginalization
Multi-lingual Unlexicalized Parsing
Thank You!
The parser is available at http://nlp.cs.berkeley.edu