Improved Inference for Unlexicalized Parsing

Slav Petrov and Dan Klein

Unlexicalized Parsing

Hierarchical, adaptive refinement:

1,140 nonterminal symbols
531,200 rewrites
1621 min parsing time
91.2 F1 on the dev set (1,600 sentences)

[Petrov et al. '06]

(Figure: hierarchical refinement of the DT tag: DT → DT1 DT2 → DT1…DT4 → DT1…DT8. Parsing with the fully refined grammar: 1621 min.)

Coarse-to-Fine Parsing [Goodman '97, Charniak & Johnson '05]

(Diagram: the coarse treebank grammar — NP … VP — parses the sentence and prunes the chart; the refined grammar — lexicalized (NP-dog, NP-cat, NP-apple, VP-run, …) or annotated (NP-17, NP-12, NP-1, VP-6, VP-31, …) — then parses the pruned chart. How should the pruning decision be made?)

For each chart item X[i,j], compute posterior probability:
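Concretely, the posterior of a coarse item X spanning words i..j of sentence w is the standard inside/outside quantity

\[
p(X, i, j \mid w) \;=\; \frac{P_{\mathrm{IN}}(X, i, j)\; P_{\mathrm{OUT}}(X, i, j)}{P(w)} ,
\]

and the item is pruned, together with all refined symbols projecting to it, whenever this posterior falls below the threshold.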

E.g., consider the span 5 to 12:

coarse chart:  … QP NP VP …
refined chart: the split versions of each surviving symbol
Prune every item whose posterior is below the threshold.

Result: 1621 min → 111 min (no search error)

Multilevel Coarse-to-Fine Parsing [Charniak et al. '06]

Add more rounds of pre-parsing with increasingly coarse grammars: above the treebank symbols (NP … VP) sits an even coarser grammar whose single symbol X covers A, B, …, and below them the refined grammar (NP-dog, NP-cat, NP-apple, VP-run, …).

But what do grammars coarser than X-bar look like?

Hierarchical Pruning

Consider again the span 5 to 12:

coarse:          … QP NP VP …
split in two:    … QP1 QP2 NP1 NP2 VP1 VP2 …
split in four:   … QP1 QP2 QP3 QP4 NP1 NP2 NP3 NP4 VP1 VP2 VP3 VP4 …
split in eight:  … (and so on)
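A toy sketch of this hierarchical pruning scheme for a single span (Python; the threshold, posterior values, and symbol-naming scheme are invented for illustration — in the parser the posteriors come from inside/outside passes with the grammar at each refinement level):

```python
# Toy illustration of hierarchical pruning for one span (e.g. words 5 to 12).
THRESHOLD = 0.01

# Posteriors of chart items for this span, one dict per refinement level
# (made-up numbers; real values come from inside/outside passes).
posteriors_per_level = [
    {"QP": 0.002, "NP": 0.4, "VP": 0.2},                               # coarse
    {"QP1": 0.001, "QP2": 0.001, "NP1": 0.3, "NP2": 0.1,
     "VP1": 0.15, "VP2": 0.004},                                       # split in two
    {"NP11": 0.2, "NP12": 0.1, "NP21": 0.08, "NP22": 0.01,
     "VP11": 0.1, "VP12": 0.03},                                       # split in four
]

def parent_symbol(symbol):
    """Projection to the previous level: drop the last split digit
    (NP12 -> NP1 -> NP). Purely illustrative naming."""
    return symbol[:-1] if symbol[-1].isdigit() else symbol

allowed = None  # no constraints at the coarsest level
for level, posteriors in enumerate(posteriors_per_level):
    surviving = set()
    for symbol, posterior in posteriors.items():
        # Skip refined symbols whose coarser projection was already pruned.
        if allowed is not None and parent_symbol(symbol) not in allowed:
            continue
        if posterior >= THRESHOLD:
            surviving.add(symbol)
    print(f"level {level}: keep {sorted(surviving)}")
    allowed = surviving
```

A symbol at a finer level is only ever scored if its projection survived the previous pass, so pruning decisions compound across levels.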

Intermediate Grammars

X-Bar = G0, G1, G2, G3, G4, G5, G6 = G: learning already produces a sequence of increasingly refined grammars (DT → DT1 DT2 → DT1…DT4 → DT1…DT8), so each one can serve as a coarse grammar for the next pass.

Parsing time: 1621 min → 111 min → 35 min (no search error)

State Drift (DT tag)

(Figure: during hierarchical EM training from G1 through G6, the determiners assigned to each DT substate — some, this, that, the, these, That, This — drift from one refinement level to the next, so an intermediate learned grammar is not simply a coarser version of the final grammar G.)

Projected Grammars

Keep the hierarchy X-Bar = G0 … G6 = G, but obtain each coarse level by projecting the final grammar: π0(G), π1(G), π2(G), π3(G), π4(G), π5(G), and finally G itself.

Estimating Projected Grammars

Nonterminals?

Nonterminals in G:     S0, S1, NP0, NP1, VP0, VP1, …
Nonterminals in π(G):  S, NP, VP

The projection is easy: drop the subcategory index, and every rule in G maps onto a rule in π(G).

Estimating Projected Grammars

Rules?

Rules in G:
S1 → NP1 VP1  0.20      S2 → NP1 VP1  0.11
S1 → NP1 VP2  0.12      S2 → NP1 VP2  0.05
S1 → NP2 VP1  0.02      S2 → NP2 VP1  0.08
S1 → NP2 VP2  0.03      S2 → NP2 VP2  0.12

Rule in π(G):
S → NP VP   ????

Estimate from the treebank?

Estimating Projected Grammars [Corazza & Satta '06]

Rules in G:
S1 → NP1 VP1  0.20      S2 → NP1 VP1  0.11
S1 → NP1 VP2  0.12      S2 → NP1 VP2  0.05
S1 → NP2 VP1  0.02      S2 → NP2 VP1  0.08
S1 → NP2 VP2  0.03      S2 → NP2 VP2  0.12

Rule in π(G), estimated from the infinite tree distribution generated by G:
S → NP VP   0.56
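In formula form, with c(A) denoting the expected count of refined symbol A under the infinite tree distribution generated by G, the estimate can be written as

\[
P_{\pi(G)}\bigl(Y \rightarrow Z\,W\bigr) \;=\;
\frac{\sum_{A \rightarrow B\,C \,\in\, \pi^{-1}(Y \rightarrow Z\,W)} c(A)\, P_G(A \rightarrow B\,C)}
     {\sum_{A \,\in\, \pi^{-1}(Y)} c(A)} .
\]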

Estimating Projected Grammars

Calculating expectations:

Nonterminals: ck(X) is the expected count of X in trees up to depth k; the fixed point is reached within 25 iterations (a few seconds).
Rules: the expected count of a rule then follows from the expected count of its parent symbol times the rule's probability.
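A minimal, self-contained sketch of the whole estimate (Python; the toy grammar, symbol naming, and helper names are illustrative and not taken from the Berkeley parser): compute expected nonterminal counts by fixed-point iteration, then average refined rule probabilities weighted by those counts.

```python
from collections import defaultdict

# Toy refined grammar: (parent, (children...)) -> probability.
# Symbols like "S_1" project to "S" by dropping the subscript.
refined_rules = {
    ("S_0", ("NP_0", "VP_0")): 0.6, ("S_0", ("NP_1", "VP_1")): 0.4,
    ("S_1", ("NP_0", "VP_1")): 0.7, ("S_1", ("NP_1", "VP_0")): 0.3,
    ("NP_0", ("dog",)): 1.0, ("NP_1", ("cat",)): 1.0,
    ("VP_0", ("runs",)): 1.0, ("VP_1", ("eats",)): 1.0,
}
root_dist = {"S_0": 0.5, "S_1": 0.5}   # distribution over refined root symbols

def project(symbol):
    """pi: strip the latent annotation (S_1 -> S); terminals map to themselves."""
    return symbol.split("_")[0]

def expected_counts(rules, root_dist, iterations=25):
    """c_k(X): expected number of occurrences of X in a tree generated by the
    grammar, computed to depth k by fixed-point iteration (converges for
    proper PCFGs, here within 25 iterations)."""
    counts = defaultdict(float, root_dist)
    for _ in range(iterations):
        new = defaultdict(float, root_dist)
        for (parent, children), prob in rules.items():
            for child in children:
                new[child] += counts[parent] * prob
        counts = new
    return counts

def project_rules(rules, counts):
    """Projected rule probabilities: average the refined rule probabilities,
    weighted by the expected count of the refined parent symbol."""
    numer = defaultdict(float)   # mass for each projected rule
    denom = defaultdict(float)   # total mass for each projected parent
    for (parent, children), prob in rules.items():
        coarse = (project(parent), tuple(project(c) for c in children))
        numer[coarse] += counts[parent] * prob
        denom[project(parent)] += counts[parent] * prob
    return {rule: mass / denom[rule[0]] for rule, mass in numer.items()}

counts = expected_counts(refined_rules, root_dist)
for rule, p in sorted(project_rules(refined_rules, counts).items()):
    print(rule, round(p, 3))
```

Weighting by expected counts under the tree distribution of G, rather than by raw treebank frequencies, is what makes the projected grammar match G itself.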

Hierarchical pruning with the projected grammars: 1621 min → 111 min → 35 min → 15 min (no search error)

Parsing times

Share of total parsing time per level: X-Bar = G0: 60%, G1: 12%, G2: 7%, G3: 6%, G4: 6%, G5: 5%, G = G6: 4%.

(Figures: bracket posteriors after the G0 pass, after the G1 pass, in the final chart, and for the best tree.)

Parse Selection

Computing the most likely unsplit tree is NP-hard. Options: settle for the best derivation, rerank an n-best list, or use an alternative objective function.

(Figure: a single unsplit parse corresponds to many derivations under the refined grammar, so ranking by derivation probability and ranking by parse probability can prefer different trees.)
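The reason the two rankings can disagree: the probability of an unsplit parse T sums the probabilities of all refined derivations d that project onto it,

\[
P(T \mid w) \;=\; \sum_{d\,:\;\pi(d) = T} P(d \mid w),
\]

so the single highest-scoring derivation may belong to a parse whose total mass is smaller than that of a competitor with many medium-probability derivations.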

Parse Risk Minimization

Expected loss according to our beliefs [Titov & Henderson '06]:
  T_T: true tree
  T_P: predicted tree
  L: loss function (0/1, precision, recall, F1)

Use an n-best candidate list and approximate the expectation with samples.
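Spelled out, the minimum-risk prediction is

\[
T_P^{*} \;=\; \operatorname*{argmin}_{T_P}\; \mathbb{E}_{T_T}\bigl[L(T_P, T_T)\bigr]
        \;=\; \operatorname*{argmin}_{T_P}\; \sum_{T_T} P(T_T \mid w)\, L(T_P, T_T),
\]

where both the argmin over T_P and the expectation over T_T are restricted to the sampled n-best candidates.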

Reranking Results

Objective               Precision   Recall   F1     Exact
BEST DERIVATION
Viterbi Derivation         89.6       89.4    89.5   37.4
Exact (non-sampled)        90.8       90.8    90.8   41.7
Exact/F1 (oracle)          95.3       94.4    95.0   63.9
RERANKING
Precision (sampled)        91.1       88.1    89.6   21.4
Recall (sampled)           88.2       91.3    89.7   21.5
F1 (sampled)               90.2       89.3    89.8   27.2
Exact (sampled)            89.5       89.5    89.5   25.8

Dynamic Programming

Approximate the posterior parse distribution variationally [Matsuzaki et al. '05].
Maximize the number of expected correct rules, à la [Goodman '98].
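For the rule-based objectives, each anchored unsplit rule is scored by its posterior, summing out the latent subcategories (a sketch of the quantity involved; the notation is assumed rather than copied from the slides):

\[
r(A \rightarrow B\,C,\, i, k, j) \;=\;
\frac{\sum_{x,y,z} P_{\mathrm{OUT}}(A_x, i, j)\, P(A_x \rightarrow B_y\, C_z)\, P_{\mathrm{IN}}(B_y, i, k)\, P_{\mathrm{IN}}(C_z, k, j)}{P(w)} .
\]

Max-Rule-Sum selects the tree maximizing the sum of these scores, Max-Rule-Product the product; both are computable with dynamic programming.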

Dynamic Programming Results

Objective               Precision   Recall   F1     Exact
BEST DERIVATION
Viterbi Derivation         89.6       89.4    89.5   37.4
DYNAMIC PROGRAMMING
Variational                90.7       90.9    90.8   41.4
Max-Rule-Sum               90.5       91.3    90.9   40.4
Max-Rule-Product           91.2       91.1    91.2   41.4

Final Results (Efficiency)

Berkeley Parser: 15 min, 91.2 F1 (implemented in Java)
Charniak & Johnson '05 parser: 19 min, 90.7 F1 (implemented in C)

Final Results (Accuracy)

                                          ≤ 40 words F1   all F1
ENG  Charniak & Johnson '05 (generative)       90.1         89.6
     This Work                                 90.6         90.1
     Charniak & Johnson '05 (reranked)         92.0         91.4
GER  Dubey '05                                 76.3          –
     This Work                                 80.8         80.1
CHN  Chiang et al. '02                         80.0         76.6
     This Work                                 86.3         83.4

Conclusions

Hierarchical coarse-to-fine inference: projections and marginalization.
Multi-lingual unlexicalized parsing.

Thank You!

Parser available at http://nlp.cs.berkeley.edu
