Improved Inference for Unlexicalized Parsing

Slav Petrov and Dan Klein

Unlexicalized Parsing

Hierarchical, adaptive refinement:

1,140 nonterminal symbols
531,200 rewrites
1621 min parsing time
91.2 F1 on the dev set (1,600 sentences)

[Petrov et al. '06]

(Figure: hierarchical refinement of the DT tag: DT → DT1 DT2 → DT1…DT4 → DT1…DT8. Parsing with the fully refined grammar: 1621 min.)

Coarse-to-Fine Parsing [Goodman '97, Charniak & Johnson '05]

(Diagram: the coarse treebank grammar — NP … VP — parses the sentence and prunes the chart; the refined grammar — lexicalized (NP-dog, NP-cat, NP-apple, VP-run, …) or annotated (NP-17, NP-12, NP-1, VP-6, VP-31, …) — then parses the pruned chart. How should the pruning decision be made?)

For each chart item X[i,j], compute posterior probability:
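Concretely, the posterior of a coarse item X spanning words i..j of sentence w is the standard inside/outside quantity

\[
p(X, i, j \mid w) \;=\; \frac{P_{\mathrm{IN}}(X, i, j)\; P_{\mathrm{OUT}}(X, i, j)}{P(w)} ,
\]

and the item is pruned, together with all refined symbols projecting to it, whenever this posterior falls below the threshold.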

E.g., consider the span 5 to 12:

coarse chart:  … QP NP VP …
refined chart: the split versions of each surviving symbol
Prune every item whose posterior is below the threshold.

Result: 1621 min → 111 min (no search error)

Multilevel Coarse-to-Fine Parsing [Charniak et al. '06]

Add more rounds of pre-parsing with increasingly coarse grammars: above the treebank symbols (NP … VP) sits an even coarser grammar whose single symbol X covers A, B, …, and below them the refined grammar (NP-dog, NP-cat, NP-apple, VP-run, …).

But what do grammars coarser than X-bar look like?

Hierarchical Pruning

Consider again the span 5 to 12:

coarse:          … QP NP VP …
split in two:    … QP1 QP2 NP1 NP2 VP1 VP2 …
split in four:   … QP1 QP2 QP3 QP4 NP1 NP2 NP3 NP4 VP1 VP2 VP3 VP4 …
split in eight:  … (and so on)
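A toy sketch of this hierarchical pruning scheme for a single span (Python; the threshold, posterior values, and symbol-naming scheme are invented for illustration — in the parser the posteriors come from inside/outside passes with the grammar at each refinement level):

```python
# Toy illustration of hierarchical pruning for one span (e.g. words 5 to 12).
THRESHOLD = 0.01

# Posteriors of chart items for this span, one dict per refinement level
# (made-up numbers; real values come from inside/outside passes).
posteriors_per_level = [
    {"QP": 0.002, "NP": 0.4, "VP": 0.2},                               # coarse
    {"QP1": 0.001, "QP2": 0.001, "NP1": 0.3, "NP2": 0.1,
     "VP1": 0.15, "VP2": 0.004},                                       # split in two
    {"NP11": 0.2, "NP12": 0.1, "NP21": 0.08, "NP22": 0.01,
     "VP11": 0.1, "VP12": 0.03},                                       # split in four
]

def parent_symbol(symbol):
    """Projection to the previous level: drop the last split digit
    (NP12 -> NP1 -> NP). Purely illustrative naming."""
    return symbol[:-1] if symbol[-1].isdigit() else symbol

allowed = None  # no constraints at the coarsest level
for level, posteriors in enumerate(posteriors_per_level):
    surviving = set()
    for symbol, posterior in posteriors.items():
        # Skip refined symbols whose coarser projection was already pruned.
        if allowed is not None and parent_symbol(symbol) not in allowed:
            continue
        if posterior >= THRESHOLD:
            surviving.add(symbol)
    print(f"level {level}: keep {sorted(surviving)}")
    allowed = surviving
```

A symbol at a finer level is only ever scored if its projection survived the previous pass, so pruning decisions compound across levels.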

Intermediate Grammars

X-Bar = G0, G1, G2, G3, G4, G5, G6 = G: learning already produces a sequence of increasingly refined grammars (DT → DT1 DT2 → DT1…DT4 → DT1…DT8), so each one can serve as a coarse grammar for the next pass.

Parsing time: 1621 min → 111 min → 35 min (no search error)

State Drift (DT tag)

(Figure: during hierarchical EM training from G1 through G6, the determiners assigned to each DT substate — some, this, that, the, these, That, This — drift from one refinement level to the next, so an intermediate learned grammar is not simply a coarser version of the final grammar G.)

Projected Grammars

Keep the hierarchy X-Bar = G0 … G6 = G, but obtain each coarse level by projecting the final grammar: π0(G), π1(G), π2(G), π3(G), π4(G), π5(G), and finally G itself.

Estimating Projected Grammars

Nonterminals?

Nonterminals in G:     S0, S1, NP0, NP1, VP0, VP1, …
Nonterminals in π(G):  S, NP, VP

The projection is easy: drop the subcategory index, and every rule in G maps onto a rule in π(G).

Estimating Projected Grammars

Rules?

Rules in G:
S1 → NP1 VP1  0.20      S2 → NP1 VP1  0.11
S1 → NP1 VP2  0.12      S2 → NP1 VP2  0.05
S1 → NP2 VP1  0.02      S2 → NP2 VP1  0.08
S1 → NP2 VP2  0.03      S2 → NP2 VP2  0.12

Rule in π(G):
S → NP VP   ????

Estimate from the treebank?

Estimating Projected Grammars [Corazza & Satta '06]

Rules in G:
S1 → NP1 VP1  0.20      S2 → NP1 VP1  0.11
S1 → NP1 VP2  0.12      S2 → NP1 VP2  0.05
S1 → NP2 VP1  0.02      S2 → NP2 VP1  0.08
S1 → NP2 VP2  0.03      S2 → NP2 VP2  0.12

Rule in π(G), estimated from the infinite tree distribution generated by G:
S → NP VP   0.56
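In formula form, with c(A) denoting the expected count of refined symbol A under the infinite tree distribution generated by G, the estimate can be written as

\[
P_{\pi(G)}\bigl(Y \rightarrow Z\,W\bigr) \;=\;
\frac{\sum_{A \rightarrow B\,C \,\in\, \pi^{-1}(Y \rightarrow Z\,W)} c(A)\, P_G(A \rightarrow B\,C)}
     {\sum_{A \,\in\, \pi^{-1}(Y)} c(A)} .
\]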

Estimating Projected Grammars

Calculating expectations:

Nonterminals: ck(X) is the expected count of X in trees up to depth k; the fixed point is reached within 25 iterations (a few seconds).
Rules: the expected count of a rule then follows from the expected count of its parent symbol times the rule's probability.
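A minimal, self-contained sketch of the whole estimate (Python; the toy grammar, symbol naming, and helper names are illustrative and not taken from the Berkeley parser): compute expected nonterminal counts by fixed-point iteration, then average refined rule probabilities weighted by those counts.

```python
from collections import defaultdict

# Toy refined grammar: (parent, (children...)) -> probability.
# Symbols like "S_1" project to "S" by dropping the subscript.
refined_rules = {
    ("S_0", ("NP_0", "VP_0")): 0.6, ("S_0", ("NP_1", "VP_1")): 0.4,
    ("S_1", ("NP_0", "VP_1")): 0.7, ("S_1", ("NP_1", "VP_0")): 0.3,
    ("NP_0", ("dog",)): 1.0, ("NP_1", ("cat",)): 1.0,
    ("VP_0", ("runs",)): 1.0, ("VP_1", ("eats",)): 1.0,
}
root_dist = {"S_0": 0.5, "S_1": 0.5}   # distribution over refined root symbols

def project(symbol):
    """pi: strip the latent annotation (S_1 -> S); terminals map to themselves."""
    return symbol.split("_")[0]

def expected_counts(rules, root_dist, iterations=25):
    """c_k(X): expected number of occurrences of X in a tree generated by the
    grammar, computed to depth k by fixed-point iteration (converges for
    proper PCFGs, here within 25 iterations)."""
    counts = defaultdict(float, root_dist)
    for _ in range(iterations):
        new = defaultdict(float, root_dist)
        for (parent, children), prob in rules.items():
            for child in children:
                new[child] += counts[parent] * prob
        counts = new
    return counts

def project_rules(rules, counts):
    """Projected rule probabilities: average the refined rule probabilities,
    weighted by the expected count of the refined parent symbol."""
    numer = defaultdict(float)   # mass for each projected rule
    denom = defaultdict(float)   # total mass for each projected parent
    for (parent, children), prob in rules.items():
        coarse = (project(parent), tuple(project(c) for c in children))
        numer[coarse] += counts[parent] * prob
        denom[project(parent)] += counts[parent] * prob
    return {rule: mass / denom[rule[0]] for rule, mass in numer.items()}

counts = expected_counts(refined_rules, root_dist)
for rule, p in sorted(project_rules(refined_rules, counts).items()):
    print(rule, round(p, 3))
```

Weighting by expected counts under the tree distribution of G, rather than by raw treebank frequencies, is what makes the projected grammar match G itself.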

Hierarchical pruning with the projected grammars: 1621 min → 111 min → 35 min → 15 min (no search error)

Parsing times

Share of total parsing time per level: X-Bar = G0: 60%, G1: 12%, G2: 7%, G3: 6%, G4: 6%, G5: 5%, G = G6: 4%.

(Figures: bracket posteriors after the G0 pass, after the G1 pass, in the final chart, and for the best tree.)

Parse Selection

Computing the most likely unsplit tree is NP-hard. Options: settle for the best derivation, rerank an n-best list, or use an alternative objective function.

(Figure: a single unsplit parse corresponds to many derivations under the refined grammar, so ranking by derivation probability and ranking by parse probability can prefer different trees.)
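The reason the two rankings can disagree: the probability of an unsplit parse T sums the probabilities of all refined derivations d that project onto it,

\[
P(T \mid w) \;=\; \sum_{d\,:\;\pi(d) = T} P(d \mid w),
\]

so the single highest-scoring derivation may belong to a parse whose total mass is smaller than that of a competitor with many medium-probability derivations.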

Parse Risk Minimization

Expected loss according to our beliefs [Titov & Henderson '06]:
  T_T: true tree
  T_P: predicted tree
  L: loss function (0/1, precision, recall, F1)

Use an n-best candidate list and approximate the expectation with samples.
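Spelled out, the minimum-risk prediction is

\[
T_P^{*} \;=\; \operatorname*{argmin}_{T_P}\; \mathbb{E}_{T_T}\bigl[L(T_P, T_T)\bigr]
        \;=\; \operatorname*{argmin}_{T_P}\; \sum_{T_T} P(T_T \mid w)\, L(T_P, T_T),
\]

where both the argmin over T_P and the expectation over T_T are restricted to the sampled n-best candidates.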

Reranking Results

Objective               Precision   Recall   F1     Exact
BEST DERIVATION
Viterbi Derivation         89.6       89.4    89.5   37.4
Exact (non-sampled)        90.8       90.8    90.8   41.7
Exact/F1 (oracle)          95.3       94.4    95.0   63.9
RERANKING
Precision (sampled)        91.1       88.1    89.6   21.4
Recall (sampled)           88.2       91.3    89.7   21.5
F1 (sampled)               90.2       89.3    89.8   27.2
Exact (sampled)            89.5       89.5    89.5   25.8

Dynamic Programming

Approximate the posterior parse distribution variationally [Matsuzaki et al. '05].
Maximize the number of expected correct rules, à la [Goodman '98].
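For the rule-based objectives, each anchored unsplit rule is scored by its posterior, summing out the latent subcategories (a sketch of the quantity involved; the notation is assumed rather than copied from the slides):

\[
r(A \rightarrow B\,C,\, i, k, j) \;=\;
\frac{\sum_{x,y,z} P_{\mathrm{OUT}}(A_x, i, j)\, P(A_x \rightarrow B_y\, C_z)\, P_{\mathrm{IN}}(B_y, i, k)\, P_{\mathrm{IN}}(C_z, k, j)}{P(w)} .
\]

Max-Rule-Sum selects the tree maximizing the sum of these scores, Max-Rule-Product the product; both are computable with dynamic programming.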

Dynamic Programming Results

Objective               Precision   Recall   F1     Exact
BEST DERIVATION
Viterbi Derivation         89.6       89.4    89.5   37.4
DYNAMIC PROGRAMMING
Variational                90.7       90.9    90.8   41.4
Max-Rule-Sum               90.5       91.3    90.9   40.4
Max-Rule-Product           91.2       91.1    91.2   41.4

Final Results (Efficiency)

Berkeley Parser: 15 min, 91.2 F1 (implemented in Java)
Charniak & Johnson '05 parser: 19 min, 90.7 F1 (implemented in C)

Final Results (Accuracy)

                                          ≤ 40 words F1   all F1
ENG  Charniak & Johnson '05 (generative)       90.1         89.6
     This Work                                 90.6         90.1
     Charniak & Johnson '05 (reranked)         92.0         91.4
GER  Dubey '05                                 76.3          –
     This Work                                 80.8         80.1
CHN  Chiang et al. '02                         80.0         76.6
     This Work                                 86.3         83.4

Conclusions

Hierarchical coarse-to-fine inference: projections and marginalization.
Multi-lingual unlexicalized parsing.

Thank You!

Parser available at http://nlp.cs.berkeley.edu
