
Page 1: Learning Accurate, Compact, and Interpretable Tree Annotation

Learning Accurate, Compact, and Interpretable Tree Annotation

Slav Petrov, Leon Barrett, Romain Thibaux, Dan Klein

Pages 2–4: Learning Accurate, Compact, and Interpretable Tree Annotation

The Game of Designing a Grammar

Annotation refines base treebank symbols to improve the statistical fit of the grammar:
- Parent annotation [Johnson ’98]
- Head lexicalization [Collins ’99, Charniak ’00]
- Automatic clustering?
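To make the first of these refinements concrete, here is a minimal sketch of parent annotation in Python; the Tree class is a hypothetical stand-in for a treebank reader, not anything from the talk.

class Tree:
    """Illustrative treebank tree: a label plus children (none for words)."""
    def __init__(self, label, children=()):
        self.label = label
        self.children = list(children)

def parent_annotate(tree, parent=None):
    """Refine each nonterminal with its parent's label [Johnson '98]:
    an NP under S becomes NP^S, an NP under VP becomes NP^VP, so the
    grammar can model subjects and objects with different statistics."""
    if not tree.children:                  # leaf word: leave unchanged
        return tree
    label = tree.label if parent is None else f"{tree.label}^{parent}"
    return Tree(label, [parent_annotate(c, tree.label) for c in tree.children])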

Page 5: Learning Accurate, Compact, and Interpretable Tree Annotation

Previous Work: Manual Annotation

Manually split categories:
- NP: subject vs. object
- DT: determiners vs. demonstratives
- IN: sentential vs. prepositional

Advantages: fairly compact grammar; linguistic motivations.

Disadvantages: performance leveled out; manually annotated.

[Klein & Manning ’03]

Model                    F1
Naïve Treebank Grammar   72.6
Klein & Manning ’03      86.3

Page 6: Learning Accurate, Compact, and Interpretable Tree Annotation

Previous Work: Automatic Annotation Induction

Label all nodes with latent variables; same number k of subcategories for all categories.

Advantages: automatically learned.

Disadvantages: grammar gets too large; most categories are oversplit while others are undersplit.

[Matsuzaki et al. ’05, Prescher ’05]

Model                  F1
Klein & Manning ’03    86.3
Matsuzaki et al. ’05   86.7

Page 7: Learning Accurate, Compact, and Interpretable Tree Annotation

Previous work is complementary.

Manual annotation: allocates splits where needed; very tedious; compact grammar; misses features.

Automatic annotation: splits uniformly; automatically learned; large grammar; captures many features.

This work: allocates splits where needed; automatically learned; compact grammar; captures many features.

Page 8: Learning Accurate, Compact, and Interpretable Tree Annotation

Learning Latent Annotations

EM algorithm:

[Figure: parse tree with latent subcategory variables X1–X7 over the sentence "He was right.", with forward (upward) and backward (downward) passes indicated]

- Brackets are known
- Base categories are known
- Only induce subcategories

Just like Forward-Backward for HMMs; a sketch of the tree-structured passes follows.
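Because the tree structure is observed, the E-step reduces to an inside (upward) pass and an outside (downward) pass over each binary training tree, with k subcategories per node. The table layouts and names below are illustrative assumptions, not the authors' implementation.

import numpy as np

# Assumed parameter tables:
#   rules[(A, B, C)] : (k, k, k) array, P(A_x -> B_y C_z)
#   lex[(A, w)]      : (k,) array,     P(A_x -> w)

class Node:
    """Illustrative tree node: preterminals carry a word, others two children."""
    def __init__(self, label, children=(), word=None):
        self.label, self.children, self.word = label, list(children), word

def inside(node, rules, lex):
    """Upward pass: node.inside[x] = P(words under node | subcategory x)."""
    if node.word is not None:                       # preterminal over a word
        node.inside = lex[(node.label, node.word)]
        return
    left, right = node.children
    inside(left, rules, lex)
    inside(right, rules, lex)
    r = rules[(node.label, left.label, right.label)]
    node.inside = np.einsum('xyz,y,z->x', r, left.inside, right.inside)

def outside(node, rules):
    """Downward pass: node.outside[x] = P(rest of tree, subcategory x)."""
    if node.word is not None:
        return
    left, right = node.children
    r = rules[(node.label, left.label, right.label)]
    left.outside = np.einsum('xyz,x,z->y', r, node.outside, right.inside)
    right.outside = np.einsum('xyz,x,y->z', r, node.outside, left.inside)
    outside(left, rules)
    outside(right, rules)

def subcategory_posteriors(root, rules, lex, k):
    inside(root, rules, lex)
    root.outside = np.ones(k)     # one-hot instead if ROOT is kept unsplit
    outside(root, rules)
    # At any node: P(x | tree) is proportional to inside[x] * outside[x];
    # these posteriors give the expected rule counts for the M-step.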

Page 9: Learning Accurate, Compact, and Interpretable Tree Annotation

Overview

[Figure: parsing accuracy (F1, 65–90) vs. total number of grammar symbols (50–1650) for k = 2, 4, 8, and 16 subcategories per category; k = 16 runs up against the limit of computational resources]

- Hierarchical Training
- Adaptive Splitting
- Parameter Smoothing

Page 10: Learning Accurate, Compact, and Interpretable Tree Annotation

Refinement of the DT tag

[Figure: DT split directly into subcategories DT-1, DT-2, DT-3, DT-4]

Page 11: Learning Accurate, Compact, and Interpretable Tree Annotation

Refinement of the DT tag

[Figure: the DT tag split into subcategories]

Page 12: Learning Accurate, Compact, and Interpretable Tree Annotation

Hierarchical refinement of the DT tag
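Each round of hierarchical training splits every subcategory in two (DT into DT-1 and DT-2, DT-1 into DT-1a and DT-1b, and so on) and re-runs EM. A minimal sketch of the split step, assuming parameters are stored as rows of a (k, V) probability table; the names here are illustrative.

import numpy as np

def split_subcategories(table, noise=0.01, rng=None):
    """Split each of the k subcategories of a (k, V) table P(outcome | A_x)
    in two by duplicating its row, then perturb and renormalize.
    Without the small noise the two halves would stay identical under EM,
    which would have no reason to differentiate them."""
    rng = rng or np.random.default_rng(0)
    doubled = np.repeat(table, 2, axis=0)                       # (2k, V)
    doubled *= 1.0 + noise * rng.uniform(-1.0, 1.0, doubled.shape)
    return doubled / doubled.sum(axis=1, keepdims=True)

# Example: start from the unsplit DT tag (k = 1) and split twice,
# giving four subcategories as pictured above.
dt = np.array([[0.5, 0.3, 0.2]])        # P(word | DT) over a toy vocabulary
dt = split_subcategories(split_subcategories(dt))   # now shape (4, 3)

Binary-rule tables would be split the same way along each of their subcategory axes before EM is re-run.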

Page 13: Learning Accurate, Compact, and Interpretable Tree Annotation

Hierarchical Estimation Results

[Figure: parsing accuracy (F1, 74–90) vs. total number of grammar symbols (100–1700), flat vs. hierarchical training]

Model                  F1
Baseline               87.3
Hierarchical Training  88.4

Page 14: Learning Accurate, Compact, and Interpretable Tree Annotation

Refinement of the “,” tag

Splitting all categories the same amount is wasteful:

[Figure: the “,” tag split into subcategories that are essentially identical]

Page 15: Learning Accurate, Compact, and Interpretable Tree Annotation

The DT tag revisited

Oversplit?

Pages 16–18: Learning Accurate, Compact, and Interpretable Tree Annotation

Adaptive Splitting

Want to split complex categories more. Idea: split everything, then roll back the splits that were least useful.

Page 19: Learning Accurate, Compact, and Interpretable Tree Annotation

Adaptive Splitting

Evaluate the loss in likelihood from removing each split:

loss(split) = P(data | split reversed) / P(data | split)

There is no loss in accuracy when 50% of the splits are reversed; a sketch of the rollback criterion follows.
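A minimal sketch of that rollback criterion (all names illustrative): rank the candidate splits by the likelihood ratio above and merge back the fraction whose reversal costs the least.

def splits_to_reverse(likelihood_ratio, fraction=0.5):
    """likelihood_ratio maps each candidate split to
    P(data | split reversed) / P(data | split).
    Ratios near 1.0 mean the split bought almost no likelihood,
    so those splits are the cheapest to roll back."""
    ranked = sorted(likelihood_ratio, key=likelihood_ratio.get, reverse=True)
    return ranked[: int(len(ranked) * fraction)]

# Toy example: the DT-3/4 and ,-1/2 splits barely help, so they go first.
ratios = {"NP-1/2": 0.62, "DT-3/4": 0.999, "VP-1/2": 0.71, ",-1/2": 0.98}
print(splits_to_reverse(ratios))    # ['DT-3/4', ',-1/2']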

Page 20: Learning Accurate, Compact, and Interpretable Tree Annotation

Adaptive Splitting Results

[Figure: parsing accuracy (F1, 74–90) vs. total number of grammar symbols (100–1700) for Flat Training, Hierarchical Training, and 50% Merging]

Model             F1
Previous          88.4
With 50% Merging  89.5

Page 21: Learning Accurate, Compact, and Interpretable Tree Annotation

Number of Phrasal Subcategories

[Figure: bar chart, number of phrasal subcategories (0–40) per category: NP, VP, PP, ADVP, S, ADJP, SBAR, QP, WHNP, PRN, NX, SINV, PRT, WHPP, SQ, CONJP, FRAG, NAC, UCP, WHADVP, INTJ, SBARQ, RRC, WHADJP, X, ROOT, LST]

Page 22: Learning Accurate, Compact, and Interpretable Tree Annotation

Number of Phrasal Subcategories

[Figure: the same bar chart, highlighting the most heavily split categories: NP, VP, PP]

Page 23: Learning Accurate, Compact, and Interpretable Tree Annotation

Number of Phrasal Subcategories

[Figure: the same bar chart, highlighting the least split categories: X, NAC]

Page 24: Learning Accurate, Compact, and Interpretable Tree Annotation

Number of Lexical Subcategories

[Figure: bar chart, number of lexical subcategories (0–70) per part-of-speech tag, in decreasing order from NNP, JJ, NNS, NN down to SYM, LS, #; the tags TO, “,”, and POS are highlighted]

Page 25: Learning Accurate, Compact, and Interpretable Tree Annotation

Number of Lexical Subcategories

[Figure: the same bar chart, highlighting IN, DT, RB, and VBx]

Page 26: Learning Accurate, Compact, and Interpretable Tree Annotation

Number of Lexical Subcategories

[Figure: the same bar chart, highlighting NN, NNS, NNP, and JJ]

Page 27: Learning Accurate, Compact, and Interpretable Tree Annotation

Smoothing

Heavy splitting can lead to overfitting. Idea: smoothing allows us to pool statistics.

Page 28: Learning Accurate, Compact, and Interpretable Tree Annotation

Linear Smoothing
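As a reconstruction from the paper, linear smoothing interpolates each subcategory's rule probability with the mean over the n subcategories of the same category, for a small constant α:

\[ \bar{p} \;=\; \frac{1}{n} \sum_{x=1}^{n} p_x, \qquad p'_x \;=\; (1 - \alpha)\, p_x \;+\; \alpha\, \bar{p} \]

Here p_x is the probability of a production headed by subcategory A-x of category A; pooling toward the category mean counteracts the overfitting caused by heavy splitting.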

Pages 29–31: Learning Accurate, Compact, and Interpretable Tree Annotation

Result Overview

[Figure: parsing accuracy (F1, 74–90) vs. total number of grammar symbols (100–1100) for Flat Training, Hierarchical Training, 50% Merging, and 50% Merging and Smoothing]

Model           F1
Previous        89.5
With Smoothing  90.7

Pages 32–33: Learning Accurate, Compact, and Interpretable Tree Annotation

Final Results

Parser                   F1, ≤ 40 words   F1, all words
Klein & Manning ’03      86.3             85.7
Matsuzaki et al. ’05     86.7             86.1
Collins ’99              88.6             88.2
Charniak & Johnson ’05   90.1             89.6
This Work                90.2             89.7

Page 34: Learning Accurate, Compact, and Interpretable Tree Annotation

Linguistic Candy

Proper nouns (NNP):

NNP-14  Oct. Nov. Sept.
NNP-12  John Robert James
NNP-2   J. E. L.
NNP-1   Bush Noriega Peters
NNP-15  New San Wall
NNP-3   York Francisco Street

Personal pronouns (PRP):

PRP-0  It He I
PRP-1  it he they
PRP-2  it them him

Page 35: Learning Accurate, Compact, and Interpretable Tree Annotation

Linguistic Candy

Relative adverbs (RBR):

RBR-0  further lower higher
RBR-1  more less More
RBR-2  earlier Earlier later

Cardinal numbers (CD):

CD-7   one two Three
CD-4   1989 1990 1988
CD-11  million billion trillion
CD-0   1 50 100
CD-3   1 30 31
CD-9   78 58 34

Page 36: Learning Accurate, Compact, and Interpretable Tree Annotation

Conclusions

New ideas:
- Hierarchical Training
- Adaptive Splitting
- Parameter Smoothing

State-of-the-art parsing performance: F1 improves from 63.4 (X-Bar initializer) to 90.2.

Linguistically interesting grammars to sift through.

Page 37: Learning Accurate, Compact, and Interpretable Tree Annotation

Thank You!

[email protected]

Page 38: Learning Accurate, Compact, and Interpretable Tree Annotation

Other things we tried

- X-Bar vs. structurally annotated grammar: the X-Bar grammar starts at lower performance, but provides more flexibility.

- Better smoothing: tried different (hierarchical) smoothing methods; all worked about the same.

- (Linguistically) constraining rewrite possibilities between subcategories: hurts performance. EM automatically learns that most subcategory combinations are meaningless: ≥ 90% of the possible rewrites have zero probability.