
Neural Semantic Role Labeling: What works and what's next

or: What else can we do other than using 1000 LSTM layers :)

Luheng He†, Kenton Lee†, Mike Lewis‡, and Luke Zettlemoyer†*

† Paul G. Allen School of Computer Science & Engineering, Univ. of Washington, ‡ Facebook AI Research

* Allen Institute for Artificial Intelligence


Semantic Role Labeling (SRL)

Motivation
• Find out "who did what to whom, when, and where" in text.
• Given a predicate, identify its arguments and label them with roles (who, what, when, where, why, …).

Applications: Question Answering, Information Extraction, Machine Translation

Semantic Role Labeling (SRL)

  The robot broke my favorite mug with a wrench.
  My mug broke into pieces immediately.

Syntactically, "the robot" and "my mug" are both subjects of "broke", yet they play different roles: "the robot" is the breaker, while "my mug" is the thing broken in both sentences. SRL also labels "with a wrench" as the instrument, "into pieces" as the final state, and "immediately" as a temporal modifier.

Frame: break.01
  role   description
  ARG0   breaker
  ARG1   thing broken
  ARG2   instrument
  ARG3   pieces
  ARG4   broken away from what?

PropBank labels: [The robot]ARG0 broke [my favorite mug]ARG1 [with a wrench]ARG2. [My mug]ARG1 broke [into pieces]ARG3 [immediately]ARGM-TMP.

The Proposition Bank (PropBank)

• Annotated on top of the Penn Treebank syntax.
• Core roles: verb-specific roles (ARG0-ARG5) defined in frame files.
• Adjunct roles (ARGM-): shared across verbs.

Frame: break.01
  ARG0  breaker
  ARG1  thing broken
  ARG2  instrument
  ARG3  pieces
  ARG4  broken away from what?

Frame: buy.01
  ARG0  buyer
  ARG1  thing bought
  ARG2  seller
  ARG3  price paid
  ARG4  benefactive

Adjunct roles (ARGM-)
  TMP  temporal
  LOC  location
  MNR  manner
  DIR  direction
  CAU  cause
  PRP  purpose
  …

Paul Kingsbury and Martha Palmer. From Treebank to PropBank. 2002
PropBank Annotation Guidelines, Bonial et al., 2010

SRL is a hard problem …

• Over 10 years, F1 on the PropBank test set improved only from 79.4 (Punyakanok, 2005) to 80.3 (FitzGerald, 2015).
• Many interesting challenges: syntactic alternation, prepositional phrase attachment, long-range dependencies and common sense.

Syntactic Alternation | PP Attachment | Long-range Dependencies

Roles for "play": ARG0 player, ARG1 thing performed, ARG2 instrument.

  The robot plays piano.                  ("The robot" = ARG0 player, "piano" = ARG1 thing performed)
  The music plays softly.                 ("The music" = ARG1 thing performed, "softly" = ARGM-MNR)
  The cafe is playing my favorite song.   ("The cafe" = ARG0 player, "my favorite song" = ARG1 thing performed)

The same syntactic positions (subjects and objects) can fill different semantic roles.

Syntactic Alternation | Prepositional Phrase (PP) Attachment | Long-range Dependencies

  I eat [pasta]ARG1 [with delight]ARGM-MNR.   ("I" = ARG0 eater, "pasta" = ARG1 meal, "with delight" = manner)
  I eat [pasta with broccoli]ARG1.            ("I" = ARG0 eater, "pasta with broccoli" = ARG1 meal)

Whether the PP attaches to the noun or to the verb changes the argument spans.

Syntactic Alternation | PP Attachment | Long-range Dependencies

  We flew to Chicago.                            ("We" = ARG1 passenger, "to Chicago" = ARGM-GOL goal)
  We remember the nice view flying to Chicago.   (the passenger "We" is far from "flying")
  We remember John and Mary flying to Chicago.   (now "John and Mary" are the passengers)

Finding the passenger of "flying" requires long-range dependencies and common sense.

SRL is even harder for out-of-domain data …

  "Dip chicken breasts into eggs to coat"

  "Active, Ser133-phosphorylated CREB effects transcription of CRE-dependent genes via interaction with the 265-kDa …"

Long-term Plan for Improving SRL

Step 1: Collect more data for SRL
  — Question-Answer Driven Semantic Role Labeling (QA-SRL)
  — Human-in-the-Loop Parsing

Step 2: Build accurate SRL model
  — Neural Semantic Role Labeling (for PropBank SRL)

Step 3: SRL system for many domains
  — Future work …

First Step: Collect more (cheaper) SRL Data

Intuition: Anyone who understands the meaning of a sentence should be able to provide annotation for SRL.

Challenge: Complicated annotation process of traditional SRL.

Solution: Design a simpler annotation scheme!

Question-Answer Driven SRL (QA-SRL)

Given a sentence and a verb:
  Last month, we saw the Grand Canyon flying to Chicago.

Step 1: Ask a question about the verb:        Who was flying?
Step 2: Answer with words in the sentence:    we
Step 3: Repeat, writing as many Q/A pairs as possible …
  Where did someone fly to?   Chicago
  When did someone fly?       Last month
Stop when all Q/A pairs are exhausted.

Comparing QA-SRL to PropBank

  Traditional SRL (PropBank)   QA-SRL
  (Verbal) Predicate           Predicate
  Argument                     Answer
  Role                         Question
  Large role inventory         No role inventory!

Question-Answer Driven SRL (QA-SRL)

  Last month, we saw the Grand Canyon flying to Chicago.

  Argument      QA-SRL question              PropBank SRL
  we            Who was flying?              ARG1
  to Chicago    Where did someone fly to?    ARGM-GOL
  Last month    When did someone fly?        ARGM-TMP

Non-expert annotated QA-SRL has about 80% agreement with PropBank.

Long-term Plan for Improving SRL (recap)

Step 1: Collect more data for SRL — QA-SRL, Human-in-the-Loop Parsing
Step 2: Build accurate SRL model — Neural Semantic Role Labeling (for PropBank SRL)   (next)
Step 3: SRL system for many domains — Future work …

SRL Systems

Pipeline systems (Punyakanok et al., 2008; Täckström et al., 2015; FitzGerald et al., 2015):
  sentence, predicate → syntactic features → argument identification → candidate argument spans → labeling → labeled arguments → ILP/DP → prediction

End-to-end systems (Collobert et al., 2011; Zhou and Xu, 2015; Wang et al., 2015):
  sentence, predicate → context window features → deep BiLSTM + CRF layer → BIO sequence → Viterbi → prediction

*This work:
  sentence, predicate → deep BiLSTM → BIO sequence → hard constraints → prediction

SRL as a BIO Tagging Problem

Input (sentence and predicate):        The cats love hats .
BIO output (Begin, Inside, Outside):   B-ARG0 I-ARG0 B-V B-ARG1 O
Final SRL output:                      [The cats]ARG0 [love]V [hats]ARG1
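A minimal sketch (not from the talk) of how a well-formed BIO tag sequence is converted back into labeled argument spans; the function name and span representation are illustrative assumptions:

```python
def bio_to_spans(tags):
    """Convert a BIO tag sequence, e.g. ["B-ARG0", "I-ARG0", "B-V", "B-ARG1", "O"],
    into labeled spans [(label, start, end_exclusive), ...].
    Assumes a valid sequence (e.g. the output of constrained Viterbi decoding)."""
    spans, start, label = [], None, None
    for i, tag in enumerate(tags):
        if tag.startswith("B-") or tag == "O":
            if label is not None:                 # close the currently open span
                spans.append((label, start, i))
            start, label = (i, tag[2:]) if tag != "O" else (None, None)
        # "I-" tags simply extend the currently open span
    if label is not None:
        spans.append((label, start, len(tags)))
    return spans

# bio_to_spans(["B-ARG0", "I-ARG0", "B-V", "B-ARG1", "O"])
#   -> [("ARG0", 0, 2), ("V", 2, 3), ("ARG1", 3, 4)]
```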

the cats love hats   [ ] [ ] [V] [ ]

For each word, the tagger predicts a distribution over BIO tags (e.g. for "hats": B-ARG1 0.7, I-ARG1 0.2, B-ARG0 0.1, …). Four ingredients make this work:

(1) Deep BiLSTM tagger
(2) Highway connections
(3) Variational dropout
(4) Viterbi decoding with hard constraints

Model - (1) Deep BiLSTM Tagger

• Input: word embedding concatenated with a predicate-indicator embedding (100 dim + 100 dim).
• Stacked LSTM layers with alternating directions (forward, backward, forward, …); each layer's output feeds the next.
• A softmax layer predicts a BIO tag per word; greedy output takes the argmax (e.g. B-ARG0 I-ARG0 B-V B-ARG1 for "the cats love hats").
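A compact PyTorch sketch of this architecture under the stated dimensions (100-dim embeddings, 300-dim hidden states, 8 layers). The class name and layout are illustrative assumptions, not the authors' code, and the highway connections and variational dropout from the next components are omitted here:

```python
import torch
import torch.nn as nn

class DeepAlternatingLSTMTagger(nn.Module):
    """Stacked LSTM layers with alternating directions; softmax over BIO tags per token."""

    def __init__(self, vocab_size, num_tags, emb_dim=100, hidden_dim=300, num_layers=8):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, emb_dim)
        self.pred_emb = nn.Embedding(2, emb_dim)   # 0/1 predicate indicator
        dims = [2 * emb_dim] + [hidden_dim] * (num_layers - 1)
        self.layers = nn.ModuleList(
            [nn.LSTM(d, hidden_dim, batch_first=True) for d in dims])
        self.out = nn.Linear(hidden_dim, num_tags)

    def forward(self, word_ids, predicate_mask):
        # word_ids, predicate_mask: (batch, time) LongTensors
        x = torch.cat([self.word_emb(word_ids), self.pred_emb(predicate_mask)], dim=-1)
        for i, lstm in enumerate(self.layers):
            if i % 2 == 1:                         # odd layers read right-to-left
                x = torch.flip(x, dims=[1])
            x, _ = lstm(x)
            if i % 2 == 1:                         # restore left-to-right order
                x = torch.flip(x, dims=[1])
        return torch.log_softmax(self.out(x), dim=-1)  # per-token BIO tag log-probs
```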

Model - (2) Highway Connections

Trend: deeper models for higher accuracy.
  Grammar as a Foreign Language (Vinyals et al., 2014): 3 layers
  End-to-end Semantic Role Labeling (Zhou and Xu, 2015): 8 layers
  Google's Neural Machine Translation (GNMT, Wu et al., 2016): 8 layers
  This work: 8 layers
  Deep Residual Learning for Image Recognition (He et al., 2016): 152 layers

Stacking more BiLSTM layers (1-2, 3-4, 5-6, …) increases expressive power, but the deeper stack becomes harder to back-propagate through.

Two ways to train such deep stacks:
• Use different learning rates for different layers (harder to reimplement).
• Use shortcut connections between layers ("highway" or "residual").

Model - (2) Highway Connections

At each timestep, an LSTM layer takes the input from the previous layer $x_t$ and the recurrent input from the previous timestep, and produces the output to the next layer through a non-linearity:

  $h_t = F(c_{t-1}, h_{t-1}, x_t)$

Shortcut connections add the layer input back into the output:

  Residual network:       new output $= h_t + x_t$
  Gated highway network:  new output $= r_t \odot h_t + (1 - r_t) \odot x_t$, with gate $r_t = \sigma(f(h_{t-1}, x_t))$

References:
  Deep Residual Networks, Kaiming He, ICML 2016 Tutorial
  Training Very Deep Networks, Srivastava et al., 2015
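A minimal sketch of the gated highway combination above. As a simplifying assumption for illustration, the gate here is computed from $x_t$ and $h_t$ outside the LSTM cell, whereas the talk's formulation conditions it on $h_{t-1}$ and $x_t$ inside the cell:

```python
import torch
import torch.nn as nn

class HighwayGate(nn.Module):
    """Combine a layer's output h with its input x as r * h + (1 - r) * x."""

    def __init__(self, dim):
        super().__init__()
        self.gate = nn.Linear(2 * dim, dim)

    def forward(self, h, x):
        r = torch.sigmoid(self.gate(torch.cat([h, x], dim=-1)))  # transform gate r_t
        return r * h + (1.0 - r) * x                              # highway output
```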

Model - (3) Variational Dropout

• Traditionally, dropout masks are only applied to vertical (layer-to-layer) connections.
• Applying naive dropout to recurrent connections causes too much noise amplification.
• Variational dropout: reuse the same dropout mask at every timestep (Gal and Ghahramani, 2016).
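A minimal sketch of variational ("locked") dropout over a batch of sequences; the function is illustrative, not the authors' implementation:

```python
import torch

def variational_dropout(x, p=0.1, training=True):
    """Variational dropout (Gal and Ghahramani, 2016): sample one dropout mask
    per sequence and reuse it at every timestep.
    x: (batch, time, features) tensor of layer inputs or outputs."""
    if not training or p == 0.0:
        return x
    # one Bernoulli keep-mask per (batch, feature), broadcast over the time axis
    mask = x.new_empty(x.size(0), 1, x.size(2)).bernoulli_(1.0 - p) / (1.0 - p)
    return x * mask
```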

Model - (4) Viterbi Decoding with Hard Constraints

Greedy output (argmax per word) can produce inconsistent BIO sequences, e.g. B-ARG1 I-ARG0 B-V B-ARG1 for "the cats love hats".

Fix: Viterbi decoding over the per-word tag distributions with heuristic transition scores that forbid invalid transitions:

  $s(\text{B-ARG0} \rightarrow \text{I-ARG0}) = 0$
  $s(\text{B-ARG1} \rightarrow \text{I-ARG0}) = -\infty$
  …
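A sketch of Viterbi decoding with such hard BIO constraints (an illustrative NumPy implementation, not the authors' code):

```python
import numpy as np

def constrained_viterbi(log_probs, tags, neg_inf=-1e9):
    """Viterbi decoding over per-token tag log-probabilities with hard BIO
    constraints: transition score 0 if legal, -inf if illegal
    (e.g. B-ARG0 -> I-ARG1 is forbidden).
    log_probs: (num_tokens, num_tags) array; tags: list of tag strings."""

    def legal(prev, curr):
        if curr.startswith("I-"):
            role = curr[2:]
            return prev in ("B-" + role, "I-" + role)  # I-X must follow B-X or I-X
        return True

    n, k = log_probs.shape
    trans = np.array([[0.0 if legal(p, c) else neg_inf for c in tags] for p in tags])
    score = log_probs[0].copy()
    back = np.zeros((n, k), dtype=int)
    for t in range(1, n):
        cand = score[:, None] + trans + log_probs[t][None, :]  # (prev, curr)
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0)
    best = [int(score.argmax())]
    for t in range(n - 1, 0, -1):
        best.append(int(back[t][best[-1]]))
    return [tags[i] for i in reversed(best)]
```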

Other Implementation Details …

• 8-layer BiLSTMs with 300-dimensional hidden states.
• 100-dimensional GloVe embeddings, updated during training.
• Orthonormal initialization for LSTM weight matrices (Saxe et al., 2013); see the sketch below.
• 5-model ensemble with product of experts (Hinton, 2002).
• Trained for 500 epochs.
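A small sketch of orthonormal initialization in the spirit of Saxe et al. (2013): take the Q factor of a QR decomposition of a random Gaussian matrix. This is illustrative; the exact recipe used in the paper may differ:

```python
import numpy as np

def orthonormal_init(rows, cols, rng=np.random.default_rng(0)):
    """Return a (rows, cols) matrix whose smaller set of rows or columns
    is orthonormal, for initializing weight matrices."""
    a = rng.standard_normal((max(rows, cols), min(rows, cols)))
    q, _ = np.linalg.qr(a)          # q has orthonormal columns
    return q if rows >= cols else q.T
```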

Datasets

CoNLL-2005 (PropBank)
  Size: 40k sentences
  Domains: newswire
  Annotated predicates: verbs

CoNLL-2012 (OntoNotes)
  Size: 140k sentences
  Domains: telephone conversations, newswire, newsgroups, broadcast news, broadcast conversation, weblogs
  Annotated predicates: verbs, plus some nominal predicates

CoNLL 2005 Results

F1 on the WSJ test set and the Brown (out-of-domain) test set, as read off the bar chart:

  System        WSJ test   Brown (out-of-domain) test
  Ours*         84.6       73.6
  Ours          83.1       72.1
  Zhou          82.8       69.4
  FitzGerald*   80.3       72.2
  Täckström     79.9       71.3
  Toutanova*    80.3       68.8
  Punyakanok*   79.4       67.8

*: ensemble models. Ours and Zhou are BiLSTM models; the others are pipeline models.

CoNLL 2012 (OntoNotes) Results

F1 on the CoNLL 2012 test set, as read off the bar chart:

  System        CoNLL 2012 test
  Ours*         83.4
  Ours          81.7
  Zhou          81.5
  FitzGerald*   80.2
  Täckström     79.4
  Pradhan       77.5

*: ensemble models. Ours and Zhou are BiLSTM models; the others are pipeline models.

Ablations on Number of Layers (2, 4, 6, and 8)

F1 on the CoNLL-05 dev set:

  Layers   Greedy decoding   Viterbi decoding
  2        74.6              77.2
  4        79.1              80.5
  6        80.1              81.4
  8        80.5              81.6

• Performance increases as the model gets deeper; the biggest jump is from 2 to 4 layers.
• Shallow models benefit more from constrained decoding.

Ablations (single model, on CoNLL-05 dev)

Learning curves (F1 on the dev set over 1-500 training epochs) for: full model, no highway connections, no orthonormal initialization, no dropout.

• Without orthonormal initialization, the deep model learns very slowly.
• Without dropout, the model overfits at ~300 epochs.

What can we learn from the results?

1. What's in the remaining 17%? When does the model still struggle?
2. What are deeper models good at?
3. BiLSTM-based models are very accurate even without syntax. But can we conclude syntax is no longer useful in SRL?

Question (1): When does the model make mistakes?

Analysis
  — Error breakdown with oracle transformations.
  — E.g. tease apart labeling errors and boundary errors.
  — Link the error types to known linguistic phenomena (e.g. PP attachment).

Oracle Transformations (Error Breakdown)

Apply a sequence of oracle fixes to the predicted arguments and measure F1 after each:

1) Fix label:         [We]ARG0 fly to NYC tomorrow.  →  keep the span, correct the role to ARG1.
2) Move core arg:     [They] wrote [an email]ARG0 to cancel it.  →  move the ARG0 span to [They].
3) Split/Merge spans: I eat [pasta with delight]ARG1  →  split into [pasta]ARG1 [with delight]ARGM-MNR;
                      I eat [pasta]ARG1 [with broccoli]  →  merge into [pasta with broccoli]ARG1.
4) Fix span boundary: ["No broccoli",]ARG1 I said.  →  drop the trailing comma: ["No broccoli"]ARG1.
5) Drop/Add arg:      Hesitantly, they declined to elaborate [on that matter].  →  add the missing [Hesitantly]ARGM-MNR.

Oracle Transformations (Error Breakdown) Results

Chart: F1 after each cumulative error fix (Original, Fix labels, Move arg., Merge/Split spans, Fix boundary, Drop/Add arg.) for Ours, Pradhan, and Punyakanok (Pradhan, Punyakanok: CoNLL-2005 systems).

• Labeling errors account for 29.3% of our model's mistakes.
• Attachment (merge/split) errors account for 25.3%.

Confusion matrix for labeling errors (row normalized)

• ARG2 is often confused with certain adjuncts (DIR, LOC, MNR). Why? ARG2 is verb-specific and often carries adjunct-like meanings, as the frame files show:

  Predicate: move    Arg0-PAG: mover, Arg1-PPT: moved, Arg2-GOL: destination, Arg3-VSP: aspect, domain in which arg1 is moving
  Predicate: cut     Arg0-PAG: intentional cutter, Arg1-PPT: thing cut, Arg2-DIR: medium, source, Arg3-MNR: instrument, unintentional cutter, Arg4-GOL: beneficiary
  Predicate: strike  Arg0-PAG: Agent, Arg1-PPT: Theme(-Creation), Arg2-MNR: Instrument

• Argument-adjunct distinctions are difficult even for human annotators!

  "After many attempts to find a reliable test to distinguish between arguments and adjuncts, we abandoned structurally marking this difference."
  —The Penn Treebank: An Overview (Taylor et al., 2003)

PP Attachment

  Sumitomo financed the acquisition from Sears.

  Wrong PP attachment (attach high):   [the acquisition]Arg1 (NP)  [from Sears]Arg2 (PP)   → wrong SRL spans
  Correct PP attachment (attach low):  [the acquisition from Sears]Arg1 (NP)               → requires merging the predicted spans

• Merge/split span operations account for 25.3% of the model's mistakes.
• Categorize the Y spans in the [XY] → [X][Y] and [X][Y] → [XY] operations using gold syntactic labels.

Question (1): When does the model make mistakes?

Takeaway
  — Traditionally hard tasks, such as argument-adjunct distinctions and PP attachment decisions, are still challenging!
  — Use external information to improve PP attachment.

Question (2): What are deeper models good at?

Analysis
  — Long-range dependencies: model performance on arguments that are far away from the predicate.
  — Structural consistency: number of inconsistent BIO tag pairs in the greedy prediction.

Long-range Dependencies

Chart: average F1 by distance from the argument to the predicate (0, 1-2, 3-7, 7+ words) for L8, L6, L4, L2, Punyakanok, and Pradhan, with the number of gold arguments per bin (4,469 / 2,151 / 1,174 / 657).

• Deep models' performance deteriorates more slowly on long-distance predictions.

Structural Consistency

BIO violations per token in the greedy output, e.g. (B-ArgX, I-ArgY) or (O, I-ArgY):

  L2: 0.18   L4: 0.08   L6: 0.06   L8: 0.07   L8+Ensemble: 0.07

Deeper models (with 4+ layers) generate more consistent BIO sequences.
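A small sketch of how such violations could be counted (illustrative, not the authors' evaluation script):

```python
def bio_violations_per_token(tags):
    """Count invalid BIO transitions, e.g. (B-ArgX, I-ArgY) or (O, I-ArgY),
    and normalize by the number of tokens."""
    violations, prev = 0, "O"
    for tag in tags:
        if tag.startswith("I-") and prev not in ("B-" + tag[2:], "I-" + tag[2:]):
            violations += 1
        prev = tag
    return violations / max(len(tags), 1)
```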

Question (3): Can syntax still help SRL?

Recap
  — PropBank SRL is annotated on top of the PTB syntax.
  — More than 98% of the gold SRL spans are syntactic constituents.

Analysis
  — At decoding time, make predicted argument spans agree with a given syntactic structure.
  — See if SRL performance increases.

Can Syntax Still Help?

Chart: F1 vs. the percentage of predicted arguments that are constituents in the gold syntax tree (91-99%), for BiLSTM-based models (L2, L4, L6, L8, L8+PoE) and syntax-aware models (Punyakanok, Pradhan), plus gold syntax.

Can we improve SRL accuracy by adding syntactic information?

Constrained Decoding with Syntax

  [The cats] love [hats and the dogs] love bananas.
   ARG0             ARG1

  [The cats] is a constituent of the syntax tree; [hats and the dogs] is not.

Penalize the sequence score by the number of predicted argument spans that disagree with the syntax tree:

  $\sum_{i=1}^{t} \log p(\mathrm{tag}_i \mid \text{sentence}) - C \times \sum_{\text{span}} \mathbb{1}(\text{span} \notin \text{syntax tree})$

where $C$ controls the penalty strength.

• The constraints are not locally decomposable.
• Use A* search (Lewis and Steedman, 2014) to find the sequence with the highest score.
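A minimal sketch of the penalized sequence score above (an illustrative helper with hypothetical names; the talk searches over tag sequences with A* because the penalty is not locally decomposable):

```python
def syntax_penalized_score(tag_log_probs, arg_spans, constituents, C=10000.0):
    """Sum of per-token tag log-probabilities minus C times the number of
    argument spans that are not constituents of the syntax tree.
    tag_log_probs: log p(tag_i | sentence) for the chosen tag at each position.
    arg_spans:     (start, end) spans decoded from the candidate tag sequence.
    constituents:  set of (start, end) spans present in the syntax tree."""
    violations = sum(1 for span in arg_spans if span not in constituents)
    return sum(tag_log_probs) - C * violations
```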

Constrained Decoding with Syntax

Parsers used to provide the constraints:
  Charniak: A maximum-entropy-inspired parser, Charniak, 2000
  Choe: Parsing as language modeling, Choe and Charniak, 2016 (state of the art)

Constrained Decoding with Syntax: Results

Chart: the same F1 vs. syntax-agreement plot as before, adding the L8+PoE model after constrained decoding with automatic syntax (+AutoSyn) and gold syntax (+GoldSyn).

Takeaway
  — Modest gain observed with predicted syntax.
  — Joint training could bring more improvement.


Contributions (Neural SRL)

• New state-of-the-art deep network for end-to-end SRL.

• Code and models will be publicly available at: https://github.com/luheng/deep_srl

• In-depth error analysis indicating where the models work well and where they still struggle.

• Syntax-based experiments pointing towards directions for future improvements.

Long-term Plan for Improving SRL

Step 1: Collect more data for SRL — QA-SRL, Human-in-the-Loop Parsing
Step 2: Build accurate SRL model — Neural Semantic Role Labeling (for PropBank SRL)
Step 3: SRL system for many domains — Future work …

Thanks!

Code will be available at: https://github.com/luheng/deep_srl
