dynamic feature selection for dependency parsing › ~jason › papers ›...

102
Dynamic Feature Selection for Dependency Parsing He He, Hal Daumé III and Jason Eisner EMNLP 2013, Seattle

Upload: others

Post on 03-Jul-2020

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Dynamic Feature Selection for Dependency Parsing › ~jason › papers › he+al.emnlp13.slides.pdf · Part-of-Speech Tagging Parsing $ Fruit flies like a banana. 3 Structured Prediction

Dynamic Feature Selection for Dependency Parsing

He He, Hal Daumé III and Jason Eisner

EMNLP 2013, Seattle

Page 2: Dynamic Feature Selection for Dependency Parsing › ~jason › papers › he+al.emnlp13.slides.pdf · Part-of-Speech Tagging Parsing $ Fruit flies like a banana. 3 Structured Prediction

2

Structured Prediction in NLP

Fruit flies like a banana .

果 蝇 喜欢 香蕉 。

N ⟶ N ⟶ V ⟶ Det ⟶ N

↓ ↓ ↓ ↓ ↓

Fruit flies like a banana

summarization, name entity resolution and many more ...

Machine Translation

ParsingPart-of-Speech Tagging

$ Fruit flies like a banana

Page 3: Dynamic Feature Selection for Dependency Parsing › ~jason › papers › he+al.emnlp13.slides.pdf · Part-of-Speech Tagging Parsing $ Fruit flies like a banana. 3 Structured Prediction

3

Structured Prediction in NLP

Fruit flies like a banana .

果 蝇 喜欢 香蕉 。

N ⟶ N ⟶ V ⟶ Det ⟶ N

↓ ↓ ↓ ↓ ↓

Fruit flies like a banana

summarization, name entity resolution and many more ...

Machine Translation

ParsingPart-of-Speech Tagging

$ Fruit flies like a banana

Exponentially increasing search space

Millions of features for scoring

Page 4: Dynamic Feature Selection for Dependency Parsing › ~jason › papers › he+al.emnlp13.slides.pdf · Part-of-Speech Tagging Parsing $ Fruit flies like a banana. 3 Structured Prediction

4

Structured Prediction in NLP

⋮ ⋮ ⋮

Fruit flies like a banana

⋮ ⋮

N

V

D

N

V

D

N

V

D

N

V

D

N

V

D

abanana

like

flies

Fruit

$

Page 5: Dynamic Feature Selection for Dependency Parsing › ~jason › papers › he+al.emnlp13.slides.pdf · Part-of-Speech Tagging Parsing $ Fruit flies like a banana. 3 Structured Prediction

5

Structured Prediction in NLP

⋮ ⋮ ⋮

Fruit flies like a banana

⋮ ⋮

N

V

D

N

V

D

N

V

D

N

V

D

N

V

D

abanana

like

flies

Fruit

$

tokenleft tokenright tokenin-between tokenstemformbigramtagcoarse taglengthdirection...

Feature templates per edge

Page 6: Dynamic Feature Selection for Dependency Parsing › ~jason › papers › he+al.emnlp13.slides.pdf · Part-of-Speech Tagging Parsing $ Fruit flies like a banana. 3 Structured Prediction

6

Structured Prediction in NLP

⋮ ⋮ ⋮

Fruit flies like a banana

⋮ ⋮

N

V

D

N

V

D

N

V

D

N

V

D

N

V

D

abanana

like

flies

Fruit

$

tokenleft tokenright tokenin-between tokenstemformbigramtagcoarse taglengthdirection...

(head_token + mod_token)X

(head_tag +mod_tag)

Feature templates per edge

Page 7: Dynamic Feature Selection for Dependency Parsing › ~jason › papers › he+al.emnlp13.slides.pdf · Part-of-Speech Tagging Parsing $ Fruit flies like a banana. 3 Structured Prediction

7

Structured Prediction in NLP

⋮ ⋮ ⋮

Fruit flies like a banana

⋮ ⋮tokenleft tokenright tokenin-between tokenstemformbigramtagcoarse taglengthdirection...

(head_token + mod_token)X

(head_tag +mod_tag)

N

V

D

N

V

D

N

V

D

N

V

D

N

V

D

abanana

like

flies

Fruit

$

HUGE

Feature templates per edge

Page 8: Dynamic Feature Selection for Dependency Parsing › ~jason › papers › he+al.emnlp13.slides.pdf · Part-of-Speech Tagging Parsing $ Fruit flies like a banana. 3 Structured Prediction

8

Structured Prediction in NLP

⋮ ⋮ ⋮

Fruit flies like a banana

⋮ ⋮

N

V

D

N

V

D

N

V

D

N

V

D

N

V

D

abanana

like

flies

Fruit

$

Do you need all features everywhere ?

token

tagcoarse taglengthdirection...

Feature templates per edge

Page 9: Dynamic Feature Selection for Dependency Parsing › ~jason › papers › he+al.emnlp13.slides.pdf · Part-of-Speech Tagging Parsing $ Fruit flies like a banana. 3 Structured Prediction

9

Structured Prediction in NLP

⋮ ⋮ ⋮

Fruit flies like a banana

⋮ ⋮

abanana

like

flies

Fruit

$

N

V

D

N

V

D

N

V

D

N

V

D

N

V

D

token

tagcoarse taglengthdirection...

Feature templates per edge

Do you need all features everywhere ?

Page 10: Dynamic Feature Selection for Dependency Parsing › ~jason › papers › he+al.emnlp13.slides.pdf · Part-of-Speech Tagging Parsing $ Fruit flies like a banana. 3 Structured Prediction

10

Structured Prediction in NLP

⋮ ⋮ ⋮

Fruit flies like a banana

⋮ ⋮

abanana

like

flies

Fruit

$

N

V

D

N

V

D

N

V

D

N

V

D

N

V

D

token

tagcoarse taglengthdirection...

Feature templates per edge

Do you need all features everywhere ?

Page 11: Dynamic Feature Selection for Dependency Parsing › ~jason › papers › he+al.emnlp13.slides.pdf · Part-of-Speech Tagging Parsing $ Fruit flies like a banana. 3 Structured Prediction

11

Structured Prediction in NLP

⋮ ⋮ ⋮

Fruit flies like a banana

⋮ ⋮

abanana

like

flies

Fruit

$

N

V

D

N

V

D

N

V

D

N

V

D

N

V

D

token

tagcoarse taglengthdirection...

Feature templates per edge

Dynamic Decisions

Page 12: Dynamic Feature Selection for Dependency Parsing › ~jason › papers › he+al.emnlp13.slides.pdf · Part-of-Speech Tagging Parsing $ Fruit flies like a banana. 3 Structured Prediction

12

Case Study: Dependency Parsing

BulgarianChinese

EnglishGerman

JapanesePortuguese

Swedish

0

1

2

3

4

5

6

OursBaseline

Sp

ee

du

p

2x to 6x speedup with little loss in accuracy

Page 13: Dynamic Feature Selection for Dependency Parsing › ~jason › papers › he+al.emnlp13.slides.pdf · Part-of-Speech Tagging Parsing $ Fruit flies like a banana. 3 Structured Prediction

13

Graph-based Dependency Parsing

.This time , the firms were ready$

Scoring:

Page 14: Dynamic Feature Selection for Dependency Parsing › ~jason › papers › he+al.emnlp13.slides.pdf · Part-of-Speech Tagging Parsing $ Fruit flies like a banana. 3 Structured Prediction

14

Graph-based Dependency Parsing

.This time , the firms were ready$

Scoring:

firms were

length: 1direction: right

modifier_token: werehead_token: firms

head_tag: noun

And hundreds more!

Page 15: Dynamic Feature Selection for Dependency Parsing › ~jason › papers › he+al.emnlp13.slides.pdf · Part-of-Speech Tagging Parsing $ Fruit flies like a banana. 3 Structured Prediction

15

Graph-based Dependency Parsing

.This time , the firms were ready$

Decoding: find the highest-scoring tree

the

$

This

time , firms

were

ready .

Page 16: Dynamic Feature Selection for Dependency Parsing › ~jason › papers › he+al.emnlp13.slides.pdf · Part-of-Speech Tagging Parsing $ Fruit flies like a banana. 3 Structured Prediction

16

totaltime

MST Dependency Parsing (1st-order projective)

Page 17: Dynamic Feature Selection for Dependency Parsing › ~jason › papers › he+al.emnlp13.slides.pdf · Part-of-Speech Tagging Parsing $ Fruit flies like a banana. 3 Structured Prediction

17

? ? ? ?

Find highest-scoring tree O(n3)

MST Dependency Parsing (1st-order projective)

Page 18: Dynamic Feature Selection for Dependency Parsing › ~jason › papers › he+al.emnlp13.slides.pdf · Part-of-Speech Tagging Parsing $ Fruit flies like a banana. 3 Structured Prediction

18

Average Sentence0

10

20

30

40

50

60

find edge scores

Find highest-scoring tree O(n3)

Find edge scores

~268 feature templates~76M features

MST Dependency Parsing (1st-order projective)

Page 19: Dynamic Feature Selection for Dependency Parsing › ~jason › papers › he+al.emnlp13.slides.pdf · Part-of-Speech Tagging Parsing $ Fruit flies like a banana. 3 Structured Prediction

19

Add features only when necessary!

This the firms ready

score(This → ready) =

score(the → firms) =

Page 20: Dynamic Feature Selection for Dependency Parsing › ~jason › papers › he+al.emnlp13.slides.pdf · Part-of-Speech Tagging Parsing $ Fruit flies like a banana. 3 Structured Prediction

20

Add features only when necessary!

This the firms ready

score(This → ready) = -0.23

score(the → firms) = 0.63

-0.23

+0.63

Page 21: Dynamic Feature Selection for Dependency Parsing › ~jason › papers › he+al.emnlp13.slides.pdf · Part-of-Speech Tagging Parsing $ Fruit flies like a banana. 3 Structured Prediction

21

Add features only when necessary!

This the firms ready

score(This → ready) = -0.13

score(the → firms) = 1.33

-0.23

+0.1

+0.63 +0.7

Page 22: Dynamic Feature Selection for Dependency Parsing › ~jason › papers › he+al.emnlp13.slides.pdf · Part-of-Speech Tagging Parsing $ Fruit flies like a banana. 3 Structured Prediction

22

Add features only when necessary!

This the firms ready

score(This → ready) = -0.13

score(the → firms) = 1.33

-0.23

+0.1

+0.63 +0.7WINNER

Page 23: Dynamic Feature Selection for Dependency Parsing › ~jason › papers › he+al.emnlp13.slides.pdf · Part-of-Speech Tagging Parsing $ Fruit flies like a banana. 3 Structured Prediction

23

Add features only when necessary!

This the firms ready

score(This → ready) = -1.33

score(the → firms) = 1.33

-0.23

+0.1

-1.2

+0.63 +0.7WINNER

Page 24: Dynamic Feature Selection for Dependency Parsing › ~jason › papers › he+al.emnlp13.slides.pdf · Part-of-Speech Tagging Parsing $ Fruit flies like a banana. 3 Structured Prediction

24

Add features only when necessary!

This the firms ready

score(This → ready) = -1.88

score(the → firms) = 1.33

-0.23

+0.1

-1.2

- 0.55

+0.63 +0.7WINNER

LOSER

Page 25: Dynamic Feature Selection for Dependency Parsing › ~jason › papers › he+al.emnlp13.slides.pdf · Part-of-Speech Tagging Parsing $ Fruit flies like a banana. 3 Structured Prediction

25

Add features only when necessary!

This the firms ready

score(This → ready) = -1.88

score(the → firms) = 1.33

-0.23

+0.1

-1.2

- 0.55

+0.63 +0.7

This is a structured problem! Should not look at scores independently.

WINNER

LOSER

Page 26: Dynamic Feature Selection for Dependency Parsing › ~jason › papers › he+al.emnlp13.slides.pdf · Part-of-Speech Tagging Parsing $ Fruit flies like a banana. 3 Structured Prediction

26

Dynamic Dependency Parsing

1.Find the highest-scoring tree after adding some features fast non-projective decoding

Page 27: Dynamic Feature Selection for Dependency Parsing › ~jason › papers › he+al.emnlp13.slides.pdf · Part-of-Speech Tagging Parsing $ Fruit flies like a banana. 3 Structured Prediction

27

Dynamic Dependency Parsing

1.Find the highest-scoring tree after adding some features fast non-projective decoding

2.Only edges in the current best tree can win

Page 28: Dynamic Feature Selection for Dependency Parsing › ~jason › papers › he+al.emnlp13.slides.pdf · Part-of-Speech Tagging Parsing $ Fruit flies like a banana. 3 Structured Prediction

28

Dynamic Dependency Parsing

1.Find the highest-scoring tree after adding some features fast non-projective decoding

2.Only edges in the current best tree can win

are chosen by a classifier ≤ n decisions

are killed because they fight with the winners

Page 29: Dynamic Feature Selection for Dependency Parsing › ~jason › papers › he+al.emnlp13.slides.pdf · Part-of-Speech Tagging Parsing $ Fruit flies like a banana. 3 Structured Prediction

29

Dynamic Dependency Parsing

1.Find the highest-scoring tree after adding some features fast non-projective decoding

2.Only edges in the current best tree can win

are chosen by a classifier ≤ n decisions

are killed because they fight with the winners

3. Add features to undetermined edges by group

Page 30: Dynamic Feature Selection for Dependency Parsing › ~jason › papers › he+al.emnlp13.slides.pdf · Part-of-Speech Tagging Parsing $ Fruit flies like a banana. 3 Structured Prediction

30

Dynamic Dependency Parsing

1.Find the highest-scoring tree after adding some features fast non-projective decoding

2.Only edges in the current best tree can win

are chosen by a classifier ≤ n decisions

are killed because they fight with the winners

3. Add features to undetermined edges by group

Max # of iterations = # of feature groups

Page 31: Dynamic Feature Selection for Dependency Parsing › ~jason › papers › he+al.emnlp13.slides.pdf · Part-of-Speech Tagging Parsing $ Fruit flies like a banana. 3 Structured Prediction

31

.This time , the firms were ready$

+ first feature group 51 gray edges with unknown fate...5 features per gray edge

Current 1-best treeWinner edge

Loser edge

Undetermined edge

(permanently in 1-best tree)

Page 32: Dynamic Feature Selection for Dependency Parsing › ~jason › papers › he+al.emnlp13.slides.pdf · Part-of-Speech Tagging Parsing $ Fruit flies like a banana. 3 Structured Prediction

32

.This time , the firms were ready$

51 gray edges with unknown fate...5 features per gray edge

Current 1-best treeWinner edge

Loser edge

Undetermined edge

(permanently in 1-best tree)

Non-projective decoding to find new 1-best tree

Page 33: Dynamic Feature Selection for Dependency Parsing › ~jason › papers › he+al.emnlp13.slides.pdf · Part-of-Speech Tagging Parsing $ Fruit flies like a banana. 3 Structured Prediction

33

.This time , the firms were ready$

50 gray edges with unknown fate...5 features per gray edge

Classifier picks winners among the blue edgesCurrent 1-best tree

Winner edge

Loser edge

Undetermined edge

(permanently in 1-best tree)

Page 34: Dynamic Feature Selection for Dependency Parsing › ~jason › papers › he+al.emnlp13.slides.pdf · Part-of-Speech Tagging Parsing $ Fruit flies like a banana. 3 Structured Prediction

34

.This time , the firms were ready$

44 gray edges with unknown fate...5 features per gray edge

Current 1-best treeWinner edge

Loser edge

Undetermined edge

(permanently in 1-best tree)

Remove losers in conflict with the winners

Page 35: Dynamic Feature Selection for Dependency Parsing › ~jason › papers › he+al.emnlp13.slides.pdf · Part-of-Speech Tagging Parsing $ Fruit flies like a banana. 3 Structured Prediction

35

.This time , the firms were ready$

44 gray edges with unknown fate...5 features per gray edge

Current 1-best treeWinner edge

Loser edge

Undetermined edge

(permanently in 1-best tree)

Remove losers in conflict with the winners

Page 36: Dynamic Feature Selection for Dependency Parsing › ~jason › papers › he+al.emnlp13.slides.pdf · Part-of-Speech Tagging Parsing $ Fruit flies like a banana. 3 Structured Prediction

36

.This time , the firms were ready$

+ next feature group 44 gray edges with unknown fate...27 features per gray edge

Current 1-best treeWinner edge

Loser edge

Undetermined edge

(permanently in 1-best tree)

Page 37: Dynamic Feature Selection for Dependency Parsing › ~jason › papers › he+al.emnlp13.slides.pdf · Part-of-Speech Tagging Parsing $ Fruit flies like a banana. 3 Structured Prediction

37

.This time , the firms were ready$

+ next feature group 44 gray edges with unknown fate...27 features per gray edge

Current 1-best treeWinner edge

Loser edge

Undetermined edge

(permanently in 1-best tree)

Non-projective decoding to find new 1-best tree

Page 38: Dynamic Feature Selection for Dependency Parsing › ~jason › papers › he+al.emnlp13.slides.pdf · Part-of-Speech Tagging Parsing $ Fruit flies like a banana. 3 Structured Prediction

38

.This time , the firms were ready$

42 gray edges with unknown fate...27 features per gray edge

Current 1-best treeWinner edge

Loser edge

Undetermined edge

(permanently in 1-best tree)

Classifier picks winners among the blue edges

Page 39: Dynamic Feature Selection for Dependency Parsing › ~jason › papers › he+al.emnlp13.slides.pdf · Part-of-Speech Tagging Parsing $ Fruit flies like a banana. 3 Structured Prediction

39

.This time , the firms were ready$

31 gray edges with unknown fate...27 features per gray edge

Current 1-best treeWinner edge

Loser edge

Undetermined edge

(permanently in 1-best tree)

Remove losers in conflict with the winners

Page 40: Dynamic Feature Selection for Dependency Parsing › ~jason › papers › he+al.emnlp13.slides.pdf · Part-of-Speech Tagging Parsing $ Fruit flies like a banana. 3 Structured Prediction

40

.This time , the firms were ready$

31 gray edges with unknown fate...27 features per gray edge

Current 1-best treeWinner edge

Loser edge

Undetermined edge

(permanently in 1-best tree)

Remove losers in conflict with the winners

Page 41: Dynamic Feature Selection for Dependency Parsing › ~jason › papers › he+al.emnlp13.slides.pdf · Part-of-Speech Tagging Parsing $ Fruit flies like a banana. 3 Structured Prediction

41

.This time , the firms were ready$

+ next feature group 31 gray edges with unknown fate...74 features per gray edge

Current 1-best treeWinner edge

Loser edge

Undetermined edge

(permanently in 1-best tree)

Page 42: Dynamic Feature Selection for Dependency Parsing › ~jason › papers › he+al.emnlp13.slides.pdf · Part-of-Speech Tagging Parsing $ Fruit flies like a banana. 3 Structured Prediction

42

.This time , the firms were ready$

31 gray edges with unknown fate...74 features per gray edge

Current 1-best treeWinner edge

Loser edge

Undetermined edge

(permanently in 1-best tree)

Non-projective decoding to find new 1-best tree

Page 43: Dynamic Feature Selection for Dependency Parsing › ~jason › papers › he+al.emnlp13.slides.pdf · Part-of-Speech Tagging Parsing $ Fruit flies like a banana. 3 Structured Prediction

43

.This time , the firms were ready$

28 gray edges with unknown fate...74 features per gray edge

Current 1-best treeWinner edge

Loser edge

Undetermined edge

(permanently in 1-best tree)

Classifier picks winners among the blue edges

Page 44: Dynamic Feature Selection for Dependency Parsing › ~jason › papers › he+al.emnlp13.slides.pdf · Part-of-Speech Tagging Parsing $ Fruit flies like a banana. 3 Structured Prediction

44

.This time , the firms were ready$

8 gray edges with unknown fate...74 features per gray edge

Current 1-best treeWinner edge

Loser edge

Undetermined edge

(permanently in 1-best tree)

Remove losers in conflict with the winners

Page 45: Dynamic Feature Selection for Dependency Parsing › ~jason › papers › he+al.emnlp13.slides.pdf · Part-of-Speech Tagging Parsing $ Fruit flies like a banana. 3 Structured Prediction

45

.This time , the firms were ready$

8 gray edges with unknown fate...74 features per gray edge

Current 1-best treeWinner edge

Loser edge

Undetermined edge

(permanently in 1-best tree)

Remove losers in conflict with the winners

Page 46: Dynamic Feature Selection for Dependency Parsing › ~jason › papers › he+al.emnlp13.slides.pdf · Part-of-Speech Tagging Parsing $ Fruit flies like a banana. 3 Structured Prediction

46

.This time , the firms were ready$

+ next feature group 8 gray edges with unknown fate...107 features per gray edge

Current 1-best treeWinner edge

Loser edge

Undetermined edge

(permanently in 1-best tree)

Page 47: Dynamic Feature Selection for Dependency Parsing › ~jason › papers › he+al.emnlp13.slides.pdf · Part-of-Speech Tagging Parsing $ Fruit flies like a banana. 3 Structured Prediction

47

.This time , the firms were ready$

8 gray edges with unknown fate...107 features per gray edge

Current 1-best treeWinner edge

Loser edge

Undetermined edge

(permanently in 1-best tree)

Non-projective decoding to find new 1-best tree

Page 48: Dynamic Feature Selection for Dependency Parsing › ~jason › papers › he+al.emnlp13.slides.pdf · Part-of-Speech Tagging Parsing $ Fruit flies like a banana. 3 Structured Prediction

48

.This time , the firms were ready$

7 gray edges with unknown fate...107 features per gray edge

Current 1-best treeWinner edge

Loser edge

Undetermined edge

(permanently in 1-best tree)

Classifier picks winners among the blue edges

Page 49: Dynamic Feature Selection for Dependency Parsing › ~jason › papers › he+al.emnlp13.slides.pdf · Part-of-Speech Tagging Parsing $ Fruit flies like a banana. 3 Structured Prediction

49

.This time , the firms were ready$

3 gray edges with unknown fate...107 features per gray edge

Current 1-best treeWinner edge

Loser edge

Undetermined edge

(permanently in 1-best tree)

Remove losers in conflict with the winners

Page 50: Dynamic Feature Selection for Dependency Parsing › ~jason › papers › he+al.emnlp13.slides.pdf · Part-of-Speech Tagging Parsing $ Fruit flies like a banana. 3 Structured Prediction

50

.This time , the firms were ready$

3 gray edges with unknown fate...107 features per gray edge

Current 1-best treeWinner edge

Loser edge

Undetermined edge

(permanently in 1-best tree)

Remove losers in conflict with the winners

Page 51: Dynamic Feature Selection for Dependency Parsing › ~jason › papers › he+al.emnlp13.slides.pdf · Part-of-Speech Tagging Parsing $ Fruit flies like a banana. 3 Structured Prediction

51

.This time , the firms were ready$

+ last feature group 3 gray edges with unknown fate...268 features per gray edge

Current 1-best treeWinner edge

Loser edge

Undetermined edge

(permanently in 1-best tree)

Page 52: Dynamic Feature Selection for Dependency Parsing › ~jason › papers › he+al.emnlp13.slides.pdf · Part-of-Speech Tagging Parsing $ Fruit flies like a banana. 3 Structured Prediction

52

.This time , the firms were ready$

0 gray edge with unknown fate...268 features per gray edge

Current 1-best treeWinner edge

Loser edge

Undetermined edge

(permanently in 1-best tree)

Projective decoding to find final 1-best tree

Page 53: Dynamic Feature Selection for Dependency Parsing › ~jason › papers › he+al.emnlp13.slides.pdf · Part-of-Speech Tagging Parsing $ Fruit flies like a banana. 3 Structured Prediction

53

What Happens During the Average Parse?

Page 54: Dynamic Feature Selection for Dependency Parsing › ~jason › papers › he+al.emnlp13.slides.pdf · Part-of-Speech Tagging Parsing $ Fruit flies like a banana. 3 Structured Prediction

54

Most edges winor lose early

What Happens During the Average Parse?

Page 55: Dynamic Feature Selection for Dependency Parsing › ~jason › papers › he+al.emnlp13.slides.pdf · Part-of-Speech Tagging Parsing $ Fruit flies like a banana. 3 Structured Prediction

55Some edges win late

Most edges winor lose early

What Happens During the Average Parse?

Page 56: Dynamic Feature Selection for Dependency Parsing › ~jason › papers › he+al.emnlp13.slides.pdf · Part-of-Speech Tagging Parsing $ Fruit flies like a banana. 3 Structured Prediction

56Some edges win late

Most edges winor lose early

Later features are helpful

What Happens During the Average Parse?

Page 57: Dynamic Feature Selection for Dependency Parsing › ~jason › papers › he+al.emnlp13.slides.pdf · Part-of-Speech Tagging Parsing $ Fruit flies like a banana. 3 Structured Prediction

57Some edges win late

Linear increase in runtime

Most edges winor lose early

Later features are helpful

What Happens During the Average Parse?

Page 58: Dynamic Feature Selection for Dependency Parsing › ~jason › papers › he+al.emnlp13.slides.pdf · Part-of-Speech Tagging Parsing $ Fruit flies like a banana. 3 Structured Prediction

58

Summary: How Early Decisions Are Made

● Winners– Will definitely appear in the 1-best tree

● Losers – Have the same child as a winning edge– Form cycle with winning edges– Cross a winning edge (optional)– Share root ($) with a winning edge (optional)

● Undetermined– Add the next feature group to the remaining

gray edges

Page 59: Dynamic Feature Selection for Dependency Parsing › ~jason › papers › he+al.emnlp13.slides.pdf · Part-of-Speech Tagging Parsing $ Fruit flies like a banana. 3 Structured Prediction

59

Feature Template Ranking

● Forward selection

Page 60: Dynamic Feature Selection for Dependency Parsing › ~jason › papers › he+al.emnlp13.slides.pdf · Part-of-Speech Tagging Parsing $ Fruit flies like a banana. 3 Structured Prediction

60

Feature Template Ranking

● Forward selection

A 0.60B 0.49C 0.55

Page 61: Dynamic Feature Selection for Dependency Parsing › ~jason › papers › he+al.emnlp13.slides.pdf · Part-of-Speech Tagging Parsing $ Fruit flies like a banana. 3 Structured Prediction

61

Feature Template Ranking

● Forward selection

A1 A

A 0.60B 0.49C 0.55

Page 62: Dynamic Feature Selection for Dependency Parsing › ~jason › papers › he+al.emnlp13.slides.pdf · Part-of-Speech Tagging Parsing $ Fruit flies like a banana. 3 Structured Prediction

62

Feature Template Ranking

● Forward selection

A&B 0.80A&C 0.85

A1 A

A 0.60B 0.49C 0.55

Page 63: Dynamic Feature Selection for Dependency Parsing › ~jason › papers › he+al.emnlp13.slides.pdf · Part-of-Speech Tagging Parsing $ Fruit flies like a banana. 3 Structured Prediction

63

Feature Template Ranking

● Forward selection

A&B 0.80A&C 0.85

A C1 A2 C

A 0.60B 0.49C 0.55

Page 64: Dynamic Feature Selection for Dependency Parsing › ~jason › papers › he+al.emnlp13.slides.pdf · Part-of-Speech Tagging Parsing $ Fruit flies like a banana. 3 Structured Prediction

64

Feature Template Ranking

● Forward selection

A&B 0.80A&C 0.85

A CA&C&B 0.9

1 A

3 B2 C

A 0.60B 0.49C 0.55

Page 65: Dynamic Feature Selection for Dependency Parsing › ~jason › papers › he+al.emnlp13.slides.pdf · Part-of-Speech Tagging Parsing $ Fruit flies like a banana. 3 Structured Prediction

65

Feature Template Ranking

● Forward selection

A&B 0.80A&C 0.85

A CA&C&B 0.9

1 A

3 B2 C

A 0.60B 0.49C 0.55

● Groupinghead cPOS+ mod cPOS + in-between punct # 0.49

in-between cPOS 0.59

head POS + mod POS + in-between conj # 0.71

head POS + mod POS + in-between POS + dist 0.72

head token + mod cPOS + dist 0.80

Page 66: Dynamic Feature Selection for Dependency Parsing › ~jason › papers › he+al.emnlp13.slides.pdf · Part-of-Speech Tagging Parsing $ Fruit flies like a banana. 3 Structured Prediction

66

Feature Template Ranking

● Forward selection

A&B 0.80A&C 0.85

A CA&C&B 0.9

1 A

3 B2 C

A 0.60B 0.49C 0.55

● Groupinghead cPOS+ mod cPOS + in-between punct # 0.49

in-between cPOS 0.59

head POS + mod POS + in-between conj # 0.71

head POS + mod POS + in-between POS + dist 0.72

head token + mod cPOS + dist 0.80

+ ~0.1

+ ~0.1

Page 67: Dynamic Feature Selection for Dependency Parsing › ~jason › papers › he+al.emnlp13.slides.pdf · Part-of-Speech Tagging Parsing $ Fruit flies like a banana. 3 Structured Prediction

67

Partition Feature List Into Groups

Page 68: Dynamic Feature Selection for Dependency Parsing › ~jason › papers › he+al.emnlp13.slides.pdf · Part-of-Speech Tagging Parsing $ Fruit flies like a banana. 3 Structured Prediction

68

How to pick the winners?

Page 69: Dynamic Feature Selection for Dependency Parsing › ~jason › papers › he+al.emnlp13.slides.pdf · Part-of-Speech Tagging Parsing $ Fruit flies like a banana. 3 Structured Prediction

69

How to pick the winners?

● Learn a classifier

Page 70: Dynamic Feature Selection for Dependency Parsing › ~jason › papers › he+al.emnlp13.slides.pdf · Part-of-Speech Tagging Parsing $ Fruit flies like a banana. 3 Structured Prediction

70

How to pick the winners?

● Learn a classifier● Features

– Currently added parsing features – Meta-features -- confidence of a prediction

Page 71: Dynamic Feature Selection for Dependency Parsing › ~jason › papers › he+al.emnlp13.slides.pdf · Part-of-Speech Tagging Parsing $ Fruit flies like a banana. 3 Structured Prediction

71

How to pick the winners?

● Learn a classifier● Features

– Currently added parsing features – Meta-features -- confidence of a prediction

● Training examples– Input: each blue edge in current 1-best tree– Output: is the edge in the gold tree? If so,

we want it to win!

Page 72: Dynamic Feature Selection for Dependency Parsing › ~jason › papers › he+al.emnlp13.slides.pdf · Part-of-Speech Tagging Parsing $ Fruit flies like a banana. 3 Structured Prediction

72

Classifier Features

● Currently added parsing features ● Meta-features

– : …, 0.5, 0.8, 0.85

(scores are normalized by the sigmoid function)

– Margins to the highest-scoring competing edge

– Index of the next feature group

the firms

.This time the firms were$

0.720.65 0.30

0.23

0.12

Page 73: Dynamic Feature Selection for Dependency Parsing › ~jason › papers › he+al.emnlp13.slides.pdf · Part-of-Speech Tagging Parsing $ Fruit flies like a banana. 3 Structured Prediction

73

Classifier Features

● Currently added parsing features ● Meta-features

– : …, 0.5, 0.8, 0.85

(scores are normalized by the sigmoid function)

– Margins to the highest-scoring competing edge

– Index of the next feature group

the firms

.This time the firms were$

0.720.65 0.30

0.23

0.12

Page 74: Dynamic Feature Selection for Dependency Parsing › ~jason › papers › he+al.emnlp13.slides.pdf · Part-of-Speech Tagging Parsing $ Fruit flies like a banana. 3 Structured Prediction

74

Classifier Features

● Currently added parsing features ● Meta-features

– : …, 0.5, 0.8, 0.85

(scores are normalized by the sigmoid function)

– Margins to the highest-scoring competing edge

– Index of the next feature group

the firms

.This time the firms were$

0.720.65 0.30

0.23

0.12

Dynamic Features

Page 75: Dynamic Feature Selection for Dependency Parsing › ~jason › papers › he+al.emnlp13.slides.pdf · Part-of-Speech Tagging Parsing $ Fruit flies like a banana. 3 Structured Prediction

75

How To Train With Dynamic Features

● Training examples are not fixed in advance!● Winners/losers from stages < k affect:

– Set of edges to classify at stage k

– The dynamic features of those edges at stage k

● Bad decisions can cause future errors

Page 76: Dynamic Feature Selection for Dependency Parsing › ~jason › papers › he+al.emnlp13.slides.pdf · Part-of-Speech Tagging Parsing $ Fruit flies like a banana. 3 Structured Prediction

76

How To Train With Dynamic Features

● Training examples are not fixed in advance!!● Winners/losers from stages < k affect:

– Set of edges to classify at stage k

– The dynamic features of those edges at stage k

● Bad decisions can cause future errors

Reinforcement / Imitation Learning

● Dataset Aggregation (DAgger) (Ross et al., 2011)

– Iterates between training and running a model

– Learns to recover from past mistakes

Page 77: Dynamic Feature Selection for Dependency Parsing › ~jason › papers › he+al.emnlp13.slides.pdf · Part-of-Speech Tagging Parsing $ Fruit flies like a banana. 3 Structured Prediction

77

Upper Bound of Our Performance

● “Labels”– Gold edges always win

– 96.47% UAS with 2.9% first-order features

.This time , the firms were ready$ .This time , the firms were ready$

Page 78: Dynamic Feature Selection for Dependency Parsing › ~jason › papers › he+al.emnlp13.slides.pdf · Part-of-Speech Tagging Parsing $ Fruit flies like a banana. 3 Structured Prediction

78

How To Train Our Parser

1.Train parsers (non-projective, projective) using all features

2.Rank and group feature templates

3.Iteratively train a classifier to decide winners/losers

Page 79: Dynamic Feature Selection for Dependency Parsing › ~jason › papers › he+al.emnlp13.slides.pdf · Part-of-Speech Tagging Parsing $ Fruit flies like a banana. 3 Structured Prediction

79

Experiment

● Data– Penn Treebank: English

– CoNLL-X: Bulgarian, Chinese, German, Japanese, Portuguese, Swedish

● Parser– MSTParser (McDonald et al., 2006)

● Dynamically-trained Classifier– LibLinear (Fan et al., 2008)

Page 80: Dynamic Feature Selection for Dependency Parsing › ~jason › papers › he+al.emnlp13.slides.pdf · Part-of-Speech Tagging Parsing $ Fruit flies like a banana. 3 Structured Prediction

80

Dynamic Feature Selection Beats Static Forward Selection

Page 81: Dynamic Feature Selection for Dependency Parsing › ~jason › papers › he+al.emnlp13.slides.pdf · Part-of-Speech Tagging Parsing $ Fruit flies like a banana. 3 Structured Prediction

81

Dynamic Feature Selection Beats Static Forward Selection

Add features as needed

Always add the next feature group to all edges

Page 82: Dynamic Feature Selection for Dependency Parsing › ~jason › papers › he+al.emnlp13.slides.pdf · Part-of-Speech Tagging Parsing $ Fruit flies like a banana. 3 Structured Prediction

82

Experiment: 1st-order2x to 6x speedup

BulgarianChinese

EnglishGerman

JapanesePortuguese

Swedish

0

1

2

3

4

5

6

DynFSBaseline

Sp

ee

du

p

Page 83: Dynamic Feature Selection for Dependency Parsing › ~jason › papers › he+al.emnlp13.slides.pdf · Part-of-Speech Tagging Parsing $ Fruit flies like a banana. 3 Structured Prediction

83

Experiment: 1st-order~0.2% loss in accuracy

BulgarianChinese

EnglishGerman

JapanesePortuguese

Swedish

99.3%99.4%99.5%99.6%99.7%99.8%99.9%

100.0%100.1%100.2%100.3%

DynFSBaseline

Re

lativ

e a

ccu

racy

relative accuracy=accuracy of the pruning parser accuracy of the full parser

Page 84: Dynamic Feature Selection for Dependency Parsing › ~jason › papers › he+al.emnlp13.slides.pdf · Part-of-Speech Tagging Parsing $ Fruit flies like a banana. 3 Structured Prediction

84

Second-order Dependency Parsing

were ready .

● Features depend on the siblings as well

● First-order: ● O(n2) substructure to score

● Second-order: ● O(n3) substructure to score

~380 feature templates ~96M features

● Decoding: still O(n3)

the

$

This

time , firms

were

ready .

Page 85: Dynamic Feature Selection for Dependency Parsing › ~jason › papers › he+al.emnlp13.slides.pdf · Part-of-Speech Tagging Parsing $ Fruit flies like a banana. 3 Structured Prediction

85

Experiment: 2nd-order2x to 8x speedup

BulgarianChinese

EnglishGerman

JapanesePortuguese

Swedish

0123456789

DynFSBaseline

Sp

ee

du

p

Page 86: Dynamic Feature Selection for Dependency Parsing › ~jason › papers › he+al.emnlp13.slides.pdf · Part-of-Speech Tagging Parsing $ Fruit flies like a banana. 3 Structured Prediction

86

Experiment: 2nd-order~0.3% loss in accuracy

BulgarianChinese

EnglishGerman

JapanesePortuguese

Swedish

99.3%99.4%99.5%99.6%99.7%99.8%99.9%

100.0%100.1%100.2%100.3%

DynFSBaseline

Re

lativ

e a

ccu

racy

Page 87: Dynamic Feature Selection for Dependency Parsing › ~jason › papers › he+al.emnlp13.slides.pdf · Part-of-Speech Tagging Parsing $ Fruit flies like a banana. 3 Structured Prediction

87

Ours vs Vine Pruning (Rush and Petrov, 2012)

● Vine pruning: a very fast parser that speeds up using orthogonal techniques– Start with short edges (fully scored)

– Add long edges in if needed

● Ours– Start with all edges (partially scored)

– Quickly remove unneeded edges

● Could be combined for further speedup!

Page 88: Dynamic Feature Selection for Dependency Parsing › ~jason › papers › he+al.emnlp13.slides.pdf · Part-of-Speech Tagging Parsing $ Fruit flies like a banana. 3 Structured Prediction

88

VS Vine Pruning: 1st-ordercomparable performance

BulgarianChinese

EnglishGerman

JapanesePortuguese

Swedish

0

1

2

3

4

5

6

DynFSVinePBaseline

Sp

ee

du

p

Page 89: Dynamic Feature Selection for Dependency Parsing › ~jason › papers › he+al.emnlp13.slides.pdf · Part-of-Speech Tagging Parsing $ Fruit flies like a banana. 3 Structured Prediction

89

VS Vine Pruning: 1st-order

BulgarianChinese

EnglishGerman

JapanesePortuguese

Swedish

99.3%99.4%99.5%99.6%99.7%99.8%99.9%

100.0%100.1%100.2%100.3%

DynFSVinePBaseline

Re

lativ

e a

ccu

racy

Page 90: Dynamic Feature Selection for Dependency Parsing › ~jason › papers › he+al.emnlp13.slides.pdf · Part-of-Speech Tagging Parsing $ Fruit flies like a banana. 3 Structured Prediction

90

VS Vine Pruning: 2nd-order

BulgarianChinese

EnglishGerman

JapanesePortuguese

Swedish

0

2

4

6

8

10

12

14

16

DynFSVinePBaseline

Sp

ee

du

p

Page 91: Dynamic Feature Selection for Dependency Parsing › ~jason › papers › he+al.emnlp13.slides.pdf · Part-of-Speech Tagging Parsing $ Fruit flies like a banana. 3 Structured Prediction

91

VS Vine Pruning: 2nd-order

BulgarianChinese

EnglishGerman

JapanesePortuguese

Swedish

99.3%99.4%99.5%99.6%99.7%99.8%99.9%

100.0%100.1%100.2%100.3%

DynFSVinePBaseline

Re

lativ

e a

ccu

racy

Page 92: Dynamic Feature Selection for Dependency Parsing › ~jason › papers › he+al.emnlp13.slides.pdf · Part-of-Speech Tagging Parsing $ Fruit flies like a banana. 3 Structured Prediction

92

Conclusion

● Feature computation is expensive in structured prediction

● Commitment should be made dynamically

● Early commitment to edges reduce both searching and scoring time

● Can be used in other feature-rich models for structured prediction

Page 93: Dynamic Feature Selection for Dependency Parsing › ~jason › papers › he+al.emnlp13.slides.pdf · Part-of-Speech Tagging Parsing $ Fruit flies like a banana. 3 Structured Prediction

93

Backup Slides

Page 94: Dynamic Feature Selection for Dependency Parsing › ~jason › papers › he+al.emnlp13.slides.pdf · Part-of-Speech Tagging Parsing $ Fruit flies like a banana. 3 Structured Prediction

94

Static dictionary pruning (Rush and Petrov, 2012)

VB CD:→ 18VB CD:← 3NN VBG:→ 22NN VBG:← 11...

.This time , the firms were ready$

Page 95: Dynamic Feature Selection for Dependency Parsing › ~jason › papers › he+al.emnlp13.slides.pdf · Part-of-Speech Tagging Parsing $ Fruit flies like a banana. 3 Structured Prediction

95

Reinforcement Learning 101

● Markov Decision Process (MDP)– State: all the information helping us to make

decisions

– Action: things we choose to do

– Reward: criteria for evaluating actions

– Policy: the “brain” that makes the decision

● Goal– Maximize the expected future reward

Page 96: Dynamic Feature Selection for Dependency Parsing › ~jason › papers › he+al.emnlp13.slides.pdf · Part-of-Speech Tagging Parsing $ Fruit flies like a banana. 3 Structured Prediction

96

Policy Learning

π ( + context) = add / lock

● Markov Decision Process (MDP)

– reward = accuracy + λ∙speed

● Reinforcement learning– Delayed reward

– Long time to converge

● Imitation learning– Mimic the oracle

– Reduced to supervised classification problem

the firms

Page 97: Dynamic Feature Selection for Dependency Parsing › ~jason › papers › he+al.emnlp13.slides.pdf · Part-of-Speech Tagging Parsing $ Fruit flies like a banana. 3 Structured Prediction

97

Imitation Learning

● Oracle– (near) optimal performance

– generate target action in any given state

π ( + context) = lockthe firms

time , theπ ( + context) = add

...

Binary classifier

Page 98: Dynamic Feature Selection for Dependency Parsing › ~jason › papers › he+al.emnlp13.slides.pdf · Part-of-Speech Tagging Parsing $ Fruit flies like a banana. 3 Structured Prediction

98

Dataset Aggregation (DAgger)

● Collect data from the oracle only– Different distribution at training and test time

● Iterative policy training

● Correct the learner's mistake● Obtain a policy performs well under its own policy

distribution

Page 99: Dynamic Feature Selection for Dependency Parsing › ~jason › papers › he+al.emnlp13.slides.pdf · Part-of-Speech Tagging Parsing $ Fruit flies like a banana. 3 Structured Prediction

99

Experiment (1st-order)

BulgarianChinese

EnglishGerman

JapanesePortuguese

Swedish

0.00%5.00%

10.00%15.00%20.00%25.00%30.00%35.00%40.00%45.00%

DynFS

Fe

atu

r e c

ost

cost= # feature templates usedtotal # feature templates on the statically pruned graph

Page 100: Dynamic Feature Selection for Dependency Parsing › ~jason › papers › he+al.emnlp13.slides.pdf · Part-of-Speech Tagging Parsing $ Fruit flies like a banana. 3 Structured Prediction

100

Experiment (2nd-order)

BulgarianChinese

EnglishGerman

JapanesePortuguese

Swedish

0.00%

10.00%

20.00%

30.00%

40.00%

50.00%

60.00%

70.00%

80.00%

DynFS

Fe

atu

r e c

ost

Page 101: Dynamic Feature Selection for Dependency Parsing › ~jason › papers › he+al.emnlp13.slides.pdf · Part-of-Speech Tagging Parsing $ Fruit flies like a banana. 3 Structured Prediction

101

Second-order Parsing

.

Page 102: Dynamic Feature Selection for Dependency Parsing › ~jason › papers › he+al.emnlp13.slides.pdf · Part-of-Speech Tagging Parsing $ Fruit flies like a banana. 3 Structured Prediction

102

Second-order Parsing

.