
Syntax and Parsing II Dependency Parsing

Slav Petrov – Google

Thanks to:

Dan Klein, Ryan McDonald, Alexander Rush, Joakim Nivre, Greg Durrett, David Weiss, Luheng He, Timothy Dozat

Lisbon Machine Learning School 2018

Dependency Parsing

[Figure: dependency tree over "They solved the problem with statistics". POS tags: PRON VERB DET NOUN ADP NOUN. Arc labels: ROOT, nsubj, dobj, det, prep, pobj.]

(Non-)Projectivity

• Crossing arcs are needed to account for non-projective constructions

• Fairly rare in English but can be common in other languages (e.g. Czech)
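Whether a tree has crossing arcs can be checked mechanically. The sketch below assumes a head-array encoding that the slides do not spell out: `heads[m]` is the head of token `m`, and token 0 is an artificial root with head -1. The function name and both example trees are illustrative assumptions, including the attachment of "with" to "solved".

```python
def is_projective(heads):
    """Return True if no two arcs of the tree cross.

    heads[m] is the head of token m; token 0 is the root and has head -1.
    """
    # Store each arc as (left end, right end), ignoring direction.
    arcs = [(min(h, m), max(h, m)) for m, h in enumerate(heads) if h >= 0]
    for a, b in arcs:
        for c, d in arcs:
            # Arcs cross when exactly one end of (c, d) lies strictly inside (a, b).
            if a < c < b < d:
                return False
    return True


# "* They solved the problem with statistics" (assumed attachments).
projective_tree = [-1, 2, 0, 4, 2, 2, 5]
# A four-word toy tree where the arc 3 -> 1 crosses the arc 0 -> 2.
crossing_tree = [-1, 3, 0, 2, 2]

print(is_projective(projective_tree), is_projective(crossing_tree))
```

The quadratic loop is enough for sentence-length inputs; it only answers yes or no and does not report which arcs cross.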

Formal Conditions

Styles of Dependency Parsing

• Transition-Based (tr)
  • Fast, greedy, linear time inference algorithms
  • Trained for greedy search
  • Beam search
• Graph-Based (gr)
  • Slower, exhaustive, dynamic programming inference algorithms
  • Higher-order factorizations

[Plot: accuracy against time. greedy tr: O(n); k-best tr: O(k · n); 1st-order gr: O(n^3); 2nd-order gr: O(n^3); 3rd-order gr: O(n^4)]

[Nivre et al. ’03-’11] [McDonald et al. ’05-’06]

Arc-Factored Models

Representation

[Figure: chart over the tokens "* As McGwire neared , fans went wild", with Heads on one axis and Modifiers on the other]

Dependency Representation
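In an arc-factored model the score of a tree is the sum of the scores of its arcs, so a head-by-modifier chart of arc scores is all that is needed to score any tree. A minimal sketch, assuming a nested-list score matrix and a head array (both encodings, and all the numbers, are made up for illustration):

```python
def tree_score(scores, heads):
    """Sum the arc scores of a tree.

    scores[h][m] is the score of the arc h -> m; heads[m] is the head of
    token m, and token 0 is the root (heads[0] = -1).
    """
    return sum(scores[h][m] for m, h in enumerate(heads) if m > 0)


tokens = ["*", "fans", "went", "wild"]
scores = [
    [0, 1, 9, 2],   # arcs leaving "*"
    [0, 0, 3, 1],   # arcs leaving "fans"
    [0, 8, 0, 7],   # arcs leaving "went"
    [0, 2, 4, 0],   # arcs leaving "wild"
]
heads = [-1, 2, 0, 2]   # fans <- went, went <- *, wild <- went

print(tree_score(scores, heads))  # 8 + 9 + 7 = 24
```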


Arc-factored Projective Parsing

Eisner Algorithm

Eisner First-Order Rules

[Diagram: the two combination rules]

• Incomplete span h..m = complete span h..r + complete span r+1..m (headed at m), plus the arc h → m
• Complete span h..e = incomplete span h..m + complete span m..e

In practice also left arc version

First-Order Parsing

[Figure: Eisner first-order parsing chart over the tokens "* As McGwire neared , fans went wild"]


Eisner Algorithm Pseudo Code
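The pseudo code on this slide did not survive extraction. As a stand-in, here is a minimal sketch of first-order Eisner parsing in its usual textbook form, not a transcription of the slide. It assumes a score matrix `scores[h][m]` with token 0 as the root, allows the root to take several dependents, and uses made-up scores.

```python
def eisner(scores):
    """First-order projective parsing; returns heads with heads[0] = -1."""
    n = len(scores)
    NEG = float("-inf")
    # Index 0: left-facing item (head is the right end t).
    # Index 1: right-facing item (head is the left end s).
    complete = [[[0.0, 0.0] if s == t else [NEG, NEG] for t in range(n)]
                for s in range(n)]
    incomplete = [[[NEG, NEG] for _ in range(n)] for _ in range(n)]
    c_back = [[[None, None] for _ in range(n)] for _ in range(n)]
    i_back = [[None] * n for _ in range(n)]

    for width in range(1, n):
        for s in range(n - width):
            t = s + width
            # Incomplete items: two complete halves plus one new arc.
            best, r = max((complete[s][r][1] + complete[r + 1][t][0], r)
                          for r in range(s, t))
            i_back[s][t] = r
            incomplete[s][t][1] = best + scores[s][t]        # arc s -> t
            if s > 0:                                        # the root takes no head
                incomplete[s][t][0] = best + scores[t][s]    # arc t -> s
            # Complete items: an incomplete item absorbs a complete one.
            complete[s][t][0], c_back[s][t][0] = max(
                (complete[s][r][0] + incomplete[r][t][0], r) for r in range(s, t))
            complete[s][t][1], c_back[s][t][1] = max(
                (incomplete[s][r][1] + complete[r][t][1], r) for r in range(s + 1, t + 1))

    heads = [-1] * n

    def backtrack(s, t, d, is_complete):
        if s == t:
            return
        if not is_complete:
            if d == 1:
                heads[t] = s
            else:
                heads[s] = t
            r = i_back[s][t]
            backtrack(s, r, 1, True)
            backtrack(r + 1, t, 0, True)
        elif d == 1:
            r = c_back[s][t][1]
            backtrack(s, r, 1, False)
            backtrack(r, t, 1, True)
        else:
            r = c_back[s][t][0]
            backtrack(s, r, 0, True)
            backtrack(r, t, 0, False)

    backtrack(0, n - 1, 1, True)
    return heads


# Tokens: "*", "fans", "went", "wild" (illustrative scores).
scores = [[0, 1, 9, 2], [0, 0, 3, 1], [0, 8, 0, 7], [0, 2, 4, 0]]
print(eisner(scores))
```

The three nested loops (width, start, split point) give the O(n^3) running time quoted earlier for first-order graph-based parsing.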


Maximum Spanning Trees (MSTs)

Can use MST algorithms for nonprojective parsing!

Chu-Liu-Edmonds

Find Cycle and Contract

Recalculate Edge Weights

Theorem

Final MST

Chu-Liu-Edmonds PseudoCode
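The pseudo code itself did not survive extraction. The sketch below follows the steps named in the preceding slide titles (greedy selection, find a cycle and contract it, recalculate edge weights, expand) in a simple recursive form. It is a sketch under assumptions, not the slide's code: the graph is a dictionary of arc scores, node 0 is the root, every non-root node has at least one incoming arc, and the example scores are illustrative.

```python
def find_cycle(head_of):
    """Return the nodes of one cycle in a {modifier: head} map, or None."""
    for start in head_of:
        path, node = [], start
        while node in head_of and node not in path:
            path.append(node)
            node = head_of[node]
        if node in path:
            return path[path.index(node):]
    return None


def cle(nodes, graph):
    """nodes: set of ids (0 is the root); graph: {(head, modifier): score}."""
    # Greedy step: every non-root node takes its best incoming arc.
    best = {m: max((h for h in nodes if (h, m) in graph),
                   key=lambda h: graph[(h, m)])
            for m in nodes if m != 0}
    cycle = find_cycle(best)
    if cycle is None:
        return best
    # Contract the cycle into a fresh node c and recalculate edge weights.
    c, in_cycle = max(nodes) + 1, set(cycle)
    new_graph, origin = {}, {}
    for (h, m), w in graph.items():
        if h in in_cycle and m in in_cycle:
            continue
        if m in in_cycle:
            # Entering arc: subtract the cycle arc it would replace. The total
            # cycle score is the same for all entering arcs, so it is dropped.
            key, w = (h, c), w - graph[(best[m], m)]
        elif h in in_cycle:
            key = (c, m)          # leaving arc
        else:
            key = (h, m)
        if key not in new_graph or w > new_graph[key]:
            new_graph[key], origin[key] = w, (h, m)
    # Recurse on the contracted graph, then expand c again.
    contracted = cle((nodes - in_cycle) | {c}, new_graph)
    result = {}
    for m, h in contracted.items():
        orig_h, orig_m = origin[(h, m)]
        result[orig_m] = orig_h
    for m in cycle:               # keep every cycle arc except the broken one
        result.setdefault(m, best[m])
    return result


def chu_liu_edmonds(scores):
    """scores[h][m] is the score of arc h -> m; returns heads with heads[0] = -1."""
    n = len(scores)
    graph = {(h, m): scores[h][m]
             for h in range(n) for m in range(1, n) if h != m}
    head_of = cle(set(range(n)), graph)
    return [-1] + [head_of[m] for m in range(1, n)]


# Nodes: 0 = root, 1 = "John", 2 = "saw", 3 = "Mary" (illustrative scores).
scores = [[0, 9, 10, 9], [0, 0, 20, 3], [0, 30, 0, 30], [0, 11, 0, 0]]
print(chu_liu_edmonds(scores))
```

In this example the greedy step picks the cycle John ↔ saw; contracting it and re-scoring the entering arcs makes root → saw the cheapest way to break the cycle.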

Arc Weights

Arc Feature Ideas for f(i,j,k)

• Identities of the words w_i and w_j and the label l_k
• Part-of-speech tags of the words w_i and w_j and the label l_k
• Part-of-speech of words surrounding and between w_i and w_j
• Number of words between w_i and w_j, and their orientation
• Combinations of the above
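The feature ideas above can be sketched as string templates. Everything named here is an assumption for illustration: the function `arc_features`, the template names, the tag sequence, and the label `prep` are not taken from the slides.

```python
def arc_features(words, tags, h, m, label):
    """Feature strings for the labeled arc h -> m (indices into words and tags)."""
    lo, hi = min(h, m), max(h, m)
    direction = "left" if m < h else "right"
    distance = hi - lo
    feats = [
        f"head_word={words[h]}|mod_word={words[m]}|label={label}",
        f"head_tag={tags[h]}|mod_tag={tags[m]}|label={label}",
        f"distance={distance}|direction={direction}",
    ]
    # Tags of the words between head and modifier.
    feats += [f"head_tag={tags[h]}|between_tag={tags[k]}|mod_tag={tags[m]}"
              for k in range(lo + 1, hi)]
    # One combination of the above: tags plus direction and distance.
    feats.append(f"head_tag={tags[h]}|mod_tag={tags[m]}"
                 f"|direction={direction}|distance={distance}")
    return feats


words = ["*", "As", "McGwire", "neared", ",", "fans", "went", "wild"]
tags = ["*", "IN", "NNP", "VBD", ",", "NNS", "VBD", "JJ"]   # assumed tagging
features = arc_features(words, tags, 6, 1, "prep")          # went -> As
for feature in features:
    print(feature)
```

A linear model would map each string to a weight and score the arc as the sum of the weights of its active features.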

First-Order Feature Calculation

[Figure: chart over the tokens "* As McGwire neared , fans went wild"; the features below are those listed for the arc between "went" and "As"]

[went] [VBD] [As] [ADP] [went]

[VERB] [As] [IN] [went, VBD] [As, ADP]

[went, As] [VBD, ADP] [went, VERB] [As, IN] [went, As]

[VERB, IN] [VBD, As, ADP] [went, As, ADP] [went, VBD, ADP] [went, VBD, As]

[ADJ, *, ADP] [VBD, *, ADP] [VBD, ADJ, ADP] [VBD, ADJ, *] [NNS, *, ADP]

[NNS, VBD, ADP] [NNS, VBD, *] [ADJ, ADP, NNP] [VBD, ADP, NNP] [VBD, ADJ, NNP]

[NNS, ADP, NNP] [NNS, VBD, NNP] [went, left, 5] [VBD, left, 5] [As, left, 5]

[ADP, left, 5] [VERB, As, IN] [went, As, IN] [went, VERB, IN] [went, VERB, As]

[JJ, *, IN] [VERB, *, IN] [VERB, JJ, IN] [VERB, JJ, *] [NOUN, *, IN]

[NOUN, VERB, IN] [NOUN, VERB, *] [JJ, IN, NOUN] [VERB, IN, NOUN] [VERB, JJ, NOUN]

[NOUN, IN, NOUN] [NOUN, VERB, NOUN] [went, left, 5] [VERB, left, 5] [As, left, 5]

[IN, left, 5] [went, VBD, As, ADP] [VBD, ADJ, *, ADP] [NNS, VBD, *, ADP] [VBD, ADJ, ADP, NNP]

[NNS, VBD, ADP, NNP] [went, VBD, left, 5] [As, ADP, left, 5] [went, As, left, 5] [VBD, ADP, left, 5]

[went, VERB, As, IN] [VERB, JJ, *, IN] [NOUN, VERB, *, IN] [VERB, JJ, IN, NOUN] [NOUN, VERB, IN, NOUN]

[went, VERB, left, 5] [As, IN, left, 5] [went, As, left, 5] [VERB, IN, left, 5] [VBD, As, ADP, left, 5]

[went, As, ADP, left, 5] [went, VBD, ADP, left, 5] [went, VBD, As, left, 5] [ADJ, *, ADP, left, 5] [VBD, *, ADP, left, 5]

[VBD, ADJ, ADP, left, 5] [VBD, ADJ, *, left, 5] [NNS, *, ADP, left, 5] [NNS, VBD, ADP, left, 5] [NNS, VBD, *, left, 5]

[ADJ, ADP, NNP, left, 5] [VBD, ADP, NNP, left, 5] [VBD, ADJ, NNP, left, 5] [NNS, ADP, NNP, left, 5] [NNS, VBD, NNP, left, 5]

[VERB, As, IN, left, 5] [went, As, IN, left, 5] [went, VERB, IN, left, 5] [went, VERB, As, left, 5] [JJ, *, IN, left, 5]

[VERB, *, IN, left, 5] [VERB, JJ, IN, left, 5] [VERB, JJ, *, left, 5] [NOUN, *, IN, left, 5] [NOUN, VERB, IN, left, 5]

[NOUN, VERB, *, left, 5] [JJ, IN, NOUN, left, 5] [VERB, IN, NOUN, left, 5] [VERB, JJ, NOUN, left, 5] [NOUN, IN, NOUN, left, 5]

[NOUN, VERB, NOUN, left, 5] [went, VBD, As, ADP, left, 5] [VBD, ADJ, *, ADP, left, 5] [NNS, VBD, *, ADP, left, 5] [VBD, ADJ, ADP, NNP, left, 5]

[NNS, VBD, ADP, NNP, left, 5] [went, VERB, As, IN, left, 5] [VERB, JJ, *, IN, left, 5] [NOUN, VERB, *, IN, left, 5] [VERB, JJ, IN, NOUN, left, 5]

[NOUN, VERB, IN, NOUN, left, 5]

First-Order Feature Computation


(Structured) Perceptron
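The body of this slide did not survive extraction. A minimal sketch of the standard structured perceptron update follows: decode with the current weights, and on a mistake add the gold features and subtract the predicted ones. The toy `features` and `decode` functions are hypothetical stand-ins; in particular `decode` picks each head independently, where a real parser would call Eisner or Chu-Liu-Edmonds.

```python
from collections import Counter


def features(tags, heads):
    """Toy arc features: one (head tag, modifier tag) count per arc."""
    return Counter((tags[h], tags[m]) for m, h in enumerate(heads) if m > 0)


def decode(tags, weights):
    """Toy decoder: best head for each modifier, chosen independently."""
    n = len(tags)
    heads = [-1]
    for m in range(1, n):
        candidates = [h for h in range(n) if h != m]
        heads.append(max(candidates, key=lambda h: weights[(tags[h], tags[m])]))
    return heads


def perceptron_epoch(data, weights):
    """One pass over the data; returns the number of mistaken sentences."""
    mistakes = 0
    for tags, gold in data:
        predicted = decode(tags, weights)
        if predicted != gold:
            mistakes += 1
            weights.update(features(tags, gold))         # reward gold arcs
            weights.subtract(features(tags, predicted))  # penalize predicted arcs
    return mistakes


tags, gold = ["*", "PRON", "VERB"], [-1, 2, 0]
weights = Counter()
history = [perceptron_epoch([(tags, gold)], weights) for _ in range(2)]
print(history, decode(tags, weights))
```

Averaging the weight vectors over all updates, listed later among the hyperparameters as parameter averaging, is left out to keep the sketch short.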

Transition Based Dependency Parsing

• Process sentence left to right
• Different transition strategies available
• Delay decisions by pushing on stack

• Arc-Standard Transition Strategy [Nivre ’03]

Initial configuration: ([],[0,…,n],[])
Terminal configuration: ([0],[],A)

shift: (σ,[i|β],A) ⇒ ([σ|i],β,A)

left-arc (label): ([σ|i|j],β,A) ⇒ ([σ|j],β,A∪{(j,l,i)})

right-arc (label): ([σ|i|j],β,A) ⇒ ([σ|i],β,A∪{(i,l,j)})
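The three transitions translate almost line for line into code. This sketch assumes tokens are numbered 1..n with 0 as the root, and it brackets the action sequence of the worked example for "I booked a flight to Lisbon" with two steps the example does not show: an initial SHIFT of the root and a final RIGHT-ARC labeled `root`.

```python
def arc_standard(n_tokens, actions):
    """Apply (action, label) pairs; tokens are 1..n_tokens and 0 is the root.

    Returns (stack, buffer, arcs), with arcs as (head, label, dependent) triples.
    """
    stack, buffer, arcs = [], list(range(n_tokens + 1)), set()
    for action, label in actions:
        if action == "SHIFT":
            stack.append(buffer.pop(0))
        elif action == "LEFT-ARC":     # top j becomes the head of i below it
            j, i = stack.pop(), stack.pop()
            arcs.add((j, label, i))
            stack.append(j)
        elif action == "RIGHT-ARC":    # i below the top becomes the head of j
            j = stack.pop()
            arcs.add((stack[-1], label, j))
        else:
            raise ValueError(f"unknown action: {action}")
    return stack, buffer, arcs


# 1 = I, 2 = booked, 3 = a, 4 = flight, 5 = to, 6 = Lisbon
actions = [
    ("SHIFT", None),                      # root (assumed step)
    ("SHIFT", None), ("SHIFT", None), ("LEFT-ARC", "nsubj"),
    ("SHIFT", None), ("SHIFT", None), ("LEFT-ARC", "det"),
    ("SHIFT", None), ("SHIFT", None), ("RIGHT-ARC", "pobj"),
    ("RIGHT-ARC", "prep"), ("RIGHT-ARC", "dobj"),
    ("RIGHT-ARC", "root"),                # attach to root (assumed step)
]
stack, buffer, arcs = arc_standard(6, actions)
print(stack, buffer, sorted(arcs))
```

Each token is shifted once and removed from the stack once, so a sentence of n tokens takes a number of transitions linear in n. The sketch does not guard against illegal moves such as a LEFT-ARC that would give the root a head.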

Arc-Standard Example

I booked a flight to Lisbon

[Animation: stack (↑) and buffer (←) contents at each step. The actions shown, in order, with the arcs already drawn when each action is taken:]

Step  Action          Arcs drawn so far
1     SHIFT           (none)
2     SHIFT           (none)
3     LEFT-ARC nsubj  (none)
4     SHIFT           nsubj
5     SHIFT           nsubj
6     LEFT-ARC det    nsubj
7     SHIFT           nsubj det
8     SHIFT           nsubj det
9     RIGHT-ARC pobj  nsubj det
10    RIGHT-ARC prep  nsubj det pobj
11    RIGHT-ARC dobj  nsubj det pobj prep

Final arcs: nsubj, det, pobj, prep, dobj

Features

[Figure: parser configuration with "I booked" and "a flight" on the stack (↑) and "to Lisbon" in the buffer (←); candidate actions: RIGHT-ARC? LEFT-ARC? SHIFT]

Stack top word = “flight”
Stack top POS tag = “NOUN”
Buffer front word = “to”
Child of stack top word = “a”
....
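Features like the four listed on this slide can be read directly off a parser configuration. The sketch below assumes the configuration is given as index lists plus a map from heads to the children attached so far; the function name, the feature-name strings, and every POS tag except NOUN for "flight" are assumptions.

```python
def configuration_features(stack, buffer, words, tags, children):
    """Feature strings for a configuration; stack[-1] is the stack top."""
    feats = []
    if stack:
        top = stack[-1]
        feats.append(f"stack0-word={words[top]}")
        feats.append(f"stack0-POS={tags[top]}")
        for child in children.get(top, []):
            feats.append(f"stack0-child-word={words[child]}")
    if buffer:
        feats.append(f"buffer0-word={words[buffer[0]]}")
    return feats


words = ["*", "I", "booked", "a", "flight", "to", "Lisbon"]
tags = ["*", "PRON", "VERB", "DET", "NOUN", "ADP", "NOUN"]
# Stack: booked, flight. Buffer: to, Lisbon. Arcs so far: booked -> I, flight -> a.
feats = configuration_features([2, 4], [5, 6], words, tags, {2: [1], 4: [3]})
print(feats)
```

A classifier over these features then chooses between SHIFT, LEFT-ARC and RIGHT-ARC.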

SVM / Structured Perceptron Hyperparameters

• Regularization
• Loss function
• Hand-crafted features

Features ZPar Parser

# From Single Words
pair { stack.tag stack.word }
stack { word tag }
pair { input.tag input.word }
input { word tag }
pair { input(1).tag input(1).word }
input(1) { word tag }
pair { input(2).tag input(2).word }
input(2) { word tag }

# From word pairs
quad { stack.tag stack.word input.tag input.word }
triple { stack.tag stack.word input.word }
triple { stack.word input.tag input.word }
triple { stack.tag stack.word input.tag }
triple { stack.tag input.tag input.word }
pair { stack.word input.word }
pair { stack.tag input.tag }
pair { input.tag input(1).tag }

# From word triples
triple { input.tag input(1).tag input(2).tag }
triple { stack.tag input.tag input(1).tag }
triple { stack.head(1).tag stack.tag input.tag }
triple { stack.tag stack.child(-1).tag input.tag }
triple { stack.tag stack.child(1).tag input.tag }
triple { stack.tag input.tag input.child(-1).tag }

# Distance
pair { stack.distance stack.word }
pair { stack.distance stack.tag }
pair { stack.distance input.word }
pair { stack.distance input.tag }
triple { stack.distance stack.word input.word }
triple { stack.distance stack.tag input.tag }

# valency
pair { stack.word stack.valence(-1) }
pair { stack.word stack.valence(1) }
pair { stack.tag stack.valence(-1) }
pair { stack.tag stack.valence(1) }
pair { input.word input.valence(-1) }
pair { input.tag input.valence(-1) }

# unigrams
stack.head(1) {word tag}
stack.label
stack.child(-1) {word tag label}
stack.child(1) {word tag label}
input.child(-1) {word tag label}

# third order
stack.head(1).head(1) {word tag}
stack.head(1).label
stack.child(-1).sibling(1) {word tag label}
stack.child(1).sibling(-1) {word tag label}
input.child(-1).sibling(1) {word tag label}
triple { stack.tag stack.child(-1).tag stack.child(-1).sibling(1).tag }
triple { stack.tag stack.child(1).tag stack.child(1).sibling(-1).tag }
triple { stack.tag stack.head(1).tag stack.head(1).head(1).tag }
triple { input.tag input.child(-1).tag input.child(-1).sibling(1).tag }

# label set
pair { stack.tag stack.child(-1).label }
triple { stack.tag stack.child(-1).label stack.child(-1).sibling(1).label }
quad { stack.tag stack.child(-1).label stack.child(-1).sibling(1).label stack.child(-1).sibling(2).label }
pair { stack.tag stack.child(1).label }
triple { stack.tag stack.child(1).label stack.child(1).sibling(-1).label }
quad { stack.tag stack.child(1).label stack.child(1).sibling(-1).label stack.child(1).sibling(-2).label }
pair { input.tag input.child(-1).label }
triple { input.tag input.child(-1).label input.child(-1).sibling(1).label }
quad { input.tag input.child(-1).label input.child(-1).sibling(1).label input.child(-1).sibling(2).label }

Neural Network Transition Based Parser

[Figure: feed-forward network, bottom to top: Atomic Inputs (words, pos, labels) → Embedding Layer → Hidden Layer → Softmax]

f0 = 1 [buffer0-word = “to”]
f1 = 1 [buffer1-word = “Bilbao”]
f2 = 1 [buffer0-POS = “IN”]
f3 = 1 [stack0-label = “pobj”]
…

[Chen & Manning ’14] and [Weiss et al. ’15, Andor et al. ’16]
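The layers named on this slide can be sketched as a single forward pass in plain Python. The sizes, the random initialization, the ReLU activation, the three-action output and the single shared embedding table are all simplifying assumptions; the cited parsers use separate tables for words, POS tags and labels.

```python
import math
import random


def forward(feature_ids, embeddings, W_h, b_h, W_o, b_o):
    """Embedding lookup -> hidden layer (ReLU) -> softmax over actions."""
    # Embedding layer: look up each active feature and concatenate.
    x = [value for fid in feature_ids for value in embeddings[fid]]
    # Hidden layer.
    h = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(W_h, b_h)]
    # Softmax, shifted by the largest logit for numerical stability.
    logits = [sum(w * hi for w, hi in zip(row, h)) + b
              for row, b in zip(W_o, b_o)]
    top = max(logits)
    exps = [math.exp(logit - top) for logit in logits]
    total = sum(exps)
    return [e / total for e in exps]


def rand_matrix(rows, cols):
    return [[random.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
vocab = {"buffer0-word=to": 0, "buffer1-word=Bilbao": 1,
         "buffer0-POS=IN": 2, "stack0-label=pobj": 3}
ACTIONS = ["SHIFT", "LEFT-ARC", "RIGHT-ARC"]
EMB, HID = 4, 8                                   # illustrative sizes
embeddings = rand_matrix(len(vocab), EMB)
W_h, b_h = rand_matrix(HID, EMB * len(vocab)), [0.0] * HID
W_o, b_o = rand_matrix(len(ACTIONS), HID), [0.0] * len(ACTIONS)

probs = forward(list(vocab.values()), embeddings, W_h, b_h, W_o, b_o)
print(dict(zip(ACTIONS, probs)))
```

With untrained weights the three probabilities are close to uniform; training would adjust the embeddings and both weight matrices together.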


Neural Network Transition Based Parser [Weiss et al. ’15]

[Figure: the same network with a second hidden layer: Atomic Inputs (words, pos, labels) → Embedding Layer → Hidden Layer 1 → Hidden Layer 2 → Softmax]

Neural Network Transition Based Parser

[Figure: the network over the same atomic inputs (words, pos, labels), with its activations feeding a structured perceptron [Weiss et al. ’15] or a globally-normalized CRF [Andor et al. ’16]]

NN Hyperparameters

• Dimensions
• Activation function
• Initialization
• Adagrad
• Dropout
• Mini-batch size
• Initial learning rate
• Learning rate schedule
• Momentum
• Stopping time
• Parameter averaging
• Regularization
• Loss function


NN Hyperparameters

Optimization matters! Use random restarts, grid

Pick best using holdout data

Tune: WSJ S24 Dev: WSJ S22 Test: WSJ S23

Random Restarts: How much Variance?

[Scatter plot "Variance of Networks on Tuning/Dev Set". x-axis: UAS (%) on WSJ Tune Set, 91.2 to 92. y-axis: UAS (%) on WSJ Dev Set, 92 to 92.7. Series: Pretrained 200x200, Pretrained 200, 200x200, 200.]

2nd hidden layer + pre training increases correlation

Effect of Embedding Dimensions

[Plot "Word Tuning on WSJ (Tune Set, Dpos,Dlabels=32)". x-axis: Word Embedding Dimension (Dwords), 1 to 128. y-axis: UAS (%), 89.5 to 92. Series: Pretrained 200x200, Pretrained 200, 200x200, 200.]

[Plot "POS/Label Tuning on WSJ (Tune Set, Dwords=64)". x-axis: POS/Label Embedding Dimension (Dpos,Dlabels), 1 to 32. y-axis: UAS (%), 90.5 to 92. Series: Pretrained 200x200, Pretrained 200, 200x200, 200.]


Do we need structure? [Dozat & Manning ’17]


Bi-Affine Parsing
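The body of this slide did not survive extraction. In the bi-affine scorer commonly associated with [Dozat & Manning ’17], each arc score combines a head vector and a dependent vector through a matrix U, plus a bias term that depends on the head only. The sketch below assumes those vectors are already computed (in the paper they come from a BiLSTM followed by small feed-forward layers); all names and numbers are illustrative.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def biaffine_scores(head_vecs, dep_vecs, U, u):
    """scores[h][m] = head_vecs[h]^T U dep_vecs[m] + u . head_vecs[h]."""
    scores = []
    for head in head_vecs:
        head_U = [dot(head, column) for column in zip(*U)]   # head^T U
        bias = dot(u, head)
        scores.append([dot(head_U, dep) + bias for dep in dep_vecs])
    return scores


head_vecs = [[1, 0], [0, 1]]
dep_vecs = [[1, 0], [0, 1]]
U = [[1, 2], [3, 4]]
u = [10, 20]

scores = biaffine_scores(head_vecs, dep_vecs, U, u)
# Greedy head choice per token, with no tree constraint.
greedy_heads = [max(range(len(head_vecs)), key=lambda h: scores[h][m])
                for m in range(len(dep_vecs))]
print(scores, greedy_heads)
```

The greedy choice ignores tree structure entirely; the resulting score matrix can equally be passed to a spanning tree algorithm when a well-formed tree is required.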

Self-Attention

English Results (WSJ 23)

Method                                   UAS    LAS    Beam
3rd-order Graph-based (ZM2014)           93.22  91.02  -
Transition-based Linear (ZN2011)         93.00  90.95  32
NN Baseline (Chen & Manning, 2014)       91.80  89.60  1
NN Better SGD (Weiss et al., 2015)       92.58  90.54  1
NN Deeper Network (Weiss et al., 2015)   93.19  91.18  1
NN Perceptron (Weiss et al., 2015)       93.99  92.05  8
NN Semi-supervised (Weiss et al., 2015)  94.26  92.41  8
S-LSTM (Dyer et al., 2015)               93.20  90.90  1
Contrastive NN (Zhou et al., 2015)       92.83  -      100
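UAS (unlabeled attachment score) is the percentage of tokens that receive the correct head; LAS (labeled attachment score) also requires the correct label. A minimal sketch, assuming each token is a `(head, label)` pair and ignoring evaluation conventions such as punctuation handling; the example parse and the `root` label are illustrative.

```python
def attachment_scores(gold, predicted):
    """Return (UAS, LAS) in percent for aligned lists of (head, label) pairs."""
    total = len(gold)
    correct_heads = sum(g[0] == p[0] for g, p in zip(gold, predicted))
    correct_both = sum(g == p for g, p in zip(gold, predicted))
    return 100.0 * correct_heads / total, 100.0 * correct_both / total


# Tokens 1..6: I booked a flight to Lisbon
gold = [(2, "nsubj"), (0, "root"), (4, "det"), (2, "dobj"), (4, "prep"), (5, "pobj")]
predicted = [(2, "nsubj"), (0, "root"), (4, "det"), (2, "pobj"), (2, "prep"), (5, "pobj")]

uas, las = attachment_scores(gold, predicted)
print(f"UAS = {uas:.2f}, LAS = {las:.2f}")
```

Here five of six heads are right and four of six tokens have both head and label right.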


Multilingual Results


Beyond Syntax: Semantic Structures

On January 13, 2018, a false ballistic missile alert was issued via

the Emergency Alert System and Commercial Mobile Alert System over

television, radio, and cellphones in the U.S. state of Hawaii.

The alert stated that there was an incoming ballistic missile threat to Hawaii,

advised residents to seek shelter, and concluded "This is not a drill".

The message was sent at 8:07 a.m. local time.

[Figure: the passage annotated with NER (LOC), coreference, and SRL (Agent, Patient, Time) structures]

From Wikipedia: 2018 Hawaii false missile alert. Only part of the structures are visualized.

SRL Systems: Pipelined vs. BIO-based

SRL as a BIO Tagging Problem

Sentence: Many tourists visit Disney to meet their favorite cartoon characters

Input1 (predicate: visit)
BIO Output1: B-A0 I-A0 B-V B-A1 B-AM-PRP I-AM-PRP I-AM-PRP I-AM-PRP I-AM-PRP I-AM-PRP
Span Output1: ARG0 V ARG1 AM-PRP

Input2 (predicate: meet)
BIO Output2: B-A0 I-A0 O O O B-V B-A1 I-A1 I-A1 I-A1
Span Output2: ARG0 V ARG1
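Converting a BIO tag sequence into labeled spans is a single pass over the tags. This sketch uses inclusive token indices and treats a stray `I-` tag without a matching `B-` as the start of a new span; both choices are assumptions, not something the slides specify.

```python
def bio_to_spans(tags):
    """Convert BIO tags to (label, start, end) spans with inclusive indices."""
    spans, start, label = [], None, None
    for i, tag in enumerate(tags + ["O"]):      # the sentinel closes the last span
        if tag.startswith("I-") and tag[2:] == label:
            continue                            # the current span continues
        if label is not None:
            spans.append((label, start, i - 1))
            start, label = None, None
        if tag.startswith(("B-", "I-")):
            start, label = i, tag[2:]
    return spans


# Predicate "visit"
tags_visit = ["B-A0", "I-A0", "B-V", "B-A1", "B-AM-PRP"] + ["I-AM-PRP"] * 5
# Predicate "meet"
tags_meet = ["B-A0", "I-A0", "O", "O", "O", "B-V", "B-A1", "I-A1", "I-A1", "I-A1"]

print(bio_to_spans(tags_visit))
print(bio_to_spans(tags_meet))
```

Because the tagger is run once per predicate, the same sentence yields a different tag sequence and a different set of spans for "visit" and for "meet".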

DeepSRL Architecture (Revisit)

[Figure: layers, bottom to top: Input sentence → Word & Pred. Embeddings → Highway BiLSTMs → Tagging Softmax → Output Labels]

Input sentence: Many tourists visit Disney to meet their favorite cartoon characters
Target Predicate: [0] [0] [1] [0] [0] [0] [0] [0] [0] [0]
Output Labels: B-A0 I-A0 B-V B-V … …

LSGN Architecture: Overview

[Figure: layers, bottom to top: Input sentence → Word & Char Embeddings → Highway BiLSTMs → Span Representation → Node & Edge Scores → Labeling Softmax]

Input sentence: Many tourists visit Disney to meet their favorite cartoon characters

Example spans: "Many tourists", "tourists visit Disney", "Disney to meet their", "their favorite cartoon", "cartoon characters"

No predicate input!

(1) Construct span representations for all n^2 spans!
(2) Local classifier over labels (including NULL) for all possible (predicate, argument) pairs
(3) Greedy beam pruning for spans
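Steps (1) and (3) can be sketched without any neural machinery: enumerate every contiguous span, then keep only the best-scoring few. In the model the span score is learned; the lambda used here is a hypothetical stand-in, and the `max_width` parameter is an added convenience rather than something the slide mentions.

```python
def enumerate_spans(n, max_width=None):
    """All (start, end) spans with inclusive ends, optionally width-limited."""
    if max_width is None:
        max_width = n
    return [(i, j) for i in range(n) for j in range(i, min(n, i + max_width))]


def prune(spans, score, k):
    """Keep the k best-scoring spans, returned in sentence order."""
    return sorted(sorted(spans, key=score, reverse=True)[:k])


sentence = "Many tourists visit Disney to meet their favorite cartoon characters".split()
spans = enumerate_spans(len(sentence))
short_spans = enumerate_spans(len(sentence), max_width=3)

# Stand-in score: prefer longer spans (a trained model would supply this).
kept = prune(spans, score=lambda span: span[1] - span[0], k=1)
print(len(spans), len(short_spans), kept)
```

A sentence of n tokens has n(n+1)/2 contiguous spans, which is the quadratic growth that makes pruning necessary before scoring (predicate, argument) pairs.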

End-to-End SRL Results

[Bar chart: F1 (axis 60 to 90) on WSJ Test, Brown (out-domain) Test, and CoNLL2012 (OntoNotes) for DeepSRL, LSGN, DeepSRL (Ensemble), and LSGN+ELMo. Bar values as extracted, assignment to bars not recoverable: 82.9, 78.4, 79.8, 76.8, 76.1, 70.1, 70.8, 68.5, 86, 82.7, 82.5, 81.2]

• More improvements on Brown (out-domain) & OntoNotes (with nominal predicates)

• With ELMo, over 3 points improvement over ensemble model!

Summary

• Constituency Parsing
  • CKY Algorithm
  • Lexicalized Grammars
  • Latent Variable Grammars
  • Conditional Random Field Parsing
  • Neural Network Representations
• Dependency Parsing
  • Eisner Algorithm
  • Maximum Spanning Tree Algorithm
  • Transition Based Parsing
  • Neural Network Representations
• Semantic Role Labeling