Should Neural Network Architecture Reflect Linguistic Structure?
Wang Ling (Google/DeepMind), Austin Matthews (CMU), Noah A. Smith (UW), Miguel Ballesteros (UPF); joint work with Chris Dyer (CMU → Google/DeepMind). ICLR 2016, May 2, 2016.


Page 1:
Should Neural Network Architecture Reflect Linguistic Structure?

Wang Ling (Google/DeepMind), Austin Matthews (CMU), Noah A. Smith (UW), Miguel Ballesteros (UPF)
Joint work with: Chris Dyer (CMU → Google/DeepMind)

ICLR 2016, May 2, 2016


Pages 2–11:
Learning language

ARBITRARINESS (de Saussure, 1916)
COMPOSITIONALITY (Frege, 1892)

car: Auto, voiture, xe hơi, ọkọ ayọkẹlẹ, koloi, sakyanan

car - c + b = bar        cat - c + b = bat

John dances - John + Mary = Mary dances        dance(John), dance(Mary)
John sings - John + Mary = Mary sings          sing(John), sing(Mary)

Memorize / Generalize


Pages 12–15:
Learning language

John gave the book to Mary

v_John = P × [0 0 ⋯ 0 1 0 ⋯]ᵀ    (word lookup: the projection matrix P times a one-hot vector)
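
A minimal sketch (mine, not the slides') of this lookup-as-projection view, with a hypothetical toy vocabulary and a random projection matrix:

```python
import numpy as np

# Word lookup as a projection of a one-hot vector (toy vocabulary, random P).
vocab = {"John": 0, "gave": 1, "the": 2, "book": 3, "to": 4, "Mary": 5}
dim = 4                                    # toy embedding dimension
rng = np.random.default_rng(0)
P = rng.normal(size=(dim, len(vocab)))     # projection / lookup matrix

one_hot = np.zeros(len(vocab))
one_hot[vocab["John"]] = 1.0

v_john = P @ one_hot                       # same as selecting a column of P
assert np.allclose(v_john, P[:, vocab["John"]])
print(v_john)
```

Multiplying by a one-hot vector just selects the corresponding column of P, which is why lookup tables and projections of one-hot vectors are interchangeable.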


Pages 16–22:
Learning language

John gave the book to Mary  →  f  →  prediction

f: Recurrent NN, ConvNet, Recursive NN, …

Memorize / Generalize
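
A minimal numpy sketch (illustrative only; the embeddings, weights, and two-class prediction head are made up) of the box f as a simple recurrent network reading the sentence and emitting a prediction:

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n_classes = 8, 2
W_h, W_x, W_out = (rng.normal(scale=0.1, size=s)
                   for s in [(dim, dim), (dim, dim), (n_classes, dim)])

sentence = "John gave the book to Mary".split()
word_vecs = {w: rng.normal(size=dim) for w in sentence}   # stand-in embeddings

h = np.zeros(dim)
for w in sentence:                       # f as a simple recurrent network
    h = np.tanh(W_h @ h + W_x @ word_vecs[w])

logits = W_out @ h                       # prediction head on the final state
probs = np.exp(logits) / np.exp(logits).sum()
print(probs)
```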


Pages 23–30:
Learning language

CHALLENGE 1: IDIOMS
John saw the football     →  see(John, football)
John saw the bucket       →  see(John, bucket)
John kicked the football  →  kick(John, football)
John kicked the bucket    →  die(John)

CHALLENGE 2: MORPHOLOGY
cool | coooool | coooooooool
cat + s = cats
bat + s = bats


Pages 31–38:
Compositional words

cats

Memorize: word lookup, [0 0 ⋯ 0 1 0 ⋯]ᵀ
Generalize: character LSTM over START c a t s STOP
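
A minimal sketch (illustrative, not the talk's exact architecture) of the "generalize" option: composing a word vector from its characters with a bidirectional LSTM. The vocabulary, dimensions, and the "<"/">" START/STOP markers are placeholders.

```python
import torch
import torch.nn as nn

class CharWordEmbedder(nn.Module):
    """Compose a word vector from its characters with a bidirectional LSTM."""
    def __init__(self, char_vocab_size, char_dim=16, word_dim=32):
        super().__init__()
        self.char_emb = nn.Embedding(char_vocab_size, char_dim)
        self.bilstm = nn.LSTM(char_dim, word_dim // 2,
                              batch_first=True, bidirectional=True)

    def forward(self, char_ids):              # char_ids: (1, num_chars)
        embedded = self.char_emb(char_ids)    # (1, num_chars, char_dim)
        _, (h_n, _) = self.bilstm(embedded)   # h_n: (2, 1, word_dim // 2)
        # Concatenate the final forward and backward states into one word vector.
        return torch.cat([h_n[0], h_n[1]], dim=-1)  # (1, word_dim)

chars = {ch: i for i, ch in enumerate("<>abcdefghijklmnopqrstuvwxyz")}
embedder = CharWordEmbedder(len(chars))
cats = torch.tensor([[chars[c] for c in "<cats>"]])  # "<" and ">" mark START/STOP
print(embedder(cats).shape)  # torch.Size([1, 32])
```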

Page 39: Should Neural Network Architecture Reflect Linguistic ...demo.clab.cs.cmu.edu/cdyer/iclr-ling.pdf · Should Neural Network Architecture Reflect Linguistic Structure? Wang Ling (Google/DeepMind)

• Does a “compositional” model have the capacity to learn the “arbitrariness” that is required?

• We might think so—RNNs/LSTMs can definitely overfit!

• Will we see better improvements in languages with more going on in the morphology?

Compositional wordsQuestions


Pages 40–44:
Example: dependency parsing

ROOT  I saw her duck

Word embedding models: Memorize (word lookup) / Generalize (character LSTM)

Page 45:
Dependency parsing: CharLSTM > word lookup

Language     Word   Chars     Δ
English      91.2   91.5    +0.3
Turkish      71.7   76.3    +4.6
Hungarian    72.8   80.4    +7.6
Basque       77.1   85.2    +8.1
Korean       78.7   88.4    +9.7
Swedish      76.4   79.2    +2.8
Arabic       85.2   86.1    +0.9
Chinese      79.1   79.9    +0.8

Pages 46–51:
Dependency parsing: CharLSTM > word lookup (same table, annotated)

In English parsing, the character LSTM is roughly equivalent to the lookup approach.

What about languages with richer lexicons?
Turkish: Muvaffakiyetsizleştiricileştiriveremeyebileceklerimizdenmişsinizcesine
Hungarian: Megszentségteleníthetetlenségeskedéseitekért

In agglutinative languages, the character LSTM is much better than word lookup.
In fusional/templatic languages, the character LSTM is better than word lookup.
In analytic languages, the models are roughly equivalent.

Pages 52–54:
Language modeling: word similarities

query:                 increased   John      Noahshire        phding
5 nearest neighbors:   reduced     Richard   Nottinghamshire  mixing
                       improved    George    Bucharest        modelling
                       expected    James     Saxony           styling
                       decreased   Robert    Johannesburg     blaming
                       targeted    Edward    Gloucestershire  christening

Page 55:
Character vs. word modeling: Summary

• Lots of exciting work from a variety of places
• Google Brain: language models
• Harvard/NYU (Kim, Rush, Sontag): language models
• NYU/FB: document representation "from scratch"
• CMU (me, Cohen, Salakhutdinov): Twitter, morphologically rich languages, translation
• Now for something a bit more controversial…

Pages 56–57:
Structure-aware words

cats

Memorize: word lookup, [0 0 ⋯ 0 1 0 ⋯]ᵀ
Generalize: character LSTM over START c a t s STOP

Would we benefit from a more knowledge-rich decomposition?
cat +PL

Page 58:
Open-vocabulary LMs

• Rather than assuming a fixed vocabulary, model any sequence in Σ*, where Σ is the inventory of characters.
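
A minimal illustration (mine, deliberately trivial) of why a character-level model covers all of Σ*: any string over the inventory gets nonzero probability, here from a uniform per-character model with an explicit STOP symbol:

```python
import math

# Toy open-vocabulary LM: every string over sigma has nonzero probability.
sigma = list("abcdefghijklmnopqrstuvwxyz ")
p_char = 1.0 / (len(sigma) + 1)              # uniform over sigma plus STOP

def log_prob(s):
    assert all(ch in sigma for ch in s)
    return (len(s) + 1) * math.log(p_char)   # one factor per character, then STOP

print(log_prob("cats"), log_prob("coooooooool"))
```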


Pages 59–65:
Open-vocabulary LMs: Turkish morphology

sürecini
characters: s ü r e c i n i
morphological analyses: süreç+NOUN+A3SG+P3SG+ACC, süreç+NOUN+A3SG+P2SG+ACC
POOLING
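
A minimal numpy sketch (my own illustration, not necessarily the talk's exact pooling operator) of combining a character-level vector and per-analysis vectors for one word into a single input representation by element-wise max pooling:

```python
import numpy as np

# Hypothetical vectors for the Turkish word "sürecini": one from a character
# LSTM and one per candidate morphological analysis (values are made up).
rng = np.random.default_rng(1)
char_vec = rng.normal(size=8)
analysis_vecs = [rng.normal(size=8),   # süreç+NOUN+A3SG+P3SG+ACC
                 rng.normal(size=8)]   # süreç+NOUN+A3SG+P2SG+ACC

# One common pooling choice: element-wise max over all candidate vectors.
word_vec = np.max(np.stack([char_vec, *analysis_vecs]), axis=0)
print(word_vec.shape)  # (8,)
```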

Pages 66–72:
Input word representation
Output generation process (marginalized)

Page 73:
Open-vocabulary LM: perplexity per word

Characters                     18600
Characters + Morphs             8165
Characters + Words              5021
Characters + Words + Morphs     4116

Page 74:
Character vs. word modeling: Summary

• Model performance is essentially equivalent in morphologically simple languages (e.g., Chinese, English)
• In morphologically rich languages (e.g., Hungarian, Turkish, Finnish), performance improvements are most pronounced
• We need far fewer parameters to represent words as "compositions" of characters
• Word- and morpheme-level information adds additional value
• Where else could we add linguistic structural knowledge?


Pages 75–79:
Modeling syntax: Language is hierarchical

(1) a. The talk I gave did not appeal to anybody.
    b. *The talk I gave appealed to anybody.   [NPI]

Generalization hypothesis: not must come before anybody

(2) *The talk I did not give appealed to anybody.

Examples adapted from Everaert et al. (TICS 2015)

Pages 80–82:
Language is hierarchical
Examples adapted from Everaert et al. (TICS 2015)

Generalization: not must "structurally precede" anybody
- there are many theories of the details of structure
- the psychological reality of structural sensitivity is not empirically controversial
- much more than NPIs follows such constraints

[Slide background: contrasting trees for "The talk I gave did not appeal to anybody" vs. "*The talk I did not give appealed to anybody", over the following excerpt from Everaert et al.:]

… containing anybody. (This structural configuration is called c(onstituent)-command in the linguistics literature [31].) When the relationship between not and anybody adheres to this structural configuration, the sentence is well-formed.

In sentence (3), by contrast, not sequentially precedes anybody, but the triangle dominating not in Figure 1B fails to also dominate the structure containing anybody. Consequently, the sentence is not well-formed.

The reader may confirm that the same hierarchical constraint dictates whether the examples in (4–5) are well-formed or not, where we have depicted the hierarchical sentence structure in terms of conventional labeled brackets:

(4) [S1 [NP The book [S2 I bought]S2]NP did not [VP appeal to anyone]VP]S1
(5) *[S1 [NP The book [S2 I did not buy]S2]NP [VP appealed to anyone]VP]S1

Only in example (4) does the hierarchical structure containing not (corresponding to the sentence The book I bought did not appeal to anyone) also immediately dominate the NPI anybody. In (5) not is embedded in at least one phrase that does not also include the NPI. So (4) is well-formed and (5) is not, exactly the predicted result if the hierarchical constraint is correct.

Even more strikingly, the same constraint appears to hold across languages and in many other syntactic contexts. Note that Japanese-type languages follow this same pattern if we assume that these languages have hierarchically structured expressions similar to English, but linearize these structures somewhat differently – verbs come at the end of sentences, and so forth [32]. Linear order, then, should not enter into the syntactic–semantic computation [33,34]. This is rather independent of possible effects of linearly intervening negation that modulate acceptability in NPI contexts [35].

The Syntax of Syntax
Observe an example as in (6):

(6) Guess which politician your interest in clearly appeals to.

The construction in (6) is remarkable because a single wh-phrase is associated both with the prepositional object gap of to and with the prepositional object gap of in, as in (7a). We talk about 'gaps' because a possible response to (6) might be as in (7b):

(7) a. Guess which politician your interest in GAP clearly appeals to GAP.
    b. response to (7a): Your interest in Donald Trump clearly appeals to Donald Trump

Figure 1. Negative Polarity. (A) Negative polarity licensed: negative element c-commands negative polarity item. (B) Negative polarity not licensed: negative element does not c-command negative polarity item.


Pages 83–85:
One theory of hierarchy

• Generate symbols sequentially using an RNN
• Add some "control symbols" to rewrite the history periodically
• Periodically "compress" a sequence into a single "constituent"
• Augment the RNN with an operation to compress recent history into a single vector (→ "reduce")
• The RNN predicts the next symbol based on the history of compressed elements and non-compressed terminals ("shift" or "generate")
• The RNN must also predict the "control symbols" that decide how big constituents are
• We call such models recurrent neural network grammars (RNNGs).


Pages 86–104:

Stack                                   Terminals                Action
                                                                 NT(S)
(S                                                               NT(NP)
(S (NP                                                           GEN(The)
(S (NP The                              The                      GEN(hungry)
(S (NP The hungry                       The hungry               GEN(cat)
(S (NP The hungry cat                   The hungry cat           REDUCE
(S (NP The hungry cat)                  The hungry cat           NT(VP)
(S (NP The hungry cat) (VP              The hungry cat           GEN(meows)
(S (NP The hungry cat) (VP meows        The hungry cat meows     REDUCE
(S (NP The hungry cat) (VP meows)       The hungry cat meows     GEN(.)
(S (NP The hungry cat) (VP meows) .     The hungry cat meows .   REDUCE
(S (NP The hungry cat) (VP meows) .)    The hungry cat meows .

REDUCE compresses the just-completed constituent into a single composite symbol, e.g. "The hungry cat" becomes the single vector for (NP The hungry cat).

Pages 105–117:
Syntactic composition

Need a representation for: (NP The hungry cat)
What head type? NP
A composition network reads the label NP and the children The, hungry, cat (plus the closing bracket) and produces a single NP vector.

Pages 118–119:
Recursion

Need a representation for: (NP The (ADJP very hungry) cat)
The composed vector v for (ADJP very hungry) takes the place of "hungry" when the larger NP is composed.

Page 120:
Syntactic composition

• Inspired by Socher et al. (2011, 2012, …)
• Words and constituents are embedded in the same space
• Composition functions are designed to:
  • capture the linguistic notion of headedness (the LSTMs know what type of head they are looking for while they traverse the children)
  • support any number of children
  • be learned via backpropagation through structure
  (a minimal sketch follows below)
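
A minimal sketch (illustrative only, not the paper's exact composition function) of composing a nonterminal label and an arbitrary number of child vectors into one constituent vector with a bidirectional LSTM:

```python
import torch
import torch.nn as nn

class Composer(nn.Module):
    """Compose a nonterminal label and a variable number of child vectors
    into a single constituent vector (a sketch, not the paper's exact model)."""
    def __init__(self, dim=32):
        super().__init__()
        self.bilstm = nn.LSTM(dim, dim // 2, batch_first=True, bidirectional=True)

    def forward(self, nt_vec, child_vecs):
        # Read the nonterminal first, so the LSTM "knows what head it is
        # looking for" while traversing the children, then the children.
        seq = torch.stack([nt_vec, *child_vecs]).unsqueeze(0)   # (1, n+1, dim)
        _, (h_n, _) = self.bilstm(seq)
        return torch.cat([h_n[0], h_n[1]], dim=-1).squeeze(0)   # (dim,)

dim = 32
composer = Composer(dim)
nt_np = torch.randn(dim)                         # embedding of the label NP
children = [torch.randn(dim) for _ in range(3)]  # vectors for The, hungry, cat
np_vec = composer(nt_np, children)               # one vector for (NP The hungry cat)
print(np_vec.shape)  # torch.Size([32])
```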

Page 121:
Implementing RNNGs: Parameter estimation

• An RNNG jointly models sequences of words together with a tree structure: p_θ(x, y)
• Any parse tree can be converted to a sequence of actions (by a depth-first traversal) and vice versa (subject to well-formedness constraints); see the sketch after this list
• We use trees from the Penn Treebank
• We could treat the non-generation actions as latent variables or learn them with RL, effectively making this a problem of grammar induction. Future work…
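
A minimal sketch (my own, for illustration) of the tree-to-actions conversion via depth-first traversal, using a nested-list encoding of the parse of "The hungry cat meows .":

```python
def tree_to_actions(tree):
    """Convert a parse tree, written as a nested list whose first element is the
    nonterminal label, into RNNG actions via a depth-first traversal."""
    if isinstance(tree, str):                # a terminal word
        return [f"GEN({tree})"]
    label, *children = tree
    actions = [f"NT({label})"]
    for child in children:
        actions += tree_to_actions(child)
    return actions + ["REDUCE"]

tree = ["S", ["NP", "The", "hungry", "cat"], ["VP", "meows"], "."]
print(tree_to_actions(tree))
# ['NT(S)', 'NT(NP)', 'GEN(The)', 'GEN(hungry)', 'GEN(cat)', 'REDUCE',
#  'NT(VP)', 'GEN(meows)', 'REDUCE', 'GEN(.)', 'REDUCE']
```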

Page 122:
Implementing RNNGs: Inference

• An RNNG is a joint distribution p(x, y) over strings (x) and parse trees (y)
• We are interested in two inference questions:
  • What is p(x) for a given x? [language modeling]
  • What is max_y p(y | x) for a given x? [parsing]
• Unfortunately, the dynamic programming algorithms we often rely on are of no help here
• We can use importance sampling to do both by sampling from a discriminatively trained model

Pages 123–125:
English PTB (parsing)

Model                              Type    F1
Petrov and Klein (2007)            G       90.1
Shindo et al. (2012), single       G       91.1
Shindo et al. (2012), ensemble     ~G      92.4
Vinyals et al. (2015), PTB only    D       90.5
Vinyals et al. (2015), ensemble    S       92.8
Discriminative                     D       89.8
Generative (IS)                    G       92.4

Page 126:
English PTB (LM)        Perplexity
5-gram IKN              169.3
LSTM + dropout          113.4
Generative (IS)         102.4

Chinese CTB (LM)        Perplexity
5-gram IKN              255.2
LSTM + dropout          207.3
Generative (IS)         171.9

Page 127:
This talk, in a nutshell

• Facts about language:
  • Arbitrariness and compositionality exist at all levels
  • Language is sensitive to hierarchy, not strings
• My work's hypothesis: models that make this structure explicit will outperform models that don't

Pages 128–129:
START t h a n k s ! STOP

questions?

Page 130:
Implementing RNNGs: Stack RNNs

• Augment a sequential RNN with a stack pointer
• Two constant-time operations:
  • push: read input, add to the top of the stack, connect to the current location of the stack pointer
  • pop: move the stack pointer to its parent
• A summary of the stack contents is obtained by accessing the output of the RNN at the location of the stack pointer (see the sketch after this list)
• Note: push and pop are discrete actions here (cf. Grefenstette et al., 2015)
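
A minimal sketch (illustrative, with a toy RNN cell rather than an LSTM) of the stack RNN described above: entries store (state, parent) pairs, push runs the cell from the state at the pointer, and pop just moves the pointer back to that entry's parent.

```python
import numpy as np

class StackRNN:
    """Toy stack RNN: a tree of RNN states navigated by a stack pointer."""
    def __init__(self, dim=8, seed=0):
        rng = np.random.default_rng(seed)
        self.W_h = rng.normal(scale=0.1, size=(dim, dim))
        self.W_x = rng.normal(scale=0.1, size=(dim, dim))
        # Each entry is (hidden_state, parent_index); entry 0 is the empty stack.
        self.entries = [(np.zeros(dim), None)]
        self.ptr = 0

    def push(self, x):
        h_prev, _ = self.entries[self.ptr]
        h = np.tanh(self.W_h @ h_prev + self.W_x @ x)   # simple RNN cell
        self.entries.append((h, self.ptr))              # connect to current pointer
        self.ptr = len(self.entries) - 1

    def pop(self):
        self.ptr = self.entries[self.ptr][1]            # move pointer to parent

    def summary(self):
        return self.entries[self.ptr][0]                # RNN output at the pointer

s = StackRNN()
x1, x2 = np.ones(8), -np.ones(8)
s.push(x1); s.push(x2)
s.pop()                      # the summary is now the state after x1 alone
print(s.summary().shape)     # (8,)
```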

Pages 131–138:
[Stack RNN animation: starting from the empty stack (output y0), PUSH x1 (output y1), POP, PUSH x2 (output y2), POP, PUSH x3 (output y3).]

Pages 139–144:
Importance sampling

Assume we have a conditional distribution q(y | x) such that
(i) p(x, y) > 0 ⟹ q(y | x) > 0,
(ii) sampling y ~ q(y | x) is tractable, and
(iii) evaluating q(y | x) is tractable.

Let the importance weights be w(x, y) = p(x, y) / q(y | x). Then

p(x) = Σ_{y ∈ Y(x)} p(x, y) = Σ_{y ∈ Y(x)} w(x, y) q(y | x) = E_{y ~ q(y|x)} [w(x, y)]

Replace this expectation with its Monte Carlo estimate, using samples y^(i) ~ q(y | x) for i ∈ {1, 2, …, N}:

E_{q(y|x)} [w(x, y)] ≈ (1/N) Σ_{i=1}^{N} w(x, y^(i))
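
A minimal sketch (mine; toy numbers, and the samples are written out by hand rather than actually drawn from q) of this Monte Carlo importance-sampling estimate of p(x), computed in log space for numerical stability:

```python
import math

def log_p_x_importance_sampling(log_joint, log_q, samples):
    """Estimate log p(x) ≈ log( (1/N) Σ_i p(x, y_i) / q(y_i | x) ) from y_i ~ q(·|x).

    log_joint(y) returns log p(x, y); log_q(y) returns log q(y | x).
    Uses a log-sum-exp so tiny probabilities do not underflow."""
    log_weights = [log_joint(y) - log_q(y) for y in samples]
    m = max(log_weights)
    return m + math.log(sum(math.exp(lw - m) for lw in log_weights)) - math.log(len(samples))

# Toy check: x fixed, y ∈ {0, 1}; p(x, y) and q(y | x) chosen by hand (illustrative only).
p_joint = {0: 0.12, 1: 0.08}      # so the true p(x) = 0.20
q = {0: 0.5, 1: 0.5}
samples = [0, 1] * 500            # pretend these were drawn from q
est = log_p_x_importance_sampling(lambda y: math.log(p_joint[y]),
                                  lambda y: math.log(q[y]), samples)
print(math.exp(est))              # ≈ 0.20
```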