
Incorporating Contextual and Syntactic Structures Improves Semantic Similarity Modeling

Linqing Liu, Wei Yang, Jinfeng Rao, Raphael Tang, and Jimmy Lin

Semantic Textual Similarity

Task: given a sentence pair, predict a similarity score on a 1–5 scale. Example sentences from the slide:

"Some people are walking" / "People are walking"
"A group of scouts are camping in the grass"
"A group of people is on a beach" / "A group of people is near the sea" / "A group of people is on a mountain"
(gold scores shown: 4.7, 1.5, 4.0, 2.6)

SICK dataset: Marelli, Marco, et al. "Semeval-2014 task 1: Evaluation of compositional distributional semantic models on full sentences through semantic relatedness and textual entailment." Proceedings of the 8th international workshop on semantic evaluation (SemEval 2014). 2014.

Current Models for Sentence Pair Modeling

● Shortcut-Stacked Sentence Encoders (SSE) [Nie and Bansal, 2017]
● Decomposable Attention Model (DecAtt) [Parikh et al., 2016]
● Bi-LSTM Max-pooling Network (InferSent) [Conneau et al., 2017]
● Enhanced Sequential Inference Model (ESIM) [Chen et al., 2017]
● Pairwise Word Interaction Model (PWIM) [He and Lin, 2016]

Current Models for Sentence Pair Modeling

Models compared: SSE, DecAtt, InferSent, ESIM, PWIM

Components:
● Sentence Representation
● Sentence Pair Interaction & Attention Mechanism
● Encoding Contextual Information by LSTM
● Incorporation of Syntactic Parsing Information

Do contextual and syntactic structures help sentence pair modeling?

Pairwise Word Interaction Model (PWIM)

[Architecture: a Bi-LSTM encodes each sentence (e.g., "Cats sit on the mat" / "On the mat there sit cats"); every word-state pair (w1n, w2m) is compared via cos(w1n, w2m), L2Euclid(w1n, w2m), and DotProduct(w1n, w2m); the resulting similarity cube feeds a 19-layer deep CNN followed by fully connected layers.]

PWIM: He, Hua, and Jimmy Lin. "Pairwise word interaction modeling with deep neural networks for semantic similarity measurement." Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2016.

Lan, Wuwei, and Wei Xu. "Neural Network Models for Paraphrase Identification, Semantic Textual Similarity, Natural Language Inference, and Question Answering." ACL 2018.

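The core of PWIM's interaction layer compares every pair of encoded word states with several similarity measures. A minimal NumPy sketch (simplified to one vector per word; the original model also builds separate channels from forward, backward, and concatenated LSTM states, and the function name here is mine):

```python
import numpy as np

def interaction_cube(h1, h2, eps=1e-8):
    """Pairwise word interactions between two encoded sentences.

    h1: (n, d) hidden states of sentence 1; h2: (m, d) of sentence 2.
    Returns a (3, n, m) cube with cosine similarity, L2 distance, and
    dot product for every word pair, as in PWIM's similarity cube.
    """
    dot = h1 @ h2.T                                  # (n, m) dot products
    n1 = np.linalg.norm(h1, axis=1, keepdims=True)   # (n, 1) norms
    n2 = np.linalg.norm(h2, axis=1, keepdims=True)   # (m, 1) norms
    cos = dot / (n1 * n2.T + eps)                    # cosine similarity
    # L2 distance via ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b
    l2 = np.sqrt(np.maximum(n1 ** 2 + (n2 ** 2).T - 2 * dot, 0.0))
    return np.stack([cos, l2, dot])
```

The cube is what the downstream 19-layer CNN consumes: it treats the (n, m) grid like an image whose channels are the different similarity measures.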

Pairwise Word Interaction Model (PWIM)

Modification: replace the single Bi-LSTM with multi-layer Bi-LSTM sentence encoders; the pairwise word interactions, 19-layer deep CNN, and fully connected layers are unchanged.
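A multi-layer bidirectional encoder of this kind can be sketched in NumPy (a toy forward pass, not the authors' implementation; the parameter packing and function names are my own):

```python
import numpy as np

def _sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_layer(xs, Wx, Wh, b):
    """One unidirectional LSTM pass: xs (T, d_in) -> states (T, d_h).
    Gate pre-activations are packed along the output axis as [i, f, g, o]."""
    d_h = Wh.shape[0]
    h, c = np.zeros(d_h), np.zeros(d_h)
    out = []
    for x in xs:
        i, f, g, o = np.split(x @ Wx + h @ Wh + b, 4)
        c = _sigmoid(f) * c + _sigmoid(i) * np.tanh(g)
        h = _sigmoid(o) * np.tanh(c)
        out.append(h)
    return np.stack(out)

def multilayer_bilstm(xs, layers):
    """Stacked BiLSTM: each layer consumes the previous layer's
    concatenated forward/backward hidden states."""
    h = xs
    for fwd_p, bwd_p in layers:
        fwd = lstm_layer(h, *fwd_p)
        bwd = lstm_layer(h[::-1], *bwd_p)[::-1]  # right-to-left pass, realigned
        h = np.concatenate([fwd, bwd], axis=1)
    return h

def init_layer(d_in, d_h, rng):
    """Random parameters for one BiLSTM layer (illustrative initialization)."""
    def direction():
        return (0.1 * rng.normal(size=(d_in, 4 * d_h)),
                0.1 * rng.normal(size=(d_h, 4 * d_h)),
                np.zeros(4 * d_h))
    return direction(), direction()
```

For example, `multilayer_bilstm(x, [init_layer(8, 6, rng), init_layer(12, 6, rng)])` maps a (T, 8) input to (T, 12) states; note the second layer's input size is 2 × 6 because the directions are concatenated.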


Pairwise Word Interaction Model (PWIM)

Modification: replace the sequential Bi-LSTM encoders with Tree-LSTM* encoders over each sentence's parse tree; the rest of the model is unchanged.

*Tai, Kai Sheng, Richard Socher, and Christopher D. Manning. "Improved semantic representations from tree-structured long short-term memory networks." arXiv preprint arXiv:1503.00075 (2015).
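The Child-Sum Tree-LSTM of Tai et al. sums the children's hidden states and gives each child its own forget gate. A NumPy sketch of a single node update (the weight packing and names are my own convention):

```python
import numpy as np

def _sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def child_sum_treelstm_node(x, child_h, child_c, W, U, b):
    """One Child-Sum Tree-LSTM node update (Tai et al., 2015).

    x: (d_in,) input at this node; child_h, child_c: (k, d_h) hidden and
    cell states of its k children (k may be 0 for leaves).
    W: (d_in, 4*d_h), U: (d_h, 4*d_h), b: (4*d_h,), gates packed [i, f, o, u].
    """
    d_h = U.shape[0]
    h_sum = child_h.sum(axis=0) if len(child_h) else np.zeros(d_h)
    zi, zf, zo, zu = np.split(x @ W + b, 4)
    Ui, Uf, Uo, Uu = np.split(U, 4, axis=1)
    i = _sigmoid(zi + h_sum @ Ui)          # input gate from summed children
    o = _sigmoid(zo + h_sum @ Uo)          # output gate
    u = np.tanh(zu + h_sum @ Uu)           # candidate cell value
    f = _sigmoid(zf + child_h @ Uf)        # one forget gate per child
    c = i * u + (f * child_c).sum(axis=0)  # gated sum over children's cells
    h = o * np.tanh(c)
    return h, c
```

Applied bottom-up over a dependency or constituency parse, this yields one state per word/node, which can then replace the Bi-LSTM states feeding PWIM's interaction layer.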

Experiments on eight datasets

❏ Semantic Textual Similarity: STS-2014, SICK (similarity score on a 0–5 / 1–5 scale)
❏ Paraphrase Identification: Quora, Twitter PIT-2015 (paraphrase / non-paraphrase)
❏ Question Answering: WikiQA, TrecQA (true / false)
❏ Natural Language Inference: SNLI (entailment / neutral / contradiction)

Experiment Results Analyses

Additional Contextual Structure?

Score deltas across the eight datasets (two comparisons shown on the slides; dataset labels not recoverable here):
+0.019 +0.006 +0.009 +0.012 +0.01 +0.008 +0.015 +0.029
+0.336 -0.016 +0.107 +0.19 +0.093 +0.146 -0.004

Additional Syntactic Structure?

Score deltas across the eight datasets:
+0.023 +0.016 +0.017 -0.002 +0.021 +0.026 +0.022 +0.033
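As context for these deltas: STS benchmarks such as SICK are conventionally scored with Pearson's r between predicted and gold similarity scores, while the classification tasks use accuracy or F1 (an assumption about the slides' units). A quick NumPy illustration of the STS metric, using the gold scores from the earlier examples and hypothetical predictions:

```python
import numpy as np

# Gold scores from the SICK-style examples, paired with hypothetical
# system predictions (illustrative numbers, not results from the paper).
gold = np.array([4.7, 1.5, 4.0, 2.6])
pred = np.array([4.5, 1.9, 3.6, 2.8])

r = np.corrcoef(gold, pred)[0, 1]  # Pearson's r between pred and gold
print(round(r, 3))                 # close to 1.0 for a good system
```

On this scale, a delta of +0.01 to +0.03 is a small but consistent improvement in correlation.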

Sentence Pair Visualization

mPWIM_seq vs. mPWIM_seq+tree (word-interaction heatmaps shown in the original slides)

● We are after the question: do contextual and syntactic structures help sentence pair modeling?
● They also help to form a better student model.

Takeaway

Incorporating structural information contributes to consistent improvements over strong baselines.