incorporating contextual and syntactic structures improves ... · pwim: he, hua, and jimmy lin....

31
Incorporating Contextual and Syntactic Structures Improves Semantic Similarity Modeling Linqing Liu, Wei Yang, Jinfeng Rao, Raphael Tang, and Jimmy Lin 1

Upload: others

Post on 04-Aug-2020

5 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Incorporating Contextual and Syntactic Structures Improves ... · PWIM: He, Hua, and Jimmy Lin. "Pairwise word interaction modeling with deep neural networks for semantic similarity

Incorporating Contextual and Syntactic Structures

Improves Semantic Similarity Modeling

Linqing Liu, Wei Yang, Jinfeng Rao, Raphael Tang, and Jimmy Lin

1

Page 2: Incorporating Contextual and Syntactic Structures Improves ... · PWIM: He, Hua, and Jimmy Lin. "Pairwise word interaction modeling with deep neural networks for semantic similarity

2

Semantic Textual Similarity

Some people are walking

People are walking

Similarity score: 1 5

A group of scouts are camping in the grass

4.7

1.5

A group of people is on a beach

A group of people is near the sea

A group of people is on a mountain

4.0

2.6

SICK dataset: Marelli, Marco, et al. "Semeval-2014 task 1: Evaluation of compositional distributional semantic models on full sentences through semantic relatedness and textual entailment." Proceedings of the 8th international workshop on semantic evaluation (SemEval 2014). 2014.

Page 3: Incorporating Contextual and Syntactic Structures Improves ... · PWIM: He, Hua, and Jimmy Lin. "Pairwise word interaction modeling with deep neural networks for semantic similarity

Current Models for Sentence Pair Modeling

● Shortcut-stacked Sentence Encoders (SSE) [Nie and Bansal, 2017]

● Decomposable Attention Model (DecAtt) [Parikh et al., 2016]

● Bi-LSTM Max-pooling Network (InferSent) [Conneau et al., 2017]

● Enhanced Sequential Inference Model (ESIM) [Chen et al., 2017]

● Pairwise Word Interaction model (PWIM) [He and Lin, 2016]

3

Page 4: Incorporating Contextual and Syntactic Structures Improves ... · PWIM: He, Hua, and Jimmy Lin. "Pairwise word interaction modeling with deep neural networks for semantic similarity

Current Models for Sentence Pair Modeling

4

Components

Sentence Pair Interaction & Attention Mechanism

Sentence Representation

Page 5: Incorporating Contextual and Syntactic Structures Improves ... · PWIM: He, Hua, and Jimmy Lin. "Pairwise word interaction modeling with deep neural networks for semantic similarity

Current Models for Sentence Pair Modeling

5

Components

Sentence Pair Interaction & Attention Mechanism

Encoding Contextual Information by LSTM

Incorporation of Syntactic Parsing Information

Page 6: Incorporating Contextual and Syntactic Structures Improves ... · PWIM: He, Hua, and Jimmy Lin. "Pairwise word interaction modeling with deep neural networks for semantic similarity

Current Models for Sentence Pair Modeling

6

SSE DecAtt Infersent ESIM PWIM

Components

Sentence Pair Interaction & Attention Mechanism

Encoding Contextual Information by LSTM

Incorporation of Syntactic Parsing Information

Page 7: Incorporating Contextual and Syntactic Structures Improves ... · PWIM: He, Hua, and Jimmy Lin. "Pairwise word interaction modeling with deep neural networks for semantic similarity

Current Models for Sentence Pair Modeling

7

SSE DecAtt Infersent ESIM PWIM

Components

Sentence Pair Interaction & Attention Mechanism

Encoding Contextual Information by LSTM

Incorporation of Syntactic Parsing Information

Page 8: Incorporating Contextual and Syntactic Structures Improves ... · PWIM: He, Hua, and Jimmy Lin. "Pairwise word interaction modeling with deep neural networks for semantic similarity

Current Models for Sentence Pair Modeling

8

SSE DecAtt Infersent ESIM PWIM

Components

Sentence Pair Interaction & Attention Mechanism

Encoding Contextual Information by LSTM

Incorporation of Syntactic Parsing Information

Page 9: Incorporating Contextual and Syntactic Structures Improves ... · PWIM: He, Hua, and Jimmy Lin. "Pairwise word interaction modeling with deep neural networks for semantic similarity

Current Models for Sentence Pair Modeling

9

SSE DecAtt Infersent ESIM PWIM

Components

Sentence Pair Interaction & Attention Mechanism

Encoding Contextual Information by LSTM

Incorporation of Syntactic Parsing Information

Does Syntactic Structure help Sentence Pair Modeling?

Page 10: Incorporating Contextual and Syntactic Structures Improves ... · PWIM: He, Hua, and Jimmy Lin. "Pairwise word interaction modeling with deep neural networks for semantic similarity

Current Models for Sentence Pair Modeling

10

SSE DecAtt Infersent ESIM PWIM

Components

Sentence Pair Interaction & Attention Mechanism

Encoding Contextual Information by LSTM

Incorporation of Syntactic Parsing Information

Do Contextual and Syntactic Structures help sentence Pair Modeling?

Page 11: Incorporating Contextual and Syntactic Structures Improves ... · PWIM: He, Hua, and Jimmy Lin. "Pairwise word interaction modeling with deep neural networks for semantic similarity

Pairwise Word Interaction Model (PWIM)

11

w11 w12 w1n w21 w22 w2m…...…...

Cats Sit on the Mat On the Mat There Sit Cats

On the Mat There Sit Cats

Cats

Sit

On

The

Mat cos(w1n, w2m)

L2Euclid(w1n, w2m)

DotProduct(w1n, w2m)

19-layer deep CNN

Fully connected layers

…... …...Bi-LSTM

PWIM: He, Hua, and Jimmy Lin. "Pairwise word interaction modeling with deep neural networks for semantic similarity measurement." Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2016.

Lan, Wuwei, and Wei Xu. "Neural Network Models for Paraphrase Identification, Semantic Textual Similarity, Natural Language Inference, and Question Answering." ACL 2018.

Page 12: Incorporating Contextual and Syntactic Structures Improves ... · PWIM: He, Hua, and Jimmy Lin. "Pairwise word interaction modeling with deep neural networks for semantic similarity

Pairwise Word Interaction Model (PWIM)

12

w11 w12 w1n w21 w22 w2m…...…...

Cats Sit on the Mat On the Mat There Sit Cats

On the Mat There Sit Cats

Cats

Sit

On

The

Mat cos(w1n, w2m)

L2Euclid(w1n, w2m)

DotProduct(w1n, w2m)

19-layer deep CNN

Fully connected layers

…... …...Bi-LSTM

PWIM: He, Hua, and Jimmy Lin. "Pairwise word interaction modeling with deep neural networks for semantic similarity measurement." Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2016.

Page 13: Incorporating Contextual and Syntactic Structures Improves ... · PWIM: He, Hua, and Jimmy Lin. "Pairwise word interaction modeling with deep neural networks for semantic similarity

Pairwise Word Interaction Model (PWIM)

13

w11 w12 w1n w21 w22 w2m…...…...

Cats Sit on the Mat On the Mat There Sit Cats

On the Mat There Sit Cats

Cats

Sit

On

The

Mat cos(w1n, w2m)

L2Euclid(w1n, w2m)

DotProduct(w1n, w2m)

19-layer deep CNN

Fully connected layers

…... …...Bi-LSTM

Page 14: Incorporating Contextual and Syntactic Structures Improves ... · PWIM: He, Hua, and Jimmy Lin. "Pairwise word interaction modeling with deep neural networks for semantic similarity

Pairwise Word Interaction Model (PWIM)

14

w11 w12 w1n w21 w22 w2m…...…...

…...

Cats Sit on the Mat On the Mat There Sit Cats

On the Mat There Sit Cats

Cats

Sit

On

The

Mat cos(w1n, w2m)

L2Euclid(w1n, w2m)

DotProduct(w1n, w2m)

19-layer deep CNN

Fully connected layers

Bi-LSTM

…...Bi-LSTM

…...Bi-LSTM

…...

…...

…...

Multi-layer BiLSTMSentence Encoders

Page 15: Incorporating Contextual and Syntactic Structures Improves ... · PWIM: He, Hua, and Jimmy Lin. "Pairwise word interaction modeling with deep neural networks for semantic similarity

Pairwise Word Interaction Model (PWIM)

15

w11 w12 w1n w21 w22 w2m…...…...

Cats Sit on the Mat On the Mat There Sit Cats

On the Mat There Sit Cats

Cats

Sit

On

The

Mat cos(w1n, w2m)

L2Euclid(w1n, w2m)

DotProduct(w1n, w2m)

19-layer deep CNN

Fully connected layers

…... …...Bi-LSTM

Page 16: Incorporating Contextual and Syntactic Structures Improves ... · PWIM: He, Hua, and Jimmy Lin. "Pairwise word interaction modeling with deep neural networks for semantic similarity

Pairwise Word Interaction Model (PWIM)

16

On the Mat There Sit Cats

Cats

Sit

On

The

Mat cos(w1n, w2m)

L2Euclid(w1n, w2m)

DotProduct(w1n, w2m)

19-layer deep CNN

Fully connected layers

w11 w12 w1n w21 w22 w2m…...…...

Cats Sit on the Mat On the Mat There Sit Cats

w1

w2

w3

w5

w4 w2

w3

w1

w5

w4 w6

Tree-LSTM

*Tai, Kai Sheng, Richard Socher, and Christopher D. Manning. "Improved semantic representations from tree-structured long short-term memory networks." arXiv preprint arXiv:1503.00075(2015).

Page 17: Incorporating Contextual and Syntactic Structures Improves ... · PWIM: He, Hua, and Jimmy Lin. "Pairwise word interaction modeling with deep neural networks for semantic similarity

Experiments on eight datasets

❏ Semantic Textual Similarity

STS-2014 SICK

❏ Paraphrase Identification

Quora Twitter PIT-2015

❏ Question Answering

WikiQA TrecQA

❏ Natural Language Inference

SNLI17

0 5

Paraphrase Non-paraphrase

True False

Entailment Neutral Contradict

1

Page 18: Incorporating Contextual and Syntactic Structures Improves ... · PWIM: He, Hua, and Jimmy Lin. "Pairwise word interaction modeling with deep neural networks for semantic similarity

Experiment Results Analyses

18

Page 19: Incorporating Contextual and Syntactic Structures Improves ... · PWIM: He, Hua, and Jimmy Lin. "Pairwise word interaction modeling with deep neural networks for semantic similarity

Additional Contextual Structure?

19

Page 20: Incorporating Contextual and Syntactic Structures Improves ... · PWIM: He, Hua, and Jimmy Lin. "Pairwise word interaction modeling with deep neural networks for semantic similarity

Additional Contextual Structure?

20

+0.019 +0.006 +0.009 +0.012 +0.01 +0.008 +0.015 +0.029

Page 21: Incorporating Contextual and Syntactic Structures Improves ... · PWIM: He, Hua, and Jimmy Lin. "Pairwise word interaction modeling with deep neural networks for semantic similarity

Additional Contextual Structure?

21

+0.019 +0.006 +0.009 +0.012 +0.01 +0.008 +0.015 +0.029

Page 22: Incorporating Contextual and Syntactic Structures Improves ... · PWIM: He, Hua, and Jimmy Lin. "Pairwise word interaction modeling with deep neural networks for semantic similarity

Additional Contextual Structure?

22

+0.336 -0.016 +0.107 +0.19 +0.093 +0.146 -0.004

Page 23: Incorporating Contextual and Syntactic Structures Improves ... · PWIM: He, Hua, and Jimmy Lin. "Pairwise word interaction modeling with deep neural networks for semantic similarity

Additional Syntactic Structure?

23

+0.023 +0.016 +0.017 -0.002 +0.021 +0.026 +0.022 +0.033

Page 24: Incorporating Contextual and Syntactic Structures Improves ... · PWIM: He, Hua, and Jimmy Lin. "Pairwise word interaction modeling with deep neural networks for semantic similarity

Sentence Pair Visualization

24

mPWIM_seq mPWIM_seq+tree

Page 25: Incorporating Contextual and Syntactic Structures Improves ... · PWIM: He, Hua, and Jimmy Lin. "Pairwise word interaction modeling with deep neural networks for semantic similarity

Sentence Pair Visualization

25

mPWIM_seq mPWIM_seq+tree

Page 26: Incorporating Contextual and Syntactic Structures Improves ... · PWIM: He, Hua, and Jimmy Lin. "Pairwise word interaction modeling with deep neural networks for semantic similarity

Sentence Pair Visualization

26

mPWIM_seq mPWIM_seq+tree

Page 27: Incorporating Contextual and Syntactic Structures Improves ... · PWIM: He, Hua, and Jimmy Lin. "Pairwise word interaction modeling with deep neural networks for semantic similarity

Sentence Pair Visualization

27

mPWIM_seq mPWIM_seq+tree

Page 28: Incorporating Contextual and Syntactic Structures Improves ... · PWIM: He, Hua, and Jimmy Lin. "Pairwise word interaction modeling with deep neural networks for semantic similarity

28

Page 29: Incorporating Contextual and Syntactic Structures Improves ... · PWIM: He, Hua, and Jimmy Lin. "Pairwise word interaction modeling with deep neural networks for semantic similarity

29

● We are after the question: Does Contextual / Syntactic Structure help?

● Help to form a better student model

Page 30: Incorporating Contextual and Syntactic Structures Improves ... · PWIM: He, Hua, and Jimmy Lin. "Pairwise word interaction modeling with deep neural networks for semantic similarity

Takeaway

30

Incorporating structural information contributes to consistent improvements over strong baselines

Page 31: Incorporating Contextual and Syntactic Structures Improves ... · PWIM: He, Hua, and Jimmy Lin. "Pairwise word interaction modeling with deep neural networks for semantic similarity

31