encoding linguistic structures with graph convolutional networks · 2019-11-10 · @south england...
TRANSCRIPT
Encoding Linguistic Structures with Graph Convolutional Networks
Diego MarcheggianiJoint work with Ivan Titov and Joost Bastings
University of AmsterdamUniversity of Edinburgh
@South England NLP Meetup
Structured (Linguistic) Priors
Sequa makes and repairs jet engines.
creator
creation
entity repaired
repairer
SBJ COORD
OBJ
CONJ NMOD
ROOT
“I voted for Palpatine because he was
most aligned with my values,” she said.
2
Sequence to Sequence
3
[Sutskever et al., 2014]
the black cat
le chat noire <\s>
<s> le chat noire
Sequence to Sequence
} Language is not (only) a sequence of words} We have linguistic knowledge
4
[Sutskever et al., 2014]
the black cat
le chat noire <\s>
<s> le chat noire
Sequence to Sequence
} Language is not (only) a sequence of words} We have linguistic knowledge
Encode structured linguistic knowledge into NN using Graph Convolutional Networks
5
the black cat
le chat noire <\s>
<s> le chat noire
Outline
} Semantic Role Labeling} Graph Convolutional Networks (GCN)} Syntactic GCN for Semantic Role Labeling (SRL)} SRL Model} Exploiting Semantics in Neural Machine Translation with GCNs
Encoding Sentences with Graph Convolutional Networks for Semantic Role LabelingDiego Marcheggiani, Ivan Titov. In Proceedings of EMNLP, 2017.
Exploiting Semantics in Neural Machine Translation with Graph Convolutional NetworksDiego Marcheggiani, Joost Bastings, Ivan Titov. In Proceedings of NAACL-HLT, 2018.
6
Semantic Role Labeling
} Predicting the predicate-argument structure of a sentence
Sequa makes and repairs jet engines.Sequa makes and repairs jet engines.
7
Semantic Role Labeling
} Predicting the predicate-argument structure of a sentence} Discover and disambiguate predicates
8
Sequa makes and repairs jet engines.make.01 repair.01
Sequa makes and repairs jet engines.
} Predicting the predicate-argument structure of a sentence} Discover and disambiguate predicates} Identify arguments and label them with their semantic roles
Sequa makes and repairs jet engines.make.01 repair.01
Creator
Semantic Role Labeling
9
} Predicting the predicate-argument structure of a sentence} Discover and disambiguate predicates} Identify arguments and label them with their semantic roles
Sequa makes and repairs jet engines.make.01 repair.01
Creator
Creation
Semantic Role Labeling
10
} Predicting the predicate-argument structure of a sentence} Discover and disambiguate predicates} Identify arguments and label them with their semantic roles
Sequa makes and repairs jet engines.make.01 repair.01
Creator
Creation
Entity repaired
Repairer
Semantic Role Labeling
11
Semantic Role Labeling
} Only the head of an argument is labeled} Sequence labeling task for each predicate} Focus on argument identification and labeling
12
Sequa makes and repairs jet engines.make.01 repair.01
Creator
Creation
Entity repaired
Repairer
Semantic Role Labeling
13
Question answeringNarayanan and Harabagiu 2004
Shen and Lapata 2007 Khashabi et al. 2018
Machine translationWu and Fung 2009
Aziz et al. 2011
Information extractionSurdeanu et al. 2003
Christensen et al. 2010
Related work
14
Tutorial on Semantic Role Labeling at EMNLP 2017
Related work
} SRL systems that use syntax with simple NN architectures} [FitzGerald et al., 2015]} [Roth and Lapata, 2016]
} Recent models ignore linguistic bias } [Zhou and Xu, 2014]} [He et al., 2017]} [Marcheggiani et al., 2017]
15
Tutorial on Semantic Role Labeling at EMNLP 2017
Motivations
} Some semantic dependencies are mirrored in the syntactic graph
Sequa makes and repairs jet engines.
creator
creation
SBJ COORD
OBJ
CONJ NMOD
ROOT
16
Sequa makes and repairs jet engines.
creator
creation
entity repaired
repairer
SBJ COORD
OBJ
CONJ NMOD
ROOT
Motivations
} Some semantic dependencies are mirrored in the syntactic graph} Not all of them – syntax-semantics interface is not trivial
17
Outline
} Semantic Role Labeling} Graph Convolutional Networks (GCN)} Syntactic GCN for Semantic Role Labeling (SRL)} SRL Model} Exploiting Semantics in Neural Machine Translation with GCNs
18
Graph Convolutional Networks (message passing)
Undirected graph
[Gori et al. 2005 Scarselli et al. 2009Kipf and Welling, 2016]
19
Graph Convolutional Networks (message passing)
Undirected graph Update of the blue node
[Gori et al. 2005 Scarselli et al. 2009Kipf and Welling, 2016]
20
Graph Convolutional Networks (message passing)
Undirected graph Update of the blue node
[Kipf and Welling, 2016]
21
hi = ReLU
0
@W0hi +X
j2N (v)
W1hj
1
A
<latexit sha1_base64="dRNZOAdr3+64yfJmCNqaHzngt30=">AAACcXicbVFdS9xAFJ2kttqtrdv6VEQYXGxXhCWRQvtSkPbFBxEt3Q8wS5jM3mxGJ5MwcyMNIT/Cn9Wf0N/Rh746WaOwbi8MnDnn3LkfE+VSGPS8P477bO35i/WNl51Xm6/fbHXfvhuZrNAchjyTmZ5EzIAUCoYoUMIk18DSSMI4uv7e6OMb0EZk6ieWOUxTNlciFpyhpcLubbB4o4pkATVNQkEfCcav66/0B5wOaSAhxn6r8JKpehx6q+7Dh6uGWR2YIg2rKxoIRYOUYcKZrM7q/s1BTcehb7OvlrMDLeYJHoTdnjfwFkFXgd+CHmnjPOz+DmYZL1JQyCUz5tL3cpxWTKPgEupOUBjIbQE2h+rXomBN9y03o3Gm7VFIF+ySkaXGlGlknU3r5qnWkP/TLguMv0wrofICQfH7QnEhKWa0WT+dCQ0cZWkB41rYFilPmGYc7Sd17Oz+00lXweho4HsD/+JT7/hbu4UNskP2SJ/45DM5JifknAwJJ/+cXeeD89H56753qbt3b3WdNmebLIV7eAfNqr4U</latexit><latexit sha1_base64="dRNZOAdr3+64yfJmCNqaHzngt30=">AAACcXicbVFdS9xAFJ2kttqtrdv6VEQYXGxXhCWRQvtSkPbFBxEt3Q8wS5jM3mxGJ5MwcyMNIT/Cn9Wf0N/Rh746WaOwbi8MnDnn3LkfE+VSGPS8P477bO35i/WNl51Xm6/fbHXfvhuZrNAchjyTmZ5EzIAUCoYoUMIk18DSSMI4uv7e6OMb0EZk6ieWOUxTNlciFpyhpcLubbB4o4pkATVNQkEfCcav66/0B5wOaSAhxn6r8JKpehx6q+7Dh6uGWR2YIg2rKxoIRYOUYcKZrM7q/s1BTcehb7OvlrMDLeYJHoTdnjfwFkFXgd+CHmnjPOz+DmYZL1JQyCUz5tL3cpxWTKPgEupOUBjIbQE2h+rXomBN9y03o3Gm7VFIF+ySkaXGlGlknU3r5qnWkP/TLguMv0wrofICQfH7QnEhKWa0WT+dCQ0cZWkB41rYFilPmGYc7Sd17Oz+00lXweho4HsD/+JT7/hbu4UNskP2SJ/45DM5JifknAwJJ/+cXeeD89H56753qbt3b3WdNmebLIV7eAfNqr4U</latexit><latexit sha1_base64="dRNZOAdr3+64yfJmCNqaHzngt30=">AAACcXicbVFdS9xAFJ2kttqtrdv6VEQYXGxXhCWRQvtSkPbFBxEt3Q8wS5jM3mxGJ5MwcyMNIT/Cn9Wf0N/Rh746WaOwbi8MnDnn3LkfE+VSGPS8P477bO35i/WNl51Xm6/fbHXfvhuZrNAchjyTmZ5EzIAUCoYoUMIk18DSSMI4uv7e6OMb0EZk6ieWOUxTNlciFpyhpcLubbB4o4pkATVNQkEfCcav66/0B5wOaSAhxn6r8JKpehx6q+7Dh6uGWR2YIg2rKxoIRYOUYcKZrM7q/s1BTcehb7OvlrMDLeYJHoTdnjfwFkFXgd+CHmnjPOz+DmYZL1JQyCUz5tL3cpxWTKPgEupOUBjIbQE2h+rXomBN9y03o3Gm7VFIF+ySkaXGlGlknU3r5qnWkP/TLguMv0wrofICQfH7QnEhKWa0WT+dCQ0cZWkB41rYFilPmGYc7Sd17Oz+00lXweho4HsD/+JT7/hbu4UNskP2SJ/45DM5JifknAwJJ/+cXeeD89H56753qbt3b3WdNmebLIV7eAfNqr4U</latexit><latexit sha1_base64="dRNZOAdr3+64yfJmCNqaHzngt30=">AAACcXicbVFdS9xAFJ2kttqtrdv6VEQYXGxXhCWRQvtSkPbFBxEt3Q8wS5jM3mxGJ5MwcyMNIT/Cn9Wf0N/Rh746WaOwbi8MnDnn3LkfE+VSGPS8P477bO35i/WNl51Xm6/fbHXfvhuZrNAchjyTmZ5EzIAUCoYoUMIk18DSSMI4uv7e6OMb0EZk6ieWOUxTNlciFpyhpcLubbB4o4pkATVNQkEfCcav66/0B5wOaSAhxn6r8JKpehx6q+7Dh6uGWR2YIg2rKxoIRYOUYcKZrM7q/s1BTcehb7OvlrMDLeYJHoTdnjfwFkFXgd+CHmnjPOz+DmYZL1JQyCUz5tL3cpxWTKPgEupOUBjIbQE2h+rXomBN9y03o3Gm7VFIF+ySkaXGlGlknU3r5qnWkP/TLguMv0wrofICQfH7QnEhKWa0WT+dCQ0cZWkB41rYFilPmGYc7Sd17Oz+00lXweho4HsD/+JT7/hbu4UNskP2SJ/45DM5JifknAwJJ/+cXeeD89H56753qbt3b3WdNmebLIV7eAfNqr4U</latexit>
NeighborhoodSelf loop
GCNs PipelineHidden layer Hidden layer
Input Output
X = H(0)
H(1) H(2)
Z = H(n)
Initial feature representation of
nodes
Representation informed by nodes’
neighborhood
[Kipf and Welling, 2016]
… …
…
22
GCNs PipelineHidden layer Hidden layer
Input Output
X = H(0)
H(1) H(2)
Z = H(n)
[Kipf and Welling, 2016]
… …
…
Extend GCNs for syntactic dependency trees
Initial feature representation of
nodes
Representation informed by nodes’
neighborhood
23
Outline
} Semantic Role Labeling} Graph Convolutional Networks (GCN)} Syntactic GCN for Semantic Role Labeling (SRL)} SRL Model} Exploiting Semantics in Neural Machine Translation with GCNs
24
Example
Lane disputed those estimates
NMOD
SBJ OBJ
[Marcheggiani and Titov, 2017]
25
Example
Lane disputed those estimates
NMOD
SBJ OBJ
⇥W
(1)
self
⇥W
(1)
self
⇥W
(1)
self
⇥W
(1)
self
ReLU(⌃·)ReLU(⌃·)ReLU(⌃·)ReLU(⌃·)
[Marcheggiani and Titov, 2017]
26
Example
Lane disputed those estimates
NMOD
SBJ OBJ
⇥W
(1)
self
⇥W
(1)
self
⇥W
(1)
self
⇥W
(1)
self
ReLU(⌃·)ReLU(⌃·)ReLU(⌃·)ReLU(⌃·)
⇥W(1)subj
⇥W(1)n
m
o
d
⇥W(1)
o
b
j
[Marcheggiani and Titov, 2017]
27
Example
Lane disputed those estimates
NMOD
SBJ OBJ
⇥W
(1)
self
⇥W
(1)
self
⇥W
(1)
self
⇥W
(1)
self
ReLU(⌃·)ReLU(⌃·)ReLU(⌃·)ReLU(⌃·)
⇥W(1)subj
⇥W(1)n
m
o
d
⇥W(1)
o
b
j
⇥W (1)o
b
j
0
⇥W(1)
n
m
o
d
0
⇥W(1)
subj
0
[Marcheggiani and Titov, 2017]
28
Example
Lane disputed those estimates
NMOD
SBJ OBJ
⇥W
(1)
self
⇥W
(1)
self
⇥W
(1)
self
⇥W
(1)
self
ReLU(⌃·)ReLU(⌃·)ReLU(⌃·)ReLU(⌃·)
⇥W(1)subj
⇥W(1)n
m
o
d
⇥W(1)
o
b
j
⇥W (1)o
b
j
0
⇥W(1)
n
m
o
d
0
⇥W(1)
subj
0
[Marcheggiani and Titov, 2017]
29
Example
⇥W
(1)
self
Lane disputed those estimates
NMOD
SBJ OBJ
⇥W(1)subj
⇥W
(1)
self
⇥W
(1)
self
⇥W
(1)
self
⇥W (1)o
b
j
0
⇥W(1)n
m
o
d
⇥W(1)
n
m
o
d
0
⇥W(1)
o
b
j
⇥W(1)
subj
0
ReLU(⌃·)ReLU(⌃·)ReLU(⌃·)ReLU(⌃·)
[Marcheggiani and Titov, 2017]
30
Example
⇥W
(1)
self
Lane disputed those estimates
NMOD
SBJ OBJ
⇥W(1)subj
⇥W
(1)
self
⇥W
(1)
self
⇥W
(1)
self
⇥W (1)o
b
j
0
⇥W(1)n
m
o
d
⇥W(1)
n
m
o
d
0
⇥W(1)
o
b
j
⇥W(1)
subj
0
ReLU(⌃·)ReLU(⌃·)ReLU(⌃·)ReLU(⌃·)
ReLU(⌃·)ReLU(⌃·)ReLU(⌃·)ReLU(⌃·)
⇥W
(2)
self ⇥W
(2)
self
⇥W
(2)
self
⇥W
(2)
self
⇥W(2)subj⇥W
(2)
subj
0
⇥W (2)o
b
j
0
⇥W(2)
o
b
j
⇥W
(2)n
m
o
d
⇥W(2)
n
m
o
d
0
Stacking GCNs widens the syntactic neighborhood
[Marcheggiani and Titov, 2017]
31
Syntactic GCNs
h(k+1)v = ReLU
0
@X
u2N (v)
W (k)L(u,v)h
(k)u + b(k)L(u,v)
1
A
[Marcheggiani and Titov, 2017]
32
Syntactic GCNs
h(k+1)v = ReLU
0
@X
u2N (v)
W (k)L(u,v)h
(k)u + b(k)L(u,v)
1
A
Syntactic neighborhood
[Marcheggiani and Titov, 2017]
33
Syntactic GCNs
Syntactic neighborhood
h(k+1)v = ReLU
0
@X
u2N (v)
W (k)L(u,v)h
(k)u + b(k)L(u,v)
1
A
Message
[Marcheggiani and Titov, 2017]
34
Syntactic GCNs
Syntactic neighborhood Self-loop is included in NMessages are direction and
label specific
h(k+1)v = ReLU
0
@X
u2N (v)
W (k)L(u,v)h
(k)u + b(k)L(u,v)
1
A
Message
[Marcheggiani and Titov, 2017]
35
} Overparametrized: one matrix for each label-direction pair}
Syntactic GCNs
Syntactic neighborhood
W (k)L(u,v) = V (k)
dir(u,v)
Self-loop is included in NMessages are direction and
label specific
h(k+1)v = ReLU
0
@X
u2N (v)
W (k)L(u,v)h
(k)u + b(k)L(u,v)
1
A
Message
[Marcheggiani and Titov, 2017]
36
Edge-wise Gates
} Not all edges are equally important for the final task
[Marcheggiani and Titov, 2017]
37
Edge-wise Gates
} Not all edges are equally important for the final task} We should not blindly rely on predicted syntax
[Marcheggiani and Titov, 2017]
38
Edge-wise Gates
} Not all edges are equally important for the final task} We should not blindly rely on predicted syntax} Gates decide the “importance” of each message
Lane disputed those estimates
NMOD
SBJ OBJ
ReLU(⌃·) ReLU(⌃·)ReLU(⌃·)ReLU(⌃·)
g g g g g g g g g g
[Marcheggiani and Titov, 2017]
39
Edge-wise Gates
} Not all edges are equally important for the final task} We should not blindly rely on predicted syntax} Gates decide the “importance” of each message
Gates depend on nodes and edges Lane disputed those estimates
NMOD
SBJ OBJ
ReLU(⌃·) ReLU(⌃·)ReLU(⌃·)ReLU(⌃·)
g g g g g g g g g g
[Marcheggiani and Titov, 2017]
40
Outline
} Semantic Role Labeling} Graph Convolutional Networks (GCN)} Syntactic GCN for Semantic Role Labeling (SRL)} SRL Model} Exploiting Semantics in Neural Machine Translation with GCNs
41
Our Model
} Word representation} Bidirectional LSTM encoder} GCN Encoder} Local role classifier
[Marcheggiani and Titov, 2017]
42
Word Representation
} Pretrained word embeddings} Word embeddings} POS tag embeddings} Predicate lemma embeddings
Lane disputed those estimates
wordrepresentation
[Marcheggiani and Titov, 2017]
43
BiLSTM Encoder
} Encode each word with its left and right context} Stacked BiLSTM
Lane disputed those estimates
wordrepresentation
J layers BiLSTM
[Marcheggiani and Titov, 2017]
44
GCNs Encoder
} Syntactic GCNs after BiLSTM encoder} Add syntactic information} Skip connections} Longer dependencies are captured
Lane disputed those estimates
wordrepresentation
J layers BiLSTM
dobj
nmodnsubj
K layers GCN
[Marcheggiani and Titov, 2017]
45
Semantic Role Classifier
Lane disputed those estimates
wordrepresentation
J layers BiLSTM
dobj
nmodnsubj
K layers GCN
A1Classifier
�
predicate representation
candidate argument representation
} Local log-linear classifier
p(r|ti, tp, l) / exp(Wl,r(ti � tp))
46
Experiments
} Data} CoNLL-2009 dataset - English and Chinese} F1 evaluation measure
} Model} Hyperparameters tuned on English development set} State-of-the-art predicate disambiguation models
[Marcheggiani and Titov, 2017]
47
Ablation Experiments (Dev set)
82.7
83.3
81
82
83
84
85
English SRL w/o predicate disambiguation
BiLSTM GCN
[Marcheggiani and Titov, 2017]
48
75.2
77.1
73
74
75
76
77
78
Chinese SRL w/o predicate disambiguation
BiLSTM GCN
English Test Set
87.3
87.7 87.7
88
86
87
88
89
FitzGerald et al. (2015) (global)
Roth and Lapata (2016) (global)
Marcheggiani et al. (2017, CoNLL) (local)
Ours (Bi-LSTM + GCN) (local)
SRL with predicate disambiguation
[Marcheggiani and Titov, 2017]
49
English Out of Domain
75.2
76.1
77.7
77.2
74
75
76
77
78
FitzGerald et al. (2015) (global)
Roth and Lapata (2016) (global)
Marcheggiani et al. (2017, CoNLL) (local)
Ours (Bi-LSTM + GCN) (local)
SRL with predicate disambiguation
[Marcheggiani and Titov, 2017]
50
English Test Set (Ensemble)
87.787.9
89.1
86
87
88
89
90
FitzGerald et al. (2015) (ensemble) Roth and Lapata (2016) (ensemble) Ours (Bi-LSTM + GCN) (ensemble)
SRL with predicate disambiguation
[Marcheggiani and Titov, 2017]
51
Chinese Test Set
77.7
78.6
79.4
82.5
76
77
78
79
80
81
82
83
Zhao et al. (2009) (global) Bjö̈rkelund et al. (2009) (global)
Roth and Lapata (2016) (global)
Ours (Bi-LSTM + GCN) (local)
SRL with predicate disambiguation
[Marcheggiani and Titov, 2017]
52
Syntactic Graph Convolutional Networks
53
} Fast and simple} Can be seamlessly applied to other tasks
Syntactic Graph Convolutional Networks
54
} Fast and simple} Can be seamlessly applied to other tasks
Graph Convolutional Encoders for Syntax-aware Machine TranslationJoost Bastings, Ivan Titov, Wilker Aziz, Diego Marcheggiani, Khalil Sima'an. In Proceedings of EMNLP, 2017.
Syntactic Graph Convolutional Networks
55
} Fast and simple} Can be seamlessly applied to other tasks
Graph Convolutional Encoders for Syntax-aware Machine TranslationJoost Bastings, Ivan Titov, Wilker Aziz, Diego Marcheggiani, Khalil Sima'an. In Proceedings of EMNLP, 2017.
Improvements on English to German and
English to Czech translations
Multi-document Question Answering
56
[De Cao et al., 2018]
• Nodes are entities and edges are co-reference links• Inference on a graph representing the documents collection
Multi-document Question Answering
57
[De Cao et al., 2018]
Syntactic Graph Convolutional Networks
58
Syntactic Graph Convolutional Networks
59
Syntactic Graph Convolutional Networks
60
Outline
} Semantic Role Labeling} Graph Convolutional Networks (GCN)} Syntactic GCN for Semantic Role Labeling (SRL)} SRL Model} Exploiting Semantics in Neural Machine Translation with GCNs
61
Motivations [Marcheggiani at al., 2018]
62
John gave his wonderful wife a nice present .
Giver
Thing given
Entity given to
John gave a nice present to his wonderful wife .
Giver
Entity given to
Thing given
Motivations
SRL helps to generalize over different surface realizations of the same underlying “meaning”.
[Marcheggiani at al., 2018]
63
John gave his wonderful wife a nice present .
Giver
Thing given
Entity given to
John gave a nice present to his wonderful wife .
Giver
Entity given to
Thing given
Motivations
64
Motivations
65
Lost in translation
Related work
} Semantics in statistical MT} [Wu and Fung, 2009]} [Liu and Gildea, 2010]} [Aziz et al., 2011]} ...
} Syntax in neural MT} [Sennrich and Haddow, 2016]} [Aharoni and Goldberg, 2017 ]} [Bastings et al., 2017]} …
} Semantics in neural MT} ???
[Marcheggiani at al., 2018]
66
Predicate-argument encoding
67
John gave his wonderful wife a nice present
WA0
WA1WA2
WA0’ WA2’
WA1’
Wse
lf
Wse
lf
Wse
lf
Wse
lf
Wse
lf
Wse
lf
Wse
lf
Wse
lf
Semantic GCN
Semantic GCN WA0
WA1WA2
WA0’ WA2’
WA1’
Wse
lf
Wse
lf
Wse
lf
Wse
lf
Wse
lf
Wse
lf
Wse
lf
Wse
lf
Giver
Thing given
Entity given to
Our Model
} Standard sequence2sequence with attention} Semantic GCN encoder on top of a bidirectional RNN} RNN decoder
[Marcheggiani at al., 2018]
68
Our model
John gave his wonderful wife a nice present
WA0
WA1WA2
WA0’ WA2’
WA1’
Wse
lf
Wse
lf
Wse
lf
Wse
lf
Wse
lf
Wse
lf
Wse
lf
Wse
lf
BiRNN/CNN
Semantic GCN
Semantic GCN WA0
WA1WA2
WA0’ WA2’
WA1’
Wse
lf
Wse
lf
Wse
lf
Wse
lf
Wse
lf
Wse
lf
Wse
lf
Wse
lf<bos> John
John
+RNN
DECODERATTENTIONMECHANISM
[Marcheggiani at al., 2018]
69
Our model
John gave his wonderful wife a nice present
WA0
WA1WA2
WA0’ WA2’
WA1’
Wse
lf
Wse
lf
Wse
lf
Wse
lf
Wse
lf
Wse
lf
Wse
lf
Wse
lf
BiRNN/CNN
Semantic GCN
Semantic GCN WA0
WA1WA2
WA0’ WA2’
WA1’
Wse
lf
Wse
lf
Wse
lf
Wse
lf
Wse
lf
Wse
lf
Wse
lf
Wse
lf<bos> John
John
+RNN
DECODERATTENTIONMECHANISM
[Marcheggiani at al., 2018]
70
Experiments
} Data} WMT ‘16 English-German dataset (~4.5 million sentence pairs)} BLEU as evaluation measure
} Model} Hyperparameters tuned on News Commentary En-De (~226K sentence pairs)} GRU as RNN
[Marcheggiani at al., 2018]
71
Results
23.323.9
20
21
22
23
24
25
26
BiRNN (Bastings et al. 2017)
BiRNN + Syntactic GCN
(Bastings et al. 2017)
BiRNN + Semantic GCN
BiRNN+Syntactic GCN +Semantic GCN
Full WMT 2016 English-German BLEU
[Marcheggiani at al., 2018]
72
Results
23.323.9
24.5
20
21
22
23
24
25
26
BiRNN (Bastings et al. 2017)
BiRNN + Syntactic GCN
(Bastings et al. 2017)
BiRNN + Semantic GCN
BiRNN+Syntactic GCN +Semantic GCN
Full WMT 2016 English-German BLEU
[Marcheggiani at al., 2018]
73
Results
23.323.9
24.5
20
21
22
23
24
25
26
BiRNN (Bastings et al. 2017)
BiRNN + Syntactic GCN
(Bastings et al. 2017)
BiRNN + Semantic GCN
BiRNN+Syntactic GCN +Semantic GCN
Full WMT 2016 English-German BLEU
[Marcheggiani at al., 2018]
74
+ 1.2 BLEU
Results
23.323.9
24.5
20
21
22
23
24
25
26
BiRNN (Bastings et al. 2017)
BiRNN + Syntactic GCN
(Bastings et al. 2017)
BiRNN + Semantic GCN
BiRNN+Syntactic GCN +Semantic GCN
Full WMT 2016 English-German BLEU
Semantics is helpful
[Marcheggiani at al., 2018]
75
+ 1.2 BLEU
Results
23.323.9
24.524.9
20
21
22
23
24
25
26
BiRNN (Bastings et al. 2017)
BiRNN + Syntactic GCN
(Bastings et al. 2017)
BiRNN + Semantic GCN
BiRNN+Syntactic GCN +Semantic GCN
Full WMT 2016 English-German BLEU
[Marcheggiani at al., 2018]
76
Results
23.323.9
24.524.9
20
21
22
23
24
25
26
BiRNN (Bastings et al. 2017)
BiRNN + Syntactic GCN
(Bastings et al. 2017)
BiRNN + Semantic GCN
BiRNN+Syntactic GCN +Semantic GCN
Full WMT 2016 English-German BLEU
[Marcheggiani at al., 2018]
77
+ 1.6 BLEU
Results
23.323.9
24.524.9
20
21
22
23
24
25
26
BiRNN (Bastings et al. 2017)
BiRNN + Syntactic GCN
(Bastings et al. 2017)
BiRNN + Semantic GCN
BiRNN+Syntactic GCN +Semantic GCN
Full WMT 2016 English-German BLEU
Syntax and semantics are
complementary
[Marcheggiani at al., 2018]
78
+ 1.6 BLEU
Analysis
John sold the car to Mark .
Seller Thing sold Buyer
The boy sitting on a bench in the park plays chess .
Thing sitting Location Player GameAM-LOC
The boy walking down the dusty road is drinking a beer .
Walker AM-DIR Drinker Liquid
SOURCE
SEM GCN
BiRNN John verkaufte das Auto nach Mark .
John verkaufte das Auto an Mark .
SEM GCN
BiRNN Der Junge zu Fuß die staubige Straße ist ein Bier trinken .
Der Junge , der die staubige Straße hinunter geht , trinkt ein Bier .
SEM GCN
BiRNN Der Junge auf einer Bank im Park spielt Schach .
Der Junge sitzt auf einer Bank im Park Schach .
SOURCE
SOURCE
[Marcheggiani at al., 2018]
79
BiRNN mistranslates “to” as “nach” (directionality)
Analysis
John sold the car to Mark .
Seller Thing sold Buyer
The boy sitting on a bench in the park plays chess .
Thing sitting Location Player GameAM-LOC
The boy walking down the dusty road is drinking a beer .
Walker AM-DIR Drinker Liquid
SOURCE
SEM GCN
BiRNN John verkaufte das Auto nach Mark .
John verkaufte das Auto an Mark .
SEM GCN
BiRNN Der Junge zu Fuß die staubige Straße ist ein Bier trinken .
Der Junge , der die staubige Straße hinunter geht , trinkt ein Bier .
SEM GCN
BiRNN Der Junge auf einer Bank im Park spielt Schach .
Der Junge sitzt auf einer Bank im Park Schach .
SOURCE
SOURCE
[Marcheggiani at al., 2018]
80
BiRNN mistranslates “to” as “nach” (directionality)
John sold the car to Mark .
Seller Thing sold Buyer
The boy sitting on a bench in the park plays chess .
Thing sitting Location Player GameAM-LOC
The boy walking down the dusty road is drinking a beer .
Walker AM-DIR Drinker Liquid
SOURCE
SEM GCN
BiRNN John verkaufte das Auto nach Mark .
John verkaufte das Auto an Mark .
SEM GCN
BiRNN Der Junge zu Fuß die staubige Straße ist ein Bier trinken .
Der Junge , der die staubige Straße hinunter geht , trinkt ein Bier .
SEM GCN
BiRNN Der Junge auf einer Bank im Park spielt Schach .
Der Junge sitzt auf einer Bank im Park Schach .
SOURCE
SOURCE
81
BiRNN mistranslates “to” as “nach” (directionality)
Analysis [Marcheggiani at al., 2018]
John sold the car to Mark .
Seller Thing sold Buyer
The boy sitting on a bench in the park plays chess .
Thing sitting Location Player GameAM-LOC
The boy walking down the dusty road is drinking a beer .
Walker AM-DIR Drinker Liquid
SOURCE
SEM GCN
BiRNN John verkaufte das Auto nach Mark .
John verkaufte das Auto an Mark .
SEM GCN
BiRNN Der Junge zu Fuß die staubige Straße ist ein Bier trinken .
Der Junge , der die staubige Straße hinunter geht , trinkt ein Bier .
SEM GCN
BiRNN Der Junge auf einer Bank im Park spielt Schach .
Der Junge sitzt auf einer Bank im Park Schach .
SOURCE
SOURCE
Analysis [Marcheggiani at al., 2018]
82
Both translations are wrong,but the BiRNN’s one is grammatically correct
John sold the car to Mark .
Seller Thing sold Buyer
The boy sitting on a bench in the park plays chess .
Thing sitting Location Player GameAM-LOC
The boy walking down the dusty road is drinking a beer .
Walker AM-DIR Drinker Liquid
SOURCE
SEM GCN
BiRNN John verkaufte das Auto nach Mark .
John verkaufte das Auto an Mark .
SEM GCN
BiRNN Der Junge zu Fuß die staubige Straße ist ein Bier trinken .
Der Junge , der die staubige Straße hinunter geht , trinkt ein Bier .
SEM GCN
BiRNN Der Junge auf einer Bank im Park spielt Schach .
Der Junge sitzt auf einer Bank im Park Schach .
SOURCE
SOURCE
Analysis [Marcheggiani at al., 2018]
83
Both translations are wrong,but the BiRNN’s one is grammatically correct
John sold the car to Mark .
Seller Thing sold Buyer
The boy sitting on a bench in the park plays chess .
Thing sitting Location Player GameAM-LOC
The boy walking down the dusty road is drinking a beer .
Walker AM-DIR Drinker Liquid
SOURCE
SEM GCN
BiRNN John verkaufte das Auto nach Mark .
John verkaufte das Auto an Mark .
SEM GCN
BiRNN Der Junge zu Fuß die staubige Straße ist ein Bier trinken .
Der Junge , der die staubige Straße hinunter geht , trinkt ein Bier .
SEM GCN
BiRNN Der Junge auf einer Bank im Park spielt Schach .
Der Junge sitzt auf einer Bank im Park Schach .
SOURCE
SOURCE
Analysis [Marcheggiani at al., 2018]
84
Both translations are wrong,but the BiRNN’s one is grammatically correct
Conclusion
} GCNs for encoding linguistic structures into NN} Semantics, coreference, discourse} Fast} Cheap
} State-of-the-art model for dependency-based SRL
} First to exploit semantics in NMT
85
Roadmap
86
Including structured bias into neural NLP models
Roadmap
87
Including structured bias into neural NLP models
Low-resource setting
Roadmap
88
Including structured bias into neural NLP models
Low-resource settingLong-range dependencies
Document levelCross-document level
Roadmap
89
Including structured bias into neural NLP models
Low-resource settingLong-range dependencies
Document levelCross-document level
Integrating external knowledge i.e., knowledge graphs
Roadmap
90
Including structured bias into neural NLP models
Low-resource settingLong-range dependencies
Document levelCross-document level
Integrating external knowledge i.e., knowledge graphs
Thanks for your attention!