

Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Long Papers), pages 506–514, Melbourne, Australia, July 15–20, 2018. ©2018 Association for Computational Linguistics


Extracting Relational Facts by an End-to-End Neural Model with Copy Mechanism

Xiangrong Zeng¹², Daojian Zeng³, Shizhu He¹, Kang Liu¹², Jun Zhao¹²

¹National Laboratory of Pattern Recognition (NLPR), Institute of Automation, Chinese Academy of Sciences, Beijing, 100190, China

²University of Chinese Academy of Sciences, Beijing, 100049, China
³Changsha University of Science & Technology, Changsha, 410114, China

{xiangrong.zeng, shizhu.he, kliu, jzhao}@nlpr.ia.ac.cn, [email protected]

Abstract

The relational facts in sentences are often complicated. Different relational triplets may have overlaps in a sentence. We divide sentences into three types according to triplet overlap degree: Normal, EntityPairOverlap and SingleEntityOverlap. Existing methods mainly focus on the Normal class and fail to extract relational triplets precisely. In this paper, we propose an end-to-end model based on sequence-to-sequence learning with copy mechanism, which can jointly extract relational facts from sentences of any of these classes. We adopt two different strategies in the decoding process: employing only one unified decoder or applying multiple separated decoders. We test our models on two public datasets, and our model outperforms the baseline method significantly.

1 Introduction

Recently, to build large structured knowledge bases (KBs), great efforts have been made on extracting relational facts from natural language texts. A relational fact is often represented as a triplet which consists of two entities (an entity pair) and a semantic relation between them, such as <Chicago, country, UnitedStates>.

So far, most previous methods have mainly focused on the task of relation extraction or classification, which identifies the semantic relations between two pre-assigned entities. Although great progress has been made (Hendrickx et al., 2010; Zeng et al., 2014; Xu et al., 2015a,b), these methods all assume that the entities are identified beforehand and neglect the extraction of entities. To extract both entities and relations, early works (Zelenko et al., 2003; Chan and Roth, 2011) adopted a pipeline

Normal
  S1: Chicago is located in the United States.
  {<Chicago, country, United States>}

EPO
  S2: News of the list's existence unnerved officials in Khartoum, Sudan's capital.
  {<Sudan, capital, Khartoum>, <Sudan, contains, Khartoum>}

SEO
  S3: Aarhus airport serves the city of Aarhus whose leader is Jacob Bundsgaard.
  {<Aarhus, leaderName, Jacob Bundsgaard>, <Aarhus Airport, cityServed, Aarhus>}

Figure 1: Examples of the Normal, EntityPairOverlap (EPO) and SingleEntityOverlap (SEO) classes. The overlapped entities are marked in yellow. S1 belongs to the Normal class because none of its triplets have overlapped entities; S2 belongs to the EntityPairOverlap class since the entity pair <Sudan, Khartoum> of its two triplets is overlapped; and S3 belongs to the SingleEntityOverlap class because the entity Aarhus of its two triplets is overlapped and these two triplets have no overlapped entity pair.

manner, where they first conduct entity recognition and then predict relations between extracted entities. However, the pipeline framework ignores the relevance of entity identification and relation prediction (Li and Ji, 2014). Recent works attempted to extract entities and relations jointly. Yu and Lam (2010); Li and Ji (2014); Miwa and Sasaki (2014) designed several elaborate features to construct the bridge between these two subtasks. Similar to other natural language processing (NLP) tasks, they need complicated feature engineering and heavily rely on pre-existing NLP tools for feature extraction.


Recently, with the success of deep learning on many NLP tasks, it has also been applied to relational fact extraction. Zeng et al. (2014); Xu et al. (2015a,b) employed CNNs or RNNs for relation classification. Miwa and Bansal (2016); Gupta et al. (2016); Zhang et al. (2017) treated the relation extraction task as an end-to-end (end2end) table-filling problem. Zheng et al. (2017) proposed a novel tagging scheme and employed a recurrent neural network (RNN) based sequence labeling model to jointly extract entities and relations.

Nevertheless, the relational facts in sentences are often complicated. Different relational triplets may have overlaps in a sentence. Such a phenomenon makes the aforementioned methods, whether deep-learning-based models or traditional feature-engineering-based joint models, fail to extract relational triplets precisely. Generally, according to our observation, we divide sentences into three types according to triplet overlap degree: Normal, EntityPairOverlap (EPO) and SingleEntityOverlap (SEO). As shown in Figure 1, a sentence belongs to the Normal class if none of its triplets have overlapped entities. A sentence belongs to the EntityPairOverlap class if some of its triplets have an overlapped entity pair. And a sentence belongs to the SingleEntityOverlap class if some of its triplets have an overlapped entity and these triplets don't have an overlapped entity pair. To our knowledge, most previous methods focused on the Normal type and seldom consider the other types. Even the neural-network-based joint model (Zheng et al., 2017) assigns only a single tag to a word, which means one word can participate in at most one triplet. As a result, the triplet overlap issue is not actually addressed.

To address the aforementioned challenge, we aim to design a model that can extract triplets, including entities and relations, from sentences of the Normal, EntityPairOverlap and SingleEntityOverlap classes. To handle the problem of triplet overlap, one entity must be allowed to freely participate in multiple triplets. Different from previous neural methods, we propose an end2end model based on sequence-to-sequence (Seq2Seq) learning with copy mechanism, which can jointly extract relational facts from sentences of any of these classes. Specifically, the main component of this model includes two parts: an encoder and a decoder. The encoder converts a natural language sentence (the source sentence) into a fixed-length semantic vector. Then, the decoder reads in this vector and generates triplets directly. To generate a triplet, firstly, the decoder generates the relation. Secondly, by adopting the copy mechanism, the decoder copies the first entity (head entity) from the source sentence. Lastly, the decoder copies the second entity (tail entity) from the source sentence. In this way, multiple triplets can be extracted. (In detail, we adopt two different strategies in the decoding process: employing only one unified decoder (OneDecoder) to generate all triplets, or applying multiple separated decoders (MultiDecoder), each of them generating one triplet.) In our model, one entity is allowed to be copied several times when it needs to participate in different triplets. Therefore, our model can handle the triplet overlap issue and deal with both the EntityPairOverlap and SingleEntityOverlap sentence types. Moreover, since entities and relations are extracted by a single end2end neural network, our model extracts entities and relations jointly.

The main contributions of our work are as follows:

• We propose an end2end neural model based on sequence-to-sequence learning with copy mechanism to extract relational facts from sentences, where the entities and relations are jointly extracted.

• Our model can handle the relational triplet overlap problem through the copy mechanism. To our knowledge, the relational triplet overlap problem has never been addressed before.

• We conduct experiments on two public datasets. Experimental results show that our models outperform the state-of-the-art method with 39.8% and 31.1% improvements respectively.

2 Related Work

Given a sentence with annotated entities, Hendrickx et al. (2010); Zeng et al. (2014); Xu et al. (2015a,b) treat identifying relations in sentences as a multi-class classification problem. Zeng et al. (2014) were among the first to introduce CNNs into relation classification. Xu et al. (2015a) and Xu et al. (2015b) learned relation representations from shortest dependency paths through a CNN or RNN. Despite their success, these models ignore the extraction of the entities from sentences and thus cannot truly extract relational facts.


Figure 2: The overall structure of the OneDecoder model. A bi-directional RNN is used to encode the source sentence and then a decoder is used to generate triplets directly: the relation is predicted and the entities are copied from the source sentence. In the depicted example, the model reads S2 of Figure 1 and emits the extracted triplets {<Capital, Sudan, Khartoum>, <Contains, Sudan, Khartoum>}.

Given a sentence without any annotated entities, researchers have proposed several methods to extract both entities and relations. Pipeline-based methods, like Zelenko et al. (2003) and Chan and Roth (2011), neglected the relevance of entity extraction and relation prediction. To resolve this problem, several joint models have been proposed. Early works (Yu and Lam, 2010; Li and Ji, 2014; Miwa and Sasaki, 2014) need a complicated process of feature engineering and heavily depend on NLP tools for feature extraction. Recent models, like Miwa and Bansal (2016); Gupta et al. (2016); Zhang et al. (2017); Zheng et al. (2017), jointly extract the entities and relations based on neural networks. These models are based on a tagging framework, which assigns a relational tag to a word or a word pair. Despite their success, none of these models can fully handle the triplet overlap problem mentioned in the first section. The reason lies in their hypothesis that a word (or a word pair) can be assigned just one relational tag.

This work is based on sequence-to-sequence learning with copy mechanism, which has been adopted for several NLP tasks. Dong and Lapata (2016) presented a method based on an attention-enhanced encoder-decoder model, which encodes input utterances and generates their logical forms. Gu et al. (2016); He et al. (2017) applied the copy mechanism to sentence generation. They copy a segment from the source sequence to the target sequence.

3 Our Model

In this section, we introduce a differentiable neural model based on Seq2Seq learning with copy mechanism, which is able to extract multiple relational facts in an end2end fashion.

Our neural model first encodes a variable-length sentence into a fixed-length vector representation and then decodes this vector into the corresponding relational facts (triplets). When decoding, we can either decode all triplets with one unified decoder or decode every triplet with a separated decoder. We denote these the OneDecoder model and the MultiDecoder model, respectively.


3.1 OneDecoder Model

The overall structure of the OneDecoder model is shown in Figure 2.

3.1.1 Encoder

To encode a sentence $s = [w_1, ..., w_n]$, where $w_t$ represents the $t$-th word and $n$ is the source sentence length, we first turn it into a matrix $X = [x_1, \cdots, x_n]$, where $x_t$ is the embedding of the $t$-th word.

The canonical RNN encoder reads this matrix $X$ sequentially and generates output $o^E_t$ and hidden state $h^E_t$ in time step $t$ ($1 \le t \le n$) by

$$o^E_t, h^E_t = f(x_t, h^E_{t-1}) \tag{1}$$

where $f(\cdot)$ represents the encoder function.

Following Gu et al. (2016), our encoder uses a bi-directional RNN (Chung et al., 2014) to encode the input sentence. The forward and backward RNNs obtain the output sequences $\{\overrightarrow{o^E_1}, \cdots, \overrightarrow{o^E_n}\}$ and $\{\overleftarrow{o^E_n}, \cdots, \overleftarrow{o^E_1}\}$, respectively. We then concatenate $\overrightarrow{o^E_t}$ and $\overleftarrow{o^E_{n-t+1}}$ to represent the $t$-th word, writing the concatenated result as $O^E = [o^E_1, ..., o^E_n]$ with $o^E_t = [\overrightarrow{o^E_t}; \overleftarrow{o^E_{n-t+1}}]$. Similarly, the concatenation of the forward and backward RNN hidden states is used as the representation of the sentence, that is, $s = [\overrightarrow{h^E_n}; \overleftarrow{h^E_n}]$.
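To make the encoder concrete, here is a minimal PyTorch sketch of Eq. 1 run bi-directionally. The framework choice, class name, and tensor layout are our assumptions for illustration; the paper does not tie this section to specific code.

```python
import torch
import torch.nn as nn

class BiRNNEncoder(nn.Module):
    """Illustrative bi-directional encoder (Eq. 1 run in both directions).

    Dimensions loosely follow Section 4.2; all names are hypothetical.
    """
    def __init__(self, vocab_size, emb_dim=100, hidden_dim=1000):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # One recurrent layer run forwards and backwards; per-word outputs
        # are concatenated, matching o_t^E = [fwd o_t^E ; bwd o_{n-t+1}^E].
        self.rnn = nn.LSTM(emb_dim, hidden_dim,
                           bidirectional=True, batch_first=True)

    def forward(self, word_ids):
        x = self.embed(word_ids)            # (batch, n, emb_dim)
        outputs, (h_n, _) = self.rnn(x)     # outputs: (batch, n, 2*hidden_dim)
        # Sentence representation s = [fwd h_n^E ; bwd h_n^E].
        s = torch.cat([h_n[0], h_n[1]], dim=-1)
        return outputs, s                   # o^E_1..o^E_n and s
```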

3.1.2 Decoder

The decoder is used to generate triplets directly. Firstly, the decoder generates a relation for the triplet. Secondly, the decoder copies an entity from the source sentence as the first entity of the triplet. Lastly, the decoder copies the second entity from the source sentence. By repeating this process, the decoder can generate multiple triplets. Once all valid triplets are generated, the decoder generates NA-triplets, which mean "stopping" and are similar to the "eos" symbol in neural sentence generation. Note that an NA-triplet is composed of an NA-relation and an NA-entity pair.

As shown in Figure 3 (a), in time step $t$ ($1 \le t$), we calculate the decoder output $o^D_t$ and hidden state $h^D_t$ as follows:

$$o^D_t, h^D_t = g(u_t, h^D_{t-1}) \tag{2}$$

where $g(\cdot)$ is the decoder function and $h^D_{t-1}$ is the hidden state of time step $t-1$. We initialize $h^D_0$ with the representation of the source sentence $s$. $u_t$ is the decoder input in time step $t$, which we calculate as:

$$u_t = [v_t; c_t] \cdot W^u \tag{3}$$

where $c_t$ is the attention vector and $v_t$ is the embedding of the copied entity or predicted relation in time step $t-1$. $W^u$ is a weight matrix.

Attention Vector. The attention vector $c_t$ is calculated as follows:

$$c_t = \sum_{i=1}^{n} \alpha_i \times o^E_i \tag{4}$$

$$\alpha = \mathrm{softmax}(\beta) \tag{5}$$

$$\beta_i = \mathrm{selu}([h^D_{t-1}; o^E_i] \cdot w^c) \tag{6}$$

where $o^E_i$ is the output of the encoder in time step $i$, $\alpha = [\alpha_1, ..., \alpha_n]$ and $\beta = [\beta_1, ..., \beta_n]$ are vectors, and $w^c$ is a weight vector. $\mathrm{selu}(\cdot)$ is the activation function of Klambauer et al. (2017).
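A minimal sketch of Eqs. 4-6, assuming PyTorch tensors with the shapes noted in the comments; the function name and argument layout are illustrative:

```python
import torch
import torch.nn.functional as F

def attention_vector(h_dec_prev, enc_outputs, w_c):
    """Compute c_t from Eqs. 4-6.

    h_dec_prev : (batch, d_dec)      previous decoder state h^D_{t-1}
    enc_outputs: (batch, n, d_enc)   encoder outputs o^E_1..o^E_n
    w_c        : (d_dec + d_enc,)    weight vector (assumed shape)
    """
    n = enc_outputs.size(1)
    h_rep = h_dec_prev.unsqueeze(1).expand(-1, n, -1)
    # Eq. 6: beta_i = selu([h^D_{t-1}; o^E_i] . w^c)
    beta = F.selu(torch.cat([h_rep, enc_outputs], dim=-1) @ w_c)
    alpha = F.softmax(beta, dim=-1)                      # Eq. 5
    # Eq. 4: weighted sum of encoder outputs.
    return (alpha.unsqueeze(-1) * enc_outputs).sum(dim=1)
```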

After we get the decoder output $o^D_t$ in time step $t$ ($1 \le t$), if $t \% 3 = 1$ (that is, $t = 1, 4, 7, ...$), we use $o^D_t$ to predict a relation, which means we are decoding a new triplet. Otherwise, if $t \% 3 = 2$ (that is, $t = 2, 5, 8, ...$), we use $o^D_t$ to copy the first entity from the source sentence, and if $t \% 3 = 0$ (that is, $t = 3, 6, 9, ...$), we copy the second entity.
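The step schedule itself is simple enough to sketch directly; the helper below is hypothetical but mirrors the $t \% 3$ rule just described:

```python
def decoding_schedule(max_steps=15):
    """Yield the prediction head used at each decoder step t (t >= 1);
    every triplet costs exactly three steps."""
    for t in range(1, max_steps + 1):
        if t % 3 == 1:
            yield t, "predict relation (start a new triplet)"
        elif t % 3 == 2:
            yield t, "copy first (head) entity"
        else:
            yield t, "copy second (tail) entity"

# With max_steps = 15 (Section 4.2), at most five triplets are decoded.
```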

Predict Relation. Suppose there are $m$ valid relations in total. We use a fully connected layer to calculate the confidence vector $q^r = [q^r_1, ..., q^r_m]$ of all valid relations:

$$q^r = \mathrm{selu}(o^D_t \cdot W^r + b^r) \tag{7}$$

where $W^r$ is the weight matrix and $b^r$ is the bias. When predicting the relation, it is possible to predict the NA-relation when the model tries to generate an NA-triplet. To take this into consideration, we calculate the confidence value of the NA-relation as:

$$q^{NA} = \mathrm{selu}(o^D_t \cdot W^{NA} + b^{NA}) \tag{8}$$

where $W^{NA}$ is the weight matrix and $b^{NA}$ is the bias. We then concatenate $q^r$ and $q^{NA}$ to form the confidence vector of all relations (including the NA-relation) and apply softmax to obtain the probability distribution $p^r = [p^r_1, ..., p^r_{m+1}]$:

$$p^r = \mathrm{softmax}([q^r; q^{NA}]) \tag{9}$$

We select the relation with the highest probability as the predicted relation and use its embedding as the next time step input $v_{t+1}$.
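Eqs. 7-9 amount to two linear heads and one softmax; a minimal PyTorch sketch (module and attribute names are our own):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RelationPredictor(nn.Module):
    """Score m valid relations plus one NA-relation (Eqs. 7-9)."""
    def __init__(self, d_dec, num_relations):
        super().__init__()
        self.rel_layer = nn.Linear(d_dec, num_relations)  # W^r, b^r
        self.na_layer = nn.Linear(d_dec, 1)               # W^NA, b^NA

    def forward(self, o_dec):
        q_r = F.selu(self.rel_layer(o_dec))    # Eq. 7
        q_na = F.selu(self.na_layer(o_dec))    # Eq. 8
        # Eq. 9: softmax over m + 1 scores; the last slot is NA.
        return F.softmax(torch.cat([q_r, q_na], dim=-1), dim=-1)
```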


Figure 3: The inputs and outputs of the decoder(s) of the OneDecoder model and the MultiDecoder model. (a) shows the decoder of the OneDecoder model: only one decoder is used, and it is initialized with the sentence representation s. (b) shows the decoders of the MultiDecoder model: the first decoder is initialized with s; the other decoder(s) are initialized with s and the previous decoder's state.

Copy the First Entity. To copy the first entity, we calculate the confidence vector $q^e = [q^e_1, ..., q^e_n]$ of all words in the source sentence as:

$$q^e_i = \mathrm{selu}([o^D_t; o^E_i] \cdot w^e) \tag{10}$$

where $w^e$ is the weight vector. As with relation prediction, we concatenate $q^e$ and $q^{NA}$ to form the confidence vector and apply softmax to obtain the probability distribution $p^e = [p^e_1, ..., p^e_{n+1}]$:

$$p^e = \mathrm{softmax}([q^e; q^{NA}]) \tag{11}$$

Similarly, we select the word with the highest probability as the predicted word and use its embedding as the next time step input $v_{t+1}$.

Copy the Second Entity. Copying the second entity is almost the same as copying the first entity. The only difference is that when copying the second entity, we cannot copy the first entity again, because in a valid triplet the two entities must be different. Suppose the first copied entity is the $k$-th word in the source sentence; we introduce a mask vector $M$ with $n$ elements ($n$ is the length of the source sentence), where:

$$M_i = \begin{cases} 1, & i \ne k \\ 0, & i = k \end{cases} \tag{12}$$

Then we calculate the probability distribution $p^e$ as:

$$p^e = \mathrm{softmax}([M \otimes q^e; q^{NA}]) \tag{13}$$

where $\otimes$ is element-wise multiplication. Just as when copying the first entity, we select the word with the highest probability as the predicted word and use its embedding as the next time step input $v_{t+1}$.
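Both copy steps (Eqs. 10-13) can be sketched with one function; the masking follows Eq. 13 literally, zeroing the score of the already-copied position k. Names and shapes are our assumptions:

```python
import torch
import torch.nn.functional as F

def copy_entity_distribution(o_dec, enc_outputs, w_e, q_na,
                             first_entity_idx=None):
    """Copy distribution over source words plus NA (Eqs. 10-13).

    o_dec           : (batch, d_dec)    decoder output o^D_t
    enc_outputs     : (batch, n, d_enc) encoder outputs o^E_i
    w_e             : (d_dec + d_enc,)  weight vector (assumed shape)
    q_na            : (batch, 1)        NA confidence from Eq. 8
    first_entity_idx: (batch,) long or None; position k of the first
                      copied entity, given only at the second-entity step
    """
    n = enc_outputs.size(1)
    o_rep = o_dec.unsqueeze(1).expand(-1, n, -1)
    q_e = F.selu(torch.cat([o_rep, enc_outputs], dim=-1) @ w_e)  # Eq. 10
    if first_entity_idx is not None:
        mask = torch.ones_like(q_e)              # Eq. 12: M_i = 1 for i != k
        mask.scatter_(1, first_entity_idx.unsqueeze(1), 0.0)  # M_k = 0
        q_e = q_e * mask                         # M (x) q^e in Eq. 13
    return F.softmax(torch.cat([q_e, q_na], dim=-1), dim=-1)    # Eq. 11/13
```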

3.2 MultiDecoder Model

The MultiDecoder model is an extension of the proposed OneDecoder model. The main difference is that when decoding triplets, the MultiDecoder model decodes triplets with several separated decoders. Figure 3 (b) shows the inputs and outputs of the decoders of the MultiDecoder model, where there are two decoders. The decoders work in a sequential order: the first decoder generates the first triplet, and then the second decoder generates the second triplet.

Similar to Eq. 2, we calculate the hidden state $h^{D_i}_t$ and output $o^{D_i}_t$ of the $i$-th ($1 \le i$) decoder in time step $t$ as follows:

$$o^{D_i}_t, h^{D_i}_t = \begin{cases} g^{D_i}(u_t, h^{D_i}_{t-1}), & t \% 3 = 2, 0 \\ g^{D_i}(u_t, \hat{h}^{D_i}_{t-1}), & t \% 3 = 1 \end{cases} \tag{14}$$

$g^{D_i}(\cdot)$ is the decoder function of decoder $i$. $u_t$ is the decoder input in time step $t$, calculated as in Eq. 3. $h^{D_i}_{t-1}$ is the hidden state of the $i$-th decoder in time step $t-1$. $\hat{h}^{D_i}_{t-1}$ is the initial hidden state of the $i$-th decoder, which is calculated as follows:

$$\hat{h}^{D_i}_{t-1} = \begin{cases} s, & i = 1 \\ \frac{1}{2}(s + h^{D_{i-1}}_{t-1}), & i > 1 \end{cases} \tag{15}$$


Class  | NYT Train | NYT Test | WebNLG Train | WebNLG Test
Normal | 37013     | 3266     | 1596         | 246
EPO    | 9782      | 978      | 227          | 26
SEO    | 14735     | 1297     | 3406         | 457
ALL    | 56195     | 5000     | 5019         | 703

Table 1: The number of sentences of the Normal, EntityPairOverlap (EPO) and SingleEntityOverlap (SEO) classes. It is worth noting that a sentence can belong to both the EPO class and the SEO class.

3.3 Training

Both the OneDecoder and MultiDecoder models are trained with the negative log-likelihood loss function. Given a batch of data with $B$ sentences $S = \{s_1, ..., s_B\}$ and the target results $Y = \{y_1, ..., y_B\}$, where $y_i = [y^1_i, ..., y^T_i]$ is the target result of $s_i$, the loss function is defined as follows:

$$L = \frac{1}{B \times T} \sum_{i=1}^{B} \sum_{t=1}^{T} -\log\big(p(y^t_i \mid y^{<t}_i, s_i, \theta)\big) \tag{16}$$

$T$ is the maximum time step of the decoder. $p(x \mid y)$ is the conditional probability of $x$ given $y$. $\theta$ denotes the parameters of the entire model.
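A sketch of Eq. 16, assuming the per-step output distributions have been gathered into one tensor. In the actual model, relation steps and copy steps have different output sizes, so this unified shape is a simplification:

```python
import torch

def nll_loss(step_probs, targets):
    """Average negative log-likelihood over B x T decoder steps (Eq. 16).

    step_probs: (B, T, V) predicted distributions per step
    targets   : (B, T)    gold indices y_i^t (relation ids or word positions)
    """
    B, T, _ = step_probs.shape
    gold = step_probs.gather(2, targets.unsqueeze(-1)).squeeze(-1)  # (B, T)
    return -gold.clamp_min(1e-12).log().sum() / (B * T)
```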

4 Experiments

4.1 Dataset

To evaluate the performance of our methods, we conduct experiments on two widely used datasets.

The first is the New York Times (NYT) dataset, which was produced by the distant supervision method (Riedel et al., 2010). This dataset consists of 1.18M sentences sampled from 294k 1987-2007 New York Times news articles. There are 24 valid relations in total. In this paper, following Zheng et al. (2017), we treat this dataset as supervised data. We filter out the sentences with more than 100 words and the sentences containing no positive triplets, leaving 66195 sentences. We randomly select 5000 sentences as the test set, 5000 sentences as the validation set, and the remaining 56195 sentences are used as the train set.

The second is the WebNLG dataset (Gardent et al., 2017). It was originally created for the Natural Language Generation (NLG) task. This dataset contains 246 valid relations. In this dataset, an instance includes a group of triplets and several standard sentences (written by humans). Every standard sentence contains all triplets of its instance. We only use the first standard sentence in our experiments, and we filter out an instance if not all entities of its triplets are found in this standard sentence. The original WebNLG dataset contains a train set and a development set. In our experiments, we treat the original development set as the test set and randomly split the original train set into a validation set and a train set. After filtering and splitting, the train set contains 5019 instances, the test set contains 703 instances, and the validation set contains 500 instances.

The numbers of sentences of every class in the NYT and WebNLG datasets are shown in Table 1. It is worth noting that a sentence can belong to both the EntityPairOverlap class and the SingleEntityOverlap class.

4.2 Settings

In our experiments, for both datasets, we use LSTM (Hochreiter and Schmidhuber, 1997) as the model cell; the cell unit number is set to 1000; the embedding dimension is set to 100; the batch size is 100 and the learning rate is 0.001; the maximum time step $T$ is 15, which means we predict at most 5 triplets for each sentence (therefore, there are 5 decoders in the MultiDecoder model). These hyper-parameters are tuned on the validation set. We use Adam (Kingma and Ba, 2015) to optimize parameters, and we stop training when we find the best result on the validation set.
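For reference, these hyper-parameters can be collected into a single configuration block (a hypothetical summary, not from released code):

```python
# Hyper-parameters of Section 4.2; the dict itself is illustrative.
CONFIG = dict(
    cell="LSTM",         # encoder/decoder cell type
    hidden_dim=1000,     # cell unit number
    emb_dim=100,         # embedding dimension
    batch_size=100,
    learning_rate=1e-3,  # optimized with Adam
    max_decode_steps=15, # 3 steps per triplet -> at most 5 triplets,
                         # hence 5 decoders in the MultiDecoder model
)
```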

4.3 Baseline and Evaluation Metrics

We compare our models with the NovelTagging model (Zheng et al., 2017), which achieves the best performance on relational fact extraction. We directly run the code released by Zheng et al. (2017) to acquire the results.

Following Zheng et al. (2017), we use the standard micro Precision, Recall and F1 score to evaluate the results. A triplet is regarded as correct when its relation and entities are both correct. When copying an entity, we only copy its last word. A triplet is regarded as an NA-triplet if and only if its relation is the NA-relation and it has an NA-entity pair. The predicted NA-triplets are excluded from evaluation.
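A sketch of this matching criterion, assuming each sentence's triplets are given as a set of (relation, head, tail) tuples with entities already reduced to their last word and NA-triplets already excluded:

```python
def micro_prf(pred_triplets, gold_triplets):
    """Micro precision/recall/F1 over exact triplet matches.

    pred_triplets, gold_triplets: lists (one entry per sentence) of sets
    of (relation, head_last_word, tail_last_word) tuples.
    """
    tp = sum(len(p & g) for p, g in zip(pred_triplets, gold_triplets))
    n_pred = sum(len(p) for p in pred_triplets)
    n_gold = sum(len(g) for g in gold_triplets)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```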

4.4 Results

Table 2 shows the Precision, Recall and F1 values of the NovelTagging model (Zheng et al., 2017) and our OneDecoder and MultiDecoder models.


Model        | NYT Precision | NYT Recall | NYT F1 | WebNLG Precision | WebNLG Recall | WebNLG F1
NovelTagging | 0.624         | 0.317      | 0.420  | 0.525            | 0.193         | 0.283
OneDecoder   | 0.594         | 0.531      | 0.560  | 0.322            | 0.289         | 0.305
MultiDecoder | 0.610         | 0.566      | 0.587  | 0.377            | 0.364         | 0.371

Table 2: Results of different models on the NYT dataset and the WebNLG dataset.

Class               | Model        | Precision | Recall | F1
Normal              | NovelTagging | 0.777     | 0.696  | 0.734
Normal              | OneDecoder   | 0.641     | 0.686  | 0.663
Normal              | MultiDecoder | 0.632     | 0.690  | 0.660
EntityPairOverlap   | NovelTagging | 0.374     | 0.085  | 0.138
EntityPairOverlap   | OneDecoder   | 0.598     | 0.485  | 0.536
EntityPairOverlap   | MultiDecoder | 0.607     | 0.503  | 0.550
SingleEntityOverlap | NovelTagging | 0.432     | 0.111  | 0.176
SingleEntityOverlap | OneDecoder   | 0.480     | 0.353  | 0.406
SingleEntityOverlap | MultiDecoder | 0.548     | 0.436  | 0.486

Figure 4: Results of the NovelTagging, OneDecoder, and MultiDecoder models on the Normal, EntityPairOverlap and SingleEntityOverlap classes of the NYT dataset (bar-chart values transcribed above).

As we can see, on the NYT dataset, our MultiDecoder model achieves the best F1 score, which is 0.587. This is a 39.8% improvement over the NovelTagging model, which scores 0.420. Besides, our OneDecoder model also outperforms the NovelTagging model. On the WebNLG dataset, the MultiDecoder model achieves the highest F1 score (0.371). The MultiDecoder and OneDecoder models outperform the NovelTagging model with 31.1% and 7.8% improvements, respectively. These observations verify the effectiveness of our models.

We can also observe that, on both the NYT and WebNLG datasets, the NovelTagging model achieves the highest precision value and the lowest recall value. By contrast, our models are much more balanced. We think the reason lies in the structure of the proposed models. The NovelTagging method finds triplets by tagging the words, but it assumes that only one tag can be assigned to each word. As a result, one word can participate in at most one triplet. Therefore, the NovelTagging model can only recall a small number of triplets, which harms its recall performance. Different from the NovelTagging model, our models apply the copy mechanism to find entities for a triplet, and a word can be copied many times when it needs to participate in multiple different triplets. Not surprisingly, our models recall more triplets and achieve higher recall values. Further experiments verify this.

4.5 Detailed Results on Different Sentence Types

To verify the ability of our models in handling the overlapping problem, we conduct further experiments on the NYT dataset.

Figure 4 shows the results of the NovelTagging, OneDecoder and MultiDecoder models on the Normal, EntityPairOverlap and SingleEntityOverlap classes. As we can see, our proposed models perform much better than the NovelTagging model on the EntityPairOverlap and SingleEntityOverlap classes. Specifically, our models achieve much higher performance on all metrics. Another observation is that the NovelTagging model achieves the best performance on the Normal class. This is because the NovelTagging model is designed to be more suitable for the Normal class, while our proposed models are more suitable for the triplet overlap issues. Furthermore, it is still difficult for our models to judge how many triplets are needed for the input sentence. As a result, there is a loss for our models on the Normal class. Nevertheless, the overall performance of the proposed models still outperforms NovelTagging. Moreover, we notice that the overall extraction performance on the EntityPairOverlap and SingleEntityOverlap classes is lower than that on the Normal class. This proves that extracting relational facts from the EntityPairOverlap and SingleEntityOverlap classes is much more challenging than from the Normal class.


Triplet count | NovelTagging (P / R / F1) | OneDecoder (P / R / F1) | MultiDecoder (P / R / F1)
1             | 0.777 / 0.697 / 0.735     | 0.645 / 0.698 / 0.671   | 0.636 / 0.699 / 0.666
2             | 0.471 / 0.191 / 0.272     | 0.572 / 0.488 / 0.526   | 0.624 / 0.551 / 0.586
3             | 0.296 / 0.074 / 0.118     | 0.581 / 0.434 / 0.497   | 0.586 / 0.468 / 0.520
4             | 0.464 / 0.083 / 0.141     | 0.546 / 0.440 / 0.487   | 0.584 / 0.495 / 0.536
>=5           | 0.295 / 0.038 / 0.068     | 0.316 / 0.149 / 0.203   | 0.409 / 0.237 / 0.300

Figure 5: Relation extraction from sentences that contain different numbers of triplets. We divide the sentences of the NYT test set into 5 subclasses; each subclass contains sentences that have 1, 2, 3, 4 or >= 5 triplets (bar-chart values transcribed above).

Model        | NYT   | WebNLG
OneDecoder   | 0.858 | 0.745
MultiDecoder | 0.862 | 0.821

Table 3: F1 values of entity generation.

We also compare the models' ability to extract relations from sentences that contain different numbers of triplets. We divide the sentences in the NYT test set into 5 subclasses; each subclass contains sentences that have 1, 2, 3, 4 or >= 5 triplets. The results are shown in Figure 5. When extracting relations from sentences that contain 1 triplet, the NovelTagging model achieves the best performance. However, when the number of triplets increases, the performance of the NovelTagging model decreases significantly. We can also observe the huge decrease in the recall value of the NovelTagging model. These experimental results demonstrate the ability of our models to handle multiple relation extraction.

4.6 OneDecoder vs. MultiDecoder

As shown in the previous experiments (Table 2, Figure 4 and Figure 5), our MultiDecoder model performs better than the OneDecoder model and the NovelTagging model.

Model        | NYT   | WebNLG
OneDecoder   | 0.874 | 0.759
MultiDecoder | 0.870 | 0.751

Table 4: F1 values of relation generation.

To find out why the MultiDecoder model performs better than the OneDecoder model, we analyze their abilities in entity generation and relation generation. The experimental results are shown in Table 3 and Table 4. We can observe that on both the NYT and WebNLG datasets, the two models have comparable abilities in relation generation. However, MultiDecoder performs better than OneDecoder when generating entities. We think this is because the MultiDecoder model uses a different decoder to generate each triplet, so the entity generation results can be more diverse.

5 Conclusions and Future Work

In this paper, we proposed an end2end neural model based on the Seq2Seq learning framework with copy mechanism for relational fact extraction. Our model can jointly extract relations and entities from sentences, especially when the triplets in the sentences are overlapped. Moreover, we analyzed the different overlap types and adopted two strategies for this issue: one unified decoder and multiple separated decoders. We conducted experiments on two public datasets to evaluate the effectiveness of our models. The experimental results show that our models outperform the baseline method significantly and can extract relational facts from all three classes.



This challenging task is far from being solved. Our future work will concentrate on how to improve the performance further. Another direction is to test our model on other NLP tasks like event extraction.

Acknowledgments

The authors thank Yi Chang from Huawei Tech. Ltd for his helpful discussions. This work is supported by the Natural Science Foundation of China (No. 61533018 and No. 61702512). This work is also supported in part by Beijing Unisound Information Technology Co., Ltd, and the Huawei Innovation Research Program of Huawei Tech. Ltd.

References

Yee Seng Chan and Dan Roth. 2011. Exploiting syntactico-semantic structures for relation extraction. In Proceedings of ACL, pages 551–560.

Junyoung Chung, Caglar Gulcehre, KyungHyun Cho, and Yoshua Bengio. 2014. Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint arXiv:1412.3555.

Li Dong and Mirella Lapata. 2016. Language to logical form with neural attention. In Proceedings of ACL, pages 33–43.

Claire Gardent, Anastasia Shimorina, Shashi Narayan, and Laura Perez-Beltrachini. 2017. Creating training corpora for NLG micro-planners. In Proceedings of ACL, pages 179–188.

Jiatao Gu, Zhengdong Lu, Hang Li, and Victor O.K. Li. 2016. Incorporating copying mechanism in sequence-to-sequence learning. In Proceedings of ACL, pages 1631–1640.

Pankaj Gupta, Hinrich Schütze, and Bernt Andrassy. 2016. Table filling multi-task recurrent neural network for joint entity and relation extraction. In Proceedings of COLING, pages 2537–2547.

Shizhu He, Cao Liu, Kang Liu, and Jun Zhao. 2017. Generating natural answers by incorporating copying and retrieving mechanisms in sequence-to-sequence learning. In Proceedings of ACL, pages 199–208.

Iris Hendrickx, Su Nam Kim, Zornitsa Kozareva, Preslav Nakov, Diarmuid Ó Séaghdha, Sebastian Padó, Marco Pennacchiotti, Lorenza Romano, and Stan Szpakowicz. 2010. SemEval-2010 task 8: Multi-way classification of semantic relations between pairs of nominals. In Proceedings of the 5th International Workshop on Semantic Evaluation, pages 33–38.

Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural Computation, 9(8):1735–1780.

Diederik P. Kingma and Jimmy Lei Ba. 2015. Adam: A method for stochastic optimization. In Proceedings of ICLR, pages 1–15.

Günter Klambauer, Thomas Unterthiner, Andreas Mayr, and Sepp Hochreiter. 2017. Self-normalizing neural networks. In Advances in NIPS, pages 971–980.

Qi Li and Heng Ji. 2014. Incremental joint extraction of entity mentions and relations. In Proceedings of ACL, pages 402–412.

Makoto Miwa and Mohit Bansal. 2016. End-to-end relation extraction using LSTMs on sequences and tree structures. In Proceedings of ACL, pages 1105–1116.

Makoto Miwa and Yutaka Sasaki. 2014. Modeling joint entity and relation extraction with table representation. In Proceedings of EMNLP, pages 1858–1869.

Sebastian Riedel, Limin Yao, and Andrew McCallum. 2010. Modeling relations and their mentions without labeled text. In Proceedings of ECML PKDD, pages 148–163.

Kun Xu, Yansong Feng, Songfang Huang, and Dongyan Zhao. 2015a. Semantic relation classification via convolutional neural networks with simple negative sampling. In Proceedings of EMNLP, pages 536–540.

Yan Xu, Lili Mou, Ge Li, Yunchuan Chen, Hao Peng, and Zhi Jin. 2015b. Classifying relations via long short term memory networks along shortest dependency paths. In Proceedings of EMNLP, pages 1785–1794.

Xiaofeng Yu and Wai Lam. 2010. Jointly identifying entities and extracting relations in encyclopedia text via a graphical model approach. In Proceedings of COLING, pages 1399–1407.

Dmitry Zelenko, Chinatsu Aone, and Anthony Richardella. 2003. Kernel methods for relation extraction. Journal of Machine Learning Research, 3:1083–1106.

Daojian Zeng, Kang Liu, Siwei Lai, Guangyou Zhou, and Jun Zhao. 2014. Relation classification via convolutional deep neural network. In Proceedings of COLING, pages 2335–2344.

Meishan Zhang, Yue Zhang, and Guohong Fu. 2017. End-to-end neural relation extraction with global optimization. In Proceedings of EMNLP, pages 1730–1740.

Suncong Zheng, Feng Wang, Hongyun Bao, Yuexing Hao, Peng Zhou, and Bo Xu. 2017. Joint extraction of entities and relations based on a novel tagging scheme. In Proceedings of ACL, pages 1227–1236.