


Open Relation Extraction for Chinese Noun Phrases

Chengyu Wang, Xiaofeng He, Member, IEEE, and Aoying Zhou, Member, IEEE

Abstract—Relation Extraction (RE) aims at harvesting relational facts from texts. A majority of existing research targets knowledge acquisition from sentences, where subject-verb-object structures are usually treated as signals of the existence of relations. In contrast, relational facts expressed within noun phrases are highly implicit. Previous works mostly rely on human-compiled assertions and textual patterns in English to address noun phrase-based RE. For Chinese, the corresponding task is non-trivial because Chinese is a highly analytic language with flexible expressions. Additionally, noun phrases tend to be incomplete in grammatical structure, with clear mentions of predicates often missing. In this article, we present an unsupervised Noun Phrase-based Open RE system for the Chinese language (NPORE), which employs a three-layer data-driven architecture. The system contains three components: the Modifier-sensitive Phrase Segmenter, the Candidate Relation Generator, and the Missing Relation Predicate Detector. It integrates a graph clique mining algorithm to chunk Chinese noun phrases, considering how relations are expressed. We further propose a probabilistic method with knowledge priors and a hypergraph-based random walk process to detect missing relation predicates. Experiments over Chinese Wikipedia show that NPORE outperforms the state of the art, extracting 55.2 percent more relations than the most competitive baseline with a comparable precision of 95.4 percent.

Index Terms—Open relation extraction, noun phrase segmentation, graph clique mining, hypergraph-based random walk


1 INTRODUCTION

1.1 Motivation

RELATION Extraction (RE) is one of the core NLP tasks, aiming at harvesting relational facts from free texts automatically. The extracted relations are essential for various applications such as knowledge base construction [1], [2], taxonomy learning [3], question answering [4], etc.

According to different task settings, RE can be addressed using a variety of machine learning paradigms, including supervised relation classification [5], distantly supervised RE over knowledge bases [6], pattern-based iterative relation bootstrapping [7], and Open Information Extraction (OIE) approaches, which do not require a collection of pre-defined relation types as input [8], [9]. These methods mainly deal with RE at the sentence level, determining the semantic relation between two entities within a single sentence. Recently, several approaches consider the global contexts of entities and model the global consistency of distant supervision, in order to extract relations across sentence boundaries at the corpus level [10], [11], [12].

A drawback of these approaches is that they pay little attention to semantic relations expressed by smaller semantic units. For example, we can extract the relation “(Donald Trump, is-descendant-of, Scottish)” from “Scottish American” describing Donald Trump. These noun phrases contain rich knowledge and are regarded as fine-grained representations of entities [13], [14]. However, harvesting such knowledge from noun phrases is non-trivial because they are extremely incomplete in syntactic structure. Relational expressions within noun phrases are highly implicit [15]. While the relation predicate “is-descendant-of” in the previous case can be easily inferred by humans, this predicate is omitted in texts and is difficult for machines to generate. To deal with this problem, noun phrase-based OIE systems have been proposed to extract relations from noun compounds [16], [17], [18]. These systems require a large number of human-compiled assertions and lexical patterns to identify relations. For instance, the pattern “the [...] of [...]” can be used to extract “(Donald Trump, is-president-of, United States)” from “the President of the United States, Donald Trump”.

Although there has been significant success for English, harvesting such relations from Chinese noun phrases still faces several challenges and is an emerging task for NLP. This is because Chinese is a highly analytic language, lacking explicit expressions to convey grammatical relations [19]. There are no word spaces, explicit tenses and voices, or singular/plural distinctions in Chinese. The circumstances in which semantic relations are expressed in Chinese noun phrases are thus more complicated. Additionally, based on linguistic research, properties of entities are more likely to be expressed by noun phrases than by verbal clauses [20]. For a more intuitive understanding, we illustrate relations extracted from two noun phrases describing Donald Trump:

Example 1. (American entrepreneur born in 1946)¹

• C. Wang is with the School of Software Engineering, East China Normal University, Shanghai 200062, China. E-mail: [email protected].

• X. He is with the School of Computer Science and Technology, East China Normal University, Shanghai 200062, China. E-mail: [email protected].

• A. Zhou is with the School of Data Science and Engineering, East China Normal University, Shanghai 200062, China. E-mail: [email protected].

Manuscript received 28 Dec. 2018; revised 31 Oct. 2019; accepted 11 Nov. 2019. (Corresponding author: Xiaofeng He.) Recommended for acceptance by J. Tang. Digital Object Identifier no. 10.1109/TKDE.2019.2953839

1. The Chinese noun phrases below are printed after the Chinese word segmentation process [21]. English words directly under Chinese characters refer to the literal translation. “的 (de)” is a Chinese auxiliary word, usually put at the end of a modifier.

IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 31, NO. X, XXXXX 2019 1

1041-4347 © 2019 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.


1946年 出生 的 美国 企业家
Year of 1946 Born de America Entrepreneur
(Modifier 1) (Modifier 2) (Head)

Extracted relations (English translation):
(Donald Trump, born-in, 1946)
(Donald Trump, has-nationality, American)

Example 2. (People originated from Queens, New York)

纽约市 皇后区 出身 人物
New York City Queens District Origin Person
(Modifier 1) (Head)

Extracted relation (English translation):
(Donald Trump, originated-from, Queens District of New York)

As seen, such Chinese noun phrases are usually in the form of one/many modifier(s) + head word, with prepositions omitted in a large proportion of cases. While head words are typically considered as hypernyms or topics of entities [22], [23], modifiers express non-taxonomic semantic relations of entities, either explicitly or implicitly. Different from traditional RE approaches, we observe that three unique challenges should be addressed for accurate RE from Chinese noun phrases:

Challenge 1. (Difficult to segment Chinese noun phrases into modifiers and head words) For English, boundaries between modifiers and head words can be identified by patterns [24], [25]. In contrast, there are no natural boundaries in Chinese noun phrases. Chinese word segmentation and NLP chunking methods (e.g., [21], [26]) cannot be applied to this task directly. A modifier (or even a complicated entity, see Example 2) may consist of multiple words. There is no standard, effective solution in NLP to this problem that avoids a large amount of manual work.

Challenge 2. (Unclear mappings from modifiers to semantic relations) Due to the lack of prepositions and attributive clauses in Chinese, a modifier is usually a combination of nouns and other words. It is unclear how to extract the relation predicate and the object from the modifier to generate relation triples.

Challenge 3. (Missing relation predicates in noun phrases) In many cases, relation predicates are non-existent in modifiers. In Example 1, the noun phrase does not explicitly express the relation predicate between America and Donald Trump. Humans can easily infer the relation predicate as “has-nationality” based on commonsense knowledge. In contrast, a specific mechanism should be designed for machines to learn the predicates automatically. In the literature, this is similar to the task of noun phrase interpretation in NLP [27]. However, our task is more challenging due to the complicated linguistic nature of the Chinese language.

1.2 Summary of Our Approach

We present an unsupervised Noun Phrase-based Open RE system for the Chinese language (NPORE). The input is a collection of entity-noun phrase pairs, where the noun phrases are semantically related to the corresponding entities. The system generates knowledge in the form of relation triples, describing facts about entities explicitly. A topically related corpus is also provided, treated as the background knowledge source. To avoid tedious human labeling, the NPORE system employs a three-layer data-driven architecture. The three major components are summarized as follows:²

Step 1. Modifier-sensitive Phrase Segmenter (MPS): It segments a Chinese noun phrase into one/many modifier(s) and one head word. To avoid the time-consuming human labeling process and to be self-adaptive to any domain, we propose an unsupervised graph clique mining algorithm to segment the noun phrases based on statistical measures and word embeddings. In particular, we propose two graph pruning strategies and an approximate algorithm for efficient graph clique detection.

For example, after the process of MPS, the segmentation results of the two noun phrases are as follows:

Example 1: 1946年出生的 美国 企业家
Born in 1946 America Entrepreneur
(Modifier 1) (Modifier 2) (Head)

Example 2: 纽约市皇后区出身 人物
Originated from Queens, New York Person
(Modifier 1) (Head)

Step 2. Candidate Relation Generator (CRG): This component generates full relations (subject-predicate-object triples) and partial relations (subject-object pairs with predicates missing) based on the results of MPS and the syntactic structures of noun phrases.

The sample outputs of CRG are shown below:

Example 1: (Donald Trump, born-in, 1946) [full relation]
(Donald Trump, ?, American) [partial relation]

Example 2: (Donald Trump, originated-from, Queens, New York) [full relation]

Step 3. Missing Relation Predicate Detector (MRPD): For partial relations, a probabilistic predicate detection approach is proposed. In particular, we employ Bayesian knowledge priors and a hypergraph-based random walk process to encode both contextual signals derived from the background text corpus and the commonsense knowledge of humans into the model.

In the experiments, we evaluate the NPORE system over datasets generated from Chinese Wikipedia categories. Overall, the number of extracted relation triples is 155.2 percent of that of the most competitive baseline (i.e., 55.2 percent more relations), with a comparable precision of 95.4 percent. We also evaluate various aspects of the system to draw convincing conclusions.

1.3 Contributions and Paper Organization

In summary, we make the following major contributions:

• We introduce an unsupervised RE system named Noun Phrase-based Open RE (NPORE). It employs a three-layer

2. Head words of noun phrases may express is-a or topic-of relations between entities and the noun phrases. This issue has been addressed via the hypernymy prediction task in abundant papers (e.g., [22], [23]) and summarized in [28]. Hence, it is not the focus of this work.



data-driven architecture to extract various relations from Chinese noun phrases.

• We propose a graph-based algorithm to chunk Chinese noun phrases into modifiers and head words. A probabilistic method (integrated with Bayesian priors and a hypergraph-based random walk process) is presented to detect missing relation predicates.

• We conduct extensive experiments over Chinese Wikipedia categories to evaluate NPORE. The results show that it outperforms state-of-the-art approaches.

The rest of this paper is organized as follows. Section 2 summarizes the related work. The detailed techniques of NPORE are described in Section 3. Experiments are presented in Section 4, with the conclusion drawn in Section 5.

2 RELATED WORK

In this section, we briefly overview recent advances in RE. Besides research on the general RE task, we specifically focus on noun phrase-based RE and commonsense RE. This is because our goal is to extract relations from Chinese noun phrases, which also requires commonsense reasoning to detect missing predicates (especially for spatial and temporal commonsense relations). Additionally, we discuss some special considerations for RE over Chinese texts.

2.1 General Relation Extraction

The task of RE has been extensively studied in the NLP community, aiming at harvesting relational facts from free texts automatically. A typical paradigm of RE is supervised relation classification, which classifies entity pairs into a finite, pre-defined set of relation types based on contextual information [5]. To reduce human labeling efforts, distant supervision has been proposed to use relational facts in knowledge bases as training data [6]. One disadvantage of these approaches is that the relation types of the RE systems need to be defined by humans in advance. OIE expands RE research into open domains, automatically identifying relation types and their corresponding relation triples in sentences [8], [29], [30]. Recently, deep learning has benefited RE to a large extent, mostly through the following techniques: i) deep reinforcement learning models optimize long-term rewards for the quality of extracted relations by treating relation extractors as agents [31]; ii) adversarial training techniques impose additional regularization effects on relation classification by training discriminators and relation extractors at the same time [32]; iii) attention mechanisms [33] improve the extraction of contextual features from sentences. For OIE, the encoder-decoder architecture has been deeply exploited [34]. Because the general RE task is not the major focus of this work, we do not elaborate here.

To improve the recall of RE, several works harvest relations beyond the single-sentence level. The intuition is that relations can be identified by considering more complicated sentence structures and the global contexts of entities, rather than a single sentence [10], [11], [12]. For example, Han and Sun [10] propose a global distant supervision model, which reduces the uncertainty of traditional distant supervision approaches by considering the global consistency of RE.

Su et al. [11] learn textual relation embeddings for distantly supervised RE. This work deals with the wrong labeling problem of distant supervision and models the global statistics of relations. For OIE systems, Zhu et al. [35] leverage global information in documents by adding global structure constraints to the relation extractors of OIE. These methods harvest general semantic relations effectively but are unable to deal with relations hidden in non-sentences, especially relations expressed by noun phrases.

2.2 Noun Phrase-Based Relation Extraction

A recent advance in OIE is to learn noun phrase-based relations. For example, Xavier and de Lima [16] harvest relations expressed in noun compounds based on noun phrase interpretation. RELNOUN [18] extends the RENOUN system [17], considering demonyms and relational compound nouns to improve noun-based OIE. Among all noun phrases, user-generated categories (especially Wikipedia categories) are highly informative, providing rich knowledge to characterize entities. To discover relations in Wikipedia categories, Nastase and Strube [36] propose a hybrid approach based on preposition patterns. Pasca [37] studies how to decompose names of Wikipedia categories into attribute-value pairs, using lexical patterns in English.

Another task aimed at extracting relations from noun phrases is noun phrase interpretation, which uses verbal relations to interpret the meanings of noun phrases. This task is closely related to MRPD in our system for finding the missing predicates. For example, the verbal relation “made-from” should be generated from the noun phrase “olive oil”, because “olive oil” is a kind of “oil” made from “olives”. In the literature, this is usually formulated as supervised classification, where a noun phrase is classified into an abstract verbal relation from a manually defined, fixed inventory [38]. However, a finite set of relations (or verbs) is insufficient to represent the semantics of noun phrases. For finer-grained relation representations, several works (e.g., [39], [40]) consider multiple paraphrases to express the semantics of noun phrases. In SemEval-2013 Task 4 [41], participants are allowed to use free paraphrases to represent the relations within noun phrases.

Compared to English, RE from Chinese noun phrases does not yield comparable performance due to the complicated linguistic nature of the language. ZORE [42] is a recent sentence-based OIE system for Chinese, using verb-based syntactic patterns to extract relations. Similar OIE systems include [43], [44], etc. However, there is limited success in RE from Chinese short texts. Our previous work [14], [45] proposes to learn multiple types of relations over Chinese Wikipedia categories, but relies on human work to define relation types and only deals with the most frequent patterns. This work improves on previous research by enabling unsupervised open-domain commonsense RE from Chinese noun phrases.

2.3 Commonsense Relation Extraction

Commonsense RE is fundamentally different from general RE, because commonsense relations are rarely expressed in texts. In the early days of artificial intelligence, commonsense knowledge bases were mostly constructed by experts or via Web-scale crowd-sourcing, such as the CYC project [46], ConceptNet [47], etc. The automatic acquisition of commonsense



relations is primarily based on pattern-based approaches. In the literature, WebChild [15] employs textual patterns to extract several types of commonsense relations from Web texts, including has-shape, has-taste, evokes-emotion, etc. The research of Narisawa et al. [48] focuses on harvesting numerical commonsense facts. They extract numerical expressions and their contexts from the Web, and propose distributional and pattern-based models to predict whether a given value is large, small, or normal based on the context. The extraction of spatial commonsense relations is presented in [49], based on implicit spatial templates. Xu et al. [50] specifically focus on the locate-near commonsense relation. They propose a sentence-level relation classifier to predict whether two entities are close to each other, and aggregate the scores of entity pairs from a large corpus.

As seen, all the above works on short-text RE pay attention to only one or a few types of relations. It remains a challenge to extract a large number of relations from Chinese noun phrases without pre-defined relation types. Our work aims at solving this problem with minimal human supervision.

3 THE NPORE SYSTEM

In this section, we begin with a high-level overview of the NPORE system, introducing important notations and concepts. Next, we elaborate on the algorithms and techniques of the three components of NPORE in detail.

3.1 Overview of NPORE Components

The input of NPORE is a collection of entity-noun phrase pairs, denoted as {(e, p)}, where the noun phrase p describes the entity e. For example, we have e = “唐纳德·特朗普 (Donald Trump)” and p = “1946年出生的美国企业家 (American entrepreneur born in 1946)”. A topically related text corpus D is also provided as the background knowledge source. The three modules of NPORE are introduced as follows. The general framework of the NPORE system is illustrated in Fig. 1.

3.1.1 Modifier-Sensitive Phrase Segmenter (MPS)

We first perform Chinese word segmentation over the noun phrase p. The result is denoted as an ordered list ws(p) = {w1, w2, ..., w_|ws(p)|}, where wi ∈ ws(p) is a segmented Chinese word in p. The goal of MPS is to generate the modifier-sensitive phrase segmentation of p, i.e., ps(p) = {q1, q2, ..., q_|ps(p)|}, where qi is a modifier/head word in p, consisting of one or several words in ws(p).

Based on the observation discussed in the introduction and previous research [51], we follow the assumption that a noun phrase consists of one/many modifiers and a head word. We treat q_|ps(p)| as the head word of p and qi (1 ≤ i ≤ |ps(p)| − 1) as a modifier of p. In order to generate the segmentation results, we construct an N-gram Segmentation Graph (NSG) Gp for each noun phrase p and generate the result ps(p) based on the structure of Gp.
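As a concrete illustration of this convention, the following minimal sketch splits a segmentation result into modifiers and a head word, using the segments of Example 1. The function name `split_modifiers_head` is our own, not the paper's:

```python
# Minimal sketch of the modifier/head-word convention described above:
# the last segment of ps(p) is treated as the head word, all preceding
# segments as modifiers. The function name is ours, not from the paper.

def split_modifiers_head(ps):
    """ps: segmentation result [q1, ..., q_|ps(p)|] of a noun phrase."""
    return ps[:-1], ps[-1]

# Segments of Example 1: "1946年出生的" (born in 1946), "美国" (America),
# and the head word "企业家" (entrepreneur).
mods, head = split_modifiers_head(["1946年出生的", "美国", "企业家"])
```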

3.1.2 Candidate Relation Generator (CRG)

After MPS, we generate entity-modifier pairs {(e, qi)}, where each qi ∈ ps(p) (1 ≤ i ≤ |ps(p)| − 1). Denote by R(p) the collection of candidate relations extracted from the pair (e, p). For each entity-modifier pair (e, qi), if a candidate predicate can be detected from qi or the entire sequence ws(p), a candidate full relation r(e, qi) is extracted and added to R(p). Let rv(e, qi) and ro(e, qi) be the relation predicate and object w.r.t. the relation r(e, qi), respectively. If the relation predicate is missing or cannot be detected, a candidate partial relation r̃(e, qi) is derived and added to R(p). Its relation object is denoted as r̃o(e, qi).
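The branching between full and partial relations can be sketched as follows. The tuple representation and the `generate_candidates` helper are our own illustrative choices; predicate detection itself is assumed to be given per modifier:

```python
# Illustrative sketch of CRG's output bookkeeping (not the paper's code):
# a modifier with a detected predicate yields a full relation triple,
# while a modifier without one yields a partial relation whose predicate
# (None here) is left for MRPD to fill in later.

def generate_candidates(entity, analyzed_modifiers):
    """analyzed_modifiers: list of (predicate_or_None, object) pairs."""
    full, partial = [], []
    for pred, obj in analyzed_modifiers:
        if pred is not None:
            full.append((entity, pred, obj))      # full relation r(e, qi)
        else:
            partial.append((entity, None, obj))   # partial relation r~(e, qi)
    return full, partial

# Example 1 from the introduction: one detected predicate, one missing.
full, partial = generate_candidates(
    "Donald Trump", [("born-in", "1946"), (None, "American")])
```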

3.1.3 Missing Relation Predicate Detector (MRPD)

In this step, we employ a probabilistic predicate detection algorithm with knowledge priors and a hypergraph-based random walk process to detect the proper relation predicates. A collection of relation predicates V is first generated. After that, the prior model Pr(v) and the data likelihood model Pr(r̃(e, qi) | v) are trained in an unsupervised manner, where v ∈ V. In particular, we construct a Predicate-based Hypergraph Network (PHN) H(R, V) to approximate Pr(r̃(e, qi) | v) based on a random walk process. The most probable relation predicate r̃_v*(e, qi) w.r.t. the partial relation r̃(e, qi) is generated via Bayesian inference over Pr(v) and Pr(r̃(e, qi) | v) for all v ∈ V.

Finally, for both full relations and partial relations with the predicate detected, we compute confidence scores conf(r(e, qi)) or conf(r̃(e, qi)) to quantify the possibility that the relation triples are correct. Relation triples with low confidence scores are filtered out.

Important notations are summarized in Table 1.
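The Bayesian inference step above amounts to v* = argmax over v of Pr(v) · Pr(r̃(e, qi) | v), which can be sketched in a few lines. All numerical scores below are hypothetical placeholders; in NPORE the prior comes from knowledge priors and the likelihood from the hypergraph-based random walk:

```python
# Toy sketch of MRPD's predicate selection: v* = argmax_v Pr(v) * Pr(r~ | v).
# All numbers are made-up placeholders; NPORE estimates the prior from
# knowledge priors and the likelihood via a hypergraph-based random walk.

def detect_predicate(prior, likelihood):
    """Return the predicate maximizing the unnormalized posterior."""
    return max(prior, key=lambda v: prior[v] * likelihood[v])

# Hypothetical scores for the partial relation (Donald Trump, ?, American).
prior = {"born-in": 0.40, "has-nationality": 0.35, "originated-from": 0.25}
likelihood = {"born-in": 0.05, "has-nationality": 0.60, "originated-from": 0.10}
best = detect_predicate(prior, likelihood)
```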

3.2 Modifier-Sensitive Phrase Segmenter

In this section, we introduce the graph mining-based approach for MPS. To ensure that our system is unsupervised and can be adapted to any domain, this algorithm is fully data-driven and does not require any human-labeled data.

3.2.1 Graph Construction

The first step of MPS is Chinese word segmentation, separating a noun phrase p into a sequence of words, i.e., ws(p) = {w1, w2, ..., w_|ws(p)|}. In this work, we treat Chinese word segmentation and modifier-sensitive phrase segmentation as two separate tasks for two reasons: i) Chinese word segmentation techniques have relatively high performance [21]; ii) the separation of the two tasks lowers the computational complexity of MPS.

A natural gap between Chinese word segmentation and MPS is that the boundaries of modifiers and head words in Chinese are semantically implicit. Consider the following noun phrase in both English and Chinese:

Fig. 1. The general framework of the NPORE system.



Example 3. (American entrepreneur in the 21st century)

English: American (pre-modifier, adjective) · entrepreneur (head word, noun) · in the 21st century (post-modifier, prepositional phrase)

Chinese: 21世纪 (21st century; modifier, two nouns) · 美国 (America; modifier, noun) · 企业家 (entrepreneur; head word, noun)

As seen, the modifiers in English noun phrases can be easily separated by POS-based rules (a common practice in the literature, e.g., [24]). For Chinese, there is no clear separation between modifiers and head words. The lack of variation in lexical units means that multiple consecutive nouns can be used to modify the head words, further increasing the difficulty of MPS.

In this work, we propose a graph-based approach to address this problem. For each segmented noun phrase ws(p), we construct a graph model to represent all possible configurations of phrase segmentation, and select the best configuration as the result of MPS. Define n as an n-gram factor, typically set to a small, positive integer. We introduce the concept of the N-gram Segmentation Graph:

Definition 1 (N-gram Segmentation Graph). An NSG G_p(M, E, W) w.r.t. noun phrase p is an undirected graph with edge weights, where M and E denote the collections of nodes and edges, respectively. W is an |E|-dimensional weight vector that assigns a weight α_{i,j} ∈ [0, 1] to each (m_i, m_j) ∈ E.

In the graph G_p(M, E, W), each node m ∈ M is a word sequence derived from ws(p). The word sequences include uni-grams, bi-grams, up to n-grams. In Fig. 2a, given ws(p) = {w_1, w_2, w_3, w_4} and n = 3, we have M = {w_1, w_2, w_3, w_4, w_1w_2, w_2w_3, w_3w_4, w_1w_2w_3, w_2w_3w_4}. For each m_i, m_j ∈ M, we constrain that (m_i, m_j) ∈ E iff m_i ∩ m_j = ∅.^3 This is because in a segmented noun phrase, any two elements must be mutually exclusive. We can see that each maximal clique in the NSG G_p(M, E, W) represents a configuration of the phrase segmentation of ws(p). For instance, if the maximal clique {w_1, w_2w_3, w_4} is selected, the phrase segmentation of ws(p) is m_1 = w_1, m_2 = w_2w_3, m_3 = w_4. For rigorousness, we prove this claim as follows:

Theorem 1. A maximal clique in G_p is equivalent to a configuration of the phrase segmentation of p.

Proof Sketch. Based on the definition of modifier-sensitive phrase segmentation, a valid phrase segmentation of p (i.e., ps(p) = {q_1, q_2, ..., q_{|ps(p)|}}) forms a partition of ws(p). This requires two conditions: i) ∀ q_i, q_j ∈ ps(p), q_i ∩ q_j = ∅; and ii) ∪_{q_i ∈ ps(p)} q_i = ws(p).

Consider a maximal clique M′ in G_p. Because ∀ m_i, m_j ∈ M with (m_i, m_j) ∈ E we have m_i ∩ m_j = ∅, and M′ ⊆ M, it follows that ∀ m_i, m_j ∈ M′, m_i ∩ m_j = ∅. By mapping each m_i ∈ M′ to its corresponding segment q_i ∈ ps(p), we have ∀ q_i, q_j ∈ ps(p), q_i ∩ q_j = ∅. Next, we prove the satisfaction of Condition ii) by contradiction. Assume the maximal clique M′ does not ensure ∪_{q_i ∈ ps(p)} q_i = ws(p). Then there must be one node m*_i not in M′ such that m*_i ∉ ∪_{m_i ∈ M′} m_i. This contradicts the definition of a maximal clique, because adding m*_i to M′ also forms a clique. Therefore, the assumption is not valid.

For the definition of the weights W, we propose a hybrid approach to encode both statistical and distributional knowledge into the model. If m_i and m_j are two consecutive n-grams in ws(p) (e.g., m_i = w_1, m_j = w_2w_3), the statistical score w_s(i, j) and the distributional score w_d(i, j) are defined as follows. w_s(i, j) is a variant of Normalized Pointwise Mutual Information (NPMI), in the range of [0, 1].

TABLE 1
Important Notations

(e, p)             An entity-noun phrase pair such that the noun phrase p describes the entity e
D                  A large background text corpus
ws(p)              Chinese word segmentation result of phrase p
ps(p)              Modifier-sensitive segmentation result of phrase p
(e, q_i)           An entity-modifier pair where q_i ∈ ps(p)
G_p(M, E, W)       An N-gram Segmentation Graph (NSG) w.r.t. phrase p
vec(m_i)           The embedding vector of n-gram m_i
P_i / N_i          A positive/negative constraint over (w_i, w_{i+1})
M*                 The maximum edge weight clique in the NSG G_p
R(e, p)            The collection of candidate relations generated from the pair (e, p)
r(e, q_i)          A full relation generated from the pair (e, q_i)
r̃(e, q_i)          A partial relation generated from the pair (e, q_i)
r_v(e, q_i)        The relation predicate of r(e, q_i)
r_o(e, q_i)        The relation object of r(e, q_i)
V                  A collection of relation predicates
Pr(v)              The prior model of relation predicates, where v ∈ V
Pr(r̃(e, q_i)|v)    The data likelihood model, where v ∈ V
H(R, V)            The Predicate-based Hypergraph Network (PHN)
conf(r(e, q_i))    The confidence score of the full relation r(e, q_i)
conf(r̃(e, q_i))    The confidence score of the partial relation r̃(e, q_i)

Fig. 2. The graph structure of the NSG with and without positive and negative constraints. For simplicity, edge weights of all graphs are omitted. (In this example, we have ws(p) = {w_1, w_2, w_3, w_4} and n = 3.)

3. Without ambiguity, we also use the notation m_i to represent the words of the corresponding n-gram. Hence, m_i ∩ m_j = ∅ means that the corresponding n-grams of m_i and m_j do not share overlapping sequences.

WANG ET AL.: OPEN RELATION EXTRACTION FOR CHINESE NOUN PHRASES 5


w_s(i, j) = 1/2 − PMI(i, j)/(2h(i, j)) = 1 − log(Pr(m_i)Pr(m_j)) / (2 log Pr(m_i, m_j)),

where PMI(i, j) and h(i, j) are the PMI score and the self-information of the n-gram pair (m_i, m_j), respectively. Pr(m_i), Pr(m_j) and Pr(m_i, m_j) are probabilities estimated using any language model.

The distributional score w_d(i, j) is inspired by compositionality analysis in computational linguistics. We define w_d(i, j) based on a variant of the measure in [52]:

w_d(i, j) = (1/2)(1 − cos(vec(m_i m_j), vec(m_i + m_j))),

where vec(m_i m_j) is the compound embedding of m_i and m_j, and vec(m_i + m_j) is the normalized sum of the word embeddings of m_i and m_j taken separately, i.e., vec(m_i + m_j) = vec(m_i)/‖vec(m_i)‖ + vec(m_j)/‖vec(m_j)‖. If m_i and m_j are highly indecomposable, the individual contexts of m_i and m_j should be significantly different from the context of the m_i m_j compound; hence, vec(m_i m_j) and vec(m_i + m_j) are dissimilar. We employ the compositionality score in this work because it leverages the low-dimensional representations of terms.

As seen, if m_i and m_j are highly decomposable, w_s(i, j) and w_d(i, j) will be close to 1. This is a strong signal that m_i and m_j should be grouped into different modifiers/head words. α_{i,j} is computed by combining the two scores:

α_{i,j} = γ w_s(i, j) + (1 − γ) w_d(i, j),    (1)

where γ ∈ (0, 1) is a pre-defined hyper-parameter.

Remarks. For simplicity, let k = |ws(p)|. Recall that n is the n-gram factor (n ≤ k). It is trivial to see that at least ⌈k/n⌉ segments are required. Hence, the total number of possible segmentation configurations Δ is derived as

Δ = Σ_{i=⌈k/n⌉}^{k−1} C(k−1, i) = 2^{k−1} − Σ_{i=0}^{⌈k/n⌉−1} C(k−1, i),

where C(k, i) = k!/(i!(k−i)!). Therefore, in the worst case, it takes O(2^k) time to find the best segmentation by a brute-force search of all maximal cliques.

Although in real applications n and k are small integers, finding the optimal segmentation result could be computationally expensive. In this work, we propose two techniques to minimize the computation cost: i) two graph pruning strategies to reduce the graph size; and ii) an approximate algorithm to detect the proper maximal clique (i.e., the segmentation result).
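For concreteness, the edge weights of Eq. (1) can be sketched in Python as follows. This is a minimal illustration under our own naming, not the system's implementation: the n-gram probabilities and embedding vectors are assumed to come from a pre-trained language model and word embeddings, and the toy inputs are hypothetical.

```python
import math

def w_s(p_i, p_j, p_ij):
    """Statistical score: a [0,1] NPMI variant that is high when the two
    n-grams are weakly associated (i.e., decomposable)."""
    pmi = math.log(p_ij / (p_i * p_j))
    h = -math.log(p_ij)  # self-information of the joint occurrence
    return 0.5 - pmi / (2.0 * h)

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def w_d(vec_compound, vec_i, vec_j):
    """Distributional score: compares the compound embedding of m_i m_j
    with the normalized sum of the two part embeddings."""
    ni = math.sqrt(sum(x * x for x in vec_i))
    nj = math.sqrt(sum(x * x for x in vec_j))
    norm_sum = [a / ni + b / nj for a, b in zip(vec_i, vec_j)]
    return 0.5 * (1.0 - cosine(vec_compound, norm_sum))

def alpha(p_i, p_j, p_ij, vec_compound, vec_i, vec_j, gamma=0.5):
    """Edge weight of Eq. (1): convex combination of the two scores."""
    return gamma * w_s(p_i, p_j, p_ij) + (1 - gamma) * w_d(vec_compound, vec_i, vec_j)
```

For independent n-grams (Pr(m_i, m_j) = Pr(m_i)Pr(m_j)), the PMI term vanishes and w_s reduces to 1/2, its "no evidence" midpoint.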

3.2.2 Graph Pruning Strategies

We introduce two types of constraints based on linguistic rules to reduce the NSG size and improve the accuracy of MPS. The concept of a positive constraint is defined as:

Definition 2 (Positive Constraint). A positive constraint P_i is defined over a consecutive word pair (w_i, w_{i+1}) (w_i ∈ ws(p), w_{i+1} ∈ ws(p)) such that the two words w_i and w_{i+1} must be segmented into the same phrase.

Fig. 2b illustrates the NSG structure after adding a positive constraint over (w_3, w_4). The other type of constraint is the negative constraint, defined as follows:

Definition 3 (Negative Constraint). A negative constraint N_i is defined over a consecutive word pair (w_i, w_{i+1}) (w_i ∈ ws(p), w_{i+1} ∈ ws(p)) such that the two words w_i and w_{i+1} must not be segmented into the same phrase.

The NSG structure with a negative constraint over (w_1, w_2) is shown in Fig. 2c. In this example, compared with the original graph, all nodes containing the bi-gram w_1w_2 are removed. The combination of both constraints is performed by calculating the intersection of the two graphs, as illustrated in Fig. 2d.

Remarks. We analyze to what degree the two types of constraints can reduce the graph size. For the original graph, it is trivial to see that |M| = k + (k−1) + ⋯ + (k−n+1) = nk − (1/2)n(n−1).

Define F as the collection of all words associated with at least one positive constraint. For example, we have F = {w_1, w_2, w_3, w_5, w_6} if (w_1, w_2), (w_2, w_3) and (w_5, w_6) match the constraints. The usage of positive constraints reduces the number of nodes to |M| − |F| by removing the corresponding |F| uni-grams. The situations of negative constraints are more complicated, depending on the positions of words and the values of n and k. For a negative constraint N_i = (w_i, w_{i+1}), if i = 1 or i + 1 = k, it removes n − 1 nodes from the graph, which is the worst case. The best case is reached when i + 1 ≥ n and i > k − n, with the number of removed nodes being Σ_{j=1}^{n−1} j = (1/2)n(n−1). Let c be the number of negative constraints used. The number of nodes in the NSG is then loosely bounded by [nk − (1/2)n(n−1) − |F| − (c/2)n(n−1), nk − (1/2)n(n−1) − |F| − c(n−1)].

Tighter bounds can be achieved by discussing all cases in detail, which is beyond the scope of this paper. As seen in Fig. 2, the number of nodes of the NSG is reduced from 9 to 4 by applying only two constraints. Correspondingly, the number of edges is reduced from 14 to 4.

Algorithm 1. NSG Construction with Pruning Strategies
Input: Word segmentation result ws(p) of phrase p, n-gram factor n, positive constraints {P_i}, negative constraints {N_i}.
Output: Pruned NSG G_p(M, E, W).
1: Initialize an empty NSG G_p(M, E, W);
2: // Handling positive constraints
3: Construct the word collection F w.r.t. the positive constraints {P_i};
4: for each w_i ∈ ws(p) (i < |ws(p)|) do
5:   if w_i ∉ F then
6:     Add the node w_i to M;
7:   end if
8: end for
9: // Handling negative constraints
10: for j = 2 to n do
11:   for each w_i ∈ ws(p) (i < |ws(p)| − j + 1) do
12:     if the word pairs in w_i w_{i+1} ⋯ w_{i+j−1} do not violate any negative constraint in {N_i} then
13:       Add the node w_i w_{i+1} ⋯ w_{i+j−1} to M;
14:       Add the corresponding edges w.r.t. w_i w_{i+1} ⋯ w_{i+j−1} to E;
15:       Compute the weights of the edges by Eq. (1);
16:     end if
17:   end for
18: end for
19: return Pruned NSG G_p(M, E, W).
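A compact sketch of this construction is given below, under the assumption that nodes are represented as (start, end) spans over ws(p); the function names and the constant default weight are our own illustrations, not the paper's implementation.

```python
from itertools import combinations

def build_nsg(words, n, pos_words=frozenset(), neg_pairs=frozenset(),
              weight=lambda mi, mj: 1.0):
    """Sketch of Algorithm 1: build a pruned N-gram Segmentation Graph.

    words:     the Chinese word segmentation result ws(p)
    pos_words: F, words covered by a positive constraint (their uni-gram
               nodes are dropped)
    neg_pairs: consecutive pairs (w_i, w_{i+1}) that must not share a
               segment; any n-gram containing such a pair is pruned
    """
    k = len(words)
    nodes = []
    for j in range(1, n + 1):                  # uni-grams up to n-grams
        for i in range(0, k - j + 1):
            if j == 1 and words[i] in pos_words:
                continue                       # positive-constraint pruning
            if any((words[t], words[t + 1]) in neg_pairs
                   for t in range(i, i + j - 1)):
                continue                       # negative-constraint pruning
            nodes.append((i, i + j))
    # connect two nodes iff their word spans do not overlap
    edges = {(a, b): weight(a, b)
             for a, b in combinations(nodes, 2)
             if a[1] <= b[0] or b[1] <= a[0]}
    return nodes, edges
```

On the four-word example of Fig. 2 with n = 3, the unconstrained graph has the nine nodes listed in Section 3.2.1, and a negative constraint on (w_1, w_2) removes the two nodes containing that bi-gram, as in Fig. 2c.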



Algorithm 1 presents the detailed procedure to construct the NSG with the pruning strategies applied. Before the algorithm adds the uni-grams w_i ∈ ws(p) to the NSG G_p, the word collection F is constructed by checking all the positive constraints. If w_i ∈ F, w_i does not need to be added to the graph. The negative constraints take effect when bi-grams, tri-grams, up to n-grams in ws(p) are added as nodes. If any pair of consecutive words in w_i w_{i+1} ⋯ w_{i+j−1} violates a negative constraint (j > 1), the node w_i w_{i+1} ⋯ w_{i+j−1} is pruned in advance. As seen, by applying the two pruning strategies during the graph construction process, the system does not need to construct the original NSG fully, which saves computational resources.

Because our work specifically focuses on the Chinese language, we design three positive constraints and one negative constraint for Chinese, shown in Table 2. For example, the auxiliary word "的 (de)" is an important signal indicating the end of a modifier. Hence, the noun phrase should be segmented after the word "的 (de)". Our method is flexible in that it can be extended to other languages by designing language-specific rules.
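The rules of Table 2 can be sketched as a simple pass over consecutive word pairs. The POS tag names below follow the table; the function name and the exact tag set of the upstream tagger are assumptions for illustration.

```python
def derive_constraints(words, pos_tags):
    """Sketch of the Table 2 rules: derive positive/negative constraints
    for consecutive word pairs (w_i, w_{i+1}) from POS tags."""
    pos_constraints, neg_constraints = set(), set()
    for i in range(len(words) - 1):
        pair = (words[i], words[i + 1])
        if pos_tags[i] == "VERB" and pos_tags[i + 1] == "PREP":
            pos_constraints.add(pair)      # Positive Condition 1
        if pos_tags[i] == "CONJ" or pos_tags[i + 1] == "CONJ":
            pos_constraints.add(pair)      # Positive Condition 2
        if words[i + 1] == "的":
            pos_constraints.add(pair)      # Positive Condition 3: keep 的 attached
        if words[i] == "的":
            neg_constraints.add(pair)      # Negative Condition 1: segment after 的
    return pos_constraints, neg_constraints
```

For "美国 的 企业家", the pair (美国, 的) becomes a positive constraint and (的, 企业家) a negative one, so the segmentation boundary falls right after 的.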

3.2.3 Approximate Algorithm for MEWC

We select the optimal maximal clique as the best segmentation result over the pruned NSG G_p = (M, E, W). In this work, this problem is modeled as detecting the Maximum Edge Weight Clique (MEWC) M′ among all maximal cliques in G_p (M′ ⊆ M). Denote G′_p(M′, E′) as the subgraph of G_p w.r.t. the clique M′. Formally, MEWC is defined as:

Definition 4 (MEWC Problem). The optimization objective of the Maximum Edge Weight Clique problem is:

max Σ_{(m_i, m_j) ∈ E′} α_{i,j}
s.t. E′ ⊆ E; ∀ m_i ∈ M′, ∀ m_j ∈ M′, (m_i, m_j) ∈ E′.

This problem was proven to be NP-hard by Alidaee et al. [53]. In our previous work, we presented a Monte Carlo-based approximate algorithm to solve this problem, suitable for detecting MEWCs in word similarity graphs [14], [45]. This algorithm is highly efficient for dense, complete graphs. However, NSGs, especially after constraint-based pruning, tend to be sparse in structure. Directly applying this method to such graphs may lead to the detection of multiple cliques in a graph, rather than one clique with the largest sum of edge weights. This is because the algorithm greedily selects edges with large weights without checking whether the selected edges form a single clique. An example of MEWC and its counter-example are shown in Fig. 3.

In this work, we improve the MEWC algorithm for sparse graphs, as shown in Algorithm 2. It starts with the selection of an edge (m_i, m_j) ∈ E with probability ∝ α_{i,j}. Let G′_p = (M′, E′) be an initial graph where M′ = {m_i, m_j} and E′ = {(m_i, m_j)}. N(M′) is the neighboring node set of M′ in G_p. For each m_i ∈ N(M′), the algorithm checks whether M′ ∪ {m_i} forms a clique and denotes the nodes that satisfy the criterion as the candidate node set Can(M′). An iterative process samples m_i from Can(M′) with probability ∝ Σ_{m_j ∈ M′} α_{i,j}, and adds m_i and its corresponding edges to G′_p. The node collections N(M′) and Can(M′) are updated in each iteration. The algorithm continues until no edges can be selected, resulting in M′ as the MEWC of the NSG G_p.

Algorithm 2. Improved Algorithm for MEWC
Input: Pruned NSG G_p = (M, E, W).
Output: MEWC M′.
1: Sample an edge (m_i, m_j) from E with prob. ∝ α_{i,j};
2: Initialize G′ = (M′, E′) with M′ = {m_i, m_j}, E′ = {(m_i, m_j)};
3: Compute the neighbor node set N(M′);
4: Generate the candidate node set Can(M′) ⊆ N(M′);
5: while Can(M′) ≠ ∅ do
6:   Sample m_i from Can(M′) with prob. ∝ Σ_{m_j ∈ M′} α_{i,j};
7:   Add m_i and the corresponding edges to G′;
8:   Update N(M′) and Can(M′);
9: end while
10: return MEWC M′.

The worst-case runtime complexity of this algorithm is O(|M|²|E|) using hash-maps as the graph implementation, slightly larger than [14] (i.e., O(|E|²)). However, the increase in complexity does not affect efficiency much because, in most cases, a pruned NSG contains fewer than ten nodes. We run this algorithm multiple times and denote the collection of detected cliques as C = {M′}. The clique M* ∈ C is selected to form the final segmentation result:

M* = argmax_{M′ ∈ C} [Σ_{(m_i, m_j) ∈ M′} α_{i,j}] / log(1 + β|M′|),    (2)

where β > 0 is a scaling factor. This technique favors smaller cliques. As smaller cliques contain fewer segments, it avoids segmenting noun phrases into too many short, semantically incomplete phrases.

For a better understanding of MPS, we summarize the high-level procedure in Algorithm 3.

TABLE 2
Positive and Negative Constraints That We Designed for the Chinese Language

Positive constraints w.r.t. w_i and w_{i+1}:
  Condition 1: POS(w_i) = VERB and POS(w_{i+1}) = PREP
  Condition 2: POS(w_i) = CONJ or POS(w_{i+1}) = CONJ
  Condition 3: w_{i+1} = "的 (de)"

Negative constraint w.r.t. w_i and w_{i+1}:
  Condition 1: w_i = "的 (de)"

POS(w_i) is the Part-of-Speech tag of word w_i.

Fig. 3. The detected MEWC and a counter-example. The cliques and their corresponding edges are printed in bold.



Algorithm 3. High-Level Algorithm of MPS
Input: Chinese noun phrase p, maximum iteration number max.
Output: Modifier-sensitive phrase segmentation result ps(p).
1: Generate the word segmentation result ws(p) of phrase p via Chinese word segmentation;
2: Construct a pruned NSG G_p(M, E, W) based on ws(p) by Algorithm 1;
3: Initialize the collection of cliques C = ∅;
4: for each iteration i = 1 to max do
5:   Detect the MEWC M′ by Algorithm 2;
6:   Update C = C ∪ {M′};
7: end for
8: Select the best clique M* from C by Eq. (2);
9: Generate the segmentation result ps(p) based on M*;
10: return Modifier-sensitive phrase segmentation result ps(p).
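The clique sampling of Algorithm 2 and the size-penalized selection of Eq. (2) can be sketched together as follows. This is a toy illustration under our own graph encoding (nodes as hashable labels, edges as a weight dictionary); it is not the authors' implementation.

```python
import math
import random

def sample_clique(nodes, edges, rng):
    """One run of Algorithm 2: grow a clique from a weight-biased seed edge."""
    pairs = list(edges)
    clique = set(rng.choices(pairs, weights=[edges[p] for p in pairs], k=1)[0])
    while True:
        # candidates: nodes adjacent (in E) to every current clique member
        cand = [m for m in nodes if m not in clique and
                all((m, c) in edges or (c, m) in edges for c in clique)]
        if not cand:
            return clique
        w = [sum(edges.get((m, c), edges.get((c, m), 0.0)) for c in clique)
             for m in cand]
        clique.add(rng.choices(cand, weights=w, k=1)[0])

def best_clique(nodes, edges, runs=20, beta=1.0, seed=0):
    """Eq. (2): over several runs, keep the clique with the largest
    size-penalized total edge weight."""
    rng = random.Random(seed)
    def score(c):
        total = sum(w for (a, b), w in edges.items() if a in c and b in c)
        return total / math.log(1 + beta * len(c))
    cliques = [frozenset(sample_clique(nodes, edges, rng)) for _ in range(runs)]
    return max(cliques, key=score)
```

On a sparse graph where a heavy triangle coexists with a light pendant edge, repeated restarts make the heavy triangle the selected segmentation with overwhelming probability.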

3.3 Candidate Relation Generation

As discussed earlier, MPS is able to extract modifiers from Chinese noun phrases, which provide useful information about entities. However, it is still unclear how the modifiers can be transformed into relation predicates and objects.

In the CRG component, for each entity-segmented noun phrase pair (e, ps(p)), let R(e, p) be the collection of candidate relations derived from the modifiers in ps(p). For each modifier q_i w.r.t. an entity e (i < |ps(p)|), if q_i does not contain a specific named entity other than e, it is likely that this modifier does not express a relational fact. Hence, we simply discard it. Otherwise, it is probable that a relational fact can be derived from the entity e and the modifier q_i.

To generate candidate relations, it is vital to identify the relation predicates and objects from the modifiers. For example, given "1946年出生的 (Born in 1946)" w.r.t. Donald Trump, CRG aims at detecting "出生 (Born in)" as the relation predicate and "1946年 (1946)" as the object. Hence, the full relation "(Donald Trump, born-in, 1946)" can be extracted. In Chinese, relation expressions are more irregular than in English. Based on the syntactic structure of q_i, the operations of CRG can be divided into three cases, elaborated as follows, with four examples shown in Fig. 4:

Case i). If q_i is a verbal clause, the verb and the object in q_i can be directly extracted as the relation predicate and object (i.e., r_v(e, q_i) and r_o(e, q_i)) of the full relation r(e, q_i). The corresponding objects of verbs are determined by dependency parsing [54].

Case ii). In a few cases, the verb and the object are not clustered into one phrase q_i. This is because MPS is fully unsupervised, with no pre-defined number of modifiers. Here, we propose a cross-modifier relation generation technique. If a named entity exists in q_i but no verbs are found, we further search q_{i−1} and q_{i+1}. If q_{i−1} or q_{i+1} is also not a complete verbal clause and contains a verb for the entity based on dependency parsing, the relation triple r(e, q_i) can still be extracted, together with the predicate r_v(e, q_i) and the object r_o(e, q_i).

Case iii). If no verbs are detected for the entity, it means that i) the verbal relation is expressed implicitly, or ii) an error occurred in MPS. In this case, we extract a partial relation, denoted as r̃(e, q_i), where q_i contains an entity as the relation object r̃_o(e, q_i).

It should be further noted that not all candidate relations (especially partial relations) are correct in semantics. Consider the following segmented phrase w.r.t. Donald Trump:

Example 4. (Political figure in the United States)

美国 (America; Modifier 1) · 政治 (Politics; Modifier 2) · 人物 (Person; Head)

Extracted partial relations (English translation):
(Donald Trump, ?, America)
(Donald Trump, ?, Politics)

Although "Donald Trump" is related to "Politics", no explicit relation can be established between them. Such relation triples are automatically filtered by the confidence assessment in MRPD.
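The three-case dispatch of CRG might be sketched as below. This is only an illustrative skeleton: `parse` stands in for a real dependency parser returning (verb, object) pairs, and `contains_entity` is a deliberately crude placeholder for named entity detection; both, along with the function names, are our own assumptions.

```python
def generate_candidate(entity, segment, prev_seg, next_seg, parse):
    """Dispatch a modifier segment to one of the three CRG cases.
    Returns a full triple, a partial triple (predicate None), or None."""
    def contains_entity(seg):
        # crude stand-in for an NE check (digits or title-cased tokens)
        return seg is not None and any(tok.isdigit() or tok.istitle()
                                       for tok in seg)
    if not contains_entity(segment):
        return None                               # no relational fact: discard
    verbs = parse(segment)
    if verbs:                                     # Case i): verbal clause
        verb, obj = verbs[0]
        return (entity, verb, obj)
    for neighbor in (prev_seg, next_seg):         # Case ii): cross-modifier
        if neighbor:
            verbs = parse(neighbor)
            if verbs:
                verb, _ = verbs[0]
                return (entity, verb, segment[-1])
    return (entity, None, segment[-1])            # Case iii): partial relation
```

With a parser that recognizes 出生 as a verb, "1946年出生的" yields the full triple (Donald Trump, born-in, 1946), while a verb-less modifier like "美国" yields a partial triple whose predicate is left for MRPD.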

3.4 Missing Relation Predicate Detector

After MPS and CRG, we continue to detect missing predicates for partial relations. Here, we introduce the probabilistic predicate detection algorithm with knowledge priors and a hypergraph-based random walk process to detect missing relation predicates for the partial relations generated by CRG.

3.4.1 Model Formulation

Denote V as the collection of all possible relation predicates. Based on our previous research [14], the pattern-based approach for Chinese relation predicate extraction from noun phrases has very low accuracy (around 14 percent). Additionally, the detection of predicates from texts also suffers from low accuracy due to flexible expressions and the existence of light verb constructions [42]. Hence, in the NPORE system, we restrict our focus to two knowledge sources in order to create the collection V: i) all the relation predicates generated by CRG, due to the explicit verbal structures in noun phrases; and ii) relation predicates defined manually based on human common sense.

Given a partial relation r̃(e, q_i), a basic model is the discriminative model Pr(v | r̃(e, q_i)), which directly models the conditional probability of each relation predicate v ∈ V, given a partial relation r̃(e, q_i) as input. However, this model would suffer from the data sparsity problem due to the huge number of combinations of the relation subject-object pair

Fig. 4. Examples of the three cases in CRG. In a full/partial relation triple r(e, q_i) or r̃(e, q_i), we only list the extracted relation predicate r_v(e, q_i) and the object r_o(e, q_i) or r̃_o(e, q_i) in the table, with the name of the entity e omitted. The notation →_DEP refers to the case where the object depends on the relation predicate in the dependency parsing tree.



(e, r̃_o(e, q_i)) and the predicate v. Besides, as our system is fully unsupervised, learning the parameters of Pr(v | r̃(e, q_i)) would be highly challenging.

In this work, inspired by the text generation method [55], we model the problem of probabilistic predicate detection as a generative model Pr(v, r̃(e, q_i)) = Pr(v) Pr(r̃(e, q_i) | v), where Pr(v) and Pr(r̃(e, q_i) | v) are the prior and data likelihood models, respectively. For model prediction, based on the Bayesian rule, we have Pr(v | r̃(e, q_i)) = Pr(v) Pr(r̃(e, q_i) | v) / Pr(r̃(e, q_i)), where Pr(r̃(e, q_i)) is treated as the normalization term. The verb r̃_{v*}(e, q_i) is then selected by the following formula as the relation predicate between e and r̃_o(e, q_i) in r̃(e, q_i):

r̃_{v*}(e, q_i) = argmax_{v′ ∈ V} Pr(v′) Pr(r̃(e, q_i) | v′).    (3)


In the following, we introduce the definitions of the two models Pr(v) and Pr(r̃(e, q_i) | v) in detail.
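Since the normalizer Pr(r̃(e, q_i)) is constant over v, the Bayesian selection of Eq. (3) is a one-line argmax once the two models are available; `prior` and `likelihood` below are assumed callables standing in for the models of the next subsections.

```python
def detect_predicate(prior, likelihood, predicates, partial_relation):
    """Eq. (3): pick the predicate maximizing Pr(v) * Pr(r~ | v)."""
    return max(predicates,
               key=lambda v: prior(v) * likelihood(partial_relation, v))
```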

3.4.2 The Prior Model

The prior model Pr(v) integrates both knowledge learned from previously extracted full relations and human common sense. The first part of the model Pr(v) is formulated based on Maximum Likelihood Estimation (MLE), i.e., Pr(v)_MLE = N_v / N, where N and N_v denote the number of full relation triples extracted by CRG and the number of those triples whose relation predicate is v, respectively.

According to our data-centric analysis, the majority of cases with missing relation predicates are due to the existence of implicit commonsense relations [15], [48], [49]. As a preliminary experiment, we randomly sample 300 partial relation triples from the experimental dataset, and observe that spatial and temporal commonsense relations are the two most frequent relation types with missing predicates. Here, spatial commonsense relations refer to the located-in relations between locations, while temporal commonsense relations refer to the happened-in relations between events and temporal expressions. In Chinese, the prepositions of spatial and temporal expressions are usually omitted (i.e., the counterpart of the preposition "in" in English). Examples of both types of commonsense relations are shown in Table 3. Therefore, we need to derive such relation triples by commonsense reasoning.^4

In addition to MLE, we propose the commonsense probability distribution Pr(v)_CS to encode this observation. As an approximate estimation, let N_s, N_t and N_p be the numbers of locations, temporal expressions and all objects among all the candidate relations generated by CRG. The model Pr(v)_CS is defined as follows:

Pr(v)_CS = N_s / N_p, if v is the spatial relation;
Pr(v)_CS = N_t / N_p, if v is the temporal relation;
Pr(v)_CS = (1 / (|V| − 2)) (1 − (N_s + N_t) / N_p), otherwise.

Combining the two probabilistic distributions Pr(v)_MLE and Pr(v)_CS, we derive the full model of Pr(v):

Pr(v) = λ_1 Pr(v)_MLE + λ_2 Pr(v)_CS + (1 − λ_1 − λ_2) (1/|V|),    (4)

where λ_1 and λ_2 are balancing hyper-parameters with 0 < λ_1 < 1, 0 < λ_2 < 1 and λ_1 + λ_2 < 1. The term (1 − λ_1 − λ_2)(1/|V|) gives a smoothing effect on the verb distribution Pr(v), based on the Jelinek-Mercer smoothing technique [56].
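A sketch of the prior of Eq. (4), assuming the predicate counts and the spatial/temporal predicate sets are given as inputs; all names are illustrative. Because each component (MLE, commonsense, uniform) is itself a distribution over V, the mixture sums to one.

```python
def prior_prob(v, full_counts, n_spatial, n_temporal, n_objects,
               spatial, temporal, num_predicates, lam1=0.4, lam2=0.4):
    """Eq. (4): Jelinek-Mercer mixture of the MLE prior, the commonsense
    prior and a uniform floor over the predicate collection V."""
    n_total = sum(full_counts.values())
    p_mle = full_counts.get(v, 0) / n_total if n_total else 0.0
    if v in spatial:                 # located-in style predicates
        p_cs = n_spatial / n_objects
    elif v in temporal:              # happened-in style predicates
        p_cs = n_temporal / n_objects
    else:                            # remaining mass spread over |V| - 2
        p_cs = (1.0 - (n_spatial + n_temporal) / n_objects) / (num_predicates - 2)
    return lam1 * p_mle + lam2 * p_cs + (1.0 - lam1 - lam2) / num_predicates
```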

3.4.3 The Data Likelihood Model

The data likelihood model estimates Pr(r̃(e, q_i) | v) in an unsupervised manner based on a hypergraph-based random walk process. We first introduce two scores over the graph to define the random walk process.

Predicate Coherence Score. The predicate coherence score is defined between a predicate v ∈ V and a partial relation r̃(e, q_i), denoted as w_p(v, r̃(e, q_i)). It measures whether a predicate v is suitable to describe the relation between the subject e and the object r̃_o(e, q_i) of the partial relation r̃(e, q_i). To speed up the process of verb retrieval, we construct a sentence-level inverted index over the background text corpus D using Apache Lucene.^5 The query "e AND r̃_o(e, q_i)" is used to retrieve a collection of sentences S. For each sentence s ∈ S, we extract contextual verbs from s which may indicate the relations between e and r̃_o(e, q_i). Inspired by [29], we regard a verb as contextual if it is in the dependency chain between e and r̃_o(e, q_i). Let V(e, q_i) be the contextual verb collection and c(v) be the count of v extracted from S. The score w_p(v, r̃(e, q_i)) is defined as:

w_p(v, r̃(e, q_i)) = (1/Z) Σ_{v′ ∈ V(e, q_i)} c(v′) cos(vec(v), vec(v′)),

where Z = Σ_{v′ ∈ V(e, q_i)} c(v′) is the normalization factor.

Relation Similarity Score. The relation similarity score w_r(r(e, q_i), r(e′, q′_i)) is defined over two relations r(e, q_i) and r(e′, q′_i). It computes the degree to which the two relations may have the same predicate, based on the similarity of the word embeddings of their subjects and objects:^6

w_r(r(e, q_i), r(e′, q′_i)) = (1/2) (cos(vec(e), vec(r_o(e, q_i))) + cos(vec(e′), vec(r_o(e′, q′_i)))).    (5)
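The two scores can be sketched as follows, with embedding vectors as plain Python lists. This is an illustration only: the function names are our own, and the relation similarity follows Eq. (5) as printed above.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))

def predicate_coherence(v_vec, context_verbs, embed):
    """w_p: count-weighted average embedding similarity between a candidate
    predicate vector and the contextual verbs retrieved for (e, object).
    context_verbs maps each contextual verb to its count c(v')."""
    z = sum(context_verbs.values())
    return sum(c * cosine(v_vec, embed[verb])
               for verb, c in context_verbs.items()) / z

def relation_similarity(subj1, obj1, subj2, obj2):
    """w_r of Eq. (5), on the embedding vectors of the two relations'
    subjects and objects."""
    return 0.5 * (cosine(subj1, obj1) + cosine(subj2, obj2))
```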

TABLE 3
Examples of Spatial and Temporal Commonsense Relations

Spatial:
  复旦大学 (Fudan University) — 上海高等院校 (University in Shanghai)
  故宫博物院 (Palace Museum) — 北京博物馆 (Museum in Beijing)

Temporal:
  诺曼底战役 (Battle of Normandy) — 1944年欧洲战场战役 (Battle of Europe in 1944)
  安史之乱 (An Lushan Rebellion) — 8世纪中国战争 (Chinese War in the 8th Century)

Locations and temporal expressions are printed in bold in the original table.

4. Note that a majority of existing research works focus on the harvesting of spatial/temporal relations [48], [49]. This practice is also applied in the YAGO2 knowledge base [25]. Harvesting more types of commonsense knowledge automatically is left as future research.

5. http://lucene.apache.org
6. For simplicity, we do not distinguish full or partial relations here, and denote them as r(e, q_i) and r(e′, q′_i) uniformly.



Based on the two scores, we construct the hypergraph for the random walk process. We introduce the concept of the Predicate-based Hypergraph Network H(R, V):

Definition 5 (Predicate-based Hypergraph Network). A PHN H(R, V) is a hypergraph model where R is the node collection, corresponding to all full and partial relations generated by CRG, and V is the hyper-edge collection, corresponding to all predicates.

In the PHN H(R, V), each hyper-edge (i.e., relation predicate) v ∈ V is associated with a collection of full and partial relations. We require that i) a full relation r(e, q_i) is in the hyper-edge v iff r_v(e, q_i) = v; and ii) a partial relation r̃(e, q_i) is in the hyper-edge v iff the score w_p(v, r̃(e, q_i)) > t_1, where t_1 ∈ (0, 1) is a predefined hyper-parameter. Hence, all nodes in the hyper-edge v are either full relations with predicate v or partial relations that are highly probable to have the predicate v. Note that it is possible for a partial relation to be in more than one hyper-edge. Refer to the toy example in Fig. 5.

Hypergraph-Based Random Walk Process. We further define the concept of the neighborhood of a node in the PHN:

Definition 6 (Neighborhood in PHN). The neighborhood Nb(r(e, q_i)) of a node r(e, q_i) in the PHN H(R, V) is a collection of nodes such that a node r(e′, q′_i) ∈ Nb(r(e, q_i)) iff there exists a hyper-edge v ∈ V with (r(e, q_i), r(e′, q′_i)) ∈ v.

Consider the example in Fig. 5. The neighborhoods of Nodes 1 and 4 are {2, 3, 4} and {1, 2, 3, 5, 6}, respectively.

The hypergraph-based random walk process is as follows. Let R_v be the collection of nodes in the hyper-edge v ∈ V that correspond to full relations. For each v ∈ V, a separate random walker starts from each r(e, q_i) ∈ R_v, and moves to a neighbor node r(e′, q′_i) ∈ Nb(r(e, q_i)) with probability ∝ w_r(r(e, q_i), r(e′, q′_i)). The process terminates after a sufficient number of walks. Finally, each partial relation r̃(e, q_i) receives a score s_v(r̃(e, q_i)), indicating the number of visits by all random walkers. Pr(r̃(e, q_i) | v) is approximated by:

Pr(r̃(e, q_i) | v) = s_v(r̃(e, q_i)) / Σ_{r̃(e′, q′_i) ∈ R} s_v(r̃(e′, q′_i)).    (6)

It is noteworthy that it is entirely possible for a random walker starting from one hyper-edge to move to another hyper-edge. For example, in Fig. 5, a random walker may go from Node 1 to Node 5 (from hyper-edge v_1 to v_2). This setting assigns a part of the probability mass to nodes outside the candidate sets, which addresses the problem that the candidate generation technique does not yield 100 percent recall. Readers can also refer to a summary of the probabilistic predicate detection process in Algorithm 4.

Algorithm 4. Missing Predicate Detection
1: Generate the predicate collection V;
2: for each predicate v ∈ V do
3:   Estimate the prior probability Pr(v) by Eq. (4);
4:   Generate the full relations associated with v as R_v;
5: end for
6: Construct the PHN H(R, V);
7: for each predicate v ∈ V do
8:   Run the random walk process based on Eq. (5);
9:   for each partial relation r̃(e, q_i) do
10:    Compute Pr(r̃(e, q_i) | v) by Eq. (6);
11:    Predict the relation predicate r̃_{v*}(e, q_i) by Eq. (3);
12:  end for
13: end for
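A toy sketch of the hypergraph-based random walk that approximates Pr(r̃(e, q_i) | v) via visit counts (Eq. (6)). The graph encoding (hyper-edges as sets of node labels) and the function names are our own simplifications, not the system's implementation.

```python
import random

def random_walk_scores(hyperedges, full_nodes, weight, steps=500, seed=0):
    """For one hyper-edge's full-relation nodes, run walkers that move to
    neighbors (nodes sharing any hyper-edge) with probability proportional
    to the relation similarity `weight(a, b)`; normalized visit counts of
    partial-relation nodes approximate Pr(r~ | v)."""
    rng = random.Random(seed)
    # neighborhood per Definition 6: nodes co-occurring in a hyper-edge
    neighbors = {}
    for members in hyperedges.values():
        for a in members:
            neighbors.setdefault(a, set()).update(m for m in members if m != a)
    visits = {}
    for start in full_nodes:
        node = start
        for _ in range(steps):
            nbrs = sorted(neighbors.get(node, ()))
            if not nbrs:
                break
            ws = [weight(node, m) for m in nbrs]
            node = rng.choices(nbrs, weights=ws, k=1)[0]
            if node not in full_nodes:        # count partial-relation visits
                visits[node] = visits.get(node, 0) + 1
    total = sum(visits.values()) or 1
    return {n: c / total for n, c in visits.items()}
```

Because neighborhoods span hyper-edges, a walker seeded in one predicate's hyper-edge can reach partial relations attached only to another, which is exactly the recall-recovering behavior described above.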

3.4.4 Confidence Assessment

Finally, we filter out noisy and meaningless extractions. We observe that most extraction errors occur when the algorithm extracts meaningless "relation predicates". This phenomenon is also consistent with our previous research [14], [45] and classical OIE research [29], [30]. Let ~c(ω) be the number of extracted full and partial relations whose predicates are predicted as ω. The confidence score of each full relation r(e, q_i) is defined as conf(r(e, q_i)) = ~c(r_ω(e, q_i)). For each partial relation ~r(e, q_i), we add another factor to measure whether the prediction of the relation predicate ~r_ω*(e, q_i) by Eq. (3) is confident. From a probabilistic perspective, if the prediction is confident, the score max_{ω∈Ω} Pr(ω) Pr(~r(e, q_i) | ω) should exceed secmax_{ω∈Ω} Pr(ω) Pr(~r(e, q_i) | ω) by a large margin, where secmax_{ω∈Ω} Pr(ω) Pr(~r(e, q_i) | ω) refers to the second largest value among all Pr(ω) Pr(~r(e, q_i) | ω) (ω ∈ Ω). Hence, the confidence score conf(~r(e, q_i)) is defined as

\[
\mathrm{conf}(\tilde{r}(e, q_i)) = \tilde{c}(r_{\omega^*}(e, q_i)) \cdot \frac{\max_{\omega \in \Omega} \Pr(\omega) \Pr(\tilde{r}(e, q_i) \mid \omega)}{\max_{\omega \in \Omega} \Pr(\omega) \Pr(\tilde{r}(e, q_i) \mid \omega) + \operatorname{secmax}_{\omega \in \Omega} \Pr(\omega) \Pr(\tilde{r}(e, q_i) \mid \omega)}.
\]

In the NPORE system, we employ a pre-defined threshold τ2 and filter out a relation if conf(r(e, q_i)) < τ2 for full relations or conf(~r(e, q_i)) < τ2 for partial relations.
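The confidence computation and threshold filtering above reduce to a short sketch. Here `scores` maps each candidate predicate ω to the joint score Pr(ω)Pr(~r(e, q_i) | ω), and `c_tilde_best` plays the role of ~c(r_ω*(e, q_i)); the function names and any numeric values are illustrative, not the paper's implementation:

```python
def partial_confidence(c_tilde_best, scores):
    """Confidence of a partial relation: ~c of the best predicate, scaled by
    how far the best joint score Pr(w)Pr(r|w) exceeds the runner-up.

    scores: dict predicate -> Pr(w) * Pr(r|w);  c_tilde_best: ~c(omega*).
    """
    ranked = sorted(scores.values(), reverse=True)
    best = ranked[0]
    second = ranked[1] if len(ranked) > 1 else 0.0  # secmax term
    return c_tilde_best * best / (best + second)

def keep(conf, tau2=20):
    """Threshold filter: discard extractions with confidence below tau2."""
    return conf >= tau2
```

When one predicate clearly dominates (best ≫ second), the margin factor approaches 1 and the confidence approaches ~c itself; for a near-tie it approaches ~c/2, penalizing ambiguous predictions.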

4 EXPERIMENTS

In this section, we conduct experiments to evaluate NPORE in various aspects and compare it with the state-of-the-art to draw convincing conclusions.

4.1 Data Source and Experimental Settings

The collection of entity-noun phrase pairs is taken from the Chinese Wikipedia category system, version January 20th, 2017.⁷ We follow the common practice in [14], [36], [37] to extract the pairs. In Wikipedia, the titles of the pages are

Fig. 5. A toy example of the graph structure of PHN. Small circles 1-8 refer to nodes (i.e., full or partial relations, depending on the color). Large circles ω1, ω2 and ω3 refer to hyper-edges (i.e., relation predicates). Lines refer to all possible routes that random walkers can travel.

7. http://download.wikipedia.com/zhwiki/20170120/

10 IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 31, NO. X, XXXXX 2019


treated as names of entities and the categories are treated as the noun phrases related to these entities. Because several Wikipedia pages are not about entities, after filtering out indirect, template, stub and disambiguation pages, we obtain 0.6M entities and 2.4M entity-noun phrase pairs.

In this paper, we use the FudanNLP toolkit [54] for basic Chinese NLP analysis, such as Chinese word segmentation, POS tagging and NER. To improve the recall of NER, we also add the names of Chinese Wikipedia entities as a dictionary in FudanNLP. Because the size of the Chinese Wikipedia corpus is relatively small, we also crawl 1.3M articles from Baidu Baike (a large Chinese online encyclopedia) to train language models based on [57]. In total, we have a large Chinese corpus, consisting of 2M articles, as the background text corpus. The dimension of word embeddings is 100.

In the implementation of the NPORE system, the default hyper-parameter settings are as follows: n = 3, γ = 0.3, β = 5, λ1 = 0.6, λ2 = 0.3, τ1 = 0.7 and τ2 = 20. The MEWC algorithm is run three times in MPS to generate the clique collection C. For the hypergraph-based random walk process, we send out ten random walkers from each starting point in the PHN and run for 500 steps. We also study how different values of these hyper-parameters affect the performance in the experiments. All the algorithms of the NPORE system are implemented in Java and run on a single PC with a 2.9 GHz CPU and 16 GB memory.
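The pair-extraction step described above (page titles become entities, categories become related noun phrases, non-entity pages are dropped) can be approximated by the following sketch. The record layout and field names (`title`, `categories`, `type`) are illustrative assumptions, not the actual dump-processing code:

```python
def extract_pairs(pages,
                  skip_types=("indirect", "template", "stub", "disambiguation")):
    """Build (entity, noun phrase) pairs from Wikipedia-style page records.

    pages: iterable of dicts with 'title', 'categories' and 'type' keys
    (hypothetical field names). Pages whose type is in skip_types are
    dropped, mirroring the page filtering described above.
    """
    pairs = []
    for page in pages:
        if page.get("type") in skip_types:
            continue  # not a genuine entity page
        entity = page["title"]  # page title -> entity name
        for category in page.get("categories", []):
            pairs.append((entity, category))  # category -> related noun phrase
    return pairs
```
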

4.2 Baselines

Although there are abundant RE approaches, most of them cannot be taken as baselines due to the differences between these works and ours. Because our system works in open domains, we compare our method against several OIE systems, especially noun phrase-based OIE systems. We also employ knowledge extraction methods for Wikipedia categories as baselines, summarized as follows:

Classical Sentence-Based OIE. Because there are significant linguistic differences between English and Chinese, we regard the state-of-the-art Chinese OIE system ZORE [42] as a strong baseline among classical OIE systems. To accommodate noun phrase-based RE, we use entity-noun phrase pairs as queries to search for sentences in the background text corpus D. For each entity-noun phrase pair, we take the top-5 sentences returned by the Apache Lucene search engine as the input sentences of ZORE. The implementation and detailed parameter settings of ZORE are taken from the authors' original source.⁸

Neural Sentence-Based OIE. The encoder-decoder network is the state-of-the-art neural network architecture for OIE. We employ the neural network proposed in [34] to extract relations from the same sentences used as the inputs of ZORE [42]. In this model, we use three-layer BiLSTMs as the encoder and the decoder (the default settings reported in [34]). The word embeddings are trained by ourselves over our corpus D, with the dimensionality set to 100.

Classical Noun Phrase-Based OIE. Because we consider RE from noun phrases, we take a recent noun phrase-based OIE system RELNOUN [18] as a baseline. As it considers English patterns only, we manually translate such noun phrase-based

patterns into Chinese and implement a variant named CN-RELNOUN to extract the relations.

Neural Noun Phrase-Based OIE. To apply cutting-edge deep learning techniques to noun phrase-based OIE, we implement a variant of Soares et al. [58] as a baseline. In the implementation, we use our MPS and CRG modules as the first two steps. As our task is unsupervised, we take the relation pairwise similarity model in [58] to approximate Pr(~r(e, q_i) | ω) in MRPD. The random walker travels from one node r(e, q_i) to another r(e', q'_i) with probability ∝ s_n(r(e, q_i), r(e', q'_i)), where s_n(·, ·) is the predicted relation similarity score. The relation statements used in [58] are the same as what we used for ZORE [42]. The underlying Chinese BERT model [59] is downloaded from GitHub.⁹

Wikipedia-Specific Methods. We also employ two methods designed for RE from Wikipedia categories as baselines. The first method is Nastase and Strube [36], which employs prepositions in Wikipedia category patterns to discover relations. The second method is taken from our previous work [14], [45], which mines frequent textual patterns by graph-based mining and achieved the highest performance previously. Because the method of Nastase and Strube [36] focuses on the English language only, we take its variant for Chinese (i.e., CN-WikiRe [45]) as our baseline. The implementation details are described in [45].

In a few cases, the Chinese Wikipedia categories are verbal clauses, such as "1946年出生 (Born in 1946)" for Donald Trump. The extraction of relations from these categories has been addressed in several baselines [14], [18], [36]. To make NPORE comparable with these baselines, we add an additional step to the system. If the input category is verbal, we regard all the segmented elements generated by MPS as modifiers, rather than as modifiers plus one head word.

4.3 Evaluation Metrics

As a variant of RE systems, Precision, Recall and F1 score would be the first choices to evaluate NPORE. However, this evaluation method is infeasible. Re-consider Example 3. Given the modifier "美国 (America)" in "美国政治人物 (Political figure in the United States)" and "Donald Trump" as input, four possible extracted relation triples are:

Possible extracted relations (English translation):

(Donald Trump, has-nationality, American)
(Donald Trump, born-in, America)
(Donald Trump, works-in, America)
(Donald Trump, is-leader-of, America)

All four relation predicates are valid. Hence, a "gold standard" for computing the Recall and F1 score of our system cannot be established. The difficulty of evaluating such systems is also an open research challenge in OIE [9]. To address this issue, Mausam et al. [30] propose to use the Yield score to evaluate OIE systems. The score is calculated by multiplying the number of extractions (i.e., relation triples) by their precision scores. Hence, it is equal to the (estimated) number of correct extractions. In this paper, we employ three metrics to compare our system against all the baselines, i.e., #Relations (the total number of extracted relation triples), Precision (the ratio

8. https://sourceforge.net/projects/zore/
9. https://github.com/google-research/bert/

WANG ET AL.: OPEN RELATION EXTRACTION FOR CHINESE NOUN PHRASES 11


of the number of correctly extracted relation triples to all extracted relation triples), and Yield score (the product of #Relations and Precision).
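As a concrete reading of these definitions, the three metrics reduce to two one-line helpers (a trivial sketch; the function names are ours, not from the paper):

```python
def precision(num_correct, num_extracted):
    """Precision: correctly extracted triples divided by all extracted triples."""
    return num_correct / num_extracted if num_extracted else 0.0

def yield_score(num_relations, prec):
    """Yield score (Mausam et al. [30]): #Relations x Precision,
    i.e., the estimated number of correct extractions."""
    return num_relations * prec
```

Because precision is estimated on a labeled sample, the Yield score over a full extraction run is itself an estimate of the number of correct triples.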

4.4 Overall Performance Comparison

We report the overall performance of the NPORE system and compare it with the baselines. After we run all the systems over the entire Wikipedia entity-category dataset, we sample four subsets to evaluate NPORE and the baselines in terms of #Relations, Precision and Yield score. Each subset contains 300 entities and their corresponding categories. The first subset is uniformly sampled from the entire dataset, denoted as "General". The remaining three subsets are domain-specific datasets, related to three domains: Politics, Entertainment and Military. In this work, we employ heuristic rules to extract domain-specific entities. For example, we regard an entity as being in the entertainment domain if there exists one category that ends in one of the following words: 歌手 (singer), 音乐 (music), 电影 (movie), 娱乐 (entertainment), etc. Example entities belonging to the three domains are shown in Table 5. Due to space limitations, we omit the details of the extraction process here.

Table 4 illustrates the experimental results of the overall performance of all the systems. Generally, the metric scores of all methods are consistent over all four subsets. As seen,

sentence-based OIE systems [34], [42] (either classical or neural) do not yield satisfactory results. This is because sentence-based OIE systems extract relations primarily based on subject-verb-object structures, which inevitably suffer from data noise and low recall. In our datasets, many relations within Chinese noun phrases are expressed more concisely and implicitly. Hence, sentence-based methods are not suitable for noun phrase-based RE. Classical noun phrase-based OIE approaches (e.g., [18]) have relatively high precision but low coverage, leading to low Yield scores. For English, due to the frequent usage of prepositions, the patterns "[...] is [...] of [...]" and "[...] is [...] from [...]" are highly effective in RELNOUN [18]. The lack of such expressions in Chinese results in low recall. In our approach, we do not rely on fixed patterns to extract relations. Instead, our three-layer framework can be viewed as a "divide-and-conquer" strategy to harvest relations. As for Soares et al. [58], although deep language models such as BERT [59] are employed to learn relation similarities, this method is not suitable for Chinese short texts, because the encoding process of Soares et al. [58] still requires the detection of relation statements in the corpus, which are extremely sparse.

Compared to the Wikipedia-based baselines, for the general domain, NPORE extracts 149.7 percent as many relations as the strongest baseline [14], with a comparable precision of 92.7 percent. Overall, the NPORE system achieves a 47.3 percent higher Yield score than [14], indicating the effectiveness of the proposed approach. Regarding the three specific domains, the performance trends are similar to the general domain. The improvement of NPORE is mostly due to the data-driven design of our system.
To elaborate, previous approaches (e.g., [14], [36]) either use manually defined or automatically mined patterns for RE, which may only cover a small portion of circumstances. Our system imposes few hypotheses on the input data and is fully data-driven, capable of extracting more relations. In summary, the superiority of the NPORE system is clearly demonstrated by the four sets of experiments and our analysis.

We further estimate the general extraction performance over the entire dataset. We randomly sample 500 relations from the complete relation collections generated by all the approaches and ask human annotators to evaluate the precision manually. Based on the estimated precision and the numbers of extracted relations, the total Yield scores can also be estimated. The results are summarized in Table 6. Overall, NPORE harvests 554K relations at a precision of 95.4 percent. It outperforms the state-of-the-art [14] by

TABLE 5
Examples of Domain-Specific Entities

Domain         Entities
Politics       唐纳德·特朗普 (Donald Trump), 罗纳德·里根 (Ronald Reagan), 英国议会 (Parliament of the UK)
Entertainment  王菲 (Faye Wong), 肖申克的救赎 (The Shawshank Redemption), 奥斯卡金像奖 (Academy Award)
Military       航空母舰 (Aircraft Carrier), 氢弹 (Thermonuclear weapon), 中途岛海战 (Battle of Midway)

TABLE 4
General Performance Comparison of Different Methods Over Four Subsets of Chinese Wikipedia Categories

                          General              Politics
Method                    #Rel.  Pre.   Yield  #Rel.  Pre.   Yield
Nastase and Strube [36]   87     41.7%  41     84     57.1%  48
Pal and Mausam [18]       31     93.5%  29     35     88.6%  31
Qiu and Zhang [42]        28     75.0%  21     34     76.4%  26
Wang et al. [14]          193    94.3%  182    193    95.9%  185
Cui et al. [34]           52     51.9%  27     51     43.1%  22
Soares et al. [58]        213    75.6%  161    154    70.1%  108
NPORE                     289    92.7%  268    314    93.9%  295

                          Entertainment        Military
Method                    #Rel.  Pre.   Yield  #Rel.  Pre.   Yield
Nastase and Strube [36]   102    39.2%  40     76     53.9%  41
Pal and Mausam [18]       42     88.1%  37     34     82.3%  28
Qiu and Zhang [42]        21     76.2%  16     32     81.2%  26
Wang et al. [14]          204    95.1%  194    188    96.3%  181
Cui et al. [34]           54     48.1%  26     44     56.8%  25
Soares et al. [58]        163    60.1%  98     201    69.2%  139
NPORE                     324    92.3%  299    274    94.2%  258

TABLE 6
Summary of Extraction Performance Over All Wikipedia Categories

Method                    #Relations  Precision (Estimated)  Yield Score (Estimated)
Nastase and Strube [36]   165K        58.6%                  96.7K
Pal and Mausam [18]       65K         92.8%                  60.3K
Qiu and Zhang [42]        42K         82.3%                  34.6K
Wang et al. [14]          357K        97.4%                  347.7K
Cui et al. [34]           89K         51.2%                  45.6K
Soares et al. [58]        420K        72.3%                  303.7K
NPORE                     554K        95.4%                  528.5K


55.2 percent in #Relations and 52.0 percent in Yield score, without sacrificing too much precision.

4.5 Detailed Analysis of NPORE

We tune the hyper-parameters of NPORE and analyze the performance of the NPORE components in detail.

Efficiency and Effectiveness of Clique Detection. A preliminary experiment shows that over 90 percent of the modifiers contain fewer than four segmented Chinese words (not Chinese characters). Therefore, we set the n-gram factor as n = 3. One can also set a larger value for n, but such a practice increases the computational complexity of MPS and increases the perplexity of the language models.

Next, we focus on the efficiency and effectiveness of the MEWC algorithm and the linguistic constraints. We set γ and β to their default values and conduct two sets of experiments, considering four settings: "MPS" (the MEWC algorithm without any constraints), "MPS (P)" (with positive constraints), "MPS (N)" (with negative constraints) and "MPS (P+N)" (with both types of constraints).

For the efficiency study, we randomly sample 100,500 noun phrases, perform MPS in the four settings and record the execution time. For the effectiveness study, we ask human annotators to label the correctness of the phrase segmentation results over the "General" subset. The results are shown in Fig. 6. As seen, the proposed approach with both types of constraints is highly efficient. Generally speaking, the usage of both constraints reduces the running time to approximately 50 percent of the original time. It can be further noted that the constraints also improve the accuracy of phrase segmentation by considering the linguistic characteristics of Chinese noun phrases. Additionally, we tune the values of the two parameters γ and β, with results illustrated in Fig. 7. The experimental results show that the distributional score w_d(i, j) contributes more than the statistical score w_s(i, j).
The highest performance is achieved when β = 5.

Study of Candidate Relation Generation. In this component, the cross-modifier relation generation technique is proposed to improve the ratio of full relations with detected predicates. To verify this hypothesis, we report the percentages of such relations under two settings: with and without the cross-modifier relation generation technique. The results are shown in Table 7. As seen, the percentages show varying degrees of improvement, from 2.4 to 4.6 percent.

Study of Missing Relation Predicate Detection. For the MRPD component, we tune the parameters λ1 and λ2. The performance is evaluated based on the Yield score of the "General" subset. Fig. 8 illustrates the change of Yield scores when the two parameters vary. In each experiment, we fix one parameter at 0.1 and tune the value of the other. We can see that the changes of λ1 and λ2 reflect the relative importance of the three prior probabilistic distributions. As for the hypergraph-based random walk process, we set τ1 = 0.7 because we observe that when τ1 ≥ 0.7, the subject-object pairs usually share the same relation predicate. We further tune the value of τ2 for confidence-based filtering. In Table 8, we list examples of extracted relation predicates with high and low ~c(ω) scores. We can see that words with low ~c(ω) scores are usually not relation predicates, due to POS and parsing errors. We set τ2 = 20 because most words with ~c(ω) < 20 are not proper relation predicates.

4.6 Error Analysis and Case Studies

To indicate how our method can be improved in the future, we analyze errors in the extracted relations. 300 extraction errors are re-presented to the human annotators to distinguish the types of errors. Examples are shown in Table 9. The first type of error can be summarized as incomplete object extraction (IOE, about 32 percent), which means the extracted objects in relation triples are not semantically complete. For example, "Seoul Special City" is the name of the complete entity, but MPS separates "Seoul" and "Special City", leading to the error. The relation "(Wang Jiaji, has-position, president)" is

Fig. 6. Efficiency and effectiveness study of MPS.

Fig. 7. Parameter analysis w.r.t. γ and β of the MPS component.

TABLE 7
Percentage of Generated Relations With Relation Predicates

Setting       General  Politics  Entertainment  Military
w/o. CMRG     12.2%    11.0%     15.4%          13.3%
w. CMRG       16.8%    14.6%     17.8%          15.8%
Improvement   +4.6%    +3.6%     +2.4%          +2.5%

CMRG refers to "cross-modifier relation generation".

Fig. 8. Parameter analysis w.r.t. λ1 and λ2 of the MRPD component.

TABLE 8
Examples of Relation Predicates With High and Low ~c(ω) Scores

Relation Predicate       ~c(ω)   Relation Predicate       ~c(ω)
位于 (located-in)         124K    警告 (warn)               19
发生 (happened-in)        53K     民变 (civil commotion)    16
毕业 (graduated-from)     44K     冷藏 (refrigerate)        14
建立 (established-in)     23K     集会 (assembly)           8


correct in syntax but not very meaningful due to its semantic incompleteness. The complete relation should be "(Wang Jiaji, has-position, president of National Taitung University)". The incomplete extraction problem also contributes to a large proportion of errors in other OIE systems [8], [29], [30]. This problem is more difficult for our system due to the flexible expressions of Chinese. The remaining errors occur when the detection of missing relation predicates is incorrect. This type is called error predicate detection, abbreviated as EPD. For example, the NPORE system predicts that there is a located-in relation between an event entity and another entity tagged as a location. However, a few person names in our datasets are tagged as locations due to NER errors. Hence, the located-in relations do not hold. Additionally, during the hypergraph-based random walk process, incorrect relation predicates may be predicted due to NLP parsing errors and noise, with examples illustrated in Table 9.

A more important problem to be addressed is the missing extraction problem. Table 10 illustrates two Wikipedia categories w.r.t. Donald Trump that contain relational facts left unextracted by NPORE. The missing relations should be "(Donald Trump, survived, assassination attempt)" and "(Donald Trump, worked-in, real estate)". Based on our research and the survey [9], this issue is difficult even to evaluate, let alone solve completely.

4.7 Discussion on Further Research

Although we have achieved some success, knowledge extraction from Chinese noun phrases still faces challenges. The key barriers lie in two aspects: i) the lack of (relatively) fixed syntactic/lexical expressions in Chinese and ii) the large amount of commonsense knowledge left unexpressed inside noun phrases. In the future, our work can be extended by addressing the following issues: i) improving our work by using more conceptual and commonsense knowledge, such as [47], [60]; ii) extending this system to the Web scale, which requires automatically extracting descriptive noun phrases and entities from free texts and then extracting the relations from them;

iii) developing a comprehensive framework to evaluate noun phrase-based OIE; and iv) studying how neural networks can be applied to RE from noun phrases. While neural networks are suitable for learning implicit, high-dimensional representations, it remains a challenge for neural networks to be used for short-text knowledge extraction that requires explicit reasoning and human common sense.

5 CONCLUSION

In this work, we present NPORE, a fully unsupervised, open-domain system for RE from Chinese noun phrases. NPORE contains three major components: Modifier-sensitive Phrase Segmenter, Candidate Relation Generator and Missing Relation Predicate Detector. In particular, the system integrates a graph clique mining algorithm to chunk Chinese noun phrases into modifiers and head words, which are utilized to generate candidate relation triples. We further propose a probabilistic predicate detection algorithm with Bayesian knowledge priors and a hypergraph-based random walk process to detect missing relation predicates. Experimental results over Chinese Wikipedia show that NPORE outperforms the state-of-the-art.

ACKNOWLEDGMENTS

This work is supported by the National Key Research and Development Program of China (No. 2016YFB1000904).

REFERENCES

[1] X. Ren et al., "CoType: Joint extraction of typed entities and relations with knowledge bases," in Proc. 26th Int. Conf. World Wide Web, 2017, pp. 1015–1024.
[2] R. Lu, X. Jin, S. Zhang, M. Qiu, and X. Wu, "A study on big knowledge and its engineering issues," IEEE Trans. Knowl. Data Eng., vol. 31, no. 9, pp. 1630–1644, Sep. 2019.
[3] J. Shen et al., "HiExpan: Task-guided taxonomy construction by hierarchical tree expansion," in Proc. 24th ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, 2018, pp. 2180–2189.
[4] M. Yu, W. Yin, K. S. Hasan, C. N. dos Santos, B. Xiang, and B. Zhou, "Improved neural relation detection for knowledge base question answering," in Proc. 55th Annu. Meeting Assoc. Comput. Linguistics, 2017, pp. 571–581.
[5] S. T. Hsu, C. Moon, P. Jones, and N. F. Samatova, "An interpretable generative adversarial approach to classification of latent entity relations in unstructured sentences," in Proc. 32nd AAAI Conf. Artif. Intell., 2018, pp. 5181–5188.
[6] S. Su, N. Jia, X. Cheng, S. Zhu, and R. Li, "Exploring encoder-decoder model for distant supervised relation extraction," in Proc. 27th Int. Joint Conf. Artif. Intell., 2018, pp. 4389–4395.

TABLE 9
Two Major Types of Extraction Errors and Their Examples

Error Type  Extracted Relations                               Corrections
IOE         (王家骥, 职务, 校长)                                 (王家骥, 职务, 国立台东大学校长)
            (Wang Jiaji, has-position, president)             (Wang Jiaji, has-position, president of National Taitung University)
            (梁耀燮, 出身, 特别市)                               (梁耀燮, 出身, 首尔特别市)
            (Yang Yo-seop, originated-from, special city)     (Yang Yo-seop, originated-from, Seoul Special City)
EPD         (第65届戛纳电影节, 担任, 戛纳)                        (第65届戛纳电影节, 位于, 戛纳)
            (65th Cannes Film Festival, work-as, Cannes)      (65th Cannes Film Festival, located-in, Cannes)
            (台北101, 生于, 台湾)                               (台北101, 位于, 台湾)
            (Taipei 101, born-in, Taiwan)                     (Taipei 101, located-in, Taiwan)

TABLE 10
Categories w.r.t. Donald Trump With Relations Un-Extracted by NPORE

Wikipedia Category  English Translation
暗杀未遂幸存者        Attempted assassination survivor
美国房地产商          US real estate developer


[7] P. Gupta, B. Roth, and H. Schütze, "Joint bootstrapping machines for high confidence relation extraction," in Proc. Conf. North Amer. Chapter Assoc. Comput. Linguistics, 2018, pp. 26–36.
[8] K. Gashteovski, R. Gemulla, and L. D. Corro, "MinIE: Minimizing facts in open information extraction," in Proc. Conf. Empirical Methods Natural Lang. Process., 2017, pp. 2630–2640.
[9] C. Niklaus, M. Cetto, A. Freitas, and S. Handschuh, "A survey on open information extraction," in Proc. 27th Int. Conf. Comput. Linguistics, 2018, pp. 3866–3878.
[10] X. Han and L. Sun, "Global distant supervision for relation extraction," in Proc. 30th AAAI Conf. Artif. Intell., 2016, pp. 2950–2956.
[11] Y. Su, H. Liu, S. Yavuz, I. Gur, H. Sun, and X. Yan, "Global relation embedding for relation extraction," in Proc. Conf. North Amer. Chapter Assoc. Comput. Linguistics, 2018, pp. 820–830.
[12] S. Saha and Mausam, "Open information extraction from conjunctive sentences," in Proc. 27th Int. Conf. Comput. Linguistics, 2018, pp. 2288–2299.
[13] B. Xu, C. Xie, Y. Zhang, Y. Xiao, H. Wang, and W. Wang, "Learning defining features for categories," in Proc. 25th Int. Joint Conf. Artif. Intell., 2016, pp. 3924–3930.
[14] C. Wang, Y. Fan, X. He, and A. Zhou, "Learning fine-grained relations from Chinese user generated categories," in Proc. Conf. Empirical Methods Natural Lang. Process., 2017, pp. 2577–2587.
[15] N. Tandon, G. de Melo, F. M. Suchanek, and G. Weikum, "WebChild: Harvesting and organizing commonsense knowledge from the web," in Proc. 7th ACM Int. Conf. Web Search Data Mining, 2014, pp. 523–532.
[16] C. C. Xavier and V. L. S. de Lima, "Boosting open information extraction with noun-based relations," in Proc. 9th Int. Conf. Lang. Resour. Eval., 2014, pp. 96–100.
[17] M. Yahya, S. Whang, R. Gupta, and A. Y. Halevy, "ReNoun: Fact extraction for nominal attributes," in Proc. Conf. Empirical Methods Natural Lang. Process., 2014, pp. 325–335.
[18] H. Pal and Mausam, "Demonyms and compound relational nouns in nominal open IE," in Proc. 5th Workshop Autom. Knowl. Base Construction, 2016, pp. 35–39.
[19] C. N. Li and S. A. Thompson, "Mandarin Chinese: A functional reference grammar," J. Asian Stud., vol. 42, no. 3, pp. 10–12, 1989.
[20] C.-T. J. Huang, Logical Relations in Chinese and the Theory of Grammar. New York, NY, USA: Taylor & Francis, 1998.
[21] X. Chen, Z. Shi, X. Qiu, and X. Huang, "Adversarial multi-criteria learning for Chinese word segmentation," in Proc. 55th Annu. Meeting Assoc. Comput. Linguistics, 2017, pp. 1193–1203.
[22] R. Fu, J. Guo, B. Qin, W. Che, H. Wang, and T. Liu, "Learning semantic hierarchies via word embeddings," in Proc. 52nd Annu. Meeting Assoc. Comput. Linguistics, 2014, pp. 1199–1209.
[23] C. Wang, J. Yan, A. Zhou, and X. He, "Transductive non-linear learning for Chinese hypernym prediction," in Proc. 55th Annu. Meeting Assoc. Comput. Linguistics, 2017, pp. 1394–1404.
[24] F. M. Suchanek, G. Kasneci, and G. Weikum, "Yago: A core of semantic knowledge," in Proc. 16th Int. Conf. World Wide Web, 2007, pp. 697–706.
[25] J. Hoffart, F. M. Suchanek, K. Berberich, and G. Weikum, "YAGO2: A spatially and temporally enhanced knowledge base from Wikipedia," Artif. Intell., vol. 194, pp. 28–61, 2013.
[26] C. Lyu, Y. Zhang, and D. Ji, "Joint word segmentation, POS-tagging and syntactic chunking," in Proc. 30th AAAI Conf. Artif. Intell., 2016, pp. 3007–3014.
[27] I. Dagan and V. Shwartz, "Paraphrase to explicate: Revealing implicit noun-compound relations," in Proc. 56th Annu. Meeting Assoc. Comput. Linguistics, 2018, pp. 1200–1211.
[28] C. Wang, X. He, and A. Zhou, "A short survey on taxonomy learning from text corpora: Issues, resources and recent advances," in Proc. Conf. Empirical Methods Natural Lang. Process., 2017, pp. 1190–1203.
[29] M. Banko, M. J. Cafarella, S. Soderland, M. Broadhead, and O. Etzioni, "Open information extraction from the web," in Proc. 20th Int. Joint Conf. Artif. Intell., 2007, pp. 2670–2676.
[30] Mausam, M. Schmitz, S. Soderland, R. Bart, and O. Etzioni, "Open language learning for information extraction," in Proc. Joint Conf. Empirical Methods Natural Lang. Process. Comput. Natural Lang. Learn., 2012, pp. 523–534.
[31] X. Zeng, S. He, K. Liu, and J. Zhao, "Large scaled relation extraction with reinforcement learning," in Proc. 32nd AAAI Conf. Artif. Intell., 2018, pp. 5658–5665.
[32] W. Y. Wang, W. Xu, and P. Qin, "DSGAN: Generative adversarial training for distant supervision relation extraction," in Proc. 56th Annu. Meeting Assoc. Comput. Linguistics, 2018, pp. 496–505.

[33] Y. Lin, Z. Liu, and M. Sun, "Neural relation extraction with multi-lingual attention," in Proc. 55th Annu. Meeting Assoc. Comput. Linguistics, 2017, pp. 34–43.
[34] L. Cui, F. Wei, and M. Zhou, "Neural open information extraction," in Proc. 56th Annu. Meeting Assoc. Comput. Linguistics, 2018, pp. 407–413.
[35] Q. Zhu, X. Ren, J. Shang, Y. Zhang, F. F. Xu, and J. Han, "Open information extraction with global structure constraints," in Companion Proc. Web Conf., 2018, pp. 57–58.
[36] V. Nastase and M. Strube, "Decoding Wikipedia categories for knowledge acquisition," in Proc. 23rd Nat. Conf. Artif. Intell., 2008, pp. 1219–1224.
[37] M. Pasca, "German typographers vs. German grammar: Decomposition of Wikipedia category labels into attribute-value pairs," in Proc. 10th ACM Int. Conf. Web Search Data Mining, 2017, pp. 315–324.
[38] V. Shwartz and C. Waterson, "Olive oil is made of olives, baby oil is made for babies: Interpreting noun compounds using paraphrases in a neural model," in Proc. Annu. Conf. North Amer. Chapter Assoc. Comput. Linguistics, 2018, pp. 218–224.
[39] P. Nakov and M. A. Hearst, "Using verbs to characterize noun-noun relations," in Proc. Int. Conf. Artif. Intell.: Methodology Syst. Appl., 2006, pp. 233–244.
[40] A. Grycner and G. Weikum, "POLY: Mining relational paraphrases from multilingual sentences," in Proc. Conf. Empirical Methods Natural Lang. Process., 2016, pp. 2183–2192.
[41] I. Hendrickx, Z. Kozareva, P. Nakov, D. Ó. Séaghdha, S. Szpakowicz, and T. Veale, "SemEval-2013 task 4: Free paraphrases of noun compounds," in Proc. 2nd Joint Conf. Lexical Comput. Semantics: Proc. 7th Int. Workshop Semantic Eval., 2013, pp. 138–143.
[42] L. Qiu and Y. Zhang, "ZORE: A syntax-based system for Chinese open relation extraction," in Proc. Conf. Empirical Methods Natural Lang. Process., 2014, pp. 1870–1880.
[43] Y. Tseng, L. Lee, S. Lin, B. Liao, M. Liu, H. Chen, O. Etzioni, and A. Fader, "Chinese open relation extraction for knowledge acquisition," in Proc. 14th Conf. Eur. Chapter Assoc. Comput. Linguistics, 2014, pp. 12–16.
[44] S. Jia, S. E, M. Li, and Y. Xiang, "Chinese open relation extraction and knowledge base establishment," ACM Trans. Asian Low-Resource Lang. Inf. Process., vol. 17, no. 3, pp. 15:1–15:22, 2018.
[45] C. Wang, Y. Fan, X. He, and A. Zhou, "Decoding Chinese user generated categories for fine-grained knowledge harvesting," IEEE Trans. Knowl. Data Eng., vol. 31, no. 8, pp. 1491–1505, Aug. 2019.
[46] D. B. Lenat, "CYC: A large-scale investment in knowledge infrastructure," Commun. ACM, vol. 38, no. 11, pp. 32–38, 1995.
[47] R. Speer and C. Havasi, "Representing general relational knowledge in ConceptNet 5," in Proc. 8th Int. Conf. Lang. Resources Eval., 2012, pp. 3679–3686.
[48] K. Narisawa, Y. Watanabe, J. Mizuno, N. Okazaki, and K. Inui, "Is a 204 cm man tall or small? Acquisition of numerical common sense from the web," in Proc. 51st Annu. Meeting Assoc. Comput. Linguistics, 2013, pp. 382–391.
[49] G. Collell, L. V. Gool, and M. Moens, "Acquiring common sense spatial knowledge through implicit spatial templates," in Proc. 32nd AAAI Conf. Artif. Intell., 2018, pp. 6765–6772.
[50] F. F. Xu, B. Y. Lin, and K. Q. Zhu, "Automatic extraction of commonsense LocatedNear knowledge," in Proc. 56th Annu. Meeting Assoc. Comput. Linguistics, 2018, pp. 96–101.
[51] M. Pasca, "Interpreting compound noun phrases using web search queries," in Proc. Conf. North Amer. Chapter Assoc. Comput. Linguistics, 2015, pp. 335–344.
[52] S. Cordeiro, C. Ramisch, M. Idiart, and A. Villavicencio, "Predicting the compositionality of nominal compounds: Giving word embeddings a hard time," in Proc. 54th Annu. Meeting Assoc. Comput. Linguistics, 2016, pp. 1986–1997.
[53] B. Alidaee, F. Glover, G. A. Kochenberger, and H. Wang, "Solving the maximum edge weight clique problem via unconstrained quadratic programming," Eur. J. Oper. Res., vol. 181, no. 2, pp. 592–597, 2007.
[54] X. Qiu, Q. Zhang, and X. Huang, "FudanNLP: A toolkit for Chinese natural language processing," in Proc. 51st Annu. Meeting Assoc. Comput. Linguistics, 2013, pp. 49–54.
[55] D. Zhang, J. Yuan, X. Wang, and A. Foster, "Probabilistic verb selection for data-to-text generation," Trans. Assoc. Comput. Linguistics, vol. 6, pp. 511–527, 2018.
[56] C. Zhai and J. D. Lafferty, "A study of smoothing methods for language models applied to ad hoc information retrieval," ACM SIGIR Forum, vol. 51, no. 2, pp. 268–276, 2017.



[57] T. Mikolov, K. Chen, G. Corrado, and J. Dean, "Efficient estimation of word representations in vector space," in Proc. 1st Int. Conf. Learn. Representations, 2013.
[58] L. B. Soares, N. FitzGerald, J. Ling, and T. Kwiatkowski, "Matching the blanks: Distributional similarity for relation learning," in Proc. 57th Annu. Meeting Assoc. Comput. Linguistics, 2019, pp. 2895–2905.
[59] J. Devlin, M. Chang, K. Lee, and K. Toutanova, "BERT: Pre-training of deep bidirectional transformers for language understanding," in Proc. Conf. North Amer. Chapter Assoc. Comput. Linguistics, 2019, pp. 4171–4186.
[60] W. Wu, H. Li, H. Wang, and K. Q. Zhu, "Probase: A probabilistic taxonomy for text understanding," in Proc. ACM SIGMOD Int. Conf. Manage. Data, 2012, pp. 481–492.

Chengyu Wang received the BE degree in software engineering from East China Normal University, in 2015. He is currently working toward the PhD degree in the School of Software Engineering, East China Normal University (ECNU), China. His research interests include Web data mining, information extraction, and natural language processing. He is working on the construction and application of large-scale knowledge graphs.

Xiaofeng He received the PhD degree from Pennsylvania State University, Pennsylvania. He is a professor of computer science at the School of Computer Science and Technology, East China Normal University, China. His research interests include machine learning, data mining, and information retrieval. Prior to joining ECNU, he worked at Microsoft, Yahoo Labs, and Lawrence Berkeley National Laboratory. He is a member of the IEEE.

Aoying Zhou is a professor with East China Normal University. He is now acting as a vice director of the ACM SIGMOD China and the Database Technology Committee of the China Computer Federation. He is serving as a member of the editorial boards of the VLDB Journal, WWW Journal, etc. His research interests include data management for data-intensive computing, and memory cluster computing. He is a member of the IEEE.


For more information on this or any other computing topic, please visit our Digital Library at www.computer.org/csdl.

IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 31, NO. X, XXXXX 2019