
Page 1: Exploiting Background Knowledge for Relation Extraction

Exploiting Background Knowledge for Relation Extraction

Yee Seng Chan and Dan Roth

University of Illinois at Urbana-Champaign


Page 2: Exploiting Background Knowledge for Relation Extraction

Relation Extraction

Relation extraction (RE), e.g.:
“David Cone, a Kansas City native, was originally signed by the Royals and broke into the majors with the team”

Supervised RE: train on sentences annotated with entity mentions and predefined target relations.
Common features: BOW, POS tags, syntactic/dependency parses, kernel functions based on structured representations of the sentence.


Page 3: Exploiting Background Knowledge for Relation Extraction

Background Knowledge

Features employed are usually restricted to those defined on the various representations of the target sentence itself.

Humans rely on background knowledge to recognize relations

Overall aim of this work: propose methods of using knowledge or resources that exist beyond the sentence (Wikipedia, word clusters, a hierarchy of relations, entity type constraints, coreference), either as additional features or under the Constrained Conditional Model (CCM) framework with Integer Linear Programming (ILP).

Page 4: Exploiting Background Knowledge for Relation Extraction

“David Cone, a Kansas City native, was originally signed by the Royals and broke into the majors with the team”

Using Background Knowledge


Page 8: Exploiting Background Knowledge for Relation Extraction

“David Cone, a Kansas City native, was originally signed by the Royals and broke into the majors with the team”

Using Background Knowledge

David Brian Cone (born January 2, 1963) is a former Major League Baseball pitcher. He compiled an 8–3 postseason record over 21 postseason starts and was a part of five World Series championship teams (1992 with the Toronto Blue Jays and 1996, 1998, 1999 & 2000 with the New York Yankees). He had a career postseason ERA of 3.80. He is the subject of the book A Pitcher's Story: Innings With David Cone by Roger Angell. Fans of David are known as "Cone-Heads." Cone lives in Stamford, Connecticut, and is formerly a color commentator for the Yankees on the YES Network.

Contents: 1 Early years, 2 Kansas City Royals, 3 New York Mets

Partly because of the resulting lack of leadership, after the 1994 season the Royals decided to reduce payroll by trading pitcher David Cone and outfielder Brian McRae, then continued their salary dump in the 1995 season. In fact, the team payroll, which was always among the league's highest, was sliced in half from $40.5 million in 1994 (fourth-highest in the major leagues) to $18.5 million in 1996 (second-lowest in the major leagues)

Page 9: Exploiting Background Knowledge for Relation Extraction

“David Cone, a Kansas City native, was originally signed by the Royals and broke into the majors with the team”

Using Background Knowledge

fine-grained
Employment:Staff      0.20
Employment:Executive  0.15
Personal:Family       0.10
Personal:Business     0.10
Affiliation:Citizen   0.20
Affiliation:Based-in  0.25

Page 10: Exploiting Background Knowledge for Relation Extraction

“David Cone, a Kansas City native, was originally signed by the Royals and broke into the majors with the team”

Using Background Knowledge

fine-grained                 coarse-grained
Employment:Staff      0.20   Employment   0.35
Employment:Executive  0.15
Personal:Family       0.10   Personal     0.40
Personal:Business     0.10
Affiliation:Citizen   0.20   Affiliation  0.25
Affiliation:Based-in  0.25


Page 12: Exploiting Background Knowledge for Relation Extraction

“David Cone, a Kansas City native, was originally signed by the Royals and broke into the majors with the team”

Using Background Knowledge

fine-grained                 coarse-grained
Employment:Staff      0.20   Employment   0.35
Employment:Executive  0.15
Personal:Family       0.10   Personal     0.40
Personal:Business     0.10
Affiliation:Citizen   0.20   Affiliation  0.25
Affiliation:Based-in  0.25

best consistent combination: Employment:Staff + Employment = 0.20 + 0.35 = 0.55

Page 13: Exploiting Background Knowledge for Relation Extraction

Basic Relation Extraction (RE) System

Our basic RE system: given a sentence “... m1 ... m2 ...”, predict whether any predefined relation holds between m1 and m2.
Relations are asymmetric, e.g. m1:r:m2 vs m2:r:m1.


Page 15: Exploiting Background Knowledge for Relation Extraction

Basic Relation Extraction (RE) System

Most of the features are based on the work of Zhou et al. (2005)

Lexical: head word (hw), BOW, bigrams, ...
Collocations: words to the left/right of the mentions, ...
Structural: m1-in-m2, #mentions between m1 and m2, ...
Entity typing: entity types of m1 and m2, ...
Dependency: dependency path between m1 and m2, ...
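A minimal Python sketch of these feature templates, assuming a tokenized sentence and (start, end) token spans for the two mentions; the feature names and the token-gap stand-in for the #mentions-between count are illustrative, not the system's actual templates.

```python
def basic_features(tokens, m1, m2):
    """Sketch of lexical, collocation, and structural features for a
    mention pair; m1 and m2 are (start, end) token spans, m1 first."""
    feats = {}
    # Lexical: bag of words over both mentions
    for tok in tokens[m1[0]:m1[1]] + tokens[m2[0]:m2[1]]:
        feats[f"bow={tok.lower()}"] = 1.0
    # Collocations: words immediately to the left/right of the mentions
    if m1[0] > 0:
        feats[f"left_of_m1={tokens[m1[0] - 1].lower()}"] = 1.0
    if m2[1] < len(tokens):
        feats[f"right_of_m2={tokens[m2[1]].lower()}"] = 1.0
    # Structural: token gap stands in for the #mentions-between count
    feats[f"gap={m2[0] - m1[1]}"] = 1.0
    return feats

sent = "David Cone , a Kansas City native , was signed by the Royals".split()
print(basic_features(sent, (0, 2), (11, 13)))  # m1 = David Cone, m2 = the Royals
```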


Page 16: Exploiting Background Knowledge for Relation Extraction

Knowledge Sources

As additional features: Wikipedia, word clusters
As constraints: hierarchy of relations, entity type constraints, coreference

Page 17: Exploiting Background Knowledge for Relation Extraction

Knowledge1: Wikipedia (first feature)

(as additional feature)

We use a Wikifier system (Ratinov et al., 2010) which performs context-sensitive mapping of mentions to Wikipedia pages

Introduce a new feature based on:

w1(mi, mj) = 1 if A_mj(mi) or A_mi(mj), and 0 otherwise

and a second feature that combines the above with the coarse-grained entity types of mi, mj.

(diagram: mention pair mi, mj; does some relation r hold between them?)

Page 18: Exploiting Background Knowledge for Relation Extraction

Knowledge1: Wikipedia (second feature)

(as additional feature)

Given mi,mj, we use a Parent-Child system (Do and Roth, 2010) to predict whether they have a parent-child relation

Introduce a new feature based on:

w2(mi, mj) = 1 if parent-child(mi, mj), and 0 otherwise

and combine the above with the coarse-grained entity types of mi, mj.

(diagram: mention pair mi, mj; parent-child?)
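A hedged sketch of both Wikipedia features. wiki_article(m) and parent_child(mi, mj) are stand-ins for the Wikifier (Ratinov et al., 2010) and Parent-Child (Do and Roth, 2010) systems, which are not reimplemented here, and reading A_mi(mj) as “the article of mi mentions mj” is an assumption:

```python
def w1(mi, mj, wiki_article):
    """1 if the linked article of either mention contains the other mention;
    wiki_article(m) returns the article text, or None if m is not Wikified."""
    a_i, a_j = wiki_article(mi), wiki_article(mj)
    return int((a_j is not None and mi in a_j) or
               (a_i is not None and mj in a_i))

def w2(mi, mj, parent_child):
    """1 if the Parent-Child system predicts parent-child(mi, mj)."""
    return int(parent_child(mi, mj))
```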

Page 19: Exploiting Background Knowledge for Relation Extraction

Knowledge2: Word Class Information (as additional feature)

Supervised systems face an issue of data sparseness (of lexical features)

Use class information of words to support better generalization, instantiated as word clusters in our work: automatically generated from unlabeled text using the algorithm of Brown et al. (1992).

(diagram: Brown clustering binary tree over the words apple, pear, Apple, IBM, bought, run, of, in; the 0/1 branch labels along the path from the root give each word's bit string, e.g. IBM → 011)


Page 22: Exploiting Background Knowledge for Relation Extraction

Knowledge2: Word Class Information

All lexical features consisting of a single word are duplicated with that word's corresponding bit-string representation.

(diagram: the same binary tree, with the four 2-bit cluster prefixes 00, 01, 10, 11 marking its subtrees)
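A minimal sketch of the duplication step, assuming a toy word-to-bit-string table in place of a real Brown clustering and an illustrative 2-bit prefix length:

```python
# Toy bit strings; a real table comes from Brown clustering of unlabeled text
PATHS = {"apple": "000", "pear": "001", "Apple": "010", "IBM": "011",
         "bought": "100", "run": "101", "of": "110", "in": "111"}

def add_cluster_features(feats, prefix_len=2):
    """Duplicate every single-word lexical feature with the word's
    bit-string prefix, so rare words share features with their cluster."""
    extra = {}
    for name in feats:
        key, sep, word = name.partition("=")
        if sep and word in PATHS:
            extra[f"{key}_cluster={PATHS[word][:prefix_len]}"] = 1.0
    feats.update(extra)
    return feats

print(add_cluster_features({"bow=IBM": 1.0}))
# {'bow=IBM': 1.0, 'bow_cluster=01': 1.0}
```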

Page 23: Exploiting Background Knowledge for Relation Extraction

Knowledge Sources

As additional features: Wikipedia, word clusters
As constraints: hierarchy of relations, entity type constraints, coreference

Page 24: Exploiting Background Knowledge for Relation Extraction

Constrained Conditional Models (CCMs) (Roth and Yih, 2007; Chang et al., 2008)

argmax_y  w·φ(x, y) − Σ_k ρ_k · d(y, 1_{C_k}(x))

w·φ(x, y): the weight vector for the “local” models, a collection of classifiers
ρ_k: penalty for violating constraint C_k
d(y, 1_{C_k}(x)): how far y is from a “legal” assignment


Page 26: Exploiting Background Knowledge for Relation Extraction

Constrained Conditional Models (CCMs) (Roth and Yih, 2007; Chang et al., 2008)

As features: Wikipedia, word clusters
As constraints: hierarchy of relations, entity type constraints, coreference

Page 27: Exploiting Background Knowledge for Relation Extraction

Constrained Conditional Models (CCMs) (Roth and Yih, 2007; Chang et al., 2008)

Goal of CCMs: when predicting multiple related variables, exploit the fact that they are related.
Encode knowledge as constraints on the interactions between the multiple predictions: constraints are imposed on the predictions of the various models.
This is a global inference problem: we learn separate models, then perform joint global inference to arrive at the final predictions.

Page 28: Exploiting Background Knowledge for Relation Extraction

“David Cone, a Kansas City native, was originally signed by the Royals and broke into the majors with the team”

Constrained Conditional Models (CCMs)

fine-grained                 coarse-grained
Employment:Staff      0.20   Employment   0.35
Employment:Executive  0.15
Personal:Family       0.10   Personal     0.40
Personal:Business     0.10
Affiliation:Citizen   0.20   Affiliation  0.25
Affiliation:Based-in  0.25

Page 29: Exploiting Background Knowledge for Relation Extraction

Constrained Conditional Models (CCMs) (Roth and Yih, 2007; Chang et al., 2008)

Key steps:
Write down a linear objective function
Write down the constraints as linear inequalities
Solve using integer linear programming (ILP) packages

Page 30: Exploiting Background Knowledge for Relation Extraction

Knowledge3: Relations between our target relations

(diagram: hierarchy of relations; coarse-grained relations such as personal and employment, with fine-grained children family and biz under personal, and staff and executive under employment)

Page 31: Exploiting Background Knowledge for Relation Extraction

Knowledge3: Hierarchy of Relations

(diagram: the same hierarchy; a coarse-grained classifier predicts over the coarse relations, a fine-grained classifier over their children)

Page 32: Exploiting Background Knowledge for Relation Extraction

Knowledge3: Hierarchy of Relations

(diagram: for a mention pair mi, mj, which coarse-grained relation holds, and which fine-grained relation?)


Page 38: Exploiting Background Knowledge for Relation Extraction

Knowledge3: Hierarchy of Relations

Write down a linear objective function:

max Σ_{R∈ℛ} Σ_{rc∈L_c} p(R, rc)·x_{R,rc} + Σ_{R∈ℛ} Σ_{rf∈L_f} p(R, rf)·y_{R,rf}

p(R, rc): coarse-grained prediction probabilities; x_{R,rc}: coarse-grained indicator variable
p(R, rf): fine-grained prediction probabilities; y_{R,rf}: fine-grained indicator variable
(setting an indicator variable to 1 selects that relation assignment)


Page 40: Exploiting Background Knowledge for Relation Extraction

Knowledge3: Hierarchy of Relations

Write down the constraints:

If a relation R is assigned a coarse-grained label rc, then we must also assign to R a fine-grained label rf which is a child of rc:

x_{R,rc} ≤ y_{R,rf_1} + y_{R,rf_2} + ... + y_{R,rf_n}

(Capturing the inverse relationship) If we assign rf to R, then we must also assign to R the parent of rf, which is its corresponding coarse-grained label:

y_{R,rf} ≤ x_{R,parent(rf)}
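A brute-force sketch of the joint inference that this objective and these constraints define, reusing the toy probabilities from the earlier example slides (two coarse-grained labels instead of the full label set); an actual system would hand the same objective to an ILP package rather than enumerate assignments.

```python
from itertools import product

HIERARCHY = {  # coarse-grained label -> its fine-grained children
    "Employment": ["Employment:Staff", "Employment:Executive"],
    "Personal": ["Personal:Family", "Personal:Business"],
}
p_coarse = {"Employment": 0.35, "Personal": 0.40}
p_fine = {"Employment:Staff": 0.20, "Employment:Executive": 0.15,
          "Personal:Family": 0.10, "Personal:Business": 0.10}

def joint_infer(p_coarse, p_fine):
    """Maximize p(rc) + p(rf) subject to the hierarchy constraints: the
    chosen fine-grained label must be a child of the chosen coarse label."""
    best = None
    for rc, rf in product(p_coarse, p_fine):
        if rf not in HIERARCHY[rc]:  # enforces both constraints above
            continue
        score = p_coarse[rc] + p_fine[rf]
        if best is None or score > best[0]:
            best = (score, rc, rf)
    return best

print(joint_infer(p_coarse, p_fine))
# (0.55, 'Employment', 'Employment:Staff') -- the 0.55 from the example slide
```

Note that the coarse-grained classifier alone would pick Personal (0.40); the hierarchy constraints flip the joint decision to Employment:Staff.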

Page 41: Exploiting Background Knowledge for Relation Extraction

Knowledge4: Entity Type Constraints (Roth and Yih, 2004, 2007)

Entity types are useful for constraining the possible labels that a relation R can assume

Relation                 type of mi   type of mj
Employment:Staff         per          org
Employment:Executive     per          org
Personal:Family          per          per
Personal:Business        per          per
Affiliation:Citizen      per          gpe
Affiliation:Based-in     org          gpe
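A small sketch of using the table above as a hard filter, with the ACE-like type pairs shown there; under CCMs the same information is expressed as ILP constraints rather than a pre-filter.

```python
ALLOWED = {  # coarse label -> allowed (type of mi, type of mj) pairs
    "Employment": {("per", "org")},
    "Personal": {("per", "per")},
    "Affiliation": {("per", "gpe"), ("org", "gpe")},
}

def compatible_labels(type_i, type_j):
    """Coarse-grained relations consistent with the mention entity types."""
    return [r for r, pairs in ALLOWED.items() if (type_i, type_j) in pairs]

print(compatible_labels("per", "org"))  # -> ['Employment']
```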


Page 43: Exploiting Background Knowledge for Relation Extraction

Knowledge4: Entity Type Constraints (Roth and Yih, 2004, 2007)

We gather information on entity type constraints from the ACE-2004 documentation and impose them on the coarse-grained relations.
By improving the coarse-grained predictions and combining them with the hierarchical constraints defined earlier, the improvements propagate to the fine-grained predictions.


Page 45: Exploiting Background Knowledge for Relation Extraction

Knowledge5: Coreference

In this work, we assume that we are given the coreference information, which is available from the ACE annotation.

(diagram: the candidate relation labels for a mention pair mi, mj, plus null; if mi and mj are coreferent, the pair is constrained to null)
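The constraint itself is one line; a sketch assuming gold coreference, as given by the ACE annotation:

```python
def apply_coref_constraint(label, mi, mj, coreferent):
    """Coreferent mention pairs cannot hold a relation; force null."""
    return "null" if coreferent(mi, mj) else label
```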

Page 46: Exploiting Background Knowledge for Relation Extraction

Experiments

Used the ACE-2004 dataset for our experiments.
Relations do not cross sentence boundaries.
We model the argument order of the mentions: m1:r:m2 vs m2:r:m1.
Allow a null label prediction when the mentions are not related.
Classifiers: regularized averaged perceptrons, implemented within the SNoW framework (Carlson et al., 1999).
Followed prior work (Jiang and Zhai, 2007) and performed 5-fold cross validation.

Page 47: Exploiting Background Knowledge for Relation Extraction

Performance of the Basic RE System

Built a BasicRE system using only the basic features.
Compared against the state-of-the-art feature-based RE system of Jiang and Zhai (2007), who performed their evaluation using undirected coarse-grained relations (7 relation labels + 1 null label).
Evaluation on the nwire and bnews corpora of ACE-2004.

Performance (F1%):
Jiang and Zhai (2007): 71.5%    BasicRE: 71.2%

Page 48: Exploiting Background Knowledge for Relation Extraction

Experimental Settings

ACE-2004: 7 coarse-grained and 23 fine-grained relations.
Trained two classifiers: coarse-grained (15 relation labels) and fine-grained (47 relation labels); the label counts include both argument orders plus null.
Focus on evaluation of fine-grained relations.
Use the nwire corpus for our experiments: two of our knowledge sources (the Wiki system and the word clusters) assume mixed-case input, and the bnews corpus is lower-cased text.
28,943 relation instances, of which 2,226 are non-null.

Page 49: Exploiting Background Knowledge for Relation Extraction

Evaluation Settings

                 Prior work                            Our work
Data splits      train-test splits at mention level    train-test splits at document level
Evaluation       at mention level                      at entity level (more realistic)

Page 50: Exploiting Background Knowledge for Relation Extraction

Experimental Settings

Evaluate our performance at the entity level; prior work calculated RE performance at the level of mentions.
ACE annotators rarely duplicate a relation link for coreferent mentions.
Given a pair of entities, we establish the set of relation types existing between them, based on their mention annotations.

(diagram: a relation r is annotated between mi and mj, but not between mi and mk even though mk is coreferent with mj; at the mention level, should the mi, mk pair count as null?)
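A sketch of this entity-level aggregation, assuming mention-pair predictions and a mention-to-entity map derived from the coreference annotation (names illustrative):

```python
from collections import defaultdict

def entity_level_relations(mention_preds, entity_of):
    """Collapse {(mi, mj): label} mention-pair predictions to the set of
    non-null relation types holding between each pair of entities."""
    rels = defaultdict(set)
    for (mi, mj), label in mention_preds.items():
        if label != "null":
            rels[(entity_of[mi], entity_of[mj])].add(label)
    return dict(rels)
```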

Page 51: Exploiting Background Knowledge for Relation Extraction

Experiment Results (F1%): fine-grained relations

            All nwire   10% of nwire
BasicRE     50.5%       31.0%

Page 52: Exploiting Background Knowledge for Relation Extraction

Experiment Results (F1%): fine-grained relations

F1% improvement from using each knowledge source

            All nwire   10% of nwire
BasicRE     50.5%       31.0%
(chart: per-source improvements over the BasicRE baseline; values not preserved)

Page 53: Exploiting Background Knowledge for Relation Extraction

Related Work

Ji et al. (2005), Zhou et al. (2005,2008), Jiang (2009)


Page 54: Exploiting Background Knowledge for Relation Extraction

Conclusion

We proposed a broad range of methods to inject background knowledge into an RE system

Some methods (e.g. exploiting the relation hierarchy) are general in nature

To combine the various relation predictions, we perform global inference within an ILP framework. This allows for easy injection of knowledge as constraints and ensures globally coherent models and predictions.