Exploiting Background Knowledge for Relation Extraction
Yee Seng Chan and Dan Roth
University of Illinois at Urbana-Champaign
1
Relation Extraction
Relation extraction (RE): "David Cone, a Kansas City native, was originally signed by the Royals and broke into the majors with the team"
Supervised RE: train on sentences annotated with entity mentions and predefined target relations
Common features: BOW, POS tags, syntactic/dependency parses, kernel functions based on structured representations of the sentence
2
Background Knowledge
Features employed are usually restricted to being defined on the various representations of the target sentences
Humans rely on background knowledge to recognize relations
Overall aim of this work: propose methods of using knowledge or resources that exist beyond the sentence
Wikipedia, word clusters, hierarchy of relations, entity type constraints, coreference
As additional features, or under the Constraint Conditional Model (CCM) framework with Integer Linear Programming (ILP)
3
4
David Cone, a Kansas City native, was originally signed by the Royals and broke into the majors with the team
Using Background Knowledge
Using Background Knowledge
David Brian Cone (born January 2, 1963) is a former Major League Baseball pitcher. He compiled an 8–3 postseason record over 21 postseason starts and was a part of five World Series championship teams (1992 with the Toronto Blue Jays and 1996, 1998, 1999 & 2000 with the New York Yankees). He had a career postseason ERA of 3.80. He is the subject of the book A Pitcher's Story: Innings With David Cone by Roger Angell. Fans of David are known as "Cone-Heads." Cone lives in Stamford, Connecticut, and is formerly a color commentator for the Yankees on the YES Network.
Partly because of the resulting lack of leadership, after the 1994 season the Royals decided to reduce payroll by trading pitcher David Cone and outfielder Brian McRae, then continued their salary dump in the 1995 season. In fact, the team payroll, which was always among the league's highest, was sliced in half from $40.5 million in 1994 (fourth-highest in the major leagues) to $18.5 million in 1996 (second-lowest in the major leagues)
9
David Cone, a Kansas City native, was originally signed by the Royals and broke into the majors with the team
Using Background Knowledge
fine-grained
Employment:Staff 0.20
Employment:Executive 0.15
Personal:Family 0.10
Personal:Business 0.10
Affiliation:Citizen 0.20
Affiliation:Based-in 0.25
10
David Cone, a Kansas City native, was originally signed by the Royals and broke into the majors with the team
Using Background Knowledge
fine-grained                  coarse-grained
Employment:Staff      0.20    Employment    0.35
Employment:Executive  0.15
Personal:Family       0.10    Personal      0.40
Personal:Business     0.10
Affiliation:Citizen   0.20    Affiliation   0.25
Affiliation:Based-in  0.25
11
David Cone, a Kansas City native, was originally signed by the Royals and broke into the majors with the team
Using Background Knowledge
fine-grained                  coarse-grained
Employment:Staff      0.20    Employment    0.35
Employment:Executive  0.15
Personal:Family       0.10    Personal      0.40
Personal:Business     0.10
Affiliation:Citizen   0.20    Affiliation   0.25
Affiliation:Based-in  0.25

best consistent pair: Employment:Staff + Employment = 0.20 + 0.35 = 0.55
Basic Relation Extraction (RE) System
Our basic RE system: given a sentence "... m1 ... m2 ...", predict whether any predefined relation holds
Asymmetric relations, e.g. m1:r:m2 vs m2:r:m1
13
Basic Relation Extraction (RE) System
Most of the features are based on the work in (Zhou et al., 2005)
Lexical: head word (hw), BOW, bigrams, ...
Collocations: words to the left/right of the mentions, ...
Structural: m1-in-m2, #mentions between m1, m2, ...
Entity typing: m1, m2 entity types, ...
Dependency: dependency path between m1, m2, ...
15
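The feature groups above can be sketched as a small extractor. The function below is a hypothetical illustration; the names and the exact feature inventory are mine, not the authors':

```python
def extract_features(tokens, i1, i2):
    """Sketch of basic RE features for one mention pair (hypothetical
    helper). tokens: sentence tokens; i1, i2: token indices of the two
    mention head words, with i1 < i2."""
    feats = set()
    # Lexical: head words of the two mentions
    feats.add("hw1=" + tokens[i1])
    feats.add("hw2=" + tokens[i2])
    # Collocations: words immediately to the left/right of the mentions
    if i1 > 0:
        feats.add("left1=" + tokens[i1 - 1])
    if i2 + 1 < len(tokens):
        feats.add("right2=" + tokens[i2 + 1])
    # Structural: number of tokens between the two mentions
    feats.add("dist=" + str(i2 - i1 - 1))
    return feats
```

In a real system these binary features would be fed to the classifier alongside the POS, parse, and entity-typing features listed above.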
Knowledge Sources
As additional features: Wikipedia, word clusters
As constraints: hierarchy of relations, entity type constraints, coreference
16
Knowledge1: Wikipedia1
(as additional feature)
We use a Wikifier system (Ratinov et al., 2010) which performs context-sensitive mapping of mentions to Wikipedia pages
Introduce a new feature:

    w1(mi, mj) = 1, if A_{mj}(mi) or A_{mi}(mj); 0, otherwise

where A_{m}(m') indicates that the Wikipedia article of m mentions m'
Introduce a second feature by combining the above with the coarse-grained entity types of mi, mj
17
(figure: mi ... mj, relation r?)
Knowledge1: Wikipedia2
(as additional feature)
Given mi,mj, we use a Parent-Child system (Do and Roth, 2010) to predict whether they have a parent-child relation
Introduce a new feature:

    w2(mi, mj) = 1, if parent-child(mi, mj); 0, otherwise

Combine the above with the coarse-grained entity types of mi, mj
18
(figure: mi ... mj, parent-child?)
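As a concrete reading of the first Wikipedia feature, here is a minimal sketch of w1, assuming a hypothetical `article_text` lookup that stands in for the Wikifier's output:

```python
def w1_feature(mi, mj, article_text):
    """Sketch of the w1 indicator (hypothetical interface): article_text
    maps a mention string to the text of its Wikified article, or is
    missing the key when the Wikifier finds no page."""
    a_i = article_text.get(mi)
    a_j = article_text.get(mj)
    # 1 if the Wikipedia article of one mention mentions the other
    return int((a_i is not None and mj in a_i) or
               (a_j is not None and mi in a_j))
```

The real feature also conjoins this bit with the coarse-grained entity types of the two mentions, which is omitted here.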
Knowledge2: Word Class Information (as additional feature)
Supervised systems face an issue of data sparseness (of lexical features)
Use class information of words to support better generalization: instantiated as word clusters in our work
Automatically generated from unlabeled text using the algorithm of (Brown et al., 1992)
(figure: binary Brown-cluster tree with 0/1-labeled branches; leaves: apple, pear, Apple, IBM, bought, run, of, in; each word's cluster is the bit string of branch decisions from the root)
19
Knowledge2: Word Class Information
All lexical features consisting of a single word are duplicated with the corresponding bit-string representation
(figure: the same binary tree; using 2-bit prefixes, the leaf words fall into the clusters 00, 01, 10, 11)
22
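The duplication of single-word lexical features with cluster bit strings can be sketched as follows; the prefix lengths and the `brown_path` lookup are illustrative assumptions, not the paper's exact configuration:

```python
def cluster_features(word, brown_path, prefix_lengths=(4, 6, 10)):
    """Duplicate a single-word lexical feature with bit-string prefixes
    of its Brown-cluster path. brown_path maps a word to its 0/1 path
    in the cluster tree (hypothetical data layout)."""
    feats = ["w=" + word]
    path = brown_path.get(word)
    if path:
        # Prefixes of the bit string give clusters of varying granularity
        for n in prefix_lengths:
            feats.append("bits%d=%s" % (n, path[:n]))
    return feats
```

Using prefixes at several lengths lets the classifier back off from a rare word to progressively coarser word classes.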
Knowledge Sources
As additional features: Wikipedia, word clusters
As constraints: hierarchy of relations, entity type constraints, coreference
23
Constraint Conditional Models (CCMs) (Roth and Yih, 2007; Chang et al., 2008)

    argmax_y  w · φ(x, y)  −  Σ_i ρ_i · d(y, 1_{C_i}(x))

w · φ(x, y): the weight vector for the "local" models, a collection of classifiers
ρ_i: the penalty for violating constraint C_i
d(y, 1_{C_i}(x)): how far y is from a "legal" assignment
Local models (features): Wikipedia, word clusters
Constraints: hierarchy of relations, entity type constraints, coreference
26
Goal of CCM: when you want to predict multiple variables, exploit the fact that they are related
Encode knowledge as constraints to capture the interaction between the multiple predictions
You impose constraints on the predictions of your various models; this is a global inference problem
We learn separate models and then perform joint global inference to arrive at the final predictions
27
28
David Cone, a Kansas City native, was originally signed by the Royals and broke into the majors with the team
Constraint Conditional Models (CCMs)
fine-grained                  coarse-grained
Employment:Staff      0.20    Employment    0.35
Employment:Executive  0.15
Personal:Family       0.10    Personal      0.40
Personal:Business     0.10
Affiliation:Citizen   0.20    Affiliation   0.25
Affiliation:Based-in  0.25
Key steps:
Write down a linear objective function
Write down constraints as linear inequalities
Solve using integer linear programming (ILP) packages
29
Knowledge3: Relations between our target relations
(figure: relation hierarchy; coarse-grained labels personal and employment; fine-grained children: family and biz under personal, staff and executive under employment)
30
Knowledge3: Hierarchy of Relations
(figure: the same hierarchy, with a coarse-grained classifier predicting coarse labels and a fine-grained classifier predicting fine labels for a mention pair mi, mj)
32
Knowledge3: Hierarchy of Relations
Write down a linear objective function:

    max  Σ_{rc ∈ Rc} p(R, rc) · x_{R,rc}  +  Σ_{rf ∈ Rf} p(R, rf) · y_{R,rf}

p(R, rc): coarse-grained prediction probabilities; p(R, rf): fine-grained prediction probabilities
x_{R,rc}: coarse-grained indicator variable; y_{R,rf}: fine-grained indicator variable
setting an indicator variable == assigning that relation label
39
Knowledge3: Hierarchy of Relations
Write down constraints:
If a relation R is assigned a coarse-grained label rc, then we must also assign to R a fine-grained label rf which is a child of rc:

    x_{R,rc} ≤ y_{R,rf1} + y_{R,rf2} + ... + y_{R,rfn}

(Capturing the inverse relationship) If we assign rf to R, then we must also assign to R the parent of rf, which is the corresponding coarse-grained label:

    y_{R,rf} ≤ x_{R,parent(rf)}

40
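For a single relation instance, the ILP above can be solved by enumeration, since a feasible assignment is just a consistent (coarse, fine) pair. This sketch uses the illustrative probabilities from the slides and recovers the combined 0.55 score:

```python
# Brute-force version of the ILP for one relation instance: among the
# assignments satisfying the hierarchy constraints (the fine label's
# parent must equal the coarse label), pick the highest-scoring pair.
# Probabilities are the illustrative numbers from the slides.
p_coarse = {"Employment": 0.35, "Personal": 0.40, "Affiliation": 0.25}
p_fine = {"Employment:Staff": 0.20, "Employment:Executive": 0.15,
          "Personal:Family": 0.10, "Personal:Business": 0.10,
          "Affiliation:Citizen": 0.20, "Affiliation:Based-in": 0.25}

def best_consistent(p_coarse, p_fine):
    best = None
    for rf, pf in p_fine.items():
        rc = rf.split(":")[0]  # parent(rf) is encoded in the label here
        score = p_coarse[rc] + pf
        if best is None or score > best[0]:
            best = (score, rc, rf)
    return best

score, rc, rf = best_consistent(p_coarse, p_fine)
# Employment (0.35) + Employment:Staff (0.20) wins with 0.55
```

A real system would hand the objective and inequalities to an ILP solver, since the constraints (entity types, coreference) interact across many variables; the enumeration above is only feasible for this toy case.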
Knowledge4: Entity Type Constraints(Roth and Yih, 2004, 2007)
Entity types are useful for constraining the possible labels that a relation R can assume
41
mi mj
Employment:Staff
Employment:Executive
Personal:Family
Personal:Business
Affiliation:Citizen
Affiliation:Based-in
Entity types are useful for constraining the possible labels that a relation R can assume:

Employment:Staff        (per, org)
Employment:Executive    (per, org)
Personal:Family         (per, per)
Personal:Business       (per, per)
Affiliation:Citizen     (per, gpe)
Affiliation:Based-in    (org, gpe)
42
Knowledge4: Entity Type Constraints (Roth and Yih, 2004, 2007)
We gather information on entity type constraints from the ACE-2004 documentation and impose them on the coarse-grained relations
By improving the coarse-grained predictions and combining with the hierarchical constraints defined earlier, the improvements propagate to the fine-grained predictions
43
Knowledge4: Entity Type Constraints (Roth and Yih, 2004, 2007)
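A minimal sketch of entity type constraints as a filter on the coarse-grained labels; the `ALLOWED` table below follows the slides' illustration and is not the full ACE-2004 constraint set:

```python
# Allowed (arg1-type, arg2-type) pairs per coarse-grained label.
# Illustrative subset only, mirroring the slide's example figure.
ALLOWED = {
    "Employment":  {("per", "org")},
    "Personal":    {("per", "per")},
    "Affiliation": {("per", "gpe"), ("org", "gpe")},
}

def allowed_labels(t1, t2):
    """Coarse-grained labels compatible with mention types (t1, t2)."""
    return {r for r, pairs in ALLOWED.items() if (t1, t2) in pairs}
```

In the CCM formulation this filter becomes linear constraints that force the indicator variables of incompatible labels to zero.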
Knowledge5: Coreference
44
(figure: mention pair mi, mj and the candidate relation labels)
Knowledge5: Coreference
In this work, we assume that we are given the coreference information, which is available from the ACE annotation.
45
(figure: when mi and mj corefer, the relation between them is constrained to null)
Experiments
Used the ACE-2004 dataset for our experiments
Relations do not cross sentence boundaries
We model the argument order (of the mentions): m1:r:m2 vs m2:r:m1
Allow a null label prediction when mentions are not related
Classifiers: regularized averaged perceptrons implemented within SNoW (Carlson et al., 1999)
Followed prior work (Jiang and Zhai, 2007) and performed 5-fold cross validation
46
Performance of the Basic RE System
Build a BasicRE system using only the basic features
Compare against the state-of-the-art feature-based RE system of Jiang and Zhai (2007)
The authors performed their evaluation using undirected coarse-grained relations (7 relation labels + 1 null label)
Evaluation on the nwire and bnews corpora of ACE-2004
Performance (F1%):
47
Jiang and Zhai (2007): 71.5    BasicRE: 71.2
Experimental Settings
ACE-2004: 7 coarse-grained and 23 fine-grained relations
Trained two classifiers: coarse-grained (15 relation labels) and fine-grained (47 relation labels), counting both argument orders plus the null label
Focus on evaluation of fine-grained relations
Use the nwire corpus for our experiments: two of our knowledge sources (the Wiki system and word clusters) assume mixed-case input, and the bnews corpus is in lower-cased text
28,943 relation instances, of which 2,226 are non-null
48
Evaluation Settings
49
Prior work: train-test data splits at the mention level; evaluation at the mention level
Our work: train-test data splits at the document level; evaluation at the entity level (more realistic)
Experimental Settings
Evaluate our performance at the entity level
Prior work calculated RE performance at the level of mentions
ACE annotators rarely duplicate a relation link for coreferent mentions
Given a pair of entities, we establish the set of relation types existing between them, based on their mention annotations
(figure: relation r annotated between mi and mj; mk corefers with mj, but the pair mi, mk is left unannotated, i.e. null?)
50
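Lifting mention-level annotations to entity-level relation sets can be sketched as follows, assuming a hypothetical mapping from mentions to their entities:

```python
def entity_level_relations(mention_relations, entity_of):
    """For each pair of entities, collect the set of relation types that
    holds between any of their mentions (hypothetical data layout:
    mention_relations maps (mention1, mention2) to a relation label,
    entity_of maps a mention to its entity id)."""
    pairs = {}
    for (m1, m2), rel in mention_relations.items():
        key = (entity_of[m1], entity_of[m2])
        pairs.setdefault(key, set()).add(rel)
    return pairs
```

Scoring against these entity-level sets avoids penalizing a system for not duplicating a relation link across coreferent mentions.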
Experiment Results (F1%): fine-grained relations
51
          All nwire   10% of nwire
BasicRE   50.5        31.0
Experiment Results (F1%): fine-grained relations
52
F1% improvement from using each knowledge source
          All nwire   10% of nwire
BasicRE   50.5        31.0
Related Work
Ji et al. (2005), Zhou et al. (2005,2008), Jiang (2009)
53
Conclusion
We proposed a broad range of methods to inject background knowledge into an RE system
Some methods (e.g. exploiting the relation hierarchy) are general in nature
To combine the various relation predictions, we perform global inference within an ILP framework
This allows for easy injection of knowledge as constraints
Ensures globally coherent models and predictions
54