Relation Extraction and Machine Learning for IE

Feiyu Xu

feiyu@dfki.de

Language Technology-Lab, DFKI, Saarbrücken

Relation in IE

Information Extraction is …

a technology that is futuristic from the user's point of view in the current information-driven world.

Rather than indicating which documents need to be read by a user, it extracts pieces of information that are salient to the user's needs …

provided by NIST: [http://www-nlpir.nist.gov/related_projects/muc/]

Information Extraction: A Pragmatic Approach

• Identify the types of entities that are relevant to a particular task

• Identify the range of facts that one is interested in for those entities

• Ignore everything else

[Appelt, 2003]

IE from Research Papers

Extracting Job Openings from the Web: Semi-Structured Data

foodscience.com-Job2

JobTitle: Ice Cream Guru

Employer: foodscience.com

JobCategory: Travel/Hospitality

JobFunction: Food Services

JobLocation: Upper Midwest

Contact Phone: 800-488-2611

DateExtracted: January 8, 2001

Source: www.foodscience.com/jobs_midwest.html

OtherCompanyJobs: foodscience.com-Job1
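The record above can be turned into slot fills with very little machinery. The sketch below assumes the cleaned record has one `Slot: value` pair per line. `parse_record` is a hypothetical helper and is not part of any system discussed in these slides.

```python
def parse_record(text: str) -> dict:
    """Turn 'Slot: value' lines into a slot -> value dictionary."""
    record = {}
    for line in text.splitlines():
        line = line.strip()
        if ":" not in line:
            continue  # skip blank lines and lines without a slot label (e.g. the record id)
        slot, value = line.split(":", 1)  # split at the first colon only
        record[slot.strip()] = value.strip()
    return record


job = """foodscience.com-Job2
JobTitle: Ice Cream Guru
Employer: foodscience.com
JobCategory: Travel/Hospitality
JobFunction: Food Services
JobLocation: Upper Midwest
Contact Phone: 800-488-2611
DateExtracted: January 8, 2001
Source: www.foodscience.com/jobs_midwest.html
OtherCompanyJobs: foodscience.com-Job1"""

print(parse_record(job)["JobTitle"])  # Ice Cream Guru
```

This works only because the page already labels its slots. Free text, as in the following slides, needs real extraction patterns or learned models.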

On the Notion of Relation Extraction

Relation Extraction is the cover term for those Information Extraction tasks in which instances of semantic relations are detected in natural language texts.
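One way to represent a detected relation instance is a relation name plus a tuple of (role, filler) arguments. The class below is a hypothetical illustration. The argument values are taken from examples elsewhere in these slides: the DFKI employment template and the management succession sentence.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class RelationInstance:
    """One detected instance of a semantic relation: a name plus (role, filler) arguments."""
    relation: str
    arguments: Tuple[Tuple[str, str], ...]

    @property
    def arity(self) -> int:
        # binary relations have two arguments, n-ary relations have n
        return len(self.arguments)


employment = RelationInstance("employee_of", (("person", "Feiyu Xu"), ("organisation", "DFKI")))
succession = RelationInstance(
    "management_succession",
    (("person_in", "Peter Smith"), ("person_out", "Hans Bloggs"),
     ("position", "COO"), ("organisation", "Acme Inc.")),
)
print(employment.arity, succession.arity)  # 2 4
```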

Types of Information Extraction in LT

• Topic Extraction
• Term Extraction
• Named Entity Extraction
• Binary Relation Extraction
• N-ary Relation Extraction
• Event Extraction
• Answer Extraction
• Opinion Extraction
• Sentiment Extraction

Types of Relation Extraction

Relation Extraction is a demanding sub-area of Information Extraction

Example of Binary Social Relations: Social Network of "Madonna" (Depth = 1)

Examples of Binary Relations: Social Network of "My Chemical Romance" (Depth = 1)

Examples

Social Network of "Madonna" (Depth = 3)
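A social network of depth k around a person can be computed from extracted binary relations with a breadth-first search. The sketch treats every extracted pair as an undirected edge. The person names are hypothetical placeholders, not real extraction results.

```python
from collections import deque
from typing import Dict, Iterable, Set, Tuple


def neighbourhood(edges: Iterable[Tuple[str, str]], centre: str, depth: int) -> Dict[str, int]:
    """Breadth-first search: every person within `depth` relation hops of `centre`."""
    adjacency: Dict[str, Set[str]] = {}
    for a, b in edges:  # treat each binary relation as an undirected edge
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    distance = {centre: 0}
    queue = deque([centre])
    while queue:
        node = queue.popleft()
        if distance[node] == depth:
            continue  # do not expand past the requested depth
        for neighbour in adjacency.get(node, ()):
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    return distance


# Hypothetical relation pairs, not extracted from any real corpus.
pairs = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "E")]
print(sorted(neighbourhood(pairs, "A", 1)))  # ['A', 'B', 'E']
```

Going from depth 1 to depth 3 adds only the people reachable through already-known neighbours. This is why the depth-3 network of the same person is much larger.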

Relation about Person, Title and Organization

October 14, 2002, 4:00 a.m. PT

For years, Microsoft Corporation CEO Bill Gates railed against the economic philosophy of open-source software with Orwellian fervor, denouncing its communal licensing as a "cancer" that stifled technological innovation.

Today, Microsoft claims to "love" the open-source concept, by which software code is made public to encourage improvement and development by outside programmers. Gates himself says Microsoft will gladly disclose its crown jewels--the coveted code behind the Windows operating system--to select customers.

"We can be open source. We love the concept of shared source," said Bill Veghte, a Microsoft VP. "That's a super-important shift for us in terms of code access."

Richard Stallman, founder of the Free Software Foundation, countered saying…

NAME              TITLE     ORGANIZATION
Bill Gates        CEO       Microsoft
Bill Veghte       VP        Microsoft
Richard Stallman  founder   Free Soft..
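A very simple way to fill this table is to run hand-written patterns over named-entity-tagged text. The sketch assumes a hypothetical NE recognizer has already split each sentence into chunks labelled PER, ORG, TITLE or O (outside). Each of the three patterns is written to match one of the constructions in the article above. This is an illustration, not the method behind the slide.

```python
from typing import List, Tuple

Chunk = Tuple[str, str]  # (surface text, label); "O" marks text outside any entity


def extract_person_title_org(chunks: List[Chunk]) -> List[Tuple[str, str, str]]:
    """Match three hand-written patterns and return (NAME, TITLE, ORGANIZATION) triples."""
    texts = [text for text, _ in chunks]
    labels = [label for _, label in chunks]
    triples = []
    for i in range(len(chunks)):
        # ORG TITLE PER, e.g. "Microsoft Corporation CEO Bill Gates"
        if labels[i:i + 3] == ["ORG", "TITLE", "PER"]:
            triples.append((texts[i + 2], texts[i + 1], texts[i]))
        # PER , a ORG TITLE, e.g. "Bill Veghte, a Microsoft VP"
        if labels[i:i + 5] == ["PER", "O", "O", "ORG", "TITLE"] and texts[i + 1:i + 3] == [",", "a"]:
            triples.append((texts[i], texts[i + 4], texts[i + 3]))
        # PER , TITLE of (the) ORG, e.g. "Richard Stallman, founder of the Free Software Foundation"
        if (labels[i:i + 5] == ["PER", "O", "TITLE", "O", "ORG"]
                and texts[i + 1] == "," and texts[i + 3] in ("of", "of the")):
            triples.append((texts[i], texts[i + 2], texts[i + 4]))
    return triples


sentences = [
    [("For years", "O"), (",", "O"), ("Microsoft Corporation", "ORG"), ("CEO", "TITLE"),
     ("Bill Gates", "PER"), ("railed against", "O")],
    [("said", "O"), ("Bill Veghte", "PER"), (",", "O"), ("a", "O"), ("Microsoft", "ORG"), ("VP", "TITLE")],
    [("Richard Stallman", "PER"), (",", "O"), ("founder", "TITLE"), ("of the", "O"),
     ("Free Software Foundation", "ORG"), (",", "O"), ("countered", "O")],
]
for chunks in sentences:
    print(extract_person_title_org(chunks))
```

Patterns like these are precise but brittle: every new surface construction needs another rule. That brittleness motivates the machine-learning approaches covered in this talk.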

A relation extraction task in the management succession domain (MUC-6)

<person_in, person_out, position, organisation>

• person_in: the person who obtained the position
• person_out: the person who left the position
• position: the job position that the two persons were involved in
• organisation: the organisation where the position was located

Example

[Figure: dependency tree of the example sentence, with the nodes hire/V, Acme Inc./N/Organisation, Peter Smith/N/Person, replace/V, Hans Bloggs/N/Person, as, COO/N/Position, according to/P, Fred Bell/N/Person, CEO/N/Position and yesterday/N, connected by the edge labels subj, obj, vpsc-mod, pcomp-n, title and mod]

According to CEO Fred Bell, Acme Inc. hired Peter Smith as COO yesterday, replacing Hans Bloggs.

<person_in, person_out, position, organisation>
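The figure lists the nodes and edge labels but not exactly which label attaches where. The sketch below therefore assumes one plausible reading of the tree:

- Acme Inc. is the subj of hire and Peter Smith is its obj.
- COO is the pcomp-n of the "as" modifier.
- Hans Bloggs is the obj of the replace clause, which attaches to hire via vpsc-mod.

Nodes are named by their words, which is a simplification (real parses use token ids). The functions are hypothetical and show how the 4-ary tuple can be read off such a tree.

```python
from typing import Dict, List, Optional, Tuple

Edge = Tuple[str, str, str]  # (head, label, dependent)

# Assumed reading of the tree above; nodes are named by their words for simplicity.
EDGES: List[Edge] = [
    ("hire", "subj", "Acme Inc."),
    ("hire", "obj", "Peter Smith"),
    ("hire", "mod", "as"),
    ("as", "pcomp-n", "COO"),
    ("hire", "vpsc-mod", "replace"),
    ("replace", "obj", "Hans Bloggs"),
    ("hire", "mod", "according to"),
    ("according to", "obj", "Fred Bell"),
    ("Fred Bell", "title", "CEO"),
    ("hire", "mod", "yesterday"),
]


def dependent(edges: List[Edge], head: str, label: str) -> Optional[str]:
    """Return the first dependent of `head` attached with `label`, if any."""
    for h, l, d in edges:
        if h == head and l == label:
            return d
    return None


def extract_succession(edges: List[Edge]) -> Optional[Dict[str, str]]:
    """Fill <person_in, person_out, position, organisation> from a hire ... replacing ... tree."""
    slots = {
        "person_in": dependent(edges, "hire", "obj"),
        "organisation": dependent(edges, "hire", "subj"),
        # position: the noun under the "as" modifier of hire
        "position": dependent(edges, "as", "pcomp-n") if ("hire", "mod", "as") in edges else None,
        # person_out: the object of the "replace" clause attached to hire
        "person_out": dependent(edges, "replace", "obj") if ("hire", "vpsc-mod", "replace") in edges else None,
    }
    if any(value is None for value in slots.values()):
        return None  # incomplete match: no relation instance
    return slots


print(extract_succession(EDGES))
```

The pattern never looks at the "according to" branch. As a result, CEO Fred Bell is not mistaken for a slot filler, even though he is a Person with a Position.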

A Brief History of IE

Message Understanding Conferences [MUC-7 98]

• U.S. Government-sponsored conferences (since 1987) with the intention to coordinate multiple research groups seeking to improve IE and IR technologies

• defined several generic types of information extraction tasks (MUC Competition)

• MUC 1-2 focused on automated analysis of military messages containing textual information

• MUC 3-7 focused on information extraction from newswire articles
  • terrorist events
  • international joint ventures
  • management succession events

Evaluation of IE systems in MUC

• Participants receive a description of the scenario along with the annotated training corpus in order to adapt their systems to the new scenario (1 to 6 months)

• Participants receive a new set of documents (test corpus), use their systems to extract information from these documents, and return the results to the conference organizer

• The results are compared to the manually filled set of templates (answer key)

Evaluation of IE systems in MUC

• Precision and recall measures were adopted from the information retrieval research community

• Sometimes an F-measure is used as a combined recall-precision score

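$$\text{recall} = \frac{N_{correct}}{N_{key}}$$

$$\text{precision} = \frac{N_{correct}}{N_{correct} + N_{incorrect}}$$

$$F = \frac{(\beta^2 + 1) \cdot \text{precision} \cdot \text{recall}}{\beta^2 \cdot \text{precision} + \text{recall}}$$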

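The three formulas translate directly into code. The sketch below computes them from slot counts. The counts in the example are hypothetical. The sketch does not model the partial-credit categories used by the real MUC scorer.

```python
def muc_scores(n_correct: int, n_incorrect: int, n_key: int, beta: float = 1.0):
    """Recall, precision and F-measure as defined above."""
    recall = n_correct / n_key if n_key else 0.0
    returned = n_correct + n_incorrect
    precision = n_correct / returned if returned else 0.0
    if precision == 0.0 and recall == 0.0:
        return recall, precision, 0.0  # avoid 0/0
    b2 = beta ** 2
    f = (b2 + 1) * precision * recall / (b2 * precision + recall)
    return recall, precision, f


# Hypothetical counts: 16 slots in the answer key, 10 returned, 8 of them correct.
r, p, f = muc_scores(n_correct=8, n_incorrect=2, n_key=16)
print(r, p, round(f, 3))  # 0.5 0.8 0.615
```

With β = 1 the F-measure reduces to the harmonic mean 2PR/(P + R), weighting precision and recall equally.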
Generic IE tasks for MUC-7

• (NE) Named Entity Recognition Task requires the identification and classification of named entities:
  • organizations
  • locations
  • persons
  • dates, times, percentages and monetary expressions

• (TE) Template Element Task requires the filling of small-scale templates for specified classes of entities in the texts
  • Attributes of entities are slot fills (identifying the entities beyond the name level)
  • Example: Persons with slots such as name (plus name variants), title, nationality, description as supplied in the text, and subtype.

"Captain Denis Gillespie, the commander of Carrier Air Wing 11"

Generic IE tasks for MUC-7

• (TR) Template Relation Task requires filling a two-slot template representing a binary relation with pointers to template elements standing in the relation, which were previously identified in the TE task

• subsidiary relationship between two companies (employee_of, product_of, location_of)

EMPLOYEE_OF:
    PERSON: → PERSON
    ORGANIZATION: → ORGANIZATION

PERSON:
    NAME: Feiyu Xu
    DESCRIPTOR: researcher

ORGANIZATION:
    NAME: DFKI
    DESCRIPTOR: research institute
    CATEGORY: GmbH
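The pointer structure of TE and TR templates can be mirrored with plain objects: the relation template holds references to entity templates, not copies of their names. The class and field names below are a hypothetical rendering of the slots in the template above.

```python
from dataclasses import dataclass


@dataclass
class Person:  # TE template for a person
    name: str
    descriptor: str


@dataclass
class Organization:  # TE template for an organization
    name: str
    descriptor: str
    category: str


@dataclass
class EmployeeOf:  # TR template: two slots that point to TE templates
    person: Person
    organization: Organization


xu = Person(name="Feiyu Xu", descriptor="researcher")
dfki = Organization(name="DFKI", descriptor="research institute", category="GmbH")
relation = EmployeeOf(person=xu, organization=dfki)
print(relation.person.name, "->", relation.organization.name)  # Feiyu Xu -> DFKI
```

Because the slots are references, a correction to a TE object (for example a better DESCRIPTOR) is seen by every TR template that points to it.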

Generic IE tasks for MUC-7

• (CO) Coreference Resolution requires the identification of expressions in the text that refer to the same object, set or activity
  • variant forms of name expressions
  • definite noun phrases and their antecedents
  • pronouns and their antecedents

"The U.K. satellite television broadcaster said its subscriber base grew 17.5 percent during the past year to 5.35 million"

Generic IE tasks for MUC-7

• (ST) Scenario Template requires filling a template structure with extracted information involving several relations or events of interest
  • intended to be the MUC approximation to a real-world information extraction problem
  • identification of partners, products, profits and capitalization of joint ventures

JOINT_VENTURE:
    NAME: Siemens GEC Communication Systems Ltd
    PARTNER1: → ORGANIZATION …
    PARTNER2: → ORGANIZATION …
    PRODUCT/SERVICE: → PRODUCT_OF
    CAPITALIZATION: unknown
    TIME: February 18, 1997

PRODUCT_OF:
    PRODUCT: → PRODUCT …
    ORGANIZATION: → ORGANIZATION

Tasks evaluated in MUC 3-7 [Chinchor, 98]

EVAL\TASK   NE    CO    RE    TR    ST
MUC-3                               YES
MUC-4                               YES
MUC-5                               YES
MUC-6       YES   YES   YES         YES
MUC-7       YES   YES   YES   YES   YES

Development Steps within IE Communities

• from attempts to use the methods of full text understanding to shallow text processing;

• from pure knowledge-based hand-coded systems to (semi-)automatic systems using machine learning methods;

• from complex domain-dependent event extraction to standardized domain-independent elementary entity identification, simple semantic relation and event extraction.

The ACE Program

• "Automated Content Extraction" since 1999

• Develop core information extraction technology by focusing on extracting specific semantic entities and relations over a very wide range of texts.

• Corpora: newswire and broadcast transcripts, but a broad range of topics and genres.
  • Third person reports
  • Interviews
  • Editorials
  • Topics: foreign relations, significant events, human interest, sports, weather

• Discourage highly domain- and genre-dependent solutions

[Appelt, 2003]

Components of a Semantic Model

• Entities – Individuals in the world that are mentioned in a text
  • Simple entities: singular objects
  • Collective entities: sets of objects of the same type where the set is explicitly mentioned in the text

• Relations – Properties that hold of tuples of entities.

• Complex Relations – Relations that hold among entities and relations

• Attributes – One-place relations are attributes or individual properties

Components of a Semantic Model

• Temporal points and intervals

• Relations may be timeless or bound to time intervals

• Events – A particular kind of simple or complex relation among entities involving a change in relation state at the end of a time interval.

Relations in Time

• timeless attribute: gender(x)

• time-dependent attribute: age(x)

• timeless two-place relation: father(x, y)

• time-dependent two-place relation: boss(x, y)
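A simple encoding of the timeless/time-dependent distinction attaches an optional interval to each fact. In this sketch, a fact with no bounds holds at every time point. The interval representation (integer years, inclusive bounds) and the example years are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Fact:
    relation: str
    args: Tuple[str, ...]
    start: Optional[int] = None  # start and end both None: the relation is timeless
    end: Optional[int] = None    # a None end means the interval is still open

    def holds_at(self, t: int) -> bool:
        # with no interval bounds (a timeless relation) both checks are True
        after_start = self.start is None or self.start <= t
        before_end = self.end is None or t <= self.end
        return after_start and before_end


# Hypothetical facts and years, for illustration only.
father = Fact("father", ("x", "y"))
boss = Fact("boss", ("x", "y"), start=2001, end=2004)
print(father.holds_at(1990), boss.holds_at(1990), boss.holds_at(2003))  # True False True
```

The same encoding could cover a time-dependent attribute such as age(x) by adding the value as an argument and bounding it by the interval in which it is true.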

Relations vs. Features or Roles in AVMs

• Several two-place relations between an entity x and other entities y_i can be bundled as properties of x. In this case, the relations are called roles (or attributes) and any pair <relation : y_i> is called a role assignment (or a feature).

• name <x, CR>

name: Condoleezza Rice
office: National Security Advisor
age: 49
gender: female
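The bundling step can be sketched as collecting every two-place fact relation(x, y) about the same entity x into one attribute-value structure. The facts below restate the AVM above as (relation, x, y) triples. `bundle_roles` is a hypothetical helper.

```python
from typing import Dict, Iterable, Tuple


def bundle_roles(facts: Iterable[Tuple[str, str, str]], entity: str) -> Dict[str, str]:
    """Collect every two-place fact relation(entity, y) into a role assignment relation: y."""
    avm: Dict[str, str] = {}
    for relation, x, y in facts:
        if x == entity:
            avm[relation] = y  # a role holds one value; a repeated role overwrites the earlier one
    return avm


facts = [
    ("name", "x", "Condoleezza Rice"),
    ("office", "x", "National Security Advisor"),
    ("age", "x", "49"),
    ("gender", "x", "female"),
]
for role, value in bundle_roles(facts, "x").items():
    print(f"{role}: {value}")
```

Storing one value per role reflects the view of features as functions of x. A relation with several fillers for the same x would need a list-valued role instead.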

Semantic Analysis: Relating Language to the Model

• Linguistic Mention
  • A particular linguistic phrase
  • Denotes a particular entity, relation, or event
    • A noun phrase, name, or possessive pronoun
    • A verb, nominalization, compound nominal, or other linguistic construct relating other linguistic mentions

• Linguistic Entity
  • Equivalence class of mentions with same meaning
    • Coreferring noun phrases
    • Relations and events derived from different mentions, but conveying the same meaning

[Appelt, 2003]
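Since a linguistic entity is an equivalence class of mentions, pairwise coreference decisions can be closed into classes with a union-find structure. The sketch makes two simplifying assumptions. First, mentions are identified by their strings, which are unique in this toy example; real systems use text offsets. Second, the links are hypothetical resolver decisions for the Fischer/Baeyer example text that follows in these slides.

```python
from typing import Dict, Iterable, List, Tuple


def mention_classes(mentions: List[str], links: Iterable[Tuple[str, str]]) -> List[List[str]]:
    """Group mentions into equivalence classes (linguistic entities) with union-find."""
    parent: Dict[str, str] = {m: m for m in mentions}

    def find(m: str) -> str:
        while parent[m] != m:
            parent[m] = parent[parent[m]]  # path halving keeps the trees shallow
            m = parent[m]
        return m

    for a, b in links:  # each link is one pairwise coreference decision
        parent[find(a)] = find(b)

    classes: Dict[str, List[str]] = {}
    for m in mentions:
        classes.setdefault(find(m), []).append(m)
    return list(classes.values())


mentions = ["Emil Fischer", "his", "Fischer's teacher", "Adolf von Baeyer", "who"]
links = [("Emil Fischer", "his"), ("Fischer's teacher", "Adolf von Baeyer"), ("Adolf von Baeyer", "who")]
print(mention_classes(mentions, links))
```

Union-find makes the classes transitive automatically. If a resolver links a to b and b to c, then a and c end up in the same entity without a direct decision.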

Language and World Model

[Figure: Linguistic Mention and Linguistic Entity, each connected to the model by a "Denotes" link]

[Appelt, 2003]

NLP Tasks in an Extraction System

[Figure: components Linguistic Mention (Recognition, Type Classification), Coreference, Linguistic Entity, Cross-Document Coreference, Event Recognition, and Events and Relations]

[Appelt, 2003]

Example

1. Three of the Nobel Prizes for Chemistry during the first decade were awarded for pioneering work in organic chemistry.

2. In 1902 Emil Fischer (1852-1919), then in Berlin, was given the prize for his work on sugar and purine syntheses.

3. Another major influence from organic chemistry was the development of the chemical industry, and a chief contributor here was Fischer's teacher, Adolf von Baeyer (1835-1917) in Munich, who was awarded the prize in 1905.

Anaphora in Texts

He/The scientist won the 2005 Nobel Prize for Peace on Friday for his efforts to limit the spread of atomic weapons.

<?PERSON, Nobel, Peace, 2005>

Coreference Relations and Indicators

• Complex linguistic phenomena, influenced by lexical, syntactic, semantic and discourse constraints

• The indicators shared by many approaches are
  • Distance: coreferring expressions are often close to each other in the surface structure;
  • Syntactic: pronominal resolution constraints within a sentence;
  • Semantic: same or compatible semantic category, agreement in number, gender and person;
  • Discourse: parallelism, repetition, apposition, name alias.
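Two of these indicators already give a crude resolver:

- the semantic indicator, used as a hard filter on number, gender and semantic category;
- the distance indicator, used to rank the remaining candidates.

This is a hypothetical sketch. It measures distance in sentences only, and it does not model the syntactic or discourse indicators. The mention features in the example are hand-assigned for the "two scientists" text that follows.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Mention:
    text: str
    sentence: int           # sentence index, used for the distance indicator
    number: str             # "sg" or "pl"
    gender: Optional[str]   # "m", "f", or None when unknown
    category: str           # coarse semantic class, e.g. "person", "prize"


def resolve(anaphor: Mention, candidates: List[Mention]) -> Optional[Mention]:
    """Closest candidate (in sentences) that agrees with the anaphor semantically."""
    def agrees(c: Mention) -> bool:
        gender_ok = c.gender is None or anaphor.gender is None or c.gender == anaphor.gender
        return (c.sentence <= anaphor.sentence and c.number == anaphor.number
                and gender_ok and c.category == anaphor.category)

    agreeing = [c for c in candidates if agrees(c)]
    if not agreeing:
        return None
    return min(agreeing, key=lambda c: anaphor.sentence - c.sentence)  # distance indicator


americans = Mention("Two Americans", 0, "pl", None, "person")
prize = Mention("the 2002 Nobel Prize in Economic Sciences", 0, "sg", None, "prize")
scientists = Mention("The two scientists", 1, "pl", None, "person")
print(resolve(scientists, [americans, prize]).text)  # Two Americans
```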

Recency Indicator in Nobel Prize Domain

• News reports from New York Times, online BBC and CNN (18.4 MB, 3328 documents)

1. Two Americans have won the 2002 Nobel Prize in Economic Sciences.
2. The two scientists, Daniel Kahneman and Vernon L. Smith, received the honour on Wednesday for their work using psychological research and laboratory experiments in economic analysis.

1. Egypt honours its Nobel Prize chemist.
2. President Hosni Mubarak of Egypt has awarded the country's most prestigious prize - the Nile Necklace - to the Egyptian-born chemist Ahmed Zewail.

Repetition and Elaboration

• The cohesion indicator repetition is often used as an indicator of semantic similarity and semantic consistency, e.g.,
  • "two Americans" and "two scientists"
  • "chemist" and "chemist"

• Elaboration phenomena are common in newspaper texts

S1 is an Elaboration of S0 if a proposition P follows from the assertions of both S0 and S1, but S1 contains a property of one of the elements of P that is not in S0 (Hobbs, 1979)

Relation Argument as a Complex Semantic Object

• A complex noun phrase often contains more than one property of an argument, e.g.

Egyptian-born chemist Ahmed Zewail

• Relevant properties of a winner in the Nobel Prize domain
  • Nationality/origin/inhabitant: e.g., two Americans, the Egyptian-born, a Dutch
  • Profession/occupation: e.g., novelist, chemist, scientist, researcher
  • Title/position: e.g., professor, president
  • Domain description: e.g., recipient, winner, Nobel Laureate
  • General description: e.g., the man, a woman, the team


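„two Americans“

$$
\left[\begin{array}{ll}
\text{sentence\_id}: & i \\
\text{number}: & \left[\begin{array}{ll} \text{type}: & \text{plural} \\ \text{amount}: & 2 \end{array}\right] \\
\text{definite}: & \text{indef} \\
\text{grammarrole}: & \text{subject} \\
\text{semantics}: & \left[\begin{array}{ll} \text{nationality}: & \text{american} \end{array}\right]
\end{array}\right]
$$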


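„two scientists“

$$
\left[\begin{array}{ll}
\text{sentence\_id}: & i+1 \\
\text{number}: & \left[\begin{array}{ll} \text{type}: & \text{plural} \\ \text{amount}: & 2 \end{array}\right] \\
\text{definite}: & \text{def} \\
\text{grammarrole}: & \text{subject} \\
\text{semantics}: & \left[\begin{array}{ll} \text{profession}: & \text{scientist} \end{array}\right] \\
\text{names}: & \langle \text{name1}, \text{name2} \rangle
\end{array}\right]
$$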


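Unification of „two Americans“ and „two scientists“

$$
\left[\begin{array}{ll}
\text{number}: & \left[\begin{array}{ll} \text{type}: & \text{plural} \\ \text{amount}: & 2 \end{array}\right] \\
\text{semantics}: & \left[\begin{array}{ll} \text{nationality}: & \text{american} \\ \text{profession}: & \text{scientist} \end{array}\right] \\
\text{names}: & \langle \text{name1}, \text{name2} \rangle
\end{array}\right]
$$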
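The merge above can be sketched as unification of nested attribute-value structures. This is a minimal illustration, not the system's implementation: feature structures are plain Python dicts (no structure sharing), and the set of mention-specific features dropped before unifying (`MENTION_FEATURES`, a hypothetical name) is our assumption, chosen so that the result matches the structure shown.

```python
from typing import Any, Optional

# Assumption: features describing a single mention rather than the entity.
MENTION_FEATURES = {"sentence_id", "definite", "grammarrole"}


def unify(a: Any, b: Any) -> Optional[Any]:
    """Unify two feature structures; return None if they clash.

    Dicts are merged feature by feature; any other values must be equal.
    """
    if isinstance(a, dict) and isinstance(b, dict):
        result = dict(a)
        for feature, value in b.items():
            if feature in result:
                merged = unify(result[feature], value)
                if merged is None:
                    return None  # clash somewhere below this feature
                result[feature] = merged
            else:
                result[feature] = value  # feature only known from b
        return result
    return a if a == b else None


def entity_view(fs: dict) -> dict:
    """Drop mention-specific features before unifying two mentions."""
    return {k: v for k, v in fs.items() if k not in MENTION_FEATURES}


two_americans = {
    "sentence_id": "i", "definite": "indef", "grammarrole": "subject",
    "number": {"type": "plural", "amount": 2},
    "semantics": {"nationality": "american"},
}
two_scientists = {
    "sentence_id": "i+1", "definite": "def", "grammarrole": "subject",
    "number": {"type": "plural", "amount": 2},
    "semantics": {"profession": "scientist"},
    "names": ["name1", "name2"],
}

merged = unify(entity_view(two_americans), entity_view(two_scientists))
print(merged)
```

Unifying the full structures without `entity_view` fails on `sentence_id` (i vs. i+1), which is why such features are kept out of the merge in this sketch.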


The Basic Semantic Tasks of an IE System

• Recognition of linguistic entities
• Classification of linguistic entities into semantic types
• Identification of coreference equivalence classes of linguistic entities
• Identification of the actual individuals that are mentioned in an article
• Association of linguistic entities with predefined individuals (e.g., in a database or knowledge base)
• Formation of equivalence classes of linguistic entities from different documents

[Appelt, 2003]
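Forming coreference equivalence classes amounts to partitioning mentions into disjoint sets. A minimal union-find sketch, assuming the pairwise coreference decisions are already given; here they are hard-coded for the Egypt/Zewail sentences, whereas a real system would have to predict them. The mention ids are hypothetical.

```python
class UnionFind:
    """Disjoint-set forest for grouping coreferent mentions."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def equivalence_classes(mentions, coreferent_pairs):
    """Group mentions into classes, listed in order of first appearance."""
    uf = UnionFind()
    for a, b in coreferent_pairs:
        uf.union(a, b)
    classes = {}
    for m in mentions:
        classes.setdefault(uf.find(m), []).append(m)
    return list(classes.values())


mentions = ["Egypt#1", "its Nobel Prize chemist", "President Hosni Mubarak",
            "Egypt#2", "the country", "the Egyptian-born chemist Ahmed Zewail"]
pairs = [("Egypt#1", "Egypt#2"), ("Egypt#2", "the country"),
         ("its Nobel Prize chemist", "the Egyptian-born chemist Ahmed Zewail")]
classes = equivalence_classes(mentions, pairs)
print(classes)
```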


The ACE Ontology

• Persons
  • A natural kind, and hence self-evident
• Organizations
  • Should have some persistent existence that transcends a mere set of individuals
• Locations
  • Geographic places with no associated governments
• Facilities
  • Objects from the domain of civil engineering
• Geopolitical Entities
  • Geographic places with associated governments

[Appelt, 2003]


Why GPEs

• An ontological problem: certain entities have attributes of physical objects in some contexts, of organizations in others, and of collections of people in yet others

• Sometimes it is difficult or even impossible to determine which aspect is intended

• In some contexts, the same phrase appears to play different roles in different clauses


Aspects of GPEs

• Physical
  • San Francisco has a mild climate.
• Organization
  • The United States is seeking a solution to the North Korean problem.
• Population
  • France makes a lot of good wine.


Types of Linguistic Mentions

• Name mentions
  • The mention uses a proper name to refer to the entity
• Nominal mentions
  • The mention is a noun phrase whose head is a common noun
• Pronominal mentions
  • The mention is a headless noun phrase, or a noun phrase whose head is a pronoun or a possessive pronoun
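The three-way distinction can be sketched as a decision on the part-of-speech tag of the mention's head. This assumes Penn Treebank tags and represents a headless noun phrase as `None`; the function and tag sets are illustrative, not part of the ACE guidelines.

```python
from typing import Optional

# Assumption: Penn Treebank tags for the head word of the mention.
PROPER_NOUN_TAGS = {"NNP", "NNPS"}
COMMON_NOUN_TAGS = {"NN", "NNS"}
PRONOUN_TAGS = {"PRP", "PRP$"}  # personal and possessive pronouns


def mention_type(head_tag: Optional[str]) -> str:
    """Return NAME, NOMINAL or PRONOMINAL; head_tag is None if headless."""
    if head_tag is None or head_tag in PRONOUN_TAGS:
        return "PRONOMINAL"
    if head_tag in PROPER_NOUN_TAGS:
        return "NAME"
    if head_tag in COMMON_NOUN_TAGS:
        return "NOMINAL"
    raise ValueError(f"unexpected head tag for a mention: {head_tag}")


for mention, tag in [("Ahmed Zewail", "NNP"), ("the chemist", "NN"),
                     ("his", "PRP$"), ("the two", None)]:
    print(mention, "->", mention_type(tag))
```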


Explicit and Implicit Relations

• Many relations are true in the world. Reasonable knowledge bases used by extraction systems will include many of these relations. Semantic analysis requires focusing on those that are directly motivated by the text.

• Example:
  • Baltimore is in Maryland, which is in the United States.
  • “Baltimore, MD”
  • The text mentions Baltimore and the United States. Is there a relation between Baltimore and the United States?
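The Baltimore example can be made concrete: a knowledge base that stores only the two containment facts can still derive a Baltimore/United States relation by transitivity, which is exactly the kind of world-true relation the text itself may not motivate. A minimal sketch, with the facts hard-coded for illustration:

```python
def transitive_closure(pairs):
    """All (a, c) obtainable by chaining (a, b) and (b, c) pairs."""
    closure = set(pairs)
    while True:
        derived = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        if derived <= closure:
            return closure
        closure |= derived


# Facts a knowledge base might store (hard-coded for illustration).
located_in = {("Baltimore", "Maryland"), ("Maryland", "United States")}
kb = transitive_closure(located_in)
implicit = kb - located_in
print(implicit)  # {('Baltimore', 'United States')}
```

Whether the text actually expresses this derived relation is a separate question that the closure alone cannot answer.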


Another Example

• Prime Minister Tony Blair attempted to convince the British Parliament of the necessity of intervening in Iraq.

• Is there a role relation specifying Tony Blair as prime minister of Britain?

• A test: a relation is implicit in the text if the text provides convincing evidence that the relation actually holds.


Explicit Relations

• Explicit relations are expressed by certain surface linguistic forms
  • Copular predication - Clinton was the president.
  • Prepositional phrase - The CEO of Microsoft…
  • Prenominal modification - The American envoy…
  • Possessive - Microsoft’s chief scientist…
  • SVO relations - Clinton arrived in Tel Aviv…
  • Nominalizations - Annan’s visit to Baghdad…
  • Apposition - Tony Blair, Britain’s prime minister…


Types of ACE Relations

• ROLE - relates a person to an organization or a geopolitical entity
  • Subtypes: member, owner, affiliate, client, citizen
• PART - generalized containment
  • Subtypes: subsidiary, physical part-of, set membership
• AT - permanent and transient locations
  • Subtypes: located, based-in, residence
• SOC - social relations among persons
  • Subtypes: parent, sibling, spouse, grandparent, associate
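The relation inventory above can be written down as a small schema, for example to validate labels produced by an extractor. This is a sketch only: the dictionary layout and the `is_valid` helper are hypothetical, the subtype names are copied from the slide rather than from the official ACE guidelines, and only the argument constraints the slide states (ROLE, SOC) are enforced.

```python
# Relation types and subtypes as listed on the slide (not the full ACE spec).
ACE_RELATIONS = {
    "ROLE": {"member", "owner", "affiliate", "client", "citizen"},
    "PART": {"subsidiary", "physical part-of", "set membership"},
    "AT": {"located", "based-in", "residence"},
    "SOC": {"parent", "sibling", "spouse", "grandparent", "associate"},
}

# Argument-type constraints stated on the slide; PART and AT are unconstrained here.
ARG_TYPES = {
    "ROLE": ({"Person"}, {"Organization", "GPE"}),
    "SOC": ({"Person"}, {"Person"}),
}


def is_valid(rel_type, subtype, arg1_type, arg2_type):
    """Check a (type, subtype) label and, where known, its argument types."""
    if subtype not in ACE_RELATIONS.get(rel_type, set()):
        return False
    if rel_type in ARG_TYPES:
        allowed1, allowed2 = ARG_TYPES[rel_type]
        return arg1_type in allowed1 and arg2_type in allowed2
    return True


print(is_valid("ROLE", "citizen", "Person", "GPE"))       # True
print(is_valid("ROLE", "citizen", "Person", "Facility"))  # False
```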


Event Types (preliminary)

• Movement
  • Travel, visit, move, arrive, depart…
• Transfer
  • Give, take, steal, buy, sell…
• Creation/Discovery
  • Birth, make, discover, learn, invent…
• Destruction
  • Die, destroy, wound, kill, damage…


Machine Learning for Relation Extraction


Motivations for ML

• Porting to new domains or applications is expensive

• Current technology requires IE experts
  • Expertise is difficult to find on the market
  • SMEs cannot afford IE experts

• Machine learning approaches
  • Domain portability is relatively straightforward
  • System expertise is not required for customization
  • “Data-driven” rule acquisition ensures full coverage of examples


Problems

• Training data may not exist, and may be very expensive to acquire

• A large volume of training data may be required

• Changes to specifications may require re-annotation of large quantities of training data

• Understanding and controlling a domain-adaptive system is not always easy for non-experts


Parameters

• Document structure
  • Free text
  • Semi-structured
  • Structured
• Richness of the annotation
  • Shallow NLP
  • Deep NLP
• Complexity of the template-filling rules
  • Single slot
  • Multi slot
• Amount of data
• Degree of automation
  • Semi-automatic
  • Supervised
  • Semi-supervised
  • Unsupervised
• Human interaction/contribution
• Evaluation/validation
  • During the learning loop
  • Performance: recall and precision
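Recall and precision compare the set of extracted relation instances with a gold-standard set. A minimal sketch, treating instances as hashable tuples and adding the balanced F-measure for convenience; the sample data are invented for illustration.

```python
def evaluate(extracted: set, gold: set) -> tuple:
    """Precision, recall and balanced F-measure of extracted vs. gold instances."""
    correct = len(extracted & gold)
    precision = correct / len(extracted) if extracted else 0.0
    recall = correct / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if precision + recall else 0.0
    return precision, recall, f1


gold = {("p1", "o1"), ("p2", "o2"), ("p3", "o3"), ("p4", "o4")}
extracted = {("p1", "o1"), ("p2", "o2"), ("p5", "o5")}
print(evaluate(extracted, gold))  # precision 2/3, recall 0.5, F1 4/7
```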


Documents

• Unstructured (free) text
  • Regular sentences and paragraphs
  • Linguistic techniques, e.g., NLP
• Structured text
  • Itemized information
  • Uniform syntactic clues, e.g., table understanding
• Semi-structured text
  • Ungrammatical, telegraphic (e.g., missing attributes, multi-value attributes, …)
  • Specialized programs, e.g., wrappers


Research Goal

Development of a general framework for automatically learning mappings between linguistic analyses and target semantic relations, with minimal human intervention.

[Figure: dependency analysis with nodes labelled subject, verb, object, head and mod]


Challenges

Easy adaptation to new relation types with varied complexity

Automatic learning without an annotated corpus

Exhaustive discovery of relevant linguistic patterns

Integration of semantic role information into linguistic patterns


Outline

State of the art

Domain Adaptive Relation Extraction Framework (DARE)

Experiments and evaluations

Performance analysis and discussion

Conclusion and future work


A relation extraction task in the management succession domain (MUC-6)

<person_in, person_out, position, organisation>

person_in: the person who obtained the position
person_out: the person who left the position
position: the job position that the two persons were involved in
organisation: the organisation where the position was located

Example


According to CEO Fred Bell, Acme Inc. hired Peter Smith as COO yesterday, replacing Hans Bloggs.


<person_in, person_out, position, organisation>
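The target relation is a four-place tuple. As a minimal sketch, it can be modelled as a frozen dataclass (the class name is hypothetical) and filled by hand for the example sentence; the slots are optional because a given mention may not express every argument.

```python
from dataclasses import astuple, dataclass
from typing import Optional


@dataclass(frozen=True)
class ManagementSuccession:
    person_in: Optional[str] = None     # the person who obtained the position
    person_out: Optional[str] = None    # the person who left the position
    position: Optional[str] = None      # the job position involved
    organisation: Optional[str] = None  # where the position was located


# Filled by hand from: "According to CEO Fred Bell, Acme Inc. hired
# Peter Smith as COO yesterday, replacing Hans Bloggs."
example = ManagementSuccession(
    person_in="Peter Smith",
    person_out="Hans Bloggs",
    position="COO",
    organisation="Acme Inc.",
)
print(astuple(example))
```

Fred Bell and CEO are also a person and a position, yet they fill no slot of this relation; telling them apart from the real arguments is what the extraction rules have to learn.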


Dependency analysis of the example sentence (word/category/entity class; edges labelled with grammatical relations):

hire/V
  subj: Acme Inc./N/Organisation
  obj: Peter Smith/N/Person
  mod: as
    pcomp-n: COO/N/Position
  vpsc-mod: replace/V
    obj: Hans Bloggs/N/Person
  mod: according to/P
    pcomp-n: Fred Bell/N/Person
      title: CEO/N/Position
  mod: yesterday/N


Ideal Target Pattern

[Figure: the same dependency tree, with the ideal target pattern highlighted]


Previous Work: SVO Model (Yangarber, 2001)

[Figure: the same dependency tree, illustrating the SVO model]

Verb centered

Direct relations between subject, verb and object

Complex NPs cannot be extracted, e.g., the relation between a person and a position

The linguistic relations among patterns are not considered, e.g., between hire and replace
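The SVO model can be illustrated on a hand-coded version of the dependency tree above. A minimal sketch, assuming the tree is stored as a dict from head to (relation, dependent) pairs and that the verbs are given; this encoding is our reading of the figure, not MINIPAR's output format.

```python
# Hand-coded reading of the dependency tree for the example sentence
# (head -> list of (relation, dependent)); a hypothetical encoding.
TREE = {
    "hire": [("subj", "Acme Inc."), ("obj", "Peter Smith"), ("mod", "as"),
             ("vpsc-mod", "replace"), ("mod", "according to"),
             ("mod", "yesterday")],
    "as": [("pcomp-n", "COO")],
    "replace": [("obj", "Hans Bloggs")],
    "according to": [("pcomp-n", "Fred Bell")],
    "Fred Bell": [("title", "CEO")],
}
VERBS = ["hire", "replace"]


def svo_patterns(tree, verbs):
    """Return one (subject, verb, object) triple per verb; None marks a gap."""
    triples = []
    for verb in verbs:
        deps = tree.get(verb, [])
        subj = next((d for rel, d in deps if rel == "subj"), None)
        obj = next((d for rel, d in deps if rel == "obj"), None)
        triples.append((subj, verb, obj))
    return triples


print(svo_patterns(TREE, VERBS))
```

Neither triple contains COO, and nothing connects the hire triple to the replace triple, which are the two shortcomings listed above.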


Previous Work: Chain Model (Sudo et al., 2001)

[Figure: the same dependency tree, illustrating the chain model]

Verb centered

A single syntactic path dominated by a verb, containing at least one relevant named entity concept
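Chain patterns can be enumerated as all downward paths that start at a verb and end at a named-entity node. A minimal sketch on a hand-coded version of the dependency tree (the encoding and the named-entity set are our assumptions, read off the figure):

```python
# Hypothetical encoding of the example tree: head -> [(relation, dependent)].
TREE = {
    "hire": [("subj", "Acme Inc."), ("obj", "Peter Smith"), ("mod", "as"),
             ("vpsc-mod", "replace"), ("mod", "according to"),
             ("mod", "yesterday")],
    "as": [("pcomp-n", "COO")],
    "replace": [("obj", "Hans Bloggs")],
    "according to": [("pcomp-n", "Fred Bell")],
    "Fred Bell": [("title", "CEO")],
}
VERBS = ["hire", "replace"]
NAMED_ENTITIES = {"Acme Inc.", "Peter Smith", "COO", "Hans Bloggs",
                  "Fred Bell", "CEO"}


def chains(tree, verbs, entities):
    """All downward paths from a verb that end at a named-entity node.

    Each chain is (verb, ((relation, node), ...)).
    """
    result = []

    def walk(root, node, path):
        for rel, dep in tree.get(node, []):
            step = path + ((rel, dep),)
            if dep in entities:
                result.append((root, step))
            walk(root, dep, step)  # keep descending: longer chains may exist

    for verb in verbs:
        walk(verb, verb, ())
    return result


found = chains(TREE, VERBS, NAMED_ENTITIES)
for chain in found:
    print(chain)
```

No chain contains both Peter Smith and COO, so a single chain cannot relate person_in to position.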


Previous Work: Linked Chain Model (Stevenson & Greenwood, 2005)

[Figure: the same dependency tree, illustrating the linked chain model]

Verb centered

Pairs of chains are extracted instead of single paths
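Linked chains combine two chains from the same verb that share nothing but that verb. A sketch on the same hand-coded tree (hypothetical encoding, read off the figure), with chains represented as node paths below the verb:

```python
from itertools import combinations

# Hypothetical encoding of the example tree: head -> [(relation, dependent)].
TREE = {
    "hire": [("subj", "Acme Inc."), ("obj", "Peter Smith"), ("mod", "as"),
             ("vpsc-mod", "replace"), ("mod", "according to"),
             ("mod", "yesterday")],
    "as": [("pcomp-n", "COO")],
    "replace": [("obj", "Hans Bloggs")],
    "according to": [("pcomp-n", "Fred Bell")],
    "Fred Bell": [("title", "CEO")],
}
VERBS = ["hire", "replace"]
NAMED_ENTITIES = {"Acme Inc.", "Peter Smith", "COO", "Hans Bloggs",
                  "Fred Bell", "CEO"}


def chains_from(verb):
    """Node paths from `verb` down to each named-entity descendant."""
    found = []

    def walk(node, path):
        for _rel, dep in TREE.get(node, []):
            if dep in NAMED_ENTITIES:
                found.append(path + (dep,))
            walk(dep, path + (dep,))

    walk(verb, ())
    return found


def linked_chains(verbs):
    """Pairs of chains under the same verb that share only that verb.

    In a tree, two downward paths are disjoint below the verb exactly
    when their first nodes differ.
    """
    pairs = []
    for verb in verbs:
        for a, b in combinations(chains_from(verb), 2):
            if a[0] != b[0]:
                pairs.append((verb, a, b))
    return pairs


pairs = linked_chains(VERBS)
print(len(pairs))
```

The pair (Peter Smith) plus (as, COO) now links person_in and position, but the four target arguments hang off four different branches of hire, so one linked chain covers at most two of them.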


Previous Work: Subtree Model (Sudo et al., 2003)

Verb centered

All chains dominated by a verb that contain at least one relevant named entity, and their combinations

[Figure: the same dependency tree, illustrating the subtree model]
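The subtree model takes combinations of chains dominated by one verb. A rough sketch, approximating subtrees as unions of chain node sets (our simplification; the original enumeration may differ), shows how quickly the pattern space grows even for this single sentence. The tree encoding is the same hypothetical reading of the figure.

```python
from itertools import combinations

# Hypothetical encoding of the example tree: head -> [(relation, dependent)].
TREE = {
    "hire": [("subj", "Acme Inc."), ("obj", "Peter Smith"), ("mod", "as"),
             ("vpsc-mod", "replace"), ("mod", "according to"),
             ("mod", "yesterday")],
    "as": [("pcomp-n", "COO")],
    "replace": [("obj", "Hans Bloggs")],
    "according to": [("pcomp-n", "Fred Bell")],
    "Fred Bell": [("title", "CEO")],
}
VERBS = ["hire", "replace"]
NAMED_ENTITIES = {"Acme Inc.", "Peter Smith", "COO", "Hans Bloggs",
                  "Fred Bell", "CEO"}


def chains_from(verb):
    """Node paths from `verb` down to each named-entity descendant."""
    found = []

    def walk(node, path):
        for _rel, dep in TREE.get(node, []):
            if dep in NAMED_ENTITIES:
                found.append(path + (dep,))
            walk(dep, path + (dep,))

    walk(verb, ())
    return found


def subtrees(verbs):
    """Distinct node sets obtained as unions of one or more chains per verb."""
    result = set()
    for verb in verbs:
        chains = chains_from(verb)
        for k in range(1, len(chains) + 1):
            for combo in combinations(chains, k):
                result.add(frozenset().union(*combo) | {verb})
    return result


patterns = subtrees(VERBS)
print(len(patterns))
```

This one sentence already yields 48 candidate patterns. The pattern covering all four target arguments is among them, but nothing in it says which entity fills which role.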


[Figure: the same dependency tree]

None of the existing models links the detected slot-filling candidates with their respective semantic roles.

<person_in, person_out, position, organisation>


DARE

a seed-driven and bottom-up rule learning in a bootstrapping framework

State of the art

Domain Adaptive Relation Extraction Framework (DARE)

Experiments and evaluations

Performance analysis and discussion

Conclusion and future work


Properties of DARE

Samples of target relation instances serve as semantic seed

Systematic treatment of n-ary relations and their projections

Exploitation of relation projections for pattern discovery

Bottom-up compositional pattern discovery

A recursive linguistic rule representation

Rules contain semantic roles w.r.t. the target relation

Bottom-up compression method to generalize rules

Filtering of rule candidates by “domain relevance”
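A projection of an n-ary relation instance keeps a subset of its arguments. As a minimal sketch, assuming a projection must keep at least two arguments (our assumption, so that each projection is itself a relation), the projections of a four-place management-succession seed can be enumerated:

```python
from itertools import combinations


def projections(instance: dict, min_arity: int = 2):
    """All sub-relations of `instance` with at least `min_arity` arguments,
    largest first."""
    roles = sorted(instance)
    return [
        {role: instance[role] for role in subset}
        for k in range(len(roles), min_arity - 1, -1)
        for subset in combinations(roles, k)
    ]


seed = {"person_in": "Peter Smith", "person_out": "Hans Bloggs",
        "position": "COO", "organisation": "Acme Inc."}
projs = projections(seed)
print(len(projs))  # 1 + 4 + 6 = 11
```

Each projection, such as (person_in, position), can then be searched for on its own, which is what lets sentences that mention only part of a seed contribute patterns.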

Novel


Bootstrapping Relation Extraction with Semantic Seed

[Figure: bootstrapping cycle producing rules Rule_1, …, Rule_n from dependency analyses (subject, verb, object, head, mod)]

Adapted from DIPRE (Brin, 1998) and Snowball (Agichtein & Gravano, 2000), but extended and enriched with linguistic analysis


Bootstrapping Relation Extraction with Semantic Seed

DIPRE and Snowball

binary relations only, no projections, no linguistic analysis

DARE

n-ary relations and their projections, deep linguistic analysis

(in the experiments I use MINIPAR by Dekang Lin, 1999)
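The bootstrapping loop can be illustrated in its simplest DIPRE-style form. This sketch is deliberately much weaker than DARE: it handles a binary relation only, (organisation, person_in), and a "rule" is just the literal word string between two known arguments instead of a dependency pattern. The corpus, seed and function names are invented for illustration.

```python
import re

# Toy corpus; relation: (organisation, person_in). All data invented.
CORPUS = [
    "Acme Inc. hired Peter Smith as COO.",
    "Globex hired Mary Jones as CFO.",
    "Globex appointed Mary Jones to the board.",
    "Initech appointed Jane Doe to the board.",
]
SEED = {("Acme Inc.", "Peter Smith")}

# A capitalised multi-word name such as "Acme Inc." or "Peter Smith".
NAME = r"([A-Z][\w.]*(?: [A-Z][\w.]*)*)"


def learn_rules(corpus, instances):
    """A 'rule' here is just the text between the two known arguments."""
    rules = set()
    for sentence in corpus:
        for org, person in instances:
            pattern = re.escape(org) + r" (.+?) " + re.escape(person)
            match = re.search(pattern, sentence)
            if match:
                rules.add(match.group(1))
    return rules


def apply_rules(corpus, rules):
    """Extract (organisation, person) pairs matching NAME <rule> NAME."""
    found = set()
    for sentence in corpus:
        for rule in rules:
            for m in re.finditer(NAME + " " + re.escape(rule) + " " + NAME, sentence):
                found.add((m.group(1), m.group(2)))
    return found


def bootstrap(corpus, seed, max_iterations=5):
    instances, rules = set(seed), set()
    for _ in range(max_iterations):
        rules |= learn_rules(corpus, instances)
        new = apply_rules(corpus, rules) - instances
        if not new:
            break  # fixpoint: no new instances, hence no new rules either
        instances |= new
    return instances, rules


instances, rules = bootstrap(CORPUS, SEED)
print(sorted(rules))
print(sorted(instances))
```

The rule "appointed" is only learned because the first iteration found the Globex instance; this alternation between instances and rules is the core of the loop that the slide says DARE extends to n-ary relations and linguistic analysis.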


Start of Bootstrapping (simplified)

[Figure: graph connecting nodes e1 to e5, m1 to m11 and r1 to r5]


Pattern Collection

[Figure: tree t with nodes n0 to n3, subtrees t1 to t3, and nodes rf1 to rf3]

0. Replace all nodes that are instantiated with the seed arguments by new nodes. Label these new nodes with the seed argument roles and their entity classes.


[Figure: tree t after step 0; the argument nodes rf1 to rf3 are replaced by role nodes r1 to r3]


Pattern Collection


[Figure: tree t after iteration i = 1; the subtree below n2 is replaced by the node n2_r3_i]


for i = 1 to n:

1. Identify the set of the lowest non-terminal nodes N_i in t that dominate i arguments (possibly among other nodes).

2. Substitute N_i by nodes labelled with the seed argument roles and their entity classes.

3. Prune the subtrees dominated by N_i from t and add these subtrees to the pattern collection. These subtrees are assigned the argument role information and a unique id.
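The procedure above can be sketched in Python. This is a minimal sketch under simplifying assumptions: trees use a hypothetical Node class, a node counts as a seed argument when its label equals the argument string, an argument node's own dependents are dropped when it is replaced, and pattern ids follow the node_roles_id naming seen in the figures. The toy tree mirrors the abstract example, with a1 to a3 standing in for the seed arguments.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from itertools import count


@dataclass
class Node:
    """Hypothetical tree node; `roles` is non-empty only for role-labelled placeholders."""
    label: str
    children: list[Node] = field(default_factory=list)
    roles: tuple[str, ...] = ()


def roles_below(node: Node) -> set[str]:
    """All argument roles dominated by `node`, including its own."""
    found = set(node.roles)
    for child in node.children:
        found |= roles_below(child)
    return found


def mark_seed_arguments(node: Node, seed: dict[str, tuple[str, str]]) -> None:
    """Step 0: replace nodes instantiated with a seed argument by 'Class/role' nodes."""
    for k, child in enumerate(node.children):
        if child.label in seed:
            entity_class, role = seed[child.label]
            node.children[k] = Node(f"{entity_class}/{role}", roles=(role,))
        else:
            mark_seed_arguments(child, seed)


def lowest_dominating(node: Node, i: int) -> list[Node]:
    """Step 1: lowest non-terminal nodes that dominate exactly i argument roles."""
    if node.roles or not node.children:  # terminals and placeholders are skipped
        return []
    lower = [n for child in node.children for n in lowest_dominating(child, i)]
    if lower:
        return lower
    return [node] if len(roles_below(node)) == i else []


def extract_patterns(tree: Node, seed: dict[str, tuple[str, str]], n: int) -> dict[str, Node]:
    """Steps 0-3: collect role-annotated subtrees bottom-up, keyed by a unique id."""
    mark_seed_arguments(tree, seed)
    collection: dict[str, Node] = {}
    ids = count(1)
    for i in range(1, n + 1):
        for target in lowest_dominating(tree, i):
            roles = tuple(sorted(roles_below(target)))
            pattern_id = f"{target.label}_{'_'.join(roles)}_{next(ids)}"
            # Step 3: the pruned subtree (original label and children) goes into the collection.
            collection[pattern_id] = Node(target.label, target.children)
            # Step 2: the target itself becomes a role-labelled placeholder node.
            target.label, target.children, target.roles = pattern_id, [], roles
    return collection


# Abstract tree: n2 dominates one argument, n1 two, n3 all three.
tree = Node("n0", [Node("n3", [Node("n1", [Node("a1"), Node("a2")]),
                               Node("n2", [Node("x"), Node("a3")])])])
seed = {"a1": ("E", "r1"), "a2": ("E", "r2"), "a3": ("E", "r3")}
patterns = extract_patterns(tree, seed, n=3)
print(list(patterns))  # ['n2_r3_1', 'n1_r1_r2_2', 'n3_r1_r2_r3_3']
```

Because each iteration replaces the collected subtrees by placeholders, the pattern stored for n3 refers to the already collected n1 and n2 patterns instead of repeating them, which is what makes the rule representation compositional.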

[Figure: Pattern Collection containing subtree t2 with role r3, stored as n2_r3_i]


[Figure: tree t after iteration i = 2; the subtree below n1 is replaced by the node n1_r1_r2_i]


[Figure: Pattern Collection containing t2 (role r3) and t1 (roles r1, r2)]



[Figure: tree t after iteration i = 3, reduced to n0 over the node n3_r1_r2_r3_k; Pattern Collection containing t2 (r3), t1 (r1, r2) and t3, which dominates n2_r3_i and n1_r1_r2_i]




[Figure: dependency tree of the sentence below, rooted in hire/V; nodes: Acme Inc./N/Organisation, Peter Smith/N/Person, as, COO/N/Position, replace/V, Hans Bloggs/N/Person, according to/P, Fred Bell/N/Person, CEO/N/Position, yesterday/N; edge labels: subj, obj, mod, vpsc-mod, pcomp-n, title]

According to CEO Fred Bell, Acme Inc. hired Peter Smith as COO yesterday, replacing Hans Bloggs.

<Peter Smith/person_in, Hans Bloggs/person_out, COO/position, Acme Inc./organisation>
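Step 0 applied to this sentence can be sketched in Python. The (head, relation, dependent) triples are a hypothetical encoding of the figure, and the exact attachments are an assumption; only the relabelling matters here: every node whose word is a seed argument becomes a Class/role node, while other nodes, such as Fred Bell, keep their labels.

```python
# Seed from the slide: argument string -> role.
SEED = {
    "Peter Smith": "person_in",
    "Hans Bloggs": "person_out",
    "COO": "position",
    "Acme Inc.": "organisation",
}

# Hypothetical (head, relation, dependent) encoding of the figure; attachments are assumed.
EDGES = [
    ("hire/V", "subj", "Acme Inc./N/Organisation"),
    ("hire/V", "obj", "Peter Smith/N/Person"),
    ("hire/V", "mod", "as"),
    ("as", "pcomp-n", "COO/N/Position"),
    ("hire/V", "vpsc-mod", "replace/V"),
    ("replace/V", "obj", "Hans Bloggs/N/Person"),
    ("hire/V", "mod", "according to/P"),
    ("according to/P", "pcomp-n", "Fred Bell/N/Person"),
    ("Fred Bell/N/Person", "title", "CEO/N/Position"),
    ("hire/V", "mod", "yesterday/N"),
]


def generalise(node: str) -> str:
    """Step 0: 'word/POS/Class' becomes 'Class/role' when word is a seed argument."""
    word, *rest = node.split("/")
    if word in SEED and rest:
        return f"{rest[-1]}/{SEED[word]}"
    return node


generalised = [(generalise(head), rel, generalise(dep)) for head, rel, dep in EDGES]
for edge in generalised:
    print(edge)
```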


[Figure: the same dependency tree after step 0; the seed argument nodes become Organisation/organisation, Person/person_in, Person/person_out and Position/position, while according to/P, Fred Bell/N/Person, CEO/N/Position and yesterday/N are unchanged]






DARE Rule Components

1. rule name: r_i;

2. rule body: in AVM format containing:

head: the linguistic annotation of the top node of the linguistic structure;

daughters: its value is a list of specific linguistic structures (e.g., subject, object, head, mod), derived from the linguistic analysis, e.g., dependency structures and the named entity information;

rules: its value is a DARE rule which extracts a subset of arguments of the target relation.

3. Output: the extracted relation instance, i.e., an n-tuple of arguments with their roles
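The rule structure can be approximated with Python dataclasses. This sketch flattens the AVM into fields (the head is reduced to a lemma, the daughters to (dependency, entity class, role) triples) and uses a naive top-down matcher rather than DARE's actual matching; the rule names and the tree encoding are hypothetical. The embedded rules field is what makes the representation recursive: each sub-rule extracts a subset of the arguments, and the parent rule merges them into one relation instance.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Tok:
    """Hypothetical dependency node: lemma, optional entity class, labelled dependents."""
    lemma: str
    cls: str | None = None
    deps: list[tuple[str, Tok]] = field(default_factory=list)


@dataclass
class Rule:
    name: str  # 1. rule name
    head: str  # 2. head: here only the lemma of the top node
    daughters: list[tuple[str, str, str]] = field(default_factory=list)  # (dependency, class, role)
    rules: list[tuple[str, Rule]] = field(default_factory=list)  # (dependency, embedded rule)


def apply_rule(rule: Rule, node: Tok) -> dict[str, str] | None:
    """3. Output: a role -> argument mapping if the rule matches at `node`, else None."""
    if node.lemma != rule.head:
        return None
    instance: dict[str, str] = {}
    for dep, cls, role in rule.daughters:
        match = next((t for d, t in node.deps if d == dep and t.cls == cls), None)
        if match is None:
            return None
        instance[role] = match.lemma
    for dep, sub_rule in rule.rules:
        sub_instance = None
        for d, child in node.deps:
            if d == dep:
                # Recursion: the embedded rule fills a subset of the target roles.
                sub_instance = apply_rule(sub_rule, child)
                if sub_instance is not None:
                    break
        if sub_instance is None:
            return None
        instance.update(sub_instance)
    return instance


as_position = Rule("as_position", "as", daughters=[("pcomp-n", "Position", "position")])
replace_out = Rule("replace_person_out", "replace", daughters=[("obj", "Person", "person_out")])
succession = Rule(
    "hire_succession", "hire",
    daughters=[("subj", "Organisation", "organisation"), ("obj", "Person", "person_in")],
    rules=[("mod", as_position), ("vpsc-mod", replace_out)],
)

sentence = Tok("hire", deps=[
    ("subj", Tok("Acme Inc.", "Organisation")),
    ("obj", Tok("Peter Smith", "Person")),
    ("mod", Tok("yesterday")),
    ("mod", Tok("as", deps=[("pcomp-n", Tok("COO", "Position"))])),
    ("vpsc-mod", Tok("replace", deps=[("obj", Tok("Hans Bloggs", "Person"))])),
])
print(apply_rule(succession, sentence))
# {'organisation': 'Acme Inc.', 'person_in': 'Peter Smith', 'position': 'COO', 'person_out': 'Hans Bloggs'}
```

The first mod dependent (yesterday) does not match the embedded as_position rule, so the matcher moves on to the next mod dependent; this keeps the sketch tolerant of modifiers that the rule does not mention.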


[Figure: example DARE rule prize_area_year_1]


State of the art

Domain Adaptive Relation Extraction Framework (DARE)

Experiments and evaluations

Performance analysis and discussion

Conclusion and future work


Two Domains

Award Events (start with subdomain Nobel Prizes)

reasons:

good news coverage

complete list of all award events

good starting point for other award domains

Management Succession Events

reason: comparison with previous work


Experiments

Two domains

Nobel Prize Awards: <recipient, prize, area, year>

Management Succession: <person_in, person_out, position, organisation>
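To make the notion of relation projections concrete, the sketch below enumerates the projections of the 4-ary Nobel Prize relation and restricts the Arias seed to one of them. It assumes that a projection is any subset of at least two argument roles; that minimum size and the function names are assumptions, not taken from the slides.

```python
from itertools import combinations

NOBEL_ROLES = ("recipient", "prize", "area", "year")


def projections(roles, min_size=2):
    """All role subsets of size >= min_size, keeping the original role order."""
    return [combo for k in range(min_size, len(roles) + 1)
            for combo in combinations(roles, k)]


def project(instance, roles):
    """Restrict a relation instance (role -> value) to the given roles."""
    return tuple(instance[r] for r in roles)


seed = {"recipient": "Arias, Oscar", "prize": "nobel", "area": "peace", "year": 1987}
print(len(projections(NOBEL_ROLES)))         # 11
print(project(seed, ("recipient", "year")))  # ('Arias, Oscar', 1987)
```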

Test data sets

Data Set Name               Doc Number   Data Amount
Nobel Prize A (1981-1998)   1032         5.8 MB
Nobel Prize B (1999-2005)   2296         12.6 MB
Nobel Prize A+B             3328         18.4 MB
MUC-6                       199          1 MB


Evaluation Against Ideal Tables

Data Set        Seed                                          Precision   Recall
Nobel Prize A   <[Sen, Amartya], nobel, economics, 1998>      87.3%       31.0%
Nobel Prize A   <[Arias, Oscar], nobel, peace, 1987>          83.8%       32.0%
Nobel Prize B   <[Zewail, Ahmed H], nobel, chemistry, 1999>   71.6%       50.7%
A+B             <[Zewail, Ahmed H], nobel, chemistry, 1999>   80.6%       62.9%
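Evaluation against an ideal table can be sketched as a set comparison. The sketch assumes that an extracted tuple counts as correct only if it matches an ideal-table tuple exactly; the slides do not say how partial matches were scored, and the tuples below are placeholders, not the data behind the results above.

```python
def precision_recall(extracted, ideal):
    """Exact-match precision and recall of extracted tuples against an ideal table."""
    correct = len(extracted & ideal)
    precision = correct / len(extracted) if extracted else 0.0
    recall = correct / len(ideal) if ideal else 0.0
    return precision, recall


# Hypothetical toy tables.
ideal = {
    ("recipient_a", "nobel", "peace", 1987),
    ("recipient_b", "nobel", "economics", 1998),
    ("recipient_c", "nobel", "chemistry", 1999),
    ("recipient_d", "nobel", "physics", 1999),
}
extracted = {
    ("recipient_a", "nobel", "peace", 1987),
    ("recipient_c", "nobel", "chemistry", 1999),
    ("recipient_c", "nobel", "physics", 1999),  # wrong area: lowers precision
}
p, r = precision_recall(extracted, ideal)
print(f"precision {p:.1%}, recall {r:.1%}")  # precision 66.7%, recall 50.0%
```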


Management Succession Domain

Initial Seed #   Precision   Recall
1                15.1%       21.8%
20               48.4%       34.2%
55               62.0%       48.0%
1                12.6%       7.0%


Comparison

Our result with 20 seeds (after 4 iterations)

- precision: 48.4%
- recall: 34.2%

compares well with the best result reported so far, by Greenwood and Stevenson (2006) using the linked chain model and starting with 7 hand-crafted patterns (after 190 iterations):

- precision: 43.4%
- recall: 26.5%


Reusability of Rules

Prize award patterns

Detection of other prizes such as the Pulitzer Prize and the Turner Prize: precision 86.2%

Management succession

Domain-independent binary pattern rules: Person-Organisation, Person-Position

Evaluation of the top 100 relation instances: precision 98%


State of the art

Domain Adaptive Relation Extraction Framework (DARE)

Experiments and evaluations

Performance analysis and discussion

Conclusion and future work


The Dream

Wouldn't it be wonderful if we could always automatically learn most or all relevant patterns of some relation from one single semantic instance!

Or at least find all event instances.

This sounds too good to be true!


Research Questions

As scientists we want to know

Why does it work for some tasks?

Why doesn't it work for all tasks?

How can we estimate the suitability of domains?

How can we deal with less suitable domains?


Careful analysis confirmed the following assumption: redundancy, both in patterns and in event mentions, helps. Frequently reported events make rare patterns reachable.

[Figure: pattern and event nodes PH, Er, EH, Pr]

Popular patterns help to reach rarely mentioned events


Instance to Pattern (Nobel Prize vs. Management Succession)


Rule to Instances (Nobel Prize vs. Management Succession)


Insights

Results from graph theory help to understand the requirements on data.

Example: small world property

For data sets with continents and islands, we can sometimes exploit additional data or auxiliary domains to bridge the islands by learning rare patterns.

Example: use of the Nobel Prize domain for learning patterns for events concerning less popular prizes (many other prizes could be detected)
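The continents-and-islands view can be illustrated with a small reachability sketch. It models bootstrapping as breadth-first search over a bipartite graph of events and patterns, assuming that every learned pattern is applied perfectly; the event and pattern names are hypothetical, and this is an idealisation rather than the analysis behind the slides.

```python
from collections import deque


def reachable(mentions, seed_event):
    """Breadth-first search over the bipartite event-pattern graph built from mentions."""
    graph = {}
    for event, pattern in mentions:
        graph.setdefault(("event", event), set()).add(("pattern", pattern))
        graph.setdefault(("pattern", pattern), set()).add(("event", event))
    start = ("event", seed_event)
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    events = {name for kind, name in seen if kind == "event"}
    patterns = {name for kind, name in seen if kind == "pattern"}
    return events, patterns


# Hypothetical mentions: (event, pattern used to report it).
mentions = [
    ("e1", "p_popular"), ("e2", "p_popular"),  # a popular pattern reaches e2
    ("e2", "p_rare"), ("e3", "p_rare"),        # frequently reported e2 makes p_rare reachable
    ("e4", "p_island"),                        # an island with no link to the rest
]
events, _ = reachable(mentions, "e1")
print(sorted(events))  # ['e1', 'e2', 'e3']

# Bridging the island with one extra mention, e.g. from additional data or an auxiliary domain.
events, _ = reachable(mentions + [("e4", "p_popular")], "e1")
print(sorted(events))  # ['e1', 'e2', 'e3', 'e4']
```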


State of the art

Domain Adaptive Relation Extraction Framework (DARE)

Experiments and evaluations

Performance analysis and discussion

Conclusion and future work


Conclusion

DARE is the first approach to combine the idea of bootstrapping IE systems with a linguistic grammar.

This can be illustrated by a simple formula:

reusable generic linguistic knowledge + raw data + a few examples (seed) = domain-specific relation extraction grammar

In addition to the obvious practical advantages, the approach offers theoretical benefits: it supports a view of IE as a systematic, gradual approximation of language understanding.


Future Work

Improvement of recall: extension of learning data

• Bridging the islands by new additional data
• Use of a related domain, e.g., the Nobel Prize domain for other prizes

Improvement of rule generalization: intersentential extraction

Improvement of precision: negative rules (domain-independent and domain-specific) and integration of high-precision NLP analysis (HPSG)

