Relation Extraction - Sameer Singh
sameersingh.org/courses/...relation-extraction.pdf

Relation Extraction. Prof. Sameer Singh, CS 295: Statistical NLP, Winter 2017. February 23, 2017. Based on slides from Dan Jurafsky, Chris Manning, and everyone else they copied from.

Page 1

Relation Extraction

Prof. Sameer Singh
CS 295: STATISTICAL NLP

WINTER 2017

February 23, 2017

Based on slides from Dan Jurafsky, Chris Manning, and everyone else they copied from.

Page 2

Outline

Introduction to Relation Extraction

Hand-written Patterns

Supervised Machine Learning

Semi- and Unsupervised Learning

Page 4

Knowledge Extraction

Text: “John was born in Liverpool, to Julia and Alfred Lennon.”

Literal facts:
◦ John Lennon birthplace Liverpool
◦ John Lennon childOf Julia Lennon
◦ John Lennon childOf Alfred Lennon

Page 5

Relation Extraction

Company report: “International Business Machines Corporation (IBM or the company) was incorporated in the State of New York on June 16, 1911, as the Computing-Tabulating-Recording Co. (C-T-R)…”

Extracted complex relation: Company-Founding
◦ Company: IBM
◦ Location: New York
◦ Date: June 16, 1911
◦ Original-Name: Computing-Tabulating-Recording Co.

But we will focus on the simpler task of extracting relation triples:
◦ Founding-year(IBM, 1911)
◦ Founding-location(IBM, New York)

Page 6

Extracting Relation Triples

The Leland Stanford Junior University, commonly referred to as Stanford University or Stanford, is an American private research university located in Stanford, California … near Palo Alto, California … Leland Stanford … founded the university in 1891

◦ Stanford EQ Leland Stanford Junior University
◦ Stanford LOC-IN California
◦ Stanford IS-A research university
◦ Stanford LOC-NEAR Palo Alto
◦ Stanford FOUNDED-IN 1891
◦ Stanford FOUNDER Leland Stanford
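One minimal way to hold extracted triples like these is a list of (subject, relation, object) records with a wildcard query. This is a sketch under our own assumptions: `Triple` and `query` are hypothetical names, not part of any system described in these slides.

```python
from typing import List, NamedTuple, Optional


class Triple(NamedTuple):
    subject: str
    relation: str
    obj: str


# The Stanford triples from the slide, stored as plain records.
STANFORD = [
    Triple("Stanford", "EQ", "Leland Stanford Junior University"),
    Triple("Stanford", "LOC-IN", "California"),
    Triple("Stanford", "IS-A", "research university"),
    Triple("Stanford", "LOC-NEAR", "Palo Alto"),
    Triple("Stanford", "FOUNDED-IN", "1891"),
    Triple("Stanford", "FOUNDER", "Leland Stanford"),
]


def query(triples: List[Triple], subject: Optional[str] = None,
          relation: Optional[str] = None, obj: Optional[str] = None) -> List[Triple]:
    """Return triples matching every field that is not None (None acts as a wildcard)."""
    return [
        t for t in triples
        if (subject is None or t.subject == subject)
        and (relation is None or t.relation == relation)
        and (obj is None or t.obj == obj)
    ]
```

For example, `query(STANFORD, relation="FOUNDED-IN")` returns only the FOUNDED-IN triple.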

Page 7

News Domain

ROLE: relates a person to an organization or a geopolitical entity
◦ subtypes: member, owner, affiliate, client, citizen

PART: generalized containment
◦ subtypes: subsidiary, physical part-of, set membership

AT: permanent and transient locations
◦ subtypes: located, based-in, residence

SOCIAL: social relations among persons
◦ subtypes: parent, sibling, spouse, grandparent, associate

Page 8

Automated Content Extraction (ACE)

PHYSICAL: Located, Near
PART-WHOLE: Geographical, Subsidiary
PERSON-SOCIAL: Business, Family, Lasting Personal
ORG AFFILIATION: Founder, Employment, Membership, Ownership, Student-Alum, Investor, Sports-Affiliation
GENERAL AFFILIATION: Citizen-Resident-Ethnicity-Religion, Org-Location-Origin
ARTIFACT: User-Owner-Inventor-Manufacturer

Page 9

ACE Relations Examples

◦ Physical-Located (PER-GPE): He was in Tennessee
◦ Part-Whole-Subsidiary (ORG-ORG): XYZ, the parent company of ABC
◦ Person-Social-Family (PER-PER): John’s wife Yoko
◦ Org-AFF-Founder (PER-ORG): Steve Jobs, co-founder of Apple…

Page 10

Geographical Relations

Page 11

Medical Relations: UMLS Resource

Page 12

Medical Relations

Doppler echocardiography can be used to diagnose left anterior descending artery stenosis in patients with type 2 diabetes

↓

Echocardiography, Doppler DIAGNOSES Acquired stenosis

Page 13

Freebase Relations

Thousands of relations and millions of instances! Manually created from multiple sources, including Wikipedia infoboxes

Page 14

Ontological Relations

IS-A (hypernym): subsumption between classes
◦ Giraffe IS-A ruminant IS-A ungulate IS-A mammal IS-A vertebrate IS-A animal…

Instance-of: relation between individual and class
◦ San Francisco instance-of city
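Because IS-A is transitive, a chain like the giraffe example can be stored as direct edges and expanded on demand, and instance-of then lifts an individual into the whole hierarchy above its class. A minimal sketch assuming a plain in-memory dictionary; `build_isa`, `all_hypernyms`, and `instance_is_a` are hypothetical helper names.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def build_isa(edges: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Map each class to its direct IS-A parents (hypernyms)."""
    parents: Dict[str, Set[str]] = defaultdict(set)
    for child, parent in edges:
        parents[child].add(parent)
    return parents


def all_hypernyms(term: str, parents: Dict[str, Set[str]]) -> Set[str]:
    """Every class reachable from `term` by following IS-A links (IS-A is transitive)."""
    seen: Set[str] = set()
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:  # also guards against cycles in noisy data
                seen.add(parent)
                stack.append(parent)
    return seen


def instance_is_a(individual: str, cls: str, instance_of: Dict[str, str],
                  parents: Dict[str, Set[str]]) -> bool:
    """Instance-of links an individual to its class; IS-A lifts it up the hierarchy."""
    direct = instance_of[individual]
    return cls == direct or cls in all_hypernyms(direct, parents)


# The giraffe chain from the slide, as direct IS-A edges.
PARENTS = build_isa([
    ("giraffe", "ruminant"), ("ruminant", "ungulate"), ("ungulate", "mammal"),
    ("mammal", "vertebrate"), ("vertebrate", "animal"),
])
```

`all_hypernyms("giraffe", PARENTS)` yields all five ancestors; `"animal"` has none.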

Page 16

Rules for IS-A Relation

Early intuition from Hearst (1992): “Agar is a substance prepared from a mixture of red algae, such as Gelidium, for laboratory or industrial use”

What does Gelidium mean?

How do you know?

Page 17

Hearst’s Patterns for IS-A Relations

Hearst (1992): Automatic Acquisition of Hyponyms

◦ “Y such as X ((, X)* (, and|or) X)”
◦ “such Y as X”
◦ “X or other Y”
◦ “X and other Y”
◦ “Y including X”
◦ “Y, especially X”

Page 18

Hearst’s Patterns for IS-A Relations

Hearst pattern: example occurrences
◦ X and other Y: ...temples, treasuries, and other important civic buildings.
◦ X or other Y: Bruises, wounds, broken bones or other injuries...
◦ Y such as X: The bow lute, such as the Bambara ndang...
◦ Such Y as X: ...such authors as Herrick, Goldsmith, and Shakespeare.
◦ Y including X: ...common-law countries, including Canada and England...
◦ Y, especially X: European countries, especially France, England, and Spain...
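Hearst's patterns translate naturally into regular expressions. The sketch below assumes noun phrases have already been chunked and marked with [square brackets] (our own input convention), and it reduces the hypernym phrase to its last word as a stand-in for its head; both are simplifying assumptions, and `hearst_isa` is a hypothetical name.

```python
import re
from typing import List, Set, Tuple

# Assumption: noun phrases are pre-chunked and written as [bracketed] spans.
NP = r"\[[^\]]+\]"
# X, X, ..., (and|or) X: a comma-separated list of NPs with an optional final conjunct
LIST = rf"{NP}(?:, {NP})*(?:,? (?:and|or) {NP})?"

# (regex, group holding the hypernym Y, group holding the hyponym list X)
HEARST = [
    (rf"({NP}),? such as ({LIST})", 1, 2),                    # Y such as X
    (rf"such ({NP}) as ({LIST})", 1, 2),                      # such Y as X
    (rf"({NP}(?:, {NP})*),? (?:and|or) other ({NP})", 2, 1),  # X and/or other Y
    (rf"({NP}),? including ({LIST})", 1, 2),                  # Y including X
    (rf"({NP}), especially ({LIST})", 1, 2),                  # Y, especially X
]


def noun_phrases(text: str) -> List[str]:
    """Strip the brackets: '[red algae], [Gelidium]' -> ['red algae', 'Gelidium']."""
    return re.findall(r"\[([^\]]+)\]", text)


def hearst_isa(sentence: str) -> Set[Tuple[str, str]]:
    """Return (hyponym, hypernym) pairs; the hypernym is cut to its last word."""
    pairs = set()
    for regex, y_group, x_group in HEARST:
        for m in re.finditer(regex, sentence):
            hypernym = noun_phrases(m.group(y_group))[0].split()[-1]
            for hyponym in noun_phrases(m.group(x_group)):
                pairs.add((hyponym, hypernym))
    return pairs
```

On the Agar sentence, chunked as `"... of [red algae], such as [Gelidium], for [laboratory or industrial use]"`, this yields the single pair (Gelidium, algae), which is exactly the inference the earlier slide asks the reader to make.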

Page 19

Extracting Richer Relations

Intuition: relations often hold between specific types of entities
◦ located-in (ORGANIZATION, LOCATION)
◦ founded (PERSON, ORGANIZATION)
◦ cures (DRUG, DISEASE)

Start with named entity tags to extract relations!

Page 20

Entity Types Aren’t Enough

Drug → Disease: Cure? Prevent? Cause?

Which relations hold between 2 entities?

Page 21

Which relations hold between two entities?

PERSON → ORGANIZATION: Founder? Investor? Member? Employee? President?

Page 22

Extracting Richer Relations Using Rules and Named Entities

Who holds what office in what organization?

PERSON, POSITION of ORG
◦ George Marshall, Secretary of State of the United States

PERSON (named|appointed|chose|etc.) PERSON Prep? POSITION
◦ Truman appointed Marshall Secretary of State

PERSON [be]? (named|appointed|etc.) Prep? ORG POSITION
◦ George Marshall was named US Secretary of State
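The three rules above can be approximated with regular expressions over NER-tagged text. Assumptions, all ours: entities are inlined as `[PER ...]` and `[ORG ...]` spans, POSITION comes from a small hypothetical lexicon, the open-ended "etc." verb lists are cut down to the verbs shown, Prep? is limited to "as" or "to", and the ORG in the third rule is optional. `holds_office` is a hypothetical name.

```python
import re
from typing import Optional, Set, Tuple

# Hypothetical POSITION lexicon; a real system would need a much larger list.
POSITIONS = ["Secretary of State", "CEO", "President"]
POSITION = "(?P<position>" + "|".join(re.escape(p) for p in POSITIONS) + ")"
PREP = r"(?:(?:as|to) )?"  # Prep? reduced to an optional "as" or "to"


def ent(tag: str, name: str) -> str:
    """Regex for an entity inlined as [TAG text] (assumed NER output format)."""
    return rf"\[{tag} (?P<{name}>[^\]]+)\]"


RULES = [
    # PERSON, POSITION of ORG
    rf"{ent('PER', 'person')}, {POSITION} of {ent('ORG', 'org')}",
    # PERSON (named|appointed|chose) PERSON Prep? POSITION
    rf"{ent('PER', 'boss')} (?:named|appointed|chose) {ent('PER', 'person')} {PREP}{POSITION}",
    # PERSON [be]? (named|appointed) Prep? ORG POSITION
    rf"{ent('PER', 'person')} (?:(?:was|is) )?(?:named|appointed) {PREP}(?:{ent('ORG', 'org')} )?{POSITION}",
]


def holds_office(sentence: str) -> Set[Tuple[str, str, Optional[str]]]:
    """Return (person, position, organization-or-None) facts matched by any rule."""
    facts = set()
    for rule in RULES:
        for m in re.finditer(rule, sentence):
            facts.add((m.group("person"), m.group("position"), m.groupdict().get("org")))
    return facts
```

On `"[PER Truman] appointed [PER Marshall] Secretary of State"` this returns `{("Marshall", "Secretary of State", None)}`: the rule names the appointee, not the appointer, as the office holder.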

Page 23

Complex Surface Patterns

Combine tokens, dependency paths, and entity types to define rules.

Pattern: Argument1 (Person) -appos-> CEO -nmod-> Argument2 (Organization), where CEO has a determiner (DT, det) and Argument2 has the case marker “of” (case)

◦ Bill Gates, the CEO of Microsoft, said…
◦ Mr. Jobs, the brilliant and charming CEO of Apple Inc., said…
◦ … announced by Steve Jobs, the CEO of Apple.
◦ … announced by Bill Gates, the director and CEO of Microsoft.
◦ … mused Bill, a former CEO of Microsoft.
◦ and many other possible instantiations…

Page 24

Rule-Based Extraction

Use a collection of rules as the system itself:
Argument1 (Person) -appos-> CEO -nmod-> Argument2 (Organization), with DT (det) and “of” (case)
Implies headOf(Argument1, Argument2)

Variations
◦ Source: manually specified, or learned from data
◦ Multiple rules: attach priorities/precedence, or attach probabilities (more later)
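A rule like this can be run directly over a dependency parse, and a priority order can resolve conflicts when several rules fire on the same pair. The sketch assumes a hand-written parse given as (head, relation, dependent) edges over unique token strings plus a dict of entity types; it is not tied to any parser. The looser second rule and its `affiliatedWith` label are hypothetical, added only to show precedence.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Set, Tuple

Edge = Tuple[str, str, str]  # (head, relation, dependent); tokens assumed unique


def children(edges: List[Edge]) -> Dict[str, List[Tuple[str, str]]]:
    kids: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for head, rel, dep in edges:
        kids[head].append((rel, dep))
    return kids


def ceo_rule(edges: List[Edge], types: Dict[str, str]) -> Set[Tuple[str, str]]:
    """Person -appos-> CEO -nmod-> Organization, with a det on CEO and case 'of' on the org."""
    kids = children(edges)
    found = set()
    for head, rel, dep in edges:
        if rel != "appos" or dep != "CEO" or types.get(head) != "PERSON":
            continue
        if not any(r == "det" for r, _ in kids[dep]):
            continue
        for r, org in kids[dep]:
            if r == "nmod" and types.get(org) == "ORGANIZATION" and ("case", "of") in kids[org]:
                found.add((head, org))
    return found


def appos_org_rule(edges: List[Edge], types: Dict[str, str]) -> Set[Tuple[str, str]]:
    """Hypothetical looser rule: Person -appos-> any noun -nmod-> Organization."""
    kids = children(edges)
    return {(head, org) for head, rel, dep in edges
            if rel == "appos" and types.get(head) == "PERSON"
            for r, org in kids[dep] if r == "nmod" and types.get(org) == "ORGANIZATION"}


# (priority, relation, rule): the higher priority wins when rules fire on one pair.
DEP_RULES: List[Tuple[int, str, Callable]] = [
    (2, "headOf", ceo_rule),
    (1, "affiliatedWith", appos_org_rule),  # hypothetical relation name
]


def extract(edges: List[Edge], types: Dict[str, str]) -> Dict[Tuple[str, str], str]:
    best: Dict[Tuple[str, str], str] = {}
    for _priority, relation, rule in sorted(DEP_RULES, key=lambda r: -r[0]):
        for pair in rule(edges, types):
            best.setdefault(pair, relation)  # keep the first (highest-priority) relation
    return best


# Hand-written parse of "Bill Gates, the CEO of Microsoft, said" (assumption).
GATES = [
    ("said", "nsubj", "Gates"), ("Gates", "compound", "Bill"), ("Gates", "appos", "CEO"),
    ("CEO", "det", "the"), ("CEO", "nmod", "Microsoft"), ("Microsoft", "case", "of"),
]
```

Both rules fire on (Gates, Microsoft), so the priority order decides that the output is `headOf`.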

Page 25

Hand-built patterns for relations

Pluses
◦ Human patterns tend to be high-precision
◦ Can be tailored to specific domains
◦ Easy to debug: why was a prediction made, and how do we fix it?

Minuses
◦ Human patterns are often low-recall
◦ A lot of work to think of all possible patterns!
◦ Don’t want to have to do this for every relation!
◦ We’d like better accuracy (generalization)

Page 27

Supervised Machine Learning

◦ Choose a set of relations we’d like to extract
◦ Choose a set of relevant named entities
◦ Find and label data
  ◦ Choose a representative corpus
  ◦ Label the named entities in the corpus
  ◦ Hand-label the relations between these entities
  ◦ Break into training, development, and test sets
◦ Train a classifier on the training set

Page 28

Automated Content Extraction (same relation hierarchy as on Page 8)

ACE 2008 “Relation Extraction Task”

Page 29

Relation Extraction

Classify the relation between two entities in a sentence:

American Airlines, a unit of AMR, immediately matched the move, spokesman Tim Wagner said.

Candidate labels: SUBSIDIARY, FAMILY, EMPLOYMENT, NIL, FOUNDER, CITIZEN, INVENTOR, …

Page 30

Word Features for Relation Extraction

American Airlines, a unit of AMR, immediately matched the move, spokesman Tim Wagner said
(Mention 1: American Airlines; Mention 2: Tim Wagner)

Headwords of M1 and M2, and combination
◦ Airlines, Wagner, Airlines-Wagner

Bag of words and bigrams in M1 and M2
◦ {American, Airlines, Tim, Wagner, American Airlines, Tim Wagner}

Words or bigrams in particular positions left and right of M1/M2
◦ M2: -1 spokesman
◦ M2: +1 said

Bag of words or bigrams between the two entities
◦ {a, AMR, of, immediately, matched, move, spokesman, the, unit}
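These word features can be computed directly from token spans. The sketch assumes the sentence is already tokenized, mentions are given as [start, end) token spans with M1 before M2, the headword is the rightmost token of a mention (a common heuristic, not a parse-based head), and punctuation is dropped from the between-words bag. Only unigrams are collected between the mentions; `word_features` is a hypothetical name.

```python
import string
from typing import Dict, List, Set, Tuple

Span = Tuple[int, int]  # [start, end) token offsets


def is_word(tok: str) -> bool:
    return not all(ch in string.punctuation for ch in tok)


def bigrams(toks: List[str]) -> Set[str]:
    return {" ".join(pair) for pair in zip(toks, toks[1:])}


def word_features(tokens: List[str], m1: Span, m2: Span) -> Dict[str, object]:
    m1_toks, m2_toks = tokens[m1[0]:m1[1]], tokens[m2[0]:m2[1]]
    h1, h2 = m1_toks[-1], m2_toks[-1]  # headword = rightmost token (heuristic)
    between = [t for t in tokens[m1[1]:m2[0]] if is_word(t)]
    return {
        "heads": (h1, h2, f"{h1}-{h2}"),
        "mention_bow": set(m1_toks) | set(m2_toks) | bigrams(m1_toks) | bigrams(m2_toks),
        "m2_left": tokens[m2[0] - 1] if m2[0] > 0 else "<s>",
        "m2_right": tokens[m2[1]] if m2[1] < len(tokens) else "</s>",
        "between_bow": set(between),
    }


TOKENS = ("American Airlines , a unit of AMR , immediately matched the move , "
          "spokesman Tim Wagner said").split()
FEATURES = word_features(TOKENS, (0, 2), (14, 16))
```

With M1 = tokens 0-1 and M2 = tokens 14-15, `FEATURES` reproduces the sets listed on the slide.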

Page 31

Named Entity Type and Mention Level Features

American Airlines, a unit of AMR, immediately matched the move, spokesman Tim Wagner said
(Mention 1: American Airlines; Mention 2: Tim Wagner)

Named-entity types
◦ M1: ORG
◦ M2: PERSON

Concatenation of the two named-entity types
◦ ORG-PERSON

Entity level of M1 and M2 (NAME, NOMINAL, PRONOUN)
◦ M1: NAME [it or he would be PRONOUN]
◦ M2: NAME [the company would be NOMINAL]
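Named-entity-type features come straight from the NER tags, while the mention level needs either a coreference system or a heuristic. The `mention_level` rule below (pronoun list, else all-capitalized tokens means NAME, else NOMINAL) is a crude assumption of ours; for example, it would mislabel a sentence-initial "The company" as NAME.

```python
from typing import Dict, List

PRONOUNS = {"he", "she", "it", "they", "him", "her", "them"}


def mention_level(mention: List[str]) -> str:
    """Crude heuristic: pronoun list, else all-capitalized = NAME, else NOMINAL."""
    if len(mention) == 1 and mention[0].lower() in PRONOUNS:
        return "PRONOUN"
    if all(tok[:1].isupper() for tok in mention):
        return "NAME"
    return "NOMINAL"


def entity_features(m1: List[str], m1_type: str, m2: List[str], m2_type: str) -> Dict[str, str]:
    return {
        "m1_type": m1_type,
        "m2_type": m2_type,
        "type_pair": f"{m1_type}-{m2_type}",  # concatenation of the two NE types
        "m1_level": mention_level(m1),
        "m2_level": mention_level(m2),
    }
```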

Page 32

Dependency Parse Features for Relation Extraction

American Airlines, a unit of AMR, immediately matched the move, spokesman Tim Wagner said
(Mention 1: American Airlines; Mention 2: Tim Wagner)

Base syntactic chunk sequence from one to the other
◦ NP NP PP VP NP NP

Constituent path through the tree from one to the other
◦ NP ↑ NP ↑ S ↑ S ↓ NP

Dependency path
◦ Airlines matched Wagner said
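A dependency path can be found with breadth-first search over the tree, treating edges as undirected and recording whether each step goes up to a head (↑) or down to a dependent (↓), in the same notation as the constituent path. The parse below is hand-written and simplified (an assumption, not the output of any particular parser), so its path is one plausible reading of the slide's "Airlines matched Wagner said".

```python
from collections import deque
from typing import Dict, List, Optional, Tuple


def dependency_path(edges: List[Tuple[str, str]], source: str, target: str) -> Optional[str]:
    """Shortest path between two tokens in a dependency tree; edges are (head, dependent)."""
    graph: Dict[str, List[Tuple[str, str]]] = {}
    for head, dep in edges:
        graph.setdefault(dep, []).append((head, "↑"))  # dependent -> head: go up
        graph.setdefault(head, []).append((dep, "↓"))  # head -> dependent: go down
    prev: Dict[str, Optional[Tuple[str, str]]] = {source: None}
    queue = deque([source])
    while queue:  # breadth-first search
        node = queue.popleft()
        if node == target:
            break
        for nxt, arrow in graph.get(node, []):
            if nxt not in prev:
                prev[nxt] = (node, arrow)
                queue.append(nxt)
    if target not in prev:
        return None  # the two tokens are not connected
    path = [target]
    node = target
    while prev[node] is not None:  # walk back to the source, prepending each step
        parent, arrow = prev[node]
        path[:0] = [parent, arrow]
        node = parent
    return " ".join(path)


# Hand-written, simplified parse (assumption): (head, dependent) pairs over unique tokens.
EDGES = [
    ("Airlines", "American"), ("matched", "Airlines"), ("said", "matched"),
    ("Wagner", "Tim"), ("Wagner", "spokesman"), ("said", "Wagner"),
]
```

Here `dependency_path(EDGES, "Airlines", "Wagner")` gives `"Airlines ↑ matched ↑ said ↓ Wagner"`, which can be used as a single string-valued feature.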

Page 33

Gazetteer and Trigger Word Features for Relation Extraction

Trigger list for family: kinship terms
◦ parent, wife, husband, grandparent, etc. [from WordNet]

Gazetteer:
◦ Lists of useful geo or geopolitical words
◦ Country name list
◦ Other sub-entities
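Trigger and gazetteer features reduce to lexicon lookups. The tiny lists below are placeholders: the kinship terms are the ones named on the slide (a fuller list could be drawn from WordNet, as the slide suggests), and the country list is hypothetical. `lexicon_features` is a hypothetical name, and spans follow the same [start, end) convention as token-span features.

```python
from typing import Dict, List, Tuple

# Placeholder lexicons: kinship terms from the slide; the country list is hypothetical.
KINSHIP_TRIGGERS = {"parent", "wife", "husband", "grandparent"}
COUNTRIES = {"Canada", "England", "France", "Spain"}


def lexicon_features(tokens: List[str], m1: Tuple[int, int], m2: Tuple[int, int]) -> Dict[str, bool]:
    between = tokens[m1[1]:m2[0]]
    return {
        "family_trigger_between": any(t.lower() in KINSHIP_TRIGGERS for t in between),
        "m1_is_country": " ".join(tokens[m1[0]:m1[1]]) in COUNTRIES,
        "m2_is_country": " ".join(tokens[m2[0]:m2[1]]) in COUNTRIES,
    }
```

For "John 's wife Yoko", the trigger "wife" between the mentions fires the family feature.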

Page 34

American Airlines, a unit of AMR, immediately matched the move, spokesman Tim Wagner said.

Page 35

Supervised Extraction

Machine learning: hopefully, it generalizes the labels in the right way

Use all of NLP as features: words, POS, NER, dependencies, embeddings

“John was born in Liverpool, to Julia and Alfred Lennon.” → Feature Engineering (NER, DepPath, text in b/w, embeddings, POS, …) → Classifier → P(birthplace) = 0.75

However:
◦ Usually, a lot of labeled data is needed, which is expensive & time consuming.
◦ Requires a lot of feature engineering!
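The classifier box can be any probabilistic classifier. As one concrete choice (our assumption; the slide does not name a model), here is binary logistic regression over sparse binary features, trained by stochastic gradient ascent on the log-likelihood. The feature strings and the four training examples are made up for illustration.

```python
import math
from collections import defaultdict
from typing import Iterable, List, Set, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


class LogisticRelationClassifier:
    """Binary logistic regression over sparse binary features: P(relation | features)."""

    def __init__(self, learning_rate: float = 0.5) -> None:
        self.weights = defaultdict(float)
        self.bias = 0.0
        self.learning_rate = learning_rate

    def prob(self, features: Iterable[str]) -> float:
        return sigmoid(self.bias + sum(self.weights.get(f, 0.0) for f in features))

    def fit(self, examples: List[Tuple[Set[str], int]], epochs: int = 100) -> "LogisticRelationClassifier":
        for _ in range(epochs):
            for features, label in examples:
                error = label - self.prob(features)  # log-likelihood gradient w.r.t. the score
                self.bias += self.learning_rate * error
                for f in features:
                    self.weights[f] += self.learning_rate * error
        return self


# Made-up training data: 1 = birthplace, 0 = no relation.
TRAIN = [
    ({"between=was born in", "types=PER-LOC"}, 1),
    ({"between='s birthplace in", "types=PER-LOC"}, 1),
    ({"between=visited", "types=PER-LOC"}, 0),
    ({"between=moved to", "types=PER-LOC"}, 0),
]
clf = LogisticRelationClassifier().fit(TRAIN)
```

Because the shared `types=PER-LOC` feature appears in every example, it cannot separate the classes; the learned weights on the between-words features do that work.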

Page 36

Supervised Relation Extraction

Pluses
◦ Can get high accuracies, if there is enough training data and if the test data is similar enough to the training data
◦ Can utilize a number of NLP tasks

Minuses
◦ Labeling a large training set is expensive
◦ Supervised models are brittle and don’t generalize well to different genres

Page 38

Seed-based or bootstrapping approaches to relation extraction

No training set? Maybe you have:
◦ A few seed tuples, or
◦ A few high-precision patterns

Can you use those seeds to do something useful?
◦ Bootstrapping: use the seeds to directly learn a relation

Page 39

Relation Bootstrapping

Gather a set of seed pairs that have the relation, then:
1. Find sentences with these pairs
2. Look at the context between or around the pair and generalize the context to create patterns
3. Use the patterns to gather more pairs
4. Repeat

CS295:STATISTICALNLP(WINTER2017) 39

Page 40: Relation Extraction - Sameer Singhsameersingh.org/courses/...relation-extraction.pdf · Relation Extraction Classify the relation between two entities in a sentence American Airlines,

Bootstrapping Example

<Mark Twain, Elmira>: seed tuple of “died in”

Look for the environments of the seed tuple:
◦ “Mark Twain is buried in Elmira, NY.” → X is buried in Y
◦ “The grave of Mark Twain is in Elmira” → The grave of X is in Y
◦ “Elmira is Mark Twain’s final resting place” → Y is X’s final resting place.

Use those patterns to find new tuples

Repeat
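The bootstrapping loop can be sketched in a few functions, starting from the Mark Twain seed. Assumptions, all ours: entities are approximated by runs of capitalized words, a pattern is just the literal text between X and Y (no prefix or suffix), and the four-sentence corpus is made up. `bootstrap` and its helpers are hypothetical names.

```python
import re
from typing import List, Set, Tuple

Pair = Tuple[str, str]
# Assumption: an entity is a run of capitalized words (a crude stand-in for NER).
ENTITY = r"([A-Z][\w']*(?: [A-Z][\w']*)*)"


def find_middles(corpus: List[str], pairs: Set[Pair]) -> Set[str]:
    """Steps 1-2: the text between X and Y in every sentence that mentions a known pair."""
    middles = set()
    for sentence in corpus:
        for x, y in pairs:
            i = sentence.find(x)
            j = sentence.find(y, i + len(x)) if i != -1 else -1
            if j != -1 and sentence[i + len(x):j].strip():
                middles.add(sentence[i + len(x):j])
    return middles


def apply_patterns(corpus: List[str], middles: Set[str]) -> Set[Pair]:
    """Step 3: treat each middle as a pattern 'X <middle> Y' and harvest pairs."""
    found = set()
    for middle in middles:
        regex = re.compile(ENTITY + re.escape(middle) + ENTITY)
        for sentence in corpus:
            for m in regex.finditer(sentence):
                found.add((m.group(1), m.group(2)))
    return found


def bootstrap(corpus: List[str], seeds: Set[Pair], max_iters: int = 5) -> Tuple[Set[Pair], Set[str]]:
    pairs, patterns = set(seeds), set()
    for _ in range(max_iters):  # step 4: repeat until nothing new is found
        patterns |= find_middles(corpus, pairs)
        new_pairs = apply_patterns(corpus, patterns) - pairs
        if not new_pairs:
            break
        pairs |= new_pairs
    return pairs, patterns


CORPUS = [  # made-up sentences
    "Mark Twain is buried in Elmira, NY.",
    "Jane Austen is buried in Winchester.",
    "The grave of Jane Austen is in Winchester.",
    "The grave of Charles Dickens is in Westminster.",
]
PAIRS, PATTERNS = bootstrap(CORPUS, {("Mark Twain", "Elmira")})
```

The second iteration learns the very short middle " is in ", which here happens to find a correct new pair but would match many unrelated sentences in a real corpus. That drift is what per-pattern confidences (as in Snowball, Page 42) are meant to control.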

Page 41

DIPRE: Extract <author, book> pairs

Start with 5 seeds:
◦ <Isaac Asimov, The Robots of Dawn>
◦ <David Brin, Startide Rising>
◦ <James Gleick, Chaos: Making a New Science>
◦ <Charles Dickens, Great Expectations>
◦ <William Shakespeare, The Comedy of Errors>

Find instances on the Web:
◦ The Comedy of Errors, by William Shakespeare, was
◦ The Comedy of Errors, by William Shakespeare, is
◦ The Comedy of Errors, one of William Shakespeare's earliest attempts
◦ The Comedy of Errors, one of William Shakespeare's most

Extract patterns (group by middle, take longest common prefix/suffix):
◦ ?x , by ?y ,
◦ ?x , one of ?y 's

Now iterate, finding new seeds that match the pattern

Brin, Sergei. 1998. Extracting Patterns…
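The pattern-induction step (group occurrences by their middle string, then keep the longest common suffix of the left contexts and the longest common prefix of the right contexts) can be sketched with `os.path.commonprefix`, which compares strings character by character. This is a simplified sketch: the occurrences are the four from the slide with empty left contexts, other parts of the original algorithm are omitted, and `induce_patterns` and `render` are hypothetical names.

```python
import os
from collections import defaultdict
from typing import Dict, List, Tuple

# (prefix, x, middle, y, suffix): text around one occurrence of a seed pair.
Occurrence = Tuple[str, str, str, str, str]


def longest_common_suffix(strings: List[str]) -> str:
    return os.path.commonprefix([s[::-1] for s in strings])[::-1]


def induce_patterns(occurrences: List[Occurrence]) -> Dict[str, Tuple[str, str]]:
    """Group by exact middle; keep the common end of prefixes and common start of suffixes."""
    groups: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for prefix, _x, middle, _y, suffix in occurrences:
        groups[middle].append((prefix, suffix))
    return {
        middle: (longest_common_suffix([p for p, _ in ctx]),
                 os.path.commonprefix([s for _, s in ctx]))
        for middle, ctx in groups.items()
    }


def render(middle: str, prefix: str, suffix: str) -> str:
    return f"{prefix}?x{middle}?y{suffix}"


OCCURRENCES = [  # the four web instances from the slide (left context left empty)
    ("", "The Comedy of Errors", ", by ", "William Shakespeare", ", was"),
    ("", "The Comedy of Errors", ", by ", "William Shakespeare", ", is"),
    ("", "The Comedy of Errors", ", one of ", "William Shakespeare", "'s earliest attempts"),
    ("", "The Comedy of Errors", ", one of ", "William Shakespeare", "'s most"),
]
PATTERNS = induce_patterns(OCCURRENCES)
```

Rendering the two groups gives `"?x, by ?y, "` and `"?x, one of ?y's "`, matching the slide's two patterns up to spacing.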

Page 42

Snowball: Similar Iterative Algorithm

Group instances with similar prefix, middle, suffix; extract patterns
◦ But require that X and Y be named entities
◦ And compute a confidence for each pattern

Seed tuples (Organization, Location of Headquarters):
◦ Microsoft, Redmond
◦ Exxon, Irving
◦ IBM, Armonk

Example patterns with confidences:
◦ .69  ORGANIZATION {’s, in, headquarters} LOCATION
◦ .75  LOCATION {in, based} ORGANIZATION

E. Agichtein and L. Gravano, ICDL (2000)
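Snowball scores each pattern by how often its extractions agree with tuples already believed correct. The sketch follows the positive / (positive + negative) formulation from the Snowball paper: an extraction counts as positive if its organization is known and the location agrees, negative if the organization is known but the location differs, and extractions for unknown organizations are ignored. The extraction list in the usage is invented, and returning 0.0 when there is no evidence is our own choice.

```python
from typing import Dict, Iterable, Tuple


def pattern_confidence(extracted: Iterable[Tuple[str, str]], known: Dict[str, str]) -> float:
    """Snowball-style confidence: positive / (positive + negative)."""
    positive = negative = 0
    for org, loc in extracted:
        if org not in known:
            continue  # unknown organizations neither confirm nor contradict the pattern
        if known[org] == loc:
            positive += 1
        else:
            negative += 1
    total = positive + negative
    return positive / total if total else 0.0


SEEDS = {"Microsoft": "Redmond", "Exxon": "Irving", "IBM": "Armonk"}
```

A pattern that produced (Microsoft, Redmond), (IBM, Armonk), a wrong (Exxon, Dallas), and an unverifiable (Intel, Santa Clara) would score 2/3.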

Page 43

Distant Supervision

Combine bootstrapping with supervised learning
◦ Instead of 5 (or just a few) seeds,
◦ Use a large database to get a huge # of seed examples
◦ Create lots of features from all these examples
◦ Combine in a supervised classifier

Snow, Jurafsky, Ng (2005); Wu & Weld (2007); Mintz, Bills, Snow, Jurafsky (2009)

Page 44

Distantly Supervised Learning of Relation Extraction Patterns

1. For each relation: Born-In
2. For each tuple in big database: <Edwin Hubble, Marshfield>, <Albert Einstein, Ulm>
3. Find sentences in large corpus with both entities: “Hubble was born in Marshfield”, “Einstein, born (1879), Ulm”, “Hubble’s birthplace in Marshfield”
4. Extract frequent features (parse, words, etc.): PER was born in LOC; PER, born (XXXX), LOC; PER’s birthplace in LOC
5. Train supervised classifier using these patterns: P(born-in | f1, f2, f3, …, f70000)
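Steps 1 to 4 can be sketched as a labeling pass over a corpus. Assumptions, all ours: entities are matched by exact surface string, standing in for real entity linking (which is why the database tuples below use "Hubble" rather than "Edwin Hubble"); the only feature is the text between the two mentions, with E1/E2 placeholders instead of the slide's PER/LOC types; digits are replaced by X as in "born (XXXX)". Every sentence that mentions both entities gets the label, even if it does not express the relation; that noise comes with the distant supervision assumption. Step 5 would train a classifier on these examples.

```python
import re
from typing import Dict, List, Set, Tuple


def distant_examples(kb: Dict[str, Set[Tuple[str, str]]],
                     corpus: List[str]) -> List[Tuple[str, str]]:
    """Label each sentence mentioning both entities of a known tuple with that relation."""
    examples = []
    for relation, tuples in kb.items():            # 1. for each relation
        for e1, e2 in tuples:                      # 2. for each tuple in the database
            for sentence in corpus:                # 3. sentences containing both entities
                i = sentence.find(e1)
                j = sentence.find(e2, i + len(e1)) if i != -1 else -1
                if j == -1:
                    continue
                middle = re.sub(r"\d", "X", sentence[i + len(e1):j])  # 4. normalize digits
                examples.append((f"E1{middle}E2", relation))
    return examples


KB = {"born-in": {("Hubble", "Marshfield"), ("Einstein", "Ulm")}}
CORPUS = [
    "Hubble was born in Marshfield",
    "Einstein, born (1879), Ulm",
    "Hubble's birthplace in Marshfield",
    "Einstein visited Princeton",  # mentions only one entity of a tuple: skipped
]
EXAMPLES = distant_examples(KB, CORPUS)
```

The three resulting features mirror the slide's: "E1 was born in E2", "E1, born (XXXX), E2", and "E1's birthplace in E2".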

Page 45

Distant Supervision Paradigm

Like supervised classification:
◦ Uses a classifier with lots of features
◦ Supervised by detailed hand-created knowledge
◦ Doesn’t require iteratively expanding patterns

Like unsupervised classification:
◦ Uses very large amounts of unlabeled data
◦ Not sensitive to genre issues in training corpus

Page 46

Unsupervised Relation Extraction

Open Information Extraction:
◦ Extract relations from the web with no training data and no list of relations

1. Use parsed data to train a “trustworthy tuple” classifier
2. Single-pass extract all relations between NPs, keep if trustworthy
3. Assessor ranks relations based on text redundancy

Example extractions: (FCI, specializes in, software development); (Tesla, invented, coil transformer)

Banko, Cafarella, Soderland, Broadhead, Etzioni. 2007
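The redundancy idea behind step 3 can be illustrated by counting how many distinct sentences support each extracted tuple and ranking by that count. This raw count is a stand-in of our own choosing and is much simpler than a real assessor; `rank_by_redundancy` is a hypothetical name and the sentence IDs are made up.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]


def rank_by_redundancy(extractions: Iterable[Tuple[int, Triple]]) -> List[Tuple[Triple, int]]:
    """Rank (arg1, relation, arg2) tuples by the number of distinct supporting sentences."""
    support: Dict[Triple, Set[int]] = defaultdict(set)
    for sentence_id, triple in extractions:
        support[triple].add(sentence_id)  # repeated extractions from one sentence count once
    return sorted(((t, len(ids)) for t, ids in support.items()),
                  key=lambda item: (-item[1], item[0]))
```

Ties are broken alphabetically so the ranking is deterministic.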

Page 47

Evaluation of Semi-supervised and Unsupervised Relation Extraction

Since it extracts totally new relations from the web:
◦ There is no gold set of correct instances of relations!
◦ Can’t compute precision (don’t know which ones are correct)
◦ Can’t compute recall (don’t know which ones were missed)

Instead, we can approximate precision (only):
◦ Draw a random sample of relations from the output, check precision manually

$$\hat{P} = \frac{\text{\# of correctly extracted relations in the sample}}{\text{Total \# of extracted relations in the sample}}$$

Can also compute precision at different levels of recall:
◦ Precision for top 1,000 new relations, top 10,000 new relations, top 100,000
◦ In each case taking a random sample of that set

But no way to evaluate recall
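The sampling estimate of P̂, and precision at several output sizes, can be sketched as follows. `is_correct` stands in for the manual check, and the ranked list in the usage is synthetic; both are assumptions for illustration, as are the function names.

```python
import random
from typing import Callable, Dict, List, Sequence, TypeVar

T = TypeVar("T")


def estimate_precision(extractions: Sequence[T], is_correct: Callable[[T], bool],
                       sample_size: int, rng: random.Random) -> float:
    """P-hat: fraction of a random sample of the output that the judge marks correct."""
    if not extractions:
        raise ValueError("no extractions to sample from")
    sample = rng.sample(list(extractions), min(sample_size, len(extractions)))
    return sum(1 for e in sample if is_correct(e)) / len(sample)


def precision_at_levels(ranked: Sequence[T], is_correct: Callable[[T], bool],
                        levels: List[int], sample_size: int,
                        rng: random.Random) -> Dict[int, float]:
    """Estimate precision of the top-k output for each k, sampling within each top-k set."""
    return {k: estimate_precision(ranked[:k], is_correct, sample_size, rng) for k in levels}
```

When the sample size is at least the size of the set, the estimate equals the exact precision; with smaller samples it is an unbiased but noisy estimate.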

Page 49

Upcoming…

Homework
• Homework 3 is due on February 27
• Write-up and data have been released.

Project
• Status report due in 1.5 weeks: March 2, 2017
• Instructions coming soon
• Only 5 pages

Summaries
• Paper summaries: February 28, March 14
• Only 1 page each
