A Tutorial - Knowledge Graph Construction from Text · 2018. 2. 5.

Knowledge Graph Construction from Text, AAAI 2017. Jay Pujara, Sameer Singh, Bhavana Dalvi


Page 1

Knowledge Graph Construction from Text
AAAI 2017

Jay Pujara, Sameer Singh, Bhavana Dalvi

Page 2

Tutorial Overview

Part 1: Knowledge Graphs
Part 2: Knowledge Extraction
Part 3: Graph Construction
Part 4: Critical Analysis

Page 3

Tutorial Outline

1. Knowledge Graph Primer [Jay]
2. Knowledge Extraction from Text
    a. NLP Fundamentals [Sameer]
    b. Information Extraction [Bhavana]

Coffee Break

3. Knowledge Graph Construction
    a. Probabilistic Models [Jay]
    b. Embedding Techniques [Sameer]
4. Critical Overview and Conclusion [Bhavana]

Page 4

What is Knowledge Extraction?

Text: John was born in Liverpool, to Julia and Alfred Lennon.

Literal Facts:
• (John Lennon, birthplace, Liverpool)
• (John Lennon, childOf, Alfred Lennon)
• (John Lennon, childOf, Julia Lennon)
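As a minimal sketch, the literal facts for the example sentence can be held as (subject, relation, object) triples and grouped into an adjacency list. The edge directions and the names `FACTS` and `build_graph` are assumptions made for illustration; the slide only shows the node and edge labels.

```python
from collections import defaultdict

# Triples read off the example sentence; edge directions are an assumption.
FACTS = [
    ("John Lennon", "birthplace", "Liverpool"),
    ("John Lennon", "childOf", "Julia Lennon"),
    ("John Lennon", "childOf", "Alfred Lennon"),
]

def build_graph(triples):
    """Group (subject, relation, object) triples into an adjacency list."""
    graph = defaultdict(list)
    for subject, relation, obj in triples:
        graph[subject].append((relation, obj))
    return dict(graph)

graph = build_graph(FACTS)
for relation, obj in graph["John Lennon"]:
    print("John Lennon", relation, obj)
```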

Page 5

NLP Fundamentals
EXTRACTING STRUCTURES FROM LANGUAGE

Page 6

What is NLP?

NLP

• Unstructured
• Ambiguous
• Lots and lots of it!

Humans can read them, but…
• very slowly
• can’t remember all
• can’t answer questions

“Knowledge”

• Structured
• Precise, Actionable
• Specific to the task

Can be used for downstream applications, such as creating Knowledge Graphs!

Page 7

What is NLP?

John was born in Liverpool, to Julia and Alfred Lennon.

Natural Language Processing

Part-of-speech tags: John/NNP was/VBD born/VBD in/IN Liverpool/NNP, to/TO Julia/NNP and/CC Alfred/NNP Lennon/NNP.

Named entities: [John: Person] was born in [Liverpool: Location], to [Julia: Person] and [Alfred Lennon: Person].

Mentions elsewhere in the document: Lennon.., John Lennon..., Mrs. Lennon...., his mother.., his father, Alfred, he, the Pool

Page 8

What is Information Extraction?

Part-of-speech tags: John/NNP was/VBD born/VBD in/IN Liverpool/NNP, to/TO Julia/NNP and/CC Alfred/NNP Lennon/NNP.

Named entities: [John: Person] was born in [Liverpool: Location], to [Julia: Person] and [Alfred Lennon: Person].

Mentions elsewhere in the document: Lennon.., John Lennon..., Mrs. Lennon...., his mother.., his father, Alfred, he, the Pool

Information Extraction

• (John Lennon, birthplace, Liverpool)
• (John Lennon, childOf, Alfred Lennon)
• (John Lennon, childOf, Julia Lennon)
• (Alfred Lennon, spouse, Julia Lennon)

Page 9

Breaking it Down

Sentence: Dependency Parsing, Part of speech tagging, Named entity recognition…
    John/NNP was/VBD born/VBD in/IN Liverpool/NNP, to/TO Julia/NNP and/CC Alfred/NNP Lennon/NNP.
    [John: Person] was born in [Liverpool: Location], to [Julia: Person] and [Alfred Lennon: Person].

Document: Coreference Resolution...
    Mentions: Lennon.., John Lennon..., Mrs. Lennon...., his mother.., his father, Alfred, he, the Pool

Information Extraction: Entity resolution, Entity linking, Relation extraction…
    (John Lennon, birthplace, Liverpool)
    (John Lennon, childOf, Alfred Lennon)
    (John Lennon, childOf, Julia Lennon)
    (Alfred Lennon, spouse, Julia Lennon)

Page 10

Tokenization & Sentence Splitting

“Mr. Bob Dobolina is thinkin' of a master plan. Why doesn't he quit?”

[Mr.] [Bob] [Dobolina] [is] [thinkin’] [of] [a] [master] [plan] [.] [Why] [doesn't] [he] [quit] [?]

Uses in KG Construction:
• Strictly constrains other NLP tasks
    • Parts of Speech
    • Dependency Parsing
• Directly affects KG nodes/edges
    • Mention boundaries
    • Relations within sentences

How it is done:
• Regular expressions, but not trivial
    • Mr., Yahoo!, lower-case
• For non-English, incredibly difficult!
    • Chinese: no “space” character
• Non-trivial for some domains…
    • What is a “token” in BioNLP?
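A rough sketch of why rule-based tokenization is "not trivial": the tokenizer below splits on whitespace and peels trailing punctuation into separate tokens, with a small abbreviation list so that "Mr." stays whole. The abbreviation set and the function name `tokenize` are assumptions for illustration, not the method of any particular toolkit.

```python
# Abbreviations whose final period is part of the token (illustrative list).
ABBREVIATIONS = {"Mr.", "Mrs.", "Dr."}

def tokenize(text):
    """Whitespace split, then peel trailing punctuation into its own tokens."""
    tokens = []
    for chunk in text.split():
        if chunk in ABBREVIATIONS:
            tokens.append(chunk)
            continue
        trailing = []
        while len(chunk) > 1 and chunk[-1] in ".?!,":
            trailing.append(chunk[-1])
            chunk = chunk[:-1]
        tokens.append(chunk)
        tokens.extend(reversed(trailing))
    return tokens

print(tokenize("Mr. Bob Dobolina is thinkin' of a master plan. Why doesn't he quit?"))
```

The same rules split "Yahoo!" into "Yahoo" and "!", which is exactly the kind of exception the slide warns about; every such case needs another rule or list entry.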

Page 11

Tagging the Parts of Speech

John/NNP was/VBD born/VBD in/IN Liverpool/NNP, to/TO Julia/NNP and/CC Alfred/NNP Lennon/NNP.

Uses in KG Construction:
• Entities appear as nouns
• Verbs are very useful
    • For identifying relations
    • For identifying entity types
• Important for downstream NLP
    • NER, Dependency Parsing, …

How it is done:
• Context is important!
    • run, table, bar, …
• Label whole sentence together
    • “Structured prediction”
    • Conditional Random Fields, ..
    • Now: CNNs, LSTMs, …
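To illustrate "label whole sentence together", here is a small sketch of joint decoding. The slide names Conditional Random Fields; this example swaps in a simpler HMM-style scorer with Viterbi decoding, and every probability in it is invented for illustration. It shows how the ambiguous word "run" gets a different tag depending on the word before it.

```python
TAGS = ["DT", "TO", "NN", "VB"]
START = {"DT": 0.5, "TO": 0.5}                       # made-up start scores
EMIT = {                                             # made-up emission scores
    "the": {"DT": 1.0},
    "to": {"TO": 1.0},
    "run": {"NN": 0.5, "VB": 0.5},                   # ambiguous on its own
}
TRANS = {("DT", "NN"): 0.9, ("DT", "VB"): 0.1,       # made-up transitions
         ("TO", "VB"): 0.9, ("TO", "NN"): 0.1}

def viterbi(words):
    """Return the highest-scoring tag sequence for the whole sentence."""
    # best[t] = (score, path) of the best sequence ending in tag t
    best = {t: (START.get(t, 0.0) * EMIT[words[0]].get(t, 0.0), [t]) for t in TAGS}
    for word in words[1:]:
        new_best = {}
        for t in TAGS:
            score, path = max(
                ((best[p][0] * TRANS.get((p, t), 0.0) * EMIT[word].get(t, 0.0),
                  best[p][1]) for p in TAGS),
                key=lambda item: item[0],
            )
            new_best[t] = (score, path + [t])
        best = new_best
    return max(best.values(), key=lambda item: item[0])[1]

print(viterbi(["the", "run"]))
print(viterbi(["to", "run"]))
```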

Page 12

Detecting Named Entities

[John: Person] was born in [Liverpool: Location], to [Julia: Person] and [Alfred Lennon: Person].

Uses in KG Construction:
• Mentions describe the nodes
• Types are incredibly important!
    • Often restrict relations
• Fine-grained types are informative!
    • Brooklyn: city
    • Sanders: politician, senator

How it is done:
• Context is important!
    • Georgia, Washington, …
    • John Deere, Thomas Cook, …
    • Princeton, Amazon, …
• Label whole sentence together
    • Structured prediction again
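One way to read "types often restrict relations" is as a type signature per relation. The sketch below assumes a tiny hand-written schema (the `SCHEMA` table and `allowed_relations` are hypothetical names) and filters candidate relations by the NER types of the two mentions.

```python
# Hypothetical schema: relation -> (subject type, object type).
SCHEMA = {
    "birthplace": ("Person", "Location"),
    "childOf": ("Person", "Person"),
    "spouse": ("Person", "Person"),
}

def allowed_relations(subject_type, object_type):
    """Relations whose type signature matches the two mention types."""
    return sorted(
        relation
        for relation, signature in SCHEMA.items()
        if signature == (subject_type, object_type)
    )

print(allowed_relations("Person", "Location"))
print(allowed_relations("Person", "Person"))
print(allowed_relations("Location", "Person"))
```

With such a filter, the pair (John, Liverpool) can only be a birthplace candidate, so the extractor never has to consider childOf or spouse for it.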

Page 13

NER: Entity Types

Stanford CoreNLP

3 class: Location, Person, Organization
4 class: Location, Person, Organization, Misc
7 class: Location, Person, Organization, Money, Percent, Date, Time

From Stanford CoreNLP (http://nlp.stanford.edu/software/CRF-NER.shtml)

spaCy.io

PERSON         People, including fictional.
NORP           Nationalities or religious or political groups.
FACILITY       Buildings, airports, highways, bridges, etc.
ORG            Companies, agencies, institutions, etc.
GPE            Countries, cities, states.
LOC            Non-GPE locations, mountain ranges, bodies of water.
PRODUCT        Objects, vehicles, foods, etc. (Not services.)
EVENT          Named hurricanes, battles, wars, sports events, etc.
WORK_OF_ART    Titles of books, songs, etc.
LANGUAGE       Any named language.

Page 14

NER: Entity Types

Fine-grained Types

• More on this later…

From Ling & Weld. AAAI 2012 (http://aiweb.cs.washington.edu/ai/pubs/ling-aaai12.pdf)

Page 15

Dependency Parsing

Uses in KG Construction:
• Incredibly useful for relations!
    • What verb is attached?
    • Relation to which mention?
• Incredibly useful for attributes!
    • Appositives: “X, the CEO, …”
• Paths are used as surface relations

How it is done:
• Model: score trees using features
    • Lexical: words, POS, …
    • Structure: distance, …
• Prediction: Search over trees
    • greedy, spanning tree, belief propagation, dynamic prog, …

Using http://nlp.stanford.edu:8080/corenlp/process

Page 16

Dependency Paths

John was born in Liverpool, to Julia and Alfred Lennon.

Mention pair           Text Pattern                              Dependency Path
John, Liverpool        “was born in”                             “was born in”
John, Julia            “was born in Liverpool, to”               “was born to”
John, Alfred Lennon    “was born in Liverpool, to Julia and”     “was born to”

Arcs shown in the parse: nmod:in, nmod:to, case, case
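The contrast between text patterns and dependency paths can be sketched with a hand-written parse of the example sentence. The `HEADS` table is an assumption: only nmod:in, nmod:to and case come from the slide, the other arc labels are filled in for illustration, and "Lennon" is attached directly to "born" as a single-head simplification of how the conjunction is handled.

```python
# token -> (head, arc label); the root has head None. Hand-written, illustrative.
HEADS = {
    "John": ("born", "nsubjpass"),
    "was": ("born", "auxpass"),
    "born": (None, "root"),
    "in": ("Liverpool", "case"),
    "Liverpool": ("born", "nmod:in"),
    "to": ("Julia", "case"),
    "Julia": ("born", "nmod:to"),
    "and": ("Julia", "cc"),
    "Alfred": ("Lennon", "compound"),
    "Lennon": ("born", "nmod:to"),
}

def ancestors(token):
    """Return the chain token, head(token), ..., root."""
    chain = [token]
    while HEADS[chain[-1]][0] is not None:
        chain.append(HEADS[chain[-1]][0])
    return chain

def dependency_path(a, b):
    """Arc labels from a up to the lowest common ancestor, then down to b."""
    up, down = ancestors(a), ancestors(b)
    common = next(t for t in up if t in down)
    rising = [HEADS[t][1] for t in up[:up.index(common)]]
    falling = [HEADS[t][1] for t in down[:down.index(common)]]
    return rising, common, list(reversed(falling))

print(dependency_path("John", "Liverpool"))
print(dependency_path("John", "Julia"))
print(dependency_path("John", "Lennon"))
```

Under these assumptions the paths to "Julia" and to "Lennon" are identical, whereas the text between the mentions grows longer with each pair.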

Page 17

Within-document Coreference

John was born in Liverpool, to Julia and Alfred Lennon.

Mentions elsewhere in the document: Lennon.., John Lennon..., He…, Mrs. Lennon...., his mother.., his father, Alfred, he, the Pool

Uses in KG Construction:
• More context for each entity!
• Many relations occur on pronouns
    • “He is married to her”
• Coref can be used for types
    • Nominals: The president, …
• Difficult, so often ignored

How it is done:
• Model: score pairwise links
    • dep path, similarity, types, …
    • “representative mention”
• Prediction: Search over clusterings
    • greedy (left to right), ILP, belief propagation, MCMC, …
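The greedy (left to right) option can be sketched as follows: each mention joins the existing cluster with the best pairwise link score, or starts a new cluster if no score clears a threshold. The token-overlap scorer and the 0.3 threshold are stand-ins chosen for illustration; a real model would score links with features such as dependency paths and types.

```python
def token_overlap(a, b):
    """Jaccard overlap of lowercase tokens; a toy pairwise link score."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb)

def greedy_coref(mentions, score=token_overlap, threshold=0.3):
    """Process mentions left to right, linking each to its best cluster."""
    clusters = []
    for mention in mentions:
        best, best_score = None, threshold
        for cluster in clusters:
            s = max(score(previous, mention) for previous in cluster)
            if s > best_score:
                best, best_score = cluster, s
        if best is None:
            clusters.append([mention])
        else:
            best.append(mention)
    return clusters

print(greedy_coref(["John Lennon", "Liverpool", "Lennon", "Julia", "John"]))
```

String overlap cannot link pronouns such as "he" or nominals such as "his mother", and it would wrongly merge "Julia Lennon" with "John Lennon", which hints at why the slide calls the task difficult.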

Page 18

Information Extraction

[Figure repeated from Page 8: the example sentence with its part-of-speech tags, named entities, and coreferent mentions, mapped by Information Extraction to the graph of birthplace, childOf, and spouse edges.]

Page 19

Surface Patterns

Combine tokens, dependency paths, and entity types to define rules.

Pattern: Argument1 [Person] , DT CEO of Argument2 [Organization]
Dependency arcs: appos, nmod, case, det

• Bill Gates, the CEO of Microsoft, said…
• Mr. Jobs, the brilliant and charming CEO of Apple Inc., said…
• … announced by Steve Jobs, the CEO of Apple.
• … announced by Bill Gates, the director and CEO of Microsoft.
• … mused Bill, a former CEO of Microsoft.
• and many other possible instantiations…

Page 20

Rule-Based Extraction

Use a collection of rules as the system itself

Rule: Argument1 [Person] , DT CEO of Argument2 [Organization] (arcs: appos, nmod, case, det)
Implies: (Argument1, headOf, Argument2)

• High precision: when it fires, it’s correct
• Easy to explain predictions
• Easy to fix mistakes

However…
• Only work when the rules fire
• Poor recall: Do not generalize!

Variations

Source:
• Manually specified
• Learned from Data

Multiple Rules:
• Attach priorities/precedence
• Attach probabilities (more later)
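A minimal sketch of one manually specified rule, assuming a plain regular expression over the text in place of the dependency arcs and entity types that the pattern on the slide uses. The expression `RULE` and the function `extract_head_of` are illustrative names; capitalized word runs stand in for Person and Organization mentions.

```python
import re

# "<Person>, the/a [modifiers] CEO of <Organization>" implies headOf.
RULE = re.compile(
    r"((?:[A-Z][a-z]+\.? )*[A-Z][a-z]+)"      # Argument1: capitalized run
    r", (?:the|a) (?:[a-z]+ )*CEO of "        # appositive with optional modifiers
    r"([A-Z][a-z]+(?: [A-Z][a-z]+\.?)*)"      # Argument2: capitalized run
)

def extract_head_of(text):
    """Return (Argument1, 'headOf', Argument2) for every match of the rule."""
    return [(m.group(1), "headOf", m.group(2)) for m in RULE.finditer(text)]

for sentence in [
    "Bill Gates, the CEO of Microsoft, said it.",
    "Mr. Jobs, the brilliant and charming CEO of Apple Inc., said it.",
    "It was announced by Bill Gates, the director and CEO of Microsoft.",
    "Bill Gates founded Microsoft.",
]:
    print(extract_head_of(sentence))
```

The last sentence produces no triple: the rule is precise when it fires, but it does not generalize to phrasings it was not written for.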

Page 21

Supervised Extraction

Machine Learning: hopefully, generalizes the labels in the right way

Use all of NLP as features: words, POS, NER, dependencies, embeddings

Pipeline:
• Input: John was born in Liverpool, to Julia and Alfred Lennon.
• Feature Engineering: POS, NER, Dep Path, Text in b/w, embeddings, …
• Classifier
• Output: P(birthplace) = 0.75

However
• Usually, a lot of labeled data is needed, which is expensive & time consuming.
• Requires a lot of feature engineering!
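The feature-engineering step can be sketched as a function that turns a mention pair into string features, followed by a logistic scorer. The two features, the weights, and the bias below are all made up for illustration; in a supervised system the weights would be learned from labeled data.

```python
import math

def pair_features(tokens, ner, i, j):
    """Features for the mention pair at token positions i < j."""
    return {
        "ner_pair=" + ner[i] + "-" + ner[j],
        "text_between=" + " ".join(tokens[i + 1:j]),
    }

def probability(features, weights, bias):
    """Logistic score: sigmoid of the bias plus the weights of active features."""
    z = bias + sum(weights.get(f, 0.0) for f in features)
    return 1.0 / (1.0 + math.exp(-z))

tokens = ["John", "was", "born", "in", "Liverpool"]
ner = ["Person", "O", "O", "O", "Location"]

# Hypothetical weights for a "birthplace" classifier.
WEIGHTS = {"text_between=was born in": 1.0, "ner_pair=Person-Location": 0.5}
BIAS = -0.4

features = pair_features(tokens, ner, 0, 4)
print(sorted(features))
print(round(probability(features, WEIGHTS, BIAS), 2))
```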

Page 22

Entity Resolution & Linking

...during the late 60's and early 70's, Kevin Smith worked with several local...

...the term hip-hop is attributed to Lovebug Starski. What does it actually mean...

The filmmaker Kevin Smith returns to the role of Silent Bob...

Nothing could be more irrelevant to Kevin Smith's audacious ''Dogma'' than ticking off...

Like Back in 2008, the Lions drafted Kevin Smith, even though Smith was badly...

... backfield in the wake of Kevin Smith's knee injury, and the addition of Haynesworth...

... The Physiological Basis of Politics,” by Kevin Smith, Douglas Oxley, Matthew Hibbing...

Page 23

Entity Names: Two Main Problems

Different Names for Entities
• Inconsistent References: MSFT, AAPL, GOOG…
• Typos/Misspellings: Baarak, Barak, Barrack, …
• Nick Names: Bam Bam, Drumpf, …

Entities with Same Name
• Partial Reference: First names of people, Location instead of team name, Nicknames
• Things named after each other: Clinton, Washington, Paris, Amazon, Princeton, Kingston, …
• Same type of entities share names: Kevin Smith, John Smith, Springfield, …
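For the typos/misspellings case, one common heuristic is to match a mention to the closest canonical name by edit distance. The sketch below implements Levenshtein distance directly; the candidate list and the cutoff of 2 edits are assumptions for illustration.

```python
def edit_distance(a, b):
    """Levenshtein distance via dynamic programming over two rows."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute or match
            ))
        prev = curr
    return prev[-1]

def resolve(mention, canonical_names, max_distance=2):
    """Closest canonical name within max_distance edits, else None."""
    best = min(canonical_names,
               key=lambda name: edit_distance(mention.lower(), name.lower()))
    if edit_distance(mention.lower(), best.lower()) <= max_distance:
        return best
    return None

NAMES = ["Barack", "Michelle"]
for mention in ["Baarak", "Barak", "Barrack", "Drumpf"]:
    print(mention, "->", resolve(mention, NAMES))
```

Nicknames and ticker symbols share few characters with the canonical name, so string distance alone cannot resolve them; those cases need an alias table or context.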

Page 24

Entity Linking Approach

Washington drops 10 points after game with UCLA Bruins.

Candidate Generation: Washington DC, George Washington, Washington state, Lake Washington, Washington Huskies, Denzel Washington, University of Washington, Washington High School, …

The same candidate list is shown again at each of the following stages:
• Entity Types: LOC/ORG
• Coreference: UWashington, Huskies
• Coherence: UCLA Bruins, USC Trojans

Vinculum, Ling, Singh, Weld, TACL (2015)
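The staged approach can be sketched on a toy knowledge base: generate candidates by name match, filter by entity type, then rank by coherence with other entities in the sentence. This is a simplified illustration and not the Vinculum implementation; the `KB` contents, the `related` sets, and the function `link` are hypothetical, and the coreference stage is omitted.

```python
# Toy knowledge base: entity -> type and a set of related entities (made up).
KB = {
    "Washington DC": {"type": "LOC", "related": {"White House"}},
    "George Washington": {"type": "PER", "related": {"Mount Vernon"}},
    "Washington Huskies": {"type": "ORG", "related": {"UCLA Bruins", "USC Trojans"}},
    "Denzel Washington": {"type": "PER", "related": {"Training Day"}},
}

def link(mention, allowed_types, context_entities):
    """Candidate generation, type filtering, then coherence ranking."""
    candidates = [name for name in KB if mention in name]
    typed = [name for name in candidates if KB[name]["type"] in allowed_types]
    return max(
        typed,
        key=lambda name: len(KB[name]["related"] & context_entities),
        default=None,
    )

print(link("Washington", {"LOC", "ORG"}, {"UCLA Bruins"}))
```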

Page 25

Information Extraction

[Figure repeated from Page 8: the example sentence with its part-of-speech tags, named entities, and coreferent mentions, mapped by Information Extraction to the graph of birthplace, childOf, and spouse edges.]