Part 5 summary slides - Mining Knowledge Graphs from Text
Tutorial Overview
Part 1: Knowledge Graphs
Part 2: Knowledge Extraction
Part 3: Graph Construction
Part 4: Critical Analysis
https://kgtutorial.github.io
Tutorial Outline
1. Knowledge Graph Primer [Jay]
2. Knowledge Extraction Primer [Jay]
Coffee Break
3. Knowledge Graph Construction
a. Probabilistic Models [Jay]
b. Embedding Techniques [Sameer]
4. Critical Overview and Conclusion [Sameer]
Critical Overview
SUMMARY
SUCCESS STORIES
DATASETS, TASKS, SOFTWARE
EXCITING RESEARCH DIRECTIONS
Why do we need knowledge graphs?
• Humans can explore large databases in intuitive ways
• AI agents get access to human common-sense knowledge
Knowledge graph construction
• Who are the entities (nodes) in the graph?
• What are their attributes and types (labels)?
• How are they related (edges)?
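A minimal sketch of these three questions as a data structure, assuming Python dataclasses. The class and method names are illustrative and not from the tutorial; the example entities are borrowed from the Knowledge Extraction slide.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class Entity:
    """A node: who the entity is, plus its types (labels) and attributes."""
    name: str
    types: Set[str] = field(default_factory=set)
    attributes: Dict[str, str] = field(default_factory=dict)


@dataclass
class KnowledgeGraph:
    entities: Dict[str, Entity] = field(default_factory=dict)
    # Each edge is a (head, relation, tail) triple.
    edges: List[Tuple[str, str, str]] = field(default_factory=list)

    def add_entity(self, name, types=(), **attributes):
        entity = self.entities.setdefault(name, Entity(name))
        entity.types.update(types)
        entity.attributes.update(attributes)
        return entity

    def add_edge(self, head, relation, tail):
        # Make sure both endpoints exist as nodes before linking them.
        self.add_entity(head)
        self.add_entity(tail)
        self.edges.append((head, relation, tail))

    def neighbors(self, head, relation):
        return [t for h, r, t in self.edges if h == head and r == relation]


kg = KnowledgeGraph()
kg.add_entity("John Lennon", types={"Person"})
kg.add_entity("Liverpool", types={"Location"})
kg.add_edge("John Lennon", "birthplace", "Liverpool")
```

Storing edges as plain triples keeps the sketch close to the (head, relation, tail) notation used later in the deck.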
[Figure: example graph with entities E1, E2, E3, each with attributes A1 and A2]
Knowledge Graph Construction
Text → Knowledge Extraction → Extraction graph → Graph Construction → Knowledge graph
Two perspectives
| | Extraction graph | Knowledge graph |
|---|---|---|
| Who are the entities? (nodes) | Named Entity Recognition, Entity Coreference | Entity Linking, Entity Resolution |
| What are their attributes? (labels) | Entity Typing | Collective classification |
| How are they related? (edges) | Semantic role labeling, Relation Extraction | Link prediction |
Knowledge Extraction
Text: John was born in Liverpool, to Julia and Alfred Lennon.
NLP → Annotated text
• Part-of-speech tags: NNP VBD VBD IN NNP TO NNP CC NNP NNP
• Entity types: Person Location Person Person
• Coreference mentions: Lennon, John Lennon, Mrs. Lennon, his mother, his father, Alfred, he, the Pool
Information Extraction → Extraction graph
• birthplace(John Lennon, Liverpool)
• childOf(John Lennon, Julia Lennon)
• childOf(John Lennon, Alfred Lennon)
Information Extraction
Defining domain
Learning extractors
Scoring candidate facts
Supervised
Semi-supervised
Unsupervised
Fusing multiple extractors
Single extractor
Knowledge Graph Construction
Text → Part 2: Knowledge Extraction → Extraction graph → Part 3: Graph Construction → Knowledge graph
Issues with Extraction Graph
Extracted knowledge could be:
• ambiguous
• incomplete
• inconsistent
Two approaches for KG construction
PROBABILISTIC MODELS
EMBEDDING BASED MODELS
Two classes of Probabilistic Models
GRAPHICAL MODEL BASED
◦ Possible facts in KG are variables
◦ Logical rules relate facts
◦ Probability ∝ satisfied rules
◦ Universal-quantification
RANDOM WALK BASED
◦ Possible facts posed as queries
◦ Random walks of the KG constitute “proofs”
◦ Probability ∝ path lengths/transitions
◦ Local grounding
Illustration of KG Identification
Ontology:
• Dom(albumArtist, musician)
• Mut(novel, musician)
Uncertain Extractions:
• .5: Lbl(Fab Four, novel)
• .7: Lbl(Fab Four, musician)
• .9: Lbl(Beatles, musician)
• .8: Rel(Beatles, AlbumArtist, Abbey Road)
Entity Resolution:
• SameEnt(Fab Four, Beatles)
[Figure: the (annotated) extraction graph over Fab Four, Beatles, Abbey Road, musician, and novel, shown next to the graph after knowledge graph identification]
PUJARA+ISWC13; PUJARA+AIMAG15
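A toy sketch of the intuition behind this slide, using the label extractions, SameEnt pair, and Mut constraint listed on it. The resolution strategy here (pool confidences across co-referent names, then drop the weaker of two mutually exclusive labels) is a simplifying assumption for illustration; it is not the probabilistic soft logic model of the cited papers, and the function names are hypothetical.

```python
from collections import defaultdict

# Inputs copied from the slide.
label_extractions = [
    (0.5, "Fab Four", "novel"),
    (0.7, "Fab Four", "musician"),
    (0.9, "Beatles", "musician"),
]
same_entity = [("Fab Four", "Beatles")]
mutually_exclusive = [("novel", "musician")]


def canonical_names(pairs):
    """Return a function mapping each name to one representative per SameEnt cluster."""
    parent = {}

    def find(name):
        parent.setdefault(name, name)
        while parent[name] != name:
            name = parent[name]
        return name

    for left, right in pairs:
        parent[find(left)] = find(right)
    return find


def identify_labels(extractions, same_pairs, mut_pairs):
    find = canonical_names(same_pairs)
    # Pool evidence for each (entity, label) across co-referent mentions.
    scores = defaultdict(float)
    for confidence, entity, label in extractions:
        scores[(find(entity), label)] += confidence
    # For each mutually exclusive pair, drop the label with less pooled evidence.
    kept = dict(scores)
    for entity in {e for e, _ in scores}:
        for a, b in mut_pairs:
            if (entity, a) in kept and (entity, b) in kept:
                loser = a if kept[(entity, a)] < kept[(entity, b)] else b
                del kept[(entity, loser)]
    return kept


resolved = identify_labels(label_extractions, same_entity, mutually_exclusive)
```

With these inputs the merged entity keeps the musician label and loses the novel label, which is the kind of cleanup the slide illustrates.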
Random Walk Illustration
Query: R(Lennon, PlaysInstrument, ?)
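A minimal sketch of answering such a query by following a relation path, assuming uniform transitions over matching edges. The toy graph, the relation names other than the query relation, and the chosen path are invented for illustration; the probabilities are computed exactly rather than sampled, and this is not the ProPPR implementation.

```python
from collections import defaultdict

# Hypothetical toy KG; these edges are illustrative assumptions, not from the slides.
edges = [
    ("Lennon", "memberOf", "Beatles"),
    ("Beatles", "hasMember", "Harrison"),
    ("Beatles", "hasMember", "McCartney"),
    ("Harrison", "playsInstrument", "guitar"),
    ("McCartney", "playsInstrument", "bass"),
    ("McCartney", "playsInstrument", "guitar"),
]


def build_index(triples):
    """Index tails by (head, relation) for fast lookup during the walk."""
    index = defaultdict(list)
    for head, relation, tail in triples:
        index[(head, relation)].append(tail)
    return index


def path_probabilities(index, start, relation_path):
    """Distribution over end nodes of a walk that follows relation_path,
    choosing uniformly among matching outgoing edges at every step."""
    current = {start: 1.0}
    for relation in relation_path:
        following = defaultdict(float)
        for node, prob in current.items():
            targets = index.get((node, relation), [])
            for target in targets:
                following[target] += prob / len(targets)
        current = dict(following)
    return current


index = build_index(edges)
answers = path_probabilities(
    index, "Lennon", ["memberOf", "hasMember", "playsInstrument"]
)
```

Each candidate answer to R(Lennon, PlaysInstrument, ?) gets a score equal to the probability that the walk ends there; a real system would combine many such paths.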
Why embeddings?
Limitations of probabilistic models
Limitation to Logical Relations
• Representation restricted by manual design
• Clustering? Asymmetric implications?
• Information flows through these relations
• Difficult to generalize to unseen entities/relations
Computational Complexity of Algorithms
• Learning is NP-Hard, difficult to approximate
• Query-time inference is also NP-Hard
• Not easy to parallelize, or use GPUs
• Scalability is badly affected by representation
Embedding based models
• Everything as dense vectors
• Captures many relations
• Learned from data
• Learning using stochastic gradient, back-propagation
• Querying is often cheap
• GPU-parallelism friendly
• Can generalize to unseen entities and relations
• Efficient inference at large scale
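A minimal sketch of why querying is cheap once everything is a dense vector, assuming a DistMult-style scoring function (one of several possible choices; the slides do not fix one here). The three-dimensional vectors are hand-picked for illustration, whereas real embeddings are learned from data.

```python
def distmult_score(head, relation, tail):
    """DistMult-style score: sum over dimensions of head * relation * tail."""
    return sum(h * r * t for h, r, t in zip(head, relation, tail))


# Hand-picked vectors, purely illustrative (assumption, not learned).
entity_vectors = {
    "John Lennon": [1.0, 0.0, 1.0],
    "Liverpool": [1.0, 1.0, 0.0],
    "Abbey Road": [0.0, 1.0, 1.0],
}
relation_vectors = {"birthplace": [1.0, 0.0, 0.0]}


def rank_tails(head_name, relation_name):
    """Score every other entity as a candidate tail and sort best-first."""
    head = entity_vectors[head_name]
    relation = relation_vectors[relation_name]
    scored = [
        (name, distmult_score(head, relation, vector))
        for name, vector in entity_vectors.items()
        if name != head_name
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


ranking = rank_tails("John Lennon", "birthplace")
```

Answering a query is a handful of multiplications per candidate, which is what makes this family of models easy to batch on GPUs.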
Relation Embeddings
Part 1: Knowledge Graphs
Part 2: Knowledge Extraction
Part 3: Graph Construction
Success stories
YAGO
Success story: OpenIE (ReVerb)
openie.allenai.org
Success story: NELL
Success story: YAGO
• Input: Wikipedia infoboxes, WordNet and GeoNames
• Output: KG with 350K entity types, 10M entities, 120M facts
• Temporal and spatial information
Success story: DBpedia
• DBpedia is structured data automatically extracted from Wikipedia
• 17M canonical entities
• 88M type statements
• 72M infobox statements
DeepDive
• Machine learning based extraction system
• Best precision/recall/F1 in the KBP slot filling task, 2014 evaluations (31 teams participated)
IE systems in practice
[Figure: comparison of ConceptNet, NELL, Knowledge Vault, and OpenIE across the stages defining domain, learning extractors, scoring candidate facts, and fusing extractors; cell labels include heuristic rules and classifier]
Datasets
• KG as datasets
  • FB15K-237: knowledge base completion dataset based on Freebase¹
  • DBpedia: structured data extracted from Wikipedia
  • NELL: Read the Web datasets
  • AristoKB: tuple knowledge base for the science domain
• Text datasets
  • Clueweb09: 1 billion webpages (sample of Web)
  • FACC1: Freebase Annotations of the Clueweb09 Corpora
  • Gigaword: automatically-generated syntactic and discourse structure
  • NYTimes: The New York Times Annotated Corpus
• Datasets related to semi-supervised learning for information extraction
  Link: entity typing, concept discovery, aligning glosses to KB, multi-view learning
¹ See Dettmers et al., 2017 for details (https://arxiv.org/pdf/1707.01476.pdf)
Shared tasks
• Text Analysis Conference on Knowledge Base Population (TAC KBP)
  • Slot filling task
  • Cold Start KBP Track
  • Tri-Lingual Entity Discovery and Linking Track (EDL)
  • Event Track
  • Validation/Ensembling Track
Software: NLP
• Stanford CoreNLP: a suite of core NLP tools [link] (Java code)
• FIGER: fine-grained entity recognizer that assigns over 100 semantic types [link] (Java code)
• FACTORIE: out-of-the-box tools for NLP and information integration [link] (Scala code)
• EasySRL: semantic role labeling [link] (Java code)
Software: Extracting and Reasoning
• Open IE
  • (University of Washington) Open IE 4.2 [link] (Scala code)
  • Stanford Open IE [link] (Java code)
• Interactive Knowledge Extraction (IKE) (Allen Institute for Artificial Intelligence) [link] (Scala code)
• PSL: Probabilistic soft logic [link] (Java code)
• ProPPR: Programming with Personalized PageRank [link] (Java code)
Exciting Active Research
• INTERESTING APPLICATIONS OF KG
• MULTI-MODAL INFORMATION EXTRACTION
• KNOWLEDGE AS SUPERVISION
• COMMON KNOWLEDGE
Interesting application of Knowledge Graphs
The Literome Project [link]
• Automatic system for extracting genomic knowledge from PubMed articles
• Web-accessible knowledge base
Literome: PubMed-Scale Genomic Knowledge Base in the Cloud, Hoifung Poon et al., Bioinformatics 2014
Interesting application of Knowledge Graphs
Chronic disease management: develop AI technology for predictive and preventive personalized medicine to reduce the national healthcare expenditure on chronic diseases (90% of total cost)
Knowledge Base Completion
Table from Dettmers, et al. (2017)
Scoring Function
Multimodal KB Embeddings
Object → Encoder → Scoring Function
• Entity → Lookup
• Images → CNN
• Text → LSTM
• Numbers, etc. → FeedFwd
Knowledge as Supervision
[Figure: feedback loop connecting User, Learning Algorithm, Learned Model, and Update Model, shown twice]
Annotating a single fact: spouseOf(Barack, Michelle) ✔
Problem 1: Each annotation takes time
Problem 2: Each annotation is a drop in the ocean
Annotating a rule: X husband of Y => spouseOf(X,Y) ✔
Many different options
- Generalized Expectation
- Posterior Regularization
- Labeling functions in SNORKEL
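A minimal sketch of the rule "X husband of Y => spouseOf(X,Y)" written as a plain Python labeling function. It deliberately does not use the Snorkel API; the regular expression, the function name, and the example sentences are assumptions made for illustration.

```python
import re

# Assumed surface pattern for the slide's rule; real rules would be more robust.
HUSBAND_OF = re.compile(
    r"(?P<x>[A-Z][a-z]+),? (?:is )?(?:the )?husband of (?P<y>[A-Z][a-z]+)"
)


def label_spouse_candidates(sentences):
    """Emit a spouseOf(X, Y) candidate for every sentence the rule matches."""
    facts = []
    for sentence in sentences:
        for match in HUSBAND_OF.finditer(sentence):
            facts.append(("spouseOf", match.group("x"), match.group("y")))
    return facts


sentences = [
    "Barack is the husband of Michelle.",
    "Paris is the capital of France.",
]
facts = label_spouse_candidates(sentences)
```

One rule like this labels every matching sentence in a corpus at once, which is the contrast with annotating facts one at a time.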
Aristo Science QA challenge
• Science questions dataset: ~5K 4-way multiple choice questions
Frogs lay eggs that develop into tadpoles and then into adult frogs. This sequence of changes is an example of how living things _____
(A) go through a life cycle
(B) form a food web
(C) act as a source of food
(D) affect other parts of the ecosystem
Science knowledge: frog’s life cycle, metamorphosis
Common sense knowledge: frog is an animal, animals have life cycle
Future…
Future KG construction system
• Consumes online streams of data
• Represents context beyond facts
• Supports humanity
• Corrects its own mistakes
Natural Language Processing
Sentence level: dependency parsing, part-of-speech tagging, named entity recognition…
• John was born in Liverpool, to Julia and Alfred Lennon.
• Part-of-speech tags: NNP VBD VBD IN NNP TO NNP CC NNP NNP
• Entity types: Person Location Person Person
Document level: within-doc coreference…
• Mentions: Lennon, John Lennon, Mrs. Lennon, his mother, his father, Alfred, he, the Pool
NLP annotations → features for IE
Combine tokens, dependency paths, and entity types to define rules.
Pattern: Argument 1 (Person) , DT CEO of Argument 2 (Organization)
Dependency labels on the pattern: appos, nmod, case, det
Matching sentences:
• Bill Gates, the CEO of Microsoft, said …
• Mr. Jobs, the brilliant and charming CEO of Apple Inc., said …
• … announced by Steve Jobs, the CEO of Apple.
• … announced by Bill Gates, the director and CEO of Microsoft.
• … mused Bill, a former CEO of Microsoft.
• and many other possible instantiations…
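A minimal sketch of such a rule over annotated tokens, combining surface tokens with entity types and POS tags. It assumes each entity mention has already been merged into a single token and that the tags come from an upstream NLP pipeline (hand-written here); dependency paths are left out for brevity, and the function name and tag names are illustrative.

```python
def match_ceo_rule(tagged_tokens, max_modifiers=4):
    """Return (person, organization) pairs matching
    PERSON , DT <up to max_modifiers tokens> CEO of ORGANIZATION."""
    matches = []
    n = len(tagged_tokens)
    for i, (person, tag) in enumerate(tagged_tokens):
        if tag != "PERSON":
            continue
        # Expect a comma followed by a determiner right after the person.
        if i + 2 >= n or tagged_tokens[i + 1][0] != "," or tagged_tokens[i + 2][1] != "DT":
            continue
        # Allow a few modifier tokens before the anchor words "CEO of".
        for j in range(i + 3, min(i + 4 + max_modifiers, n - 2)):
            if tagged_tokens[j][0] == "CEO" and tagged_tokens[j + 1][0] == "of":
                org, org_tag = tagged_tokens[j + 2]
                if org_tag == "ORGANIZATION":
                    matches.append((person, org))
                break
    return matches


# Hand-annotated versions of two sentences from the slide.
sentence_1 = [
    ("Bill Gates", "PERSON"), (",", ","), ("the", "DT"), ("CEO", "NN"),
    ("of", "IN"), ("Microsoft", "ORGANIZATION"), (",", ","), ("said", "VBD"),
]
sentence_2 = [
    ("Mr. Jobs", "PERSON"), (",", ","), ("the", "DT"), ("brilliant", "JJ"),
    ("and", "CC"), ("charming", "JJ"), ("CEO", "NN"), ("of", "IN"),
    ("Apple Inc.", "ORGANIZATION"), (",", ","), ("said", "VBD"),
]
pairs = match_ceo_rule(sentence_1) + match_ceo_rule(sentence_2)
```

The modifier window is what lets one rule cover both "the CEO of" and "the brilliant and charming CEO of".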
Success story: OpenIE
• Key contributions:
  • No need for human defined relation schemas
  • First ever successful open-source open domain IE system
• ReVerb
  • Input = Clueweb09 corpus (1B web pages)
  • Output = 15M high-precision extractions
Open IE Systems
| Year | Version | System | Technique |
|---|---|---|---|
| 2007 | v 1.0 | TextRunner | CRF, self-training |
| 2010 | v 2.0 | ReVerb | POS-tag based relation extraction |
| 2012 | v 3.0 | OLLIE | Dependency parse based extraction |
| 2014 | OpenIE 4.0 | | SRL-based extraction; temporal, spatial extractions |
| 2016 | OpenIE 5.0 | | Supports compound noun phrases; numbers; lists |
Trend across versions: increase in precision, recall, expressiveness
Derived from Prof. Mausam’s slides
Success story: NELL
• Key technical contributions:
  • “Never ending learning” paradigm
  • “Coupled bootstrap learning” to reduce semantic drift
• Input: Clueweb09 corpus (1B web pages)
• Ontology: ~2K predicates, O(100K) constraints between predicates
• Output: 50 million candidate facts, 3 million high-confidence facts
Success story: YAGO
• Key contributions:
  • Rich Ontology: Linking Wikipedia categories to WordNet
  • High Quality: High precision extractions (~95%)
Success story: ConceptNet
• Commonsense knowledge base
• Key contributions:
  • Freely available resource: covers a wide range of common sense concepts and relations organized in an easy-to-use semantic network
  • NLP toolkit based on this resource: supports analogy, text summarization, context dependent inferences
• ConceptNet4 was manually built using inputs from thousands of people
  • 28 million facts expressed in natural language
  • spanning 304 different languages
DeepDive
• Machine learning based extraction system
• Key contributions:
  • Scalable, high-performance inference and learning engine
  • Developers contribute features (rules), not algorithms
  • Combines data from a variety of sources (webpages, PDFs, figures, tables)
Aristo ScienceKB
• AI2’s TupleKB dataset: link
• Open problems
  ◦ Best KR for Science domain
  ◦ Domain targeted KB completion
  ◦ Measuring recall w.r.t. end task
(1) Future research directions: Going beyond facts
• Most existing KGs are designed to represent and extract binary relations, which are good enough for search engines
• Applications like QA demand in-depth knowledge about higher-level structures like activities, events, processes
(2) Future research directions: Online KG Construction
• One shot KG construction → Online KG construction
  • Consume online stream of data
  • Temporal scoping of facts
  • Discovering new concepts automatically
  • Self-correcting systems
(2) Future research directions: Online KG Construction
• Continuously learning and self-correcting systems
  • [Selecting Actions for Resource-bounded Information Extraction using Reinforcement Learning, Kanani and McCallum, WSDM 2012]
    • Presented a reinforcement learning framework for budget constrained information extraction
  • [Never-Ending Learning, Mitchell et al. AAAI 2015]
    • Tom Mitchell says “Self reflection and an explicit agenda of learning subgoals” is an important direction of future research for continuously learning systems.
AI2’s ScienceKB
Existing knowledge graphs
• Too named entity centric (no domain relevance)
• Too noisy (not directly usable by inference systems)
**Upcoming article on “High Precision Knowledge Extraction for Science domain”
AI2’s ScienceKB
[Figure: ScienceKB construction pipeline]
Stages shown: Defining Domain, High precision tuple extraction, Learning canonical predicates
Components shown: search engine, domain vocabulary, text corpus, domain-appropriate sentences, Open IE + headword extraction, Turk + auto-scoring, high precision phrasal tuples, learn & apply schema mapping rules, reintroduce phrasal tuples, final high precision Science KB
Example tuples (count, tuple):
3 eat("fox", "rabbit")
2 eat("cat", "mouse")
2 kill("coyote", "sheep")
1 kill("lion", "deer")
21 eat("shark", "fish")
2 catch("cat", "mouse")
5 chase("cat", "mouse")
6 kill("cat", "mouse")
1 kill("fox", "chicken")
3 eat("anteater", "ant")
1 feed-on("bear", "seed")
10 live-in("bear", "Alaska")
11 live-in("bear", "cave")
21 live-in("bear", "forest")
3 live-in("bear", "mountain")
**Upcoming article on “High Precision Knowledge Extraction for Science domain”
AI2’s ScienceKB
AI2’s TupleKB dataset: link
• > 300K common-sense and science facts
• > 80% precision
Hybrid Approach: Adding structure to Open domain IE
Defining domain
Learning extractors
Scoring candidate facts
Open domain IE
Distant supervision to add structure
**Upcoming article on “High Precision Knowledge Extraction for Science domain”
Future research directions: Going beyond facts
• Fact: individual knowledge tuples, e.g. (plant, take in, CO2)
• Event frame: more context (how, when, where?)
  subject: plant
  predicate: take in
  object: CO2
  time: daytime
• Processes: representing larger structures, sequences of events, e.g. photosynthesis
[Modeling Biological Processes for Reading Comprehension, Berant et al., EMNLP 2014]
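A minimal sketch of the three levels (fact, event frame, process) as nested data types, using the plant/CO2 example from this slide. The class names and the optional `location` and `manner` fields are illustrative assumptions standing in for the "where" and "how" context.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Fact:
    """An individual knowledge tuple."""
    subject: str
    predicate: str
    obj: str


@dataclass(frozen=True)
class EventFrame:
    """A fact plus context: when, where, how."""
    fact: Fact
    time: Optional[str] = None
    location: Optional[str] = None
    manner: Optional[str] = None


@dataclass
class Process:
    """A larger structure: an ordered sequence of events."""
    name: str
    events: List[EventFrame] = field(default_factory=list)


take_in = Fact("plant", "take in", "CO2")
daytime_event = EventFrame(take_in, time="daytime")
photosynthesis = Process("photosynthesis", [daytime_event])
```

A binary-relation KG stores only the `Fact` level; the other two types show what extra structure an event- or process-aware KG would need to hold.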
(3) Exciting active research: Ambitious Project
(2) Exciting active research: Multi-modal information extraction
Text + Images → Multi-modal Knowledge Graph
NEIL: Extracting Visual Knowledge from Web Data
[Chen et al., "NEIL: Extracting Visual Knowledge from Web Data," ICCV 2013]
WebChild: Text + Images
[Tandon et al. “Commonsense in Parts: Mining Part-Whole Relations from the Web and Image Tags.” AAAI ’16]
Knowledge Base Completion
Link Prediction
Entity Prediction
Restrictions in the Model
Each object has a vector representation:
• Limits number of objects
• Large number of parameters
• Is not compositional (doesn’t generalize)
What about other kinds of objects?
• Dates and Numbers: should generalize
• Text: Names and Descriptions
• Images: Portraits, Posters, etc.
Augmenting Existing Datasets
MovieLens-100k-plus
Relations 13
Users 943
Movies 1682
Posters 1651
Ratings 100,000
YAGO3-10-plus
Relations 37 → 45
Entities 123,182
Structure Triples 1,079,040
Numbers (Years) 1651
Descriptions 107,326
Images 61,246