![Page 1: Efficient Random Walk Inference with Knowledge Bases · 2020-04-09 · Efficient Random Walk Inference with Knowledge Bases Ni Lao Carnegie Mellon University 2012-7-11 1 Committee:](https://reader035.vdocuments.mx/reader035/viewer/2022062918/5edd915aad6a402d6668b29c/html5/thumbnails/1.jpg)
Efficient Random Walk Inference with Knowledge Bases
Ni Lao
Carnegie Mellon University
2012-7-11
Committee: William W. Cohen (Chair), Teruko Mitamura, Tom Mitchell,
C. Lee Giles (Pennsylvania State University)
![Page 2: Efficient Random Walk Inference with Knowledge Bases · 2020-04-09 · Efficient Random Walk Inference with Knowledge Bases Ni Lao Carnegie Mellon University 2012-7-11 1 Committee:](https://reader035.vdocuments.mx/reader035/viewer/2022062918/5edd915aad6a402d6668b29c/html5/thumbnails/2.jpg)
Knowledge itself is power. --Francis Bacon
an algorithm which tries to achieve something, e.g. IR/IE/QA/MT
KB as an edge-labeled graph
![Page 3: Efficient Random Walk Inference with Knowledge Bases · 2020-04-09 · Efficient Random Walk Inference with Knowledge Bases Ni Lao Carnegie Mellon University 2012-7-11 1 Committee:](https://reader035.vdocuments.mx/reader035/viewer/2022062918/5edd915aad6a402d6668b29c/html5/thumbnails/3.jpg)
Link Prediction -- a generic relational learning task
Given
a directed edge-labeled graph
a relation type r
a source node s (also called a query)
Find
the set of nodes G, so that r(s,t) for each t in G
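The task definition above can be made concrete with a small sketch. The `LabeledGraph` class and its method names are hypothetical (they are not from the talk); the only assumption taken from the slides is that every relation r has an inverse written r-1.

```python
from collections import defaultdict


class LabeledGraph:
    """Directed edge-labeled graph (hypothetical helper, for illustration only)."""

    def __init__(self):
        # node -> relation -> set of target nodes
        self.edges = defaultdict(lambda: defaultdict(set))

    def add(self, source, relation, target):
        # Store the edge and its inverse, written relation + "-1" as on the slides.
        self.edges[source][relation].add(target)
        self.edges[target][relation + "-1"].add(source)

    def targets(self, source, relation):
        # Known answers G for the query r(source, ?); empty if nothing is stored.
        return set(self.edges.get(source, {}).get(relation, ()))


kb = LabeledGraph()
kb.add("Charlotte Brontë", "HasFather", "Patrick Brontë")
kb.add("Patrick Brontë", "Profession", "Writer")

known = kb.targets("Patrick Brontë", "Profession")      # stored edge
missing = kb.targets("Charlotte Brontë", "Profession")  # nothing stored yet
```

A direct lookup only returns edges already in the graph; link prediction is about ranking candidate targets for queries such as the second one, where the edge is missing.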
![Page 4: Efficient Random Walk Inference with Knowledge Bases · 2020-04-09 · Efficient Random Walk Inference with Knowledge Bases Ni Lao Carnegie Mellon University 2012-7-11 1 Committee:](https://reader035.vdocuments.mx/reader035/viewer/2022062918/5edd915aad6a402d6668b29c/html5/thumbnails/4.jpg)
Infer New Knowledge
Application
What is the profession of Charlotte Brontë?
[Figure: Charlotte Brontë with a Profession? edge; candidate nodes Painter, Writer, Carpenter]
![Page 5: Efficient Random Walk Inference with Knowledge Bases · 2020-04-09 · Efficient Random Walk Inference with Knowledge Bases Ni Lao Carnegie Mellon University 2012-7-11 1 Committee:](https://reader035.vdocuments.mx/reader035/viewer/2022062918/5edd915aad6a402d6668b29c/html5/thumbnails/5.jpg)
Consider Friends/Family
[Figure: Charlotte Brontë -HasFather-> Patrick Brontë -Profession-> Writer]
![Page 6: Efficient Random Walk Inference with Knowledge Bases · 2020-04-09 · Efficient Random Walk Inference with Knowledge Bases Ni Lao Carnegie Mellon University 2012-7-11 1 Committee:](https://reader035.vdocuments.mx/reader035/viewer/2022062918/5edd915aad6a402d6668b29c/html5/thumbnails/6.jpg)
Consider Behaviors
[Figure: Charlotte Brontë -Wrote-> Jane Eyre -IsA-> Novel -IsA-1-> A Tale of Two Cities -Wrote-1-> Charles Dickens -Profession-> Writer]
IsA-1 is the inverse of IsA; Wrote-1 is the inverse of Wrote
![Page 7: Efficient Random Walk Inference with Knowledge Bases · 2020-04-09 · Efficient Random Walk Inference with Knowledge Bases Ni Lao Carnegie Mellon University 2012-7-11 1 Committee:](https://reader035.vdocuments.mx/reader035/viewer/2022062918/5edd915aad6a402d6668b29c/html5/thumbnails/7.jpg)
Consider Literature/Publications
[Figure: graph with nodes Charlotte Brontë, Writer, Painter and edges Mentioned, Mentioned-1, Profession]
![Page 8: Efficient Random Walk Inference with Knowledge Bases · 2020-04-09 · Efficient Random Walk Inference with Knowledge Bases Ni Lao Carnegie Mellon University 2012-7-11 1 Committee:](https://reader035.vdocuments.mx/reader035/viewer/2022062918/5edd915aad6a402d6668b29c/html5/thumbnails/8.jpg)
Reading Recommendation
Application
a paper stream
these are interesting papers
![Page 9: Efficient Random Walk Inference with Knowledge Bases · 2020-04-09 · Efficient Random Walk Inference with Knowledge Bases Ni Lao Carnegie Mellon University 2012-7-11 1 Committee:](https://reader035.vdocuments.mx/reader035/viewer/2022062918/5edd915aad6a402d6668b29c/html5/thumbnails/9.jpg)
a paper river
![Page 10: Efficient Random Walk Inference with Knowledge Bases · 2020-04-09 · Efficient Random Walk Inference with Knowledge Bases Ni Lao Carnegie Mellon University 2012-7-11 1 Committee:](https://reader035.vdocuments.mx/reader035/viewer/2022062918/5edd915aad6a402d6668b29c/html5/thumbnails/10.jpg)
a paper river
new development of an interesting topic
[Figure: path through cite and write edges]
![Page 11: Efficient Random Walk Inference with Knowledge Bases · 2020-04-09 · Efficient Random Walk Inference with Knowledge Bases Ni Lao Carnegie Mellon University 2012-7-11 1 Committee:](https://reader035.vdocuments.mx/reader035/viewer/2022062918/5edd915aad6a402d6668b29c/html5/thumbnails/11.jpg)
a paper river
new papers of my favorite author
[Figure: path through read and write edges to the node "my favorite author"]
![Page 12: Efficient Random Walk Inference with Knowledge Bases · 2020-04-09 · Efficient Random Walk Inference with Knowledge Bases Ni Lao Carnegie Mellon University 2012-7-11 1 Committee:](https://reader035.vdocuments.mx/reader035/viewer/2022062918/5edd915aad6a402d6668b29c/html5/thumbnails/12.jpg)
a paper river
social recommendation
scientists who have similar interests
[Figure: paths through read and write edges]
![Page 13: Efficient Random Walk Inference with Knowledge Bases · 2020-04-09 · Efficient Random Walk Inference with Knowledge Bases Ni Lao Carnegie Mellon University 2012-7-11 1 Committee:](https://reader035.vdocuments.mx/reader035/viewer/2022062918/5edd915aad6a402d6668b29c/html5/thumbnails/13.jpg)
Relational Learning Goals
Relational learning is a subfield of artificial intelligence that learns with expressive logical or relational representations.
expressive: define features expressing sequences of relations on a graph
robust: combine many such features when making decisions
scalable: efficiently discover and calculate such features
![Page 14: Efficient Random Walk Inference with Knowledge Bases · 2020-04-09 · Efficient Random Walk Inference with Knowledge Bases Ni Lao Carnegie Mellon University 2012-7-11 1 Committee:](https://reader035.vdocuments.mx/reader035/viewer/2022062918/5edd915aad6a402d6668b29c/html5/thumbnails/14.jpg)
Why is relational learning computationally challenging?
Exponentially many path types, e.g. <IsA, IsA-1, AthletePlaysSport>
Our solution: feature metrics
Exponentially many path instantiations
Our solution: sampling
![Page 15: Efficient Random Walk Inference with Knowledge Bases · 2020-04-09 · Efficient Random Walk Inference with Knowledge Bases Ni Lao Carnegie Mellon University 2012-7-11 1 Committee:](https://reader035.vdocuments.mx/reader035/viewer/2022062918/5edd915aad6a402d6668b29c/html5/thumbnails/15.jpg)
Thesis Outline
Ch. 2: motivation
Applications:
Ch. 3: knowledge base inference (Lao+, EMNLP 2011)
Ch. 4: literature recommendation (Lao & Cohen, DILS 2012)
Ch. 6: relation extraction from parsed text (Lao+, EMNLP 2012)
Ch. 7: coordinate term extraction
Algorithms:
Ch. 2: Path Ranking Algorithm (Lao & Cohen, MLJ 2010)
Ch. 5: efficient RW (Lao & Cohen, KDD 2010)
Ch. 6: distributed computing
Ch. 7: more expressive features (submitted)
Ch. 8: future work
![Page 16: Efficient Random Walk Inference with Knowledge Bases · 2020-04-09 · Efficient Random Walk Inference with Knowledge Bases Ni Lao Carnegie Mellon University 2012-7-11 1 Committee:](https://reader035.vdocuments.mx/reader035/viewer/2022062918/5edd915aad6a402d6668b29c/html5/thumbnails/16.jpg)
Outline
Motivation: the problem, previous work, idea, contribution
Applications: knowledge base inference, literature recommendation, relation extraction from parsed text, coordinate term extraction
Algorithms: Path Ranking Algorithm (PRA), efficient RW, distributed computing, more expressive features
![Page 17: Efficient Random Walk Inference with Knowledge Bases · 2020-04-09 · Efficient Random Walk Inference with Knowledge Bases Ni Lao Carnegie Mellon University 2012-7-11 1 Committee:](https://reader035.vdocuments.mx/reader035/viewer/2022062918/5edd915aad6a402d6668b29c/html5/thumbnails/17.jpg)
Inductive Logic Programming
e.g. First Order Inductive Learner--FOIL (Quinlan, ECML’93)
High precision Horn clauses: HasFather(a,b) ^ Profession(b,y) ⇒ Profession(a,y)
expressive
not robust
not scalable (experimental comparison later)
![Page 18: Efficient Random Walk Inference with Knowledge Bases · 2020-04-09 · Efficient Random Walk Inference with Knowledge Bases Ni Lao Carnegie Mellon University 2012-7-11 1 Committee:](https://reader035.vdocuments.mx/reader035/viewer/2022062918/5edd915aad6a402d6668b29c/html5/thumbnails/18.jpg)
Undirected Graphical Models -- combine logics with GM
e.g. Markov Logic Networks (Kok & Domingos, ICML’05), Relational CRFs (Lao+, NIPS’10)
Horn clauses: smokes(A) & Friends(A,B) ⇒ smokes(B)
as CRF features: smokes(A) & Friends(A,B) & !smokes(B)
robust
not scalable
expressive
![Page 19: Efficient Random Walk Inference with Knowledge Bases · 2020-04-09 · Efficient Random Walk Inference with Knowledge Bases Ni Lao Carnegie Mellon University 2012-7-11 1 Committee:](https://reader035.vdocuments.mx/reader035/viewer/2022062918/5edd915aad6a402d6668b29c/html5/thumbnails/19.jpg)
Random Walk with Restart -- ignore logic
e.g. Tong+, ICDM’06
P(Charlotte → Writer)
P(Charlotte → Painter)
robust
scalable
not expressive
experimental comparison later
[Figure: the combined example graph. Nodes: Charlotte Brontë, Patrick Brontë, Jane Eyre, Novel, A Tale of Two Cities, Charles Dickens, Writer, Painter. Edges: HasFather, Profession, Mentioned, Mentioned-1, Wrote, Wrote-1, IsA, IsA-1]
![Page 20: Efficient Random Walk Inference with Knowledge Bases · 2020-04-09 · Efficient Random Walk Inference with Knowledge Bases Ni Lao Carnegie Mellon University 2012-7-11 1 Committee:](https://reader035.vdocuments.mx/reader035/viewer/2022062918/5edd915aad6a402d6668b29c/html5/thumbnails/20.jpg)
![Page 21: Efficient Random Walk Inference with Knowledge Bases · 2020-04-09 · Efficient Random Walk Inference with Knowledge Bases Ni Lao Carnegie Mellon University 2012-7-11 1 Committee:](https://reader035.vdocuments.mx/reader035/viewer/2022062918/5edd915aad6a402d6668b29c/html5/thumbnails/21.jpg)
Relational Classification -- combine logics with RWs
e.g. Path Ranking Algorithm (Lao & Cohen, MLJ’10)
P(Charlotte → Writer; <HasFather,IsA>)
P(Charlotte → Writer; <Mention,Mention-1,IsA>)
…
P(Charlotte → Painter; <HasFather,IsA>)
P(Charlotte → Painter; <Mention,Mention-1,IsA>)
…
robust
scalable
expressive
[Figure: the combined example graph, with a Profession? edge from Charlotte Brontë to the candidates Writer and Painter]
![Page 22: Efficient Random Walk Inference with Knowledge Bases · 2020-04-09 · Efficient Random Walk Inference with Knowledge Bases Ni Lao Carnegie Mellon University 2012-7-11 1 Committee:](https://reader035.vdocuments.mx/reader035/viewer/2022062918/5edd915aad6a402d6668b29c/html5/thumbnails/22.jpg)
Contribution
Apply relational learning at scales not possible before
made possible by
a family of easy-to-learn features
fast random walk
distributed computing
![Page 23: Efficient Random Walk Inference with Knowledge Bases · 2020-04-09 · Efficient Random Walk Inference with Knowledge Bases Ni Lao Carnegie Mellon University 2012-7-11 1 Committee:](https://reader035.vdocuments.mx/reader035/viewer/2022062918/5edd915aad6a402d6668b29c/html5/thumbnails/23.jpg)
![Page 24: Efficient Random Walk Inference with Knowledge Bases · 2020-04-09 · Efficient Random Walk Inference with Knowledge Bases Ni Lao Carnegie Mellon University 2012-7-11 1 Committee:](https://reader035.vdocuments.mx/reader035/viewer/2022062918/5edd915aad6a402d6668b29c/html5/thumbnails/24.jpg)
Path Ranking Algorithm (PRA)
(Lao & Cohen, MLJ 2010)
$$\mathrm{score}(s,t) = \sum_{\pi \in B} P(s \to t; \pi)\,\theta_\pi$$
a sum over path types π in B, each with a weight θ_π
e.g. π=<Mention,Mention-1,IsA>
robust
expressive
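As a rough illustration of the PRA score as a weighted sum of path probabilities, the sketch below ranks two candidate targets. The function name `score`, the path tuples, and all probabilities and weights are made-up values for illustration, not results from the talk.

```python
def score(path_probs, theta):
    """Weighted sum of random-walk probabilities over path types.

    path_probs: {path: P(s -> t; path)} for one (s, t) pair
    theta:      {path: weight}; paths without a weight contribute nothing
    """
    return sum(theta.get(path, 0.0) * p for path, p in path_probs.items())


# Illustrative weights for two path types (made-up numbers).
theta = {
    ("HasFather", "Profession"): 2.0,
    ("Mention", "Mention-1", "Profession"): 1.0,
}

# Illustrative path probabilities from the query node to each candidate.
candidates = {
    "Writer": {("HasFather", "Profession"): 1.0,
               ("Mention", "Mention-1", "Profession"): 0.5},
    "Painter": {("Mention", "Mention-1", "Profession"): 0.25},
}

ranked = sorted(candidates, key=lambda t: score(candidates[t], theta), reverse=True)
```

Because the score is linear in the path features, a candidate supported by several weak paths can still outrank one supported by a single path.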
![Page 25: Efficient Random Walk Inference with Knowledge Bases · 2020-04-09 · Efficient Random Walk Inference with Knowledge Bases Ni Lao Carnegie Mellon University 2012-7-11 1 Committee:](https://reader035.vdocuments.mx/reader035/viewer/2022062918/5edd915aad6a402d6668b29c/html5/thumbnails/25.jpg)
Random Walk Calculation
Dynamic Programming
$$P(s \to t; \pi) = \sum_z P(s \to z; \pi')\,P(z \to t; r)$$
e.g. π’=<Mention,Mention-1>, r=Profession
[Figure: s reaches intermediate nodes z1, z2, z3 via π’; the z nodes reach targets t1, t2 via r]
scalable: more later on how to do this 100x more efficiently using sampling
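The dynamic-programming recursion on this slide can be sketched as follows. This assumes a uniform choice among the outgoing edges of each relation, and a hypothetical edge table keyed by (node, relation); the toy graph reuses the s, z, t names from the slide's diagram.

```python
from collections import defaultdict


def path_random_walk(edges, source, path):
    """Return {node: P(source -> node; path)} for a path-constrained random walk.

    edges: {(node, relation): [target, ...]}  (hypothetical layout)
    path:  sequence of relations, followed one step at a time
    """
    dist = {source: 1.0}
    for relation in path:
        nxt = defaultdict(float)
        for node, prob in dist.items():
            targets = edges.get((node, relation), [])
            # Split this node's probability uniformly over its outgoing edges.
            for target in targets:
                nxt[target] += prob / len(targets)
        dist = dict(nxt)
    return dist


# Toy graph: s -a-> z1, z2 ; z1 -r-> t1 ; z2 -r-> t1, t2
edges = {
    ("s", "a"): ["z1", "z2"],
    ("z1", "r"): ["t1"],
    ("z2", "r"): ["t1", "t2"],
}
dist = path_random_walk(edges, "s", ("a", "r"))
```

Each step reuses the distribution of the previous prefix, so the cost grows with the number of nodes carrying non-zero probability rather than with the number of path instances. Nodes with no outgoing edge for the next relation drop out, so the result may sum to less than 1.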
![Page 26: Efficient Random Walk Inference with Knowledge Bases · 2020-04-09 · Efficient Random Walk Inference with Knowledge Bases Ni Lao Carnegie Mellon University 2012-7-11 1 Committee:](https://reader035.vdocuments.mx/reader035/viewer/2022062918/5edd915aad6a402d6668b29c/html5/thumbnails/26.jpg)
Feature Selection with Labeled Data
given training query set {(s_i, G_i)}
$$\mathrm{hits}(f) = \sum_i I\Big(\sum_{j \in G_i} f(s_i, t_j) > 0\Big) \ge h$$
$$\mathrm{accuracy}(f) = \frac{1}{N} \sum_i \frac{\sum_{j \in G_i} f(s_i, t_j)}{\sum_j f(s_i, t_j)} \ge a$$
I(): the indicator function; N: total number of queries
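A small sketch of the two selection statistics, under the assumption that hits counts the queries for which a feature puts non-zero probability on at least one correct target, and that accuracy averages the fraction of probability mass landing on correct targets. The feature table and query set are made up for illustration.

```python
def hits(feature, queries):
    """Number of queries where the feature reaches at least one correct target."""
    return sum(1 for s, good in queries if any(feature(s, t) > 0 for t in good))


def accuracy(feature, queries, candidates):
    """Average share of the feature's probability mass that falls on correct targets."""
    total = 0.0
    for s, good in queries:
        mass = sum(feature(s, t) for t in candidates)
        if mass > 0:
            total += sum(feature(s, t) for t in good) / mass
    return total / len(queries)


# Made-up feature values f(s, t) for one path type.
table = {("s1", "t1"): 0.75, ("s1", "t2"): 0.25, ("s2", "t2"): 1.0}


def f(s, t):
    return table.get((s, t), 0.0)


queries = [("s1", {"t1"}), ("s2", {"t1"})]   # (s_i, G_i) pairs
candidates = ["t1", "t2"]

n_hits = hits(f, queries)
acc = accuracy(f, queries, candidates)
```

A path type would be kept only if both statistics clear their thresholds (h and a on the slide), which prunes the exponentially large space of path types before any weights are learned.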
![Page 27: Efficient Random Walk Inference with Knowledge Bases · 2020-04-09 · Efficient Random Walk Inference with Knowledge Bases Ni Lao Carnegie Mellon University 2012-7-11 1 Committee:](https://reader035.vdocuments.mx/reader035/viewer/2022062918/5edd915aad6a402d6668b29c/html5/thumbnails/27.jpg)
Estimating θ
for a relation r
generate positive and negative node pairs {(s_i, t_i)}
for each (s_i, t_i) generate (x_i, y_i)
x_i is a vector of RW features of different paths π
y_i is a binary label r(s_i, t_i)
estimate θ by L1/L2 regularized (elastic-net) logistic regression
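The estimation step can be sketched in plain Python. This is a minimal illustration that assumes proximal gradient descent as the optimizer (the talk does not say which solver was used); the function names, hyperparameters, and the four training rows are all made up.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def soft_threshold(w, amount):
    """Proximal step for the L1 penalty: shrink w toward zero by `amount`."""
    if w > amount:
        return w - amount
    if w < -amount:
        return w + amount
    return 0.0


def fit_theta(X, y, l1=0.01, l2=0.01, lr=0.5, epochs=500):
    """Elastic-net logistic regression by proximal gradient descent (no bias term)."""
    n, d = len(X), len(X[0])
    theta = [0.0] * d
    for _ in range(epochs):
        grad = [l2 * w for w in theta]            # gradient of the L2 penalty
        for xi, yi in zip(X, y):
            err = sigmoid(sum(w * v for w, v in zip(theta, xi))) - yi
            for k in range(d):
                grad[k] += err * xi[k] / n        # gradient of the mean log loss
        theta = [soft_threshold(w - lr * g, lr * l1) for w, g in zip(theta, grad)]
    return theta


# Each row x_i holds random-walk probabilities for two path types (made-up values);
# y_i says whether r(s_i, t_i) holds.
X = [[1.0, 0.0], [0.8, 0.1], [0.0, 1.0], [0.1, 0.9]]
y = [1, 1, 0, 0]
theta = fit_theta(X, y)
```

The L1 part can drive weights of uninformative paths to exactly zero, while the L2 part keeps the remaining weights bounded even when the data are separable.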
![Page 28: Efficient Random Walk Inference with Knowledge Bases · 2020-04-09 · Efficient Random Walk Inference with Knowledge Bases Ni Lao Carnegie Mellon University 2012-7-11 1 Committee:](https://reader035.vdocuments.mx/reader035/viewer/2022062918/5edd915aad6a402d6668b29c/html5/thumbnails/28.jpg)
![Page 29: Efficient Random Walk Inference with Knowledge Bases · 2020-04-09 · Efficient Random Walk Inference with Knowledge Bases Ni Lao Carnegie Mellon University 2012-7-11 1 Committee:](https://reader035.vdocuments.mx/reader035/viewer/2022062918/5edd915aad6a402d6668b29c/html5/thumbnails/29.jpg)
Knowledge Base Inference
Application
(Lao, Mitchell, Cohen, EMNLP 2011)
NELL (Never Ending Language Learner) v165
353 relations
0.7M nodes (concepts)
1.7M edges
Example NELL relations
![Page 30: Efficient Random Walk Inference with Knowledge Bases · 2020-04-09 · Efficient Random Walk Inference with Knowledge Bases Ni Lao Carnegie Mellon University 2012-7-11 1 Committee:](https://reader035.vdocuments.mx/reader035/viewer/2022062918/5edd915aad6a402d6668b29c/html5/thumbnails/30.jpg)
PRA Uses Broad Coverage Features
AthletePlaysSport(HinesWard, ?)
<IsA, IsA-1, AthletePlaysSport>
![Page 31: Efficient Random Walk Inference with Knowledge Bases · 2020-04-09 · Efficient Random Walk Inference with Knowledge Bases Ni Lao Carnegie Mellon University 2012-7-11 1 Committee:](https://reader035.vdocuments.mx/reader035/viewer/2022062918/5edd915aad6a402d6668b29c/html5/thumbnails/31.jpg)
PRA Has Much Higher Recall and Is Much Faster
[Figure: bar chart of Precision (0 to 0.9) at p@10, p@100, p@1000 for FOIL and PRA]
Mechanical Turk evaluation of new beliefs for 8 functional relations
PRA trains in an hour vs. FOIL trains in a few days
![Page 32: Efficient Random Walk Inference with Knowledge Bases · 2020-04-09 · Efficient Random Walk Inference with Knowledge Bases Ni Lao Carnegie Mellon University 2012-7-11 1 Committee:](https://reader035.vdocuments.mx/reader035/viewer/2022062918/5edd915aad6a402d6668b29c/html5/thumbnails/32.jpg)
![Page 33: Efficient Random Walk Inference with Knowledge Bases · 2020-04-09 · Efficient Random Walk Inference with Knowledge Bases Ni Lao Carnegie Mellon University 2012-7-11 1 Committee:](https://reader035.vdocuments.mx/reader035/viewer/2022062918/5edd915aad6a402d6668b29c/html5/thumbnails/33.jpg)
Biology Literature
Application
Databases
Yeast: 0.8M nodes, 3.5M edges
Fly: 0.7M nodes, 16.9M edges
[Figure: schema graph of the literature database, with labels and counts as extracted: author 71k; paper 50k; gene 5.6k, 160k; Relates to 1.6K; Cites 0.3M; year 64; Before; title word 40k, 0.5M; journal 1.0k; institute 6k, 39k; MeSH descriptors/qualifiers 29k, 1.1M; chemical 14k, 0.3M]
![Page 34: Efficient Random Walk Inference with Knowledge Bases · 2020-04-09 · Efficient Random Walk Inference with Knowledge Bases Ni Lao Carnegie Mellon University 2012-7-11 1 Committee:](https://reader035.vdocuments.mx/reader035/viewer/2022062918/5edd915aad6a402d6668b29c/html5/thumbnails/34.jpg)
Recommendation Tasks
Literature Recommendation: year, author → papers a user is going to read
training data: 1 user over 20 years (collected from Dr. John Woolford’s computer)
![Page 35: Efficient Random Walk Inference with Knowledge Bases · 2020-04-09 · Efficient Random Walk Inference with Knowledge Bases Ni Lao Carnegie Mellon University 2012-7-11 1 Committee:](https://reader035.vdocuments.mx/reader035/viewer/2022062918/5edd915aad6a402d6668b29c/html5/thumbnails/35.jpg)
PRA Combines Dozens of Recommendation Strategies
[Figure: example path strategies built from cite, read, and write edges]
![Page 36: Efficient Random Walk Inference with Knowledge Bases · 2020-04-09 · Efficient Random Walk Inference with Knowledge Bases Ni Lao Carnegie Mellon University 2012-7-11 1 Committee:](https://reader035.vdocuments.mx/reader035/viewer/2022062918/5edd915aad6a402d6668b29c/html5/thumbnails/36.jpg)
Reading Recommendation
[Figure: Mean Reciprocal Rank (MRR, 0.0 to 0.9) vs. Maximum Path Length (L = 2, 3, 4, 5) for PRA, RWR, RWR (no training), Community-based, Content-based]
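For readers unfamiliar with the metric on this chart, Mean Reciprocal Rank averages 1/rank of the first correct item over all queries. A minimal sketch with made-up rankings (the function name and the data are illustrative only):

```python
def mean_reciprocal_rank(rankings, relevant):
    """Average of 1/rank of the first relevant item per query (0 if none is found)."""
    total = 0.0
    for ranked, good in zip(rankings, relevant):
        for position, item in enumerate(ranked, start=1):
            if item in good:
                total += 1.0 / position
                break
    return total / len(rankings)


# Two made-up queries: the first correct paper is at rank 2, then at rank 1.
rankings = [["p1", "p2", "p3"], ["p4", "p5"]]
relevant = [{"p2"}, {"p4"}]
mrr = mean_reciprocal_rank(rankings, relevant)
```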
![Page 37: Efficient Random Walk Inference with Knowledge Bases · 2020-04-09 · Efficient Random Walk Inference with Knowledge Bases Ni Lao Carnegie Mellon University 2012-7-11 1 Committee:](https://reader035.vdocuments.mx/reader035/viewer/2022062918/5edd915aad6a402d6668b29c/html5/thumbnails/37.jpg)
![Page 38: Efficient Random Walk Inference with Knowledge Bases · 2020-04-09 · Efficient Random Walk Inference with Knowledge Bases Ni Lao Carnegie Mellon University 2012-7-11 1 Committee:](https://reader035.vdocuments.mx/reader035/viewer/2022062918/5edd915aad6a402d6668b29c/html5/thumbnails/38.jpg)
Efficient Random Walks
(Lao & Cohen, KDD 2010)
Exact calculation of random walks results in non-zero probabilities for many internal nodes
[Figure: a query node (Charlotte), 1 billion nodes, and a few nodes that we care about (Writer, Painter, Zebra)]
![Page 39: Efficient Random Walk Inference with Knowledge Bases · 2020-04-09 · Efficient Random Walk Inference with Knowledge Bases Ni Lao Carnegie Mellon University 2012-7-11 1 Committee:](https://reader035.vdocuments.mx/reader035/viewer/2022062918/5edd915aad6a402d6668b29c/html5/thumbnails/39.jpg)
Idea: a few random walkers (particles) are enough to distinguish good target nodes from bad ones
[Figure: a query node (Charlotte), 1 billion nodes, and a few nodes that we care about (Writer, Painter, Zebra)]
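The particle idea can be illustrated with the simplest possible variant: independent walkers that each follow the path once, with visit counts used as probability estimates. This is only a sketch under that assumption, not the specific sampling schemes compared in the talk; the function name, the (node, relation) edge table, and the toy graph are hypothetical.

```python
import random
from collections import Counter


def sampled_walk(edges, source, path, num_particles, rng):
    """Estimate {node: P(source -> node; path)} from independent random walkers."""
    counts = Counter()
    for _ in range(num_particles):
        node = source
        for relation in path:
            targets = edges.get((node, relation))
            if not targets:
                node = None          # walker is stuck and is discarded
                break
            node = rng.choice(targets)
        if node is not None:
            counts[node] += 1
    return {node: c / num_particles for node, c in counts.items()}


# Toy graph: s -a-> z1, z2 ; z1 -r-> t1 ; z2 -r-> t1, t2
edges = {
    ("s", "a"): ["z1", "z2"],
    ("z1", "r"): ["t1"],
    ("z2", "r"): ["t1", "t2"],
}
rng = random.Random(0)               # fixed seed so the run is repeatable
estimate = sampled_walk(edges, "s", ("a", "r"), 10000, rng)
```

The exact probabilities on this toy graph are 0.75 for t1 and 0.25 for t2; the estimate only needs to be accurate enough to rank good targets above bad ones, which is why a modest number of particles can suffice.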
![Page 40: Efficient Random Walk Inference with Knowledge Bases · 2020-04-09 · Efficient Random Walk Inference with Knowledge Bases Ni Lao Carnegie Mellon University 2012-7-11 1 Committee:](https://reader035.vdocuments.mx/reader035/viewer/2022062918/5edd915aad6a402d6668b29c/html5/thumbnails/40.jpg)
Compare Speedup Approaches
Gene Recommendation on Fly Data (N=2k)
[Figure: Mean Reciprocal Rank (MRR, 0.17 to 0.23) vs. Speedup (1 to 1000) for Finger Printing, Particle Filtering, Fixed Truncation, Beam Truncation, with exact random walks as the reference; point labels give the number of particles (200, 300, 1k, 1k)]
10x ~ 100x faster with little loss of quality
![Page 41: Efficient Random Walk Inference with Knowledge Bases · 2020-04-09 · Efficient Random Walk Inference with Knowledge Bases Ni Lao Carnegie Mellon University 2012-7-11 1 Committee:](https://reader035.vdocuments.mx/reader035/viewer/2022062918/5edd915aad6a402d6668b29c/html5/thumbnails/41.jpg)
![Page 42: Efficient Random Walk Inference with Knowledge Bases · 2020-04-09 · Efficient Random Walk Inference with Knowledge Bases Ni Lao Carnegie Mellon University 2012-7-11 1 Committee:](https://reader035.vdocuments.mx/reader035/viewer/2022062918/5edd915aad6a402d6668b29c/html5/thumbnails/42.jpg)
Relation Extraction
Application
[Figure: Freebase (21M concepts, 70M edges) connected to a News Corpus (60M) of Dependency Trees through Mention edges, using Coreference Resolution and Entity Resolution. Text side: tokens She, wrote, Jane Eyre, Charlotte, was, with nsubj and dobj edges. Freebase side: Charlotte Bronte, Jane Eyre, Patrick Brontë, Writer, with Write, HasFather, and Profession edges, and a Profession ? query]
![Page 43: Efficient Random Walk Inference with Knowledge Bases · 2020-04-09 · Efficient Random Walk Inference with Knowledge Bases Ni Lao Carnegie Mellon University 2012-7-11 1 Committee:](https://reader035.vdocuments.mx/reader035/viewer/2022062918/5edd915aad6a402d6668b29c/html5/thumbnails/43.jpg)
Can PRA scale?
Can PRA learn syntactic-semantic rules?
![Page 44: Efficient Random Walk Inference with Knowledge Bases · 2020-04-09 · Efficient Random Walk Inference with Knowledge Bases Ni Lao Carnegie Mellon University 2012-7-11 1 Committee:](https://reader035.vdocuments.mx/reader035/viewer/2022062918/5edd915aad6a402d6668b29c/html5/thumbnails/44.jpg)
Distributed Computing
Large number of queries, e.g. 0.3M/2M persons have known profession
Solution: map/reduce to explore paths, generate training samples, calculate gradients, and make predictions for each query
Large text graph, e.g. 60M documents
Solution: each node keeps the Freebase graph in memory; relevant sentences are loaded/unloaded for each query
![Page 45: Efficient Random Walk Inference with Knowledge Bases · 2020-04-09 · Efficient Random Walk Inference with Knowledge Bases Ni Lao Carnegie Mellon University 2012-7-11 1 Committee:](https://reader035.vdocuments.mx/reader035/viewer/2022062918/5edd915aad6a402d6668b29c/html5/thumbnails/45.jpg)
<M, conj, M-1, Profession>
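Combine Syntax with Semantics
“McDougall and Simon Phillips collaborated…”
[Figure: Parsed Text tokens (Ian McDougall, Simon Phillips, collaborated; conj and subj edges) connected by M edges to Freebase entities IanMcDougall and SimonPhilips; SimonPhilips has Profession Performer, and the Profession of IanMcDougall is the query (?)]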
![Page 46: Efficient Random Walk Inference with Knowledge Bases · 2020-04-09 · Efficient Random Walk Inference with Knowledge Bases Ni Lao Carnegie Mellon University 2012-7-11 1 Committee:](https://reader035.vdocuments.mx/reader035/viewer/2022062918/5edd915aad6a402d6668b29c/html5/thumbnails/46.jpg)
<M, WORD, CW-1, profession-1, profession>
Combine Text with Semantics
e.g. “The president said …”
[Figure: Freebase (BarackObama, President, leader, host, other persons; Profession edges) and Parsed Text (tokens “he”, “leader”, “host”, “president”; words) connected by M, TW, and CW edges]
![Page 47: Efficient Random Walk Inference with Knowledge Bases · 2020-04-09 · Efficient Random Walk Inference with Knowledge Bases Ni Lao Carnegie Mellon University 2012-7-11 1 Committee:](https://reader035.vdocuments.mx/reader035/viewer/2022062918/5edd915aad6a402d6668b29c/html5/thumbnails/47.jpg)
Text and KB Work Better Together
[Figure: Mean Reciprocal Rank (MRR, 0 to 0.9) on Profession, Nationality, Parent for Freebase-only, Text-only, Freebase+Text]
Tested by existing knowledge in Freebase with closed world assumption
![Page 48: Efficient Random Walk Inference with Knowledge Bases · 2020-04-09 · Efficient Random Walk Inference with Knowledge Bases Ni Lao Carnegie Mellon University 2012-7-11 1 Committee:](https://reader035.vdocuments.mx/reader035/viewer/2022062918/5edd915aad6a402d6668b29c/html5/thumbnails/48.jpg)
Highly Accurate New Beliefs
[Figure: Precision (0 to 1) at p@100, p@1k, p@10k on Profession, Nationality, Parents]
manually evaluated new beliefs
![Page 49: Efficient Random Walk Inference with Knowledge Bases · 2020-04-09 · Efficient Random Walk Inference with Knowledge Bases Ni Lao Carnegie Mellon University 2012-7-11 1 Committee:](https://reader035.vdocuments.mx/reader035/viewer/2022062918/5edd915aad6a402d6668b29c/html5/thumbnails/49.jpg)
![Page 50: Efficient Random Walk Inference with Knowledge Bases · 2020-04-09 · Efficient Random Walk Inference with Knowledge Bases Ni Lao Carnegie Mellon University 2012-7-11 1 Committee:](https://reader035.vdocuments.mx/reader035/viewer/2022062918/5edd915aad6a402d6668b29c/html5/thumbnails/50.jpg)
Coordinate Term Extraction
Application
(Minkov & Cohen, ECML 2010)
Task: 30 queries; given 4 person names as seeds, find other persons
parsed MUC-6 corpus: 153k nodes, 748K edges
[Figure: Tokens (BillGates founded, SteveJobs founded; nsubj edges) linked by W edges to Words and by POS edges to POSs (nnp, vbd)]
W: word; POS: part of speech; nnp: singular proper noun; vbd: verb, past tense; nsubj: subject of a verb
![Page 51: Efficient Random Walk Inference with Knowledge Bases · 2020-04-09 · Efficient Random Walk Inference with Knowledge Bases Ni Lao Carnegie Mellon University 2012-7-11 1 Committee:](https://reader035.vdocuments.mx/reader035/viewer/2022062918/5edd915aad6a402d6668b29c/html5/thumbnails/51.jpg)
Good Paths Are Quite Long
<W-1,nsubj,W,W-1,nsubj-1,W>
find entities with similar behaviors
[Figure: Tokens (BillGates founded, SteveJobs founded; nsubj edges) linked by W edges to Words]
![Page 52: Efficient Random Walk Inference with Knowledge Bases · 2020-04-09 · Efficient Random Walk Inference with Knowledge Bases Ni Lao Carnegie Mellon University 2012-7-11 1 Committee:](https://reader035.vdocuments.mx/reader035/viewer/2022062918/5edd915aad6a402d6668b29c/html5/thumbnails/52.jpg)
Good Paths Are Quite Long
[Figure: Mean Average Precision (MAP, 0.00 to 0.20) vs. Max Path Length (3 to 6) for kF, kF+1B, kF+2B, kF+3B]
![Page 53: Efficient Random Walk Inference with Knowledge Bases · 2020-04-09 · Efficient Random Walk Inference with Knowledge Bases Ni Lao Carnegie Mellon University 2012-7-11 1 Committee:](https://reader035.vdocuments.mx/reader035/viewer/2022062918/5edd915aad6a402d6668b29c/html5/thumbnails/53.jpg)
Forward Search Is Wasteful
Find paths that connect s and t
[Figure: search expanding from s toward t]
![Page 54: Efficient Random Walk Inference with Knowledge Bases · 2020-04-09 · Efficient Random Walk Inference with Knowledge Bases Ni Lao Carnegie Mellon University 2012-7-11 1 Committee:](https://reader035.vdocuments.mx/reader035/viewer/2022062918/5edd915aad6a402d6668b29c/html5/thumbnails/54.jpg)
Bidirectional Search Is More Efficient
[Figure: s, intermediate node z, and t]
challenge is to calculate P(s→t;π)
![Page 55: Efficient Random Walk Inference with Knowledge Bases · 2020-04-09 · Efficient Random Walk Inference with Knowledge Bases Ni Lao Carnegie Mellon University 2012-7-11 1 Committee:](https://reader035.vdocuments.mx/reader035/viewer/2022062918/5edd915aad6a402d6668b29c/html5/thumbnails/55.jpg)
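Forward vs. Backward RWs
Forward
$$P(s \to t; \pi) = \sum_z P(s \to z; \pi')\,P(z \to t; r)$$
evaluate P(s→t;π) for many t
[Figure: s reaches z1, z2, z3 via π’; the z nodes reach t1, t2 via r]
Backward
$$P(t \leftarrow s; \pi) = \sum_z P(t \leftarrow z; \pi')\,P(z \leftarrow s; r)$$
evaluate P(s→t;π) for many s
[Figure: t is reached from z1, z2, z3 via π’; the z nodes are reached from s1, s2 via r]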
![Page 56: Efficient Random Walk Inference with Knowledge Bases · 2020-04-09 · Efficient Random Walk Inference with Knowledge Bases Ni Lao Carnegie Mellon University 2012-7-11 1 Committee:](https://reader035.vdocuments.mx/reader035/viewer/2022062918/5edd915aad6a402d6668b29c/html5/thumbnails/56.jpg)
Bidirectional Search with RW
$$P(s \to t; \pi_1\pi_2) = \sum_z P(s \to z; \pi_1)\,P(t \leftarrow z; \pi_2)$$
[Figure: a Forward RW from s via π1 and a Backward RW from t via π2, joined at intermediate nodes z1, z2, z3]
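The bidirectional decomposition can be sketched by combining a forward walk from s over the first half of the path with a backward pass from t over the second half. This assumes uniform transition probabilities and a hypothetical (node, relation) edge table; the function names are illustrative.

```python
from collections import defaultdict


def forward_walk(edges, source, path):
    """{z: P(source -> z; path)} by walking forward from the source."""
    dist = {source: 1.0}
    for relation in path:
        nxt = defaultdict(float)
        for node, prob in dist.items():
            targets = edges.get((node, relation), [])
            for target in targets:
                nxt[target] += prob / len(targets)
        dist = dict(nxt)
    return dist


def backward_walk(edges, target, path):
    """{z: probability that a walk from z along `path` ends at target}, computed from the target."""
    preds = defaultdict(list)                    # (node, relation) -> predecessors
    for (u, relation), targets in edges.items():
        for v in targets:
            preds[(v, relation)].append(u)
    reach = {target: 1.0}
    for relation in reversed(path):
        prev = defaultdict(float)
        for v, prob in reach.items():
            for u in preds.get((v, relation), []):
                prev[u] += prob / len(edges[(u, relation)])
        reach = dict(prev)
    return reach


def bidirectional_prob(edges, s, t, path1, path2):
    """P(s -> t; path1 + path2), joined at the intermediate nodes z."""
    fwd = forward_walk(edges, s, path1)
    bwd = backward_walk(edges, t, path2)
    return sum(p * bwd.get(z, 0.0) for z, p in fwd.items())


# Toy graph: s -a-> z1, z2 ; z1 -r-> t1 ; z2 -r-> t1, t2
edges = {
    ("s", "a"): ["z1", "z2"],
    ("z1", "r"): ["t1"],
    ("z2", "r"): ["t1", "t2"],
}
p = bidirectional_prob(edges, "s", "t1", ("a",), ("r",))
```

On this toy graph the joined value matches a single forward walk over the full path. The practical benefit is that each half explores a much smaller neighborhood than one long one-directional search.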
![Page 57: Efficient Random Walk Inference with Knowledge Bases · 2020-04-09 · Efficient Random Walk Inference with Knowledge Bases Ni Lao Carnegie Mellon University 2012-7-11 1 Committee:](https://reader035.vdocuments.mx/reader035/viewer/2022062918/5edd915aad6a402d6668b29c/html5/thumbnails/57.jpg)
Bidirectional Search Is Much Faster
[Figure: Path Finding Time (s) (0.1 to 1000) vs. Max Path Length (3 to 6) for search settings 3F, 4F, 5F, 2F+1B, 3F+1B, 4F+1B, 2F+2B, 3F+2B, 4F+2B, 3F+3B; annotations: "Exceed 20Gb memory limit" and "1000x faster"]
![Page 58: Efficient Random Walk Inference with Knowledge Bases · 2020-04-09 · Efficient Random Walk Inference with Knowledge Bases Ni Lao Carnegie Mellon University 2012-7-11 1 Committee:](https://reader035.vdocuments.mx/reader035/viewer/2022062918/5edd915aad6a402d6668b29c/html5/thumbnails/58.jpg)
Need for Lexicalized Paths
Task: find person entities
P(vbd→t | <POS-1, nsubj-1, W>)
[Figure: Tokens (BillGates founded, SteveJobs founded; nsubj edges) linked by W edges to Words and by POS edges to the POS node vbd]
![Page 59: Efficient Random Walk Inference with Knowledge Bases · 2020-04-09 · Efficient Random Walk Inference with Knowledge Bases Ni Lao Carnegie Mellon University 2012-7-11 1 Committee:](https://reader035.vdocuments.mx/reader035/viewer/2022062918/5edd915aad6a402d6668b29c/html5/thumbnails/59.jpg)
Evaluate Lexicalized Paths
Given an example (s_i, t_i)
calculate P(z→t_i;π) for many z
[Figure: t reached from z1, z2, z3 via π]
![Page 60: Efficient Random Walk Inference with Knowledge Bases · 2020-04-09 · Efficient Random Walk Inference with Knowledge Bases Ni Lao Carnegie Mellon University 2012-7-11 1 Committee:](https://reader035.vdocuments.mx/reader035/viewer/2022062918/5edd915aad6a402d6668b29c/html5/thumbnails/60.jpg)
Person Name Extraction
[Figure: Mean Average Precision (MAP, 0.0 to 0.4) at L=6 for RWR (no train), RWR, FOIL, PRA, PRA+c2, PRA+c3]
~1000 correct answers
![Page 61: Efficient Random Walk Inference with Knowledge Bases · 2020-04-09 · Efficient Random Walk Inference with Knowledge Bases Ni Lao Carnegie Mellon University 2012-7-11 1 Committee:](https://reader035.vdocuments.mx/reader035/viewer/2022062918/5edd915aad6a402d6668b29c/html5/thumbnails/61.jpg)
![Page 62: Efficient Random Walk Inference with Knowledge Bases · 2020-04-09 · Efficient Random Walk Inference with Knowledge Bases Ni Lao Carnegie Mellon University 2012-7-11 1 Committee:](https://reader035.vdocuments.mx/reader035/viewer/2022062918/5edd915aad6a402d6668b29c/html5/thumbnails/62.jpg)
Future Work
Apply knowledge to NLP/IE/IR/CV tasks
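$\arg\max_{decision} P(decision \mid context, KB)$
[Figure: a Text layer with dependency trees (nsubj, dobj edges) over the words "Charlotte", "wrote", "Jane Eyre", "She", "was", and a KB layer with the entities Charlotte Bronte and Jane Eyre (Write, IsA Female). Mention edges link text to KB entities (entity resolution), with a possible coreference link marked "Coreference?".]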
![Page 63: Efficient Random Walk Inference with Knowledge Bases · 2020-04-09 · Efficient Random Walk Inference with Knowledge Bases Ni Lao Carnegie Mellon University 2012-7-11 1 Committee:](https://reader035.vdocuments.mx/reader035/viewer/2022062918/5edd915aad6a402d6668b29c/html5/thumbnails/63.jpg)
Future Work
Conjunctions of Paths
rules can have tree structures
with source/constant/target nodes as leaves
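[Figure: tree-structured rule over a dependency parse: token nodes t1, t2, t3 joined by nsubj and dobj edges, with W edges to the words SteveJobs, founded, Apple; leaves are labeled source s, constant z, target t]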
![Page 64: Efficient Random Walk Inference with Knowledge Bases · 2020-04-09 · Efficient Random Walk Inference with Knowledge Bases Ni Lao Carnegie Mellon University 2012-7-11 1 Committee:](https://reader035.vdocuments.mx/reader035/viewer/2022062918/5edd915aad6a402d6668b29c/html5/thumbnails/64.jpg)
Future Work
Conjunctions of Paths
forward PCRW with multiple walkers
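[Figure: tree-structured rule over a dependency parse: token nodes t1, t2, t3 joined by nsubj and dobj edges, with W edges to the words SteveJobs, founded, Apple; leaves are labeled source s, constant z, target t]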
![Page 65: Efficient Random Walk Inference with Knowledge Bases · 2020-04-09 · Efficient Random Walk Inference with Knowledge Bases Ni Lao Carnegie Mellon University 2012-7-11 1 Committee:](https://reader035.vdocuments.mx/reader035/viewer/2022062918/5edd915aad6a402d6668b29c/html5/thumbnails/65.jpg)
Future Work
Conjunctions of Paths
backward PCRW with multiple walkers
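[Figure: tree-structured rule over a dependency parse: token nodes t1, t2, t3 joined by nsubj and dobj edges, with W edges to the words SteveJobs, founded, Apple; leaves are labeled source s, constant z, target t]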
![Page 66: Efficient Random Walk Inference with Knowledge Bases · 2020-04-09 · Efficient Random Walk Inference with Knowledge Bases Ni Lao Carnegie Mellon University 2012-7-11 1 Committee:](https://reader035.vdocuments.mx/reader035/viewer/2022062918/5edd915aad6a402d6668b29c/html5/thumbnails/66.jpg)
Contribution
Made possible by
a family of easy-to-learn features (3 types)
fast random walk (sampling)
distributed computing
Apply relational learning at scales not possible before.
Leads to new applications!
![Page 67: Efficient Random Walk Inference with Knowledge Bases · 2020-04-09 · Efficient Random Walk Inference with Knowledge Bases Ni Lao Carnegie Mellon University 2012-7-11 1 Committee:](https://reader035.vdocuments.mx/reader035/viewer/2022062918/5edd915aad6a402d6668b29c/html5/thumbnails/67.jpg)
other work I did at CMU
Relational CRFs (Lao+, NIPS’10)
Question answering (Lao+, NTCIR’08)
Utility based retrieval evaluation (Yang+, SIGIR’07)
![Page 68: Efficient Random Walk Inference with Knowledge Bases · 2020-04-09 · Efficient Random Walk Inference with Knowledge Bases Ni Lao Carnegie Mellon University 2012-7-11 1 Committee:](https://reader035.vdocuments.mx/reader035/viewer/2022062918/5edd915aad6a402d6668b29c/html5/thumbnails/68.jpg)
![Page 69: Efficient Random Walk Inference with Knowledge Bases · 2020-04-09 · Efficient Random Walk Inference with Knowledge Bases Ni Lao Carnegie Mellon University 2012-7-11 1 Committee:](https://reader035.vdocuments.mx/reader035/viewer/2022062918/5edd915aad6a402d6668b29c/html5/thumbnails/69.jpg)
Future Work
KB extension: new relation types, new concepts
Unsupervised: $\arg\max_{KB'} P(corpus \mid KB \cup KB')$
Supervised: $\arg\max_{KB'} P(decisions \mid contexts, KB \cup KB')$
![Page 70: Efficient Random Walk Inference with Knowledge Bases · 2020-04-09 · Efficient Random Walk Inference with Knowledge Bases Ni Lao Carnegie Mellon University 2012-7-11 1 Committee:](https://reader035.vdocuments.mx/reader035/viewer/2022062918/5edd915aad6a402d6668b29c/html5/thumbnails/70.jpg)
Directed Graphical Models, e.g. Probabilistic Relational Models (Getoor+, ICML’01)
model structure restricted to DAG
cannot express features corresponding to chains
![Page 71: Efficient Random Walk Inference with Knowledge Bases · 2020-04-09 · Efficient Random Walk Inference with Knowledge Bases Ni Lao Carnegie Mellon University 2012-7-11 1 Committee:](https://reader035.vdocuments.mx/reader035/viewer/2022062918/5edd915aad6a402d6668b29c/html5/thumbnails/71.jpg)
Coverage of top k triples
| Profession Triples | Unique Persons |
|---|---|
| 1k | 970 |
| 10k | 8,726 |
| 100k | 79,885 |
![Page 72: Efficient Random Walk Inference with Knowledge Bases · 2020-04-09 · Efficient Random Walk Inference with Knowledge Bases Ni Lao Carnegie Mellon University 2012-7-11 1 Committee:](https://reader035.vdocuments.mx/reader035/viewer/2022062918/5edd915aad6a402d6668b29c/html5/thumbnails/72.jpg)
Repeatedly Combine Forward and Backward RWs
Forward search
+Backward Search
+Backward Search
<W⁻¹,conj_and,W, W⁻¹,conj_and,W, W⁻¹,conj_and,W>
![Page 73: Efficient Random Walk Inference with Knowledge Bases · 2020-04-09 · Efficient Random Walk Inference with Knowledge Bases Ni Lao Carnegie Mellon University 2012-7-11 1 Committee:](https://reader035.vdocuments.mx/reader035/viewer/2022062918/5edd915aad6a402d6668b29c/html5/thumbnails/73.jpg)
Summary of PRA
| Stage | Computation |
|---|---|
| Path Discovery | given {(si, Gi)}, find {f ; acc(f)>=a, hits(f)>=h} |
| Generate Training Samples | generate {(si, ti)} and {(xi, yi)} |
| Logistic Regression Training | $\arg\max_\theta \sum_i l_i(\theta) - \lambda_1 \lVert\theta\rVert_1 - \lambda_2 \lVert\theta\rVert_2^2$ |
| Prediction | apply model to nodes s in domain(r) |
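The feature-generation and training stages can be sketched as follows. This is a minimal illustration under stated assumptions, not the system described in the talk: each path feature is taken to be the probability that a path-constrained random walk from s ends at t, steps are assumed uniform over matching edges, and the names `pcrw`, `features`, `objective`, and the toy KB are hypothetical. The optimizer itself is left out; only the regularized objective is shown.

```python
import math
from collections import defaultdict

def pcrw(edges_by, source, path):
    """Distribution over end nodes of a walk from source that follows path.

    edges_by: dict mapping (node, relation) -> list of neighbor nodes.
    """
    dist = {source: 1.0}
    for rel in path:
        nxt = defaultdict(float)
        for node, p in dist.items():
            nbrs = edges_by.get((node, rel), [])
            for n in nbrs:
                nxt[n] += p / len(nbrs)  # uniform choice among matching edges
        dist = dict(nxt)
    return dist

def features(edges_by, source, target, paths):
    """One feature per path: probability of reaching target from source."""
    return [pcrw(edges_by, source, path).get(target, 0.0) for path in paths]

def objective(theta, X, y, lam1, lam2):
    """Sum of log-likelihoods minus L1 and squared-L2 penalties on theta."""
    ll = 0.0
    for x, label in zip(X, y):
        score = sum(t * v for t, v in zip(theta, x))
        # log sigmoid(score) for positives, log(1 - sigmoid(score)) otherwise
        ll += -math.log1p(math.exp(-score)) if label == 1 else -math.log1p(math.exp(score))
    l1 = sum(abs(t) for t in theta)
    l2 = sum(t * t for t in theta)
    return ll - lam1 * l1 - lam2 * l2

# Toy KB (made up for illustration; relation names are placeholders).
toy_kb = {
    ("charlotte", "Sibling"): ["emily", "branwell"],
    ("emily", "Profession"): ["writer"],
    ("branwell", "Profession"): ["painter", "writer"],
}
toy_paths = [["Sibling", "Profession"]]
x_writer = features(toy_kb, "charlotte", "writer", toy_paths)    # [0.75]
x_painter = features(toy_kb, "charlotte", "painter", toy_paths)  # [0.25]
print(x_writer, x_painter)
print(objective([1.0], [x_writer, x_painter], [1, 0], lam1=0.1, lam2=0.1))
```

Any optimizer that handles the non-smooth L1 term (for example a proximal or orthant-wise method) could maximize `objective`; that choice is outside this sketch.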
![Page 74: Efficient Random Walk Inference with Knowledge Bases · 2020-04-09 · Efficient Random Walk Inference with Knowledge Bases Ni Lao Carnegie Mellon University 2012-7-11 1 Committee:](https://reader035.vdocuments.mx/reader035/viewer/2022062918/5edd915aad6a402d6668b29c/html5/thumbnails/74.jpg)
Need for Lexicalized Paths
Task=AthletePlaysInLeague
P(mlb→t; ∅): bias toward MLB
P(BostonBraves→t; <AthletePlaysForTeam⁻¹, AthletePlaysInLeague>): a prior over the leagues participated in by Boston Braves university athletes
![Page 75: Efficient Random Walk Inference with Knowledge Bases · 2020-04-09 · Efficient Random Walk Inference with Knowledge Bases Ni Lao Carnegie Mellon University 2012-7-11 1 Committee:](https://reader035.vdocuments.mx/reader035/viewer/2022062918/5edd915aad6a402d6668b29c/html5/thumbnails/75.jpg)
Need for Lexicalized Paths
Task=CompetesWith
P(google→t; ∅): bias toward Google
P(Google→t; <CompetesWith, CompetesWith>): companies around Google
![Page 76: Efficient Random Walk Inference with Knowledge Bases · 2020-04-09 · Efficient Random Walk Inference with Knowledge Bases Ni Lao Carnegie Mellon University 2012-7-11 1 Committee:](https://reader035.vdocuments.mx/reader035/viewer/2022062918/5edd915aad6a402d6668b29c/html5/thumbnails/76.jpg)
Knowledge Base Inference
[Figure: bar chart of Mean Reciprocal Rank (MRR; axis 0.0 to 0.6) at L=3 for RWR (no train), RWR, PRA, PRA+c1, PRA+c2; 16 tasks]
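For reference, Mean Reciprocal Rank can be computed as in the sketch below. It follows the usual definition (reciprocal of the rank of the first correct answer, averaged over queries); the function names and toy rankings are made up for illustration, and the exact evaluation protocol behind the chart is not specified here.

```python
def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant item, or 0.0 if none is retrieved."""
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0

def mean_reciprocal_rank(queries):
    """queries: list of (ranked_list, relevant_set) pairs, one per query."""
    scores = [reciprocal_rank(ranked, relevant) for ranked, relevant in queries]
    return sum(scores) / len(scores) if scores else 0.0

# Toy rankings (hypothetical).
toy_rankings = [
    (["a", "b"], {"b"}),  # first correct answer at rank 2
    (["c"], {"c"}),       # first correct answer at rank 1
]
print(mean_reciprocal_rank(toy_rankings))
```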