Learning Markov Logic Network Structure
Via Hypergraph Lifting
Stanley Kok
Dept. of Computer Science and Eng.
University of Washington, Seattle, USA
Joint work with Pedro Domingos
Synopsis of LHL

Input: Relational DB

Advises           TAs             Teaches
Pete   Sam        Sam    CS1      Pete   CS1
Pete   Saul       Sam    CS2      Pete   CS2
Paul   Sara       Sara   CS1      Paul   CS2
…      …          …      …        …      …

Output: Probabilistic KB

 2.7  Teaches(p, c) ∧ TAs(s, c) ⇒ Advises(p, s)
 1.4  Advises(p, s) ⇒ Teaches(p, c) ∧ TAs(s, c)
-1.1  TAs(s, c) ⇒ Advises(s, p)
…
Goal of LHL
[Figure: ground hypergraph over the constants Pete, Paul, Pat, Phil; Sam, Sara, Saul, Sue; CS1–CS8, with Teaches, TAs, and Advises hyperedges, lifted into the clusters Professor, Student, and Course.]
Experimental Results
[Charts: area under the precision-recall curve (AUC) and conditional log-likelihood (CLL) on each dataset for LHL, BUSL, and MSL.]
Outline
- Background
- Learning via Hypergraph Lifting
- Experiments
- Future Work
Markov Logic
- A logical KB is a set of hard constraints on the set of possible worlds
- Let’s make them soft constraints: when a world violates a formula, it becomes less probable, not impossible
- Give each formula a weight (higher weight ⇒ stronger constraint)
Markov Logic
- A Markov logic network (MLN) is a set of pairs (F, w)
  - F is a formula in first-order logic
  - w is a real number

P(x) = (1/Z) exp( Σ_i w_i n_i(x) )

where x is a vector of truth assignments to ground atoms, Z is the partition function, w_i is the weight of the i-th formula, and n_i(x) is the number of true groundings of the i-th formula.
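The distribution above can be computed by brute force for a tiny example. The sketch below grounds the single formula Teaches(p, c) ∧ TAs(s, c) ⇒ Advises(p, s) over one (p, s, c) tuple (three ground atoms) and enumerates all 2³ worlds to get Z; the predicate ordering and helper names are illustrative, not part of any MLN implementation.

```python
import itertools
import math

# World = (Teaches, TAs, Advises) truth values for a single grounding.
def n_true_groundings(world):
    teaches, tas, advises = world
    # n_i(x) for the one formula: the implication's single grounding.
    return [1 if (not (teaches and tas)) or advises else 0]

weights = [2.7]  # formula weight, as in the slides' example KB

def unnormalized(world):
    return math.exp(sum(w * n for w, n in zip(weights, n_true_groundings(world))))

# Partition function Z: sum over all 2^3 truth assignments.
worlds = list(itertools.product([False, True], repeat=3))
Z = sum(unnormalized(x) for x in worlds)

def prob(world):
    return unnormalized(world) / Z

# A world satisfying the formula is more probable than one violating it.
print(prob((True, True, True)) > prob((True, True, False)))  # True
```

Worlds that satisfy the weighted formula get weight e^2.7 rather than being required outright, which is exactly the "soft constraint" reading above.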
MLN Structure Learning
- Challenging task; few approaches to date [Kok & Domingos, ICML’05; Mihalkova & Mooney, ICML’07; Biba et al., ECAI’08; Huynh & Mooney, ICML’08]
- Most MLN structure learners greedily and systematically enumerate formulas
  - Computationally expensive; large search space
  - Susceptible to local optima
MSL [Kok & Domingos, ICML’05]

While beam not empty:
  Add unit clauses to beam
  While beam has changed:
    For each clause c in beam:
      c’ ← add a literal to c
      newClauses ← newClauses ∪ {c’}
    beam ← k best clauses in beam ∪ newClauses
  Add best clause in beam to MLN
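A minimal runnable sketch of this beam-search loop is below. The literal set, clause representation, and `score` function are toy stand-ins (MSL actually scores with weighted pseudo-likelihood over the data), so only the search pattern itself is faithful to the pseudocode above.

```python
# Schematic MSL-style beam search over clauses (sets of literals).
LITERALS = ["Advises(p,s)", "!Advises(p,s)", "Teaches(p,c)",
            "!Teaches(p,c)", "TAs(s,c)", "!TAs(s,c)"]

def score(clause):
    # Toy score: prefer short clauses mentioning TAs (illustrative only).
    return (1.0 if "TAs(s,c)" in clause else 0.0) - 0.1 * len(clause)

def beam_search(k=4, max_len=3):
    beam = [frozenset([l]) for l in LITERALS]      # start from unit clauses
    changed = True
    while changed:
        changed = False
        new_clauses = set()
        for c in beam:                              # c' <- add a literal to c
            if len(c) < max_len:
                for lit in LITERALS:
                    if lit not in c:
                        new_clauses.add(frozenset(c | {lit}))
        candidates = set(beam) | new_clauses
        # k best clauses; sorted(c) breaks score ties deterministically.
        new_beam = sorted(candidates, key=lambda c: (-score(c), sorted(c)))[:k]
        if new_beam != beam:
            beam, changed = new_beam, True
    return [max(beam, key=score)]                   # best clause -> MLN

print(beam_search())
```

Because every extension of a clause is scored against the whole beam, the search is greedy and can stall in local optima, which is the weakness the slide calls out.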
Relational Pathfinding [Richards & Mooney, AAAI’92]
- Find paths of linked ground atoms → formulas
- Path ≡ conjunction that is true at least once
- Exponential search space of paths
  - Restricted to short paths
[Figure: ground hypergraph with Teaches, TAs, and Advises hyperedges; the path through Pete, CS1, and Sam is highlighted.]

Advises(Pete, Sam) ∧ Teaches(Pete, CS1) ∧ TAs(Sam, CS1)
→ Advises(p, s) ∧ Teaches(p, c) ∧ TAs(s, c)
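The variabilization step shown above (each distinct constant becomes a fresh variable) can be sketched directly; the variable-name pool and the atom representation here are my own illustrative choices.

```python
# Turn a ground path into a first-order conjunction by replacing each
# distinct constant with a fresh variable (relational pathfinding's
# variabilization step).
def variabilize(path):
    """path: list of (predicate, (constants...)) ground atoms."""
    var_of = {}
    names = iter("pscdxyz")  # illustrative variable names
    lifted = []
    for pred, args in path:
        new_args = []
        for const in args:
            if const not in var_of:
                var_of[const] = next(names)  # first occurrence gets a new var
            new_args.append(var_of[const])
        lifted.append(f"{pred}({', '.join(new_args)})")
    return " ^ ".join(lifted)

path = [("Advises", ("Pete", "Sam")),
        ("Teaches", ("Pete", "CS1")),
        ("TAs", ("Sam", "CS1"))]
print(variabilize(path))  # Advises(p, s) ^ Teaches(p, c) ^ TAs(s, c)
```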
BUSL [Mihalkova & Mooney, ICML’07]
- Finds short paths with a form of relational pathfinding
- Path → Boolean variable → node in a Markov network
- Greedily tries to link the nodes with edges
- Cliques → clauses
  - Form disjunctions of the atoms in a clique’s nodes
- Greedily adds clauses to an empty MLN
[Figure: Markov network over the path variables Advises(p, s), Teaches(p, c), and TAs(s, c); a clique yields clauses such as Advises(p, s) ∨ Teaches(p, c) ∨ TAs(s, c) and ¬Advises(p, s) ∨ ¬Teaches(p, c) ∨ TAs(s, c), …]
Learning via Hypergraph Lifting (LHL)
- Uses relational pathfinding to a fuller extent
- Induces a hypergraph over clusters of constants
Learning via Hypergraph Lifting (LHL)
[“Lift” figure: the ground hypergraph over constants is lifted to a hypergraph over the clusters {Pete, Paul, Pat, Phil}, {Sam, Sara, Saul, Sue}, and {CS1–CS8}.]
- Uses a hypergraph (V, E)
  - V: set of nodes
  - E: set of labeled, non-empty, ordered subsets of V
- Finds paths in a hypergraph
  - Path: a set of hyperedges such that for any two hyperedges e₀ and eₙ, there exists a sequence of hyperedges in the set that leads from e₀ to eₙ
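The path condition above amounts to a connectivity check: treat hyperedges as vertices, adjacent when they share a node, and require the result to be connected. A small sketch (hyperedge representation is my own; the slides do not fix one):

```python
# A set of hyperedges is a path if the "edge graph" (hyperedges adjacent
# when they share a node) is connected. Hyperedges: (label, nodes) pairs.
def is_path(hyperedges):
    edges = list(hyperedges)
    if not edges:
        return False
    seen, frontier = {0}, [0]
    while frontier:                      # BFS over hyperedges
        i = frontier.pop()
        for j in range(len(edges)):
            if j not in seen and set(edges[i][1]) & set(edges[j][1]):
                seen.add(j)
                frontier.append(j)
    return len(seen) == len(edges)

e1 = ("Advises", ("Pete", "Sam"))
e2 = ("Teaches", ("Pete", "CS1"))
e3 = ("TAs", ("Sam", "CS1"))
e4 = ("Teaches", ("Paul", "CS2"))

print(is_path({e1, e2, e3}))  # True: the three hyperedges chain together
print(is_path({e1, e4}))      # False: no shared node
```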
Learning via Hypergraph Lifting (LHL)
- A relational DB can be viewed as a hypergraph
  - Nodes ≡ constants
  - Hyperedges ≡ true ground atoms
[Figure: the DB tables Advises, TAs, and Teaches mapped to a hypergraph.]
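The DB-to-hypergraph view is a one-liner in code: constants become nodes, true ground atoms become labeled hyperedges. The dictionary layout below is an illustrative encoding of the slides' example tables.

```python
# View a relational DB as a hypergraph: constants -> nodes,
# true ground atoms -> labeled hyperedges.
db = {
    "Advises": [("Pete", "Sam"), ("Pete", "Saul"), ("Paul", "Sara")],
    "TAs":     [("Sam", "CS1"), ("Sam", "CS2"), ("Sara", "CS1")],
    "Teaches": [("Pete", "CS1"), ("Pete", "CS2"), ("Paul", "CS2")],
}

nodes = {c for rows in db.values() for row in rows for c in row}
hyperedges = [(pred, row) for pred, rows in db.items() for row in rows]

print(sorted(nodes))
print(len(hyperedges))  # 9 true ground atoms -> 9 hyperedges
```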
LHL = Clustering + Relational Pathfinding
[Figure: ground hypergraph with Teaches, TAs, and Advises hyperedges.]
- LHL “lifts” the hypergraph into a more compact representation
  - Jointly clusters nodes into higher-level concepts
  - Clusters hyperedges
- Traces paths in the lifted hypergraph
Learning via Hypergraph Lifting
[“Lift” figure: ground hypergraph lifted to a hypergraph over clusters.]
- LHL has three components
  - LiftGraph: lifts the hypergraph
  - FindPaths: finds paths in the lifted hypergraph
  - CreateMLN: creates rules from paths, and adds good ones to an empty MLN
LiftGraph
- Defined using Markov logic
- Jointly clusters constants in a bottom-up agglomerative manner
  - Allows information to propagate from one cluster to another
- Ground atoms are also clustered
- #Clusters need not be specified in advance
- Each lifted hyperedge contains ≥ one true ground atom
Learning Problem in LiftGraph
- Find the cluster assignment C that maximizes the posterior probability P(C | D) ∝ P(D | C) P(C)
LiftGraph’s P(D|C) MLN
- D: the truth values of ground atoms; P(D | C) is defined with an MLN, P(C) with another MLN
- For each predicate r and each cluster combination containing a true ground atom of r, we have an atom prediction rule:

∀x₁, …, xₙ  x₁ ∈ γ₁ ∧ … ∧ xₙ ∈ γₙ ⇒ r(x₁, …, xₙ)
[Figure: clusters Professor = {Pete, Paul, Pat, Phil}, Student = {Sam, Sara, Saul, Sue}, Course = {CS1–CS8}, with an atom prediction rule for Teaches over the Professor and Course clusters.]
LiftGraph’s P(D|C) MLN
∀x₁, …, xₙ  x₁ ∈ γ₁ ∧ … ∧ xₙ ∈ γₙ ⇒ r(x₁, …, xₙ)
[Example: p ∈ Professor ∧ c ∈ Course ⇒ Teaches(p, c)]
- For each predicate r, we have a default atom prediction rule
LiftGraph’s P(D|C) MLN
[Figure: default atom prediction rule, e.g., membership in the default cluster combination ⇒ Teaches(x, y), covering cluster combinations with no true ground atoms.]

LiftGraph’s P(C) MLN
- Each symbol belongs to exactly one cluster (infinite weight)
- Exponential prior on the number of cluster combinations (negative weight -λ)
LiftGraph
- Hard assignments of constants to clusters
- Weights and log-posterior computed in closed form
- Searches for the cluster assignment with the highest log-posterior
LiftGraph’s Search Algorithm
[Figure: agglomerative search, e.g., merging {CS1} and {CS2} into {CS1, CS2}, then into {CS1, CS2, CS3}, and {Sam}, {Sara} into {Sam, Sara}, updating the Teaches and Advises hyperedges accordingly.]
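A schematic version of this agglomerative search is below. The `gain` function is a toy stand-in for the closed-form change in log-posterior that LiftGraph actually computes, the flat merge penalty plays the role of -λ, and the sketch assumes binary predicates only.

```python
from itertools import combinations

def gain(a, b, relations):
    # Toy score: reward merging clusters whose constants appear in the
    # same argument position of the same predicate; 0.5 is a made-up
    # per-merge penalty standing in for the -lambda prior.
    g = 0
    for pred, rows in relations.items():
        for pos in range(2):  # assumes binary predicates
            col = {t[pos] for t in rows}
            if a & col and b & col:
                g += 1
    return g - 0.5

def lift(constants, relations):
    clusters = [frozenset([c]) for c in constants]  # start with singletons
    while len(clusters) > 1:
        pairs = list(combinations(clusters, 2))
        a, b = max(pairs, key=lambda p: gain(p[0], p[1], relations))
        if gain(a, b, relations) <= 0:              # no merge improves score
            break
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return clusters

relations = {"Teaches": [("Pete", "CS1"), ("Pete", "CS2"), ("Paul", "CS2")]}
result = lift(["Pete", "Paul", "CS1", "CS2"], relations)
print(sorted(sorted(c) for c in result))  # [['CS1', 'CS2'], ['Paul', 'Pete']]
```

Even this toy gain recovers "professor-like" and "course-like" clusters from a few Teaches atoms, which is the intuition behind the lifted clusters in the figure.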
FindPaths
[Figure: lifted hypergraph over the clusters {Pete, Paul, Pat, Phil}, {Sam, Sara, Saul, Sue}, and {CS1–CS8}.]
Paths found:
- Advises({Pete, …}, {Sam, …})
- Advises({Pete, …}, {Sam, …}), Teaches({Pete, …}, {CS1, …})
- Advises({Pete, …}, {Sam, …}), Teaches({Pete, …}, {CS1, …}), TAs({Sam, …}, {CS1, …})
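The growth of these paths can be sketched as a search over lifted hyperedges: starting from each hyperedge, repeatedly append one that shares a cluster with the path so far, recording every prefix. Cluster names and the length cap are illustrative.

```python
# Sketch of FindPaths over a lifted hypergraph. Hyperedges are
# (label, cluster-tuple) pairs; cluster names are stand-ins.
Prof, Stud, Course = "Prof", "Stud", "Course"
lifted_edges = [("Advises", (Prof, Stud)),
                ("Teaches", (Prof, Course)),
                ("TAs", (Stud, Course))]

def find_paths(edges, max_len=3):
    paths = set()
    def grow(path, nodes):
        paths.add(frozenset(path))          # every prefix is itself a path
        if len(path) == max_len:
            return
        for e in edges:
            if e not in path and nodes & set(e[1]):  # must share a cluster
                grow(path + [e], nodes | set(e[1]))
    for e in edges:
        grow([e], set(e[1]))
    return paths

for p in sorted(find_paths(lifted_edges), key=len):
    print(sorted(p))
```

On the three lifted hyperedges above this yields three singleton paths, three pairs, and one full triple, matching the "Paths found" progression on the slide.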
Clause Creation
[Figure: each lifted path is variabilized, replacing the clusters with the variables p, s, and c.]

Advises(p, s) ∧ Teaches(p, c) ∧ TAs(s, c)

¬Advises(p, s) ∨ ¬Teaches(p, c) ∨ ¬TAs(s, c)
Advises(p, s) ∨ ¬Teaches(p, c) ∨ ¬TAs(s, c)
Advises(p, s) ∨ Teaches(p, c) ∨ ¬TAs(s, c)
…
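Turning one variabilized conjunction into its candidate clauses is just enumerating sign combinations over the atoms; the string encoding (`!` for ¬, `v` for ∨) below is my own.

```python
from itertools import product

# A variabilized path (conjunction of atoms) yields candidate clauses by
# disjoining its atoms under every combination of signs.
def clauses_from_path(atoms):
    out = []
    for signs in product([False, True], repeat=len(atoms)):
        out.append(" v ".join(("" if pos else "!") + a
                              for pos, a in zip(signs, atoms)))
    return out

path = ["Advises(p,s)", "Teaches(p,c)", "TAs(s,c)"]
cs = clauses_from_path(path)
print(len(cs))   # 2^3 = 8 candidate clauses
print(cs[0])     # !Advises(p,s) v !Teaches(p,c) v !TAs(s,c)
```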
Clause Pruning

Clause                                         Score
¬Advises(p, s) ∨ ¬Teaches(p, c) ∨ TAs(s, c)    -1.15
Advises(p, s) ∨ ¬Teaches(p, c) ∨ TAs(s, c)     -1.17
…                                              …
¬Advises(p, s) ∨ ¬Teaches(p, c)                -2.21
¬Advises(p, s) ∨ TAs(s, c)                     -2.23
¬Teaches(p, c) ∨ TAs(s, c)                     -2.03
¬Advises(p, s)                                 -3.13
¬Teaches(p, c)                                 -2.93
TAs(s, c)                                      -3.93
…                                              …
Clause Pruning
- Compare each clause against its sub-clauses (taken individually)

MLN Creation
- Add clauses to an empty MLN in order of decreasing score
- Retrain the clause weights each time a clause is added
- Retain a clause in the MLN if the overall score improves
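The greedy addition loop can be sketched as follows. Here `overall_score` is a toy stand-in for retraining the weights and rescoring the whole MLN, and the example clause scores are made up for illustration.

```python
# Schematic MLN creation: add clauses in order of decreasing score,
# keeping a clause only if the overall score improves.
def overall_score(mln):
    # Illustrative stand-in for retrain-and-rescore: summed clause scores
    # with a growing penalty per clause.
    return sum(v for _, v in mln) - 0.2 * len(mln) ** 2

def create_mln(scored_clauses):
    mln = []
    for clause, s in sorted(scored_clauses, key=lambda x: -x[1]):
        candidate = mln + [(clause, s)]
        if overall_score(candidate) > overall_score(mln):
            mln = candidate                 # retain: overall score improved
    return [c for c, _ in mln]

scored = [("!Advises(p,s) v !Teaches(p,c) v TAs(s,c)", 0.9),
          ("Advises(p,s) v !Teaches(p,c) v TAs(s,c)", 0.5),
          ("TAs(s,c)", 0.1)]
print(create_mln(scored))
```

With these toy numbers only the top-scoring clause survives: each later clause raises the sum by less than the added penalty, mirroring how redundant clauses fail to improve the retrained score.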
Datasets
- IMDB: created from the IMDB.com database; movies, actors, etc., and their relationships; 17,793 ground atoms, 1,224 true
- UW-CSE: describes an academic department; students, faculty, etc., and their relationships; 260,254 ground atoms, 2,112 true
- Cora: citations to computer science papers; papers, authors, titles, etc., and their relationships; 687,422 ground atoms, 42,558 true
Methodology
- Five-fold cross-validation
- Inferred the probability of being true for the groundings of each predicate, with the groundings of all other predicates as evidence
- Evaluation measures
  - Area under the precision-recall curve (AUC)
  - Average conditional log-likelihood (CLL)
- MCMC inference algorithms in Alchemy to evaluate the test atoms (1 million samples or 24 hours)
Methodology
- Compared with
  - MSL [Kok & Domingos, ICML’05]
  - BUSL [Mihalkova & Mooney, ICML’07]
- Lesion study
  - NoLiftGraph: LHL with no hypergraph lifting; finds paths directly in the unlifted hypergraph
  - NoPathFinding: LHL with no pathfinding; uses the MLN representing LiftGraph
LHL vs. BUSL vs. MSL: Area under Prec-Recall Curve
[Charts: AUC on IMDB, UW-CSE, and Cora for LHL, BUSL, and MSL.]
LHL vs. BUSL vs. MSL: Conditional Log-Likelihood
[Charts: CLL on IMDB, UW-CSE, and Cora for LHL, BUSL, and MSL.]
LHL vs. BUSL vs. MSL: Runtime
[Charts: runtime on IMDB (min), UW-CSE (hr), and Cora (hr) for LHL, BUSL, and MSL.]
LHL vs. NoLiftGraph: Area under Prec-Recall Curve
[Charts: AUC on IMDB, UW-CSE, and Cora for LHL and NoLiftGraph.]
LHL vs. NoLiftGraph: Conditional Log-Likelihood
[Charts: CLL on IMDB, UW-CSE, and Cora for LHL and NoLiftGraph.]
LHL vs. NoLiftGraph: Runtime
[Charts: runtime on IMDB (min), UW-CSE (hr), and Cora (hr) for LHL and NoLiftGraph.]
LHL vs. NoPathFinding
[Charts: AUC and CLL on IMDB and UW-CSE for LHL and NoPathFinding.]
Examples of Rules Learned
- If a is an actor and d is a director, and they both worked on the same movie, then a probably worked under d
- If p is a professor and p co-authored a paper with s, then s is likely a student
- If papers x and y have the same author, then x and y are likely the same paper
Future Work
- Integrate the components of LHL
- Integrate LHL with lifted inference [Singla & Domingos, AAAI’08]
- Construct an ontology simultaneously with the probabilistic KB
- Further scale LHL up
- Apply LHL to larger, richer domains, e.g., the Web