
Lifted Probabilistic Inference

in Relational Models

Guy Van den Broeck Dan Suciu

KU Leuven U. of Washington

Tutorial UAI 2014 1

About the Tutorial

Slides available online.

Bibliography is at the end.

Your speakers:

Guy: I work in AI. Dan: I work in DB.

http://www.guyvdb.eu/

https://homes.cs.washington.edu/~suciu/

About the Tutorial

• The tutorial is about

– deep connections between AI and DBs

– a unified view on probabilistic reasoning

– a logical approach to Lifted Inference

• The tutorial is NOT an exhaustive

overview of lifted algorithms for graphical

models (see references at the end)

Outline

• Part 1: Motivation

• Part 2: Probabilistic Databases

• Part 3: Weighted Model Counting

• Part 4: Lifted Inference for WFOMC

• Part 5: The Power of Lifted Inference

• Part 6: Conclusion/Open Problems

Tutorial UAI 2014 4

Part 1: Motivation

• Why do we need relational representations

of uncertainty?

• Why do we need lifted inference

algorithms?

Tutorial UAI 2014 5

Why Relational Data?

• Our data is already relational!

– Companies run relational databases

– Scientific data is relational:

• Large Hadron Collider generated 25PB in 2012

• LSST Telescope will produce 30TB per night

• Big data is big business:

– Oracle: $7.1BN in sales

– IBM: $3.2BN in sales

– Microsoft: $2.6BN in sales

≈ GDP of Iceland!

6 [Gartner'06]

Why Probabilistic Relational Data?

• Relational data is increasingly probabilistic

– NELL machine reading (>50M tuples)

– Google Knowledge Vault (>2BN tuples)

– DeepDive (>7M tuples)

• Data is inferred from unstructured information using statistical models

– Learned from the web, large text corpora, ontologies, etc.

– The learned/extracted data is relational

Tutorial UAI 2014 7 [Carlson‟10, Dong‟14, Niu‟12]

Representation:

Probabilistic Databases

• Tuple-independent probabilistic databases

• Query: SQL or First Order Logic

Actor:

Name Prob

Brando 0.9

Cruise 0.8

Coppola 0.1

WorkedFor:

Actor Director Prob

Brando Coppola 0.9

Coppola Brando 0.2

Cruise Coppola 0.1

Tutorial UAI 2014 8

Q(x) = ∃y Actor(x)∧WorkedFor(x,y) SELECT Actor.name

FROM Actor, WorkedFor

WHERE Actor.name = WorkedFor.actor

9
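To make the tuple-independent semantics concrete, here is a minimal sketch (not from the slides) that evaluates this query extensionally on the Actor and WorkedFor tables above, in Python:

    # Extensional evaluation of Q(x) = ∃y Actor(x) ∧ WorkedFor(x,y)
    # on the tuple-independent tables shown on the slide.
    actor = {'Brando': 0.9, 'Cruise': 0.8, 'Coppola': 0.1}
    worked_for = {('Brando', 'Coppola'): 0.9,
                  ('Coppola', 'Brando'): 0.2,
                  ('Cruise', 'Coppola'): 0.1}
    for x, p_actor in actor.items():
        # P(∃y WorkedFor(x,y)) = 1 - Π_y (1 - q) by tuple independence
        p_no_director = 1.0
        for (a, _d), q in worked_for.items():
            if a == x:
                p_no_director *= 1 - q
        print(x, p_actor * (1 - p_no_director))
    # prints approximately 0.81 for Brando, 0.08 for Cruise, 0.02 for Coppola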

Logical

Propositional Relational

Summary

Uncertainty in AI

Databases

Statistical Prob.

DBs

10

Statistical

Logical

Propositional Relational

Representations in AI and ML

Cloudy Rain P(Rain|Cloudy)

T T 0.80

T F 0.20

F T 0.01

F F 0.99

Rain ⇒ Cloudy

Relational Representations

Example: First-Order Logic

Logical variables have domain of constants

x,y range over domain People = {Alice,Bob}

Ground formula has no logical variables

Smokes(Alice) ∧ Friends(Alice,Bob) ⇒ Smokes(Bob)

Tutorial UAI 2014 11

∀x,y, Smokes(x) ∧ Friends(x,y) ⇒ Smokes(y)

Formula

Logical Variable Atom

12

Statistical

Logical

Propositional Relational

Representations in AI and ML

Cloudy Rain P(Rain|Cloudy)

T T 0.80

T F 0.20

F T 0.01

F F 0.99

∀x, ∀y, Smokes(x) ∧ Friends(x,y)

⇒ Smokes(y) Rain ⇒ Cloudy

Why Statistical Relational Models?

Probabilistic graphical models:
+ Quantify uncertainty and noise
– Not very expressive (rules of chess in ~100,000 pages)

First-order logic:
+ Very expressive (rules of chess in 1 page)
+ Good match for abundant relational data
– Hard to express uncertainty and noise

Tutorial UAI 2014 13

Example: Markov Logic

Weighted First-Order Logic

Ground atom/tuple = random variable in {true,false}

e.g., Smokes(Alice), Friends(Alice,Bob), etc.

Ground formula = factor in propositional factor graph

Friends(Alice,Bob)

Smokes(Alice) Smokes(Bob)

Friends(Bob,Alice)

f1 f2 Friends(Alice,Alice) Friends(Bob,Bob)

f3 f4

3.14 Smokes(x) ∧ Friends(x,y) ⇒ Smokes(y)

FOL Formula Weight or Probability

[Richardson‟06]

15

Cloudy Rain P(Rain|Cloudy)

T T 0.80

T F 0.20

F T 0.01

F F 0.99

Statistical

Logical

Propositional Relational

Representations in AI and ML

3.14 Smokes(x) ∧ Friends(x,y)

⇒ Smokes(y)

∀x, ∀y, Smokes(x) ∧ Friends(x,y)

⇒ Smokes(y) Rain ⇒ Cloudy

15

Cloudy Rain P(Rain|Cloudy)

T T 0.80

T F 0.20

F T 0.01

F F 0.99

Statistical

Logical

Propositional Relational

Representations in AI and ML

3.14 Smokes(x) ∧ Friends(x,y)

⇒ Smokes(y)

∀x, ∀y, Smokes(x) ∧ Friends(x,y)

⇒ Smokes(y) Rain ⇒ Cloudy

16

Logical

Propositional Relational

Summary

Uncertainty in AI

Databases

Statistical StarAI

17

Logical

Propositional Relational

Summary

Uncertainty in AI

Databases

Statistical

17

Logical

Propositional Relational

Summary

Uncertainty in AI

Databases

Statistical

Lifted Inference

• Main idea: exploit high level relational

representation to speed up reasoning

• Let's see an example…

Tutorial UAI 2014 18

A Simple Reasoning Problem

52 playing cards

Let us ask some simple questions

...

Tutorial UAI 2014 19

...

A Simple Reasoning Problem

?

Probability that Card1 is Q?

Tutorial UAI 2014 20

...

A Simple Reasoning Problem

?

Probability that Card1 is Q? 1/13

Tutorial UAI 2014 20

...

A Simple Reasoning Problem

?

Probability that Card1 is Hearts?

Tutorial UAI 2014 21

...

A Simple Reasoning Problem

?

Probability that Card1 is Hearts? 1/4

Tutorial UAI 2014 21

...

A Simple Reasoning Problem

?

Probability that Card1 is Hearts

given that Card1 is red?

Tutorial UAI 2014 22

...

A Simple Reasoning Problem

?

Probability that Card1 is Hearts

given that Card1 is red? 1/2

Tutorial UAI 2014 22

A Simple Reasoning Problem

...

?

Probability that Card52 is Spades

given that Card1 is QH?

Tutorial UAI 2014 23

A Simple Reasoning Problem

...

?

Probability that Card52 is Spades

given that Card1 is QH? 13/51

Tutorial UAI 2014 23

Automated Reasoning

Let us automate this:

1. Probabilistic graphical model (e.g., factor graph)

2. Probabilistic inference algorithm

(e.g., variable elimination or junction tree)

Tutorial UAI 2014 24

Reasoning in Propositional Models

A key result: Treewidth Why?

A

B C

D E

F

A

B C

D E

F

A

B C

D E

F

Tree Sparse Graph Dense Graph

25

Reasoning in Propositional Models

A key result: Treewidth Why?

A

B C

D E

F

A

B C

D E

F

A

B C

D E

F

Conditional Independence!

Tree Sparse Graph Dense Graph

25

Reasoning in Propositional Models

A key result: Treewidth Why?

A

B C

D E

F

A

B C

D E

F

A

B C

D E

F

P(A|C,E) = P(A|C) P(A|B,E,F) = P(A|B,E) P(A|B,E,F) ≠ P(A|B,E)

Conditional Independence!

Tree Sparse Graph Dense Graph

25

Is There Conditional Independence?

...

P(Card52 | Card1) ≟ P(Card52 | Card1, Card2)

26

Is There Conditional Independence?

...

?

P(Card52 | Card1) ≟ P(Card52 | Card1, Card2)

? ≟ ?

26

Is There Conditional Independence?

...

?

P(Card52 | Card1) ≟ P(Card52 | Card1, Card2)

13/51 ≟ ?

26

Is There Conditional Independence?

...

?

P(Card52 | Card1) ≟ P(Card52 | Card1, Card2)

13/51 ≟ ?

26

Is There Conditional Independence?

...

?

P(Card52 | Card1) ≟ P(Card52 | Card1, Card2)

13/51 ≠ 12/50

26

Is There Conditional Independence?

...

?

P(Card52 | Card1) ≟ P(Card52 | Card1, Card2)

13/51 ≠ 12/50

P(Card52 | Card1) ≠ P(Card52 | Card1, Card2)

26

Is There Conditional Independence?

...

?

P(Card52 | Card1) ≟ P(Card52 | Card1, Card2)

13/51 ≠ 12/50

P(Card52 | Card1, Card2) ≟ P(Card52 | Card1, Card2, Card3)

P(Card52 | Card1) ≠ P(Card52 | Card1, Card2)

26

Is There Conditional Independence?

...

?

P(Card52 | Card1) ≟ P(Card52 | Card1, Card2)

13/51 ≠ 12/50

P(Card52 | Card1, Card2) ≟ P(Card52 | Card1, Card2, Card3)

P(Card52 | Card1) ≠ P(Card52 | Card1, Card2)

12/50 ≠ 12/49

P(Card52 | Card1, Card2) ≠ P(Card52 | Card1, Card2, Card3)

26

Automated Reasoning

(artist's impression)

Let us automate this:

1. Probabilistic graphical model (e.g., factor graph) is fully connected!

2. Probabilistic inference algorithm (e.g., variable elimination or junction tree) builds a table with 1352 rows

27

...

What's Going On Here?

?

Probability that Card52 is Spades

given that Card1 is QH?

Tutorial UAI 2014 28

...

What's Going On Here?

?

Probability that Card52 is Spades

given that Card1 is QH? 13/51

Tutorial UAI 2014 28

What's Going On Here?

?

...

Probability that Card52 is Spades

given that Card2 is QH?

Tutorial UAI 2014 29

What's Going On Here?

?

...

Probability that Card52 is Spades

given that Card2 is QH? 13/51

Tutorial UAI 2014 29

What's Going On Here?

?

...

Probability that Card52 is Spades

given that Card3 is QH?

Tutorial UAI 2014 30

What's Going On Here?

?

...

Probability that Card52 is Spades

given that Card3 is QH? 13/51

Tutorial UAI 2014 30

...

Tractable Probabilistic Inference

Which property makes inference tractable?

Traditional belief: Independence (conditional/contextual)

What's going on here?

31 [Niepert‟14]

...

Tractable Probabilistic Inference

Which property makes inference tractable?

Traditional belief: Independence (conditional/contextual)

What's going on here?

⇒ Lifted Inference

High-level reasoning

Symmetry

Exchangeability

31 [Niepert‟14]

...

Tractable Probabilistic Inference

Which property makes inference tractable?

Traditional belief: Independence (conditional/contextual)

What's going on here?

⇒ Lifted Inference

High-level reasoning

Symmetry

Exchangeability See AAAI talk on Tuesday!

31 [Niepert‟14]

Automated Reasoning

Let us automate this:

Relational model

Lifted probabilistic inference algorithm

Tutorial UAI 2014 32

∀p, ∃c, Card(p,c)

∀c, ∃p, Card(p,c)

∀p, ∀c, ∀c‟, Card(p,c) ∧ Card(p,c‟) ⇒ c = c‟

Other Examples of Lifted Inference

implies

First-order resolution

Tutorial UAI 2014 33

∀x, Human(x) ⇒ Mortal(x)

∀x, Greek(x) ⇒ Human(x)

∀x, Greek(x) ⇒ Mortal(x)

Other Examples of Lifted Inference

First-order resolution

Reasoning about populations We are investigating a rare disease. The disease is more rare in

women, presenting only in one in every two billion women

and one in every billion men. Then, assuming there are 3.4

billion men and 3.6 billion women in the world, the probability

that more than five people have the disease is
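The numeric punchline of this example did not survive the transcript; a back-of-the-envelope version, under a Poisson approximation to the two binomials (an assumption for illustration, not from the slides), looks like this:

    import math
    # Expected number of affected people: 3.4B men at 1/1e9 plus 3.6B women at 1/2e9.
    lam = 3.4e9 * (1 / 1e9) + 3.6e9 * (1 / 2e9)   # = 3.4 + 1.8 = 5.2
    # P(more than five people have the disease) = 1 - P(X <= 5) for X ~ Poisson(lam)
    p_at_most_5 = sum(math.exp(-lam) * lam ** k / math.factorial(k) for k in range(6))
    print(1 - p_at_most_5)   # roughly 0.42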

Lifted Inference in SRL

Statistical relational model (e.g., MLN)

As a probabilistic graphical model:

26 pages; 728 variables; 676 factors

1000 pages; 1,002,000 variables;

1,000,000 factors

Highly intractable?

– Lifted inference in milliseconds!

Tutorial UAI 2014 35

3.14 FacultyPage(x) ∧ Linked(x,y) ⇒ CoursePage(y)

Summary of Motivation

• Relational data is everywhere:

– Databases in industry

– Databases in sciences

– Knowledge bases

• Lifted inference:

– Use relational structure during reasoning

– Very efficient where traditional methods break

This tutorial: Lifted Inference in Relational Models

Tutorial UAI 2014 36

Outline

• Part 1: Motivation

• Part 2: Probabilistic Databases

• Part 3: Weighted Model Counting

• Part 4: Lifted Inference for WFOMC

• Part 5: The Power of Lifted Inference

• Part 6: Conclusion/Open Problems

Tutorial UAI 2014 37

What Everyone Should Know about

Databases

• Database = several relations (a.k.a. tables)

• SQL Query = FO Formula

• Boolean Query = FO Sentence

Tutorial UAI 2014 38

What Everyone Should Know about

Databases

X Y

Alice 2009

Alice 2010

Bob 2009

Carol 2010

Smoker X Z

Alice Bob

Alice Carol

Bob Carol

Carol Bob

Friend

D =

Database: relations (= tables)

What Everyone Should Know about

Databases

X Y

Alice 2009

Alice 2010

Bob 2009

Carol 2010

Smoker X Z

Alice Bob

Alice Carol

Bob Carol

Carol Bob

Friend

D =

Z

Bob

Carol

Database: relations (= tables)

Query answer: Q(D) =

Q(z) = ∃x (Smoker(x,'2009')∧Friend(x,z)) Query: First Order Formula

Find friends of smokers in 2009

Conjunctive Queries CQ = FO(∃, ∧)

Union of Conjunctive Queries UCQ = FO(∃, ∧, ∨)

What Everyone Should Know about

Databases

X Y

Alice 2009

Alice 2010

Bob 2009

Carol 2010

Smoker X Z

Alice Bob

Alice Carol

Bob Carol

Carol Bob

Friend

Boolean Query: FO Sentence Q = ∃x (Smoker(x,'2009')∧Friend(x,'Bob'))

Query answer: Q(D) = TRUE

D =

Z

Bob

Carol

Database: relations (= tables)

Query answer: Q(D) =

Q(z) = ∃x (Smoker(x,'2009')∧Friend(x,z)) Query: First Order Formula

Find friends of smokers in 2009

Conjunctive Queries CQ = FO(∃, ∧)

Union of Conjunctive Queries UCQ = FO(∃, ∧, ∨)

What Everyone Should Know about

Databases

Declarative Query Query Plan

“what” “how”

What Everyone Should Know about

Databases

Declarative Query Query Plan

“what” “how”

Q(z) = ∃x (Smoker(x,'2009') ∧ Friend(x,z))

What Everyone Should Know about

Databases

Declarative Query Query Plan

“what” “how”

Q(z) = ∃x (Smoker(x,'2009') ∧ Friend(x,z))

Πz

⋈x

σy='2009'

Smoker(x,y) Friend(x,z)

Logical Query Plan

What Everyone Should Know about

Databases

Declarative Query Query Plan

“what” “how”

Q(z) = ∃x (Smoker(x,'2009') ∧ Friend(x,z))

Πz

⋈x

σy='2009'

Smoker(x,y) Friend(x,z)

Logical Query Plan

Πz

⋈x

σy='2009'

Smoker(x,y) Friend(x,z)

Logical Query Plan

Optimize

What Everyone Should Know about

Databases

Declarative Query Query Plan

“what” “how”

Q(z) = ∃x (Smoker(x,'2009') ∧ Friend(x,z))

Πz

⋈x

σy='2009'

Smoker(x,y) Friend(x,z)

Logical Query Plan

Index-scan

Merge-join

Hash-based

deduplication

Physical Query Plan

Πz

⋈x

σy='2009'

Smoker(x,y) Friend(x,z)

Logical Query Plan

Optimize

What Every Researcher Should

Know about Databases

Problem: compute Q(D)

Moshe Vardi [Vardi‟82]

2008 ACM SIGMOD Contribution Award

Tutorial UAI 2014 41 [Vardi‟82]

What Every Researcher Should

Know about Databases

Problem: compute Q(D)

• Data complexity:

fix Q, complexity = f(D)

Moshe Vardi [Vardi‟82]

2008 ACM SIGMOD Contribution Award

Tutorial UAI 2014 41 [Vardi‟82]

What Every Researcher Should

Know about Databases

Problem: compute Q(D)

• Data complexity:

fix Q, complexity = f(D)

Query complexity: (expression complexity)

fix D, complexity = f(Q)

• Combined complexity:

complexity = f(D,Q)

Moshe Vardi [Vardi‟82]

2008 ACM SIGMOD Contribution Award

Tutorial UAI 2014 41 [Vardi‟82]

Probabilistic Databases

• A probabilistic database = relational

database where each tuple has an

associated probability

• Semantics = probability distribution over

possible worlds (deterministic databases)

• In this talk: tuples are independent events

Tutorial UAI 2014 42

[Suciu‟11]

Example

x y P

A B p1

A C p2

B C p3

Probabilistic database D: Friend

Tutorial UAI 2014 43

x y

A B

A C

B C

Example

x y P

A B p1

A C p2

B C p3

Possible worlds semantics:

p1p2p3

Probabilistic database D: Friend

Tutorial UAI 2014 43

x y

A B

A C

B C

Example

x y P

A B p1

A C p2

B C p3

Possible worlds semantics:

p1p2p3

(1-p1)p2p3

Probabilistic database D: Friend

x y

A C

B C

Tutorial UAI 2014 43

x y

A B

A C

B C

Example

x y P

A B p1

A C p2

B C p3

Possible worlds semantics:

p1p2p3

(1-p1)p2p3

(1-p1)(1-p2)(1-p3)

Probabilistic database D: Friend

x y

A C

B C

x y

A B

A C

x y

A B

B C

x y

A B x y

A C x y

B C x y

Tutorial UAI 2014 43

Query Semantics

Fix a Boolean query Q

Fix a probabilistic database D:

P(Q | D) = marginal probability of Q

on possible worlds of D

Tutorial UAI 2014 44

x y P

A D q1 Y1

A E q2 Y2

B F q3 Y3

B G q4 Y4

B H q5 Y5

Friend

x P

A p1 X1

B p2 X2

C p3 X3

Smoker

P(Q | D) =

An Example

Q = ∃x∃y Smoker(x)∧Friend(x,y)

x y P

A D q1 Y1

A E q2 Y2

B F q3 Y3

B G q4 Y4

B H q5 Y5

Friend

x P

A p1 X1

B p2 X2

C p3 X3

Smoker

P(Q | D) = 1-(1-q1)*(1-q2)

An Example

Q = ∃x∃y Smoker(x)∧Friend(x,y)

x y P

A D q1 Y1

A E q2 Y2

B F q3 Y3

B G q4 Y4

B H q5 Y5

Friend

x P

A p1 X1

B p2 X2

C p3 X3

Smoker

P(Q | D) = 1-(1-q1)*(1-q2) p1*[ ]

An Example

Q = ∃x∃y Smoker(x)∧Friend(x,y)

x y P

A D q1 Y1

A E q2 Y2

B F q3 Y3

B G q4 Y4

B H q5 Y5

Friend

x P

A p1 X1

B p2 X2

C p3 X3

Smoker

P(Q | D) = 1-(1-q1)*(1-q2) p1*[ ]

1-(1-q3)*(1-q4)*(1-q5)

An Example

Q = ∃x∃y Smoker(x)∧Friend(x,y)

x y P

A D q1 Y1

A E q2 Y2

B F q3 Y3

B G q4 Y4

B H q5 Y5

Friend

x P

A p1 X1

B p2 X2

C p3 X3

Smoker

P(Q | D) = 1-(1-q1)*(1-q2) p1*[ ]

1-(1-q3)*(1-q4)*(1-q5) p2*[ ]

An Example

Q = ∃x∃y Smoker(x)∧Friend(x,y)

x y P

A D q1 Y1

A E q2 Y2

B F q3 Y3

B G q4 Y4

B H q5 Y5

Friend

x P

A p1 X1

B p2 X2

C p3 X3

Smoker

P(Q | D) = 1-(1-q1)*(1-q2) p1*[ ]

1-(1-q3)*(1-q4)*(1-q5) p2*[ ]

1- {1- } *

{1- }

An Example

Q = ∃x∃y Smoker(x)∧Friend(x,y)

x y P

A D q1 Y1

A E q2 Y2

B F q3 Y3

B G q4 Y4

B H q5 Y5

Friend

x P

A p1 X1

B p2 X2

C p3 X3

Smoker

P(Q | D) = 1-(1-q1)*(1-q2) p1*[ ]

1-(1-q3)*(1-q4)*(1-q5) p2*[ ]

1- {1- } *

{1- }

An Example

Q = ∃x∃y Smoker(x)∧Friend(x,y)

One can compute P(Q | D) in PTIME

in the size of the database D

An Example Q = ∃x∃y Smoker(x)∧Friend(x,y)

x y P

A D q1

A E q2

B F q3

B G q4

B H q5

x P

A p1

B p2

C p3

ΠΦ

Friend(x,y)

Πx

Smoker(x)

Use the SQL engine

to compute the query!

Aggregate on probabilities.

An Example Q = ∃x∃y Smoker(x)∧Friend(x,y)

x y P

A D q1

A E q2

B F q3

B G q4

B H q5

x P

A p1

B p2

C p3

ΠΦ

Friend(x,y)

Πx

x P

A 1-(1-q1)(1-q2)

B 1-(1-q3)(1-q4)(1-q5)

Smoker(x)

Use the SQL engine

to compute the query!

Aggregate on probabilities.

An Example Q = ∃x∃y Smoker(x)∧Friend(x,y)

x y P

A D q1

A E q2

B F q3

B G q4

B H q5

x P

A p1

B p2

C p3

ΠΦ

Friend(x,y)

Πx

x P

A 1-(1-q1)(1-q2)

B 1-(1-q3)(1-q4)(1-q5)

1-{1-p1[1-(1-q1)(1-q2)]}*

{1-p2[1-(1-q3)(1-q4)(1-q5)]}

Smoker(x)

Use the SQL engine

to compute the query!

Aggregate on probabilities.
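The same aggregation in plain Python, as a sketch: group Friend by x, then combine with Smoker exactly as in the plan above (the structure mirrors the slide's tables; the numeric probabilities are made up):

    import math
    # Extensional plan for Q = ∃x∃y Smoker(x) ∧ Friend(x,y).
    smoker = {'A': 0.7, 'B': 0.6, 'C': 0.5}                       # p1, p2, p3
    friend = {('A', 'D'): 0.1, ('A', 'E'): 0.2,                   # q1, q2
              ('B', 'F'): 0.3, ('B', 'G'): 0.4, ('B', 'H'): 0.5}  # q3, q4, q5
    p_none = 1.0
    for x, px in smoker.items():
        qs = [q for (a, _y), q in friend.items() if a == x]
        p_some_friend = 1 - math.prod(1 - q for q in qs) if qs else 0.0
        p_none *= 1 - px * p_some_friend    # independent groups, one per x
    print(1 - p_none)                       # = P(Q | D)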

Problem Statement

Given: probabilistic database D, query Q

Compute: P(Q | D)

Data complexity: fix Q, complexity = f(|D|)

Tutorial UAI 2014 47

Approaches to Compute P(Q | D)

• Propositional inference:

– Ground the query Q → FQ,D, compute P(FQ,D)

– This is Weighted Model Counting (later…)

– Works for every query Q

– But: may be exponential in |D| (data complexity)

• Lifted inference:

– Compute a query plan for Q, execute plan on D

– Always polynomial time in |D| (data complexity)

– But: does not work for all queries Q

Tutorial UAI 2014 48 [Olteanu‟08, Jha‟13, Dalvi‟04, Dalvi‟12]

The Lifted Inference Rules

• If Q1, Q2 are independent:

AND-rule: P(Q1 ∧ Q2) = P(Q1)P(Q2)

OR-rule: P(Q1 ∨ Q2) =1 – (1 – P(Q1))(1 – P(Q2))

The Lifted Inference Rules

• If Q1, Q2 are independent:

AND-rule: P(Q1 ∧ Q2) = P(Q1)P(Q2)

OR-rule: P(Q1 ∨ Q2) =1 – (1 – P(Q1))(1 – P(Q2))

• If Q[C1/x], Q[C2/x], … are independent

∀-Rule: P(∀z Q) = ΠC ∈Domain P(Q[C/z])

∃-Rule: P(∃z Q) = 1 – ΠC ∈Domain (1 – P(Q[C/z]))

The Lifted Inference Rules

• If Q1, Q2 are independent:

AND-rule: P(Q1 ∧ Q2) = P(Q1)P(Q2)

OR-rule: P(Q1 ∨ Q2) =1 – (1 – P(Q1))(1 – P(Q2))

• If Q[C1/x], Q[C2/x], … are independent

∀-Rule: P(∀z Q) = ΠC ∈Domain P(Q[C/z])

∃-Rule: P(∃z Q) = 1 – ΠC ∈Domain (1 – P(Q[C/z]))

• Inclusion/Exclusion formula:

P(Q1 ∨ Q2) = P(Q1) + P(Q2)- P(Q1 ∧ Q2)

P(Q1 ∧ Q2) = P(Q1) + P(Q2)- P(Q1 ∨ Q2)

The Lifted Inference Rules

• If Q1, Q2 are independent:

AND-rule: P(Q1 ∧ Q2) = P(Q1)P(Q2)

OR-rule: P(Q1 ∨ Q2) =1 – (1 – P(Q1))(1 – P(Q2))

• If Q[C1/x], Q[C2/x], … are independent

∀-Rule: P(∀z Q) = ΠC ∈Domain P(Q[C/z])

∃-Rule: P(∃z Q) = 1 – ΠC ∈Domain (1 – P(Q[C/z]))

• Inclusion/Exclusion formula:

P(Q1 ∨ Q2) = P(Q1) + P(Q2)- P(Q1 ∧ Q2)

P(Q1 ∧ Q2) = P(Q1) + P(Q2)- P(Q1 ∨ Q2)

• Negation: P(¬Q) = 1 – P(Q)
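Read operationally, the rules are plain arithmetic on marginal probabilities; a direct transcription in Python (the independence preconditions still have to be checked before the first four rules are applied):

    # The lifted inference rules as arithmetic on probabilities.
    def and_rule(p1, p2):                 # P(Q1 ∧ Q2), Q1 and Q2 independent
        return p1 * p2
    def or_rule(p1, p2):                  # P(Q1 ∨ Q2), Q1 and Q2 independent
        return 1 - (1 - p1) * (1 - p2)
    def forall_rule(ps):                  # ps = [P(Q[C/z]) for C in Domain]
        prod = 1.0
        for p in ps:
            prod *= p
        return prod
    def exists_rule(ps):
        prod = 1.0
        for p in ps:
            prod *= 1 - p
        return 1 - prod
    def inclusion_exclusion_or(p1, p2, p_and):
        return p1 + p2 - p_and
    def negation(p):
        return 1 - p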

Example

Q = ∀x∀y (Smoker(x) ∨ Friend(x,y)) =∀x (Smoker(x) ∨∀y Friend(x,y))

P(Q) = ΠA ∈ Domain P(Smoker(A) ∨ ∀y Friend(A,y))

∀-Rule

Check independence:

Smoker(Alice) ∨ ∀y Friend(Alice,y)

Smoker(Bob) ∨ ∀y Friend(Bob,y)

Example

Q = ∀x∀y (Smoker(x) ∨ Friend(x,y)) =∀x (Smoker(x) ∨∀y Friend(x,y))

P(Q) = ΠA ∈ Domain P(Smoker(A) ∨ ∀y Friend(A,y))

∀-Rule

P(Q) = ΠA ∈ Domain [1 – (1 – P(Smoker(A))) × (1 – P(∀y Friend(A,y)))] ∨-Rule

Check independence:

Smoker(Alice) ∨ ∀y Friend(Alice,y)

Smoker(Bob) ∨ ∀y Friend(Bob,y)

Example

Q = ∀x∀y (Smoker(x) ∨ Friend(x,y)) =∀x (Smoker(x) ∨∀y Friend(x,y))

P(Q) = ΠA ∈ Domain P(Smoker(A) ∨ ∀y Friend(A,y))

∀-Rule

P(Q) = ΠA ∈ Domain [1 – (1 – P(Smoker(A))) × (1 – P(∀y Friend(A,y)))] ∨-Rule

P(Q) = ΠA ∈ Domain [1 – (1 – P(Smoker(A))) × (1 – ΠB ∈ Domain P(Friend(A,B)))]

∀-Rule

Check independence:

Smoker(Alice) ∨ ∀y Friend(Alice,y)

Smoker(Bob) ∨ ∀y Friend(Bob,y)

Example

Q = ∀x∀y (Smoker(x) ∨ Friend(x,y)) =∀x (Smoker(x) ∨∀y Friend(x,y))

Lookup the probabilities

in the database

P(Q) = ΠA ∈ Domain P(Smoker(A) ∨ ∀y Friend(A,y))

∀-Rule

P(Q) = ΠA ∈ Domain [1 – (1 – P(Smoker(A))) × (1 – P(∀y Friend(A,y)))] ∨-Rule

P(Q) = ΠA ∈ Domain [1 – (1 – P(Smoker(A))) × (1 – ΠB ∈ Domain P(Friend(A,B)))]

∀-Rule

Check independence:

Smoker(Alice) ∨ ∀y Friend(Alice,y)

Smoker(Bob) ∨ ∀y Friend(Bob,y)

Example

Q = ∀x∀y (Smoker(x) ∨ Friend(x,y)) =∀x (Smoker(x) ∨∀y Friend(x,y))

Lookup the probabilities

in the database. Runtime = O(n²).

P(Q) = ΠA ∈ Domain P(Smoker(A) ∨ ∀y Friend(A,y))

∀-Rule

P(Q) = ΠA ∈ Domain [1 – (1 – P(Smoker(A))) × (1 – P(∀y Friend(A,y)))] ∨-Rule

P(Q) = ΠA ∈ Domain [1 – (1 – P(Smoker(A))) × (1 – ΠB ∈ Domain P(Friend(A,B)))]

∀-Rule

Check independence:

Smoker(Alice) ∨ ∀y Friend(Alice,y)

Smoker(Bob) ∨ ∀y Friend(Bob,y)
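For a concrete domain the three rule applications above reduce to two nested loops over the database; a sketch with a hypothetical domain and hypothetical probabilities:

    # Lifted evaluation of Q = ∀x∀y (Smoker(x) ∨ Friend(x,y)).
    domain = ['Alice', 'Bob', 'Carol']
    p_smoker = {a: 0.3 for a in domain}
    p_friend = {(a, b): 0.8 for a in domain for b in domain}
    p_q = 1.0
    for a in domain:                                  # ∀-rule on x
        p_all_friends = 1.0
        for b in domain:                              # ∀-rule on y
            p_all_friends *= p_friend[(a, b)]
        # ∨-rule on Smoker(A) ∨ ∀y Friend(A,y)
        p_q *= 1 - (1 - p_smoker[a]) * (1 - p_all_friends)
    print(p_q)   # n^2 probability lookups in total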

Discussion: CNF vs. DNF

Databases | KR/AI

Conjunctive Queries CQ = FO(∃, ∧) | Positive Clause = FO(∀, ∨)

Union of Conjunctive Queries UCQ = FO(∃, ∧, ∨) = ∃ Positive-DNF | Positive FO = FO(∀, ∧, ∨) = ∀ Positive-CNF

UCQ with "safe negation" UCQ¬ = ∃ DNF | First-Order CNF = ∀ CNF

Q = ∃x,∃y, Smoker(x)∧Friend(x,y) Q = ∀x∀y (Smoker(x) ∨ Friend(x,y))

By duality we can reduce one problem to the other:

∃x,∃y, Smoker(x) ∧ Friend(x,y) = ¬ ∀x,∀y, (¬Smoker(x) ∨ ¬Friend(x,y))

Discussion

H0= ∀x∀y (Smoker(x)∨Friend(x,y)∨Jogger(y))

No rule applies here!

The ∀-rule does not apply, because H0[Alice/x] and H0[Bob/x] are dependent:

H0[Alice/x] = ∀y (Smoker(Alice) ∨ Friend(Alice,y) ∨ Jogger(y))

H0[Bob/x] = ∀y (Smoker(Bob) ∨ Friend(Bob,y) ∨ Jogger(y)) Dependent

Lifted Inference Sometimes Fails

Discussion

H0= ∀x∀y (Smoker(x)∨Friend(x,y)∨Jogger(y))

No rule applies here!

The ∀-rule does not apply, because H0[Alice/x] and H0[Bob/x] are dependent:

H0[Alice/x] = ∀y (Smoker(Alice) ∨ Friend(Alice,y) ∨ Jogger(y))

H0[Bob/x] = ∀y (Smoker(Bob) ∨ Friend(Bob,y) ∨ Jogger(y)) Dependent

Theorem. [Dalvi‟04] Computing P(H0 | D) is #P-hard in |D|

Proof: later…

Lifted Inference Sometimes Fails

Discussion

H0= ∀x∀y (Smoker(x)∨Friend(x,y)∨Jogger(y))

No rule applies here!

The ∀-rule does not apply, because H0[Alice/x] and H0[Bob/x] are dependent:

H0[Alice/x] = ∀y (Smoker(Alice) ∨ Friend(Alice,y) ∨ Jogger(y))

H0[Bob/x] = ∀y (Smoker(Bob) ∨ Friend(Bob,y) ∨ Jogger(y)) Dependent

Theorem. [Dalvi‟04] Computing P(H0 | D) is #P-hard in |D|

Proof: later…

Consequence: assuming PTIME ≠ #P, H0 is not liftable!

Lifted Inference Sometimes Fails

Summary

• Database D = relations

• Query Q = FO

• Query plans, query optimization

• Data complexity: fix Q, complexity f(D)

• Probabilistic DBs = independent tuples

• Lifted inference: simple, but fails sometimes

Next: Weighted Model Counting = Unified framework for inference

Later: Are rules complete? Yes! (sort of): Power of Lifted Inference

Outline

• Part 1: Motivation

• Part 2: Probabilistic Databases

• Part 3: Weighted Model Counting

• Part 4: Lifted Inference for WFOMC

• Part 5: The Power of Lifted Inference

• Part 6: Conclusion/Open Problems

54

Weighted Model Counting

• Model = solution to a propositional logic formula Δ

• Model counting = #SAT

Rain Cloudy Model?

T T Yes

T F No

F T Yes

F F Yes

#SAT = 3

+

Δ = (Rain ⇒ Cloudy)

Weighted Model Counting

• Model = solution to a propositional logic formula Δ

• Model counting = #SAT

Rain Cloudy Model?

T T Yes

T F No

F T Yes

F F Yes

#SAT = 3

Weight

1 * 3 = 3

0

2 * 3 = 6

2 * 5 = 10

• Weighted model counting (WMC) – Weights for assignments to variables

– Model weight is product of variable weights w(.)

+

Δ = (Rain ⇒ Cloudy)

Rain

w(R) w(¬R)

1 2

Cloudy

w(C) w(¬C)

3 5

Weighted Model Counting

• Model = solution to a propositional logic formula Δ

• Model counting = #SAT

Rain Cloudy Model?

T T Yes

T F No

F T Yes

F F Yes

#SAT = 3

Weight

1 * 3 = 3

0

2 * 3 = 6

2 * 5 = 10

WMC = 19

• Weighted model counting (WMC) – Weights for assignments to variables

– Model weight is product of variable weights w(.)

+ +

Δ = (Rain ⇒ Cloudy)

Rain

w(R) w(¬R)

1 2

Cloudy

w(C) w(¬C)

3 5
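A brute-force check of this table, using the same weights:

    from itertools import product
    # WMC for Δ = (Rain ⇒ Cloudy) with the weights from the slide.
    w = {('Rain', True): 1, ('Rain', False): 2,
         ('Cloudy', True): 3, ('Cloudy', False): 5}
    wmc = 0
    for rain, cloudy in product([True, False], repeat=2):
        if (not rain) or cloudy:                # model of Rain ⇒ Cloudy
            wmc += w[('Rain', rain)] * w[('Cloudy', cloudy)]
    print(wmc)   # 19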

Weighted Model Counting @ UAI

• Assembly language for non-lifted inference

• Reductions to WMC for inference in – Bayesian networks [Chavira‟05, Sang‟05 , Chavira‟08]

– Factor graphs [Choi‟13]

– Relational Bayesian networks [Chavira‟06]

– Probabilistic logic programs [Fierens‟11, Fierens‟13]

– Probabilistic databases [Olteanu‟08, Jha‟13]

• State-of-the-art solvers

– Knowledge compilation (WMC → d-DNNF → AC)

Winner of the UAI’08 exact inference competition!

– DPLL search

Weighted First-Order Model Counting Model = solution to first-order logic formula Δ

Δ = ∀d (Rain(d)

⇒ Cloudy(d))

[V.d.Broeck‟11a, Gogate‟11]

Days = {Monday}

Weighted First-Order Model Counting Model = solution to first-order logic formula Δ

Rain(M) Cloudy(M) Model?

T T Yes

T F No

F T Yes

F F Yes

#SAT = 3

+

Δ = ∀d (Rain(d)

⇒ Cloudy(d))

[V.d.Broeck‟11a, Gogate‟11]

Days = {Monday}

Weighted First-Order Model Counting Model = solution to first-order logic formula Δ

Rain(M) Cloudy(M) Rain(T) Cloudy(T) Model?

T T T T Yes

T F T T No

F T T T Yes

F F T T Yes

T T T F No

T F T F No

F T T F No

F F T F No

T T F T Yes

T F F T No

F T F T Yes

F F F T Yes

T T F F Yes

T F F F No

F T F F Yes

F F F F Yes

Δ = ∀d (Rain(d)

⇒ Cloudy(d))

Days = {Monday

Tuesday}

[V.d.Broeck‟11a, Gogate‟11]

Weighted First-Order Model Counting Model = solution to first-order logic formula Δ

Rain(M) Cloudy(M) Rain(T) Cloudy(T) Model?

T T T T Yes

T F T T No

F T T T Yes

F F T T Yes

T T T F No

T F T F No

F T T F No

F F T F No

T T F T Yes

T F F T No

F T F T Yes

F F F T Yes

T T F F Yes

T F F F No

F T F F Yes

F F F F Yes

#SAT = 9

+

Δ = ∀d (Rain(d)

⇒ Cloudy(d))

Days = {Monday

Tuesday}

[V.d.Broeck‟11a, Gogate‟11]

Weighted First-Order Model Counting Model = solution to first-order logic formula Δ

Rain(M) Cloudy(M) Rain(T) Cloudy(T) Model?

T T T T Yes

T F T T No

F T T T Yes

F F T T Yes

T T T F No

T F T F No

F T T F No

F F T F No

T T F T Yes

T F F T No

F T F T Yes

F F F T Yes

T T F F Yes

T F F F No

F T F F Yes

F F F F Yes

#SAT = 9

+

Δ = ∀d (Rain(d)

⇒ Cloudy(d))

Rain

d w(R(d)) w(¬R(d))

M 1 2

T 4 1

Cloudy

d w(C(d)) w(¬C(d))

M 3 5

T 6 2

Days = {Monday

Tuesday}

[V.d.Broeck‟11a, Gogate‟11]

Weighted First-Order Model Counting Model = solution to first-order logic formula Δ

Weight

1 * 3 * 4 * 6 = 72

0

2 * 3 * 4 * 6 = 144

2 * 5 * 4 * 6 = 240

0

0

0

0

1 * 3 * 1 * 6 = 18

0

2 * 3 * 1 * 6 = 36

2 * 5 * 1 * 6 = 60

1 * 3 * 1 * 2 = 6

0

2 * 3 * 1 * 2 = 12

2 * 5 * 1 * 2 = 20

Rain(M) Cloudy(M) Rain(T) Cloudy(T) Model?

T T T T Yes

T F T T No

F T T T Yes

F F T T Yes

T T T F No

T F T F No

F T T F No

F F T F No

T T F T Yes

T F F T No

F T F T Yes

F F F T Yes

T T F F Yes

T F F F No

F T F F Yes

F F F F Yes

#SAT = 9

+

Δ = ∀d (Rain(d)

⇒ Cloudy(d))

Rain

d w(R(d)) w(¬R(d))

M 1 2

T 4 1

Cloudy

d w(C(d)) w(¬C(d))

M 3 5

T 6 2

Days = {Monday

Tuesday}

[V.d.Broeck‟11a, Gogate‟11]

Weighted First-Order Model Counting Model = solution to first-order logic formula Δ

Weight

1 * 3 * 4 * 6 = 72

0

2 * 3 * 4 * 6 = 144

2 * 5 * 4 * 6 = 240

0

0

0

0

1 * 3 * 1 * 6 = 18

0

2 * 3 * 1 * 6 = 36

2 * 5 * 1 * 6 = 60

1 * 3 * 1 * 2 = 6

0

2 * 3 * 1 * 2 = 12

2 * 5 * 1 * 2 = 20

WFOMC = 608

+

Rain(M) Cloudy(M) Rain(T) Cloudy(T) Model?

T T T T Yes

T F T T No

F T T T Yes

F F T T Yes

T T T F No

T F T F No

F T T F No

F F T F No

T T F T Yes

T F F T No

F T F T Yes

F F F T Yes

T T F F Yes

T F F F No

F T F F Yes

F F F F Yes

#SAT = 9

+

Δ = ∀d (Rain(d)

⇒ Cloudy(d))

Rain

d w(R(d)) w(¬R(d))

M 1 2

T 4 1

Cloudy

d w(C(d)) w(¬C(d))

M 3 5

T 6 2

Days = {Monday

Tuesday}

[V.d.Broeck‟11a, Gogate‟11]
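The same brute force over the ground atoms reproduces the table above and shows why the count factorizes per day:

    from itertools import product
    # WFOMC for Δ = ∀d (Rain(d) ⇒ Cloudy(d)), Days = {Monday, Tuesday},
    # with the per-atom weights from the slide.
    days = ['M', 'T']
    w_rain = {'M': (1, 2), 'T': (4, 1)}     # (w(Rain(d)), w(¬Rain(d)))
    w_cloudy = {'M': (3, 5), 'T': (6, 2)}   # (w(Cloudy(d)), w(¬Cloudy(d)))
    wfomc = 0
    for vals in product([True, False], repeat=2 * len(days)):
        rain = dict(zip(days, vals[:len(days)]))
        cloudy = dict(zip(days, vals[len(days):]))
        if all((not rain[d]) or cloudy[d] for d in days):    # model of Δ
            weight = 1
            for d in days:
                weight *= w_rain[d][0] if rain[d] else w_rain[d][1]
                weight *= w_cloudy[d][0] if cloudy[d] else w_cloudy[d][1]
            wfomc += weight
    print(wfomc)   # 608 = 19 (Monday's WMC) * 32 (Tuesday's WMC)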

Weighted First-Order

Model Counting @ UAI

• Assembly language for lifted inference

• Reduction to WFOMC for lifted inference in

– Markov logic networks [V.d.Broeck‟11a,Gogate‟11]

– Parfactor graphs [V.d.Broeck‟13a]

– Probabilistic logic programs [V.d.Broeck‟14]

– Probabilistic databases [Gribkoff‟14]

x y

A B

A C

B C

From Probabilities to Weights

Friend

x y P

A B p1

A C p2

B C p3

p1p2p3

(1-p1)p2p3

(1-p1)(1-p2)(1-p3)

x y

A C

B C

x y

A B

A C

x y

A B

B C

x y

A B x y

A C x y

B C x y

x y

A B

A C

B C

From Probabilities to Weights

Friend

x y P

A B p1

A C p2

B C p3

p1p2p3

(1-p1)p2p3

(1-p1)(1-p2)(1-p3)

x y

A C

B C

x y

A B

A C

x y

A B

B C

x y

A B x y

A C x y

B C x y

x y

A B

A C

B C

From Probabilities to Weights

Friend

x y P

A B p1

A C p2

B C p3

p1p2p3

(1-p1)p2p3

(1-p1)(1-p2)(1-p3)

x y

A C

B C

x y

A B

A C

x y

A B

B C

x y

A B x y

A C x y

B C x y

x y w(Friend(x,y)) w(¬Friend(x,y))

A B w1 = p1 w̄1 = 1-p1

A C w2 = p2 w̄2 = 1-p2

B C w3 = p3 w̄3 = 1-p3

A A w4 = 0 w̄4 = 1

B A w5 = 0 w̄5 = 1

… …

Also for

missing

tuples!

w1w2w3

w̄1w2w3

w̄1w̄2w̄3

Discussion

• Simple idea: replace p, 1-p by w, w̄

• Query computation becomes WFOMC

• To obtain a probability space, divide the weight of each world by Z = sum of weights of all worlds: Z = (w1+w̄1)(w2+w̄2)(w3+w̄3) …

• Why weights instead of probabilities? They can describe complex correlations (next)

Markov Logic

Capture knowledge through constraints (a.k.a. “features”):

Soft constraint,

weight = exp(3.75)

Hard constraint

∞ Smoker(x) ⇒ Person(x)

3.75 Smoker(x)∧Friend(x,y) ⇒ Smoker(y)

[Richardson‟06]

Markov Logic

Capture knowledge through constraints (a.k.a. “features”):

Soft constraint,

weight = exp(3.75)

Hard constraint

An MLN is a set of constraints (w, Γ(x)), where w=weight, Γ(x)=FO formula

∞ Smoker(x) ⇒ Person(x)

3.75 Smoker(x)∧Friend(x,y) ⇒ Smoker(y)

[Richardson‟06]

Markov Logic

Capture knowledge through constraints (a.k.a. “features”):

Soft constraint,

weight = exp(3.75)

Hard constraint

An MLN is a set of constraints (w, Γ(x)), where w=weight, Γ(x)=FO formula

Weight of a world = product of exp(w), for all MLN rules (w, Γ(x))

and grounding Γ(a) that hold in that world

∞ Smoker(x) ⇒ Person(x)

3.75 Smoker(x)∧Friend(x,y) ⇒ Smoker(y)

[Richardson‟06]

Markov Logic

Capture knowledge through constraints (a.k.a. “features”):

Soft constraint,

weight = exp(3.75)

Hard constraint

An MLN is a set of constraints (w, Γ(x)), where w=weight, Γ(x)=FO formula

Weight of a world = product of exp(w), for all MLN rules (w, Γ(x))

and grounding Γ(a) that hold in that world

∞ Smoker(x) ⇒ Person(x)

3.75 Smoker(x)∧Friend(x,y) ⇒ Smoker(y)

Probability of a world = Weight / Z

Z = sum of weights of all worlds (no longer a simple expression!)

[Richardson‟06]
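A brute-force sketch of these definitions on a two-person domain with only the soft constraint (the domain size is an illustrative assumption):

    import math
    from itertools import product
    # World weights and Z for a tiny MLN over People = {A, B} with the single
    # soft constraint  3.75  Smokes(x) ∧ Friends(x,y) ⇒ Smokes(y).
    people = ['A', 'B']
    w_rule = 3.75
    def world_weight(smokes, friends):
        # product of exp(w) over the groundings of the rule that hold in this world
        weight = 1.0
        for x, y in product(people, repeat=2):
            if (not (smokes[x] and friends[(x, y)])) or smokes[y]:
                weight *= math.exp(w_rule)
        return weight
    Z = 0.0
    for s_vals in product([True, False], repeat=len(people)):
        smokes = dict(zip(people, s_vals))
        for f_vals in product([True, False], repeat=len(people) ** 2):
            friends = dict(zip(product(people, repeat=2), f_vals))
            Z += world_weight(smokes, friends)
    print(Z)   # probability of a world = world_weight(world) / Z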

Problem Statement

0.7 Actor(a) ⇒ ¬Director(a)

1.2 Director(a) ⇒ ¬WorkedFor(a,b)

1.4 InMovie(m,a) ∧ WorkedFor(a,b) ⇒ InMovie(m,b)

Actor:

Name w

Brando 2.9

Cruise 3.8

Coppola 1.1

WorkedFor:

Actor Director w

Brando Coppola 2.5

Coppola Brando 0.2

Cruise Coppola 1.7

MLN:

Database tables (if missing, then w = 1)

P(InMovie(GodFather, Brando)) = ??

Given:

Compute:

Discussion

• Probabilistic databases = independence MLN = complex correlations

• To translate weights to probabilities we need to divide by Z, which often is difficult to compute

• However, we can reduce the Z-computation problem to WFOMC (next)

Z WFOMC(Δ)

1. Formula Δ

2. Weight function w(.)

Z WFOMC(Δ)

1. Formula Δ

Δ = ∧ (∞,Γ(x))∈MLN (∀x Γ(x)) If all MLN constraints are hard:

2. Weight function w(.)

Z WFOMC(Δ)

1. Formula Δ

Δ = ∧ (∞,Γ(x))∈MLN (∀x Γ(x)) If all MLN constraints are hard:

If (wi, Γi (x)) is a soft MLN constraint, then:

a) Remove (wi, Γi (x)) from the MLN

b) Add new probabilistic relation Fi(x)

c) Add hard constraint (∞, ∀x (Fi(x) ⬄ Γi (x)))

2. Weight function w(.)

Z WFOMC(Δ)

1. Formula Δ

Δ = ∧ (∞,Γ(x))∈MLN (∀x Γ(x)) If all MLN constraints are hard:

If (wi, Γi (x)) is a soft MLN constraint, then:

a) Remove (wi, Γi (x)) from the MLN

b) Add new probabilistic relation Fi(x)

c) Add hard constraint (∞, ∀x (Fi(x) ⬄ Γi (x)))

Better rewritings in

[Jha‟12],[V.d.Broeck‟14]

2. Weight function w(.)

For all constants A, relations Fi,

set w(Fi(A)) = exp(wi), w(¬Fi(A)) = 1

Z WFOMC(Δ)

1. Formula Δ

Δ = ∧ (∞,Γ(x))∈MLN (∀x Γ(x)) If all MLN constraints are hard:

If (wi, Γi (x)) is a soft MLN constraint, then:

a) Remove (wi, Γi (x)) from the MLN

b) Add new probabilistic relation Fi(x)

c) Add hard constraint (∞, ∀x (Fi(x) ⬄ Γi (x)))

Better rewritings in

[Jha‟12],[V.d.Broeck‟14]

2. Weight function w(.)

For all constants A, relations Fi,

set w(Fi(A)) = exp(wi), w(¬Fi(A)) = 1

Theorem: Z = WFOMC(Δ)

Example

1. Formula Δ

2. Weight function w(.)

Example

1. Formula Δ ∞ Smoker(x) ⇒ Person(x)

2. Weight function w(.)

Example

1. Formula Δ ∞ Smoker(x) ⇒ Person(x)

2. Weight function w(.)

Δ = ∀x (Smoker(x) ⇒ Person(x))

Example

1. Formula Δ ∞ Smoker(x) ⇒ Person(x)

3.75 Smoker(x) ∧ Friend(x,y) ⇒ Smoker(y)

2. Weight function w(.)

Δ = ∀x (Smoker(x) ⇒ Person(x))

Example

1. Formula Δ ∞ Smoker(x) ⇒ Person(x)

3.75 Smoker(x) ∧ Friend(x,y) ⇒ Smoker(y)

2. Weight function w(.)

Δ = ∀x (Smoker(x) ⇒ Person(x)) Δ = ∀x (Smoker(x) ⇒ Person(x))

∧ ∀x∀y (F(x,y) ⬄ [Smoker(x) ∧ Friend(x,y) ⇒ Smoker(y)])

Example

1. Formula Δ ∞ Smoker(x) ⇒ Person(x)

3.75 Smoker(x) ∧ Friend(x,y) ⇒ Smoker(y)

2. Weight function w(.)

F

x y w(F(x,y)) w(¬F(x,y))

A A exp(3.75) 1

A B exp(3.75) 1

A C exp(3.75) 1

B A exp(3.75) 1

. . . . . .

Note: if no tables given

for Smoker, Person, etc,

(i.e. no evidence)

then set their w = w̄ = 1

Δ = ∀x (Smoker(x) ⇒ Person(x)) Δ = ∀x (Smoker(x) ⇒ Person(x))

∧ ∀x∀y (F(x,y) ⬄ [Smoker(x) ∧ Friend(x,y) ⇒ Smoker(y)])

Example

1. Formula Δ ∞ Smoker(x) ⇒ Person(x)

3.75 Smoker(x) ∧ Friend(x,y) ⇒ Smoker(y)

2. Weight function w(.)

F

x y w(F(x,y)) w(¬F(x,y))

A A exp(3.75) 1

A B exp(3.75) 1

A C exp(3.75) 1

B A exp(3.75) 1

. . . . . .

Note: if no tables given

for Smoker, Person, etc,

(i.e. no evidence)

then set their w = w̄ = 1

Z = WFOMC(Δ)

Δ = ∀x (Smoker(x) ⇒ Person(x)) Δ = ∀x (Smoker(x) ⇒ Person(x))

∧ ∀x∀y (F(x,y) ⬄ [Smoker(x) ∧ Friend(x,y) ⇒ Smoker(y)])

Lessons

• Weighted Model Counting: – Unified framework for probabilistic inference tasks

– Independent variables

• Weighted FO Model Counting: – Formula described by a concise FO sentence

– Still independent variables

• MLN: – Formulas plus weights

– Correlations!

– Can be converted to WFOMC

Symmetric vs. Asymmetric

Symmetric WFOMC:

• In every relation R, all tuples have same weight

• Example: converting MLN “without evidence” into WFOMC leads to a symmetric weight function

Asymmetric WFOMC:

• Each relation R is given explicitly

• Example: Probabilistic Databases

• Example: MLN‟s plus evidence

[Gribkoff‟14]

Terminology

MLNs Prob. DBs

Random variable is a Ground atom DB Tuple

Weights w associated with Formulas DB Tuples

Typical query Q is a Single atom FO formula/SQL

Data is encoded into Evidence (Query) Distribution

Correlations induced by Model formulas Query

Model generalizes across domains? Yes No

Query generalizes across domains? No Yes

Sum of weights of worlds is 1 (normalized)? No Yes

Outline

• Part 1: Motivation

• Part 2: Probabilistic Databases

• Part 3: Weighted Model Counting

• Part 4: Lifted Inference for WFOMC

• Part 5: The Power of Lifted Inference

• Part 6: Conclusion/Open Problems

70

Informal: Exploit symmetries, Reason at first-order level, Reason about groups of objects,

Scalable inference, High-level probabilistic reasoning, etc.

A formal definition: Domain-lifted inference

Defining Lifted Inference

Polynomial in #people, #webpages, #cards

Not polynomial in #predicates, #formulas, #logical variables

Related to data complexity in databases

Inference runs in time polynomial

in the number of objects in the domain.

[V.d.Broeck‟11b, Jaeger‟12]

Informal: Exploit symmetries, Reason at first-order level, Reason about groups of objects,

Scalable inference, High-level probabilistic reasoning, etc.

A formal definition: Domain-lifted inference

Defining Lifted Inference

[V.d.Broeck‟11b, Jaeger‟12]

Informal: Exploit symmetries, Reason at first-order level, Reason about groups of objects,

Scalable inference, High-level probabilistic reasoning, etc.

A formal definition: Domain-lifted inference

Defining Lifted Inference

[V.d.Broeck‟11b, Jaeger‟12]

Lifted inference = ∃Query Plan = ∃FO Compilation

Alternative in this tutorial:

Rules for Asymmetric WFOMC

• If Δ1, Δ 2 are independent:

AND-rule: WMC(Δ1 ∧ Δ2) = WMC(Δ 1) * WMC(Δ 2) OR-rule: WMC(Δ1 ∨ Δ2) = Z - (Z1-WMC(Δ1)) * (Z2-WMC(Δ2))

Rules for Asymmetric WFOMC

• If Δ1, Δ 2 are independent:

AND-rule: WMC(Δ1 ∧ Δ2) = WMC(Δ 1) * WMC(Δ 2) OR-rule: WMC(Δ1 ∨ Δ2) = Z - (Z1-WMC(Δ1)) * (Z2-WMC(Δ2))

Normalization constants

(easy to compute)

Rules for Asymmetric WFOMC

• If Δ1, Δ 2 are independent:

AND-rule: WMC(Δ1 ∧ Δ2) = WMC(Δ 1) * WMC(Δ 2) OR-rule: WMC(Δ1 ∨ Δ2) = Z - (Z1-WMC(Δ1)) * (Z2-WMC(Δ2))

• If Δ[c1/x], Δ[c2/x], … are independent

∀-Rule: WMC(∀z Δ) = Πc ∈Domain WMC(Δ[c/z]) ∃-Rule: WMC(∃z Δ) = Z – Πc ∈Domain (Zc – WMC(Δ[c/z]))

Normalization constants

(easy to compute)

Rules for Asymmetric WFOMC

• If Δ1, Δ 2 are independent:

AND-rule: WMC(Δ1 ∧ Δ2) = WMC(Δ 1) * WMC(Δ 2) OR-rule: WMC(Δ1 ∨ Δ2) = Z - (Z1-WMC(Δ1)) * (Z2-WMC(Δ2))

• If Δ[c1/x], Δ[c2/x], … are independent

∀-Rule: WMC(∀z Δ) = Πc ∈Domain WMC(Δ[c/z]) ∃-Rule: WMC(∃z Δ) = Z – Πc ∈Domain (Zc – WMC(Δ[c/z]))

• Inclusion/Exclusion formula: WMC(Δ1 ∨ Δ2) = WMC(Δ1) + WMC(Δ2) - WMC(Δ1 ∧ Δ2)

WMC(Δ1 ∧ Δ2) = WMC(Δ1) + WMC(Δ2) - WMC(Δ1 ∨ Δ2)

Normalization constants

(easy to compute)

Rules for Asymmetric WFOMC

• If Δ1, Δ 2 are independent:

AND-rule: WMC(Δ1 ∧ Δ2) = WMC(Δ 1) * WMC(Δ 2) OR-rule: WMC(Δ1 ∨ Δ2) = Z - (Z1-WMC(Δ1)) * (Z2-WMC(Δ2))

• If Δ[c1/x], Δ[c2/x], … are independent

∀-Rule: WMC(∀z Δ) = Πc ∈Domain WMC(Δ[c/z]) ∃-Rule: WMC(∃z Δ) = Z – Πc ∈Domain (Zc – WMC(Δ[c/z]))

• Inclusion/Exclusion formula: WMC(Δ1 ∨ Δ2) = WMC(Δ1) + WMC(Δ2) - WMC(Δ1 ∧ Δ2)

WMC(Δ1 ∧ Δ2) = WMC(Δ1) + WMC(Δ2) - WMC(Δ1 ∨ Δ2)

• Negation: WMC(¬Δ) = Z - WMC(Δ)

Normalization constants

(easy to compute)

Symmetric WFOMC Rules

• Simplifications:

If Δ[c1/x], Δ[c2/x], … are independent

∀-Rule: WMC(∀z Δ) = WMC(Δ[c1/z])^|Domain|

∃-Rule: WMC(∃z Δ) = Z – (Zc1 – WMC(Δ[c1/z]))^|Domain|

• A powerful new inference rule: atom counting

Only possible with symmetric weights

Intuition: Remove unary relations

[V.d.Broeck‟11a]

Symmetric WFOMC Rules

• Simplifications:

If Δ[c1/x], Δ[c2/x], … are independent

∀-Rule: WMC(∀z Δ) = WMC(Δ[c1/z])^|Domain|

∃-Rule: WMC(∃z Δ) = Z – (Zc1 – WMC(Δ[c1/z]))^|Domain|

The workhorse of

Symmetric WFOMC

• A powerful new inference rule: atom counting

Only possible with symmetric weights

Intuition: Remove unary relations

[V.d.Broeck‟11a]

Symmetric WFOMC Rules: Example

• FO-Model Counting: w(R) = w(¬R) = 1

• Apply inference rules backwards (step 4-3-2-1)

Symmetric WFOMC Rules: Example

4.

• FO-Model Counting: w(R) = w(¬R) = 1

• Apply inference rules backwards (step 4-3-2-1)

Δ = (Stress(Alice) ⇒ Smokes(Alice)) Domain = {Alice}

Symmetric WFOMC Rules: Example

4.

• FO-Model Counting: w(R) = w(¬R) = 1

• Apply inference rules backwards (step 4-3-2-1)

WMC(¬Stress(Alice) ∨ Smokes(Alice))) =

= Z – WMC(Stress(Alice)) × WMC(¬Smokes(Alice))

= 4 – 1 × 1 = 3 models

Δ = (Stress(Alice) ⇒ Smokes(Alice)) Domain = {Alice}

OR-rule

Symmetric WFOMC Rules: Example

4.

• FO-Model Counting: w(R) = w(¬R) = 1

• Apply inference rules backwards (step 4-3-2-1)

WMC(¬Stress(Alice) ∨ Smokes(Alice))) =

= Z – WMC(Stress(Alice)) × WMC(¬Smokes(Alice))

= 4 – 1 × 1 = 3 models

Δ = (Stress(Alice) ⇒ Smokes(Alice)) Domain = {Alice}

3. Δ = ∀x, (Stress(x) ⇒ Smokes(x)) Domain = {n people}

OR-rule

Symmetric WFOMC Rules: Example

4.

→ 3^n models

• FO-Model Counting: w(R) = w(¬R) = 1

• Apply inference rules backwards (step 4-3-2-1)

WMC(¬Stress(Alice) ∨ Smokes(Alice))) =

= Z – WMC(Stress(Alice)) × WMC(¬Smokes(Alice))

= 4 – 1 × 1 = 3 models

∀-Rule

Δ = (Stress(Alice) ⇒ Smokes(Alice)) Domain = {Alice}

3. Δ = ∀x, (Stress(x) ⇒ Smokes(x)) Domain = {n people}

OR-rule

Symmetric WFOMC Rules: Example

→ 3^n models

3. Δ = ∀x, (Stress(x) ⇒ Smokes(x)) Domain = {n people}

Symmetric WFOMC Rules: Example

→ 3^n models

3. Δ = ∀x, (Stress(x) ⇒ Smokes(x)) Domain = {n people}

2. Δ = ∀y, (ParentOf(y) ∧ Female ⇒ MotherOf(y)) D = {n people}

Symmetric WFOMC Rules: Example

OR-Rule

→ 3^n models

3. Δ = ∀x, (Stress(x) ⇒ Smokes(x)) Domain = {n people}

2. Δ = ∀y, (ParentOf(y) ∧ Female ⇒ MotherOf(y)) D = {n people}

WMC(Δ) = WMC(¬ Female ∨ ∀y, (ParentOf(y) ⇒ MotherOf(y)))

= 2 * 2^n * 2^n - (2 – 1) * (2^n * 2^n – WMC(∀y, (ParentOf(y) ⇒ MotherOf(y))))

= 2 * 4^n – (4^n – 3^n)

→ 3^n + 4^n models

Symmetric WFOMC Rules: Example

OR-Rule

→ 3^n models

3. Δ = ∀x, (Stress(x) ⇒ Smokes(x)) Domain = {n people}

2. Δ = ∀y, (ParentOf(y) ∧ Female ⇒ MotherOf(y)) D = {n people}

WMC(Δ) = WMC(¬ Female ∨ ∀y, (ParentOf(y) ⇒ MotherOf(y)))

= 2 * 2^n * 2^n - (2 – 1) * (2^n * 2^n – WMC(∀y, (ParentOf(y) ⇒ MotherOf(y))))

= 2 * 4^n – (4^n – 3^n)

→ 3^n + 4^n models

Symmetric WFOMC Rules: Example

OR-Rule

→ 3^n models

3. Δ = ∀x, (Stress(x) ⇒ Smokes(x)) Domain = {n people}

2. Δ = ∀y, (ParentOf(y) ∧ Female ⇒ MotherOf(y))

1. Δ = ∀x,y, (ParentOf(x,y) ∧ Female(x) ⇒ MotherOf(x,y)) D = {n people}

D = {n people}

WMC(Δ) = WMC(¬ Female ∨ ∀y, (ParentOf(y) ⇒ MotherOf(y)))

= 2 * 2^n * 2^n - (2 – 1) * (2^n * 2^n – WMC(∀y, (ParentOf(y) ⇒ MotherOf(y))))

= 2 * 4^n – (4^n – 3^n)

→ 3^n + 4^n models

→ (3^n + 4^n)^n models

Symmetric WFOMC Rules: Example

∀-Rule

OR-Rule

→ 3^n models

3. Δ = ∀x, (Stress(x) ⇒ Smokes(x)) Domain = {n people}

2. Δ = ∀y, (ParentOf(y) ∧ Female ⇒ MotherOf(y))

1. Δ = ∀x,y, (ParentOf(x,y) ∧ Female(x) ⇒ MotherOf(x,y)) D = {n people}

D = {n people}

WMC(Δ) = WMC(¬ Female ∨ ∀y, (ParentOf(y) ⇒ MotherOf(y)))

= 2 * 2^n * 2^n - (2 – 1) * (2^n * 2^n – WMC(∀y, (ParentOf(y) ⇒ MotherOf(y))))

= 2 * 4^n – (4^n – 3^n)
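A brute-force sanity check of the closed forms used above (3^n models for step 3, 3^n + 4^n for step 2), for small n:

    from itertools import product
    def count_step3(n):
        # models of ∀x (Stress(x) ⇒ Smokes(x)) over 2n ground atoms
        return sum(1 for s in product(product([True, False], repeat=2), repeat=n)
                   if all((not st) or sm for st, sm in s))
    def count_step2(n):
        # models of ∀y (ParentOf(y) ∧ Female ⇒ MotherOf(y)) over 2n+1 ground atoms
        count = 0
        for female in [True, False]:
            for pm in product(product([True, False], repeat=2), repeat=n):
                if all((not (female and p)) or m for p, m in pm):
                    count += 1
        return count
    for n in range(1, 6):
        assert count_step3(n) == 3 ** n
        assert count_step2(n) == 3 ** n + 4 ** n
    print("closed forms 3^n and 3^n + 4^n confirmed for n = 1..5")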

Atom Counting: Example

Δ = ∀x,y, (Smokes(x) ∧ Friends(x,y) ⇒ Smokes(y)) Domain = {n people}

Atom Counting: Example

If we know precisely who smokes, and there are k smokers?

k

n-k

k

n-k

Database:

Smokes(Alice) = 1

Smokes(Bob) = 0

Smokes(Charlie) = 0

Smokes(Dave) = 1

Smokes(Eve) = 0

...

Smokes Smokes Friends

Δ = ∀x,y, (Smokes(x) ∧ Friends(x,y) ⇒ Smokes(y)) Domain = {n people}

Atom Counting: Example

If we know precisely who smokes, and there are k smokers?

k

n-k

k

n-k

Database:

Smokes(Alice) = 1

Smokes(Bob) = 0

Smokes(Charlie) = 0

Smokes(Dave) = 1

Smokes(Eve) = 0

...

Smokes Smokes Friends

Δ = ∀x,y, (Smokes(x) ∧ Friends(x,y) ⇒ Smokes(y)) Domain = {n people}

Atom Counting: Example

If we know precisely who smokes, and there are k smokers?

k

n-k

k

n-k

Database:

Smokes(Alice) = 1

Smokes(Bob) = 0

Smokes(Charlie) = 0

Smokes(Dave) = 1

Smokes(Eve) = 0

...

Smokes Smokes Friends

Δ = ∀x,y, (Smokes(x) ∧ Friends(x,y) ⇒ Smokes(y)) Domain = {n people}

Atom Counting: Example

If we know precisely who smokes, and there are k smokers?

k

n-k

k

n-k

Database:

Smokes(Alice) = 1

Smokes(Bob) = 0

Smokes(Charlie) = 0

Smokes(Dave) = 1

Smokes(Eve) = 0

...

Smokes Smokes Friends

Δ = ∀x,y, (Smokes(x) ∧ Friends(x,y) ⇒ Smokes(y)) Domain = {n people}

Atom Counting: Example

If we know precisely who smokes, and there are k smokers?

k

n-k

k

n-k

Database:

Smokes(Alice) = 1

Smokes(Bob) = 0

Smokes(Charlie) = 0

Smokes(Dave) = 1

Smokes(Eve) = 0

...

Smokes Smokes Friends

Δ = ∀x,y, (Smokes(x) ∧ Friends(x,y) ⇒ Smokes(y)) Domain = {n people}

Atom Counting: Example

If we know precisely who smokes, and there are k smokers?

k

n-k

k

n-k

Database:

Smokes(Alice) = 1

Smokes(Bob) = 0

Smokes(Charlie) = 0

Smokes(Dave) = 1

Smokes(Eve) = 0

...

Smokes Smokes Friends

Δ = ∀x,y, (Smokes(x) ∧ Friends(x,y) ⇒ Smokes(y)) Domain = {n people}

Atom Counting: Example

If we know precisely who smokes, and there are k smokers?

k

n-k

k

n-k

Database:

Smokes(Alice) = 1

Smokes(Bob) = 0

Smokes(Charlie) = 0

Smokes(Dave) = 1

Smokes(Eve) = 0

...

Smokes Smokes Friends

Δ = ∀x,y, (Smokes(x) ∧ Friends(x,y) ⇒ Smokes(y)) Domain = {n people}

Atom Counting: Example

If we know precisely who smokes, and there are k smokers?

k

n-k

k

n-k

Database:

Smokes(Alice) = 1

Smokes(Bob) = 0

Smokes(Charlie) = 0

Smokes(Dave) = 1

Smokes(Eve) = 0

...

Smokes Smokes Friends

Δ = ∀x,y, (Smokes(x) ∧ Friends(x,y) ⇒ Smokes(y)) Domain = {n people}

Atom Counting: Example

If we know precisely who smokes, and there are k smokers?

k

n-k

k

n-k

Database:

Smokes(Alice) = 1

Smokes(Bob) = 0

Smokes(Charlie) = 0

Smokes(Dave) = 1

Smokes(Eve) = 0

...

Smokes Smokes Friends

Δ = ∀x,y, (Smokes(x) ∧ Friends(x,y) ⇒ Smokes(y)) Domain = {n people}

Atom Counting: Example

If we know precisely who smokes, and there are k smokers?

k

n-k

k

n-k

→ models

Database:

Smokes(Alice) = 1

Smokes(Bob) = 0

Smokes(Charlie) = 0

Smokes(Dave) = 1

Smokes(Eve) = 0

...

Smokes Smokes Friends

Δ = ∀x,y, (Smokes(x) ∧ Friends(x,y) ⇒ Smokes(y)) Domain = {n people}

Atom Counting: Example

If we know precisely who smokes, and there are k smokers?

k

n-k

k

n-k

If we know that there are k smokers?

→ models

Database:

Smokes(Alice) = 1

Smokes(Bob) = 0

Smokes(Charlie) = 0

Smokes(Dave) = 1

Smokes(Eve) = 0

...

Smokes Smokes Friends

Δ = ∀x,y, (Smokes(x) ∧ Friends(x,y) ⇒ Smokes(y)) Domain = {n people}

Atom Counting: Example

If we know precisely who smokes, and there are k smokers?

k

n-k

k

n-k

If we know that there are k smokers?

→ models

Database:

Smokes(Alice) = 1

Smokes(Bob) = 0

Smokes(Charlie) = 0

Smokes(Dave) = 1

Smokes(Eve) = 0

...

→ models

Smokes Smokes Friends

Δ = ∀x,y, (Smokes(x) ∧ Friends(x,y) ⇒ Smokes(y)) Domain = {n people}

Atom Counting: Example

If we know precisely who smokes, and there are k smokers?

k

n-k

k

n-k

If we know that there are k smokers?

In total…

→ models

Database:

Smokes(Alice) = 1

Smokes(Bob) = 0

Smokes(Charlie) = 0

Smokes(Dave) = 1

Smokes(Eve) = 0

...

→ models

Smokes Smokes Friends

Δ = ∀x,y, (Smokes(x) ∧ Friends(x,y) ⇒ Smokes(y)) Domain = {n people}

Atom Counting: Example

If we know precisely who smokes, and there are k smokers?

k

n-k

k

n-k

If we know that there are k smokers?

In total…

→ models

Database:

Smokes(Alice) = 1

Smokes(Bob) = 0

Smokes(Charlie) = 0

Smokes(Dave) = 1

Smokes(Eve) = 0

...

→ models

→ models

Smokes Smokes Friends

Δ = ∀x,y, (Smokes(x) ∧ Friends(x,y) ⇒ Smokes(y)) Domain = {n people}
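The counts that fell out of the transcript can be reconstructed from the argument above: once the k smokers are fixed, every pair with a smoking x and a non-smoking y forces Friends(x,y) to false, leaving 2^(n² − k(n−k)) models; knowing only that there are k smokers gives C(n,k)·2^(n² − k(n−k)), and summing over k gives the total. A small brute force confirms this (the reconstruction, not the slide, is the source of the formula):

    from itertools import product
    from math import comb
    # Brute-force check for Δ = ∀x,y (Smokes(x) ∧ Friends(x,y) ⇒ Smokes(y)).
    n = 3
    people = range(n)
    brute = 0
    for smokes in product([True, False], repeat=n):
        for fvals in product([True, False], repeat=n * n):
            friends = {(x, y): fvals[x * n + y] for x in people for y in people}
            if all((not (smokes[x] and friends[(x, y)])) or smokes[y]
                   for x in people for y in people):
                brute += 1
    closed = sum(comb(n, k) * 2 ** (n * n - k * (n - k)) for k in range(n + 1))
    print(brute, closed)   # both 1792 for n = 3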

Augment Rules with Logical Rewritings

Augment Rules with Logical Rewritings

1. Remove constants (shattering)

Δ = ∀x (Friend(Alice, x) ∨ Friend(x, Bob))

Augment Rules with Logical Rewritings

1. Remove constants (shattering)

Δ = ∀x (Friend(Alice, x) ∨ Friend(x, Bob))

Δ = ∀x (F1(x) ∨ F2(x)) ∧ (F3 ∨ F4) ∧ (F4 ∨ F5)

F1(x) = Friend(Alice,x)

F2(x) = Friend(x,Bob)

F3 = Friend(Alice, Alice)

F4 = Friend(Alice,Bob)

F5 = Friend(Bob,Bob)

Augment Rules with Logical Rewritings

1. Remove constants (shattering)

Δ = ∀x (Friend(Alice, x) ∨ Friend(x, Bob))

Δ = ∀x (F1(x) ∨ F2(x)) ∧ (F3 ∨ F4) ∧ (F4 ∨ F5)

F1(x) = Friend(Alice,x)

F2(x) = Friend(x,Bob)

F3 = Friend(Alice, Alice)

F4 = Friend(Alice,Bob)

F5 = Friend(Bob,Bob)

2. “Rank” variables (= occur in the same order in each atom)

Δ = (Friend(x,y) ∨ Enemy(x,y)) ∧ (Friend(x,y) ∨ Enemy(y,x)) Wrong order

Augment Rules with Logical Rewritings

1. Remove constants (shattering)

Δ = ∀x (Friend(Alice, x) ∨ Friend(x, Bob))

Δ = ∀x (F1(x) ∨ F2(x)) ∧ (F3 ∨ F4) ∧ (F4 ∨ F5)

F1(x) = Friend(Alice,x)

F2(x) = Friend(x,Bob)

F3 = Friend(Alice, Alice)

F4 = Friend(Alice,Bob)

F5 = Friend(Bob,Bob)

2. “Rank” variables (= occur in the same order in each atom)

Δ = (Friend(x,y) ∨ Enemy(x,y)) ∧ (Friend(x,y) ∨ Enemy(y,x)) Wrong order

F1(u,v) = Friend(u,v),u<v

F2(u) = Friend(u,u) F3(u,v) = Friend(v,u),v<u

E1(u,v) = Enemy(u,v),u<v

E2(u) = Enemy(u,u) E3(u,v) = Enemy(v,u),v<u

Δ = (F1(x,y) ∨ E1(x,y)) ∧ (F1(x,y) ∨ E3(x,y))

∧ (F2(x) ∨ E2(x))

∧ (F3(x,y) ∨ E3(x,y)) ∧ (F3(x,y) ∨ E1(x,y))

Augment Rules with Logical Rewritings

3. Perform Resolution [Gribkoff'14] (See UAI Poster on Saturday!)

Δ = ∀x∀y (R(x) ∨ ¬S(x,y)) ∧ ∀x∀y (S(x,y) ∨ T(y)) Rules stuck…

Resolution: Δ ∧ ∀x∀y (R(x) ∨ T(y)) Now apply I/E!

4. Skolemization [V.d.Broeck'14]

Δ = ∀p, ∃c, Card(p,c)

Mix ∀/∃ in encodings of MLNs with quantifiers and probabilistic programs

Input: Mix ∀/∃ Output: Only ∀

Skolemization: Example

Δ = ∀p, ∃c, Card(p,c)

[V.d.Broeck'14]

Skolemization

Skolemization: Example

Δ = ∀p, ∃c, Card(p,c)

Δ′ = ∀p, ∀c, Card(p,c) ⇒ S(p)

[V.d.Broeck'14]

w(S) = 1 and w(¬S) = -1

Skolemization

Skolem predicate

Skolemization: Example

Δ = ∀p, ∃c, Card(p,c)

Δ′ = ∀p, ∀c, Card(p,c) ⇒ S(p)

[V.d.Broeck'14]

∃c, Card(p,c) = true

Consider one position p:

w(S) = 1 and w(¬S) = -1

∃c, Card(p,c) = false

Skolemization

Skolem predicate

Skolemization: Example

Δ = ∀p, ∃c, Card(p,c)

Δ′ = ∀p, ∀c, Card(p,c) ⇒ S(p)

[V.d.Broeck'14]

∃c, Card(p,c) = true

S(p) = true Also model of Δ, weight * 1

Consider one position p:

w(S) = 1 and w(¬S) = -1

∃c, Card(p,c) = false

Skolemization

Skolem predicate

Skolemization: Example

Δ = ∀p, ∃c, Card(p,c)

Δ′ = ∀p, ∀c, Card(p,c) ⇒ S(p)

[V.d.Broeck'14]

∃c, Card(p,c) = true

S(p) = true Also model of Δ, weight * 1

Consider one position p:

w(S) = 1 and w(¬S) = -1

∃c, Card(p,c) = false

S(p) = true No model of Δ, weight * 1

S(p) = false No model of Δ, weight * -1

Extra models Cancel out

Skolemization

Skolem predicate

Skolemization: Example

Δ = ∀p, ∃c, Card(p,c)

Δ′ = ∀p, ∀c, Card(p,c) ⇒ S(p)

[V.d.Broeck'14]
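A brute-force check of the cancellation on a miniature instance (2 positions, 2 cards; the tiny domain is only for illustration):

    from itertools import product
    # FOMC of Δ = ∀p ∃c Card(p,c) versus WFOMC of the Skolemized
    # Δ' = ∀p ∀c (Card(p,c) ⇒ S(p)) with w(S) = 1 and w(¬S) = -1.
    P, C = 2, 2
    fomc = sum(1 for card in product([True, False], repeat=P * C)
               if all(any(card[p * C + c] for c in range(C)) for p in range(P)))
    wfomc = 0
    for card in product([True, False], repeat=P * C):
        for s in product([True, False], repeat=P):
            if all((not card[p * C + c]) or s[p] for p in range(P) for c in range(C)):
                weight = 1
                for p in range(P):
                    weight *= 1 if s[p] else -1
                wfomc += weight
    print(fomc, wfomc)   # both 9: the extra models introduced by S cancel out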

...

Playing Cards Revisited

Let us automate this:

Relational model

Lifted probabilistic inference algorithm

∀p, ∃c, Card(p,c)

∀c, ∃p, Card(p,c)

∀p, ∀c, ∀c‟, Card(p,c) ∧ Card(p,c‟) ⇒ c = c‟

Playing Cards Revisited

Tutorial UAI 2014 82

∀p, ∃c, Card(p,c)

∀c, ∃p, Card(p,c)

∀p, ∀c, ∀c‟, Card(p,c) ∧ Card(p,c‟) ⇒ c = c‟

Playing Cards Revisited

Tutorial UAI 2014 82

∀p, ∃c, Card(p,c)

∀c, ∃p, Card(p,c)

∀p, ∀c, ∀c‟, Card(p,c) ∧ Card(p,c‟) ⇒ c = c‟

Skolemization

∀p, ∀c, Card(p,c) ⇒ S1(p)

∀c, ∀p, Card(p,c) ⇒ S2(c)

∀p, ∀c, ∀c‟, Card(p,c) ∧ Card(p,c‟) ⇒ c = c‟

Playing Cards Revisited

Tutorial UAI 2014 82

∀p, ∃c, Card(p,c)

∀c, ∃p, Card(p,c)

∀p, ∀c, ∀c‟, Card(p,c) ∧ Card(p,c‟) ⇒ c = c‟

Skolemization

∀p, ∀c, Card(p,c) ⇒ S1(p)

∀c, ∀p, Card(p,c) ⇒ S2(c)

∀p, ∀c, ∀c‟, Card(p,c) ∧ Card(p,c‟) ⇒ c = c‟

Playing Cards Revisited

w(S1) = 1 and w(¬S1) = -1

w(S2) = 1 and w(¬S2) = -1

Tutorial UAI 2014 82

∀p, ∃c, Card(p,c)

∀c, ∃p, Card(p,c)

∀p, ∀c, ∀c‟, Card(p,c) ∧ Card(p,c‟) ⇒ c = c‟

Skolemization

∀p, ∀c, Card(p,c) ⇒ S1(p)

∀c, ∀p, Card(p,c) ⇒ S2(c)

∀p, ∀c, ∀c‟, Card(p,c) ∧ Card(p,c‟) ⇒ c = c‟

Playing Cards Revisited

w(S1) = 1 and w(¬S1) = -1

w(S2) = 1 and w(¬S2) = -1

Tutorial UAI 2014 82

∀p, ∃c, Card(p,c)

∀c, ∃p, Card(p,c)

∀p, ∀c, ∀c‟, Card(p,c) ∧ Card(p,c‟) ⇒ c = c‟

Skolemization

Atom counting

∀p, ∀c, Card(p,c) ⇒ S1(p)

∀c, ∀p, Card(p,c) ⇒ S2(c)

∀p, ∀c, ∀c‟, Card(p,c) ∧ Card(p,c‟) ⇒ c = c‟

Playing Cards Revisited

w(S1) = 1 and w(¬S1) = -1

w(S2) = 1 and w(¬S2) = -1

Tutorial UAI 2014 82

∀p, ∃c, Card(p,c)

∀c, ∃p, Card(p,c)

∀p, ∀c, ∀c‟, Card(p,c) ∧ Card(p,c‟) ⇒ c = c‟

∀p, ∀c, ∀c‟, Card(p,c) ∧ Card(p,c‟) ⇒ c = c‟

Skolemization

Atom counting

∀p, ∀c, Card(p,c) ⇒ S1(p)

∀c, ∀p, Card(p,c) ⇒ S2(c)

∀p, ∀c, ∀c‟, Card(p,c) ∧ Card(p,c‟) ⇒ c = c‟

Playing Cards Revisited

w(S1) = 1 and w(¬S1) = -1

w(S2) = 1 and w(¬S2) = -1

Tutorial UAI 2014 82

∀p, ∃c, Card(p,c)

∀c, ∃p, Card(p,c)

∀p, ∀c, ∀c‟, Card(p,c) ∧ Card(p,c‟) ⇒ c = c‟

∀p, ∀c, ∀c‟, Card(p,c) ∧ Card(p,c‟) ⇒ c = c‟

∀-Rule

Skolemization

Atom counting

∀p, ∀c, Card(p,c) ⇒ S1(p)

∀c, ∀p, Card(p,c) ⇒ S2(c)

∀p, ∀c, ∀c‟, Card(p,c) ∧ Card(p,c‟) ⇒ c = c‟

Playing Cards Revisited

w(S1) = 1 and w(¬S1) = -1

w(S2) = 1 and w(¬S2) = -1

Tutorial UAI 2014 82

∀p, ∃c, Card(p,c)

∀c, ∃p, Card(p,c)

∀p, ∀c, ∀c‟, Card(p,c) ∧ Card(p,c‟) ⇒ c = c‟

∀p, ∀c, ∀c‟, Card(p,c) ∧ Card(p,c‟) ⇒ c = c‟

∀c, ∀c‟, Card(c) ∧ Card(c‟) ⇒ c = c‟

∀-Rule

Skolemization

Atom counting

∀p, ∀c, Card(p,c) ⇒ S1(p)

∀c, ∀p, Card(p,c) ⇒ S2(c)

∀p, ∀c, ∀c‟, Card(p,c) ∧ Card(p,c‟) ⇒ c = c‟

Playing Cards Revisited

w(S1) = 1 and w(¬S1) = -1

w(S2) = 1 and w(¬S2) = -1

Tutorial UAI 2014 82

∀p, ∃c, Card(p,c)

∀c, ∃p, Card(p,c)

∀p, ∀c, ∀c‟, Card(p,c) ∧ Card(p,c‟) ⇒ c = c‟

∀p, ∀c, ∀c‟, Card(p,c) ∧ Card(p,c‟) ⇒ c = c‟

∀c, ∀c‟, Card(c) ∧ Card(c‟) ⇒ c = c‟

∀-Rule

Skolemization

Atom counting

...

Playing Cards Revisited

Let us automate this:

Lifted probabilistic inference algorithm

Computed in time polynomial in n Tutorial UAI 2014 83

Summary Lifted Inference

• By definition: PTIME data complexity

Also: ∃ FO compilation = ∃ Query Plan

• However: only works for “liftable” queries

• The rules:

– AND/OR-rules, ∀/∃-rules, I/E

(inclusion/exclusion), Atom Counting

– Deceptively simple: the only surprising rules

are I/E and atom counting

Tutorial UAI 2014 84

Next: will show that lifted inference is provably

more powerful than grounded inference

Outline

• Part 1: Motivation

• Part 2: Probabilistic Databases

• Part 3: Weighted Model Counting

• Part 4: Lifted Inference for WFOMC

• Part 5: The Power of Lifted Inference

• Part 6: Conclusion/Open Problems

Tutorial UAI 2014 85

Two Questions

• Q1: Are the lifted rules complete?

– We know that they get stuck on some queries

– Do we need to add more rules?

• Q2: Are lifted rules stronger than grounded?

– Some lifted rules easily correspond to operations

on grounded formulas (e.g. Independent-AND)

– Can we simulate every lifted inference directly on

the grounded formula?

Tutorial UAI 2014 86

Two Questions

• Q1: Are the lifted rules complete?

– We know that they get stuck on some queries

– Do we need to add more rules?

• Q2: Are lifted rules stronger than grounded?

– Some lifted rules easily correspond to operations

on grounded formulas (e.g. Independent-AND)

– Can we simulate every lifted inference directly on

the grounded formula?

Complete for Positive CNF-FO, for UCQ

Symmetric: yes (grounded inference ignores symmetries)

Asymmetric: Strictly stronger than Decision-DNNF & DPLL-based algorithms

1. Are the Lifted Rules Complete?

We use complexity classes

• Inference rules: PTIME data complexity

• Some queries: #P-hard data complexity

Dichotomy Theorem for Positive CNF-FO:

• If lifted rules succeed, then query in PTIME

• If lifted rules fail, then query is #P-hard

Implies lifted rules are complete for Positive CNF-FO

Tutorial UAI 2014 88 Will show in two steps: Small and Big Dichotomy Theorem

NP v.s. #P

• SAT = Satisfiability Problem

• SAT is NP-complete [Cook‟71]

• NP = decision problems polynomial-time, nondeterministic TM

• #SAT = model counting

• #SAT is #P-complete [Valiant‟79]

• #P = numerical functions polynomial-time, nondeterministic TM, answer = #accepting computations

Note: it would be wrong to say “#SAT is NP-complete”

A Simple Propositional Formula

that is Hard

A Positive, Partitioned 2CNF Formula is a formula of the form:

F = ∧(i,j) ∈ E (xi ∨ yj)

Where E = the edge set of a bipartite graph

Tutorial UAI 2014 90

A Simple Propositional Formula

that is Hard

1

2

1

2

3

E:

A Positive, Partitioned 2CNF Formula is a formula of the form:

F = ∧(i,j) ∈ E (xi ∨ yj)

Where E = the edge set of a bipartite graph

F = (x1 ∨ y1) ∧ (x2 ∨ y1) ∧ (x2 ∨ y3) ∧ (x1 ∨ y3) ∧ (x2 ∨ y2)

Tutorial UAI 2014 91

A Simple Propositional Formula

that is Hard

Theorem [Provan‟83] #SAT for PP2CNF is #P-hard

1

2

1

2

3

E:

A Positive, Partitioned 2CNF Formula is a formula of the form:

F = ∧(i,j) ∈ E (xi ∨ yj)

Where E = the edge set of a bipartite graph

F = (x1 ∨ y1) ∧ (x2 ∨ y1) ∧ (x2 ∨ y3) ∧ (x1 ∨ y3) ∧ (x2 ∨ y2)

Tutorial UAI 2014 92

A Query That is #P-Hard

H0= ∀x∀y (Smoker(x)∨Friend(x,y)∨Jogger(y))

Theorem. Computing P(H0 | D) is #P-hard in |D|

[Dalvi‟04]

Proof: Reduction from PP2CNF. Given a PP2CNF F defined by edge relation E, set:

P(Friend(a,b)) = 0 if (a,b) ∈ E

P(Friend(a,b)) = 1 if (a,b) ∉ E

Then the grounding of H0 is: ∧(i,j) ∈ E (Smoker(i) ∨ Jogger(j)) = F. Hence, P(H0 | D) = P(F)

Lesson: no lifted inference rules will ever compute H0
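A small sketch of the reduction (the edge set and the 1/2 probabilities for Smoker and Jogger are illustrative): with Friend fixed as above, computing P(H0 | D) is exactly weighted model counting for F.

    from itertools import product
    # PP2CNF F with edge set E encoded as a probabilistic database for
    # H0 = ∀x∀y (Smoker(x) ∨ Friend(x,y) ∨ Jogger(y)): Friend(a,b) is absent
    # (P = 0) on edges and present (P = 1) off edges, so only the edge clauses
    # Smoker(i) ∨ Jogger(j) survive in the grounding.
    E = {(1, 1), (2, 1), (2, 3), (1, 3), (2, 2)}
    xs, ys = [1, 2], [1, 2, 3]
    p = 0.5   # P(Smoker(i)) = P(Jogger(j)) = 1/2
    p_h0 = 0.0
    for sv in product([True, False], repeat=len(xs)):
        smoker = dict(zip(xs, sv))
        for jv in product([True, False], repeat=len(ys)):
            jogger = dict(zip(ys, jv))
            if all(smoker[i] or jogger[j] for (i, j) in E):
                p_h0 += p ** (len(xs) + len(ys))
    print(p_h0)   # = P(F) = #SAT(F) / 2^5 for F = ∧_{(i,j)∈E} (x_i ∨ y_j)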

Hierarchical Clause

at(x) = set of atoms containing the variable x

x

y

Non-hierarchical

Smoker Friend

x

z

Hierarchical

y

Q = (Smoker(x,y)∨Friend(x,z)) H0=Smoker(x)∨Friend(x,y)∨Jogger(y)

Definition A clause Q is hierarchical if forall variables x, y:

at(x) ⊆ at(y) or at(x) ⊇ at(y) or at(x) ∩ at(y) = ∅

= ∀x[∀y Smoker(x,y)]∨[∀z Friend(x,z)]

Smoker Friend Jogger

Small Dichotomy Theorem

Let Q be a single clause without repeating relation symbols.

Theorem [Dalvi'04] (Dichotomy):

• If Q is hierarchical, then Q is liftable (PTIME data complexity).

• If Q is not hierarchical, then Q is #P-hard.

Moreover, the ∨-rule and the ∀-rule are complete for such clauses.

Note: checking whether Q is hierarchical is in AC0 (expression complexity).

Proof, Hierarchical ⇒ PTIME:

• Case 1: Q has a root variable x, i.e. a variable that occurs in every atom. Apply the ∀-rule: P(∀x Q) = Πa P(Q[a/x]). The groundings Q[a/x] for different constants a touch disjoint sets of tuples, so they are independent.

• Case 2: Q = Q1 ∨ Q2, where Q1 and Q2 share no logical variables (and, since relation symbols do not repeat, no relation symbols either). Apply the ∨-rule (independent-or): P(Q) = 1 – (1 – P(Q1))(1 – P(Q2)).

For a hierarchical clause one of the two cases always applies, so the recursion terminates in PTIME.

Proof, Non-hierarchical ⇒ #P-hard:

If Q is not hierarchical, it contains two variables x, y and atoms of the form R(x, …), S(x,y, …), T(y, …); this pattern allows a reduction from H0.
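For the hierarchical example Q = ∀x ([∀y Smoker(x,y)] ∨ [∀z Friend(x,z)]), the two rules give a simple closed-form computation over a tuple-independent database. The sketch below is our own illustration (the domain, probabilities and function names are invented); it compares the lifted computation against a brute-force sum over all possible worlds:

from itertools import product
import random

random.seed(0)
dom = [0, 1]
pS = {(a, b): random.random() for a in dom for b in dom}  # P(Smoker(a,b))
pF = {(a, c): random.random() for a in dom for c in dom}  # P(Friend(a,c))

def lifted():
    prob = 1.0
    for a in dom:                              # forall-rule on the root variable x
        p1 = 1.0
        for b in dom:                          # forall-rule on y
            p1 *= pS[(a, b)]
        p2 = 1.0
        for c in dom:                          # forall-rule on z
            p2 *= pF[(a, c)]
        prob *= 1 - (1 - p1) * (1 - p2)        # independent-or rule
    return prob

def brute_force():
    keys = [("S", t) for t in pS] + [("F", t) for t in pF]
    total = 0.0
    for bits in product([False, True], repeat=len(keys)):
        world = dict(zip(keys, bits))
        weight = 1.0
        for (rel, t), present in world.items():
            p = pS[t] if rel == "S" else pF[t]
            weight *= p if present else 1 - p
        sat = all(all(world[("S", (a, b))] for b in dom) or
                  all(world[("F", (a, c))] for c in dom)
                  for a in dom)
        total += weight * sat
    return total

print(lifted(), brute_force())  # the two values agree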

The Big Dichotomy Theorem

• For Positive CNF-FO the rules are not complete as stated!

• Instead we will revise inclusion/exclusion

• After the revision, the rules are complete

• We start with some non-liftable queries…


The Non-liftable Queries Hk

H0 = R(x) ∨ S(x,y) ∨ T(y)

H1 = [R(x0)∨S(x0,y0)] ∧ [S(x1,y1)∨T(y1)]

H2 = [R(x0)∨S1(x0,y0)] ∧ [S1(x1,y1)∨S2(x1,y1)] ∧ [S2(x2,y2)∨T(y2)]

H3 = [R(x0)∨S1(x0,y0)] ∧ [S1(x1,y1)∨S2(x1,y1)] ∧ [S2(x2,y2)∨S3(x2,y2)] ∧ [S3(x3,y3)∨T(y3)]

. . .

Theorem [Dalvi'12]. For every k, the query Hk is #P-hard.

So far, not very interesting…

The Query QW

QW = [∀x0∀y0 (R(x0)∨S1(x0,y0)) ∧ ∀x2∀y2 (S2(x2,y2)∨S3(x2,y2))]   /* Q1 */
   ∨ [∀x0∀y0 (R(x0)∨S1(x0,y0)) ∧ ∀x3∀y3 (S3(x3,y3)∨T(y3))]       /* Q2 */
   ∨ [∀x1∀y1 (S1(x1,y1)∨S2(x1,y1)) ∧ ∀x3∀y3 (S3(x3,y3)∨T(y3))]   /* Q3 */

QW is a Boolean combination of the clauses of

H3 = [R(x0)∨S1(x0,y0)] ∧ [S1(x1,y1)∨S2(x1,y1)] ∧ [S2(x2,y2)∨S3(x2,y2)] ∧ [S3(x3,y3)∨T(y3)]

QW is liftable, BUT we need to use cancellations. Inclusion/exclusion gives:

P(QW) = P(Q1) + P(Q2) + P(Q3)
      - P(Q1∧Q2) - P(Q2∧Q3) - P(Q1∧Q3)
      + P(Q1∧Q2∧Q3)

Here Q1∧Q3 = Q1∧Q2∧Q3 = H3, which is #P-hard, while the remaining terms are liftable. The two hard terms cancel out, and what remains is liftable.

Cancellations?

• Cancellations in the inclusion/exclusion formula are critical! If we fail to perform them, the rules get stuck.

• The mathematical concept that explains which terms cancel out is the Möbius function (next).

August Ferdinand Möbius (1790-1868)

• Möbius strip

• Möbius function μ in number theory

• Generalized to lattices [Stanley'97]

• And to lifted inference!

The Lattice of a Query

Definition. The lattice of Q = Q1 ∨ Q2 ∨ … is:

• the elements are the terms of inclusion/exclusion (the conjunctions of the Qi), plus a top element 1̂;

• the order is logical implication.

For QW = Q1 ∨ Q2 ∨ Q3 the lattice is, from top to bottom: 1̂; then Q1, Q2, Q3; then Q1∧Q2 and Q2∧Q3; then the bottom element Q1∧Q2∧Q3 (= Q1∧Q3). Some of the nodes are liftable, others are #P-hard.

The Möbius Function

Definition. The Möbius function of the lattice (1̂ denotes the top element):

μ(1̂, 1̂) = 1
μ(u, 1̂) = - Σ_{u < v ≤ 1̂} μ(v, 1̂)

Möbius Inversion Formula:

P(Q) = - Σ_{Qi < 1̂} μ(Qi, 1̂) P(Qi)

On the lattice of QW the values are: μ(1̂, 1̂) = 1; μ(Q1, 1̂) = μ(Q2, 1̂) = μ(Q3, 1̂) = -1; μ(Q1∧Q2, 1̂) = μ(Q2∧Q3, 1̂) = 1; and μ(Q1∧Q2∧Q3, 1̂) = 0. The bottom element is exactly the #P-hard query H3, and its Möbius value is 0, so it never has to be evaluated.

New Rule: replace Inclusion/Exclusion by the Möbius Inversion Formula.
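The Möbius values above can be computed mechanically from the recursive definition. A small sketch (the lattice of QW is hard-coded from the figure; the names and representation are ours):

# elements of the lattice of QW, listed from the top down
elements = ["1", "Q1", "Q2", "Q3", "Q1^Q2", "Q2^Q3", "Q1^Q2^Q3"]

# strictly-above relation: above[u] = all v with u < v <= 1 (logical implication)
above = {
    "1": set(),
    "Q1": {"1"}, "Q2": {"1"}, "Q3": {"1"},
    "Q1^Q2": {"Q1", "Q2", "1"},
    "Q2^Q3": {"Q2", "Q3", "1"},
    "Q1^Q2^Q3": {"Q1^Q2", "Q2^Q3", "Q1", "Q2", "Q3", "1"},
}

mobius = {}
for u in elements:  # top-down order, so mobius[v] is known for every v above u
    mobius[u] = 1 if u == "1" else -sum(mobius[v] for v in above[u])

print(mobius)
# {'1': 1, 'Q1': -1, 'Q2': -1, 'Q3': -1, 'Q1^Q2': 1, 'Q2^Q3': 1, 'Q1^Q2^Q3': 0}
# Mobius inversion then sums -mobius[u] * P(u) over u < 1; the bottom element
# (= H3, the #P-hard query) has coefficient 0 and is never evaluated.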

The Dichotomy Theorem

Dichotomy Theorem [Dalvi'12]. Fix a Positive CNF-FO query Q.

1. If Q is liftable, then P(Q) is in PTIME (obviously).

2. If Q is not liftable, then P(Q) is #P-complete.

Note 1: for the theorem to hold, one must replace the inclusion/exclusion rule with the Möbius rule.

Note 2: the original formulation is for UCQ; it holds for Positive CNF-FO by duality.

Discussion

• This answers Question 1: the lifted inference rules are complete for Positive CNF-FO.

• Beyond Positive CNF-FO?

– See the poster on Saturday.

– Take-away: rules + resolution are conjectured to be complete for CNF-FO; there is strong evidence that no complete set of rules exists for full FO.

2. Are Lifted Rules Stronger than Grounded?

Alternative to lifting:

1. Ground the FO sentence.

2. Do WMC on the propositional formula.

Symmetric WFOMC: grounded WMC does not use symmetries. The query H0 is liftable over symmetric spaces, but #P-hard over asymmetric ones.

Asymmetric WFOMC: the query QW is in PTIME, yet DPLL-based search takes exponential time and Decision-DNNFs have exponential size.

Symmetric WFOMC

H0 = ∀x∀y (Smoker(x) ∨ Friend(x,y) ∨ Jogger(y))

We have seen that H0 is #P-hard (over asymmetric spaces!). But over symmetric spaces it can be lifted: lifted inference is strictly more powerful than grounded inference.

Theorem [V.d.Broeck'14]: every query in FO2 is liftable over symmetric spaces.

FO2 includes H0, as well as some quite complex sentences, such as

Q = ∀x∀y∀z∀u∀v (Friend(x,y) ∨ Enemy(y,z) ∨ Friend(z,u) ∨ Enemy(u,v))
  = ∀x∀y (Friend(x,y) ∨ ∀x (Enemy(y,x) ∨ ∀y (Friend(x,y) ∨ ∀x Enemy(y,x))))

which needs only two logical variables once x and y are reused.
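For intuition, here is one way the symmetric case can be handled for H0 over a domain of size n, when every Smoker, Jogger and Friend tuple has the same probability pS, pJ, pF: condition on the number k of non-smokers and l of non-joggers; the clause then forces Friend(x,y) for each of the k·l remaining pairs. The O(n²) sketch below is our own illustration of this argument (function and parameter names are invented), not the tutorial's algorithm:

from math import comb

def p_h0_symmetric(n, pS, pJ, pF):
    total = 0.0
    for k in range(n + 1):        # k = number of non-smokers
        for l in range(n + 1):    # l = number of non-joggers
            w = (comb(n, k) * (1 - pS) ** k * pS ** (n - k) *
                 comb(n, l) * (1 - pJ) ** l * pJ ** (n - l))
            total += w * pF ** (k * l)   # every non-smoker/non-jogger pair needs Friend
    return total

print(p_h0_symmetric(n=100, pS=0.3, pJ=0.6, pF=0.9))  # polynomial time in n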

Asymmetric WFOMC

• Lifted inference no longer has a fundamental reason to be stronger than grounded WMC.

• However, we can prove that lifted inference is stronger than the WMC algorithms used in practice today:

– DPLL search (with caching; with components)

– Decision-DNNF

Basic DPLL

Davis, Putnam, Logemann, Loveland [Davis'60, '62]

Assume the uniform distribution for simplicity.

// basic DPLL:
Function P(F):
  if F = false then return 0
  if F = true then return 1
  select a variable x, return ½ P(F[x=0]) + ½ P(F[x=1])

Example: F = (x∨y) ∧ (x∨u∨w) ∧ (¬x∨u∨w∨z)

Setting x = 1 leaves u∨w∨z, with probability 7/8; setting x = 0 leaves y ∧ (u∨w), with probability ½ · ¾ = 3/8. Hence P(F) = ½ · 7/8 + ½ · 3/8 = 5/8.

The trace of basic DPLL is a decision tree for F.
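A minimal runnable version of basic DPLL, mirroring the pseudocode above (uniform distribution; the CNF encoding as frozensets of literal strings is our own choice for illustration):

def condition(cnf, var, value):
    # Set var := value and simplify: drop satisfied clauses, remove falsified literals
    true_lit, false_lit = (var, "~" + var) if value else ("~" + var, var)
    out = set()
    for clause in cnf:
        if true_lit in clause:
            continue
        out.add(clause - {false_lit})
    return frozenset(out)

def prob(cnf):
    if any(len(c) == 0 for c in cnf):   # an empty clause: unsatisfiable branch
        return 0.0
    if not cnf:                         # no clauses left: satisfied branch
        return 1.0
    var = next(iter(next(iter(cnf)))).lstrip("~")   # select any variable
    return 0.5 * prob(condition(cnf, var, False)) + \
           0.5 * prob(condition(cnf, var, True))

F = frozenset({frozenset({"x", "y"}),
               frozenset({"x", "u", "w"}),
               frozenset({"~x", "u", "w", "z"})})
print(prob(F))   # 0.625 = 5/8, as computed above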

Caching

// DPLL with caching:
Cache F and P(F); look it up before computing.

In the example, the sub-formula u∨w arises on several branches and is now computed only once.

Caching & FBDDs

With caching, the trace is a decision-DAG for F: an FBDD (Free Binary Decision Diagram), also known as an ROBP (Read-Once Branching Program).

• Every variable is tested at most once on any path.

• All internal nodes are decision nodes.

Component Analysis

// DPLL with component analysis (and caching):
if F = G ∧ H, where G and H have disjoint sets of variables, then
  P(F) = P(G) × P(H)

In the example, after setting x = 0 the residual formula y ∧ (u∨w) splits into the independent components y and u∨w.

Components & Decision-DNNF

With component analysis (and caching), the trace is a Decision-DNNF [Huang'05, '07]: an FBDD extended with “decomposable” AND-nodes, i.e. AND-nodes whose two sub-DAGs do not share variables.
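The two refinements are easy to bolt onto the basic sketch above: caching becomes memoization on the residual formula, and component analysis becomes a split into variable-disjoint groups of clauses whose probabilities multiply. The code below is again only an illustration (it reuses condition() and F from the earlier snippet; all names are ours):

from functools import lru_cache

def variables(clause):
    return frozenset(lit.lstrip("~") for lit in clause)

def split_off_component(cnf):
    # G = clauses connected to the first clause through shared variables; H = the rest
    rest = set(cnf)
    comp, vs = set(), set()
    frontier = {next(iter(rest))}
    while frontier:
        clause = frontier.pop()
        comp.add(clause)
        vs |= variables(clause)
        frontier |= {c for c in rest - comp if variables(c) & vs}
    return frozenset(comp), frozenset(rest - comp)

@lru_cache(maxsize=None)        # caching: look up P(F) before recomputing it
def prob_cc(cnf):
    if any(len(c) == 0 for c in cnf):
        return 0.0
    if not cnf:
        return 1.0
    g, h = split_off_component(cnf)
    if h:                       # component rule: P(G AND H) = P(G) * P(H)
        return prob_cc(g) * prob_cc(h)
    var = next(iter(next(iter(cnf)))).lstrip("~")
    return 0.5 * prob_cc(condition(cnf, var, False)) + \
           0.5 * prob_cc(condition(cnf, var, True))

print(prob_cc(F))  # still 5/8; the trace now has decision nodes and AND-nodes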

New Queries From Hk

Consider the k+1 clauses that form Hk:

Hk0 = ∀x0∀y0 (R(x0)∨S1(x0,y0))
Hk1 = ∀x1∀y1 (S1(x1,y1)∨S2(x1,y1))
Hk2 = ∀x2∀y2 (S2(x2,y2)∨S3(x2,y2))
…
Hkk = ∀xk∀yk (Sk(xk,yk)∨T(yk))

Asymmetric WFOMC

Theorem [Beame'14]. If the query Q is any Boolean combination of the formulas Hk0, …, Hkk, then:

• any DPLL-based algorithm takes Ω(2^√n) time, and

• any Decision-DNNF has Ω(2^√n) nodes.

For example, QW is a Boolean combination of H30, H31, H32, H33: it is liftable (hence PTIME), yet grounded WMC takes exponential time.

Discussion

• This answers Question 2: there exist queries that (a) are liftable, and (b) force grounded algorithms such as DPLL search or Decision-DNNF compilation to take exponential time.

• Perhaps there are more powerful grounded algorithms? We don't know. Open problem: do these queries have polynomial-size d-DNNFs?

Möbius Über Alles

[Diagram from [Suciu'11]: nested classes of hierarchical queries, from smallest to largest: read-once, poly-size OBDD, poly-size FBDD / Decision-DNNF, poly-size d-DNNF, PTIME; non-hierarchical queries are #P-hard. Example queries QJ, QV, QU, Q9, QW and H0, H1, H2, H3 are placed at the boundaries; part of the picture is conjectured, and whether PTIME collapses to poly-size d-DNNF is open.]

Outline

• Part 1: Motivation

• Part 2: Probabilistic Databases

• Part 3: Weighted Model Counting

• Part 4: Lifted Inference for WFOMC

• Part 5: The Power of Lifted Inference

• Part 6: Conclusion/Open Problems


Summary

• Relational models = the vast majority of data today, plus probabilistic databases

• Weighted Model Counting = a uniform approach to probabilistic inference

• Lifted Inference = really simple rules

• The Power of Lifted Inference = we can prove that lifted inference is better

Lifted Algorithms (in the AI community)

Exact probabilistic inference:

• First-Order Variable Elimination [Poole'03, Braz'05, Milch'08, Taghipour'13]

• First-Order Knowledge Compilation [V.d.Broeck'11a, '11b, '12a, '13a]

• Probabilistic Theorem Proving [Gogate'11]

Approximate probabilistic inference:

• Lifted Belief Propagation [Jaimovich'07, Singla'08, Kersting'09]

• Lifted Bisimulation/Mini-buckets [Sen'08, '09]

• Lifted Importance Sampling [Gogate'11, '12]

• Lifted Relax, Compensate & Recover [V.d.Broeck'12b]

• Lifted MCMC [Niepert'12, Niepert'13, Venugopal'12]

• Lifted Variational Inference [Choi'12, Bui'12]

• Lifted MAP-LP [Mladenov'14, Apsel'14]

Special-purpose inference:

• Lifted Kalman Filter [Ahmadi'11, Choi'11]

• Lifted Linear Programming [Mladenov'12]

Survey: [Kimmig'14]

“But my application has no symmetries?”

1. Statistical relational models have abundant symmetries.

2. Some tasks do not require symmetries in the data: weight learning, partition functions, single marginals, etc.

3. Symmetries of computation are not symmetries of data: belief propagation and MAP-LP require only weaker automorphisms.

4. Over-symmetric approximations: approximate P(Q|DB) by P(Q|DB'), where DB' has more symmetries than DB (is more liftable). Very high speed improvements, low approximation error.

[Kersting'09, Mladenov'14, V.d.Broeck'13b]

Open Problems

Symmetric spaces:

• Prove hardness for ANY lifted inference task. Likely needed: #P1-hardness.

• Are the lifted inference rules complete beyond FO2?

Asymmetric spaces:

• Prove completeness of the rules for CNF-FO formulas.

• Extend lifted inference algorithms beyond liftable formulas (approximations are needed).

• Find a measure of complexity as a function of the FO formula AND the database D; e.g., tractable if D has bounded treewidth.

Final Thoughts

Long-term outlook: probabilistic inference exploits

• 1988: conditional independence

• 2000: contextual independence (local structure)

• 201?: exchangeability / symmetries. Need lifted inference!

Thank You!

Questions?


StarAI Workshop

@ AAAI on Sunday

References

[Gartner'06] Gartner (2006). Market Share: Relational Database Management Systems by Operating System, Worldwide.

[Carlson'10] Carlson, A., Betteridge, J., Kisiel, B., Settles, B., Hruschka Jr, E. R., & Mitchell, T. M. (2010). Toward an Architecture for Never-Ending Language Learning. In AAAI (Vol. 5, p. 3).

[Dong‟14] Dong, X. L., Murphy, K., Gabrilovich, E., Heitz, G., Horn, W., Lao, N., Strohmann, T., Sun, S. & Zhang, W.

(2014). Knowledge Vault: A Web-scale approach to probabilistic knowledge fusion.

[Niu‟12] Niu, F., Zhang, C., Ré, C., & Shavlik, J. W. (2012). DeepDive: Web-scale Knowledge-base Construction using Statistical Learning and Inference. In VLDS (pp. 25-28).

[Richardson'06] Richardson, M., & Domingos, P. (2006). Markov logic networks. Machine Learning, 62(1-2), 107-136.

[Niepert'14] Niepert, M., & Van den Broeck, G. (2014). Tractability through exchangeability: A new perspective on efficient probabilistic inference. Proceedings of AAAI.

[Vardi‟82] Vardi, M. Y. (1982). The complexity of relational query languages. In Proceedings of the fourteenth annual

ACM symposium on Theory of computing (pp. 137-146). ACM.

References

[Suciu‟11]

Suciu, D., Olteanu, D., Ré, C., & Koch, C. (2011). Probabilistic databases. Synthesis Lectures on Data Management, 3(2), 1-180

[Olteanu'08] Olteanu, D., & Huang, J. (2008). Using OBDDs for efficient query evaluation on probabilistic databases. In Scalable Uncertainty Management (pp. 326-340). Springer Berlin Heidelberg.

[Jha'13] Jha, A., & Suciu, D. (2013). Knowledge compilation meets database theory: compiling queries to decision diagrams. Theory of Computing Systems, 52(3), 403-440.

[Dalvi‟04] Dalvi, N. and Suciu, D. 2004. Efficient query evaluation on probabilistic databases. In VLDB.

[V.d.Broeck‟11a]

Van den Broeck, G., Taghipour, N., Meert, W., Davis, J., & De Raedt, L. (2011, July). Lifted probabilistic inference by first-order knowledge compilation. In Proceedings of the Twenty-Second international joint conference on Artificial Intelligence (pp. 2178-2185). AAAI Press.

[Dalvi‟12]

Dalvi, N., & Suciu, D. (2012). The dichotomy of probabilistic inference for unions of conjunctive queries. Journal of the ACM (JACM), 59(6), 30.

References

[Chavira‟05]

Chavira, M., & Darwiche, A. (2005). Compiling Bayesian networks with local structure. In IJCAI (Vol. 5, pp. 1306-1312).

[Sang'05] Sang, T., Beame, P., & Kautz, H. A. (2005). Performing Bayesian inference by weighted model counting. In AAAI (Vol. 5, pp. 475-481).

[Chavira'08] Chavira, M., & Darwiche, A. (2008). On probabilistic inference by weighted model counting. Artificial Intelligence, 172(6), 772-799.

[Choi‟13] Choi, A., Kisa, D., & Darwiche, A. (2013). Compiling probabilistic graphical models using sentential decision

diagrams. In Symbolic and Quantitative Approaches to Reasoning with Uncertainty (pp. 121-132). Springer Berlin Heidelberg.

[Chavira‟06] Chavira, M., Darwiche, A., & Jaeger, M. (2006). Compiling relational Bayesian networks for exact inference.

International Journal of Approximate Reasoning, 42(1), 4-20.

[Fierens‟11] Fierens, D., Broeck, G. V. D., Thon, I., Gutmann, B., & De Raedt, L. (2011). Inference in probabilistic logic programs using weighted CNF's. in UAI.

References

[Fierens‟13]

Fierens, D., Van den Broeck, G., Renkens, J., Shterionov, D., Gutmann, B., Thon, I., ... & De Raedt, L. (2013). Inference and learning in probabilistic logic programs using weighted boolean formulas. Theory and Practice of Logic Programming, 1-44.

[Gogate‟11]

Gogate, V., & Domingos, P. (2012). Probabilistic theorem proving. Proceedings of Uncertainty in AI.

[V.d.Broeck'13a] Van den Broeck, G. (2013). Lifted Inference and Learning in Statistical Relational Models. Ph.D. Dissertation, KU Leuven.

[V.d.Broeck'14] Van den Broeck, G., Meert, W., & Darwiche, A. (2014). Skolemization for weighted first-order model counting. Proceedings of KR.

[Gribkoff‟14] Gribkoff, E., Van den Broeck, G., & Suciu, D. (2014). Understanding the Complexity of Lifted Inference and

Asymmetric Weighted Model Counting. Proceedings of Uncertainty in AI. [Jha‟12]

Jha, A., & Suciu, D. (2012). Probabilistic databases with MarkoViews. Proceedings of the VLDB Endowment, 5(11), 1160-1171.

References

[V.d.Broeck‟11b]

Van den Broeck, G. (2011). On the completeness of first-order knowledge compilation for lifted probabilistic inference. In Advances in Neural Information Processing Systems (pp. 1386-1394).

[Jaeger-‟12] Jaeger, M., & Van den Broeck, G. (2012, August). Liftability of probabilistic inference: Upper and lower

bounds. In Proceedings of the 2nd International Workshop on Statistical Relational AI .

[Cook‟71] Cook, S. A. (1971, May). The complexity of theorem-proving procedures. In Proceedings of the third annual ACM symposium on Theory of computing (pp. 151-158). ACM.

[Valiant‟79]

Valiant, L. G. (1979). The complexity of computing the permanent. Theoretical computer science, 8(2), 189-201.

[Provan'83] Provan, J. S., & Ball, M. O. (1983). The complexity of counting cuts and of computing the probability that a graph is connected. SIAM Journal on Computing, 12(4), 777-788.

[Stanley'97] Stanley, R. P. (1997). Enumerative Combinatorics. Cambridge University Press.

References

[Davis‟60]

Davis, M., & Putnam, H. (1960). A computing procedure for quantification theory. Journal of the ACM (JACM), 7(3), 201-215.

[Davis'62] Davis, M., Logemann, G., & Loveland, D. (1962). A machine program for theorem-proving. Communications of the ACM, 5(7), 394-397.

[Huang'05] Huang, J., & Darwiche, A. (2005). DPLL with a trace: From SAT to knowledge compilation. In IJCAI (Vol. 5, pp. 156-162).

[Huang‟07] Huang, J., & Darwiche, A. (2007). The Language of Search. J. Artif. Intell. Res.(JAIR), 29, 191-219.

[Beame‟14]

Beame, P., Li, J., Roy, S., & Suciu, D. (2014). Counting of Query Expressions: Limitations of Propositional Methods. In ICDT (pp. 177-188).

[Poole‟03] Poole, D. (2003, August). First-order probabilistic inference. In IJCAI (Vol. 3, pp. 985-991).

[Braz‟05]

Braz, R., Amir, E., & Roth, D. (2005, July). Lifted first-order probabilistic inference. In Proceedings of the 19th international joint conference on Artificial intelligence (pp. 1319-1325).

References

[Milch‟08]

Milch, B., Zettlemoyer, L. S., Kersting, K., Haimes, M., & Kaelbling, L. P. (2008, July). Lifted Probabilistic Inference with Counting Formulas. In AAAI (Vol. 8, pp. 1062-1068).

[Taghipour‟13] Taghipour, N., Fierens, D., Davis, J., & Blockeel, H. (2014). Lifted variable elimination: Decoupling the

operators from the constraint language. JAIR

[V.d.Broeck‟12a] Van den Broeck, G., & Davis, J. (2012, July). Conditioning in First-Order Knowledge Compilation and Lifted Probabilistic Inference. In AAAI.

[Jaimovich‟07]

Jaimovich, A., Meshi, O., & Friedman, N. (2007). Template based inference in symmetric relational Markov random fields. Proceedings of Uncertainty in AI

[Singla'08] Singla, P., & Domingos, P. (2008). Lifted First-Order Belief Propagation. In AAAI (Vol. 8, pp. 1094-1099).

[Kersting'09] Kersting, K., Ahmadi, B., & Natarajan, S. (2009). Counting belief propagation. In Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence (pp. 277-284). AUAI Press.

References

[Sen'08]

Sen, P., Deshpande, A., & Getoor, L. (2008). Exploiting shared correlations in probabilistic databases. Proceedings of the VLDB Endowment, 1(1), 809-820.

[Sen‟09] Sen, P., Deshpande, A., & Getoor, L. (2009). Bisimulation-based approximate lifted inference. In

Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence (pp. 496-505). AUAI Press.

[Gogate‟12] Gogate, V., Jha, A. K., & Venugopal, D. (2012). Advances in Lifted Importance Sampling. In AAAI.

[V.d.Broeck‟12b]

Van den Broeck, G., Choi, A., & Darwiche, A. (2012). Lifted relax, compensate and then recover: From approximate to exact lifted probabilistic inference. Proceedings of Uncertainty in AI

[Niepert'12] Niepert, M. (2012). Markov chains on orbits of permutation groups. Proceedings of Uncertainty in AI.

[Niepert‟13] Niepert, M. (2013). Symmetry-Aware Marginal Density Estimation. Proceedings of AAAI.

[Venugopal‟12] Venugopal, D., & Gogate, V. (2012). On lifting the gibbs sampling algorithm. In Advances in Neural

Information Processing Systems (pp. 1655-1663).

References

[Choi'12]

Choi, J., & Amir, E. (2012). Lifted relational variational inference. Proceedings of Uncertainty in AI [Bui‟12]

Bui, H. H., Huynh, T. N., & Riedel, S. (2012). Automorphism groups of graphical models and lifted variational inference. StarAI

[Mladenov‟14] Mladenov, M., Kersting, K., & Globerson, A. (2014). Efficient Lifting of MAP LP Relaxations Using k-Locality.

In Proceedings of the Seventeenth International Conference on Artificial Intelligence and Statistics (pp. 623-632).

[Apsel‟14] Apsel, U., Kersting, K., & Mladenov, M. (2014). Lifting Relational MAP-LPs using Cluster Signatures.

Proceedings of AAAI

[Ahmadi‟11] Ahmadi, B., Kersting, K., & Sanner, S. (2011, July). Multi-evidence lifted message passing, with application

to pagerank and the kalman filter. In IJCAI Proceedings-International Joint Conference on Artificial Intelligence (Vol. 22, No. 1, p. 1152).

[Choi‟11] Choi, J., Guzman-Rivera, A., & Amir, E. (2011, June). Lifted Relational Kalman Filtering. In IJCAI (pp. 2092-

2099).

References

[Mladenov'12]

Mladenov, M., Ahmadi, B., & Kersting, K. (2012). Lifted linear programming. In International Conference on Artificial Intelligence and Statistics (pp. 788-797).

[Kimmig‟14] Kimmig, A., Mihalkova, L., & Getoor, L. (2014). Lifted graphical models: a survey. Machine Learning, 1-45.

[V.d.Broeck‟13b] Van den Broeck, G., & Darwiche, A. (2013). On the complexity and approximation of binary evidence in lifted

inference. In Advances in Neural Information Processing Systems (pp. 2868-2876).
