probabilistic reasoning about possible worlds

62
Probabilistic Reasoning about Possible Worlds Possible Worlds Brian Milch Google (based on work at UC Berkeley with Stuart Russell and his group) IPAM Summer School July 8, 2011

Upload: others

Post on 23-Jun-2022

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Probabilistic Reasoning about Possible Worlds

Probabilistic Reasoning about

Possible WorldsPossible Worlds

Brian Milch

Google(based on work at UC Berkeley with Stuart Russell and his group)

IPAM Summer SchoolJuly 8, 2011

Page 2: Probabilistic Reasoning about Possible Worlds

Example: Bibliographies

Z. Ghahramani M. J. Beal

“Variational…mixtures of

factor analysers”

M. I. Jordan

“Variational

learning…”

Z. Ghahramani and M. Beal. “Variational inference for bayesian mixture of factor

analysers”. In Advances in Neural Information Processing Systems 12, 1999.

Z. Ghahramani and M. J. Beal, "Variational Inference for Bayesian Mixtures of

Factor Analyzers, " NIPS12, MIT Press, (2000).

22

Page 3: Probabilistic Reasoning about Possible Worlds

Probabilities on Possible Worlds

1.2 x 10-12 2.3 x 10-13 4.5 x 10-13

… …

3

… … …

6.7 x 10-15 8.9 x 10-15 9.0 x 10-17

Page 4: Probabilistic Reasoning about Possible Worlds

Traditional Approach

Customlearning

Custominference

Data Evidence Query

Model in

4

learningalgorithm

inferencealgorithm

Answer

specific class

Page 5: Probabilistic Reasoning about Possible Worlds

Role of Representation Language

Genericlearning

Genericinference

Data Evidence Query

Modelin

5

learningalgorithm

inferencealgorithm

Answer

inlanguage

Page 6: Probabilistic Reasoning about Possible Worlds

Outline

• Approaches to probability over possible worlds

• Bayesian Logic (BLOG)

– Generalization of Bayesian networks– Generalization of Bayesian networks

– Unknown objects

• Inference: MCMC over possible worlds

• Application: Seismic monitoring

6

Page 7: Probabilistic Reasoning about Possible Worlds

Levels of Expressivity

• Atomic: Enumerate all worlds (states)

– Examples: histogram, Markov model

• Propositional: Describe worlds using

propositions or random variables (RVs)propositions or random variables (RVs)

– Examples: propositional logic, graphical models

• First-order: Parameterize propositions or RVs

with logical variables

– Examples: first-order logic, programming languages, plate models, BLOG, Markov Logic, …

7

Page 8: Probabilistic Reasoning about Possible Worlds

Example: Rules of Chess

• 1 page in first-order logic

On(color,piece,x,y,t)

• ~100000 pages in propositional logic

WhiteKingOnC4Move12WhiteKingOnC4Move12

• ~100000000000000000000000000000000000000 pages as atomic-state model

R.B.KB.RPPP..PPP..N..N…..PP….q.pp..Q..n..n..ppp..pppr.b.kb.r

(Note: Chess is still tiny compared to the real world)

8

Page 9: Probabilistic Reasoning about Possible Worlds

First-Order Logical Languages

• For a given application, declare:

– Constant symbols: Citation1, Ghahramani, …

– Predicate symbols: IsProf(person),

AuthorOf(person, publication), …AuthorOf(person, publication), …

– Function symbols: PubCited(citation),

Name(person), Affiliation(person, date)…

9

Page 10: Probabilistic Reasoning about Possible Worlds

First-Order Structures

• Formalization of possible worlds

• Each structure specifies domain of objects, and maps:

– Constant symbols → objects– Constant symbols → objects

– Predicate symbols → relations on objects

– Function symbols → functions on objects

• Each sentence of logical language is true or false in each structure

10

Page 11: Probabilistic Reasoning about Possible Worlds

Approach 1: Probabilities for

First-Order Formulas

• Specify probabilities of sentences or formulas, as in:

– 0.2: ∀∀∀∀x IsProf(x) [Gaifman 1964]

– ∀∀∀∀x(µµµµ(IsProf(x)) = 0.7) [Halpern 1990]– ∀∀∀∀x(µµµµ(IsProf(x)) = 0.7) [Halpern 1990]

• Drawback: hard to tell if there’s exactly one distribution over structures satisfying these constraints

11

Page 12: Probabilistic Reasoning about Possible Worlds

Approach 2:

Weights for Formulas

2.5: IsProf(x) → Busy(x)

e2.5 1

e2.5 e2.5

T

T F

FIsProf(Smith)

Busy(Smith)

• Weight of world x is exp(∑ϕ wϕ #sat(ϕ ,x))

• Get probability distribution by normalizing

• Weights need to be learned jointly

12

e eF

(Markov logic [Richardson & Domingos 2006], relational Markov networks [Taskar et al. 2002])

Page 13: Probabilistic Reasoning about Possible Worlds

Approach 3: Logic Plus Coin Flips

• Coin flips are mutually independent

• Coin flips yield a structure by logical

0.8: BeingProfCausesBusy(x)

IsProf(x) ∧∧∧∧ BeingProfCausesBusy(x) → Busy(x)

• Coin flips yield a structure by logical implication; anything not implied is false

13

(probabilistic Horn abduction [Poole 1993], independent choice logic [Poole 1997],PRISM [Sato & Kameya 1997])

Page 14: Probabilistic Reasoning about Possible Worlds

Approach 4:

Conditional Probability Statements

• Specify conditional probability distribution

Busy(x) if IsProf(x) then ~ Bernoulli[0.8]else ~ Bernoulli[0.3]

• Specify conditional probability distribution (CPD) for each predicate and function

– Defines directed graphical model

– Model must be acyclic

14

(BUGS/plates [Thomas et al. 1992], relational Bayes nets [Jaeger 1997], probabilistic relational models [Friedman et al. 1999], Bayesian logic programs [Kersting & De Raedt 2001], Bayesian logic [Milch et al. 2005], multi-entity Bayes nets [Laskey 2008], many others)

Page 15: Probabilistic Reasoning about Possible Worlds

Approach 5: Stochastic Programs

• Functions are stochastic: each call may

(define (Busy is_prof) (cond is_prof(flip 0.8)(flip 0.2)))

• Functions are stochastic: each call may return a different value (but see “mem” in Church)

– Contrast with IsProf(x) in other approaches

– Store results in program variables for re-use,

implicitly define possible world

15

(Stochastic logic programs [Muggleton 1996], stochastic programs [Koller et al. 1997], IBAL [Pfeffer 2001], λO [Frank et al. 2005], Church [Goodman et al. 2008])

Page 16: Probabilistic Reasoning about Possible Worlds

Summary of Approaches

Arbitrary formulas Generative process

First-order probabilistic model

16

Formulaprobabilities

Formulaweights

Stochastic program

Logic pluscoin flips

CPD-basedmodel

Random-world semantics

Markovlogic

BLOG Church

Page 17: Probabilistic Reasoning about Possible Worlds

Bibliographies Again

Z. Ghahramani M. J. Beal

“Variational…mixtures of

factor analysers”

M. I. Jordan

“Variational

learning…”

Z. Ghahramani and M. Beal. “Variational inference for bayesian mixture of factor

analysers”. In Advances in Neural Information Processing Systems 12, 1999.

Z. Ghahramani and M. J. Beal, "Variational Inference for Bayesian Mixtures of

Factor Analyzers, " NIPS12, MIT Press, (2000).

1717

Page 18: Probabilistic Reasoning about Possible Worlds

BLOG Model: Object Existencetype Researcher; type Pub; type Citation;

guaranteed Citation Cit1, Cit2, Cit3, Cit4;

#Researcher ~ NumResearchersPrior();

#Pub ~ NumPubsPrior();

18

Page 19: Probabilistic Reasoning about Possible Worlds

BLOG Model: Relational Structuretype Researcher; type Pub; type Citation;

guaranteed Citation Cit1, Cit2, Cit3, Cit4;

#Researcher ~ NumResearchersPrior();

#Pub ~ NumPubsPrior();

random NaturalNum NumAuthors(Pub p) ~ NumAuthorsPrior();

random Researcher NthAuthor(Pub p, NaturalNum n)

19

random Researcher NthAuthor(Pub p, NaturalNum n)

if (n < NumAuthors(p)) then ~ Uniform(Res r);

random Pub PubCited(Citation c) ~ Uniform(Pub p);

Page 20: Probabilistic Reasoning about Possible Worlds

Full BLOG Model for Citationstype Researcher; type Pub; type Citation;

guaranteed Citation Cit1, Cit2, Cit3, Cit4;

#Researcher ~ NumResearchersPrior();

random String Name(Researcher r) ~ NamePrior();

#Pub ~ NumPubsPrior();

random NaturalNum NumAuthors(Pub p) ~ NumAuthorsPrior();

random Researcher NthAuthor(Pub p, NaturalNum n)

20

random Researcher NthAuthor(Pub p, NaturalNum n)

if (n < NumAuthors(p)) then ~ Uniform(Res r);

random String Title(Pub p) ~ TitlePrior();

random Pub PubCited(Citation c) ~ Uniform(Pub p);

random String Text(Citation c)

~ FormatCPD(Title(PubCited(c)),

n, Name(NthAuthor(PubCited(c), n)) for

NaturalNum n : n < NumAuthors(PubCited(c)));

Page 21: Probabilistic Reasoning about Possible Worlds

Flexibility of Number Variables

• Can have whole family of number variables for a given type:

#Pub(FirstAuthor = r) ~ PubsPerAuthorPrior();

• Number can depend on other variables:

21

For each researcher r, there is random variableindicating |p : FirstAuthor(p) = r|

#Pub(FirstAuthor = r)

if IsProf(r) then ~ ProfPubsPrior()

else ~ IndividualPubsCPD(Diligence(r));

Page 22: Probabilistic Reasoning about Possible Worlds

What Exactly Are the Objects?

#Researcher ~ NumResearchersPrior();

Objects are (Res, 1), (Res, 2), …

#Pub(FirstAuthor = r) ~ PubsPerAuthorPrior();

22

Publications with FirstAuthor = (Res, 2) are:(Pub, (FirstAuthor, (Res, 2)), 1)(Pub, (FirstAuthor, (Res, 2)), 2)…

Result: Given values of number variables, there’s no ambiguity about which publications have which first authors

Page 23: Probabilistic Reasoning about Possible Worlds

BN Defined by BLOG Model

• BN includes

– All variables that have values in any world

– All potential dependencies

#Pub

#Pub unbounded→ infinitely many Title nodes

23

Title(Pub1) Title(Pub2) Title(Pub3) …

Text(Cit1)PubCited(Cit1)

BN has notopologicalnumbering

Previous results don’t guarantee that CPDs define joint distribution

Page 24: Probabilistic Reasoning about Possible Worlds

Contingent Bayes Nets

Title(Pub1) Title(Pub2) Title(Pub3) …

Text(Cit1)PubCited(Cit1)

#Pub

PubCited(Cit1)

= Pub1

PubCited(Cit1)

= Pub2PubCited(Cit1)

= Pub3

• Contingent Bayes net (CBN)

– Each edge labeled with condition

– Each world yields subgraph of active edges

24

Text(Cit1)PubCited(Cit1)

[Milch et al., AI/Stats 2005]

Page 25: Probabilistic Reasoning about Possible Worlds

BLOG Semantics Using CBNs

• BLOG model M defines set of possible

worlds ΩM, contingent Bayes net BM

• Theorem: If for each world in ΩM, the active subgraph of B has a topological active subgraph of BM has a topological numbering, then the model M fully defines

a probability distribution on ΩM

25

Topological numbering can vary from world to world

Flexibility in modeling unknown objects, relations

Page 26: Probabilistic Reasoning about Possible Worlds

Markov Chain Monte Carlo

(MCMC) over Possible Worlds• Random walk that converges to

distribution defined by model

• Proposed move w → w′′′′ accepted

or rejected based on P(w′′′′ ) / P(w)

… …1 10037

“Bayesian…”“…analysers”

“Logical…”

“…analysers”

26

… …1 10037

“Bayesian…”“…analyzers”“Logical…”

… …1 8037

“Bayesian…”“…analysers”“Logical…”

… …1 10037

“Bayesian…”“…analysers”

“Logical…”…

85

“…analyzers”

Page 27: Probabilistic Reasoning about Possible Worlds

• Instantiate varying subsets of

the random variables

• When is this correct?

Partial Worlds

37

“…analysers”

#Pub=100

27

37

“…analyzers”

37

“…analysers”

37

“…analysers”

85

“…analyzers”

#Pub=100

#Pub=80

#Pub=100

Page 28: Probabilistic Reasoning about Possible Worlds

MCMC over Sets of Worlds• Think of partial world as

description satisfied by whole set of worlds

• Probability of partial world is probability of this whole set

37

“…analysers”

#Pub=100

28[Milch & Russell, UAI 2006]

• Theorem: MCMC over partial worlds is correct if:

• Each partial world is specific enough to affirm

the given evidence and answer the query

• Partial worlds define non-overlapping sets of

possible worlds

Page 29: Probabilistic Reasoning about Possible Worlds

Probabilities of Partial Worlds

• What’s the probability of the

set of worlds satisfying this

description?

• As in BN, it’s product of CPDs:

Pub37

“…analysers”

#Pub=100

“…analyzers”

Cit1

29

p(#Pub = 100) × p(Title(Pub37) = “…analysers” | #Pub = 100)× p(PubCited(Cit1) = Pub37 | #Pub = 100) × p(Text(Cit1) = “…analyzers” | PubCited(Cit1) = Pub37,

Title(Pub37) = “…analysers”)

Probability of partial world is product of CPDs

if the world includes all the active parents

of the variables it instantiates

Page 30: Probabilistic Reasoning about Possible Worlds

Numbering of Objects?

• Specifying publication numbers

is unnecessary and inefficient

37

“…analysers”

#Pub=100 Reducing #Pub to 80 would be rejected here

30

37

“…analyzers”

37

“…analysers”

37

“…analysers”

85

“…analyzers”

#Pub=100

#Pub=80

#Pub=100

Page 31: Probabilistic Reasoning about Possible Worlds

Abstraction over Objects

• Can use existential quantifiers,

describe larger set of worlds

• Probabilities include factorialsx

“…analysers”

#Pub=100∃∃∃∃x

31

x

“…analyzers”

x

“…analysers”

x

“…analysers”

y

“…analyzers”

#Pub=100

#Pub=80

#Pub=100

∃∃∃∃x

∃∃∃∃x

∃∃∃∃ x, ∃∃∃∃y≠≠≠≠x

Page 32: Probabilistic Reasoning about Possible Worlds

MCMC over Partial Worlds

… …1 10037

“Bayesian…”“…analysers”

“Logical…”…

85

“…analyzers”

x

“…analysers”

y

“…analyzers”

#Pub=100

∃∃∃∃ x, ∃∃∃∃y≠≠≠≠x

Complete world Partial world

• Derived conditions under which generic MCMC engine can use partial worlds

• Engine avoids spending time changing irrelevant aspects of world

32

Page 33: Probabilistic Reasoning about Possible Worlds

Experiment: Citation Data Set

• Citeseer created four sets of 300-500 citations by searching for certain strings (“face”, “reinforcement”, “reasoning”, “constraint”)

• Lots of noise in citation strings• Lots of noise in citation strings

33

[Lashkari et al 94] Collaborative Interface Agents, Yezdi

Lashkari, Max Metral, and Pattie Maes, Proceedings of the

Twelfth National Conference on Articial Intelligence, MIT

Press, Cambridge, MA, 1994.

Metral M. Lashkari, Y. and P. Maes. Collaborative interface

agents. In Conference of the American Association for

Artificial Intelligence, Seattle, WA, August 1994.

Page 34: Probabilistic Reasoning about Possible Worlds

Experiment: Accuracy Results

0.15

0.2

0.25

Err

or

rate

(Fra

ctio

n o

f clu

ste

rs n

ot re

co

ve

red

exactly)

Phrase Matching[Lawrence et al. 1999]

MCMC over Partial Worlds

34

Four data sets of ~300-500 citations, referring to ~150-300 papers

0

0.05

0.1

Reinforce Face Reason Constraint

Err

or

rate

(Fra

ctio

n o

f clu

ste

rs n

ot re

co

ve

red

exactly)

MCMC over Partial Worlds [Pasula et al. 2003]

Conditional Random Field[Wellner et al. 2004]

Page 35: Probabilistic Reasoning about Possible Worlds

Seismological Monitoring

• Comprehensive Nuclear Test Ban Treaty (CTBT) organization responsible for detecting nuclear explosions

• Detect seismological events, then figure • Detect seismological events, then figure out if earthquakes or nuclear tests

• Vertically Integrated Seismic Analysis (VISA) project using BLOG-style model [Arora et al., NIPS 2010]

35

Page 36: Probabilistic Reasoning about Possible Worlds

254 monitoring stations

36

Page 37: Probabilistic Reasoning about Possible Worlds

37

Page 38: Probabilistic Reasoning about Possible Worlds

Vertically Integrated Seismic Analysis

• The problem is hard:

– ~10000 “detections” per day, 90% false

– CTBT system (SEL3) finds 69% of significant events plus

about twice as many spurious (nonexistent) events

– 16 human analysts find more events, correct existing ones,

38

– 16 human analysts find more events, correct existing ones,

throw out spurious events, generate LEB (“ground truth”)

– Unreliable below magnitude 4 (1kT)

• Solve it by global probabilistic inference

– NET-VISA finds around 88% of significant events

Page 39: Probabilistic Reasoning about Possible Worlds

39

Page 40: Probabilistic Reasoning about Possible Worlds

40

Page 41: Probabilistic Reasoning about Possible Worlds

41

Page 42: Probabilistic Reasoning about Possible Worlds

42

Page 43: Probabilistic Reasoning about Possible Worlds

43

Page 44: Probabilistic Reasoning about Possible Worlds

44

Page 45: Probabilistic Reasoning about Possible Worlds

45

Page 46: Probabilistic Reasoning about Possible Worlds

46

Page 47: Probabilistic Reasoning about Possible Worlds

47

Page 48: Probabilistic Reasoning about Possible Worlds

Generative model for detections• Events occur in time and space with magnitude

– Natural spatial distribution a mixture of Fisher-Binghams

– Man-made spatial distribution uniform

– Time distribution Poisson with given spatial intensity

– Magnitude distribution Gutenberg-Richter (exponential)

– Aftershock distribution (not yet implemented)

48

– Aftershock distribution (not yet implemented)

• Travel time according to IASPEI91 model plusLaplacian error distribution for each of 14 phases

• Detection depends on magnitude, distance, station

• Detected azimuth, slowness have Laplacian error

• False detections with station-dependent distribution

Page 49: Probabilistic Reasoning about Possible Worlds

Model for Seismic Monitoring# SeismicEvents ~ Poisson[TIME_DURATION*EVENT_RATE];

IsEarthQuake(e) ~ Bernoulli(.999);

EventLocation(e) ~ If IsEarthQuake(e) then EarthQuakeDistribution()

Else UniformEarthDistribution();

Magnitude(e) ~ Exponential(log(10)) + MIN_MAG;

Distance(e,s) = GeographicalDistance(EventLocation(e), SiteLocation(s));

IsDetected(e,p,s) ~ Logistic[SITE_COEFFS(s,p)](Magnitude(e), Distance(e,s);

49

IsDetected(e,p,s) ~ Logistic[SITE_COEFFS(s,p)](Magnitude(e), Distance(e,s);

#Arrivals(site = s) ~ Poisson[TIME_DURATION*FALSE_RATE(s)];

#Arrivals(event=e, phase = p, site = s) = If IsDetected(e,p,s) then 1 else 0;

Time(a) ~ If (event(a) = null) then Uniform(0,TIME_DURATION)

else IASPEI(EventLocation(event(a)),SiteLocation(site(a)),Phase(a)) + TimeRes(a);

TimeRes(a) ~ Laplace(TIMLOC(site(a)), TIMSCALE(site(a)));

Azimuth(a) ~ If (event(a) = null) then Uniform(0, 360)

else GeoAzimuth(EventLocation(event(a)),SiteLocation(site(a)) + AzRes(a);

AzRes(a) ~ Laplace(0, AZSCALE(site(a)));

Slow(a) ~ If (event(a) = null) then Uniform(0,20)

else IASPEI-SLOW(EventLocation(event(a)),SiteLocation(site(a)) + SlowRes(site(a));

Page 50: Probabilistic Reasoning about Possible Worlds

50

Page 51: Probabilistic Reasoning about Possible Worlds

Prior on Event Locations

EventLocation(e) ~ If IsEarthQuake(e) then EarthQuakeDistribution()

51

Page 52: Probabilistic Reasoning about Possible Worlds

Detection Probability

P phase S phase

IsDetected(e,p,s) ~ Logistic[SITE_COEFFS(s,p)](Magnitude(e), Distance(e,s);

52

Page 53: Probabilistic Reasoning about Possible Worlds

Travel timeTime(a) ~ IASPEI(EventLocation(event(a)),SiteLocation(site(a)),Phase(a))

+ TimeRes(a);

53

Page 54: Probabilistic Reasoning about Possible Worlds

Detected Azimuth

Azimuth(a) ~ GeoAzimuth(EventLocation(event(a)),SiteLocation(site(a))

+ AzRes(a);

54

Page 55: Probabilistic Reasoning about Possible Worlds

Fraction of LEB events missed

NET-VISA

55

Page 56: Probabilistic Reasoning about Possible Worlds

Precision-Recall Curve

56

Page 57: Probabilistic Reasoning about Possible Worlds

Event distribution: LEB vs SEL3

57

Page 58: Probabilistic Reasoning about Possible Worlds

Event distribution: LEB vs NET-VISA

58

Page 59: Probabilistic Reasoning about Possible Worlds

Why does NET-VISA work?

• Empirically calibrated seismological models

– Improving model structure and quality improves the

results

• Sound Bayesian combination of evidence

59

• Measured arrival times, phase labels, azimuths,

etc., NOT taken as truth

• Absence of detections provides negative

evidence

• More detections per event than SEL3 or LEB

Page 60: Probabilistic Reasoning about Possible Worlds

Why does NET-VISA not work (perfectly)?

• Needs hydroacoustics for mid-ocean events

• Weaknesses in model:– Travel time residuals for all phases along a single

path are assumed to be uncorrelated

– Each phase arrival is assumed to generate at

60

– Each phase arrival is assumed to generate at most one detection; in fact, multiple detectionsoccur

• Arrival detectors use high SNR thresholds, look only at local signal to make hard decisions

Page 61: Probabilistic Reasoning about Possible Worlds

Detection-based and signal-based monitoring

events

SEL3 NET-VISA SIG-VISA

61

detections

waveform signals

SEL3 NET-VISA SIG-VISA

Page 62: Probabilistic Reasoning about Possible Worlds

Summary

• CPD-based first-order probabilistic models

– Describe generative processes for worlds

– Generalize Bayesian networks

• BLOG adds support for unknown objects• BLOG adds support for unknown objects

• Can run MCMC over partial worlds

• Strong results when CPDs come from science, as in seismological monitoring

62