reasoning with bayesian networksusers.cecs.anu.edu.au/~jinbo/pr/lecture01.pdf · lecture 1:...

Probability Calculus Bayesian Networks Reasoning with Bayesian Networks Lecture 1: Probability Calculus, Bayesian Networks Jinbo Huang NICTA and ANU Jinbo Huang Reasoning with Bayesian Networks

Upload: trandung

Post on 16-May-2018

Probability Calculus | Bayesian Networks

Reasoning with Bayesian Networks
Lecture 1: Probability Calculus, Bayesian Networks

Jinbo Huang

NICTA and ANU

Jinbo Huang Reasoning with Bayesian Networks

Overview of the Course

- Probability calculus, Bayesian networks
- Inference by variable elimination, factor elimination, conditioning
- Models for graph decomposition
- Most likely instantiations
- Complexity of probabilistic inference
- Compiling Bayesian networks
- Inference with local structure
- Applications

Probabilities | Updating Beliefs | Independence | Properties of Beliefs | Soft Evidence

Probabilities of Worlds

Worlds are modeled with propositional variables

A world is a complete instantiation of the variables

The probabilities of all worlds add up to 1

An event is a set of worlds

Probabilities of Events

$$\Pr(\alpha) \overset{\text{def}}{=} \sum_{\omega \in \alpha} \Pr(\omega)$$

- Pr(Earthquake) = Pr(ω₁) + Pr(ω₂) + Pr(ω₃) + Pr(ω₄) = .1
- Pr(¬Burglary) = .8
- Pr(Alarm) = .2442
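The definition of an event's probability can be sketched in Python. The joint table the slide refers to is not part of this transcript, so the `JOINT` values below are an assumption, chosen only so that they reproduce the three probabilities quoted above; the world ordering and the helper name `pr` are illustrative as well.

```python
# Assumed joint distribution over worlds (earthquake, burglary, alarm).
# Illustrative values: the original table is not in the transcript.
JOINT = {
    (True, True, True): 0.0190, (True, True, False): 0.0010,
    (True, False, True): 0.0560, (True, False, False): 0.0240,
    (False, True, True): 0.1620, (False, True, False): 0.0180,
    (False, False, True): 0.0072, (False, False, False): 0.7128,
}


def pr(event):
    """Pr(event): add up the probabilities of the worlds where the event holds."""
    return sum(p for world, p in JOINT.items() if event(*world))


pr_earthquake = pr(lambda e, b, a: e)
pr_not_burglary = pr(lambda e, b, a: not b)
pr_alarm = pr(lambda e, b, a: a)

print(round(pr_earthquake, 4), round(pr_not_burglary, 4), round(pr_alarm, 4))
```

Representing an event as a predicate over worlds, rather than as an explicit set, keeps the sketch short; either representation matches the definition.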

Propositional Sentences as Events

Pr((Earthquake ∨ Burglary) ∧ ¬Alarm) = ?

Interpret a world ω as a model (satisfying assignment) for a propositional sentence α

$$\Pr(\alpha) \overset{\text{def}}{=} \sum_{\omega \models \alpha} \Pr(\omega)$$
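As a rough illustration, the query above can be evaluated by summing over the worlds that satisfy the sentence. The joint table is again an assumption (it is not in the transcript and was chosen to agree with the probabilities quoted on the earlier slides), so the resulting number holds only for these assumed values; `pr_sentence` and `query` are hypothetical names.

```python
# Assumed joint distribution over worlds (earthquake, burglary, alarm).
JOINT = {
    (True, True, True): 0.0190, (True, True, False): 0.0010,
    (True, False, True): 0.0560, (True, False, False): 0.0240,
    (False, True, True): 0.1620, (False, True, False): 0.0180,
    (False, False, True): 0.0072, (False, False, False): 0.7128,
}


def pr_sentence(sentence):
    """Sum Pr(w) over the worlds w that are models of the sentence."""
    return sum(p for world, p in JOINT.items() if sentence(*world))


def query(e, b, a):
    """(Earthquake or Burglary) and not Alarm."""
    return (e or b) and not a


answer = pr_sentence(query)
print(round(answer, 4))
```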

Updating Beliefs

Suppose we’re given evidence β that Alarm = true

Need to update Pr(.) to Pr(.|β)

Updating Beliefs

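Pr(β|β) = 1, Pr(¬β|β) = 0

Partition worlds into those ⊨ β and those ⊨ ¬β

- Pr(ω|β) = 0 for all ω ⊨ ¬β
- Relative beliefs stay put for ω ⊨ β
- $\sum_{\omega \models \beta} \Pr(\omega) = \Pr(\beta)$
- $\sum_{\omega \models \beta} \Pr(\omega \mid \beta) = 1$
- The above imply that we normalize by the constant Pr(β)

$$\Pr(\omega \mid \beta) \overset{\text{def}}{=} \begin{cases} 0, & \text{if } \omega \models \neg\beta \\ \Pr(\omega)/\Pr(\beta), & \text{if } \omega \models \beta \end{cases}$$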
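The update on worlds can be sketched as follows. This is a minimal sketch under the assumption of the joint table below (not in the transcript; chosen to match the probabilities quoted on earlier slides); `condition` is a hypothetical helper name.

```python
# Assumed joint distribution over worlds (earthquake, burglary, alarm).
JOINT = {
    (True, True, True): 0.0190, (True, True, False): 0.0010,
    (True, False, True): 0.0560, (True, False, False): 0.0240,
    (False, True, True): 0.1620, (False, True, False): 0.0180,
    (False, False, True): 0.0072, (False, False, False): 0.7128,
}


def condition(joint, beta):
    """Return Pr(. | beta): zero out worlds violating beta, rescale the rest."""
    pr_beta = sum(p for w, p in joint.items() if beta(*w))
    if pr_beta == 0:
        raise ValueError("cannot condition on an event of probability 0")
    return {w: (p / pr_beta if beta(*w) else 0.0) for w, p in joint.items()}


# Evidence: Alarm = true.
posterior = condition(JOINT, lambda e, b, a: a)
print(round(sum(posterior.values()), 6))
```

Dividing every surviving world by the same constant is what keeps the relative beliefs among worlds satisfying β unchanged.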

Bayes Conditioning

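$$
\begin{aligned}
\Pr(\alpha \mid \beta) &= \sum_{\omega \models \alpha} \Pr(\omega \mid \beta) \\
&= \sum_{\omega \models \alpha,\ \omega \models \beta} \Pr(\omega \mid \beta) + \sum_{\omega \models \alpha,\ \omega \models \neg\beta} \Pr(\omega \mid \beta) \\
&= \sum_{\omega \models \alpha,\ \omega \models \beta} \Pr(\omega \mid \beta) = \sum_{\omega \models \alpha \wedge \beta} \Pr(\omega \mid \beta) \\
&= \sum_{\omega \models \alpha \wedge \beta} \Pr(\omega)/\Pr(\beta) = \frac{1}{\Pr(\beta)} \sum_{\omega \models \alpha \wedge \beta} \Pr(\omega) \\
&= \frac{\Pr(\alpha \wedge \beta)}{\Pr(\beta)}
\end{aligned}
$$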

Bayes Conditioning

Pr(Burglary) = .2

Pr(Burglary | Alarm) ≈ .741 ↑

Pr(Burglary | Alarm ∧ Earthquake) ≈ .253 ↓

Pr(Burglary | Alarm ∧ ¬Earthquake) ≈ .957 ↑
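These conditional beliefs follow from Pr(α|β) = Pr(α ∧ β)/Pr(β). The sketch below recomputes them from an assumed joint table (not in the transcript; the values were chosen so that the results agree with the figures on this slide); `pr` and `pr_given` are hypothetical helper names.

```python
# Assumed joint distribution over worlds (earthquake, burglary, alarm).
JOINT = {
    (True, True, True): 0.0190, (True, True, False): 0.0010,
    (True, False, True): 0.0560, (True, False, False): 0.0240,
    (False, True, True): 0.1620, (False, True, False): 0.0180,
    (False, False, True): 0.0072, (False, False, False): 0.7128,
}


def pr(event):
    return sum(p for w, p in JOINT.items() if event(*w))


def pr_given(alpha, beta):
    """Bayes conditioning: Pr(alpha | beta) = Pr(alpha and beta) / Pr(beta)."""
    return pr(lambda *w: alpha(*w) and beta(*w)) / pr(beta)


def burglary(e, b, a):
    return b


given_alarm = pr_given(burglary, lambda e, b, a: a)
given_alarm_quake = pr_given(burglary, lambda e, b, a: a and e)
given_alarm_no_quake = pr_given(burglary, lambda e, b, a: a and not e)

print(round(given_alarm, 3), round(given_alarm_quake, 3), round(given_alarm_no_quake, 3))
```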

Independence

Pr(Earthquake) = .1

Pr(Earthquake | Burglary) = .1

Event α is independent of event β if

Pr(α|β) = Pr(α) or Pr(β) = 0

Or equivalently

Pr(α ∧ β) = Pr(α) Pr(β)

Independence is symmetric
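The product form of the definition lends itself to a direct numerical check. The following is a sketch only: the joint table is assumed (it is not in the transcript), and the tolerance `tol` is an arbitrary choice to absorb floating-point error.

```python
# Assumed joint distribution over worlds (earthquake, burglary, alarm).
JOINT = {
    (True, True, True): 0.0190, (True, True, False): 0.0010,
    (True, False, True): 0.0560, (True, False, False): 0.0240,
    (False, True, True): 0.1620, (False, True, False): 0.0180,
    (False, False, True): 0.0072, (False, False, False): 0.7128,
}


def pr(event):
    return sum(p for w, p in JOINT.items() if event(*w))


def independent(alpha, beta, tol=1e-9):
    """Check Pr(alpha and beta) == Pr(alpha) * Pr(beta) up to tol."""
    both = pr(lambda *w: alpha(*w) and beta(*w))
    return abs(both - pr(alpha) * pr(beta)) < tol


def earthquake(e, b, a):
    return e


def burglary(e, b, a):
    return b


def alarm(e, b, a):
    return a


print(independent(earthquake, burglary), independent(burglary, alarm))
```

Because the product form treats α and β identically, swapping the arguments cannot change the answer, which is the symmetry noted on the slide.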

Conditional Independence

Independence is a dynamic notion

Burglary is independent of Earthquake, but not after accepting evidence Alarm

Pr(Burglary | Alarm) ≈ .741

Pr(Burglary | Alarm ∧ Earthquake) ≈ .253

α is independent of β given γ if

Pr(α ∧ β|γ) = Pr(α|γ) Pr(β|γ) or Pr(γ) = 0
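A possible check of the conditional definition, again over an assumed joint table that is not part of the transcript; `cond_independent` is a hypothetical name and the tolerance is arbitrary.

```python
# Assumed joint distribution over worlds (earthquake, burglary, alarm).
JOINT = {
    (True, True, True): 0.0190, (True, True, False): 0.0010,
    (True, False, True): 0.0560, (True, False, False): 0.0240,
    (False, True, True): 0.1620, (False, True, False): 0.0180,
    (False, False, True): 0.0072, (False, False, False): 0.7128,
}


def pr(event):
    return sum(p for w, p in JOINT.items() if event(*w))


def pr_given(alpha, gamma):
    return pr(lambda *w: alpha(*w) and gamma(*w)) / pr(gamma)


def cond_independent(alpha, beta, gamma, tol=1e-9):
    """Pr(alpha and beta | gamma) == Pr(alpha | gamma) Pr(beta | gamma), or Pr(gamma) == 0."""
    if pr(gamma) == 0:
        return True
    joint_term = pr_given(lambda *w: alpha(*w) and beta(*w), gamma)
    return abs(joint_term - pr_given(alpha, gamma) * pr_given(beta, gamma)) < tol


def earthquake(e, b, a):
    return e


def burglary(e, b, a):
    return b


def alarm(e, b, a):
    return a


def no_evidence(e, b, a):
    return True


print(cond_independent(burglary, earthquake, no_evidence))
print(cond_independent(burglary, earthquake, alarm))
```

With the assumed table, the first call corresponds to plain independence and the second to the situation after accepting evidence Alarm.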

Variable Independence

For disjoint sets of variables X, Y, and Z

$I_{\Pr}(X, Z, Y)$: X is independent of Y given Z in Pr

This means x is independent of y given z for all instantiations x, y, z

Properties of Beliefs: Chain Rule

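$$\Pr(\alpha_1 \wedge \alpha_2 \wedge \ldots \wedge \alpha_n) = \Pr(\alpha_1 \mid \alpha_2 \wedge \ldots \wedge \alpha_n)\,\Pr(\alpha_2 \mid \alpha_3 \wedge \ldots \wedge \alpha_n) \cdots \Pr(\alpha_n)$$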
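The chain rule can be checked numerically for n = 3. This sketch assumes the joint table below (not in the transcript) and uses hypothetical helpers `conj` and `pr_given`.

```python
# Assumed joint distribution over worlds (earthquake, burglary, alarm).
JOINT = {
    (True, True, True): 0.0190, (True, True, False): 0.0010,
    (True, False, True): 0.0560, (True, False, False): 0.0240,
    (False, True, True): 0.1620, (False, True, False): 0.0180,
    (False, False, True): 0.0072, (False, False, False): 0.7128,
}


def pr(event):
    return sum(p for w, p in JOINT.items() if event(*w))


def conj(*events):
    """Conjunction of events."""
    return lambda *w: all(ev(*w) for ev in events)


def pr_given(alpha, beta):
    return pr(conj(alpha, beta)) / pr(beta)


def earthquake(e, b, a):
    return e


def burglary(e, b, a):
    return b


def alarm(e, b, a):
    return a


# Pr(E and B and A) = Pr(E | B and A) * Pr(B | A) * Pr(A)
lhs = pr(conj(earthquake, burglary, alarm))
rhs = pr_given(earthquake, conj(burglary, alarm)) * pr_given(burglary, alarm) * pr(alarm)
print(abs(lhs - rhs) < 1e-12)
```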

Properties of Beliefs: Case Analysis

$$\Pr(\alpha) = \sum_{i=1}^{n} \Pr(\alpha \wedge \beta_i)$$

Or equivalently

$$\Pr(\alpha) = \sum_{i=1}^{n} \Pr(\alpha \mid \beta_i)\,\Pr(\beta_i)$$

where the events βᵢ are mutually exclusive and collectively exhaustive

Properties of Beliefs: Case Analysis

Special case of n = 2

Pr(α) = Pr(α ∧ β) + Pr(α ∧ ¬β)

Or equivalently

Pr(α) = Pr(α|β) Pr(β) + Pr(α|¬β) Pr(¬β)

Properties of Beliefs: Bayes Rule

$$\Pr(\alpha \mid \beta) = \frac{\Pr(\beta \mid \alpha)\,\Pr(\alpha)}{\Pr(\beta)}$$

Properties of Beliefs: Bayes Rule

Example: patient tested positive for disease

- D: patient has disease
- T: test comes out positive
- Would like to know Pr(D|T)

We know

- Pr(D) = 1/1000
- Pr(T|¬D) = 2/100 (false positive)
- Pr(¬T|D) = 5/100 (false negative)

Using Bayes rule: $\Pr(D \mid T) = \dfrac{\Pr(T \mid D)\,\Pr(D)}{\Pr(T)}$

Properties of Beliefs: Bayes Rule

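Computing Pr(T): case analysis

$$
\begin{aligned}
\Pr(T) &= \Pr(T \mid D)\,\Pr(D) + \Pr(T \mid \neg D)\,\Pr(\neg D) \\
&= \frac{95}{100} \times \frac{1}{1000} + \frac{2}{100} \times \frac{999}{1000} = \frac{2093}{100000}
\end{aligned}
$$

$$\Pr(D \mid T) = \frac{95}{2093} \approx 4.5\%$$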
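The arithmetic can be reproduced exactly with rational numbers. The sketch uses only the three quantities given for the example (Pr(D) = 1/1000, false positive rate 2/100, false negative rate 5/100); the variable names are illustrative.

```python
from fractions import Fraction

pr_d = Fraction(1, 1000)              # Pr(D)
pr_t_given_not_d = Fraction(2, 100)   # false positive rate, Pr(T | not D)
pr_t_given_d = 1 - Fraction(5, 100)   # 1 - false negative rate, Pr(T | D)

# Case analysis on D gives Pr(T).
pr_t = pr_t_given_d * pr_d + pr_t_given_not_d * (1 - pr_d)

# Bayes rule gives Pr(D | T).
pr_d_given_t = pr_t_given_d * pr_d / pr_t

print(pr_t, pr_d_given_t, float(pr_d_given_t))
```

Exact fractions avoid rounding questions and make it easy to compare against the hand calculation.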

Soft Evidence

So far we’ve considered hard evidence: that some event has occurred

Soft evidence: a neighbor with a hearing problem calls to tell us they heard the alarm

May not confirm Alarm, but can still increase our belief in Alarm

Two main methods to specify strength of soft evidence

Soft Evidence: All Things Considered

“After receiving my neighbor’s call, my belief in Alarm is now .85”

Soft evidence as constraint Pr′(β) = q

Not a statement about the strength of evidence per se, but the result of its integration with our initial beliefs

Scale beliefs in worlds ⊨ β so they add up to q

Scale beliefs in worlds ⊨ ¬β so they add up to 1 − q

Soft Evidence: All Things Considered

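For worlds

$$\Pr'(\omega) \overset{\text{def}}{=} \begin{cases} \dfrac{q}{\Pr(\beta)}\,\Pr(\omega), & \text{if } \omega \models \beta \\ \dfrac{1-q}{\Pr(\neg\beta)}\,\Pr(\omega), & \text{if } \omega \models \neg\beta \end{cases}$$

For events (Jeffrey’s rule)

Pr′(α) = q Pr(α|β) + (1 − q) Pr(α|¬β)

Bayes conditioning is the special case of q = 1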
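The world-level update can be sketched as below, with β = Alarm and q = .85 as in the neighbor example. The joint table is an assumption (not in the transcript), `jeffrey_update` is a hypothetical name, and the sketch assumes 0 < Pr(β) < 1 so that both scaling factors are defined.

```python
# Assumed joint distribution over worlds (earthquake, burglary, alarm).
JOINT = {
    (True, True, True): 0.0190, (True, True, False): 0.0010,
    (True, False, True): 0.0560, (True, False, False): 0.0240,
    (False, True, True): 0.1620, (False, True, False): 0.0180,
    (False, False, True): 0.0072, (False, False, False): 0.7128,
}


def jeffrey_update(joint, beta, q):
    """Rescale worlds so those satisfying beta sum to q and the rest to 1 - q."""
    pr_beta = sum(p for w, p in joint.items() if beta(*w))
    if not 0 < pr_beta < 1:
        raise ValueError("sketch assumes 0 < Pr(beta) < 1")
    scale_in = q / pr_beta
    scale_out = (1 - q) / (1 - pr_beta)
    return {w: p * (scale_in if beta(*w) else scale_out) for w, p in joint.items()}


def alarm(e, b, a):
    return a


updated = jeffrey_update(JOINT, alarm, 0.85)
hard = jeffrey_update(JOINT, alarm, 1.0)  # q = 1 reduces to Bayes conditioning

new_pr_alarm = sum(p for w, p in updated.items() if alarm(*w))
print(round(new_pr_alarm, 4))
```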

Soft Evidence: Nothing Else Considered

Would like to declare strength of evidence independently of current beliefs

Need notion of odds of event β: $O(\beta) \overset{\text{def}}{=} \dfrac{\Pr(\beta)}{\Pr(\neg\beta)}$

Declare evidence β by Bayes factor $k = \dfrac{O'(\beta)}{O(\beta)}$

“My neighbor’s call increases odds of Alarm by factor of 4”

Compute Pr′(β) and fall back on Jeffrey’s rule to update beliefs
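Turning a Bayes factor into the new belief Pr′(β) takes two steps: multiply the prior odds by k, then convert the odds back to a probability via Pr′(β) = O′(β)/(1 + O′(β)), which follows from the definition of odds. A minimal sketch, using the prior Pr(Alarm) = .2442 quoted earlier and k = 4; the function name is hypothetical.

```python
def posterior_from_bayes_factor(prior, k):
    """Return Pr'(beta) given the prior Pr(beta) and Bayes factor k = O'(beta) / O(beta)."""
    if not 0 < prior < 1:
        raise ValueError("odds are only defined here for 0 < prior < 1")
    odds = prior / (1 - prior)        # O(beta)
    new_odds = k * odds               # O'(beta)
    return new_odds / (1 + new_odds)  # convert odds back to a probability


q = posterior_from_bayes_factor(0.2442, 4)
print(round(q, 4))
```

The resulting q would then be passed to Jeffrey's rule as the constraint Pr′(β) = q.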

Capturing Independence Graphically | Conditional Probability Tables | Bayesian Networks | Graphoid Axioms | d-separation

Probabilistic Reasoning

Basic task: belief updating in face of hard and soft evidence

Can be done in principle using definitions we’ve seen

Inefficient: requires joint probability tables (exponential in the number of variables)

Also a modeling difficulty

- Specifying joint probabilities is tedious
- Beliefs (e.g., independencies) held by human experts are not easily enforced

Capturing Independence Graphically

Evidence R could increase Pr(A), Pr(C)

But not if we know ¬A

C is independent of R given ¬A

Capturing Independence Graphically

Parents(V): variables with an edge to V

Descendants(V): variables to which there is a directed path from V

Nondescendants(V): all variables except V, Parents(V), and Descendants(V)

Markovian Assumptions

Every variable is conditionally independent of its nondescendants given its parents

I(V, Parents(V), Nondescendants(V))

Given the direct causes of a variable, our beliefs in that variable are no longer influenced by any other variable, except possibly by its effects

Jinbo Huang Reasoning with Bayesian Networks

Page 56: Reasoning with Bayesian Networksusers.cecs.anu.edu.au/~jinbo/pr/lecture01.pdf · Lecture 1: Probability Calculus, Bayesian Networks Jinbo Huang NICTA and ANU Jinbo Huang Reasoning


Capturing Independence Graphically

I(C, A, {B, E, R})

I(R, E, {A, B, C})

I(A, {B, E}, R)

I(B, ∅, {E, R})

I(E, ∅, B)
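
The five statements above can be generated mechanically from a DAG. The following is a minimal sketch, assuming the structure B → A ← E, E → R, A → C (read off from the statements themselves, since the figure is not reproduced here); the names `PARENTS`, `descendants` and `markov` are illustrative only.

```python
# Assumed structure: B -> A <- E, E -> R, A -> C.
# Each key maps a variable to the set of its parents.
PARENTS = {"B": set(), "E": set(), "A": {"B", "E"}, "R": {"E"}, "C": {"A"}}


def descendants(var, parents):
    """Variables reachable from var along a directed path."""
    children = {v: {c for c, ps in parents.items() if v in ps} for v in parents}
    found, stack = set(), [var]
    while stack:
        for child in children[stack.pop()]:
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def markov(parents):
    """Map each variable V to the pair (Parents(V), Nondescendants(V))."""
    result = {}
    for v, ps in parents.items():
        nondesc = set(parents) - {v} - ps - descendants(v, parents)
        result[v] = (ps, nondesc)
    return result


MARKOV = markov(PARENTS)
for v, (ps, nd) in MARKOV.items():
    print(f"I({v}, {sorted(ps)}, {sorted(nd)})")
```

Each printed line is one Markovian assumption I(V, Parents(V), Nondescendants(V)) for the assumed DAG.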


Page 57: Reasoning with Bayesian Networksusers.cecs.anu.edu.au/~jinbo/pr/lecture01.pdf · Lecture 1: Probability Calculus, Bayesian Networks Jinbo Huang NICTA and ANU Jinbo Huang Reasoning


Conditional Probability Tables

DAG G together with Markov(G) declares a set of independencies, but does not define Pr

Pr is uniquely defined if a conditional probability table (CPT) is provided for each variable X

CPT: Pr(x|u) for every value x of variable X and every instantiation u of its parents U


Page 58: Reasoning with Bayesian Networksusers.cecs.anu.edu.au/~jinbo/pr/lecture01.pdf · Lecture 1: Probability Calculus, Bayesian Networks Jinbo Huang NICTA and ANU Jinbo Huang Reasoning


Conditional Probability Tables

Pr(c|a)

Pr(r|e)

Pr(a|b, e)

Pr(e)

Pr(b)


Page 59: Reasoning with Bayesian Networksusers.cecs.anu.edu.au/~jinbo/pr/lecture01.pdf · Lecture 1: Probability Calculus, Bayesian Networks Jinbo Huang NICTA and ANU Jinbo Huang Reasoning


Conditional Probability Tables

Half of the probabilities are redundant

Only 10 probabilities are needed for the whole DAG
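
The count of 10 can be checked by tallying CPT entries. This is a small sketch, assuming all five variables are binary and the CPTs are Pr(b), Pr(e), Pr(a|b,e), Pr(r|e), Pr(c|a) as listed earlier; `cpt_sizes` is a hypothetical helper name.

```python
# Assumed parent sets, matching the CPTs Pr(b), Pr(e), Pr(a|b,e), Pr(r|e), Pr(c|a).
PARENTS = {"B": set(), "E": set(), "A": {"B", "E"}, "R": {"E"}, "C": {"A"}}


def cpt_sizes(parents, domain_size=2):
    """Return {variable: (total entries, independent entries)} for each CPT."""
    sizes = {}
    for v, ps in parents.items():
        rows = domain_size ** len(ps)      # number of parent instantiations u
        total = rows * domain_size         # one entry Pr(x|u) per value x and row u
        free = rows * (domain_size - 1)    # last entry of each row is 1 minus the rest
        sizes[v] = (total, free)
    return sizes


SIZES = cpt_sizes(PARENTS)
TOTAL = sum(total for total, _ in SIZES.values())
FREE = sum(free for _, free in SIZES.values())
print(TOTAL, FREE)
```

With binary variables every CPT row sums to 1, so one of its two entries is implied by the other; that is why half of the 20 entries are redundant.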


Page 61: Reasoning with Bayesian Networksusers.cecs.anu.edu.au/~jinbo/pr/lecture01.pdf · Lecture 1: Probability Calculus, Bayesian Networks Jinbo Huang NICTA and ANU Jinbo Huang Reasoning


Bayesian Network

A pair (G, Θ) where

• G is a DAG over variables Z, called the network structure

• Θ is a set of CPTs, one for each variable in Z, called the network parametrization


Page 62: Reasoning with Bayesian Networksusers.cecs.anu.edu.au/~jinbo/pr/lecture01.pdf · Lecture 1: Probability Calculus, Bayesian Networks Jinbo Huang NICTA and ANU Jinbo Huang Reasoning


Bayesian Network

$\Theta_{X|U}$: CPT for X and its parents U

XU: network family

$\theta_{x|u} = \Pr(x|u)$: network parameter

$\sum_x \theta_{x|u} = 1$ for every parent instantiation u


Page 63: Reasoning with Bayesian Networksusers.cecs.anu.edu.au/~jinbo/pr/lecture01.pdf · Lecture 1: Probability Calculus, Bayesian Networks Jinbo Huang NICTA and ANU Jinbo Huang Reasoning


Bayesian Network

Network instantiation: e.g., a, b, c, d, e

$\theta_{x|u} \sim z$: instantiations xu and z are compatible

$\theta_a, \theta_{b|a}, \theta_{c|a}, \theta_{d|b,c}, \theta_{e|c}$ are all $\sim z$ above


Page 64: Reasoning with Bayesian Networksusers.cecs.anu.edu.au/~jinbo/pr/lecture01.pdf · Lecture 1: Probability Calculus, Bayesian Networks Jinbo Huang NICTA and ANU Jinbo Huang Reasoning


Bayesian Network

Independence constraints imposed by network structure (DAG) and numeric constraints imposed by network parametrization (CPTs) are satisfied by a unique Pr

$\Pr(z) \stackrel{\mathrm{def}}{=} \prod_{\theta_{x|u} \sim z} \theta_{x|u}$ (chain rule)

A Bayesian network is an implicit representation of this probability distribution


Page 65: Reasoning with Bayesian Networksusers.cecs.anu.edu.au/~jinbo/pr/lecture01.pdf · Lecture 1: Probability Calculus, Bayesian Networks Jinbo Huang NICTA and ANU Jinbo Huang Reasoning


Chain Rule Example

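$\Pr(z) \stackrel{\mathrm{def}}{=} \prod_{\theta_{x|u} \sim z} \theta_{x|u}$ (chain rule)

$\Pr(a, b, c, d, e) = \theta_a \, \theta_{b|a} \, \theta_{c|a} \, \theta_{d|b,c} \, \theta_{e|c}$

$= (.6)(.2)(.2)(.9)(1)$

$= .0216$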
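
The product above can be reproduced in a few lines. This is a sketch only: the parent sets are read off the parameter subscripts, the first four values are those on the slide, and the last factor is taken to be 1 (an assumption; it is the value for which the stated result .0216 holds). The names `THETA` and `joint` are illustrative.

```python
from math import prod

# Parent tuples read off the subscripts: theta_a, theta_b|a, theta_c|a, theta_d|b,c, theta_e|c.
PARENTS = {"A": (), "B": ("A",), "C": ("A",), "D": ("B", "C"), "E": ("C",)}

# Only the parameters compatible with z are listed, keyed by
# (variable, value, parent instantiation). The last value (1.0) is an assumption.
THETA = {
    ("A", "a", ()): 0.6,
    ("B", "b", ("a",)): 0.2,
    ("C", "c", ("a",)): 0.2,
    ("D", "d", ("b", "c")): 0.9,
    ("E", "e", ("c",)): 1.0,
}

Z = {"A": "a", "B": "b", "C": "c", "D": "d", "E": "e"}


def joint(z, parents, theta):
    """Chain rule: multiply the one parameter per variable that is compatible with z."""
    return prod(theta[(x, z[x], tuple(z[p] for p in parents[x]))] for x in parents)


P = joint(Z, PARENTS, THETA)
print(round(P, 6))
```

Exactly one parameter per network family is compatible with a complete instantiation, so the product always has as many factors as there are variables.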


Page 66: Reasoning with Bayesian Networksusers.cecs.anu.edu.au/~jinbo/pr/lecture01.pdf · Lecture 1: Probability Calculus, Bayesian Networks Jinbo Huang NICTA and ANU Jinbo Huang Reasoning


Compactness of Bayesian Networks

Let d be the maximum variable domain size and k the maximum number of parents

Size of any CPT = $O(d^{k+1})$

Total number of network parameters = $O(n \cdot d^{k+1})$ for an n-variable network

Compact as long as k is relatively small, compared with the joint probability table (up to $d^n$ entries)
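
To get a feel for the gap between the two bounds, the sketch below evaluates both for one hypothetical setting (n = 20 binary variables, at most k = 3 parents each; these numbers are illustrative, not from the lecture).

```python
def network_parameter_bound(n, d, k):
    """Upper bound n * d^(k+1) on the number of CPT entries in the network."""
    return n * d ** (k + 1)


def joint_table_size(n, d):
    """Number of entries d^n in the explicit joint probability table."""
    return d ** n


N, D, K = 20, 2, 3  # hypothetical network size
NETWORK = network_parameter_bound(N, D, K)
JOINT = joint_table_size(N, D)
print(NETWORK, JOINT)
```

The network bound grows linearly in n for fixed k, while the joint table grows exponentially in n.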


Page 67: Reasoning with Bayesian Networksusers.cecs.anu.edu.au/~jinbo/pr/lecture01.pdf · Lecture 1: Probability Calculus, Bayesian Networks Jinbo Huang NICTA and ANU Jinbo Huang Reasoning


Properties of Probabilistic Independence

Distribution Pr represented by Bayesian network (G, Θ) is guaranteed to satisfy every independence assumption in Markov(G)

$I_{\Pr}(X, \mathrm{Parents}(X), \mathrm{Nondescendants}(X))$

It may satisfy more

Further independencies can be derived using the graphoid axioms


Page 68: Reasoning with Bayesian Networksusers.cecs.anu.edu.au/~jinbo/pr/lecture01.pdf · Lecture 1: Probability Calculus, Bayesian Networks Jinbo Huang NICTA and ANU Jinbo Huang Reasoning


Graphoid Axioms: Symmetry

$I_{\Pr}(X, Z, Y) \leftrightarrow I_{\Pr}(Y, Z, X)$

If learning y does not influence our belief in x, then learning x does not influence our belief in y either


Page 69: Reasoning with Bayesian Networksusers.cecs.anu.edu.au/~jinbo/pr/lecture01.pdf · Lecture 1: Probability Calculus, Bayesian Networks Jinbo Huang NICTA and ANU Jinbo Huang Reasoning


Graphoid Axioms: Decomposition

$I_{\Pr}(X, Z, Y \cup W) \rightarrow I_{\Pr}(X, Z, Y) \wedge I_{\Pr}(X, Z, W)$

If learning yw does not influence our belief in x, then learning y alone, or w alone, does not influence our belief in x either

If some information is irrelevant, then any part of it is also irrelevant

The reverse does not hold in general: two pieces of information may each be irrelevant on their own, yet their combination may be relevant
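
A standard illustration of why the reverse fails is exclusive-or. The sketch below is an assumed example, not taken from the lecture: Y and W are independent fair coins and X = Y XOR W. Each of Y and W alone tells us nothing about X, yet together they determine it.

```python
from itertools import product

# Worlds are tuples (x, y, w); y and w are independent fair coins, x = y XOR w.
JOINT = {(y ^ w, y, w): 0.25 for y, w in product((0, 1), repeat=2)}


def marginal(joint, idx):
    """Marginal distribution over the tuple positions listed in idx."""
    out = {}
    for world, p in joint.items():
        key = tuple(world[i] for i in idx)
        out[key] = out.get(key, 0.0) + p
    return out


def independent(joint, left, right):
    """Check Pr(left, right) = Pr(left) * Pr(right) for all values (empty Z)."""
    pl, pr = marginal(joint, left), marginal(joint, right)
    plr = marginal(joint, left + right)
    return all(abs(plr.get(a + b, 0.0) - pl[a] * pr[b]) < 1e-12
               for a in pl for b in pr)


X, Y, W = (0,), (1,), (2,)
print(independent(JOINT, X, Y), independent(JOINT, X, W), independent(JOINT, X, Y + W))
```

Here I(X, ∅, Y) and I(X, ∅, W) both hold while I(X, ∅, Y ∪ W) fails, so the right-hand side of decomposition does not imply the left.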


Page 70: Reasoning with Bayesian Networksusers.cecs.anu.edu.au/~jinbo/pr/lecture01.pdf · Lecture 1: Probability Calculus, Bayesian Networks Jinbo Huang NICTA and ANU Jinbo Huang Reasoning


Graphoid Axioms: Decomposition

$I_{\Pr}(X, Z, Y \cup W) \rightarrow I_{\Pr}(X, Z, Y) \wedge I_{\Pr}(X, Z, W)$

In particular, decomposition can be used to strengthen Markov(G)

$I_{\Pr}(X, \mathrm{Parents}(X), W)$ for $W \subseteq \mathrm{Nondescendants}(X)$

Useful for proving the chain rule for Bayesian networks from the chain rule of probability calculus
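
The proof mentioned above can be sketched as follows. The variable ordering $X_1, \ldots, X_n$ and the symbols $u_i$ and $W_i$ are notation introduced here for illustration; the ordering is assumed to place parents before children, and all conditioning events are assumed to have positive probability.

```latex
% Order the variables X_1, ..., X_n so that parents precede children.
% u_i : the instantiation of Parents(X_i) that agrees with x_1, ..., x_{i-1}
% W_i : {X_1, ..., X_{i-1}} \ Parents(X_i), a subset of Nondescendants(X_i)
\begin{align*}
\Pr(x_1, \ldots, x_n)
  &= \prod_{i=1}^{n} \Pr(x_i \mid x_1, \ldots, x_{i-1})
     && \text{chain rule of probability calculus} \\
  &= \prod_{i=1}^{n} \Pr(x_i \mid u_i)
     && \text{by } I_{\Pr}(X_i, \mathrm{Parents}(X_i), W_i) \\
  &= \prod_{i=1}^{n} \theta_{x_i \mid u_i}
     && \text{definition of network parameters}
\end{align*}
```

The second step is where the strengthened Markov assumption is needed: the variables preceding $X_i$ in the ordering include its parents and otherwise only nondescendants.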


Page 71: Reasoning with Bayesian Networksusers.cecs.anu.edu.au/~jinbo/pr/lecture01.pdf · Lecture 1: Probability Calculus, Bayesian Networks Jinbo Huang NICTA and ANU Jinbo Huang Reasoning


Graphoid Axioms: Weak Union

$I_{\Pr}(X, Z, Y \cup W) \rightarrow I_{\Pr}(X, Z \cup Y, W)$

If information yw is not relevant to our belief in x, then partial information y will not make the rest of the information, w, relevant

In particular, weak union can be used to strengthen Markov(G)

$I_{\Pr}(X, \mathrm{Parents}(X) \cup W, \mathrm{Nondescendants}(X) \setminus W)$

for $W \subseteq \mathrm{Nondescendants}(X)$


Page 72: Reasoning with Bayesian Networksusers.cecs.anu.edu.au/~jinbo/pr/lecture01.pdf · Lecture 1: Probability Calculus, Bayesian Networks Jinbo Huang NICTA and ANU Jinbo Huang Reasoning


Graphoid Axioms: Contraction

$I_{\Pr}(X, Z, Y) \wedge I_{\Pr}(X, Z \cup Y, W) \rightarrow I_{\Pr}(X, Z, Y \cup W)$

If, after learning irrelevant information y, information w is found to be irrelevant to our belief in x, then the combined information yw must have been irrelevant to start with


Page 73: Reasoning with Bayesian Networksusers.cecs.anu.edu.au/~jinbo/pr/lecture01.pdf · Lecture 1: Probability Calculus, Bayesian Networks Jinbo Huang NICTA and ANU Jinbo Huang Reasoning


Intersection Axiom

$I_{\Pr}(X, Z \cup W, Y) \wedge I_{\Pr}(X, Z \cup Y, W) \rightarrow I_{\Pr}(X, Z, Y \cup W)$

for strictly positive distribution Pr

If information w is irrelevant given y, and information y is irrelevant given w, then their combination is irrelevant to start with

Not true in general when there are zero probabilities (determinism)
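
A small deterministic distribution shows how intersection can fail. The example below is an assumption made for illustration (it is not from the lecture): three binary variables that are always equal, X = Y = W, with Z empty. The check uses the product form Pr(x,y,z) Pr(z) = Pr(x,z) Pr(y,z), which avoids dividing by zero.

```python
from itertools import product

# Worlds are tuples (x, y, w); the three variables are always equal.
JOINT = {(0, 0, 0): 0.5, (1, 1, 1): 0.5}


def prob(joint, assignment):
    """Probability of the event that fixes each position i to assignment[i]."""
    return sum(p for world, p in joint.items()
               if all(world[i] == v for i, v in assignment.items()))


def cond_independent(joint, xs, zs, ys):
    """I(xs, zs, ys), tested as Pr(x,y,z) Pr(z) = Pr(x,z) Pr(y,z) for all values."""
    idx = xs + zs + ys
    for values in product((0, 1), repeat=len(idx)):
        a = dict(zip(idx, values))
        x = {i: a[i] for i in xs}
        y = {i: a[i] for i in ys}
        z = {i: a[i] for i in zs}
        lhs = prob(joint, a) * prob(joint, z)
        rhs = prob(joint, {**x, **z}) * prob(joint, {**y, **z})
        if abs(lhs - rhs) > 1e-12:
            return False
    return True


X, Y, W = (0,), (1,), (2,)
GIVEN_W = cond_independent(JOINT, X, W, Y)         # I(X, W, Y)
GIVEN_Y = cond_independent(JOINT, X, Y, W)         # I(X, Y, W)
COMBINED = cond_independent(JOINT, X, (), Y + W)   # I(X, {}, Y u W)
print(GIVEN_W, GIVEN_Y, COMBINED)
```

Both premises of the axiom hold, because once one copy is known the other adds nothing, but the conclusion fails since Y and W together (indeed each alone) determine X.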


Page 74: Reasoning with Bayesian Networksusers.cecs.anu.edu.au/~jinbo/pr/lecture01.pdf · Lecture 1: Probability Calculus, Bayesian Networks Jinbo Huang NICTA and ANU Jinbo Huang Reasoning


Graphical Test of Independence

Graphoid axioms produce independencies beyond Markov(G)

Derivation of independencies is nontrivial; an efficient way is needed

Linear-time graphical test: d-separation


Page 75: Reasoning with Bayesian Networksusers.cecs.anu.edu.au/~jinbo/pr/lecture01.pdf · Lecture 1: Probability Calculus, Bayesian Networks Jinbo Huang NICTA and ANU Jinbo Huang Reasoning


d-separation

$\mathrm{dsep}_G(X, Z, Y)$: every path between X and Y is blocked by Z

$\mathrm{dsep}_G(X, Z, Y) \rightarrow I_{\Pr}(X, Z, Y)$ for every Pr induced by G (soundness)

Also enjoys weak completeness (discussed later)


Page 76: Reasoning with Bayesian Networksusers.cecs.anu.edu.au/~jinbo/pr/lecture01.pdf · Lecture 1: Probability Calculus, Bayesian Networks Jinbo Huang NICTA and ANU Jinbo Huang Reasoning


d-separation: Paths and Valves


Page 78: Reasoning with Bayesian Networksusers.cecs.anu.edu.au/~jinbo/pr/lecture01.pdf · Lecture 1: Probability Calculus, Bayesian Networks Jinbo Huang NICTA and ANU Jinbo Huang Reasoning


d-separation: Path Blocking

A path is blocked if any valve on it is closed

Sequential valve → W → is closed iff W ∈ Z

Divergent valve ← W → is closed iff W ∈ Z

Convergent valve → W ← is closed iff W ∉ Z and none of Descendants(W) appears in Z
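
The three valve rules translate directly into a check on a single path. The sketch below assumes the DAG B → A ← E, E → R, A → C used for the Markovian assumptions earlier; `path_blocked` is a hypothetical helper that takes a path as a list of variables and Z as a Python set.

```python
# Assumed structure: B -> A <- E, E -> R, A -> C (variable -> set of parents).
PARENTS = {"B": set(), "E": set(), "A": {"B", "E"}, "R": {"E"}, "C": {"A"}}


def descendants(var, parents):
    """Variables reachable from var along a directed path."""
    children = {v: {c for c, ps in parents.items() if v in ps} for v in parents}
    found, stack = set(), [var]
    while stack:
        for child in children[stack.pop()]:
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def path_blocked(path, z, parents):
    """True iff some valve on the path is closed given the set z."""
    for prev, w, nxt in zip(path, path[1:], path[2:]):
        if prev in parents[w] and nxt in parents[w]:
            # Convergent valve -> W <-: closed iff neither W nor a descendant is in z.
            closed = w not in z and not (descendants(w, parents) & z)
        else:
            # Sequential or divergent valve: closed iff W is in z.
            closed = w in z
        if closed:
            return True
    return False


print(path_blocked(["B", "A", "E"], set(), PARENTS))        # convergent valve at A
print(path_blocked(["B", "A", "E"], {"C"}, PARENTS))        # descendant C of A observed
print(path_blocked(["R", "E", "A", "C"], {"A"}, PARENTS))   # sequential valve at A
```

Note that this checks one given path only; d-separation requires every path between X and Y to be blocked.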


Page 79: Reasoning with Bayesian Networksusers.cecs.anu.edu.au/~jinbo/pr/lecture01.pdf · Lecture 1: Probability Calculus, Bayesian Networks Jinbo Huang NICTA and ANU Jinbo Huang Reasoning


d-separation: Examples


Page 81: Reasoning with Bayesian Networksusers.cecs.anu.edu.au/~jinbo/pr/lecture01.pdf · Lecture 1: Probability Calculus, Bayesian Networks Jinbo Huang NICTA and ANU Jinbo Huang Reasoning


d-separation: Complexity

Need not enumerate paths between X and Y

$\mathrm{dsep}_G(X, Z, Y)$ iff X and Y are disconnected in a new DAG G′ obtained by pruning DAG G as follows

• Delete all leaves W ∉ X ∪ Y ∪ Z (repeatedly, until no such leaf remains)

• Delete all edges outgoing from Z

Linear in size of G
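
The pruning test can be sketched as below, again assuming the DAG B → A ← E, E → R, A → C. The function name `dsep` and its set-valued arguments are illustrative. For readability this version rescans the edge set on each pass, so it is not linear-time as written; connectivity in the pruned graph ignores edge directions.

```python
# Assumed structure: B -> A <- E, E -> R, A -> C (variable -> set of parents).
PARENTS = {"B": set(), "E": set(), "A": {"B", "E"}, "R": {"E"}, "C": {"A"}}


def dsep(xs, zs, ys, parents):
    """d-separation of sets xs and ys by zs, decided by pruning the DAG."""
    keep = xs | ys | zs
    nodes = set(parents)
    edges = {(p, c) for c, ps in parents.items() for p in ps}

    # Rule 1: repeatedly delete leaves that are outside X, Y and Z.
    while True:
        has_child = {p for p, _ in edges}
        leaves = (nodes - keep) - has_child
        if not leaves:
            break
        nodes -= leaves
        edges = {(p, c) for p, c in edges if c not in leaves}

    # Rule 2: delete all edges outgoing from Z.
    edges = {(p, c) for p, c in edges if p not in zs}

    # X and Y are d-separated iff no undirected path connects them now.
    neighbours = {n: set() for n in nodes}
    for p, c in edges:
        neighbours[p].add(c)
        neighbours[c].add(p)
    seen, stack = set(xs), list(xs)
    while stack:
        for m in neighbours[stack.pop()]:
            if m not in seen:
                seen.add(m)
                stack.append(m)
    return not (seen & ys)


print(dsep({"C"}, {"A"}, {"R"}, PARENTS))
print(dsep({"B"}, set(), {"E"}, PARENTS))
print(dsep({"B"}, {"C"}, {"E"}, PARENTS))
```

Deleting edges out of Z cannot create a new leaf outside X ∪ Y ∪ Z, so applying the leaf rule to exhaustion first and the edge rule second is enough in this sketch.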


Page 82: Reasoning with Bayesian Networksusers.cecs.anu.edu.au/~jinbo/pr/lecture01.pdf · Lecture 1: Probability Calculus, Bayesian Networks Jinbo Huang NICTA and ANU Jinbo Huang Reasoning


d-separation: Soundness

$\mathrm{dsep}_G(X, Z, Y) \rightarrow I_{\Pr}(X, Z, Y)$ for Pr induced by G

Prove by showing that every independence claimed by d-separation can be derived from Markov(G) using the graphoid axioms


Page 83: Reasoning with Bayesian Networksusers.cecs.anu.edu.au/~jinbo/pr/lecture01.pdf · Lecture 1: Probability Calculus, Bayesian Networks Jinbo Huang NICTA and ANU Jinbo Huang Reasoning


d-separation: Completeness?

$\mathrm{dsep}_G(X, Z, Y) \leftarrow I_{\Pr}(X, Z, Y)$?

Does not hold

• Consider X → Y → Z

• X is not d-separated from Z (given ∅)

• However, if $\theta_{y|x} = \theta_{y|\bar{x}}$, then X and Y are independent, and so are X and Z

Not surprising, as d-separation is based on structure alone
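
The counterexample can be checked numerically. The sketch below builds the joint distribution of the chain X → Y → Z with the chain rule and tests whether X and Z are independent; all parameter values are hypothetical, chosen only so that one parametrization has equal rows in the CPT of Y and the other does not.

```python
from itertools import product


def chain_joint(theta_x, theta_y_given_x, theta_z_given_y):
    """Joint over binary (x, y, z) for X -> Y -> Z, built with the chain rule."""
    joint = {}
    for x, y, z in product((True, False), repeat=3):
        px = theta_x if x else 1 - theta_x
        py = theta_y_given_x[x] if y else 1 - theta_y_given_x[x]
        pz = theta_z_given_y[y] if z else 1 - theta_z_given_y[y]
        joint[(x, y, z)] = px * py * pz
    return joint


def x_z_independent(joint):
    """Check Pr(x, z) = Pr(x) * Pr(z) for all values of X and Z."""
    for x, z in product((True, False), repeat=2):
        pxz = sum(p for w, p in joint.items() if w[0] == x and w[2] == z)
        px = sum(p for w, p in joint.items() if w[0] == x)
        pz = sum(p for w, p in joint.items() if w[2] == z)
        if abs(pxz - px * pz) > 1e-12:
            return False
    return True


THETA_Z = {True: 0.7, False: 0.1}  # hypothetical Pr(z | y) and Pr(z | not y)
EQUAL_ROWS = chain_joint(0.4, {True: 0.3, False: 0.3}, THETA_Z)
DIFFERENT_ROWS = chain_joint(0.4, {True: 0.9, False: 0.2}, THETA_Z)
print(x_z_independent(EQUAL_ROWS), x_z_independent(DIFFERENT_ROWS))
```

Both networks share the same DAG, so d-separation gives the same answer for both, yet only the first distribution makes X and Z independent.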


Page 84: Reasoning with Bayesian Networksusers.cecs.anu.edu.au/~jinbo/pr/lecture01.pdf · Lecture 1: Probability Calculus, Bayesian Networks Jinbo Huang NICTA and ANU Jinbo Huang Reasoning


d-separation: Weak Completeness

For every DAG G, there exists a parametrization Θ such that

$\mathrm{dsep}_G(X, Z, Y) \leftrightarrow I_{\Pr}(X, Z, Y)$

where Pr is induced by Bayesian network (G, Θ)

Implies that no test based on the structure G alone can improve on d-separation


Page 85: Reasoning with Bayesian Networksusers.cecs.anu.edu.au/~jinbo/pr/lecture01.pdf · Lecture 1: Probability Calculus, Bayesian Networks Jinbo Huang NICTA and ANU Jinbo Huang Reasoning


Summary

Probabilities of worlds and events, Bayes conditioning, independence, chain rule, case analysis, Bayes rule, two methods for soft evidence

Capturing independence graphically, Markovian assumptions, conditional probability tables, Bayesian networks, chain rule, graphoid axioms, d-separation
