Probabilistic Reasoning [Ch. 14]: Bayes Networks and Inference


Page 1: Outline
• Bayes Networks – Part 1
◦ Syntax
◦ Semantics
◦ Parameterized distributions
• Inference – Part 2
◦ Exact inference by enumeration
◦ Exact inference by variable elimination
◦ Approximate inference by stochastic simulation
◦ Approximate inference by Markov chain Monte Carlo

Page 2: Bayesian networks
• A simple, graphical notation for conditional independence assertions, and hence for compact specification of full joint distributions.
• Syntax:
◦ a set of nodes, one per variable
◦ a directed, acyclic graph (link ≈ "directly influences")
◦ a conditional distribution for each node given its parents: P(X_i | Parents(X_i))
• In the simplest case, the conditional distribution is represented as a conditional probability table (CPT) giving the distribution over X_i for each combination of parent values.
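As a concrete illustration of this syntax (not from the slides), a discrete Bayes net can be stored as a parent list plus a CPT per node. The sketch below uses the burglary network that appears later in the deck; only the CPT entries reused in the Page 22 calculation are confirmed by this transcript, the rest follow the usual textbook values.

```python
# A minimal sketch of the syntax: each node has a parent list and a CPT mapping
# each combination of parent values to P(node = true).
bayes_net = {
    "Burglary":   {"parents": [], "cpt": {(): 0.001}},
    "Earthquake": {"parents": [], "cpt": {(): 0.002}},
    "Alarm":      {"parents": ["Burglary", "Earthquake"],
                   "cpt": {(True, True): 0.95, (True, False): 0.94,
                           (False, True): 0.29, (False, False): 0.001}},
    "JohnCalls":  {"parents": ["Alarm"], "cpt": {(True,): 0.90, (False,): 0.05}},
    "MaryCalls":  {"parents": ["Alarm"], "cpt": {(True,): 0.70, (False,): 0.01}},
}

def p_node(net, node, value, assignment):
    """P(node = value | parent values given in `assignment`), read off the CPT."""
    key = tuple(assignment[p] for p in net[node]["parents"])
    p_true = net[node]["cpt"][key]
    return p_true if value else 1.0 - p_true

print(p_node(bayes_net, "Alarm", True, {"Burglary": False, "Earthquake": False}))  # 0.001
```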

Page 3: Bayes' Nets
• A Bayes' net is an efficient encoding of a probabilistic model of a domain.
• Questions we can ask:
◦ Inference: given a fixed BN, what is P(X | e)?
◦ Representation: given a BN graph, what kinds of distributions can it encode?
◦ Modeling: what BN is most appropriate for a given domain?

Page 4: Bayes' Net Semantics
• Let's formalize the semantics of a Bayes' net:
◦ A set of nodes, one per variable X
◦ A directed, acyclic graph
◦ A conditional distribution for each node: a collection of distributions over X, one for each combination of parents' values
◦ CPT: conditional probability table
◦ Description of a noisy "causal" process
• A Bayes net = Topology (graph) + Local Conditional Probabilities

Page 5: Topology Limits Distributions
• Given some graph topology G, only certain joint distributions can be encoded.
• The graph structure guarantees certain (conditional) independences.
• (The distribution may have additional independences not implied by the graph.)
• Adding arcs increases the set of distributions that can be encoded, but has several costs.
• Full conditioning (a fully connected graph) can encode any distribution.

Page 6: Independence in a BN
• Important question about a BN: are two nodes independent given certain evidence?
◦ If yes, we can prove it using algebra (tedious in general).
◦ If no, we can prove it with a counterexample.
• Example: X → Y → Z
◦ Question: are X and Z necessarily independent?
◦ Answer: no. Example: low pressure (X) causes rain (Y), which causes traffic (Z).
◦ X can influence Z, and Z can influence X (via Y).
◦ Addendum: they could nevertheless be independent. How?

Page 7: Causal Chains
• This configuration is a "causal chain": X → Y → Z
◦ X: Low pressure
◦ Y: Rain
◦ Z: Traffic
• Is X independent of Z given Y? Yes!
◦ Evidence along the chain "blocks" the influence.
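As an aside (not on the slide), here is a quick numeric check of the chain case, with made-up probabilities for low pressure, rain, and traffic: P(z | x, y) comes out the same for both values of x, i.e. X ⊥ Z | Y.

```python
# Minimal numeric sketch of a causal chain X -> Y -> Z.
# All probabilities below are invented purely for illustration.
import itertools

P_x = {True: 0.3, False: 0.7}                       # P(X): low pressure
P_y_given_x = {True: {True: 0.8, False: 0.2},        # P(Y | X): rain
               False: {True: 0.1, False: 0.9}}
P_z_given_y = {True: {True: 0.9, False: 0.1},        # P(Z | Y): traffic
               False: {True: 0.2, False: 0.8}}

def joint(x, y, z):
    """P(x, y, z) = P(x) P(y|x) P(z|y) for the chain."""
    return P_x[x] * P_y_given_x[x][y] * P_z_given_y[y][z]

# Check the chain independence X ⊥ Z | Y: P(z | x, y) should not depend on x.
for y, z in itertools.product([True, False], repeat=2):
    vals = []
    for x in [True, False]:
        p_xy = sum(joint(x, y, zz) for zz in [True, False])   # P(x, y)
        vals.append(joint(x, y, z) / p_xy)                    # P(z | x, y)
    assert abs(vals[0] - vals[1]) < 1e-12                     # same for both x values
print("X is independent of Z given Y in this chain.")
```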

Page 8: Common Cause
• Another basic configuration: two effects of the same cause, X ← Y → Z
◦ Y: Project due
◦ X: Newsgroup busy
◦ Z: Lab full
• Are X and Z independent? (Not in general.)
• Are X and Z independent given Y? Yes!
◦ Observing the cause blocks influence between the effects.

Page 9: Common Effect
• Last configuration: two causes of one effect (a v-structure), X → Y ← Z
◦ X: Raining
◦ Z: Ballgame
◦ Y: Traffic
• Are X and Z independent?
◦ Yes: the ballgame and the rain cause traffic, but they are not correlated.
◦ (We still need to prove they must be independent; try it!)
• Are X and Z independent given Y?
◦ No: seeing traffic puts the rain and the ballgame in competition as explanations.
• This is backwards from the other cases:
◦ Observing an effect activates influence between possible causes.
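A small numeric sketch of this "explaining away" effect (all numbers invented for illustration): without evidence, Rain and Ballgame are independent; once Traffic is observed, learning that there is a Ballgame lowers the probability of Rain.

```python
# Numeric sketch of explaining away in a v-structure X -> Y <- Z.
# X: Raining, Z: Ballgame, Y: Traffic. Numbers are made up for illustration.
import itertools

P_x = {True: 0.1, False: 0.9}   # P(Rain)
P_z = {True: 0.2, False: 0.8}   # P(Ballgame)
P_y_given_xz = {                # P(Traffic = true | Rain, Ballgame)
    (True, True): 0.95, (True, False): 0.8,
    (False, True): 0.6, (False, False): 0.1,
}

def joint(x, y, z):
    p_y = P_y_given_xz[(x, z)] if y else 1 - P_y_given_xz[(x, z)]
    return P_x[x] * P_z[z] * p_y

def p_rain(evidence):
    """P(X = true | evidence), where evidence is a dict over {'y', 'z'}."""
    def consistent(x, y, z):
        return all({'x': x, 'y': y, 'z': z}[k] == v for k, v in evidence.items())
    num = sum(joint(True, y, z) for y, z in itertools.product([True, False], repeat=2)
              if consistent(True, y, z))
    den = sum(joint(x, y, z) for x, y, z in itertools.product([True, False], repeat=3)
              if consistent(x, y, z))
    return num / den

print(p_rain({}))                       # prior P(Rain) = 0.1
print(p_rain({'z': True}))              # still 0.1: X and Z independent with no evidence
print(p_rain({'y': True}))              # traffic observed: rain becomes more likely
print(p_rain({'y': True, 'z': True}))   # ballgame explains the traffic: rain drops again
```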

Page 10: The General Case
• Any complex example can be analyzed using these three canonical cases.
• General question: in a given BN, are two variables independent (given evidence)?
• Solution: analyze the graph.

Page 11: Reachability
• Recipe: shade the evidence nodes.
• Attempt 1: if two nodes are connected by an undirected path not blocked by a shaded node, they are not conditionally independent (and if no such path exists, they are independent).
• Almost works, but not quite.
◦ Where does it break?
◦ Answer: the v-structure at T doesn't count as a link in a path unless it is "active".
[Figure: example network over nodes L, R, T, B, D; the v-structure meets at T. Edge details are not recoverable from the transcript.]

Page 12: Reachability (D-Separation)
• Question: are X and Y conditionally independent given evidence variables {Z}?
◦ Yes, if X and Y are "separated" by Z.
◦ Look for active paths from X to Y.
◦ No active paths = independence!
• A path is active if each triple along it is active:
◦ Causal chain X → B → Y where B is unobserved (either direction)
◦ Common cause X ← B → Y where B is unobserved
◦ Common effect (aka v-structure) X → B ← Y where B or one of its descendants is observed
• All it takes to block a path is a single inactive segment.
[Figure: the corresponding active and inactive triples, with observed nodes shaded]
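The rules above translate directly into a checker. The sketch below (not from the slides) enumerates undirected trails between the query nodes and tests each triple with the three rules; the example graph at the bottom is hypothetical, loosely matching the rain/ballgame/traffic nodes used earlier.

```python
# A minimal d-separation checker (a sketch for small graphs, not optimized).
# The graph is a dict: node -> list of parents.

def descendants(node, parents):
    children = {n: [c for c, ps in parents.items() if n in ps] for n in parents}
    out, stack = set(), [node]
    while stack:
        for c in children[stack.pop()]:
            if c not in out:
                out.add(c); stack.append(c)
    return out

def is_active_triple(a, b, c, parents, observed):
    if b in parents[c] and a in parents[b]:      # causal chain a -> b -> c
        return b not in observed
    if b in parents[a] and c in parents[b]:      # causal chain c -> b -> a
        return b not in observed
    if a in parents[b] and c in parents[b]:      # common effect (v-structure) at b
        return b in observed or any(d in observed for d in descendants(b, parents))
    return b not in observed                     # remaining case: common cause a <- b -> c

def d_separated(x, y, observed, parents):
    """True iff x and y are d-separated given `observed` (guaranteed independent)."""
    neighbors = {n: set(parents[n]) | {c for c, ps in parents.items() if n in ps}
                 for n in parents}
    def active_path(path):
        node = path[-1]
        if node == y:
            return True
        for nxt in neighbors[node]:
            if nxt in path:
                continue                         # only simple paths (trails)
            if len(path) >= 2 and not is_active_triple(path[-2], node, nxt, parents, observed):
                continue
            if active_path(path + [nxt]):
                return True
        return False
    return not active_path([x])

# Example with hypothetical edges: R -> T <- B, T -> D
parents = {'R': [], 'B': [], 'T': ['R', 'B'], 'D': ['T']}
print(d_separated('R', 'B', set(), parents))    # True: v-structure at T is unobserved
print(d_separated('R', 'B', {'D'}, parents))    # False: observing a descendant of T activates it
```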

Page 13: Example
[Figure: a worked d-separation example over nodes R, B, T, T', answered with the active/inactive triples; the specific query is not recoverable from the transcript, but its answer is "Yes".]

Page 14: Example
[Figure: a second worked d-separation example over nodes R, B, L, T, T', D, answered with the active/inactive triples; the specific queries are not recoverable from the transcript, but the answers shown are "Yes".]

Page 15: Example
• Variables:
◦ R: Raining
◦ T: Traffic
◦ D: Roof drips
◦ S: I'm sad
• Questions:
[Figure: d-separation questions over this network, answered using the active/inactive triples; the specific queries are not recoverable from the transcript.]

Page 16: Causality?
• When Bayes' nets reflect the true causal patterns:
◦ Often simpler (nodes have fewer parents)
◦ Often easier to think about
◦ Often easier to elicit from experts
• BNs need not actually be causal:
◦ Sometimes no causal net exists over the domain
◦ E.g., consider the variables Traffic and Drips
◦ We end up with arrows that reflect correlation, not causation
• What do the arrows really mean?
◦ Topology may happen to encode causal structure
◦ Topology is only guaranteed to encode conditional independence

Page 17: Changing Bayes' Net Structure
• The same joint distribution can be encoded in many different Bayes' nets.
◦ Causal structure tends to be the simplest.
• Analysis question: given some edges, what other edges do you need to add?
◦ One answer: fully connect the graph.
◦ Better answer: don't make any false conditional independence assumptions.

Page 18: Example
• Topology of the network encodes conditional independence assertions:
◦ Weather is independent of the other variables.
◦ Toothache and Catch are conditionally independent, given Cavity.

Page 19: Example
• I'm at work; neighbor John calls to say my alarm is ringing, but neighbor Mary doesn't call. Sometimes the alarm is set off by minor earthquakes. Is there a burglar?
• Variables: Burglar, Earthquake, Alarm, JohnCalls, MaryCalls
• Network topology reflects "causal" knowledge:
◦ A burglar can set the alarm off
◦ An earthquake can set the alarm off
◦ The alarm can cause Mary to call
◦ The alarm can cause John to call

Page 20: Example contd.
• Probabilities are derived from prior observations.
[Figure: the burglary network with its CPTs; e.g., P(B) = 0.001, P(E) = 0.002, P(A|¬B,¬E) = 0.001, P(J|A) = 0.9, P(M|A) = 0.7, as used in the calculation on Page 22.]

Page 21: Compactness
• A CPT for Boolean X_i with k Boolean parents has 2^k rows, one for each combination of parent values.
• Each row requires one number p for X_i = true (the number for X_i = false is just 1 - p).
• If each variable has no more than k parents, the complete network requires O(n · 2^k) numbers.
• I.e., it grows linearly with n, vs. O(2^n) for the full joint distribution.
• For the burglary net: 1 + 1 + 4 + 2 + 2 = 10 numbers (vs. 2^5 - 1 = 31).
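The count in the last bullet can be reproduced mechanically; a tiny sketch:

```python
# Parameter count for the burglary net (structure from the slides).
parents = {'Burglary': [], 'Earthquake': [], 'Alarm': ['Burglary', 'Earthquake'],
           'JohnCalls': ['Alarm'], 'MaryCalls': ['Alarm']}

# Each Boolean node needs 2^(#parents) independent numbers.
counts = {node: 2 ** len(ps) for node, ps in parents.items()}
print(counts)                 # {'Burglary': 1, 'Earthquake': 1, 'Alarm': 4, 'JohnCalls': 2, 'MaryCalls': 2}
print(sum(counts.values()))   # 10, vs. 2**5 - 1 == 31 for the full joint
```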

Page 22: Global semantics
• "Global" semantics defines the full joint distribution as the product of the local conditional distributions:
P(x_1, …, x_n) = ∏_{i=1}^{n} P(x_i | parents(X_i))
• E.g., P(j ∧ m ∧ a ∧ ¬b ∧ ¬e)
= P(j|a) P(m|a) P(a|¬b,¬e) P(¬b) P(¬e)
= 0.9 × 0.7 × 0.001 × 0.999 × 0.998
≈ 0.00063
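The same product, reproduced in a few lines of Python (only the CPT entries used on this slide are included):

```python
# Reproducing P(j ∧ m ∧ a ∧ ¬b ∧ ¬e) for the burglary net.
P_b = 0.001                      # P(Burglary)
P_e = 0.002                      # P(Earthquake)
P_a_given_nb_ne = 0.001          # P(Alarm | ¬Burglary, ¬Earthquake)
P_j_given_a = 0.9                # P(JohnCalls | Alarm)
P_m_given_a = 0.7                # P(MaryCalls | Alarm)

p = P_j_given_a * P_m_given_a * P_a_given_nb_ne * (1 - P_b) * (1 - P_e)
print(p)                         # ≈ 0.000628, which the slide rounds to 0.00063
```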

Page 23: Local semantics
• Local semantics: each node is conditionally independent of its nondescendants given its parents.
• Theorem: Local semantics ⇔ global semantics

Page 24: Markov blanket
• Each node is conditionally independent of all others given its Markov blanket: parents + children + children's parents.
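For a concrete illustration (not from the slides), a helper that reads a node's Markov blanket off the parent lists of the burglary network:

```python
# Sketch: Markov blanket = parents + children + children's other parents.
def markov_blanket(node, parents):
    children = [c for c, ps in parents.items() if node in ps]
    co_parents = {p for c in children for p in parents[c]} - {node}
    return set(parents[node]) | set(children) | co_parents

parents = {'Burglary': [], 'Earthquake': [], 'Alarm': ['Burglary', 'Earthquake'],
           'JohnCalls': ['Alarm'], 'MaryCalls': ['Alarm']}
print(markov_blanket('Burglary', parents))   # {'Alarm', 'Earthquake'}: child plus child's other parent
print(markov_blanket('Alarm', parents))      # {'Burglary', 'Earthquake', 'JohnCalls', 'MaryCalls'}
```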

Page 25: Constructing Bayesian networks
• We need a method such that a series of locally testable assertions of conditional independence guarantees the required global semantics.
• 1. Choose an ordering of the variables X_1, …, X_n
• 2. For i = 1 to n:
add X_i to the network
select parents from X_1, …, X_{i-1} such that
P(X_i | Parents(X_i)) = P(X_i | X_1, …, X_{i-1})
• This choice of parents guarantees the global semantics:
P(X_1, …, X_n) = ∏_{i=1}^{n} P(X_i | X_1, …, X_{i-1})   (chain rule)
               = ∏_{i=1}^{n} P(X_i | Parents(X_i))       (by construction)
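A sketch of step 2 as code, assuming a hypothetical conditional-independence oracle `cond_indep(X, rest, given)` for the target distribution (the slides do not say how such an oracle would be implemented; it should answer True whenever `rest` is empty):

```python
# Sketch of the construction loop: for each variable in the chosen ordering,
# pick a minimal subset of its predecessors that screens it off from the rest.
from itertools import combinations

def build_network(ordering, cond_indep):
    parents = {}
    for i, X in enumerate(ordering):
        predecessors = ordering[:i]
        parents[X] = predecessors                 # fallback: all predecessors
        for size in range(len(predecessors) + 1): # try smaller parent sets first
            for candidate in combinations(predecessors, size):
                rest = [p for p in predecessors if p not in candidate]
                # Require P(X | candidate) = P(X | all predecessors),
                # i.e. X independent of the remaining predecessors given candidate.
                if cond_indep(X, rest, list(candidate)):
                    parents[X] = list(candidate)
                    break
            else:
                continue
            break
    return parents
```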

Page 26: Example: Problem formulation
• I'm at work; neighbor John calls to say my alarm is ringing, but neighbor Mary doesn't call. Sometimes the alarm is set off by minor earthquakes. Is there a burglar?
• Variables: Burglar, Earthquake, Alarm, JohnCalls, MaryCalls
• Network topology reflects "causal" knowledge:
◦ A burglar can set the alarm off
◦ An earthquake can set the alarm off
◦ The alarm can cause Mary to call
◦ The alarm can cause John to call

Page 27: Example
• Suppose we choose the ordering M, J, A, B, E (nodes added in order: MaryCalls, JohnCalls, Alarm, Burglary, Earthquake).
• P(J|M) = P(J)? No
• P(A|J,M) = P(A|J)? No. P(A|J,M) = P(A)? No
• P(B|A,J,M) = P(B|A)? Yes
• P(B|A,J,M) = P(B)? No
• P(E|B,A,J,M) = P(E|A)? No
• P(E|B,A,J,M) = P(E|A,B)? Yes

Page 28: Example contd.
• Deciding conditional independence is hard in non-causal directions.
◦ (Causal models and conditional independence seem hardwired for humans!)
• Assessing conditional probabilities is hard in non-causal directions.
• The network is less compact: 1 + 2 + 4 + 2 + 4 = 13 numbers needed.
[Figure: the network obtained for the ordering M, J, A, B, E over MaryCalls, JohnCalls, Alarm, Burglary, Earthquake]

Page 29: Example: Car diagnosis
• Initial evidence: the car won't start.
• Testable variables (green); "broken, so fix it" variables (orange).
• Hidden variables (gray) ensure sparse structure and reduce parameters.
[Figure: the car-diagnosis network]

Page 30: Example: Car insurance
[Figure: the car-insurance network]

Page 31: Compact conditional distributions
• A CPT grows exponentially with the number of parents.
• A CPT becomes infinite with a continuous-valued parent or child.
• Solution: canonical distributions that are defined compactly.
• Deterministic nodes are the simplest case: X = f(Parents(X)) for some function f.
• E.g., Boolean functions: NorthAmerican ⇔ Canadian ∨ US ∨ Mexican
• E.g., numerical relationships among continuous variables:
∂Level/∂t = inflow + precipitation - outflow - evaporation
Page 32: Compact conditional distributions contd.
• Noisy-OR distributions model multiple noninteracting causes:
1) Parents U_1, …, U_k include all causes (can add a leak node).
2) Independent failure probability q_i for each cause alone:
P(X | U_1, …, U_j, ¬U_{j+1}, …, ¬U_k) = 1 - ∏_{i=1}^{j} q_i
• Number of parameters is linear in the number of parents.
• Example: Fever with causes Cold, Flu, Malaria and q_cold = 0.6, q_flu = 0.2, q_malaria = 0.1. Only the rows with at most one true cause are specified directly (Know); the remaining rows are inferred as products of the q_i (Infer):

Cold  Flu  Malaria   P(Fever)   P(¬Fever)
F     F    F         0.0        1.0
F     F    T         0.9        0.1
F     T    F         0.8        0.2
F     T    T         0.98       0.02 = 0.2 × 0.1
T     F    F         0.4        0.6
T     F    T         0.94       0.06 = 0.6 × 0.1
T     T    F         0.88       0.12 = 0.6 × 0.2
T     T    T         0.988      0.012 = 0.6 × 0.2 × 0.1
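The whole table can be regenerated from the three failure probabilities alone, which is the point of the noisy-OR model; a short sketch:

```python
# Recomputing the noisy-OR fever table from the failure probabilities on the slide:
# q_cold = 0.6, q_flu = 0.2, q_malaria = 0.1 (no leak node).
from itertools import product

q = {'Cold': 0.6, 'Flu': 0.2, 'Malaria': 0.1}

def p_fever(true_causes):
    """Noisy-OR: P(Fever) = 1 - product of q_i over the causes that are true."""
    p_not_fever = 1.0
    for cause in true_causes:
        p_not_fever *= q[cause]
    return 1.0 - p_not_fever     # empty product = 1, so no true cause gives P(Fever) = 0

for cold, flu, malaria in product([False, True], repeat=3):
    true_causes = [c for c, v in [('Cold', cold), ('Flu', flu), ('Malaria', malaria)] if v]
    pf = p_fever(true_causes)
    print(f"{cold!s:5} {flu!s:5} {malaria!s:5}  P(Fever)={pf:.3f}  P(¬Fever)={1 - pf:.3f}")
```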