![Page 1: Bayesian Networks](https://reader036.vdocuments.mx/reader036/viewer/2022062411/56816698550346895dda836c/html5/thumbnails/1.jpg)
Bayesian Networks
Tamara Berg
CS 590-133 Artificial Intelligence
Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart Russell, Andrew Moore, Percy Liang, Luke Zettlemoyer, Rob Pless, Killian Weinberger, Deva Ramanan
![Page 2: Bayesian Networks](https://reader036.vdocuments.mx/reader036/viewer/2022062411/56816698550346895dda836c/html5/thumbnails/2.jpg)
Announcements
• Some students in the back are having trouble hearing the lecture due to talking.
• Please respect your fellow students. If you have a question or comment relevant to the course, please share it with all of us. Otherwise, don’t talk during lecture.
• Also, if you are having trouble hearing in the back, there are plenty of seats further forward.
![Page 3: Bayesian Networks](https://reader036.vdocuments.mx/reader036/viewer/2022062411/56816698550346895dda836c/html5/thumbnails/3.jpg)
Reminder
• HW3 was released 2/27
– Written questions only (no programming)
– Due Tuesday, 3/18, 11:59pm
![Page 4: Bayesian Networks](https://reader036.vdocuments.mx/reader036/viewer/2022062411/56816698550346895dda836c/html5/thumbnails/4.jpg)
From last class
![Page 5: Bayesian Networks](https://reader036.vdocuments.mx/reader036/viewer/2022062411/56816698550346895dda836c/html5/thumbnails/5.jpg)
Random Variables
Random variables: $X = (X_1, \dots, X_n)$
Let $x = (x_1, \dots, x_n)$ be a realization of $X$.
A random variable is some aspect of the world about which we (may) have uncertainty.
Random variables can be:
• Binary (e.g. {true, false}, {spam, ham})
• Discrete, taking on a finite set of values (e.g. {Spring, Summer, Fall, Winter})
• Continuous (e.g. [0, 1])
![Page 6: Bayesian Networks](https://reader036.vdocuments.mx/reader036/viewer/2022062411/56816698550346895dda836c/html5/thumbnails/6.jpg)
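Joint Probability Distribution
Random variables: $X = (X_1, \dots, X_n)$
Let $x = (x_1, \dots, x_n)$ be a realization of $X$.
Joint probability distribution: $P(X_1 = x_1, \dots, X_n = x_n)$
Also written $P(x_1, \dots, x_n)$
Gives a real value for all possible assignments.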
![Page 7: Bayesian Networks](https://reader036.vdocuments.mx/reader036/viewer/2022062411/56816698550346895dda836c/html5/thumbnails/7.jpg)
Queries
Joint probability distribution: $P(X_1 = x_1, \dots, X_n = x_n)$, also written $P(x_1, \dots, x_n)$
Given a joint distribution, we can reason about unobserved variables given observations (evidence):
$P(\text{stuff you care about} \mid \text{stuff you already know})$
![Page 8: Bayesian Networks](https://reader036.vdocuments.mx/reader036/viewer/2022062411/56816698550346895dda836c/html5/thumbnails/8.jpg)
Main kinds of models
• Undirected (also called Markov Random Fields): links express constraints between variables.
• Directed (also called Bayesian Networks): have a notion of causality. One can regard an arc from A to B as indicating that A "causes" B.
![Page 9: Bayesian Networks](https://reader036.vdocuments.mx/reader036/viewer/2022062411/56816698550346895dda836c/html5/thumbnails/9.jpg)
Syntax
• Directed Acyclic Graph (DAG)
• Nodes: random variables
– Can be assigned (observed) or unassigned (unobserved)
• Arcs: interactions
– An arrow from one variable to another indicates direct influence
– Encode conditional independence
– Weather is independent of the other variables
– Toothache and Catch are conditionally independent given Cavity
• Must form a directed, acyclic graph
[Figure: network with nodes Weather, Cavity, Toothache, Catch. Cavity is the parent of Toothache and Catch; Weather is unconnected.]
![Page 10: Bayesian Networks](https://reader036.vdocuments.mx/reader036/viewer/2022062411/56816698550346895dda836c/html5/thumbnails/10.jpg)
Bayes Nets
Directed graph, $G = (X, E)$
Nodes: $X$
Edges: $E$
Each node is associated with a random variable
![Page 11: Bayesian Networks](https://reader036.vdocuments.mx/reader036/viewer/2022062411/56816698550346895dda836c/html5/thumbnails/11.jpg)
Example
![Page 12: Bayesian Networks](https://reader036.vdocuments.mx/reader036/viewer/2022062411/56816698550346895dda836c/html5/thumbnails/12.jpg)
Joint Distribution
By Chain Rule (using the usual arithmetic ordering)
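The equation for this slide did not survive extraction. As a reminder, the general chain rule for $n$ variables taken in index order reads as follows (the symbols $x_1, \dots, x_n$ are assumed here, matching the notation used elsewhere in the deck):

$$P(x_1, \dots, x_n) = \prod_{i=1}^{n} P(x_i \mid x_1, \dots, x_{i-1})$$

This holds for any joint distribution and any ordering; a Bayes net then replaces each conditioning set with the node's parents.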
![Page 13: Bayesian Networks](https://reader036.vdocuments.mx/reader036/viewer/2022062411/56816698550346895dda836c/html5/thumbnails/13.jpg)
Directed Graphical Models
Directed graph, $G = (X, E)$
Nodes: $X$
Edges: $E$
Each node is associated with a random variable
Definition of joint probability in a graphical model:
$$P(x_1, \dots, x_n) = \prod_{i=1}^{n} P(x_i \mid x_{\pi_i})$$
where $x_{\pi_i}$ are the parents of $x_i$
![Page 14: Bayesian Networks](https://reader036.vdocuments.mx/reader036/viewer/2022062411/56816698550346895dda836c/html5/thumbnails/14.jpg)
Example
Joint Probability:
![Page 15: Bayesian Networks](https://reader036.vdocuments.mx/reader036/viewer/2022062411/56816698550346895dda836c/html5/thumbnails/15.jpg)
Example
[Figure: example network with binary (0/1) conditional probability tables; the table layout is not recoverable from the extracted text.]
![Page 16: Bayesian Networks](https://reader036.vdocuments.mx/reader036/viewer/2022062411/56816698550346895dda836c/html5/thumbnails/16.jpg)
Size of a Bayes’ Net
• How big is a joint distribution over N Boolean variables?
$2^N$
• How big is an N-node net if nodes have up to k parents?
$O(N \cdot 2^{k+1})$
• Both give you the power to calculate $P(X_1, \dots, X_N)$
• BNs: Huge space savings!
• Also easier to elicit local CPTs
• Also turns out to be faster to answer queries
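To make the space savings concrete, here is a small sketch that evaluates both size formulas. The function names and the values N = 30, k = 5 are illustrative assumptions, not taken from the slides.

```python
def joint_table_size(n_vars: int) -> int:
    """Entries in a full joint table over n Boolean variables."""
    return 2 ** n_vars


def bayes_net_size_bound(n_vars: int, max_parents: int) -> int:
    """Upper bound on CPT entries: each of the n nodes has a table over
    itself plus at most k parents, i.e. at most 2^(k+1) entries."""
    return n_vars * 2 ** (max_parents + 1)


n, k = 30, 5  # illustrative values, not from the slides
print(joint_table_size(n))         # 1073741824
print(bayes_net_size_bound(n, k))  # 1920
```

The bound counts both the true and false rows of every CPT; storing only one of them would halve it without changing the comparison.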
![Page 17: Bayesian Networks](https://reader036.vdocuments.mx/reader036/viewer/2022062411/56816698550346895dda836c/html5/thumbnails/17.jpg)
The joint probability distribution
For example, P(j, m, a, ¬b, ¬e) = P(¬b) P(¬e) P(a | ¬b, ¬e) P(j | a) P(m | a)
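The factorization above can be evaluated as a plain product once the CPT entries are known. The slides do not list the numbers for the alarm network, so the values below are assumed purely for illustration (they are the ones commonly used with this textbook example).

```python
# Illustrative CPT entries (assumed; the slides do not list them).
p_b = 0.001              # P(b): burglary
p_e = 0.002              # P(e): earthquake
p_a_given_nb_ne = 0.001  # P(a | not b, not e)
p_j_given_a = 0.90       # P(j | a)
p_m_given_a = 0.70       # P(m | a)

# P(j, m, a, not b, not e) = P(not b) P(not e) P(a | not b, not e) P(j | a) P(m | a)
joint = (1 - p_b) * (1 - p_e) * p_a_given_nb_ne * p_j_given_a * p_m_given_a
print(f"{joint:.6f}")  # 0.000628
```

Only five numbers are touched, one per node, which is the point of the factored representation.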
![Page 18: Bayesian Networks](https://reader036.vdocuments.mx/reader036/viewer/2022062411/56816698550346895dda836c/html5/thumbnails/18.jpg)
Independence in a BN
• Important question about a BN:
– Are two nodes independent given certain evidence?
– If yes, can prove using algebra (tedious in general)
– If no, can prove with a counterexample
– Example: X → Y → Z
– Question: are X and Z necessarily independent?
• Answer: no. Example: low pressure causes rain, which causes traffic.
• X can influence Z, Z can influence X (via Y)
• Addendum: they could be independent: how?
![Page 19: Bayesian Networks](https://reader036.vdocuments.mx/reader036/viewer/2022062411/56816698550346895dda836c/html5/thumbnails/19.jpg)
Causal Chains
• This configuration is a “causal chain”: X → Y → Z
– X: Project due
– Y: No office hours
– Z: Students panic
– Is Z independent of X given Y? Yes!
– Evidence along the chain “blocks” the influence
![Page 20: Bayesian Networks](https://reader036.vdocuments.mx/reader036/viewer/2022062411/56816698550346895dda836c/html5/thumbnails/20.jpg)
Common Cause
• Another basic configuration: two effects of the same cause: X ← Y → Z
– Y: Homework due
– X: Full attendance
– Z: Students sleepy
– Are X and Z independent?
– Are X and Z independent given Y? Yes!
– Observing the cause blocks influence between effects.
![Page 21: Bayesian Networks](https://reader036.vdocuments.mx/reader036/viewer/2022062411/56816698550346895dda836c/html5/thumbnails/21.jpg)
Common Effect
• Last configuration: two causes of one effect (v-structures): X → Y ← Z
– X: Raining
– Z: Ballgame
– Y: Traffic
– Are X and Z independent?
• Yes: the ballgame and the rain cause traffic, but they are not correlated
• Still need to prove they must be (try it!)
– Are X and Z independent given Y?
• No: seeing traffic puts the rain and the ballgame in competition as explanations
– This is backwards from the other cases
• Observing an effect activates influence between possible causes.
![Page 22: Bayesian Networks](https://reader036.vdocuments.mx/reader036/viewer/2022062411/56816698550346895dda836c/html5/thumbnails/22.jpg)
The General Case
• Any complex example can be analyzed using these three canonical cases: causal chain, common cause, and (unobserved) common effect
• General question: in a given BN, are two variables independent (given evidence)?
• Solution: analyze the graph
![Page 23: Bayesian Networks](https://reader036.vdocuments.mx/reader036/viewer/2022062411/56816698550346895dda836c/html5/thumbnails/23.jpg)
Bayes Ball
• Shade all observed nodes. Place balls at the starting node, let them bounce around according to some rules, and ask whether any of the balls reach the goal node.
• We need to know what happens when a ball arrives at a node on its way to the goal node.
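The bouncing rules themselves are on a slide whose content was an image, so the sketch below uses the standard reachability formulation of d-separation, which follows from the three canonical cases above. The function name `d_separated` and the `parents` dictionary format are assumptions made for this example.

```python
from collections import deque


def d_separated(parents, x, y, observed):
    """Return True if x and y are d-separated given the observed set.

    parents maps every node to the list of its parents.
    """
    children = {n: [] for n in parents}
    for node, ps in parents.items():
        for p in ps:
            children[p].append(node)

    # Observed nodes and their ancestors: a common effect is active
    # exactly when it, or one of its descendants, is observed.
    active_colliders = set()
    stack = list(observed)
    while stack:
        n = stack.pop()
        if n not in active_colliders:
            active_colliders.add(n)
            stack.extend(parents[n])

    # "up": the ball arrived from a child; "down": it arrived from a parent.
    queue = deque([(x, "up")])
    visited = set()
    while queue:
        node, direction = queue.popleft()
        if (node, direction) in visited:
            continue
        visited.add((node, direction))
        if node == y and node not in observed:
            return False  # the ball reached the goal node
        if direction == "up" and node not in observed:
            queue.extend((p, "up") for p in parents[node])
            queue.extend((c, "down") for c in children[node])
        elif direction == "down":
            if node not in observed:
                queue.extend((c, "down") for c in children[node])
            if node in active_colliders:
                queue.extend((p, "up") for p in parents[node])
    return True


chain = {"X": [], "Y": ["X"], "Z": ["Y"]}     # X -> Y -> Z
fork = {"Y": [], "X": ["Y"], "Z": ["Y"]}      # X <- Y -> Z
collider = {"X": [], "Z": [], "Y": ["X", "Z"]}  # X -> Y <- Z

print(d_separated(chain, "X", "Z", {"Y"}))     # True
print(d_separated(fork, "X", "Z", set()))      # False
print(d_separated(collider, "X", "Z", {"Y"}))  # False
```

The three test graphs reproduce the answers from the canonical-case slides: evidence blocks a chain and a common cause, and activates a common effect.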
![Page 24: Bayesian Networks](https://reader036.vdocuments.mx/reader036/viewer/2022062411/56816698550346895dda836c/html5/thumbnails/24.jpg)
![Page 25: Bayesian Networks](https://reader036.vdocuments.mx/reader036/viewer/2022062411/56816698550346895dda836c/html5/thumbnails/25.jpg)
Example
[Figure: Bayes ball example on a network with nodes R, T, B, T’; the answer shown on the slide is "Yes".]
![Page 26: Bayesian Networks](https://reader036.vdocuments.mx/reader036/viewer/2022062411/56816698550346895dda836c/html5/thumbnails/26.jpg)
Bayesian decision making
• Suppose the agent has to make decisions about the value of an unobserved query variable X based on the values of an observed evidence variable E
• Inference problem: given some evidence E = e, what is P(X | e)?
• Learning problem: estimate the parameters of the probabilistic model P(X | E) given training samples $\{(x_1, e_1), \dots, (x_n, e_n)\}$
![Page 27: Bayesian Networks](https://reader036.vdocuments.mx/reader036/viewer/2022062411/56816698550346895dda836c/html5/thumbnails/27.jpg)
Probabilistic inference
A general scenario:
• Query variables: X
• Evidence (observed) variables: E = e
• Unobserved variables: Y
If we know the full joint distribution P(X, E, Y), how can we perform inference about X?
$$P(X \mid E = e) = \frac{P(X, e)}{P(e)} \propto \sum_{y} P(X, e, y)$$
![Page 28: Bayesian Networks](https://reader036.vdocuments.mx/reader036/viewer/2022062411/56816698550346895dda836c/html5/thumbnails/28.jpg)
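Inference
• Inference: calculating some useful quantity from a joint probability distribution
• Examples:
– Posterior probability: $P(Q \mid E_1 = e_1, \dots, E_k = e_k)$
– Most likely explanation: $\arg\max_q P(Q = q \mid E_1 = e_1, \dots, E_k = e_k)$
[Figure: alarm network. B and E are parents of A; A is the parent of J and M.]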
![Page 29: Bayesian Networks](https://reader036.vdocuments.mx/reader036/viewer/2022062411/56816698550346895dda836c/html5/thumbnails/29.jpg)
Inference – computing conditional probabilities
Marginalization: $P(x) = \sum_{y} P(x, y)$
Conditional probabilities: $P(x \mid y) = \frac{P(x, y)}{P(y)}$
![Page 30: Bayesian Networks](https://reader036.vdocuments.mx/reader036/viewer/2022062411/56816698550346895dda836c/html5/thumbnails/30.jpg)
Inference by Enumeration
• Given unlimited time, inference in BNs is easy
• Recipe:
– State the marginal probabilities you need
– Figure out ALL the atomic probabilities you need
– Calculate and combine them
• Example: [Figure: alarm network over B, E, A, J, M.]
![Page 31: Bayesian Networks](https://reader036.vdocuments.mx/reader036/viewer/2022062411/56816698550346895dda836c/html5/thumbnails/31.jpg)
Example: Enumeration
• In this simple method, we only need the BN to synthesize the joint entries
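As a sketch of the recipe, the code below enumerates every joint entry for the attend/study/prepared/fair/pass network that this deck uses later in its variable elimination example, and answers P(attend | pass). The CPT numbers are copied from those later slides; the helper names (`bern`, `joint`, `posterior_attend_given_pass`) are assumptions for this example.

```python
from itertools import product

P_AT, P_ST, P_FA = 0.8, 0.6, 0.9
# P(prepared = T | attend, study), keyed by (attend, study)
P_PR = {(True, True): 0.9, (True, False): 0.5,
        (False, True): 0.7, (False, False): 0.1}
# P(pass = T | prepared, attend, fair), keyed by (prepared, attend, fair)
P_PA = {(True, True, True): 0.9, (True, True, False): 0.1,
        (True, False, True): 0.7, (True, False, False): 0.1,
        (False, True, True): 0.7, (False, True, False): 0.1,
        (False, False, True): 0.2, (False, False, False): 0.1}


def bern(p_true, value):
    """Probability of a Boolean value given P(value is True)."""
    return p_true if value else 1.0 - p_true


def joint(at, st, fa, pr, pa):
    """One atomic joint entry, synthesized from the local CPTs."""
    return (bern(P_AT, at) * bern(P_ST, st) * bern(P_FA, fa)
            * bern(P_PR[(at, st)], pr) * bern(P_PA[(pr, at, fa)], pa))


def posterior_attend_given_pass():
    """Sum out study, fair, prepared for each value of attend; normalize."""
    scores = {}
    for at in (True, False):
        scores[at] = sum(joint(at, st, fa, pr, True)
                         for st, fa, pr in product((True, False), repeat=3))
    total = sum(scores.values())
    return {at: s / total for at, s in scores.items()}


post = posterior_attend_given_pass()
print(round(post[True], 2))  # 0.89
```

Every one of the 16 joint entries consistent with the evidence is computed from scratch, which is exactly the repeated work that variable elimination avoids.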
![Page 32: Bayesian Networks](https://reader036.vdocuments.mx/reader036/viewer/2022062411/56816698550346895dda836c/html5/thumbnails/32.jpg)
Probabilistic inference
A general scenario:
• Query variables: X
• Evidence (observed) variables: E = e
• Unobserved variables: Y
If we know the full joint distribution P(X, E, Y), how can we perform inference about X?
$$P(X \mid E = e) = \frac{P(X, e)}{P(e)} \propto \sum_{y} P(X, e, y)$$
Problems:
• Full joint distributions are too large
• Marginalizing out Y may involve too many summation terms
![Page 33: Bayesian Networks](https://reader036.vdocuments.mx/reader036/viewer/2022062411/56816698550346895dda836c/html5/thumbnails/33.jpg)
Inference by Enumeration?
![Page 34: Bayesian Networks](https://reader036.vdocuments.mx/reader036/viewer/2022062411/56816698550346895dda836c/html5/thumbnails/34.jpg)
Variable Elimination
• Why is inference by enumeration on a Bayes Net inefficient?
– You join up the whole joint distribution before you sum out the hidden variables
– You end up repeating a lot of work!
• Idea: interleave joining and marginalizing!
– Called “Variable Elimination”
– Choosing the order to eliminate variables that minimizes work is NP-hard, but *anything* sensible is much faster than inference by enumeration
![Page 35: Bayesian Networks](https://reader036.vdocuments.mx/reader036/viewer/2022062411/56816698550346895dda836c/html5/thumbnails/35.jpg)
General Variable Elimination
• Query: $P(Q \mid E_1 = e_1, \dots, E_k = e_k)$
• Start with initial factors:
– Local CPTs (but instantiated by evidence)
• While there are still hidden variables (not Q or evidence):
– Pick a hidden variable H
– Join all factors mentioning H
– Eliminate (sum out) H
• Join all remaining factors and normalize
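The loop above can be sketched with a small factor type. This is a minimal illustration for Boolean variables only; the `Factor` class and the helpers `cpt`, `restrict`, `join`, `sum_out`, and `eliminate` are names invented for this example. The CPT numbers are those of the attend/study/prepared/fair/pass example that follows in the deck, and the elimination order (study, fair, prepared) matches it.

```python
from itertools import product

T, F = True, False


class Factor:
    """A table over Boolean variables: maps value tuples to numbers."""

    def __init__(self, variables, table):
        self.variables = tuple(variables)
        self.table = dict(table)


def cpt(child, parents, p_true):
    """Build P(child | parents) from P(child = True | parent values)."""
    table = {}
    for parent_values, p in p_true.items():
        table[(T,) + parent_values] = p
        table[(F,) + parent_values] = 1 - p
    return Factor((child,) + tuple(parents), table)


def restrict(f, var, value):
    """Instantiate evidence: keep rows with var == value, then drop var."""
    i = f.variables.index(var)
    table = {k[:i] + k[i + 1:]: p for k, p in f.table.items() if k[i] == value}
    return Factor(f.variables[:i] + f.variables[i + 1:], table)


def join(f, g):
    """Pointwise product over the union of the two variable sets."""
    variables = f.variables + tuple(v for v in g.variables if v not in f.variables)
    table = {}
    for values in product((T, F), repeat=len(variables)):
        row = dict(zip(variables, values))
        table[values] = (f.table[tuple(row[v] for v in f.variables)]
                         * g.table[tuple(row[v] for v in g.variables)])
    return Factor(variables, table)


def sum_out(f, var):
    """Marginalize var out of factor f."""
    i = f.variables.index(var)
    table = {}
    for k, p in f.table.items():
        key = k[:i] + k[i + 1:]
        table[key] = table.get(key, 0.0) + p
    return Factor(f.variables[:i] + f.variables[i + 1:], table)


def eliminate(factors, hidden_order):
    """Join and sum out each hidden variable in turn, then normalize."""
    factors = list(factors)
    for h in hidden_order:
        touching = [f for f in factors if h in f.variables]
        if not touching:
            continue
        factors = [f for f in factors if h not in f.variables]
        joined = touching[0]
        for f in touching[1:]:
            joined = join(joined, f)
        factors.append(sum_out(joined, h))
    result = factors[0]
    for f in factors[1:]:
        result = join(result, f)
    total = sum(result.table.values())
    return Factor(result.variables, {k: p / total for k, p in result.table.items()})


factors = [
    cpt("attend", (), {(): 0.8}),
    cpt("study", (), {(): 0.6}),
    cpt("fair", (), {(): 0.9}),
    cpt("prepared", ("attend", "study"),
        {(T, T): 0.9, (T, F): 0.5, (F, T): 0.7, (F, F): 0.1}),
    # Evidence pass = True is instantiated before elimination starts.
    restrict(cpt("pass", ("prepared", "attend", "fair"),
                 {(T, T, T): 0.9, (T, T, F): 0.1, (T, F, T): 0.7, (T, F, F): 0.1,
                  (F, T, T): 0.7, (F, T, F): 0.1, (F, F, T): 0.2, (F, F, F): 0.1}),
             "pass", T),
]
posterior = eliminate(factors, ["study", "fair", "prepared"])
print(round(posterior.table[(T,)], 2))  # 0.89
```

No intermediate factor here ever mentions more than three variables, whereas enumeration works over the full five-variable joint.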
![Page 36: Bayesian Networks](https://reader036.vdocuments.mx/reader036/viewer/2022062411/56816698550346895dda836c/html5/thumbnails/36.jpg)
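Example: Variable elimination
Query: What is the probability that a student attends class, given that they pass the exam?
[based on slides taken from UMBC CMSC 671, 2005]
[Figure: network. attend and study are parents of prepared; prepared, attend, and fair are parents of pass.]
P(at) = 0.8, P(st) = 0.6, P(fa) = 0.9

| at | st | P(pr \| at, st) |
|---|---|---|
| T | T | 0.9 |
| T | F | 0.5 |
| F | T | 0.7 |
| F | F | 0.1 |

| pr | at | fa | P(pa \| at, pr, fa) |
|---|---|---|---|
| T | T | T | 0.9 |
| T | T | F | 0.1 |
| T | F | T | 0.7 |
| T | F | F | 0.1 |
| F | T | T | 0.7 |
| F | T | F | 0.1 |
| F | F | T | 0.2 |
| F | F | F | 0.1 |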
![Page 37: Bayesian Networks](https://reader036.vdocuments.mx/reader036/viewer/2022062411/56816698550346895dda836c/html5/thumbnails/37.jpg)
Join study factors
[Figure: same network as on the previous slide.]
P(at) = 0.8, P(st) = 0.6, P(fa) = 0.9

| prep | study | attend | Original: P(pr \| at, st) | Original: P(st) | Joint: P(pr, st \| at) | Marginal: P(pr \| at) |
|---|---|---|---|---|---|---|
| T | T | T | 0.9 | 0.6 | 0.54 | 0.74 |
| T | F | T | 0.5 | 0.4 | 0.2 | |
| T | T | F | 0.7 | 0.6 | 0.42 | 0.46 |
| T | F | F | 0.1 | 0.4 | 0.04 | |
| F | T | T | 0.1 | 0.6 | 0.06 | 0.26 |
| F | F | T | 0.5 | 0.4 | 0.2 | |
| F | T | F | 0.3 | 0.6 | 0.18 | 0.54 |
| F | F | F | 0.9 | 0.4 | 0.36 | |

P(pa | at, pr, fa): unchanged from the first slide of this example.
![Page 38: Bayesian Networks](https://reader036.vdocuments.mx/reader036/viewer/2022062411/56816698550346895dda836c/html5/thumbnails/38.jpg)
Marginalize out study
[Figure: same network with prepared and study merged into one node (prepared, study).]
P(at) = 0.8, P(fa) = 0.9
[Table: same join table as on the "Join study factors" slide. Summing the joint P(pr, st | at) over study gives the marginal P(pr | at): 0.74, 0.46, 0.26, 0.54.]
P(pa | at, pr, fa): unchanged from the first slide of this example.
![Page 39: Bayesian Networks](https://reader036.vdocuments.mx/reader036/viewer/2022062411/56816698550346895dda836c/html5/thumbnails/39.jpg)
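Remove “study”
[Figure: network without study. attend is the parent of prepared; prepared, attend, and fair are parents of pass.]
P(at) = 0.8, P(fa) = 0.9

| pr | at | P(pr \| at) |
|---|---|---|
| T | T | 0.74 |
| T | F | 0.46 |
| F | T | 0.26 |
| F | F | 0.54 |

P(pa | at, pr, fa): unchanged from the first slide of this example.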
![Page 40: Bayesian Networks](https://reader036.vdocuments.mx/reader036/viewer/2022062411/56816698550346895dda836c/html5/thumbnails/40.jpg)
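Join factors “fair”
[Figure: network without study. attend is the parent of prepared; prepared, attend, and fair are parents of pass.]
P(at) = 0.8, P(fa) = 0.9

| prep | attend | P(pr \| at) |
|---|---|---|
| T | T | 0.74 |
| T | F | 0.46 |
| F | T | 0.26 |
| F | F | 0.54 |

| pa | pre | attend | fair | Original: P(pa \| at, pre, fa) | Original: P(fair) | Joint: P(pa, fa \| at, pre) | Marginal: P(pa \| at, pre) |
|---|---|---|---|---|---|---|---|
| T | T | T | T | 0.9 | 0.9 | 0.81 | 0.82 |
| T | T | T | F | 0.1 | 0.1 | 0.01 | |
| T | T | F | T | 0.7 | 0.9 | 0.63 | 0.64 |
| T | T | F | F | 0.1 | 0.1 | 0.01 | |
| T | F | T | T | 0.7 | 0.9 | 0.63 | 0.64 |
| T | F | T | F | 0.1 | 0.1 | 0.01 | |
| T | F | F | T | 0.2 | 0.9 | 0.18 | 0.19 |
| T | F | F | F | 0.1 | 0.1 | 0.01 | |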
![Page 41: Bayesian Networks](https://reader036.vdocuments.mx/reader036/viewer/2022062411/56816698550346895dda836c/html5/thumbnails/41.jpg)
Marginalize out “fair”
[Figure: network with pass and fair merged into one node (pass, fair); attend is the parent of prepared.]
P(at) = 0.8
P(pr | at): 0.74 (pr = T, at = T), 0.46 (T, F), 0.26 (F, T), 0.54 (F, F)
[Table: same join table as on the "Join factors “fair”" slide. Summing the joint P(pa, fa | at, pre) over fair gives the marginal P(pa | at, pre): 0.82, 0.64, 0.64, 0.19.]
![Page 42: Bayesian Networks](https://reader036.vdocuments.mx/reader036/viewer/2022062411/56816698550346895dda836c/html5/thumbnails/42.jpg)
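Remove “fair”
[Figure: network without fair. attend is the parent of prepared; prepared and attend are parents of pass.]
P(at) = 0.8

| prep | attend | P(pr \| at) |
|---|---|---|
| T | T | 0.74 |
| T | F | 0.46 |
| F | T | 0.26 |
| F | F | 0.54 |

| pa | pre | attend | P(pa \| at, pre) |
|---|---|---|---|
| T | T | T | 0.82 |
| T | T | F | 0.64 |
| T | F | T | 0.64 |
| T | F | F | 0.19 |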
![Page 43: Bayesian Networks](https://reader036.vdocuments.mx/reader036/viewer/2022062411/56816698550346895dda836c/html5/thumbnails/43.jpg)
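Join factors “prepared”
[Figure: attend is the parent of prepared; prepared and attend are parents of pass.]
P(at) = 0.8

| pa | pre | attend | Original: P(pa \| at, pr) | Original: P(pr \| at) | Joint: P(pa, pr \| at) | Marginal: P(pa \| at) |
|---|---|---|---|---|---|---|
| T | T | T | 0.82 | 0.74 | 0.6068 | 0.7732 |
| T | T | F | 0.64 | 0.46 | 0.2944 | 0.397 |
| T | F | T | 0.64 | 0.26 | 0.1664 | |
| T | F | F | 0.19 | 0.54 | 0.1026 | |

The marginal sums the joint over pre for each value of attend: 0.6068 + 0.1664 = 0.7732 (attend = T) and 0.2944 + 0.1026 = 0.397 (attend = F).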
![Page 44: Bayesian Networks](https://reader036.vdocuments.mx/reader036/viewer/2022062411/56816698550346895dda836c/html5/thumbnails/44.jpg)
Marginalize out “prepared”
[Figure: attend is the parent of the merged node (pass, prepared).]
P(at) = 0.8
[Table: same join table as on the "Join factors “prepared”" slide. Summing the joint P(pa, pr | at) over prepared gives the marginal P(pa | at): 0.7732 for attend = T and 0.397 for attend = F.]
![Page 45: Bayesian Networks](https://reader036.vdocuments.mx/reader036/viewer/2022062411/56816698550346895dda836c/html5/thumbnails/45.jpg)
Remove “prepared”
[Figure: attend is the parent of pass.]
P(at) = 0.8

| pa | attend | P(pa \| at) |
|---|---|---|
| T | T | 0.7732 |
| T | F | 0.397 |
![Page 46: Bayesian Networks](https://reader036.vdocuments.mx/reader036/viewer/2022062411/56816698550346895dda836c/html5/thumbnails/46.jpg)
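Join factors
[Figure: attend is the parent of pass.]
P(at) = 0.8

| pa | attend | Original: P(pa \| at) | Original: P(at) | Joint: P(pa, at) | Normalized: P(at \| pa) |
|---|---|---|---|---|---|
| T | T | 0.7732 | 0.8 | 0.61856 | 0.89 |
| T | F | 0.397 | 0.2 | 0.0794 | 0.11 |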
![Page 47: Bayesian Networks](https://reader036.vdocuments.mx/reader036/viewer/2022062411/56816698550346895dda836c/html5/thumbnails/47.jpg)
Join factors
[Figure: attend and pass merged into one node (attend, pass).]
[Table: same table as on the previous slide. Normalizing the joint P(pa, at) gives P(at | pa): 0.89 for attend = T and 0.11 for attend = F.]
![Page 48: Bayesian Networks](https://reader036.vdocuments.mx/reader036/viewer/2022062411/56816698550346895dda836c/html5/thumbnails/48.jpg)
Bayesian network inference: Big picture
• Exact inference is intractable in general
– There exist techniques to speed up computations, but worst-case complexity is still exponential except in some classes of networks
• Approximate inference
– Sampling, variational methods, message passing / belief propagation…
![Page 49: Bayesian Networks](https://reader036.vdocuments.mx/reader036/viewer/2022062411/56816698550346895dda836c/html5/thumbnails/49.jpg)
Approximate Inference
Sampling (particle based method)
![Page 50: Bayesian Networks](https://reader036.vdocuments.mx/reader036/viewer/2022062411/56816698550346895dda836c/html5/thumbnails/50.jpg)
Approximate Inference
![Page 51: Bayesian Networks](https://reader036.vdocuments.mx/reader036/viewer/2022062411/56816698550346895dda836c/html5/thumbnails/51.jpg)
Sampling – the basics ...
• Scrooge McDuck gives you an ancient coin.
• He wants to know what is P(H)
• You have no homework, and nothing good is on television, so you toss it 1 million times.
• You obtain 700000x Heads, and 300000x Tails.
• What is P(H)?
![Page 52: Bayesian Networks](https://reader036.vdocuments.mx/reader036/viewer/2022062411/56816698550346895dda836c/html5/thumbnails/52.jpg)
Sampling – the basics ...
• Exactly, P(H) = 0.7
• Why?
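One way to see why the relative frequency is a sensible answer is to simulate a coin whose bias is known and watch the estimate approach it. The sketch below assumes a true P(H) of 0.7 for the simulation; the function name `estimate_heads` and the fixed seed are choices made for this example.

```python
import random


def estimate_heads(p_heads: float, n_tosses: int, seed: int = 0) -> float:
    """Toss a simulated coin n times and return the fraction of heads."""
    rng = random.Random(seed)
    heads = sum(rng.random() < p_heads for _ in range(n_tosses))
    return heads / n_tosses


# Assumed true bias of 0.7; the estimate tightens as n grows.
for n in (100, 10_000, 100_000):
    print(n, estimate_heads(0.7, n))
```

The estimate is the sample mean of independent 0/1 outcomes, so by the law of large numbers it converges to the true P(H) as the number of tosses grows.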
![Page 53: Bayesian Networks](https://reader036.vdocuments.mx/reader036/viewer/2022062411/56816698550346895dda836c/html5/thumbnails/53.jpg)
Monte Carlo Method
Who is more likely to win? Green or Purple?
What is the probability that green wins, P(G)?
Two ways to solve this:
1. Compute the exact probability.
2. Play 100,000 games and see how many times green wins.
![Page 54: Bayesian Networks](https://reader036.vdocuments.mx/reader036/viewer/2022062411/56816698550346895dda836c/html5/thumbnails/54.jpg)
Approximate Inference
• Simulation has a name: sampling
• Sampling is a hot topic in machine learning, and it’s really simple
• Basic idea:
– Draw N samples from a sampling distribution S
– Compute an approximate posterior probability
– Show this converges to the true probability P
• Why sample?
– Learning: get samples from a distribution you don’t know
– Inference: getting a sample is faster than computing the right answer (e.g. with variable elimination)
![Page 55: Bayesian Networks](https://reader036.vdocuments.mx/reader036/viewer/2022062411/56816698550346895dda836c/html5/thumbnails/55.jpg)
Forward Sampling
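[Figure: network. Cloudy is the parent of Sprinkler and Rain; Sprinkler and Rain are parents of WetGrass.]

| C | P(C) |
|---|---|
| +c | 0.5 |
| -c | 0.5 |

| C | S | P(S \| C) |
|---|---|---|
| +c | +s | 0.1 |
| +c | -s | 0.9 |
| -c | +s | 0.5 |
| -c | -s | 0.5 |

| C | R | P(R \| C) |
|---|---|---|
| +c | +r | 0.8 |
| +c | -r | 0.2 |
| -c | +r | 0.2 |
| -c | -r | 0.8 |

| S | R | W | P(W \| S, R) |
|---|---|---|---|
| +s | +r | +w | 0.99 |
| +s | +r | -w | 0.01 |
| +s | -r | +w | 0.90 |
| +s | -r | -w | 0.10 |
| -s | +r | +w | 0.90 |
| -s | +r | -w | 0.10 |
| -s | -r | +w | 0.01 |
| -s | -r | -w | 0.99 |

Samples:
+c, -s, +r, +w
-c, +s, -r, +w
…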
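A minimal sketch of forward sampling on this network: visit the variables in topological order (C, then S and R, then W) and draw each one from its CPT row given the values already drawn. The CPT numbers are the ones on this slide; the dictionary layout and the name `forward_sample` are assumptions for this example.

```python
import random

P_C = 0.5                                   # P(+c)
P_S = {True: 0.1, False: 0.5}               # P(+s | C)
P_R = {True: 0.8, False: 0.2}               # P(+r | C)
P_W = {(True, True): 0.99, (True, False): 0.90,
       (False, True): 0.90, (False, False): 0.01}  # P(+w | S, R)


def forward_sample(rng):
    """Draw one full assignment (c, s, r, w), parents before children."""
    c = rng.random() < P_C
    s = rng.random() < P_S[c]
    r = rng.random() < P_R[c]
    w = rng.random() < P_W[(s, r)]
    return c, s, r, w


rng = random.Random(0)
samples = [forward_sample(rng) for _ in range(50_000)]
p_wet = sum(w for _, _, _, w in samples) / len(samples)
print(round(p_wet, 2))
```

For these CPTs the exact value of P(+w) is 0.65, so the printed estimate should land close to that.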
![Page 56: Bayesian Networks](https://reader036.vdocuments.mx/reader036/viewer/2022062411/56816698550346895dda836c/html5/thumbnails/56.jpg)
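Forward Sampling
• This process generates samples with probability:
$$S_{PS}(x_1, \dots, x_n) = \prod_{i=1}^{n} P(x_i \mid \text{Parents}(X_i)) = P(x_1, \dots, x_n)$$
…i.e. the BN’s joint probability
• Let the number of samples of an event be $N_{PS}(x_1, \dots, x_n)$
• Then
$$\lim_{N \to \infty} \frac{N_{PS}(x_1, \dots, x_n)}{N} = S_{PS}(x_1, \dots, x_n) = P(x_1, \dots, x_n)$$
• I.e., the sampling procedure is consistent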
![Page 57: Bayesian Networks](https://reader036.vdocuments.mx/reader036/viewer/2022062411/56816698550346895dda836c/html5/thumbnails/57.jpg)
Example
• We’ll get a bunch of samples from the BN:
+c, -s, +r, +w
+c, +s, +r, +w
-c, +s, +r, -w
+c, -s, +r, +w
-c, -s, -r, +w
• If we want to know P(W)
– We have counts <+w:4, -w:1>
– Normalize to get P(W) = <+w:0.8, -w:0.2>
– This will get closer to the true distribution with more samples
– Can estimate anything else, too
– What about P(C| +w)? P(C| +r, +w)? P(C| -r, -w)?
– Fast: can use fewer samples if less time (what’s the drawback?)
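The counting step can be checked directly on the five samples listed on this slide. The sketch below tallies the WetGrass value in each sample and normalizes; the variable names are assumptions for this example.

```python
from collections import Counter

# The five samples from the slide, as (C, S, R, W) tuples.
samples = [
    ("+c", "-s", "+r", "+w"),
    ("+c", "+s", "+r", "+w"),
    ("-c", "+s", "+r", "-w"),
    ("+c", "-s", "+r", "+w"),
    ("-c", "-s", "-r", "+w"),
]

counts = Counter(sample[3] for sample in samples)          # tally W
p_w = {value: n / len(samples) for value, n in counts.items()}
print(counts["+w"], counts["-w"])  # 4 1
print(p_w["+w"], p_w["-w"])        # 0.8 0.2
```

With only five samples the estimate is coarse (every probability is a multiple of 0.2), which is the drawback of using fewer samples.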
![Page 58: Bayesian Networks](https://reader036.vdocuments.mx/reader036/viewer/2022062411/56816698550346895dda836c/html5/thumbnails/58.jpg)
Rejection Sampling
• Let’s say we want P(C)
– No point keeping all samples around
– Just tally counts of C as we go
• Let’s say we want P(C| +s)
– Same thing: tally C outcomes, but ignore (reject) samples which don’t have S=+s
– This is called rejection sampling
– It is also consistent for conditional probabilities (i.e., correct in the limit)
Samples:
+c, -s, +r, +w
+c, +s, +r, +w
-c, +s, +r, -w
+c, -s, +r, +w
-c, -s, -r, +w
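A sketch of rejection sampling for P(C | +s), using the CPT values from the Forward Sampling slide. Samples are drawn by forward sampling and discarded unless S = +s; the function names and the sample count are assumptions for this example.

```python
import random

P_C = 0.5                                   # P(+c)
P_S = {True: 0.1, False: 0.5}               # P(+s | C)
P_R = {True: 0.8, False: 0.2}               # P(+r | C)
P_W = {(True, True): 0.99, (True, False): 0.90,
       (False, True): 0.90, (False, False): 0.01}  # P(+w | S, R)


def forward_sample(rng):
    """Draw one full assignment (c, s, r, w), parents before children."""
    c = rng.random() < P_C
    s = rng.random() < P_S[c]
    r = rng.random() < P_R[c]
    w = rng.random() < P_W[(s, r)]
    return c, s, r, w


def rejection_estimate(n_samples, rng):
    """Estimate P(+c | +s); return the estimate and the number kept."""
    kept = cloudy = 0
    for _ in range(n_samples):
        c, s, _, _ = forward_sample(rng)
        if not s:
            continue  # reject: sample disagrees with the evidence S = +s
        kept += 1
        cloudy += c
    return cloudy / kept, kept


estimate, kept = rejection_estimate(50_000, random.Random(0))
print(round(estimate, 2), kept)
```

For these CPTs the exact answer is P(+c | +s) = 0.05 / 0.30 = 1/6, and only about 30% of the samples survive, since P(+s) = 0.3.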
![Page 59: Bayesian Networks](https://reader036.vdocuments.mx/reader036/viewer/2022062411/56816698550346895dda836c/html5/thumbnails/59.jpg)
Likelihood Weighting
• Problem with rejection sampling:
– If evidence is unlikely, you reject a lot of samples
– You don’t exploit your evidence as you sample
– Consider P(B|+a)
Samples (Burglary → Alarm): -b, -a; -b, -a; -b, -a; -b, -a; +b, +a
• Idea: fix evidence variables and sample the rest
Samples (Burglary → Alarm, with +a fixed): -b, +a; -b, +a; -b, +a; -b, +a; +b, +a
• Problem: sample distribution not consistent!
• Solution: weight by probability of evidence given parents
![Page 60: Bayesian Networks](https://reader036.vdocuments.mx/reader036/viewer/2022062411/56816698550346895dda836c/html5/thumbnails/60.jpg)
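Likelihood Weighting
• Sampling distribution if z sampled and e fixed evidence
$$S_{WS}(z, e) = \prod_{i} P(z_i \mid \text{Parents}(Z_i))$$
• Now, samples have weights
$$w(z, e) = \prod_{i} P(e_i \mid \text{Parents}(E_i))$$
• Together, weighted sampling distribution is consistent
$$S_{WS}(z, e) \cdot w(z, e) = \prod_{i} P(z_i \mid \text{Parents}(Z_i)) \prod_{i} P(e_i \mid \text{Parents}(E_i)) = P(z, e)$$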
![Page 61: Bayesian Networks](https://reader036.vdocuments.mx/reader036/viewer/2022062411/56816698550346895dda836c/html5/thumbnails/61.jpg)
Likelihood Weighting
[Figure: the Cloudy, Sprinkler, Rain, WetGrass network with the same CPTs as on the Forward Sampling slide.]
Samples:
+c, +s, +r, +w
…
![Page 62: Bayesian Networks](https://reader036.vdocuments.mx/reader036/viewer/2022062411/56816698550346895dda836c/html5/thumbnails/62.jpg)
Likelihood Weighting Example
Inference:
• Sum over weights that match query value
• Divide by total sample weight
What is P(C|+s,+w)?

| Cloudy | Rainy | Sprinkler | Wet Grass | Weight |
|---|---|---|---|---|
| 0 | 1 | 1 | 1 | 0.495 |
| 0 | 0 | 1 | 1 | 0.45 |
| 0 | 0 | 1 | 1 | 0.45 |
| 0 | 0 | 1 | 1 | 0.45 |
| 1 | 0 | 1 | 1 | 0.09 |
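The weights in this table are what the CPTs from the Forward Sampling slide give when Sprinkler and WetGrass are the fixed evidence: for example 0.495 = P(+s | -c) · P(+w | +s, +r) = 0.5 · 0.99. The sketch below first applies the "sum matching weights, divide by total" rule to the five table rows, then runs a likelihood-weighting sampler under that same evidence assumption. Function names are invented for this example.

```python
import random

P_C = 0.5                                   # P(+c)
P_S = {True: 0.1, False: 0.5}               # P(+s | C)
P_R = {True: 0.8, False: 0.2}               # P(+r | C)
P_W = {(True, True): 0.99, (True, False): 0.90,
       (False, True): 0.90, (False, False): 0.01}  # P(+w | S, R)

# (Cloudy, weight) for the five rows of the slide's table.
rows = [(0, 0.495), (0, 0.45), (0, 0.45), (0, 0.45), (1, 0.09)]
table_estimate = sum(w for c, w in rows if c == 1) / sum(w for _, w in rows)
print(round(table_estimate, 4))  # 0.0465


def weighted_sample(rng):
    """Sample C and R; clamp S = +s and W = +w and multiply in their likelihood."""
    c = rng.random() < P_C
    weight = P_S[c]                 # P(+s | c), evidence is not sampled
    r = rng.random() < P_R[c]
    weight *= P_W[(True, r)]        # P(+w | +s, r)
    return c, r, weight


def lw_estimate(n_samples, rng):
    """Estimate P(+c | +s, +w) as a ratio of summed weights."""
    matching = total = 0.0
    for _ in range(n_samples):
        c, _, weight = weighted_sample(rng)
        total += weight
        if c:
            matching += weight
    return matching / total


print(round(lw_estimate(50_000, random.Random(0)), 2))
```

Under these CPTs the exact posterior is 0.0486 / 0.2781 ≈ 0.175; the five-row table gives a much lower value simply because five samples are too few.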
![Page 63: Bayesian Networks](https://reader036.vdocuments.mx/reader036/viewer/2022062411/56816698550346895dda836c/html5/thumbnails/63.jpg)
Likelihood Weighting
• Likelihood weighting is good
– We have taken evidence into account as we generate the sample
– E.g. here, W’s value will get picked based on the evidence values of S, R
– More of our samples will reflect the state of the world suggested by the evidence
• Likelihood weighting doesn’t solve all our problems
– Evidence influences the choice of downstream variables, but not upstream ones (C isn’t more likely to get a value matching the evidence)
• We would like to consider evidence when we sample every variable
![Page 64: Bayesian Networks](https://reader036.vdocuments.mx/reader036/viewer/2022062411/56816698550346895dda836c/html5/thumbnails/64.jpg)
Gibbs Sampling
1. Set all evidence E to e
2. Do forward sampling to obtain $x_1, \dots, x_n$
3. Repeat:
   1. Pick any variable $X_i$ uniformly at random.
   2. Resample $x_i'$ from $p(X_i \mid x_1, \dots, x_{i-1}, x_{i+1}, \dots, x_n)$
   3. Set all other $x_j' = x_j$
   4. The new sample is $x_1', \dots, x_n'$
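A sketch of these steps on the Cloudy/Sprinkler/Rain/WetGrass network, with the CPTs from the Forward Sampling slide and evidence S = +s, W = +w. Two simplifications are assumed here: the non-evidence variables are initialized uniformly at random rather than by forward sampling, and only non-evidence variables are picked for resampling. The burn-in length and function names are likewise choices made for this example.

```python
import random

P_C = 0.5                                   # P(+c)
P_S = {True: 0.1, False: 0.5}               # P(+s | C)
P_R = {True: 0.8, False: 0.2}               # P(+r | C)
P_W = {(True, True): 0.99, (True, False): 0.90,
       (False, True): 0.90, (False, False): 0.01}  # P(+w | S, R)


def bern(p_true, value):
    """Probability of a Boolean value given P(value is True)."""
    return p_true if value else 1.0 - p_true


def joint(c, s, r, w):
    """Joint probability of one full assignment."""
    return (bern(P_C, c) * bern(P_S[c], s)
            * bern(P_R[c], r) * bern(P_W[(s, r)], w))


def gibbs_estimate(query, evidence, n_steps, rng, burn_in=1_000):
    """Estimate P(query = True | evidence) from a Gibbs chain."""
    state = {v: rng.random() < 0.5 for v in ("c", "s", "r", "w")}
    state.update(evidence)                      # step 1: clamp evidence
    free = [v for v in state if v not in evidence]
    hits = 0
    for step in range(burn_in + n_steps):
        var = rng.choice(free)                  # pick a variable at random
        # Full conditional of var: joint with var = True vs. var = False.
        p_true = joint(**{**state, var: True})
        p_false = joint(**{**state, var: False})
        state[var] = rng.random() < p_true / (p_true + p_false)
        if step >= burn_in and state[query]:
            hits += 1
    return hits / n_steps


estimate = gibbs_estimate("c", {"s": True, "w": True}, 100_000, random.Random(0))
print(round(estimate, 2))
```

For these CPTs the exact value of P(+c | +s, +w) is about 0.175. The full conditional is computed from the whole joint for brevity; only the factors in the variable's Markov blanket actually change between the two cases, which is why the Markov blanket matters for Gibbs sampling.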
![Page 65: Bayesian Networks](https://reader036.vdocuments.mx/reader036/viewer/2022062411/56816698550346895dda836c/html5/thumbnails/65.jpg)
Markov Blanket
Markov blanket of X:
1. All parents of X
2. All children of X
3. All parents of children of X (except X itself)
X is conditionally independent of all other variables in the BN, given all variables in its Markov blanket.
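The three rules translate directly into a few set operations. The sketch below assumes the network is given as a dictionary from each node to its list of parents; the function name `markov_blanket` and the single-letter node names for the Cloudy/Sprinkler/Rain/WetGrass network are choices made for this example.

```python
def markov_blanket(parents, x):
    """Parents of x, children of x, and the children's other parents."""
    children = [node for node, ps in parents.items() if x in ps]
    blanket = set(parents[x]) | set(children)
    for child in children:
        blanket |= set(parents[child])
    blanket.discard(x)  # x is a parent of its own children; exclude it
    return blanket


# C -> S, C -> R, S -> W, R -> W
sprinkler_net = {"C": [], "S": ["C"], "R": ["C"], "W": ["S", "R"]}

print(sorted(markov_blanket(sprinkler_net, "S")))  # ['C', 'R', 'W']
print(sorted(markov_blanket(sprinkler_net, "C")))  # ['R', 'S']
```

Rain appears in the blanket of Sprinkler only because the two share the child WetGrass, which is rule 3.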
![Page 66: Bayesian Networks](https://reader036.vdocuments.mx/reader036/viewer/2022062411/56816698550346895dda836c/html5/thumbnails/66.jpg)
Inference Algorithms
• Exact algorithms
– Elimination algorithm
– Sum-product algorithm
– Junction tree algorithm
• Sampling algorithms
– Importance sampling
– Markov chain Monte Carlo
• Variational algorithms
– Mean field methods
– Sum-product algorithm and variations
– Semidefinite relaxations
![Page 67: Bayesian Networks](https://reader036.vdocuments.mx/reader036/viewer/2022062411/56816698550346895dda836c/html5/thumbnails/67.jpg)
Summary
• Sampling can be your salvation
• The dominating approach to inference in BNs
• Approaches:
– Forward (/Prior) Sampling
– Rejection Sampling
– Likelihood Weighted Sampling
– Gibbs Sampling