experiments in bayes nets
8/15/2019 Experiments in Bayes Nets
http://slidepdf.com/reader/full/experiments-in-bayes-nets 1/21
Liu, Smith 1
Jamie Liu and Adam Smith
6.825 – Project 2
11/4/2004
We learned a lot from this project. Enjoy.
1. Variable Elimination Functionality
After executing our variable elimination procedure, we obtained the following results for each of the queries below.
For the sake of easy analysis of the PropCost probability distributions obtained throughout this project from the insurance network, we define the function f to be a probability-weighted average of the dollar amounts in the discrete domain, resulting in a single scalar value representative of the overall cost. More specifically,
f = 1E5*PHundredThou + 1E6*PMillion + 1E4*PTenThou + 1E3*PThousand
1. P(Burglary | JohnCalls = true, MaryCalls = true)
<[Burglary] = [false]> = 0.7158281646356072
<[Burglary] = [true]> = 0.284171835364393
2. P(Earthquake | JohnCalls = true, Burglary = true)
<[Earthquake] = [false]> = 0.8239331615949207
<[Earthquake] = [true]> = 0.17606683840507917
3. P(PropCost | Age = Adolescent, Antilock = False, Mileage = FiftyThou)
<[PropCost] = [HundredThou]> = 0.1729786918964137
<[PropCost] = [Million]> = 0.02709352198178344
<[PropCost] = [TenThou]> = 0.3427002442093675
<[PropCost] = [Thousand]> = 0.45722754191243536
(f = 48275.62)
These results are consistent with those obtained by executing the given enumeration procedure, and those given in Table 1 of the project hand-out.
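The scalar f defined above is straightforward to compute from any PropCost distribution. The following is a minimal sketch, not the authors' code: the names `COST`, `expected_cost` and `query3` are introduced here for illustration, and the probabilities are copied from query 3 of this section.

```python
# Dollar value assigned to each PropCost outcome, following the definition of f.
COST = {"Thousand": 1e3, "TenThou": 1e4, "HundredThou": 1e5, "Million": 1e6}


def expected_cost(dist):
    """Return f: the probability-weighted sum of the PropCost dollar values."""
    return sum(COST[value] * prob for value, prob in dist.items())


# Distribution reported for query 3 of Section 1.
query3 = {
    "HundredThou": 0.1729786918964137,
    "Million": 0.02709352198178344,
    "TenThou": 0.3427002442093675,
    "Thousand": 0.45722754191243536,
}

print(f"f = {expected_cost(query3):.2f}")
```

Applied to the distribution above, this should agree with the reported f for query 3 up to rounding.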
2. More Variable Elimination Exercise
A. Insurance Network Queries
1. P(PropCost | Age = Adolescent, Antilock = False, Mileage = FiftyThou, MakeModel = SportsCar)
If the MakeModel of the car in question is that of a sports car, then, based on the network as illustrated in Figure 1 of the handout, we expect that the driver would be less risk averse, the driver would have more money, and the car would be of higher value. All of these things should cause the cost of insurance to “go up” relative to our previous query, which did not involve any evidence about the MakeModel of the car. An increase in the PropCost domain sense means that the probability distribution should be shifted towards the higher-cost elements of the domain (e.g. Million should gain probability while Thousand loses it).
Indeed, this is what happens. As can be seen below, f is about four thousand dollars greater in this case relative to that from Section 1.3.
<[PropCost] = [HundredThou]> = 0.17179333672003955
<[PropCost] = [Million]> = 0.03093877334365239
<[PropCost] = [TenThou]> = 0.34593039737969233
<[PropCost] = [Thousand]> = 0.45133749255661565
(f = 52028.74)
2. P(PropCost | Age = Adolescent, Antilock = False, Mileage = FiftyThou, GoodStudent = True)
In this case, counter-intuitive as it may seem, if the driver is a GoodStudent, then the overall cost of insurance goes up. This follows from the network as shown in Figure 1 of the project handout: GoodStudent is connected to the rest of the network only through its two parents, Age and SocioEcon. Since Age is an evidence variable, SocioEcon is the only node directly affected by adding GoodStudent to the evidence. More specifically, if the adolescent driver is a good student, they are likely to have more money, and thus to drive fancier cars, be less risk averse, et cetera.
This is borne out by the results of variable elimination given the proper evidence: f is a little less than four thousand dollars greater in this case relative to that from Section 1.3.
<[PropCost] = [HundredThou]> = 0.1837467917616061
<[PropCost] = [Million]> = 0.029748793596801583
<[PropCost] = [TenThou]> = 0.32771416728772235
<[PropCost] = [Thousand]> = 0.4587902473538701
(f = 51859.40)
B. Carpo Network Queries
1. P(N112 | N64 = “3”, N113 = “1”, N116 = “0”)
<[N112] = [0]> = 0.9880400004226929
<[N112] = [1]> = 0.01195999957730707
2. P(N143 | N146 = “1”, N116 = “0”, N121 = “1”)
<[N143] = [0]> = 0.899999996961172
<[N143] = [1]> = 0.10000000303882783
3. Random Elimination Ordering
A. Histograms
[Chart: "Histogram of Computation Time under Random Elimination Ordering: Problem 1"; x-axis: Trials (1 to 10); y-axis ticks: 0 to 6000]
Figure 1. Histogram of Computation Time for P(PropCost | Age = Adolescent, Antilock = False, Mileage = FiftyThou, MakeModel = SportsCar).
[Chart: "Histogram of Computation Time under Random Elimination Ordering: Problem 2"; x-axis: Trials (1 to 10); y-axis ticks: 0 to 6000]
Figure 2. Histogram of Computation Time for P(PropCost | Age = Adolescent, Antilock = False, Mileage = FiftyThou, GoodStudent = True).
[Chart: "Histogram of Computation Time under Random Elimination Ordering: Problem 3"; x-axis: Trials (1 to 10); y-axis ticks: 0 to 6000]
Figure 3. Histogram of Computation Time for P(N112 | N64 = "3", N113 = "1", N116 = "0").
[Chart: "Histogram of Computation Time under Random Elimination Ordering: Problem 4"; x-axis: Trials (1 to 10); y-axis ticks: 0 to 6000]
Figure 4. Histogram of Computation Time for P(N143 | N146 = "1", N116 = "0", N121 = "1").
B. Discussion
Figure 1 through Figure 4 illustrate the running time of a random-order variable elimination algorithm for each of the problems in Task 2 of the project handout. We ran the algorithm ten times for each problem. If a bar is stacked with a purple bar on top of it, then the heap ran out of memory during that execution. In this case, we know that the execution would have taken at least the amount of time illustrated by the blue bar, the time it executed before running out of memory. We suppose that each execution where the computer ran out of memory would have taken at least 5000 seconds to complete.
It is worth noting that the time taken on the successful runs (the samples without a purple bar) is much lower than the time taken to execute the unsuccessful runs before they crashed; i.e., the successful blue bars tend to be shorter than the unsuccessful blue bars. This indicates that random ordering tends to get it either very right or very wrong.
4. Greedy Elimination Ordering
A. Histograms
[Chart: "Greedy Variable Elimination Runtimes"; x-axis: Problem Number (1 to 4); y-axis ticks: 0 to 1.4]
Figure 5. Greedy Variable Elimination Runtimes for 10 trials of running each of 4 problems.
Problem          Average Time (seconds)
Insurance – 1    0.629
Insurance – 2    1.086
Carpo – 1        0.088
Carpo – 2        0.087
Table 1. Average time of execution for variable elimination for the problems from Task 2. Averages are constructed across ten independent runs each, which are illustrated in Figure 5.
B. Discussion
As can be seen from Table 1, the time needed for variable elimination is much smaller for a greedy elimination ordering than for a random ordering. This makes a lot of sense, because the random ordering could happen to eliminate a parent of many children, creating a huge factor which slows down the algorithm and eats up memory. By contrast, greedy-ordering variable elimination works very well. Even in the cases from Section 3 in which we did not run out of memory, the greedy algorithm tends to be about 100-200 times faster.
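The effect of eliminating "a parent of many children" too early can be seen on a toy graph. The sketch below is an illustration under stated assumptions, not the project's implementation: the report does not say which greedy criterion was used, so a min-degree heuristic (eliminate the variable with the fewest neighbors) is assumed here, and the star-shaped network and the names `eliminate`, `greedy_order` and `star` are hypothetical. It tracks only the number of variables in the largest intermediate factor, not the actual tables.

```python
import random


def _remove(adj, v):
    """Remove v from the graph and connect its former neighbors to each other."""
    neighbors = adj.pop(v)
    for a in neighbors:
        adj[a].discard(v)
        adj[a] |= neighbors - {a}  # fill-in edges created by eliminating v
    return neighbors


def eliminate(adj, order):
    """Return the number of variables in the largest factor created by `order`."""
    adj = {v: set(ns) for v, ns in adj.items()}  # work on a copy
    largest = 0
    for v in order:
        neighbors = _remove(adj, v)
        largest = max(largest, len(neighbors) + 1)  # factor over v and its neighbors
    return largest


def greedy_order(adj):
    """Min-degree heuristic (assumed): always eliminate the least-connected variable."""
    adj = {v: set(ns) for v, ns in adj.items()}
    order = []
    while adj:
        v = min(adj, key=lambda u: (len(adj[u]), u))  # ties broken by name
        order.append(v)
        _remove(adj, v)
    return order


# Hypothetical star-shaped network: one parent ("hub") with eight children.
children = [f"c{i}" for i in range(8)]
star = {"hub": set(children), **{c: {"hub"} for c in children}}

shuffled = sorted(star)
random.Random(0).shuffle(shuffled)  # one arbitrary random ordering

print("greedy ordering :", eliminate(star, greedy_order(star)))
print("hub eliminated first:", eliminate(star, ["hub"] + children))
print("random ordering :", eliminate(star, shuffled))
```

With d values per variable, a factor over k variables has d^k entries, so the gap between a two-variable factor and a nine-variable factor is the difference between a tiny table and one that can exhaust memory on larger domains.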
5. Likelihood Weighting and Gibbs Sampling Functionality
Each of our results below looks like it is in the right neighborhood. We give more explicit quality results in the problems that follow this one.
A. Basic Results – Likelihood Weighting
1. P(Burglary | JohnCalls = true, MaryCalls = true)
<[Burglary] = [false]> = 0.5448387970739699
<[Burglary] = [true]> = 0.4551612029260302
2. P(Earthquake | JohnCalls = true, Burglary = true)
<[Earthquake] = [false]> = 0.9997158283603297
<[Earthquake] = [true]> = 2.8417163967036946E-4
3. P(PropCost | Age = Adolescent, Antilock = False, Mileage = FiftyThou)
<[PropCost] = [HundredThou]> = 0.17105091038203132
<[PropCost] = [Million]> = 0.021563876240368398
<[PropCost] = [TenThou]> = 0.35877461270610517
<[PropCost] = [Thousand]> = 0.44861060067149516
4. P(PropCost | Age = Adolescent, Antilock = False, Mileage = FiftyThou, MakeModel = SportsCar)
<[PropCost] = [HundredThou]> = 0.16339257873401916
<[PropCost] = [Million]> = 0.030620517617711222
<[PropCost] = [TenThou]> = 0.35048331774243846
<[PropCost] = [Thousand]> = 0.4555035859058312
5. P(PropCost | Age = Adolescent, Antilock = False, Mileage = FiftyThou, GoodStudent = True)
<[PropCost] = [HundredThou]> = 0.20177159162635994
<[PropCost] = [Million]> = 0.032866049889275516
<[PropCost] = [TenThou]> = 0.30414914618811645
<[PropCost] = [Thousand]> = 0.46121321229624807
6. P(N112 | N64 = “3”, N113 = “1”, N116 = “0”)
<[N112] = [0]> = 0.9910128302117664
<[N112] = [1]> = 0.00898716978823346
7. P(N143 | N146 = “1”, N116 = “0”, N121 = “1”)
<[N143] = [0]> = 0.9172494563262301
<[N143] = [1]> = 0.08275054367376986
B. Basic Results – Gibbs Sampling
1. P(Burglary | JohnCalls = true, MaryCalls = true)
<[Burglary] = [false]> = 0.71
<[Burglary] = [true]> = 0.29
2. P(Earthquake | JohnCalls = true, Burglary = true)
<[Earthquake] = [false]> = 0.842
<[Earthquake] = [true]> = 0.158
3. P(PropCost | Age = Adolescent, Antilock = False, Mileage = FiftyThou)
<[PropCost] = [HundredThou]> = 0.06
<[PropCost] = [Million]> = 0.01
<[PropCost] = [TenThou]> = 0.355
<[PropCost] = [Thousand]> = 0.5750000000000001
4. P(PropCost | Age = Adolescent, Antilock = False, Mileage = FiftyThou, MakeModel = SportsCar)
<[PropCost] = [HundredThou]> = 0.09
<[PropCost] = [Million]> = 0.011
<[PropCost] = [TenThou]> = 0.34
<[PropCost] = [Thousand]> = 0.559
5. P(PropCost | Age = Adolescent, Antilock = False, Mileage = FiftyThou, GoodStudent = True)
<[PropCost] = [HundredThou]> = 0.213
<[PropCost] = [Million]> = 0.038
<[PropCost] = [TenThou]> = 0.372
<[PropCost] = [Thousand]> = 0.377
6. P(N112 | N64 = “3”, N113 = “1”, N116 = “0”)
<[N112] = [0]> = 0.97
<[N112] = [1]> = 0.03
7. P(N143 | N146 = “1”, N116 = “0”, N121 = “1”)
<[N143] = [0]> = 0.922
<[N143] = [1]> = 0.078
6. Ignoring Prefix of Samples in Gibbs Sampling
A. Results
[Chart: "Prefix Throwaway in Gibbs Sampling"; x-axis: Size of Prefix Thrown Away (0 to 1000); y-axis: KL Divergence]
Figure 6. Quality (KL divergence) of estimates produced by Gibbs sampler. Each run used 2000 samples, and threw away the first x samples, the independent variable expressed on the x-axis.
[Chart: "Prefix Throwaway in Gibbs Sampling"; x-axis: Size of Prefix Thrown Away (0 to 1000); y-axis: Average KL Divergence]
Figure 7. Averages for different prefix throwaway sizes from Figure 6.
B. Discussion
In this analysis, we ran the Gibbs sampler with 2000 samples on the same problem (Carpo – 1). For each iteration, we threw away a variable number of the first samples. The idea is that since Gibbs sampling is a Markov chain algorithm, each sample depends heavily on the samples before it. Since we choose a random initial value for each variable, it can take some “burn in” time before the algorithm begins to settle into the right global solution.
The results of our experiments are shown in Figure 6 and Figure 7. We have a fairly nice characteristic curve, as can be seen in the average graph, with the only exception being when we threw away the first 600 samples. Looking at each run, however, at x = 600 there was a single outlier with an extremely high KL divergence; we can ignore it based on the many runs that we did. It seems that the ideal “burn in” time, a trade-off between good initialization and diversity of counted samples, is 800 samples.
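The burn-in idea described above can be sketched on a much smaller network than Carpo. The code below is an illustrative assumption, not the project's sampler: it uses the burglary network with the standard textbook conditional probabilities (the report does not list them), and the names `joint` and `gibbs_burglary` are hypothetical. Each variable is resampled from its conditional given the others, computed here by brute force from the joint, and the first `burn_in` sweeps are not counted.

```python
import random

# Textbook burglary-alarm CPTs (assumed; the report does not list them).
P_B, P_E = 0.001, 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}  # P(Alarm = true | B, E)
P_J = {True: 0.90, False: 0.05}  # P(JohnCalls = true | Alarm)
P_M = {True: 0.70, False: 0.01}  # P(MaryCalls = true | Alarm)


def joint(b, e, a):
    """Joint probability of (b, e, a) with JohnCalls = MaryCalls = true."""
    p = (P_B if b else 1 - P_B) * (P_E if e else 1 - P_E)
    p *= P_A[(b, e)] if a else 1 - P_A[(b, e)]
    return p * P_J[a] * P_M[a]


def gibbs_burglary(n_samples, burn_in, seed=0):
    """Estimate P(Burglary = true | j, m), discarding the first burn_in sweeps."""
    rng = random.Random(seed)
    state = [rng.random() < 0.5 for _ in range(3)]  # random start for B, E, A
    hits = kept = 0
    for sweep in range(n_samples):
        for i in range(3):
            # Resample variable i from its conditional given the other two.
            state[i] = True
            p_true = joint(*state)
            state[i] = False
            p_false = joint(*state)
            state[i] = rng.random() < p_true / (p_true + p_false)
        if sweep >= burn_in:
            kept += 1
            hits += state[0]
    return hits / kept


print(gibbs_burglary(n_samples=20000, burn_in=800))
```

Varying `burn_in` while holding `n_samples` fixed mirrors the experiment in this section: a larger prefix removes more of the start-up bias but leaves fewer samples to average over.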
7. Detailed Analysis – KL Divergences
A. Results
We present results indexed first by the algorithm (Likelihood Weighting, then Gibbs Sampling) and then by the problem. Within each problem we display two graphs: the first showing the results from ten iterations, and the second showing the average KL divergence across those iterations.
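For reference, the quality measure used throughout this section can be computed as follows. This is a hedged sketch: the report does not state the direction of the divergence or the base of the logarithm, so KL(exact || estimate) in nats is assumed, and the name `kl_divergence` is introduced here. The two distributions are copied from the Burglary query in Sections 1 and 5.B.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in nats for two distributions over the same discrete domain.

    Assumes q[x] > 0 wherever p[x] > 0; otherwise the divergence is infinite
    and this function raises ZeroDivisionError.
    """
    return sum(p[x] * math.log(p[x] / q[x]) for x in p if p[x] > 0)


# Exact posterior from variable elimination (Section 1, query 1).
exact = {"false": 0.7158281646356072, "true": 0.284171835364393}
# Gibbs sampling estimate for the same query (Section 5.B, query 1).
gibbs = {"false": 0.71, "true": 0.29}

print(kl_divergence(exact, gibbs))
```

The divergence is zero when the estimate matches the exact distribution and grows as the two move apart, which is why the curves in the figures below are expected to fall towards zero as the number of samples increases.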
1. Likelihood Weighting
[Chart: "Likelihood Weighting - Problem Insurance1"; x-axis: Number of Samples (x1000), ticks 1 to 20; y-axis: KL Divergence]
Figure 3. KL Divergences when applying Likelihood Weighting to P(PropCost | Age = Adolescent, Antilock = False, Mileage = FiftyThou, MakeModel = SportsCar).
[Chart: "Likelihood Weighting: Average KL Divergence - Problem Insurance1"; x-axis: Number of Samples (100 to 2000); y-axis: KL Divergence]
Figure 4. Average KL Divergence when applying Likelihood Weighting to P(PropCost | Age = Adolescent, Antilock = False, Mileage = FiftyThou, MakeModel = SportsCar) for sample sizes between 100 and 2000.
[Chart: "Likelihood Weighting - Problem Insurance2"; x-axis: Sample Size (100 to 2000); y-axis: Divergence]
Figure 8. KL Divergences when applying Likelihood Weighting to P(PropCost | Age = Adolescent, Antilock = False, Mileage = FiftyThou, GoodStudent = True).
[Chart: "Likelihood Weighting: Average KL Divergence - Problem Insurance2"; x-axis: Number of Samples (100 to 2000); y-axis: KL Divergence]
Figure 9. Average KL Divergence when applying Likelihood Weighting to P(PropCost | Age = Adolescent, Antilock = False, Mileage = FiftyThou, GoodStudent = True).
[Chart: "Likelihood Weighting - Problem 3"; x-axis: Number of Samples (100 to 2000)]
Figure 10. KL Divergences when applying Likelihood Weighting to P(N112 | N64 = "3", N113 = "1", N116 = "0").
[Chart: "Likelihood Weighting: Average KL Divergence - Problem Carpo1"; x-axis: Number of Samples (100 to 2000); y-axis: KL Divergence]
Figure 11. Average KL Divergence when applying Likelihood Weighting to P(N112 | N64 = "3", N113 = "1", N116 = "0").
[Chart: "Likelihood Weighting - Problem 4"; x-axis: Number of Samples (100 to 2000)]
Figure 12. KL Divergences when applying Likelihood Weighting to P(N143 | N146 = "1", N116 = "0", N121 = "1").
[Chart: "Likelihood Weighting: Average KL Divergence - Problem Carpo2"; x-axis: Number of Samples (100 to 2000); y-axis: KL Divergence]
Figure 13. Average KL Divergence when applying Likelihood Weighting to P(N143 | N146 = "1", N116 = "0", N121 = "1").
2. Gibbs Sampling
[Chart: "Gibbs Sampling: KL Divergences vs Number of Samples for Problem 1"; x-axis: Number of Samples (1000 to 25000)]
Figure 14. Divergences resulting from Gibbs Sampling applied to P(PropCost | Age = Adolescent, Antilock = False, Mileage = FiftyThou, MakeModel = SportsCar) for sample sizes between 1000 and 25000.
[Chart: "Gibbs Sampling: Average KL Divergence vs Number of Samples for Problem 1"; x-axis: Number of Samples (1000 to 25000)]
Figure 15. Average divergence resulting from Gibbs Sampling applied to P(PropCost | Age = Adolescent, Antilock = False, Mileage = FiftyThou, MakeModel = SportsCar) for sample sizes between 1000 and 25000.
[Chart: "Gibbs Sampling: KL Divergences vs Number of Samples for Problem 2"; x-axis: Number of Samples (1000 to 25000)]
Figure 16. Divergences resulting from Gibbs Sampling applied to P(PropCost | Age = Adolescent, Antilock = False, Mileage = FiftyThou, GoodStudent = True) for sample sizes between 1000 and 25000.
[Chart: "Gibbs Sampling: Average KL Divergence vs Number of Samples for Problem 2"; x-axis: Number of Samples (1000 to 25000)]
Figure 17. Average divergence resulting from Gibbs Sampling applied to P(PropCost | Age = Adolescent, Antilock = False, Mileage = FiftyThou, GoodStudent = True) for sample sizes between 1000 and 25000.
[Chart: "Gibbs Sampling: KL Divergences vs Number of Samples for Problem 3"; x-axis: Number of Samples (1000 to 25000)]
Figure 18. Divergences resulting from Gibbs Sampling applied to P(N112 | N64 = "3", N113 = "1", N116 = "0") for sample sizes between 1000 and 25000.
[Chart: "Gibbs Sampling: Average KL Divergence vs Number of Samples for Problem 3"; x-axis: Number of Samples (1000 to 25000)]
Figure 19. Average Divergence resulting from Gibbs Sampling applied to P(N112 | N64 = "3", N113 = "1", N116 = "0") for sample sizes between 1000 and 25000.
[Chart: "Gibbs Sampling: KL Divergences vs Number of Samples for Problem 4"; x-axis: Number of Samples (1000 to 25000)]
Figure 20. Divergences resulting from Gibbs Sampling applied to P(N143 | N146 = "1", N116 = "0", N121 = "1") for sample sizes between 1000 and 25000.
[Chart: "Gibbs Sampling: Average KL Divergence vs Number of Samples for Problem 4"; x-axis: Number of Samples (1000 to 25000)]
Figure 21. Average divergence resulting from Gibbs Sampling applied to P(N143 | N146 = "1", N116 = "0", N121 = "1") for sample sizes between 1000 and 25000.
B. Discussion of Results
Four interesting things:
1. Number of samples in Gibbs versus Likelihood Weighting
As seen from the figures in Section 7.A.1, likelihood weighting tends to converge after about 500 samples, and always by 1000 in our problems and analyses.
We originally assumed that Gibbs sampling would converge in about the same time, if not faster. It turns out that Gibbs takes much longer; it typically converges by 5000 samples, a full order of magnitude more, as can be seen from the figures in Section 7.A.2. This is likely because of the Markov chain approach used: since each sample depends on the ones before it, it can take many iterations before the algorithm settles into the global optimum, whereas likelihood weighting draws each sample independently and accounts for the evidence through the sample's weight.
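To make the contrast with a Markov chain concrete, here is a minimal likelihood weighting sketch for the Burglary query. It is an illustration under assumptions, not the project's code: the conditional probabilities are the standard textbook values for the burglary network (the report does not list them), and the function name `likelihood_weighting` is hypothetical. Each sample is drawn independently of the previous ones, and the evidence enters only through the weight.

```python
import random

# Textbook burglary-alarm CPTs (assumed; the report does not list them).
P_B, P_E = 0.001, 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}  # P(Alarm = true | B, E)
P_J_TRUE = {True: 0.90, False: 0.05}  # P(JohnCalls = true | Alarm)
P_M_TRUE = {True: 0.70, False: 0.01}  # P(MaryCalls = true | Alarm)


def likelihood_weighting(n_samples, seed=0):
    """Estimate P(Burglary = true | JohnCalls = true, MaryCalls = true)."""
    rng = random.Random(seed)
    weight_true = weight_total = 0.0
    for _ in range(n_samples):
        # Sample the non-evidence variables in topological order.
        b = rng.random() < P_B
        e = rng.random() < P_E
        a = rng.random() < P_A[(b, e)]
        # Evidence variables stay fixed; their likelihood becomes the weight.
        w = P_J_TRUE[a] * P_M_TRUE[a]
        weight_total += w
        if b:
            weight_true += w
    return weight_true / weight_total


print(likelihood_weighting(200_000))
```

Because a burglary is sampled only rarely under the prior, small sample counts can leave this particular estimate noisy; the sketch uses a large count for that reason.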
2. Variance of time to converge can be high
The convergence of Likelihood Weighting in Problem 3, as illustrated in Figure 10 and Figure 11, exhibits very interesting properties. In the other problems, likelihood weighting runs tended to exhibit relatively low variance in time to convergence. However, here we see some runs which converged very quickly, and others that took abnormally long. This high variance occurred with high consistency in this problem, and thus is likely induced by some characteristic of the problem; one likely explanation is that our query variable is a leaf node in a very poly-tree-like network.
3. Convergence is logarithmic
This is an evident feature of all of the graphs, and it has enormous implications for the choice of algorithm.
The criterion for “completeness” of an algorithm is that it arrives at the right answer. In the case of the sampling methods that we surveyed, unfortunately it takes an infinite time to arrive at the right answer. Variable elimination, however, always arrives at the exact answer. Thus, if a user needs completeness (i.e. the right answer), they should probably use variable elimination.
However, if they only need a certain level of completeness, i.e. they want to be x% right, they still cannot rely on sampling methods to guarantee it, because any single run may be unlucky. This gives rise to the “x% correct, y% of the time” metric. We certainly see this in our graphs.
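The “x% correct, y% of the time” idea can be made concrete in the simplest possible setting. As a hedged illustration only: assume N independent, unweighted samples (plain forward sampling, which is not the setting of likelihood weighting or Gibbs sampling), let \hat{p} be the fraction of samples in which some event holds, and p its true probability. The symbols \epsilon (error tolerance) and \delta (allowed failure probability) are introduced here and do not appear in the report. Hoeffding's inequality then gives:

```latex
\Pr\bigl(|\hat{p} - p| \ge \epsilon\bigr) \le 2\exp\!\left(-2N\epsilon^{2}\right)
\quad\Longrightarrow\quad
N \ge \frac{\ln(2/\delta)}{2\epsilon^{2}}
\ \text{ suffices for }\
\Pr\bigl(|\hat{p} - p| < \epsilon\bigr) \ge 1 - \delta .
```

For example, \epsilon = 0.01 and \delta = 0.05 give N \ge \ln(40)/0.0002, or about 18,445 samples. The bound does not carry over directly to weighted or correlated samples, so it should be read as intuition for the metric rather than as a guarantee for the samplers used in this project.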
4. Local optima in Gibbs sampling, but not in Likelihood Weighting
This is a very interesting point. In both problems 3 and 4 from Task 2 under Gibbs sampling, one of the runs does not converge to zero. Instead, it seems to converge to a local optimum (which is not the global optimum). This can be seen in the pink line in Figure 18 and the jungle green line in Figure 20.
This is probably more likely in some networks than others. We could probably construct a very simple network that would not provoke this behavior.
C. Computational Considerations – Sampling versus Variable Elimination
In comparing the computation time of sampling methods to variable elimination, we limit ourselves to greedy-ordering variable elimination, since random ordering is very sub-optimal (see Section 4).
It turns out that for the networks and queries that we considered, variable elimination is the champ on both accuracy and speed. As can be seen from Table 2, variable elimination ran in roughly a second or less on each problem, while Gibbs took about 15 seconds and Likelihood Weighting took around 5 seconds.
This is with 1000 samples for the sampling algorithms, and effectively infinite samples for variable elimination.
Our results might have been different if the networks involved were much more dense (i.e. connected) or much larger.
                      Task2.Insurance1  Task2.Insurance2  Task2.Carpo1  Task2.Carpo2
Variable Elimination  0.741             1.142             0.120         0.090
Gibbs Sampling        12.778            13.530            19.228        18.045
Likelihood Weighting  4.377             4.687             5.608         5.317
Table 2. Execution time (in seconds) of various algorithms on the four problems from Task 2.
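As a quick arithmetic check on the "about 15 seconds" and "around 5 seconds" figures quoted in the text, the per-algorithm means of Table 2 can be computed directly. The dictionary layout and the names `times` and `averages` are introduced here for illustration; the numbers are copied from the table.

```python
from statistics import mean

# Execution times in seconds, copied from Table 2
# (columns: Insurance1, Insurance2, Carpo1, Carpo2).
times = {
    "Variable Elimination": [0.741, 1.142, 0.120, 0.090],
    "Gibbs Sampling": [12.778, 13.530, 19.228, 18.045],
    "Likelihood Weighting": [4.377, 4.687, 5.608, 5.317],
}

averages = {name: mean(values) for name, values in times.items()}
for name, avg in averages.items():
    print(f"{name}: {avg:.3f} s")
```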