samplesearch: importance sampling in the presence of determinism rina dechter bren school of...
TRANSCRIPT
![Page 1: SampleSearch: Importance Sampling in the presence of Determinism Rina Dechter Bren school of Information and Computer Sciences University of California,](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56649ec75503460f94bd3321/html5/thumbnails/1.jpg)
Overview
• Introduction: Mixed graphical models
• SampleSearch: Sampling with Searching
• Exploiting structure in samplinig: AND/OR Importance sampling
![Page 2: SampleSearch: Importance Sampling in the presence of Determinism Rina Dechter Bren school of Information and Computer Sciences University of California,](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56649ec75503460f94bd3321/html5/thumbnails/2.jpg)
Overview
• Introduction: Mixed graphical models
• SampleSearch: Sampling with Searching
• Exploiting structure in samplinig: AND/OR Importance sampling
![Page 3: SampleSearch: Importance Sampling in the presence of Determinism Rina Dechter Bren school of Information and Computer Sciences University of California,](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56649ec75503460f94bd3321/html5/thumbnails/3.jpg)
Bayesian Networks: Representation(Pearl, 1988)
))(|()(
))(|(
1
i
n
i
i
ii
XpaXPXP
XpaXP
:CPTs
lung Cancer
Smoking
X-ray
Bronchitis
DyspnoeaP(D|C,B)
P(B|S)
P(S)
P(X|C,S)
P(C|S)
P(S, C, B, X, D) = P(S) P(C|S) P(B|S) P(X|C,S) P(D|C,B)
Belief Updating:P (lung cancer=yes | smoking=no, dyspnoea=yes ) = ?
![Page 4: SampleSearch: Importance Sampling in the presence of Determinism Rina Dechter Bren school of Information and Computer Sciences University of California,](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56649ec75503460f94bd3321/html5/thumbnails/4.jpg)
Mixed Networks: Mixing Belief and Constraints
Belief or Bayesian Networks
A
D
B C
E
FA
D
B C
E
F
)|(),,|(
),|(),|(),|(),( :CPTS
}1,0{:Domains
,,,,, :Variables
AFPBAEP
CBDPACPABPAP
DDDDDD
FEDCBA
FEDCBA
Constraint Networks
)( :solutions ofset theExpresses
),(),(),(),( :sConstraint
}1,0{:Domains
,,,,, :Variables
4321
Rsol
EARBCDRACFRABCR
DDDDDD
FEDCBA
FEDCBA
B C D=0
D=1
0 0 0 1
0 1 .1 .9
1 0 .3 .7
1 1 1 0
),|( CBDP
allowednot is 1D1,C1,B
allowednot is 0,0,0
)(3
DCB
BCDR
B= R=
Constraints could be specified externally or may occur as zeros in the Belief network
![Page 5: SampleSearch: Importance Sampling in the presence of Determinism Rina Dechter Bren school of Information and Computer Sciences University of California,](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56649ec75503460f94bd3321/html5/thumbnails/5.jpg)
Mixed networks:Distribution and Queries
• The distribution represented by a mixed network T=(B,R):
• Queries:– Weighted Counting (Equivalent to P(e), partition
function, solution counting)
– Marginal distribution:
)(
)(Rsolx
B xPM
otherwise ,0
)( if ),(1
)(RsolxxP
MxP BT
)( iT XP
![Page 6: SampleSearch: Importance Sampling in the presence of Determinism Rina Dechter Bren school of Information and Computer Sciences University of California,](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56649ec75503460f94bd3321/html5/thumbnails/6.jpg)
Applications• Determinism: More Ubiquitous than you may think!
• Transportation Planning (Liao et al. 2004, Gogate et al. 2005)– Predicting and Inferring Car Travel Activity of individuals
• Genetic Linkage Analysis (Fischelson and Geiger, 2002)– associate functionality of genes to their location on chromosomes.
• Functional/Software Verification (Bergeron, 2000)– Generating random test programs to check validity of hardware
• First Order Probabilistic models (Domingos et al. 2006, Milch et al. 2005)– Citation matching
![Page 7: SampleSearch: Importance Sampling in the presence of Determinism Rina Dechter Bren school of Information and Computer Sciences University of California,](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56649ec75503460f94bd3321/html5/thumbnails/7.jpg)
8
S13m
L11fL11m
L13m
X11 S13f
L12fL12m
L13f
X12
X13Model for locus 1
S23m
L21fL21m
L23m
X21 S23f
L22fL22m
L23f
X22
X23Model for locus 2
Functions on Orange nodes are deterministic
![Page 8: SampleSearch: Importance Sampling in the presence of Determinism Rina Dechter Bren school of Information and Computer Sciences University of California,](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56649ec75503460f94bd3321/html5/thumbnails/8.jpg)
Approximate Inference
• Approximations are hard with determinism• Randomized Polynomial ε-approximation possible when no
zeros are present (Karp 1993, Cheng 2001)• ε-approximation NP-hard in the presence of zeros• Gibbs sampling is problematic when MCMC is not ergodic.
• Current remedies– Replace zeros with very small values (Laplace
correction: Naive Bayes, NLP)– bad performance when zeros or determinism is
real!
![Page 9: SampleSearch: Importance Sampling in the presence of Determinism Rina Dechter Bren school of Information and Computer Sciences University of California,](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56649ec75503460f94bd3321/html5/thumbnails/9.jpg)
Overview
• Introduction: Mixed graphical models
• SampleSearch: Sampling with Searching– Rejection problem – Recovery, and analysis– Empirical evaluation
• Exploiting structure in samplinig: AND/OR Importance sampling
![Page 10: SampleSearch: Importance Sampling in the presence of Determinism Rina Dechter Bren school of Information and Computer Sciences University of California,](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56649ec75503460f94bd3321/html5/thumbnails/10.jpg)
Importance Sampling: Overview
• Given a proposal or importance distribution Q(z) such that f(z)>0 implies Q(z)>0, rewrite
EXZwherezfM
eEXPeP
Zz
EXB
\ P(e) )(
),()(\
)(
)(
)(
)()()(
zQ
zfE
zQ
zQzfzfM Q
Zz Zz
Given i.i.d. samples z1,..,zN from Q(z),
P(e)M]M̂[E )(1
)(
)(1ˆQ
11
N
j
N
jj
j
j zwNzQ
zf
NM
![Page 11: SampleSearch: Importance Sampling in the presence of Determinism Rina Dechter Bren school of Information and Computer Sciences University of California,](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56649ec75503460f94bd3321/html5/thumbnails/11.jpg)
Generating i.i.d. samples from Q
A=0
B=0 B=1 B=0 B=1
A=1
C=1C=1C=1 C=0 C=0 C=0 C=1
Root
0.8 0.2
0.4 0.6
0.2 0.8 0.2 0.8 0.2 0.8 0.2 0.8
0.2 0.8
C=0
)8.0.2.0()(),|(),8.0,2.0,6.0,4.0()|(),2.0,8.0()(
),|()|()(),,(
),...,|(.....)|()( 11121
CQBACQABQAQ
BACQABQAQCBAQ
XXXQXXQXQ Q(X) nn
![Page 12: SampleSearch: Importance Sampling in the presence of Determinism Rina Dechter Bren school of Information and Computer Sciences University of California,](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56649ec75503460f94bd3321/html5/thumbnails/12.jpg)
Rejection Problemsolution. anot is xi if
:sampling Importance
0)(
)(
)(1~
1
xif
xiQ
xif
NM
N
i
A=0
B=0
C=0
B=1 B=0 B=1
A=1
C=1C=1C=1 C=0 C=0 C=0 C=1
Root
0.8 0.2
0.4 0.6
0.2 0.8 0.2 0.8 0.2 0.8 0.2 0.8
0.2 0.8
Toss a biased coin
P(H)=0.8, P(T)=0.2
Say we get H
![Page 13: SampleSearch: Importance Sampling in the presence of Determinism Rina Dechter Bren school of Information and Computer Sciences University of California,](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56649ec75503460f94bd3321/html5/thumbnails/13.jpg)
Rejection Problem solution. anot is xi if
:sampling Importance
0)(
)(
)(1~
1
xif
xiQ
xif
NM
N
i
A=0
B=0
C=0
B=1
C=1C=1 C=0
Root
0.8
0.4 0.6
0.2 0.8 0.2 0.8
Toss a biased coin
P(H)=0.4, P(T)=0.6
Say, We get a Head
![Page 14: SampleSearch: Importance Sampling in the presence of Determinism Rina Dechter Bren school of Information and Computer Sciences University of California,](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56649ec75503460f94bd3321/html5/thumbnails/14.jpg)
Rejection Problem solution. anot is xi if
:sampling Importance
0)(
)(
)(1~
1
xif
xiQ
xif
NM
N
i
A=0
B=0
C=0
B=1
C=1C=1 C=0
Root
0.8
0.4 0.6
0.2 0.8 0.2 0.8
Toss a biased coin
P(H)=0.4, P(T)=0.6
Say, We get a Head
![Page 15: SampleSearch: Importance Sampling in the presence of Determinism Rina Dechter Bren school of Information and Computer Sciences University of California,](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56649ec75503460f94bd3321/html5/thumbnails/15.jpg)
Rejection Problem
• A large number of assignments generated will be rejected, thrown away
solution. anot is xi if
:sampling Importance
0)(
)(
)(1~
1
xif
xiQ
xif
NM
N
i
A=0
B=0
C=0 C=1
Root
0.8
0.4
0.2 0.8
Toss a biased coin
P(H)=0.4, P(T)=0.6
Say, We get a Head
![Page 16: SampleSearch: Importance Sampling in the presence of Determinism Rina Dechter Bren school of Information and Computer Sciences University of California,](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56649ec75503460f94bd3321/html5/thumbnails/16.jpg)
Rejection Problem
All Blue leaves correspond to solutions i.e. f(x) >0All Red leaves correspond to non-solutions i.e. f(x)=0
solution. anot is xif 0)(
)(
)(1ˆ :sampling Importance
i
1
i
N
i i
i
xf
xQ
xf
NM
A=0
B=0
C=0
B=1 B=0 B=1
A=1
C=1C=1C=1 C=0 C=0 C=0 C=1
Root
0.8 0.2
0.4 0.6
0.2 0.8 0.2 0.8 0.2 0.8 0.2 0.8
0.2 0.8
Constraints: A≠B A≠C
![Page 17: SampleSearch: Importance Sampling in the presence of Determinism Rina Dechter Bren school of Information and Computer Sciences University of California,](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56649ec75503460f94bd3321/html5/thumbnails/17.jpg)
Revising Q to backtrack-free distribution:
All Blue leaves correspond to solutions i.e. f(x) >0All Red leaves correspond to non-solutions i.e. f(x)=0
A=0
B=0
C=0
B=1 B=0 B=1
A=1
C=1C=1C=1 C=0 C=0 C=0 C=1
Root
0.8 0.2
0.40 0.61
0 0 0 1 1 0 0 0
0.21 0.80
QF(branch)=0 if no solutions under it
QF(branch)=1 if no solutions under the other branch
QF(branch) =Q(branch) otherwise
Constraints: A≠B A≠C
![Page 18: SampleSearch: Importance Sampling in the presence of Determinism Rina Dechter Bren school of Information and Computer Sciences University of California,](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56649ec75503460f94bd3321/html5/thumbnails/18.jpg)
Generating samples from QF
C=1
Root
0.80.2
?? Solution?? Solution
?? Solution?? Solution
A=0
00.4 0.6 1
?? Solution ?? Solution
B=1
00.2 0.81
• Invoke an oracle at each branch.• Oracle returns True if
there is a solution under a branch
• False, otherwise
Constraints: A≠B A≠C QF(branch)=0 if no solutions under it
QF(branch)=1 if no solutions under the other branch
QF(branch) =Q(branch) otherwise
![Page 19: SampleSearch: Importance Sampling in the presence of Determinism Rina Dechter Bren school of Information and Computer Sciences University of California,](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56649ec75503460f94bd3321/html5/thumbnails/19.jpg)
Generating samples from QF
• Oracles: In practice• Adaptive consistency as pre-
processing step• A complete CSP solver
• Too costly• O(exp(treewidth))• Invoked O(nd) for each
sample
A=0
B=1
C=1
Root
0.80.2
1
1
Constraints: A≠B A≠C
Gogate et al., UAI 2005, Gogate and Dechter, UAI 2005
![Page 20: SampleSearch: Importance Sampling in the presence of Determinism Rina Dechter Bren school of Information and Computer Sciences University of California,](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56649ec75503460f94bd3321/html5/thumbnails/20.jpg)
Approximations of QF
• Use i-consistency instead of adaptive consistency – O(ni) time and space complexity– identify some zeros so that they are never sampled
• Cons: Too weak when constraint portion is hard.
Reje
ction
Rat
e
Percentage of Zeros identified
NP-hard exponential time and space
No processing
Polynomial consistency enforcement
Gogate et al., UAI 2005, Gogate and Dechter, UAI 2005
![Page 21: SampleSearch: Importance Sampling in the presence of Determinism Rina Dechter Bren school of Information and Computer Sciences University of California,](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56649ec75503460f94bd3321/html5/thumbnails/21.jpg)
25
Algorithm SampleSearch
A=0
B=0
C=0
B=1 B=0 B=1
A=1
C=1C=1C=1 C=0 C=0 C=0 C=1
Root
0.8 0.2
0.4 0.6
0.2 0.8 0.2 0.8 0.2 0.8 0.2 0.8
0.2 0.8
N
j j
j
zQ
zf
NM
1 )(
)(1ˆ
Gogate and Dechter, AISTATS 2007, AAAI 2007
Constraints: A≠B A≠C
![Page 22: SampleSearch: Importance Sampling in the presence of Determinism Rina Dechter Bren school of Information and Computer Sciences University of California,](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56649ec75503460f94bd3321/html5/thumbnails/22.jpg)
26
Algorithm SampleSearch
A=0
B=0
C=0
B=1 B=0 B=1
A=1
C=1C=1C=1 C=0 C=0 C=0 C=1
Root
0.8 0.2
0.4 0.6
0.2 0.8 0.2 0.8 0.2 0.8 0.2 0.8
0.2 0.81
N
j j
j
zQ
zf
NM
1 )(
)(1ˆ
Gogate and Dechter, AISTATS 2007, AAAI 2007
Constraints: A≠B A≠C
![Page 23: SampleSearch: Importance Sampling in the presence of Determinism Rina Dechter Bren school of Information and Computer Sciences University of California,](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56649ec75503460f94bd3321/html5/thumbnails/23.jpg)
27
Algorithm SampleSearch
A=0
B=1 B=0 B=1
A=1
C=1C=1C=0 C=0 C=0 C=1
Root
0.8 0.2
0.2 0.8 0.2 0.8 0.2 0.8
0.2 0.8
Resume Sampling
1
N
j j
j
zQ
zf
NM
1 )(
)(1ˆ
Gogate and Dechter, AISTATS 2007, AAAI 2007
Constraints: A≠B A≠C
![Page 24: SampleSearch: Importance Sampling in the presence of Determinism Rina Dechter Bren school of Information and Computer Sciences University of California,](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56649ec75503460f94bd3321/html5/thumbnails/24.jpg)
28
Algorithm SampleSearch
A=0
B=1 B=0 B=1
A=1
C=1C=1C=0 C=0 C=0 C=1
Root
0.8 0.2
1
0.2 0.8 0.2 0.8 0.2 0.8
0.2 0.8
Constraint Violated
Until f(sample)>0
1
N
j j
j
zQ
zf
NM
1 )(
)(1ˆ
Gogate and Dechter, AISTATS 2007, AAAI 2007
Constraints: A≠B A≠C
![Page 25: SampleSearch: Importance Sampling in the presence of Determinism Rina Dechter Bren school of Information and Computer Sciences University of California,](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56649ec75503460f94bd3321/html5/thumbnails/25.jpg)
29
Generate more Samples
A=0
B=0
C=0
B=1 B=0 B=1
A=1
C=1C=1C=1 C=0 C=0 C=0 C=1
Root
0.8 0.2
0.4 0.6
0.2 0.8 0.2 0.8 0.2 0.8 0.2 0.8
0.2 0.8
N
j j
j
zQ
zf
NM
1 )(
)(1ˆ
Gogate and Dechter, AISTATS 2007, AAAI 2007
Constraints: A≠B A≠C
![Page 26: SampleSearch: Importance Sampling in the presence of Determinism Rina Dechter Bren school of Information and Computer Sciences University of California,](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56649ec75503460f94bd3321/html5/thumbnails/26.jpg)
30
Generate more Samples
A=0
B=0
C=0
B=1 B=0 B=1
A=1
C=1C=1C=1 C=0 C=0 C=0 C=1
Root
0.8 0.2
0.4 0.6
0.2 0.8 0.2 0.8 0.2 0.8 0.2 0.8
0.2 0.8
1
N
j j
j
zQ
zf
NM
1 )(
)(1ˆ
Gogate and Dechter, AISTATS 2007, AAAI 2007
Constraints: A≠B A≠C
![Page 27: SampleSearch: Importance Sampling in the presence of Determinism Rina Dechter Bren school of Information and Computer Sciences University of California,](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56649ec75503460f94bd3321/html5/thumbnails/27.jpg)
Traces of SampleSearch
A=0
B=1
C=1
Root
A=0
B=0B=1
C=1
Root
A=0
B=1
C=1
Root
C=0
A=0
B=0 B=1
C=1
Root
C=0
Constraints: A≠B A≠C
Gogate and Dechter, AISTATS 2007, AAAI 2007
![Page 28: SampleSearch: Importance Sampling in the presence of Determinism Rina Dechter Bren school of Information and Computer Sciences University of California,](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56649ec75503460f94bd3321/html5/thumbnails/28.jpg)
32
SampleSearch: Sampling Distribution• Problem: Due to Search, the samples are no
longer i.i.d. from Q
• Theorem: SampleSearch generates i.i.d. samples from the backtrack-free distribution
M]M[E )(
)(1
ondistributi free-Backtrack :Q
FQ
1
F
N
j jF
j
zQ
zf
NM
MMEzQ
zf
NM Q
N
j j
j
ˆ ,)(
)(1ˆ1
Gogate and Dechter, AISTATS 2007, AAAI 2007
![Page 29: SampleSearch: Importance Sampling in the presence of Determinism Rina Dechter Bren school of Information and Computer Sciences University of California,](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56649ec75503460f94bd3321/html5/thumbnails/29.jpg)
The Sampling distribution QF of SampleSearch
A=0
B=0 B=1
C=1C=1 C=0
Root
0.8
0 1
0 0 0 1
C=0
What is probability of generating A=0?
QF(A=0)=0.8
Why? SampleSearch is systematic
What is probability of generating (A=0,B=1)?
QF(B=1|A=0)=1
Why? SampleSearch is systematic
What is probability of generating (A=0,B=0)?
Simple: QF(B=0|A=0)=0
All samples generated by SampleSearch are solutions
Backtrack-free distribution
N
i i
i
xQ
xf
NM
1 )(
)(1~
Gogate and Dechter, AISTATS 2007, AAAI 2007
Constraints: A≠B A≠C
![Page 30: SampleSearch: Importance Sampling in the presence of Determinism Rina Dechter Bren school of Information and Computer Sciences University of California,](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56649ec75503460f94bd3321/html5/thumbnails/30.jpg)
Asymptotic approximations of QF
A=0
B=0 B=1
C=1C=1 C=0
Root
0.8
0 1
0 0 0 1
C=0
Hole ?
Don’t know
No solutions here
No solutions here
•IF Hole THEN
• UF=Q (i.e. there is a solution at the other branch)
• LF=0 (i.e. no solution at the other branch)
Gogate and Dechter, AISTATS 2007, AAAI 2007
QF(branch)=0 if no solutions under it
QF(branch) =Q(branch) otherwise
![Page 31: SampleSearch: Importance Sampling in the presence of Determinism Rina Dechter Bren school of Information and Computer Sciences University of California,](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56649ec75503460f94bd3321/html5/thumbnails/31.jpg)
Approximations:Convergence in the limit
• Store all possible traces
A=0
B=1
C=1
Root
0.8
1
1
?
Gogate and Dechter, AISTATS 2007, AAAI 2007
A=0
B=1
C=1
Root
C=0
0.8
?
0.6
1
?A=0
B=0B=1
C=1
Root
0.8
?
1
0.8
?
![Page 32: SampleSearch: Importance Sampling in the presence of Determinism Rina Dechter Bren school of Information and Computer Sciences University of California,](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56649ec75503460f94bd3321/html5/thumbnails/32.jpg)
Approximations:Convergence in the limit
• From the combined sample tree, update U and L.
IF Hole THEN UFN=Q and LF
N=0
)()()(:
)(
)(lim
)(
)(lim
)(
)()(
xLxQxUBounding
unbiasedallyAsymptotic
MxL
xfE
xU
xfE
xQ
xfExfM
FN
FFN
FN
NFN
N
FXx
A=0
B=1
C=1
Root
0.8
1
1
?
Gogate and Dechter, AISTATS 2007, AAAI 2007
![Page 33: SampleSearch: Importance Sampling in the presence of Determinism Rina Dechter Bren school of Information and Computer Sciences University of California,](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56649ec75503460f94bd3321/html5/thumbnails/33.jpg)
Improving Naive SampleSeach:The IJGP-wc-SS algorithm
• Better Search Strategy– Can use any state-of-the-art CSP/SAT solver e.g. minisat (Een and
Sorrenson 2006)
• Better Proposal distribution– Use output of IJGP- a generalized belief propagation to compute
the initial importance function
• w-cutset importance sampling (Bidyuk and Dechter, 2007)– Reduce variance by sampling from a subspace
Gogate and Dechter, AISTATS 2007, AAAI 2007
![Page 34: SampleSearch: Importance Sampling in the presence of Determinism Rina Dechter Bren school of Information and Computer Sciences University of California,](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56649ec75503460f94bd3321/html5/thumbnails/34.jpg)
Experiments• Tasks
– Weighted Counting– Marginals
• Benchmarks– Satisfiability problems (counting solutions)– Linkage networks– Relational instances (First order probabilistic networks)– Grid networks– Logistics planning instances
• Algorithms– IJGP-wc-SS/LB and IJGP-wc-SS/UB– IJGP-wc-IS (Vanilla algorithm that does not perform search)– SampleCount (Gomes et al. 2007, SAT)– ApproxCount (Wei and Selman, 2007, SAT)– EPIS (Changhe and Druzdzel, 2006)– RELSAT (Bayardo and Peshoueshk, 2000, SAT)– Edge Deletion Belief Propagation (Choi and Darwiche, 2006)– Iterative Join Graph Propagation (Dechter et al., 2002)– Variable Elimination and Conditioning (VEC)
![Page 35: SampleSearch: Importance Sampling in the presence of Determinism Rina Dechter Bren school of Information and Computer Sciences University of California,](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56649ec75503460f94bd3321/html5/thumbnails/35.jpg)
Results: Probability of EvidenceLinkage instances (UAI 2006 evaluation)
Time Bound: 3 hrsM: number of samples generated in 10 hrsZ: Probability of Evidence
![Page 36: SampleSearch: Importance Sampling in the presence of Determinism Rina Dechter Bren school of Information and Computer Sciences University of California,](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56649ec75503460f94bd3321/html5/thumbnails/36.jpg)
Results: Probability of EvidenceRelational instances (UAI 2008 evaluation)
Time Bound: 10 hrsM: number of samples generated in 10 hrsZ: Probability of Evidence
![Page 37: SampleSearch: Importance Sampling in the presence of Determinism Rina Dechter Bren school of Information and Computer Sciences University of California,](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56649ec75503460f94bd3321/html5/thumbnails/37.jpg)
![Page 38: SampleSearch: Importance Sampling in the presence of Determinism Rina Dechter Bren school of Information and Computer Sciences University of California,](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56649ec75503460f94bd3321/html5/thumbnails/38.jpg)
Results: Solution CountsLatin Square instances (size 8 to 16)
Time Bound: 10 hrsM: number of samples generated in 10 hrsZ: Solution Counts
![Page 39: SampleSearch: Importance Sampling in the presence of Determinism Rina Dechter Bren school of Information and Computer Sciences University of California,](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56649ec75503460f94bd3321/html5/thumbnails/39.jpg)
Results: Solution CountsLangford instances
Time Bound: 10 hrsM: number of samples generated in 10 hrsZ: Solution Counts
![Page 40: SampleSearch: Importance Sampling in the presence of Determinism Rina Dechter Bren school of Information and Computer Sciences University of California,](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56649ec75503460f94bd3321/html5/thumbnails/40.jpg)
![Page 41: SampleSearch: Importance Sampling in the presence of Determinism Rina Dechter Bren school of Information and Computer Sciences University of California,](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56649ec75503460f94bd3321/html5/thumbnails/41.jpg)
Results on Marginals
• Evaluation Criteria
• Always bounded between 0 and 1• Lower Bounds the KL distance• When probabilities close to zero are present KL
distance may tend to infinity.
n
xAxP
cedisHellinger
xAeApproximatxPExactn
i Dxii
ii
ii
1
2)()(
21
tan
)(: )(:
![Page 42: SampleSearch: Importance Sampling in the presence of Determinism Rina Dechter Bren school of Information and Computer Sciences University of California,](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56649ec75503460f94bd3321/html5/thumbnails/42.jpg)
Results: Posterior MarginalsLinkage instances (UAI 2006 evaluation)
Time Bound: 3 hrsTable shows the Hellinger distance (∆) and Number of samples: M
![Page 43: SampleSearch: Importance Sampling in the presence of Determinism Rina Dechter Bren school of Information and Computer Sciences University of California,](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56649ec75503460f94bd3321/html5/thumbnails/43.jpg)
![Page 44: SampleSearch: Importance Sampling in the presence of Determinism Rina Dechter Bren school of Information and Computer Sciences University of California,](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56649ec75503460f94bd3321/html5/thumbnails/44.jpg)
Summary: SampleSearch• Manages rejection problem while sampling
• Sampling Distribution is the backtrack-free distribution QF
• Approximation of QF by storing all traces yielding an asymptotically unbiased estimator– Linear time and space overhead– Bound the weighted counts from above and below
• Empirically, when a substantial number of zero probabilities are present, SampleSearch dominate.
![Page 45: SampleSearch: Importance Sampling in the presence of Determinism Rina Dechter Bren school of Information and Computer Sciences University of California,](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56649ec75503460f94bd3321/html5/thumbnails/45.jpg)
Overview
• Introduction: Mixed graphical models
• SampleSearch: Sampling with Searching
• Exploiting structure in sampling: AND/OR Importance sampling
![Page 46: SampleSearch: Importance Sampling in the presence of Determinism Rina Dechter Bren school of Information and Computer Sciences University of California,](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56649ec75503460f94bd3321/html5/thumbnails/46.jpg)
51
Motivation
)(
)(
)(
)()()(
zQ
zfE
zQ
zQzfzfM Q
Zz Zz
4,3,2,1
4432321 )(),,(),,()(xxxx
CBAz
xfxxxfxxxfzfM
4,3,2,1
4321 ),()()()(xxxx
CBAz
xxfxfxfzfM
Given Q(z), Importance sampling totally disregards the structure of f(z) while approximating it.
![Page 47: SampleSearch: Importance Sampling in the presence of Determinism Rina Dechter Bren school of Information and Computer Sciences University of California,](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56649ec75503460f94bd3321/html5/thumbnails/47.jpg)
52
OR Search Tree
0 1 0 1 0 1 0 1
0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
0 1 0 1
C
D
F
E
B
A 0 1
A
E
C
B
F
D
A B C RABC
0 0 0 10 0 1 10 1 0 00 1 1 11 0 0 11 0 1 11 1 0 11 1 1 0
A B E RABE
0 0 0 10 0 1 00 1 0 10 1 1 11 0 0 01 0 1 11 1 0 11 1 1 0
A E F RAEF
0 0 0 00 0 1 10 1 0 10 1 1 11 0 0 11 0 1 11 1 0 11 1 1 0
B C D RBCD
0 0 0 10 0 1 10 1 0 10 1 1 01 0 0 11 0 1 01 1 0 11 1 1 1
Constraint Satisfaction – Counting Solutions
![Page 48: SampleSearch: Importance Sampling in the presence of Determinism Rina Dechter Bren school of Information and Computer Sciences University of California,](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56649ec75503460f94bd3321/html5/thumbnails/48.jpg)
54
AND/OR Search Tree
A
E
C
B
F
D
A
D
B
EC
F
A B C RABC
0 0 0 10 0 1 10 1 0 00 1 1 11 0 0 11 0 1 11 1 0 11 1 1 0
A B E RABE
0 0 0 10 0 1 00 1 0 10 1 1 11 0 0 01 0 1 11 1 0 11 1 1 0
A E F RAEF
0 0 0 00 0 1 10 1 0 10 1 1 11 0 0 11 0 1 11 1 0 11 1 1 0
B C D RBCD
0 0 0 10 0 1 10 1 0 10 1 1 01 0 0 11 0 1 01 1 0 11 1 1 1
AOR
0AND
BOR
0AND
OR E
OR F F
AND 0 1 0 1
AND 0 1
C
D D
0 1 0 1
0 1
1
E
F F
0 1 0 1
0 1
C
D D
0 1 0 1
0 1
1
B
0
E
F F
0 1 0 1
0 1
C
D D
0 1 0 1
0 1
1
E
F F
0 1 0 1
0 1
C
D D
0 1 0 1
0 1
Constraint Satisfaction – Counting Solutions
pseudo tree
![Page 49: SampleSearch: Importance Sampling in the presence of Determinism Rina Dechter Bren school of Information and Computer Sciences University of California,](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56649ec75503460f94bd3321/html5/thumbnails/49.jpg)
55
AND/OR TreeA
D
B
EC
F
AOR
0AND
BOR
0AND
OR E
OR F
AND 0 1
AND 0 1
C
D D
0 1 0 1
0 1
1 1 1 0 0 1
0
1
E
F F
0 1 0 1
0 1
C
D
0 1
0 1
1
B
0
E
F
0 1
0 1
C
D D
0 1 0 1
0 1
1
E
F
0 1
0 1
C
D
0 1
0 1
1 1 0 1 1 1
0
1 1 1 0 1 0 1 0 1 1
0 0 0
pseudo tree
OR – casing upon variable values
AND – decomposition into independent subproblems
![Page 50: SampleSearch: Importance Sampling in the presence of Determinism Rina Dechter Bren school of Information and Computer Sciences University of California,](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56649ec75503460f94bd3321/html5/thumbnails/50.jpg)
56
Complexity of AND/OR Tree Search
AND/OR tree OR tree
Space O(n) O(n)
Time
O(n km)O(n kw* log n)
[Freuder & Quinn85], [Collin, Dechter & Katz91], [Bayardo & Miranker95], [Darwiche01]
O(kn)
k = domain sizem = depth of pseudo-treen = number of variablesw*= treewidth
![Page 51: SampleSearch: Importance Sampling in the presence of Determinism Rina Dechter Bren school of Information and Computer Sciences University of California,](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56649ec75503460f94bd3321/html5/thumbnails/51.jpg)
Background: AND/OR search space
X
Z
Y
≠
≠
{0,1,2}
{0,1,2}
{0,1,2}
Problem
Z
X Y
Pseudo-tree
X
Z
YChain Pseudo-tree
Z
0 1 2
X Y X Y X Y
1 2 1 202 0 2 0 101
OR
AND
OR
AND
AND/OR Tree
X
0 1 2
Z Z Z
1 2 20 0 1
Y
1 2
Y
1 2
Y
0 2
Y
0 2
Y
0 1
Y
0 1
OR Tree
57
![Page 52: SampleSearch: Importance Sampling in the presence of Determinism Rina Dechter Bren school of Information and Computer Sciences University of California,](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56649ec75503460f94bd3321/html5/thumbnails/52.jpg)
Example Bayesian network
58
P(X|Z) X=0 X=1 X=2
Z=0 0.3 0.4 0.3
Z=1 0.2 0.7 0.1
P(Y|Z) Y=0 Y=1 Y=2
Z=0 0.5 0.1 0.4
Z=1 0.2 0.6 0.2
P(Y)
Y=0 0.2
Y=1 0.7
Y=2 0.1
P(Z) Z=0 Z=1
0.8 0.2
Y
B
Z
X
A
Evidence A=0, B=0
P(X)
X=0 0.1
X=1 0.2
X=2 0.6
f(XZ) f(YZ)
f(Z)
XYZ
ZfYZfXZfbaPM )()()(),(
![Page 53: SampleSearch: Importance Sampling in the presence of Determinism Rina Dechter Bren school of Information and Computer Sciences University of California,](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56649ec75503460f94bd3321/html5/thumbnails/53.jpg)
59
Recap:Conventional Importance Sampling
)|()|()(
)()()(
)|()|()()|()|()(
)()()(
ZYQZXQZQ
ZfYZfXZfE
ZYQZXQZQZYQZXQZQ
ZfYZfXZfM
Q
XYZ
XYZ
ZfYZfXZfM )()()(
)|()|()()( ZYQZXQZQXYZQ
Gogate and Dechter, UAI 2008, CP2008
AND/OR idea!Decompose this expectation
Y
B
Z
X
A
![Page 54: SampleSearch: Importance Sampling in the presence of Determinism Rina Dechter Bren school of Information and Computer Sciences University of California,](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56649ec75503460f94bd3321/html5/thumbnails/54.jpg)
60
AND/OR Importance Sampling (General Idea)
Z
X Y
Pseudo-tree
)|(
)|(
)()|(
)|(
)()(
)(
)(ZYQ
ZYQ
YZfZXQ
ZXQ
XZfZQ
ZQ
ZfM
YXZ
)|()|()()|()|()(
)()()(ZYQZXQZQ
ZYQZXQZQ
ZfYZfXZfM
XYZ
Z
ZYQ
YZfEZ
ZXQ
XZfE
ZQ
ZfEM QQQ |
)|(
)(|
)|(
)(
)(
)(
• Decompose Expectation
Y
B
Z
X
A
![Page 55: SampleSearch: Importance Sampling in the presence of Determinism Rina Dechter Bren school of Information and Computer Sciences University of California,](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56649ec75503460f94bd3321/html5/thumbnails/55.jpg)
61
• Estimate all conditional expectations separately• How?
– Record all samples– For each sample that has Z=j
• Estimate the conditional expectations X|Z and Y|Z using samples corresponding to X|Z=j and Y|Z=j respectively.
• Combine the results
AND/OR Importance Sampling (General Idea) Z
X Y
Pseudo-tree
Z
ZYQ
YZfEZ
ZXQ
XZfE
ZQ
ZfEM QQQ |
)|(
)(|
)|(
)(
)(
)(
Y
B
Z
X
A
![Page 56: SampleSearch: Importance Sampling in the presence of Determinism Rina Dechter Bren school of Information and Computer Sciences University of California,](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56649ec75503460f94bd3321/html5/thumbnails/56.jpg)
62
AND/OR Importance Sampling
Z
0
X
2
1
1
Y
0 1
X
21
Y
0 1
<1.6,2> <0.4,2>
<0.16,1> <0.36,1> <0.2,1><0.14,1> <0.28,1> <0.84,1>
<0.08,1><0.12,1>
0|)0|(
)0,(Z
ZXQ
ZXfE
0|)0|(
)0,(Z
ZYQ
ZYfE
Sample # Z X Y
1 0 1 0
2 0 2 1
3 1 1 1
4 1 2 0
2/)0,2w(x0)z1,w(x
0 Zhaving Xon samples of Weight Average
0|)0|(
)0,( of Estimate
z
ZZXQ
ZXfE
Z
X Y
Pseudo-tree
Z
ZYQ
YZfEZ
ZXQ
XZfE
ZQ
ZfEM QQQ |
)|(
)(|
)|(
)(
)(
)(
Gogate and Dechter, UAI 2008, CP2008
Y
B
Z
X
A
![Page 57: SampleSearch: Importance Sampling in the presence of Determinism Rina Dechter Bren school of Information and Computer Sciences University of California,](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56649ec75503460f94bd3321/html5/thumbnails/57.jpg)
64
AND/OR Importance Sampling
Z
0
X
2
1
1
Y
0 1
X
21
Y
0 1
<1.6,2> <0.4,2>
<0.16,1> <0.36,1> <0.2,1><0.14,1> <0.28,1> <0.84,1>
<0.08,1><0.12,1>
0.262
0.360.16
0.17
2
0.140.2
0.462
0.840.08
0.2
2
0.120.28
0.09246.0*2.0 0.044217.0*26.0
05376.04
)092.04.02(0.0442)1.6(2
All AND nodes: Separate Components. Take ProductOperator: Product
All OR nodes: Conditional Expectations given the assignment above it
Operator: Average
Sample # Z X Y
1 0 1 0
2 0 2 1
3 1 1 1
4 1 2 0
Gogate and Dechter, UAI 2008, CP2008
![Page 58: SampleSearch: Importance Sampling in the presence of Determinism Rina Dechter Bren school of Information and Computer Sciences University of California,](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56649ec75503460f94bd3321/html5/thumbnails/58.jpg)
65
Algorithm AND/OR Importance Sampling
1. Generate samples x1, . . . ,xN from Q along O.
2. Build a AND/OR sample tree for the samples x1, . . . ,xN along the ordering O.
3. FOR all leaf nodes i of AND-OR tree do1. IF AND-node v(i)= 1 ELSE v(i)=0
4. FOR every node n from leaves to the root do1. IF AND-node v(n)=product of children2. IF OR-node v(n) = Average of children
5. Return v(root-node)
Gogate and Dechter, UAI 2008, CP2008
![Page 59: SampleSearch: Importance Sampling in the presence of Determinism Rina Dechter Bren school of Information and Computer Sciences University of California,](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56649ec75503460f94bd3321/html5/thumbnails/59.jpg)
67
Properties of AND/OR Importance Sampling
• Unbiased estimate of weighted counts.• AND/OR estimate has lower Variance than
conventional importance sampling estimate.• Variance Reduction
– Easy to Prove for case of complete independence (Goodman, 1960)
– Complicated to prove for general conditional independence case (Gogate thesis, papers!)
Gogate and Dechter, UAI 2008, CP2008
![Page 60: SampleSearch: Importance Sampling in the presence of Determinism Rina Dechter Bren school of Information and Computer Sciences University of California,](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56649ec75503460f94bd3321/html5/thumbnails/60.jpg)
AND/OR w-cutset (Rao-Blackwellised) sampling
• Rao-Blackwellisation (Rao, 1963)– Partition X into K and R, such that we can compute
P(R|k) efficiently.– Sample from K and sum out R– Estimate:
– w-cutset sampling (Bidyuk and Dechter, 2003): Select K such that the treewidth of R after removing K is bounded by “w”.
N
i i
Ri
RB kQ
kRP
NM
1 )(
)|(1ˆ
Weighted Counts conditioned on K=ki
Gogate and Dechter, Under review, CP 2009
![Page 61: SampleSearch: Importance Sampling in the presence of Determinism Rina Dechter Bren school of Information and Computer Sciences University of California,](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56649ec75503460f94bd3321/html5/thumbnails/61.jpg)
AND/OR w-cutset sampling• Perform AND/OR tree or graph sampling on K• Exact inference on R• Orthogonal approaches:
– Theorem: Combining AND/OR sampling and w-cutset sampling yields further variance reduction.
A
B C
D
E G
F
A
B C
D
E G
F
A
B C
A
B
C
Full pseudo tree
Start pseudo treeon the cutset variables
OR pseudo tree on the cutset variables
Graphical model
Gogate and Dechter, Under review, CP 2009
![Page 62: SampleSearch: Importance Sampling in the presence of Determinism Rina Dechter Bren school of Information and Computer Sciences University of California,](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56649ec75503460f94bd3321/html5/thumbnails/62.jpg)
70
From Search Trees to Search Graphs
• Any two nodes that root identical subtrees (subgraphs) can be merged
![Page 63: SampleSearch: Importance Sampling in the presence of Determinism Rina Dechter Bren school of Information and Computer Sciences University of California,](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56649ec75503460f94bd3321/html5/thumbnails/63.jpg)
71
From Search Trees to Search Graphs
• Any two nodes that root identical subtrees (subgraphs) can be merged
![Page 64: SampleSearch: Importance Sampling in the presence of Determinism Rina Dechter Bren school of Information and Computer Sciences University of California,](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56649ec75503460f94bd3321/html5/thumbnails/64.jpg)
72
Merging Based on Context
• One way of recognizing nodes that can be merged:
context (X) = ancestors of X in pseudo tree that are connected to X, or to descendants of X
[ ]
[A]
[AB]
[AE][BC]
[AB]
A
D
B
EC
F
pseudo tree
A
E
C
B
F
D
A
E
C
B
F
D
![Page 65: SampleSearch: Importance Sampling in the presence of Determinism Rina Dechter Bren school of Information and Computer Sciences University of California,](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56649ec75503460f94bd3321/html5/thumbnails/65.jpg)
AND/OR Graphs
C
B D
A
2
B D
0 101
C
0 1
B D B D
1 2 1 202 0 2
2 0 1 1
A
1 2
A
0 2
A
0
A
1
A
2
A
0
2
B D
0 101
C
0 1
D D
1 2 0 2
1 2
2
0
B
1
A
0
2
A
1
B
0
A
2
AND/OR tree
AND/OR graph
C
B
D
A
≠
≠
≠
{0,1,2}
{0,1,2}
{0,1,2}
{0,1,2}
![Page 66: SampleSearch: Importance Sampling in the presence of Determinism Rina Dechter Bren school of Information and Computer Sciences University of California,](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56649ec75503460f94bd3321/html5/thumbnails/66.jpg)
C
0
B
1
D
1
A
0
2
D
2
A
1
1
B
0
D
0
A
2 0
2
D
2
A
C
0 1
B D B D
1 2 1 202 0 2
A
0
A
1
A
2
A
0
C
0 1
D D
1 2 0 22
0
B
1
A
0
2
A
1
B
0
A
2
C
B D
A
C
B
D
A
≠
≠
≠
{0,1,2}
{0,1,2}
{0,1,2}
{0,1,2}
OR Sample Tree
AND/OR Sample Tree AND/OR Sample
Graph
AND/OR graph sampling
![Page 67: SampleSearch: Importance Sampling in the presence of Determinism Rina Dechter Bren school of Information and Computer Sciences University of California,](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56649ec75503460f94bd3321/html5/thumbnails/67.jpg)
IS
w-cutset ISAND/OR
Tree IS
AND/OR w-cutset Tree IS
AND/OR Graph IS
AND/OR w-cutset Graph IS
Variance Hierarchy and Complexity
O(nN)O(1)
O(nN)O(h)
O(cN+(n-c)Nexp(w))O(h+(n-c)exp(w))
O(cN+(n-c)Nexp(w))O(1)
O(nNw*)O(nN)
O(cNw*+(n-c)Nexp(w))O(cN+(n-c)exp(w))
![Page 68: SampleSearch: Importance Sampling in the presence of Determinism Rina Dechter Bren school of Information and Computer Sciences University of California,](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56649ec75503460f94bd3321/html5/thumbnails/68.jpg)
Experiments
• Benchmarks– Linkage analysis– Graph coloring– Grids
• Algorithms– OR tree sampling– AND/OR tree sampling– AND/OR graph sampling– w-cutset versions of the three schemes above
![Page 69: SampleSearch: Importance Sampling in the presence of Determinism Rina Dechter Bren school of Information and Computer Sciences University of California,](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56649ec75503460f94bd3321/html5/thumbnails/69.jpg)
Results: Probability of EvidenceLinkage instances (UAI 2006 evaluation)
Time Bound: 1hr
![Page 70: SampleSearch: Importance Sampling in the presence of Determinism Rina Dechter Bren school of Information and Computer Sciences University of California,](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56649ec75503460f94bd3321/html5/thumbnails/70.jpg)
![Page 71: SampleSearch: Importance Sampling in the presence of Determinism Rina Dechter Bren school of Information and Computer Sciences University of California,](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56649ec75503460f94bd3321/html5/thumbnails/71.jpg)
82
Summary: AND/OR Importance sampling
• AND/OR sampling: A general scheme to exploit conditional independence in sampling
• Theoretical guarantees: lower sampling error than conventional sampling
• Variance reduction orthogonal to Rao-Blackwellised sampling.
• Better empirical performance than conventional sampling.