Adaptive Belief Propagation
https://github.com/geopapa11/adabp
Georgios Papachristoudis and John W. Fisher III
Massachusetts Institute of Technology
Motivation
• Graphical models are commonly used to represent large-scale inference problems.
• At any given time, only a small subset of all variables may be of interest.
• Observations may arrive at different times, changing the underlying distribution.
• The marginals of the variables of interest change after new observations are added.
• It is desirable to evaluate the desired statistics efficiently and avoid redundant computation.
[Figure: four snapshots of a sensor network. Model parameters (bold nodes) change due to the addition of new observations w_1, w_2, ..., while only a small subset of latent nodes is of interest (?). Temperature measurements (from sensors) are added sequentially, while only part of the latent variables (the server) is of interest.]
Contributions
• We develop an adaptive inference approach that gives the exact marginals on trees and whose average-case performance is significantly better than that of standard BP.
• We provide an extension to Gaussian loopy graphs, where the results are exact.
• Implementation is straightforward; code is publicly available.
Problem statement
• X = {X_1, ..., X_N}: N latent variables that are the focus of inference.
• Direct dependencies between latent variables are represented by the edge set E.
• The neighbors of X_k are denoted N(k).
• Each latent node X_k is linked to m_k ≥ 0 measurements/observations: Y_{k,1}, ..., Y_{k,m_k}.
[Figure: latent graph X_1, X_2, ..., X_N, with each X_k attached to its measurements Y_{k,1}, ..., Y_{k,m_k}.]
• Discrete setting: X_k ∈ X; Gaussian setting: X_k ∈ R^d.
Updating node potentials
• φ^{(0)}_k(x_k): node potentials of latent variables.
• χ_{kℓ}(x_k, y_ℓ): pairwise potential between latent and observed variables.
• ψ_{ij}(x_i, x_j): pairwise potential between latent variables.
• A new measurement Y_{w_ℓ,u} = y_u at iteration ℓ changes the node potential of X_{w_ℓ} as
  φ^{(ℓ)}_{w_ℓ}(x_{w_ℓ}) = φ^{(ℓ-1)}_{w_ℓ}(x_{w_ℓ}) χ_{w_ℓ u}(x_{w_ℓ}, y_u).
• We only consider the graph over the latent nodes.
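The potential update is a pointwise product over the states of X_{w_ℓ}. A minimal NumPy sketch, where the function name, array shapes, and numbers are illustrative:

```python
import numpy as np

def absorb_measurement(phi, chi, y_index):
    """Fold a new measurement into a node potential.

    phi:     current node potential of X_{w_l}, shape (|X|,)
    chi:     pairwise potential chi(x, y) between X_{w_l} and the
             measurement variable, shape (|X|, |Y|)
    y_index: index of the observed value y_u in the measurement alphabet
    """
    # phi^{(l)}(x) = phi^{(l-1)}(x) * chi(x, y_u), evaluated pointwise
    return phi * chi[:, y_index]

# Example: binary latent variable with a noisy binary measurement
phi = np.array([0.5, 0.5])                 # prior node potential
chi = np.array([[0.9, 0.1], [0.2, 0.8]])   # observation model
print(absorb_measurement(phi, chi, 1))     # observe y = 1
```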
Measurement and marginal orders
Measurement order w = {w_1, ..., w_M}: the order in which measurements are acquired.
Marginal order v = {v_1, ..., v_M}: the order of (latent) nodes whose marginal is of interest at each step.
[Figure: three snapshots of the latent graph; at step ℓ a measurement arrives at node w_ℓ while the marginal of node v_ℓ (?) is queried.]
Adaptive BP
Belief Propagation
• Belief propagation is a message-passing algorithm that computes the node marginals in time linear in the number of latent nodes.
• A message is given by m_{i→j}(x_j) = Σ_{x_i} φ_i(x_i) ψ_{ij}(x_i, x_j) Π_{k∈N(i)\j} m_{k→i}(x_i).
[Figure: the message m_{i→j} summarizes the subtree T_i rooted at node i.]
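The message recursion amounts to a matrix–vector product over the state space. A minimal NumPy sketch, with illustrative names and numbers:

```python
import numpy as np

def message(phi_i, psi_ij, incoming):
    """Sum-product message m_{i->j}(x_j).

    phi_i:    node potential of X_i, shape (|X|,)
    psi_ij:   pairwise potential psi(x_i, x_j), shape (|X|, |X|)
    incoming: messages m_{k->i} from all neighbors k != j, each (|X|,)
    """
    # Multiply phi_i by all incoming messages, then sum out x_i
    prod = phi_i.copy()
    for m in incoming:
        prod = prod * m
    return psi_ij.T @ prod  # result indexed by x_j

phi = np.array([0.6, 0.4])
psi = np.array([[0.8, 0.2], [0.2, 0.8]])
m_in = [np.array([0.5, 0.5])]
print(message(phi, psi, m_in))
```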
Lowest Common Ancestor (LCA)
• The lca of two nodes w, v is the lowest (deepest) node that has both w and v as descendants.
• For trees, the path between two nodes is uniquely determined by their lca.
• The lca of two nodes is found in constant time by reduction to the Range Minimum Query (RMQ) problem [Czumaj et al., 2007].
• This requires building the so-called RMQ structure in O(N log N) time and space.
[Figure: nodes w and v in a tree, their lca(w, v), and the path(w, v) through it.]
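The Euler-tour reduction from LCA to RMQ can be sketched as follows; for brevity the range minimum here is a linear scan, whereas a sparse table over the depth array gives O(1) queries after O(N log N) preprocessing. Helper names are illustrative:

```python
def lca_preprocess(adj, root=0):
    """Euler-tour reduction of LCA to RMQ.

    adj: adjacency lists of a tree. Returns (tour, depth, first):
    lca(u, v) is the shallowest tour entry between first[u] and first[v].
    """
    tour, depth, first = [], [], {}

    def dfs(u, parent, d):
        first[u] = len(tour)
        tour.append(u); depth.append(d)
        for w in adj[u]:
            if w != parent:
                dfs(w, u, d + 1)
                tour.append(u); depth.append(d)  # re-visit u after each child

    dfs(root, -1, 0)
    return tour, depth, first

def lca(tour, depth, first, u, v):
    lo, hi = sorted((first[u], first[v]))
    # Naive O(range) minimum; a sparse table over `depth` makes this O(1)
    i = min(range(lo, hi + 1), key=depth.__getitem__)
    return tour[i]

# Tree: 0 - 1 - 3, 1 - 4, 0 - 2
adj = {0: [1, 2], 1: [0, 3, 4], 2: [0], 3: [1], 4: [1]}
t, d, f = lca_preprocess(adj)
print(lca(t, d, f, 3, 4))  # -> 1
```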
Adaptive BP (AdaBP)
• Denote by M(w → v) the directed path from node w to node v, which is unique for trees.
[Figure: first send messages from w_{ℓ-1} to w_ℓ; then send messages from w_ℓ to v_ℓ.]
Theorem. AdaBP provides the exact marginals in the path M(w_ℓ → v_ℓ) for all ℓ, for tree MRFs.
Preprocessing: Build the RMQ structure.
Initialization: Initialize node potentials, pairwise potentials, and messages.
for ℓ = 1, 2, ... do
  Determine the path M(w_{ℓ-1} → w_ℓ).
  Compute the messages in M(w_{ℓ-1} → w_ℓ).
  Update the node potential at X_{w_ℓ}.
  Determine the path M(w_ℓ → v_ℓ).
  Compute the messages in M(w_ℓ → v_ℓ).
  Compute the marginal of interest p_{X_{v_ℓ}}(x_{v_ℓ}).
end for
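On a chain, the paths M(·→·) are simple index ranges, so the loop above can be illustrated without an LCA structure. A self-contained sketch, assuming binary states, uniform initial potentials, and a doubly stochastic ψ (so that uniform initial messages are valid); all names are illustrative:

```python
import numpy as np

def adabp_chain(N, psi, phi, events):
    """AdaBP sketch on a chain MRF with nodes 0..N-1, binary states.

    `events` is a list of (w, chi_col, v): absorb a measurement at
    node w, then query the marginal of node v. `phi` (a list of node
    potentials) is updated in place.
    """
    # msg[(i, j)] = current message m_{i->j}, initialized uniform
    msg = {(i, i + s): np.ones(2) for i in range(N) for s in (-1, 1)
           if 0 <= i + s < N}

    def send(path):
        # Recompute messages along a directed path of adjacent nodes
        for i, j in zip(path, path[1:]):
            prod = phi[i].copy()
            for k in (i - 1, i + 1):
                if 0 <= k < N and k != j:
                    prod *= msg[(k, i)]
            msg[(i, j)] = psi.T @ prod

    w_prev, out = 0, []
    for w, chi_col, v in events:
        send(list(range(w_prev, w, 1 if w > w_prev else -1)) + [w])
        phi[w] = phi[w] * chi_col          # absorb the new measurement
        send(list(range(w, v, 1 if v > w else -1)) + [v])
        b = phi[v].copy()
        for k in (v - 1, v + 1):
            if 0 <= k < N:
                b *= msg[(k, v)]
        out.append(b / b.sum())            # marginal of interest
        w_prev = w
    return out
```

For example, on a 3-node chain, observing node 0 and querying node 2 only touches the two messages on M(0 → 2), rather than rerunning BP over the whole graph.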
Extension to Max-Product
• Find the most likely sequence: x* ∈ arg max_x p(x).
• m_{i→j}(x_j) = max_{x_i} φ_i(x_i) ψ_{ij}(x_i, x_j) Π_{k∈N(i)\j} m_{k→i}(x_i).
• δ_{i→j}(x_j) = arg max_{x_i} φ_i(x_i) ψ_{ij}(x_i, x_j) Π_{k∈N(i)\j} m_{k→i}(x_i).
• Propagate the m, δ messages in M(w_{ℓ-1} → w_ℓ).
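The max-product message and its back-pointer δ can be computed jointly from a score matrix over (x_i, x_j). A minimal sketch, with illustrative names and numbers:

```python
import numpy as np

def max_message(phi_i, psi_ij, incoming):
    """Max-product message and back-pointers.

    Returns (m_{i->j}, delta_{i->j}): for each x_j, the maximum over
    x_i of phi_i(x_i) * psi(x_i, x_j) * prod(incoming), and the argmax.
    """
    prod = phi_i.copy()
    for m in incoming:
        prod = prod * m
    scores = psi_ij * prod[:, None]        # scores[x_i, x_j]
    return scores.max(axis=0), scores.argmax(axis=0)

phi = np.array([0.6, 0.4])
psi = np.array([[0.8, 0.2], [0.2, 0.8]])
m, delta = max_message(phi, psi, [np.array([0.5, 0.5])])
print(m, delta)
```

Backtracking through the stored δ messages then recovers the most likely sequence x*.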
Extension to Gaussian Loopy MRFs
• Feedback Message Passing (FMP) by [Liu et al., 2012] is a belief-propagation-like algorithm that provides the exact marginal means and variances in Gaussian loopy graphs.
• F: FVS nodes, a set of nodes whose removal breaks all loops (here: 16, 17, 18).
• T: the remaining acyclic graph.
• A: anchors (neighbors of FVS nodes).
• w^T_ℓ = w_ℓ if w_ℓ ∈ T, and w^T_ℓ = w^T_{ℓ-1} otherwise.
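The w^T_ℓ bookkeeping is a one-line rule: track the most recent measurement node that fell in the tree part. A tiny sketch (the set T is illustrative):

```python
def update_wT(w_l, wT_prev, T):
    """w^T_l = w_l if w_l lies in the tree part T; otherwise carry over."""
    return w_l if w_l in T else wT_prev

T = {1, 2, 3}                      # tree nodes (illustrative)
print(update_wT(2, 1, T))          # w_l in T: advance
print(update_wT(16, 2, T))         # w_l in the FVS: keep previous
```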
[Figure: a loopy grid of nodes 1–15 with FVS nodes 16, 17, 18.]
Extension of AdaBP to Gaussian loopy graphs
[Figure: first phase. If w_{ℓ-1}, w_ℓ ∈ T, send messages from w_{ℓ-1} to w_ℓ; if w_{ℓ-1} ∈ F and w_ℓ ∈ T, send messages from w^T_{ℓ-1} to w_ℓ.]
[Figure: second phase. Send messages from w^T_ℓ to the anchors A (path M(w^T_ℓ → A)) and then from A to v_ℓ (path M(A → v_ℓ)).]
Experiments
• Comparison against standard BP and RCTreeBP by [Sumer et al., 2011].
• The method of [Sumer et al., 2011] is an adaptive inference approach which constructs a balanced representation of an elimination tree to evaluate marginals in logarithmic time.
• Preprocessing time (for trees): O(|X|^3 N) for RCTreeBP vs. O(N log N) for AdaBP.
• Time per update (for trees): O(|X|^3 log N) for RCTreeBP vs. O(|X|^2 dist(w_{ℓ-1}, w_ℓ)) for AdaBP.
Synthetic data
• We construct unbalanced trees of varying sizes (N ∈ {10, 10^2, 10^3, 10^4}).
• We generate the measurement order w randomly (column (a)) or such that E[dist(w_{ℓ-1}, w_ℓ)] ≤ |X| log N (column (b)) and compute marginals at each step.
• AdaBP is orders of magnitude faster than standard BP.
• AdaBP is 1.3–4.7× faster than RCTreeBP when E[dist(w_{ℓ-1}, w_ℓ)] ≤ |X| log N (b).
[Figure: speedup ratios t(BP)/t(AdaBP) and t(RCTreeBP)/t(AdaBP) vs. N for |X| = 2 and |X| = 10. Panels: (a) unconstrained w; (b) constrained w; (c) E[dist(w_{ℓ-1}, w_ℓ)] ∼ N; (d) E[dist(w_{ℓ-1}, w_ℓ)] fixed (= 2).]
Real data
• We explore the effect of basepair mutation on the birth/death of CpG islands.
• AdaMP is up to 8 times faster than RCTreeMP.
• The update time per iteration is much smaller for AdaMP when dist(w_{ℓ-1}, w_ℓ) is small.
• Neither method is sensitive to changes in the MAP sequence.
[Figure: (a) speedups of AdaMP over RCTreeMP; (b) update times of AdaMP and RCTreeMP; (c) sensitivity to changes in the MAP sequence, RCTreeMP (ρ = 0.09) vs. AdaMP (ρ = 0.14).]
• We analyze temperature measurements collected from 53 wireless sensors at the Intel Berkeley Research Lab.
• We assume the temperature evolution follows a Gaussian distribution.
• We collect measurements in a 6-hour window in random order and are interested in computing the marginals of selected areas.
• AdaBP is up to 4–6 times faster than standard Kalman filtering/smoothing techniques.
[Figure: speedup ratio t(KF)/t(AdaBP) per iteration, and update time vs. dist(w_ℓ, w_{ℓ-1}) for AdaBP and KF.]
[Sumer et al., 2011] O. Sumer, U. A. Acar, A. T. Ihler and R. R. Mettu. Adaptive Exact Inference in Graphical Models. Journal of Machine Learning Research (JMLR), 12:3147–3186, November 2011.
[Czumaj et al., 2007] A. Czumaj, M. Kowaluk and A. Lingas. Faster algorithms for finding lowest common ancestors in directed acyclic graphs. Theoretical Computer Science, 380:37–46, July 2007.
[Liu et al., 2012] Y. Liu, V. Chandrasekaran, A. Anandkumar and A. S. Willsky. Feedback Message Passing for Inference in Gaussian Graphical Models. IEEE Transactions on Signal Processing, 60(8):4135–4150, August 2012.