
Adaptive Belief Propagation
https://github.com/geopapa11/adabp

Georgios Papachristoudis (geopapa@mit.edu) and John W. Fisher III (fisher@csail.mit.edu)

Massachusetts Institute of Technology

Motivation

• Graphical models are commonly used to represent large-scale inference problems.

• At any given time, only a small subset of all variables may be of interest.

• Observations may arrive at different times, resulting in a change of distribution.

• The marginals of the variables of interest change after the addition of new observations.

• It is desirable to efficiently evaluate the desired statistics and avoid redundant computations.

[Figure: steps #1–#4 of a growing model; at each step ℓ a new measurement w_ℓ arrives while a (possibly different) latent node v_ℓ is queried (marked with ?).]

Model parameters (bold nodes) change due to the addition of new observations, while only a small subset of latent nodes is of interest (?).

Temperature measurements (from sensors) are added sequentially, while only a part of the latent variables (server) is of interest.

Contributions

• We develop an adaptive inference approach that gives the exact marginals on trees and whose average-case performance is significantly better than that of standard BP.

• We provide an extension to Gaussian loopy graphs, where the results are exact.

• Implementation is straightforward; code is publicly available.

Problem statement

• X = {X_1, . . . , X_N}: N latent variables that are the focus of inference.

• Direct dependencies between latent variables are represented by the edge set E.

• Neighbors of X_k are represented by N(k).

• Each latent node X_k is linked to m_k ≥ 0 measurements/observations: Y_{k,1}, . . . , Y_{k,m_k}.

[Figure: tree of latent nodes X_1, . . . , X_N; each latent node X_k is linked to its observations Y_{k,1}, . . . , Y_{k,m_k}.]

• Discrete setting: X_k ∈ X; Gaussian setting: X_k ∈ R^d.
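As a concrete illustration of this setup, here is a minimal Python sketch of how a discrete tree-structured MRF could be stored; the class name TreeMRF and its fields are illustrative assumptions, not the released implementation.

```python
import numpy as np

class TreeMRF:
    """Toy container for a discrete tree-structured MRF (illustrative only)."""

    def __init__(self, num_nodes, num_states):
        self.N = num_nodes        # number of latent variables X_1, ..., X_N
        self.K = num_states       # |X|, size of the discrete state space
        self.neighbors = [set() for _ in range(num_nodes)]                # N(k)
        self.node_pot = [np.ones(num_states) for _ in range(num_nodes)]   # phi_k(x_k)
        self.pair_pot = {}        # (i, j) -> K x K array psi_ij(x_i, x_j)

    def add_edge(self, i, j, psi):
        """Register edge (i, j) in E with pairwise potential psi_ij."""
        self.neighbors[i].add(j)
        self.neighbors[j].add(i)
        self.pair_pot[(i, j)] = np.asarray(psi, dtype=float)
        self.pair_pot[(j, i)] = self.pair_pot[(i, j)].T   # psi_ji = psi_ij^T
```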

Updating node potentials

• ϕ_k^{(0)}(x_k): node potentials of the latent variables.

• χ_{kℓ}(x_k, y_ℓ): pairwise potential between a latent and an observed variable.

• ψ_{ij}(x_i, x_j): pairwise potential between latent variables.


• A new measurement Y_{w_ℓ,u} = y_u at iteration ℓ changes the node potential of X_{w_ℓ} as

ϕ_{w_ℓ}^{(ℓ)}(x_{w_ℓ}) = ϕ_{w_ℓ}^{(ℓ−1)}(x_{w_ℓ}) χ_{w_ℓ u}(x_{w_ℓ}, y_u).

• We only consider the graph of latent nodes.
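In the discrete setting this update is just an elementwise product of the current node potential with the column of the measurement potential selected by the observed value. A minimal sketch, assuming χ is stored as a states × observation-values array (names are illustrative):

```python
import numpy as np

def absorb_measurement(node_pot, chi, y_obs):
    """Fold a new observation Y_{w_l, u} = y_u into the node potential of X_{w_l}.

    node_pot : length-K array, phi^{(l-1)}_{w_l}(x)
    chi      : K x (number of observation values) array, chi_{w_l u}(x, y)
    y_obs    : index of the observed value y_u
    Returns phi^{(l)}_{w_l}(x) = phi^{(l-1)}_{w_l}(x) * chi_{w_l u}(x, y_u).
    """
    return np.asarray(node_pot, dtype=float) * chi[:, y_obs]
```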

Measurement and marginal orders

Measurement order w = {w_1, . . . , w_M}: the order in which measurements are acquired.

Marginal order v = {v_1, . . . , v_M}: the order of (latent) nodes whose marginal is of interest at each step.

[Figure: three consecutive steps on the latent tree; at step ℓ the new measurement site w_ℓ and the queried node v_ℓ (marked with ?) change.]

Adaptive BP

Belief Propagation

• Belief propagation is a message-passing algorithm that runs in time linear in the number of latent nodes and computes the node marginals.

• A message is given by m_{i→j}(x_j) = ∑_{x_i} ϕ_i(x_i) ψ_{ij}(x_i, x_j) ∏_{k∈N(i)\j} m_{k→i}(x_i).
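A direct transcription of this message into Python/NumPy, using the TreeMRF storage sketched above; the final normalization is a common numerical convenience rather than part of the formula:

```python
import numpy as np

def sum_product_message(phi_i, psi_ij, incoming_msgs):
    """Compute m_{i->j}(x_j) = sum_{x_i} phi_i(x_i) psi_ij(x_i, x_j) prod_{k in N(i)\\j} m_{k->i}(x_i).

    phi_i         : length-K node potential of X_i
    psi_ij        : K x K pairwise potential psi_ij(x_i, x_j)
    incoming_msgs : messages m_{k->i} from all neighbors k of i except j
    """
    belief_i = np.asarray(phi_i, dtype=float).copy()
    for m in incoming_msgs:
        belief_i *= m                 # product over k in N(i) \ j
    msg = psi_ij.T @ belief_i         # sum over x_i
    return msg / msg.sum()            # normalize to avoid underflow/overflow
```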

[Figure: the message m_{i→j} summarizes the subtree T_i on node i's side of the edge (i, j).]

Lowest Common Ancestor (LCA)

• The lca of two nodes w, v is the lowest (deepest) node that has both w and v as descendants.

• For trees, the path between two nodes is uniquely determined from their lca.

• The lca of two nodes is determined in constant time by reduction to the Range Minimum Query (RMQ) problem [Czumaj et al., 2007].

• This requires building the so-called RMQ structure in O(N log N) time and space.
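One standard way to obtain such a structure is an Euler tour of the rooted tree combined with a sparse table for range-minimum queries over depths: O(N log N) preprocessing, O(1) per lca query. A self-contained sketch (not the authors' implementation):

```python
def build_lca(adj, root=0):
    """Euler tour + sparse-table RMQ: O(N log N) build, O(1) lca queries."""
    n = len(adj)
    tour, tour_depth, first = [], [], [-1] * n
    # Iterative DFS; a node is appended to the Euler tour each time it is
    # entered or returned to from a child.
    stack = [(root, -1, 0, iter(adj[root]))]
    while stack:
        node, parent, d, children = stack[-1]
        if first[node] == -1:
            first[node] = len(tour)
        tour.append(node)
        tour_depth.append(d)
        nxt = next((c for c in children if c != parent), None)
        if nxt is None:
            stack.pop()
        else:
            stack.append((nxt, node, d + 1, iter(adj[nxt])))

    # sparse[j][i] = index (into tour) of the minimum-depth entry in tour[i : i + 2^j]
    m = len(tour)
    sparse = [list(range(m))]
    j = 1
    while (1 << j) <= m:
        prev, half = sparse[-1], 1 << (j - 1)
        sparse.append([prev[i] if tour_depth[prev[i]] <= tour_depth[prev[i + half]]
                       else prev[i + half]
                       for i in range(m - (1 << j) + 1)])
        j += 1

    def lca(u, v):
        lo, hi = sorted((first[u], first[v]))
        j = (hi - lo + 1).bit_length() - 1
        a, b = sparse[j][lo], sparse[j][hi - (1 << j) + 1]
        return tour[a] if tour_depth[a] <= tour_depth[b] else tour[b]

    return lca

# Example on a small tree (adjacency lists, rooted at 0): edges 0-1, 0-2, 1-3, 1-4.
adj = [[1, 2], [0, 3, 4], [0], [1], [1]]
lca = build_lca(adj)
print(lca(3, 4))   # -> 1
print(lca(3, 2))   # -> 0
```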

[Figure: lca(w, v) and path(w, v) for two nodes w, v in a rooted tree.]

Adaptive BP (AdaBP)

• Denote by M(w → v) the directed path from node w to node v, which is unique for trees.

[Figure: at step ℓ, messages are first sent from w_{ℓ−1} to w_ℓ, then from w_ℓ to v_ℓ (marked with ?).]

Theorem. AdaBP provides the exact marginals on the path M(w_ℓ → v_ℓ), for all ℓ, for tree MRFs.

Preprocessing: Build the RMQ structure.
Initialization: Initialize node potentials, pairwise potentials, and messages.
for ℓ = 1, 2, . . . do
    Determine the path M(w_{ℓ−1} → w_ℓ).
    Compute messages in M(w_{ℓ−1} → w_ℓ).
    Update the node potential at X_{w_ℓ}.
    Determine the path M(w_ℓ → v_ℓ).
    Compute messages in M(w_ℓ → v_ℓ).
    Compute the marginal of interest p_{X_{v_ℓ}}(x_{v_ℓ}).
end for
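Putting the pieces together, a possible sketch of one AdaBP iteration, reusing the TreeMRF, sum_product_message, absorb_measurement, and build_lca sketches above plus a parent array for the rooted tree. It assumes messages were initialized by a full BP sweep in the Initialization step; all names are illustrative, not the released implementation.

```python
def tree_path(parent, lca, a, b):
    """Ordered list of nodes on the unique path M(a -> b) in the rooted tree."""
    anc = lca(a, b)
    up, down, x = [], [], a
    while x != anc:
        up.append(x)
        x = parent[x]
    x = b
    while x != anc:
        down.append(x)
        x = parent[x]
    return up + [anc] + down[::-1]

def refresh_messages(model, msgs, nodes):
    """Recompute sum-product messages along the consecutive edges of `nodes`."""
    for i, j in zip(nodes[:-1], nodes[1:]):
        incoming = [msgs[(k, i)] for k in model.neighbors[i] if k != j]
        msgs[(i, j)] = sum_product_message(model.node_pot[i],
                                           model.pair_pot[(i, j)], incoming)

def node_marginal(model, msgs, v):
    """Marginal of X_v: node potential times all incoming messages, normalized."""
    b = model.node_pot[v].copy()
    for k in model.neighbors[v]:
        b = b * msgs[(k, v)]
    return b / b.sum()

def adabp_step(model, msgs, parent, lca, prev_w, w, v, chi, y_obs):
    """One AdaBP iteration: absorb a measurement at node w, then query node v."""
    refresh_messages(model, msgs, tree_path(parent, lca, prev_w, w))   # M(w_{l-1} -> w_l)
    model.node_pot[w] = absorb_measurement(model.node_pot[w], chi, y_obs)
    refresh_messages(model, msgs, tree_path(parent, lca, w, v))        # M(w_l -> v_l)
    return node_marginal(model, msgs, v)
```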

Extension to Max-Product

• Find the most likely sequence: x* ∈ arg max_x p(x).

• m_{i→j}(x_j) = max_{x_i} ϕ_i(x_i) ψ_{ij}(x_i, x_j) ∏_{k∈N(i)\j} m_{k→i}(x_i).

• δ_{i→j}(x_j) = arg max_{x_i} ϕ_i(x_i) ψ_{ij}(x_i, x_j) ∏_{k∈N(i)\j} m_{k→i}(x_i).

• Propagate the m and δ messages in M(w_{ℓ−1} → w_ℓ).
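A sketch of the max-product message with its arg-max backpointer δ, mirroring the sum-product version above (function name and normalization are illustrative choices):

```python
import numpy as np

def max_product_message(phi_i, psi_ij, incoming_msgs):
    """Return (m_{i->j}, delta_{i->j}); delta records, for each value of x_j,
    the maximizing x_i, which is later used to decode the MAP sequence."""
    belief_i = np.asarray(phi_i, dtype=float).copy()
    for m in incoming_msgs:
        belief_i *= m                          # product over k in N(i) \ j
    scores = psi_ij * belief_i[:, None]        # scores[x_i, x_j]
    delta = scores.argmax(axis=0)              # best x_i for every x_j
    msg = scores.max(axis=0)
    return msg / msg.sum(), delta
```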


Extension to Gaussian Loopy MRFs

• Feedback Message Passing (FMP) by [Liu et al., 2012] is a belief-propagation-like algorithm that provides the exact marginal means and variances in Gaussian loopy graphs.

• F: FVS nodes, a set of nodes whose removal breaks all loops (here: 16, 17, 18).

• T: the remaining acyclic graph.

• A: anchors (neighbors of FVS nodes).

• w^T_ℓ = w_ℓ if w_ℓ ∈ T, and w^T_ℓ = w^T_{ℓ−1} otherwise.
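The w^T_ℓ bookkeeping above is a one-pass fold over the measurement order. A minimal sketch, assuming F is given as a set of node indices; the value before the first tree-node measurement is left as None, since the poster does not define it:

```python
def tree_anchored_order(w, fvs):
    """Map the measurement order w to w^T:
    w^T_l = w_l if w_l is a tree node, otherwise w^T_l = w^T_{l-1}."""
    wT, last = [], None
    for wl in w:
        if wl not in fvs:       # w_l in T
            last = wl
        wT.append(last)         # if w_l in F, reuse the previous w^T
    return wT

# Example with the FVS {16, 17, 18} from the figure below:
print(tree_anchored_order([3, 16, 7, 18, 12], {16, 17, 18}))   # -> [3, 3, 7, 7, 12]
```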

[Figure: a loopy Gaussian graph on nodes 1–18; removing the FVS nodes 16, 17, 18 leaves an acyclic graph T.]

Extension of AdaBP to Gaussian loopy graphs

[Figure: AdaBP updates on the loopy graph.
First phase: if w_{ℓ−1}, w_ℓ ∈ T, send messages from w_{ℓ−1} to w_ℓ; if w_{ℓ−1} ∈ F and w_ℓ ∈ T, send messages from w^T_{ℓ−1} to w_ℓ.
Second phase: send messages from w^T_ℓ to the anchors A along M(w^T_ℓ → A), and then from A to v_ℓ along M(A → v_ℓ).]

Experiments

• Comparison against standard BP and RCTreeBP by [Sumer et al., 2011].

• The method of [Sumer et al., 2011] is an adaptive inference approach that constructs a balanced representation of an elimination tree to evaluate marginals in logarithmic time.

• Preprocessing time (for trees): O(|X|^3 N) for RCTreeBP vs. O(N log N) for AdaBP.

• Time per update (for trees): O(|X|^3 log N) for RCTreeBP vs. O(|X|^2 dist(w_{ℓ−1}, w_ℓ)) for AdaBP.

Synthetic data

• We construct unbalanced trees of varying sizes (N ∈ {10, 10^2, 10^3, 10^4}).

• We generate the measurement order w randomly (column (a)) or such that E[dist(w_{ℓ−1}, w_ℓ)] ≤ |X| log N (column (b)), and compute marginals at each step.

• AdaBP is orders of magnitude faster than standard BP.

• AdaBP is 1.3–4.7 times faster than RCTreeBP when E[dist(w_{ℓ−1}, w_ℓ)] ≤ |X| log N (column (b)).

[Figure: speedup ratios t(BP)/t(AdaBP) and t(RCTreeBP)/t(AdaBP) versus N (log–log axes), for |X| = 2 and |X| = 10.
(a) Unconstrained w. (b) Constrained w. (c) E[dist(w_{ℓ−1}, w_ℓ)] ∼ N. (d) E[dist(w_{ℓ−1}, w_ℓ)] fixed (= 2).]

Real data

• We explore the effect of basepair mutations on the birth/death of CpG islands.

• AdaMP is up to 8 times faster than RCTreeMP.

• The update time per iteration is much smaller for AdaMP when dist(w_{ℓ−1}, w_ℓ) is small.

• Neither method is sensitive to changes in the MAP sequence.

[Figure: (a) speedups of AdaMP over RCTreeMP versus N; (b) update times per iteration of AdaMP and RCTreeMP; (c) sensitivity to changes in the MAP sequence, showing update time (sec) for RCTreeMP (ρ = 0.09) and AdaMP (ρ = 0.14).]

• We analyze temperature measurements collected from 53 wireless sensors at the Intel Berkeley Research Lab.

• We assume the temperature evolution follows a Gaussian distribution.

• We collect measurements over a 6-hour window in random order and are interested in computing the marginals of selected areas.

• AdaBP is up to 4–6 times faster than standard Kalman filtering/smoothing techniques.

[Figure: speedup ratio t(KF)/t(AdaBP) per iteration ℓ, and running time (sec) of AdaBP and KF as a function of dist(w_ℓ, w_{ℓ−1}).]

[Sumer et al., 2011] O. Sumer, U. A. Acar, A. T. Ihler and R. R. Mettu. Adaptive Exact Inference in Graphical Models. Journal of Machine Learning Research (JMLR), 12:3147–3186, November 2011.

[Czumaj et al., 2007] A. Czumaj, M. Kowaluk and A. Lingas. Faster algorithms for finding lowest common ancestors in directed acyclic graphs. Theoretical Computer Science, 380:37–46, July 2007.

[Liu et al., 2012] Y. Liu, V. Chandrasekaran, A. Anandkumar and A. S. Willsky. Feedback Message Passing for Inference in Gaussian Graphical Models. IEEE Transactions on Signal Processing, 60(8):4135–4150, August 2012.
