
Page 1: MAP Estimation Algorithms in

MAP Estimation Algorithms in

M. Pawan Kumar, University of Oxford

Pushmeet Kohli, Microsoft Research

Computer Vision - Part I

Page 2: MAP Estimation Algorithms in

Aim of the Tutorial

• Description of some successful algorithms

• Computational issues

• Enough details to implement

• Some proofs will be skipped :-(

• But references to them will be given :-)

Page 3: MAP Estimation Algorithms in

A Vision Application: Binary Image Segmentation

How ?

Cost function: models our knowledge about natural images

Optimize cost function to obtain the segmentation

Page 4: MAP Estimation Algorithms in

Object - white, Background - green/grey

Graph G = (V,E)

Each vertex corresponds to a pixel

Edges define a 4-neighbourhood grid graph

Assign a label to each vertex from L = {obj,bkg}

A Vision Application: Binary Image Segmentation

Page 5: MAP Estimation Algorithms in

Graph G = (V,E)

Cost of a labelling f : V → L

Per Vertex Cost

Cost of label ‘obj’ low; cost of label ‘bkg’ high

Object - white, Background - green/grey

A Vision Application: Binary Image Segmentation

Page 6: MAP Estimation Algorithms in

Graph G = (V,E)

Cost of a labelling f : V → L

Cost of label ‘obj’ high; cost of label ‘bkg’ low

Per Vertex Cost

UNARY COST

Object - white, Background - green/grey

A Vision Application: Binary Image Segmentation

Page 7: MAP Estimation Algorithms in

Graph G = (V,E)

Cost of a labelling f : V → L

Per Edge Cost

Cost of same label low

Cost of different labels high

Object - white, Background - green/grey

A Vision Application: Binary Image Segmentation

Page 8: MAP Estimation Algorithms in

Graph G = (V,E)

Cost of a labelling f : V → L

Cost of same label high

Cost of different labels low

Per Edge Cost

PAIRWISE COST

Object - white, Background - green/grey

A Vision Application: Binary Image Segmentation

Page 9: MAP Estimation Algorithms in

Graph G = (V,E)

Problem: Find the labelling with minimum cost f*

Object - white, Background - green/grey

A Vision Application: Binary Image Segmentation

Page 10: MAP Estimation Algorithms in

Graph G = (V,E)

Problem: Find the labelling with minimum cost f*

A Vision Application: Binary Image Segmentation

Page 11: MAP Estimation Algorithms in

Another Vision Application: Object Detection using Parts-based Models

How ?

Once again, by defining a good cost function

Page 12: MAP Estimation Algorithms in

Graph G = (V,E), with vertices H, T, L1, L2, L3, L4

Each vertex corresponds to a part - ‘Head’, ‘Torso’, ‘Legs’

Edges define a TREE

Assign a label to each vertex from L = {positions}

Another Vision Application: Object Detection using Parts-based Models

Page 13: MAP Estimation Algorithms in


Each vertex corresponds to a part - ‘Head’, ‘Torso’, ‘Legs’

Assign a label to each vertex from L = {positions}

Graph G = (V,E)

Edges define a TREE


Another Vision Application: Object Detection using Parts-based Models

Page 14: MAP Estimation Algorithms in


Each vertex corresponds to a part - ‘Head’, ‘Torso’, ‘Legs’

Assign a label to each vertex from L = {positions}

Graph G = (V,E)

Edges define a TREE


Another Vision Application: Object Detection using Parts-based Models

Page 15: MAP Estimation Algorithms in

Cost of a labelling f : V → L

Unary cost : How well does part match image patch?

Pairwise cost : Encourages valid configurations

Find best labelling f*

Graph G = (V,E)


Another Vision Application: Object Detection using Parts-based Models

Page 16: MAP Estimation Algorithms in

Cost of a labelling f : V → L

Unary cost : How well does part match image patch?

Pairwise cost : Encourages valid configurations

Find best labelling f*

Graph G = (V,E)


Another Vision Application: Object Detection using Parts-based Models

Page 17: MAP Estimation Algorithms in

Yet Another Vision Application: Stereo Correspondence

Disparity Map

How ?

Minimizing a cost function

Page 18: MAP Estimation Algorithms in

Yet Another Vision Application: Stereo Correspondence

Graph G = (V,E)

Vertex corresponds to a pixel

Edges define grid graph

L = {disparities}

Page 19: MAP Estimation Algorithms in

Yet Another Vision Application: Stereo Correspondence

Cost of labelling f :

Unary cost + Pairwise Cost

Find minimum cost f*

Page 20: MAP Estimation Algorithms in

The General Problem

(Figure: example graph on vertices a, b, c, d, e, f, each assigned a label from {1, 2, 3}.)

Graph G = (V, E)

Discrete label set L = {1, 2, …, h}

Assign a label to each vertex f: V → L

Cost of a labelling Q(f)

Unary Cost Pairwise Cost

Find f* = arg min Q(f)
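The general problem can be made concrete with a brute-force sketch. The potential values below are illustrative (not the figure's), and the helper names (`Q`, `theta_unary`, `theta_pair`) are our own:

```python
from itertools import product

# Illustrative 4-vertex chain with 2 labels (made-up values):
# theta_unary[a][i] is the cost of vertex a taking label i,
# theta_pair[(a, b)][i][k] the cost of edge (a, b) taking labels (i, k).
theta_unary = [[5, 2], [2, 4], [6, 3], [7, 3]]
theta_pair = {(0, 1): [[0, 1], [1, 0]],
              (1, 2): [[0, 1], [1, 0]],
              (2, 3): [[0, 1], [1, 0]]}

def Q(f):
    """Cost of a labelling f = (f(0), ..., f(3)): unary plus pairwise terms."""
    return (sum(theta_unary[a][f[a]] for a in range(len(f))) +
            sum(theta_pair[(a, b)][f[a]][f[b]] for (a, b) in theta_pair))

# Brute force over all |L|^|V| labellings -- only feasible for tiny graphs.
f_star = min(product(range(2), repeat=4), key=Q)
q_star = Q(f_star)
```

Here `f_star` comes out as (1, 0, 1, 1) with `q_star` = 12; the point of the algorithms that follow is to avoid this exponential enumeration.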

Page 21: MAP Estimation Algorithms in

Outline

• Problem Formulation
– Energy Function
– MAP Estimation
– Computing min-marginals

• Reparameterization

• Belief Propagation

• Tree-reweighted Message Passing

Page 22: MAP Estimation Algorithms in

Energy Function

Va Vb Vc Vd

Label l0

Label l1

Da Db Dc Dd

Random Variables V = {Va, Vb, …}

Labels L = {l0, l1, …}    Data D

Labelling f: {a, b, …} → {0, 1, …}

Page 23: MAP Estimation Algorithms in

Energy Function

Va Vb Vc Vd

Da Db Dc Dd

Q(f) = ∑a θa;f(a)

Unary Potential

(Figure: unary potential values 2, 5, 4, 2, 6, 3, 3, 7 for labels l0 and l1 at Va, Vb, Vc, Vd.)

Easy to minimize

Neighbourhood

Page 24: MAP Estimation Algorithms in

Energy Function

Va Vb Vc Vd

Da Db Dc Dd

E : (a,b) ∈ E iff Va and Vb are neighbours

E = { (a,b) , (b,c) , (c,d) }

(Figure: unary potentials as before.)

Page 25: MAP Estimation Algorithms in

Energy Function

Va Vb Vc Vd

Da Db Dc Dd

Q(f) = ∑a θa;f(a) + ∑(a,b) θab;f(a)f(b)

Pairwise Potential

(Figure: pairwise potential values 0, 1, 1, 0, 0, 2, 1, 1, 4, 1, 0, 3 on the edges (a,b), (b,c), (c,d); unary potentials as before.)

Page 26: MAP Estimation Algorithms in

Energy Function

Va Vb Vc Vd

Da Db Dc Dd

(Figure: unary and pairwise potentials as before.)

Parameter θ = { θa;i , θab;ik }

Q(f; θ) = ∑a θa;f(a) + ∑(a,b) θab;f(a)f(b)

Page 27: MAP Estimation Algorithms in

Outline

• Problem Formulation
– Energy Function
– MAP Estimation
– Computing min-marginals

• Reparameterization

• Belief Propagation

• Tree-reweighted Message Passing

Page 28: MAP Estimation Algorithms in

MAP Estimation

Va Vb Vc Vd

(Figure: unary potential values 2, 5, 4, 2, 6, 3, 3, 7 and pairwise potential values 0, 1, 1, 0, 0, 2, 1, 1, 4, 1, 0, 3 for labels l0 and l1.)

Q(f; θ) = ∑a θa;f(a) + ∑(a,b) θab;f(a)f(b)

Page 29: MAP Estimation Algorithms in

MAP Estimation

Va Vb Vc Vd

(Figure: potentials as above, with one labelling highlighted.)

Q(f; θ) = ∑a θa;f(a) + ∑(a,b) θab;f(a)f(b)

2 + 1 + 2 + 1 + 3 + 1 + 3 = 13

Page 30: MAP Estimation Algorithms in

MAP Estimation

Va Vb Vc Vd

(Figure: potentials as above.)

Q(f; θ) = ∑a θa;f(a) + ∑(a,b) θab;f(a)f(b)

Page 31: MAP Estimation Algorithms in

MAP Estimation

Va Vb Vc Vd

(Figure: potentials as above, with another labelling highlighted.)

Q(f; θ) = ∑a θa;f(a) + ∑(a,b) θab;f(a)f(b)

5 + 1 + 4 + 0 + 6 + 4 + 7 = 27

Page 32: MAP Estimation Algorithms in

MAP Estimation

Va Vb Vc Vd

(Figure: potentials as above.)

Q(f; θ) = ∑a θa;f(a) + ∑(a,b) θab;f(a)f(b)

f* = arg min Q(f; θ)

q* = min Q(f; θ) = Q(f*; θ)

Page 33: MAP Estimation Algorithms in

MAP Estimation

f(a) f(b) f(c) f(d) Q(f; θ)

0 0 0 0 18

0 0 0 1 15

0 0 1 0 27

0 0 1 1 20

0 1 0 0 22

0 1 0 1 19

0 1 1 0 27

0 1 1 1 20

16 possible labellings

f(a) f(b) f(c) f(d) Q(f; θ)

1 0 0 0 16

1 0 0 1 13

1 0 1 0 25

1 0 1 1 18

1 1 0 0 18

1 1 0 1 15

1 1 1 0 23

1 1 1 1 16

f* = {1, 0, 0, 1}    q* = 13

Page 34: MAP Estimation Algorithms in

Computational Complexity

Segmentation

2^|V| possible labellings

|V| = number of pixels ≈ 320 * 480 = 153600

Page 35: MAP Estimation Algorithms in

Computational Complexity

|L| = number of candidate positions ≈ number of pixels ≈ 153600

Detection

|L|^|V| possible labellings

Page 36: MAP Estimation Algorithms in

Computational Complexity

|V| = number of pixels ≈ 153600

Stereo

|L|^|V| possible labellings

Can we do better than brute-force?

MAP Estimation is NP-hard !!

Page 37: MAP Estimation Algorithms in

Computational Complexity

|V| = number of pixels ≈ 153600

Stereo

|L|^|V| possible labellings

Exact algorithms do exist for special cases

Good approximate algorithms for general case

But first … two important definitions

Page 38: MAP Estimation Algorithms in

Outline

• Problem Formulation
– Energy Function
– MAP Estimation
– Computing min-marginals

• Reparameterization

• Belief Propagation

• Tree-reweighted Message Passing

Page 39: MAP Estimation Algorithms in

Min-Marginals

Va Vb Vc Vd

(Figure: potentials as above.)

Min-marginal qa;i = min Q(f; θ) such that f(a) = i

Not a marginal (no summation)

Page 40: MAP Estimation Algorithms in

Min-Marginals

16 possible labellings; qa;0 = 15

f(a) f(b) f(c) f(d) Q(f; θ)

0 0 0 0 18

0 0 0 1 15

0 0 1 0 27

0 0 1 1 20

0 1 0 0 22

0 1 0 1 19

0 1 1 0 27

0 1 1 1 20

f(a) f(b) f(c) f(d) Q(f; θ)

1 0 0 0 16

1 0 0 1 13

1 0 1 0 25

1 0 1 1 18

1 1 0 0 18

1 1 0 1 15

1 1 1 0 23

1 1 1 1 16

Page 41: MAP Estimation Algorithms in

Min-Marginals

16 possible labellings; qa;1 = 13

f(a) f(b) f(c) f(d) Q(f; θ)

1 0 0 0 16

1 0 0 1 13

1 0 1 0 25

1 0 1 1 18

1 1 0 0 18

1 1 0 1 15

1 1 1 0 23

1 1 1 1 16

f(a) f(b) f(c) f(d) Q(f; θ)

0 0 0 0 18

0 0 0 1 15

0 0 1 0 27

0 0 1 1 20

0 1 0 0 22

0 1 0 1 19

0 1 1 0 27

0 1 1 1 20

Page 42: MAP Estimation Algorithms in

Min-Marginals and MAP

• Minimum min-marginal of any variable = energy of MAP labelling

mini qa;i = mini ( minf Q(f; θ) s.t. f(a) = i ) = minf Q(f; θ)

Va has to take one label
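The identity above can be checked numerically: compute every min-marginal by brute force and compare its minimum with the MAP energy. A sketch with illustrative values (the potential values and function names are ours, not the figure's):

```python
from itertools import product

# Illustrative 4-vertex chain, 2 labels (made-up values).
unary = [[5, 2], [2, 4], [6, 3], [7, 3]]
pair = {(0, 1): [[0, 1], [1, 0]],
        (1, 2): [[0, 1], [1, 0]],
        (2, 3): [[0, 1], [1, 0]]}

def Q(f):
    return (sum(unary[a][f[a]] for a in range(4)) +
            sum(pair[e][f[e[0]]][f[e[1]]] for e in pair))

def min_marginal(a, i):
    """q_{a;i}: minimum energy over all labellings with f(a) fixed to i."""
    return min(Q(f) for f in product(range(2), repeat=4) if f[a] == i)

# Every labelling assigns *some* label to Va, so minimizing q_{a;i} over i
# recovers the MAP energy.
map_energy = min(Q(f) for f in product(range(2), repeat=4))
assert min(min_marginal(0, i) for i in range(2)) == map_energy
```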

Page 43: MAP Estimation Algorithms in

Summary

MAP Estimation

f* = arg min Q(f; θ)

Q(f; θ) = ∑a θa;f(a) + ∑(a,b) θab;f(a)f(b)

Min-marginals

qa;i = min Q(f; θ) s.t. f(a) = i

Energy Function

Page 44: MAP Estimation Algorithms in

Outline

• Problem Formulation

• Reparameterization

• Belief Propagation

• Tree-reweighted Message Passing

Page 45: MAP Estimation Algorithms in

Reparameterization

Va Vb

(Figure: θa;0 = 5, θa;1 = 2; θb;0 = 2, θb;1 = 4; θab;00 = 0, θab;01 = 1, θab;10 = 1, θab;11 = 0; annotation: +2 on each θa;i, -2 on each θb;k.)

f(a) f(b) Q(f; θ)

0 0 7

0 1 10

1 0 5

1 1 6

Add a constant to all θa;i

Subtract that constant from all θb;k

Page 46: MAP Estimation Algorithms in

Reparameterization

f(a) f(b) Q(f; θ')

0 0 7 + 2 - 2

0 1 10 + 2 - 2

1 0 5 + 2 - 2

1 1 6 + 2 - 2

Add a constant to all θa;i

Subtract that constant from all θb;k

Q(f; θ') = Q(f; θ)

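The invariance Q(f; θ') = Q(f; θ) is easy to verify exhaustively for the two-variable example (θa = (5, 2), θb = (2, 4), pairwise 0, 1, 1, 0); every labelling uses exactly one unary of Va and one of Vb, so the +2 and -2 cancel. A sketch (function names are ours):

```python
from itertools import product

unary = {"a": [5, 2], "b": [2, 4]}   # the two-variable example's unaries
pairwise = [[0, 1], [1, 0]]          # theta_ab;ik

def Q(fa, fb, un, pw):
    """Energy of the labelling (fa, fb) under unaries `un`, pairwise `pw`."""
    return un["a"][fa] + un["b"][fb] + pw[fa][fb]

# Add a constant to every unary of Va, subtract it from every unary of Vb.
c = 2
reparam = {"a": [u + c for u in unary["a"]],
           "b": [u - c for u in unary["b"]]}

# The energy of every labelling is unchanged.
for fa, fb in product(range(2), repeat=2):
    assert Q(fa, fb, unary, pairwise) == Q(fa, fb, reparam, pairwise)
```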

Page 47: MAP Estimation Algorithms in

Reparameterization

(Figure: potentials as before; annotation: +3 on θb;1, -3 on θab;01 and θab;11.)

f(a) f(b) Q(f; θ)

0 0 7

0 1 10

1 0 5

1 1 6

Add a constant to one θb;k

Subtract that constant from θab;ik for all ‘i’

Page 48: MAP Estimation Algorithms in

Reparameterization

(Figure: reparameterized potentials.)

f(a) f(b) Q(f; θ')

0 0 7

0 1 10 - 3 + 3

1 0 5

1 1 6 - 3 + 3

Q(f; θ') = Q(f; θ)

Add a constant to one θb;k

Subtract that constant from θab;ik for all ‘i’

Page 49: MAP Estimation Algorithms in

Reparameterization

(Figure: three further reparameterizations of the two-variable example.)

In general:

θ'a;i = θa;i + Mba;i

θ'b;k = θb;k + Mab;k

θ'ab;ik = θab;ik - Mab;k - Mba;i

Q(f; θ') = Q(f; θ)

Page 50: MAP Estimation Algorithms in

Reparameterization

θ' is a reparameterization of θ, iff

Q(f; θ') = Q(f; θ), for all f

Equivalently,

θ'a;i = θa;i + Mba;i

θ'b;k = θb;k + Mab;k

θ'ab;ik = θab;ik - Mab;k - Mba;i

Kolmogorov, PAMI, 2006

(Figure: the earlier +2/-2 example.)

Page 51: MAP Estimation Algorithms in

Recap

MAP Estimation

f* = arg min Q(f; θ)

Q(f; θ) = ∑a θa;f(a) + ∑(a,b) θab;f(a)f(b)

Min-marginals

qa;i = min Q(f; θ) s.t. f(a) = i

Reparameterization: Q(f; θ') = Q(f; θ), for all f

Page 52: MAP Estimation Algorithms in

Outline

• Problem Formulation

• Reparameterization

• Belief Propagation
– Exact MAP for Chains and Trees
– Approximate MAP for general graphs
– Computational Issues and Theoretical Properties

• Tree-reweighted Message Passing

Page 53: MAP Estimation Algorithms in

Belief Propagation

• Belief Propagation gives exact MAP for chains

• Remember, some MAP problems are easy

• Exact MAP for trees

• Clever Reparameterization

Page 54: MAP Estimation Algorithms in

Two Variables

(Figure: θa;0 = 5, θa;1 = 2; θb;0 = 2, θb;1 = 4; θab;00 = 0, θab;01 = 1, θab;10 = 1, θab;11 = 0.)

Choose the right constant: θ'b;k = qb;k

Add a constant to one θb;k

Subtract that constant from θab;ik for all ‘i’

Page 55: MAP Estimation Algorithms in

Two Variables

(Figure: as above.)

Choose the right constant: θ'b;k = qb;k

Mab;0 = min { θa;0 + θab;00 , θa;1 + θab;10 } = min { 5 + 0 , 2 + 1 }

Page 56: MAP Estimation Algorithms in

Two Variables

(Figure: after the update, θ'b;0 = 2 + 3 = 5; θ'ab;00 = -3, θ'ab;10 = -2.)

Choose the right constant: θ'b;k = qb;k

Page 57: MAP Estimation Algorithms in

Two Variables

(Figure: as above.)

Choose the right constant: θ'b;k = qb;k

θ'b;0 = qb;0, attained at f(a) = 1

Potentials along the red path add up to 0

Page 58: MAP Estimation Algorithms in

Two Variables

(Figure: as above.)

Choose the right constant: θ'b;k = qb;k

Mab;1 = min { θa;0 + θab;01 , θa;1 + θab;11 } = min { 5 + 1 , 2 + 0 }

Page 59: MAP Estimation Algorithms in

Two Variables

(Figure: after the update, θ'b;1 = 4 + 2 = 6; θ'ab;01 = -1, θ'ab;11 = -2.)

Choose the right constant: θ'b;k = qb;k

θ'b;0 = qb;0 and θ'b;1 = qb;1, each attained at f(a) = 1

Minimum of min-marginals = MAP estimate

Page 60: MAP Estimation Algorithms in

Two Variables

(Figure: as above.)

Choose the right constant: θ'b;k = qb;k

f*(b) = 0, f*(a) = 1

Page 61: MAP Estimation Algorithms in

Two Variables

(Figure: as above.)

Choose the right constant: θ'b;k = qb;k

We get all the min-marginals of Vb

Page 62: MAP Estimation Algorithms in

Recap

We only need to know two sets of equations

General form of Reparameterization:

θ'a;i = θa;i + Mba;i

θ'b;k = θb;k + Mab;k

θ'ab;ik = θab;ik - Mab;k - Mba;i

Reparameterization of (a,b) in Belief Propagation:

Mab;k = mini { θa;i + θab;ik }

Mba;i = 0

Page 63: MAP Estimation Algorithms in

Three Variables

(Figure: chain Va-Vb-Vc with unary and pairwise potentials for labels l0 and l1.)

Reparameterize the edge (a,b) as before

Page 64: MAP Estimation Algorithms in

Three Variables

(Figure: after reparameterizing (a,b): θ'b;0 = 5 and θ'b;1 = 6, each attained at f(a) = 1.)

Reparameterize the edge (a,b) as before

Page 65: MAP Estimation Algorithms in

Three Variables

(Figure: as above.)

Reparameterize the edge (a,b) as before

Potentials along the red path add up to 0

Page 66: MAP Estimation Algorithms in

Three Variables

(Figure: as above.)

Reparameterize the edge (b,c) as before

Potentials along the red path add up to 0

Page 67: MAP Estimation Algorithms in

Three Variables

(Figure: the chain after reparameterizing (b,c); the minimizing label f(b) is recorded for each label of Vc.)

Reparameterize the edge (b,c) as before

Potentials along the red path add up to 0

Page 68: MAP Estimation Algorithms in

Three Variables

(Figure: as above.)

Reparameterize the edge (b,c) as before

Potentials along the red path add up to 0

The unary potentials of Vc are now its min-marginals qc;0 and qc;1

Page 69: MAP Estimation Algorithms in

Three Variables

(Figure: as above.)

f*(c) = 0, f*(b) = 0, f*(a) = 1

Generalizes to any length chain

Page 70: MAP Estimation Algorithms in

Three Variables

(Figure: as above.)

f*(c) = 0, f*(b) = 0, f*(a) = 1

Only Dynamic Programming

Page 71: MAP Estimation Algorithms in

Why Dynamic Programming?

3 variables = 2 variables + book-keeping

n variables = (n-1) variables + book-keeping

Start from left, go to right

Reparameterize current edge (a,b):

Mab;k = mini { θa;i + θab;ik }

θ'b;k = θb;k + Mab;k    θ'ab;ik = θab;ik - Mab;k

Repeat

Page 72: MAP Estimation Algorithms in

Why Dynamic Programming?

Start from left, go to right

Reparameterize current edge (a,b):

Mab;k = mini { θa;i + θab;ik }

θ'b;k = θb;k + Mab;k    θ'ab;ik = θab;ik - Mab;k

Repeat

The constants are Messages, so this is Message Passing

Why stop at dynamic programming?

Page 73: MAP Estimation Algorithms in

Three Variables

(Figure: the chain after the forward pass.)

Reparameterize the edge (c,b) as before

Page 74: MAP Estimation Algorithms in

Three Variables

(Figure: after reparameterizing (c,b).)

Reparameterize the edge (c,b) as before

θ'b;i = qb;i

Page 75: MAP Estimation Algorithms in

Three Variables

(Figure: as above.)

Reparameterize the edge (b,a) as before

Page 76: MAP Estimation Algorithms in

Three Variables

(Figure: after reparameterizing (b,a).)

Reparameterize the edge (b,a) as before

θ'a;i = qa;i

Page 77: MAP Estimation Algorithms in

Three Variables

(Figure: the fully reparameterized chain.)

Forward Pass + Backward Pass

All min-marginals are computed

Page 78: MAP Estimation Algorithms in

Belief Propagation on Chains

Start from left, go to right

Reparameterize current edge (a,b):

Mab;k = mini { θa;i + θab;ik }

θ'b;k = θb;k + Mab;k    θ'ab;ik = θab;ik - Mab;k

Repeat till the end of the chain

Then start from right, go to left

Repeat till the start of the chain
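The two passes can be sketched as one routine that applies the message update explicitly as a reparameterization, so that on return every unary vector holds that variable's min-marginals. A sketch (the function name and the example values in the usage note are ours):

```python
def chain_min_marginals(unary, pairwise):
    """Min-sum belief propagation on a chain, as reparameterization.

    unary[a][i]       -- theta_a;i
    pairwise[a][i][k] -- theta between vertex a and vertex a+1
    Returns updated unaries equal to the exact min-marginals q_a;i.
    """
    u = [list(x) for x in unary]
    p = [[list(row) for row in m] for m in pairwise]
    n, L = len(u), len(u[0])
    # Forward pass: fold M_ab;k = min_i (theta_a;i + theta_ab;ik) into the
    # next unary and subtract it from the pairwise column.
    for a in range(n - 1):
        for k in range(L):
            m = min(u[a][i] + p[a][i][k] for i in range(L))
            u[a + 1][k] += m
            for i in range(L):
                p[a][i][k] -= m
    # Backward pass: same update from right to left (rows instead of columns).
    for a in range(n - 2, -1, -1):
        for i in range(L):
            m = min(u[a + 1][k] + p[a][i][k] for k in range(L))
            u[a][i] += m
            for k in range(L):
                p[a][i][k] -= m
    return u

mm = chain_min_marginals([[5, 2], [2, 4], [6, 3]],
                         [[[0, 1], [1, 0]], [[0, 1], [1, 0]]])
```

For this illustrative three-variable chain, `mm` is [[11, 9], [9, 9], [11, 9]]: every variable's minimum min-marginal equals the MAP energy 9, as the slides argue.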

Page 79: MAP Estimation Algorithms in

Belief Propagation on Chains

• A way of computing reparam constants

• Generalizes to chains of any length

• Forward Pass (start to end): MAP estimate; min-marginals of final variable

• Backward Pass (end to start): all other min-marginals

Won’t need this, but good to know

Page 80: MAP Estimation Algorithms in

Computational Complexity

• Each constant takes O(|L|)

• Number of constants - O(|E||L|)

O(|E||L|2)

• Memory required ?

O(|E||L|)

Page 81: MAP Estimation Algorithms in

Belief Propagation on Trees

(Figure: a tree rooted at Va, with leaves Vd, Ve, Vg, Vh.)

Forward Pass: Leaf → Root

All min-marginals are computed

Backward Pass: Root → Leaf

Page 82: MAP Estimation Algorithms in

Outline

• Problem Formulation

• Reparameterization

• Belief Propagation
– Exact MAP for Chains and Trees
– Approximate MAP for general graphs
– Computational Issues and Theoretical Properties

• Tree-reweighted Message Passing

Page 83: MAP Estimation Algorithms in

Belief Propagation on Cycles

Va Vb

Vd Vc

Where do we start? Arbitrarily

(Figure: 4-cycle Va, Vb, Vc, Vd with unary potentials θa;i, θb;i, θc;i, θd;i for labels l0, l1.)

Reparameterize (a,b)

Page 84: MAP Estimation Algorithms in

Belief Propagation on Cycles

Va Vb

Vd Vc

(Figure: the unaries of Vb updated to θ'b;k.)

Potentials along the red path add up to 0

Page 85: MAP Estimation Algorithms in

Belief Propagation on Cycles

Va Vb

Vd Vc

(Figure: the unaries of Vc updated to θ'c;k.)

Potentials along the red path add up to 0

Page 86: MAP Estimation Algorithms in

Belief Propagation on Cycles

Va Vb

Vd Vc

(Figure: the unaries of Vd updated to θ'd;k.)

Potentials along the red path add up to 0

Page 87: MAP Estimation Algorithms in

Belief Propagation on Cycles

Va Vb

Vd Vc

(Figure: the unaries of Va updated to θ'a;i.)

Potentials along the red path add up to 0

Page 88: MAP Estimation Algorithms in

Belief Propagation on Cycles

Va Vb

Vd Vc

(Figure: completing the cycle at Va; the original θa;i is subtracted.)

Potentials along the red path add up to 0

θ'a;0 - θa;0 = qa;0    θ'a;1 - θa;1 = qa;1

Page 89: MAP Estimation Algorithms in

Belief Propagation on Cycles

Va Vb

Vd Vc

(Figure: as above.)

Pick minimum min-marginal. Follow red path.

θ'a;0 - θa;0 = qa;0    θ'a;1 - θa;1 = qa;1

Page 90: MAP Estimation Algorithms in

Belief Propagation on Cycles

Va Vb

Vd Vc

(Figure: a second traversal of the cycle, starting from the minimum min-marginal.)

Potentials along the red path add up to 0

Page 91: MAP Estimation Algorithms in

Belief Propagation on Cycles

Va Vb

Vd Vc

(Figure: as above.)

Potentials along the red path add up to 0

Page 92: MAP Estimation Algorithms in

Belief Propagation on Cycles

Va Vb

Vd Vc

(Figure: as above.)

Potentials along the red path add up to 0

Page 93: MAP Estimation Algorithms in

Belief Propagation on Cycles

Va Vb

Vd Vc

(Figure: as above.)

Potentials along the red path add up to 0

θ'a;1 - θa;1 = qa;1    θ'a;0 - θa;0 ≤ qa;0

Page 94: MAP Estimation Algorithms in

Belief Propagation on Cycles

Va Vb

Vd Vc

(Figure: as above.)

Problem Solved

θ'a;1 - θa;1 = qa;1    θ'a;0 - θa;0 ≤ qa;0

Page 95: MAP Estimation Algorithms in

Belief Propagation on Cycles

Va Vb

Vd Vc

(Figure: as above.)

Problem Not Solved

θ'a;1 - θa;1 = qa;1    θ'a;0 - θa;0 ≥ qa;0

Page 96: MAP Estimation Algorithms in

Belief Propagation on Cycles

Va Vb

Vd Vc

(Figure: as above.)

Reparameterize (a,b) again

Page 97: MAP Estimation Algorithms in

Belief Propagation on Cycles

Va Vb

Vd Vc

(Figure: the unaries of Vb updated again, to θ''b;k.)

Reparameterize (a,b) again

But doesn’t this overcount some potentials?

Page 98: MAP Estimation Algorithms in

Belief Propagation on Cycles

Va Vb

Vd Vc

(Figure: as above.)

Reparameterize (a,b) again

Yes. But we will do it anyway

Page 99: MAP Estimation Algorithms in

Belief Propagation on Cycles

Va Vb

Vd Vc

(Figure: as above.)

Keep reparameterizing edges in some order

Hope for convergence and a good solution

Page 100: MAP Estimation Algorithms in

Belief Propagation

• Generalizes to any arbitrary random field

• Complexity per iteration ?

O(|E||L|2)

• Memory required ?

O(|E||L|)

Page 101: MAP Estimation Algorithms in

Outline

• Problem Formulation

• Reparameterization

• Belief Propagation
– Exact MAP for Chains and Trees
– Approximate MAP for general graphs
– Computational Issues and Theoretical Properties

• Tree-reweighted Message Passing

Page 102: MAP Estimation Algorithms in

Computational Issues of BP

Complexity per iteration O(|E||L|2)

Special Pairwise Potentials: θab;ik = wab d(|i - k|)

(Figure: the distance function d(|i - k|) for Potts, Truncated Linear, and Truncated Quadratic potentials.)

O(|E||L|) Felzenszwalb & Huttenlocher, 2004
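For the Potts case the speedup is easy to see: with θab;ik = wab·[i ≠ k], every entry of the message is either h[k] itself or the single value min_i h[i] + wab, so the |L|² minimization collapses to one pass. A sketch (the function names are ours; h[i] stands for the unary plus accumulated messages):

```python
def potts_message(h, w):
    """O(|L|) message for a Potts pairwise potential w * [i != k]."""
    base = min(h) + w  # best predecessor label, paying the Potts penalty once
    return [min(hk, base) for hk in h]

def naive_message(h, w):
    """Naive O(|L|^2) message, for checking."""
    L = len(h)
    return [min(h[i] + (w if i != k else 0) for i in range(L))
            for k in range(L)]

assert potts_message([5, 2, 7, 1], 2) == naive_message([5, 2, 7, 1], 2)
```

The truncated linear and quadratic cases use the same idea via distance transforms over the label line, again giving O(|L|) per message.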

Page 103: MAP Estimation Algorithms in

Computational Issues of BP

Memory requirements O(|E||L|)

Half of original BP Kolmogorov, 2006

Some approximations exist

But memory still remains an issue

Yu, Lin, Super and Tan, 2007

Lasserre, Kannan and Winn, 2007

Page 104: MAP Estimation Algorithms in

Computational Issues of BP

Order of reparameterization

• Randomly

• In some fixed order

• The one that results in maximum change (Residual Belief Propagation, Elidan et al., 2006)

Page 105: MAP Estimation Algorithms in

Theoretical Properties of BP

Exact for Trees Pearl, 1988

What about any general random field?

Run BP. Assume it converges.

Page 106: MAP Estimation Algorithms in

Theoretical Properties of BP

Exact for Trees Pearl, 1988

What about any general random field?

Choose variables in a tree. Change their labels. Value of energy does not decrease

Page 107: MAP Estimation Algorithms in

Theoretical Properties of BP

Exact for Trees Pearl, 1988

What about any general random field?

Choose variables in a cycle. Change their labels. Value of energy does not decrease

Page 108: MAP Estimation Algorithms in

Theoretical Properties of BP

Exact for Trees Pearl, 1988

What about any general random field?

For cycles, if BP converges then exact MAP. Weiss and Freeman, 2001

Page 109: MAP Estimation Algorithms in

Results: Object Detection (Felzenszwalb and Huttenlocher, 2004)

(Figure: detected parts H, T, A1, A2, L1, L2.)

Labels - Poses of parts

Unary Potentials: fraction of foreground pixels

Pairwise Potentials: favour valid configurations

Page 110: MAP Estimation Algorithms in

Results: Object Detection (Felzenszwalb and Huttenlocher, 2004)

(Figure: detection results.)

Page 111: MAP Estimation Algorithms in

Results: Binary Segmentation (Szeliski et al., 2008)

Labels - {foreground, background}

Unary Potentials: -log(likelihood) using learnt fg/bg models

Pairwise Potentials: 0, if same labels; exp(-|Da - Db|), if different labels

Page 112: MAP Estimation Algorithms in

Results: Binary Segmentation (Szeliski et al., 2008)

Labels - {foreground, background}

Unary Potentials: -log(likelihood) using learnt fg/bg models

Pairwise Potentials: 0, if same labels; exp(-|Da - Db|), if different labels

Belief Propagation

Page 113: MAP Estimation Algorithms in

Results: Binary Segmentation (Szeliski et al., 2008)

Labels - {foreground, background}

Unary Potentials: -log(likelihood) using learnt fg/bg models

Pairwise Potentials: 0, if same labels; exp(-|Da - Db|), if different labels

Global optimum

Page 114: MAP Estimation Algorithms in

Results: Stereo Correspondence (Szeliski et al., 2008)

Labels - {disparities}

Unary Potentials: similarity of pixel colours

Pairwise Potentials: 0, if same labels; exp(-|Da - Db|), if different labels

Page 115: MAP Estimation Algorithms in

Results: Stereo Correspondence (Szeliski et al., 2008)

Labels - {disparities}

Unary Potentials: similarity of pixel colours

Pairwise Potentials: 0, if same labels; exp(-|Da - Db|), if different labels

Belief Propagation

Page 116: MAP Estimation Algorithms in

Results: Stereo Correspondence (Szeliski et al., 2008)

Labels - {disparities}

Unary Potentials: similarity of pixel colours

Pairwise Potentials: 0, if same labels; exp(-|Da - Db|), if different labels

Global optimum

Page 117: MAP Estimation Algorithms in

Summary of BP

Exact for chains

Exact for trees

Approximate MAP for general cases

Not even convergence guaranteed

So can we do something better?

Page 118: MAP Estimation Algorithms in

Outline

• Problem Formulation

• Reparameterization

• Belief Propagation

• Tree-reweighted Message Passing
– Integer Programming Formulation
– Linear Programming Relaxation and its Dual
– Convergent Solution for Dual
– Computational Issues and Theoretical Properties

Page 119: MAP Estimation Algorithms in

TRW Message Passing

• Convex (not Combinatorial) Optimization

• A different look at the same problem

• A similar solution

• Combinatorial (not Convex) Optimization

We will look at the most general MAP estimation

Not trees No assumption on potentials

Page 120: MAP Estimation Algorithms in

Things to Remember

• Forward-pass computes min-marginals of root

• BP is exact for trees

• Every iteration provides a reparameterization

• Basics of Mathematical Optimization

Page 121: MAP Estimation Algorithms in

Mathematical Optimization

min g0(x)

subject to gi(x) ≤ 0 i=1, … , N

• Objective function • Constraints

• Feasible region = {x | gi(x) ≤ 0}

Optimal Solution: x* = arg min g0(x)

Optimal Value: g0(x*)

Page 122: MAP Estimation Algorithms in

Integer Programming

min g0(x)

subject to gi(x) ≤ 0 i=1, … , N

• Objective function • Constraints

• Feasible region = {x | gi(x) ≤ 0}

Optimal Solution: x* = arg min g0(x)

Optimal Value: g0(x*)

Additional constraint: xk ∈ Z

Page 123: MAP Estimation Algorithms in

Feasible Region

Generally NP-hard to optimize

Page 124: MAP Estimation Algorithms in

Linear Programming

min g0(x)

subject to gi(x) ≤ 0 i=1, … , N

• Objective function • Constraints

• Feasible region = {x | gi(x) ≤ 0}

Optimal Solution: x* = arg min g0(x)

Optimal Value: g0(x*)

Page 125: MAP Estimation Algorithms in

Linear Programming

min g0(x)

subject to gi(x) ≤ 0 i=1, … , N

• Linear objective function • Linear constraints

• Feasible region = {x | gi(x) ≤ 0}

Optimal Solution: x* = arg min g0(x)

Optimal Value: g0(x*)

Page 126: MAP Estimation Algorithms in

Linear Programming

min cTx

subject to Ax ≤ b

• Linear objective function • Linear constraints

• Feasible region = {x | Ax ≤ b}

Optimal Solution: x* = arg min cTx

Optimal Value: cTx*

Polynomial-time Solution

Page 127: MAP Estimation Algorithms in

Feasible Region

Polynomial-time Solution

Page 128: MAP Estimation Algorithms in

Feasible Region

Optimal solution lies on a vertex (obj func linear)

Page 129: MAP Estimation Algorithms in

Outline

• Problem Formulation

• Reparameterization

• Belief Propagation

• Tree-reweighted Message Passing
– Integer Programming Formulation
– Linear Programming Relaxation and its Dual
– Convergent Solution for Dual
– Computational Issues and Theoretical Properties

Page 130: MAP Estimation Algorithms in

Integer Programming Formulation

(Figure: Va, Vb with labels l0, l1.)

Unary Potentials: θa;0 = 5, θa;1 = 2, θb;0 = 2, θb;1 = 4

Labelling: f(a) = 1, f(b) = 0

ya;0 = 0, ya;1 = 1, yb;0 = 1, yb;1 = 0

Any f(.) has equivalent boolean variables ya;i

Page 131: MAP Estimation Algorithms in

Integer Programming Formulation

(Figure: as above.)

Unary Potentials: θa;0 = 5, θa;1 = 2, θb;0 = 2, θb;1 = 4

Labelling: f(a) = 1, f(b) = 0

ya;0 = 0, ya;1 = 1, yb;0 = 1, yb;1 = 0

Find the optimal variables ya;i

Page 132: MAP Estimation Algorithms in

Integer Programming Formulation

(Figure: as above.)

Unary Potentials: θa;0 = 5, θa;1 = 2, θb;0 = 2, θb;1 = 4

Sum of Unary Potentials: ∑a ∑i θa;i ya;i

ya;i ∈ {0,1}, for all Va, li

∑i ya;i = 1, for all Va

Page 133: MAP Estimation Algorithms in

Integer Programming Formulation

(Figure: as above.)

Pairwise Potentials: θab;00 = 0, θab;01 = 1, θab;10 = 1, θab;11 = 0

Sum of Pairwise Potentials: ∑(a,b) ∑ik θab;ik ya;i yb;k

ya;i ∈ {0,1}    ∑i ya;i = 1

Page 134: MAP Estimation Algorithms in

Integer Programming Formulation

(Figure: as above.)

Pairwise Potentials: θab;00 = 0, θab;01 = 1, θab;10 = 1, θab;11 = 0

Sum of Pairwise Potentials: ∑(a,b) ∑ik θab;ik yab;ik

ya;i ∈ {0,1}    ∑i ya;i = 1    yab;ik = ya;i yb;k

Page 135: MAP Estimation Algorithms in

Integer Programming Formulation

min ∑a ∑i θa;i ya;i + ∑(a,b) ∑ik θab;ik yab;ik

ya;i ∈ {0,1}

∑i ya;i = 1

yab;ik = ya;i yb;k
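A quick check that this objective reproduces Q(f; θ): encoding a labelling in the boolean variables ya;i and yab;ik = ya;i yb;k and expanding the sum gives exactly the unary plus pairwise cost. A sketch using the two-variable values above (helper names are ours):

```python
# Two-variable example: theta_a = (5, 2), theta_b = (2, 4), pairwise 0,1,1,0.
theta_unary = {("a", 0): 5, ("a", 1): 2, ("b", 0): 2, ("b", 1): 4}
theta_pair = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}

def objective(fa, fb):
    """The IP objective theta^T y for the y encoding labelling (fa, fb)."""
    y_a = [int(fa == i) for i in range(2)]   # y_{a;i} = [f(a) == i]
    y_b = [int(fb == k) for k in range(2)]
    val = sum(theta_unary[("a", i)] * y_a[i] for i in range(2))
    val += sum(theta_unary[("b", k)] * y_b[k] for k in range(2))
    # y_{ab;ik} = y_{a;i} * y_{b;k}: exactly one pairwise term survives.
    val += sum(theta_pair[(i, k)] * y_a[i] * y_b[k]
               for i in range(2) for k in range(2))
    return val

def Q(fa, fb):
    return theta_unary[("a", fa)] + theta_unary[("b", fb)] + theta_pair[(fa, fb)]

for fa in range(2):
    for fb in range(2):
        assert objective(fa, fb) == Q(fa, fb)
```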

Page 136: MAP Estimation Algorithms in

Integer Programming Formulation

min θTy

ya;i ∈ {0,1}

∑i ya;i = 1

yab;ik = ya;i yb;k

θ = [ … θa;i … ; … θab;ik … ]

y = [ … ya;i … ; … yab;ik … ]

Page 137: MAP Estimation Algorithms in

One variable, two labels

(Figure: the two points satisfying ya;0 + ya;1 = 1.)

ya;0 ∈ {0,1}    ya;1 ∈ {0,1}    ya;0 + ya;1 = 1

y = [ ya;0 ya;1 ]    θ = [ θa;0 θa;1 ]

Page 138: MAP Estimation Algorithms in

Two variables, two labels

θ = [ θa;0 θa;1 θb;0 θb;1 θab;00 θab;01 θab;10 θab;11 ]

y = [ ya;0 ya;1 yb;0 yb;1 yab;00 yab;01 yab;10 yab;11 ]

ya;0 ∈ {0,1}    ya;1 ∈ {0,1}    ya;0 + ya;1 = 1

yb;0 ∈ {0,1}    yb;1 ∈ {0,1}    yb;0 + yb;1 = 1

yab;00 = ya;0 yb;0    yab;01 = ya;0 yb;1

yab;10 = ya;1 yb;0    yab;11 = ya;1 yb;1

Page 139: MAP Estimation Algorithms in

In General

Marginal Polytope

Page 140: MAP Estimation Algorithms in

In General

θ ∈ R^(|V||L| + |E||L|²)

y ∈ {0,1}^(|V||L| + |E||L|²)

Number of constraints: |V||L| + |V| + |E||L|²

ya;i ∈ {0,1}    ∑i ya;i = 1    yab;ik = ya;i yb;k

Page 141: MAP Estimation Algorithms in

Integer Programming Formulation

min θTy

ya;i ∈ {0,1}

∑i ya;i = 1

yab;ik = ya;i yb;k

θ = [ … θa;i … ; … θab;ik … ]

y = [ … ya;i … ; … yab;ik … ]

Page 142: MAP Estimation Algorithms in

Integer Programming Formulation

min θᵀy

ya;i ∈ {0,1}

∑i ya;i = 1

yab;ik = ya;i yb;k

Solve to obtain MAP labelling y*

Page 143: MAP Estimation Algorithms in

Integer Programming Formulation

min θᵀy

ya;i ∈ {0,1}

∑i ya;i = 1

yab;ik = ya;i yb;k

But we can’t solve it in general

Page 144: MAP Estimation Algorithms in

Outline

• Problem Formulation

• Reparameterization

• Belief Propagation

• Tree-reweighted Message Passing
  – Integer Programming Formulation
  – Linear Programming Relaxation and its Dual
  – Convergent Solution for Dual
  – Computational Issues and Theoretical Properties

Page 145: MAP Estimation Algorithms in

Linear Programming Relaxation

min θᵀy

ya;i ∈ {0,1}

∑i ya;i = 1

yab;ik = ya;i yb;k

Two reasons why we can’t solve this

Page 146: MAP Estimation Algorithms in

Linear Programming Relaxation

min θᵀy

ya;i ∈ [0,1]

∑i ya;i = 1

yab;ik = ya;i yb;k

One reason why we can’t solve this

Page 147: MAP Estimation Algorithms in

Linear Programming Relaxation

min θᵀy

ya;i ∈ [0,1]

∑i ya;i = 1

∑k yab;ik = ∑k ya;i yb;k

One reason why we can’t solve this

Page 148: MAP Estimation Algorithms in

Linear Programming Relaxation

min θᵀy

ya;i ∈ [0,1]

∑i ya;i = 1

One reason why we can’t solve this

∑k yab;ik = ya;i ∑k yb;k    (∑k yb;k = 1)

Page 149: MAP Estimation Algorithms in

Linear Programming Relaxation

min θᵀy

ya;i ∈ [0,1]

∑i ya;i = 1

∑k yab;ik = ya;i

One reason why we can’t solve this

Page 150: MAP Estimation Algorithms in

Linear Programming Relaxation

min θᵀy

ya;i ∈ [0,1]

∑i ya;i = 1

∑k yab;ik = ya;i

No reason why we can’t solve this*

*memory requirements, time complexity
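The relaxation above is an ordinary linear program, so any LP solver handles small instances. A sketch for the two-variable, two-label example using `scipy.optimize.linprog` (potential values assumed; since a single edge is a tree, the LP optimum here is integral):

```python
from scipy.optimize import linprog
import numpy as np

# Variables: y = [ya;0, ya;1, yb;0, yb;1, yab;00, yab;01, yab;10, yab;11].
# Cost vector theta: assumed unaries (5,2) and (2,4) plus a Potts edge.
c = np.array([5.0, 2.0, 2.0, 4.0, 0.0, 1.0, 1.0, 0.0])

A_eq = np.array([
    [1, 1, 0, 0,  0,  0,  0,  0],   # sum_i ya;i = 1
    [0, 0, 1, 1,  0,  0,  0,  0],   # sum_i yb;i = 1
    [1, 0, 0, 0, -1, -1,  0,  0],   # ya;0 = yab;00 + yab;01
    [0, 1, 0, 0,  0,  0, -1, -1],   # ya;1 = yab;10 + yab;11
    [0, 0, 1, 0, -1,  0, -1,  0],   # yb;0 = yab;00 + yab;10
    [0, 0, 0, 1,  0, -1,  0, -1],   # yb;1 = yab;01 + yab;11
])
b_eq = np.array([1, 1, 0, 0, 0, 0])

res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=[(0, 1)] * 8)
print(res.fun, res.x)  # optimal value 5.0, an integral vertex (tree case)
```

On loopy graphs the same LP can return fractional vertices of the local polytope, which is exactly why the dual and message-passing machinery of the following slides is needed.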

Page 151: MAP Estimation Algorithms in

One variable, two labels

ya;0

ya;1

ya;0 ∈ {0,1}    ya;1 ∈ {0,1}    ya;0 + ya;1 = 1

y = [ ya;0 ya;1 ]    θ = [ θa;0 θa;1 ]

Page 152: MAP Estimation Algorithms in

One variable, two labels

ya;0

ya;1

ya;0 ∈ [0,1]    ya;1 ∈ [0,1]    ya;0 + ya;1 = 1

y = [ ya;0 ya;1 ]    θ = [ θa;0 θa;1 ]

Page 153: MAP Estimation Algorithms in

Two variables, two labels

θ = [ θa;0 θa;1 θb;0 θb;1 θab;00 θab;01 θab;10 θab;11 ]

y = [ ya;0 ya;1 yb;0 yb;1 yab;00 yab;01 yab;10 yab;11 ]

ya;0 ∈ {0,1}    ya;1 ∈ {0,1}    ya;0 + ya;1 = 1

yb;0 ∈ {0,1}    yb;1 ∈ {0,1}    yb;0 + yb;1 = 1

yab;00 = ya;0 yb;0 yab;01 = ya;0 yb;1

yab;10 = ya;1 yb;0 yab;11 = ya;1 yb;1

Page 154: MAP Estimation Algorithms in

Two variables, two labels

θ = [ θa;0 θa;1 θb;0 θb;1 θab;00 θab;01 θab;10 θab;11 ]

y = [ ya;0 ya;1 yb;0 yb;1 yab;00 yab;01 yab;10 yab;11 ]

ya;0 ∈ [0,1]    ya;1 ∈ [0,1]    ya;0 + ya;1 = 1

yb;0 ∈ [0,1]    yb;1 ∈ [0,1]    yb;0 + yb;1 = 1

yab;00 = ya;0 yb;0 yab;01 = ya;0 yb;1

yab;10 = ya;1 yb;0 yab;11 = ya;1 yb;1

Page 155: MAP Estimation Algorithms in

Two variables, two labels

θ = [ θa;0 θa;1 θb;0 θb;1 θab;00 θab;01 θab;10 θab;11 ]

y = [ ya;0 ya;1 yb;0 yb;1 yab;00 yab;01 yab;10 yab;11 ]

ya;0 ∈ [0,1]    ya;1 ∈ [0,1]    ya;0 + ya;1 = 1

yb;0 ∈ [0,1]    yb;1 ∈ [0,1]    yb;0 + yb;1 = 1

yab;00 + yab;01 = ya;0

yab;10 = ya;1 yb;0 yab;11 = ya;1 yb;1

Page 156: MAP Estimation Algorithms in

Two variables, two labels

= [ a;0 a;1 b;0 b;1

ab;00 ab;01 ab;10 ab;11]y = [ ya;0 ya;1 yb;0 yb;1

yab;00 yab;01 yab;10 yab;11]

ya;0 [0,1] ya;1 [0,1] ya;0 + ya;1 = 1

yb;0 [0,1] yb;1 [0,1] yb;0 + yb;1 = 1

yab;00 + yab;01 = ya;0

yab;10 + yab;11 = ya;1

Page 157: MAP Estimation Algorithms in

In General

Marginal Polytope

Local Polytope

Page 158: MAP Estimation Algorithms in

In General

θ ∈ R^(|V||L| + |E||L|²)

y ∈ [0,1]^(|V||L| + |E||L|²)

Number of constraints

|V||L| + |V| + |E||L|

Page 159: MAP Estimation Algorithms in

Linear Programming Relaxation

min θᵀy

ya;i ∈ [0,1]

∑i ya;i = 1

∑k yab;ik = ya;i

No reason why we can’t solve this

Page 160: MAP Estimation Algorithms in

Linear Programming Relaxation

Extensively studied

Optimization

Schlesinger, 1976

Koster, van Hoesel and Kolen, 1998

Theory

Chekuri et al., 2001; Archer et al., 2004

Machine Learning

Wainwright et al., 2001

Page 161: MAP Estimation Algorithms in

Linear Programming Relaxation

Many interesting Properties

• Global optimal MAP for trees

Wainwright et al., 2001

But we are interested in NP-hard cases

• Preserves solution for reparameterization

Page 162: MAP Estimation Algorithms in

Linear Programming Relaxation

• Large class of problems

• Metric Labelling
• Semi-metric Labelling

Many interesting Properties - Integrality Gap

Manokaran et al., 2008

• Most likely, provides best possible integrality gap

Page 163: MAP Estimation Algorithms in

Linear Programming Relaxation

• A computationally useful dual

Many interesting Properties - Dual

Optimal value of dual = Optimal value of primal

Easier-to-solve

Page 164: MAP Estimation Algorithms in

Dual of the LP Relaxation (Wainwright et al., 2001)

Va Vb Vc

Vd Ve Vf

Vg Vh Vi

min θᵀy

ya;i ∈ [0,1]

∑i ya;i = 1

∑k yab;ik = ya;i

Page 165: MAP Estimation Algorithms in

Dual of the LP Relaxation (Wainwright et al., 2001)

Va Vb Vc

Vd Ve Vf

Vg Vh Vi

Va Vb Vc

Vd Ve Vf

Vg Vh Vi

Va Vb Vc

Vd Ve Vf

Vg Vh Vi

1

2

3

4 5 6

1

2

3

4 5 6

∑i ρi θi = θ

ρi ≥ 0

Page 166: MAP Estimation Algorithms in

Dual of the LP Relaxation (Wainwright et al., 2001)

1

2

3

4 5 6

q*(θ1)

∑i ρi θi = θ

q*(θ2)

q*(θ3)

q*(θ4) q*(θ5) q*(θ6)

∑i ρi q*(θi)

Dual of LP

Va Vb Vc

Vd Ve Vf

Vg Vh Vi

Va Vb Vc

Vd Ve Vf

Vg Vh Vi

Va Vb Vc

Vd Ve Vf

Vg Vh Vi

ρi ≥ 0

max

Page 167: MAP Estimation Algorithms in

Dual of the LP Relaxation (Wainwright et al., 2001)

1

2

3

4 5 6

q*(θ1)

∑i ρi θi = θ

q*(θ2)

q*(θ3)

q*(θ4) q*(θ5) q*(θ6)

Dual of LP

Va Vb Vc

Vd Ve Vf

Vg Vh Vi

Va Vb Vc

Vd Ve Vf

Vg Vh Vi

Va Vb Vc

Vd Ve Vf

Vg Vh Vi

ρi ≥ 0

max ∑i ρi q*(θi)

Page 168: MAP Estimation Algorithms in

Dual of the LP Relaxation (Wainwright et al., 2001)

∑i ρi θi = θ

max ∑i ρi q*(θi)

I can easily compute q*(θi)

I can easily maintain reparam constraint

So can I easily solve the dual?

Page 169: MAP Estimation Algorithms in

Outline

• Problem Formulation

• Reparameterization

• Belief Propagation

• Tree-reweighted Message Passing
  – Integer Programming Formulation
  – Linear Programming Relaxation and its Dual
  – Convergent Solution for Dual
  – Computational Issues and Theoretical Properties

Page 170: MAP Estimation Algorithms in

TRW Message Passing (Kolmogorov, 2006)

Va Vb Vc

Vd Ve Vf

Vg Vh Vi

VaVb Vc

VdVe Vf

VgVh Vi

1

2

3

1

2

3

4 5 6

4 5 6

∑i ρi θi = θ

∑i ρi q*(θi)

Pick a variable Va

Page 171: MAP Estimation Algorithms in

TRW Message Passing (Kolmogorov, 2006)

∑i ρi θi = θ

∑i ρi q*(θi)

Vc Vb Va

θ1c;0

θ1c;1

θ1b;0

θ1b;1

θ1a;0

θ1a;1

Va Vd Vg

θ4a;0

θ4a;1

θ4d;0

θ4d;1

θ4g;0

θ4g;1

Page 172: MAP Estimation Algorithms in

TRW Message Passing (Kolmogorov, 2006)

ρ1θ1 + ρ4θ4 + θrest

ρ1 q*(θ1) + ρ4 q*(θ4) + K

Vc Vb Va Va Vd Vg

Reparameterize to obtain min-marginals of Va

θ1c;0

θ1c;1

θ1b;0

θ1b;1

θ1a;0

θ1a;1

θ4a;0

θ4a;1

θ4d;0

θ4d;1

θ4g;0

θ4g;1

Page 173: MAP Estimation Algorithms in

TRW Message Passing (Kolmogorov, 2006)

ρ1θ'1 + ρ4θ'4 + θrest

Vc Vb Va

θ'1c;0

θ'1c;1

θ'1b;0

θ'1b;1

θ'1a;0

θ'1a;1

Va Vd Vg

θ'4a;0

θ'4a;1

θ'4d;0

θ'4d;1

θ'4g;0

θ'4g;1

One pass of Belief Propagation

ρ1 q*(θ'1) + ρ4 q*(θ'4) + K
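The reparameterization that exposes the min-marginals of Va amounts to one forward pass of min-sum belief propagation along each tree. A minimal sketch for a chain ending at Va (function name and the potential values are illustrative, not from the slides):

```python
import numpy as np

def min_marginals_last(unaries, pairwise):
    """Min-marginals of the last chain variable by one min-sum forward pass.
    unaries: list of length-L cost arrays, one per chain variable;
    pairwise: list of LxL cost matrices, one per consecutive edge."""
    m = np.array(unaries[0], dtype=float)
    for theta_ab, theta_b in zip(pairwise, unaries[1:]):
        # For each label k of the next node, minimize over this node's label.
        m = np.min(m[:, None] + np.asarray(theta_ab, float), axis=0) + theta_b
    return m

# Two-node chain with a Potts edge; result matches brute-force enumeration.
q = min_marginals_last([[5, 2], [2, 4]], [[[0, 1], [1, 0]]])
print(q)  # -> [5. 6.]
```

Here q[i] is the minimum cost over all chain labellings in which the last variable takes label i, which is exactly the quantity q*(θ') that the slides read off the root of each tree.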

Page 174: MAP Estimation Algorithms in

TRW Message Passing (Kolmogorov, 2006)

ρ1θ'1 + ρ4θ'4 + θrest

Vc Vb Va Va Vd Vg

Remain the same

ρ1 q*(θ'1) + ρ4 q*(θ'4) + K

θ'1c;0

θ'1c;1

θ'1b;0

θ'1b;1

θ'1a;0

θ'1a;1

θ'4a;0

θ'4a;1

θ'4d;0

θ'4d;1

θ'4g;0

θ'4g;1

Page 175: MAP Estimation Algorithms in

TRW Message Passing (Kolmogorov, 2006)

ρ1θ'1 + ρ4θ'4 + θrest

ρ1 min{θ'1a;0, θ'1a;1} + ρ4 min{θ'4a;0, θ'4a;1} + K

Vc Vb Va Va Vd Vg

θ'1c;0

θ'1c;1

θ'1b;0

θ'1b;1

θ'1a;0

θ'1a;1

θ'4a;0

θ'4a;1

θ'4d;0

θ'4d;1

θ'4g;0

θ'4g;1

Page 176: MAP Estimation Algorithms in

TRW Message Passing (Kolmogorov, 2006)

ρ1θ'1 + ρ4θ'4 + θrest

Vc Vb Va Va Vd Vg

Compute weighted average of min-marginals of Va

θ'1c;0

θ'1c;1

θ'1b;0

θ'1b;1

θ'1a;0

θ'1a;1

θ'4a;0

θ'4a;1

θ'4d;0

θ'4d;1

θ'4g;0

θ'4g;1

ρ1 min{θ'1a;0, θ'1a;1} + ρ4 min{θ'4a;0, θ'4a;1} + K

Page 177: MAP Estimation Algorithms in

TRW Message Passing (Kolmogorov, 2006)

ρ1θ'1 + ρ4θ'4 + θrest

Vc Vb Va Va Vd Vg

θ''a;0 = (ρ1 θ'1a;0 + ρ4 θ'4a;0) / (ρ1 + ρ4)

θ''a;1 = (ρ1 θ'1a;1 + ρ4 θ'4a;1) / (ρ1 + ρ4)

θ'1c;0

θ'1c;1

θ'1b;0

θ'1b;1

θ'1a;0

θ'1a;1

θ'4a;0

θ'4a;1

θ'4d;0

θ'4d;1

θ'4g;0

θ'4g;1

ρ1 min{θ'1a;0, θ'1a;1} + ρ4 min{θ'4a;0, θ'4a;1} + K
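The node-averaging step is a one-liner once the per-tree min-marginals are in hand; a sketch (function name and sample numbers are illustrative assumptions):

```python
def node_average(unaries, rhos):
    """TRW node-averaging for a variable shared by several trees.
    unaries: per-tree min-marginal pairs [theta'_a;0, theta'_a;1];
    rhos: matching tree weights. Returns theta''_a, shared by all trees."""
    total = sum(rhos)
    avg0 = sum(r * u[0] for r, u in zip(rhos, unaries)) / total
    avg1 = sum(r * u[1] for r, u in zip(rhos, unaries)) / total
    return [avg0, avg1]

# e.g. tree 1 sees min-marginals (5, 7), tree 4 sees (6, 3), both weight 1:
print(node_average([[5.0, 7.0], [6.0, 3.0]], [1.0, 1.0]))  # -> [5.5, 5.0]
```

After averaging, every tree containing Va is given the same unary θ''a, which is what makes the dual objective comparison on the next slides possible.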

Page 178: MAP Estimation Algorithms in

TRW Message Passing (Kolmogorov, 2006)

ρ1θ''1 + ρ4θ''4 + θrest

Vc Vb Va Va Vd Vg

θ'1c;0

θ'1c;1

θ'1b;0

θ'1b;1

θ''a;0

θ''a;1

θ''a;0

θ''a;1

θ'4d;0

θ'4d;1

θ'4g;0

θ'4g;1

ρ1 min{θ'1a;0, θ'1a;1} + ρ4 min{θ'4a;0, θ'4a;1} + K

θ''a;0 = (ρ1 θ'1a;0 + ρ4 θ'4a;0) / (ρ1 + ρ4)

θ''a;1 = (ρ1 θ'1a;1 + ρ4 θ'4a;1) / (ρ1 + ρ4)

Page 179: MAP Estimation Algorithms in

TRW Message Passing (Kolmogorov, 2006)

ρ1θ''1 + ρ4θ''4 + θrest

Vc Vb Va Va Vd Vg

θ'1c;0

θ'1c;1

θ'1b;0

θ'1b;1

θ''a;0

θ''a;1

θ''a;0

θ''a;1

θ'4d;0

θ'4d;1

θ'4g;0

θ'4g;1

ρ1 min{θ'1a;0, θ'1a;1} + ρ4 min{θ'4a;0, θ'4a;1} + K

θ''a;0 = (ρ1 θ'1a;0 + ρ4 θ'4a;0) / (ρ1 + ρ4)

θ''a;1 = (ρ1 θ'1a;1 + ρ4 θ'4a;1) / (ρ1 + ρ4)

Page 180: MAP Estimation Algorithms in

TRW Message Passing (Kolmogorov, 2006)

ρ1θ''1 + ρ4θ''4 + θrest

Vc Vb Va Va Vd Vg

ρ1 min{θ''a;0, θ''a;1} + ρ4 min{θ''a;0, θ''a;1} + K

θ'1c;0

θ'1c;1

θ'1b;0

θ'1b;1

θ''a;0

θ''a;1

θ''a;0

θ''a;1

θ'4d;0

θ'4d;1

θ'4g;0

θ'4g;1

θ''a;0 = (ρ1 θ'1a;0 + ρ4 θ'4a;0) / (ρ1 + ρ4)

θ''a;1 = (ρ1 θ'1a;1 + ρ4 θ'4a;1) / (ρ1 + ρ4)

Page 181: MAP Estimation Algorithms in

TRW Message Passing (Kolmogorov, 2006)

ρ1θ''1 + ρ4θ''4 + θrest

Vc Vb Va Va Vd Vg

(ρ1 + ρ4) min{θ''a;0, θ''a;1} + K

θ'1c;0

θ'1c;1

θ'1b;0

θ'1b;1

θ''a;0

θ''a;1

θ''a;0

θ''a;1

θ'4d;0

θ'4d;1

θ'4g;0

θ'4g;1

θ''a;0 = (ρ1 θ'1a;0 + ρ4 θ'4a;0) / (ρ1 + ρ4)

θ''a;1 = (ρ1 θ'1a;1 + ρ4 θ'4a;1) / (ρ1 + ρ4)

Page 182: MAP Estimation Algorithms in

TRW Message Passing (Kolmogorov, 2006)

ρ1θ''1 + ρ4θ''4 + θrest

Vc Vb Va Va Vd Vg

(ρ1 + ρ4) min{θ''a;0, θ''a;1} + K

θ'1c;0

θ'1c;1

θ'1b;0

θ'1b;1

θ''a;0

θ''a;1

θ''a;0

θ''a;1

θ'4d;0

θ'4d;1

θ'4g;0

θ'4g;1

min{p1 + p2, q1 + q2} ≥ min{p1, q1} + min{p2, q2}
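This inequality is what guarantees the dual bound never decreases after node-averaging. A quick numerical sanity check (a sketch, not part of the original tutorial):

```python
import random

# min{p1+p2, q1+q2} >= min{p1, q1} + min{p2, q2}: the right-hand side is a
# lower bound on both p1+p2 and q1+q2, hence on their minimum.
random.seed(0)
for _ in range(1000):
    p1, p2, q1, q2 = (random.uniform(-10, 10) for _ in range(4))
    assert min(p1 + p2, q1 + q2) >= min(p1, q1) + min(p2, q2)
print("inequality holds on all samples")
```

In the TRW setting, p and q are the two averaged min-marginal values contributed by different trees; before averaging the dual collects the left-hand side, after averaging the (smaller or equal) right-hand side is replaced by a single shared minimum, so the objective can only go up or stay constant.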

Page 183: MAP Estimation Algorithms in

TRW Message Passing (Kolmogorov, 2006)

ρ1θ''1 + ρ4θ''4 + θrest

Vc Vb Va Va Vd Vg

Objective function increases or remains constant

θ'1c;0

θ'1c;1

θ'1b;0

θ'1b;1

θ''a;0

θ''a;1

θ''a;0

θ''a;1

θ'4d;0

θ'4d;1

θ'4g;0

θ'4g;1

(ρ1 + ρ4) min{θ''a;0, θ''a;1} + K

Page 184: MAP Estimation Algorithms in

TRW Message Passing

Initialize θi. Take care of reparam constraint

Choose random variable Va

Compute min-marginals of Va for all trees

Node-average the min-marginals

REPEAT

Kolmogorov, 2006

Can also do edge-averaging

Page 185: MAP Estimation Algorithms in

Example 1

Va Vb

0

1 1

0

2

5

4

2

l0

l1

Vb Vc

0

2 3

1

4

2

6

3

Vc Va

1

4 1

0

6

3

6

4

ρ1 = 1, ρ2 = 1, ρ3 = 1

5

6

7

Pick variable Va. Reparameterize.

Page 186: MAP Estimation Algorithms in

Example 1

Va Vb

-3

-2 -1

-2

5

7

4

2

Vb Vc

0

2 3

1

4

2

6

3

Vc Va

-3

1 -3

-3

6

3

10

7

ρ1 = 1, ρ2 = 1, ρ3 = 1

5

6

7

Average the min-marginals of Va

l0

l1

Page 187: MAP Estimation Algorithms in

Example 1

Va Vb

-3

-2 -1

-2

7.5

7

4

2

Vb Vc

0

2 3

1

4

2

6

3

Vc Va

-3

1 -3

-3

6

3

7.5

7

ρ1 = 1, ρ2 = 1, ρ3 = 1

7

6

7

Pick variable Vb. Reparameterize.

l0

l1

Page 188: MAP Estimation Algorithms in

Example 1

Va Vb

-7.5

-7 -5.5

-7

7.5

7

8.5

7

Vb Vc

-5

-3 -1

-3

9

6

6

3

Vc Va

-3

1 -3

-3

6

3

7.5

7

ρ1 = 1, ρ2 = 1, ρ3 = 1

7

6

7

Average the min-marginals of Vb

l0

l1

Page 189: MAP Estimation Algorithms in

Example 1

Va Vb

-7.5

-7 -5.5

-7

7.5

7

8.75

6.5

Vb Vc

-5

-3 -1

-3

8.75

6.5

6

3

Vc Va

-3

1 -3

-3

6

3

7.5

7

ρ1 = 1, ρ2 = 1, ρ3 = 1

6.5

6.5

7

Value of dual does not increase

l0

l1

Page 190: MAP Estimation Algorithms in

Example 1

Va Vb

-7.5

-7 -5.5

-7

7.5

7

8.75

6.5

Vb Vc

-5

-3 -1

-3

8.75

6.5

6

3

Vc Va

-3

1 -3

-3

6

3

7.5

7

ρ1 = 1, ρ2 = 1, ρ3 = 1

6.5

6.5

7

Maybe it will increase for Vc

NO

l0

l1

Page 191: MAP Estimation Algorithms in

Example 1

Va Vb

-7.5

-7 -5.5

-7

7.5

7

8.75

6.5

Vb Vc

-5

-3 -1

-3

8.75

6.5

6

3

Vc Va

-3

1 -3

-3

6

3

7.5

7

ρ1 = 1, ρ2 = 1, ρ3 = 1

Strong Tree Agreement

Exact MAP Estimate

f1(a) = 0 f1(b) = 0 f2(b) = 0 f2(c) = 0 f3(c) = 0 f3(a) = 0

l0

l1

Page 192: MAP Estimation Algorithms in

Example 2

Va Vb

0

1 1

0

2

5

2

2

Vb Vc

1

0 0

1

0

0

0

0

Vc Va

0

1 1

0

0

3

4

8

ρ1 = 1, ρ2 = 1, ρ3 = 1

4

0

4

Pick variable Va. Reparameterize.

l0

l1

Page 193: MAP Estimation Algorithms in

Example 2

Va Vb

-2

-1 -1

-2

4

7

2

2

Vb Vc

1

0 0

1

0

0

0

0

Vc Va

0

0 1

-1

0

3

4

9

ρ1 = 1, ρ2 = 1, ρ3 = 1

4

0

4

Average the min-marginals of Va

l0

l1

Page 194: MAP Estimation Algorithms in

Example 2

Va Vb

-2

-1 -1

-2

4

8

2

2

Vb Vc

1

0 0

1

0

0

0

0

Vc Va

0

0 1

-1

0

3

4

8

ρ1 = 1, ρ2 = 1, ρ3 = 1

4

0

4

Value of dual does not increase

l0

l1

Page 195: MAP Estimation Algorithms in

Example 2

Va Vb

-2

-1 -1

-2

4

8

2

2

Vb Vc

1

0 0

1

0

0

0

0

Vc Va

0

0 1

-1

0

3

4

8

ρ1 = 1, ρ2 = 1, ρ3 = 1

4

0

4

Maybe it will decrease for Vb or Vc

NO

l0

l1

Page 196: MAP Estimation Algorithms in

Example 2

Va Vb

-2

-1 -1

-2

4

8

2

2

Vb Vc

1

0 0

1

0

0

0

0

Vc Va

0

0 1

-1

0

3

4

8

ρ1 = 1, ρ2 = 1, ρ3 = 1

f1(a) = 1 f1(b) = 1 f2(b) = 1 f2(c) = 0 f3(c) = 1 f3(a) = 1

f2(b) = 0 f2(c) = 1

Weak Tree Agreement

Not Exact MAP Estimate

l0

l1

Page 197: MAP Estimation Algorithms in

Example 2

Va Vb

-2

-1 -1

-2

4

8

2

2

Vb Vc

1

0 0

1

0

0

0

0

Vc Va

0

0 1

-1

0

3

4

8

ρ1 = 1, ρ2 = 1, ρ3 = 1

Weak Tree Agreement

Convergence point of TRW

l0

l1

f1(a) = 1 f1(b) = 1 f2(b) = 1 f2(c) = 0 f3(c) = 1 f3(a) = 1

f2(b) = 0 f2(c) = 1

Page 198: MAP Estimation Algorithms in

Obtaining the Labelling

Only solves the dual. Primal solutions?

Va Vb Vc

Vd Ve Vf

Vg Vh Vi

θ' = ∑i ρi θi

Fix the label of Va

Page 199: MAP Estimation Algorithms in

Obtaining the Labelling

Only solves the dual. Primal solutions?

Va Vb Vc

Vd Ve Vf

Vg Vh Vi

θ' = ∑i ρi θi

Fix the label of Vb

Continue in some fixed order

Meltzer et al., 2006
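The greedy rounding described above can be sketched as follows: fix variables in a fixed order, each to the label that minimizes its reparameterized unary plus the pairwise terms to already-fixed neighbours. All names and the toy numbers are illustrative assumptions, not the authors' code:

```python
import numpy as np

def round_primal(order, unary, pairwise, neighbours):
    """Greedy primal rounding from a reparameterization theta'.
    unary: dict var -> length-L cost list; pairwise: dict (a,b) -> LxL
    matrix with entry [i][k] = cost(a=i, b=k); neighbours: adjacency dict."""
    labels = {}
    for a in order:
        costs = np.array(unary[a], dtype=float)
        for b in neighbours[a]:
            if b in labels:                     # only fixed neighbours count
                if (a, b) in pairwise:
                    edge = np.asarray(pairwise[(a, b)], dtype=float)
                else:                           # stored in the other order
                    edge = np.asarray(pairwise[(b, a)], dtype=float).T
                costs += edge[:, labels[b]]
        labels[a] = int(np.argmin(costs))
    return labels

# Tiny two-node example with a Potts edge (illustrative numbers):
lab = round_primal(["a", "b"], {"a": [5, 2], "b": [2, 4]},
                   {("a", "b"): [[0, 1], [1, 0]]},
                   {"a": ["b"], "b": ["a"]})
print(lab)  # -> {'a': 1, 'b': 0}
```

This only yields the exact MAP when the dual has converged to tree agreement; in general it is a heuristic for extracting a primal labelling from the dual solution.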

Page 200: MAP Estimation Algorithms in

Outline

• Problem Formulation

• Reparameterization

• Belief Propagation

• Tree-reweighted Message Passing
  – Integer Programming Formulation
  – Linear Programming Relaxation and its Dual
  – Convergent Solution for Dual
  – Computational Issues and Theoretical Properties

Page 201: MAP Estimation Algorithms in

Computational Issues of TRW

• Speed-ups for some pairwise potentials

Basic Component is Belief Propagation

Felzenszwalb & Huttenlocher, 2004

• Memory requirements cut down by half

Kolmogorov, 2006

• Further speed-ups using monotonic chains

Kolmogorov, 2006

Page 202: MAP Estimation Algorithms in

Theoretical Properties of TRW

• Always converges, unlike BP

Kolmogorov, 2006

• Strong tree agreement implies exact MAP

Wainwright et al., 2001

• Optimal MAP for two-label submodular problems

Kolmogorov and Wainwright, 2005

θab;00 + θab;11 ≤ θab;01 + θab;10
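The submodularity condition is cheap to verify for a given energy; a sketch (function name assumed, costs given as 2x2 matrices per edge):

```python
def is_submodular(pairwise_costs):
    """Check theta_ab;00 + theta_ab;11 <= theta_ab;01 + theta_ab;10
    for every edge of a two-label problem."""
    return all(t[0][0] + t[1][1] <= t[0][1] + t[1][0] for t in pairwise_costs)

print(is_submodular([[[0, 1], [1, 0]]]))   # Potts edge: True
print(is_submodular([[[1, 0], [0, 1]]]))   # label-flipping edge: False
```

When every edge passes this test, TRW-S reaches the global optimum (and the same energies are also solvable exactly by graph cuts).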

Page 203: MAP Estimation Algorithms in

Results: Binary Segmentation (Szeliski et al., 2008)

Labels - {foreground, background}

Unary Potentials: -log(likelihood) using learnt fg/bg models

Pairwise Potentials: 0, if same labels

1 - exp(|Da - Db|), if different labels

Page 204: MAP Estimation Algorithms in

Results: Binary Segmentation

Labels - {foreground, background}

Unary Potentials: -log(likelihood) using learnt fg/bg models

Szeliski et al., 2008

Pairwise Potentials: 0, if same labels

1 - exp(|Da - Db|), if different labels

TRW

Page 205: MAP Estimation Algorithms in

Results: Binary Segmentation

Labels - {foreground, background}

Unary Potentials: -log(likelihood) using learnt fg/bg models

Szeliski et al., 2008

Belief Propagation

Pairwise Potentials: 0, if same labels

1 - exp(|Da - Db|), if different labels

Page 206: MAP Estimation Algorithms in

Results: Stereo Correspondence (Szeliski et al., 2008)

Labels - {disparities}

Unary Potentials: Similarity of pixel colours

Pairwise Potentials: 0, if same labels

1 - exp(|Da - Db|), if different labels

Page 207: MAP Estimation Algorithms in

Results (Szeliski et al., 2008)

Labels - {disparities}

Unary Potentials: Similarity of pixel colours

Pairwise Potentials: 0, if same labels

1 - exp(|Da - Db|), if different labels

TRW

Stereo Correspondence

Page 208: MAP Estimation Algorithms in

Results (Szeliski et al., 2008)

Labels - {disparities}

Unary Potentials: Similarity of pixel colours

Belief Propagation

Pairwise Potentials: 0, if same labels

1 - exp(|Da - Db|), if different labels

Stereo Correspondence

Page 209: MAP Estimation Algorithms in

Results: Non-submodular Problems (Kolmogorov, 2006)

BP TRW-S

30×30 grid, K50

BP TRW-S

BP outperforms TRW-S

Page 210: MAP Estimation Algorithms in

Summary

• Trees can be solved exactly - BP

• No guarantee of convergence otherwise - BP

• Strong Tree Agreement - TRW-S

• Submodular energies solved exactly - TRW-S

• TRW-S solves an LP relaxation of MAP estimation

• Loopier graphs give worse results Rother and Kolmogorov, 2006

Page 211: MAP Estimation Algorithms in

Related New(er) Work

• Solving the Dual

Globerson and Jaakkola, 2007

Komodakis, Paragios and Tziritas 2007

Weiss et al., 2006

Schlesinger and Giginyak, 2007

• Solving the Primal

Ravikumar, Agarwal and Wainwright, 2008

Page 212: MAP Estimation Algorithms in

Related New(er) Work

• More complex relaxations

Sontag and Jaakkola, 2007

Komodakis and Paragios, 2008

Kumar, Kolmogorov and Torr, 2007

Werner, 2008

Sontag et al., 2008

Kumar and Torr, 2008

Page 213: MAP Estimation Algorithms in

Questions on Part I ?

Code + Standard Data

http://vision.middlebury.edu/MRF