
Pushmeet Kohli Microsoft Research Cambridge

IbPRIA 2011

2:30 – 4:30: Labelling Problems, Graphical Models, Message Passing

4:30 – 5:00: Coffee break

5:00 – 7:00: Graph Cuts, Move Making Algorithms, Speed and Efficiency

Infer the values of hidden random variables

Geometry Estimation, Object Segmentation (labels: Building, Sky, Tree, Grass)

Pose Estimation (Orientation, Joint angles)

z ∈ (R,G,B)^n,  x ∈ {0,1}^n   (n = number of pixels)

n random variables: x1,…,xn

Joint Probability: P(x1,…,xn)

Conditional Probability: P(x1,…,xn | z1,…,zn)

Posterior Probability = Likelihood (data-dependent) × Prior (data-independent)

P(x|z) = P(z|x) P(x) / P(z) ∝ P(z|x) P(x)

MAP Solution:  x* = arg max_x P(x|z) = arg min_x E(x)

P(x|z) ∝ P(z|x) P(x)   (Posterior = Likelihood × Prior)

Likelihood:  P(z|x) = ∏_i P(zi|xi)

[Figure: foreground and background colour models plotted in the Red–Green plane]

P(z|x) = F_GMM(z,x)   (Gaussian Mixture Model likelihood)

MAP Solution (likelihood only):  x* = arg max_x P(z|x) = arg max_x ∏_i P(zi|xi)

[Figure: log-likelihood ratio log P(zi|xi=0)/P(zi|xi=1)]

P(x|z) ∝ P(z|x) P(x)   (Posterior = Likelihood × Prior)

Prior:  P(x) = ∏_{i,j} f(xi,xj)

Encourages consistency between labellings of adjacent pixels.

P(x) = ∏_{i,j ∈ N} fij(xi,xj) = ∏_{i,j ∈ N} exp{-|xi-xj|}   ("MRF Ising prior")

E(x,z,w) = ∑_i θi(xi,zi) + w ∑_{i,j} θij(xi,xj,zi,zj)

P(x|z) = ∏_i P(zi|xi) ∏_{i,j} P(xi,xj)

Taking the negative log of the posterior probability gives the energy:

E(x,z,w) = ∑_i θi(xi,zi) + w ∑_{i,j} θij(xi,xj)

MRF Solution (Ising prior)

Likelihood Solution

Image
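To make the energy concrete, here is a minimal C++ sketch that evaluates E(x,z,w) with an Ising pairwise term on a 4-connected grid; the row-major pixel layout and the two unary cost arrays are illustrative assumptions, not part of the original formulation.

#include <cstdlib>
#include <vector>

// E(x,z,w) = sum_i theta_i(x_i,z_i) + w * sum_{(i,j) in N4} |x_i - x_j|
double energy(const std::vector<int>& x,          // binary labels, row-major W*H
              const std::vector<double>& unary0,  // theta_i(0, z_i)
              const std::vector<double>& unary1,  // theta_i(1, z_i)
              int W, int H, double w)
{
    double E = 0.0;
    for (int i = 0; i < W * H; ++i)               // likelihood (unary) terms
        E += (x[i] == 0) ? unary0[i] : unary1[i];
    for (int r = 0; r < H; ++r)                   // Ising prior over N4 pairs
        for (int c = 0; c < W; ++c) {
            int i = r * W + c;
            if (c + 1 < W) E += w * std::abs(x[i] - x[i + 1]);  // right neighbour
            if (r + 1 < H) E += w * std::abs(x[i] - x[i + W]);  // bottom neighbour
        }
    return E;
}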

With a contrast-sensitive prior the pairwise terms also depend on the data:

P(x|z) = ∏_i P(zi|xi) ∏_{i,j} P(xi,xj,zi,zj)

-ve log:  E(x,z,w) = ∑_i θi(xi,zi) + w ∑_{i,j} θij(xi,xj,zi,zj)

[Boykov and Jolly ’01] [Blake et al. ’04] [Rother, Kolmogorov and Blake ’04]

[Figure: pairwise cost θij decays with the colour difference ||zi-zj||²]

E(x,z,w) = ∑_i θi(xi,zi) + w ∑_{i,j} θij(xi,xj,zi,zj)

Pairwise Cost

[Boykov and Jolly ’01] [Blake et al. ’04] [Rother, Kolmogorov and Blake ’04]

Global Minimum (x*)

Demo

Factor graphs & Order of a posterior distribution (or energy function)

Write probability distributions as a graphical model:

[Factor graph over unobserved variables x1,…,x5]

P(x) ∝ exp{-E(x)},   E(x) = θ(x1,x2,x3) + θ(x2,x4) + θ(x3,x4) + θ(x3,x5)   ("4 factors")

Edges connect variables that appear in the same factor.

Gibbs distribution

References:

- Pattern Recognition and Machine Learning [Bishop ’08, book, chapter 8]
- several lectures at the Machine Learning Summer School 2009 (see video lectures)

Definition “Order”: The arity (number of variables) of the largest factor

E(x) = θ(x1,x2,x3) + θ(x2,x4) + θ(x3,x4) + θ(x3,x5)

[Factor graph with order 3: one factor of arity 3, three factors of arity 2]

4-connected; pairwise MRF (order 2):   E(x) = ∑_{i,j ∈ N4} θij(xi,xj)

higher(8)-connected; pairwise MRF (order 2):   E(x) = ∑_{i,j ∈ N8} θij(xi,xj)

Higher-order RF (order n):   E(x) = ∑_{i,j ∈ N4} θij(xi,xj) + θ(x1,…,xn)

"Pairwise energy" vs. "higher-order energy"

1. Image segmentation

2. Stereo

3. Semantic Segmentation


P(x|z) ∝ exp{-E(x)}

E(x) = ∑_i θi(xi,zi) + ∑_{i,j ∈ N4} θij(xi,xj)

[Factor graph: observed variables zi; unobserved (latent) variables xi, xj]

E(x,w) = ∑_i |T(w)i - xi| + ∑_{i,j ∈ N4} θij(xi,xj)

E(x,w): {0,1}^n × {Exemplar} → R

Large set of example segmentations: T(1), T(2), T(3), …  (up to 2,000,000 exemplars)

Goal: detect and segment the test image.

The unary term ∑ |T(w)i - xi| is the Hamming distance to the transformed exemplar.

w = (Template, Position, Scale, Orientation)

[Kohli et al. IJCV 08, Lempitsky et al. ECCV ’08]

UIUC dataset; 98.8% accuracy

Training Image → Test Image, Test Image (60% Noise) → Result

Pairwise Energy P(x): minimized using st-mincut or max-product message passing

Higher Order Structure not Preserved

Minimize:  E(x) = P(x) + ∑_c hc(Xc)

Where hc: {0,1}^|c| → R is a higher order function over a patch c (|c| = 10×10 = 100).

It assigns a cost to each of the 2^100 possible labellings!

Exploit function structure to transform it to a pairwise function.

[Figure: patterns p1, p2, p3]

Rother and Kohli, MSR Tech Report [2010]

Test Image Test Image (60% Noise)

Training Image

Pairwise Result

Higher-Order Result

Learned Patterns

Rother and Kohli, MSR Tech Report [2010]

1. Image segmentation

2. Stereo

3. Semantic Segmentation

[Figure: disparity labels d = 0 … d = 4; ground truth depth; image – left (a); image – right (b)]

• Images rectified  • Ignore occlusion for now

Energy:  E(d): {0,…,D-1}^n → R,   labels: d (depth/shift)

E(d) = ∑_i θi(di) + ∑_{i,j ∈ N4} θij(di,dj)

Unary:  θi(di) = |l_i - r_(i-di)|   "SAD; Sum of Absolute Differences" (many others possible, NCC, …)

Pairwise:  θij(di,dj) = g(|di-dj|)

[Figure: left and right scanlines; pixel i in the left image matches pixel i-di in the right image (e.g. di = 2)]

[Olga Veksler PhD thesis, Daniel Cremers et al.]

θij(di,dj) = g(|di-dj|)

[Figure: pairwise cost g against |di-dj|: without truncation the global minimum is attainable; with truncation (discontinuity preserving potentials [Blake & Zisserman ’83, ’87]) the optimization is NP-hard]

[Olga Veksler PhD thesis]

(Potts model) Smooth disparities

Left image: see http://vision.middlebury.edu/stereo/

No MRF: pixel independent (WTA, winner takes all)

No horizontal links: efficient, since the chains are independent

Ground truth vs. pairwise MRF [Boykov et al. ’01]

1. Image segmentation

2. Stereo

3. Semantic/Object Segmentation

Object Segmentation

Labels: water, boat, chair, tree, road

Organ Detection

Assign a label to each pixel (or voxel in 3D)

(1) No relationships between pixels

(2) Pixel groups, no relationships between groups

(3) Pairwise relationships between pixel groups

(4) Pairwise relationships between pixels

(5) Higher order relationships between pixels

Pixel classifier: given the image window (W) around the pixel to be classified (P), assign from (P,W) the cost for label "cow".

Boosting [Shotton et al. 2006], Random Decision Forests

Image (MSRC-21)

Pairwise CRF

Ground Truth

Higher order CRF

Sheep

Grass

[Unwrap Mosaic, Rav-Acha, Kohli, Fitzgibbon, Rother, Siggraph ’08]

E(x) = ∑_i fi(xi) + ∑_{ij} gij(xi,xj) + ∑_c hc(xc)

Unary + Pairwise + Higher Order

How to minimize E(x)?

x takes discrete values

[Figure: space of problems, including CSP and MAXCUT (NP-hard)]

Which functions are exactly solvable?

Approximate solutions of NP-hard problems

Scalability and Efficiency

Which functions are exactly solvable? Boros & Hammer [1965], Kolmogorov & Zabih [ECCV 2002, PAMI 2004], Ishikawa [PAMI 2003], Schlesinger [EMMCVPR 2007], Kohli, Kumar & Torr [CVPR 2007, PAMI 2008], Ramalingam, Kohli, Alahari & Torr [CVPR 2008], Kohli, Ladicky & Torr [CVPR 2008, IJCV 2009], Zivny & Jeavons [CP 2008]

Approximate solutions of NP-hard problems: Schlesinger [1976], Kleinberg & Tardos [FOCS 99], Chekuri et al. [2001], Boykov et al. [PAMI 2001], Wainwright et al. [NIPS 2001], Werner [PAMI 2007], Komodakis [PAMI 2005], Lempitsky et al. [ICCV 2007], Kumar et al. [NIPS 2007], Kumar et al. [ICML 2008], Sontag & Jaakkola [NIPS 2007], Kohli et al. [ICML 2008], Kohli et al. [CVPR 2008, IJCV 2009], Rother et al. [2009]

Scalability and Efficiency: Kohli & Torr [ICCV 2005, PAMI 2007], Juan & Boykov [CVPR 2006], Alahari, Kohli & Torr [CVPR 2008], Delong & Boykov [CVPR 2008]

Dynamic Programming (DP)

Belief Propagation (BP)

Tree-Reweighted (TRW), Max-sum Diffusion

Graph Cut (GC)

Branch & Bound

Relaxation methods

Combinatorial Algorithms

Message passing

Convex Optimization (Linear Programming)

[Figure: space of problems, with the tree structured subset of CSP highlighted; MAXCUT NP-hard]

Iteratively pass messages between nodes...

Message update rule? Belief propagation (BP) or tree-reweighted belief propagation (TRW)

Max-product (minimizing an energy function, or MAP estimation)

x* = arg max_x P(x) = arg min_x E(x),   E(x) = -ln P(x) + K

Sum-product (computing marginal probabilities)

P(xi) = ∑_{x_V∖{i}} P(x)

E(x) = ∑_i fi(xi) + ∑_{ij} gij(xi,xj)

Chain p – q – r with three labels L1, L2, L3.

Unary costs at p: f(xp) = (5, 1, 2);  Potts model: gpq = 2 if xp ≠ xq

M_p->q(L1) = min_xp [ f(xp) + gpq(xp, L1) ] = min(5+0, 1+2, 2+2) = 3

M_p->q(L1, L2, L3) = (3, 1, 2)
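The same computation in a few lines of C++ (the three labels, unary costs and Potts weight are taken from the worked example above):

#include <algorithm>
#include <array>

// M_{p->q}(xq) = min_{xp} [ f(xp) + g_pq(xp,xq) ],  g_pq = potts if xp != xq
std::array<int, 3> message_p_to_q(const std::array<int, 3>& f, int potts)
{
    std::array<int, 3> M{};
    for (int xq = 0; xq < 3; ++xq) {
        int best = f[xq];                       // xp == xq: no pairwise cost
        for (int xp = 0; xp < 3; ++xp)
            if (xp != xq) best = std::min(best, f[xp] + potts);
        M[xq] = best;
    }
    return M;
}
// message_p_to_q({5, 1, 2}, 2) returns (3, 1, 2), matching the example.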

The message is then passed on along the chain:

M_q->r(xr) = min_xq [ M_p->q(xq) + f(xq) + gqr(xq,xr) ]

At the root r, the min-marginal is  min_x { E(x) | xr = j } = M_q->r(j) + f(j).

Select the minimum cost label, then trace back the path to get the minimum cost labelling.

Dynamic programming: global minimum in linear time.

BP: inward pass (dynamic programming), then an outward pass, which gives min-marginals.

[Figure: chain p – q – r processed as a tree, from the leaves to the root and back]

Inward pass (leaf to root): each node p sends  M_p->q(xq) = min_xp [ θp(xp) + θpq(xp,xq) + ∑ incoming messages ]  towards the root.

Outward pass (root to leaf): messages are sent back in the opposite direction.

Min-marginal for node q and label j:  min_x { E(x) | xq = j }

Pass messages using the same rules.

Empirically often works quite well; may not converge.

Gives "pseudo" min-marginals, and a local minimum in the "tree neighborhood" [Weiss & Freeman ’01], [Wainwright et al. ’04], under the assumptions that BP has converged and there are no ties in the pseudo min-marginals.

Naïve implementation: O(K²) per message [K = number of labels]; often improvable to O(K) for Potts interactions, truncated linear, truncated quadratic, …

Provides a lower bound:  Lower Bound ≤ E(x*) ≤ E(x’)

Slightly different messages; tries to solve an LP relaxation of the MAP problem. [Kolmogorov, Wainwright et al.]

Related to Dynamic Programming

Exact on Trees (Hyper-trees)

Connections to Linear programming

Popular methods: BP, TRW-S, Diffusion

Schedules have a significant effect on the solution

For more details: [Kolmogorov ‘06, Wainwright et al. 05]

[Figure: space of problems, with tree structured problems and submodular functions inside CSP; pairwise cases solvable in O(n³), general submodular in O(n⁶); MAXCUT and the general segmentation energy are NP-hard.  n = number of variables]

A pseudo-boolean function f: {0,1}^n → R is submodular if

f(A) + f(B) ≥ f(A∨B) + f(A∧B)   for all A,B ∈ {0,1}^n   (∨ = element-wise OR, ∧ = element-wise AND)

Example: n = 2, A = [1,0], B = [0,1]:   f([1,0]) + f([0,1]) ≥ f([1,1]) + f([0,0])

Property: the sum of submodular functions is submodular.

The binary image segmentation energy E(x) = ∑_i ci xi + ∑_{i,j} dij |xi-xj| is submodular.
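A direct C++ translation of the pairwise case of this definition (taking A and B over the four assignments of two binary variables reduces it to one inequality):

// theta[a][b] = f evaluated at (xi = a, xj = b); submodular iff
// theta(0,1) + theta(1,0) >= theta(0,0) + theta(1,1)
bool pairwise_submodular(const double theta[2][2])
{
    return theta[0][1] + theta[1][0] >= theta[0][0] + theta[1][1];
}

The Ising term dij |xi-xj| passes this test: theta(0,1) + theta(1,0) = 2·dij ≥ 0 = theta(0,0) + theta(1,1).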

Discrete Analogues of Concave Functions [Lovasz, ’83]

Widely applied in Operations Research

Applications in Machine Learning

MAP Inference in Markov Random Fields

Clustering [Narasimhan, Jojic & Bilmes, NIPS 2005]

Structure Learning [Narasimhan & Bilmes, NIPS 2006]

Maximizing the spread of influence through a social network [Kempe, Kleinberg & Tardos, KDD 2003]

Polynomial time algorithms. Ellipsoid algorithm: [Grotschel, Lovasz & Schrijver ’81]

First strongly polynomial algorithms: [Iwata et al. ’00] [A. Schrijver ’00]

Current best: O(n⁵ Q + n⁶) [Q is the function evaluation time] [Orlin ’07]

Symmetric functions, E(x) = E(1-x): can be minimized in O(n³)

Minimizing pairwise submodular functions: can be transformed to st-mincut/max-flow [Hammer, 1965]; very low empirical running time, ~O(n)

E(x) = ∑_i fi(xi) + ∑_{ij} gij(xi,xj)

[Figure: directed graph with Source, Sink, nodes v1, v2 and edge capacities 2, 5, 9, 4, 1, 2]

Graph (V, E, C)

Vertices V = {v1, v2 … vn}

Edges E = {(v1, v2) …}

Costs C = {c(1, 2) …}


What is a st-cut?

An st-cut (S,T) divides the nodes between source and sink.

What is the cost of a st-cut?

Sum of cost of all edges going from S to T

Example: 5 + 1 + 9 = 15


What is the st-mincut?

st-cut with the minimum cost

[Figure: the st-mincut of the example graph]

st-mincut cost = 2 + 2 + 4 = 8

Construct a graph such that:

1. Any st-cut corresponds to an assignment of x

2. The cost of the cut is equal to the energy of x : E(x)

[Figure: the st-mincut on the constructed graph yields the minimizing assignment x and the minimum of E(x)]

[Hammer, 1965] [Kolmogorov and Zabih, 2002]

E(x) = ∑_i θi(xi) + ∑_{i,j} θij(xi,xj)   with   θij(0,1) + θij(1,0) ≥ θij(0,0) + θij(1,1)   for all ij

Equivalent (transformable):   E(x) = ∑_i ci xi + ∑_{i,j} cij xi(1-xj),   cij ≥ 0

Source (0), Sink (1), nodes a1 and a2. Each term of the energy adds an edge with the corresponding capacity:

E(a1,a2) = 2a1

E(a1,a2) = 2a1 + 5ā1

E(a1,a2) = 2a1 + 5ā1 + 9a2 + 4ā2

E(a1,a2) = 2a1 + 5ā1 + 9a2 + 4ā2 + 2a1ā2

E(a1,a2) = 2a1 + 5ā1 + 9a2 + 4ā2 + 2a1ā2 + ā1a2


Example cut:  a1 = 1, a2 = 1:   E(1,1) = 11 = cost of the cut

st-mincut:  a1 = 1, a2 = 0:   E(1,0) = 8 = st-mincut cost


Solve the dual maximum flow problem: compute the maximum flow between Source and Sink s.t.

Edges: Flow ≤ Capacity

Nodes: Flow in = Flow out

(assuming non-negative capacities)

Min-cut\Max-flow Theorem: in every network, the maximum flow equals the cost of the st-mincut.

Augmenting Path Based Algorithms

1. Find a path from source to sink with positive capacity

2. Push the maximum possible flow through this path

3. Repeat until no path can be found

On the example graph (capacities 2, 5, 9, 4, 2, 1), starting from Flow = 0:

Flow = 0 + 2 = 2   (the path through the capacity-2 and capacity-5 edges; residual capacities 0 and 3)

Flow = 2 + 4 = 6   (the path through the capacity-9 and capacity-4 edges; residual capacities 5 and 0)

Flow = 6 + 2 = 8   (the remaining path goes through the pairwise edge; after this push no augmenting path remains)
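A self-contained C++ sketch of this scheme, using BFS to pick augmenting paths (the Edmonds-Karp variant); the dense capacity-matrix representation is an illustrative assumption:

#include <algorithm>
#include <climits>
#include <queue>
#include <vector>

// Repeatedly find an augmenting path by BFS and push the bottleneck
// capacity along it; terminates when no source-to-sink path remains.
int maxflow(std::vector<std::vector<int>> cap, int s, int t)
{
    int n = (int)cap.size(), flow = 0;
    while (true) {
        std::vector<int> parent(n, -1);
        parent[s] = s;
        std::queue<int> q;
        q.push(s);
        while (!q.empty() && parent[t] == -1) {  // 1. find a positive-capacity path
            int u = q.front(); q.pop();
            for (int v = 0; v < n; ++v)
                if (parent[v] == -1 && cap[u][v] > 0) { parent[v] = u; q.push(v); }
        }
        if (parent[t] == -1) return flow;        // no augmenting path left
        int push = INT_MAX;
        for (int v = t; v != s; v = parent[v])   // 2. bottleneck capacity
            push = std::min(push, cap[parent[v]][v]);
        for (int v = t; v != s; v = parent[v]) { // update the residual graph
            cap[parent[v]][v] -= push;
            cap[v][parent[v]] += push;
        }
        flow += push;                            // 3. repeat
    }
}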

Back to the energy:  E(a1,a2) = 2a1 + 5ā1 + 9a2 + 4ā2 + 2a1ā2 + ā1a2   (Source (0), Sink (1); capacities 2, 5, 9, 4, 2, 1)

2a1 + 5ā1 = 2(a1+ā1) + 3ā1 = 2 + 3ā1

E(a1,a2) = 2 + 3ā1 + 9a2 + 4ā2 + 2a1ā2 + ā1a2   (capacities now 0, 3, 9, 4, 2, 1)

9a2 + 4ā2 = 4(a2+ā2) + 5a2 = 4 + 5a2

E(a1,a2) = 2 + 3ā1 + 5a2 + 4 + 2a1ā2 + ā1a2   (capacities now 0, 3, 5, 0, 2, 1)

E(a1,a2) = 6 + 3ā1 + 5a2 + 2a1ā2 + ā1a2

3ā1 + 5a2 + 2a1ā2 = 2(ā1 + a2 + a1ā2) + ā1 + 3a2 = 2(1 + ā1a2) + ā1 + 3a2

The identity F1 = ā1 + a2 + a1ā2 = 1 + ā1a2 = F2 holds for every assignment:

a1 a2 | F1 F2
 0  0 |  1  1
 0  1 |  2  2
 1  0 |  1  1
 1  1 |  1  1

E(a1,a2) = 8 + ā1 + 3a2 + 3ā1a2   (capacities now 0, 1, 3, 0, 0, 3)

No more augmenting paths possible.

E(a1,a2) = 8 + ā1 + 3a2 + 3ā1a2

Total flow = a bound on the energy of the optimal solution; the residual graph has positive coefficients.

Tight bound → inference of the optimal solution becomes trivial.

a1 = 1, a2 = 0:   E(1,0) = 8 = st-mincut cost

[Slide credit: Andrew Goldberg]

[Table: worst-case complexities of Augmenting Path and Push-Relabel algorithms; n: #nodes, m: #edges, U: maximum edge weight. All algorithms assume non-negative edge weights.]

Ford-Fulkerson: choose any augmenting path.

[Example: source→a1, source→a2, a1→sink, a2→sink, each with capacity 1000; a cross edge between a1 and a2 with capacity 1]

Good augmenting paths: the two direct source → node → sink paths; two augmentations of 1000 suffice.

Bad augmenting path: the path through the capacity-1 cross edge; only 1 unit can be pushed (leaving capacities 999, …), and alternating such paths means we will have to perform 2000 augmentations!

Worst case complexity: O(m × Total_Flow)   (pseudo-polynomial bound: depends on the flow)

Dinic: choose the shortest augmenting path.

Worst case complexity: O(m n²)   (n: #nodes, m: #edges)

Specialized algorithms for vision problems: grid graphs, low connectivity (m ~ O(n))

Dual search tree augmenting path algorithm [Boykov and Kolmogorov PAMI 2004]

• Finds approximate shortest augmenting paths efficiently

• High worst-case time complexity

• Empirically outperforms other algorithms on vision problems

Efficient code available on the web

http://www.adastral.ucl.ac.uk/~vladkolm/software.html

E(x) = ∑_i ci xi + ∑_{i,j} dij |xi-xj|

How to minimize E(x)?   Global minimum:  x* = arg min_x E(x)

E: {0,1}^n → R,   0 → fg, 1 → bg   (n = number of pixels)

[Graph: Source (0), Sink (1); terminal edges fgCost(a1), bgCost(a1), fgCost(a2), bgCost(a2); pairwise edge cost(p,q); resulting labels a1 = bg, a2 = fg]

Graph *g;

for (all pixels p)
{
    /* Add a node to the graph */
    nodeID(p) = g->add_node();

    /* Set cost of terminal edges */
    set_weights(nodeID(p), fgCost(p), bgCost(p));
}

for (all adjacent pixels p,q)
    add_weights(nodeID(p), nodeID(q), cost(p,q));

g->compute_maxflow();

label_p = g->is_connected_to_source(nodeID(p));
// the label of pixel p (0 or 1)
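One way to realise the is_connected_to_source step above in C++: after max-flow, BFS the residual graph from the source; the nodes still reachable form the source side of the st-mincut. A sketch, assuming a max-flow routine like the earlier one is adapted to expose its final residual capacities:

#include <queue>
#include <vector>

// label[i] = 0 (source side, fg) if i is reachable from s in the residual
// graph, else 1 (sink side, bg).
std::vector<int> cut_labels(const std::vector<std::vector<int>>& residual, int s)
{
    int n = (int)residual.size();
    std::vector<int> label(n, 1);
    std::queue<int> q;
    q.push(s);
    label[s] = 0;
    while (!q.empty()) {
        int u = q.front(); q.pop();
        for (int v = 0; v < n; ++v)
            if (label[v] == 1 && residual[u][v] > 0) { label[v] = 0; q.push(v); }
    }
    return label;
}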

Mixed (Real-Integer) Problems

Higher Order Energy Functions

Multi-label Problems

Ordered Labels ▪ Stereo (depth labels)

Unordered Labels ▪ Object segmentation ( ‘car’, `road’, `person’)


x – binary image segmentation (xi ∈ {0,1})

ω – non-local parameter (lives in some large set Ω): ω = (Template, Position, Scale, Orientation)

E(x,ω) = C(ω) + ∑_i θi(ω,xi) + ∑_{i,j} θij(ω,xi,xj)
(constant + unary potentials + pairwise potentials, with θij ≥ 0)

We have seen several of them in the intro …

{x*,ω*} = arg min_{x,ω} E(x,ω)

• Standard "graph cut" energy if ω is fixed

[Kohli et al, 06, 08] [Lempitsky et al, 08]

Local Method: gradient descent over ω

ω* = arg min_ω  min_x E(x,ω)   (the inner minimization over x is submodular)

Dynamic Graph Cuts: E(x,ω1) and E(x,ω2) are similar energy functions, so computation can be reused (a 15–20× speedup!)

[Kohli et al, 06, 08]

Global Method: Branch and Mincut [Lempitsky et al, 08]

Produces the globally optimal solution; exhaustively explores all ω in Ω in the worst case (30,000,000 shapes).

Mixed (Real-Integer) Problems

Higher Order Energy Functions

Multi-label Problems

Ordered Labels ▪ Stereo (depth labels)

Unordered Labels ▪ Object segmentation ( ‘car’, `road’, `person’)

E(x) = ∑_i ci xi + ∑_{i,j} dij |xi-xj|

E: {0,1}^n → R,   0 → fg, 1 → bg   (n = number of pixels)

[Boykov and Jolly ’01] [Blake et al. ’04] [Rother, Kolmogorov and Blake ’04]

Image, Unary Cost, Segmentation; Patch Dictionary (Tree) with costs C1 … Cmax

h(Xp) = { C1 if xi = 0 for all i ∈ p;  Cmax otherwise }

E(x) = ∑_i ci xi + ∑_{i,j} dij |xi-xj| + ∑_p hp(Xp)

E: {0,1}^n → R,   0 → fg, 1 → bg   (n = number of pixels)

Image, Pairwise Segmentation, Final Segmentation

[Kohli et al. ’07]

Higher Order Submodular Functions → (exact transformation?) → Pairwise Submodular Function → st-mincut (S/T cut)

Billionnet and Minoux [DAM 1985], Kolmogorov & Zabih [PAMI 2004], Freedman & Drineas [CVPR 2005], Kohli Kumar Torr [CVPR 2007, PAMI 2008], Kohli Ladicky Torr [CVPR 2008, IJCV 2009], Ramalingam Kohli Alahari Torr [CVPR 2008], Zivny et al. [CP 2008]

Identified transformable families of higher order functions such that:

1. A constant or polynomial number of auxiliary variables (a) is added

2. All pairwise functions (g) are submodular

Example:  H(X) = F(∑ xi)  with F concave

[Figure: concave F plotted against ∑ xi]   [Kohli et al. ’08]

Simple example using auxiliary variables:

f(x) = { 0 if all xi = 0;  C1 otherwise }

min_{x ∈ {0,1}^n} f(x)  =  min_{x, a ∈ {0,1}} [ C1·a + C1·∑ ā·xi ]

(higher order submodular function → quadratic submodular function)

∑xi < 1  →  a = 0 (ā = 1):  f(x) = 0

∑xi ≥ 1  →  a = 1 (ā = 0):  f(x) = C1

[Figure: cost against ∑xi = 1, 2, 3: the constant C1 and the line C1·∑xi]   [Kohli et al. ’08]
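A tiny C++ check of this construction; C1 and the labels come from the example above, while the function name is illustrative:

#include <algorithm>
#include <vector>

// min over a in {0,1} of  C1*a + C1*(1-a)*sum_i(xi)  equals f(x):
// 0 when all xi = 0, C1 otherwise.
int f_via_auxiliary(const std::vector<int>& x, int C1)
{
    int sum = 0;
    for (int xi : x) sum += xi;
    int cost_a0 = C1 * sum;              // a = 0 pays C1 per active xi
    int cost_a1 = C1;                    // a = 1 pays C1 once
    return std::min(cost_a0, cost_a1);
}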

min_x f(x)  =  min_{x, a ∈ {0,1}} [ C1·a + C1·∑ ā·xi ]

a = 1 selects the constant C1 and a = 0 selects the line C1·∑xi: the lower envelope of concave functions is concave.   [Kohli et al. ’08]

More generally:

min_x f(x)  =  min_{x, a ∈ {0,1}} [ f1(x)·a + f2(x)·ā ]

[Figure: two concave functions f1(x), f2(x) of ∑xi; a = 1 / a = 0 selects between them]

The lower envelope of concave functions is concave.   [Kohli et al. ’08]

min_x f(x)  =  min_{x ∈ {0,1}} max_{a ∈ {0,1}} [ f1(x)·a + f2(x)·ā ]

[Figure: the upper envelope of linear functions is convex]

A very hard problem!!!   [Kohli and Kumar, CVPR 2010]

BINARY SEGMENTATION with SIZE/VOLUME PRIORS: a prior on the size of the object (say n/2) versus background.   [Kohli and Kumar, CVPR 2010]

SILHOUETTE CONSTRAINTS for 3D RECONSTRUCTION: rays must touch the silhouette at least once.   [Sinha et al. ’05, Cremers et al. ’08] [Woodford et al. 2009]

Mixed (Real-Integer) Problems

Higher Order Energy Functions

Multi-label Problems

Ordered Labels ▪ Stereo (depth labels)

Unordered Labels ▪ Object segmentation ( ‘car’, `road’, `person’)

Exact Transformation to QPBF / Move making algorithms

min_y E(y) = ∑_i fi(yi) + ∑_{i,j} gij(yi,yj),   y ∈ Labels L = {l1, l2, …, lk}

[Roy and Cox ’98] [Ishikawa ’03] [Schlesinger & Flach ’06] [Ramalingam, Alahari, Kohli, and Torr ’08]

So what is the problem?

Transform the multi-label problem Em(y1,…,yn), yi ∈ L = {l1, l2, …, lk}, into a binary label problem Eb(x1,…,xm), xi ∈ {0,1}, such that (with Y and X the sets of feasible solutions):

1. There is a one-one encoding function T: X → Y

2. arg min Em(y) = T(arg min Eb(x))

• Popular encoding scheme [Roy and Cox ’98, Ishikawa ’03, Schlesinger & Flach ’06]:   # Nodes = n·k,  # Edges = m·k²

Ishikawa’s result:  E(y) = ∑_i θi(yi) + ∑_{i,j} θij(yi,yj),  y ∈ L = {l1, l2, …, lk}, is exactly solvable when θij(yi,yj) = g(|yi-yj|) with g a convex function.

[Figure: convex g(|yi-yj|)]

Schlesinger & Flach ’06:  E(y) = ∑_i θi(yi) + ∑_{i,j} θij(yi,yj),  y ∈ L = {l1, l2, …, lk}, is exactly solvable when

θij(li+1, lj) + θij(li, lj+1) ≥ θij(li, lj) + θij(li+1, lj+1)

Image MAP Solution

Scanline algorithm

[Roy and Cox, 98]

Applicability: cannot handle truncated costs (non-robust).

Computational cost: very high. Problem size = |Variables| × |Labels|; gray level image denoising of a 1 Mpixel image gives ~2.5 × 10⁸ graph nodes.

θij(yi,yj) = g(|yi-yj|): discontinuity preserving potentials [Blake & Zisserman ’83, ’87]

Transformation    | Unary Potentials | Pairwise Potentials  | Complexity
Ishikawa [03]     | Arbitrary        | Convex and Symmetric | T(nk, mk²)
Schlesinger [06]  | Arbitrary        | Submodular           | T(nk, mk²)
Hochbaum [01]     | Linear           | Convex and Symmetric | T(n, m) + n log k
Hochbaum [01]     | Convex           | Convex and Symmetric | O(mn log n log nk)

Other "less known" algorithms exist. T(a,b) = complexity of maxflow with a nodes and b edges.

Exact Transformation to QPBF / Move making algorithms

min_y E(y) = ∑_i fi(yi) + ∑_{i,j} gij(yi,yj),   y ∈ Labels L = {l1, l2, …, lk}

[Boykov , Veksler and Zabih 2001] [Woodford, Fitzgibbon, Reid, Torr, 2008]

[Lempitsky, Rother, Blake, 2008] [Veksler, 2008] [Kohli, Ladicky, Torr 2008]

[Figure: energy over the solution space, showing the current solution, the search neighbourhood and the optimal move]

Key property of the move space: a bigger move space means

• Better solutions

• Finding the optimal move is hard

Minimizing Pairwise Functions [Boykov, Veksler and Zabih, PAMI 2001]

• Series of locally optimal moves

• Each move reduces the energy

• Optimal move by minimizing a submodular function

Space of solutions (x): L^n;  move space (t): 2^n   (n: number of variables, L: number of labels)

Kohli et al. ’07, ’08, ’09: extended to minimize higher order functions

Minimize over move variables t:

x = t·x1 + (1-t)·x2   (new solution, from the current solution x1 and a second solution x2)

Em(t) = E(t·x1 + (1-t)·x2)

For certain x1 and x2, the move energy is a submodular QPBF.   [Boykov, Veksler and Zabih 2001]

Swap move: variables labeled α, β can swap their labels.   [Boykov, Veksler and Zabih 2001]

[Example: Sky, House, Tree, Ground; swap Sky, House]

The move energy is submodular if the unary potentials are arbitrary and the pairwise potentials are a semi-metric:

θij(la,lb) ≥ 0,   θij(la,lb) = 0 ⇔ a = b

Examples: Potts model, truncated convex. (A quick check of this condition follows below.)
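A small C++ check of the semi-metric condition for a k-label pairwise cost table; the table representation is an assumption for illustration:

#include <vector>

// Semi-metric: theta(a,b) >= 0, and theta(a,b) = 0 exactly when a = b.
bool is_semimetric(const std::vector<std::vector<double>>& theta)
{
    int k = (int)theta.size();
    for (int a = 0; a < k; ++a)
        for (int b = 0; b < k; ++b) {
            if (theta[a][b] < 0) return false;             // non-negativity
            if ((a == b) != (theta[a][b] == 0)) return false; // zero iff a = b
        }
    return true;
}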

Expansion move: variables take label α or retain their current label.   [Boykov, Veksler and Zabih 2001]

[Example: Sky, House, Tree, Ground; initialize with Tree, then expand Ground, expand House, expand Sky]

The move energy is submodular if the unary potentials are arbitrary and the pairwise potentials are a metric:

θij(la,lb) + θij(lb,lc) ≥ θij(la,lc)   (semi-metric + triangle inequality)

Examples: Potts model, truncated linear. Cannot solve truncated quadratic.
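A hedged C++ sketch of one α-expansion sweep; solve_binary_move is a hypothetical stand-in for the graph-cut solve of the binary move energy, injected as a parameter so the sketch stays self-contained:

#include <cstddef>
#include <functional>
#include <vector>

// One sweep of alpha-expansion: for each label alpha, every variable may
// either switch to alpha or keep its current label. The optimal such move
// is found by st-mincut on the move energy (submodular if theta is a metric).
using BinaryMoveSolver =
    std::function<std::vector<int>(const std::vector<int>&, int)>;

void expansion_sweep(std::vector<int>& labels, int num_labels,
                     const BinaryMoveSolver& solve_binary_move)
{
    for (int alpha = 0; alpha < num_labels; ++alpha) {
        // t[i] = 1 means "variable i switches to label alpha"
        std::vector<int> t = solve_binary_move(labels, alpha);
        for (std::size_t i = 0; i < labels.size(); ++i)
            if (t[i]) labels[i] = alpha;     // apply the optimal move
    }
}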

Expansion and swap can be derived as a primal-dual scheme: one obtains a solution of the dual problem, which is a lower bound on the energy of the solution. This gives a weak guarantee on the solution:   [Komodakis et al 05, 07]

E(x) ≤ 2 (dmax / dmin) E(x*)

where dmax and dmin are the maximum and minimum of the distance function g in θij(li,lj) = g(|li-lj|).

Move Type  | First Solution | Second Solution | Guarantee
Expansion  | Old solution   | All alpha       | Metric
Fusion     | Any solution   | Any solution    |

Minimize over move variables t:   x = t·x1 + (1-t)·x2   (new solution, from a first solution x1 and a second solution x2)

Move functions can be non-submodular!!  x1, x2 can be continuous.

[Figure: fusing solutions x1 and x2 into x]

Optical flow example: the final solution is fused from the solution of method 1 and the solution of method 2.

[Woodford, Fitzgibbon, Reid, Torr, 2008] [Lempitsky, Rother, Blake, 2008]

Range moves: move variables can be multi-label,   x = (t==1)·x1 + (t==2)·x2 + … + (t==k)·xk

The optimal move is found using the Ishikawa transform.

Useful for minimizing energies with truncated convex pairwise potentials:   θij(yi,yj) = min(|yi-yj|², T)

[Kumar and Torr, 2008] [Veksler, 2008]
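The truncated convex potential above in a few lines of C++:

#include <algorithm>

// theta_ij(yi, yj) = min(|yi - yj|^2, T): quadratic up to the truncation T
int truncated_quadratic(int yi, int yj, int T)
{
    int d = yi - yj;
    return std::min(d * d, T);
}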

[Veksler, 2008]

Image Noisy Image

Range Moves

Expansion Move

3,600,000,000 pixels, created from about 800 8-megapixel images

[Kopf et al. (MSR Redmond) SIGGRAPH 2007]

Processing videos: a 1 minute video at 1 Mpixel resolution is 3.6B pixels.

3D reconstruction: [500 × 500 × 500 = 0.125B voxels]

Kohli & Torr (ICCV05, PAMI07): can we do better?

Segment the first frame, then segment the second frame.   [Kohli & Torr, ICCV05 PAMI07]

Image, flow, segmentation: minimizing EA gives SA. Frame 2's energy EB differs only slightly from frame 1's EA, so instead of minimizing EB from scratch, reparametrization reuses the computation: minimizing the simpler reparametrized energy EB* gives SB fast.

3–100000× speedup!

[Kohli & Torr, ICCV05 PAMI07] [Komodakis & Paragios, CVPR07]

Original energy:           E(a1,a2) = 2a1 + 5ā1 + 9a2 + 4ā2 + 2a1ā2 + ā1a2
Reparametrized energy:     E(a1,a2) = 8 + ā1 + 3a2 + 3ā1a2

New energy:                E(a1,a2) = 2a1 + 5ā1 + 9a2 + 4ā2 + 7a1ā2 + ā1a2
New reparametrized energy: E(a1,a2) = 8 + ā1 + 3a2 + 3ā1a2 + 5a1ā2

[Kohli & Torr, ICCV05 PAMI07] [Komodakis & Paragios, CVPR07]

Standard pipeline: Original problem (large) → approximation algorithm (slow) → approximate solution.

Improved pipeline: Original problem (large) → fast partially optimal algorithm [Kovtun ’03] [Kohli et al. ’09] → solved problem (global optima) + reduced problem → approximation algorithm (fast) → approximate solution.   [Alahari Kohli & Torr CVPR ’08]

Example: tree reweighted message passing alone takes 9.89 sec; with the reduction, the total time is 0.30 sec.

[Example labellings: sky, building, airplane, grass]

3–100× speed up of tree reweighted message passing.   [Alahari Kohli & Torr CVPR ’08]  Fast partially optimal algorithm: [Kovtun ’03] [Kohli et al. ’09]

Message Passing algorithms

Belief propagation, TRW

Exact on tree structured graphs

Connections to Linear programming relaxations

Graph Cuts and submodular functions

Move making algorithms – expansion, swap, fusion

Minimizing Higher order functions

Going beyond small connectivity

Representation, Inference and learning of Higher order models

Topology constraints: connectivity of objects

Parallelization – GPUs

Linear programming formulations

Past presenters

Pawan, Carsten and Andrew Blake

Special thanks to the IbPRIA team

In particular:

Joost Van de Weijer

Jordi Gonzàlez

Mario Hernández Tejera

Set of label assignments and their associated costs.

Introduce a pattern variable z with (t+1) states to get a pairwise function: the unary term assigns the cost for the selected label assignment z, and the pairwise terms ensure x is consistent with the pattern z.

[Figure: patterns p1, p2, p3]

E(x) = ∑_i θi(xi,zi) + ∑_{i,j ∈ N4} θij(xi,xj,zi,zj)

Conditional Random Field (CRF): no pure prior.

[Factor graphs: MRF vs. CRF; observed variables zi, zj; unobserved (latent) variables xi, xj]

θij(xi,xj,zi,zj) = |xi-xj| exp{-ß||zi-zj||},   with ß = 2 (Mean(||zi-zj||²))⁻¹

[Figure: θij decays with ||zi-zj||]

θij(xi,xj) = g(|xi-xj|)
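A sketch of the contrast-sensitive edge strength in C++; the 3-channel colour vectors are an assumption, and ß is taken as precomputed over the image as above:

#include <cmath>

// exp(-beta * ||zi - zj||): the factor multiplying |xi - xj| in theta_ij,
// with beta = 2 / mean(||zi - zj||^2) computed over all neighbouring pairs.
double contrast_weight(const double zi[3], const double zj[3], double beta)
{
    double d2 = 0.0;
    for (int c = 0; c < 3; ++c)
        d2 += (zi[c] - zj[c]) * (zi[c] - zj[c]);
    return std::exp(-beta * std::sqrt(d2));
}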

Convex unary and pairwise terms, xi ∈ R:

E(x) = ∑_i θi(xi,zi) + w ∑_{i,j} θij(xi,xj),   θi(xi,zi) = |xi-zi|

Can be solved globally optimally.

original input; TRW-S (discrete labels); HBF [Szeliski ’06] (continuous labels), ~15× faster

Non-submodular Energy Functions

Mixed (Real-Integer) Problems

Higher Order Energy Functions

Multi-label Problems

Ordered Labels ▪ Stereo (depth labels)

Unordered Labels ▪ Object segmentation ( ‘car’, `road’, `person’)

Minimizing general non-submodular functions is NP-hard.

A commonly used method is to solve a relaxation of the problem.

E(x) = ∑_i θi(xi) + ∑_{i,j} θij(xi,xj)   with   θij(0,1) + θij(1,0) ≤ θij(0,0) + θij(1,1)   for some ij

[Boros and Hammer, ’02]

Integer program → linear program: formulate the LP over the feasibility region (the LOCAL polytope), solve it, and obtain an integer solution by rounding.

The integral solution is not guaranteed to be optimal; however, we can obtain multiplicative bounds on the error.

Traditional LP solvers (Simplex, Interior Point Methods) are unable to handle large instances (>10⁷ variables). [Bertsimas and Tsitsiklis, 1997] [Boyd and Vandenberghe, 2004]

Exploit the structure of the problem: special message passing solvers. Such message passing algorithms have weak convergence properties. [Kolmogorov and Wainwright] [Ravikumar et al., ICML 2008]

A new, weaker relaxation: roof-dual [Boros and Hammer, 2002] [Kohli et al., ICML 2008] [Kolmogorov and Rother, PAMI 2006]

… which can be exactly solved by minimizing a pairwise submodular function.

Obtain partially optimal solutions, e.g. x = [00*10**1111*] with xOPT = [001100011110]   (* = unlabelled variable)

The energy decomposes into unary terms, pairwise submodular terms, and pairwise non-submodular terms (those where θpq(0,0) + θpq(1,1) exceeds θpq(0,1) + θpq(1,0)); the latter are replaced by modified potentials θ̃pq(0,0), θ̃pq(0,1), θ̃pq(1,0), θ̃pq(1,1).   [Boros and Hammer, ’02]

Double the number of variables:   xp → (xp, x̄p)   with the constraint   x̄p = (1 - xp).

Ignore the constraint and solve; the result satisfies local optimality.   [Boros and Hammer, ’02]