energy minimization - university of chicagottic.uchicago.edu/~rurtasun/courses/cv/lecture15.pdfp =...

Post on 10-Aug-2020

5 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

Energy Minimization

Raquel Urtasun

TTI Chicago

Feb 28, 2013

Raquel Urtasun (TTI-C) Computer Vision Feb 28, 2013 1 / 48

The ST-mincut problem

Suppose we have a graph G = {V ,E ,C}, with vertices V , Edges E andcosts C .

[Source: P. Kohli]Raquel Urtasun (TTI-C) Computer Vision Feb 28, 2013 2 / 48

The ST-mincut problem

An st-cut (S,T) divides the nodes between source and sink.

The cost of a st-cut is the sum of cost of all edges going from S to T

[Source: P. Kohli]Raquel Urtasun (TTI-C) Computer Vision Feb 28, 2013 3 / 48

The ST-mincut problem

The st-mincut is the st-cut with the minimum cost

[Source: P. Kohli]Raquel Urtasun (TTI-C) Computer Vision Feb 28, 2013 4 / 48

Back to our energy minimization

Construct a graph such that

1 Any st-cut corresponds to an assignment of x

2 The cost of the cut is equal to the energy of x : E(x)

[Source: P. Kohli]Raquel Urtasun (TTI-C) Computer Vision Feb 28, 2013 5 / 48

St-mincut and Energy Minimization

[Source: P. Kohli]

Raquel Urtasun (TTI-C) Computer Vision Feb 28, 2013 6 / 48

How are they equivalent?

A B

C D

0 1

0

1 xi

xj

= A + 0 0

C-A C-A

0 1

0

1

0 D-C

0 D-C

0 1

0

1

0 B+C-A-D

0 0

0 1

0

1 + +

if x1=1 add C-A

if x2 = 1 add D-C

B+C-A-D ! 0 is true from the submodularity of !ij

A = !ij (0,0) B = !ij(0,1) C = !ij

(1,0) D = !ij (1,1)

!ij (xi,xj) = !ij(0,0) + (!ij(1,0)-!ij(0,0)) xi + (!ij(1,0)-!ij(0,0)) xj + (!ij(1,0) + !ij(0,1) - !ij(0,0) - !ij(1,1)) (1-xi) xj

[Source: P. Kohli]Raquel Urtasun (TTI-C) Computer Vision Feb 28, 2013 7 / 48

Graph Construction

[Source: P. Kohli]Raquel Urtasun (TTI-C) Computer Vision Feb 28, 2013 8 / 48

Graph Construction

[Source: P. Kohli]Raquel Urtasun (TTI-C) Computer Vision Feb 28, 2013 8 / 48

Graph Construction

[Source: P. Kohli]Raquel Urtasun (TTI-C) Computer Vision Feb 28, 2013 8 / 48

Graph Construction

[Source: P. Kohli]Raquel Urtasun (TTI-C) Computer Vision Feb 28, 2013 8 / 48

Graph Construction

[Source: P. Kohli]Raquel Urtasun (TTI-C) Computer Vision Feb 28, 2013 8 / 48

Graph Construction

[Source: P. Kohli]Raquel Urtasun (TTI-C) Computer Vision Feb 28, 2013 8 / 48

Graph Construction

[Source: P. Kohli]Raquel Urtasun (TTI-C) Computer Vision Feb 28, 2013 8 / 48

Graph Construction

[Source: P. Kohli]Raquel Urtasun (TTI-C) Computer Vision Feb 28, 2013 8 / 48

Graph Construction

[Source: P. Kohli]Raquel Urtasun (TTI-C) Computer Vision Feb 28, 2013 8 / 48

How to compute the St-mincut?

[Source: P. Kohli]

Raquel Urtasun (TTI-C) Computer Vision Feb 28, 2013 9 / 48

How does the code look like

[Source: P. Kohli]Raquel Urtasun (TTI-C) Computer Vision Feb 28, 2013 10 / 48

How does the code look like

[Source: P. Kohli]

Raquel Urtasun (TTI-C) Computer Vision Feb 28, 2013 10 / 48

How does the code look like

[Source: P. Kohli]Raquel Urtasun (TTI-C) Computer Vision Feb 28, 2013 10 / 48

How does the code look like

[Source: P. Kohli]Raquel Urtasun (TTI-C) Computer Vision Feb 28, 2013 10 / 48

Graph cuts for multi-label problems

Exact Transformation to QPBF [Roy and Cox 98] [Ishikawa 03] [Schlesingeret al. 06] [Ramalingam et al. 08]

Very high computational cost

[Source: P. Kohli]Raquel Urtasun (TTI-C) Computer Vision Feb 28, 2013 11 / 48

Computing the Optimal Move

[Source: P. Kohli]Raquel Urtasun (TTI-C) Computer Vision Feb 28, 2013 12 / 48

Move Making Algorithms

[Source: P. Kohli]

Raquel Urtasun (TTI-C) Computer Vision Feb 28, 2013 13 / 48

Energy Minimization

Consider pairwise MRFs

E (f ) =∑

{p,q}∈N

Vp,q(fp, fq) +∑p

Dp(fp)

with N defining the interactions between nodes, e.g., pixels

Dp non-negative, but arbitrary.

This is the graph-cuts notation.

Important to notice it’s the same thing.

Raquel Urtasun (TTI-C) Computer Vision Feb 28, 2013 14 / 48

Energy Minimization

Consider pairwise MRFs

E (f ) =∑

{p,q}∈N

Vp,q(fp, fq) +∑p

Dp(fp)

with N defining the interactions between nodes, e.g., pixels

Dp non-negative, but arbitrary.

This is the graph-cuts notation.

Important to notice it’s the same thing.

Raquel Urtasun (TTI-C) Computer Vision Feb 28, 2013 14 / 48

Energy Minimization

Consider pairwise MRFs

E (f ) =∑

{p,q}∈N

Vp,q(fp, fq) +∑p

Dp(fp)

with N defining the interactions between nodes, e.g., pixels

Dp non-negative, but arbitrary.

This is the graph-cuts notation.

Important to notice it’s the same thing.

Raquel Urtasun (TTI-C) Computer Vision Feb 28, 2013 14 / 48

Metric vs Semimetric

Two general classes of pairwise interactions

Metric if it satisfies for any set of labels α, β, γ

V (α, β) = 0 ↔ α = β

V (α, β) = V (β, α) ≥ 0

V (α, β) ≤ V (α, γ) + V (γ, β)

Semi-metric if it satisfies for any set of labels α, β, γ

V (α, β) = 0 ↔ α = β

V (α, β) = V (β, α) ≥ 0

Raquel Urtasun (TTI-C) Computer Vision Feb 28, 2013 15 / 48

Metric vs Semimetric

Two general classes of pairwise interactions

Metric if it satisfies for any set of labels α, β, γ

V (α, β) = 0 ↔ α = β

V (α, β) = V (β, α) ≥ 0

V (α, β) ≤ V (α, γ) + V (γ, β)

Semi-metric if it satisfies for any set of labels α, β, γ

V (α, β) = 0 ↔ α = β

V (α, β) = V (β, α) ≥ 0

Raquel Urtasun (TTI-C) Computer Vision Feb 28, 2013 15 / 48

Examples for 1D label set

Truncated quadratic is a semi-metric

V (α, β) = min(K , |α− β|2)

with K a constant.

Truncated absolute distance is a metric

V (α, β) = min(K , |α− β|)

with K a constant.

For multi-dimensional, replace | · | by any norm.

Potts model is a metric

V (α, β) = K · T (α 6= β)

with T (·) = 1 if the argument is true and 0 otherwise.

Raquel Urtasun (TTI-C) Computer Vision Feb 28, 2013 16 / 48

Examples for 1D label set

Truncated quadratic is a semi-metric

V (α, β) = min(K , |α− β|2)

with K a constant.

Truncated absolute distance is a metric

V (α, β) = min(K , |α− β|)

with K a constant.

For multi-dimensional, replace | · | by any norm.

Potts model is a metric

V (α, β) = K · T (α 6= β)

with T (·) = 1 if the argument is true and 0 otherwise.

Raquel Urtasun (TTI-C) Computer Vision Feb 28, 2013 16 / 48

Examples for 1D label set

Truncated quadratic is a semi-metric

V (α, β) = min(K , |α− β|2)

with K a constant.

Truncated absolute distance is a metric

V (α, β) = min(K , |α− β|)

with K a constant.

For multi-dimensional, replace | · | by any norm.

Potts model is a metric

V (α, β) = K · T (α 6= β)

with T (·) = 1 if the argument is true and 0 otherwise.

Raquel Urtasun (TTI-C) Computer Vision Feb 28, 2013 16 / 48

Examples for 1D label set

Truncated quadratic is a semi-metric

V (α, β) = min(K , |α− β|2)

with K a constant.

Truncated absolute distance is a metric

V (α, β) = min(K , |α− β|)

with K a constant.

For multi-dimensional, replace | · | by any norm.

Potts model is a metric

V (α, β) = K · T (α 6= β)

with T (·) = 1 if the argument is true and 0 otherwise.

Raquel Urtasun (TTI-C) Computer Vision Feb 28, 2013 16 / 48

Binary Moves

α− β moves works for semi-metrics

α expansion works for V being a metric

Figure: Figure from P. Kohli tutorial on graph-cuts

For certain x1 and x2, the move energy is sub-modular QPBF

[Source: P. Kohli]Raquel Urtasun (TTI-C) Computer Vision Feb 28, 2013 17 / 48

Swap Move

[Source: P. Kohli]Raquel Urtasun (TTI-C) Computer Vision Feb 28, 2013 18 / 48

Swap Move

[Source: P. Kohli]

Raquel Urtasun (TTI-C) Computer Vision Feb 28, 2013 18 / 48

Expansion Move

[Source: P. Kohli]Raquel Urtasun (TTI-C) Computer Vision Feb 28, 2013 19 / 48

Expansion Move

[Source: P. Kohli]Raquel Urtasun (TTI-C) Computer Vision Feb 28, 2013 19 / 48

More formally

Any labeling can be uniquely represented by a partition of image pixelsP = {Pl |l ∈ L}, where Pl = {p ∈ P|fp = l} is a subset of pixels assignedlabel l .

There is a one to one correspondence between labelings f and partitions P.

Given a pair of labels α, β, a move from a partition P (labeling f ) to a newpartition P’ (labeling f ′) is called an α− β swap if Pl = P ′ for any labell 6= α, β.

The only difference between P and P ′ is that some pixels that were labeledin P are now labeled in P ′, and vice-versa.

Given a label l , a move from a partition P (labeling f ) to a new partition P ′

(labeling f ′) is called an α-expansion if Pα ⊂ P ′α and P ′

l ⊂ Pl .

An α-expansion move allows any set of image pixels to change their labelsto α.

Raquel Urtasun (TTI-C) Computer Vision Feb 28, 2013 20 / 48

More formally

Any labeling can be uniquely represented by a partition of image pixelsP = {Pl |l ∈ L}, where Pl = {p ∈ P|fp = l} is a subset of pixels assignedlabel l .

There is a one to one correspondence between labelings f and partitions P.

Given a pair of labels α, β, a move from a partition P (labeling f ) to a newpartition P’ (labeling f ′) is called an α− β swap if Pl = P ′ for any labell 6= α, β.

The only difference between P and P ′ is that some pixels that were labeledin P are now labeled in P ′, and vice-versa.

Given a label l , a move from a partition P (labeling f ) to a new partition P ′

(labeling f ′) is called an α-expansion if Pα ⊂ P ′α and P ′

l ⊂ Pl .

An α-expansion move allows any set of image pixels to change their labelsto α.

Raquel Urtasun (TTI-C) Computer Vision Feb 28, 2013 20 / 48

More formally

Any labeling can be uniquely represented by a partition of image pixelsP = {Pl |l ∈ L}, where Pl = {p ∈ P|fp = l} is a subset of pixels assignedlabel l .

There is a one to one correspondence between labelings f and partitions P.

Given a pair of labels α, β, a move from a partition P (labeling f ) to a newpartition P’ (labeling f ′) is called an α− β swap if Pl = P ′ for any labell 6= α, β.

The only difference between P and P ′ is that some pixels that were labeledin P are now labeled in P ′, and vice-versa.

Given a label l , a move from a partition P (labeling f ) to a new partition P ′

(labeling f ′) is called an α-expansion if Pα ⊂ P ′α and P ′

l ⊂ Pl .

An α-expansion move allows any set of image pixels to change their labelsto α.

Raquel Urtasun (TTI-C) Computer Vision Feb 28, 2013 20 / 48

More formally

Any labeling can be uniquely represented by a partition of image pixelsP = {Pl |l ∈ L}, where Pl = {p ∈ P|fp = l} is a subset of pixels assignedlabel l .

There is a one to one correspondence between labelings f and partitions P.

Given a pair of labels α, β, a move from a partition P (labeling f ) to a newpartition P’ (labeling f ′) is called an α− β swap if Pl = P ′ for any labell 6= α, β.

The only difference between P and P ′ is that some pixels that were labeledin P are now labeled in P ′, and vice-versa.

Given a label l , a move from a partition P (labeling f ) to a new partition P ′

(labeling f ′) is called an α-expansion if Pα ⊂ P ′α and P ′

l ⊂ Pl .

An α-expansion move allows any set of image pixels to change their labelsto α.

Raquel Urtasun (TTI-C) Computer Vision Feb 28, 2013 20 / 48

More formally

Any labeling can be uniquely represented by a partition of image pixelsP = {Pl |l ∈ L}, where Pl = {p ∈ P|fp = l} is a subset of pixels assignedlabel l .

There is a one to one correspondence between labelings f and partitions P.

Given a pair of labels α, β, a move from a partition P (labeling f ) to a newpartition P’ (labeling f ′) is called an α− β swap if Pl = P ′ for any labell 6= α, β.

The only difference between P and P ′ is that some pixels that were labeledin P are now labeled in P ′, and vice-versa.

Given a label l , a move from a partition P (labeling f ) to a new partition P ′

(labeling f ′) is called an α-expansion if Pα ⊂ P ′α and P ′

l ⊂ Pl .

An α-expansion move allows any set of image pixels to change their labelsto α.

Raquel Urtasun (TTI-C) Computer Vision Feb 28, 2013 20 / 48

More formally

Any labeling can be uniquely represented by a partition of image pixelsP = {Pl |l ∈ L}, where Pl = {p ∈ P|fp = l} is a subset of pixels assignedlabel l .

There is a one to one correspondence between labelings f and partitions P.

Given a pair of labels α, β, a move from a partition P (labeling f ) to a newpartition P’ (labeling f ′) is called an α− β swap if Pl = P ′ for any labell 6= α, β.

The only difference between P and P ′ is that some pixels that were labeledin P are now labeled in P ′, and vice-versa.

Given a label l , a move from a partition P (labeling f ) to a new partition P ′

(labeling f ′) is called an α-expansion if Pα ⊂ P ′α and P ′

l ⊂ Pl .

An α-expansion move allows any set of image pixels to change their labelsto α.

Raquel Urtasun (TTI-C) Computer Vision Feb 28, 2013 20 / 48

Example

Figure: (a) Current partition (b) local move (c) α− β-swap (d) α-expansion.

Raquel Urtasun (TTI-C) Computer Vision Feb 28, 2013 21 / 48

Algorithms

Raquel Urtasun (TTI-C) Computer Vision Feb 28, 2013 22 / 48

Finding optimal Swap move

Given an input labeling f (partition P) and a pair of labels α, β we want tofind a labeling f that minimizes E over all labelings within one α− β-swapof f .

This is going to be done by computing a labeling corresponding to aminimum cut on a graph Gαβ = (Vαβ , Eαβ).

The structure of this graph is dynamically determined by the currentpartition P and by the labels α, β.

Raquel Urtasun (TTI-C) Computer Vision Feb 28, 2013 23 / 48

Finding optimal Swap move

Given an input labeling f (partition P) and a pair of labels α, β we want tofind a labeling f that minimizes E over all labelings within one α− β-swapof f .

This is going to be done by computing a labeling corresponding to aminimum cut on a graph Gαβ = (Vαβ , Eαβ).

The structure of this graph is dynamically determined by the currentpartition P and by the labels α, β.

Raquel Urtasun (TTI-C) Computer Vision Feb 28, 2013 23 / 48

Finding optimal Swap move

Given an input labeling f (partition P) and a pair of labels α, β we want tofind a labeling f that minimizes E over all labelings within one α− β-swapof f .

This is going to be done by computing a labeling corresponding to aminimum cut on a graph Gαβ = (Vαβ , Eαβ).

The structure of this graph is dynamically determined by the currentpartition P and by the labels α, β.

Raquel Urtasun (TTI-C) Computer Vision Feb 28, 2013 23 / 48

Graph Construction

The set of vertices includes the two terminals α and β, as well as imagepixels p in the sets Pα and Pβ (i.e., fp ∈ {α, β}).

Each pixel p ∈ Pαβ is connected to the terminals α and β, called t-links.

Each set of pixels p, q ∈ Pαβ which are neighbors is connected by an edgeep,q

Raquel Urtasun (TTI-C) Computer Vision Feb 28, 2013 24 / 48

Computing the Cut

Any cut must have a single t-link not cut.

This defines a labeling

There is a one-to-one correspondences between a cut and a labeling.

The energy of the cut is the energy of the labeling.

See Boykov et al, ”fast approximate energy minimization via graph cuts”PAMI 2001.

Raquel Urtasun (TTI-C) Computer Vision Feb 28, 2013 25 / 48

Properties

For any cut, then

Raquel Urtasun (TTI-C) Computer Vision Feb 28, 2013 26 / 48

Finding the optimal α expansion

Given an input labeling f (partition P) and a label α we want to find alabeling f that minimizes E over all labelings within one α-expansion of f .

This is going to be done by computing a labeling corresponding to aminimum cut on a graph Gα = (Vα, Eα).

The structure of this graph is dynamically determined by the currentpartition P and by the label α.

Different graph than the α− β swap.

Raquel Urtasun (TTI-C) Computer Vision Feb 28, 2013 27 / 48

Finding the optimal α expansion

Given an input labeling f (partition P) and a label α we want to find alabeling f that minimizes E over all labelings within one α-expansion of f .

This is going to be done by computing a labeling corresponding to aminimum cut on a graph Gα = (Vα, Eα).

The structure of this graph is dynamically determined by the currentpartition P and by the label α.

Different graph than the α− β swap.

Raquel Urtasun (TTI-C) Computer Vision Feb 28, 2013 27 / 48

Finding the optimal α expansion

Given an input labeling f (partition P) and a label α we want to find alabeling f that minimizes E over all labelings within one α-expansion of f .

This is going to be done by computing a labeling corresponding to aminimum cut on a graph Gα = (Vα, Eα).

The structure of this graph is dynamically determined by the currentpartition P and by the label α.

Different graph than the α− β swap.

Raquel Urtasun (TTI-C) Computer Vision Feb 28, 2013 27 / 48

Finding the optimal α expansion

Given an input labeling f (partition P) and a label α we want to find alabeling f that minimizes E over all labelings within one α-expansion of f .

This is going to be done by computing a labeling corresponding to aminimum cut on a graph Gα = (Vα, Eα).

The structure of this graph is dynamically determined by the currentpartition P and by the label α.

Different graph than the α− β swap.

Raquel Urtasun (TTI-C) Computer Vision Feb 28, 2013 27 / 48

Graph Construction

The set of vertices includes the two terminals α and α, as well as all imagepixels p ∈ P.

Additionally, for each pair of neighboring pixels p, q such that fp 6= fq wecreate an auxiliary node ap,q.

Each pixel p is connected to the terminals α and α, called t-links.

Each set of pixels p, q which are neighbors and fp = fq, we connect with andn-link.

For each pair of neighboring pixels such that fp 6= fq, we create a triplet{ep,a, ea,q, tαa }.

The set of edges is then

Raquel Urtasun (TTI-C) Computer Vision Feb 28, 2013 28 / 48

Graph Construction

The set of vertices includes the two terminals α and α, as well as all imagepixels p ∈ P.

Additionally, for each pair of neighboring pixels p, q such that fp 6= fq wecreate an auxiliary node ap,q.

Each pixel p is connected to the terminals α and α, called t-links.

Each set of pixels p, q which are neighbors and fp = fq, we connect with andn-link.

For each pair of neighboring pixels such that fp 6= fq, we create a triplet{ep,a, ea,q, tαa }.

The set of edges is then

Raquel Urtasun (TTI-C) Computer Vision Feb 28, 2013 28 / 48

Graph Construction

The set of vertices includes the two terminals α and α, as well as all imagepixels p ∈ P.

Additionally, for each pair of neighboring pixels p, q such that fp 6= fq wecreate an auxiliary node ap,q.

Each pixel p is connected to the terminals α and α, called t-links.

Each set of pixels p, q which are neighbors and fp = fq, we connect with andn-link.

For each pair of neighboring pixels such that fp 6= fq, we create a triplet{ep,a, ea,q, tαa }.

The set of edges is then

Raquel Urtasun (TTI-C) Computer Vision Feb 28, 2013 28 / 48

Graph Construction

The set of vertices includes the two terminals α and α, as well as all imagepixels p ∈ P.

Additionally, for each pair of neighboring pixels p, q such that fp 6= fq wecreate an auxiliary node ap,q.

Each pixel p is connected to the terminals α and α, called t-links.

Each set of pixels p, q which are neighbors and fp = fq, we connect with andn-link.

For each pair of neighboring pixels such that fp 6= fq, we create a triplet{ep,a, ea,q, tαa }.

The set of edges is then

Raquel Urtasun (TTI-C) Computer Vision Feb 28, 2013 28 / 48

Graph Construction

The set of vertices includes the two terminals α and α, as well as all imagepixels p ∈ P.

Additionally, for each pair of neighboring pixels p, q such that fp 6= fq wecreate an auxiliary node ap,q.

Each pixel p is connected to the terminals α and α, called t-links.

Each set of pixels p, q which are neighbors and fp = fq, we connect with andn-link.

For each pair of neighboring pixels such that fp 6= fq, we create a triplet{ep,a, ea,q, tαa }.

The set of edges is then

Raquel Urtasun (TTI-C) Computer Vision Feb 28, 2013 28 / 48

Graph Construction

The set of vertices includes the two terminals α and α, as well as all imagepixels p ∈ P.

Additionally, for each pair of neighboring pixels p, q such that fp 6= fq wecreate an auxiliary node ap,q.

Each pixel p is connected to the terminals α and α, called t-links.

Each set of pixels p, q which are neighbors and fp = fq, we connect with andn-link.

For each pair of neighboring pixels such that fp 6= fq, we create a triplet{ep,a, ea,q, tαa }.

The set of edges is then

Raquel Urtasun (TTI-C) Computer Vision Feb 28, 2013 28 / 48

Graph Construction

Raquel Urtasun (TTI-C) Computer Vision Feb 28, 2013 29 / 48

Properties

There is a one-to-one correspondences between a cut and a labeling.

The energy of the cut is the energy of the labeling.

See Boykov et al, ”fast approximate energy minimization via graph cuts”PAMI 2001.

Raquel Urtasun (TTI-C) Computer Vision Feb 28, 2013 30 / 48

Global Minimization Techniques

Ways to get an approximate solution typically

Dynamic programming approximations

Sampling

Simulated annealing

Graph-cuts: imposes restrictions on the type of pairwise cost functions

Message passing: iterative algorithms that pass messages between nodes inthe graph.

Now we can solve for the MAP (approximately) in general energies. We can solve

for other problems than stereo

Raquel Urtasun (TTI-C) Computer Vision Feb 28, 2013 31 / 48

Let’s look at data/bechmarks

Raquel Urtasun (TTI-C) Computer Vision Feb 28, 2013 32 / 48

Benchmarks

Two benchmarks with very different characteristics

(Middlebury) (KITTI)

Raquel Urtasun (TTI-C) Computer Vision Feb 28, 2013 33 / 48

Middlebury Dataset

Middlebury Stereo Evaluation – Version 2

Laboratory

Lambertian

Rich in texture

Medium-size label set

Largely fronto-parallel

Raquel Urtasun (TTI-C) Computer Vision Feb 28, 2013 34 / 48

Middlebury Dataset

Middlebury Stereo Evaluation – Version 2

Laboratory

Lambertian

Rich in texture

Medium-size label set

Largely fronto-parallel

Raquel Urtasun (TTI-C) Computer Vision Feb 28, 2013 34 / 48

Middlebury Dataset

Middlebury Stereo Evaluation – Version 2

Laboratory

Lambertian

Rich in texture

Medium-size label set

Largely fronto-parallel

Raquel Urtasun (TTI-C) Computer Vision Feb 28, 2013 34 / 48

Middlebury Dataset

Middlebury Stereo Evaluation – Version 2

Laboratory

Lambertian

Rich in texture

Medium-size label set

Largely fronto-parallel

Raquel Urtasun (TTI-C) Computer Vision Feb 28, 2013 34 / 48

Middlebury Dataset

Middlebury Stereo Evaluation – Version 2

Laboratory

Lambertian

Rich in texture

Medium-size label set

Largely fronto-parallel

Raquel Urtasun (TTI-C) Computer Vision Feb 28, 2013 34 / 48

Benchmarks for Stereo and metrics

Middlebury Stereo Evaluation – Version 2

Best methods < 3% errors (for all non-occluded regions)

http://vision.middlebury.edu/stereo/data/

Raquel Urtasun (TTI-C) Computer Vision Feb 28, 2013 35 / 48

Benchmarks: KITTI Data Collection

Two stereo rigs (1392× 512 px, 54 cm base, 90◦ opening)

Velodyne laser scanner, GPS+IMU localization

6 hours at 10 frames per second!

Raquel Urtasun (TTI-C) Computer Vision Feb 28, 2013 36 / 48

The KITTI Vision Benchmark Suite

Raquel Urtasun (TTI-C) Computer Vision Feb 28, 2013 37 / 48

Novel Challenges

Fast guided cost-volume filtering (Rhemann et al., CVPR 2011)

Middlebury, Errors: 2.7%

Error threshold: 1 px (Middlebury) / 3 px (KITTI)

Raquel Urtasun (TTI-C) Computer Vision Feb 28, 2013 38 / 48

Novel Challenges

Fast guided cost-volume filtering (Rhemann et al., CVPR 2011)

Middlebury, Errors: 2.7% KITTI, Errors: 46.3%

Error threshold: 1 px (Middlebury) / 3 px (KITTI)

Raquel Urtasun (TTI-C) Computer Vision Feb 28, 2013 38 / 48

Novel Challenges

So what is the difference?

Middlebury

Laboratory

Lambertian

Rich in texture

Medium-size label set

Largely fronto-parallel

KITTI

Moving vehicle

Specularities

Sensor saturation

Large label set

Strong slants

Raquel Urtasun (TTI-C) Computer Vision Feb 28, 2013 39 / 48

Novel Challenges

So what is the difference?

Middlebury

Laboratory

Lambertian

Rich in texture

Medium-size label set

Largely fronto-parallel

KITTI

Moving vehicle

Specularities

Sensor saturation

Large label set

Strong slants

Raquel Urtasun (TTI-C) Computer Vision Feb 28, 2013 39 / 48

Novel Challenges

So what is the difference?

Middlebury

Laboratory

Lambertian

Rich in texture

Medium-size label set

Largely fronto-parallel

KITTI

Moving vehicle

Specularities

Sensor saturation

Large label set

Strong slants

Raquel Urtasun (TTI-C) Computer Vision Feb 28, 2013 39 / 48

Novel Challenges

So what is the difference?

Middlebury

d = 50 px

d = 16 px

Laboratory

Lambertian

Rich in texture

Medium-size label set

Largely fronto-parallel

KITTI

d = 150 px

d = 0 px

Moving vehicle

Specularities

Sensor saturation

Large label set

Strong slants

Raquel Urtasun (TTI-C) Computer Vision Feb 28, 2013 39 / 48

Novel Challenges

So what is the difference?

Middlebury

Laboratory

Lambertian

Rich in texture

Medium-size label set

Largely fronto-parallel

KITTI

Moving vehicle

Specularities

Sensor saturation

Large label set

Strong slants

Raquel Urtasun (TTI-C) Computer Vision Feb 28, 2013 39 / 48

Stereo Evaluation

Raquel Urtasun (TTI-C) Computer Vision Feb 28, 2013 40 / 48

MRFs for stereo

Global methods: define a Markov random field over

Pixel-level

Fronto-parallel planes

Slanted planes

Raquel Urtasun (TTI-C) Computer Vision Feb 28, 2013 41 / 48

Plane MRFs

First segment an image into small regions, i.e., superpixels

Assume that the 3D world is compose of small frontal/slanted planes

Good representation if the superpixels are small and respect boundaries

E (x1, · · · , xn) =∑i

C (xi ) +∑i

∑j∈Nj

C (xi , xj)

with xi ∈ < for the fronto-parallel planes, and xi ∈ <3 for the slanted planes

This are continuous variables. Is this a problem?

What can I do to solve this? Discretize the problem

The unitary are usually agreegation of cost over the local matching on thepixels in that superpixel

Pairwise is typically smoothness

Raquel Urtasun (TTI-C) Computer Vision Feb 28, 2013 42 / 48

Plane MRFs

First segment an image into small regions, i.e., superpixels

Assume that the 3D world is compose of small frontal/slanted planes

Good representation if the superpixels are small and respect boundaries

E (x1, · · · , xn) =∑i

C (xi ) +∑i

∑j∈Nj

C (xi , xj)

with xi ∈ < for the fronto-parallel planes, and xi ∈ <3 for the slanted planes

This are continuous variables. Is this a problem?

What can I do to solve this? Discretize the problem

The unitary are usually agreegation of cost over the local matching on thepixels in that superpixel

Pairwise is typically smoothness

Raquel Urtasun (TTI-C) Computer Vision Feb 28, 2013 42 / 48

Plane MRFs

First segment an image into small regions, i.e., superpixels

Assume that the 3D world is compose of small frontal/slanted planes

Good representation if the superpixels are small and respect boundaries

E (x1, · · · , xn) =∑i

C (xi ) +∑i

∑j∈Nj

C (xi , xj)

with xi ∈ < for the fronto-parallel planes, and xi ∈ <3 for the slanted planes

This are continuous variables. Is this a problem?

What can I do to solve this? Discretize the problem

The unitary are usually agreegation of cost over the local matching on thepixels in that superpixel

Pairwise is typically smoothness

Raquel Urtasun (TTI-C) Computer Vision Feb 28, 2013 42 / 48

Plane MRFs

First segment an image into small regions, i.e., superpixels

Assume that the 3D world is compose of small frontal/slanted planes

Good representation if the superpixels are small and respect boundaries

E (x1, · · · , xn) =∑i

C (xi ) +∑i

∑j∈Nj

C (xi , xj)

with xi ∈ < for the fronto-parallel planes, and xi ∈ <3 for the slanted planes

This are continuous variables. Is this a problem?

What can I do to solve this? Discretize the problem

The unitary are usually agreegation of cost over the local matching on thepixels in that superpixel

Pairwise is typically smoothness

Raquel Urtasun (TTI-C) Computer Vision Feb 28, 2013 42 / 48

Plane MRFs

First segment an image into small regions, i.e., superpixels

Assume that the 3D world is compose of small frontal/slanted planes

Good representation if the superpixels are small and respect boundaries

E (x1, · · · , xn) =∑i

C (xi ) +∑i

∑j∈Nj

C (xi , xj)

with xi ∈ < for the fronto-parallel planes, and xi ∈ <3 for the slanted planes

This are continuous variables. Is this a problem?

What can I do to solve this? Discretize the problem

The unitary are usually agreegation of cost over the local matching on thepixels in that superpixel

Pairwise is typically smoothness

Raquel Urtasun (TTI-C) Computer Vision Feb 28, 2013 42 / 48

Plane MRFs

First segment an image into small regions, i.e., superpixels

Assume that the 3D world is compose of small frontal/slanted planes

Good representation if the superpixels are small and respect boundaries

E (x1, · · · , xn) =∑i

C (xi ) +∑i

∑j∈Nj

C (xi , xj)

with xi ∈ < for the fronto-parallel planes, and xi ∈ <3 for the slanted planes

This are continuous variables. Is this a problem?

What can I do to solve this? Discretize the problem

The unitary are usually agreegation of cost over the local matching on thepixels in that superpixel

Pairwise is typically smoothness

Raquel Urtasun (TTI-C) Computer Vision Feb 28, 2013 42 / 48

Plane MRFs

First segment an image into small regions, i.e., superpixels

Assume that the 3D world is compose of small frontal/slanted planes

Good representation if the superpixels are small and respect boundaries

E (x1, · · · , xn) =∑i

C (xi ) +∑i

∑j∈Nj

C (xi , xj)

with xi ∈ < for the fronto-parallel planes, and xi ∈ <3 for the slanted planes

This are continuous variables. Is this a problem?

What can I do to solve this? Discretize the problem

The unitary are usually agreegation of cost over the local matching on thepixels in that superpixel

Pairwise is typically smoothness

Raquel Urtasun (TTI-C) Computer Vision Feb 28, 2013 42 / 48

Slanted-plane MRFs

Raquel Urtasun (TTI-C) Computer Vision Feb 28, 2013 43 / 48

A more sophisticated occlusion model

MRF on continuous variables (slanted planes) and discrete var. (boundary)

Combines depth ordering (segmentation) and stereo

!"#$%&'(

)*+,*$- !"#$"%&'()*+),-").&$-*%/01/2.&$*/"3/4*+,*$-

./0%1)*2'()*+),-"5*.&6"$4782/9*-:**$/4*+,*$-4/

;/4-&-*4/

<==.#48"$ >8$+* ?"2.&$&'

?"$6$#"#4/@&'8&9.*/

184='*-*/@&'8&9.*/)#2*'28A*.4/BC?D/EF'9*.&*GH/*-/&.I/JKLLM/

////////////////////////&$%/)NO?/EF=7&$-&H/*-/&.I/JKLKMP/

Takes as input disparities computed by any local algorithm

Raquel Urtasun (TTI-C) Computer Vision Feb 28, 2013 44 / 48

Energy of PCBP-Stereo

y the set of slanted 3D planes, o the set of discrete boundary variables

E (y, o) = Ecolor (o) + Ematch(y, o) + Ecompatibility (y, o) + Ejunction(o)

!"#"$%&'()$)&' *"+,$-'.)'/,'()0$%1%&'

*,2'"#%3,'

4)$)&'5"#"$%&".-'

!"#"$%&'

6"55"#"$%&'

Raquel Urtasun (TTI-C) Computer Vision Feb 28, 2013 45 / 48

Energy of PCBP-Stereo

y the set of slanted 3D planes, o the set of discrete boundary variables

E (y, o) = Ecolor (o) + Ematch(y, o) + Ecompatibility (y, o) + Ejunction(o)

!"#$$%$&'()*'+(#$,-.'(/0(*&1-'(2*,13#*'4(%31(

5/%1-'$2(64(3&4(%3'7+*&"(%$'+/2((((89/2*:$2(,$%*;"./63.(%3'7+*&"<(

=*,13#*'4(%31( >.3&'$2(1.3&$(

?#-&73'$2(@-32#3A7(0-&7A/&

B&(6/-&23#4(CB77.-,*/&D(E(F/#$"#/-&2(,$"%$&'(/)&,(6/-&23#4(

Raquel Urtasun (TTI-C) Computer Vision Feb 28, 2013 46 / 48

Energy of PCBP-Stereo

y the set of slanted 3D planes, o the set of discrete boundary variables

E (y, o) = Ecolor (o) + Ematch(y, o) + Ecompatibility (y, o) + Ejunction(o)

!"#$%&'()*&+,*-) .*&$/,%)0*&$,-)!"#$%&

1',2,',$3,)"2)+"#$%&'()*&+,*) 45"0*&$&')6)78$9,)6):33*#-8"$;)

<&/3=)

>:33*#-8"$?)

>78$9,?)

>5"0*&$&'?)

i j i j

i j

i j

on boundary

in both segments

@<0"-,)0,$&*/()

4A;)

4B;)

Raquel Urtasun (TTI-C) Computer Vision Feb 28, 2013 47 / 48

Energy of PCBP-Stereo

y the set of slanted 3D planes, o the set of discrete boundary variables

E (y, o) = Ecolor (o) + Ematch(y, o) + Ecompatibility (y, o) + Ejunction(o)

!""#$%&'()*'$(+,-.)-/,%'(&(0)

123'%%&*#/)",%/%)

!""#$%&'()4-'(5)

6,"7)8&(0/)9'3#,(,-)

:;,#&7)<=>?@)

A/(,#&B/)&23'%%&*#/)C$("D'(%)

Raquel Urtasun (TTI-C) Computer Vision Feb 28, 2013 48 / 48

top related