
Multiple cosegmentation

Armand Joulin, Francis Bach and Jean Ponce

INRIA - École Normale Supérieure

April 25, 2012

Slides: ai.stanford.edu/~ajoulin/slides/JoulBachPonceCVPR12_slides.pdf

Outline: Introduction · Method overview · Spatial consistency · Discriminative clustering · Optimization · Results

Subsections: Segmentation · Supervised and weakly-supervised segmentation · Cosegmentation

Segmentation

Segmentation is a classical and fundamental vision problem.

Problem: Many possible solutions.

Existing solutions

Supervised segmentation:

Needs ground truth for every class of object.

Cannot deal with an unknown object.

P. Krähenbühl and V. Koltun (NIPS'11)

Interactive segmentation (scribbles or a bounding box):

Needs human interaction for each image.

GrabCut

Cosegmentation

Segmentation: dividing one image.

Cosegmentation: dividing a set of images by using shared information.

No prior information.

But: common foreground and different backgrounds.

Cosegmentation

Previous methods (Rother et al. 2006, Singh and Hochbaum 2009, ...) only work with 2 images and the exact same object.

The first presented method works on multiple images and on an object class.

The second one extends it to multiple images and multiple object classes.

...Cosegmentation is also an ill-posed problem

In natural images, objects are linked with their environment

...the background is also common to all the images.

Solutions:

Use user interaction on some images,

Segment the background into meaningful regions.

Subsections: Goal of our approach · Notations

The goals of our approach

Our method should:

Handle multiple images.

Work on any kind of object/stuff.

Segment the "background" into meaningful regions.

Use no prior information, but be easily extended to interactive cosegmentation.

Method goals

Local consistency

Figure: Image space.

Maximizing spatial consistency within a particular image.

Separation of the classes

Figure: Feature space.

Maximizing the separability of K classes between different images.

Our framework: Unsupervised discriminative clustering.

Problem Notations

Each image i is reduced to a subsampled grid of pixels.

For the n-th pixel, we denote by:

$x_n$ its d-dimensional feature vector.

$y_n$ the K-vector such that $y_{nk} = 1$ if the n-th pixel is in the k-th class and 0 otherwise.

Spatial consistency

Figure: Image space.

Normalized Cut (Shi and Malik, 2000):

The similarity between two pixels is measured by an RBF kernel on their positions $p_n$ and their colors $c_n$.

For an image i, our similarity matrix is:

$$W^i_{nm} = \exp\left(-\lambda_p \|p_n - p_m\|_2^2 - \lambda_c \|c_n - c_m\|^2\right).$$
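As a rough illustration of the similarity matrix above, here is a minimal pure-Python sketch. The function name `similarity_matrix`, the toy pixels, and the values chosen for `lam_p` and `lam_c` are assumptions made for this example only; they are not taken from the slides.

```python
import math

def similarity_matrix(positions, colors, lam_p, lam_c):
    """Return W with W[n][m] = exp(-lam_p*||p_n-p_m||^2 - lam_c*||c_n-c_m||^2)."""
    def sq_dist(u, v):
        # Squared Euclidean distance between two tuples of equal length.
        return sum((a - b) ** 2 for a, b in zip(u, v))

    size = len(positions)
    return [
        [math.exp(-lam_p * sq_dist(positions[n], positions[m])
                  - lam_c * sq_dist(colors[n], colors[m]))
         for m in range(size)]
        for n in range(size)
    ]

# Toy data (hypothetical): two nearby reddish pixels and one distant blue pixel.
positions = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
colors = [(0.9, 0.1, 0.1), (0.8, 0.1, 0.1), (0.1, 0.1, 0.9)]
W = similarity_matrix(positions, colors, lam_p=0.5, lam_c=1.0)
```

Pixels that are close in both position and color get a weight near 1, while distant or differently colored pixels get a weight near 0.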

The Laplacian matrix is $L = I - D^{-1/2} W D^{-1/2}$, where D is the diagonal matrix composed of the row sums of W.

We thus have the following term in our cost function:

$$E_B(y) = \frac{\mu}{N} \operatorname{tr}(y^T L y).$$
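The normalized Laplacian and the binary term can be sketched in a few lines of pure Python. The helper names, the toy similarity matrix `W_toy`, and the value `mu=1.0` are hypothetical and only serve to show that labelings which cut weak edges get a lower cost.

```python
import math

def normalized_laplacian(W):
    """L = I - D^{-1/2} W D^{-1/2}, with D the diagonal matrix of row sums of W."""
    degrees = [sum(row) for row in W]
    size = len(W)
    return [
        [(1.0 if n == m else 0.0) - W[n][m] / math.sqrt(degrees[n] * degrees[m])
         for m in range(size)]
        for n in range(size)
    ]

def binary_term(y, L, mu):
    """E_B(y) = (mu / N) * tr(y^T L y) for an N x K label matrix y."""
    size, num_classes = len(y), len(y[0])
    trace = sum(y[n][k] * L[n][m] * y[m][k]
                for k in range(num_classes)
                for n in range(size)
                for m in range(size))
    return mu / size * trace

# Toy similarity (hypothetical): pixels 0 and 1 are strongly linked, pixel 2 is not.
W_toy = [[1.0, 0.9, 0.1],
         [0.9, 1.0, 0.1],
         [0.1, 0.1, 1.0]]
L_toy = normalized_laplacian(W_toy)
cost_coherent = binary_term([[1, 0], [1, 0], [0, 1]], L_toy, mu=1.0)
cost_split = binary_term([[1, 0], [0, 1], [1, 0]], L_toy, mu=1.0)
```

Here `cost_coherent` groups the two strongly linked pixels together and is therefore smaller than `cost_split`.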

Subsections: Formulation · Mapping approximation · Loss function · Cluster size balancing · Overall problem · Probabilistic interpretation

Discriminative clustering

Figure: Feature space.

Discriminative classifier:

Given the labels y, we solve the following problem:

$$E_U(y) = \min_{A \in \mathbb{R}^{K \times d},\; b \in \mathbb{R}^K} \frac{1}{N} \sum_{n=1}^{N} \ell(y_n, A\phi(x_n) + b) + \frac{\lambda}{2K} \|A\|_F^2,$$

Notations:

φ is a non-linear mapping of the features,

ℓ is a cost function.

Mapping approximation

Our discriminative clustering framework works with positive definite kernels.

We use the χ² kernel matrix K:

$$K_{nm} = \exp\left(-\lambda_h \sum_{d=1}^{D} \frac{(x_{nd} - x_{md})^2}{x_{nd} + x_{md}}\right),$$

This is equivalent to applying a mapping φ from the feature space to a high-dimensional Hilbert space $\mathcal{F}$, such that:

$$K_{nm} = \langle \phi(x_n), \phi(x_m) \rangle.$$
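The χ² kernel above can be computed directly from non-negative feature vectors such as histograms. The sketch below is illustrative only: `chi2_kernel`, the toy histograms, and `lam_h=1.0` are assumptions, and skipping bins where both entries are zero is a convention chosen here to avoid a 0/0 division, not something stated in the slides.

```python
import math

def chi2_kernel(features, lam_h):
    """K[n][m] = exp(-lam_h * sum_d (x_nd - x_md)^2 / (x_nd + x_md))."""
    def chi2(u, v):
        # Bins where both entries are zero are skipped (assumed convention).
        return sum((a - b) ** 2 / (a + b) for a, b in zip(u, v) if a + b > 0)

    size = len(features)
    return [[math.exp(-lam_h * chi2(features[n], features[m]))
             for m in range(size)]
            for n in range(size)]

# Toy histograms (hypothetical): the first two are similar, the third differs.
hists = [[0.5, 0.5, 0.0],
         [0.4, 0.6, 0.0],
         [0.0, 0.1, 0.9]]
K_mat = chi2_kernel(hists, lam_h=1.0)
```

The diagonal equals 1 and similar histograms yield entries close to 1, which is the behaviour expected of this kind of kernel.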

Loss function

We choose the soft-max loss function because it is suited to multiclass problems and is related to probabilistic models:

$$\ell(y_n, A\phi(x_n) + b) = -\sum_{k=1}^{K} y_{nk} \log\left(\frac{\exp(a_k^T \phi(x_n) + b_k)}{\sum_{l=1}^{K} \exp(a_l^T \phi(x_n) + b_l)}\right).$$
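To make the soft-max loss concrete, here is a small sketch that evaluates it for one pixel. It assumes the class scores $a_k^T \phi(x_n) + b_k$ have already been computed and are passed in as a list; the function name `softmax_loss` and the toy scores are hypothetical.

```python
import math

def softmax_loss(y_n, scores):
    """-sum_k y_nk * log(softmax_k(scores)), where scores[k] = a_k^T phi(x_n) + b_k."""
    shift = max(scores)  # log-sum-exp shift for numerical stability
    log_norm = shift + math.log(sum(math.exp(s - shift) for s in scores))
    return -sum(y * (s - log_norm) for y, s in zip(y_n, scores))

# With equal scores the loss is log(K); a confident correct score lowers it.
uniform = softmax_loss([1, 0, 0], [0.0, 0.0, 0.0])
confident = softmax_loss([1, 0, 0], [5.0, 0.0, 0.0])
mistaken = softmax_loss([0, 1, 0], [5.0, 0.0, 0.0])
```

The loss is small when the labeled class has the highest score and grows when the classifier confidently prefers another class.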

Discriminative clustering

Find the set of labels y which leads to the best data separation into K classes:

$$\min_{\substack{y \in \{0,1\}^{N \times K} \\ y 1_K = 1_N}} \; \min_{A \in \mathbb{R}^{K \times d},\; b \in \mathbb{R}^K} E_U(y, A, b)$$

Problem: assigning the same label to all the pixels → perfect separation.

Cluster size balancing

Two solutions:

Add linear constraints on the number of elements per class.

Encourage the proportion of points per class to be uniform.

We choose the second: it adds no additional parameters and has a probabilistic interpretation.

$$H(y) = -\sum_{i \in I} \sum_{k=1}^{K} \left(\frac{1}{N} \sum_{n \in N_i} y_{nk}\right) \log\left(\frac{1}{N} \sum_{n \in N_i} y_{nk}\right),$$

where i is an image and $N_i$ denotes its pixels (an index set in the sums; as a scalar, their number).

Note: In a weakly supervised setting (e.g., interactive segmentation), this term can be modified to take into account prior knowledge.
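The balancing term H(y) can be evaluated with a short sketch like the one below. The function name, the `image_of_pixel` list that maps each pixel to its image, and the convention 0 · log 0 = 0 (needed for hard labels) are assumptions made for this example.

```python
import math

def class_balance_entropy(y, image_of_pixel):
    """H(y) = -sum_i sum_k p_ik * log(p_ik), with p_ik = (1/N) * sum_{n in N_i} y_nk."""
    total = len(y)
    num_classes = len(y[0])
    mass = {}  # image id -> list of p_ik over classes k
    for labels, image in zip(y, image_of_pixel):
        row = mass.setdefault(image, [0.0] * num_classes)
        for k, value in enumerate(labels):
            row[k] += value / total
    # Terms with p_ik = 0 are skipped, i.e. 0 * log 0 is treated as 0 (assumption).
    return -sum(p * math.log(p) for row in mass.values() for p in row if p > 0)

# One image with four pixels (hypothetical): balanced labels vs. a single label.
image_of_pixel = [0, 0, 0, 0]
balanced = class_balance_entropy([[1, 0], [1, 0], [0, 1], [0, 1]], image_of_pixel)
degenerate = class_balance_entropy([[1, 0], [1, 0], [1, 0], [1, 0]], image_of_pixel)
```

Because H(y) enters the cost with a negative sign, the degenerate labeling (entropy 0) is penalized relative to the balanced one (entropy log 2).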

Overall problem

Combining the unary and binary terms with the class-balancing term, we obtain the following problem:

$$\min_{\substack{y \in \{0,1\}^{N \times K} \\ y 1_K = 1_N}} \left[\min_{A \in \mathbb{R}^{K \times d},\; b \in \mathbb{R}^K} E_U(y, A, b)\right] + E_B(y) - H(y).$$

Probabilistic interpretation

We introduce $t_n \in \{0, 1\}^{|I|}$, indicating to which image pixel n belongs, and $z_n \in \{1, \dots, M\}$, giving some observable information for each pixel n.

The label y is a latent variable of the observable information z given x (x → y → z ← t), inducing an "explaining away" phenomenon:

the label $y_n$ and the variable $t_n$ compete to explain the observable information $z_n$.

More precisely, we suppose a bilinear model:

$$P(z_{nm} = 1 \mid t_{ni} = 1, y_{nk} = 1) = y_{nk}\, G_{ikm}\, t_{ni},$$

where $\sum_{m=1}^{N} G_{ikm} = 1$,

and an exponential family model for $Y = (y_1, \dots, y_N)$ given $X = (x_1, \dots, x_N)$ with unary parameters (A, b) and binary parameters L.

Our cost function is the mean-field variational approximation of the following (regularized) negative conditional log-likelihood of $Z = (z_1, \dots, z_N)$ given X and $T = (t_1, \dots, t_N)$ for our model:

$$\min_{\substack{A \in \mathbb{R}^{K \times d},\; b \in \mathbb{R}^K,\; G \in \mathbb{R}^{N \times K|I|} \\ G^T 1_N = 1,\; G \ge 0}} -\frac{1}{N} \sum_{n=1}^{N} \log\big(p(z_n \mid x_n, t_n)\big) + \frac{\lambda}{2K} \|A\|_F^2.$$

Z can encode "must-link" and "must-not-link" constraints between pixels (e.g., superpixels).

EM procedure

$$\min_{\substack{y \in \{0,1\}^{N \times K} \\ y 1_K = 1_N}} \left[\min_{A \in \mathbb{R}^{K \times d},\; b \in \mathbb{R}^K} E_U(y, A, b)\right] + E_B(y) - H(y).$$

This cost function is not jointly convex in y and (A, b).

However, it is convex in each of them separately.

We alternately optimize over each variable while fixing the other:

We use L-BFGS for (A, b).

We use projected gradient descent for y.
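The y-update can be illustrated with a generic projected gradient step. The sketch below assumes that the binary labels are relaxed so that each row of y lies on the probability simplex (non-negative entries summing to 1), and it uses the standard sort-based Euclidean projection; the slides do not specify the projection used, so both the relaxation and the helper names are assumptions. The (A, b) step with L-BFGS is not shown.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1} (sort-based method)."""
    sorted_desc = sorted(v, reverse=True)
    cumulative, theta = 0.0, 0.0
    for j, value in enumerate(sorted_desc, start=1):
        cumulative += value
        candidate = (cumulative - 1.0) / j
        if value - candidate > 0:
            theta = candidate  # the last index satisfying the condition sets the threshold
    return [max(x - theta, 0.0) for x in v]

def projected_gradient_step(y, grad, step):
    """Move each row of y against its gradient, then project it back onto the simplex."""
    return [project_to_simplex([a - step * g for a, g in zip(row, grad_row)])
            for row, grad_row in zip(y, grad)]

# Toy step (hypothetical gradient): one pixel, two classes.
y_next = projected_gradient_step([[0.5, 0.5]], [[1.0, -1.0]], step=0.25)
```

In an alternating scheme, a step like this would be interleaved with a refit of (A, b) for the current y.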

The initialization

Since our problem is not convex, a good initialization is crucial.

We propose a quadratic convex approximation related to Joulin et al. (CVPR'10).

A quadratic function may lead to poor solutions, thus we also use random initializations.

Initialization: Quadratic approximation

The second-order Taylor expansion of our cost function is:

$$J(y) = \frac{K}{2}\left[\operatorname{tr}(y y^T C) + \frac{2\mu}{NK} \operatorname{tr}(y y^T L) - \frac{1}{N} \operatorname{tr}(y y^T \Pi_I)\right],$$

where $C = \frac{1}{N} \Pi_N \left(I - \Phi (N \lambda I_K + \Phi^T \Pi_N \Phi)^{-1} \Phi^T\right) \Pi_N$ is related to the reweighted ridge regression classifier (Joulin et al. CVPR'10).

This is not convex because of the last term, which can be replaced by the following linear constraints:

$$\sum_{n \in N_i} y_{nk} \le 0.9\, N_i; \qquad \sum_{j \in I \setminus i} \sum_{n \in N_j} y_{nk} \ge 0.1\, (N - N_i).$$

We obtain a formulation similar to Joulin et al. (CVPR'10).
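The two linear constraints can be checked for a given labeling with a few lines of code. This is only a sketch: the function name, the `image_of_pixel` list, and the toy labelings are hypothetical, while the 0.9 and 0.1 bounds are the ones quoted on the slide.

```python
def satisfies_balance_constraints(y, image_of_pixel, low=0.1, high=0.9):
    """Check, for every image i and class k, that
    sum_{n in N_i} y_nk <= high * N_i and
    sum_{n not in N_i} y_nk >= low * (N - N_i)."""
    total = len(y)
    num_classes = len(y[0])
    for i in set(image_of_pixel):
        size_i = sum(1 for image in image_of_pixel if image == i)
        for k in range(num_classes):
            inside = sum(row[k] for row, image in zip(y, image_of_pixel) if image == i)
            outside = sum(row[k] for row, image in zip(y, image_of_pixel) if image != i)
            if inside > high * size_i or outside < low * (total - size_i):
                return False
    return True

# Two images with two pixels each (hypothetical labelings).
image_of_pixel = [0, 0, 1, 1]
mixed_ok = satisfies_balance_constraints([[1, 0], [0, 1], [1, 0], [0, 1]], image_of_pixel)
single_label_ok = satisfies_balance_constraints([[1, 0]] * 4, image_of_pixel)
```

A labeling that puts every pixel in one class violates the upper bound, which is how these constraints rule out the degenerate solution.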

Results

Binary segmentation (foreground/background) on MSRC:

High variability in foreground and background,

around 30 images per class,

we use SIFT features.

Multiclass cosegmentation on iCoseg:

Low variability in the images, same illumination...

around 10 images per class,

we use color histograms.

Some extensions:

GrabCut,

weakly supervised problems,

video key frames.

Binary cosegmentation

class     Ours   Kim et al. (ICCV'11)   Joulin et al. (CVPR'10)
Bike      43.3   29.9                   42.3
Bird      47.7   29.9                   33.2
Car       59.7   37.1                   59.0
Cat       31.9   24.4                   30.1
Chair     39.6   28.7                   37.6
Cow       52.7   33.5                   45.0
Dog       41.8   33.0                   41.3
Face      70.0   33.2                   66.2
Flower    51.9   40.2                   50.9
House     51.0   32.2                   50.5
Plane     21.6   25.1                   21.7
Sheep     66.3   60.8                   60.4
Sign      58.9   43.2                   55.2
Tree      67.0   61.2                   60.0
Average   50.2   36.6                   46.7

Multiple cosegmentation

class             K   Ours   Joulin et al. (CVPR'10)   Kim et al. (ICCV'11)
Baseball player   5   62.2   53.5                      51.1
Brown bear        3   75.6   78.5                      40.4
Elephant          4   65.5   51.2                      43.5
Ferrari           4   65.2   63.2                      60.5
Football player   5   51.1   38.8                      38.3
Helicopter        3   43.3   67.8                      7.3
Kite Panda        2   57.8   58.0                      66.2
Monk              2   77.6   76.9                      71.3
Panda             3   55.9   49.1                      39.4
Skating           2   64.0   47.2                      51.1
Stonehenge        3   86.3   85.4                      64.6
Plane             3   45.8   39.2                      25.2
Face              3   70.5   56.4                      33.2
Average               64.8   58.1                      48.7

Extensions

GrabCut.

Weakly supervised learning with image tags ({plane, sheep, sky, grass}).

Video shot segmentation.

Limitations

Number of classes:

Each class must be present in each image (because of the entropy term).

Running time: about half an hour to one hour (MATLAB implementation).

Thank you.
