harmony voc 2010 - university of oxfordhost.robots.ox.ac.uk/pascal/voc/voc2010/workshop/cvc.pdf ·...

Harmony Potentials: Fusing Global and Local Scale for Semantic Image Segmentation

J. M. Gonfaus X. Boix

F. S. Khan J. van de Weijer

A. Bagdanov M. Pedersoli

J. Serrat X. Roca

J. Gonzàlez

Motivation (I)

•  Why combine global and local scale?

•  Classification is often impossible based on local appearance only.

[Figure: image classifier scores on a 0 to 1 axis for Chair, Plant, Sofa, Bus, Aeroplane]

Context is a powerful and distinctive cue.

Motivation (II)

•  How can we improve local classifiers?

[Figure: classifier scores on a 0 to 1 axis for Dog, Cat, Horse, Cow, Aeroplane]

Is this the foreground or the background of…

[Figure: classifier scores on a 0 to 1 axis for Dog, Cat, Horse, Cow, Aeroplane]

Is this object X or some other object?

Why not combine them?

•  Bad class discrimination, good figure segmentation
•  Inaccurate segmentation, good class discrimination

Motivation (II)

•  How can we improve local classifiers?
–  More information sources
   •  Mid-level information through object detectors

Outline

•  Overview of our method
•  How to fuse local and global scale
–  Harmony Potentials*
–  CVC_Harmony submission (35.4% on test)
•  Improving local classifiers
–  CVC_Harmony+Det submission (40.1% on test)
•  Results
•  Conclusions

*J. M. Gonfaus, X. Boix, J. van de Weijer, A. D. Bagdanov, J. Serrat, J. Gonzàlez, “Harmony Potentials for Joint Classification and Segmentation”, in CVPR 2010

Overview of our method


•  Unsupervised segmentation
–  Around 500 superpixels/image

•  Superpixel nodes (CVC_Harmony)
–  Unary potential
   •  BoW inside the superpixel AND over its neighborhood
–  Smoothness potential
   •  Pairwise Potts potential
–  BoW
   •  SIFT, RGB histogram, SSIM
   •  Multiscale: 12, 24, 36, 48 square patches
   •  Step size: 50% of the patch
   •  Quantized to 1000, 400, 300 words
   •  Learned with an SVM on 8000 samples + retraining
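A minimal sketch of the dense multiscale sampling behind the BoW features: square patches of side 12, 24, 36 and 48 on a grid whose step is 50% of the patch side, followed by a normalized word histogram. The function names, the pixel units and the top-left grid origin are assumptions made for illustration; the slides do not give these details.

```python
def dense_patch_grid(width, height, patch_sizes=(12, 24, 36, 48), step_fraction=0.5):
    """Return (x, y, size) for every square patch that fits inside the image.

    The grid step is a fraction of the patch side (50% by default).
    Units are assumed to be pixels.
    """
    patches = []
    for size in patch_sizes:
        step = max(1, int(size * step_fraction))
        for y in range(0, height - size + 1, step):
            for x in range(0, width - size + 1, step):
                patches.append((x, y, size))
    return patches


def bow_histogram(word_ids, vocabulary_size):
    """Normalized bag-of-words histogram from quantized descriptor indices."""
    counts = [0] * vocabulary_size
    for word in word_ids:
        counts[word] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else counts
```

Each descriptor type would get its own vocabulary (1000, 400 or 300 words on the slide), so `bow_histogram` would be called once per descriptor type.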




•  Superpixel nodes (CVC_Harmony+det)
–  Unary potential
   •  BoW inside the superpixel AND over its neighborhood
   •  Detection scores
   •  Location prior
–  Smoothness potential
   •  Pairwise Potts potential
–  BoW
   •  SIFT, RGB histogram, SSIM
   •  Multiscale: 12, 24, 36, 48 square patches
   •  Step size: 50% of the patch
   •  Quantized to 1000, 400, 300 words
   •  Learned with an SVM on 8000 samples + retraining




•  Global node
–  Unary potential
   •  Global classifier method
   •  CVC_flat submission: 61% mAP on the classification task
–  Consistency potential
   •  From the global node to each superpixel
   •  Harmony potential

Model

Unary Potential, Smoothness Potential, Consistency Potential
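One way to read the three terms is as an energy over the superpixel labels plus one global label. The sketch below is illustrative only and rests on stated assumptions: the smoothness term is a Potts penalty with a constant weight, the global node takes a set of classes as its label, and the consistency term charges a constant for every superpixel whose label is not in that set. The exact definitions are in the CVPR 2010 paper cited in the outline; all names here are hypothetical.

```python
def crf_energy(labels, global_labels, unary, edges, global_unary,
               smooth_weight=1.0, consistency_weight=1.0):
    """Energy of one labeling (lower is better).

    labels        -- one class label per superpixel
    global_labels -- set of classes assigned to the global node (assumption)
    unary         -- unary[i][label] is the cost of `label` at superpixel i
    edges         -- pairs (i, j) of neighboring superpixels
    global_unary  -- callable giving the cost of a global label set
    """
    # Unary term: per-superpixel classifier cost.
    energy = sum(unary[i][label] for i, label in enumerate(labels))
    # Smoothness term: Potts penalty for neighbors with different labels.
    energy += sum(smooth_weight for i, j in edges if labels[i] != labels[j])
    # Global node unary term.
    energy += global_unary(global_labels)
    # Consistency term (assumed form): penalize labels absent from the global set.
    energy += sum(consistency_weight for label in labels if label not in global_labels)
    return energy
```

Under this reading, a global label set that lists extra classes costs nothing in the consistency term; only superpixel labels missing from the set are penalized.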

Consistency potential

•  Ground-Truth
•  Unary Potentials
•  Potts-based Potentials
•  Robust P^N Potentials
•  Harmony Potentials

[Figure: comparison of the potentials listed above against the ground truth (GT); one panel is labelled "Free"]

Considering all possible label combinations is infeasible.

Prior: from the training data we extract the co-occurrence statistics of labels.

Likelihood: image classification scores each combination.

•  Ranked subsampling of the label combinations: only the few best combinations are required to saturate the performance.
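The prior/likelihood ranking can be sketched as follows. This is a simplified illustration, not the authors' procedure: it assumes candidate combinations are limited to label sets seen in training, that the prior is their relative frequency, and that the likelihood treats per-class classification scores as independent probabilities. All function names are hypothetical.

```python
from collections import Counter


def cooccurrence_prior(training_label_sets):
    """Relative frequency of each label combination in the training data."""
    counts = Counter(frozenset(s) for s in training_label_sets)
    total = sum(counts.values())
    return {combo: n / total for combo, n in counts.items()}


def combination_likelihood(combo, class_scores):
    """Score a combination from per-class image classification scores in [0, 1].

    Assumes classes are independent: present classes contribute their score,
    absent classes contribute one minus their score.
    """
    p = 1.0
    for cls, score in class_scores.items():
        p *= score if cls in combo else (1.0 - score)
    return p


def ranked_subsample(training_label_sets, class_scores, keep=3):
    """Keep the `keep` best combinations ranked by prior times likelihood."""
    prior = cooccurrence_prior(training_label_sets)
    scored = [(prior[c] * combination_likelihood(c, class_scores), c) for c in prior]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [combo for _, combo in scored[:keep]]
```

Only the retained combinations would then be offered as candidate labels for the global node, which keeps inference tractable.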

Unary potential

•  Local classifiers are weak classifiers
–  Too ambiguous because little information is used

•  Combining multiple classifiers makes our local unary potential stronger.

•  Features:
–  foreground/background
–  class versus others
–  object detections
–  spatial location prior

Ffg-bg: Fore-Background

•  Easy to identify whether the superpixel belongs to the object class or to its common background

Fclass: Class vs. other classes

•  Learning how different an object is from its common background becomes difficult for certain class combinations
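The difference between the two classifiers can be pictured through how their training sets might be assembled. The sketch below is an interpretation, not the authors' documented procedure: it assumes the foreground/background negatives are background superpixels from images that contain the target class, and that the class-vs-others negatives are superpixels of other object classes. The `"background"` label and the function names are hypothetical.

```python
def fg_bg_training_set(superpixels, target):
    """Positives: superpixels of `target`. Negatives: its common background.

    superpixels -- list of (image_id, label) pairs
    """
    images_with_target = {img for img, label in superpixels if label == target}
    positives = [(img, label) for img, label in superpixels if label == target]
    negatives = [(img, label) for img, label in superpixels
                 if label == "background" and img in images_with_target]
    return positives, negatives


def class_vs_others_training_set(superpixels, target):
    """Positives: superpixels of `target`. Negatives: other object classes."""
    positives = [(img, label) for img, label in superpixels if label == target]
    negatives = [(img, label) for img, label in superpixels
                 if label not in (target, "background")]
    return positives, negatives
```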


Fposition: Location prior

•  Objects tend to appear in class-specific, particular locations (and not at the borders)
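A class-specific location prior could be estimated as a coarse spatial histogram of where each class occurs in the training masks. This is a sketch under assumptions: the grid resolution, the normalized coordinates and the function name are illustrative choices, not details given on the slides.

```python
def location_prior(training_points, grid=4):
    """Per-class spatial histogram over a grid x grid partition of the image.

    training_points -- iterable of (class_name, x, y) with x, y normalized to [0, 1]
    Returns {class_name: {(row, col): probability}}.
    """
    counts = {}
    for cls, x, y in training_points:
        # Clamp so that a coordinate of exactly 1.0 falls in the last cell.
        cell = (min(int(y * grid), grid - 1), min(int(x * grid), grid - 1))
        class_counts = counts.setdefault(cls, {})
        class_counts[cell] = class_counts.get(cell, 0) + 1
    prior = {}
    for cls, cells in counts.items():
        total = sum(cells.values())
        prior[cls] = {cell: n / total for cell, n in cells.items()}
    return prior
```

A superpixel's location feature would then be the prior value of the cell containing its centroid, looked up per class.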

Fdet: Object detector* scores

•  Mid-level information is added by considering object detections [Felzenszwalb et al. 2010].

•  Average over the superpixel area of the maximum detection score at each pixel.

•  Scores lie in [-1, ∞)

•  A class-specific "no detection" score is learned.
•  Keeps the CRF and the model simple.

*Felzenszwalb, Girshick, McAllester, Ramanan, “Object Detection with Discriminatively Trained Part Based Models”, PAMI 2010
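The detection feature for one class can be sketched as below: each pixel takes the maximum score among the detections covering it, and the superpixel feature is the average over its pixels. Two points are assumptions for illustration: detections are represented as axis-aligned boxes with a score, and pixels covered by no detection receive the class-specific "no detection" score. The function name is hypothetical.

```python
def detection_feature(superpixel_pixels, detections, no_detection_score):
    """Average, over a superpixel, of the per-pixel maximum detection score.

    superpixel_pixels  -- non-empty list of (x, y) pixel coordinates
    detections         -- list of (x0, y0, x1, y1, score); x1 and y1 are exclusive
    no_detection_score -- value used where no detection covers the pixel (assumption)
    """
    if not superpixel_pixels:
        raise ValueError("superpixel must contain at least one pixel")
    total = 0.0
    for x, y in superpixel_pixels:
        covering = [score for x0, y0, x1, y1, score in detections
                    if x0 <= x < x1 and y0 <= y < y1]
        total += max(covering) if covering else no_detection_score
    return total / len(superpixel_pixels)
```

Because the detector output is folded into the unary term this way, the CRF needs no extra nodes or potentials for detections.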


Results on validation set 2010

Mean Average Precision (chart axis from 15 to 40):

Features         MAP
Fg_Bk            33.2
Class            23.4
Loc              20
Det              26
Fg_Bk + Loc      34.5
Fg_Bk + Class    36.6
All              40.1

Chart annotation: 2009 submission

Combination of features

•  Naïve Bayes approach
•  Specific sigmoid per class and per classifier

$$\phi(x_i) = \prod_{f \in F} \frac{1}{1 + \exp(-a_f x_i^f + b_f)}$$

•  Total number of parameters to be learned:

2x20x4 (feature sigmoids) + 20 (no_detection score) + 4 (CRF weights) + 1 (background probability) = 185 parameters

•  All parameters are jointly optimized by stochastic steepest ascent
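The product of per-feature sigmoids and the parameter count can be checked with a few lines. This is a minimal sketch: the sign convention inside the exponential follows the slide as extracted, and the function and argument names are illustrative. The optimization by stochastic steepest ascent is not shown.

```python
import math


def sigmoid(value, a, b):
    """Sigmoid with slope a and offset b, written as on the slide."""
    return 1.0 / (1.0 + math.exp(-a * value + b))


def combined_unary(feature_scores, params):
    """Naive Bayes style product of per-feature sigmoids for one class.

    feature_scores -- {feature_name: raw classifier score}
    params         -- {feature_name: (a, b)} learned for this class
    """
    result = 1.0
    for feature, score in feature_scores.items():
        a, b = params[feature]
        result *= sigmoid(score, a, b)
    return result


def parameter_count(num_classes=20, num_features=4, num_crf_weights=4):
    """2 sigmoid parameters per class and feature, one no-detection score per
    class, the CRF weights, and one background probability."""
    return 2 * num_classes * num_features + num_classes + num_crf_weights + 1
```

With the defaults, `parameter_count()` gives 160 + 20 + 4 + 1 = 185, matching the total on the slide.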

Results on validation set 2010

Mean Average Precision (chart axis from 15 to 40):

Features         MAP
Fg_Bk            33.2
Class            23.4
Loc              20
Det              26
Fg_Bk + Loc      34.5
Fg_Bk + Class    36.6
All              39.2

Chart annotations:
•  2009 submission
•  CVC_Harmony 2010 submission: 35.4 on test
•  CVC_Harmony_Det 2010 submission: 40.1 on test

Illustrative examples

[Figure: per-example maps combined as Fg/bg * class * det * loc = final unary]

Final results

Conclusions

•  The harmony potential is an effective way to fuse global and local scales for semantic image segmentation.

•  We have focused on improving the local classifiers.

•  Baseline: 29%
–  + combining fg/bg and multiclass classifiers (+2%)
–  + object detection (+3%)
–  + location prior (+1%)
–  + per-class parameter optimization (+5%)

More details: http://iselab.cvc.uab.es/pvoc2010

Thanks for your attention!

Full Practical Example

Ffgbg: Fore-Background

Fclass: Class against other classes

Close-up comparison

Fore-Background learning

Class against others learning

Ffgbg * Fclass

Fdet: Detector Scores

Ffgbg * Fclass * Fdet

Flocation: Location Prior

Ffgbg * Fclass * Fdet * Floc

Result
