Harmony VOC 2010 - University of Oxford, host.robots.ox.ac.uk/pascal/voc/voc2010/workshop/cvc.pdf
Harmony Potentials: Fusing Global and Local Scale for Semantic Image Segmentation
J. M. Gonfaus X. Boix
F. S. Khan J. van de Weijer
A. Bagdanov M. Pedersoli
J. Serrat X. Roca
J. Gonzàlez
Motivation (I)
• Why combine global and local scale?
• Classification is often impossible based on local appearance only.
[Figure: image classifier scores on a 0 to 1 axis for Chair, Plant, Sofa, Bus, Aeroplane]
Context is a powerful and distinctive cue
Motivation (II)
• How can we improve local classifiers?
• Is this the foreground or the background of…
  [Figure: classifier scores on a 0 to 1 axis for Dog, Cat, Horse, Cow, Aeroplane]
  – Good figure segmentation
  – Bad class discrimination
• Is this object X or some other object
  [Figure: classifier scores on a 0 to 1 axis for Dog, Cat, Horse, Cow, Aeroplane]
  – Inaccurate segmentation
  – Good class discrimination
• Why not combine them?
Motivation (II)
• How can we improve local classifiers?
  – More information sources
    • Mid-level information through object detectors
Outline
• Overview of our method
• How to fuse local and global scale
  – Harmony Potentials*
  – CVC_Harmony submission (35.4% on test)
• Improving local classifiers
  – CVC_Harmony+Det submission (40.1% on test)
• Results
• Conclusions
*J. M. Gonfaus, X. Boix, J. van de Weijer, A. D. Bagdanov, J. Serrat, J. Gonzàlez, “Harmony Potentials for Joint Classification and Segmentation”, in CVPR 2010
Overview of our method
• Unsupervised segmentation.
  – Around 500 superpixels/image
Overview of our method (CVC_Harmony)
• Unsupervised segmentation.
• Superpixel nodes
  – Unary potential
    • BoW inside AND neighborhood
  – Smoothness potential
    • Pairwise Potts potential
  – BoW
    • SIFT, RGB Histogram, SSIM
    • Multiscale: 12, 24, 36, 48 square patches
    • Step size 50% of the patch
    • Quantized to 1000, 400, 300 words
    • Learned on SVM with 8000 samples + retraining
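The multiscale sampling listed above can be illustrated with a small sketch. It assumes the patches lie on a regular grid, that "step size 50% of the patch" means a stride of half the patch side, and that patches crossing the image border are skipped. The slides do not confirm any of these details, and the function name `dense_patch_grid` is hypothetical.

```python
def dense_patch_grid(height, width, sizes=(12, 24, 36, 48)):
    """Return (top, left, size) for square patches on a regular grid.

    Assumptions (not stated on the slides): the stride is size // 2,
    i.e. 50% of the patch side, and only patches that fit entirely
    inside the image are kept.
    """
    patches = []
    for size in sizes:
        step = size // 2
        for top in range(0, height - size + 1, step):
            for left in range(0, width - size + 1, step):
                patches.append((top, left, size))
    return patches


# Example on a hypothetical 96x96 image.
patches = dense_patch_grid(96, 96)
```

Each patch would then be described by its features (SIFT, RGB histogram, SSIM) and quantized into visual words; that part is not sketched here.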
Overview of our method (CVC_Harmony+det)
• Unsupervised segmentation.
• Superpixel nodes
  – Unary potential
    • BoW inside AND neighborhood
    • Detection scores
    • Location prior
  – Smoothness potential
    • Pairwise Potts potential
  – BoW
    • SIFT, RGB Histogram, SSIM
    • Multiscale: 12, 24, 36, 48 square patches
    • Step size 50% of the patch
    • Quantized to 1000, 400, 300 words
    • Learned on SVM with 8000 samples + retraining
Overview of our method
• Unsupervised segmentation.
• Superpixel nodes
• Global node
  – Unary potential:
    • Global classifier method
    • CVC_flat submission: mAP 61% for classification task
  – Consistency potential
    • From global node to each superpixel
    • Harmony Potential
Model
• Unary Potential
• Smoothness Potential
• Consistency Potential
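A CRF of this kind is usually written as an energy that sums the three terms. The following is a sketch under assumed notation, not the formula from the slide: $x_i$ is the label of superpixel $i$, $x_g$ the label set of the global node, $\mathcal{V}$ the set of superpixels and $\mathcal{N}$ the set of neighboring superpixel pairs.

```latex
E(\mathbf{x}) =
  \underbrace{\sum_{i \in \mathcal{V} \cup \{g\}} \psi^{U}(x_i)}_{\text{unary}}
  + \underbrace{\sum_{(i,j) \in \mathcal{N}} \psi^{S}(x_i, x_j)}_{\text{smoothness}}
  + \underbrace{\sum_{i \in \mathcal{V}} \psi^{C}(x_i, x_g)}_{\text{consistency}}
```

Under this reading, inference looks for the labeling of superpixels and global node that minimizes $E$.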
Model: Consistency Potential
Consistency potential
[Figure: example labelings compared with the ground truth; slide labels: GT, Free]
• Ground-Truth
• Unary Potentials
• Potts-based Potentials
• Robust P^N Potentials
• Harmony Potentials
Consistency potential
• Evaluating all possible label combinations is infeasible
• Prior: from the training data we extract the co-occurrence statistics of labels
• Likelihood: image classification scores each combination
• Ranked subsampling: only the few best combinations are required to saturate the performance
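One way to picture the ranked subsampling is the sketch below. It is an illustration under stated assumptions, not the authors' procedure: the prior is taken as the smoothed relative frequency of each label set in the training annotations, the likelihood treats the per-class classifier outputs as independent probabilities, and all subsets are enumerated, which only works for a handful of classes. The function name and the toy data are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def rank_label_combinations(train_label_sets, class_probs, k=3, smoothing=1e-3):
    """Return the k label combinations with the highest prior * likelihood."""
    eps = 1e-9  # keeps log() finite when a probability is exactly 0 or 1
    classes = sorted(class_probs)
    counts = Counter(frozenset(s) for s in train_label_sets)
    total = sum(counts.values())
    scored = []
    for r in range(1, len(classes) + 1):
        for combo in combinations(classes, r):
            labels = frozenset(combo)
            # Prior: how often this exact label set occurs in training.
            log_prior = math.log((counts[labels] + smoothing) / (total + smoothing))
            # Likelihood: classes in the set should score high, others low.
            log_lik = 0.0
            for c in classes:
                p = min(max(class_probs[c], eps), 1.0 - eps)
                log_lik += math.log(p if c in labels else 1.0 - p)
            scored.append((log_prior + log_lik, labels))
    scored.sort(key=lambda item: item[0], reverse=True)
    return [labels for _, labels in scored[:k]]


# Toy data: three classes and five training images.
train = [{"dog"}, {"dog", "cat"}, {"horse"}, {"dog"}, {"cat"}]
probs = {"dog": 0.9, "cat": 0.6, "horse": 0.1}
top = rank_label_combinations(train, probs, k=2)
```

Only the returned combinations would then be considered as states of the global node, which keeps the state space small.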
Model: Unary Potential
Unary potential
• Local classifiers are weak classifiers
  – Too ambiguous because little information is used
• Combining multiple classifiers makes our local unary potential stronger.
• Features:
  – foreground/background
  – class versus others
  – object detections
  – spatial location prior
Ffg-bg: Foreground/Background
• It is easy to identify whether the superpixel belongs to the object class or to its common background
Fclass: Class vs. other classes
• Learning how different an object is from its common background becomes difficult for certain class combinations
[Figure: panels labeled Foreground and Background]
Fposition: Location prior
• Objects tend to appear in class-specific, particular locations (and not at the borders)
Fdet: Object detector* scores
• Mid-level information is added by considering object detections [Felzenszwalb et al. 2010].
• Per-superpixel score: take the maximum detection score at each pixel, then average over the superpixel area.
• Scores lie in [-1, ∞)
• A class-specific “No detection” score is learned.
• Keeps the CRF and the model simple.
*Felzenszwalb, Girshick, McAllester, Ramanan, “Object Detection with Discriminatively Trained Part Based Models”, PAMI 2010
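The detector feature described for Fdet can be sketched as follows. This is a minimal illustration, assuming detections arrive as axis-aligned boxes with a score, that pixels covered by no box receive the "no detection" score, and that this score is passed in as a fixed value (on the slides it is learned per class). The function name and box format are hypothetical.

```python
def superpixel_detection_scores(superpixels, boxes, no_detection_score=-1.0):
    """Average, per superpixel, of the per-pixel maximum detection score.

    superpixels: 2D list of superpixel ids, one per pixel.
    boxes: list of (top, left, bottom, right, score), bottom/right exclusive.
    no_detection_score: value for pixels not covered by any box
        (assumed fixed here; the slides say it is learned per class).
    """
    height, width = len(superpixels), len(superpixels[0])
    pixel_scores = [[None] * width for _ in range(height)]
    # Per-pixel maximum over all boxes covering the pixel.
    for top, left, bottom, right, score in boxes:
        for r in range(max(top, 0), min(bottom, height)):
            for c in range(max(left, 0), min(right, width)):
                current = pixel_scores[r][c]
                if current is None or score > current:
                    pixel_scores[r][c] = score
    # Average the per-pixel scores over each superpixel's area.
    sums, areas = {}, {}
    for r in range(height):
        for c in range(width):
            value = pixel_scores[r][c]
            if value is None:
                value = no_detection_score
            sp = superpixels[r][c]
            sums[sp] = sums.get(sp, 0.0) + value
            areas[sp] = areas.get(sp, 0) + 1
    return {sp: sums[sp] / areas[sp] for sp in sums}


# Toy 4x4 image: superpixel 0 is the left half, superpixel 1 the right half.
superpixels = [[0, 0, 1, 1] for _ in range(4)]
boxes = [(0, 0, 2, 2, 0.5), (1, 1, 3, 3, 1.5)]
scores = superpixel_detection_scores(superpixels, boxes)
```

Because the detector output is folded into a per-superpixel unary feature, no extra nodes or edges are needed in the CRF, which matches the remark that the model stays simple.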
Results on validation set 2010
[Bar chart: Mean Average Precision per feature combination, axis from 15 to 40, with the 2009 submission marked for reference]
• Fg_Bk: 33.2
• Class: 23.4
• Loc: 20
• Det: 26
• Fg_Bk + Loc: 34.5
• Fg_Bk + Class: 36.6
• All: 40.1
Combination of features
• Naïve Bayes approach
• Specific sigmoid per class and per classifier
\[ \phi(x_i) = \prod_{f \in F} \frac{1}{1 + \exp(-a_f x_i^f + b_f)} \]
• Total number of parameters to be learned:
  2x20x4 + 20 + 4 + 1 = 185 parameters
  – 2x20x4: feature sigmoids
  – 20: no_detection score
  – 4: CRF weights
  – 1: background probability
• All parameters are jointly optimized by stochastic steepest ascent
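The product of sigmoids and the parameter count can be checked with a short sketch. It assumes the unary score for one class is the product over the four classifiers of a sigmoid with slope a_f and offset b_f, written with the same sign convention as the slide's formula. The dictionary layout, the feature names and the toy values are hypothetical.

```python
import math


def combined_unary(scores, a, b):
    """Product over classifiers f of 1 / (1 + exp(-a[f] * scores[f] + b[f]))."""
    result = 1.0
    for f in scores:
        result *= 1.0 / (1.0 + math.exp(-a[f] * scores[f] + b[f]))
    return result


FEATURES = ("fg_bg", "class", "det", "loc")  # the four local classifiers
NUM_CLASSES = 20
NUM_CRF_WEIGHTS = 4

# Two sigmoid parameters (a, b) per class and per classifier, one
# "no detection" score per class, the CRF weights, and one background
# probability.
num_parameters = (
    2 * NUM_CLASSES * len(FEATURES) + NUM_CLASSES + NUM_CRF_WEIGHTS + 1
)

# Toy values: with a = 1, b = 0 and all scores 0, every sigmoid equals 0.5.
scores = {f: 0.0 for f in FEATURES}
a = {f: 1.0 for f in FEATURES}
b = {f: 0.0 for f in FEATURES}
phi = combined_unary(scores, a, b)
```

The joint optimization of these parameters by stochastic steepest ascent is not sketched here, since the slides give no detail on the update rule.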
Results on validation set 2010
[Bar chart: Mean Average Precision per feature combination, axis from 15 to 40, with the 2009 submission marked for reference]
• Fg_Bk: 33.2
• Class: 23.4
• Loc: 20
• Det: 26
• Fg_Bk + Loc: 34.5
• Fg_Bk + Class: 36.6
• All: 39.2
CVC_Harmony 2010 submission: 35.4 on test
CVC_Harmony_Det 2010 submission: 40.1 on test
Illustrative examples
[Figure: per-example maps combined as Fg/bg * class * det * loc = final unary]
Final results
Conclusions
• The harmony potential is an effective way to fuse global and local scales for semantic image segmentation.
• We have focused on improving the local classifiers
• Baseline: 29%
  + combining fg/bg and multiclass classifiers (+2%)
  + object detection (+3%)
  + location prior (+1%)
  + per class parameter optimization (+5%)
more details: http://iselab.cvc.uab.es/pvoc2010
Thanks for your attention!
Gràcies per la vostra atenció!
Ευχαριστω για την προσοχη σας
Full Prac,cal Example
Ffgbg: Foreground/Background
Fclass: Class against other classes
Close-up comparison
Foreground/Background learning
Class against others learning
Ffgbg * Fclass
Fdet: Detector Scores
Ffgbg * Fclass * Fdet
Flocation: Location Prior
Ffgbg * Fclass * Fdet * Floc
Result