Learning to Segment with Diverse Data
M. Pawan Kumar, Stanford University
Semantic Segmentation
[Example image labeled with classes: sky, tree, car, road, grass]
Segmentation Models
[Image x with its segmentation y: sky, tree, car, road, grass]
MODEL w
x (image), y (labeling), P(x, y; w)
Learn accurate parameters
y* = argmax_y P(x, y; w)
P(x, y; w) ∝ exp(-E(x, y; w))
y* = argmin_y E(x, y; w)
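The two formulations are equivalent because exp(-E) is monotone decreasing in E. A tiny brute-force sketch (toy unary-only energy with made-up numbers, not the talk's model) makes this concrete:

```python
import itertools
import math

# Toy setup: 3 "pixels", 2 labels, hypothetical unary energies.
# E(x, y; w) is a sum of unary terms; P(x, y; w) is proportional to exp(-E).
unary = [[0.2, 1.5], [1.0, 0.3], [0.1, 2.0]]  # unary[pixel][label]

def energy(y):
    return sum(unary[p][l] for p, l in enumerate(y))

labelings = list(itertools.product([0, 1], repeat=3))
y_min_energy = min(labelings, key=energy)
y_max_prob = max(labelings, key=lambda y: math.exp(-energy(y)))

# Minimizing the energy and maximizing the (unnormalized) probability
# pick the same labeling, since exp(-E) is monotone decreasing in E.
assert y_min_energy == y_max_prob
print(y_min_energy)  # (0, 1, 0)
```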
Fully Supervised Data
“Fully” Supervised Data: specific foreground classes, generic background class
PASCAL VOC Segmentation Datasets
“Fully” Supervised Data: specific background classes, generic foreground class
Stanford Background Datasets
J. Gonfaus et al. Harmony Potentials for Joint Classification and Segmentation. CVPR, 2010.
S. Gould et al. Multi-Class Segmentation with Relative Location Prior. IJCV, 2008.
S. Gould et al. Decomposing a Scene into Geometric and Semantically Consistent Regions. ICCV, 2009.
X. He et al. Multiscale Conditional Random Fields for Image Labeling. CVPR, 2004.
S. Konishi et al. Statistical Cues for Domain Specific Image Segmentation with Performance Analysis. CVPR, 2000.
L. Ladicky et al. Associative Hierarchical CRFs for Object Class Image Segmentation. ICCV, 2009.
F. Li et al. Object Recognition as Ranking Holistic Figure-Ground Hypotheses. CVPR, 2010.
J. Shotton et al. TextonBoost: Joint Appearance, Shape and Context Modeling for Multi-Class Object Recognition and Segmentation. ECCV, 2006.
J. Verbeek et al. Scene Segmentation with Conditional Random Fields Learned from Partially Labeled Images. NIPS, 2007.
Y. Yang et al. Layered Object Detection for Multi-Class Segmentation. CVPR, 2010.
Supervised Learning
Generic classes, burdensome annotation
PASCAL VOC Detection Datasets
Thousands of images
Weakly Supervised Data: Bounding Boxes for Objects
“Car”
Weakly Supervised Data
Thousands of images
ImageNet, Caltech…
Image-Level Labels
B. Alexe et al. ClassCut for Unsupervised Class Segmentation. ECCV, 2010.
H. Arora et al. Unsupervised Segmentation of Objects Using Efficient Learning. CVPR, 2007.
L. Cao et al. Spatially Coherent Latent Topic Model for Concurrent Segmentation and Classification of Objects and Scenes. ICCV, 2007.
J. Winn et al. LOCUS: Learning Object Classes with Unsupervised Segmentation. ICCV, 2005.
Weakly Supervised Learning
Binary segmentation, limited data
Diverse Data: “Car”
Diverse Data Learning
• Avoid “generic” classes
• Take advantage of
– Cleanliness of supervised data
– Vast availability of weakly supervised data
Outline
• Model
• Energy Minimization
• Parameter Learning
• Results
• Future Work
Region-Based Model
Pixels
Regions
Gould, Fulton and Koller, ICCV 2009
Unary Potential: θ_r(i) = w_i^T Ψ_r(x)
For example, Ψ_r(x) = average [R G B] of region r
w_water = [0 0 -10], w_grass = [0 -10 0]
Ψ_r(x): features extracted from region r of image x
Pairwise Potential: θ_rr'(i,j) = w_ij^T Ψ_rr'(x)
For example, Ψ_rr'(x) = constant > 0
w_“car above ground” << 0, w_“ground above car” >> 0
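A minimal sketch of how these linear potentials score a region, using the slide's average-RGB example (the weights and the region's colour are illustrative, not learned values):

```python
import numpy as np

def unary_potential(w_i, psi_r):
    """theta_r(i) = w_i^T psi_r(x): energy of assigning class i to region r."""
    return float(np.dot(w_i, psi_r))

# Hypothetical example from the slide: psi_r(x) = average [R G B] of region r.
w_water = np.array([0.0, 0.0, -10.0])   # rewards blue regions (low energy)
w_grass = np.array([0.0, -10.0, 0.0])   # rewards green regions

blue_region = np.array([0.1, 0.2, 0.9])  # average RGB of a blue-ish region
print(unary_potential(w_water, blue_region))  # -9.0
print(unary_potential(w_grass, blue_region))  # -2.0
# Water gets lower energy than grass for a blue region, as intended.
```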
Region-based Model
E(x, y) ∝ -log P(x, y) = Unaries + Pairwise
E(x, y) = w^T Ψ(x, y)
Best segmentation of an image? Accurate w?
x y
Outline
• Model
• Energy Minimization
• Parameter Learning
• Results
• Future Work
Kumar and Koller, CVPR 2010
Move-Making
Besag. On the Statistical Analysis of Dirty Pictures. JRSS, 1986.
Boykov et al. Fast Approximate Energy Minimization via Graph Cuts. PAMI, 2001.
Komodakis et al. Fast, Approximately Optimal Solutions for Single and Dynamic MRFs. CVPR, 2007.
Lempitsky et al. Fusion Moves for Markov Random Field Optimization. PAMI, 2010.

Message-Passing
T. Minka. Expectation Propagation for Approximate Bayesian Inference. UAI, 2001.
Murphy. Loopy Belief Propagation: An Empirical Study. UAI, 1999.
J. Winn et al. Variational Message Passing. JMLR, 2005.
J. Yedidia et al. Generalized Belief Propagation. NIPS, 2001.

Convex Relaxations
Chekuri et al. Approximation Algorithms for Metric Labeling. SODA, 2001.
M. Goemans et al. Improved Approximate Algorithms for Maximum-Cut. JACM, 1995.
M. Muramatsu et al. A New SOCP Relaxation for Max-Cut. JORJ, 2003.
Ravikumar et al. QP Relaxations for Metric Labeling. ICML, 2006.

Hybrid Algorithms
K. Alahari et al. Dynamic Hybrid Algorithms for MAP Inference. PAMI, 2010.
P. Kohli et al. On Partial Optimality in Multilabel MRFs. ICML, 2008.
C. Rother et al. Optimizing Binary MRFs via Extended Roof Duality. CVPR, 2007.
Which one is the best relaxation?
Convex Relaxations
[Timeline, by expected tightness: LP (1976), SOCP (2003), QP (2006)]
We expect the newer relaxations to be tighter…
Kumar, Kolmogorov and Torr, NIPS, 2007
Use LP!! LP is provably better than QP and SOCP.
Energy Minimization
Find Regions
Find Labels
Fixed Regions
LP Relaxation
Energy Minimization
Good region: homogeneous appearance and texture
Bad region: inhomogeneous appearance and texture
Low-level segmentation provides candidate regions
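The homogeneity criterion can be illustrated with a toy check (the intensity-variance measure and the 0.01 threshold are stand-ins for the talk's appearance and texture cues):

```python
import numpy as np

# A "good" candidate region has homogeneous appearance, a "bad" one mixes
# appearances. Variance of pixel intensity is a cheap proxy for that.
patch = np.array([0.80, 0.82, 0.79, 0.81])   # homogeneous, e.g. sky
mixed = np.array([0.80, 0.10, 0.78, 0.12])   # straddles two surfaces

print(float(np.var(patch)) < 0.01)   # True  -> good region
print(float(np.var(mixed)) < 0.01)   # False -> bad region
```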
Find Regions
Find Labels
Can we prune regions?
Super-exponential in Number of Pixels
Energy Minimization
Spatial Bandwidth = 10
Mean-Shift Segmentation
Energy Minimization
Spatial Bandwidth = 20
Mean-Shift Segmentation
Energy Minimization
Spatial Bandwidth = 30
Mean-Shift Segmentation
Energy Minimization
“Combine” Multiple Segmentations
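The "combine multiple segmentations" idea can be sketched as follows. A real implementation would run mean-shift at several spatial bandwidths; the quantization step here is only a cheap stand-in producing several over-segmentations of a toy image:

```python
import numpy as np

# Build a dictionary of candidate regions by pooling the segments from
# several over-segmentations at different granularities (standing in for
# mean-shift runs at different spatial bandwidths).

def segment(img, n_levels):
    """Stand-in over-segmentation: group pixels by quantized intensity."""
    q = np.floor(img * n_levels).clip(0, n_levels - 1)
    return q.ravel()                      # one segment label per pixel

img = np.array([[0.1, 0.1, 0.6, 0.9],
                [0.1, 0.2, 0.7, 0.9]])

dictionary = set()
for n_levels in (2, 4, 8):                # coarse to fine, like bandwidths
    labels = segment(img, n_levels)
    for lab in np.unique(labels):
        dictionary.add(frozenset(np.flatnonzero(labels == lab)))

# Every pixel is covered by candidate regions at every scale.
print(len(dictionary))  # 8 distinct candidate regions
```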
Car
Dictionary of Regions
Select Regions, Assign Classes
y_r(i) ∈ {0, 1}, for i = 0, 1, 2, …, C (i = 0: region not selected)
Selected regions cover entire image
No two selected regions overlap
min Σ_{r,i} θ_r(i) y_r(i) + Σ_{r,r',i,j} θ_rr'(i,j) y_r(i) y_r'(j)
Pixel
Regions
Kumar and Koller, CVPR 2010
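The selection problem can be made concrete with a tiny brute-force enumeration. The talk solves an LP relaxation instead, and the costs below are made up, but the covering and non-overlap constraints are the ones on the slide (pairwise terms omitted for brevity):

```python
import itertools

# Toy dictionary: 4 candidate regions over 4 pixels, C = 2 classes.
regions = [frozenset({0, 1}), frozenset({2, 3}),
           frozenset({0, 1, 2, 3}), frozenset({1, 2})]
C = 2  # classes 1..C; label 0 means "not selected"
theta = {  # hypothetical unary costs theta_r(i) for selected classes
    (0, 1): -2.0, (0, 2): 0.5,
    (1, 1): 1.0,  (1, 2): -3.0,
    (2, 1): -1.0, (2, 2): -1.0,
    (3, 1): 5.0,  (3, 2): 5.0,
}

best, best_cost = None, float("inf")
# Enumerate assignments y_r in {0, 1, ..., C} for every region.
for y in itertools.product(range(C + 1), repeat=len(regions)):
    selected = [r for r, lab in enumerate(y) if lab != 0]
    covered = [p for r in selected for p in regions[r]]
    # Constraint 1: selected regions cover the entire image.
    # Constraint 2: no two selected regions overlap (no pixel counted twice).
    if sorted(covered) != [0, 1, 2, 3]:
        continue
    cost = sum(theta[(r, y[r])] for r in selected)  # pairwise terms omitted
    if cost < best_cost:
        best, best_cost = y, cost

print(best, best_cost)  # (1, 2, 0, 0) -5.0: pick {0,1} as class 1, {2,3} as class 2
```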
Efficient DD: Komodakis and Paragios, CVPR, 2009
Comparison
[Bar charts: energy (0–6000) and accuracy (78–79.5%) for GOULD vs. OUR (LP)]
[Qualitative results: IMAGE, GOULD, OUR]
Parameters learned using Gould, Fulton and Koller, ICCV 2009
Statistically significant improvement (paired t-test)
Outline• Model
• Energy Minimization
• Parameter Learning
• Results
• Future Work
Kumar, Turki, Preston and Koller, In Submission
Supervised Learning
x1 y1
x2 y2
P(x, y) ∝ exp(-E(x, y)) = exp(-w^T Ψ(x, y))
[Plots: distributions P(y|x1) and P(y|x2) over y, peaked at the ground truths y1 and y2]
Well-studied problem, efficient solutions
Diverse Data Learning
x a
h
Generic Class
Annotation
Diverse Data Learning
x a
h
Bounding Box
Annotation
Diverse Data Learning
x a = “Cow”
h
Image-Level
Annotation
Learning with Missing Information
Expectation Maximization
A. Dempster et al. Maximum Likelihood from Incomplete Data via the EM Algorithm. JRSS, 1977.
M. Jamshadian et al. Acceleration of the EM Algorithm by Using Quasi-Newton Methods. JRSS, 1997.
R. Neal et al. A View of the EM Algorithm that Justifies Incremental, Sparse, and Other Variants. LGM, 1999.
R. Sundberg. Maximum Likelihood Theory for Incomplete Data from an Exponential Family. SJS 1974.
Latent Support Vector Machine
P. Felzenszwalb et al. A Discriminatively Trained, Multiscale, Deformable Part Model. CVPR, 2008.
C.-N. Yu et al. Learning Structural SVMs with Latent Variables. ICML, 2009.
EM: computationally inefficient
Hard EM / Latent SVM: only requires an energy minimization algorithm
Latent SVM

min_w Σ_i ξ_i + λ||w||²

s.t. min_{h_i} w^T Ψ(x_i, a_i, h_i) ≤ w^T Ψ(x_i, a, h) - Δ(a_i, a, h) + ξ_i, for all (a, h)

Energy of ground truth ≤ Energy of other labelings, with margin given by the user-defined loss Δ(a_i, a, h) = number of disagreements

Difference-of-convex program, solved by CCCP
Felzenszwalb et al., NIPS 2007; Yu et al., ICML 2008
CCCP

Start with an initial estimate w0

Update h_i = argmin_h w_t^T Ψ(x_i, a_i, h) (an energy minimization problem)

Update w_{t+1} by solving a convex problem:
min Σ_i ξ_i + λ||w||²
s.t. w^T Ψ(x_i, a_i, h_i) - w^T Ψ(x_i, a, h) ≤ ξ_i - Δ(a_i, a, h)

Felzenszwalb et al., NIPS 2007; Yu et al., ICML 2008
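The alternation can be sketched end to end on a toy problem. The signal/window setup, the features, and the subgradient w-update below are illustrative stand-ins for the talk's segmentation model and structural-SVM solver; only the CCCP structure (impute h, then solve the now-convex problem) follows the slide:

```python
import numpy as np

# Toy latent SVM trained with CCCP (energy convention: lower is better).
# Hypothetical setup, in the spirit of the motif-finding example later in
# the talk: x is a short signal, a in {+1, -1}, and the latent h picks the
# window of x the classifier looks at.

def psi(x, a, h):
    return -a * x[h:h + 2]        # joint feature; energy = w . psi(x, a, h)

def delta(a_true, a):             # simplified 0/1 user-defined loss on a
    return float(a_true != a)

def cccp(data, outer=5, inner=50, lr=0.05, lam=0.01):
    w = np.zeros(2)
    for _ in range(outer):
        # Step 1: impute latent variables h_i = argmin_h w . psi(x_i, a_i, h)
        h_imp = [min((0, 1, 2), key=lambda h: w @ psi(x, a, h)) for x, a in data]
        # Step 2: update w on the now-convex objective by subgradient descent
        for _ in range(inner):
            g = lam * w
            for (x, a_true), hi in zip(data, h_imp):
                a_hat, h_hat = max(((a, h) for a in (1, -1) for h in (0, 1, 2)),
                                   key=lambda ah: delta(a_true, ah[0]) - w @ psi(x, *ah))
                g = g + psi(x, a_true, hi) - psi(x, a_hat, h_hat)
            w = w - lr * g
    return w

# Positives contain a [3, 3] motif somewhere; negatives do not.
data = [(np.array([0., 3., 3., 0.]), 1), (np.array([3., 3., 0., 0.]), 1),
        (np.array([-1., -1., -1., -1.]), -1), (np.array([0., -2., 0., -1.]), -1)]
w = cccp(data)
pred = [min((1, -1), key=lambda a: min(w @ psi(x, a, h) for h in (0, 1, 2)))
        for x, _ in data]
print(pred)  # [1, 1, -1, -1]: all four training examples classified correctly
```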
Generic Class Annotation
Generic background with specific background
Generic foreground with specific foreground
Bounding Box Annotation
Every row “contains” the object
Every column “contains” the object
Image Level Annotation
The image “contains” the object
“Cow”
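A minimal sketch of what these weak-annotation constraints check on a candidate latent segmentation (the mask and box are toy data; the function names are hypothetical):

```python
import numpy as np

# Inside a bounding box, every row and every column must contain at least
# one pixel of the object; for image-level labels, the image as a whole
# must contain the object.

def satisfies_bbox(mask, top, left, bottom, right):
    box = mask[top:bottom, left:right]
    return bool(box.any(axis=1).all() and box.any(axis=0).all())

def satisfies_image_level(mask):
    return bool(mask.any())

mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 1:3] = True                     # a 2x2 "car" blob
print(satisfies_bbox(mask, 1, 1, 3, 3))   # True: the blob fills the box
print(satisfies_bbox(mask, 0, 0, 4, 4))   # False: empty border rows/columns
print(satisfies_image_level(mask))        # True
```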
CCCP

Start with an initial estimate w0

Update h_i = argmin_h w_t^T Ψ(x_i, a_i, h) (an energy minimization problem)

Update w_{t+1} by solving a convex problem:
min Σ_i ξ_i + λ||w||²
s.t. w^T Ψ(x_i, a_i, h_i) - w^T Ψ(x_i, a, h) ≤ ξ_i - Δ(a_i, a, h)

Felzenszwalb et al., NIPS 2007; Yu et al., ICML 2008

Bad Local Minimum!!
EASY: White sky, Grey road, Green grass
EASY: White sky, Blue water, Green grass
HARD: Cow? Cat? Horse?
HARD: Red Sky? Black Mountain?
All images are not equal
Real Numbers
Imaginary Numbers
e^{iπ} + 1 = 0
Math is for losers!!

Real Numbers
Imaginary Numbers
e^{iπ} + 1 = 0
Euler was a genius!!
Self-Paced Learning
Easy vs. Hard
Easy for human vs. easy for machine
Simultaneously estimate easiness and parameters
Self-Paced Learning

Start with an initial estimate w0

Update h_i = argmin_h w_t^T Ψ(x_i, a_i, h)

Update w_{t+1} by solving a convex problem:
min Σ_i v_i ξ_i + λ||w||² - Σ_i v_i / K
s.t. w^T Ψ(x_i, a_i, h_i) - w^T Ψ(x_i, a, h) ≤ ξ_i - Δ(a_i, a, h)
v_i ∈ {0, 1}, relaxed to v_i ∈ [0, 1]
v_i = 1 for easy examples, v_i = 0 for hard examples

Biconvex Optimization: Alternate Convex Search
Kumar, Packer and Koller, NIPS 2010
Self-Paced Learning

Start with an initial estimate w0

Update h_i = argmin_h w_t^T Ψ(x_i, a_i, h)

Update w_{t+1} by solving a biconvex problem:
min Σ_i v_i ξ_i + λ||w||² - Σ_i v_i / K
s.t. w^T Ψ(x_i, a_i, h_i) - w^T Ψ(x_i, a, h) ≤ ξ_i - Δ(a_i, a, h)

Decrease K (divide by a constant factor each iteration)

As Simple As CCCP!!
Kumar, Packer and Koller, NIPS 2010
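A sketch of the alternate convex search on a deliberately simple model (1-D least squares with an outlier, standing in for the latent SSVM): the v-update selects examples with loss below 1/K, the w-update refits on the selected set, and K is annealed so harder examples enter later. The annealing factor and threshold rule are illustrative:

```python
import numpy as np

def spl_fit(x, y, K=2.0, mu=1.3, iters=10):
    w = 0.0                                    # model: y ~ w * x
    v = np.ones_like(y)
    for _ in range(iters):
        # Step 1 (v-update): an example is "easy" if its loss is below 1/K.
        losses = (y - w * x) ** 2
        v = (losses < 1.0 / K).astype(float)
        if v.sum() == 0:                       # keep at least the easiest one
            v[np.argmin(losses)] = 1.0
        # Step 2 (w-update): refit w on the currently selected examples.
        w = np.sum(v * x * y) / np.sum(v * x * x)
        # Step 3: anneal K so harder examples enter in later iterations.
        K /= mu
    return w, v

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.1, 5.9, 30.0])            # last point is a gross outlier
w, v = spl_fit(x, y)
print(round(w, 2), v)  # slope close to 2; the outlier is still flagged hard
```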
Self-Paced Learning
Kumar, Packer and Koller, NIPS 2010
[Chart: Image Classification (input x, latent h, a = “Deer”): test error, CCCP vs. SPL, y-axis 14.5–17.5]
[Chart: Motif Finding (input x, a = -1 or +1, latent h = motif position): test error, CCCP vs. SPL, y-axis 28–36]
Learning to Segment
[Qualitative comparison of CCCP vs. SPL segmentations during training: iterations 1, 3, 6 for one image and iterations 1, 2, 4 for another]
Outline
• Model
• Energy Minimization
• Parameter Learning
• Results
• Future Work
Dataset: Stanford Background + PASCAL VOC 2009
Stanford Background: 7 background classes, generic foreground class
PASCAL VOC 2009: 20 foreground classes, generic background class

Stanford Background: Train 572 images, Validation 53 images, Test 90 images
PASCAL VOC 2009: Train 1274 images, Validation 225 images, Test 750 images
Baseline Results for SBD
Gould, Fulton and Koller, ICCV 2009
[Chart: per-class overlap score: Road 70.1%, Foreground 36.0%, Mountain 0%; CLL Average 53.1%]
Improvement for SBD
[Chart: per-class difference in overlap score (SPL - CLL); qualitative results: Input, CLL, SPL]
Road 75.5% (+5.4), Foreground 39.1% (+3.1)
CLL Average 53.1%, SPL Average 54.3%
Baseline Results for VOC
Gould, Fulton and Koller, ICCV 2009
[Chart: per-class overlap score: Aeroplane 32.1%, TV 23.6%, Bird 9.5%; CLL Average 24.7%]
Improvement for VOC
[Chart: per-class difference in overlap score (SPL - CLL); qualitative results: Input, CLL, SPL]
Aeroplane 41.4% (+9.3), TV 31.3% (+7.7)
CLL Average 24.7%, SPL Average 26.9%
Weakly Supervised Dataset: ImageNet + VOC Detection 2009
Bounding Box Data: Train 1564 images; Image-Level Data: Train 1000 images
Improvement for SBD
[Chart: per-class difference in overlap score (All - Generic); qualitative results: Input, Generic, All]
Water 60.1% (+5.0), Foreground 41.3% (+2.2)
Generic Average 54.3%, All Average 55.3%
Improvement for VOC
[Chart: per-class difference in overlap score (All - Generic); qualitative results: Input, Generic, All]
Motorbike 40.4% (+6.9), Person 42.2% (+4.9)
Generic Average 26.9%, All Average 28.8%
Improvement over CCCP
[Charts: per-class difference in overlap score (SPL - CCCP) for both datasets]
VOC: CCCP 24.7%, SPL 28.8%; SBD: CCCP 53.8%, SPL 55.3%
No improvement with CCCP. SPL is essential!!
Summary
• Energy minimization for region-based model
– Tight LP relaxation of integer program
• Self-paced learning
– Simultaneously select examples and learn parameters
• Even weak annotation is useful
Outline
• Model
• Energy Minimization
• Parameter Learning
• Results
• Future Work
Learning with Diverse Data
• Noise in Labels
• Size of Problem
Learning Diverse Tasks
Object Detection
Action Recognition
Pose Estimation
3D Reconstruction
Daphne Koller
Stephen Gould, Ben Packer, Haithem Turki, Dan Preston
Andrew Zisserman, Phil Torr, Vladimir Kolmogorov
Summary
• Energy minimization for region-based model
– Tight LP relaxation of integer program
• Self-paced learning
– Simultaneously select examples and learn parameters
• Even weak annotation is useful

Questions?