Transcript: people.inf.ethz.ch/alucchi/talks/berkeley_130213.pdf
1
Structured Models for Image Segmentation
Aurelien Lucchi
Wednesday February 13th, 2013
Joint work with Yunpeng Li, Kevin Smith, Raphael Sznitman, Radhakrishna Achanta, Bohumil Maco,
Graham Knott, Pascal Fua.
2
Image Segmentation
● Goal: partition an image into meaningful regions with respect to a particular application.
4
Understanding the Brain
5
Electron Microscopy Data
5 × 5 × 5 μm section taken from the CA1 hippocampus, corresponding to a 1024 × 1024 × 1000 volume (N ≈ 10⁹ total voxels)
● Human brain contains ~100 billion (10¹¹) neurons and 100 trillion (10¹⁴) synapses.
6
Image Segmentation
Statistics
7
Image Segmentation
[pipeline figure: feature extraction → classification / structured prediction, evaluated against the ground truth]
9
Outline
1. CRF for Image segmentation
2. Maximum Margin Training for CRFs - Cutting Plane (Structured SVM)
3. Maximum Margin Training of CRFs - Online Subgradient Descent (SGD)
4. SLIC superpixels/supervoxels
10
1. CRF for Image Segmentation
11
Structured Prediction
● Non-structured output
– inputs x can be any kind of objects
– output y is a real number
● Prediction of complex outputs
– structured output y is complex (images, text, audio, ...)
– ad hoc definition of structured data: data that consists of several parts, where not only the parts themselves contain information, but also the way in which the parts belong together
Slide courtesy: Christoph Lampert
12
Structured Prediction for Images
Histograms, Filter responses, ...
13
CRF for Image Segmentation
[figure: data (D), unary likelihood, pair-wise terms, MAP solution]
Maximum-a-posteriori (MAP) solution:
Boykov and Jolly [ICCV 2001], Blake et al. [ECCV 2004]
Slide courtesy: Pushmeet Kohli
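The energy formulas on this slide were lost in extraction. The standard pairwise CRF energy these slides refer to (in the style of Boykov-Jolly, with the usual notation assumed) is:

```latex
E(\mathbf{y} \mid D) \;=\; \underbrace{\sum_{i} \phi_i(y_i)}_{\text{unary likelihood}}
\;+\; \underbrace{\sum_{(i,j) \in \mathcal{N}} \phi_{ij}(y_i, y_j)}_{\text{pair-wise terms}},
\qquad
\mathbf{y}^{\mathrm{MAP}} \;=\; \arg\min_{\mathbf{y}} E(\mathbf{y} \mid D)
```

The unary terms score each node's label against the data; the pair-wise terms couple labels of neighboring nodes.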
15
CRF for Image Segmentation
Pair-wise Terms
Favors the same label for neighboring nodes.
17
Energy Minimization
● MAP inference for discrete graphical models:
● Dynamic programming
– exact on non-loopy graphs
● Graph cuts (Boykov, 2001)
– optimal solution if the energy function is submodular
● Belief propagation (Pearl, 1982)
– no theoretical guarantees on loopy graphs, but works well in practice
● Mean field (roots in statistical physics)
● ...
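The dynamic-programming case above can be made concrete with a minimal sketch: exact min-energy (MAP) inference on a chain-structured model with a shared pairwise cost. The function name and toy costs are illustrative assumptions, not from the talk.

```python
import numpy as np

def chain_map(unary, pairwise):
    """Exact MAP (min-energy) labeling on a chain via dynamic programming.

    unary:    (n, k) array, unary[i, l] = cost of label l at node i.
    pairwise: (k, k) array, pairwise[a, b] = cost of labels (a, b) on an edge.
    Returns the label sequence minimizing the total energy.
    """
    n, k = unary.shape
    cost = unary[0].astype(float)          # best cost of a prefix ending in each label
    back = np.zeros((n, k), dtype=int)     # backpointers for reconstruction
    for i in range(1, n):
        total = cost[:, None] + pairwise   # total[a, b]: prev label a, current label b
        back[i] = np.argmin(total, axis=0)
        cost = total[back[i], np.arange(k)] + unary[i]
    labels = np.zeros(n, dtype=int)
    labels[-1] = int(np.argmin(cost))
    for i in range(n - 1, 0, -1):          # trace the optimal path backwards
        labels[i - 1] = back[i, labels[i]]
    return labels
```

With a strong Potts-style pairwise cost the smoothing overrides a noisy middle unary; with a weak one the unaries win.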
18
Training a Structured Model
● First rewrite the energy function as a log-linear model:
E(x, y; w) = ⟨w, ψ(x, y)⟩
where w is a vector of parameters to be learned from training data and ψ(x, y) is a joint feature map that maps the input-output pair into a linear feature space.
19
Training a Structured Model
● Energy function is parametrized by vector w
● Pairwise potential table over labels {-1, 1} (entries to be learned):
y_i \ y_j   -1    1
   -1        ?    ?
    1        ?    ?
20
Training a Structured Model
● Energy function is parametrized by vector w
● Learned pairwise potentials: low energy (0) when neighboring labels agree, high energy (1) when they disagree:
y_i \ y_j   -1    1
   -1        0    1
    1        1    0
21
Training a Structured Model
● Maximum likelihood
● Pseudo-likelihood
● Variational approximation
● Contrastive divergence
● Maximum-margin framework
22
2. Maximum Margin Training for CRFs - Cutting Plane (Structured SVM)
23
Structured SVM
● Given a set of N training examples {(xⁿ, yⁿ)} with ground-truth labelings yⁿ, we can write:
E(xⁿ, yⁿ; w) ≤ E(xⁿ, y; w) for all y ≠ yⁿ
i.e. the energy of the correct labeling is at least as low as the energy of any incorrect labeling.
24
Structured SVM
[figure: the ground-truth segmentation should receive a lower energy E than every other candidate segmentation]
25
Structured SVM
● Given a set of N training examples with ground-truth labelings, we optimize:
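The optimization problem itself did not survive extraction. The standard margin-rescaled structured SVM (as in Tsochantaridis et al., with the usual notation assumed) that this slide refers to is:

```latex
\min_{\mathbf{w},\, \boldsymbol{\xi} \ge 0} \;\; \frac{1}{2}\|\mathbf{w}\|^2
+ \frac{C}{N} \sum_{n=1}^{N} \xi_n
\quad \text{s.t.} \quad
E(\mathbf{x}^n, \mathbf{y}; \mathbf{w}) - E(\mathbf{x}^n, \mathbf{y}^n; \mathbf{w})
\;\ge\; \Delta(\mathbf{y}^n, \mathbf{y}) - \xi_n
\quad \forall n,\; \forall \mathbf{y} \neq \mathbf{y}^n
```

Here Δ(yⁿ, y) is the task loss, so labelings that differ more from the ground truth must be separated by a larger energy margin.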
26
Structured SVM
● In order to deal with the exponential number of constraints, Tsochantaridis et al.* proposed a cutting-plane algorithm.
● It iteratively finds the most violated constraint (via loss-augmented inference) and adds it to the working set of constraints.
* I. Tsochantaridis et al., Support vector machine learning for interdependent and structured output spaces, ICML, 2004.
27
Illustrative Example
SSVM problem
● Exponentially many constraints
● Most are dominated by a small set of "important" constraints
Cutting-plane approach
● Repeatedly finds the next most violated constraint...
● ...until the set of constraints is a good approximation.
An Introduction to Structured Output Learning Using Support Vector Machines, Yisong Yue, Thorsten Joachims
31
Drawbacks of SSVM
● Finding the most violated constraint at each iteration of the cutting plane is intractable in loopy graphical models.
● Approximations can sometimes be imprecise enough to have a major impact on learning.
● An unsatisfactory constraint can cause the cutting-plane algorithm to terminate prematurely.
32
3. Maximum Margin Training of Structured Models: Online Subgradient
Descent (SGD)
33
SGD Approach
● Reformulate the problem as an unconstrained optimization of a loss function L(w):
● SGD approach: compute a sub-gradient of L(w) and step in its negative direction.
● See N. Ratliff et al., (Online) Subgradient Methods for Structured Prediction, AISTATS, 2007.
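The unconstrained objective was lost in extraction. Moving the margin constraints into the objective gives the regularized structured hinge loss (standard form, with the usual notation assumed):

```latex
L(\mathbf{w}) \;=\; \frac{\lambda}{2}\|\mathbf{w}\|^2
+ \frac{1}{N} \sum_{n=1}^{N} \max_{\mathbf{y}} \Big[ \Delta(\mathbf{y}^n, \mathbf{y})
+ E(\mathbf{x}^n, \mathbf{y}^n; \mathbf{w}) - E(\mathbf{x}^n, \mathbf{y}; \mathbf{w}) \Big]
```

The inner max is exactly the loss-augmented inference problem; each hinge term is zero once the correct labeling beats every other labeling by the required margin.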
34
Subgradient
● A subgradient of the convex loss function L at w is a vector g such that:
L(w') ≥ L(w) + g⊤(w' − w) for all w'
● The set of all subgradients at w is called the subdifferential.
● If L is differentiable at w, the only subgradient is the gradient ∇L(w).
35
Subgradient
● How do we compute a subgradient?
● A subgradient is obtained by solving the loss-augmented inference problem
ŷⁿ = argmin_y [E(xⁿ, y; w) − Δ(yⁿ, y)]
and evaluating g = ψ(xⁿ, yⁿ) − ψ(xⁿ, ŷⁿ) (plus the gradient of the regularizer).
36
SGD approach
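The algorithm box on this slide did not survive extraction. Below is a hedged, runnable sketch of the generic subgradient loop on a toy problem whose label space is small enough to enumerate the loss-augmented inference exactly; the feature map, data, and step size are my own illustrative choices, and the regularization term is dropped for simplicity.

```python
import numpy as np

Y = [0, 1, 2]                                   # tiny label space: enumerable

def psi(x, y):                                  # joint feature map psi(x, y)
    f = np.zeros(2 * len(Y))
    f[2 * y:2 * y + 2] = x
    return f

def energy(w, x, y):                            # E(x, y; w) = <w, psi(x, y)>
    return w @ psi(x, y)

def delta(y_true, y):                           # 0/1 task loss
    return float(y != y_true)

data = [(np.array([1.0, 0.0]), 0),
        (np.array([0.0, 1.0]), 1),
        (np.array([1.0, 1.0]), 2)]

w, eta = np.zeros(6), 0.5
for t in range(100):
    x, y_n = data[t % len(data)]
    # loss-augmented inference: the most violated labeling for this example
    y_hat = min(Y, key=lambda y: energy(w, x, y) - delta(y_n, y))
    # subgradient of the structured hinge term (zero when y_hat == y_n)
    g = psi(x, y_n) - psi(x, y_hat)
    w -= eta * g                                # step in the negative direction

pred = [min(Y, key=lambda y: energy(w, x, y)) for x, _ in data]
```

Each step lowers the energy of the ground-truth labeling relative to the most violated one; on this separable toy set the loop stops updating once all margin constraints hold.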
37
SGD Approach
● Convergence guarantees: the suboptimality of the best iterate goes to 0 with an appropriate step size.
38
What can we say about Approximate Subgradients ?
● Epsilon subgradients: g is an ε-subgradient of L at w if
L(w') ≥ L(w) + g⊤(w' − w) − ε for all w'
39
What can we say about Approximate Subgradients ?
● Convergence guarantees (see S. M. Robinson, Linear convergence of epsilon-subgradient descent methods for a class of convex functions, Mathematical Programming, 86:41-50, 1999): the error term goes to 0 with an appropriate step size.
40
Proposed Algorithm
● Goal: better estimate the subgradient by using a working set of constraints.
● Algorithm:
– first solve the loss-augmented inference to find a constraint and add it to the working set;
– then step in the opposite direction of a subgradient computed as an average over the violated constraints in the working set.
41
Proposed Algorithm
42
EM Segmentation Results
Segmentation performance measured with the Jaccard index.
[Lucchi] Supervoxel-Based Segmentation of Mitochondria in EM Image Stacks with Learned Shape Features, A. Lucchi et al.
[SGD+inference] (Online) Subgradient Methods for Structured Prediction, N. Ratliff et al.
[SSVM] Support Vector Machine Learning for Interdependent and Structured Output Spaces, I. Tsochantaridis et al.
[Samplerank] Samplerank: Training Factor Graphs with Atomic Gradients, M. Wick et al.
43
MSRC Segmentation Results
[Ladicky] What, Where and How Many? Combining Object Detectors and CRFs, L. Ladicky et al.
[SGD+inference] (Online) Subgradient Methods for Structured Prediction, N. Ratliff et al.
[SSVM] Support Vector Machine Learning for Interdependent and Structured Output Spaces, I. Tsochantaridis et al.
[Yao] Describing the Scene as a Whole: Joint Object Detection, Scene Classification and Semantic Segmentation, J. Yao et al.
All the methods in the top part of the table were optimized for the average score.
44
MSRC Segmentation Results
[SGD+inference] (Online) Subgradient Methods for Structured Prediction, N. Ratliff et al.
45
Time Analysis
● Sampling can replace the more expensive inference step without much performance loss, leading to significantly lower learning time.
[SGD+inference] (Online) Subgradient Methods for Structured Prediction, N. Ratliff et al.[Samplerank] Samplerank: Training Factor Graphs with Atomic Gradients, M. Wick et al.
Running time for T = 1000 iterations.
Computational overhead = increase in time resulting from the working set.
46
Summary
● Structured learning makes it possible to jointly learn all the parameters of the model.
● Using working sets of constraints produces better subgradient estimates and higher-quality solutions.
47
Future Challenges
● Better energy functions, fully connected CRFs, ...
● Higher-order potentials
[figure: increasing connectivity]
48
4. SLIC superpixels/supervoxels
SLIC Superpixels Compared to State-of-the-art Superpixel Methods, IEEE Transactions on Pattern Analysis and Machine Intelligence (T-PAMI), 2012R. Achanta, A. Shaji, K. Smith, A. Lucchi, P. Fua and S. Süsstrunk.
49
SLIC Superpixels
● Clusters pixels in the combined five-dimensional color and image-plane space.
● Efficiently generates compact, nearly uniform superpixels.
SLIC Superpixels Compared to State-of-the-art Superpixel Methods, IEEE Transactions on Pattern Analysis and Machine Intelligence (T-PAMI), 2012.Source code available online.
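As a rough illustration of the clustering described above, here is a simplified SLIC-style sketch for a single-channel image (so the joint space is intensity + (x, y) rather than the 5-D Lab + (x, y) space of the paper). The function name, parameters, and distance weighting are my own simplifications, not the reference implementation.

```python
import numpy as np

def slic_like(img, n_segments=16, compactness=10.0, n_iter=10):
    """Simplified SLIC-style superpixels for a 2-D grayscale image.

    Cluster centers start on a regular grid with spacing S; each k-means-style
    assignment only searches a 2S x 2S window around a center, which is what
    makes SLIC much faster than plain k-means over all pixels.
    """
    h, w = img.shape
    S = int(np.sqrt(h * w / n_segments))                 # grid interval
    ys = np.arange(S // 2, h, S)
    xs = np.arange(S // 2, w, S)
    centers = np.array([[img[y, x], y, x] for y in ys for x in xs], float)

    yy, xx = np.mgrid[0:h, 0:w]
    labels = np.zeros((h, w), dtype=int)
    for _ in range(n_iter):
        dist = np.full((h, w), np.inf)
        for k, (c_i, c_y, c_x) in enumerate(centers):
            y0, y1 = max(int(c_y) - S, 0), min(int(c_y) + S, h)
            x0, x1 = max(int(c_x) - S, 0), min(int(c_x) + S, w)
            # combined intensity + spatial distance, weighted by compactness/S
            d_c = (img[y0:y1, x0:x1] - c_i) ** 2
            d_s = (yy[y0:y1, x0:x1] - c_y) ** 2 + (xx[y0:y1, x0:x1] - c_x) ** 2
            d = d_c + (compactness / S) ** 2 * d_s
            mask = d < dist[y0:y1, x0:x1]
            dist[y0:y1, x0:x1][mask] = d[mask]
            labels[y0:y1, x0:x1][mask] = k
        for k in range(len(centers)):                    # recompute centers
            m = labels == k
            if m.any():
                centers[k] = [img[m].mean(), yy[m].mean(), xx[m].mean()]
    return labels
```

Larger `compactness` trades boundary adherence for more regular, grid-like superpixels, mirroring the role of the compactness parameter in the published method.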
50
SLIC Supervoxels
Supervoxel-Based Segmentation of Mitochondria in EM Image Stacks with Learned Shape Features,IEEE Transactions on Medical Imaging, 2011, A. Lucchi, K.Smith, R. Achanta, G. Knott, P. Fua.
51
SLIC Superpixels
Under-segmentation error = error with respect to a known ground truth.
Boundary recall = fraction of ground-truth edges that fall within one pixel of at least one superpixel boundary.
● GS04: Efficient Graph-Based Image Segmentation, P. Felzenszwalb and D. Huttenlocher.● NC05: Normalized Cuts and Image Segmentation, J. Shi and J. Malik.● QS09: Quick Shift and Kernel Methods for Mode Seeking, A. Vedaldi et al.● TP09: Turbopixels: Fast Superpixels Using Geometric Flows, A. Levinshtein et al.
52
SLIC Superpixels
● GKM: 10 iterations of k-means.● GS04: Efficient Graph-Based Image Segmentation, P. Felzenszwalb and D. Huttenlocher.● NC05: Normalized Cuts and Image Segmentation, J. Shi and J. Malik.● QS09: Quick Shift and Kernel Methods for Mode Seeking, A. Vedaldi et al.● TP09: Turbopixels: Fast Superpixels Using Geometric Flows, A. Levinshtein et al.
53
SLIC Superpixels
● Simple but successful approach: more than 100 citations, re-implemented in VLFeat and scikit-image, plus a GPU implementation.
● Code: http://cvlab.epfl.ch/~lucchi/code.php
54
Questions
55
Resources
● http://cvlab.epfl.ch/research/medical/em/mitochondria/
● http://cvlab.epfl.ch/research/medical/em/synapses/
● http://cvlab.epfl.ch/~lucchi/
56
Credits
● Slides courtesy:
– Christoph Lampert (Learning with Structured Inputs and Outputs)
– Pushmeet Kohli (Efficiently Solving Dynamic Markov Random Fields using Graph Cuts)
– Ben Taskar (Structured Prediction: A Large Margin Approach, NIPS tutorial)
– Yisong Yue, Thorsten Joachims (An Introduction to Structured Output Learning Using Support Vector Machines)