TRANSCRIPT
1
Vibhav Vineet, Jonathan Warrell, Paul Sturgess, Philip H.S. Torr
Improved Initialisation and Gaussian Mixture Pairwise Terms for Dense
Random Fields with Mean-field Inference
http://cms.brookes.ac.uk/research/visiongroup/
Labelling Problem
2
Assign a label to each image pixel
Examples: stereo, object detection, object segmentation
Problem Formulation
Find a labelling that maximizes the conditional probability or minimizes the energy function
3
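The objective itself is not written out in the transcript; as a sketch, the standard dense-CRF formulation used in this line of work pairs a Gibbs distribution with a unary-plus-pairwise energy:

$$P(\mathbf{x}\mid\mathbf{I}) = \frac{1}{Z}\exp\big(-E(\mathbf{x})\big), \qquad E(\mathbf{x}) = \sum_i \psi_u(x_i) + \sum_{i<j}\psi_p(x_i, x_j)$$

so maximising the conditional probability is equivalent to minimising the energy.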
Problem Formulation
4
Figure: grid CRF construction and inference
Grid CRF leads to over-smoothing around boundaries
Problem Formulation
5
Grid CRF leads to over-smoothing around boundaries; a dense CRF is able to recover fine boundaries
Figure: grid CRF construction and inference vs. dense CRF construction and inference
Inference in Dense CRF
6
Inference in a dense CRF has very high time complexity: alpha-expansion takes almost 1200 seconds per image with a neighbourhood size of 15 on the PascalVOC segmentation dataset
Graph-cut based methods are therefore not feasible
Inference in Dense CRF
7
Filter-based mean-field inference method takes 0.2 secs*
*Krahenbuhl et al. Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials, NIPS 11
Efficient inference under two assumptions:
• Mean-field approximation to the CRF
• Pairwise weights take the form of Gaussian kernels
Efficient inference in dense CRF
8
• Inference with the full distribution P is intractable
• Approximate P with a distribution from a tractable family
• Mean-field methods (Jordan et al., 1999)
Naïve mean field
9
Assume all variables are independent
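In equations (a standard statement of the approximation, not reproduced on the slide), Q is restricted to a fully factorised family and chosen to be closest to P in KL divergence:

$$Q(\mathbf{x}) = \prod_i Q_i(x_i), \qquad Q^{*} = \arg\min_{Q} \mathrm{KL}\big(Q \,\|\, P\big)$$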
Efficient inference in dense CRF
10
Assume Gaussian pairwise weights: a mixture of Gaussian kernels (bilateral and spatial)
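The kernels themselves are not reproduced in the transcript; in the cited work (Krahenbuhl et al., NIPS 11) the mixture combines an appearance (bilateral) kernel and a smoothness (spatial) kernel over pixel positions p and colours I:

$$k(\mathbf{f}_i,\mathbf{f}_j) = w^{(1)}\exp\!\Big(-\tfrac{\lVert p_i - p_j\rVert^2}{2\theta_\alpha^2} - \tfrac{\lVert I_i - I_j\rVert^2}{2\theta_\beta^2}\Big) + w^{(2)}\exp\!\Big(-\tfrac{\lVert p_i - p_j\rVert^2}{2\theta_\gamma^2}\Big)$$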
Marginal update
11
• The marginal update involves an expectation of the pairwise cost over the distribution Q, given that x_i takes label l
• The expensive message-passing step is solved using a highly efficient permutohedral-lattice-based filtering approach
• The final labelling is the maximum posterior marginal (MPM) under the approximate distribution
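The update equation on the slide is not reproduced; in the cited work the filter-based mean-field update takes the form

$$Q_i(x_i = l) = \frac{1}{Z_i}\exp\Big\{-\psi_u(x_i = l) - \sum_{l'\in\mathcal{L}}\sum_{j\neq i} Q_j(x_j = l')\,\psi_p(x_i = l, x_j = l')\Big\}$$

where the inner sum over j is the message-passing step computed by Gaussian filtering on the permutohedral lattice.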
Q distribution
12
Iteration 0
Q distribution for different classes across different iterations on the CamVid dataset
Q distribution
13
Iteration 1
Q distribution for different classes across different iterations on the CamVid dataset
Q distribution
14
Iteration 2
Q distribution for different classes across different iterations on the CamVid dataset
Q distribution
15
Iteration 10
Q distribution for different classes across different iterations on the CamVid dataset
Q distribution
16
Figure panels: Iter 0, Iter 1, Iter 2, Iter 10
Q distribution for different classes across different iterations on the CamVid dataset
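A minimal sketch of the iterative update that produces these Q distributions. For brevity it uses only a spatial Gaussian kernel with a Potts compatibility, and an ordinary Gaussian blur stands in for the bilateral, permutohedral-lattice filtering of the full method; all names and defaults are illustrative.

```python
import numpy as np
from scipy.ndimage import gaussian_filter  # stand-in for lattice-based filtering

def softmax(logits):
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def mean_field(unary, n_iters=10, spatial_sigma=3.0, pairwise_weight=1.0):
    """Naive mean-field for a pixel grid with a single spatial kernel.

    unary: (H, W, L) array of unary potentials (negative log scores).
    Returns Q, an (H, W, L) array of per-pixel label marginals.
    """
    Q = softmax(-unary)  # iteration 0: marginals from the unaries alone
    for _ in range(n_iters):
        # Message passing: blur each label's marginals with the spatial kernel.
        msg = np.stack([gaussian_filter(Q[..., l], spatial_sigma)
                        for l in range(Q.shape[-1])], axis=-1)
        # Potts compatibility: cost from neighbours taking a *different* label.
        pairwise = pairwise_weight * (msg.sum(axis=-1, keepdims=True) - msg)
        # Local update and re-normalisation.
        Q = softmax(-(unary + pairwise))
    return Q
```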
17
Two issues associated with the method:
• Sensitive to initialisation
• Restrictive Gaussian pairwise weights
Our Contributions
18
Resolve the two issues associated with the method:
• Sensitivity to initialisation: we propose a SIFT-flow based initialisation method
• Restrictive Gaussian pairwise weights: we propose an expectation-maximisation (EM) based strategy to learn a more general Gaussian mixture model
Sensitivity to initialisation
19
Experiment on PascalVOC-10 segmentation dataset
• Good initialisation can lead to a better solution
• We propose a better, SIFT-flow based initialisation method

Initialisation      | Mean-field | Alpha-expansion
Unary potential     | 28.52%     | 27.88%
Ground-truth labels | 41%        | 27.88%

We observe an improvement of almost 13% in I/U score when initialising mean-field inference with the ground-truth labelling
SIFT-flow based correspondence
20
Given a test image, we first retrieve a set of nearest neighbours from the training set using GIST features
Figure panels: test image and the nearest neighbours retrieved from the training set
SIFT-flow based correspondence
21
K nearest neighbours warped to the test image
Figure panels: test image and the warped nearest neighbours with their flow energies (23.31, 13.31, 14.31; 18.38, 22, 22; 22, 30.87, 27.2)
SIFT-flow based correspondence
22
Pick the best nearest neighbour based on the flow value
Figure panels: test image, best nearest neighbour, warped image (flow: 13.31)
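A sketch of the retrieval and selection steps, under the assumptions that GIST descriptors are precomputed and that a SIFT-flow routine (here the placeholder `compute_flow`) returns a warped image and its flow energy:

```python
import numpy as np

def pick_best_neighbour(test_gist, train_gists, train_images, test_image,
                        compute_flow, k=9):
    """Retrieve K nearest neighbours by GIST distance, warp each onto the
    test image with SIFT-flow, and keep the one with the lowest flow energy.
    """
    # K nearest neighbours in GIST feature space (Euclidean distance).
    dists = np.linalg.norm(train_gists - test_gist, axis=1)
    knn = np.argsort(dists)[:k]

    # Warp each neighbour; lower flow energy means better alignment.
    best = None
    for idx in knn:
        warped, energy = compute_flow(train_images[idx], test_image)
        if best is None or energy < best[2]:
            best = (idx, warped, energy)
    return best  # (index of best neighbour, warped image, flow energy)
```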
Label transfer
Warp the ground truth according to the correspondence
Transfer labels from the top-1 neighbour using the flow
23
Figure panels: ground truth of the best nearest neighbour, flow field, ground truth warped according to the flow, ground truth of the test image
SIFT-flow based initialisation
24
Rescore the unary potential
Figure panels: test image, ground truth, without rescoring, after rescoring
Qualitative improvement in accuracy after using the rescored unary potential
The rescoring function adjusts the unary potential of a variable based on the label observed after the label-transfer stage; its weighting parameter is set through cross-validation
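The rescoring expression itself is not reproduced in the transcript; one plausible minimal form, lowering the unary cost of the transferred label by a cross-validated weight (all names here are illustrative), is:

```python
import numpy as np

def rescore_unaries(unary, transferred_labels, alpha=0.5):
    """Rescore unary potentials using labels transferred via SIFT-flow.

    unary: (H, W, L) unary potentials (lower = more likely).
    transferred_labels: (H, W) labels warped from the best neighbour's ground truth.
    alpha: rescoring strength, set through cross-validation.
    """
    rescored = unary.copy()
    rows, cols = np.indices(transferred_labels.shape)
    # Strengthen (lower the cost of) the transferred label at every pixel.
    rescored[rows, cols, transferred_labels] -= alpha
    return rescored
```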
SIFT-flow based initialisation
25
Initialise mean-field solution
Figure panels: test image, ground truth, without initialisation, with initialisation
Qualitative improvement in accuracy after initialisation of the mean-field solution
Gaussian pairwise weights
26
Experiment on the PascalVOC-10 segmentation dataset
We plotted the distribution of class-class interactions by selecting random pairs of points (i, j)
Figure panels: aeroplane-aeroplane, car-person, horse-person (axes from -500 to 500)
Gaussian pairwise weights
27
Experiment on the PascalVOC-10 segmentation dataset
Figure annotations: some interactions are distributed horizontally, some vertically, and some are not centred around a zero mean
Such complex structure in the data cannot be captured by a zero-mean Gaussian
We propose an EM-based learning strategy to incorporate a more general class of Gaussian mixture models
Our model
28
Our energy function takes the following form (sketched below): we use separate weights for each label pair, but the Gaussian components are shared across pairs
We follow a piecewise learning strategy to learn the parameters of our energy function
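The formula itself is not in the transcript; consistent with the description above (label-pair-specific weights, shared Gaussian components), a sketch of the pairwise term is

$$\psi_p(x_i, x_j) = \sum_{m=1}^{M} w^{(m)}_{x_i,x_j}\, \mathcal{N}\big(\mathbf{f}_i - \mathbf{f}_j \mid \boldsymbol{\mu}_m, \boldsymbol{\Sigma}_m\big)$$

so each label pair has its own mixing weights, while the component means and covariances are shared.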
Learning mixture model
29
• Learn the parameters similarly to this model*
*Krahenbuhl et al. Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials, NIPS 11
Learning mixture model
30
*Krahenbuhl et al. Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials, NIPS 11
• Learn the parameters similarly to this model*
• Learn the parameters of the Gaussian mixture: means, standard deviations, and mixing coefficients
Learning mixture model
31
*Krahenbuhl et al. Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials, NIPS 11
• Learn the parameters similarly to this model*
• Learn the parameters of the Gaussian mixture: means, standard deviations, and mixing coefficients
• Lambda is set through cross-validation
Our model
32
• We follow a generative training model: maximise the joint likelihood of label pairs and features
• A latent variable gives the cluster (mixture component) assignment
• We use an expectation-maximisation (EM) based method to maximise the likelihood function
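A standard way to write this generative objective (a sketch consistent with the slide, not a verbatim reproduction): with a latent component assignment for each sample, the feature for a pair of labels is modelled as

$$p(\mathbf{f}_{ij} \mid l_i, l_j) = \sum_{m=1}^{M} \pi^{(m)}_{l_i, l_j}\, \mathcal{N}\big(\mathbf{f}_{ij} \mid \boldsymbol{\mu}_m, \boldsymbol{\Sigma}_m\big)$$

and EM alternates between computing responsibilities for the latent assignments (E-step) and re-estimating the mixing coefficients, means, and covariances (M-step).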
Learning mixture model
33
Figure panels: aeroplane-aeroplane, car-person, horse-person
Our model is able to capture the true distribution of class-class interactions
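A minimal sketch of the shared-component idea using scikit-learn's EM-based GaussianMixture; this illustrates the structure (shared means and covariances, per-label-pair mixing weights), not the paper's exact piecewise learning procedure.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def learn_shared_mixture(pair_features, pair_ids, n_components=5):
    """pair_features: (N, D) pairwise feature differences f_i - f_j.
    pair_ids: length-N array identifying the label pair of each sample."""
    # 1. Fit shared Gaussian components on all samples pooled together (EM).
    gmm = GaussianMixture(n_components=n_components, covariance_type="full")
    gmm.fit(pair_features)

    # 2. Per label pair, estimate mixing coefficients from the responsibilities.
    resp = gmm.predict_proba(pair_features)  # (N, M) responsibilities
    mixing = {p: resp[pair_ids == p].mean(axis=0) for p in np.unique(pair_ids)}
    return gmm.means_, gmm.covariances_, mixing
```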
Inference with mixture model
34
• Involves evaluating M extra Gaussian terms
• The blurring is performed on mean-shifted points
• This increases the time complexity
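In equation form (a sketch of why the blurring happens at mean-shifted points): because each learned component has a non-zero mean, the message for component m can be written as an ordinary Gaussian blur of Q evaluated at a shifted feature location,

$$\sum_{j} \mathcal{N}\big(\mathbf{f}_i - \mathbf{f}_j \mid \boldsymbol{\mu}_m, \boldsymbol{\Sigma}_m\big)\, Q_j(l) \;=\; \big(G_{\boldsymbol{\Sigma}_m} \ast Q(l)\big)\big(\mathbf{f}_i - \boldsymbol{\mu}_m\big)$$

so M components cost M extra filtering passes, which is where the additional time complexity comes from.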
Experiments on CamVid
35
Iteration 0
Q distribution for the building class on the CamVid dataset
Figure panels: ground truth, without initialisation, with initialisation
Confidence of building pixels increases with initialisation
Experiments on CamVid
36
Iteration 1
Q distribution for the building class on the CamVid dataset
Figure panels: ground truth, without initialisation, with initialisation
Confidence of building pixels increases with initialisation
Experiments on CamVid
37
Iteration 2
Q distribution for the building class on the CamVid dataset
Figure panels: ground truth, without initialisation, with initialisation
Confidence of building pixels increases with initialisation
Experiments on CamVid
38
Iteration 10
Q distribution for the building class on the CamVid dataset
Figure panels: ground truth, without initialisation, with initialisation
Confidence of building pixels increases with initialisation
Experiments on CamVid
39
Figure panels (image 2): ground truth, without initialisation, with initialisation
The building is properly recovered with our initialisation strategy
Experiments on CamVid
40
Quantitative results on the CamVid dataset

Algorithm    | Time (s) | Overall (% corr) | Av. Recall | Av. I/U
Alpha-exp    | 0.96     | 78.84            | 58.64      | 43.89
APST (U+P+H) | 1.6      | 85.18            | 60.06      | 50.62
Dense CRF    | 0.2      | 79.96            | 59.29      | 45.18
Ours (U+P+I) | 0.35     | 85.31            | 59.75      | 50.56

• Our model with unary and pairwise terms achieves better accuracy than other complex models
• Generally achieves very high efficiency compared to other methods
Experiments on CamVid
41
Qualitative results on the CamVid dataset
Figure panels: image, ground truth, alpha-expansion, ours
Able to recover the building and tree properly
Experiments on PascalVOC-10
42
Qualitative results of the SIFT-flow method
Figure panels: image, warped nearest-neighbour ground truth, ground truth, output without SIFT-flow, output with SIFT-flow
Able to recover missing body parts
Experiments on PascalVOC-10
43
Quantitative results on the PascalVOC-10 segmentation dataset

Algorithm        | Time (s) | Overall (% corr) | Av. Recall | Av. I/U
Alpha-exp        | 3.0      | 79.52            | 36.08      | 27.88
AHCRF+Cooc       | 36       | 81.43            | 38.01      | 30.9
Dense CRF        | 0.67     | 71.63            | 34.53      | 28.4
Ours1 (U+P+GM)   | 26.7     | 80.23            | 36.41      | 28.73
Ours2 (U+P+I)    | 0.90     | 79.65            | 41.84      | 30.95
Ours3 (U+P+I+GM) | 26.7     | 78.96            | 44.05      | 31.48
• Our model with unary and pairwise terms achieves better accuracy than other complex models
• Generally achieves very high efficiency compared to other methods
Experiments on PascalVOC-10
44
Qualitative results on the PascalVOC-10 segmentation dataset
Figure panels: image, ground truth, alpha-expansion, dense CRF, ours
Able to recover missing object and body parts
Conclusion
45
• Filter-based mean-field inference promises high efficiency and accuracy
• Proposed methods to robustify the basic mean-field method
• A SIFT-flow based method for better initialisation
• An EM-based algorithm for learning a general Gaussian mixture model
• More complex higher-order models can be incorporated into the pairwise model
46
Thank you