TRANSCRIPT
1
Vibhav Vineet, Jonathan Warrell, Paul Sturgess, Philip H.S. Torr
Improved Initialisation and Gaussian Mixture Pairwise Terms for Dense
Random Fields with Mean-field Inference
http://cms.brookes.ac.uk/research/visiongroup/
Labelling Problem
2
Assign a label to each image pixel
Examples: stereo, object detection, object segmentation
Problem Formulation
Find a labelling that maximizes the conditional probability or minimizes the energy function
3
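The objective itself is not written out in the transcript; as a sketch, the standard dense-CRF formulation used in this line of work pairs a Gibbs distribution with a unary-plus-pairwise energy:

$$P(\mathbf{x}\mid\mathbf{I}) = \frac{1}{Z}\exp\big(-E(\mathbf{x})\big), \qquad E(\mathbf{x}) = \sum_i \psi_u(x_i) + \sum_{i<j}\psi_p(x_i, x_j)$$

so maximising the conditional probability is equivalent to minimising the energy.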
Problem Formulation
4
Figure: grid CRF construction and inference
Grid CRF leads to over-smoothing around boundaries
Problem Formulation
5
Grid CRF leads to over-smoothing around boundaries; a dense CRF is able to recover fine boundaries
Figure: grid CRF construction and inference vs. dense CRF construction and inference
Inference in Dense CRF
6
Inference in a dense CRF has very high time complexity: alpha-expansion takes almost 1200 seconds per image with a neighbourhood size of 15 on the PascalVOC segmentation dataset
Graph-cut based methods are therefore not feasible
Inference in Dense CRF
7
Filter-based mean-field inference method takes 0.2 secs*
*Krahenbuhl et al. Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials, NIPS 11
Efficient inference under two assumptions:
• Mean-field approximation to the CRF
• Pairwise weights take the form of Gaussian kernels
Efficient inference in dense CRF
8
• Inference with the full distribution P is intractable
• Approximate P with a distribution from a tractable family
• Mean-field methods (Jordan et al., 1999)
Naïve mean field
9
Assume all variables are independent
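In equations (a standard statement of the approximation, not reproduced on the slide), Q is restricted to a fully factorised family and chosen to be closest to P in KL divergence:

$$Q(\mathbf{x}) = \prod_i Q_i(x_i), \qquad Q^{*} = \arg\min_{Q} \mathrm{KL}\big(Q \,\|\, P\big)$$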
Efficient inference in dense CRF
10
Assume Gaussian pairwise weights: a mixture of Gaussian kernels (bilateral and spatial)
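The kernels themselves are not reproduced in the transcript; in the cited work (Krahenbuhl et al., NIPS 11) the mixture combines an appearance (bilateral) kernel and a smoothness (spatial) kernel over pixel positions p and colours I:

$$k(\mathbf{f}_i,\mathbf{f}_j) = w^{(1)}\exp\!\Big(-\tfrac{\lVert p_i - p_j\rVert^2}{2\theta_\alpha^2} - \tfrac{\lVert I_i - I_j\rVert^2}{2\theta_\beta^2}\Big) + w^{(2)}\exp\!\Big(-\tfrac{\lVert p_i - p_j\rVert^2}{2\theta_\gamma^2}\Big)$$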
Marginal update
11
• The marginal update involves an expectation of the pairwise cost over the distribution Q, given that x_i takes label l
• The expensive message-passing step is solved using a highly efficient permutohedral-lattice-based filtering approach
• The final labelling is the maximum posterior marginal (MPM) under the approximate distribution
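The update equation on the slide is not reproduced; in the cited work the filter-based mean-field update takes the form

$$Q_i(x_i = l) = \frac{1}{Z_i}\exp\Big\{-\psi_u(x_i = l) - \sum_{l'\in\mathcal{L}}\sum_{j\neq i} Q_j(x_j = l')\,\psi_p(x_i = l, x_j = l')\Big\}$$

where the inner sum over j is the message-passing step computed by Gaussian filtering on the permutohedral lattice.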
Q distribution
12
Iteration 0
Q distribution for different classes across different iterations on the CamVid dataset
Q distribution
13
Iteration 1
Q distribution for different classes across different iterations on the CamVid dataset
Q distribution
14
Iteration 2
Q distribution for different classes across different iterations on the CamVid dataset
Q distribution
15
Iteration 10
Q distribution for different classes across different iterations on the CamVid dataset
Q distribution
16
Figure panels: Iter 0, Iter 1, Iter 2, Iter 10
Q distribution for different classes across different iterations on the CamVid dataset
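A minimal sketch of the iterative update that produces these Q distributions. For brevity it uses only a spatial Gaussian kernel with a Potts compatibility, and an ordinary Gaussian blur stands in for the bilateral, permutohedral-lattice filtering of the full method; all names and defaults are illustrative.

```python
import numpy as np
from scipy.ndimage import gaussian_filter  # stand-in for lattice-based filtering

def softmax(logits):
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def mean_field(unary, n_iters=10, spatial_sigma=3.0, pairwise_weight=1.0):
    """Naive mean-field for a pixel grid with a single spatial kernel.

    unary: (H, W, L) array of unary potentials (negative log scores).
    Returns Q, an (H, W, L) array of per-pixel label marginals.
    """
    Q = softmax(-unary)  # iteration 0: marginals from the unaries alone
    for _ in range(n_iters):
        # Message passing: blur each label's marginals with the spatial kernel.
        msg = np.stack([gaussian_filter(Q[..., l], spatial_sigma)
                        for l in range(Q.shape[-1])], axis=-1)
        # Potts compatibility: cost from neighbours taking a *different* label.
        pairwise = pairwise_weight * (msg.sum(axis=-1, keepdims=True) - msg)
        # Local update and re-normalisation.
        Q = softmax(-(unary + pairwise))
    return Q
```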
17
Two issues associated with the method:
• Sensitive to initialisation
• Restrictive Gaussian pairwise weights
Our Contributions
18
Resolve the two issues associated with the method:
• Sensitivity to initialisation: we propose a SIFT-flow based initialisation method
• Restrictive Gaussian pairwise weights: we propose an expectation-maximisation (EM) based strategy to learn a more general Gaussian mixture model
Sensitivity to initialisation
19
Experiment on PascalVOC-10 segmentation dataset
• Good initialisation can lead to a better solution
• We propose a better, SIFT-flow based initialisation method

Initialisation      | Mean-field | Alpha-expansion
Unary potential     | 28.52%     | 27.88%
Ground-truth labels | 41%        | 27.88%

We observe an improvement of almost 13% in I/U score when initialising mean-field inference with the ground-truth labelling
SIFT-flow based correspondence
20
Given a test image, we first retrieve a set of nearest neighbours from the training set using GIST features
Figure panels: test image and the nearest neighbours retrieved from the training set
SIFT-flow based correspondence
21
K nearest neighbours warped to the test image
Figure panels: test image and the warped nearest neighbours with their flow energies (23.31, 13.31, 14.31; 18.38, 22, 22; 22, 30.87, 27.2)
SIFT-flow based correspondence
22
Pick the best nearest neighbour based on the flow value
Figure panels: test image, best nearest neighbour, warped image (flow: 13.31)
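A sketch of the retrieval and selection steps, under the assumptions that GIST descriptors are precomputed and that a SIFT-flow routine (here the placeholder `compute_flow`) returns a warped image and its flow energy:

```python
import numpy as np

def pick_best_neighbour(test_gist, train_gists, train_images, test_image,
                        compute_flow, k=9):
    """Retrieve K nearest neighbours by GIST distance, warp each onto the
    test image with SIFT-flow, and keep the one with the lowest flow energy.
    """
    # K nearest neighbours in GIST feature space (Euclidean distance).
    dists = np.linalg.norm(train_gists - test_gist, axis=1)
    knn = np.argsort(dists)[:k]

    # Warp each neighbour; lower flow energy means better alignment.
    best = None
    for idx in knn:
        warped, energy = compute_flow(train_images[idx], test_image)
        if best is None or energy < best[2]:
            best = (idx, warped, energy)
    return best  # (index of best neighbour, warped image, flow energy)
```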
Label transfer
Warp the ground truth according to the correspondence
Transfer labels from the top-1 neighbour using the flow
23
Figure panels: ground truth of the best nearest neighbour, flow field, ground truth warped according to the flow, ground truth of the test image
SIFT-flow based initialisation
24
Rescore the unary potential
Figure panels: test image, ground truth, without rescoring, after rescoring
Qualitative improvement in accuracy after using the rescored unary potential
The rescoring function adjusts the unary potential of a variable based on the label observed after the label-transfer stage; its weighting parameter is set through cross-validation
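The rescoring expression itself is not reproduced in the transcript; one plausible minimal form, lowering the unary cost of the transferred label by a cross-validated weight (all names here are illustrative), is:

```python
import numpy as np

def rescore_unaries(unary, transferred_labels, alpha=0.5):
    """Rescore unary potentials using labels transferred via SIFT-flow.

    unary: (H, W, L) unary potentials (lower = more likely).
    transferred_labels: (H, W) labels warped from the best neighbour's ground truth.
    alpha: rescoring strength, set through cross-validation.
    """
    rescored = unary.copy()
    rows, cols = np.indices(transferred_labels.shape)
    # Strengthen (lower the cost of) the transferred label at every pixel.
    rescored[rows, cols, transferred_labels] -= alpha
    return rescored
```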
SIFT-flow based initialisation
25
Initialise mean-field solution
Figure panels: test image, ground truth, without initialisation, with initialisation
Qualitative improvement in accuracy after initialisation of the mean-field solution
Gaussian pairwise weights
26
Experiment on the PascalVOC-10 segmentation dataset
We plotted the distribution of class-class interactions by selecting random pairs of points (i, j)
Figure panels: aeroplane-aeroplane, car-person, horse-person (axes from -500 to 500)
Gaussian pairwise weights
27
Experiment on the PascalVOC-10 segmentation dataset
Figure annotations: some interactions are distributed horizontally, some vertically, and some are not centred around a zero mean
Such complex structure in the data cannot be captured by a zero-mean Gaussian
We propose an EM-based learning strategy to incorporate a more general class of Gaussian mixture models
Our model
28
Our energy function takes the following form (sketched below): we use separate weights for each label pair, but the Gaussian components are shared across pairs
We follow a piecewise learning strategy to learn the parameters of our energy function
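The formula itself is not in the transcript; consistent with the description above (label-pair-specific weights, shared Gaussian components), a sketch of the pairwise term is

$$\psi_p(x_i, x_j) = \sum_{m=1}^{M} w^{(m)}_{x_i,x_j}\, \mathcal{N}\big(\mathbf{f}_i - \mathbf{f}_j \mid \boldsymbol{\mu}_m, \boldsymbol{\Sigma}_m\big)$$

so each label pair has its own mixing weights, while the component means and covariances are shared.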
Learning mixture model
29
• Learn the parameters similarly to this model*
*Krahenbuhl et al. Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials, NIPS 11
Learning mixture model
30
*Krahenbuhl et al. Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials, NIPS 11
• Learn the parameters similarly to this model*
• Learn the parameters of the Gaussian mixture: means, standard deviations, and mixing coefficients
Learning mixture model
31
*Krahenbuhl et al. Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials, NIPS 11
• Learn the parameters similarly to this model*
• Learn the parameters of the Gaussian mixture: means, standard deviations, and mixing coefficients
• Lambda is set through cross-validation
Our model
32
• We follow a generative training model: maximise the joint likelihood of label pairs and features
• A latent variable gives the cluster (mixture component) assignment
• We use an expectation-maximisation (EM) based method to maximise the likelihood function
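A standard way to write this generative objective (a sketch consistent with the slide, not a verbatim reproduction): with a latent component assignment for each sample, the feature for a pair of labels is modelled as

$$p(\mathbf{f}_{ij} \mid l_i, l_j) = \sum_{m=1}^{M} \pi^{(m)}_{l_i, l_j}\, \mathcal{N}\big(\mathbf{f}_{ij} \mid \boldsymbol{\mu}_m, \boldsymbol{\Sigma}_m\big)$$

and EM alternates between computing responsibilities for the latent assignments (E-step) and re-estimating the mixing coefficients, means, and covariances (M-step).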
Learning mixture model
33
Figure panels: aeroplane-aeroplane, car-person, horse-person
Our model is able to capture the true distribution of class-class interactions
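A minimal sketch of the shared-component idea using scikit-learn's EM-based GaussianMixture; this illustrates the structure (shared means and covariances, per-label-pair mixing weights), not the paper's exact piecewise learning procedure.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def learn_shared_mixture(pair_features, pair_ids, n_components=5):
    """pair_features: (N, D) pairwise feature differences f_i - f_j.
    pair_ids: length-N array identifying the label pair of each sample."""
    # 1. Fit shared Gaussian components on all samples pooled together (EM).
    gmm = GaussianMixture(n_components=n_components, covariance_type="full")
    gmm.fit(pair_features)

    # 2. Per label pair, estimate mixing coefficients from the responsibilities.
    resp = gmm.predict_proba(pair_features)  # (N, M) responsibilities
    mixing = {p: resp[pair_ids == p].mean(axis=0) for p in np.unique(pair_ids)}
    return gmm.means_, gmm.covariances_, mixing
```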
Inference with mixture model
34
• Involves evaluating M extra Gaussian terms
• The blurring is performed on mean-shifted points
• This increases the time complexity
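In equation form (a sketch of why the blurring happens at mean-shifted points): because each learned component has a non-zero mean, the message for component m can be written as an ordinary Gaussian blur of Q evaluated at a shifted feature location,

$$\sum_{j} \mathcal{N}\big(\mathbf{f}_i - \mathbf{f}_j \mid \boldsymbol{\mu}_m, \boldsymbol{\Sigma}_m\big)\, Q_j(l) \;=\; \big(G_{\boldsymbol{\Sigma}_m} \ast Q(l)\big)\big(\mathbf{f}_i - \boldsymbol{\mu}_m\big)$$

so M components cost M extra filtering passes, which is where the additional time complexity comes from.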
Experiments on CamVid
35
Iteration 0
Q distribution for the building class on the CamVid dataset
Figure panels: ground truth, without initialisation, with initialisation
Confidence of building pixels increases with initialisation
Experiments on CamVid
36
Iteration 1
Q distribution for the building class on the CamVid dataset
Figure panels: ground truth, without initialisation, with initialisation
Confidence of building pixels increases with initialisation
Experiments on CamVid
37
Iteration 2
Q distribution for the building class on the CamVid dataset
Figure panels: ground truth, without initialisation, with initialisation
Confidence of building pixels increases with initialisation
Experiments on CamVid
38
Iteration 10
Q distribution for the building class on the CamVid dataset
Figure panels: ground truth, without initialisation, with initialisation
Confidence of building pixels increases with initialisation
Experiments on CamVid
39
Figure panels (image 2): ground truth, without initialisation, with initialisation
The building is properly recovered with our initialisation strategy
Experiments on CamVid
40
Quantitative results on the CamVid dataset

Algorithm    | Time (s) | Overall (% corr) | Av. Recall | Av. I/U
Alpha-exp    | 0.96     | 78.84            | 58.64      | 43.89
APST (U+P+H) | 1.6      | 85.18            | 60.06      | 50.62
Dense CRF    | 0.2      | 79.96            | 59.29      | 45.18
Ours (U+P+I) | 0.35     | 85.31            | 59.75      | 50.56

• Our model with unary and pairwise terms achieves better accuracy than other complex models
• Generally achieves very high efficiency compared to other methods
Experiments on CamVid
41
Qualitative results on the CamVid dataset
Figure panels: image, ground truth, alpha-expansion, ours
Able to recover the building and tree properly
Experiments on PascalVOC-10
42
Qualitative results of the SIFT-flow method
Figure panels: image, warped nearest-neighbour ground truth, ground truth, output without SIFT-flow, output with SIFT-flow
Able to recover missing body parts
Experiments on PascalVOC-10
43
Quantitative results on the PascalVOC-10 segmentation dataset

Algorithm        | Time (s) | Overall (% corr) | Av. Recall | Av. I/U
Alpha-exp        | 3.0      | 79.52            | 36.08      | 27.88
AHCRF+Cooc       | 36       | 81.43            | 38.01      | 30.9
Dense CRF        | 0.67     | 71.63            | 34.53      | 28.4
Ours1 (U+P+GM)   | 26.7     | 80.23            | 36.41      | 28.73
Ours2 (U+P+I)    | 0.90     | 79.65            | 41.84      | 30.95
Ours3 (U+P+I+GM) | 26.7     | 78.96            | 44.05      | 31.48
• Our model with unary and pairwise terms achieves better accuracy than other complex models
• Generally achieves very high efficiency compared to other methods
Experiments on PascalVOC-10
44
Qualitative results on the PascalVOC-10 segmentation dataset
Figure panels: image, ground truth, alpha-expansion, dense CRF, ours
Able to recover missing object and body parts
Conclusion
45
• Filter-based mean-field inference promises high efficiency and accuracy
• Proposed methods to robustify the basic mean-field method
• A SIFT-flow based method for better initialisation
• An EM-based algorithm for learning a general Gaussian mixture model
• More complex higher-order models can be incorporated into the pairwise model
46
Thank you