Visual Element Discovery as Discriminative Mode Seeking

Carl Doersch (CMU), Abhinav Gupta (CMU), Alexei A. Efros (UCB)

Post on 24-Feb-2016

Page 1: Visual Element Discovery as Discriminative Mode Seeking

Visual Element Discovery as Discriminative Mode Seeking

Carl Doersch (CMU), Abhinav Gupta (CMU), Alexei A. Efros (UCB)

Page 2: Visual Element Discovery as Discriminative Mode Seeking

The need for mid-level representations

6 billion images

70 billion images

1 billion images served daily

10 billion images

60 hours uploaded per minute

Almost 90% of web traffic is visual!


Page 3: Visual Element Discovery as Discriminative Mode Seeking

Discriminative patches

• Visual words are too simple

• Objects are too difficult

• Something in the middle?

(Felzenszwalb et al. 2008)

(Singh et al. 2012)

Page 4: Visual Element Discovery as Discriminative Mode Seeking

Mid-level “Visual Elements”

• Simple enough to be detected easily

• Complex enough to be meaningful

– “Meaningful” as measured by weak labels

(Doersch et al. 2012)

(Singh et al. 2012)

Page 5: Visual Element Discovery as Discriminative Mode Seeking

Mid-level “Visual Elements”

(Doersch et al. 2012)

(Singh et al. 2012)

• Doersch et al. 2012

• Singh et al. 2012

• Jain et al. 2013

• Endres et al. 2013

• Juneja et al. 2013

• Li et al. 2013

• Sun et al. 2013

• Wang et al. 2013

• Fouhey et al. 2013

• Lee et al. 2013

Page 6: Visual Element Discovery as Discriminative Mode Seeking

Our goal

• Formulate visual element discovery as a mathematical optimization

• Improve the performance of mid-level representations

Page 7: Visual Element Discovery as Discriminative Mode Seeking

Elements as Patch Classifiers

Page 8: Visual Element Discovery as Discriminative Mode Seeking

What if the labels are weak?

• E.g. image has horse/no-horse

• (Or even weaker, like Paris/not-Paris)

• Idea: Label these all as “horse”

• Problem: 10,000 patches per image, most of which are unclassifiable.

Page 9: Visual Element Discovery as Discriminative Mode Seeking

The weaker the label, the bigger the problem.

Task: Learn to classify Paris from Not-Paris

Paris Also Paris

Page 10: Visual Element Discovery as Discriminative Mode Seeking

Other approaches

• Latent SVM

– Assumes we have one instance per positive image

• Multiple instance learning

– Not clear how to define the bags

Page 11: Visual Element Discovery as Discriminative Mode Seeking

What if the labels are weak?

• Negatives are negatives, positives might not be positive

• Most of our data can be ignored

• First: how to cluster without clustering everything

(Doersch et al. 2012)

(Singh et al. 2012)

Page 12: Visual Element Discovery as Discriminative Mode Seeking

Mean shift

Page 13: Visual Element Discovery as Discriminative Mode Seeking

Mean shift

Page 14: Visual Element Discovery as Discriminative Mode Seeking

Mean shift
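The mean-shift slides can be illustrated with a minimal sketch of the standard algorithm. The Gaussian kernel, the fixed bandwidth, and the toy 2-D data are assumptions made for illustration; they are not taken from the talk.

```python
import math
import random

def mean_shift(points, start, bandwidth, n_iter=200, tol=1e-8):
    """Climb a Gaussian kernel density estimate of `points` from `start`."""
    w = list(start)
    for _ in range(n_iter):
        # Gaussian kernel weight of each point relative to the centroid w.
        weights = [
            math.exp(-sum((p - c) ** 2 for p, c in zip(x, w)) / (2 * bandwidth ** 2))
            for x in points
        ]
        total = sum(weights)
        # Mean-shift update: move w to the kernel-weighted mean of the points.
        new_w = [
            sum(wt * x[d] for wt, x in zip(weights, points)) / total
            for d in range(len(w))
        ]
        shift = math.dist(new_w, w)
        w = new_w
        if shift < tol:
            break
    return w

# Toy data (hypothetical): two well-separated clusters.
random.seed(0)
cluster_a = [(random.gauss(0, 0.3), random.gauss(0, 0.3)) for _ in range(200)]
cluster_b = [(random.gauss(5, 0.3), random.gauss(5, 0.3)) for _ in range(200)]
mode = mean_shift(cluster_a + cluster_b, start=(4.0, 4.0), bandwidth=1.0)
print(mode)  # the centroid should settle near the cluster around (5, 5)
```

Each start point only ever interacts with the data near it, which is why mode seeking can find a cluster without assigning every point to one.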

Page 15: Visual Element Discovery as Discriminative Mode Seeking

Patch distances

Min distance: 2.59e-4

Max distance: 1.22e-4

Input Nearest neighbor

Page 16: Visual Element Discovery as Discriminative Mode Seeking

Mean shift

Page 17: Visual Element Discovery as Discriminative Mode Seeking

Negative Set

Legend: Not Paris, Paris

Page 18: Visual Element Discovery as Discriminative Mode Seeking

Negative Set

Legend: Not Paris, Paris

Page 19: Visual Element Discovery as Discriminative Mode Seeking

Density Ratios

Legend: Not Paris, Paris

Page 20: Visual Element Discovery as Discriminative Mode Seeking

Density Ratios

Legend: Not Paris, Paris
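As a rough illustration of the density-ratio idea, the sketch below scores a candidate centroid by the positive ("Paris") density divided by the negative ("Not Paris") density. The triangular kernel, the `eps` guard, and the toy 2-D features are assumptions for illustration only.

```python
import math

def kernel_density(w, points, bandwidth):
    """Sum of triangular-kernel responses max(bandwidth - distance, 0)."""
    return sum(max(bandwidth - math.dist(w, x), 0.0) for x in points)

def density_ratio(w, positives, negatives, bandwidth, eps=1e-6):
    """Positive density over negative density; eps avoids division by zero."""
    return kernel_density(w, positives, bandwidth) / (
        kernel_density(w, negatives, bandwidth) + eps
    )

# Toy "patch features" (hypothetical): both sets are dense near (0, 0),
# but mostly the positive set has points near (3, 3).
positives = [(0.0, 0.1), (0.1, 0.0), (3.0, 3.0), (3.1, 2.9), (2.9, 3.1)]
negatives = [(0.0, 0.0), (0.1, 0.1), (-0.1, 0.0), (0.0, -0.1), (3.0, 2.5)]

common = density_ratio((0.0, 0.0), positives, negatives, bandwidth=1.0)
distinctive = density_ratio((3.0, 3.0), positives, negatives, bandwidth=1.0)
print(common, distinctive)
```

A region that is dense in both sets gets a ratio below 1, while a region that is dense mainly in the positive set gets a high ratio, even though both are modes of the positive density alone.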

Page 21: Visual Element Discovery as Discriminative Mode Seeking

Adaptive Bandwidth

Legend: Negative, Positive

Bandwidth
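One way to make the bandwidth adaptive is to choose, for each centroid, the bandwidth at which a fixed amount of negative data falls inside the kernel. The sketch below assumes a triangular kernel and a hypothetical target mass `beta`; neither the exact constraint nor the solver is taken from the talk.

```python
import math

def negative_mass(w, negatives, b):
    """Triangular-kernel mass of the negative points within bandwidth b of w."""
    return sum(max(b - math.dist(w, x), 0.0) for x in negatives)

def adaptive_bandwidth(w, negatives, beta, tol=1e-9):
    """Bisect for the bandwidth b at which the negative mass equals beta."""
    lo = 0.0
    # At this b even the farthest negative contributes beta, so mass >= beta.
    hi = max(math.dist(w, x) for x in negatives) + beta
    while hi - lo > tol:
        mid = (lo + hi) / 2
        # The mass is non-decreasing in b, so bisection is valid.
        if negative_mass(w, negatives, mid) < beta:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Toy negatives (hypothetical): three close together, one isolated.
negatives = [(0.0, 0.0), (0.2, 0.0), (0.0, 0.2), (4.0, 4.0)]
b_dense = adaptive_bandwidth((0.0, 0.0), negatives, beta=1.0)
b_sparse = adaptive_bandwidth((4.0, 4.0), negatives, beta=1.0)
print(b_dense, b_sparse)
```

The bandwidth comes out smaller where negatives are dense and larger where they are sparse, which matches the intuition of the slide.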

Page 22: Visual Element Discovery as Discriminative Mode Seeking

Discriminative Mode Seeking

• Find local optima of an estimate of the density ratio

• Allow an adaptive bandwidth

• Be extremely fast

– Minimize the number of passes through the data

Page 23: Visual Element Discovery as Discriminative Mode Seeking

Discriminative Mode Seeking

• Mean shift: maximize (w.r.t. w)

Equation annotations: centroid (w), patch feature, bandwidth (b), distance
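Reading the annotations on this slide (centroid, patch feature, bandwidth, distance) as parts of a single objective, one plausible form is sketched below. It assumes a triangular kernel, which the slide does not state; the symbols $x_i$ for the patch features, $d$ for the distance, and $n$ for the number of patches are introduced here for illustration.

```latex
\max_{w} \; \sum_{i=1}^{n} \max\bigl(b - d(x_i, w),\; 0\bigr)
```

Under this reading, each patch within distance $b$ of the centroid $w$ contributes more the closer it is, and patches farther than $b$ contribute nothing.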

Page 24: Visual Element Discovery as Discriminative Mode Seeking

Discriminative Mode Seeking

B(w) is the value of b satisfying:

Page 25: Visual Element Discovery as Discriminative Mode Seeking

Discriminative Mode Seeking

s.t.

optimize

• Distance metric: Normalized Correlation
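As a reminder of what normalized correlation computes, here is a minimal sketch using the standard definition: subtract each vector's mean, scale to unit length, and take the dot product. Whether the authors normalize patch features in exactly this way is an assumption.

```python
import math

def normalize(x):
    """Subtract the mean and scale to unit length (x must not be constant)."""
    mean = sum(x) / len(x)
    centered = [v - mean for v in x]
    norm = math.sqrt(sum(v * v for v in centered))
    return [v / norm for v in centered]

def normalized_correlation(x, y):
    """Dot product of the two normalized vectors; lies in [-1, 1]."""
    return sum(a * b for a, b in zip(normalize(x), normalize(y)))

same = normalized_correlation([1, 2, 3, 4], [10, 20, 30, 40])
opposite = normalized_correlation([1, 2, 3, 4], [4, 3, 2, 1])
print(same, opposite)
```

Once features are normalized this way, comparing a patch to a centroid reduces to a dot product, which is consistent with the linear score wTx - b that appears later in the talk.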

Page 26: Visual Element Discovery as Discriminative Mode Seeking

Discriminative Mode Seeking

optimize

s.t.

Figure labels: Negative, Positive, w

Page 27: Visual Element Discovery as Discriminative Mode Seeking

Optimization

• Initialization is straightforward

• For each element, just keep around 500 patches where w^T x - b > 0

• Trivially parallelizable in MapReduce

• Optimization is piecewise quadratic
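The idea of keeping only the patches on which an element fires can be sketched as follows. The function name `active_patches`, the choice to keep the highest-scoring patches when more than `keep` fire, and the toy data are all assumptions for illustration.

```python
def active_patches(w, b, patches, keep=500):
    """Return up to `keep` (score, patch) pairs with positive score w.x - b."""
    scored = [(sum(wi * xi for wi, xi in zip(w, x)) - b, x) for x in patches]
    # Patches with a non-positive score do not affect the element and are dropped.
    firing = [(s, x) for s, x in scored if s > 0]
    firing.sort(key=lambda pair: pair[0], reverse=True)
    return firing[:keep]

# Toy element and patches (hypothetical values).
w, b = (1.0, 0.0), 0.5
patches = [(0.9, 0.1), (0.2, 0.8), (0.6, 0.4), (0.5, 0.5)]
kept = active_patches(w, b, patches)
print([x for _, x in kept])
```

Because each element touches only its own small active set, the elements can be updated independently, which fits the slide's remark about parallelization.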

Page 28: Visual Element Discovery as Discriminative Mode Seeking

Evaluation via Purity-Coverage Plot

• Analogous to Precision-Recall Plot

Page 29: Visual Element Discovery as Discriminative Mode Seeking

Low Purity

Element 1

Element 2

Element 3

Element 4

Element 5

Page 30: Visual Element Discovery as Discriminative Mode Seeking

High purity, Low Coverage

Element 1

Element 2

Element 3

Element 4

Element 5

Page 31: Visual Element Discovery as Discriminative Mode Seeking

Purity-Coverage Curve

[Plot: purity (0 to 1) against coverage (0 to 10, x1e4 pixels). Legend: Paris, Not Paris]

Page 32: Visual Element Discovery as Discriminative Mode Seeking

Purity-Coverage Curve

[Plot: purity (0 to 1) against coverage (0 to 10, x1e4 pixels). Legend: Paris, Not Paris]

Page 33: Visual Element Discovery as Discriminative Mode Seeking

Purity-Coverage Curve

• Coverage for multiple elements is simply the union.
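A minimal sketch of how purity and coverage might be computed for a set of elements at one detection threshold. It assumes purity is the fraction of detections that come from positive images and coverage is the number of positive-image pixels covered by the union of detected patches; the data layout and names are hypothetical.

```python
def purity_coverage(detections, threshold):
    """Purity and coverage of all detections scoring at least `threshold`.

    Each detection is (score, is_positive, image_id, pixels), where `pixels`
    is a set of (row, col) locations covered by the detected patch.
    """
    fired = [d for d in detections if d[0] >= threshold]
    if not fired:
        return 1.0, 0  # convention chosen here for the empty case
    purity = sum(1 for d in fired if d[1]) / len(fired)
    covered = set()
    for _, is_positive, image_id, pixels in fired:
        if is_positive:
            # Union across elements: a pixel hit twice is counted once.
            covered.update((image_id, p) for p in pixels)
    return purity, len(covered)

def box(r0, c0, size):
    """Pixels of a size x size patch whose top-left corner is (r0, c0)."""
    return {(r, c) for r in range(r0, r0 + size) for c in range(c0, c0 + size)}

detections = [
    (0.9, True, "paris_1", box(0, 0, 4)),    # element 1
    (0.8, True, "paris_1", box(2, 2, 4)),    # element 2, overlaps element 1
    (0.7, False, "london_1", box(0, 0, 4)),  # detection in a negative image
    (0.2, True, "paris_2", box(0, 0, 4)),    # below the threshold
]
purity, coverage = purity_coverage(detections, threshold=0.5)
print(purity, coverage)
```

Sweeping the threshold from high to low traces out a curve, in the same way that a precision-recall curve is traced.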

Page 34: Visual Element Discovery as Discriminative Mode Seeking

Purity-Coverage

[Two plots: Top 25 Elements and Top 200 Elements. Each shows purity (0.8 to 1) against coverage (fraction of positive dataset).]

Legend:

• This work

• This work, no inter-element

• SVM Retrained 5x (Doersch et al. 2012)

• LDA Retrained 5x

• LDA Retrained

• Exemplar LDA (Hariharan et al. 2012)

Page 35: Visual Element Discovery as Discriminative Mode Seeking

Results on Indoor 67 Scenes

Kitchen Grocery Bowling

Elevator Bakery Bathroom

Page 36: Visual Element Discovery as Discriminative Mode Seeking

Results on Indoor 67 Scenes

Method                              Accuracy (%)
ROI+Gist (Quattoni et al.)          26.05
MM-Scene (Zhu et al.)               28.00
Scene-DPM (Pandey et al.)           30.40
CENTRIST (Wu et al.)                36.90
Object Bank (Li et al.)             37.60
RBoW (Parizi et al.)                37.93
Discr. Patches (Singh et al.)       38.10
Latent Pyramid. (Sadeghi et al.)    44.84
Bag of Parts (Juneja et al.)        46.10
miSVM (Li et al.)                   46.40
D. Patches (full) (Singh et al.)    49.40
MMDL (Wang et al.)                  50.15
Discr. Parts (Sun et al.)           51.40
IFV (Juneja et al.)                 60.77
Bag of Parts+IFV (Juneja et al.)    63.10
Ours (no inter-element)             63.36
Ours                                64.03
Ours+IFV                            66.87

Page 37: Visual Element Discovery as Discriminative Mode Seeking

Qualitative Indoor67 Results

Page 38: Visual Element Discovery as Discriminative Mode Seeking

Indoor67: Error Analysis

Ground Truth (GT): deli. Guess: grocery store

GT: corridor. Guess: staircase

GT: laundromat. Guess: closet

GT: museum. Guess: garage

Page 39: Visual Element Discovery as Discriminative Mode Seeking


Thank you!

More results at http://graphics.cs.cmu.edu/projects/discriminativeModeSeeking/

Paris Elements • Indoor 67 Elements • Indoor 67 Heatmaps • Source code (soon)

Page 40: Visual Element Discovery as Discriminative Mode Seeking

Some New Paris Elements