


Motivation

Reference B. Yao*, A. Khosla* and L. Fei-Fei. “Combining Randomization and Discrimination for Fine-Grained Image Categorization.” IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2011.

Our Work

• Approach: A model combining randomization and discrimination: a dense feature representation, classified by a random forest with discriminative decision trees.

[Figure: original image; traditional method (SPM) with fixed grid regions; our intuition: focus on discriminative regions, as humans do]

Dense Feature Representation

• Our representation consists of (pairs of) image regions of arbitrary sizes and at arbitrary locations:

Experiments

People-Playing-Musical-Instruments (PPMI)

PASCAL Action Dataset

Future Work:
• Improve speed by exploiting the inherent parallel nature of random forests using GPUs
• Strong classifiers with an analytical solution (e.g. LDA)
• Incorporate multiple features

Combining Randomization and Discrimination for Fine-Grained Image Categorization Bangpeng Yao*, Aditya Khosla* and Li Fei-Fei

{bangpeng,aditya86,feifeili}@cs.stanford.edu
The Vision Lab, Computer Science Dept., Stanford University

• Objective: Find image regions that contain discriminative information for fine-grained image categorization.

Caltech-UCSD Birds 200

• 200-class classification of bird species from North America

(* - indicates equal contribution)

[Figure: two California Gulls vs. a Herring Gull; our goal is to separate such visually similar categories]

Classification accuracy on Caltech-UCSD Birds 200:

  SPM   16.1%
  LLC   18.0%
  MKL   19.0%
  Ours  19.2%

• MKL: Uses many features including Gray/ColorSIFT, geometric blur, color histograms, etc.

[Figure: class and image heatmaps of discriminative regions for PPMI classes, e.g. Flute, Guitar, Violin; Playing Instrument vs. Not Playing Instrument]

Random forest with Discriminative Decision Trees

[Figure: a discriminative decision tree, with a strong classifier at each split node and class distributions at the leaf nodes]

Coarse-to-fine Learning

• 9-class classification of human actions (mAP, %)

[Figure: example PASCAL action classes: Playing Instrument, Reading, Riding Bike]

[Table: information gain of five candidate samples (1: 0.225, 2: 0.277, 3: 0.237, 4: 0.178, 5: 0.167); the sample with the highest gain is selected]

• Each region is described in the normalized image by its center and by its width and height, i.e. the location and size of the image region.
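As an illustration, sampling such a region can be sketched as follows (the helper name `sample_region` is hypothetical, not from the poster):

```python
import random

def sample_region(rng=random):
    # Sample a rectangular region in normalized [0, 1] image coordinates:
    # a random center (cx, cy) plus a random width and height, clipped
    # to the image boundary.
    cx, cy = rng.random(), rng.random()
    w, h = rng.random(), rng.random()
    x0, x1 = max(0.0, cx - w / 2), min(1.0, cx + w / 2)
    y0, y1 = max(0.0, cy - h / 2), min(1.0, cy + h / 2)
    return (x0, y0, x1, y1)

region = sample_region()
```

Repeating this at every candidate split yields the dense, randomized pool of regions the forest chooses from.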

PPMI Binary Classification

Generalization Error of RF

Training a Strong Classifier

• {1, …, 5} represent the original class labels
• Random binary assignment: each original class is randomly mapped to 0 or 1 (e.g. 1, 2, 3, 4, 5 → 0, 1, 1, 1, 0)
• Train a binary SVM on the resulting binary labels
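A minimal sketch of the random binary assignment step in pure Python (the function name is hypothetical, and the SVM training itself is omitted):

```python
import random

def random_binary_assignment(class_labels, rng=random):
    # Map each original class label to 0 or 1 uniformly at random,
    # retrying until both binary labels occur so that a binary SVM
    # could then be trained. Assumes at least two distinct classes.
    classes = sorted(set(class_labels))
    while True:
        mapping = {c: rng.randint(0, 1) for c in classes}
        if len(set(mapping.values())) == 2:
            break
    return [mapping[c] for c in class_labels], mapping

# Classes {1, ..., 5}, as on the poster.
binary_labels, mapping = random_binary_assignment([1, 2, 3, 4, 5, 1, 3])
```

Each tree draws its own random assignment, which is one source of the decorrelation between trees.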

Control Experiments

PPMI 24-Class Classification (mAP):

  BoW       22.7%
  Grouplet  36.7%
  SPM       39.1%
  LLC       41.8%
  Ours      47.0%

PPMI binary classification (playing vs. not playing an instrument):

[Figure: per-instrument mAP bars for BoW, SPM, Grouplet, LLC, and our method across bassoon, cello, clarinet, erhu, flute, French horn, guitar, harp, recorder, saxophone, trumpet, and violin; overall mAP values are 78.0%, 85.1%, 88.2%, 89.2%, and 92.1%, with our method best at 92.1%]

• Ours: Uses a single feature (ColorSIFT)

[Figure: example per-class results on Caltech-UCSD Birds 200, with class accuracies 85.7%, 66.7%, 45.5%, 31.8%, 20.0%, and 0.0%, and correctly vs. wrongly classified images marked]

• Learning our random forest classifier: at each node, select the best sample using the information gain criterion:

Features for image regions:
• Grayscale SIFT descriptors for PPMI and PASCAL Action
• ColorSIFT descriptors for Caltech-UCSD Birds-200
• Dense SIFT sampling at multiple scales (8, 12, 16, 24, 30)
• Locality-constrained Linear Coding (LLC) features

The information gain of splitting the training set A into subsets A_l and A_r is

  ΔI = H(A) − (|A_l| / |A|) · H(A_l) − (|A_r| / |A|) · H(A_r)

where A is the set of all training examples at the node and H(·) is the entropy of a set of training examples.
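The criterion can be computed as in this sketch, assuming the standard entropy-based definition (function names are illustrative):

```python
from collections import Counter
from math import log2

def entropy(labels):
    # Shannon entropy H of a set of class labels, in bits.
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(labels, left, right):
    # IG = H(A) - |A_l|/|A| * H(A_l) - |A_r|/|A| * H(A_r)
    n = len(labels)
    return (entropy(labels)
            - (len(left) / n) * entropy(left)
            - (len(right) / n) * entropy(right))

# A perfect split of two balanced classes gains one full bit:
ig = information_gain([0, 0, 1, 1], [0, 0], [1, 1])  # -> 1.0
```

The candidate sample whose split maximizes this quantity is kept at the node.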

• Classification of a test example: average the posteriors of all trees and take the argmax,

  c* = argmax_c (1/T) Σ_{t=1}^{T} p_t(c)

where T is the number of trees, c* is the predicted class label of the test example, and p_t(c) is the probability of the test example belonging to class c for tree t.
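A sketch of this averaging rule, assuming each tree outputs a posterior over classes (the dict representation is illustrative):

```python
def forest_predict(tree_probs):
    # tree_probs: one dict {class: probability} per tree.
    # The forest averages the per-tree posteriors and returns the
    # argmax class together with the averaged distribution.
    T = len(tree_probs)
    classes = set().union(*tree_probs)
    avg = {c: sum(p.get(c, 0.0) for p in tree_probs) / T for c in classes}
    return max(avg, key=avg.get), avg

label, avg = forest_predict([{"a": 0.7, "b": 0.3}, {"a": 0.4, "b": 0.6}])
# avg["a"] = 0.55, avg["b"] = 0.45, so the predicted label is "a"
```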

• Generalization error of a random forest is bounded by

  GE ≤ ρ̄ · (1 − s²) / s²

where ρ̄ is the correlation between decision trees and s is the strength of the decision trees.

• A dense feature space decreases ρ̄; strong classifiers increase s → better generalization.
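A quick numeric illustration of the bound, with hypothetical values for the correlation and strength:

```python
def ge_bound(rho_bar, s):
    # Bound on the generalization error of a random forest:
    #   GE <= rho_bar * (1 - s**2) / s**2
    return rho_bar * (1 - s ** 2) / s ** 2

# Denser feature space -> lower tree correlation rho_bar;
# stronger node classifiers -> higher strength s. Both tighten the bound:
loose = ge_bound(0.6, 0.8)   # higher correlation
tight = ge_bound(0.3, 0.9)   # lower correlation, higher strength
```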

[Figure: depth in the tree vs. area of the selected regions]

• Our method automatically learns a coarse-to-fine region of interest (e.g. shown below for ‘playing trumpet’ class)

• This is similar to the human visual system, which is believed to analyze raw input from low to high spatial frequencies, or from large global shapes to smaller local ones.