


Motivation

Reference B. Yao*, A. Khosla* and L. Fei-Fei. “Combining Randomization and Discrimination for Fine-Grained Image Categorization.” IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2011.

Our Work

• Approach: A model combining randomization and discrimination: a dense feature representation, classified by a random forest with discriminative decision trees.

[Figure: original image; traditional method (SPM) with fixed grid regions; our intuition: focus on discriminative regions, as humans do]

Dense Feature Representation

• Our representation consists of (pairs of) image regions of arbitrary sizes and at arbitrary locations:

Experiments

People-Playing-Musical-Instruments (PPMI)

PASCAL Action Dataset

Future Work:
• Improve speed by exploiting the inherent parallel nature of random forests using GPUs
• Strong classifiers with an analytical solution (e.g. LDA)
• Incorporate multiple features

Combining Randomization and Discrimination for Fine-Grained Image Categorization Bangpeng Yao*, Aditya Khosla* and Li Fei-Fei

{bangpeng,aditya86,feifeili}@cs.stanford.edu
The Vision Lab, Computer Science Dept., Stanford University

• Objective: Find image regions that contain discriminative information for fine-grained image categorization.

Caltech-UCSD Birds 200

• 200-class classification of bird species from North America

(* - indicates equal contribution)

[Figure: two California Gulls vs. a Herring Gull; our goal is to separate such visually similar categories]

Classification accuracy on Caltech-UCSD Birds 200:

  SPM   16.1%
  LLC   18.0%
  MKL   19.0%
  Ours  19.2%

• MKL: Uses many features including Gray/ColorSIFT, geometric blur, color histograms, etc.

[Figure: class and image heatmaps of discriminative regions for PPMI classes, e.g. Flute, Guitar, Violin; Playing Instrument vs. Not Playing Instrument]

Random forest with Discriminative Decision Trees

[Figure: a discriminative decision tree, with a strong classifier at each split node and class distributions at the leaf nodes]

Coarse-to-fine Learning

• 9-class classification of human actions (mAP, %)

[Figure: example PASCAL action classes: Playing Instrument, Reading, Riding Bike]

[Table: information gain of five candidate samples (1: 0.225, 2: 0.277, 3: 0.237, 4: 0.178, 5: 0.167); the sample with the highest gain is selected]

• Each region is described in the normalized image by its center and by its width and height, i.e. the location and size of the image region.
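As an illustration, sampling such a region can be sketched as follows (the helper name `sample_region` is hypothetical, not from the poster):

```python
import random

def sample_region(rng=random):
    # Sample a rectangular region in normalized [0, 1] image coordinates:
    # a random center (cx, cy) plus a random width and height, clipped
    # to the image boundary.
    cx, cy = rng.random(), rng.random()
    w, h = rng.random(), rng.random()
    x0, x1 = max(0.0, cx - w / 2), min(1.0, cx + w / 2)
    y0, y1 = max(0.0, cy - h / 2), min(1.0, cy + h / 2)
    return (x0, y0, x1, y1)

region = sample_region()
```

Repeating this at every candidate split yields the dense, randomized pool of regions the forest chooses from.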

PPMI Binary Classification

Generalization Error of RF

Training a Strong Classifier

• {1, …, 5} represent the original class labels
• Random binary assignment: each original class is randomly mapped to 0 or 1 (e.g. 1, 2, 3, 4, 5 → 0, 1, 1, 1, 0)
• Train a binary SVM on the resulting binary labels
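A minimal sketch of the random binary assignment step in pure Python (the function name is hypothetical, and the SVM training itself is omitted):

```python
import random

def random_binary_assignment(class_labels, rng=random):
    # Map each original class label to 0 or 1 uniformly at random,
    # retrying until both binary labels occur so that a binary SVM
    # could then be trained. Assumes at least two distinct classes.
    classes = sorted(set(class_labels))
    while True:
        mapping = {c: rng.randint(0, 1) for c in classes}
        if len(set(mapping.values())) == 2:
            break
    return [mapping[c] for c in class_labels], mapping

# Classes {1, ..., 5}, as on the poster.
binary_labels, mapping = random_binary_assignment([1, 2, 3, 4, 5, 1, 3])
```

Each tree draws its own random assignment, which is one source of the decorrelation between trees.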

Control Experiments

PPMI 24-Class Classification (mAP):

  BoW       22.7%
  Grouplet  36.7%
  SPM       39.1%
  LLC       41.8%
  Ours      47.0%

PPMI binary classification (playing vs. not playing an instrument):

[Figure: per-instrument mAP bars for BoW, SPM, Grouplet, LLC, and our method across bassoon, cello, clarinet, erhu, flute, French horn, guitar, harp, recorder, saxophone, trumpet, and violin; overall mAP values are 78.0%, 85.1%, 88.2%, 89.2%, and 92.1%, with our method best at 92.1%]

• Ours: Uses a single feature (ColorSIFT)

[Figure: example per-class results on Caltech-UCSD Birds 200, with class accuracies 85.7%, 66.7%, 45.5%, 31.8%, 20.0%, and 0.0%, and correctly vs. wrongly classified images marked]

• Learning our random forest classifier: at each node, select the best sample using the information gain criterion:

Features for image regions:
• Grayscale SIFT descriptors for PPMI and PASCAL Action
• ColorSIFT descriptors for Caltech-UCSD Birds-200
• Dense SIFT sampling at multiple scales (8, 12, 16, 24, 30)
• Locality-constrained Linear Coding (LLC) features

The information gain of splitting the training set A into subsets A_l and A_r is

  ΔI = H(A) − (|A_l| / |A|) · H(A_l) − (|A_r| / |A|) · H(A_r)

where A is the set of all training examples at the node and H(·) is the entropy of a set of training examples.
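The criterion can be computed as in this sketch, assuming the standard entropy-based definition (function names are illustrative):

```python
from collections import Counter
from math import log2

def entropy(labels):
    # Shannon entropy H of a set of class labels, in bits.
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(labels, left, right):
    # IG = H(A) - |A_l|/|A| * H(A_l) - |A_r|/|A| * H(A_r)
    n = len(labels)
    return (entropy(labels)
            - (len(left) / n) * entropy(left)
            - (len(right) / n) * entropy(right))

# A perfect split of two balanced classes gains one full bit:
ig = information_gain([0, 0, 1, 1], [0, 0], [1, 1])  # -> 1.0
```

The candidate sample whose split maximizes this quantity is kept at the node.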

• Classification of a test example: average the posteriors of all trees and take the argmax,

  c* = argmax_c (1/T) Σ_{t=1}^{T} p_t(c)

where T is the number of trees, c* is the predicted class label of the test example, and p_t(c) is the probability of the test example belonging to class c for tree t.
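A sketch of this averaging rule, assuming each tree outputs a posterior over classes (the dict representation is illustrative):

```python
def forest_predict(tree_probs):
    # tree_probs: one dict {class: probability} per tree.
    # The forest averages the per-tree posteriors and returns the
    # argmax class together with the averaged distribution.
    T = len(tree_probs)
    classes = set().union(*tree_probs)
    avg = {c: sum(p.get(c, 0.0) for p in tree_probs) / T for c in classes}
    return max(avg, key=avg.get), avg

label, avg = forest_predict([{"a": 0.7, "b": 0.3}, {"a": 0.4, "b": 0.6}])
# avg["a"] = 0.55, avg["b"] = 0.45, so the predicted label is "a"
```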

• Generalization error of a random forest is bounded by

  GE ≤ ρ̄ · (1 − s²) / s²

where ρ̄ is the correlation between decision trees and s is the strength of the decision trees.

• A dense feature space decreases ρ̄; strong classifiers increase s → better generalization.
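A quick numeric illustration of the bound, with hypothetical values for the correlation and strength:

```python
def ge_bound(rho_bar, s):
    # Bound on the generalization error of a random forest:
    #   GE <= rho_bar * (1 - s**2) / s**2
    return rho_bar * (1 - s ** 2) / s ** 2

# Denser feature space -> lower tree correlation rho_bar;
# stronger node classifiers -> higher strength s. Both tighten the bound:
loose = ge_bound(0.6, 0.8)   # higher correlation
tight = ge_bound(0.3, 0.9)   # lower correlation, higher strength
```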

[Figure: depth in the tree vs. area of the selected regions]

• Our method automatically learns a coarse-to-fine region of interest (e.g. shown below for ‘playing trumpet’ class)

• This is similar to the human visual system, which is believed to analyze raw input from low to high spatial frequencies, or from large global shapes to smaller local ones.