【arxiv】feature evaluation of deep convolutional neural networks for object recognition and...

Feature Evaluation of Deep Convolutional Neural Networks for Object Recognition and Detection Hirokatsu KATAOKA , Kenji Iwata, Yutaka SATOH National Institute of Advanced Industrial Science and Technology (AIST) http://www.hirokatsukataoka.net/ arXiv preprint arXiv:1509.07627 http://arxiv.org/abs/1509.07627

Upload: hirokatsu-kataoka

Post on 14-Apr-2017

TRANSCRIPT

Feature Evaluation of Deep Convolutional Neural Networks for Object Recognition and Detection

Hirokatsu KATAOKA, Kenji Iwata, Yutaka SATOH

National Institute of Advanced Industrial Science and Technology (AIST)

http://www.hirokatsukataoka.net/

arXiv preprint arXiv:1509.07627 http://arxiv.org/abs/1509.07627

Page 2

Feature Evaluation
•  Significant task in computer vision
   –  Based on DeCAF [Donahue+, ICML2014], we evaluate several CNN features + SVM classifier
   –  Representative architectures: AlexNet [Krizhevsky+, NIPS2012] & VGGNet [Simonyan+, ICLR2015]
   –  Basic Idea 1: Which layer of a CNN architecture gives the better feature?
   –  Basic Idea 2: Mid- & high-level CNN features should be concatenated (e.g. Layer 3 + Layer 5 + Layer 7)
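The "CNN features + SVM classifier" setup can be illustrated with a small sketch. This is not the authors' code: the features here are random toy vectors standing in for CNN activations, and the classifier is a hand-written linear SVM trained by subgradient descent on the hinge loss, used only so the example needs nothing beyond NumPy. In practice an off-the-shelf SVM implementation would likely be used, with parameters following DeCAF as the slides state. The names `train_linear_svm` and `predict` and all hyperparameter values are assumptions for illustration.

```python
import numpy as np

def train_linear_svm(X, y, C=1.0, lr=0.1, epochs=200):
    """Minimize 0.5*||w||^2 + C * mean(hinge loss) by full-batch subgradient descent.

    X: (n_samples, n_dims) feature matrix; y: labels in {-1, +1}.
    Hyperparameters are illustrative, not taken from the paper.
    """
    n_samples, n_dims = X.shape
    w = np.zeros(n_dims)
    b = 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        active = margins < 1.0  # samples that violate the margin
        grad_w = w - C * (y[active][:, None] * X[active]).sum(axis=0) / n_samples
        grad_b = -C * y[active].sum() / n_samples
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def predict(X, w, b):
    """Return +1 or -1 depending on the side of the separating hyperplane."""
    return np.where(X @ w + b >= 0.0, 1, -1)

# Toy stand-in for CNN features: two well-separated Gaussian blobs in 10 dims.
rng = np.random.default_rng(0)
pos = rng.normal(loc=+2.0, size=(50, 10))
neg = rng.normal(loc=-2.0, size=(50, 10))
X = np.vstack([pos, neg])
y = np.concatenate([np.ones(50), -np.ones(50)])

w, b = train_linear_svm(X, y)
accuracy = (predict(X, w, b) == y).mean()
print(f"training accuracy: {accuracy:.2f}")
```

Swapping the toy matrix `X` for activations taken from a given layer would give one classifier per layer, which is the comparison behind Basic Idea 1.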

Page 3

CNN Architecture & Feature Extraction
•  AlexNet & VGGNet
   –  AlexNet: 8-layer architecture
   –  VGGNet: 16-layer architecture (each pooling layer and the last 2 FC layers are used as feature vectors)

[Figure: AlexNet and VGGNet architecture diagrams. Each network runs from Input through Conv and Pool stages to Softmax, with the feature extraction points labeled Layer1 to Layer7. Legend: image input, convolutional layer, max-pooling layer, fully-connected layer, softmax layer.]

Page 4

Experiment
•  Settings
   –  Layer: 3 – 7 (middle and deeper layers)
      •  Conv., pooling and fully-connected layers
   –  Concatenation and transformation
      •  Layer 345, 456, 567, 357
      •  Principal component analysis (PCA): 1,500 dims
   –  Classifier
      •  Support vector machine (SVM)
      •  The parameters are based on DeCAF [Donahue+, ICML2014]
•  Datasets
   –  Daimler pedestrian benchmark dataset (pedestrian detection) [Munder+, TPAMI2006]
   –  Caltech 101 dataset (object classification) [Fei-Fei+, CVPRW2004]
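The "concatenation and transformation" setting can be sketched as follows: each layer's feature matrix is reduced with PCA, and the reduced vectors are stacked side by side. This is a minimal sketch under assumptions, not the authors' implementation. The function names `fit_pca` and `project_and_concat` are hypothetical, the toy dimensions (5 components per layer, small random matrices) replace the 1,500 dims per layer used in the slides, and fitting PCA on training data only is an assumed protocol that the slides do not spell out.

```python
import numpy as np

def fit_pca(X, n_components):
    """Fit PCA on X (n_samples, n_dims) via SVD; return (mean, components)."""
    mean = X.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]

def project_and_concat(layer_features, pcas):
    """Project each layer with its own PCA, then concatenate along the feature axis."""
    reduced = [(X - mean) @ comps.T
               for X, (mean, comps) in zip(layer_features, pcas)]
    return np.hstack(reduced)

# Toy stand-ins for features from three layers (e.g. Layers 3, 5 and 7).
rng = np.random.default_rng(0)
raw_dims = (64, 48, 32)  # illustrative raw feature sizes, not the real ones
train_layers = [rng.normal(size=(40, d)) for d in raw_dims]
test_layers = [rng.normal(size=(8, d)) for d in raw_dims]

# Assumption: PCA is fitted on training features and reused for test features.
pcas = [fit_pca(X, n_components=5) for X in train_layers]
train_vec = project_and_concat(train_layers, pcas)
test_vec = project_and_concat(test_layers, pcas)
print(train_vec.shape, test_vec.shape)  # (40, 15) (8, 15)
```

With 1,500 components per layer instead of 5, the same three-layer concatenation yields 3 × 1,500 = 4,500 dimensions per sample.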

Page 5

Results on the Daimler dataset
•  Daimler pedestrian benchmark dataset
   –  VGGNet Layer 5 (original vector) gives the best rate (99.35%)
   –  In AlexNet, Layer 3 with PCA gives the best rate (98.71%)

Mid-level layers tend to give better rates on the pedestrian detection data

Page 6

Results on the Caltech 101 dataset
•  Caltech 101 dataset
   –  VGGNet Layer 5 (original vector) gives the best rate (91.80%)
   –  In AlexNet, Layer 5 with PCA gives the best rate (78.37%)

The layer just before the FC layers gives a good rate in object classification

Page 7

Feature Concatenation
•  Three-layer connection with PCA
   –  Layer 345, 456, 567, 357
   –  4,500 dimensions (1,500 dims for each vector)
   –  Left: Daimler
   –  Right: Caltech 101

[Figure: feature concatenation results. Left panel: Daimler. Right panel: Caltech 101.]

VGGNet layer 567 is the significant tuning
Pedestrian detection: mid-level feature
Object classification: high-level feature