【arxiv】feature evaluation of deep convolutional neural networks for object recognition and...

Feature Evaluation of Deep Convolutional Neural Networks for Object Recognition and Detection. Hirokatsu KATAOKA, Kenji Iwata, Yutaka SATOH. National Institute of Advanced Industrial Science and Technology (AIST) http://www.hirokatsukataoka.net/ arXiv preprint arXiv:1509.07627 http://arxiv.org/abs/1509.07627

Upload: hirokatsu-kataoka

Post on 14-Apr-2017


TRANSCRIPT

Page 1: 【arXiv】Feature Evaluation of Deep Convolutional Neural Networks for Object Recognition and Detection

Feature Evaluation of Deep Convolutional Neural Networks for Object Recognition and Detection

Hirokatsu KATAOKA, Kenji Iwata, Yutaka SATOH

National Institute of Advanced Industrial Science and Technology (AIST)

http://www.hirokatsukataoka.net/

arXiv preprint arXiv:1509.07627 http://arxiv.org/abs/1509.07627

Page 2

Feature Evaluation
•  Significant task in computer vision
  –  Following DeCAF [Donahue+, ICML2014], we evaluate several CNN features with an SVM classifier
  –  Representative architectures: AlexNet [Krizhevsky+, NIPS2012] & VGGNet [Simonyan+, ICLR2015]
  –  Basic Idea 1: Which layer in a CNN architecture yields the better feature?
  –  Basic Idea 2: Mid- & high-level CNN features should be concatenated! (e.g. Layer 3 + Layer 5 + Layer 7)

Page 3

CNN Architecture & Feature Extraction
•  AlexNet & VGGNet
  –  AlexNet: 8-layer architecture
  –  VGGNet: 16-layer architecture (each pooling layer and the last 2 FC layers are used as feature vectors)

[Figure: layer diagrams of AlexNet and VGGNet, each annotated with the feature extraction points Layer1 to Layer7. Legend: image input, convolutional layer, max-pooling layer, fully-connected layer, softmax layer.]
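The VGGNet extraction rule stated on this page (each pooling layer plus the last 2 FC layers) can be sketched as a small lookup. This is a minimal illustration, not the authors' code: the layer names (`conv1_1`, `pool5`, `fc6`, ...) follow a common naming convention and are assumptions, as is the choice of `fc6` and `fc7` as the "last 2 FC layers".

```python
# Number of convolutional layers in each of the five VGG-16 blocks.
VGG16_CONVS_PER_BLOCK = [2, 2, 3, 3, 3]


def vgg16_layers():
    """Return the VGG-16 layer sequence as a list of (assumed) layer names."""
    layers = ["input"]
    for block, n_convs in enumerate(VGG16_CONVS_PER_BLOCK, start=1):
        layers += [f"conv{block}_{i}" for i in range(1, n_convs + 1)]
        layers.append(f"pool{block}")
    layers += ["fc6", "fc7", "fc8", "softmax"]
    return layers


def feature_taps(layers):
    """Map Layer1..Layer7 to the layers whose outputs serve as features."""
    taps = [name for name in layers if name.startswith("pool")]
    # Assumption: the two FC layers used are fc6 and fc7.
    taps += ["fc6", "fc7"]
    return {f"Layer{i}": name for i, name in enumerate(taps, start=1)}


layers = vgg16_layers()
taps = feature_taps(layers)
for label, name in taps.items():
    print(label, "->", name)
```

Under these assumptions, Layer 5 corresponds to the fifth pooling layer, and Layers 6 and 7 to the two FC layers.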

Page 4

Experiment
•  Settings
  –  Layer: 3 – 7 (middle and deeper layers)
    •  Conv., pooling and fully-connected layers
  –  Concatenation and transformation
    •  Layer 345, 456, 567, 357
    •  Principal component analysis (PCA): 1,500 dims
  –  Classifier
    •  Support vector machine (SVM)
    •  The parameters are based on DeCAF [Donahue+, ICML2014]
•  Datasets
  –  Daimler pedestrian benchmark dataset (pedestrian detection) [Munder+, TPAMI2006]
  –  Caltech 101 dataset (object classification) [Fei-Fei+, CVPRW2004]
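To make the PCA step concrete, the following is a minimal pure-Python sketch of projecting feature vectors onto their top `k` principal components, using power iteration with deflation. It is an illustration only: the function name `pca`, the toy data, and the choice of power iteration are assumptions, and the slides do not say how PCA was computed. Reducing real CNN features to 1,500 dimensions would in practice call for a numerical library.

```python
import math


def pca(rows, k, iters=200):
    """Project rows onto their top-k principal components.

    Returns (projected_rows, components). Uses power iteration with
    deflation on the sample covariance matrix; suitable for small toy
    inputs only.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]

    components = []
    for _component in range(k):
        v = [1.0 / math.sqrt(d)] * d  # fixed, deterministic start vector
        for _step in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # no variance left in any direction
                break
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue for the unit vector v.
        lam = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
        components.append(v)
        # Deflate: remove the found component from the covariance matrix.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)]
               for i in range(d)]

    projected = [[sum(r[j] * c[j] for j in range(d)) for c in components]
                 for r in centered]
    return projected, components


# Toy data lying on the line y = 2x, reduced from 2 dimensions to 1.
rows = [[0.0, 0.0], [1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
projected, components = pca(rows, k=1)
print(components[0])
print(projected)
```

In the setting above, the same kind of projection would be applied to each layer's feature vectors with `k` set to 1,500 before they are passed to the SVM.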

Page 5

Results on the Daimler dataset
•  Daimler pedestrian benchmark dataset
  –  VGGNet Layer 5 (original vector) gives the best rate (99.35%)
  –  In AlexNet, Layer 3 with PCA gives the best rate (98.71%)

Mid-level layers tend to give better rates on the pedestrian detection data

Page 6

Results on the Caltech 101 dataset
•  Caltech 101 dataset
  –  VGGNet Layer 5 (original vector) gives the best rate (91.80%)
  –  In AlexNet, Layer 5 with PCA gives the best rate (78.37%)

The layer just before the FC layers achieves a good rate in object classification

Page 7

Feature Concatenation
•  Three-layer concatenation with PCA
  –  Layer 345, 456, 567, 357
  –  4,500 dimensions (1,500 dims for each vector)
  –  Left: Daimler
  –  Right: Caltech 101

[Figure: results of the concatenated features; left panel: Daimler, right panel: Caltech 101]

VGGNet Layer 567 is the best-performing setting
•  Pedestrian detection: mid-level feature
•  Object classification: high-level feature
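The dimension arithmetic of the three-layer concatenation can be checked with a short sketch. It assumes each layer's feature has already been reduced to 1,500 dimensions; the helper names `parse_combination` and `concatenate` and the zero-filled placeholder vectors are hypothetical, standing in for real PCA-reduced CNN features.

```python
PCA_DIMS = 1500  # dimensions kept per layer after PCA, as stated in the slides


def parse_combination(label):
    """Turn a combination label such as "357" into layer indices (3, 5, 7)."""
    return tuple(int(ch) for ch in label)


def concatenate(per_layer, label):
    """Concatenate the per-layer vectors named by a combination label."""
    combined = []
    for layer in parse_combination(label):
        combined.extend(per_layer[layer])
    return combined


# Placeholder vectors for Layers 3 to 7 (real features would go here).
per_layer = {layer: [0.0] * PCA_DIMS for layer in range(3, 8)}

for label in ("345", "456", "567", "357"):
    vector = concatenate(per_layer, label)
    print(label, len(vector))
```

Each of the four combinations yields a vector of 3 × 1,500 = 4,500 dimensions, matching the figure given in the slide.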

Page 8

Conclusion
•  Feature evaluation with AlexNet & VGGNet
  –  VGGNet is better than AlexNet
  –  Mid-level features are good for pedestrian detection, and high-level features are good for the object classification task
  –  Concatenating the VGGNet 5th pooling layer and the last 2 FC layers is the best setting on the Daimler pedestrian benchmark and the Caltech 101 dataset
  –  PCA is an effective transformation for CNN features