modern convolutional object detectorspeople.ee.duke.edu/~lcarin/kevin9.29.2017.pdf ·...

32
Background Faster R-CNN R-FCN SSD Speed/Accuracy Comparison Modern Convolutional Object Detectors Faster R-CNN, R-FCN, SSD 29 September 2017 Presented by: Kevin Liang

Upload: others

Post on 17-Jan-2020

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Modern Convolutional Object Detectorspeople.ee.duke.edu/~lcarin/Kevin9.29.2017.pdf · 2017-09-30 · Background Faster R-CNN R-FCN SSD Speed/Accuracy Comparison Conclusions Found

Background Faster R-CNN R-FCN SSD Speed/Accuracy Comparison

Modern Convolutional Object DetectorsFaster R-CNN, R-FCN, SSD

29 September 2017

Presented by: Kevin Liang

Page 2: Modern Convolutional Object Detectorspeople.ee.duke.edu/~lcarin/Kevin9.29.2017.pdf · 2017-09-30 · Background Faster R-CNN R-FCN SSD Speed/Accuracy Comparison Conclusions Found

Background Faster R-CNN R-FCN SSD Speed/Accuracy Comparison

Papers Presented

Faster R-CNN: Towards Real-Time Object Detection withRegion Proposal Networks (NIPS 2015)Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun

R-FCN: Object Detection via Region-based FullyConvolutional Networks (NIPS 2016)Jifeng Dai, Yi Li, Kaiming He, Jian Sun

SSD: Single Shot MultiBox Detector (ECCV 2016)Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed,

Cheng-Yang Fu, Alexander C Berg

Speed/accuracy trade-offs for modern convolutional objectdetectors (CVPR 2017)Jonathan Huang, Vivek Rathod, Chen Sun, Menglong Zhu, Anoop Korattikara,

Alireza Fathi, Ian Fischer, Zbigniew Wojna, Yang Song, Sergio Guadarrama,

Kevin Murphy

Page 3: Modern Convolutional Object Detectorspeople.ee.duke.edu/~lcarin/Kevin9.29.2017.pdf · 2017-09-30 · Background Faster R-CNN R-FCN SSD Speed/Accuracy Comparison Conclusions Found

Background Faster R-CNN R-FCN SSD Speed/Accuracy Comparison

Object Detection Models

Page 4: Modern Convolutional Object Detectorspeople.ee.duke.edu/~lcarin/Kevin9.29.2017.pdf · 2017-09-30 · Background Faster R-CNN R-FCN SSD Speed/Accuracy Comparison Conclusions Found

Background Faster R-CNN R-FCN SSD Speed/Accuracy Comparison

Section 1

Background

Page 5: Modern Convolutional Object Detectorspeople.ee.duke.edu/~lcarin/Kevin9.29.2017.pdf · 2017-09-30 · Background Faster R-CNN R-FCN SSD Speed/Accuracy Comparison Conclusions Found

Background Faster R-CNN R-FCN SSD Speed/Accuracy Comparison

Classification (top) vs Detection (bottom)

Page 6: Modern Convolutional Object Detectorspeople.ee.duke.edu/~lcarin/Kevin9.29.2017.pdf · 2017-09-30 · Background Faster R-CNN R-FCN SSD Speed/Accuracy Comparison Conclusions Found

Background Faster R-CNN R-FCN SSD Speed/Accuracy Comparison

Two-stage Detection Paradigm

Many algorithms break up detection into two components:

Box ProposalObject Classification

Page 7: Modern Convolutional Object Detectorspeople.ee.duke.edu/~lcarin/Kevin9.29.2017.pdf · 2017-09-30 · Background Faster R-CNN R-FCN SSD Speed/Accuracy Comparison Conclusions Found

Background Faster R-CNN R-FCN SSD Speed/Accuracy Comparison

Regions with CNN features (R-CNN): 2013

Pros:

Demonstrated ImageNet classification + AlexNet learnedlearned features also applicable for detection

Significant jump in mAP performance on PASCAL VOC overstate-of-the-art

Page 8: Modern Convolutional Object Detectorspeople.ee.duke.edu/~lcarin/Kevin9.29.2017.pdf · 2017-09-30 · Background Faster R-CNN R-FCN SSD Speed/Accuracy Comparison Conclusions Found

Background Faster R-CNN R-FCN SSD Speed/Accuracy Comparison

Regions with CNN features (R-CNN) (continued)

Cons:

Proposals generated by external algorithm (Selective Search)

SVM and bounding box regressors trained separately; savingfeatures to train these is memory intensive

Patches extracted from raw image; overlapping proposals leadto repeated CNN computation for classification

Detection takes ∼47 s/image (GPU)

Page 9: Modern Convolutional Object Detectorspeople.ee.duke.edu/~lcarin/Kevin9.29.2017.pdf · 2017-09-30 · Background Faster R-CNN R-FCN SSD Speed/Accuracy Comparison Conclusions Found

Background Faster R-CNN R-FCN SSD Speed/Accuracy Comparison

Fast R-CNN

Single pass of entire image through CNN; patches extractedfrom convolutional feature maps

Final classification and bounding box regression trained jointlywith CNN

Test time detection is 213x as fast as R-CNN

Proposals still generated by external algorithm

Page 10: Modern Convolutional Object Detectorspeople.ee.duke.edu/~lcarin/Kevin9.29.2017.pdf · 2017-09-30 · Background Faster R-CNN R-FCN SSD Speed/Accuracy Comparison Conclusions Found

Background Faster R-CNN R-FCN SSD Speed/Accuracy Comparison

Two-stage Detection Paradigm - Revisited

Page 11: Modern Convolutional Object Detectorspeople.ee.duke.edu/~lcarin/Kevin9.29.2017.pdf · 2017-09-30 · Background Faster R-CNN R-FCN SSD Speed/Accuracy Comparison Conclusions Found

Background Faster R-CNN R-FCN SSD Speed/Accuracy Comparison

Section 2

Faster R-CNN

Page 12: Modern Convolutional Object Detectorspeople.ee.duke.edu/~lcarin/Kevin9.29.2017.pdf · 2017-09-30 · Background Faster R-CNN R-FCN SSD Speed/Accuracy Comparison Conclusions Found

Background Faster R-CNN R-FCN SSD Speed/Accuracy Comparison

Two-stage Detection Paradigm - Revisited

Page 13: Modern Convolutional Object Detectorspeople.ee.duke.edu/~lcarin/Kevin9.29.2017.pdf · 2017-09-30 · Background Faster R-CNN R-FCN SSD Speed/Accuracy Comparison Conclusions Found

Background Faster R-CNN R-FCN SSD Speed/Accuracy Comparison

Faster R-CNN: Model Architecture

Box proposal andclassification shareconvolutionalcomputation

Patches extractedfrom convolutionalfeature map

Detection ∼5 fps

Page 14: Modern Convolutional Object Detectorspeople.ee.duke.edu/~lcarin/Kevin9.29.2017.pdf · 2017-09-30 · Background Faster R-CNN R-FCN SSD Speed/Accuracy Comparison Conclusions Found

Background Faster R-CNN R-FCN SSD Speed/Accuracy Comparison

Faster R-CNN: Region Proposal Network

Sliding window passed over final convolutional feature maps

Box proposals are parameterized relative to fixed sizereference boxes (termed ”anchors”)

Multiple scales and and aspect ratios (eg 1x1, 2x1, 1x2, 2x2,etc.)

Page 15: Modern Convolutional Object Detectorspeople.ee.duke.edu/~lcarin/Kevin9.29.2017.pdf · 2017-09-30 · Background Faster R-CNN R-FCN SSD Speed/Accuracy Comparison Conclusions Found

Background Faster R-CNN R-FCN SSD Speed/Accuracy Comparison

Faster R-CNN: Training

Page 16: Modern Convolutional Object Detectorspeople.ee.duke.edu/~lcarin/Kevin9.29.2017.pdf · 2017-09-30 · Background Faster R-CNN R-FCN SSD Speed/Accuracy Comparison Conclusions Found

Background Faster R-CNN R-FCN SSD Speed/Accuracy Comparison

Faster R-CNN: Loss

RPN Loss:

L(pi, ti) =1

Ncls

∑i

Lcls(pi, p∗i ) + λ

1

Nreg

∑i

p∗iLreg(ti, t∗i ) (1)

Bounding box parameterization:

tx = (x− xa)/wa, ty = (y − ya)/hatw = log(w/wa), th = log(h/ha)

t∗x = (x∗ − xa)/wa, t∗y = (y∗ − ya)/hat∗w = log(w∗/wa), t∗h = log(h∗/ha)

Localization loss:

Lreg(t, t∗) =

∑j∈x,y,w,h

smoothL1(tj − t∗j ),

smoothL1(x) =

{0.5x2 if |x| < 1

|x| − 0.5 otherwise

Page 17: Modern Convolutional Object Detectorspeople.ee.duke.edu/~lcarin/Kevin9.29.2017.pdf · 2017-09-30 · Background Faster R-CNN R-FCN SSD Speed/Accuracy Comparison Conclusions Found

Background Faster R-CNN R-FCN SSD Speed/Accuracy Comparison

Faster R-CNN: ResNet

ResNet weights:

Page 18: Modern Convolutional Object Detectorspeople.ee.duke.edu/~lcarin/Kevin9.29.2017.pdf · 2017-09-30 · Background Faster R-CNN R-FCN SSD Speed/Accuracy Comparison Conclusions Found

Background Faster R-CNN R-FCN SSD Speed/Accuracy Comparison

Section 3

R-FCN

Page 19: Modern Convolutional Object Detectorspeople.ee.duke.edu/~lcarin/Kevin9.29.2017.pdf · 2017-09-30 · Background Faster R-CNN R-FCN SSD Speed/Accuracy Comparison Conclusions Found

Background Faster R-CNN R-FCN SSD Speed/Accuracy Comparison

Region-based Fully Convolutional Networks: Inspiration

Fast and Faster R-CNN save time by sharing computation ofrepeated convolutional features for object classification andregion proposals, respectively

However, Faster R-CNN still contains several unshared fullyconnected layers that must be computed for each of hundredsof proposals

Dilemma of opposing needs for translation invariance(classification) and translation variance (localization)⇒ Leads to unnatural insertion of region proposals betweenconvolutional layers

Page 20: Modern Convolutional Object Detectorspeople.ee.duke.edu/~lcarin/Kevin9.29.2017.pdf · 2017-09-30 · Background Faster R-CNN R-FCN SSD Speed/Accuracy Comparison Conclusions Found

Background Faster R-CNN R-FCN SSD Speed/Accuracy Comparison

Region-based Fully Convolutional Networks: ModelArchitecture

Page 21: Modern Convolutional Object Detectorspeople.ee.duke.edu/~lcarin/Kevin9.29.2017.pdf · 2017-09-30 · Background Faster R-CNN R-FCN SSD Speed/Accuracy Comparison Conclusions Found

Background Faster R-CNN R-FCN SSD Speed/Accuracy Comparison

Region-based Fully Convolutional Networks: Visualization

Page 22: Modern Convolutional Object Detectorspeople.ee.duke.edu/~lcarin/Kevin9.29.2017.pdf · 2017-09-30 · Background Faster R-CNN R-FCN SSD Speed/Accuracy Comparison Conclusions Found

Background Faster R-CNN R-FCN SSD Speed/Accuracy Comparison

Region-based Fully Convolutional Networks: Benefits andContributions

2.5-20x faster than Faster R-CNN

Comparable accuracy

Easy Online Hard Example Mining (OHEM)

Use of atrous convolutions

Page 23: Modern Convolutional Object Detectorspeople.ee.duke.edu/~lcarin/Kevin9.29.2017.pdf · 2017-09-30 · Background Faster R-CNN R-FCN SSD Speed/Accuracy Comparison Conclusions Found

Background Faster R-CNN R-FCN SSD Speed/Accuracy Comparison

Section 4

SSD

Page 24: Modern Convolutional Object Detectorspeople.ee.duke.edu/~lcarin/Kevin9.29.2017.pdf · 2017-09-30 · Background Faster R-CNN R-FCN SSD Speed/Accuracy Comparison Conclusions Found

Background Faster R-CNN R-FCN SSD Speed/Accuracy Comparison

Two-stage Detection Paradigm - Revisited Again

Page 25: Modern Convolutional Object Detectorspeople.ee.duke.edu/~lcarin/Kevin9.29.2017.pdf · 2017-09-30 · Background Faster R-CNN R-FCN SSD Speed/Accuracy Comparison Conclusions Found

Background Faster R-CNN R-FCN SSD Speed/Accuracy Comparison

Single Shot Multibox Detector: Model Architecture

Page 26: Modern Convolutional Object Detectorspeople.ee.duke.edu/~lcarin/Kevin9.29.2017.pdf · 2017-09-30 · Background Faster R-CNN R-FCN SSD Speed/Accuracy Comparison Conclusions Found

Background Faster R-CNN R-FCN SSD Speed/Accuracy Comparison

Single Shot Multibox Detector: Default boxes

Page 27: Modern Convolutional Object Detectorspeople.ee.duke.edu/~lcarin/Kevin9.29.2017.pdf · 2017-09-30 · Background Faster R-CNN R-FCN SSD Speed/Accuracy Comparison Conclusions Found

Background Faster R-CNN R-FCN SSD Speed/Accuracy Comparison

Section 5

Speed/Accuracy Comparison

Page 28: Modern Convolutional Object Detectorspeople.ee.duke.edu/~lcarin/Kevin9.29.2017.pdf · 2017-09-30 · Background Faster R-CNN R-FCN SSD Speed/Accuracy Comparison Conclusions Found

Background Faster R-CNN R-FCN SSD Speed/Accuracy Comparison

CNNs and Meta Architectures Summary

Page 29: Modern Convolutional Object Detectorspeople.ee.duke.edu/~lcarin/Kevin9.29.2017.pdf · 2017-09-30 · Background Faster R-CNN R-FCN SSD Speed/Accuracy Comparison Conclusions Found

Background Faster R-CNN R-FCN SSD Speed/Accuracy Comparison

Accuracy vs Speed

Page 30: Modern Convolutional Object Detectorspeople.ee.duke.edu/~lcarin/Kevin9.29.2017.pdf · 2017-09-30 · Background Faster R-CNN R-FCN SSD Speed/Accuracy Comparison Conclusions Found

Background Faster R-CNN R-FCN SSD Speed/Accuracy Comparison

Accuracy/Speed vs Box proposals

Page 31: Modern Convolutional Object Detectorspeople.ee.duke.edu/~lcarin/Kevin9.29.2017.pdf · 2017-09-30 · Background Faster R-CNN R-FCN SSD Speed/Accuracy Comparison Conclusions Found

Background Faster R-CNN R-FCN SSD Speed/Accuracy Comparison

Conclusions

Found a speed-accuracy performance ”frontier”

For two-stage set-ups, more box proposals leads to betterperformance, but at the cost of speed and significantlydiminishing returns.

General correlation of convolutional feature extractorperformance on ImageNet classification and performance aspart of an object detector.

Higher resolution images lead to higher quality localization,but at the cost of speed and memory. A similar trade-offexists with respect to convolutional feature extractor outputstride (atrous convolutions).

Won 2016 MS COCO object detection challenge byensembling these implementations.

Page 32: Modern Convolutional Object Detectorspeople.ee.duke.edu/~lcarin/Kevin9.29.2017.pdf · 2017-09-30 · Background Faster R-CNN R-FCN SSD Speed/Accuracy Comparison Conclusions Found

Background Faster R-CNN R-FCN SSD Speed/Accuracy Comparison

Open-Sourced TensorFlow Implementations