Object Detection Through Machine Learning
machinelearning.math.rs/stegic-objectdetection.pdf
Machine Learning and Applications Group, 2018.
Uroš Stegić
TRADITIONAL COMPUTER VISION
General Overview
Convolution Operator
Filters
Convolutions Over Volume
Traditional Computer Vision Convolutional Neural Networks Object Detection
Description
Process and analyze visual signal
Extract information from visual signal
Operate on the raw signal (pixel intensity values)
Enhance Intuition
Computer Graphics
(x0, y0) = (3, 4), r = 6
↓
[Figure: rendered image]
Computer Vision
[Figure: input image]
↓
(x0, y0) = (3, 4), r = 6
Tasks in Computer Vision
Object Recognition
Image Retrieval
Object Detection
OCR
Pose Estimation
...
Tracking
Scene Reconstruction
Optical Flow
Semantic Segmentation
Image Reconstruction
...
Convolution Operator - Definition
Definition
Let $A, B \in D \subseteq \mathbb{R}^{n \times n}$. The convolution operator, denoted by $*$, maps the space $D \times D$ to the field of real numbers and is defined as follows:
$$A * B = \sum_{i=1}^{n} \sum_{j=1}^{n} A_{ij} B_{ij}$$
Convolution Operator - Example
$$\begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix} * \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix} = 2 \cdot 1 + 4 \cdot 1 + 6 \cdot 1 + 8 \cdot 1 = 20$$
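The operator as defined on the slides is the sum of elementwise products of two equally sized matrices. A minimal sketch in plain Python follows; the name `conv_op` is illustrative and does not come from the slides. Note that this definition does not flip the second operand, so it corresponds to what deep learning libraries usually implement as cross-correlation.

```python
def conv_op(a, b):
    """Sum of elementwise products of two equally sized matrices (nested lists)."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("operands must have the same shape")
    return sum(x * y for ra, rb in zip(a, b) for x, y in zip(ra, rb))


# Operands from the slide example.
A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
B = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]

result = conv_op(A, B)
print(result)  # 20
```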
Filters
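$$\begin{bmatrix}
211 & 39 & 200 & 102 & 174 & 25 & 90 & 144 \\
138 & 44 & 184 & 110 & 193 & 30 & 92 & 136 \\
151 & 73 & 190 & 114 & 189 & 41 & 105 & 128 \\
129 & 101 & 123 & 181 & 201 & 169 & 117 & 191 \\
140 & 122 & 153 & 231 & 209 & 157 & 124 & 113 \\
221 & 115 & 77 & 244 & 198 & 149 & 156 & 247
\end{bmatrix} * \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix}$$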
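When the image is larger than the filter, the filter is applied to every window of matching size. The sketch below assumes no padding and a stride of 1 (neither is stated on the slide), and `apply_filter` is an illustrative name. The pixel values are the ones shown on the slide.

```python
def apply_filter(image, kernel):
    """Slide a square kernel over the image (no padding, stride 1)."""
    k = len(kernel)
    rows = len(image) - k + 1
    cols = len(image[0]) - k + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(k) for j in range(k))
            for c in range(cols)
        ]
        for r in range(rows)
    ]


image = [
    [211, 39, 200, 102, 174, 25, 90, 144],
    [138, 44, 184, 110, 193, 30, 92, 136],
    [151, 73, 190, 114, 189, 41, 105, 128],
    [129, 101, 123, 181, 201, 169, 117, 191],
    [140, 122, 153, 231, 209, 157, 124, 113],
    [221, 115, 77, 244, 198, 149, 156, 247],
]
kernel = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]

response = apply_filter(image, kernel)
print(len(response), len(response[0]))  # 4 6
print(response[0][0])  # 39 + 138 + 184 + 73 = 434
```

A 6x8 image and a 3x3 filter give a 4x6 response map under these assumptions.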
Filters - Examples
Vertical Edge Extractor
Horizontal Edge Extractor
Sobel filter
Sharpen
Gaussian Blur
Filters - Edge Extractor
[Figure: input image] $* \begin{bmatrix} 1 & 0 & -1 \\ 1 & 0 & -1 \\ 1 & 0 & -1 \end{bmatrix} =$ [Figure: filtered image]
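To see why this filter responds to vertical edges, it can be applied to a toy image that is bright on the left and dark on the right. The image data below is made up for illustration, and `apply_filter` is a hypothetical helper (no padding, stride 1).

```python
def apply_filter(image, kernel):
    """Slide a square kernel over the image (no padding, stride 1)."""
    k = len(kernel)
    rows = len(image) - k + 1
    cols = len(image[0]) - k + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(k) for j in range(k))
            for c in range(cols)
        ]
        for r in range(rows)
    ]


vertical_edge = [[1, 0, -1], [1, 0, -1], [1, 0, -1]]

# Assumed toy data: a vertical step edge and a constant image.
step = [[10, 10, 10, 0, 0, 0] for _ in range(3)]
flat = [[7] * 6 for _ in range(3)]

edge_response = apply_filter(step, vertical_edge)
flat_response = apply_filter(flat, vertical_edge)
print(edge_response)  # [[0, 30, 30, 0]]
print(flat_response)  # [[0, 0, 0, 0]]
```

The response is nonzero only in windows that straddle the intensity change, and zero wherever the image is constant.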
Filters - Sobel
[Figure: input image] $* \begin{bmatrix} 1 & 0 & -1 \\ 2 & 0 & -2 \\ 1 & 0 & -1 \end{bmatrix} =$ [Figure: filtered image]
Filters - Gaussian Blur
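[Figure: input image] $* \dfrac{1}{256} \begin{bmatrix} 1 & 4 & 6 & 4 & 1 \\ 4 & 16 & 24 & 16 & 4 \\ 6 & 24 & 36 & 24 & 6 \\ 4 & 16 & 24 & 16 & 4 \\ 1 & 4 & 6 & 4 & 1 \end{bmatrix} =$ [Figure: filtered image]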
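The 5x5 weights of this blur filter are the outer product of the binomial row 1, 4, 6, 4, 1 with itself, and the factor 1/256 makes them sum to one so that overall brightness is preserved. A short sketch that rebuilds the kernel (variable names are illustrative):

```python
from fractions import Fraction

row = [1, 4, 6, 4, 1]

# Outer product of the binomial row with itself gives the integer weights.
weights = [[a * b for b in row] for a in row]
total = sum(sum(r) for r in weights)

# Normalized kernel, kept as exact fractions.
kernel = [[Fraction(w, total) for w in r] for r in weights]

print(weights[1])  # [4, 16, 24, 16, 4]
print(total)  # 256
```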
Multiple Input Channels
Figure: Convolution of multichannel image
Multiple Filters
Figure: Convolution of multichannel image with two filters
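For a multichannel image, each filter has one k x k slice per input channel; the products are summed over all channels, so one filter yields one output map and several filters yield several output maps. A minimal sketch, assuming channel-first nested lists, no padding and stride 1; the function names and toy data are illustrative.

```python
def conv_volume(image, kernel):
    """image: C x H x W, kernel: C x k x k. Returns one (H-k+1) x (W-k+1) map."""
    channels = len(kernel)
    if len(image) != channels:
        raise ValueError("kernel depth must equal the number of image channels")
    k = len(kernel[0])
    rows = len(image[0]) - k + 1
    cols = len(image[0][0]) - k + 1
    return [
        [
            sum(image[ch][r + i][c + j] * kernel[ch][i][j]
                for ch in range(channels) for i in range(k) for j in range(k))
            for c in range(cols)
        ]
        for r in range(rows)
    ]


def conv_layer(image, kernels):
    """Apply several filters; the result has one output channel per filter."""
    return [conv_volume(image, kernel) for kernel in kernels]


# Assumed toy data: a 3-channel 4x4 image whose channels are constant 1, 2, 3.
image = [[[v] * 4 for _ in range(4)] for v in (1, 2, 3)]
sum_filter = [[[1] * 3 for _ in range(3)] for _ in range(3)]
first_channel_only = [
    [[1] * 3 for _ in range(3)],
    [[0] * 3 for _ in range(3)],
    [[0] * 3 for _ in range(3)],
]

output = conv_layer(image, [sum_filter, first_channel_only])
print(len(output), len(output[0]), len(output[0][0]))  # 2 2 2
print(output[0][0][0], output[1][0][0])  # 54 9
```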
CONVOLUTIONAL NEURAL NETWORKS
Parameter Learning
Basic CNNs
Residual Networks
Inception Networks
Basic Concepts
Figure: Convolutional layers stacked
Basic Concepts - Takeaway
Image Classification
Parameters (filters) Learning [LBD+89]
Weight Sharing
Feature Extraction
Basic Concepts - Feature Abstractions
Figure: Feature Visualization [ZF13]
Basic Concepts - Pooling Layers
Sampling Important Features
Reduce Computation Time
Make Features Robust
Basic Concepts - Pooling Layers (Example)
Pooling Layer - Max Pooling
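$$\begin{bmatrix} 9 & 2 & 4 & 1 \\ 3 & 1 & 8 & 2 \\ 4 & 5 & 9 & 2 \\ 5 & 6 & 0 & 1 \end{bmatrix} \longrightarrow \begin{bmatrix} 9 & 8 \\ 6 & 9 \end{bmatrix}$$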
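The example corresponds to max pooling with a 2x2 window and stride 2: each non-overlapping 2x2 block is replaced by its largest value. A minimal sketch (the function name and default arguments are illustrative, inferred from the example rather than stated on the slide):

```python
def max_pool(feature_map, size=2, stride=2):
    """Replace each size x size window by its maximum value."""
    rows = (len(feature_map) - size) // stride + 1
    cols = (len(feature_map[0]) - size) // stride + 1
    return [
        [
            max(feature_map[r * stride + i][c * stride + j]
                for i in range(size) for j in range(size))
            for c in range(cols)
        ]
        for r in range(rows)
    ]


feature_map = [
    [9, 2, 4, 1],
    [3, 1, 8, 2],
    [4, 5, 9, 2],
    [5, 6, 0, 1],
]

pooled = max_pool(feature_map)
print(pooled)  # [[9, 8], [6, 9]]
```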
Basic Concepts - Architecture
Figure: Convolutional Neural Network - Example
CNN Architecture - LeNet-5
Figure: LeNet-5 Architecture [LBBH98]
CNN Architecture - VGG
Figure: VGG Architecture
CNN Architecture - AlexNet
Figure: AlexNet Architecture [KSH12]
CNN - Problems
Vanishing Gradient
Exploding Gradient
Computational Complexity
Residual Block
Figure: Residual Block (Skip Connection) [HZRS15]
Residual Network
Figure: CNN Architecture - ResNet-34 [HZRS15]
1x1 Convolution
Figure: 1x1 Convolution [LCY13]
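A 1x1 convolution mixes the channels at each spatial position independently, which makes it a cheap way to change the number of channels. The sketch below assumes channel-first nested lists and omits bias and nonlinearity; `conv_1x1` and the toy data are illustrative.

```python
def conv_1x1(image, weights):
    """image: C_in x H x W, weights: C_out x C_in. Returns C_out x H x W."""
    c_in = len(image)
    height, width = len(image[0]), len(image[0][0])
    return [
        [
            [sum(w[ch] * image[ch][r][c] for ch in range(c_in))
             for c in range(width)]
            for r in range(height)
        ]
        for w in weights
    ]


# Assumed toy data: 3 input channels on a 2x2 grid, reduced to 1 output channel.
image = [
    [[1, 2], [3, 4]],
    [[10, 20], [30, 40]],
    [[100, 200], [300, 400]],
]
weights = [[1, 1, 1]]

reduced = conv_1x1(image, weights)
print(reduced)  # [[[111, 222], [333, 444]]]
```

The spatial size is unchanged; only the channel dimension goes from 3 to 1.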
Inception Module - Idea
Figure: Inception Module Naive Version [SLJ+14]
Inception Module - Redone
Figure: Inception Module With Dimension Reduction [SLJ+14]
Inception Network
Figure: Inception Network (GoogLeNet) [SLJ+14]
OBJECT DETECTION
Task Outline
YOLO
RCNN Family
Other Influential Models
Speed/Accuracy Trade-Off
Visualizing the Task
Figure: Object Detection Task
Understanding the Bounding Box Error
Figure: Bounding Box Mismatch
Defining the IoU
Figure: Intersection over Union
Gaining Intuition on IoU
Figure: Intersection over Union - Example
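Intersection over Union is the area shared by two boxes divided by the area covered by either of them. A minimal sketch, assuming axis-aligned boxes given as (x_min, y_min, x_max, y_max); this box format is an assumption, since the slides do not fix one.

```python
def iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x_min, y_min, x_max, y_max)."""
    inter_w = max(0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    intersection = inter_w * inter_h

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


print(iou((0, 0, 2, 2), (0, 0, 2, 2)))  # 1.0 (identical boxes)
print(iou((0, 0, 1, 1), (2, 2, 3, 3)))  # 0.0 (disjoint boxes)
print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # 1/7: overlap 1, union 4 + 4 - 1
```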
Similar Bounding Boxes Problem
Figure: Elimination of Multiple Bounding Boxes
Non-Maximum Suppression
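Discard every bounding box whose detection probability falls below a threshold
Sort the remaining bounding boxes by detection probability in decreasing order
For each bounding box $b_i$, in that order, remove all remaining bounding boxes $b_j$ ($j \neq i$) such that $\mathrm{IoU}(b_i, b_j) \geq t$ for some fixed $t$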
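The three steps can be sketched as a greedy loop: a box is kept only if it does not overlap too strongly with a higher-scoring box that was already kept. The box format (x_min, y_min, x_max, y_max), the function names and both default thresholds are assumptions made for this illustration.

```python
def iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x_min, y_min, x_max, y_max)."""
    inter_w = max(0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    intersection = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def non_max_suppression(detections, score_threshold=0.5, iou_threshold=0.5):
    """detections: list of (score, box). Returns kept detections, best first."""
    # Steps 1 and 2: drop low-confidence boxes, then sort by score.
    candidates = sorted(
        (d for d in detections if d[0] >= score_threshold),
        key=lambda d: d[0],
        reverse=True,
    )
    # Step 3: keep a box only if no already kept box overlaps it too much.
    kept = []
    for score, box in candidates:
        if all(iou(box, kept_box) < iou_threshold for _, kept_box in kept):
            kept.append((score, box))
    return kept


detections = [
    (0.8, (1, 1, 11, 11)),    # overlaps the 0.9 box, IoU = 81/119
    (0.9, (0, 0, 10, 10)),
    (0.7, (20, 20, 30, 30)),  # separate object
    (0.3, (0, 0, 10, 10)),    # below the score threshold
]
kept = non_max_suppression(detections)
print(kept)  # [(0.9, (0, 0, 10, 10)), (0.7, (20, 20, 30, 30))]
```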
YOLO - Introduction
Figure: Grid for YOLO
$$y = \begin{bmatrix} p_c & b_x & b_y & b_w & b_h & c_1 & c_2 & \cdots & c_n \end{bmatrix}^T$$
You Only Look Once: Unified, Real-Time Object Detection [RDGF15]
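In YOLO, the grid cell that contains the center of an object is responsible for predicting it. The sketch below builds the target vector for that cell. It assumes that box coordinates are normalized to the image, that the center is stored as an offset inside the cell, and that classes are one-hot encoded; these conventions and the name `encode_target` are illustrative, not taken from the slides.

```python
def encode_target(box, class_id, num_classes, grid_size):
    """box = (cx, cy, w, h), all normalized to [0, 1] relative to the image.

    Returns (row, col, y) where y = [p_c, b_x, b_y, b_w, b_h, c_1, ..., c_n].
    """
    cx, cy, w, h = box
    # Grid cell containing the box center (clamped for cx or cy equal to 1).
    col = min(int(cx * grid_size), grid_size - 1)
    row = min(int(cy * grid_size), grid_size - 1)
    # Assumption: center coordinates are expressed relative to that cell.
    bx = cx * grid_size - col
    by = cy * grid_size - row
    one_hot = [1.0 if c == class_id else 0.0 for c in range(num_classes)]
    return row, col, [1.0, bx, by, w, h] + one_hot


row, col, y = encode_target(
    (0.625, 0.375, 0.2, 0.4), class_id=1, num_classes=3, grid_size=4
)
print(row, col)  # 1 2
print(y)  # [1.0, 0.5, 0.5, 0.2, 0.4, 0.0, 1.0, 0.0]
```

Cells that contain no object center would get $p_c = 0$, in which case the remaining entries are ignored.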
Limitations (already?)
Problem: Multiple objects centered in same cell
Anchor Boxes
Choose a number of anchors (predefined bboxes)
Select a ratio (width and height) for each of them
Modify the output to include these anchors
...
Profit
Anchor Boxes - Example
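$$y_1 = \begin{bmatrix} p_{c1} & b_{x1} & b_{y1} & b_{w1} & b_{h1} & c_{11} & \cdots & c_{n1} \end{bmatrix}^T, \quad y_2 = \begin{bmatrix} p_{c2} & b_{x2} & b_{y2} & b_{w2} & b_{h2} & c_{12} & \cdots & c_{n2} \end{bmatrix}^T, \quad y = \begin{bmatrix} y_1 \\ y_2 \end{bmatrix}$$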
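With anchors, each cell's output is the concatenation of one slot per anchor. One common way to fill it, assumed here rather than taken from the slides, is to assign an object to the anchor whose shape has the highest IoU with the object's box. The anchor shapes and function names below are hypothetical.

```python
def shape_iou(wh_a, wh_b):
    """IoU of two boxes sharing the same center, given as (width, height)."""
    intersection = min(wh_a[0], wh_b[0]) * min(wh_a[1], wh_b[1])
    union = wh_a[0] * wh_a[1] + wh_b[0] * wh_b[1] - intersection
    return intersection / union


def encode_with_anchors(box, class_id, num_classes, anchors):
    """box = (b_x, b_y, b_w, b_h). Returns (anchor index, y = [y_1; y_2; ...])."""
    slot = 5 + num_classes
    # Pick the anchor whose shape matches the object best.
    best = max(range(len(anchors)), key=lambda k: shape_iou(box[2:], anchors[k]))
    one_hot = [1.0 if c == class_id else 0.0 for c in range(num_classes)]
    y = [0.0] * (slot * len(anchors))
    y[best * slot:(best + 1) * slot] = [1.0, *box] + one_hot
    return best, y


# Hypothetical anchors: one tall, one wide.
anchors = [(0.2, 0.6), (0.6, 0.2)]
best, y = encode_with_anchors(
    (0.5, 0.5, 0.5, 0.25), class_id=0, num_classes=2, anchors=anchors
)
print(best)  # 1 (the wide anchor)
print(y[:7])  # [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
print(y[7:])  # [1.0, 0.5, 0.5, 0.5, 0.25, 1.0, 0.0]
```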
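YOLO - Loss Function
$$
\begin{aligned}
L(y, \hat{y}) ={}& \lambda_{\text{coord}} \sum_{i=0}^{s^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{obj}} \left[ (x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2 \right] \\
&+ \lambda_{\text{coord}} \sum_{i=0}^{s^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{obj}} \left[ \left( \sqrt{w_i} - \sqrt{\hat{w}_i} \right)^2 + \left( \sqrt{h_i} - \sqrt{\hat{h}_i} \right)^2 \right] \\
&+ \sum_{i=0}^{s^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{obj}} (C_i - \hat{C}_i)^2 \\
&+ \lambda_{\text{noobj}} \sum_{i=0}^{s^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{noobj}} (C_i - \hat{C}_i)^2 \\
&+ \sum_{i=0}^{s^2} \mathbb{1}_{i}^{\text{obj}} \sum_{c \in \text{classes}} (p_i(c) - \hat{p}_i(c))^2
\end{aligned}
$$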
Region Based Approach
Propose Regions of Interest
Classify each RoI
Regress Bounding Box Coordinates
Region Models
Regions with CNN (R-CNN) [GDDM13]
Fast R-CNN [Gir15]
Faster R-CNN [RHGS15]
Mask R-CNN [HGDG17]
Region Proposals - Selective Search
Figure: Selective Search Algorithm Visualized
R-CNN
Figure: R-CNN Pipeline
Fast R-CNN
Convolution Based Sliding Window
RoI Pooling
Softmax Classification
Fast R-CNN - Sliding Window
Figure: Sliding Window - CNN Implementation
Fast R-CNN - Visualized
Figure: Fast R-CNN Pipeline
Fast R-CNN - Loss
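$$L(p, u, t^u, v) = L_{\text{cls}}(p, u) + \lambda [u \geq 1] L_{\text{loc}}(t^u, v)$$
$$L_{\text{cls}}(p, u) = -\log p_u$$
$$L_{\text{loc}}(t^u, v) = \sum_{i \in \{x, y, w, h\}} \text{smooth}_{L_1}(t^u_i - v_i)$$
$$\text{smooth}_{L_1}(x) = \begin{cases} 0.5x^2, & \text{if } |x| \leq 1 \\ |x| - 0.5, & \text{otherwise} \end{cases}$$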
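The loss can be sketched directly from these formulas. The code below uses the absolute value in the smooth L1 term, as in the Fast R-CNN paper [Gir15], and treats class 0 as background so that the localization term is skipped for background RoIs. The function names and the default `lam=1.0` are assumptions for this illustration.

```python
import math


def smooth_l1(x):
    """Quadratic near zero, linear for large |x|."""
    ax = abs(x)
    return 0.5 * x * x if ax < 1 else ax - 0.5


def loc_loss(t_u, v):
    """Sum of smooth L1 terms over the (x, y, w, h) offsets."""
    return sum(smooth_l1(t - target) for t, target in zip(t_u, v))


def fast_rcnn_loss(p, u, t_u, v, lam=1.0):
    """p: class probabilities (index 0 = background), u: true class,
    t_u: predicted offsets for class u, v: target offsets."""
    cls = -math.log(p[u])
    # Iverson bracket [u >= 1]: no box regression for background RoIs.
    loc = loc_loss(t_u, v) if u >= 1 else 0.0
    return cls + lam * loc


print(smooth_l1(0.5))   # 0.125
print(smooth_l1(-3.0))  # 2.5
loss = fast_rcnn_loss([0.5, 0.5], 1, [0.5, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0])
print(loss)  # log(2) + 0.125
```

The linear branch keeps gradients bounded for large regression errors, which makes the loss less sensitive to outliers than a squared error.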
Faster R-CNN
Bottleneck: Region Proposals by Selective Search (2s)
Solution: Region Proposals by CNN (0.01s)
Region Proposal Network
Figure: Region Proposal Network for Faster R-CNN
RPN - Loss
$$L(p_i, t_i) = \frac{1}{N_{\text{cls}}} \sum_i L_{\text{cls}}(p_i, p_i^*) + \lambda \frac{1}{N_{\text{reg}}} \sum_i p_i^* L_{\text{reg}}(t_i, t_i^*)$$
Faster R-CNN - Architecture
Figure: Model Scheme of Faster R-CNN
Mask R-CNN
Figure: Model Scheme of Mask R-CNN
Other Influential Models
RetinaNet (Focal Loss) [LGG+17]
Single Shot Detector [LAE+15]
RetinaNet
Figure: RetinaNet - Overview
Speed vs. Precision
Figure: GPU Time vs. Precision [HRS+16]
Lecture Pronouncement
CONVERGENCE
References I
[GDDM13] Ross B. Girshick, Jeff Donahue, Trevor Darrell, and Jitendra Malik, Rich feature hierarchies for accurate object detection and semantic segmentation, CoRR abs/1311.2524 (2013).
[Gir15] Ross B. Girshick, Fast R-CNN, CoRR abs/1504.08083 (2015).
[HGDG17] Kaiming He, Georgia Gkioxari, Piotr Dollár, and Ross B. Girshick, Mask R-CNN, CoRR abs/1703.06870 (2017).
References II
[HRS+16] Jonathan Huang, Vivek Rathod, Chen Sun, Menglong Zhu, Anoop Korattikara, Alireza Fathi, Ian Fischer, Zbigniew Wojna, Yang Song, Sergio Guadarrama, and Kevin Murphy, Speed/accuracy trade-offs for modern convolutional object detectors, CoRR abs/1611.10012 (2016).
[HZRS15] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun, Deep residual learning for image recognition, CoRR abs/1512.03385 (2015).
References III
[KSH12] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton, Imagenet classification with deep convolutional neural networks, Advances in Neural Information Processing Systems 25 (F. Pereira, C. J. C. Burges, L. Bottou, and K. Q. Weinberger, eds.), Curran Associates, Inc., 2012, pp. 1097–1105.
[LAE+15] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott E. Reed, Cheng-Yang Fu, and Alexander C. Berg, SSD: single shot multibox detector, CoRR abs/1512.02325 (2015).
[LBBH98] Y. Lecun, L. Bottou, Y. Bengio, and P. Haffner, Gradient-based learning applied to document recognition, IEEE (1998), 2278–2324.
References IV
[LBD+89] Yann Lecun, Bernhard Boser, John Denker, Don Henderson, R E. Howard, W.E. Hubbard, and Larry Jackel, Backpropagation applied to handwritten zip code recognition, Neural Computation 1 (1989), 541–551.
[LCY13] Min Lin, Qiang Chen, and Shuicheng Yan, Network in network, CoRR abs/1312.4400 (2013).
[LGG+17] Tsung-Yi Lin, Priya Goyal, Ross B. Girshick, Kaiming He, and Piotr Dollár, Focal loss for dense object detection, CoRR abs/1708.02002 (2017).
[RDGF15] Joseph Redmon, Santosh Kumar Divvala, Ross B. Girshick, and Ali Farhadi, You only look once: Unified, real-time object detection, CoRR abs/1506.02640 (2015).
References V
[RHGS15] Shaoqing Ren, Kaiming He, Ross B. Girshick, and Jian Sun, Faster R-CNN: towards real-time object detection with region proposal networks, CoRR abs/1506.01497 (2015).
[SLJ+14] Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott E. Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich, Going deeper with convolutions, CoRR abs/1409.4842 (2014).
[ZF13] Matthew D. Zeiler and Rob Fergus, Visualizing and understanding convolutional networks, CoRR abs/1311.2901 (2013).