object detection - through machine learningmachinelearning.math.rs/stegic-objectdetection.pdf ·...

Object Detection Through Machine Learning

Machine Learning and Applications Group, 2018.

Uroš Stegić


TRADITIONAL COMPUTER VISION

General Overview

Convolution Operator

Filters

Convolutions Over Volume

Traditional Computer Vision
Convolutional Neural Networks
Object Detection

Description

Process & analyze visual signal
Extract information from visual signal
Perform on raw signal (pixel intensity values)


Enhance Intuition

Computer Graphics

(x0, y0) = (3, 4), r = 6
↓
[rendered image]

Computer Vision

[input image]
↓
(x0, y0) = (3, 4), r = 6


Tasks in Computer Vision

Object Recognition
Image Retrieval
Object Detection
OCR
Pose Estimation
Tracking
Scene Reconstruction
Optical Flow
Semantic Segmentation
Image Reconstruction
...




Convolution Operator - Definition

Definition. Let $A, B \in D \subseteq \mathbb{R}^{n \times n}$. The convolution operator, denoted $\ast$, maps the space $D \times D$ to the field of real numbers and is defined as follows:

$$A \ast B = \sum_{i=1}^{n} \sum_{j=1}^{n} A_{ij} B_{ij}$$


Convolution Operator - Example

$$\begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix} \ast \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix} = 2 \cdot 1 + 4 \cdot 1 + 6 \cdot 1 + 8 \cdot 1 = 20$$
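The operator defined above can be sketched in a few lines of plain Python. The function name `conv_op` is hypothetical and chosen only for this illustration. Note that, as in the definition, the kernel is not flipped, which is the usual convention in the CNN literature.

```python
def conv_op(a, b):
    """Sum of elementwise products of two n x n matrices (no kernel flip)."""
    n = len(a)
    if len(b) != n or any(len(row) != n for row in a + b):
        raise ValueError("both inputs must be n x n matrices of the same size")
    return sum(a[i][j] * b[i][j] for i in range(n) for j in range(n))


A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
B = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
print(conv_op(A, B))  # 2 + 4 + 6 + 8 = 20
```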


Filters

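$$\begin{bmatrix}
211 & 39 & 200 & 102 & 174 & 25 & 90 & 144 \\
138 & 44 & 184 & 110 & 193 & 30 & 92 & 136 \\
151 & 73 & 190 & 114 & 189 & 41 & 105 & 128 \\
129 & 101 & 123 & 181 & 201 & 169 & 117 & 191 \\
140 & 122 & 153 & 231 & 209 & 157 & 124 & 113 \\
221 & 115 & 77 & 244 & 198 & 149 & 156 & 247
\end{bmatrix} \ast \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix}$$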
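When the image is larger than the filter, the filter is applied at every position where it fits. A minimal sketch, assuming no padding and a stride of 1 (the slide does not state either); `filter2d` is a hypothetical helper name:

```python
def filter2d(image, kernel):
    """Slide a k x k kernel over the image (no padding, stride 1)."""
    k = len(kernel)
    out_h = len(image) - k + 1
    out_w = len(image[0]) - k + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(k) for j in range(k))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


image = [
    [211, 39, 200, 102, 174, 25, 90, 144],
    [138, 44, 184, 110, 193, 30, 92, 136],
    [151, 73, 190, 114, 189, 41, 105, 128],
    [129, 101, 123, 181, 201, 169, 117, 191],
    [140, 122, 153, 231, 209, 157, 124, 113],
    [221, 115, 77, 244, 198, 149, 156, 247],
]
kernel = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]

response = filter2d(image, kernel)
print(len(response), len(response[0]))  # 4 6
print(response[0][0])  # 39 + 138 + 184 + 73 = 434
```

Under these assumptions a 6 x 8 input and a 3 x 3 filter give a 4 x 6 response map.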


Filters - Examples

Vertical Edge Extractor
Horizontal Edge Extractor
Sobel filter
Sharpen
Gaussian Blur


Filters - Edge Extractor

$$\text{image} \ast \begin{bmatrix} 1 & 0 & -1 \\ 1 & 0 & -1 \\ 1 & 0 & -1 \end{bmatrix} = \text{filtered image}$$


Filters - Sobel

$$\text{image} \ast \begin{bmatrix} 1 & 0 & -1 \\ 2 & 0 & -2 \\ 1 & 0 & -1 \end{bmatrix} = \text{filtered image}$$
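To see what the edge kernels respond to, one can apply them to a synthetic image with a single vertical step from bright to dark. This is only a sketch: the test image, the helper name `filter2d`, and the choice of no padding and stride 1 are assumptions made for the illustration.

```python
def filter2d(image, kernel):
    """Slide a k x k kernel over the image (no padding, stride 1)."""
    k = len(kernel)
    out_h = len(image) - k + 1
    out_w = len(image[0]) - k + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(k) for j in range(k))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


edge_kernel = [[1, 0, -1], [1, 0, -1], [1, 0, -1]]
sobel_kernel = [[1, 0, -1], [2, 0, -2], [1, 0, -1]]

# Bright left half, dark right half: one vertical edge in the middle.
step_image = [[10, 10, 10, 0, 0, 0] for _ in range(4)]

print(filter2d(step_image, edge_kernel)[0])   # [0, 30, 30, 0]
print(filter2d(step_image, sobel_kernel)[0])  # [0, 40, 40, 0]
```

Both kernels give zero on flat regions and a strong response where the window straddles the step. The Sobel kernel weights the middle row more heavily.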


Filters - Gaussian Blur

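$$\text{image} \ast \frac{1}{256} \begin{bmatrix} 1 & 4 & 6 & 4 & 1 \\ 4 & 16 & 24 & 16 & 4 \\ 6 & 24 & 36 & 24 & 6 \\ 4 & 16 & 24 & 16 & 4 \\ 1 & 4 & 6 & 4 & 1 \end{bmatrix} = \text{filtered image}$$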
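The 5 x 5 kernel above is the outer product of the binomial row (1, 4, 6, 4, 1) with itself, and its entries sum to 256, which explains the 1/256 factor. A short check (the variable names are chosen only for this sketch):

```python
row = [1, 4, 6, 4, 1]  # binomial coefficients C(4, k)

# Outer product of the row with itself gives the 5 x 5 kernel.
kernel = [[a * b for b in row] for a in row]
total = sum(sum(r) for r in kernel)

print(kernel[1])  # [4, 16, 24, 16, 4]
print(kernel[2])  # [6, 24, 36, 24, 6]
print(total)      # 256
```

Because the normalized weights sum to 1, the blur averages neighboring pixels without changing the overall brightness of the image.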


Multiple Input Channels

Figure: Convolution of multichannel image



Multiple Filters

Figure: Convolution of multichannel image with two filters
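Convolution over a volume can be sketched as follows: each filter has the same number of channels as the input, produces one output channel, and the outputs of all filters are stacked. The function name `conv_volume`, the H x W x C layout, and the choice of no padding and stride 1 are assumptions of this sketch.

```python
def conv_volume(image, filters):
    """image: H x W x C nested lists; filters: list of k x k x C nested lists.

    Returns an (H-k+1) x (W-k+1) x F volume (no padding, stride 1),
    where F is the number of filters.
    """
    k = len(filters[0])
    channels = len(image[0][0])
    out_h = len(image) - k + 1
    out_w = len(image[0]) - k + 1
    return [
        [
            [
                sum(image[r + i][c + j][ch] * f[i][j][ch]
                    for i in range(k)
                    for j in range(k)
                    for ch in range(channels))
                for f in filters
            ]
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


# 4 x 4 image with 3 channels, every value equal to 1.
image = [[[1, 1, 1] for _ in range(4)] for _ in range(4)]
# Two 3 x 3 x 3 filters: all ones and all twos.
ones = [[[1, 1, 1] for _ in range(3)] for _ in range(3)]
twos = [[[2, 2, 2] for _ in range(3)] for _ in range(3)]

out = conv_volume(image, [ones, twos])
print(len(out), len(out[0]), len(out[0][0]))  # 2 2 2
print(out[0][0])  # [27, 54]
```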

CONVOLUTIONAL NEURAL NETWORKS

Parameter Learning

Basic CNNs

Residual Networks

Inception Networks


Basic Concepts

Figure: Convolutional layers stacked



Basic Concepts - Takeaway

Image Classification
Parameters (filters) Learning [LBD+89]
Weight Sharing
Feature Extraction


Basic Concepts - Feature Abstractions

Figure: Feature Visualization [ZF13]



Basic Concepts - Pooling Layers

Sampling Important Features
Reduce Computation Time
Make Features Robust


Basic Concepts - Pooling Layers (Example)

Pooling Layer - Max Pooling

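$$\begin{bmatrix} 9 & 2 & 4 & 1 \\ 3 & 1 & 8 & 2 \\ 4 & 5 & 9 & 2 \\ 5 & 6 & 0 & 1 \end{bmatrix} \longrightarrow \begin{bmatrix} 9 & 8 \\ 6 & 9 \end{bmatrix}$$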
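The example above corresponds to max pooling with a 2 x 2 window and a stride of 2. A minimal sketch, where the function name `max_pool` and its default arguments are assumptions chosen to match the example:

```python
def max_pool(x, size=2, stride=2):
    """Take the maximum over each size x size window, moving by `stride`."""
    out_h = (len(x) - size) // stride + 1
    out_w = (len(x[0]) - size) // stride + 1
    return [
        [
            max(x[r * stride + i][c * stride + j]
                for i in range(size) for j in range(size))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


x = [
    [9, 2, 4, 1],
    [3, 1, 8, 2],
    [4, 5, 9, 2],
    [5, 6, 0, 1],
]
print(max_pool(x))  # [[9, 8], [6, 9]]
```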


Basic Concepts - Architecture

Figure: Convolutional Neural Network - Example



CNN Architecture - LeNet-5

Figure: LeNet-5 Architecture [LBBH98]


CNN Architecture - VGG

Figure: VGG Architecture



CNN Architecture - AlexNet

Figure: AlexNet Architecture [KSH12]



CNN - Problems

Vanishing Gradient
Exploding Gradient
Computational Complexity


Residual Block

Figure: Residual Block (Skip Connection) [HZRS15]
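The idea of the skip connection in [HZRS15] is that a block outputs F(x) + x rather than F(x) alone, followed by a nonlinearity. The sketch below works on plain vectors; the names `relu` and `residual_block` and the use of a Python callable for F are assumptions of this illustration, not the architecture from the figure.

```python
def relu(v):
    """Elementwise max(0, a)."""
    return [max(0.0, a) for a in v]


def residual_block(x, f):
    """Return relu(f(x) + x); `f` must preserve the length of `x`."""
    fx = f(x)
    if len(fx) != len(x):
        raise ValueError("f must preserve the input length")
    return relu([a + b for a, b in zip(fx, x)])


x = [1.0, 2.0, 3.0]
# If the learned branch outputs zeros, the block passes x through unchanged.
print(residual_block(x, lambda v: [0.0] * len(v)))         # [1.0, 2.0, 3.0]
print(residual_block(x, lambda v: [-2.0 * a for a in v]))  # [0.0, 0.0, 0.0]
```

The first call shows why such blocks are easy to stack: a block whose branch contributes nothing still behaves as the identity on non-negative inputs.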



Residual Network

Figure: CNN Architecture - ResNet-34 [HZRS15]



1x1 Convolution

Figure: 1x1 Convolution [LCY13]
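A 1x1 convolution applies the same linear map across channels at every pixel, so it can change the number of channels without touching the spatial size. A minimal sketch, assuming an H x W x C_in layout and a C_out x C_in weight matrix (the name `conv1x1` is hypothetical):

```python
def conv1x1(image, weights):
    """image: H x W x C_in nested lists; weights: C_out x C_in matrix."""
    return [
        [
            [sum(w * v for w, v in zip(w_row, pixel)) for w_row in weights]
            for pixel in row
        ]
        for row in image
    ]


image = [[[1, 2, 3], [4, 5, 6]]]     # 1 x 2 image with 3 channels
weights = [[1, 0, 0], [1, 1, 1]]     # 3 input channels -> 2 output channels

print(conv1x1(image, weights))  # [[[1, 6], [4, 15]]]
```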



Inception Module - Idea

Figure: Inception Module Naive Version [SLJ+14]



Inception Module - Redone

Figure: Inception Module With Dimension Reduction [SLJ+14]



Inception Network

Figure: Inception Network (GoogLeNet) [SLJ+14]


OBJECT DETECTION

Task Outline

YOLO

RCNN Family

Other Influential Models

Speed/Accuracy Trade-Off


Visualizing the Task

Figure: Object Detection Task


Understanding the Bounding Box Error

Figure: Bounding Box Mismatch


Defining the IoU

Figure: Intersection over Union



Gaining Intuition on IoU

Figure: Intersection over Union - Example
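Intersection over Union is the area of overlap of two boxes divided by the area of their union. A minimal sketch, assuming axis-aligned boxes given as (x1, y1, x2, y2) corner coordinates (the slides do not fix a box format):

```python
def iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    # Width and height of the overlap are clamped at zero for disjoint boxes.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


print(iou((0, 0, 2, 2), (0, 0, 2, 2)))  # 1.0 (identical boxes)
print(iou((0, 0, 2, 2), (3, 3, 5, 5)))  # 0.0 (disjoint boxes)
print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # 1 / 7, about 0.143
```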



Similar Bounding Boxes Problem

Figure: Elimination of Multiple Bounding Boxes



Non-Maximum Suppression

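Discard every bounding box whose detection probability is below a threshold
Sort the remaining bounding boxes by detection probability in decreasing order
For each bounding box $b_i$, in that order, remove all bounding boxes $b_j$ ($j \neq i$) such that $IoU(b_i, b_j) \geq t$ for some fixed $t$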
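The three steps above can be sketched as a greedy loop. The function names, the (x1, y1, x2, y2) box format, and the default thresholds of 0.5 are assumptions made for this illustration.

```python
def iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, score_threshold=0.5, iou_threshold=0.5):
    """Greedy non-maximum suppression; returns kept (score, box) pairs."""
    # Step 1: drop low-confidence boxes.
    candidates = [(s, b) for s, b in zip(scores, boxes) if s >= score_threshold]
    # Step 2: highest detection probability first.
    candidates.sort(key=lambda sb: sb[0], reverse=True)
    # Step 3: keep a box only if it does not overlap a kept box too much.
    kept = []
    for score, box in candidates:
        if all(iou(box, kept_box) < iou_threshold for _, kept_box in kept):
            kept.append((score, box))
    return kept


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30), (0, 0, 5, 5)]
scores = [0.9, 0.8, 0.7, 0.3]
print(nms(boxes, scores))
# [(0.9, (0, 0, 10, 10)), (0.7, (20, 20, 30, 30))]
```

The second box overlaps the first with IoU 81/119 and is suppressed; the fourth never passes the score threshold.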


YOLO - Introduction

Figure: Grid for YOLO

$$y = \begin{bmatrix} p_c \\ b_x \\ b_y \\ b_w \\ b_h \\ c_1 \\ c_2 \\ \vdots \\ c_n \end{bmatrix}$$

You Only Look Once: Unified, Real-Time Object Detection [RDGF15]
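A sketch of how a single ground-truth object could be turned into the vector y for its grid cell. It assumes that box coordinates are given as fractions of the image size, that the center is stored relative to its cell, and that classes are one-hot encoded; the slide does not spell out these conventions, and the name `encode_target` is hypothetical.

```python
def encode_target(cx, cy, w, h, class_id, num_classes, grid_size):
    """Return (row, col, y) for one object.

    cx, cy, w, h are fractions of the image size in [0, 1].
    y = [p_c, b_x, b_y, b_w, b_h, c_1, ..., c_n].
    """
    # The cell containing the object's center is responsible for it.
    col = min(int(cx * grid_size), grid_size - 1)
    row = min(int(cy * grid_size), grid_size - 1)
    # Center offset inside that cell, in [0, 1).
    bx = cx * grid_size - col
    by = cy * grid_size - row
    classes = [1.0 if k == class_id else 0.0 for k in range(num_classes)]
    return row, col, [1.0, bx, by, w, h] + classes


row, col, y = encode_target(0.55, 0.30, 0.2, 0.4,
                            class_id=1, num_classes=3, grid_size=3)
print(row, col)  # 0 1
print(len(y))    # 8
```

A cell without an object would get p_c = 0, in which case the remaining entries carry no information.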


Limitations (already?)

Problem: Multiple objects centered in same cell



Anchor Boxes

Choose a number of anchors (predefined bboxes)
Select a ratio (width and height) for each of them
Modify the output to include these anchors
...
Profit


Anchor Boxes - Example

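$$y_1 = \begin{bmatrix} p_{c1} \\ b_{x1} \\ b_{y1} \\ b_{w1} \\ b_{h1} \\ c_{11} \\ \vdots \\ c_{n1} \end{bmatrix}, \quad y_2 = \begin{bmatrix} p_{c2} \\ b_{x2} \\ b_{y2} \\ b_{w2} \\ b_{h2} \\ c_{12} \\ \vdots \\ c_{n2} \end{bmatrix}, \quad y = \begin{bmatrix} y_1 \\ y_2 \end{bmatrix}$$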
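With anchors, the output of one cell is the concatenation of one slot per anchor. The sketch below assigns each object to the anchor whose shape overlaps it best, which is one common matching rule and an assumption here; the helper names and the anchor shapes are also made up for the illustration.

```python
def shape_iou(wh_a, wh_b):
    """IoU of two boxes that share the same center, given as (width, height)."""
    inter = min(wh_a[0], wh_b[0]) * min(wh_a[1], wh_b[1])
    union = wh_a[0] * wh_a[1] + wh_b[0] * wh_b[1] - inter
    return inter / union


def encode_cell(objects, anchors, num_classes):
    """objects: list of (bx, by, bw, bh, class_id) centered in the same cell.

    Returns y = [y_1, y_2, ...] flattened, one slot of 5 + n values per anchor.
    """
    slot = 5 + num_classes
    y = [0.0] * (slot * len(anchors))
    for bx, by, bw, bh, class_id in objects:
        best = max(range(len(anchors)),
                   key=lambda k: shape_iou((bw, bh), anchors[k]))
        classes = [1.0 if c == class_id else 0.0 for c in range(num_classes)]
        # Overwrite the whole slot of the best-matching anchor.
        y[best * slot:(best + 1) * slot] = [1.0, bx, by, bw, bh] + classes
    return y


anchors = [(0.2, 0.6), (0.6, 0.2)]             # tall anchor, wide anchor
objects = [(0.5, 0.5, 0.25, 0.7, 0),           # tall object, class 0
           (0.5, 0.5, 0.7, 0.3, 1)]            # wide object, class 1
y = encode_cell(objects, anchors, num_classes=2)
print(y[:7])  # [1.0, 0.5, 0.5, 0.25, 0.7, 1.0, 0.0]
print(y[7:])  # [1.0, 0.5, 0.5, 0.7, 0.3, 0.0, 1.0]
```

Two objects that match the same anchor in the same cell would still collide in this scheme; the later one overwrites the earlier.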


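YOLO - Loss Function

$$
\begin{aligned}
L(y, \hat{y}) ={} & \lambda_{coord} \sum_{i=0}^{s^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj} \left[ (x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2 \right] \\
& + \lambda_{coord} \sum_{i=0}^{s^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj} \left[ \left( \sqrt{w_i} - \sqrt{\hat{w}_i} \right)^2 + \left( \sqrt{h_i} - \sqrt{\hat{h}_i} \right)^2 \right] \\
& + \sum_{i=0}^{s^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj} \left( C_i - \hat{C}_i \right)^2 \\
& + \lambda_{noobj} \sum_{i=0}^{s^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{noobj} \left( C_i - \hat{C}_i \right)^2 \\
& + \sum_{i=0}^{s^2} \mathbb{1}_{i}^{obj} \sum_{c \in classes} \left( p_i(c) - \hat{p}_i(c) \right)^2
\end{aligned}
$$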


Region Based Approach

Propose Regions of Interest
Classify each RoI
Regress Bounding Box Coordinates


Region Models

Regions with CNN (R-CNN) [GDDM13]
Fast R-CNN [Gir15]
Faster R-CNN [RHGS15]
Mask R-CNN [HGDG17]


Region Proposals - Selective Search

Figure: Selective Search Algorithm Visualized



R-CNN

Figure: R-CNN Pipeline



Fast R-CNN

Convolution Based Sliding Window
RoI Pooling
Softmax Classification


Fast R-CNN - Sliding Window

Figure: Sliding Window - CNN Implementation



Fast R-CNN - Visualized

Figure: Fast R-CNN Pipeline



Fast R-CNN - Loss

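$$L(p, u, t^u, v) = L_{cls}(p, u) + \lambda [u \geq 1] L_{loc}(t^u, v)$$

$$L_{cls}(p, u) = -\log p_u$$

$$L_{loc}(t^u, v) = \sum_{i \in \{x, y, w, h\}} \mathrm{smooth}_{L_1}(t^u_i - v_i)$$

$$\mathrm{smooth}_{L_1}(x) = \begin{cases} 0.5 x^2, & \text{if } |x| \leq 1 \\ |x| - 0.5, & \text{otherwise} \end{cases}$$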
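The loss above can be sketched directly from its definition. The function names and the default λ = 1 are assumptions of this illustration; `p` is a list of class probabilities with index 0 standing for background, so the Iverson bracket [u ≥ 1] switches the localization term off for background RoIs.

```python
import math


def smooth_l1(x):
    """Quadratic near zero, linear for large |x|."""
    ax = abs(x)
    return 0.5 * x * x if ax <= 1 else ax - 0.5


def loc_loss(t_u, v):
    """Sum of smooth L1 terms over the (x, y, w, h) box offsets."""
    return sum(smooth_l1(t - g) for t, g in zip(t_u, v))


def fast_rcnn_loss(p, u, t_u, v, lam=1.0):
    """p: class probabilities, u: true class (0 = background)."""
    cls = -math.log(p[u])
    loc = loc_loss(t_u, v) if u >= 1 else 0.0
    return cls + lam * loc


print(smooth_l1(0.5))   # 0.125
print(smooth_l1(-2.0))  # 1.5
print(loc_loss((0.5, 0.0, 3.0, -1.0), (0.0, 0.0, 1.0, 1.0)))  # 3.125
```

Both branches of the smooth L1 function give 0.5 at |x| = 1, so the function is continuous, and its linear tail makes it less sensitive to outliers than a squared error.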


Faster R-CNN

Bottleneck: Region Proposals by Selective Search (2s)
Solution: Region Proposals by CNN (0.01s)


Region Proposal Network

Figure: Region Proposal Network for Faster R-CNN


RPN - Loss

$$L(\{p_i\}, \{t_i\}) = \frac{1}{N_{cls}} \sum_i L_{cls}(p_i, p_i^*) + \lambda \frac{1}{N_{reg}} \sum_i p_i^* L_{reg}(t_i, t_i^*)$$


Faster R-CNN - Architecture

Figure: Model Scheme of Faster R-CNN



Mask R-CNN

Figure: Model Scheme of Mask R-CNN


Other Influential Models

RetinaNet (Focal Loss) [LGG+17]
Single Shot Detector [LAE+15]


RetinaNet

Figure: RetinaNet - Overview


Speed vs. Precision

Figure: GPU Time vs. Precision [HRS+16]



Lecture Pronouncement

CONVERGENCE



References I

[GDDM13] Ross B. Girshick, Jeff Donahue, Trevor Darrell, and Jitendra Malik, Rich feature hierarchies for accurate object detection and semantic segmentation, CoRR abs/1311.2524 (2013).

[Gir15] Ross B. Girshick, Fast R-CNN, CoRR abs/1504.08083 (2015).

[HGDG17] Kaiming He, Georgia Gkioxari, Piotr Dollár, and Ross B. Girshick, Mask R-CNN, CoRR abs/1703.06870 (2017).


References II

[HRS+16] Jonathan Huang, Vivek Rathod, Chen Sun, Menglong Zhu, Anoop Korattikara, Alireza Fathi, Ian Fischer, Zbigniew Wojna, Yang Song, Sergio Guadarrama, and Kevin Murphy, Speed/accuracy trade-offs for modern convolutional object detectors, CoRR abs/1611.10012 (2016).

[HZRS15] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun, Deep residual learning for image recognition, CoRR abs/1512.03385 (2015).


References III

[KSH12] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton, ImageNet classification with deep convolutional neural networks, Advances in Neural Information Processing Systems 25 (F. Pereira, C. J. C. Burges, L. Bottou, and K. Q. Weinberger, eds.), Curran Associates, Inc., 2012, pp. 1097–1105.

[LAE+15] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott E. Reed, Cheng-Yang Fu, and Alexander C. Berg, SSD: single shot multibox detector, CoRR abs/1512.02325 (2015).

[LBBH98] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, Gradient-based learning applied to document recognition, IEEE (1998), 2278–2324.


References IV

[LBD+89] Yann LeCun, Bernhard Boser, John Denker, Don Henderson, R. E. Howard, W. E. Hubbard, and Larry Jackel, Backpropagation applied to handwritten zip code recognition, Neural Computation 1 (1989), 541–551.

[LCY13] Min Lin, Qiang Chen, and Shuicheng Yan, Network in network, CoRR abs/1312.4400 (2013).

[LGG+17] Tsung-Yi Lin, Priya Goyal, Ross B. Girshick, Kaiming He, and Piotr Dollár, Focal loss for dense object detection, CoRR abs/1708.02002 (2017).

[RDGF15] Joseph Redmon, Santosh Kumar Divvala, Ross B. Girshick, and Ali Farhadi, You only look once: Unified, real-time object detection, CoRR abs/1506.02640 (2015).


References V

[RHGS15] Shaoqing Ren, Kaiming He, Ross B. Girshick, and Jian Sun, Faster R-CNN: towards real-time object detection with region proposal networks, CoRR abs/1506.01497 (2015).

[SLJ+14] Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott E. Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich, Going deeper with convolutions, CoRR abs/1409.4842 (2014).

[ZF13] Matthew D. Zeiler and Rob Fergus, Visualizing and understanding convolutional networks, CoRR abs/1311.2901 (2013).
