object detection lecture 10.3 - introduction to deep … · deep learning • computational models...

24
Object Detection Lecture 10.3 - Introduction to deep learning (CNN) Idar Dyrdal

Upload: vohuong

Post on 21-Aug-2018

225 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Object Detection Lecture 10.3 - Introduction to deep … · Deep Learning • Computational models composed of multiple processing layers (non-linear transformations) • Used to

Object Detection

Lecture 10.3 - Introduction to deep learning (CNN)

Idar Dyrdal

Page 2: Object Detection Lecture 10.3 - Introduction to deep … · Deep Learning • Computational models composed of multiple processing layers (non-linear transformations) • Used to

Deep Learning

• Computational models composed of multiple processing layers (non-linear transformations)

• Used to learn representations of data with multiple levels of abstraction:

• Learning a hierarchy of feature extractors

• Each level in the hierarchy extracts features from the output of the previous layer (pixels classes)

• Deep learning has dramatically improved state-of-the-art in:

• Speech and character recognition

• Visual object detection and recognition

• Convolutional neural nets for processing of images, video, speech and signals (time series) in general

• Recurrent neural nets for processing of sequential data (speech, text).

2

Level 3

Level 2

Level 1

Raw data (images, video, signals)

Labels

Page 3: Object Detection Lecture 10.3 - Introduction to deep … · Deep Learning • Computational models composed of multiple processing layers (non-linear transformations) • Used to

Deep Learning for Object Recognition

3

«Ship»

Millions of images Millions of parameters Thousands of classes

(AlexNet)

Page 4: Object Detection Lecture 10.3 - Introduction to deep … · Deep Learning • Computational models composed of multiple processing layers (non-linear transformations) • Used to

Traditional supervised learning

4

Training

images

Feature

extraction

Classifier

training Classifier

Class

labels

Page 5: Object Detection Lecture 10.3 - Introduction to deep … · Deep Learning • Computational models composed of multiple processing layers (non-linear transformations) • Used to

Deep learning

5

Training

images

Feature

extraction

Classifier

training Classifier

Class

labels

• Learning of weights in the processing layers

• Supervised, unsupervised (or semi-supervised) learning

Page 6: Object Detection Lecture 10.3 - Introduction to deep … · Deep Learning • Computational models composed of multiple processing layers (non-linear transformations) • Used to

Semi-supervised learning

6

Labeled samples and (trained) linear

decision boundary Labeled and unlabeled samples and non-

linear decision boundary

Page 7: Object Detection Lecture 10.3 - Introduction to deep … · Deep Learning • Computational models composed of multiple processing layers (non-linear transformations) • Used to

Artificial Neural Network (ANN)

Used in Machine Learning and Pattern Recognition: • Regression • Classification • Clustering • …

Applications:

• Speech recognition • Recognition of handwritten text • Image classification • …

Network types:

• Feed-forward neural networks • Recurrent neural networks (RNN) • …

7

Feed-forward ANN (non-linear classifier)

Page 8: Object Detection Lecture 10.3 - Introduction to deep … · Deep Learning • Computational models composed of multiple processing layers (non-linear transformations) • Used to

Mark 1 Perceptron (Rosenblatt, 1957-59)

8

(Cornell Aeronautical Laboratory)

Page 9: Object Detection Lecture 10.3 - Introduction to deep … · Deep Learning • Computational models composed of multiple processing layers (non-linear transformations) • Used to

Activation functions

Sigmoid (logistic function):

Hyperbolic tangent:

Rectified linear unit (ReLU):

9

(Quasar Jarosz, English Wikipedia)

Biological neuron:

Page 10: Object Detection Lecture 10.3 - Introduction to deep … · Deep Learning • Computational models composed of multiple processing layers (non-linear transformations) • Used to

Feed-forward neural network

10

i

j

k

l

Page 11: Object Detection Lecture 10.3 - Introduction to deep … · Deep Learning • Computational models composed of multiple processing layers (non-linear transformations) • Used to

Back-propagation

11

i

j

k

l

Page 12: Object Detection Lecture 10.3 - Introduction to deep … · Deep Learning • Computational models composed of multiple processing layers (non-linear transformations) • Used to

Convolutional Neural Network (CNN)

Used in Machine Vision and Image Analysis:

• Speech Recognition

• Image Recognition

• Video Recognition

• Image Segmentation

• …

Convolutional neural network:

• Multi-layer feed-forward ANN

• Combinations of convolutional and fully connected layers

• Convolutional layers with local connectivity

• Shared weights across spatial positions

• Local or global pooling layers

12

(A. Karpathy)

Page 13: Object Detection Lecture 10.3 - Introduction to deep … · Deep Learning • Computational models composed of multiple processing layers (non-linear transformations) • Used to

Typical CNN

13

(Aphex34)

Learned features

Page 14: Object Detection Lecture 10.3 - Introduction to deep … · Deep Learning • Computational models composed of multiple processing layers (non-linear transformations) • Used to

Convolutional neural net

14

Convolution (learned)

Input image

Feature map

Non-linearity

Spatial pooling

Normalization

Input image

(credit: S. Lazebnik)

Page 15: Object Detection Lecture 10.3 - Introduction to deep … · Deep Learning • Computational models composed of multiple processing layers (non-linear transformations) • Used to

Convolutional neural net

15

Convolution (learned)

Input image

Feature map

Non-linearity

Spatial pooling

Normalization

Input Feature Map

.

.

.

(credit: S. Lazebnik)

Page 16: Object Detection Lecture 10.3 - Introduction to deep … · Deep Learning • Computational models composed of multiple processing layers (non-linear transformations) • Used to

Convolutional neural net

16

Convolution (learned)

Input image

Feature map

Non-linearity

Spatial pooling

Normalization Rectified Linear Unit (ReLU)

(credit: S. Lazebnik)

Page 17: Object Detection Lecture 10.3 - Introduction to deep … · Deep Learning • Computational models composed of multiple processing layers (non-linear transformations) • Used to

Convolutional neural net

17

Convolution (learned)

Input image

Feature map

Non-linearity

Spatial pooling

Normalization Max pooling

(credit: S. Lazebnik)

Max-pooling: a non-linear down-sampling

Provide translation invariance

Page 18: Object Detection Lecture 10.3 - Introduction to deep … · Deep Learning • Computational models composed of multiple processing layers (non-linear transformations) • Used to

Convolutional neural net

18

Convolution (learned)

Input image

Feature map

Non-linearity

Spatial pooling

Normalization

Feature Maps Feature Maps After Contrast Normalization

(credit: S. Lazebnik)

Page 19: Object Detection Lecture 10.3 - Introduction to deep … · Deep Learning • Computational models composed of multiple processing layers (non-linear transformations) • Used to

Convolutional neural net

19

Convolution (learned)

Input image

Feature map

Non-linearity

Spatial pooling

Normalization

Feature maps after contrast normalization

(credit: S. Lazebnik)

Page 20: Object Detection Lecture 10.3 - Introduction to deep … · Deep Learning • Computational models composed of multiple processing layers (non-linear transformations) • Used to

Example - Caffe Demos

20

http://demo.caffe.berkeleyvision.org

Page 21: Object Detection Lecture 10.3 - Introduction to deep … · Deep Learning • Computational models composed of multiple processing layers (non-linear transformations) • Used to

21

Page 22: Object Detection Lecture 10.3 - Introduction to deep … · Deep Learning • Computational models composed of multiple processing layers (non-linear transformations) • Used to

22

Page 23: Object Detection Lecture 10.3 - Introduction to deep … · Deep Learning • Computational models composed of multiple processing layers (non-linear transformations) • Used to

Example - Semantic Segmentation (SegNet)

23

http://mi.eng.cam.ac.uk/projects/segnet/

Page 24: Object Detection Lecture 10.3 - Introduction to deep … · Deep Learning • Computational models composed of multiple processing layers (non-linear transformations) • Used to

Summary

Topics covered:

• Deep learning

• Artificial neural networks

• Convolutional neural networks

More information:

• Szeliski, chapter 14

• Yann LeCun ,Yoshua Bengio & Geoffrey Hinton, “Deep learning”, Nature, Vol 521, 28. May 2015.

• Shaohuai Shi, Qiang Wang, Pengfei Xu, Xiaowen Chu, “Benchmarking State-of-the-Art Deep Learning Software Tools”, 2017 (https://arxiv.org/pdf/1608.07249.pdf)

Software: • Caffe (http://caffe.berkeleyvision.org) • TensorFlow (https://www.tensorflow.org/) • MatConvNet (http://www.vlfeat.org/matconvnet)

24