
Page 1: Introduction to Deep Learning for Biomedical Engineering · bmia.bmt.tue.nl/people/BRomeny/Courses/Taipei2017/NTUST 2017 Dee… · Introduction to Deep Learning for Biomedical Engineering

Introduction to Deep Learning for Biomedical Engineering

After a presentation made by: Evan Shelhamer, Jeff Donahue, Jon Long

caffe.berkeleyvision.org
github.com/BVLC/caffe

Prof. Bart ter Haar Romeny

Page 2

What is Deep Learning?

Page 3

A typical Deep Convolutional Neural Network

Page 4

Page 5

ImageNet – Fei-Fei Li

ImageNet Large Scale Visual Recognition Challenge (ILSVRC)

AlexNet

Page 6

Page 7

Litjens, Geert, Thijs Kooi, Babak Ehteshami Bejnordi, Arnaud Arindra Adiyoso Setio, Francesco Ciompi, Mohsen Ghafoorian, Jeroen AWM van der Laak, Bram van Ginneken, and Clara I. Sánchez. "A survey on deep learning in medical image analysis." arXiv preprint arXiv:1702.05747 (Feb 2017).

Page 8

Page 9

Power of heatmaps – Train on image level, visualize on pixel level.
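The slide does not say how these heatmaps were produced. One widely used way to obtain a pixel-level map from a network trained only with image-level labels is Class Activation Mapping (CAM, Zhou et al., 2016). The minimal sketch below assumes a network whose last convolutional feature maps feed a global-average-pooling layer and a single linear classifier; the toy feature maps and weights are hypothetical.

```python
def global_average_pool(feature_maps):
    """Average each of the K feature maps (each H x W) over its spatial grid."""
    return [sum(sum(row) for row in fmap) / (len(fmap) * len(fmap[0]))
            for fmap in feature_maps]


def class_score(feature_maps, class_weights):
    """Image-level score for one class: a linear layer on the pooled features.
    This is the only quantity the image-level label supervises."""
    pooled = global_average_pool(feature_maps)
    return sum(w * p for w, p in zip(class_weights, pooled))


def class_activation_map(feature_maps, class_weights):
    """Pixel-level heatmap: apply the same class weights at every location
    before pooling, i.e. CAM[i][j] = sum_k w_k * F_k[i][j]."""
    height, width = len(feature_maps[0]), len(feature_maps[0][0])
    return [[sum(w * fmap[i][j] for w, fmap in zip(class_weights, feature_maps))
             for j in range(width)]
            for i in range(height)]


# Hypothetical toy example: K = 2 feature maps of size 2 x 2.
maps = [[[1.0, 0.0], [0.0, 0.0]],
        [[0.0, 0.0], [0.0, 2.0]]]
weights = [3.0, 1.0]
heatmap = class_activation_map(maps, weights)   # [[3.0, 0.0], [0.0, 2.0]]
score = class_score(maps, weights)              # 1.25, the mean of the heatmap
```

Because averaging and the linear layer commute, the mean of the heatmap equals the image-level score, which is why a network trained on whole images can still be visualized per pixel.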

Page 10

Samaneh Abbasi, Bart Romeny et al., TU/e: Recurrent Convolutional Neural Networks, MICCAI 2017, Quebec City

Page 11

Samaneh Abbasi et al., TU/e: Recurrent Convolutional Neural Networks, MICCAI 2017, Quebec City

Page 12

Page 13

Page 14

For diabetic retinopathy, the best detection performance was reported by Quellec et al.: Az (area under the ROC curve) = 0.954 on Kaggle’s dataset and Az = 0.949 on e-Ophtha.
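As a hedged illustration of what an Az value measures (this is not the evaluation code used by Quellec et al.), the sketch below computes the area under the ROC curve from detector scores with the rank-statistic identity: Az is the probability that a randomly chosen positive image scores higher than a randomly chosen negative one, with ties counting one half.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve (Az) via pairwise comparison of
    positive (label 1) and negative (label 0) scores; O(P * N) time."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0      # positive correctly ranked above the negative
            elif p == n:
                wins += 0.5      # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Hypothetical scores for 4 retinal images (1 = diseased, 0 = healthy).
print(roc_auc([0.9, 0.4, 0.6, 0.1], [1, 1, 0, 0]))  # 0.75
```

A perfect ranking gives Az = 1.0 and random guessing gives about 0.5, so values such as 0.954 mean diseased images are almost always scored above healthy ones.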

Page 15

Why Deep Learning?

Applications

The Challenge of Recognition

Learning & Optimization

Network Tour

Transfer Learning

Dive into Deep Learning for Vision

Deep Learning: What is DL? Why Now?

Caffe First Sip

Page 16

Why Deep Learning? End-to-End Learning for Many Tasks

vision, speech, text, control

Presenter
Presentation Notes
Deep learning has proven useful for many purposes, not just one single task. This is the point of learning end-to-end, that is, learning the whole problem from input to output: the same toolkit can work for different domains, whether vision, speech, text, or control and robotics. We’ll focus on vision, and next we’ll look at core visual recognition tasks and the standard benchmarks for each problem. Deep learning approaches have delivered dramatic improvements across these and many other tasks.
Page 17

Some examples

Demo: Google Translate on smartphone (speech + images)

Demo: https://www.imageidentify.com/

How does this work?

Биомедицинская инженерия ("biomedical engineering"): today you can read this Russian text with your smartphone.

Kaggle: Diabetic Retinopathy Challenge (blog)

Google Photos

Page 18

Other examples:

Robot vision and recognition: harvest robot for peppers.

Wageningen University, the Netherlands

Vision for self-driving cars

Page 19

Aalsmeer, the Netherlands: the largest flower auction in the world

Page 20

Quick facts and figures about the Dutch Horticulture industry

The Dutch horticulture sector is a global trendsetter and the undisputed international market leader in flowers, plants, bulbs and propagation material.

Did you know?
• Holland has a 44% share of the worldwide trade in floricultural products, making it the dominant global supplier of flowers and flower products. Some 77% of all flower bulbs traded worldwide come from the Netherlands, the majority of which are tulips. 40% of the trade in 2015 was cut flowers and flower buds.
• The sector is the number 1 exporter to the world of live trees, plants, bulbs, roots and cut flowers.
• The sector is the number 3 exporter of nutritional horticulture products.
• Of the approximately 1,800 new plant varieties that enter the European market each year, 65% originate in the Netherlands. In addition, Dutch breeders account for more than 35% of all applications for Community plant variety rights.
• The Dutch are one of the world’s largest exporters of seeds: seed exports amounted to € 3.1 billion in 2014.
• In 2014 the Netherlands was the world’s second largest exporter (in value) of fresh vegetables, exporting vegetables with a market value of € 7 billion.

Page 21

Page 22

From Wikipedia:

Deep learning is a class of machine learning algorithms that

• use a deep cascade of many layers of nonlinear processing units for feature extraction and transformation.
• Each successive layer uses the output from the previous layer as input.
• The algorithms may be supervised or unsupervised.
• Applications include pattern analysis (unsupervised) and classification (supervised).

Page 23

• are based on the (unsupervised) learning of multiple levels of features or representations of the data.

• Higher level features are derived from lower level features to form a hierarchical representation.
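To make "each successive layer uses the output from the previous layer as input" concrete, here is a minimal pure-Python sketch of a cascade of fully connected layers with a ReLU nonlinearity after each one. The layer sizes and weights are hypothetical; a real network learns them, and image models use convolutional layers instead.

```python
def relu(vector):
    """Elementwise nonlinearity: max(0, x)."""
    return [max(0.0, x) for x in vector]


def affine(weights, bias, vector):
    """One layer's linear part: weights is a list of rows, one row per output unit."""
    return [sum(w * x for w, x in zip(row, vector)) + b
            for row, b in zip(weights, bias)]


def cascade(layers, x):
    """Feed x through (weights, bias) layers in order; each layer's output
    becomes the next layer's input, giving a hierarchy of representations."""
    for weights, bias in layers:
        x = relu(affine(weights, bias, x))
    return x


# Hypothetical 2 -> 2 -> 1 network.
layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]),   # layer 1
    ([[1.0, 1.0]], [-1.0]),                    # layer 2
]
print(cascade(layers, [2.0, 1.0]))  # [1.5]
```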

Deep Learning

So we have to learn:

1. Overview in depth → Introduction, Caffe example
2. What are filters? → Convolution and convolution networks
3. What is learned? → Invariant geometric features
4. How can kernels be learned? → Principal Component Analysis
5. How does the visual system do this? → Front-end vision, visual cortex
6. How can we use this? → Software developments in Deep Learning
7. Questions → and answers

Page 24

Deep Learning is a very hot area of Machine Learning Research, with many remarkable recent successes, such as 97.5% accuracy on face recognition, nearly perfect German traffic sign recognition, or even Dogs vs Cats image recognition with 98.9% accuracy.

Many winning entries in recent Kaggle Data Science competitions have used Deep Learning.

The term "deep learning" refers to the method of training multi-layered neural networks, and became popular after papers by Geoffrey Hinton and his co-workers which showed a fast way to train such networks.

http://www.kdnuggets.com/2014/05/learn-deep-learning-courses-tutorials-overviews.html

Page 25

Yann LeCun, a former postdoctoral researcher in Geoff Hinton’s group, also developed a very effective algorithm for deep learning, called ConvNet, which was successfully used in the late 1980s and early 1990s for automatic reading of amounts on bank checks.

In May 2014, Baidu, the Chinese search giant, hired Andrew Ng, a leading Machine Learning and Deep Learning expert (and co-founder of Coursera), to head its new AI Lab in Silicon Valley, setting up an AI & Deep Learning race with Google (which hired Geoffrey Hinton) and Facebook (which hired Yann LeCun to head the Facebook AI Lab).

Page 26

Page 27

Human vision and convolutional neural networks:

A cascade of increasing complexity

• Hierarchical network
• Use of context

Page 28

Wikipedia: Gestalt psychology or gestaltism (German: Gestalt "shape, form") is a philosophy of mind of the Berlin School of experimental psychology. Gestalt psychology is an attempt to understand the laws behind the ability to acquire and maintain meaningful perceptions in an apparently chaotic world. The central principle of gestalt psychology is that the mind forms a global whole with self-organizing tendencies. The assumed physiological mechanisms on which Gestalt theory rests are poorly defined and support for their existence is lacking. This grouping of parts into wholes is known as ‘perceptual grouping’.

AlexNet - pdf

Page 29

Vision: the highest bandwidth input channel

Machines are useful mainly to the extent that they interact with the physical world. Visual information is the richest source of information about the real world.

Vision is the highest-bandwidth mode for machines to obtain real-world info

Embedded vision enables our things to be:
- More responsive
- More personal and secure
- Safer, more autonomous
- Easier to use

subaru.com

Page 30

http://www.kdnuggets.com/2017/02/top-arxiv-papers-january-convnets-wide-adversarial.html

Top papers on arXiv (https://arxiv.org/):

Page 31

Performance evaluation: http://www.robots.ox.ac.uk/~vgg/research/deep_eval/

VOC: Visual Object Classes

Page 32

Why Now?

1. Data
ImageNet et al.: millions of labeled (crowdsourced) images

2. Compute
GPUs: terabytes/s memory bandwidth, teraflops compute

3. Technique
new optimization know-how, new variants on old architectures, new tools for rapid experimentation

Presenter
Presentation Notes
note the importance of memory bandwidth: it determines how fast you can look at all that data
Page 33

Why Now? Data

For example:
>14 million labeled images
>1 million with bounding boxes
>300,000 images with labeled and segmented objects

Page 34

Why Now? GPUs

Parallel processors for parallel models:
- Inherent parallelism: same op, different data
- Bandwidth: lots of data in and out
- Tuned primitives: cuDNN for deep nets, cuBLAS for matrices

Nvidia News URL

Presenter
Presentation Notes
Mention ILSVRC in particular as the standard contest. Mention/include industrial data: e.g. Facebook and YouTube have much, much more data than represented here. The data is valuable!
Page 35

GPU – Graphics Processing Unit

- Thousands of parallel cores
- Fully programmable, e.g. in CUDA
- Very affordable
- Large shared memory (e.g. 12 GB)
- Available in large server banks
- Can be rented from Amazon, Baidu, Alibaba, etc.

Page 36

Titan Xp GPU

Page 37

Why Now? Technique

Non-convex and high-dimensional learning is okay with the right design choices

e.g. non-saturating non-linearities

Learning by Stochastic Gradient Descent (SGD) with momentum and other variants — more later!

instead of

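The slide only names SGD with momentum. A minimal sketch of the classical (heavy-ball) momentum update, v <- mu*v - lr*grad(w) followed by w <- w + v, is shown below on a hypothetical, ill-conditioned quadratic loss. In real training the gradient is estimated from a random mini-batch, which is what makes the method stochastic; here it is exact to keep the example deterministic.

```python
def sgd_momentum(grad_fn, w0, lr=0.1, momentum=0.9, steps=100):
    """Minimize a loss given its gradient function, using momentum updates:
    v <- momentum * v - lr * grad(w);  w <- w + v."""
    w = list(w0)
    v = [0.0] * len(w)
    for _ in range(steps):
        g = grad_fn(w)  # in SGD this would be a mini-batch estimate
        v = [momentum * vi - lr * gi for vi, gi in zip(v, g)]
        w = [wi + vi for wi, vi in zip(w, v)]
    return w


def bowl_grad(w):
    """Gradient of the toy loss L(w) = 0.5 * (w[0]**2 + 10 * w[1]**2)."""
    return [w[0], 10.0 * w[1]]


w_final = sgd_momentum(bowl_grad, [1.0, 1.0], steps=300)  # both coordinates near 0
```

The velocity v accumulates past gradients, so consistent directions speed up while oscillating ones partly cancel; lr = 0.1 and momentum = 0.9 are common starting values, not prescriptions.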

Page 38

Examples from NVIDIA: https://developer.nvidia.com/deep-learning

Page 39

DeepBreak

Presenter
Presentation Notes
mention the traditional picture of getting stuck in local minima, and how this is not a problem in practice
Page 40

What is Deep Learning?

Compositional Models Learned End-to-End

Hierarchy of Representations (concrete → abstract)
- vision: pixel, motif, part, object
- text: character, word, clause, sentence
- speech: audio, band, phone, word

[Figure: input → layer1 (θ1) → layer2 (θ2) → output (θ3) → loss, compared with truth]

Page 41

Back-propagation jointly learns all of the model parameters to optimize the output for the task (more on this later!)

What is Deep Learning?

Compositional Models Learned End-to-End

[Figure: input → layer1 (θ1) → layer2 (θ2) → output (θ3) → loss, compared with truth]
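Back-propagation is the chain rule applied from the loss backwards. The minimal sketch below mirrors the figure's chain (input, layer1 with θ1, layer2 with θ2, output with θ3, loss against the truth), assuming scalar parameters, ReLU layers and a squared-error loss, all hypothetical choices. It checks the hand-derived gradients against finite differences.

```python
def forward_backward(theta, x, truth):
    """Scalar 3-parameter chain; returns the loss and d(loss)/d(theta_i)."""
    t1, t2, t3 = theta
    # forward pass
    a1 = t1 * x
    h1 = max(0.0, a1)          # layer1 + ReLU
    a2 = t2 * h1
    h2 = max(0.0, a2)          # layer2 + ReLU
    y = t3 * h2                # output
    loss = 0.5 * (y - truth) ** 2
    # backward pass: propagate d(loss)/d(.) from the loss to every parameter
    dy = y - truth
    d_t3 = dy * h2
    dh2 = dy * t3
    da2 = dh2 if a2 > 0 else 0.0
    d_t2 = da2 * h1
    dh1 = da2 * t2
    da1 = dh1 if a1 > 0 else 0.0
    d_t1 = da1 * x
    return loss, [d_t1, d_t2, d_t3]


def numerical_grad(theta, x, truth, eps=1e-6):
    """Central finite differences, used only to check the backward pass."""
    grads = []
    for i in range(len(theta)):
        plus, minus = list(theta), list(theta)
        plus[i] += eps
        minus[i] -= eps
        grads.append((forward_backward(plus, x, truth)[0]
                      - forward_backward(minus, x, truth)[0]) / (2 * eps))
    return grads


loss, grads = forward_backward([0.5, 2.0, 1.5], x=2.0, truth=1.0)  # 2.0, [12.0, 3.0, 4.0]
```

One backward pass yields the gradient for every parameter at once, which is what lets all layers be trained jointly end-to-end.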

Page 42

Shallow Learning

[slide credit K. Cho]

Separation of hand engineering and machine learning


= a conclusion reached on the basis of evidence and reasoning

Presenter
Presentation Notes
note that representations are learned, and don’t correspond exactly to examples given
Page 43

Hand-Engineered Features

Features from years of vision expertise by the whole community are now surpassed by learned representations, and these transfer across tasks

[figure credit R. Fergus]

Presenter
Presentation Notes
note that deep learning does not have to be backprop
Page 44

Deep Learning

[slide credit K. Cho]

Presenter
Presentation Notes
shallow learning: logistic regression, svm, decision tree, codebook -> quantization -> classification pipeline
Page 45

End-to-End Learning Representations

The visual world is too vast and varied to fully describe by hand

Learn the representation from data: local appearance, parts and texture, objects and semantics

[figure credit H. Lee]

Presenter
Presentation Notes
all the data -> learning; learning -> all the tasks
Page 46

Hierarchical growth of complexity

Page 47

End-to-End Learning Tasks

The visual world is too vast and varied to fully describe by hand

Learn the task from data

Presenter
Presentation Notes
layers: compositionality, feature sharing; learning: better task performance, other data, computation time
Page 48

Types of Learning

Vast space of models!

[figure credit Marc’Aurelio Ranzato, CVPR 2014 tutorial]

Deep Network

Recurrent Network

Convolutional Network

Example: TensorFlow (URL)

Page 49

The Neural Network Zoo: http://www.asimovinstitute.org/neural-network-zoo/

Page 50

Neural Network Graphs: http://www.asimovinstitute.org/neural-network-zoo-prequel-cells-layers/

Page 51

Neural Network Graphs: http://www.asimovinstitute.org/neural-network-zoo-prequel-cells-layers/

Page 52

History

Is deep learning 4, 20, or 50 years old? What’s changed?

2000s: Sparse, Probabilistic, and Layer-wise models (Hinton, Bengio, Ng)
2012: DL popularized in vision by contest victory (Krizhevsky et al. 2012)

Rosenblatt’s Perceptron

Radial Basis Function

Page 53

Convolutional Networks: 1989

LeNet: a layered model composed of convolution and subsampling layers followed by a holistic representation and ultimately a classifier for handwritten digits [LeNet]

Note: channel dimension goes up as spatial dimension goes down... still a common pattern today

Page 54

AlexNet: a layered model composed of convolution, subsampling, and further operations followed by a holistic representation and all-in-all a landmark classifier on ILSVRC12 [AlexNet]

+ data
+ GPU
+ non-saturating non-linearity
+ regularization

Convolutional Networks: 2012
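Why a non-saturating non-linearity helps can be seen from the derivatives: the sigmoid's slope is at most 0.25 and vanishes for large inputs, while ReLU's slope is exactly 1 for any positive input. The minimal sketch below compares them and bounds how much repeated sigmoid slopes can shrink a back-propagated gradient; the depth of 10 layers is an arbitrary illustration and the weights are ignored.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_slope(z):
    """d sigmoid / dz = s * (1 - s); at most 0.25, reached at z = 0."""
    s = sigmoid(z)
    return s * (1.0 - s)


def relu_slope(z):
    """d ReLU / dz: 1 for positive inputs, 0 otherwise (no saturation for z > 0)."""
    return 1.0 if z > 0 else 0.0


for z in (0.0, 2.0, 5.0, 10.0):
    print(f"z={z:5.1f}  sigmoid'={sigmoid_slope(z):.2e}  relu'={relu_slope(z):.0f}")

# Upper bound on the slope factor contributed by 10 stacked sigmoid layers
# (weights ignored): 0.25 ** 10, i.e. below one millionth.
depth = 10
sigmoid_chain_bound = 0.25 ** depth
relu_chain = relu_slope(1.0) ** depth   # stays 1.0 along an active path
```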

Presenter
Presentation Notes
gloss: connected the dots; exploration of model structure, optimization know-how, computation + data
Page 55

FC 1000

FC 4096 / ReLU

FC 4096 / ReLU

Max Pool 3x3s2

Conv 3x3s1, 256 / ReLU

Conv 3x3s1, 384 / ReLU

Conv 3x3s1, 384 / ReLU

Max Pool 3x3s2

Local Response Norm

Conv 5x5s1, 256 / ReLU

Max Pool 3x3s2

Local Response Norm

Conv 11x11s4, 96 / ReLU

FC-ReLU: stacked at the end of the net to learn the output; holds the majority of the learned parameters

Conv-Pool: 1+ convs are followed by pooling to subsample; spatial size shrinks, receptive field grows

Conv-ReLU: all convs are followed by a non-linearity, in this case ReLU

Convnet Design Patterns
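The Conv-Pool pattern can be checked numerically. The minimal sketch below traces the spatial size and receptive field through the AlexNet stack above for a 227x227 input, using out = floor((in + 2*pad - kernel) / stride) + 1. The padding values are an assumption (they follow the common Caffe AlexNet definition), since the slide does not list them.

```python
# (name, kernel, stride, pad), listed from input to output.
# Padding is assumed (common Caffe AlexNet definition); the slide omits it.
ALEXNET_SPATIAL = [
    ("conv1", 11, 4, 0),
    ("pool1", 3, 2, 0),
    ("conv2", 5, 1, 2),
    ("pool2", 3, 2, 0),
    ("conv3", 3, 1, 1),
    ("conv4", 3, 1, 1),
    ("conv5", 3, 1, 1),
    ("pool5", 3, 2, 0),
]


def output_size(size, kernel, stride, pad):
    """Spatial output size of a conv or pooling layer (floor rounding)."""
    return (size + 2 * pad - kernel) // stride + 1


def trace(layers, size):
    """Return (name, output size, receptive field in input pixels) per layer."""
    rf, jump = 1, 1   # jump = input-pixel distance between adjacent outputs
    rows = []
    for name, kernel, stride, pad in layers:
        size = output_size(size, kernel, stride, pad)
        rf += (kernel - 1) * jump
        jump *= stride
        rows.append((name, size, rf))
    return rows


for name, size, rf in trace(ALEXNET_SPATIAL, 227):
    print(f"{name:6s} {size:3d}x{size:<3d} receptive field {rf}")
```

The spatial size falls from 227 to 6 while each output's receptive field grows to 195 input pixels; the final 6x6x256 volume (9216 values) is what the first fully connected layer sees.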

Page 56

Convnet Computation: 2012 & 2014

AlexNet inference for a single image (3x227x227 input):
- 725M FLOPs
- 60M parameters (60,965,224 to be exact)
- 408 MB of GPU memory in Caffe; < 12 GB for a batch size of 1,500
- < 1 ms per image on a Titan X with cuDNN v4, for batch size >= 256

Compare GoogLeNet (ILSVRC14 winner):
- 2x FLOPs
- 0.1x the parameters
- 14% more accurate

Architecture matters! But the computational primitives are the same.

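[Figure: AlexNet layers with learned parameters and FLOPs per layer]
FC 1000: 4M params, 4M FLOPs
FC 4096 / ReLU: 16M params, 16M FLOPs
FC 4096 / ReLU: 37M params, 37M FLOPs
Max Pool 3x3s2
Conv 3x3s1, 256 / ReLU: 442K params, 74M FLOPs
Conv 3x3s1, 384 / ReLU: 664K params, 112M FLOPs
Conv 3x3s1, 384 / ReLU: 884K params, 149M FLOPs
Max Pool 3x3s2
Local Response Norm
Conv 5x5s1, 256 / ReLU: 307K params, 223M FLOPs
Max Pool 3x3s2
Local Response Norm
Conv 11x11s4, 96 / ReLU: 35K params, 105M FLOPs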
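The totals quoted on this slide can be reproduced from the layer shapes. The minimal sketch below assumes the original two-group AlexNet layout (conv2, conv4 and conv5 each see only half of the previous layer's channels, as in the presenter notes' 256 * (96 / 2) * 5^2 for conv2) and counts one multiply-add as one operation plus one add per bias, which is how the notes arrive at 725,066,088.

```python
# (name, output channels, input channels seen by each filter, kernel size,
#  number of output positions H_out * W_out). Grouping is assumed as in the
#  original two-GPU AlexNet; fully connected layers are treated as 1x1 outputs.
ALEXNET = [
    ("conv1",   96,   3, 11, 55 * 55),
    ("conv2",  256,  48,  5, 27 * 27),   # 2 groups: 96 / 2 input channels
    ("conv3",  384, 256,  3, 13 * 13),
    ("conv4",  384, 192,  3, 13 * 13),   # 2 groups: 384 / 2
    ("conv5",  256, 192,  3, 13 * 13),   # 2 groups: 384 / 2
    ("fc6",   4096, 256 * 6 * 6, 1, 1),
    ("fc7",   4096, 4096, 1, 1),
    ("fc8",   1000, 4096, 1, 1),
]


def count(layers):
    """Return (total parameters, total operations) for conv/fc layers.
    Operations = one multiply-add per weight per output position + bias adds."""
    total_params = total_ops = 0
    for _name, out_ch, in_ch, k, positions in layers:
        weights = out_ch * in_ch * k * k
        total_params += weights + out_ch                    # weights + biases
        total_ops += positions * weights + positions * out_ch
    return total_params, total_ops


params, ops = count(ALEXNET)
print(f"{params:,} parameters, {ops:,} operations")  # 60,965,224 parameters, 725,066,088 operations
```

The split is lopsided: the three FC layers hold about 58.6M of the 61M parameters, while the five conv layers account for about 666M of the 725M operations.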

Page 57

Convolutional Nets: 2014

GoogLeNet ILSVRC14 Winner: ~6.6% Top-5 error
- composition of multi-scale, dimension-reduced “Inception” modules
- no FC layers and only 5 million parameters

+ depth
+ auxiliary classifiers
+ dimensionality reduction

[Szegedy15]

Page 58

1x1 Convolution

- reduce channel dimension to control (1) parameter count and (2) computation
- stack with non-linearity for a deeper net
- found in many of the latest nets

each filter has size 64x1x1 and does a 64-dim dot product

1x1 conv with 32 filters

[figure credit A. Karpathy]
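A 1x1 convolution is a per-pixel matrix multiply across channels: every output channel is a C_in-dimensional dot product at each location, so the spatial size is unchanged while the channel count can be reduced. The minimal pure-Python sketch below (a tiny hypothetical tensor, no padding or stride) shows this, plus the parameter arithmetic for the slide's 64 to 32 channel example.

```python
def conv1x1(x, weights, bias):
    """1x1 convolution.
    x: C_in x H x W nested lists; weights: C_out x C_in; bias: length C_out.
    Output: C_out x H x W, one C_in-dim dot product per output pixel and filter."""
    c_in, height, width = len(x), len(x[0]), len(x[0][0])
    return [[[sum(weights[o][c] * x[c][i][j] for c in range(c_in)) + bias[o]
              for j in range(width)]
             for i in range(height)]
            for o in range(len(weights))]


def conv_params(c_in, c_out, k):
    """Learned parameters of a k x k convolution: weights plus one bias per filter."""
    return c_out * c_in * k * k + c_out


# Tiny hypothetical input: 2 channels, 1 x 2 pixels -> 2 output channels.
x = [[[1, 2]], [[3, 4]]]
y = conv1x1(x, weights=[[1, 1], [2, -1]], bias=[0, 1])   # [[[4, 6]], [[0, 1]]]

# Slide example: 64 -> 32 channels with 1x1 filters vs. 3x3 filters.
print(conv_params(64, 32, 1), conv_params(64, 32, 3))     # 2080 18464
```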

Presenter
Presentation Notes
Comment on inference v. training (this is the time for inference on a single image; a training iteration is roughly 2-3x the computation and is iterated many times). Go through each AlexNet layer, then gloss over GoogLeNet.
FLOPS:
725,066,088 for all conv + fc w/ biases
659,272 for ReLU
27,000 for pooling
20,000 for LRN
layer: weight ops / bias ops
conv1: 105,415,200 / 290,400
conv2: 223,948,800 / 186,624
conv3: 149,520,384 / 64,896
conv4: 112,140,288 / 64,896
conv5: 74,760,192 / 43,264
fc6: 37,748,736 / 4,096
fc7: 16,777,216 / 4,096
fc8: 4,096,000 / 1,000
conv2 has 256 * (96 / 2) * 5^2 = 307,200 params
Page 59

Convolutional Nets: 2014

VGG16 ILSVRC14 Runner-up: ~7.3% Top-5 error
- 13 layers of 3x3 convolution interleaved with max pooling + 3 fully-connected layers
- simple architecture, good for transfer learning
- ~138 million params and more expensive to compute

+ depth
+ fine-tuning deeper and deeper
+ stacking small filters

FC 1000
FC 4096 / ReLU
FC 4096 / ReLU
Max Pool 2x2s2
Conv 3x3s1, 512 / ReLU
Conv 3x3s1, 512 / ReLU
Conv 3x3s1, 512 / ReLU
Max Pool 2x2s2
Conv 3x3s1, 512 / ReLU
Conv 3x3s1, 512 / ReLU
Conv 3x3s1, 512 / ReLU
Max Pool 2x2s2
Conv 3x3s1, 256 / ReLU
Conv 3x3s1, 256 / ReLU
Conv 3x3s1, 256 / ReLU
Max Pool 2x2s2
Conv 3x3s1, 128 / ReLU
Conv 3x3s1, 128 / ReLU
Max Pool 2x2s2
Conv 3x3s1, 64 / ReLU
Conv 3x3s1, 64 / ReLU

stack 2 3x3 conv for a 5x5 receptive field

[figure credit A. Karpathy]

[Simonyan15]
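The "stack two 3x3 convs for a 5x5 receptive field" point can be checked with simple arithmetic: n stride-1 k x k convolutions see 1 + n(k - 1) input pixels per side, while two 3x3 layers with C input and C output channels have 2 * 9 * C^2 weights versus 25 * C^2 for a single 5x5 layer, and add an extra ReLU in between. The channel count of 256 below is only an example.

```python
def stacked_receptive_field(num_layers, kernel=3):
    """Receptive field (pixels per side) of num_layers stride-1 k x k convolutions."""
    return 1 + num_layers * (kernel - 1)


def conv_weights(channels, kernel):
    """Weights of one k x k conv with `channels` inputs and outputs (biases ignored)."""
    return channels * channels * kernel * kernel


C = 256
print(stacked_receptive_field(2), 2 * conv_weights(C, 3), conv_weights(C, 5))
print(stacked_receptive_field(3), 3 * conv_weights(C, 3), conv_weights(C, 7))
```

Two 3x3 layers cover a 5x5 window with about 28% fewer weights than one 5x5 layer, and three cover a 7x7 window with about 45% fewer weights than one 7x7 layer.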

Page 60

ILSVRC15 and COCO15 Winner: MSRA ResNet
- classification
- detection
- segmentation

Convolutional Nets: 2015

Learn residual mapping w.r.t. identity

- very deep 100+ layer nets

- skip connections across layers

- batch normalization

Kaiming He et al. Deep Residual Learning for Image Recognition. arXiv 1512.03385, Dec. 2015.

[He15]
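"Learn residual mapping w.r.t. identity" means a block outputs F(x) + x instead of F(x). The minimal sketch below replaces the convolutions and batch normalization of an actual ResNet block with small fully connected layers (a deliberate simplification, with hypothetical weights) to show the key property: if the learned weights of F are zero, the block reduces to the identity (for non-negative inputs, because of the final ReLU), so adding such blocks need not make a deeper network worse.

```python
def relu(vector):
    return [max(0.0, v) for v in vector]


def linear(weights, vector):
    """Matrix-vector product; weights is a list of rows."""
    return [sum(w * v for w, v in zip(row, vector)) for row in weights]


def residual_block(x, w1, w2):
    """y = ReLU(F(x) + x) with residual branch F(x) = W2 * ReLU(W1 * x).
    W2 must map back to len(x) units so the skip connection can be summed."""
    f = linear(w2, relu(linear(w1, x)))
    return relu([fi + xi for fi, xi in zip(f, x)])


identity = [[1.0, 0.0], [0.0, 1.0]]
zeros = [[0.0, 0.0], [0.0, 0.0]]
print(residual_block([1.0, 2.0], identity, zeros))     # [1.0, 2.0]: block acts as identity
print(residual_block([1.0, 2.0], identity, identity))  # [2.0, 4.0]
```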

Page 61

Convolutional Nets: 2015

MSRA ResNet

(~5x the layers shown here)

ILSVRC15 Winner: 3.5% Top-5 error, and COCO15 Winner with >10% lead for detection and segmentation

- MSRA Residual Net (ResNet): 101- and 152-layer networks
- skip and sum layers to form residuals
- batch normalization (optimization trick)

[He15]


Page 69

Why Now? Deep Learning Frameworks

[Figure: a framework surrounding the network's internal representation]
- frontend: a language for any network, any task
- tools: visualization, profiling, debugging, etc.
- layer library: fast implementations of common functions and gradients
- backend: dispatch compute for learning and inference

Page 70

Deep Learning Frameworks

All open source; we like to brew our networks with Caffe

Caffe: Berkeley / BVLC; C++ / CUDA, Python, MATLAB
Torch: Facebook + NYU; Lua (C++)
Theano: U. Montreal; Python
TensorFlow: Google; Python (C++)

Page 71

Not So “Neural”

These models are not how the brain works. We don’t know how the brain works!

- This isn’t a problem (except for neuroscientists)
- Be wary of neural realism hype or “it just works because it’s like the brain”
- network, not neural network; unit, not neuron

Page 72

Visual Recognition Tasks

Classification
- what kind of image?
- which kind(s) of objects?

Challenges
- appearance varies by lighting, pose, context, ...
- clutter
- fine-grained categorization (horse or exact species)

❏ dog ❏ car ❏ horse ❏ bike ❏ cat ❏ bottle ❏ person

Presenter
Presentation Notes
Mention the traditional picture of getting stuck in local minima, and how this is not a problem in practice.
Page 73

Page 74

Image Classification: ILSVRC 2010-2015

[graph: top-5 error by year; graph credit K. He]

❏ dog ❏ car ❏ horse ❏ bike ❏ cat ❏ bottle ❏ person

ImageNet Large Scale Visual Recognition Competition: Website; AlexNET paper (pdf)

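ILSVRC ranks classification entries by top-5 error: a prediction counts as correct if the true class is among the five highest-scoring guesses. The sketch below is plain Python (our choice; the slides contain no code) and borrows the slide's seven example classes as a hypothetical label set; ILSVRC itself uses 1000 classes, and the scores are made up.

```python
from typing import List, Sequence

CLASSES = ["dog", "car", "horse", "bike", "cat", "bottle", "person"]  # slide's examples

def top_k_error(scores: Sequence[Sequence[float]], labels: Sequence[int], k: int = 5) -> float:
    """Fraction of examples whose true label is not among the k highest-scoring classes."""
    misses = 0
    for s, y in zip(scores, labels):
        ranked = sorted(range(len(s)), key=lambda c: s[c], reverse=True)
        if y not in ranked[:k]:
            misses += 1
    return misses / len(labels)

# Hypothetical scores for two images over the seven classes above.
scores: List[List[float]] = [
    [0.12, 0.30, 0.02, 0.20, 0.15, 0.11, 0.10],  # true class: horse
    [0.40, 0.05, 0.05, 0.05, 0.30, 0.10, 0.05],  # true class: cat
]
labels = [CLASSES.index("horse"), CLASSES.index("cat")]
print(top_k_error(scores, labels, k=1))  # 1.0: neither top guess is right
print(top_k_error(scores, labels, k=5))  # 0.5: cat makes its image's top 5, horse does not
```

Top-5 error is more forgiving than top-1 error, which matters when an image plausibly contains several of the listed objects.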
Page 75

Visual Recognition Tasks

[figure: detections labeled car, person, horse]

Detection
- what objects are there?
- where are the objects?

Challenges
- localization
- multiple instances
- small objects

Presenter
Presentation Notes
For-real edition of “it works because”: more data/supervision, and more of the model is made learnable.
Page 76

Detection: PASCAL VOC (Visual Object Classes)

[graph: detection accuracy over time; graph credit R. Girshick]

R-CNN: regions + convnets (state-of-the-art, in Caffe)

Presenter
Presentation Notes
Classification is the fundamental visual task of recognizing what is in an image or what type of image it is. For example, the kinds of objects in the image shown are car, horse, and person, but we could also consider tasks like whether this is a daytime or nighttime image. Classification is challenging because of the many differences in appearance seen in the visual world, like lighting, pose, style, and so on. Clutter or noise can obscure the information to be extracted from the image. Fine-grained categorization is a further difficulty when we want to recognize not just any horse but an exact species.
Page 77

Visual Recognition Tasks

Semantic Segmentation
- what kind of thing is each pixel part of?
- what kind of stuff is each pixel?

Challenges
- tension between recognition and localization
- amount of computation

[figure: segmentation output with regions labeled horse, car]

Page 78

Some examples:

• NVIDIA news:
  https://news.developer.nvidia.com/google-releases-tensorflow-1-0/
  http://nvidianews.nvidia.com/news?q=neural+nets&year=&month=&c=&from=&to=
  http://nvidianews.nvidia.com/news?q=deep+learning&year=&month=&c=&from=&to=

• Free book: http://neuralnetworksanddeeplearning.com/

• Other books: MIT: https://pdfs.semanticscholar.org/751f/aab15cbb955b07537fc38901bc96d4e70f57.pdf

• New companies: http://aidence.com/

• Papers:
  Classical paper: http://www.nature.com/nature/journal/v521/n7553/abs/nature14539.html
  ImageNet: http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks (cited 11342 times)
  CAD: https://www.nature.com/articles/srep24454

• Google TensorFlow: https://www.tensorflow.org/get_started/

• Kaggle Diabetic Retinopathy Challenge: https://www.kaggle.com/c/diabetic-retinopathy-detection (see also our BMIE project: www.retinacheck.org/zh/index.html).

• Google Diabetic Retinopathy paper: https://research.googleblog.com/2016/11/deep-learning-for-detection-of-diabetic.html?m=1

Presenter
Presentation Notes
This graph shows the latest results as of the 2015 challenge, with years running right to left. The introduction of deep learning not only dropped the error by almost 10 points in 2012, but deep learning methods have also improved in accuracy every year as networks are made deeper and deeper. Many of the contest winners and runners-up were done with Caffe or reproduced with Caffe, including the latest winner (ResNet). Mention speed as well as accuracy. Highlighted: done in Caffe (ResNet, VGG) or reproduced in Caffe (GoogLeNet, AlexNet).
Page 79

Some Basics of Deep Learning

Presenter
Presentation Notes
Detection is the task of recognizing not only what but where: both the identity and the location of each object need to be predicted. While classification considered only presence or absence, detection demands the recognition of every instance, as we see for all three cars. Localization is difficult, especially for interacting or articulated objects like the person and horse, or for small objects that are easy to miss.
Page 80

Deep Learning for Vision
Embedded Vision Alliance Tutorial – © Shelhamer, Donahue, Long

- Why Deep Learning? Applications
- The Challenge of Recognition
- Learning & Optimization
- Network Tour
- Transfer Learning
- Dive into Deep Learning
- What is DL? Why Now?
- Caffe First Sip

Presenter
Presentation Notes
Deep learning is likewise having a remarkable impact on detection. PASCAL VOC is a gold-standard dataset and challenge with fierce competition. Detection accuracy scores both recognition and localization. By 2012 progress had slowed and plateaued, only to be driven further by the adoption of deep learning (R-CNN). Gloss mean AP as “detection accuracy”, a measure of recognition and localization. Gold standard for detection; it drove the dataset + challenge shift in computer vision. Its successor is COCO.
Page 81

First Dive Into Deep Learning

Deep Learning is Stacking Layers and Learning End-to-End

Presenter
Presentation Notes
Semantic segmentation is a visual recognition task that asks the identity of every pixel. For things, this means what kind of object the pixel is part of, as in this example output that shows which are the horse pixels and which are the person pixels. We could just as well ask what kind of stuff each pixel is, such as grass or sky, or, in the context of a satellite image, road, buildings, crops, water, and so forth. In this task there is a tension between recognizing what (globally) and where (locally). Computational cost can be an obstacle now that there is a decision to be made for every pixel.
Page 82

Stacking Layers

A layer is a transformation:

x’ = layer(x)

Deep networks are layered models made by stacking different types of transformation:

x2 = layer1(x1)
x3 = layer2(x2)
...

How do layers stack?

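Stacking is just function composition: each layer's output becomes the next layer's input. A minimal plain-Python sketch, where `stack`, `double` and `relu` are hypothetical names chosen for illustration (not any framework's API):

```python
from functools import reduce
from typing import Callable, List, Sequence

Layer = Callable[[List[float]], List[float]]

def stack(layers: Sequence[Layer]) -> Layer:
    """Compose layers in order: x2 = layer1(x1), x3 = layer2(x2), ..."""
    def network(x: List[float]) -> List[float]:
        return reduce(lambda h, layer: layer(h), layers, x)
    return network

# Two toy layers, chosen only for illustration.
def double(x: List[float]) -> List[float]:
    return [2.0 * v for v in x]

def relu(x: List[float]) -> List[float]:
    return [max(0.0, v) for v in x]

net = stack([double, relu])
print(net([1.0, -3.0, 0.5]))  # [2.0, 0.0, 1.0]
```

Order matters in general: `stack([relu, double])` is a different network, although for these two particular layers it happens to compute the same function.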
Page 83

Layered Networks

Networks run layer-by-layer, composing the input-output transformation of each layer:

input → layer1 → layer2 → output
x1 = layer1(input)
out = layer2(x1)

During learning, the error is passed back layer-by-layer to tune the transformations:

input ← layer1 ← layer2 ← output + error

What kind of layers should we stack?

Page 84

The simplest layers, for example:

- Matrix Multiplication
- Non-linearity

Page 85

Matrix Multiplication

Multiply input x by weights W and add bias b: y = xW + b
Learns linear transformations

x: 1 x K (K inputs); W: K x O dimensional; b and y: 1 x O (O outputs)

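A matrix-multiplication layer with the shapes above (x: 1 x K, W: K x O, b: 1 x O), sketched in plain Python with made-up numbers. The function name `fully_connected` is ours, for illustration only.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]

def fully_connected(x: Vector, W: Matrix, b: Vector) -> Vector:
    """y = xW + b for a 1 x K input x, K x O weights W and 1 x O bias b."""
    K, O = len(W), len(W[0])
    if len(x) != K or len(b) != O:
        raise ValueError("expected len(x) == K and len(b) == O")
    # Output j sums over every input k: the layer is "fully connected".
    return [sum(x[k] * W[k][j] for k in range(K)) + b[j] for j in range(O)]

x = [1.0, 2.0]             # K = 2 inputs
W = [[1.0, 0.0, -1.0],
     [0.5, 1.0, 2.0]]      # K x O = 2 x 3 weights
b = [0.0, 0.1, 0.0]        # O = 3 biases
y = fully_connected(x, W, b)
num_params = len(W) * len(W[0]) + len(b)   # K*O weights + O biases = 9
print(y, num_params)
```

The parameter count K*O + O grows with the product of input and output sizes, which is why fully connected layers on whole images get expensive.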
Page 86

Matrix Multiplication == Fully Connected Layer

Output is a function of every input, i.e. the input and output are “fully connected”.
Abbreviated as FC.

[figure credit BDTI]

Presenter
Presentation Notes
Note: animated.
Page 87

Linear Classification

To classify we need to separate the data into red vs. blue.

- Suppose our data points (x) are 2D and each comes with a label y, where y = -1 or y = 1
- Learn a weight vector w = [w1; w2]
- Predict the class of a given x by sign(wᵀx) = sign(w1x1 + w2x2)

[figure: red (y = -1) and blue (y = 1) points on axes x1, x2; which line separates them?]

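The recipe above in plain Python. To "learn a weight vector" we swap in the classic perceptron rule (our choice for illustration; the slides later train with gradient descent instead). Like the slide, there is no bias term, so the separating line passes through the origin. The data points are hypothetical.

```python
from typing import List, Sequence, Tuple

def predict(w: Sequence[float], x: Sequence[float]) -> int:
    """sign(w^T x) = sign(w1*x1 + w2*x2); a score of exactly 0 is sent to +1 here."""
    score = sum(wi * xi for wi, xi in zip(w, x))
    return 1 if score >= 0 else -1

def perceptron(data: List[Tuple[List[float], int]], epochs: int = 10) -> List[float]:
    """Perceptron rule: whenever a point is misclassified, move w toward y * x."""
    w = [0.0, 0.0]
    for _ in range(epochs):
        mistakes = 0
        for x, y in data:
            if y * sum(wi * xi for wi, xi in zip(w, x)) <= 0:  # wrong side or on the line
                w = [wi + y * xi for wi, xi in zip(w, x)]
                mistakes += 1
        if mistakes == 0:   # every point classified correctly: stop
            break
    return w

# Hypothetical points: blue (y = 1) above the line x2 = x1, red (y = -1) below it.
data = [([0.0, 1.0], 1), ([1.0, 2.0], 1), ([1.0, 0.0], -1), ([2.0, 0.5], -1)]
w = perceptron(data)
print(w, [predict(w, x) == y for x, y in data])
```

Here training ends with w = [-1.0, 1.0], i.e. the rule sign(x2 - x1). The perceptron only stops making mistakes when such a line exists, which is exactly what fails on the next slides.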
Page 89

Linearity is Not Enough

To classify we need to separate the data into red vs. blue.

[figure: red (y = -1) and blue (y = 1) points on axes x1, x2; is there a separating line?]

Page 91

Linearity is Not Enough

To classify we need to separate the data into red vs. blue.

[figure: red (y = -1) and blue (y = 1) points on axes x1, x2 with a candidate line: NO]

Presenter
Presentation Notes
Armed with matrix multiplication we can do linear classification: separate the data into red vs. blue. The data x is 2-dimensional, with axes x1 and x2, while the output y is simply -1 / red or +1 / blue. Learn weights w1 and w2 to weight x1 and x2 to represent a separating line. What line?
Page 93

Linearity is Not Enough

To classify we need to separate the data into red vs. blue.

[figure: red (y = -1) and blue (y = 1) points on axes x1, x2 separated by a curved boundary: YES, with non-linearity!]

Presenter
Presentation Notes
In practice linearity is not enough, and real-world data requires more sophisticated classifiers. What line separates this data? None: we need non-linearity.
Page 95

The Limits of Linearity

Linear steps collapse and stay linear: W2(W1x) = (W2W1)x

Linear layers alone do not meaningfully stack.

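A quick numeric check of the collapse, in plain Python with two hypothetical 2 x 2 weight matrices: applying two linear layers in turn gives exactly the same result as one layer whose weight matrix is their product.

```python
from typing import List

Matrix = List[List[float]]

def matmul(A: Matrix, B: Matrix) -> Matrix:
    """Plain-Python matrix product: entry (i, j) is row i of A dotted with column j of B."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

W1 = [[1.0, 2.0], [0.0, 1.0]]   # hypothetical layer-1 weights
W2 = [[3.0, 0.0], [1.0, -1.0]]  # hypothetical layer-2 weights
x = [[1.0], [2.0]]              # a column vector, stored as a 2 x 1 matrix

two_layers = matmul(W2, matmul(W1, x))  # layer2(layer1(x))
one_layer = matmul(matmul(W2, W1), x)   # a single layer with W = W2 W1
print(two_layers, one_layer)            # both [[15.0], [3.0]]
```

However many linear layers are stacked, the product of their matrices is still one matrix, so depth adds no expressive power until a non-linearity sits between the layers.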
Page 96

The Shallowest Deep Net

Deep nets are made by stacking learned linear layers and simple pointwise non-linear layers.

Linear: x2 = W1x1, x3 = W2x2 = (W2W1)x1
Non-linear, deep (add ReLU): x2 = max(0, W1x1), x3 = W2x2

Due to the Rectified Linear Unit (ReLU) non-linearity max(0, x), x3 cannot be computed as a linear function of x1.

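To see the ReLU make a difference, here is a plain-Python sketch of a linear layer, a ReLU, and a second linear layer classifying an XOR-style pattern (label +1 when x1 and x2 share a sign), which no single line can separate. The weights are hand-picked rather than learned, and we add bias terms b1, b2 that the slide's formula leaves out; both are assumptions for illustration.

```python
from typing import List, Sequence

def relu(v: Sequence[float]) -> List[float]:
    return [max(0.0, a) for a in v]

def linear(W: List[List[float]], x: Sequence[float], b: Sequence[float]) -> List[float]:
    """Wx + b for a weight matrix W given as a list of rows."""
    return [sum(w * a for w, a in zip(row, x)) + bi for row, bi in zip(W, b)]

# Hand-picked weights (hypothetical, not trained).
W1, b1 = [[1.0, 1.0], [-1.0, -1.0]], [0.0, 0.0]
W2, b2 = [[1.0, 1.0]], [-1.0]

def shallow_deep_net(x: Sequence[float]) -> int:
    hidden = relu(linear(W1, x, b1))   # x2 = max(0, W1 x1 + b1)
    score = linear(W2, hidden, b2)[0]  # x3 = W2 x2 + b2
    return 1 if score > 0 else -1

# XOR-style data: +1 when x1 and x2 share a sign, -1 otherwise.
data = {(1.0, 1.0): 1, (-1.0, -1.0): 1, (1.0, -1.0): -1, (-1.0, 1.0): -1}
print(all(shallow_deep_net(x) == y for x, y in data.items()))  # True
```

Without the `relu` call the two layers would collapse into one linear function, and no linear function gets all four points right.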
Page 97

Non-linearity

Non-linearity is needed to deepen the representation.
Many non-linearities or activations to choose from:

[plots: ReLU, Sigmoid]

Page 98

Yet more non-linearities

[plots: ReLU, Sigmoid, TanH, Leaky ReLU, ELU]

- When in doubt, ReLU
- Worth trying Leaky ReLU, ELU
- Avoid Sigmoid

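The five activations above written out in plain Python. The Leaky ReLU slope (0.01) and the ELU alpha (1.0) are assumed, commonly used defaults, not values taken from the slides.

```python
import math

def relu(x: float) -> float:
    return max(0.0, x)

def sigmoid(x: float) -> float:
    # Two branches so math.exp never overflows for large |x|.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def tanh(x: float) -> float:
    return math.tanh(x)

def leaky_relu(x: float, slope: float = 0.01) -> float:
    # slope for negative inputs: an assumed default
    return x if x > 0 else slope * x

def elu(x: float, alpha: float = 1.0) -> float:
    # alpha: an assumed default; negative inputs saturate smoothly toward -alpha
    return x if x > 0 else alpha * (math.exp(x) - 1.0)

for f in (relu, sigmoid, tanh, leaky_relu, elu):
    print(f.__name__, [round(f(v), 3) for v in (-2.0, 0.0, 2.0)])
```

One reason behind "avoid sigmoid": it flattens out for large |x|, and its slope σ(x)(1 - σ(x)) is never more than 0.25, so gradients shrink as they pass back through many sigmoid layers. ReLU's slope is exactly 1 for positive inputs.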
Page 99

Define Your First Net

Let’s go non-linear on a classification problem.

Try It Out: Deep Learning in your browser demos

Page 100

Designing for Sight

Convolutional Networks or convnets are nets for vision:

- functional fit for the visual world by compositionality and feature sharing
- learned end-to-end to handle visual detail for more accuracy and less engineering

Convnets are the dominant architectures for visual tasks.

Page 101

Visual Structure

Local Processing: pixels close together go together; receptive fields capture local detail.

Across Space: the same what, no matter where; recognize the same input in different places.

Presenter
Presentation Notes
Pointwise non-linearities
Page 103

Visual Structure

Can rely on spatial coherence: [scrambled image] this is not a cat.

[images of cats in different places] All of these are cats.

Page 104

Vision Layers

Convolution/Filtering: linear layer for vision [figure: learned filter]

Pooling: spatial summarization [figure: max pool 2x2 with stride 2]

[figure credit A. Karpathy, cs231n course]

Page 105

Convolution: A Linear Layer for Vision

Images have translation-invariant semantics: these are all equally squirrels.
So use the same weights between nodes with the same spatial relationship.

This is convolution (or correlation; the two terms are used interchangeably in vision).
Convolution means fewer parameters for more efficient learning.

Page 108

A Filter

Input is 3x32x32 data: a color image (3 RGB channels) and square (32x32).

A filter is a spatially local and cross-channel template. Convnet filters are learned.

Filter is 3x5x5 weights:
- spatially local: kernel size is 5x5
- cross-channel: connected across all input channels

Total parameters: 3*5² = 75 filter weights + 1 bias

[figure adapted from A. Karpathy]

Page 109: Introduction to Deep Learning for Biomedical Engineeringbmia.bmt.tue.nl/people/BRomeny/Courses/Taipei2017/NTUST 2017 Dee… · Introduction to Deep Learning for Biomedical Engineering

One filter evaluation is a dot product between the input window and weights + bias

109

Convolution

32

inputfilterbiasoutput

3x32x323x5x5

11

[figure adapted from A. Karpathy]

Presenter
Presentation Notes
use the same weights for the same spatial relationship
Page 111

Convolution

input: 3x32x32; filter: 3x5x5; bias: 1; output: feature map 1x28x28

Convolving the filter with the input gives a feature map.

Filter parameters: 3*5² = 75
FC parameters (per output unit): 3*32² = 3,072

[figure adapted from A. Karpathy]

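The filter evaluation from the last two slides as a naive plain-Python loop: slide one 3x5x5 filter over a 3x32x32 input, take a dot product plus bias at each position, and collect the results into a 28x28 feature map. Like most deep learning libraries, this computes cross-correlation (the filter is not flipped). The all-ones image and constant filter are hypothetical test data, and `conv2d_single` is our name, not a library function.

```python
from typing import List

Tensor3 = List[List[List[float]]]  # [channel][row][col]

def conv2d_single(inp: Tensor3, filt: Tensor3, bias: float, stride: int = 1) -> List[List[float]]:
    """Slide one C x k x k filter over a C x H x W input (no padding)."""
    C, H, W = len(inp), len(inp[0]), len(inp[0][0])
    k = len(filt[0])                        # square k x k kernel
    if len(filt) != C:
        raise ValueError("filter must span all input channels")
    out_h = (H - k) // stride + 1
    out_w = (W - k) // stride + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            s = bias                        # dot product of window and weights, plus bias
            for c in range(C):
                for di in range(k):
                    for dj in range(k):
                        s += inp[c][i * stride + di][j * stride + dj] * filt[c][di][dj]
            row.append(s)
        out.append(row)
    return out

image = [[[1.0] * 32 for _ in range(32)] for _ in range(3)]  # 3x32x32, all ones
filt = [[[0.1] * 5 for _ in range(5)] for _ in range(3)]     # 3x5x5, shared at every position
fmap = conv2d_single(image, filt, bias=0.5)
print(len(fmap), len(fmap[0]))   # 28 28, since 32 - 5 + 1 = 28
print(3 * 5 * 5, 3 * 32 * 32)    # 75 filter weights vs. 3072 weights per FC unit
```

The same 75 weights are reused at all 784 positions. That weight sharing is where the parameter savings over a fully connected layer come from.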
Page 113

Convolution Layer (conv)

input: 3x32x32; filters: 6x3x5x5; biases: 6; output: 6 feature maps, 6x28x28

Convolution layers have multiple filters for more modeling capacity.

Learned Filters from AlexNet conv1: conv1 has 96 filters for edge, color, and frequency, richer than 3D RGB.

[figure adapted from A. Karpathy]

Presenter
Presentation Notes
Weights are shared across space.
Page 114

Pooling (pool)

Spatial summary by computing an operation over a window with a stride:
- overlapping or non-overlapping
- separate across channels
- current fashion: 3x3 max pooling with stride 2

[figure: 2x2 pooling, stride 2; max pooling vs. average pooling; figure credit BDTI]

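Max and average pooling on a single channel, sketched in plain Python for the 2x2-window, stride-2 case. The 4x4 input is made-up data, and `pool2d` is a hypothetical helper name. With multiple channels, the same function would simply be applied to each channel separately.

```python
from typing import Callable, List, Sequence

Grid = List[List[float]]

def pool2d(x: Grid, size: int, stride: int, op: Callable[[Sequence[float]], float]) -> Grid:
    """Apply op (e.g. max or mean) to each size x size window, moving by stride (no padding)."""
    out_h = (len(x) - size) // stride + 1
    out_w = (len(x[0]) - size) // stride + 1
    return [[op([x[i * stride + di][j * stride + dj]
                 for di in range(size) for dj in range(size)])
             for j in range(out_w)]
            for i in range(out_h)]

def mean(v: Sequence[float]) -> float:
    return sum(v) / len(v)

x = [[1.0, 1.0, 2.0, 4.0],
     [5.0, 6.0, 7.0, 8.0],
     [3.0, 2.0, 1.0, 0.0],
     [1.0, 2.0, 3.0, 4.0]]
print(pool2d(x, 2, 2, max))   # [[6.0, 8.0], [3.0, 4.0]]
print(pool2d(x, 2, 2, mean))  # [[3.25, 5.25], [2.0, 2.0]]
```

Pooling has nothing to learn: the window size, stride and operation are fixed settings, which is why the layer review lists pool among the layers without parameters.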
Page 115

Pooling

- reduce resolution
- increase receptive field size for later layers
- save computation
- add invariance to translation/noise within pooling window

[figure: pooling a 64x224x224 volume to 64x112x112; figure credit A. Karpathy]

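The shapes quoted on the convolution and pooling slides all follow from one window-sweep formula, floor((n - k + 2·pad) / stride) + 1. A small plain-Python sketch with hypothetical helper names. It rounds down; some frameworks round up for pooling instead, so edge cases can differ by one.

```python
def output_size(n: int, k: int, stride: int = 1, pad: int = 0) -> int:
    """Spatial size after sweeping a k x k window over an n x n input."""
    return (n - k + 2 * pad) // stride + 1

def conv_params(in_channels: int, num_filters: int, k: int) -> int:
    """in_channels * k * k weights per filter, plus one bias per filter."""
    return num_filters * (in_channels * k * k + 1)

print(output_size(32, 5))             # 5x5 conv on 32x32, no padding: 28
print(conv_params(3, 6, 5))           # six 3x5x5 filters: 6 * 76 = 456
print(output_size(224, 2, stride=2))  # 2x2 pool, stride 2: 224 -> 112
print(output_size(32, 3, 1, pad=1))   # 3x3 conv, stride 1, pad 1 keeps 32
```

Pooling has no parameters and leaves the channel count alone, which is why the slide's volume keeps its 64 channels while halving in height and width.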
Page 116

Fully Connected Layers (FC)

Learn a global feature from the full feature maps.
Often found at the end of convnets.
Note: this could likewise be done by a large convolution kernel.

[figure: feature maps 2x2x2 → unroll → input 1 x 8; weights 8 x 3; bias 1 x 3; outputs (or units) 1 x 3]

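The slide's example in plain Python: unroll 2x2x2 feature maps into a 1 x 8 vector, then apply 8 x 3 weights and a 1 x 3 bias. To back the slide's note about large kernels, `full_size_conv` computes the same outputs as three filters, each as large as the whole 2x2x2 input. The weights are hypothetical and the helper names are ours.

```python
from typing import List

Maps = List[List[List[float]]]  # [channel][row][col]

def unroll(maps: Maps) -> List[float]:
    """Flatten C x H x W feature maps into a 1 x (C*H*W) vector, channel by channel."""
    return [v for channel in maps for row in channel for v in row]

def fc(x: List[float], W: List[List[float]], b: List[float]) -> List[float]:
    """y = xW + b with x: 1 x K, W: K x O, b: 1 x O."""
    return [sum(x[k] * W[k][j] for k in range(len(x))) + b[j] for j in range(len(b))]

def full_size_conv(maps: Maps, W: List[List[float]], b: List[float]) -> List[float]:
    """Output j uses one filter covering the whole input, read from column j of W."""
    C, H, Wd = len(maps), len(maps[0]), len(maps[0][0])
    return [sum(maps[c][i][k] * W[(c * H + i) * Wd + k][j]
                for c in range(C) for i in range(H) for k in range(Wd)) + b[j]
            for j in range(len(b))]

maps = [[[1.0, 2.0], [3.0, 4.0]],
        [[5.0, 6.0], [7.0, 8.0]]]                    # 2x2x2 feature maps
W = [[1.0 if k % 3 == j else 0.0 for j in range(3)]  # hypothetical 8 x 3 weights
     for k in range(8)]
b = [0.0, 0.0, 1.0]                                  # 1 x 3 bias
x = unroll(maps)                                     # 1 x 8 input
print(x)
print(fc(x, W, b), full_size_conv(maps, W, b))       # identical outputs
```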
Page 117

Normalization Layers (Deprecated)

Local response normalization was popular for a time but is now deprecated; more recent networks do not include these layers.

[figure credit BDTI]

Page 118

Layer Review

- layers compute differentiable transformations
- types of layers: conv, ReLU, pool, FC
- parameters (conv, FC) or not (pool, ReLU)
- arguments like kernel size, stride, etc. (conv, pool)

Page 119

Convnet Architecture

Input Image
→ Conv 3x3s1, 10 / ReLU
→ Conv 3x3s1, 10 / ReLU
→ Max Pool 3x3s1
→ Conv 3x3s1, 10 / ReLU
→ Conv 3x3s1, 10 / ReLU
→ Max Pool 3x3s1
→ Conv 3x3s1, 10 / ReLU
→ Conv 3x3s1, 10 / ReLU
→ Max Pool 3x3s1
→ FC 10
→ Scores

Notation: “Conv 3x3s1, 10 / ReLU” means Type: Conv, Kernel Size: 3x3, Stride: 1, Channels: 10, Activation: ReLU.

Stack convolution, non-linearity, and pooling until the global FC layer classifier.

[figure credit A. Karpathy]
Page 120

Data augmentation: making much more data

Transform the training data without changing its truth:

- horizontal flips: cat is still a cat
- random crops/scales: views of cat
- relighting: darker cat
- … and anything else you can come up with! (and combinations of the above)

[figure adapted from A. Karpathy]

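The three transforms on the slide, sketched in plain Python on a single-channel image stored as lists of rows. The flip probability (0.5) and the brightness range (0.6 to 1.4) are assumed settings for illustration, not values from the slides. The toy 4x4 "image" is hypothetical.

```python
import random
from typing import List, Tuple

Image = List[List[float]]  # one channel; pixel intensities in [0, 1]

def hflip(img: Image) -> Image:
    """Mirror left-right: a cat is still a cat."""
    return [row[::-1] for row in img]

def random_crop(img: Image, size: int, rng: random.Random) -> Image:
    """Take a random size x size window: another view of the same cat."""
    top = rng.randint(0, len(img) - size)      # randint includes both endpoints
    left = rng.randint(0, len(img[0]) - size)
    return [row[left:left + size] for row in img[top:top + size]]

def relight(img: Image, factor: float) -> Image:
    """Scale brightness and clip to [0, 1]: a darker (or brighter) cat."""
    return [[min(1.0, max(0.0, p * factor)) for p in row] for row in img]

def augment(img: Image, label: str, size: int, rng: random.Random) -> Tuple[Image, str]:
    out = hflip(img) if rng.random() < 0.5 else img  # flip half the time (assumed rate)
    out = random_crop(out, size, rng)
    out = relight(out, rng.uniform(0.6, 1.4))        # brightness range is an assumption
    return out, label                                # the label ("truth") never changes

rng = random.Random(42)
img = [[(r * 4 + c) / 15.0 for c in range(4)] for r in range(4)]  # toy 4x4 image
view, label = augment(img, "cat", size=3, rng=rng)
print(len(view), len(view[0]), label)  # 3 3 cat
```

Each call produces a new random view, so a training loop that augments on the fly sees a slightly different version of every image in every epoch.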
Page 121

See a Net Learn to See

Let’s watch a convnet as it learns how to recognize objects in images.

MNIST demo: Try It Out
CIFAR-10 demo: Try It Out

Page 122

Internal functionality

Page 123

Supervised Learning

Given labeled data: (x1, y1), (x2, y2), …, (xN, yN), where each xn is data and each yn its label

Goal: find a function f such that yn = f(xn) for all n, “as well as possible”

Page 124

Supervised Loss

What does “as well as possible” mean?
Pick a loss function ℓ(y, ŷ): how wrong is it to predict ŷ when the true label is y?
Minimize the total loss over all data:

$$\sum_{n=1}^{N} \ell\big(y_n, f(x_n)\big)$$

E.g. ℓ(y, ŷ) = ‖y − ŷ‖², the “Euclidean Loss” of everyday linear regression

Page 125

Parametric Learning

How do we find the label-prediction function f?
Parametric answer: pick it from a family determined by a set of parameters θ:

f(x) = f(x; θ)

E.g. f(x; θ) = θx, “linear prediction” (θ a matrix, x a vector)
For us: f is a network, θ is a set of weights

Page 126

Parametric Supervised Learning

Altogether: our goal is to find θ in order to minimize the total loss:

$$\theta^{*} = \arg\min_{\theta} \sum_{n=1}^{N} \ell\big(y_n, f(x_n; \theta)\big)$$

(sum over data; ℓ: loss; yn: true label; f: model (network); θ: parameters (weights); f(xn; θ): predicted label)

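The objective above, written out in plain Python for the linear model f(x; θ) = θx with the Euclidean loss. The toy data are hypothetical and were generated by y = 2x, so θ = [[2.0]] has zero total loss and any other θ does worse.

```python
from typing import List, Tuple

def euclidean_loss(y: List[float], y_hat: List[float]) -> float:
    """ℓ(y, ŷ) = ‖y − ŷ‖², the squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(y, y_hat))

def linear_predict(theta: List[List[float]], x: List[float]) -> List[float]:
    """f(x; θ) = θx with θ an O x K matrix and x a length-K vector."""
    return [sum(t * xi for t, xi in zip(row, x)) for row in theta]

def total_loss(theta: List[List[float]], data: List[Tuple[List[float], List[float]]]) -> float:
    """Σ_n ℓ(y_n, f(x_n; θ)): the quantity to minimize over θ."""
    return sum(euclidean_loss(y, linear_predict(theta, x)) for x, y in data)

# Hypothetical toy data generated by y = 2x.
data = [([1.0], [2.0]), ([2.0], [4.0]), ([-1.0], [-2.0])]
print(total_loss([[2.0]], data))  # 0.0
print(total_loss([[1.0]], data))  # 1 + 4 + 1 = 6.0
```

For a network, only `linear_predict` changes; the loss and the sum over data stay the same.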
Page 127

Underfitting and Overfitting

underfitting: not enough parameters to model the data
overfitting: enough parameters to memorize the training set without generalizing

[figure: fits ranging from fewer parameters to more parameters; figure credit A. Karpathy]

Page 128

Regularization

How can we prevent overfitting without reducing the number of parameters?

Add a regularization penalty to our loss: “complicated” solutions are worse.

[figure credit A. Karpathy]

Page 129

Regularization: Weight Decay and Dropout

Weight Decay: minimize L(θ) + λ‖θ‖² to pull weights toward zero
- λ (scalar) is an optimization setting… pick it empirically
- aka “L2 regularization”

Dropout: during training, randomly set a fraction p of activations to zero
- p is an optimization setting (often 0.5)
- forces the model to be robust to noise

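Both regularizers in plain Python. For dropout we show the common "inverted" variant, which is our choice here: the surviving activations are scaled by 1/(1 − p) during training so that the expected activation is unchanged and nothing has to be rescaled at test time. The original formulation instead scales the weights at test time. The activations and weights below are hypothetical.

```python
import random
from typing import List

def l2_penalty(theta: List[float], lam: float) -> float:
    """λ‖θ‖²: added to the data loss L(θ)."""
    return lam * sum(t * t for t in theta)

def dropout(activations: List[float], p: float, rng: random.Random,
            train: bool = True) -> List[float]:
    """Zero each activation with probability p during training (inverted dropout)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if not train:
        return list(activations)   # test time: use every unit, no rescaling needed
    keep = 1.0 - p
    return [a / keep if rng.random() >= p else 0.0 for a in activations]

rng = random.Random(0)
h = [1.0] * 10000
dropped = dropout(h, 0.5, rng)
print(sum(1 for a in dropped if a == 0.0) / len(h))  # close to 0.5
print(sum(dropped) / len(h))                         # close to 1.0: expectation preserved
print(l2_penalty([3.0, 4.0], lam=0.1))               # 0.1 * (9 + 16) = 2.5
```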
Page 130

Gradient Descent: Intuition

Want to minimize “loss” function L(x; θ)
θ (vector): parameter to update
x (vector): input data (fixed on this slide)

[figure: L(x; θ) plotted against the θ axis, with one step from old θ to new θ]

Move in the direction of the negative gradient, i.e. downhill.

The gradient tells you, for each element of the network parameters, how the loss changes in response to a change in that parameter.

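Gradient descent in one dimension, in plain Python, on the hypothetical loss L(θ) = (θ − 3)², whose gradient is 2(θ − 3) and whose minimum is at θ = 3. Each step subtracts the learning rate times the gradient, moving downhill.

```python
from typing import Callable

def gradient_descent(grad: Callable[[float], float], theta0: float,
                     lr: float, steps: int) -> float:
    theta = theta0
    for _ in range(steps):
        theta = theta - lr * grad(theta)  # step against the gradient
    return theta

def grad(theta: float) -> float:
    return 2.0 * (theta - 3.0)            # dL/dθ for L(θ) = (θ - 3)²

print(gradient_descent(grad, theta0=0.0, lr=0.1, steps=100))  # very close to 3.0
```

With lr = 0.1 the distance to the minimum shrinks by a factor of 0.8 per step. Stepping with the gradient instead of against it would grow that distance by 1.2 per step and diverge.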
Page 131

Stochastic Gradient Descent (SGD)

Want to minimize “loss” function L(x; θ):
1. Pick input datum x
2. Compute parameter gradient ∇θ L(x; θ)
3. Multiply by learning rate α
4. Update parameters: θ ← θ − α ∇θ L(x; θ)

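The four SGD steps, applied in plain Python to fitting y ≈ θx with the per-example loss (y − θx)². The data are hypothetical (roughly y = 2x), and the learning rate and epoch count are arbitrary illustrative settings.

```python
import random
from typing import List, Tuple

def sgd_linear_regression(data: List[Tuple[float, float]], lr: float,
                          epochs: int, seed: int = 0) -> float:
    """Fit y ≈ θx, updating θ after every single datum."""
    rng = random.Random(seed)
    theta = 0.0
    order = list(range(len(data)))
    for _ in range(epochs):
        rng.shuffle(order)                     # visit the data in random order
        for i in order:
            x, y = data[i]                     # 1. pick input datum x
            grad = -2.0 * x * (y - theta * x)  # 2. gradient of (y − θx)² w.r.t. θ
            step = lr * grad                   # 3. multiply by learning rate
            theta -= step                      # 4. update parameters
    return theta

data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (-1.0, -1.9)]  # hypothetical, roughly y = 2x
theta = sgd_linear_regression(data, lr=0.02, epochs=200)
print(round(theta, 2))  # near the least-squares value 30.4 / 15 ≈ 2.03
```

With a constant learning rate, θ keeps jittering slightly around the least-squares answer, because each update only sees one datum. Shrinking the learning rate over time is a common way to settle it.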
Page 132

Why “Stochastic”?

The gradient depends on the choice of input datum x.
Choose x randomly (or just cycle through all data in a fixed order).

(The alternative is to average the gradient over all available data, “batch gradient descent”, with learning rate α:

$$\theta \leftarrow \theta - \alpha \, \frac{1}{N} \sum_{n=1}^{N} \nabla_\theta L(x_n; \theta)$$

That’s too slow for big data!)

Page 135

SGD with Weight Decay and Momentum

[update rule with annotated terms: weight decay (regularization); momentum (p is a number less than 1)]

There are many other variants: Adam, RMSprop, AdaDelta, AdaGrad, Nesterov, ...

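One standard way to write SGD with weight decay and momentum is v ← p·v − α(∇θL + 2λθ), then θ ← θ + v. The 2λθ term is the gradient of the λ‖θ‖² penalty, and many frameworks fold the factor 2 into λ. This form is an assumption on our part, not necessarily the exact equation on the slide. A plain-Python sketch, run on the hypothetical loss L(θ) = (θ − 3)²:

```python
from typing import List, Tuple

def sgd_momentum_step(theta: List[float], velocity: List[float], grad: List[float],
                      lr: float, momentum: float, weight_decay: float
                      ) -> Tuple[List[float], List[float]]:
    """v <- p*v - lr*(grad + 2*lambda*theta);  theta <- theta + v  (elementwise)."""
    new_v = [momentum * v - lr * (g + 2.0 * weight_decay * t)
             for t, v, g in zip(theta, velocity, grad)]
    new_theta = [t + v for t, v in zip(theta, new_v)]
    return new_theta, new_v

# Minimizing (θ - 3)² + λθ² gives θ = 3 / (1 + λ): weight decay pulls θ toward zero.
lam = 0.1
theta, v = [0.0], [0.0]
for _ in range(500):
    grad = [2.0 * (theta[0] - 3.0)]   # ∇L for L(θ) = (θ - 3)²
    theta, v = sgd_momentum_step(theta, v, grad, lr=0.05, momentum=0.9, weight_decay=lam)
print(theta[0], 3.0 / (1.0 + lam))    # the two values agree closely
```

Momentum keeps a running velocity, so steps in a consistent direction add up while steps that keep changing sign partly cancel out.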
Page 136

Layer Gradients

Matrix Multiply Gradients
ReLU
Sigmoid

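Backward rules for the three layers named above, sketched in plain Python with the same row-vector convention as the FC slides (y = xW + b). Each function takes the upstream gradient dy = ∂ℓ/∂y and returns gradients with respect to its inputs. A finite-difference check confirms one weight gradient. The numbers and the loss ℓ = sum(y) are hypothetical.

```python
import math
from typing import List, Tuple

def fc_forward(x: List[float], W: List[List[float]], b: List[float]) -> List[float]:
    """y = xW + b (x: 1 x K, W: K x O, b: 1 x O)."""
    return [sum(x[k] * W[k][j] for k in range(len(x))) + b[j] for j in range(len(b))]

def fc_backward(x: List[float], W: List[List[float]], dy: List[float]
                ) -> Tuple[List[float], List[List[float]], List[float]]:
    """∂ℓ/∂x = dy Wᵀ, ∂ℓ/∂W = xᵀ dy, ∂ℓ/∂b = dy."""
    K, O = len(W), len(W[0])
    dx = [sum(dy[j] * W[k][j] for j in range(O)) for k in range(K)]
    dW = [[x[k] * dy[j] for j in range(O)] for k in range(K)]
    return dx, dW, list(dy)

def relu_backward(x: List[float], dy: List[float]) -> List[float]:
    """Pass the gradient where the input was positive; block it elsewhere."""
    return [d if v > 0 else 0.0 for v, d in zip(x, dy)]

def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))

def sigmoid_backward(x: List[float], dy: List[float]) -> List[float]:
    """dσ/dx = σ(x)(1 − σ(x)), which is at most 0.25."""
    return [d * sigmoid(v) * (1.0 - sigmoid(v)) for v, d in zip(x, dy)]

x, W, b = [1.0, -2.0], [[0.5, 1.0, 0.0], [2.0, -1.0, 3.0]], [0.0, 0.0, 0.0]
dy = [1.0, 1.0, 1.0]                 # ∂ℓ/∂y for the loss ℓ = sum(y)
dx, dW, db = fc_backward(x, W, dy)
eps = 1e-6
W_plus = [row[:] for row in W]
W_plus[1][2] += eps                  # nudge one weight
numeric = (sum(fc_forward(x, W_plus, b)) - sum(fc_forward(x, W, b))) / eps
print(dW[1][2], numeric)             # analytic -2.0 vs. the finite-difference estimate
```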
Page 137

Back-propagation: The Chain Rule

[diagram: parameters θ → layer1 → loss (ℓ)]

A net is a composition of layer functions.
The gradient of a net is the product of layer gradients.

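The chain rule at work on a hypothetical two-parameter net with scalar layers (chosen for readability, not copied from any slide's diagram): h1 = θ1·x, h2 = ReLU(h1), ŷ = θ2·h2, ℓ = (ŷ − y)². The forward pass stores intermediate values. The backward pass multiplies local gradients from the loss back to each parameter. A central finite difference checks the result. Plain Python, all values made up.

```python
from typing import Callable, Dict

def forward_backward(x: float, y: float, theta1: float, theta2: float) -> Dict[str, float]:
    # Forward pass: input -> layer1 -> ReLU -> layer2 -> loss
    h1 = theta1 * x
    h2 = max(0.0, h1)
    y_hat = theta2 * h2
    loss = (y_hat - y) ** 2
    # Backward pass: chain rule, one local gradient at a time
    d_yhat = 2.0 * (y_hat - y)               # dℓ/dŷ
    d_theta2 = d_yhat * h2                   # dℓ/dθ2 = dℓ/dŷ · dŷ/dθ2
    d_h2 = d_yhat * theta2                   # dℓ/dh2 = dℓ/dŷ · dŷ/dh2
    d_h1 = d_h2 * (1.0 if h1 > 0 else 0.0)   # ReLU passes the gradient only where h1 > 0
    d_theta1 = d_h1 * x                      # dℓ/dθ1 = dℓ/dh1 · dh1/dθ1
    return {"loss": loss, "d_theta1": d_theta1, "d_theta2": d_theta2}

def numeric_grad(f: Callable[[float], float], v: float, eps: float = 1e-6) -> float:
    """Central finite difference, used only to check the chain-rule result."""
    return (f(v + eps) - f(v - eps)) / (2.0 * eps)

x, y, t1, t2 = 1.5, 2.0, 0.8, 1.5
out = forward_backward(x, y, t1, t2)
num1 = numeric_grad(lambda t: forward_backward(x, y, t, t2)["loss"], t1)
num2 = numeric_grad(lambda t: forward_backward(x, y, t1, t)["loss"], t2)
print(out["d_theta1"], num1)   # both about -0.9
print(out["d_theta2"], num2)   # both about -0.48
```

Deep learning frameworks automate exactly this bookkeeping: each layer supplies its forward function and its local gradient, and back-propagation chains them together.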
Page 138

Back-propagation in a Bigger Net

[diagram: input x passes through layer1 and layer2 (parameters θ1, θ2, θ3) to the output ŷ, which the loss compares with the truth y; the forward pass runs from input to loss, the backward pass from the loss back to the parameters]