Basics of deep learning By Pierre-Marc Jodoin and Christian Desrosiers


Page 1:

Basics of deep learning

By Pierre-Marc Jodoin and Christian Desrosiers

Page 2:

Let's start with a simple example

[Image from Wikimedia Commons, the free media repository]

Page 3:

Let's start with a simple example

Training dataset D: each patient gives an input x = (temp, freq) and a label t (the diagnostic).

Patient 1: (37.5, 72)  -> Healthy
Patient 2: (39.1, 103) -> Sick
Patient 3: (38.3, 100) -> Sick
(…)
Patient N: (36.7, 88)  -> Healthy

[Scatter plot: freq vs. temp, with Sick and Healthy clusters]

Page 4:

Let's start with a simple example

A new patient comes to the hospital. How can we determine if he is sick or not?

[Scatter plot: freq vs. temp, Sick and Healthy clusters, with the new patient's point]

Page 5:

Solution

Divide the feature space into 2 regions: sick and healthy.

[Scatter plot: freq vs. temp]

Page 6:

Solution

Divide the feature space into 2 regions: sick and healthy.

[Scatter plot: freq vs. temp, now split into a Sick region and a Healthy region]

Page 7:

More formally

y(x) = Healthy if x is in the blue (healthy) region, Sick otherwise.

[Scatter plot: freq vs. temp with the two decision regions, t = Sick / t = Healthy]

Page 8:

How to split the feature space?

Page 9:

Definition … a line!

y = mx + b, where m = Δy/Δx is the slope and b is the bias (y-intercept).

Page 10:

Definition … a line!

y = mx + b with m = Δy/Δx:

y = (Δy/Δx)·x + b
Δx·y = Δy·x + Δx·b
Δy·x − Δx·y + Δx·b = 0

Page 11:

Rename variables

Δy·x − Δx·y + Δx·b = 0

with w1 = Δy, w2 = −Δx, w0 = Δx·b.

Page 12:

Rename variables

x → x1, y → x2:

w0 + w1·x + w2·y = 0   becomes   w0 + w1·x1 + w2·x2 = 0

Page 13:

Classification function

y_w(x) = w1·x1 + w2·x2 + w0

The line y_w(x) = 0 splits the (x1, x2) plane: y_w(x) > 0 on one side, y_w(x) < 0 on the other. The normal of the line is N = (w1, w2).

Page 14:

Classification function

y_w(x) = w1·x1 + w2·x2 + w0, with the boundary w1·x1 + w2·x2 + w0 = 0.

Example: plugging the sample points (2,3), (1,6) and (4,1) into y_w(x) gives signed scores (0.7 > 0 for a point on one side of the line, −0.6 < 0 for a point on the other); the sign of the score tells on which side of the line each point lies.

Page 15:

Classification function

y_w(x) = w1·x1 + w2·x2 + w0 = w · x'

with w = (w0, w1, w2) and x' = (1, x1, x2): the classification function is a DOT product.

[Plot: the line y_w(x) = 0 with regions y_w(x) > 0 and y_w(x) < 0, normal N = (w1, w2)]

Page 16:

Classification function

y_w(x) = w1·x1 + w2·x2 + w0 = wᵀx'

with w = (w0, w1, w2) and x' = (1, x1, x2): the same DOT product, written with the transpose.

Page 17:

To simplify notation

y_w(x) = wᵀx

linear classification = dot product with bias included

Page 18:

Learning

With the training dataset D, the GOAL is to find the parameters w0, w1, w2 that best separate the two classes: y_w(x) > 0 on the Sick side and y_w(x) < 0 on the Healthy side.

[Plot: freq vs. temp with the separating line y_w(x) = 0]

Page 19:

How do we know if a model is good?

Page 20:

Loss function

[Three plots of candidate separating lines]
L(y_w(x), D) ≈ 0 : Good!    L(y_w(x), D) > 0 : Medium    L(y_w(x), D) >> 0 : BAD!

Page 21:

So far…

1. Training dataset: D
2. Classification function (a line in 2D): y_w(x) = w1·x1 + w2·x2 + w0
3. Loss function: L(y_w(x), D)
4. Training: find w0, w1, w2 that minimize L(y_w(x), D)

Page 22:

Today

Perceptron → Logistic regression → Multi-layer perceptron → Conv Nets

Page 23:

Perceptron

Rosenblatt, Frank (1958), The Perceptron: A Probabilistic Model for Information Storage and Organization in the Brain, Psychological Review, v65, No. 6, pp. 386–408

Page 24:

Perceptron (2D and 2 classes)

y_w(x) = wᵀx = w0 + w1·x1 + w2·x2

[Diagram: inputs 1, x1, x2 with weights w0, w1, w2 feeding one neuron computing wᵀx; plot of the regions y_w(x) > 0, y_w(x) = 0, y_w(x) < 0]

Page 25:

Perceptron (2D and 2 classes)

Activation function: sign

y_w(x) = sign(wᵀx) ∈ {−1, +1}

[Same diagram, with the sign activation applied to wᵀx]

Page 26:

Perceptron (2D and 2 classes)

Neuron = dot product + activation function

y_w(x) = sign(wᵀx) ∈ {−1, +1}

Page 27:

So far…

1. Training dataset: D
2. Classification function (a line in 2D): y_w(x) = w1·x1 + w2·x2 + w0
3. Loss function: L(y_w(x), D)

Page 28:

So far…

1. Training dataset: D
2. Classification function (a line in 2D): y_w(x) = sign(wᵀx)
3. Loss function: L(y_w(x), D)
4. Training: find w0, w1, w2 that minimize L(y_w(x), D)

[Perceptron diagram: inputs 1, x1, x2, weights w0, w1, w2, sign activation]

Page 29:

Linear classifiers have limits

Page 30:

Non-linearly separable training data

Linear classifier = large error rate

Page 31:

Three classical solutions

1. Acquire more observations

2. Use a non-linear classifier

3. Transform the data

Non-linearly separable training data

Page 32:

Three classical solutions

1. Acquire more observations

2. Use a non-linear classifier

3. Transform the data

Non-linearly separable training data

Page 33:

Acquire more data

Original dataset D, x = (temp, freq), label t:

Patient 1: (37.5, 72)  -> healthy
Patient 2: (39.1, 103) -> sick
Patient 3: (38.3, 100) -> sick
(…)
Patient N: (36.7, 88)  -> healthy

Augmented dataset D, x = (temp, freq, headache), label t:

Patient 1: (37.5, 72, 2)  -> healthy
Patient 2: (39.1, 103, 8) -> sick
Patient 3: (38.3, 100, 6) -> sick
(…)
Patient N: (36.7, 88, 0)  -> healthy

Page 34:

Non-linearly separable training data

In 2D: y_w(x) = w1·x1 + w2·x2 + w0 (line)
In 3D: y_w(x) = w1·x1 + w2·x2 + w3·x3 + w0 (plane)

[3D scatter plot of the augmented data]

Page 35:

Perceptron (2D and 2 classes)

y_w(x) = sign(wᵀx) ∈ {−1, +1},  with wᵀx = w1·x1 + w2·x2 + w0 (line)

Page 36:

Perceptron (3D and 2 classes)

y_w(x) = sign(wᵀx) ∈ {−1, +1},  with wᵀx = w1·x1 + w2·x2 + w3·x3 + w0 (plane)

Example in 3D: y_w(x) = +1 on one side of the plane, y_w(x) = −1 on the other.

Page 37:

Perceptron (K-D and 2 classes)

y_w(x) = sign(wᵀx) ∈ {−1, +1},  with wᵀx = w1·x1 + w2·x2 + w3·x3 + … + wK·xK + w0 (hyperplane)

[Diagram: inputs 1, x1, x2, …, x(K−1), xK with weights w0, w1, …, wK]

Page 38:

Learning a machine

The goal: with a set of training data D = {(x1, t1), (x2, t2), …, (xN, tN)}, estimate w so that:

y_w(xn) ≈ tn,  for n = 1, …, N

In other words, minimize the training loss

L(y_w, D) = Σ_{n=1..N} l(y_w(xn), tn)

Optimization problem.

Page 39:

Loss function

[Three plots of candidate separating lines: L(y_w(x), D) ≈ 0, > 0, >> 0]

Page 40:

[Surface plot of the loss L(y_w(x), D) as a function of w1 and w2]

Page 41:

Perceptron

Question: how to find the best solution, i.e. L(y_w(x), D) ≈ 0?

Random initialization:

[w1, w2] = np.random.randn(2)

[Loss surface over (w1, w2) with the random starting point]

Page 42:

Gradient descent

Question: how to find the best solution, i.e. L(y_w(x), D) ≈ 0?

w[k+1] = w[k] − η ∇L(y_w[k](x), D)

where ∇L is the gradient of the loss function and η is the learning rate.
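The update rule above can be sketched in a few lines of NumPy. The quadratic loss and its target w_star below are illustrative choices, not from the slides; the point is only the repeated step w ← w − η·∇L(w).

```python
import numpy as np

# Gradient descent on a toy quadratic loss L(w) = ||w - w_star||^2,
# illustrating the update w[k+1] = w[k] - eta * grad L(w[k]).
w_star = np.array([1.0, -2.0])   # illustrative minimizer

def grad_L(w):
    return 2.0 * (w - w_star)    # gradient of the squared distance

w = np.zeros(2)                  # arbitrary initialization
eta = 0.1                        # learning rate
for k in range(100):
    w = w - eta * grad_L(w)      # descend along the negative gradient

print(w)
```

With this convex loss the iterate contracts toward w_star at every step; for the perceptron the same loop is applied to the perceptron criterion instead.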

Page 43:

Perceptron Criterion (loss)

Observation: a wrongly classified sample is one for which

wᵀxn > 0 and tn = −1,   or   wᵀxn < 0 and tn = +1.

Consequently, −wᵀxn·tn is ALWAYS positive for wrongly classified samples.

Page 44:

Perceptron Criterion (loss)

L(y_w(x), D) = −Σ_{xn ∈ V} wᵀxn·tn,   where V is the set of wrongly classified samples.

[Plot: a poor separating line and its loss value, e.g. L(y_w(x), D) = 464.51]

Page 45:

So far…

1. Training dataset: D
2. Linear classification function: y_w(x) = w1·x1 + w2·x2 + … + wM·xM + w0
3. Loss function: L(y_w(x), D) = −Σ_{xn ∈ V} wᵀxn·tn

Page 46:

So far…

1. Training dataset: D
2. Linear classification function: y_w(x) = sign(wᵀx) ∈ {−1, +1}
3. Loss function: L(y_w(x), D) = −Σ_{xn ∈ V} wᵀxn·tn, where V is the set of wrongly classified samples
4. Training: find w that minimizes L(y_w(x), D):

w = argmin_w L(y_w(x), D)

[Perceptron diagram: inputs 1, x1, x2, x3, …, xK with weights w0, w1, w2, w3, …, wK and sign activation]

Page 47:

Optimisation

w[k+1] = w[k] − η[k] ∇L

where ∇L is the gradient of the loss function and η is the learning rate.

Stochastic gradient descent (SGD):

Init w
k = 0
DO
  k = k + 1
  FOR n = 1 to N
    w ← w − η[k] ∇L(xn)
UNTIL every data point is well classified or k == MAX_ITER

Page 48:

Perceptron gradient descent

L(y_w(x), D) = −Σ_{xn ∈ V} wᵀxn·tn   ⇒   ∇L(y_w(x), D) = −Σ_{xn ∈ V} tn·xn

Stochastic gradient descent (SGD):

Init w
k = 0
DO
  k = k + 1
  FOR n = 1 to N
    IF wᵀxn·tn < 0 THEN   /* wrongly classified */
      w ← w + η·tn·xn
UNTIL every data point is well classified OR k == k_MAX

NOTE on the learning rate η:
• Too low => slow convergence
• Too large => might not converge (may even diverge)
• Can decrease at each iteration (e.g. η[k] = cst / k)
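The SGD loop above translates almost line for line into NumPy. The four training points below are a made-up linearly separable toy set (not from the slides); inputs are augmented with a leading 1 so that w[0] plays the role of the bias w0.

```python
import numpy as np

# Perceptron SGD sketch on assumed toy data; labels t in {-1, +1}.
rng = np.random.default_rng(0)
X = np.array([[1,  2.0,  3.0],
              [1,  3.0,  4.0],
              [1, -1.0, -2.0],
              [1, -2.0, -1.0]])     # augmented inputs (leading 1 = bias)
t = np.array([1, 1, -1, -1])

w = rng.standard_normal(3)          # random initialization
eta = 0.1                           # learning rate
for k in range(100):
    errors = 0
    for n in range(len(t)):
        if t[n] * (w @ X[n]) <= 0:       # wrongly classified sample
            w = w + eta * t[n] * X[n]    # push the boundary toward x_n
            errors += 1
    if errors == 0:                      # every sample well classified
        break

print(np.sign(X @ w))
```

Because this toy set is linearly separable, the loop is guaranteed to stop with zero errors after finitely many updates (perceptron convergence theorem, mentioned later on page 55).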

Page 49:

Similar loss functions

Perceptron loss: L(y_w(x), D) = −Σ_{xn ∈ V} wᵀxn·tn, where V is the set of wrongly classified samples,

or equivalently: L(y_w(x), D) = Σ_{n=1..N} max(0, −tn·wᵀxn)

“Hinge Loss” or “SVM” Loss: L(y_w(x), D) = Σ_{n=1..N} max(0, 1 − tn·wᵀxn)
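The difference between the two losses is easy to see numerically. The scores s_n = wᵀxn and labels below are made-up values for illustration: the perceptron loss only penalizes misclassified samples, while the hinge loss also penalizes correctly classified samples whose margin is below 1.

```python
import numpy as np

# Perceptron loss vs. hinge ("SVM") loss on a few illustrative scores.
s = np.array([2.0, -0.5, 0.3])   # raw scores w^T x_n (assumed)
t = np.array([1, 1, -1])         # labels in {-1, +1}

perceptron_loss = np.sum(np.maximum(0.0, -t * s))    # misclassified only
hinge_loss = np.sum(np.maximum(0.0, 1.0 - t * s))    # also small margins

print(perceptron_loss, hinge_loss)
```

Here the first sample (score 2.0, label +1) contributes 0 to both losses; the third sample (score 0.3, label −1) is misclassified and contributes to both; hinge additionally charges the margin shortfall.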

Page 50:

Multiclass Perceptron (2D and 3 classes)

One linear output per class:

y_{w,0}(x) = w_0ᵀx = w_{0,0} + w_{0,1}·x1 + w_{0,2}·x2
y_{w,1}(x) = w_1ᵀx = w_{1,0} + w_{1,1}·x1 + w_{1,2}·x2
y_{w,2}(x) = w_2ᵀx = w_{2,0} + w_{2,1}·x1 + w_{2,2}·x2

Predicted class: argmax_i y_{W,i}(x)

[Plot: the (x1, x2) plane split into three regions, each where the corresponding y_{w,i}(x) is max]

Page 51:

Multiclass Perceptron (2D and 3 classes)

In matrix form:

y_W(x) = Wᵀx,  with W the matrix of rows (w_{i,0}, w_{i,1}, w_{i,2}) and x = (1, x1, x2)ᵀ

Predicted class: argmax_i y_{W,i}(x)

Page 52:

Multiclass Perceptron (2D and 3 classes)

Example: for the input x = (1.1, −2.0), y_W(x) = Wᵀ(1, 1.1, −2.0)ᵀ gives one score per class (the slide shows the scores 2.8, 6.9 and 9.6); the class with the largest score wins, so x is assigned to class 2.

Page 53:

Multiclass Perceptron — Loss function

L(y_W(x), D) = Σ_{xn ∈ V} ( w_jᵀxn − w_{tn}ᵀxn )

Sum over all wrongly classified samples: w_jᵀxn is the score of the wrong (winning) class and w_{tn}ᵀxn is the score of the true class.

The gradient with respect to the wrong class is Σ_{xn ∈ V} xn, and with respect to the true class −Σ_{xn ∈ V} xn.

Page 54:

Multiclass Perceptron

Stochastic gradient descent (SGD):

Init W
k = 0
DO
  k = k + 1
  FOR n = 1 to N
    j = argmax_j w_jᵀxn
    IF j ≠ tn THEN   /* wrongly classified sample */
      w_{tn} ← w_{tn} + η·xn
      w_j ← w_j − η·xn
UNTIL every data point is well classified or k > K_MAX
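The paired update (raise the true class, lower the wrongly winning class) can be sketched as below. The three training points and class assignments are an assumed toy set, one point per class, with inputs augmented by a leading 1 for the bias.

```python
import numpy as np

# Multiclass perceptron SGD sketch (assumed toy data).
# W has one weight row per class; predict with argmax of the class scores.
rng = np.random.default_rng(0)
X = np.array([[1,  0.0,  2.0],
              [1,  2.0,  0.0],
              [1, -2.0, -2.0]])     # augmented inputs
t = np.array([0, 1, 2])             # true class indices

W = rng.standard_normal((3, 3))     # 3 classes x 3 weights
eta = 0.1
for k in range(200):
    errors = 0
    for n in range(len(t)):
        j = int(np.argmax(W @ X[n]))   # predicted class
        if j != t[n]:                  # wrongly classified sample
            W[t[n]] += eta * X[n]      # raise the true class score
            W[j] -= eta * X[n]         # lower the wrongly winning score
            errors += 1
    if errors == 0:
        break

print(np.argmax(X @ W.T, axis=1))
```

Each update moves two rows of W at once, which is exactly the gradient step of the loss on page 53 taken one sample at a time.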

Page 55:

Perceptron

Advantages:
• Very simple
• Does NOT assume the data follows a Gaussian distribution
• If the data is linearly separable, convergence is guaranteed

Limitations:
• Zero gradient for many solutions => several “perfect solutions”
• Data must be linearly separable

[Plots: non-separable data (will never converge), a suboptimal solution, and many “optimal” solutions]

Page 56:

Two famous ways of improving the Perceptron

1. New activation function + new loss => Logistic regression
2. New network architecture => Multilayer Perceptron / CNN

Page 57:

Logistic regression

Page 58:

Logistic regression (2D, 2 classes)

New activation function: sigmoid

σ(t) = 1 / (1 + e^(−t))

y_w(x) = σ(wᵀx) ∈ [0, 1]

[Diagram: inputs 1, x1, x2 with weights w0, w1, w2, sigmoid applied to wᵀx]

Page 59:

Logistic regression (2D, 2 classes)

New activation function: sigmoid

y_w(x) = σ(wᵀx) = 1 / (1 + e^(−wᵀx)) ∈ [0, 1]

Neuron = dot product + sigmoid activation.

Page 60:

Logistic regression (2D, 2 classes)

y_w(x) = σ(wᵀx) ∈ [0, 1]

[Plot: y_w(x) = 0.5 on the boundary, y_w(x) → 1 on one side, y_w(x) → 0 on the other]

Page 61:

Logistic regression (2D, 2 classes)

Example: with xn = (0.4, 1.0) and the weights shown on the slide, wᵀxn = −1.94, so

y_w(xn) = σ(−1.94) = 1 / (1 + e^{1.94}) ≈ 0.125

Since 0.125 is lower than 0.5, xn lies on the negative side of the decision boundary.
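The sigmoid value in this example is straightforward to reproduce; only the pre-activation wᵀxn = −1.94 from the slide is needed.

```python
import numpy as np

# Sigmoid activation from the slide: sigma(a) = 1 / (1 + e^(-a)).
# Reproduces the worked example where w^T x_n = -1.94 gives y ~ 0.125.
def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

y = sigmoid(-1.94)
print(y)
```

Any pre-activation below 0 maps below 0.5, so the 0.5 level set of the sigmoid is exactly the hyperplane wᵀx = 0.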

Page 62:

Logistic regression (K-D, 2 classes)

Like the Perceptron, logistic regression accommodates K-D vectors:

y_w(x) = σ(wᵀx),  with x = (1, x1, x2, …, xK)

Page 63:

y_w(x) = σ(wᵀx)

What is the loss function?

Page 64:

With a sigmoid, we can simulate a conditional probability of c1 GIVEN x:

y_w(x) = σ(wᵀx) = P(c1 | x)

Page 65:

With a sigmoid, we can simulate a conditional probability of c1 GIVEN x:

y_w(x) = σ(wᵀx) = P(c1 | x)
P(c0 | w, x) = 1 − y_w(x)

Page 66:

Cost function is −ln of the likelihood (2-class cross entropy):

L(y_w(x), D) = −Σ_{n=1..N} [ tn ln y_w(xn) + (1 − tn) ln(1 − y_w(xn)) ]

We can also show that

dL(y_w(x), D)/dw = Σ_{n=1..N} ( y_w(xn) − tn ) xn

As opposed to the Perceptron, the gradient does not depend only on the wrongly classified samples.

Page 67:

Logistic Network

Advantages:
• More stable than the Perceptron
• More effective when the data is non-separable

[Plots comparing the Perceptron and logistic net decision boundaries]

Page 68:

And for K > 2 classes? New activation function: Softmax

y_i(x) = e^{w_iᵀx} / Σ_c e^{w_cᵀx}

y_0(x) = P(c0 | W, x)
y_1(x) = P(c1 | W, x)
y_2(x) = P(c2 | W, x)

[Diagram: inputs 1, x1, x2; three linear units w_0ᵀx, w_1ᵀx, w_2ᵀx; exp of each score, then normalization]
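The softmax formula on this page is a one-liner in NumPy; the three class scores below are made-up values. Subtracting the maximum score before exponentiating is a standard numerical-stability trick that leaves the result unchanged.

```python
import numpy as np

# Softmax over the K class scores a_i = w_i^T x:
# y_i(x) = exp(a_i) / sum_c exp(a_c).  Outputs sum to 1, like probabilities.
def softmax(a):
    e = np.exp(a - np.max(a))   # subtract max for numerical stability
    return e / e.sum()

scores = np.array([2.0, 1.0, -1.0])   # assumed class scores w_i^T x
p = softmax(scores)
print(p)
```

The largest score keeps the largest probability, so argmax of the softmax output agrees with argmax of the raw scores used by the multiclass perceptron.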

Page 69:

Softmax

(Same diagram as the previous slide: the exponentials of the three class scores w_iᵀx are normalized so that the outputs y_i(x) = P(ci | W, x) sum to 1.)

Page 70:

And for K > 2 classes? Class labels: one-hot vectors (Cifar10)

'airplane'   t = (1,0,0,0,0,0,0,0,0,0)
'automobile' t = (0,1,0,0,0,0,0,0,0,0)
'bird'       t = (0,0,1,0,0,0,0,0,0,0)
'cat'        t = (0,0,0,1,0,0,0,0,0,0)
'deer'       t = (0,0,0,0,1,0,0,0,0,0)
'dog'        t = (0,0,0,0,0,1,0,0,0,0)
'frog'       t = (0,0,0,0,0,0,1,0,0,0)
'horse'      t = (0,0,0,0,0,0,0,1,0,0)
'ship'       t = (0,0,0,0,0,0,0,0,1,0)
'truck'      t = (0,0,0,0,0,0,0,0,0,1)
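Building these one-hot targets is mechanical; a minimal sketch, using the class ordering listed on the slide:

```python
import numpy as np

# One-hot encoding of Cifar10-style class labels: class k becomes a
# length-10 vector with a single 1 at position k.
labels = ['airplane', 'automobile', 'bird', 'cat', 'deer',
          'dog', 'frog', 'horse', 'ship', 'truck']

def one_hot(name):
    t = np.zeros(len(labels))
    t[labels.index(name)] = 1.0
    return t

print(one_hot('bird'))
```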

Page 71:

Cross entropy loss (K > 2 classes)

L(y_W(x), D) = −Σ_{n=1..N} Σ_{k=1..K} t_{nk} ln y_{W,k}(xn)

Page 72:

K-class cross entropy loss

L(y_W(x), D) = −Σ_{n=1..N} Σ_{k=1..K} t_{nk} ln y_{W,k}(xn)

with gradient, per class k:  ∂L/∂w_k = Σ_{n=1..N} ( y_{W,k}(xn) − t_{nk} ) xn
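With one-hot targets, the double sum above picks out minus the log-probability of the true class of each sample. A minimal sketch with assumed softmax outputs (any valid probability rows would do):

```python
import numpy as np

# K-class cross-entropy with one-hot targets:
# L = -sum_n sum_k t_nk * ln y_k(x_n).
y = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.8, 0.1]])   # assumed predicted class probabilities
t = np.array([[1, 0, 0],
              [0, 1, 0]])         # one-hot targets

loss = -np.sum(t * np.log(y))     # = -(ln 0.7 + ln 0.8)
print(loss)
```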

Page 73:

Regularization

Different weights may give the same score:

w1 = (1, 0, 0)ᵀ,  w2 = (1/3, 1/3, 1/3)ᵀ,  x = (1.0, 1.0, 1.0)

w1ᵀx = w2ᵀx = 1

Solution: Maximum a posteriori

Page 74:

Maximum a posteriori — Regularization

argmin_W  L(y_w(x), D) + λ R(W)

loss function + constant λ times a regularization term R(W); in general R(W) is the L1 or L2 norm, ‖W‖1 or ‖W‖2.
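The two weight vectors from page 73 make the role of R(W) concrete: both give the same data score, but the L2 penalty breaks the tie in favor of the spread-out weights.

```python
import numpy as np

# Two weight vectors with the same score on x, different L2 norms.
w1 = np.array([1.0, 0.0, 0.0])
w2 = np.array([1/3, 1/3, 1/3])
x = np.array([1.0, 1.0, 1.0])

same_score = np.isclose(w1 @ x, w2 @ x)   # both scores equal 1
r1 = np.sum(w1 ** 2)                      # L2 penalty of w1
r2 = np.sum(w2 ** 2)                      # L2 penalty of w2 (smaller)
print(same_score, r1, r2)
```

So minimizing L + λ·R(W) selects w2 among all the loss-equivalent solutions, which is the maximum a posteriori idea on this slide.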

Page 75:

Wow! Loooots of information!

Let's recap…

Page 76:

Neural networks, 2 classes

Perceptron: sign activation, y_w(x) = sign(wᵀx) ∈ {−1, +1}
Logistic regression: sigmoid activation, y_w(x) = σ(wᵀx) = P(c1 | x)

[Diagrams: inputs 1, x1, x2, …, xK with weights w0, w1, w2, …, wK feeding one output neuron]

Page 77:

K-Class Neural networks

Perceptron: predict argmax_i y_{W,i}(x) over the linear scores w_iᵀx
Logistic regression: softmax activation, y_i(x) = P(ci | x)

[Diagrams: inputs 1, x1, x2 feeding three linear units w_0ᵀx, w_1ᵀx, w_2ᵀx; argmax on one side, exp + normalization (softmax) on the other]

Page 78:

Loss functions, 2 classes

Perceptron loss: L(y_w(x), D) = −Σ_{xn ∈ V} tn·wᵀxn, where V is the set of wrongly classified samples,
 = Σ_{n=1..N} max(0, −tn·wᵀxn)

“Hinge Loss” or “SVM” Loss: L(y_w(x), D) = Σ_{n=1..N} max(0, 1 − tn·wᵀxn)

Cross entropy loss: L(y_w(x), D) = −Σ_{n=1..N} [ tn ln y_w(xn) + (1 − tn) ln(1 − y_w(xn)) ]

Page 79:

Loss functions, K classes

Multiclass perceptron loss: L(y_w(x), D) = Σ_{xn ∈ V} ( w_jᵀxn − w_{tn}ᵀxn ), where V is the set of wrongly classified samples,
 = Σ_{n=1..N} max(0, w_jᵀxn − w_{tn}ᵀxn)

“Hinge Loss” or “SVM” Loss: L(y_w(x), D) = Σ_{n=1..N} max(0, 1 + w_jᵀxn − w_{tn}ᵀxn)

Cross entropy loss with a Softmax: L(y_w(x), D) = −Σ_{n=1..N} Σ_{k=1..K} t_{nk} ln y_{W,k}(xn)

Page 80:

Maximum a posteriori

argmin_W  L(y_W(x), D) + λ R(W),   with   L(y_W(x), D) = Σ_{n=1..N} l(y_W(xn), tn)

loss function + constant λ times regularization R(W) = ‖W‖1 or ‖W‖2²

Page 81:

Optimisation

w[k+1] = w[k] − η[k] ∇L

where ∇L is the gradient of the loss function and η is the learning rate.

Stochastic gradient descent (SGD):

Init w
k = 0
DO
  k = k + 1
  FOR n = 1 to N
    w ← w − η[k] ∇L(xn)
UNTIL every data point is well classified or k == MAX_ITER

Page 82:

Now, let's go DEEPER

Page 83:

Three classical solutions

1. Acquire more data

2. Use a non-linear classifier

3. Transform the data

Non-linearly separable training data

Page 84:

Three classical solutions

1. Acquire more data

2. Use a non-linear classifier

3. Transform the data

Non-linearly separable training data

Page 85:

2D, 2 Classes, linear logistic regression

y_w(x) = σ(wᵀx) ∈ R

Input layer (3 “neurons”: 1, x1, x2) → Output layer (1 neuron with sigmoid)

Page 86:

Let's add 3 neurons

Input layer (3 “neurons”: 1, x1, x2) → First layer (3 neurons):

σ(w_0ᵀx) ∈ R,  σ(w_1ᵀx) ∈ R,  σ(w_2ᵀx) ∈ R

Page 87:

NOTE: The output of the first layer is a vector of 3 real values:

σ( W (1, x1, x2)ᵀ ) ∈ R³,  where W is the 3×3 matrix with rows (w_{i,0}, w_{i,1}, w_{i,2})

Page 88:

NOTE: The output of the first layer is a vector of 3 real values:

σ( W[0] x ) ∈ R³

Page 89:

2-D, 2-Class, 1 hidden layer

If we want a 2-class classification via a logistic regression (a cross-entropy loss), we must add an output neuron:

y_W(x) = σ( w[1]ᵀ σ( W[0] x ) ) ∈ R

Input layer (3 “neurons”) → hidden layer (3 neurons, weights w_i[0]) → output layer (1 neuron, weights w[1])
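The nested expression y_W(x) = σ(w[1]ᵀ σ(W[0] x)) is just two matrix products with elementwise sigmoids in between. A minimal forward-pass sketch, with random placeholder weights (not values from the slides):

```python
import numpy as np

# Forward pass of the 2-D, 2-class, one-hidden-layer network:
# y_W(x) = sigma( w1^T sigma( W0 @ x ) ), sigmoids applied elementwise.
rng = np.random.default_rng(0)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

W0 = rng.standard_normal((3, 3))   # hidden layer: 3 neurons, augmented 2-D input
w1 = rng.standard_normal(4)        # output neuron: 3 hidden values + bias

def forward(x1, x2):
    x = np.array([1.0, x1, x2])          # augmented input (leading 1 = bias)
    h = sigmoid(W0 @ x)                  # hidden activations, each in (0, 1)
    h = np.concatenate(([1.0], h))       # augment again for the output bias
    return sigmoid(w1 @ h)               # scalar output in (0, 1)

y = forward(0.5, -1.0)
print(y)
```

Counting entries of W0 (3×3) plus w1 (4) gives the 13 parameters tallied on the next slide.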

Page 90:

2-D, 2-Class, 1 hidden layer: visual simplification

y_W(x) = σ( w[1]ᵀ σ( W[0] x ) )

[The per-weight diagram is collapsed into boxes: input x, matrix W[0], vector w[1]]

Page 91:

2-D, 2-Class, 1 hidden layer

This network contains a total of 13 parameters:
Input layer → Hidden layer: 3x3
Hidden layer → Output layer: 1x4 (including the bias neuron)

y_W(x) = σ( w[1]ᵀ σ( W[0] x ) )

Page 92:

2-D, 2-Class, 1 hidden layer

Increasing the number of neurons = increasing the capacity of the model.
This network has 5x3 + 1x6 = 21 parameters.

y_W(x) = σ( w[1]ᵀ σ( W[0] x ) )

Page 93:

Nb neurons VS Capacity

http://cs.stanford.edu/people/karpathy/convnetjs/demo/classify2d.html

No hidden neuron: linear classification, underfitting (low capacity)
12 hidden neurons: non-linear classification, good result (good capacity)
60 hidden neurons: non-linear classification, overfitting (too large capacity)

Page 94:

kD, 2 Classes, 1 hidden layer

Increasing the dimensionality of the data = more columns in W[0].
This network has 5x(k+1) + 1x6 parameters.

y_W(x) = σ( w[1]ᵀ σ( W[0] x ) )

Page 95:

kD, 2 Classes, 2 hidden layers

Adding a hidden layer = adding a matrix multiplication.
This network has 5x(k+1) + 6x3 + 1x4 parameters:

y_W(x) = σ( w[2]ᵀ σ( W[1] σ( W[0] x ) ) )

with W[0] ∈ R^{5×(k+1)}, W[1] ∈ R^{3×6}, w[2] ∈ R⁴.

Page 96

k-D, 2 classes, 4 hidden layer network

y_W(x) = σ(w^{[4]T} σ(W^{[3]} σ(W^{[2]} σ(W^{[1]} σ(W^{[0]} x)))))

with w^{[4]} ∈ R^8, W^{[3]} ∈ R^{7×5}, W^{[2]} ∈ R^{4×4}, W^{[1]} ∈ R^{3×6}, W^{[0]} ∈ R^{5×(k+1)}.
This network has 5×(k+1) + 6×3 + 4×4 + 7×5 + 1×8 parameters.

Page 97

k-D, 2 classes, 4 hidden layer network

y_W(x) = σ(w^{[4]T} σ(W^{[3]} σ(W^{[2]} σ(W^{[1]} σ(W^{[0]} x)))))

NOTE: More hidden layers = deeper network = more capacity.

Page 98

Multilayer perceptron

A multilayer perceptron maps an input x = (x_1, ..., x_K), through a stack of layers each with a bias neuron, to an output y_W(x) = f(x).

Page 99

Example

Input data: x. The output of the last layer is the output of the network: y_W(x).

Page 100

A K-class neural network has K output neurons.

Page 101

k-D, 4 classes, 4 hidden layer network

y_W(x) = W^{[4]} σ(W^{[3]} σ(W^{[2]} σ(W^{[1]} σ(W^{[0]} x))))

The output layer is now a matrix W^{[4]} ∈ R^{4×8}; its four rows W^{[4]}_0, ..., W^{[4]}_3 produce the four outputs y_{W,0}(x), ..., y_{W,3}(x).
The other layers are unchanged: W^{[3]} ∈ R^{7×5}, W^{[2]} ∈ R^{4×4}, W^{[1]} ∈ R^{3×6}, W^{[0]} ∈ R^{5×(k+1)}.

Page 102

k-D, 4 classes, 4 hidden layer network

y_W(x) = W^{[4]} σ(W^{[3]} σ(W^{[2]} σ(W^{[1]} σ(W^{[0]} x))))

The four outputs y_{W,0}(x), ..., y_{W,3}(x) can be trained with a hinge loss.

Page 103

Softmax

k-D, 4 classes, 4 hidden layer network

y_W(x) = softmax(W^{[4]} σ(W^{[3]} σ(W^{[2]} σ(W^{[1]} σ(W^{[0]} x)))))

Each output neuron is exponentiated and the results are normalized, so the four outputs y_{W,0}(x), ..., y_{W,3}(x) sum to one. Trained with the cross-entropy loss.
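The exponentiate-then-normalize step can be sketched in a few lines of NumPy (a minimal illustration, not the slides' code; the logit values are made up):

```python
import numpy as np

def softmax(z):
    """Exponentiate each output neuron, then normalize so the outputs sum to 1."""
    e = np.exp(z - z.max())   # subtract the max for numerical stability
    return e / e.sum()

# Hypothetical raw outputs (logits) of the 4 output neurons
z = np.array([2.0, 1.0, 0.1, -1.0])
p = softmax(z)                # four class probabilities that sum to 1
```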

Page 104

How to make a prediction?

Page 105

Forward pass

Start from the input x = (x_1, x_2) (plus the bias neurons).

Page 106

Forward pass: compute W^{[0]} x.

Page 107

Forward pass: apply the activation, σ(W^{[0]} x).

Page 108

Forward pass: compute W^{[1]} σ(W^{[0]} x).

Page 109

Forward pass: apply the activation, σ(W^{[1]} σ(W^{[0]} x)).

Page 110

Forward pass: compute w^T σ(W^{[1]} σ(W^{[0]} x)).

Page 111

Forward pass: the network output is y_W(x) = σ(w^T σ(W^{[1]} σ(W^{[0]} x))).

Page 112

Forward pass

Finally, compare the prediction to the target t with the loss:

l(y_W(x), t) = -t ln y_W(x) - (1 - t) ln(1 - y_W(x))
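The whole forward pass, down to the cross-entropy loss, can be sketched in NumPy (a minimal sketch with made-up random weights; `W0`, `W1`, `w` and the `with_bias` helper are illustrative names, not from the slides):

```python
import numpy as np

def sigma(a):
    return 1.0 / (1.0 + np.exp(-a))

def with_bias(v):
    return np.append(v, 1.0)          # append the constant bias input of 1

# Made-up parameters: 2-D input, two small hidden layers, one output neuron
rng = np.random.default_rng(0)
W0 = rng.normal(size=(3, 3))          # input (2 + bias) -> 3 hidden units
W1 = rng.normal(size=(2, 4))          # hidden (3 + bias) -> 2 hidden units
w  = rng.normal(size=3)               # hidden (2 + bias) -> 1 output

x, t = np.array([0.5, -1.2]), 1.0     # input and target label

# y_W(x) = sigma(w^T sigma(W1 sigma(W0 x)))
h0 = sigma(W0 @ with_bias(x))
h1 = sigma(W1 @ with_bias(h0))
y  = sigma(w @ with_bias(h1))

# Cross-entropy: l = -t ln y - (1 - t) ln(1 - y)
loss = -t * np.log(y) - (1 - t) * np.log(1 - y)
```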

Page 113

How to optimize the network?

0- Choose a regularization function, e.g.

R(W) = ||W||_1   or   R(W) = ||W||_2^2

The goal is to find

W* = argmin_W (1/N) Σ_n l(y_W(x_n), t_n) + λ R(W)
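The two regularizers can be written directly in NumPy (a minimal sketch; the weight values are made up):

```python
import numpy as np

W = np.array([[0.5, -1.0],
              [2.0,  0.0]])      # made-up weight matrix

R_l1 = np.abs(W).sum()           # ||W||_1  : sum of absolute values
R_l2 = (W ** 2).sum()            # ||W||_2^2: sum of squared values
```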

Page 114

How to optimize the network?

1- Choose a loss l(y_W(x_n), t_n), for example the hinge loss or the cross entropy.

Do not forget to adjust the output layer to the loss you have chosen:
cross entropy => softmax.

Page 115

How to optimize the network?

2- Compute the gradient of the loss with respect to each parameter W^{[c]}_{a,b} and launch a gradient descent algorithm to update the parameters:

W^{[c]}_{a,b} ← W^{[c]}_{a,b} - η ∂/∂W^{[c]}_{a,b} [ (1/N) Σ_n l(y_W(x_n), t_n) + λ R(W) ]
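The update rule itself can be sketched on a toy problem where the gradient is known in closed form (a minimal illustration of gradient descent, not of a full network; the quadratic objective is made up):

```python
import numpy as np

# Toy objective: f(w) = ||w - w_star||^2, whose gradient is 2 (w - w_star)
w_star = np.array([1.0, -2.0])
w = np.zeros(2)                  # initial parameters
eta = 0.1                        # learning rate

for _ in range(100):
    grad = 2.0 * (w - w_star)    # gradient of the objective w.r.t. the parameters
    w = w - eta * grad           # gradient descent update: w <- w - eta * grad

# w converges toward the minimizer w_star
```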

Page 116

How to optimize the network?

Backpropagation, illustrated on the network

y_W(x) = σ(w^{[2]T} σ(W^{[1]} σ(W^{[0]} x)))

Page 117

Forward pass:  y_W(x) = σ(w^T σ(W^{[1]} σ(W^{[0]} x)))
Loss:  l(y_W(x), t) = -t ln y_W(x) - (1 - t) ln(1 - y_W(x))

Page 118

Decompose the forward pass of y_W(x) = σ(w^T σ(W^{[1]} σ(W^{[0]} x))) into intermediate variables:

A = W^{[0]} x
B = σ(A)
C = W^{[1]} B
D = σ(C)
E = w^T D
y_W(x) = σ(E)
l(y_W(x), t)

Pages 119-124 highlight these intermediate quantities A, B, C, D and E one at a time on the network diagram, from the input x up to the loss l(y_W(x), t).

Page 125

The gradient (1/N) Σ_n ∂l(y_W(x_n), t_n)/∂W is computed with the chain rule.

Page 126

Chain rule recap

For a composition f(u(v(x))):

∂f/∂x = ∂f/∂u · ∂u/∂v · ∂v/∂x

For example, with f(u) = 1/u, u(v) = v^2 and v(x) = 2x:

∂f/∂x = (-1/u^2) · (2v) · 2 = -1/(2x^3)
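The chain rule ∂f/∂x = ∂f/∂u · ∂u/∂v · ∂v/∂x can be checked numerically on a small composition (a minimal sketch; the functions f(u) = 1/u, u(v) = v², v(x) = 2x are illustrative):

```python
def v(x): return 2.0 * x
def u(v_): return v_ ** 2
def f(u_): return 1.0 / u_

x = 1.5
vx, ux = v(x), u(v(x))

# Analytic chain rule: product of the three local derivatives
analytic = (-1.0 / ux ** 2) * (2.0 * vx) * 2.0

# Finite-difference check of df/dx
h = 1e-6
numeric = (f(u(v(x + h))) - f(u(v(x - h)))) / (2 * h)
```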

Page 127

With the decomposition A = W^{[0]} x, B = σ(A), C = W^{[1]} B, D = σ(C), E = w^T D, y_W(x) = σ(E), the chain rule gives:

∂l/∂W^{[0]} = ∂l/∂y_W(x) · ∂y_W(x)/∂E · ∂E/∂D · ∂D/∂C · ∂C/∂B · ∂B/∂A · ∂A/∂W^{[0]}

Page 128

∂l/∂W^{[0]} = ∂l/∂y_W(x) · ∂y_W(x)/∂E · ∂E/∂D · ∂D/∂C · ∂C/∂B · ∂B/∂A · ∂A/∂W^{[0]}

The factors of this product are computed from the loss backwards through E, D, C, B, A — hence the name back propagation.
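The product of local derivatives can be sketched on a stripped-down, scalar version of the chain (a minimal sketch with made-up scalar weights W0 and w, assuming sigmoid activations; a real implementation works with the full matrices):

```python
import numpy as np

def sig(a):
    return 1.0 / (1.0 + np.exp(-a))

# Scalar version of the chain: A = W0*x, B = sig(A), E = w*B, y = sig(E)
x, t = 0.8, 1.0
W0, w = 0.5, -1.3

def loss(W0_):
    y_ = sig(w * sig(W0_ * x))
    return -t * np.log(y_) - (1 - t) * np.log(1 - y_)

# Backpropagation: multiply the local derivatives from the loss backwards
A = W0 * x
B = sig(A)
E = w * B
y = sig(E)
dl_dy = -t / y + (1 - t) / (1 - y)     # derivative of the cross-entropy
dy_dE = y * (1 - y)                    # derivative of the sigmoid
dE_dB = w
dB_dA = B * (1 - B)
dA_dW0 = x
grad = dl_dy * dy_dE * dE_dB * dB_dA * dA_dW0

# Finite-difference check of dl/dW0
h = 1e-6
numeric = (loss(W0 + h) - loss(W0 - h)) / (2 * h)
```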

Page 129

Activation functions

Page 130

Activation functions

Sigmoid:  σ(x) = 1 / (1 + e^{-x})

3 problems:
• Gradient saturates when the input is large
• Not zero-centered
• exp() is an expensive operation

Page 131

Activation functions

tanh(x)  [LeCun et al., 1991]

• Output is zero-centered
• Small gradient when the input is large

Page 132

Activation functions

ReLU (Rectified Linear Unit):  ReLU(x) = max(0, x)  [Krizhevsky et al., 2012]

• Large gradient for x > 0
• Super fast
• Output not centered at zero
• No gradient when x < 0

Page 133

Activation functions

Leaky ReLU:  LReLU(x) = max(0.01x, x)  [Mass et al., 2013] [He et al., 2015]

• No gradient saturation
• Super fast
• 0.01 is a hyperparameter

Page 134

Activation functions

Parametric ReLU:  PReLU(x) = max(αx, x)  [Mass et al., 2013] [He et al., 2015]

• No gradient saturation
• Super fast
• α is learned with backprop

Page 135

In practice

• By default, people use ReLU.
• Try Leaky ReLU / PReLU / ELU.
• Try tanh, but it might be sub-optimal.
• Do not use the sigmoid except at the output of a 2-class network.
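The activations compared above fit in a few lines of NumPy (a minimal sketch; `alpha` is the PReLU parameter, fixed here rather than learned):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    return np.maximum(0.0, x)

def leaky_relu(x):
    return np.maximum(0.01 * x, x)

def prelu(x, alpha=0.25):
    # In a real network, alpha is learned by backprop (He et al., 2015)
    return np.maximum(alpha * x, x)

x = np.array([-2.0, 0.0, 3.0])
```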

Page 136

How to classify an image?

https://ml4a.github.io/ml4a/neural_networks/

Page 137

How to classify an image?

[Diagram: each pixel (pixel 1, pixel 2, ..., pixel 14, ...) plus a bias feeds a fully connected layer.]

https://ml4a.github.io/ml4a/neural_networks/

Page 138

Many parameters (7,850 in layer 1).

https://ml4a.github.io/ml4a/neural_networks/

Page 139

For a 256×256 image (pixel 1, ..., pixel 65,536): too many parameters (655,370 in layer 1).

https://ml4a.github.io/ml4a/neural_networks/

Page 140

For a 256×256×256 volume (pixel 1, ..., pixel 16M): waaay too many parameters (160M in layer 1).

https://ml4a.github.io/ml4a/neural_networks/
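These counts follow from (number of inputs + 1 bias) × number of neurons; a quick check, assuming a 10-neuron first layer and a 28×28 image for the first figure (both are assumptions consistent with the quoted numbers):

```python
def fc_params(n_inputs, n_neurons=10):
    """Fully connected layer: every neuron sees every input plus one bias."""
    return (n_inputs + 1) * n_neurons

p1 = fc_params(28 * 28)          # small image: 7,850 parameters
p2 = fc_params(256 * 256)        # 256x256 image: 655,370 parameters
p3 = fc_params(256 * 256 * 256)  # 256^3 volume: well over 160M parameters
```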

Page 141

Full connections are too many

[Diagram: fully connected sigmoid neurons F_1(x), F_2(x), F_3(x), F_4(x), ...]

A 150-D input vector with 150 neurons in layer 1 => 22,500 parameters!!

Page 142

No full connection

[Diagram: each sigmoid neuron is connected to only 3 neighbouring inputs.]

A 150-D input vector with 150 neurons in layer 1 => 450 parameters!!

Page 143

Share weights

[Diagram: every neuron of the first layer applies the same 3 weights (w_{10}, w_{11}, w_{12}); every neuron of the second layer applies the same weights (w_{20}, w_{21}, w_{22}).]

A 150-D input vector with 150 neurons in layer 1 => 3 parameters!!

1- Learning convolution filters!
2- Small number of parameters = can make it deep!

Page 144

[Diagram: a 3-tap filter with shared weights (w_{10}, w_{11}, w_{12}) connected to the input vector.]

Pages 145-150

The filter slides over the input vector: at each position, the neuron computes the dot product between its 3 shared weights and the 3 input values under it, producing one output value per position.

Convolution!
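Sliding one set of shared weights across the input is exactly a 1-D convolution; a minimal NumPy sketch (the filter and input values are illustrative):

```python
import numpy as np

def conv1d(x, w):
    """Slide the 3-tap filter w over x: one dot product per position."""
    n = len(x) - len(w) + 1
    return np.array([np.dot(w, x[i:i + len(w)]) for i in range(n)])

x = np.array([2.3, 1.7, 4.0, 2.8, 5.7, 4.4])  # input vector
w = np.array([0.3, 0.2, 0.1])                 # 3 shared weights = 3 parameters
out = conv1d(x, w)                            # one output per position (a feature map)
```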

Page 151

Each neuron of layer 1 is connected to 3×3 pixels: layer 1 has 9 parameters!!

Pages 152-153

Feature map: the result of the convolution operation x * W^{[0]}.

Page 154

Five filters give five feature maps: x * W^{[0]}_0, x * W^{[0]}_1, x * W^{[0]}_2, x * W^{[0]}_3, x * W^{[0]}_4.

Page 155

5-feature map convolution layer

Pages 156-157

K-feature map convolution layer

Page 158

Conv layer 1 → POOLING LAYER

Page 159

Pooling layer

Goals:
• Reduce the spatial resolution of feature maps
• Lower memory and computation requirements
• Provide partial invariance to position, scale and rotation

(stride = 1 in the illustrations)

Images: https://pythonmachinelearning.pro/introduction-to-convolutional-neural-networks-for-vision-tasks/
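Max pooling fits in a few lines of NumPy (a minimal sketch using a 2×2 window with stride 2, which halves the resolution; the feature-map values are made up):

```python
import numpy as np

def max_pool2x2(fmap):
    """Keep the maximum of each non-overlapping 2x2 block (stride = 2)."""
    h, w = fmap.shape
    trimmed = fmap[:h - h % 2, :w - w % 2]           # drop odd borders
    return trimmed.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

fmap = np.array([[1.0, 2.0, 0.0, 1.0],
                 [4.0, 3.0, 1.0, 0.0],
                 [0.0, 2.0, 5.0, 6.0],
                 [1.0, 0.0, 7.0, 2.0]])
pooled = max_pool2x2(fmap)   # 4x4 feature map -> 2x2
```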

Page 160

Conv layer 1 → Pool layer 1 → Conv layer 2 → Pool layer 2 → (...)

Page 161

2-class CNN: Conv layer 1 → Pool layer 1 → Conv layer 2 → Pool layer 2 → (...) → Fully connected layers → y_W(x)

l(y_W(x), t) = -t ln y_W(x) - (1 - t) ln(1 - y_W(x))

Page 162

K-class CNN: Conv layer 1 → Pool layer 1 → Conv layer 2 → Pool layer 2 → (...) → Fully connected layers → SOFTMAX

Each output y_{w_i}(x) is exponentiated and normalized (softmax), and the network is trained with the cross-entropy loss.

Page 163

Nice example from the literature:

S. Banerjee, S. Mitra, A. Sharma, and B. U Shankar, A CADe System for Gliomas in Brain MRI using Convolutional Neural Networks, arXiv:1806.07589v, 2018

Page 164

ConvNets learn image-based characteristics:

http://web.eecs.umich.edu/~honglak/icml09-ConvolutionalDeepBeliefNetworks.pdf

Page 165

Batch processing

Page 166

A single neuron with weights w = (2.0, 3.6, 0.5) applied to an input x = (x_1, x_2) = (0.4, 1.0) augmented with a bias input of 1: the prediction is σ(w^T x).

Page 167

With x = (0.4, 1.0) and w = (2.0, 3.6, 0.5), compute w^T x and pass it through the sigmoid to obtain the prediction y_w(x) = σ(w^T x).

Page 168

The prediction y_w(x_a) is the sigmoid of the dot product between w = (2.0, 3.6, 0.5) and the bias-augmented input written as a column vector.

Page 169

A second input x_b = (2.1, 3.0) is processed the same way: y_w(x_b) = σ(w^T x_b).

Page 170

Mini-batch processing

Stacking x_a and x_b as the columns of a matrix X, one matrix product computes both predictions at once: y_w(X) = σ(w^T X).

Page 171

Mini-batch processing

With four inputs stacked as columns, the same product σ(w^T X) returns four predictions at once.
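Stacking inputs as columns turns many dot products into one matrix product; a minimal NumPy sketch (the weight and input values are illustrative):

```python
import numpy as np

def sigma(a):
    return 1.0 / (1.0 + np.exp(-a))

w = np.array([2.0, 3.6, 0.5])          # neuron weights (last entry multiplies the bias)
xa = np.array([0.4, 1.0])
xb = np.array([2.1, 3.0])

# Stack the bias-augmented inputs as columns of X: one product = all predictions
X = np.stack([np.append(xa, 1.0), np.append(xb, 1.0)], axis=1)   # shape (3, 2)
batch_preds = sigma(w @ X)             # shape (2,): one prediction per input

# Same result as processing each input separately
one_by_one = np.array([sigma(w @ np.append(xa, 1.0)),
                       sigma(w @ np.append(xb, 1.0))])
```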

Page 172

Mini-batch processing: Horse / Dog / Truck classifier.

Page 173

Mini-batch processing: a mini-batch of 4 images gives 4 predictions (Horse / Dog / Truck).

Page 174

Classical applications of ConvNets: classification.

Page 175

Classical applications of ConvNets: classification.

S. Banerjee, S. Mitra, A. Sharma, and B. U Shankar, A CADe System for Gliomas in Brain MRI using Convolutional Neural Networks, arXiv:1806.07589v, 2018

Page 176

Classical applications of ConvNets: image segmentation.

Page 177

Classical applications of ConvNets: image segmentation.

Tran, P. V., 2016. A fully convolutional neural network for cardiac segmentation in short-axis MRI. arXiv:1604.00494.

Page 178

Classical applications of ConvNets: image segmentation.

Fang Liu, Zhaoye Zhou, et al., Deep convolutional neural network and 3D deformable approach for tissue segmentation in musculoskeletal magnetic resonance imaging. Magnetic Resonance in Medicine, 2018. DOI:10.1002/mrm.26841

Page 179

Classical applications of ConvNets: image segmentation.

Havaei M., Davy A., Warde-Farley D., Biard A., Courville A., Bengio Y., Pal C., Jodoin P-M, Larochelle H. (2017). Brain Tumor Segmentation with Deep Neural Networks. Medical Image Analysis, Vol 35, 18-31.

Page 180

Classical applications of ConvNets: localization.

Page 181

Classical applications of ConvNets: localization.

S. Banerjee, S. Mitra, A. Sharma, and B. U Shankar, A CADe System for Gliomas in Brain MRI using Convolutional Neural Networks, arXiv:1806.07589v, 2018

Page 182

Conclusion

• Linear classification (1-neuron network)
• Logistic regression
• Multilayer perceptron
• ConvNets
• Many buzzwords:
  - Softmax
  - Loss
  - Batch
  - Gradient descent
  - Etc.

Page 183