7 steps for highly effective deep neural networks



Natalino Busa - Head of Data Science

2 Natalino Busa - @natbusa

O’Reilly Author and Speaker

Teradata EMEA Practice Lead on Open Source Technologies

Teradata Principal Data Scientist

ING Group Enterprise Architect: Cybersecurity, Marketing, Fintech

Cognitive Finance Group Advisory Board Member

Philips Senior Researcher, Data Architect

Linkedin and Twitter:

@natbusa

3 Natalino Busa - @natbusa

X: independent variable, Y: dependent variable

Linear Regression: How to best fit a line to some data
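As an illustration (not part of the original deck), a minimal NumPy sketch of fitting a line by least squares; the data is made up:

```python
import numpy as np

# Toy data: y is roughly 2*x + 1 plus some noise
x = np.linspace(0, 10, 50)
y = 2.0 * x + 1.0 + np.random.normal(scale=0.5, size=x.shape)

# Least-squares fit of a degree-1 polynomial: returns slope and intercept
slope, intercept = np.polyfit(x, y, 1)
print(f"best fit: y = {slope:.2f} * x + {intercept:.2f}")
```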

4 Natalino Busa - @natbusa

Input: 784 numbers, output: 10 classes

28x28 pixels

Classification: Handwritten digits
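For reference, this is how the MNIST digits can be loaded and reshaped with Keras; a minimal sketch using the standard keras.datasets API, not the exact code from the deck:

```python
from keras.datasets import mnist
from keras.utils import to_categorical

# 28x28 grey-level images of handwritten digits, 10 classes
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# Flatten each 28x28 image into 784 numbers and scale to [0, 1]
x_train = x_train.reshape(-1, 784).astype('float32') / 255.0
x_test = x_test.reshape(-1, 784).astype('float32') / 255.0

# One-hot encode the labels into the 10 output classes
y_train = to_categorical(y_train, 10)
y_test = to_categorical(y_test, 10)
```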

5 Natalino Busa - @natbusa

Sharing the (Not so) Secret Lore:

Keras & TensorFlow

Some smaller projects: TFLearn, TensorLayer

http://keras.io/

6 Natalino Busa - @natbusa

1: Single Layer Perceptron

“dendrites”

Axon’s response

Activation function

7 Natalino Busa - @natbusa

More activation functions: https://en.wikipedia.org/wiki/Activation_function

8 Natalino Busa - @natbusa

Single Layer Neural Network. Takes n input features and maps them to a soft “binary” space.

Diagram: inputs x1, x2, …, xn feeding the activation function f

1: Single Layer Perceptron (binary classifier)

9 Natalino Busa - @natbusa

1: Single Layer Perceptron (multi-class classifier)

Values between 0 and 1. Sum of all outcomes = 1.

It produces an estimate of a probability!

From soft binary space to predicting probabilities: take n inputs, divide by the sum of the predicted values.

Diagram: the inputs x1 … xn pass through the functions f and are normalised by the sum ∑ into a softmax; example output: ‘1’: 95%, ‘8’: 5%
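Strictly, the softmax exponentiates the scores before normalising by their sum; a small NumPy illustration with made-up scores:

```python
import numpy as np

def softmax(z):
    # Subtract the max for numerical stability, exponentiate, then normalise
    e = np.exp(z - np.max(z))
    return e / e.sum()

# Hypothetical raw scores for the 10 digit classes
scores = np.array([0.1, 4.0, 0.5, 0.2, 0.1, 0.0, 0.3, 0.2, 1.1, 0.4])
probs = softmax(scores)
print(probs.round(3), probs.sum())  # values between 0 and 1, summing to 1
```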

10 Natalino Busa - @natbusa

1: Single Layer Perceptron

Minimize the cost. The cost function depends on:

- the parameters of the model
- how the model “composes”

Goal: reduce the mean probability error. Modify the parameters to reduce the error! Vintage math from last century.

11 Natalino Busa - @natbusa

Supervised Learning

Stack layers of perceptrons

- Feed Forward Network

- Scoring goes from input to output

- Back propagate the error from output to input

Diagram: input parameters flow through the feed-forward functions into a SOFTMAX layer that outputs the classes (estimated probabilities); a cost function compares them with the actual output (supervised), and the errors are back-propagated.

12 Natalino Busa - @natbusa

Let’s go!

13 Natalino Busa - @natbusa

1. Single Layer Perceptron

14 Natalino Busa - @natbusa

1. Single Layer Perceptron

15 Natalino Busa - @natbusa

1. Single Layer Perceptron

16 Natalino Busa - @natbusa

1. Single Layer Perceptron
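The slides walk through the code step by step; the full notebooks are in the repository linked at the end. A minimal Keras sketch of the single-layer softmax classifier, assuming the flattened MNIST arrays prepared earlier:

```python
from keras.models import Sequential
from keras.layers import Dense

# Single layer: 784 inputs mapped directly to 10 softmax outputs
model = Sequential()
model.add(Dense(10, activation='softmax', input_shape=(784,)))

# Cross-entropy cost minimised by (stochastic) gradient descent
model.compile(optimizer='sgd',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# Supervised learning: score forward, back-propagate the error
model.fit(x_train, y_train, epochs=10, batch_size=128,
          validation_data=(x_test, y_test))
```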

17 Natalino Busa - @natbusa

Tensorboard

18 Natalino Busa - @natbusa

Tensorboard

19 Natalino Busa - @natbusa

Tensorboard
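Keras can log the training run for TensorBoard through a callback; a minimal sketch continuing the model above (the log directory name is an arbitrary choice):

```python
from keras.callbacks import TensorBoard

# Write metrics and the graph to ./logs, then inspect with: tensorboard --logdir=./logs
tb = TensorBoard(log_dir='./logs')
model.fit(x_train, y_train, epochs=10, batch_size=128,
          validation_data=(x_test, y_test), callbacks=[tb])
```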

20 Natalino Busa - @natbusa

2. Multi Layer Perceptron

21 Natalino Busa - @natbusa

2. Multi Layer Perceptron

22 Natalino Busa - @natbusa

2. Multi Layer Perceptron

23 Natalino Busa - @natbusa

2. Multi Layer Perceptron
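A minimal sketch of a multi-layer perceptron in Keras; the hidden-layer sizes are illustrative choices, not taken from the slides:

```python
from keras.models import Sequential
from keras.layers import Dense

# Stack dense layers between the 784 inputs and the 10 softmax outputs
model = Sequential()
model.add(Dense(512, activation='relu', input_shape=(784,)))
model.add(Dense(256, activation='relu'))
model.add(Dense(10, activation='softmax'))

model.compile(optimizer='sgd',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
```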

25 Natalino Busa - @natbusa

3. Convolution

Diagrams: By Aphex34 - Own work, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=45659236; By Aphex34 - Own work, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=45673581; CC0, https://en.wikipedia.org/w/index.php?curid=48817276

Convolution, max pooling, ReLU / ELU

26 Natalino Busa - @natbusa

3. Convolution

27 Natalino Busa - @natbusa

3. Convolution
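A minimal sketch of the convolution / max-pooling / ReLU pattern in Keras; filter counts and kernel sizes are illustrative, and the images keep their 28x28x1 shape instead of being flattened:

```python
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

model = Sequential()
# Convolution extracts local features, max pooling shrinks the feature maps
model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
# Flatten the feature maps and classify with a softmax layer
model.add(Flatten())
model.add(Dense(10, activation='softmax'))
```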

28 Natalino Busa - @natbusa

3. Batch Normalization

29 Natalino Busa - @natbusa

3. Batch Normalization (example for MLP)

30 Natalino Busa - @natbusa

3. Batch Normalization (example for MLP)

31 Natalino Busa - @natbusa

3. Batch Normalization (example for MLP)

Activation function: SIGMOID

32 Natalino Busa - @natbusa

3. Batch Normalization (example for MLP)

Activation function: RELU
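A minimal sketch of batch normalization in an MLP, with the normalization inserted before the activation (sigmoid or ReLU, as compared on the slides); layer sizes are illustrative:

```python
from keras.models import Sequential
from keras.layers import Dense, Activation, BatchNormalization

model = Sequential()
model.add(Dense(512, input_shape=(784,)))
model.add(BatchNormalization())   # normalise each mini-batch before the activation
model.add(Activation('relu'))     # swap in 'sigmoid' to reproduce the comparison
model.add(Dense(10, activation='softmax'))
```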

33 Natalino Busa - @natbusa

4. Regularization: Prevent overfitting in ANNs

- Batch Normalization
- RELU / ELU
- RESIDUAL / SKIP networks
- DROP LAYER
- REDUCE PRECISION (Huffman encoding)

In general, ANNs are parameter-rich; constraining the parameter space usually produces better results and speeds up learning.
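As one example of constraining the network, a minimal sketch of a drop layer (dropout) in Keras; the drop rate of 0.5 is an illustrative choice:

```python
from keras.models import Sequential
from keras.layers import Dense, Dropout

model = Sequential()
model.add(Dense(512, activation='relu', input_shape=(784,)))
# Randomly zero half of the activations during training to reduce overfitting
model.add(Dropout(0.5))
model.add(Dense(10, activation='softmax'))
```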

34 Natalino Busa - @natbusa

5. Inception architectures

Cannot be stacked!

35 Natalino Busa - @natbusa

5. Inception architectures

36 Natalino Busa - @natbusa

5. Inception architectures

compression

Avoid dimension explosion

37 Natalino Busa - @natbusa

5. Inception architectures

38 Natalino Busa - @natbusa

5. Inception architectures (top architecture)
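A simplified inception-style block with the Keras functional API; the parallel branches use 1x1 convolutions as the compression step that avoids the dimension explosion. Filter counts and the input shape are illustrative, not the exact architecture from the slides:

```python
from keras.layers import Input, Conv2D, MaxPooling2D, concatenate
from keras.models import Model

inputs = Input(shape=(28, 28, 64))

# 1x1 branch
branch1 = Conv2D(16, (1, 1), padding='same', activation='relu')(inputs)

# 1x1 compression followed by a 3x3 convolution
branch3 = Conv2D(16, (1, 1), padding='same', activation='relu')(inputs)
branch3 = Conv2D(32, (3, 3), padding='same', activation='relu')(branch3)

# 1x1 compression followed by a 5x5 convolution
branch5 = Conv2D(16, (1, 1), padding='same', activation='relu')(inputs)
branch5 = Conv2D(32, (5, 5), padding='same', activation='relu')(branch5)

# Max pooling followed by a 1x1 compression
pool = MaxPooling2D((3, 3), strides=(1, 1), padding='same')(inputs)
pool = Conv2D(16, (1, 1), padding='same', activation='relu')(pool)

# Concatenate all branches along the channel axis
outputs = concatenate([branch1, branch3, branch5, pool])
module = Model(inputs, outputs)
```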

39 Natalino Busa - @natbusa

6. Residual Networks

https://culurciello.github.io/tech/2016/06/04/nets.html

40 Natalino Busa - @natbusa

6. Residual Networks
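A simplified residual block in the Keras functional API: the input skips over two convolutions and is added back to their output. Channel counts and input shape are illustrative:

```python
from keras.layers import Input, Conv2D, add, Activation
from keras.models import Model

inputs = Input(shape=(28, 28, 64))

# Two convolutions transform the signal
x = Conv2D(64, (3, 3), padding='same', activation='relu')(inputs)
x = Conv2D(64, (3, 3), padding='same')(x)

# Skip connection: add the untouched input back to the transformed signal
x = add([x, inputs])
outputs = Activation('relu')(x)

block = Model(inputs, outputs)
```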

41 Natalino Busa - @natbusa

6. Residual + Inception Networks

42 Natalino Busa - @natbusa

7. LSTM on Images

43 Natalino Busa - @natbusa

7. LSTM on Images

44 Natalino Busa - @natbusa

7. LSTM on Images

http://colah.github.io/posts/2015-08-Understanding-LSTMs/

45 Natalino Busa - @natbusa

7. LSTM on Images

46 Natalino Busa - @natbusa

7. LSTM on Images
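One common way to apply an LSTM to the digit images is to read each 28x28 image as a sequence of 28 rows of 28 pixels; a minimal Keras sketch under that assumption (the 128 units are an illustrative choice):

```python
from keras.models import Sequential
from keras.layers import LSTM, Dense

model = Sequential()
# Each image is a sequence of 28 time steps, each a vector of 28 pixel values
model.add(LSTM(128, input_shape=(28, 28)))
model.add(Dense(10, activation='softmax'))

model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
```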

47 Natalino Busa - @natbusa

7. LSTM on ConvNets (bonus slide)

48 Natalino Busa - @natbusa

All the codez :)

https://github.com/natbusa/deepnumbers

49 Natalino Busa - @natbusa

Meta-references… just a few articles, but extremely dense in content. A must read!

https://keras.io/
http://karpathy.github.io/neuralnets/

https://culurciello.github.io/tech/2016/06/04/nets.html
http://colah.github.io/posts/2015-08-Understanding-LSTMs/

https://gab41.lab41.org/batch-normalization-what-the-hey-d480039a9e3b