Deep Learning Class #1 - Go Deep or Go Home


TRANSCRIPT

Page 1: Deep Learning Class #1 - Go Deep or Go Home

Louis Monier @louis_monier

https://www.linkedin.com/in/louismonier

Deep Learning Basics: Go Deep or Go Home

Gregory Renard @redo

https://www.linkedin.com/in/gregoryrenard

Class 1 - Q1 - 2016

Page 2: Deep Learning Class #1 - Go Deep or Go Home

Supervised Learning

Diagram: a training set of labeled images (cat, hat, dog, cat, mug, …); given a new image (= ?), the model outputs a prediction: dog (89%).

Training takes days; testing takes milliseconds.

Page 3: Deep Learning Class #1 - Go Deep or Go Home

Single Neuron to Neural Networks

Page 4: Deep Learning Class #1 - Go Deep or Go Home

Single Neuron

Diagram: the inputs (xpos, ypos) are multiplied by weights (w1, w2) and summed with w3; this linear combination is passed through a nonlinearity to produce a score and a choice. Example: linear combination 3.4906 → nonlinearity 0.9981 → choice "red". Example inputs: 3.45, 1.21, -0.314, 1.59, 2.65.
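As a rough sketch of this slide in code (assuming the nonlinearity is tanh, since tanh(3.4906) ≈ 0.9981 matches the slide's numbers; the inputs and weights below are made up):

```python
import numpy as np

def single_neuron(x, w, b, nonlinearity=np.tanh):
    """Linear combination of the inputs, followed by a nonlinearity."""
    score = np.dot(w, x) + b        # linear combination
    return nonlinearity(score)      # nonlinearity turns the score into a bounded value

# Made-up inputs and weights, chosen so the linear combination matches the slide.
x = np.array([1.0, 1.0])            # (xpos, ypos)
w = np.array([2.2406, 0.75])        # (w1, w2)
b = 0.5                             # w3, acting as a bias
print(np.dot(w, x) + b)             # 3.4906, the slide's linear combination
print(single_neuron(x, w, b))       # ~0.9981, the slide's score after the nonlinearity
```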

Page 5: Deep Learning Class #1 - Go Deep or Go Home

More Neurons, More Layers: Deep Learning

Diagram: a network with an input layer (input1, input2, input3), hidden layers, and an output layer. Successive layers detect increasingly abstract features: circle, iris, eye, face. Output: 89% likely to be a face.

Page 6: Deep Learning Class #1 - Go Deep or Go Home

How to find the best values for the weights?

Plot: the error as a function of a parameter w, with two points a and b on the curve.

Define error = |expected - computed|².

Find the parameters that minimize the average error.

The gradient of the error with respect to w is ∂error/∂w.

Update: w ← w - η · ∂error/∂w, i.e. take a step downhill.
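A minimal sketch of this update rule on a toy one-parameter model (the model computed = w · x, the single data point, and the learning rate are all assumptions):

```python
# Toy model: computed = w * x, error = |expected - computed|^2, one data point.
x, expected = 2.0, 3.0       # made-up training example
w = 0.5                      # initial parameter value
eta = 0.1                    # learning rate (step size), an assumption

for step in range(25):
    computed = w * x
    error = (expected - computed) ** 2
    grad = 2.0 * (computed - expected) * x   # d(error)/dw
    w = w - eta * grad                       # take a step downhill

print(w)       # approaches 1.5, where the error is 0
```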

Page 7: Deep Learning Class #1 - Go Deep or Go Home

Egg carton in 1 million dimensions

Figure: a bumpy loss surface with many local minima; gradient descent tries to get to one of the low points ("get to here, or here").

Page 8: Deep Learning Class #1 - Go Deep or Go Home

Backpropagation: Assigning Blame

Diagram: a small network with inputs input1, input2, input3, weights w1, w2, w3, w1’, w2’, w3’, biases b and b’, predictions p = 0.3 and p’ = 0.89, label = 1, loss = 0.49.

1) For the loss to go down by 0.1, p must go up by 0.05. For p to go up by 0.05, w1 must go up by 0.09. For p to go up by 0.05, w2 must go down by 0.12. ...

2) For the loss to go down by 0.1, p’ must go down by 0.01.

3) For p’ to go down by 0.01, w2’ must go up by 0.04.
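A sketch of this blame assignment via the chain rule, for a single sigmoid neuron with a squared loss; the input values, weights, and the choice of sigmoid are assumptions, and the numbers do not reproduce the slide's:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One sigmoid neuron with a squared loss against label = 1 (all values made up).
x = np.array([0.5, -1.0, 0.25])     # input1, input2, input3
w = np.array([0.1, 0.4, -0.3])      # w1, w2, w3
b = 0.0
label = 1.0

p = sigmoid(np.dot(w, x) + b)       # prediction
loss = (label - p) ** 2

# Backpropagation: chain the local derivatives from the loss back to each weight.
dloss_dp = 2.0 * (p - label)        # how the loss changes if p changes
dp_dz = p * (1.0 - p)               # derivative of the sigmoid
grad_w = dloss_dp * dp_dz * x       # blame assigned to w1, w2, w3
grad_b = dloss_dp * dp_dz           # blame assigned to the bias
print(loss, grad_w, grad_b)
```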

Page 9: Deep Learning Class #1 - Go Deep or Go Home

Stochastic Gradient Descent (SGD) and Minibatch

It’s too expensive to compute the gradient on all inputs to take a step.

We prefer a quick approximation.

Use a small random sample of inputs (minibatch) to compute the gradient.

Apply the gradient to all the weights.

Welcome to SGD, much more efficient than regular GD!
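A sketch of minibatch SGD on a toy one-weight linear model; the synthetic data, batch size of 32, and learning rate are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data for a one-weight linear model: y is roughly 3 * x.
X = rng.normal(size=1000)
y = 3.0 * X + 0.1 * rng.normal(size=1000)

w = 0.0
eta, batch_size = 0.1, 32     # learning rate and minibatch size (assumptions)

for epoch in range(5):
    order = rng.permutation(len(X))                # shuffle the inputs
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]      # a small random sample: the minibatch
        grad = np.mean(2.0 * (w * X[idx] - y[idx]) * X[idx])   # gradient on the minibatch only
        w = w - eta * grad                         # applied to the weight

print(w)    # close to 3.0, at a fraction of the cost of full-batch gradient descent
```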

Page 10: Deep Learning Class #1 - Go Deep or Go Home

Usually things would get intense at this point...

Chain rule

Partial derivatives

Hessian

Sum over paths

Credit: Richard Socher, Stanford class CS224d

Jacobian matrix

Local gradient

Page 11: Deep Learning Class #1 - Go Deep or Go Home

Learning a new representation

Credit: http://colah.github.io/posts/2014-03-NN-Manifolds-Topology/

Page 12: Deep Learning Class #1 - Go Deep or Go Home

Training Data

Page 13: Deep Learning Class #1 - Go Deep or Go Home

Train. Validate. Test.

Training set

Validation set

Cross-validation

Testing set

Diagram: the data is split into four folds (1, 2, 3, 4); for cross-validation, each fold takes a turn as the validation set. Shuffle! One full pass over the training data is one epoch.

Never ever, ever, ever use the testing set for training!
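A sketch of shuffling and splitting a dataset into training, validation, and testing sets; the made-up data and the 80% / 10% / 10% proportions are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))          # made-up examples
y = rng.integers(0, 2, size=1000)       # made-up labels

idx = rng.permutation(len(X))           # shuffle!
n_train, n_val = 800, 100               # 80% / 10% / 10% split, an assumption

X_train, y_train = X[idx[:n_train]], y[idx[:n_train]]
X_val, y_val = X[idx[n_train:n_train + n_val]], y[idx[n_train:n_train + n_val]]
X_test, y_test = X[idx[n_train + n_val:]], y[idx[n_train + n_val:]]   # never used for training
```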

Page 14: Deep Learning Class #1 - Go Deep or Go Home

Tensors

Page 15: Deep Learning Class #1 - Go Deep or Go Home

Tensors are not scary

Number: 0D

Vector: 1D

Matrix: 2D

Tensor: 3D, 4D, … array of numbers

This image is a 3264 x 2448 x 3 tensor

[r=0.45 g=0.84 b=0.76]
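In Python/NumPy the shapes might look like this (the 28 x 28 matrix and the small batch images are made-up examples; the 3264 x 2448 x 3 shape is the slide's photo):

```python
import numpy as np

number = np.float32(0.5)                           # 0D: a single number
vector = np.zeros(10)                              # 1D
matrix = np.zeros((28, 28))                        # 2D
image = np.zeros((3264, 2448, 3), dtype=np.uint8)  # 3D: height x width x RGB, the slide's photo
batch = np.zeros((32, 64, 64, 3), dtype=np.uint8)  # 4D: a minibatch of (smaller, made-up) images

print(image.shape)      # (3264, 2448, 3)
print(image[0, 0])      # one pixel is three numbers: [r, g, b]
```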

Page 16: Deep Learning Class #1 - Go Deep or Go Home

Different types of networks / layers

Page 17: Deep Learning Class #1 - Go Deep or Go Home

Fully-Connected Networks

Per layer, every input connects to every neuron.

Diagram (same network as before): an input layer (input1, input2, input3), hidden layers, and an output layer; successive layers detect circle, iris, eye, face. Output: 89% likely to be a face.

Page 18: Deep Learning Class #1 - Go Deep or Go Home

Recurrent Neural Networks (RNN)

Appropriate when inputs are sequences.

We’ll cover in detail when we work on text.

Diagram: the sentence "this is not a pipe" is fed to the network one word per time step (t=0, t=1, t=2, t=3, t=4).

Page 19: Deep Learning Class #1 - Go Deep or Go Home

Convolutional Neural Networks (ConvNets)

Appropriate for image tasks, but not limited to them.

We’ll cover in detail in the next session.

Page 20: Deep Learning Class #1 - Go Deep or Go Home

Hyperparameters

Activations: a zoo of nonlinear functions

Initializations: Distribution of initial weights. Not all zeros.

Optimizers: driving the gradient descent

Objectives: comparing a prediction to the truth

Regularizers: forcing the function we learn to remain “simple”

...and many more

Page 21: Deep Learning Class #1 - Go Deep or Go Home

Activations

Nonlinear functions

Page 22: Deep Learning Class #1 - Go Deep or Go Home

Sigmoid, tanh, hard tanh and softsign

Page 23: Deep Learning Class #1 - Go Deep or Go Home

ReLU (Rectified Linear Unit) and Leaky ReLU

Page 24: Deep Learning Class #1 - Go Deep or Go Home

Softplus and Exponential Linear Unit (ELU)
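A sketch of the activation functions named on the last three slides; the leaky-ReLU slope of 0.01 and the ELU alpha of 1.0 are common defaults, not values from the slides:

```python
import numpy as np

def sigmoid(z):                 return 1.0 / (1.0 + np.exp(-z))
def hard_tanh(z):               return np.clip(z, -1.0, 1.0)
def softsign(z):                return z / (1.0 + np.abs(z))
def relu(z):                    return np.maximum(0.0, z)
def leaky_relu(z, slope=0.01):  return np.where(z > 0, z, slope * z)
def softplus(z):                return np.log1p(np.exp(z))
def elu(z, alpha=1.0):          return np.where(z > 0, z, alpha * (np.exp(z) - 1.0))

z = np.linspace(-3, 3, 7)
for f in (sigmoid, np.tanh, hard_tanh, softsign, relu, leaky_relu, softplus, elu):
    print(f.__name__, np.round(f(z), 3))   # compare how each one squashes the same inputs
```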

Page 25: Deep Learning Class #1 - Go Deep or Go Home

Softmax

Raw scores → SOFTMAX → Probabilities:

x1 = 6.78 → dog = 0.92536
x2 = 4.25 → cat = 0.07371
x3 = -0.16 → car = 0.00089
x4 = -3.5 → hat = 0.00003
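A sketch of softmax in code; subtracting the maximum score is a standard numerical-stability trick that is not on the slide:

```python
import numpy as np

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    exps = np.exp(scores - np.max(scores))    # subtracting the max avoids overflow
    return exps / exps.sum()

scores = np.array([6.78, 4.25, -0.16, -3.5])          # x1, x2, x3, x4
for label, p in zip(["dog", "cat", "car", "hat"], softmax(scores)):
    print(f"{label} = {p:.5f}")   # close to the slide's values (last digits may differ by rounding)
```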

Page 26: Deep Learning Class #1 - Go Deep or Go Home

Optimizers

Page 27: Deep Learning Class #1 - Go Deep or Go Home

Various algorithms for driving gradient descent. Tricks to speed up SGD.

Learn the learning rate.

Credit: Alex Radford
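As one concrete example of such a trick (not necessarily one shown in the animation), here is a sketch of SGD with momentum; the `sgd_momentum_step` helper and its default values are illustrative assumptions:

```python
def sgd_momentum_step(w, grad, velocity, eta=0.01, momentum=0.9):
    """One SGD-with-momentum update; eta and momentum are common defaults, not slide values."""
    velocity = momentum * velocity - eta * grad   # keep a running, smoothed direction
    return w + velocity, velocity

# Toy usage on f(w) = w^2, whose gradient is 2w.
w, v = 5.0, 0.0
for _ in range(200):
    w, v = sgd_momentum_step(w, 2.0 * w, v)
print(w)    # close to the minimum at 0
```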

Page 28: Deep Learning Class #1 - Go Deep or Go Home

Cost/Loss/Objective Functions

Diagram: the prediction is compared (= ?) to the truth.

Page 29: Deep Learning Class #1 - Go Deep or Go Home

Cross-entropy Loss

Raw scores → SOFTMAX → Probabilities, compared (= ?) to the label (truth):

x1 = 6.78 → dog = 0.92536
x2 = 4.25 → cat = 0.07371
x3 = -0.16 → car = 0.00089
x4 = -3.5 → hat = 0.00003

If the label is:

dog: loss = 0.078
cat: loss = 2.607
car: loss = 7.024
hat: loss = 10.414
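A sketch that recomputes these losses from the printed probabilities; with one true label, cross-entropy reduces to the negative log of the probability assigned to that label:

```python
import numpy as np

# Probabilities from the softmax slide; cross-entropy is -log(probability of the true label).
probs = {"dog": 0.92536, "cat": 0.07371, "car": 0.00089, "hat": 0.00003}
for label, p in probs.items():
    print(f"if the label is {label}: loss = {-np.log(p):.3f}")
# dog: 0.078, cat: 2.608, car: 7.024, hat: 10.414 (the slide's values, up to rounding)
```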

Page 30: Deep Learning Class #1 - Go Deep or Go Home

Regularization

Preventing Overfitting

Page 31: Deep Learning Class #1 - Go Deep or Go Home

Overfitting

Very simple: might not predict well.

Just right?

Overfitting: rote memorization. Will not generalize well.

Page 32: Deep Learning Class #1 - Go Deep or Go Home

Regularization: Avoiding Overfitting

Idea: keep the functions “simple” by constraining the weights.

Loss = Error(prediction, truth) + L(keep solution simple)

L2 keeps all parameters medium-sized.

L1 drives many parameters to zero.

L1+L2 is sometimes used.
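A sketch of a loss with L1/L2 penalties added; the `regularized_loss` helper, the squared-error term, and the penalty strengths are assumptions:

```python
import numpy as np

def regularized_loss(prediction, truth, weights, l1=0.0, l2=0.01):
    """Error plus a penalty that keeps the solution simple; l1 and l2 strengths are assumptions."""
    error = np.mean((prediction - truth) ** 2)
    penalty = l1 * np.sum(np.abs(weights)) + l2 * np.sum(weights ** 2)
    return error + penalty

w = np.array([0.5, -1.2, 3.0])
pred, truth = np.array([0.8, 0.1]), np.array([1.0, 0.0])
print(regularized_loss(pred, truth, w))            # L2 only
print(regularized_loss(pred, truth, w, l1=0.01))   # L1 + L2, as sometimes used
```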

Page 33: Deep Learning Class #1 - Go Deep or Go Home

Regularization: Dropout

At every step during training, ignore the output of a fraction p of the neurons. p = 0.5 is a good default.

Diagram: a network with inputs input1, input2, input3, with some of its neurons dropped.
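A sketch of dropout applied to a layer's outputs; rescaling the surviving outputs by 1/(1-p) ("inverted dropout") is standard practice but not stated on the slide, so it is noted as an assumption:

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(outputs, p=0.5, training=True):
    """During training, ignore the output of a fraction p of the neurons."""
    if not training:
        return outputs                       # dropout is only applied while training
    keep = rng.random(outputs.shape) >= p    # randomly pick which neurons survive this step
    # Rescaling by 1/(1-p) ("inverted dropout") keeps the expected output unchanged;
    # this detail is an assumption, not something stated on the slide.
    return outputs * keep / (1.0 - p)

h = np.array([0.2, 1.5, -0.7, 0.9, 0.0, 2.1])    # made-up layer outputs
print(dropout(h))                  # roughly half of the outputs are zeroed
print(dropout(h, training=False))  # left unchanged at test time
```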

Page 34: Deep Learning Class #1 - Go Deep or Go Home

One Last Bit of Wisdom

Page 35: Deep Learning Class #1 - Go Deep or Go Home
Page 36: Deep Learning Class #1 - Go Deep or Go Home

Workshop: Keras & Addition RNN

https://github.com/holbertonschool/deep-learning/tree/master/Class%20%230