
Post on 20-May-2020


TRANSCRIPT

[1]Deep Learning: a theoretical introduction – Episode 2

Deep Learning and TensorFlow
Episode 2: The Quest for Deeper Networks

Università degli Studi di Pavia

Feed-Forward Neural Network

Training Feed-Forward Neural Networks
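The opening slides cover feed-forward networks and how they are trained. As a minimal sketch (the architecture, sizes, and learning rate are illustrative choices, not taken from the slides), a two-layer sigmoid network trained by plain gradient descent on XOR, the classic task a single-layer network cannot solve:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: XOR, which is not linearly separable.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One hidden layer of 4 units (sizes are arbitrary for the sketch).
W1 = rng.normal(0, 1, (2, 4)); b1 = np.zeros(4)
W2 = rng.normal(0, 1, (4, 1)); b2 = np.zeros(1)

lr = 0.5
losses = []
for _ in range(2000):
    # Forward pass.
    h = sigmoid(X @ W1 + b1)
    p = sigmoid(h @ W2 + b2)
    losses.append(float(((p - y) ** 2).mean()))
    # Backward pass for the mean squared error loss.
    dp = (p - y) * p * (1 - p)
    dW2 = h.T @ dp; db2 = dp.sum(0)
    dh = (dp @ W2.T) * h * (1 - h)
    dW1 = X.T @ dh; db1 = dh.sum(0)
    # Gradient-descent update.
    W2 -= lr * dW2; b2 -= lr * db2
    W1 -= lr * dW1; b1 -= lr * db1

pred = sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2)
```

The backward pass is just the chain rule applied layer by layer, which is the training procedure the later slides on deep networks build on.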

The Quest for Deeper Networks

Shallow vs. Deep Feed-Forward Neural Networks

Parity Circuits
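Parity is the textbook example used to argue for depth: a deep-but-narrow chain of XOR gates computes the parity of n bits with n − 1 gates, whereas a shallow depth-2 circuit (an OR of ANDs) needs one term per odd-weight input, i.e. 2^(n−1) terms. A small sketch of that counting argument:

```python
from functools import reduce
from itertools import product
import operator

def parity(bits):
    # Deep but narrow: a chain of XOR gates, n - 1 gates in total.
    return reduce(operator.xor, bits, 0)

def parity_dnf_terms(n):
    # Shallow (depth-2) OR-of-ANDs: one AND term per odd-weight input,
    # i.e. 2**(n-1) terms -- exponential in n.
    return [bits for bits in product([0, 1], repeat=n) if sum(bits) % 2 == 1]

n = 8
deep_gates = n - 1
shallow_terms = len(parity_dnf_terms(n))
```

The exponential gap between `deep_gates` and `shallow_terms` is the circuit-complexity intuition behind preferring deeper networks.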

Depth and piecewise linear functions
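A ReLU network computes a piecewise linear function, and composing layers can multiply the number of linear pieces. A hedged 1-D sketch of this standard argument (the "tent map" construction; not necessarily the slides' exact bound): one hidden ReLU layer gives 2 pieces, and each further composition doubles the count.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def hat(x):
    # One hidden ReLU layer computing the "tent" map on [0, 1]:
    # 2x for x < 1/2 and 2 - 2x for x >= 1/2 (two linear pieces).
    return 2.0 * relu(x) - 4.0 * relu(x - 0.5)

def count_pieces(f, lo=0.0, hi=1.0, n=1601):
    # Count linear pieces by counting slope changes on a fine grid
    # (grid chosen so the dyadic breakpoints fall on grid points).
    x = np.linspace(lo, hi, n)
    slopes = np.round(np.diff(f(x)) / np.diff(x), 3)
    return 1 + int(np.count_nonzero(np.diff(slopes)))

f1 = hat
f2 = lambda x: hat(hat(x))        # two layers
f3 = lambda x: hat(hat(hat(x)))   # three layers
pieces = [count_pieces(f) for f in (f1, f2, f3)]
```

A shallow network needs its number of pieces to grow with its width, while here depth buys exponential growth with constant width.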

About why they did not use Deep Networks from the beginning

Problem: vanishing or exploding Gradients

Problem: initial values of the parameters
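The two problems above are linked: backpropagation multiplies one Jacobian per layer, so the gradient norm scales roughly like (per-layer gain)^depth, and the gain is set by the scale of the initial weights. A minimal sketch with linear layers (the setup and scales are illustrative assumptions; variance-preserving initialization in the spirit of Xavier/Glorot keeps the gain near 1):

```python
import numpy as np

def grad_norm(depth, scale, width=100, seed=0):
    # Backpropagate a vector of ones through `depth` linear layers.
    # With entries of std scale/sqrt(width), the typical gain per
    # layer is about `scale`, so the norm behaves like scale**depth.
    rng = np.random.default_rng(seed)
    g = np.ones(width)
    for _ in range(depth):
        W = rng.normal(0, scale / np.sqrt(width), (width, width))
        g = g @ W
    return float(np.linalg.norm(g))

vanish  = grad_norm(depth=50, scale=0.5)   # gain < 1: gradient vanishes
stable  = grad_norm(depth=50, scale=1.0)   # gain ~ 1: gradient survives
explode = grad_norm(depth=50, scale=1.5)   # gain > 1: gradient explodes
```

Fifty layers turn a modest per-layer factor of 0.5 or 1.5 into a swing of more than twenty orders of magnitude, which is why naive deep training stalls.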

A bag of wonderful tricks

Why ReLU is better (sometimes)
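One standard reason ReLU helps, sketched numerically: the sigmoid derivative never exceeds 1/4, so stacking k sigmoid layers scales gradients by at most 0.25^k even in the best case, while the ReLU derivative is exactly 1 on active units and so does not shrink gradients at all.

```python
import numpy as np

x = np.linspace(-5, 5, 1001)

sigmoid = 1 / (1 + np.exp(-x))
sigmoid_grad = sigmoid * (1 - sigmoid)   # peaks at 0.25 (at x = 0)
relu_grad = (x > 0).astype(float)        # exactly 1 on active units

max_sig = float(sigmoid_grad.max())
shrink_10 = max_sig ** 10                # best case through 10 sigmoid layers
```

The "(sometimes)" in the title is warranted: ReLU units can also die (gradient exactly 0 for all negative inputs), which is why leaky variants exist.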

Overfitting

Dropout

Contrasting Overfitting
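A minimal sketch of dropout as a regularizer, in its common "inverted" form (details such as the rescaling convention are the usual modern formulation, not necessarily the slides' exact presentation): each unit is zeroed with probability p during training and the survivors are rescaled, so activations have the right expectation and inference needs no correction.

```python
import numpy as np

def dropout(h, p_drop, rng, train=True):
    # Inverted dropout: zero each unit with probability p_drop during
    # training and rescale survivors by 1/(1 - p_drop), so E[output] = h
    # and no rescaling is needed at test time.
    if not train:
        return h
    mask = (rng.random(h.shape) >= p_drop).astype(h.dtype)
    return h * mask / (1.0 - p_drop)

rng = np.random.default_rng(0)
h = np.ones(10000)
out = dropout(h, p_drop=0.5, rng=rng)
out_eval = dropout(h, p_drop=0.5, rng=rng, train=False)
```

Training each example against a random sub-network discourages co-adapted units, which is the sense in which dropout contrasts overfitting.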

Improving on MBGD
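A common first improvement on plain mini-batch gradient descent is momentum: a velocity accumulates past gradients, damping oscillations along steep directions and speeding progress along flat ones. A hedged sketch on an ill-conditioned quadratic (the test function and hyperparameters are illustrative assumptions):

```python
import numpy as np

def quad_loss(w, curv):
    # f(w) = 0.5 * sum_i curv_i * w_i**2
    return 0.5 * float(np.sum(curv * w * w))

curv = np.array([1.0, 100.0])     # ill-conditioned bowl
lr, beta, steps = 0.01, 0.9, 200

# Plain gradient descent.
w = np.array([1.0, 1.0])
for _ in range(steps):
    w = w - lr * curv * w
loss_gd = quad_loss(w, curv)

# The same descent with momentum.
w = np.array([1.0, 1.0])
v = np.zeros(2)
for _ in range(steps):
    v = beta * v - lr * curv * w   # velocity accumulates gradients
    w = w + v
loss_mom = quad_loss(w, curv)
```

On this problem the learning rate is capped by the steep direction, so plain descent crawls along the flat one; momentum reaches a far lower loss in the same number of steps.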

AdaGrad

AdaDelta
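The two update rules above can be sketched side by side (this is the standard formulation of each method; the toy objective is an illustrative assumption). AdaGrad divides by the root of the accumulated squared gradients, giving each parameter its own shrinking rate; AdaDelta replaces the ever-growing sum with decaying averages, and scales by the RMS of past updates so no global learning rate is needed.

```python
import numpy as np

def adagrad_step(w, grad, accum, lr=0.1, eps=1e-8):
    # AdaGrad: effective rate lr / sqrt(sum of all past squared grads),
    # which inevitably decays toward zero.
    accum = accum + grad ** 2
    w = w - lr * grad / (np.sqrt(accum) + eps)
    return w, accum

def adadelta_step(w, grad, Eg2, Edx2, rho=0.95, eps=1e-6):
    # AdaDelta: decaying average of squared grads (Eg2) instead of a sum,
    # step scaled by the RMS of past updates (Edx2) -- no lr to tune.
    Eg2 = rho * Eg2 + (1 - rho) * grad ** 2
    dx = -np.sqrt(Edx2 + eps) / np.sqrt(Eg2 + eps) * grad
    Edx2 = rho * Edx2 + (1 - rho) * dx ** 2
    return w + dx, Eg2, Edx2

# Drive both on f(w) = 0.5 * w**2, whose gradient is w itself.
w_a, accum = 5.0, 0.0
w_d, Eg2, Edx2 = 5.0, 0.0, 0.0
for _ in range(500):
    w_a, accum = adagrad_step(w_a, w_a, accum)
    w_d, Eg2, Edx2 = adadelta_step(w_d, w_d, Eg2, Edx2)
```

AdaGrad's decaying effective rate is exactly the weakness AdaDelta's moving averages are designed to fix.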

Improving on MBGD

An aside: function approximation vs. classification

Classification: Softmax
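The softmax slides can be summarized in a short numerically stable sketch: softmax turns a vector of logits into a probability distribution over classes, and subtracting the maximum logit before exponentiating is the standard trick to avoid overflow (it changes nothing, since softmax is shift-invariant).

```python
import numpy as np

def softmax(z):
    # Shift by the max for numerical stability; the result is unchanged
    # because exp(z - c) / sum(exp(z - c)) == exp(z) / sum(exp(z)).
    z = z - np.max(z, axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(z, label):
    # -log p[label]: the usual training loss paired with softmax.
    return float(-np.log(softmax(z)[label]))

z = np.array([2.0, 1.0, 0.1])
p = softmax(z)
loss = cross_entropy(z, 0)
```

Pairing softmax with cross-entropy gives the particularly simple logit gradient p − y, which is one reason the combination is standard for classification.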

Another aside: autoencoders

Auto-encoders
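An autoencoder is trained to reproduce its input through a narrow bottleneck: an encoder compresses, a decoder reconstructs, and the reconstruction error is the loss. A minimal sketch, assuming a linear encoder/decoder and synthetic data near a one-dimensional subspace (all sizes, rates, and data are illustrative; a linear autoencoder learns a subspace closely related to PCA):

```python
import numpy as np

rng = np.random.default_rng(0)

# Data concentrated near a 1-D subspace of R^3, plus small noise.
direction = np.array([1.0, 2.0, -1.0]) / np.sqrt(6.0)
X = rng.normal(0, 1, (200, 1)) * direction + 0.05 * rng.normal(0, 1, (200, 3))

# Linear autoencoder: encode R^3 -> R^1, decode R^1 -> R^3.
W_enc = rng.normal(0, 0.1, (3, 1))
W_dec = rng.normal(0, 0.1, (1, 3))

lr = 0.01
losses = []
for _ in range(2000):
    code = X @ W_enc            # bottleneck representation
    X_hat = code @ W_dec        # reconstruction
    err = X_hat - X
    losses.append(float((err ** 2).mean()))
    # Gradients of the squared reconstruction error.
    gW_dec = code.T @ err / len(X)
    gW_enc = X.T @ (err @ W_dec.T) / len(X)
    W_dec -= lr * gW_dec
    W_enc -= lr * gW_enc
```

The network is never shown labels: the bottleneck forces it to discover the low-dimensional structure of the data on its own, which is what makes autoencoders useful for representation learning.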
