Deep Learning: a theoretical introduction – Episode 2
Deep Learning and TensorFlow – The Quest for Deeper Networks
Università degli Studi di Pavia
Feed-Forward Neural Network
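As a running reference for the rest of the episode, a minimal NumPy sketch of the forward pass of a feed-forward network, alternating affine maps and a nonlinearity. The layer widths and the sigmoid choice are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, weights, biases):
    """Forward pass: at each layer, a <- g(W a + b)."""
    a = x
    for W, b in zip(weights, biases):
        a = sigmoid(W @ a + b)
    return a

rng = np.random.default_rng(0)
sizes = [4, 8, 3]                                   # illustrative widths
weights = [rng.normal(size=(m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(m) for m in sizes[1:]]
y = forward(rng.normal(size=sizes[0]), weights, biases)   # output, shape (3,)
```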
Training Feed-Forward Neural Networks
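A compact sketch of training by backpropagation plus gradient descent, here on XOR with a 2-4-1 sigmoid network and squared error. All sizes, the seed, and the learning rate are illustrative; convergence may need a different seed or more steps.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# XOR: the classic small dataset needing a hidden layer (illustrative)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
Y = np.array([[0], [1], [1], [0]], dtype=float)

rng = np.random.default_rng(1)
W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)       # 2 -> 4 hidden units
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)       # 4 -> 1 output
lr = 1.0                                            # illustrative step size

for step in range(5000):
    h = sigmoid(X @ W1 + b1)                        # forward pass
    y = sigmoid(h @ W2 + b2)
    dy = (y - Y) * y * (1 - y)                      # backprop: dL/dz_out
    dh = (dy @ W2.T) * h * (1 - h)                  # dL/dz_hidden (chain rule)
    W2 -= lr * h.T @ dy;  b2 -= lr * dy.sum(axis=0) # gradient-descent update
    W1 -= lr * X.T @ dh;  b1 -= lr * dh.sum(axis=0)
```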
The Quest for Deeper Networks
Shallow vs. Deep Feed-Forward Neural Networks
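One way to make the comparison concrete in the course's own framework: two tf.keras models, one wide and shallow, one narrow and deep. The widths are arbitrary placeholders; the point developed in the following sections is that depth tends to buy expressive power more cheaply than width.

```python
import tensorflow as tf

def mlp(widths, input_dim=32):
    """A plain feed-forward net with the given hidden widths."""
    return tf.keras.Sequential(
        [tf.keras.Input(shape=(input_dim,))]
        + [tf.keras.layers.Dense(w, activation="relu") for w in widths]
        + [tf.keras.layers.Dense(1)]
    )

shallow = mlp([512])            # one wide hidden layer
deep    = mlp([64, 64, 64])     # several narrow hidden layers
print(shallow.count_params())   # ~17k parameters
print(deep.count_params())      # ~10k parameters
```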
Parity Circuits
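Parity is the classic depth example: a balanced tree of XOR gates computes n-bit parity with n − 1 gates and depth O(log n), whereas any depth-2 (DNF) circuit needs one AND term per odd-weight input, i.e. exponentially many. A sketch of the deep circuit (the function name is mine, not from the slides):

```python
def parity(bits):
    """n-bit parity as a balanced tree of XOR gates:
    n - 1 gates, depth O(log n) - the 'deep' circuit."""
    layer = list(bits)
    while len(layer) > 1:
        if len(layer) % 2:                 # pad an odd layer
            layer.append(0)
        layer = [a ^ b for a, b in zip(layer[::2], layer[1::2])]
    return layer[0]

assert parity([1, 0, 1, 1]) == 1           # three ones -> odd parity
# By contrast, a depth-2 (DNF) circuit for n-bit parity needs 2**(n-1)
# AND terms, one per odd-weight input - exponential in n.
```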
Depth and piecewise linear functions
[lost formula: the slides bound the maximal number of linear pieces p_max of a piecewise-linear network in terms of its width k, depth h, and input dimension d, stated for k > 2; the exact expression did not survive extraction]
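The slide's formula is lost, but the phenomenon it bounds can be reproduced directly. A minimal NumPy sketch (the tent map and the depth k = 5 are illustrative choices, not from the slides): composing a two-ReLU "tent" map with itself h times yields a sawtooth with 2^h linear pieces, so the number of pieces grows exponentially with depth while the number of units grows only linearly.

```python
import numpy as np

def tent(x):
    """Tent map on [0, 1], realizable with two ReLUs:
    2*relu(x) - 4*relu(x - 0.5)."""
    return 2 * np.maximum(x, 0) - 4 * np.maximum(x - 0.5, 0)

k = 5                                      # illustrative depth
x = np.linspace(0.0, 1.0, 2**12 + 1)       # dyadic grid -> exact breakpoints
y = x.copy()
for _ in range(k):                         # compose the tent map k times
    y = tent(y)

# the sawtooth alternates up/down, so slope sign changes count its pieces
slopes = np.diff(y) / np.diff(x)
pieces = 1 + np.count_nonzero(np.diff(np.sign(slopes)))
print(pieces)                              # 2**k = 32 linear pieces
```

Matching this with a single hidden layer would need on the order of one unit per piece, which is the depth/width asymmetry the lost bound expresses.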
[19]Deep Learning: a theoretical introduction – Episode 2
About why they did not useDeep Networks
from the beginning
Problem: vanishing or exploding Gradients
[lost bullets: the slides expand the gradient at depth k as a product of terms involving the activation derivative g′ and the weight matrices W(i); only the symbols g, W(i) and k survived extraction]
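A small NumPy experiment makes the product structure visible (width, depth, and the weight scale are illustrative assumptions): with sigmoid activations each layer multiplies the Jacobian by diag(g′(z)) W, and since g′ ≤ 1/4 the norm collapses geometrically; with much larger weights it explodes instead.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
width, depth = 50, 30                       # illustrative sizes
x = rng.normal(size=width)
J = np.eye(width)                           # Jacobian accumulated so far

for i in range(depth):
    W = rng.normal(scale=1.0 / np.sqrt(width), size=(width, width))
    x = sigmoid(W @ x)
    # chain rule: each layer contributes a factor diag(g'(z)) W
    J = np.diag(x * (1 - x)) @ W @ J
    if i % 5 == 4:
        print(i + 1, np.linalg.norm(J))     # norm shrinks geometrically
```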
Problem: initial values of the parameters
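A standard remedy the title points at is a variance-preserving initial scale. A sketch of Glorot/Xavier initialization; the fan-based limit is the published formula, while its use here as the slide's specific choice is an assumption.

```python
import numpy as np

def glorot_uniform(fan_in, fan_out, rng):
    """Glorot/Xavier initialization: sampling in [-limit, limit] with
    limit = sqrt(6 / (fan_in + fan_out)) keeps activation and gradient
    variances roughly constant from layer to layer."""
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_out, fan_in))

W1 = glorot_uniform(784, 256, np.random.default_rng(0))
```

tf.keras applies this scheme by default: glorot_uniform is the standard kernel initializer of Dense layers.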
A bag of wonderful tricks
Why ReLU is better (sometimes)
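One concrete sense in which ReLU is better: its derivative is exactly 1 wherever the unit is active, so deep products of layer Jacobians do not shrink the way saturating sigmoids force them to. A small numeric illustration (the sample points are arbitrary):

```python
import numpy as np

z = np.linspace(-6, 6, 7)
s = 1 / (1 + np.exp(-z))
d_sigmoid = s * (1 - s)            # never above 0.25, ~0 once |z| is large
d_relu = (z > 0).astype(float)     # exactly 1 on the whole active half-line
print(d_sigmoid.round(4))   # [0.0025 0.0177 0.105 0.25 0.105 0.0177 0.0025]
print(d_relu)               # [0. 0. 0. 0. 1. 1. 1.]
```

The "sometimes" in the title is real: where z < 0 the ReLU derivative is exactly 0, so units can die and stop learning.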
Overfitting
Dropout
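A minimal sketch of inverted dropout, the variant used by modern libraries (the function name and the rate are illustrative):

```python
import numpy as np

def dropout(a, rate, rng, training=True):
    """Inverted dropout: during training, zero each unit with probability
    `rate` and rescale the survivors by 1/(1-rate), so the expected
    activation is unchanged and test time needs no correction."""
    if not training:
        return a
    mask = rng.random(a.shape) >= rate
    return a * mask / (1.0 - rate)

rng = np.random.default_rng(0)
h = np.ones(10)
print(dropout(h, 0.5, rng))        # ~half zeros, survivors scaled to 2.0
```

tf.keras.layers.Dropout(rate) implements the same inverted scheme and is active only while training.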
Counteracting Overfitting
Improving on MBGD (Mini-Batch Gradient Descent)
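A first refinement of plain mini-batch gradient descent is momentum: accumulate an exponentially weighted average of past gradients so that consistent directions speed up and oscillations damp out. A one-step sketch (hyperparameter values are illustrative):

```python
import numpy as np

def momentum_step(w, grad, v, lr=0.01, beta=0.9):
    """SGD with momentum: v is an exponentially weighted average of past
    (negative) gradients; consistent directions accumulate, oscillating
    ones cancel."""
    v = beta * v - lr * grad
    return w + v, v
```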
AdaGrad
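The equations on these slides are lost, but the published AdaGrad update is standard: keep a per-coordinate sum of squared gradients and divide the step by its square root, so frequently-updated coordinates get smaller steps and rare ones larger steps. A sketch:

```python
import numpy as np

def adagrad_step(w, grad, accum, lr=0.01, eps=1e-8):
    """AdaGrad: accumulate squared gradients per coordinate and divide the
    step by the root of the accumulator, giving rarely-updated coordinates
    larger effective learning rates."""
    accum = accum + grad ** 2
    w = w - lr * grad / (np.sqrt(accum) + eps)
    return w, accum
```

Because the accumulator only ever grows, the effective step size decays towards zero; that is the defect AdaDelta addresses next.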
AdaDelta
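AdaDelta (Zeiler, 2012) replaces AdaGrad's ever-growing sum with decaying averages of squared gradients and squared updates, removing the global learning rate entirely. A sketch of the published update:

```python
import numpy as np

def adadelta_step(w, grad, eg2, ed2, rho=0.95, eps=1e-6):
    """AdaDelta (Zeiler, 2012): decaying averages of squared gradients (eg2)
    and squared updates (ed2) replace AdaGrad's growing sum; the ratio of
    their RMS values sets the step, so no global learning rate appears."""
    eg2 = rho * eg2 + (1 - rho) * grad ** 2
    delta = -np.sqrt(ed2 + eps) / np.sqrt(eg2 + eps) * grad
    ed2 = rho * ed2 + (1 - rho) * delta ** 2
    return w + delta, eg2, ed2
```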
Improving on MBGD
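All of the update rules above ship with tf.keras, so in the course's own framework the choice reduces to one line (the learning-rate values are placeholders):

```python
import tensorflow as tf

# the update rules discussed above, as shipped with tf.keras
optimizers = {
    "mbgd":     tf.keras.optimizers.SGD(learning_rate=0.01),
    "momentum": tf.keras.optimizers.SGD(learning_rate=0.01, momentum=0.9),
    "adagrad":  tf.keras.optimizers.Adagrad(learning_rate=0.01),
    "adadelta": tf.keras.optimizers.Adadelta(rho=0.95),
}
# model.compile(optimizer=optimizers["adadelta"], loss="mse")
```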
An aside: function approximation vs. classification
Classification: Softmax
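The derivation on these slides is lost; the standard construction turns the output layer's scores z into a probability distribution and pairs it with the cross-entropy loss, whose gradient simplifies to p − onehot. A numerically stable NumPy sketch (the example scores are arbitrary):

```python
import numpy as np

def softmax(z):
    """Stable softmax: subtracting max(z) cancels in the ratio but
    prevents overflow in exp."""
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

z = np.array([2.0, 1.0, 0.1])          # arbitrary example scores
p = softmax(z)                         # [0.659 0.242 0.099], sums to 1
loss = -np.log(p[0])                   # cross-entropy for true class 0
grad = p - np.eye(3)[0]                # dL/dz = p - onehot(true class)
```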
Another aside: autoencoders
Auto-encoders
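A minimal tf.keras auto-encoder sketch: the network is trained to reproduce its own input through a narrow bottleneck, so the code layer learns a compressed representation. The 784/32 dimensions are illustrative (e.g. MNIST-sized images), not from the slides.

```python
import tensorflow as tf

# illustrative sizes: 784-dimensional input (e.g. MNIST), 32-unit code
inputs  = tf.keras.Input(shape=(784,))
code    = tf.keras.layers.Dense(32, activation="relu")(inputs)       # encoder
outputs = tf.keras.layers.Dense(784, activation="sigmoid")(code)     # decoder
autoencoder = tf.keras.Model(inputs, outputs)
autoencoder.compile(optimizer="sgd", loss="binary_crossentropy")
# autoencoder.fit(x, x, ...)   # the target is the input itself
```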