CS 6501: Deep Learning for Computer Graphics
Basics of Neural Networks
Connelly Barnes
Overview
• Simple neural networks
  • Perceptron
  • Feedforward neural networks
  • Multilayer perceptron and properties
  • Autoencoders
• How to train neural networks
  • Gradient descent
  • Stochastic gradient descent
  • Automatic differentiation
  • Backpropagation
Perceptron (1957, Cornell)
[Diagram: inputs, multiplied by weights (arbitrary, learned parameters) and summed with a bias b (arbitrary, learned parameter), produce the output (class).]
Perceptron (1957, Cornell)
• Binary classifier: it can learn only linearly separable patterns.
Diagram from Wikipedia
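As a concrete illustration (not part of the slides), here is a minimal NumPy sketch of a perceptron and its classic learning rule on a linearly separable toy problem; the data and learning rate are made up for the example:

```python
import numpy as np

def perceptron_predict(x, w, b):
    # Output class: 1 if the weighted sum of inputs plus bias is positive.
    return 1 if np.dot(w, x) + b > 0 else 0

def perceptron_train(X, y, epochs=10, lr=1.0):
    # Classic perceptron learning rule: nudge w and b toward each misclassified example.
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            error = yi - perceptron_predict(xi, w, b)
            w += lr * error * xi
            b += lr * error
    return w, b

# Linearly separable toy data: logical AND.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])
w, b = perceptron_train(X, y)
print([perceptron_predict(xi, w, b) for xi in X])  # [0, 0, 0, 1]
```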
Feedforward neural networks
• We could connect units (neurons) in any arbitrary graph
[Diagram: units connected in an arbitrary graph, from input to output.]
Feedforward neural networks
• We could connect units (neurons) in any arbitrary graph
• If there are no cycles in the graph, we call it a feedforward neural network.

[Diagram: an acyclic network of units, from input to output.]
Recurrent neural networks (later)
• If there are cycles in the graph, we call it a recurrent neural network.

[Diagram: a network of units containing cycles, from input to output.]
Overview
• Simple neural networks
  • Perceptron
  • Feedforward neural networks
  • Multilayer perceptron and properties
  • Autoencoders
• How to train neural networks
  • Gradient descent
  • Stochastic gradient descent
  • Automatic differentiation
  • Backpropagation
Multilayer Perceptron (1960s)
In matrix notation:

$$\mathbf{L}_i = f_i(\mathbf{W}_i \mathbf{L}_{i-1} + \mathbf{b}_i), \qquad i \ge 1$$

• $\mathbf{L}_0$: input layer (inputs) vector
• $\mathbf{L}_1$: hidden layer vector
• $\mathbf{L}_2$: output layer vector
• $\mathbf{W}_i$: weight matrix for connections from layer $i-1$ to layer $i$
• $\mathbf{b}_i$: biases for neurons in layer $i$
• $f_i$: activation function for layer $i$
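To make the matrix form concrete, here is a minimal NumPy sketch of the forward pass $\mathbf{L}_i = f_i(\mathbf{W}_i \mathbf{L}_{i-1} + \mathbf{b}_i)$; the layer sizes and the choice of sigmoid activation are illustrative assumptions:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mlp_forward(L0, weights, biases, activations):
    # Apply L_i = f_i(W_i @ L_{i-1} + b_i) for each layer i >= 1.
    L = L0
    for W, b, f in zip(weights, biases, activations):
        L = f(W @ L + b)
    return L

# Example: 3 inputs -> 4 hidden units -> 2 outputs (sizes chosen arbitrarily).
rng = np.random.default_rng(0)
weights = [rng.standard_normal((4, 3)), rng.standard_normal((2, 4))]
biases = [np.zeros(4), np.zeros(2)]
print(mlp_forward(np.array([0.5, -1.0, 2.0]), weights, biases, [sigmoid, sigmoid]))
```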
Activation Functions: Sigmoid / Logistic
$$f(x) = \frac{1}{1 + e^{-x}}, \qquad \frac{df}{dx} = f(x)\,(1 - f(x))$$

Problems:
• Gradients at the tails are almost zero
• Outputs are not zero-centered
Activation Functions: Tanh
$$f(x) = \tanh x = \frac{2}{1 + e^{-2x}} - 1, \qquad \frac{df}{dx} = 1 - f(x)^2$$

Problems:
• Gradients at the tails are almost zero
Activation Functions: ReLU (Rectified Linear Unit)
$$f(x) = \max(x, 0), \qquad \frac{df}{dx} = \begin{cases} 1, & \text{if } x > 0 \\ 0, & \text{if } x < 0 \end{cases}$$

Pros:
• Accelerates the training stage by 6x over sigmoid/tanh [1]
• Simple to compute
• Sparser activation patterns
Cons:
• Neurons can "die" by getting stuck in the zero-gradient region
Summary:
• Currently the preferred kind of neuron
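For reference, the three activations and their derivatives above translate directly into NumPy; a minimal sketch:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def d_sigmoid(x):
    s = sigmoid(x)
    return s * (1.0 - s)            # df/dx = f(x) (1 - f(x))

def d_tanh(x):
    return 1.0 - np.tanh(x) ** 2    # df/dx = 1 - f(x)^2

def relu(x):
    return np.maximum(x, 0.0)

def d_relu(x):
    return (x > 0).astype(float)    # 1 where x > 0, else 0 (dead region)

x = np.linspace(-5, 5, 11)
print(d_sigmoid(x))   # note how small the gradients are at the tails
print(d_tanh(x))
print(d_relu(x))
```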
Universal Approximation Theorem
• A multilayer perceptron with a single hidden layer and a linear output layer can approximate any continuous function on a compact subset of $\mathbb{R}^n$ to within any desired degree of accuracy.

• Assumes the activation function is bounded, non-constant, and monotonically increasing.

• Also applies to the ReLU activation function.
Universal Approximation Theorem
• In the worst case, an exponential number of hidden units may be required.

• We can informally show this for the binary case:
  • If we have n bits of input to a binary function, how many possible inputs are there? ($2^n$)
  • How many possible binary functions are there? ($2^{2^n}$: one output bit per possible input)
  • So how many weights do we need to represent a given binary function? (on the order of $2^n$, since it takes $2^n$ bits just to specify which function we have)
Why Use Deep Networks?
• Functions representable with a deep rectifier network can require an exponential number of hidden units with a shallow (one hidden layer) network (Goodfellow 6.4).

• Piecewise linear networks (e.g. using ReLU) can represent functions that have a number of regions exponential in the depth of the network.
  • Can capture repeating / mirroring / symmetric patterns in data.
• Empirically, greater depth often results in better generalization.
Neural Network Architecture
• Architecture: refers to which parameters (e.g. weights) are used in the network and their topological connectivity.
• Fully connected: a common connectivity pattern for multilayer perceptrons. All possible connections are made between layers $i-1$ and $i$.
Is this network fully connected?
Neural Network Architecture
• Architecture: refers to which parameters (e.g. weights) are used in the network and their topological connectivity.
• Fully connected: a common connectivity pattern for multilayer perceptrons. All possible connections are made between layers $i-1$ and $i$.
Is this network fully connected?
[Diagram: the example network, with input and output labeled.]
How to Choose Network Architecture?
• Long discussion.
• Summary:
  • Rules of thumb do not work.
    • "Need 10x [or 30x] more training data than weights."
    • Not true if there is very low noise
    • Might need even more training data if there is high noise
  • Try many networks with different numbers of units and layers.
  • Check generalization using a validation dataset or cross-validation.
Overview
• Simple neural networks
  • Perceptron
  • Feedforward neural networks
  • Multilayer perceptron and properties
  • Autoencoders
• How to train neural networks
  • Gradient descent
  • Stochastic gradient descent
  • Automatic differentiation
  • Backpropagation
Autoencoders
• Learn the identity function $h_{\mathbf{w}}(\mathbf{x}) = \mathbf{x}$
• Is this supervised or unsupervised learning?
[Diagram: input layer → smaller hidden layer → output layer; the first half encodes, the second half decodes.]
Autoencoders
• Applications:
  • Dimensionality reduction
  • Learning manifolds
  • Hashing for search problems

[Diagram: input layer → smaller hidden layer → output layer; encode, then decode.]
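A minimal NumPy sketch of the encode/decode structure (untrained; the layer sizes and sigmoid activation are illustrative assumptions, not from the slides):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class Autoencoder:
    # Bottleneck architecture: input -> smaller hidden code -> reconstruction.
    def __init__(self, n_in, n_hidden, rng):
        self.W_enc = 0.1 * rng.standard_normal((n_hidden, n_in))
        self.b_enc = np.zeros(n_hidden)
        self.W_dec = 0.1 * rng.standard_normal((n_in, n_hidden))
        self.b_dec = np.zeros(n_in)

    def encode(self, x):
        return sigmoid(self.W_enc @ x + self.b_enc)   # compressed code

    def decode(self, h):
        return self.W_dec @ h + self.b_dec            # reconstruction

    def reconstruct(self, x):
        # Training would adjust the weights so that reconstruct(x) ~ x.
        return self.decode(self.encode(x))

rng = np.random.default_rng(0)
ae = Autoencoder(n_in=8, n_hidden=3, rng=rng)
x = rng.standard_normal(8)
print(ae.reconstruct(x))   # untrained, so not yet close to x
```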
Overview
• Simple neural networks
  • Perceptron
  • Feedforward neural networks
  • Multilayer perceptron and properties
  • Autoencoders
• How to train neural networks
  • Gradient descent
  • Stochastic gradient descent
  • Automatic differentiation
  • Backpropagation
Gradient Descent
$$\mathbf{x}_{n+1} = \mathbf{x}_n - \gamma_n \nabla F(\mathbf{x}_n)$$

$\gamma_n$: step size or learning rate
Discuss amongst students near you:
• What are some problems that could be easily optimized with gradient descent?
• Problems where this is difficult?
• Should the learning rate be constant or change?
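For concreteness, a minimal Python sketch of the update $\mathbf{x}_{n+1} = \mathbf{x}_n - \gamma_n \nabla F(\mathbf{x}_n)$ on a toy quadratic (the function, starting point, and constant learning rate are illustrative assumptions):

```python
import numpy as np

def gradient_descent(grad_F, x0, learning_rate=0.1, steps=100):
    # x_{n+1} = x_n - gamma * grad F(x_n)
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        x = x - learning_rate * grad_F(x)
    return x

# Example: F(x) = (x0 - 3)^2 + (x1 + 1)^2, minimized at (3, -1).
grad_F = lambda x: np.array([2 * (x[0] - 3), 2 * (x[1] + 1)])
print(gradient_descent(grad_F, x0=[0.0, 0.0]))   # approaches [3, -1]
```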
Gradient Descent with Energy Functions that have Narrow Valleys
Source: "Banana-SteepDesc" by P.A. Simionescu – Wikipedia English
“Zig-zagging problem”
Gradient Descent with Momentum
$$\Delta_n = \gamma_n \nabla F(\mathbf{x}_n) + m\,\Delta_{n-1}$$

$$\mathbf{x}_{n+1} = \mathbf{x}_n - \Delta_n$$

Momentum $m$: could use a small value, e.g. $m = 0.5$, at first, and a larger value, e.g. $m = 0.9$, near the end of training when there are more oscillations.
Gradient Descent with Momentum
[Figures: optimization trajectories without momentum (left) and with momentum (right).]
Figure from Genevieve B. Orr, Willamette.edu
Stochastic gradient descent
• Stochastic gradient descent (Wikipedia)
  • Gradient of a sum of n terms, where n is large
  • Sample rather than computing the full sum
• Sample size s is the "mini-batch size"
  • Could be 1 (very noisy gradient estimate)
  • Could be 100 (collect photos 100 at a time to find each noisy "next" estimate for the gradient)
• Use the same step as in gradient descent with the estimated gradient
Stochastic gradient descent
• Pseudocode:
From Wikipedia
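A minimal Python sketch matching that description (the function names and the least-squares example are illustrative assumptions, not the slide's exact pseudocode):

```python
import numpy as np

def sgd(grad_Qi, w0, data, learning_rate=0.01, epochs=10, rng=None):
    # Minimize Q(w) = sum_i Q_i(w) using noisy per-example gradients.
    rng = rng or np.random.default_rng()
    w = np.asarray(w0, dtype=float)
    for _ in range(epochs):
        for i in rng.permutation(len(data)):   # randomly shuffle the dataset
            w = w - learning_rate * grad_Qi(w, data[i])
    return w

# Example: least squares, Q_i(w) = (w . x_i - y_i)^2 over points (x_i, y_i).
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 2))
y = X @ np.array([2.0, -3.0])
data = list(zip(X, y))
grad_Qi = lambda w, xy: 2.0 * (w @ xy[0] - xy[1]) * xy[0]
print(sgd(grad_Qi, w0=np.zeros(2), data=data, rng=rng))   # near [2, -3]
```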
Problem Statement
• Take the gradient of an arbitrary program or model (e.g. a neural network) with respect to the parameters in the model (e.g. weights).
• If we can do this, we can use gradient descent!
Review: Chain Rule in One Dimension
• Suppose $f: \mathbb{R} \to \mathbb{R}$ and $g: \mathbb{R} \to \mathbb{R}$
• Define $h(x) = f(g(x))$
• Then what is $h'(x) = dh/dx$?

$$h'(x) = f'(g(x))\,g'(x)$$
Chain Rule in Multiple Dimensions
• Suppose $f: \mathbb{R}^m \to \mathbb{R}$ and $g: \mathbb{R}^n \to \mathbb{R}^m$, and $\mathbf{x} \in \mathbb{R}^n$
• Define $h(\mathbf{x}) = f(g_1(\mathbf{x}), \ldots, g_m(\mathbf{x}))$
• Then we can define partial derivatives using the multidimensional chain rule:

$$\frac{\partial f}{\partial x_i} = \sum_{j=1}^{m} \frac{\partial f}{\partial g_j}\,\frac{\partial g_j}{\partial x_i}$$
The Problem with Symbolic Derivatives
• What if our program takes 5 exponential operations:
$$y = \exp(\exp(\exp(\exp(\exp(x)))))$$
• What is dy/dx? (blackboard)
• How many exponential operations are in the resulting expression?
• What if the program contained n exponential operations?
  • (Expanded symbolically, the derivative contains $n + (n-1) + \cdots + 1 = O(n^2)$ exponential operations, which motivates reusing intermediate values instead.)
Solution: Automatic Differentiation (1960s, 1970s)
• Write an arbitrary program as consisting of basic operations $f_1, \ldots, f_n$ (e.g. +, −, *, cos, sin, …) that we know how to differentiate.
• Label the inputs of the program as $x_1, \ldots, x_n$, and the output $x_N$.
• The computation:

For $i = n+1, \ldots, N$:  $x_i = f_i(x_{\pi(i)})$

$\pi(i)$: sequence of "parent" values (e.g. if $\pi(3) = (1, 2)$ and $f_3 = +$, then $x_3 = x_1 + x_2$)

• Reverse-mode automatic differentiation: apply the chain rule from the end of the program $x_N$ back towards the beginning.

Explanation from Justin Domke
Solution for Simplified Chain of Dependencies

• Suppose $\pi(i) = i - 1$
• The computation: for $i = n+1, \ldots, N$:  $x_i = f_i(x_{i-1})$
• What is $\dfrac{dx_N}{dx_N}$?

For example: Input $x_1 \to x_2 = f_2(x_1) \to x_3 = f_3(x_2) \to$ Output
Solution for Simplified Chain of Dependencies

• Suppose $\pi(i) = i - 1$
• The computation: for $i = n+1, \ldots, N$:  $x_i = f_i(x_{i-1})$
• What is $\dfrac{dx_N}{dx_N}$?  $= 1$

For example: Input $x_1 \to x_2 = f_2(x_1) \to x_3 = f_3(x_2) \to$ Output
Solution for Simplified Chain of Dependencies

• Suppose $\pi(i) = i - 1$
• The computation: for $i = n+1, \ldots, N$:  $x_i = f_i(x_{i-1})$
• What is $\dfrac{dx_N}{dx_i}$ in terms of $\dfrac{dx_N}{dx_{i+1}}$?

For example: Input $x_1 \to x_2 = f_2(x_1) \to x_3 = f_3(x_2) \to$ Output

$$\frac{dx_N}{dx_i} = \frac{dx_N}{dx_{i+1}} \cdot \frac{\partial x_{i+1}}{\partial x_i}$$

(What is the last factor $\partial x_{i+1}/\partial x_i$? Just the local derivative $f_{i+1}'(x_i)$ of one basic operation.)
Solution for Simplified Chain of Dependencies

• Suppose $\pi(i) = i - 1$
• The computation: for $i = n+1, \ldots, N$:  $x_i = f_i(x_{i-1})$
• What is $\dfrac{dx_N}{dx_i}$ in terms of $\dfrac{dx_N}{dx_{i+1}}$?

$$\frac{dx_N}{dx_i} = \frac{dx_N}{dx_{i+1}} \cdot \frac{\partial x_{i+1}}{\partial x_i}$$

• Conclusion: run the computation forwards. Then initialize $\dfrac{dx_N}{dx_N} = 1$ and work backwards through the computation to find $\dfrac{dx_N}{dx_i}$ for each $i$ from $\dfrac{dx_N}{dx_{i+1}}$. This gives us the gradient of the output ($x_N$) with respect to every expression in our compute graph!

For example: Input $x_1 \to x_2 = f_2(x_1) \to x_3 = f_3(x_2) \to$ Output
What if the Dependency Graph is More Complex?

• The computation:

For $i = n+1, \ldots, N$:  $x_i = f_i(x_{\pi(i)})$

$\pi(i)$: sequence of "parent" values (e.g. if $\pi(3) = (1, 2)$ and $f_3 = +$, then $x_3 = x_1 + x_2$)

• Solution: apply the multidimensional chain rule.

[Diagram: Input $x_1 \to x_2 = f_2(x_1)$, which feeds both $x_3 = f_3(x_2)$ and $x_4 = f_4(x_2)$; these recombine at $x_5 = f_5(x_3, x_4) \to$ Output.]
Solution: Automatic Differentiation (1960s, 1970s)
• Computation: $x_i = f_i(x_{\pi(i)})$

• Multidimensional chain rule, for $f(g_1(\mathbf{x}), \ldots, g_m(\mathbf{x}))$:

$$\frac{\partial f}{\partial x_i} = \sum_{j=1}^{m} \frac{\partial f}{\partial g_j}\,\frac{\partial g_j}{\partial x_i}$$

• Result:

$$\frac{dx_N}{dx_i} = \sum_{j:\, i \in \pi(j)} \frac{dx_N}{dx_j}\,\frac{\partial x_j}{\partial x_i}$$

Explanation from Justin Domke
Solution: Automatic Differentiation (1960s, 1970s)
• Back-propagation algorithm: initialize

$$\frac{dx_N}{dx_N} = 1$$

• For $i = N-1, N-2, \ldots, 1$, compute:

$$\frac{dx_N}{dx_i} = \sum_{j:\, i \in \pi(j)} \frac{dx_N}{dx_j}\,\frac{\partial x_j}{\partial x_i}$$

• Now we have differentiated the output of the program $x_N$ with respect to the inputs $x_1, \ldots, x_n$, as well as every computed expression $x_i$.

(Example on blackboard for a program with one addition.)

Explanation from Justin Domke
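A minimal Python sketch of this algorithm on the branching graph from two slides back ($x_2 = f_2(x_1)$, $x_3 = f_3(x_2)$, $x_4 = f_4(x_2)$, $x_5 = f_5(x_3, x_4)$); the concrete operations (sin, squaring, cos, +) are illustrative assumptions, not from the slides:

```python
import math

# Program: x2 = sin(x1); x3 = x2^2; x4 = cos(x2); x5 = x3 + x4.
parents = {2: (1,), 3: (2,), 4: (2,), 5: (3, 4)}          # pi(i)
ops = {2: math.sin,
       3: lambda a: a * a,
       4: math.cos,
       5: lambda a, b: a + b}
local_grads = {2: lambda a: (math.cos(a),),                # dx_i/dx_parent
               3: lambda a: (2.0 * a,),
               4: lambda a: (-math.sin(a),),
               5: lambda a, b: (1.0, 1.0)}

# Forward pass.
x = {1: 0.7}                                               # input (arbitrary)
for i in range(2, 6):
    x[i] = ops[i](*(x[p] for p in parents[i]))

# Backward pass: dx_N/dx_i = sum over children j of (dx_N/dx_j)(dx_j/dx_i).
N = 5
grad = {i: 0.0 for i in x}
grad[N] = 1.0                                              # dx_N/dx_N = 1
for j in range(N, 1, -1):                                  # reverse order
    partials = local_grads[j](*(x[p] for p in parents[j]))
    for p, d in zip(parents[j], partials):
        grad[p] += grad[j] * d                             # accumulate per parent

print(grad[1])   # dx5/dx1 = (2 sin(x1) - sin(sin(x1))) cos(x1)
```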
Backpropagation Algorithm (1960s–1980s)

• Apply reverse-mode automatic differentiation to a neural network's loss function.
• A special case of what we just derived.
• If we have one output neuron, the squared error is $E = \tfrac{1}{2}(t - y)^2$, where $t$ is the target output and $y$ is the actual output.
From Wikipedia
Backpropagation Algorithm (1960s–1980s)

• Notation: $w_{ij}$ is the weight from neuron $i$ to neuron $j$; each neuron $j$ has input sum $\text{net}_j = \sum_i w_{ij} o_i$ and output $o_j = \varphi(\text{net}_j)$, where $\varphi$ is the activation function.

From Wikipedia
Backpropagation Algorithm (1960s–1980s)

• Apply the chain rule twice:

$$\frac{\partial E}{\partial w_{ij}} = \frac{\partial E}{\partial o_j}\,\frac{\partial o_j}{\partial \text{net}_j}\,\frac{\partial \text{net}_j}{\partial w_{ij}}$$

• Last term is easy: $\dfrac{\partial \text{net}_j}{\partial w_{ij}} = o_i$

From Wikipedia
Backpropagation Algorithm (1960s–1980s)

• Apply the chain rule twice:

$$\frac{\partial E}{\partial w_{ij}} = \frac{\partial E}{\partial o_j}\,\frac{\partial o_j}{\partial \text{net}_j}\,\frac{\partial \text{net}_j}{\partial w_{ij}}$$

• Second term is easy:

$$\frac{\partial o_j}{\partial \text{net}_j} = \varphi'(\text{net}_j)$$

From Wikipedia
Backpropagation Algorithm (1960s–1980s)

• Apply the chain rule twice:

$$\frac{\partial E}{\partial w_{ij}} = \frac{\partial E}{\partial o_j}\,\frac{\partial o_j}{\partial \text{net}_j}\,\frac{\partial \text{net}_j}{\partial w_{ij}}$$

• If the neuron is in the output layer, the first term is easy:

$$\frac{\partial E}{\partial o_j} = o_j - t_j$$

(Derivation on board)

From Wikipedia
Backpropagation Algorithm (1960s–1980s)

• Apply the chain rule twice:

$$\frac{\partial E}{\partial w_{ij}} = \frac{\partial E}{\partial o_j}\,\frac{\partial o_j}{\partial \text{net}_j}\,\frac{\partial \text{net}_j}{\partial w_{ij}}$$

• If the neuron is an interior neuron, we use the chain rule from automatic differentiation.
  • To do this, we need to know: which expressions depend on the current neuron's output $o_j$?
  • Answer: the other neurons' input sums, i.e. $\text{net}_\ell$ for all neurons $\ell$ receiving inputs from the current neuron.

From Wikipedia
Backpropagation Algorithm (1960s–1980s)

• Apply the chain rule twice:

$$\frac{\partial E}{\partial w_{ij}} = \frac{\partial E}{\partial o_j}\,\frac{\partial o_j}{\partial \text{net}_j}\,\frac{\partial \text{net}_j}{\partial w_{ij}}$$

• If the neuron is an interior neuron, chain rule:

$$\frac{\partial E}{\partial o_j} = \sum_{\ell \in L} \frac{\partial E}{\partial \text{net}_\ell}\,\frac{\partial \text{net}_\ell}{\partial o_j} = \sum_{\ell \in L} \delta_\ell\, w_{j\ell}$$

($L$: all neurons receiving input from the current neuron.)

From Wikipedia
Backpropagation Algorithm (1960s–1980s)

• Partial derivative of error E with respect to weight $w_{ij}$:

$$\frac{\partial E}{\partial w_{ij}} = \delta_j\, o_i$$

where

$$\delta_j = \frac{\partial E}{\partial o_j}\,\frac{\partial o_j}{\partial \text{net}_j} = \varphi'(\text{net}_j) \cdot \begin{cases} o_j - t_j & \text{if } j \text{ is an output neuron} \\ \displaystyle\sum_{\ell \in L} \delta_\ell\, w_{j\ell} & \text{if } j \text{ is an interior neuron} \end{cases}$$

($L$: all neurons receiving input from the current neuron.)

From Wikipedia
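As a concrete illustration (not from the slides), a minimal NumPy sketch of these two formulas for fully connected layers, assuming a sigmoid activation so that $\varphi'(\text{net}_j) = o_j(1 - o_j)$:

```python
import numpy as np

def output_layer_deltas(o, t):
    # delta_j = phi'(net_j) (o_j - t_j); for sigmoid, phi'(net_j) = o_j (1 - o_j).
    return o * (1.0 - o) * (o - t)

def interior_layer_deltas(o, W_next, delta_next):
    # delta_j = phi'(net_j) * sum_l delta_l w_{jl}, where W_next[l, j] is the
    # weight from neuron j into neuron l of the next layer.
    return o * (1.0 - o) * (W_next.T @ delta_next)

def weight_gradients(delta, o_prev):
    # dE/dw_{ij} = delta_j o_i, arranged as a matrix matching W.
    return np.outer(delta, o_prev)

# Tiny example: 3 hidden outputs feeding 2 output neurons (values made up).
o_hidden = np.array([0.2, 0.7, 0.5])
o_out = np.array([0.9, 0.1])
t = np.array([1.0, 0.0])
W_out = 0.5 * np.ones((2, 3))

delta_out = output_layer_deltas(o_out, t)
delta_hidden = interior_layer_deltas(o_hidden, W_out, delta_out)
print(weight_gradients(delta_out, o_hidden))   # dE/dW for the output layer
```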
Backpropagation Algorithm (1960s–1980s)

• Forward direction: calculate the network outputs and the error.
Backpropagation Algorithm (1960s–1980s)

• Backward direction: backpropagate from output to input, recursively computing $\dfrac{\partial E}{\partial w_{ij}}$ for every weight; together these entries form the gradient $\nabla_{\mathbf{w}} E$.
Gradient Descent with Backpropagation
$$\mathbf{w}_{n+1} = \mathbf{w}_n - \gamma_n \nabla_{\mathbf{w}} E(\mathbf{w}_n) \qquad (1)$$

$\gamma_n$: step size or learning rate

• Initialize weights at a good starting point $\mathbf{w}_0$
• Repeatedly apply the gradient descent step (1)
• Continue training until validation error hits a minimum.
Stochastic Gradient Descent with Backpropagation
$$\mathbf{w}_{n+1} = \mathbf{w}_n - \gamma_n \nabla_{\mathbf{w}} E_i(\mathbf{w}_n) \qquad (1)$$

$\gamma_n$: step size or learning rate

• Initialize weights at a good starting point $\mathbf{w}_0$
• Repeat until validation error hits a minimum:
  • Randomly shuffle the dataset
  • Loop through mini-batches of data, with batch index $i$
  • Calculate the stochastic gradient for each mini-batch using backpropagation, and apply the update rule (1)
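Putting the pieces together, here is a minimal NumPy sketch of stochastic gradient descent with backpropagation for a two-layer sigmoid network; the XOR dataset, layer sizes, and learning rate are illustrative assumptions, not from the slides:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# XOR dataset: not linearly separable, so a hidden layer is needed.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)

rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((4, 2)), np.zeros(4)   # 2 inputs -> 4 hidden
W2, b2 = rng.standard_normal((1, 4)), np.zeros(1)   # 4 hidden -> 1 output
gamma = 0.5                                         # learning rate

for epoch in range(5000):
    for i in rng.permutation(len(X)):               # shuffle; batch size 1
        # Forward pass.
        o0 = X[i]
        o1 = sigmoid(W1 @ o0 + b1)
        o2 = sigmoid(W2 @ o1 + b2)
        # Backward pass: deltas from the previous slides (sigmoid phi').
        d2 = o2 * (1 - o2) * (o2 - T[i])
        d1 = o1 * (1 - o1) * (W2.T @ d2)
        # Update rule (1): w <- w - gamma * dE/dw, with dE/dw_{ij} = delta_j o_i.
        W2 -= gamma * np.outer(d2, o1); b2 -= gamma * d2
        W1 -= gamma * np.outer(d1, o0); b1 -= gamma * d1

out = sigmoid(W2 @ sigmoid(W1 @ X.T + b1[:, None]) + b2[:, None])
print(out.round(2))   # should approach [[0, 1, 1, 0]] if training succeeded
```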