Deep Neural Networks
Scanning for patterns (aka convolutional networks)
Bhiksha Raj
Story so far
• MLPs are universal function approximators
– Boolean functions, classifiers, and regressions
• MLPs can be trained through variations of
gradient descent
– Gradients can be computed by backpropagation
The model so far
• Can recognize patterns in data
– E.g. digits
– Or any other vector data
Or, more generally, a vector input
An important observation
• The lowest layers of the network capture simple patterns
– The linear decision boundaries in this example
• The next layer captures more complex patterns
– The polygons
• The next one captures still more complex patterns...
[Figure: two AND units over inputs x1 and x2, combined by an OR unit]
An important observation
• The neurons in an MLP build up complex patterns from simple patterns, hierarchically
– Each layer learns to “detect” simple combinations of the patterns detected by earlier layers
• This is because the basic units themselves are simple
– Typically linear classifiers or thresholding units
– Incapable of individually holding complex patterns
What do the neurons capture?
• To understand the behavior of neurons in the network, let's consider an individual perceptron
– The perceptron is fully represented by its weights
– For illustration, we consider a simple threshold activation
• What do the weights tell us?
– The perceptron “fires” if the inner product between the weights and the inputs exceeds a threshold
With inputs $x_1, \ldots, x_N$ and weights $w_1, \ldots, w_N$:

$$y = \begin{cases} 1 & \text{if } \sum_i w_i x_i \ge T \\ 0 & \text{else} \end{cases} \qquad \text{equivalently} \qquad y = \begin{cases} 1 & \text{if } \mathbf{x}^\top \mathbf{w} \ge T \\ 0 & \text{else} \end{cases}$$
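The thresholding rule above can be sketched in a few lines; the vectors and threshold below are a made-up numerical example, not from the slides:

```python
import numpy as np

def perceptron_fires(x, w, T):
    """Threshold unit: fires (returns 1) iff the inner product x.w reaches T."""
    return 1 if np.dot(x, w) >= T else 0

x = np.array([1.0, 0.0, 1.0])
w = np.array([0.5, -0.2, 0.5])
print(perceptron_fires(x, w, T=0.9))  # x.w = 1.0 >= 0.9: fires (prints 1)
print(perceptron_fires(x, w, T=1.1))  # 1.0 < 1.1: does not fire (prints 0)
```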
The weight as a “template”
• A perceptron fires if its input is within a specified angle of its weight
– Represents a convex region on the surface of the sphere!
• I.e. the perceptron fires if the input vector is close enough to the weight vector
– If the input pattern matches the weight pattern closely enough
$$\mathbf{x}^\top \mathbf{w} > T \;\Rightarrow\; \cos\theta > \frac{T}{\|\mathbf{x}\|\,\|\mathbf{w}\|} \;\Rightarrow\; \theta < \cos^{-1}\frac{T}{\|\mathbf{x}\|\,\|\mathbf{w}\|}$$

where $\theta$ is the angle between the input $\mathbf{x}$ and the weight vector $\mathbf{w}$.
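As a sanity check, the equivalence above can be verified numerically on random vectors (a sketch; the threshold is chosen so that $T \le \|\mathbf{x}\|\|\mathbf{w}\|$ and the arccosine is defined):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(8)
w = rng.standard_normal(8)
T = 0.5  # assumed threshold, small enough that T <= |x||w|

inner = float(x @ w)
nx, nw = np.linalg.norm(x), np.linalg.norm(w)
theta = np.arccos(inner / (nx * nw))   # angle between x and w
theta_max = np.arccos(T / (nx * nw))   # half-angle of the firing cone

# The dot-product test and the angle test are the same condition
assert (inner > T) == (theta < theta_max)
```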
The weights as a correlation filter
• If the correlation between the weight pattern and the inputs exceeds a threshold, fire
• The perceptron is a correlation filter!
[Figure: the weight template W compared against two input patterns X, with correlations 0.57 and 0.82; the unit fires when $\sum_i w_i x_i \ge T$]
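A minimal sketch of the correlation-filter view, with a hypothetical template and two made-up input patterns (the 0.57 and 0.82 values in the figure are not reproduced here):

```python
import numpy as np

def correlation(w, x):
    """Normalized correlation between a weight template and an input pattern."""
    return float(w @ x / (np.linalg.norm(w) * np.linalg.norm(x)))

w = np.array([1.0, 1.0, 0.0, 0.0])        # hypothetical weight template
x_match = np.array([0.9, 1.1, 0.1, 0.0])  # resembles the template
x_other = np.array([0.0, 0.1, 1.0, 0.9])  # does not

T = 0.8
print(correlation(w, x_match) >= T)  # True: the filter fires
print(correlation(w, x_other) >= T)  # False: it does not
```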
The MLP as a Boolean function over feature detectors
• The input layer comprises “feature detectors”
– Detect if certain patterns have occurred in the input
• The network is a Boolean function over the feature detectors
• I.e. it is important for the first layer to capture relevant patterns
DIGIT OR NOT?
The MLP as a cascade of feature detectors
• The network is a cascade of feature detectors
– Higher-level neurons compose complex templates from features represented by lower-level neurons
• They OR or AND the patterns from the lower layer
Story so far
• MLPs are Boolean machines
– They represent Boolean functions over linear boundaries
– They can represent arbitrary boundaries
• Perceptrons are correlation filters
– They detect patterns in the input
• Layers in an MLP are detectors of increasingly complex patterns
– Patterns of lower-complexity patterns
• MLP in classification
– The network will fire if the combination of the detected basic features matches an “acceptable” pattern for a desired class of signal
• E.g. appropriate combinations of (Nose, Eyes, Eyebrows, Cheek, Chin) → Face
Changing gears..
A problem
• Does this signal contain the word “Welcome”?
• Compose an MLP for this problem.
– Assuming all recordings are exactly the same length..
Finding a Welcome
• Trivial solution: Train an MLP for the entire recording
Finding a Welcome
• Problem with trivial solution: Network that finds a “welcome” in the top recording will not find it in the lower one
– Unless trained with both
– Will require a very large network and a large amount of training data to cover every case
Finding a Welcome
• Need a simple network that will fire regardless
of the location of “Welcome”
– and not fire when there is none
Flowers
• Is there a flower in any of these images?
A problem
• Will an MLP that recognizes the left image as a flower
also recognize the one on the right as a flower?
A problem
• Need a network that will “fire” regardless of
the precise location of the target object
The need for shift invariance
• In many problems the location of a pattern is not important
– Only the presence of the pattern
• Conventional MLPs are sensitive to the location of the pattern
– Moving it by one component results in an entirely different input that the MLP won't recognize
• Requirement: Network must be shift invariant
Solution: Scan
• Scan for the target word by sliding a window across the recording
– The spectral time-frequency components in each “window” are input to a “welcome-detector” MLP
Solution: Scan
• “Does welcome occur in this recording?”
– We have classified many “windows” individually
– “Welcome” may have occurred in any of them
Solution: Scan
• “Does welcome occur in this recording?”
– Maximum of all the outputs (Equivalent of Boolean OR)
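The scan-and-max idea can be sketched as follows; `score_fn` is a hypothetical stand-in for the trained “welcome detector” MLP (here just mean window energy, purely for illustration):

```python
import numpy as np

def scan_and_max(signal, window, score_fn, hop=1):
    """Apply the same (shared) window classifier at every position and
    take the maximum score: the equivalent of a Boolean OR over windows."""
    scores = [score_fn(signal[t:t + window])
              for t in range(0, len(signal) - window + 1, hop)]
    return max(scores)

# Stand-in for a trained "welcome detector" MLP: mean window energy.
score_fn = lambda win: float(np.mean(win ** 2))

signal = np.zeros(100)
signal[40:60] = 1.0   # a target-like burst somewhere in the recording
print(scan_and_max(signal, window=20, score_fn=score_fn))  # 1.0: one window covers it
```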
Solution: Scan
• “Does welcome occur in this recording?”
– Maximum of all the outputs (Equivalent of Boolean OR)
– Or a proper softmax/logistic
• Finding a “welcome” in several adjacent windows makes it more likely that the detection is not noise
Solution: Scan
• “Does welcome occur in this recording?”
– Maximum of all the outputs (Equivalent of Boolean OR)
– Or a proper softmax/logistic
• Adjacent windows can combine their evidence
– Or even an MLP
Solution: Scan
• The entire operation can be viewed as one giant network
– With many subnetworks, one per window
– Restriction: all subnets are identical
The 2-d analogue: Does this picture have a flower?
• Scan for the desired object
– “Look” for the target object at each position
Scanning
• Scan for the desired object
• At each location, the entire region is sent
through an MLP
Scanning the picture to find a flower
• Determine if any of the locations had a flower
– We get one classification output per scanned location
• The score output by the MLP
– Look at the maximum value
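The 2-D analogue can be sketched the same way; the patch scorer below is a hypothetical stand-in for the flower-detector MLP:

```python
import numpy as np

def scan_image(image, patch, score_fn, stride=1):
    """Evaluate the same (shared) classifier at every patch location;
    the image-level decision looks at the max over all locations."""
    H, W = image.shape
    best = -np.inf
    for i in range(0, H - patch + 1, stride):
        for j in range(0, W - patch + 1, stride):
            best = max(best, score_fn(image[i:i + patch, j:j + patch]))
    return best

# Stand-in for a trained "flower detector": total patch brightness.
score = scan_image(np.eye(8), patch=3, score_fn=lambda p: float(p.sum()))
print(score)  # 3.0: the best 3x3 patch sits on the diagonal
```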
It's just a giant network with common subnets
• Determine if any of the locations had a flower
– We get one classification output per scanned location
• The score output by the MLP
– Look at the maximum value
– Or pass it through an MLP
It's just a giant network with common subnets
• The entire operation can be viewed as a single giant network
– Composed of many “subnets” (one per window)
– With one key feature: all subnets are identical
Training the network
• These are really just large networks
• Can just use conventional backpropagation to learn the parameters
– Provide many training examples
• Images with and without flowers
• Speech recordings with and without the word “welcome”
– Gradient descent to minimize the total divergence between predicted and desired outputs
• Backprop learns a network that maps the training inputs to the target binary outputs
Training the network: constraint
• These are shared parameter networks
– All lower-level subnets are identical
• Are all searching for the same pattern
– Any update of the parameters of one copy of the
subnet must equally update all copies
Learning in shared parameter networks
• Consider a simple network with shared weights: $w_{ij}^{(k)} = w_{mn}^{(l)} = w_{\mathcal{S}}$
– The weight $w_{ij}^{(k)}$ is required to be identical to the weight $w_{mn}^{(l)}$
• For any training instance $\mathbf{X}$, a small perturbation of $w_{\mathcal{S}}$ perturbs both $w_{ij}^{(k)}$ and $w_{mn}^{(l)}$ identically
– Each of these perturbations will individually influence the divergence $Div(d, y)$
Computing the divergence of shared parameters
$$\frac{dDiv}{dw_{\mathcal{S}}} = \frac{dDiv}{dw_{ij}^{(k)}}\frac{dw_{ij}^{(k)}}{dw_{\mathcal{S}}} + \frac{dDiv}{dw_{mn}^{(l)}}\frac{dw_{mn}^{(l)}}{dw_{\mathcal{S}}} = \frac{dDiv}{dw_{ij}^{(k)}} + \frac{dDiv}{dw_{mn}^{(l)}}$$

• Each of the individual terms can be computed via backpropagation

[Influence diagram: $w_{\mathcal{S}}$ determines both $w_{ij}^{(k)}$ and $w_{mn}^{(l)}$, which in turn influence $Div(d, y)$]
Computing the divergence of shared parameters
• More generally, let $\mathcal{S} = \{e_1, e_2, \ldots, e_N\}$ be any set of edges that have a common value, and $w_{\mathcal{S}}$ be the common weight of the set
– E.g. the set of all red weights in the figure

$$\frac{dDiv}{dw_{\mathcal{S}}} = \sum_{e \in \mathcal{S}} \frac{dDiv}{dw_e}$$

• The individual terms in the sum can be computed via backpropagation
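The identity above can be checked numerically on a toy divergence with two tied copies of one weight; finite differences stand in for backprop in this sketch:

```python
# Toy "network": a divergence that depends on two copies (w1, w2) of one
# shared weight wS. Finite differences stand in for backprop here.
div = lambda w1, w2: (2.0 * w1 + 3.0 * w2 - 1.0) ** 2

wS, eps = 0.4, 1e-6

# Per-copy partials (what backprop would return for each edge separately)
d_w1 = (div(wS + eps, wS) - div(wS - eps, wS)) / (2 * eps)
d_w2 = (div(wS, wS + eps) - div(wS, wS - eps)) / (2 * eps)

# Derivative w.r.t. the shared value: perturb both copies together
d_wS = (div(wS + eps, wS + eps) - div(wS - eps, wS - eps)) / (2 * eps)

assert abs(d_wS - (d_w1 + d_w2)) < 1e-4   # dDiv/dwS = sum of per-copy partials
```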
Standard gradient descent training of networks
• Total training error:

$$Err = \sum_{t} Div(\mathbf{Y}_t, \mathbf{d}_t; \mathbf{W}_1, \mathbf{W}_2, \ldots, \mathbf{W}_K)$$

• Gradient descent algorithm:
– Initialize all weights $\mathbf{W}_1, \mathbf{W}_2, \ldots, \mathbf{W}_K$
– Do:
• For every layer $k$, for all $i, j$, update: $w_{i,j}^{(k)} = w_{i,j}^{(k)} - \eta \dfrac{dErr}{dw_{i,j}^{(k)}}$
– Until $Err$ has converged
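A minimal instance of this update loop on a toy one-parameter error (an assumed example, not from the slides):

```python
# Gradient descent on a toy error Err(w) = (w - 3)^2,
# whose derivative is dErr/dw = 2 (w - 3).
w, eta = 0.0, 0.1
for _ in range(200):
    w -= eta * 2 * (w - 3)
print(round(w, 4))  # 3.0: the updates converge to the minimum
```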
Training networks with shared parameters
• Gradient descent algorithm:
– Initialize all weights $\mathbf{W}_1, \mathbf{W}_2, \ldots, \mathbf{W}_K$
– Do:
• For every set $\mathcal{S}$, compute:

$$\nabla_{\mathcal{S}} Err = \frac{dErr}{dw_{\mathcal{S}}}, \qquad w_{\mathcal{S}} = w_{\mathcal{S}} - \eta \nabla_{\mathcal{S}} Err$$

• For every $(k, i, j) \in \mathcal{S}$ update: $w_{i,j}^{(k)} = w_{\mathcal{S}}$
– Until $Err$ has converged
Training networks with shared parameters
• Gradient descent algorithm:
– Initialize all weights $\mathbf{W}_1, \mathbf{W}_2, \ldots, \mathbf{W}_K$
– Do:
• For every training instance $X$, for every set $\mathcal{S}$, accumulate (each term computed by backprop):

$$\nabla_{\mathcal{S}} Div = \sum_{(k,i,j) \in \mathcal{S}} \frac{dDiv}{dw_{i,j}^{(k)}}, \qquad \nabla_{\mathcal{S}} Err \mathrel{+}= \nabla_{\mathcal{S}} Div$$

• For every set $\mathcal{S}$, update the shared weight: $w_{\mathcal{S}} = w_{\mathcal{S}} - \eta \nabla_{\mathcal{S}} Err$
• For every $(k, i, j) \in \mathcal{S}$ set: $w_{i,j}^{(k)} = w_{\mathcal{S}}$
– Until $Err$ has converged
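One update step of this shared-parameter scheme might look like the sketch below, with made-up weights, gradients, and tying groups:

```python
import numpy as np

def shared_update(weights, grads, groups, lr=0.1):
    """One step for a shared-parameter network: for each tied group, sum the
    per-copy gradients, step the shared value, and copy it back to every member."""
    for group in groups:                       # each group lists tied indices
        g = sum(grads[i] for i in group)       # dErr/dwS = sum over the group
        new_val = weights[group[0]] - lr * g   # all members hold the same value
        for i in group:
            weights[i] = new_val
    return weights

w = np.array([0.5, 0.5, -0.2])   # w[0] and w[1] are two copies of one shared weight
g = np.array([0.1, 0.3, 0.2])    # per-copy gradients, as backprop would report them
w = shared_update(w, g, groups=[[0, 1], [2]])
print(w)  # tied copies move together: 0.5 - 0.1*(0.1 + 0.3) = 0.46 for both
```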
Story so far
• Position-invariant pattern classification can be performed by scanning
– 1-D scanning for sound
– 2-D scanning for images
– 3-D and higher-dimensional scans for higher dimensional data
• Scanning is equivalent to composing a large network with repeating subnets
– The large network has shared subnets
• Learning in scanned networks: Backpropagation rules must be modified to combine gradients from parameters that share the same value
– The principle applies in general for networks with shared parameters
Scanning: A closer look
• Scan for the desired object
• At each location, the entire region is sent
through an MLP
Scanning: A closer look
• The “input layer” is just the pixels in the image
connecting to the hidden layer
Scanning: A closer look
• Consider a single neuron
Scanning: A closer look
• Consider a single perceptron
• At each position of the box, the perceptron is evaluating the part of the picture in the box as part of the classification for that region
– We could arrange the outputs of the neurons for each position correspondingly to the original picture
activation = Σ_{i,j} w_{ij} p_{ij} + b
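The scan can be sketched as a minimal numpy loop (an illustrative toy, not the lecture's code): one perceptron's weights are applied to the pixels at every box position, and the activations are arranged into a map by position:

```python
import numpy as np

# Sketch of the scan: one perceptron with weights w (over a K x K box)
# and bias b is evaluated at every position of the image, computing
#   activation = sum_{i,j} w[i,j] * p[i,j] + b
# for the pixels p in the box. Arranging the results by position gives
# a map covering all scan positions.

def scan_perceptron(image, w, b):
    H, W = image.shape
    K = w.shape[0]
    out = np.empty((H - K + 1, W - K + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            patch = image[r:r + K, c:c + K]
            out[r, c] = np.sum(w * patch) + b
    return out

image = np.arange(25.0).reshape(5, 5)   # toy 5x5 "picture"
w = np.ones((3, 3)) / 9.0               # toy weights: a local average
amap = scan_perceptron(image, w, b=0.0)
print(amap.shape)                        # (3, 3)
```

A 5x5 image scanned with a 3x3 box yields a 3x3 activation map, one value per box position.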
![Page 79: Deep Neural Networks Scanning for patterns (aka ... · –Gradients can be computed by backpropagation 2. input layer output layer The model so far •Can recognize patterns in data](https://reader034.vdocuments.mx/reader034/viewer/2022052009/601deb5e55dee44384301465/html5/thumbnails/79.jpg)
Scanning: A closer look
• Consider a single perceptron
• At each position of the box, the perceptron is evaluating the picture as part of the classification for that region
– We could arrange the outputs of the neurons for each position correspondingly to the original picture
• Eventually, we can arrange the responses from every scanned position into a rectangle whose size is proportional to the original picture
![Page 81: Deep Neural Networks Scanning for patterns (aka ... · –Gradients can be computed by backpropagation 2. input layer output layer The model so far •Can recognize patterns in data](https://reader034.vdocuments.mx/reader034/viewer/2022052009/601deb5e55dee44384301465/html5/thumbnails/81.jpg)
Scanning: A closer look
• Similarly, each perceptron’s outputs from each of the scanned positions can be arranged as a rectangular pattern
![Page 82: Deep Neural Networks Scanning for patterns (aka ... · –Gradients can be computed by backpropagation 2. input layer output layer The model so far •Can recognize patterns in data](https://reader034.vdocuments.mx/reader034/viewer/2022052009/601deb5e55dee44384301465/html5/thumbnails/82.jpg)
Scanning: A closer look
• To classify a specific “patch” in the image, we send the first-level activations from the positions corresponding to that patch to the next layer
![Page 83: Deep Neural Networks Scanning for patterns (aka ... · –Gradients can be computed by backpropagation 2. input layer output layer The model so far •Can recognize patterns in data](https://reader034.vdocuments.mx/reader034/viewer/2022052009/601deb5e55dee44384301465/html5/thumbnails/83.jpg)
Scanning: A closer look
• We can recurse the logic
– The second level neurons too are “scanning” the rectangular outputs of the first-level neurons
– (Un)like the first level, they are jointly scanning multiple “pictures”
• Each location in the output of the second level neuron considers the corresponding locations from the outputs of all the first-level neurons
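The joint scan over multiple first-level maps can be sketched as follows; this is an illustrative numpy toy (random weights and hypothetical shapes), where the second-level neuron combines the corresponding location of every first-level map:

```python
import numpy as np

# Sketch: the second-level neuron looks, at every location, at the
# corresponding location of ALL first-level maps (one weight per map),
# so it "jointly scans multiple pictures" and produces its own map.

rng = np.random.default_rng(0)
maps = rng.standard_normal((4, 6, 6))   # 4 first-level maps, each 6x6
w2 = rng.standard_normal(4)             # one weight per first-level map
b2 = 0.1

# For each location (r, c), combine maps[:, r, c] with w2:
out = np.tensordot(w2, maps, axes=(0, 0)) + b2   # 6x6 second-level map
print(out.shape)                         # (6, 6)
```

The output is again a map the size of the first-level maps, so the same logic recurses layer by layer.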
![Page 91: Deep Neural Networks Scanning for patterns (aka ... · –Gradients can be computed by backpropagation 2. input layer output layer The model so far •Can recognize patterns in data](https://reader034.vdocuments.mx/reader034/viewer/2022052009/601deb5e55dee44384301465/html5/thumbnails/91.jpg)
Scanning: A closer look
• To detect a picture at any location in the original image, the output layer must consider the corresponding outputs of the last hidden layer
![Page 92: Deep Neural Networks Scanning for patterns (aka ... · –Gradients can be computed by backpropagation 2. input layer output layer The model so far •Can recognize patterns in data](https://reader034.vdocuments.mx/reader034/viewer/2022052009/601deb5e55dee44384301465/html5/thumbnails/92.jpg)
Detecting a picture anywhere in the image?
• Recursing the logic, we can create a map for the neurons in the next layer as well
– The map is a flower detector for each location of the original image
![Page 93: Deep Neural Networks Scanning for patterns (aka ... · –Gradients can be computed by backpropagation 2. input layer output layer The model so far •Can recognize patterns in data](https://reader034.vdocuments.mx/reader034/viewer/2022052009/601deb5e55dee44384301465/html5/thumbnails/93.jpg)
Detecting a picture anywhere in the image?
• To detect a picture at any location in the original image, the output layer must consider the corresponding output of the last hidden layer
• The actual problem: is there a flower in the image?
– Not “detect the location of a flower”
![Page 95: Deep Neural Networks Scanning for patterns (aka ... · –Gradients can be computed by backpropagation 2. input layer output layer The model so far •Can recognize patterns in data](https://reader034.vdocuments.mx/reader034/viewer/2022052009/601deb5e55dee44384301465/html5/thumbnails/95.jpg)
Detecting a picture anywhere in the image?
• Is there a flower in the picture?
• The output of the last hidden layer is also a grid/picture
• The entire grid can be sent into a final neuron that performs a logical “OR” to detect a picture
– Finds the max output from all the positions
– Or..
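The max-as-OR reduction above can be sketched with toy values (purely illustrative):

```python
import numpy as np

# Sketch: when the question is only "is there a flower ANYWHERE in the
# picture", the per-position detection map can be reduced by a max,
# which acts as a soft logical OR over all scanned positions.

detection_map = np.array([[0.1, 0.2, 0.1],
                          [0.1, 0.9, 0.3],    # strong response here
                          [0.2, 0.1, 0.1]])

score = detection_map.max()    # OR over positions
print(score > 0.5)             # True: a flower somewhere in the image
```

One strong local response is enough to drive the global decision, regardless of where in the map it occurs.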
![Page 96: Deep Neural Networks Scanning for patterns (aka ... · –Gradients can be computed by backpropagation 2. input layer output layer The model so far •Can recognize patterns in data](https://reader034.vdocuments.mx/reader034/viewer/2022052009/601deb5e55dee44384301465/html5/thumbnails/96.jpg)
Detecting a picture in the image
• Redrawing the final layer
– “Flatten” the output of the neurons into a single block, since the arrangement is no longer important
– Pass that through an MLP
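The flatten-then-MLP step can be sketched as follows (random placeholder weights, not trained values; shapes are hypothetical):

```python
import numpy as np

# Sketch: "flatten" the final map into a vector (the arrangement no
# longer matters for a yes/no answer) and pass it through a small MLP.
# All weights here are random placeholders, not trained values.

rng = np.random.default_rng(1)
final_map = rng.random((3, 3))         # per-position detector outputs

x = final_map.ravel()                  # flatten: a 9-vector
W1, b1 = rng.standard_normal((5, 9)), np.zeros(5)
W2, b2 = rng.standard_normal(5), 0.0

h = np.maximum(0.0, W1 @ x + b1)       # hidden layer (ReLU)
score = W2 @ h + b2                    # single "flower present?" output
print(score.shape)                     # () -- a scalar
```

After flattening, this final stage is just an ordinary MLP over a fixed-length vector.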
![Page 97: Deep Neural Networks Scanning for patterns (aka ... · –Gradients can be computed by backpropagation 2. input layer output layer The model so far •Can recognize patterns in data](https://reader034.vdocuments.mx/reader034/viewer/2022052009/601deb5e55dee44384301465/html5/thumbnails/97.jpg)
Generalizing a bit
• At each location, the net searches for a flower
• The entire map of outputs is sent through a follow-up perceptron (or MLP) to determine if there really is a flower in the picture
![Page 98: Deep Neural Networks Scanning for patterns (aka ... · –Gradients can be computed by backpropagation 2. input layer output layer The model so far •Can recognize patterns in data](https://reader034.vdocuments.mx/reader034/viewer/2022052009/601deb5e55dee44384301465/html5/thumbnails/98.jpg)
Generalizing a bit
• The final objective is to determine if the picture has a flower
• No need to use only one MLP to scan the image
– Could use multiple MLPs..
– Or a single larger MLP with multiple outputs
• Each providing independent evidence of the presence of a flower
![Page 100: Deep Neural Networks Scanning for patterns (aka ... · –Gradients can be computed by backpropagation 2. input layer output layer The model so far •Can recognize patterns in data](https://reader034.vdocuments.mx/reader034/viewer/2022052009/601deb5e55dee44384301465/html5/thumbnails/100.jpg)
For simplicity..
• We will continue to assume the simple version of the model for the sake of explanation
![Page 101: Deep Neural Networks Scanning for patterns (aka ... · –Gradients can be computed by backpropagation 2. input layer output layer The model so far •Can recognize patterns in data](https://reader034.vdocuments.mx/reader034/viewer/2022052009/601deb5e55dee44384301465/html5/thumbnails/101.jpg)
Recall: What does an MLP learn?
• The lowest layers of the network capture simple patterns
– The linear decision boundaries in this example
• The next layer captures more complex patterns
– The polygons
• The next one captures still more complex patterns..
(Figure: linear boundaries combined by AND into polygons, then OR into the final decision region, over inputs x1, x2)
![Page 102: Deep Neural Networks Scanning for patterns (aka ... · –Gradients can be computed by backpropagation 2. input layer output layer The model so far •Can recognize patterns in data](https://reader034.vdocuments.mx/reader034/viewer/2022052009/601deb5e55dee44384301465/html5/thumbnails/102.jpg)
Recall: How does an MLP represent patterns
• The neurons in an MLP build up complex patterns from simple patterns hierarchically
– Each layer learns to “detect” simple combinations of the patterns detected by earlier layers
DIGIT OR NOT?
![Page 103: Deep Neural Networks Scanning for patterns (aka ... · –Gradients can be computed by backpropagation 2. input layer output layer The model so far •Can recognize patterns in data](https://reader034.vdocuments.mx/reader034/viewer/2022052009/601deb5e55dee44384301465/html5/thumbnails/103.jpg)
Returning to our problem: What does the network learn?
• The entire MLP looks for a flower-like pattern at each location
![Page 104: Deep Neural Networks Scanning for patterns (aka ... · –Gradients can be computed by backpropagation 2. input layer output layer The model so far •Can recognize patterns in data](https://reader034.vdocuments.mx/reader034/viewer/2022052009/601deb5e55dee44384301465/html5/thumbnails/104.jpg)
The behavior of the layers
• The first layer neurons “look” at the entire “block” to extract block-level features
– Subsequent layers only perform classification over these block-level features
• The first layer neurons are responsible for evaluating the entire block of pixels
– Subsequent layers only look at a single pixel in their input maps
![Page 105: Deep Neural Networks Scanning for patterns (aka ... · –Gradients can be computed by backpropagation 2. input layer output layer The model so far •Can recognize patterns in data](https://reader034.vdocuments.mx/reader034/viewer/2022052009/601deb5e55dee44384301465/html5/thumbnails/105.jpg)
Distributing the scan
• We can distribute the pattern matching over two layers and still achieve the same block analysis at the second layer
– The first layer evaluates smaller blocks of pixels
– The next layer evaluates blocks of outputs from the first layer
![Page 115: Deep Neural Networks Scanning for patterns (aka ... · –Gradients can be computed by backpropagation 2. input layer output layer The model so far •Can recognize patterns in data](https://reader034.vdocuments.mx/reader034/viewer/2022052009/601deb5e55dee44384301465/html5/thumbnails/115.jpg)
Distributing the scan
• We can distribute the pattern matching over two layers and still achieve the same block analysis at the second layer
– The first layer evaluates smaller blocks of pixels
– The next layer evaluates blocks of outputs from the first layer
– This effectively evaluates the larger block of the original image
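The equivalence claimed above can be checked numerically for linear neurons: a scan over small pixel blocks, followed by a scan over blocks of first-layer outputs, composes into a single scan with one larger kernel. This is an illustrative numpy sketch (assuming linear units; with nonlinearities the receptive field composes the same way, but the exact equality no longer holds):

```python
import numpy as np

# Sketch: for LINEAR neurons, scanning a 2x2 block of first-layer
# outputs (each of which saw a 3x3 pixel block) is exactly equivalent
# to scanning one 4x4 pixel block directly.

def scan(img, k):
    """Evaluate kernel k at every position of img (stride 1)."""
    H, W = img.shape
    K = k.shape[0]
    return np.array([[np.sum(k * img[r:r + K, c:c + K])
                      for c in range(W - K + 1)]
                     for r in range(H - K + 1)])

rng = np.random.default_rng(2)
img = rng.random((8, 8))
k1 = rng.random((3, 3))     # first layer: scans 3x3 pixel blocks
k2 = rng.random((2, 2))     # second layer: scans 2x2 blocks of layer-1 outputs

two_stage = scan(scan(img, k1), k2)

# Composed single kernel: place k1 at each of k2's 2x2 offsets.
k_big = np.zeros((4, 4))
for dr in range(2):
    for dc in range(2):
        k_big[dr:dr + 3, dc:dc + 3] += k2[dr, dc] * k1

one_stage = scan(img, k_big)
print(np.allclose(two_stage, one_stage))   # True
```

The distributed two-layer scan therefore evaluates exactly the larger 4x4 block of the original image, position by position.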
![Page 116: Deep Neural Networks Scanning for patterns (aka ... · –Gradients can be computed by backpropagation 2. input layer output layer The model so far •Can recognize patterns in data](https://reader034.vdocuments.mx/reader034/viewer/2022052009/601deb5e55dee44384301465/html5/thumbnails/116.jpg)
Distributing the scan
• The higher layer implicitly learns the arrangement of sub-patterns that represents the larger pattern (the flower in this case)
![Page 117: Deep Neural Networks Scanning for patterns (aka ... · –Gradients can be computed by backpropagation 2. input layer output layer The model so far •Can recognize patterns in data](https://reader034.vdocuments.mx/reader034/viewer/2022052009/601deb5e55dee44384301465/html5/thumbnails/117.jpg)
This is still just scanning with a shared parameter network
• With a minor modification…
![Page 118: Deep Neural Networks Scanning for patterns (aka ... · –Gradients can be computed by backpropagation 2. input layer output layer The model so far •Can recognize patterns in data](https://reader034.vdocuments.mx/reader034/viewer/2022052009/601deb5e55dee44384301465/html5/thumbnails/118.jpg)
This is still just scanning with a shared parameter network
• The network that analyzes individual blocks is now itself a shared parameter network..
Colors indicate neurons with shared parameters (Layer 1)
Each arrow represents an entire set of weights over the smaller cell
The pattern of weights going out of any cell is identical to that from any other cell.
![Page 119: Deep Neural Networks Scanning for patterns (aka ... · –Gradients can be computed by backpropagation 2. input layer output layer The model so far •Can recognize patterns in data](https://reader034.vdocuments.mx/reader034/viewer/2022052009/601deb5e55dee44384301465/html5/thumbnails/119.jpg)
This is still just scanning with a shared parameter network
• The network that analyzes individual blocks is now itself a shared parameter network..
Colors indicate neurons with shared parameters (Layer 1)
Layer 2: no sharing at this level within a block
![Page 120: Deep Neural Networks Scanning for patterns (aka ... · –Gradients can be computed by backpropagation 2. input layer output layer The model so far •Can recognize patterns in data](https://reader034.vdocuments.mx/reader034/viewer/2022052009/601deb5e55dee44384301465/html5/thumbnails/120.jpg)
This logic can be recursed
• Building the pattern over 3 layers
![Page 125: Deep Neural Networks Scanning for patterns (aka ... · –Gradients can be computed by backpropagation 2. input layer output layer The model so far •Can recognize patterns in data](https://reader034.vdocuments.mx/reader034/viewer/2022052009/601deb5e55dee44384301465/html5/thumbnails/125.jpg)
The 3-layer shared parameter net
• Building the pattern over 3 layers
All weights shown are unique
Colors indicate shared parameters
This logic can be recursed
We are effectively evaluating the yellow block with the shared-parameter net to the right. Every block is evaluated using the same net in the overall computation.
Using hierarchical build-up of features
• We scan the figure using the shared parameter network
• The entire operation can be viewed as a single giant network
– Where the individual subnets are themselves shared-parameter nets
Why distribute?
• Distribution forces localized patterns in lower layers
– More generalizable
• Number of parameters…
Parameters in Undistributed network
• Only need to consider what happens in one block
– All other blocks are scanned by the same net
• (K² + 1)N₁ weights in the first layer
• (N₁ + 1)N₂ weights in the second layer
– (Nᵢ₋₁ + 1)Nᵢ weights in each subsequent layer i
• Total parameters: 𝒪(K²N₁ + N₁N₂ + N₂N₃ + …)
– Ignoring the bias terms
(Figure: a K × K block feeding N₁ first-layer units, then N₂ second-layer units)
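The tally above is easy to check in code. A minimal sketch (the function name and the list convention for layer sizes are assumptions, not from the slides):

```python
# Hypothetical sketch: weight count for the undistributed net that scans one K x K block.
# Ns[i] is the number of units in layer i (naming is an assumption, not from the slides).
def undistributed_params(K, Ns):
    total = (K * K + 1) * Ns[0]            # first layer: each unit sees K*K pixels plus a bias
    for n_prev, n_next in zip(Ns, Ns[1:]):
        total += (n_prev + 1) * n_next     # fully connected between successive layers
    return total

print(undistributed_params(16, [4, 2, 1]))  # dominated by the K^2 * N1 term
```

With bias terms included, the first-layer term (K² + 1)N₁ contributes almost all of the total.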
When distributed over 2 layers
• First layer: N₁ lower-level units, each looks at L² pixels
– N₁(L² + 1) weights
• Second layer needs ((K/L)²N₁ + 1)N₂ weights
• Subsequent layers need Nᵢ₋₁Nᵢ weights when distributed over 2 layers only
– Total parameters: 𝒪(L²N₁ + (K/L)²N₁N₂ + N₂N₃ + …)
(Figure: a K × K block tiled by L × L cells; colors indicate neurons with shared parameters, N₁ groups; no sharing at this level within a block)
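The two-level count can be sketched the same way (again a hypothetical helper; K is assumed to be a multiple of L so the tiling works out):

```python
# Hypothetical sketch: weight count when the block is distributed over 2 layers.
def distributed2_params(K, L, Ns):
    assert K % L == 0                             # the K x K block is tiled by L x L cells
    total = (L * L + 1) * Ns[0]                   # first layer: N1 units scan L x L cells
    total += ((K // L) ** 2 * Ns[0] + 1) * Ns[1]  # second layer sees (K/L)^2 outputs per group
    for n_prev, n_next in zip(Ns[1:], Ns[2:]):
        total += (n_prev + 1) * n_next            # remaining layers are fully connected
    return total

print(distributed2_params(16, 4, [4, 2, 1]))
```

For K = 16, L = 4 and the same unit counts, this comes to a few hundred weights, against the roughly (K² + 1)N₁ ≈ 1028 needed for the first layer alone in the undistributed case.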
When distributed over 3 layers
• First layer: N₁ lower-level (groups of) units, each looks at L₁² pixels
– N₁(L₁² + 1) weights
• Second layer: N₂ (groups of) units, each looking at L₂ × L₂ groups of connections from each of the N₁ first-level groups
– (L₂²N₁ + 1)N₂ weights
• Third layer: ((K/(L₁L₂))²N₂ + 1)N₃ weights
• Subsequent layers need Nᵢ₋₁Nᵢ weights
– Total parameters: 𝒪(L₁²N₁ + L₂²N₁N₂ + (K/(L₁L₂))²N₂N₃ + …)
Comparing Number of Parameters
Conventional MLP, not distributed:
• 𝒪(K²N₁ + N₁N₂ + N₂N₃ + …)
• For this example, let K = 16, N₁ = 4, N₂ = 2, N₃ = 1
• Total: 1034 weights
Distributed (3 layers):
• 𝒪(L₁²N₁ + L₂²N₁N₂ + …)
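As a check on the arithmetic: ignoring bias terms (as the slide does), the conventional-MLP total for K = 16, N₁ = 4, N₂ = 2, N₃ = 1 works out as quoted:

```python
# Verifying the slide's example count for the undistributed net (bias terms ignored).
K, N1, N2, N3 = 16, 4, 2, 1
undistributed = K**2 * N1 + N1 * N2 + N2 * N3
print(undistributed)  # 1024 + 8 + 2 = 1034, the total quoted on the slide
```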
Comparing Number of Parameters
Conventional MLP, not distributed:
• 𝒪(K²N₁ + Σᵢ NᵢNᵢ₊₁)
Distributed (3 layers):
• 𝒪(L₁²N₁ + Σ_{i<n_conv−1} Lᵢ²NᵢNᵢ₊₁ + (K/∏ᵢ hopᵢ)² N_{n_conv−1}N_{n_conv} + …)
– These terms dominate..
Why distribute?
• Distribution forces localized patterns in lower layers
– More generalizable
• Number of parameters…
– Large (sometimes an order of magnitude) reduction in the number of parameters
• Gains increase as we increase the depth over which the blocks are distributed
• Key intuition: Regardless of the distribution, we can view the network as “scanning” the picture with an MLP
– The only difference is the manner in which parameters are shared in the MLP
Hierarchical composition: A different perspective
• The entire operation can be redrawn, as before, as maps of the entire image
Building up patterns
• The first layer looks at small sub-regions of the main image
– Sufficient to detect, say, petals
Some modifications
• The first layer looks at sub-regions of the main image
– Sufficient to detect, say, petals
• The second layer looks at regions of the output of the first layer
– To put the petals together into a flower
– This corresponds to looking at a larger region of the original input image
• We may have any number of layers in this fashion
Terminology
• The pattern in the input image that each neuron responds to is its “receptive field”
– The squares show the sizes of the receptive fields for the first-, second- and third-layer neurons
• For a first-layer neuron, the receptive field is simply given by its arrangement of weights
• For higher-level neurons, the actual receptive field is not immediately obvious and must be calculated
– What patterns in the input do the neurons actually respond to?
– These will not be simple, identifiable patterns like “petal” and “inflorescence”
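The receptive-field *size* (though not the pattern itself) can be computed layer by layer. The following is a standard calculation, sketched here for illustration; the example layer list and its kernel/stride values are made up:

```python
# Sketch: input-space receptive-field size after a stack of scanning/pooling layers.
# Each layer is a (kernel, stride) pair, listed from the input upward (values are made up).
def receptive_field(layers):
    rf, jump = 1, 1               # rf: field size; jump: input-pixel spacing of outputs
    for k, s in layers:
        rf += (k - 1) * jump      # each layer widens the field by (k-1) current jumps
        jump *= s                 # striding spreads later contributions further apart
    return rf

print(receptive_field([(3, 1), (2, 2), (3, 1)]))  # pixels seen by a third-layer neuron
```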
Some modifications
• The final layer may feed directly into a multilayer perceptron rather than a single neuron
• This is exactly the shared parameter net we just saw
Accounting for jitter
• We would like to account for some jitter in the first-level patterns
– If a pattern shifts by one pixel, is it still a petal?
– A small jitter is acceptable
• Replace each value by the maximum of the values within a small region around it
– “Max filtering” or “max pooling”
(Example: the max over the 2×2 block [1 1; 5 6] is 6)
The max operation is just a neuron
• The max operation is just another neuron
• Instead of applying an activation to the weighted sum of inputs, each neuron just computes the maximum over all its inputs
(Figure: a max layer)
Accounting for jitter
• The max filtering can also be performed as a scan
• The “max filter” operation too “scans” the picture
“Strides”
• The “max” operations may “stride” by more than one pixel
– This will result in a shrinking of the map
– The operation is usually called “pooling”
• Pooling a number of outputs to get a single output
• Also called “down-sampling”
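The amount of shrinkage is easy to pin down. A minimal sketch (the helper name is an assumption):

```python
# Sketch: length of a 1-D map after scanning it with a size-k max window at stride s.
def pooled_length(n, k, s):
    return (n - k) // s + 1       # number of window positions that fit

print(pooled_length(8, 2, 2))     # stride equal to the window size: the map halves
print(pooled_length(8, 2, 1))     # stride 1: barely any shrinkage
```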
Shrinking with a max
• In this example we actually shrank the image after the max
– Adjacent “max” operators did not overlap
– The stride was the size of the max filter itself
Non-overlapped strides
• Non-overlapping strides: partition the output of the layer into blocks
• Within each block, retain only the highest value
– If a petal is detected anywhere in the block, a petal is detected
Max Pooling
Single depth slice (x across, y down):
1 1 2 4
5 6 7 8
3 2 1 0
1 2 3 4
Max pool with 2×2 filters and stride 2:
6 8
3 4
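The pooling step on this slide can be reproduced in a few lines of plain Python (a minimal sketch, not from the slides):

```python
# The slide's max pooling: 2x2 windows, stride 2, over a single depth slice.
x = [[1, 1, 2, 4],
     [5, 6, 7, 8],
     [3, 2, 1, 0],
     [1, 2, 3, 4]]

def max_pool_2x2(m):
    # maximum over each non-overlapping 2x2 block
    return [[max(m[i][j], m[i][j + 1], m[i + 1][j], m[i + 1][j + 1])
             for j in range(0, len(m[0]), 2)]
            for i in range(0, len(m), 2)]

print(max_pool_2x2(x))  # [[6, 8], [3, 4]], matching the slide
```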
Higher layers
• The next layer works on the max-pooled maps
The overall structure
• In reality we can have many layers of “convolution” (scanning), followed by max pooling (and reduction), before the final MLP
– The individual perceptrons at any “scanning” or “convolutive” layer are called “filters”
• They “filter” the input image to produce an output image (map)
– As mentioned, the individual max operations are also called max pooling or max filters
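One way to see how alternating scans and poolings shrink the maps before the MLP: a schematic sketch with made-up layer sizes (an assumption for illustration only):

```python
# Sketch: map side length through a stack of scans (stride 1, no padding) and poolings.
def map_size(n, layers):
    for kind, k, s in layers:          # kind is just a label; only k and s matter here
        n = (n - k) // s + 1           # both scanning and pooling shrink the map this way
    return n

stack = [("conv", 5, 1), ("maxpool", 2, 2), ("conv", 3, 1), ("maxpool", 2, 2)]
print(map_size(32, stack))             # side length of the final map handed to the MLP
```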
• This entire structure is called a Convolutive Neural Network
Convolutive Neural Network
(Figure: input image → first-layer filters → first-layer max pooling → second-layer filters → second-layer max pooling)
1-D convolution
• The 1-D scan version of the convolutional neural network is the time-delay neural network
– Used primarily for speech recognition
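The core 1-D scan is simple to write down. A minimal sketch with made-up weights (not an actual TDNN implementation; activation omitted):

```python
# Sketch: scanning a 1-D sequence with one shared-parameter unit (TDNN-style).
# The weights and input are made-up numbers for illustration.
def scan_1d(x, w, b):
    k = len(w)
    # the same weights slide over every length-k window of the input
    return [sum(wi * xi for wi, xi in zip(w, x[t:t + k])) + b
            for t in range(len(x) - k + 1)]

print(scan_1d([1, 2, 3, 4], [1, -1], 0))  # a simple difference detector
```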
1-D scan version
• The 1-D scan version of the convolutional neural network
– The spectrographic time-frequency components are the input layer
– Max pooling is optional
• Not generally done for speech
• A final perceptron (or MLP) to aggregate the evidence
– “Does this recording have the target word?”
Time-Delay Neural Network
• This structure is called the Time-Delay Neural Network
![Page 181: Deep Neural Networks Scanning for patterns (aka ... · –Gradients can be computed by backpropagation 2. input layer output layer The model so far •Can recognize patterns in data](https://reader034.vdocuments.mx/reader034/viewer/2022052009/601deb5e55dee44384301465/html5/thumbnails/181.jpg)
Story so far
• Neural networks learn patterns in a hierarchical manner
– Simple to complex
• Pattern classification tasks such as “does this picture contain a cat?” are best performed by scanning for the target pattern
• Scanning for patterns can be viewed as classification with a large shared-parameter network
• Scanning an input with a network and combining the outcomes is equivalent to scanning with individual neurons
– First-level neurons scan the input
– Higher-level neurons scan the “maps” formed by lower-level neurons
– A final “decision” layer (which may be a max, a perceptron, or an MLP) makes the final decision
• At each layer, a scan by a neuron may optionally be followed by a “max” (or any other) “pooling” operation to account for deformation
• For 2-D (or higher-dimensional) scans, the structure is called a convnet
• For a 1-D scan along time, it is called a time-delay neural network