DEEP LEARNING - Yisong Yue
PART TWO: CONVOLUTIONAL & RECURRENT NETWORKS
CS/CNS/EE 155 - MACHINE LEARNING & DATA MINING
REVIEW
(figure: points x = (x1, x2) plotted in the (x1, x2) plane, labeled y = 0 or y = 1)
we want to learn non-linear decision boundaries
we can do this by composing linear decision boundaries
(figure: feed-forward network: input features x1 … xM and a bias of 1, with each layer applying weights, sums, and non-linearities to produce hidden features)
neural networks formalize a method for building these composed functions
deep networks are universal function approximators
each artificial neuron defines a (hyper)plane:
0 = w0 + w1x1 + w2x2 + … + wMxM
summation: distance from plane to input
non-linearity: convert distance into non-linear field
(figure: an example plane and the transformed distance field on either side of it)
the dot product with the plane's unit normal (the normalized weight vector) gives the signed shortest distance between a point and the plane
a geometric interpretation
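the summation-then-non-linearity recipe above can be sketched in a few lines; this is a minimal toy example (the sigmoid is one possible choice of non-linearity, and the weights here are made up for illustration):

```python
import math

def neuron(x, w, b):
    """One artificial neuron: the summation gives a (scaled) signed
    distance to the plane w.x + b = 0; the sigmoid non-linearity
    then converts that distance into a value in (0, 1)."""
    pre = b + sum(wi * xi for wi, xi in zip(w, x))  # summation
    return 1.0 / (1.0 + math.exp(-pre))             # non-linearity

# points on opposite sides of the plane x1 + x2 - 1 = 0
w, b = [1.0, 1.0], -1.0
assert neuron([2.0, 2.0], w, b) > 0.5  # positive side of the plane
assert neuron([0.0, 0.0], w, b) < 0.5  # negative side of the plane
```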
1. cut the space up with hyperplanes
2. evaluate distances of points to hyperplanes
3. non-linearly transform these distances to get new points
repeat until data have been linearized
(figure: three copies of the same network architecture, mapping inputs x1 … xM to different outputs: a transcribed query "Alexa, what is the weather going to be like today?", joint torques over time, and the label "cat")
images
audio & text ("hello")
virtual/physical control tasks
to scale deep networks to these domains, we often need to use inductive biases
today
INDUCTIVE BIASES
ultimately, we care about solving tasks
object recognition (Krizhevsky et al., 2012)
object detection (Ren et al., 2016)
object segmentation (He et al., 2017)
text translation (Wu et al., 2016)
text question answering (Weston et al., 2015)
ultimately, we care about solving tasks
Atari (Mnih et al., 2013)
object manipulation (Levine, Finn, et al., 2016)
Go (Silver, Huang, et al., 2016)
autonomous driving (Waymo)
ultimately, we care about solving tasks
survival & reproduction
cellular signaling, maintenance
organ function
navigation
hunting, foraging
muscle actuation
vision
social/mating behavior
two components for solving any task
priors & learning
priors: knowledge assumed beforehand
(figure: priors enter as the network architecture, constraints on activities and outputs, parameter constraints such as an L2 penalty in (w1, w2) space, and the parameter values of candidate models)
learning: knowledge extracted from data
(figure: loss vs. weight curve: compute the loss/error, follow the gradient, and improve the filter)
it’s a balance!
strong priors, minimal learning
• fast/easy to learn and deploy
• may be too rigid, unadaptable

weak priors, much learning
• slow/difficult to learn and deploy
• flexible, adaptable
for a desired level of performance on a task…
choose priors and collect data to obtain a model that achieves that performance in the minimal amount of time
(figure: data fed into a neural network)
priors are essential: we always have to make some assumptions, since we cannot integrate over all possible models
we are all initialized from evolutionary priors
humans seem to have a larger capacity for learning than other organisms
up until now, all of our machines have been purely based on priors
these machines can perform tasks that are impossible to hand-design
for the first time in history, we can now create machines that also learn
…but they are mostly still based on priors!
Kormushev et al.
we can exploit known structure in spatial and sequential data to impose priors (i.e. inductive biases) on our models
this allows us to learn models in complex, high-dimensional domains while limiting the number of parameters and data examples
(figure: spatial axes x and y; sequential axis t)
inductive: inferring general laws from examples
CONVOLUTIONAL NEURAL NETWORKS
task: object recognition
Yisong
discriminative mapping from image to object identity
images contain all of the information about the binary latent variable Yisong/Not Yisong
extract the relevant information about this latent variable to form conditional probability
inference: p(Yisong | image)
notice that images also contain other nuisance information, such as pose, lighting, background, etc.
want to be invariant to nuisance information
data, label collection
the mapping is too difficult to define by hand; we need to learn it from data
Yisong Not Yisong
then, we need to choose a model architecture…
(figure: a standard fully-connected network)
standard neural networks require a fixed input size…
clearer patterns, but more parameters
fewer parameters, but unclear patterns
280 x 280 x 3 = 235,200 inputs
205 x 205 x 3 = 126,075
150 x 150 x 3 = 67,500
100 x 100 x 3 = 30,000
75 x 75 x 3 = 16,875
50 x 50 x 3 = 7,500
35 x 35 x 3 = 3,675
25 x 25 x 3 = 1,875
15 x 15 x 3 = 675
convert to grayscale…
clearer patterns, but more parameters
fewer parameters, but unclear patterns
280 x 280 x 1 = 78,400 inputs
205 x 205 x 1 = 42,025
150 x 150 x 1 = 22,500
100 x 100 x 1 = 10,000
75 x 75 x 1 = 5,625
50 x 50 x 1 = 2,500
35 x 35 x 1 = 1,225
25 x 25 x 1 = 625
15 x 15 x 1 = 225
reshape: a 100 x 100 x 1 image becomes a 10,000 x 1 input vector
how many units do we need?
with a 10,000-dimensional input, # weights = # units x 10,000:
1 unit → 10,000 weights
10 units → 100,000
100 units → 1,000,000
1,000 units → 10,000,000
10,000 units → 100,000,000
100,000 units → 1,000,000,000
1,000,000 units → 10,000,000,000
if we want to recognize even a few basic patterns at each location, the number of parameters will explode!
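the explosion above is just one multiplication per row; a quick sketch of the arithmetic for a single fully-connected layer on a flattened 100 x 100 grayscale input:

```python
# weights in one fully-connected layer on a flattened 100 x 100 x 1 image
input_size = 100 * 100  # 10,000 input values
for units in [1, 10, 100, 1_000, 10_000, 100_000, 1_000_000]:
    weights = units * input_size  # one weight per (unit, input) pair
    print(f"{units:>9,} units -> {weights:>14,} weights")
```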
to reduce the amount of learning, we can introduce inductive biases
exploit the spatial structure of image data
locality: nearby areas tend to contain stronger patterns
nearby patches tend to share characteristics and are combined in particular ways
nearby pixels tend to be similar and vary in particular ways
nearby regions tend to be found in particular arrangements
translation invariance: relative (rather than absolute) positions are relevant
Yisong’s identity is independent of absolute location of his pixels
let’s convert locality and translation invariance into inductive biases
locality (nearby areas tend to contain stronger patterns): inputs can be restricted to regions, maintaining spatial ordering
translation invariance (relative positions are relevant): the same filters, with the same weights, can be applied throughout the input
these are the inductive biases of convolutional neural networks
special case of standard (fully-connected) neural networks
(figure: a convolutional unit connects to a local region and reuses the same weights at every location, while a fully-connected unit connects to the whole input with different weights everywhere; this is the source of the weight savings)
these inductive biases make the number of weights independent of the input size!
convolve a set of filters with the input
http://deeplearning.net/software/theano_versions/dev/tutorial/conv_arithmetic.html
take inner (dot) product of filter and each input location
measures degree of filter feature at input location
feature map
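the inner-product-at-every-location operation can be written out directly; a minimal pure-Python sketch of a valid (no padding, stride 1) convolution, noting that, as in most deep learning libraries, the filter is not flipped, so this is technically cross-correlation:

```python
def conv2d(image, kernel):
    """Slide the filter over the image, taking the inner (dot) product
    of the filter with each input location to build a feature map."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    return [[sum(image[i + di][j + dj] * kernel[di][dj]
                 for di in range(kh) for dj in range(kw))
             for j in range(iw - kw + 1)]
            for i in range(ih - kh + 1)]

# a tiny vertical-edge filter responds where intensity changes left to right
image = [[0, 0, 1, 1]] * 4        # dark left half, bright right half
kernel = [[-1, 1], [-1, 1]]       # 2 x 2 edge detector
fmap = conv2d(image, kernel)      # 4x4 input, 2x2 filter -> 3x3 feature map
assert fmap[0] == [0, 2, 0]       # strong response at the edge
```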
use padding to preserve spatial size
http://deeplearning.net/software/theano_versions/dev/tutorial/conv_arithmetic.html
typically add zeros around the perimeter
use stride to downsample the input
http://deeplearning.net/software/theano_versions/dev/tutorial/conv_arithmetic.html
only compute output at some integer interval
stride = 2
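padding and stride both change the output size; counting valid filter placements gives the standard formula, sketched here for one axis:

```python
def conv_output_size(n, k, p=0, s=1):
    """Number of filter placements along one axis of size n, with
    filter size k, padding p, stride s: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * p - k) // s + 1

assert conv_output_size(5, 3) == 3            # no padding, stride 1
assert conv_output_size(5, 3, p=1) == 5       # padding preserves size
assert conv_output_size(5, 3, p=1, s=2) == 3  # stride 2 downsamples
```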
filters are applied to all input channels
3 x 3 x 3 filter tensor
each filter results in a new output channel
pooling locally aggregates values in each feature map
predefined operation: maximum, average, etc.
can be applied with padding and stride
used for downsampling and invariance
(figure: 2 x 2 max pooling of a 4 x 4 feature map)
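a minimal sketch of max pooling (the feature-map values below are illustrative, not the slide's exact figure):

```python
def max_pool(fmap, size=2, stride=2):
    """Max pooling: aggregate each local window by its maximum."""
    return [[max(fmap[i + di][j + dj]
                 for di in range(size) for dj in range(size))
             for j in range(0, len(fmap[0]) - size + 1, stride)]
            for i in range(0, len(fmap) - size + 1, stride)]

fmap = [[5, 0, 0, 0],
        [2, 1, 1, 8],
        [3, 3, 4, 3],
        [2, 1, 4, 2]]
# 2 x 2 windows with stride 2: a 4 x 4 map downsamples to 2 x 2
assert max_pool(fmap) == [[5, 8], [3, 4]]
```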
convolutional pop-quiz
input feature map: 5 x 5 x 16
filters: 3 x 3 x ?
output feature map: ? x ? x 36
if we use stride = 1 and padding = 0 then…
how many filters are there?
what size is each filter?
what is the output feature map size?
36 (same as the number of output channels)
3 x 3 x 16 (channels match the number of input channels)
3 x 3 x 36 (the result of only valid convolutions)
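the answers can be checked with the output-size arithmetic; a quick sketch:

```python
# pop-quiz check: 5 x 5 x 16 input, filters of size 3 x 3 x 16,
# stride 1, padding 0, 36 output channels
in_size, in_channels = 5, 16
k, stride, pad = 3, 1, 0
n_filters = 36                      # one filter per output channel

out_size = (in_size + 2 * pad - k) // stride + 1  # valid convolution
filter_shape = (k, k, in_channels)  # filter depth matches input channels

assert n_filters == 36
assert filter_shape == (3, 3, 16)
assert (out_size, out_size, n_filters) == (3, 3, 36)
```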
natural image datasets
Caltech-101: 101 classes, 9,146 images
Caltech-256: 256 classes, 30,607 images
CIFAR-10: 10 classes, 60,000 images
CIFAR-100: 100 classes, 60,000 images
ImageNet (competition): 1,000 classes, 1.2 million images
ImageNet (full): 21,841 classes, 14 million images
convolutional models for classification
LeNet, AlexNet, VGG, GoogLeNet, ResNet, Inception v4, ResNeXt, DenseNet
convolutional models for detection, segmentation, etc.
R-CNN, Fast R-CNN, Faster R-CNN, FCN, Hourglass, U-Net, YOLO, Mask R-CNN
video: https://www.youtube.com/watch?v=OOT3UIXZztE
video: https://www.youtube.com/watch?v=pW6nZXeWlGM
convolutional models for image generation
DC-GAN, convolutional VAE, PixelCNN
video: https://www.youtube.com/watch?v=XOxxPcy5Gr4
filter visualization
Zeiler, 2013
filter visualization
https://distill.pub/2017/feature-visualization/
convolutions applied to sequences
WaveNet
https://deepmind.com/blog/wavenet-launches-google-assistant/
convolutions in non-euclidean spaces
Spline CNN (Fey et al., 2017); Graph Convolutional Network (Kipf & Welling, 2016)
recapitulation
we can exploit spatial structure to impose inductive biases on the model
locality
translation invariance
this limits the number of parameters required, reducing flexibility in reasonable ways
can then scale these models to complex data sets to perform difficult tasks
ImageNet
recognition, detection, segmentation, generation
RECURRENT NEURAL NETWORKS
task: speech recognition
mapping from input waveform to sequence of characters
Graves & Jaitly, 2014
the input waveform contains all of the information about the corresponding transcribed text
form a discriminative mapping: p(text sequence| )
again, there is nuisance information in the waveform coming from the speaker’s voice characteristics, volume, background, etc.
the mapping is too difficult to define by hand; we need to learn it from data
data, label collection
Audio Transcriptions
“OK Google…”
“Hey Siri…”
“Yo Alexa…”
but how do we define the network architecture?
problem: inputs can be of variable size
standard neural networks can only handle data of a fixed input size
(figure: a fixed-size network cannot accommodate a variable-length input)
wait, but convolutional networks can handle variable input sizes…can’t we just use them?
yes, we could
however, this relies on a fixed input window size
we may be able to exploit additional structure in sequence data to impose better inductive biases
the structure of sequence data
sequence data also tends to obey
locality: nearby regions tend to form stronger patterns
translation invariance: patterns are relative rather than absolute
but has a single axis on which extended patterns occur
to mirror the sequential structure of the data, we can process the data sequentially
maintain an internal representation during processing
fixed number of parameters; potentially infinite effective input window
each set of colored arrows denotes shared weights (figure: unrolled network with INPUT, HIDDEN, and OUTPUT layers)
a recurrent neural network (RNN) can be expressed as
hidden state: h_t = σ(W_h^T [h_{t-1}, x_t])
output: y_t = σ(W_y^T h_t)
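a minimal sketch of these two equations with scalar hidden state and input (biases omitted, sigmoid chosen for the non-linearity σ, and weights made up for illustration):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def rnn_step(h_prev, x_t, w_h, w_y):
    """One RNN step: h_t = sigmoid(w_h . [h_{t-1}, x_t]),
    y_t = sigmoid(w_y * h_t)."""
    h_t = sigmoid(w_h[0] * h_prev + w_h[1] * x_t)
    y_t = sigmoid(w_y * h_t)
    return h_t, y_t

# the same weights are reused at every step, whatever the sequence length
h, w_h, w_y = 0.0, [0.5, 1.0], 2.0
for x in [1.0, -1.0, 0.5]:
    h, y = rnn_step(h, x, w_h, w_y)
```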
basic recurrent networks are also a special case of standard neural networks with skip connections and shared weights
unrolled over steps with the same weights at every step: depth = number of steps
therefore, we can use standard backpropagation to train, resulting in backpropagation through time (BPTT)
the primary difficulty in training RNNs is propagating information over long horizons
e.g., the input at one step is predictive of the output at a much later step
learning such extended sequential dependencies requires propagating gradients over long horizons:
• vanishing / exploding gradients
• large memory and computational footprint
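The vanishing/exploding behavior can be caricatured with a scalar: the gradient through $T$ steps contains a product of $T$ per-step Jacobian factors, so a factor consistently below 1 shrinks the gradient exponentially while a factor above 1 blows it up (the numbers here are illustrative, not from a real network):

```python
# product of T identical per-step factors, a scalar stand-in for
# the product of Jacobians in the BPTT gradient
def gradient_magnitude(factor, steps):
    return factor ** steps

small = gradient_magnitude(0.9, 100)   # shrinks toward 0: vanishing
large = gradient_magnitude(1.1, 100)   # grows without bound: exploding
```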
![Page 71: DEEP LEARNING - Yisong Yue · standard neural networks require a fixed input size… clearer patterns, but more parameters fewer parameters, but unclear patterns 150 x 150 x 3 205](https://reader033.vdocuments.mx/reader033/viewer/2022042109/5e89d155c334a35aaf530d6a/html5/thumbnails/71.jpg)
!71
a naïve attempt to fix the information-propagation issue: add skip connections across steps
information and gradients can propagate more easily, but…
• increases computation
• must set a limit on the window size
![Page 72: DEEP LEARNING - Yisong Yue · standard neural networks require a fixed input size… clearer patterns, but more parameters fewer parameters, but unclear patterns 150 x 150 x 3 205](https://reader033.vdocuments.mx/reader033/viewer/2022042109/5e89d155c334a35aaf530d6a/html5/thumbnails/72.jpg)
!72
add trainable memory to the network: read from and write to a “cell” state
Long Short-Term Memory (LSTM)
Input Gate: $i_t = \sigma(W_i^\top [h_{t-1}, x_t])$
Forget Gate: $f_t = \sigma(W_f^\top [h_{t-1}, x_t])$
Output Gate: $o_t = \sigma(W_o^\top [h_{t-1}, x_t])$
Cell State: $c_t = f_t \odot c_{t-1} + i_t \odot \tanh(W_c^\top [h_{t-1}, x_t])$
Hidden State: $h_t = o_t \odot \tanh(c_t)$
Output: $y_t = \sigma(W_y^\top h_t)$
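The gate equations translate directly into NumPy. A minimal sketch of one LSTM step, with illustrative weight names and toy dimensions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(h_prev, c_prev, x, W_i, W_f, W_o, W_c):
    """One LSTM step following the slide's equations."""
    z = np.concatenate([h_prev, x])            # [h_{t-1}, x_t]
    i = sigmoid(W_i.T @ z)                     # input gate
    f = sigmoid(W_f.T @ z)                     # forget gate
    o = sigmoid(W_o.T @ z)                     # output gate
    c = f * c_prev + i * np.tanh(W_c.T @ z)    # cell state update
    h = o * np.tanh(c)                         # hidden state
    return h, c

rng = np.random.default_rng(0)
H, X = 4, 3                                    # hidden / input sizes
W_i, W_f, W_o, W_c = (rng.normal(size=(H + X, H)) for _ in range(4))
h, c = np.zeros(H), np.zeros(H)
for x in rng.normal(size=(5, X)):              # length-5 sequence
    h, c = lstm_step(h, c, x, W_i, W_f, W_o, W_c)
```

The additive cell-state update ($f_t \odot c_{t-1} + \dots$) is the key design choice: gradients can flow through the cell state without being squashed by a nonlinearity at every step.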
![Page 81: DEEP LEARNING - Yisong Yue · standard neural networks require a fixed input size… clearer patterns, but more parameters fewer parameters, but unclear patterns 150 x 150 x 3 205](https://reader033.vdocuments.mx/reader033/viewer/2022042109/5e89d155c334a35aaf530d6a/html5/thumbnails/81.jpg)
!81
memory networks
• Hopfield Network (Hopfield, 1982)
• Gated Recurrent Unit (GRU) (Cho et al., 2014)
• Memory Networks (MemNN) (Weston et al., 2015)
• Neural Turing Machine (NTM) (Graves et al., 2014)
• Differentiable Neural Computer (DNC) (Graves, Wayne, et al., 2016)
http://colah.github.io/posts/2015-08-Understanding-LSTMs/
![Page 82: DEEP LEARNING - Yisong Yue · standard neural networks require a fixed input size… clearer patterns, but more parameters fewer parameters, but unclear patterns 150 x 150 x 3 205](https://reader033.vdocuments.mx/reader033/viewer/2022042109/5e89d155c334a35aaf530d6a/html5/thumbnails/82.jpg)
!82
bi-directional RNNs
t
INPUT
HIDDEN
OUTPUT
up until now, we have considered the output of the network to be a function only of the preceding inputs (filtering)
but future inputs may help in determining this output (smoothing)
can we make the output a function of both the future and the past inputs?
![Page 83: DEEP LEARNING - Yisong Yue · standard neural networks require a fixed input size… clearer patterns, but more parameters fewer parameters, but unclear patterns 150 x 150 x 3 205](https://reader033.vdocuments.mx/reader033/viewer/2022042109/5e89d155c334a35aaf530d6a/html5/thumbnails/83.jpg)
!83
bi-directional RNNs
HIDDEN
INPUT
t
INPUT
HIDDEN
OUTPUT
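A bi-directional RNN can be sketched as two unidirectional passes, one over the sequence and one over its reverse, whose states are concatenated at each step. All names and dimensions here are illustrative:

```python
import numpy as np

def run_rnn(xs, W, h0):
    """Unidirectional pass; returns the hidden state at every step."""
    h, hs = h0, []
    for x in xs:
        h = np.tanh(W.T @ np.concatenate([h, x]))
        hs.append(h)
    return hs

def bidirectional(xs, W_fwd, W_bwd, h0):
    """Concatenate forward states with backward states (computed on
    the reversed sequence), so step t sees past and future inputs."""
    fwd = run_rnn(xs, W_fwd, h0)
    bwd = run_rnn(xs[::-1], W_bwd, h0)[::-1]   # re-align with time
    return [np.concatenate([f, b]) for f, b in zip(fwd, bwd)]

rng = np.random.default_rng(0)
H, X, T = 4, 3, 6
W_fwd = rng.normal(size=(H + X, H))
W_bwd = rng.normal(size=(H + X, H))
states = bidirectional(list(rng.normal(size=(T, X))),
                       W_fwd, W_bwd, np.zeros(H))
# states[t] has dimension 2H: past context + future context
```

This requires the whole sequence up front, which is why bi-directional models suit offline classification (smoothing) rather than online prediction (filtering).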
![Page 84: DEEP LEARNING - Yisong Yue · standard neural networks require a fixed input size… clearer patterns, but more parameters fewer parameters, but unclear patterns 150 x 150 x 3 205](https://reader033.vdocuments.mx/reader033/viewer/2022042109/5e89d155c334a35aaf530d6a/html5/thumbnails/84.jpg)
!84
applications: audio classification, handwriting classification, text classification, and others
Graves et al., 2013; Eyolfsdottir et al., 2017
![Page 85: DEEP LEARNING - Yisong Yue · standard neural networks require a fixed input size… clearer patterns, but more parameters fewer parameters, but unclear patterns 150 x 150 x 3 205](https://reader033.vdocuments.mx/reader033/viewer/2022042109/5e89d155c334a35aaf530d6a/html5/thumbnails/85.jpg)
!85
tons of options!
http://karpathy.github.io/2015/05/21/rnn-effectiveness/
![Page 86: DEEP LEARNING - Yisong Yue · standard neural networks require a fixed input size… clearer patterns, but more parameters fewer parameters, but unclear patterns 150 x 150 x 3 205](https://reader033.vdocuments.mx/reader033/viewer/2022042109/5e89d155c334a35aaf530d6a/html5/thumbnails/86.jpg)
!86
deep recurrent neural networks
![Page 87: DEEP LEARNING - Yisong Yue · standard neural networks require a fixed input size… clearer patterns, but more parameters fewer parameters, but unclear patterns 150 x 150 x 3 205](https://reader033.vdocuments.mx/reader033/viewer/2022042109/5e89d155c334a35aaf530d6a/html5/thumbnails/87.jpg)
!87
auto-regressive generative modeling
output becomes next input
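The "output becomes next input" loop can be sketched as a sampling procedure over a small vocabulary: run the recurrence, sample a token from a softmax over the output, and feed its one-hot encoding back in as the next input. Everything here (names, dimensions) is an illustrative assumption:

```python
import numpy as np

def sample_autoregressive(h0, x0, W_h, W_y, steps, rng):
    """Generate a sequence by feeding each sampled output back in
    as the next input. W_y maps hidden state to vocabulary logits."""
    h, x, out = h0, x0, []
    for _ in range(steps):
        h = np.tanh(W_h.T @ np.concatenate([h, x]))
        logits = W_y.T @ h
        p = np.exp(logits - logits.max())
        p /= p.sum()                          # softmax over vocab
        token = rng.choice(len(p), p=p)       # sample next token
        out.append(int(token))
        x = np.eye(len(p))[token]             # one-hot: output -> next input
    return out

rng = np.random.default_rng(0)
H, V = 4, 5                                   # hidden size, vocab size
W_h = rng.normal(size=(H + V, H))
W_y = rng.normal(size=(H, V))
tokens = sample_autoregressive(np.zeros(H), np.eye(V)[0],
                               W_h, W_y, 10, rng)
```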
![Page 88: DEEP LEARNING - Yisong Yue · standard neural networks require a fixed input size… clearer patterns, but more parameters fewer parameters, but unclear patterns 150 x 150 x 3 205](https://reader033.vdocuments.mx/reader033/viewer/2022042109/5e89d155c334a35aaf530d6a/html5/thumbnails/88.jpg)
!88
auto-regressive generative language modeling
http://karpathy.github.io/2015/05/21/rnn-effectiveness/
![Page 89: DEEP LEARNING - Yisong Yue · standard neural networks require a fixed input size… clearer patterns, but more parameters fewer parameters, but unclear patterns 150 x 150 x 3 205](https://reader033.vdocuments.mx/reader033/viewer/2022042109/5e89d155c334a35aaf530d6a/html5/thumbnails/89.jpg)
!89
Pixel RNN uses recurrent networks to perform auto-regressive image generation
condition the generation of each pixel on a sequence of past pixels
context / generated samples
van den Oord et al., 2016
![Page 90: DEEP LEARNING - Yisong Yue · standard neural networks require a fixed input size… clearer patterns, but more parameters fewer parameters, but unclear patterns 150 x 150 x 3 205](https://reader033.vdocuments.mx/reader033/viewer/2022042109/5e89d155c334a35aaf530d6a/html5/thumbnails/90.jpg)
!90
MIDI music generation
![Page 91: DEEP LEARNING - Yisong Yue · standard neural networks require a fixed input size… clearer patterns, but more parameters fewer parameters, but unclear patterns 150 x 150 x 3 205](https://reader033.vdocuments.mx/reader033/viewer/2022042109/5e89d155c334a35aaf530d6a/html5/thumbnails/91.jpg)
!91
recapitulation
we can exploit sequential structure to impose inductive biases on the model
this limits the number of parameters required, reducing flexibility in reasonable ways
t
INPUT
HIDDEN
OUTPUT
can then scale these models to complex data sets to perform difficult tasks
![Page 92: DEEP LEARNING - Yisong Yue · standard neural networks require a fixed input size… clearer patterns, but more parameters fewer parameters, but unclear patterns 150 x 150 x 3 205](https://reader033.vdocuments.mx/reader033/viewer/2022042109/5e89d155c334a35aaf530d6a/html5/thumbnails/92.jpg)
R E C A P
![Page 93: DEEP LEARNING - Yisong Yue · standard neural networks require a fixed input size… clearer patterns, but more parameters fewer parameters, but unclear patterns 150 x 150 x 3 205](https://reader033.vdocuments.mx/reader033/viewer/2022042109/5e89d155c334a35aaf530d6a/html5/thumbnails/93.jpg)
!93
we used additional priors (inductive biases) to scale deep networks up to handle spatial and sequential data
recapitulation
without these priors, we would need more parameters and data
![Page 94: DEEP LEARNING - Yisong Yue · standard neural networks require a fixed input size… clearer patterns, but more parameters fewer parameters, but unclear patterns 150 x 150 x 3 205](https://reader033.vdocuments.mx/reader033/viewer/2022042109/5e89d155c334a35aaf530d6a/html5/thumbnails/94.jpg)
!94
we live in a spatiotemporal world
we are constantly getting sequences of spatial sensory inputs
embodied intelligent machines need to learn from spatial and temporal patterns
Berkeley AI Research
![Page 95: DEEP LEARNING - Yisong Yue · standard neural networks require a fixed input size… clearer patterns, but more parameters fewer parameters, but unclear patterns 150 x 150 x 3 205](https://reader033.vdocuments.mx/reader033/viewer/2022042109/5e89d155c334a35aaf530d6a/html5/thumbnails/95.jpg)
!95
CNNs and RNNs are building blocks for machines that can use spatiotemporal data to solve tasks
Jaderberg, Mnih, Czarnecki et al., 2016
![Page 96: DEEP LEARNING - Yisong Yue · standard neural networks require a fixed input size… clearer patterns, but more parameters fewer parameters, but unclear patterns 150 x 150 x 3 205](https://reader033.vdocuments.mx/reader033/viewer/2022042109/5e89d155c334a35aaf530d6a/html5/thumbnails/96.jpg)
!96 Jaderberg, Mnih, Czarnecki et al., 2016
![Page 97: DEEP LEARNING - Yisong Yue · standard neural networks require a fixed input size… clearer patterns, but more parameters fewer parameters, but unclear patterns 150 x 150 x 3 205](https://reader033.vdocuments.mx/reader033/viewer/2022042109/5e89d155c334a35aaf530d6a/html5/thumbnails/97.jpg)
!97