TRANSCRIPT
Apprentissage, réseaux de neurones et modèles graphiques (RCP209) - Neural Networks and Deep Learning
Convolutional Neural Nets (ConvNets)
Nicolas [email protected]
http://cedric.cnam.fr/vertigo/Cours/ml2/
Département Informatique, Conservatoire National des Arts et Métiers (Cnam)
Motivation Convolution Pooling ConvNets
Outline
1 Limitations of Fully Connected Networks
2 Convolution
3 Pooling
4 Deep Convolutional Neural Nets
[email protected] RCP209 / Deep Learning 2/ 64
Limitations of Fully Connected Networks
Credit: M.A. Ranzato
• Scalability issue with fully connected networks: parameter explosion, even for a single hidden layer!
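The parameter explosion can be made concrete with a back-of-the-envelope count; the image and layer sizes below are illustrative choices, not from the slides:

```python
# Illustrative sizes: a single fully connected hidden layer on a
# 1000x1000 gray-scale image vs one shared 3x3 convolution filter.
input_dim = 1000 * 1000                  # flattened image
hidden_units = 1000                      # one modest hidden layer
fc_params = input_dim * hidden_units + hidden_units  # weights + biases

conv_params = 3 * 3 + 1                  # one 3x3 filter + bias, shared
                                         # across all image positions
print(fc_params)    # 1000001000: about one billion parameters
print(conv_params)  # 10
```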
[email protected] RCP209 / Deep Learning 3/ 64
Motivation Convolution Pooling ConvNets
Limitations of Fully Connected Networks
• Signal data: importance of local structure
1D signals: local temporal structure
2D signals: local spatial structure
[email protected] RCP209 / Deep Learning 4/ 64
Motivation Convolution Pooling ConvNets
Limitations of Fully Connected Networks
• BUT: with a vectorial representation of the inputs, the ordering of the dimensions is arbitrary!
[email protected] RCP209 / Deep Learning 5/ 64
Motivation Convolution Pooling ConvNets
Limitations of Fully Connected Networks
• MNIST example: same performance with original and pixel-permuted images! However, local information is obviously useful
[email protected] RCP209 / Deep Learning 6/ 64
Motivation Convolution Pooling ConvNets
Limitations of Fully Connected Networks
• Prior knowledge on the data structure ⇒ useful
• Example: MLP training for shape recognition (rectangle, triangle, diamond, star) from color images
[email protected] RCP209 / Deep Learning 7/ 64
Motivation Convolution Pooling ConvNets
Limitations of Fully Connected Networks
• Invariance & stability
• Expectations:
Small deformation ⇒ similar representations
Large deformation ⇒ dissimilar representations
• Translation invariance difficult with fully connected networks; likewise for local scale, rotation, deformations, etc.
[email protected] RCP209 / Deep Learning 8/ 64
Motivation Convolution Pooling ConvNets
Convolutional Neural Networks

Overcome most of the aforementioned limitations:
• Significantly limit the number of free parameters
• Explicitly focus on the local structure of the signal
• Able to gain invariance to local deformations
• All parameters remain trainable with error back-propagation
[email protected] RCP209 / Deep Learning 9/ 64
Motivation Convolution Pooling ConvNets
Outline
1 Limitations of Fully Connected Networks
2 Convolution
3 Pooling
4 Deep Convolutionnal Neural Nets
[email protected] RCP209 / Deep Learning 10/ 64
Motivation Convolution Pooling ConvNets
Convolution in 1D (Signal)
• Discrete 1D convolution with a Finite Impulse Response (FIR) filter h, of size d (odd)
• Input signal f(i), i ∈ {1;N}; output signal f′(i), i ∈ {1;N}
• Convolution: operator T ∶ f → f′ = T[f] = f ⋆ h

f′(i) = (f ⋆ h)(i) = ∑_{n=−(d−1)/2}^{(d−1)/2} f(i − n) h(n)
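A minimal sketch of this formula in Python; zero padding at the borders is an assumption, since the slides do not specify the boundary handling:

```python
import numpy as np

def conv1d(f, h):
    """Discrete 1D convolution with an odd-sized FIR filter h,
    zero-padded so the output has the same length as the input."""
    d = len(h)
    r = (d - 1) // 2
    fp = np.pad(f, r)                       # zero padding at the borders
    out = np.zeros(len(f))
    for i in range(len(f)):
        # f'(i) = sum_n f(i - n) h(n), n in [-r, r]
        out[i] = sum(fp[i + r - n] * h[n + r] for n in range(-r, r + 1))
    return out

f = np.array([0., 0., 1., 0., 0.])          # unit impulse
h = np.array([1., 2., 3.])                  # h(-1)=1, h(0)=2, h(1)=3
print(conv1d(f, h))                         # [0. 1. 2. 3. 0.]: the
                                            # impulse reproduces the filter
```

Convolving with an impulse reproduces the filter, which is a quick sanity check of the indexing.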
[email protected] RCP209 / Deep Learning 11/ 64
Motivation Convolution Pooling ConvNets
Convolution in 2D (Images)
• Discrete 2D convolution with FIR filter h (size d odd), T ∶ f → f′ = T[f] = f ⋆ h:

f′(i, j) = (f ⋆ h)(i, j) = ∑_{n=−(d−1)/2}^{(d−1)/2} ∑_{m=−(d−1)/2}^{(d−1)/2} f(i − n, j − m) h(n, m)

• Ex with a 3 × 3 filter:
f′(i, j) = w1 f(i − 1, j − 1) + w2 f(i − 1, j) + w3 f(i − 1, j + 1)
         + w4 f(i, j − 1) + w5 f(i, j) + w6 f(i, j + 1)
         + w7 f(i + 1, j − 1) + w8 f(i + 1, j) + w9 f(i + 1, j + 1)

• Convolution processing:
1. Apply central symmetry to the filter: h(n, m) ⇒ h(−n, −m) = g(n, m)
2. ∀(i, j), compute the weighted sum between the image values around f(i, j) and the filter coefficients g(n, m)

h = [w9 w8 w7; w6 w5 w4; w3 w2 w1]    g = [w1 w2 w3; w4 w5 w6; w7 w8 w9]
[email protected] RCP209 / Deep Learning 12/ 64
Motivation Convolution Pooling ConvNets
2D Convolution vs Cross-Correlation
• 2D Convolution: f′(i, j) = (f ⋆ h)(i, j) = ∑_n ∑_m f(i − n, j − m) h(n, m)
• Cross-Correlation: f′(i, j) = (f ⊗ h)(i, j) = ∑_n ∑_m f(i + n, j + m) h(n, m)

Cross-correlation ∼ convolution without symmetrizing the mask!

h = [−4 0 0; 0 0 0; 0 0 4] ⇒ g = [4 0 0; 0 0 0; 0 0 −4]
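The relationship between the two operators can be checked numerically. The sketch below assumes 'valid' output positions only (no padding), and compares a direct implementation of the convolution sum with cross-correlation against the centrally symmetric mask:

```python
import numpy as np

def xcorr2d(f, h):
    """2D cross-correlation (no filter flip), 'valid' positions only."""
    d = h.shape[0]
    out = np.zeros((f.shape[0] - d + 1, f.shape[1] - d + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(f[i:i + d, j:j + d] * h)
    return out

def conv2d(f, h):
    """2D convolution, computed directly from
    f'(i, j) = sum_n sum_m f(i - n, j - m) h(n, m)."""
    d = h.shape[0]
    r = (d - 1) // 2
    out = np.zeros((f.shape[0] - d + 1, f.shape[1] - d + 1))
    for oi in range(out.shape[0]):
        for oj in range(out.shape[1]):
            i, j = oi + r, oj + r           # center of the filter in f
            out[oi, oj] = sum(f[i - n, j - m] * h[n + r, m + r]
                              for n in range(-r, r + 1)
                              for m in range(-r, r + 1))
    return out

rng = np.random.default_rng(0)
f = rng.normal(size=(5, 5))
h = rng.normal(size=(3, 3))
# Convolution == cross-correlation with the centrally symmetric mask g
assert np.allclose(conv2d(f, h), xcorr2d(f, np.flip(h)))
```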
[email protected] RCP209 / Deep Learning 13/ 64
Motivation Convolution Pooling ConvNets
2D Convolution / Cross-Correlation: Interpretation
• Cross-correlation: ∀(i, j), dot product between an image region and the filter h
Large f′(i, j) ⇒ filter and region are aligned
• Input: 2D image ⇒ output: 2D map
[email protected] RCP209 / Deep Learning 14/ 64
Motivation Convolution Pooling ConvNets
2D Convolution / Cross-Correlation: Example
• Cross-correlation: the output map highlights the locations in the input image that are similar to the mask
[email protected] RCP209 / Deep Learning 15/ 64
Motivation Convolution Pooling ConvNets
2D Convolution / Cross-Correlation: Real Image Example
Credit: K. Matsui
[email protected] RCP209 / Deep Learning 16/ 64
Motivation Convolution Pooling ConvNets
Strided Convolution
f′(i, j) = (f ⋆ h)(i, j) = ∑_{n=−(d−1)/2}^{(d−1)/2} ∑_{m=−(d−1)/2}^{(d−1)/2} f(i − n, j − m) h(n, m)

• Standard convolution: stride 1 ⇒ compute f′(i, j) for (i, j) ∈ {1;N} × {1;M}
• Strided convolution: compute f′(i, j) only for i ∈ {1, 1 + s, 1 + 2s, ..., N} (idem for j)
• Ex: s = 2, N = M = 5, d = 3 ⇒ reduced map size (3 × 3)
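A sketch of strided cross-correlation reproducing the 5 × 5 → 3 × 3 example; zero padding is assumed so that the evaluation positions cover the full input grid {1, 1+s, ...}:

```python
import numpy as np

def strided_xcorr(f, h, s):
    """Cross-correlation evaluated every s pixels, with zero padding
    so positions run over the whole input grid."""
    d = h.shape[0]
    r = (d - 1) // 2
    fp = np.pad(f, r)                       # zero padding at the borders
    rows = range(0, f.shape[0], s)
    cols = range(0, f.shape[1], s)
    out = np.zeros((len(rows), len(cols)))
    for oi, i in enumerate(rows):
        for oj, j in enumerate(cols):
            out[oi, oj] = np.sum(fp[i:i + d, j:j + d] * h)
    return out

f = np.ones((5, 5))
h = np.ones((3, 3))
print(strided_xcorr(f, h, s=2).shape)       # (3, 3): reduced map size
```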
[email protected] RCP209 / Deep Learning 17/ 64
Motivation Convolution Pooling ConvNets
Convolution: Example for Gradient Computation
Ix ≈ I ⋆ Mx
• Gradient: G⃗(x, y) = (∂I/∂x  ∂I/∂y)ᵀ = (Ix  Iy)ᵀ
• Convolution approximation: Ix ≈ I ⋆ Mx, Iy ≈ I ⋆ My

Mx = (1/4) · [−1 0 1; −2 0 2; −1 0 1]    My = (1/4) · [−1 −2 −1; 0 0 0; 1 2 1]
[email protected] RCP209 / Deep Learning 18/ 64
Motivation Convolution Pooling ConvNets
Convolution with Multiple Filters: Edge Detection
Ix ∼ filter 1 | Iy ∼ filter 2 | Ie = Ix² + Iy² | Ie,t
Ie,t: thresholded Ie ⇒ edge detector!
[email protected] RCP209 / Deep Learning 19/ 64
Motivation Convolution Pooling ConvNets
Convolution: Linear Filtering
• Convolution can be viewed as multiplication by a matrix
• 1D case: Toeplitz matrix
• 2D case: doubly block circulant matrix
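The 1D case can be verified by building the matrix explicitly. This is a sketch: `conv_matrix` is an illustrative helper, with 'same'-size output and zero padding assumed, so the matrix is Toeplitz but truncated at the borders:

```python
import numpy as np

def conv_matrix(h, N):
    """Matrix T such that T @ f equals the 1D convolution of f with h
    ('same' output size, zero padding), for an odd-sized filter h.
    Each diagonal of T is constant: a (truncated) Toeplitz matrix."""
    d = len(h)
    r = (d - 1) // 2
    T = np.zeros((N, N))
    for i in range(N):
        for n in range(-r, r + 1):          # f'(i) = sum_n f(i-n) h(n)
            if 0 <= i - n < N:
                T[i, i - n] = h[n + r]
    return T

h = np.array([1., 2., 3.])
f = np.array([4., 0., 1., 2., 5.])
T = conv_matrix(h, 5)
assert np.allclose(T @ f, np.convolve(f, h, mode='same'))
```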
[email protected] RCP209 / Deep Learning 20/ 64
Motivation Convolution Pooling ConvNets
Convolution vs Fully Connected Layers
Convolution: overcome fully connected network limitations
1. Local connection ⇒ drastic reduction in the number of parameters
a) Sparse connectivity: each hidden unit is only connected to a local patch
Credit: M.A. Ranzato
[email protected] RCP209 / Deep Learning 21/ 64
Motivation Convolution Pooling ConvNets
Convolution vs Fully Connected Layers
Convolution: overcome fully connected network limitations
1. Local connection
b) Weight sharing: same feature detected across all image positions
Credit: M.A. Ranzato
• Convolution: the number of parameters is independent of the input image size, unlike fully connected layers!
[email protected] RCP209 / Deep Learning 22/ 64
Motivation Convolution Pooling ConvNets
Translation-Invariant Feature Detection
• Convolution with weight sharing: the same feature is detected across all image positions
• Very relevant prior for object classification / scene recognition
[email protected] RCP209 / Deep Learning 23/ 64
Motivation Convolution Pooling ConvNets
Convolution vs Fully Connected Layers
Convolution: overcome fully connected network limitations
2. Convolution exploits local spatial structure
Analyses shape/appearance in a local neighborhood
Permuting the input images ⇒ very different local info ⇒ different classification performance
[email protected] RCP209 / Deep Learning 24/ 64
Motivation Convolution Pooling ConvNets
Convolution vs Fully Connected Layers
Convolution: overcome fully connected network limitations
3. Convolution: equivariance property
Equivariance: f equivariant to g ⇔ f[g(x)] = g[f(x)]
Convolution equivariant to translation:

T[x(t − τ)] = x(t − τ) ⋆ h(t) = (x ⋆ h)(t − τ) = y(t − τ)
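Translation equivariance can be checked numerically. Circular (periodic) boundaries are assumed here so that the property holds exactly, without border effects:

```python
import numpy as np

def circ_conv(x, h):
    """Circular 1D convolution (periodic boundary):
    y(i) = sum_n x((i - n) mod N) h(n)."""
    N = len(x)
    return np.array([sum(x[(i - n) % N] * h[n] for n in range(len(h)))
                     for i in range(N)])

x = np.array([0., 1., 3., 2., 0., 0.])
h = np.array([0.5, 0.25, 0.25])
tau = 2
# T[x(t - tau)]: translate, then convolve
shifted_then_conv = circ_conv(np.roll(x, tau), h)
# y(t - tau): convolve, then translate
conv_then_shifted = np.roll(circ_conv(x, h), tau)
assert np.allclose(shifted_then_conv, conv_then_shifted)
```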
[email protected] RCP209 / Deep Learning 25/ 64
Motivation Convolution Pooling ConvNets
Convolution vs Fully Connected Layers
Convolution: overcome fully connected network limitations
3. Convolution: translation equivariance
Ensures that a deformation, i.e. a translation, is encoded in the maps
Local translation invariance: local pooling ⇒ next!
Credit: G. Hinton
[email protected] RCP209 / Deep Learning 26/ 64
Motivation Convolution Pooling ConvNets
Convolution and Non-Linearity
source image I | I ⋆ Mx | ∣I ⋆ Mx∣ | (I ⋆ Mx)²

• Convolution: a linear operation for each feature map
Gradient Ix ≈ I ⋆ Mx, with Mx = (1/4) · [−1 0 1; −2 0 2; −1 0 1]
• Convolution + point-wise non-linearity: feature detection
Ex: σ(z) = z², σ(z) = ∣z∣ ⇒ activate for large positive and negative Ix values
[email protected] RCP209 / Deep Learning 27/ 64
Motivation Convolution Pooling ConvNets
Convolution and Non-Linearity
source image I | I ⋆ Mx | Sigmoid | ReLU

• Other non-linearities: only activate for Ix > 0
Sigmoid (with bias): σ(z) = (1 + e^(−a(z−b)))^(−1), a = 8·10⁻², b = 50
ReLU (see later): σ(z) = max(0, z)
[email protected] RCP209 / Deep Learning 28/ 64
Motivation Convolution Pooling ConvNets
Outline
1 Limitations of Fully Connected Networks
2 Convolution
3 Pooling
4 Deep Convolutionnal Neural Nets
[email protected] RCP209 / Deep Learning 29/ 64
Motivation Convolution Pooling ConvNets
Pooling
• Pooling: statistical aggregation of a set of values, e.g. x = {x1, ..., xN}
• Output: a single scalar value. Possible pooling functions:
Max pooling: pool(x) = max_{i∈{1;N}} xi
Average pooling: pool(x) = (1/N) ∑_{i=1}^{N} xi
• Ex: max = 8, avg = 4.8
• Goal: capture the statistics of the responses
Invariance wrt the position of the values
Permuting the values ⇒ same features
[email protected] RCP209 / Deep Learning 30/ 64
Motivation Convolution Pooling ConvNets
ℓp Pooling
• ℓp pooling: pool(x) = ((1/N) ∑_{i=1}^{N} xiᵖ)^(1/p)
• Smooth transition from average (p = 1) to max (p → ∞)
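The three pooling functions can be sketched as follows; the sample values are illustrative, chosen to match the slide's max = 8 and avg = 4.8:

```python
import numpy as np

def max_pool(x):
    return np.max(x)

def avg_pool(x):
    return np.mean(x)

def lp_pool(x, p):
    """l_p pooling: ((1/N) * sum_i x_i^p)^(1/p), for non-negative x."""
    return np.mean(np.asarray(x, dtype=float) ** p) ** (1.0 / p)

x = np.array([8., 2., 6., 4., 4.])              # max = 8, avg = 4.8
assert max_pool(x) == 8.0
assert np.isclose(avg_pool(x), 4.8)
assert np.isclose(lp_pool(x, 1), avg_pool(x))   # p = 1: average pooling
# p large: the largest value dominates the sum => ~max pooling
assert np.isclose(lp_pool(x, 100), max_pool(x), atol=0.2)
```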
[email protected] RCP209 / Deep Learning 31/ 64
Motivation Convolution Pooling ConvNets
Pooling in Convolution Feature Maps
• Spatial pooling: aggregation over image (map) regions
• Pooling input: a map (image); output: a map
• Local aggregation ⇒ local pooling receptive field
• Key pooling parameters:
Pooling function
Local pooling size
Stride between two pooling areas
[email protected] RCP209 / Deep Learning 32/ 64
Motivation Convolution Pooling ConvNets
Spatial Max Pooling
Credit: K. Matsui
• Ex: max pooling with a 5 × 5 pooling area
• Binary input: pooling ⇒ presence / absence of the feature in the local pooling area
• (Partial) translation invariance ⇒ later
[email protected] RCP209 / Deep Learning 33/ 64
Motivation Convolution Pooling ConvNets
Spatial Average Pooling
Credit: K. Matsui
• Ex: average pooling with a 5 × 5 pooling area
• Binary input: pooling ∼ counts the number of present features in the local pooling area
[email protected] RCP209 / Deep Learning 34/ 64
Motivation Convolution Pooling ConvNets
Spatial Pooling: Stride
• Step s at which the pooling areas are centered
• s > 1: decreases the spatial resolution ⇒ fewer parameters in deep models ∼ downsampling
Credit: M. Antony
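A minimal spatial max pooling with stride; 'valid' (non-padded) pooling windows are assumed:

```python
import numpy as np

def spatial_max_pool(f, size, stride):
    """Spatial max pooling over a 2D map: take the max of each
    size x size window, moving by `stride` pixels."""
    H, W = f.shape
    rows = range(0, H - size + 1, stride)
    cols = range(0, W - size + 1, stride)
    out = np.zeros((len(rows), len(cols)))
    for oi, i in enumerate(rows):
        for oj, j in enumerate(cols):
            out[oi, oj] = f[i:i + size, j:j + size].max()
    return out

f = np.arange(16, dtype=float).reshape(4, 4)
print(spatial_max_pool(f, size=2, stride=2))
# [[ 5.  7.]
#  [13. 15.]]
```

With stride 2 the 4 × 4 map is downsampled to 2 × 2, illustrating the resolution reduction above.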
[email protected] RCP209 / Deep Learning 35/ 64
Motivation Convolution Pooling ConvNets
Spatial Pooling: from Equivariance to Invariance
• Recap: convolution is equivariant to translation:

f[g(x)] = g[f(x)]

with f the convolution and g the translation
Credit: G. Hinton
[email protected] RCP209 / Deep Learning 36/ 64
Motivation Convolution Pooling ConvNets
Max Pooling & Translation Invariance
• Under some conditions, max pooling ⇒ translation invariance:

f′[g(x)] = f′(x)

with f′ = p ○ f, f the convolution, p the pooling (pooling applied after convolution)
[email protected] RCP209 / Deep Learning 37/ 64
Motivation Convolution Pooling ConvNets
Max Pooling & Translation Invariance
• Translation invariance wrt a vector T⃗ = (tx, ty)ᵀ if:
T⃗ does not bring a new largest element to the pooling region edge
T⃗ does not remove the max from the pooling region
• Ex: 5 × 5 conv map, 3 × 3 max pooling centered at 15: max = 15
• Invariance OK: ∀ translation (tx, ty) ∈ ±1 px ⇒ max = 15

C = [11 −5 1 −2 0; 1 3 0 0 5; 8 4 15 −10 4; 8 6 5 3 7; 3 0 −2 9 3]
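The invariance claim can be checked on the matrix C itself. `np.roll` is used here as a stand-in for translation; its wrap-around at the borders does not affect the central 3 × 3 pooling region in this example:

```python
import numpy as np

# The 5x5 map from the slide; 3x3 max pooling centered on the 15.
C = np.array([[11, -5,  1,  -2, 0],
              [ 1,  3,  0,   0, 5],
              [ 8,  4, 15, -10, 4],
              [ 8,  6,  5,   3, 7],
              [ 3,  0, -2,   9, 3]], dtype=float)

def pooled_max(M, ci, cj):
    """Max over the 3x3 region centered at (ci, cj) (0-indexed)."""
    return M[ci - 1:ci + 2, cj - 1:cj + 2].max()

assert pooled_max(C, 2, 2) == 15.0
# Any +-1 px translation keeps the 15 inside the 3x3 region,
# and no larger value enters it => the pooled output is unchanged.
for tx in (-1, 0, 1):
    for ty in (-1, 0, 1):
        shifted = np.roll(np.roll(C, tx, axis=0), ty, axis=1)
        assert pooled_max(shifted, 2, 2) == 15.0
```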
[email protected] RCP209 / Deep Learning 38/ 64
Motivation Convolution Pooling ConvNets
Max Pooling & Translation Invariance
• Translation invariance is lost when the conditions are violated:
here, T⃗ removes the max from the pooling region
• Ex: 5 × 5 conv map, 3 × 3 max pooling centered at 15: max = 15
• Invariance KO: right translation tx = +1 px ⇒ max = 7

C = [11 −5 1 −2 0; 1 3 0 0 5; 8 15 4 −10 4; 8 6 5 3 7; 3 0 −2 9 3]
[email protected] RCP209 / Deep Learning 39/ 64
Motivation Convolution Pooling ConvNets
Max Pooling & Translation Invariance
• Max pooling: partial translation invariance (under some conditions)
At least local stability: every value in the bottom row changed, but only half of the values in the top row changed ⇒ the distance after pooling decreases
From [Goodfellow et al., 2016]
[email protected] RCP209 / Deep Learning 40/ 64
Motivation Convolution Pooling ConvNets
Pooling: Conclusion
• Reduces the spatial feature map size (stride)
• Partial translation invariance and stability
• Convolution on tensors (color images / hierarchies)? ⇒ following!
[email protected] RCP209 / Deep Learning 41/ 64
Motivation Convolution Pooling ConvNets
Outline
1 Limitations of Fully Connected Networks
2 Convolution
3 Pooling
4 Deep Convolutionnal Neural Nets
[email protected] RCP209 / Deep Learning 42/ 64
Motivation Convolution Pooling ConvNets
Convolution Layer
• 2D convolution: each filter ⇒ a 2D map (image)
• Convolution layer: stacking the maps from multiple filters
⇒ Tensor: a multi-dimensional array
[email protected] RCP209 / Deep Learning 43/ 64
Motivation Convolution Pooling ConvNets
Convolution Layer
• Tensor: stacking several filter outputs
Depth ⇔ number of filters
Each spatial position: outputs of the different filters
• Ex: 2D convolution with gray-scale images: input tensor depth = 1
• Convolution on color images / hierarchies: convolution on tensors!
Input tensor ⇒ output tensor
[email protected] RCP209 / Deep Learning 44/ 64
Motivation Convolution Pooling ConvNets
Convolution Layer for Tensors
f′(i, j) = (f ⋆ h)(i, j) = ∑_{k=1}^{K} ∑_{n=−(d−1)/2}^{(d−1)/2} ∑_{m=−(d−1)/2}^{(d−1)/2} f(i − n, j − m, k) h(n, m, k) + b

• Convolution: linear; bias b ⇒ affine
• Filtering over depth: correlation between feature maps
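A sketch of this tensor convolution as a cross-correlation summed over depth, bias included; 'valid' output positions and no filter flip are assumptions of the sketch:

```python
import numpy as np

def tensor_xcorr(f, h, b=0.0):
    """Tensor convolution layer (single filter): input tensor f of
    shape (H, W, K), filter h of shape (d, d, K) -> one 2D output map.
    Each output value sums over the spatial window AND the depth K,
    then adds the bias b (affine operation)."""
    H, W, K = f.shape
    d = h.shape[0]
    out = np.zeros((H - d + 1, W - d + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(f[i:i + d, j:j + d, :] * h) + b
    return out

f = np.ones((5, 5, 3))               # e.g. a tiny color image (K = 3)
h = np.ones((3, 3, 3))
out = tensor_xcorr(f, h, b=1.0)
print(out.shape)                     # (3, 3)
print(out[0, 0])                     # 3*3*3 * 1 + 1 = 28.0
```

Using K such filters would stack K of these maps into the output tensor.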
[email protected] RCP209 / Deep Learning 45/ 64
Motivation Convolution Pooling ConvNets
Convolution Layer for Tensors
Ex: input color image
[email protected] RCP209 / Deep Learning 46/ 64
Motivation Convolution Pooling ConvNets
Convolution Layer for Tensors
Natural extension for multiple filters
[email protected] RCP209 / Deep Learning 47/ 64
Motivation Convolution Pooling ConvNets
Convolution Layer for Tensors
Ex: input color image
[email protected] RCP209 / Deep Learning 48/ 64
Motivation Convolution Pooling ConvNets
Specific Tensor Convolution Filters
• Input tensor size W × H × D
• Filter size = W × H × D = tensor size, no padding ⇒ no possible displacement for the filter
Output: a single scalar value
Using K filters ⇒ output: K-dim vector
• Convolution ∼ fully connected layer on the flattened tensor
[email protected] RCP209 / Deep Learning 49/ 64
Motivation Convolution Pooling ConvNets
Convolution Layer: Non-Linearity
• Convolutional layer: input tensor → output tensor
1. Convolution: linear / affine filtering
2. Followed by a point-wise non-linearity
∼ non-linearity applied on the spatial maps
[email protected] RCP209 / Deep Learning 50/ 64
Motivation Convolution Pooling ConvNets
Convolution Layer: Non-Linearity
• Each activation in the tensor map ⇔ a formal neuron
• Ex: sigmoid activation:

σ(z) = (1 + e^(−az))^(−1)
[email protected] RCP209 / Deep Learning 51/ 64
Motivation Convolution Pooling ConvNets
Convolution Hierarchies
• Convolution layer: affine filtering + non-linear activation
• Convolution hierarchies: stacking convolution layers
• Motivation: depth increases modeling capacity
Non-linearity crucial: hierarchical model ≠ flat model! (∼ fully connected networks)
[email protected] RCP209 / Deep Learning 52/ 64
Motivation Convolution Pooling ConvNets
Convolution Hierarchies: Receptive Field
• Cascading two 3 × 3 convolutions: same receptive field as a 5 × 5 convolution in the input image
• Convolution hierarchies:
Feature combination
Gradual increase of the spatial receptive field ⇒ indirect global connectivity
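The receptive-field growth can be computed directly; stride-1 convolutions are assumed:

```python
def receptive_field(kernel_sizes):
    """Receptive field (in input pixels) of one unit after a stack of
    stride-1 convolutions with the given odd kernel sizes: each layer
    extends the field by k - 1 pixels."""
    rf = 1
    for k in kernel_sizes:
        rf += k - 1
    return rf

assert receptive_field([3, 3]) == 5       # two 3x3 convs ~ one 5x5
assert receptive_field([5]) == 5
assert receptive_field([3, 3, 3]) == 7    # depth keeps widening the field
```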
[email protected] RCP209 / Deep Learning 53/ 64
Motivation Convolution Pooling ConvNets
Convolution Hierarchies: Example
Edge detection with a convolution hierarchy (pyramid) ⇒ two layers:
• Input: gray-scale image I ⇒ W × H × 1 tensor

Ix² ∼ filter 1 | Iy² ∼ filter 2

1. 1st layer: convolution with two filters

Mx = (1/4) · [−1 0 1; −2 0 2; −1 0 1]    My = (1/4) · [−1 −2 −1; 0 0 0; 1 2 1]

followed by the non-linearity σ(z) = z²
⇒ Output: W × H × 2 tensor, H1 ∼ (Ix)², H2 ∼ (Iy)²
[email protected] RCP209 / Deep Learning 54/ 64
Motivation Convolution Pooling ConvNets
Convolution Hierarchies: Example
Edge detection with a convolution hierarchy (pyramid): two layers

Ix² | Iy² | output

2. 2nd layer: convolution with one 1 × 1 filter [1 1]
For each pixel: (Ix)² + (Iy)² = ∣∣G⃗(x, y)∣∣² = ∣∣∇I∣∣²
σ(z) = Step(z − T), with T a threshold on ∣∣G⃗(x, y)∣∣²
⇒ Output: W × H × 1 tensor ⇒ edge detector
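The full two-layer pipeline can be sketched end-to-end; the toy image and the threshold T = 0.5 are illustrative choices:

```python
import numpy as np

def xcorr2d(f, h):
    """2D cross-correlation, 'valid' positions only."""
    d = h.shape[0]
    out = np.zeros((f.shape[0] - d + 1, f.shape[1] - d + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(f[i:i + d, j:j + d] * h)
    return out

Mx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]) / 4.0
My = np.array([[-1, -2, -1], [0, 0, 0], [1, 2, 1]]) / 4.0

# Toy image: dark left half, bright right half => one vertical edge
I = np.zeros((7, 7))
I[:, 4:] = 1.0

# Layer 1: two filters + sigma(z) = z^2  ->  W x H x 2 tensor
H1 = xcorr2d(I, Mx) ** 2                 # ~ (Ix)^2
H2 = xcorr2d(I, My) ** 2                 # ~ (Iy)^2

# Layer 2: 1x1 filter [1 1] over depth + step non-linearity
G2 = H1 + H2                             # ||grad I||^2 at each pixel
edges = (G2 > 0.5).astype(int)           # T = 0.5, illustrative threshold
# The edge columns (around the intensity jump) are marked 1
assert (edges[:, 2] == 1).all() and (edges[:, 0] == 0).all()
```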
[email protected] RCP209 / Deep Learning 55/ 64
Motivation Convolution Pooling ConvNets
Pooling in Convolution Layer
Where to pool in a convolution tensor?
• Most common choice: pool in each feature map independently
⇒ spatial pooling on top of the convolution layer
Input / output: a tensor of depth D
Output has a smaller spatial size (pooling stride)
[email protected] RCP209 / Deep Learning 56/ 64
Motivation Convolution Pooling ConvNets
Convolution / Pooling Layer
• Pooling on top of convolution Layer• An elementary block: Convolution Layer + Pooling [Conv-Pool]
[email protected] RCP209 / Deep Learning 57/ 64
Motivation Convolution Pooling ConvNets
Convolution / Pooling Layer
• An elementary block: Convolution + Non-linearity (= Convolution Layer), followed by Pooling
[email protected] RCP209 / Deep Learning 58/ 64
Motivation Convolution Pooling ConvNets
Convolutional Neural Networks (ConvNets)
• Stack several Convolution / Pooling blocks ⇒ Convolutional Neural Network (ConvNet)
• Ex: 7 × 7 convolution, 2 × 2 pooling area, stride s = 2
• Input image 46 × 46, 1st [Conv-Pool] layer:
Conv output = 40
Pooling output = 20
Receptive field size for each pooled unit?
⇒ Pooling increases the receptive field
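The layer sizes follow from simple arithmetic; 'valid' (unpadded) convolution is assumed:

```python
def conv_out(n, k):
    """Output size of a valid, stride-1 convolution with kernel k."""
    return n - k + 1

def pool_out(n, size, stride):
    """Output size after pooling with the given window size and stride."""
    return (n - size) // stride + 1

n = 46                                   # input image 46 x 46
n = conv_out(n, 7)                       # 7x7 convolution
assert n == 40
n = pool_out(n, 2, 2)                    # 2x2 pooling, stride 2
assert n == 20
```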
[email protected] RCP209 / Deep Learning 59/ 64
Motivation Convolution Pooling ConvNets
Convolutional Neural Networks (ConvNets)
• ConvNets: hierarchical tensor transformations
• At some (depth) point, the tensor is often flattened into a vector
[email protected] RCP209 / Deep Learning 60/ 64
Motivation Convolution Pooling ConvNets
Convolutional Neural Networks (ConvNets)
• ConvNet prediction: a 2-stage process:
1. Representation learning with the [Conv-Pool] hierarchy:
Conv detects relevant features
Pooling gains spatial invariance for classification
[email protected] RCP209 / Deep Learning 61/ 64
Motivation Convolution Pooling ConvNets
Convolutional Neural Networks (ConvNets)
• ConvNet prediction: a 2-stage process:
2. Classification: the tensor is flattened into a vector
Flattening fixes each neuron's position in the initial tensor ⇒ breaking translation invariance
Followed by a hierarchy of fully connected layers
[email protected] RCP209 / Deep Learning 62/ 64
Motivation Convolution Pooling ConvNets
Conclusion
• ConvNet: hierarchical [Conv-Pool] blocks + fully connected layers
Architecture of famous historical nets, e.g. LeNet, and of more recent ones, e.g. AlexNet (2012)
• Deep Learning history? ⇒ next course!
[email protected] RCP209 / Deep Learning 63/ 64
References I
[Goodfellow et al., 2016] Goodfellow, I., Bengio, Y., and Courville, A. (2016). Deep Learning. MIT Press. http://www.deeplearningbook.org