Convolution Neural Network (CNN): A Tutorial, KH Wong, ver. 4.11a
TRANSCRIPT
Convolution Neural Network (CNN)
A tutorial, KH Wong
Introduction
• Very popular
– Toolboxes: cuda-convnet and caffe (the more user-friendly of the two)
• A high-performance, multi-class classifier
• Successful in handwritten optical character recognition (OCR), speech recognition, image noise removal, etc.
• Easy to implement
– Slow in learning
– Fast in classification
Overview of this note
• Part 1: Fully connected Back Propagation Neural Networks (BPNN)
– Part 1A: feed-forward processing
– Part 1B: feed-backward processing
• Part 2: Convolution neural networks (CNN)
– Part 2A: feed forward of CNN
– Part 2B: feed backward of CNN
Part 1
Fully Connected Back Propagation (BP) neural net
Theory: Fully connected Back Propagation Neural Net (BPNN)
• Use many samples to train the weights, so the network can classify an unknown input into different classes
• We will explain:
– How to use it after training: the forward pass
– How to train it: how to train the weights and biases (using forward and backward passes)
Training
• How to train the weights (W) and biases (b), using forward and backward passes
• Initialize W and b randomly
• For iter = 1 : all_epochs (each pass over the training set is called an epoch):
– Forward pass, for each output neuron:
• Use training samples X_class_t: feed forward to find y
• Err = error_function(y - t)
– Backward pass:
• Find ΔW and Δb to reduce Err
• W_new = W_old + ΔW;  b_new = b_old + Δb
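A minimal self-contained sketch of this training loop, shown for a single sigmoid neuron with toy data (all numbers and names here are illustrative, not from the toolbox):

    X = rand(9, 30);  t = rand(1, 30) > 0.5;  % 30 toy samples with binary targets
    W = randn(1, 9);  b = randn;              % initialize W and b randomly
    eta = 0.5;                                % learning factor (rate)
    for epoch = 1:100                         % each pass over the data is an epoch
        for n = 1:30
            u = W*X(:,n) + b;                 % forward pass
            y = 1/(1 + exp(-u));
            d = (y - t(n)) * y * (1 - y);     % backward pass: sensitivity
            W = W - eta * d * X(:,n)';        % W_new = W_old + deltaW
            b = b - eta * d;                  % b_new = b_old + deltab
        end
    end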
Part 1A
Forward pass of Back Propagation Neural Net (BPNN)
Recall the forward pass for each output neuron:
– Use training samples X_class_t: feed forward to find y
– Err = error_function(y - t)
Feed forward of Back Propagation Neural Net (BPNN)
• Inside each neuron:
[Figure: a neuron with inputs $x^l_1, x^l_2, \dots, x^l_N$, weights $w^l_2, w^l_3, \dots, w^l_N$, and bias $b^l$, feeding the output neurons.]

The inputs $x^l$, weights $W^l$, and bias $b^l$ are combined such that

$u^l = W^l x^l + b^l, \qquad x^{l+1} = f(u^l)$

Typically $f$ is a logistic (sigmoid) function, i.e.

$f(u) = \frac{1}{1 + e^{-u}}$
Sigmoid function f(u) and its derivative f'(u)
$f(u) = \frac{1}{1 + e^{-\alpha u}}$, where $\alpha$ is the parameter for the slope.

For simplicity, set the slope parameter $\alpha = 1$:

$f(u) = \frac{1}{1 + e^{-u}}$

$f'(u) = \frac{df(u)}{du} = \frac{d}{du}\left(\frac{1}{1 + e^{-u}}\right) = \frac{e^{-u}}{(1 + e^{-u})^2} = \frac{1}{1 + e^{-u}} \cdot \frac{e^{-u}}{1 + e^{-u}}$

Hence

$f'(u) = f(u)\,(1 - f(u))$
http://link.springer.com/chapter/10.1007%2F3-540-59497-3_175#page-1
http://mathworld.wolfram.com/SigmoidFunction.html
A single neuron
• The neural net can have many layers
• Between any two neighboring layers, a set of neurons can be found
[Figure: one neuron between layer $l$ and layer $l+1$, with inputs $x^l(1), x^l(2), \dots$ at layer $l$, weights $W^l(1), W^l(2), \dots$, and outputs $x^{l+1}(1), x^{l+1}(2), \dots$ at layer $l+1$.]

Each neuron computes

$x^{l+1} = f(u^l), \quad \text{with } u^l = W^l x^l + b^l$

where $x^l$ = inputs at layer $l$, $W^l$ = weights, $b^l$ = bias, and $x^{l+1}$ = inputs at layer $l+1$.
BPNN forward pass
• The forward pass finds the output when an input is given. For example:
• Assume we have used N=60,000 images to train a network to recognize c=10 numerals.
• When an unknown image is input, the output neuron corresponding to the correct answer will give the highest output level.
[Figure: an input image feeding the network, with 10 output neurons for the digits 0,1,2,...,9.]
The criterion used to train a network is based on the overall error function:
Overall error over all $N$ training samples and $c$ classes:

$E = \frac{1}{2} \sum_{n=1}^{N} \sum_{k=1}^{c} \left(t_k^n - y_k^n\right)^2$

Error for each neuron on the $n$-th training sample (using the 2-norm):

$E^n = \frac{1}{2} \sum_{k=1}^{c} \left(t_k^n - y_k^n\right)^2 = \frac{1}{2} \left\| t^n - y^n \right\|_2^2$

where $t_k^n$ = the given true class of the $n$-th training sample, and $y_k^n$ = the output class of the $n$-th training sample at the output of the feed-forward network.
Structure of a BP neural network
[Figure: input layer, hidden layer $l$, hidden layer $l+1$, ..., output layer, with inputs $x^l$ entering layer $l$ and $x^{l+1}$ entering layer $l+1$.]

x = set of inputs, W = set of weights, b = set of biases, such that

$u^l = W^l x^l + b^l, \qquad x^{l+1} = f(u^l)$
Architecture (exercise: write the formulas for A1(i=4) and A2(k=3))
[Figure: a 9-5-3 fully connected network.]
• Input: P = 9x1, indexed by j
• Hidden layer: 5 neurons, indexed by i; W1 = 9x5, b1 = 5x1
• Output layer: 3 neurons, indexed by k; W2 = 5x3, b2 = 3x1

Hidden neuron i=1, with weights W1(j=1,i=1), ..., W1(j=9,i=1) and bias b1(i=1):

$A1(i{=}1) = \frac{1}{1 + e^{-\left(W1(1,1)P(1) + W1(2,1)P(2) + \dots + W1(9,1)P(9) + b1(1)\right)}}$

Output neuron k=1, with weights W2(i=1,k=1), ..., W2(i=5,k=1) and bias b2(k=1):

$A2(k{=}1) = \frac{1}{1 + e^{-\left(W2(1,1)A1(1) + W2(2,1)A1(2) + \dots + W2(5,1)A1(5) + b2(1)\right)}}$
Answer (exercise: write the values for A1(i=4) and A2(k=3))
• P = [0.7656 0.7344 0.9609 0.9961 0.9141 0.9063 0.0977 0.0938 0.0859]
• W1 (the weights into hidden neuron i=4) = [0.2112 0.1540 -0.0687 -0.0289 0.0720 -0.1666 0.2938 -0.0169 -0.1127]
• b1(i=4) = -0.1441
• % Find A1(i=4):
• A1(i=4) = 1/(1 + exp(-(W1*P + b1))) = 0.49
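A quick MATLAB check of this value (w4 and b4 are my names for the weights and bias feeding hidden neuron i=4):

    P  = [0.7656 0.7344 0.9609 0.9961 0.9141 0.9063 0.0977 0.0938 0.0859]'; % 9x1 input
    w4 = [0.2112 0.1540 -0.0687 -0.0289 0.0720 -0.1666 0.2938 -0.0169 -0.1127]; % 1x9
    b4 = -0.1441;                        % bias of hidden neuron i=4
    A1_4 = 1/(1 + exp(-(w4*P + b4)))     % prints approximately 0.49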
Numerical example for the forward path
• Feed forward
• Given numbers for x, W, b, etc.
Example: a simple BPNN
• Number of classes (no. of output neurons) = 3
• Input: 9 pixels (each input is a 3x3 image)
• Training samples = 3 for each class
• Number of hidden layers = 1
• Number of neurons in the hidden layer = 5
Architecture of the example
[Figure: input layer of 9x1 pixels, one hidden layer with weights W = 5x9 and biases b = 5x1, and an output layer of 3x1, with $x^{l+1} = f(W^l x^l + b^l)$ at each layer.]
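A minimal sketch of the full forward pass for this 9-5-3 example (the weights below are random placeholders; a trained network would use learned values):

    P  = rand(9,1);                   % 9x1 input (a 3x3 image, flattened)
    W1 = rand(5,9);  b1 = rand(5,1);  % hidden layer: 5 neurons
    W2 = rand(3,5);  b2 = rand(3,1);  % output layer: 3 neurons (3 classes)
    sigm = @(u) 1./(1 + exp(-u));     % logistic (sigmoid) activation
    A1 = sigm(W1*P  + b1);            % 5x1 hidden activations
    A2 = sigm(W2*A1 + b2);            % 3x1 outputs
    [~, class] = max(A2)              % the largest output gives the class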
Part 1B
Backward pass of Back Propagation Neural Net (BPNN)
Feedback
[Figure: at layer $l$, the signal $x^l$ feeds forward to $x^{l+1}$, while the sensitivity feeds backward from layer $l+1$ to layer $l$.]

Feed-forward: $x^{l+1} = f(W^l x^l + b^l)$

Feed-backward: $\delta^l = \left(W^{l+1}\right)^T \delta^{l+1} \circ f'(u^l)$
Derivation

• Since $u = Wx + b$, we have $\frac{\partial u}{\partial b} = 1$.  (i)

• The sensitivity is defined as

$\delta = \frac{\partial E}{\partial b} = \frac{\partial E}{\partial u}\frac{\partial u}{\partial b} = \frac{\partial E}{\partial u}$  (ii)

• For the $n$-th sample, $E = \frac{1}{2}(t_n - y_n)^2 = \frac{1}{2}\left(t_n - f(u_n)\right)^2$, since $y_n = f(u_n)$ is the current output and $t_n$ is the truth (target).  (iii)

• Hence

$\frac{\partial E}{\partial b} = (y_n - t_n)\frac{\partial y_n}{\partial b} = (y_n - t_n)\, f'(u_n)\frac{\partial u_n}{\partial b} = (y_n - t_n)\, f'(u_n)$  (iv)

• From (ii), (iii) and (iv), at the output layer $L$:

$\delta^L = f'(u^L) \circ (y_n - t_n)$
Derivation (continued)
• Also, from (iii):

$\frac{\partial E}{\partial W} = (y_n - t_n)\frac{\partial y_n}{\partial W} = (y_n - t_n)\, f'(u_n)\frac{\partial u_n}{\partial W} = (y_n - t_n)\, f'(u_n)\, x$

since, as in (iv), $u_n = Wx + b$ gives $\frac{\partial u_n}{\partial W} = x$.

• For each learning phase a new $\Delta W$ is calculated: $W_{new} = W_{old} + \Delta W$. If we want $E$ to decrease on every learning cycle, make $\Delta W$ follow the negative gradient; to do it slowly, use a learning factor $\eta$:

$\Delta W = -\eta \frac{\partial E}{\partial W}$

• Hence $W_{new} = W_{old} - \eta \frac{\partial E}{\partial W}$.
Numerical example for the feed-backward pass
Procedure
• From the last layer (output), find δ using t - y
• Find δ for each layer, then find ΔW for the whole network
• Iterate (forward then backward passes) to generate new sets of W, until ΔW is small
• This takes a long time (a minimal sketch follows below)
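A minimal sketch of one backward pass for the 9-5-3 example, continuing the forward-pass sketch from Part 1A (eta is an assumed learning factor):

    t   = [1 0 0]';                      % target vector for the current sample
    eta = 0.5;                           % learning factor
    d2  = (A2 - t) .* A2 .* (1 - A2);    % output-layer delta; f'(u) = A2.*(1-A2)
    d1  = (W2' * d2) .* A1 .* (1 - A1);  % delta fed backward to the hidden layer
    W2  = W2 - eta * d2 * A1';  b2 = b2 - eta * d2;  % W_new = W_old - eta*dE/dW
    W1  = W1 - eta * d1 * P';   b1 = b1 - eta * d1;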
Part 2: Convolution Neural Networks
Part 2A: Feed-forward part, cnnff( )
Matlab example: http://www.mathworks.com/matlabcentral/fileexchange/38310-deep-learning-toolbox
An example: optical character recognition (OCR)
• Example test_example_CNN.m in http://www.mathworks.com/matlabcentral/fileexchange/38310-deep-learning-toolbox
• Based on a database (mnist_uint8, from http://yann.lecun.com/exdb/mnist/)
• 60,000 training examples (28x28 pixels each)
• 10,000 testing samples (a different dataset)
– After training, given an unknown image, the network will tell whether it is 0, or 1, ..., 9, etc.
– Error rate about 11% using 1 epoch (about 200 seconds of training)
– Error rate about 1.2% using 100 epochs (hours of training)
http://andrew.gibiansky.com/blog/machine-learning/k-nearest-neighbors-simplest-machine-learning/
Overview of test_example_CNN.m
• Read the database
• Part 1: cnnsetup.m
– Layer 1: input layer (do nothing)
– Layer 2: convolution (conv.) layer, output maps = 6, kernel size = 5x5
– Layer 3: sub-sample (subs.) layer, scale = 2
– Layer 4: conv. layer, output maps = 12, kernel size = 5x5
– Layer 5: subs. layer (output layer), scale = 2
• Part 2: cnntrain.m % train weights using the 60,000 samples
– cnnff( ) % CNN feed forward
– cnnbp( ) % CNN feed backward, to train the weights in the kernels
– cnnapplygrads( ) % update the weights
• cnntest.m % test the system using the 10,000 samples and show the error rate
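As a sketch, the five layers above correspond to a definition of roughly this shape (field names follow the toolbox's test_example_CNN.m to the best of my reading; treat them as an assumption and check the file):

    cnn.layers = {
        struct('type', 'i')                                    % layer 1: input
        struct('type', 'c', 'outputmaps', 6,  'kernelsize', 5) % layer 2: convolution
        struct('type', 's', 'scale', 2)                        % layer 3: sub-sampling
        struct('type', 'c', 'outputmaps', 12, 'kernelsize', 5) % layer 4: convolution
        struct('type', 's', 'scale', 2)                        % layer 5: sub-sampling
    };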
Architecture
• Each output neuron corresponds to a character (0, 1, 2, ..., 9, etc.)
• Layer 1: one input image (I), 1x28x28
• Layer 1 to 2: 6 conv. maps (C), 5x5 kernels; InputMaps=6, OutputMaps=6, Fan_in = 5^2 = 25, Fan_out = 6x5^2 = 150; layer 2 is 6x24x24
• Layer 2 to 3: 6 sub-sample maps (S), 2x2 sub-sampling; InputMaps=6, OutputMaps=12; layer 3 is 6x12x12
• Layer 3 to 4: 12 conv. maps (C), 5x5 kernels; InputMaps=6, OutputMaps=12, Fan_in = 6x5^2 = 150, Fan_out = 12x5^2 = 300; layer 4 is 12x8x8
• Layer 4 to 5: 12 sub-sample maps (S), 2x2 sub-sampling; InputMaps=12, OutputMaps=12; layer 5 is 12x4x4
• Layer 5 to output: 10 output neurons
(I = input, C = Conv. = convolution, S = Subs = sub-sampling)
cnnff.m: convolution neural network feed forward
• This is the feed-forward part
• Assuming all the weights are initialized or calculated, we show how to get the output from the inputs.
Layer 1 to 2:
• Convolve layer 1 with different kernels (map_index = 1, 2, ..., 6) to produce 6 output maps
• Inputs:
– input layer 1, a 28x28 image
– 6 different kernels: k(1), ..., k(6), each 5x5; the kernels are the dendrites of the neurons
• Output: 6 output maps, each 24x24
• Algorithm:
– for map_index = 1:6, layer_2(map_index) = I * k(map_index), using 'valid' convolution
• Discussion:
– 'valid' means only fully overlapped areas are considered, so if layer 1 is 28x28 and each kernel is 5x5, each output map is 24x24
– In Matlab, use convn(I, k, 'valid'). For example, with I = rand(28,28) and k = rand(5,5), size(convn(I, k, 'valid')) returns 24 24
– A runnable sketch follows the figure note below
[Figure: the 28x28 input image (I) convolved (Conv.) with kernels K(1), ..., K(6), map_index = 1, 2, ..., 6, producing the 6 maps of layer 2 (C), 6x24x24; i and j index the pixels.]
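A minimal runnable sketch of this stage (random kernels for illustration; note that in the toolbox's cnnff.m each convolution map also passes through a sigmoid with a per-map bias, which is included here as an assumption):

    sigm = @(P) 1./(1 + exp(-P));             % logistic activation
    I = rand(28,28);                          % 28x28 input image
    b = zeros(6,1);                           % one bias per output map
    for map_index = 1:6
        k{map_index} = rand(5,5);             % 5x5 kernel for this map
        layer2{map_index} = sigm(convn(I, k{map_index}, 'valid') + b(map_index)); % 24x24
    end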
Layer 2 to 3:
[Figure: the 6 maps of layer 2 (C), 6x24x24, sub-sampled by 2x2 (Subs) to the 6 maps of layer 3 (S), 6x12x12; map_index = 1, 2, ..., 6.]
• Sub-sample layer 2 to layer 3
• Inputs:
– 6 maps of layer 2, each 24x24
• Output: 6 maps of layer 3, each 12x12
• Algorithm:
– for map_index = 1:6, calculate the average of each 2x2 pixel window of the input map and save the result in the output map (a sketch follows below)
– Hence the resolution is reduced from 24x24 to 12x12
• Discussion
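A minimal sketch of the 2x2 averaging, continuing the previous sketch (this mirrors how cnnff.m does it: a moving-average convolution, then keeping every second row and column):

    for map_index = 1:6
        z = convn(layer2{map_index}, ones(2,2)/4, 'valid'); % 2x2 moving average
        layer3{map_index} = z(1:2:end, 1:2:end);            % 24x24 -> 12x12
    end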
Layer 3 to 4:
[Figure: the 6 maps of layer 3 (S), 6x12x12, convolved with 5x5 kernels to give the 12 maps of layer 4 (C), 12x8x8; Fan_in = 6x5^2 = 150, Fan_out = 12x5^2 = 300.]
• Convolve layer 3 with kernels to produce layer 4
• Inputs:
– 6 maps of layer 3 (L3{i=1:6}), each 12x12
– Kernel set: 6x12 kernels in total, i.e. k{i=1:6}{j=1:12}, each k{i}{j} is 5x5
– 12 biases bias{j=1:12} in this layer, each a scalar
• Output: 12 maps of layer 4 (L4{j=1:12}), each 8x8
• Algorithm:

    for j = 1:12
        z = 0;                                       % clear z
        for i = 1:6
            z = z + convn(L3{i}, k{i}{j}, 'valid');  % z is 8x8
        end
        L4{j} = sigm(z + bias{j});                   % L4{j} is 8x8
    end
    function X = sigm(P)
        X = 1./(1 + exp(-P));
    end

• Discussion:
– Normalization?
(In the toolbox these maps are stored as net.layers{l}.a{j}.)
Layer 4 to 5:
[Figure: the 12 maps of layer 4 (C), 12x8x8, sub-sampled by 2x2 (Subs) to the 12 maps of layer 5 (S), 12x4x4, which feed the 10 output neurons.]
• Sub-sample layer 4 to layer 5
• Inputs:
– 12 maps of layer 4 (L4{i=1:12}), each 8x8
• Output: 12 maps of layer 5 (L5{j=1:12}), each 4x4
• Algorithm:
– Sub-sample each 2x2 pixel window in L4 to one pixel in L5 (as from layer 2 to 3)
• Discussion:
– Normalization?
Layer 5 to output
[Figure: the 12 maps of layer 5 (L5{j=1:12}), 12x4x4 = 192 pixels in total, fully connected to the 10 output neurons net.o{m=1:10}; each output neuron corresponds to a character (0, 1, 2, ..., 9, etc.).]
• Compute the 10 outputs from layer 5
• Inputs:
– 12 maps of layer 5 (L5{j=1:12}), each 4x4, so L5 has 192 pixels in total
– Output-layer weights net.ffW{m=1:10}{p=1:192}: 192 weights for each output neuron (the same structure for each output neuron)
• Output: 10 output neurons (net.o{m=1:10})
• Algorithm (a sketch follows below):
– for m = 1:10 (each output neuron): clear net.fv; net.fv = net.ffW{m} (all 192 weights) .* L5 (all corresponding 192 pixels); net.o{m} = sigm(sum(net.fv) + bias)
• Discussion
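A minimal sketch of this output stage using plain matrices (ffW as 10x192 and ffb as 10x1 are stand-ins for the toolbox's net.ffW and net.ffb; the layer-5 maps are random placeholders):

    sigm = @(P) 1./(1 + exp(-P));
    L5 = arrayfun(@(x) rand(4,4), 1:12, 'UniformOutput', false); % placeholder 4x4 maps
    fv = [];
    for j = 1:12
        fv = [fv; L5{j}(:)];                 % flatten the 12 maps into a 192x1 vector
    end
    ffW = rand(10,192);  ffb = rand(10,1);   % placeholder trained weights and biases
    o = sigm(ffW*fv + ffb)                   % 10x1 outputs; the largest gives the digit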
Part 2B
Back propagation part: cnnbp( ), cnnapplygrads( )
cnnbp( ) overview (output back to layer 5)
For the output layer, with $y$ the current output, $t$ the target, $x$ the input to the neuron, and $w$ the weight:

$\frac{\partial E}{\partial w} = (y - t)\, y\, (1 - y)\, x, \qquad \frac{\partial E}{\partial x} = (y - t)\, y\, (1 - y)\, w$

In the cnnbp.m code this appears as (net.o is the output; here the variable y holds the targets):

    net.e   = net.o - y;                        % error
    net.od  = net.e .* (net.o .* (1 - net.o));  % output delta, dE/du
    net.fvd = (net.ffW' * net.od);              % feature-vector delta, back to layer 5
Ref: See http://en.wikipedia.org/wiki/Backpropagation
Layer 5 to 4
• Expand each 1x1 delta to a 2x2 block (a sketch follows below)
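A minimal sketch of the expansion, assuming the sub-sampling was a 2x2 average (each pooled delta is shared equally over its 2x2 block, hence the division by 4):

    d5 = rand(4,4);                 % delta of one layer-5 map (illustrative values)
    d4 = kron(d5, ones(2,2)) / 4;   % expand 4x4 -> 8x8 delta for the layer-4 map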
Layer 4 to 3
• Rotated convolution ('full' convolution with the 180-degree-rotated kernels)
• Find dE/dx at layer 3 (a sketch follows below)
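A minimal sketch of that rotated convolution for one layer-3 map (k as in the layer 3 to 4 sketch; d4{j} name the per-map layer-4 deltas; the variable names are mine, not the toolbox's):

    rot180 = @(X) rot90(X, 2);          % rotate the kernel by 180 degrees
    i = 1;                              % example layer-3 map index
    d3_i = zeros(12,12);
    for j = 1:12
        d3_i = d3_i + convn(d4{j}, rot180(k{i}{j}), 'full'); % 8x8 'full' 5x5 -> 12x12
    end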
Layer 3 to 2
• Expand each 1x1 delta to a 2x2 block (as from layer 5 to 4)
Calculate gradients
• From layer 2 to layer 3
• From layer 3 to layer 4
• net.ffW and net.ffb are found
Details of calculating the gradients
• % part: reshape feature-vector deltas into output-map style
– L4 (c): run expand only
– L3 (s): run conv (rot180, 'full'), find d
– L2 (c): run expand only
• % part: calc gradients
– L2 (c): run conv ('valid'), find dk and db
– L3 (s): not run here
– L4 (c): run conv ('valid'), find dk and db
• Done; for the output layer L5 these are found:
– net.dffW = net.od * (net.fv)' / size(net.od, 2);
– net.dffb = mean(net.od, 2);
cnnapplygrads(net, opts)
• For the convolution layers, L2 and L4 (a sketch follows below):
– From k and dk, find the new k (weights)
– From b and db, find the new b (bias)
• For the output layer L5:
– net.ffW = net.ffW - opts.alpha * net.dffW;
– net.ffb = net.ffb - opts.alpha * net.dffb;
– opts.alpha adjusts the learning rate
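A minimal sketch of the convolution-layer update, mirroring the output-layer lines above (dk and db are the kernel and bias gradients found by cnnbp( ); the loop bounds match layer 4's 6x12 kernel set):

    for j = 1:12
        for i = 1:6
            k{i}{j} = k{i}{j} - opts.alpha * dk{i}{j};  % update each 5x5 kernel
        end
        bias{j} = bias{j} - opts.alpha * db{j};         % update each map's bias
    end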
Appendix
Architecture
[Figure: the same five-layer architecture as in Part 2A (input 1x28x28; 5x5 conv. to 6x24x24; 2x2 sub-sample to 6x12x12; 5x5 conv. to 12x8x8; 2x2 sub-sample to 12x4x4; 10 output neurons), here additionally annotated with map pixel indices (i, j) and kernel indices (u, v).]
A single neuron
• The neural net has many layers
• Between any two neighboring layers, a set of neurons can be found
[Figure: one neuron between layer $l$ and layer $l+1$, as in Part 1A.]

$x^{l+1} = f(u^l), \quad \text{with } u^l = W^l x^l + b^l$

where $x^l$ = inputs at layer $l$, $W^l$ = weights, and $x^{l+1}$ = inputs at layer $l+1$.
Derivation
• $\partial E/\partial W$ relates changes at layer $l+1$ to changes at layer $l$; the sensitivities propagate backward as

$\delta^l = \left(W^{l+1}\right)^T \delta^{l+1} \circ f'(u^l)$

• At the output layer $L$: since $y = f(Wx + b)$ and $\partial u/\partial b = 1$, we get $\partial E/\partial b = \delta$, and

$\delta^L = f'(u^L) \circ (y_n - t_n)$
References
• Wiki
– http://en.wikipedia.org/wiki/Convolutional_neural_network
– http://en.wikipedia.org/wiki/Backpropagation
• Matlab programs
– Neural Network for pattern recognition: Tutorial, http://www.mathworks.com/matlabcentral/fileexchange/19997-neural-network-for-pattern-recognition-tutorial
– CNN Matlab example, http://www.mathworks.com/matlabcentral/fileexchange/38310-deep-learning-toolbox