Artificial Neural Networks - University of Minnesota Duluth
(rmaclin/cs5541/notes/ml_chapter04.pdf)

TRANSCRIPT
Artificial Neural Networks
• Threshold units
• Gradient descent
• Multilayer networks
• Backpropagation
• Hidden layer representations
• Example: Face recognition
• Advanced topics

CS 5751 Machine Learning
Chapter 4 Artificial Neural Networks
Connectionist Models

Consider humans:
• Neuron switching time ~ .001 second
• Number of neurons ~ 10^10
• Connections per neuron ~ 10^4-5
• Scene recognition time ~ .1 second
• 100 inference steps does not seem like enough
→ must use lots of parallel computation!

Properties of artificial neural nets (ANNs):
• Many neuron-like threshold switching units
• Many weighted interconnections among units
• Highly parallel, distributed process
• Emphasis on tuning weights automatically
When to Consider Neural Networks
• Input is high-dimensional discrete or real-valued (e.g., raw sensor input)
• Output is discrete or real-valued
• Output is a vector of values
• Possibly noisy data
• Form of target function is unknown
• Human readability of result is unimportant

Examples:
• Speech phoneme recognition [Waibel]
• Image classification [Kanade, Baluja, Rowley]
• Financial prediction
ALVINN drives 70 mph on highways

(Diagram: a 30x32 sensor input retina feeding 4 hidden units, with output units for Sharp Left, Straight Ahead, and Sharp Right)
Perceptron

(Diagram: inputs X1 ... Xn with weights W1 ... Wn, plus a constant input X0 = 1 with weight W0, feeding a summation unit Σ computing Σ_{i=0..n} w_i x_i, followed by a threshold)

o(x_1, ..., x_n) = 1 if w_0 + w_1 x_1 + ... + w_n x_n > 0
                   -1 otherwise

Sometimes we will use simpler vector notation:

o(x) = 1 if w · x > 0
       -1 otherwise
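The thresholded output above can be sketched in a few lines of Python. This is an illustrative sketch, not code from the slides; the function name and the AND weights (w0 = -0.8, w1 = w2 = 0.5, answering the question posed on the next slide) are assumptions for the example.

```python
# Sketch of the perceptron's thresholded output: o(x) = 1 if
# w0 + w1*x1 + ... + wn*xn > 0, else -1. w[0] is the bias weight
# paired with the constant input x0 = 1.
def perceptron_output(w, x):
    net = w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))
    return 1 if net > 0 else -1

# Illustrative weights realizing AND(x1, x2) over {0, 1} inputs.
w_and = [-0.8, 0.5, 0.5]
```

With these weights, only the input (1, 1) drives the net input above the threshold.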
Decision Surface of Perceptron

(Two plots in the (x1, x2) plane: a linearly separable case, and a non-separable case such as XOR)

Represents some useful functions
• What weights represent g(x1,x2) = AND(x1,x2)?
But some functions not representable
• e.g., not linearly separable
• therefore, we will want networks of these ...
Perceptron Training Rule

w_i ← w_i + Δw_i
where Δw_i = η (t - o) x_i

• t = c(x) is the target value
• o is the perceptron output
• η is a small constant (e.g., .1) called the learning rate

Can prove it will converge:
• If training data is linearly separable
• and η sufficiently small
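The training rule above can be sketched as a simple loop. This is an illustrative sketch under assumed names (`train_perceptron`, the epoch count, and the ±1-labeled AND data); the slides do not give code.

```python
# Sketch of the perceptron training rule: w_i <- w_i + eta*(t - o)*x_i.
# w[0] is the bias weight paired with the constant input x0 = 1.
def train_perceptron(examples, eta=0.1, epochs=50):
    n = len(examples[0][0])
    w = [0.0] * (n + 1)
    for _ in range(epochs):
        for x, t in examples:
            net = w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))
            o = 1 if net > 0 else -1
            w[0] += eta * (t - o)          # x0 = 1
            for i, xi in enumerate(x):
                w[i + 1] += eta * (t - o) * xi
    return w

# AND is linearly separable, so the rule converges on it.
and_data = [((0, 0), -1), ((0, 1), -1), ((1, 0), -1), ((1, 1), 1)]
```

Running `train_perceptron(and_data)` yields weights that classify all four AND examples correctly, consistent with the convergence claim for separable data.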
Gradient Descent

To understand, consider a simpler linear unit, where

o = w_0 + w_1 x_1 + ... + w_n x_n

Idea: learn w_i's that minimize the squared error

E[w] ≡ (1/2) Σ_{d∈D} (t_d - o_d)²

where D is the set of training examples
Gradient Descent

(Plot: the error surface E over the weight space, with gradient descent stepping downhill toward the minimum)
Gradient Descent

Gradient: ∇E[w] ≡ [∂E/∂w_0, ∂E/∂w_1, ..., ∂E/∂w_n]

Training rule: Δw = -η ∇E[w]

i.e., Δw_i = -η ∂E/∂w_i
Gradient Descent

∂E/∂w_i = ∂/∂w_i (1/2) Σ_d (t_d - o_d)²
        = (1/2) Σ_d ∂/∂w_i (t_d - o_d)²
        = (1/2) Σ_d 2 (t_d - o_d) ∂/∂w_i (t_d - o_d)
        = Σ_d (t_d - o_d) ∂/∂w_i (t_d - w · x_d)

∂E/∂w_i = Σ_d (t_d - o_d)(-x_{i,d})
Gradient Descent

GRADIENT-DESCENT(training_examples, η)

Each training example is a pair of the form <x, t>, where x is the vector of input values and t is the target output value. η is the learning rate (e.g., .05).

• Initialize each w_i to some small random value
• Until the termination condition is met, do
  - Initialize each Δw_i to zero
  - For each <x, t> in training_examples, do
    * Input the instance x and compute the output o
    * For each linear unit weight w_i, do
      Δw_i ← Δw_i + η (t - o) x_i
  - For each linear unit weight w_i, do
    w_i ← w_i + Δw_i
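The GRADIENT-DESCENT procedure above translates directly to a batch loop. This is an illustrative sketch: the function name, iteration count, and the tiny linear dataset (t = 1 + 2·x1) are assumptions, and weights start at zero rather than small random values for reproducibility.

```python
# Sketch of batch GRADIENT-DESCENT for a linear unit: accumulate
# delta_w_i = sum over examples of eta*(t - o)*x_i, then update the
# weights once per pass. w[0] is the bias weight (x0 = 1).
def gradient_descent(training_examples, eta=0.05, iterations=200):
    n = len(training_examples[0][0])
    w = [0.0] * (n + 1)
    for _ in range(iterations):
        dw = [0.0] * (n + 1)
        for x, t in training_examples:
            o = w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))
            dw[0] += eta * (t - o)         # x0 = 1
            for i, xi in enumerate(x):
                dw[i + 1] += eta * (t - o) * xi
        w = [wi + d for wi, d in zip(w, dw)]
    return w

# Noise-free data from the linear target t = 1 + 2*x1.
data = [((0.0,), 1.0), ((1.0,), 3.0), ((2.0,), 5.0)]
```

On this consistent data the weights approach (w0, w1) ≈ (1, 2), the global minimum of the squared error.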
Summary

Perceptron training rule guaranteed to succeed if
• Training examples are linearly separable
• Sufficiently small learning rate η

Linear unit training rule uses gradient descent
• Guaranteed to converge to hypothesis with minimum squared error
• Given sufficiently small learning rate η
• Even when training data contains noise
• Even when training data not separable by H
Incremental (Stochastic) Gradient Descent

Batch mode Gradient Descent:
Do until satisfied:
1. Compute the gradient ∇E_D[w]
2. w ← w - η ∇E_D[w]

E_D[w] ≡ (1/2) Σ_{d∈D} (t_d - o_d)²

Incremental mode Gradient Descent:
Do until satisfied:
- For each training example d in D
  1. Compute the gradient ∇E_d[w]
  2. w ← w - η ∇E_d[w]

E_d[w] ≡ (1/2)(t_d - o_d)²

Incremental Gradient Descent can approximate Batch Gradient Descent arbitrarily closely if η made small enough
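Incremental mode differs from the batch sketch only in that the weights move after every single example. An illustrative sketch under the same assumptions as before (assumed names, zero initialization, linear target t = 1 + 2·x1):

```python
# Sketch of incremental-mode gradient descent: update the weights
# after each training example d, descending E_d alone.
def incremental_gd(training_examples, eta=0.02, passes=500):
    n = len(training_examples[0][0])
    w = [0.0] * (n + 1)                    # w[0] is the bias weight
    for _ in range(passes):
        for x, t in training_examples:
            o = w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))
            w[0] += eta * (t - o)          # x0 = 1
            for i, xi in enumerate(x):
                w[i + 1] += eta * (t - o) * xi
    return w

data = [((0.0,), 1.0), ((1.0,), 3.0), ((2.0,), 5.0)]
```

With a small η the per-example steps track the batch trajectory closely, illustrating the approximation claim above; on this noise-free data the weights settle near (1, 2).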
Multilayer Networks of Sigmoid Units

(Diagram: a two-layer network with inputs x1 and x2, hidden units h1 and h2, and output unit o, shown alongside decision regions labeled d0 ... d3)
Multilayer Decision Space

(Plot: the non-linear decision regions formed by the multilayer network)
Sigmoid Unit

(Diagram: inputs X1 ... Xn with weights W1 ... Wn, plus a constant input X0 = 1 with weight W0, feeding a summation unit Σ followed by a sigmoid)

net = Σ_{i=0..n} w_i x_i
o = σ(net) = 1 / (1 + e^-net)

σ(x) = 1 / (1 + e^-x) is the sigmoid function

Nice property: dσ(x)/dx = σ(x)(1 - σ(x))

We can derive gradient descent rules to train:
• One sigmoid unit
• Multilayer networks of sigmoid units → Backpropagation
The Sigmoid Function

σ(x) = 1 / (1 + e^-x)

(Plot: output vs. net input over [-6, 6], an S-shaped curve rising from 0 to 1)

Sort of a rounded step function
Unlike step function, can take derivative (makes learning possible)
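The sigmoid and its "nice property" from the previous slide can be checked numerically. An illustrative sketch; the function names are assumptions.

```python
import math

# The sigmoid function and its derivative, using the property
# d sigma(x)/dx = sigma(x) * (1 - sigma(x)) from the slide.
def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_prime(x):
    s = sigmoid(x)
    return s * (1.0 - s)
```

A central-difference check at any point confirms the closed form agrees with the numerical derivative; the property is what makes the backpropagation deltas below so cheap to compute.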
Error Gradient for a Sigmoid Unit

∂E/∂w_i = ∂/∂w_i (1/2) Σ_{d∈D} (t_d - o_d)²
        = (1/2) Σ_d ∂/∂w_i (t_d - o_d)²
        = (1/2) Σ_d 2 (t_d - o_d) ∂/∂w_i (t_d - o_d)
        = Σ_d (t_d - o_d) (-∂o_d/∂w_i)
        = -Σ_d (t_d - o_d) (∂o_d/∂net_d)(∂net_d/∂w_i)

But we know:

∂o_d/∂net_d = ∂σ(net_d)/∂net_d = o_d (1 - o_d)
∂net_d/∂w_i = ∂(w · x_d)/∂w_i = x_{i,d}

So:

∂E/∂w_i = -Σ_{d∈D} (t_d - o_d) o_d (1 - o_d) x_{i,d}
Backpropagation Algorithm

Initialize all weights to small random numbers.
Until satisfied, do
• For each training example, do
  1. Input the training example and compute the outputs
  2. For each output unit k:
     δ_k ← o_k (1 - o_k)(t_k - o_k)
  3. For each hidden unit h:
     δ_h ← o_h (1 - o_h) Σ_{k∈outputs} w_{h,k} δ_k
  4. Update each network weight w_{i,j}:
     w_{i,j} ← w_{i,j} + Δw_{i,j}
     where Δw_{i,j} = η δ_j x_{i,j}
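Steps 2-4 above can be sketched for a single hidden layer of sigmoid units. This is an illustrative sketch, not code from the slides: the function names, the weight layout (one row per unit, index 0 holding the bias weight for x0 = 1), and η = 0.5 are assumptions.

```python
import math, random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Forward pass through one hidden layer of sigmoid units.
def forward(w_hidden, w_out, x):
    xb = [1.0] + list(x)                       # x0 = 1
    h = [sigmoid(sum(wi * xi for wi, xi in zip(row, xb)))
         for row in w_hidden]
    hb = [1.0] + h
    o = [sigmoid(sum(wi * hi for wi, hi in zip(row, hb)))
         for row in w_out]
    return h, o

# One stochastic update, following steps 2-4 of the algorithm above.
def backprop_step(w_hidden, w_out, x, t, eta=0.5):
    h, o = forward(w_hidden, w_out, x)
    xb, hb = [1.0] + list(x), [1.0] + h
    # 2. delta_k <- o_k (1 - o_k)(t_k - o_k) for each output unit
    delta_o = [ok * (1 - ok) * (tk - ok) for ok, tk in zip(o, t)]
    # 3. delta_h <- o_h (1 - o_h) sum_k w_{h,k} delta_k
    delta_h = [h[j] * (1 - h[j]) *
               sum(w_out[k][j + 1] * delta_o[k] for k in range(len(o)))
               for j in range(len(h))]
    # 4. w_{i,j} <- w_{i,j} + eta * delta_j * x_{i,j}
    for k, dk in enumerate(delta_o):
        for i in range(len(hb)):
            w_out[k][i] += eta * dk * hb[i]
    for j, dj in enumerate(delta_h):
        for i in range(len(xb)):
            w_hidden[j][i] += eta * dj * xb[i]
```

Repeatedly applying `backprop_step` to one example drives the corresponding output toward its target, as the gradient derivation on the previous slide predicts.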
More on Backpropagation
• Gradient descent over entire network weight vector
• Easily generalized to arbitrary directed graphs
• Will find a local, not necessarily global error minimum
  - In practice, often works well (can run multiple times)
• Often include weight momentum α:
  Δw_{i,j}(n) = η δ_j x_{i,j} + α Δw_{i,j}(n-1)
• Minimizes error over training examples
  - Will it generalize well to subsequent examples?
• Training can take thousands of iterations -- slow!
  - Using network after training is fast
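The momentum term above has a simple interpretation: under a steady gradient the step size grows toward η·(gradient term)/(1 - α). A minimal sketch with assumed names and constants:

```python
# Sketch of the momentum update from the slide, for one weight:
# delta_w(n) = eta * delta_j * x_{i,j} + alpha * delta_w(n - 1).
def momentum_update(prev_dw, grad_term, eta=0.1, alpha=0.9):
    return eta * grad_term + alpha * prev_dw

# With a constant gradient term of 1.0, the step converges to the
# limit eta / (1 - alpha) = 1.0: momentum speeds travel across
# long flat stretches of the error surface.
dw = 0.0
for _ in range(200):
    dw = momentum_update(dw, 1.0)
```

This accumulation is also what carries the search through small local dips, one reason momentum can cause quicker convergence.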
Learning Hidden Layer Representations

Input    → Output
10000000 → 10000000
01000000 → 01000000
00100000 → 00100000
00010000 → 00010000
00001000 → 00001000
00000100 → 00000100
00000010 → 00000010
00000001 → 00000001
Learning Hidden Layer Representations

Input    → Hidden Values → Output
10000000 → .08 .04 .89 → 10000000
01000000 → .88 .11 .01 → 01000000
00100000 → .27 .97 .01 → 00100000
00010000 → .71 .97 .99 → 00010000
00001000 → .02 .05 .03 → 00001000
00000100 → .99 .99 .22 → 00000100
00000010 → .98 .01 .80 → 00000010
00000001 → .01 .94 .60 → 00000001
Output Unit Error during Training

(Plot: sum of squared errors for each output unit vs. training iterations, 0 to 2500)
Hidden Unit Encoding

(Plot: hidden unit encoding for one input vs. training iterations, 0 to 2500)
Input to Hidden Weights

(Plot: weights from the inputs to one hidden unit vs. training iterations, 0 to 2500)
Convergence of Backpropagation

Gradient descent to some local minimum
• Perhaps not global minimum
• Momentum can cause quicker convergence
• Stochastic gradient descent also results in faster convergence
• Can train multiple networks and get different results (using different initial weights)

Nature of convergence
• Initialize weights near zero
• Therefore, initial networks near-linear
• Increasingly non-linear functions as training progresses
Expressive Capabilities of ANNs

Boolean functions:
• Every Boolean function can be represented by a network with a single hidden layer
• But that might require an exponential (in the number of inputs) number of hidden units

Continuous functions:
• Every bounded continuous function can be approximated with arbitrarily small error by a network with one hidden layer [Cybenko 1989; Hornik et al. 1989]
• Any function can be approximated to arbitrary accuracy by a network with two hidden layers [Cybenko 1988]
Overfitting in ANNs

Error versus weight updates (example 1)

(Plot: training-set error decreases steadily while validation-set error eventually rises, over 0 to 20000 weight updates)
Overfitting in ANNs

Error versus weight updates (example 2)

(Plot: training-set and validation-set error over 0 to 6000 weight updates)
Neural Nets for Face Recognition

(Diagram: a network with 30x32 inputs and output units for left, strt, rgt, up, with typical input images shown)

90% accurate learning head pose, and recognizing 1-of-20 faces
Learned Network Weights

(Diagram: the learned weights from the 30x32 inputs to the hidden units, for the left, strt, rgt, up outputs, with typical input images shown)
Alternative Error Functions

Penalize large weights:

E(w) ≡ (1/2) Σ_{d∈D} Σ_{k∈outputs} (t_kd - o_kd)² + γ Σ_{i,j} w_ji²

Train on target slopes as well as values:

E(w) ≡ (1/2) Σ_{d∈D} Σ_{k∈outputs} [ (t_kd - o_kd)² + μ Σ_{j∈inputs} (∂t_kd/∂x_d^j - ∂o_kd/∂x_d^j)² ]

Tie together weights:
• e.g., in phoneme recognition
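The first alternative, squared error plus a weight-decay penalty, is simple to compute. An illustrative sketch; the function name, argument layout, and γ value are assumptions:

```python
# Sketch of the penalized error function from the slide:
# E(w) = 1/2 * sum_d sum_k (t_kd - o_kd)^2 + gamma * sum_{i,j} w_ji^2.
# targets/outputs are lists of per-example output vectors; weights is
# a list of weight rows.
def penalized_error(targets, outputs, weights, gamma=0.01):
    sq = 0.5 * sum((t - o) ** 2
                   for td, od in zip(targets, outputs)
                   for t, o in zip(td, od))
    penalty = gamma * sum(w * w for row in weights for w in row)
    return sq + penalty
```

Because the penalty grows with the squared weights, gradient descent on this E biases the search toward small weights, one common guard against overfitting.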
Recurrent Networks

(Diagrams: a feedforward network mapping x(t) to y(t+1); a recurrent network with a feedback loop from its output back to its input; and the recurrent network unfolded in time, with x(t), x(t-1), x(t-2) producing y(t+1), y(t), y(t-1))
Neural Network Summary
• physiologically (neurons) inspired model
• powerful (accurate), slow, opaque (hard to understand resulting model)
• bias: preferential
  - based on gradient descent
  - finds local minimum
  - affected by initial conditions, parameters
• neural units
  - linear
  - linear threshold
  - sigmoid
Neural Network Summary (cont)
• gradient descent
  - convergence
• linear units
  - limitation: hyperplane decision surface
  - learning rule
• multilayer network
  - advantage: can have non-linear decision surface
  - backpropagation to learn
    • backprop learning rule
• learning issues
  - units used
Neural Network Summary (cont)
• learning issues (cont)
  - batch versus incremental (stochastic)
  - parameters
    • initial weights
    • learning rate
    • momentum
  - cost (error) function
    • sum of squared errors
    • can include penalty terms
• recurrent networks
  - simple
  - backpropagation through time