CS623: Introduction to Computing with Neural Nets (lecture-2)
CS623: Introduction to Computing with Neural Nets
(lecture-2)
Pushpak Bhattacharyya
Computer Science and Engineering Department
IIT Bombay
Recapitulation
The human brain

• Pre-frontal cortex: self-control
• Insula: intuition & empathy
• Anterior cingulate cortex: anxiety
• Hippocampus: emotional memory
• Hypothalamus: control of hormones
• Pituitary gland: mother instincts
• Amygdala: aggression
Functional Map of the Brain
The Perceptron Model
A perceptron is a computing element with input lines having associated weights and the cell having a threshold value. The perceptron model is motivated by the biological neuron.
[Diagram: input lines x1, …, xn-1, xn with weights w1, …, wn-1, wn feed a cell with threshold θ; output = y]
Step function / threshold function:

y = 1 for Σ wi xi ≥ θ
y = 0 otherwise

[Plot: y against Σ wi xi; y jumps from 0 to 1 at Σ wi xi = θ]
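As a concrete illustration of this rule (a minimal sketch, not from the slides; the function name and the AND-gate weights are my own choices):

```python
# A minimal sketch of the perceptron computation: output 1 when the
# weighted sum of inputs reaches the threshold, else 0.

def perceptron(weights, threshold, inputs):
    s = sum(w * x for w, x in zip(weights, inputs))
    return 1 if s >= threshold else 0

# Illustrative example: w1 = w2 = 1 with theta = 1.5 computes 2-input AND.
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, '->', perceptron([1, 1], 1.5, [x1, x2]))
```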
Perceptron Training Algorithm (PTA)
Preprocessing:
1. The computation law is modified to

   y = 1 if ∑ wi xi > θ
   y = 0 if ∑ wi xi < θ

[Diagrams: the same perceptron (inputs x1, x2, x3, …, xn with weights w1, w2, w3, …, wn), with the threshold unit changed from (θ, ≤) to (θ, <)]
PTA – preprocessing cont…
2. Absorb θ as a weight
3. Negate all the zero-class examples
[Diagrams: θ is absorbed as a weight w0 = θ on a new constant input x0 = -1; the remaining inputs x1, x2, x3, …, xn keep weights w1, w2, w3, …, wn, and the unit's threshold becomes 0]
Example to demonstrate preprocessing
• OR perceptron
  1-class: <1,1>, <1,0>, <0,1>
  0-class: <0,0>

Augmented x vectors:
  1-class: <-1,1,1>, <-1,1,0>, <-1,0,1>
  0-class: <-1,0,0>

Negate 0-class: <1,0,0>
Example to demonstrate preprocessing cont…

Now the vectors are:

|    | x0 | x1 | x2 |
|----|----|----|----|
| X1 | -1 | 0  | 1  |
| X2 | -1 | 1  | 0  |
| X3 | -1 | 1  | 1  |
| X4 | 1  | 0  | 0  |
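A small sketch of the two preprocessing steps in Python (illustrative only; the function name is mine, and it assumes each example is a tuple of inputs). Run on the OR data, it reproduces X1–X4 above:

```python
# Sketch of PTA preprocessing: prepend x0 = -1 to absorb theta as w0,
# then negate every augmented 0-class example.

def preprocess(one_class, zero_class):
    augmented = [(-1, *x) for x in one_class + zero_class]
    k = len(one_class)
    return augmented[:k] + [tuple(-v for v in x) for x in augmented[k:]]

# OR function examples from the slides.
vectors = preprocess(one_class=[(0, 1), (1, 0), (1, 1)],
                     zero_class=[(0, 0)])
print(vectors)  # [(-1, 0, 1), (-1, 1, 0), (-1, 1, 1), (1, 0, 0)] = X1..X4
```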
Perceptron Training Algorithm
1. Start with a random value of w, e.g. <0,0,0,…>.
2. Test for w·xi > 0. If the test succeeds for every i = 1, 2, …, m, return w.
3. Otherwise, modify w: wnext = wprev + xfail, where xfail is a vector that failed the test, and go to step 2.
Tracing PTA on OR-example
(Testing X1 … X4 cyclically and adding the first vector that fails:)

w = <0,0,0>    w·X1 fails
w = <-1,0,1>   w·X4 fails
w = <0,0,1>    w·X2 fails
w = <-1,1,1>   w·X4 fails
w = <0,1,1>    w·X4 fails
w = <1,1,1>    w·X1 fails
w = <0,1,2>    w·X4 fails
w = <1,1,2>    w·X2 fails
w = <0,2,2>    w·X4 fails
w = <1,2,2>    success
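Below is a sketch of the algorithm in Python (my own rendering of steps 1–3, not the lecture's code). Scanning X1…X4 from the start after every update, it passes through the same weight vectors and stops at w = <1,2,2>:

```python
# Sketch of the Perceptron Training Algorithm on preprocessed vectors:
# repeat step 2 (test) and step 3 (add a failing vector) until all pass.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def pta(vectors, w0=None):
    w = list(w0) if w0 else [0] * len(vectors[0])  # step 1: initial w
    while True:
        fail = next((x for x in vectors if dot(w, x) <= 0), None)
        if fail is None:               # step 2: every test w.x > 0 succeeds
            return w
        w = [wi + xi for wi, xi in zip(w, fail)]   # step 3: w += x_fail
        print('w =', w)

X = [(-1, 0, 1), (-1, 1, 0), (-1, 1, 1), (1, 0, 0)]  # X1..X4 from above
print('final:', pta(X))   # converges to [1, 2, 2]
```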
Theorems on PTA
1. The process will terminate
2. The order of selection of the xi for testing does not matter: the algorithm terminates regardless of which failing vector is chosen at each step.
End of recap
Proof of Convergence of PTA
• Perceptron Training Algorithm (PTA)
• Statement:
Whatever be the initial choice of weights and whatever be the vector chosen for testing, PTA converges if the vectors are from a linearly separable function.
Proof of Convergence of PTA
• Suppose wn is the weight vector at the nth step of the algorithm.
• At the beginning, the weight vector is w0.
• We go from wi to wi+1 when a vector Xj fails the test wi·Xj > 0, updating wi as
  wi+1 = wi + Xj
• Since the Xj come from a linearly separable function, there exists a w* such that w*·Xj > 0 for all j.
Proof of Convergence of PTA

• Consider the expression

  G(wn) = (wn · w*) / |wn|

  where wn = weight at the nth iteration.
• G(wn) = (|wn| · |w*| · cos α) / |wn|, where α = angle between wn and w*
• G(wn) = |w*| · cos α
• G(wn) ≤ |w*|   (as -1 ≤ cos α ≤ 1)
Behavior of Numerator of G
wn · w* = (wn-1 + Xfail(n-1)) · w*
        = wn-1 · w* + Xfail(n-1) · w*
        = (wn-2 + Xfail(n-2)) · w* + Xfail(n-1) · w*
        …
        = w0 · w* + (Xfail(0) + Xfail(1) + … + Xfail(n-1)) · w*

where Xfail(i) is the vector that failed the test at step i.

• w*·Xfail(i) is always positive: note carefully.
• Let δ = min over j of w*·Xj; δ > 0, since there are finitely many vectors and every w*·Xj > 0.
• Numerator of G ≥ w0·w* + n·δ
• So, the numerator of G grows at least linearly with n.
Behavior of Denominator of G
• |wn|² = wn · wn
        = (wn-1 + Xfail(n-1)) · (wn-1 + Xfail(n-1))
        = |wn-1|² + 2·wn-1·Xfail(n-1) + |Xfail(n-1)|²
        ≤ |wn-1|² + |Xfail(n-1)|²      (as wn-1 · Xfail(n-1) ≤ 0)
        ≤ |w0|² + |Xfail(0)|² + |Xfail(1)|² + … + |Xfail(n-1)|²
• Suppose |Xj| ≤ ν, the maximum magnitude.
• So, |wn|² ≤ |w0|² + n·ν², i.e. the denominator |wn| grows at most as √n.
Some Observations
• The numerator of G grows at least as fast as n.
• The denominator of G grows at most as fast as √n.
=> The numerator grows faster than the denominator.
• If PTA does not terminate, the G(wn) values will become unbounded.
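To see this numerically (an illustration of mine, not in the slides), one can track G(wn) = wn·w*/|wn| during the OR-example run, taking w* = <1,2,2>, the separating weight vector found earlier; G stays below |w*| = 3 while n grows:

```python
# Sketch: G(w_n) = (w_n . w*) / |w_n| stays bounded by |w*| as PTA runs
# on the OR example, with w* = <1,2,2> (a known separating vector).
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

X = [(-1, 0, 1), (-1, 1, 0), (-1, 1, 1), (1, 0, 0)]   # preprocessed OR
w_star = (1, 2, 2)
bound = math.sqrt(dot(w_star, w_star))                # |w*| = 3

w = [0, 0, 0]
while any(dot(w, x) <= 0 for x in X):
    fail = next(x for x in X if dot(w, x) <= 0)
    w = [wi + xi for wi, xi in zip(w, fail)]
    g = dot(w, w_star) / math.sqrt(dot(w, w))
    print(f'w = {w}, G = {g:.3f} <= |w*| = {bound:.3f}')
```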
Some Observations contd.
• But |G(wn)| ≤ |w*|, which is finite, so this is impossible!
• Hence, PTA has to converge.
• Proof is due to Marvin Minsky.
Convergence of PTA
• Whatever be the initial choice of weights and whatever be the vector chosen for testing, PTA converges if the vectors are from a linearly separable function.
Study of Linear Separability
[Diagram: perceptron with inputs x1, x2, x3, …, xn, weights w1, w2, w3, …, wn, threshold θ and output y]

• W·Xj = 0 defines a hyperplane in the (n+1) dimension => the W vector and the Xj vectors on the hyperplane are perpendicular to each other.
Linear Separability
[Figure: positive points X1, X2, …, Xk (marked +) and negative points Xk+1, Xk+2, …, Xm (marked -) on opposite sides of a separating hyperplane]

Positive set: w·Xj > 0 for j ≤ k
Negative set: w·Xj < 0 for j > k
Test for Linear Separability (LS)

• Theorem: A function is linearly separable iff the vectors corresponding to the function do not have a Positive Linear Combination (PLC).
• PLC is both a necessary and sufficient condition.
• X1, X2, …, Xm: vectors of the function; Y1, Y2, …, Ym: the augmented, negated set.
• Each Yi is obtained by prepending -1 to Xi; for 0-class vectors the result is then negated.
Example (1) - XNOR
• The set {Yi} has a PLC if ∑ Pi Yi = 0, 1 ≤ i ≤ m, where
  – each Pi is a non-negative scalar, and
  – at least one Pi > 0.
• Example: 2-bit even parity (X-NOR function)

| Xi | Input | Class | Yi       |
|----|-------|-------|----------|
| X1 | <0,0> | +     | <-1,0,0> |
| X2 | <0,1> | -     | <1,0,-1> |
| X3 | <1,0> | -     | <1,-1,0> |
| X4 | <1,1> | +     | <-1,1,1> |
Example (1) - XNOR
• P1·[-1 0 0]T + P2·[1 0 -1]T + P3·[1 -1 0]T + P4·[-1 1 1]T = [0 0 0]T
• All Pi = 1 gives the result.
• For the parity function, a PLC exists => not linearly separable.
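The PLC test can also be run mechanically as a linear-programming feasibility check: does some P ≥ 0 with ∑ Pi = 1 satisfy ∑ Pi Yi = 0? Below is a sketch of that idea (my own code, assuming SciPy is available; has_plc is an invented name). It confirms the result above, and the same check on the 3-bit majority vectors of the next slide reports no PLC:

```python
# Sketch: check for a Positive Linear Combination (PLC) by asking a
# linear program for P >= 0, sum(P) = 1 (so some Pi > 0), Y^T P = 0.
import numpy as np
from scipy.optimize import linprog

def has_plc(Y):
    Y = np.asarray(Y, dtype=float)
    m, d = Y.shape
    A_eq = np.vstack([Y.T, np.ones((1, m))])   # d zero-sum rows + scale row
    b_eq = np.append(np.zeros(d), 1.0)
    res = linprog(c=np.zeros(m), A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, None)] * m)
    return res.success                          # feasible <=> PLC exists

xnor_Y = [(-1, 0, 0), (1, 0, -1), (1, -1, 0), (-1, 1, 1)]
print(has_plc(xnor_Y))   # True  -> XNOR is not linearly separable

maj_Y = [(1, 0, 0, 0), (1, 0, 0, -1), (1, 0, -1, 0), (-1, 0, 1, 1),
         (1, -1, 0, 0), (-1, 1, 0, 1), (-1, 1, 1, 0), (-1, 1, 1, 1)]
print(has_plc(maj_Y))    # False -> majority is linearly separable
```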
Example (2) – Majority function
• 3-bit majority function. The augmented, negated set:

| Xi | Input   | Class | Yi         |
|----|---------|-------|------------|
| X1 | <0,0,0> | -     | <1,0,0,0>  |
| X2 | <0,0,1> | -     | <1,0,0,-1> |
| X3 | <0,1,0> | -     | <1,0,-1,0> |
| X4 | <0,1,1> | +     | <-1,0,1,1> |
| X5 | <1,0,0> | -     | <1,-1,0,0> |
| X6 | <1,0,1> | +     | <-1,1,0,1> |
| X7 | <1,1,0> | +     | <-1,1,1,0> |
| X8 | <1,1,1> | +     | <-1,1,1,1> |

• Suppose a PLC exists. The equations obtained (one per component of ∑ Pi Yi = 0) are:
  P1 + P2 + P3 - P4 + P5 - P6 - P7 - P8 = 0
  -P5 + P6 + P7 + P8 = 0
  -P3 + P4 + P7 + P8 = 0
  -P2 + P4 + P6 + P8 = 0
• On solving, all Pi are forced to 0: adding all four equations gives P1 + P4 + P6 + P7 + 2·P8 = 0, so these Pi are 0, and the remaining equations then force P2 = P3 = P5 = 0. This contradicts the requirement that at least one Pi > 0.
• 3-bit majority function => no PLC => LS.
Limitations of perceptron
• Non-linear separability is all-pervading.
• A single perceptron does not have enough computing power.
• E.g., XOR cannot be computed by a perceptron.
Solutions

• Tolerate error (e.g., the pocket algorithm used by connectionist expert systems): try to get the best possible hyperplane using only perceptrons.
• Use higher-dimensional surfaces, e.g. degree-2 surfaces like the parabola (see the sketch after this list).
• Use a layered network.
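As a quick sketch of the degree-2 idea (my own example, not from the slides): adding the product feature x1·x2 as a third input makes XOR linearly separable, e.g. with weights (1, 1, -2) and threshold 0.5:

```python
# Sketch: XOR is not linearly separable in (x1, x2), but with the extra
# degree-2 feature x1*x2 a single threshold unit suffices.

def unit(weights, threshold, inputs):
    return 1 if sum(w * x for w, x in zip(weights, inputs)) >= threshold else 0

for x1 in (0, 1):
    for x2 in (0, 1):
        y = unit([1, 1, -2], 0.5, [x1, x2, x1 * x2])
        print(x1, x2, '->', y)   # reproduces the XOR truth table
```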
Example - XOR

XOR truth table:

| x1 | x2 | x1 ⊕ x2 |
|----|----|---------|
| 0  | 0  | 0       |
| 0  | 1  | 1       |
| 1  | 0  | 1       |
| 1  | 1  | 0       |

x1 ⊕ x2 = x1·¬x2 + ¬x1·x2, so XOR can be built from perceptrons for the two AND terms and one for their OR.

Calculation of ¬x1·x2: a perceptron over x1, x2 with w1 = -1, w2 = 1.5, θ = 1 (x1·¬x2 is symmetric: w1 = 1.5, w2 = -1, θ = 1).

Calculation of XOR: a perceptron over x1·¬x2 and ¬x1·x2 with w1 = 1, w2 = 1, θ = 0.5.
Example - XOR (contd.)

[Network diagram: inputs x1 and x2 feed two hidden perceptrons (each with threshold 1): one computes x1·¬x2 with weights 1.5 and -1, the other computes ¬x1·x2 with weights -1 and 1.5; their outputs feed the top perceptron (w1 = 1, w2 = 1, θ = 0.5), whose output is x1 ⊕ x2]
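Assembling the slide's numbers into code (a sketch; the helper names are mine), using the strict computation law y = 1 iff ∑ wi xi > θ:

```python
# Sketch of the two-layer XOR network with the weights from the slides.

def unit(weights, threshold, inputs):
    return 1 if sum(w * x for w, x in zip(weights, inputs)) > threshold else 0

def xor(x1, x2):
    h1 = unit([1.5, -1], 1, [x1, x2])    # computes x1 AND (NOT x2)
    h2 = unit([-1, 1.5], 1, [x1, x2])    # computes (NOT x1) AND x2
    return unit([1, 1], 0.5, [h1, h2])   # ORs the two hidden outputs

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, '->', xor(x1, x2))   # 0 0->0, 0 1->1, 1 0->1, 1 1->0
```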
Multi Layer Perceptron (MLP)
• Question: how to find weights for the hidden layers when no target output is available for them?
• Credit assignment problem – to be solved by “Gradient Descent”.
Assignment
• Find, by solving linear inequalities, the perceptron that computes the majority Boolean function.
• Then feed the function to your implementation of the perceptron training algorithm and study its behaviour.
• Download the SNNS and WEKA packages and try your hand at the perceptron algorithm built into the packages.