Introduction to Neural Networks (undergraduate course) Lecture 4 of 9


Page 1: Introduction to Neural Networks (undergraduate course) Lecture 4 of 9

Neural Networks

Dr. Randa Elanwar

Lecture 4

Page 2: Introduction to Neural Networks (undergraduate course) Lecture 4 of 9

Lecture Content

• Linearly separable functions: logical gate implementation

– Learning laws: Perceptron learning rule

– Pattern mode solution method

– Batch mode solution method


Page 3: Introduction to Neural Networks (undergraduate course) Lecture 4 of 9

Learning Linearly Separable Functions

• The initial network has randomly assigned weights.

• Learning is done by making small adjustments to the weights to reduce the difference between the observed and predicted values.

• The main difference from logical algorithms is the need to repeat the update phase several times in order to achieve convergence.

• The updating process is divided into epochs.

• Each epoch updates all the weights of the network.

• Note that the initial weights and the learning rate value determine the number of iterations needed for convergence.


Page 4: Introduction to Neural Networks (undergraduate course) Lecture 4 of 9

Perceptron learning rule

• Desired: the desired output for a given input

• Network calculates what it thinks the output should be

• Network changes its weights in proportion to the error between the desired & calculated results

• Δwi,j = η * [Desiredi - outputi] * inputj
  – where: η is the learning rate (a given constant); [Desiredi - outputi] is the error term; and inputj is the input activation

• wi,j = wi,j + Δwi,j (delta rule)

• Note: there are other learning rules/laws that will be discussed later

• Learning rate η: (1) used to control the amount of weight adjustment at each step of training, (2) ranges from 0 to 1, (3) determines the rate of learning at each time step
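A minimal Python sketch of this update rule may help; the function name, the example numbers, and the use of plain lists are illustrative choices, not from the lecture.

```python
# Delta rule for one output unit: delta_w = eta * (desired - output) * input
def perceptron_update(weights, inputs, desired, output, eta=0.1):
    """Return the weights after one delta-rule adjustment."""
    return [w + eta * (desired - output) * x for w, x in zip(weights, inputs)]

# Hypothetical step: the unit should have fired (desired = 1) but gave 0,
# so the weights on the active inputs are increased.
new_w = perceptron_update([0.2, -0.4], [1, 1], desired=1, output=0, eta=0.5)
print(new_w)  # approximately [0.7, 0.1]
```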


Page 5: Introduction to Neural Networks (undergraduate course) Lecture 4 of 9

Adjusting perceptron weights

• wi,j = wi,j + Δwi,j

• Δwi,j = η * [Desiredi - outputi] * inputj

• missi is (Desiredi - outputi)

• Adjust each wi,j based on inputj and missi

• If a set of <input, output> pairs are learnable (representable), the delta rule will find the necessary weights (when miss=0)

– in a finite number of steps

– independent of initial weights

Desired < 0, output > 0  →  Δw < 0

Desired = 0, output = 0  →  Δw = 0

Desired > 0, output < 0  →  Δw > 0
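These three cases can be checked with a few lines of code; the snippet assumes a positive input activation and η = 1, which are not stated on the slide.

```python
# Sign of the weight change delta_w = eta * (desired - output) * input
# for a positive input and eta = 1.
eta, x = 1, 1
for desired, output in [(-1, 1), (0, 0), (1, -1)]:
    delta_w = eta * (desired - output) * x
    print(desired, output, delta_w)  # desired < output -> delta_w < 0, etc.
```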


Page 6: Introduction to Neural Networks (undergraduate course) Lecture 4 of 9

Hypothetical example

• Suppose we have 2 glasses: the first is narrow and tall and has water in it; the second is wide and short with no water in it

• The target is to make both glasses contain the same volume of water

• Initially, we pour some water from the tall glass into the short one, then we measure the volumes

• If the volume in the short glass is less than in the tall one, we add more water

• If the volume in the short glass is more than in the tall one, we return some water

• And so on until both volumes are equal, and we are done

• The target = desired output, the water = weights, the difference measure = error


Page 7: Introduction to Neural Networks (undergraduate course) Lecture 4 of 9

Node biases

• A node’s output is a function of the weighted sum of its inputs

• What is a bias?

• How can we learn the bias value?

• Answer: treat them like just another weight


Page 8: Introduction to Neural Networks (undergraduate course) Lecture 4 of 9

Training biases (θ)

• A node’s output is:
  – 1 if w1x1 + w2x2 + … + wnxn >= θ
  – 0 otherwise

• Rewrite:
  – w1x1 + w2x2 + … + wnxn - θ >= 0
  – w1x1 + w2x2 + … + wnxn + θ(-1) >= 0

• Hence, the bias θ is just another weight whose input activation is always -1

• Just add one more input unit to the network topology

[Figure: network with one extra input unit fixed at -1 whose weight is the bias]
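A minimal sketch of this trick in Python; the weight and threshold values are assumed for illustration (they happen to implement AND), and the function name is mine.

```python
# Fold the threshold theta into the weights: append theta to the weight vector
# and a constant -1 to every input vector, so "sum >= theta" becomes "sum >= 0".
def fires(weights, theta, inputs):
    augmented_w = weights + [theta]   # the bias is just another weight
    augmented_x = inputs + [-1]       # its input activation is always -1
    total = sum(w * x for w, x in zip(augmented_w, augmented_x))
    return 1 if total >= 0 else 0

print(fires([1.0, 1.0], 1.5, [1, 1]))  # 1, since 1 + 1 - 1.5 >= 0
print(fires([1.0, 1.0], 1.5, [1, 0]))  # 0, since 1 - 1.5 < 0
```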


Page 9: Introduction to Neural Networks (undergraduate course) Lecture 4 of 9

Linearly Separable Functions

• When solving the logical AND problem we are searching for the straight-line equation separating the +ve (1) and -ve (0) output regions on the graph

• Different values for w1, w2, θ lead to different line slopes. We have more than one solution depending on: the initial weights W, the learning rate η, the activation function f, and the learning mode (pattern vs. batch)


[Figure: decision line w1I1 + w2I2 = θ separating the +ve (1) region from the -ve (0) region for logical AND]

Page 10: Introduction to Neural Networks (undergraduate course) Lecture 4 of 9

Linearly Separable Functions

• Similarly for the logical OR problem

• Different values for w1, w2, θ lead to different line slopes.

• We have more than one solution depending on: the initial weights W, the learning rate η, the activation function f, and the learning mode (pattern vs. batch)


[Figure: decision line w1I1 + w2I2 = θ separating the +ve (1) region from the -ve (0) region for logical OR]
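As a quick illustration of such separating lines, the snippet below tests one hand-picked (w1, w2, θ) triple for each gate; these particular values are assumed, not taken from the slides.

```python
# A point (x1, x2) is on the +ve side if w1*x1 + w2*x2 >= theta, otherwise -ve.
def gate(w1, w2, theta, x1, x2):
    return 1 if w1 * x1 + w2 * x2 >= theta else 0

inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
print([gate(1, 1, 1.5, *p) for p in inputs])  # AND: [0, 0, 0, 1]
print([gate(1, 1, 0.5, *p) for p in inputs])  # OR:  [0, 1, 1, 1]
```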

Page 11: Introduction to Neural Networks (undergraduate course) Lecture 4 of 9

Linearly Separable Functions

• Example: logical AND, with initial weights w1 = 0.5, w2 = 0.3, bias = 0.5, and a binary step activation function with threshold t = 0.5. The learning rate η = 1


[Figure: single-layer perceptron with inputs x1, x2, weights w1 = 0.5 and w2 = 0.3, and output y]

y_in = x1w1 + x2w2

Activation function: binary step function with t = 0.5
φ(y_in) = 1 if y_in >= t, otherwise φ(y_in) = 0
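A small Python sketch of this setup; the slide's diagram shows y_in = x1w1 + x2w2, and the sketch additionally includes the bias weight of 0.5 on a constant -1 input, as used in the worked iterations on the following slides. Variable and function names are mine.

```python
# Initial network for the AND example: w1 = 0.5, w2 = 0.3, bias weight 0.5 (input -1),
# binary step activation with threshold t = 0.5.
def step(y_in, t=0.5):
    return 1 if y_in >= t else 0

def forward(x1, x2, w1=0.5, w2=0.3, b=0.5):
    y_in = x1 * w1 + x2 * w2 + b * (-1)  # bias treated as a weight on a fixed -1 input
    return step(y_in)

print([forward(x1, x2) for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]])
# [0, 0, 0, 0] -- the initial weights misclassify (1, 1), so learning is needed
```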

Page 12: Introduction to Neural Networks (undergraduate course) Lecture 4 of 9

Solving Linearly Separable Functions (Pattern mode)

• Given:

• Since we consider the bias as an additional weight, the weight vector is 1x3, so we have to extend the dimensionality of the input vectors x1, x2, x3, and x4 from 2x1 to 3x1 to perform the multiplication.


W(0) = [0.5  0.3  0.5]   (w1, w2, and the bias weight)

Output rule: Y = f(W.X + b), with the bias b folded into W as the third weight, so effectively Y = f(W.X) on the augmented inputs

Input patterns (columns):
X = [x1 x2 x3 x4] = [0  0  1  1]
                    [0  1  0  1]

Truth table (logical AND):
x1 x2 | y
 0  0 | 0
 0  1 | 0
 1  0 | 0
 1  1 | 1

Augmented input patterns (a bias input of -1 appended to each column):
X = [ 0   0   1   1]
    [ 0   1   0   1]
    [-1  -1  -1  -1]
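The same setup in NumPy, as a sketch; the array names are mine, and the printed values are what the augmented product gives for the four patterns (they match the first iteration on the next slide).

```python
import numpy as np

# Bias folded into W as a third weight; inputs get a constant -1 third row.
W0 = np.array([0.5, 0.3, 0.5])        # [w1, w2, bias weight]
X = np.array([[0, 0, 1, 1],           # x1 of the four patterns
              [0, 1, 0, 1],           # x2 of the four patterns
              [-1, -1, -1, -1]])      # bias input, always -1
d = np.array([0, 0, 0, 1])            # desired AND outputs

y_in = W0 @ X                         # one weighted sum per pattern (1x3 times 3x4)
print(y_in)                           # approximately [-0.5 -0.2  0.   0.3]
print((y_in >= 0.5).astype(int))      # [0 0 0 0]: only the last pattern is misclassified
```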

Page 13: Introduction to Neural Networks (undergraduate course) Lecture 4 of 9

Solving Linearly Separable Functions (Pattern mode)

• Update weight vector for iteration 1


(Each weighted sum below is passed through the step activation: y = 1 if the sum >= t = 0.5, otherwise y = 0.)

W(0).X1 = [0.5  0.3  0.5].[0  0  -1]^T = -0.5  →  y = 0   OK
W(0).X2 = [0.5  0.3  0.5].[0  1  -1]^T = -0.2  →  y = 0   OK
W(0).X3 = [0.5  0.3  0.5].[1  0  -1]^T =  0    →  y = 0   OK
W(0).X4 = [0.5  0.3  0.5].[1  1  -1]^T =  0.3  →  y = 0   Wrong (desired y = 1)

W(1) = W(0) + η(y_des - y).X4^T = [0.5  0.3  0.5] + (1)(1 - 0)[1  1  -1] = [1.5  1.3  -0.5]
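This first update can be reproduced in a few lines (a sketch; η = 1 and the -1 bias input follow the conventions above).

```python
import numpy as np

W0 = np.array([0.5, 0.3, 0.5])
X4 = np.array([1, 1, -1])        # the only misclassified pattern in this pass
eta, desired, y = 1.0, 1, 0

W1 = W0 + eta * (desired - y) * X4
print(W1)                        # approximately [1.5  1.3  -0.5]
```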

Page 14: Introduction to Neural Networks (undergraduate course) Lecture 4 of 9

Solving Linearly Separable Functions (Pattern mode)

• Update weight vector for iteration 2

• Update weight vector for iteration 3


W(1).X1 = [1.5  1.3  -0.5].[0  0  -1]^T = 0.5  →  y = 1   Wrong (desired y = 0)

W(2) = W(1) + η(y_des - y).X1^T = [1.5  1.3  0.5]

W(2).X2 = [1.5  1.3  0.5].[0  1  -1]^T = 0.8  →  y = 1   Wrong (desired y = 0)

W(3) = W(2) + η(y_des - y).X2^T = [1.5  0.3  1.5]

Page 15: Introduction to Neural Networks (undergraduate course) Lecture 4 of 9

Solving Linearly Separable Functions (Pattern mode)

• Update weight vector for iteration 4

• Update weight vector for iteration 5


W(3).X3 = [1.5  0.3  1.5].[1  0  -1]^T = 0    →  y = 0   OK
W(3).X4 = [1.5  0.3  1.5].[1  1  -1]^T = 0.3  →  y = 0   Wrong (desired y = 1)

W(4) = W(3) + η(y_des - y).X4^T = [2.5  1.3  0.5]

W(4).X1 = [2.5  1.3  0.5].[0  0  -1]^T = -0.5  →  y = 0   OK
W(4).X2 = [2.5  1.3  0.5].[0  1  -1]^T =  0.8  →  y = 1   Wrong (desired y = 0)

W(5) = W(4) + η(y_des - y).X2^T = [2.5  0.3  1.5]

Page 16: Introduction to Neural Networks (undergraduate course) Lecture 4 of 9

Solving Linearly Separable Functions (Pattern mode)

• Update weight vector for iteration 6

• Update weight vector for iteration 7


W(5).X3 = [2.5  0.3  1.5].[1  0  -1]^T = 1  →  y = 1   Wrong (desired y = 0)

W(6) = W(5) + η(y_des - y).X3^T = [1.5  0.3  2.5]

W(6).X4 = [1.5  0.3  2.5].[1  1  -1]^T = -0.7  →  y = 0   Wrong (desired y = 1)

W(7) = W(6) + η(y_des - y).X4^T = [2.5  1.3  1.5]

W(7).X1 = [2.5  1.3  1.5].[0  0  -1]^T = -1.5  →  y = 0   OK
W(7).X2 = [2.5  1.3  1.5].[0  1  -1]^T = -0.2  →  y = 0   OK
W(7).X3 = [2.5  1.3  1.5].[1  0  -1]^T =  1    →  y = 1   Wrong (desired y = 0)

Page 17: Introduction to Neural Networks (undergraduate course) Lecture 4 of 9

Solving Linearly Separable Functions (Pattern mode)

• Update weight vector for iteration 8

• Update weight vector for iteration 9

• Update weight vector for iteration 10


W(8) = W(7) + η(y_des - y).X3^T = [1.5  1.3  2.5]

W(8).X4 = [1.5  1.3  2.5].[1  1  -1]^T = 0.3  →  y = 0   Wrong (desired y = 1)

W(9) = W(8) + η(y_des - y).X4^T = [2.5  2.3  1.5]

W(9).X1 = [2.5  2.3  1.5].[0  0  -1]^T = -1.5  →  y = 0   OK
W(9).X2 = [2.5  2.3  1.5].[0  1  -1]^T =  0.8  →  y = 1   Wrong (desired y = 0)

W(10) = W(9) + η(y_des - y).X2^T = [2.5  1.3  2.5]

Page 18: Introduction to Neural Networks (undergraduate course) Lecture 4 of 9

Solving Linearly Separable Functions (Pattern mode)

• Weight learning has converged after 10 iterations


W(10).X3 = [2.5  1.3  2.5].[1  0  -1]^T =  0    →  y = 0   OK
W(10).X4 = [2.5  1.3  2.5].[1  1  -1]^T =  1.3  →  y = 1   OK
W(10).X1 = [2.5  1.3  2.5].[0  0  -1]^T = -2.5  →  y = 0   OK
W(10).X2 = [2.5  1.3  2.5].[0  1  -1]^T = -1.2  →  y = 0   OK

All four patterns are now classified correctly, so the weights stop changing.
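The whole pattern-mode run can be reproduced with a short loop; this is a sketch under the conventions used above (bias input -1, step threshold 0.5, η = 1), and the loop structure and names are mine rather than the lecture's.

```python
import numpy as np

def step(y_in, t=0.5):
    return 1 if y_in >= t else 0

# Augmented patterns [x1, x2, -1] as rows, and the desired AND outputs
X = np.array([[0, 0, -1], [0, 1, -1], [1, 0, -1], [1, 1, -1]], dtype=float)
d = np.array([0, 0, 0, 1], dtype=float)

W = np.array([0.5, 0.3, 0.5])  # [w1, w2, bias weight]
eta = 1.0

# Pattern mode: update W immediately after every misclassified pattern
converged = False
while not converged:
    converged = True
    for x, desired in zip(X, d):
        y = step(W @ x)
        if y != desired:
            W = W + eta * (desired - y) * x  # delta rule
            converged = False

print(W)  # approximately [2.5  1.3  2.5], the weights reached on this slide
```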

Page 19: Introduction to Neural Networks (undergraduate course) Lecture 4 of 9

Solving Linearly Separable Functions (Batch mode)

• Update weight vector for iteration 1

• Add Δw for all misclassified inputs together in one step


W(0).X1 = [0.5  0.3  0.5].[0  0  -1]^T = -0.5  →  y = 0   OK
W(0).X2 = [0.5  0.3  0.5].[0  1  -1]^T = -0.2  →  y = 0   OK
W(0).X3 = [0.5  0.3  0.5].[1  0  -1]^T =  0    →  y = 0   OK
W(0).X4 = [0.5  0.3  0.5].[1  1  -1]^T =  0.3  →  y = 0   Wrong (desired y = 1)

W(1) = W(0) + η(y_des - y).X4^T = [1.5  1.3  -0.5]

Page 20: Introduction to Neural Networks (undergraduate course) Lecture 4 of 9

Solving Linearly Separable Functions (Batch mode)

• Update weight vector for iteration 2

• Add Δw for all misclassified inputs together in one step


W(1).X1 = [1.5  1.3  -0.5].[0  0  -1]^T = 0.5  →  y = 1   Wrong (desired y = 0)
W(1).X2 = [1.5  1.3  -0.5].[0  1  -1]^T = 1.8  →  y = 1   Wrong (desired y = 0)
W(1).X3 = [1.5  1.3  -0.5].[1  0  -1]^T = 2    →  y = 1   Wrong (desired y = 0)
W(1).X4 = [1.5  1.3  -0.5].[1  1  -1]^T = 3.3  →  y = 1   OK

W(2) = W(1) + η(y_des - y).X1^T + η(y_des - y).X2^T + η(y_des - y).X3^T
     = [1.5  1.3  -0.5] - [0  0  -1] - [0  1  -1] - [1  0  -1]
     = [0.5  0.3  2.5]

Page 21: Introduction to Neural Networks (undergraduate course) Lecture 4 of 9

Solving Linearly Separable Functions (Batch mode)

• Update weight vector for iteration 3

• Add Δw for all misclassified inputs together in one step


W(2).X1 = [0.5  0.3  2.5].[0  0  -1]^T = -2.5  →  y = 0   OK
W(2).X2 = [0.5  0.3  2.5].[0  1  -1]^T = -2.2  →  y = 0   OK
W(2).X3 = [0.5  0.3  2.5].[1  0  -1]^T = -2    →  y = 0   OK
W(2).X4 = [0.5  0.3  2.5].[1  1  -1]^T = -1.7  →  y = 0   Wrong (desired y = 1)

W(3) = W(2) + η(y_des - y).X4^T = [1.5  1.3  1.5]

Page 22: Introduction to Neural Networks (undergraduate course) Lecture 4 of 9

Solving Linearly Separable Functions (Batch mode)

• Note that:

• The number of iterations in the batch-mode solution is sometimes smaller than in the pattern-mode solution.

• The final weights obtained by the batch-mode solution are different from those obtained by the pattern-mode solution.


W(3).X1 = [1.5  1.3  1.5].[0  0  -1]^T = -1.5  →  y = 0   OK
W(3).X2 = [1.5  1.3  1.5].[0  1  -1]^T = -0.2  →  y = 0   OK
W(3).X3 = [1.5  1.3  1.5].[1  0  -1]^T =  0    →  y = 0   OK
W(3).X4 = [1.5  1.3  1.5].[1  1  -1]^T =  1.3  →  y = 1   OK

All four patterns are classified correctly after 3 batch updates, so learning has converged.
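For comparison, a sketch of the batch-mode procedure under the same assumed conventions (bias input -1, step threshold 0.5, η = 1); here the corrections for all misclassified patterns in a sweep are summed and applied in one step.

```python
import numpy as np

def step(y_in, t=0.5):
    return 1 if y_in >= t else 0

X = np.array([[0, 0, -1], [0, 1, -1], [1, 0, -1], [1, 1, -1]], dtype=float)
d = np.array([0, 0, 0, 1], dtype=float)
W = np.array([0.5, 0.3, 0.5])
eta = 1.0

updates = 0
while True:
    y = np.array([step(W @ x) for x in X])   # classify all patterns with the same W
    if np.array_equal(y, d):
        break
    W = W + eta * ((d - y) @ X)              # summed correction (correct patterns contribute 0)
    updates += 1

print(updates, W)  # 3 updates, weights approximately [1.5  1.3  1.5]
```

With this data the batch run stops after 3 weight updates at [1.5  1.3  1.5], while the pattern-mode run above needed 10 updates and ended at [2.5  1.3  2.5], which is the comparison made in the note on this slide.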