Neural Network Learning Rules - Er. Abhishek K. Upadhyay


TRANSCRIPT

Activation functions

• Unipolar

• Bipolar
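For concreteness, a minimal sketch of one unipolar and one bipolar activation function (the logistic and bipolar-sigmoid forms and the steepness constant lam are assumptions, not taken from the slides):

```python
import numpy as np

def unipolar(net, lam=1.0):
    # Unipolar (logistic) activation: output lies in (0, 1)
    return 1.0 / (1.0 + np.exp(-lam * net))

def bipolar(net, lam=1.0):
    # Bipolar activation: output lies in (-1, 1); equal to tanh(lam * net / 2)
    return 2.0 / (1.0 + np.exp(-lam * net)) - 1.0

print(unipolar(np.array([-2.0, 0.0, 2.0])))  # values between 0 and 1
print(bipolar(np.array([-2.0, 0.0, 2.0])))   # values between -1 and 1
```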

Activation functions

Example

Suppose a feedforward neural network with n inputs, m hidden units (tanh activation), and l output units (linear activation). vji is the weight from input i to hidden unit j, and wkj is the weight from hidden unit j to output unit k.

If the error is the squared error E = ½ Σk (dk - ok)², we can find the partial derivatives (backpropagation) and apply gradient descent.
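A minimal sketch of that example network and its gradients, assuming the squared error above (the array shapes, the learning constant 0.1, and the random data are mine):

```python
import numpy as np

def forward_and_grads(x, d, V, W):
    # V: (m, n) matrix of weights v_ji from input i to hidden unit j (tanh activation)
    # W: (l, m) matrix of weights w_kj from hidden unit j to output unit k (linear activation)
    z = np.tanh(V @ x)                           # hidden activations, shape (m,)
    o = W @ z                                    # linear outputs, shape (l,)
    e = d - o                                    # output errors d_k - o_k
    E = 0.5 * np.sum(e ** 2)                     # squared error
    # Backpropagation: partial derivatives of E with respect to W and V
    dW = -np.outer(e, z)                         # dE/dw_kj = -(d_k - o_k) z_j
    dV = -np.outer((W.T @ e) * (1 - z ** 2), x)  # dE/dv_ji, using tanh'(net_j) = 1 - z_j^2
    return E, dW, dV

rng = np.random.default_rng(0)
n, m, l = 4, 3, 2
x, d = rng.normal(size=n), rng.normal(size=l)
V, W = rng.normal(size=(m, n)), rng.normal(size=(l, m))
E, dW, dV = forward_and_grads(x, d, V, W)
W -= 0.1 * dW                                    # one gradient-descent step on each weight matrix
V -= 0.1 * dV
```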

Hebbian Learning Rule

• The learning signal is equal simply to the neuron’s output (Hebb 1949). We have :

r = f(wi^t x)

The increment Δwi of the weight vector becomes

Δwi = c f(wi^t x) x

The single weight wij is adapted using the following increment:

Δwij = c f(wi^t x) xj

This can be briefly written as

Δwij = c oi xj, for j = 1, 2, ..., n
Hebbian Learning Rule

• This rule represents purely feedforward, unsupervised learning.

• This rule states that if the cross product of the output and the input (the correlation term) is positive, the weight increases; otherwise the weight decreases.
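A minimal sketch of one Hebbian step under these definitions (using sgn as the activation f, with a learning constant c = 0.1 and toy data of my own):

```python
import numpy as np

def hebbian_step(w, x, c=0.1, f=np.sign):
    # r = f(w^t x); the increment is c * f(w^t x) * x
    o = f(w @ x)
    return w + c * o * x

w = np.array([1.0, -1.0, 0.5])
x = np.array([1.0, -2.0, 1.5])
w = hebbian_step(w, x)
print(w)  # each weight grows when o * xj > 0 and decreases when o * xj < 0
```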

Perceptron Learning Rule

• The learning signal is the difference between the desired and the actual neuron's response (Rosenblatt 1958). Thus, learning is supervised and the learning signal is equal to:

r = di - oi, where oi = sgn(wi^t x)

• and di is the desired response.

• Weight adjustments are obtained as follows:

Δwi = c [di - sgn(wi^t x)] x

Δwij = c [di - sgn(wi^t x)] xj, for j = 1, 2, ..., n

Perceptron Learning Rule

• This rule is applicable only for binary neuron response, and the above relationships express the rule for the bipolar binary case.

• Here, the weights are adjusted if and only if oi is incorrect.

• Since the desired response is either +1 or -1, the weight adjustment reduces to

Δwi = ±2c x

• where a + sign is applicable when di = +1 and sgn(wi^t x) = -1, and a - sign is applicable when di = -1 and sgn(wi^t x) = +1.

• The weight adjustment is zero when the desired and the actual responses agree.
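A minimal sketch of the perceptron update for bipolar targets (the learning constant c, the two-point data set, and the training loop are mine):

```python
import numpy as np

def perceptron_step(w, x, d, c=0.5):
    # o = sgn(w^t x); the weights change only when o differs from d, by +/- 2c x
    o = 1.0 if w @ x >= 0 else -1.0
    return w + c * (d - o) * x

# Tiny bipolar training set (hypothetical): two linearly separable points
X = np.array([[1.0, 2.0], [-1.0, -1.5]])
D = np.array([1.0, -1.0])
w = np.zeros(2)
for _ in range(10):
    for x, d in zip(X, D):
        w = perceptron_step(w, x, d)
print(w, np.sign(X @ w))  # the signs match D once training has converged
```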

Delta learning Rule

• The delta learning rule is valid only for continuous activation functions and in the supervised training mode.

• The learning signal for this rule is called delta and is defined as follows:

r = [di - f(wi^t x)] f'(wi^t x)

The term f'(wi^t x) is the derivative of the activation function f(net) computed for net = wi^t x.

[Figure: delta learning rule for a continuous perceptron; inputs x1 ... xn with weight vector wi produce output oi, and the error di - oi, scaled by c, drives the weight adjustment.]

Delta learning Rule

• The learning rule can be derived from the condition of least squared error between oi and di.

• Calculating the gradient vector with respect to wi of the squared error defined as

E = ½ (di - oi)²

• which is equivalent to

E = ½ [di - f(wi^t x)]²

Delta learning Rule

• We obtain the error gradient vector value

∇E = -(di - oi) f'(wi^t x) x

• The components of the gradient vector are

∂E/∂wij = -(di - oi) f'(wi^t x) xj, for j = 1, 2, ..., n

• Since the minimization of the error requires the weight changes to be in the negative gradient direction, we take

Δwi = -η ∇E, where η is a positive constant.

Delta learning Rule

• We then obtain

Δwi = η (di - oi) f'(neti) x

• or, for the single weight, the adjustment becomes

Δwij = η (di - oi) f'(neti) xj, for j = 1, 2, ..., n

• Note that the weight adjustments are computed based on minimization of the squared error.

Delta learning Rule

• Considering the use of the general learning rule and plugging in the learning signal, the weight adjustment becomes

Δwi = c (di - oi) f'(neti) x
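A minimal sketch of the delta rule for a single continuous perceptron, assuming tanh as the continuous activation f (the learning constant c, the target d, and the data are mine):

```python
import numpy as np

def delta_step(w, x, d, c=0.1):
    # f(net) = tanh(net); f'(net) = 1 - f(net)^2
    net = w @ x
    o = np.tanh(net)
    return w + c * (d - o) * (1 - o ** 2) * x

w = np.array([0.5, -0.3])
x = np.array([1.0, 2.0])
d = 0.8
for _ in range(100):
    w = delta_step(w, x, d)
print(np.tanh(w @ x))  # the output approaches the target d = 0.8
```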

Widrow-Hoff learning Rule

• The Widrow-Hoff learning rule is applicable for the supervised training of neural networks.

• It is independent of the activation function of the neurons used, since it minimizes the squared error between the desired output value di and the neuron's activation value

neti = wi^t x

Widrow-Hoff learning Rule

• The learning signal for this rule is defined as follows:

r = di - wi^t x

• The weight vector increment under this learning rule is

Δwi = c (di - wi^t x) x

• or, for the single weight, the adjustment is

Δwij = c (di - wi^t x) xj, for j = 1, 2, ..., n

• This rule can be considered a special case of the delta learning rule.

Widrow-Hoff learning Rule

• Assuming that f(wi^t x) = wi^t x, i.e. the activation function is simply the identity function f(net) = net, we have f'(net) = 1.

• This rule is sometimes called the LMS (least mean square) learning rule.

• The weights may be initialized at any values in this method.
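A minimal sketch of the LMS update under the identity activation (the teacher weights, the learning constant c = 0.05, and the random data are hypothetical):

```python
import numpy as np

def lms_step(w, x, d, c=0.05):
    # r = d - w^t x; identity activation, so f'(net) = 1
    return w + c * (d - w @ x) * x

rng = np.random.default_rng(1)
w_true = np.array([2.0, -1.0, 0.5])  # hypothetical linear teacher
w = np.zeros(3)                      # weights may start at any values
for _ in range(2000):
    x = rng.normal(size=3)
    d = w_true @ x                   # desired response from the teacher
    w = lms_step(w, x, d)
print(w)                             # converges toward w_true
```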

Correlation Learning Rule

• By substituting r = di into the general learning rule we obtain the correlation learning rule.

• The adjustments for the weight vector and the single weights, respectively, are

Δwi = c di x

Δwij = c di xj, for j = 1, 2, ..., n
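A minimal sketch of the correlation update (the learning constant and data are mine); it has the same form as the Hebbian increment with the neuron's output oi replaced by the desired response di:

```python
import numpy as np

def correlation_step(w, x, d, c=0.1):
    # Delta w = c * d * x
    return w + c * d * x

w = np.zeros(3)
w = correlation_step(w, np.array([1.0, 0.0, -1.0]), d=1.0)
print(w)  # [0.1, 0.0, -0.1]
```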

Winner-Take-All Learning Rule

• The winner-take-all learning rule is used for learning statistical properties of the input.

• The learning is based on the premise that one of the neurons in the layer, say the m'th, has the maximum response due to input x, as shown in Figure 2.25.

• This neuron is declared the winner. As a result of this winning event, the weight vector wm

[Figure 2.25: a single-layer competitive network with inputs x1 ... xn, weight rows w1 ... wp, outputs o1 ... op, and the winning neuron m highlighted.]

Winner-Take-All Learning Rule

• wm = [wm1 wm2 ... wmn]^t, containing the weights highlighted in the figure, is the only weight vector adjusted in the given unsupervised learning step.

• Its increment is computed as follows:

Δwm = α (x - wm)

• or, the individual weight adjustment becomes

Δwmj = α (xj - wmj), for j = 1, 2, ..., n

Winner-Take-All Learning Rule

• where α > 0 is a small learning constant, typically decreasing as learning progresses.

• The winner selection is based on the following criterion of maximum activation among all p neurons participating in the competition:

wm^t x = max(wi^t x), for i = 1, 2, ..., p
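A minimal sketch of one winner-take-all step (the matrix shapes, the learning constant alpha, and the data are mine; in practice the weight rows and inputs are often kept normalized so that the maximum-activation criterion picks the weight vector closest to x):

```python
import numpy as np

def wta_step(W, x, alpha=0.1):
    # W: (p, n) weight matrix, one row per competing neuron
    m = np.argmax(W @ x)          # winner: the neuron with maximum activation w_m^t x
    W[m] += alpha * (x - W[m])    # only the winner's weights move toward the input
    return W

rng = np.random.default_rng(2)
W = rng.normal(size=(3, 4))       # p = 3 competing neurons, n = 4 inputs
x = rng.normal(size=4)
W = wta_step(W, x)
```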

Outstar Learning Rule

• The weight adjustments in this rule are computed as follows:

Δwj = β (d - wj)

• or, the individual adjustments are

Δwmj = β (dm - wmj), for m = 1, 2, ..., p

• where β > 0 is a small learning constant.

• Note that, in contrast to any learning rule discussed so far, the adjusted weights fan out of the j'th node in this learning method, and the weight vector is defined accordingly as

wj = [w1j w2j ... wpj]^t
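A minimal sketch of the outstar update for the weights fanning out of node j (the learning constant beta and the desired vector d are mine):

```python
import numpy as np

def outstar_step(w_j, d, beta=0.1):
    # w_j = [w_1j, ..., w_pj]^t: weights fanning out of node j
    return w_j + beta * (d - w_j)  # each weight moves toward the matching desired response d_m

w_j = np.zeros(3)
d = np.array([1.0, -0.5, 0.25])
for _ in range(50):
    w_j = outstar_step(w_j, d)
print(w_j)  # approaches d as the updates repeat
```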

[Figure: outstar learning; the weights w1j ... wpj fanning out of node j are adjusted toward the desired responses d1 ... dp at the output nodes o1 ... op.]