20.5 Neural Networks

Page 1: 20.5 Neural Networks

20.5 Neural Networks

Thanks: Professors Frank Hoffmann and Jiawei Han, and Russell and Norvig

Page 2: 20.5 Neural Networks

Biological Neural Systems

- Neuron switching time: > 10^-3 secs
- Number of neurons in the human brain: ~10^10
- Connections (synapses) per neuron: ~10^4–10^5
- Face recognition: 0.1 secs
- High degree of distributed and parallel computation
- Highly fault tolerant
- Highly efficient
- Learning is key

Page 3: 20.5 Neural Networks

Excerpt from Russell and Norvig

Page 4: 20.5 Neural Networks

A Neuron

Computation: input signals → input function (linear) → activation function (nonlinear) → output signal

[Diagram: unit j receives signals a_k from other units over input links with weights W_kj, computes the weighted sum in_j = Σ_k W_kj a_k, and applies the activation function to obtain its output a_j = g(in_j), which it sends along its output links.]

Page 5: 20.5 Neural Networks

Part 1. Perceptrons: Simple NN

[Diagram: inputs x_1, x_2, ..., x_n enter through weights w_1, w_2, ..., w_n; the unit sums them into the activation a and thresholds it to produce the output y.]

Activation: a = Σ_{i=1..n} w_i x_i, where each x_i ranges over [0, 1].

Output: y = 1 if a ≥ θ, y = 0 if a < θ.
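As a quick illustration (not part of the original slides), here is a minimal Python sketch of this unit; the function name and the example weights are illustrative choices.

```python
# Perceptron from the slide: a = sum_i w_i * x_i, then hard threshold at theta.
def perceptron_output(w, x, theta):
    a = sum(wi * xi for wi, xi in zip(w, x))  # activation a = Σ w_i x_i
    return 1 if a >= theta else 0             # y = 1 if a >= theta, else 0

# Example call with illustrative weights
print(perceptron_output([0.5, 0.5], [1, 1], 0.8))   # prints 1, since a = 1.0 >= 0.8
```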

Page 6: 20.5 Neural Networks

Decision Surface of a Perceptron

Decision line: w_1 x_1 + w_2 x_2 = θ

[Figure: in the (x_1, x_2) plane, examples labeled 1 lie on one side of the decision line and examples labeled 0 on the other.]

Page 7: 20.5 Neural Networks

Linear Separability

Logical AND (w_1 = 1, w_2 = 1, θ = 1.5):

  x_1  x_2   a   y
   0    0    0   0
   0    1    1   0
   1    0    1   0
   1    1    2   1

[Figure: the AND examples in the (x_1, x_2) plane; a single line separates the 1 from the 0s.]

Logical XOR (w_1 = ?, w_2 = ?, θ = ?):

  x_1  x_2   y
   0    0    0
   0    1    1
   1    0    1
   1    1    0

No choice of w_1, w_2, θ works: XOR is not linearly separable, so no line in the (x_1, x_2) plane separates its 1s from its 0s (see the sketch below).
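A small sketch to check the tables above: the AND weights from the slide reproduce the y column, while a coarse grid search over (w_1, w_2, θ) finds no threshold unit that reproduces XOR. The grid search itself is an illustrative addition, not from the slides.

```python
# Threshold unit: y = 1 if w1*x1 + w2*x2 >= theta else 0
def out(w1, w2, theta, x1, x2):
    return 1 if w1 * x1 + w2 * x2 >= theta else 0

# AND with the slide's weights w1 = w2 = 1, theta = 1.5 reproduces the truth table.
AND = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1}
print(all(out(1, 1, 1.5, x1, x2) == y for (x1, x2), y in AND.items()))  # True

# XOR: a coarse grid search over (w1, w2, theta) finds no separating unit.
XOR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}
grid = [i / 2 for i in range(-8, 9)]  # -4.0, -3.5, ..., 4.0
found = any(
    all(out(w1, w2, th, x1, x2) == y for (x1, x2), y in XOR.items())
    for w1 in grid for w2 in grid for th in grid
)
print(found)  # False: no candidate on the grid computes XOR
```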

Page 8: 20.5 Neural Networks

Threshold as Weight: w_0

The threshold θ can be treated as just another weight w_0 = θ attached to a constant extra input x_0 = -1:

a = Σ_{i=0..n} w_i x_i

y = 1 if a ≥ 0, y = 0 if a < 0 (the original test against θ becomes a test against zero, since θ = w_0 is absorbed into the sum).

[Diagram: inputs x_0 = -1, x_1, ..., x_n with weights w_0, w_1, ..., w_n feeding the summation and threshold unit.]
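A minimal sketch of the same unit with the threshold folded in as w_0; the function name and example weights are illustrative.

```python
# Threshold-as-weight: prepend x0 = -1 and compare the sum against 0 instead of theta.
def perceptron_bias(w, x):
    xs = [-1] + list(x)                             # x0 = -1 carries the threshold w0
    a = sum(wi * xi for wi, xi in zip(w, xs))       # a = Σ_{i=0..n} w_i x_i
    return 1 if a >= 0 else 0

# Same AND unit as before: w0 = theta = 1.5, w1 = w2 = 1
print(perceptron_bias([1.5, 1, 1], (1, 1)))         # 1
print(perceptron_bias([1.5, 1, 1], (0, 1)))         # 0
```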

Page 9: 20.5 Neural Networks

Training the Perceptron (p. 742)

- Training set S of examples (x, t), where x is an input vector and t the desired target output. Example, Logical AND: S = {((0,0), 0), ((0,1), 0), ((1,0), 0), ((1,1), 1)}
- Iterative process: present a training example x, compute the network output y, compare the output y with the target t, and adjust the weights and thresholds.
- Learning rule: specifies how to change the weights w and thresholds θ of the network as a function of the inputs x, output y and target t.

Page 10: 20.5 Neural Networks

Perceptron Learning Rule

w' = w + α (t - y) x

w_i := w_i + Δw_i = w_i + α (t - y) x_i   (i = 1..n)

The parameter α is called the learning rate (in Han's book it is written as lower-case l). It determines the magnitude of the weight updates Δw_i.

If the output is correct (t = y), the weights are not changed (Δw_i = 0). If the output is incorrect (t ≠ y), the weights w_i are changed such that the weight vector moves toward the input x (when t = 1 and y = 0) or away from it (when t = 0 and y = 1), so the output of the perceptron for the new weights w'_i is closer to the target.
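A one-line sketch of the update rule in Python; alpha and the function name are illustrative.

```python
# Perceptron learning rule: w'_i = w_i + alpha * (t - y) * x_i for every weight.
# When t == y the factor (t - y) is zero and nothing changes.
def perceptron_update(w, x, t, y, alpha=0.1):
    return [wi + alpha * (t - y) * xi for wi, xi in zip(w, x)]

print(perceptron_update([0.0, 0.0], [1, 1], t=1, y=0))   # weights move toward the input
```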

Page 11: 20.5 Neural Networks

Perceptron Training Algorithm

Repeat
  for each training vector pair (x, t)
    evaluate the output y when x is the input
    if y ≠ t then
      form a new weight vector w' according to w' = w + α (t - y) x
    else
      do nothing
    end if
  end for
Until y = t for all training vector pairs, or # iterations > k
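Below is a runnable sketch of this loop, trained on the Logical AND set from Page 9 and using the threshold-as-weight trick from Page 8. The initial weights, learning rate, and iteration cap k are illustrative assumptions.

```python
# Perceptron training algorithm: repeat passes over the data, applying
# w' = w + alpha*(t - y)*x whenever the output is wrong, until all examples
# are classified correctly or the iteration cap k is reached.
def train(examples, alpha=0.1, k=100):
    w = [0.0, 0.0, 0.0]                                  # w0 (threshold), w1, w2
    for _ in range(k):
        all_correct = True
        for x, t in examples:
            xs = [-1] + list(x)                          # x0 = -1
            y = 1 if sum(wi * xi for wi, xi in zip(w, xs)) >= 0 else 0
            if y != t:
                w = [wi + alpha * (t - y) * xi for wi, xi in zip(w, xs)]
                all_correct = False
        if all_correct:                                  # y = t for all pairs
            break
    return w

S = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]  # Logical AND
print(train(S))
```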

Page 12: 20.5 Neural Networks

Perceptron Convergence Theorem

The algorithm converges to the correct classification if the training data are linearly separable and the learning rate is sufficiently small.

If two classes of vectors X1 and X2 are linearly separable, the perceptron training algorithm will eventually produce a weight vector w0 such that w0 defines a perceptron whose decision hyperplane separates X1 and X2 (Rosenblatt, 1962).

The solution w0 is not unique, since if w0 · x = 0 defines a hyperplane, so does w'0 = k w0 for any k ≠ 0.

Page 13: 20.5 Neural Networks

Experiments

Page 14: 20.5 Neural Networks

Perceptron Learning from Patterns

[Diagram: an input pattern feeds a layer of association units; their outputs x_1, ..., x_n pass through trained weights w_1, ..., w_n into a summation unit with a fixed threshold.]

Association units (A-units) can be assigned arbitrary Boolean functions of the input pattern.

Page 15: 20.5 Neural Networks

Part 2. Multi-Layer Networks

[Diagram: a feed-forward network. The input vector enters at the input nodes, passes through a layer of hidden nodes, and produces the output vector at the output nodes.]
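To make the layered structure concrete, here is a small sketch of a forward pass through one hidden layer; the layer sizes, weights, and use of the sigmoid from Page 17 are illustrative assumptions, not values from the slides.

```python
import math

# Forward pass: input vector -> hidden nodes -> output node(s). Biases omitted for brevity.
def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def layer(weights, inputs):
    # weights: one weight list per node in the layer; each node applies the sigmoid
    return [sigmoid(sum(w * x for w, x in zip(ws, inputs))) for ws in weights]

x = [1.0, 0.0]                                   # input vector
hidden = layer([[0.5, -0.5], [-0.5, 0.5]], x)    # two hidden nodes
output = layer([[1.0, 1.0]], hidden)             # one output node
print(output)
```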

Page 16: 20.5 Neural Networks

Gradient Descent Learning Rule

Consider a linear unit without a threshold and with continuous output o (not just -1, 1):

o_j = -w_0 + w_1 x_1 + ... + w_n x_n

Train the w_i's so that they minimize the squared error

E[w_1, ..., w_n] = ½ Σ_{j∈D} (T_j - o_j)²

where D is the set of training examples.
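A sketch of batch gradient descent on this error function for a linear unit; the dataset, learning rate, and number of steps below are illustrative assumptions.

```python
# Gradient descent on E = 1/2 * sum_j (T_j - o_j)^2 for a linear unit o = w . x.
# dE/dw_i = -sum_j (T_j - o_j) * x_ij, so we step in the opposite direction.
def gradient_descent(data, alpha=0.05, steps=200):
    n = len(data[0][0])
    w = [0.0] * n
    for _ in range(steps):
        grad = [0.0] * n
        for x, t in data:
            o = sum(wi * xi for wi, xi in zip(w, x))      # linear output
            for i, xi in enumerate(x):
                grad[i] += -(t - o) * xi                  # accumulate dE/dw_i
        w = [wi - alpha * g for wi, g in zip(w, grad)]
    return w

# Illustrative data generated by o = 2*x1 - 1*x2; the weights should approach (2, -1).
data = [((1, 0), 2), ((0, 1), -1), ((1, 1), 1), ((2, 1), 3)]
print(gradient_descent(data))
```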

Page 17: 20.5 Neural Networks

Neuron with Sigmoid Function

[Diagram: inputs x_1, ..., x_n with weights w_1, ..., w_n feed a unit whose activation is passed through the sigmoid to give the output o.]

a = Σ_{i=1..n} w_i x_i

Output: o = σ(a) = 1 / (1 + e^(-a))
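A minimal sketch of this unit in Python; the example weights and inputs are illustrative.

```python
import math

# Sigmoid unit: o = sigmoid(a) with a = sum_i w_i * x_i.
def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def sigmoid_unit(w, x):
    a = sum(wi * xi for wi, xi in zip(w, x))   # activation
    return sigmoid(a)                           # continuous output in (0, 1)

print(sigmoid_unit([1.0, 1.0], [1, 1]))         # a = 2, o ≈ 0.88
```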

Page 18: 20.5 Neural Networks

Sigmoid Unit

[Diagram: inputs x_0 = -1, x_1, ..., x_n with weights w_0, w_1, ..., w_n feed the sigmoid unit.]

a = Σ_{i=0..n} w_i x_i,   o = σ(a) = 1 / (1 + e^(-a))

σ(x) is the sigmoid function 1 / (1 + e^(-x)), with derivative dσ(x)/dx = σ(x) (1 - σ(x)).

Derive gradient descent rules to train one sigmoid unit:

∂E/∂w_i = -Σ_j (T_j - o_j) o_j (1 - o_j) x_ij

(derivation: see next page)
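A sketch that evaluates this gradient and takes one descent step for a single sigmoid unit with the bias input x_0 = -1; the data, learning rate, and names are illustrative.

```python
import math

# Gradient for one sigmoid unit: dE/dw_i = -sum_j (T_j - o_j) * o_j * (1 - o_j) * x_ij,
# where each example is extended with the bias input x_0 = -1.
def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def error_gradient(w, data):
    grad = [0.0] * len(w)
    for x, t in data:
        xs = [-1.0] + list(x)                               # x_0 = -1
        o = sigmoid(sum(wi * xi for wi, xi in zip(w, xs)))
        for i, xi in enumerate(xs):
            grad[i] += -(t - o) * o * (1 - o) * xi
    return grad

# One gradient-descent step: w_i := w_i - alpha * dE/dw_i
w, alpha = [0.0, 0.0, 0.0], 0.5
data = [((0, 0), 0), ((1, 1), 1)]
g = error_gradient(w, data)
w = [wi - alpha * gi for wi, gi in zip(w, g)]
print(w)
```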

Page 19: 20.5 Neural Networks

Explanation: Gradient Descent Learning Rule

Δw_ji = η O_j^p (1 - O_j^p) (T_j^p - O_j^p) x_i^p

for the weight w_ji from pre-synaptic neuron i to post-synaptic neuron j, on training pattern p:

- η: the learning rate
- (T_j^p - O_j^p): the error δ_j of the post-synaptic neuron
- O_j^p (1 - O_j^p): the derivative of the activation function
- x_i^p: the activation of the pre-synaptic neuron

Page 20: 20.5 Neural Networks

Gradient Descent: Graphical

Training set: D = {<(1,1),1>, <(-1,-1),1>, <(1,-1),-1>, <(-1,1),-1>}

[Figure: the error surface over weight space; a gradient step moves the weight vector from (w_1, w_2) to (w_1 + Δw_1, w_2 + Δw_2).]
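A sketch of gradient descent on this training set with a linear unit o = w_1 x_1 + w_2 x_2, so that each update is exactly a move from (w_1, w_2) to (w_1 + Δw_1, w_2 + Δw_2); the starting point, learning rate, and step count are illustrative.

```python
# Batch gradient descent on E = 1/2 * sum (t - o)^2 over D for o = w1*x1 + w2*x2.
D = [((1, 1), 1), ((-1, -1), 1), ((1, -1), -1), ((-1, 1), -1)]

def error(w):
    return 0.5 * sum((t - (w[0] * x1 + w[1] * x2)) ** 2 for (x1, x2), t in D)

w, alpha = [0.8, -0.6], 0.1          # illustrative starting point and learning rate
print(error(w))                      # error before training
for _ in range(25):
    dw = [0.0, 0.0]
    for (x1, x2), t in D:
        o = w[0] * x1 + w[1] * x2
        dw[0] += alpha * (t - o) * x1     # Δw_1
        dw[1] += alpha * (t - o) * x2     # Δw_2
    w = [w[0] + dw[0], w[1] + dw[1]]      # move (w1, w2) -> (w1 + Δw1, w2 + Δw2)
print(w, error(w))                   # error decreases as the weights descend the surface
```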

Page 21: 20.5 Neural Networks

Perceptron vs. Gradient Descent Rule

Perceptron rule:

w'_i = w_i + α (t - y) x_i

derived from manipulation of the decision surface.

Gradient descent rule:

w'_i = w_i + α y (1 - y) (t - y) x_i

derived from minimization of the error function

E[w_1, ..., w_n] = ½ Σ_p (t - y)²

by means of gradient descent.