
Neural Networks THE PERCEPTRON AND LEARNING

The Biological Neuron 2

The Biological Neuron 3

The Biological Neuron

The brain is a collection of about 10 billion interconnected neurons. Each neuron is a cell that uses biochemical reactions to receive, process and transmit information.

Each terminal button is connected to other neurons across a small gap called a synapse.

A neuron's dendritic tree is connected to a thousand neighboring neurons. When one of those neurons fires, a positive or negative charge is received by one of the dendrites. The strengths of all the received charges are added together through the processes of spatial and temporal summation.

4

Model of a neuron

Each neuron within the network is usually a simple processing unit which takes one or more inputs through synapses (connecting links). Every input has an associated weight (strength) which modifies the effect of that input.

The adder sums all the weighted inputs and calculates an output to be passed on.

An activation function is applied to limit the output.

Neural computing requires a number of neurons to be connected together into a neural network. Neurons are arranged in layers.

5

Model of a neuron

In mathematical terms, $u_k = \sum_{j=1}^{m} w_{kj}\,x_j$ and $y_k = \varphi(u_k + b_k)$

$u_k$ = linear combiner output

$v_k = u_k + b_k$ = induced local field

(activation potential)

6
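To make the model concrete, here is a minimal Python sketch of a single neuron; the weights, bias and input values are illustrative and not taken from the slides.

```python
import numpy as np

def neuron(x, w, b, phi):
    """Single neuron: weighted sum of the inputs plus bias, passed through an activation."""
    v = np.dot(w, x) + b          # induced local field v = u + b
    return phi(v)

# Hard-limit activation, with illustrative values
y = neuron(x=np.array([0.5, -1.0]),
           w=np.array([0.3, 0.8]),
           b=0.1,
           phi=lambda v: 1 if v >= 0 else 0)
print(y)   # 0, since v = 0.15 - 0.8 + 0.1 = -0.55 < 0
```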

Affine transformation by the bias

If we want, we can consider the bias as just another input, $x_0 = +1$, with weight $w_{k0} = b_k$, so that

$v_k = \sum_{j=0}^{m} w_{kj}\,x_j$

$y_k = \varphi(v_k)$
7
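A quick numerical check of this equivalence (a sketch with illustrative values): appending the bias to the weight vector and a constant +1 to the input gives the same induced local field.

```python
import numpy as np

w = np.array([0.3, 0.8])    # illustrative weights
b = 0.1                     # illustrative bias
x = np.array([0.5, -1.0])   # illustrative input

v_explicit  = np.dot(w, x) + b                   # v = w·x + b
v_augmented = np.dot(np.append(b, w),            # bias treated as weight w0 ...
                     np.append(1.0, x))          # ... of a constant input x0 = +1
assert np.isclose(v_explicit, v_augmented)
```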

Types of activation functions

Linear Transfer Function

8

The activation function is generally non-linear. Linear functions are limited because the output is simply proportional to the input.

Types of activation functions

Symmetric Hard Limit Transformation Function

9

Types of activation functions

Threshold function

$\varphi(v) = \begin{cases} 1 & \text{if } v \ge 0 \\ 0 & \text{if } v < 0 \end{cases}$

Commonly known as the Heaviside function.

10

Types of activation functions

Satlin Transfer Function

11

Types of activation functions

Tan Sigmoid Function

12

Types of activation functions

Sigmoid function

One example: the logistic function

$\varphi(v) = \dfrac{1}{1 + \exp(-a\,v)}$

13

You can see that the function gets closer to the threshold function as the value of $a$ increases.

Types of activation functions

Gauss Function

14
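A short Python sketch of the activation-function families listed on these slides (hard limit, symmetric hard limit, saturating linear, logistic sigmoid, tan-sigmoid, Gaussian). The slope parameter a and the sampled range are assumptions for illustration only.

```python
import numpy as np

def hard_limit(v):                 # threshold / Heaviside
    return np.where(v >= 0, 1.0, 0.0)

def sym_hard_limit(v):             # symmetric hard limit: +1 / -1
    return np.where(v >= 0, 1.0, -1.0)

def satlin(v):                     # saturating linear: clip to [0, 1]
    return np.clip(v, 0.0, 1.0)

def logistic(v, a=1.0):            # logistic sigmoid with slope parameter a
    return 1.0 / (1.0 + np.exp(-a * v))

def tansig(v):                     # tan-sigmoid (hyperbolic tangent)
    return np.tanh(v)

def gauss(v):                      # Gaussian activation
    return np.exp(-v ** 2)

v = np.linspace(-3, 3, 7)
for a in (1, 5, 50):               # larger a: logistic approaches the threshold function
    print(a, np.round(logistic(v, a), 3))
```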

Rules of knowledge representation

15

Knowledge representation: Rules

Rule 1: Similar inputs from similar classes should usually produce similar representations inside the network, and should be classified into the same class.

Problem: how do you define similarity?

Input vector: $\mathbf{x}_i = [x_{i1}, x_{i2}, \ldots, x_{im}]^T$

Euclidean distance: $d(\mathbf{x}_i, \mathbf{x}_j) = \lVert \mathbf{x}_i - \mathbf{x}_j \rVert = \left[\sum_{k=1}^{m} (x_{ik} - x_{jk})^2\right]^{1/2}$

16

Knowledge representation: Rules

Rule 1: Similar inputs from similar classes should usually produce similar representations inside the network, and should be classified into the same class.

Problem: how do you define similarity?

Input vector: $\mathbf{x}_i = [x_{i1}, x_{i2}, \ldots, x_{im}]^T$

Dot product: $(\mathbf{x}_i, \mathbf{x}_j) = \mathbf{x}_i^T \mathbf{x}_j = \sum_{k=1}^{m} x_{ik}\,x_{jk}$

17

Usually we normalize the vectors to have unit length: $\lVert \mathbf{x}_i \rVert = \lVert \mathbf{x}_j \rVert = 1$.

Knowledge representation: Rules

Rule 1: Similar inputs from similar classes should usually produce similar representations inside the network, and should be classified into the same class.

Problem: how do you define similarity?

For vectors normalized to unit length the two measures agree: as $\mathbf{x}_j$ approaches $\mathbf{x}_i$,

$\lVert \mathbf{x}_i - \mathbf{x}_j \rVert \to 0$

$\mathbf{x}_i^T \mathbf{x}_j \to 1$

18
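The two similarity measures can be checked against each other with a small sketch (the vectors are illustrative, not from the slides): for unit-length vectors the squared Euclidean distance equals 2 minus twice the dot product, so a small distance corresponds to a large dot product.

```python
import numpy as np

def euclidean_distance(xi, xj):
    return np.linalg.norm(xi - xj)

def dot_similarity(xi, xj):
    return np.dot(xi, xj)

# Illustrative vectors, normalized to unit length
xi = np.array([3.0, 4.0]); xi /= np.linalg.norm(xi)
xj = np.array([4.0, 3.0]); xj /= np.linalg.norm(xj)

d = euclidean_distance(xi, xj)
s = dot_similarity(xi, xj)
assert np.isclose(d ** 2, 2.0 - 2.0 * s)   # d^2 = 2 - 2 (xi . xj) for unit vectors
print(d, s)
```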

Knowledge representation: Rules

Rule 2: Items to be classified as separate classes should be given widely different representations in the network.

19

Knowledge representation: Rules

Rule 3: If a particular feature is important, then there should be a large number of neurons involved in representing it.

20

Knowledge representation: Rules

Rule 4: Prior information and invariances should be built into the design of the neural network whenever possible.

This would simplify the design of the NN by not having to learn additional information:

Fewer free parameters to learn

Information transmission is faster

Cost is reduced

21

Knowledge representation: Rules

Rule 4: Prior information and invariances should be built into the design of the neural network whenever possible.

How do we build prior information into a NN?

Unfortunately, there are no well-defined rules to do this. Some rules of thumb:

Restrict the network architecture – usually to local connections called receptive fields

Constrain the choice of synaptic weights – usually achieved through weight sharing

22

Knowledge representation: Invariances

The network should be invariant to trivial transformations of the inputs, e.g. rotation of a picture.

Techniques:

Invariance by structure – pick a structure that isn't sensitive to the meaningless transformations of the input

Invariance by training – let the classifier learn the invariances

Invariance by feature space – pick a feature set that is invariant to the transformations

23

Supervised Learning & Unsupervised Learning

24

Classical conditioning: Pavlov's dog

25–28

[Illustrations of Pavlov's classical-conditioning experiment]

Supervised learning

In supervised training, both the inputs and the outputs are provided.

The network then processes the inputs and compares its resulting outputs against the desired outputs.

Errors are then propagated back through the system, causing the system to adjust the weights which control the network.

This process occurs over and over as the weights are continually tweaked.

The set of data which enables the training is called the training set.

During the training of a network, the same set of data is processed many times as the connection weights are continually refined.

Example architectures : Multilayer perceptrons

29

Unsupervised learning

In unsupervised training, the network is provided with inputs but not with desired outputs.

The system itself must then decide what features it will use to group the input data.

This is often referred to as self-organization or adaptation.

Example architectures: Kohonen self-organizing maps (SOM)

30

Rosenblatt’s Perceptron

31

Perceptron

Recall from the previous section:

NN – a linear combiner followed by a hard limiter

Induced local field: $v = \sum_{i=1}^{m} w_i x_i + b$

Recap: we could represent the bias as a +1 input with weight $b$.

32

Thus we have a decision hyperplane.

It is common to plot a map of the decision regions in the m-dimensional input space spanned by the m input variables $x_1, x_2, \ldots, x_m$; the decision boundary is the hyperplane $\sum_{i=1}^{m} w_i x_i + b = 0$.

33

Using the Symmetric Hard Limit Transformation Function

$\varphi(v) = \begin{cases} +1 & \text{if } v \ge 0 \\ -1 & \text{if } v < 0 \end{cases}$

The perceptron neuron produces a +1 if the net input into the transfer function is equal to or greater than 0; otherwise it produces a −1.

34

Perceptron convergence algorithm

We start with:

Training vectors: $(m+1) \times 1$ vectors $\mathbf{x}(n) = [+1, x_1(n), x_2(n), \ldots, x_m(n)]^T$

Weight vector: $(m+1) \times 1$ vector $\mathbf{w}(n) = [b, w_1(n), w_2(n), \ldots, w_m(n)]^T$, where $b$ is the bias

Actual response $y(n)$

Desired response $d(n)$

Learning rate parameter $\eta$ such that $0 < \eta < 1$

35

Perceptron convergence algorithm

1. Initialization: set $\mathbf{w}(0) = \mathbf{0}$.

2. At step $n$, activate the perceptron by applying input vector $\mathbf{x}(n)$.

3. Compute the actual response $y(n) = \operatorname{sgn}[\mathbf{w}^{T}(n)\,\mathbf{x}(n)]$.

4. Adapt the weight vector: $\mathbf{w}(n+1) = \mathbf{w}(n) + \eta\,[d(n) - y(n)]\,\mathbf{x}(n)$, where $e(n) = d(n) - y(n)$ is the error.

5. Increment $n$ and go to step 2.

36
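For reference, a minimal Python sketch of this algorithm. The function name, the presentation order of the patterns and the convention sgn(0) = +1 are assumptions chosen to match the worked example that follows.

```python
import numpy as np

def sgn(v):
    """Symmetric hard limiter, with sgn(0) = +1 as in the worked example."""
    return 1 if v >= 0 else -1

def train_perceptron(X, d, eta=0.1, epochs=10, w=None):
    """Perceptron convergence algorithm.

    X: (N, m) array of input patterns; d: N desired responses in {-1, +1}.
    Returns the weight vector [b, w1, ..., wm] (bias first, as in the slides).
    """
    X_aug = np.hstack([np.ones((len(X), 1)), X])   # prepend the fixed +1 input
    if w is None:
        w = np.zeros(X_aug.shape[1])               # initialization: w(0) = 0
    for _ in range(epochs):
        for x_n, d_n in zip(X_aug, d):
            y_n = sgn(np.dot(w, x_n))              # actual response
            w = w + eta * (d_n - y_n) * x_n        # adapt the weight vector
    return w
```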

Perceptron convergence algorithm: Example – AND gate

37

n | x1 | x2 | d(n)
0 | 0 | 0 | −1
1 | 0 | 1 | −1
2 | 1 | 0 | −1
3 | 1 | 1 | +1

Perceptron convergence algorithm: Example

[Diagram of the perceptron]

38

Perceptron convergence algorithm: Example

We start with:

Training vectors: $3 \times 1$ vectors $\mathbf{x}(n) = [+1, x_1(n), x_2(n)]^T$

39

n | x(n)
0 | [+1, 0, 0]
1 | [+1, 0, +1]
2 | [+1, +1, 0]
3 | [+1, +1, +1]

Perceptron convergence algorithm: Example

We start with:

Bias $b = 0$

Learning rate parameter $\eta$ such that $0 < \eta < 1$: here $\eta = 0.1$

Desired response $d(n)$:

40

n | d(n)
0 | −1
1 | −1
2 | −1
3 | +1
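Running the sketch above on this data (presenting the four patterns in order, for a few epochs) reproduces the behaviour traced on the following slides; the epoch count is simply chosen large enough for convergence.

```python
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
d = np.array([-1, -1, -1, +1])

w = train_perceptron(X, d, eta=0.1, epochs=6)
print(w)   # ≈ [-0.6, 0.4, 0.2], i.e. b = -0.6, w1 = 0.4, w2 = 0.2
```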

Perceptron convergence algorithm: Example

Initialization: set $\mathbf{w}(0) = [b, w_1(0), w_2(0)]^T = [0, 0, 0]^T$

At step $n$, activate the perceptron by applying input vector $\mathbf{x}(n)$:

n = 0

$\mathbf{x}(0) = [+1, x_1(0), x_2(0)]^T = [+1, 0, 0]^T$

$v(0) = \mathbf{w}^{T}(0)\,\mathbf{x}(0) = [0, 0, 0]\,[+1, 0, 0]^T = 0$

41

Perceptron convergence algorithm: Example

Compute the actual response:

$y(0) = \operatorname{sgn}[v(0)] = \operatorname{sgn}(0) = +1$

42

Perceptron convergence algorithm: Example

Adapt the weight vector: $\mathbf{w}(n+1) = \mathbf{w}(n) + \eta\,[d(n) - y(n)]\,\mathbf{x}(n)$

$e(0) = d(0) - y(0) = -1 - 1 = -2$

$w_1(1) = 0 + 0.1 \times (-2) \times 0 = 0$

$w_2(1) = 0 + 0.1 \times (-2) \times 0 = 0$

$b(1) = 0 + 0.1 \times (-2) \times 1 = -0.2$

$[w_1(1), w_2(1), b(1)] = [0, 0, -0.2]$

43

Perceptron convergence algorithm: Example

Increment $n$ and go to step 2: $n = 1$.

At step $n$, activate the perceptron by applying input vector $\mathbf{x}(n)$:

$\mathbf{x}(1) = [+1, x_1(1), x_2(1)]^T = [+1, 0, +1]^T$

$v(1) = \mathbf{w}^{T}(1)\,\mathbf{x}(1) = [-0.2, 0, 0]\,[+1, 0, +1]^T = -0.2$

44

Perceptron convergence algorithm: Example

Compute the actual response:

$y(1) = \operatorname{sgn}[v(1)] = \operatorname{sgn}(-0.2) = -1$

45

Perceptron convergence algorithm: Example

Adapt the weight vector:

$e(1) = d(1) - y(1) = -1 - (-1) = 0$

$w_1(2) = 0 + 0.1 \times (-1 - (-1)) \times 0 = 0$

$w_2(2) = 0 + 0.1 \times (-1 - (-1)) \times 1 = 0$

$b(2) = -0.2 + 0.1 \times (-1 - (-1)) \times 1 = -0.2$

$[w_1(2), w_2(2), b(2)] = [0, 0, -0.2]$

46

Perceptron convergence algorithm: Example

Increment $n$ and go to step 2: $n = 2$.

At step $n$, activate the perceptron by applying input vector $\mathbf{x}(n)$:

$\mathbf{x}(2) = [+1, x_1(2), x_2(2)]^T = [+1, +1, 0]^T$

$v(2) = \mathbf{w}^{T}(2)\,\mathbf{x}(2) = [-0.2, 0, 0]\,[+1, +1, 0]^T = -0.2$

47

Perceptron convergence algorithm: Example

Compute the actual response:

$y(2) = \operatorname{sgn}[v(2)] = \operatorname{sgn}(-0.2) = -1$

48

Perceptron convergence algorithm: Example

Adapt the weight vector:

$e(2) = d(2) - y(2) = -1 - (-1) = 0$

$w_1(3) = 0 + 0.1 \times (-1 - (-1)) \times 1 = 0$

$w_2(3) = 0 + 0.1 \times (-1 - (-1)) \times 0 = 0$

$b(3) = -0.2 + 0.1 \times (-1 - (-1)) \times 1 = -0.2$

$[w_1(3), w_2(3), b(3)] = [0, 0, -0.2]$

49

Perceptron convergence algorithm: Example

Increment $n$ and go to step 2: $n = 3$.

At step $n$, activate the perceptron by applying input vector $\mathbf{x}(n)$:

$\mathbf{x}(3) = [+1, x_1(3), x_2(3)]^T = [+1, +1, +1]^T$

$v(3) = \mathbf{w}^{T}(3)\,\mathbf{x}(3) = [-0.2, 0, 0]\,[+1, +1, +1]^T = -0.2$

50

Perceptron convergence algorithm: Example

Compute the actual response:

$y(3) = \operatorname{sgn}[v(3)] = \operatorname{sgn}(-0.2) = -1$

51

Perceptron convergence algorithm: Example

Adapt the weight vector:

$e(3) = d(3) - y(3) = 1 - (-1) = 2$

$w_1(4) = 0 + 0.1 \times 2 \times 1 = 0.2$

$w_2(4) = 0 + 0.1 \times 2 \times 1 = 0.2$

$b(4) = -0.2 + 0.1 \times 2 \times 1 = 0$

$[w_1(4), w_2(4), b(4)] = [0.2, 0.2, 0]$

52

Perceptron convergence algorithm: Example

53

Iteration table for the first epoch. Columns: n; sensor values x1, x2, xb (xb = +1 is the bias input); desired output d(n); current weights W1, W2, Wb; per-sensor products C1 = x1·W1, C2 = x2·W2, Cb = xb·Wb; sum s = C1 + C2 + Cb; threshold output Y(n) = 1 if s ≥ 0, else −1; error e = d(n) − Y(n); correction η·e; new weights W1, W2, Wb.

n | x1 x2 xb | d(n) | W1 W2 Wb | C1 C2 Cb | s | Y(n) | e | η·e | new W1 W2 Wb
0 | 0 0 1 | −1 | 0 0 0 | 0 0 0 | 0 | 1 | −2 | −0.2 | 0 0 −0.2
1 | 0 1 1 | −1 | 0 0 −0.2 | 0 0 −0.2 | −0.2 | −1 | 0 | 0 | 0 0 −0.2
2 | 1 0 1 | −1 | 0 0 −0.2 | 0 0 −0.2 | −0.2 | −1 | 0 | 0 | 0 0 −0.2
3 | 1 1 1 | 1 | 0 0 −0.2 | 0 0 −0.2 | −0.2 | −1 | 2 | 0.2 | 0.2 0.2 0

Perceptron convergence algorithm: Example

54

(Columns as on the previous slide.)

n | x1 x2 xb | d(n) | W1 W2 Wb | C1 C2 Cb | s | Y(n) | e | η·e | new W1 W2 Wb
0 | 0 0 1 | −1 | 0 0 0 | 0 0 0 | 0 | 1 | −2 | −0.2 | 0 0 −0.2
1 | 0 1 1 | −1 | 0 0 −0.2 | 0 0 −0.2 | −0.2 | −1 | 0 | 0 | 0 0 −0.2
2 | 1 0 1 | −1 | 0 0 −0.2 | 0 0 −0.2 | −0.2 | −1 | 0 | 0 | 0 0 −0.2
3 | 1 1 1 | 1 | 0 0 −0.2 | 0 0 −0.2 | −0.2 | −1 | 2 | 0.2 | 0.2 0.2 0
4 | 0 0 1 | −1 | 0.2 0.2 0 | 0 0 0 | 0 | 1 | −2 | −0.2 | 0.2 0.2 −0.2
5 | 0 1 1 | −1 | 0.2 0.2 −0.2 | 0 0.2 −0.2 | 0 | 1 | −2 | −0.2 | 0.2 0 −0.4
6 | 1 0 1 | −1 | 0.2 0 −0.4 | 0.2 0 −0.4 | −0.2 | −1 | 0 | 0 | 0.2 0 −0.4
7 | 1 1 1 | 1 | 0.2 0 −0.4 | 0.2 0 −0.4 | −0.2 | −1 | 2 | 0.2 | 0.4 0.2 −0.2
8 | 0 0 1 | −1 | 0.4 0.2 −0.2 | 0 0 −0.2 | −0.2 | −1 | 0 | 0 | 0.4 0.2 −0.2
9 | 0 1 1 | −1 | 0.4 0.2 −0.2 | 0 0.2 −0.2 | 0 | 1 | −2 | −0.2 | 0.4 0 −0.4
10 | 1 0 1 | −1 | 0.4 0 −0.4 | 0.4 0 −0.4 | 0 | 1 | −2 | −0.2 | 0.2 0 −0.6
11 | 1 1 1 | 1 | 0.2 0 −0.6 | 0.2 0 −0.6 | −0.4 | −1 | 2 | 0.2 | 0.4 0.2 −0.4
12 | 0 0 1 | −1 | 0.4 0.2 −0.4 | 0 0 −0.4 | −0.4 | −1 | 0 | 0 | 0.4 0.2 −0.4
13 | 0 1 1 | −1 | 0.4 0.2 −0.4 | 0 0.2 −0.4 | −0.2 | −1 | 0 | 0 | 0.4 0.2 −0.4
14 | 1 0 1 | −1 | 0.4 0.2 −0.4 | 0.4 0 −0.4 | 0 | 1 | −2 | −0.2 | 0.2 0.2 −0.6
15 | 1 1 1 | 1 | 0.2 0.2 −0.6 | 0.2 0.2 −0.6 | −0.2 | −1 | 2 | 0.2 | 0.4 0.4 −0.4
16 | 0 0 1 | −1 | 0.4 0.4 −0.4 | 0 0 −0.4 | −0.4 | −1 | 0 | 0 | 0.4 0.4 −0.4
17 | 0 1 1 | −1 | 0.4 0.4 −0.4 | 0 0.4 −0.4 | 0 | 1 | −2 | −0.2 | 0.4 0.2 −0.6
18 | 1 0 1 | −1 | 0.4 0.2 −0.6 | 0.4 0 −0.6 | −0.2 | −1 | 0 | 0 | 0.4 0.2 −0.6
19 | 1 1 1 | 1 | 0.4 0.2 −0.6 | 0.4 0.2 −0.6 | 0 | 1 | 0 | 0 | 0.4 0.2 −0.6
20 | 0 0 1 | −1 | 0.4 0.2 −0.6 | 0 0 −0.6 | −0.6 | −1 | 0 | 0 | 0.4 0.2 −0.6
21 | 0 1 1 | −1 | 0.4 0.2 −0.6 | 0 0.2 −0.6 | −0.4 | −1 | 0 | 0 | 0.4 0.2 −0.6
22 | 1 0 1 | −1 | 0.4 0.2 −0.6 | 0.4 0 −0.6 | −0.2 | −1 | 0 | 0 | 0.4 0.2 −0.6
23 | 1 1 1 | 1 | 0.4 0.2 −0.6 | 0.4 0.2 −0.6 | 0 | 1 | 0 | 0 | 0.4 0.2 −0.6
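The table can be reproduced with a small variation of the earlier sketch that prints the weights after every pattern presentation (sgn and the data X, d as defined above):

```python
def trace_perceptron(X, d, eta=0.1, epochs=6):
    X_aug = np.hstack([np.ones((len(X), 1)), X])      # [xb, x1, x2] with xb = +1
    w = np.zeros(3)                                   # [Wb, W1, W2] = [0, 0, 0]
    for n, (x_n, d_n) in enumerate(list(zip(X_aug, d)) * epochs):
        y_n = sgn(np.dot(w, x_n))
        e_n = d_n - y_n
        w = w + eta * e_n * x_n
        # n, x1, x2, d, Y, e, then the new W1, W2, Wb (last three table columns)
        print(n, x_n[1:], d_n, y_n, e_n, np.round(w[[1, 2, 0]], 1))

trace_perceptron(X, d)   # from iteration 19 onwards the error stays 0: converged
```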

Perceptron convergence algorithm: Example

55

(Repeats the iteration table from the previous slide.)

Perceptron convergence algorithm: Example

AND Perceptron

56

[Diagram of the trained AND perceptron: weights 0.4 and 0.2 on x1 and x2, bias −0.6, hard-limit activation φ]

Perceptron convergence algorithm: Example

Decision hyperplane

57

[Plot of the decision line $0.4\,x_1 + 0.2\,x_2 - 0.6 = 0$ in the $(x_1, x_2)$ plane, crossing the axes at $x_1 = 1.5$ and $x_2 = 3$]
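A quick check (a sketch using the learned values from the table) that this line implements the AND function on the four inputs:

```python
for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    s = 0.4 * x1 + 0.2 * x2 - 0.6            # learned weights and bias
    print((x1, x2), 1 if s >= 0 else -1)     # only (1, 1) lies on the positive side
```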

Perceptron convergence rule

If there exists a set of connection weights $\mathbf{w}^*$ which is able to perform the transformation $d(n) = \varphi(\mathbf{w}^{*T}\,\mathbf{x}(n))$, the perceptron learning rule will converge to some solution (which may or may not be the same as $\mathbf{w}^*$) in a finite number of steps for any initial choice of the weights.

58

Perceptron convergence algorithm: Example

59

(Columns as on slide 53; this run starts from initial weights W1 = W2 = Wb = 1.)

n | x1 x2 xb | d(n) | W1 W2 Wb | C1 C2 Cb | s | Y(n) | e | η·e | new W1 W2 Wb
0 | 0 0 1 | −1 | 1 1 1 | 0 0 1 | 1 | 1 | −2 | −0.2 | 1 1 0.8
1 | 0 1 1 | −1 | 1 1 0.8 | 0 1 0.8 | 1.8 | 1 | −2 | −0.2 | 1 0.8 0.6
2 | 1 0 1 | −1 | 1 0.8 0.6 | 1 0 0.6 | 1.6 | 1 | −2 | −0.2 | 0.8 0.8 0.4
3 | 1 1 1 | 1 | 0.8 0.8 0.4 | 0.8 0.8 0.4 | 2 | 1 | 0 | 0 | 0.8 0.8 0.4
4 | 0 0 1 | −1 | 0.8 0.8 0.4 | 0 0 0.4 | 0.4 | 1 | −2 | −0.2 | 0.8 0.8 0.2
5 | 0 1 1 | −1 | 0.8 0.8 0.2 | 0 0.8 0.2 | 1 | 1 | −2 | −0.2 | 0.8 0.6 0
6 | 1 0 1 | −1 | 0.8 0.6 0 | 0.8 0 0 | 0.8 | 1 | −2 | −0.2 | 0.6 0.6 −0.2
7 | 1 1 1 | 1 | 0.6 0.6 −0.2 | 0.6 0.6 −0.2 | 1 | 1 | 0 | 0 | 0.6 0.6 −0.2
8 | 0 0 1 | −1 | 0.6 0.6 −0.2 | 0 0 −0.2 | −0.2 | −1 | 0 | 0 | 0.6 0.6 −0.2
9 | 0 1 1 | −1 | 0.6 0.6 −0.2 | 0 0.6 −0.2 | 0.4 | 1 | −2 | −0.2 | 0.6 0.4 −0.4
10 | 1 0 1 | −1 | 0.6 0.4 −0.4 | 0.6 0 −0.4 | 0.2 | 1 | −2 | −0.2 | 0.4 0.4 −0.6
11 | 1 1 1 | 1 | 0.4 0.4 −0.6 | 0.4 0.4 −0.6 | 0.2 | 1 | 0 | 0 | 0.4 0.4 −0.6
12 | 0 0 1 | −1 | 0.4 0.4 −0.6 | 0 0 −0.6 | −0.6 | −1 | 0 | 0 | 0.4 0.4 −0.6
13 | 0 1 1 | −1 | 0.4 0.4 −0.6 | 0 0.4 −0.6 | −0.2 | −1 | 0 | 0 | 0.4 0.4 −0.6
14 | 1 0 1 | −1 | 0.4 0.4 −0.6 | 0.4 0 −0.6 | −0.2 | −1 | 0 | 0 | 0.4 0.4 −0.6
15 | 1 1 1 | 1 | 0.4 0.4 −0.6 | 0.4 0.4 −0.6 | 0.2 | 1 | 0 | 0 | 0.4 0.4 −0.6

Perceptron convergence algorithm: Example

Decision hyperplane

60

[Plot of the decision line $0.4\,x_1 + 0.4\,x_2 - 0.6 = 0$, crossing both axes at 1.5]
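Rerunning the earlier sketch from this non-zero starting point (the initial weights from the table, passed through the optional w argument assumed above) gives the corresponding solution:

```python
w = train_perceptron(X, d, eta=0.1, epochs=4,
                     w=np.array([1.0, 1.0, 1.0]))    # [Wb, W1, W2] all start at 1
print(w)   # ≈ [-0.6, 0.4, 0.4]: a different, equally valid separating line
```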

Simple logic functions: AND

61

[Plot of the four input points in the $(x_1, x_2)$ plane; a single line separates (1, 1) from the other three points]

Simple logic functions: OR

62

[Plot of the four input points; a single line separates (0, 0) from the other three points]

Simple logic functions: XOR????

63

[Plot of the four input points; no single line can separate the two XOR classes]

A simple perceptron can't represent a logical XOR function.

Add a hidden layer

64

[Diagram of a two-layer perceptron for XOR: one hidden neuron feeding the output neuron, both with bias −0.5, with weights as reflected in the table on the next slide]

XOR

65

Threshold function at both layers: output 1 if the sum ≥ 0, else −1. V1(n), Y1(n) are the sum and output of the hidden (layer-1) neuron; V2(n), Y2(n) those of the output (layer-2) neuron.

n | x1 | x2 | b1 | b2 | d(n) | V1(n) | Y1(n) | V2(n) | Y2(n)
0 | −1 | −1 | −0.5 | −0.5 | −1 | −2.5 | −1 | −0.5 | −1
1 | −1 | 1 | −0.5 | −0.5 | 1 | −0.5 | −1 | 1.5 | 1
2 | 1 | −1 | −0.5 | −0.5 | 1 | −0.5 | −1 | 1.5 | 1
3 | 1 | 1 | −0.5 | −0.5 | −1 | 1.5 | 1 | −0.5 | −1
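A sketch of this two-layer network in Python. The biases (−0.5) are taken from the table; the input-to-hidden weights (1, 1), the direct input-to-output weights (1, 1) and the hidden-to-output weight (−2) are assumptions chosen to reproduce the V1(n) and V2(n) values tabulated above.

```python
def sgn(v):                          # threshold: 1 if v >= 0, else -1
    return 1 if v >= 0 else -1

def xor_net(x1, x2):
    # Hidden (layer-1) neuron: fires only for the input (+1, +1)
    v1 = 1 * x1 + 1 * x2 - 0.5
    y1 = sgn(v1)
    # Output (layer-2) neuron: the two inputs plus an inhibitory -2 link from y1
    v2 = 1 * x1 + 1 * x2 - 2 * y1 - 0.5
    return sgn(v2)

for x1, x2 in [(-1, -1), (-1, 1), (1, -1), (1, 1)]:
    print((x1, x2), xor_net(x1, x2))   # -1, 1, 1, -1: matches the d(n) column
```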

XOR – Decision hyperplane

66

n | x1 | x2 | Y1(n) | Y2(n)
0 | −1 | −1 | −1 | −1
1 | −1 | 1 | −1 | 1
2 | 1 | −1 | −1 | 1
3 | 1 | 1 | 1 | −1

[Plot of the four points n = 0 … 3 in the space extended by the hidden output Y1, where the two XOR classes are now linearly separable]

Perceptron with a hidden layer

By adding the extra dimension, we made the XOR problem a separable case.

Multi-layer networks are more powerful in their expressive ability.

However, we can't use the same learning algorithm as earlier!

The system isn't linear anymore, and the training algorithm does not converge.

In the next lecture, we will learn how to handle this.

67

References

The lecture slides are based on the slides prepared by Dr. Chathura De Silva and Dr. Upali Kohomban for this class in previous years.

68
