Page 1: Introduction to Natural Computation Lecture 08

Introduction to Natural Computation

Lecture 08

Perceptrons

Leandro Minku

1 / 47

Page 2: Introduction to Natural Computation Lecture 08

Overview of the Lecture

Brief introduction to machine learning.

What are Perceptrons?

How do they learn?

Applications of Perceptrons.

2 / 47

Page 3: Introduction to Natural Computation Lecture 08

Machine Learning

Focus

To study and develop computational models capable of improving their performance with experience and of acquiring knowledge on their own.

How?

Through data examples.

Types of learning:

Supervised.

Unsupervised.

Semi-supervised.

Reinforcement.

3 / 47

Page 4: Introduction to Natural Computation Lecture 08

Supervised Learning

Examples:

Attributes: features of the examples.

Label.

Unlabelled data are frequently called instances.

We use algorithms to learn based on labelled examples and generalize to new instances.

Table: Credit Approval

Age | Gender | Salary | Bank account start time | ... | Good or bad payer

4 / 47
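As a concrete sketch of this representation (the attribute names and values below are made up for illustration, not taken from the lecture), a labelled example can be stored as a pair of attributes and a label, while an instance is the same structure without the label:

```python
# Hypothetical credit-approval examples: (attributes, label), with label 1 = good payer, 0 = bad payer.
labelled_examples = [
    ({"age": 35, "gender": "F", "salary": 42000, "account_years": 10}, 1),
    ({"age": 22, "gender": "M", "salary": 15000, "account_years": 1}, 0),
]

# An unlabelled instance: same attributes, no label; the learnt model must predict its label.
new_instance = {"age": 29, "gender": "F", "salary": 30000, "account_years": 4}
```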

Page 5: Introduction to Natural Computation Lecture 08

Neural Networks

[video]

5 / 47

Page 6: Introduction to Natural Computation Lecture 08

A Single Neuron

A neuron receives several inputs and combines these in the cell body.

If the input reaches a threshold, then the neuron may fire (produce an output).

Some inputs are excitatory, while others are inhibitory.

6 / 47

Page 7: Introduction to Natural Computation Lecture 08

Perceptron: An artificial neuron

The Perceptron was developed by Frank Rosenblatt in 1957 and can be considered the simplest artificial neural network.

7 / 47

Page 8: Introduction to Natural Computation Lecture 08

Perceptron: An artificial neuron

The Perceptron was developed by Frank Rosenblatt in 1957 and can be considered the simplest artificial neural network.

Input function: $u(\mathbf{x}) = \sum_{i=1}^{n} w_i x_i$

Activation function (threshold):

$y = f(u(\mathbf{x})) = \begin{cases} 1, & \text{if } u(\mathbf{x}) > \theta \\ 0, & \text{otherwise} \end{cases}$

Activation state: 0 or 1 (alternatively, −1 or 1).

8 / 47
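A minimal sketch of this computation in code (the weights, inputs and threshold below are illustrative values, not taken from the slides):

```python
def perceptron_output(weights, x, theta):
    """Threshold perceptron: weighted sum u(x), then a hard threshold at theta."""
    u = sum(w_i * x_i for w_i, x_i in zip(weights, x))  # input function u(x) = sum_i w_i * x_i
    return 1 if u > theta else 0                         # activation: fire only if u(x) > theta

# Illustrative values: two inputs, both weighted 0.7, threshold 1.0.
print(perceptron_output([0.7, 0.7], [1.0, 1.0], theta=1.0))  # 1.4 > 1.0  -> 1
print(perceptron_output([0.7, 0.7], [1.0, 0.0], theta=1.0))  # 0.7 <= 1.0 -> 0
```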

Page 11: Introduction to Natural Computation Lecture 08

Perceptron: An artificial neuron

The Perceptron was developed by Frank Rosenblatt in 1957 and can be considered the simplest artificial neural network.

Inputs are typically in the range [0, 1], where 0 is “off” and 1 is “on”.

Weights can be any real number (positive or negative).

11 / 47

Page 15: Introduction to Natural Computation Lecture 08

Perceptron – A linear classifier

Input function: $u(\mathbf{x}) = \sum_{i=1}^{n} w_i x_i$

Activation function (threshold):

$y = f(u(\mathbf{x})) = \begin{cases} 1, & \text{if } u(\mathbf{x}) > \theta \\ 0, & \text{otherwise} \end{cases}$

Firing condition (class 1):

$u(\mathbf{x}) > \theta \;\Leftrightarrow\; \mathbf{w} \cdot \mathbf{x} > \theta \;\Leftrightarrow\; \mathbf{w} \cdot \mathbf{x} - \theta > 0$

Linear equation (two variables):
Slope–intercept form: $y = mx + b$
General form: $ax + by + c = 0$

Linear equation (n variables), general form:
$a_1 x_1 + a_2 x_2 + \dots + a_n x_n + b = 0$

15 / 47

Page 16: Introduction to Natural Computation Lecture 08

Perceptron – A linear classifier

Input function: $u(\mathbf{x}) = \sum_{i=1}^{n} w_i x_i$

Activation function (threshold):

$y = f(u(\mathbf{x})) = \begin{cases} 1, & \text{if } u(\mathbf{x}) > \theta \\ 0, & \text{otherwise} \end{cases}$

Firing condition (class 1):

$u(\mathbf{x}) > \theta \;\Leftrightarrow\; \mathbf{w} \cdot \mathbf{x} - \theta > 0$

Decision boundary (linear equation):
$w_1 x_1 + w_2 x_2 + \dots + w_n x_n - \theta = 0$

Above the linear boundary, class 1:
$w_1 x_1 + w_2 x_2 + \dots + w_n x_n - \theta > 0$

Below or on the linear boundary, class 0:
$w_1 x_1 + w_2 x_2 + \dots + w_n x_n - \theta \le 0$

16 / 47

Page 18: Introduction to Natural Computation Lecture 08

Exercise

Consider the perceptron on the right and θ = 1.5.

Determine the outputs for the inputs below and plot them in a graph of x1 vs x2:

{0.0, 0.0}, {0.0, 1.0} and {1.0, 1.0}.

Use a closed circle to indicate output 1 and an open circle to indicate output 0. An example for the input {1.0, 0.0} is shown on the right.

[Plot: x1 vs x2 axes from 0 to 1, with inputs marked as Class 1 (closed circle) or Class 0 (open circle)]

18 / 47

Page 20: Introduction to Natural Computation Lecture 08

Logic Gate AND

Consider the perceptron on the right and θ = 1.5.

x1 | x2 | x1 AND x2
0 | 0 | 0
0 | 1 | 0
1 | 0 | 0
1 | 1 | 1

[Plot: the four AND inputs in the x1 vs x2 plane, marked as Class 1 (closed circle) or Class 0 (open circle)]

20 / 47
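The weights themselves are only shown in the figure, which is not reproduced here; the worked example on a later slide uses w1 = w2 = 1, and with θ = 1.5 that setting reproduces the AND table:

```python
def perceptron_output(weights, x, theta):
    u = sum(w_i * x_i for w_i, x_i in zip(weights, x))
    return 1 if u > theta else 0

# w1 = w2 = 1 (as in the later worked example) and theta = 1.5.
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "->", perceptron_output([1, 1], [x1, x2], theta=1.5))
# Only {1, 1} gives 1 + 1 = 2 > 1.5, so only that input is classified as class 1: the AND gate.
```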

Page 21: Introduction to Natural Computation Lecture 08

Exercise

Consider the perceptron on the right and θ = 1.5.

Draw a line that represents the decision boundary between classes 1 and 0.

Remember the line equation: $w_1 x_1 + w_2 x_2 = \theta$

You need two points to draw a line (see the worked example after this slide).

[Plot: x1 vs x2 axes from 0 to 1, on which the decision boundary line is to be drawn]

21 / 47
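As a worked example, assuming the same weights as in the AND perceptron (w1 = w2 = 1, θ = 1.5), the boundary is $x_1 + x_2 = 1.5$. Two convenient points on this line inside the plotted range are (0.5, 1.0) and (1.0, 0.5); joining them gives the decision boundary, with class 1 above the line and class 0 below or on it.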

Page 23: Introduction to Natural Computation Lecture 08

Exercise

Consider the perceptron on the right and θ = 1.5.

Note that:

Anything below or on the line is class 0.

Anything above the line is class 1.

If we calculate the outputs for:

$\{0.2, -0.1\}$: $1 \times 0.2 + 1 \times (-0.1) = 0.1 < 1.5 \rightarrow$ class 0

$\{0.9, 1.1\}$: $1 \times 0.9 + 1 \times 1.1 = 2.0 > 1.5 \rightarrow$ class 1

A nice property

Perceptrons are noise tolerant!

[Plot: x1 vs x2 axes from 0 to 1, showing the decision boundary with Class 1 above it and Class 0 below or on it]

23 / 47

Page 24: Introduction to Natural Computation Lecture 08

A Nice Property of Perceptrons

Faulty or degraded hardware

What would happen if the input signals were to degrade somewhat? What would happen if the weights were slightly wrong?

x1 | x2 | x1 AND x2
0.2 | -0.1 | 0
0.1 | 1.05 | 0
1.1 | 0.3 | 0
0.91 | 1.0 | 1

Robustness

Neural networks are often still able to give us the right answer if there is a small amount of degradation.

Compare this with classical computing approaches!

24 / 47
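A short check of the table above, again assuming the AND perceptron with w1 = w2 = 1 and θ = 1.5:

```python
# Degraded versions of the AND inputs, taken from the table above.
for x1, x2 in [(0.2, -0.1), (0.1, 1.05), (1.1, 0.3), (0.91, 1.0)]:
    u = 1 * x1 + 1 * x2                      # weighted sum with w1 = w2 = 1
    print((x1, x2), "-> class", 1 if u > 1.5 else 0)
# The sums (about 0.1, 1.15, 1.4 and 1.91) give classes 0, 0, 0, 1: the same outputs as the clean inputs.
```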

Page 26: Introduction to Natural Computation Lecture 08

Another Interesting Example - Logic Gate XOR

x1 | x2 | x1 XOR x2
0 | 0 | 0
0 | 1 | 1
1 | 0 | 1
1 | 1 | 0

[Plot: the four XOR inputs in the x1 vs x2 plane, marked as Class 1 (closed circle) or Class 0 (open circle)]

A not so nice property

Perceptrons cannot model data that is not linearly separable.

26 / 47
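A small brute-force sketch of this limitation (the grid of candidate weights and thresholds is an arbitrary illustrative choice): no single threshold unit reproduces XOR.

```python
import itertools

xor_table = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

def out(w1, w2, theta, x1, x2):
    return 1 if w1 * x1 + w2 * x2 > theta else 0

grid = [i / 10 for i in range(-20, 21)]   # candidate values from -2.0 to 2.0 in steps of 0.1
found = any(
    all(out(w1, w2, theta, x1, x2) == t for (x1, x2), t in xor_table)
    for w1, w2, theta in itertools.product(grid, repeat=3)
)
print(found)  # False: no (w1, w2, theta) on this grid gets all four XOR cases right.
```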

Page 28: Introduction to Natural Computation Lecture 08

Learning

So far:

We know what a Perceptron is.

We learnt that it works as a linear classifier.

We saw a Perceptron that implements AND.

We learnt that perceptrons are somewhat robust.

But... choosing the weights and threshold θ for the Perceptron by hand is not easy! How can we learn the weights and threshold from examples?

We can use a learning algorithm that adjusts the weights and threshold θ based on examples.

[video]

28 / 47

Page 31: Introduction to Natural Computation Lecture 08

Learning – A trick to learn θ

$\sum_{i=1}^{n} w_i x_i > \theta$

$\sum_{i=1}^{n} w_i x_i - \theta > 0$

$w_1 x_1 + w_2 x_2 + \dots + w_n x_n - \theta > 0$

$w_1 x_1 + w_2 x_2 + \dots + w_n x_n + \theta \cdot (-1) > 0$

31 / 47

Page 36: Introduction to Natural Computation Lecture 08

Learning – A trick to learn θ

$\sum_{i=1}^{n} w_i x_i > \theta$

$\sum_{i=1}^{n} w_i x_i - \theta > 0$

$w_1 x_1 + w_2 x_2 + \dots + w_n x_n - \theta > 0$

$w_1 x_1 + w_2 x_2 + \dots + w_n x_n + \theta \cdot (-1) > 0$

We can consider θ as a weight to be learnt!

The corresponding input is fixed at −1.

The activation function is then:

$y = f(u(\mathbf{x})) = \begin{cases} 1, & \text{if } u(\mathbf{x}) > 0 \\ 0, & \text{otherwise} \end{cases}$

36 / 47
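A sketch of the trick in code: the input vector is extended with a fixed −1, whose weight plays the role of θ, and the activation now compares against 0 (the example weights reuse the earlier AND setting):

```python
def augmented(x):
    """Append the fixed input -1, so that theta can be learnt as an ordinary weight."""
    return list(x) + [-1]

def perceptron_output(weights, x_aug):
    """The last weight plays the role of theta; the threshold of the activation is now 0."""
    u = sum(w_i * x_i for w_i, x_i in zip(weights, x_aug))
    return 1 if u > 0 else 0

# Same behaviour as the AND perceptron: w1 = w2 = 1 with theta = 1.5 stored as the last weight.
print(perceptron_output([1, 1, 1.5], augmented([1, 1])))  # 1 + 1 - 1.5 = 0.5 > 0  -> 1
print(perceptron_output([1, 1, 1.5], augmented([1, 0])))  # 1 - 1.5 = -0.5 <= 0    -> 0
```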

Page 37: Introduction to Natural Computation Lecture 08

Perceptron’s Learning Rule

As explained:

Learning happens by adjusting weights.

The threshold can be considered as a weight.

The adjustments are based on the following:

Perceptron’s Learning Rule

$w_i \leftarrow w_i + \Delta w_i$

$\Delta w_i = \eta (t - o) x_i$

Where:

η, with 0 < η ≤ 1, is a constant called the learning rate.

t is the target output of the current example.

o is the output obtained by the Perceptron.

37 / 47
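A minimal sketch of one application of the rule (the function and variable names are mine, not from the slides):

```python
def perceptron_update(weights, x, t, o, eta):
    """Apply w_i <- w_i + eta * (t - o) * x_i to every weight."""
    return [w_i + eta * (t - o) * x_i for w_i, x_i in zip(weights, x)]

# If the output already matches the target, nothing changes (t - o = 0);
# otherwise each weight moves by eta * (t - o) * x_i.
print(perceptron_update([0.5, -0.2], [1, 1], t=1, o=1, eta=0.6))  # [0.5, -0.2]
print(perceptron_update([0.5, -0.2], [1, 1], t=1, o=0, eta=0.6))  # [1.1, 0.4] (up to float rounding)
```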

Page 38: Introduction to Natural Computation Lecture 08

Perceptron’s Learning Rule – A closer look

Learning Rule

$w_i \leftarrow w_i + \Delta w_i$

$\Delta w_i = \eta (t - o) x_i$

Consider the following cases:

$o = 1$ and $t = 1$:
$\Delta w_i = \eta (t - o) x_i = \eta (1 - 1) x_i = 0$

$o = 0$ and $t = 1$:
$\Delta w_i = \eta (t - o) x_i = \eta (1 - 0) x_i = \eta x_i$

The learning rate η is positive and controls how big the changes ∆wi are.

If xi > 0, ∆wi > 0. Then wi increases, in an attempt to make wixi larger so that the weighted sum exceeds the threshold θ.

If xi < 0, ∆wi < 0. Then wi decreases. Note that, since xi < 0, wixi > 0 once wi < 0, so decreasing wi also makes wixi larger.

38 / 47

Page 39: Introduction to Natural Computation Lecture 08

Learning Rule – Exercise

Perceptron’s Learning Rule

$w_i \leftarrow w_i + \Delta w_i$

$\Delta w_i = \eta (t - o) x_i$

Consider a Perceptron with only one input x1, weight w1 = 0.5, threshold θ = 0 and learning rate η = 0.6. Consider also the training example {x1 = −1, t = 1}. For now, let's temporarily ignore the learning of the threshold and consider it fixed.

Determine:

The output of the Perceptron for the input -1.

The new weight w1 after applying the learning rule.

The new output of the Perceptron for the input -1.

39 / 47

Page 42: Introduction to Natural Computation Lecture 08

Learning Rule – Exercise

Perceptron’s Learning Rule

$w_i \leftarrow w_i + \Delta w_i$

$\Delta w_i = \eta (t - o) x_i$

Consider a Perceptron with only one input x1, weight w1 = 0.5, threshold θ = 0 and learning rate η = 0.6. Consider also the training example {x1 = −1, t = 1}. For now, let's temporarily ignore the learning of the threshold and consider it fixed.

Determine:

The output of the Perceptron for the input −1:
$w_1 x_1 = 0.5 \times (-1) = -0.5 < \theta \rightarrow o = 0$

The new weight w1 after applying the learning rule:
$\Delta w_1 = \eta (t - o) x_1 = 0.6 \times (1 - 0) \times (-1) = -0.6 \rightarrow w_1 = 0.5 - 0.6 = -0.1$

The new output of the Perceptron for the input −1:
$w_1 x_1 = -0.1 \times (-1) = 0.1 > \theta \rightarrow o = 1$

42 / 47
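The same numbers can be checked with a short script (the names are mine, not from the slides):

```python
def output(w1, x1, theta):
    return 1 if w1 * x1 > theta else 0

w1, theta, eta = 0.5, 0.0, 0.6
x1, t = -1, 1

o = output(w1, x1, theta)         # 0.5 * (-1) = -0.5, not greater than 0 -> o = 0
delta = eta * (t - o) * x1        # 0.6 * (1 - 0) * (-1) = -0.6
w1 = w1 + delta                   # 0.5 - 0.6 = -0.1
print(o, delta, w1, output(w1, x1, theta))  # 0, -0.6, w1 ≈ -0.1, new output 1
```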

Page 43: Introduction to Natural Computation Lecture 08

Learning Rule – To try at home

Do the same as in the previous slide, but using learning rate η = 0.3.

Do the same as in the previous slide, but considering the threshold θ as a weight to be learnt.

43 / 47

Page 44: Introduction to Natural Computation Lecture 08

Learning Algorithm

1: Initialize all weights randomly.
2: repeat
3:    for each training example do
4:        Apply the learning rule.
5:    end for
6: until the error is acceptable or a certain number of iterations is reached.

This algorithm is guaranteed to find a solution with zero error in a finite number of iterations, as long as the examples are linearly separable.

44 / 47
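A compact sketch of the full algorithm, using the θ-as-a-weight trick and trained here on the AND examples; the random initialisation range, learning rate and epoch limit are illustrative choices, not values from the lecture.

```python
import random

def train_perceptron(examples, eta=0.2, max_epochs=100):
    """examples: list of (inputs, target) pairs. The threshold is learnt as the last weight (fixed input -1)."""
    n_inputs = len(examples[0][0])
    weights = [random.uniform(-0.5, 0.5) for _ in range(n_inputs + 1)]  # last weight plays the role of theta
    for _ in range(max_epochs):                                         # repeat ...
        errors = 0
        for x, t in examples:                                           # ... for each training example
            x_aug = list(x) + [-1]
            o = 1 if sum(w * xi for w, xi in zip(weights, x_aug)) > 0 else 0
            if o != t:
                errors += 1
            weights = [w + eta * (t - o) * xi for w, xi in zip(weights, x_aug)]  # learning rule
        if errors == 0:                                                 # ... until the error is acceptable
            break
    return weights

and_examples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
print(train_perceptron(and_examples))  # learnt weights for x1, x2 and the learnt threshold
```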

Page 45: Introduction to Natural Computation Lecture 08

Applications of Perceptrons

Perceptrons can be used only for linearly separable data.

Spam filtering.

Web page classification.

However, if we use a transformation of the data into another space in which separation is easier (kernels), Perceptrons can have more applications.

Neural networks with several layers of artificial neurons (multilayer perceptrons) can be used as universal approximators (next lecture)!

45 / 47

Page 46: Introduction to Natural Computation Lecture 08

Summary

In this lecture, we’ve seen:

What a Perceptron is.

That the Perceptron works as a linear classifier.

That Perceptrons are somewhat robust.

A learning algorithm for Perceptrons.

What Perceptrons can be used for.

46 / 47

Page 47: Introduction to Natural Computation Lecture 08

Further Reading

Gurney K. Chapters 2, 3, 4.1–4.4. In: Introduction to Neural Networks. Taylor and Francis; 2003.

47 / 47