Perceptrons - Temporal Dynamics of Learning
TRANSCRIPT
Perceptrons Gary Cottrell
Perceptrons: A bit of history
Frank Rosenblatt studied a simple version of a neural net called a perceptron:
- A single layer of processing
- Binary output
- Can compute simple things like (some) Boolean functions (OR, AND, etc.)
Perceptrons: A bit of history
- Computes the weighted sum of its inputs (the net input), compares it to a threshold, and "fires" if the net is greater than or equal to the threshold.
- (Note: in the picture it's ">", but we will use ">=" to make sure you're awake!)
The Perceptron Activation Rule
This is called a linear threshold unit:

net input:  net = Σi wi xi
output:     output = 1 if net >= θ, else output = 0
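A minimal sketch of this activation rule in Python (the function and variable names are illustrative, not from the lecture):

    def perceptron_output(weights, x, theta):
        """Linear threshold unit: compute the net input and
        fire (output 1) iff net >= theta."""
        net = sum(w_i * x_i for w_i, x_i in zip(weights, x))
        return 1 if net >= theta else 0

For example, perceptron_output([0.5, 0.5], (1, 1), 0.7) returns 1, because the net input 1.0 meets the threshold.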
Quiz
Assume: FALSE == 0, TRUE==1, so if X1 is false, it is 0. Can you come up with a set of weights and a threshold so that a two-input perceptron computes OR?
X1  X2  X1 OR X2
 0   0      0
 0   1      1
 1   0      1
 1   1      1
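(One possible answer, of infinitely many: w1 = w2 = 1, θ = 1. Checking each row: 0 + 0 = 0 < 1 gives 0; 0 + 1 = 1 >= 1 gives 1; 1 + 0 = 1 >= 1 gives 1; 1 + 1 = 2 >= 1 gives 1.)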
Quiz
Assume: FALSE == 0, TRUE==1 Can you come up with a set of weights and a threshold so that a two-input perceptron computes AND?
X1  X2  X1 AND X2
 0   0      0
 0   1      0
 1   0      0
 1   1      1
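(One possible answer: keep w1 = w2 = 1 and raise the threshold to θ = 2. Only the row with both inputs active reaches it: 1 + 1 = 2 >= 2 gives 1, while the other rows sum to 0 or 1, below 2, and give 0.)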
Quiz
Assume: FALSE == 0, TRUE==1 Can you come up with a set of weights and a threshold so that a two-input perceptron computes XOR?
X1  X2  X1 XOR X2
 0   0      0
 0   1      1
 1   0      1
 1   1      0
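(Worked answer, anticipating the Minsky & Papert point below: no such weights exist. The first row requires 0 < θ; the middle rows require w1 >= θ and w2 >= θ; the last row requires w1 + w2 < θ. But then w1 + w2 >= 2θ > θ, a contradiction, so XOR cannot be computed by a single perceptron.)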
Perceptrons
The goal was to make a neurally-inspired machine that could categorize inputs – and learn to do this from examples
Learning: A bit of history
- Rosenblatt (1962) discovered a learning rule for perceptrons called the perceptron convergence procedure.
- Guaranteed to learn anything computable (by a two-layer perceptron)
- Unfortunately, not everything was computable (Minsky & Papert, 1969)
Perceptron Learning
- It is supervised learning:
  - There is a set of input patterns
  - and a set of desired outputs (the targets or teaching signal)
- The network is presented with the inputs, and FOOMP, it computes the output, and the output is compared to the target.
- If they don't match, it changes the weights and threshold so it will get closer to producing the target next time.
Perceptron Learning
First, get a training set - let's choose OR. Four "patterns":

X1  X2  X1 OR X2
 0   0      0
 0   1      1
 1   0      1
 1   1      1
Perceptron Learning Made Simple
- Output activation rule:
  - First, compute the output of the network: output = 1 if net >= θ, else output = 0
- Learning rule:
  - If output is 1 and should be 0, then lower weights to active inputs and raise the threshold θ
  - If output is 0 and should be 1, then raise weights to active inputs and lower the threshold θ
  ("active input" means xi = 1, not 0)
Perceptron Learning
- First, get a training set - let's choose OR
- Four "patterns":

X1  X2  X1 OR X2
 0   0      0
 0   1      1
 1   0      1
 1   1      1

Now, randomly present these to the network, apply the learning rule, and continue until it doesn't make any mistakes.
Cognitive Science Summer School 14
STOP HERE FOR TRAINING DEMO of OR.
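Since the live demo can't run in a transcript, here is a minimal sketch of it in Python, implementing the if/then learning rule from the slide above (the initial weights, threshold, and learning rate are illustrative choices, not the lecture's):

    import random

    # The OR training set: (inputs, target) pairs.
    patterns = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]

    w = [0.0, 0.0]   # weights (illustrative starting point)
    theta = 0.0      # threshold
    alpha = 0.5      # learning rate

    def output(x):
        net = w[0] * x[0] + w[1] * x[1]
        return 1 if net >= theta else 0

    # Randomly present patterns until a full pass makes no mistakes.
    while any(output(x) != t for x, t in patterns):
        x, t = random.choice(patterns)
        out = output(x)
        if out == 1 and t == 0:        # fired when it shouldn't have:
            for i in (0, 1):
                if x[i] == 1:          # lower weights to active inputs
                    w[i] -= alpha
            theta += alpha             # and raise the threshold
        elif out == 0 and t == 1:      # failed to fire when it should have:
            for i in (0, 1):
                if x[i] == 1:          # raise weights to active inputs
                    w[i] += alpha
            theta -= alpha             # and lower the threshold

    print("learned: w =", w, "theta =", theta)

Because OR is linearly separable, the perceptron convergence procedure is guaranteed to stop.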
Characteristics of perceptron learning
- Supervised learning: gave it a set of input-output examples for it to model the function (a teaching signal)
- Error correction learning: only corrected it when it is wrong (never praised! ;-))
- Random presentation of patterns.
- Slow! Learning on some patterns ruins learning on others.
- This can explain the U-shaped learning of the past tense in English.
Perceptron Learning Made Simple for Computer Science
- Output activation rule:
  - First, compute the output of the network: output = 1 if net >= θ, else output = 0
- Learning rule:
  - If output is 1 and should be 0, then lower weights to active inputs and raise the threshold θ
  - If output is 0 and should be 1, then raise weights to active inputs and lower the threshold θ
  ("active input" means xi = 1, not 0)
Perceptron Learning Made Simple! for Computer Science
- Learning rule:
  - If output is 1 and should be 0, then lower weights to active inputs and raise the threshold
  - If output is 0 and should be 1, then raise weights to active inputs and lower the threshold
- Easy one-line program to do this:
  wi(t+1) = wi(t) + α(teacher – output)xi    (α is the learning rate)
- This is known as the delta rule because learning is based on the delta (difference) between what you did and what you should have done: δ = (teacher – output)
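As a sketch, that one-liner in Python (the function name is illustrative); note that it subsumes both if/then cases, as the next slides verify:

    def delta_rule_step(w, x, teacher, out, alpha):
        """wi(t+1) = wi(t) + alpha * (teacher - out) * xi, for every weight."""
        return [w_i + alpha * (teacher - out) * x_i for w_i, x_i in zip(w, x)]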
Let's convince ourselves these are the same…
- Learning rule: If output is 1 and should be 0, then lower weights to active inputs and raise the threshold
  wi(t+1) = wi(t) + α(teacher – output)xi
  wi(t+1) = wi(t) + α(0 – 1)1      ("xi is an active input" means xi = 1)
          = wi(t) – α              (lower weight)
- What if xi is inactive?
  wi(t+1) = wi(t) + α(teacher – output)xi
  wi(t+1) = wi(t) + α(teacher – output)0
          = wi(t)                  (no change)
Let's convince ourselves these are the same…
- Learning rule: If output is 0 and should be 1, then raise weights to active inputs and lower the threshold
  wi(t+1) = wi(t) + α(teacher – output)xi
  wi(t+1) = wi(t) + α(1 – 0)1
          = wi(t) + α              (raise weight)
Let's convince ourselves these are the same…
- What about the threshold? We just treat θ as a weight from a unit that is always a constant -1.
- Learning rule: If output is 1 and should be 0, then lower weights to active inputs and raise the threshold
  wi(t+1) = wi(t) + α(teacher – output)xi
  I.e.: θ(t+1) = θ(t) + α(0 – 1)(-1)
        θ(t+1) = θ(t) + α(–1)(-1)
        θ(t+1) = θ(t) + α          (raise θ)
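A small sketch of that trick, assuming we fold θ into the weight vector as w0 attached to a constant input x0 = -1 (so the unit fires when the augmented net input is >= 0, and the delta rule updates θ like any other weight):

    def augmented_output(w, x):
        """w = [theta, w1, w2, ...]; prepend a constant -1 input so that
        net = sum_i(wi * xi) - theta, and fire when net >= 0."""
        x_aug = [-1] + list(x)
        net = sum(w_i * x_i for w_i, x_i in zip(w, x_aug))
        return 1 if net >= 0 else 0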
Let's convince ourselves these are the same…
- What if we get it right??? Then teacher = output, and …
- Learning rule:
  wi(t+1) = wi(t) + α(teacher – output)xi
  wi(t+1) = wi(t) + α(0)xi
          = wi(t)                  (no change)
But wait! There's more!!!
- Learning rule:
  wi(t+1) = wi(t) + α(teacher – output)xi    (α is the learning rate)
- In fact, this learning rule works for arbitrary numeric inputs.
- I.e., the xi's don't need to be binary – they can be real numbers, or integers (like pixel values!).
- So, I can train a perceptron to be a "Gary detector," since an image is just a table of pixel values.
- (Illustrate on board)
But wait! There's more!!!
- Learning rule:
  wi(t+1) = wi(t) + α(teacher – output)xi    (α is the learning rate)
- In fact, this learning rule works for linear outputs
- I.e., we can use this to do linear regression!
- (Illustrate on board)
- The learning rule itself can be derived as minimizing the Euclidean distance between the desired outputs (the teaching signal) and the actual outputs.
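A minimal sketch of that use, assuming a linear output (no threshold) so the delta rule becomes least-mean-squares regression; the data, learning rate, and iteration count are made up for illustration, and the bias here uses a constant +1 input rather than the slide's -1 convention:

    import random

    # Made-up 1-D data, roughly y = 2x plus a little noise.
    data = [(x, 2.0 * x + random.uniform(-0.1, 0.1))
            for x in (0.0, 0.5, 1.0, 1.5, 2.0)]

    w, b, alpha = 0.0, 0.0, 0.05
    for _ in range(1000):
        x, teacher = random.choice(data)
        out = w * x + b                   # linear output: no threshold
        w += alpha * (teacher - out) * x  # delta rule on the weight
        b += alpha * (teacher - out)      # bias treated as a constant-1 input

    print("w =", round(w, 2), "b =", round(b, 2))  # should land near 2 and 0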
But wait! There's more!!!
- A simple variant gives us logistic regression
- Logistic regression, if you don't know what that is, is when you have a nominal output variable, e.g., "give this guy a loan? Or not?"
- This ends up being a kind of "soft" perceptron.
- It is more useful than a perceptron because it gives a probability of the output being true.
- (Do Demo, if possible)
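In place of the live demo, here is a minimal sketch of that variant, assuming the hard threshold is replaced by a sigmoid so the output is a probability; the same delta-style update then performs logistic regression (data, learning rate, and iteration count are illustrative):

    import math
    import random

    def sigmoid(net):
        return 1.0 / (1.0 + math.exp(-net))

    # The OR patterns again, but now the output is P(true), not a hard 0/1.
    patterns = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]

    w, b, alpha = [0.0, 0.0], 0.0, 0.5
    for _ in range(5000):
        x, teacher = random.choice(patterns)
        out = sigmoid(w[0] * x[0] + w[1] * x[1] + b)  # "soft" perceptron
        for i in (0, 1):
            w[i] += alpha * (teacher - out) * x[i]    # same delta-style update
        b += alpha * (teacher - out)

    for x, t in patterns:
        p = sigmoid(w[0] * x[0] + w[1] * x[1] + b)
        print(x, "-> P(true) =", round(p, 2), " target:", t)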