Linear Classification: The Perceptron · 2020-01-29
TRANSCRIPT
Linear Classification:
The Perceptron
Robot Image Credit: Viktoriya Sukhanova © 123RF.com
These slides were assembled by Byron Boots, with only minor modifications from Eric Eaton’s slides and
grateful acknowledgement to the many others who made their course materials freely available online. Feel free to reuse or adapt these slides for your own academic purposes, provided that you include proper attribution.
Linear Classifiers
• A hyperplane partitions R^d into two half-spaces
– Defined by the normal vector θ ∈ R^d
• θ is orthogonal to any vector lying on the hyperplane
– Assumed to pass through the origin
• This is because we incorporated the bias term θ_0 into θ by setting x_0 = 1
• Consider classification with +1, -1 labels ...
Based on slide by Piyush Rai
Linear Classifiers
• Linear classifiers: represent the decision boundary by a hyperplane

    h(x) = sign(θᵀx)   where   sign(z) = +1 if z ≥ 0, −1 if z < 0

  with xᵀ = [1  x_1  …  x_d] and θ = [θ_0  θ_1  …  θ_d]ᵀ
– Note that:

    θᵀx > 0 ⟹ y = +1
    θᵀx < 0 ⟹ y = −1
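As a minimal sketch of the classifier above, the following (with illustrative names and weights) folds the bias term into θ by prepending the constant feature x_0 = 1:

```python
import numpy as np

def predict(theta, x):
    """Return +1 or -1 for h(x) = sign(theta^T x), given x without the bias feature."""
    x_aug = np.concatenate(([1.0], x))   # prepend x_0 = 1 so theta_0 acts as the bias
    return 1 if theta @ x_aug >= 0 else -1

theta = np.array([-1.0, 2.0, 0.5])       # [theta_0, theta_1, theta_2]
print(predict(theta, np.array([1.0, 0.0])))  # theta^T x = -1 + 2 = 1 >= 0, so +1
print(predict(theta, np.array([0.0, 1.0])))  # theta^T x = -1 + 0.5 < 0, so -1
```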
The Perceptron
• The perceptron uses the following update rule each time it receives a new training instance (x^(i), y^(i)):

    θ_j ← θ_j − (α/2) (h_θ(x^(i)) − y^(i)) x_j^(i)

  where h(x) = sign(θᵀx) as before
– If the prediction matches the label, make no change
– Otherwise, adjust θ
– On a misclassification, (h_θ(x^(i)) − y^(i)) is either 2 or -2
The Perceptron
• The perceptron uses the following update rule each time it receives a new training instance (x^(i), y^(i)):

    θ_j ← θ_j − (α/2) (h_θ(x^(i)) − y^(i)) x_j^(i)

• Since (h_θ(x^(i)) − y^(i)) is either 2 or -2, we can re-write this as (applied only upon misclassification):

    θ_j ← θ_j + α y^(i) x_j^(i)

– Can eliminate α in this case, since its only effect is to scale θ by a constant, which doesn’t affect performance

Perceptron Rule: If (x^(i), y^(i)) is misclassified, do θ ← θ + y^(i) x^(i)
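The rule above can be sketched in a few lines; this assumes x already includes the bias feature x_0 = 1, and the function name is illustrative:

```python
import numpy as np

def perceptron_update(theta, x, y):
    """Apply the perceptron rule: if (x, y) is misclassified, theta <- theta + y*x."""
    if y * (theta @ x) <= 0:       # misclassified (or exactly on the boundary)
        theta = theta + y * x      # move theta toward the correct side for x
    return theta

theta = np.zeros(3)
theta = perceptron_update(theta, np.array([1.0, 2.0, -1.0]), +1)
print(theta)  # [ 1.  2. -1.]
```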
Why the Perceptron Update Works
[Figure: a misclassified positive example x; the update θ_new = θ_old + x rotates the weight vector θ_old toward x]
Based on slide by Piyush Rai
Why the Perceptron Update Works
• Consider the misclassified example (y = +1)
– Perceptron wrongly thinks that θ_oldᵀx < 0
• Update:

    θ_new = θ_old + yx = θ_old + x   (since y = +1)

• Note that

    θ_newᵀx = (θ_old + x)ᵀx = θ_oldᵀx + xᵀx

  and xᵀx = ‖x‖₂² > 0
• Therefore, θ_newᵀx is less negative than θ_oldᵀx
– So, we are making ourselves more correct on this example!
Based on slide by Piyush Rai
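The identity θ_newᵀx = θ_oldᵀx + ‖x‖₂² can be checked numerically; the values below are illustrative:

```python
import numpy as np

theta_old = np.array([1.0, -2.0])
x = np.array([1.0, 1.0])       # a positive example (y = +1)

score_old = theta_old @ x      # -1.0: wrongly classified as negative
theta_new = theta_old + x      # perceptron update with y = +1
score_new = theta_new @ x      # score_old + ||x||^2 = -1 + 2 = 1.0

print(score_old, score_new)    # -1.0 1.0: the score moved toward +
```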
The Perceptron Cost Function
• The perceptron uses the following cost function:

    J_p(θ) = (1/n) Σ_{i=1}^{n} max(0, −y^(i) θᵀx^(i))

– The i-th term is 0 if the prediction is correct
– Otherwise, it is the confidence in the misprediction
Based on slide by Alan Fern
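A small sketch of this cost, assuming the rows of X already carry the bias feature and the function name is illustrative:

```python
import numpy as np

def perceptron_cost(theta, X, y):
    """J_p(theta) = (1/n) * sum_i max(0, -y_i * theta^T x_i)."""
    margins = y * (X @ theta)                # y_i * theta^T x_i, one per example
    return np.mean(np.maximum(0.0, -margins))

X = np.array([[1.0, 2.0],
              [1.0, -1.0]])
y = np.array([+1, -1])
theta = np.array([0.0, 1.0])
print(perceptron_cost(theta, X, y))          # margins [2, 1]: both correct, cost 0.0
```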
Online Perceptron Algorithm
1.) Let θ ← [0, 0, …, 0]
2.) Repeat:
3.)   Receive training example (x^(i), y^(i))
4.)   if y^(i) θᵀx^(i) ≤ 0   // prediction is incorrect
5.)     θ ← θ + y^(i) x^(i)

Online learning – the learning mode where the model update is performed each time a single observation is received
Batch learning – the learning mode where the model update is performed after observing the entire training set
Based on slide by Alan Fern
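The pseudocode above can be sketched as follows, consuming one example at a time; the data and names are illustrative, and each x is assumed to include the bias feature x_0 = 1:

```python
import numpy as np

def online_perceptron(stream):
    """One pass of the online perceptron over an iterable of (x, y) pairs."""
    theta = None
    for x, y in stream:
        if theta is None:
            theta = np.zeros_like(x)   # 1.) Let theta <- [0, ..., 0]
        if y * (theta @ x) <= 0:       # 4.) prediction is incorrect
            theta = theta + y * x      # 5.) theta <- theta + y*x
    return theta

stream = [(np.array([1.0, 2.0]), +1),
          (np.array([1.0, -2.0]), -1)]
theta = online_perceptron(stream)
print(theta)  # [1. 2.]: first example triggered an update, second was correct
```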
Online Perceptron Algorithm
[Figure: the decision boundary evolving as examples arrive; red points are labeled +, blue points are labeled −]
Based on slide by Alan Fern
Batch Perceptron
1.) Given training data {(x^(i), y^(i))}_{i=1}^{n}
2.) Let θ ← [0, 0, …, 0]
3.) Repeat:
4.)   Let Δ ← [0, 0, …, 0]
5.)   for i = 1 … n, do
6.)     if y^(i) θᵀx^(i) ≤ 0   // prediction for ith instance is incorrect
7.)       Δ ← Δ + y^(i) x^(i)
8.)   Δ ← Δ/n   // compute average update
9.)   θ ← θ + αΔ
10.) Until ‖Δ‖₂ < ε

• Simplest case: α = 1 and don’t normalize, yields the fixed increment perceptron
• Guaranteed to find a separating hyperplane if one exists
Based on slide by Alan Fern
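The batch version above can be sketched as follows; α, the tolerance ε, the epoch cap, and the toy data are illustrative choices, and the first column of X holds the bias feature x_0 = 1:

```python
import numpy as np

def batch_perceptron(X, y, alpha=1.0, eps=1e-6, max_epochs=1000):
    """Average the perceptron updates over the whole set each epoch (steps 3-10)."""
    theta = np.zeros(X.shape[1])             # 2.) theta <- [0, ..., 0]
    for _ in range(max_epochs):
        delta = np.zeros_like(theta)         # 4.) delta <- [0, ..., 0]
        for xi, yi in zip(X, y):
            if yi * (theta @ xi) <= 0:       # 6.) i-th prediction is incorrect
                delta += yi * xi             # 7.) accumulate this example's update
        delta /= len(y)                      # 8.) compute average update
        theta += alpha * delta               # 9.) take the averaged step
        if np.linalg.norm(delta) < eps:      # 10.) stop when the update is tiny
            break
    return theta

# Linearly separable toy data (first column is the bias feature x_0 = 1)
X = np.array([[1.0, 2.0], [1.0, 3.0], [1.0, -2.0], [1.0, -3.0]])
y = np.array([+1, +1, -1, -1])
theta = batch_perceptron(X, y)
print(np.sign(X @ theta))  # matches y, since the data are separable
```

On separable data like this, the loop exits early once no example is misclassified, which is exactly the guarantee stated above.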