
Page 1: Linear Classification: The Perceptron (2020-01-29)

Linear Classification: The Perceptron

Robot Image Credit: Viktoriya Sukhanova © 123RF.com

These slides were assembled by Byron Boots, with only minor modifications from Eric Eaton’s slides and grateful acknowledgement to the many others who made their course materials freely available online. Feel free to reuse or adapt these slides for your own academic purposes, provided that you include proper attribution.

Page 2

Linear Classifiers

• A hyperplane partitions ℝ^d into two half-spaces

– Defined by the normal vector θ ∈ ℝ^d

• θ is orthogonal to any vector lying on the hyperplane

– Assumed to pass through the origin

• This is because we incorporated the bias term θ_0 into θ by setting x_0 = 1

• Consider classification with +1, −1 labels ...

Based on slide by Piyush Rai
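The bias absorption mentioned above can be sketched in a few lines of NumPy (the values here are made up for illustration):

```python
import numpy as np

# Absorb the bias term theta_0 into the weight vector by
# prepending a constant feature x_0 = 1 to every input.
theta = np.array([0.5, 2.0, -1.0])   # [theta_0, theta_1, theta_2]
x_raw = np.array([3.0, 4.0])         # original d-dimensional input

x = np.concatenate(([1.0], x_raw))   # augmented input with x_0 = 1
score = theta @ x                    # theta^T x

# Equivalent to computing the bias explicitly:
assert np.isclose(score, theta[0] + theta[1:] @ x_raw)
```

With this convention, the hyperplane θᵀx = 0 passes through the origin of the augmented (d+1)-dimensional space even when the original decision boundary does not.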

Page 3

Linear Classifiers

• Linear classifiers: represent the decision boundary by a hyperplane

h(x) = sign(θᵀx),  where  sign(z) = +1 if z ≥ 0, −1 if z < 0

xᵀ = [1  x_1  …  x_d]   and   θ = [θ_0  θ_1  …  θ_d]ᵀ

– Note that:

θᵀx > 0 ⟹ y = +1

θᵀx < 0 ⟹ y = −1
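The decision rule above translates directly into NumPy (a minimal sketch; the function name is ours):

```python
import numpy as np

def h(theta, x):
    """Linear classifier h(x) = sign(theta^T x), with sign(z) = +1 when z >= 0."""
    return 1 if theta @ x >= 0 else -1

# x includes the constant feature x_0 = 1, so theta[0] acts as the bias.
theta = np.array([0.0, 1.0, -1.0])
label = h(theta, np.array([1.0, 3.0, 1.0]))   # theta^T x = 2 > 0, so +1
```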

Page 4

The Perceptron

• The perceptron uses the following update rule each time it receives a new training instance (x^(i), y^(i)):

θ_j ← θ_j − (α/2) (h_θ(x^(i)) − y^(i)) x_j^(i)

– If the prediction matches the label, make no change

– Otherwise, adjust θ

where h(x) = sign(θᵀx) and sign(z) = +1 if z ≥ 0, −1 if z < 0

• Note that (h_θ(x^(i)) − y^(i)) is either 2 or −2

Page 5

The Perceptron

• The perceptron uses the following update rule each time it receives a new training instance (x^(i), y^(i)):

θ_j ← θ_j − (α/2) (h_θ(x^(i)) − y^(i)) x_j^(i)

• Since (h_θ(x^(i)) − y^(i)) is either 2 or −2, re-write as (only upon misclassification):

θ_j ← θ_j + α y^(i) x_j^(i)

– Can eliminate α in this case, since its only effect is to scale θ by a constant, which doesn’t affect performance

• Perceptron Rule: If (x^(i), y^(i)) is misclassified, do θ ← θ + y^(i) x^(i)
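The perceptron rule above can be sketched in NumPy (the function name is ours; a zero score is treated as a mistake, matching the ≤ 0 test used in the pseudocode later in the deck):

```python
import numpy as np

def perceptron_update(theta, x, y):
    """One perceptron step: update theta only when (x, y) is misclassified."""
    if y * (theta @ x) <= 0:      # misclassified (or on the boundary)
        theta = theta + y * x     # theta <- theta + y x
    return theta
```

For example, starting from θ = 0, the instance (x = [1, 2], y = +1) is misclassified, so the update sets θ = [1, 2]; presenting the same instance again changes nothing, since y θᵀx = 5 > 0.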

Page 6

Why the Perceptron Update Works

[Figure: a misclassified positive example x lies on the wrong side of the hyperplane defined by θ_old; the update θ_new = θ_old + x rotates the hyperplane so that x lands on the correct side]

Based on slide by Piyush Rai

Page 7

Why the Perceptron Update Works

• Consider the misclassified example (y = +1)

– Perceptron wrongly thinks that θ_oldᵀx < 0

• Update: θ_new = θ_old + yx = θ_old + x   (since y = +1)

• Note that

θ_newᵀx = (θ_old + x)ᵀx = θ_oldᵀx + xᵀx,   where xᵀx = ‖x‖₂² > 0

• Therefore, θ_newᵀx is less negative than θ_oldᵀx

– So, we are making ourselves more correct on this example!

Based on slide by Piyush Rai
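A quick numeric check of the identity above (the particular numbers are ours, chosen so that θ_oldᵀx < 0 for a positive example):

```python
import numpy as np

theta_old = np.array([1.0, -2.0])
x = np.array([1.0, 1.0])            # y = +1, but theta_old^T x = -1 < 0

theta_new = theta_old + x           # perceptron update for y = +1

# theta_new^T x = theta_old^T x + ||x||^2: the score moves up by ||x||^2.
assert np.isclose(theta_new @ x, theta_old @ x + x @ x)
```

Here the score on x rises from −1 to +1, so this single update already flips the prediction; in general one update only makes the score *less negative*, and the same example may need several passes.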

Page 8

The Perceptron Cost Function

• The perceptron uses the following cost function:

J_p(θ) = (1/n) Σ_{i=1}^{n} max(0, −y^(i) θᵀx^(i))

– The i-th term is 0 if the prediction is correct

– Otherwise, it is the confidence in the misprediction

Based on slide by Alan Fern
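A vectorized NumPy sketch of this cost (function name is ours; rows of X are the augmented inputs x^(i)ᵀ):

```python
import numpy as np

def perceptron_cost(theta, X, y):
    """J_p(theta) = (1/n) * sum_i max(0, -y_i * theta^T x_i)."""
    margins = y * (X @ theta)              # y_i * theta^T x_i for each i
    return np.mean(np.maximum(0.0, -margins))
```

A correctly classified point has a positive margin and contributes 0; a misclassified point contributes |θᵀx^(i)|, the confidence of the wrong prediction.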

Page 9

Online Perceptron Algorithm

Based on slide by Alan Fern

1.) Let θ ← [0, 0, …, 0]
2.) Repeat:
3.)   Receive training example (x^(i), y^(i))
4.)   if y^(i) θᵀx^(i) ≤ 0    // prediction is incorrect
5.)     θ ← θ + y^(i) x^(i)

Online learning – the learning mode where the model update is performed each time a single observation is received.

Batch learning – the learning mode where the model update is performed after observing the entire training set.
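The online algorithm above can be sketched as a single pass over a stream of examples (a minimal illustration; the function name and data are ours):

```python
import numpy as np

def online_perceptron(examples, d):
    """Online perceptron: one update per received example.
    `examples` yields (x, y) pairs with y in {+1, -1}; each x is
    assumed to already include the constant feature x_0 = 1."""
    theta = np.zeros(d)                   # 1.) theta <- [0, ..., 0]
    for x, y in examples:                 # 3.) receive (x, y)
        if y * (x @ theta) <= 0:          # 4.) prediction is incorrect
            theta = theta + y * x         # 5.) theta <- theta + y x
    return theta
```

In a truly online setting the loop never terminates (step 2's "Repeat:"); this sketch simply processes whatever finite stream it is given.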

Page 10

Online Perceptron Algorithm

Based on slide by Alan Fern

[Figure: the online perceptron run on a 2-D dataset; red points are labeled +, blue points are labeled −]

Page 11

Batch Perceptron

1.) Given training data {(x^(i), y^(i))}_{i=1}^{n}
2.) Let θ ← [0, 0, …, 0]
3.) Repeat:
4.)   Let Δ ← [0, 0, …, 0]
5.)   for i = 1 … n, do
6.)     if y^(i) θᵀx^(i) ≤ 0    // prediction for i-th instance is incorrect
7.)       Δ ← Δ + y^(i) x^(i)
8.)   Δ ← Δ/n    // compute average update
9.)   θ ← θ + αΔ
10.) Until ‖Δ‖₂ < ε

• Simplest case: α = 1 and don’t normalize, yields the fixed-increment perceptron

• Guaranteed to find a separating hyperplane if one exists

Based on slide by Alan Fern
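The batch pseudocode above can be sketched in NumPy (names and the epoch cap are ours; the cap guards against non-separable data, where the loop would otherwise never terminate):

```python
import numpy as np

def batch_perceptron(X, y, alpha=1.0, eps=1e-6, max_epochs=1000):
    """Batch perceptron: average the updates over the whole training
    set, then take one step; stop when the average update is tiny.
    Rows of X are assumed to include the constant feature x_0 = 1."""
    n, d = X.shape
    theta = np.zeros(d)
    for _ in range(max_epochs):
        delta = np.zeros(d)
        for i in range(n):
            if y[i] * (X[i] @ theta) <= 0:   # i-th prediction incorrect
                delta += y[i] * X[i]
        delta /= n                           # compute average update
        theta += alpha * delta
        if np.linalg.norm(delta) < eps:      # until ||delta||_2 < eps
            break
    return theta
```

Unlike the online version, each pass aggregates all misclassifications before touching θ, so the result does not depend on the order of the training examples within an epoch.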