Linear Classification: The Perceptron · 2020-01-29
TRANSCRIPT
Linear Classification:
The Perceptron
Robot Image Credit: Viktoriya Sukhanova © 123RF.com
These slides were assembled by Byron Boots, with only minor modifications from Eric Eaton’s slides and
grateful acknowledgement to the many others who made their course materials freely available online. Feel free to reuse or adapt these slides for your own academic purposes, provided that you include proper attribution.
Linear Classifiers
• A hyperplane partitions R^d into two half-spaces
– Defined by the normal vector θ ∈ R^d
• θ is orthogonal to any vector lying on the hyperplane
– Assumed to pass through the origin
• This is because we incorporated the bias term θ_0 into θ by setting x_0 = 1
• Consider classification with +1, -1 labels ...
Based on slide by Piyush Rai
Linear Classifiers
• Linear classifiers: represent the decision boundary by a hyperplane

    h(x) = sign(θᵀx)   where   sign(z) = +1 if z ≥ 0, −1 if z < 0

  with xᵀ = [1  x_1  …  x_d] and θ = [θ_0  θ_1  …  θ_d]ᵀ
– Note that:

    θᵀx > 0 ⟹ y = +1
    θᵀx < 0 ⟹ y = −1
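As a minimal sketch of the classifier above, the following (with illustrative names and weights) folds the bias term into θ by prepending the constant feature x_0 = 1:

```python
import numpy as np

def predict(theta, x):
    """Return +1 or -1 for h(x) = sign(theta^T x), given x without the bias feature."""
    x_aug = np.concatenate(([1.0], x))   # prepend x_0 = 1 so theta_0 acts as the bias
    return 1 if theta @ x_aug >= 0 else -1

theta = np.array([-1.0, 2.0, 0.5])       # [theta_0, theta_1, theta_2]
print(predict(theta, np.array([1.0, 0.0])))  # theta^T x = -1 + 2 = 1 >= 0, so +1
print(predict(theta, np.array([0.0, 1.0])))  # theta^T x = -1 + 0.5 < 0, so -1
```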
The Perceptron
• The perceptron uses the following update rule each time it receives a new training instance (x^(i), y^(i)):

    θ_j ← θ_j − (α/2) (h_θ(x^(i)) − y^(i)) x_j^(i)

  where h(x) = sign(θᵀx) as before
– If the prediction matches the label, make no change
– Otherwise, adjust θ
– On a misclassification, (h_θ(x^(i)) − y^(i)) is either 2 or -2
The Perceptron
• The perceptron uses the following update rule each time it receives a new training instance (x^(i), y^(i)):

    θ_j ← θ_j − (α/2) (h_θ(x^(i)) − y^(i)) x_j^(i)

• Since (h_θ(x^(i)) − y^(i)) is either 2 or -2, we can re-write this as (applied only upon misclassification):

    θ_j ← θ_j + α y^(i) x_j^(i)

– Can eliminate α in this case, since its only effect is to scale θ by a constant, which doesn’t affect performance

Perceptron Rule: If (x^(i), y^(i)) is misclassified, do θ ← θ + y^(i) x^(i)
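The rule above can be sketched in a few lines; this assumes x already includes the bias feature x_0 = 1, and the function name is illustrative:

```python
import numpy as np

def perceptron_update(theta, x, y):
    """Apply the perceptron rule: if (x, y) is misclassified, theta <- theta + y*x."""
    if y * (theta @ x) <= 0:       # misclassified (or exactly on the boundary)
        theta = theta + y * x      # move theta toward the correct side for x
    return theta

theta = np.zeros(3)
theta = perceptron_update(theta, np.array([1.0, 2.0, -1.0]), +1)
print(theta)  # [ 1.  2. -1.]
```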
Why the Perceptron Update Works
[Figure: a misclassified positive example x; the update θ_new = θ_old + x rotates the weight vector θ_old toward x]
Based on slide by Piyush Rai
Why the Perceptron Update Works
• Consider the misclassified example (y = +1)
– Perceptron wrongly thinks that θ_oldᵀx < 0
• Update:

    θ_new = θ_old + yx = θ_old + x   (since y = +1)

• Note that

    θ_newᵀx = (θ_old + x)ᵀx = θ_oldᵀx + xᵀx

  and xᵀx = ‖x‖₂² > 0
• Therefore, θ_newᵀx is less negative than θ_oldᵀx
– So, we are making ourselves more correct on this example!
Based on slide by Piyush Rai
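The identity θ_newᵀx = θ_oldᵀx + ‖x‖₂² can be checked numerically; the values below are illustrative:

```python
import numpy as np

theta_old = np.array([1.0, -2.0])
x = np.array([1.0, 1.0])       # a positive example (y = +1)

score_old = theta_old @ x      # -1.0: wrongly classified as negative
theta_new = theta_old + x      # perceptron update with y = +1
score_new = theta_new @ x      # score_old + ||x||^2 = -1 + 2 = 1.0

print(score_old, score_new)    # -1.0 1.0: the score moved toward +
```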
The Perceptron Cost Function
• The perceptron uses the following cost function:

    J_p(θ) = (1/n) Σ_{i=1}^{n} max(0, −y^(i) θᵀx^(i))

– The i-th term is 0 if the prediction is correct
– Otherwise, it is the confidence in the misprediction
Based on slide by Alan Fern
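A small sketch of this cost, assuming the rows of X already carry the bias feature and the function name is illustrative:

```python
import numpy as np

def perceptron_cost(theta, X, y):
    """J_p(theta) = (1/n) * sum_i max(0, -y_i * theta^T x_i)."""
    margins = y * (X @ theta)                # y_i * theta^T x_i, one per example
    return np.mean(np.maximum(0.0, -margins))

X = np.array([[1.0, 2.0],
              [1.0, -1.0]])
y = np.array([+1, -1])
theta = np.array([0.0, 1.0])
print(perceptron_cost(theta, X, y))          # margins [2, 1]: both correct, cost 0.0
```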
Online Perceptron Algorithm
1.) Let θ ← [0, 0, …, 0]
2.) Repeat:
3.)   Receive training example (x^(i), y^(i))
4.)   if y^(i) θᵀx^(i) ≤ 0   // prediction is incorrect
5.)     θ ← θ + y^(i) x^(i)

Online learning – the learning mode where the model update is performed each time a single observation is received
Batch learning – the learning mode where the model update is performed after observing the entire training set
Based on slide by Alan Fern
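The pseudocode above can be sketched as follows, consuming one example at a time; the data and names are illustrative, and each x is assumed to include the bias feature x_0 = 1:

```python
import numpy as np

def online_perceptron(stream):
    """One pass of the online perceptron over an iterable of (x, y) pairs."""
    theta = None
    for x, y in stream:
        if theta is None:
            theta = np.zeros_like(x)   # 1.) Let theta <- [0, ..., 0]
        if y * (theta @ x) <= 0:       # 4.) prediction is incorrect
            theta = theta + y * x      # 5.) theta <- theta + y*x
    return theta

stream = [(np.array([1.0, 2.0]), +1),
          (np.array([1.0, -2.0]), -1)]
theta = online_perceptron(stream)
print(theta)  # [1. 2.]: first example triggered an update, second was correct
```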
Online Perceptron Algorithm
[Figure: the decision boundary evolving as examples arrive; red points are labeled +, blue points are labeled −]
Based on slide by Alan Fern
Batch Perceptron
1.) Given training data {(x^(i), y^(i))}_{i=1}^{n}
2.) Let θ ← [0, 0, …, 0]
3.) Repeat:
4.)   Let Δ ← [0, 0, …, 0]
5.)   for i = 1 … n, do
6.)     if y^(i) θᵀx^(i) ≤ 0   // prediction for ith instance is incorrect
7.)       Δ ← Δ + y^(i) x^(i)
8.)   Δ ← Δ/n   // compute average update
9.)   θ ← θ + αΔ
10.) Until ‖Δ‖₂ < ε

• Simplest case: α = 1 and don’t normalize, yields the fixed increment perceptron
• Guaranteed to find a separating hyperplane if one exists
Based on slide by Alan Fern
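The batch version above can be sketched as follows; α, the tolerance ε, the epoch cap, and the toy data are illustrative choices, and the first column of X holds the bias feature x_0 = 1:

```python
import numpy as np

def batch_perceptron(X, y, alpha=1.0, eps=1e-6, max_epochs=1000):
    """Average the perceptron updates over the whole set each epoch (steps 3-10)."""
    theta = np.zeros(X.shape[1])             # 2.) theta <- [0, ..., 0]
    for _ in range(max_epochs):
        delta = np.zeros_like(theta)         # 4.) delta <- [0, ..., 0]
        for xi, yi in zip(X, y):
            if yi * (theta @ xi) <= 0:       # 6.) i-th prediction is incorrect
                delta += yi * xi             # 7.) accumulate this example's update
        delta /= len(y)                      # 8.) compute average update
        theta += alpha * delta               # 9.) take the averaged step
        if np.linalg.norm(delta) < eps:      # 10.) stop when the update is tiny
            break
    return theta

# Linearly separable toy data (first column is the bias feature x_0 = 1)
X = np.array([[1.0, 2.0], [1.0, 3.0], [1.0, -2.0], [1.0, -3.0]])
y = np.array([+1, +1, -1, -1])
theta = batch_perceptron(X, y)
print(np.sign(X @ theta))  # matches y, since the data are separable
```

On separable data like this, the loop exits early once no example is misclassified, which is exactly the guarantee stated above.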