Linear Classification: The Perceptron
CS 4641, Fall 2018 (transcript of lecture slides)
Robot Image Credit: Viktoriya Sukhanova © 123RF.com
These slides were assembled by Byron Boots, with only minor modifications from Eric Eaton's slides and grateful acknowledgement to the many others who made their course materials freely available online. Feel free to reuse or adapt these slides for your own academic purposes, provided that you include proper attribution.
Linear Classifiers
• A hyperplane partitions $\mathbb{R}^d$ into two half-spaces
  – Defined by the normal vector $\theta \in \mathbb{R}^d$
    • $\theta$ is orthogonal to any vector lying on the hyperplane
  – Assumed to pass through the origin
    • This is because we incorporated the bias term $\theta_0$ into $\theta$ by setting $x_0 = 1$
• Consider classification with +1, −1 labels ...
Based on slide by Piyush Rai
Linear Classifiers
• Linear classifiers: represent the decision boundary by a hyperplane

$h(x) = \mathrm{sign}(\theta^\top x)$, where $\mathrm{sign}(z) = \begin{cases} 1 & \text{if } z \ge 0 \\ -1 & \text{if } z < 0 \end{cases}$

$x^\top = \begin{bmatrix} 1 & x_1 & \ldots & x_d \end{bmatrix}$, $\theta = \begin{bmatrix} \theta_0 & \theta_1 & \cdots & \theta_d \end{bmatrix}^\top$

– Note that:
$\theta^\top x > 0 \implies y = +1$
$\theta^\top x < 0 \implies y = -1$
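The decision rule above can be sketched in plain Python (a minimal illustration, not from the slides; the function names `sign` and `predict` are my own):

```python
def sign(z):
    """Return +1 if z >= 0, else -1 (the slide's convention: sign(0) = +1)."""
    return 1 if z >= 0 else -1

def predict(theta, x):
    """h(x) = sign(theta^T x); x is assumed to already include x_0 = 1."""
    return sign(sum(t * xi for t, xi in zip(theta, x)))

# Example: theta = [-1, 2], x = [1, 1] gives theta^T x = 1 > 0, so y = +1
print(predict([-1.0, 2.0], [1.0, 1.0]))  # 1
```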
The Perceptron
• The perceptron uses the following update rule each time it receives a new training instance $(x^{(i)}, y^{(i)})$:

$\theta_j \leftarrow \theta_j - \frac{\alpha}{2}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)}$

where $h(x) = \mathrm{sign}(\theta^\top x)$ as before
– If the prediction matches the label, make no change
– Otherwise, adjust θ
– Note: $h_\theta(x^{(i)}) - y^{(i)}$ is either 2 or −2 when the prediction is wrong
The Perceptron
• The perceptron uses the following update rule each time it receives a new training instance $(x^{(i)}, y^{(i)})$:

$\theta_j \leftarrow \theta_j - \frac{\alpha}{2}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)}$

• Since $h_\theta(x^{(i)}) - y^{(i)}$ is either 2 or −2, re-write as (only upon misclassification):

$\theta_j \leftarrow \theta_j + \alpha\, y^{(i)} x_j^{(i)}$

– Can eliminate α in this case, since its only effect is to scale θ by a constant, which doesn't affect performance
• Perceptron Rule: If $(x^{(i)}, y^{(i)})$ is misclassified, do $\theta \leftarrow \theta + y^{(i)} x^{(i)}$
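As a sanity check (not from the slides), the following sketch verifies on a made-up misclassified example that the general gradient-style update and the misclassification-only form produce the same new θ:

```python
def sign(z):
    return 1 if z >= 0 else -1

def update_general(theta, x, y, alpha=1.0):
    """theta_j <- theta_j - (alpha/2) * (h(x) - y) * x_j"""
    h = sign(sum(t * xi for t, xi in zip(theta, x)))
    return [t - (alpha / 2) * (h - y) * xi for t, xi in zip(theta, x)]

def update_mistake(theta, x, y, alpha=1.0):
    """Only valid upon misclassification: theta <- theta + alpha * y * x"""
    return [t + alpha * y * xi for t, xi in zip(theta, x)]

# theta^T x = 0.5 - 2 = -1.5 < 0 but y = +1, so this example is misclassified
theta, x, y = [0.5, -1.0], [1.0, 2.0], 1
print(update_general(theta, x, y))  # [1.5, 1.0]
print(update_mistake(theta, x, y))  # [1.5, 1.0]  -- same result
```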
Why the Perceptron Update Works
[Figure: a misclassified positive example x, with the old weight vector $\theta_{old}$ and the updated weight vector $\theta_{new} = \theta_{old} + x$]
Based on slide by Piyush Rai
Why the Perceptron Update Works
• Consider the misclassified example (y = +1)
– Perceptron wrongly thinks that $\theta_{old}^\top x < 0$
• Update: $\theta_{new} = \theta_{old} + yx = \theta_{old} + x$ (since y = +1)
• $\theta_{new}^\top x = (\theta_{old} + x)^\top x = \theta_{old}^\top x + x^\top x$
• Note that $x^\top x = \|x\|_2^2 > 0$
• Therefore, $\theta_{new}^\top x$ is less negative than $\theta_{old}^\top x$
– So, we are making ourselves more correct on this example!
Based on slide by Piyush Rai
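The derivation above can be checked numerically; the numbers below are made up for illustration and show the score increasing by exactly $\|x\|_2^2$:

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

theta_old = [1.0, -2.0]
x = [1.0, 1.0]                                      # misclassified: theta_old^T x = -1 < 0, y = +1
theta_new = [t + xi for t, xi in zip(theta_old, x)]  # update for y = +1: theta + x

print(dot(theta_old, x))  # -1.0
print(dot(theta_new, x))  # 1.0  (= -1.0 + ||x||^2 = -1.0 + 2.0): less negative
```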
The Perceptron Cost Function
• The perceptron uses the following cost function:

$J_p(\theta) = \frac{1}{n} \sum_{i=1}^{n} \max\left(0, -y^{(i)}\, \theta^\top x^{(i)}\right)$

– $\max(0, -y^{(i)}\, \theta^\top x^{(i)})$ is 0 if the prediction is correct
– Otherwise, it is the confidence in the misprediction
Based on slide by Alan Fern
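A sketch of this cost on a tiny made-up dataset (the function name `perceptron_cost` and the data are my own, purely for illustration):

```python
def perceptron_cost(theta, X, y):
    """J_p(theta) = (1/n) * sum_i max(0, -y_i * theta^T x_i)"""
    total = 0.0
    for xi, yi in zip(X, y):
        score = sum(t * v for t, v in zip(theta, xi))
        total += max(0.0, -yi * score)  # 0 if correct, confidence of the mistake otherwise
    return total / len(X)

X = [[1.0, 2.0], [1.0, -1.0]]   # each x includes x_0 = 1
y = [1, -1]
print(perceptron_cost([0.0, 1.0], X, y))   # 0.0 -- both examples classified correctly
print(perceptron_cost([0.0, -1.0], X, y))  # 1.5 -- both wrong, average confidence (2 + 1)/2
```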
Online Perceptron Algorithm
1.) Let $\theta \leftarrow [0, 0, \ldots, 0]$
2.) Repeat:
3.)   Receive training example $(x^{(i)}, y^{(i)})$
4.)   if $y^{(i)}\, \theta^\top x^{(i)} \le 0$   // prediction is incorrect
5.)     $\theta \leftarrow \theta + y^{(i)} x^{(i)}$

Online learning – the learning mode where the model update is performed each time a single observation is received
Batch learning – the learning mode where the model update is performed after observing the entire training set
Based on slide by Alan Fern
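The algorithm above can be sketched as follows; since the slide's loop repeats forever, this version makes a fixed number of passes over a toy stream of examples (the data and parameter names are illustrative):

```python
def online_perceptron(stream, d, passes=10):
    theta = [0.0] * d                         # 1.) theta <- [0, ..., 0]
    for _ in range(passes):                   # 2.) repeat (here: a fixed number of passes)
        for x, y in stream:                   # 3.) receive (x^(i), y^(i))
            score = sum(t * xi for t, xi in zip(theta, x))
            if y * score <= 0:                # 4.) prediction is incorrect
                theta = [t + y * xi for t, xi in zip(theta, x)]  # 5.) theta <- theta + y*x
    return theta

# Linearly separable toy data; x_0 = 1 bias feature included
data = [([1.0, 2.0], 1), ([1.0, -2.0], -1)]
theta = online_perceptron(data, d=2)
print(all(y * sum(t * xi for t, xi in zip(theta, x)) > 0 for x, y in data))  # True
```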
Online Perceptron Algorithm
[Figure: example run on a 2D dataset; red points are labeled +, blue points are labeled −]
Based on slide by Alan Fern
Batch Perceptron
1.)  Given training data $\{(x^{(i)}, y^{(i)})\}_{i=1}^{n}$
2.)  Let $\theta \leftarrow [0, 0, \ldots, 0]$
3.)  Repeat:
4.)    Let $\delta \leftarrow [0, 0, \ldots, 0]$
5.)    for $i = 1 \ldots n$, do
6.)      if $y^{(i)}\, \theta^\top x^{(i)} \le 0$   // prediction for i-th instance is incorrect
7.)        $\delta \leftarrow \delta + y^{(i)} x^{(i)}$
8.)    $\delta \leftarrow \delta / n$   // compute average update
9.)    $\theta \leftarrow \theta + \alpha\delta$
10.) Until $\|\delta\|_2 < \epsilon$

• Simplest case: α = 1 and don't normalize; this yields the fixed-increment perceptron
• Guaranteed to find a separating hyperplane if one exists
Based on slide by Alan Fern
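A sketch of the batch algorithm above on made-up toy data (the names, tolerance, and iteration cap are illustrative choices, not from the slides):

```python
import math

def batch_perceptron(X, y, alpha=1.0, eps=1e-6, max_iters=1000):
    n, d = len(X), len(X[0])
    theta = [0.0] * d                                   # 2.) theta <- [0, ..., 0]
    for _ in range(max_iters):                          # 3.) repeat
        delta = [0.0] * d                               # 4.) delta <- [0, ..., 0]
        for xi, yi in zip(X, y):                        # 5.) for i = 1..n
            score = sum(t * v for t, v in zip(theta, xi))
            if yi * score <= 0:                         # 6.) instance misclassified
                delta = [dj + yi * v for dj, v in zip(delta, xi)]  # 7.)
        delta = [dj / n for dj in delta]                # 8.) average update
        theta = [t + alpha * dj for t, dj in zip(theta, delta)]    # 9.)
        if math.sqrt(sum(dj * dj for dj in delta)) < eps:          # 10.) ||delta|| < eps
            break
    return theta

X = [[1.0, 2.0], [1.0, -2.0]]   # x_0 = 1 bias feature included
y = [1, -1]
theta = batch_perceptron(X, y)
print(all(yi * sum(t * v for t, v in zip(theta, xi)) > 0 for xi, yi in zip(X, y)))  # True
```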
Improving the Perceptron
• The Perceptron produces many θ's during training
• The standard Perceptron simply uses the final θ at test time
– This may sometimes not be a good idea!
– Some other θ may be correct on 1,000 consecutive examples, but one mistake ruins it!
• Idea: Use a combination of multiple perceptrons
– (i.e., neural networks!)
• Idea: Use the intermediate θ's
– Voted Perceptron: vote on predictions of the intermediate θ's
– Averaged Perceptron: average the intermediate θ's
Based on slide by Piyush Rai
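The Averaged Perceptron idea can be sketched as follows: instead of predicting with the final θ, predict with the average of the intermediate θ's, one recorded per training step. This is a minimal illustration with made-up toy data; exact bookkeeping schemes vary in practice.

```python
def averaged_perceptron(data, d, passes=10):
    theta = [0.0] * d
    total = [0.0] * d              # running sum of the intermediate thetas
    count = 0
    for _ in range(passes):
        for x, y in data:
            score = sum(t * xi for t, xi in zip(theta, x))
            if y * score <= 0:     # standard perceptron update on a mistake
                theta = [t + y * xi for t, xi in zip(theta, x)]
            total = [s + t for s, t in zip(total, theta)]  # record current theta
            count += 1
    return [s / count for s in total]   # averaged theta, used at test time

data = [([1.0, 2.0], 1), ([1.0, -2.0], -1)]
theta_avg = averaged_perceptron(data, d=2)
print(all(y * sum(t * xi for t, xi in zip(theta_avg, x)) > 0 for x, y in data))  # True
```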