Linear Classification: The Perceptron
CS 4641, Fall 2018 (transcript of lecture slides)
Robot Image Credit: Viktoriya Sukhanova © 123RF.com
These slides were assembled by Byron Boots, with only minor modifications from Eric Eaton's slides and grateful acknowledgement to the many others who made their course materials freely available online. Feel free to reuse or adapt these slides for your own academic purposes, provided that you include proper attribution.
Linear Classifiers
• A hyperplane partitions $\mathbb{R}^d$ into two half-spaces
  – Defined by the normal vector $\theta \in \mathbb{R}^d$
    • $\theta$ is orthogonal to any vector lying on the hyperplane
  – Assumed to pass through the origin
    • This is because we incorporated the bias term $\theta_0$ into $\theta$ by setting $x_0 = 1$
• Consider classification with +1, −1 labels ...
Based on slide by Piyush Rai
Linear Classifiers
• Linear classifiers: represent the decision boundary by a hyperplane

$h(x) = \mathrm{sign}(\theta^\top x)$, where $\mathrm{sign}(z) = \begin{cases} 1 & \text{if } z \ge 0 \\ -1 & \text{if } z < 0 \end{cases}$

$x^\top = \begin{bmatrix} 1 & x_1 & \ldots & x_d \end{bmatrix}$, $\theta = \begin{bmatrix} \theta_0 & \theta_1 & \cdots & \theta_d \end{bmatrix}^\top$

– Note that:
$\theta^\top x > 0 \implies y = +1$
$\theta^\top x < 0 \implies y = -1$
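The decision rule above can be sketched in plain Python (a minimal illustration, not from the slides; the function names `sign` and `predict` are my own):

```python
def sign(z):
    """Return +1 if z >= 0, else -1 (the slide's convention: sign(0) = +1)."""
    return 1 if z >= 0 else -1

def predict(theta, x):
    """h(x) = sign(theta^T x); x is assumed to already include x_0 = 1."""
    return sign(sum(t * xi for t, xi in zip(theta, x)))

# Example: theta = [-1, 2], x = [1, 1] gives theta^T x = 1 > 0, so y = +1
print(predict([-1.0, 2.0], [1.0, 1.0]))  # 1
```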
The Perceptron
• The perceptron uses the following update rule each time it receives a new training instance $(x^{(i)}, y^{(i)})$:

$\theta_j \leftarrow \theta_j - \frac{\alpha}{2}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)}$

where $h(x) = \mathrm{sign}(\theta^\top x)$ as before
– If the prediction matches the label, make no change
– Otherwise, adjust θ
– Note: $h_\theta(x^{(i)}) - y^{(i)}$ is either 2 or −2 when the prediction is wrong
The Perceptron
• The perceptron uses the following update rule each time it receives a new training instance $(x^{(i)}, y^{(i)})$:

$\theta_j \leftarrow \theta_j - \frac{\alpha}{2}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)}$

• Since $h_\theta(x^{(i)}) - y^{(i)}$ is either 2 or −2, re-write as (only upon misclassification):

$\theta_j \leftarrow \theta_j + \alpha\, y^{(i)} x_j^{(i)}$

– Can eliminate α in this case, since its only effect is to scale θ by a constant, which doesn't affect performance
• Perceptron Rule: If $(x^{(i)}, y^{(i)})$ is misclassified, do $\theta \leftarrow \theta + y^{(i)} x^{(i)}$
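As a sanity check (not from the slides), the following sketch verifies on a made-up misclassified example that the general gradient-style update and the misclassification-only form produce the same new θ:

```python
def sign(z):
    return 1 if z >= 0 else -1

def update_general(theta, x, y, alpha=1.0):
    """theta_j <- theta_j - (alpha/2) * (h(x) - y) * x_j"""
    h = sign(sum(t * xi for t, xi in zip(theta, x)))
    return [t - (alpha / 2) * (h - y) * xi for t, xi in zip(theta, x)]

def update_mistake(theta, x, y, alpha=1.0):
    """Only valid upon misclassification: theta <- theta + alpha * y * x"""
    return [t + alpha * y * xi for t, xi in zip(theta, x)]

# theta^T x = 0.5 - 2 = -1.5 < 0 but y = +1, so this example is misclassified
theta, x, y = [0.5, -1.0], [1.0, 2.0], 1
print(update_general(theta, x, y))  # [1.5, 1.0]
print(update_mistake(theta, x, y))  # [1.5, 1.0]  -- same result
```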
Why the Perceptron Update Works
[Figure: a misclassified positive example x, with the old weight vector $\theta_{old}$ and the updated weight vector $\theta_{new} = \theta_{old} + x$]
Based on slide by Piyush Rai
Why the Perceptron Update Works
• Consider the misclassified example (y = +1)
– Perceptron wrongly thinks that $\theta_{old}^\top x < 0$
• Update: $\theta_{new} = \theta_{old} + yx = \theta_{old} + x$ (since y = +1)
• $\theta_{new}^\top x = (\theta_{old} + x)^\top x = \theta_{old}^\top x + x^\top x$
• Note that $x^\top x = \|x\|_2^2 > 0$
• Therefore, $\theta_{new}^\top x$ is less negative than $\theta_{old}^\top x$
– So, we are making ourselves more correct on this example!
Based on slide by Piyush Rai
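The derivation above can be checked numerically; the numbers below are made up for illustration and show the score increasing by exactly $\|x\|_2^2$:

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

theta_old = [1.0, -2.0]
x = [1.0, 1.0]                                      # misclassified: theta_old^T x = -1 < 0, y = +1
theta_new = [t + xi for t, xi in zip(theta_old, x)]  # update for y = +1: theta + x

print(dot(theta_old, x))  # -1.0
print(dot(theta_new, x))  # 1.0  (= -1.0 + ||x||^2 = -1.0 + 2.0): less negative
```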
The Perceptron Cost Function
• The perceptron uses the following cost function:

$J_p(\theta) = \frac{1}{n} \sum_{i=1}^{n} \max\left(0, -y^{(i)}\, \theta^\top x^{(i)}\right)$

– $\max(0, -y^{(i)}\, \theta^\top x^{(i)})$ is 0 if the prediction is correct
– Otherwise, it is the confidence in the misprediction
Based on slide by Alan Fern
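A sketch of this cost on a tiny made-up dataset (the function name `perceptron_cost` and the data are my own, purely for illustration):

```python
def perceptron_cost(theta, X, y):
    """J_p(theta) = (1/n) * sum_i max(0, -y_i * theta^T x_i)"""
    total = 0.0
    for xi, yi in zip(X, y):
        score = sum(t * v for t, v in zip(theta, xi))
        total += max(0.0, -yi * score)  # 0 if correct, confidence of the mistake otherwise
    return total / len(X)

X = [[1.0, 2.0], [1.0, -1.0]]   # each x includes x_0 = 1
y = [1, -1]
print(perceptron_cost([0.0, 1.0], X, y))   # 0.0 -- both examples classified correctly
print(perceptron_cost([0.0, -1.0], X, y))  # 1.5 -- both wrong, average confidence (2 + 1)/2
```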
Online Perceptron Algorithm
1.) Let $\theta \leftarrow [0, 0, \ldots, 0]$
2.) Repeat:
3.)   Receive training example $(x^{(i)}, y^{(i)})$
4.)   if $y^{(i)}\, \theta^\top x^{(i)} \le 0$   // prediction is incorrect
5.)     $\theta \leftarrow \theta + y^{(i)} x^{(i)}$

Online learning – the learning mode where the model update is performed each time a single observation is received
Batch learning – the learning mode where the model update is performed after observing the entire training set
Based on slide by Alan Fern
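The algorithm above can be sketched as follows; since the slide's loop repeats forever, this version makes a fixed number of passes over a toy stream of examples (the data and parameter names are illustrative):

```python
def online_perceptron(stream, d, passes=10):
    theta = [0.0] * d                         # 1.) theta <- [0, ..., 0]
    for _ in range(passes):                   # 2.) repeat (here: a fixed number of passes)
        for x, y in stream:                   # 3.) receive (x^(i), y^(i))
            score = sum(t * xi for t, xi in zip(theta, x))
            if y * score <= 0:                # 4.) prediction is incorrect
                theta = [t + y * xi for t, xi in zip(theta, x)]  # 5.) theta <- theta + y*x
    return theta

# Linearly separable toy data; x_0 = 1 bias feature included
data = [([1.0, 2.0], 1), ([1.0, -2.0], -1)]
theta = online_perceptron(data, d=2)
print(all(y * sum(t * xi for t, xi in zip(theta, x)) > 0 for x, y in data))  # True
```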
Online Perceptron Algorithm
[Figure: example run on a 2D dataset; red points are labeled +, blue points are labeled −]
Based on slide by Alan Fern
Batch Perceptron
1.)  Given training data $\{(x^{(i)}, y^{(i)})\}_{i=1}^{n}$
2.)  Let $\theta \leftarrow [0, 0, \ldots, 0]$
3.)  Repeat:
4.)    Let $\delta \leftarrow [0, 0, \ldots, 0]$
5.)    for $i = 1 \ldots n$, do
6.)      if $y^{(i)}\, \theta^\top x^{(i)} \le 0$   // prediction for i-th instance is incorrect
7.)        $\delta \leftarrow \delta + y^{(i)} x^{(i)}$
8.)    $\delta \leftarrow \delta / n$   // compute average update
9.)    $\theta \leftarrow \theta + \alpha\delta$
10.) Until $\|\delta\|_2 < \epsilon$

• Simplest case: α = 1 and don't normalize; this yields the fixed-increment perceptron
• Guaranteed to find a separating hyperplane if one exists
Based on slide by Alan Fern
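A sketch of the batch algorithm above on made-up toy data (the names, tolerance, and iteration cap are illustrative choices, not from the slides):

```python
import math

def batch_perceptron(X, y, alpha=1.0, eps=1e-6, max_iters=1000):
    n, d = len(X), len(X[0])
    theta = [0.0] * d                                   # 2.) theta <- [0, ..., 0]
    for _ in range(max_iters):                          # 3.) repeat
        delta = [0.0] * d                               # 4.) delta <- [0, ..., 0]
        for xi, yi in zip(X, y):                        # 5.) for i = 1..n
            score = sum(t * v for t, v in zip(theta, xi))
            if yi * score <= 0:                         # 6.) instance misclassified
                delta = [dj + yi * v for dj, v in zip(delta, xi)]  # 7.)
        delta = [dj / n for dj in delta]                # 8.) average update
        theta = [t + alpha * dj for t, dj in zip(theta, delta)]    # 9.)
        if math.sqrt(sum(dj * dj for dj in delta)) < eps:          # 10.) ||delta|| < eps
            break
    return theta

X = [[1.0, 2.0], [1.0, -2.0]]   # x_0 = 1 bias feature included
y = [1, -1]
theta = batch_perceptron(X, y)
print(all(yi * sum(t * v for t, v in zip(theta, xi)) > 0 for xi, yi in zip(X, y)))  # True
```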
Improving the Perceptron
• The Perceptron produces many θ's during training
• The standard Perceptron simply uses the final θ at test time
– This may sometimes not be a good idea!
– Some other θ may be correct on 1,000 consecutive examples, but one mistake ruins it!
• Idea: Use a combination of multiple perceptrons
– (i.e., neural networks!)
• Idea: Use the intermediate θ's
– Voted Perceptron: vote on predictions of the intermediate θ's
– Averaged Perceptron: average the intermediate θ's
Based on slide by Piyush Rai
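The Averaged Perceptron idea can be sketched as follows: instead of predicting with the final θ, predict with the average of the intermediate θ's, one recorded per training step. This is a minimal illustration with made-up toy data; exact bookkeeping schemes vary in practice.

```python
def averaged_perceptron(data, d, passes=10):
    theta = [0.0] * d
    total = [0.0] * d              # running sum of the intermediate thetas
    count = 0
    for _ in range(passes):
        for x, y in data:
            score = sum(t * xi for t, xi in zip(theta, x))
            if y * score <= 0:     # standard perceptron update on a mistake
                theta = [t + y * xi for t, xi in zip(theta, x)]
            total = [s + t for s, t in zip(total, theta)]  # record current theta
            count += 1
    return [s / count for s in total]   # averaged theta, used at test time

data = [([1.0, 2.0], 1), ([1.0, -2.0], -1)]
theta_avg = averaged_perceptron(data, d=2)
print(all(y * sum(t * xi for t, xi in zip(theta_avg, x)) > 0 for x, y in data))  # True
```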