
Page 1: Logistic Regression

Robot Image Credit: Viktoriya Sukhanova © 123RF.com

These slides were assembled by Byron Boots, with only minor modifications from Eric Eaton's slides and grateful acknowledgement to the many others who made their course materials freely available online. Feel free to reuse or adapt these slides for your own academic purposes, provided that you include proper attribution.

Page 2: Classification Based on Probability

• Instead of just predicting the class, give the probability of the instance being that class, i.e., learn $p(y \mid x)$

• Comparison to the perceptron:
  – The perceptron does not produce a probability estimate

• Recall that:

$$p(\text{event}) + p(\neg\text{event}) = 1, \qquad 0 \le p(\text{event}) \le 1$$

Page 3: Logistic Regression

• Takes a probabilistic approach to learning discriminative functions (i.e., a classifier)

• $h_\theta(x)$ should give $p(y = 1 \mid x; \theta)$
  – Want $0 \le h_\theta(x) \le 1$

• Logistic regression model:

$$h_\theta(x) = g(\theta^T x), \qquad g(z) = \frac{1}{1 + e^{-z}} \quad \text{(the logistic/sigmoid function)}$$

$$h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}}$$
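A minimal NumPy sketch of the model just defined (the function names and example numbers are my own, not from the slides):

```python
import numpy as np

def sigmoid(z):
    """Logistic/sigmoid function g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def h(theta, x):
    """Hypothesis h_theta(x) = g(theta^T x); always lies in (0, 1)."""
    return sigmoid(np.dot(theta, x))

# Example: x = [1, 4] (with bias feature x0 = 1) and theta = [-1, 0.5]
theta = np.array([-1.0, 0.5])
x = np.array([1.0, 4.0])
print(h(theta, x))  # about 0.73, read as p(y = 1 | x; theta)
```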

Page 4: Interpretation of Hypothesis Output

$h_\theta(x)$ = estimated $p(y = 1 \mid x; \theta)$

Example: cancer diagnosis from tumor size

$$x = \begin{bmatrix} x_0 \\ x_1 \end{bmatrix} = \begin{bmatrix} 1 \\ \text{tumorSize} \end{bmatrix}$$

$h_\theta(x) = 0.7$ → tell the patient there is a 70% chance of the tumor being malignant

Note that $p(y = 0 \mid x; \theta) + p(y = 1 \mid x; \theta) = 1$

Therefore, $p(y = 0 \mid x; \theta) = 1 - p(y = 1 \mid x; \theta)$

Based on example by Andrew Ng

Page 5: Another Interpretation

• Equivalently, logistic regression assumes that

$$\log \frac{p(y = 1 \mid x; \theta)}{p(y = 0 \mid x; \theta)} = \theta_0 + \theta_1 x_1 + \ldots + \theta_d x_d$$

• In other words, logistic regression assumes that the log odds (the odds of $y = 1$) is a linear function of $x$

Side note: the odds in favor of an event is the quantity $p / (1 - p)$, where $p$ is the probability of the event.
E.g., if I toss a fair die, what are the odds that I will roll a 6?

Based on slide by Xiaoli Fern
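Worked answer to the side-note question: for a fair die, $p = 1/6$, so the odds in favor of rolling a 6 are $(1/6)/(5/6) = 1/5$, i.e., 1 to 5.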

Page 6: Logistic Regression

$$h_\theta(x) = g(\theta^T x), \qquad g(z) = \frac{1}{1 + e^{-z}}$$

The argument $\theta^T x$ should be a large positive value for positive instances and a large negative value for negative instances.

• Assume a threshold and...
  – Predict $y = 1$ if $h_\theta(x) \ge 0.5$
  – Predict $y = 0$ if $h_\theta(x) < 0.5$

Based on slide by Andrew Ng

Page 7: Non-Linear Decision Boundary

• Can apply basis function expansion to the features, same as with linear regression:

$$x = \begin{bmatrix} 1 \\ x_1 \\ x_2 \end{bmatrix} \;\rightarrow\; \begin{bmatrix} 1 \\ x_1 \\ x_2 \\ x_1 x_2 \\ x_1^2 \\ x_2^2 \\ x_1^2 x_2 \\ x_1 x_2^2 \\ \vdots \end{bmatrix}$$
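A small sketch of the expansion above for two input features, written out by hand so the terms line up with the vector shown (the helper name and the example weights are mine):

```python
import numpy as np

def expand(x):
    """Basis expansion of x = (x1, x2) into (1, x1, x2, x1*x2, x1^2, x2^2, x1^2*x2, x1*x2^2)."""
    x1, x2 = x
    return np.array([1.0, x1, x2, x1 * x2,
                     x1 ** 2, x2 ** 2, x1 ** 2 * x2, x1 * x2 ** 2])

# Running logistic regression on expand(x) gives a decision boundary that is
# non-linear in (x1, x2). For example, theta = [-1, 0, 0, 0, 1, 1, 0, 0] makes
# theta^T expand(x) = -1 + x1^2 + x2^2, so the boundary is the circle x1^2 + x2^2 = 1.
```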

Page 8: Logistic Regression

• Given $\left\{ \left(x^{(1)}, y^{(1)}\right), \left(x^{(2)}, y^{(2)}\right), \ldots, \left(x^{(n)}, y^{(n)}\right) \right\}$ where $x^{(i)} \in \mathbb{R}^d$, $y^{(i)} \in \{0, 1\}$, and

$$x^T = \begin{bmatrix} 1 & x_1 & \ldots & x_d \end{bmatrix}, \qquad \theta = \begin{bmatrix} \theta_0 \\ \theta_1 \\ \vdots \\ \theta_d \end{bmatrix}$$

• Model:

$$h_\theta(x) = g(\theta^T x), \qquad g(z) = \frac{1}{1 + e^{-z}}$$

Page 9: Logistic Regression Objective Function

• Can't just use squared loss as in linear regression:

$$J(\theta) = \frac{1}{2n} \sum_{i=1}^{n} \left( h_\theta\!\left(x^{(i)}\right) - y^{(i)} \right)^2$$

– Using the logistic regression model

$$h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}}$$

results in a non-convex optimization

Page 10: Deriving the Cost Function via Maximum Likelihood Estimation

• The likelihood of the data is given by:

$$l(\theta) = \prod_{i=1}^{n} p\!\left(y^{(i)} \mid x^{(i)}; \theta\right)$$

• So, we are looking for the $\theta$ that maximizes the likelihood:

$$\theta_{\text{MLE}} = \arg\max_\theta \; l(\theta) = \arg\max_\theta \prod_{i=1}^{n} p\!\left(y^{(i)} \mid x^{(i)}; \theta\right)$$

• Can take the log without changing the solution:

$$\theta_{\text{MLE}} = \arg\max_\theta \; \log \prod_{i=1}^{n} p\!\left(y^{(i)} \mid x^{(i)}; \theta\right) = \arg\max_\theta \sum_{i=1}^{n} \log p\!\left(y^{(i)} \mid x^{(i)}; \theta\right)$$

Page 11: Deriving the Cost Function via Maximum Likelihood Estimation

• Expand as follows:

$$\theta_{\text{MLE}} = \arg\max_\theta \sum_{i=1}^{n} \log p\!\left(y^{(i)} \mid x^{(i)}; \theta\right) = \arg\max_\theta \sum_{i=1}^{n} \left[ y^{(i)} \log p\!\left(y^{(i)}{=}1 \mid x^{(i)}; \theta\right) + \left(1 - y^{(i)}\right) \log\!\left(1 - p\!\left(y^{(i)}{=}1 \mid x^{(i)}; \theta\right)\right) \right]$$

• Substitute in the model, and take the negative to yield

$$J(\theta) = -\sum_{i=1}^{n} \left[ y^{(i)} \log h_\theta\!\left(x^{(i)}\right) + \left(1 - y^{(i)}\right) \log\!\left(1 - h_\theta\!\left(x^{(i)}\right)\right) \right]$$

Logistic regression objective: $\min_\theta J(\theta)$
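A vectorized NumPy sketch of this objective (the names are mine, and the small clipping constant guards against $\log(0)$, which the slides do not discuss):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def nll(theta, X, y, eps=1e-12):
    """Negative log-likelihood J(theta) for logistic regression.
    X is (n, d+1) with a leading column of ones; y is an (n,) array of 0/1 labels."""
    p = sigmoid(X @ theta)              # h_theta(x^(i)) for every example
    p = np.clip(p, eps, 1.0 - eps)      # avoid log(0)
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
```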

Page 12: Intuition Behind the Objective

$$J(\theta) = -\sum_{i=1}^{n} \left[ y^{(i)} \log h_\theta\!\left(x^{(i)}\right) + \left(1 - y^{(i)}\right) \log\!\left(1 - h_\theta\!\left(x^{(i)}\right)\right) \right]$$

• Cost of a single instance:

$$\text{cost}\left(h_\theta(x), y\right) = \begin{cases} -\log\!\left(h_\theta(x)\right) & \text{if } y = 1 \\ -\log\!\left(1 - h_\theta(x)\right) & \text{if } y = 0 \end{cases}$$

• Can re-write the objective function as

$$J(\theta) = \sum_{i=1}^{n} \text{cost}\left( h_\theta\!\left(x^{(i)}\right), y^{(i)} \right)$$

Compare to linear regression:

$$J(\theta) = \frac{1}{2n} \sum_{i=1}^{n} \left( h_\theta\!\left(x^{(i)}\right) - y^{(i)} \right)^2$$

Page 13: Intuition Behind the Objective

$$\text{cost}\left(h_\theta(x), y\right) = \begin{cases} -\log\!\left(h_\theta(x)\right) & \text{if } y = 1 \\ -\log\!\left(1 - h_\theta(x)\right) & \text{if } y = 0 \end{cases}$$

Aside: recall the plot of $\log(z)$ [figure: plot of $\log(z)$]

Page 14: Intuition Behind the Objective

$$\text{cost}\left(h_\theta(x), y\right) = \begin{cases} -\log\!\left(h_\theta(x)\right) & \text{if } y = 1 \\ -\log\!\left(1 - h_\theta(x)\right) & \text{if } y = 0 \end{cases}$$

If $y = 1$:
• Cost = 0 if the prediction is correct
• As $h_\theta(x) \to 0$, cost $\to \infty$
• Captures the intuition that larger mistakes should get larger penalties
  – e.g., predict $h_\theta(x) = 0$, but $y = 1$

[Figure: cost vs. $h_\theta(x)$ for $y = 1$]

Based on example by Andrew Ng

Page 15: Intuition Behind the Objective

$$\text{cost}\left(h_\theta(x), y\right) = \begin{cases} -\log\!\left(h_\theta(x)\right) & \text{if } y = 1 \\ -\log\!\left(1 - h_\theta(x)\right) & \text{if } y = 0 \end{cases}$$

If $y = 0$:
• Cost = 0 if the prediction is correct
• As $\left(1 - h_\theta(x)\right) \to 0$, cost $\to \infty$
• Captures the intuition that larger mistakes should get larger penalties

[Figure: cost vs. $h_\theta(x)$ for $y = 0$ and $y = 1$]

Based on example by Andrew Ng

Page 16: Regularized Logistic Regression

• We can regularize logistic regression exactly as before:

$$J(\theta) = -\sum_{i=1}^{n} \left[ y^{(i)} \log h_\theta\!\left(x^{(i)}\right) + \left(1 - y^{(i)}\right) \log\!\left(1 - h_\theta\!\left(x^{(i)}\right)\right) \right]$$

$$J_{\text{regularized}}(\theta) = J(\theta) + \lambda \sum_{j=1}^{d} \theta_j^2 = J(\theta) + \lambda \left\| \theta_{[1:d]} \right\|_2^2$$
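A short sketch of the regularized objective (self-contained, with my own names; note that $\theta_0$ is excluded from the penalty, matching the $\|\theta_{[1:d]}\|_2^2$ notation):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def j_regularized(theta, X, y, lam, eps=1e-12):
    """J(theta) + lambda * ||theta[1:]||^2; the bias term theta_0 is not penalized."""
    p = np.clip(sigmoid(X @ theta), eps, 1.0 - eps)
    nll = -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
    return nll + lam * np.sum(theta[1:] ** 2)
```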

Page 17: Gradient Descent for Logistic Regression

$$J_{\text{reg}}(\theta) = -\sum_{i=1}^{n} \left[ y^{(i)} \log h_\theta\!\left(x^{(i)}\right) + \left(1 - y^{(i)}\right) \log\!\left(1 - h_\theta\!\left(x^{(i)}\right)\right) \right] + \lambda \left\| \theta_{[1:d]} \right\|_2^2$$

Want $\min_\theta J(\theta)$

• Initialize $\theta$
• Repeat until convergence (simultaneous update for $j = 0 \ldots d$):

$$\theta_j \leftarrow \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta)$$

Use the natural logarithm ($\ln = \log_e$) to cancel with the $\exp()$ in $h_\theta(x) = \dfrac{1}{1 + e^{-\theta^T x}}$

Page 18: Gradient Descent for Logistic Regression

$$J_{\text{reg}}(\theta) = -\sum_{i=1}^{n} \left[ y^{(i)} \log h_\theta\!\left(x^{(i)}\right) + \left(1 - y^{(i)}\right) \log\!\left(1 - h_\theta\!\left(x^{(i)}\right)\right) \right] + \lambda \left\| \theta_{[1:d]} \right\|_2^2$$

Want $\min_\theta J(\theta)$

• Initialize $\theta$
• Repeat until convergence (simultaneous update for $j = 0 \ldots d$):

$$\theta_0 \leftarrow \theta_0 - \alpha \sum_{i=1}^{n} \left( h_\theta\!\left(x^{(i)}\right) - y^{(i)} \right)$$

$$\theta_j \leftarrow \theta_j - \alpha \left[ \sum_{i=1}^{n} \left( h_\theta\!\left(x^{(i)}\right) - y^{(i)} \right) x_j^{(i)} + \lambda\, \theta_j \right] \quad \text{for } j \ge 1$$

Page 19: Gradient Descent for Logistic Regression

• Initialize $\theta$
• Repeat until convergence (simultaneous update for $j = 0 \ldots d$):

$$\theta_0 \leftarrow \theta_0 - \alpha \sum_{i=1}^{n} \left( h_\theta\!\left(x^{(i)}\right) - y^{(i)} \right)$$

$$\theta_j \leftarrow \theta_j - \alpha \left[ \sum_{i=1}^{n} \left( h_\theta\!\left(x^{(i)}\right) - y^{(i)} \right) x_j^{(i)} + \lambda\, \theta_j \right] \quad \text{for } j \ge 1$$

This looks IDENTICAL to linear regression!!!
• Ignoring the $1/n$ constant
• However, the form of the model is very different:

$$h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}}$$
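A minimal sketch of this update loop (the names, step size, and iteration count are my choices; the penalty gradient is written as $\lambda\theta_j$, absorbing the constant factor from differentiating $\theta_j^2$, and $\theta_0$ is left unpenalized as in the $\theta_0$ update above):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, lam=0.0, alpha=0.01, iters=1000):
    """Batch gradient descent for (regularized) logistic regression.
    X is (n, d+1) with a leading column of ones; y is an (n,) array of 0/1 labels."""
    theta = np.zeros(X.shape[1])
    for _ in range(iters):
        err = sigmoid(X @ theta) - y     # h_theta(x^(i)) - y^(i)
        grad = X.T @ err                 # sum_i (h_theta(x^(i)) - y^(i)) x_j^(i)
        grad[1:] += lam * theta[1:]      # penalty gradient for j >= 1 only
        theta -= alpha * grad            # simultaneous update of all theta_j
    return theta
```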

Page 20: Multi-Class Classification

Disease diagnosis: healthy / cold / flu / pneumonia
Object classification: desk / chair / monitor / bookcase

[Figures: two scatter plots over features $x_1$ and $x_2$, one labeled "Binary classification" and one labeled "Multi-class classification"]

Page 21: Multi-Class Logistic Regression

• For 2 classes:

$$h_\theta(x) = \frac{1}{1 + \exp(-\theta^T x)} = \frac{\exp(\theta^T x)}{1 + \exp(\theta^T x)}$$

In the right-hand form, the 1 in the denominator is the weight assigned to $y = 0$ and $\exp(\theta^T x)$ is the weight assigned to $y = 1$.

• For $C$ classes $\{1, \ldots, C\}$:

$$p(y = c \mid x; \theta_1, \ldots, \theta_C) = \frac{\exp(\theta_c^T x)}{\sum_{c'=1}^{C} \exp(\theta_{c'}^T x)}$$

– Called the softmax function
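A sketch of the softmax model (the function and variable names are mine; Thetas stacks the per-class parameter vectors $\theta_1, \ldots, \theta_C$ as rows, and the max-subtraction is a standard numerical-stability trick not mentioned on the slide):

```python
import numpy as np

def softmax(scores):
    """softmax(s)_c = exp(s_c) / sum_c' exp(s_c')."""
    scores = scores - np.max(scores)   # shifting does not change the ratios
    e = np.exp(scores)
    return e / e.sum()

def class_probs(Thetas, x):
    """p(y = c | x; theta_1, ..., theta_C) for every class c; Thetas is (C, d+1)."""
    return softmax(Thetas @ x)

# Sanity check for C = 2: with the first row of Thetas set to zeros and the second
# to theta, class_probs returns [1, exp(theta^T x)] / (1 + exp(theta^T x)),
# recovering the two-class form above.
```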

Page 22: Multi-Class Logistic Regression

• Train a logistic regression classifier for each class $c$ to predict the probability that $y = c$ with

$$h_c(x) = \frac{\exp(\theta_c^T x)}{\sum_{c'=1}^{C} \exp(\theta_{c'}^T x)}$$

Split into One-vs-Rest: [figure: the multi-class data over $x_1$ and $x_2$ split into several binary one-vs-rest problems]

Page 23: Implementing Multi-Class Logistic Regression

• Use

$$h_c(x) = \frac{\exp(\theta_c^T x)}{\sum_{c'=1}^{C} \exp(\theta_{c'}^T x)}$$

as the model for class $c$

• Gradient descent simultaneously updates all parameters for all models
  – Same derivative as before, just with the above $h_c(x)$

• Predict the class label as the most probable label: $\arg\max_c h_c(x)$
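A compact sketch of how these three bullets could fit together (my own naming; it assumes one-hot label rows in Y, an unregularized objective, and an arbitrary step size and iteration count):

```python
import numpy as np

def batch_class_probs(Thetas, X):
    """h_c(x) for every class c and every row of X; Thetas is (C, d+1), X is (n, d+1)."""
    scores = X @ Thetas.T
    scores -= scores.max(axis=1, keepdims=True)   # numerical stability
    e = np.exp(scores)
    return e / e.sum(axis=1, keepdims=True)

def fit_multiclass(X, Y, alpha=0.01, iters=1000):
    """Gradient descent that updates all per-class parameter vectors simultaneously.
    Y is a one-hot (n, C) label matrix; the gradient keeps the (h - y) * x form."""
    Thetas = np.zeros((Y.shape[1], X.shape[1]))
    for _ in range(iters):
        grad = (batch_class_probs(Thetas, X) - Y).T @ X   # same derivative, with h_c(x)
        Thetas -= alpha * grad
    return Thetas

def predict(Thetas, X):
    """Predict the most probable label for each row of X: argmax_c h_c(x)."""
    return np.argmax(batch_class_probs(Thetas, X), axis=1)
```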