Transcript
Page 1: UVA CS 4501: Machine Learning, Lecture 13: Logistic Regression

UVA CS 4501: Machine Learning

Lecture 13: Logistic Regression

Dr. Yanjun Qi

University of Virginia, Department of Computer Science

3/23/18

Dr. Yanjun Qi / UVA CS / s18

Page 2:

Where are we? → Five major sections of this course

• Regression (supervised)
• Classification (supervised)
• Unsupervised models
• Learning theory
• Graphical models

Page 3:

Where are we? → Three major sections for classification

• We can divide the large variety of classification approaches into roughly three major types:

1. Discriminative: directly estimate a decision rule/boundary, e.g., logistic regression, support vector machine, decision tree

2. Generative: build a generative statistical model, e.g., naïve Bayes classifier, Bayesian networks

3. Instance-based classifiers: use observations directly (no models), e.g., K nearest neighbors

Page 4:

Today

• Bayes classifier
• Logistic regression
• Binary to multi-class
• Training LR by MLE

Page 5:

Bayes classifiers

• Treat each feature attribute and the class label as random variables.

• Given a sample x with attributes (x1, x2, …, xp):
– Goal is to predict its class C.
– Specifically, we want to find the value of Ci that maximizes p(Ci | x1, x2, …, xp).

• Can we estimate p(Ci | x) = p(Ci | x1, x2, …, xp) directly from data?


Page 7:

Bayes Classifiers – MAP Rule

Task: Classify a new instance X, described by a tuple of attribute values X = (X1, X2, …, Xp), into one of the classes:

cMAP = argmax_{cj ∈ C} P(cj | x1, x2, …, xp)

MAP = Maximum A Posteriori probability

Adapted from Carlos' probability tutorial

Please read the L13 extra slides for WHY.
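As a minimal sketch of the MAP rule above (the posterior values are hypothetical, standing in for estimates from data): once we have P(cj | x) for every class, classification is just an argmax.

```python
import numpy as np

# Assumed estimates of P(c_j | x) for one sample over 3 hypothetical classes.
posteriors = np.array([0.2, 0.5, 0.3])

# MAP rule: pick the class with the largest posterior probability.
c_map = int(np.argmax(posteriors))
```

The hard part, which the rest of the lecture addresses, is obtaining those posteriors in the first place.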

Page 8:

A dataset for classification

• Data/points/instances/examples/samples/records: [rows]
• Features/attributes/dimensions/independent variables/covariates/predictors/regressors: [columns, except the last]
• Target/outcome/response/label/dependent variable: special column to be predicted [last column]

Output as discrete class label: C ∈ {C1, C2, …, CL}

Discriminative: argmax_{c ∈ C} P(c | X), C = {c1, …, cL}

Page 9:

Establishing a probabilistic model for classification

– Discriminative model

[Figure: a discriminative probabilistic classifier takes the input x = (x1, x2, …, xp) and outputs the posteriors P(c1 | x), P(c2 | x), …, P(cL | x).]

argmax_{c ∈ C} P(c | X), C = {c1, …, cL}

Adapted from Prof. Ke Chen's NB slides

Page 10:

A dataset for classification

• Data/points/instances/examples/samples/records: [rows]
• Features/attributes/dimensions/independent variables/covariates/predictors/regressors: [columns, except the last]
• Target/outcome/response/label/dependent variable: special column to be predicted [last column]

Output as discrete class label: C ∈ {C1, C2, …, CL}

Discriminative: argmax_{c ∈ C} P(c | X), C = {c1, …, cL}

Generative (later!): argmax_{c ∈ C} P(c | X) = argmax_{c ∈ C} P(X, c) = argmax_{c ∈ C} P(X | c) P(c)

Page 11:

Today

• Bayes classifier
• Logistic regression
• Binary to multi-class
• Training LR by MLE

Page 12:

From multivariate linear regression to logistic regression

Linear regression:

y = β0 + β1 x1 + β2 x2 + … + βp xp

Logistic regression for binary classification:

ln[ P(y | x) / (1 − P(y | x)) ] = β0 + β1 x1 + β2 x2 + … + βp xp

Page 13:

Review: Probabilistic interpretation of linear regression

• Let us assume that the target variable and the inputs are related by the equation:

y_i = θ^T x_i + ε_i

where ε_i is an error term capturing unmodeled effects or random noise.

• Now assume that ε follows a Gaussian N(0, σ²); then we have:

p(y_i | x_i; θ) = (1 / (√(2π) σ)) exp( −(y_i − θ^T x_i)² / (2σ²) )

• By the IID assumption:

L(θ) = ∏_{i=1}^n p(y_i | x_i; θ) = (1 / (√(2π) σ))^n exp( −Σ_{i=1}^n (y_i − θ^T x_i)² / (2σ²) )
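The review slide's point, that under Gaussian noise, maximizing L(θ) is the same as minimizing the sum of squared residuals, can be checked numerically. This is a sketch with hypothetical data; `gaussian_log_likelihood` is a helper name of our own choosing.

```python
import numpy as np

def gaussian_log_likelihood(theta, sigma, X, y):
    # log L(theta) = -n*log(sqrt(2*pi)*sigma) - sum_i (y_i - theta^T x_i)^2 / (2*sigma^2)
    resid = y - X @ theta
    n = len(y)
    return -n * np.log(np.sqrt(2 * np.pi) * sigma) - np.sum(resid**2) / (2 * sigma**2)

# Hypothetical data; first column of X is the constant 1 (intercept term).
X = np.array([[1., 0.], [1., 1.], [1., 2.], [1., 3.]])
y = np.array([0.1, 1.1, 1.9, 3.0])

# The least-squares fit minimizes the squared residuals,
# so it attains the highest Gaussian likelihood.
theta_ls = np.linalg.lstsq(X, y, rcond=None)[0]
```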

Page 14:

Logistic regression: p(y | x)

P(y | x) = e^{β0 + β1 x1 + β2 x2 + … + βp xp} / (1 + e^{β0 + β1 x1 + β2 x2 + … + βp xp}) = 1 / (1 + e^{−β^T X})

Equivalently,

ln[ P(y | x) / (1 − P(y | x)) ] = β0 + β1 x1 + β2 x2 + … + βp xp
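The formula P(y | x) = 1 / (1 + e^{−β^T x}) can be written as a one-line function. A sketch with hypothetical coefficients (x carries a leading 1 so that β0 is the intercept):

```python
import numpy as np

def p_y1_given_x(beta, x):
    # P(y=1|x) = 1 / (1 + exp(-beta^T x)); x includes a leading 1 for the intercept.
    return 1.0 / (1.0 + np.exp(-np.dot(beta, x)))

beta = np.array([0.0, 2.0])   # hypothetical: beta0 = 0, beta1 = 2
x0 = np.array([1.0, 0.0])     # leading 1 is the intercept term
p = p_y1_given_x(beta, x0)    # the logit beta^T x is 0 here, so p = 0.5
```

Increasing x1 increases the logit, and hence the predicted probability, monotonically.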

Page 15:

Logistic regression models a linear classification boundary!

y ∈ {0, 1}

ln[ P(y | x) / (1 − P(y | x)) ] = β0 + β1 x1 + β2 x2 + … + βp xp

Page 16:

Logistic regression models a linear classification boundary!

y ∈ {0, 1}

(2) p(y | x):

ln[ P(y | x) / (1 − P(y | x)) ] = α + β1 x1 + β2 x2 + … + βp xp

[Figure: the linear decision boundary, the set of points where P(y | x) = 0.5.]

Page 17:

The logistic function (1) – a common "S"-shaped function

P(y | x) = e^{α + βx} / (1 + e^{α + βx})

[Figure: the S-shaped curve of P(Y=1 | X) versus x, ranging from 0.0 to 1.0; e.g., the probability of disease.]


Page 19:

Logistic regression – when?

Logistic regression models are appropriate for a target variable coded as 0/1.

We only observe "0" and "1" for the target variable, but we think of the target variable conceptually as a probability that "1" will occur.

Page 20:

Logistic regression – when? (continued)

This means we use a Bernoulli distribution to model the target variable, with its Bernoulli parameter defined as p = P(y = 1 | x).

The main interest → predicting the probability that an event occurs, i.e., P(y = 1 | x).

Page 21:

The logit function view

Logit of P(y | x):

ln[ P(y | x) / (1 − P(y | x)) ] = α + βx

P(y | x) = e^{α + βx} / (1 + e^{α + βx})

More generally:

ln[ P(y = 1 | x) / P(y = 0 | x) ] = ln[ P(y = 1 | x) / (1 − P(y = 1 | x)) ] = α + β1 x1 + β2 x2 + … + βp xp

Logit function

Decision boundary → logit equals zero
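The decision boundary fact above, logit = 0 if and only if P(y = 1 | x) = 0.5, can be sketched directly (coefficients here are hypothetical):

```python
import numpy as np

def logit(beta, x):
    # log-odds ln[p/(1-p)] = alpha + beta^T x; x carries a leading 1 for alpha.
    return float(np.dot(beta, x))

def predict(beta, x):
    # Decision boundary: logit = 0  <=>  P(y=1|x) = 0.5.
    return 1 if logit(beta, x) > 0 else 0

beta = np.array([-1.0, 2.0])  # hypothetical: alpha = -1, beta1 = 2
```

With these coefficients the boundary sits at x1 = 0.5: samples with x1 above it are labeled 1, those below it 0.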

Page 22:

Logistic regression assumptions

• Linearity in the logit – the regression equation should have a linear relationship with the logit form of the target variable.

• There is no assumption about the feature variables/target predictors being linearly related to each other.

Page 23:

Today

• Bayes classifier
• Logistic regression
• Binary to multi-class
• Training LR by MLE

Page 24:

Binary logistic regression

In summary, logistic regression tells us two things at once:

• Transformed, the "log odds" (logit) are linear in x.

• The response follows a logistic distribution: P(Y = 1 | x) is an S-shaped function of x.

[Figure: two panels, the logit ln[p/(1−p)] as a linear function of x, and P(Y = 1 | x) as an S-shaped curve in x.]

Odds = p / (1 − p)

Page 25:

Binary → Multinoulli logistic regression model

Pr(G = k | X = x) = exp(β_{k0} + β_k^T x) / (1 + Σ_{l=1}^{K−1} exp(β_{l0} + β_l^T x)),  k = 1, …, K − 1

Pr(G = K | X = x) = 1 / (1 + Σ_{l=1}^{K−1} exp(β_{l0} + β_l^T x))

• Directly models the posterior probabilities as the output of regression.
• Note that the class boundaries are linear.
• x is a p-dimensional input vector; β_k is a p-dimensional vector for each class k.
• Total number of parameters is (K − 1)(p + 1).
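The two formulas above, with class K as the reference class, can be sketched in a few lines. The data and coefficients here are hypothetical; `multinoulli_posteriors` is a helper name of our own choosing.

```python
import numpy as np

def multinoulli_posteriors(B, x):
    # B: (K-1, p+1) coefficient matrix (class K is the reference class);
    # x: (p+1,) input with leading entry 1. Returns Pr(G=k|x) for k = 1..K.
    scores = np.exp(B @ x)                           # exp(beta_k0 + beta_k^T x), k = 1..K-1
    denom = 1.0 + scores.sum()
    return np.append(scores / denom, 1.0 / denom)    # last entry is the reference class K

K, p = 3, 2
B = np.zeros((K - 1, p + 1))   # hypothetical all-zero parameters -> uniform posteriors
x = np.array([1.0, 0.3, -0.7])
probs = multinoulli_posteriors(B, x)

n_params = (K - 1) * (p + 1)   # matches the (K-1)(p+1) parameter count on the slide
```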

Page 26:

Binary → Multinoulli logistic regression model

Note that the class boundaries are linear.

Page 27:

Today

• Bayes classifier
• Logistic regression
• Binary to multi-class
• Training LR by MLE

Page 28:

Parameter estimation for LR → MLE from the data

• Review:
– Linear regression training → least squares
– Gaussian explanation of linear regression training → MLE

• Logistic regression training:
– Maximum Likelihood Estimation based
– Classic algorithms use the iterative Newton's method

Please read the L13 extra slides for HOW.

Page 29:

Review: Defining likelihood for a basic Bernoulli

• Likelihood = p(data | parameter)

PMF (a function of x_i):

f(x_i | p) = p^{x_i} (1 − p)^{1 − x_i}

Likelihood (a function of p):

L(p) = Π_{i=1}^n p^{x_i} (1 − p)^{1 − x_i} = p^x (1 − p)^{n − x}

→ e.g., for n independent tosses of a coin with unknown p.

Observed data → x heads-up from n trials, where x = Σ_{i=1}^n x_i.
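A quick numeric check of this likelihood, with a hypothetical coin dataset: the MLE for the Bernoulli parameter is x/n, and it attains a higher (log-)likelihood than other choices of p.

```python
import numpy as np

def bernoulli_log_likelihood(p, tosses):
    # log L(p) = x*log(p) + (n - x)*log(1 - p), with x heads out of n tosses.
    x = np.sum(tosses)
    n = len(tosses)
    return x * np.log(p) + (n - x) * np.log(1 - p)

tosses = np.array([1, 1, 1, 0, 0])   # hypothetical data: 3 heads in 5 trials
p_mle = tosses.mean()                # the Bernoulli MLE is x/n = 0.6
```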

Page 30:

MLE for logistic regression training (extra)

Let's fit the logistic regression model for K = 2, i.e., the number of classes is 2.

Training set: (x_i, y_i), i = 1, …, N. For the Bernoulli distribution, Pr(Y = y | x) = p_β(y | x)^y (1 − p_β(y | x))^{1 − y}, so the log-likelihood is:

l(β) = Σ_{i=1}^N log Pr(Y = y_i | X = x_i)
     = Σ_{i=1}^N [ y_i log Pr(Y = 1 | X = x_i) + (1 − y_i) log Pr(Y = 0 | X = x_i) ]
     = Σ_{i=1}^N [ y_i log( exp(β^T x_i) / (1 + exp(β^T x_i)) ) + (1 − y_i) log( 1 / (1 + exp(β^T x_i)) ) ]
     = Σ_{i=1}^N [ y_i β^T x_i − log(1 + exp(β^T x_i)) ]

We want to maximize the log-likelihood in order to estimate β.

The x_i are (p+1)-dimensional input vectors with leading entry 1; β is a (p+1)-dimensional vector.

How?
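One way to sketch the maximization is plain gradient ascent on l(β), rather than the Newton iterations covered in the extra slides: the gradient of the final expression above is X^T (y − p). The training set and step size here are hypothetical.

```python
import numpy as np

def log_likelihood(beta, X, y):
    # l(beta) = sum_i [ y_i * beta^T x_i - log(1 + exp(beta^T x_i)) ]
    z = X @ beta
    return float(np.sum(y * z - np.log1p(np.exp(z))))

def fit_gradient_ascent(X, y, steps=500, lr=0.1):
    # Simple alternative to Newton's method: the gradient of l(beta) is X^T (y - p),
    # where p is the vector of current predicted probabilities.
    beta = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        beta += lr * X.T @ (y - p)
    return beta

# Tiny hypothetical training set; first column of X is the constant 1.
X = np.array([[1., -2.], [1., -1.], [1., 1.], [1., 2.]])
y = np.array([0., 0., 1., 1.])
beta_hat = fit_gradient_ascent(X, y)
```

Each step moves β in the direction that increases l(β); Newton's method instead also uses the Hessian, which converges in far fewer iterations.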

Page 31:

l(β) = Σ_{i=1}^N log Pr(Y = y_i | X = x_i)

Page 32:

Logistic Regression

• Task: classification
• Representation: log-odds (logit) is a linear function of the X's
• Score function: EPE, MLE
• Search/Optimization: iterative (Newton) method
• Models, parameters: logistic weights,

P(c = 1 | x) = e^{α + βx} / (1 + e^{α + βx})

Page 33:

References

• Prof. Tan, Steinbach, and Kumar's "Introduction to Data Mining" slides
• Prof. Andrew Moore's slides
• Prof. Eric Xing's slides
• Prof. Ke Chen's NB slides
• Hastie, Trevor, et al. The Elements of Statistical Learning. Vol. 2. No. 1. New York: Springer, 2009.

