FSAN815/ELEG815: Foundations of Statistical Learning
Chapter 14: Logistic Regression
Gonzalo R. Arce
Fall 2014



Course Objectives & Structure

The course provides an introduction to the mathematics of data analysis and a detailed overview of statistical models for inference and prediction.

Course Structure:
- Weekly lectures [notes: www.ece.udel.edu/~arce/Courses]
- Homework & computer assignments [30%]
- Midterm & final examinations [70%]

Textbooks:
- Papoulis and Pillai, Probability, Random Variables, and Stochastic Processes.
- Hastie, Tibshirani and Friedman, The Elements of Statistical Learning.
- Haykin, Adaptive Filter Theory.

Gonzalo R. Arce (ECE, Univ. of Delaware) FSAN-815/ELEG-815: Statistical Learning Fall 2014 2 / 18

Logistic Regression

The logistic regression model arises from the desire to model the posterior probabilities of the K classes via linear functions in x, while at the same time ensuring that they sum to one and remain in [0, 1].

\log \frac{\Pr(G = 1 \mid X = x)}{\Pr(G = K \mid X = x)} = \beta_{10} + \beta_1^T x

\log \frac{\Pr(G = 2 \mid X = x)}{\Pr(G = K \mid X = x)} = \beta_{20} + \beta_2^T x

\vdots

\log \frac{\Pr(G = K-1 \mid X = x)}{\Pr(G = K \mid X = x)} = \beta_{(K-1)0} + \beta_{K-1}^T x \qquad (1)

Logistic Regression

Here \beta_k = [\beta_{k1}, \beta_{k2}, \ldots, \beta_{kp}]^T and x = [x_1, x_2, \ldots, x_p]^T. The model is specified in terms of K-1 log-odds or logit transformations (reflecting the constraint that the probabilities sum to one). A simple calculation shows that

\Pr(G = k \mid X = x) = \frac{\exp(\beta_{k0} + \beta_k^T x)}{1 + \sum_{l=1}^{K-1} \exp(\beta_{l0} + \beta_l^T x)}, \quad k = 1, 2, \ldots, K-1,

\Pr(G = K \mid X = x) = \frac{1}{1 + \sum_{l=1}^{K-1} \exp(\beta_{l0} + \beta_l^T x)} \qquad (2)

To emphasize the dependence on the entire parameter set \theta = \{\beta_{10}, \beta_1^T, \ldots, \beta_{(K-1)0}, \beta_{K-1}^T\}, we denote the probabilities

\Pr(G = k \mid X = x) = p_k(x; \theta)
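As a quick numerical illustration of Eq. (2), the class probabilities can be evaluated in a few lines of NumPy. This is a sketch only; the function name and the example parameter values are made up, not from the slides:

```python
import numpy as np

def class_probabilities(x, intercepts, betas):
    """Posterior probabilities Pr(G = k | X = x) from Eq. (2).

    intercepts: (K-1,) array of beta_{k0}
    betas:      (K-1, p) array whose k-th row is beta_k
    Returns a length-K array; the last entry is class K.
    """
    # Linear predictors beta_{k0} + beta_k^T x for k = 1, ..., K-1
    eta = intercepts + betas @ x
    expo = np.exp(eta)
    denom = 1.0 + expo.sum()
    # Classes 1..K-1 get exp(eta_k)/denom; class K gets 1/denom
    return np.append(expo, 1.0) / denom

# Example with K = 3 classes and p = 2 inputs (illustrative numbers)
p = class_probabilities(np.array([0.5, -1.0]),
                        np.array([0.2, -0.3]),
                        np.array([[1.0, 0.5], [-0.5, 2.0]]))
```

Whatever the parameter values, the K entries are positive and sum to one, which is exactly the constraint the logit parameterization enforces.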

Fitting Logistic Regression Models

Logistic regression models are usually fit by maximum likelihood, using the conditional likelihood of G given X.

The log-likelihood for N observations is

l(\theta) = \sum_{i=1}^{N} \log p_{g_i}(x_i; \theta) \qquad (3)

where p_k(x_i; \theta) = \Pr(G = k \mid X = x_i; \theta).


Fitting Logistic Regression Models

We now discuss the two-class case in detail.

Code the two-class g_i via a 0/1 response y_i, where y_i = 1 when g_i = 1 and y_i = 0 when g_i = 2. Let p_1(x; \theta) = p(x; \theta) and p_2(x; \theta) = 1 - p(x; \theta).

The log-likelihood can be written as

l(\beta) = \sum_{i=1}^{N} \left\{ y_i \log p(x_i; \beta) + (1 - y_i) \log(1 - p(x_i; \beta)) \right\}

         = \sum_{i=1}^{N} \left\{ y_i \left[ \log p(x_i; \beta) - \log(1 - p(x_i; \beta)) \right] + \log(1 - p(x_i; \beta)) \right\}

         = \sum_{i=1}^{N} \left\{ y_i \log \frac{\Pr(G = 1 \mid X = x_i)}{\Pr(G = 2 \mid X = x_i)} + \log \Pr(G = 2 \mid X = x_i) \right\}

         = \sum_{i=1}^{N} \left\{ y_i \beta^T x_i - \log(1 + e^{\beta^T x_i}) \right\} \qquad (4)
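The algebra in Eq. (4) can be sanity-checked numerically: the Bernoulli form (first line) and the compact form (last line) are the same function of β. A NumPy sketch, with made-up data and parameter values used only for the check:

```python
import numpy as np

def log_lik_compact(beta, X, y):
    """Last line of Eq. (4): sum_i [y_i beta^T x_i - log(1 + exp(beta^T x_i))]."""
    eta = X @ beta
    return np.sum(y * eta - np.log1p(np.exp(eta)))

def log_lik_bernoulli(beta, X, y):
    """First line of Eq. (4): sum_i [y_i log p_i + (1 - y_i) log(1 - p_i)]."""
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

# Synthetic inputs; the leading column of ones carries the intercept
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(20), rng.normal(size=(20, 2))])
y = rng.integers(0, 2, size=20).astype(float)
beta = np.array([0.1, -0.4, 0.7])
```

Evaluating both functions at the same β returns the same number, confirming the line-by-line simplification above.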

Fitting Logistic Regression Models

Here \beta = \{\beta_{10}, \beta_1\}, and we assume that the vector of inputs x_i includes the constant term 1 to accommodate the intercept.

To maximize the log-likelihood, we set its derivatives to zero. These score equations are

\frac{\partial l(\beta)}{\partial \beta} = \sum_{i=1}^{N} x_i (y_i - p(x_i; \beta)) = 0 \qquad (5)

which are p+1 equations nonlinear in \beta.

Since the first component of x_i is 1, the first score equation specifies that \sum_{i=1}^{N} y_i = \sum_{i=1}^{N} p(x_i; \beta): the expected number of class ones matches the observed number.
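The score formula in Eq. (5) can be verified against a finite-difference gradient of the log-likelihood. A small sketch on synthetic data (names and values are illustrative):

```python
import numpy as np

def log_lik(beta, X, y):
    """Two-class log-likelihood, last line of Eq. (4)."""
    eta = X @ beta
    return np.sum(y * eta - np.log1p(np.exp(eta)))

def score(beta, X, y):
    """Score vector of Eq. (5): sum_i x_i (y_i - p(x_i; beta)) = X^T (y - p)."""
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))
    return X.T @ (y - p)

rng = np.random.default_rng(1)
X = np.column_stack([np.ones(30), rng.normal(size=(30, 2))])
y = rng.integers(0, 2, size=30).astype(float)
beta = np.array([0.2, -0.5, 0.3])

# Central finite differences of l(beta) should match the analytic score
eps = 1e-6
fd = np.array([(log_lik(beta + eps * e, X, y) - log_lik(beta - eps * e, X, y)) / (2 * eps)
               for e in np.eye(3)])
```

The agreement between `fd` and `score(beta, X, y)` confirms that differentiating Eq. (4) term by term yields Eq. (5).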

Newton-Raphson algorithm

To solve the score equations (5), we use the Newton-Raphson algorithm, which requires the second-derivative or Hessian matrix

\frac{\partial^2 l(\beta)}{\partial \beta \, \partial \beta^T} = - \sum_{i=1}^{N} x_i x_i^T \, p(x_i; \beta) (1 - p(x_i; \beta)) \qquad (6)

Starting with \beta^{old}, a single Newton update is

\beta^{new} = \beta^{old} - \left( \frac{\partial^2 l(\beta)}{\partial \beta \, \partial \beta^T} \right)^{-1} \frac{\partial l(\beta)}{\partial \beta} \qquad (7)

where the derivatives are evaluated at \beta^{old}.

Newton-Raphson algorithm

Write the score and Hessian in matrix notation:
- y: the N-vector of y_i values.
- x_i: the (p+1)-vector of inputs for observation i.
- X: the N × (p+1) matrix of x_i values.
- p: the N-vector of fitted probabilities with i-th element p(x_i; \beta^{old}).
- W: the N × N diagonal matrix of weights with i-th diagonal element p(x_i; \beta^{old})(1 - p(x_i; \beta^{old})).

Fitting Logistic Regression Models

Then we have

\frac{\partial l(\beta)}{\partial \beta} = X^T (y - p), \qquad \frac{\partial^2 l(\beta)}{\partial \beta \, \partial \beta^T} = -X^T W X \qquad (8)

The Newton step is thus

\beta^{new} = \beta^{old} + (X^T W X)^{-1} X^T (y - p)

            = (X^T W X)^{-1} X^T W \left( X \beta^{old} + W^{-1} (y - p) \right)

            = (X^T W X)^{-1} X^T W z \qquad (9)

Fitting Logistic Regression Models

In the second and third lines we have re-expressed the Newton step as a weighted least squares step, with the response

z = X \beta^{old} + W^{-1} (y - p) \qquad (10)

sometimes known as the adjusted response. These equations are solved repeatedly, since at each iteration p changes, and hence so do W and z.

This algorithm is referred to as iteratively reweighted least squares (IRLS), since each iteration solves the weighted least squares problem

\beta^{new} \leftarrow \arg\min_{\beta} \, (z - X\beta)^T W (z - X\beta) \qquad (11)
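Putting Eqs. (8)-(11) together gives a compact IRLS solver. A minimal NumPy sketch, assuming the data are not linearly separable so the iteration converges without step-size safeguards (function and variable names are illustrative):

```python
import numpy as np

def irls(X, y, n_iter=25):
    """Two-class logistic regression via iteratively reweighted least
    squares, following Eqs. (8)-(11). X must include an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))   # fitted probabilities
        w = p * (1.0 - p)                        # diagonal of W
        z = X @ beta + (y - p) / w               # adjusted response, Eq. (10)
        # Weighted least squares step, Eq. (11): beta = (X^T W X)^{-1} X^T W z
        beta = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * z))
    return beta

# Synthetic data drawn from a known logistic model
rng = np.random.default_rng(2)
X = np.column_stack([np.ones(200), rng.normal(size=(200, 2))])
true_beta = np.array([-0.5, 1.0, -2.0])
y = (rng.random(200) < 1.0 / (1.0 + np.exp(-(X @ true_beta)))).astype(float)
beta_hat = irls(X, y)
```

At convergence the score equations (5) hold, so in particular the sum of the fitted probabilities matches the observed number of class ones.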

Exercise 4.10

The Weekly data set is part of the ISLR package. It contains 1089 weekly percentage returns for 21 years, from the beginning of 1990 to the end of 2010.

- Lag1-Lag5: the percentage returns for each of the five previous trading weeks.
- Volume: the number of shares traded on the previous week, in billions.
- Today: the percentage return on the week in question.
- Direction: whether the market was Up or Down on this week.

Exercise

(b) Use the full data set to perform a logistic regression.

library(ISLR)
attach(Weekly)
glm.fit = glm(Direction ~ Lag1 + Lag2 + Lag3 + Lag4 + Lag5 + Volume,
              data = Weekly, family = binomial)
summary(glm.fit)

According to the p-values, Lag1, Lag2, and Lag4 appear closest to statistical significance.

Exercise

(c) Compute the confusion matrix and overall fraction of correct predictions.

glm.probs = predict(glm.fit, type = "response")
glm.pred = rep("Down", 1089)
glm.pred[glm.probs > .5] = "Up"
table(glm.pred, Direction)
mean(glm.pred == Direction)

Exercise

(d) Fit the logistic regression model on the training data, then predict.

train = (Year < 2009)
A = (Year == 2009)
B = (Year == 2010)
Weekly.2009 = Weekly[A, ]
Weekly.2010 = Weekly[B, ]
Direction.2009 = Direction[A]
Direction.2010 = Direction[B]
glm.fit = glm(Direction ~ Lag2, data = Weekly, family = binomial, subset = train)
glm.probs = predict(glm.fit, Weekly.2009, type = "response")
glm.pred = rep("Down", 52)
glm.pred[glm.probs > .5] = "Up"
table(glm.pred, Direction.2009)
mean(glm.pred == Direction.2009)

Exercise

Figure: prediction for 2009. Figure: prediction for 2010.

Exercise

(e) Fit the LDA model on the training data, then predict.

library(MASS)
lda.fit = lda(Direction ~ Lag2, data = Weekly, subset = train)
lda.pred = predict(lda.fit, Weekly.2009)
lda.class = lda.pred$class
table(lda.class, Direction.2009)
mean(lda.class == Direction.2009)

Figure: prediction for 2009. Figure: prediction for 2010.

Exercise

(f) Fit the QDA model on the training data, then predict.

qda.fit = qda(Direction ~ Lag2, data = Weekly, subset = train)
qda.pred = predict(qda.fit, Weekly.2009)
qda.class = qda.pred$class
table(qda.class, Direction.2009)
mean(qda.class == Direction.2009)

Figure: prediction for 2009. Figure: prediction for 2010.
