# Logistic Regression and Newton-Raphson

Post on 31-Jan-2018


Chapter 1

Logistic Regression and Newton-Raphson

1.1 Introduction

The logistic regression model is widely used in biomedical settings to model the probability of an event as a function of one or more predictors. For a single predictor $X$, the model stipulates that the log odds of success is

$$\log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 X$$

or, equivalently,

$$p = \frac{\exp(\beta_0 + \beta_1 X)}{1 + \exp(\beta_0 + \beta_1 X)},$$

where $p$ is the event probability. Depending on the sign of $\beta_1$, $p$ either increases or decreases with $X$ and follows a sigmoidal trend. If $\beta_1 = 0$ then $p$ does not depend on $X$.
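The two equivalent forms of the single-predictor model can be checked numerically. The following is a minimal Python sketch, not code from the original notes; the function names `inv_logit`, `logit`, and `prob` and the coefficient values are chosen here purely for illustration.

```python
import math

def inv_logit(eta):
    """Map a log-odds value eta to a probability in (0, 1)."""
    # 1 / (1 + exp(-eta)) is algebraically equal to exp(eta) / (1 + exp(eta)).
    return 1.0 / (1.0 + math.exp(-eta))

def logit(p):
    """Log odds of a probability p; only defined for 0 < p < 1."""
    return math.log(p / (1.0 - p))

def prob(x, beta0, beta1):
    """Event probability under log(p / (1 - p)) = beta0 + beta1 * x."""
    return inv_logit(beta0 + beta1 * x)

# Illustrative (made-up) coefficients: positive, negative, and zero slope.
xs = [-5, 0, 5]
increasing = [prob(x, 0.0, 1.0) for x in xs]
decreasing = [prob(x, 0.0, -1.0) for x in xs]
flat = [prob(x, 0.5, 0.0) for x in xs]
```

With a positive slope the probabilities rise with `x`, with a negative slope they fall, and with a zero slope they are constant, matching the description above.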

[Figure: two panels plotted against X over the range -5 to 5, each showing curves for a negative slope, a zero slope, and a positive slope. "Logit Scale" panel: log-odds versus X. "Probability Scale" panel: probability (0.0 to 1.0) versus X.]

Note that the logit transformation is undefined when $p = 0$ or $p = 1$. To overcome this problem, researchers use the empirical logits, defined by $\log\{(p + 0.5/n)/(1 - p + 0.5/n)\}$, where $n$ is the sample size or the number of observations on which $p$ is based.
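As a small illustration of the empirical logit, here is a Python sketch (the function name `empirical_logit` and the counts used are hypothetical, not taken from the beetle data). It assumes $p = y/n$ is the observed proportion of $y$ events out of $n$.

```python
import math

def empirical_logit(y, n):
    """Empirical logit log{(p + 0.5/n) / (1 - p + 0.5/n)} with p = y / n."""
    p = y / n
    return math.log((p + 0.5 / n) / (1.0 - p + 0.5 / n))

# Made-up counts: the ordinary logit is undefined for the first and last case.
none_killed = empirical_logit(0, 30)
half_killed = empirical_logit(15, 30)
all_killed = empirical_logit(30, 30)
```

Multiplying the numerator and denominator by $n$ shows that the same quantity can be written as $\log\{(y + 0.5)/(n - y + 0.5)\}$, which is finite even when $y = 0$ or $y = n$.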

Example: Mortality of confused flour beetles. The aim of an

experiment originally reported by Strand (1930) and quoted by Bliss

(1935) was to assess the response of the confused flour beetle, Tribolium

confusum, to gaseous carbon disulphide (CS2). In the experiment, prescribed

volumes of liquid carbon disulphide were added to flasks in which a tubular

cloth cage containing a batch of about thirty beetles was suspended.

Duplicate batches of beetles were used for each concentration of CS2. At

the end of a five-hour period, the proportion killed was recorded and the

actual concentration of gaseous CS2 in the flask, measured in mg/l, was


determined by a volumetric analysis. The mortality data are given in the

table below.

## Beetles data set

# conc = CS2 concentration

# y = number of beetles killed

# n = number of beetles exposed

# rep = Replicate number (1 or 2)

beetles


In a number of articles that refer to these data, the responses from

the first two concentrations are omitted because of apparent non-linearity.

Bliss himself remarks that

. . . in comparison with the remaining observations, the two

lowest concentrations gave an exceptionally high kill. Over the

remaining concentrations, the plotted values seemed to form

a moderately straight line, so that the data were handled as

two separate sets, only the results at 56.91 mg of CS2 per litre

being included in both sets.

However, there does not appear to be any biological motivation for this

and so here they are retained in the data set.

Combining the data from the two replicates and plotting the empirical logit of the observed proportions against concentration gives a relationship that is better fit by a quadratic than a linear relationship,

$$\log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 X + \beta_2 X^2.$$

The right plot below shows the linear and quadratic model fits to the

observed values with point-wise 95% confidence bands on the logit scale,

and on the left is the same on the proportion scale.

[Figure: observed and predicted mortality versus conc, with linear and quadratic model fits (legend: modelorder) and observations marked by replicate (legend: rep 1, 2). Left panel: "Observed and predicted mortality, probability scale" (vertical axis p.hat, 0.00 to 1.00). Right panel: "Observed and predicted mortality, logit scale" (vertical axis emp.logit).]

We will focus on how to estimate the parameters of a logistic regression model by maximum likelihood, that is, on how to compute the maximum likelihood estimates (MLEs).

1.2 The Model

Suppose $Y_i \overset{\text{ind}}{\sim} \text{Binomial}(m_i, p_i)$ are random variables, $i = 1, 2, \ldots, n$. For example, $Y_i$ is the number of beetle deaths from a total of $m_i$ beetles at concentration $X_i$ over the $i = 1, 2, \ldots, n$ concentrations. Note that $m_i$ can equal 1 (and often does in observational studies). Recall that the probability mass function for a Binomial is

$$\Pr[Y_i = y_i \mid p_i] = \binom{m_i}{y_i} p_i^{y_i} (1 - p_i)^{m_i - y_i}, \quad y_i = 0, 1, 2, \ldots, m_i.$$

So the joint distribution of $Y_1, Y_2, \ldots, Y_n$ is

$$\Pr[Y_1 = y_1, \ldots, Y_n = y_n \mid p_1, \ldots, p_n] = \prod_{i=1}^n \binom{m_i}{y_i} p_i^{y_i} (1 - p_i)^{m_i - y_i}.$$


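The log-likelihood $\ell = \log\{\Pr[Y_1 = y_1, \ldots, Y_n = y_n \mid p_1, \ldots, p_n]\}$, ignoring the additive constant $\sum_{i=1}^n \log\binom{m_i}{y_i}$ (which does not depend on the $p_i$), is

$$
\begin{aligned}
\ell &= \log\left\{\prod_{i=1}^n p_i^{y_i} (1 - p_i)^{m_i - y_i}\right\} \\
&= \sum_{i=1}^n \left\{ y_i \log(p_i) + (m_i - y_i) \log(1 - p_i) \right\} \\
&= \sum_{i=1}^n \left\{ m_i \log(1 - p_i) + y_i \log\left(\frac{p_i}{1 - p_i}\right) \right\}. \qquad (1.1)
\end{aligned}
$$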

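The logistic regression model assumes that $p_i$ depends on $r$ covariates $x_{i1}, x_{i2}, \ldots, x_{ir}$ through

$$
\begin{aligned}
\log\left(\frac{p_i}{1 - p_i}\right) &= \beta_0 + \beta_1 x_{i1} + \cdots + \beta_r x_{ir} \\
&= \begin{bmatrix} 1 & x_{i1} & x_{i2} & \cdots & x_{ir} \end{bmatrix}
\begin{bmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \\ \vdots \\ \beta_r \end{bmatrix} \\
&= x_i^\top \beta.
\end{aligned}
$$

The covariates or predictors are fixed, while $\beta$ is an unknown parameter vector. Regardless, $p_i$ is a function of both $x_i$ and $\beta$, written $p_i \equiv p_i(x_i, \beta)$ or $p_i(\beta)$ (suppressing $x_i$, since it is known).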


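Note that the model implies

$$p_i = \frac{\exp(x_i^\top \beta)}{1 + \exp(x_i^\top \beta)} \quad \text{and} \quad 1 - p_i = \frac{1}{1 + \exp(x_i^\top \beta)}.$$

To obtain the MLEs we first write the log-likelihood in (1.1) as a function of $\beta$,

$$
\begin{aligned}
\ell(\beta) &= \sum_{i=1}^n \left\{ m_i \log\left(\frac{1}{1 + \exp(x_i^\top \beta)}\right) + y_i \log\left(\frac{\exp(x_i^\top \beta) / (1 + \exp(x_i^\top \beta))}{1 / (1 + \exp(x_i^\top \beta))}\right) \right\} \\
&= \sum_{i=1}^n \left\{ m_i \log\left(\frac{1}{1 + \exp(x_i^\top \beta)}\right) + y_i (x_i^\top \beta) \right\} \\
&= \sum_{i=1}^n \left\{ y_i (x_i^\top \beta) - m_i \log(1 + \exp(x_i^\top \beta)) \right\}. \qquad (1.2)
\end{aligned}
$$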
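The agreement between the two ways of writing the log-likelihood, in terms of the $p_i$ as in (1.1) and in terms of $\beta$ as in (1.2), can be checked numerically. This is a minimal Python sketch under stated assumptions: the function names `loglik_p` and `loglik_beta`, the design matrix, the counts, and the value of `beta` are all made up for illustration and are not the beetle data.

```python
import math

def loglik_p(y, m, p):
    """Form (1.1): sum of y*log(p) + (m - y)*log(1 - p), constant dropped."""
    return sum(yi * math.log(pi) + (mi - yi) * math.log(1.0 - pi)
               for yi, mi, pi in zip(y, m, p))

def loglik_beta(beta, X, y, m):
    """Form (1.2): sum of y*(x'beta) - m*log(1 + exp(x'beta))."""
    total = 0.0
    for xi, yi, mi in zip(X, y, m):
        eta = sum(xij * bj for xij, bj in zip(xi, beta))
        total += yi * eta - mi * math.log(1.0 + math.exp(eta))
    return total

# Made-up data: each row of X is [1, x_i1], so beta = [beta_0, beta_1].
X = [[1.0, -1.0], [1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
y = [1, 3, 6, 9]
m = [10, 10, 10, 10]
beta = [-0.5, 1.2]

# Probabilities implied by the model at this beta.
p = [1.0 / (1.0 + math.exp(-sum(a * b for a, b in zip(xi, beta)))) for xi in X]

value_from_p = loglik_p(y, m, p)
value_from_beta = loglik_beta(beta, X, y, m)
```

The two values agree up to floating-point rounding, since (1.2) is only an algebraic rewriting of (1.1).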

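To maximize $\ell(\beta)$, we compute the score function

$$\dot{\ell}(\beta) = \begin{bmatrix} \partial \ell(\beta) / \partial \beta_0 \\ \partial \ell(\beta) / \partial \beta_1 \\ \vdots \\ \partial \ell(\beta) / \partial \beta_r \end{bmatrix}$$

and solve the likelihood equations

$$\dot{\ell}(\beta) = 0_{r+1}.$$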


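Note that $\dot{\ell}(\beta)$ is an $(r + 1)$-by-1 vector, so we are solving a system of $r + 1$ non-linear equations.

Let us now compute $\partial \ell(\beta) / \partial \beta_j$ where $\beta_j$ is a generic element of $\beta$. It is important to realize that $\ell(\beta)$ depends on the elements of $\beta$ only through the values of $x_i^\top \beta$, which is linear. Thus each of the partial derivatives in $\dot{\ell}(\beta)$ will have the same form!

Now

$$\frac{\partial \ell(\beta)}{\partial \beta_j} = \sum_{i=1}^n \left\{ y_i \frac{\partial}{\partial \beta_j}(x_i^\top \beta) - m_i \frac{\partial}{\partial \beta_j} \log(1 + \exp(x_i^\top \beta)) \right\} \qquad (1.3)$$

where

$$
\begin{aligned}
\frac{\partial}{\partial \beta_j}(x_i^\top \beta) &= \frac{\partial}{\partial \beta_j} \{\beta_0 + \beta_1 x_{i1} + \cdots + \beta_r x_{ir}\} \\
&= x_{ij} \quad (\text{where } x_{i0} \equiv 1) \qquad (1.4)
\end{aligned}
$$

and

$$
\begin{aligned}
\frac{\partial}{\partial \beta_j} \log(1 + \exp(x_i^\top \beta)) &= \frac{\frac{\partial}{\partial \beta_j} \exp(x_i^\top \beta)}{1 + \exp(x_i^\top \beta)} \\
&= \frac{\exp(x_i^\top \beta)}{1 + \exp(x_i^\top \beta)} \frac{\partial}{\partial \beta_j}(x_i^\top \beta) \\
&= p_i(x_i, \beta)\, x_{ij}, \qquad (1.5)
\end{aligned}
$$

and so

$$\frac{\partial \ell(\beta)}{\partial \beta_j} = \sum_{i=1}^n \left\{ y_i x_{ij} - m_i p_i(x_i, \beta) x_{ij} \right\} = \sum_{i=1}^n \left\{ x_{ij} (y_i - m_i p_i(x_i, \beta)) \right\}, \quad j = 0, 1, \ldots, r. \qquad (1.6)$$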
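One way to gain confidence in the score formula (1.6) is to compare it with a numerical derivative of the log-likelihood (1.2). The sketch below does this in Python with central differences; the helper names (`eta`, `loglik`, `score`, `numeric_score`), the step size `h`, and the data are assumptions made for illustration only.

```python
import math

def eta(xi, beta):
    """Linear predictor x_i' beta for one observation."""
    return sum(xij * bj for xij, bj in zip(xi, beta))

def loglik(beta, X, y, m):
    """Log-likelihood (1.2), up to an additive constant."""
    return sum(yi * eta(xi, beta) - mi * math.log(1.0 + math.exp(eta(xi, beta)))
               for xi, yi, mi in zip(X, y, m))

def score(beta, X, y, m):
    """Score vector from (1.6): element j is sum_i x_ij * (y_i - m_i * p_i)."""
    grad = [0.0] * len(beta)
    for xi, yi, mi in zip(X, y, m):
        p_i = 1.0 / (1.0 + math.exp(-eta(xi, beta)))
        for j, xij in enumerate(xi):
            grad[j] += xij * (yi - mi * p_i)
    return grad

def numeric_score(beta, X, y, m, h=1e-6):
    """Central-difference approximation to each partial derivative of loglik."""
    grad = []
    for j in range(len(beta)):
        up = list(beta)
        dn = list(beta)
        up[j] += h
        dn[j] -= h
        grad.append((loglik(up, X, y, m) - loglik(dn, X, y, m)) / (2.0 * h))
    return grad

# Made-up data; the first column of X is the constant x_i0 = 1.
X = [[1.0, -1.0], [1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
y = [1, 3, 6, 9]
m = [10, 10, 10, 10]
beta = [-0.5, 1.2]

analytic = score(beta, X, y, m)
numeric = numeric_score(beta, X, y, m)
```

The analytic and numerical vectors should agree to several decimal places at any `beta`, not only at the MLE.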


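For NR, we also need the second partial derivatives

$$\frac{\partial^2 \ell}{\partial \beta_j \partial \beta_k} = \frac{\partial}{\partial \beta_k} \frac{\partial \ell(\beta)}{\partial \beta_j} = \sum_{i=1}^n \left\{ x_{ij} \frac{\partial}{\partial \beta_k} \bigl( y_i - m_i p_i(x_i, \beta) \bigr) \right\} = -\sum_{i=1}^n \left\{ x_{ij} m_i \frac{\partial p_i(x_i, \beta)}{\partial \beta_k} \right\}.$$

It is straightforward to show (by the chain rule, since $dp_i / d\eta_i = p_i (1 - p_i)$ for $\eta_i = x_i^\top \beta$) that

$$\frac{\partial p_i(x_i, \beta)}{\partial \beta_k} = x_{ik}\, p_i(x_i, \beta) (1 - p_i(x_i, \beta)).$$

So

$$\frac{\partial^2 \ell}{\partial \beta_j \partial \beta_k} = -\sum_{i=1}^n \left\{ x_{ij} x_{ik} m_i p_i(x_i, \beta) (1 - p_i(x_i, \beta)) \right\}.$$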
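The second partial derivatives can be checked the same way, by differencing the score numerically. The Python sketch below assumes the sign convention $\partial^2 \ell / \partial \beta_j \partial \beta_k = -\sum_i x_{ij} x_{ik} m_i p_i (1 - p_i)$; the function names and data are hypothetical and chosen only for illustration.

```python
import math

def probs(beta, X):
    """Model probabilities p_i = exp(x_i'beta) / (1 + exp(x_i'beta))."""
    return [1.0 / (1.0 + math.exp(-sum(a * b for a, b in zip(xi, beta))))
            for xi in X]

def score(beta, X, y, m):
    """Score vector: element j is sum_i x_ij * (y_i - m_i * p_i)."""
    grad = [0.0] * len(beta)
    for xi, yi, mi, p_i in zip(X, y, m, probs(beta, X)):
        for j, xij in enumerate(xi):
            grad[j] += xij * (yi - mi * p_i)
    return grad

def hessian(beta, X, m):
    """Second derivatives: H[j][k] = -sum_i x_ij * x_ik * m_i * p_i * (1 - p_i)."""
    q = len(beta)
    H = [[0.0] * q for _ in range(q)]
    for xi, mi, p_i in zip(X, m, probs(beta, X)):
        v_i = mi * p_i * (1.0 - p_i)  # binomial variance of Y_i
        for j in range(q):
            for k in range(q):
                H[j][k] -= xi[j] * xi[k] * v_i
    return H

def numeric_hessian(beta, X, y, m, h=1e-6):
    """Central differences of the score; column k differentiates w.r.t. beta_k."""
    q = len(beta)
    H = [[0.0] * q for _ in range(q)]
    for k in range(q):
        up = list(beta)
        dn = list(beta)
        up[k] += h
        dn[k] -= h
        s_up, s_dn = score(up, X, y, m), score(dn, X, y, m)
        for j in range(q):
            H[j][k] = (s_up[j] - s_dn[j]) / (2.0 * h)
    return H

# Made-up data; the first column of X is the constant 1.
X = [[1.0, -1.0], [1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
y = [1, 3, 6, 9]
m = [10, 10, 10, 10]
beta = [-0.5, 1.2]

H_analytic = hessian(beta, X, m)
H_numeric = numeric_hessian(beta, X, y, m)
```

Note that `hessian` does not take `y` as an argument: under the logit link the second derivatives depend on the data only through `X` and `m`.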

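Recall that $\text{Var}(Y_i) = m_i p_i(x_i, \beta)(1 - p_i(x_i, \beta))$, from the variance of the binomial distribution. Let $\text{Var}(Y_i) = v_i(\beta) = v_i(x_i, \beta)$.

For programming, it is convenient to use vector/matrix notation. Let

$$
Y = \begin{bmatrix} Y_1 \\ \vdots \\ Y_n \end{bmatrix}, \quad
p = \begin{bmatrix} p_1 \\ \vdots \\ p_n \end{bmatrix}, \quad
m = \begin{bmatrix} m_1 \\ \vdots \\ m_n \end{bmatrix}, \quad
X = \begin{bmatrix} x_1^\top \\ \vdots \\ x_n^\top \end{bmatrix}, \quad
\log\left(\frac{p}{1 - p}\right) = \begin{bmatrix} \log\left(\frac{p_1}{1 - p_1}\right) \\ \vdots \\ \log\left(\frac{p_n}{1 - p_n}\right) \end{bmatrix},
$$

where the logarithm and the division operate elementwise.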

The model can be written

$$\log\left(\frac{p}{1 - p}\right) = X\beta,$$

or, for the $i$th element,

$$\log\left(\frac{p_i}{1 - p_i}\right) = x_i^\top \beta.$$

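Also, define vectors

$$
\exp(X\beta) = \begin{bmatrix} \exp(x_1^\top \beta) \\ \vdots \\ \exp(x_n^\top \beta) \end{bmatrix}
\quad \text{implies} \quad
p = \frac{\exp(X\beta)}{1 + \exp(X\beta)},
$$

$$
\log(1 + \exp(X\beta)) = \begin{bmatrix} \log(1 + \exp(x_1^\top \beta)) \\ \vdots \\ \log(1 + \exp(x_n^\top \beta)) \end{bmatrix},
$$

where operations are performed elementwise.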

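Then

$$
\begin{aligned}
\ell(\beta) &= \sum_{i=1}^n \left\{ y_i \log(p_i) + (m_i - y_i) \log(1 - p_i) \right\} \\
&= y^\top \log(p) + (m - y)^\top \log(1 - p) \\
&= \sum_{i=1}^n \left\{ y_i x_i^\top \beta - m_i \log(1 + \exp(x_i^\top \beta)) \right\} \\
&= y^\top X\beta - m^\top \log(1 + \exp(X\beta)) \qquad (1.7)
\end{aligned}
$$

and

$$
\dot{\ell}(\beta) = \begin{bmatrix} \partial \ell(\beta) / \partial \beta_0 \\ \partial \ell(\beta) / \partial \beta_1 \\ \vdots \\ \partial \ell(\beta) / \partial \beta_r \end{bmatrix} = X^\top (y - m \circ p(\beta)),
$$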


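where $\circ$ denotes the Hadamard or elementwise product, so that

$$m \circ p(\beta) = \begin{bmatrix} m_1 p_1(\beta) \\ \vdots \\ m_n p_n(\beta) \end{bmatrix}.$$

If we think of

$$
E[Y] = \begin{bmatrix} E[Y_1] \\ \vdots \\ E[Y_n] \end{bmatrix}
= \begin{bmatrix} m_1 p_1(\beta) \\ \vdots \\ m_n p_n(\beta) \end{bmatrix}
= \begin{bmatrix} \mu_1(\beta) \\ \vdots \\ \mu_n(\beta) \end{bmatrix}
\equiv \mu(\beta),
$$

then the likelihood equations have the form

$$\dot{\ell}(\beta) = X^\top (y - m \circ p(\beta)) = X^\top (y - \mu(\beta)) = 0.$$

This is the same form as the Normal equations for computing LS estimates in normal-theory regression. Also, with

$$\ddot{\ell}(\beta) = \left[ \frac{\partial^2 \ell}{\partial \beta_j \partial \beta_k} \right] = \left[ -\sum_{i=1}^n x_{ij} x_{ik} v_i(\beta) \right],$$

if we define the diagonal matrix

$$
v(\beta) = \text{diag}(v_1(\beta), v_2(\beta), \ldots, v_n(\beta)) =
\begin{bmatrix} v_1(\beta) & & & 0 \\ & v_2(\beta) & & \\ & & \ddots & \\ 0 & & & v_n(\beta) \end{bmatrix},
$$

then it is easy to see that

$$\ddot{\ell}(\beta) = -X^\top v(\beta) X,$$

that is, the $j$th row and $k$th column element of $X^\top v(\beta) X$ is $\sum_{i=1}^n x_{ij} x_{ik} v_i(\beta)$.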


It is important to recognize that for the logistic regression model

$$I(\beta) = E[-\ddot{\ell}(\beta)] = X^\top v(\beta) X = -\ddot{\ell}(\beta),$$

that is, NR and Scoring methods are equivalent. In particular, the NR method iterates via

$$\beta_{i+1} = \beta_i - [\ddot{\ell}(\beta_i)]^{-1} \dot{\ell}(\beta_i) = \beta_i + (X^\top v(\beta_i) X)^{-1} X^\top (y - \mu(\beta_i)), \quad i = 0, 1, \ldots,$$

until convergence (hopefully) to the MLE $\hat{\beta}$.

I will note that the observed information matrix $-\ddot{\ell}(\beta)$ is independent of $Y$ for logistic regression with the logit link, but not for other binomial response models, such as probit regression. Thus, for other models there is a difference between NR and Fisher
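The Newton-Raphson update for the logit link, $\beta_{i+1} = \beta_i + (X^\top v(\beta_i) X)^{-1} X^\top (y - \mu(\beta_i))$, can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the implementation used in the notes: the helper `solve`, the function `newton_raphson`, the starting value of zero, the stopping rule, and the data are all made up here, and only the standard library is used.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def newton_raphson(X, y, m, beta0=None, tol=1e-10, max_iter=50):
    """Iterate beta <- beta + (X'vX)^{-1} X'(y - mu) until the step is tiny."""
    q = len(X[0])
    beta = list(beta0) if beta0 is not None else [0.0] * q
    for it in range(1, max_iter + 1):
        grad = [0.0] * q                      # score X'(y - mu)
        info = [[0.0] * q for _ in range(q)]  # information X'vX
        for xi, yi, mi in zip(X, y, m):
            eta = sum(a * b for a, b in zip(xi, beta))
            p = 1.0 / (1.0 + math.exp(-eta))
            mu, v = mi * p, mi * p * (1.0 - p)
            for j in range(q):
                grad[j] += xi[j] * (yi - mu)
                for k in range(q):
                    info[j][k] += xi[j] * xi[k] * v
        step = solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            return beta, it
    raise RuntimeError("Newton-Raphson did not converge")

# Made-up, symmetric dose-response data (not the beetle data).
X = [[1.0, -2.0], [1.0, -1.0], [1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
y = [2, 5, 10, 15, 18]
m = [20, 20, 20, 20, 20]

beta_hat, n_iter = newton_raphson(X, y, m)
```

Because these made-up counts are symmetric about $x = 0$, the fitted intercept should be essentially zero, which gives a simple sanity check on the iteration. In practice one would solve the linear system with a numerical library rather than a hand-written elimination routine; the explicit version is used here only to keep the sketch self-contained.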