
Chapter 1

Logistic Regression and Newton-Raphson

    1.1 Introduction

The logistic regression model is widely used in biomedical settings to model the probability of an event as a function of one or more predictors. For a single predictor $X$, the model stipulates that the log odds of success is
$$\log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 X$$
or, equivalently,
$$p = \frac{\exp(\beta_0 + \beta_1 X)}{1 + \exp(\beta_0 + \beta_1 X)},$$
where $p$ is the event probability. Depending on the sign of $\beta_1$, $p$ either increases or decreases with $X$ and follows a sigmoidal trend. If $\beta_1 = 0$, then $p$ does not depend on $X$.

[Figure: two panels. Left, "Logit Scale": log-odds versus $X$, straight lines with negative, zero, and positive slope. Right, "Probability Scale": probability versus $X$, the corresponding sigmoidal curves between 0 and 1.]
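The probability-scale panel can be sketched in base R; a minimal illustration (the helper inv_logit and the slope values of ±1 are my own choices):

# Sigmoidal trend of p versus X for negative, zero, and positive slopes
inv_logit <- function(eta) 1 / (1 + exp(-eta))
x <- seq(-5, 5, length.out = 200)
plot(x, inv_logit(x), type = "l", ylim = c(0, 1),
     xlab = "X", ylab = "Probability")    # + slope
lines(x, inv_logit(-x), lty = 2)          # - slope
lines(x, inv_logit(0 * x), lty = 3)       # 0 slope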

Note that the logit transformation is undefined when $p = 0$ or $p = 1$. To overcome this problem, researchers use the empirical logits, defined by
$$\log\left(\frac{p + 0.5/n}{1 - p + 0.5/n}\right),$$
where $n$ is the sample size or the number of observations on which $p$ is based.
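A minimal R helper for the empirical logit of $y$ successes out of $n$ trials (the name emp_logit is my own):

# Empirical logit: finite even when y = 0 or y = n
emp_logit <- function(y, n) {
  p <- y / n
  log((p + 0.5 / n) / (1 - p + 0.5 / n))
}
emp_logit(c(0, 15, 30), 30)   # defined at both boundaries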

Example: Mortality of confused flour beetles. The aim of an experiment originally reported by Strand (1930) and quoted by Bliss (1935) was to assess the response of the confused flour beetle, Tribolium confusum, to gaseous carbon disulphide (CS$_2$). In the experiment, prescribed volumes of liquid carbon disulphide were added to flasks in which a tubular cloth cage containing a batch of about thirty beetles was suspended. Duplicate batches of beetles were used for each concentration of CS$_2$. At the end of a five-hour period, the proportion killed was recorded and the actual concentration of gaseous CS$_2$ in the flask, measured in mg/l, was determined by a volumetric analysis. The mortality data are given in the table below.

## Beetles data set
# conc = CS2 concentration (mg/l)
# y    = number of beetles killed
# n    = number of beetles exposed
# rep  = Replicate number (1 or 2)
beetles

In a number of articles that refer to these data, the responses from the first two concentrations are omitted because of apparent non-linearity. Bliss himself remarks that

    . . . in comparison with the remaining observations, the two lowest concentrations gave an exceptionally high kill. Over the remaining concentrations, the plotted values seemed to form a moderately straight line, so that the data were handled as two separate sets, only the results at 56.91 mg of CS$_2$ per litre being included in both sets.

However, there does not appear to be any biological motivation for this, and so here they are retained in the data set.

Combining the data from the two replicates and plotting the empirical logit of the observed proportions against concentration gives a relationship that is better fit by a quadratic than a linear relationship,
$$\log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 X + \beta_2 X^2.$$
The right plot below shows the linear and quadratic model fits to the observed values with point-wise 95% confidence bands on the logit scale, and on the left is the same on the proportion scale.

[Figure: observed and predicted mortality versus conc, with linear and quadratic fits, point-wise 95% confidence bands, and replicates 1 and 2 distinguished. Left panel: "Observed and predicted mortality, probability scale" (p.hat). Right panel: "Observed and predicted mortality, logit scale" (emp.logit).]
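The fits can be reproduced with R's glm(); a minimal sketch, assuming the beetles data frame above (the object names are my own):

# Binomial GLM fits, linear and quadratic in concentration
fit_lin  <- glm(cbind(y, n - y) ~ conc,             family = binomial, data = beetles)
fit_quad <- glm(cbind(y, n - y) ~ conc + I(conc^2), family = binomial, data = beetles)
summary(fit_quad)$coefficients   # look at the I(conc^2) row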

We will focus on how to compute maximum likelihood estimates (MLEs) of the parameters of a logistic regression model.

    1.2 The Model

Suppose $Y_i \overset{\text{ind}}{\sim} \mathrm{Binomial}(m_i, p_i)$, $i = 1, 2, \ldots, n$. For example, $Y_i$ is the number of beetle deaths from a total of $m_i$ beetles at concentration $X_i$ over the $i = 1, 2, \ldots, n$ concentrations. Note that $m_i$ can equal 1 (and often does in observational studies). Recall that the probability mass function for a Binomial is
$$\Pr[Y_i = y_i \mid p_i] = \binom{m_i}{y_i} p_i^{y_i} (1 - p_i)^{m_i - y_i}, \quad y_i = 0, 1, 2, \ldots, m_i.$$
So the joint distribution of $Y_1, Y_2, \ldots, Y_n$ is
$$\Pr[Y_1 = y_1, \ldots, Y_n = y_n \mid p_1, \ldots, p_n] = \prod_{i=1}^{n} \binom{m_i}{y_i} p_i^{y_i} (1 - p_i)^{m_i - y_i}.$$

The log-likelihood, ignoring the constant, is
$$\begin{aligned}
\ell &= \log\{\Pr[Y_1 = y_1, \ldots, Y_n = y_n \mid p_1, \ldots, p_n]\} \\
&\propto \log\left\{\prod_{i=1}^{n} p_i^{y_i} (1 - p_i)^{m_i - y_i}\right\} \\
&= \sum_{i=1}^{n} \{y_i \log(p_i) + (m_i - y_i) \log(1 - p_i)\} \\
&= \sum_{i=1}^{n} \left\{m_i \log(1 - p_i) + y_i \log\left(\frac{p_i}{1 - p_i}\right)\right\}. \quad (1.1)
\end{aligned}$$
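As a quick numerical check of (1.1) in R (the values of p, m, and y below are arbitrary illustrations):

p <- c(0.2, 0.5, 0.8); m <- c(30, 30, 30); y <- c(5, 14, 25)
sum(y * log(p) + (m - y) * log(1 - p))            # (1.1), constant dropped
sum(dbinom(y, m, p, log = TRUE) - lchoose(m, y))  # agrees with the line above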

The logistic regression model assumes that $p_i$ depends on $r$ covariates $x_{i1}, x_{i2}, \ldots, x_{ir}$ through
$$\log\left(\frac{p_i}{1 - p_i}\right) = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_r x_{ir}
= \begin{bmatrix} 1 & x_{i1} & x_{i2} & \cdots & x_{ir} \end{bmatrix}
\begin{bmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \\ \vdots \\ \beta_r \end{bmatrix}
= x_i^\top \beta.$$
The covariates or predictors $x_i$ are fixed, while $\beta$ is an unknown parameter vector. Regardless, $p_i$ is a function of both $x_i$ and $\beta$, $p_i \equiv p_i(x_i, \beta)$ or $p_i(\beta)$ (suppressing $x_i$, since it is known).

Note that the model implies
$$p_i = \frac{\exp(x_i^\top \beta)}{1 + \exp(x_i^\top \beta)} \quad \text{and} \quad 1 - p_i = \frac{1}{1 + \exp(x_i^\top \beta)}.$$
To obtain the MLEs we first write the log-likelihood in (1.1) as a function of $\beta$,
$$\begin{aligned}
\ell(\beta) &= \sum_{i=1}^{n} \left\{ m_i \log\left(\frac{1}{1 + \exp(x_i^\top \beta)}\right) + y_i \log\left(\frac{\exp(x_i^\top \beta)/(1 + \exp(x_i^\top \beta))}{1/(1 + \exp(x_i^\top \beta))}\right) \right\} \\
&= \sum_{i=1}^{n} \left\{ m_i \log\left(\frac{1}{1 + \exp(x_i^\top \beta)}\right) + y_i (x_i^\top \beta) \right\} \\
&= \sum_{i=1}^{n} \left\{ y_i (x_i^\top \beta) - m_i \log(1 + \exp(x_i^\top \beta)) \right\}. \quad (1.2)
\end{aligned}$$
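Expression (1.2) translates directly into R; a minimal sketch, where the names loglik, X, y, and m are mine (X is the $n \times (r+1)$ design matrix whose first column is all 1s):

# Log-likelihood (1.2) as a function of beta
loglik <- function(beta, X, y, m) {
  eta <- drop(X %*% beta)               # eta_i = x_i' beta
  sum(y * eta - m * log(1 + exp(eta)))
}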

To maximize $\ell(\beta)$, we compute the score function
$$S(\beta) = \begin{bmatrix} \partial \ell(\beta)/\partial \beta_0 \\ \partial \ell(\beta)/\partial \beta_1 \\ \vdots \\ \partial \ell(\beta)/\partial \beta_r \end{bmatrix}$$
and solve the likelihood equations
$$S(\beta) = 0_{r+1}.$$

Note that $S(\beta)$ is an $(r+1)$-by-1 vector, so we are solving a system of $r+1$ non-linear equations.

Let us now compute $\partial \ell(\beta)/\partial \beta_j$, where $\beta_j$ is a generic element of $\beta$. It is important to realize that $\ell(\beta)$ depends on the elements of $\beta$ only through the values of $x_i^\top \beta$, which is linear in $\beta$. Thus each of the partial derivatives in $S(\beta)$ will have the same form! Now
$$\frac{\partial \ell(\beta)}{\partial \beta_j} = \sum_{i=1}^{n} \left\{ y_i \frac{\partial}{\partial \beta_j}(x_i^\top \beta) - m_i \frac{\partial}{\partial \beta_j} \log(1 + \exp(x_i^\top \beta)) \right\} \quad (1.3)$$
where
$$\frac{\partial}{\partial \beta_j}(x_i^\top \beta) = \frac{\partial}{\partial \beta_j}\{\beta_0 + \beta_1 x_{i1} + \cdots + \beta_r x_{ir}\} = x_{ij} \quad (\text{where } x_{i0} \equiv 1) \quad (1.4)$$
and
$$\frac{\partial}{\partial \beta_j} \log(1 + \exp(x_i^\top \beta)) = \frac{\exp(x_i^\top \beta)}{1 + \exp(x_i^\top \beta)} \frac{\partial}{\partial \beta_j}(x_i^\top \beta) = p_i(x_i, \beta)\, x_{ij}, \quad (1.5)$$
and so
$$\frac{\partial \ell(\beta)}{\partial \beta_j} = \sum_{i=1}^{n} \{y_i x_{ij} - m_i p_i(x_i, \beta) x_{ij}\} = \sum_{i=1}^{n} x_{ij}\{y_i - m_i p_i(x_i, \beta)\}, \quad j = 0, 1, \ldots, r. \quad (1.6)$$
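A minimal R sketch of the score (1.6), written in the vector form derived below (the name score is mine):

# Score: S(beta) = X' (y - m * p(beta)), with elementwise m * p
score <- function(beta, X, y, m) {
  p <- 1 / (1 + exp(-drop(X %*% beta)))   # p_i(x_i, beta)
  drop(t(X) %*% (y - m * p))
}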

For NR, we also need the second partial derivatives
$$\frac{\partial^2 \ell}{\partial \beta_j \partial \beta_k} = \frac{\partial}{\partial \beta_k}\left(\frac{\partial \ell(\beta)}{\partial \beta_j}\right) = \sum_{i=1}^{n} x_{ij} \frac{\partial}{\partial \beta_k}\{y_i - m_i p_i(x_i, \beta)\} = -\sum_{i=1}^{n} x_{ij}\, m_i \frac{\partial p_i(x_i, \beta)}{\partial \beta_k}.$$
It is straightforward to show that
$$\frac{\partial p_i(x_i, \beta)}{\partial \beta_k} = x_{ik}\, p_i(x_i, \beta)\{1 - p_i(x_i, \beta)\}.$$
So
$$\frac{\partial^2 \ell}{\partial \beta_j \partial \beta_k} = -\sum_{i=1}^{n} x_{ij} x_{ik}\, m_i p_i(x_i, \beta)\{1 - p_i(x_i, \beta)\}.$$

Recall that $\mathrm{Var}(Y_i) = m_i p_i(x_i, \beta)\{1 - p_i(x_i, \beta)\}$, from the variance of the binomial distribution. Let $\mathrm{Var}(Y_i) = v_i(\beta) = v_i(x_i, \beta)$.

For programming, it is convenient to use vector/matrix notation. Let
$$Y = \begin{bmatrix} Y_1 \\ \vdots \\ Y_n \end{bmatrix}, \quad p = \begin{bmatrix} p_1 \\ \vdots \\ p_n \end{bmatrix}, \quad m = \begin{bmatrix} m_1 \\ \vdots \\ m_n \end{bmatrix}, \quad X = \begin{bmatrix} x_1^\top \\ \vdots \\ x_n^\top \end{bmatrix}, \quad \log\left(\frac{p}{1-p}\right) = \begin{bmatrix} \log\left(\frac{p_1}{1-p_1}\right) \\ \vdots \\ \log\left(\frac{p_n}{1-p_n}\right) \end{bmatrix},$$
where the last expression operates elementwise. The model can be written
$$\log\left(\frac{p}{1-p}\right) = X\beta,$$
or, for the $i$th element,
$$\log\left(\frac{p_i}{1-p_i}\right) = x_i^\top \beta.$$
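In R, the design matrix $X$ for the beetles example can be built with model.matrix(); a minimal sketch, again assuming the beetles data frame above:

X <- model.matrix(~ conc, data = beetles)   # columns: (Intercept), conc
y <- beetles$y
m <- beetles$n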

Also, define vectors
$$\exp(X\beta) = \begin{bmatrix} \exp(x_1^\top \beta) \\ \vdots \\ \exp(x_n^\top \beta) \end{bmatrix}, \quad \text{which implies} \quad p = \frac{\exp(X\beta)}{1 + \exp(X\beta)},$$
and
$$\log(1 + \exp(X\beta)) = \begin{bmatrix} \log(1 + \exp(x_1^\top \beta)) \\ \vdots \\ \log(1 + \exp(x_n^\top \beta)) \end{bmatrix},$$
where operations are performed elementwise.

Then
$$\begin{aligned}
\ell(\beta) &= \sum_{i=1}^{n} \{y_i \log(p_i) + (m_i - y_i) \log(1 - p_i)\} \\
&= y^\top \log(p) + (m - y)^\top \log(1 - p) \\
&= \sum_{i=1}^{n} \{y_i (x_i^\top \beta) - m_i \log(1 + \exp(x_i^\top \beta))\} \\
&= y^\top X\beta - m^\top \log(1 + \exp(X\beta)) \quad (1.7)
\end{aligned}$$
and
$$S(\beta) = \begin{bmatrix} \partial \ell(\beta)/\partial \beta_0 \\ \partial \ell(\beta)/\partial \beta_1 \\ \vdots \\ \partial \ell(\beta)/\partial \beta_r \end{bmatrix} = X^\top (y - m \odot p(\beta)),$$

where $\odot$ denotes the Hadamard or elementwise product, so that
$$m \odot p(\beta) = \begin{bmatrix} m_1 p_1(\beta) \\ \vdots \\ m_n p_n(\beta) \end{bmatrix}.$$
If we think of
$$\mathrm{E}[Y] = \begin{bmatrix} \mathrm{E}[Y_1] \\ \vdots \\ \mathrm{E}[Y_n] \end{bmatrix} = \begin{bmatrix} m_1 p_1(\beta) \\ \vdots \\ m_n p_n(\beta) \end{bmatrix} = \begin{bmatrix} \mu_1(\beta) \\ \vdots \\ \mu_n(\beta) \end{bmatrix} \equiv \mu(\beta),$$
then the likelihood equations have the form
$$S(\beta) = X^\top (y - m \odot p(\beta)) = X^\top (y - \mu(\beta)) = 0.$$
This is the same form as the normal equations for computing LS estimates in normal-theory regression. Also, with
$$H(\beta) = \left[\frac{\partial^2 \ell}{\partial \beta_j \partial \beta_k}\right] = \left[-\sum_{i=1}^{n} x_{ij} x_{ik} v_i(\beta)\right],$$
if we define the diagonal matrix
$$v(\beta) = \mathrm{diag}(v_1(\beta), v_2(\beta), \ldots, v_n(\beta)) = \begin{bmatrix} v_1(\beta) & & & 0 \\ & v_2(\beta) & & \\ & & \ddots & \\ 0 & & & v_n(\beta) \end{bmatrix},$$
then it is easy to see that
$$H(\beta) = -X^\top v(\beta) X,$$
that is, the $j$th row and $k$th column element of $X^\top v(\beta) X$ is $\sum_{i=1}^{n} x_{ij} x_{ik} v_i(\beta)$.
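A matching R sketch of the Hessian (the name hessian is mine; v * X scales row $i$ of X by $v_i(\beta)$):

# Hessian: H(beta) = -X' v(beta) X
hessian <- function(beta, X, m) {
  p <- 1 / (1 + exp(-drop(X %*% beta)))
  v <- m * p * (1 - p)                    # v_i(beta) = Var(Y_i)
  -t(X) %*% (v * X)
}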

It is important to recognize that for the logistic regression model
$$I(\beta) = \mathrm{E}[-H(\beta)] = X^\top v(\beta) X = -H(\beta),$$
that is, the NR and scoring methods are equivalent. In particular, the NR method iterates via
$$\beta^{(i+1)} = \beta^{(i)} - [H(\beta^{(i)})]^{-1} S(\beta^{(i)}) = \beta^{(i)} + (X^\top v(\beta^{(i)}) X)^{-1} X^\top (y - \mu(\beta^{(i)})), \quad i = 0, 1, \ldots,$$
until convergence (hopefully) to the MLE $\hat{\beta}$.

I will note that the observed information matrix $-H(\beta)$ is independent of $Y$ for logistic regression with the logit link, but not for other binomial response models, such as probit regression. Thus, for other models there is a difference between NR and Fisher scoring.
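Putting the pieces together, a minimal Newton-Raphson loop built from the score and hessian sketches above (newton_logistic, the starting value, and the convergence settings are my own; since glm() fits the same logit-link model by the equivalent IRLS algorithm, it serves as a check):

# Newton-Raphson iteration for the logistic regression MLE
newton_logistic <- function(X, y, m, beta = rep(0, ncol(X)),
                            tol = 1e-8, maxit = 25) {
  for (it in 1:maxit) {
    step <- solve(hessian(beta, X, m), score(beta, X, y, m))
    beta <- beta - step                   # beta - H^{-1} S
    if (max(abs(step)) < tol) break
  }
  beta
}
# Check against R's built-in fit:
# newton_logistic(X, y, m)
# coef(glm(cbind(y, m - y) ~ X - 1, family = binomial))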