
  • 22s:152 Applied Linear Regression

    Ch. 14 (sec. 1) and Ch. 15 (sec. 1 & 4): Logistic Regression
    ————————————————————

    Logistic Regression

    • When the response variable is a binary variable, such as

      – 0 or 1
      – live or die
      – fail or succeed

      then we approach our modeling a little differently.

    • What is wrong with our previous modeling?
      – Let’s look at an example...

  • Example: Study on lead levels in children

    Binary response called highbld:
    either high lead blood level (1) or low lead blood level (0)

    One continuous predictor variable called soil:
    a measure of the level of lead in the soil in the subject’s backyard

    > sl=read.csv("soillead.csv")
    > attach(sl)
    > head(sl)
      highbld soil
    1       1 1290
    2       0   90
    3       1  894
    4       0  193
    5       1 1410
    6       1  410

  • The dependent variable plotted against the independent variable:

    [Figure: two scatterplots of highbld (0 to 1) vs. soil (0 to 5000), one with the original points and one with the points jittered.]

  • Consider the usual regression model:

      Yi = β0 + β1xi + εi

    The fitted model and residuals:

      Ŷ = 0.3178 + 0.0003x

    [Figure: three diagnostic plots: the fitted line over the jittered points (highbld vs. soil), lm.out$residuals vs. lm.out$fitted.values, and a normal Q-Q plot of the residuals.]

    All sorts of violations here...
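    A minimal R sketch of how this fit and the diagnostic plots above might be produced (assuming soillead.csv has been read and attached as earlier):

    > lm.out = lm(highbld ~ soil)        # ordinary regression on the 0/1 response
    > coef(lm.out)                       # approx. 0.3178 (intercept) and 0.0003 (slope)
    > plot(soil, jitter(highbld, factor=0.12)); abline(lm.out)    # fitted line over jittered points
    > plot(lm.out$fitted.values, lm.out$residuals)                # residuals vs. fitted values
    > qqnorm(lm.out$residuals)                                    # normal Q-Q plot of residuals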

  • Violations:

    – Our usual εi ∼ N(0, σ²) isn’t reasonable.

      The errors are not normal, and they don’t have the same variability across all x-values.

    – Predictions for observations can be outside [0,1].

      For x = 3000, Ŷ = 1.14, which cannot possibly be an average of the 0s and 1s at x = 3000 (like a conditional mean given x).

      Ŷ at x = 3000 is 1.14, which is not in [0,1].

  • What is a better way to model a 0-1 response using regression?

    • At each x value, there’s a certain chance of a 0 and a certain chance of a 1.

    [Figure: jittered scatterplot of highbld (0 to 1) vs. soil (0 to 5000).]

    • For example, in this data set, it looks more likely to get a 1 at higher values of x.

    • Or, P(Y = 1) changes with the x-value.

  • Conditioning on a given x, we have P(Y = 1), and P(Y = 1) ∈ (0, 1).

    • We will consider this probability of getting a 1 given the x-value(s) in our modeling...

      πi = P(Yi = 1|Xi)

      The previous plot shows that P(Y = 1) depends on the value of x.

    • It is actually a transformation of this probability (πi) that we will use as our response in the regression model.

  • Odds of an Event

    • First, let’s discuss the odds of an event.

      When two fair coins are flipped,
      P(two heads) = 1/4
      P(not two heads) = 3/4

      The odds in favor of getting two heads are:

        odds = P(2 heads)/P(not 2 heads) = (1/4)/(3/4) = 1/3

      sometimes referred to as 1 to 3 odds.

      You’re 3 times as likely to not get 2 heads as you are to get 2 heads.
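    To make the arithmetic concrete, a quick check in R:

    > p = 1/4        # P(two heads)
    > p/(1 - p)      # odds in favor of two heads
    [1] 0.3333333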

  • For a binary variable Y (2 possible outcomes), the odds in favor of Y = 1 are

      P(Y = 1)/P(Y = 0) = P(Y = 1)/(1 − P(Y = 1))

    • For example, if P(heart attack) = 0.0018, then the odds of a heart attack are

      0.0018/0.9982 = 0.0018/(1 − 0.0018) = 0.001803

    • The ratio of the odds for two different groups is also a quantity of interest.

      For example, consider heart attacks for “male nonsmoker vs. male smoker”.

      Suppose P(heart attack) = 0.0036 for a male smoker, and P(heart attack) = 0.0018 for a male nonsmoker.

  • Then, the odds ratio (O.R.) for a heart attack in nonsmoker vs. smoker is

      O.R. = (odds of a heart attack for nonsmoker) / (odds of a heart attack for smoker)

           = [Pr(heart attack|nonsmoker) / (1 − Pr(heart attack|nonsmoker))] / [Pr(heart attack|smoker) / (1 − Pr(heart attack|smoker))]

           = (0.0018/0.9982) / (0.0036/0.9964) = 0.4991

    • Interpretation of the odds ratio for a binary 0-1 response:

      – O.R. ≥ 0
      – If O.R. = 1.0, then P(Y = 1) is the same in both samples.
      – If O.R. < 1.0, then P(Y = 1) is less in the numerator group than in the denominator group.
      – O.R. = 0 if and only if P(Y = 1) = 0 in the numerator sample.
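    A quick R check of this odds ratio (the helper function odds() here is just for illustration):

    > odds = function(p) p/(1 - p)     # odds in favor of an event with probability p
    > odds(0.0018)/odds(0.0036)        # approx 0.4991, matching the hand calculation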

  • Back to Logistic Regression...

    • The response variable we will model is a transformation of P(Yi = 1) for a given Xi.

    • The transformation is the logit transformation:

        logit(a) = ln[a/(1 − a)]

    • The response variable we will use:

        logit[P(Yi = 1)] = ln[P(Yi = 1)/(1 − P(Yi = 1))]

      This is the log_e of the odds that Yi = 1.

      Notice: P(Yi = 1) ∈ (0, 1),
      P(Yi = 1)/(1 − P(Yi = 1)) ∈ (0, ∞),
      and −∞ < ln[P(Yi = 1)/(1 − P(Yi = 1))] < ∞.
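    R has this transformation and its inverse built in, as qlogis() and plogis() (the quantile and distribution functions of the logistic distribution); a small sketch:

    > qlogis(0.5)                # logit(0.5) = ln(1) = 0
    > qlogis(c(0.001, 0.999))    # probabilities near 0 and 1 map far out on the real line
    > plogis(qlogis(0.25))       # the inverse undoes the logit: 0.25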

  • The logistic regression model:

      ln[P(Yi = 1)/(1 − P(Yi = 1))] = β0 + β1X1i + · · · + βkXki

    – The response on the left isn’t ‘bounded’ by [0,1] even though the Y-values themselves are (having the response bounded by [0,1] was a problem before).

    – The response on the left can feasibly be any positive or negative quantity.

    – This is a nice characteristic because the right side of the equation can ‘potentially’ give any possible predicted value, −∞ to ∞.

    • The logistic regression model is a GENERALIZED LINEAR MODEL: a linear model on the right, something other than the usual continuous Y on the left.

  • Let’s look at the fitted logistic regression model for the lead level data with one X covariate.

      ln[P(Yi = 1)/(1 − P(Yi = 1))] = β̂0 + β̂1Xi = −1.5160 + 0.0027 Xi

    [Figure: fitted S-curve over jitter(highbld, factor = 0.12) vs. soil (0 to 5000).]

    The curve represents E(Yi|xi). And...

      E(Yi|xi) = 0 · P(Yi = 0|xi) + 1 · P(Yi = 1|xi) = P(Yi = 1|xi)
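    A sketch of how this fitted curve might be drawn in R (glm.out is the fit that appears later in these notes; predict() with type="response" returns P(Yi = 1)):

    > glm.out = glm(highbld ~ soil, family=binomial(logit))
    > plot(soil, jitter(highbld, factor=0.12))
    > curve(predict(glm.out, data.frame(soil=x), type="response"), add=TRUE)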

  • We can manipulate the regression model to put it in terms of P(Yi = 1) = E(Yi).

    • If we take this regression model

        ln[P(Yi = 1)/(1 − P(Yi = 1))] = −1.5160 + 0.0027 Xi

      and solve for P(Yi = 1), we get...

        P(Yi = 1) = exp(−1.5160 + 0.0027 Xi) / [1 + exp(−1.5160 + 0.0027 Xi)]

      – The value on the right is bounded to [0,1].

      – Because our β1 is positive,
        ∗ As X gets larger, P(Yi = 1) goes to 1.
        ∗ As X gets smaller, P(Yi = 1) goes to 0.

      – The fitted curve, i.e. the function of Xi on the right, is S-shaped (sigmoidal).
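    As a check on the boundedness, plogis() evaluates this inverse-logit directly; at x = 3000, where the straight-line fit gave 1.14, the logistic fit stays inside (0,1):

    > plogis(-1.5160 + 0.0027*3000)    # approx 0.9986, a legitimate probability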

  • In the general model with one covariate:

      ln[P(Yi = 1)/(1 − P(Yi = 1))] = β0 + β1 Xi

    which means

      P(Yi = 1) = exp(β0 + β1 Xi) / [1 + exp(β0 + β1 Xi)]
                = 1 / (1 + exp[−(β0 + β1 Xi)])

    • If β1 is positive, we get the S-shape that goes to 1 as Xi goes to ∞ and goes to 0 as Xi goes to −∞ (as in the lead example).

    • If β1 is negative, we get the opposite S-shape that goes to 0 as Xi goes to ∞ and goes to 1 as Xi goes to −∞.
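    Both S-shapes are easy to see with plogis() as the inverse logit (the coefficient values here are arbitrary, for illustration only):

    > curve(plogis(0 + 2*x), from=-5, to=5, ylab="P(Y=1)")   # beta1 > 0: rises toward 1
    > curve(plogis(0 - 2*x), add=TRUE, lty=2)                # beta1 < 0: falls toward 0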

  • Another way to think of logistic regression is that we’re modeling a Bernoulli random variable occurring for each Xi, and the Bernoulli parameter πi depends on the covariate value.

    Then, Yi|Xi ∼ Bernoulli(πi), where πi represents P(Yi = 1|Xi):

      E(Yi|Xi) = πi and V(Yi|Xi) = πi(1 − πi)

    We’re thinking in terms of the conditional distribution of Y|X (we don’t have constant variance across the X values; mean and variance are tied together).

    Writing πi as a function of Xi,

      πi = exp(β0 + β1 Xi) / [1 + exp(β0 + β1 Xi)]
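    A one-line sketch of that mean-variance tie, plotting V = πi(1 − πi) as πi ranges over (0, 1):

    > curve(x*(1 - x), from=0, to=1, xlab="pi", ylab="pi*(1-pi)")
    # the variance peaks at pi = 0.5 and shrinks toward 0 at either extreme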

  • Interpretation of parameters
    For the lead levels example.

    • Intercept β0:

      When Xi = 0, there is no lead in the soil in the backyard. Then,

        ln[P(Yi = 1)/(1 − P(Yi = 1))] = β0

      So, β0 is the log-odds that a randomly selected child with no lead in their backyard has a high lead blood level.

      Or, πi = e^β0 / (1 + e^β0) is the chance that a kid with no lead in their backyard has a high lead blood level.

      For this data, β̂0 = −1.5160, or π̂i = 0.1800.
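    A quick numerical check of that intercept interpretation in R:

    > plogis(-1.5160)    # e^beta0/(1 + e^beta0), approx 0.1800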

  • Interpretation of parameters
    For the lead levels example.

    • Coefficient β1:

      Consider the log_e of the odds ratio (O.R.) of having high lead blood levels for the following 2 groups...

      1) those with exposure level of x
      2) those with exposure level of x + 1

        log_e{ [π2/(1 − π2)] / [π1/(1 − π1)] } = log_e[ e^(β0 + β1(x+1)) / e^(β0 + β1 x) ] = log_e(e^β1) = β1

      β1 is the log_e of the O.R. for a 1 unit increase in x.

      It compares the groups with x exposure and (x + 1) exposure.

  • Or, un-doing the log,

      e^β1 is the O.R. comparing the two groups.

    For this data, β̂1 = 0.0027, or e^0.0027 = 1.0027.

    A 1 unit increase in X increases the odds of having a high lead blood level by a factor of 1.0027.

    Though this value is small, the range of the X values is large (40 to 5255), so it can have a substantial impact when you consider the full spectrum of x values.
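    In R, exponentiating the fitted coefficient gives this O.R. directly; scaling the coefficient first gives the O.R. for a larger increase:

    > exp(coef(glm.out)["soil"])         # approx 1.0027 per 1 unit of soil
    > exp(100*coef(glm.out)["soil"])     # approx 1.31 per 100 units of soil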

  • Prediction

    • What is the predicted probability of high lead blood level for a child with X = 500?

        P(Yi = 1) = exp(−1.5160 + 0.0027 Xi) / [1 + exp(−1.5160 + 0.0027 Xi)]
                  = 1 / (1 + exp[−(−1.5160 + 0.0027 Xi)])
                  = 1 / (1 + exp[1.5160 − 0.0027 × 500]) = 0.4586

    • What is the predicted probability of high lead blood level for a child with X = 4000?

        P(Yi = 1) = 1 / (1 + exp[1.5160 − 0.0027 × 4000]) = 0.9999

    [Figure: fitted S-curve over jitter(highbld, factor = 0.12) vs. soil (0 to 5000).]
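    These predictions can also be obtained from the fitted model with predict(); type="response" returns probabilities rather than log-odds:

    > predict(glm.out, newdata=data.frame(soil=c(500, 4000)), type="response")
    # close to the hand calculations above (small differences come from coefficient rounding)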

  • What is the predicted probability of high lead blood level for a child with X = 0?

      P(Yi = 1) = 1 / (1 + exp[1.5160]) = 0.1800

    So, there’s still a chance of having a high lead blood level even when the backyard doesn’t have any lead. This is why we don’t see the low end of our fitted S-curve go to zero.

    Testing Significance of Covariate

    > glm.out=glm(highbld~soil,family=binomial(logit))
    > summary(glm.out)

    Coefficients:
                  Estimate Std. Error z value Pr(>|z|)
    (Intercept) -1.5160568  0.3380483  -4.485 7.30e-06 ***
    soil         0.0027202  0.0005385   5.051 4.39e-07 ***
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    Soil is a significant predictor in the logistic regression model.

  • A couple things...

    • The Method of Maximum Likelihood is used to fit the model.

      Estimates β̂j are asymptotically normal, so R uses Z-tests (or Wald tests) for covariate significance in the output. [NOTE: squaring a z-statistic gets you a chi-squared statistic with 1 df.]

    • The general concepts extend easily to logistic regression with many covariates.
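    A small check of that note, using the z-statistic for soil from the summary output: squaring z and comparing to a chi-squared with 1 df reproduces the two-sided Wald p-value.

    > 2*pnorm(-5.051)                           # two-sided Wald p-value, approx 4.4e-07
    > pchisq(5.051^2, df=1, lower.tail=FALSE)   # identical, via the chi-squared with 1 df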