April 4 Logistic Regression Lee Chapter 9 Cody and Smith 9:F

Upload: madeline-underwood

Post on 13-Jan-2016


Page 1: April 4 Logistic Regression –Lee Chapter 9 –Cody and Smith 9:F

April 4

• Logistic Regression
  – Lee Chapter 9
  – Cody and Smith 9:F

Page 2

HRT Use and Polyps

              Case (Polyps)   Control (No Polyps)   Total
HRT Use             72               175             247
No HRT Use         102               114             216
Total              174               289             463

Relative odds (RO) of HRT use, case v control:

RO = (72/102) / (175/114) = 0.46

$\chi^2 = \frac{463\,(72 \cdot 114 - 175 \cdot 102)^2}{(174)(289)(247)(216)} = 16.04$
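Worked check (not part of the original slides): a minimal Python sketch that recomputes the relative odds and the chi-square statistic from the four cell counts of the HRT and polyps table. The variable names are illustrative, and the chi-square is assumed to be the usual Pearson statistic for a 2x2 table without continuity correction, which reproduces the 16.04 quoted on the slide.

```python
# Cell counts from the HRT / polyps table.
a, b = 72, 175    # HRT use: cases, controls
c, d = 102, 114   # no HRT use: cases, controls
n = a + b + c + d

# Relative odds of HRT use, cases versus controls: (a/c) / (b/d).
relative_odds = (a / c) / (b / d)

# Pearson chi-square for a 2x2 table (no continuity correction):
# n * (ad - bc)^2 divided by the product of the four marginal totals.
chi_square = n * (a * d - b * c) ** 2 / ((a + c) * (b + d) * (a + b) * (c + d))

print(f"RO = {relative_odds:.2f}")
print(f"chi-square = {chi_square:.2f}")
```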

Page 3

Inference for binary data

• Relative risk, odds ratios, 2x2 tables are limited
  – Can’t adjust for many confounders
  – Limited to categorical predictors
  – Can’t look at multiple variables simultaneously

• Logistic regression
  – Adjust for many confounders
  – Study continuous predictors
  – Model interactions

Page 4

Linear regression model

$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p$

Y = dependent variable
$X_i$ = independent variables

Y is continuous, normally distributed

Model the mean response (Y) based on the predictors

$\beta_0$ is the mean of Y when all Xs are 0
$\beta_i$ is the increase in the mean of Y for a 1 unit increase in $X_i$

Page 5

New regression model?

$Y \stackrel{?}{=} \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p$

Y = binary outcome (0 or 1)

Xi = independent variables

Would like to use this type of model for a binary outcome variable

Page 6

Draw a line ?

Page 7

What if you had multiple observations at each Score (or you grouped scores)?

Score     Proportion Dying
< 10      1/10 = 0.10
11-20     4/15 = 0.27
21-30     5/15 = 0.33
31-40     8/16 = 0.50

Page 8

Possibilities for Y

$Y \stackrel{?}{=} \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p$

Y = probability of Y = 1 (Problem: Y is bounded between 0 and 1)

Y = odds of Y = 1

Y = log (odds of Y = 1) – Has good properties

Page 9

Probability, Odds, Log Odds

Probability    Odds     Log(Odds)
0.01            0.01     -4.60
0.10            0.11     -2.20
0.20            0.25     -1.39
0.30            0.43     -0.85
0.40            0.67     -0.41
0.50            1.00      0.00
0.60            1.50      0.41
0.70            2.33      0.85
0.80            4.00      1.39
0.90            9.00      2.20
0.99           99.00      4.60

Probability: bounded between 0 and 1
Odds: extreme values
Log(Odds): less extreme values, and symmetric about P = 0.5
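Worked check (not part of the original slides): a short Python sketch that converts each probability to odds and log odds, which is one way to regenerate the probability, odds, and log odds table. The function names `odds` and `log_odds` are illustrative only.

```python
import math


def odds(p):
    """Odds corresponding to probability p (0 < p < 1)."""
    return p / (1.0 - p)


def log_odds(p):
    """Natural log of the odds, also called the logit of p."""
    return math.log(odds(p))


probabilities = [0.01, 0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80, 0.90, 0.99]

print("Probability     Odds  Log(Odds)")
for p in probabilities:
    print(f"{p:11.2f} {odds(p):8.2f} {log_odds(p):10.2f}")
```

The symmetry about 0.5 shows up directly: the log odds at p and at 1 - p have the same size and opposite sign.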

Page 10

Page 11

Nearly a straight line for middle values of P

Page 12

Logistic regression equation

Model log odds of outcome as a linear function of one or more variables

The model is:

$\log\left(\frac{\pi}{1-\pi}\right) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots$

$\pi$ = probability of the outcome
$x_i$ = predictors, independent variables

Page 13

A Little Math

• The natural LOG and exponential (EXP) functions are inverse functions of each other
  – LOG(a) = b          EXP(b) = a
  – LOG(1) = 0          EXP(0) = 1
  – LOG(.5) = -0.693    EXP(-.693) = .5
  – LOG(1.5) = .405     EXP(.405) = 1.5

Values such as -0.693 and .405 will be logistic regression betas; values such as .5 and 1.5 will be the odds ratios

Note: Calculators and Excel use LN for natural logarithm

Page 14

A Little Math

• LOG function
  – Takes values from 0 to +infinity and returns values from -infinity to +infinity

• EXP function
  – Takes values from -infinity to +infinity and returns values from 0 to +infinity

Page 15

A Little Math

• Properties of LOG function
  – log (a*b) = log (a) + log (b)
  – log (a/b) = log (a) – log (b)

• Properties of EXP function
  – exp (a+b) = exp(a) * exp(b)
  – exp (a-b) = exp(a)/exp(b)

Differences in log odds correspond to odds ratios
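Worked check (not part of the original slides): a small Python sketch of why a difference in log odds corresponds to an odds ratio. It reuses the HRT and polyps counts from the 2x2 table earlier in the deck; the variable names are illustrative.

```python
import math

odds_cases = 72 / 102       # odds of HRT use among cases
odds_controls = 175 / 114   # odds of HRT use among controls

# The odds ratio computed directly as a ratio of odds.
odds_ratio = odds_cases / odds_controls

# log(a/b) = log(a) - log(b): the log odds ratio is a difference in log odds.
log_odds_difference = math.log(odds_cases) - math.log(odds_controls)

# exp undoes log, so exponentiating the difference recovers the odds ratio.
recovered_ratio = math.exp(log_odds_difference)

print(f"difference in log odds = {log_odds_difference:.4f}")
print(f"odds ratio = {odds_ratio:.2f}, exp(difference) = {recovered_ratio:.2f}")
```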

Page 16

(ODDS)

Page 17

These will be typical betas from the logistic regression model

These will be the odds ratios

Page 18

Logistic regression – single binary covariate

The model is:

$\log\left(\frac{\pi}{1-\pi}\right) = \beta_0 + \beta_1 x$

We need to use a dummy variable to code for men and women

x = 1 for women, 0 for men

What do the betas mean? What is the odds ratio, women versus men?

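Page 19

Odds for Men and Women

For men (x = 0):

$\log\left(\frac{\pi}{1-\pi}\right) = \beta_0 + \beta_1 x = \beta_0 + \beta_1(0) = \beta_0$

For women (x = 1):

$\log\left(\frac{\pi}{1-\pi}\right) = \beta_0 + \beta_1 x = \beta_0 + \beta_1(1) = \beta_0 + \beta_1$

After some algebra, the odds ratio is equal to:

$\frac{\text{odds for women}}{\text{odds for men}} = \exp(\beta_1)$

$\beta_1$ is the difference in log odds between women and men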
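The algebra behind the odds ratio can be written out in a few lines (a sketch added for clarity, using the slide's coding of x = 1 for women and x = 0 for men). Subtracting the log odds for men from the log odds for women leaves only the coefficient on x, and exponentiating both sides gives the odds ratio:

```latex
\begin{align*}
\log(\text{odds}_{\text{women}}) - \log(\text{odds}_{\text{men}})
  &= (\beta_0 + \beta_1) - \beta_0 = \beta_1 \\
\log\left(\frac{\text{odds}_{\text{women}}}{\text{odds}_{\text{men}}}\right)
  &= \beta_1 \\
\frac{\text{odds}_{\text{women}}}{\text{odds}_{\text{men}}}
  &= \exp(\beta_1)
\end{align*}
```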

Page 20

Example - risk of CVD for men vs. women

log(odds) = $\beta_0 + \beta_1 x$

= -2.5504 - 1.0527*x

For females: log(odds) = -2.5504 - 1.0527(1) = -3.6031

For males: log(odds) = -2.5504 - 1.0527(0) = -2.5504

Difference in log odds = -1.0527

$\exp(\beta_1)$ = odds ratio for women vs. men

Here, $\exp(\beta_1)$ = exp(-1.0527) = 0.35

Women have 65% lower odds of the outcome than men (OR < 1)
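Worked check (not part of the original slides): a minimal Python sketch that plugs the two quoted coefficients into the model to get the log odds, odds, and implied probability for each sex, and confirms that the ratio of the odds equals the exponentiated coefficient. Function and variable names are illustrative.

```python
import math

# Coefficients quoted on the slide; x = 1 for women, 0 for men.
beta0, beta1 = -2.5504, -1.0527


def log_odds(x):
    """Log odds of CVD for dummy value x."""
    return beta0 + beta1 * x


def probability(x):
    """Probability of CVD implied by the log odds."""
    o = math.exp(log_odds(x))
    return o / (1.0 + o)


odds_men = math.exp(log_odds(0))
odds_women = math.exp(log_odds(1))
odds_ratio = odds_women / odds_men

print(f"women: log odds {log_odds(1):.4f}, probability {probability(1):.3f}")
print(f"men:   log odds {log_odds(0):.4f}, probability {probability(0):.3f}")
print(f"odds ratio (women vs. men) = {odds_ratio:.2f}")
```

The 65% reduction applies to the odds. The reduction in probability is similar here only because the outcome is fairly rare in both groups.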

Page 21

Note

• Odds ratio from 2 x 2 table
• EXP($\beta$) from logistic regression for binary risk factor

• These will be equal

Page 22

Multiple logistic regression model

log(odds) = $\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p$

log(odds) = logarithm of the odds for the outcome, dependent variable

$X_i$ = predictors, independent variables

$\beta_i$ = log(OR) associated with either
• exposure (for categorical predictors)
• a 1 unit increase in predictor (for continuous)

OR adjusted for other variables in model

Page 23

Interpretation of coefficients - continuous predictors

Example - effect of age on risk of death in 10 years

log(odds) = -8.2784 + 0.1026*age

$\beta_0$ = -8.2784, $\beta_1$ = 0.1026

$\exp(\beta_1)$ = exp(0.1026) = 1.108

A one year increase in age is associated with an odds ratio of death of 1.108 (assuming this is true for any 2 consecutive ages)

This is an increase of approximately 11% in the odds (= 1.108 - 1)

Page 24

Interpretation of coefficients - continuous predictors

What about a 5 year increase in age?

Multiply the coefficient by the change you want to look at:

$\exp(5\beta_1)$ = exp(5*0.1026) = 1.67

A five year increase in age is associated with an odds ratio of death of 1.67

This is an increase of 67% in the odds

Note: $\exp(5\beta_1)$ does not equal $5\exp(\beta_1)$
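Worked check (not part of the original slides): a short Python sketch of the one year and five year odds ratios for the age coefficient 0.1026. It also shows that the five year odds ratio is the one year odds ratio raised to the fifth power, not five times it. Variable names are illustrative.

```python
import math

beta1 = 0.1026  # age coefficient from the slide (log odds per year)

or_1_year = math.exp(beta1)
or_5_years = math.exp(5 * beta1)

# exp(5*b) = exp(b)**5, so the odds ratio compounds multiplicatively per year.
or_1_year_to_the_fifth = or_1_year ** 5

# The incorrect shortcut warned about on the slide.
five_times_or = 5 * or_1_year

print(f"OR for 1 year  = {or_1_year:.3f}")
print(f"OR for 5 years = {or_5_years:.2f} (same as 1-year OR to the 5th power: {or_1_year_to_the_fifth:.2f})")
print(f"5 * exp(beta1) = {five_times_or:.2f}, which is not the 5-year OR")
```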

Page 25

Parameter Estimation

• How do we come up with estimates for $\beta_i$?

• Can’t use least squares since outcome is not continuous

• Use Maximum Likelihood Estimation (MLE)

Page 26

Maximum Likelihood Estimation

• Choose parameter estimates that maximize the probability of observing the data you observed.

• Example for estimating a proportion
  – Observe 7/10 have characteristic
  – P = 0.70 is the estimate
  – P = 0.70 is the MLE of $\pi$ (Why?)
  – Which value of $\pi$ maximizes the probability of getting 7 of 10?
  – Answer: 0.70

Page 27

MLE Simple Example

• Wish to estimate a proportion $\pi$
• Sample n = 2
  – Observe 1 of 2 have characteristic
  – $L \propto \pi(1 - \pi)$
  – What value of $\pi$ maximizes L?
  – Answer: $\pi$ = 0.5, which is p = 1/2
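Worked check (not part of the original slides): a minimal Python sketch of maximum likelihood for a proportion, done by brute-force search over a grid of candidate values. It assumes the likelihood is the binomial probability of the observed count; the helper names `binomial_likelihood` and `grid_mle` are hypothetical, and the grid search stands in for the calculus one would normally use.

```python
from math import comb


def binomial_likelihood(pi, successes, n):
    """Probability of observing `successes` out of n when the true proportion is pi."""
    return comb(n, successes) * pi ** successes * (1.0 - pi) ** (n - successes)


def grid_mle(successes, n, steps=1000):
    """Return the grid value of pi in [0, 1] with the largest likelihood."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda pi: binomial_likelihood(pi, successes, n))


print("MLE for 7 of 10:", grid_mle(7, 10))
print("MLE for 1 of 2: ", grid_mle(1, 2))
```

In both cases the maximizing value is the observed proportion, which is why the sample proportion is the MLE.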

Page 28

Fitted regression line

Curve based on:

$\log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 x$

$\beta_0$ affects location

$\beta_1$ affects curvature
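Illustration (not part of the original slides): a small Python sketch of the fitted curve obtained by solving the log odds equation for p. The coefficient values -4.0, -5.0, 0.2, and 0.4 are hypothetical, chosen only to show how changing the intercept shifts the curve and changing the slope makes it steeper.

```python
import math


def fitted_probability(x, beta0, beta1):
    """Invert log(p / (1 - p)) = beta0 + beta1 * x to get p."""
    return 1.0 / (1.0 + math.exp(-(beta0 + beta1 * x)))


# Hypothetical coefficients for illustration only.
for beta0, beta1 in [(-4.0, 0.2), (-5.0, 0.2), (-4.0, 0.4)]:
    midpoint = -beta0 / beta1  # x at which the fitted probability is 0.5
    row = ", ".join(f"{fitted_probability(x, beta0, beta1):.2f}" for x in (0, 10, 20, 30, 40))
    print(f"beta0={beta0}, beta1={beta1}: p=0.5 at x={midpoint:.1f}; p at x=0,10,20,30,40: {row}")
```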

Page 29

Inference for multiple logistic regression

• Collect data, choose model, estimate $\beta_0$ and the $\beta_i$s

• Describe odds ratios, $\exp(\beta_i)$, in statistical terms.
  – How confident are we of our estimate?
  – Is the odds ratio different from one due to chance?

Not interested in inference for $\beta_0$ (related to overall probability of outcome)

Page 30

Confidence Intervals for logistic regression coefficients

• General form of 95% CI: Estimate ± 1.96*SE
  – $b_i$ is the estimate, provided by SAS
  – SE is complicated, provided by SAS
    • Related to variability of our data and sample size

Page 31

95% Confidence Intervals for the odds ratio

• Based on transforming the 95% confidence interval for the parameter estimates:

$(e^{b_i - 1.96\,SE},\ e^{b_i + 1.96\,SE})$

• Supplied automatically by SAS

• Look to see if interval contains 1. If it does not:

“We have a statistically significant association between the predictor and the outcome controlling for all other covariates”

• Equivalent to a hypothesis test: reject Ho: OR = 1 at alpha = 0.05 when 1 is not in the interval
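Worked check (not part of the original slides): a minimal Python sketch of the 95% confidence interval for an odds ratio, obtained by exponentiating the limits of the interval for the coefficient. The estimate 0.2034 and standard error 0.0605 are the apache values from the PROC LOGISTIC output later in this deck; variable names are illustrative.

```python
import math

# Coefficient and standard error for apache, as reported by PROC LOGISTIC.
estimate, standard_error = 0.2034, 0.0605
z = 1.96  # normal quantile for a 95% interval

odds_ratio = math.exp(estimate)
lower = math.exp(estimate - z * standard_error)
upper = math.exp(estimate + z * standard_error)

# The association is significant at alpha = 0.05 when 1 lies outside the interval.
significant = not (lower <= 1.0 <= upper)

print(f"OR = {odds_ratio:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
print("interval excludes 1:", significant)
```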

Page 32

Hypothesis test for individual logistic regression coefficient

• Null and alternative hypotheses
  – Ho: $\beta_i = 0$, Ha: $\beta_i \neq 0$

• Test statistic: $\chi^2 = (b_i / SE)^2$, supplied by SAS

• p-values are supplied by SAS

• If p<0.05, “there is a statistically significant association between the predictor and outcome variable controlling for all other covariates” at alpha = 0.05
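Worked check (not part of the original slides): a short Python sketch of the Wald test for a single coefficient. It uses the apache estimate and standard error from the PROC LOGISTIC output later in this deck, and gets the p-value from the fact that a chi-square variable with 1 degree of freedom is a squared standard normal. Because the inputs are rounded to four decimals, the statistic is close to, but not exactly, the 11.3093 that SAS prints.

```python
import math

# Coefficient and standard error for apache, as reported by PROC LOGISTIC.
estimate, standard_error = 0.2034, 0.0605

# Wald chi-square statistic with 1 degree of freedom.
wald_chi_square = (estimate / standard_error) ** 2

# Upper tail of chi-square(1): P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(wald_chi_square / 2.0))

print(f"Wald chi-square = {wald_chi_square:.2f}")
print(f"p-value = {p_value:.4f}")
print("reject Ho at alpha = 0.05:", p_value < 0.05)
```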

Page 33

PROC LOGISTIC

PROC LOGISTIC DATA = dataset;
  MODEL outcome = list of x variables;
RUN;

• CLASS statement allows for categorical variables with many groups (>2)

Page 34

DATA temp;
  INPUT apache death @@;          * read (apache, death) pairs, several per line;
  xdeath = 2;                     * recode so that died = 1 sorts before survived = 2;
  IF death = 1 THEN xdeath = 1;
DATALINES;
0 0 2 0 3 0 4 0 5 0
6 0 7 0 8 0 9 0 10 0
11 0 12 0 13 0 14 0 15 0
16 0 17 1 18 1 19 0 20 0
21 1 22 1 23 0 24 1 25 1
26 1 27 0 28 1 29 1 30 1
31 1 32 1 33 1 34 1 35 1
36 1 37 1 38 1 41 0
;
PROC LOGISTIC DATA=temp;          * models the lowest ordered value, xdeath = 1;
  MODEL xdeath = apache;
RUN;
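Cross-check (not part of the original slides, which use SAS): a minimal Python sketch that fits the same one-predictor model to the apache data by Newton-Raphson, which for the logit model coincides with Fisher scoring. The function `fit_logistic` is a hypothetical helper written for this illustration, not a library routine. The data are transcribed from the DATALINES block, and the fit is expected to land close to the SAS estimates reported in the output that follows (intercept about -4.386, slope about 0.203).

```python
import math

# APACHE scores in the data: 0, then 2 through 38, then 41 (39 observations).
scores = [0] + list(range(2, 39)) + [41]
# Scores at which the patient died (death = 1) in the DATALINES block.
died_scores = {17, 18, 21, 22, 24, 25, 26} | set(range(28, 39))
died = [1 if s in died_scores else 0 for s in scores]


def fit_logistic(xs, ys, iterations=25):
    """Newton-Raphson for log(p/(1-p)) = b0 + b1*x; returns (b0, b1, se_b1)."""
    b0 = b1 = 0.0
    se_b1 = float("nan")
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += y - p            # score for the intercept
            g1 += (y - p) * x      # score for the slope
            h00 += w               # information matrix entries
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Step = inverse information matrix times the score vector.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
        se_b1 = math.sqrt(h00 / det)  # slope SE from the inverse information
    return b0, b1, se_b1


b0, b1, se_b1 = fit_logistic(scores, died)
print(f"intercept = {b0:.4f}, slope = {b1:.4f}, SE(slope) = {se_b1:.4f}")
print(f"odds ratio per APACHE point = {math.exp(b1):.3f}")
```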

Page 35

The LOGISTIC Procedure

Model Information

Data Set                      WORK.TEMP
Response Variable             xdeath
Number of Response Levels     2
Number of Observations        39
Model                         binary logit
Optimization Technique        Fisher's scoring

Response Profile

Ordered                   Total
  Value     xdeath    Frequency
      1          1           18
      2          2           21

Probability modeled is xdeath=1.

Page 36

The LOGISTIC Procedure

Analysis of Maximum Likelihood Estimates

                                Standard         Wald
Parameter    DF    Estimate       Error    Chi-Square    Pr > ChiSq
Intercept     1     -4.3861      1.3687       10.2686        0.0014
apache        1      0.2034      0.0605       11.3093        0.0008

Odds Ratio Estimates

             Point          95% Wald
Effect    Estimate     Confidence Limits
apache       1.226      1.089      1.380

Point estimate: EXP(0.2034)
Lower limit: EXP(0.2034 – 1.96*.0605)
Upper limit: EXP(0.2034 + 1.96*.0605)

Page 37

TOMHS – bpstudy SAS dataset

• Variable CLINICAL (1 = yes, 0 = no) indicates whether patient had a CVD event

• Run logistic regression separately for age and gender to determine if:
  – Age is related to CVD
    • What is the odds ratio associated with a 1 year increase in age?
    • What is the odds ratio associated with a 5 year increase in age?
  – Gender is related to CVD
    • What is the odds ratio of CVD (women versus men)?

• Run logistic regression for age and gender together

• Note: Download dataset from web-page or use dataset on SATURN