Transcript
Page 1: Logistic Regression. Outline Review of simple and multiple regressionReview of simple and multiple regression Simple Logistic RegressionSimple Logistic

Logistic RegressionLogistic Regression

Page 2: Logistic Regression. Outline Review of simple and multiple regressionReview of simple and multiple regression Simple Logistic RegressionSimple Logistic

OutlineOutline

• Review of simple and multiple regressionReview of simple and multiple regression• Simple Logistic RegressionSimple Logistic Regression

• The logistic functionThe logistic function• Interpretation of coefficientsInterpretation of coefficients

• continuous predictor (X)continuous predictor (X)• dichotomous categorical predictor (X)dichotomous categorical predictor (X)• categorical predictor with three or more levels (X)categorical predictor with three or more levels (X)

• Multiple Logistic RegressionMultiple Logistic Regression• ExamplesExamples

Page 3: Logistic Regression. Outline Review of simple and multiple regressionReview of simple and multiple regression Simple Logistic RegressionSimple Logistic

Simple Linear RegressionSimple Linear RegressionModel the mean of a numeric response Y as a Model the mean of a numeric response Y as a

function of a single predictor X, i.e.function of a single predictor X, i.e.

E(Y|X) = E(Y|X) = oo + + 11f(x)f(x)

Here Here f(x)f(x) is any function of X, e.g. is any function of X, e.g.

f(x) = Xf(x) = X E(Y|X) = E(Y|X) = oo + + 11X X (line)(line)

f(x) = ln(X) f(x) = ln(X) E(Y|X) = E(Y|X) = oo + + 11ln(X) ln(X) (curved)(curved)

The key is that E(Y|X) is a linear in theThe key is that E(Y|X) is a linear in theparameters parameters oo and and 1 1 but not necessarily in X.but not necessarily in X.

Page 4: Logistic Regression. Outline Review of simple and multiple regressionReview of simple and multiple regression Simple Logistic RegressionSimple Logistic

0 = Estimated Intercept

1 = Estimated Slopew units

0

1 w units

Simple Linear RegressionSimple Linear Regression

Interpretable only if x = 0 is avalue of particular interest.

Always interpretable!

= -value at x = 0 y

= Change in for every unit increase in x

yx0

y

xXYEy 10ˆˆ)|(ˆˆ

^

^

^

^

= estimated change in the mean of Y for a unit change in X.

Page 5: Logistic Regression. Outline Review of simple and multiple regressionReview of simple and multiple regression Simple Logistic RegressionSimple Logistic

Multiple Linear RegressionMultiple Linear Regression

We model the mean of a numeric response We model the mean of a numeric response as linear combination of the predictors as linear combination of the predictors themselves or some functions based on themselves or some functions based on the predictors, i.e.the predictors, i.e.

E(Y|E(Y|XX) = ) = oo + + 11XX11+ + 22XX22 +…+ +…+ ppXXpp

Here the terms in the model are the predictors

E(Y|E(Y|XX) = ) = oo + + 1 1 ff11((XX))++ 2 2 ff22((XX))+…++…+ k k ffkk((XX))

Here the terms in the model are k different functions of the p predictors

Page 6: Logistic Regression. Outline Review of simple and multiple regressionReview of simple and multiple regression Simple Logistic RegressionSimple Logistic

Multiple Linear RegressionMultiple Linear RegressionFor the classic multiple regression modelFor the classic multiple regression model

E(Y|E(Y|XX) = ) = oo + + 11XX11+ + 22XX22 +…+ +…+ ppXXpp

the regression coefficients (the regression coefficients (ii) represent the ) represent the

estimated change in the mean of the estimated change in the mean of the response response YY associated with a unit change in associated with a unit change in XXii while the other predictors are held while the other predictors are held

constant. constant.

They measure the association between They measure the association between YY and and XXii adjustedadjusted for the other predictors in the for the other predictors in the

model. model.

Page 7: Logistic Regression. Outline Review of simple and multiple regressionReview of simple and multiple regression Simple Logistic RegressionSimple Logistic

General Linear ModelsGeneral Linear Models• Family of regression modelsFamily of regression models• Response Model Type

• Continuous Linear regression

• Counts Poisson regression

• Survival times Cox model

• Binomial Logistic regression• UsesUses

• Control for potentially confounding factorsControl for potentially confounding factors• Model building , risk predictionModel building , risk prediction

Page 8: Logistic Regression. Outline Review of simple and multiple regressionReview of simple and multiple regression Simple Logistic RegressionSimple Logistic

Logistic RegressionLogistic Regression• Models relationship betweenModels relationship between set of variables set of variables XXii

• dichotomous (yes/no, smoker/nonsmoker,…)dichotomous (yes/no, smoker/nonsmoker,…)• categorical (social classcategorical (social class, race, , race, ...... ))• continuous (agecontinuous (age, weight, gestational age, , weight, gestational age, ...)...)

aandnd• dichotomous dichotomous categorical response categorical response variable variable YY

e.g. e.g. SuccessSuccess//FailureFailure, , RemissionRemission//No RemissionNo Remission SurvivedSurvived//DiedDied, , CHDCHD//No CHDNo CHD, , Low Birth WeightLow Birth Weight//Normal Normal Birth WeightBirth Weight, etc…, etc…

Page 9: Logistic Regression. Outline Review of simple and multiple regressionReview of simple and multiple regression Simple Logistic RegressionSimple Logistic

Logistic RegressionLogistic Regression

Example: Coronary Heart Disease (CD) and AgeExample: Coronary Heart Disease (CD) and Age In In this study sampled individuals were examined for this study sampled individuals were examined for signs of CD (present = 1 / absent = 0) and the signs of CD (present = 1 / absent = 0) and the potential relationship between this outcome and their potential relationship between this outcome and their age (yrs.) was considered. age (yrs.) was considered.

This is a portion of the raw data for the 100 subjects who participated in the study.

Page 10: Logistic Regression. Outline Review of simple and multiple regressionReview of simple and multiple regression Simple Logistic RegressionSimple Logistic

Logistic RegressionLogistic Regression• How can we analyze these data?How can we analyze these data?

The mean age of the individuals with some signs of coronary heart disease is 51.28 years vs. 39.18 years for individuals without signs (t = 5.95, p < .0001).

Non-pooled t-test

Page 11: Logistic Regression. Outline Review of simple and multiple regressionReview of simple and multiple regression Simple Logistic RegressionSimple Logistic

Logistic RegressionLogistic Regression

Simple Linear Regression?Simple Linear Regression? Smooth Regression Smooth Regression Estimate?Estimate?

??46.5002.54.)50|(

age of years 50 individualan For ..

02.54.)|(

AgeCDE

ge

AgeAgeCDE The smooth regression estimate is “S-shaped” but what does the estimated mean value represent?

Answer: P(CD|Age)!!!!

Page 12: Logistic Regression. Outline Review of simple and multiple regressionReview of simple and multiple regression Simple Logistic RegressionSimple Logistic

Logistic RegressionLogistic Regression

We can group individuals into age classes and look We can group individuals into age classes and look at the percentage/proportion showing signs of at the percentage/proportion showing signs of coronary heart disease.coronary heart disease.

Notice the “S-shape” to the estimated proportions vs. age.

Page 13: Logistic Regression. Outline Review of simple and multiple regressionReview of simple and multiple regression Simple Logistic RegressionSimple Logistic

Logistic FunctionLogistic Function

0.0

0.2

0.4

0.6

0.8

1.0

X

P(“

Su

cces

s”|X

)

X

X

o

o

e

eXSuccessP

1

1

1)|"("

Page 14: Logistic Regression. Outline Review of simple and multiple regressionReview of simple and multiple regression Simple Logistic RegressionSimple Logistic

Logit TransformationLogit Transformation

The logistic regression model is given byThe logistic regression model is given by

which is equivalent towhich is equivalent to

X

X

o

o

e

eXYP

1

1

1)|(

XXYP

XYPo 1)|(1

)|(ln

This is called the Logit Transformation

Page 15: Logistic Regression. Outline Review of simple and multiple regressionReview of simple and multiple regression Simple Logistic RegressionSimple Logistic

Dichotomous PredictorDichotomous Predictor Consider a dichotomous predictor (X) which Consider a dichotomous predictor (X) which

represents the presence of risk represents the presence of risk (1 = present)(1 = present)

XoeP

P1

1

o

o

e

e

0)X|1P(Y-1

0)X|1P(Y Absent Risk with Diseasefor Odds

1)X|1P(Y-1

1)X|1P(YPresentRisk with Diseasefor Odds 1

Therefore the odds ratio (OR)

1

1

AbsentRisk with Diseasefor Odds

PresentRisk with Diseasefor Odds

ee

eo

o

Page 16: Logistic Regression. Outline Review of simple and multiple regressionReview of simple and multiple regression Simple Logistic RegressionSimple Logistic

Dichotomous PredictorDichotomous Predictor

• Therefore, for the odds ratio associated with Therefore, for the odds ratio associated with risk presence we haverisk presence we have

• Taking the natural logarithm we haveTaking the natural logarithm we have

thus the estimated regression coefficient thus the estimated regression coefficient associated with a 0-1 coded dichotomous associated with a 0-1 coded dichotomous predictor is the natural log of the OR predictor is the natural log of the OR associated with risk presence!!!associated with risk presence!!!

1eOR

1)ln( OR

Page 17: Logistic Regression. Outline Review of simple and multiple regressionReview of simple and multiple regression Simple Logistic RegressionSimple Logistic

Logit is Directly Related to OddsLogit is Directly Related to Odds

The logistic model can be writtenThe logistic model can be written

This implies that the odds for success can This implies that the odds for success can be expressed asbe expressed as

XP

P

XYP

XYPo 11

ln)|(1

)|(ln

XoeP

P1

1

This relationship is the key to interpreting the coefficients in a logistic regression model !!

Page 18: Logistic Regression. Outline Review of simple and multiple regressionReview of simple and multiple regression Simple Logistic RegressionSimple Logistic

Dichotomous Predictor Dichotomous Predictor (+1/-1 coding)(+1/-1 coding)

Consider a dichotomous predictor (X) which Consider a dichotomous predictor (X) which represents the presence of risk represents the presence of risk (1 = present)(1 = present)

XoeP

P1

1

1

1

-1)X|1P(Y-1

-1)X|1P(Y Absent Risk with Diseasefor Odds

1)X|1P(Y-1

1)X|1P(YPresentRisk with Diseasefor Odds

o

o

e

e

Therefore the odds ratio (OR)

1

1

12

AbsentRisk with Diseasefor Odds

PresentRisk with Diseasefor Odds

ee

eo

o

Page 19: Logistic Regression. Outline Review of simple and multiple regressionReview of simple and multiple regression Simple Logistic RegressionSimple Logistic

Dichotomous PredictorDichotomous Predictor

• Therefore, for the odds ratio associated with Therefore, for the odds ratio associated with risk presence we haverisk presence we have

• Taking the natural logarithm we haveTaking the natural logarithm we have

thus twice the estimated regression thus twice the estimated regression coefficient associated with a +1 / -1 coded coefficient associated with a +1 / -1 coded dichotomous predictor is the natural log of dichotomous predictor is the natural log of the OR associated with risk presence!!!the OR associated with risk presence!!!

12eOR

12)ln( OR

Page 20: Logistic Regression. Outline Review of simple and multiple regressionReview of simple and multiple regression Simple Logistic RegressionSimple Logistic

Example: Age at 1Example: Age at 1stst Pregnancy and Pregnancy and Cervical CancerCervical Cancer

Use Fit Model Y = Disease StatusUse Fit Model Y = Disease Status

X = Risk Factor StatusX = Risk Factor Status

When the response Y is a dichotomous categorical variable the Personality box will automatically change to Nominal Logistic, i.e. Logistic Regression will be used.

Remember when a dichotomous categorical predictor is used JMP uses +1/-1 coding. If you want you can code them as 0-1 and treat is as numeric.

Page 21: Logistic Regression. Outline Review of simple and multiple regressionReview of simple and multiple regression Simple Logistic RegressionSimple Logistic

Example: Age at 1Example: Age at 1stst Pregnancy and Pregnancy and Cervical CancerCervical Cancer

607.0 ˆ

183.2ˆ

1

o

Thus the estimated odds ratio is

37.3

214.1)607(.2ˆ2)ln(214.1

1

eOR

OR

Women whose first pregnancy is at or before age 25 have 3.37 times the odds for developing cervical cancer than women whose 1st pregnancy occurs after age 25.

Page 22: Logistic Regression. Outline Review of simple and multiple regressionReview of simple and multiple regression Simple Logistic RegressionSimple Logistic

Example: Age at 1Example: Age at 1stst Pregnancy and Pregnancy and Cervical CancerCervical Cancer

607.0 ˆ

183.2ˆ

1

o

Thus the estimated odds ratio is

Risk Present Odds Ratio for disease associated with risk presence

Page 23: Logistic Regression. Outline Review of simple and multiple regressionReview of simple and multiple regression Simple Logistic RegressionSimple Logistic

Example: Smoking and Low Birth WeightExample: Smoking and Low Birth Weight

Use Fit Model Y = Low Birth Weight (Use Fit Model Y = Low Birth Weight (LLow, ow, NNorm)orm)

X = Smoking Status (X = Smoking Status (CCig, ig, NNoCig)oCig)

954.1

335.ˆ

670.ˆ2

1

1

eeOR

We estimate that women who smoker during pregnancy have 1.95 times higher odds for having a child with low birth weight than women who do not smoke cigarettes during pregnancy.

Page 24: Logistic Regression. Outline Review of simple and multiple regressionReview of simple and multiple regression Simple Logistic RegressionSimple Logistic

Example: Smoking and Low Birth WeightExample: Smoking and Low Birth Weight

954.1

335.ˆ

670.ˆ2

1

1

eeOR

Find a 95% CI for OR

1st Find a 95% CI for

2nd Compute CI for OR = (e2LCL, e2UCL)

)360,.310(.025.335.)013(.96.1335.)ˆ(96.1ˆ11 SE

(LCL,UCL)

)05.2 , 86.1(),( 360.23102 ee

We estimate that the odds for having a low birth weight infant are between 1.86 and 2.05 times higher for smokers than non-smokers, with 95% confidence.

Page 25: Logistic Regression. Outline Review of simple and multiple regressionReview of simple and multiple regression Simple Logistic RegressionSimple Logistic

Example: Smoking and Low Birth WeightExample: Smoking and Low Birth Weight

We might want to adjust for other potential We might want to adjust for other potential confounding factors in our analysis of the confounding factors in our analysis of the risk associated with smoking during risk associated with smoking during pregnancy. This is accomplished by simply pregnancy. This is accomplished by simply adding these covariates to our model.adding these covariates to our model.

ppo XXXp

p

22111ln

Multiple Logistic Regression Model

Before looking at some multiple logistic regression examples we need to look at how continuous predictors and categorical variables with 3 or levels are handled in these models and how associated OR’s are calculated.

Page 26: Logistic Regression. Outline Review of simple and multiple regressionReview of simple and multiple regression Simple Logistic RegressionSimple Logistic

Example 2: Signs of CD and AgeExample 2: Signs of CD and AgeFit Model Y = CD (CD if signs present, No otherwise)Fit Model Y = CD (CD if signs present, No otherwise)

X = Age (years)X = Age (years)

Ageep

p

Agep

p

111.31.5

1 Odds

111.31.51

ln

Consider the risk associated with a c year increase in age.

1

1

1 )(

x Agefor Odds

c x Agefor Odds (OR) Ratio Odds

c

x

cx

ee

eo

o

Page 27: Logistic Regression. Outline Review of simple and multiple regressionReview of simple and multiple regression Simple Logistic RegressionSimple Logistic

Example 2: Signs of CD and AgeExample 2: Signs of CD and AgeFor example consider a 10 year increase in For example consider a 10 year increase in

age, find the associated OR for showing age, find the associated OR for showing signs of CD, i.e. signs of CD, i.e. c = 10c = 10OR = eOR = ecc = e = e10*.11110*.111 = 3.03 = 3.03

Thus we estimate that the odds for exhibiting Thus we estimate that the odds for exhibiting signs of CD increase threefold for each 10 signs of CD increase threefold for each 10 years of age. Similar calculations could be years of age. Similar calculations could be done for other increments as well. done for other increments as well.

For example for a For example for a c = 1c = 1 year increase year increaseOR = eOR = e = e = e.111.111 = 1.18 or an 18% increase in = 1.18 or an 18% increase in odds per yearodds per year

Page 28: Logistic Regression. Outline Review of simple and multiple regressionReview of simple and multiple regression Simple Logistic RegressionSimple Logistic

Example 2: Signs of CD and AgeExample 2: Signs of CD and Age

Can we assume that the increase in risk Can we assume that the increase in risk associated with a c unit increase is associated with a c unit increase is constant throughout one’s life? constant throughout one’s life?

Is the increase going from 20 Is the increase going from 20 30 years of 30 years of age the same as going from 50 age the same as going from 50 60 60 years?years?

If that assumption is not reasonable then If that assumption is not reasonable then one must be careful when discussing risk one must be careful when discussing risk associated with a continuous predictor.associated with a continuous predictor.

Page 29: Logistic Regression. Outline Review of simple and multiple regressionReview of simple and multiple regression Simple Logistic RegressionSimple Logistic

Example 3: Race and Low Birth WeightExample 3: Race and Low Birth Weight

Calculate the odds for low birth weight for each race (Low, Norm)

White Infants (reference group, missing in parameters)

Black Infants

Other Infants

white racefor 1-

other racefor 1][

white racefor 1

black racefor 1][

OtherRace

BlackRace

0805.089.410.198.2)1(089.)1(410.198.2 ee

167.)0(089.)1(410.198.2 e

102.)1(089.)0(410.198.2 e

OR for Blacks vs. Whites= .167/.0805 = 2.075

OR for Others vs. Whites= .102/.0805 = 1.267

OR for Black vs. Others= .167/.102 = 1.637

Page 30: Logistic Regression. Outline Review of simple and multiple regressionReview of simple and multiple regression Simple Logistic RegressionSimple Logistic

Example 3: Race and Low Birth WeightExample 3: Race and Low Birth Weight

Finding these directly using the estimated Finding these directly using the estimated parameters is cumbersome. JMP will parameters is cumbersome. JMP will compute the Odds Ratio for each possible compute the Odds Ratio for each possible comparison and their reciprocals in case comparison and their reciprocals in case those are of interest as well.those are of interest as well.

Odds Ratio column is odds for Low for Level 1 vs. Level 2.

Reciprocal is odds for Low for Level 2 vs. Level 1. These are the easiest to interpret here as they represent increased risk.

Page 31: Logistic Regression. Outline Review of simple and multiple regressionReview of simple and multiple regression Simple Logistic RegressionSimple Logistic

Putting it all togetherPutting it all together

Now that we have seen how to interpret Now that we have seen how to interpret each of the variable types in a logistic each of the variable types in a logistic model we can consider multiple logistic model we can consider multiple logistic regression models with all these variable regression models with all these variable types included in the model. types included in the model.

We can then look at risk associated with We can then look at risk associated with certain factors adjusted for the other certain factors adjusted for the other covariates included in the model.covariates included in the model.

Page 32: Logistic Regression. Outline Review of simple and multiple regressionReview of simple and multiple regression Simple Logistic RegressionSimple Logistic

Example 3: Smoking and Low Birth WeightExample 3: Smoking and Low Birth Weight

• Consider again the risk associated with smoking Consider again the risk associated with smoking but this time adjusting for the potential confounding but this time adjusting for the potential confounding effects of education level and age of the mother & effects of education level and age of the mother & father, race of the child, total number of prior father, race of the child, total number of prior pregnancies, number children born alive that are pregnancies, number children born alive that are now dead, and gestational age of the infant.now dead, and gestational age of the infant.

Several terms are not statistically significant and could consider using backwards elimination to simplify the model.

Page 33: Logistic Regression. Outline Review of simple and multiple regressionReview of simple and multiple regression Simple Logistic RegressionSimple Logistic

Example 3: Race and Low Birth WeightExample 3: Race and Low Birth Weight

None of the mother and farther related covariates entered into the final model.

Adjusting for the included covariates we find smoking is statistically significant (p < .0001)

Adjusting for the included covariates we find the odds ratio for low birth weight associated with smoking during pregnancy is 2.142.

Odds Ratios for the other factors in the model can be computed as well. All of which can be prefaced by the “adjusting for…” statement.

Page 34: Logistic Regression. Outline Review of simple and multiple regressionReview of simple and multiple regression Simple Logistic RegressionSimple Logistic

SummarySummary• In logistic regression the response (Y) is a In logistic regression the response (Y) is a

dichotomous categorical variable.dichotomous categorical variable.• The parameter estimates give the odds ratio The parameter estimates give the odds ratio

associated the variables in the model.associated the variables in the model.• These odds ratios are adjusted for the other These odds ratios are adjusted for the other

variables in the model.variables in the model.• One can also calculate P(Y|One can also calculate P(Y|XX) if that is of ) if that is of

interest, e.g. given demographics of the interest, e.g. given demographics of the mother what is the estimated probability of mother what is the estimated probability of her having a child with low birth weight.her having a child with low birth weight.


Top Related