Logistic regression for binary response variables

Upload: owen-snow

Post on 23-Dec-2015


Page 1: Logistic regression for binary response variables

Logistic regression for binary response variables

Page 2: Logistic regression for binary response variables

Space shuttle example

• n = 24 space shuttle launches prior to the Challenger disaster on January 28, 1986

• Response y is an indicator variable:

– y = 1 if O-ring failures during launch

– y = 0 if no O-ring failures during launch

• Predictor x1 is launch temperature, in degrees Fahrenheit

Page 3: Logistic regression for binary response variables

Space shuttle example

[Figure: Plot of Failure versus Temperature. Vertical axis: Failure (No = 0.0, Yes = 1.0); horizontal axis: Temperature, 50 to 80.]

Page 4: Logistic regression for binary response variables

A model

Page 5: Logistic regression for binary response variables

The mean of a binary response

If 20% of a population are smokers and 80% are non-smokers, and Yi = 1 for a smoker and Yi = 0 for a non-smoker, then:

$$E(Y_i) = (1)(0.20) + (0)(0.80) = 0.20 = P(Y_i = 1)$$

If pi = P (Yi = 1) and 1 – pi = P (Yi = 0), then:

$$E(Y_i) = (1)(p_i) + (0)(1 - p_i) = p_i = P(Y_i = 1)$$

Page 6: Logistic regression for binary response variables

A linear regression model for a binary response

If the simple linear regression model is:

$$Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i \quad \text{for } Y_i = 0, 1$$

Then the mean response

$$E(Y_i) = \beta_0 + \beta_1 x_i = P(Y_i = 1)$$

is the probability that Yi = 1 when the level of the predictor variable is xi.

Page 7: Logistic regression for binary response variables

Space shuttle example

[Figure: Regression Plot. Vertical axis: Probability of Failure (0.0 to 1.0); horizontal axis: Temperature, 50 to 80.]

failure = 2.43729 - 0.0306883 temp

S = 0.414476   R-Sq = 23.8%   R-Sq(adj) = 20.3%
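One way to see why the straight-line fit is awkward for a binary response is to evaluate the fitted line at a few temperatures. The sketch below uses the coefficients reported on the regression plot; the function name and the temperatures 45 and 85 (just outside the plotted range) are illustrative choices, not part of the slides.

```python
# Least-squares line reported on the regression plot.
INTERCEPT = 2.43729
SLOPE = -0.0306883

def linear_fit(temp_f):
    """Fitted 'probability' of O-ring failure from the straight-line model."""
    return INTERCEPT + SLOPE * temp_f

# 45 and 85 are hypothetical temperatures chosen only for illustration.
for temp in (45, 50, 65, 80, 85):
    value = linear_fit(temp)
    flag = "" if 0.0 <= value <= 1.0 else "  <- not a valid probability"
    print(f"{temp} F: {value:.3f}{flag}")
```

At 80 degrees the fitted value is already slightly negative, and below about 47 degrees it exceeds 1, so the line cannot be read as a probability over the whole temperature range.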

Page 8: Logistic regression for binary response variables

(Simple) logistic regression function

$$p_i = P(Y_i = 1) = \frac{\exp(\alpha + \beta x_i)}{1 + \exp(\alpha + \beta x_i)}$$
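A minimal sketch of the simple logistic regression function in code may help make its shape concrete. The function name `logistic` is an assumption for illustration; the values α = 10 and β = -0.15 are the trial values the slides use in their likelihood example.

```python
import math

def logistic(alpha, beta, x):
    """P(Y = 1) at predictor value x under the simple logistic model."""
    eta = alpha + beta * x  # linear predictor
    return math.exp(eta) / (1.0 + math.exp(eta))

# Trial values taken from the slides' likelihood example.
alpha, beta = 10.0, -0.15
for x in (40, 53, 67, 81, 95):
    print(f"x = {x}: p = {logistic(alpha, beta, x):.3f}")
```

Unlike the straight line, the result always lies strictly between 0 and 1, and it equals 0.5 exactly where α + βx = 0.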

Page 9: Logistic regression for binary response variables

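[Figure: logistic curve of E(Y) = p (0.0 to 1.0) plotted against X (50 to 150).]

$$p_i = P(Y_i = 1) = \frac{\exp(\alpha + \beta x_i)}{1 + \exp(\alpha + \beta x_i)}, \quad |\alpha| = 10, \; |\beta| = 0.1$$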

Page 10: Logistic regression for binary response variables

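$$p_i = P(Y_i = 1) = \frac{\exp(\alpha + \beta x_i)}{1 + \exp(\alpha + \beta x_i)}, \quad |\alpha| = 10, \; |\beta| = 0.1$$

[Figure: logistic curve of E(Y) = p (0.0 to 1.0) plotted against X (50 to 150).]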

Page 11: Logistic regression for binary response variables

Space shuttle example

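[Figure: fitted logistic curve. Vertical axis: Probability of failure (0.0 to 1.0); horizontal axis: Temperature, 50 to 80.]

$$\hat{p}_i = \hat{P}(Y_i = 1) = \frac{\exp(10.8 - 0.17 x_i)}{1 + \exp(10.8 - 0.17 x_i)}$$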
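The fitted curve can be tabulated directly from the rounded estimates on this slide (intercept 10.8, slope -0.17). This is a sketch only; `fitted_probability` is a hypothetical helper name, and it uses the algebraically equivalent form 1 / (1 + exp(-η)).

```python
import math

# Rounded estimates shown on the slide: intercept 10.8, slope -0.17 per degree F.
ALPHA_HAT = 10.8
BETA_HAT = -0.17

def fitted_probability(temp_f):
    """Estimated probability of O-ring failure at a launch temperature (deg F)."""
    eta = ALPHA_HAT + BETA_HAT * temp_f
    # exp(eta) / (1 + exp(eta)) rewritten as 1 / (1 + exp(-eta)).
    return 1.0 / (1.0 + math.exp(-eta))

for temp in range(50, 85, 5):
    print(f"{temp} F: {fitted_probability(temp):.3f}")
```

With these coefficients the estimated failure probability falls from roughly 0.9 at 50 degrees to under 0.1 at 80 degrees.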

Page 12: Logistic regression for binary response variables

Alternative formulation of (simple) logistic regression function

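$$p_i = P(Y_i = 1) = \frac{\exp(\alpha + \beta x_i)}{1 + \exp(\alpha + \beta x_i)}$$

(algebra)

$$\ln\left(\frac{p_i}{1 - p_i}\right) = \alpha + \beta x_i$$

The left-hand side is the “logit”.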
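The "(algebra)" step can be spelled out as follows. This is a short derivation using the same symbols as the logistic regression function, with no additional assumptions.

```latex
\begin{align*}
p_i &= \frac{\exp(\alpha + \beta x_i)}{1 + \exp(\alpha + \beta x_i)} \\
1 - p_i &= \frac{1}{1 + \exp(\alpha + \beta x_i)} \\
\frac{p_i}{1 - p_i} &= \exp(\alpha + \beta x_i) \\
\ln\left(\frac{p_i}{1 - p_i}\right) &= \alpha + \beta x_i
\end{align*}
```

The third line is the odds, so the logit is simply the log of the odds and is linear in the predictor.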

Page 13: Logistic regression for binary response variables

Space shuttle example

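[Figure: fitted logit as a straight line. Vertical axis: Logit (-3 to 2); horizontal axis: Temperature, 50 to 80.]

$$\ln\left(\frac{\hat{p}_i}{1 - \hat{p}_i}\right) = 10.8 - 0.17 x_i$$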

Page 14: Logistic regression for binary response variables

Interpretation of slope coefficients

Page 15: Logistic regression for binary response variables

Odds

If there are 20% smokers and 80% non-smokers:

$$\text{Odds} = \frac{0.80}{0.20} = 4$$

“Odds are 4 to 1” … 4 non-smokers to 1 smoker.

and

$$\ln(\text{Odds}) = \ln(4) = 1.39$$

If pi = P (Yi = 1) and 1 – pi = P (Yi = 0), then:

$$\text{Odds} = \frac{p_i}{1 - p_i} \quad \text{and} \quad \ln(\text{Odds}) = \ln\left(\frac{p_i}{1 - p_i}\right)$$

Page 16: Logistic regression for binary response variables

Odds ratio

MALE: 20% smokers and 80% non-smokers:

$$\text{Odds}_M = \frac{0.80}{0.20} = 4$$

FEMALE: 40% smokers and 60% non-smokers:

$$\text{Odds}_F = \frac{0.60}{0.40} = 1.5$$

$$OR = \frac{4}{1.5} = 2.67$$

The odds that a male is a non-smoker are 2.67 times the odds that a female is a non-smoker.
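The smoker arithmetic above can be checked in a few lines. This is a sketch; the helper name `odds` is an illustrative choice, and the probabilities are the non-smoker proportions from the slides.

```python
import math

def odds(p):
    """Odds corresponding to a probability p (0 < p < 1)."""
    return p / (1.0 - p)

odds_male = odds(0.80)      # odds that a male is a non-smoker
odds_female = odds(0.60)    # odds that a female is a non-smoker
odds_ratio = odds_male / odds_female

print(f"male odds   = {odds_male:.2f}")
print(f"female odds = {odds_female:.2f}")
print(f"odds ratio  = {odds_ratio:.2f}")
print(f"ln(male odds) = {math.log(odds_male):.2f}")
```

An odds ratio of 1 would mean the two groups have the same odds; values above 1 favour the group in the numerator.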

Page 17: Logistic regression for binary response variables

Odds ratio

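Group 1:

$$\text{Odds}_1 = \frac{p_{i|1}}{1 - p_{i|1}}$$

Group 2:

$$\text{Odds}_2 = \frac{p_{i|2}}{1 - p_{i|2}}$$

The odds ratio:

$$OR = \frac{\text{Odds}_1}{\text{Odds}_2} = \frac{p_{i|1} / (1 - p_{i|1})}{p_{i|2} / (1 - p_{i|2})}$$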

Page 18: Logistic regression for binary response variables

Space shuttle example

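Predicted odds:

$$\frac{\hat{p}_i}{1 - \hat{p}_i} = \exp(10.8 - 0.17 x_i)$$

Predicted odds at x1 = 55 degrees:

$$\frac{\hat{p}_{i|55}}{1 - \hat{p}_{i|55}} = \exp(10.8 - 0.17(55)) = 4.263$$

Predicted odds at x1 = 80 degrees:

$$\frac{\hat{p}_{i|80}}{1 - \hat{p}_{i|80}} = \exp(10.8 - 0.17(80)) = 0.06081$$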

Page 19: Logistic regression for binary response variables

Space shuttle example

Predicted odds ratio for x1 = 55 relative to x1 = 80:

$$\frac{\hat{p}_{i|55} / (1 - \hat{p}_{i|55})}{\hat{p}_{i|80} / (1 - \hat{p}_{i|80})} = \frac{\exp(10.8 - 0.17(55))}{\exp(10.8 - 0.17(80))} = \frac{4.263}{0.06081} \approx 70$$

The odds of O-ring failure at 55 degrees Fahrenheit are about 70 times the odds of O-ring failure at 80 degrees Fahrenheit!
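The predicted odds and their ratio can be recomputed from the rounded coefficients 10.8 and -0.17. In this sketch, `predicted_odds` is a hypothetical helper name; with these rounded inputs the ratio evaluates to roughly 70.

```python
import math

# Rounded estimates used on the slides.
ALPHA_HAT = 10.8
BETA_HAT = -0.17

def predicted_odds(temp_f):
    """Predicted odds of O-ring failure at a launch temperature (deg F)."""
    return math.exp(ALPHA_HAT + BETA_HAT * temp_f)

odds_55 = predicted_odds(55)
odds_80 = predicted_odds(80)
ratio = odds_55 / odds_80

print(f"odds at 55 F = {odds_55:.3f}")
print(f"odds at 80 F = {odds_80:.5f}")
print(f"odds ratio   = {ratio:.1f}")
```

The intercept cancels in the ratio, so the same number comes from exp(-0.17 × (55 - 80)) = exp(4.25).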

Page 20: Logistic regression for binary response variables

Interpretation of slope coefficients

The ratio of the odds at X1 = A relative to the odds at X1 = B (for fixed values of other X’s) is:

$$\frac{\text{Odds}_A}{\text{Odds}_B} = \frac{\exp(\beta_1 A)}{\exp(\beta_1 B)} = \exp(\beta_1 (A - B))$$
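A small sketch shows how the slope alone determines the odds ratio between two predictor values. The function name `odds_ratio` and the intercept value 3.0 used in the check are assumptions for illustration; the slope -0.17132 is the temp coefficient reported in the shuttle output.

```python
import math

def odds_ratio(beta_1, a, b):
    """Odds at X1 = a relative to odds at X1 = b, other predictors held fixed."""
    return math.exp(beta_1 * (a - b))

beta_temp = -0.17132  # temp coefficient from the shuttle logistic fit

print(f"1-degree increase:  OR = {odds_ratio(beta_temp, 1, 0):.2f}")
print(f"10-degree increase: OR = {odds_ratio(beta_temp, 10, 0):.2f}")

# The intercept cancels: any value (3.0 here is arbitrary) gives the same ratio.
direct = math.exp(3.0 + beta_temp * 55) / math.exp(3.0 + beta_temp * 80)
print(math.isclose(direct, odds_ratio(beta_temp, 55, 80)))
```

Because only the difference A - B enters, a one-unit increase always multiplies the odds by exp(β1), wherever on the scale it occurs.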

Page 21: Logistic regression for binary response variables

Estimation of logistic regression coefficients

Page 22: Logistic regression for binary response variables

Maximum likelihood estimation

• Choose as estimates of the parameters the values that assign the highest probability to (“maximize likelihood of”) the observed outcome.

Page 23: Logistic regression for binary response variables

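Suppose

$$p_i = P(Y_i = 1) = \frac{\exp(10 - 0.15 x_i)}{1 + \exp(10 - 0.15 x_i)}$$

For first observation, Y1 = 1 and x1 = 53:

$$P(Y_1 = 1) = \frac{\exp(10 - 0.15(53))}{1 + \exp(10 - 0.15(53))} = 0.886$$

… for second observation, Y2 = 1 and x2 = 56:

$$P(Y_2 = 1) = \frac{\exp(10 - 0.15(56))}{1 + \exp(10 - 0.15(56))} = 0.832$$

… and for 24th observation, Y24 = 0 and x24 = 81:

$$P(Y_{24} = 0) = 1 - \frac{\exp(10 - 0.15(81))}{1 + \exp(10 - 0.15(81))} = 0.896$$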
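The three probabilities on this slide can be reproduced with a short function that returns P(Y = y) for either outcome. This is a sketch; `prob_of_outcome` is a hypothetical name, and only the three observations that the slide spells out are used.

```python
import math

def prob_of_outcome(alpha, beta, x, y):
    """P(Y = y) for y in {0, 1} under the simple logistic model."""
    p = math.exp(alpha + beta * x) / (1.0 + math.exp(alpha + beta * x))
    return p if y == 1 else 1.0 - p

# (temperature, outcome) pairs for observations 1, 2 and 24 as given on the slide.
shown = [(53, 1), (56, 1), (81, 0)]
probs = [prob_of_outcome(10.0, -0.15, x, y) for x, y in shown]
print([round(p, 3) for p in probs])  # [0.886, 0.832, 0.896]
```

Note that for a non-failure (y = 0) the relevant quantity is 1 - p, which is why the 24th observation contributes 0.896 rather than about 0.104.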

Page 24: Logistic regression for binary response variables

If α = 10 and β = -0.15, what is the probability of observed outcome?

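The likelihood of the observed outcome is:

$$P(Y_1 = 1, Y_2 = 1, \ldots, Y_{24} = 0) = P(Y_1 = 1) \times P(Y_2 = 1) \times \cdots \times P(Y_{24} = 0) = 0.886 \times 0.832 \times \cdots \times 0.896 = 4.82 \times 10^{-6}$$

The log likelihood of the observed outcome is:

$$\ln P(Y_1 = 1, Y_2 = 1, \ldots, Y_{24} = 0) = \ln\left[P(Y_1 = 1) \times P(Y_2 = 1) \times \cdots \times P(Y_{24} = 0)\right] = \ln(0.886) + \ln(0.832) + \cdots + \ln(0.896) = -12.24$$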

Page 25: Logistic regression for binary response variables

Maximum likelihood estimation

• Choose as estimates of the parameters the values that assign the highest probability to (“maximize likelihood of”) the observed outcome.

Page 26: Logistic regression for binary response variables

Suppose

$$p_i = P(Y_i = 1) = \frac{\exp(10.8 - 0.17 x_i)}{1 + \exp(10.8 - 0.17 x_i)}$$

For first observation, Y1 = 1 and x1 = 53:

$$P(Y_1 = 1) = \frac{\exp(10.8 - 0.17(53))}{1 + \exp(10.8 - 0.17(53))} = 0.857$$

… for second observation, Y2 = 1 and x2 = 56:

$$P(Y_2 = 1) = \frac{\exp(10.8 - 0.17(56))}{1 + \exp(10.8 - 0.17(56))} = 0.782$$

… and for 24th observation, Y24 = 0 and x24 = 81:

$$P(Y_{24} = 0) = 1 - \frac{\exp(10.8 - 0.17(81))}{1 + \exp(10.8 - 0.17(81))} = 0.951$$

Page 27: Logistic regression for binary response variables

If α = 10.8 and β = -0.17, what is the probability of observed outcome?

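The likelihood of the observed outcome is:

$$P(Y_1 = 1, Y_2 = 1, \ldots, Y_{24} = 0) = P(Y_1 = 1) \times P(Y_2 = 1) \times \cdots \times P(Y_{24} = 0) = 0.857 \times 0.782 \times \cdots \times 0.951 = 9.97 \times 10^{-6}$$

The log likelihood of the observed outcome is:

$$\ln P(Y_1 = 1, Y_2 = 1, \ldots, Y_{24} = 0) = \ln\left[P(Y_1 = 1) \times P(Y_2 = 1) \times \cdots \times P(Y_{24} = 0)\right] = \ln(0.857) + \ln(0.782) + \cdots + \ln(0.951) = -11.52$$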
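The comparison between the two candidate parameter pairs can be sketched as a log-likelihood function plus a lookup of the totals the slides report. The slides list only three of the 24 observations, so the full sums cannot be recomputed here; the function and variable names are illustrative assumptions.

```python
import math

def log_likelihood(alpha, beta, xs, ys):
    """Sum of ln P(Y_i = y_i) over observations under the simple logistic model."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = math.exp(alpha + beta * x) / (1.0 + math.exp(alpha + beta * x))
        total += math.log(p if y == 1 else 1.0 - p)
    return total

# Contribution of the three listed launches only (not the full data set).
partial = log_likelihood(10.0, -0.15, [53, 56, 81], [1, 1, 0])
print(f"partial log likelihood (3 of 24 launches) = {partial:.4f}")

# Full-sample log likelihoods as reported on the slides, keyed by (alpha, beta).
reported = {(10.0, -0.15): -12.24, (10.8, -0.17): -11.52}
for params, ll in reported.items():
    print(params, ll, f"likelihood ~ {math.exp(ll):.2e}")

best = max(reported, key=reported.get)
print("higher likelihood:", best)
```

Exponentiating the reported log likelihoods recovers the reported likelihoods up to rounding, and the pair (10.8, -0.17) gives the larger value, which is why it is preferred. Maximum likelihood estimation repeats this comparison over all possible parameter values.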

Page 28: Logistic regression for binary response variables

Space shuttle example

Link Function: Logit

Response Information

Variable  Value  Count
failure   1          7  (Event)
          0         17
          Total     24

Logistic Regression Table

                                               Odds      95% CI
Predictor      Coef  SE Coef      Z      P    Ratio   Lower   Upper
Constant     10.875    5.703   1.91  0.057
temp       -0.17132  0.08344  -2.05  0.040     0.84    0.72    0.99

Page 29: Logistic regression for binary response variables

Properties of MLEs

• If a model is correct and the sample size is large enough:

– MLEs are essentially unbiased.

– Formulas exist for estimating the standard errors of the estimators.

– The estimators are about as precise as any nearly unbiased estimators.

– MLEs are approximately normally distributed.

Page 30: Logistic regression for binary response variables

Test and confidence intervals for single coefficients

Page 31: Logistic regression for binary response variables

Inference for βj

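Test statistic: $Z = \hat{\beta}_j / \operatorname{se}(\hat{\beta}_j)$ follows an approximate standard normal distribution when $\beta_j = 0$.

Confidence interval: $\hat{\beta}_j \pm z_{\alpha/2} \operatorname{se}(\hat{\beta}_j)$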
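The test statistic and confidence interval can be sketched with the standard library alone. The coefficient and standard error below are the temp row of the shuttle output; `wald_summary` is a hypothetical helper, and 1.96 is the usual standard normal critical value for a 95% interval.

```python
import math

def wald_summary(coef, se, z_crit=1.96):
    """Wald Z, two-sided p-value, CI for the coefficient, and CI for the odds ratio."""
    z = coef / se
    # Two-sided standard normal p-value: P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    ci = (coef - z_crit * se, coef + z_crit * se)
    or_ci = (math.exp(ci[0]), math.exp(ci[1]))  # exponentiate to get the OR scale
    return z, p_value, ci, or_ci

z, p_value, ci, or_ci = wald_summary(-0.17132, 0.08344)
print(f"Z = {z:.2f}, P = {p_value:.3f}")
print(f"95% CI for coefficient: ({ci[0]:.4f}, {ci[1]:.4f})")
print(f"95% CI for odds ratio:  ({or_ci[0]:.2f}, {or_ci[1]:.2f})")
```

The interval for the odds ratio is obtained by exponentiating the endpoints of the interval for the coefficient, which is why it is not symmetric about the estimated odds ratio.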

Page 32: Logistic regression for binary response variables

Space shuttle example

Link Function: Logit

Response Information

Variable  Value  Count
failure   1          7  (Event)
          0         17
          Total     24

Logistic Regression Table

                                               Odds      95% CI
Predictor      Coef  SE Coef      Z      P    Ratio   Lower   Upper
Constant     10.875    5.703   1.91  0.057
temp       -0.17132  0.08344  -2.05  0.040     0.84    0.72    0.99

Page 33: Logistic regression for binary response variables

Space shuttle example

• There is sufficient evidence, at the α = 0.05 level, to conclude that temperature is related to the probability of O-ring failure.

• For every 1-degree increase in temperature, the odds of O-ring failure are estimated to be multiplied by 0.84 (95% CI is 0.72 to 0.99); that is, the estimated odds ratio per degree is 0.84.

Page 34: Logistic regression for binary response variables

Survival in the Donner Party

• In 1846, Donner and Reed families traveled from Illinois to California by covered wagon.

• Group became stranded in eastern Sierra Nevada mountains when hit by heavy snow.

• 40 of 87 members died from famine and exposure.

• Are females better able to withstand harsh conditions than are males?

Page 35: Logistic regression for binary response variables

Survival in the Donner Party

[Figure: estimated probability of survival (0.0 to 0.9) plotted against Age (15 to 65), with separate curves for Female and Male.]

Page 36: Logistic regression for binary response variables

Survival in the Donner Party

Link Function: Logit

Response Information

Variable  Value     Count
STATUS    SURVIVED     20  (Event)
          DIED         25
          Total        45

Logistic Regression Table

                                               Odds      95% CI
Predictor      Coef  SE Coef      Z      P    Ratio   Lower   Upper
Constant      1.633    1.110   1.47  0.141
AGE        -0.07820  0.03729  -2.10  0.036     0.92    0.86    0.99
Gender       1.5973   0.7555   2.11  0.034     4.94    1.12   21.72
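The odds ratio columns of the Donner Party output follow from the coefficients and standard errors. The sketch below recomputes them; the dictionary name is illustrative, 1.96 is the usual 95% normal critical value, and the coding of Gender (which sex is 1) is not stated in the output, so it is left uninterpreted here.

```python
import math

# (coefficient, standard error) pairs copied from the output above.
donner_fit = {"AGE": (-0.07820, 0.03729), "Gender": (1.5973, 0.7555)}

results = {}
for name, (coef, se) in donner_fit.items():
    z = coef / se
    odds_ratio = math.exp(coef)
    lower = math.exp(coef - 1.96 * se)
    upper = math.exp(coef + 1.96 * se)
    results[name] = (z, odds_ratio, lower, upper)
    print(f"{name}: Z = {z:.2f}, OR = {odds_ratio:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

Each odds ratio is adjusted for the other predictor: the AGE value compares people one year apart in age within the same gender group, and the Gender value compares the two groups at the same age.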