Logistic regression for binary response variables

Post on 23-Dec-2015

TRANSCRIPT

  • Slide 1
  • Logistic regression for binary response variables
  • Slide 2
  • Space shuttle example: n = 24 space shuttle launches prior to the Challenger disaster on January 28, 1986. The response y is an indicator variable: y = 1 if there were O-ring failures during launch, y = 0 if there were none. The predictor $x_1$ is launch temperature, in degrees Fahrenheit.
  • Slide 3
  • Space shuttle example
  • Slide 4
  • A model
  • Slide 5
  • The mean of a binary response: if there are 20% smokers and 80% non-smokers, and $Y_i = 1$ if smoker and $0$ if non-smoker, then $E(Y_i) = 1(0.2) + 0(0.8) = 0.2$. In general, if $p_i = P(Y_i = 1)$ and $1 - p_i = P(Y_i = 0)$, then $E(Y_i) = 1(p_i) + 0(1 - p_i) = p_i$.
  • Slide 6
  • A linear regression model for a binary response: if the simple linear regression model is $Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$ for $Y_i = 0, 1$, then the mean response $E(Y_i) = \beta_0 + \beta_1 x_i = p_i$ is the probability that $Y_i = 1$ when the level of the predictor variable is $x_i$.
  • Slide 7
  • Space shuttle example
  • Slide 8
  • (Simple) logistic regression function: $E(Y_i) = p_i = \frac{e^{\beta_0 + \beta_1 x_i}}{1 + e^{\beta_0 + \beta_1 x_i}}$
  • Slide 9
  • Slide 10
  • Slide 11
  • Space shuttle example
  • Slide 12
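  • Alternative formulation of the (simple) logistic regression function: by algebra, the logit of $p_i$ is linear in the predictor, $\ln\left(\frac{p_i}{1 - p_i}\right) = \beta_0 + \beta_1 x_i$.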
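The algebra behind the logit form can be sketched as follows. This assumes the logistic function is written in the standard notation $p_i = e^{\beta_0 + \beta_1 x_i} / (1 + e^{\beta_0 + \beta_1 x_i})$; the symbols are the usual textbook ones, not recovered from the slide images.

```latex
\begin{align*}
p_i &= \frac{e^{\beta_0 + \beta_1 x_i}}{1 + e^{\beta_0 + \beta_1 x_i}},
\qquad
1 - p_i = \frac{1}{1 + e^{\beta_0 + \beta_1 x_i}} \\
\frac{p_i}{1 - p_i} &= e^{\beta_0 + \beta_1 x_i} \\
\ln\!\left(\frac{p_i}{1 - p_i}\right) &= \beta_0 + \beta_1 x_i
\end{align*}
```

Dividing $p_i$ by $1 - p_i$ cancels the common denominator, so the odds are a pure exponential and their logarithm (the logit) is a straight line in $x_i$.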
  • Slide 13
  • Space shuttle example
  • Slide 14
  • Interpretation of slope coefficients
  • Slide 15
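  • Odds: if there are 20% smokers and 80% non-smokers, the odds of being a non-smoker are $0.8/0.2 = 4$ (4 to 1: 4 non-smokers to 1 smoker), and the odds of being a smoker are $0.2/0.8 = 0.25$. If $p_i = P(Y_i = 1)$ and $1 - p_i = P(Y_i = 0)$, then the odds that $Y_i = 1$ are $p_i/(1 - p_i)$, and the odds that $Y_i = 0$ are $(1 - p_i)/p_i$.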
  • Slide 16
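  • Odds ratio: MALE, 20% smokers and 80% non-smokers, so the odds of being a non-smoker are $0.8/0.2 = 4$. FEMALE, 40% smokers and 60% non-smokers, so the odds of being a non-smoker are $0.6/0.4 = 1.5$. The odds that a male is a non-smoker are $4/1.5 = 2.67$ times the odds that a female is a non-smoker.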
  • Slide 17
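  • Odds ratio: for Group 1 the odds are $p_1/(1 - p_1)$; for Group 2 the odds are $p_2/(1 - p_2)$. The odds ratio is $\frac{p_1/(1 - p_1)}{p_2/(1 - p_2)}$.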
  • Slide 18
  • Space shuttle example: predicted odds $\hat{p}/(1 - \hat{p}) = e^{b_0 + b_1 x_1}$. Predicted odds at $x_1 = 55$ degrees: $e^{b_0 + 55 b_1}$. Predicted odds at $x_1 = 80$ degrees: $e^{b_0 + 80 b_1}$.
  • Slide 19
  • Space shuttle example: predicted odds ratio for $x_1 = 55$ relative to $x_1 = 80$. The odds of O-ring failure at 55 degrees Fahrenheit are 76 times the odds of O-ring failure at 80 degrees Fahrenheit!
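The predicted odds and their ratio can be sketched in a few lines of Python. This is a minimal illustration that assumes the coefficients exactly as printed in the regression table on Slide 28 (10.875 and -0.17132); the function names are hypothetical. With these rounded values the ratio comes out near 72 rather than the 76 quoted on the slide, which may reflect rounding or a slightly different fit.

```python
import math

# Coefficients as printed in the logistic regression table (Slide 28).
B0 = 10.875     # intercept
B1 = -0.17132   # slope for launch temperature (degrees Fahrenheit)


def predicted_odds(temp_f, b0=B0, b1=B1):
    """Predicted odds of O-ring failure: exp(b0 + b1 * temperature)."""
    return math.exp(b0 + b1 * temp_f)


def predicted_probability(temp_f, b0=B0, b1=B1):
    """Convert predicted odds to a probability: odds / (1 + odds)."""
    odds = predicted_odds(temp_f, b0, b1)
    return odds / (1.0 + odds)


odds_55 = predicted_odds(55)
odds_80 = predicted_odds(80)
# The intercept cancels, so the ratio equals exp(b1 * (55 - 80)).
odds_ratio = odds_55 / odds_80

print(f"predicted odds at 55 F: {odds_55:.3f}")
print(f"predicted odds at 80 F: {odds_80:.4f}")
print(f"odds ratio, 55 F vs 80 F: {odds_ratio:.1f}")
```

Because the intercept cancels in the ratio, only the slope and the temperature difference matter.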
  • Slide 20
  • Interpretation of slope coefficients: the ratio of the odds at $X_1 = A$ relative to the odds at $X_1 = B$ (for fixed values of the other Xs) is $e^{\beta_1 (A - B)}$.
  • Slide 21
  • Estimation of logistic regression coefficients
  • Slide 22
  • Maximum likelihood estimation: choose as estimates of the parameters the values that assign the highest probability to (that is, maximize the likelihood of) the observed outcome.
  • Slide 23
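  • Suppose $p_i = \frac{e^{\beta_0 + \beta_1 x_i}}{1 + e^{\beta_0 + \beta_1 x_i}}$. For the first observation, $Y_1 = 1$ and $x_1 = 53$: $P(Y_1 = 1) = \frac{e^{\beta_0 + 53\beta_1}}{1 + e^{\beta_0 + 53\beta_1}}$; for the second observation, $Y_2 = 1$ and $x_2 = 56$: $P(Y_2 = 1) = \frac{e^{\beta_0 + 56\beta_1}}{1 + e^{\beta_0 + 56\beta_1}}$; and for the 24th observation, $Y_{24} = 0$ and $x_{24} = 81$: $P(Y_{24} = 0) = \frac{1}{1 + e^{\beta_0 + 81\beta_1}}$.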
  • Slide 24
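  • If $\beta_0 = 10$ and $\beta_1 = -0.15$, what is the probability of the observed outcome? The log likelihood of the observed outcome is $\ln L = \sum_{i=1}^{24} \left[ y_i \ln p_i + (1 - y_i) \ln(1 - p_i) \right]$ evaluated at these values, and the likelihood of the observed outcome is $L = e^{\ln L}$.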
  • Slide 25
  • Maximum likelihood estimation: choose as estimates of the parameters the values that assign the highest probability to (that is, maximize the likelihood of) the observed outcome.
  • Slide 26
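  • Suppose $p_i = \frac{e^{\beta_0 + \beta_1 x_i}}{1 + e^{\beta_0 + \beta_1 x_i}}$. For the first observation, $Y_1 = 1$ and $x_1 = 53$: $P(Y_1 = 1) = \frac{e^{\beta_0 + 53\beta_1}}{1 + e^{\beta_0 + 53\beta_1}}$; for the second observation, $Y_2 = 1$ and $x_2 = 56$: $P(Y_2 = 1) = \frac{e^{\beta_0 + 56\beta_1}}{1 + e^{\beta_0 + 56\beta_1}}$; and for the 24th observation, $Y_{24} = 0$ and $x_{24} = 81$: $P(Y_{24} = 0) = \frac{1}{1 + e^{\beta_0 + 81\beta_1}}$.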
  • Slide 27
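  • If $\beta_0 = 10.8$ and $\beta_1 = -0.17$, what is the probability of the observed outcome? The log likelihood of the observed outcome is $\ln L = \sum_{i=1}^{24} \left[ y_i \ln p_i + (1 - y_i) \ln(1 - p_i) \right]$ evaluated at these values, and the likelihood of the observed outcome is $L = e^{\ln L}$.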
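The likelihood calculation for a candidate pair of coefficients can be sketched as below. Only three of the 24 launches are quoted on the slides, so this example evaluates the log likelihood on those three observations alone; the function names are hypothetical and the printed values will not match whatever the original slides showed for the full data set.

```python
import math

# Only these three (temperature, failure) pairs are quoted on the slides;
# the other 21 launches are not reproduced in the transcript.
observations = [(53, 1), (56, 1), (81, 0)]


def failure_probability(x, b0, b1):
    """Logistic function: p = exp(b0 + b1*x) / (1 + exp(b0 + b1*x))."""
    eta = b0 + b1 * x
    return 1.0 / (1.0 + math.exp(-eta))


def log_likelihood(data, b0, b1):
    """Sum of y*ln(p) + (1 - y)*ln(1 - p) over the observations."""
    total = 0.0
    for x, y in data:
        p = failure_probability(x, b0, b1)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total


# The two candidate parameter pairs considered on the slides.
for b0, b1 in [(10.0, -0.15), (10.8, -0.17)]:
    ll = log_likelihood(observations, b0, b1)
    print(f"b0={b0}, b1={b1}: log likelihood={ll:.4f}, "
          f"likelihood={math.exp(ll):.4f}")
```

Maximum likelihood estimation amounts to searching over (b0, b1) for the pair that makes this log likelihood as large as possible on the full data; a comparison on three observations is only an illustration of the arithmetic.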
  • Slide 28
  • Space shuttle example
    Link Function: Logit
    Response Information
    Variable  Value  Count
    failure   1          7  (Event)
              0         17
              Total     24
    Logistic Regression Table
                                                  Odds     95% CI
    Predictor      Coef  SE Coef      Z      P   Ratio   Lower   Upper
    Constant     10.875    5.703   1.91  0.057
    temp       -0.17132  0.08344  -2.05  0.040    0.84    0.72    0.99
  • Slide 29
  • Properties of MLEs: if a model is correct and the sample size is large enough, (1) MLEs are essentially unbiased; (2) formulas exist for estimating the standard errors of the estimators; (3) the estimators are about as precise as any nearly unbiased estimators; (4) MLEs are approximately normally distributed.
  • Slide 30
  • Tests and confidence intervals for single coefficients
  • Slide 31
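  • Inference for $\beta_j$. Test statistic: $z = b_j / \mathrm{se}(b_j)$, which follows an approximate standard normal distribution when $\beta_j = 0$. Confidence interval: $b_j \pm z_{1 - \alpha/2}\,\mathrm{se}(b_j)$.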
  • Slide 32
  • Space shuttle example
    Link Function: Logit
    Response Information
    Variable  Value  Count
    failure   1          7  (Event)
              0         17
              Total     24
    Logistic Regression Table
                                                  Odds     95% CI
    Predictor      Coef  SE Coef      Z      P   Ratio   Lower   Upper
    Constant     10.875    5.703   1.91  0.057
    temp       -0.17132  0.08344  -2.05  0.040    0.84    0.72    0.99
  • Slide 33
  • Space shuttle example: there is sufficient evidence, at the $\alpha = 0.05$ level, to conclude that temperature is related to the probability of O-ring failure. For every 1-degree increase in temperature, the odds of O-ring failure are estimated to be multiplied by 0.84 (95% CI for this odds ratio: 0.72 to 0.99).
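The Z, P, odds ratio, and confidence limits in the temperature row of the regression table can be reproduced from the coefficient and its standard error alone. The sketch below uses Python's standard-library `statistics.NormalDist`; the helper name `wald_summary` is hypothetical, and the approach assumes the usual Wald interval (estimate plus or minus a normal quantile times the standard error, then exponentiated).

```python
import math
from statistics import NormalDist


def wald_summary(coef, se, level=0.95):
    """Wald z statistic, two-sided p-value, and odds-ratio confidence interval."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    z = coef / se
    p_value = 2.0 * (1.0 - std_normal.cdf(abs(z)))
    z_crit = std_normal.inv_cdf(1.0 - (1.0 - level) / 2.0)
    lower = coef - z_crit * se
    upper = coef + z_crit * se
    return {
        "z": z,
        "p_value": p_value,
        "odds_ratio": math.exp(coef),
        "or_ci": (math.exp(lower), math.exp(upper)),
    }


# Coefficient and standard error for temperature, as printed in the table.
temp = wald_summary(coef=-0.17132, se=0.08344)
print(f"z = {temp['z']:.2f}, p = {temp['p_value']:.3f}")
print(f"odds ratio = {temp['odds_ratio']:.2f}, "
      f"95% CI = ({temp['or_ci'][0]:.2f}, {temp['or_ci'][1]:.2f})")
```

The interval is built on the coefficient scale and then exponentiated, which is why it is not symmetric around the odds ratio.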
  • Slide 34
  • Survival in the Donner Party: in 1846, the Donner and Reed families traveled from Illinois to California by covered wagon. The group became stranded in the eastern Sierra Nevada mountains when it was hit by heavy snow. 40 of the 87 members died from famine and exposure. Are females better able to withstand harsh conditions than males?
  • Slide 35
  • Survival in the Donner Party
  • Slide 36
  • Link Function: Logit
    Response Information
    Variable  Value     Count
    STATUS    SURVIVED     20  (Event)
              DIED         25
              Total        45
    Logistic Regression Table
                                                  Odds     95% CI
    Predictor      Coef  SE Coef      Z      P   Ratio   Lower   Upper
    Constant      1.633    1.110   1.47  0.141
    AGE        -0.07820  0.03729  -2.10  0.036    0.92    0.86    0.99
    Gender       1.5973   0.7555   2.11  0.034    4.94    1.12   21.72
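With more than one predictor, each coefficient gives an odds ratio for a change in that predictor while the others are held fixed. The sketch below applies the $e^{\beta (A - B)}$ rule to the coefficients printed in the table above. The 10-year age comparison is a hypothetical illustration that does not appear on the slides, and the coding of Gender is not stated in the transcript, so the code does not say which group has the higher odds.

```python
import math

# Coefficients as printed in the Donner Party regression table.
coefficients = {"AGE": -0.07820, "Gender": 1.5973}


def odds_ratio(coef, a, b):
    """Odds at X = a relative to odds at X = b, other predictors fixed."""
    return math.exp(coef * (a - b))


# One-unit changes reproduce the "Odds Ratio" column of the table.
or_age_1yr = odds_ratio(coefficients["AGE"], 1, 0)
or_gender = odds_ratio(coefficients["Gender"], 1, 0)

# Hypothetical comparison: two people of the same gender, 10 years apart in age.
or_age_10yr = odds_ratio(coefficients["AGE"], 10, 0)

print(f"odds ratio per 1 year of age: {or_age_1yr:.2f}")
print(f"odds ratio for Gender (1 vs 0): {or_gender:.2f}")
print(f"odds ratio per 10 years of age: {or_age_10yr:.3f}")
```

The 10-year odds ratio is the 1-year odds ratio raised to the tenth power, since the effect is multiplicative on the odds scale.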