Page 1: Logistic Regression

Logistic Regression

Chongming Yang, Research Support Center

FHSS College

Page 2: Logistic Regression

Rules of Logarithm

$\log(uv) = \log(u) + \log(v)$

$\log(u/v) = \log(u) - \log(v)$

$\log(u^v) = v\log(u)$
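A quick numerical check of the three logarithm rules, using Python's natural log `math.log` (the base used for the logit later in the deck; the rules hold for any base). The values of `u` and `v` are arbitrary choices for illustration.

```python
import math

# Arbitrary positive values chosen for illustration
u, v = 3.7, 0.42

rule_product = math.isclose(math.log(u * v), math.log(u) + math.log(v))   # Log(uv)
rule_quotient = math.isclose(math.log(u / v), math.log(u) - math.log(v))  # Log(u/v)
rule_power = math.isclose(math.log(u ** v), v * math.log(u))              # Log(u^v)

print(rule_product, rule_quotient, rule_power)
```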

Page 3: Logistic Regression

Rules of Exponentiation ($a > 0$)

$a^m a^n = a^{m+n}$

$a^m / a^n = a^{m-n}$

$(a^m)^n = a^{mn}$
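The same kind of check for the exponent rules. The sketch below assumes only that the base is positive; it tries a base between 0 and 1, a base above 1, and $e$, with non-integer exponents chosen arbitrarily. The helper name `exponent_rules_hold` is hypothetical.

```python
import math

def exponent_rules_hold(a: float, m: float, n: float) -> bool:
    """Check the three rules of exponentiation numerically for a base a > 0."""
    product = math.isclose(a**m * a**n, a**(m + n))    # a^m a^n = a^(m+n)
    quotient = math.isclose(a**m / a**n, a**(m - n))   # a^m / a^n = a^(m-n)
    power = math.isclose((a**m) ** n, a**(m * n))      # (a^m)^n = a^(mn)
    return product and quotient and power

# Bases inside and outside (0, 1), including e
for base in (0.3, 2.5, math.e):
    print(base, exponent_rules_hold(base, 1.3, -0.7))
```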

Page 4: Logistic Regression

Exponential & Logarithmic Functions: Inverses of One Another

$y = a^x$

$x = \log_a(y)$
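To see the inverse relationship concretely, the sketch below raises a base to a power and then takes the logarithm in that base to recover the exponent; it also checks the natural-log pair $\ln(e^x) = x$ that the logit model relies on. Base 10 and the exponents are arbitrary; `exp_then_log` is a hypothetical helper.

```python
import math

def exp_then_log(a: float, x: float) -> float:
    """Compute y = a**x, then recover x = log_a(y)."""
    y = a ** x
    return math.log(y, a)   # math.log(value, base)

for x in (-2.0, 0.5, 3.0):
    # Base-10 round trip and the natural-log round trip ln(e^x)
    print(x, exp_then_log(10.0, x), math.log(math.exp(x)))
```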

Page 5: Logistic Regression

Assumptions of Linear Regression

$Y_i = \alpha + \beta X_i + \varepsilon_i$

$Y_i$ is continuous and unbounded

The expected value (mean) of $\varepsilon_i$ is 0

$\varepsilon_i$ is normally distributed

$\varepsilon_i$ is not correlated with the predictors

Absence of perfect multicollinearity

No measurement error in any variable

Page 6: Logistic Regression

Violations of Linear Regression (LR) Assumptions

Dichotomous dependent variable (DV)

Unordered categorical (nominal) DV

Ordered categorical (ordinal) DV
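One concrete way a dichotomous DV breaks linear regression: fitting ordinary least squares to a 0/1 outcome (the "linear probability model") can produce fitted "probabilities" below 0 or above 1, since $Y_i$ is bounded but the fitted line is not. The toy data below are hypothetical and chosen only to show the effect.

```python
# Hypothetical toy data: a 0/1 outcome y observed at ten values of x
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

# Ordinary least squares for y = a + b*x
n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum((xi - x_bar) ** 2 for xi in x)
a = y_bar - b * x_bar

fitted = [a + b * xi for xi in x]
print(round(min(fitted), 3), round(max(fitted), 3))  # -0.127 1.127: outside [0, 1]
```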

Page 7: Logistic Regression

Natural Logarithmic Transformation (Binary DV)

Let $p$ = probability of an event
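A small sketch of the transformation applied to $p$: the odds $p/(1-p)$, their natural log (the logit), and the inverse map back to a probability. The function names are hypothetical; the point is that the logit stretches the bounded range $(0, 1)$ onto the whole real line, with $\text{logit}(0.5) = 0$.

```python
import math

def odds(p: float) -> float:
    """Odds of the event: p / (1 - p), defined for 0 < p < 1."""
    return p / (1.0 - p)

def logit(p: float) -> float:
    """Natural log of the odds: ln(p / (1 - p))."""
    return math.log(odds(p))

def inv_logit(z: float) -> float:
    """Map a logit back to a probability: 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

for p in (0.1, 0.5, 0.9):
    print(p, round(odds(p), 3), round(logit(p), 3))
```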

Page 8: Logistic Regression

Logit Model
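The equation on this slide did not survive extraction. Assuming the deck uses the standard specification, with $p$ the probability of the event and predictors $X_1, \ldots, X_k$ (the $\beta$ notation here is an assumption), the logit model is:

```latex
\ln\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k
```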

Page 9: Logistic Regression

Rearranged Logit Model
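This slide's equation is also missing. A rearrangement consistent with the logit model, obtained by exponentiating both sides (exponential and logarithm being inverses), expresses the odds directly. Treat it as a reconstruction in assumed notation, not the slide's exact content:

```latex
\frac{p}{1-p} = e^{\beta_0 + \beta_1 X_1 + \cdots + \beta_k X_k}
```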

Page 10: Logistic Regression

Logistic Model
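Solving the odds equation for $p$ gives the standard logistic model, which keeps predicted probabilities strictly between 0 and 1. This is again a reconstruction of the missing slide equation, in the same assumed $\beta$ notation:

```latex
p = \frac{e^{\beta_0 + \beta_1 X_1 + \cdots + \beta_k X_k}}{1 + e^{\beta_0 + \beta_1 X_1 + \cdots + \beta_k X_k}}
  = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X_1 + \cdots + \beta_k X_k)}}
```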

Page 12: Logistic Regression

Odds Ratio

$OR = \frac{p(1)/[1 - p(1)]}{p(0)/[1 - p(0)]} = e^{B}$
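A numerical sketch of why the odds ratio for a 0/1 predictor equals $e^B$: under the logistic model the odds at $x$ are $e^{b_0 + Bx}$, so the intercept cancels in the ratio. The coefficients `b0` and `B` are hypothetical values chosen for illustration.

```python
import math

def prob(b0: float, b1: float, x: float) -> float:
    """Logistic-model probability p(x) = 1 / (1 + e^-(b0 + b1*x))."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

def odds_ratio(p1: float, p0: float) -> float:
    """Odds at x = 1 divided by odds at x = 0."""
    return (p1 / (1.0 - p1)) / (p0 / (1.0 - p0))

# Hypothetical coefficients for a dichotomous predictor (1 = group A, 0 = group B)
b0, B = -1.2, 0.8
p1, p0 = prob(b0, B, 1), prob(b0, B, 0)
print(odds_ratio(p1, p0), math.exp(B))  # equal up to floating-point rounding
```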

Page 13: Logistic Regression

Interpretation of Coefficients (odds ratio)

Dichotomous predictor X1:

The predicted odds of a positive response for group A are ? times the odds for group B.

The odds of a positive response for group A are ?% higher than the odds for group B.

Continuous predictor X2:

A one-unit increase in X2 is associated with a ?% increase in the predicted odds of a positive response.
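The "?%" in these sentences comes from the coefficient as $(e^{B} - 1) \times 100$. A minimal sketch with hypothetical coefficients; a negative coefficient gives a percentage decrease.

```python
import math

def pct_change_in_odds(b: float, delta: float = 1.0) -> float:
    """Percent change in the predicted odds for a `delta`-unit increase in X."""
    return (math.exp(b * delta) - 1.0) * 100.0

# Hypothetical coefficients
print(round(pct_change_in_odds(0.8), 1))    # X1: odds ratio e^0.8 ≈ 2.23, i.e. about 122.6% higher
print(round(pct_change_in_odds(-0.05), 1))  # X2: each unit lowers the odds by about 4.9%
```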

Page 14: Logistic Regression

Interpretation

See handout

Page 15: Logistic Regression

Interpretation of Interaction

Definition: The effect of a covariate depends on the level of another covariate.

Interpretation:

Plug in some values of the two variables

Plot the estimated logit

Interpret the interaction effect only when the main effects are present
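The "plug in values" step can be sketched as follows, assuming a hypothetical fitted model $\text{logit} = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_1 x_2$ with a 0/1 covariate `x1` and a continuous `x2`. The printed values are what one would plot; the gap between the two groups, $b_1 + b_3 x_2$, changes with `x2`, which is what the interaction means.

```python
# Hypothetical fitted coefficients (main effects b1, b2 and interaction b3)
b0, b1, b2, b3 = -2.0, 0.5, 0.03, 0.02

def est_logit(x1: float, x2: float) -> float:
    """Estimated logit with an x1-by-x2 interaction term."""
    return b0 + b1 * x1 + b2 * x2 + b3 * x1 * x2

for x2 in (10, 20, 30):
    gap = est_logit(1, x2) - est_logit(0, x2)   # effect of x1 at this level of x2
    print(x2, round(est_logit(0, x2), 2), round(est_logit(1, x2), 2), round(gap, 2))
```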

Page 16: Logistic Regression

Likelihood at value of X (left side of equation)

$L = \prod_{i=1}^{n} p_i^{y_i}(1 - p_i)^{1 - y_i}$

Page 17: Logistic Regression

Log Likelihood (left side of equation)
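Taking logs turns the product over subjects into a sum, $\ln L = \sum_{i=1}^{n}[y_i \ln p_i + (1 - y_i)\ln(1 - p_i)]$, which is easier to maximize and avoids numerical underflow for large $n$. The sketch computes both forms for hypothetical outcomes and predicted probabilities.

```python
import math

def likelihood(y, p):
    """L = prod p_i^y_i * (1 - p_i)^(1 - y_i) for binary outcomes y_i."""
    out = 1.0
    for yi, pi in zip(y, p):
        out *= pi ** yi * (1.0 - pi) ** (1 - yi)
    return out

def log_likelihood(y, p):
    """ln L = sum of y_i ln p_i + (1 - y_i) ln(1 - p_i)."""
    return sum(yi * math.log(pi) + (1 - yi) * math.log(1.0 - pi) for yi, pi in zip(y, p))

y = [1, 0, 1, 1, 0]
p = [0.8, 0.3, 0.6, 0.9, 0.2]   # hypothetical predicted probabilities
print(likelihood(y, p), math.log(likelihood(y, p)), log_likelihood(y, p))
```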

Page 18: Logistic Regression

Log Logit Model (right side of equation)
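The slide's equation is missing. One standard way to connect the two sides, assuming a single predictor with linear predictor $\eta_i = \beta_0 + \beta_1 x_i$: use $\ln\frac{p_i}{1-p_i} = \eta_i$ and $1 - p_i = 1/(1 + e^{\eta_i})$ to write the log likelihood in terms of the coefficients:

```latex
\begin{aligned}
\ln L &= \sum_{i=1}^{n} \left[ y_i \ln p_i + (1 - y_i) \ln(1 - p_i) \right] \\
      &= \sum_{i=1}^{n} \left[ y_i \ln\frac{p_i}{1 - p_i} + \ln(1 - p_i) \right] \\
      &= \sum_{i=1}^{n} \left[ y_i \eta_i - \ln\left(1 + e^{\eta_i}\right) \right],
      \qquad \eta_i = \beta_0 + \beta_1 x_i
\end{aligned}
```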

Page 19: Logistic Regression

Maximum Likelihood Estimation
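A minimal sketch of maximum likelihood estimation for one predictor using Newton-Raphson, one common algorithm for this problem (statistical packages may use variants such as iteratively reweighted least squares). Each step solves a 2×2 system built from the score $\sum (y_i - p_i)(1, x_i)$ and the information matrix $\sum p_i(1 - p_i)(1, x_i)(1, x_i)^T$. The data are hypothetical and not perfectly separated, so the estimates are finite.

```python
import math

def fit_logistic(x, y, iters=25, tol=1e-10):
    """Newton-Raphson MLE for logit(p) = b0 + b1*x with one predictor."""
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        # Score (gradient of ln L) and information (negative Hessian)
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        # Solve the 2x2 system (information) * step = score
        det = h00 * h11 - h01 * h01
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + step0, b1 + step1
        if abs(step0) + abs(step1) < tol:
            break
    return b0, b1

# Hypothetical data
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]
b0_hat, b1_hat = fit_logistic(x, y)
print(round(b0_hat, 3), round(b1_hat, 3))
```

At the maximum the score equations hold: predicted and observed event counts agree, overall and weighted by `x`.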

Page 20: Logistic Regression

Likelihood Ratio Test of $\beta_0, \beta_1, \ldots$

Likelihood ratio test = Deviance = $-2\log\left(\frac{\text{likelihood of fitted model}}{\text{likelihood of saturated model}}\right)$

For individual 0/1 data, the likelihood of the saturated model = 1, so Deviance = $-2\log(\text{likelihood of fitted model})$
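A sketch of the deviance for ungrouped 0/1 data: the saturated model predicts every observed $y_i$ exactly, so its log likelihood is $\ln 1 = 0$ and the deviance reduces to $-2\ln L$ of the fitted model. The `deviance` helper is hypothetical.

```python
import math

def deviance(y, p):
    """D = -2 ln(L_fitted / L_saturated) for individual 0/1 outcomes."""
    ll_fitted = sum(yi * math.log(pi) + (1 - yi) * math.log(1 - pi) for yi, pi in zip(y, p))
    ll_saturated = 0.0  # saturated model fits each y_i exactly: ln(1) = 0
    return -2.0 * (ll_fitted - ll_saturated)

print(deviance([1, 0], [0.5, 0.5]))   # a coin-flip model on two cases: 4 ln 2
```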

Page 21: Logistic Regression

$\chi^2$ Test of $\beta_0, \beta_1, \ldots$

1. $\chi^2 = -2\ln\left(\frac{\text{likelihood of model without } x}{\text{likelihood of model with } x}\right)$

2. Degrees of freedom = $j - (p + 1)$, where $j$ = (# of categories) + (# of continuous variables) and $p$ = # of parameters
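A sketch of the likelihood-ratio $\chi^2$ for adding a single predictor (one extra parameter, so 1 df), using log likelihoods: $\chi^2 = -2(\ln L_{\text{without}} - \ln L_{\text{with}})$. For 1 df the tail probability has the closed form $P(\chi^2_1 > c) = \text{erfc}(\sqrt{c/2})$, so only the standard library is needed. The log-likelihood values are hypothetical.

```python
import math

def lr_test_1df(ll_without: float, ll_with: float):
    """Likelihood-ratio chi-square for one added parameter, with its p-value."""
    chi2 = -2.0 * (ll_without - ll_with)
    p_value = math.erfc(math.sqrt(chi2 / 2.0))   # P(chi-square with 1 df > chi2)
    return chi2, p_value

# Hypothetical log likelihoods of the nested models
chi2, p = lr_test_1df(ll_without=-6.93, ll_with=-4.91)
print(round(chi2, 2), round(p, 4))
```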

Page 22: Logistic Regression

Hosmer-Lemeshow Test ($\chi^2$) (grouping by percentiles of estimated p)

$\hat{C} = \sum_{k=1}^{g} \frac{(o_k - n'_k \bar{p}_k)^2}{n'_k \bar{p}_k (1 - \bar{p}_k)}$

where

$o_k = \sum_{j=1}^{c_k} y_j$ and $\bar{p}_k = \sum_{j=1}^{c_k} \frac{m_j \hat{p}_j}{n'_k}$

g = 10, k = 1, …, 10, $n'_k$ = number of subjects in the kth group, $c_k$ = number of covariate patterns in the kth group, $m_j$ = number of subjects with covariate pattern j, $\bar{p}_k$ = average estimated probability, df = g − 2

Page 23: Logistic Regression

Layout of estimated and observed counts by group:

Columns: Group 1 (10% prob.), Group 2 (20% prob.), …, Group 10 (100% prob.), each split into Estimated and Observed

Rows: y = 1 and y = 0

Estimated: N1, N2, …

Observed: N3, N4, …
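A minimal sketch of the Hosmer-Lemeshow statistic under two stated assumptions: one subject per covariate pattern ($m_j = 1$, ungrouped data), and groups formed as $g$ near-equal slices of the subjects sorted by estimated probability. The p-value uses the closed form $P(\chi^2_{2r} > c) = e^{-c/2}\sum_{i=0}^{r-1}(c/2)^i/i!$, valid only for even df, which holds for the slide's $g = 10$ (df = 8). The helper name is hypothetical.

```python
import math

def hosmer_lemeshow(y, p_hat, g=10):
    """Hosmer-Lemeshow C-hat, df = g - 2, and p-value (even df only)."""
    pairs = sorted(zip(p_hat, y))
    n = len(pairs)
    c_hat = 0.0
    for k in range(g):
        group = pairs[round(k * n / g): round((k + 1) * n / g)]
        n_k = len(group)
        o_k = sum(yi for _, yi in group)            # observed events in group k
        p_bar = sum(pi for pi, _ in group) / n_k    # average estimated probability
        c_hat += (o_k - n_k * p_bar) ** 2 / (n_k * p_bar * (1 - p_bar))
    df = g - 2
    assert df % 2 == 0, "closed-form p-value below needs an even df"
    half = c_hat / 2.0
    p_value = math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))
    return c_hat, df, p_value
```

A well-calibrated model gives observed counts close to $n'_k \bar{p}_k$ in every group, a small $\hat{C}$, and a large p-value.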

Page 24: Logistic Regression

Wald Test of $\beta_0, \beta_1, \ldots$

$W = \beta / se(\beta)$ (se = standard error)

W is tested against the standard normal distribution
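A sketch of the Wald test for one coefficient: divide the estimate by its standard error and compare with the standard normal; the two-sided p-value is $2(1 - \Phi(|W|)) = \text{erfc}(|W|/\sqrt{2})$. The estimate and standard error are hypothetical.

```python
import math

def wald_test(b: float, se: float):
    """W = b / se(b), referred to the standard normal; two-sided p-value."""
    w = b / se
    p_value = math.erfc(abs(w) / math.sqrt(2.0))   # = 2 * (1 - Phi(|W|))
    return w, p_value

w, p = wald_test(0.8, 0.35)   # hypothetical estimate and standard error
print(round(w, 3), round(p, 4))
```

SPSS's logistic regression output reports the Wald statistic as the square of this ratio, compared with a $\chi^2$ distribution on 1 df; the p-value is the same.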

Page 25: Logistic Regression

Multinomial Logistic Regression (non-ordered categorical DV)

P = probability of a response category: $P_{i1} + P_{i2} + P_{i3} = 1$

$\log\left(\frac{p_{i1}}{p_{i3}}\right) = B_1 X_i$

$\log\left(\frac{p_{i2}}{p_{i3}}\right) = B_2 X_i$

$\log\left(\frac{p_{i1}}{p_{i2}}\right) = B_3 X_i$

Page 26: Logistic Regression

Multinomial Logistic Regression

$p_{iK} = \frac{1}{1 + \sum_{k=1}^{K-1} e^{B_k x_i}}$
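A sketch of how multinomial probabilities follow from the log-odds equations, assuming the last category $K$ is the reference: $p_k = e^{B_k x}/(1 + \sum_j e^{B_j x})$ for $k < K$ and $p_K = 1/(1 + \sum_j e^{B_j x})$. The coefficients are hypothetical. The check at the end also shows that the comparison of categories 1 and 2 needs no separate model: its coefficient is $B_1 - B_2$.

```python
import math

def multinomial_probs(betas, x):
    """Category probabilities with the last category K as reference.

    betas: one coefficient list per non-reference category, aligned with x
    (x includes 1.0 for the intercept).
    """
    exps = [math.exp(sum(b * xi for b, xi in zip(bk, x))) for bk in betas]
    denom = 1.0 + sum(exps)
    return [e / denom for e in exps] + [1.0 / denom]

# Hypothetical three-category example: intercept plus one predictor
B1, B2 = [0.5, 0.2], [-0.3, 0.6]
x = [1.0, 1.5]
p1, p2, p3 = multinomial_probs([B1, B2], x)
print(round(p1, 3), round(p2, 3), round(p3, 3))
# ln(p1/p2) equals (B1 - B2) . x = 0.8 - 0.6 = 0.2
print(round(math.log(p1 / p2), 3))
```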

Page 27: Logistic Regression

Interpretation

See handout

Page 28: Logistic Regression

Ordinal Logistic Models

Adjacent Category Model: compare two adjacent categories

Page 29: Logistic Regression

Adjacent Categories Model

Let j be an ordinal scale, j = 1, …; j and j + 1 are two adjacent categories

Model:

$\log\left(\frac{p_{ij}}{p_{i,j+1}}\right) = a_j + B_j x_i$
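A sketch of recovering category probabilities from an adjacent-category model, assuming it is written as $\ln(p_{ij}/p_{i,j+1}) = a_j + B_j x_i$ for each pair of neighboring categories. Starting from category 1, each next category's unnormalized weight is the previous one times $e^{-(a_j + B_j x)}$; normalizing gives probabilities. The intercepts `a` and slopes `B` are hypothetical.

```python
import math

def adjacent_category_probs(a, B, x):
    """Probabilities for J categories given J-1 adjacent-category logits.

    Assumes ln(p_j / p_{j+1}) = a[j] + B[j] * x.
    """
    weights = [1.0]                                  # unnormalized weight of category 1
    for a_j, b_j in zip(a, B):
        weights.append(weights[-1] * math.exp(-(a_j + b_j * x)))
    total = sum(weights)
    return [w / total for w in weights]

p = adjacent_category_probs(a=[0.4, -0.2, 0.1], B=[0.3, 0.5, -0.1], x=2.0)
print([round(pj, 3) for pj in p])
```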

Page 30: Logistic Regression

Practice: Run Logistic Regression Using ‘binary.sav’

DV = Admit

IV = gre, gpa, rank

Annotated output: http://www.ats.ucla.edu/stat/spss/dae/logit.htm

Page 31: Logistic Regression

Pseudo R-squared (based on likelihood)

Explained variability

Improvement from null model to fitted model

Square of correlation (predicted and observed)

Page 32: Logistic Regression

Pseudo R-Square

Cox & Snell: improvement of full model over intercept model

Nagelkerke: improvement of full model over intercept model, rescaled so that the maximum possible value is 1

McFadden (adjusted): analogous to adjusted R-squared in OLS, penalizing a model with too many predictors

http://www.ats.ucla.edu/stat/mult_pkg/faq/general/Psuedo_RSquareds.htm
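A sketch of these measures from the two log likelihoods, using the usual definitions: Cox & Snell $1 - (L_0/L_1)^{2/n}$, Nagelkerke as Cox & Snell divided by its maximum $1 - L_0^{2/n}$, McFadden $1 - \ln L_1/\ln L_0$, and McFadden's adjusted version $1 - (\ln L_1 - K)/\ln L_0$ with $K$ estimated parameters. The inputs are hypothetical.

```python
import math

def pseudo_r2(ll_null: float, ll_full: float, n: int, k: int) -> dict:
    """Likelihood-based pseudo R-squared measures.

    ll_null: log likelihood of the intercept-only model
    ll_full: log likelihood of the fitted model
    n: sample size; k: number of estimated parameters in the fitted model
    """
    cox_snell = 1.0 - math.exp((2.0 / n) * (ll_null - ll_full))
    nagelkerke = cox_snell / (1.0 - math.exp((2.0 / n) * ll_null))
    mcfadden = 1.0 - ll_full / ll_null
    mcfadden_adj = 1.0 - (ll_full - k) / ll_null
    return {"cox_snell": cox_snell, "nagelkerke": nagelkerke,
            "mcfadden": mcfadden, "mcfadden_adj": mcfadden_adj}

r2 = pseudo_r2(ll_null=-100.0, ll_full=-80.0, n=200, k=3)   # hypothetical values
print({name: round(value, 3) for name, value in r2.items()})
```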

Page 33: Logistic Regression

Practice (continued)

Run Multinomial Logistic Regression Using ‘mlogit.sav’

DV = Brand

IV = female, age

Annotated output: http://www.ats.ucla.edu/stat/spss/dae/mlogit.htm

Page 34: Logistic Regression

Practice (continued)

Run Ordinal Logistic Regression Using ‘ologit.sav’

DV = admit

IV = gre, gpa, topnotch

Annotated output: http://www.ats.ucla.edu/stat/SPSS/dae/ologit.htm

Page 35: Logistic Regression

Practical Issues

1. Low Ratio of Cases to Variables

Problem: extremely large parameter estimates and standard errors

Solution:

Collapse categories

Delete the offending category

Delete discrete predictors

Page 36: Logistic Regression

Practical Issues

2. Inadequacy of Expected Frequencies & Power

Problem: lower power with small-frequency cells

Solution:

Accept low power

Collapse categories or delete discrete predictors

Evaluate model fit with $\chi^2$

Page 37: Logistic Regression

Practical Issues

3. Presence of Multicollinearity

Problem: large standard errors or estimates

Solution:

Run multiway frequency tables to identify collinear categorical variables

Run correlations to identify collinear continuous variables

Delete theoretically less important predictors or combine them with other procedures

Page 38: Logistic Regression

Practical Issues

Rare events may be more appropriately modeled with Poisson regression or negative binomial regression.

Page 39: Logistic Regression

References

1. Allison, P. D. Logistic regression using the SAS system. Cary, NC: SAS Institute, Inc.

2. Hosmer, D. W. & Lemeshow, S. (2000). Applied logistic regression. New York: John Wiley & Sons, Inc.

3. Menard, S. (1994). Applied logistic regression analysis. Thousand Oaks, CA: Sage Publications, Inc.

4. Liao, T. F. (1994). Interpreting probability models: Logit, probit, and other generalized linear models. Thousand Oaks, CA: Sage Publications, Inc.

5. Long, J. S. & Freese, J. (2006). Regression models for categorical dependent variables using Stata. College Station, TX: Stata Press.