Logistic Regression
Chongming Yang
Research Support Center
FHSS College
Rules of Logarithm
  $\log(uv) = \log(u) + \log(v)$
  $\log(u/v) = \log(u) - \log(v)$
  $\log(u^v) = v \log(u)$
Rules of Exponentiation (0 < a < 1)
  $a^m a^n = a^{m+n}$
  $a^m / a^n = a^{m-n}$
  $(a^m)^n = a^{mn}$
Exponential & Logarithmic: Inverse of One Another
  $Y = a^X$
  $X = \log_a(Y)$
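The logarithm and exponent rules above, and the inverse relationship between them, can be checked numerically. This is a minimal sketch using arbitrary illustrative values (u, v, a, m, n and x are not from the slides).

```python
import math

u, v = 8.0, 2.0          # arbitrary positive values
a, m, n = 0.5, 3.0, 2.0  # base a chosen inside the slide's range 0 < a < 1

# Rules of logarithm
log_rules_hold = all([
    math.isclose(math.log(u * v), math.log(u) + math.log(v)),
    math.isclose(math.log(u / v), math.log(u) - math.log(v)),
    math.isclose(math.log(u ** v), v * math.log(u)),
])

# Rules of exponentiation
exp_rules_hold = all([
    math.isclose(a ** m * a ** n, a ** (m + n)),
    math.isclose(a ** m / a ** n, a ** (m - n)),
    math.isclose((a ** m) ** n, a ** (m * n)),
])

# Inverse relationship: y = a**x  <=>  x = log_a(y)
x = 3.0
y = a ** x
x_recovered = math.log(y, a)  # logarithm of y in base a
```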
Assumptions of Linear Regression
  $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
  $Y_i$ is continuous and unbounded
  Expected value (mean) of $\varepsilon_i$ = 0
  $\varepsilon_i$ is normally distributed
  $\varepsilon_i$ is not correlated with the predictors
  Absence of perfect multicollinearity
  No measurement error in any variable
Violation of Linear Regression (LR) Assumptions
  Dichotomous dependent variable (DV)
  Unordered categorical (nominal) DV
  Ordered categorical (ordinal) DV
Natural Logarithmic Transformation (Binary DV)
  Let p = probability of an event
  Logit Model
  Rearranged Logit Model
  Logistic Model
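The three model forms named above can be sketched in code. This assumes the standard single-predictor forms: logit ln(p / (1 - p)) = b0 + b1 x, rearranged as odds p / (1 - p) = exp(b0 + b1 x), and the logistic form p = 1 / (1 + exp(-(b0 + b1 x))). The coefficient names and values are hypothetical.

```python
import math

def logit(p: float) -> float:
    """Natural log of the odds p / (1 - p); defined for 0 < p < 1."""
    return math.log(p / (1.0 - p))

def logistic(z: float) -> float:
    """Inverse of the logit: maps a linear predictor z to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

b0, b1 = -1.0, 0.5   # hypothetical coefficients
x = 4.0              # hypothetical predictor value

z = b0 + b1 * x      # logit model: ln(p / (1 - p)) = b0 + b1 * x
odds = math.exp(z)   # rearranged logit model: p / (1 - p) = exp(b0 + b1 * x)
p = logistic(z)      # logistic model: p = 1 / (1 + exp(-(b0 + b1 * x)))
```

The logit is unbounded while p stays between 0 and 1, which is why the transformation suits a binary DV.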
Odds Ratio
  $OR = \dfrac{p(1) / [1 - p(1)]}{p(0) / [1 - p(0)]} = e^{B}$
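A short numerical check of the odds ratio identity: the ratio of the odds at X = 1 to the odds at X = 0 equals e^B. The intercept and slope values are hypothetical, and the model is assumed to have a single dichotomous predictor.

```python
import math

def logistic(z: float) -> float:
    """Probability implied by a linear predictor z."""
    return 1.0 / (1.0 + math.exp(-z))

def odds(p: float) -> float:
    """Odds corresponding to a probability p."""
    return p / (1.0 - p)

b0, b = -1.0, 0.7            # hypothetical intercept and coefficient B

p1 = logistic(b0 + b * 1)    # p(1): probability when X = 1
p0 = logistic(b0 + b * 0)    # p(0): probability when X = 0

odds_ratio = odds(p1) / odds(p0)  # should agree with math.exp(b)
```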
Interpretation of Coefficients (odds ratio)
  Dichotomous predictor X1:
    The predicted odds of a positive response for group A are ? times the odds for group B.
    The odds of a positive response for group A are ?% higher than the odds for group B.
  Continuous predictor X2:
    A one-unit increase in X2 is associated with a ?% increase in the predicted odds of a positive response.
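The "?" placeholders above can be filled from an estimated coefficient. A minimal sketch, assuming B is on the logit scale; the function names are illustrative only.

```python
import math

def odds_multiplier(b: float, delta: float = 1.0) -> float:
    """Factor by which the odds are multiplied when X increases by delta."""
    return math.exp(b * delta)

def percent_change_in_odds(b: float, delta: float = 1.0) -> float:
    """Percent change in the odds when X increases by delta."""
    return (math.exp(b * delta) - 1.0) * 100.0

b_example = math.log(2.0)  # hypothetical coefficient whose odds ratio is 2
times = odds_multiplier(b_example)            # "? times the odds"
percent = percent_change_in_odds(b_example)   # "?% higher"
```

A negative coefficient gives a multiplier below 1 and a negative percent change, that is, lower odds.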
Interpretation
  See handout
Interpretation of Interaction
  Definition:
    The effect of a covariate depends on the level of another covariate.
  Interpretation:
    Plug in some values of the two variables
    Plot the estimated logit
    Interpret the interaction effect only when the main effects are present
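The plug-in step can be sketched as follows, assuming a logit model with two covariates and a product term, logit = b0 + b1 x1 + b2 x2 + b3 x1 x2. All coefficient values are hypothetical.

```python
def logit_hat(x1, x2, b0=-1.0, b1=0.5, b2=0.3, b3=-0.4):
    """Estimated logit for given covariate values (hypothetical coefficients)."""
    return b0 + b1 * x1 + b2 * x2 + b3 * x1 * x2

def slope_of_x1(x2, b1=0.5, b3=-0.4):
    """Effect of a one-unit increase in x1 on the logit at a given level of x2."""
    return b1 + b3 * x2

# Plug in a few values of the two variables; these points could then be plotted
# as one line per level of x2.
grid = {(x1, x2): logit_hat(x1, x2) for x1 in (0, 1, 2) for x2 in (0, 1)}
```

Non-parallel lines in such a plot are the visual signature of an interaction: here the slope of x1 is b1 when x2 = 0 and b1 + b3 when x2 = 1.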
Likelihood at value of X (left side of equation)
  $L = \prod_{i=1}^{n} \left( \dfrac{p_i}{1 - p_i} \right)^{y_i} (1 - p_i)$
Log Likelihood (left side of equation)
Log Logit Model (right side of equation)
Maximum Likelihood Estimation
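A minimal sketch of maximum likelihood estimation for a single-predictor logit model. It assumes Newton-Raphson iteration (the slides do not say which algorithm is used) and a small made-up data set; `fit_logistic`, `xs` and `ys` are illustrative names.

```python
import math

def fit_logistic(xs, ys, iterations=25):
    """Newton-Raphson for ln(p/(1-p)) = b0 + b1*x; returns (b0, b1, log_likelihood)."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        ps = [1.0 / (1.0 + math.exp(-(b0 + b1 * x))) for x in xs]
        # Gradient of the log likelihood with respect to b0 and b1
        g0 = sum(y - p for y, p in zip(ys, ps))
        g1 = sum((y - p) * x for x, y, p in zip(xs, ys, ps))
        # Information matrix (negative Hessian), a symmetric 2x2 matrix
        w = [p * (1.0 - p) for p in ps]
        h00 = sum(w)
        h01 = sum(wi * x for wi, x in zip(w, xs))
        h11 = sum(wi * x * x for wi, x in zip(w, xs))
        det = h00 * h11 - h01 * h01
        # Newton step: add (information matrix)^-1 times the gradient
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    ps = [1.0 / (1.0 + math.exp(-(b0 + b1 * x))) for x in xs]
    ll = sum(y * math.log(p) + (1 - y) * math.log(1.0 - p) for y, p in zip(ys, ps))
    return b0, b1, ll

# Made-up data: the outcome becomes more likely as x grows, with some overlap
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 1, 0, 1, 0, 1, 1]
b0_hat, b1_hat, log_likelihood = fit_logistic(xs, ys)
```

At the maximum the gradient is zero, so the sum of the fitted probabilities equals the number of observed events.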
Likelihood Ratio Test of $\beta_0, \beta_1, \ldots$
  Likelihood ratio test = Deviance = -2 log(likelihood of fitted model / likelihood of saturated model)
  Likelihood of saturated model = 1
  Deviance = -2 log(likelihood of fitted model)
$\chi^2$ Test of $\beta_0, \beta_1, \ldots$
  1. $\chi^2$ = -2 Ln[(likelihood of model without x) / (likelihood of model with x)]
  2. Degrees of freedom = j - (p + 1),
     where j = (# of categories) + (# of continuous variables) and p = # of parameters
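The likelihood ratio statistic is usually computed from log likelihoods rather than raw likelihoods. A minimal sketch, assuming the two models differ by a single parameter so the statistic is referred to a chi-square distribution with 1 degree of freedom. This is an assumption made for the example, not the degrees-of-freedom formula on the slide, and the log likelihood values are hypothetical.

```python
import math

def likelihood_ratio_chi2(ll_without_x: float, ll_with_x: float) -> float:
    """-2 ln(L_without / L_with), written in terms of log likelihoods."""
    return -2.0 * (ll_without_x - ll_with_x)

def chi2_upper_tail_df1(x: float) -> float:
    """P(chi-square with 1 df > x), using the identity with the normal tail."""
    return math.erfc(math.sqrt(x / 2.0))

ll_without_x = -5.545   # hypothetical log likelihood of the model without x
ll_with_x = -4.200      # hypothetical log likelihood of the model with x

chi2 = likelihood_ratio_chi2(ll_without_x, ll_with_x)
p_value = chi2_upper_tail_df1(chi2)
```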
Hosmer-Lemeshow Test ($\chi^2$) (grouping by percentile of estimated p)
  $\hat{C} = \sum_{k=1}^{g} \dfrac{(o_k - n'_k \bar{p}_k)^2}{n'_k \bar{p}_k (1 - \bar{p}_k)}$
  where
  $o_k = \sum_{j=1}^{c_k} y_j$
  $\bar{p}_k = \sum_{j=1}^{c_k} \dfrac{m_j \hat{p}_j}{n'_k}$
  g = 10; k = 1, ..., 10; $n'_k$ = number of subjects in the kth group; $c_k$ = # of covariate patterns in the kth group; $m_j$ = number of subjects with the jth covariate pattern; $\bar{p}_k$ = average estimated probability; df = g - 2
  Grouping table (10 groups by estimated probability):
    Group 1 (10% prob.), Group 2 (20% prob.), …, Group 10 (100% prob.)
    Within each group, estimated and observed counts are compared for y = 1 and y = 0
    Estimated: N1, N2, …
    Observed: N3, N4, …
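The Hosmer-Lemeshow statistic can be sketched as follows. It makes a simplifying assumption: every subject is treated as its own covariate pattern (m_j = 1), and the g groups are formed by sorting subjects on estimated probability and cutting into equal-sized blocks. The data are made up, and `hosmer_lemeshow` is an illustrative name.

```python
def hosmer_lemeshow(y, p_hat, g=10):
    """Return (C_hat, df) from g groups of subjects sorted by estimated probability."""
    pairs = sorted(zip(p_hat, y))
    n = len(pairs)
    c_hat = 0.0
    for k in range(g):
        group = pairs[k * n // g:(k + 1) * n // g]
        n_k = len(group)                           # subjects in group k
        o_k = sum(yi for _, yi in group)           # observed events in group k
        p_bar = sum(pi for pi, _ in group) / n_k   # average estimated probability
        c_hat += (o_k - n_k * p_bar) ** 2 / (n_k * p_bar * (1.0 - p_bar))
    return c_hat, g - 2

# Made-up example: 20 subjects with evenly spaced estimated probabilities
p_hat = [(i + 0.5) / 20 for i in range(20)]
y = [0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1]
c_hat, df = hosmer_lemeshow(y, p_hat)
```

A small statistic relative to a chi-square distribution with g - 2 degrees of freedom indicates that observed and estimated counts agree within groups.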
Wald Test of $\beta_0, \beta_1, \ldots$
  $W = \beta / se(\beta)$ (se = standard error)
  Normal distribution test
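A minimal sketch of the Wald test, assuming a two-sided test against the standard normal distribution. The estimate and standard error are hypothetical values.

```python
import math

def wald_test(b: float, se: float):
    """Return (W, two-sided p-value) with W = b / se referred to N(0, 1)."""
    w = b / se
    p_value = math.erfc(abs(w) / math.sqrt(2.0))  # P(|Z| > |w|) for standard normal Z
    return w, p_value

b_hat, se_hat = 0.8, 0.3   # hypothetical coefficient estimate and standard error
w, p_value = wald_test(b_hat, se_hat)
```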
Multinomial Logistic Regression (non-ordered categorical DV)
  p = probability of a response category
  $p_{i1} + p_{i2} + p_{i3} = 1$
  $\log\left(\dfrac{p_{i1}}{p_{i3}}\right) = B_1 X$
  $\log\left(\dfrac{p_{i2}}{p_{i3}}\right) = B_2 X$
  $\log\left(\dfrac{p_{i1}}{p_{i2}}\right) = B_3 X$
Multinomial Logistic Regression
  $p_{iK} = \dfrac{1}{1 + \sum_{k=1}^{K-1} e^{B_k x}}$
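Category probabilities for a multinomial logit model can be sketched as follows, assuming the last category K is the reference and, as in the log-ratio equations above, one coefficient per non-reference category with no intercept. The coefficient values are hypothetical.

```python
import math

def multinomial_probs(x: float, coefs):
    """Probabilities for K categories given K-1 coefficients B_k.

    The last element of the result is the reference category K.
    """
    exps = [math.exp(b * x) for b in coefs]
    denom = 1.0 + sum(exps)
    return [e / denom for e in exps] + [1.0 / denom]

coefs = [0.4, -0.2]   # hypothetical B_1 and B_2, with category 3 as the reference
x = 1.5               # hypothetical predictor value
p1, p2, p3 = multinomial_probs(x, coefs)
```

The third log ratio follows from the first two: log(p1 / p2) = (B_1 - B_2) x.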
Interpretation
  See handout
Ordinal Logistic Models
  Adjacent Category Model
    Compare two adjacent categories
Adjacent Categories Model
  Let j be an ordinal scale, j = 1, …
  j and j+1 = two adjacent categories
  Model:
  $\log\left(\dfrac{p_{i,j+1}}{p_{ij}}\right) = a_j + B_j x$
Practice
  Run Logistic Regression using 'binary.sav'
  DV = Admit
  IV = gre, gpa, rank
  Annotated output: http://www.ats.ucla.edu/stat/spss/dae/logit.htm
Pseudo R-squared (based on likelihood)
  Explained variability
  Improvement from null model to fitted model
  Square of correlation (predicted and observed)
Pseudo R-squared
  Cox & Snell
    Improvement of full model over intercept model
  Nagelkerke
    Improvement of full model over intercept model
  McFadden (adjusted)
    Analogous to adjusted R-squared in OLS: penalizes a model with too many predictors
http://www.ats.ucla.edu/stat/mult_pkg/faq/general/Psuedo_RSquareds.htm
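The likelihood-based measures can be sketched from the log likelihoods of the intercept-only and full models. This uses the commonly cited formulas, which are not written out on the slides. The log likelihoods, sample size and predictor count are hypothetical.

```python
import math

def pseudo_r2(ll_null: float, ll_full: float, n: int, k: int = 0) -> dict:
    """Pseudo R-squared measures from log likelihoods.

    ll_null: log likelihood of the intercept-only model
    ll_full: log likelihood of the fitted model
    n: number of cases; k: number of predictors (used only for the adjusted measure)
    """
    cox_snell = 1.0 - math.exp(2.0 * (ll_null - ll_full) / n)
    max_cox_snell = 1.0 - math.exp(2.0 * ll_null / n)  # upper bound of Cox & Snell
    return {
        "mcfadden": 1.0 - ll_full / ll_null,
        "mcfadden_adjusted": 1.0 - (ll_full - k) / ll_null,  # penalizes extra predictors
        "cox_snell": cox_snell,
        "nagelkerke": cox_snell / max_cox_snell,  # rescaled so the maximum is 1
    }

measures = pseudo_r2(ll_null=-250.0, ll_full=-229.0, n=400, k=3)
```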
Practice (continued)
  Run Multinomial Logistic Regression using 'mlogit.sav'
  DV = Brand
  IV = female, age
  Annotated output: http://www.ats.ucla.edu/stat/spss/dae/mlogit.htm
Practice (continued)
  Run Ordinal Logistic Regression using 'ologit.sav'
  DV = admit
  IV = gre, gpa, topnotch
  Annotated output: http://www.ats.ucla.edu/stat/SPSS/dae/ologit.htm
Practical Issues
1. Low Ratio of Cases to Variables
  Problem:
    Extremely large parameter estimates and standard errors
  Solution:
    Collapse categories
    Delete the offending category
    Delete discrete predictors
Practical Issues
2. Inadequacy of Expected Frequencies & Power
  Problem:
    Lower power with small frequency cells
  Solution:
    Accept low power
    Collapse categories or delete discrete predictors
    Evaluate model fit with $\chi^2$
Practical Issues
3. Presence of Multicollinearity
  Problem:
    Large standard errors or large estimates
  Solution:
    Run multiway frequency tables to identify collinear categorical variables
    Run correlations to identify collinear continuous variables
    Delete theoretically less important predictors or combine with other procedures
Practical Issues
  Rare events may be appropriate for Poisson regression or negative binomial regression.
References
  1. Allison, P. D. Logistic regression using the SAS system. Cary, NC: SAS Institute, Inc.
  2. Hosmer, D. W., & Lemeshow, S. (2000). Applied logistic regression. New York: John Wiley & Sons, Inc.
  3. Menard, S. (1994). Applied logistic regression analysis. Thousand Oaks, CA: Sage Publications, Inc.
  4. Liao, T. F. (1994). Interpreting probability models: Logit, probit, and other generalized linear models. Thousand Oaks, CA: Sage Publications, Inc.
  5. Long, J. S., & Freese, J. (2006). Regression models for categorical dependent variables using Stata. College Station, Texas: Stata Press.