logistic regression

Post on 19-Mar-2016



• Logistic Regression

Chongming Yang

Research Support Center, FHSS College

• Rules of Logarithm

$\log(uv) = \log(u) + \log(v)$

$\log(u/v) = \log(u) - \log(v)$

$\log(u^v) = v \log(u)$

• Rules of Exponentiation (0

• Exponential & Logarithmic: Inverse of One Another

$y = a^x$

$x = \log_a(y)$
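
The logarithm rules and the inverse relationship can be checked numerically. The following is a minimal sketch; the values of `u`, `v`, `a`, and `x` are arbitrary illustrations, not taken from the slides.

```python
import math

u, v, a = 8.0, 2.0, 3.0  # arbitrary positive example values

# Product, quotient, and power rules of logarithms
product_ok = math.isclose(math.log(u * v), math.log(u) + math.log(v))
quotient_ok = math.isclose(math.log(u / v), math.log(u) - math.log(v))
power_ok = math.isclose(math.log(u ** v), v * math.log(u))

# Exponential and logarithm undo each other: y = a^x  <=>  x = log_a(y)
x = 2.5
y = a ** x
x_recovered = math.log(y, a)

print(product_ok, quotient_ok, power_ok, x_recovered)
```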

• Assumptions of Linear Regression

$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$

$Y_i$ is continuous and unbounded

Expected value (mean) of $\varepsilon_i$ = 0

$\varepsilon_i$ is normally distributed and not correlated with the predictors

Absence of perfect multicollinearity

No measurement error in any variable

• Violation of LR Assumptions

Dichotomous Dependent Variable (DV)

Unordered Categorical (Nominal) DV

Ordered Categorical (Ordinal) DV

• Natural Logarithmic Transformation (Binary DV)

Let p = probability of an event

• Logit Model

• Rearranged Logit Model

• Logistic Model
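
The equations on the transformation, logit, rearranged logit, and logistic slides did not survive in this transcript. As a hedged reconstruction, the standard textbook forms for a single predictor $X$ are given below. The symbols $\beta_0$ and $\beta_1$ follow the linear regression slide, and the exact notation used on the original slides is an assumption.

```latex
\begin{align}
\operatorname{logit}(p) = \ln\!\left(\frac{p}{1-p}\right) &= \beta_0 + \beta_1 X
  && \text{(logit model)} \\
\frac{p}{1-p} &= e^{\beta_0 + \beta_1 X}
  && \text{(rearranged: odds)} \\
p &= \frac{e^{\beta_0 + \beta_1 X}}{1 + e^{\beta_0 + \beta_1 X}}
   = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X)}}
  && \text{(logistic model)}
\end{align}
```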

• Odds Ratio
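
A minimal sketch of the quantities named on these slides, using the standard definitions (log odds, its inverse, and the ratio of two odds). The function names are illustrative, not from the slides.

```python
import math

def odds(p):
    """Odds of an event with probability p (0 < p < 1)."""
    return p / (1.0 - p)

def logit(p):
    """Natural log of the odds: ln(p / (1 - p))."""
    return math.log(odds(p))

def inv_logit(z):
    """Logistic function, the inverse of logit: 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def odds_ratio(p_a, p_b):
    """Odds for group A divided by odds for group B."""
    return odds(p_a) / odds(p_b)

# Example: p = 0.8 in group A versus p = 0.5 in group B
print(odds(0.8), odds(0.5), odds_ratio(0.8, 0.5))
print(inv_logit(logit(0.8)))  # round trip back to the probability
```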

• Interpretation of Coefficients (odds ratio)

Dichotomous predictor X1: The predicted odds of a positive response for group A are ? times the odds for group B. Equivalently, the odds of a positive response for group A are ?% higher than the odds for group B.

Continuous predictor X2: A one-unit increase in X2 is associated with a ?% increase in the predicted odds of a positive response.
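
The "?" placeholders above are filled from a fitted coefficient by exponentiating it. A minimal sketch, assuming a logit-scale coefficient `b`; the example value ln(2) is hypothetical.

```python
import math

def odds_ratio_from_coef(b):
    """Odds ratio for a one-unit increase (or group A vs. group B): exp(b)."""
    return math.exp(b)

def percent_change_in_odds(b, delta=1.0):
    """Percent change in the odds for an increase of `delta` units."""
    return (math.exp(b * delta) - 1.0) * 100.0

b = math.log(2.0)  # hypothetical coefficient
ratio = odds_ratio_from_coef(b)
percent = percent_change_in_odds(b)
print(f"odds are {ratio:.2f} times as large, i.e. {percent:.0f}% higher")
```

A negative coefficient gives an odds ratio below 1 and a negative percent change, read as a decrease in the odds.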

• Interpretation

See Handout

• Interpretation of Interaction

Definition: The effect of a covariate depends on the level of another covariate.

Interpretation:

Plug in some values of the two variables

Plot the estimated logit

Interpret the interaction effect only when the main effects are present
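
The "plug in some values" step can be sketched as follows. The model form (two covariates plus their product) and all coefficient values are hypothetical illustrations, not estimates from the slides.

```python
def estimated_logit(x1, x2, b0, b1, b2, b3):
    """Logit with an interaction term: b0 + b1*x1 + b2*x2 + b3*x1*x2."""
    return b0 + b1 * x1 + b2 * x2 + b3 * x1 * x2

def slope_of_x1(x2, b1, b3):
    """Effect of x1 on the logit at a given level of x2."""
    return b1 + b3 * x2

# Hypothetical coefficients
coefs = dict(b0=-1.0, b1=0.5, b2=0.2, b3=0.3)

# Estimated logit over a small grid; these are the values one would plot
table = {
    x2: [estimated_logit(x1, x2, **coefs) for x1 in (0, 1, 2)]
    for x2 in (0, 1)
}
for x2, logits in table.items():
    print(f"x2={x2}: logits={logits}, slope of x1={slope_of_x1(x2, 0.5, 0.3)}")
```

The slope of x1 differs between the two levels of x2, which is what the definition of an interaction describes.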

• Likelihood at value of X (left side of equation)

• Log Likelihood (left side of equation)

• Log Logit Model (right side of equation)

• Maximum Likelihood Estimation
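
The estimation formulas on these slides were lost in extraction. As a hedged illustration, the sketch below maximizes the standard binary log likelihood, the sum of y ln(p) + (1 - y) ln(1 - p), with Newton-Raphson for a single predictor. The algorithm choice and the toy data are assumptions; statistical packages may use a different optimizer.

```python
import math

def fit_logistic(x, y, iterations=25, tol=1e-10):
    """Newton-Raphson maximum likelihood for logit(p) = b0 + b1 * x."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        p = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
        # Score: gradient of the log likelihood
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum((yi - pi) * xi for xi, yi, pi in zip(x, y, p))
        # Information matrix (negative Hessian) with weights p(1 - p)
        w = [pi * (1.0 - pi) for pi in p]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system H * step = score
        s0 = (h11 * g0 - h01 * g1) / det
        s1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + s0, b1 + s1
        if abs(s0) < tol and abs(s1) < tol:
            break
    return b0, b1

def log_likelihood(x, y, b0, b1):
    """Binary log likelihood at the given coefficients."""
    total = 0.0
    for xi, yi in zip(x, y):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
        total += yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return total

# Toy data (not separable, so finite estimates exist)
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [0, 0, 1, 0, 1, 0, 1, 1]
b0, b1 = fit_logistic(x, y)
print(b0, b1, log_likelihood(x, y, b0, b1))
```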

• Likelihood Ratio Test of $\beta_0$, $\beta_1$

Likelihood Ratio Test = Deviance = $-2 \log$(likelihood of fitted model / likelihood of saturated model)

Likelihood of saturated model = 1, so Deviance = $-2 \log$(likelihood of fitted model)

• $\chi^2$ Test of $\beta_0$, $\beta_1$

1. $\chi^2 = -2 \ln$(likelihood of model without x / likelihood of model with x)

2. Degrees of freedom = j - (p + 1), where j = (# of categories) + (# of continuous variables) and p = # of parameters
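
A minimal sketch of the deviance and the likelihood ratio chi-square, written in terms of log likelihoods. The log likelihood values are hypothetical, and the p-value helper covers only the 1-degree-of-freedom case (one added predictor), where the chi-square tail equals a two-sided standard normal tail.

```python
import math

def deviance(loglik_fitted):
    """Deviance when the saturated model's likelihood is 1 (log likelihood 0)."""
    return -2.0 * loglik_fitted

def lr_chi_square(loglik_without_x, loglik_with_x):
    """-2 ln(L_without / L_with), computed from log likelihoods."""
    return -2.0 * (loglik_without_x - loglik_with_x)

def chi_square_p_value_df1(statistic):
    """Upper-tail p-value for a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2.0))

# Hypothetical log likelihoods of the models without and with x
ll_without, ll_with = -10.0, -8.0
stat = lr_chi_square(ll_without, ll_with)
p_value = chi_square_p_value_df1(stat)
print(stat, p_value, deviance(ll_with))
```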

• Hosmer-Lemeshow Test ($\chi^2$) (grouping by percentile of estimated p)

Where g = 10, k = 1..10, n' = number of subjects in the kth group, $c_k$ = # of covariate patterns, p = average estimated probability, df = g - 2

• Hosmer-Lemeshow table layout: rows y = 1 and y = 0; columns Group 1 (10% prob.), Group 2 (20% prob.), ..., Group 10 (100% prob.); each cell pairs an estimated count with an observed count (cells labeled N1, N2, N3, N4 on the slide).
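
The test statistic itself is missing from the transcript. A hedged sketch of the usual form: sort subjects by estimated probability, split them into g groups, and sum (observed - expected)^2 / (n' * p * (1 - p)) over groups, where expected = n' * p. The equal-size grouping rule and the function name are assumptions; software differs in how it handles ties.

```python
def hosmer_lemeshow(y, p_hat, g=10):
    """Return (statistic, df) for the Hosmer-Lemeshow test with g groups."""
    # Sort subjects by estimated probability (stable sort keeps tied order)
    pairs = sorted(zip(p_hat, y), key=lambda pair: pair[0])
    n = len(pairs)
    statistic = 0.0
    for k in range(g):
        group = pairs[k * n // g:(k + 1) * n // g]
        n_k = len(group)
        if n_k == 0:
            continue  # fewer subjects than groups
        observed = sum(yi for _, yi in group)      # observed events (y = 1)
        p_bar = sum(pi for pi, _ in group) / n_k   # average estimated probability
        expected = n_k * p_bar                     # expected events
        # Undefined if p_bar is exactly 0 or 1
        statistic += (observed - expected) ** 2 / (expected * (1.0 - p_bar))
    return statistic, g - 2

# Tiny illustration with g = 2; real use needs far more data and g = 10
stat, df = hosmer_lemeshow([0, 0, 1, 1], [0.25, 0.25, 0.75, 0.75], g=2)
print(stat, df)
```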

• Wald Test of $\beta_0$, $\beta_1$

$W = \hat{\beta} / se(\hat{\beta})$ (se = standard error)

Normal distribution test
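
A minimal sketch of the Wald test as a normal-distribution test, with a two-sided p-value. The estimate and standard error are hypothetical.

```python
import math

def wald_test(estimate, std_error):
    """Return (W, two-sided p-value) with W = estimate / se, referred to N(0, 1)."""
    w = estimate / std_error
    p_value = math.erfc(abs(w) / math.sqrt(2.0))
    return w, p_value

# Hypothetical coefficient and standard error
w, p_value = wald_test(0.98, 0.5)
print(w, p_value)
```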

• Multinomial Logistic Regression (non-ordered categorical DV)

P = probability of a response category

$P_{i1} + P_{i2} + P_{i3} = 1$

• Multinomial Logistic Regression
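
The model equation on this slide did not survive. A hedged sketch of the common baseline-category form, in which each non-reference category has its own linear predictor and the probabilities sum to 1. The choice of reference category and the coefficient values are assumptions.

```python
import math

def multinomial_probs(linear_predictors):
    """Probabilities under a baseline-category logit model.

    linear_predictors holds eta_j = ln(P_j / P_reference) for each
    non-reference category. Returns [P_reference, P_1, P_2, ...].
    """
    exps = [math.exp(eta) for eta in linear_predictors]
    denom = 1.0 + sum(exps)
    return [1.0 / denom] + [e / denom for e in exps]

# Three response categories, one predictor x, hypothetical coefficients
x = 1.5
eta_2 = 0.4 + 0.2 * x   # category 2 versus reference
eta_3 = -0.3 + 0.5 * x  # category 3 versus reference
probs = multinomial_probs([eta_2, eta_3])
print(probs, sum(probs))
```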

• Interpretation

See handout

• Ordinal Logistic Models

• Adjacent Categories Model

Let j index the levels of an ordinal scale, starting at j = 1

j & j+1 = two adjacent categories

Model
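
The model equation is missing from the transcript. One common parameterization, assumed here, sets ln(P(j+1) / P(j)) = alpha_j + beta * x for each adjacent pair, with a single slope shared across pairs. Some texts write the ratio the other way around, which flips the signs. All values below are hypothetical.

```python
import math

def adjacent_category_probs(alphas, beta, x):
    """Category probabilities when ln(P[j+1] / P[j]) = alphas[j] + beta * x."""
    weights = [1.0]  # unnormalized weight of the first category
    for alpha in alphas:
        # Each category's weight is the previous one times the adjacent odds
        weights.append(weights[-1] * math.exp(alpha + beta * x))
    total = sum(weights)
    return [w / total for w in weights]

# Three ordered categories, so two adjacent pairs
probs = adjacent_category_probs([0.5, -0.2], beta=0.3, x=2.0)
print(probs, sum(probs))
```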

• Practice

Run Logistic Regression Using binary.sav

IV = gre, gpa, rank

Annotated output: http://www.ats.ucla.edu/stat/spss/dae/logit.htm

• Pseudo R-squared (based on Likelihood)

Explained Variability

Improvement from null model to fitted model

Square of correlation (predicted and observed)

• Pseudo R-squared

Cox & Snell: Improvement of full model over intercept model

Nagelkerke: Improvement of full model over intercept model

McFadden (adjusted): analogous to adjusted R-squared in OLS, penalizing a model with too many predictors

http://www.ats.ucla.edu/stat/mult_pkg/faq/general/Psuedo_RSquareds.htm
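
A minimal sketch of these measures using their standard likelihood-based definitions, computed from the log likelihoods of the intercept-only and full models. The input values are hypothetical.

```python
import math

def pseudo_r_squared(ll_null, ll_full, n, k):
    """Likelihood-based pseudo R-squared measures.

    ll_null: log likelihood of the intercept-only model
    ll_full: log likelihood of the fitted (full) model
    n: number of cases, k: number of predictors in the full model
    """
    mcfadden = 1.0 - ll_full / ll_null
    mcfadden_adj = 1.0 - (ll_full - k) / ll_null
    cox_snell = 1.0 - math.exp(2.0 * (ll_null - ll_full) / n)
    # Nagelkerke rescales Cox & Snell by its maximum attainable value
    nagelkerke = cox_snell / (1.0 - math.exp(2.0 * ll_null / n))
    return {
        "mcfadden": mcfadden,
        "mcfadden_adjusted": mcfadden_adj,
        "cox_snell": cox_snell,
        "nagelkerke": nagelkerke,
    }

# Hypothetical log likelihoods
r2 = pseudo_r_squared(ll_null=-100.0, ll_full=-80.0, n=200, k=3)
print(r2)
```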

• Practice (continued)

Run Multinomial Logistic Regression Using mlogit.sav

DV = Brand

IV = female, age

Annotated output: http://www.ats.ucla.edu/stat/spss/dae/mlogit.htm

• Practice (continued)

Run Ordinal Logistic Regression Using ologit.sav

IV = gre, gpa, topnotch

Annotated output: http://www.ats.ucla.edu/stat/SPSS/dae/ologit.htm

• Practical Issues

1. Low Ratio of Cases to Variables

Problem: Extremely large parameter estimates and standard errors

Solution:

Collapse categories

Delete the offending category

Delete discrete predictors

• Practical Issues

2. Inadequacy of Expected Frequencies & Power

Problem: Lower power with small-frequency cells

Solution:

Accept low power

Collapse categories or delete discrete predictors

Evaluate model fit with $\chi^2$

• Practical Issues

3. Presence of Multicollinearity

Problem: Large standard errors or parameter estimates

Solution:

Run multiway frequency tables to identify collinear categorical variables

Run correlations to identify collinear continuous variables

Delete theoretically less important predictors or combine with other procedures

• Practical Issues

Rare events may be appropriate for Poisson regression or negative binomial regression.

• References

Allison, P. D. Logistic regression using the SAS system. Cary, NC: SAS Institute, Inc.

Hosmer, D. W. & Lemeshow, S. (2000). Applied logistic regression. New York: John Wiley & Sons, Inc.

Menard, S. (1994). Applied logistic regression analysis. Thousand Oaks, CA: Sage Publications, Inc.

Liao, T. F. (1994). Interpreting probability models: Logit, probit, and other generalized linear models. Thousand Oaks, CA: Sage Publications, Inc.

Long, J. S. & Freese, J. (2006). Regression models for categorical dependent variables using Stata. College Station, Texas: Stata Press.