regression analysis with the ordered multinomial logistic model braden hoelzle southern methodist...


Post on 28-Mar-2015


  • Slide 1

Regression Analysis with the Ordered Multinomial Logistic Model
Braden Hoelzle
Southern Methodist University
December 2009

Slide 2
Situating the Model
GLM: Generalized Linear Model
    Linear Regression
    Logistic Regression
    Ordered Multinomial Logistic Regression
    Unordered Multinomial Logistic Regression

Slide 3
Review: Logistic Regression
Dichotomous dependent variable.
Independent variables can be dichotomous, integer, categorical, etc.
We are trying to predict the probability that a person does or doesn't have a trait.
Example: "At risk of dropping out" or "Not at risk".
Others?

Slide 4
Transform to Probability
Probability range: 0 ≤ p ≤ 1
Therefore we must transform continuous values to the range 0-1 by using the formula:
    p = exp(z) / (1 + exp(z)), where z is the linear predictor
Or expanded to:
    p = exp(b0 + b1*x) / (1 + exp(b0 + b1*x))

Slide 5
A Quick Example
    > m1 <- glm(comply ~ physrec, family = binomial(link = "logit"))
    > summary(m1)
    Call:
    glm(formula = comply ~ physrec, family = binomial(link = "logit"))

    Coefficients:
                Estimate Std. Error z value Pr(>|z|)
    (Intercept)  -1.8383     0.4069  -4.518 6.26e-06
    physrec       2.2882     0.4503   5.081 3.75e-07

The probability of complying if NOT recommended by physician:
    exp(-1.8383)/(1 + exp(-1.8383))
    0.1372525
The probability of complying if recommended by physician:
    exp(-1.8383 + 2.2882)/(1 + exp(-1.8383 + 2.2882))
    0.6106392

Slide 6
Ordered Multinomial Logistic Model
Four Types of Scales
1. _________ - mutually exclusive categories w/ no logical order.
2. _________ - mutually exclusive categories w/ logical rank order.
3. _________ - ordered data w/ equal distance between each point (no absolute zero).
4. _________ - ordered data w/ equal distance between each point (w/ a true zero).
What type of data would you expect our ordered multinomial regression to model?

Slide 7
Definition
The ordered multinomial logistic model enables us to model ordinally scaled dependent variables with one or more independent variables. These IV(s) can take many different forms (e.g., real numbers, integers, categorical, binomial, etc.).

Slide 8
Does this Occur Much?
"Ordinal data are the most frequently encountered type of data in the social sciences" (Johnson & Albert, 1999, p. 126).
Examples:
    Yes, maybe, no
    Likert scale (Strongly Agree to Strongly Disagree)
    Always, frequently, sometimes, rarely, never
    No HS diploma, HS diploma, some college, bachelor's degree, master's degree, doctoral degree
    Free school lunch, reduced school lunch, full price lunch
    0-10K per year, 10-20K per year, 20-30K per year, 30-60K per year, > 60K per year
    Low, medium, high
    Basic math, regular math, pre-AP math, AP math
    Nele's dancing ability, Meg's dancing ability, Saralyn's dancing ability, Jose's dancing ability, Kyle's dancing ability, Braden's dancing ability, a rock

Slide 9
Running Regression using the Ordered Multinomial Logistic Model in R
Load/Install Libraries:
    library(arm)
    library(psych)
Load data (UCLA Academic Technology Services, n.d.)
    mydata <- ...

Description of Data
    > str(mydata)
    'data.frame': 400 obs. of 4 variables:
     $ apply : int 2 1 0 1 1 0 1 1 0 1...
     $ pared : int 0 1 1 0 0 0 0 0 0 1...
     $ public: int 0 0 1 0 0 1 0 0 0 0...
     $ gpa   : num 3.26 3.21 3.94 2.81 2.53...
    > summary(mydata$gpa)
       Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
      1.900   2.720   2.990   2.999   3.270   4.000
    > table(apply)
    apply
      0   1   2
    220 140  40
    > table(pared)
    pared
      0   1
    337  63
    > table(public)
    public
      0   1
    343  57

Slide 12
Crosstabs
    > xtabs(~ pared + apply)
         apply
    pared   0   1   2
        0 200 110  27
        1  20  30  13
    > xtabs(~ public + apply)
          apply
    public   0   1   2
         0 189 124  30
         1  31  16  10
Why would this information be important for running our ordered multinomial logistic model?

Slide 13
Assumptions
No perfect predictions: one predictor variable value cannot solely correspond to one dependent variable value. Example: it cannot be the case that every student whose parents went to graduate school indicates that they are very likely to attend graduate school. Check using crosstabs (see Slide 12).
No empty or very small cells: see crosstabs.
Sample size: always requires more cases than OLS regression.

Slide 14
Running a Single Predictor Model
    > summary(m1
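
Two of the calculations in these slides can be reproduced from the printed numbers alone, without the original data file: the Slide 5 probabilities and the Slide 13 cell checks on the Slide 12 crosstabs. The following is a minimal sketch under stated assumptions. The names `check_cells` and `min_cell` are hypothetical, and the threshold of 5 for a "very small" cell is an illustrative choice, not a value given in the slides.

```r
# Slide 5: convert the logit-scale coefficients into probabilities.
# plogis() is base R's inverse logit, exp(x) / (1 + exp(x)).
b0 <- -1.8383   # (Intercept) from the Slide 5 output
b1 <-  2.2882   # physrec coefficient from the Slide 5 output
p_not_recommended <- plogis(b0)        # Slide 5 reports 0.1372525
p_recommended     <- plogis(b0 + b1)   # Slide 5 reports 0.6106392

# Slide 12: rebuild the two crosstabs from the printed counts.
pared_by_apply <- matrix(c(200, 110, 27,
                            20,  30, 13),
                         nrow = 2, byrow = TRUE,
                         dimnames = list(pared = c("0", "1"),
                                         apply = c("0", "1", "2")))
public_by_apply <- matrix(c(189, 124, 30,
                             31,  16, 10),
                          nrow = 2, byrow = TRUE,
                          dimnames = list(public = c("0", "1"),
                                          apply  = c("0", "1", "2")))

# Slide 13 checks on a predictor-by-outcome table.
# min_cell is an assumed cutoff for "very small"; adjust to taste.
min_cell <- 5
check_cells <- function(tab, min_cell) {
  list(
    any_empty = any(tab == 0),
    any_small = any(tab < min_cell),
    # Perfect prediction: some predictor level has all of its cases
    # in a single outcome category (only one nonzero cell in its row).
    perfect_prediction = any(apply(tab, 1, function(counts) sum(counts > 0) == 1))
  )
}

pared_check  <- check_cells(pared_by_apply, min_cell)
public_check <- check_cells(public_by_apply, min_cell)
str(pared_check)
str(public_check)
```

With the actual data frame loaded, the same `check_cells` helper could be applied directly to the result of `xtabs(~ pared + apply)`, since an `xtabs` table behaves like a matrix for these comparisons.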