
Ordinal Multinomial Logistic Regression

Thom M. Suhy, Southern Methodist University, May 14th, 2013

GLM

•  Generalized Linear Model (GLM): “Framework for statistical analysis” (Gelman and Hill, 2007, p. 135)
•  Linear Regression – Continuous data
•  Logistic Regression – Binary data
   •  Ordered Multinomial Logistic Regression
   •  Unordered Multinomial Logistic Regression

Ordered Multinomial Logistic Regression

Logistic Regression

•  Dependent variable is dichotomous
   •  Yes or No
   •  Apply or Not Apply
   •  Pass or Fail
   •  Heisman or no Heisman
•  Probability of trait (yes, apply, pass, Heisman) based on independent variables
•  Independent variable does not need to be dichotomous
   •  Categorical
   •  Integral
   •  Dichotomous
   •  Nominal
   •  Ordinal


Logistic Regression – Refresher

Call:
glm(formula = comply ~ physrec, family = binomial(link = "logit"))

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.3735  -1.3735  -0.5434   0.9933   1.9929

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  -1.8383     0.4069  -4.518 6.26e-06 ***
physrec       2.2882     0.4503   5.081 3.75e-07 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 226.47  on 163  degrees of freedom
Residual deviance: 191.87  on 162  degrees of freedom
AIC: 195.87

Number of Fisher Scoring iterations: 4


Logistic Regression – Refresher

Formula for logit
•  logit = -1.8383 + (2.2882 * physrec)
•  logit = -1.8383 for no physrec
•  logit = 0.4499 for yes physrec

Probability to comply
•  exp(-1.8383)/(1+(exp(-1.8383)))
   Probability of comply with no physrec = .137 or 13.7%
•  exp(.4499)/(1+(exp(.4499)))
   Probability of comply with physrec = .6106 or 61%
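The hand calculation above is the inverse-logit transform applied to the linear predictor. A minimal sketch in base R, assuming only the two coefficients printed in the glm output; the names b0, b1, logit, and p_comply are illustrative and not from the original analysis. plogis() is base R's built-in inverse logit.

```r
# Coefficients copied from the glm() summary above
b0 <- -1.8383   # (Intercept)
b1 <-  2.2882   # physrec

# Linear predictor (logit) for physrec = 0 and physrec = 1
logit <- b0 + b1 * c(no_physrec = 0, physrec = 1)

# Inverse logit: plogis(z) is exp(z) / (1 + exp(z))
p_comply <- plogis(logit)
print(round(p_comply, 3))
```

Using plogis() instead of a hand-written exp(z)/(1+exp(z)) avoids overflow for large z, though either form gives the same answer at these values.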


Logistic Regression – Refresher

[Figure: plot of factor(comply) (levels 0 and 1) against physrec.]


Logistic vs. Ordered Multinomial

How are they different?

•  An extension of logistic regression to multiple categories (Gelman & Hill, 2007)
•  Not binary, categorical (but ordered)
   •  Decision (Yes, Maybe, No)
   •  Order of Finish (1st, 2nd, 3rd)
   •  Likert Scale (Strongly Disagree – Strongly Agree)
   •  Income ranges (0–25K, 25K–50K, 50K+)
   •  Degree (None, Bachelors, Masters, PhD)
•  There is unordered multinomial logistic regression, but that is not for today!


A Little More Information

Ordinal multinomial logistic regression is an extension of logistic regression using multiple categories that have a logical order (Gelman & Hill, 2007).

“Ordinal data are the most frequently encountered type of data in the social sciences” (Johnson & Albert, 1999, p. 126).


Running a Model in R

1. We are going to use a file from UCLA, but first load your libraries:

> library(foreign)   # provides read.dta()
> library(psych)
> library(arm)       # provides bayespolr()

2. Now we will read in our data:

> suhy<- read.dta(url("http://www.ats.ucla.edu/stat/r/dae/ologit.dta"))


Running a Model in R

3. Let’s examine our data:

> head(suhy)
            apply pared public  gpa
1     very likely     0      0 3.26
2 somewhat likely     1      0 3.21
3        unlikely     1      1 3.94
4 somewhat likely     0      0 2.81
5 somewhat likely     0      0 2.53
6        unlikely     0      1 2.59


Running a Model in R

Defining our variables:

apply  = Likelihood of college juniors applying to grad school. (Self-reported) (very likely, somewhat likely, unlikely)
pared  = Does at least one parent have a graduate degree? (no = 0, yes = 1)
public = Undergrad was a private or public institution. (private = 0, public = 1)
gpa    = Undergrad grade point average


Running a Model in R

What are our assumptions?

•  Data are case specific: each independent variable (IV) has a single value for each case
•  No perfect predictors: no single predictor variable (IV) can completely determine the outcome of the dependent variable (DV)
•  No zero or very small counts in any crosstab cell
•  Sample size: larger than for an ordinary OLS regression


Running a Model in R

4. Let us check our assumptions:

> xtabs(~suhy$pared+suhy$apply)
suhy$pared unlikely somewhat likely very likely
         0      200             110          27
         1       20              30          13

> xtabs(~suhy$public+suhy$apply)
suhy$public unlikely somewhat likely very likely
          0      189             124          30
          1       31              16          10
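The "no zero or very small cells" check can also be done programmatically rather than by eye. A minimal sketch, assuming the cell counts printed in the crosstabs above; the cutoff min_count = 5 is a hypothetical threshold chosen for illustration, since the slides do not state one.

```r
# Cell counts copied from the xtabs() output above
lev <- c("unlikely", "somewhat likely", "very likely")
pared_tab <- matrix(c(200, 110, 27,
                       20,  30, 13),
                    nrow = 2, byrow = TRUE,
                    dimnames = list(pared = c("0", "1"), apply = lev))
public_tab <- matrix(c(189, 124, 30,
                        31,  16, 10),
                     nrow = 2, byrow = TRUE,
                     dimnames = list(public = c("0", "1"), apply = lev))

# Hypothetical cutoff for a "very small" cell (an assumption, not from the slides)
min_count <- 5

# Smallest cell in each crosstab, and whether it clears the cutoff
smallest <- c(pared = min(pared_tab), public = min(public_tab))
print(smallest)
print(smallest >= min_count)
```

With the real data frame loaded, the same check could be applied directly to the xtabs() result instead of a hand-typed matrix.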


Running a Model in R

5. We are good to go, let’s run the model:

> summary(m1<-bayespolr(as.ordered(suhy$apply)~suhy$gpa))
Call:
bayespolr(formula = as.ordered(suhy$apply) ~ suhy$gpa)

Coefficients:
          Value Std. Error t value
suhy$gpa 0.7109     0.2471   2.877

Intercepts:
                             Value Std. Error t value
unlikely|somewhat likely    2.3306     0.7502  3.1065
somewhat likely|very likely 4.3505     0.7744  5.6179

Residual Deviance: 737.6921
AIC: 743.6921
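One way to read this output is as a cumulative-logit model in which each intercept is a cutpoint between adjacent categories. The sketch below assumes the "cutpoint minus linear predictor" parameterization, which is the form the hand calculations in these slides use. The symbols \zeta_1, \zeta_2 (intercepts) and \beta (gpa coefficient) are notation chosen here, not taken from the slides.

```latex
\begin{align*}
P(\text{apply} \le \text{unlikely} \mid \text{gpa})
  &= \operatorname{logit}^{-1}(\zeta_1 - \beta\,\text{gpa})
   = \operatorname{logit}^{-1}(2.3306 - 0.7109\,\text{gpa}) \\
P(\text{apply} \le \text{somewhat likely} \mid \text{gpa})
  &= \operatorname{logit}^{-1}(\zeta_2 - \beta\,\text{gpa})
   = \operatorname{logit}^{-1}(4.3505 - 0.7109\,\text{gpa}) \\
\operatorname{logit}^{-1}(z) &= \frac{e^{z}}{1 + e^{z}}
\end{align*}
```

Under this reading, the probability of a single category is the difference between adjacent cumulative probabilities, and a positive gpa coefficient shifts probability toward the higher categories.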


Running a Model in R A visual:


Thank you Pooja Shivraj (2012)

Running a Model in R

6. Let’s calculate the probabilities for the average gpa

> x<-mean(suhy$gpa)    # x = 2.998925
> coef<-m1$coef
> coef
suhy$gpa
0.710892
> intercept<-m1$zeta
> intercept
   unlikely|somewhat likely somewhat likely|very likely
                   2.330599                    4.350527


Running a Model in R

6. Let’s calculate the probabilities for the average gpa (cont.)

Remember:
> prob<-function(input){exp(input)/(1+exp(input))}

> (p0<-prob(intercept[1]-coef*x))
unlikely|somewhat likely
                0.549509
P(unlikely) = 0.5495, or 55%

> (p1<-prob(intercept[2]-coef*x)-p0)
somewhat likely|very likely
                  0.3523997
P(somewhat likely) = 0.3524, or 35%

> (p2<-1-(p0+p1))
very likely
 0.09809127
P(very likely) = 0.0981, or about 10%

p0 + p1 + p2 always equals 1, because the three categories cover every possible outcome.
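The three steps above can be folded into one small function. A minimal sketch in base R, assuming the m1 estimates printed on the slides. The helper category_probs() and the names zeta, beta, and lev are hypothetical, introduced only for illustration. plogis() is base R's inverse logit and plays the role of prob().

```r
# Values copied from the m1 output above
zeta <- c(2.330599, 4.350527)   # cutpoints: unlikely|somewhat, somewhat|very
beta <- 0.710892                # gpa coefficient
lev  <- c("unlikely", "somewhat likely", "very likely")

# Hypothetical helper: category probabilities for a single gpa value
category_probs <- function(gpa) {
  cum <- plogis(zeta - beta * gpa)   # P(Y <= j) at each cutpoint
  p <- diff(c(0, cum, 1))            # adjacent differences give P(Y = j)
  setNames(p, lev)
}

p_mean <- category_probs(2.998925)   # mean gpa reported on the slide
print(round(p_mean, 4))
print(sum(p_mean))
```

Padding the cumulative probabilities with 0 and 1 before differencing means the same function would work unchanged for an outcome with more than three ordered categories.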


Running a Model in R

2.5 GPA
> (p0<-prob(intercept[1]-coef*2.5))
unlikely|somewhat likely
               0.6349169
> (p1<-prob(intercept[2]-coef*2.5)-p0)
somewhat likely|very likely
                  0.2942062
> (p2<-1-(p0+p1))
very likely
 0.07087689

3.7 GPA
> (p0<-prob(intercept[1]-coef*3.7))
unlikely|somewhat likely
               0.4256305
> (p1<-prob(intercept[2]-coef*3.7)-p0)
somewhat likely|very likely
                  0.4225275
> (p2<-1-(p0+p1))
very likely
   0.151842


Running a Model in R

Now you tell me the probability for each category if you had a 4.0 GPA.

> (p0<-prob(intercept[1]-coef*4.0))
unlikely|somewhat likely = 37%
> (p1<-prob(intercept[2]-coef*4.0)-p0)
somewhat likely|very likely = 44%
> (p2<-1-(p0+p1))
very likely = 18%
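Rather than repeating the calculation for each GPA, the three cases can be computed at once. A minimal sketch in base R, assuming the m1 estimates printed on the slides; the names zeta, beta, gpas, cum, and probs are illustrative. outer() builds a matrix of cutpoint-minus-predictor values, one row per GPA and one column per cutpoint.

```r
# Values copied from the m1 output above
zeta <- c(2.330599, 4.350527)   # cutpoints
beta <- 0.710892                # gpa coefficient
gpas <- c(2.5, 3.7, 4.0)

# Entry [i, j] is plogis(zeta[j] - beta * gpas[i]), i.e. P(Y <= j) at gpas[i]
cum <- plogis(outer(-beta * gpas, zeta, "+"))

# Differences of cumulative probabilities give the category probabilities
probs <- cbind(cum[, 1], cum[, 2] - cum[, 1], 1 - cum[, 2])
dimnames(probs) <- list(gpa = format(gpas, nsmall = 1),
                        apply = c("unlikely", "somewhat likely", "very likely"))
print(round(probs, 3))
```

Each row sums to 1, and reading down a column shows how the probability of that category changes as GPA rises.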


Multiple Predictors

1. Let’s look at a model with multiple predictors:

> summary(m2<-bayespolr(as.ordered(suhy$apply)~suhy$gpa+suhy$pared+suhy$public))
Call:
bayespolr(formula = as.ordered(suhy$apply) ~ suhy$gpa + suhy$pared + suhy$public)

Coefficients:
               Value Std. Error t value
suhy$gpa     0.60441     0.2577  2.3453
suhy$pared   1.02746     0.2636  3.8973
suhy$public -0.05297     0.2932 -0.1807

Intercepts:
                             Value Std. Error t value
unlikely|somewhat likely    2.1646     0.7710  2.8074
somewhat likely|very likely 4.2526     0.7955  5.3458

Residual Deviance: 727.0019
AIC: 737.0019


Multiple Predictors

2. Let’s calculate the probabilities:

> (coef<- m2$coef)
   suhy$gpa  suhy$pared suhy$public
 0.60440882  1.02746355 -0.05297486
> (intercept<-m2$zeta)
   unlikely|somewhat likely somewhat likely|very likely
                   2.164642                    4.252572
> mean(suhy$public)
[1] 0.1425


Multiple Predictors

2. Let’s calculate the probabilities: (cont.)

> (x1<-cbind(0:4, 0 , .1425))
     [,1] [,2]   [,3]
[1,]    0    0 0.1425
[2,]    1    0 0.1425
[3,]    2    0 0.1425
[4,]    3    0 0.1425
[5,]    4    0 0.1425
> (x2<-cbind(0:4, 1 , .1425))
     [,1] [,2]   [,3]
[1,]    0    1 0.1425
[2,]    1    1 0.1425
[3,]    2    1 0.1425
[4,]    3    1 0.1425
[5,]    4    1 0.1425


Multiple Predictors

For pared = no (x1)

> prob<-function(VAR){exp(VAR)/(1+exp(VAR))}
> (p1<-prob(intercept[1]-x1 %*% coef))
          [,1]
[1,] 0.8977243
[2,] 0.8274671
[3,] 0.7237966
[4,] 0.5887896
[5,] 0.4389450
> (p2<-prob(intercept[2]-x1 %*% coef)-p1)
           [,1]
[1,] 0.08835176
[2,] 0.14734081
[3,] 0.23104216
[4,] 0.33154442
[5,] 0.42429742
> p3<-1-(p1+p2)
> p3
           [,1]
[1,] 0.01392398
[2,] 0.02519204
[3,] 0.04516123
[4,] 0.07966593
[5,] 0.13675756

For pared = yes (x2)

> prob<-function(VAR){exp(VAR)/(1+exp(VAR))}
> (p4<-prob(intercept[1]-x2 %*% coef))
          [,1]
[1,] 0.7585465
[2,] 0.6318864
[3,] 0.4839828
[4,] 0.3388329
[5,] 0.2187598
> (p5<-prob(intercept[2]-x2 %*% coef)-p4)
          [,1]
[1,] 0.2034985
[2,] 0.3007712
[3,] 0.3992947
[4,] 0.4664163
[5,] 0.4744189
> p6<-1-(p4+p5)
> p6
           [,1]
[1,] 0.03795509
[2,] 0.06734236
[3,] 0.11672252
[4,] 0.19475078
[5,] 0.30682131
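The two blocks above differ only in the value of pared, so they can share one helper. A minimal sketch in base R, assuming the m2 estimates printed on the slides and public held at its mean of 0.1425. The function prob_table() and the names beta, zeta, no_grad, and grad are hypothetical, introduced only for illustration.

```r
# Values copied from the m2 output above
beta <- c(gpa = 0.60440882, pared = 1.02746355, public = -0.05297486)
zeta <- c(2.164642, 4.252572)   # cutpoints

# Hypothetical helper: rows are gpa 0..4, columns are the three categories
prob_table <- function(pared) {
  X <- cbind(gpa = 0:4, pared = pared, public = 0.1425)
  eta <- drop(X %*% beta)            # linear predictor for each row
  cum1 <- plogis(zeta[1] - eta)      # P(apply <= unlikely)
  cum2 <- plogis(zeta[2] - eta)      # P(apply <= somewhat likely)
  cbind(unlikely = cum1,
        somewhat_likely = cum2 - cum1,
        very_likely = 1 - cum2)
}

no_grad <- prob_table(0)   # columns correspond to p1, p2, p3 above
grad    <- prob_table(1)   # columns correspond to p4, p5, p6 above
print(round(no_grad, 4))
print(round(grad, 4))
```

Keeping the three categories as named columns of one matrix makes it harder to mix up which vector belongs to which outcome when plotting.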


Graphing the Results

> library(lattice)
> Undergrad.GPA <- 0:4
> plot(Undergrad.GPA, p1,
+      type="l", col=1, ylim=c(0,1))
> lines(0:4, p2, col=2)
> lines(0:4, p3, col=3)
> lines(0:4, p4, col=1, lty = 2)
> lines(0:4, p5, col=2, lty = 2)
> lines(0:4, p6, col=3, lty = 2)
> legend(1.5, 1,
+        legend=c("P(unlikely)",
+                 "P(somewhat likely)",
+                 "P(very likely)",
+                 "Line Type when Pared = 0",
+                 "Line Type when Pared = 1"),
+        col=c(1:3,1,1),
+        lty=c(1,1,1,1,2))


[Figure: predicted probabilities (y-axis labeled p1, 0 to 1) against Undergrad.GPA (x-axis, 0 to 4). Legend: P(unlikely), P(somewhat likely), P(very likely); Line Type when Pared = 0; Line Type when Pared = 1.]

Why Not Linear Regression

•  The decision is not always black and white
•  A large number of categories that are equally spaced could call for a simple linear model
•  However, you must ALWAYS check your assumptions

(Gelman & Hill, 2007)

Why Not Linear Regression

Here is why… If we run our model using a simple linear model:

> apply2<-as.numeric(suhy$apply)
> m3<-lm(apply2~gpa, suhy)
> summary(m3)

Call:
lm(formula = apply2 ~ gpa, data = suhy)

Residuals:
    Min      1Q  Median      3Q     Max
-0.7917 -0.5554 -0.3962  0.4786  1.6012

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.77984    0.25224   3.092  0.00213 **
gpa          0.25681    0.08338   3.080  0.00221 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.6628 on 398 degrees of freedom
Multiple R-squared:  0.02328,	Adjusted R-squared:  0.02083
F-statistic: 9.486 on 1 and 398 DF,  p-value: 0.002214
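A rough illustration of one limitation of the linear fit, assuming the coefficients printed above and the coding that as.numeric() produces for the factor levels in the order shown in the crosstabs (1 = unlikely, 2 = somewhat likely, 3 = very likely). The names a, b, gpas, and fitted_score are illustrative. The linear model returns a single score on the 1 to 3 scale rather than a probability for each category.

```r
# Coefficients copied from the lm() summary above
a <- 0.77984   # (Intercept)
b <- 0.25681   # gpa

gpas <- c(2.5, 3.7, 4.0)
fitted_score <- a + b * gpas
names(fitted_score) <- format(gpas, nsmall = 1)
print(round(fitted_score, 3))

# Check whether the fitted score at the top of the GPA range reaches 2
print(max(fitted_score) < 2)
```

A score between 1 and 2 does not say how likely each of the three responses is, which is what the ordered model's category probabilities provide.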


Why Not Linear Regression

This is what we see when we check our assumptions…

[Figure: the four diagnostic plots for the linear model: Residuals vs Fitted, Normal Q-Q, Scale-Location, and Residuals vs Leverage (with Cook's distance).]

You tell me…

References

Gelman, A. & Hill, J. (2007). Data analysis using regression and multilevel/hierarchical models. New York: Cambridge University Press.

Hoelze, B. (2009). Regression analysis with the ordinal multinomial logistic model [PowerPoint slides]. Retrieved from http://faculty.smu.edu/kyler/courses/7314//student/ordered_multinomial.pptx

Johnson, V. E. & Albert, J. H. (1999). Statistics for the social sciences and public policy: Ordinal data modeling. New York: Springer.

Shivraj, P. (2011). Ordered multinomial logistic regression analysis [PowerPoint slides]. Retrieved from http://faculty.smu.edu/kyler/courses/7312/presentations/shivraj/Ordered_ML_Shivraj.pdf

UCLA: Academic Technology Services. (n.d.). Retrieved from http://www.ats.ucla.edu/stat/r/ologit.dta
