
Ordinal Multinomial Logistic Regression

Thom M. Suhy, Southern Methodist University, May 14th, 2013

GLM
Generalized Linear Model (GLM)
  Framework for statistical analysis (Gelman and Hill, 2007, p. 135)
  Linear Regression: continuous data
  Logistic Regression: binary data
  Ordered Multinomial Logistic Regression
  Unordered Multinomial Logistic Regression


Logistic Regression
Dependent variable is dichotomous
  Yes or No
  Apply or Not Apply
  Pass or Fail
  Heisman or no Heisman
Probability of trait (yes, apply, pass, Heisman) based on independent variables
Independent variable does not need to be dichotomous
  Categorical
  Interval
  Dichotomous
  Nominal
  Ordinal


Logistic Regression Refresher

Call:
glm(formula = comply ~ physrec, family = binomial(link = "logit"))

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.3735  -1.3735  -0.5434   0.9933   1.9929

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  -1.8383     0.4069  -4.518 6.26e-06 ***
physrec       2.2882     0.4503   5.081 3.75e-07 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 226.47  on 163  degrees of freedom
Residual deviance: 191.87  on 162  degrees of freedom
AIC: 195.87

Number of Fisher Scoring iterations: 4


Logistic Regression Refresher
Formula for logit
## logit = -1.8383 + (2.2882*physrec)
##
## logit = -1.8383 for no physrec
##
## logit = .4499 for yes physrec
## Probability to comply
exp(-1.8383)/(1+(exp(-1.8383)))
Probability of comply with no physrec = .137 or 13.7%
exp(.4499)/(1+(exp(.4499)))
Probability of comply with physrec = .6106 or 61%
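The arithmetic on this slide can be reproduced in a few lines of base R. This is a minimal sketch that uses only the two coefficients printed in the glm summary; the helper name inv_logit is introduced here for illustration and does not come from the slides.

```r
# Coefficients as printed in the glm summary
b0 <- -1.8383   # intercept
b1 <-  2.2882   # slope for physrec

# Inverse logit: exp(x) / (1 + exp(x)); base R's plogis() computes the same thing
inv_logit <- function(x) exp(x) / (1 + exp(x))

logit_no  <- b0 + b1 * 0   # physrec = 0
logit_yes <- b0 + b1 * 1   # physrec = 1

p_no  <- inv_logit(logit_no)
p_yes <- inv_logit(logit_yes)

round(c(no_physrec = p_no, physrec = p_yes), 3)
```

Using plogis(logit_no) in place of inv_logit(logit_no) should give the same result and avoids overflow for very large logits.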


Logistic Regression Refresher
[Figure: plot of factor(comply) (levels 0 and 1) against physrec; the proportion axis runs from 0.0 to 1.0.]


Logistic vs. Ordered Multinomial
How are they different?
An extension of logistic regression to multiple categories (Gelman & Hill, 2007)
Not binary, but categorical (and ordered)
  Decision (Yes, Maybe, No)
  Order of Finish (1st, 2nd, 3rd)
  Likert Scale (Strongly Disagree to Strongly Agree)
  Income ranges (0-25K, 25K-50K, 50K+)
  Degree (None, Bachelors, Masters, PhD)
There is unordered multinomial logistic regression, but that is not for today!


A Little More Information
Ordinal multinomial logistic regression is an extension of logistic regression to outcomes with multiple categories that have a logical order (Gelman & Hill, 2007).
Ordinal data are the most frequently encountered type of data in the social sciences (Johnson & Albert, 1999, p. 126).
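In R, "categories that have a logical order" are usually stored as an ordered factor. The sketch below uses made-up responses (not the course data) and a hypothetical variable name, apply_f, to show how the ordering is declared and what it changes.

```r
# Hypothetical responses on a three-level scale (illustrative values only)
responses <- c("unlikely", "very likely", "somewhat likely", "unlikely")

# ordered = TRUE records the ranking given by `levels`, lowest first
apply_f <- factor(responses,
                  levels  = c("unlikely", "somewhat likely", "very likely"),
                  ordered = TRUE)

is.ordered(apply_f)        # is the ranking stored?
apply_f < "very likely"    # element-wise comparison uses the level order
as.integer(apply_f)        # underlying ranks: 1 = lowest level
```

Without ordered = TRUE the same labels would be treated as unordered categories, which is the setting for the unordered multinomial model rather than the ordered one.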


Running a Model in R
1. We are going to use a file from UCLA, but first load your libraries:
> library(psych)
> library(arm)
2. Now we will read in our data:
> suhy

Running a Model in R
3. Let's examine our data:
> head(suhy)
            apply pared public  gpa
1     very likely     0      0 3.26
2 somewhat likely     1      0 3.21
3        unlikely     1      1 3.94
4 somewhat likely     0      0 2.81
5 somewhat likely     0      0 2.53
6        unlikely     0      1 2.59


Running a Model in R
Defining our variables:
apply = Likelihood of college juniors applying to grad school (self-reported: very likely, somewhat likely, unlikely)
pared = Does at least one parent have a graduate degree? (no = 0, yes = 1)
public = Undergrad was a private or public institution (private = 0, public = 1)
gpa = Undergrad grade point average


Running a Model in R
What are our assumptions?
Data are case specific: each IV has a single value for each case
No perfect predictors: no single predictor variable (IV) can determine the outcome of the DV
No zero or very small quantities in a crosstab cell
Sample size larger than for normal OLS regression


Running a Model in R
4. Let us check our assumptions:
> xtabs(~suhy$pared+suhy$apply)
suhy$pared unlikely somewhat likely very likely
         0      200             110          27
         1       20              30          13
> xtabs(~suhy$public+suhy$apply)
suhy$public unlikely somewhat likely very likely
          0      189             124          30
          1       31              16          10
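The "no zero or very small cells" check can also be done programmatically. The sketch below re-enters the cell counts printed in the two crosstabs, so it runs without the data file. The cutoff min_cell is a hypothetical threshold chosen for illustration; the slides do not state one.

```r
# Cell counts copied from the two xtabs() outputs
lev <- c("unlikely", "somewhat likely", "very likely")
pared_tab  <- matrix(c(200, 110, 27,
                        20,  30, 13),
                     nrow = 2, byrow = TRUE,
                     dimnames = list(pared = c("0", "1"), apply = lev))
public_tab <- matrix(c(189, 124, 30,
                        31,  16, 10),
                     nrow = 2, byrow = TRUE,
                     dimnames = list(public = c("0", "1"), apply = lev))

# Hypothetical cutoff for "very small"; not taken from the slides
min_cell <- 5

smallest <- c(pared = min(pared_tab), public = min(public_tab))
smallest
all(smallest >= min_cell)   # TRUE means no cell falls below the cutoff

# Both tables should describe the same cases
sum(pared_tab) == sum(public_tab)
colSums(pared_tab)          # marginal counts of apply
```

With the original data frame loaded, min(xtabs(~ pared + apply, data = suhy)) would perform the same check directly.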


Running a Model in R
5. We are good to go, let's run the model:
> summary(m1
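The model call itself is cut off in this transcript. As a hedged illustration of what fitting an ordered logistic model looks like, the sketch below uses polr() from the MASS package on simulated data. The simulated values, the object names (sim, fit, apply_sim), and the choice of polr() are all assumptions made for this example; the slides may have used a different function or specification.

```r
library(MASS)   # polr(): proportional-odds logistic regression

set.seed(1)
n   <- 400
gpa <- runif(n, min = 2, max = 4)          # simulated, not the UCLA data

# Latent-variable construction: cut a continuous score at two thresholds
latent <- 0.7 * gpa + rlogis(n)
apply_sim <- cut(latent,
                 breaks = c(-Inf, 2.3, 4.3, Inf),
                 labels = c("unlikely", "somewhat likely", "very likely"),
                 ordered_result = TRUE)
sim <- data.frame(apply = apply_sim, gpa = gpa)

# Hess = TRUE keeps the Hessian so summary() can report standard errors
fit <- polr(apply ~ gpa, data = sim, Hess = TRUE)
summary(fit)

coef(fit)    # slope for gpa
fit$zeta     # the two thresholds (cut points)

# Predicted category probabilities at a chosen GPA
probs <- predict(fit, newdata = data.frame(gpa = 3), type = "probs")
probs
```

The thresholds in fit$zeta are named by the pair of adjacent categories they separate, and the predicted probabilities across the three categories sum to one.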

Running a Model in R
A visual:


Thank you Pooja Shivraj (2012)

Running a Model in R
6. Let's calculate the probabilities for the average gpa
> x
> coef
> coef
suhy$gpa
0.710892
> intercept
> intercept
unlikely|somewhat likely somewhat likely|very likely
                2.330599                    4.350527


Running a Model in R
6. Let's calculate the probabilities for the average gpa (cont.)
Remember:
> prob
> (p0
> (p1
> (p2
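The definitions of prob, p0, p1, and p2 are cut off in this transcript. As a reference sketch, and assuming the model follows the usual proportional-odds form in which each threshold enters with a plus sign and the slope with a minus sign (the parameterization used by polr() in MASS), the three category probabilities can be written with thresholds \zeta_1 < \zeta_2, slope \beta, and predictor value x. Mapping p0, p1, and p2 to the three categories in this order is an assumption.

```latex
\begin{aligned}
\operatorname{logit} P(Y \le j \mid x) &= \zeta_j - \beta x, \qquad j = 1, 2 \\[4pt]
P(\text{unlikely}) &= \frac{e^{\zeta_1 - \beta x}}{1 + e^{\zeta_1 - \beta x}} \\[4pt]
P(\text{somewhat likely}) &= \frac{e^{\zeta_2 - \beta x}}{1 + e^{\zeta_2 - \beta x}}
                            - \frac{e^{\zeta_1 - \beta x}}{1 + e^{\zeta_1 - \beta x}} \\[4pt]
P(\text{very likely}) &= 1 - \frac{e^{\zeta_2 - \beta x}}{1 + e^{\zeta_2 - \beta x}}
\end{aligned}
```

With the values printed in step 6, \beta = 0.710892, \zeta_1 = 2.330599, and \zeta_2 = 4.350527. The three probabilities sum to one by construction.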

Running a Model in R
2.5 GPA
> (p0
> (p1
> (p2
> (p0
> (p1
> (p2

Running a Model in R
Now you tell me the probability for each category if you had a 4.0 GPA.
> (p0
> (p1
> (p2
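One way to work both the 2.5 and the 4.0 GPA cases is to wrap the calculation in a small function. This is a sketch under stated assumptions: it uses the slope and thresholds printed in step 6, assumes the cumulative-logit form logit P(Y <= j) = zeta_j - beta * gpa, and introduces the helper name category_probs, which does not appear in the slides.

```r
# Values printed in step 6
beta <- 0.710892                 # slope for gpa
zeta <- c(2.330599, 4.350527)    # thresholds between adjacent categories

# Category probabilities, assuming logit P(Y <= j) = zeta_j - beta * gpa
category_probs <- function(gpa) {
  cum <- plogis(zeta - beta * gpa)      # P(Y <= 1), P(Y <= 2)
  c("unlikely"        = cum[1],
    "somewhat likely" = cum[2] - cum[1],
    "very likely"     = 1 - cum[2])
}

round(category_probs(2.5), 3)
round(category_probs(4.0), 3)
```

Because the function takes gpa as an argument, the same call works for the average GPA or any other value of interest.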

Multiple Predictors
1. Let's look at a model with multiple predictors:
> summary(m2

Multiple Predictors
2. Let's calculate the probabilities:
> (coef
> (intercept
> mean(suhy$public)
[1] 0.1425


Multiple Predictors
2. Let's calculate the probabilities: (cont.)
> (x1
> (x2

Multiple Predictors
For pared = no (x1)
> prob
> (p1
> (p2
> p3
> p3
           [,1]
[1,] 0.01392398
[2,] 0.02519204
[3,] 0.04516123
[4,] 0.07966593
[5,] 0.13675756
For pared = yes (x2)
> prob
> (p4
> (p5
> p6
> p6
           [,1]
[1,] 0.03795509
[2,] 0.06734236
[3,] 0.11672252
[4,] 0.19475078
[5,] 0.30682131
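The printed p3 and p6 values allow a quick consistency check of the proportional-odds idea: the odds ratio between the two parental-education groups should be the same in every row. The sketch below assumes that p3 and p6 are the probabilities of the same outcome category for pared = no and pared = yes, evaluated at the same five GPA values; their definitions are cut off in the transcript, so this reading is an assumption.

```r
# Printed values of p3 (pared = no) and p6 (pared = yes), rows 1 to 5
p3 <- c(0.01392398, 0.02519204, 0.04516123, 0.07966593, 0.13675756)
p6 <- c(0.03795509, 0.06734236, 0.11672252, 0.19475078, 0.30682131)

odds <- function(p) p / (1 - p)

# Odds ratio for pared = yes versus pared = no, row by row
odds_ratio <- odds(p6) / odds(p3)
round(odds_ratio, 3)

# On the log scale, a constant ratio corresponds to a single model coefficient
round(log(odds_ratio), 3)
```

If the ratio is essentially identical across rows, the difference between the two groups is captured by one coefficient on the logit scale, regardless of GPA.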


Graphing the Results
> library(lattice)
> Undergrad.GPA <- 0:4
> plot(Undergrad.GPA, p1, type="l", col=1, ylim=c(0,1))
> lines(0:4, p2, col=2)
> lines(0:4, p3, col=3)
> lines(0:4, p4, col=1, lty = 2)
> lines(0:4, p5, col=2, lty = 2)
> lines(0:4, p6, col=3, lty = 2)
> legend(1.5, 1,
+        legend=c("P(unlikely)", "P(somewhat likely)", "P(very likely)",
+                 "Line Type when Pared = 0", "Line Type when Pared = 1"),
+        col=c(1:3,1,1),
+        lty=c(1,1,1,1,2))


[Figure: predicted probabilities (0.0 to 1.0) plotted against Undergrad.GPA (0 to 4). Legend: P(unlikely), P(somewhat likely), P(very likely); line type when Pared = 0, line type when Pared = 1.]

Why Not Linear Regression
The decision is not always black and white
Large categories that are equally spaced could call for a simple linear model
However, you must ALWAYS check your assumptions
(Gelman & Hill, 2007)

Why Not Linear Regression
Here is why. If we run our model using a simple linear model:
> apply2
> m3 <- lm(apply2 ~ gpa, data = suhy)
> summary(m3)

Call:
lm(formula = apply2 ~ gpa, data = suhy)

Residuals:
    Min      1Q  Median      3Q     Max
-0.7917 -0.5554 -0.3962  0.4786  1.6012

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.77984    0.25224   3.092  0.00213 **
gpa          0.25681    0.08338   3.080  0.00221 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.6628 on 398 degrees of freedom
Multiple R-squared:  0.02328,  Adjusted R-squared:  0.02083
F-statistic: 9.486 on 1 and 398 DF,  p-value: 0.002214
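To see what the linear fit implies, the printed coefficients can be turned into fitted values at a few GPA levels. This is a sketch, assuming apply2 is the outcome coded as the integers 1, 2, 3 (its definition is cut off in the transcript); the grid of GPA values is chosen only for illustration.

```r
# Coefficients printed in the lm summary
b0 <- 0.77984   # intercept
b1 <- 0.25681   # slope for gpa

gpa_grid  <- c(2.0, 2.5, 3.0, 3.5, 4.0)   # illustrative GPA values
fitted_lm <- b0 + b1 * gpa_grid

data.frame(gpa = gpa_grid, fitted = round(fitted_lm, 3))

# Assumption: apply2 is coded 1, 2, 3.
# How many fitted values land exactly on a category code?
sum(fitted_lm %in% c(1, 2, 3))
```

The linear model returns fractional values on a scale whose only observed outcomes are the category codes, and it treats the step between adjacent categories as the same size everywhere. The ordered logistic model instead returns a probability for each category.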


This is what we see when we check our assumptions

Why Not Linear Regression

[Figure: diagnostic plots for m3 in four panels: Residuals vs Fitted; Normal Q-Q (standardized residuals against theoretical quantiles); Scale-Location (standardized residuals against fitted values); Residuals vs Leverage (standardized residuals against leverage, with Cook's distance).]

You tell me...

References
Gelman, A. & Hill, J. (2007). Data analysis using regression and multilevel/hierarchical models. New York: Cambridge University Press.
Hoelze, B. (2009). Regression analysis with the ordinal multinomial logistic model [PowerPoint slides]. Retrieved from http://faculty.smu.edu/kyler/courses/7314//student/ordered_multinomial.pptx
Johnson, V. E. & Albert, J. H. (1999). Statistics for the social sciences and public policy: Ordinal data modeling. New York: Springer.
Shivraj, P. (2011). Ordered multinomial logistic regression analysis [PowerPoint slides]. Retrieved from http://faculty.smu.edu/kyler/courses/7312/presentations/shivraj/Ordered_ML_Shivraj.pdf
UCLA: Academic Technology Services. (n.d.). Retrieved from http://www.ats.ucla.edu/stat/r/ologit.dta