ORDERED MULTINOMIAL LOGISTIC REGRESSION ANALYSIS Pooja Shivraj Southern Methodist University

KINDS OF REGRESSION ANALYSES

• Linear Regression
• Logistic Regression
  ◦ Dichotomous dependent variable (yes/no, died/didn’t die, at risk/not at risk, etc.)
  ◦ Predicts the probability of a person belonging in that category.

QUICK REVIEW: LOGISTIC REGRESSION

• Values calculated from linear regression are continuous – need to be transformed on a 0-1 scale to represent probability since 0 ≤ p ≤ 1
• Logistic regression probability calculated by:

$$\hat{p} = \frac{e^{(B_1 x + B_0)}}{1 + e^{(B_1 x + B_0)}}$$
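Rearranging the probability formula gives the log-odds form, which may make the coefficients easier to read. This is standard algebra rather than something stated on the slide; the symbols are the same $B_0$, $B_1$ and $\hat{p}$ as above.

$$\frac{\hat{p}}{1 - \hat{p}} = e^{(B_1 x + B_0)} \quad\Longleftrightarrow\quad \ln\!\left(\frac{\hat{p}}{1 - \hat{p}}\right) = B_1 x + B_0$$

So a one-unit increase in $x$ adds $B_1$ to the log-odds, or equivalently multiplies the odds by $e^{B_1}$.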

CLASS EXAMPLE: LOGISTIC REGRESSION

• Probability that a person complies with mammogram screening, based on whether or not they get a physician’s recommendation

CLASS EXAMPLE: LOGISTIC REGRESSION

$$\hat{p} = \frac{e^{(B_1 x + B_0)}}{1 + e^{(B_1 x + B_0)}}$$

Probability of complying if NOT recommended by physician:

$$\hat{p} = \frac{e^{(2.29(0) - 1.84)}}{1 + e^{(2.29(0) - 1.84)}} = 0.14$$

Probability of complying if recommended by physician:

$$\hat{p} = \frac{e^{(2.29(1) - 1.84)}}{1 + e^{(2.29(1) - 1.84)}} = 0.61$$
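The two probabilities can be checked in R. This is a minimal sketch: `inv_logit`, `b0` and `b1` are names chosen here for illustration, and the coefficient values are the ones shown on the slide.

```r
# Inverse logit, written the same way as the slide formula
inv_logit <- function(eta) exp(eta) / (1 + exp(eta))

b1 <- 2.29   # slope for physician recommendation (from the slide)
b0 <- -1.84  # intercept (from the slide)

p_not_recommended <- inv_logit(b1 * 0 + b0)
p_recommended     <- inv_logit(b1 * 1 + b0)

# Rounded to two decimals these reproduce the slide: 0.14 and 0.61
round(c(p_not_recommended, p_recommended), 2)
```

Base R's `plogis()` computes the same function and could be used in place of `inv_logit`.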

ORDERED MULTINOMIAL LOGISTIC REGRESSION ANALYSIS

• Type of logistic regression that allows more than two discrete outcomes
• Outcomes are ordinal:
  ◦ Yes, maybe, no
  ◦ First, second, third place
  ◦ Gold, silver, bronze medals
  ◦ Strongly agree, agree, neutral, disagree, strongly disagree
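In R, an ordinal outcome like these is usually stored as an ordered factor. A small sketch with made-up responses (the vector `responses` is hypothetical, not data from the presentation):

```r
# Hypothetical survey answers, for illustration only
responses <- c("agree", "neutral", "strongly agree", "disagree", "agree")

# Declare the levels from lowest to highest so R knows their order
likert <- factor(responses,
                 levels = c("strongly disagree", "disagree", "neutral",
                            "agree", "strongly agree"),
                 ordered = TRUE)

is.ordered(likert)     # the factor carries an ordering
likert[1] > likert[2]  # "agree" ranks above "neutral"
table(likert)          # counts per level, in level order
```

For a numerically coded variable, `as.ordered()` does the same job, taking the order from the sorted codes.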

ASSUMPTION

• No perfect predictions – no single value of a predictor variable may correspond to only one value of the dependent variable – check using crosstabs.

ORDERED LOGISTIC REGRESSION EXAMPLE

• Load libraries:

library(arm)
library(psych)

• Load data:

pooj<-read.csv("http://www.ats.ucla.edu/stat/r/dae/ologit.csv")

ORDERED LOGISTIC REGRESSION EXAMPLE

• Variables:
  ◦ apply – college juniors’ reported likelihood of applying to grad school (0 = unlikely, 1 = somewhat likely, 2 = very likely)
  ◦ pared – indicates whether at least one parent has a graduate degree (0 = no, 1 = yes)
  ◦ public – indicates whether the undergraduate institution is public or private (0 = private, 1 = public)
  ◦ gpa – college GPA

> str(pooj)
'data.frame':   400 obs. of  4 variables:
 $ apply : int  2 1 0 1 1 0 1 1 0 1 ...
 $ pared : int  0 1 1 0 0 0 0 0 0 1 ...
 $ public: int  0 0 1 0 0 1 0 0 0 0 ...
 $ gpa   : num  3.26 3.21 3.94 2.81 2.53 ...
> table(pooj$apply)
  0   1   2
220 140  40
> table(pooj$pared)
  0   1
337  63
> table(pooj$public)
  0   1
343  57

CHECK ASSUMPTION – CROSS-TABS

> xtabs(~pooj$pared+pooj$apply)
          pooj$apply
pooj$pared   0   1   2
         0 200 110  27
         1  20  30  13
> xtabs(~pooj$public+pooj$apply)
           pooj$apply
pooj$public   0   1   2
          0 189 124  30
          1  31  16  10

• Why is this important?
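One way to make the cross-tab check explicit is sketched below. It re-enters the counts from the two tables above so the block runs without the data file. The helper names `single_outcome_rows` and `has_empty_cell` are invented here, and what counts as a problem (a predictor level observed with only one outcome, or any empty cell) is this sketch's reading of the assumption slide.

```r
# Counts copied from the two cross-tabs on the slide
pared_by_apply <- matrix(c(200, 110, 27,
                            20,  30, 13),
                         nrow = 2, byrow = TRUE,
                         dimnames = list(pared = c("0", "1"),
                                         apply = c("0", "1", "2")))
public_by_apply <- matrix(c(189, 124, 30,
                             31,  16, 10),
                          nrow = 2, byrow = TRUE,
                          dimnames = list(public = c("0", "1"),
                                          apply  = c("0", "1", "2")))

# TRUE for a predictor level that occurs with exactly one outcome value
single_outcome_rows <- function(tab) rowSums(tab > 0) == 1
# TRUE if any predictor/outcome combination was never observed
has_empty_cell <- function(tab) any(tab == 0)

single_outcome_rows(pared_by_apply)
single_outcome_rows(public_by_apply)
has_empty_cell(pared_by_apply)
has_empty_cell(public_by_apply)
```

Every cell here is populated (the smallest count is 10), so neither predictor perfectly predicts `apply`.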

SINGLE PREDICTOR MODEL - GPA

> library(arm)
> summary(m1<-bayespolr(as.ordered(pooj$apply)~pooj$gpa))
Call:
bayespolr(formula = as.ordered(pooj$apply) ~ pooj$gpa)

Coefficients:
          Value Std. Error t value
pooj$gpa 0.7109     0.2471   2.877

Intercepts:
    Value  Std. Error t value
0|1 2.3306 0.7502     3.1065
1|2 4.3505 0.7744     5.6179

Residual Deviance: 737.6921
AIC: 743.6921

CUMULATIVE DISTRIBUTION FUNCTION

[Figure: cumulative distribution function with the cutpoints 0|1 and 1|2 marked]
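The cutpoints enter the model through cumulative probabilities. The block below writes this out in the parameterization that matches the R code used later in these slides (intercept minus coefficient times predictor). The symbols $\zeta_{0|1}$, $\zeta_{1|2}$ (the two intercepts, stored in the `zeta` component of the fit) and $\beta$ (the GPA coefficient) are notation introduced here, not taken from the slides.

$$P(y \le 0) = \frac{e^{(\zeta_{0|1} - \beta x)}}{1 + e^{(\zeta_{0|1} - \beta x)}}, \qquad P(y \le 1) = \frac{e^{(\zeta_{1|2} - \beta x)}}{1 + e^{(\zeta_{1|2} - \beta x)}}$$

$$P(y = 0) = P(y \le 0), \qquad P(y = 1) = P(y \le 1) - P(y \le 0), \qquad P(y = 2) = 1 - P(y \le 1)$$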

LABELING COEFFICIENTS

Coefficients:
          Value Std. Error t value
pooj$gpa 0.7109     0.2471   2.877

Intercepts:
    Value  Std. Error t value
0|1 2.3306 0.7502     3.1065
1|2 4.3505 0.7744     5.6179

Coefficient of the model:
coef<- m1$coef

Intercepts of the model:
intercept <- m1$zeta

• Let us look at the likelihood of students with an average GPA applying to graduate school.

> (x<-mean(pooj$gpa))
[1] 2.998925

TRANSFORMING OUTCOMES TO PROBABILITIES

prob<-function(input){exp(input)/(1+exp(input))}

> (p0<-prob(intercept[1]-coef*x))
0.5493198
> (p1<-prob(intercept[2]-coef*x)-p0)
0.3525213
> (p2<-1-(p0+p1))
0.0981589
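The same three probabilities can be reproduced without the fitted object. This sketch types in the estimates as printed in the model summary; `beta`, `zeta` and `gpa_mean` are names chosen here. Because the summary shows only four decimals, the results agree with the slide values to about three decimals, not exactly.

```r
# Estimates as printed (rounded) in the bayespolr summary
beta     <- 0.7109
zeta     <- c("0|1" = 2.3306, "1|2" = 4.3505)
gpa_mean <- 2.998925

# Cumulative probabilities P(apply <= 0) and P(apply <= 1);
# plogis(q) is exp(q) / (1 + exp(q))
cum <- plogis(zeta - beta * gpa_mean)

probs <- c(unlikely        = cum[[1]],
           somewhat_likely = cum[[2]] - cum[[1]],
           very_likely     = 1 - cum[[2]])
round(probs, 3)
sum(probs)  # the three categories are exhaustive
```

So a student with an average GPA is most likely to answer "unlikely", and fewer than one in ten would be expected to answer "very likely".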

WHY NOT USE LINEAR REGRESSION?

> summary(linreg<-lm(pooj$apply~pooj$gpa))

Call:
lm(formula = pooj$apply ~ pooj$gpa)

Residuals:
    Min      1Q  Median      3Q     Max
-0.7917 -0.5554 -0.3962  0.4786  1.6012

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.22016    0.25224  -0.873  0.38329
pooj$gpa     0.25681    0.08338   3.080  0.00221 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.6628 on 398 degrees of freedom
Multiple R-squared:  0.02328,   Adjusted R-squared:  0.02083
F-statistic: 9.486 on 1 and 398 DF,  p-value: 0.002214


AND OUR ASSUMPTIONS AREN’T MET…
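The figure for this slide did not survive, so the following is only an illustration of one problem, not a reconstruction of the original plot. Using the coefficients from the `lm` summary, the fitted values are continuous numbers that never land on the 0, 1, 2 codes, and at the low end of a 0-4 GPA grid the line even falls below the lowest category. The names `b0`, `b1` and `fitted_apply` are chosen here.

```r
# Coefficients copied from the linear regression summary
b0 <- -0.22016
b1 <-  0.25681

gpa <- 0:4
fitted_apply <- b0 + b1 * gpa

data.frame(gpa, fitted_apply)
fitted_apply < 0  # flags predictions below the lowest category code
```

For any given GPA the residual can take only three values (0, 1 or 2 minus the fitted value), which is hard to reconcile with the usual assumption of normally distributed errors.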

LINEAR REGRESSION VERSUS ORDERED LOGISTIC REGRESSION

The decision between linear regression and ordered multinomial regression is not always black and white. When you have a large number of categories that can be considered equally spaced, simple linear regression is a reasonable alternative (Gelman & Hill, 2007).

Moral of the story: Always start by checking the assumptions of the model.

USING MULTIPLE PREDICTORS

> summary(m2 <- bayespolr(as.ordered(apply)~gpa + pared + public ,pooj))
Call:
bayespolr(formula = as.ordered(apply) ~ gpa + pared + public, pooj)

Coefficients:
            Value Std. Error    t value
gpa     0.6041463  0.2577039  2.3443424
pared   1.0274106  0.2636348  3.8970973
public -0.0528103  0.2931885 -0.1801240

Intercepts:
    Value  Std. Error t value
0|1 2.1638 0.7710     2.8064
1|2 4.2518 0.7955     5.3449

Residual Deviance: 727.002
AIC: 737.002
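A common way to read these coefficients, not shown on the slides, is to exponentiate them into odds ratios. Under the cumulative-logit parameterization used by this family of models, each value is the multiplicative change in the odds of being in a higher `apply` category for a one-unit increase in the predictor, holding the others fixed. The sketch types in the estimates printed above; `coefs` and `odds_ratios` are names chosen here.

```r
# Coefficient estimates copied from the summary above
coefs <- c(gpa = 0.6041463, pared = 1.0274106, public = -0.0528103)

odds_ratios <- exp(coefs)
round(odds_ratios, 2)
```

Having a parent with a graduate degree multiplies the odds of a higher response by roughly 2.8, while the odds ratio for `public` is close to 1, in line with its small t value.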

TRANSFORMING OUTCOMES TO PROBABILITIES

> (coef<- m2$coef)
       gpa      pared     public
 0.6041463  1.0274106 -0.0528103
> (intercept<-m2$zeta)
     0|1      1|2
2.163841 4.251774
> (x1<-cbind(0:4, 0 , .14))
     [,1] [,2] [,3]
[1,]    0    0 0.14
[2,]    1    0 0.14
[3,]    2    0 0.14
[4,]    3    0 0.14
[5,]    4    0 0.14
> (x2<-cbind(0:4, 1 , .14))
     [,1] [,2] [,3]
[1,]    0    1 0.14
[2,]    1    1 0.14
[3,]    2    1 0.14
[4,]    3    1 0.14
[5,]    4    1 0.14

TRANSFORMING OUTCOMES TO PROBABILITIES

prob<-function(VAR){exp(VAR)/(1+exp(VAR))}

> (p1<-prob(intercept[1]-x1 %*% coef))
          [,1]
[1,] 0.9119769
[2,] 0.8498732
[3,] 0.7556908
[4,] 0.6282669
[5,] 0.4801055
> (p2<-prob(intercept[2]-x1 %*% coef)-p1)
           [,1]
[1,] 0.07538029
[2,] 0.12722869
[3,] 0.20318345
[4,] 0.29895089
[5,] 0.39428044
> (p3<-1-(p1+p2))
           [,1]
[1,] 0.01264281
[2,] 0.02289816
[3,] 0.04112575
[4,] 0.07278223
[5,] 0.12561404

TRANSFORMING OUTCOMES TO PROBABILITIES

> (p4<-prob(intercept[1]-x2 %*% coef))
          [,1]
[1,] 0.7876055
[2,] 0.6695483
[3,] 0.5254116
[4,] 0.3769123
[5,] 0.2484150
> (p5<-prob(intercept[2]-x2 %*% coef)-p4)
         [,1]
[1,] 0.177854
[2,] 0.268999
[3,] 0.367579
[4,] 0.443221
[5,] 0.465167
> (p6<-1-(p4+p5))
         [,1]
[1,] 0.034540
[2,] 0.061452
[3,] 0.107009
[4,] 0.179866
[5,] 0.286418
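The six calculations can be wrapped in one small helper, which keeps each "somewhat likely" column paired with the matching "unlikely" column. This is a sketch: `category_probs`, `X0` and `X1` are names invented here, and the estimates are typed in from the printed `m2$coef` and `m2$zeta`. Its results differ somewhat from the values printed on the slides, which appear to correspond to the single-predictor cutpoints (2.3306 and 4.3505), not to the `m2` cutpoints.

```r
# Estimates copied from the printed m2$coef and m2$zeta
coefs <- c(gpa = 0.6041463, pared = 1.0274106, public = -0.0528103)
zeta  <- c("0|1" = 2.163841, "1|2" = 4.251774)

# Category probabilities for a three-level outcome, one row per case
category_probs <- function(X, coefs, zeta) {
  eta  <- drop(X %*% coefs)          # linear predictor for each row
  cum1 <- plogis(zeta[[1]] - eta)    # P(apply <= 0)
  cum2 <- plogis(zeta[[2]] - eta)    # P(apply <= 1)
  cbind(unlikely        = cum1,
        somewhat_likely = cum2 - cum1,
        very_likely     = 1 - cum2)
}

# GPA from 0 to 4, public held at 0.14, pared switched off and on
X0 <- cbind(gpa = 0:4, pared = 0, public = 0.14)
X1 <- cbind(gpa = 0:4, pared = 1, public = 0.14)

probs_pared0 <- category_probs(X0, coefs, zeta)
probs_pared1 <- category_probs(X1, coefs, zeta)

round(probs_pared0, 3)
round(probs_pared1, 3)
```

Each row of either matrix sums to 1, and at every GPA the "very likely" probability is higher when `pared` is 1.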

PLOTTING THE RESULTS

Undergrad.GPA <-0:4
plot(Undergrad.GPA, p1, type="l", col=1, ylim=c(0,1))
lines(0:4, p2, col=2)
lines(0:4, p3, col=3)
lines(0:4, p4, col=1, lty = 2)
lines(0:4, p5, col=2, lty = 2)
lines(0:4, p6, col=3, lty = 2)
legend(1.5, 1, legend=c("P(unlikely)", "P(somewhat likely)", "P(very likely)", "Line Type when Pared = 0", "Line Type when Pared = 1"), col=c(1:3,1,1), lty=c(1,1,1,1,2))

PRACTICE

Read in the following table (Quinn, n.d.):

practice <- read.table("http://www.stat.washington.edu/quinn/classes/536/data/nes96r.dat", header=TRUE)

Task: Run a regression using the ordered multinomial logistic model to predict the variation in the dependent variable ClinLR using the independent variables PID and educ.

ClinLR = Ordinal variable from 1-7 indicating one’s view of Bill Clinton’s political leanings, where 1 = extremely liberal, 2 = liberal, 3 = slightly liberal, 4 = moderate, 5 = slightly conservative, 6 = conservative, 7 = extremely conservative.

PID = Ordinal variable from 0-6 indicating one’s own political identification, where 0 = Strong Democrat and 6 = Strong Republican

educ = Ordinal variable from 1-7 indicating one’s own level of education, where 1 = 8 grades or less and no diploma, 2 = 9-11 grades, no further schooling, 3 = High school diploma or equivalency test, 4 = More than 12 years of schooling, no higher degree, 5 = Junior or community college level degree (AA degrees), 6 = BA level degrees; 17+ years, no postgraduate degree, 7 = Advanced degree
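As a starting template, the sketch below fits an ordered logistic model with seven outcome categories to simulated data that only mimics the shape of the practice variables. Two things are assumptions of this sketch: the data are generated at random (the effect sizes 0.5 and -0.1 are arbitrary, not estimates from the real file), and `MASS::polr()` is swapped in for `arm::bayespolr()` because MASS ships with standard R installations.

```r
library(MASS)  # polr(): ordered logistic regression, used here instead of bayespolr()

set.seed(1)
n <- 1000

# Simulated stand-ins for the practice predictors (NOT the real survey data)
sim <- data.frame(PID  = sample(0:6, n, replace = TRUE),
                  educ = sample(1:7, n, replace = TRUE))

# Latent score with arbitrary effects, cut into seven ordered categories
latent <- 0.5 * sim$PID - 0.1 * sim$educ + rlogis(n)
cuts   <- quantile(latent, probs = seq(0, 1, length.out = 8))
sim$ClinLR <- cut(latent, breaks = cuts, labels = 1:7,
                  include.lowest = TRUE, ordered_result = TRUE)

# Hess = TRUE stores the Hessian so summary(fit) can report standard errors
fit <- polr(ClinLR ~ PID + educ, data = sim, Hess = TRUE)

coef(fit)  # one coefficient per predictor
fit$zeta   # six cutpoints for seven categories
```

With the real table, `ClinLR` would first need converting with `as.ordered()`, as was done for `apply` in the earlier slides.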

REFERENCES

• Gelman, A. & Hill, J. (2007). Data analysis using regression and multilevel/hierarchical models. New York: Cambridge University Press.
• Quinn, K. (n.d.). Retrieved from http://www.stat.washington.edu/quinn/classes/536/data/nes96r.dat
• UCLA: Academic Technology Services. (n.d.). Retrieved from http://www.ats.ucla.edu/stat/r/dae/ologit.csv