multinomial logistic regression - facultywestfall.ba.ttu.edu/isqs5349/mlr_16.pdf · introduction...

of 22/22
Multinomial Logistic Regression Andrea Arias Ll. Luis Sandoval-Mej´ ıa Yangmei (Emily) Wang Texas Tech University ISQS 5349: Regression Analysis April 28, 2016

Post on 12-May-2019

224 views

Category:

Documents

0 download

Embed Size (px)

TRANSCRIPT

Multinomial Logistic Regression

Andrea Arias Ll. Luis Sandoval-Meja Yangmei (Emily) Wang

Texas Tech UniversityISQS 5349: Regression Analysis

April 28, 2016

IntroductionMultinomial Logistic Regression

Example in RSimulation in R

References

Agenda

1 Introduction

2 Multinomial Logistic RegressionMultinomial logit modelModel assumptionsParameter estimation: MLE

3 Example in REstimated probabilities

4 Simulation in RAccounting example

5 References

Arias Ll., Sandoval-Meja, Wang Multinomial Logistic Regression

IntroductionMultinomial Logistic Regression

Example in RSimulation in R

References

Introduction

Lets consider a data set

A data set with n observations where the response variable can take oneof several discrete values (1,2,...,J)

Let Y be program type1:

GeneralAcademicVocation

1IDRE, 2016

Arias Ll., Sandoval-Meja, Wang Multinomial Logistic Regression

IntroductionMultinomial Logistic Regression

Example in RSimulation in R

References

Some data analysis

Probability of choosing one program by SES in general

0.400.46

0.72

0.34

0.210.16

0.260.33

0.12

0.00

0.10

0.20

0.30

0.40

0.50

0.60

0.70

0.80

low middle high

academic general voca:on

Arias Ll., Sandoval-Meja, Wang Multinomial Logistic Regression

IntroductionMultinomial Logistic Regression

Example in RSimulation in R

References

Some data analysis (cont.)

Probability of choosing one program by SES at the 1st

quartile of writing score

0.31 0.33

0.57

0.38

0.23 0.22

0.31

0.44

0.21

0.00

0.10

0.20

0.30

0.40

0.50

0.60

0.70

0.80

0.90

ses_low ses_middle ses_high

Academic General Voca?onal

Arias Ll., Sandoval-Meja, Wang Multinomial Logistic Regression

IntroductionMultinomial Logistic Regression

Example in RSimulation in R

References

Some data analysis (cont.)

Probability of choosing one program by SES at the meanwriting score

0.440.48

0.70

0.36

0.230.180.20

0.29

0.12

0.00

0.10

0.20

0.30

0.40

0.50

0.60

0.70

0.80

0.90

ses_low ses_middle ses_high

Academic General Voca?onal

Arias Ll., Sandoval-Meja, Wang Multinomial Logistic Regression

IntroductionMultinomial Logistic Regression

Example in RSimulation in R

References

Some data analysis (cont.)

Probability of choosing one program by SES at the 3rd

quartile of writing score

0.580.63

0.81

0.31

0.200.130.11

0.17

0.06

0.00

0.10

0.20

0.30

0.40

0.50

0.60

0.70

0.80

0.90

ses_low ses_middle ses_high

Academic General Voca?onal

Arias Ll., Sandoval-Meja, Wang Multinomial Logistic Regression

IntroductionMultinomial Logistic Regression

Example in RSimulation in R

References

Some data analysis (cont.)

Probability of choosing one program by writing score at SESlow

0.00

0.10

0.20

0.30

0.40

0.50

0.60

0.70

0.80

31323334353637383940414243444546474849505152535455565758596061626364656667

Wri0ngscore

Academic General Voca0onal

Arias Ll., Sandoval-Meja, Wang Multinomial Logistic Regression

IntroductionMultinomial Logistic Regression

Example in RSimulation in R

References

Some data analysis (cont.)

Probability of choosing one program by writing score at SESmiddle

0.00

0.10

0.20

0.30

0.40

0.50

0.60

0.70

0.80

31323334353637383940414243444546474849505152535455565758596061626364656667

Wri0ngscore

Academic General Voca0onal

Arias Ll., Sandoval-Meja, Wang Multinomial Logistic Regression

IntroductionMultinomial Logistic Regression

Example in RSimulation in R

References

Some data analysis (cont.)

Probability of choosing one program by writing score at SEShigh

0.00

0.10

0.20

0.30

0.40

0.50

0.60

0.70

0.80

0.90

1.00

31323334353637383940414243444546474849505152535455565758596061626364656667

Wri0ngscore

Academic General Voca0onal

Arias Ll., Sandoval-Meja, Wang Multinomial Logistic Regression

IntroductionMultinomial Logistic Regression

Example in RSimulation in R

References

When do we use Multinomial Logit?

When we need to model nominal (unordered choices) out-come variables!

The individual is choosing between more than two alternatives.

Undergraduate major choiceFood choicesCar manufacturer choice

Important!! The multinomial logit model applies only whenthe predictor variables are individual specific.

Income.Age.Social economic status.... and so on.

Arias Ll., Sandoval-Meja, Wang Multinomial Logistic Regression

IntroductionMultinomial Logistic Regression

Example in RSimulation in R

References

The Multinomial Distribution

Consider a random variable Yi that may take one of severaldiscrete values (1,2,...,J)

Let:

Pr{Yi = j} = ij i = 1, .., n j = 1, .., JWhere ij is the probability that the i-th individual choosethe j-th category2.

And

Jj=1

ij = 1 i = 1, .., n

2Rodriguez,2016

Arias Ll., Sandoval-Meja, Wang Multinomial Logistic Regression

IntroductionMultinomial Logistic Regression

Example in RSimulation in R

References

Multinomial logit modelModel assumptionsParameter estimation: MLE

The Multinomial Logit Model

A model for the probabilities where the probabilities dependon a vector Xi.

Nominate one of the response categories as baseline.

Calculate the logits for all other categories.with respect to the baseline.

Let the logits be a linear function of the predictors.

Arias Ll., Sandoval-Meja, Wang Multinomial Logistic Regression

IntroductionMultinomial Logistic Regression

Example in RSimulation in R

References

Multinomial logit modelModel assumptionsParameter estimation: MLE

The Multinomial Logit model (cont.)

log

(ijiJ

)= X

ij j = 1, .., J 1

Arias Ll., Sandoval-Meja, Wang Multinomial Logistic Regression

IntroductionMultinomial Logistic Regression

Example in RSimulation in R

References

Multinomial logit modelModel assumptionsParameter estimation: MLE

The Multinomial Logit model (cont.)

Modeling the probabilities3:

Prob(Yi = j|Xi) = ij =exp(X

ij)

1 +J1

j=1 exp(Xij)

, j = 1, 2, ..., J 1

Prob(Yi = J |Xi) = iJ =1

1 +J1

j=1 exp(Xij)

,

Note: The binomial logit model is a special case when J = 2.

3Greene, 2012

Arias Ll., Sandoval-Meja, Wang Multinomial Logistic Regression

IntroductionMultinomial Logistic Regression

Example in RSimulation in R

References

Multinomial logit modelModel assumptionsParameter estimation: MLE

Model assumptions

Linearity

Independence of observations

Independence of irrelevant alternatives(Not as important as the first two!!)

Arias Ll., Sandoval-Meja, Wang Multinomial Logistic Regression

IntroductionMultinomial Logistic Regression

Example in RSimulation in R

References

Multinomial logit modelModel assumptionsParameter estimation: MLE

Maximum Likelihood Estimation

Recall that the Likelihood function for a parameter vector resultingfrom an independent sample is4:

L(|y1, y2, ..., yn) = p(y1|) p(y2|) ... p(yn|)

In our case, since for observation i the probability is given by:

Pr{Yi = j} = ijWe have that the Likelihood function is the following:

L(|data) =

ichoice1

i1

ichoice2

i2...

ichoiceJ

iJ

We choose the that maximizes

L(|data)

4Westfall and Henning, 2013

Arias Ll., Sandoval-Meja, Wang Multinomial Logistic Regression

IntroductionMultinomial Logistic Regression

Example in RSimulation in R

References

Estimated probabilities

Estimated probabilities

Once we have estimated the parameters (), we can estimate the prob-abilities for each particular cohort.

Pr (Y = O0|X1 = x1, X2 = x2) =1

1 + e(10+11x1+12x2) + e(20+21x1+22x2)

Pr (Y = O1|X1 = x1, X2 = x2) =e(10+11x1+12x2)

1 + e(10+11x1+12x2) + e(20+21x1+22x2)

Pr (Y = O2|X1 = x1, X2 = x2) =e(20+21x1+22x2)

1 + e(10+11x1+12x2) + e(20+21x1+22x2)

Arias Ll., Sandoval-Meja, Wang Multinomial Logistic Regression

IntroductionMultinomial Logistic Regression

Example in RSimulation in R

References

Accounting example

SimulationAccounting example

Response variable: Tax service provider.Three nominal choices:

B - Big4N - Non Big4S - Self preparer

Predictor variable: Companys tax preparation budget.(lbudget)

Arias Ll., Sandoval-Meja, Wang Multinomial Logistic Regression

IntroductionMultinomial Logistic Regression

Example in RSimulation in R

References

Accounting example

SimulationAccounting example

The Model:

log

(Pr(Y = N |lbudget)Pr(Y = B|lbudget)

)= 01 + 11lbudget

log

(Pr(Y = S|lbudget)Pr(Y = B|lbudget)

)= 02 + 12lbudget

The probabilities of choosing each of the three choices:

Pr (Y = B|lbudget) = 11 + e(01+11lbudget) + e(02+12lbudget)

Pr (Y = N |lbudget) = e(01+11lbudget)

1 + e(01+11lbudget) + e(02+12lbudget)

Pr (Y = S|lbudget) = e(02+12lbudget)

1 + e(01+11lbudget) + e(02+12lbudget)

Arias Ll., Sandoval-Meja, Wang Multinomial Logistic Regression

IntroductionMultinomial Logistic Regression

Example in RSimulation in R

References

References

McKee, D. 2015. An intuitive introduction to the multinomial logit.Available at: https://www.youtube.com/watch?v=n7zHXjuE6PE&t=2732s

Institute for Digital Research and Education, UCLA. R Data analysisexamples: Multinomial Logistic Regression. Available at: http://www.ats.ucla.edu/stat/r/dae/mlogit.htm

Greene, W, H. 2012. Econometric Analysis, 7th Edition. Prentice Hall,NJ.

Rodriguez, G. 2016. A note on interpreting multinomial logit coeffi-cients. Princeton University. Available at: http://data.princeton.edu/wws509/stata/mlogit.html

Westfall, P., Henning, K. S. (2013). Understanding advanced statisticalmethods. CRC Press.

Arias Ll., Sandoval-Meja, Wang Multinomial Logistic Regression

https://www.youtube.com/watch?v=n7zHXjuE6PE&t=2732shttp://www.ats.ucla.edu/stat/r/dae/mlogit.htmhttp://www.ats.ucla.edu/stat/r/dae/mlogit.htmhttp://data.princeton.edu/wws509/stata/mlogit.htmlhttp://data.princeton.edu/wws509/stata/mlogit.html

IntroductionMultinomial Logistic Regression

Example in RSimulation in R

References

THANK YOU !!!

Questions?

Arias Ll., Sandoval-Meja, Wang Multinomial Logistic Regression

IntroductionMultinomial Logistic RegressionMultinomial logit modelModel assumptionsParameter estimation: MLE

Example in REstimated probabilities

Simulation in RAccounting example

References