# multinomial logistic regression - facultywestfall.ba.ttu.edu/isqs5349/mlr_16.pdf · introduction...

22
Multinomial Logistic Regression Andrea Arias Ll. Luis Sandoval-Mej´ ıa Yangmei (Emily) Wang Texas Tech University ISQS 5349: Regression Analysis April 28, 2016

Post on 12-May-2019

266 views

Category:

## Documents

TRANSCRIPT

Multinomial Logistic Regression

Andrea Arias Ll. Luis Sandoval-Mej́ıa Yangmei (Emily) Wang

Texas Tech UniversityISQS 5349: Regression Analysis

April 28, 2016

IntroductionMultinomial Logistic Regression

Example in RSimulation in R

References

Agenda

1 Introduction

2 Multinomial Logistic RegressionMultinomial logit modelModel assumptionsParameter estimation: MLE

3 Example in REstimated probabilities

4 Simulation in RAccounting example

5 References

Arias Ll., Sandoval-Mej́ıa, Wang Multinomial Logistic Regression

IntroductionMultinomial Logistic Regression

Example in RSimulation in R

References

Introduction

Let’s consider a data set

A data set with n observations where the response variable can take oneof several discrete values (1,2,...,J)

Let Y be program type1:

1IDRE, 2016

Arias Ll., Sandoval-Mej́ıa, Wang Multinomial Logistic Regression

IntroductionMultinomial Logistic Regression

Example in RSimulation in R

References

Some data analysis

Probability of choosing one program by SES in general

0.400.46

0.72

0.34

0.210.16

0.260.33

0.12

0.00

0.10

0.20

0.30

0.40

0.50

0.60

0.70

0.80

low middle high

Arias Ll., Sandoval-Mej́ıa, Wang Multinomial Logistic Regression

IntroductionMultinomial Logistic Regression

Example in RSimulation in R

References

Some data analysis (cont.)

Probability of choosing one program by SES at the 1st

quartile of writing score

0.31 0.33

0.57

0.38

0.23 0.22

0.31

0.44

0.21

0.00

0.10

0.20

0.30

0.40

0.50

0.60

0.70

0.80

0.90

ses_low ses_middle ses_high

Arias Ll., Sandoval-Mej́ıa, Wang Multinomial Logistic Regression

IntroductionMultinomial Logistic Regression

Example in RSimulation in R

References

Some data analysis (cont.)

Probability of choosing one program by SES at the meanwriting score

0.440.48

0.70

0.36

0.230.180.20

0.29

0.12

0.00

0.10

0.20

0.30

0.40

0.50

0.60

0.70

0.80

0.90

ses_low ses_middle ses_high

Arias Ll., Sandoval-Mej́ıa, Wang Multinomial Logistic Regression

IntroductionMultinomial Logistic Regression

Example in RSimulation in R

References

Some data analysis (cont.)

Probability of choosing one program by SES at the 3rd

quartile of writing score

0.580.63

0.81

0.31

0.200.130.11

0.17

0.06

0.00

0.10

0.20

0.30

0.40

0.50

0.60

0.70

0.80

0.90

ses_low ses_middle ses_high

Arias Ll., Sandoval-Mej́ıa, Wang Multinomial Logistic Regression

IntroductionMultinomial Logistic Regression

Example in RSimulation in R

References

Some data analysis (cont.)

Probability of choosing one program by writing score at SESlow

0.00

0.10

0.20

0.30

0.40

0.50

0.60

0.70

0.80

31323334353637383940414243444546474849505152535455565758596061626364656667

Wri0ngscore

Arias Ll., Sandoval-Mej́ıa, Wang Multinomial Logistic Regression

IntroductionMultinomial Logistic Regression

Example in RSimulation in R

References

Some data analysis (cont.)

Probability of choosing one program by writing score at SESmiddle

0.00

0.10

0.20

0.30

0.40

0.50

0.60

0.70

0.80

31323334353637383940414243444546474849505152535455565758596061626364656667

Wri0ngscore

Arias Ll., Sandoval-Mej́ıa, Wang Multinomial Logistic Regression

IntroductionMultinomial Logistic Regression

Example in RSimulation in R

References

Some data analysis (cont.)

Probability of choosing one program by writing score at SEShigh

0.00

0.10

0.20

0.30

0.40

0.50

0.60

0.70

0.80

0.90

1.00

31323334353637383940414243444546474849505152535455565758596061626364656667

Wri0ngscore

Arias Ll., Sandoval-Mej́ıa, Wang Multinomial Logistic Regression

IntroductionMultinomial Logistic Regression

Example in RSimulation in R

References

When do we use Multinomial Logit?

When we need to model nominal (unordered choices) out-come variables!

The individual is choosing between more than two alternatives.

Undergraduate major choiceFood choicesCar manufacturer choice

Important!! The multinomial logit model applies only whenthe predictor variables are individual specific.

Income.Age.Social economic status.... and so on.

Arias Ll., Sandoval-Mej́ıa, Wang Multinomial Logistic Regression

IntroductionMultinomial Logistic Regression

Example in RSimulation in R

References

The Multinomial Distribution

Consider a random variable Yi that may take one of severaldiscrete values (1,2,...,J)

Let:

Pr{Yi = j} = πij i = 1, .., n j = 1, .., J

Where πij is the probability that the i-th individual choosethe j-th category2.

And

J∑j=1

πij = 1 i = 1, .., n

2Rodriguez,2016

Arias Ll., Sandoval-Mej́ıa, Wang Multinomial Logistic Regression

IntroductionMultinomial Logistic Regression

Example in RSimulation in R

References

Multinomial logit modelModel assumptionsParameter estimation: MLE

The Multinomial Logit Model

A model for the probabilities where the probabilities dependon a vector Xi.

Nominate one of the response categories as baseline.

Calculate the logits for all other categories.with respect to the baseline.

Let the logits be a linear function of the predictors.

Arias Ll., Sandoval-Mej́ıa, Wang Multinomial Logistic Regression

IntroductionMultinomial Logistic Regression

Example in RSimulation in R

References

Multinomial logit modelModel assumptionsParameter estimation: MLE

The Multinomial Logit model (cont.)

log

(πijπiJ

)= X

′iβj j = 1, .., J − 1

Arias Ll., Sandoval-Mej́ıa, Wang Multinomial Logistic Regression

IntroductionMultinomial Logistic Regression

Example in RSimulation in R

References

Multinomial logit modelModel assumptionsParameter estimation: MLE

The Multinomial Logit model (cont.)

Modeling the probabilities3:

Prob(Yi = j|Xi) = πij =exp(X

iβj)

1 +∑J−1

j=1 exp(X′iβj)

, j = 1, 2, ..., J − 1

Prob(Yi = J |Xi) = πiJ =1

1 +∑J−1

j=1 exp(X′iβj)

,

Note: The binomial logit model is a special case when J = 2.

3Greene, 2012

Arias Ll., Sandoval-Mej́ıa, Wang Multinomial Logistic Regression

IntroductionMultinomial Logistic Regression

Example in RSimulation in R

References

Multinomial logit modelModel assumptionsParameter estimation: MLE

Model assumptions

Linearity

Independence of observations

Independence of irrelevant alternatives(Not as important as the first two!!)

Arias Ll., Sandoval-Mej́ıa, Wang Multinomial Logistic Regression

IntroductionMultinomial Logistic Regression

Example in RSimulation in R

References

Multinomial logit modelModel assumptionsParameter estimation: MLE

Maximum Likelihood Estimation

Recall that the Likelihood function for a parameter vector θ resultingfrom an independent sample is4:

L(θ|y1, y2, ..., yn) = p(y1|θ)× p(y2|θ)× ...× p(yn|θ)

In our case, since for observation i the probability is given by:

Pr{Yi = j} = πij

We have that the Likelihood function is the following:

L(π|data) =∏

i∈choice1

πi1∏

i∈choice2

πi2...∏

i∈choiceJ

πiJ

We choose the β̂ that maximizes

L(π|data)

4Westfall and Henning, 2013

Arias Ll., Sandoval-Mej́ıa, Wang Multinomial Logistic Regression

IntroductionMultinomial Logistic Regression

Example in RSimulation in R

References

Estimated probabilities

Estimated probabilities

Once we have estimated the parameters (β̂), we can estimate the prob-abilities for each particular cohort.

Pr (Y = O0|X1 = x1, X2 = x2) =1

1 + e(β̂10+β̂11x1+β̂12x2) + e(β̂20+β̂21x1+β̂22x2)

Pr (Y = O1|X1 = x1, X2 = x2) =e(β̂10+β̂11x1+β̂12x2)

1 + e(β̂10+β̂11x1+β̂12x2) + e(β̂20+β̂21x1+β̂22x2)

Pr (Y = O2|X1 = x1, X2 = x2) =e(β̂20+β̂21x1+β̂22x2)

1 + e(β̂10+β̂11x1+β̂12x2) + e(β̂20+β̂21x1+β̂22x2)

Arias Ll., Sandoval-Mej́ıa, Wang Multinomial Logistic Regression

IntroductionMultinomial Logistic Regression

Example in RSimulation in R

References

Accounting example

SimulationAccounting example

Response variable: Tax service provider.Three nominal choices:

B - Big4N - Non Big4S - Self preparer

Predictor variable: Company’s tax preparation budget.(lbudget)

Arias Ll., Sandoval-Mej́ıa, Wang Multinomial Logistic Regression

IntroductionMultinomial Logistic Regression

Example in RSimulation in R

References

Accounting example

SimulationAccounting example

The Model:

log

(Pr(Y = N |lbudget)Pr(Y = B|lbudget)

)= β01 + β11lbudget

log

(Pr(Y = S|lbudget)Pr(Y = B|lbudget)

)= β02 + β12lbudget

The probabilities of choosing each of the three choices:

Pr (Y = B|lbudget) =1

1 + e(β01+β11lbudget) + e(β02+β12lbudget)

Pr (Y = N |lbudget) =e(β01+β11lbudget)

1 + e(β01+β11lbudget) + e(β02+β12lbudget)

Pr (Y = S|lbudget) =e(β02+β12lbudget)

1 + e(β01+β11lbudget) + e(β02+β12lbudget)

Arias Ll., Sandoval-Mej́ıa, Wang Multinomial Logistic Regression

IntroductionMultinomial Logistic Regression

Example in RSimulation in R

References

References

McKee, D. 2015. An intuitive introduction to the multinomial logit.Available at: https://www.youtube.com/watch?v=n7zHXjuE6PE&t=2732s

Institute for Digital Research and Education, UCLA. R Data analysisexamples: Multinomial Logistic Regression. Available at: http://www.

ats.ucla.edu/stat/r/dae/mlogit.htm

Greene, W, H. 2012. Econometric Analysis, 7th Edition. Prentice Hall,NJ.

Rodriguez, G. 2016. A note on interpreting multinomial logit coeffi-cients. Princeton University. Available at: http://data.princeton.

edu/wws509/stata/mlogit.html

Westfall, P., Henning, K. S. (2013). Understanding advanced statisticalmethods. CRC Press.

Arias Ll., Sandoval-Mej́ıa, Wang Multinomial Logistic Regression

IntroductionMultinomial Logistic Regression

Example in RSimulation in R

References

THANK YOU !!!

Questions?

Arias Ll., Sandoval-Mej́ıa, Wang Multinomial Logistic Regression