multinomial logit & ordered probit. multinomial logit is used when the data cannot be ordered....

Multinomial Logit & Ordered Probit

Upload: samantha-rowe

Post on 28-Mar-2015


Multinomial Logit & Ordered Probit

Multinomial Logit

• Is used when the outcome categories cannot be ordered. An example is choice of holiday: (i) beach, (ii) mountain, (iii) culture. Each individual goes on just one holiday.

• We will examine this within the context of insurance data. The exact meaning does not matter; just treat it like holiday data. But for a clue about the variables, type:

describe
summ *ins*
label list insure

use http://www.stata-press.com/data/r11/sysdsn1.dta, clear

. tab2 insure insure

-> tabulation of insure by insure

           |              insure
    insure | Indemnity    Prepaid   Uninsure |     Total
-----------+---------------------------------+----------
 Indemnity |       294          0          0 |       294
   Prepaid |         0        277          0 |       277
  Uninsure |         0          0         45 |        45
-----------+---------------------------------+----------
     Total |       294        277         45 |       616

There are 3 options: those who prepay, those who are not insured, and those who are covered by an indemnity plan.

generate site1=site==1
generate site2=site==2
generate site3=site==3

NOW TYPE: mlogit insure age male nonwhite site2 site3
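As an aside, the site dummies need not be built by hand. A minimal sketch, assuming Stata 11 or later (where factor-variable notation is available) and web access to the dataset: i.site asks Stata to create the indicators itself, with the first site as the omitted reference category, so it should correspond to the site2 and site3 specification.

```stata
* Sketch only: assumes Stata 11+ (factor variables) and web access to the dataset
use http://www.stata-press.com/data/r11/sysdsn1.dta, clear

* i.site expands to indicators for site 2 and site 3; site 1 is the omitted base
mlogit insure age male nonwhite i.site
```

The coefficients labelled 2.site and 3.site play the role of site2 and site3.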

Multinomial logistic regression                   Number of obs   =        615
                                                  LR chi2(10)     =      42.99
                                                  Prob > chi2     =     0.0000
Log likelihood = -534.36165                       Pseudo R2       =     0.0387

------------------------------------------------------------------------------
      insure |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
Prepaid      |
         age |   -.011745   .0061946    -1.90   0.058    -.0238862    .0003962
        male |   .5616934   .2027465     2.77   0.006     .1643175    .9590693
    nonwhite |   .9747768   .2363213     4.12   0.000     .5115955    1.437958
       site2 |   .1130359   .2101903     0.54   0.591    -.2989296    .5250013
       site3 |  -.5879879   .2279351    -2.58   0.010    -1.034733   -.1412433
       _cons |   .2697127   .3284422     0.82   0.412    -.3740222    .9134476
-------------+----------------------------------------------------------------
Uninsure     |
         age |  -.0077961   .0114418    -0.68   0.496    -.0302217    .0146294
        male |   .4518496   .3674867     1.23   0.219     -.268411     1.17211
    nonwhite |   .2170589   .4256361     0.51   0.610    -.6171725     1.05129
       site2 |  -1.211563   .4705127    -2.57   0.010    -2.133751   -.2893747
       site3 |  -.2078123   .3662926    -0.57   0.570    -.9257327     .510108
       _cons |  -1.286943   .5923219    -2.17   0.030    -2.447872   -.1260135
------------------------------------------------------------------------------
(insure==Indemnity is the base outcome)

Note that there are two equations: one to explain those who opt for ‘prepaid’ and a second for those who opt for ‘uninsure’.

• But there are three choices, so why only two equations? Because the three probabilities must sum to one: if you know the determinants of two of the choices, the third follows by default.

• The omitted choice can also be viewed as the default (base) outcome against which the other two are compared.

• Here the default case is the first, indemnity. Could we change it? YES.
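The two-equation structure can be written out explicitly. The block below is the standard textbook form of the multinomial logit with Indemnity as the base outcome, not something taken from the slides; $x_i$ is assumed to stack age, male, nonwhite, site2, site3 and a constant, and $\beta_P$, $\beta_U$ are the Prepaid and Uninsure coefficient vectors.

```latex
\begin{aligned}
P(y_i=\text{Prepaid})   &= \frac{e^{x_i'\beta_P}}{1+e^{x_i'\beta_P}+e^{x_i'\beta_U}},\\[4pt]
P(y_i=\text{Uninsure})  &= \frac{e^{x_i'\beta_U}}{1+e^{x_i'\beta_P}+e^{x_i'\beta_U}},\\[4pt]
P(y_i=\text{Indemnity}) &= \frac{1}{1+e^{x_i'\beta_P}+e^{x_i'\beta_U}}.
\end{aligned}
```

The base outcome's coefficients are normalised to zero, which is why only two sets of coefficients are estimated.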

• mlogit insure age male nonwhite site2 site3, base(2)

This changes the default (base) outcome to the second option, Prepaid.

Iteration 0:   log likelihood = -555.85446
Iteration 1:   log likelihood = -534.72983
Iteration 2:   log likelihood = -534.36536
Iteration 3:   log likelihood = -534.36165
Iteration 4:   log likelihood = -534.36165

Multinomial logistic regression                   Number of obs   =        615
                                                  LR chi2(10)     =      42.99
                                                  Prob > chi2     =     0.0000
Log likelihood = -534.36165                       Pseudo R2       =     0.0387

------------------------------------------------------------------------------
      insure |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
Indemnity    |
         age |    .011745   .0061946     1.90   0.058    -.0003962    .0238862
        male |  -.5616934   .2027465    -2.77   0.006    -.9590693   -.1643175
    nonwhite |  -.9747768   .2363213    -4.12   0.000    -1.437958   -.5115955
       site2 |  -.1130359   .2101903    -0.54   0.591    -.5250013    .2989296
       site3 |   .5879879   .2279351     2.58   0.010     .1412433    1.034733
       _cons |  -.2697127   .3284422    -0.82   0.412    -.9134476    .3740222
-------------+----------------------------------------------------------------
Uninsure     |
         age |   .0039489   .0115994     0.34   0.734    -.0187855    .0266832
        male |  -.1098438   .3651883    -0.30   0.764    -.8255998    .6059122
    nonwhite |  -.7577178   .4195759    -1.81   0.071    -1.580071    .0646357
       site2 |  -1.324599   .4697954    -2.82   0.005    -2.245381   -.4038165
       site3 |   .3801756   .3728188     1.02   0.308    -.3505358    1.110887
       _cons |  -1.556656   .5963286    -2.61   0.009    -2.725438    -.387873
------------------------------------------------------------------------------
(insure==Prepaid is the base outcome)
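Changing the base outcome re-expresses the same model rather than fitting a new one, which is why the log likelihood is identical in both runs. A quick check on the age coefficients, typed in from the two outputs (the scalar names are illustrative only):

```stata
* Age coefficients with Indemnity as the base outcome, copied from the first output
scalar b_prepaid  = -.011745
scalar b_uninsure = -.0077961

* With Prepaid as base, Indemnity's coefficient is the negative of Prepaid's ...
scalar b_indem_new = -b_prepaid
* ... and Uninsure's is the difference between the two original coefficients
scalar b_unins_new = b_uninsure - b_prepaid

display "Indemnity vs Prepaid, age: " %9.7f b_indem_new
display "Uninsure  vs Prepaid, age: " %9.7f b_unins_new
```

These reproduce the .011745 and .0039489 reported for age when Prepaid is the base outcome.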


Data also comes from:

• use http://www.stata-press.com/data/r11/sysdsn1.dta

• mlogit insure age male nonwhite

Clear, set memory and load data

clear
set mem 100000
use "http://staff.bath.ac.uk/hssjrh/oprob.dta"

Describe pers

. describe pers

              storage  display     value
variable name   type   format      label      variable label
--------------------------------------------------------------------------------
persit5yr       double %10.0g      qa5        QA5 PERSONAL SITUATION - FIVE YEARS AGO

• The variable relates to a person’s situation and how it has changed over the last five years.

• Let us look at it.

• Type: tab2 pers pers

The most common response was improved, but for over half of the sample this was not the case.

         QA5 PERSONAL |
     SITUATION - FIVE |      QA5 PERSONAL SITUATION - FIVE YEARS AGO
            YEARS AGO |  Improved  Stayed ab  Got worse         DK |     Total
----------------------+--------------------------------------------+----------
             Improved |    11,178          0          0          0 |    11,178
Stayed about the same |         0      9,533          0          0 |     9,533
            Got worse |         0          0      8,418          0 |     8,418
                   DK |         0          0          0        301 |       301
----------------------+--------------------------------------------+----------
                Total |    11,178      9,533      8,418        301 |    29,430

Ordered probit

• We use this when we have discrete data and when it is ordered. In this case:

• 1 best (improved)
• 2 next best (stayed about the same)
• 3 worst (got worse).

The ordering is clear.

Change in personal situation

Assume an underlying continuous (latent) variable relating to changes in the individual’s personal situation.

If this underlying variable lies to the left of μ1, we classify the outcome as ‘1’: the individual’s position has improved.

If this underlying variable lies to the right of μ2, we classify the outcome as ‘3’: the individual’s position has got worse.

In between these two values, we classify the outcome as ‘2’: the individual’s position has stayed the same.

• You might say: surely ‘stay the same’ is one specific value (perhaps 0), so that anything to the left of it has improved and anything to the right has got worse.

• But it is common to assume a range of values that represent too small a change to count as either ‘improved’ or ‘got worse’; the bounds of this range are μ1 and μ2.
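The threshold idea can be stated compactly. The block below is the usual textbook formulation of the ordered probit, given here as a sketch rather than as the slides' own notation; $y_i^*$ is the latent change in personal situation, $x_i$ the explanatory variables, and $\Phi$ the standard normal distribution function.

```latex
\begin{aligned}
y_i^* &= x_i'\beta + \varepsilon_i, \qquad \varepsilon_i \sim N(0,1),\\[4pt]
y_i &=
\begin{cases}
1 \ (\text{improved})  & \text{if } y_i^* \le \mu_1,\\
2 \ (\text{same})      & \text{if } \mu_1 < y_i^* \le \mu_2,\\
3 \ (\text{got worse}) & \text{if } y_i^* > \mu_2,
\end{cases}\\[4pt]
P(y_i=1) &= \Phi(\mu_1 - x_i'\beta),\\
P(y_i=2) &= \Phi(\mu_2 - x_i'\beta) - \Phi(\mu_1 - x_i'\beta),\\
P(y_i=3) &= 1 - \Phi(\mu_2 - x_i'\beta).
\end{aligned}
```

There is no separate constant in $x_i'\beta$: its role is taken by the two cut points.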

Do the estimation.

• Simply use oprobit rather than regress.

oprobit persi lgnipc male age agesq rlaw estonia village town selfemp marrd educ2 unemp manual if age<98 & age>17 & persi<4

This fits an ordered probit of persi on a set of right-hand-side variables. (Note that we do not have to write the variable's full name, persit5yr, because it is the only variable in the data set that begins with persi.)

if age<98 & age>17 & persi<4

This limits the estimation to individuals older than 17 and under 98, and also cuts out those who answered "don't know" (coded 4) for persi.

The results

Ordered probit regression                         Number of obs   =      25751
                                                  LR chi2(13)     =    4392.02
                                                  Prob > chi2     =     0.0000
Log likelihood = -25990.573                       Pseudo R2       =     0.0779

------------------------------------------------------------------------------
   persit5yr |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      lgnipc |  -.0766027   .0209432    -3.66   0.000    -.1176506   -.0355548
        male |  -.0249916   .0147207    -1.70   0.090    -.0538437    .0038604
         age |   .0513208   .0025145    20.41   0.000     .0463924    .0562492
       agesq |  -.0322755   .0025142   -12.84   0.000    -.0372033   -.0273478
        rlaw |  -.2455444    .011504   -21.34   0.000    -.2680919    -.222997
     estonia |   -.869435   .0417246   -20.84   0.000    -.9512136   -.7876564
     village |   .0524684   .0184945     2.84   0.005     .0162199     .088717
        town |   .0338535   .0182975     1.85   0.064     -.002009    .0697159
     selfemp |  -.0974318   .0293755    -3.32   0.001    -.1550067   -.0398569
       marrd |  -.1534421    .015842    -9.69   0.000    -.1844918   -.1223923
       educ2 |  -.1429612   .0090999   -15.71   0.000    -.1607966   -.1251257
       unemp |   .6080104   .0313593    19.39   0.000     .5465472    .6694736
      manual |   .0821374   .0193907     4.24   0.000     .0441323    .1201426
-------------+----------------------------------------------------------------
       /cut1 |  -.6563796    .078922                     -.8110638   -.5016953
       /cut2 |   .3095595   .0788323                       .155051     .464068
------------------------------------------------------------------------------

The summary output shows the number of observations, the log likelihood and the likelihood-ratio chi-squared test. The pseudo R2 is exactly that, a pseudo measure of fit, and we may cover it in the lectures later. It is rarely very high in ordered probit.

Ordered probit regression                         Number of obs   =      25751
                                                  LR chi2(13)     =    4392.02
                                                  Prob > chi2     =     0.0000
Log likelihood = -25990.573                       Pseudo R2       =     0.0779

Remember: the lower the dependent variable (persit5yr), the better the person has done (1 for improved, 3 for got worse).

So a negative coefficient indicates that, as that variable increases, the person tends to have been doing better.

OK. The self-employed have been doing better, as have people in Estonia (?).

Those in countries with a good rule of law have done better, and those in richer countries too (lgnipc: log gross national income per capita).

------------------------------------------------------------------------------
   persit5yr |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      lgnipc |  -.0766027   .0209432    -3.66   0.000    -.1176506   -.0355548
        male |  -.0249916   .0147207    -1.70   0.090    -.0538437    .0038604
         age |   .0513208   .0025145    20.41   0.000     .0463924    .0562492
       agesq |  -.0322755   .0025142   -12.84   0.000    -.0372033   -.0273478
        rlaw |  -.2455444    .011504   -21.34   0.000    -.2680919    -.222997
     estonia |   -.869435   .0417246   -20.84   0.000    -.9512136   -.7876564
     village |   .0524684   .0184945     2.84   0.005     .0162199     .088717
        town |   .0338535   .0182975     1.85   0.064     -.002009    .0697159
     selfemp |  -.0974318   .0293755    -3.32   0.001    -.1550067   -.0398569

Married people and educated people have been doing better, but the unemployed and manual workers worse.

       marrd |  -.1534421    .015842    -9.69   0.000    -.1844918   -.1223923
       educ2 |  -.1429612   .0090999   -15.71   0.000    -.1607966   -.1251257
       unemp |   .6080104   .0313593    19.39   0.000     .5465472    .6694736
      manual |   .0821374   .0193907     4.24   0.000     .0441323    .1201426

Impact of age

         age |   .0513208   .0025145    20.41   0.000     .0463924    .0562492
       agesq |  -.0322755   .0025142   -12.84   0.000    -.0372033   -.0273478

The impact of age is thus 0.0513*AGE - 0.0322*AGE*AGE/100

The second term is 0.0322*AGE*AGE/100 because this is how age squared was calculated (agesq = AGE*AGE/100).

So the impact is:

AGE    IMPACT
25     1.0812
40     1.5368
55     1.8474
70     2.0132

As people get older the probability of things getting worse increases. WHY?
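The age table can be reproduced with a few lines of arithmetic. This is a sketch using the rounded coefficients quoted on the slide (0.0513 and 0.0322); the scalar names are illustrative, and the turning-point calculation is an addition that follows from the same quadratic.

```stata
* Coefficients rounded as on the slide; agesq was defined as age^2/100
scalar b_age   = 0.0513
scalar b_agesq = -0.0322

* Contribution of age to the latent index at selected ages
foreach a of numlist 25 40 55 70 {
    display "age " `a' "   impact " %6.4f (b_age*`a' + b_agesq*`a'^2/100)
}

* Age at which the quadratic peaks: set b_age + 2*b_agesq*age/100 equal to zero
scalar peak_age = -b_age / (2*b_agesq/100)
display "turning point at age " %4.1f peak_age
```

Up to rounding this matches the table above. On these rounded coefficients the quadratic peaks at just under 80, so the index rises over most of the 18 to 97 age range used in the estimation, though at a decreasing rate.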

And finally

These are the estimates of μ1 and μ2

       /cut1 |  -.6563796    .078922                     -.8110638   -.5016953
       /cut2 |   .3095595   .0788323                       .155051     .464068

• If the predicted value from the regression for an individual is less than -0.6564, then they would be predicted to be categorised as ‘1’: position improved.

• If the predicted value is greater than 0.3096, then they would be predicted to be categorised as ‘3’: position has got worse.

• And if the predicted value lies between these two values, then the predicted category is ‘2’: no change.

Let us calculate some examples. First do the regression and store the coefficient vector as cy

oprobit persi lgnipc male age agesq rlaw estonia village town selfemp marrd educ2 unemp manual if age<98 & age>17 & persi<4

matrix cy= e(b)

cy[1,1] is the coefficient on lgnipc. The average value of lgnipc is 3.0.

• Then calculate:

scalar py50 = cy[1,1]*3.0 + cy[1,2]*1 + cy[1,3]*50 + cy[1,4]*50*50/100 + cy[1,5]*5 + cy[1,6]*0 + cy[1,7]*1 + cy[1,8]*0 + cy[1,9]*0 + cy[1,10]*1 + cy[1,11]*4 + cy[1,12]*0 + cy[1,13]*0

cy[1,2] is the coefficient on male. Let us code this as 1, i.e. we are predicting for a man.

• The other characteristics are: 50 years old, living in a country with the highest level of rule of law (5), etc. Reading the remaining multipliers in the py50 formula against the variable order, this person is not in Estonia, lives in a village, is not self-employed, is married, has education level 4, and is neither unemployed nor a manual worker.

. display py50
-.39618871

This lies between the two critical values, -0.6564 and 0.3096, and hence this person would be predicted to be ‘no change’.

Now let us try the same person, but aged 30.

scalar py30 =cy[1,1]*3.0 + cy[1,2]*1 + cy[1,3]* 30 + cy[1,4]* 30*30/100 + cy[1,5]*5+ cy[1,6]*0 + cy[1,7]*1 + cy[1,8]*0 + cy[1,9]*0 + cy[1,10]*1 + cy[1,11]*4 + cy[1,12]*0 + cy[1,13]*0

. display py30
-.90619611

This is less than the lower critical value of -0.6564, hence this person would be predicted to have improved.
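The comparison with the cut points can be taken one step further by turning each linear prediction into the three category probabilities. This is a sketch under the standard ordered probit formulas, with the cut points and predictions typed in from the output above; normal() is Stata's standard normal distribution function, and the scalar names p1, p2 and p3 are illustrative.

```stata
* Cut points and linear predictions copied from the output above
scalar mu1  = -.6563796
scalar mu2  =  .3095595
scalar py50 = -.39618871
scalar py30 = -.90619611

* Probability of each category for the 50-year-old and the 30-year-old
foreach s in py50 py30 {
    scalar p1 = normal(scalar(mu1) - scalar(`s'))
    scalar p2 = normal(scalar(mu2) - scalar(`s')) - normal(scalar(mu1) - scalar(`s'))
    scalar p3 = 1 - normal(scalar(mu2) - scalar(`s'))
    display "`s':  P(improved) = " %5.3f p1 "  P(same) = " %5.3f p2 "  P(worse) = " %5.3f p3
}
```

For the 30-year-old the probability of ‘improved’ comes out at roughly 0.60. For the 50-year-old the three probabilities are much closer together (roughly 0.40, 0.36 and 0.24), so classifying by where the index falls relative to the cut points is best read as a convenient simplification rather than as picking the most probable category.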

• No one has ever analysed this before, and there may be a paper in it.

• That people’s situation gets worse as they age is not surprising once they reach, say, 50. But these results suggest it is so for those aged 30 vis-à-vis 20, just as much as for those aged 60 vis-à-vis 50.

• Perhaps we should try a spline on age, just to check that the quadratic form is not misleading.

• And why do educated people fare better?
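The spline check mentioned above could be set up with mkspline. The sketch below uses a handful of made-up ages so that the construction can be inspected directly; the knots at 35 and 55 and the names age_1, age_2 and age_3 are purely hypothetical choices.

```stata
* Toy data only: six made-up ages, so the spline pieces can be inspected directly
clear
input age
20
30
45
50
60
75
end

* Linear spline in age with (hypothetical) knots at 35 and 55
mkspline age_1 35 age_2 55 age_3 = age
list age age_1 age_2 age_3

* In the real data the three pieces would replace age and agesq, for example:
* oprobit persi lgnipc male age_1 age_2 age_3 rlaw estonia village town selfemp marrd educ2 unemp manual if age<98 & age>17 & persi<4
```

Each piece then has its own slope, so the estimates would show directly whether the effect of ageing differs below 35, between 35 and 55, and above 55.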

Ordered Logit ‘by hand’

program myologit
    args lnf xb a1 a2
    quietly replace `lnf' = ln(1/(1+exp(-`a1' + `xb'))) if $ML_y1 == 1
    quietly replace `lnf' = ln(1/(1+exp(-`a2' + `xb')) - 1/(1+exp(-`a1' + `xb'))) if $ML_y1 == 2
    quietly replace `lnf' = ln(1 - 1/(1+exp(-`a2' + `xb'))) if $ML_y1 == 3
end
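Written as a formula, the three replace statements in the evaluator correspond to the log-likelihood contributions of an ordered logit. The notation below is a sketch: $\Lambda$ is the logistic distribution function, $x_i'\beta$ stands for the xb argument, and $a_1$, $a_2$ are the two cut-offs.

```latex
\Lambda(z) = \frac{1}{1+e^{-z}}, \qquad
\ln L_i =
\begin{cases}
\ln \Lambda(a_1 - x_i'\beta) & \text{if } y_i = 1,\\[4pt]
\ln\!\left[\Lambda(a_2 - x_i'\beta) - \Lambda(a_1 - x_i'\beta)\right] & \text{if } y_i = 2,\\[4pt]
\ln\!\left[1 - \Lambda(a_2 - x_i'\beta)\right] & \text{if } y_i = 3.
\end{cases}
```

Replacing $\Lambda$ with the standard normal distribution function $\Phi$ would give the ordered probit instead.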

* specify the method (lf) and the name of your evaluator (myologit)
* followed by the equation(s) in parentheses and then the cutpoints.
ml model lf myologit (xb: insure = age male nonwhite ) /a1 /a2
ml check
ml search
ml maximize, iterate(50)
ologit insure age male nonwhite
oprobit insure age male nonwhite

convergence not achieved

                                                  Number of obs   =        615
                                                  Wald chi2(3)    =      15.91
Log likelihood = -547.75513                       Prob > chi2     =     0.0012

------------------------------------------------------------------------------
      insure |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
xb           |
         age |  -.0087368   .0055974    -1.56   0.119    -.0197076    .0022339
        male |   .5056461   .1826912     2.77   0.006      .147578    .8637142
    nonwhite |   .5615129   .1958493     2.87   0.004     .1776553    .9453705
       _cons |   3.866424   .2924209    13.22   0.000      3.29329    4.439558
-------------+----------------------------------------------------------------
a1           |
       _cons |   3.622825   .1567336    23.11   0.000     3.315632    3.930017
-------------+----------------------------------------------------------------
a2           |
       _cons |    6.30395          .        .       .            .           .
------------------------------------------------------------------------------
Warning: convergence not achieved

Does not converge, and no standard error is reported for the second cut-off point. But the slope coefficients are the same as when we use the ologit command:

. ologit insure age male nonwhite

Iteration 0:   log likelihood = -555.85446
Iteration 1:   log likelihood = -547.76723
Iteration 2:   log likelihood = -547.75513
Iteration 3:   log likelihood = -547.75513

Ordered logistic regression                       Number of obs   =        615
                                                  LR chi2(3)      =      16.20
                                                  Prob > chi2     =     0.0010
Log likelihood = -547.75513                       Pseudo R2       =     0.0146

------------------------------------------------------------------------------
      insure |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |  -.0087368   .0055974    -1.56   0.119    -.0197076    .0022339
        male |   .5056461   .1826912     2.77   0.006      .147578    .8637142
    nonwhite |   .5615129   .1958493     2.87   0.004     .1776553    .9453705
-------------+----------------------------------------------------------------
       /cut1 |  -.2435994   .2619071                     -.7569278    .2697289
       /cut2 |   2.437526   .2924209                      1.864391     3.01066
------------------------------------------------------------------------------

See also: http://www.ats.ucla.edu/stat/stata/code/ml_maximize.htm
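A likely reason for the convergence warning is that the xb equation in the hand-coded model includes a constant, which cannot be identified separately from the two cut-offs. The numbers are consistent with this: subtracting the reported xb constant from a1 and a2 gives the ologit cut points. The sketch below does that arithmetic and then re-specifies the model with the noconstant option on the xb equation; that the re-specified run then converges to the ologit estimates is an expectation, not a result from the slides.

```stata
* The hand-coded run reported xb:_cons = 3.866424, a1 = 3.622825 and a2 = 6.30395.
* Subtracting the constant from each cut-off should recover ologit's /cut1 and /cut2.
scalar cut1_implied = 3.622825 - 3.866424
scalar cut2_implied = 6.30395  - 3.866424
display "implied cut1 = " %9.6f cut1_implied "   implied cut2 = " %9.6f cut2_implied

* Sketch of a re-specification that drops the unidentified constant
use http://www.stata-press.com/data/r11/sysdsn1.dta, clear
capture program drop myologit
program myologit
    args lnf xb a1 a2
    quietly replace `lnf' = ln(1/(1+exp(-`a1' + `xb'))) if $ML_y1 == 1
    quietly replace `lnf' = ln(1/(1+exp(-`a2' + `xb')) - 1/(1+exp(-`a1' + `xb'))) if $ML_y1 == 2
    quietly replace `lnf' = ln(1 - 1/(1+exp(-`a2' + `xb'))) if $ML_y1 == 3
end
ml model lf myologit (xb: insure = age male nonwhite, noconstant) /a1 /a2
ml search
ml maximize
```

The implied values, about -.2436 and 2.4375, match /cut1 and /cut2 in the ologit output above.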

use http://www.stata-press.com/data/r11/sysdsn1.dta

mlogit insure age male nonwhite

ologit persi lgnipc male age agesq rlaw estonia village town selfemp marrd educ2 unemp manual if age<98 & age>17 & persi<4

program myologit
    args lnf xb a1 a2
    * The contribution to the likelihood at each level of y
    quietly replace `lnf' = ln(1/(1+exp(-`a1' + `xb'))) if $ML_y1 == 1
    quietly replace `lnf' = ln(1/(1+exp(-`a2' + `xb')) - 1/(1+exp(-`a1' + `xb'))) if $ML_y1 == 2
    quietly replace `lnf' = ln(1 - 1/(1+exp(-`a2' + `xb'))) if $ML_y1 == 3
end

ologit insure age male nonwhite
oprobit insure age male nonwhite
ml model lf myologit (xb: insure = age male nonwhite ) /a1 /a2
ml check
ml search
ml maximize