# multinomial logit & ordered probit

Posted on 07-Feb-2016


## Multinomial Logit

Multinomial logit is used when the outcome categories cannot be ordered. An example is the choice of holiday: (i) beach, (ii) mountain, (iii) culture. Each individual goes on just one holiday.

We will examine this within the context of insurance data. The exact meaning does not matter; just treat it like the holiday data. Load the data:

```stata
use http://www.stata-press.com/data/r11/sysdsn1.dta, clear
```

But for a clue, type:

```stata
describe
summarize *ins*
label list insure
```

There are three options: those who are covered by an indemnity, those who prepay and those who are not insured.

```stata
* indicator (dummy) variables for the three sites
generate site1 = site==1
generate site2 = site==2
generate site3 = site==3
```

Now type the following. site1 is left out, so the first site is the reference category:

```stata
mlogit insure age male nonwhite site2 site3
```

Note that there are two equations: one to explain those who opt for prepaid and a second for those who are uninsured.

But there are three choices, so why only two equations? If you know the determinants of two of the choices, the third follows by default, since the three probabilities must sum to one. It can also be viewed as the base (default) choice against which the other two are compared. Here the base case is the first, indemnity (by default, mlogit uses the most frequent outcome as the base). Could we change it? Yes:

```stata
mlogit insure age male nonwhite site2 site3, base(2)
```

This makes the second outcome, prepaid, the base case instead.
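The logic of the base category can be checked with a small Python sketch. It uses Python rather than Stata only to show the arithmetic, and the index values are hypothetical, not estimates from sysdsn1. With the base outcome's index fixed at zero, two indices determine all three probabilities. Moving the base to another outcome just subtracts that outcome's index from every outcome and leaves the probabilities unchanged.

```python
import math

def mnl_probs(indices):
    """Multinomial logit probabilities from one linear index x*b per outcome.

    The base outcome's coefficients are normalised to zero, so its index is 0.
    """
    m = max(indices)                          # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in indices]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical indices for (indemnity, prepaid, uninsured), indemnity as base
base1 = [0.0, 0.4, -1.2]
p1 = mnl_probs(base1)

# Switching the base to outcome 2 (prepaid): subtract prepaid's index from every
# outcome, i.e. new coefficients = old coefficients minus prepaid's coefficients
base2 = [v - base1[1] for v in base1]
p2 = mnl_probs(base2)

print([round(p, 4) for p in p1], round(sum(p1), 12))
print(all(abs(a - b) < 1e-12 for a, b in zip(p1, p2)))  # True: base choice does not matter
```

Only the reported coefficients change when the base changes. The fitted probabilities stay the same.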

The data also come from the same file. A simpler specification, without the site dummies, is:

```stata
use http://www.stata-press.com/data/r11/sysdsn1.dta, clear
mlogit insure age male nonwhite
```

Clear, set memory and load the data:

```stata
clear
set mem 100000
use "http://staff.bath.ac.uk/hssjrh/oprob.dta"
```

```stata
describe pers
```

The variable relates to a person's situation and how it has changed over the last five years. Let us look at it. Type:

```stata
tabulate pers
```

The most common response was "improved", but for over half of the sample this was not the case.

## Ordered probit

We use this when the data are discrete and ordered. In this case:

- 1 best (improved)
- 2 next best (stayed about the same)
- 3 worst (got worse)

The ordering is clear.

### Change in personal situation

Assume an underlying, continuous (latent) variable relating to changes in the individual's personal situation.

If this underlying variable is to the left of (below) the first cut-point $\kappa_1$, we classify the variable as 1: the individual's position has improved.

If it is to the right of (above) the second cut-point $\kappa_2$, we classify the variable as 3: the individual's position has got worse.

In between these two values we classify the variable as 2: the individual's position has stayed the same.

You might say: surely "stayed the same" is one specific value (perhaps 0), anything to the left of it has improved and anything to the right has got worse. But it is common to allow a range of values representing too small a change to count as either "improved" or "got worse", and this range lies between $\kappa_1$ and $\kappa_2$.
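Written out, the classification rule and the probabilities it implies look like this. The block assumes, as ordered probit does, a standard normal error term. $x\beta$ is the linear index without a constant, because the cut-points absorb it, and $\Phi$ is the standard normal CDF.

$$
\begin{aligned}
y^* &= x\beta + \varepsilon, \qquad \varepsilon \sim N(0,1) \\
y &= \begin{cases} 1 & \text{if } y^* \le \kappa_1 \\ 2 & \text{if } \kappa_1 < y^* \le \kappa_2 \\ 3 & \text{if } y^* > \kappa_2 \end{cases} \\
\Pr(y=1) &= \Phi(\kappa_1 - x\beta) \\
\Pr(y=2) &= \Phi(\kappa_2 - x\beta) - \Phi(\kappa_1 - x\beta) \\
\Pr(y=3) &= 1 - \Phi(\kappa_2 - x\beta)
\end{aligned}
$$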

Do the estimation: simply use oprobit rather than regress.

```stata
oprobit persi lgnipc male age agesq rlaw estonia village town selfemp marrd educ2 unemp manual if age17 & persi
```

### The results

The summary output shows the number of observations, the log likelihood and the likelihood-ratio (LR) chi-squared statistic. The pseudo R² is exactly that, a pseudo measure of fit, and we may cover it later in the lectures. It is rarely very high in ordered probit.
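Both fit statistics are simple functions of log likelihoods. The sketch below assumes the pseudo R² is McFadden's measure, which is what Stata reports for oprobit. That measure compares the fitted model with a model containing only the cut-points. The log-likelihood values are hypothetical.

```python
def mcfadden_pseudo_r2(ll_model, ll_null):
    """1 - lnL(model) / lnL(model with only the cut-points, no regressors)."""
    return 1.0 - ll_model / ll_null

def lr_chi2(ll_model, ll_null):
    """Likelihood-ratio statistic: 2 * (lnL(model) - lnL(cut-points only))."""
    return 2.0 * (ll_model - ll_null)

# Hypothetical log likelihoods, NOT the values from the persi regression
ll_null, ll_model = -1000.0, -950.0
print(round(mcfadden_pseudo_r2(ll_model, ll_null), 4))  # 0.05
print(lr_chi2(ll_model, ll_null))                        # 100.0
```

A fairly large LR statistic can sit alongside a small pseudo R², which is typical of these models.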

Remember: the lower the dependent variable (persi), the better the person has done (1 for improved, 3 for got worse).

So a negative coefficient indicates that as that variable increases, the person tends to have been doing better.
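The sign rule holds unambiguously for the two extreme categories, but not for the middle one. A Python sketch of the ordered-probit marginal effects makes this explicit. It uses the cut-points quoted later in the text (-0.6564 and 0.3096). The index xb and the coefficient value are hypothetical.

```python
import math

def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def oprobit_marginal_effects(xb, beta_k, cut1, cut2):
    """dPr(y=j)/dx_k for a three-category ordered probit.

    Pr(y=1) = Phi(cut1 - xb) and Pr(y=3) = 1 - Phi(cut2 - xb), so
    dPr(1)/dx = -phi(cut1 - xb) * beta_k and dPr(3)/dx = phi(cut2 - xb) * beta_k;
    dPr(2)/dx is whatever keeps the three effects summing to zero.
    """
    d1 = -normal_pdf(cut1 - xb) * beta_k
    d3 = normal_pdf(cut2 - xb) * beta_k
    d2 = -(d1 + d3)
    return d1, d2, d3

# Cut-points from the text; xb and beta_k are hypothetical
d1, d2, d3 = oprobit_marginal_effects(xb=0.0, beta_k=-0.2, cut1=-0.6564, cut2=0.3096)
print(d1 > 0, d3 < 0)  # negative beta: "improved" more likely, "got worse" less likely
```

The sign of the middle effect depends on where xb lies relative to the two cut-points, not just on the sign of the coefficient.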

OK: the self-employed have been doing better, as have people in Estonia (a surprising result?).

Those in countries with a good rule of law have done better, and so have those in richer countries (lgnipc: log gross national income per capita).

Married people and educated people have been doing better, but the unemployed and manual workers have done worse.

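### Impact of age

The impact of age is thus 0.0513*AGE - 0.0322*AGE*AGE/100.

The squared term is divided by 100 because this is how age squared was calculated (agesq = age*age/100).

So the impact is:

| AGE | IMPACT |
|-----|--------|
| 25  | 1.0812 |
| 40  | 1.5368 |
| 55  | 1.8474 |
| 70  | 2.0132 |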
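The impact figures can be reproduced from the two age coefficients, remembering that agesq is age*age/100. The same arithmetic puts the turning point of the quadratic at roughly age 80, so up to that age the estimated impact keeps rising. A minimal Python check:

```python
B_AGE = 0.0513     # coefficient on age, as quoted in the text
B_AGESQ = -0.0322  # coefficient on agesq, where agesq = age*age/100

def age_impact(age):
    """Contribution of the age terms to the ordered-probit index."""
    return B_AGE * age + B_AGESQ * age * age / 100.0

for age in (25, 40, 55, 70):
    print(age, round(age_impact(age), 4))

# The quadratic peaks where the derivative B_AGE + 2*B_AGESQ*age/100 equals zero
turning_point = -B_AGE / (2.0 * B_AGESQ / 100.0)
print(round(turning_point, 1))
```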

As people get older, the probability of things getting worse increases. Why?

### And finally

These are the estimates of the cut-points $\kappa_1$ and $\kappa_2$ (reported by Stata as cut1 and cut2): -0.6564 and 0.3096.

If the predicted value of the index from the regression (xb) for an individual is less than -0.6564, they would be predicted to be in category 1: position improved.

If it is greater than 0.3096, they would be predicted to be in category 3: position has got worse.

And if the predicted value lies between these two values, the prediction is no change (category 2).
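The rule above compares the predicted index with the cut-points. A Python sketch of that rule alongside the ordered-probit probabilities shows that it need not pick the category with the highest predicted probability. For a hypothetical index of -0.6, just inside the middle band, the rule predicts no change while category 1 is the most probable. In Stata, the probabilities themselves are available after oprobit through predict with the pr option.

```python
import math

CUT1, CUT2 = -0.6564, 0.3096  # cut-point estimates quoted in the text

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def classify_by_index(xb):
    """Rule used in the text: compare the predicted index with the cut-points."""
    if xb < CUT1:
        return 1  # position improved
    if xb > CUT2:
        return 3  # position got worse
    return 2      # no change

def category_probs(xb):
    """Ordered-probit probabilities for categories 1, 2 and 3."""
    p1 = normal_cdf(CUT1 - xb)
    p2 = normal_cdf(CUT2 - xb) - p1
    p3 = 1.0 - normal_cdf(CUT2 - xb)
    return p1, p2, p3

xb = -0.6  # hypothetical index just inside the middle band
probs = category_probs(xb)
modal = 1 + max(range(3), key=lambda j: probs[j])  # category with highest probability
print(classify_by_index(xb), modal, [round(p, 3) for p in probs])
```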

Let us calculate some examples. First do the regression and store the coefficient vector as cy:

```stata
oprobit persi lgnipc male age agesq rlaw estonia village town selfemp marrd educ2 unemp manual if age17 & persi
* e(b) holds the slope coefficients (in varlist order) followed by the two cut-points
matrix cy = e(b)
```

cy[1,2] is the coefficient on male. Let us code this as 1, i.e. we are predicting for a man:

```stata
scalar py50 = cy[1,1]*3.0 + cy[1,2]*1 + cy[1,3]*50 + cy[1,4]*50*50/100 + cy[1,5]*5 + cy[1,6]*0 + cy[1,7]*1 + cy[1,8]*0 + cy[1,9]*0 + cy[1,10]*1 + cy[1,11]*4 + cy[1,12]*0 + cy[1,13]*0
display py50
```

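The other characteristics are: aged 50, log GNI per capita (lgnipc) of 3.0, living in a country with the highest level of rule of law (rlaw = 5), not in Estonia, living in a village, not self-employed, married, educ2 = 4, not unemployed and not a manual worker.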

This lies between -0.6564 and 0.3096, the two critical values, so this person would be predicted to have experienced no change.

Now let us try the same person, but aged 30.

```stata
scalar py30 = cy[1,1]*3.0 + cy[1,2]*1 + cy[1,3]*30 + cy[1,4]*30*30/100 + cy[1,5]*5 + cy[1,6]*0 + cy[1,7]*1 + cy[1,8]*0 + cy[1,9]*0 + cy[1,10]*1 + cy[1,11]*4 + cy[1,12]*0 + cy[1,13]*0
display py30
```

This is less than the lower critical value of -0.6564, so this person would be predicted to have improved.
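The two individuals differ only in age, so every other term of the two scalars cancels. That leaves py50 - py30 depending only on the age coefficients quoted earlier (0.0513 on age and -0.0322 on agesq). A Python check of that arithmetic shows the index is about 0.51 higher at 50 than at 30. Combined with the two classifications above, this also implies py50 is below about -0.146.

```python
B_AGE = 0.0513     # coefficient on age (cy[1,3]) as quoted in the text
B_AGESQ = -0.0322  # coefficient on agesq (cy[1,4]); agesq = age*age/100
CUT1 = -0.6564     # lower cut-point quoted in the text

def age_part(age):
    """Age terms of the index, mirroring cy[1,3]*age + cy[1,4]*age*age/100."""
    return B_AGE * age + B_AGESQ * age * age / 100.0

# All other characteristics are identical, so they cancel in the difference
diff = age_part(50) - age_part(30)
print(round(diff, 4))

# py30 < CUT1 and py30 = py50 - diff together imply py50 < CUT1 + diff
print(round(CUT1 + diff, 4))
```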

No one has analysed this before, so there may be a paper in it. That people's situation gets worse as they age is not surprising once they reach, say, 50. But these results suggest it is also so for those aged 30 vis-à-vis 20, not only for those aged 60 vis-à-vis 50. Perhaps we should try a spline on age, just to check that the quadratic form is not misleading. And why do educated people fare better?
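One way to run the spline check is to use a piecewise-linear spline in age in place of age and agesq. In Stata, mkspline does this, for example `mkspline age1 30 age2 50 age3 = age`. The knots at 30 and 50 are chosen here purely for illustration. The Python sketch below mirrors mkspline's default coding so the construction is visible. Each term is the part of age falling in its segment, the terms add back up to age, and each coefficient would then be the slope within its segment.

```python
def linear_spline(age, knots=(30, 50)):
    """Piecewise-linear spline terms for age (hypothetical knots at 30 and 50).

    Follows the default (non-marginal) coding of Stata's mkspline:
    term 1 = min(age, k1), term 2 = the part of age between k1 and k2,
    term 3 = the part above k2, so the three terms sum to age.
    """
    k1, k2 = knots
    seg1 = min(age, k1)
    seg2 = max(min(age, k2) - k1, 0)
    seg3 = max(age - k2, 0)
    return seg1, seg2, seg3

for a in (20, 40, 65):
    print(a, linear_spline(a))
```

If the quadratic is a fair summary, the three segment slopes should all be positive and decline gently with age.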

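## Ordered Logit by hand

```stata
* lf evaluator for a three-category ordered logit:
* lnf is each observation's log likelihood, xb the linear index, a1 and a2 the cut-points
capture program drop myologit
program myologit
    args lnf xb a1 a2
    quietly replace `lnf' = ln(1/(1+exp(-`a1' + `xb'))) if $ML_y1 == 1
    quietly replace `lnf' = ln(1/(1+exp(-`a2' + `xb')) - 1/(1+exp(-`a1' + `xb'))) if $ML_y1 == 2
    quietly replace `lnf' = ln(1 - 1/(1+exp(-`a2' + `xb'))) if $ML_y1 == 3
end
```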

```stata
* specify the method (lf) and the name of your evaluator (myologit),
* followed by the equation(s) in parentheses and then the cut-points
ml model lf myologit (xb: insure = age male nonwhite) /a1 /a2
ml check
ml search
ml maximize, iterate(50)
ologit insure age male nonwhite
oprobit insure age male nonwhite
```

It does not converge, and there is no second cut-off point. But the coefficients themselves are the same as if we use the ologit command:

```stata
ologit insure age male nonwhite
```
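A likely reason for the non-convergence is that the xb equation in ml model includes a constant by default, and a constant cannot be distinguished from the cut-points. The Python transcription of the myologit evaluator below uses hypothetical data. It shows that shifting a constant in xb and both cut-points by the same amount leaves the log likelihood unchanged. This is why ologit reports cut-points but no constant. Adding the noconstant option to the xb equation, (xb: insure = age male nonwhite, noconstant), should remove the problem, though this is untested.

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def ologit_lnf(y, xb, a1, a2):
    """Per-observation log likelihood, mirroring the myologit evaluator."""
    if y == 1:
        return math.log(logistic(a1 - xb))
    if y == 2:
        return math.log(logistic(a2 - xb) - logistic(a1 - xb))
    if y == 3:
        return math.log(1.0 - logistic(a2 - xb))
    raise ValueError("y must be 1, 2 or 3")

# Hypothetical data: (outcome, slope part of the index x*b)
data = [(1, -0.3), (2, 0.1), (3, 0.8), (2, -0.5)]
a1, a2 = -0.5, 1.0

def total_lnl(const, cut1, cut2):
    """Sample log likelihood with a constant added to every index."""
    return sum(ologit_lnf(y, xb + const, cut1, cut2) for y, xb in data)

# Shifting the constant and both cut-points by the same amount changes nothing,
# so the three cannot be estimated separately
print(total_lnl(0.0, a1, a2), total_lnl(0.7, a1 + 0.7, a2 + 0.7))
```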

See also: http://www.ats.ucla.edu/stat/stata/code/ml_maximize.htm
