1 ordered probit models. 2 ordered probit many discrete outcomes are to questions that have a...

1

Ordered probit models

2

Ordered Probit

• Many discrete outcomes are to questions that have a natural ordering but no quantitative interpretation:

• Examples:– Self reported health status

• (excellent, very good, good, fair, poor)

– Do you agree with the following statement• Strongly agree, agree, disagree, strongly

disagree

3

• Can use the same type of model as in the previous section to analyze these outcomes

• Another ‘latent variable’ model

• Key to the model: there is a monotonic ordering of the qualitative responses

4

Self reported health status

• Excellent, very good, good, fair, poor• Coded as 1, 2, 3, 4, 5 on National Health

Interview Survey• We will code as 5,4,3,2,1 (easier to

think of this way)• Asked on every major health survey• Important predictor of health outcomes,

e.g. mortality• Key question: what predicts health

status?

5

• Important to note – the numbers 1-5 mean nothing in terms of their value, just an ordering to show you the lowest to highest

• The example below is easily adapted to include categorical variables with any number of outcomes

6

Model

• yi* = latent index of reported health

• The latent index measures your own scale of health. Once yi* crosses a certain value you report poor, then good, then very good, then excellent health

7

• yi = (1,2,3,4,5) for (fair, poor, VG, G, excel)

• Interval decision rule

• yi=1 if yi* ≤ u1

• yi=2 if u1 < yi* ≤ u2

• yi=3 if u2 < yi* ≤ u3

• yi=4 if u3 < yi* ≤ u4

• yi=5 if yi* > u4

8

• As with logit and probit models, we will assume yi

* is a function of observed and unobserved variables

• yi* = β0 + x1i β1 + x2i β2 …. xki βk + εi

• yi* = xi β + εi

9

• The threshold values (u1, u2, u3, u4) are unknown. We do not know the value of the index necessary to push you from very good to excellent.

• In theory, the threshold values are different for everyone

• Computer will not only estimate the β’s, but also the thresholds – average across people

10

• As with probit and logit, the model will be determined by the assumed distribution of ε

• In practice, most people pick nornal, generating an ‘ordered probit’ (I have no idea why)

• We will generate the math for the probit version

11

Probabilities

• Lets do the outliers, Pr(yi=1) and Pr(yi=5) first

• Pr(yi=1) • = Pr(yi* ≤ u1) • = Pr(xi β +εi ≤ u1 ) • =Pr(εi ≤ u1 - xi β) • = Φ[u1 - xi β] = 1- Φ[xi β – u1]

12

• Pr(yi=5)

• = Pr(yi* > u4)

• = Pr(xi β +εi > u4 )

• =Pr(εi > u4 - xi β)

• = 1 - Φ[u4 - xi β] = Φ[xi β – u4]

13

Sample one for y=3

• Pr(yi=3) = Pr(u2 < yi* ≤ u3)

= Pr(yi* ≤ u3) – Pr(yi* ≤ u2)

= Pr(xi β +εi ≤ u3) – Pr(xi β +εi ≤ u2)

= Pr(εi ≤ u3- xi β) - Pr(εi ≤ u2 - xi β)

= Φ[u3- xi β] - Φ[u2 - xi β]

= 1 - Φ[xi β - u3] – 1 + Φ[xi β - u2]

= Φ[xi β - u2] - Φ[xi β - u3]

14

Summary

• Pr(yi=1) = 1- Φ[xi β – u1]

• Pr(yi=2) = Φ[xi β – u1] - Φ[xi β – u2]

• Pr(yi=3) = Φ[xi β – u2] - Φ[xi β – u3]

• Pr(yi=4) = Φ[xi β – u3] - Φ[xi β – u4]

• Pr(yi=5) = Φ[xi β – u4]

15

Likelihood function

• There are 5 possible choices for each person

• Only 1 is observed

• L = Σi ln[Pr(yi=k)] for k

16

Programming example

• Cancer control supplement to 1994 National Health Interview Survey

• Question: what observed characteristics predict self reported health (1-5 scale)

• 1=poor, 5=excellent• Key covariates: income, education, age,

current and former smoking status• Programs

• sr_health_status.do, .dta, .log

17

• desc;

• male byte %9.0g =1 if male• age byte %9.0g age in years• educ byte %9.0g years of education• smoke byte %9.0g current smoker• smoke5 byte %9.0g smoked in past 5 years• black float %9.0g =1 if respondent is black• othrace float %9.0g =1 if other race (white is ref)• sr_health float %9.0g 1-5 self reported health,• 5=excel, 1=poor• famincl float %9.0g log family income

18

• tab sr_health;

• 1-5 self |• reported |• health, |• 5=excel, |• 1=poor | Freq. Percent Cum.• ------------+-----------------------------------• 1 | 342 2.65 2.65• 2 | 991 7.68 10.33• 3 | 3,068 23.78 34.12• 4 | 3,855 29.88 64.00• 5 | 4,644 36.00 100.00• ------------+-----------------------------------• Total | 12,900 100.00

19

In STATA

• oprobit sr_health male age educ famincl black othrace smoke smoke5;

20

• Ordered probit estimates Number of obs = 12900• LR chi2(8) = 2379.61• Prob > chi2 = 0.0000• Log likelihood = -16401.987 Pseudo R2 = 0.0676

• ------------------------------------------------------------------------------• sr_health | Coef. Std. Err. z P>|z| [95% Conf. Interval]• -------------+----------------------------------------------------------------• male | .1281241 .0195747 6.55 0.000 .0897583 .1664899• age | -.0202308 .0008499 -23.80 0.000 -.0218966 -.018565• educ | .0827086 .0038547 21.46 0.000 .0751535 .0902637• famincl | .2398957 .0112206 21.38 0.000 .2179037 .2618878• black | -.221508 .029528 -7.50 0.000 -.2793818 -.1636341• othrace | -.2425083 .0480047 -5.05 0.000 -.3365958 -.1484208• smoke | -.2086096 .0219779 -9.49 0.000 -.2516855 -.1655337• smoke5 | -.1529619 .0357995 -4.27 0.000 -.2231277 -.0827961• -------------+----------------------------------------------------------------• _cut1 | .4858634 .113179 (Ancillary parameters)• _cut2 | 1.269036 .11282 • _cut3 | 2.247251 .1138171 • _cut4 | 3.094606 .1145781 • ------------------------------------------------------------------------------

21

Interpret coefficients

• Marginal effects/changes in probabilities are now a function of 2 things– Point of expansion (x’s)– Frame of reference for outcome (y)

• STATA– Picks mean values for x’s– You pick the value of y

22

Continuous x’s

• Consider y=5

• d Pr(yi=5)/dxi

= d Φ[xi β – u4]/dxi = βφ[xi β – u4]

• Consider y=3

• d Pr(yi=3)/dxi = βφ[xi β – u3] - βφ[xi β – u4]

23

Discrete X’s

• xi β = β0 + x1i β1 + x2i β2 …. xki βk

– X2i is yes or no (1 or 0)

• ΔPr(yi=5) =

• Φ[β0 + x1i β1 + β2 + x3i β3 +.. xki βk]

- Φ[β0 + x1i β1 + x3i β3 …. xki βk]

• Change in the probabilities when x2i=1 and x2i=0

24

Ask for marginal effects

• mfx compute, predict(outcome(5));

25

• mfx compute, predict(outcome(5));

• Marginal effects after oprobit• y = Pr(sr_health==5) (predict, outcome(5))• = .34103717• ------------------------------------------------------------------------------• variable | dy/dx Std. Err. z P>|z| [ 95% C.I. ] X• ---------+--------------------------------------------------------------------• male*| .0471251 .00722 6.53 0.000 .03298 .06127 .438062• age | -.0074214 .00031 -23.77 0.000 -.008033 -.00681 39.8412• educ | .0303405 .00142 21.42 0.000 .027565 .033116 13.2402• famincl | .0880025 .00412 21.37 0.000 .07993 .096075 10.2131• black*| -.0781411 .00996 -7.84 0.000 -.097665 -.058617 .124264• othrace*| -.0843227 .01567 -5.38 0.000 -.115043 -.053602 .04124• smoke*| -.0749785 .00773 -9.71 0.000 -.09012 -.059837 .289147• smoke5*| -.0545062 .01235 -4.41 0.000 -.078719 -.030294 .081395• ------------------------------------------------------------------------------• (*) dy/dx is for discrete change of dummy variable from 0 to 1

26

Interpret the results

• Males are 4.7 percentage points more likely to report excellent

• Each year of age decreases chance of reporting excellent by 0.7 percentage points

• Current smokers are 7.5 percentage points less likely to report excellent health

27

Minor notes about estimation

• Wald tests/-2 log likelihood tests are done the exact same was as in PROBIT and LOGIT

28

• Use PRCHANGE to calculate marginal effect for a specific person

prchange, x(age=40 black=0 othrace=0 smoke=0 smoke5=0 educ=16);

– When a variable is NOT specified (famincl), STATA takes the sample mean.

29

• PRCHANGE will produce results for all outcomes

• male• Avg|Chg| 1 2 3 4• 0->1 .0203868 -.0020257 -.00886671 -.02677558 -.01329902

• 5• 0->1 .05096698

30

• age• Avg|Chg| 1 2 3 4• Min->Max .13358317 .0184785 .06797072 .17686112 .07064757• -+1/2 .00321942 .00032518 .00141642 .00424452 .00206241• -+sd/2 .03728014 .00382077 .01648743 .04910323 .0237889• MargEfct .00321947 .00032515 .00141639 .00424462 .00206252

1 ordered probit models. 2 ordered probit many discrete outcomes are to questions that have a...

Documents

pr i u

model y i

unobserved variables

interval decision rule

threshold values u

excellent health slide

probit version slide

summary pry