Post on 20-Dec-2015
1
Ordered probit models
2
Ordered Probit
• Many discrete outcomes are responses to questions whose answers have a natural ordering but no quantitative interpretation
• Examples:
– Self reported health status
  • (excellent, very good, good, fair, poor)
– Do you agree with the following statement?
  • Strongly agree, agree, disagree, strongly disagree
3
• Can use the same type of model as in the previous section to analyze these outcomes
• Another ‘latent variable’ model
• Key to the model: there is a monotonic ordering of the qualitative responses
4
Self reported health status
• Excellent, very good, good, fair, poor
• Coded as 1, 2, 3, 4, 5 on National Health Interview Survey
• We will code as 5,4,3,2,1 (easier to think of this way)
• Asked on every major health survey
• Important predictor of health outcomes, e.g. mortality
• Key question: what predicts health status?
5
• Important to note – the numbers 1-5 mean nothing in terms of their value, just an ordering to show you the lowest to highest
• The example below is easily adapted to include categorical variables with any number of outcomes
6
Model
• yi* = latent index of reported health
• The latent index measures your own scale of health. Below the lowest threshold you report poor health; each time yi* crosses the next threshold you report fair, then good, then very good, then excellent health
7
• yi = (1,2,3,4,5) for (poor, fair, good, very good, excellent)
• Interval decision rule
• yi=1 if yi* ≤ u1
• yi=2 if u1 < yi* ≤ u2
• yi=3 if u2 < yi* ≤ u3
• yi=4 if u3 < yi* ≤ u4
• yi=5 if yi* > u4
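The interval decision rule can be sketched in a few lines of Python. This is only an illustration: the thresholds are the _cut1 to _cut4 estimates from the Stata output shown later in this transcript, the latent index values are made up, and `report_category` is a hypothetical helper name.

```python
from bisect import bisect_left

# Thresholds u1..u4: the _cut1.._cut4 estimates reported later in the transcript.
CUTS = [0.4858634, 1.269036, 2.247251, 3.094606]
LABELS = {1: "poor", 2: "fair", 3: "good", 4: "very good", 5: "excellent"}

def report_category(y_star, cuts=CUTS):
    """Map a latent index y* to y in 1..5, with y = k when u_{k-1} < y* <= u_k."""
    # bisect_left counts the thresholds strictly below y*,
    # so a value exactly equal to u_k stays in category k.
    return bisect_left(cuts, y_star) + 1

# Made-up latent index values, one per category.
for y_star in (0.2, 1.0, 2.247251, 2.9, 3.5):
    y = report_category(y_star)
    print(f"y* = {y_star}: y = {y} ({LABELS[y]})")
```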
8
• As with logit and probit models, we will assume yi* is a function of observed and unobserved variables
• yi* = β0 + x1i β1 + x2i β2 + … + xki βk + εi
• yi* = xi β + εi
9
• The threshold values (u1, u2, u3, u4) are unknown. We do not know the value of the index necessary to push you from very good to excellent.
• In theory, the threshold values are different for everyone
• The computer will estimate not only the β’s but also the thresholds, which are best read as averages across people
10
• As with probit and logit, the model will be determined by the assumed distribution of ε
• In practice, most people pick normal, generating an ‘ordered probit’ (I have no idea why)
• We will generate the math for the probit version
11
Probabilities
• Let’s do the two end categories, Pr(yi=1) and Pr(yi=5), first
• Pr(yi=1)
• = Pr(yi* ≤ u1)
• = Pr(xi β +εi ≤ u1 )
• = Pr(εi ≤ u1 - xi β)
• = Φ[u1 - xi β] = 1- Φ[xi β – u1]
12
• Pr(yi=5)
• = Pr(yi* > u4)
• = Pr(xi β +εi > u4 )
• =Pr(εi > u4 - xi β)
• = 1 - Φ[u4 - xi β] = Φ[xi β – u4]
13
Sample one for y=3
• Pr(yi=3) = Pr(u2 < yi* ≤ u3)
= Pr(yi* ≤ u3) – Pr(yi* ≤ u2)
= Pr(xi β +εi ≤ u3) – Pr(xi β +εi ≤ u2)
= Pr(εi ≤ u3- xi β) - Pr(εi ≤ u2 - xi β)
= Φ[u3- xi β] - Φ[u2 - xi β]
= 1 - Φ[xi β - u3] – 1 + Φ[xi β - u2]
= Φ[xi β - u2] - Φ[xi β - u3]
14
Summary
• Pr(yi=1) = 1- Φ[xi β – u1]
• Pr(yi=2) = Φ[xi β – u1] - Φ[xi β – u2]
• Pr(yi=3) = Φ[xi β – u2] - Φ[xi β – u3]
• Pr(yi=4) = Φ[xi β – u3] - Φ[xi β – u4]
• Pr(yi=5) = Φ[xi β – u4]
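A minimal Python sketch of the five formulas above, written so that it works for any number of ordered categories. `category_probs` is a hypothetical helper, the index value xb = 2.0 is made up, and the thresholds are the _cut estimates from the Stata output later in the transcript.

```python
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal CDF

def category_probs(xb, cuts):
    """Return [Pr(y=1), ..., Pr(y=K)] for index xb and increasing thresholds."""
    # PHI(xb - u_k) is Pr(y > k). Pad with 1 (everyone is above "minus infinity")
    # and 0 (nobody is above the top category) so every category is a difference.
    upper = [1.0] + [PHI(xb - u) for u in cuts] + [0.0]
    return [upper[k] - upper[k + 1] for k in range(len(cuts) + 1)]

cuts = [0.4858634, 1.269036, 2.247251, 3.094606]  # _cut1.._cut4
probs = category_probs(2.0, cuts)                 # xb = 2.0 is hypothetical
print([round(p, 4) for p in probs], "sum =", round(sum(probs), 6))
```

Because each probability is a difference of adjacent terms, the five values always sum to one, and raising xi β moves mass toward the higher categories.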
15
Likelihood function
• There are 5 possible choices for each person
• Only 1 is observed
• ln L = Σi ln[Pr(yi=k)], where k is the outcome person i actually reported
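As a rough sketch, the log likelihood adds up, person by person, the log of the probability of the one category that person reported. The four observations below are invented for illustration and `log_likelihood` is a hypothetical helper. Stata maximizes this function over the β’s and the thresholds jointly, which is not shown here.

```python
import math
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal CDF

def log_likelihood(y, xb, cuts):
    """Sum over people of ln Pr(y_i = the category person i reported)."""
    total = 0.0
    for y_i, xb_i in zip(y, xb):
        upper = [1.0] + [PHI(xb_i - u) for u in cuts] + [0.0]
        total += math.log(upper[y_i - 1] - upper[y_i])  # Pr(y_i = k) with k = y_i
    return total

# Invented toy data: reported category (1-5) and index x_i*beta for four people.
y = [1, 3, 5, 4]
xb = [0.5, 1.8, 3.4, 2.6]
cuts = [0.4858634, 1.269036, 2.247251, 3.094606]
print(round(log_likelihood(y, xb, cuts), 4))
```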
16
Programming example
• Cancer control supplement to 1994 National Health Interview Survey
• Question: what observed characteristics predict self reported health (1-5 scale)
• 1=poor, 5=excellent
• Key covariates: income, education, age, current and former smoking status
• Programs
• sr_health_status.do, .dta, .log
17
• desc;
male        byte   %9.0g    =1 if male
age         byte   %9.0g    age in years
educ        byte   %9.0g    years of education
smoke       byte   %9.0g    current smoker
smoke5      byte   %9.0g    smoked in past 5 years
black       float  %9.0g    =1 if respondent is black
othrace     float  %9.0g    =1 if other race (white is ref)
sr_health   float  %9.0g    1-5 self reported health, 5=excel, 1=poor
famincl     float  %9.0g    log family income
18
• tab sr_health;
   1-5 self |
   reported |
    health, |
   5=excel, |
     1=poor |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |        342        2.65        2.65
          2 |        991        7.68       10.33
          3 |      3,068       23.78       34.12
          4 |      3,855       29.88       64.00
          5 |      4,644       36.00      100.00
------------+-----------------------------------
      Total |     12,900      100.00
19
In STATA
• oprobit sr_health male age educ famincl black othrace smoke smoke5;
20
Ordered probit estimates                          Number of obs   =      12900
                                                  LR chi2(8)      =    2379.61
                                                  Prob > chi2     =     0.0000
Log likelihood = -16401.987                       Pseudo R2       =     0.0676

------------------------------------------------------------------------------
   sr_health |      Coef.  Std. Err.        z   P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        male |   .1281241   .0195747     6.55   0.000     .0897583    .1664899
         age |  -.0202308   .0008499   -23.80   0.000    -.0218966    -.018565
        educ |   .0827086   .0038547    21.46   0.000     .0751535    .0902637
     famincl |   .2398957   .0112206    21.38   0.000     .2179037    .2618878
       black |   -.221508    .029528    -7.50   0.000    -.2793818   -.1636341
     othrace |  -.2425083   .0480047    -5.05   0.000    -.3365958   -.1484208
       smoke |  -.2086096   .0219779    -9.49   0.000    -.2516855   -.1655337
      smoke5 |  -.1529619   .0357995    -4.27   0.000    -.2231277   -.0827961
-------------+----------------------------------------------------------------
       _cut1 |   .4858634    .113179          (Ancillary parameters)
       _cut2 |   1.269036     .11282
       _cut3 |   2.247251   .1138171
       _cut4 |   3.094606   .1145781
------------------------------------------------------------------------------
21
Interpret coefficients
• Marginal effects/changes in probabilities are now a function of 2 things
– Point of expansion (x’s)
– Frame of reference for outcome (y)
• STATA
– Picks mean values for x’s
– You pick the value of y
22
Continuous x’s
• Consider y=5
• d Pr(yi=5)/dxi
= d Φ[xi β – u4]/dxi = βφ[xi β – u4]
• Consider y=3
• d Pr(yi=3)/dxi = d{Φ[xi β – u2] - Φ[xi β – u3]}/dxi = βφ[xi β – u2] - βφ[xi β – u3]
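Differentiating each of the five summary probabilities gives the same pattern: for outcome k the effect is β times the density at the threshold below minus the density at the threshold above. The sketch below assumes that reading. `marginal_effects` is a hypothetical helper, β is the age coefficient from the output above, and xb = 2.685 is just an illustrative index value.

```python
from statistics import NormalDist

N01 = NormalDist()  # standard normal; .pdf is the density (small phi)

def marginal_effects(xb, beta, cuts):
    """d Pr(y=k)/dx for k = 1..K, for a continuous x with coefficient beta."""
    # Density at each threshold, padded with 0 below category 1 and above category K.
    dens = [0.0] + [N01.pdf(xb - u) for u in cuts] + [0.0]
    # Pr(y=k) = PHI(xb - u_{k-1}) - PHI(xb - u_k), so the slope is
    # beta * (phi(xb - u_{k-1}) - phi(xb - u_k)).
    return [beta * (dens[k] - dens[k + 1]) for k in range(len(cuts) + 1)]

cuts = [0.4858634, 1.269036, 2.247251, 3.094606]   # _cut1.._cut4
me = marginal_effects(2.685, -0.0202308, cuts)     # age coefficient, illustrative xb
print([round(m, 5) for m in me], "sum =", round(sum(me), 12))
```

The effects sum to zero across outcomes, since the probabilities must keep adding up to one. Only the two end categories have a sign pinned down by the sign of β; the middle categories can go either way depending on where xi β sits.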
23
Discrete X’s
• xi β = β0 + x1i β1 + x2i β2 + … + xki βk
– x2i is yes or no (1 or 0)
• ΔPr(yi=5) =
• Φ[β0 + x1i β1 + β2 + x3i β3 + … + xki βk – u4]
- Φ[β0 + x1i β1 + x3i β3 + … + xki βk – u4]
• Difference between the probabilities at x2i=1 and x2i=0
24
Ask for marginal effects
• mfx compute, predict(outcome(5));
25
• mfx compute, predict(outcome(5));
Marginal effects after oprobit
      y  = Pr(sr_health==5) (predict, outcome(5))
         =  .34103717
------------------------------------------------------------------------------
variable |       dy/dx Std. Err.       z   P>|z|   [    95% C.I.   ]         X
---------+--------------------------------------------------------------------
    male*|    .0471251    .00722    6.53   0.000    .03298    .06127   .438062
     age |   -.0074214    .00031  -23.77   0.000  -.008033   -.00681   39.8412
    educ |    .0303405    .00142   21.42   0.000   .027565   .033116   13.2402
 famincl |    .0880025    .00412   21.37   0.000    .07993   .096075   10.2131
   black*|   -.0781411    .00996   -7.84   0.000  -.097665  -.058617   .124264
 othrace*|   -.0843227    .01567   -5.38   0.000  -.115043  -.053602    .04124
   smoke*|   -.0749785    .00773   -9.71   0.000   -.09012  -.059837   .289147
  smoke5*|   -.0545062    .01235   -4.41   0.000  -.078719  -.030294   .081395
------------------------------------------------------------------------------
(*) dy/dx is for discrete change of dummy variable from 0 to 1
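The mfx numbers can be checked by hand from the formulas on the earlier slides. The Python sketch below is an illustration, not what Stata runs internally: it plugs the oprobit coefficients, _cut4, and the sample means from the X column into Φ[xi β – u4]. No β0 appears because the oprobit output reports no constant (only the cutpoints). The means are rounded as printed, so agreement is only to about four decimals.

```python
from statistics import NormalDist

N01 = NormalDist()  # .cdf is PHI, .pdf is phi

# Coefficients and _cut4 from the oprobit output; means from the X column of mfx.
beta = {"male": .1281241, "age": -.0202308, "educ": .0827086, "famincl": .2398957,
        "black": -.221508, "othrace": -.2425083, "smoke": -.2086096, "smoke5": -.1529619}
mean = {"male": .438062, "age": 39.8412, "educ": 13.2402, "famincl": 10.2131,
        "black": .124264, "othrace": .04124, "smoke": .289147, "smoke5": .081395}
u4 = 3.094606

def index(x):
    """x*beta for a dict of covariate values."""
    return sum(beta[v] * x[v] for v in beta)

xb_bar = index(mean)
pr5 = N01.cdf(xb_bar - u4)                     # Pr(sr_health == 5) at the means
me_age = beta["age"] * N01.pdf(xb_bar - u4)    # continuous x: beta * phi[xb - u4]
# Dummy variable: Pr(y=5) with male = 1 minus Pr(y=5) with male = 0, other x's at means.
d_male = (N01.cdf(index({**mean, "male": 1}) - u4)
          - N01.cdf(index({**mean, "male": 0}) - u4))

print(f"Pr(y=5) = {pr5:.4f}, dPr/d(age) = {me_age:.5f}, male 0->1 = {d_male:.4f}")
```

The three printed values should land close to the .341, -.0074 and .047 that mfx reports for the predicted probability, age and male.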
26
Interpret the results
• Males are 4.7 percentage points more likely to report excellent health
• Each year of age decreases the chance of reporting excellent health by 0.7 percentage points
• Current smokers are 7.5 percentage points less likely to report excellent health
27
Minor notes about estimation
• Wald tests/-2 log likelihood tests are done the exact same way as in PROBIT and LOGIT
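As one worked case, the LR chi2(8) and Pseudo R2 in the oprobit header can be rebuilt from numbers already in the transcript. An ordered probit with no covariates fits the sample shares exactly, so the restricted log likelihood follows from the tab sr_health frequencies alone. A minimal Python sketch under that assumption:

```python
import math

# Frequencies of sr_health = 1..5 from the tabulation, and the fitted log likelihood.
counts = [342, 991, 3068, 3855, 4644]
ll_full = -16401.987

n = sum(counts)
# With thresholds only, the fitted probabilities equal the sample shares n_k / N.
ll_null = sum(c * math.log(c / n) for c in counts)

lr_chi2 = -2.0 * (ll_null - ll_full)    # tests that all 8 slope coefficients are zero
pseudo_r2 = 1.0 - ll_full / ll_null     # the pseudo R2 Stata reports (McFadden's)

print(f"ll_null = {ll_null:.2f}, LR chi2(8) = {lr_chi2:.2f}, pseudo R2 = {pseudo_r2:.4f}")
```

The same -2 log likelihood comparison works for any nested pair of models, with degrees of freedom equal to the number of restrictions.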
28
• Use PRCHANGE to calculate marginal effect for a specific person
prchange, x(age=40 black=0 othrace=0 smoke=0 smoke5=0 educ=16);
– When a variable is NOT specified (famincl), STATA takes the sample mean.
29
• PRCHANGE will produce results for all outcomes
male
        Avg|Chg|           1           2           3           4
 0->1   .0203868   -.0020257  -.00886671  -.02677558  -.01329902

               5
 0->1  .05096698
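The 0->1 row for male can be reproduced from the summary probabilities. The sketch below sets the covariates to the values in the prchange call, leaves famincl at its sample mean (10.2131, rounded as printed in the mfx output), and differences the five probabilities with male at 1 and at 0. The helper name `probs` is hypothetical and the match is approximate because of the rounded mean.

```python
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal CDF

# Coefficients and cutpoints from the oprobit output.
beta = {"male": .1281241, "age": -.0202308, "educ": .0827086, "famincl": .2398957,
        "black": -.221508, "othrace": -.2425083, "smoke": -.2086096, "smoke5": -.1529619}
cuts = [0.4858634, 1.269036, 2.247251, 3.094606]

# Person from the prchange call; famincl is not specified, so it sits at the sample mean.
person = {"age": 40, "educ": 16, "famincl": 10.2131,
          "black": 0, "othrace": 0, "smoke": 0, "smoke5": 0}

def probs(x):
    """[Pr(y=1), ..., Pr(y=5)] for a dict of covariate values."""
    xb = sum(beta[v] * x[v] for v in beta)
    upper = [1.0] + [PHI(xb - u) for u in cuts] + [0.0]
    return [upper[k] - upper[k + 1] for k in range(5)]

p0 = probs({**person, "male": 0})
p1 = probs({**person, "male": 1})
change = [b - a for a, b in zip(p0, p1)]              # columns 1..5 of the 0->1 row
avg_abs = sum(abs(c) for c in change) / len(change)   # the Avg|Chg| column
print([round(c, 5) for c in change], "Avg|Chg| =", round(avg_abs, 5))
```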
30
age
            Avg|Chg|           1           2           3           4
Min->Max   .13358317    .0184785   .06797072   .17686112   .07064757
   -+1/2   .00321942   .00032518   .00141642   .00424452   .00206241
  -+sd/2   .03728014   .00382077   .01648743   .04910323    .0237889
MargEfct   .00321947   .00032515   .00141639   .00424462   .00206252