copyright of: abhinav anand jyoti arora shraddha ramswamy discrete choice modeling in health...
TRANSCRIPT
ORDINAL AND SEQUENTIAL LOGIT MODELS
COPYRIGHT OF:
ABHINAV ANANDJYOTI ARORA
SHRADDHA RAMSWAMY
DISCRETE CHOICE MODELING IN HEALTH ECONOMICS
INTRODUCTIONStudies suggest that self rated health score is a
reliable predictor of health status
We investigate impact of a host of personal and status characteristics such as age, gender etc on the health perception of US Citizens
DATADataset : NHANES Epidemiological Followup
Study :1992Health status , represented by Yi coded as follows
Age is measured in years, education is measured in terms of number of years of schooling completed and dichotomous variable is created for gender (female = 1) and race (black = 1).
POOR• Yi = 1
FAIR• Yi = 2
GOOD• Yi = 3
VERY GOOD• Yi = 4
EXCELLENT• Yi = 5
METHODOLOGY
1) Ordered Logit model for the first part of our
enquiry.
2) Sequential Logit Model for the second part of our enquiry
ORDERED LOGIT MODEL SPECIFICATIONA multinomial choice model where the values
taken by the dependent variable takes a natural order.
Yi* is latent variable such that
Yi = j when αj-1 < Yi*< αj where j= 1,2,3,4,5 and
Yi* = β’Xi + ui where u follows logistic distribution.
5
α1 α2 α3 α4
ORDERED LOGIT MODEL
Where F( ) is a cdf and j = 1,2,3,4,5 and i is theith individual
We assume that u follows logistic distribution
CONTD….
P(Yi =1/Xi ) = F [ α1 – β’X]
P(Yi = 2/Xi) = F [α2 – β’X] – F[α1-β’X]
P(Yi = 3/Xi) = F [α3 – β’X] – F [α2 -β’X]
P(Yi = 4/Xi) = F [α4– β’X] – F [α3 – β’X]
P(Yi = 5/Xi) = 1– F [α4-β’X]
Where F ( ) is defined as above.
For estimating the model we specify 5 dummy variables for the ith individual with the following rule
Zij = 1 if Yi = j where j = 1,2,3,4,5.
= 0 otherwise
ORDERED LOGIT MODEL ESTIMATIONUsing MLE
Using Newton Raphson formula.
5 5
Assuming independent observations, we get
3712
5
F F
RESULTS
Analysis of Maximum Likelihood Estimates
Parameter DF EstimateStandard Error
Chi-Square Pr > Chisq
Intercept 5 1 -1.446 0.2473 34.1904 <.0001Intercept 4 1 0.1255 0.2463 0.2598 0.6103Intercept 3 1 1.6139 0.2479 42.3953 <.0001Intercept 2 1 3.138 0.2539 152.7003 <.0001Age 1 -0.0313 0.00262 143.3251 <.0001gender 1 0.00989 0.0605 0.0267 0.8701race 1 -0.2122 0.0669 10.0676 0.0015edu 1 0.1553 0.0114 184.097 <.0001south 1 -0.7989 0.1072 55.5218 <.0001
COMMAND: proc logistic data = sasuser.nhanes descending;model health = age gender race edu south;run;
..contdOdds Ratio Estimate
Effect
Point Estimat
e
95% Wald Confidence
Limit
Age 0.969 0.964 0.974gender 1.01 0.897 1.137
race 0.809 0.709 0.922
edu 1.168 1.142 1.194
south 0.45 0.365 0.555
Association of Predicted Probabilities and Observed
Responses
Percent Concordant 65.8
Somers'D 0.322
Percent Discordant 33.6 Gamma 0.324
Percent Tied 0.6 Tau-a 0.244
Pairs5221799 c 0.661
Probability estimate for ith individual1
2
3
4
5 1
(-1.4460+β’Xi)
(-1.4460+β’Xi)
(-1.4460+β’Xi)
(3.138+β’Xi)
(1.6139+β’Xi)(3.138+β’Xi)
(1.6139+β’Xi)(3.138+β’Xi)
(1.6139+β’Xi)
(1.6139+β’Xi)
(0.1225+β’Xi)
(0.1225+β’Xi)
(0.1225+β’Xi)
(0.1225+β’Xi)
(3.138+β’Xi)
(-1.4460+β’Xi)
INFERENCE (ORDERED LOGIT)
One additional year of age results in a
3.13% decreases
in odds ratio of
higher self rating.
The impact of gender is almost
negligible.
Blacks are 19.12%
less likely than
whites to rate their health at higher
response values
An additional
year of schooling leads to 16.80%
increase in odds ratio higher self
rating
The Southern residents in each
district are 55% less
likely than the
northern to rate their health at higher
response values.
There are 522179 pairs of
observations
Of these 65.8% are concordant pairs while 33.6% are discordant
pairs.
SEQUENTIAL LOGIT MODEL
Choices/Responses follow a sequence, so we need (m-1) latent variables to characterize (m) unordered choices.
Self-rated health measure can be considered as a purely cardinal variable following a sequence instead of some natural ordering. This allows us to perform discrete choice analysis using (non-ordered) sequential logit model.
SEQUENTIAL LOGIT MODELRoot
(Sample)
Poor (1)Fair+++
(2 or 3 or 4 or 5)
Fair (2) Good++ (3 or 4 or 5)
Good (3)
VeryGood+ (4 or 5)
Very Good (4)
Excellent (5)
Framework•Five choices, and hence we have 4 latent variables to describe the choices.•Choices in each step are independent of the previous step.
Probability Computation ExampleP (Yi = 2) = P [Yi ≠ 1 and Yi = 2 |Yi ≠ 1] = P [Yi ≠ 1] P [Yi = 2|Yi ≠ 1 ]Therefore, for an individual i the conditional probability that his self-rated health measure will have a value j є {1,2,3,4,5} will be given by : Pij= P (Yi = j |Xi )
and so on till j = 5
ESTIMATION IN SEQUENTIAL LOGIT MODEL
Maximum
Likelihood
Estimation
One-shot joint
optimization with
Independent Examples
Repeated Optimizatio
n
•Thus, the parameter β1 can be estimated by dividing the entire sample into two groupsPoor Fair OR Good OR Very Good OR Excellent
•β2 can be estimated by first taking the sub-sample of those did not report poor into two groupsFair Good OR Very Good OR Excellent
•β3 can be estimated by taking the sub-sample of those who didn’t report poor or fair into two groupsGood Very Good OR Excellent
•β4 can be estimated by taking the sub-sample of those who didn’t report poor or fair or good into two groupsVery Good Excellent
In each case the binary models can be estimated by logit using MLE.
SEQUENTIAL LOGIT MODEL Implementation in SAS
data seqlogit; set seqlogit; fairplus = (shm>1); fair = (shm=2); if fairplus = 1; run; proc format; value shm 1='poor' 2-5='fair+++'; value gender 0='male' 1='female'; value race 0='white' 1='black'; value resid 0='north' 1='south'; run; proc qlim data=seqlogit; *covest=qml; class race resid gender; endogenous fair ~ discrete(dist=logistic order=formatted); model fair = age gender race edu resid; format gender gender. race race. resid resid.; run;
The QLIM Procedure
Parameter Estimates
Parameter
Estimate
Standard Error
t Value
Pr > |t|
Intercept -0.9028 0.40898 -2.21 0.0273
Age0.0310
85 0.004264 7.29 <.0001
Gender female
-0.0323
9 0.098606 -0.33 0.7426
Gender male 0 . . .
Race black0.1212
2 0.10717 1.13 0.258
Race white 0 . . .
Edu
-0.1549
8 0.018192 -8.52 <.0001
Resid south
-1.0359
2 0.142367 -7.28 <.0001
Resid north 0 . . .
Among those who report fair or good or very good or excellent health, the odds of reporting fair (rather than good++) are 64% lower among residents south of baseline thanresidents north of baseline of the same age, gender, education and race.
CONCLUSION
• Age, race, education (in terms of number of years of schooling ), and having residence in southern part of the district have a significant impact on self rated health.
• Gender doesn’t have a significant impact.
Ordered Logit Model
• Age, education ( in terms of schooling) and having residence in southern part of the district have a significant impact on self rated health.
• Gender and race don’t have significant impact.
Sequential Logit Model
REFERENCES
• Agresti A. Categorical Data Analysis, Second edition. New York: John Wiley & Sons; 2002
• Gardiner J C. , Luo Z. Logit Models in Practice: B, C, E, G, M, N, O… SAS Institute Inc. ; 2011