copyright of: abhinav anand jyoti arora shraddha ramswamy discrete choice modeling in health...

ORDINAL AND SEQUENTIAL LOGIT MODELS

COPYRIGHT OF:

ABHINAV ANAND

JYOTI ARORA

SHRADDHA RAMSWAMY

DISCRETE CHOICE MODELING IN HEALTH ECONOMICS

INTRODUCTION

Studies suggest that the self-rated health score is a reliable predictor of health status.

We investigate the impact of a host of personal and status characteristics, such as age and gender, on the health perception of US citizens.

DATA

Dataset: NHANES Epidemiological Followup Study, 1992

Health status, represented by Yi, is coded as follows:

• POOR: Yi = 1
• FAIR: Yi = 2
• GOOD: Yi = 3
• VERY GOOD: Yi = 4
• EXCELLENT: Yi = 5

Age is measured in years, education is measured as the number of years of schooling completed, and dichotomous variables are created for gender (female = 1) and race (black = 1).

METHODOLOGY

1) Ordered Logit Model for the first part of our enquiry.

2) Sequential Logit Model for the second part of our enquiry.

ORDERED LOGIT MODEL SPECIFICATION

A multinomial choice model in which the values taken by the dependent variable follow a natural order.

Yi* is a latent variable such that

Yi = j when αj-1 < Yi* < αj, where j = 1,2,3,4,5 (with α0 = −∞ and α5 = +∞), and

Yi* = β’Xi + ui, where ui follows the logistic distribution.

[Figure: number line for Yi* with thresholds α1, α2, α3, α4]
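To make the threshold rule concrete, here is a minimal Python sketch that maps a latent value Yi* to the observed category. The function name and the cutpoint values are hypothetical, chosen only for illustration; they are not estimates from the study.

```python
from bisect import bisect_left

def category(y_star, cutpoints):
    """Return j such that alpha_{j-1} < y_star <= alpha_j, counting from j = 1."""
    return bisect_left(cutpoints, y_star) + 1

# Hypothetical thresholds alpha_1 < alpha_2 < alpha_3 < alpha_4 (illustrative only).
alphas = [-2.0, -0.5, 0.5, 2.0]

# One latent value in each of the five intervals.
ratings = [category(y, alphas) for y in (-3.0, -1.0, 0.0, 1.0, 3.0)]
```

Because the latent variable is continuous, a value landing exactly on a threshold has probability zero, so the choice of a closed upper bound here is immaterial.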

ORDERED LOGIT MODEL

P(Yi = j | Xi) = F[αj – β’Xi] – F[αj-1 – β’Xi]

where F( ) is a cdf, j = 1,2,3,4,5, i is the ith individual, and α0 = −∞, α5 = +∞.

We assume that u follows the logistic distribution, so F(z) = e^z / (1 + e^z).

Written out for each category:

P(Yi = 1 | Xi) = F[α1 – β’Xi]

P(Yi = 2 | Xi) = F[α2 – β’Xi] – F[α1 – β’Xi]

P(Yi = 3 | Xi) = F[α3 – β’Xi] – F[α2 – β’Xi]

P(Yi = 4 | Xi) = F[α4 – β’Xi] – F[α3 – β’Xi]

P(Yi = 5 | Xi) = 1 – F[α4 – β’Xi]

where F( ) is defined as above.
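The five category probabilities are successive differences of the cumulative probabilities F[αj – β’X]. A small Python sketch of that calculation follows; the cutpoints and the value of β’X are hypothetical and serve only to show that the five terms are positive and sum to one.

```python
import math

def logistic_cdf(z):
    """Standard logistic cdf, F(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def ordered_logit_probs(xb, cutpoints):
    """P(Y = j | X) for j = 1..len(cutpoints)+1, given xb = beta'X."""
    # Cumulative probabilities P(Y <= j); the last category closes at 1.
    cumulative = [logistic_cdf(a - xb) for a in cutpoints] + [1.0]
    probs, previous = [], 0.0
    for c in cumulative:
        probs.append(c - previous)
        previous = c
    return probs

# Hypothetical cutpoints and index value, for illustration only.
probs = ordered_logit_probs(xb=0.3, cutpoints=[-2.0, -0.5, 0.5, 2.0])
```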

For estimating the model, we specify 5 dummy variables for the ith individual with the following rule:

Zij = 1 if Yi = j, where j = 1,2,3,4,5

Zij = 0 otherwise

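ORDERED LOGIT MODEL ESTIMATION

Estimation is by MLE, with the likelihood maximized using the Newton-Raphson formula.

The likelihood contribution of the ith individual is

$$L_i = \prod_{j=1}^{5} \left[ F(\alpha_j - \beta' X_i) - F(\alpha_{j-1} - \beta' X_i) \right]^{Z_{ij}}$$

Assuming independent observations, we get

$$L = \prod_{i=1}^{3712} \prod_{j=1}^{5} \left[ F(\alpha_j - \beta' X_i) - F(\alpha_{j-1} - \beta' X_i) \right]^{Z_{ij}}$$

with $\alpha_0 = -\infty$ and $\alpha_5 = +\infty$.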
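Under independence, the log-likelihood is the sum over individuals of the log probability of the category each one actually reported (the single term selected by Zij = 1). The sketch below evaluates that sum in Python for a tiny made-up sample; it does not perform the Newton-Raphson maximization, and the data and cutpoints are hypothetical.

```python
import math

def logistic_cdf(z):
    """Standard logistic cdf, F(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def log_likelihood(y, xb, cutpoints):
    """Sum over i of log P(Y_i = y_i | X_i), with y_i in 1..5 and xb_i = beta'X_i."""
    total = 0.0
    for y_i, xb_i in zip(y, xb):
        # Pad with 0 and 1, i.e. F(alpha_0 - xb) = 0 and F(alpha_5 - xb) = 1.
        cum = [0.0] + [logistic_cdf(a - xb_i) for a in cutpoints] + [1.0]
        # Z_ij = 1 only for j = y_i, so only that term enters the sum.
        total += math.log(cum[y_i] - cum[y_i - 1])
    return total

# Three made-up individuals (not the NHANES data) and hypothetical cutpoints.
ll = log_likelihood(y=[1, 3, 5], xb=[-1.0, 0.0, 1.5],
                    cutpoints=[-2.0, -0.5, 0.5, 2.0])
```

An optimizer would search over the cutpoints and β to make this quantity as large as possible.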

RESULTS

Analysis of Maximum Likelihood Estimates

Parameter     DF   Estimate   Standard Error   Chi-Square   Pr > ChiSq
Intercept 5   1    -1.446     0.2473           34.1904      <.0001
Intercept 4   1    0.1255     0.2463           0.2598       0.6103
Intercept 3   1    1.6139     0.2479           42.3953      <.0001
Intercept 2   1    3.138      0.2539           152.7003     <.0001
Age           1    -0.0313    0.00262          143.3251     <.0001
gender        1    0.00989    0.0605           0.0267       0.8701
race          1    -0.2122    0.0669           10.0676      0.0015
edu           1    0.1553     0.0114           184.097      <.0001
south         1    -0.7989    0.1072           55.5218      <.0001

COMMAND:

proc logistic data = sasuser.nhanes descending;
  model health = age gender race edu south;
run;

RESULTS (CONTD.)

Odds Ratio Estimates

Effect    Point Estimate   95% Wald Confidence Limits
Age       0.969            0.964    0.974
gender    1.01             0.897    1.137
race      0.809            0.709    0.922
edu       1.168            1.142    1.194
south     0.45             0.365    0.555
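The odds ratio point estimates are the exponentiated coefficients, and the Wald limits are the exponentiated ends of estimate ± z × standard error. The following Python sketch reproduces the odds ratio figures from the estimates and standard errors in the maximum likelihood table; the value z = 1.96 is the usual 95% normal quantile and is an assumption about how the limits were computed.

```python
import math

# (estimate, standard error) copied from the maximum likelihood estimates.
estimates = {
    "age": (-0.0313, 0.00262),
    "gender": (0.00989, 0.0605),
    "race": (-0.2122, 0.0669),
    "edu": (0.1553, 0.0114),
    "south": (-0.7989, 0.1072),
}

Z95 = 1.96  # assumed normal quantile for a 95% Wald interval

def odds_ratio_with_ci(b, se, z=Z95):
    """Return (odds ratio, lower limit, upper limit)."""
    return math.exp(b), math.exp(b - z * se), math.exp(b + z * se)

odds_ratios = {name: odds_ratio_with_ci(b, se)
               for name, (b, se) in estimates.items()}
```

A confidence interval that contains 1, as for gender, corresponds to an effect that is not statistically significant at the 5% level.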

Association of Predicted Probabilities and Observed Responses

Percent Concordant   65.8       Somers' D   0.322
Percent Discordant   33.6       Gamma       0.324
Percent Tied         0.6        Tau-a       0.244
Pairs                5221799    c           0.661
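The four rank-correlation indexes follow directly from the concordant and discordant percentages and the number of pairs. A Python sketch of the standard formulas is given below. The sample size of 3712 used for Tau-a is an assumption, read from the figure that appears in the estimation slide.

```python
def association_measures(pct_concordant, pct_discordant, pairs, n_obs):
    """Somers' D, Gamma, Tau-a and c from concordance percentages."""
    nc = pct_concordant / 100.0 * pairs   # number of concordant pairs
    nd = pct_discordant / 100.0 * pairs   # number of discordant pairs
    somers_d = (nc - nd) / pairs
    gamma = (nc - nd) / (nc + nd)
    tau_a = (nc - nd) / (n_obs * (n_obs - 1) / 2.0)
    c = 0.5 * (1.0 + somers_d)            # ties count as half-concordant
    return somers_d, gamma, tau_a, c

# n_obs = 3712 is an assumed sample size (see the note above).
somers_d, gamma, tau_a, c = association_measures(65.8, 33.6, 5221799, 3712)
```

The value c = 0.661 is the area under the ROC curve, so the model orders a randomly chosen pair of respondents with different ratings correctly about two times out of three.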

Probability estimate for the ith individual, where $F(z) = e^{z} / (1 + e^{z})$:

$$P(Y_i = 1 \mid X_i) = 1 - F(3.138 + \beta' X_i)$$

$$P(Y_i = 2 \mid X_i) = F(3.138 + \beta' X_i) - F(1.6139 + \beta' X_i)$$

$$P(Y_i = 3 \mid X_i) = F(1.6139 + \beta' X_i) - F(0.1255 + \beta' X_i)$$

$$P(Y_i = 4 \mid X_i) = F(0.1255 + \beta' X_i) - F(-1.4460 + \beta' X_i)$$

$$P(Y_i = 5 \mid X_i) = F(-1.4460 + \beta' X_i)$$
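As a worked illustration, the fitted probabilities can be evaluated for one person. The Python sketch below uses the intercepts and slopes from the Analysis of Maximum Likelihood Estimates table (including 0.1255 for Intercept 4). With the descending option, each intercept belongs to a cumulative probability of the form P(Y ≥ j). The individual described is hypothetical.

```python
import math

def F(z):
    """Standard logistic cdf."""
    return 1.0 / (1.0 + math.exp(-z))

# Estimates from the ordered logit output; intercept j models P(Y >= j).
INTERCEPTS = {5: -1.4460, 4: 0.1255, 3: 1.6139, 2: 3.1380}
BETA = {"age": -0.0313, "gender": 0.00989, "race": -0.2122,
        "edu": 0.1553, "south": -0.7989}

def predicted_probs(x):
    """Return {j: P(Y = j | x)} for j = 1..5."""
    xb = sum(BETA[k] * x[k] for k in BETA)
    at_least = {j: F(a + xb) for j, a in INTERCEPTS.items()}  # P(Y >= j)
    at_least[1], at_least[6] = 1.0, 0.0
    return {j: at_least[j] - at_least[j + 1] for j in range(1, 6)}

# Hypothetical individual: 40-year-old white woman, 12 years of schooling,
# not resident in the south.
person = {"age": 40, "gender": 1, "race": 0, "edu": 12, "south": 0}
probs = predicted_probs(person)
```

Changing one covariate at a time in `person` shows how the whole distribution over the five ratings shifts, which the odds ratios alone do not convey.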

INFERENCE (ORDERED LOGIT)

• One additional year of age results in a decrease of about 3.1% in the odds of a higher self-rating (odds ratio 0.969).

• The impact of gender is almost negligible.

• Blacks have 19.12% lower odds than whites of rating their health at higher response values.

• An additional year of schooling leads to a 16.80% increase in the odds of a higher self-rating.

• The Southern residents in each district have 55% lower odds than the northern residents of rating their health at higher response values.

• There are 5221799 pairs of observations. Of these, 65.8% are concordant pairs while 33.6% are discordant pairs.
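The percentage statements above are changes in odds, obtained as 100 × (exp(b) − 1) from each coefficient b. A short Python sketch of that arithmetic follows; the `units` argument and the 10-year example are additions for illustration and do not appear in the slides.

```python
import math

def pct_change_in_odds(b, units=1):
    """Percent change in the odds of a higher rating for a `units` increase."""
    return 100.0 * (math.exp(b * units) - 1.0)

age_1yr = pct_change_in_odds(-0.0313)
age_10yr = pct_change_in_odds(-0.0313, units=10)  # effects compound multiplicatively
race = pct_change_in_odds(-0.2122)
edu = pct_change_in_odds(0.1553)
south = pct_change_in_odds(-0.7989)
```

Because the model is multiplicative in the odds, a 10-year age difference is not ten times the one-year percentage but the one-year odds ratio raised to the tenth power.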

SEQUENTIAL LOGIT MODEL

Choices/Responses follow a sequence, so we need (m-1) latent variables to characterize (m) unordered choices.

Self-rated health measure can be considered as a purely cardinal variable following a sequence instead of some natural ordering. This allows us to perform discrete choice analysis using (non-ordered) sequential logit model.

SEQUENTIAL LOGIT MODEL

Root (Sample)
  Poor (1)
  Fair+++ (2 or 3 or 4 or 5)
    Fair (2)
    Good++ (3 or 4 or 5)
      Good (3)
      VeryGood+ (4 or 5)
        Very Good (4)
        Excellent (5)

Framework

• Five choices, and hence we have 4 latent variables to describe the choices.

• Choices in each step are independent of the previous step.

Probability Computation Example

P(Yi = 2) = P[Yi ≠ 1 and Yi = 2] = P[Yi ≠ 1] P[Yi = 2 | Yi ≠ 1]

Therefore, for an individual i, the conditional probability that the self-rated health measure takes a value j ∈ {1,2,3,4,5} is given by Pij = P(Yi = j | Xi), built up stage by stage in the same way, and so on till j = 5.
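The chain rule in the example extends to every category: the probability of stopping at rating j is the probability of passing all earlier stages times the probability of stopping at stage j. A minimal Python sketch is shown below; the stage probabilities are hypothetical numbers, not estimates.

```python
def sequential_probs(stop_probs):
    """stop_probs[k] is P(Y = k+1 | Y >= k+1), the chance of stopping at that stage.

    Returns P(Y = j) for j = 1..len(stop_probs)+1.
    """
    probs, reach = [], 1.0          # reach = probability of arriving at the stage
    for p in stop_probs:
        probs.append(reach * p)     # arrive at the stage and stop there
        reach *= 1.0 - p            # arrive at the stage and move on
    probs.append(reach)             # last category: passed every stage
    return probs

# Hypothetical stage probabilities for poor, fair, good and very good.
probs = sequential_probs([0.05, 0.15, 0.40, 0.55])
```

Here the second entry is 0.95 × 0.15, matching the pattern P[Yi ≠ 1] P[Yi = 2 | Yi ≠ 1].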

ESTIMATION IN SEQUENTIAL LOGIT MODEL

Maximum Likelihood Estimation

• One-shot joint optimization with independent examples

• Repeated optimization

• Thus, the parameter β1 can be estimated by dividing the entire sample into two groups: Poor vs. Fair OR Good OR Very Good OR Excellent.

• β2 can be estimated by taking the sub-sample of those who did not report poor and dividing it into two groups: Fair vs. Good OR Very Good OR Excellent.

• β3 can be estimated by taking the sub-sample of those who did not report poor or fair and dividing it into two groups: Good vs. Very Good OR Excellent.

• β4 can be estimated by taking the sub-sample of those who did not report poor or fair or good and dividing it into two groups: Very Good vs. Excellent.

In each case, the binary model can be estimated by logit using MLE.
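The repeated-optimization route amounts to building four nested sub-samples, each with its own binary outcome. The Python sketch below shows only that data preparation step on a made-up list of ratings. Coding the outcome as 1 for the lower category at each stage is an assumption made here for illustration, and fitting each binary logit is left to whichever routine is used.

```python
def stage_samples(ratings, n_categories=5):
    """Build the binary data set for each stage k = 1..n_categories-1.

    Stage k keeps respondents with rating >= k and codes the outcome as
    1 if the rating equals k (stopped at k) and 0 if it is higher.
    """
    stages = {}
    for k in range(1, n_categories):
        stages[k] = [(r, int(r == k)) for r in ratings if r >= k]
    return stages

# Made-up ratings for eight respondents (1 = poor, ..., 5 = excellent).
ratings = [1, 2, 2, 3, 4, 5, 5, 3]
stages = stage_samples(ratings)
```

Each later stage uses fewer observations than the one before it, which is why the final Very Good vs. Excellent model is typically the least precisely estimated.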

SEQUENTIAL LOGIT MODEL Implementation in SAS

/* Keep respondents who did not report poor; fair = 1 flags a fair rating */
data seqlogit;
  set seqlogit;
  fairplus = (shm>1);
  fair = (shm=2);
  if fairplus = 1;
run;

/* Value labels for the response and the class variables */
proc format;
  value shm 1='poor' 2-5='fair+++';
  value gender 0='male' 1='female';
  value race 0='white' 1='black';
  value resid 0='north' 1='south';
run;

/* Binary logit for fair vs. good++ on the retained sub-sample */
proc qlim data=seqlogit; *covest=qml;
  class race resid gender;
  endogenous fair ~ discrete(dist=logistic order=formatted);
  model fair = age gender race edu resid;
  format gender gender. race race. resid resid.;
run;

The QLIM Procedure

Parameter Estimates

Parameter       Estimate    Standard Error   t Value   Pr > |t|
Intercept       -0.9028     0.40898          -2.21     0.0273
Age             0.031085    0.004264         7.29      <.0001
Gender female   -0.03239    0.098606         -0.33     0.7426
Gender male     0           .                .         .
Race black      0.12122     0.10717          1.13      0.258
Race white      0           .                .         .
Edu             -0.15498    0.018192         -8.52     <.0001
Resid south     -1.03592    0.142367         -7.28     <.0001
Resid north     0           .                .         .
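Each t value in the QLIM output is the estimate divided by its standard error, and the two-sided p-value follows from it. The Python sketch below recomputes both from the reported estimates. Using the standard normal distribution for the p-value is an assumption; with a sample this large it is practically indistinguishable from a t distribution.

```python
import math

# (estimate, standard error) for the non-reference rows of the QLIM output.
rows = {
    "Intercept": (-0.9028, 0.40898),
    "Age": (0.031085, 0.004264),
    "Gender female": (-0.03239, 0.098606),
    "Race black": (0.12122, 0.10717),
    "Edu": (-0.15498, 0.018192),
    "Resid south": (-1.03592, 0.142367),
}

def wald_test(estimate, se):
    """Return (t value, two-sided p-value under a normal approximation)."""
    t = estimate / se
    p = math.erfc(abs(t) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|t|))
    return t, p

tests = {name: wald_test(b, se) for name, (b, se) in rows.items()}
```

Rows with p-values above 0.05, here gender and race, are the ones treated as not significant for this stage.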

Among those who report fair or good or very good or excellent health, the odds of reporting fair (rather than good++) are 64% lower among residents south of baseline than among residents north of baseline of the same age, gender, education and race.

CONCLUSION

Ordered Logit Model

• Age, race, education (in terms of number of years of schooling), and residence in the southern part of the district have a significant impact on self-rated health.

• Gender doesn’t have a significant impact.

Sequential Logit Model

• Age, education (in terms of schooling), and residence in the southern part of the district have a significant impact on self-rated health.

• Gender and race don’t have a significant impact.

