Categorical Data Analysis:
Part 1. Ordinal Response Regression Models
• We initially consider the case in which yi ∈ {1, . . . , C} is an ordered categorical variable.
• xi = (xi1, . . . , xip)′ can consist of both categorical and continuous predictors.
• If outcomes & predictors are both categorical, then we have contingency table data (log-linear models are standard).
• Methods for binary response data (e.g., logistic or probit regression) can be generalized directly to the ordinal case.
• Instead of a Bernoulli likelihood, we have a multinomial
likelihood:
π(y; X) = ∏_{i=1}^n ∏_{j=1}^C Pr(yi = j | xi)^{1(yi = j)} = ∏_{i=1}^n ∏_{j=1}^C πij^{yij},

where πij = Pr(yi = j | xi) and yij = 1(yi = j).
• Note that this is in the case where subjects are not grouped and
we may have continuous predictors.
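The likelihood above is easy to evaluate directly, since the indicator yij = 1(yi = j) zeroes out every term except the observed category's probability. A minimal Python sketch (with categories coded 0, . . . , C − 1 purely for array indexing; the function name is illustrative):

```python
import numpy as np

def multinomial_loglik(pi, y):
    """Ungrouped multinomial log-likelihood: sum_i log pi_{i, y_i}.

    pi : (n, C) array of conditional category probabilities; each row sums to 1.
    y  : (n,) observed categories, coded 0, ..., C-1.
    Only the probability of the observed category enters, since
    y_{ij} = 1(y_i = j) zeroes out all other terms.
    """
    return np.sum(np.log(pi[np.arange(len(y)), y]))

# Two subjects, three categories
pi = np.array([[0.2, 0.5, 0.3],
               [0.1, 0.1, 0.8]])
y = np.array([1, 2])
ll = multinomial_loglik(pi, y)   # log(0.5) + log(0.8)
```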
• There are a wide variety of different forms for πij.
• Note that these are conditional category probabilities, so that we must have ∑_{j=1}^C πij = 1 for all i.
• Regression models allow these category probabilities (or more
commonly a function of these probabilities) to depend on xi.
• Commonly, such models are based on transformations of the
distribution function:
h(Pr(yi ≤ c | xi)) = αc − x′iβ
• h(·) is a link function mapping from [0, 1] → ℝ.
• −∞ = α0 < α1 < . . . < αC−1 < αC = ∞ characterize the
baseline distribution of the categorical response.
• Restrictions are placed on the α’s to ensure that Pr(yi ≤ c | xi) is interpretable as a distribution function, i.e., is increasing in c.
• The term x′iβ allows the distribution to shift systematically with predictors.
• By incorporating a negative sign on x′iβ, increasing xih results in stochastic increases in the distribution of yi when βh > 0 (holding other predictors constant).
Example: Returning to dde and preterm birth
• Gestational length is more naturally modeled as an ordered categorical variable instead of a 0/1 indicator
• In particular, instead of having a simple 0/1 indicator of preterm
birth, we could have
yi =
  1  very early preterm
  2  early preterm
  3  preterm
  4  full term
• We then want to see how dde and other predictors impact the
distribution of yi.
• Generalized probit model:
Pr(yi ≤ j |xi) = Φ(αj − x′iβ),
where −∞ = α0 < α1 < . . . < αC−1 < αC = ∞ are threshold
parameters.
• Typically, α1 = 0 for identifiability.
• Underlying normal formulation:
yi = ∑_{c=1}^C c · 1(αc−1 < zi ≤ αc),    zi ∼ N(x′iβ, 1)
• Data augmentation Gibbs sampling can be used for posterior
computation
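Under this model the category probabilities are differences of normal CDFs, Pr(yi = j | xi) = Φ(αj − x′iβ) − Φ(αj−1 − x′iβ). A small sketch (function name and example values are illustrative):

```python
import numpy as np
from scipy.stats import norm

def probit_category_probs(alpha, x, beta):
    """Pr(y = j | x), j = 1, ..., C, under the generalized probit model.

    alpha : interior cutpoints (alpha_1, ..., alpha_{C-1});
            alpha_0 = -inf and alpha_C = +inf are appended internally.
    """
    eta = np.dot(x, beta)
    cuts = np.concatenate(([-np.inf], alpha, [np.inf]))
    cdf = norm.cdf(cuts - eta)   # Pr(y <= j | x) at each cutpoint
    return np.diff(cdf)          # successive differences give the pmf

probs = probit_category_probs(np.array([0.0, 1.0, 2.0]),
                              np.array([0.5]), np.array([1.0]))
# probs has length C = 4 and sums to 1
```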
• Note that the distribution of yi | xi = 0 (the baseline distribution) is not restricted (i.e., the category probabilities can take any value).
• However, unless we allow interactions between j and β, we are
assuming a particular functional form for the shift in distribution.
• Suppose, for example, we have Pr(yi = j | xi = 0) = (0.1, 0.2, 0.6, 0.1) and β = 2.
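This example can be reproduced numerically: match the cutpoints to the baseline pmf via αj = Φ⁻¹(Pr(yi ≤ j | xi = 0)), then shift by xβ for a few values of x. A sketch:

```python
import numpy as np
from scipy.stats import norm

base = np.array([0.1, 0.2, 0.6, 0.1])    # Pr(y = j | x = 0) from the slide
alpha = norm.ppf(np.cumsum(base)[:-1])   # cutpoints matching the baseline pmf
beta = 2.0

pmfs = {}
for x in [0.0, 0.25, 0.5, 1.0]:
    cdf = np.concatenate((norm.cdf(alpha - x * beta), [1.0]))  # Pr(y <= j | x)
    pmfs[x] = np.diff(np.concatenate(([0.0], cdf)))
# pmfs[0.0] recovers (0.1, 0.2, 0.6, 0.1); with beta > 0, probability mass
# shifts toward higher categories as x grows
```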
[Figure: four panels plotting Pr(Y = j | x) against y = 1, . . . , 4 under the generalized probit model, for x = 0, x = 0.25, x = 0.5, and x = 1]
• The lines in this plot represent the probability mass function of
yi for different values of xi under the generalized probit model.
• The points X represent the pmf under the generalized logistic
model, which has
logit Pr(yi ≤ j | xi) = αj − x′iβ.
• The distributions are the same for xi = 0 but diverge somewhat
as xi varies.
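The comparison can be reproduced by matching both links' cutpoints to the same baseline pmf and applying the same shift; an illustrative sketch:

```python
import numpy as np
from scipy.stats import norm
from scipy.special import expit, logit

base = np.array([0.1, 0.2, 0.6, 0.1])
beta = 2.0
a_probit = norm.ppf(np.cumsum(base)[:-1])   # probit cutpoints matching base
a_logit = logit(np.cumsum(base)[:-1])       # logit cutpoints matching base

def pmf_from_cdf(cdf):
    """Convert Pr(y <= j) at interior cutpoints into the pmf."""
    return np.diff(np.concatenate(([0.0], cdf, [1.0])))

p_probit = pmf_from_cdf(norm.cdf(a_probit - 0.5 * beta))
p_logit = pmf_from_cdf(expit(a_logit - 0.5 * beta))
# identical pmfs at x = 0 by construction, but p_probit != p_logit at x = 0.5
```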
Prior Specification
• For the regression parameters β, we can choose priors as discussed previously (e.g., normal or uniform improper).
• However, the parameters α = (α1, . . . , αC−1)′ have restricted support Ω ⊂ ℝ^(C−1):
Ω = {α : α1 < α2 < . . . < αC−1}
• Hence, we need to choose a prior for α with support on Ω.
• A common choice is π(α) ∝ 1(α ∈ Ω) (i.e., a uniform improper
prior on the restricted space)
• If one wants to incorporate prior information on the baseline
probability mass function, one can instead choose
π(α) ∝ 1(α ∈ Ω) N(α; α0,Σα).
• Certainly, there are many other possibilities, and this choice
should be motivated by the available prior information, subject
to α ∈ Ω.
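In code, either choice amounts to an indicator of the ordering constraint times a (possibly constant) density. A sketch assuming, for simplicity, a diagonal Σα (independent normals); the function name and defaults are illustrative:

```python
import numpy as np
from scipy.stats import norm

def log_prior_alpha(alpha, alpha0=None, sigma=None):
    """Log prior for the cutpoints, up to an additive constant.

    Default: uniform improper prior 1(alpha in Omega).
    With alpha0/sigma supplied: ordered-region-truncated independent normals,
    a diagonal-covariance simplification of 1(alpha in Omega) N(alpha; alpha0, Sigma).
    """
    alpha = np.asarray(alpha, dtype=float)
    if np.any(np.diff(alpha) <= 0):   # alpha outside Omega: zero prior mass
        return -np.inf
    if alpha0 is None:
        return 0.0
    return np.sum(norm.logpdf(alpha, alpha0, sigma))

log_prior_alpha([0.0, 1.0, 2.0])   # 0.0 (in Omega, flat prior)
log_prior_alpha([1.0, 0.0, 2.0])   # -inf (ordering violated)
```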
Posterior Computation
• Gibbs sampling via adaptive rejection sampling or Metropolis-
Hastings can be used for posterior computation in generalized
probit/logit models.
• Such analyses are easily carried out in WinBUGS (for example).
• The data augmentation algorithm of Albert and Chib (1993) provides a convenient alternative for the generalized probit model.
Data Augmentation Algorithm
After choosing initial values for the parameters α and β, iterate
between the following steps:
1. Impute the underlying normal variable zi from its full conditional
posterior distribution,
π(zi |y,X, α, β) = N(x′iβ, 1) truncated to (αj−1, αj] for yi = j.
2. Sample β from its full conditional posterior distribution,
π(β | z, y, X, α) = N(β̂, Vβ),
obtained under the conditionally-conjugate π(β) = N(β0,Σβ)
prior.
3. Sample αj from its full conditional posterior distribution,
π(αj | z, y, X, β, α−j) = Unif( max{zi : yi = j}, min{zi : yi = j + 1} ),
with α1 = 0 for identifiability; this form is obtained under a uniform improper prior for αj (j = 2, . . . , C − 1).
As a homework exercise (due next Tuesday), write down the joint posterior distribution and derive the conditional posterior distributions shown in steps 1-3.
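The three steps can be sketched in Python as follows. This is an illustrative implementation, not a derivation: it assumes y is coded 1, . . . , C with every category observed, fixes α1 = 0, and uses the standard conjugate normal update for β; all names and defaults are ours.

```python
import numpy as np
from scipy.stats import truncnorm

def gibbs_ordinal_probit(y, X, C, n_iter=2000, beta0=None, Sigma_beta=None):
    """Albert-Chib data augmentation sketch for the generalized probit model.

    y : (n,) categories coded 1, ..., C (each category must be observed).
    X : (n, p) predictors.  alpha_1 is fixed at 0; uniform improper priors
    on the remaining cutpoints; N(beta0, Sigma_beta) prior on beta
    (defaults to a vague normal).
    """
    n, p = X.shape
    beta0 = np.zeros(p) if beta0 is None else beta0
    Sigma_beta = 100.0 * np.eye(p) if Sigma_beta is None else Sigma_beta
    Sinv = np.linalg.inv(Sigma_beta)
    V = np.linalg.inv(Sinv + X.T @ X)   # posterior covariance of beta

    beta = np.zeros(p)
    # cutpoints: alpha_0 = -inf, alpha_1 = 0, ..., alpha_C = +inf
    alpha = np.concatenate(([-np.inf, 0.0],
                            np.arange(1, C - 1, dtype=float), [np.inf]))
    draws = {"beta": [], "alpha": []}

    for _ in range(n_iter):
        # 1. impute z_i ~ N(x_i'beta, 1) truncated to (alpha_{y_i - 1}, alpha_{y_i}]
        mu = X @ beta
        lo, hi = alpha[y - 1] - mu, alpha[y] - mu
        z = mu + truncnorm.rvs(lo, hi)
        # 2. beta | z ~ N(bhat, V), the conjugate normal update
        bhat = V @ (Sinv @ beta0 + X.T @ z)
        beta = np.random.multivariate_normal(bhat, V)
        # 3. alpha_j | z ~ Unif over the interval compatible with z and ordering
        for j in range(2, C):
            lo_j = max(z[y == j].max(), alpha[j - 1])
            hi_j = min(z[y == j + 1].min(), alpha[j + 1])
            alpha[j] = np.random.uniform(lo_j, hi_j)
        draws["beta"].append(beta.copy())
        draws["alpha"].append(alpha[2:C].copy())
    return draws
```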
Some Comments
• The 3rd step in the above Gibbs sampler can be quite inefficient
for sample sizes that are moderate to large.
• An alternative is to replace this Gibbs step with a Metropolis-
Hastings step to develop a hybrid Gibbs/Metropolis-Hastings
algorithm.
• For example, one can sample candidates for the cutpoints α from
normal distributions.
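One valid way to do this (a Cowles-style sketch, with our names): propose new cutpoints from random-walk normals and accept based on the probit likelihood with z marginalized out, re-imputing z from its full conditional immediately afterwards.

```python
import numpy as np
from scipy.stats import norm

def probit_loglik(alpha, beta, y, X):
    """Ordinal probit log-likelihood with the latent z marginalized out."""
    eta = X @ beta
    return np.sum(np.log(norm.cdf(alpha[y] - eta) - norm.cdf(alpha[y - 1] - eta)))

def mh_update_alpha(alpha, beta, y, X, C, step=0.05):
    """Random-walk MH for the cutpoints (alpha_2, ..., alpha_{C-1}).

    alpha holds (-inf, 0, alpha_2, ..., alpha_{C-1}, inf).  Proposing from
    the marginal likelihood avoids the narrow uniform full conditional that
    makes the Gibbs step slow for large n.  Symmetric proposal and uniform
    improper prior on the ordered region => acceptance ratio is the
    likelihood ratio, with out-of-order proposals rejected outright.
    """
    prop = alpha.copy()
    prop[2:C] = alpha[2:C] + step * np.random.randn(C - 2)
    if np.any(np.diff(prop[1:C]) <= 0):   # ordering violated: reject
        return alpha
    log_r = probit_loglik(prop, beta, y, X) - probit_loglik(alpha, beta, y, X)
    if np.log(np.random.rand()) < log_r:
        return prop
    return alpha
```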
Continuation-Ratio Formulations
• As an alternative to specifying a model for the transformed distribution function, we can work with discrete hazards.
• In particular, we have
h(Pr(yi = j | yi ≥ j, xi)) = αj + x′iβ.
• Models having this sequential-type specification are referred to as continuation-ratio models.
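For a concrete form, take h = logit (any link could be used); the category probabilities then follow from the discrete hazards by peeling off mass sequentially. An illustrative sketch:

```python
import numpy as np
from scipy.special import expit

def continuation_ratio_probs(alpha, x, beta):
    """Category probabilities from a logit continuation-ratio model.

    Discrete hazard: Pr(y = j | y >= j, x) = expit(alpha_j + x'beta),
    j = 1, ..., C-1; the last category takes the remaining mass.
    """
    eta = np.dot(x, beta)
    hazard = expit(alpha + eta)           # (C-1,) discrete hazards
    surv = np.concatenate(([1.0], np.cumprod(1.0 - hazard)))  # Pr(y >= j)
    return np.append(surv[:-1] * hazard, surv[-1])

probs = continuation_ratio_probs(np.array([-1.0, 0.0, 1.0]),
                                 np.array([0.5]), np.array([1.0]))
# probs has length C = 4 and sums to 1 by the telescoping product
```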