Categorical Data Analysis:
Part 1. Ordinal Response Regression Models
• We initially consider the case in which yi ∈ {1, . . . , C} is an ordered categorical variable.
• xi = (xi1, . . . , xip)′ can consist of both categorical and continuous predictors.
• If outcomes & predictors are both categorical, then we have contingency table data (log-linear models are standard).
• Methods for binary response data (e.g., logistic or probit regression) can be generalized directly to the ordinal case.
• Instead of a Bernoulli likelihood, we have a multinomial
likelihood:
π(y; X) = ∏_{i=1}^n ∏_{j=1}^C Pr(yi = j | xi)^{1(yi = j)} = ∏_{i=1}^n ∏_{j=1}^C πij^{yij},

where πij = Pr(yi = j | xi) and yij = 1(yi = j).
• Note that this is in the case where subjects are not grouped and
we may have continuous predictors.
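The likelihood above is easy to evaluate directly, since the indicator yij = 1(yi = j) zeroes out every term except the observed category's probability. A minimal Python sketch (with categories coded 0, . . . , C − 1 purely for array indexing; the function name is illustrative):

```python
import numpy as np

def multinomial_loglik(pi, y):
    """Ungrouped multinomial log-likelihood: sum_i log pi_{i, y_i}.

    pi : (n, C) array of conditional category probabilities; each row sums to 1.
    y  : (n,) observed categories, coded 0, ..., C-1.
    Only the probability of the observed category enters, since
    y_{ij} = 1(y_i = j) zeroes out all other terms.
    """
    return np.sum(np.log(pi[np.arange(len(y)), y]))

# Two subjects, three categories
pi = np.array([[0.2, 0.5, 0.3],
               [0.1, 0.1, 0.8]])
y = np.array([1, 2])
ll = multinomial_loglik(pi, y)   # log(0.5) + log(0.8)
```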
• There are a wide variety of different forms for πij.
• Note that these are conditional category probabilities, so that we must have ∑_{j=1}^C πij = 1 for all i.
• Regression models allow these category probabilities (or more
commonly a function of these probabilities) to depend on xi.
• Commonly, such models are based on transformations of the
distribution function:
h(Pr(yi ≤ c | xi)) = αc − x′iβ
• h(·) is a link function mapping from [0, 1] → ℝ.
• −∞ = α0 < α1 < . . . < αC−1 < αC = ∞ characterize the
baseline distribution of the categorical response.
• Restrictions are placed on the α’s to ensure that Pr(yi ≤ c | xi) is interpretable as a distribution function, i.e., is increasing in c.
• The term x′iβ allows the distribution to shift systematically with predictors.
• By incorporating a negative sign on x′iβ, increasing xih results in stochastic increases in the distribution of yi when βh > 0 (holding other predictors constant).
Example: Returning to dde and preterm birth
• Gestational length is more naturally modeled as an ordered categorical variable instead of a 0/1 indicator
• In particular, instead of having a simple 0/1 indicator of preterm
birth, we could have
yi =
  1  very early preterm
  2  early preterm
  3  preterm
  4  full term
• We then want to see how dde and other predictors impact the
distribution of yi.
• Generalized probit model:
Pr(yi ≤ j |xi) = Φ(αj − x′iβ),
where −∞ = α0 < α1 < . . . < αC−1 < αC = ∞ are threshold
parameters.
• Typically, α1 = 0 for identifiability.
• Underlying normal formulation:
yi = ∑_{c=1}^C c · 1(αc−1 < zi ≤ αc),    zi ∼ N(x′iβ, 1)
• Data augmentation Gibbs sampling can be used for posterior
computation
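Under this model the category probabilities are differences of normal CDFs, Pr(yi = j | xi) = Φ(αj − x′iβ) − Φ(αj−1 − x′iβ). A small sketch (function name and example values are illustrative):

```python
import numpy as np
from scipy.stats import norm

def probit_category_probs(alpha, x, beta):
    """Pr(y = j | x), j = 1, ..., C, under the generalized probit model.

    alpha : interior cutpoints (alpha_1, ..., alpha_{C-1});
            alpha_0 = -inf and alpha_C = +inf are appended internally.
    """
    eta = np.dot(x, beta)
    cuts = np.concatenate(([-np.inf], alpha, [np.inf]))
    cdf = norm.cdf(cuts - eta)   # Pr(y <= j | x) at each cutpoint
    return np.diff(cdf)          # successive differences give the pmf

probs = probit_category_probs(np.array([0.0, 1.0, 2.0]),
                              np.array([0.5]), np.array([1.0]))
# probs has length C = 4 and sums to 1
```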
• Note that the distribution of yi | xi = 0 (the baseline distribution) is not restricted (i.e., the category probabilities can take any value).
• However, unless we allow interactions between j and β, we are
assuming a particular functional form for the shift in distribution.
• Suppose, for example, we have Pr(yi = j | xi = 0) = (0.1, 0.2, 0.6, 0.1) and β = 2.
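This example can be reproduced numerically: match the cutpoints to the baseline pmf via αj = Φ⁻¹(Pr(yi ≤ j | xi = 0)), then shift by xβ for a few values of x. A sketch:

```python
import numpy as np
from scipy.stats import norm

base = np.array([0.1, 0.2, 0.6, 0.1])    # Pr(y = j | x = 0) from the slide
alpha = norm.ppf(np.cumsum(base)[:-1])   # cutpoints matching the baseline pmf
beta = 2.0

pmfs = {}
for x in [0.0, 0.25, 0.5, 1.0]:
    cdf = np.concatenate((norm.cdf(alpha - x * beta), [1.0]))  # Pr(y <= j | x)
    pmfs[x] = np.diff(np.concatenate(([0.0], cdf)))
# pmfs[0.0] recovers (0.1, 0.2, 0.6, 0.1); with beta > 0, probability mass
# shifts toward higher categories as x grows
```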
[Figure: four panels plotting Pr(Y = j | x) against y = 1, . . . , 4 under the generalized probit model, for x = 0, x = 0.25, x = 0.5, and x = 1]
• The lines in this plot represent the probability mass function of
yi for different values of xi under the generalized probit model.
• The points X represent the pmf under the generalized logistic
model, which has
logit Pr(yi ≤ j | xi) = αj − x′iβ.
• The distributions are the same for xi = 0 but diverge somewhat
as xi varies.
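The comparison can be reproduced by matching both links' cutpoints to the same baseline pmf and applying the same shift; an illustrative sketch:

```python
import numpy as np
from scipy.stats import norm
from scipy.special import expit, logit

base = np.array([0.1, 0.2, 0.6, 0.1])
beta = 2.0
a_probit = norm.ppf(np.cumsum(base)[:-1])   # probit cutpoints matching base
a_logit = logit(np.cumsum(base)[:-1])       # logit cutpoints matching base

def pmf_from_cdf(cdf):
    """Convert Pr(y <= j) at interior cutpoints into the pmf."""
    return np.diff(np.concatenate(([0.0], cdf, [1.0])))

p_probit = pmf_from_cdf(norm.cdf(a_probit - 0.5 * beta))
p_logit = pmf_from_cdf(expit(a_logit - 0.5 * beta))
# identical pmfs at x = 0 by construction, but p_probit != p_logit at x = 0.5
```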
Prior Specification
• For the regression parameters β, we can choose priors as discussed previously (e.g., normal or uniform improper).
• However, the parameters α = (α1, . . . , αC−1)′ have restricted support Ω ⊂ ℝ^(C−1):
Ω = {α : α1 < α2 < . . . < αC−1}
• Hence, we need to choose a prior for α with support on Ω.
• A common choice is π(α) ∝ 1(α ∈ Ω) (i.e., a uniform improper
prior on the restricted space)
• If one wants to incorporate prior information on the baseline
probability mass function, one can instead choose
π(α) ∝ 1(α ∈ Ω) N(α; α0,Σα).
• Certainly, there are many other possibilities, and this choice
should be motivated by the available prior information, subject
to α ∈ Ω.
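In code, either choice amounts to an indicator of the ordering constraint times a (possibly constant) density. A sketch assuming, for simplicity, a diagonal Σα (independent normals); the function name and defaults are illustrative:

```python
import numpy as np
from scipy.stats import norm

def log_prior_alpha(alpha, alpha0=None, sigma=None):
    """Log prior for the cutpoints, up to an additive constant.

    Default: uniform improper prior 1(alpha in Omega).
    With alpha0/sigma supplied: ordered-region-truncated independent normals,
    a diagonal-covariance simplification of 1(alpha in Omega) N(alpha; alpha0, Sigma).
    """
    alpha = np.asarray(alpha, dtype=float)
    if np.any(np.diff(alpha) <= 0):   # alpha outside Omega: zero prior mass
        return -np.inf
    if alpha0 is None:
        return 0.0
    return np.sum(norm.logpdf(alpha, alpha0, sigma))

log_prior_alpha([0.0, 1.0, 2.0])   # 0.0 (in Omega, flat prior)
log_prior_alpha([1.0, 0.0, 2.0])   # -inf (ordering violated)
```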
Posterior Computation
• Gibbs sampling via adaptive rejection sampling or Metropolis-
Hastings can be used for posterior computation in generalized
probit/logit models.
• Such analyses are easily carried out in WinBUGS (for example).
• The data augmentation algorithm of Albert and Chib (1993) provides a convenient alternative for the generalized probit model.
Data Augmentation Algorithm
After choosing initial values for the parameters α and β, iterate
between the following steps:
1. Impute the underlying normal variable zi from its full conditional
posterior distribution,
π(zi |y,X, α, β) = N(x′iβ, 1) truncated to (αj−1, αj] for yi = j.
2. Sample β from its full conditional posterior distribution,
π(β | z, y, X, α) = N(β̂, Vβ),
obtained under the conditionally-conjugate π(β) = N(β0,Σβ)
prior.
3. Sample αj from its full conditional posterior distribution,
π(αj | z, y, X, β, α−j) = Unif( max{zi : yi = j}, min{zi : yi = j + 1} ),
with α1 = 0 for identifiability; this form is obtained under a uniform improper prior for αj (j = 2, . . . , C − 1).
As a homework exercise (due next Tuesday), write down the joint posterior distribution and derive the conditional posterior distributions shown in steps 1-3.
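The three steps can be sketched in Python as follows. This is an illustrative implementation, not a derivation: it assumes y is coded 1, . . . , C with every category observed, fixes α1 = 0, and uses the standard conjugate normal update for β; all names and defaults are ours.

```python
import numpy as np
from scipy.stats import truncnorm

def gibbs_ordinal_probit(y, X, C, n_iter=2000, beta0=None, Sigma_beta=None):
    """Albert-Chib data augmentation sketch for the generalized probit model.

    y : (n,) categories coded 1, ..., C (each category must be observed).
    X : (n, p) predictors.  alpha_1 is fixed at 0; uniform improper priors
    on the remaining cutpoints; N(beta0, Sigma_beta) prior on beta
    (defaults to a vague normal).
    """
    n, p = X.shape
    beta0 = np.zeros(p) if beta0 is None else beta0
    Sigma_beta = 100.0 * np.eye(p) if Sigma_beta is None else Sigma_beta
    Sinv = np.linalg.inv(Sigma_beta)
    V = np.linalg.inv(Sinv + X.T @ X)   # posterior covariance of beta

    beta = np.zeros(p)
    # cutpoints: alpha_0 = -inf, alpha_1 = 0, ..., alpha_C = +inf
    alpha = np.concatenate(([-np.inf, 0.0],
                            np.arange(1, C - 1, dtype=float), [np.inf]))
    draws = {"beta": [], "alpha": []}

    for _ in range(n_iter):
        # 1. impute z_i ~ N(x_i'beta, 1) truncated to (alpha_{y_i - 1}, alpha_{y_i}]
        mu = X @ beta
        lo, hi = alpha[y - 1] - mu, alpha[y] - mu
        z = mu + truncnorm.rvs(lo, hi)
        # 2. beta | z ~ N(bhat, V), the conjugate normal update
        bhat = V @ (Sinv @ beta0 + X.T @ z)
        beta = np.random.multivariate_normal(bhat, V)
        # 3. alpha_j | z ~ Unif over the interval compatible with z and ordering
        for j in range(2, C):
            lo_j = max(z[y == j].max(), alpha[j - 1])
            hi_j = min(z[y == j + 1].min(), alpha[j + 1])
            alpha[j] = np.random.uniform(lo_j, hi_j)
        draws["beta"].append(beta.copy())
        draws["alpha"].append(alpha[2:C].copy())
    return draws
```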
Some Comments
• The 3rd step in the above Gibbs sampler can be quite inefficient
for sample sizes that are moderate to large.
• An alternative is to replace this Gibbs step with a Metropolis-
Hastings step to develop a hybrid Gibbs/Metropolis-Hastings
algorithm.
• For example, one can sample candidates for the cutpoints α from
normal distributions.
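One valid way to do this (a Cowles-style sketch, with our names): propose new cutpoints from random-walk normals and accept based on the probit likelihood with z marginalized out, re-imputing z from its full conditional immediately afterwards.

```python
import numpy as np
from scipy.stats import norm

def probit_loglik(alpha, beta, y, X):
    """Ordinal probit log-likelihood with the latent z marginalized out."""
    eta = X @ beta
    return np.sum(np.log(norm.cdf(alpha[y] - eta) - norm.cdf(alpha[y - 1] - eta)))

def mh_update_alpha(alpha, beta, y, X, C, step=0.05):
    """Random-walk MH for the cutpoints (alpha_2, ..., alpha_{C-1}).

    alpha holds (-inf, 0, alpha_2, ..., alpha_{C-1}, inf).  Proposing from
    the marginal likelihood avoids the narrow uniform full conditional that
    makes the Gibbs step slow for large n.  Symmetric proposal and uniform
    improper prior on the ordered region => acceptance ratio is the
    likelihood ratio, with out-of-order proposals rejected outright.
    """
    prop = alpha.copy()
    prop[2:C] = alpha[2:C] + step * np.random.randn(C - 2)
    if np.any(np.diff(prop[1:C]) <= 0):   # ordering violated: reject
        return alpha
    log_r = probit_loglik(prop, beta, y, X) - probit_loglik(alpha, beta, y, X)
    if np.log(np.random.rand()) < log_r:
        return prop
    return alpha
```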
Continuation-Ratio Formulations
• As an alternative to specifying a model for the transformed distribution function, we can work with discrete hazards.
• In particular, we have
h(Pr(yi = j | yi ≥ j, xi)) = αj + x′iβ.
• Models having this sequential-type specification are referred to as continuation-ratio models.
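For a concrete form, take h = logit (any link could be used); the category probabilities then follow from the discrete hazards by peeling off mass sequentially. An illustrative sketch:

```python
import numpy as np
from scipy.special import expit

def continuation_ratio_probs(alpha, x, beta):
    """Category probabilities from a logit continuation-ratio model.

    Discrete hazard: Pr(y = j | y >= j, x) = expit(alpha_j + x'beta),
    j = 1, ..., C-1; the last category takes the remaining mass.
    """
    eta = np.dot(x, beta)
    hazard = expit(alpha + eta)           # (C-1,) discrete hazards
    surv = np.concatenate(([1.0], np.cumprod(1.0 - hazard)))  # Pr(y >= j)
    return np.append(surv[:-1] * hazard, surv[-1])

probs = continuation_ratio_probs(np.array([-1.0, 0.0, 1.0]),
                                 np.array([0.5]), np.array([1.0]))
# probs has length C = 4 and sums to 1 by the telescoping product
```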