
Bayesian Inference for Multivariate Ordinal Data Using Parameter Expansion

Earl LAWRENCE
Statistical Sciences, Los Alamos National Laboratory, Los Alamos, NM 87545 ([email protected])

Derek BINGHAM
Department of Statistics and Actuarial Science, Simon Fraser University, Burnaby, BC, Canada ([email protected])

Chuanhai LIU
Department of Statistics, Purdue University, West Lafayette, IN 47907 ([email protected])

Vijayan N. NAIR
Department of Statistics, University of Michigan, Ann Arbor, MI 48109 ([email protected])

Technometrics (2008), 50(2), 182–191. DOI: 10.1198/004017008000000064
© 2008 American Statistical Association and the American Society for Quality

Multivariate ordinal data arise in many applications. This article proposes a new, efficient method for Bayesian inference for multivariate probit models using Markov chain Monte Carlo techniques. The key idea is the novel use of parameter expansion to sample correlation matrices. A nice feature of the approach is that inference is performed using straightforward Gibbs sampling. Bayesian methods for model selection are also discussed. Our approach is motivated by a study of how women make decisions on taking medication to reduce the risk of breast cancer. Furthermore, we compare and contrast the performance of our approach with other methods.

KEY WORDS: Gibbs sampling; Multivariate; Ordinal data; Parameter expansion; Probit.

1. INTRODUCTION

Data collected in marketing surveys, opinion polls, and behavioral and social science research are often in the form of ordered categorical data. The data represent attitudes of the respondents measured on a Likert scale (Likert 1932), such as strongly disagree, disagree, neutral, agree, and strongly agree. Applied researchers in these areas usually assign numerical scores to the categories (e.g., 1–5) and analyze the scores as continuous data.

McCullagh (1980) proposed a latent variable model that provides an explicit measure of distance among the categories. In this formulation, categories arise from partitioning of the latent variable space using unknown cutpoints. Maximum likelihood estimation for the parameters of interest as well as the cutpoints (which are viewed as nuisance parameters) was developed. Bayesian inference techniques for the model were given by Albert and Chib (1993). Here the latent variables are treated as missing, and data augmentation techniques are used to obtain the posterior distributions. In this context, probit models, in which the latent variables are normally distributed, are especially convenient for computational purposes. (Refer to Agresti 1984, Johnson and Albert 1999, and McCullagh and Nelder 1989 for details and additional references on latent variable and probit models.)

The article deals with Bayesian inference for multivariate ordinal data that arise in many applications, for example, multiple item responses or longitudinal responses to the same question. We consider multivariate data with a regression structure for the mean and develop Markov chain Monte Carlo (MCMC) techniques for estimating the underlying parameters, as well as methods for model selection. Although this problem has been considered in the literature, our contribution is to develop Bayesian inference using the parameter expansion technique (see Liu, Rubin, and Wu 1998; Liu and Wu 1999) to address the key identifiability problem with covariance matrices. It is known that the full covariance matrix is not identifiable from multivariate ordinal data, so attention must be restricted to correlation matrices. But there is no natural way to deal with correlation matrices in Bayesian inference, so we use the novel idea of parameter expansion and work with covariance matrices rather than correlation matrices. The parameter expansion method was originally proposed to accelerate EM and data augmentation algorithms. We use it instead to overcome the inherent identifiability problem, with the added advantage of fast convergence. In this article we develop Bayesian methodology for inference in the multivariate probit (MVP) model and compare it with other methods reported in the literature. (Note that parameter expansion for this application was briefly mentioned in Liu 2001 in a different context.) Our algorithm is derived within the framework of parameter expansion (Liu et al. 1998; Liu and Wu 1999), allowing us to draw on the theoretical results.

Some of the early work on multivariate ordinal data considered only simpler versions of the problem, which are more tractable. Ashford and Sowden (1970) restricted attention to the bivariate case due to computational limitations and suggested carrying out all pairwise analyses for more than two responses. Ochi and Prentice (1984) and others have extended the model to higher dimensions for the equicorrelated case. This approach simplifies the problem by considering just a single correlation parameter but does not extend to a general correlation matrix. Bock and Gibbons (1996) used a parameterization based on a matrix decomposition that reduces the number of elements that need to be estimated in the correlation matrix. Again, the restriction makes the problem more computationally tractable but does not allow for a general correlation matrix in the formulation.

Chib and Greenberg (1998) described a method for estimating MVP models using an MCMC algorithm and a Monte Carlo EM algorithm. The model that they described is similar to the one that we consider here, with some structural and algorithmic differences. First, they built a single regression model across the multivariate response. Second, their approach proceeded by drawing the elements of the correlation matrix individually or in groups using a Metropolis step. This differs from our algorithm, which draws the entire correlation matrix as a single unit, thereby reducing the autocorrelation in the sampled Markov chains.

Edwards and Allenby (2003) also discussed a Bayesian methodology for an MVP model. Their approach differs from ours in several ways. They developed the model for the "pick any J" framework, in which a respondent is given a set of choices and instructed to select as many as necessary (e.g., "Which of the following beverages do you drink regularly?"). With this goal, they developed a binary MVP model with just a mean and correlation matrix, as opposed to the regression structure that we consider for ordinal data. Their MCMC draws for the correlation matrix are very similar to the methodology proposed here, the difference being that they postprocess the draws as opposed to rescaling within the algorithm.

We should distinguish multivariate ordinal data and the models here from the multinomial probit problem that also has been considered in the literature (McCulloch, Polson, and Rossi 2000; Imai and van Dyk 2005). The latter applies to univariate categorical data and involves identifiability problems and latent variable issues that are related to, although somewhat different from, those described here.

The article is organized as follows. The MVP model and the challenges associated with estimation, along with the proposed methodology for Bayesian inference and variable selection, are described in Section 2. Data from two applications are analyzed in Section 3, where a comparison of the performance of our method with others in the literature also is provided.

2. BAYESIAN INFERENCE USING MARKOV CHAIN MONTE CARLO AND PARAMETER EXPANSION

2.1 The Multivariate Probit Model

We begin with a description of the MVP model. (For details of the univariate probit model, see Albert and Chib 1993, Agresti 1984, or Johnson and Albert 1999.)

Denote the ith observation by {Y_i, X_i}, i = 1, ..., n, where Y_i is a q-vector in which each entry Y_{i,j} (j = 1, ..., q) can take one of k_j ordered values coded 1, ..., k_j. Furthermore, X_i is a p × 1 vector of predictors. Let Z_i be a q-variate normal latent variable for each observed Y_i. The multivariate ordinal data Y_i are obtained from the latent variable Z_i as follows. Let γ be a set of cutpoints, one set for each ordinal response, with γ_{j,0} = −∞ and γ_{j,k_j} = ∞ for all j. Then if γ_{j,c−1} < Z_{i,j} ≤ γ_{j,c}, we observe Y_{i,j} = c. The vector of means of Z_i is determined by the predictors and regression parameters. Specifically, let β_j be a 1 × p vector of regression coefficients associated with the jth response. The mean for Z_{i,j} is given by β_j X_i. If the matrix of regression coefficients is denoted by β = (β_1', β_2', ..., β_q')', then Z_i is distributed normally with mean βX_i.

To see where the challenges in estimation arise, consider the univariate case in which each observation has a single Z. Recall that the ordinal data are invariant to certain transformations on the underlying latent variable; for example, a latent variable Z with variance 1, mean μ, and cutpoints γ_ℓ gives the same observable probabilities as a latent variable W with variance σ², mean ν = σμ, and cutpoints θ_ℓ = σγ_ℓ, with ℓ = 1, ..., k. Similarly, a latent variable Z with cutpoint γ_1 = 0 and intercept β_0 gives the same observable probabilities as a latent variable W with cutpoint θ_1 = −δ and intercept α_0 = β_0 − δ. Thus it is common to fix the latent variable variance at 1 and the first cutpoint to 0 to identify the remaining parameters. The multivariate case is similar. The mean vector and the variances cannot be identified from the ordinal data; thus, for each dimension, we fix the first cutpoint γ_{j,1} = 0 and restrict the covariance matrix to be a correlation matrix.
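As a quick numerical illustration of this invariance (our own check, not from the article), the cell probabilities computed under (μ, 1, γ_ℓ) and under (σμ, σ², σγ_ℓ) agree exactly:

```python
from scipy.stats import norm

mu, sigma = 0.7, 2.5                               # latent mean; arbitrary rescaling factor
gam = [-float("inf"), 0.0, 1.2, float("inf")]      # cutpoints, gamma_0 = -inf, gamma_k = +inf

# P(gam[l-1] < Z <= gam[l]) for Z ~ N(mu, 1)
p_z = [norm.cdf(gam[l], mu, 1) - norm.cdf(gam[l - 1], mu, 1) for l in range(1, 4)]
# Same cells for W = sigma * Z ~ N(sigma * mu, sigma^2) with cutpoints sigma * gam
p_w = [norm.cdf(sigma * gam[l], sigma * mu, sigma)
       - norm.cdf(sigma * gam[l - 1], sigma * mu, sigma) for l in range(1, 4)]
print(p_z)
print(p_w)   # identical up to floating-point rounding
```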

In the univariate case, we have a single variance, and setting σ² = 1 removes the parameter from the inference framework. Here the restriction is on the structure of the covariance matrix (i.e., all diagonal elements equal 1). This presents the main challenge in multivariate modeling, because it is difficult to posit appropriate prior distributions on correlation matrices or estimate the posterior distribution subject to the constraint. Several different ways to get around this problem have been proposed in the literature. As we describe later, we expand the parameter space to unrestricted covariance matrices and normalize them appropriately to get the correlation matrices.

2.2 Likelihood and Computational Issues

We use matrix notation and the matrix-variate normal distribution to simplify the algebra and presentation (and also to ease implementation). Let M and μ be q × n matrices, and let vec(M) be the vector resulting from stacking the columns of M. Also, let Σ be a q × q matrix and Ψ be an n × n matrix. The matrix M is distributed as a matrix-variate normal variable with mean matrix μ and covariance matrix Σ ⊗ Ψ if vec(M′) is distributed as a qn-variate normal variable with mean vector vec(μ′) and covariance matrix Σ ⊗ Ψ. We denote this distribution as M ∼ N_{q,n}{μ, Σ ⊗ Ψ}. (See Gupta and Nagar 2000 for a more complete description of the matrix-variate normal density.)
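For implementation, a draw of M ∼ N_{q,n}{μ, Σ ⊗ Ψ} can be generated from an iid standard normal matrix via Cholesky factors of Σ and Ψ. The following sketch (the function name and layout are ours) is consistent with the vec(M′) convention just described:

```python
import numpy as np

def rmatnorm(mu, Sigma, Psi, rng):
    """Draw M ~ N_{q,n}(mu, Sigma (x) Psi): q x q row covariance Sigma and
    n x n column covariance Psi, so that vec(M') ~ N(vec(mu'), Sigma (x) Psi)."""
    q, n = mu.shape
    L_S = np.linalg.cholesky(Sigma)    # Sigma = L_S L_S'
    L_P = np.linalg.cholesky(Psi)      # Psi   = L_P L_P'
    G = rng.standard_normal((q, n))    # iid N(0, 1) entries
    return mu + L_S @ G @ L_P.T

rng = np.random.default_rng(0)
M = rmatnorm(np.zeros((3, 5)), np.eye(3), np.eye(5), rng)
```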

To take advantage of this representation for our model, we start by constructing the following matrices from the latent and covariate vectors: Z = [Z_1, Z_2, ..., Z_n] and X = [X_1, X_2, ..., X_n], where Z_i is the q-vector of latent variables and X_i is the p-vector of covariates for response unit i. Now Z is matrix-variate normal with mean matrix βX and covariance matrix R ⊗ I_n, where I_n is the n × n identity matrix. The resulting complete-data likelihood for the MVP model is

$$L(\beta, R, \gamma) = |R|^{-n/2}\,\operatorname{etr}\left\{-\frac{1}{2}R^{-1}(Z - \beta X)(Z - \beta X)'\right\} \times \prod_{i=1}^{n}\prod_{j=1}^{q} I\{\gamma_{j,Y_{i,j}-1} < Z_{i,j} \le \gamma_{j,Y_{i,j}}\}, \qquad (1)$$

where etr(·) = exp{trace(·)}. This is simply the likelihood based on the matrix-variate normal density, with each component of the random matrix restricted to a particular region based on the observed values in Y.

As noted by other studies on this topic, we can view inference in this setting as a missing-data problem. We do not observe the latent variable Z_i directly; rather, the observed multivariate ordinal response Y_i defines a rectangular region that contains Z_i. The observed-data likelihood is obtained by integrating out the latent variable Z over the ranges specified by the truncation. This is difficult to maximize directly; the EM algorithm is a natural choice. But the truncation makes the imputation difficult, because there is no closed form for the expectation of the latent variables. Other numerical optimization techniques are similarly hindered by the difficulty in computing integrals and derivatives of this expression.

Simulation-based techniques are known to be computationally efficient and easily tractable for this type of setting. Straightforward Monte Carlo integration is not feasible due to the difficulty of producing samples from a truncated multivariate normal distribution; however, draws can be simulated from such a distribution using MCMC. The conditional distribution of each component of a truncated multivariate normal variable is simply a truncated univariate normal. By cycling through the conditional draws, we can obtain a Markov chain whose stationary distribution is the required truncated multivariate normal.

Because of the need to use MCMC to generate the latent variable, we frame the entire problem in the Bayesian MCMC setting. Using conjugate priors for the model parameters allows for a fairly straightforward MCMC implementation. An attractive feature of this approach is that each parameter draw is a Gibbs step from a relatively simple distribution. Furthermore, additional techniques, explained in the next section, ensure that the chain mixes quickly.

2.3 Expanding the Parameters

As noted earlier, a key challenge arises from restricting the covariance matrix to a correlation matrix. It is convenient to use the inverse-Wishart distribution to draw unstructured covariance matrices; however, this cannot be easily modified to produce draws that have ones on the diagonal. The algorithm would be greatly simplified if we could take general inverse-Wishart draws and then transform the parameters so that the restriction is met. Surprisingly, we can do exactly this by exploiting the parameter expansion technique. The methodology uses an expanded parameter formulation (Liu et al. 1998; Liu and Wu 1999) that enables estimation of the restricted covariance matrix from a single draw from an inverse-Wishart distribution.

Consider our observed data model:

$$P\{Y_1 = y_1, \ldots, Y_q = y_q\} = \int_{\gamma_{q,y_q-1}}^{\gamma_{q,y_q}} \cdots \int_{\gamma_{1,y_1-1}}^{\gamma_{1,y_1}} (2\pi)^{-q/2}|R|^{-1/2} \exp\left\{-\frac{1}{2}(Z - \beta X)'R^{-1}(Z - \beta X)\right\}\, dZ_1 \cdots dZ_q.$$

Let the matrix V be a q × q diagonal matrix with positive diagonal elements v_1, ..., v_q. Now consider the following scale transformation on the latent variable: W = V^{1/2}Z. Making this substitution in the foregoing data model and working through some tedious algebra, we arrive at

$$P\{Y_1 = y_1, \ldots, Y_q = y_q\} = \int_{\sqrt{v_q}\,\gamma_{q,y_q-1}}^{\sqrt{v_q}\,\gamma_{q,y_q}} \cdots \int_{\sqrt{v_1}\,\gamma_{1,y_1-1}}^{\sqrt{v_1}\,\gamma_{1,y_1}} (2\pi)^{-q/2}|V|^{-1/2}|R|^{-1/2} \exp\left\{-\frac{1}{2}\bigl(W - V^{1/2}\beta X\bigr)'\bigl(V^{1/2}RV^{1/2}\bigr)^{-1}\bigl(W - V^{1/2}\beta X\bigr)\right\}\, dW_1 \cdots dW_q.$$

This is a useful transformation because the probabilities for Y remain the same despite the change of variable. If we rewrite the parameters as

$$\alpha = V^{1/2}\beta, \qquad \theta_{j,c} = \sqrt{v_j}\,\gamma_{j,c}, \qquad \Sigma = V^{1/2}RV^{1/2},$$

then the foregoing becomes the probability for Y based on the general multivariate normal density with no restriction on the variances, that is,

$$P\{Y_1 = y_1, \ldots, Y_q = y_q\} = \int_{\theta_{q,y_q-1}}^{\theta_{q,y_q}} \cdots \int_{\theta_{1,y_1-1}}^{\theta_{1,y_1}} (2\pi)^{-q/2}|\Sigma|^{-1/2} \exp\left\{-\frac{1}{2}(W - \alpha X)'\Sigma^{-1}(W - \alpha X)\right\}\, dW_1 \cdots dW_q. \qquad (2)$$

We call this the expanded parameter model because the set of parameters has grown to include a set of variances previously fixed at unity. This leads to the following expanded likelihood:

$$L(\alpha, \Sigma, \theta) = |\Sigma|^{-n/2}\,\operatorname{etr}\left\{-\frac{1}{2}\Sigma^{-1}(W - \alpha X)(W - \alpha X)'\right\} \times \prod_{i=1}^{n}\prod_{j=1}^{q} I\{\theta_{j,Y_{i,j}-1} < W_{i,j} \le \theta_{j,Y_{i,j}}\}. \qquad (3)$$

This model matches the observed data model but does not meet the identifiability restrictions. In other words, the observed data do not allow us to estimate the parameters as they are described in (2). But if we could use this model, then we would be able to define an MCMC scheme that uses a very simple inverse-Wishart posterior for the covariance matrix. In fact, we do exactly that. We propose making draws from this model and then transforming them to meet the identifiability conditions.

This scheme has a number of nice features. First, all of the draws are relatively simple. The scheme involves no complicated attempt to draw each correlation parameter individually, but rather uses one draw of an unrestricted covariance matrix. Second, for the correct choice of prior distributions, we can draw Σ independently of the other parameters and reduce the dependence between iterations. In other words, for each iteration of the overall MCMC scheme, the entire parameter draw is based only on the last latent variable draw. Both of these features will improve the mixing rate of the MCMC algorithm and result in improved performance. In addition, the theory of parameter expansion guarantees a further gain in convergence rate based on the implicit resampling of the variances and adjustments to the parameters. Next, we focus on the implementation of this model, after which we discuss the related convergence issues.

2.4 Priors

Here we focus on noninformative conjugate priors for two reasons. First, in most cases we prefer to let the data determine the parameters, with little or no influence from outside sources. Second, the noninformative priors lead to convenient results regarding the draws and convergence properties. Our experience has shown that the procedure will work with other priors as well, but general results are not as easy to obtain.

The prior chosen for the covariance matrix is the Jeffreys noninformative prior for scale matrices. This prior,

$$\pi(\Sigma) \propto |\Sigma|^{-(q+1)/2}, \qquad (4)$$

leads to an inverse-Wishart full conditional distribution for the covariance draw. A key feature is that it keeps the expanded scale parameters independent from the correlations a priori, because

$$|\Sigma|^{-(q+1)/2} = |V|^{-(q+1)/2}|R|^{-(q+1)/2}.$$

We choose a matrix-variate normal prior for the regression parameter matrix β. Recall that each row of the β matrix contains the regression parameters for one dimension of the response. With this in mind, we use the following prior formulation:

$$\pi(\beta) = N_{q,p}(B, \Sigma_B \otimes W). \qquad (5)$$

The matrix B represents the prior belief about the location of β. The matrices Σ_B and W represent the prior belief about the relationships among the regression coefficients: the matrix W contains prior information regarding the relationship among regression coefficients within each dimension, and the matrix Σ_B contains the prior information regarding the relationship among regression coefficients across dimensions. A few aspects regarding the choice of prior should be noted. First, the development in Section 2.5 assumes π(β) ∝ 1 for a noninformative prior. Second, if we let the prior for β depend on R by setting Σ_B = R, then we can integrate out β, allowing us to draw the covariance independently of β for an efficient MCMC implementation. This property also applies to the noninformative prior. Finally, because the matrix-variate normal distribution can be rewritten as a multivariate normal distribution, the prior for β can be specified in a more general fashion. We explore this issue further in the section on variable selection.

Finally, we choose normal priors with an order restriction for γ. In other words, for a particular dimension, let all of the free cutpoints follow an independent multivariate normal distribution with the restriction that γ_{j,1} = 0, γ_{j,2} ≥ γ_{j,1}, and so on. Note also that the order restriction has no practical effect, because the likelihood will impose this restriction naturally. A noninformative prior is achieved by allowing the variances to go to infinity.

More informative priors than those specified also can be used while still maintaining the conjugacy required to produce Gibbs draws. The matrix-variate normal prior for the regression coefficients can be used with informative choices for the parameters. We use this technique in the later discussion of and example for variable selection. The prior for the cutpoints also can be used with informative parameter choices. In this case the posterior conditional distributions become truncated normal distributions. Finally, an inverse-Wishart distribution can be used as an informative prior for the covariance/correlation matrix that still maintains conjugacy. With this choice, the mechanical details of the algorithm remain the same, but the theory of convergence (discussed later) becomes more difficult to verify when the expanded parameters are no longer independent of the original parameters; however, our practical experience indicates that there are no problems.

2.5 Markov Chain Monte Carlo Implementation

We now consider the posterior distributions and the steps of the MCMC scheme. We begin with a brief description of the latent variable draw, followed by consideration of the other parameters.

Each latent observation consists of a q-variate normal variate that is truncated along each dimension. Consider a variable Z_i with mean μ_i = βX_i and covariance matrix R. Let this variable be truncated in all dimensions with a set of cutpoints so that γ_{j,a} < Z_{i,j} ≤ γ_{j,b} (a < b). The univariate conditional distributions are all univariate truncated normal, where the parameters are simple functions of μ_i and R; specifically,

$$\sigma^2 = 1/R^{-1}_{j,j} \qquad \text{and} \qquad \nu = \mu_{i,j} - \sigma^2 R^{-1}_{j,-j}(Z_{i,-j} - \mu_{i,-j}).$$

Thus we can create a Gibbs sampler that cycles through the conditionals to produce a Markov chain with the desired truncated multivariate normal distribution. In practice, for this algorithm, we choose some small fixed number of iterations to run the chain to produce each draw.
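A sketch of this coordinate-wise Gibbs step is given below, using the formulas for σ² and ν above; the function name and argument layout are our own, and scipy's truncnorm is parameterized by standardized bounds:

```python
import numpy as np
from scipy.stats import truncnorm

def gibbs_truncated_mvn(z, mu, R, lower, upper, rng, sweeps=5):
    """A few Gibbs sweeps over the coordinates of a q-variate N(mu, R) truncated
    to the box (lower, upper], starting from a point z inside the box.  Here
    lower[j] = gamma_{j, Y_{i,j}-1} and upper[j] = gamma_{j, Y_{i,j}} for one
    observation i (a sketch; infinite bounds are allowed)."""
    q = len(mu)
    P = np.linalg.inv(R)                         # precision matrix R^{-1}
    for _ in range(sweeps):
        for j in range(q):
            idx = [k for k in range(q) if k != j]
            s2 = 1.0 / P[j, j]                   # sigma^2 = 1 / (R^{-1})_{jj}
            nu = mu[j] - s2 * P[j, idx] @ (z[idx] - mu[idx])
            s = np.sqrt(s2)
            a, b = (lower[j] - nu) / s, (upper[j] - nu) / s   # standardized bounds
            z[j] = truncnorm.rvs(a, b, loc=nu, scale=s, random_state=rng)
    return z
```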

We now treat the data as having arisen from the expanded model. Given the latent variables W, we next turn to the covariance matrix. As noted previously, we draw this as if it were unrestricted. Because of the noninformative choice for the prior on the regression coefficients, this draw is made independently of the other variables, conditional only on the missing data, which allows us to make the entire parameter draw conditional on only the latent variables. Combining our prior (4) with the expanded likelihood (3) and integrating out α gives

$$\Sigma \mid W \sim \mathrm{IW}\{n - p + q - 1,\; WW' - WX'(XX')^{-1}XW'\}.$$

Next, we move on to the regression coefficients. Using the noninformative prior (5) for β (with mean 0 and infinite variances) and combining it with our expanded likelihood (3), the expanded conditional posterior is

$$\alpha \mid W, \Sigma \sim N_{q,p}\{WX'(XX')^{-1},\; \Sigma \otimes (XX')^{-1}\}.$$
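These two draws might be implemented as follows. This is a sketch under our reading of the notation: we map the IW degrees of freedom directly onto scipy's invwishart(df, scale) parameterization, and we generate α with the matrix-variate normal construction used earlier:

```python
import numpy as np
from scipy.stats import invwishart

def draw_Sigma_alpha(W, X, rng):
    """One draw of the unrestricted covariance and regression matrix in the
    expanded model (sketch).  W is q x n (latent variables), X is p x n."""
    q, n = W.shape
    p = X.shape[0]
    XXt_inv = np.linalg.inv(X @ X.T)
    S = W @ W.T - W @ X.T @ XXt_inv @ X @ W.T     # residual cross-product matrix
    Sigma = invwishart.rvs(df=n - p + q - 1, scale=S, random_state=rng)
    # alpha | W, Sigma ~ N_{q,p}(W X'(X X')^{-1}, Sigma (x) (X X')^{-1})
    mean = W @ X.T @ XXt_inv
    L_S = np.linalg.cholesky(Sigma)
    L_X = np.linalg.cholesky(XXt_inv)
    alpha = mean + L_S @ rng.standard_normal((q, p)) @ L_X.T
    return Sigma, alpha
```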


Finally, we draw the cutpoints. Using the noninformative prior achieved by letting the variances go to infinity, we have

$$\theta_{j,c} \mid W, Y \sim \mathrm{Unif}\Bigl(\max_i\{W_{i,j} \mid Y_{i,j} = c\},\; \min_i\{W_{i,j} \mid Y_{i,j} = c + 1\}\Bigr)$$

for all freely varying θ_{j,c}. If we use an informative prior based on the normal distribution, then each posterior draw is a normal truncated to the region defined for the uniform draw.
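A sketch of this uniform draw (the array layout is ours; categories are assumed coded 1, ..., k_j as in Section 2.1, with every category observed at least once):

```python
import numpy as np

def draw_cutpoints(W, Y, rng, n_cats):
    """Uniform draw of each free cutpoint theta_{j,c} between the largest
    latent value in category c and the smallest in category c+1 (sketch).
    W is q x n, Y is q x n integer-coded, n_cats[j] = k_j; theta_{j,1} stays 0
    because gamma_{j,1} = 0 implies theta_{j,1} = sqrt(v_j) * 0 = 0."""
    q = W.shape[0]
    theta = []
    for j in range(q):
        th = np.empty(n_cats[j] - 1)
        th[0] = 0.0                              # first cutpoint fixed for identifiability
        for c in range(2, n_cats[j]):            # free cutpoints theta_{j,2}, ...
            lo = W[j, Y[j] == c].max()
            hi = W[j, Y[j] == c + 1].min()
            th[c - 1] = rng.uniform(lo, hi)
        theta.append(th)
    return theta
```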

With the current draw from the expanded model, we simply remove the expanded parameters by rescaling to get the desired sample of interest. We use the following transformation:

$$Z_{i,j} = W_{i,j}/\sqrt{\Sigma_{j,j}}, \qquad R_{i,j} = \Sigma_{i,j}/\sqrt{\Sigma_{i,i}\Sigma_{j,j}}, \qquad \beta_{k,j} = \alpha_{k,j}/\sqrt{\Sigma_{j,j}}, \qquad \gamma_{j,c} = \theta_{j,c}/\sqrt{\Sigma_{j,j}}. \qquad (6)$$

Note that the transformation on the cutpoints maintains the required ordering constraint, because the members of a set for a single dimension are all scaled by the same positive value: the standard deviation for that dimension.
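Transformation (6) amounts to dividing everything by the latent standard deviations; a minimal sketch, assuming W and α are stored with rows indexed by response dimension:

```python
import numpy as np

def rescale_to_identified(W, Sigma, alpha, theta):
    """Map a draw from the expanded model back to the identified scale,
    following transformation (6); theta is a list of cutpoint arrays, one
    per response dimension j."""
    d = np.sqrt(np.diag(Sigma))             # latent standard deviations
    Z = W / d[:, None]                      # each row j divided by sqrt(Sigma_{jj})
    R = Sigma / np.outer(d, d)              # correlation matrix
    beta = alpha / d[:, None]               # row j scaled by its own dimension
    gamma = [th / d[j] for j, th in enumerate(theta)]
    return Z, R, beta, gamma
```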

In summary, we have the following MVP MCMC algorithm, starting with some set of latent variables and parameters Z^{(r−1)}, R^{(r−1)}, β^{(r−1)}, and γ^{(r−1)}:

1. Draw W^{(r)} | Z^{(r−1)}, R^{(r−1)}, β^{(r−1)}, and γ^{(r−1)}.
2. Draw Σ^{(r)} | W^{(r)}.
3. Draw α^{(r)} | W^{(r)}, Σ^{(r)}.
4. Draw θ^{(r)} | Y, W^{(r)}.
5. Transform according to (6).
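Assembled from the sketches in this section, the outer loop might look like the following schematic. Here draw_latent is a hypothetical wrapper that applies the truncated normal Gibbs step to each observation's box, and Y, X, n_cats, n_iter, and the initial values Z, R, beta, gamma are assumed given:

```python
# Schematic outer loop for the parameter-expanded MVP sampler (steps 1-5).
# draw_Sigma_alpha, draw_cutpoints, and rescale_to_identified are the sketches
# given earlier in this section; draw_latent is a hypothetical wrapper that
# runs gibbs_truncated_mvn over each observation's box of cutpoints.
import numpy as np

rng = np.random.default_rng(1)
draws = []
for r in range(n_iter):                                   # n_iter chosen by the user
    W = draw_latent(Z, R, beta, gamma, Y, rng)            # step 1
    Sigma, alpha = draw_Sigma_alpha(W, X, rng)            # steps 2 and 3
    theta = draw_cutpoints(W, Y, rng, n_cats)             # step 4
    Z, R, beta, gamma = rescale_to_identified(W, Sigma, alpha, theta)  # step 5
    draws.append((R.copy(), beta.copy(), [g.copy() for g in gamma]))
```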

Remark. A convenient way to choose starting values for the algorithm is to run at least one iteration of the univariate probit model on the marginal data. This simplifies the process of choosing starting points, because starting values of the latent variables are not required for the univariate probit algorithm. This results in a set of starting values for the latent variables and the cutpoints that fit together; that is, each latent variable appears in the correct hypercube that corresponds to the observed values. This process amounts to choosing a starting point based on the assumption of independence among the dimensions and seems to work well in practice. Other methods of choosing starting values should work just as well, although we have found this to be a straightforward approach.

2.6 Convergence Under Parameter Expansion

Consider the standard data augmentation setting. Let the observed data be y and the complete data be z. Let the model parameters be represented by θ. The basic components are the observed-data model f(y|θ) and the complete-data model f(y, z|θ). These models have the following relationship:

$$f(y \mid \theta) = \int f(y, z \mid \theta)\, dz. \qquad (7)$$

In addition, we have a prior for the parameters, π(θ). The approach proceeds by first drawing z ∼ f(z|y, θ) ∝ f(y, z|θ) and then drawing θ ∼ f(θ|y, z) ∝ f(y, z|θ)π(θ). We repeat this process until the Markov chain converges to the stationary distribution, whereupon we can use the sample of θ to obtain estimates of the quantities of interest.

Suppose that there is a parameter that can be identified in the complete-data model but is unidentifiable from the observed data alone. Call this parameter α, and let w be the complete data when α is included. We now can expand the complete-data model to f(y, w|θ, α) and still preserve the observed-data model. In mathematical terms, we have

$$\int p(y, w \mid \theta, \alpha)\, dw = f(y \mid \theta). \qquad (8)$$

Let the expanded parameter set have prior π(θ, α). If the marginal prior for θ resulting from this joint distribution is the same as the prior from the original data augmentation scheme, then the posterior for θ from the expanded model will be the same as the posterior for θ from the original model. In other words, if we preserve the prior for our regular parameters, then expanding the parameter set will not affect the posterior for our regular parameters. A more detailed explanation is given by Liu and Wu (1999), who provided several examples of parameter-expanded data augmentation schemes.

By choosing priors as we have done (specifically, our prior for Σ), we achieve the aforementioned property. The posterior distributions for our parameters are unchanged by expanding the parameters. The model and scheme presented here coincide with scheme 2 of Liu and Wu (1999). We draw the latent variable from a set of parameters that meet the usual restriction. We follow this with draws of both the regular and expanded parameters from the expanded model. The regular parameters are then used for the next latent variable draw. Liu and Wu (1999) presented the theoretical background for convergence in some detail. We simply offer some intuition into the convergence issues using their idea of orbits.

For any MVP model, an infinite number of parameter sets give the same observed model. The latent variable can be scaled along any and all dimensions, and the appropriate scaling of the parameters will not change the model. We call this infinite set of models the orbit. Any model in an orbit is equivalent to any other model in that orbit. In other words, the orbit determines the observed model, and the location within the orbit does not matter. Each iteration of the MCMC algorithm chooses an orbit and a location within that orbit. The transformation simply moves the model parameters to the place in the orbit that meets our requirements. It does not choose a different orbit, and thus maintains the observed model selected in that iteration.

Remark. The authors of the previously mentioned articles on parameter expansion developed the method to gain speed. In the EM setting, parameter expansion uses covariance adjustment to speed the convergence rate of the algorithm. In the data augmentation setting, the covariance adjustment hastens convergence to the stationary distribution. Both of these goals are achieved by introducing an unidentified parameter. In our setting, we instead use the idea of parameter expansion to simplify the estimation procedure and overcome model nonidentifiability. Specifically, this formulation allows for inference for the entire correlation matrix through the Gibbs sampler.

2.7 Prediction

An important aspect of using a statistical model is prediction. For example, consider a bivariate example in which a product undergoes a series of pass/fail tests. The probability of any failures gives the overall failure rate of the product. Alternatively, consider a trivariate example in which a product undergoes a series of tests to determine whether it is below, within, or above the specifications. The probability of being within specifications on all of the tests is obviously of interest. These probabilities are available for each iteration of the Markov chain.

At each iteration, we have a complete set of parameters: regression coefficients, covariance matrix, and cutpoints. With this information, we can determine various probabilities through several different approaches. The simplest approach is to estimate the probabilities through simulation. With a given set of parameters and covariates, multivariate normal variables can be simulated and the probabilities estimated. Embedding this approach into the MVP MCMC has some attractive properties. It will yield a sample of the desired probabilities that can be used to form point estimates or sample-based confidence intervals. (For more sophisticated techniques for evaluating the probabilities, see Gassmann, Deák, and Szántai 2002.)
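A minimal version of this simulation estimator for a single parameter draw (our sketch; we assume each gamma[j] stores the full cutpoint vector for response j, including the ∓∞ endpoints):

```python
import numpy as np

def cell_probability(beta, R, gamma, x, cell, rng, n_sim=10_000):
    """Monte Carlo estimate of P(Y = cell | x) for one parameter draw:
    simulate latent vectors Z ~ N(beta @ x, R) and count how often every
    coordinate falls in the interval implied by its category."""
    q = R.shape[0]
    mu = beta @ x                                   # q-vector of latent means
    Z = rng.multivariate_normal(mu, R, size=n_sim)
    inside = np.ones(n_sim, dtype=bool)
    for j in range(q):
        c = cell[j]                                 # category for response j
        lo, hi = gamma[j][c - 1], gamma[j][c]       # gamma[j] includes -inf and +inf
        inside &= (Z[:, j] > lo) & (Z[:, j] <= hi)
    return inside.mean()
```

Averaging this quantity over the retained MCMC draws yields a posterior sample of the probability itself, which is what makes the interval estimates mentioned above available essentially for free.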

2.8 Variable Selection

The stochastic search variable selection of George and McCulloch (1993) lends itself well to inclusion in the methodology discussed here. The key ingredient of their algorithm is the use of an indicator variable λ_i describing the presence or absence of covariate i in a regression model. The prior for the regression coefficients depends on the indicator variable. If λ_i = 1, then β_i is normally distributed with large variance and mean 0, allowing large values. Otherwise, the prior for β_i is normally distributed around 0 with small variance.

We now develop this idea for the current setting. First, we introduce a matrix of indicator variables λ, where λ_{j,k} = 1 if the model for response j includes covariate k. This parameter will be included in our MCMC procedure. The prior for β will be conditional on λ. For the present development, we treat the coefficients as independent a priori. Thus var(β_{j,k} | λ_{j,k}) = c_{j,k}^{2λ_{j,k}} τ_{j,k}², where τ_{j,k} is small and c_{j,k} is large to produce the desired effect. For this development, we use the automatic tuning parameter values described by Chipman (1998). We set c_{j,k} = 10 to indicate an order-of-magnitude difference between significant and insignificant effects. We set τ_{j,k} = 1/[3 × range(X_k)] for the reasons discussed by Chipman (1998). The combined prior for β is such that vec(β′) is qp-multivariate normal with mean 0 and covariance D, a diagonal matrix with the foregoing entries. For λ, we use the independent prior with P{λ_{j,k} = 1} = p_{j,k} a priori. All other priors remain the same, but the posteriors will change.

Because of the form of the prior for β, it cannot be integrated out of the likelihood, and the draw of the covariance will be dependent on the regression parameters. Thus, combining the likelihood with the priors, we have the following conditional posteriors:

$$\Sigma \mid Z, \beta \sim \mathrm{IW}\{n + q + 1,\; (Z - \beta X)(Z - \beta X)'\},$$
$$\operatorname{vec}(\alpha') \mid Z, \Sigma \sim N_{qp}\{VM,\; V\},$$
and
$$P\{\lambda_{j,k} = 1 \mid \beta\} = \frac{a}{a + b},$$

where V = [D^{−1} + (Σ^{−1} ⊗ XX′)]^{−1}, M = (Σ^{−1} ⊗ XX′) vec[(XX′)^{−1}XZ′], a = π{β | λ_{(j,k)}, λ_{j,k} = 1} p_{j,k}, and b = π{β | λ_{(j,k)}, λ_{j,k} = 0}(1 − p_{j,k}). The draws for the cutpoints are unchanged. The transformation is also unchanged, but no transformation is required for λ.
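Because the coefficients are independent a priori, a and b factor into coordinate-wise spike and slab normal densities, so the λ update can be vectorized; a sketch (the array shapes and names are ours):

```python
import numpy as np
from scipy.stats import norm

def update_lambda(beta, tau, c, p_incl, rng):
    """Gibbs update of the inclusion indicators.  beta, tau, c, and p_incl
    are q x p arrays holding beta_{j,k}, tau_{j,k}, c_{j,k}, and p_{j,k};
    with independent coefficient priors, a and b reduce to the spike and
    slab normal densities evaluated at each beta_{j,k}."""
    a = norm.pdf(beta, loc=0.0, scale=c * tau) * p_incl         # lambda = 1 (slab)
    b = norm.pdf(beta, loc=0.0, scale=tau) * (1.0 - p_incl)     # lambda = 0 (spike)
    prob1 = a / (a + b)
    return (rng.random(beta.shape) < prob1).astype(int)
```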

Cycling through the draws in the order described here will produce a Markov chain that includes the usual parameters and a sample of λ draws that can be used to perform variable selection. Each draw of λ represents a model choice. The models occurring most frequently indicate those variables that should be included in the model. (For more details about the overall procedure and analysis of the results, see George and McCulloch 1993.) We consider this procedure for the breast cancer survey example in the next section.

The foregoing posteriors are typical of what is seen when the prior for β differs from the flat prior and the parameter cannot be integrated out of the posterior. The draws are still relatively simple and from well-known distributions, but the correlation between consecutive draws is increased by the dependence of Σ on the last set of parameters.

3. EXAMPLES AND DATA ANALYSIS

In this section we analyze two examples and use them to illustrate our methodology and compare it with other methods in the literature.

3.1 Six Cities Example: Longitudinal Study on Air Pollution

The first example is the Six Cities longitudinal study on the health effects of air pollution (see Ware, Dockery, Spiro, Speizer, and Ferris 1984; Chib and Greenberg 1998). The data consist of repeated measurements of wheezing status of children in southern Ohio. The predictors are the child's age and the mother's smoking habit. The responses at each time point are binary, but the responses over time can be treated as multivariate data. In this case the correlation matrix represents a dependence structure over time rather than within items at the same time. By estimating the correlation matrix, we can characterize the nature of the temporal relationship in the longitudinal responses.

We reanalyze a subset of the data set, also considered by Chib and Greenberg (1998), containing repeated measurements of 537 children with a yes/no response regarding the occurrence of wheeze at ages 7, 8, 9, and 10 years. The covariates of interest were the child's age, centered at 9 years, and an indicator variable for the mother's smoking habit during the first year of the study.

As mentioned in Section 1, Chib and Greenberg (1998) fit a single linear model for the response across all four ages, using the child's age as both a dimension of the response and a covariate. They also used the smoking indicator and its interaction with age as covariates. For the purpose of comparison, we use both their algorithm and the algorithm presented here to fit a separate regression model for each response variable, using just the smoking indicator as a covariate,

$$E\{z_{i,j}\} = \beta_{j,0} + \beta_{j,1}x_{i,1}.$$


The key difference in the fitting procedures is the drawing of the correlations: the use of the Metropolis algorithm to draw the correlation parameters from Chib and Greenberg (1998) (CG) versus the use of parameter expansion (PX) and Gibbs sampling to draw the correlation matrix as a single unit, as presented here. The CG algorithm uses the tailored proposal density that they described. Both algorithms were run for 10,500 iterations, using 500 iterations as a burn-in period, and they resulted in similar point estimates,

$$\hat\beta_{PX} = \begin{pmatrix} -.9830 & .0112 \\ -1.0295 & .2194 \\ -1.0536 & .1662 \\ -1.2351 & .1499 \end{pmatrix}, \qquad \hat R_{PX} = \begin{pmatrix} 1 & .5920 & .5354 & .5731 \\ .5920 & 1 & .7004 & .5711 \\ .5354 & .7004 & 1 & .6415 \\ .5731 & .5711 & .6415 & 1 \end{pmatrix} \qquad (9)$$

and

$$\hat\beta_{CG} = \begin{pmatrix} -.9851 & .0065 \\ -1.0321 & .2204 \\ -1.0595 & .1718 \\ -1.2419 & .1554 \end{pmatrix}, \qquad \hat R_{CG} = \begin{pmatrix} 1 & .5944 & .5278 & .5692 \\ .5944 & 1 & .6970 & .5633 \\ .5278 & .6970 & 1 & .6413 \\ .5692 & .5633 & .6413 & 1 \end{pmatrix}. \qquad (10)$$

An important difference arises when we compare the autocorrelation functions of the sampled Markov chains for the correlation parameters in Figure 1. The autocorrelation functions for the samples produced by our PX algorithm (shown as dotted lines) show significant autocorrelations only to about 25 or 30 lags. At these lags, the samples from the CG algorithm still show much higher autocorrelations, close to .5. Indeed, the autocorrelation functions for the CG samples do not die off until lags of almost 100. For chains of the same length, the effective sample size for the PX algorithm will be much larger, allowing for more accurate point estimation and better computation of other quantities, such as parameter covariances. This analysis provides a strong, concrete example of parameter expansion in action, in terms of both providing a framework for drawing the correlation matrix as a single unit and improving convergence.

Figure 1. Autocorrelation functions (ACFs) for each of the correlation parameters in the Six Cities example. The solid lines are the ACFs for the Chib–Greenberg algorithm. The dotted lines represent the ACFs for the algorithm presented here.

Another advantage of the PX algorithm is its relative simplicity compared with the CG algorithm. The customized proposal density suggested by Chib and Greenberg (1998) requires some tuning of parameters, as well as the need to optimize over the proposal parameters during every iteration. A simpler approach to Metropolis sampling using a random-walk algorithm results in considerably higher autocorrelations. In contrast, the PX approach requires no algorithmic tuning while achieving desirable autocorrelation properties.
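For readers reproducing this kind of comparison, the autocorrelation functions and a crude effective sample size can be computed directly from the sampled chains; these helpers are our own illustration, not part of either algorithm:

```python
import numpy as np

def acf(x, max_lag=100):
    """Sample autocorrelation function of a scalar Markov chain."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    n = len(x)
    denom = np.dot(x, x)
    return np.array([np.dot(x[:n - k], x[k:]) / denom for k in range(max_lag + 1)])

def effective_sample_size(x, max_lag=100):
    """Crude effective sample size n / (1 + 2 * sum of positive-lag ACF),
    truncating the sum at the first nonpositive autocorrelation."""
    rho = acf(x, max_lag)[1:]
    cut = np.argmax(rho <= 0) if np.any(rho <= 0) else len(rho)
    return len(x) / (1.0 + 2.0 * rho[:cut].sum())
```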

Next, we use this example to examine the performance of the algorithm presented by Edwards and Allenby (2003). Although not designed specifically for this type of data, their draw of the correlation matrix can be used within this algorithm. The key difference in these procedures is that our algorithm transforms the draws within each iteration of the MCMC, whereas the algorithm of Edwards and Allenby (2003) proceeds using the untransformed draws to perform the MCMC, with the transformations implemented as a postprocessing step.

For the most part, the approach of Edwards and Allenby (2003) leads to results analogous to ours. Running their algorithm under the scenario described here leads to essentially identical posterior samples after the postprocessing transformation. Consideration of the untransformed trace is revealing. After the 10,500 iterations considered in this example, some of the diagonal elements of the untransformed covariance matrix are extremely large compared with others, but after postprocessing, this issue has no effect on the inference.

A note of caution, however: Trouble can arise when the chain is run for a long time. The discrepancies in the magnitude of the diagonal entries can possibly lead to a matrix that can no longer be numerically inverted. The problem is not with the inference, but with the numerical stability of the untransformed Markov chain. The problem seems likely to result from feedback between the draws of the covariance matrix and the latent variables. A large draw for an unidentified variance leads to a draw of the latent variables with large variance, which in turn can lead to another large draw of a variance parameter. Without a within-iteration transformation, such as that required by parameter expansion, nothing is available to correct this issue. Assessing the severity of this potential problem is difficult. For this example, this phenomenon is observed when the chain is run for the admittedly large number of 350,000 iterations. (Note that successful inference can be done with 10,500 iterations.) In other scenarios in which convergence may be slow, such as ordinal data with many levels and thus many cutpoints, this issue may be more problematic.

3.2 Breast Cancer Prevention Study

An ongoing study conducted by the Program for Improving Health Care Decisions at the University of Michigan is aimed at understanding how women make informed decisions on taking medication to reduce the risk of breast cancer. The particular medicine considered in this study is tamoxifen, a synthetic hormone pill used to treat breast cancer as well as to decrease the likelihood of getting breast cancer in women at high risk. To make an informed decision, women need to understand their baseline risk of breast cancer as well as the risks and benefits of tamoxifen prophylaxis. The main goal of the overall project is to identify the best ways to communicate such information.

Decision aids (i.e., communication methods) under study include the level of presentation detail and the use of different types of graphics for communicating risks. The researchers also are interested in the order in which information is presented (risk before/after benefit; risk perception before/after knowledge) and the framing of questions in terms of gain versus loss. The effects of these were studied in a pilot experiment. The original survey instrument included 38 questions covering a broad range of topics, with most of the responses being ordinal in nature. A subset of the survey instrument with selected questions, including questions on demographic information, is given in the Appendix. The latter include the effects of age, education, and income on the various responses.

In this section we analyze the outcomes of six survey questions based on individual demographic information (see the Appendix for the survey information). Data from 289 completed questionnaires were used in the analysis. The number of categories was collapsed to four due to data sparsity. (We also analyzed the original, uncollapsed data, as we discuss briefly at the end of this example.) Although the main purpose of the pilot study was to examine features of the survey, there is also interest in studying the preliminary results regarding attitudes and knowledge about breast cancer and tamoxifen treatment. We used the variable selection procedure to evaluate the significance of the demographic information on the respondents' choices. The covariates are standardized to place them on the same scale. The analysis used 10,000 iterations of the MCMC procedure.

To answer the questions about demographics, first consider Figure 2. Clearly, two models occur significantly more often than any others. The most frequent model, at about 17%, features no significant variables. Slightly less frequent, at about 12%, is a model featuring one significant variable: a family history of breast cancer has a significant effect on the respondent's perception of the likelihood of breast cancer compared with the average woman. The posterior mean for this coefficient is positive, indicating that a respondent with a family history of breast cancer believes that she is more likely to get breast cancer than the average woman. This is certainly a reasonable result to expect.

Alternatively, we can look at the marginal coefficient inclusion probabilities, also shown in Figure 2. The relative frequencies are important here, because the prior probability of inclusion can play a large role in the posterior probability. Seven coefficients stand out in this plot. Family history of breast cancer is potentially significant in questions 1 and 3 and possibly in questions 2 and 4, all of which are related to the perceived likelihood of or worry about developing breast cancer. Knowing someone else with breast cancer is also potentially significant in question 4. Age and education level are potentially significant in a respondent's assessment of the possible impact of breast cancer.

The posterior distributions for the potentially significant regression coefficients and the larger correlation parameters are shown in Figure 3. The dotted lines give the .025 and .975 quantiles for each distribution. The distributions for the regression coefficients include all draws regardless of the value of the inclusion parameter, resulting in some shrinkage. Despite this, some of the coefficients considered marginal by the inclusion parameters seem to be significantly different from 0 based on the given interval. All of the family history parameters shown here appear to be significantly positive, indicating an increase in perceived risk.

There are a few large positive correlations among the responses. The questions on perceived likelihood are highly correlated with one another, as perhaps expected. The question about worry is also correlated with this group, suggesting that those who perceive a high likelihood of developing breast cancer worry about it more than others (or that worrying leads to a high perception of the likelihood of development). Figure 3 suggests that these correlations are significantly different from 0.

The remaining coefficients and correlation parameters are not shown, to save space, but they indicate nothing particularly interesting or alarming.

Figure 2. (a) Sample probabilities for the 10 most frequent models from the breast cancer survey data. Models 1 and 2 are significantly more frequent than the others. (b) Marginal inclusion probabilities for each covariate.

Figure 3. Posterior distributions of potentially significant regression coefficients and correlation parameters for the breast cancer survey example. The dotted lines represent the .025 and .975 quantiles.

It is also important to note that the Markov chains for a small number of the cutpoints exhibit significant autocorrelation out to lags of almost 200. Cowles (1996) and Chipman and Hamada (1996) have noted this problem as well. When the data are analyzed without collapsing categories for the responses, the problem becomes worse (although in all other ways, this analysis produces similar results). This would seem to suggest that the problem in this example may be caused by the sparsity of the data relative to the number of category combinations, even in the collapsed case. Although it is beyond the scope of this article, a solution similar to that proposed by Cowles (1996) would merit further investigation in certain situations. That author uses a Metropolis–Hastings step to jointly update cutpoints and latent variables. The idea is to alleviate the restrictive effect that these variables can impose on each other. (The cutpoints bound the latent variables, which in turn bound the cutpoints.) This method should fit within the parameter expansion framework developed here.

3.3 Discussion

We also applied the MVP algorithm with variable selection to a simulated example involving spatially dependent data. The outputs consisted of several binary and several three-level responses. The free cutpoints for the three-level responses exhibited autocorrelation functions that were close to 0 at lag 10. This example suggests that large autocorrelation in the cutpoints is not a necessary consequence of this type of model. Because the data from this example were simulated, the example demonstrates the performance of the model with regard to known parameters. For all but one of the parameters (an insignificant correlation), the known true parameters fall between the .025 and .975 posterior quantiles.

These examples demonstrate some of the advantages of the proposed method. First, compared with the algorithm given by Chib and Greenberg (1998), the parameter expansion algorithm results in Markov chains with considerably less autocorrelation. The consequences for analysis are clear: The effective sample size for the parameter expansion algorithm is much larger, with greatly reduced uncertainty in the estimates.

Although the algorithm was not developed for a regression framework with variable selection, the methodology for the covariance/correlation matrix of Edwards and Allenby (2003) can be used in this context for comparison and leads to similar results. A contribution of the current work is to place the unscaled covariance draw, used both here and by Edwards and Allenby (2003), on firm theoretical ground.

4. CONCLUSION

In this article we developed a parameter-expanded MCMC procedure for the multivariate probit model. The estimation procedure uses draws of an unstructured covariance matrix from an inverse-Wishart distribution and rescales them to obtain draws of the desired correlation matrix. The formulation is helpful because it overcomes the identifiability constraints of the MVP model and also because it allows for easy MCMC implementation. Furthermore, the parameter expansion formulation facilitates fast convergence of the Markov chain.

ACKNOWLEDGMENTS

The authors thank Dr. Angela Fagerlin of the Program for Improving Health Care Decisions at the University of Michigan for providing the data for Application 1. Nair's research was supported in part by National Science Foundation grant DMS-0204247 and National Cancer Institute grant P50 CA101451. Bingham's research was supported by the Natural Sciences and Engineering Research Council of Canada.


APPENDIX: BREAST CANCER SURVEY SELECTION

A.1 Survey Questions

1. Compared with the average woman (your age and in your health), what are your chances of developing breast cancer in your lifetime?
0 (much less than average) – 1 – 2 – 3 (same as the average) – 4 – 5 – 6 – 7 (much higher than average)

2. How worried are you that you will develop breast cancer in your lifetime?
0 (not worried at all) – 1 – 2 – 3 – 4 – 5 (extremely worried)

3. If you were to choose not to take tamoxifen, how likely do you think you would be to get breast cancer in your lifetime?
0 (not at all likely) – 1 – 2 – 3 – 4 – 5 (extremely likely)

4. If you were to choose to take tamoxifen, how likely do you think you would be to get breast cancer in your lifetime?
0 (not at all likely) – 1 – 2 – 3 – 4 – 5 (extremely likely)

5. If you were to develop breast cancer, how much of an impact do you think it would have on the quality of the rest of your life?
0 (not very much impact at all) – 1 – 2 – 3 – 4 – 5 (a lot of impact)

6. Do you think that taking tamoxifen to prevent breast cancer is worth tamoxifen's potential health problems for you?
0 (no, for me it is definitely not worth the potential health problems) – 1 – 2 – 3 – 4 – 5 (yes, for me it is definitely worth the potential health problems)

A.2 Demographic Information

1. Age

2. What is the highest level of education you have completed?
1. none; 2. elementary school; 3. high school; 4. trade school; 5. some college but no degree; 6. associate degree; 7. bachelor's degree; 8. master's degree; 9. doctoral or professional degree

3. What is your annual household income before taxes?
1. less than $10,000; 2. $10,001–$25,000; 3. $25,001–$40,000; 4. $40,001–$60,000; 5. $60,001–$80,000; 6. $80,001–$100,000; 7. more than $100,000

4. Have you ever known anyone who has been diagnosed with breast cancer?

5. Has anyone in your immediate family ever been diagnosed with breast cancer?

[Received January 2006. Revised August 2007.]

REFERENCES

Agresti, A. (1984), Analysis of Ordinal Categorical Data, New York: Wiley.

Albert, J. H., and Chib, S. (1993), "Bayesian Analysis of Binary and Polychotomous Response Data," Journal of the American Statistical Association, 88, 669–679.

Albert, J. H., and Chib, S. (2001), "Sequential Ordinal Modeling With Applications to Survival Data," Biometrics, 57, 829–836.

Ashford, J. R., and Sowden, R. R. (1970), "Multivariate Probit Analysis," Biometrics, 26, 535–546.

Bock, R. D., and Gibbons, R. D. (1996), "High-Dimensional Multivariate Probit Analysis," Biometrics, 52, 1183–1194.

Chib, S., and Greenberg, E. (1998), "Analysis of Multivariate Probit Models," Biometrika, 85, 347–361.

Chipman, H. A. (1998), "Fast Model Search for Designed Experiments With Complex Aliasing," in Quality Improvement Through Statistical Methods, ed. B. Abraham, Boston: Birkhauser, pp. 207–220.

Chipman, H., and Hamada, M. (1996), "Bayesian Analysis of Ordered Categorical Data From Industrial Experiments," Technometrics, 38, 1–10.

Cowles, M. K. (1996), "Accelerating Monte Carlo Markov Chain Convergence for Cumulative-Link Generalized Linear Models," Statistics and Computing, 6, 101–111.

Edwards, Y. D., and Allenby, G. M. (2003), "Multivariate Analysis of Multiple Response Data," Journal of Marketing Research, XL, 321–334.

Fay, J. W. J. (1957), "The National Coal Board's Pneumoconiosis Research," Nature, 180, 309.

Gassmann, H. I., Deák, I., and Szántai, T. (2002), "Computing Multivariate Normal Probabilities: A New Look," Journal of Computational and Graphical Statistics, 11, 920–949.

George, E. I., and McCulloch, R. E. (1993), "Variable Selection via Gibbs Sampling," Journal of the American Statistical Association, 88, 881–889.

Gupta, A. K., and Nagar, D. D. (2000), Matrix Variate Distributions, Boca Raton, FL: Chapman & Hall/CRC.

Imai, K., and van Dyk, D. A. (2005), "A Bayesian Analysis of the Multinomial Probit Model Using Marginal Data Augmentation," Journal of Econometrics, 124, 311–334.

Johnson, V. E., and Albert, J. H. (1999), Ordinal Data Modeling, New York: Springer.

Likert, R. (1932), "A Technique for the Measurement of Attitudes," Archives of Psychology, 140, 1–55.

Liu, C. (2001), "Discussion on the Art of Data Augmentation," Journal of Computational and Graphical Statistics, 10, 75–81.

Liu, C., Rubin, D., and Wu, Y. (1998), "Parameter Expansion to Accelerate EM: The PX–EM Algorithm," Biometrika, 85, 755–770.

Liu, J. S., and Wu, Y. (1999), "Parameter Expansion for Data Augmentation," Journal of the American Statistical Association, 94, 1264–1274.

McCullagh, P. (1980), "Regression Models for Ordinal Data," Journal of the Royal Statistical Society, Ser. B, 42, 109–142.

McCullagh, P., and Nelder, J. A. (1989), Generalized Linear Models, Boca Raton, FL: Chapman & Hall/CRC.

McCulloch, R. E., and Rossi, P. E. (1994), "An Exact Likelihood Analysis of the Multinomial Probit Model," Journal of Econometrics, 64, 217–228.

McCulloch, R. E., Polson, N. G., and Rossi, P. E. (2000), "A Bayesian Analysis of the Multinomial Probit Model With Fully Identified Parameters," Journal of Econometrics, 99, 173–193.

Ochi, Y., and Prentice, R. L. (1984), "Likelihood Inference in a Correlated Probit Regression Model," Biometrika, 71, 531–543.

Ware, J. H., Dockery, D. W., Spiro, A. III, Speizer, F. E., and Ferris, B. G. Jr. (1984), "Passive Smoking, Gas Cooking, and Respiratory Health in Children Living in Six Cities," American Review of Respiratory Disease, 129, 366–374.
