a nested dirichlet process analysis of cluster randomized trial data with application in geriatric...

This article was downloaded by: [Universite Laval] On: 09 July 2014, At: 17:21 Publisher: Taylor & Francis Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK Journal of the American Statistical Association Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/uasa20 A Nested Dirichlet Process Analysis of Cluster Randomized Trial Data With Application in Geriatric Care Assessment Man-Wai Ho a , Wanzhu Tu b , Pulak Ghosh c & Ram C. Tiwari a a Department of Statistics and Applied Probability , National University of Singapore, 6 Science Drive 2 , S117546 , Republic of Singapore b Department of Medicine , Indiana University School of Medicine, 410 West 10th Street, Suite 3000 , Indianapolis , IN , 46202 c Department of Quantitative Management and Information Sciences , Indian Institute of Management, Bannerghatta road , Bangalore , 560076 , India d Office of Biostatistics Center for Drug Evaluation and Research , Food and Drug Development Administration, 10903 New Hampshire Ave., WO Bldg. 21, Rm. 3524 , Silver Spring , MD , 20993-0002 Accepted author version posted online: 08 Oct 2012.Published online: 15 Mar 2013. To cite this article: Man-Wai Ho , Wanzhu Tu , Pulak Ghosh & Ram C. Tiwari (2013) A Nested Dirichlet Process Analysis of Cluster Randomized Trial Data With Application in Geriatric Care Assessment, Journal of the American Statistical Association, 108:501, 48-68, DOI: 10.1080/01621459.2012.734164 To link to this article: http://dx.doi.org/10.1080/01621459.2012.734164 PLEASE SCROLL DOWN FOR ARTICLE Taylor & Francis makes every effort to ensure the accuracy of all the information (the “Content”) contained in the publications on our platform. 
However, Taylor & Francis, our agents, and our licensors make no representations or warranties whatsoever as to the accuracy, completeness, or suitability for any purpose of the Content. Any opinions and views expressed in this publication are the opinions and views of the authors, and are not the views of or endorsed by Taylor & Francis. The accuracy of the Content should not be relied upon and should be independently verified with primary sources of information. Taylor and Francis shall not be liable for any losses, actions, claims, proceedings, demands, costs, expenses, damages, and other liabilities whatsoever or howsoever caused arising directly or indirectly in connection with, in relation to or arising out of the use of the Content. This article may be used for research, teaching, and private study purposes. Any substantial or systematic reproduction, redistribution, reselling, loan, sub-licensing, systematic supply, or distribution in any form to anyone is expressly forbidden. Terms & Conditions of access and use can be found at http:// www.tandfonline.com/page/terms-and-conditions



A Nested Dirichlet Process Analysis of Cluster Randomized Trial Data With Application in Geriatric Care Assessment

Man-Wai HO, Wanzhu TU, Pulak GHOSH, and Ram C. TIWARI

In cluster randomized trials, patients seen by the same physician are randomized to the same treatment arm as a group. Besides the natural clustering of patients due to cluster/group randomization, interactions between an individual patient and the attending physician within the group could just as well influence patient care outcomes. Despite the intuitive relevance of these interactions to treatment assessment, few studies have thus far examined their influences. Whether and to what extent these interactions affect assessment of the treatment effect remains unexplored. In fact, few statistical models provide ready accommodation for such interactions. In this research, we propose a general modeling framework based on the nested Dirichlet process (nDP) for assessing treatment effect in cluster randomized trials. The proposed methodology explicitly accounts for physician–patient interactions by assuming that the interactions follow unspecified group-specific distributions from an nDP. In addition to accounting for physician–patient interactions, the model has greatly enhanced the flexibility of traditional mixed effect models by allowing for nonnormally distributed random effects, thus alleviating concerns about mixed effect misspecification and sidestepping verification of distributional assumptions on random effects. At the same time, the model retains the mixed models’ ability to make inferences on fixed effects. The proposed method is easily extendable to more complicated hierarchical clustering structures. We introduce the method in the context of a real cluster randomized trial. A comprehensive simulation study was conducted to assess the operating characteristics of the proposed nDP model.

KEY WORDS: Clustered data; Random effects distribution; Treatment effect

1. INTRODUCTION

1.1 Methodological Overview

Cluster randomized trials are common in clinical investigation. The study design is often used in trials evaluating system-wide or physician-level interventions where all patients within the same clinic or seen by the same physician receive the same care. While design considerations vary from study to study, investigators tend to choose the design for practical reasons, such as convenience of administration, inability to deliver intervention at patient level, fear of cross-treatment contamination, cost of implementation, and so on. General features of cluster randomized trials have been studied and described in, for example, Donner and Klar (2000), Campbell, Mollison, and Grimshaw (2001), and Bland (2004). In addition to the latest Consolidated Standards of Reporting Trials (CONSORT) statement on quality of general parallel group randomized trials (Moher, Schulz, and Altman 2001a, b), specific analytical and reporting requirements have been developed for cluster randomized trials by Campbell, Elbourne, and Altman (2004). While there is a general agreement on the need to account for clustering effects, there are no specific and evidence-based recommendations on how this goal should be achieved. In practice, mixed effect models remain the mainstay analytical tool for dealing with clustered data (Brown and Prescott 1999; Verbeke and Molenberghs 2000). Random cluster effects in these models serve the purpose of accommodating the homogeneity of the elements within the cluster and heterogeneity across the clusters. By introducing these random effects into the model, one hopes to achieve valid assessments of the true treatment effect.

Despite the increasing popularity of mixed effect models in analyzing clustered data, two issues could be raised about their use in cluster randomized trial analysis: (1) mixed models lack accommodation for interaction between the randomization unit and the specific patient within the unit and (2) it is inherently difficult to verify any distributional assumptions on the random effects. The first issue will be particularly relevant if the intervention is delivered by the randomization unit. For example, in a physician randomized trial, if the intervention/treatment is delivered by the attending physician or requires active participation of the physician, it may not be justified to assume that all patients cared for by the same physician respond to the intervention in a similar fashion, as implied by shared random cluster effects in mixed models. To our knowledge, few studies have explored the accommodation and impact of such patient–physician interactions. The second issue stems from the common practice of modeling random effects by normal distributions. Routine use of normally distributed random effects in mixed models has been questioned by a number of authors (Rashid 2003; Agresti, Caffo, and Ohman-Strickland 2004; Ohlssen, Sharples, and Spiegelhalter 2007; Branscum and Hanson 2008; Higgins, Thompson, and Spiegelhalter 2009). Since any deviation from normality represents a form of model misspecification, chances are real that such misspecified random effects distributions could affect the validity and efficiency of inferences concerning the fixed effects and the treatment effect. Unfortunately, the normality assumption of the random effects distribution is not easily verifiable. Attempts have been made to verify or to relax the normal random effects assumption (Tchetgen and Coull 2006; Di and Bandeen-Roche 2011). Also, testing the adequacy of the normality assumption or any other parametric assumption for the random effects remains a challenge unless one is willing to assume that the model is otherwise correctly specified, and the latter assumption itself is typically unverifiable!

Man-Wai Ho is Assistant Professor, Department of Statistics and Applied Probability, National University of Singapore, 6 Science Drive 2, S117546, Republic of Singapore (E-mail: [email protected]). Wanzhu Tu is Professor, Department of Medicine, Indiana University School of Medicine, 410 West 10th Street, Suite 3000, Indianapolis, IN 46202 (E-mail: [email protected]). Pulak Ghosh is Professor, Department of Quantitative Management and Information Sciences, Indian Institute of Management, Bannerghatta road, Bangalore, 560076 India (E-mail: [email protected]). Ram C. Tiwari is Associate Director, Office of Biostatistics Center for Drug Evaluation and Research, Food and Drug Development Administration, 10903 New Hampshire Ave., WO Bldg. 21, Rm. 3524, Silver Spring, MD 20993-0002 (E-mail: [email protected]). This research was supported by the Ministry of Education’s AcRF Tier 1 funding (WBS number: R-155-000-103-112) from Republic of Singapore, and partially supported by grant RO1 HL095086 from the National Institutes of Health, USA. The third author (P. G.) acknowledges the support of the DST grant (#SR?S4/MS:648/10) from the Government of India. The authors thank the Editor, the Associate Editor, and the anonymous reviewers for their many constructive suggestions in improving the quality of this article.

© 2013 American Statistical Association. Journal of the American Statistical Association, March 2013, Vol. 108, No. 501, Applications and Case Studies. DOI: 10.1080/01621459.2012.734164

It should be noted that a number of remedies have been proposed to alleviate the concern of nonnormal mixed effects distributions. For example, Bohning (2000) advocated the use of mixture distributions to account for the clustering effect. Rashid (2003) proposed to minimize a sum of Jackel (1972) type dispersion functions based on intracluster ranks of residuals. Lee and Thompson (2007) proposed to use skewed distributions to reduce the effects of outlying clusters. Although parametric distributions such as the heavy-tailed t-distribution may provide some relief in certain data applications, they are fundamentally limited due to their unverifiability.

A Bayesian semiparametric approach offers perhaps a more promising alternative. There have been a few published works using Bayesian nonparametric approaches, particularly based on Dirichlet process (DP) priors, in modeling the random effects distribution (Walker and Mallick 1997; Burr et al. 2003; Burr and Doss 2005; Ohlssen, Sharples, and Spiegelhalter 2007; Branscum and Hanson 2008; Dunson 2008). A DP is parameterized by a precision parameter and a baseline probability distribution, which may be taken to be normal. Discrete mass points are drawn from the baseline distributions, and the closeness of the resulting discrete distribution to the baseline depends on the value of the precision parameter. Thus, the fitted random effects distribution using DP is more flexible and the method is also more robust against departures from a normal distribution. At the same time, it would retain good performance if the actual distribution is normal.

Built upon these earlier works, we herein propose a broader class of random effects distributions for clustered data using the recently developed nested Dirichlet process (nDP; Rodríguez, Dunson, and Gelfand 2008). By introducing this class of nonparametric distributions, we can directly accommodate and model the physician–patient interaction without any parametric assumption. For the convenience of discussion, we introduce the proposed methodology in the context of cluster randomized trials. In particular, we consider a trial aimed at improving mental health outcomes in vulnerable older adults.

The remainder of the article is organized as follows. In the next two sections, we introduce a cluster randomized trial of a geriatric care management program to highlight the motivations for this research and give a brief background on the nDP. In Section 2, we present the modeling framework using the nDP. Section 3 discusses the posterior and computational approach. In Section 4, we illustrate the performance of our proposed model through a comprehensive simulation study. Section 5 presents the results from the geriatric care management trial, followed by a brief discussion and conclusion in Section 6.

1.2 Geriatric Resources for Assessment and Care of Elders (GRACE) Study

Older inner-city residents represent a highly vulnerable population for various health care outcomes. Together, this group of people accounts for a disproportionately large share of health care expenditures including high rates of acute care utilization (MedPAC 2005). The reasons that senior inner-city residents are vulnerable are complex, but socioeconomic stress, low health literacy, chronic medical conditions, and limited access to health care are often cited as contributors to the vulnerability (Callahan et al. 1998). Because of these reasons, older adults, especially those in poverty, often do not receive recommended care and preventive services, or chronic disease and geriatric syndrome management (Jencks et al. 2000; Asch et al. 2000; Wenger et al. 2003).

Past experience has shown that simple or single-dimensional interventions yield only limited improvements (Boult, Boult, and Pacala 1998). Lessons from prior efforts have been disseminated and new multi-dimensional assessment and intervention strategies are being developed to focus more on providing interdisciplinary assessment and consultation that integrate care delivered by hospital attending physicians (Landefeld et al. 1995; Counsell et al. 2000; Reuben 2002; Wolff and Boult 2005). These interventions are typically delivered by physicians, but also require the participation of medical specialists, hospital nurses, care managers, and social workers, with the attending physician overseeing the effort of the whole team/group of experts. Because of the unique mode of intervention delivery, assessments of such interventions have some fundamental differences from traditional drug trials: in trials of pharmacological agents, interventions are delivered at patient level and outcomes are as well assessed at patient level. Care management interventions, on the other hand, are implemented by care providers, and outcomes are assessed from individual patients. As a result, patients seen by the same team are likely to be correlated, and outcomes depend not only on different care teams and different patients, but also on the interactions between the team and the patient.

In this article, we consider a trial of a care management intervention (Counsell et al. 2007). It involves 749 patients, each older than 65 years of age, an established patient (defined as at least one visit to a primary care physician at the same site within the past 12 months), and with an income less than 200% of the federal poverty level (defined as qualifying for Medicaid coverage or being enrolled in the county medical assistance plan). Exclusion criteria included residence in a nursing home, living with a study participant already enrolled in the trial, enrollment in another research study, receiving dialysis, severe hearing loss, English-language barrier, no access to a telephone, or severe cognitive impairment (defined by Short Portable Mental Status Questionnaire score ≤5; Pfeiffer 1975). The intervention includes an advanced practice nurse and social worker (intervention support team) who care for older patients in collaboration with the patient’s primary care physician and a geriatric interdisciplinary team. Intervention physicians and their support teams, referred to as physician teams, conduct thorough geriatric assessments and propose care plans that are consistent with the patient’s goals through face-to-face discussion (usually at the patient’s home and occasionally in the office, hospital, or nursing home) and telephone contacts with patients, family members or care-givers, and health care professionals. Group/cluster randomization was conducted at the physician team level. The intervention was delivered over a 1 year period. The purpose of the trial was to assess the efficacy of the intervention on various health outcomes; the primary outcome of the trial was measured by the 36-Item Short-Form (SF-36) scales. For the simplicity of this discussion, we focused on the Mental Component Score (MCS) of the SF-36 instrument (Ware, Kosinski, and Keller 1994). Specifically, MCS was measured at baseline and at closeout. Independent variables of interest are treatment assignment, age, sex, race, and medical comorbidities, including hypertension, angina, stroke, diabetes, chronic obstructive pulmonary disease (COPD), and cancer. The investigators hypothesize that the intervention would improve MCS outcome (measured as a change between baseline and 12 months). As discussed previously, although the main objective is to assess the intervention effect, the care delivery mode of the intervention is likely to result in physician and patient interactions that significantly alter the treatment outcome. We, therefore, consider a model that explicitly accommodates these physician–patient interactions.

1.3 Background on the Nested Dirichlet Process

Since Ferguson (1973) described the DP as a prior distribution on the space of probability distributions, DP has been used with increased frequency in nonparametric Bayes estimation (see, e.g., Antoniak 1974; Lo 1978, 1984; Escobar 1988, 1994; Escobar and West 1995; Ghosh, Basu, and Tiwari 2009). Let $\mathrm{DP}(\alpha H)$ denote a DP with baseline probability measure $H$ and precision/control parameter $\alpha > 0$. Replacing $H$ by another DP, Rodríguez (2007) and Rodríguez, Dunson, and Gelfand (2008) introduced the nDP, which, from a similar perspective, can be characterized as a prior distribution on the joint space of a collection of dependent distributions. The nDP provides a framework to model collections of dependent distributions using clustering features of the DP. A collection $\{F_j, j = 1, \ldots, C\}$ of distributions on any complete and separable metric space $\Omega$ such that $F_j \sim Q$ with $Q \equiv \mathrm{DP}(\alpha\,\mathrm{DP}(\rho H))$, for $\alpha, \rho > 0$, and $H$ being a probability measure on $\Omega$, is said to follow an nDP. We write it as $\{F_1, \ldots, F_C\} \sim \mathrm{nDP}(\alpha, \rho, H)$. The stick-breaking characterization of the DP (Sethuraman and Tiwari 1982; Sethuraman 1994) implies that

$$F_j(\cdot) \sim Q \equiv \sum_{k=1}^{\infty} \pi_k^{*}\, \delta_{F_k^{*}}(\cdot) \tag{1}$$

and

$$F_k^{*}(\cdot) \equiv \sum_{l=1}^{\infty} \omega_{lk}^{*}\, \delta_{\beta_{lk}^{*}}(\cdot), \tag{2}$$

with $\beta_{lk}^{*} \overset{\text{iid}}{\sim} H$,

$$\omega_{lk}^{*} = u_{lk}^{*} \prod_{s=1}^{l-1} (1 - u_{sk}^{*}), \qquad u_{lk}^{*} \overset{\text{iid}}{\sim} \mathrm{beta}(1, \rho),$$

and

$$\pi_k^{*} = v_k^{*} \prod_{s=1}^{k-1} (1 - v_s^{*}), \qquad v_k^{*} \overset{\text{iid}}{\sim} \mathrm{beta}(1, \alpha),$$

where $\mathrm{beta}(a, b)$ represents a beta probability distribution with parameters $a$ and $b$ on the $(0, 1)$ interval. The nDP naturally induces clustering in the space of distributions as a consequence of the almost sure discreteness of $Q$, illustrated by (1). Specifically, there is a nonzero probability $1/(1 + \alpha)$ that the two distributions $F_j$ and $F_{j'}$ follow the same random distribution $F_k^{*}$ defined by (2). Furthermore, nDP enables clustering between samples from the distributions in the collection. That is, samples from one single $F_j$, or from $F_j$ and $F_{j'}$, $j \neq j'$, are correlated, and possibly identical. As $F_k^{*}$ is almost surely discrete as defined in (2), sample values $\beta_{ij}$ and $\beta_{i'j}$ from $F_j$ may be identical to some $\beta_{lk}^{*}$ if $F_j = F_k^{*}$, while the correlation is given by $\mathrm{corr}(\beta_{ij}, \beta_{i'j}) = 1/(1 + \rho)$. In analogy, respective sample values $\beta_{ij}$ and $\beta_{i'j'}$ from two different distributions $F_j$ and $F_{j'}$, $j \neq j'$, may be identical to some $\beta_{lk}^{*}$ if $F_j = F_{j'} = F_k^{*}$, while the correlation can be shown to be $\mathrm{corr}(\beta_{ij}, \beta_{i'j'}) = 1/[(1 + \alpha)(1 + \rho)]$, which is always less than the correlation $1/(1 + \rho)$ between two samples from the same $F_j$. See Rodríguez (2007) and Rodríguez, Dunson, and Gelfand (2008) for more detailed discussion on nDP.
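The co-clustering probability $1/(1+\alpha)$ quoted in this section can be recovered from the stick-breaking weights in (1). The following is a short sketch of that calculation rather than a derivation taken from the article; it assumes the atoms $F_k^{*}$ are almost surely distinct, so that two distributions coincide exactly when they select the same index $k$, and it uses the standard moments $E[v^2] = 2/[(1+\alpha)(2+\alpha)]$ and $E[(1-v)^2] = \alpha/(2+\alpha)$ of $v \sim \mathrm{beta}(1, \alpha)$.

```latex
\begin{align*}
P(F_j = F_{j'})
  &= \sum_{k=1}^{\infty} E\left[(\pi_k^{*})^2\right]
   = \sum_{k=1}^{\infty} E\left[(v_k^{*})^2\right]
     \prod_{s=1}^{k-1} E\left[(1 - v_s^{*})^2\right] \\
  &= \frac{2}{(1+\alpha)(2+\alpha)}
     \sum_{k=1}^{\infty} \left(\frac{\alpha}{2+\alpha}\right)^{k-1}
   = \frac{2}{(1+\alpha)(2+\alpha)} \cdot \frac{2+\alpha}{2}
   = \frac{1}{1+\alpha}.
\end{align*}
```

The same argument applied to the inner weights, with $\rho$ in place of $\alpha$, gives the probability $1/(1+\rho)$ that two draws from a common $F_k^{*}$ share an atom, which matches the within-distribution correlation stated in the text.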

2. MODEL AND METHODS

In this section, we introduce the basic model assumptions in the context of the previously described trial. In a generic form, we let $Y_{ijt}$ be the response/outcome of the $i$th patient cared for by the $j$th physician/physician team (or, in general, from the $j$th group/cluster) under the $t$th intervention, with $i = 1, 2, \ldots, n_j$, $j = 1, 2, \ldots, C$, $t = 1, 2, \ldots, T$. We assume that

$$Y_{ijt} = \theta_t + \beta_{ij} + w_{ij}'\gamma + e_{ijt}, \tag{3}$$

where $\theta_t$ is the fixed effect of the $t$th intervention, $\beta_{ij}$ denotes the interaction effect between the $i$th patient and the $j$th physician, $\gamma = (\gamma_1, \ldots, \gamma_r)$ is an $(r \times 1)$-dimensional parameter vector of covariates associated with $(r \times 1)$-dimensional design vectors $w_{ij} = (w_{ij1}, w_{ij2}, \ldots, w_{ijr})$, and $e_{ijt} \mid \tau \overset{\text{iid}}{\sim} N(0, \tau^{-1})$ is a normally distributed error or noise. Furthermore, we assume that for any $j$, $\beta_{ij}$ follows an unknown physician effect distribution or, simply, interaction distribution, denoted by $F_j$, and that the collection of all $F_j$, $j = 1, \ldots, C$, follows an nDP, allowing for a nested patient effect within physician. That is, for $j = 1, \ldots, C$,

$$(\beta_{ij} \mid F_j) \overset{\text{iid}}{\sim} F_j, \qquad i = 1, \ldots, n_j,$$

$$(\{F_1, \ldots, F_C\} \mid \alpha, \rho, \mu_\beta, \tau_\beta) \sim \mathrm{nDP}(\alpha, \rho, H), \qquad \text{with } H = N(\mu_\beta, \tau_\beta^{-1}),$$

for some $-\infty < \mu_\beta < \infty$ and $\tau_\beta > 0$. This formulation for the responses has the following implications:


1. Heteroscedasticity: Outcomes from patients cared for by different physicians form different clusters.

2. Exchangeability: Interactions between the ith patient and jth physician, βij, are independent and identically distributed within each intervention t = 1, . . . , T, given all the physician effect distributions F1, . . . , FC.

3. Random individual physician effect: Individual physician effect on responses is depicted by a completely unknown physician effect distribution rather than a single parameter.

4. Borrowing information across physicians: With shared physician effect distributions, the nested DP allows for the borrowing of information about patients across physicians/clusters.

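Based on standard choices of the prior distributions for the unknown parameters and given observed covariates $(w_{11}, \ldots, w_{n_1 1}, \ldots, w_{1C}, \ldots, w_{n_C C})$, we assume that the observed outcomes of the response $Y = (Y_1, \ldots, Y_T)$, with $Y_t = (Y_{11t}, \ldots, Y_{n_1 1 t}, \ldots, Y_{1Ct}, \ldots, Y_{n_C C t})$, are realizations from the following hierarchical model, hereafter referred to as the nDP model:

$$
\begin{aligned}
(Y_{ijt} \mid \theta_t, \beta_{ij}, \gamma, \tau) &\overset{\text{ind}}{\sim} N(\theta_t + \beta_{ij} + w_{ij}'\gamma,\ \tau^{-1}), \quad j = 1, \ldots, C,\ i = 1, \ldots, n_j, \\
\theta_t &\overset{\text{iid}}{\sim} N(0, \sigma_\theta^2), \quad t = 1, \ldots, T, \\
(\beta_{ij} \mid F_j) &\overset{\text{iid}}{\sim} F_j, \quad i = 1, \ldots, n_j, \\
(\{F_1, \ldots, F_C\} \mid \alpha, \rho, \mu_\beta, \tau_\beta) &\sim \mathrm{nDP}(\alpha, \rho, H), \quad \text{with } H = N(\mu_\beta, \tau_\beta^{-1}), \\
\alpha &\sim \mathrm{gamma}(a_\alpha, b_\alpha), \\
\rho &\sim \mathrm{gamma}(a_\rho, b_\rho), \\
\mu_\beta &\sim N(0, \sigma_\mu^2), \\
\tau_\beta &\sim \mathrm{gamma}(a_{\tau_\beta}, b_{\tau_\beta}), \\
\gamma &\sim \mathrm{MN}_r(0, \Sigma_\gamma), \\
\tau &\sim \mathrm{gamma}(a_\tau, b_\tau),
\end{aligned}
\tag{4}
$$

where $\sigma_\theta^2, a_\alpha, b_\alpha, a_\rho, b_\rho, \sigma_\mu^2, a_{\tau_\beta}, b_{\tau_\beta}, a_\tau, b_\tau$ are fixed positive constants, $\Sigma_\gamma$ is a known $r \times r$ variance-covariance matrix, $\mathrm{gamma}(a, b)$ denotes a gamma distribution with shape parameter $a > 0$ and rate (inverse scale) parameter $b > 0$, so that its mean is $a/b$, and $\mathrm{MN}_r(0, \Sigma_\gamma)$ represents an $r$-variate normal distribution with zero mean vector and covariance matrix $\Sigma_\gamma$. The nDP model can be thought of as a generalization of the standard normal mixed effects model, which is defined as

$$
\begin{aligned}
(Y_{ijt} \mid \theta_t, \beta_{ij}, \gamma, \tau) &\overset{\text{ind}}{\sim} N(\theta_t + \beta_{ij} + w_{ij}'\gamma,\ \tau^{-1}), \quad j = 1, \ldots, C,\ i = 1, \ldots, n_j, \\
\theta_t &\overset{\text{iid}}{\sim} N(0, \sigma_\theta^2), \quad t = 1, \ldots, T, \\
(\beta_{ij} \mid \mu_{\beta j}, \tau_{\beta j}) &\overset{\text{iid}}{\sim} N(\mu_{\beta j}, \tau_{\beta j}^{-1}), \\
\mu_{\beta j} &\overset{\text{iid}}{\sim} N(0, \sigma_\mu^2), \\
\tau_{\beta j} &\overset{\text{iid}}{\sim} \mathrm{gamma}(a_{\tau_\beta}, b_{\tau_\beta}), \\
\gamma &\sim \mathrm{MN}_r(0, \Sigma_\gamma), \\
\tau &\sim \mathrm{gamma}(a_\tau, b_\tau).
\end{aligned}
\tag{5}
$$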
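To make the generative structure of the normal mixed effects model (5) concrete, the sketch below simulates outcomes for a handful of clusters using only the Python standard library. It is an illustration under stated assumptions, not the authors' code: the function name, the `arm` list that assigns each cluster to one intervention, the standard normal covariates, and all numeric values are hypothetical, and $\theta$, $\gamma$, and $\tau$ are held fixed instead of being drawn from their priors. The gamma draw is parameterized so that its mean is $a/b$, as stated in the text.

```python
import random

def simulate_normal_mixed_model(cluster_sizes, arm, theta, gamma, tau,
                                sigma_mu, a_tau_beta, b_tau_beta, rng):
    """Simulate outcomes under model (5) with theta, gamma, tau held fixed.

    cluster_sizes[j] is n_j and arm[j] is the intervention index t of
    cluster j (randomization is at the cluster level).
    """
    rows = []
    for j, n_j in enumerate(cluster_sizes):
        # Cluster-specific mean and precision of the random effects.
        mu_j = rng.gauss(0.0, sigma_mu)
        # random.gammavariate takes (shape, scale); scale 1/b gives mean a/b.
        tau_j = rng.gammavariate(a_tau_beta, 1.0 / b_tau_beta)
        for i in range(n_j):
            # Hypothetical covariates: independent standard normal values.
            w_ij = [rng.gauss(0.0, 1.0) for _ in gamma]
            beta_ij = rng.gauss(mu_j, tau_j ** -0.5)
            mean = theta[arm[j]] + beta_ij + sum(w * g for w, g in zip(w_ij, gamma))
            y = rng.gauss(mean, tau ** -0.5)
            rows.append({"cluster": j, "patient": i, "arm": arm[j], "w": w_ij, "y": y})
    return rows

rng = random.Random(1)
rows = simulate_normal_mixed_model(
    cluster_sizes=[4, 6, 5], arm=[0, 1, 0], theta=[0.0, 2.0],
    gamma=[0.5, -0.3], tau=4.0, sigma_mu=1.0,
    a_tau_beta=2.0, b_tau_beta=2.0, rng=rng)
```

Every patient in a cluster shares the same pair $(\mu_{\beta j}, \tau_{\beta j})$, which is the normality restriction that the nDP model (4) relaxes by replacing the cluster-specific normal distribution with an unspecified $F_j$.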

3. POSTERIORS

Following Rodríguez (2007) and Rodríguez, Dunson, and Gelfand (2008), we replace the stick-breaking representations of the DP priors for both $F_j$ and $F_k^{*}$ given in (1) and (2) by their almost sure truncation approximations, which are finite sums of $K$ and $L$ elements, respectively. That is,

$$F_j(\cdot) \approx \sum_{k=1}^{K} \pi_k^{*}\, \delta_{F_k^{*}}(\cdot), \tag{6}$$

where $\pi_k^{*} = v_k^{*} \prod_{s=1}^{k-1} (1 - v_s^{*})$ with $v_k^{*} \overset{\text{iid}}{\sim} \mathrm{beta}(1, \alpha)$, for $k = 1, \ldots, K - 1$, and $v_K^{*} = 1$, and, for $k = 1, \ldots, K$,

$$F_k^{*}(\cdot) \approx \sum_{l=1}^{L} \omega_{lk}^{*}\, \delta_{\beta_{lk}^{*}}(\cdot), \tag{7}$$

where $\beta_{lk}^{*} \overset{\text{iid}}{\sim} H$, $\omega_{lk}^{*} = u_{lk}^{*} \prod_{s=1}^{l-1} (1 - u_{sk}^{*})$ with $u_{lk}^{*} \overset{\text{iid}}{\sim} \mathrm{beta}(1, \rho)$, for $l = 1, \ldots, L - 1$, and $u_{Lk}^{*} = 1$. For brevity, define $\pi^{*} = (\pi_1^{*}, \ldots, \pi_K^{*})$, $\omega^{*} = (\omega_{11}^{*}, \ldots, \omega_{L1}^{*}, \ldots, \omega_{1K}^{*}, \ldots, \omega_{LK}^{*})$, and $\beta^{*} = (\beta_{11}^{*}, \ldots, \beta_{L1}^{*}, \ldots, \beta_{1K}^{*}, \ldots, \beta_{LK}^{*})$. Discussion and justification about choosing values of $K$ and $L$ are available in Theorem B.1 and in the supplemental material of Rodríguez, Dunson, and Gelfand (2008). Theorem B.1 gives a decreasing error bound as $L, K \to \infty$ on the total variation distance between an nDP and its truncation approximation. The supplemental material contains numerical evidence suggesting how to set reasonable truncation values.
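The truncated representation in (6) and (7) can be simulated directly. The sketch below is a minimal illustration using the Python standard library, not the authors' implementation: the function names, the seed, the truncation levels, and the hyperparameter values are all hypothetical choices. Note that the nested lists are indexed as `beta_star[k][l]`, whereas the text writes the atom index first, as in $\beta_{lk}^{*}$.

```python
import random

def stick_breaking_weights(concentration, n_atoms, rng):
    """Truncated stick-breaking weights with beta(1, concentration) fractions."""
    weights, remaining = [], 1.0
    for k in range(n_atoms):
        # Fixing the last fraction at 1 makes the truncated weights sum to 1.
        frac = 1.0 if k == n_atoms - 1 else rng.betavariate(1.0, concentration)
        weights.append(frac * remaining)
        remaining *= 1.0 - frac
    return weights

def draw_truncated_ndp(alpha, rho, K, L, mu_beta, tau_beta, rng):
    """Draw (pi*, omega*, beta*) for the K x L truncation in (6) and (7)."""
    pi_star = stick_breaking_weights(alpha, K, rng)
    omega_star = [stick_breaking_weights(rho, L, rng) for _ in range(K)]
    sd = tau_beta ** -0.5  # baseline H = N(mu_beta, 1 / tau_beta)
    beta_star = [[rng.gauss(mu_beta, sd) for _ in range(L)] for _ in range(K)]
    return pi_star, omega_star, beta_star

def draw_interaction_effects(pi_star, omega_star, beta_star, cluster_sizes, rng):
    """For each cluster, pick a distribution index, then n_j atoms from it."""
    zeta, effects = [], []
    for n_j in cluster_sizes:
        k = rng.choices(range(len(pi_star)), weights=pi_star)[0]
        zeta.append(k)
        effects.append(rng.choices(beta_star[k], weights=omega_star[k], k=n_j))
    return zeta, effects

rng = random.Random(2013)
pi_star, omega_star, beta_star = draw_truncated_ndp(
    alpha=1.0, rho=1.0, K=10, L=15, mu_beta=0.0, tau_beta=1.0, rng=rng)
zeta, effects = draw_interaction_effects(
    pi_star, omega_star, beta_star, cluster_sizes=[5, 8, 3], rng=rng)
```

Clusters that draw the same index share one discrete distribution, and patients within a cluster may share an atom, which is the two-level clustering the nDP is meant to capture.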

The finite dimensionalities of (6) and (7) allow us to express the Bayesian semiparametric model (4) entirely in terms of a finite number of random variables. Because of the choices of their prior distributions, these random variables can be drawn from standard parametric distributions. We assign them into the following four groups or blocks of parameters, namely, $(\zeta, \xi, \pi^{*}, \omega^{*}, \beta^{*}, \alpha, \rho, \mu_\beta, \tau_\beta)$, $\gamma$, $\tau$, and $(\theta_1, \ldots, \theta_T)$, where $\zeta$ and $\xi$ are two vectors of classification variables describing clustering behavior of the random interaction effects defined as follows. Let $\zeta = (\zeta_1, \ldots, \zeta_C)$ denote a classification vector that describes the interaction effects by setting $\zeta_j = k$, for $j = 1, \ldots, C$ and $k = 1, \ldots, K$, if and only if interactions between patients and the $j$th physician, $\beta_{ij}$, are independent and identically distributed as $F_j = F_k^{*}$. In addition, define classification variables $\xi_{ij}$, for $j = 1, \ldots, C$ and $i = 1, \ldots, n_j$, by setting $\xi_{ij} = l$, for $l = 1, \ldots, L$, if and only if the interaction between the $i$th patient and the $j$th physician for the response $Y_{ijt}$ is given by $\beta_{ij} = \beta_{l\zeta_j}^{*}$. As shown below, the knowledge of these two vectors of classification variables provides an equivalent expression of the likelihood of the observed outcomes. Under model (4), the likelihood of the observed response $Y_{ijt} = y_{ijt}$ from the $i$th patient cared for by the $j$th physician, who is under the $t$th intervention and is associated with a covariate vector $w_{ij}$, takes the form

$$f(y_{ijt} \mid \theta_t, \beta_{ij}, \gamma, \tau, w_{ij}) = \sqrt{\frac{\tau}{2\pi}} \exp\left[-\frac{\tau}{2}\left(y_{ijt} - \theta_t - \beta_{ij} - w_{ij}'\gamma\right)^2\right], \tag{8}$$

or, equivalently,

$$f(y_{ijt} \mid \theta_t, \beta_{\xi_{ij},\zeta_j}^{*}, \gamma, \tau, w_{ij}) = \sqrt{\frac{\tau}{2\pi}} \exp\left[-\frac{\tau}{2}\left(y_{ijt} - \theta_t - \beta_{\xi_{ij},\zeta_j}^{*} - w_{ij}'\gamma\right)^2\right], \tag{9}$$

provided that $\zeta$ and $\xi$ are known. Generalizing the idea of the blocked Gibbs algorithm of Ishwaran and James (2001), an iterative algorithm that cycles through four steps (discussed in the Appendix) can be derived based on this equivalence relation of the likelihood. Each step draws one of the four desirable blocks of parameters conditioning on all the other variables, and yields random variates from the joint posterior distribution of all the parameters thereby enabling evaluation of posterior estimates of any quantity of interest in the nDP model (4).
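The equivalence between (8) and (9) amounts to looking up each patient's interaction effect through the two classification variables. The sketch below evaluates the resulting Gaussian log-likelihood over all patients; it is an illustrative helper rather than part of the authors' sampler. The function name, the nested-list data layout, the `arm` list mapping each cluster to its intervention, and the toy numbers are assumptions, and indices are zero-based with atoms stored as `beta_star[k][l]`.

```python
import math

def log_likelihood(y, w, arm, zeta, xi, theta, beta_star, gamma, tau):
    """Sum of log densities (9) over all patients.

    y[j][i] is the outcome and w[j][i] the covariate list of patient i in
    cluster j; arm[j] is the intervention index of cluster j; zeta[j] and
    xi[j][i] are the (zero-based) classification variables.
    """
    total = 0.0
    for j, y_j in enumerate(y):
        for i, y_ij in enumerate(y_j):
            # Interaction effect recovered through the classification variables.
            beta_ij = beta_star[zeta[j]][xi[j][i]]
            mean = theta[arm[j]] + beta_ij + sum(a * g for a, g in zip(w[j][i], gamma))
            total += 0.5 * math.log(tau / (2.0 * math.pi)) - 0.5 * tau * (y_ij - mean) ** 2
    return total

# Toy inputs: two clusters, one covariate, one shared distribution with two atoms.
y = [[2.0, 1.0], [3.0]]
w = [[[0.25], [0.25]], [[0.0]]]
arm = [0, 1]
zeta = [0, 0]
xi = [[0, 1], [0]]
theta = [1.0, 2.0]
beta_star = [[0.5, -0.5]]
gamma = [2.0]

ll = log_likelihood(y, w, arm, zeta, xi, theta, beta_star, gamma, tau=1.0)
```

In this toy example the first two residuals are zero and the third is 0.5, so the value equals $-\tfrac{3}{2}\log(2\pi) - 0.125$.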

Implementation of the iterative algorithm for M cycles, with M being a large number, results in a Markov chain of realizations of the four blocks of parameters. Suppose that a Markov sequence $(\theta_1^{(1)}, \ldots, \theta_T^{(1)}), \ldots, (\theta_1^{(M)}, \ldots, \theta_T^{(M)})$ is generated for the intervention effects (θ1, . . . , θT). The posterior probabilities for different hypotheses about any relationship between the interventions, namely, equality, equivalence, or noninferiority (NI), can be approximated by sample probabilities of the corresponding events of interest obtained from the posterior samples of the intervention effects. For instance, the probability of equivalence of any two interventions, P(−Δ < θt − θt′ < Δ), for a small but known margin Δ > 0, is approximated by

$$
\frac{1}{M} \sum_{b=1}^{M} I\left\{-\Delta < \theta_t^{(b)} - \theta_{t'}^{(b)} < \Delta\right\},
$$

and the probability of NI of the tth intervention (with effect θt) to the t′th intervention (with effect θt′), denoted by P(θt ≥ θt′ − Δ), is approximated by

$$
\frac{1}{M} \sum_{b=1}^{M} I\left\{\theta_t^{(b)} \ge \theta_{t'}^{(b)} - \Delta\right\}.
$$

It should be noted here that our application is a superiority trial with Δ = 0. However, the proposed model can also be used for NI trials, and we also assess the performance of our model for NI trials in the simulation study in Section 4.
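As a minimal sketch of these Monte Carlo approximations, the following Python functions compute the two sample proportions from paired posterior draws. The function names and the toy draws are hypothetical; only the indicator events come from the text.

```python
def prob_equivalence(theta_t, theta_s, delta):
    """Share of posterior draws with -delta < theta_t - theta_s < delta."""
    hits = sum(1 for a, b in zip(theta_t, theta_s) if -delta < a - b < delta)
    return hits / len(theta_t)

def prob_noninferiority(theta_t, theta_s, delta):
    """Share of posterior draws with theta_t >= theta_s - delta."""
    hits = sum(1 for a, b in zip(theta_t, theta_s) if a >= b - delta)
    return hits / len(theta_t)

# Toy draws standing in for the M posterior samples of two intervention effects.
draws_t = [0.5, 0.4, 0.7, 0.1]
draws_s = [-0.5, 0.35, 0.0, 0.3]
p_equiv = prob_equivalence(draws_t, draws_s, 0.1)
p_ni = prob_noninferiority(draws_t, draws_s, 0.1)
p_sup = prob_noninferiority(draws_t, draws_s, 0.0)  # superiority case, margin 0
```

Setting the margin to zero in the second function recovers the superiority probability P(θt ≥ θt′).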

For the purpose of investigating the accuracy in either estimation or prediction of any new patient outcome, one can make use of density estimates of any outcome y of any patient who is cared for by the jth physician under the tth intervention and is associated with covariate vector w. Such an estimate is generally computed by

$$
\frac{1}{M} \sum_{b=1}^{M} f_{jt}^{(b)}(y \mid w), \tag{10}
$$

where $f_{jt}^{(b)}$ is defined analogously to (8) or (9) according to the posterior samples of the unknown parameters in the bth cycle, denoted by $\theta_t^{(b)}$, $\beta_{ij}^{(b)}$, $\gamma^{(b)}$, and $\tau^{(b)}$, for patients under the same intervention and cared for by the same physician j. By the nDP model, if $\{\beta^*_{\xi_{ij}^{(b)},\zeta_j^{(b)}}\}^{(b)}$ denotes the posterior draw of the interaction effect βij in the bth cycle through classification variables $\xi_{ij}^{(b)}$ and $\zeta_j^{(b)}$ in the same iteration, then

$$
f_{jt}^{(b)}(y \mid w) = \frac{1}{N_{jt}} \sqrt{\frac{\tau^{(b)}}{2\pi}} \sum_{i=1}^{n_j} \exp\left\{-\frac{\tau^{(b)}}{2}\left(y - \theta_t^{(b)} - \{\beta^*_{\xi_{ij}^{(b)},\zeta_j^{(b)}}\}^{(b)} - w^{\top}\gamma^{(b)}\right)^2\right\} I\{A_t(i,j)\},
$$

where $I\{A_t(i,j)\}$ is an indicator function for the event that the ith patient cared for by the jth physician is under the tth intervention, and $N_{jt} \equiv \sum_{i=1}^{n_j} I\{A_t(i,j)\} \le n_j$ is the total number of patients under the tth intervention among all nj patients cared for by the jth physician. By the normal model (5),

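$$
f_{jt}^{(b)}(y \mid w) = \frac{1}{N_{jt}} \sqrt{\frac{\tau^{(b)}}{2\pi}} \sum_{i=1}^{n_j} \exp\left\{-\frac{\tau^{(b)}}{2}\left(y - \theta_t^{(b)} - \beta_{ij}^{(b)} - w^{\top}\gamma^{(b)}\right)^2\right\} I\{A_t(i,j)\}.
$$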
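The density estimate (10) can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' code: the covariate is taken to be a scalar, each posterior draw is stored in a dictionary with hypothetical keys, and `in_arm` plays the role of the indicator I{At(i, j)}.

```python
import math

def density_one_cycle(y, w, theta_t, betas, in_arm, gamma, tau):
    """f_jt^(b)(y | w): average of normal densities over physician j's patients in arm t."""
    n_jt = sum(in_arm)  # N_jt, the number of patients under intervention t
    total = 0.0
    for beta, flag in zip(betas, in_arm):
        if flag:
            mean = theta_t + beta + w * gamma
            total += math.exp(-0.5 * tau * (y - mean) ** 2)
    return math.sqrt(tau / (2.0 * math.pi)) * total / n_jt

def density_estimate(y, w, draws, in_arm):
    """Estimate (10): average of f_jt^(b)(y | w) over the M stored cycles."""
    vals = [density_one_cycle(y, w, d["theta_t"], d["betas"], in_arm, d["gamma"], d["tau"])
            for d in draws]
    return sum(vals) / len(vals)

# Toy posterior output: one cycle, two patients with interaction draws 0 and 3.
draws = [{"theta_t": 0.0, "betas": [0.0, 3.0], "gamma": 0.0, "tau": 1.0}]
in_arm = [True, True]
value_at_zero = density_estimate(0.0, 0.0, draws, in_arm)
area = sum(density_estimate(-10.0 + 0.01 * k, 0.0, draws, in_arm) for k in range(2301)) * 0.01
```

Because each cycle averages normal densities centered at the patients' interaction draws, the estimate can be multi-modal whenever those draws cluster around distinct values.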

3.1 Model Comparison

The nDP model (4) can be viewed as a direct generalization of the normal/Gaussian model by modeling the random interaction effects using nDP. Thus, it is important to formally test the utility of the nDP model over the simple normal model (5).

We consider the predictive measure of model performance introduced by Geisser and Eddy (1979). This predictive criterion is termed the log pseudo-marginal likelihood (LPML). LPML has been used extensively in Bayesian model selection (see, e.g., Chen, Shao, and Ibrahim 2000, chap. 10; Brown and Ibrahim 2003; Ghosh, Basu, and Tiwari 2009) as a useful summary statistic for comparing model fits. Models with greater LPML values represent a better fit. The LPML is defined based on the estimates of the conditional predictive ordinate (CPO; Gelfand, Dey, and Chang 1992; Chen, Shao, and Ibrahim 2000) for all observations, and is given by

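$$
\mathrm{LPML} = \sum_{j=1}^{C} \sum_{i=1}^{n_j} \ln(\mathrm{CPO}_{ij}), \tag{11}
$$

where

$$
\mathrm{CPO}_{ij} = \left[\frac{1}{M} \sum_{b=1}^{M} \frac{1}{f\left(y_{ijt} \mid \theta_t^{(b)}, \beta_{ij}^{(b)}, \gamma^{(b)}, \tau^{(b)}, w_{ij}\right)}\right]^{-1},
$$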

with f defined in (8), is the estimate of the true CPO of the outcome of the ith patient cared for by the jth physician. Specifically, for the nDP model,

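$$
\mathrm{CPO}^{\mathrm{nDP}}_{ij} = \left[\frac{1}{M} \sum_{b=1}^{M} \frac{1}{f\left(y_{ijt} \mid \theta_t^{(b)}, \{\beta^*_{\xi_{ij}^{(b)},\zeta_j^{(b)}}\}^{(b)}, \gamma^{(b)}, \tau^{(b)}, w_{ij}\right)}\right]^{-1}
= \frac{M}{\sqrt{2\pi}} \left[\sum_{b=1}^{M} \frac{1}{\sqrt{\tau^{(b)}}} \exp\left\{\frac{\tau^{(b)}}{2}\left(y_{ijt} - \theta_t^{(b)} - \{\beta^*_{\xi_{ij}^{(b)},\zeta_j^{(b)}}\}^{(b)} - w_{ij}^{\top}\gamma^{(b)}\right)^2\right\}\right]^{-1},
$$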

while, for the normal model (5), the CPO estimate takes the same form as $\mathrm{CPO}^{\mathrm{nDP}}_{ij}$ with $\{\beta^*_{\xi_{ij}^{(b)},\zeta_j^{(b)}}\}^{(b)}$ replaced by $\beta_{ij}^{(b)}$.
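The CPO estimate is a harmonic mean of the M likelihood values, which can underflow or overflow when computed naively. The sketch below works on the log scale with a log-sum-exp step; this numerical device and all function names are assumptions added here for illustration and are not described in the article.

```python
import math

def log_normal_density(y, mean, tau):
    """Log of density (8) for a given mean and precision tau."""
    return 0.5 * math.log(tau / (2.0 * math.pi)) - 0.5 * tau * (y - mean) ** 2

def log_cpo(log_dens):
    """log CPO_ij = log M - log(sum_b 1 / f_b), evaluated stably from log f_b."""
    neg = [-v for v in log_dens]
    top = max(neg)
    lse = top + math.log(sum(math.exp(v - top) for v in neg))
    return math.log(len(log_dens)) - lse

def lpml(log_dens_by_obs):
    """LPML (11): sum of log CPO over observations; one row of M log-densities each."""
    return sum(log_cpo(row) for row in log_dens_by_obs)

# Two toy observations, each with M = 2 likelihood values.
rows = [[math.log(0.5), math.log(0.25)], [math.log(0.2), math.log(0.2)]]
total = lpml(rows)
```

For the first toy observation the harmonic mean of 0.5 and 0.25 is 1/3, so its log CPO is ln(1/3); the same routine applies to either model once the per-cycle log-densities are available.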

4. SIMULATION STUDY

In this section, we demonstrate through simulation studies the ability of the nDP model to provide accurate estimates of the intervention effect, the covariate effect, the density estimate of any response, and the random interaction effects. In addition, we discuss how to select the best/most representative cluster to further use the clustering features of the nDP model by introducing an alternative estimate of the true CPO, called pseudo-CPO.

Throughout this section, posterior estimates of parameters and other quantities of interest are computed based on M = 10,000 samples taken from the Markov chain with a thinning of 5 after discarding 50,000 burn-in samples. Hyperprior parameters as described in (4) are set as follows. To deflate the priors, we set $\sigma_\theta^2 = \sigma_\mu^2 = \sigma_\gamma^2 = 100$ (as $\Sigma_\gamma = \mathrm{diag}(\sigma_\gamma^2, \ldots, \sigma_\gamma^2)$), and $a_\tau = b_\tau = a_{\tau_\beta} = b_{\tau_\beta} = 0.001$. Furthermore, we set $a_\alpha = b_\alpha = a_\rho = b_\rho = 3$, implying that E(α) = E(ρ) = 1, which is a common choice in the literature, and P(α > 3) = P(ρ > 3) ≈ 0.006. For the purpose of comparison, some of the following simulation results are obtained by fitting the normal model as in Equation (5) with the same hyperparameter settings. Following Rodríguez (2007) and Rodríguez, Dunson, and Gelfand (2008), we assume the truncation levels as K = 35 and L = 55, respectively. As noted by Ghosh, Ghosh, and Tiwari (2008), this choice of K and L results in a good approximation of the true nDP as long as all nj ≤ 500 and C ≤ 50.
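The stated prior mean and tail probability can be checked with a few lines of Python. The sketch assumes that α and ρ follow a gamma prior parameterized by shape a and rate b (the parameterization is not restated in this section, but it is the one under which a = b = 3 gives mean 1), and it uses the closed-form survival function that holds for integer shape. The function name is hypothetical.

```python
import math

def gamma_survival_integer_shape(x, shape, rate):
    """P(X > x) for X ~ Gamma(shape, rate) when shape is a positive integer."""
    z = rate * x
    return math.exp(-z) * sum(z ** k / math.factorial(k) for k in range(shape))

prior_mean = 3 / 3                                # E(alpha) = a / b under shape-rate
tail = gamma_survival_integer_shape(3.0, 3, 3.0)  # P(alpha > 3)

# Chain length implied by the settings: burn-in plus thinning times retained draws.
total_iterations = 50_000 + 5 * 10_000
```

Under these assumptions the tail probability is 50.5 e^{-9}, in line with the value of about 0.006 quoted above, and the settings correspond to 100,000 Gibbs cycles per fit.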

4.1 Data Simulation

Two collections of datasets are generated according to Equation (3) by varying the parametric families of distributions of the interaction (βij) and the error (eijt). The first collection includes 10 datasets simulated with a standard normal error eijt ∼ Z ≡ N(0, 1), while the second collection contains 12 datasets whose error density follows a Student's t-distribution with 1 or 5 degrees of freedom (df). In each of these datasets, (1) there are T = 2 different interventions/treatments with θ1 = 0.5 and θ2 = −θ1 = −0.5, which, respectively, represent the intervention and control arms, (2) the number of independent observations from each group j is nj = 50, and (3) the distribution (Fj) of the interaction effects βij is drawn from either a family of unimodal densities, a family of mixtures of normal distributions, or a family of mixtures of point masses. Next, we describe the particular distribution (Fj) under each of these specifications. Under the unimodal distributions, we consider standard normal, skew-normal, and Student's t with 3 df (denoted by t3). For datasets with these Fj's, the number of groups is C = 8 (and, hence, the total sample size is 8 × 50 = 400) and the interaction effects are described as follows. For all i = 1, . . . , 50,

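$$
\beta_{i1} \overset{d}{=} \beta_{i2} \overset{d}{=} 2X_1, \quad \beta_{i3} \overset{d}{=} \beta_{i4} \overset{d}{=} X_1, \quad \beta_{i5} \overset{d}{=} 3 + X_2, \quad \beta_{i6} \overset{d}{=} 5 + X_2,
$$

$$
\beta_{i7} \overset{d}{=} 10 + X_3, \quad \beta_{i8} \overset{d}{=} -1 + 2X_3,
$$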

where $\overset{d}{=}$ stands for “equal in distribution.” Specifically, the standard normal case and the Student's t case are defined by Xi ≡ Z and Xi ≡ t3, for i = 1, 2, 3, respectively. For the skew-normal case, X1 ≡ SN(10), X2 ≡ SN(3), and X3 ≡ SN(−10), where SN(λ), −∞ < λ < ∞, stands for a skew-normal random variable defined by Azzalini (1985). Second, for datasets with Fj as mixtures of normals, the number of groups is C = 4 (and, hence, the total sample size is 4 × 50 = 200) and the interaction effects for each group are defined as follows:

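$$
\begin{aligned}
\beta_{i1} &\overset{iid}{\sim} F_1 \equiv 0.6\,N(0, \sigma_2^2) + 0.4\,N(3, \sigma_1^2), \\
\beta_{i2} &\overset{iid}{\sim} F_2 \equiv 0.5\,N(0, \sigma_2^2) + 0.5\,N(3, \sigma_1^2), \\
\beta_{i3} &\overset{iid}{\sim} F_3 \equiv 0.8\,N(5, \sigma_1^2) + 0.2\,N(10, \sigma_1^2), \\
\beta_{i4} &\overset{iid}{\sim} F_4 \equiv 0.8\,N(5, \sigma_1^2) + 0.18\,N(10, \sigma_1^2) + 0.02\,N(-1, \sigma_2^2),
\end{aligned} \tag{12}
$$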

for i = 1, . . . , 50, with σ1 = 1 and σ2 = 2. Finally, for the other datasets with Fj as mixtures of point masses, there are C = 4 groups and the interaction effects βij are defined similarly as in (12) with σ1 = σ2 = 0.
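A minimal Python sketch of drawing one group under the mixture specification (12) is given below. Several details are assumptions made for illustration: Equation (3) is not reproduced in this section, so the additive form y = θt + βij + γ w + e is inferred from the likelihood (8); patients are assigned to the two arms alternately; the covariate is drawn from U(−1, 1) with γ = −5 and a standard normal error; and all names are hypothetical.

```python
import random

# (weight, mean, sd) components of F_1, ..., F_4 in (12), with sigma_1 = 1 and sigma_2 = 2.
MIXTURES = {
    1: [(0.6, 0.0, 2.0), (0.4, 3.0, 1.0)],
    2: [(0.5, 0.0, 2.0), (0.5, 3.0, 1.0)],
    3: [(0.8, 5.0, 1.0), (0.2, 10.0, 1.0)],
    4: [(0.8, 5.0, 1.0), (0.18, 10.0, 1.0), (0.02, -1.0, 2.0)],
}

def draw_beta(rng, components, point_mass=False):
    """Draw one interaction effect; point_mass=True sets sigma_1 = sigma_2 = 0."""
    u = rng.random()
    cumulative = 0.0
    for weight, mean, sd in components:
        cumulative += weight
        if u < cumulative:
            break  # if rounding prevents a break, the last component is used
    return mean if point_mass else rng.gauss(mean, sd)

def simulate_group(rng, j, n=50, theta1=0.5, gamma=-5.0, point_mass=False):
    """Return (y, theta, beta, w) tuples for the n patients of group j."""
    rows = []
    for i in range(n):
        theta = theta1 if i % 2 == 0 else -theta1  # assumed alternating arm assignment
        beta = draw_beta(rng, MIXTURES[j], point_mass)
        w = rng.uniform(-1.0, 1.0)
        y = theta + beta + gamma * w + rng.gauss(0.0, 1.0)
        rows.append((y, theta, beta, w))
    return rows

rows = simulate_group(random.Random(1), 4, point_mass=True)
```

With `point_mass=True` every simulated βij in group 4 equals 5, 10, or −1, which mirrors the mixture-of-point-masses datasets.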

For the covariate W, we assume parameter γ = −5. Under every combination of distributions for βij and for eijt, two different datasets are generated, one from wij ∼ N(0, 1.5²) and another from wij ∼ U(−1, 1), respectively. Table 1 gives a summary of all 22 datasets defined above.

Here, in the first column, “N” stands for normal distribution, “U” stands for uniform distribution, “SN” stands for skew-normal distribution, “t1,” “t3,” and “t5,” respectively, stand for t-distributions with 1, 3, and 5 df, “MixN” stands for the mixture of normals defined in (12), and “MixPt” stands for mixture of point masses defined in (12) with σ1 = σ2 = 0. For instance, we refer to the dataset of responses generated by drawing βij according to mixtures of point masses defined in (12) with σ1 = σ2 = 0, covariate wij from U(−1, 1), and error/noise from a Student's t-distribution with 1 df, as Dataset(MixPt, U, t1) from the last row. In all numerical results to follow, we refer to θ1 − θ2 = 2θ1, with true value being 2 × 0.5 = 1, as the intervention effect.

4.2 Accuracy of Treatment Effect Estimation

We then assess the accuracy of treatment effect estimates by the proposed nDP model in various data settings. Table 2

Table 1. A summary of all 22 datasets

| Dataset | βij | wij | eijt |
|---|---|---|---|
| (N, N, N)/(N, U, N) | Normal | N(0, 1.5²)/U(−1, 1) | N(0, 1) |
| (SN, N, N)/(SN, U, N) | Skew-normal | N(0, 1.5²)/U(−1, 1) | N(0, 1) |
| (t3, N, N)/(t3, U, N) | t with 3 df | N(0, 1.5²)/U(−1, 1) | N(0, 1) |
| (MixN, N, N)/(MixN, U, N) | (12) with σ1 = 1 and σ2 = 2 | N(0, 1.5²)/U(−1, 1) | N(0, 1) |
| (MixPt, N, N)/(MixPt, U, N) | (12) with σ1 = σ2 = 0 | N(0, 1.5²)/U(−1, 1) | N(0, 1) |
| (N, N, t5)/(N, U, t5) | Normal | N(0, 1.5²)/U(−1, 1) | t5 |
| (MixN, N, t5)/(MixN, U, t5) | (12) with σ1 = 1 and σ2 = 2 | N(0, 1.5²)/U(−1, 1) | t5 |
| (MixPt, N, t5)/(MixPt, U, t5) | (12) with σ1 = σ2 = 0 | N(0, 1.5²)/U(−1, 1) | t5 |
| (N, N, t1)/(N, U, t1) | Normal | N(0, 1.5²)/U(−1, 1) | t1 |
| (MixN, N, t1)/(MixN, U, t1) | (12) with σ1 = 1 and σ2 = 2 | N(0, 1.5²)/U(−1, 1) | t1 |
| (MixPt, N, t1)/(MixPt, U, t1) | (12) with σ1 = σ2 = 0 | N(0, 1.5²)/U(−1, 1) | t1 |


Table 2. Probability estimates of noninferiority of the intervention under study to the control arms for Δ = 0.5, 0.1, 0.05, 0.005, LPML, and pseudo-LPML under both the nDP model (4) and the normal model (5) based on all the 22 different datasets defined in Section 4.1

| Dataset | nDP Δ=0.5 | nDP Δ=0.1 | nDP Δ=0.05 | nDP Δ=0.005 | nDP LPML | nDP pseudo-LPML | Normal Δ=0.5 | Normal Δ=0.1 | Normal Δ=0.05 | Normal Δ=0.005 | Normal LPML | Normal pseudo-LPML |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| (N, N, N) | 0.999 | 1 | 1 | 1 | −800.32 | −640.85 | 0.999 | 1 | 1 | 1 | −799.93 | −610.44 |
| (N, U, N) | 0.992 | 1 | 1 | 1 | −794.54 | −636.09 | 0.987 | 0.999 | 1 | 1 | −785.87 | −612.22 |
| (SN, N, N) | 0.998 | 1 | 1 | 1 | −697.57 | −626.60 | 0.998 | 1 | 1 | 1 | −697.89 | −582.29 |
| (SN, U, N) | 0.999 | 1 | 1 | 1 | −716.35 | −626.10 | 0.999 | 1 | 1 | 1 | −712.59 | −590.98 |
| (t3, N, N) | 0.490 | 0.961 | 0.975 | 0.985 | −902.86 | −683.88 | 0.468 | 0.960 | 0.974 | 0.984 | −902.03 | −642.81 |
| (t3, U, N) | 0.997 | 1 | 1 | 1 | −891.18 | −652.67 | 0.998 | 0.999 | 0.999 | 0.999 | −902.91 | −578.49 |
| (MixN, N, N) | 0.924 | 0.995 | 0.996 | 0.997 | −459.99 | −294.82 | 0.882 | 0.991 | 0.993 | 0.995 | −469.43 | −312.22 |
| (MixN, U, N) | 0.914 | 0.997 | 0.999 | 0.999 | −445.47 | −262.81 | 0.894 | 0.991 | 0.994 | 0.996 | −474.04 | −396.46 |
| (MixPt, N, N) | 1 | 1 | 1 | 1 | −395.14 | −234.38 | 0.951 | 0.998 | 0.999 | 0.999 | −436.95 | −294.43 |
| (MixPt, U, N) | 0.999 | 1 | 1 | 1 | −371.00 | −225.92 | 0.913 | 0.996 | 0.998 | 0.998 | −436.07 | −215.18 |
| (N, N, t5) | 0.998 | 1 | 1 | 1 | −819.99 | −633.02 | 0.997 | 0.999 | 1 | 1 | −827.11 | −556.67 |
| (N, U, t5) | 0.999 | 1 | 1 | 1 | −832.17 | −683.80 | 0.999 | 1 | 1 | 1 | −830.20 | −676.37 |
| (MixN, N, t5) | 0.834 | 0.988 | 0.993 | 0.994 | −458.00 | −321.99 | 0.724 | 0.959 | 0.970 | 0.977 | −466.15 | −353.91 |
| (MixN, U, t5) | 0.844 | 0.989 | 0.993 | 0.995 | −488.68 | −408.53 | 0.883 | 0.988 | 0.992 | 0.994 | −493.58 | −452.20 |
| (MixPt, N, t5) | 0.966 | 0.999 | 1 | 1 | −410.63 | −215.24 | 0.955 | 0.998 | 0.999 | 0.999 | −436.61 | −227.00 |
| (MixPt, U, t5) | 0.994 | 0.999 | 0.999 | 0.999 | −412.40 | −265.26 | 0.854 | 0.988 | 0.991 | 0.995 | −457.87 | −246.75 |
| (N, N, t1) | 0.984 | 1 | 1 | 1 | −1019.32 | −749.41 | 0.898 | 0.963 | 0.966 | 0.970 | −1450.11 | −1069.98 |
| (N, U, t1) | 0.991 | 1 | 1 | 1 | −1036.75 | −721.61 | 0.838 | 0.948 | 0.957 | 0.964 | −1342.10 | −798.53 |
| (MixN, N, t1) | 0.838 | 0.983 | 0.987 | 0.991 | −519.70 | −286.33 | 0.627 | 0.751 | 0.766 | 0.777 | −710.03 | −340.17 |
| (MixN, U, t1) | 0.898 | 0.980 | 0.985 | 0.989 | −561.62 | −348.36 | 0.933 | 0.978 | 0.981 | 0.983 | −643.90 | −413.85 |
| (MixPt, N, t1) | 0.729 | 0.953 | 0.963 | 0.970 | −510.66 | −304.99 | 0.573 | 0.706 | 0.722 | 0.734 | −706.18 | −496.42 |
| (MixPt, U, t1) | 0.934 | 0.991 | 0.994 | 0.996 | −542.43 | −339.90 | 0.951 | 0.984 | 0.986 | 0.988 | −657.81 | −423.64 |

summarizes the posterior probability estimates of the treatment effect based on the 22 simulated datasets. As discussed before, our current application using the GRACE study is a superiority trial with Δ = 0. However, in this section, we also assess how our nDP model fares in an NI trial situation, that is, computing $M^{-1} \sum_{b=1}^{M} I\{\Delta < 2\theta_1^{(b)}\}$, when Δ = 0.5, 0.1, 0.05, 0.005. It should be noted that in real clinical trials, Δ must be chosen judiciously, as discussed in detail in the FDA guidelines on NI trials (http://www.fda.gov/downloads/Drugs/GuidanceComplianceRegulatoryInformation/Guidances/UCM202140.pdf) and also in Chen et al. (2011).

When the true distributions Fj of the random interactions are unimodal (either normal, skew-normal, or t3), both the probability estimates and LPML are comparable under the normal and nDP models. However, when the Fj's deviate from unimodality, the estimates under the nDP model are much closer to the true value of 1 than those under the normal model. Thus, the normal model fails to provide evidence as strong as that of the nDP model in supporting the truth that the intervention is efficacious. We further observe that in several scenarios the normal model fails to give accurate point estimates of θ1. For example, based on Dataset(MixN, U, t1) and Dataset(MixPt, U, t1), the posterior mean (or median) estimates of the intervention effects under the normal model are 1.755 (or 1.745) and 1.819 (or 1.816), which are not in the vicinity of the true value of the intervention effect. However, the corresponding estimates, 1.134 (or 1.128) and 1.227 (or 1.232) under the nDP model, are much closer to the true value. In addition, we observe that in several scenarios the posterior distribution of θ1 under the normal model is more variable than that under the nDP model, as illustrated in Figure 1.

In summary, the proposed nDP model clearly outperforms the normal model when the data are generated from nonnormal distributions. In particular, it performs well in estimating the intervention effect when the data are generated under different settings including symmetric random effects, asymmetric random effects, multi-modal random effects, normal observational noise, and observational noise with heavier tails than the normal. One may also note that the LPML values of the nDP model in Table 2 are generally larger than those of the normal model, indicating that the nDP model provides a better fit to the data than the normal model. It is worth pointing out that although the nDP model is only nonparametric at the level of the random effect distribution and not at the level of the observational error distribution, it provides better estimates of the intervention effect as compared with the normal model when the data are generated with nonnormal errors. Thus, the selection of the correct random effects distribution could be important in this scenario also. Similar results have been found in Ghidey, Lesaffre, and Eilers (2004) and Litière, Alonso, and Molenberghs (2007).

4.3 Density Estimate of the Responses

Figure 2 shows density estimates of response under intervention from all four groups based on Dataset(MixPt, U, t5). Histograms of the simulated data, roughly 25 each, are displayed in the respective settings according to the two different interventions and four different groups. Density estimates by


Figure 1. Boxplots of posterior samples of the intervention effect 2θ1 under the nDP model (left) and under the normal model (right) based on nine selected datasets. The online version of this figure is in color.

the nDP model are multi-modal (except for the top left graph), while those by the normal model are all unimodal. It is evident from the graphs that the nDP model is able to capture multi-modality and extreme values in the data, while, as expected, the normal model fails in that task.

4.4 Main Effects of Covariates

Based on eight selected datasets, four with a normally distributed covariate and four with a uniformly distributed covariate, posterior distributions of the covariate coefficient γ under the nDP model are shown as histograms in Figure 3. For the purpose of comparison, kernel density estimates constructed based on posterior samples of γ by the normal model are also included as solid lines in each graph. A comparison between the peaks of the histograms and those of the solid lines reveals that the nDP model performs better than the normal model in estimating the main covariate effects, as the peaks of the histograms are generally closer to the true value of γ (= −5) compared with those from the normal model. In addition, in some cases, posterior distributions of γ under the normal model fail to capture the true value, and they spread over a range larger than those under the nDP model.

4.5 Interaction Effect

The superiority of the nDP model over the normal model is further demonstrated in the lower four rows of Figure 4. Based


Figure 2. Density estimates of responses in the intervention and control arms (left to right) from the four groups (top to bottom) under the nDP model (solid lines) and under the normal model (dashed lines) based on observations (represented by histograms) from Dataset(MixPt, U, t5). The online version of this figure is in color.


Figure 3. Posterior distributions of covariate parameter γ under the nDP model (histograms) and under the normal model (solid lines) based on eight selected datasets, with dashed lines representing the true value −5 of γ. The online version of this figure is in color.


Figure 4. True values (dashed lines) and posterior estimates of the intervention effect 2θ1 (top row), τ (second row), and the random interaction distributions Fj for j = 1, . . . , 4 (from third to last rows), under the nDP model (left) and under the normal model (right) based on Dataset(MixPt, U, t5). The online version of this figure is in color.


Table 3. Estimates of P(Fj = Fj′) based on Dataset(N, N, N)

| j \ j′ | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|
| 2 | 0.99 (1.00) | | | | | | |
| 3 | 0.06 (0.00) | 0.06 (0.00) | | | | | |
| 4 | 0.12 (0.00) | 0.12 (0.00) | 0.92 (0.99) | | | | |
| 5 | 0.00 (0.00) | 0.00 (0.00) | 0.00 (0.00) | 0.00 (0.00) | | | |
| 6 | 0.00 (0.00) | 0.00 (0.00) | 0.00 (0.00) | 0.00 (0.00) | 0.00 (0.00) | | |
| 7 | 0.00 (0.00) | 0.00 (0.00) | 0.00 (0.00) | 0.00 (0.00) | 0.00 (0.00) | 0.00 (0.00) | |
| 8 | 0.98 (0.00) | 0.98 (0.00) | 0.07 (0.00) | 0.12 (0.00) | 0.00 (0.00) | 0.00 (0.00) | 0.00 (0.00) |

on Dataset(MixPt, U, t5), posterior distributions of interaction effects for all of the four groups under the nDP model (left column) are much closer to the true Fj, compared with those under the normal model (right column), as they more precisely capture all the modes/point masses of the true Fj defined by (12) with σ1 = σ2 = 0. Indeed, the normal model yields unimodal distributions, and, hence, fails to correctly estimate the multi-modal Fj's.

4.5.1 Clustering Features. A way to illustrate how the special clustering features of the nDP model might benefit inference in a cluster randomized trial is to study the probability estimates of equality between interaction distributions from different pairs of groups j and j′, P(Fj = Fj′), obtained by the nDP model.

When Dataset(N, N, N), or any of the other nine defined datasets based on true unimodal Fj's, is considered, we have true probabilities P(F1 = F2) = P(F3 = F4) = 1, and zero probabilities for other pairs of groups. In our simulation results, P(F1 = F2) and P(F3 = F4) are always estimated to be greater than 0.9, while most of the other probabilities are estimated to be close to 0, except P(F8 = F1) and P(F8 = F2). This is likely because F8 ∼ N(−1, 2²) overlaps substantially with F1 and F2, which follow N(0, 2²). By increasing the group size, for j = 1, . . . , 8, from nj = 50 to nj = 400 in these 10 datasets, we note that estimates of all these probabilities become very close to the true values. For instance, based on Dataset(N, N, N), estimates of P(Fj = Fj′) are shown in Table 3. Inside brackets are the corresponding probability estimates when nj = 400 in each group.
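Since Fj = F∗k exactly when ζj = k, two groups share a distribution in a given cycle exactly when their labels coincide. A natural Monte Carlo estimate of P(Fj = Fj′) is therefore the share of cycles with ζj = ζj′. The estimator is not spelled out in this section, so the following Python sketch is an assumption about how such probabilities could be computed; the names and toy labels are hypothetical.

```python
def prob_same_distribution(zeta_draws, j, j_prime):
    """Share of Gibbs cycles in which groups j and j_prime carry the same label."""
    same = sum(1 for zeta in zeta_draws if zeta[j] == zeta[j_prime])
    return same / len(zeta_draws)

# Toy label vectors (zeta_1, zeta_2, zeta_3) from M = 4 cycles, groups indexed from 0.
zeta_draws = [[1, 1, 2], [1, 1, 1], [3, 3, 2], [2, 1, 2]]
p_01 = prob_same_distribution(zeta_draws, 0, 1)
p_02 = prob_same_distribution(zeta_draws, 0, 2)
```

The label values themselves may change between cycles; only agreement between the two groups within a cycle matters for this estimate.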

Figure 5 depicts the posterior distributions of 2θ1, τ, and γ obtained accordingly. It is clear that all the estimates with nj = 400 are more precise or less variable than those with nj = 50.

Considering the 12 datasets defined on multi-modal Fj's, we should have P(Fj = Fj′) = 0 for every j ≠ j′. However, they are not all estimated to be zero by the nDP model. This is not surprising: because F1 and F2, and F3 and F4, are similarly defined to be around or on 0 and 3, and 5 and 10, respectively, the nDP model identifies the two pairs of Fj's that are the same or similar by estimating both P(F1 = F2) and P(F3 = F4) to be close to 1, which allows borrowing of information across groups.

Apart from allowing information sharing in estimating the Fj's, the nDP model also allows information sharing in estimating the individual interaction effects βij. This is illustrated by revisiting the simulation results based on Dataset(MixPt, U, t5). It is evident that the nDP model outperforms the normal model in estimating both the intervention effect (in Table 2 and from the posterior distribution in Figure 4) and the interaction distributions (from Figure 4), and provides a better fit to the data in terms of LPML. Scrutinizing the individual interaction estimates of βij in each Gibbs cycle/iteration reveals further evidence of information sharing: among βij for groups j = 1, 2, most of the estimates are tied at some values close to either 0 or 3 (the two modes of F1 and F2), while among βij for groups j = 3, 4, most of the estimates are tied at some values close to either 5 or 10 (the two modes of F3 and F4).

4.6 The Best Cluster/Iteration

CPO is a generally accepted method for model selection in a Bayesian framework. It can also be used to choose the best cluster. Among the M iterations, we select the best or the most representative cluster that corresponds to the largest value of a proxy of LPML, called pseudo-LPML. The pseudo-LPML, denoted by LPML(B), is defined as in (11) with CPOij replaced

Figure 5. Posterior estimates of the intervention effect 2θ1 (left), τ (middle), and covariate parameter γ (right) under the nDP model based on Dataset(N, N, N)400 (histograms) and Dataset(N, N, N) (dashed lines) with the number of observations in each of the eight groups as nj = 400 and nj = 50, respectively. The online version of this figure is in color.


Figure 6. Boxplots of the estimates of conditional predictive ordinate CPOij (left) and the proposed pseudo-CPO, $\mathrm{CPO}^{(B)}_{ij}$, computed from the best iteration (right) under the nDP model for all observations i = 1, . . . , 400 from different groups j = 1, . . . , 8, based on Dataset(N, N, N)400 with nj = 400 observations in each group. The online version of this figure is in color.

by

$$
\mathrm{CPO}^{(B)}_{ij} \equiv f\left(y_{ijt} \mid \theta_t^{(B)}, \beta_{ij}^{(B)}, \gamma^{(B)}, \tau^{(B)}, w_{ij}\right),
$$

which we refer to as the pseudo-CPO, so that

$$
\mathrm{LPML}^{(B)} = \sum_{j=1}^{C} \sum_{i=1}^{n_j} \ln\left(\mathrm{CPO}^{(B)}_{ij}\right),
$$

where B records the iteration m = 1, . . . , M at which $\sum_{j=1}^{C} \sum_{i=1}^{n_j} \ln(\mathrm{CPO}^{(m)}_{ij})$ attains its maximum value. Posterior samples $\theta_1^{(B)}, \tau^{(B)}, \gamma^{(B)}, \beta_{ij}^{(B)}$ obtained from the Bth iteration provide us an alternative way to understand the behaviors of the parameters of interest in the model. The specific clustering structure among all βij and their Fj in the Bth iteration is referred to as the best cluster. It may allow us to draw further insight about the data. For instance, when a certain group is not clustered with any other group, it may be because the group possesses some unique characteristics.
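Selecting the best iteration amounts to an argmax over per-iteration sums of log-likelihood values. The Python sketch below assumes these log-densities have been stored as one row per Gibbs cycle; the storage layout and function name are hypothetical.

```python
def best_iteration(log_dens_by_iter):
    """Return (B, pseudo-LPML): the cycle maximizing the summed log pseudo-CPO.

    log_dens_by_iter[m][n] holds ln f(y_n | parameters drawn in cycle m),
    with the observations (i, j) flattened into a single index n.
    """
    totals = [sum(row) for row in log_dens_by_iter]
    best = max(range(len(totals)), key=totals.__getitem__)
    return best, totals[best]

# Toy example with M = 3 cycles and two observations.
log_dens = [[-1.0, -2.0], [-0.5, -1.0], [-3.0, -0.1]]
B, pseudo_lpml = best_iteration(log_dens)
```

The parameter draws and classification labels stored at index B then describe the best cluster in the sense defined above.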

Before concluding the simulation studies, we take a closer look into the proposed measure of pseudo-LPML from which we select the best iteration/cluster. A comparison between it and the existing measure LPML can be done through contrasting the proposed pseudo-CPO measure, $\mathrm{CPO}^{(B)}_{ij}$, and the usual measure CPOij. We define two additional datasets, namely, Dataset(N, N, N)400 and Dataset(MixPt, U, t5)400, to be identical to Dataset(N, N, N) and Dataset(MixPt, U, t5), respectively, except that nj = 400 rather than 50. The reason for generating data with a larger number of observations is that we wish to compare the sampling distributions of the two CPO estimates from each group, and thus, a relatively large sample is needed.

Figures 6 and 7 present the boxplots of the two CPO estimates under the nDP model based on Dataset(N, N, N)400 and Dataset(MixPt, U, t5)400, respectively. Note that only Figure 7 includes boxplots under the normal model based on Dataset(MixPt, U, t5)400 because both the nDP and normal models result in almost identical boxplots based on Dataset(N, N, N)400. Due to the latter reason, Figure 6 provides a fair and reliable comparison between the two CPO estimates. We arrive at the following observations: (1) $\mathrm{CPO}^{(B)}_{ij}$ for different groups are approximately identically distributed, while CPOij for some groups are substantially smaller than those from the other groups, and (2) $\mathrm{CPO}^{(B)}_{ij}$ are generally larger than CPOij within the same group, which, in turn, results in a higher value of the pseudo-LPML than the usual LPML measure. These summarize the criteria in comparing the goodness of fit of different models based on the proposed pseudo-CPO and distinguish between the two CPO estimates. In Figure 7, the same two observations regarding the boxplots of the pseudo-CPO under the nDP model based on Dataset(MixPt, U, t5)400 can be seen from the top row. However, the CPO estimates under the normal model depicted in the bottom row are much smaller than those in the top row. Using the pseudo-CPO as a criterion, we conclude that the nDP model provides a better fit than the normal model to Dataset(MixPt, U, t5)400. This is consistent with the conclusion drawn from the LPML; for the nDP and normal models, we obtain pseudo-LPML (nDP: −2010.62; normal: −2937.28) and LPML (nDP: −3111.81; normal: −3574.43).


Figure 7. Boxplots of the estimates of conditional predictive ordinate CPOij (left) and the proposed pseudo-CPO, $\mathrm{CPO}^{(B)}_{ij}$, computed from the best iteration (right) under the nDP model (top row) and under the normal model (bottom row) for all observations i = 1, . . . , 400 from different groups j = 1, . . . , 4, based on Dataset(MixPt, U, t5)400 with nj = 400 observations in each group. The online version of this figure is in color.

We also include in Table 2 the pseudo-LPML values for the fits of both the nDP and normal models to all the previously defined 22 datasets with nj = 50 each. Based on pseudo-LPML, one arrives at the same conclusion as one does based on LPML (as in Section 4.2) that the nDP model in general performs better than the normal model when the interaction distributions are not unimodal or the noise/error distribution possesses a heavier tail than normals. With smaller group sizes (nj = 50), the two measures, LPML and pseudo-LPML, may yield different conclusions for some datasets. For instance, the conclusions differ for fitting Dataset(MixPt, U, t5), but the two measures reach the same conclusion when nj increases to 400, as discussed in the previous paragraph.

We conclude that observation (1) above, namely that the pseudo-CPO values for different groups are approximately identically distributed, is worth highlighting, as it implies that observations in each group are treated equally, such that no observation in any group is associated with an outlying value (especially small or large) as its CPO estimate. This criterion serves as a clearer and more distinctive benchmark for a good fit as compared with the existing measure CPOij. Consequently, the pseudo-CPO yields a better estimate of the true LPML, and, hence, the pseudo-LPML can be used as an alternative measure to the LPML for model comparison.

5. ANALYSIS OF THE TRIAL DATA

We analyze the clustered patient data from the intervention trial described earlier in the article. For the purpose of illustration, we focus on the MCS of SF-36 Instrument as the primary outcome variable of this analysis, and specifically, we examine the change of MCS between baseline and closeout (after 12 months). For simplicity, we analyze data from 749 patients who had completed the trial. These patients were cared for by 48 intervention/control physician teams (i.e., C = 48 groups). Each team was led by one primary care physician, and included a geriatric consultant and a nurse care manager. In addition to the routine clinical care, the physician teams provided additional diagnosis and treatment of various geriatric conditions. Herein, we assess the intervention effect, which is 2θ1 in our model, on the mental health outcome of the patients in the study. As described earlier, because the intervention was dispensed through 48 different physician teams, patients cared for by the same team are likely to share some common features thus forming a “cluster.” The interaction between a physician and a specific patient is likely to influence the care outcomes. We will show that the nDP model proposed in Section 2 serves us better than the normal model by accommodating such unique physician–patient effects in the clustered patient data.

Demographic and clinical characteristics of the study patients were as follows: 382 of the 749 patients were cared for by 48 intervention physician teams (the others were controls). The average age of all the patients was 71.557 years (SD 5.513 years). Among the intervention patients, 81.2% had hypertension, 11.8% had angina, 18.1% had stroke, 23.0% had COPD, 34.6% had diabetes, and 12.6% had cancer, respectively. Among the control patients, 83.7% had hypertension, 10.4% had angina, 13.9% had stroke, 21.8% had COPD, 33.0% had

Dow

nloa

ded

by [

Uni

vers

ite L

aval

] at

17:

21 0

9 Ju

ly 2

014

Page 16: A Nested Dirichlet Process Analysis of Cluster Randomized Trial Data With Application in Geriatric Care Assessment

62 Journal of the American Statistical Association, March 2013

Figure 8. Estimates of the intervention effect 2θ1 (left) and the precision τ (right) under the nDP model (top row) and under the normal model(bottom row), and the corresponding point estimates/sample values (denoted by 2θ

(B)1 and τ (B)) from the best iteration (solid lines) under the

nDP model, for the geriatric data. The online version of this figure is in color.

diabetes, and 11.4% had cancer, respectively. The mean changesof MCS were, respectively, 2.55 and −0.33 (SD 10.42 and11.24) for intervention and control patients.

Both the nDP model and the normal model are fitted, resulting in $\mathrm{LPML}^{(B)}$ values of −2110.83 and −2645.76, and LPML values of −2761.47 and −2851.11, respectively. Based on either measure, the nDP model provides a better fit to the data. The difference between the two models is evident in Figure 8, which shows that the variability of the intervention effect $2\theta_1$ under the nDP model is much smaller than under the normal model. Hence, we eliminate the normal model from further consideration.
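As an illustration of how an LPML of this kind can be obtained from MCMC output, the following is a minimal sketch that assumes the usual harmonic-mean Monte Carlo estimator of the CPO. It does not cover the pseudo-LPML, $\mathrm{LPML}^{(B)}$. The function names and the toy input are hypothetical and are not taken from the trial analysis.

```python
import math

def log_cpo(loglik_draws):
    """Log CPO of one observation from its log-likelihood at each MCMC iteration."""
    # CPO = 1 / mean_b[ 1 / f(y | parameters at iteration b) ], evaluated on
    # the log scale with a log-sum-exp shift for numerical stability.
    neg = [-v for v in loglik_draws]
    shift = max(neg)
    log_mean_inverse = shift + math.log(
        sum(math.exp(v - shift) for v in neg) / len(neg)
    )
    return -log_mean_inverse

def lpml(loglik):
    """loglik[b][n] is the log-likelihood of observation n at iteration b."""
    n_obs = len(loglik[0])
    return sum(log_cpo([row[n] for row in loglik]) for n in range(n_obs))

# Toy input (not trial data): 2 iterations, 2 observations.
toy = [[math.log(0.5), math.log(0.2)],
       [math.log(0.25), math.log(0.2)]]
toy_lpml = lpml(toy)
```

A larger (less negative) LPML indicates better predictive fit, which is the sense in which −2761.47 is preferred to −2851.11 above.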

Estimation of the intervention effect is summarized by the posterior distribution in the top left panel of Figure 8. It clearly shows that intervention patients had a greater improvement in the MCS score. The posterior mean (or median) of $2\theta_1$ is 2.272 (or 2.276). This confirms the mental health benefit of the proposed intervention. It should be noted that the intervention is a bundled care modality for older individuals. The intervention includes many different components targeting different geriatric conditions. We estimate that $P(\theta_1 \geq \theta_2)$ is very close to 1, which suggests a significant intervention effect. More relevant to our study is perhaps the mental health intervention, such as depression screening and treatment. Our analysis is not intended to differentiate which of the active ingredients in the intervention affects mental health outcomes; we can only conclude that the intervention as a whole does have a beneficial impact on the mental health of older adults.
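The posterior summaries quoted above are simple functionals of the MCMC draws of $\theta_1$. The sketch below shows one way to compute them, assuming the two-arm constraint $\theta_2 = -\theta_1$ stated in the Appendix, under which the event $\{\theta_1 \geq \theta_2\}$ is the same as $\{\theta_1 \geq 0\}$. The function name and the toy draws are hypothetical.

```python
import statistics

def summarize_effect(theta1_draws):
    """Posterior summaries of the intervention effect 2*theta_1."""
    effect = [2.0 * t for t in theta1_draws]
    return {
        "mean": statistics.fmean(effect),
        "median": statistics.median(effect),
        # With theta_2 = -theta_1, theta_1 >= theta_2 holds exactly when theta_1 >= 0.
        "prob_benefit": sum(t >= 0 for t in theta1_draws) / len(theta1_draws),
    }

# Toy draws (not trial output), only to show the call.
summary = summarize_effect([1.0, -1.0, 2.0, 0.0])
```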

None of the nine covariate coefficients for age, gender, race, hypertension, angina, stroke, COPD, diabetes, and cancer appears to have a significant effect on the MCS score, as suggested by the posterior means and the 95% credible intervals in Table 4. This observation is perhaps not entirely surprising as this is a randomized trial. The randomization, although performed at the physician level, is likely to average out much of the extraneous effects. Also, the physical comorbid conditions may have limited impact on the mental health outcome reported in this article. For the purpose of the trial, the lack of statistical significance in these covariates is quite important because it suggests the success of the randomization, in particular as far as the mental health outcome is concerned.

Figure 9. Posterior estimates of the unknown physician effect distributions $F_j$ under the nDP model of 12 randomly selected physicians/physician teams among a total of 48 teams for the geriatric data. The online version of this figure is in color.

Figure 10. Posterior samples $\beta_{ij}^{(B)}$ of interaction effects (histograms), and the proposed pseudo-CPO, $\mathrm{CPO}_{ij}^{(B)}$ (boxplots at right), and $\mathrm{CPO}_{ij}$ (boxplots at left) of all 48 unknown physician effect distributions clustered according to four clusters determined from the best iteration under the nDP model for the geriatric data. The online version of this figure is in color.

Table 4. Posterior estimates of the covariate effects under the nDP model for the geriatric data

Covariate effects $\gamma$    Posterior mean    95% credible interval
Age                           −0.013            (−0.091, 0.068)
Gender                         0.125            (−0.541, 0.788)
Race                          −0.318            (−0.923, 0.305)
Hypertension                  −0.206            (−0.847, 0.445)
Angina                         0.138            (−0.532, 0.796)
Stroke                        −0.098            (−0.748, 0.559)
COPD                           0.092            (−0.572, 0.753)
Diabetes                      −0.062            (−0.683, 0.580)
Cancer                        −0.162            (−0.850, 0.507)
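The 95% credible intervals in Table 4 all contain zero, which is the basis for the statement that no covariate effect is significant. As a rough sketch of that check, the code below computes an equal-tailed interval from posterior draws by linear interpolation between order statistics. Whether the article used equal-tailed or another type of interval is not stated here, so this is an assumption, and the function names and toy draws are hypothetical.

```python
def credible_interval(draws, level=0.95):
    """Equal-tailed credible interval from posterior draws."""
    xs = sorted(draws)

    def quantile(p):
        # Linear interpolation between neighbouring order statistics.
        pos = p * (len(xs) - 1)
        lo = int(pos)
        hi = min(lo + 1, len(xs) - 1)
        return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])

    tail = (1.0 - level) / 2.0
    return quantile(tail), quantile(1.0 - tail)

def covers_zero(interval):
    """True when the interval contains zero (no clear covariate effect)."""
    lower, upper = interval
    return lower <= 0.0 <= upper

# Toy draws (not trial output): 101 evenly spaced values on [-0.5, 0.5].
toy_draws = [x / 100.0 for x in range(-50, 51)]
toy_interval = credible_interval(toy_draws)
```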

Two of the important features of the nDP model are (1) flexibility in modeling the interaction effects nonparametrically, and (2) borrowing of information across and within groups by clustering together interaction distributions and interaction effects for different observations within the same group or from different groups (or teams in this problem). The former feature can be summarized by the posterior distributions of $\beta_{ij}$ for different teams $j$. Those for 12 selected teams, presented in Figure 9, reveal that the interaction distributions $F_j$ for some teams deviate strongly from the bell-shaped normal distributions that traditional mixed effect models or the normal model (5) considered here assume. In addition, we see various shapes of histograms, including unimodal, bimodal, multimodal, and skewed curves. The histograms clearly differ from team to team. The lack of a common distribution not only highlights the flexibility of the proposed nDP method, but also better represents the reality of current medical practice, where physicians have different patient pools and patient outcomes within the same physician team may not conform to the restrictive normal random effects distribution. These results highlight the limitations of the traditional mixed model analysis based on normality assumptions for the random effects: (1) it fails to accommodate potential patient–physician interactions, (2) the assumed common normal distribution for the random effects for all physician teams is not verifiable and may be too restrictive, and (3) the model does not provide any indication of the performance of individual physicians. The normal model considered here still suffers from issues (1) and (2). Indeed, when the data are fitted with the normal model, all $F_j$'s are estimated as different unimodal curves, as in Figure 4. All things considered, these results argue against the use of either a common normal interaction distribution or different normal interaction distributions for such clustered data.

The second feature, clustering, is illustrated as follows. First, estimates of the probability of equality between the interaction distributions of different teams, $P(F_j = F_{j'})$, $j, j' = 1, \ldots, 48$, $j \neq j'$, show that most pairs are similar to each other more than 80% of the time, with the exception of $P(F_{27} = F_j)$, which is estimated to be as low as 60%. Second, to demonstrate how the interaction effects $\beta_{ij}$ from different teams are clustered together, we obtain the best iteration based on $\mathrm{LPML}^{(B)}$, in which the 48 teams are assigned to four clusters. Based on our numbering of the teams, the four clusters are {1}, {8, 13, 27}, {4, 7, 9, 16, 22, 34, 40}, and one big group of the other 37 teams. Figure 10 gives boxplots of $\mathrm{CPO}_{ij}^{(B)}$ and $\mathrm{CPO}_{ij}$ according to these four clusters. It reveals that the $\mathrm{CPO}_{ij}^{(B)}$ are generally larger than the $\mathrm{CPO}_{ij}$ within the same group, which is consistent with the observations discussed in Section 4.6. The property that the $\mathrm{CPO}_{ij}^{(B)}$ for different groups are identically distributed is not clearly seen here: they are roughly identically distributed for clusters 3 and 4, but not for clusters 1 and 2. This may be because there are too few observations/patients in clusters 1 and 2 (only 15 and 34, respectively). From this best iteration, $2\theta_1^{(B)} = 1.994$ is comparable to the posterior mean or median of $2\theta_1$.

Figure 10 depicts how the $\beta_{ij}^{(b)}$ are clustered according to the best clustering defined above by plotting the empirical distributions of these samples. Among the four clusters, the variability of the (combined) interaction distribution $F^*_k$ for team/cluster 1 is relatively low, followed by that for cluster 3, formed by teams 4, 7, 9, 16, 22, 34, and 40, and then that for the biggest cluster. In addition, the distributions for these three clusters are relatively symmetric about zero, while that for the second cluster, formed by teams 8, 13, and 27, is highly skewed to the right. These results are consistent with the above discussion of the magnitudes of the estimates of $P(F_j = F_{j'})$ and yield further information about the interaction effects and their distributions. Most of the teams are estimated to be similar to each other in the sense that their $F_j$'s are symmetric about zero. Beyond that, they are clustered based on their variabilities. The reason for the relatively low probability of team 27 being similar to other teams is that $F_{27}$ is positively skewed.
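The pairwise probabilities $P(F_j = F_{j'})$ discussed above can be estimated as the fraction of MCMC iterations in which two teams share the same distribution label $\zeta_j$. A minimal sketch follows, assuming the sampler stores the label vector at every iteration. The function name and the toy labels are hypothetical.

```python
def coclustering_probability(zeta_draws):
    """Estimate P(F_j = F_j') as the share of iterations with equal labels.

    zeta_draws[b][j] is the label of team j at MCMC iteration b.
    """
    n_iter = len(zeta_draws)
    n_teams = len(zeta_draws[0])
    counts = [[0] * n_teams for _ in range(n_teams)]
    for labels in zeta_draws:
        for j in range(n_teams):
            for k in range(n_teams):
                if labels[j] == labels[k]:
                    counts[j][k] += 1
    return [[c / n_iter for c in row] for row in counts]

# Toy labels (not trial output): 4 iterations, 3 teams.
toy_labels = [[0, 0, 1],
              [0, 1, 1],
              [0, 0, 0],
              [2, 2, 1]]
toy_prob = coclustering_probability(toy_labels)
```

Only equality of labels within an iteration matters, so the estimate is unaffected by label switching across iterations.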

6. CONCLUSION

This research proposed a new methodology for accommodating physician–patient interactions in cluster randomized trials. This new nDP-based method extends the traditional mixed effect models in a number of ways. First, it provides direct and explicit accounting for the physician–patient interactions that may influence patient outcomes significantly. As shown by our extensive simulations, failure to accommodate such interactions could directly impact the trial results. Second, the model greatly relaxes the constraints or assumptions of normal random effects distributions in the mixed effect models. Finally, as a modeling approach, the method provides graphical verification of the random interaction effects. With histograms of the estimated interaction effects, it affords an opportunity to detect deviations from the assumed random variable distribution. Practically, it also helps to identify physicians who are outliers in performance.

For our application, care of vulnerable older adults poses a great challenge to our society, especially in the decades to come. The concurrent mental and physical deterioration in older patients often requires a more comprehensive approach to managing their care and care delivery. In recent years, various new care models have been proposed by the medical community. Some of these newer care models have gone beyond the traditional pharmacological treatment and seek a higher level of care coordination between the patients and their care providers. These interventions are typically delivered at the provider and even at the clinic level. The assessment of such interventions, however, has been challenging because the ultimate care outcomes still have to be evaluated at the patient level. The clustering of patient outcomes within the physician, the effect of a specific physician–patient pair within each physician team/group, and the potential difference across these clusters pose a challenge to the evaluation of these care models. Despite the popularity of the traditional mixed effect models based on the normal random effects assumption, these models are often too restrictive and simplistic. Furthermore, the modeling assumptions are usually difficult to verify. Few models have provided explicit accommodation of the physician–patient interaction. In this research, we proposed a new analytical approach that greatly alleviates these difficulties. To the best of our knowledge, this is the first statistical model that directly addresses the physician–patient interaction in clustered patient data. In addition, through a clinical investigation, our analysis has provided a concrete example showing that real data do not always conform to the routinely used normal random effects assumption. The nDP approach has thus greatly enhanced the modeling flexibility, as it provides a direct accommodation of the interaction effects between physicians and patients.

APPENDIX

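An iterative algorithm for sampling random variates of $(\boldsymbol{\zeta}, \boldsymbol{\xi}, \boldsymbol{\pi}^*, \boldsymbol{\omega}^*, \boldsymbol{\beta}^*, \alpha, \rho, \mu_\beta, \tau_\beta)$, $\boldsymbol{\gamma}$, $\tau$, and $(\theta_1, \ldots, \theta_T)$ from their joint posterior distribution cycles through the following four steps:

1. Sampling of $(\boldsymbol{\zeta}, \boldsymbol{\xi}, \boldsymbol{\pi}^*, \boldsymbol{\omega}^*, \boldsymbol{\beta}^*, \alpha, \rho, \mu_\beta, \tau_\beta)$ for the interaction effects is carried out through the following steps:

(a) Sample the classification variables $\zeta_j$ for $j = 1, \ldots, C$ from a multinomial distribution with probabilities

$$P(\zeta_j = k \mid \cdots) \propto \pi^*_k \prod_{i=1}^{n_j} \sum_{l=1}^{L} f(y_{ijt} \mid \theta_t, \beta^*_{lk}, \boldsymbol{\gamma}, \tau, \boldsymbol{w}_{ij}), \qquad k = 1, \ldots, K.$$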

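(b) Sample the classification variables $\xi_{ij}$ for $j = 1, \ldots, C$ and $i = 1, \ldots, n_j$ from a multinomial distribution with probabilities

$$P(\xi_{ij} = l \mid \cdots) \propto \omega^*_{l\zeta_j} f(y_{ijt} \mid \theta_t, \beta^*_{l\zeta_j}, \boldsymbol{\gamma}, \tau, \boldsymbol{w}_{ij}), \qquad l = 1, \ldots, L.$$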

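(c) Sample $\boldsymbol{\pi}^*$ by generating

$$(u^*_k \mid \cdots) \overset{\text{ind}}{\sim} \text{beta}\left(1 + m_k,\ \alpha + \sum_{s=k+1}^{K} m_s\right), \qquad k = 1, \ldots, K - 1, \qquad u^*_K = 1,$$

where $m_k = \sum_{j=1}^{C} I_{\{\zeta_j = k\}}$ is the number of distributions among $F_1, \ldots, F_C$ assigned to component $k$ in (6), and constructing $\pi^*_k = u^*_k \prod_{s=1}^{k-1} (1 - u^*_s)$ for $k = 1, \ldots, K$.

(d) Sample $\boldsymbol{\omega}^*$ by generating, for $k = 1, \ldots, K$,

$$(v^*_{lk} \mid \cdots) \overset{\text{ind}}{\sim} \text{beta}\left(1 + n_{lk},\ \rho + \sum_{s=l+1}^{L} n_{sk}\right), \qquad l = 1, \ldots, L - 1, \qquad v^*_{Lk} = 1,$$

where $n_{lk} = \sum_{j=1}^{C} \sum_{i=1}^{n_j} I_{\{\zeta_j = k,\, \xi_{ij} = l\}}$ is the number of cluster effects assigned to atom $l$ of distribution $k$ in (7), and constructing $\omega^*_{lk} = v^*_{lk} \prod_{s=1}^{l-1} (1 - v^*_{sk})$ for $l = 1, \ldots, L$.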

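(e) Sample $\beta^*_{lk}$, for $k = 1, \ldots, K$ and $l = 1, \ldots, L$, according to

$$(\beta^*_{lk} \mid \cdots) \sim N\left(\frac{\tau \sum_{\{i,j \,\mid\, \zeta_j = k,\, \xi_{ij} = l\}} (y_{ijt} - \theta_t - \boldsymbol{w}_{ij}\boldsymbol{\gamma}) + \tau_\beta \mu_\beta}{n_{lk}\tau + \tau_\beta},\ \frac{1}{n_{lk}\tau + \tau_\beta}\right).$$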

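(f) Sample

$$(\alpha \mid \cdots) \sim \text{gamma}\left(a_\alpha + (K - 1),\ b_\alpha - \sum_{k=1}^{K-1} \log(1 - u^*_k)\right)$$

and

$$(\rho \mid \cdots) \sim \text{gamma}\left(a_\rho + K(L - 1),\ b_\rho - \sum_{l=1}^{L-1} \sum_{k=1}^{K} \log(1 - v^*_{lk})\right),$$

where $\text{gamma}(a, b)$ represents a gamma random variable $X$ with density $h(x \mid a, b) \propto x^{a-1} e^{-bx}$, $x > 0$.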

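(g) Sample $\mu_\beta$ according to

$$(\mu_\beta \mid \cdots) \sim N\left(\frac{\tau_\beta \sum_{k=1}^{K} \sum_{l=1}^{L} \beta^*_{lk}}{\tau_\beta K L + \sigma_\mu^{-2}},\ \frac{1}{\tau_\beta K L + \sigma_\mu^{-2}}\right).$$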

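(h) Sample $\tau_\beta$ according to

$$(\tau_\beta \mid \cdots) \sim \text{gamma}\left(a_{\tau_\beta} + \frac{KL}{2},\ b_{\tau_\beta} + \frac{1}{2} \sum_{k=1}^{K} \sum_{l=1}^{L} (\beta^*_{lk} - \mu_\beta)^2\right).$$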

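2. Sample $\boldsymbol{\gamma}$ from its full conditional distribution,

$$p(\boldsymbol{\gamma} \mid \cdots) \propto \left[\prod_{j=1}^{C} \prod_{i=1}^{n_j} f(y_{ijt} \mid \theta_t, \beta^*_{\xi_{ij},\zeta_j}, \boldsymbol{\gamma}, \tau, \boldsymbol{w}_{ij})\right] \phi_r(\boldsymbol{\gamma} \mid \boldsymbol{0}, \Sigma_\gamma),$$

where $\phi_r(\cdot \mid \boldsymbol{0}, \Sigma_\gamma)$ is an $r$-variate normal density with zero mean vector and variance-covariance matrix $\Sigma_\gamma$. For instance, when $\boldsymbol{\gamma} = \gamma$ is univariate and is distributed as $N(0, \sigma_\gamma^2)$ a priori,

$$(\gamma \mid \cdots) \sim N\left(\frac{\tau \sum_{j=1}^{C} \sum_{i=1}^{n_j} w_{ij} (y_{ijt} - \theta_t - \beta^*_{\xi_{ij},\zeta_j})}{\tau \sum_{j=1}^{C} \sum_{i=1}^{n_j} w_{ij}^2 + \sigma_\gamma^{-2}},\ \frac{1}{\tau \sum_{j=1}^{C} \sum_{i=1}^{n_j} w_{ij}^2 + \sigma_\gamma^{-2}}\right),$$

where $w_{ij}$ is the covariate for the $i$th patient cared for by the $j$th physician.

3. Sample $\tau$ from its full conditional distribution,

$$p(\tau \mid \cdots) \propto \left[\prod_{j=1}^{C} \prod_{i=1}^{n_j} f(y_{ijt} \mid \theta_t, \beta^*_{\xi_{ij},\zeta_j}, \boldsymbol{\gamma}, \tau, \boldsymbol{w}_{ij})\right] h(\tau \mid a_\tau, b_\tau).$$

That is,

$$(\tau \mid \cdots) \sim \text{gamma}\left(a_\tau + \frac{1}{2} \sum_{j=1}^{C} n_j,\ b_\tau + \frac{1}{2} \sum_{j=1}^{C} \sum_{i=1}^{n_j} (y_{ijt} - \theta_t - \beta^*_{\xi_{ij},\zeta_j} - \boldsymbol{w}_{ij}\boldsymbol{\gamma})^2\right).$$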


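4. For identifiability, assume that the $\theta_i$'s sum to zero, that is, $\theta_T = -\sum_{i=1}^{T-1} \theta_i$. For $s = 1, \ldots, T - 1$, let $\tilde{\theta}_s = \sum_{i=1,\, i \neq s}^{T-1} \theta_i$. Sample $\theta_s$ for $s = 1, \ldots, T - 1$ from its full conditional distribution,

$$(\theta_s \mid \cdots) \sim N\left(\frac{B_s \sigma_\theta^2 m_s}{B_s \sigma_\theta^2 + 1},\ \frac{\sigma_\theta^2}{B_s \sigma_\theta^2 + 1}\right),$$

where $B_s = \tau \left[\sum_{j=1}^{C} \sum_{i=1}^{n_j} I_{\{t=s\}} + \sum_{j=1}^{C} \sum_{i=1}^{n_j} I_{\{t=T\}}\right] \equiv M\tau$, with $M$ being the total number of observations among $N = \sum_{j=1}^{C} n_j$ satisfying the events $\{t = s\}$ or $\{t = T\}$, and

$$m_s = \frac{\tau}{B_s} \left[\sum_{j=1}^{C} \sum_{i=1}^{n_j} (y_{ijt} - \beta^*_{\xi_{ij},\zeta_j} - \boldsymbol{w}_{ij}\boldsymbol{\gamma}) I_{\{t=s\}} + \sum_{j=1}^{C} \sum_{i=1}^{n_j} (-y_{ijt} - \tilde{\theta}_s + \beta^*_{\xi_{ij},\zeta_j} + \boldsymbol{w}_{ij}\boldsymbol{\gamma}) I_{\{t=T\}}\right].$$

As a special case, for $T = 2$ treatments, we assume that $\theta_1 = -\theta_2 \equiv \theta$. We sample $\theta_1$ from its full conditional distribution,

$$(\theta_1 \mid \cdots) \sim N\left(\frac{B_1 \sigma_\theta^2 m_1}{B_1 \sigma_\theta^2 + 1},\ \frac{\sigma_\theta^2}{B_1 \sigma_\theta^2 + 1}\right),$$

where $B_1 = \left(\sum_{j=1}^{C} n_j\right) \tau = N\tau$, and

$$m_1 = \frac{1}{N} \left[\sum_{j=1}^{C} \sum_{i=1}^{n_j} (y_{ijt} - \beta^*_{\xi_{ij},\zeta_j} - \boldsymbol{w}_{ij}\boldsymbol{\gamma}) I_{\{t=1\}} + \sum_{j=1}^{C} \sum_{i=1}^{n_j} (-y_{ijt} + \beta^*_{\xi_{ij},\zeta_j} + \boldsymbol{w}_{ij}\boldsymbol{\gamma}) I_{\{t=2\}}\right].$$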
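Several of the conditional draws above are standard conjugate updates. The following sketch illustrates the stick-breaking weights of steps 1(c) and 1(d), the atom update of step 1(e), the precision update of step 3, and the $T = 2$ case of step 4. It is not the authors' implementation: all function names are hypothetical, residuals are assumed to be precomputed by the caller, and the tail count in the beta draw is taken as the number of members assigned to later components. Note that Python's `random.gammavariate` is parameterized by scale, so the rate $b$ in $\text{gamma}(a, b)$ is inverted.

```python
import math
import random

def stick_breaking_weights(counts, concentration, rng):
    """Steps 1(c)/(d): weights from beta draws given per-component counts."""
    weights, remaining = [], 1.0
    for k in range(len(counts)):
        if k == len(counts) - 1:
            u = 1.0  # the last stick fraction is fixed at 1
        else:
            tail = sum(counts[k + 1:])  # members of later components
            u = rng.betavariate(1 + counts[k], concentration + tail)
        weights.append(u * remaining)
        remaining *= 1.0 - u
    return weights

def atom_conditional(residuals, tau, mu_beta, tau_beta):
    """Step 1(e): mean and variance of the normal conditional of one atom.

    residuals holds y - theta_t - w*gamma for observations assigned to the atom.
    """
    precision = len(residuals) * tau + tau_beta
    mean = (tau * sum(residuals) + tau_beta * mu_beta) / precision
    return mean, 1.0 / precision

def precision_conditional(residuals, a_tau, b_tau):
    """Step 3: shape and rate of the gamma conditional of tau.

    residuals holds y - theta_t - beta - w*gamma for all observations.
    """
    shape = a_tau + 0.5 * len(residuals)
    rate = b_tau + 0.5 * sum(r * r for r in residuals)
    return shape, rate

def theta1_conditional(partial_residuals, arms, tau, sigma2_theta):
    """Step 4 with T = 2: mean and variance of the conditional of theta_1.

    partial_residuals holds y - beta - w*gamma; arms holds t in {1, 2}.
    """
    n = len(partial_residuals)
    b1 = n * tau
    m1 = sum(r if t == 1 else -r for r, t in zip(partial_residuals, arms)) / n
    denom = b1 * sigma2_theta + 1.0
    return b1 * sigma2_theta * m1 / denom, sigma2_theta / denom

# Toy inputs (not trial data), only to show how the pieces are drawn.
rng = random.Random(1)
pi_star = stick_breaking_weights([3, 1, 0, 2], concentration=1.0, rng=rng)
mean, var = atom_conditional([1.0, 3.0], tau=2.0, mu_beta=0.0, tau_beta=1.0)
beta_draw = rng.gauss(mean, math.sqrt(var))
shape, rate = precision_conditional([1.0, -1.0], a_tau=1.0, b_tau=1.0)
tau_draw = rng.gammavariate(shape, 1.0 / rate)  # scale = 1 / rate
```

Returning the conditional parameters separately from the random draw keeps each update easy to check by hand against the formulas above.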

[Received May 2011. Revised August 2012.]

REFERENCES

Agresti, A., Caffo, B., and Ohman-Strickland, P. (2004), “Examples in WhichMisspecification of a Random Effects Distribution Reduces Efficiency,and Possible Remedies,” Computational Statistics & Data Analysis, 47,639–653. [48]

Antoniak, C. E. (1974), “Mixtures of Dirichlet Processes With Applications toNonparametric Problems,” The Annals of Statistics, 2, 1152–1174. [50]

Asch, S. M., Sloss, E. M., Hogan, C., Brook, R. H., and Kravitz, R. L. (2000),“Measuring Underuse and Necessary Care Among Elderly Medicare Ben-eficiaries Using Inpatient and Outpatient Claims,” Journal of the AmericanMedical Association, 284, 2325–2333. [49]

Azzalini, A. (1985), “A Class of Distributions Which Includes the NormalOnes,” Scandinavian Journal of Statistics, 12, 171–178. [53]

Bland, J. M. (2004), “Cluster Randomised Trials in the Medical Literature: TwoBibliometric Surveys,” BMC Medical Research Methodology, 4, 1–6. [48]

Bohning, D. (2000), Computer-Assisted Analysis of Mixtures and Applications:Meta-Analysis, Disease Mapping and Others, Boca Raton, FL: Chapmanand Hall-CRC. [49]

Boult, C., Boult, L., and Pacala, J. (1998), “Systems of Care for Older Pop-ulations of the Future,” Journal of the American Geriatrics Society, 46,499–505. [49]

Branscum, A., and Hanson, T. E. (2008), “Bayesian Nonparametric Meta-Analysis Using Polya Tree Mixture Models,” Biometrics, 64, 825–833. [48]

Brown, E. R., and Ibrahim, J. G. (2003), “Bayesian Approaches to Joint Cure-rate and Longitudinal Models With Applications to Cancer Vaccine Trials,”Biometrics, 59, 686–693. [52]

Brown, H., and Prescott, R. (1999), Applied Mixed Models in Medicine, NewYork: Wiley. [48]

Burr, D., and Doss, H. (2005), “A Bayesian Semiparametric Model for Random-effects Meta-analysis,” Journal of the American Statistical Association, 100,242–251. [49]

Burr, D., Doss, H., Cooke, G. E., and Goldschmidt-Clermont, P. J. (2003), “AMeta-Analysis of Studies on the Association of the Platelet PlA Polymor-phism of Glycoprotein IIIa and Risk of Coronary Heart Disease,” Statisticsin Medicine, 22, 1741–1760. [49]

Callahan, C. M., Stump, T. E., Stroupe, K. T., and Tierney, W. M. (1998),“Cost of Health Care for a Community of Older Adults in an Urban Aca-demic Health Care System,” Journal of the American Geriatrics Society,46, 1371–1377. [49]

Campbell, M. K., Elbourne, D. R., and Altman, D. C. (2004), “CONSORT State-ment: Extension to Cluster Randomised Trials,” British Medical Journal,328, 702–708. [48]

Campbell, M. K., Mollison, J., and Grimshaw, J. M. (2001), “Cluster Trialsin Implementation Research: Estimation of Intracluster Correlation Coeffi-cients and Sample Size,” Statistics in Medicine, 20, 391–399. [48]

Chen, M.-H., Ibrahim, J. G., Lam, P., Yu, A., and Zhang, Y. (2011), “BayesianDesign of Noninferiority Trials for Medical Devices Using Historical Data,”Biometrics, 67, 1163–1170. [54]

Chen, M.-H., Shao, Q. M., and Ibrahim, J. G. (2000), Monte Carlo Methods inBayesian Computation, Berlin and New York: Springer-Verlag. [52]

Counsell, S. R., Callahan, C. M., Clark, D. O., Tu, W., Buttar, A. B., Stump, T.E., and Ricketts, G. D. (2007), “Geriatric Care Management for Low-IncomeSeniors,” Journal of the American Medical Association, 298, 2623–2633.[49]

Counsell, S. R., Holder, C. M., Liebenauer, L. L., Palmer, R. M., Fortinsky, R. H.,Kresevic, D. M., Quinn, L. M., Allen, K. R., Covinsky, K. E., and Landefeld,C. S. (2000), “Effects of a Multicomponent Intervention on Functional Out-comes and Process of Care in Hospitalized Older Patients: A RandomizedControlled Trial of Acute Care for Elders (ACE) in a Community Hospital,”Journal of the American Geriatrics Society, 48, 1572–1581. [49]

Di, C.-Z., and Bandeen-Roche, K. (2011), “Multilevel Latent Class ModelsWith Dirichlet Mixing Distribution,” Biometrics, 67, 86–96. [49]

Donner, A., and Klar, N. (2000), Design and Analysis of Cluster RandomizationTrials in Health Research, London, UK: Arnold Publishers Limited. [48]

Dunson, D. B. (2008), “Nonparametric Bayes Applications to Biostatistics,” inBayesian Nonparametrics, eds. N. L. Hjort, C. Holmes, P. Muller, and S. G.Walker, Cambridge: Cambridge University Press, pp. 223–273. [49]

Escobar, M. D. (1988), “Estimating the Means of Several Normal Populationsby Nonparametric Estimation of the Distribution of the Means,” unpublishedPh.D. dissertation, Department of Statistics, Yale University. [50]

——— (1994), “Estimating Normal Means With a Dirichlet ProcessPrior,” Journal of the American Statistical Association, 89, 268–277. [50]

Escobar, M. D., and West, M. (1995), “Bayesian Density Estimation and Infer-ence Using Mixtures,” Journal of the American Statistical Association, 90,577–588. [50]

Ferguson, T. S. (1973), “A Bayesian Analysis of Some Nonparametric Prob-lems,” The Annals of Statistics, 1, 209–230. [50]

Geisser, I., and Eddy, W. (1979), “A Predictive Approach to Model Selection,”Journal of the American Statistical Association, 74, 153–160. [52]

Gelfand, A. E., Dey, D. K., and Chang, H. (1992), “Model Determination UsingPredictive Distributions With Implementation via Sampling-Based Meth-ods” (with discussion), in Bayesian Statistics (Vol. 4), eds. J. M. Bernardo,J. O. Berger, A. P. Dawid, and A. F. M. Smith, Oxford: Oxford UniversityPress, pp. 147–159. [52]

Ghidey, W., Lesaffre, E., and Eilers, P. (2004), “Smooth Random Effects Dis-tribution in a Linear Mixed Model,” Biometrics, 60, 945–953. [54]

Ghosh, P., Basu, S., and Tiwari, R. C. (2009), “Bayesian Analysis of CancerRates From SEER Program Using Parametric and Semiparametric JoinpointRegression Models,” Journal of the American Statistical Association, 104,439–452. [50,52]

Ghosh, K., Ghosh, P., and Tiwari, R. C. (2008), Comment on “The NestedDirichlet Process (By Rodrıguez, Dunson and Gelfand),” Journal of theAmerican Statistical Association, 103, 1147–1149. [53]

Higgins, J. P. T., Thompson, S. G., and Spiegelhalter, D. J. (2009), “A Re-Evaluation of Random-Effects Meta-Analysis,” Journal of the Royal Statis-tical Society, Series A, 172, 139–159. [49]

Ishwaran, H., and James, L. F. (2001), “Gibbs Sampling Methods for Stick-Breaking Priors,” Journal of the American Statistical Association, 96,161–173. [51]

Jackel, L. A. (1972), “Estimating Regression Coefficients by Minimizing theDispersion of Residuals,” Annals of Mathematical Statistics, 43, 1449–1458.[49]

Jencks, S. F., Cuerdon, T., Burwen, D. R., Fleming, B., Houck, P. M., Kussmaul,A. E., Nilasena, D. S., Ordin, D. L., and Arday, D. R. (2000), “Qualityof Medical Care Delivered to Medicare Beneficiaries: A Profile at Stateand National Levels,” Journal of the American Medical Association, 284,1670–1676. [49]

Dow

nloa

ded

by [

Uni

vers

ite L

aval

] at

17:

21 0

9 Ju

ly 2

014

Page 22: A Nested Dirichlet Process Analysis of Cluster Randomized Trial Data With Application in Geriatric Care Assessment

68 Journal of the American Statistical Association, March 2013

Landefeld, C. S., Palmer, R. M., Kresevic, D. M., Fortinsky, R. H., and Kowal, J.(1995), “A Randomized Trial of Care in a Hospital Medical Unit EspeciallyDesigned to Improve the Functional Outcomes of Acutely ill Older Patients,”The New England Journal of Medicine, 332, 1338–1344. [49]

Lee, K. J., and Thompson, S. G. (2007), “Flexible Parametric Mod-els for Random-Effects Distributions,” Statistics in Medicine, 27, 418–434. [49]

Litiere, S., Alonso, A., and Molenberghs, G. (2007), “Type I and Type II Er-ror Under Random Effects Misspecification in Generalized Linear MixedModels,” Biometrics, 63, 1038–1044. [54]

Lo, A. Y. (1978), “Bayesian Nonparametric Density Methods,” Technical Re-port, Department of Statistics, University of California, Berkeley. [50]

——— (1984), “On a Class of Bayesian Nonparametric Estimates. 1. DensityEstimates,” The Annals of Statistics, 12, 351–357. [50]

MedPAC. (2005), A Data Book: Healthcare Spending and the Medicare Pro-gram, June 2005, MedPAC Report, Washington, DC: MedPAC, pp. 12–20.[49]

Moher, D., Schulz, K. F., and Altman, D. G. (2001a), “The CONSORT State-ment: Revised Recommendations for Improving the Quality of Reports ofParallel-Group Randomized Trials,” Journal of the American Medical As-sociation, 285, 1987–1991. [48]

——— (2001b), “The CONSORT Statement: Revised Recommendations forImproving the Quality of Reports of Parallel-Group Randomized Trials,”Annals of Internal Medicine, 134, 657–662. [48]

Ohlssen, D. I., Sharples, L. D., and Spiegelhalter, D. J. (2007), “FlexibleRandom-effects Models Using Bayesian Semi-parametric Models: Applica-tions to Institutional Comparisons,” Statistics in Medicine, 26, 2088–2112.[48,49]

Pfeiffer, E. (1975), “A Short Portable Mental Status Questionnaire for the As-sessment of Organic Brain Deficit in Elderly Patients,” Journal of the Amer-ican Geriatrics Society, 23, 433–441. [49]

Rashid, M. M. (2003), “Rank-based Test for Non-inferiority and EquivalenceHypotheses in Multi-cluster Clinical Trials Using Mixed Models,” Statisticsin Medicine, 22, 291–311. [48,49]

Reuben, D. B. (2002), “Organizational Interventions to Improve Health Out-comes of Older Persons,” Medical Care, 40, 416–428. [49]

Rodrıguez, A. (2007), “Some Advances in Bayesian Nonparametric Modeling,”unpublished Ph.D. dissertation, Institute of Statistics and Decision Sciences,Duke University. [50,51,53]

Rodrıguez, A., Dunson, D. B., and Gelfand, A. E. (2008), “The Nested DirichletProcess,” Journal of the American Statistical Association, 103, 1131–1154.[49,50,51,53]

Sethuraman, J. (1994), “A Constructive Definition of Dirichlet Priors,” StatisticaSinica, 4, 639–650. [50]

Sethuraman, J., and Tiwari, R. C. (1982), “Convergence of Dirichlet Measureand the Interpretation of Their Parameters,” in Statistical Decisions Theoryand Related Topics III (Vol. 2), eds. S. Gupta, and J. O. Berger, New York:Academic Press, pp. 305–315. [50]

Tchetgen, E., and Coull, B. (2006), “A Diagnostic Test for the Mixing Distri-bution in a Generalized Linear Mixed Model,” Biometrika, 93, 1003–1010.[49]

Verbeke, G., and Molenberghs, G. (2000), Linear Mixed Model for LongitudinalData, New York: Springer. [48]

Walker, S. G., and Mallick, B. K. (1997), “Hierarchical Generalized LinearModels and Frailty Models With Bayesian Nonparametric Mixing,” Journalof the Royal Statistical Society, Series B, 59, 845–860. [49]

Ware, J. E., Kosinski, M., and Keller, S. D. (1994), SF-36 R© Physical andMental Health Summary Scales: A User’s Manual, Boston, MA: The HealthInstitute, New England Medical Cluster. [50]

Wenger, N. S., Solomon, D. H., Roth, C. P., MacLean, C. H., Saliba, D.,Kamberg, C. J., Rubenstein, L. Z., Young, R. T., Sloss, E. M., Louie, R.,Adams, J., Chang, J. T., Venus, P. J., Schnelle, J. F., and Shekelle, P. G.(2003), “The Quality of Medical Care Provided to Vulnerable Community-Dwelling Older Patients,” Annals of Internal Medicine, 139, 740–747. [49]

Wolff, J. L., and Boult, C. (2005), “Moving Beyond Round Pegs and SquareHoles: Restructuring Medicare to Improve Chronic Care,” Annals of InternalMedicine, 143, 439–445. [49]

Dow

nloa

ded

by [

Uni

vers

ite L

aval

] at

17:

21 0

9 Ju

ly 2

014