jointly modeling time-to-event and longitudinal data: a...

Stat Methods Appl (2014) 23:95–121DOI 10.1007/s10260-013-0242-7

Jointly modeling time-to-event and longitudinaldata: a Bayesian approach

Yangxin Huang · X. Joan Hu · Getachew A. Dagne

Accepted: 7 September 2013 / Published online: 27 September 2013© Springer-Verlag Berlin Heidelberg 2013

Abstract This article explores Bayesian joint models of event times and longitudinalmeasures with an attempt to overcome departures from normality of the longitudinalresponse, measurement errors, and shortages of confidence in specifying a parametrictime-to-event model. We allow the longitudinal response to have a skew distribution inthe presence of measurement errors, and assume the time-to-event variable to have anonparametric prior distribution. Posterior distributions of the parameters are attainedsimultaneously for inference based on Bayesian approach. An example from a recentAIDS clinical trial illustrates the methodology by jointly modeling the viral dynam-ics and the time to decrease in CD4/CD8 ratio in the presence of CD4 counts withmeasurement errors to compare potential models with various scenarios and differentdistribution specifications. The analysis outcome indicates that the time-varying CD4covariate is closely related to the first-phase viral decay rate, but the time to CD4/CD8decrease is not highly associated with either the two viral decay rates or the CD4changing rate over time. These findings may provide some quantitative guidance tobetter understand the relationship of the virological and immunological responses toantiretroviral treatments.

Keywords Accelerated failure time model · Dirichlet process · Semiparametriclinear/nonlinear mixed-effects model · Skew-elliptical distribution · Time-to-event

Y. Huang (B) · G. A. DagneDepartment of Epidemiology and Biostatistics, College of Public Health, MDC 56,University of South Florida, Tampa, FL 33612, USAe-mail: [email protected]

X. J. HuDepartment of Statistics and Actuarial Sciences,Simon Fraser University, Burnaby, BC V5A 1S6, Canada

123

96 Y. Huang et al.

1 Introduction

Many studies aim at exploring the relationship between a time-to-event outcomeand a longitudinal marker. For example, the relationship of HIV viral suppressionand immune restoration after a treatment has received great attention in HIV/AIDSresearch (e.g., Acosta et al. 2004). One is often interested in simultaneously studyingthe HIV dynamics and immune restoration, which may be characterized by the time toCD4/CD8 ratio decline (e.g., Pawitan and Self 1993). Relatively little has been pub-lished about statistical analysis for the particular association of HIV viral dynamicsand time trend in CD4/CD8 ratio in the presence of CD4 covariate process, with fewnotable exceptions such as Wu et al. (2010). The research was motivated by the AIDSstudy (Acosta et al. 2004) to understand within-subject patterns of change in viralload or CD4 cell count, and to study the relationship of features of viral load and CD4profiles with time to decrease in CD4/CD8 ratio.

Joint analysis of event times and longitudinal measures has recently attracted greatattention. There have been a considerable number of statistical approaches in theliterature. However, most of the published methods assume that the error terms inthe models for the longitudinal response or for the measurement errors in covari-ates follow normal distributions due to mathematical tractability and computationalconvenience. This requires the variables to be “symmetrically” distributed. A viola-tion of this assumption could lead to misleading inferences. In fact, observed data inHIV studies are often far from being “symmetric”: measurement errors in covariatesoften arise and asymmetric patterns of observations usually occur. This motivated ourdevelopment of a Bayesian joint model for non-normally distributed data with imper-fectly measured covariates. Although the proposed approach applies quite generally,this article focuses on joint modeling of HIV dynamics and time to the decline ofthe CD4/CD8 ratio after treatment was initiated, using CD4 counts over time as acovariate process.

We employ a general Bayesian framework for joint modeling of longitudinalresponse and time-to-event in the presence of covariates with measurement errors,to accommodate a large class of data structures with various features. In particular,we consider an accelerated failure time (AFT) model for the time to the decline ofCD4/CD8 ratio using the nonparametric Dirichlet process (DP) prior as a distribution(Ferguson 1973; Ishwaran and James 2002). The semiparametric nonlinear mixed-effects (SNLME) model with skew-elliptical (SE) distributed random errors is usedfor HIV dynamics. The skew-elliptical distributions include the skew-normal (SN)and skew-t (ST) distributions as special cases. We model the CD4 process with thelinear mixed-effects (LME) model with SE distributions to account for the potentialmeasurement errors. The modeling is motivated by the following two considerations:(1) often, one may not be confident in specifying a parametric model for event times,and (2) normal distributions are not always appropriate for the outcomes of interest.

It is noted that the ST distribution is approximate to the SN distribution when itsdegrees of freedom approach infinity. Thus, we use an ST distribution to developjoint models and associated statistical methodologies, as it can be easily extended toother SE distributions such as the SN distribution. We consider multivariate ST andSN distributions introduced by Sahu et al. (2003), which are suitable for a Bayesian

123

Jointly modeling time-to-event and longitudinal data 97

0 50 100 150 200

24

68

10

12

Profiles of viral load in ln scale

Day

ln(R

NA

)

0 50 100 150 2002

00

30

04

00

50

06

00

70

08

00

Profiles of CD4 cell count

Day

CD

4 c

ell

cou

nt

0 50 100 150 200

0.0

0.2

0.4

0.6

0.8

Profiles of CD4/CD8 ratio

Day

CD

4/C

D8

(a) (b) (c)

Fig. 1 Viral load in natural logarithmic scale, CD4 cell count and CD4/CD8 ratio trajectories of threerandomly selected patients

inference and briefly discussed in the Appendix. The rest of this paper is organizedas follows. Notation and joint model setup are introduced; associated simultaneousBayesian inference procedures for estimating all parameters of the joint models arepresented in Sect. 2. In Sect. 3, we describe the data set that motivated this research,discuss the specific models for viral load response, CD4 covariate process and time todecline of CD4/CD8 ratio that are used to illustrate the proposed methodologies, andreport the analysis results. We conclude the paper with a discussion in Sect. 4.

2 Bayesian joint models with skew distributions

Figure 1 presents the trajectories of HIV viral load, CD4 cell count and CD4/CD8ratio of three randomly selected subjects from a recent AIDS clinical trial (Acosta etal. 2004). While the data indicate a close association of the longitudinal viral load,CD4 cell count and CD4/CD8 ratio, the data show a large variation in the associationacross subjects and over time within each subject. This observation together with thepublished researches in the area (e.g., Wu and Ding 1999; Liu and Wu 2007; Wu 2009)led us to propose the following joint models for the three components of concern.

2.1 Measurement error models with a skew-t distribution

Various covariate mixed-effects models were investigated in the literature (Carrollet al. 2006; Huang and Dagne 2011; Liu and Wu 2007). In this section we brieflydiscuss covariate mixed-effects models with random errors to have an ST distribution.For simplicity, we consider a single time-varying covariate. Let zik be the observedcovariate value for individual i at time sik(i = 1, . . . n; k = 1, . . . ,mi ). We allowthe covariate measurement times sik to differ from the response measurement timesti j ( j = 1, . . . , ni ) for each individual. In the presence of covariate measurementerrors, we consider the following covariate LME model with an ST distribution

zik = uTikα + vT

ik ai + εik (≡ z∗ik + εik),

εi i id ∼ STmi ,ν1

(−J (ν1)δεi , σ

21 Imi ,�(δεi )

), (1)

123

98 Y. Huang et al.

where zi = (zi1, . . . , zimi )T and zik is the covariate value for individual i at

time sik, z∗i = (z∗

i1, . . . , z∗imi)T and z∗

ik = uTikα + vT

ik ai may be viewed as thetrue (but unobservable) covariate value at time sik, uik and vik are r × 1 designvectors. Here α = (α1, . . . , αr )

T and ai = (ai1, . . . , air )T are unknown pop-

ulation (fixed-effects) and individual-specific (random-effects) parameter vectors,respectively, εi = (εi1, . . . , εimi )

T follows a multivariate ST distribution with ν1degrees of freedom, variance σ 2

1 and mi × mi skewness diagonal matrix �(δεi ) =diag(δεi1, . . . , δεimi

) with mi × 1 skewness parameter vector δεi = (δεi1, . . . , δεimi)T ,

and J (ν1) = (ν1/π)1/2[�((ν1 − 1)/2)/�(ν1/2)] is the function of ν1. In partic-

ular, if δεi1 = · · · = δεimi=δε , then δεi = δε1mi with 1mi = (1, . . . , 1)T and

�(δεi ) = δε Imi ; this is the case to be considered for data analysis in Sect. 3. Weassume that ai i id ∼ Nr (0,�a), where �a is the unrestricted covariance matrix.Note that model (1) may be interpreted as an ST covariate measurement error model.

2.2 Skew-t semiparametric nonlinear mixed-effects models for response process

We now present models for response process in general forms, also appropriate in otherapplications. Denote the number of subjects by n and the number of measurementson the ith subject by ni . For the response process, we consider a general SNLMEmodel with an ST distribution and incorporate possibly mis-measured time-varyingcovariates. Let yi j be the response value for individual i at ti j , g(·), d†(·) and v(·) beknown parametric functions,w(t) and hi (t) be unknown nonparametric smooth fixed-effects and random-effects functions, respectively, with hi (t) being iid realization ofa zero-mean stochastic process. Consider

yi j = g(ti j ,β†i j , φ(ti j ))+ ei j , ei i id ∼ STni ,ν2

(−J (ν2)δei , σ

22 Ini ,�(δei )

),

β†i j = d†[z∗

i j ,β†, b†

i ], φ(ti j ) = v[w(ti j ), hi (ti j )], (2)

where β†i j is s1 × 1 individual-specific time-dependent parameter vector, β† is s2 × 1

population parameter vector (s2 ≥ s1), b†i (s3 ≤ s1) is s3 ×1 vector of random-effects,

the vector of random errors ei = (ei1, . . . , eini )T follows a multivariate ST distribution

with degrees of freedom ν2, varianceσ 22 and ni ×ni skewness diagonal matrix�(δei ) =

diag(δei1, . . . , δeini)with the ni ×1 skewness parameter vector δei = (δei1, . . . , δeini

)T .In particular, if δei1 = · · · = δeini

= δe, then �(δei ) = δe Ini and δei = δe1ni . In

model (2), we assume that the individual-specific parameter β†i j depends on the true

(but unobserved) covariate z∗i j rather than the observed covariate zi j , which may be

measured with substantial errors. The ST-SNLME model (2) reverts to a parametricNLME model when the nonparametric partsw(t) and hi (t) are constants. To fit model(2), we may apply the regression spline method for nonparametric functions. Theworking principle is briefly described as follows and more details can be found in Wuand Zhang (2002).

The main idea of regression spline is to approximate w(t) and hi (t) byusing a linear combination of spline basis functions. For instance, w(t) and

123


hi (t) can be approximated by a linear combination of basis functions � p(t) ={ψ0(t), ψ1(t), ..., ψp−1(t)}T and q(t) = {φ0(t), φ1(t), ..., φq−1(t)}T , respectively.That is,

w(t) ≈ wp(t) =p−1∑l=0

μlψl(t) = � p(t)T μp,

hi (t) ≈ hiq(t) =q−1∑l=0

ξilφl(t) = q(t)T ξ iq , (3)

where μp and ξ iq (q ≤ p) are the unknown vectors of fixed and random coefficients,respectively. Based on the assumption of hi (t), we regard ξ iq as i id realizations ofa zero-mean random vector. For our model, we consider natural cubic spline baseswith the percentile-based knots. To select an optimal degree of regression spline andnumber of knots, i.e., optimal sizes of p and q, the Akaike information criterion (AIC)or the Bayesian information criterion (BIC) is often applied (Wu and Zhang 2002).

Substituting w(t) and hi (t) by their approximations wp(t) and hiq(t), we canapproximate model (2) as follows.

yi j = g(

ti j , d†[z∗i j ,β

†, b†i ], v[� p(ti j )

T μp,q(ti j )T ξ iq ]

)+ ei j

≡ g(

ti j , d[z∗i j ,β, bi ]

)+ ei j (4)

where β = (β†T ,μTp )

T and bi = (b†Ti , ξ T

iq)T are fixed-effects and random-effects,

respectively, and d(·) is a known but possible nonlinear function. By doing so, therandomness of the nonparametric mixed-effects is transferred to the randomness ofthe associated coefficients, whereas the nonparametric feature is represented by thebasis functions. Thus, for given � p(t) and q(t), we approximate the ST-SNLMEmodel (2) by the following ST-NLME model.

yi = gi (t i ,β i )+ ei , ei i id ∼ STni ,ν2

(−J (ν2)δei , σ

22 Ini ,�(δei )

),

β i j = d[z∗i j ,β, bi ], bi i id ∼ Ns4(0,�b), (5)

where s4 = s3 + q, yi = (yi1, . . . , yini )T , t i = (ti1, . . . , tini )

T , β i = (β i1, . . . ,

β ini)T , gi (t i ,β i ) = (g(ti1,β i1), . . . , g(tini ,β ini

))T with g(·) being a known non-linear function, and �b is an unstructured covariance matrix.

2.3 Time-to-event models with an unspecified distribution

Various accelerated failure time (AFT) models for time-to-event were investigated inthe literature (DeGruttola and Tu 1994; Tsiatis and Davidian 2004; Wu et al. 2010).However, the common assumption of distributions for model errors is log-normaland this assumption may lack the robustness and/or may violate the agreement with

123

100 Y. Huang et al.

observed data. Thus, statistical inference and analysis with normal assumption maylead to misleading results. We consider AFT models with random errors to have anonparametric prior distribution, which follow a DP prior. The event time �i is likelyrelated to the longitudinal response and covariate processes. This association is ofmuch interest in many practical situations. We specify the association by assumingthat, conditional on the random-effects ai and bi , the event time is related to thelongitudinal processes through the random-effects that characterize the individual-specific effects (Huang et al. 2011; Wu et al. 2010). Particularly, we consider thefollowing AFT model.

ln(�i ) = γ0 + γ T1 ai + γ T

2 bi + εi , εi ∼ G(·),G(·) ∼ D P(ηG0), G0 ∼ N (ζ0, σ

20 ), η ∼ �(η10, η20). (6)

where γ = (γ0, γT1 , γ

T2 )

T are unknown parameters, ci = (1, aTi , bT

i )T are the

random-effects from models (1) and (5). We assume that ci are independent of εi ,and εi follows an unspecified distribution G(·) that is the DP prior (Ferguson 1973;Ishwaran and James 2002) with the concentration parameter η and the base measureG0 following a � distribution and a specified normal distribution, respectively. Notethat the parameter G0 is the prior mean of G(·) and represents a guess; the other para-meter of the DP prior, η, reflects the degree of closeness of G(·) to the prior mean G0.Large values of η make G(·) very close to G0, while small values of η allow G(·) todeviate from G0. This nonparametric Bayesian analysis of specifying the distributionof εi is robust to misspecification of the distributional assumptions about the modelerror.

The AFT model (6) for time-to-event, which can be written as �i = exp(γ T ci )

exp(εi ), reduces to the AFT model considered in Wu et al. (2010) if η = ∞, a log-normal distribution conditional on the random-effects ci . Model (6) may be a goodchoice when the event times are thought to depend on individual-specific longitudinaltrajectories, such as initial intercepts and slopes, or summaries of the longitudinaltrajectories, and it is closely related to so-called shared parameter models (DeGruttolaand Tu 1994). As pointed out in the studies (Wu et al. 2010; Tsiatis and Davidian 2004),in some practical situations, the event times cannot be observed but are only knownas being contained in some time intervals, i.e., being interval-censored. In the AIDSstudy mentioned previously, for example, given that CD4 and CD8 were collectedat a finite number of times, we can only know that the time to the first occurrenceof CD4/CD8 decrease in a subject is between two data collection time points. Theobserved event time data are then (ui , vi ], i = 1, . . . , n, where (ui , vi ] is the smallestobserved interval containing �i . We take ui as the subject’s latest time in the studyand vi = ∞ if the i th subject did not experience the event of interest during the wholestudy period.

2.4 Simultaneous Bayesian inference for parameter estimation

In a longitudinal study, the longitudinal response and covariate processes, and the time-to-event are usually connected physically or biologically. As discussed previously, they

123


can be jointly modeled through the shared random-effects. This joint modeling is oftenbiologically meaningful. Statistical inference on all of the model parameters can thenbe made simultaneously. Thus, the underlying association can be fully addressed orcaptured and the uncertainty is incorporated from the estimation.

Denote the observed data by D = {( yi , zi , ui , vi ), i = 1, . . . , n}, where yi =(yi1, . . . , yini )

T , zi = (zi1, . . . , zimi )T and (ui , vi ] is the smallest observed interval

containing the event time �i . According to the properties of ST distribution, in orderto specify models (1) and (5) for MCMC computation it can be shown by introducingtwo random vectors wei = (wei1, . . . , weini

)T and wεi = (wεi1, . . . , wεimi)T based

on the stochastic representation for the ST distribution (see Appendix in detail) thatzi and yi with respective random-effects ai and bi can be hierarchically formulatedin conjunction with model (6) as follows.

zi |ai ,wεi ;α, σ 21 , δεi , ν1 ∼ tmi ,ν1+mi

(z∗

i + W εi , ωεiσ21 Imi

),

yi |ai , bi ,wei ;α,β, σ 22 , δei , ν2 ∼ tni ,ν2+ni

(gi + W ei , ωeiσ

22 Ini

),

�i |ci ; γ ,G0, η ∼ G(

ln(ti )− γ T ci

),

wεi |ν1 ∼ tmi ,ν1(0, Imi )I (wεi > 0), ai |�a ∼ Nr (0,�a),

wei |ν2 ∼ tni ,ν2(0, Ini )I (wei > 0), bi |�b ∼ Ns4(0,�b), (7)

where W εi = �(δεi )[wεi − J (ν1 + mi )1mi ], W ei = �(δei )[wei − J (ν2 +ni )1ni ], ωεi = (ν1 + wT

εiwεi )/(ν1 + mi ), ωei = (ν2 + wT

eiwei )/(ν2 + ni ), z∗

i =(z∗

i1, . . . , z∗imi)T , gi = gi (t i ,β i ), tl,ν(μ, A) denotes the l dimensional t-distribution

with parameters μ, A and degrees of freedom ν, I (w > 0) is an indicator functionand w ∼ tl,ν(0, I l) truncated in the space w > 0.

Let θ = {α,β, γ , σ 21 , σ

22 ,�a,�b, ν1, ν2, δεi , δei ; i = 1, . . . , n} be the collection

of unknown population parameters in models (1), (5) and (6). Under Bayesian frame-work, we still need to specify prior distributions for all unknown parameters in thethree models as follows.

α ∼ Nr (τ 1,�1), σ21 ∼ IG(ω1, ω2), �a ∼ IW(Ω1, ρ1), δεi ∼ Nmi (0,Γ 1),

β ∼ Ns5(τ 2,�2), σ22 ∼ IG(ω3, ω4), �b ∼ IW(Ω2, ρ2), δei ∼ Nni (0,Γ 2),

ν1 ∼ Exp(ν10)I (ν1 > 3), ν2 ∼ Exp(ν20)I (ν2 > 3), γ ∼ Ns6(τ 3,�3), (8)

where s5 = s2 + p, s6 = 1 + s4 + r , the mutually independent InverseGamma (IG), Normal (N ), Exponential (Exp) and Inverse Wishart (IW ) priordistributions are chosen to facilitate computations. The super-parameter matrices�1, �2, �3, Ω1, Ω2, Γ 1 and Γ 2 can be assumed to be diagonal for convenientimplementation.

Let f (·|·) and π(·) be a conditional density function and a prior densityfunction, respectively. Assume that the prior distribution of θ to be π(θ) =π(α)π(β)π(γ )π(σ 2

1 )π(σ22 )π(�a)π(�b)π(ν1)π(ν2)

∏i π(δεi )π(δei ). After we sp-

ecify the models for the observed data and the prior distributions for the unknownmodel parameters, we can make statistical inference for the parameters based on their

123

102 Y. Huang et al.

posterior distributions. The joint posterior density of θ based on the observed data canbe given by

f (θ |D) ∝{

n∏i

∫ ∫f ( yi |ai , bi ,wei ;α,β, σ 2

2 , δei , ν2) f (wei |wei > 0)

× f (zi |ai ,wεi ;α, σ 21 , δεi , ν1) f (wεi |wεi > 0)

×F∗(ui , vi |ci ; γ ,G0, η) f (ai |�a) f (bi |�b)dai dbi

}π(θ), (9)

where F∗(ui , vi |ci ; γ ,G0, η) = G(ln(vi )− γ T ci

)− G(ln(ui )− γ T ci

). In general,

the integrals in (9) are of high dimension and do not have closed form. Analytic approx-imations to the integrals may not be sufficiently accurate. Therefore, it is prohibitive todirectly calculate the posterior distribution of θ based on the observed data. As an alter-native, MCMC procedures can be used to sample based on (9) using the Gibbs sampleralong with the Metropolis-Hastings (M-H) algorithm. An important advantage of theabove representations based on the hierarchical models is that they can be very easilyimplemented using the freely available WinBUGS software (Lunn et al. 2000) and thecomputational effort is equivalent to the one necessary to fit the normal version of themodel. Note that when using the WinBUGS software, it is not necessary to specify thefull conditional distributions explicitly or proportional functions of the density func-tions of full conditional distributions for parameters. Although their derivations arestraightforward by working the complete joint posterior, some cumbersome algebrawill be involved. Thus, we omit those here to save space.

3 An AIDS clinical study

The aforementioned HIV/AIDS clinical trial (Acosta et al. 2004) consisted of 44 HIV-infected patients, who were treated with a potent antiretroviral regimen. HIV RNAviral load was measured in copies/mL at study days 0, 7, 14, 28, 56, 84, 112, 140 and168 of follow-up. Covariates such as CD4 and CD8 cell counts were also measuredthroughout the study on a similar scheme. The nucleic acid sequence-based amplifi-cation assay was used to measure RNA viral load, with a lower limit of quantificationof 50 copies/mL. The RNA viral load measures below this limit are not consideredreliable. Therefore, we replaced them by 25 copies/mL as in Acosta et al. (2004). Theexact day of viral load measurement (not predefined study day) was used to computestudy day in our analysis. We present below an application of the proposed approachin Sect. 2. The natural logarithmic transformation of HIV RNA copies is used in theanalysis.

3.1 Model specifications

Viral dynamic models can be formulated through a system of ordinary differentialequations (Huang et al. 2006; Wu and Ding 1999). Based on biological and clinical

123


arguments, a biphasic model with both the first- and second-phase viral decay ratesas constants can be approximately offered to describe HIV viral dynamics (Wu andDing 1999). The first- and second-phase viral decay rates may represent the minimumturnover rate of productively infected cells and that of latently or long-lived infectedcells, respectively. It is of particular interest to estimate these two viral decay ratesbecause they quantify the antiviral effect and, hence, can be used to assess the efficacyof the antiviral treatment. In estimating these decay rates, only the early segment ofthe viral load trajectory data have been used (Wu and Ding 1999). Since the viral loadtrajectory may change to different shapes in the late stages, it may not be reasonable toassume that the second-phase decay rate remains constant during long-term treatment.

To model the long-term viral load responses, a semiparametric biexponential modelcan be constructed as follows (Wu and Zhang 2002).

y(t) = ln{V (t)} + e(t) = ln{

ep1−λ1t + ep2−λ2(t)t}

+ e(t), (10)

where V (t) is the plasma HIV-1 RNA levels (viral load) at time t ; λ1 and λ2(t)are called the first- and second-phase viral decay rates, respectively. The second-phase decay rate λ2(t) is an unknown smooth function. Model (10) reduces to thatconsidered by Wu and Ding (1999) and Huang and Dagne (2011) when λ2(t) isconstant. Intuitively, model (10) is more reasonable because it assumes that the decayrate can vary with time as a result of drug resistance, pharmacokinetics, drug adherenceand other relevant clinical factors. Therefore, all data observed during antiretroviraltreatment can be used to fit model (10). This is a semiparametric model becauseof the mechanistic structure (two-exponential) with constant parameters (λ1, p1, p2)and a time-varying parameter (λ2(t)) to capture the various viral load trajectoriesincluding viral load rebound over a long period. This semiparametric model preservescompartmental mechanistic interpretation of the original parametric model under thebiexponential form. In particular, the turnover rate of productively infected cells, λ1,can still be estimated. In fact, by including long-term viral load data, the estimate ofλ1 can be more accurate and reasonable compared with those obtained in previousstudies such as Wu and Ding (1999) and among others, after excluding long-termviral load data for modeling and analysis by some ad hoc rules (that is, the screeningand inclusion of viral load data are quite arbitrary). The estimate of λ2(t) provides notonly an approximate turnover rate over time of long-lived/latently infected cells at theearly stage of treatment as in the standard parametric model, but more importantly, itdescribes how it may change over a long treatment period as driven by, presumably,drug resistance, non-compliance and other clinical determinants. Most importantly,the semiparametric model is capable of modeling long-term viral load data of whichthe trajectory may vary substantially among different patients.

Models for the covariate process are needed in order to incorporate measurementerrors in covariates. CD4 measures are known with nonnegligible errors and ignoringcovariate measurement errors can lead to severely misleading results in a statisticalinference. In this AIDS study, CD4 measurements may be missed at viral load mea-surement times due mainly to a different CD4 measurement scheme as designed inthe study (for example, CD4 measurements were missed at day 7 displayed in Fig. 1).

123

104 Y. Huang et al.

Thus, we assume that the missing data in CD4 are missing at random (MAR) in thesense of Rubin (Rubin 1976), such that the missing data mechanism may be ignoredin the analysis. With CD4 measures collected over time from the AIDS study, wemay model the CD4 process to partially address the measurement errors (Liu and Wu2007; Huang and Dagne 2011). However, the CD4 trajectories are often complicated.To model the covariate CD4 process, given the absence of a theoretical rationale forthe CD4 trajectories, we consider empirical polynomial LME models for the CD4process, and choose the best model based on AIC and BIC values. Specifically, weconsider the covariate model (1) with ui j = vi j = (1, ti j , . . . , tr−1

i j )T and focus onlinear (r = 2), quadratic (r = 3) and cubic (r = 4) polynomials. The resulting AIC(BIC) values are 799.03 (821.74), 703.56 (744.42) and 766.18 (782.08), respectively.Thus, we adopted the following quadratic polynomial ST-LME model for the observedCD4 covariate process.

zi j = (α1 + ai1)+ (α2 + ai2)ti j + (α3 + ai3)t2i j + εi j , (11)

where εi = (εi1, . . . , εini )T ∼ STni ,ν1

(−J (ν1)δε1ni , σ21 Ini , δε Ini

), z∗

i j = (α1 +ai1) + (α2 + ai2)ti j + (α3 + ai3)t2

i j , α = (α1, α2, α3)T is population (fixed-effects)

parameter vector and individual-specific random-effects ai = (ai1, ai2, ai3)T ∼

N3(0,�a). In addition, in order to avoid too small or large estimates which maybe unstable, we standardize the time-varying covariate CD4 cell counts (each CD4value is subtracted by 375.46 and divided by 228.57) and rescale the original time t(in days) such that the time scale is between 0 and 1.

Because viral load is measured on each subject repeatedly over the study period,the measurements obtained from the same subject may be correlated, but they areassumed independent between patients. One powerful tool available to handle suchlongitudinal data is NLME modeling, in which within-subjects and between-subjectsvariations are both considered. In the present context of semiparametric model, itis natural to introduce the ST-SNLME model in conjunction with the HIV dynamicmodel (10) as follows.

yi j = ln{

epi1−λi j1ti j + epi2−λi j2(ti j )ti j}

+ ei j ,

βi j1 ≡ pi1 = β1 + bi1, βi j2 ≡ λi j1 = β2 + β3z∗i j + bi2,

βi j3 ≡ pi2 = β4 + bi3, βi j4(ti j ) ≡ λi j2(ti j ) = w(ti j )+ hi (ti j ),

(12)

where yi j is the natural logarithmic transformation of the viral load for the i th sub-ject at time ti j (i = 1, 2, . . . , n, j = 1, 2, . . . , ni ) and ei = (ei1, . . . , eini )

T ∼STni ,ν2

(−J (ν2)δe1ni , σ22 Ini , δe Ini

), β i j = (βi j1, βi j2, βi j3, βi j4)

T is individualparameters for the i th subject at time ti j . Note that the ln-transformation of the viralload is taken here to stabilize the variation of measurement errors and to speed up esti-mation algorithm. Model (12) is a natural extension of model (10) that accommodatestime-varying covariate CD4 into the first-phase decay rate and specifies unknown non-parametric smooth function for the second-phase decay rate for capturing long-termviral load trajectories with different shapes including viral load rebound.

123


We employ the linear combinations of natural cubic splines with percentile-basedknots to approximate the nonparametric functions w(t) and hi (t). Following studiesin (Liu and Wu 2007; Wu and Zhang 2002), we set ψ0(t) = φ0(t) ≡ 1 and take thesame natural cubic splines in the approximations (3) with q ≤ p. The values of pand q are determined based on the AIC/BIC which suggest the following function forβi j4(t) with p = 3 and q = 1 in the model (12) below.

βi j4(ti j ) ≈ β5 + β6ψ1(ti j )+ β7ψ2(ti j )+ bi4. (13)

Thus, we have population parameter vector β = (β1, β2, . . . , β7)T and individual

random-effects bi = (bi1, bi2, bi3, bi4)T ∼ N4(0,�b).

Let �i be time to the first decline in the CD4/CD8 ratio for the i th subject. We areinterested in the association of time to immune suppression of the individual-specificinitial viral decay rates and the true CD4 trajectory, which are characterized by therandom-effects in the viral load response and CD4 covariate models. We may view the(unobservable) random-effects as error-free covariates in time-to-event models. Theassociated inference can be computationally challenging when joint models consistof (nonlinear) longitudinal models and the semiparametric Cox proportional hazardsmodel for time-to-event, a commonly used failure time model (Tsiatis and Davidian2004), especially when the event times are interval censored. To focus on the primaryissues discussed in the article, we consider the following AFT model for time to firstdecline of the CD4/CD8 ratio.

ln(�i ) = γ T ci + εi , εi ∼ G(·), (14)

where ci =(1, ai1, ai2, bi2, bi4)T with unknown coefficients γ = (γ0, γ1, γ2, γ3, γ4)

T

and we assume εi follows an unspecified distribution G(·). In model (14), the random-effects bi2 and bi4 represent individual variations in the first- and second-phase viraldecay rates, respectively, so they may be predictive for event times. While bi1 andbi3 represent variations in the baseline viral loads, they do not appear to be highlypredictive of event times, so they are excluded from the model to reduce the numberof parameters. The random-effects ai1 and ai2 capture the main features of individualCD4 trajectories (such as initial intercept and slope) and thus, are included in the model.Model (14) is a parametric AFT model with random-effects as covariates and appearsto fit the current dataset reasonably well. It offers the advantages of easy interpretationand robustness against neglected covariates (Wu et al. 2010; Hougaard 1999). Model(14) offers the following advantages: (1) the random-effects in the covariate modelsummarize the history of the covariate process, with individual-specific interceptsand rates of change, and the summary quantities are likely better predictors than thecovariates at several particular times; (2) the random-effects in the response modelsummarize individual variations in the first- and second-phase viral decay rates topredict time to immune suppression; (3) the link between the three models is madeclear by the shared random-effects; (4) the model error is assumed to be more flexibleand unspecified distribution that has DP prior; (5) it is easy to implement. Given theCD4 and CD8 evaluation scheme, the times to the CD4/CD8 decrease in the analysis

123

106 Y. Huang et al.

were interval-censored. The median of the observed interval length was 15 days, whilethree of forty-four subjects in the study did not experience a CD4/CD8 decrease.

3.2 Data analysis

In this section, we analyze the AIDS data set described in Sect. 3.1 to illustrate ourmethodologies in conjunction with the three models (11), (12) and (14) for the spec-ified prior distributions (8). As discussed previously, the viral load (in ln scale) and(standardized) CD4 cell count in this data set clearly indicate their asymmetric nature.Thus, it appears adequate fitting models with a skew distribution to the data set. Withthis information, the following three statistical models with different specification ofdistributions for the model errors based on both the SNLME response model (12) forviral load response and the LME covariate model (11) for CD4 process in conjunctionwith the AFT model (14) with an unspecified distribution for the time to CD4/CD8decline are employed to compare their performance.

– Model I: A model with independent multivariate ST distributions of random errorsfor both the response model (12) and the covariate model (11).

– Model II: A model with independent multivariate SN distributions of random errorsfor both the response model (12) and the covariate model (11).

– Model III: A model with independent multivariate normal distributions of randomerrors for both the response model (12) and the covariate model (11).

We will investigate the following three scenarios. First, since a normal distributionis a special case of the SN distribution when skewness parameter is zero, while theST distribution reduces to the SN distribution when the degrees of freedom approachinfinity, we will investigate how an asymmetric (SN or ST) distribution for modelerror contributes to modeling results and parameter estimation in comparison with asymmetric (normal) distribution. Second, we estimate the model parameters by usingthe ‘naive’ method, which does not separate the measurement errors from the true CD4values. On other words, the ‘naive’ method (denoted by NM) only uses the observedCD4 values zi j rather than true (unobserved) CD4 values z∗

i j in the response model(12), and we use it as a comparison to the joint modeling (JM) method proposedin Sect. 2. This comparison attempts to investigate how the measurement errors inCD4 influence modeling results. Third, we investigate the JM approach based on theparametric NLME model (denoted by PJM), obtained by setting w(ti j ) = β5 + β6z∗

i jand h(ti j ) = bi4 in model (12), to evaluate how the semiparametric NLME model (12)contributes to modeling results in comparison with the parametric NLME model.

In order to carry out the Bayesian inference, we must specify the values of the hyper-parameters in the prior distributions. In the Bayesian analysis, we only need to specifythe priors at the population level. We take weakly informative prior distribution forthe parameters in the models. In particular, (1) fixed-effects were taken to be indepen-dent normal distribution N (0, 100) for each component of the population parametervectors α, β and γ . (2) For the variances σ 2

1 and σ 22 we assume a limiting non-

informative inverse gamma prior distribution, I G(0.01, 0.01) so that the distributionhas mean 1 and variance 100. (3) The priors for the variance-covariance matrices of therandom-effects �a and �b are taken to be inverse Wishart distributions I W (Ω1, ρ1)

123


and I W (Ω2, ρ2) with covariance matrices Ω1 = diag(0.01, 0.01, 0.01), Ω2 =diag(0.01, 0.01, 0.01, 0.01) and ρ1 = ρ2 = 4, respectively. (4) The degrees of free-dom parameters ν1 and ν2 follow truncated exponential distribution with ν10 = ν20 =0.5. (5) For each of the skewness parameters δe and δε , we choose independent nor-mal distribution N (0, 100), where we assume that δei = δe1ni and δεi = δε1mi toindicate that we are interested in skewness of both overall viral load data and over-all CD4 cell count data. (6) The hyper-parameters for the DP prior are assigned byη10 = 0.5, η20 = 0.01, ζ0 = 0 and σ 2

0 = 100.The MCMC sampler was implemented using WinBUGS software (Lunn et al.

2000), and the program codes are available from the authors upon request. In partic-ular, the MCMC scheme for drawing samples from the posterior distributions of allparameters in the three joint models is obtained by iterating between the following twosteps: (1) Gibbs sampler is used to update α,β, γ , σ 2

1 , σ22 ,�a,�b, ν1, ν2, δε and δe;

(2) we update bi and ai (i = 1, 2, . . . , n) using the M-H algorithm. After collectingthe final MCMC samples, we are able to draw statistical inference for the unknownparameters. Specifically, we are interested in the posterior means and quantiles. See thearticles (Lunn et al. 2000; Huang et al. 2006) for detailed discussions of the Bayesianmodeling approach and the implementation of the MCMC procedures, including thechoice of the hyper-parameters, the iterative MCMC algorithm, the choice of proposaldensity related to M-H sampling, sensitivity analysis and convergence diagnostics.When the MCMC implementation is applied to the actual clinical data, convergenceof the generated samples is assessed using standard tools within WinBUGS softwaresuch as trace plots and Gelman-Rubin (GR) diagnostics (Gelman and Rubin 1992;Ntzoufras 2009). Figure 2 shows the dynamic version of GR diagnostics as obtainedfrom the WinBUGS software for the representative parameters where the three curvesare given: the middle and bottom curves below the dashed horizontal line (indicatedby the value one) represent the pooled posterior variance (V , green color) and averagewithin-sample variance (W , blue color), respectively, and the top curve representstheir ratio (R, red color), which is associated with the potential scale reduction factor(Gelman and Rubin 1992; Brooks and Gelman 1998). Note that in WinBUGS, thesemeasures of posterior variability are estimated based on the widths of the 80 % poste-rior credible intervals (see Ntzoufras 2009 in detail). It is seen that R tends to 1, andV and W will stabilize as the number of iterations increase indicating that the algo-rithm has approached convergence. With the GR convergence diagnostics observed,we propose that every 20th MCMC sample is retained from the next 200,000 afteran initial number of 50,000 burn-in iterations of a single chain of length 250,000.Thus, we obtain 10,000 samples of targeted posterior distributions of the unknownparameters for statistical inference. Along with this sampling procedure, a sensitivityanalysis has been performed to ensure that the data, rather than priors, dominate theposterior estimates (see Sect. 4 for details). We also check the k-lag serial correla-tion of the samples for each parameter to diagnose independence of MCMC samples.We graphically checked the last 500 samples drawn from MCMC sampling schemefor each parameter (plot not shown here) and found that the consecutive samplesmove randomly towards different directions, indicating that MCMC is not “sticky”and MCMC samples are independent for each parameter, suggesting the convergenceto the stationary distribution.

123

108 Y. Huang et al.

Fig. 2 Gelman-Rubin (GR) diagnostic plot with the two Markov chains as obtained from the WinBUGSsoftware for the representative parameters. The middle and bottom curves below the dashed horizontal line(indicated by the value one) represent the pooled posterior variance (V , green color) and average within-sample variance (W , blue color), respectively, and the top curve represents their ratio (R, red color), whichis associated with the potential scale reduction factor

3.3 Analysis results

The Bayesian SNLME joint modeling approach in conjunction with the three modelsfor the viral load response, the CD4 covariate process and the time to first declineof CD4/CD8 ratio with different specifications of model distribution was used to fitthe data. From the model fitting results, we have seen that, in general, all the modelsprovided a reasonably good fit to the observed data for most patients in our study,although the fitting for a few patients was not completely satisfactory due to unusualdata fluctuation patterns for these patients, particularly in Model III. To assess thegoodness-of-fit of the proposed models, the diagnosis plots of the observed valuesversus the fitted values (top panel) and ST, SN and normal Q–Q plots (bottom panel)from Models I, II and III are presented in Fig. 3. It was seen from Fig. 3 (top panel) thatthe model where the random error is assumed to have ST and SN distributions gave

123


**

***

**

*

*

**

**

*

**

*

*

*

*

*

*

*

*

*

**

*

*

*

***

*

***

*

****

***

*

**

*

*

*

*

**

*

* ***

*

*

*

*

**

*

**

***

**

**

*

*

**

**

*

***

*

**

*

*******

*

***

*

*

*

***

***

*

*

**

*

*

*

*

*

* ******

*

*

**

*

*

*

*

*

****

**

*

**

***

**

*

***

* **

*

*

**

**

*

*

**

***

*

*

******

*

*

*

** *

****

*

*****

*

******

*

*

**** **

**

*****

*

*

*******

*

**

*

*

*

*

*** **

**

**

**

*

**

* ***

*

***

**

**

*

* **

*** *

*

***

*

**

*

**

**

*

*

*

*

*

*

* *

*

**

*

**** ***

*

****

***

***

****

Model IO

bse

rved v

alu

es

of

Fitted values of log(RNA)

**

***

**

*

*

**

**

*

**

*

*

*

*

*

*

*

*

*

**

*

*

*

***

*

* **

*

****

***

*

**

*

*

*

*

**

*

* ***

*

*

*

*

**

*

**

***

* *

**

*

*

**

**

*

***

*

**

*

*******

*

***

*

*

*

***

**

*

*

*

**

*

*

*

*

*

*******

*

*

**

*

*

*

*

*

******

*

**

***

**

*

***

* **

*

*

**

**

*

*

**

* * *

*

*

******

*

*

*

***

****

*

**** *

*

******

*

*

******

**

*****

*

*

**** ***

*

**

*

*

*

*

****

***

**

**

*

**

* ***

*

***

**

**

*

***

****

*

***

*

**

*

**

**

*

*

*

*

*

*

* *

*

**

*

**** ***

*

****

***

***

*** *

Model II


**

** *

* *

*

*

**

**

*

**

*

*

*

*

*

*

*

*

*

**

*

*

*

***

*

* **

*

****

***

*

**

*

*

*

*

**

*

* ***

*

*

*

*

***

**

***

* *

**

*

*

**

**

*

***

*

**

*

*******

*

***

*

*

*

******

*

*

**

*

*

*

*

*

*******

*

*

**

*

*

*

*

*

****

**

*

**

*****

*

***

***

*

*

*****

*

**

* * *

*

*

******

*

*

*

***

****

*

*****

*

******

*

*

******

**

******

*

*******

*

**

*

*

*

*

*******

**

**

*

******

*

***

**

**

*

***

****

*

***

*

**

*

**

**

*

*

*

*

*

*

* *

*

* *

*

**** ***

*

*** *

* **

***

*** *

Model III


Model I

Resi

duals

Theoretical Quantiles

Model IIR

esi

duals


2 4 6 8 10 12 14

24

68

1012

14

2 4 6 8 10 12 14

24

68

1012

14

2 4 6 8 10 12 14

24

68

1012

14

−2 −1 0 1

−0.2

−0.1

0.0

0.1

0.2

−2 −1 0 1

−0.

2−

0.1

0.0

0.1

0.2

−3 −2 −1 0 1 2 3

−20

24

Model III

Resi

duals


log(R

NA

)

Obse

rved v

alu

es

of

log(R

NA

)

Obse

rved v

alu

es

of

log(R

NA

)

of Normalof ST of SN

Fig. 3 The goodness-of-fit: Observed values versus fitted values of ln(RNA) (top panel) and ST, SN ornormal Q–Q plot with line (bottom panel)

a similar performance, but they provided a better fit to the observed data, comparedwith the model where the random error is assumed to have normal distribution. Thisresult can also be explained by examining the Q–Q plots of the residuals (bottompanel) that all plots show the existence of outliers, but it is clearly seen that ModelsI and II only have few negative outliers and thus, provide a better goodness-of-fit tothe data than Model III. Further, Model I betetr accounts for heaviness in the tail aswell as skewness. This finding is further confirmed by their standardized residual sumof squares (RSS), which are 219.8 (ST random error), 309.3 (SN random error) and410.4 (normal random error).

The population posterior mean (PM), corresponding standard deviation (SD) and9 % credible interval for fixed-effects parameters based on Models I, II and III withtwo methods (joint modeling approach and naive method) are presented in Tables 1and 2. The following findings are observed for estimated results of parameters.

In the response model (12), the findings, particularly for the fixed-effects (β2, β3)which are coefficient parameters of the first-phase viral decay rate λ1, show that theseestimates are different from zero for all three models since the 95 % credible intervalsdo not contain zero. Nevertheless, for the coefficient of CD4 covariate β3, its estimateis significantly positive. That means CD4 has a positive effect on the first-phase viraldecay rate, suggesting that the CD4 covariate may be an important predictor of thefirst-phase viral decay during treatment. Further, the estimate of β3 based on ModelI is larger than those based on Models II and III. There are marked differences inposterior means of the variance parameter σ 2

2 across these Models considered, whichare 0.01, 0.07 and 1.25 for Models I, II and III, respectively. The estimated values of the

123

110 Y. Huang et al.

Tabl

e1

Asu

mm

ary

ofth

ees

timat

edpo

ster

ior

mea

n(P

M)

ofpo

pula

tion

(fixe

d-ef

fect

s)pa

ram

eter

s,as

wel

las

the

corr

espo

ndin

gst

anda

rdde

viat

ion

(SD

)an

dlo

wer

limit

(LC

I)an

dup

per

limit

(UC

I)of

95%

equa

l-ta

ilcr

edib

lein

terv

al(C

I)ba

sed

onth

ena

ive

met

hod

(NM

)an

dth

ejo

int

mod

elin

g(J

M)

appr

oach

for

sem

ipar

amet

ric

NL

ME

(SJM

)m

odel

and

para

met

ric

NL

ME

mod

el(P

JM)

Met

hod

Mod

elβ

1β

2β

3β

4α

1α

2α

3γ

0γ

1γ

2γ

3γ

4

SJM

IPM

6.09

33.8

12.2

2.13

−0.1

00.

52−0.1

4−2.0

8−0.0

80.

680.

030.

04

LC

I5.

2526.2

5.29

1.62

−0.4

10.

12−0.5

9−2.3

3−0.6

0−0.6

0−0.9

5−0.2

1

UC

I6.

7640.9

20.1

2.60

0.19

0.95

0.28

−1.8

60.

362.

081.

150.

37

SD0.

383.

473.

440.

270.

150.

210.

230.

120.

250.

670.

470.

14

IIPM

5.23

26.1

10.1

0.41

−0.6

50.

58−0.2

2−2.0

8−0.0

80.

470.

040.

01

LC

I4.

6320.3

3.70

−0.8

3−1.1

70.

01−0.7

6−2.3

4−0.9

9−1.3

2−0.8

7−0.3

1

UC

I5.

9332.7

17.7

2.01

0.40

1.09

0.37

−1.8

30.

531.

861.

170.

34

SD0.

333.

133.

370.

810.

460.

280.

290.

130.

360.

710.

490.

15

III

PM8.

3529.2

7.85

3.70

−0.2

30.

64−0.2

6−2.0

7−0.0

60.

40−0.0

10.

06

LC

I7.

8522.6

0.66

2.16

−0.4

8−0.1

0−0.8

1−2.3

0−0.6

5−0.4

2−1.3

2−0.3

4

UC

I8.

8436.9

15.6

5.00

0.03

1.16

0.32

−1.8

70.

421.

301.

620.

77

SD0.

253.

723.

810.

720.

130.

270.

290.

110.

270.

440.

690.

24

NM

IPM

7.87

32.4

10.6

3.94

−0.5

40.

62−0.2

5−2.0

7−0.0

70.

050.

05−0.0

2

LC

I5.

3826.0

3.43

2.22

−0.9

60.

21−0.6

8−2.2

8−0.7

2−0.6

8−1.3

6−0.7

1

UC

I9.

1540.6

19.2

5.57

0.02

1.02

0.20

−1.8

70.

431.

581.

400.

31

SD1.

233.

794.

121.

000.

280.

200.

220.

110.

290.

560.

610.

20

PJM

IPM

6.13

31.9

13.2

2.28

−0.1

60.

57−0.2

0−2.0

8−0.0

60.

660.

210.

03

LC

I5.

5526.2

7.06

1.21

−0.7

40.

18−0.6

0−2.2

9−0.6

7−0.6

0−0.5

3−0.5

0

UC

I6.

7037.0

19.8

3.03

0.24

0.95

0.24

−1.8

60.

471.

891.

250.

30

SD0.

302.

843.

470.

440.

260.

190.

210.

110.

290.

630.

460.

19

SJM

IPM

8.50

32.5

12.5

4.62

−0.1

90.

70−0.3

2−2.0

7−0.2

30.

750.

050.

04

Uni

form

LC

I8.

1028.2

6.08

4.21

−0.5

00.

33−0.7

5−2.3

2−1.0

5−0.6

6−0.9

0−0.3

4

123


Tabl

e1

cont

inue

d

Met

hod

Mod

elβ

1β

2β

3β

4α

1α

2α

3γ

0γ

1γ

2γ

3γ

4

Prio

rU

CI

8.89

38.6

17.3

4.99

0.02

1.10

0.11

−1.8

40.

531.

881.

270.

59

onσ

kSD

0.20

2.63

3.04

0.23

0.13

0.20

0.23

0.12

0.41

0.65

0.48

0.20

SJM

IPM

8.60

35.4

11.7

5.28

−0.1

70.

55−0.1

8−2.0

6−0.0

40.

480.

12−0.0

4

Hal

f-C

auch

yL

CI

7.90

29.8

6.21

5.11

−0.3

70.

15−0.6

2−2.2

9−0.6

0−1.0

3−0.8

4−0.8

3

Prio

rU

CI

9.19

42.5

18.7

5.49

0.05

0.98

0.26

−1.8

50.

641.

631.

290.

82

onσ

kSD

0.33

3.23

3.54

0.10

0.11

0.21

0.22

0.11

0.30

0.65

0.55

0.40

123

112 Y. Huang et al.

Tabl

e2

Asu

mm

ary

ofth

ees

timat

edpo

ster

ior

mea

n(P

M)

for

the

para

met

ers

ofva

rian

ce,s

kew

ness

and

degr

ees

offr

eedo

m,a

ndco

rres

pond

ing

stan

dard

devi

atio

n(S

D)

and

low

erlim

it(L

CI)

and

uppe

rlim

it(U

CI)

of95

%eq

ual-

tail

cred

ible

inte

rval

(CI)

asw

ell

asSS

Rva

lues

base

don

the

naiv

em

etho

d(N

M)

and

the

join

tm

odel

ing

(JM

)ap

proa

chfo

rse

mip

aram

etri

cN

LM

Em

odel

(SJM

)an

dpa

ram

etri

cN

LM

Em

odel

(PJM

)

Met

hod

Mod

elσ

2 1σ

2 2δ ε

δ eν

1ν

2SS

R

SJM

IPM

0.05

0.01

0.13

1.17

3.25

3.28

219.

8

LC

I0.

030.

01−0.1

21.

003.

013.

01

UC

I0.

060.

030.

331.

433.

934.

07

SD0.

010.

010.

110.

080.

260.

29

IIPM

0.07

0.07

0.28

1.87

––

309.

3

LC

I0.

030.

01−0.4

01.

58–

–

UC

I0.

130.

280.

562.

12–

–

SD0.

020.

070.

290.

14–

–

III

PM0.

131.

25–

––

–41

0.4

LC

I0.

111.

03–

––

–

UC

I0.

151.

52–

––

–

SD0.

010.

13–

––

–

NM

IPM

0.03

0.42

0.15

0.24

3.24

3.45

256.

6

LC

I0.

010.

01−0.0

6−0.2

83.

013.

01

UC

I0.

060.

730.

301.

273.

894.

70

SD0.

020.

240.

120.

550.

240.

48

PJM

IPM

0.02

0.04

0.11

1.10

3.46

3.23

263.

5

LC

I0.

010.

01−0.1

20.

863.

103.

01

UC

I0.

090.

060.

241.

374.

873.

86

SD0.

030.

010.

100.

120.

500.

24

SJM

IPM

0.02

0.01

0.24

1.13

3.23

3.28

213.

8

unif

orm

LC

I0.

010.

010.

130.

973.

103.

01

123


Tabl

e2

cont

inue

d

Met

hod

Mod

elσ

2 1σ

2 2δ ε

δ eν

1ν

2SS

R

prio

rU

CI

0.04

0.07

0.32

1.38

3.87

4.02

onσ

kSD

0.01

0.02

0.05

0.11

0.24

0.27

SJM

IPM

0.04

0.01

0.19

1.33

3.25

3.17

217.

3

half

-Cau

chy

LC

I0.

010.

010.

011.

223.

013.

01

prio

rU

CI

0.06

0.02

0.23

1.41

3.93

3.58

onσ

kSD

0.01

0.01

0.08

0.05

0.25

0.15

123

114 Y. Huang et al.

variance in both Model I and Model II are much smaller than that of Model III becausethe former models take into account skewness of the data while the latter does not.Furthermore, Model I has the least estimate of σ 2

2 since it considers heaviness in tailsas well as skewness. That is, the standard error of Model I is smaller than that of ModelII, indicating that Model I with longer tails seems to produce more accurate posteriorestimate than Model II. Thus, Model I, which is based on ST distribution for themodel error, is the most preferred model to use in modeling viral load responses. Theestimates of the skewness parameter (δe) of Models I and II are 1.17 with 95 % credibleintervals (1.00, 1.43) and 1.87 with 95 % credible intervals (1.58, 2.12), respectively.This finding suggests that there is a positive and significant skewness in the data andconfirms the fact that the distribution of the original data is skewed even after takingthe ln-transformation. Thus, incorporating skewness parameter in modeling the data isrecommended. Furthermore, the estimate (1.17) of the skewness parameter based onModel I with ST distribution is less than that (1.87) of Model II with SN distribution.This may be due to the fact that an additional parameter, the degrees of freedomν2, for heaviness in the tails was estimated to have 3.28 trading-off the effect ofskewness.

For parameter estimates of the CD4 covariate model (4), the estimate of the coeffi-cient α2 in Model I is slightly smaller than that in Model II. Its estimate is significantlydifferent from zero with a positive value. This finding suggests that there is a posi-tively linear relationship between CD4 cell count and measurement time, and thisresult may be explained by the fact that CD4 cell count may increase during treat-ment, and in turn, indicates an overall improvement of AIDS progression in this AIDStrial study. There are differences in the posterior means of the variance σ 2

1 , whichare 0.05, 0.07, and 0.13 for Model I, II and III, respectively. The estimated valuesin both Models I and II are much smaller than that in Model III because the formermodels take into account skewness of the data while the latter does not. Furthermore,Model I has the least estimate of σ 2

1 since it considers heaviness in tails as well asskewness.

The estimates for the parameters in the AFT model (14) do not directly show thattime to CD4/CD8 decrease is highly associated with either the two viral decay ratesor the CD4 changing rates over time, which is different from what was anticipated.This finding is consistent with the study by Wu et al. (2010).

For comparison, we employed the NM based on Model I to estimate the modelparameters presented in Tables 1 and 2 using the observed CD4 and ignoring the CD4measurement error in model (12). The difference between the naive estimates andthe joint modeling estimates, depending on whether or not to ignore potential CD4measurement error in conjunction with model (12), indicates that CD4 measurementerror can not be ignored in the analysis.

We also considered the joint modeling approach for a parametric NLME model,denoted by PJM, in comparison with the semiparametric NLME model developedin this paper. We find that the parametric model may lead to possibly underesti-mates and/or overestimates (see, e.g., estimates of β2, β3, β5 and β6). Thus, thesemiparametric model offers possibly more reliable estimates of some parameters,which may lead to better understanding of the HIV viral dynamics and the treatmentefficacy.

123


Estimated time−varying lambda1 curve

Day

Lam

bda1

0 50 100 150

33.5

34.0

34.5

35.0

35.5

0 50 100 150

−2.

5−

2.3

−2.

1

Estimated time−varying lambda2 curve

Day

Lam

bda2

(a) (b)

Fig. 4 The estimated curves of the first and second-phase viral decay rates based on Model I

4 Concluding discussion

We present a Bayesian joint modeling for three components (response, covariate andtime-to-event processes) linked through the random-effects that characterize the under-lying individual-specific longitudinal processes. We consider SNLME models with anasymmetric (ST or SN) distribution in comparison with a symmetric (normal) distrib-ution. The approach is applied to jointly model HIV dynamics and time to decrease inCD4/CD8 ratio in the presence of the CD4 covariate process with measurement errorto provide a tool to assess antiretroviral treatment and to monitor disease progression.Among the models considered in the application, Model I with ST distribution formodel error was found to be favorable.

The estimated results presented in Table 1 based on (best) Model I indicate thatthe population CD4 trajectory may be approximated by the quadratic polynomialz(t) = 228.57(−0.10 + 0.52t − 0.14t2)+ 375.46, where z(t) is in the original CD4scale. Figure 4 shows the estimated first- and second-phase rates (λ1 and λ2(t)) ofchange in viral load based on Model I. Thus, the population viral load process may beapproximated by V (t) = exp(6.09 − λ1t)+ exp(2.13 − λ2(t)t). Since the first-phaseviral decay rate (λ1) is closely associated with the true CD4 values (due to significantestimate of β3), the viral load change V (t) may be highly associated with the trueCD4 values. The simple approximation considered here may provide new scientificinsights into the true association.

The analysis results indicate that the first-phase and second-phase time varyingviral decay rates are always positive and negative in counterpart. The results in Fig. 4indicate that the first-phase (the second-phase) decay rate increases (decreases) at theearly stage before approximately day 80 and then decreases (increases) at the latestage. This finding is biologically meaningful and may reflect viral load trajectory onthe rapid decay phase and then a slow growth phase on population level. The trueCD4 process has a significantly positive effect on the first-phase viral decay rate; thisfinding confirms that the CD4 covariate may be a significant predictor of the first-phase viral decay rate during the treatment process. More rapid increases in CD4

123

116 Y. Huang et al.

cell count may be associated with faster viral decay in the early stages. This may beexplained by the fact that higher CD4 cell counts suggest a higher turnover rate oflymphocyte cells, which may cause a positive correlation between viral decay andCD4 cell count. In addition, the estimate of within-subject variance (σ 2

2 ) across thethree models considered is 0.01 for Model I, 0.07 for Model II and 1.25 for ModelIII; it indicates that a gain in significant efficiency for the skew model relative to thenormal model is observed for its estimation. This is expected because high variability,heaviness of the tails and skewness are interrelated to a certain extent.

For viral load response, the skewness parameter δe and the degrees of freedom ν2based on Model I are estimated as 1.17 and 3.28. This confirms the positive skewnesswith heavy tail of the viral load. It may suggest that accounting for significant skewnessand heaviness in the tails be required to model the data when data exhibit skewnesswith heavy tails. For the covariate model, the estimate of skewness parameter δε andthe degrees of freedom ν1 based on Model I are 0.13 and 3.25 indicating slightlypositive skewness but not significant since the 95 % credible interval contains zero.This may be explained by the fact that an additional parameter ν1 for heaviness inthe tails was estimated to have 3.25 lessening the effect of skewness. In addition, theestimated results of the parameters in the time-to-event model (14) based on (best)Model I suggest that the time to first decline of CD4/CD8 ratio is not highly associatedwith either the two viral decay rates or the CD4 changing rates over time. This findingis consistent with that in the study by Wu et al. (2010) in which further explanationsare provided. It is worth noting, given that the complex structure of the models isconsidered, there is the potential for overfitting due to the relatively small number ofsubjects involved in this study.

In order to examine the sensitivity of parameter estimates to the prior distribu-tions, we conducted a limited sensitivity analysis using the uniform prior distributionU (0, 100) or the half-Cauchy prior distribution with scale parameter of 25 for standarddeviation scales σ1 and σ2 (Gelman 2006) instead of inverse gamma prior distribution(used in this paper) for variance parameters σ 2

1 and σ 22 ; and we refitted data based

on Model I. The results are summarized in Tables 1 and 2. It was shown from theresults of sensitivity analysis that the conclusions with different prior distributions arein agreement. Thus, the findings from our analysis remain unchanged.

Modeling skewness by modifying well-known distributions is a topic that hasreceived much attention over the past years. In the presence of skewness in the lon-gitudinal data and measurement errors in covariates, we propose a robust Bayesianapproach to an SNLME joint model based on the ST distribution as a powerful toolto handle time-to-event data and longitudinal data with asymmetric and discrepantbehaviors in repeated measurements. The proposed methods enhance the modelingflexibility and allow practitioners to analyze longitudinal and event time data in awide variety of considerations. In addition, the proposed joint modeling approach canbe easily implemented using the publicly available WinBUGS package. This makesour approach quite powerful and accessible to practicing statisticians in the field. It isnoted that further analysis to simply use the unreliable observations (actually observedvalues) below limit of quantification or to adopt more advanced imputation methods,instead of such values imputed by half of limit of quantification, may be conductedfor inference, but results of parameter estimates may be interpreted differently. In

123


addition, there are certain limitations to note in our study. For example, the currentstudy is not intended to consider nonignorable missing data and data with detectionlimit problems. These complicated problems are beyond the focus of this article, buta further study may be warranted. A final issue to note is that in our application, bothtime-to-event and NLME response models are related to CD4 cell count, but CD4 cellcount was incorporated into these two models for different purposes. NLME responsemodels accommodate time-varying CD4 cell count as a covariate into the first-phasedecay rate to predict viral trajectories in the early period, while the CD4 cell countwas used to determine time to first decline in the CD4/CD8 ratio which served as anoutcome in the time-to-event model. Since time to decrease in the CD4/CD8 ratio isclosely associated with HIV viral rebound, it is thus related to a long-term antiviralresponse in the later period.

Acknowledgments The authors are extremely grateful to the Editor, an Associate Editor and an anony-mous reviewer for their insightful comments and suggestions that led to a marked improvement of the article.This research was partially supported by NIAID/NIH Grant R03AI080338 and USF Proposal EnhancementGrant to Huang, Natural Sciences and Engineering Research Council of Canada discovery Grant to Hu, andNIMH/NIH Grant R01MH040859-22 to Dagne.

Appendix: multivariate skew distributions

Different versions of the multivariate skew–normal(SN) and skew-t (ST) distributionshave been considered and used in the literature (Arellano-Valle and Genton 2005;Arellano-Valle et al. 2007; Azzalini and Capitanio 2003; Sahu et al. 2003 and oth-ers). A new class of distributions by introducing skewness in multivariate ellipticallydistributions were developed in publication (Sahu et al. 2003). The class, which isobtained by using transformation and conditioning, contains many standard familiesincluding the multivariate SN and ST distributions. For completeness, this appen-dix briefly summarizes the multivariate SN and ST distributions that will be used indefining the joint models with skew distributions considered in this paper. Assume anm-dimensional random vector Y follows an m variate SN or ST distribution with loca-tion vector μ, m × m positive (diagonal) dispersion matrix � and m × m skewnessmatrix �(δ) = diag(δ1, δ2, . . . , δm) or the degrees of freedom ν, where skewnessparameter vector δ = (δ1, . . . , δm)

T . In what follows, we briefly discuss multivari-ate SN and ST distributions introduced by Sahu et al. (2003), which are suitable forBayesian inference since they are built using the conditional method. For detaileddiscussion on properties of SN and ST distributions, see reference (Sahu et al. 2003).

Skew-t distribution

An m-dimensional random vector Y follows an m-variate ST distribution if its prob-ability density function (pdf) is given by

f ( y|μ,�,�(δ), ν) = 2mtm,ν( y|μ, A)P(U > 0), (15)

123

118 Y. Huang et al.

where A = � + �2(δ), we denote the m-variate t distribution with parameters μ, Aand degrees of freedom ν by tm,ν(μ, A) and the corresponding pdf by tm,ν( y|μ, A)henceforth, U follows the t distribution tm,ν+m(·). We denote this distribution bySTm,ν(μ,�,�(δ)). In particular, when � = σ 2 Im and �(δ) = δ Im , equation (15)simplifies to

f ( y|μ, σ 2, δ, ν) = 2m(σ 2 + δ2)−m/2 �((ν + m)/2)

�(ν/2)(νπ)m/2

×{

1 + ( y − μ)T ( y − μ)

ν(σ 2 + δ2)

}−(ν+m)/2

× Tm,ν+m

[{ν + (σ 2 + δ2)−1( y − μ)T ( y − μ)

ν + m

}−1/2

δ( y − μ)

σ√σ 2 + δ2

], (16)

where Tm,ν+m(·) denotes the cumulative distribution function (cdf) of tm,ν+m(0, Im).However, unlike the SN distribution below, the ST density can not be written as theproduct of univariate ST densities. Here Y are dependent but uncorrelated.

The mean and covariance matrix of the ST distribution STm,ν(μ, σ2 Im,�(δ)) are

given by

E(Y) = μ + (ν/π)1/2�((ν − 1)/2)

�(ν/2)δ,

cov(Y ) =[σ 2 Im + �2(δ)

] ν

ν − 2− ν

π

[�{(ν − 1)/2}�(ν/2)

]2

�2(δ). (17)

It is noted that when δ = 0, the ST distribution reduces to the usual t distribu-tion. In order to have a zero mean vector, we should assume the location parameterμ = −(ν/π)1/2 �((ν−1)/2)

�(ν/2) δ, which is what we assume in the paper. In order to betterunderstand the shape of an ST distribution, plots of an ST density as a function of theskewness parameter with δ = −3, 0, 3 are shown in Fig. 5a.

By the proposition of Sahu et al. (2003), the ST distribution of Y has a convenientstochastic representation as follows.

Y = μ + �(δ)|X0| + �1/2 X1, (18)

where X0 and X1 are two independent random vectors following tm,ν(0, Im). Notethat the expression (18) provides a convenient device for random number generationand for implementation purposes. Let w = |X0|; then w follows an m-dimensionalstandard t distribution tm,ν(0, Im) truncated in the space w > 0 (i.e., the standardhalf-t distribution). Thus, a hierarchical representation of (18) is given by

Y |w ∼ tm,ν+m(μ + �(δ)w, ω�), w ∼ tm,ν(0, Im)I(w > 0), (19)

where ω = (ν + wT w)/(ν + m).

123


Den

sity

func

tion

f(t)

Time (t)

ST density with mean=0

delta=0delta=3delta=−3

−4 −2 0 2 4

0.0

0.1

0.2

0.3

0.4

0.5

0.6

−4 −2 0 2 4

0.0

0.2

0.4

0.6

Den

sity

func

tion

f(t)

Time (t)

SN density with mean=0

delta=0delta=3delta=−3

(a) (b)

Fig. 5 The univariate skew-t (df ν = 4) and skew-normal density functions with variance σ 2 = 1 andskewness parameter δ = 0,−3 and 3, respectively

Skew-normal distribution

We briefly discuss a multivariate SN distribution introduced by Sahu et al. (2003) inthis section. An m-dimensional random vector Y follows an m-variate SN distribution,if its pdf is given by

f ( y|μ,�,�(δ)) = 2m |A|−1/2φm{A−1/2( y − μ)}P(U > 0), (20)

where U ∼ Nm{�(δ)A−1( y − μ), Im − �(δ)A−1�(δ)}, and φm(·) is the pdfof Nm(0, Im). We denote the above distribution by SNm(μ,�,�(δ)). An appeal-ing feature of equation (20) is that it gives independent marginal when � =diag(σ 2

1 , σ22 , . . . , σ

2m). The pdf (20) thus reduces to

f ( y|μ,�,�(δ)) =m∏

i=1

⎡⎣ 2√

σ 2i + δ2

i

φ

⎧⎨⎩

yi − μi√σ 2

i + δ2i

⎫⎬⎭�

⎧⎨⎩δi

σi

yi − μi√σ 2

i + δ2i

⎫⎬⎭

⎤⎦, (21)

where φ(·) and �(·) are the pdf and cdf of the standard normal distribution, respec-tively.

The mean and covariance matrix are given by E(Y) = μ + √2/πδ, cov(Y ) =

� + (1 − 2/π)�2(δ). It is noted that when δ = 0, the SN distribution reduces tousual normal distribution. In addition, the SN distribution is a special case of theST distribution. That is, the ST distribution reduces to the SN distribution when thedegrees of freedom ν → ∞. In order to have a zero mean vector, we should assumethe location parameter μ = −√

2/πδ. In order to better understand the shape of anSN distribution, plots of an SN density as a function of the skewness parameter withδ = −3, 0, and 3 are shown in Fig. 5b.

According to Sahu et al. (2003), if Y follows SNm(μ,�,�(δ)), it can be expressedby a convenient stochastic representation as follows.

123

120 Y. Huang et al.

Y = μ + �|X0| + �1/2 X1, (22)

where X0 and X1 are two independent random vectors with Nm(0, Im). Let w = |X0|;then, w follows an m-dimensional standard normal distribution Nm(0, Im) trun-cated in the space w > 0. Thus, a two-level hierarchical representation of (22) isgiven by

Y |w ∼ Nm(μ + �w,�), w ∼ Nm(0, Im)I(w > 0). (23)

References

Acosta EP, Wu H, Walawander A, Eron J, Pettinelli C, Yu S, Neath D, Ferguson E, Saah AJ, KuritzkesDR, Gerber JG (2004) Comparison of two indinavir/ritonavir regimens in treatment-experienced HIV-infected individuals. J Acquir Immune Defic Syndr 37:1358–1366

Arellano-Valle RB, Genton MG (2005) On fundamental skew distributions. J Multivar Anal 96:93–116Arellano-Valle RB, Bolfarine H, Lachos VH (2007) Bayesian inference for skew-normal linear mixed

models. J Appl Stat 34:663–682Azzalini A, Capitanio A (2003) Distributions generated by perturbation of symmetry with emphasis on a

multivariate skew t distributions. J Roy Stat Soc Ser B 65:367–389Brooks SP, Gelman A (1998) General methods for monitoring convergence of iterative simulations. J

Comput Graph Stat 7(4):434–455Carroll RJ, Ruppert D, Stefanski LA, Crainiceanu CM (2006) Meas Error Nonlinear Models Mod Persp,

2nd edn. Chapman and Hall, LondonDeGruttola V, Tu XM (1994) Modeling progression of CD4-lymphocyte count and its relationship to

survival time. Biometrics 50:1003–1014Ferguson TS (1973) A Bayesian analysis of some nonparametric problems. Ann Stat 1:209–230Gelman A, Rubin DB (1992) Inference from iterative simulation using multiple sequences. Stat Sci

7(4):457–511Gelman A (2006) Prior distributions for variance parameters in hierarchical models. Bayesian Anal 1:515–

533Hougaard P (1999) Fundamentals of survival data. Biometrics 55:13–22Huang Y, Liu D, Wu H (2006) Hierarchical Bayesian methods for estimation of parameters in a longitudinal

HIV dynamic system. Biometrics 62:413–423Huang Y, Dagne G (2011) A Bayesian approach to joint mixed-effects models with a skew-normal distrib-

ution and measurement errors in covariates. Biometrics 67:260–269Huang Y, Dagne G, Wu L (2011) Bayesian inference on joint models of HIV dynamics for time-to-event

and longitudinal data with skewness and covariate measurement errors. Stat Med 30:2930–2946Ishwaran H, James L (2002) Dirichlet process computing in finite normal mixtures: smoothing and prior

information. J Comput Graph Stat 11:508–532Liu W, Wu L (2007) Simultaneous inference for semiparametric nonlinear mixed-effects models with

covariate measurement errors and missing responses. Biometrics 63:342–350Lunn DJ, Thomas A, Best N, Spiegelhalter D (2000) WinBUGS: a Bayesian modelling framework: concepts,

structure, an extensibility. Stat Comput 10:325–337Ntzoufras I (2009) Bayesian modeling using WinBUGS. Wiley, New JerseyPawitan Y, Self S (1993) Modeling disease marker processes in AIDS. J Am Stat Assoc 88:719–726Rubin DB (1976) Inference and missing data. Biometrika 63:581–592Sahu SK, Dey DK, Branco MD (2003) A new class of multivariate skew distributions with applications to

Bayesian regression models. Can J Stat 31:129–150Tsiatis AA, Davidian M (2004) Joint modeling of longitudinal and time-to-event data: an overview. Stat

Sin 14:809–834Wu H, Zhang J-T (2002) The study of long-term HIV dynamics using semi-parametric non-linear mixed-

effects models. Stat Med 21:3655–3675

123


Wu H, Ding AA (1999) Population HIV-1 dynamics in vivo: applicable models and inferential tools forvirological data from AIDS clinical trials. Biometrics 55:410–418

Wu L (2009) Mixed effects model for complex data. Chapman and Hall, LondonWu L, Liu W, Hu XJ (2010) Joint inference on HIV viral dynamics and immune suppression in presence

of measurement errors. Biometrics 66:327–335

123

jointly modeling time-to-event and longitudinal data: a...

Documents