identification of longitudinal biomarkers for survival by a score test derived from a joint model of...

This article was downloaded by: [Case Western Reserve University] On: 15 October 2014, At: 16:05 Publisher: Taylor & Francis Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK Journal of Applied Statistics Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/cjas20 Identification of longitudinal biomarkers for survival by a score test derived from a joint model of longitudinal and competing risks data Feng-Shou Ko a a Division of Biostatistics and Bioinformatics, Institute of Population Health Sciences, National Health Research Institutes, Miaoli 350, Taiwan, Republic of China Published online: 23 Apr 2014. To cite this article: Feng-Shou Ko (2014) Identification of longitudinal biomarkers for survival by a score test derived from a joint model of longitudinal and competing risks data, Journal of Applied Statistics, 41:10, 2270-2281, DOI: 10.1080/02664763.2014.909789 To link to this article: http://dx.doi.org/10.1080/02664763.2014.909789 PLEASE SCROLL DOWN FOR ARTICLE Taylor & Francis makes every effort to ensure the accuracy of all the information (the “Content”) contained in the publications on our platform. However, Taylor & Francis, our agents, and our licensors make no representations or warranties whatsoever as to the accuracy, completeness, or suitability for any purpose of the Content. Any opinions and views expressed in this publication are the opinions and views of the authors, and are not the views of or endorsed by Taylor & Francis. The accuracy of the Content should not be relied upon and should be independently verified with primary sources of information. 
Taylor and Francis shall not be liable for any losses, actions, claims, proceedings, demands, costs, expenses, damages, and other liabilities whatsoever or howsoever caused arising directly or indirectly in connection with, in relation to or arising out of the use of the Content. This article may be used for research, teaching, and private study purposes. Any substantial or systematic reproduction, redistribution, reselling, loan, sub-licensing, systematic supply, or distribution in any form to anyone is expressly forbidden. Terms & Conditions of access and use can be found at http://www.tandfonline.com/page/terms- and-conditions


Journal of Applied Statistics, 2014, Vol. 41, No. 10, 2270–2281, http://dx.doi.org/10.1080/02664763.2014.909789

Identification of longitudinal biomarkers for survival by a score test derived from a joint model of longitudinal and competing risks data

Feng-Shou Ko∗

Division of Biostatistics and Bioinformatics, Institute of Population Health Sciences, National Health Research Institutes, Miaoli 350, Taiwan, Republic of China

(Received 22 July 2013; accepted 26 March 2014)

In this paper, we consider joint modelling of repeated measurements and competing risks failure time data. For the competing risks failure time data, we use a semiparametric mixture model in which proportional hazards models are specified for the failure times conditional on cause and a multinomial model is specified for the marginal distribution of cause conditional on covariates. We also derive a score test, based on the joint model of repeated measurements and competing risks failure time data, to identify longitudinal biomarkers or surrogates for a time-to-event outcome in competing risks data.

Keywords: competing risks; EM algorithm; repeated measurements; score test; surrogate

1. Introduction

Longitudinal data can be collected when individuals are followed over time. Such data may include two components: event times and repeated measurements. In the past, longitudinal data analysis and survival analysis were often carried out separately. However, a growing number of studies use joint models of longitudinal and survival data [5,8,9,11].

However, the methods cited above for joint modelling of longitudinal and survival data allow for only one event with a single mode of failure. Competing risks are a key issue in the analysis of epidemiological data: the time-to-event outcome for a specific disease can be influenced by the patients' other underlying diseases. This issue must be addressed because competing risks can confound the true association between the disease of interest and the time-to-event outcome. Several papers address the joint modelling of repeated measurements and survival data in the presence of multiple failure types [3,4,10].

∗Email: [email protected]

© 2014 Taylor & Francis

In this paper, the first aim is to develop a joint model of longitudinal data and survival data in the presence of multiple failure types. The second aim is to derive a score test to identify longitudinal biomarkers or surrogates for a time-to-event outcome in competing risks data. Both aims are addressed in Section 2, where the proposed method extends the methods of Ko [6,7] and Chang [2]. Simulations demonstrating the properties of the proposed methods are presented in Section 3. A real example is analysed with the proposed methods in Section 4, and the methods are discussed in Section 5.

2. Method

2.1 Models about longitudinal and survival data

We first introduce the survival part of the joint model. We employ a semiparametric mixture model in which proportional hazards models are specified for the failure times conditional on cause, and a multinomial model is specified for the marginal distribution of cause conditional on covariates.

Let $T_{ij}$, $C_{ij}$, $Z_{ij}$, and $W_{ij}$ be the time to event, the censoring time, the covariate, and the event type of the $i$th individual at the $j$th observed time point. Let $W_{ij} \in \{1, 2\}$, with $W_{ij} = 1$ indicating that the $i$th individual at the $j$th observed time point is of type 1 and $W_{ij} = 2$ of type 2. The conditional hazard of $T_{ij}$ at $t$ given $W_{ij} = o$ and $Z_{ij} = z$ exists and is

$$\lambda_o(t)\, e^{\beta_o' z},$$

where $\lambda_o(\cdot)$ is a non-negative deterministic baseline function. In the context of competing risks problems, $W_{ij}$ is usually referred to as the cause of failure (failure type) variable.

Assume $(T_{ij}, W_{ij})$ and $C_{ij}$ are conditionally independent given $Z_{ij}$. Let the conditional distribution of $T_{ij}$ given $W_{ij} = o$ and $Z_{ij} = z$ be denoted by $F_o(t, z)$, where $o = 1, 2$. Let the conditional distribution of $C_{ij}$ given $Z_{ij} = z$ be $G(\cdot, z)$. Let $[\Delta_{ij} = 1] = [T_{ij} \le C_{ij}, W_{ij} = 1]$, $[\Delta_{ij} = 2] = [T_{ij} \le C_{ij}, W_{ij} = 2]$, and $[\Delta_{ij} = 3] = [T_{ij} > C_{ij}]$. Then

$$P(W_{ij} = 1 \mid Z_{ij} = z) = \frac{e^{\alpha_1 + \alpha_2' z}}{1 + e^{\alpha_1 + \alpha_2' z}},$$

which will be denoted by $\alpha(z)$. Let $X_{ij} = T_{ij} \wedge C_{ij}$. The likelihood for $(X_{ij}, \Delta_{ij}) = (x, \delta)$ given $Z_{ij} = z$ is

$$\{\alpha(z)(1 - G(x, z)) f_1(x, z)\}^{[\delta=1]} \{(1 - \alpha(z))(1 - G(x, z)) f_2(x, z)\}^{[\delta=2]}$$

$$\times \{\alpha(z)(1 - F_1(x, z)) g(x, z) + (1 - \alpha(z))(1 - F_2(x, z)) g(x, z)\}^{[\delta=3]},$$

where $g(x, z) \equiv (\partial G/\partial x)(x, z)$ and $f_o(x, z) \equiv (\partial F_o/\partial x)(x, z)$ for $o = 1, 2$, which are assumed to exist. Let $\alpha_c \equiv (\alpha_1, \alpha_2)$ and $\Lambda_o(t) = \int_0^t \lambda_o(s)\, ds$ for $o = 1, 2$. The nonparametric maximum likelihood estimate (henceforth NPMLE) of $\theta = (\alpha_c, \beta_1, \beta_2, \Lambda_1, \Lambda_2)$ is based on $\{X_{ij}, \Delta_{ij}, Z_{ij} \mid i = 1, \ldots, m,\ j = 1, \ldots, n_i\}$. Here, it is assumed that $(T_{ij}, C_{ij}, Z_{ij}, W_{ij})$ is an independent and identically distributed sequence for $i = 1, \ldots, m$, $j = 1, \ldots, n_i$.
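As a rough illustration of the likelihood above, the following sketch evaluates the log-likelihood contribution of a single observation. It makes several assumptions that are not part of the model definition: a scalar covariate, Weibull baselines with cumulative hazard `scale * x**2` (the form used later in the simulation study), and omission of the censoring factors $G$ and $g$, which do not depend on $\theta$. All function names are hypothetical.

```python
import math


def alpha(z, a1, a2):
    """Logistic probability that the failure cause is type 1, P(W = 1 | Z = z)."""
    return 1.0 / (1.0 + math.exp(-(a1 + a2 * z)))


def cause_survival(x, z, beta, scale):
    """Cause-conditional survivor function 1 - F_o(x, z), assuming a Weibull
    baseline with cumulative hazard Lambda_o(x) = scale * x**2."""
    return math.exp(-scale * x ** 2 * math.exp(beta * z))


def cause_density(x, z, beta, scale):
    """Cause-conditional density f_o(x, z) = hazard * survivor function."""
    hazard = 2.0 * scale * x * math.exp(beta * z)
    return hazard * cause_survival(x, z, beta, scale)


def log_contribution(x, delta, z, a1, a2, beta1, beta2, scale1=0.5, scale2=1.5):
    """Log-likelihood contribution of one observation (x, delta) given z.
    The censoring factors G and g are dropped (assumption: they carry no
    information about theta)."""
    p1 = alpha(z, a1, a2)
    if delta == 1:
        return math.log(p1 * cause_density(x, z, beta1, scale1))
    if delta == 2:
        return math.log((1.0 - p1) * cause_density(x, z, beta2, scale2))
    if delta == 3:
        mix = (p1 * cause_survival(x, z, beta1, scale1)
               + (1.0 - p1) * cause_survival(x, z, beta2, scale2))
        return math.log(mix)
    raise ValueError("delta must be 1, 2 or 3")
```

A censored observation contributes a mixture of the two cause-conditional survivor functions because its eventual cause of failure is unobserved.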

Turning to the longitudinal part of the model, let $Y_i(t)$ represent the underlying biomarker vector of the $i$th individual at time $t$. Following Henderson et al. [4], we can write

$$Y_i(t) = \omega' \Upsilon_i(t) + \Theta_i(t) + e_i(t), \tag{1}$$

where $\Upsilon_i(t)$ is a $p \times 1$ vector of explanatory variables for the longitudinal process, and $\Theta_i(t)$ and $e_i(t)$ are zero-mean random processes. Since the biomarkers are sampled at discrete time points, and assuming that the $Y_i$ are from a normal distribution, a discrete version of the model can be written as

$$Y_{ij} = \omega' \Upsilon_{ij} + \Theta_{ij} + e_{ij}, \quad i = 1, 2, \ldots, m, \quad j = 1, 2, \ldots, n_i, \tag{2}$$

where $\Theta_{ij}$ is the value of a zero-mean Gaussian random effect for the $i$th individual at the $j$th time point, $e_{ij}$ is a zero-mean Gaussian measurement error and $n_i$ is the number of observations for individual $i$. The index $j$ in the discrete version of the model refers to the last biomarker observed at or before time $t$. This accommodates the case in which an event precludes the observation of subsequent longitudinal biomarkers, since the last longitudinal observation will occur at or before the time of the event. Thus, the number of longitudinal observations, $n_i$, for individual $i$ depends not only on the sampling of the biomarkers but also on when the individual fails. The errors $e_{ij}$ are assumed to be mutually independent, and the within-individual correlation in $Y_{ij}$ arises through serial correlation in the random effect $\Theta_{ij}$.

2.2 Joint likelihood function

Let $L_Y$ denote the longitudinal part of the full joint likelihood function and $L_\gamma$ the survival part. The full joint likelihood function $L$ can be written as

$$L = L_Y \times L_\gamma,$$

where $L_Y$ is the standard form corresponding to the normal distribution. If $\Theta$ is known, the survival part of the full joint likelihood function can be written as follows:

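$$L_\gamma = \prod_{i=1}^{m} \prod_{j=1}^{n_i} \Big( \big\{\alpha(Z_{ij}) \lambda_{10}(X_{ij}) \exp\big(\beta_1' Z_{ij} + \gamma \Theta_{ij} - \Lambda_{10}(X_{ij}) e^{\beta_1' Z_{ij} + \gamma \Theta_{ij}}\big)\big\}^{[\Delta_{ij}=1]}$$

$$\times \big\{[1 - \alpha(Z_{ij})] \lambda_{20}(X_{ij}) \exp\big(\beta_2' Z_{ij} + \gamma \Theta_{ij} - \Lambda_{20}(X_{ij}) e^{\beta_2' Z_{ij} + \gamma \Theta_{ij}}\big)\big\}^{[\Delta_{ij}=2]}$$

$$\times \big\{\alpha(Z_{ij}) \exp\big(-\Lambda_{10}(X_{ij}) e^{\beta_1' Z_{ij} + \gamma \Theta_{ij}}\big) + [1 - \alpha(Z_{ij})] \exp\big(-\Lambda_{20}(X_{ij}) e^{\beta_2' Z_{ij} + \gamma \Theta_{ij}}\big)\big\}^{[\Delta_{ij}=3]} \Big). \tag{3}$$

The survival part of the full joint log likelihood function, denoted $\ell_\gamma$, can be written as follows:

$$\ell_\gamma = \sum_{i=1}^{m} \sum_{j=1}^{n_i} \Big\{ [\Delta_{ij} = 1] \log\big\{\alpha(Z_{ij}) \lambda_{10}(X_{ij}) \exp\big(\beta_1' Z_{ij} + \gamma \Theta_{ij} - \Lambda_{10}(X_{ij}) e^{\beta_1' Z_{ij} + \gamma \Theta_{ij}}\big)\big\}$$

$$+ [\Delta_{ij} = 2] \log\big\{[1 - \alpha(Z_{ij})] \lambda_{20}(X_{ij}) \exp\big(\beta_2' Z_{ij} + \gamma \Theta_{ij} - \Lambda_{20}(X_{ij}) e^{\beta_2' Z_{ij} + \gamma \Theta_{ij}}\big)\big\}$$

$$+ [\Delta_{ij} = 3] \log\big\{\alpha(Z_{ij}) \exp\big(-\Lambda_{10}(X_{ij}) e^{\beta_1' Z_{ij} + \gamma \Theta_{ij}}\big) + [1 - \alpha(Z_{ij})] \exp\big(-\Lambda_{20}(X_{ij}) e^{\beta_2' Z_{ij} + \gamma \Theta_{ij}}\big)\big\} \Big\}. \tag{4}$$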

The EM algorithm provides a means of maximizing complex likelihoods. In the E-step of the algorithm, the expected value of $\Theta_{ij}$ is computed, given the current estimates of the parameters and the observable data. In the M-step of the algorithm, the parameter estimates are updated by maximizing the likelihood with $\Theta_{ij}$ replaced by its expected value from the E-step. The algorithm iterates between these two steps until convergence.

To apply the E-step, consider the random effect $\Theta_{ij}$ as missing data. The expected values of $\Theta_{ij}$ conditional on the observable data are determined by using the current estimates of $\theta = (\alpha_c, \beta_1, \beta_2, \Lambda_1, \Lambda_2)$, $\gamma$ and $\omega$.


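In the M-step, $\theta = (\alpha_c, \beta_1, \beta_2, \Lambda_1, \Lambda_2)$, $\gamma$ and $\omega$ need to be estimated. For the estimators of $\Lambda_1, \Lambda_2$, let

$$\phi_{1ij} = \frac{\alpha(Z_{ij}) \exp\big(-e^{\beta_1' Z_{ij} + \gamma \Theta_{ij}} \Lambda_{10}(X_{ij})\big)}{\alpha(Z_{ij}) \exp\big(-e^{\beta_1' Z_{ij} + \gamma \Theta_{ij}} \Lambda_{10}(X_{ij})\big) + [1 - \alpha(Z_{ij})] \exp\big(-e^{\beta_2' Z_{ij} + \gamma \Theta_{ij}} \Lambda_{20}(X_{ij})\big)},$$

$$\phi_{2ij} = 1 - \phi_{1ij},$$

$$S_1(u) = \sum_{i=1}^{m} \sum_{j=1}^{n_i} \{[\Delta_{ij} = 1] + [\Delta_{ij} = 3]\, \phi_{1ij}\}\, e^{\beta_1' Z_{ij} + \gamma \Theta_{ij}}\, I_{(0, X_{ij}]}(u),$$

$$S_2(u) = \sum_{i=1}^{m} \sum_{j=1}^{n_i} \{[\Delta_{ij} = 2] + [\Delta_{ij} = 3]\, \phi_{2ij}\}\, e^{\beta_2' Z_{ij} + \gamma \Theta_{ij}}\, I_{(0, X_{ij}]}(u),$$

$$G_1(u) = \sum_{i=1}^{m} \sum_{j=1}^{n_i} [\Delta_{ij} = 1]\, I_{(0, u]}(X_{ij}), \qquad G_2(u) = \sum_{i=1}^{m} \sum_{j=1}^{n_i} [\Delta_{ij} = 2]\, I_{(0, u]}(X_{ij}).$$

Then the estimators of $\Lambda_1, \Lambda_2$, denoted $\hat{\Lambda}_{10}(t)$ and $\hat{\Lambda}_{20}(t)$, are

$$\hat{\Lambda}_{10}(t) = \int_0^t \frac{1}{S_1(u)}\, dG_1(u),$$

$$\hat{\Lambda}_{20}(t) = \int_0^t \frac{1}{S_2(u)}\, dG_2(u).$$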

A full implementation of the EM algorithm is as follows:

Step 0 Provide initial estimates of $\theta = (\alpha_c, \beta_1, \beta_2, \Lambda_1, \Lambda_2)$, $\gamma$ and $\omega$.
Step 1 (E-step) Compute $\Theta_{ij}$, $i = 1, \ldots, m$, $j = 1, 2, \ldots, n_i$, based on the current values of the parameters.
Step 2 (M-step) Update the estimates of $\theta = (\alpha_c, \beta_1, \beta_2, \Lambda_1, \Lambda_2)$, $\gamma$ and $\omega$ using the likelihood.
Step 3 Iterate between Steps 1 and 2 until convergence.
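The baseline cumulative hazard update inside the M-step can be sketched as follows. This is a minimal illustration rather than the author's implementation: it assumes the data are flattened into plain lists over all (i, j) pairs, that the linear predictor already contains the current expected random effect, and that censored observations enter the risk sum with a posterior cause weight. The function and argument names are hypothetical.

```python
import math


def posterior_cause1(a_z, lp1, lp2, cumhaz1, cumhaz2):
    """Weight for a censored observation: the conditional probability that
    the unobserved cause is type 1, given survival beyond the observed time."""
    s1 = a_z * math.exp(-math.exp(lp1) * cumhaz1)
    s2 = (1.0 - a_z) * math.exp(-math.exp(lp2) * cumhaz2)
    return s1 / (s1 + s2)


def cumulative_hazard(times, deltas, weights, lin_preds, cause, t):
    """Breslow-type estimate of the baseline cumulative hazard for `cause` at t.

    times     : observed times X
    deltas    : event indicators (1 or 2 = failure type, 3 = censored)
    weights   : posterior weights for this cause, used only where delta == 3
    lin_preds : linear predictors for this cause
    """
    def risk_sum(u):
        total = 0.0
        for x, d, w, lp in zip(times, deltas, weights, lin_preds):
            if x >= u:  # observation still at risk at time u
                if d == cause:
                    total += math.exp(lp)
                elif d == 3:
                    total += w * math.exp(lp)
        return total

    # Add 1 / risk_sum at each observed failure time of this cause up to t.
    return sum(1.0 / risk_sum(x)
               for x, d in zip(times, deltas)
               if d == cause and x <= t)
```

With no censoring, no covariates and a single cause, this reduces to the Nelson-Aalen estimator, which is a convenient sanity check.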

2.3 A score test to determine whether the longitudinal biomarker is associated with the survival time

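In this section, we derive a score test to determine whether the random effect of the longitudinal biomarkers is significantly associated with the survival time; that is, we want to test whether $\gamma$ is zero. From Section 2.2, the survival part of the full joint likelihood function can be written as $L_\gamma$. Now let

$$U_{\gamma_o}(\tau) = \sum_{i=1}^{m} \left\{ \int_0^\tau \Theta_i(t)\, dN_{io}(t) - \int_0^\tau \Theta_i(t)(1 + \phi_{io})\, e^{\beta_1' z_{ij1} + \gamma \Theta_{ij}} \lambda_{0o}(t)\, dt \right\}$$

and we note

$$\frac{\partial L_\gamma}{\partial \gamma_o} = U_{\gamma_o}(\tau) L_\gamma.$$

The derivative of the survival part of the full joint log likelihood function is

$$\frac{\partial \ell_\gamma}{\partial \gamma_o} = \frac{\partial \log L_\gamma}{\partial \gamma_o} = \frac{U_{\gamma_o}(\tau) L_\gamma}{L_\gamma} = U_{\gamma_o}(\tau).$$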


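The resulting score statistic for $\gamma_o = 0$ is

$$U_{\gamma_o} = E_{\Theta|Y}[U_{\gamma_o = 0}(\tau)]$$

$$= E_{\Theta|Y}\left[ \sum_{i=1}^{m} \left\{ \int_0^\tau \Theta_i(t)\, dN_{io}(t) - \int_0^\tau \Theta_i(t)(1 + \phi_{io})\, e^{\beta_1' z_{ij1}} \lambda_{0o}(t)\, dt \right\} \right]$$

$$= \sum_{i=1}^{m} \int_0^\tau E_{\Theta|Y}[\Theta_i(t)]\, dM_{io}(t),$$

where $M_{io}(t) = N_{io}(t) - \int_0^t \Theta_i(s)(1 + \phi_{io})\, e^{\beta_1' z_{ij1}} \lambda_{0o}(s)\, ds$ is the usual counting process martingale for the $i$th individual and the $o$th cause of failure.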

We consider $U(t)$ to be a particular value of a process $\{U(s) : s > 0\}$, and also consider $\Theta$ to be known and predictable, so that the variance of $U(s)$ is

$$V_{\gamma_o}(s) = \sum_{i=1}^{m} \int_0^s E_{\Theta|Y}[\Theta_i(t)]^2 \lambda_{io}(t)\, dt,$$

if the $\Theta_i(t)$ are independent between individuals; or

$$V_{\gamma_o}(\tau) = \sum_{i=1}^{m} \left\{ \int_0^\tau E_{\Theta|Y}[\Theta_i(t)]^2 \lambda_{io}(t)\, dt - \int_0^\tau \int_0^\tau \mathrm{Cov}_{\Theta|Y}(\Theta_i(t), \Theta_i(s))\, dM_{io}(t)\, dM_{io}(s) \right\},$$

if the $\Theta_i(t)$ are not independent between individuals. According to the martingale central limit theorem, under mild conditions $U(s)/[V(s)]^{1/2}$ is asymptotically $N(0, 1)$ under $H_0 : \gamma_o = 0$ as $m \to \infty$ (for details see [1], p. 83).
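Given per-subject contributions to the score and to its variance, the standardized test can be sketched as below. This is only an illustration under the assumption that those contributions have already been computed (for example from the E-step expectations); the function names are hypothetical, and the two-sided normal tail is obtained from the standard library's `math.erfc`.

```python
import math


def score_test(score_terms, variance_terms):
    """Return (z, chi_square, p_value) for the standardized score U / sqrt(V).

    score_terms    : per-subject contributions to U
    variance_terms : per-subject contributions to V
    """
    u = sum(score_terms)
    v = sum(variance_terms)
    if v <= 0:
        raise ValueError("variance must be positive")
    z = u / math.sqrt(v)
    # Two-sided tail probability of a standard normal variable.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, z * z, p_value


def chi_square_1df_p_value(chi_square):
    """p-value of a chi-square statistic with 1 degree of freedom,
    which is the square of a standard normal variable."""
    return math.erfc(math.sqrt(chi_square / 2.0))
```

Squaring the standardized score gives a chi-square statistic with 1 degree of freedom, so both forms lead to the same p-value.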

3. Simulation study

In order to demonstrate the empirical properties of the proposed joint likelihood and of the score test for association between longitudinal biomarker values and the survival time, we carried out simulation studies. In the first simulation, we explore the performance of the proposed joint likelihood in the estimation of parameters. Here, the longitudinal model is taken as

$$Y = \omega_{10} + \omega_{11} \times \Upsilon_1 + \Theta(t) + e,$$

where $\Upsilon_1 \sim N(0, 1)$ and the measurements of $Y$ are scheduled at times 0, 1, 2, 3. The survival model is taken as

$$\lambda_1(t) = \lambda_{01}(t) \exp\{\beta_{21} \times Z + \gamma_1 \times \Theta(t)\}$$

and

$$\lambda_2(t) = \lambda_{02}(t) \exp\{\beta_{22} \times Z + \gamma_2 \times \Theta(t)\},$$

where $Z \sim N(0, 1)$. The function $\alpha(Z)$ is taken as

$$P(W = 1 \mid Z) = \frac{e^{\alpha_1 + \alpha_2 \times Z}}{1 + e^{\alpha_1 + \alpha_2 \times Z}}$$

and

$$P(W = 2 \mid Z) = 1 - \alpha(Z).$$

In our simulation, for the longitudinal biomarker $Y$, we let $E[Y] = 1 + 2t$. $\Theta$ is generated from $\Theta(t) = U_1 + V(t)$, $U_1 \sim N(0, \sigma_1^2)$, $V(t) \sim N(0, \sigma_2^2)$, $\mathrm{Corr}(V(t), V(t + s)) = \exp(-|s|)$, $\sigma_1^2 = \sigma_2^2 = 1$.


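Table 1. Parameter estimates and standard errors for the joint likelihood.

| Parameter | True | Mean | DSE | ASE | MSE |
|---|---|---|---|---|---|
| $\omega_{10}$ | 0 | 0.003 | 0.003 | 0.005 | 0.003 |
| $\omega_{11}$ | 1 | 1.002 | 0.007 | 0.009 | 0.008 |
| $\sigma_1^2$ | 1 | 1.004 | 0.009 | 0.010 | 0.009 |
| $\sigma_2^2$ | 1 | 0.997 | 0.008 | 0.008 | 0.008 |
| $\sigma_e^2$ | 0.2 | 0.199 | 0.007 | 0.008 | 0.007 |
| $\beta_{21}$ | 1 | 1.002 | 0.006 | 0.009 | 0.007 |
| $\beta_{22}$ | 1 | 1.001 | 0.006 | 0.010 | 0.007 |
| $\gamma_1$ | 0.85 | 0.852 | 0.013 | 0.015 | 0.013 |
| $\gamma_2$ | 0.85 | 0.847 | 0.012 | 0.014 | 0.012 |
| $\alpha_1$ | 1 | 0.993 | 0.012 | 0.013 | 0.012 |
| $\alpha_2$ | 1 | 1.011 | 0.013 | 0.013 | 0.013 |

Note: MSE, mean-squared error.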

For the survival part, we choose $\eta = 2$ and use Weibull baselines with survivor functions $S_{01}(t) = \exp(-0.5t^2)$ and $S_{02}(t) = \exp(-1.5t^2)$. The covariates are time-constant standard normal variables for each subject, with coefficients $\alpha_1 = 1$, $\alpha_2 = 2$, $\omega_{10} = 0$, $\beta_{11} = \beta_{21} = \omega_{11} = 1$, and $\gamma_1 = \gamma_2 = 0.85$. Each realization has 200 subjects. The results are based on 1000 successful realizations and are shown in Table 1. In Table 1, DSE denotes the standard deviation of the 1000 point estimates of a parameter, ASE denotes the average of the 1000 estimated standard errors, and MSE denotes the mean-squared error of the 1000 point estimates.
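One way to generate a competing risks outcome under these settings is sketched below. The text does not describe the censoring mechanism, so the administrative censoring time `censor_time` is purely an assumption, as is the simplification of the latent process to a random intercept (so that the cause-conditional failure time can be drawn by inverting the Weibull survivor function). The function names and default values are illustrative only.

```python
import math
import random


def weibull_time(u, scale, lin_pred):
    """Invert S(t) = exp(-scale * t**2 * exp(lin_pred)) at S(t) = u."""
    return math.sqrt(-math.log(u) / (scale * math.exp(lin_pred)))


def simulate_subject(rng, a1=1.0, a2=2.0, beta=(1.0, 1.0), gamma=(0.85, 0.85),
                     scales=(0.5, 1.5), sigma1=1.0, censor_time=3.0):
    """Draw (X, Delta, Z, U1) for one subject.

    Assumptions: the latent effect is a random intercept U1 with standard
    deviation sigma1, and censoring is administrative at censor_time.
    """
    z = rng.gauss(0.0, 1.0)
    u1 = rng.gauss(0.0, sigma1)
    # Cause of failure from the logistic model alpha(Z).
    p1 = 1.0 / (1.0 + math.exp(-(a1 + a2 * z)))
    cause = 1 if rng.random() < p1 else 2
    # Failure time from the cause-conditional proportional hazards model.
    lin_pred = beta[cause - 1] * z + gamma[cause - 1] * u1
    t = weibull_time(1.0 - rng.random(), scales[cause - 1], lin_pred)
    if t <= censor_time:
        return t, cause, z, u1
    return censor_time, 3, z, u1  # censored: Delta = 3
```

For example, `simulate_subject(random.Random(1))` returns one tuple; repeating the call 200 times gives a data set of the size used in each realization.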

In Table 1, the parameters in the longitudinal part and the survival part are estimated well by the proposed method. The DSE, ASE and MSE values are all small, which indicates that the proposed method estimates the parameters accurately and precisely.
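The three summary measures reported in Table 1 can be computed from the replicated estimates as in the following sketch. The function name and the toy numbers in the usage note are illustrative assumptions, not values from the study.

```python
import statistics


def summarize(estimates, standard_errors, true_value):
    """Summarize replicated estimates of one parameter.

    DSE : standard deviation of the point estimates
    ASE : average of the estimated standard errors
    MSE : mean-squared error of the point estimates
    """
    return {
        "mean": statistics.fmean(estimates),
        "DSE": statistics.stdev(estimates),
        "ASE": statistics.fmean(standard_errors),
        "MSE": statistics.fmean([(e - true_value) ** 2 for e in estimates]),
    }
```

For instance, `summarize([0.9, 1.0, 1.1], [0.1, 0.1, 0.1], 1.0)` gives a mean of about 1.0 and a DSE and ASE of about 0.1. Agreement between DSE and ASE suggests that the estimated standard errors reflect the actual sampling variability.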

In the second simulation, we explore how the latent process type influences the performance of the score test. The survival and longitudinal models are the same as in the first simulation. We examined the empirical type I error rates of the score tests, that is, the rejection rate under $H_0 : \gamma_o = 0$, $o = 1, 2$. The nominal $\alpha$-level is 0.05. Alternative hypotheses were also explored. Sample sizes were constructed as follows: total sample size = number of subjects × number of observation times. Three different latent process types were specified for assessing the power of the score test at different sample sizes. The Monte Carlo method is used for this simulation, and the results are based on 1000 successful realizations. The structures of the three latent process types are as follows:

(1) $\Theta(t) = U_1$, $U_1 \sim N(0, \sigma_1^2)$

(2) $\Theta(t) = U_1 + U_2 \times t$, $U_1 \sim N(0, \sigma_1^2)$, $U_2 \sim N(0, \sigma_2^2)$, $\mathrm{Corr}(U_1, U_2) = \rho$

(3) $\Theta(t) = U_1 + V(t)$, $U_1 \sim N(0, \sigma_1^2)$, $V(t) \sim N(0, \sigma_2^2)$, $\mathrm{Corr}(V(t), V(t + s)) = \exp(-|s|)$

The simulation results are given in Tables 2(a)–(c).
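The three latent process types can be simulated at a set of observation times as in the sketch below. It is an illustration under stated assumptions: `sigma1` and `sigma2` are standard deviations (the text specifies variances), type (2) draws $U_2$ conditionally on $U_1$ to obtain correlation $\rho$, and type (3) uses the fact that a stationary Gaussian Markov process has exactly the exponential correlation $\exp(-|s|)$. The function name is hypothetical.

```python
import math
import random


def latent_path(rng, kind, times, sigma1=1.0, sigma2=1.0, rho=0.0):
    """Simulate the latent process at increasing `times` for type 1, 2 or 3."""
    u1 = rng.gauss(0.0, sigma1)
    if kind == 1:
        # Random intercept only.
        return [u1 for _ in times]
    if kind == 2:
        # Random intercept and slope; draw U2 given U1 so Corr(U1, U2) = rho.
        mean_u2 = rho * sigma2 / sigma1 * u1
        sd_u2 = sigma2 * math.sqrt(1.0 - rho ** 2)
        u2 = rng.gauss(mean_u2, sd_u2)
        return [u1 + u2 * t for t in times]
    if kind == 3:
        # Random intercept plus a stationary process with exponential correlation.
        path = []
        v = rng.gauss(0.0, sigma2)
        previous = times[0]
        for t in times:
            r = math.exp(-abs(t - previous))  # correlation with the previous value
            v = r * v + sigma2 * math.sqrt(1.0 - r * r) * rng.gauss(0.0, 1.0)
            path.append(u1 + v)
            previous = t
        return path
    raise ValueError("kind must be 1, 2 or 3")
```

For the measurement schedule of the first simulation, `latent_path(random.Random(0), 3, [0, 1, 2, 3])` returns four serially correlated values for one subject.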

From Tables 2(a)–(c), we observe that latent process type (1) has less power than latent process types (2) and (3). Also, for all latent process types, an increase in the association between the longitudinal biomarker values and the survival time, as expected, tends to yield an increase in the power of the score test of $H_0 : \gamma_o = 0$, $o = 1, 2$. For all latent process types, a larger value of $\sigma_1^2$ yields a higher power for the score test. Furthermore, for latent process type (2), a higher value of $\rho$ is associated with a slightly higher power. In addition, for a given total sample size, the power of the score test is higher with relatively many subjects and few observed time points than with relatively few subjects and many observed time points.

Table 2. Power of score test for (a) latent process (1), (b) latent process (2) and (c) latent process (3) under $H_0 : \gamma_o = 0$, $o = 1, 2$.

(a)

| No. of subjects | No. of time points | $\sigma_1^2$ | $\gamma_o = 0.00$ | $\gamma_o = 0.10$ | $\gamma_o = 0.25$ | $\gamma_o = 0.85$ |
|---|---|---|---|---|---|---|
| 20 | 10 | 1 | 0.058 | 0.524 | 0.756 | 0.790 |
| 20 | 10 | 0.8 | 0.058 | 0.516 | 0.734 | 0.756 |
| 20 | 10 | 0.2 | 0.057 | 0.402 | 0.430 | 0.443 |
| 25 | 4 | 1 | 0.060 | 0.571 | 0.782 | 0.800 |
| 25 | 4 | 0.8 | 0.059 | 0.555 | 0.765 | 0.783 |
| 25 | 4 | 0.2 | 0.059 | 0.507 | 0.688 | 0.710 |
| 50 | 4 | 1 | 0.055 | 0.611 | 0.793 | 0.819 |
| 50 | 4 | 0.8 | 0.055 | 0.594 | 0.776 | 0.803 |
| 50 | 4 | 0.2 | 0.054 | 0.576 | 0.749 | 0.789 |
| 100 | 20 | 1 | 0.054 | 0.684 | 0.802 | 0.830 |
| 100 | 20 | 0.8 | 0.054 | 0.665 | 0.785 | 0.820 |
| 100 | 20 | 0.2 | 0.053 | 0.647 | 0.765 | 0.810 |
| 200 | 10 | 1 | 0.052 | 0.704 | 0.812 | 0.870 |
| 200 | 10 | 0.8 | 0.052 | 0.685 | 0.796 | 0.862 |
| 200 | 10 | 0.2 | 0.051 | 0.666 | 0.775 | 0.851 |

Note: (a) Latent process $\Theta(t) = U_1$, $U_1 \sim N(0, \sigma_1^2)$.

(b)

| No. of subjects | No. of time points | $\sigma_1^2$ | $\sigma_2^2$ | $\rho$ | $\sigma_1^2 + \sigma_2^2 \rho$ | $\gamma_o = 0.00$ | $\gamma_o = 0.10$ | $\gamma_o = 0.25$ | $\gamma_o = 0.85$ |
|---|---|---|---|---|---|---|---|---|---|
| 20 | 10 | 1 | 1 | 0.5 | 1.5 | 0.056 | 0.564 | 0.780 | 0.856 |
| 20 | 10 | 1 | 1 | 0.05 | 1.05 | 0.056 | 0.545 | 0.764 | 0.803 |
| 20 | 10 | 0.8 | 0.2 | 0.5 | 0.9 | 0.056 | 0.533 | 0.756 | 0.784 |
| 20 | 10 | 0.8 | 0.2 | 0.05 | 0.81 | 0.056 | 0.516 | 0.738 | 0.691 |
| 20 | 10 | 0.2 | 0.8 | 0.5 | 0.6 | 0.055 | 0.454 | 0.491 | 0.503 |
| 20 | 10 | 0.2 | 0.8 | 0.05 | 0.24 | 0.055 | 0.435 | 0.459 | 0.472 |
| 25 | 4 | 1 | 1 | 0.5 | 1.5 | 0.058 | 0.616 | 0.805 | 0.823 |
| 25 | 4 | 1 | 1 | 0.05 | 1.05 | 0.058 | 0.596 | 0.788 | 0.806 |
| 25 | 4 | 0.8 | 0.2 | 0.5 | 0.9 | 0.058 | 0.576 | 0.777 | 0.788 |
| 25 | 4 | 0.8 | 0.2 | 0.05 | 0.81 | 0.058 | 0.555 | 0.756 | 0.769 |
| 25 | 4 | 0.2 | 0.8 | 0.5 | 0.6 | 0.057 | 0.523 | 0.701 | 0.718 |
| 25 | 4 | 0.2 | 0.8 | 0.05 | 0.24 | 0.057 | 0.505 | 0.675 | 0.695 |
| 50 | 4 | 1 | 1 | 0.5 | 1.5 | 0.053 | 0.676 | 0.811 | 0.832 |
| 50 | 4 | 1 | 1 | 0.05 | 1.05 | 0.053 | 0.664 | 0.802 | 0.818 |
| 50 | 4 | 0.8 | 0.2 | 0.5 | 0.9 | 0.053 | 0.662 | 0.800 | 0.816 |
| 50 | 4 | 0.8 | 0.2 | 0.05 | 0.81 | 0.053 | 0.651 | 0.789 | 0.802 |
| 50 | 4 | 0.2 | 0.8 | 0.5 | 0.6 | 0.052 | 0.624 | 0.789 | 0.804 |
| 50 | 4 | 0.2 | 0.8 | 0.05 | 0.24 | 0.052 | 0.612 | 0.788 | 0.801 |
| 100 | 20 | 1 | 1 | 0.5 | 1.5 | 0.051 | 0.706 | 0.822 | 0.855 |
| 100 | 20 | 1 | 1 | 0.05 | 1.05 | 0.051 | 0.690 | 0.798 | 0.836 |
| 100 | 20 | 0.8 | 0.2 | 0.5 | 0.9 | 0.051 | 0.688 | 0.797 | 0.834 |
| 100 | 20 | 0.8 | 0.2 | 0.05 | 0.81 | 0.051 | 0.667 | 0.776 | 0.819 |
| 100 | 20 | 0.2 | 0.8 | 0.5 | 0.6 | 0.050 | 0.649 | 0.775 | 0.823 |
| 100 | 20 | 0.2 | 0.8 | 0.05 | 0.24 | 0.050 | 0.632 | 0.755 | 0.815 |
| 200 | 10 | 1 | 1 | 0.5 | 1.5 | 0.050 | 0.725 | 0.830 | 0.910 |
| 200 | 10 | 1 | 1 | 0.05 | 1.05 | 0.050 | 0.709 | 0.808 | 0.892 |
| 200 | 10 | 0.8 | 0.2 | 0.5 | 0.9 | 0.049 | 0.707 | 0.806 | 0.893 |
| 200 | 10 | 0.8 | 0.2 | 0.05 | 0.81 | 0.049 | 0.686 | 0.788 | 0.876 |
| 200 | 10 | 0.2 | 0.8 | 0.5 | 0.6 | 0.049 | 0.684 | 0.786 | 0.874 |
| 200 | 10 | 0.2 | 0.8 | 0.05 | 0.24 | 0.049 | 0.668 | 0.770 | 0.856 |

Note: (b) Latent process $\Theta(t) = U_1 + U_2 \times t$, $U_1 \sim N(0, \sigma_1^2)$, $U_2 \sim N(0, \sigma_2^2)$, $\mathrm{Corr}(U_1, U_2) = \rho$.

(c)

| No. of subjects | No. of time points | $\sigma_1^2$ | $\sigma_2^2$ | $\gamma_o = 0.00$ | $\gamma_o = 0.10$ | $\gamma_o = 0.25$ | $\gamma_o = 0.85$ |
|---|---|---|---|---|---|---|---|
| 20 | 10 | 1 | 1 | 0.057 | 0.593 | 0.791 | 0.903 |
| 20 | 10 | 0.8 | 0.2 | 0.057 | 0.565 | 0.774 | 0.792 |
| 20 | 10 | 0.2 | 0.8 | 0.056 | 0.483 | 0.504 | 0.525 |
| 25 | 4 | 1 | 1 | 0.059 | 0.673 | 0.812 | 0.842 |
| 25 | 4 | 0.8 | 0.2 | 0.058 | 0.644 | 0.781 | 0.814 |
| 25 | 4 | 0.2 | 0.8 | 0.058 | 0.615 | 0.713 | 0.730 |
| 50 | 4 | 1 | 1 | 0.053 | 0.704 | 0.831 | 0.861 |
| 50 | 4 | 0.8 | 0.2 | 0.053 | 0.683 | 0.803 | 0.843 |
| 50 | 4 | 0.2 | 0.8 | 0.052 | 0.664 | 0.784 | 0.821 |
| 100 | 20 | 1 | 1 | 0.051 | 0.727 | 0.852 | 0.872 |
| 100 | 20 | 0.8 | 0.2 | 0.051 | 0.703 | 0.834 | 0.866 |
| 100 | 20 | 0.2 | 0.8 | 0.050 | 0.686 | 0.816 | 0.856 |
| 200 | 10 | 1 | 1 | 0.050 | 0.746 | 0.852 | 0.930 |
| 200 | 10 | 0.8 | 0.2 | 0.049 | 0.724 | 0.834 | 0.920 |
| 200 | 10 | 0.2 | 0.8 | 0.049 | 0.705 | 0.816 | 0.911 |

Note: (c) Latent process $\Theta(t) = U_1 + V(t)$, $U_1 \sim N(0, \sigma_1^2)$, $V(t) \sim N(0, \sigma_2^2)$, $\mathrm{Corr}(V(t), V(t + s)) = \exp(-|s|)$.

4. Example

We next consider a real dataset from 205 patients with malignant melanoma [1]. Two causes of failure are present: (1) death from malignant melanoma and (2) death from other causes. All patients were followed until death or the end of the study. The variables recorded in the study were patient id, sex, tumour thickness, current measurement time and censoring indicator. There were 71 deaths: 57 patients died from malignant melanoma and 14 died from other causes. Seventy-nine patients were male and 126 were female. Follow-up time ranged from 10 to 5565 days. Tumour thickness ranged from 0.26 to 17.42 cm. For our purposes, we focus on the repeated biomarker, tumour thickness, and sex. The results are given in Table 3.

Table 3. Melanoma patients results for: latent process (1), latent process (2) and latent process (3).

| | (1) Est (SE) | (1) p-Value | (2) Est (SE) | (2) p-Value | (3) Est (SE) | (3) p-Value |
|---|---|---|---|---|---|---|
| Longitudinal part | | | | | | |
| 1. Fixed effect | | | | | | |
| Sex ($\omega$) (reference group: female) | 13.66 (2.571) | < .001 | 13.75 (2.239) | < .001 | 14.63 (1.063) | < .001 |
| 2. Random effect | | | | | | |
| $\sigma_1^2$ | 368.64 (12.44) | < .001 | 361.10 (11.74) | < .001 | 293.41 (9.524) | < .001 |
| $\sigma_2^2$ | | | 6.80 (2.547) | .004 | 244.12 (9.013) | < .001 |
| $\rho$ | | | 0.023 (0.010) | .011 | | |
| Survival part | | | | | | |
| Sex ($\beta$) | 0.586 (0.713) | .588 | −0.319 (0.550) | .438 | −0.321 (0.539) | .449 |
| Sex ($\alpha$) | 0.423 (0.083) | < .001 | 0.452 (0.067) | < .001 | 0.393 (0.061) | < .001 |
| Random effect from tumour thickness for death due to malignant melanoma ($\gamma_1$) | 0.163 (0.0371) | < .001 | 0.435 (0.0819) | < .001 | 0.270 (0.0521) | < .001 |
| Random effect from tumour thickness for death due to other causes ($\gamma_2$) | 0.143 (0.0322) | < .001 | 0.401 (0.0801) | < .001 | 0.221 (0.0511) | < .001 |

| | (1) $\chi^2$-value (DF) | (1) p-Value | (2) $\chi^2$-value (DF) | (2) p-Value | (3) $\chi^2$-value (DF) | (3) p-Value |
|---|---|---|---|---|---|---|
| Score test for the random effect from tumour thickness for death due to malignant melanoma | 17.98 (1) | < .001 | 26.16 (1) | < .001 | 40.14 (1) | < .001 |
| Score test for the random effect from tumour thickness for death due to other causes | 16.28 (1) | < .001 | 24.62 (1) | < .001 | 37.37 (1) | < .001 |
| log likelihood | −15,090.042 | | −14,990.321 | | −14,980.218 | |

From Table 3, male patients' tumour thickness is significantly larger than female patients'. In the survival part, the random effect from tumour thickness is associated, at the individual level, with death due to malignant melanoma and with death due to other causes, which suggests that tumour thickness can indicate patients' survival. The score tests for the random effect from tumour thickness, for death due to malignant melanoma and for death due to other causes, show that tumour thickness is significantly associated with the survival time of patients. This means that patients' tumour thickness can be used to indicate survival in competing risks data. Note that the models considered in this section are not nested within each other. The log likelihood is −15,090.042 for Model (1) and −14,990.321 for Model (2). The −2 log likelihood ratio for Model (1) versus Model (2) shows that Model (2) fits the data better: the difference in log likelihood between Model (1) and Model (2) is 99.721 with 2 degrees of freedom, and the corresponding p-value is < 0.001. The log likelihood for Model (3) is −14,980.218, and the −2 log likelihood ratio for Model (1) versus Model (3) also shows that Model (3) fits the data better: the difference in log likelihood between Model (1) and Model (3) is 109.824 with 2 degrees of freedom, and the corresponding p-value is < 0.001. Model (2) and Model (3) have the same number of parameters. Since the log likelihood of Model (3) is larger than that of Model (2), Model (3) is the most suitable of the three models for these data.
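The comparison can be reproduced from the reported log likelihoods as in the sketch below. Two points are assumptions rather than statements from the text: the −2 log likelihood ratio statistic is taken as twice the reported log-likelihood difference, and it is referred to a chi-square distribution with the 2 degrees of freedom quoted above, a reference distribution that strictly applies only to nested models. For 2 degrees of freedom the chi-square upper tail has the closed form exp(−x/2), so only the standard library is needed.

```python
import math


def likelihood_ratio_test(loglik_small, loglik_large):
    """Return (-2 log likelihood ratio, p-value) using a chi-square
    reference distribution with 2 degrees of freedom (assumed)."""
    statistic = 2.0 * (loglik_large - loglik_small)
    p_value = math.exp(-statistic / 2.0)  # chi-square(2) upper tail
    return statistic, p_value


# Log likelihoods reported in Table 3.
stat_12, p_12 = likelihood_ratio_test(-15090.042, -14990.321)
stat_13, p_13 = likelihood_ratio_test(-15090.042, -14980.218)
```

Here `stat_12` is about 199.442 and `stat_13` about 219.648, so both p-values fall far below 0.001, in line with the conclusion in the text.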

5. Discussion

In this paper, joint modelling of repeated measurements and competing risks failure time data, which allows for more than one distinct failure type in the survival endpoint, is employed, and a score test is derived to identify longitudinal biomarkers or surrogates for a time-to-event outcome in competing risks data. Under a carefully chosen definition of complete data, the maximum likelihood estimates of the cause-specific hazard functions are calculated via an EM algorithm. To assess how well a latent process model fits, Henderson et al. [4] suggested comparing the log likelihoods of models fitted with different latent processes in order to find an adequate latent process model for the data analysis.

To examine the sensitivity of the score test to misspecification of the latent process model, we show some simulation results in Tables 4(a) and 4(b). The power of the score test when latent process (2) is misspecified as latent process (1) is given in Table 4(a). Comparing Table 4(a) with Table 2(a), we find that the misspecification decreases the power; however, the effect is moderate. The power of the score test when latent process (3) is misspecified as latent process (1) is given in Table 4(b). Here too, the misspecification causes a moderate decrease in the power.

The simulation results show that, in small samples, the type I error of the proposed score test is inflated and its power is limited. We leave this problem for further study. The Wald test has the same problem.

In Section 2.3, we derive a score test to determine whether the random effect of the longi-tudinal biomarkers is significantly associated with the survival time. In our proposed method,the joint model of survival part is employed a semiparametric mixture model in which propor-tional hazards model are specified for failure time models conditional on cause and a multinomialmodel for the marginal distribution of cause conditional on covariates. If we neglect a multino-mial model for the marginal distribution of cause conditional on covariates and assume that thedata are uncensored, then Uγo(τ ) can be reduced as

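$$
U_{\gamma_o}(\tau) = \sum_{i=1}^{m} \left\{ \int_0^{\tau} W_i(t)\, dN_{io}(t) - \int_0^{\tau} W_i(t)\, e^{\beta_1^{\prime} z_{ij1} + \gamma_o W_{ij}} \lambda_{0o}(t)\, dt \right\}.
$$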

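Moreover, the resulting score statistic for $\gamma_o = 0$ is

$$
\begin{aligned}
U_{\gamma_o} &= E_{W|Y}\left[ U_{\gamma_o = 0}(\tau) \right] \\
&= E_{W|Y}\left[ \sum_{i=1}^{m} \left\{ \int_0^{\tau} W_i(t)\, dN_{io}(t) - \int_0^{\tau} W_i(t)\, e^{\beta_1^{\prime} z_{ij1}} \lambda_{0o}(t)\, dt \right\} \right] \\
&= \sum_{i=1}^{m} \int_0^{\tau} E_{W|Y}\left[ W_i(t) \right] dM_{io}(t),
\end{aligned}
$$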


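Table 4. Power of the score test for the misspecification of (a) latent process (2) to latent process (1) under $H_0: \gamma_o = 0$, $o = 1, 2$, and (b) latent process (3) to latent process (1) under $H_0: \gamma_o = 0$, $o = 1, 2$.

(a)

| No. of subjects | No. of time points | $\sigma_1^2$ | $\gamma_o = 0.00$ | $\gamma_o = 0.10$ | $\gamma_o = 0.25$ | $\gamma_o = 0.85$ |
|---|---|---|---|---|---|---|
| 20 | 10 | 1 | 0.060 | 0.513 | 0.745 | 0.779 |
| 20 | 10 | 0.8 | 0.060 | 0.505 | 0.725 | 0.744 |
| 20 | 10 | 0.2 | 0.059 | 0.393 | 0.421 | 0.432 |
| 25 | 4 | 1 | 0.062 | 0.564 | 0.771 | 0.789 |
| 25 | 4 | 0.8 | 0.061 | 0.547 | 0.754 | 0.774 |
| 25 | 4 | 0.2 | 0.061 | 0.498 | 0.681 | 0.699 |
| 50 | 4 | 1 | 0.057 | 0.605 | 0.780 | 0.793 |
| 50 | 4 | 0.8 | 0.057 | 0.578 | 0.761 | 0.782 |
| 50 | 4 | 0.2 | 0.056 | 0.556 | 0.721 | 0.771 |
| 100 | 20 | 1 | 0.056 | 0.676 | 0.791 | 0.818 |
| 100 | 20 | 0.8 | 0.056 | 0.655 | 0.774 | 0.811 |
| 100 | 20 | 0.2 | 0.055 | 0.638 | 0.756 | 0.802 |
| 200 | 10 | 1 | 0.054 | 0.693 | 0.801 | 0.858 |
| 200 | 10 | 0.8 | 0.054 | 0.678 | 0.784 | 0.851 |
| 200 | 10 | 0.2 | 0.053 | 0.660 | 0.763 | 0.839 |

(b)

| No. of subjects | No. of time points | $\sigma_1^2$ | $\gamma_o = 0.00$ | $\gamma_o = 0.10$ | $\gamma_o = 0.25$ | $\gamma_o = 0.85$ |
|---|---|---|---|---|---|---|
| 20 | 10 | 1 | 0.060 | 0.508 | 0.740 | 0.773 |
| 20 | 10 | 0.8 | 0.060 | 0.500 | 0.720 | 0.738 |
| 20 | 10 | 0.2 | 0.059 | 0.388 | 0.415 | 0.427 |
| 25 | 4 | 1 | 0.062 | 0.556 | 0.765 | 0.784 |
| 25 | 4 | 0.8 | 0.061 | 0.543 | 0.749 | 0.768 |
| 25 | 4 | 0.2 | 0.061 | 0.492 | 0.674 | 0.694 |
| 50 | 4 | 1 | 0.057 | 0.621 | 0.776 | 0.793 |
| 50 | 4 | 0.8 | 0.057 | 0.602 | 0.758 | 0.782 |
| 50 | 4 | 0.2 | 0.056 | 0.583 | 0.749 | 0.771 |
| 100 | 20 | 1 | 0.056 | 0.670 | 0.785 | 0.813 |
| 100 | 20 | 0.8 | 0.056 | 0.651 | 0.768 | 0.806 |
| 100 | 20 | 0.2 | 0.055 | 0.634 | 0.751 | 0.797 |
| 200 | 10 | 1 | 0.054 | 0.689 | 0.795 | 0.853 |
| 200 | 10 | 0.8 | 0.054 | 0.674 | 0.779 | 0.847 |
| 200 | 10 | 0.2 | 0.053 | 0.653 | 0.758 | 0.833 |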

where $M_{io}(t) = N_{io}(t) - \int_0^{t} W_i(u)\, e^{\beta_1^{\prime} z_{ij1}} \lambda_{0o}(u)\, du$ is the usual counting process martingale for the $i$th individual and the $o$th cause of failure.
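As a rough numerical illustration of the simplified score statistic above, the following Python sketch evaluates, for each subject, the jump term at the failure time minus the compensator integral, with the latent effect replaced by its conditional mean given the longitudinal data. It is only a sketch under stated assumptions, not the implementation used in this paper: the names `score_statistic`, `w_hat`, `lin_pred` and `base_hazard` are hypothetical, the data are uncensored, each subject is taken to be at risk up to its own failure time, and the conditional means, regression parameters and baseline hazard are treated as already estimated under the null hypothesis.

```python
import numpy as np


def trapezoid(y, t):
    """Trapezoidal rule for values y sampled on the increasing grid t."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(t)) / 2.0)


def score_statistic(event_time, is_cause_o, w_hat, lin_pred, base_hazard,
                    n_grid=2001):
    """Sum over subjects of (jump term) minus (compensator integral).

    event_time:  failure time of each subject (uncensored data assumed).
    is_cause_o:  True if the subject failed from the cause of interest.
    w_hat:       one callable per subject, the assumed conditional mean of
                 the latent effect as a function of time.
    lin_pred:    fixed-effect linear predictor of each subject.
    base_hazard: callable giving the baseline hazard of the cause of interest.
    """
    total = 0.0
    for T, d, w, eta in zip(event_time, is_cause_o, w_hat, lin_pred):
        grid = np.linspace(0.0, T, n_grid)
        # Counting-process part: contributes only at a cause-o failure.
        jump = float(w(T)) if d else 0.0
        # Compensator part, integrated while the subject is at risk.
        integrand = w(grid) * np.exp(eta) * base_hazard(grid)
        total += jump - trapezoid(integrand, grid)
    return total


# Made-up inputs for two subjects, used only to exercise the function.
event_time = [2.0, 1.0]
is_cause_o = [True, False]
w_hat = [lambda t: 1.0 + 0.5 * t, lambda t: -np.ones_like(t)]
lin_pred = [0.0, np.log(2.0)]
base_hazard = lambda t: np.full_like(t, 0.5)  # constant hazard, an assumption

U = score_statistic(event_time, is_cause_o, w_hat, lin_pred, base_hazard)
print(round(U, 6))
```

With real data these inputs would come from the joint model fitted under the null hypothesis; the statistic would then be standardized by an estimate of its variance, which this sketch does not attempt.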

We now use the data simulated in Section 3 to examine the power of the score test derived in Section 5. The results are given in Table 5.

Comparing Table 5 with Table 2(c), we see that, for the simulated data, the power of the score test without the multinomial model assumption is worse than that of our proposed method. This indicates that our method is suitable for censored data. Our proposed method is a novel approach for data that contain both longitudinal and competing risks information.

For both our method and the method of Henderson et al. [5], we need to assume that the structure and distribution of the random effects, such as $W$, are known. If we choose inadequate structures or wrong distributions for $W$, the score test gives biased results. This is a limitation of the score test. The omission of


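Table 5. Power of the score test derived in Section 5 for the latent process (3) under $H_0: \gamma_o = 0$, $o = 1, 2$.

| No. of subjects | No. of time points | $\sigma_1^2$ | $\sigma_2^2$ | $\gamma_o = 0.00$ | $\gamma_o = 0.10$ | $\gamma_o = 0.25$ | $\gamma_o = 0.85$ |
|---|---|---|---|---|---|---|---|
| 20 | 10 | 1 | 1 | 0.058 | 0.532 | 0.742 | 0.852 |
| 20 | 10 | 0.8 | 0.2 | 0.058 | 0.504 | 0.723 | 0.746 |
| 20 | 10 | 0.2 | 0.8 | 0.057 | 0.423 | 0.458 | 0.577 |
| 25 | 4 | 1 | 1 | 0.069 | 0.634 | 0.763 | 0.793 |
| 25 | 4 | 0.8 | 0.2 | 0.059 | 0.605 | 0.732 | 0.767 |
| 25 | 4 | 0.2 | 0.8 | 0.059 | 0.568 | 0.664 | 0.683 |
| 50 | 4 | 1 | 1 | 0.054 | 0.665 | 0.783 | 0.813 |
| 50 | 4 | 0.8 | 0.2 | 0.054 | 0.647 | 0.754 | 0.792 |
| 50 | 4 | 0.2 | 0.8 | 0.053 | 0.625 | 0.735 | 0.774 |
| 100 | 20 | 1 | 1 | 0.052 | 0.688 | 0.803 | 0.823 |
| 100 | 20 | 0.8 | 0.2 | 0.052 | 0.662 | 0.782 | 0.817 |
| 100 | 20 | 0.2 | 0.8 | 0.051 | 0.639 | 0.767 | 0.803 |
| 200 | 10 | 1 | 1 | 0.051 | 0.703 | 0.803 | 0.881 |
| 200 | 10 | 0.8 | 0.2 | 0.050 | 0.687 | 0.781 | 0.872 |
| 200 | 10 | 0.2 | 0.8 | 0.050 | 0.668 | 0.767 | 0.863 |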

the covariates in the longitudinal or survival data also influences the results of the score test, since estimating $W$ relies on the correct structures and distributions. In future work, we can study the effect of missingness in the longitudinal data on the power of the score test. We can also combine our method with methods for handling missing values in data analysis.

References

[1] P.K. Andersen, O. Borgan, R.D. Gill, and N. Keiding, Statistical Models Based on Counting Processes, Springer, New York, 1993.

[2] I.S. Chang, Non-parametric maximum-likelihood estimation in a semiparametric mixture model for competing-risks data, Scand. J. Stat. 34(4) (2007), pp. 870–895.

[3] R.M. Elashoff, G. Li, and N. Li, An approach to joint analysis of longitudinal measurements and competing risks failure time data, Stat. Med. 26 (2007), pp. 2813–2835.

[4] R.M. Elashoff, G. Li, and N. Li, A joint model for longitudinal measurements and survival data in the presence of multiple failure types, Biometrics 64(3) (2008), pp. 762–771.

[5] R. Henderson, P.J. Diggle, and A. Dobson, Identification and efficacy of longitudinal markers for survival, Biostatistics 3(1) (2002), pp. 33–50.

[6] F.S. Ko, Identification and assessment of longitudinal biomarkers using frailty models in survival analysis, Ph.D. thesis. Available at http://d-scholarship.pitt.edu/8680, 2006.

[7] F.S. Ko, Using frailty models to identify the longitudinal biomarkers in survival analysis, Commun. Stat. – Theory Methods 39(18) (2010), pp. 3222–3237.

[8] X. Song, M. Davidian, and A.A. Tsiatis, A semiparametric likelihood approach to joint modeling of longitudinal and time-to-event data, Biometrics 58(4) (2002), pp. 742–753.

[9] A.A. Tsiatis and M. Davidian, A semiparametric estimator for the proportional hazards model with longitudinal covariates measured with error, Biometrika 88(2) (2001), pp. 447–458.

[10] P.R. Williamson, R. Kolamunnage-Dona, P. Philipson, and A.G. Marson, Joint modelling of longitudinal and competing risks data, Stat. Med. 27 (2008), pp. 6426–6438.

[11] M.S. Wulfsohn and A.A. Tsiatis, A joint model for survival and longitudinal data measured with error, Biometrics 53(1) (1997), pp. 330–339.
