Longitudinal quality of life data: a comparison of continuous and ordinal approaches
A. F. Donneau • M. Mauer • C. Coens •
A. Bottomley • A. Albert
Accepted: 27 May 2014
© Springer International Publishing Switzerland 2014
Abstract
Purpose In cancer clinical trials, health-related quality of
life (HRQoL) is a major outcome measure. It is generally
assessed at specified time intervals by filling out a ques-
tionnaire with ordered response categories. Despite recent
advances in the statistical methodology for handling ordi-
nal longitudinal outcome data, most users keep treating
HRQoL scales as continuous rather than ordinal variables
regardless of the number of categories. The purpose of this
study was to compare the results of analyzing HRQoL
longitudinal data under both approaches, continuous and
ordinal.
Methods The EORTC QLQ-C30 scores of two EORTC
randomized brain cancer clinical trials (26951 and 26981)
were analyzed using the two approaches. In the 26951 trial,
a total of 368 patients were randomly assigned to receive
either radiotherapy (RT) or the same RT plus procarbazine,
CCNU, and vincristine. In the 26981 trial, 573 patients
were randomly allocated to RT or RT plus temozolomide.
Comparison of the two treatment arms was done using
methods for longitudinal quantitative and longitudinal
ordinal data. Both statistical methods were adapted to
account for missing data and compared in terms of
statistical significance of the results (p values) but also with
respect to data interpretation.
Results Three scales, i.e., appetite loss, insomnia, and
drowsiness, presenting four response categories (“Not at
all”, “A little”, “Quite a bit”, and “Very much”) were
analyzed in each trial. Both statistical methods (continuous
and ordinal) showed statistically significant differences
between the two treatments, not only globally but also at
the same assessment time points. The magnitude of the
p values, however, varied at some time points and was less
pronounced in the ordinal approach.
Conclusions The analysis of the two clinical trials
showed that treating the HRQoL scales by a quantitative or
an ordinal method did not make much difference as far as
statistical significance was concerned. The interpretation of
results, however, was easier under the ordinal approach.
Treatment effects may be more meaningful when expressed
in terms of odds ratios than as mean values, particularly
when the number of categories is small.
Keywords Health-related quality of life • Continuous • Ordinal • Longitudinal • Missing at random
Introduction
Classical endpoints in cancer clinical trials are usually
defined in terms of event-free or overall survival, but
physicians often need a more comprehensive evaluation of
treatment efficacy. In this context, the subjective response
of the patients to their illness and its treatment, specifically
the patients’ health-related quality of life (HRQoL), has to be taken into
account [1]. HRQoL assessments are generally made
repeatedly during the clinical trial by means of self-
reported questionnaires consisting of various items, fre-
quently scored on a binary or ordinal scale.
A. F. Donneau (✉) • A. Albert
Medical Informatics and Biostatistics, University of Liege,
Liege, Belgium
e-mail: [email protected]
M. Mauer
Department of Statistics, EORTC Headquarters, Brussels,
Belgium
C. Coens • A. Bottomley
Department of Quality of Life, EORTC Headquarters, Brussels,
Belgium
Qual Life Res
DOI 10.1007/s11136-014-0730-8
The statistical analysis of longitudinal HRQoL data may
be complicated in two ways: (1) by the nature of the
HRQoL score itself and (2) by missing observations. In
fact, patients enrolled in cancer clinical trials are likely to
experience adverse events (toxicity, disease progression, or
even death) that will interfere with data collection. Missing
data are all the more important in the analysis of HRQoL
data because missingness is likely to be related to the
deterioration of the patient's HRQoL.
In the literature [2, 3], it is well established that when
the outcome of interest is dichotomous, methods for binary
variables should be employed. However, when the HRQoL
scale under study is ordinal, no unique approach exists.
Several models for categorical outcomes have been pro-
posed [3–5], but in practice, they have been underutilized.
In fact, most papers [6, 7] concerned with the statistical
analysis of HRQoL scales treat them as continuous or as
binary rather than as categorical variables regardless of the
number of categories of the scale. Although this approach
may not be optimal, as it does not utilize all the information
available, it is often preferred due to its simplicity.
The aim of this paper was to compare the results of the
analysis of HRQoL longitudinal data when the scores are treated
either as continuous or as ordinal categorical variables.
Methods
Study material
The EORTC QLQ-C30 version 2 questionnaire [8] used
hereafter is a “core questionnaire”, which incorporates a
range of physical, emotional, and social health issues rel-
evant to a broad spectrum of cancer patients. This core
questionnaire may be supplemented by diagnosis-specific
and/or treatment-specific questionnaire modules. The latter
can provide more detailed information relevant to evaluating
the HRQoL in specific patient populations. The QLQ-C30
incorporates nine multi-item scales: five functional
scales (physical, role, cognitive, emotional, and social);
three symptom scales (fatigue, pain, and nausea/vomiting);
and a Global Health Status scale. It also includes six single
items (dyspnea, insomnia, appetite loss, constipation,
diarrhea, and financial difficulties). The subscale scores
are obtained by averaging items within the scale. A high
score for a functional scale represents a high/healthy level
of functioning, whereas a high score for a symptom scale or
item represents a high level of symptomatology or prob-
lems. The EORTC Brain Cancer Module (EORTC QLQ-
BN20) is intended to supplement the QLQ-C30 when
assessing HRQoL, disease symptoms, side effects of
treatment, and some specific psychosocial issues of
importance to patients with brain cancer. The QLQ-BN20
contains 20 items, 13 of which aggregate into four scales
assessing future uncertainty, visual disorder, motor dys-
function (MD), and communication deficit. The remaining
single items assess other disease symptoms (e.g., head-
aches and seizures) and treatment toxic effects (e.g., hair
loss) [9]. For all these scales, a higher score represents
worse HRQoL. The specific questions and scoring systems
of the three measures may be found on the EORTC QOL
website: http://groups.eortc.be/qol.
The data came from two phase III, multicenter, randomized
trials, each comparing two regimens in patients with a brain
tumor. In EORTC trial 26951, radiotherapy alone (RT) was
compared to radiotherapy plus chemotherapy (RT + PCV) in
patients with newly diagnosed anaplastic oligodendroglioma
(AOD) or anaplastic mixed oligoastrocytoma (AOA). The
adjuvant procarbazine, CCNU (lomustine), and vincristine
(PCV) chemotherapy consisted of 6 cycles of standard
PCV chemotherapy, started within 4 weeks following the
end of radiation therapy. Each cycle consisted of CCNU
110 mg/m² orally on day 1 with anti-emetic; procarbazine
60 mg/m² orally on days 8–21; and vincristine 1.4 mg/m²
i.v. on days 8 and 29. Cycles were to be repeated every
6 weeks, with dose reduction. A total of 368 patients were
randomized in this study by 40 institutions, 183 in the RT
alone arm and 185 in the RT + PCV arm. The EORTC
26981 trial compared radiotherapy (RT) and radiotherapy
plus concomitant daily temozolomide, followed by adjuvant
temozolomide (RT + TMZ), in patients with newly
diagnosed and histologically confirmed glioblastoma.
Between August 2000 and March 2002, a total of 573
patients were randomized in this trial by 85 institutions in
15 countries, 286 in the RT arm and 287 in the RT + TMZ arm.
In both trials, HRQoL was planned to be assessed in a
longitudinal design in all patients using the EORTC QLQ-
C30 version 2 questionnaire [8] in combination with the
disease specific Brain Cancer Module [10], which addres-
ses 20 topics relevant for brain tumor. Baseline HRQoL
assessment was performed at randomization. Follow-up
assessments were performed at regular intervals, as shown
in Fig. 1.
Clinical and HRQoL results of both studies have been
published elsewhere [6, 7, 11, 12].
Only the single-item scales of both measures were
considered, namely dyspnea, insomnia, appetite loss,
constipation, diarrhea, and financial difficulties from the
QLQ-C30 and bladder control, drowsiness, headaches, hair loss,
itchy skin, seizures, and weakness of legs from the
QLQ-BN20. Each single-item scale is an ordinal variable with
four response categories: “Not at all,” “A little,” “Quite a
bit,” and “Very much.” For illustrative purposes, only the
four follow-up assessment times of both trials were
considered here.
Statistical methods
Two main statistical approaches were considered to test for
differences between the two treatment arms. Specifically, a
linear mixed model and a proportional odds model using
the generalized estimating equations (GEE) method were
fitted to the data when considering the HRQoL scale as a
continuous or an ordinal outcome, respectively. Consider a
sample of $N$ subjects and let $Y$ be a HRQoL variable with
$K$ ordered categories assessed on $T$ occasions in each
subject. Then, let $Y_{ij}$ denote the assessment of the variable
$Y$ for the $i$th subject $(i = 1, \ldots, N)$ at the $j$th occasion
$(j = 1, \ldots, T)$. Hence, $Y_i = (Y_{i1}, \ldots, Y_{iT})'$ is the vector of
the repeated assessments of the $i$th subject and
$Y_j = (Y_{1j}, \ldots, Y_{Nj})'$ is the vector of responses at the $j$th
occasion. Associated with each subject, there is a $p \times 1$ vector
of covariates, say $x_{ij}$, measured at time $j$. Hence, let
$X_i = (x_{i1}, \ldots, x_{iT})'$ denote the $T \times p$ design matrix of the $i$th
subject. In the present study, covariates include treatment,
time, and the interaction between time and treatment.
One way to assess the impact of the covariates $X$ on
the continuous HRQoL assessments, $Y_{ij}$, is through the
application of a linear mixed model. This model, commonly
used for the analysis of continuous longitudinal data
[13], can be written as

$$Y_{ij} = x_{ij}'\beta + \varepsilon_{ij}, \qquad (1)$$

where $\beta = (\beta_1, \ldots, \beta_p)'$ is the vector of coefficients and
$\varepsilon_{ij}$ are the error components, assumed to be normally
distributed with mean zero, $\varepsilon_i \sim N(0, \Sigma)$
$(i = 1, \ldots, N;\ j = 1, \ldots, T)$. In what follows, a mixed model with
an unstructured covariance matrix was fitted to the longitudinal
HRQoL data.
When considering $Y_{ij}$ as an ordinal variable with $K$
categories, the cumulative proportional odds model is a
popular choice to relate the marginal probabilities of $Y$ to
the covariate vector $x$ [14]. Specifically,

$$\mathrm{logit}[\Pr(Y_{ij} \le k \mid x_{ij})] = \beta_{0k} + x_{ij}'\beta, \qquad (2)$$

where $\beta_0 = (\beta_{01}, \ldots, \beta_{0,K-1})'$ is the vector of the intercept
parameters and $\beta = (\beta_1, \ldots, \beta_p)'$ the vector of coefficients
$(i = 1, \ldots, N;\ j = 1, \ldots, T;\ k = 1, \ldots, K-1)$.
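As an illustration of model (2), the following minimal Python sketch converts a set of intercepts and a linear predictor into the K category probabilities by differencing the cumulative probabilities. The function names and all numerical values are hypothetical and are not estimates from the two trials; the sketch only assumes the parameterization written in (2), in which probabilities are cumulated over the lower categories.

```python
import math


def expit(z):
    """Inverse of the logit function."""
    return 1.0 / (1.0 + math.exp(-z))


def category_probabilities(intercepts, linear_predictor):
    """Return [Pr(Y = 1), ..., Pr(Y = K)] under a cumulative logit model.

    intercepts: increasing values (beta_01, ..., beta_0,K-1)
    linear_predictor: the value of x'beta for one subject and occasion
    """
    # Pr(Y <= k) for k = 1..K-1, followed by Pr(Y <= K) = 1
    cumulative = [expit(b0k + linear_predictor) for b0k in intercepts] + [1.0]
    probs = [cumulative[0]]
    for k in range(1, len(cumulative)):
        probs.append(cumulative[k] - cumulative[k - 1])
    return probs


# Hypothetical intercepts for K = 4 categories (illustration only)
intercepts = [0.0, 1.5, 3.0]

# Reference group: x'beta = 0
probs_ref = category_probabilities(intercepts, 0.0)

# Comparison group with a hypothetical cumulative odds ratio of 0.5
probs_trt = category_probabilities(intercepts, math.log(0.5))
```

Under this parameterization, a cumulative odds ratio below 1 lowers the odds of falling in the lower categories at every cut-point, so probability mass shifts toward the higher response categories.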
Both methods account for the repeated feature of serial
assessments of HRQoL scales over time. However, special
attention has to be given to the handling of missing data in
the repeated HRQoL assessments. Missing data occurred
when patients did not complete all or some items of the
HRQoL questionnaires at the time of a scheduled evalua-
tion. Missing data also occurred when patients dropped out
from the study because of disease progression, death, or
end of the clinical follow-up period.
The terminology introduced by Rubin [15] and Little
and Rubin [16] was considered when referring to the
missingness process: missing completely at random
(MCAR), missing at random (MAR), and missing not at
random (MNAR). Under the MCAR mechanism, the
probability of an observation being missing is independent
of both the unobserved and the observed data. Under the
MAR mechanism, the probability of an observation being
missing is independent of the unobserved measurements,
given the observed data. When neither MCAR nor MAR
holds, the missingness mechanism is said to be MNAR,
in which case the probability of an observation being missing
depends on unobserved data.
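The three mechanisms can be contrasted with a small sketch in which the probability of a missing assessment is written as a function of the previous (observed) score and the current (possibly unobserved) score. The function name and the logistic coefficients below are hypothetical and chosen purely for illustration; they are not taken from the trials.

```python
import math


def expit(z):
    """Inverse of the logit function."""
    return 1.0 / (1.0 + math.exp(-z))


def prob_missing(mechanism, y_previous, y_current):
    """Probability that the current assessment is missing.

    y_previous: last observed score (1 to 4)
    y_current: score that would have been recorded now; it is unobserved
               whenever the assessment is missing
    All coefficients are hypothetical.
    """
    if mechanism == "MCAR":
        return expit(-1.0)                     # depends on no data at all
    if mechanism == "MAR":
        return expit(-2.0 + 0.5 * y_previous)  # depends on observed data only
    if mechanism == "MNAR":
        return expit(-2.0 + 0.5 * y_current)   # depends on the unobserved value
    raise ValueError(f"unknown mechanism: {mechanism}")
```

In this sketch, only the MNAR branch uses `y_current`, which is exactly the quantity an analyst cannot see when the assessment is missing.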
To check the reliability of the use of the proposed
model, the missingness mechanism was investigated [6, 7,
17]. Specifically, a logistic regression of the
occurrence of missingness on the previous quality of life
response was conducted. In both trials, this missingness
investigation revealed that the probability for an observa-
tion to be missing was significantly related to the previous
response. As a consequence, the mechanism generating the
missingness in these data was not MCAR. In the following,
the assumption of a MAR process was made. Nevertheless,
the possibility of a MNAR process should not be
ruled out. In this respect, a sensitivity analysis could be
applied, but this is not discussed in the present paper. While
the mixed model is valid under the MCAR or MAR
assumption, the ordinal GEE method has to be adapted to
account for the presence of MAR data. In this perspective,
multiple imputation (MI) [18, 19] was applied as a
preliminary step.

Fig. 1 Design of the 26951–26981 EORTC trials

Two MI approaches were investigated. The
first one is the widely used multivariate normal imputation
(MNI) algorithm, which is based on the multivariate normal
distribution for each variable for which data need to be
imputed. Although appropriate for continuous outcomes,
this algorithm, referred to as MNI + GEE, is often
applied to impute ordinal data. The second MI method,
based on the ordinal imputation method (OIM), will be
labeled OIM + GEE. It accounts for the ordinal nature of
the outcome by imputing missing observations through an
ordinal logistic regression model. Technical details about
the two MI methods for incomplete longitudinal ordinal
data are given in [20, 21]. In both MI approaches, the
number of imputations was fixed at 20. Based on Rubin’s
rule [23], the most important prognostic baseline clinical
factors, the factors found to be associated with the dropout
mechanism [6, 7, 12, 22], and the treatment were
included in the imputation model. In both trials, the
proportionality assumption (underlying the proportional odds
model) was satisfied by all investigated single-item
HRQoL scales.
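The text fixes the number of imputations at 20 but does not spell out how the resulting sets of GEE estimates were combined. A common choice, assumed here and not confirmed by the source, is Rubin's combining rules: average the completed-data estimates and add the between-imputation variance, inflated by (1 + 1/M), to the average within-imputation variance. The sketch below pools hypothetical log odds ratios; the function name `pool_rubin`, the input numbers, and the normal-approximation interval are all illustrative assumptions.

```python
import math


def pool_rubin(estimates, variances):
    """Pool M completed-data estimates and their variances (Rubin's rules).

    Returns the pooled estimate and its standard error.
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two estimates with matching variances")
    q_bar = sum(estimates) / m                                   # pooled estimate
    within = sum(variances) / m                                  # within-imputation variance
    between = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)  # between-imputation variance
    total = within + (1.0 + 1.0 / m) * between
    return q_bar, math.sqrt(total)


# Hypothetical log odds ratios and squared standard errors from M = 4 imputations
log_or = [-0.52, -0.47, -0.58, -0.51]
var = [0.060, 0.058, 0.063, 0.061]

est, se = pool_rubin(log_or, var)
or_pooled = math.exp(est)
# Normal approximation; Rubin's rules properly use a t reference distribution
ci = (math.exp(est - 1.96 * se), math.exp(est + 1.96 * se))
```

Pooling on the log odds ratio scale and exponentiating afterwards keeps the confidence interval inside the positive range, which is why the sketch does not pool the odds ratios directly.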
Results
Data distribution
Careful data scrutiny should always be a starting point
prior to data analysis. As an illustration, the distribution of
the Appetite loss HRQoL scale at each assessment time and
in each treatment arm in the 26951 EORTC trial is dis-
played in Fig. 2. The barplots show that whatever the
treatment and time point, the distribution is particularly
peaked and positively skewed. Thus, methods for contin-
uous and normally distributed data may not be the best
choice to analyze HRQoL data. Similar behaviors were
observed for the other scales (data not shown).
The distribution of patient dropout at each time point in
each treatment arm is given in Table 1 for both EORTC trials.
Data analysis
Table 2 displays the p values for “treatment effect” at
each time point in both trials and for each HRQoL scale.
Globally, the two statistical approaches revealed a treatment
effect at the same assessment times. However,
p values derived under the continuous approach were
usually “more significant.” As a consequence, while the
continuous approach tended toward significance on some
occasions (e.g., appetite loss at FU3 in EORTC 26981
and insomnia at FU2 in EORTC 26951), the ordinal GEE
method did not. The same observation was made for the
other scales.
Concerning the interpretation of the treatment effect, the
estimated mean scores derived in both treatment arms from
the continuous mixed model approach and the odds ratios (OR,
with corresponding 95 % confidence intervals) derived from the
ordinal approaches are given in Tables 3 and 4 for the
“Appetite loss,” “Insomnia,” and “Drowsiness” HRQoL
scales in the EORTC 26951 trial and the EORTC 26981 trial,
respectively.
Fig. 2 Distribution of Appetite loss HRQoL scale at each assessment time and in each treatment arm (26951 EORTC trial)

Table 1 Distribution of patient dropout at each time point in both EORTC trials

Assessment   EORTC 26951 trial          EORTC 26981 trial
time         RT           RT + PCV      RT           RT + TMZ
Baseline     25 (14.3)    35 (20.2)     19 (7.11)    27 (10.0)
FU1          46 (26.3)    45 (26.0)     78 (29.2)    67 (24.9)
FU2          86 (49.1)    70 (40.4)     124 (46.4)   104 (38.7)
FU3          106 (60.5)   98 (56.6)     204 (76.4)   175 (65.1)
FU4          119 (68.0)   109 (63.0)    236 (88.4)   188 (69.9)

For the ordinal approach, the OR values derived under
the MNI approach were generally higher than those derived
under the OIM. As an example (Table 4), under the MNI +
GEE, the odds of presenting severe appetite loss at FU1
were 1.67 (=1/0.60) times higher with RT + TMZ than
with RT alone. By contrast, under the OIM + GEE, the
odds of presenting severe appetite loss were 2.04 (=1/0.49)
times higher.
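The inversions quoted for appetite loss at FU1 (1/0.60 and 1/0.49) can be reproduced with a few lines of Python. The helper name is hypothetical; the inputs are the odds ratios and 95 % confidence limits reported for FU1 in Table 4. Note that inverting an odds ratio swaps the roles of the two confidence limits.

```python
def invert_odds_ratio(or_value, ci_low, ci_high):
    """Re-express an odds ratio and its CI for the reverse comparison."""
    # The reciprocal of the upper limit becomes the new lower limit
    return 1.0 / or_value, 1.0 / ci_high, 1.0 / ci_low


# Appetite loss at FU1, EORTC 26981 trial (values from Table 4)
mni = invert_odds_ratio(0.60, 0.39, 0.90)   # MNI + GEE
oim = invert_odds_ratio(0.49, 0.31, 0.77)   # OIM + GEE

print(f"MNI + GEE: {mni[0]:.2f} ({mni[1]:.2f} to {mni[2]:.2f})")
print(f"OIM + GEE: {oim[0]:.2f} ({oim[1]:.2f} to {oim[2]:.2f})")
```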
Table 2 P values related to treatment effect at each assessment time for selected HRQoL scales in both EORTC trials

HRQoL scale    Assessment  EORTC 26951 trial                       EORTC 26981 trial
               time        Continuous    MNI + GEE   OIM + GEE     Continuous    MNI + GEE   OIM + GEE
                           mixed model                             mixed model
Appetite loss  Baseline    0.83          0.79        0.69          0.81          0.38        0.39
               FU1         0.76          0.74        0.51          0.0009        0.015       0.0023
               FU2         0.013         0.075       0.0071        0.11          0.55        0.28
               FU3         <0.0001       0.0008      0.0001        0.051         0.41        0.12
               FU4         <0.0001       0.0004      0.0002        0.45          0.98        0.82
Insomnia       Baseline    0.021         0.023       0.020         0.40          0.32        0.36
               FU1         0.63          0.57        0.65          0.45          0.98        0.49
               FU2         0.023         0.032       0.079         0.84          0.97        0.69
               FU3         0.72          0.90        0.73          0.40          0.75        0.31
               FU4         0.13          0.15        0.51          0.25          0.35        0.19
Drowsiness     Baseline    0.95          0.84        0.88          0.62          0.91        0.79
               FU1         0.69          0.80        0.74          0.33          0.94        0.63
               FU2         0.71          0.52        0.83          0.17          0.64        0.07
               FU3         0.006         0.015       0.002         0.37          0.99        0.86
               FU4         0.52          0.73        0.68          0.27          0.69        0.51

Table 3 Interpretation of treatment effects at each assessment time for the selected HRQoL scales from the 26951 EORTC trial

HRQoL scale    Assessment  Continuous mixed model (a)    MNI + GEE (b)       OIM + GEE (b)
               time        RT             RT + PCV
Appetite loss  Baseline    1.18 ± 0.04    1.19 ± 0.04    0.88 (0.47–1.65)    0.92 (0.47–1.79)
               FU1         1.41 ± 0.07    1.44 ± 0.07    0.95 (0.58–1.57)    0.93 (0.52–1.66)
               FU2         1.53 ± 0.09    1.86 ± 0.09    0.59 (0.35–0.98)    0.51 (0.28–0.92)
               FU3         1.37 ± 0.09    1.88 ± 0.09    0.37 (0.21–0.66)    0.35 (0.18–0.67)
               FU4         1.20 ± 0.10    1.79 ± 0.09    0.34 (0.18–0.64)    0.19 (0.08–0.43)
Insomnia       Baseline    1.55 ± 0.07    1.80 ± 0.08    0.57 (0.35–0.92)    0.59 (0.37–0.92)
               FU1         1.45 ± 0.07    1.49 ± 0.07    0.86 (0.52–1.43)    0.88 (0.51–1.51)
               FU2         1.39 ± 0.08    1.64 ± 0.08    0.57 (0.34–0.95)    0.55 (0.28–1.07)
               FU3         1.46 ± 0.08    1.50 ± 0.07    1.03 (0.63–1.70)    0.90 (0.47–1.71)
               FU4         1.51 ± 0.10    1.72 ± 0.09    0.69 (0.41–1.15)    0.79 (0.39–1.62)
Drowsiness     Baseline    1.71 ± 0.06    1.71 ± 0.07    0.95 (0.59–1.53)    0.97 (0.64–1.46)
               FU1         1.73 ± 0.07    1.77 ± 0.07    0.94 (0.60–1.49)    0.93 (0.60–1.44)
               FU2         1.81 ± 0.08    1.77 ± 0.08    1.18 (0.70–2.00)    1.06 (0.63–1.76)
               FU3         1.55 ± 0.09    1.89 ± 0.09    0.51 (0.29–0.88)    0.43 (0.26–0.73)
               FU4         1.72 ± 0.10    1.81 ± 0.09    0.90 (0.51–1.61)    0.88 (0.49–1.60)

(a) Mean ± SD
(b) Cumulative odds ratio (95 % CI) from the proportional odds model with probabilities cumulated over the lower response categories (i.e., Not at all vs. other; Not at all or A little vs. other; Not at all, A little, or Quite a bit vs. Very much) comparing RT versus RT + PCV

The GEE ordinal method also makes it possible to derive
the probabilities of each category at each assessment time
in both treatment arms. These probabilities are depicted for
both MI approaches in Fig. 3 for the three selected HRQoL
scales in the EORTC 26951 trial. Figure 4 presents the same
results for the EORTC 26981 trial. It appears that the
probability profiles derived under the MNI + GEE method
differ substantially from those derived under the
OIM + GEE method. As an example (Fig. 3), for
appetite loss in the RT + PCV arm at FU3, the category
probabilities after MNI imputation were equal to 38.7,
40.0, 16.5, and 4.82 %, respectively, while under the OIM
imputation they amounted to 45.9, 28.7, 14.1, and 11.3 %,
respectively.
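To see why two such different probability profiles can coexist with similar continuous summaries, the short sketch below computes the mean score implied by each set of category percentages quoted for appetite loss at FU3. It assumes, as a labeled assumption not stated at this point in the text, that the categories are coded 1 to 4; the function name is hypothetical, and the percentages are renormalized because the rounded MNI values sum to slightly more than 100.

```python
def mean_score(percentages, scores=(1, 2, 3, 4)):
    """Mean score implied by category percentages (renormalized to sum to 1)."""
    total = sum(percentages)
    return sum(s * p for s, p in zip(scores, percentages)) / total


# Category percentages quoted in the text (RT + PCV arm, appetite loss, FU3)
mni = [38.7, 40.0, 16.5, 4.82]   # after MNI imputation
oim = [45.9, 28.7, 14.1, 11.3]   # after OIM imputation

mean_mni = mean_score(mni)
mean_oim = mean_score(oim)

# Share of responses in the two outer categories ("Not at all", "Very much")
outer_mni = mni[0] + mni[3]
outer_oim = oim[0] + oim[3]
```

Under the assumed coding, the two implied means are close (roughly 1.87 and 1.91), whereas the share of the two outer categories differs markedly (about 44 % versus 57 %), a difference that a mean alone would not reveal.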
Conclusions
This paper compared the results of the analysis of HRQoL
longitudinal data considered either as a continuous or an
ordinal outcome. The two evaluated approaches took into
account correlated and missing data specific to longitudinal
HRQoL assessments.
Due to the low number of possible response categories
(i.e., “Not at all”, “A little”, “Quite a bit”, and “Very
much”), analyzing EORTC single-item scales as continuous
is not optimal. Moreover, assuming a normal distribution
was also unrealistic, because the distributions were
far from symmetric. This can be easily verified from the data
presented in the “EORTC QLQ-C30 Reference Values”
manual [24]. This manual compiles EORTC QLQ-C30
data from 23,553 cancer patients. For 17 out of the 30
questions of the questionnaire, more than 50 % of the
answers fall into the lowest answer category (“Not at all”)
out of four possible categories. This is especially prevalent
for symptom-related questions (e.g., diarrhea, constipation,
and vomiting) where specific symptoms are often absent or
treated via concomitant medication. The non-normality and
skewed distributions of HRQoL scales are also recognized
by the questionnaire developers. Both the FDA regulatory
guidelines [25] and the EORTC-specific module development
guidelines [26] state that a high percentage of patients
responding in either the worst or the best category is sufficient
cause for adapting the questionnaire itself. Nonetheless, the
present study shows that analyzing HRQoL scales using a
longitudinal quantitative or a longitudinal ordinal method
does not make much difference as far as statistical signif-
icance is concerned. By contrast, when focusing on the
interpretation of the results, subtle discrepancies appear. In
fact, while the continuous approach only allows presenting
the treatment effect using means (±SD), the ordinal
approach yields odds ratios with 95 % confidence intervals,
as well as a probability distribution of the HRQoL response
categories. As seen above, when dealing with qualitative
data, interpretation is easier and more appealing to physi-
cians and even patients.
Table 4 Interpretation of treatment effects at each assessment time for the selected HRQoL scales from the 26981 EORTC trial

HRQoL scale    Assessment  Continuous mixed model (a)    MNI + GEE (b)       OIM + GEE (b)
               time        RT             RT + TMZ
Appetite loss  Baseline    1.24 ± 0.04    1.23 ± 0.04    1.24 (0.77–1.99)    1.24 (0.76–2.01)
               FU1         1.28 ± 0.05    1.52 ± 0.05    0.60 (0.39–0.90)    0.49 (0.31–0.77)
               FU2         1.37 ± 0.06    1.49 ± 0.05    0.88 (0.59–1.33)    0.78 (0.49–1.23)
               FU3         1.34 ± 0.08    1.55 ± 0.07    0.82 (0.51–1.32)    0.60 (0.32–1.14)
               FU4         1.25 ± 0.08    1.33 ± 0.05    1.01 (0.57–1.78)    0.90 (0.35–2.29)
Insomnia       Baseline    1.76 ± 0.06    1.83 ± 0.06    0.83 (0.58–1.20)    0.84 (0.59–1.21)
               FU1         1.64 ± 0.06    1.70 ± 0.06    1.01 (0.69–1.47)    0.87 (0.57–1.31)
               FU2         1.61 ± 0.07    1.62 ± 0.06    1.01 (0.65–1.57)    0.93 (0.63–1.37)
               FU3         1.51 ± 0.08    1.60 ± 0.07    0.92 (0.54–1.57)    0.78 (0.48–1.28)
               FU4         1.36 ± 0.10    1.50 ± 0.07    0.79 (0.47–1.32)    0.48 (0.15–0.48)
Drowsiness     Baseline    1.70 ± 0.05    1.73 ± 0.05    1.02 (0.70–1.49)    1.05 (0.74–1.49)
               FU1         1.86 ± 0.06    1.94 ± 0.06    0.99 (0.66–1.47)    0.91 (0.63–1.33)
               FU2         1.82 ± 0.07    1.95 ± 0.06    0.90 (0.59–1.39)    0.88 (0.54–1.44)
               FU3         1.78 ± 0.09    1.88 ± 0.07    1.00 (0.60–1.67)    0.94 (0.45–1.95)
               FU4         1.63 ± 0.11    1.78 ± 0.07    0.88 (0.47–1.67)    0.77 (0.34–1.72)

(a) Mean ± SD
(b) Cumulative odds ratio (95 % CI) from the proportional odds model with probabilities cumulated over the lower response categories (i.e., Not at all vs. other; Not at all or A little vs. other; Not at all, A little, or Quite a bit vs. Very much) comparing RT versus RT + TMZ

It is also important to remark that the application of
methods for continuous data does not account for the categorical
nature of the ordinal outcome. In fact, in the analysis
stage, the longitudinal quantitative approach ignored the
fact that values of the HRQoL scales are bounded between
a minimum and a maximum value. In the imputation stage,
the application of the MNI algorithm to an incomplete ordinal
outcome can provide imputed values that are no longer
integer values and therefore need to be rounded off to the
nearest integer (category) or to the nearest plausible value.
In binary settings, it was demonstrated that rounding off is
not recommended because the rounded imputed values
may provide biased parameter estimates [27, 28]. However,
as we are concerned with missing values for the outcome
variable, this rounding phase is unavoidable before
application of the GEE method. Another drawback of
using methods for normal data to impute ordinal data is the
possible generation of out-of-range imputed values, that is,
values that fall outside the score range. We avoided the
generation of values beyond the upper or lower bound by
restricting the imputed values to the range of the ordinal
outcome variable.
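The rounding and range restriction described for values imputed under the multivariate normal model can be sketched as follows. This is a minimal illustration under stated assumptions: the categories are taken to be coded 1 to 4, ties are rounded upward, and out-of-range values are clamped to the nearest bound. The exact rounding rule and restriction used in the analysis are not given in the text, and the function name is hypothetical.

```python
import math


def to_category(imputed_value, lowest=1, highest=4):
    """Map a continuous imputed value to a valid ordinal category."""
    # Round to the nearest integer (halves go up), avoiding Python's
    # built-in round(), which rounds halves to the nearest even integer
    rounded = int(math.floor(imputed_value + 0.5))
    # Restrict the result to the range of the ordinal outcome
    return max(lowest, min(highest, rounded))


# Hypothetical imputed values drawn from a normal model
imputed = [2.4, 2.5, 0.2, 4.7, -1.3]
categories = [to_category(v) for v in imputed]
```

Clamping moves every out-of-range draw into an outer category, while rounding pulls values toward the integers nearest the fitted mean, so the resulting category distribution need not match the observed one.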
Finally, comparison of both MI methods for incomplete
ordinal outcomes showed that interpretation of the results
differed between the two MI–GEE methods. This observation
was reported elsewhere [17, 20, 21], where the MNI + GEE
method was found to favor the inner categories to
the detriment of the outer ones. By contrast, the OIM + GEE
approach respects the marginal distribution of the
ordinal data. As a consequence, as for the analysis model,
the choice of the imputation method should be guided by
the type of the data that need to be imputed. Thus, it is
advisable to impute missing ordinal data using an appro-
priate MI method.
Fig. 3 Distribution of HRQoL scale at each assessment time and in each treatment arm for both MI + GEE methods (26951 EORTC trial)

The fact that within-patient HRQoL assessments are
more correlated than assessments between patients is an
important factor for testing differences between treatment
groups. The goal of a longitudinal data analysis is to
investigate the difference between treatments using all
information available; ignoring this particular feature (as
with simplistic methods such as Student’s t test or the
chi-square test) may typically result in loss of power to
detect a treatment effect. In case of informative dropout
over time, the result may even be a biased
overestimation.
In conclusion, using inappropriate methods and ignoring
special characteristics of HRQoL longitudinal data lead to
underuse of potential information and may bias both results
and conclusions.
Acknowledgments The authors thank the European Organization
for Research and Treatment of Cancer for permission to use the data
from EORTC trials 26951 and 26981 for this research. This publi-
cation is supported by Fondation Contre le Cancer (Belgium) through
the EORTC Charitable Trust.
Fig. 4 Distribution of HRQoL scale at each assessment time and in each treatment arm for both MI + GEE methods (26981 EORTC trial)
References
1. Olschewski, M., Schulgen, G., Schumacher, M., & Altman, D. G.
(1994). Quality of life assessment in clinical cancer research.
British Journal of Cancer, 70(1), 1–5.
2. Cox, D. R., & Snell, E. J. (1989). The Analysis of Binary Data
(2nd ed.). London: Chapman & Hall.
3. Agresti, A. (2013). Categorical data analysis (3rd ed.). New
York: Wiley.
4. Agresti, A. (2010). Analysis of ordinal categorical data (2nd ed.).
New York: Wiley.
5. Lall, R., Campbell, M. J., Walters, S. J., Morgan, K., & MRC
CFAS Co-operative. (2002). A review of ordinal regression
models applied on health-related quality of life assessments.
Statistical Methods in Medical Research, 11, 49–67.
6. Taphoorn, M. J., Stupp, R., Coens, C., Osoba, D., Kortmann, R.,
van den Bent, M. J., et al. (2005). Health-related quality of life in
patients with glioblastoma: A randomized controlled trial. Lancet
Oncology, 6(12), 937–944.
7. Taphoorn, M. J., van den Bent, M. J., Mauer, M., Coens, C.,
Delattre, J. Y., Brandes, A., et al. (2007). Health-related quality
of life in patients treated for anaplastic oligodendroglioma with
adjuvant chemotherapy: Results of a European Organisation for
Research and Treatment of Cancer Randomized Clinical Trial.
Journal of Clinical Oncology, 25(38), 5723–5730.
8. Aaronson, N. K., Ahmedzai, S., Bergman, B., Bullinger, M., Cull,
A., Duez, N. J., et al. (1993). The European Organization for
Research and Treatment of Cancer QLQ-C30: A quality-of-life
instrument for use in international clinical trials in oncology.
Journal of the National Cancer Institute, 85, 365–376.
9. Taphoorn, M. J., Claassens, L., Aaronson, N. K., Coens, C.,
Mauer, M., Osoba, D., et al. (2010). An international validation
study of the EORTC brain cancer module (EORTC QLQ-BN20)
for assessing health-related quality of life and symptoms in brain
cancer patients. European Journal of Cancer, 46, 1033–1040.
10. Osoba, D., Aaronson, N. K., Muller, M., Sneeuw, K., Hsu, M. A.,
Yung, W., et al. (1996). The development and psychometric
validation of a brain cancer quality-of-life questionnaire for use
in combination with general cancer-specific questionnaires.
Quality of Life Research, 5(1), 139–150.
11. Stupp, R., Mason, W. P., van den Bent, M. J., Weller, M., Fisher,
B., Taphoorn, M. J., et al. (2005). Radiotherapy plus concomitant
and adjuvant temozolomide for glioblastoma. New England
Journal of Medicine, 352(10), 987–996.
12. Van den Bent, M. J., Carpentier, A. F., Brandes, A., Sanson, M.,
Taphoorn, M. J. B., Bernsen, H., et al. (2006). Adjuvant procarbazine,
lomustine, and vincristine improves progression-free
survival but not overall survival in newly diagnosed anaplastic
oligodendrogliomas and oligoastrocytomas: A randomized European
Organisation for Research and Treatment of Cancer phase III
trial. Journal of Clinical Oncology, 24, 2715–2722.
13. Verbeke, G., & Molenberghs, G. (2000). Linear mixed models for
longitudinal data. New York: Springer.
14. McCullagh, P. (1980). Regression models for ordinal data (with
discussion). Journal of the Royal Statistical Society, Series B, 42,
109–142.
15. Rubin, D. B. (1976). Inference and missing data. Biometrika, 63,
581–592.
16. Little, R. J. A., & Rubin, D. B. (1987). Statistical analysis with
missing data. New York: Wiley.
17. Donneau, A. F. (2013). Contribution to the statistical analysis of
incomplete longitudinal ordinal data. PhD thesis, University of
Liege, Belgium.
18. Rubin, D. B. (1978). Multiple imputation in sample surveys—A
phenomenological Bayesian approach to nonresponse. Imputa-
tion and Editing of Faulty or Missing Survey Data, Washington,
DC: US Department of Commerce, pp. 1–32.
19. Carpenter, J. R., & Kenward, M. G. (2013). Multiple imputation
and its application. New York: Wiley.
20. Donneau, A. F., Mauer, M., Molenberghs, G., & Albert, A.
(2013). A simulation study comparing multiple imputation
methods for incomplete longitudinal ordinal data. Communica-
tions in Statistics—Simulation and Computation (in press).
21. Donneau, A. F., Mauer, M., Lambert, P., Molenberghs, G., &
Albert, A. (2013). Simulation-based study comparing multiple
imputation methods for non-monotone missing ordinal data in
longitudinal settings. Journal of Biopharmaceutical Statistics (in
press).
22. Gorlia, T., van den Bent, M. J., Hegi, M. E., Mirimanoff, R. O.,
Weller, M., Cairncross, J. G., et al. (2008). Nomograms for
predicting survival of patients with newly diagnosed glioblastoma
multiforme: A prognostic factor analysis of EORTC/NCIC trial
26981–22981/CE. The Lancet Oncology, 9, 29–38.
23. Rubin, D. B. (1987). Multiple imputation for nonresponse in
surveys. New York: Wiley.
24. Scott, N. W., Fayers, P. M., Aaronson, N. K., Bottomley, A., de
Graeff, A., Groenvold, M., et al. (2008). EORTC QLQ-C30 ref-
erence values. Brussels: EORTC Quality of Life Group
Publications.
25. US Department of Health and Human Services Food and Drug
Administration. (2006). Guidance for industry: Patient-reported
outcome measures: Use in medical product development
to support labeling claims: Draft guidance. Health and Quality of
Life Outcomes, 4, 79.
26. Sprangers, M. A. G., Cull, A., Bjordal, K., et al., for the
EORTC Study Group on Quality of Life. (1994). The European
Organization for Research and Treatment of Cancer approach to
the quality of life (QOL) assessment: Guidelines for developing
questionnaire modules. Quality of Life Research, 3, 67–68.
27. Horton, N., Lipsitz, S., & Parzen, M. (2003). A potential for bias
when rounding in multiple imputation. The American Statistician,
57, 229–232.
28. Allison, P. (2005). Imputation of categorical variables with
PROC MI. Paper presented at SAS Users Group International,
Annual conference, Philadelphia.