Longitudinal quality of life data: a comparison of continuous and ordinal approaches


A. F. Donneau • M. Mauer • C. Coens •

A. Bottomley • A. Albert

Accepted: 27 May 2014

© Springer International Publishing Switzerland 2014

Abstract

Purpose In cancer clinical trials, health-related quality of

life (HRQoL) is a major outcome measure. It is generally

assessed at specified time intervals by filling out a ques-

tionnaire with ordered response categories. Despite recent

advances in the statistical methodology for handling ordi-

nal longitudinal outcome data, most users keep treating

HRQoL scales as continuous rather than ordinal variables

regardless of the number of categories. The purpose of this

study was to compare the results of analyzing HRQoL

longitudinal data under both approaches, continuous and

ordinal.

Methods The EORTC QLQ-C30 scores of two EORTC

randomized brain cancer clinical trials (26951 and 26981)

were analyzed using the two approaches. In the 26951 trial,

a total of 368 patients were randomly assigned to receive

either radiotherapy (RT) or the same RT plus procarbazine,

CCNU, and vincristine. In the 26981 trial, 573 patients

were randomly allocated to RT or RT plus temozolomide.

Comparison of the two treatment arms was done using

methods for longitudinal quantitative and longitudinal

ordinal data. Both statistical methods were adapted to

account for missing data and compared in terms of

statistical significance of the results (p values) but also with

respect to data interpretation.

Results Three scales, i.e., appetite loss, insomnia, and drowsiness, presenting four response categories ("Not at all", "A little", "Quite a bit", and "Very much") were

analyzed in each trial. Both statistical methods (continuous

and ordinal) showed statistically significant differences

between the two treatments, not only globally but also at

the same assessment time points. The magnitude of the

p values, however, varied at some time points and was less

pronounced in the ordinal approach.

Conclusions The analysis of the two clinical trials

showed that treating the HRQoL scales by a quantitative or

an ordinal method did not make much difference as far as

statistical significance was concerned. The interpretation of

results, however, was easier under the ordinal approach.

Treatment effects may be more meaningful when expressed in terms of odds ratios than as mean values, particularly when the number of categories is small.

Keywords Health-related quality of life · Continuous · Ordinal · Longitudinal · Missing at random

Introduction

Classical endpoints in cancer clinical trials are usually

defined in terms of event-free or overall survival, but

physicians often need a more comprehensive evaluation of

treatment efficacy. In this context, the subjective response

of the patients to their illness and its treatment, specifically the patients' health-related quality of life (HRQoL), has to be taken into

account [1]. HRQoL assessments are generally made

repeatedly during the clinical trial by means of self-

reported questionnaires consisting of various items, fre-

quently scored on a binary or ordinal scale.

A. F. Donneau (✉) · A. Albert

Medical Informatics and Biostatistics, University of Liege,

Liege, Belgium

e-mail: [email protected]

M. Mauer

Department of Statistics, EORTC Headquarters, Brussels,

Belgium

C. Coens · A. Bottomley

Department of Quality of Life, EORTC Headquarters, Brussels,

Belgium


Qual Life Res

DOI 10.1007/s11136-014-0730-8

The statistical analysis of longitudinal HRQoL data may

be complicated in two ways: (1) by the nature of the

HRQoL score itself and (2) by missing observations. In

fact, patients enrolled in cancer clinical trials are likely to

experience adverse events (toxicity, disease progression, or

even death) that will interfere with data collection. Missing data matter all the more in the analysis of HRQoL data because missingness is likely to be related to the deterioration of the patient's HRQoL.

In the literature [2, 3], it is well established that when

the outcome of interest is dichotomous, methods for binary

variables should be employed. However, when the HRQoL

scale under study is ordinal, no unique approach exists.

Several models for categorical outcomes have been pro-

posed [3–5], but in practice, they have been underutilized.

In fact, most papers [6, 7] concerned with the statistical

analysis of HRQoL scales treat them as continuous or as

binary rather than as categorical variables regardless of the

number of categories of the scale. Although this approach

may not be optimal, as it does not utilize all the information

available, it is often preferred due to its simplicity.

The aim of this paper was to compare the results of the analysis of longitudinal HRQoL data when treating them either as continuous or as ordinal categorical variables.

Methods

Study material

The EORTC QLQ-C30 version 2 questionnaire [8] used

hereafter is a "core questionnaire", which incorporates a

range of physical, emotional, and social health issues rel-

evant to a broad spectrum of cancer patients. This core

questionnaire may be supplemented by diagnosis-specific

and/or treatment-specific questionnaire modules. The latter

can provide more detailed information relevant to evaluating the HRQoL in specific patient populations. The QLQ-C30 incorporates nine multi-item scales: five functional

scales (physical, role, cognitive, emotional, and social);

three symptom scales (fatigue, pain, and nausea/vomiting);

and a Global Health Status scale. It also includes six single

items (dyspnoea, insomnia, appetite loss, constipation,

diarrhea, and financial difficulties). The subscale scores

are obtained by averaging items within the scale. A high

score for a functional scale represents a high/healthy level

of functioning, whereas a high score for a symptom scale or

item represents a high level of symptomatology or prob-

lems. The EORTC Brain Cancer Module (EORTC QLQ-

BN20) is intended to supplement the QLQ-C30 when

assessing HRQoL, disease symptoms, side effects of

treatment, and some specific psychosocial issues of

importance to patients with brain cancer. The QLQ-BN20

contains 20 items, 13 of which aggregate into four scales

assessing future uncertainty, visual disorder, motor dys-

function (MD), and communication deficit. The remaining

single items assess other disease symptoms (e.g., head-

aches and seizures) and treatment toxic effects (e.g., hair

loss) [9]. For all these scales, a higher score represents

worse HRQoL. The specific questions and scoring systems

of the three measures may be found on the EORTC QOL

website: http://groups.eortc.be/qol.

The datasets used in this paper came from two phase III, multicenter, randomized trials that each compared two regimens for patients suffering from a brain tumor. In the EORTC 26951 trial, radiotherapy alone (RT) was compared to radiotherapy plus chemotherapy (RT + PCV) in patients with recurrent anaplastic oligodendroglioma (AOD) and anaplastic mixed oligoastrocytoma (AOA) brain tumors. The adjuvant procarbazine, CCNU (lomustine), and vincristine (PCV) chemotherapy consisted of 6 cycles of standard PCV chemotherapy, started within 4 weeks following the end of radiation therapy. Each cycle consisted of CCNU 110 mg/m² orally on day 1 with anti-emetic; procarbazine 60 mg/m² orally on days 8–21; and vincristine 1.4 mg/m² i.v. on days 8 and 29. Cycles were to be repeated every 6 weeks, with dose reduction. A total of 368 patients were randomized in this study by 40 institutions, 183 in the RT alone arm and 185 in the RT + PCV arm. The EORTC 26981 trial compared radiotherapy (RT) and radiotherapy plus concomitant daily temozolomide, followed by adjuvant temozolomide (RT + TMZ), in patients with newly diagnosed and histologically confirmed glioblastoma. Between August 2000 and March 2002, a total of 573 patients were randomized by 85 institutions in 15 countries in this trial, 286 in the RT arm and 287 in the RT + TMZ arm.

In both trials, HRQoL was planned to be assessed in a

longitudinal design in all patients using the EORTC QLQ-

C30 version 2 questionnaire [8] in combination with the

disease specific Brain Cancer Module [10], which addres-

ses 20 topics relevant for brain tumor. Baseline HRQoL

assessment was performed at randomization. Follow-up

assessments were performed at regular intervals, as shown

in Fig. 1.

Clinical and HRQoL results of both studies have been

published elsewhere [6, 7, 11, 12].

Only the single-item scales of both measures were considered, namely dyspnea, insomnia, appetite loss, constipation, diarrhea, and financial difficulties from the QLQ-C30 and bladder control, drowsiness, headaches, hair loss, itchy skin, seizures, and weakness of legs from the QLQ-BN20. Each single-item scale is an ordinal variable with four response categories: "Not at all," "A little," "Quite a bit," and "Very much." For illustrative purposes, only the four follow-up assessment times of both trials were considered here.

Statistical methods

Two main statistical approaches were considered to test for

differences between the two treatment arms. Specifically, a

linear mixed model and a proportional odds model using

the generalized estimating equations (GEE) method were

fitted to the data when considering the HRQoL scale as a

continuous or an ordinal outcome, respectively. Consider a sample of $N$ subjects and let $Y$ be an HRQoL variable with $K$ ordered categories assessed on $T$ occasions in each subject. Then, let $Y_{ij}$ denote the assessment of the variable $Y$ for the $i$th subject $(i = 1, \ldots, N)$ at the $j$th occasion $(j = 1, \ldots, T)$. Hence, $Y_i = (Y_{i1}, \ldots, Y_{iT})'$ is the vector of the repeated assessments of the $i$th subject and $Y_j = (Y_{1j}, \ldots, Y_{Nj})'$ is the vector of responses at the $j$th occasion. Associated with each subject, there is a $p \times 1$ vector of covariates, say $x_{ij}$, measured at time $j$. Hence, let $X_i = (x_{i1}, \ldots, x_{iT})'$ denote the $T \times p$ design matrix of the $i$th subject. In the present study, covariates include the treatment effect, time, and the interaction between time and treatment.
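As an illustration of this notation, the design matrix $X_i$ of one subject can be sketched as follows. The dummy coding (baseline as reference occasion, no intercept column) and the names `design_row` and `design_matrix` are assumptions made for the example; the paper does not state how the covariates were coded.

```python
def design_row(treated, occasion, n_occasions):
    """Return x_ij for one subject-occasion pair as a list of 0/1 values.

    Layout (an assumed coding, not taken from the paper):
      [treatment, time_1..time_{T-1}, treatment*time_1..treatment*time_{T-1}]
    Occasion 0 (baseline) is the reference level, so it gets no time dummy.
    """
    if not 0 <= occasion < n_occasions:
        raise ValueError("occasion must lie in 0..n_occasions-1")
    trt = 1 if treated else 0
    time_dummies = [1 if occasion == t else 0 for t in range(1, n_occasions)]
    interaction = [trt * d for d in time_dummies]
    return [trt] + time_dummies + interaction


def design_matrix(treated, n_occasions):
    """Stack the T rows x_i1, ..., x_iT into the T x p matrix X_i."""
    return [design_row(treated, j, n_occasions) for j in range(n_occasions)]


# Five occasions (baseline plus FU1 to FU4) are assumed here, giving p = 9
X_treated = design_matrix(treated=True, n_occasions=5)
X_control = design_matrix(treated=False, n_occasions=5)
```

With this coding, the interaction columns carry the occasion-specific treatment contrasts, which is the kind of quantity tested at each assessment time.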

One way to assess the impact of the covariates $X$ on the continuous HRQoL assessments, $Y_{ij}$, is through the application of a linear mixed model. This model, commonly used for the analysis of continuous longitudinal data [13], can be written as

$$Y_{ij} = x_{ij}'\beta + \varepsilon_{ij}, \qquad (1)$$

where $\beta = (\beta_1, \ldots, \beta_p)'$ is the vector of coefficients and $\varepsilon_{ij}$ are the error components, assumed to be normally distributed with mean zero, $\varepsilon_i = (\varepsilon_{i1}, \ldots, \varepsilon_{iT})' \sim N(0, \Sigma)$ $(i = 1, \ldots, N;\ j = 1, \ldots, T)$. In what follows, a mixed model with an unstructured covariance matrix $\Sigma$ was fitted to the longitudinal HRQoL data.

When considering $Y_{ij}$ as an ordinal variable with $K$ categories, the cumulative proportional odds model is a popular choice to relate the marginal probabilities of $Y$ to the covariate vector $x$ [14]. Specifically,

$$\operatorname{logit}[\Pr(Y_{ij} \le k \mid x_{ij})] = \beta_{0k} + x_{ij}'\beta, \qquad (2)$$

where $\beta_0 = (\beta_{01}, \ldots, \beta_{0,K-1})'$ is the vector of the intercept parameters and $\beta = (\beta_1, \ldots, \beta_p)'$ the vector of coefficients $(i = 1, \ldots, N;\ j = 1, \ldots, T;\ k = 1, \ldots, K-1)$.
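To make model (2) concrete, the sketch below converts cumulative logits into the $K$ category probabilities and shows that a shift in the linear predictor multiplies every cumulative odds by the same factor. The intercept values and the function name `category_probabilities` are hypothetical and serve only as an illustration.

```python
import math


def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))


def category_probabilities(intercepts, linear_predictor):
    """Category probabilities implied by the proportional odds model (2).

    intercepts: increasing values beta_01 < ... < beta_0,K-1
    linear_predictor: the scalar x_ij' beta, shared by all K-1 cumulative logits
    """
    if any(b >= c for b, c in zip(intercepts, intercepts[1:])):
        raise ValueError("intercepts must be strictly increasing")
    # Pr(Y <= k | x) for k = 1..K-1, then Pr(Y <= K | x) = 1
    cumulative = [expit(b0k + linear_predictor) for b0k in intercepts] + [1.0]
    # Successive differences give Pr(Y = k | x)
    probs = [cumulative[0]]
    probs += [cumulative[k] - cumulative[k - 1] for k in range(1, len(cumulative))]
    return probs


# Hypothetical intercepts for a four-category item ("Not at all" .. "Very much")
intercepts = [0.5, 2.0, 3.5]
p_reference = category_probabilities(intercepts, 0.0)
p_shifted = category_probabilities(intercepts, math.log(0.5))  # cumulative OR = 0.5
```

Because the same coefficient vector enters all $K-1$ cumulative logits, a single odds ratio summarizes the covariate effect across every cut-point of the scale; this is the proportionality assumption.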

Both methods account for the repeated feature of serial

assessments of HRQoL scales over time. However, special

attention has to be given to the handling of missing data in

the repeated HRQoL assessments. Missing data occurred

when patients did not complete all or some items of the

HRQoL questionnaires at the time of a scheduled evalua-

tion. Missing data also occurred when patients dropped out

from the study because of disease progression, death, or

end of the clinical follow-up period.

The terminology introduced by Rubin [15] and Little

and Rubin [16] was considered when referring to the

missingness process: missing completely at random

(MCAR), missing at random (MAR), and missing not at

random (MNAR). Under the MCAR mechanism, the

probability of an observation being missing is independent

of both the unobserved and the observed data. Under the

MAR mechanism, the probability of an observation being

missing is independent of the unobserved measurements,

given the observed data. When neither MCAR nor MAR

holds, the missingness mechanism is said to be MNAR,

whence the probability of an observation being missing

depends on unobserved data.
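The three mechanisms can be contrasted through the probability that the current assessment is missing. The sketch below is purely illustrative: the logistic form and all coefficients are hypothetical and are not estimates from either trial.

```python
import math


def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))


def p_missing_mcar(previous_observed, current_unobserved):
    """MCAR: constant probability, unrelated to any outcome value."""
    return 0.30


def p_missing_mar(previous_observed, current_unobserved):
    """MAR: depends only on the previously observed score."""
    return expit(-2.0 + 0.6 * previous_observed)


def p_missing_mnar(previous_observed, current_unobserved):
    """MNAR: also depends on the current score, which would go unobserved."""
    return expit(-2.0 + 0.6 * previous_observed + 0.8 * current_unobserved)
```

Under the MAR function, two patients with the same previous score have the same probability of a missing assessment whatever their current (unseen) score; under the MNAR function they do not, which is why MNAR cannot be verified from the observed data alone.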

Fig. 1 Design of the 26951–26981 EORTC trials

To check the reliability of the proposed model, the missingness mechanism was investigated [6, 7, 17]. Specifically, a logistic regression analysis of the occurrence of missingness on the previous quality of life response was conducted. In both trials, this investigation revealed that the probability for an observation to be missing was significantly related to the previous response. As a consequence, the mechanism generating the missingness in these data was not MCAR. In the following, the assumption of an MAR process was made. Nevertheless, the possibility of an MNAR process should not be discarded. In this respect, sensitivity analysis can be applied but is not discussed in the present paper.

While the mixed model is valid under the MCAR or MAR assumption, the ordinal GEE method has to be adapted to account for the presence of MAR data. To this end, multiple imputation (MI) [18, 19] was applied as a preliminary step. Two MI approaches were investigated. The first one is the widely used multivariate normal imputation (MNI) algorithm, which is based on the multivariate normal distribution for each variable for which data need to be imputed. Although appropriate for continuous outcomes, this algorithm, referred to as MNI + GEE, is often applied to impute ordinal data. The second MI method, based on the ordinal imputation method (OIM), will be labeled OIM + GEE. It accounts for the ordinal feature of the outcome by imputing missing observations through an ordinal logistic regression model. Technical details about the two MI methods for incomplete longitudinal ordinal data are given in [20, 21]. In both MI approaches, the number of imputations was fixed at 20. Based on Rubin's rule [23], the most important prognostic baseline clinical factors, the factors found to be associated with the dropout mechanism [6, 7, 12, 22], as well as the treatment, were included in the imputation model. In both trials, the proportionality assumption (underlying the proportional odds model) was satisfied by all investigated single-item HRQoL scales.
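After the GEE model has been fitted to each completed dataset, the results have to be combined into a single estimate. The text does not spell out this step; the sketch below assumes the standard Rubin combining rules, and the function name `pool_rubin` and the input values are hypothetical (four imputations are used for brevity, whereas the analysis used 20).

```python
import statistics


def pool_rubin(estimates, variances):
    """Combine m completed-data estimates and their squared standard errors."""
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need m >= 2 estimates with matching variances")
    pooled = statistics.mean(estimates)          # pooled point estimate
    within = statistics.mean(variances)          # average within-imputation variance
    between = statistics.variance(estimates)     # between-imputation variance (divisor m - 1)
    total = within + (1.0 + 1.0 / m) * between   # total variance of the pooled estimate
    return pooled, total


# Hypothetical log odds ratios and squared standard errors from m = 4 imputations
log_or = [-0.70, -0.65, -0.75, -0.70]
se2 = [0.050, 0.052, 0.048, 0.050]
pooled_log_or, total_var = pool_rubin(log_or, se2)
```

The between-imputation term is what carries the extra uncertainty due to the missing values; a method that analyzed a single imputed dataset would omit it and understate the standard error.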

Results

Data distribution

Careful data scrutiny should always be a starting point

prior to data analysis. As an illustration, the distribution of

the Appetite loss HRQoL scale at each assessment time and

in each treatment arm in the 26951 EORTC trial is dis-

played in Fig. 2. The barplots show that whatever the

treatment and time point, the distribution is particularly

peaked and positively skewed. Thus, methods for contin-

uous and normally distributed data may not be the best

choice to analyze HRQoL data. Similar behaviors were

observed for the other scales (data not shown).

The distribution of patient dropout at each time point in

each treatment arm is given in Table 1 for both EORTC trials.

Data analysis

Table 2 displays the p values for "treatment effect" at each time point in both trials and for each HRQoL scale. Globally, the two statistical approaches revealed a treatment effect at the same assessment times. However, p values derived under the continuous approach were usually "more significant." As a consequence, the continuous approach reached or approached significance on some occasions (e.g., appetite loss at FU3 in EORTC 26981 and insomnia at FU2 in EORTC 26951) where the ordinal GEE method did not. The same observation was made for the other scales.

Concerning the interpretation of the treatment effect, the estimated mean scores derived in both treatment arms from the continuous mixed model approach and the ORs (with corresponding 95 % confidence intervals) derived from the ordinal approaches are given in Tables 3 and 4 for the "Appetite loss," "Insomnia," and "Drowsiness" HRQoL scales in the EORTC 26951 and EORTC 26981 trials, respectively.

Fig. 2 Distribution of Appetite loss HRQoL scale at each assessment time and in each treatment arm (26951 EORTC trial)

Table 1 Distribution of patient dropout at each time point in both EORTC trials

Assessment   EORTC 26951 trial          EORTC 26981 trial
time         RT           RT + PCV      RT           RT + TMZ
Baseline     25 (14.3)    35 (20.2)     19 (7.11)    27 (10.0)
FU1          46 (26.3)    45 (26.0)     78 (29.2)    67 (24.9)
FU2          86 (49.1)    70 (40.4)     124 (46.4)   104 (38.7)
FU3          106 (60.5)   98 (56.6)     204 (76.4)   175 (65.1)
FU4          119 (68.0)   109 (63.0)    236 (88.4)   188 (69.9)

For the ordinal approach, the OR values derived under the MNI approach were generally higher than those derived under the OIM. As an example (Table 4), under the MNI + GEE, the odds of presenting severe appetite loss at FU1 were 1.67 (=1/0.60) times higher with RT + TMZ than with RT alone. By contrast, under the OIM + GEE, the odds of presenting severe appetite loss were 2.04 (=1/0.49) times higher.
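The inversion used in this example (1/0.60 and 1/0.49) can be extended to the confidence limits, which swap places when reciprocals are taken. A minimal sketch, using the Table 4 values for appetite loss at FU1; the function name `invert_odds_ratio` is introduced here for illustration only.

```python
def invert_odds_ratio(odds_ratio, ci_low, ci_high):
    """Re-express an odds ratio and its CI with the direction of comparison reversed."""
    if min(odds_ratio, ci_low, ci_high) <= 0:
        raise ValueError("odds ratios and CI bounds must be positive")
    # Taking reciprocals reverses the order of the interval bounds
    return 1.0 / odds_ratio, 1.0 / ci_high, 1.0 / ci_low


# Appetite loss at FU1, EORTC 26981 (Table 4): OR (95 % CI)
mni = invert_odds_ratio(0.60, 0.39, 0.90)
oim = invert_odds_ratio(0.49, 0.31, 0.77)
print([round(v, 2) for v in mni])
print([round(v, 2) for v in oim])
```

Both inverted intervals stay above 1, in line with the significant p values reported for this time point in Table 2.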

Table 2 P values related to treatment effect at each assessment time for selected HRQoL scales in both EORTC trials

HRQoL scale    Assessment   EORTC 26951 trial                       EORTC 26981 trial
               time         Continuous    MNI + GEE   OIM + GEE     Continuous    MNI + GEE   OIM + GEE
                            mixed model                             mixed model
Appetite loss  Baseline     0.83          0.79        0.69          0.81          0.38        0.39
               FU1          0.76          0.74        0.51          0.0009        0.015       0.0023
               FU2          0.013         0.075       0.0071        0.11          0.55        0.28
               FU3          <0.0001       0.0008      0.0001        0.051         0.41        0.12
               FU4          <0.0001       0.0004      0.0002        0.45          0.98        0.82
Insomnia       Baseline     0.021         0.023       0.020         0.40          0.32        0.36
               FU1          0.63          0.57        0.65          0.45          0.98        0.49
               FU2          0.023         0.032       0.079         0.84          0.97        0.69
               FU3          0.72          0.90        0.73          0.40          0.75        0.31
               FU4          0.13          0.15        0.51          0.25          0.35        0.19
Drowsiness     Baseline     0.95          0.84        0.88          0.62          0.91        0.79
               FU1          0.69          0.80        0.74          0.33          0.94        0.63
               FU2          0.71          0.52        0.83          0.17          0.64        0.07
               FU3          0.006         0.015       0.002         0.37          0.99        0.86
               FU4          0.52          0.73        0.68          0.27          0.69        0.51

Table 3 Interpretation of treatment effects at each assessment time for the selected HRQoL scales from 26951 EORTC trial

HRQoL scale    Assessment   Continuous mixed model (a)    MNI + GEE (b)       OIM + GEE (b)
               time         RT            RT + PCV
Appetite loss  Baseline     1.18 ± 0.04   1.19 ± 0.04     0.88 (0.47–1.65)    0.92 (0.47–1.79)
               FU1          1.41 ± 0.07   1.44 ± 0.07     0.95 (0.58–1.57)    0.93 (0.52–1.66)
               FU2          1.53 ± 0.09   1.86 ± 0.09     0.59 (0.35–0.98)    0.51 (0.28–0.92)
               FU3          1.37 ± 0.09   1.88 ± 0.09     0.37 (0.21–0.66)    0.35 (0.18–0.67)
               FU4          1.20 ± 0.10   1.79 ± 0.09     0.34 (0.18–0.64)    0.19 (0.08–0.43)
Insomnia       Baseline     1.55 ± 0.07   1.80 ± 0.08     0.57 (0.35–0.92)    0.59 (0.37–0.92)
               FU1          1.45 ± 0.07   1.49 ± 0.07     0.86 (0.52–1.43)    0.88 (0.51–1.51)
               FU2          1.39 ± 0.08   1.64 ± 0.08     0.57 (0.34–0.95)    0.55 (0.28–1.07)
               FU3          1.46 ± 0.08   1.50 ± 0.07     1.03 (0.63–1.70)    0.90 (0.47–1.71)
               FU4          1.51 ± 0.10   1.72 ± 0.09     0.69 (0.41–1.15)    0.79 (0.39–1.62)
Drowsiness     Baseline     1.71 ± 0.06   1.71 ± 0.07     0.95 (0.59–1.53)    0.97 (0.64–1.46)
               FU1          1.73 ± 0.07   1.77 ± 0.07     0.94 (0.60–1.49)    0.93 (0.60–1.44)
               FU2          1.81 ± 0.08   1.77 ± 0.08     1.18 (0.70–2.00)    1.06 (0.63–1.76)
               FU3          1.55 ± 0.09   1.89 ± 0.09     0.51 (0.29–0.88)    0.43 (0.26–0.73)
               FU4          1.72 ± 0.10   1.81 ± 0.09     0.90 (0.51–1.61)    0.88 (0.49–1.60)

a Mean ± SD
b Cumulative odds ratio (95 % CI) from the proportional odds model with probabilities cumulated over the lower response categories (i.e., Not at all vs. other; Not at all or A little vs. other; Not at all, A little, or Quite a bit vs. Very much) comparing RT versus RT + PCV

The use of the GEE ordinal method also makes it possible to derive the probabilities of each category at each assessment time in both treatment arms. These probabilities are depicted for both MI approaches in Fig. 3 for the three selected HRQoL scales in the EORTC 26951 trial. Figure 4 presents the same results for the EORTC 26981 trial. It appears that the probability profiles derived under the MNI + GEE method differ substantially from those derived under the OIM + GEE method. As an example (Fig. 3), for appetite loss in the RT + PCV arm at FU3, the category probabilities after MNI imputation were equal to 38.7, 40.0, 16.5, and 4.82 %, respectively, while under the OIM imputation they amounted to 45.9, 28.7, 14.1, and 11.3 %, respectively.
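The two probability profiles quoted for appetite loss can be compared numerically. The sketch below assumes the categories are scored 1 to 4 (an assumption consistent with the range of the means in Table 3) and computes the mean score each profile implies, together with the share of patients in the two extreme categories.

```python
def mean_score(percentages, scores=(1, 2, 3, 4)):
    """Mean item score implied by category percentages (renormalized to sum to 1)."""
    total = sum(percentages)
    return sum(s * p for s, p in zip(scores, percentages)) / total


# Appetite loss, RT + PCV arm, FU3, EORTC 26951 (values quoted in the text)
mni_percent = [38.7, 40.0, 16.5, 4.82]
oim_percent = [45.9, 28.7, 14.1, 11.3]

mni_mean = mean_score(mni_percent)
oim_mean = mean_score(oim_percent)
outer_mni = mni_percent[0] + mni_percent[-1]   # share in the two extreme categories
outer_oim = oim_percent[0] + oim_percent[-1]
```

Under this scoring, both profiles imply a mean close to the continuous estimate of 1.88 reported in Table 3, yet they place clearly different shares of patients in the extreme categories. A mean score alone therefore cannot distinguish between them, whereas the category probabilities can.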

Conclusions

This paper compared the results of the analysis of HRQoL

longitudinal data considered either as a continuous or an

ordinal outcome. The two evaluated approaches took into

account correlated and missing data specific to longitudinal

HRQoL assessments.

Due to the low number of possible response categories (i.e., "Not at all", "A little", "Quite a bit", and "Very much"), analyzing EORTC single-item scales as continuous is not optimal. Moreover, assuming a normal distribution was also unrealistic, because the distributions were hardly symmetric. This can be easily verified from the data presented in the "EORTC QLQ-C30 Reference Values" manual [24]. This manual compiles EORTC QLQ-C30 data from 23,553 cancer patients. For 17 out of the 30 questions of the questionnaire, more than 50 % of the answers fall into the lowest answer category ("Not at all")

out of four possible categories. This is especially prevalent

for symptom-related questions (e.g., diarrhea, constipation,

and vomiting) where specific symptoms are often absent or

treated via concomitant medication. The non-normality and

skewed distributions of HRQoL scales are also recognized

by the questionnaire developers. Both the FDA regulatory

guidelines [25] and the EORTC-specific module develop-

ment guidelines [26] state that a high percentage of patients

responding either the worst or best category is sufficient

cause for adapting the questionnaire itself. Nonetheless, the

present study shows that analyzing HRQoL scales using a

longitudinal quantitative or a longitudinal ordinal method

does not make much difference as far as statistical signif-

icance is concerned. By contrast, when focusing on the

interpretation of the results, subtle discrepancies appear. In

fact, while the continuous approach only allows presenting

the treatment effect using means (±SD), the ordinal

approach yields odds ratios with 95 % confidence intervals,

as well as a probability distribution of the HRQoL response

categories. As seen above, when dealing with qualitative

data, interpretation is easier and more appealing to physi-

cians and even patients.

Table 4 Interpretation of treatment effects at each assessment time for the selected HRQoL scales from 26981 EORTC trial

HRQoL scale    Assessment   Continuous mixed model (a)    MNI + GEE (b)       OIM + GEE (b)
               time         RT            RT + TMZ
Appetite loss  Baseline     1.24 ± 0.04   1.23 ± 0.04     1.24 (0.77–1.99)    1.24 (0.76–2.01)
               FU1          1.28 ± 0.05   1.52 ± 0.05     0.60 (0.39–0.90)    0.49 (0.31–0.77)
               FU2          1.37 ± 0.06   1.49 ± 0.05     0.88 (0.59–1.33)    0.78 (0.49–1.23)
               FU3          1.34 ± 0.08   1.55 ± 0.07     0.82 (0.51–1.32)    0.60 (0.32–1.14)
               FU4          1.25 ± 0.08   1.33 ± 0.05     1.01 (0.57–1.78)    0.90 (0.35–2.29)
Insomnia       Baseline     1.76 ± 0.06   1.83 ± 0.06     0.83 (0.58–1.20)    0.84 (0.59–1.21)
               FU1          1.64 ± 0.06   1.70 ± 0.06     1.01 (0.69–1.47)    0.87 (0.57–1.31)
               FU2          1.61 ± 0.07   1.62 ± 0.06     1.01 (0.65–1.57)    0.93 (0.63–1.37)
               FU3          1.51 ± 0.08   1.60 ± 0.07     0.92 (0.54–1.57)    0.78 (0.48–1.28)
               FU4          1.36 ± 0.10   1.50 ± 0.07     0.79 (0.47–1.32)    0.48 (0.15–0.48)
Drowsiness     Baseline     1.70 ± 0.05   1.73 ± 0.05     1.02 (0.70–1.49)    1.05 (0.74–1.49)
               FU1          1.86 ± 0.06   1.94 ± 0.06     0.99 (0.66–1.47)    0.91 (0.63–1.33)
               FU2          1.82 ± 0.07   1.95 ± 0.06     0.90 (0.59–1.39)    0.88 (0.54–1.44)
               FU3          1.78 ± 0.09   1.88 ± 0.07     1.00 (0.60–1.67)    0.94 (0.45–1.95)
               FU4          1.63 ± 0.11   1.78 ± 0.07     0.88 (0.47–1.67)    0.77 (0.34–1.72)

a Mean ± SD
b Cumulative odds ratio (95 % CI) from the proportional odds model with probabilities cumulated over the lower response categories (i.e., Not at all vs. other; Not at all or A little vs. other; Not at all, A little, or Quite a bit vs. Very much) comparing RT versus RT + TMZ

It is also important to remark that application of methods for continuous data does not account for the categorical feature of the ordinal outcome. In fact, in the analysis stage, the longitudinal quantitative approach ignored the fact that values of the HRQoL scales are bounded between a minimum and a maximum value. In the imputation stage, the application of the MNI algorithm to an incomplete ordinal outcome can provide imputed values that are no longer integer values and therefore need to be rounded off to the nearest integer (category) or to the nearest plausible value. In binary settings, it was demonstrated that rounding off is not recommended because the rounded imputed values may provide biased parameter estimates [27, 28]. However, as we are concerned with missing values for the outcome variable, this rounding phase is unavoidable before application of the GEE method. Another drawback of using methods for normal data to impute ordinal data is the possible generation of out-of-range imputed values, i.e., values that fall outside the score range. We avoided the generation of values beyond the upper or lower bound by restricting the imputed values to the range of the ordinal outcome variable.
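The rounding and range restriction described here can be sketched as below. The text does not specify how the restriction was implemented; simple clipping after rounding is assumed, and the function name `to_category` and the input values are hypothetical.

```python
import math


def to_category(imputed, lowest=1, highest=4):
    """Map a real-valued imputed score to an admissible category."""
    rounded = math.floor(imputed + 0.5)          # round half up to the nearest integer
    return max(lowest, min(highest, rounded))    # pull out-of-range values back to the bounds


raw = [0.3, 1.4, 2.5, 3.6, 4.8]  # hypothetical draws from a normal imputation model
categories = [to_category(v) for v in raw]
```

Note that `math.floor(x + 0.5)` is used rather than Python's built-in `round`, which rounds halves to the nearest even integer and would send 2.5 to category 2.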

Finally, comparison of both MI methods for incomplete ordinal outcomes showed that interpretation of the results differed between the two MI + GEE methods. This observation was reported elsewhere [17, 20, 21], where the MNI + GEE method was found to favor the inner categories to the detriment of the outer ones. By contrast, the OIM + GEE approach respects the marginal distribution of the ordinal data. As a consequence, as for the analysis model, the choice of the imputation method should be guided by the type of the data that need to be imputed. Thus, it is advisable to impute missing ordinal data using an appropriate MI method.

The fact that HRQoL assessments within patients are more correlated than assessments between patients is an important factor for testing differences between treatment groups. The goal of a longitudinal data analysis is to investigate the difference between treatments using all information available; ignoring this particular feature (as with simplistic methods such as Student's t test or the chi-square test) may typically result in loss of power to detect a treatment effect. In case of informative dropout over time, the result may even be a biased overestimation.

Fig. 3 Distribution of HRQoL scale at each assessment time and in each treatment arm for both MI + GEE methods (26951 EORTC trial)

In conclusion, using inappropriate methods and ignoring

special characteristics of HRQoL longitudinal data lead to

underuse of potential information and may bias both results

and conclusions.

Acknowledgments The authors thank the European Organization

for Research and Treatment of Cancer for permission to use the data

from EORTC trials 26951 and 26981 for this research. This publi-

cation is supported by Fondation Contre le Cancer (Belgium) through

the EORTC Charitable Trust.

Fig. 4 Distribution of HRQoL scale at each assessment time and in each treatment arm for both MI + GEE methods (26981 EORTC trial)


References

1. Olschewski, M., Schulgen, G., Schumacher, M., & Altman, D. G.

(1994). Quality of life assessment in clinical cancer research.

British Journal of Cancer, 70(1), 1–5.

2. Cox, D. R., & Snell, E. J. (1989). The Analysis of Binary Data

(2nd ed.). London: Chapman & Hall.

3. Agresti, A. (2013). Categorical data analysis (3rd ed.). New

York: Wiley.

4. Agresti, A. (2010). Analysis of ordinal categorical data (2nd ed.).

New York: Wiley.

5. Lall, R., Campbell, M. J., Walters, S. J., Morgan, K., & MRC

CFAS Co-operative. (2002). A review of ordinal regression

models applied on health-related quality of life assessments.

Statistical Methods in Medical Research, 11, 49–67.

6. Taphoorn, M. J., Stupp, R., Coens, C., Osoba, D., Kortmann, R.,

van den Bent, M. J., et al. (2005). Health-related quality of life in

patients with glioblastoma: A randomized controlled trial. Lancet

Oncology, 6(12), 937–944.

7. Taphoorn, M. J., van den Bent, M. J., Mauer, M., Coens, C.,

Delattre, J. Y., Brandes, A., et al. (2007). Health-related quality

of life in patients treated for anaplastic oligodendroglioma with

adjuvant chemotherapy: Results of a European Organisation for

Research and Treatment of Cancer Randomized Clinical Trial.

Journal of Clinical Oncology, 25(38), 5723–5730.

8. Aaronson, N. K., Ahmedzai, S., Bergman, B., Bullinger, M., Cull,

A., Duez, N. J., et al. (1993). The European Organization for

Research and Treatment of Cancer QLQ-C30: A quality-of-life

instrument for use in international clinical trials in oncology.

Journal of the National Cancer Institute, 85, 365–376.

9. Taphoorn, M. J., Claassens, L., Aaronson, N. K., Coens, C.,

Mauer, M., Osoba, D., et al. (2010). An international validation

study of the EORTC brain cancer module (EORTC QLQ-BN20)

for assessing health-related quality of life and symptoms in brain

cancer patients. European Journal of Cancer, 46, 1033–1040.

10. Osoba, D., Aaronson, N. K., Muller, M., Sneeuw, K., Hsu, M. A.,

Yung, W., et al. (1996). The development and psychometric

validation of a brain cancer quality-of-life questionnaire for use

in combination with general cancer-specific questionnaires.

Quality of Life Research, 5(1), 139–150.

11. Stupp, R., Mason, W. P., van den Bent, M. J., Weller, M., Fisher,

B., Taphoorn, M. J., et al. (2005). Radiotherapy plus concomitant

and adjuvant temozolomide for glioblastoma. New England

Journal of Medicine, 352(10), 987–996.

12. Van den Bent, M. J., Carpentier, A. F., Brandes, A., Sanson, M.,

Taphoorn, M. J. B., Bernsen, H., et al. (2006). Adjuvant pro-

carbazine, lomustine, and vincristine improves progression-free

survival but not overall survival in newly diagnosed anaplastic

oligodendrogliomas and oligoastrocytomas: A randomized euro-

pean organisation for research and treatment of cancer phase III

trial. Journal of Clinical Oncology, 24, 2715–2722.

13. Verbeke, G., & Molenberghs, G. (2000). Linear mixed models for

longitudinal data. New York: Springer.

14. McCullagh, P. (1980). Regression models for ordinal data (with

discussion). Journal of the Royal Statistical Society, Series B, 42,

109–142.

15. Rubin, D. B. (1976). Inference and missing data. Biometrika, 63,

581–592.

16. Little, R. J. A., & Rubin, D. B. (1987). Statistical analysis with missing data. New York: Wiley.

17. Donneau, A. F. (2013). Contribution to the statistical analysis of

incomplete longitudinal ordinal data. PhD thesis, University of

Liege, Belgium.

18. Rubin, D. B. (1978). Multiple imputation in sample surveys—A

phenomenological Bayesian approach to nonresponse. Imputa-

tion and Editing of Faulty or Missing Survey Data, Washington,

DC: US Department of Commerce, pp. 1–32.

19. Carpenter, J. R., & Kenward, M. G. (2013). Multiple imputation

and its application. New York: Wiley.

20. Donneau, A. F., Mauer, M., Molenberghs, G., & Albert, A.

(2013). A simulation study comparing multiple imputation

methods for incomplete longitudinal ordinal data. Communica-

tions in Statistics—Simulation and Computation (in press).

21. Donneau, A. F., Mauer, M., Lambert, P., Molenberghs, G., & Albert, A. (2013). Simulation-based study comparing multiple imputation methods for non-monotone missing ordinal data in longitudinal settings. Journal of Biopharmaceutical Statistics (in press).

22. Gorlia, T., van den Bent, M. J., Hegi, M. E., Mirimanoff, R. O.,

Weller, M., Cairncross, J. G., et al. (2008). Nomograms for

predicting survival of patients with newly diagnosed glioblastoma

multiforme: A prognostic factor analysis of EORTC/NCIC trial

26981–22981/CE. The Lancet Oncology, 9, 29–38.

23. Rubin, D. B. (1987). Multiple imputation for nonresponse in surveys. New York: Wiley.

24. Scott, N. W., Fayers, P. M., Aaronson, N. K., Bottomley, A., de

Graeff, A., Groenvold, M., et al. (2008). EORTC QLQ-C30 ref-

erence values. Brussels: EORTC Quality of Life Group

Publications.

25. US Department of Health and Human Services Food and Drug

Administration. (2006). Guidance for industry: Patient report

outcome measures: use in clinical medical product development

to support labelling claims: Draft guidance. Health and Quality of

Life Outcomes, 4, 79.

26. Sprangers, M. A. G., Cull, A., Bjordal, K., et al., for the EORTC Study Group on Quality of Life. (1994). The European

Organization for Research and Treatment of Cancer approach to

the quality of life (QOL) assessment: Guidelines for developing

questionnaire modules. Quality of Life Research, 3, 67–68.

27. Horton, N., Lipsitz, S., & Parzen, M. (2003). A potential for bias

when rounding in multiple imputation. The American Statistician,

57, 229–232.

28. Allison, P. (2005). Imputation of categorical variables with

PROC MI. Paper presented at SAS Users Group International,

Annual conference, Philadelphia.
