

STATISTICS IN MEDICINE

Statist. Med. 18, 1307–1321 (1999)

ADAPTIVE DESIGN IMPROVEMENTS IN THE CONTINUAL REASSESSMENT METHOD FOR PHASE I STUDIES

JULIE M. HEYD¹ AND BRADLEY P. CARLIN²*

¹ Medtronic Inc., 800 53rd Avenue N.E. N300, Minneapolis, Minnesota 55421, U.S.A.
² Division of Biostatistics, School of Public Health, University of Minnesota, Box 303 Mayo Memorial Building, Minneapolis, Minnesota 55455-0392, U.S.A.

SUMMARY

The continual reassessment method (CRM) enables full and efficient use of all data and prior information available in a phase I study. However, despite a number of recent enhancements to the method, its acceptance in actual clinical practice has been hampered by several practical difficulties. In this paper, we consider several further refinements in the context of phase I oncology trials. In particular, we allow the trial to stop when the width of the posterior 95 per cent probability interval for the maximum tolerated dose (MTD) becomes sufficiently narrow (that is, when the information accumulating from the trial data reaches a prespecified level). We employ a simulation study to evaluate five such stopping rules under three alternative states of prior knowledge regarding the MTD (accurate, too low and too high). Our results suggest our adaptive designs preserve the CRM's estimation ability while offering the possibility of earlier stopping of the trial. Copyright © 1999 John Wiley & Sons, Ltd.

1. INTRODUCTION

1.1. Phase I Oncology Studies

A phase I study is typically the first trial of a therapeutic agent in human subjects. To reach phase I, the drug must first show promise of activity in vitro and in animals. A multiple of the effective dose from previous laboratory and animal studies is then used as the initial dose level in human subjects. Much of our own experience with phase I studies comes from the field of oncology; here, unlike other phase I trials, the trial does not treat healthy volunteers. Instead, the trial enrols patients who are extremely ill with cancer, and for whom accepted treatments may not have worked.

The main goals of a phase I oncology trial are to understand the dose–toxicity relationship of the new regimen, and to find the maximum amount of drug which can be administered without excessive toxicity. The highest possible dose is sought, since the benefit of the new treatment is believed to increase with dose. Unfortunately, the severity of toxicity is also expected to increase with dose, so the challenge is to increase the dose without causing an unacceptable amount of toxicity in the patients. Once this level, the maximum tolerated dose (MTD), is determined, the treatment may go on to a phase II oncology trial. Here the recommended dose is administered to humans with varying tumour types to determine for which type the treatment appears most beneficial. Finally, in a phase III trial, the effect of the new treatment is compared to the effect of a standard therapy and/or the natural progression of the disease to determine if the new treatment is safe and beneficial for larger populations.

* Correspondence to: B. P. Carlin, Division of Biostatistics, School of Public Health, University of Minnesota, Box 303 Mayo Memorial Building, Minneapolis, Minnesota 55455-0392, U.S.A. E-mail: brad@biostat.umn.edu

Contract/grant sponsor: National Institute of Allergy and Infectious Diseases
Contract/grant number: 1-R01-AI41966

CCC 0277–6715/99/111307–15$17.50
Received September 1997; Accepted September 1998
Copyright © 1999 John Wiley & Sons, Ltd.

Frequently, little consideration is given to the quantitative methods of the phase I trial. The standard phase I design treats a small group of patients at each dose level using a predetermined dose-escalation scheme, and estimates the MTD as either the dose at which the trial stopped due to excessive toxicity, or the dose just prior to stopping. Although alternative designs have been introduced which use statistical methods to determine the dose-escalation scheme and models to estimate the maximum tolerated dose, they are generally not part of current clinical practice; thanks to its simplicity, the standard design is typically preferred.

1.2. Standard Design

The standard procedure in a phase I trial is to sequentially administer one of $k$ dose levels to cohorts of three to six patients. The dose levels are determined from animal pharmacology studies, selected so that the initial dose is minimally toxic and the dose increments are modest. The dose that is lethal in 10 per cent of tested mice, LD$_{10}$, is translated into a human dose, and a fraction of that is taken for the initial dose. The dose escalations usually follow some variation of a Fibonacci sequence. A Fibonacci sequence is a sequence of numbers where each number is the sum of the two previous numbers in the sequence; an example is {1, 1, 2, 3, 5, 8, ...}. The doses are increased according to the percentage increase between successive numbers in the Fibonacci sequence; for this example {100, 50, 67, 60, ...}. Often, a modified sequence such as {100, 67, 50, 40, 33} is used so that the increments decrease as the dose level increases.
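The percentage increments quoted above can be reproduced with a short sketch. The function names and the 10-unit starting dose are illustrative assumptions, not part of any published protocol; the flat 1 to 1 step at the start of the sequence is skipped, as in the example in the text.

```python
def fibonacci(n):
    """Return the first n Fibonacci numbers, starting 1, 1."""
    seq = [1, 1]
    while len(seq) < n:
        seq.append(seq[-1] + seq[-2])
    return seq[:n]


def percent_increments(seq):
    """Percentage increase between successive terms, rounded to whole per cent."""
    return [round(100 * (b - a) / a) for a, b in zip(seq, seq[1:])]


def escalate(start_dose, increments):
    """Apply successive percentage increments to a starting dose."""
    doses = [start_dose]
    for pct in increments:
        doses.append(doses[-1] * (1 + pct / 100))
    return doses


fib = fibonacci(6)                               # 1, 1, 2, 3, 5, 8
fib_increments = percent_increments(fib[1:])     # skip the flat 1 -> 1 step
modified_increments = [100, 67, 50, 40, 33]      # modified sequence from the text
doses = escalate(10.0, modified_increments)      # hypothetical 10-unit starting dose
```

With the modified sequence, each step is a smaller relative increase than the one before, which is the property the text describes.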

Prior to the enrolment of patients in the trial, the notion of dose-limiting toxicity (DLT) is specifically defined, and an acceptable level of DLT for each patient cohort is determined. Usually, dose-limiting toxicity is defined as a group of toxicities of grade three or higher, where grades are defined loosely as follows: grade 0, no toxicity; grade 1, mild toxicity; grade 2, moderate toxicity; grade 3, severe toxicity; grade 4, life-threatening toxicity; and grade 5, death. For example, if it is acceptable for 33 per cent of patients to experience dose-limiting toxicity, then the stopping dose will be the dose at which two or more patients out of six experience dose-limiting toxicity.¹ The specific scheme is as follows. If no one in a set of three patients experiences dose-limiting toxicity, then the dose will be escalated for the next set of three patients. However, if one of the three patients in a set experiences dose-limiting toxicity, three more patients are treated at that same dose level. The dose is escalated for the next set only if none of these three additional patients experiences dose-limiting toxicity. Thus, escalation occurs only when none of three patients or one of six patients experiences dose-limiting toxicity. If a death occurs that is suspected to be related to the new treatment, the study is suspended and the escalation scheme is re-evaluated. Although different variations of the standard method are used, they all treat small cohorts at each dose level, use a predetermined sequence of dose-escalation, and clearly define the dose-limiting toxicity.
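The escalation scheme just described can be written as a small decision function. This is a minimal sketch of the rule as stated in the text for the 33 per cent example; the function name and the returned labels are hypothetical, and the suspension of the study after a treatment-related death is not modelled.

```python
def standard_design_step(dlt_first_three, dlt_next_three=None):
    """Decide the next action at one dose level under the standard scheme.

    dlt_first_three: number of DLTs among the first three patients (0 to 3).
    dlt_next_three:  number of DLTs among three additional patients treated
                     at the same dose, or None if they are not yet treated.
    """
    if dlt_first_three == 0:
        return "escalate"                      # none of three patients with DLT
    if dlt_first_three == 1:
        if dlt_next_three is None:
            return "treat three more at same dose"
        if dlt_next_three == 0:
            return "escalate"                  # one of six patients with DLT
    return "stop"                              # two or more DLTs at this dose
```

For instance, `standard_design_step(1)` asks for three more patients at the same dose, and `standard_design_step(1, 1)` stops the trial at that dose.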

1.3. Difficulties with the Standard Design

The standard approach raises two important issues that should be considered in the design of phase I oncology studies.² First, there is an ethical responsibility to neither undertreat nor overtreat too many patients. If the initial dose is as minimally toxic as estimated, the slow escalation of the standard design may treat too many patients at subtherapeutic levels. On the other hand, if the initial dose is actually more toxic than estimated, the standard design may expose too many patients to dangerous levels of toxicity. To resolve this ethical conflict and at the same time minimize the required number of patients and dose levels is a challenging task. Alternative designs try to reach the MTD more efficiently while continuing to provide appropriate treatment.

A second issue relates to the inference that can be made about the MTD estimate. The final dose in the standard design is chosen because it does not exceed some level of toxicity in the final cohort, but the MTD estimate is not truly associated with a target toxicity. The stopping dose depends greatly on the patients and the order in which they were entered.³ Patients in a phase I oncology trial are not homogeneous; drug tolerance, disease stage and disease type can vary significantly. Neither are they a random or representative sample. Because of this, the incidence rate of dose-limiting toxicity found in the standard phase I trial will not necessarily be reproduced in the population. Alternative designs try to estimate more precisely the dose that will yield a target level of dose-limiting toxicity in the patient population.

To address these issues, O'Quigley et al.⁴ introduced the continual reassessment method (CRM), which was subsequently modified by a number of authors.⁵⁻⁷ In this paper we propose several further extensions to the CRM, and investigate their performance through simulation studies. Section 2 reviews the current state of CRM technology, as well as practical difficulties with it. Section 3 describes our adaptive design enhancements to the CRM, and lays out the parameters of our simulation experiment. Section 4 summarizes the results of these simulations, while Section 5 discusses our findings and suggests avenues for further research.

2. METHODS

2.1. Continual Reassessment Method

The continual reassessment method is by far the most written about and simulated alternative to the standard method in phase I oncology trials. The original design⁴ treats one patient at a time and uses accumulated toxicity data to determine the next dose. In this form, the CRM requires two pieces of information to implement a Bayesian approach to estimation: a parametric model that represents the dose–toxicity relationship, and a prior distribution for each unknown parameter. An acceptable probability of dose-limiting toxicity is identified, and the goal is to estimate the dose associated with this target probability $\theta$. (In recent work, O'Quigley and Shen⁸ recast the method in a more traditional maximum likelihood setting, and obtain final recommended dose levels that differ little from those obtained via the original Bayesian approach.)

An approximate dose–toxicity relationship is obtained from animal studies, and a model for the probability of DLT given a dose $x_i$, $\psi(x_i)$, is chosen to represent this relationship. Next, $k$ doses are chosen which are believed to cover an acceptable range of DLT probabilities, including the target probability $\theta$. The first patient is given the dose, $x_\theta$, for which the estimated probability of toxicity is closest to $\theta$. Since the parameter to be estimated is considered a random variable, it is given a prior distribution, chosen to reflect all available prior information concerning it. If no reliable prior information is available to the investigator, a vague or non-informative prior distribution can be chosen (see, for example, reference 9, Section 2.2.3).


To represent the dose–toxicity relationship, O'Quigley and Chevret¹⁰ suggest one of the following one-parameter models:

power model: $\psi_a(x_i) = [\theta_{i1}(x_i)]^a$

logistic model: $\psi_a(x_i) = \dfrac{\exp(a_0 + a x_i)}{1 + \exp(a_0 + a x_i)}$

hyperbolic tangent model: $\psi_a(x_i) = \{(1 + \tanh x_i)/2\}^a$

where $x_i$ is the $i$th dose, $i = 1, \ldots, k$, $\theta_{i1}(x_i)$ is an initial estimate of the probability of toxicity at the $i$th dose, and $a_0$ is a fixed constant. Chevret¹¹ conducted simulations which compared estimation results using the exponential, gamma, log-normal, uniform and Weibull prior distributions to determine if the form of the prior distribution was important. The simulations showed that the form of the prior distribution did not considerably affect the results, and no one prior distribution performed consistently better or worse than others.
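As a concrete illustration, the three one-parameter models can be coded directly from their formulas. This is only a sketch: the function names are hypothetical, and the default `a0=3.0` is borrowed from the value that later simulations in this section report using, not prescribed by the models themselves.

```python
import math


def power_model(prior_tox, a):
    """psi_a(x_i) = [theta_i1(x_i)]^a, where prior_tox is the initial
    estimate of the probability of toxicity at the dose."""
    return prior_tox ** a


def logistic_model(x, a, a0=3.0):
    """psi_a(x_i) = exp(a0 + a*x) / (1 + exp(a0 + a*x)), with a0 fixed."""
    return 1.0 / (1.0 + math.exp(-(a0 + a * x)))


def tanh_model(x, a):
    """psi_a(x_i) = {(1 + tanh x) / 2}^a."""
    return ((1.0 + math.tanh(x)) / 2.0) ** a
```

In the power model, `a = 1` returns the initial estimates unchanged, while larger values of `a` lower every toxicity probability and smaller values raise them.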

Just prior to testing the $j$th patient, the likelihood function is given by

$$ L(a; \Omega_j) = \prod_{l=1}^{j-1} \{\psi_a(x_{i(l)})\}^{r_l} \{1 - \psi_a(x_{i(l)})\}^{(1 - r_l)} \qquad (1) $$

where $\Omega_j$ represents all accumulated information on the first $(j-1)$ patients, $x_{i(l)}$ is the dose administered to the $l$th patient, and $r_l$ is the binary toxicity response for the $l$th patient. As patients are entered into the study and toxicity data are collected, the posterior distribution of $a$ is updated using Bayes' rule. With $g(a)$ denoting the prior density of $a$, the posterior distribution is given by

$$ f(a \mid \Omega_j) = \frac{L(a; \Omega_j)\, g(a)}{\int_0^\infty L(u; \Omega_j)\, g(u)\, du}. \qquad (2) $$

The Bayes estimate of $a$, the posterior mean, is then

$$ \mu(j) = \int_0^\infty a\, f(a \mid \Omega_j)\, da. \qquad (3) $$

An estimate of the probability of toxicity at dose level $i$ given the accumulated information on the first $(j-1)$ patients, $\theta_{ij}$, can be found by substituting the mean of the posterior distribution into the dose–toxicity model $\psi_a(x_i)$. Thus, after each patient is entered, the probability of toxicity associated with each dose is updated, and the next dose will be that for which the estimated posterior probability is closest to $\theta$, the target probability. This process continues until a prespecified number of patients have been treated, at which point the final estimate of the recommended phase II dose is obtained.
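One pass of this updating cycle can be sketched in a few lines. The sketch below assumes the power model, a unit exponential prior for $a$, illustrative initial toxicity estimates, and a simple midpoint rule on a truncated range in place of the integrals over $(0, \infty)$; none of these choices is dictated by the method, and all names are hypothetical.

```python
import math

PRIOR_TOX = [0.05, 0.10, 0.20, 0.30, 0.50, 0.70]  # illustrative initial estimates
TARGET = 0.30                                     # target probability theta


def psi(i, a):
    """Power model: probability of toxicity at dose level i (0-based)."""
    return PRIOR_TOX[i] ** a


def likelihood(a, history):
    """Equation (1); history is a list of (dose_level, r) pairs, r = 1 for DLT."""
    value = 1.0
    for i, r in history:
        p = psi(i, a)
        value *= p if r == 1 else (1.0 - p)
    return value


def prior_density(a):
    """Assumed unit exponential prior g(a)."""
    return math.exp(-a)


def posterior_mean(history, upper=20.0, n=4000):
    """Equations (2) and (3) by the midpoint rule on (0, upper)."""
    step = upper / n
    num = den = 0.0
    for k in range(n):
        a = (k + 0.5) * step
        w = likelihood(a, history) * prior_density(a)
        num += a * w
        den += w
    return num / den


def next_dose(history):
    """Dose level whose updated toxicity estimate is closest to the target."""
    mu = posterior_mean(history)
    estimates = [psi(i, mu) for i in range(len(PRIOR_TOX))]
    return min(range(len(estimates)), key=lambda i: abs(estimates[i] - TARGET))
```

Under these assumptions a single DLT at the fourth level sends the next patient several levels lower, which is the kind of jump between successive patients that constrained versions of the CRM are designed to prevent.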

While it was the intent of the original CRM developers⁴ to obtain an improved dosing strategy with operating characteristics similar to the standard design, the above basic formulation may not ensure this. If the chosen prior distribution is vague and there is little information gained from the initial cohorts, the resulting posterior distribution (2) may have a large variance, so large that the posterior mean (3) may swing wildly from one iteration to the next, implying that some potential dose levels could be 'skipped'. Indeed, Shen and O'Quigley¹² and O'Quigley and Shen⁸ indicate that experimentation at certain dose levels may be very difficult or impossible if the method is not constrained to visit the dose levels in consecutive order. As such, most subsequent users have imposed such constraints on the CRM, and we too impose them in our Section 3 simulation below. However, Section 5 briefly investigates the unconstrained version in our setting.

Previous simulation studies⁴,¹⁰,¹¹ have assessed the performance of the CRM relative to the standard method under a variety of parametric models and prior distributions. These studies suggest that the CRM may be preferable because, by using all available toxicity information, it can more precisely estimate the dose associated with a target probability of toxicity. Also, by treating patients at the best estimate of the MTD, it minimizes the number of patients who are undertreated. However, the CRM, at least in its simplest form, has also been criticized on several accounts. It requires the user to specify a prior distribution, a dose–toxicity model and an upper bound for patient accrual. If unconstrained, the computed dose levels may jump too much between successive patients, causing more patients to be treated at very high doses. The CRM may actually take longer to carry out than other methods, due to the need to evaluate toxicity before accrual of the next patient. Finally, the integration required in equations (2) and (3) may be awkward (requiring numerical methods) and seem rather mysterious to medical practitioners.

2.2. Past Modifications to the Continual Reassessment Method

To address some of these problems, Korn et al.⁵ suggested a modified CRM and completed a simulation to compare the modified CRM to the standard method. Their modifications included the following:

(i) To minimize the risk to the first patient in cases where prior information was lacking, this patient was always treated at the first dose level, rather than the initial estimate of the MTD.

(ii) In order to reduce the number of patients treated at dose levels that were too high, no dose level could be 'skipped' between successive patients.

(iii) In order to reduce the time it took to find the MTD, the trial stopped once the next dose was one at which six patients had already been treated.

For their simulations, Korn et al. used the one-parameter logistic model with $a_0 = 3$ and an exponential prior. They considered three values of $\theta$, namely 0.25, 0.20 and 0.15. The authors concluded that their modified CRM was not an improvement over the standard method, partly because it still treated more patients at high dose levels, producing a higher percentage of DLT, and partly because it required more cohorts, thus taking longer to complete.

Further improvements to the method, presented in a simulation study by Goodman et al.,⁶ sought to correct these problems. In their simulation, the trial also started at the lowest dose level, and the dose could not increase by more than one level at a time. Additionally, two stopping rules were considered. First, a trial could stop once a fixed number of patients had been treated: 12, 18 or 24. Second, a trial could stop once six patients were treated at the recommended dose and a minimum sample size had been reached. The most important modification was to treat more than one patient at a time. The idea was to make the CRM more similar to the standard method without losing precision in the MTD estimate.

The one-parameter logistic model with $a_0 = 3$ was used to represent the dose–toxicity relationship, and both exponential and uniform prior distributions were considered. The prior estimates of the probability of toxicity were (0.05, 0.10, 0.20, 0.35, 0.50, 0.70) with $\theta = 0.20$. Three modified versions of the CRM, treating 1, 2 or 3 patients at a time, were compared to the standard method and the unmodified CRM. Different variations, using six dose–toxicity curves and the two stopping rules, were each simulated 10,000 times. Under all variations, the resulting modified CRM with three patients per cohort had overall toxicity lower than the standard method and the unmodified CRM. The standard method still required the fewest cohorts, but for most situations, the CRM with three patients per cohort did not require considerably more. Finally, the accuracy of the MTD estimate was not greatly affected by treating more than one patient per cohort with the CRM.

Although the CRM treating three patients at a time performed as well as the original CRM and actually reduced the percentage of DLT, the standard method still required the fewest cohorts. This may be the biggest disadvantage to the CRM because by requiring more cohorts it takes longer to complete. Goodman et al.⁶ suggest that using the width of the Bayesian posterior probability interval as a criterion to stop early may allow for a smaller number of cohorts with little loss of precision. The following section describes a simulation study to investigate this hypothesis.

3. SIMULATION DESIGN

This simulation study extends previous CRM work while attempting to reflect actual phase I trial practice as accurately as possible. First, patients are treated three at a time. Second, results based on a fixed sample size of 24 are compared to adaptive rules which use the width of the Bayesian posterior probability interval as a criterion to stop the trial early. An approximate (normal theory) 95 per cent probability interval is used, so the width of the interval equals $2(1.96)\sqrt{\mathrm{var}(\beta_1)}$. Three widths are considered in this study: 1.0, 1.5 and 2.0. In these three cases, the trial stops when either the width is less than the set criterion or the sample size equals 24, whichever happens first. In another case, the sample size is unbounded and the trial stops only when the width is less than 1.5. The following logistic model is used to represent the probability of dose-limiting toxicity given a dose $x$:

$$ \psi_{\beta_1}(x) = \frac{1}{1 + \exp(\beta_0 - \beta_1 x)}. \qquad (4) $$

Finally, a uniform prior distribution for $\beta_1$ is used; we assume the background toxicity parameter $\beta_0$ is known throughout, since, as noted by Piantadosi and Liu,⁷ joint estimation of $\beta_0$ and $\beta_1$ would be numerically difficult under uniform priors until several dose levels have been attempted.
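The width criterion can be illustrated with a small sketch that evaluates the posterior variance of $\beta_1$ on a grid and converts it to the approximate 95 per cent interval width. The bounds of the uniform prior (0 to 10 here) are an assumption made purely for illustration, since they are not stated in this section; the value $\beta_0 = 9.45$ is the one used in the simulation described next, and the function names are hypothetical.

```python
import math

BETA0 = 9.45  # background toxicity parameter, treated as known


def tox_prob(x, beta1, beta0=BETA0):
    """Equation (4): probability of DLT at dose x."""
    return 1.0 / (1.0 + math.exp(beta0 - beta1 * x))


def posterior_moments(history, lo=0.0, hi=10.0, n=2000):
    """Posterior mean and variance of beta1 on a midpoint grid.

    history: list of (dose, r) pairs with r = 1 for DLT and 0 otherwise.
    lo, hi:  bounds of an ASSUMED Uniform(lo, hi) prior for beta1.
    """
    step = (hi - lo) / n
    grid = [lo + (k + 0.5) * step for k in range(n)]
    weights = []
    for b in grid:
        w = 1.0  # flat prior, so the weight is just the likelihood
        for x, r in history:
            p = tox_prob(x, b)
            w *= p if r == 1 else 1.0 - p
        weights.append(w)
    total = sum(weights)
    mean = sum(b * w for b, w in zip(grid, weights)) / total
    var = sum((b - mean) ** 2 * w for b, w in zip(grid, weights)) / total
    return mean, var


def interval_width(variance):
    """Width of the approximate (normal theory) 95 per cent interval."""
    return 2 * 1.96 * math.sqrt(variance)
```

Calling `interval_width(posterior_moments(history)[1])` after each cohort gives the quantity that the adaptive rules compare with their threshold; the width shrinks as toxicity data accumulate.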

Our simulation allows six possible dose levels: (1.63, 1.81, 2.02, 2.15, 2.36, 2.57). The prior estimates of the probability of dose-limiting toxicity (DLT) associated with each of these doses are (0.05, 0.10, 0.20, 0.30, 0.50, 0.70). This corresponds to having $\beta_0$ fixed at 9.45 with a prior estimate of $\beta_1$ equal to 4.00. The target probability of DLT, $\theta$, is 0.30, and the MTD is defined as the dose which causes DLT in 30 per cent of patients. The first set of patients is treated at the lowest dose, 1.63. The true probability of DLT, $q$, is calculated for each set of patients from equation (4) using the actual values of $\beta_0$ and $\beta_1$. The binary toxicity response $r_l$, representing presence or absence of DLT in patient $l$, is determined as $r_l = 1$ if $u_l \le q$, and 0 otherwise, where $u_l \sim \mathrm{Uniform}(0, 1)$, generated independently for each patient. The likelihood function is then given by equation (1), and the posterior distribution by equation (2). After each set of three patients is treated, the posterior distribution is updated, and the new posterior mean is found using equation (3). We used a standard univariate quadrature routine to evaluate the associated integrals; for models featuring higher-dimensional parameters $\beta$, Monte Carlo methods may be more appropriate (see, for example, reference 9, Chapter 5).
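The generation of toxicity responses for one cohort can be sketched as follows. The function names, the seed and the use of Python's `random` module are illustrative assumptions; the original simulation code is not described beyond the rule $r_l = 1$ if $u_l \le q$.

```python
import math
import random

BETA0 = 9.45  # fixed background toxicity parameter


def tox_prob(x, beta1, beta0=BETA0):
    """Equation (4): true probability of DLT, q, at dose x."""
    return 1.0 / (1.0 + math.exp(beta0 - beta1 * x))


def simulate_cohort(dose, true_beta1, size=3, rng=random):
    """Binary DLT responses for one cohort: r = 1 if u <= q, else 0,
    with u drawn independently from Uniform(0, 1) for each patient."""
    q = tox_prob(dose, true_beta1)
    return [1 if rng.random() <= q else 0 for _ in range(size)]


rng = random.Random(1999)                                # arbitrary seed
first_cohort = simulate_cohort(1.63, 4.00, rng=rng)      # lowest dose, beta1 = 4.00
```

Each simulated trial would repeat this step after every dose decision, appending the three new responses to the accumulated data before the posterior is updated.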

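Given the updated posterior mean, the next dose is obtained by substituting this value into the inverted dose–toxicity model. Here, inverting our logistic model (4) gives $x = [\beta_0 + \mathrm{logit}(\theta)]/\hat{\beta}_1$, where $\mathrm{logit}(\theta) = \log\{\theta/(1-\theta)\}$, $\hat{\beta}_1$ is the updated posterior mean, $\beta_0 = 9.45$ as above, and $\theta = 0.30$, the target probability; for example, $\hat{\beta}_1 = 4.00$ gives $x = 2.15$. However, since this formula calculates dose on a continuous scale, the following set of conditions is used to restrict dose to the prespecified set:

$$ \text{new dose} = \begin{cases} 1.63 & \text{if } x < 1.77 \\ 1.81 & \text{if } 1.77 \le x < 1.97 \\ 2.02 & \text{if } 1.97 \le x < 2.12 \\ 2.15 & \text{if } 2.12 \le x < 2.31 \\ 2.36 & \text{if } 2.31 \le x < 2.52 \\ 2.57 & \text{if } x \ge 2.52. \end{cases} \qquad (5) $$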

Here, the upper cut-off point for each dose level was calculated as $c_i = x_i + 0.75(x_{i+1} - x_i)$. This condition was chosen due to its slight bias toward lower (safer) dosages, thus preventing high levels of DLT. As mentioned above, we also constrain consecutive doses so that the current dose level does not jump more than one level between cohorts. To check the effect of these cut-offs and constraints, the average percentage of experimentation at each dose, the average percentage of time each dose is recommended, and the average percentage of DLT will be reported.
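The dose selection step can be sketched as below: invert the logistic model at the target probability, map the continuous dose to a level using the published cut-offs, and limit the move to one level per cohort. The sketch uses the inversion $x = [\beta_0 + \mathrm{logit}(\theta)]/\hat{\beta}_1$ implied by equation (4); function names and the 0-based level index are assumptions for illustration.

```python
import math

DOSES = [1.63, 1.81, 2.02, 2.15, 2.36, 2.57]
CUTOFFS = [1.77, 1.97, 2.12, 2.31, 2.52]   # upper cut-offs from equation (5)
BETA0, THETA = 9.45, 0.30


def logit(p):
    return math.log(p / (1.0 - p))


def continuous_dose(beta1_hat):
    """Dose at which equation (4) equals the target probability."""
    return (BETA0 + logit(THETA)) / beta1_hat


def raw_cutoffs(doses=DOSES, frac=0.75):
    """Unrounded cut-offs c_i = x_i + 0.75 (x_{i+1} - x_i)."""
    return [lo + frac * (hi - lo) for lo, hi in zip(doses, doses[1:])]


def snap(x):
    """0-based dose level selected by equation (5)."""
    return sum(1 for c in CUTOFFS if x >= c)


def next_level(beta1_hat, current_level):
    """Apply equation (5), then move at most one level up or down."""
    proposed = snap(continuous_dose(beta1_hat))
    return max(current_level - 1, min(current_level + 1, proposed))
```

For example, `snap(continuous_dose(4.00))` selects the fourth dose (index 3), while `next_level(3.35, 0)` returns 1 even though the unconstrained proposal is the highest level, because of the one-level constraint.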

To determine how the method performs when the true underlying dose–toxicity relationship is different from our prior estimate, the simulations were done under three different curves representing the true dose–toxicity relationship. These curves are shown in Figure 1, while Table I shows the true probability of DLT associated with the six doses for each of these curves. Recall that the prior estimates are (0.05, 0.10, 0.20, 0.30, 0.50, 0.70). For the first curve, the prior estimates of probability of DLT equal the truth and the MTD is the fourth dose, 2.15. However, for the second curve, the true probabilities of DLT are lower than the prior estimates, and the sixth dose is the MTD. Under the third curve, the true probabilities of DLT are much higher than the prior estimates, and the first dose is the MTD.

For each curve, simulations were run for each of the five stopping rules shown in Table II. Since the true values of $\beta_1$, MTD and $\theta$ are known to our simulation, the bias and mean squared error (MSE) of our estimates can be estimated for each rule by averaging over our $N$ Monte Carlo samples (we take $N = 10,000$). Further, the estimated standard error of the estimated bias (or MSE) can be found in the following way. Let $b_i$ be the bias (MSE) calculated from the $i$th Monte Carlo sample, and $\hat{b}_{MC} = \frac{1}{N} \sum_{i=1}^{N} b_i$, the simulated average bias (MSE). Then the estimated standard error of $\hat{b}_{MC}$ is

$$ \widehat{SE}(\hat{b}_{MC}) = \sqrt{ \frac{1}{N(N-1)} \sum_{i=1}^{N} (b_i - \hat{b}_{MC})^2 }. $$
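The Monte Carlo average and its estimated standard error follow directly from the formula above. A minimal sketch, with a hypothetical function name and a toy input in place of real simulation output:

```python
import math


def mc_mean_and_se(values):
    """Simulated average and its estimated standard error.

    values: per-replicate quantities b_i (bias or squared error).
    Returns (b_hat_MC, SE_hat), with
    SE_hat = sqrt( sum (b_i - b_hat_MC)^2 / (N (N - 1)) ).
    """
    n = len(values)
    mean = sum(values) / n
    ss = sum((v - mean) ** 2 for v in values)
    return mean, math.sqrt(ss / (n * (n - 1)))


# Toy illustration only: five made-up per-replicate biases.
toy_mean, toy_se = mc_mean_and_se([1.0, 2.0, 3.0, 4.0, 5.0])
```

In a simulation, `values` would hold, for each of the $N$ replicates, the difference between the estimate and the known true value (for bias) or its square (for MSE).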


Figure 1. True dose–toxicity curves

Table I. Probability of DLT associated with each dose for each of three true dose–toxicity curves

Dose    Curve 1            Curve 2            Curve 3
        $\beta_1 = 4.00$   $\beta_1 = 3.35$   $\beta_1 = 5.28$
        MTD is 2.15        MTD is 2.57        MTD is 1.63

1.63    0.05               0.02               0.30
1.81    0.10               0.03               0.53
2.02    0.20               0.06               0.77
2.15    0.30               0.10               0.87
2.36    0.50               0.18               0.95
2.57    0.70               0.30               0.98
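The entries of Table I follow from equation (4) with $\beta_0 = 9.45$ and the three true values of $\beta_1$. The short sketch below recomputes them as a check; the variable names are illustrative.

```python
import math

DOSES = [1.63, 1.81, 2.02, 2.15, 2.36, 2.57]
BETA0 = 9.45
TRUE_BETA1 = {"Curve 1": 4.00, "Curve 2": 3.35, "Curve 3": 5.28}
TARGET = 0.30


def tox_prob(x, beta1, beta0=BETA0):
    """Equation (4): probability of DLT at dose x."""
    return 1.0 / (1.0 + math.exp(beta0 - beta1 * x))


# Probability of DLT at each dose, rounded to two decimals as in Table I.
table_one = {
    curve: [round(tox_prob(x, b1), 2) for x in DOSES]
    for curve, b1 in TRUE_BETA1.items()
}

# The MTD under each curve: the dose whose DLT probability is closest to 0.30.
mtd = {
    curve: min(DOSES, key=lambda x, b1=b1: abs(tox_prob(x, b1) - TARGET))
    for curve, b1 in TRUE_BETA1.items()
}
```

Curve 1 reproduces the prior estimates exactly, which is the sense in which the prior is 'accurate' under that scenario.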

Table II. Stopping rule definitions

Rule    Description

Fixed   Stop when the sample size reaches 24
E1      Stop when width of the interval for $\beta_1$ is less than 1.0 or the sample size reaches 24
E2      Stop when width is less than 1.5 or sample size reaches 24
E3      Stop when width is less than 2.0 or sample size reaches 24
E4      Stop when width is less than 1.5 (no explicit bound on sample size)
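The five rules in Table II can be encoded as a small lookup table plus a single check. This is a sketch with hypothetical names; `width` is the approximate 95 per cent interval width for $\beta_1$ after the latest cohort and `n_treated` the number of patients treated so far.

```python
# Width threshold and sample-size bound for each rule; None means "not used".
RULES = {
    "Fixed": {"width": None, "max_n": 24},
    "E1": {"width": 1.0, "max_n": 24},
    "E2": {"width": 1.5, "max_n": 24},
    "E3": {"width": 2.0, "max_n": 24},
    "E4": {"width": 1.5, "max_n": None},
}


def should_stop(rule, width, n_treated):
    """Return True if the named rule stops the trial at this point."""
    spec = RULES[rule]
    narrow_enough = spec["width"] is not None and width < spec["width"]
    sample_exhausted = spec["max_n"] is not None and n_treated >= spec["max_n"]
    return narrow_enough or sample_exhausted
```

Because E2 and E4 share the same width threshold, they can differ only in trials where the width is still at least 1.5 after 24 patients.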


Table III. Parameter estimates, dose–response curve 1

Parameter     Truth    Stopping rule
                       Fixed    E1       E2       E3       E4

$\beta_1$     4.00     4.00     4.00     4.01     4.03     4.03
$\theta$      0.30     0.27     0.30     0.38     0.35     0.38
MTD           2.15     2.12     2.15     2.24     2.20     2.23
Sample size            24.00    23.35    16.51    13.19    16.47

Table IV. Experimentation and recommendation percentage by dose level, dose–response curve 1

                           Experimentation percentage             Recommendation percentage
Dose level                 1     2     3     4     5     6        1     2     3     4     5     6
True probability of DLT    0.05  0.10  0.20  0.30  0.50  0.70     0.05  0.10  0.20  0.30  0.50  0.70

Fixed                      0.16  0.19  0.24  0.27  0.12  0.02     0.00  0.09  0.28  0.44  0.16  0.02
E1                         0.16  0.19  0.24  0.26  0.13  0.02     0.00  0.08  0.25  0.42  0.22  0.03
E2                         0.23  0.23  0.22  0.19  0.11  0.02     0.00  0.08  0.10  0.30  0.40  0.12
E3                         0.29  0.25  0.22  0.15  0.08  0.02     0.01  0.06  0.22  0.32  0.30  0.09
E4                         0.23  0.23  0.22  0.19  0.11  0.02     0.01  0.08  0.11  0.32  0.38  0.11

4. RESULTS

4.1. Dose–response Curve 1

Table III shows the estimates of $\beta_1$, $\theta$ and MTD found using the five stopping rules under curve 1. Stopping the trial when the width is less than 1.0 or the sample size reaches 24 (E1) seems to yield the most precision. However, at an average sample size of 23.35 patients this rule does not effectively reduce the size of the trial. The average percentage of DLT with curve 1 is 21.5 per cent. Note that rule E3 stops on average after only 13.19 patients have been treated.

Table IV shows the percentage of experimentation and final recommendation at each dose level for curve 1. Since the MTD is the fourth dose, the amount of experimentation at the highest level should be minimized. All stopping rules are satisfactory in this regard. Among all rules, E3 has the least experimentation at dose level four where (under this curve) we would like to see the most experimentation. All rules finally recommend the correct dose level between 30 and 44 per cent of the time. Under rule E2, the most toxic dose is recommended 12 per cent of the time.

4.2. Dose–response Curve 2

Table V shows the parameter estimates found under curve 2, where the prior estimates of the probability of DLT are greater than the true probabilities. Stopping the trial when width is less than 1.5 (E4) reduces the average size of the trial to 20.4 patients and yields the most precise parameter estimates. The average percentage of DLT under curve 2 is only 13.3 per cent.


Table V. Parameter estimates, dose–response curve 2

Parameter     Truth    Stopping rule
                       Fixed    E1       E2       E3       E4

$\beta_1$     3.35     3.27     3.27     3.29     3.35     3.34
$\theta$      0.30     0.22     0.23     0.26     0.25     0.27
MTD           2.57     2.48     2.49     2.51     2.47     2.51
Sample size            24.00    23.92    20.00    18.29    20.40

Table VI. Experimentation and recommendation percentage by dose level, dose–response curve 2

                           Experimentation percentage             Recommendation percentage
Dose level                 1     2     3     4     5     6        1     2     3     4     5     6
True probability of DLT    0.02  0.03  0.06  0.10  0.18  0.30     0.02  0.03  0.06  0.10  0.18  0.30

Fixed                      0.13  0.13  0.13  0.16  0.19  0.25     0.00  0.00  0.01  0.07  0.25  0.66
E1                         0.13  0.13  0.13  0.16  0.19  0.25     0.00  0.00  0.01  0.06  0.26  0.67
E2                         0.16  0.16  0.16  0.16  0.17  0.19     0.00  0.00  0.01  0.04  0.16  0.79
E3                         0.18  0.17  0.17  0.16  0.14  0.17     0.00  0.01  0.04  0.08  0.19  0.68
E4                         0.16  0.16  0.16  0.16  0.17  0.20     0.00  0.00  0.01  0.04  0.16  0.79

Table VI shows the percentage of experimentation and final recommendation at each dose level for curve 2. Here the MTD is the sixth dose so most of the experimentation should be done at the highest level. Since the lowest dose is always given at the start and subsequent doses cannot jump more than one level, the experimentation percentages across all levels are fairly uniform. For most stopping rules, there is slightly more experimentation at dose level six, as desired. For all rules, the correct dose level is recommended at least 66 per cent of the time, and very rarely are the extremely low dose levels recommended.
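The near-uniform experimentation percentages can be checked with a little arithmetic. The sketch below assumes cohorts of three patients, a first cohort at dose level one, and escalation by exactly one level per cohort until level six is reached. The function name and this fastest-possible schedule are illustrative assumptions; this is not the simulation code behind Table VI.

```python
from collections import Counter


def fastest_escalation_shares(n_patients=24, cohort_size=3, n_levels=6):
    """Share of patients per dose level when every cohort escalates one level.

    Hypothetical helper: it models only the fastest schedule permitted by the
    no-skipping constraint, not the CRM dose-selection rule itself.
    """
    n_cohorts = n_patients // cohort_size
    # Cohort k (0-based) is treated at level k + 1, capped at the top level.
    levels = [min(k + 1, n_levels) for k in range(n_cohorts)]
    counts = Counter(levels)
    return [counts[level] * cohort_size / n_patients
            for level in range(1, n_levels + 1)]


shares = fastest_escalation_shares()
print(shares)  # [0.125, 0.125, 0.125, 0.125, 0.125, 0.375]
```

Even in this best case, each of levels one to five receives 12.5 per cent of the patients, in line with the 0.13 reported for the three lowest levels under the fixed rule. The simulated trials spend less than 37.5 per cent of patients at level six because they do not escalate at every step.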

4.3. Dose-response Curve 3

Table VII shows the parameter estimates found under curve 3, where the prior estimates of the probability of DLT are lower than the true probabilities. As before, on average E1 does not stop before treating 24 patients and the results for E1 are nearly the same as for those using a fixed sample size. All rules yield good estimates of b1, but for rules E2, E3 and E4 the estimates of h and MTD are somewhat high. The average percentage of DLT under curve 3 is 36.3 per cent.

Table VIII shows the percentage of experimentation and final recommendation at each dose level for curve 3. The MTD is the first dose and there is a high percentage of experimentation at this dose level. Note that no experimentation is done at the fifth or sixth dose level, and very little is done at the fourth level. For those rules which stop early, the correct dose is recommended at least 61 per cent of the time.

Table VII. Parameter estimates, dose-response curve 3

                        Stopping rule
Parameter      Truth    Fixed    E1       E2       E3       E4
b1             5.28     5.26     5.26     5.28     5.27     5.28
h              0.30     0.31     0.31     0.38     0.42     0.38
MTD            1.63     1.64     1.65     1.70     1.74     1.70
Sample size    -        24.00    23.98    13.72    9.75     13.71

Table VIII. Experimentation and recommendation percentage by dose level, dose-response curve 3

                          Experimentation percentage              Recommendation percentage
Dose level                1     2     3     4     5     6         1     2     3     4     5     6
True probability of DLT   0.30  0.53  0.77  0.87  0.95  0.98      0.30  0.53  0.77  0.87  0.95  0.98
Fixed                     0.83  0.15  0.03  0.00  0.00  0.00      0.92  0.08  0.00  0.00  0.00  0.00
E1                        0.83  0.15  0.03  0.00  0.00  0.00      0.92  0.08  0.00  0.00  0.00  0.00
E2                        0.75  0.20  0.05  0.01  0.00  0.00      0.66  0.31  0.03  0.01  0.00  0.00
E3                        0.75  0.19  0.06  0.00  0.00  0.00      0.61  0.22  0.17  0.01  0.00  0.00
E4                        0.76  0.19  0.05  0.00  0.00  0.00      0.67  0.29  0.03  0.01  0.00  0.00

4.4. General Results

For all three curves, there is little difference between rules E2 and E4. Recall that both rules stop early if the width is less than 1.5; however, E2 can also stop if the sample size reaches 24, while E4 has no explicit bound on sample size. An explanation for their similar results can be found by examining the percentage of trials that actually reach a sample size of 24. For curves 1 and 3, less than 3 per cent of trials reach 24 patients. Therefore, imposing a limit of 24 patients differs little from imposing no bound on sample size. For curve 2, in which the sixth dose is the true MTD, 21.7 per cent of the trials reach 24 patients. This is most likely due to the fact that each trial starts out at level one, and the dose cannot jump more than one level between cohorts of patients, forcing experimentation at each dose level.
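The four early-stopping rules compared in this section can be written as a single predicate. The following is a minimal sketch using the width thresholds and sample-size caps quoted in the text; the dictionary layout and the function name `should_stop` are illustrative assumptions, and the interval width is taken as an input rather than computed from a posterior.

```python
import math

# Width thresholds and sample-size caps as described in the text;
# math.inf encodes "no explicit bound on sample size" for rule E4.
STOPPING_RULES = {
    "E1": {"max_width": 1.0, "max_n": 24},
    "E2": {"max_width": 1.5, "max_n": 24},
    "E3": {"max_width": 2.0, "max_n": 24},
    "E4": {"max_width": 1.5, "max_n": math.inf},
}


def should_stop(rule, interval_width, n_treated):
    """Return True if the trial stops under the named rule (hypothetical helper)."""
    spec = STOPPING_RULES[rule]
    return interval_width < spec["max_width"] or n_treated >= spec["max_n"]


print(should_stop("E2", 1.7, 24))  # True: the cap of 24 patients is reached
print(should_stop("E4", 1.7, 24))  # False: only the width criterion applies
```

Rules E2 and E4 give different answers only when a trial reaches 24 patients with an interval that is still at least 1.5 wide, which is why their results coincide whenever few trials reach that size.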

Tables IX and X show the bias and MSE of the estimates, and associated standard errors. For b1, h and MTD, the standard errors of bias and MSE are extremely small. In Table IX, which shows the bias of b1, we see that for each curve, at least one of the trials which stopped early resulted in bias at least as small as that in the trial with a fixed sample size. This is also generally true for the bias of h and the bias of MTD for curves 1 and 2, seen in Table X. The fixed sample size rules occasionally offer better MSE behaviour, but not always (for example, for curve 2, where rule E4 uniformly dominates the fixed rule).
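For readers who wish to reproduce summaries of this kind, bias and MSE can be estimated from simulated trials as below. This section does not state how the standard errors were computed, so the sketch assumes the usual Monte Carlo estimator (sample standard deviation divided by the square root of the number of simulated trials). The function name and the four input values are made up for illustration.

```python
import math
import statistics


def bias_and_mse(estimates, truth):
    """Monte Carlo bias and MSE of an estimator, each with a standard error."""
    errors = [est - truth for est in estimates]
    sq_errors = [err ** 2 for err in errors]
    n = len(errors)
    return {
        "bias": statistics.fmean(errors),
        "mse": statistics.fmean(sq_errors),
        # Standard error of a Monte Carlo mean: sample SD / sqrt(n).
        "bias_se": statistics.stdev(errors) / math.sqrt(n),
        "mse_se": statistics.stdev(sq_errors) / math.sqrt(n),
    }


# Made-up estimates from four hypothetical simulated trials, truth = 4.0.
result = bias_and_mse([3.9, 4.1, 4.2, 3.8], truth=4.0)
print(result)
```

With many simulated trials the standard errors shrink at rate one over the square root of the number of trials, which is consistent with the very small values reported in the tables.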

Table IX. Estimated bias and MSE of b1 ± estimated standard error

Curve     Stopping rule   Bias              MSE
Curve 1   Fixed           -0.001 ± 0.003    0.068 ± 0.001
          E1               0.000 ± 0.003    0.069 ± 0.001
          E2               0.013 ± 0.003    0.105 ± 0.002
          E3               0.031 ± 0.003    0.142 ± 0.002
          E4               0.027 ± 0.003    0.108 ± 0.002
Curve 2   Fixed           -0.085 ± 0.004    0.166 ± 0.005
          E1              -0.084 ± 0.004    0.166 ± 0.005
          E2              -0.057 ± 0.004    0.185 ± 0.005
          E3               0.004 ± 0.005    0.223 ± 0.005
          E4              -0.009 ± 0.003    0.106 ± 0.002
Curve 3   Fixed           -0.024 ± 0.003    0.066 ± 0.001
          E1              -0.024 ± 0.003    0.067 ± 0.001
          E2              -0.003 ± 0.003    0.116 ± 0.002
          E3              -0.005 ± 0.004    0.169 ± 0.003
          E4               0.003 ± 0.003    0.121 ± 0.002

Table X. Estimated bias and MSE of h and MTD ± estimated standard error

                          h                                   MTD
Curve     Stopping rule   Bias              MSE               Bias              MSE
Curve 1   Fixed           -0.026 ± 0.001    0.005 ± 0.000     -0.028 ± 0.003    0.027 ± 0.000
          E1              -0.005 ± 0.001    0.006 ± 0.000     -0.003 ± 0.003    0.030 ± 0.000
          E2               0.078 ± 0.001    0.016 ± 0.000      0.089 ± 0.003    0.050 ± 0.001
          E3               0.049 ± 0.001    0.013 ± 0.000      0.049 ± 0.004    0.042 ± 0.001
          E4               0.078 ± 0.001    0.016 ± 0.000      0.082 ± 0.003    0.049 ± 0.001
Curve 2   Fixed           -0.077 ± 0.001    0.015 ± 0.000     -0.090 ± 0.004    0.028 ± 0.001
          E1              -0.073 ± 0.001    0.015 ± 0.000     -0.085 ± 0.004    0.025 ± 0.000
          E2              -0.042 ± 0.001    0.015 ± 0.000     -0.057 ± 0.004    0.018 ± 0.001
          E3              -0.048 ± 0.001    0.013 ± 0.000     -0.102 ± 0.005    0.039 ± 0.001
          E4              -0.034 ± 0.001    0.012 ± 0.000     -0.057 ± 0.003    0.018 ± 0.001
Curve 3   Fixed            0.013 ± 0.001    0.007 ± 0.000      0.015 ± 0.003    0.003 ± 0.000
          E1               0.013 ± 0.001    0.007 ± 0.000      0.015 ± 0.003    0.003 ± 0.000
          E2               0.081 ± 0.001    0.017 ± 0.000      0.070 ± 0.003    0.016 ± 0.000
          E3               0.125 ± 0.001    0.034 ± 0.000      0.109 ± 0.004    0.035 ± 0.001
          E4               0.082 ± 0.001    0.018 ± 0.000      0.068 ± 0.004    0.016 ± 0.000

5. DISCUSSION

Despite its known deficiencies, the standard phase I design is still widely used. It has been adequate because of the perception that phase I trials do not require a great deal of statistical analysis. However, the need to gather precise dose-toxicity information in an efficient and ethical manner has prompted the development of a number of alternative designs. This paper has reviewed and extended an alternative design, the continual reassessment method, by allowing the trial to stop when the Bayesian posterior probability interval for the dosing parameter b1 is sufficiently narrow; see O'Quigley and Reiner¹³ for a different approach to the stopping rule.

Previous studies have shown that a modified CRM treating three patients per cohort does as well as the standard method in terms of DLT and estimation of the MTD. A primary outcome of our simulation study is showing that this continues to be true if we allow the trial to stop early once the width of the Bayesian posterior probability interval becomes sufficiently small. In general, stopping the trial when the width was less than 1.00 or the sample size reached 24 (E1) resulted in estimates as good as those found using a fixed sample size of 24. However, for all three curves, an average sample size of nearly 24 was required with this rule, making it not much different from the fixed sample size trial. This is somewhat surprising for curve 3 since the trial always started out at the dose it should ultimately recommend.

Table XI. Parameter estimates, dose-response curve 1, unconstrained version

                        Stopping rule
Parameter      Truth    Fixed    E1       E2       E3       E4
b1             4.00     4.03     4.01     4.02     4.09     4.03
h              0.30     0.27     0.28     0.34     0.46     0.34
MTD            2.15     2.10     2.13     2.19     2.29     2.19
Sample size    -        24.00    21.08    12.60    8.49     12.62

Stopping the trial when the width was less than 1.5 or the sample size reached 24 (E2) resulted in better estimates than the rules E1 and E3 for curve 2, and reduced the sample size by 4 patients. However, for the other curves, this rule resulted in slightly higher estimates of the MTD and the probability of DLT. Rule E4, using the same width criterion but no explicit bound on sample size, resulted in estimates nearly identical to those found with E2. Overall, stopping the trial when the width was less than 2.00 or the sample size reached 24 (E3) resulted in much smaller sample size but at the price of somewhat higher estimates of the probability of DLT.

Our results do of course depend somewhat on the underlying dose-toxicity curve. No stopping rule was consistently better than the others, but it seems that the most satisfactory results hovered somewhere around rule E1 (width of 1.00) and E2 (width of 1.5). It may be useful to investigate an intermediate rule, such as stopping the trial when the width is less than 1.25. In any case, our results illustrate once again the classic trade-off between early stopping and accurate final estimation.

As alluded to earlier, it is interesting to consider what happens when our logistic version of the CRM is implemented without the constraint that no dose levels may be skipped. Repeating our simulations without such constraints, Table XI shows the resulting estimates of b1, h, MTD and average sample size, while Table XII shows the percentage of experimentation and percentage of ultimate recommendation at each dose level under curve 1. Comparing Tables XI and III, we see that removing the constraints allows the trial to progress faster under stopping rules E2-E4, saving roughly 3-4 patients (a little more than one cohort) on average. However, Table XII reveals the 'dose-jumping' by these rules, a non-monotonicity not found in the corresponding constrained version (Table IV). Particularly embarrassing is the Table XII behaviour of rule E3, which spends little time experimenting at dose level 4, the correct dose for this curve (ultimately recommending it just 6 per cent of the time), or at either of the next-closest dose levels (3 and 5). Apparently this rule permits stopping that is so early that the induced posterior for the MTD is still quite overdispersed relative to our dosing grid (5). Rules E2 and E4 exhibit similar, though less extreme, behaviour; in fact, the correct dose was recommended less than 40 per cent of the time for this curve among all rules which used the early stopping criterion.


Table XII. Experimentation and recommendation percentage by dose level, dose-response curve 1, unconstrained version

                          Experimentation percentage              Recommendation percentage
Dose level                1     2     3     4     5     6         1     2     3     4     5     6
True probability of DLT   0.05  0.10  0.20  0.30  0.50  0.70      0.05  0.10  0.20  0.30  0.50  0.70
Fixed                     0.16  0.13  0.18  0.31  0.09  0.15      0.00  0.10  0.29  0.49  0.10  0.00
E1                        0.18  0.13  0.16  0.27  0.08  0.18      0.00  0.10  0.26  0.44  0.17  0.03
E2                        0.29  0.14  0.09  0.16  0.04  0.30      0.01  0.10  0.34  0.19  0.15  0.22
E3                        0.43  0.13  0.00  0.02  0.00  0.42      0.01  0.32  0.02  0.06  0.00  0.60
E4                        0.29  0.14  0.09  0.16  0.03  0.29      0.01  0.10  0.35  0.19  0.14  0.21
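The concentration of experimentation away from the MTD in the unconstrained version can be made concrete by summing the Table XII experimentation proportions over dose levels 3 to 5 (the MTD and its two neighbours). This is a small illustrative calculation on the published table values; the helper name `share_near_mtd` is an assumption, not something defined in the paper.

```python
# Experimentation proportions by dose level (1-6), copied from Table XII.
EXPERIMENTATION = {
    "Fixed": [0.16, 0.13, 0.18, 0.31, 0.09, 0.15],
    "E1":    [0.18, 0.13, 0.16, 0.27, 0.08, 0.18],
    "E2":    [0.29, 0.14, 0.09, 0.16, 0.04, 0.30],
    "E3":    [0.43, 0.13, 0.00, 0.02, 0.00, 0.42],
    "E4":    [0.29, 0.14, 0.09, 0.16, 0.03, 0.29],
}


def share_near_mtd(proportions, mtd_level=4, radius=1):
    """Sum the proportions at the MTD level and its neighbours within `radius`."""
    lo = max(mtd_level - radius, 1)
    hi = min(mtd_level + radius, len(proportions))
    # Dose levels are 1-based, list indices are 0-based.
    return round(sum(proportions[lo - 1:hi]), 2)


near = {rule: share_near_mtd(p) for rule, p in EXPERIMENTATION.items()}
print(near)
```

The sums are 0.58 for the fixed rule, 0.51 for E1, 0.29 for E2, 0.28 for E4 and only 0.02 for E3, so under E3 almost all patients are treated at the two lowest or the highest dose level.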

On the other hand, under curve 2 (where the MTD is the sixth dose), it may actually be better to allow the dose to jump more than one level, since restricting the dose in this setting leads to significant delays in stopping the trial. For instance, the average sample size for rule E3 in this case is just 10.63 for the unconstrained version (compared to 18.29 for the usual constrained approach), while delivering a recommendation for the correct dose level an impressive 95.4 per cent of the time (compared to 67.8 per cent under the constrained version). On balance, however, it seems the added element of risk in the unconstrained CRM would likely render it unacceptable to many clinicians.

Finally, several extensions to our logistic model (4) could be envisioned. For instance, if sicker patients were more likely to be enrolled at the beginning of the trial, this could bias the results unless explicitly accounted for using a baseline health status covariate in the model. Alternatively, in oncology it might be the case that information on genetic protection were available from urine or serum samples. If y_l (= 0, 1) denoted the absence or presence of such protection in patient l, then a suitable modified model might be

$$t_{b_1, b_2, l}(x) = \frac{1}{1 + \exp[b_0 - (b_1 + b_2 y_l)x]}$$

with both b1 and b2 now estimated from the data. In this way, the dose-response curve could adapt to the genetic protection status of each patient.
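To see how the covariate shifts the curve, the modified model above can be evaluated directly and inverted for the dose that attains a target DLT probability. The following is a sketch only: the function names and all parameter values are hypothetical, and the negative value chosen for b2 (so that protected patients tolerate a higher dose) is an assumption for illustration, not an estimate from the paper.

```python
import math


def dlt_probability(x, b0, b1, b2=0.0, y=0):
    """P(DLT) at dose x: 1 / (1 + exp[b0 - (b1 + b2*y) * x])."""
    return 1.0 / (1.0 + math.exp(b0 - (b1 + b2 * y) * x))


def mtd(target, b0, b1, b2=0.0, y=0):
    """Dose at which dlt_probability equals `target` (inverse of the curve)."""
    logit = math.log(target / (1.0 - target))
    return (b0 + logit) / (b1 + b2 * y)


# Hypothetical parameter values, for illustration only.
b0, b1, b2 = 9.0, 4.0, -1.0
dose_unprotected = mtd(0.30, b0, b1, b2, y=0)
dose_protected = mtd(0.30, b0, b1, b2, y=1)
print(dose_unprotected, dose_protected)
```

Setting b2 to zero recovers a single curve for all patients; with a nonzero b2 the two patient groups have different slopes and therefore different doses at the same target probability.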

ACKNOWLEDGEMENTS

The research of the second author was supported in part by the National Institute of Allergy and Infectious Diseases (NIAID), grant 1-R01-AI41966. Much of this work was carried out while the first author was a graduate student in the Division of Biostatistics at the University of Minnesota. The authors are grateful to Drs. Jeff Sloan and Vera Suman, who supervised the initiation of this project during the first author's summer internship at the Mayo Clinic in Rochester, MN.

REFERENCES

1. Mick, R. 'Phase I clinical trial design', in Schilsky, R. L., Milano, G. A. and Ratain, M. J. (eds), Principles of Antineoplastic Drug Development and Pharmacology, Marcel Dekker, New York, 1996, pp. 29-36.

2. Ratain, M. J., Mick, R., Schilsky, R. and Siegler, M. 'Statistical and ethical issues in the design and conduct of Phase I and II clinical trials of new anticancer agents', Journal of the National Cancer Institute, 85, 1637-1643 (1993).

3. Mick, R. and Ratain, M. J. 'Model-guided determination of maximum tolerated dose in Phase I clinical trials: evidence for increased precision', Journal of the National Cancer Institute, 85, 217-223 (1993).

4. O'Quigley, J., Pepe, M. and Fisher, L. 'Continual reassessment method: a practical design for phase I clinical trials in cancer', Biometrics, 46, 33-48 (1990).

5. Korn, E. L., Midthune, D., Chen, T. T., Rubenstein, L. V., Christian, M. C. and Simon, R. M. 'A comparison of two phase I trial designs', Statistics in Medicine, 13, 1799-1806 (1994).

6. Goodman, S. N., Zahurak, M. L. and Piantadosi, S. 'Some practical improvements in the continual reassessment method for phase I studies', Statistics in Medicine, 14, 1149-1161 (1995).

7. Piantadosi, S. and Liu, G. 'Improved designs for dose escalation studies using pharmacokinetic measurements', Statistics in Medicine, 15, 1605-1618 (1996).

8. O'Quigley, J. and Shen, L. Z. 'Continual reassessment method: a likelihood approach', Biometrics, 52, 673-684 (1996).

9. Carlin, B. P. and Louis, T. A. Bayes and Empirical Bayes Methods for Data Analysis, Chapman and Hall, London, 1996.

10. O'Quigley, J. and Chevret, S. 'Methods for dose finding studies in cancer clinical trials: a review and results of a Monte Carlo study', Statistics in Medicine, 10, 1647-1664 (1991).

11. Chevret, S. 'The continual reassessment method in cancer phase I clinical trials: a simulation study', Statistics in Medicine, 12, 1093-1108 (1993).

12. Shen, L. Z. and O'Quigley, J. 'Consistency of continual reassessment method under model misspecification', Biometrika, 83, 395-405 (1996).

13. O'Quigley, J. and Reiner, E. 'A stopping rule for the continual reassessment method', Biometrika, 85, 741-748 (1998).
