Sample size and power calculations in repeated measurement analysis


Computer Methods and Programs in Biomedicine 64 (2001) 121–124

Sample size and power calculations in repeated measurement analysis

Chul Ahn *, John E. Overall, Scott Tonidandel

University of Texas Health Science Center at Houston, 6431 Fannin St., MSB 1.112, Houston, TX 77030, USA

* Corresponding author.

Received 8 November 1999; received in revised form 3 March 2000; accepted 20 April 2000

Abstract

Controlled clinical trials in neuropsychopharmacology, as in numerous other clinical research domains, tend to employ a conventional parallel-groups design with repeated measurements. The hypothesis of primary interest in the relatively short-term, double-blind trials concerns the difference between patterns or magnitudes of change from baseline. A simple two-stage approach to the analysis of such data involves calculation of an index or coefficient of change in stage 1 and testing the significance of the difference between group means on the derived measure of change in stage 2. This article has the aim of introducing formulas and a computer program for sample size and/or power calculations for such two-stage analyses involving each of three definitions of change, with or without baseline scores entered as a covariate, in the presence of homogeneous or heterogeneous (autoregressive) patterns of correlation among the repeated measurements. Empirical adjustments of sample size for the projected dropout rates are also provided in the computer program. © 2001 Elsevier Science Ireland Ltd. All rights reserved.

Keywords: Repeated measures; Sample size estimate; Power calculation; Dropouts

www.elsevier.com/locate/cmpb

1. Introduction

Controlled clinical trials tend to employ a parallel-groups repeated measurements design in which individuals are randomly assigned between treatment groups, evaluated at baseline, and then evaluated at intervals across a treatment period of fixed total duration. The repeated measurements are usually equally spaced, although not necessarily so. The hypothesis of primary interest in short-term efficacy trials of perhaps 6–8 weeks duration concerns the difference between treatment groups in patterns and magnitudes of change from baseline. The analysis of data from the randomized, parallel-groups design often focuses on the difference between experimental and control groups in the average rate of change as represented by slopes of regression lines fitted to the mean response patterns.

In several recent articles, we have examined the actual type I error and power provided by alternative general linear mixed model formulations, including procedures that utilize maximum likelihood and related solutions to model random effects and error covariance structures of the repeated measurements [1,2]. The purpose of this article is to make available a computer program for calculation of sample sizes and power for this most common clinical trials design, based on adaptation of more general equations previously provided by [3] and elaborated computationally for a two-stage generalized least squares solution by [2].

For sample size and power calculations, the computer program to be described considers a two-stage analysis of the repeated measurements in which an index or coefficient of change is calculated for each individual in stage 1, and the significance of the difference between group means on the derived measure of change is evaluated against the within-groups variability of that measure in stage 2 using analysis of variance (ANOVA) or analysis of covariance (ANCOVA) methods. Stage 1 of the analysis can specify three different definitions of the subject-specific rate of change, based on endpoint change, ordinary least squares (OLS) regression, or generalized least squares (GLS) regression. Each provides an estimate of the slope of a regression line relating the repeated measurements to the corresponding assessment times. For the intent-to-treat analysis, the number of available repeated measurements to which the regression model is fitted is permitted to differ between dropouts and completers.

Dropouts tend to attenuate the power of tests for evaluating differences in patterns of change across time in a repeated measurements design. Simulation methods have been used to examine the attenuation of power due to dropouts, and the conclusion was that the common practice of increasing the 'dropout-free' sample size by the anticipated number of dropouts is a useful rule-of-thumb [4]. Such dropout-adjusted sample sizes are thus also provided by the computer program. It first calculates sample sizes and/or power, with or without the baseline value entered as a covariate, using all three definitions of change mentioned above, and then it also provides adjusted sample size estimates aimed at maintaining the same desired power for an intent-to-treat analysis that includes data for an anticipated 20 or 30% dropouts.
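As a rough illustration of this kind of dropout adjustment, the sketch below inflates a 'dropout-free' per-group sample size by 1/(1 − dropout rate), which is one common reading of the rule-of-thumb; the function name is hypothetical and this is not necessarily the exact adjustment implemented in POWER.EXE.

```python
import math

def inflate_for_dropouts(n_per_group: int, dropout_rate: float) -> int:
    """Inflate a 'dropout-free' per-group n so that, after the anticipated
    proportion of dropouts, roughly n_per_group completers remain.

    Uses n / (1 - d); the exact adjustment in POWER.EXE may differ.
    """
    if not 0.0 <= dropout_rate < 1.0:
        raise ValueError("dropout_rate must be in [0, 1)")
    return math.ceil(n_per_group / (1.0 - dropout_rate))

# Example: 64 completers needed per group, 20% anticipated dropouts -> enroll 80 per group.
print(inflate_for_dropouts(64, 0.20))
```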

2. Sample size and power calculations

The power of a test of significance for the difference between means of a normally distributed response measure in two treatment groups can be obtained as the area that lies beyond a critical $Z_\beta$ value in the unit normal curve. The required sample size can be calculated directly from the power calculation formula, and the type I error can be calculated by setting the treatment effect size $D$ equal to zero in the power calculation formula. The formulas presented here for sample size and power calculations are simplified versions of those in [3], which presents equations for calculating $Z_\beta$ for tests of the 'equal change hypothesis' using simple endpoint change or ordinary least squares (OLS) regression slopes as dependent variables. An equation for calculating power for tests on generalized least squares (GLS) regression slopes results from substituting $x' = z'C^{-1}$ into the calculation of the effect size $D$. The general power equation is of the form

$$Z_\beta = D\sqrt{\frac{n}{2}} - Z_\alpha$$

where $Z_\beta$ is the critical Z-score delineating an area under the unit normal curve equal to power, $Z_\alpha$ is the critical value corresponding to the desired one-sided or two-sided alpha level, $n$ is the sample size per group, and $D$ is the effect size, which depends on the particular definition of change:

$$D = \frac{|x'd|}{\sigma\sqrt{x'Cx}}$$

where $x'$ is the contrast vector for the endpoint, OLS, or GLS analysis, $d$ is the vector of differences between group means, $C$ is the within-groups correlation matrix for the repeated measurements, and $\sigma$ is the within-groups standard deviation, which is assumed to be constant across the repeated measurements. It is common to covary baseline scores for the simple endpoint analysis, which involves adjustment of the error variance by the squared correlation $\rho^2$ between baseline and endpoint in calculating $D$. The following simplified equations for calculating $Z_\beta$ are obtained by considering the different $x'$ linear contrasts for endpoint, OLS, or GLS analysis.
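The general relationship can be sketched in a few lines of Python. This is a minimal illustration rather than the published program: the function names are mine, and the default two-sided handling of the alpha level is an assumption.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # unit normal curve

def power_from_n(D: float, n_per_group: float, alpha: float = 0.05,
                 two_sided: bool = True) -> float:
    """Power = area below Z_beta, with Z_beta = D*sqrt(n/2) - Z_alpha."""
    z_alpha = _STD_NORMAL.inv_cdf((1 - alpha / 2) if two_sided else (1 - alpha))
    z_beta = abs(D) * math.sqrt(n_per_group / 2.0) - z_alpha
    return _STD_NORMAL.cdf(z_beta)

def n_from_power(D: float, power: float = 0.80, alpha: float = 0.05,
                 two_sided: bool = True) -> int:
    """Per-group n obtained by solving Z_beta = D*sqrt(n/2) - Z_alpha for n."""
    z_alpha = _STD_NORMAL.inv_cdf((1 - alpha / 2) if two_sided else (1 - alpha))
    z_beta = _STD_NORMAL.inv_cdf(power)
    return math.ceil(2.0 * ((z_alpha + z_beta) / abs(D)) ** 2)
```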

Endpoint analysis without baseline covaried:

$$Z_\beta = \frac{|\mu_{1t} - \mu_{2t}|}{\sigma_w}\sqrt{\frac{n}{4(1-\rho)}} - Z_\alpha$$

where $\mu_{1t} - \mu_{2t}$ is the difference between endpoint means only, assuming that the expected value of the baseline difference in the two treatment groups is zero due to randomization, and $\rho$ is the within-groups correlation between baseline and endpoint scores.

Endpoint analysis with baseline covaried:

$$Z_\beta = \frac{|\mu_{1t} - \mu_{2t}|}{\sigma_w}\sqrt{\frac{n}{2(1-\rho^2)}} - Z_\alpha$$
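For concreteness, the two endpoint equations can be inverted for $n$ as in the hypothetical helper below; the two-sided alpha of 0.05 is an assumption rather than something fixed by the text.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()

def n_endpoint(delta: float, sigma_w: float, rho: float, power: float = 0.80,
               alpha: float = 0.05, baseline_covaried: bool = False) -> int:
    """Per-group n for the endpoint-change analysis.

    delta   : postulated endpoint mean difference |mu_1t - mu_2t|
    sigma_w : within-groups SD (assumed constant over time)
    rho     : within-groups baseline-to-endpoint correlation
    """
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_beta = _STD_NORMAL.inv_cdf(power)
    scale = 2 * (1 - rho ** 2) if baseline_covaried else 4 * (1 - rho)
    # Invert Z_beta = (delta / sigma_w) * sqrt(n / scale) - Z_alpha for n.
    return math.ceil(scale * ((z_alpha + z_beta) * sigma_w / delta) ** 2)

# Example: half-SD endpoint difference, rho = 0.5, 80% power.
print(n_endpoint(delta=0.5, sigma_w=1.0, rho=0.5))
print(n_endpoint(delta=0.5, sigma_w=1.0, rho=0.5, baseline_covaried=True))
```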

OLS slope difference without baseline covaried:

$$Z_\beta = \frac{|x'd|}{\sigma_w}\sqrt{\frac{n}{2\,(x'Cx)}} - Z_\alpha$$

where $x'$ is the mean-corrected vector of linearly increasing time coefficients, $x' = [-4, -3, -2, -1, 0, 1, 2, 3, 4]$ for a total of nine measurements, $d$ is the mean-corrected vector of postulated linearly increasing differences between group means, and $C$ is the within-groups correlation matrix among the repeated measurements.

OLS with baseline covaried:

$$Z_\beta = \frac{|x'd|}{\sigma_w}\sqrt{\frac{n}{2(1-\rho_b^2)(x'Cx)}} - Z_\alpha$$

where $\rho_b = x'Cx_{(1)}/(x'Cx)^{1/2}$ is the correlation between baseline scores and the time-weighted combination of repeated measurements defined by $x' = [-4, -3, -2, -1, 0, 1, 2, 3, 4]$, with $x'_{(1)} = [1, 0, 0, 0, 0, 0, 0, 0, 0]$ for a total of nine measurements.
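The OLS quantities are simple quadratic forms, so the calculation can be sketched with NumPy. The AR(1) constructor and function names below are illustrative and not part of POWER.EXE, and the two-sided alpha is an assumption.

```python
import math
import numpy as np
from statistics import NormalDist

_STD_NORMAL = NormalDist()

def ar1_corr(k: int, gamma: float) -> np.ndarray:
    """k x k AR(1) correlation matrix with lag-1 correlation gamma."""
    idx = np.arange(k)
    return gamma ** np.abs(idx[:, None] - idx[None, :])

def ols_power(d: np.ndarray, C: np.ndarray, sigma_w: float, n: int,
              alpha: float = 0.05, baseline_covaried: bool = False) -> float:
    """Power for the OLS slope-difference test, per the equations above."""
    k = len(d)
    x = np.arange(k) - (k - 1) / 2.0               # mean-corrected time codes, e.g. [-4,...,4]
    xCx = float(x @ C @ x)
    denom = 2.0 * xCx
    if baseline_covaried:
        rho_b = float(x @ C[:, 0]) / math.sqrt(xCx)  # x'C x_(1) / sqrt(x'Cx)
        denom *= (1 - rho_b ** 2)
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_beta = abs(float(x @ d)) / sigma_w * math.sqrt(n / denom) - z_alpha
    return _STD_NORMAL.cdf(z_beta)
```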

GLS slope differences without baseline covaried:

$$Z_\beta = \frac{|z'C^{-1}d|}{\sigma_w}\sqrt{\frac{n}{2\,(z'C^{-1}z)}} - Z_\alpha$$

where $z'$ is the vector of linearly increasing time coefficients as used for the OLS calculations, $z' = [-4, -3, -2, -1, 0, 1, 2, 3, 4]$ for a total of nine measurements, and $C^{-1}$ is the inverse of the within-groups correlation matrix or a model thereof.

GLS slope differences with baseline covaried:

$$Z_\beta = \frac{|z'C^{-1}d|}{\sigma_w}\sqrt{\frac{n}{2(1-\rho_c^2)(z'C^{-1}z)}} - Z_\alpha$$

where $\rho_c = z'z_{(1)}/(z'C^{-1}z)^{1/2}$ is the correlation between baseline scores and the transformed GLS definition of change, with $z' = [-4, -3, -2, -1, 0, 1, 2, 3, 4]$ and $z'_{(1)} = [1, 0, 0, 0, 0, 0, 0, 0, 0]$ for a total of nine measurements.
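The GLS version differs only in premultiplying by the inverse correlation matrix; a parallel sketch, with the same caveats as the OLS sketch above (illustrative names, assumed two-sided alpha), is:

```python
import math
import numpy as np
from statistics import NormalDist

_STD_NORMAL = NormalDist()

def gls_power(d: np.ndarray, C: np.ndarray, sigma_w: float, n: int,
              alpha: float = 0.05, baseline_covaried: bool = False) -> float:
    """Power for the GLS slope-difference test, per the equations above."""
    k = len(d)
    z = np.arange(k) - (k - 1) / 2.0          # same linear time codes as for OLS
    Cinv = np.linalg.inv(C)                   # inverse correlation matrix (or model thereof)
    zCiz = float(z @ Cinv @ z)
    denom = 2.0 * zCiz
    if baseline_covaried:
        rho_c = z[0] / math.sqrt(zCiz)        # z'z_(1) / sqrt(z'C^-1 z); z_(1) picks the baseline
        denom *= (1 - rho_c ** 2)
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_beta = abs(float(z @ Cinv @ d)) / sigma_w * math.sqrt(n / denom) - z_alpha
    return _STD_NORMAL.cdf(float(z_beta))
```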

3. Program description

Power depends on the pattern of treatment effects across time, the within-groups variance which is assumed to be constant across the different time points, the number of equally-spaced repeated measurements, and the level and pattern of their intercorrelations, as well as on parameters common to estimation of sample sizes for simple randomized designs. Herein we describe the requirements for use of a computer program called POWER.EXE, which has been written in Microsoft Fortran, to perform sample size and power calculations for comparing differences in rates of change from baseline produced by two treatments in a controlled clinical trial. The interactive program is easily implemented. It calculates: (1) the required sample size per group for a desired level of power at a specified alpha level, given a specified minimum (or meaningful) endpoint treatment difference; or (2) the statistical power against a specified minimum (or meaningful) endpoint treatment difference, given a specified alpha level and sample size per group.

To implement the program POWER.EXE, simply type POWER at the DOS prompt. The program asks whether you want to compute the required sample size per treatment group (two groups) or the statistical power provided by a specified sample size. The program next asks whether you want to model the correlational structure as autoregressive (order 1) or compound symmetry. The program then asks the user to specify the desired minimal detectable endpoint mean difference and the within-groups standard deviation.

Limitations of the program include the fact that it calculates sample size and power for a two-group design only, considers fitting linear equations to the patterns of change across equally-spaced repeated measurements, and assumes that any dropouts will tend to be relatively uniformly distributed across time in the repeated measurements design.

4. Example

Schizophrenia symptomatology is assessed using the total score from the Brief Psychiatric Rating Scale (BPRS). Scores have a potential range from 0 to 108, with higher scores indicating more severe pathology. Assessments are carried out at baseline (week zero) and at weeks 1, 2, 3, 4, 5, and 6. The primary objective of the study is to compare gradients of change in BPRS total scores from baseline to week 6 for placebo and a new therapy for schizophrenia. The investigator wants to estimate the sample size needed to detect a treatment effect of medium magnitude with 80% power. Cohen [5] (p. 26) has characterized a difference of one-half standard deviation as a treatment effect of 'medium magnitude.' The within-groups correlation structure of the repeated measurements is assumed to conform to an autoregressive AR(1) pattern with the (population) baseline-to-endpoint correlation equal to 0.5. Table 1 shows the results from using the computer program POWER.EXE to estimate the required sample size.

It can be appreciated that power and sample size for the GLS solution approach those for the simple endpoint difference-score analysis as the correlation structure of the repeated measurements approaches a true AR(1) pattern and the autoregressive parameter γ approaches 1. The GLS solution approaches the OLS definition of change when the correlation structure approaches compound symmetry. The endpoint analysis, and consequently GLS regression, have superior power and thus require smaller sample sizes than OLS regression in the presence of an autoregressive correlation structure. The OLS and GLS regression provide superior power in the presence of uniform correlation. An unpublished parametric study examined the sample size and power requirements for various intermediate levels of serial dependence, and the conclusion followed that it is better to use the autoregressive (AR(1)) option in all cases where any meaningful degree of serial dependence among the repeated measurements is present. This is also a conservative approach, as confirmed by the larger sample sizes that are required for desired power under autoregressive conditions.
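The example can be worked through approximately with the formulas above. The sketch below assumes a two-sided alpha of 0.05, a linearly increasing pattern of group differences reaching one-half SD at week 6, and an AR(1) lag-1 correlation chosen so that the baseline-to-week-6 correlation equals 0.5; because those details are assumptions, the resulting numbers illustrate the calculation rather than reproduce Table 1.

```python
import math
import numpy as np
from statistics import NormalDist

_STD_NORMAL = NormalDist()
alpha, power = 0.05, 0.80
z_a, z_b = _STD_NORMAL.inv_cdf(1 - alpha / 2), _STD_NORMAL.inv_cdf(power)

k = 7                                             # baseline plus weeks 1-6
gamma = 0.5 ** (1 / 6)                            # lag-1 value giving corr(baseline, week 6) = 0.5
idx = np.arange(k)
C = gamma ** np.abs(idx[:, None] - idx[None, :])  # AR(1) correlation matrix
Cinv = np.linalg.inv(C)

sigma_w = 1.0                                     # work in SD units
delta = 0.5 * sigma_w                             # 'medium' endpoint difference (Cohen)
d = delta * idx / (k - 1)                         # assumed linear pattern of group differences
x = idx - idx.mean()                              # mean-corrected time codes

def n_per_group(numer: float, denom: float) -> int:
    """Invert Z_beta = (numer / sigma_w) * sqrt(n / denom) - Z_alpha for n."""
    return math.ceil(denom * ((z_a + z_b) * sigma_w / numer) ** 2)

rho = C[0, -1]                                    # baseline-to-endpoint correlation (0.5)
print("endpoint, no covariate:", n_per_group(delta, 4 * (1 - rho)))
print("endpoint, covaried    :", n_per_group(delta, 2 * (1 - rho ** 2)))
print("OLS, no covariate     :", n_per_group(abs(float(x @ d)), 2 * float(x @ C @ x)))
print("GLS, no covariate     :", n_per_group(abs(float(x @ Cinv @ d)), 2 * float(x @ Cinv @ x)))
```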

Acknowledgements

This work was supported in part by NIH grants MH32457 and MO1RR02588.

References

[1] C. Ahn, S. Tonidandel, J.E. Overall, Issues in use of SAS PROC MIXED to test the significance of treatment effects in controlled clinical trials, J. Biopharm. Stat. 10 (2000) 265–286.

[2] J.E. Overall, C. Ahn, C. Shivakumar, Y. Kalburgi, Problematic formulations of SAS PROC MIXED models for repeated measurements, J. Biopharm. Stat. 9 (1999) 189–216.

[3] J.E. Overall, S.R. Doyle, Estimating sample sizes for repeated measurements designs, Control. Clin. Trials 15 (1994) 100–123.

[4] J.E. Overall, G. Shobaki, C. Shivakumar, J. Steele, Adjusting sample size for anticipated dropouts in clinical trials, Psychopharmacol. Bull. 34 (1998) 25–33.

[5] J. Cohen, Statistical Power Analysis for the Behavioral Sciences, 2nd ed., Lawrence Erlbaum, Hillsdale, NJ, 1988.
