Design and Analysis of Cluster Randomization Trials in Health Research. Allan Donner, Ph.D., Professor and Chair, Department of Epidemiology & Biostatistics


Post on 24-Dec-2015






  • Slide 1
  • Design and Analysis of Cluster Randomization Trials in Health Research. Allan Donner, Ph.D., Professor and Chair, Department of Epidemiology & Biostatistics, The University of Western Ontario, London, Ontario, Canada. Neil Klar, Ph.D., Senior Biostatistician, Division of Preventive Oncology, Cancer Care Ontario, Toronto, Ontario, Canada.
  • Slide 2
  • Dr. Allan Donner
  • Slide 3
  • Dr. Neil Klar
  • Slide 4
  • Learning Objectives: (1) To distinguish experimental trials based on the unit of randomization (e.g., individual, family, community). (2) To appreciate the consequences of cluster randomization for sample size estimation and data analysis. (3) To identify key features of a cluster randomization trial that need to be included in reports.
  • Slide 5
  • What Are Cluster Randomization Trials? Cluster randomization trials are experiments in which clusters of individuals rather than independent individuals are randomly allocated to intervention groups.
  • Slide 6
  • Example 1: Study Purpose: To evaluate the effectiveness of Vitamin A supplements on childhood mortality. 450 villages in Indonesia were randomly assigned either to participate in a Vitamin A supplementation scheme or to serve as a control. One-year mortality rates were compared in the two groups. Sommer et al. (Lancet, 1986)
  • Slide 7
  • Example 2: Study Purpose: To promote smoking cessation using community resources. Eleven pairs of communities were selected, and one member of each pair was randomly assigned to the intervention group. Five-year smoking cessation rates were compared in the two groups. Communities were matched on demographic characteristics (e.g., size, population density) and geographical proximity. COMMIT Research Group (Am J Public Health, 1995)
  • Slide 8
  • Example 3: Study Purpose: To evaluate the effectiveness of treated nasal tissues versus standard tissues. 90 families were selected and randomized to one of the two intervention groups separately within each of three family-size strata (2, 3, or 4 members per family). The 24-week incidence of respiratory illness was compared in the two groups. Farr et al. (Am. J. Epid., 1988)
  • Slide 9
  • Reasons for Adopting Cluster Randomization: administrative convenience; to obtain the cooperation of investigators; ethical considerations; to enhance subject compliance; to avoid treatment group contamination; the intervention is naturally applied at the cluster level.
  • Slide 10
  • Unit of Randomization vs. Unit of Analysis: A key property of cluster randomization trials is that inferences are frequently intended to apply at the individual level, while randomization is at the cluster or group level. Thus the unit of randomization may be different from the unit of analysis. In this case, the lack of independence among individuals in the same cluster, i.e., between-cluster variation, creates special methodological challenges in both design and analysis.
  • Slide 11
  • Implications of Between-Cluster Variation: The presence of between-cluster variation implies: (i) A reduction in effective sample size, the extent of which depends on the degree of within-cluster correlation and on the average cluster size. (ii) Standard approaches for sample size estimation and statistical analysis do not apply. Applying standard sample size approaches leads to an underpowered study, and applying standard statistical methods generally tends to bias p-values downwards, i.e., it could lead to spurious statistical significance.
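The second implication can be illustrated with a small simulation. This is a sketch under assumed values, not data from the slides. It uses k = 10 clusters of m = 20 per arm, ρ = 0.05, and a total variance of 1. Responses follow a one-way random-effects model. Both arms share the same mean, so H0 is true. A naive two-sample z-test that treats every individual as independent is then applied. The function names are hypothetical.

```python
import math
import random
from statistics import NormalDist

def simulate_arm(k, m, sigma2_a, sigma2_w, rng):
    """k clusters of size m from a one-way random-effects model with mean 0."""
    values = []
    for _ in range(k):
        cluster_effect = rng.gauss(0.0, math.sqrt(sigma2_a))  # shared by the whole cluster
        values.extend(cluster_effect + rng.gauss(0.0, math.sqrt(sigma2_w)) for _ in range(m))
    return values

def mean_var(values):
    """Sample mean and (n - 1)-denominator sample variance."""
    mu = sum(values) / len(values)
    return mu, sum((v - mu) ** 2 for v in values) / (len(values) - 1)

def naive_p_value(x, y):
    """Two-sided z-test p-value that wrongly treats every individual as independent."""
    mx, vx = mean_var(x)
    my, vy = mean_var(y)
    z = (mx - my) / math.sqrt(vx / len(x) + vy / len(y))
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))

def naive_type1_error(k=10, m=20, rho=0.05, reps=1000, alpha=0.05, seed=1):
    """Fraction of null trials in which the naive test rejects H0: mu1 = mu2."""
    rng = random.Random(seed)
    sigma2_a, sigma2_w = rho, 1.0 - rho  # total variance sigma^2 = 1
    rejections = 0
    for _ in range(reps):
        x = simulate_arm(k, m, sigma2_a, sigma2_w, rng)  # both arms have mean 0,
        y = simulate_arm(k, m, sigma2_a, sigma2_w, rng)  # so H0 is true
        if naive_p_value(x, y) < alpha:
            rejections += 1
    return rejections / reps

print("rho = 0.05:", naive_type1_error(rho=0.05))
print("rho = 0.00:", naive_type1_error(rho=0.0))
```

With these settings IF = 1 + 19(0.05) = 1.95. The naive standard error is therefore too small by a factor of about √1.95. Under a large-sample approximation, the rejection rate should be roughly 0.16 rather than the nominal 0.05. With rho=0.0 the rate returns to near 0.05.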
  • Slide 12
  • Possible Reasons for Between-Cluster Variation: 1. Subjects frequently select the clusters to which they belong, e.g., patient characteristics could be related to age or sex differences among physicians. 2. Important covariates at the cluster level affect all individuals within the cluster in the same manner, e.g., differences in temperature between nurseries may be related to infection rates. 3. Individuals within clusters frequently interact and, as a result, may respond similarly, e.g., to education strategies or therapies provided in a group setting. 4. Infectious diseases tend to spread more rapidly within than among families or communities.
  • Slide 13
  • Quantifying the Effect of Clustering: Consider a trial in which $k$ clusters of size $m$ are randomly assigned to each of an experimental and a control group. Also assume the response variable $Y$ is normally distributed with common variance $\sigma^2$. The aim is to test $H_0: \mu_1 = \mu_2$. Appropriate estimates of $\mu_1$ and $\mu_2$ are given by $\bar{Y}_1$ and $\bar{Y}_2$, the usual sample means. Also, $V(\bar{Y}_i) = \frac{\sigma^2}{km}\left[1 + (m - 1)\rho\right]$, $i = 1, 2$, where $\rho$ is the coefficient of intracluster correlation. Then $IF = 1 + (m - 1)\rho$ is the variance inflation factor, or design effect, associated with cluster randomization.
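The variance formula can be checked numerically. This sketch assumes a one-way random-effects model consistent with the variance decomposition of σ² into between- and within-cluster components. Each cluster has a normal effect with variance ρσ², and each individual adds a normal error with variance (1 − ρ)σ². The sketch compares the empirical variance of simulated group means with σ²/(km)[1 + (m − 1)ρ]. The values k = 5, m = 10, ρ = 0.1 are arbitrary choices, and the helper names are hypothetical.

```python
import math
import random

def design_effect(m, rho):
    """Variance inflation factor IF = 1 + (m - 1) * rho."""
    return 1.0 + (m - 1) * rho

def theoretical_var_of_mean(sigma2, k, m, rho):
    """V(Ybar_i) = sigma^2 / (k m) * [1 + (m - 1) rho]."""
    return sigma2 / (k * m) * design_effect(m, rho)

def simulated_var_of_mean(sigma2, k, m, rho, reps=4000, seed=2):
    """Empirical variance of one group's mean over `reps` simulated groups."""
    rng = random.Random(seed)
    sd_a = math.sqrt(rho * sigma2)          # between-cluster SD
    sd_w = math.sqrt((1.0 - rho) * sigma2)  # within-cluster SD
    means = []
    for _ in range(reps):
        total = 0.0
        for _ in range(k):
            a = rng.gauss(0.0, sd_a)        # cluster effect shared by its m members
            total += sum(a + rng.gauss(0.0, sd_w) for _ in range(m))
        means.append(total / (k * m))
    centre = sum(means) / reps
    return sum((x - centre) ** 2 for x in means) / (reps - 1)

sigma2, k, m, rho = 1.0, 5, 10, 0.1
print("IF          =", design_effect(m, rho))
print("theoretical =", theoretical_var_of_mean(sigma2, k, m, rho))
print("simulated   =", simulated_var_of_mean(sigma2, k, m, rho))
```

The simulated value should agree with the theoretical one to within a few percent. Ignoring clustering would instead give σ²/(km), which is too small by the factor IF.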
  • Slide 14
  • Variance Component Interpretation of $\rho$: The overall response variance $\sigma^2$ may be expressed as the sum of two components, $\sigma^2 = \sigma_A^2 + \sigma_W^2$, where $\sigma_A^2$ is the between-cluster component of variance and $\sigma_W^2$ is the within-cluster component of variance. Then $\rho = \sigma_A^2 / (\sigma_A^2 + \sigma_W^2)$.
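In practice σ²_A and σ²_W are estimated from data. One common choice is the one-way analysis-of-variance estimator for equal cluster sizes, though it is not necessarily the method used in the studies cited here. It sets σ̂²_W = MSW and σ̂²_A = (MSB − MSW)/m, truncated at zero, and then forms ρ̂ from the ratio above. Here is a minimal sketch with a hypothetical function name:

```python
def anova_components(clusters):
    """ANOVA estimates (sigma2_a, sigma2_w, rho) from equal-size clusters.

    `clusters` is a list of k lists, each holding the m responses of one cluster.
    """
    k = len(clusters)
    m = len(clusters[0]) if clusters else 0
    if k < 2 or m < 2 or any(len(c) != m for c in clusters):
        raise ValueError("need at least 2 clusters, all of the same size m >= 2")
    cluster_means = [sum(c) / m for c in clusters]
    grand_mean = sum(cluster_means) / k
    # Mean squares between (k - 1 df) and within (k(m - 1) df) clusters.
    msb = m * sum((cm - grand_mean) ** 2 for cm in cluster_means) / (k - 1)
    msw = sum((y - cm) ** 2 for c, cm in zip(clusters, cluster_means) for y in c) / (k * (m - 1))
    sigma2_w = msw
    sigma2_a = max((msb - msw) / m, 0.0)  # negative estimates truncated to zero
    rho = sigma2_a / (sigma2_a + sigma2_w)
    return sigma2_a, sigma2_w, rho

print(anova_components([[1, 2, 3], [4, 5, 6]]))
```

For the toy input, MSB = 13.5 and MSW = 1. This gives σ̂²_A = 12.5/3 ≈ 4.17, σ̂²_W = 1 and ρ̂ ≈ 0.81.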
  • Slide 15
  • Sample Size Requirements for Completely Randomized Designs: Comparison of Means: Suppose $k$ clusters of size $m$ are to be assigned to each of two intervention groups. Then the number of subjects required per intervention group to test $H_0: \mu_1 = \mu_2$ is given by $n = \frac{(Z_{\alpha/2} + Z_\beta)^2 \, (2\sigma^2) \left[1 + (m - 1)\rho\right]}{(\mu_1 - \mu_2)^2}$, where $\sigma^2 = \sigma_A^2 + \sigma_W^2$. Equivalently, the number of required clusters per group is given by $k = n/m$.
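The formula translates directly into code. This sketch uses the Python standard library's `statistics.NormalDist` for the normal quantiles Z_{α/2} and Z_β. The function name and the illustrative inputs (σ² = 1, ρ = 0.02, m = 50, a difference of 0.25) are assumptions. Like the slide's formula, it relies on normal critical values, so with few clusters the answer is usually rounded up or inflated further.

```python
import math
from statistics import NormalDist

def cluster_sample_size(sigma2, rho, m, delta, alpha=0.05, power=0.80):
    """Return (n, k): subjects and clusters per group for comparing two means.

    sigma2 = total variance (sigma_A^2 + sigma_W^2), rho = intracluster correlation,
    m = cluster size, delta = mu_1 - mu_2 to be detected.
    """
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # Z_{alpha/2} for a two-sided test
    z_beta = NormalDist().inv_cdf(power)               # Z_beta for the target power
    inflation = 1.0 + (m - 1) * rho                    # design effect IF
    n = (z_alpha + z_beta) ** 2 * 2.0 * sigma2 * inflation / delta ** 2
    return n, n / m

n, k = cluster_sample_size(sigma2=1.0, rho=0.02, m=50, delta=0.25)
print(f"n = {n:.1f} subjects per group, i.e. {math.ceil(k)} clusters of 50 per group")
```

Setting rho=0 and m=1 recovers the usual individually randomized sample size.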
  • Slide 16
  • Example: Hsieh (1988) reported on the results of a pilot study for a planned 5-year trial examining cardiovascular risk factors, obtaining cholesterol levels from 754 individuals in 4 worksites. The estimated variance components were $S_W^2 = 2209$ and $S_A^2 = 93$, so the value of $\rho$ was assessed as $\hat{\rho} = 93/(93 + 2209) = 0.04$. Assuming $m = 70$ subjects per worksite, $IF = 1 + (70 - 1)(0.04) = 3.76$.
  • Slide 17
  • To obtain 80% power at $\alpha = 0.05$ (two-sided) for detecting a mean difference of 20 mg/dl between intervention groups, the number of required worksites per group is given by $k = \frac{(1.96 + 0.84)^2 \cdot 2(2302)(3.76)}{70\,(20)^2} = 4.8 \approx 5$. To adjust for the use of normal-distribution critical values and for possible loss to follow-up, investigators might enroll 7 clusters per group.
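The slide's arithmetic can be reproduced line by line. This sketch keeps the slide's rounding so the numbers match those shown: ρ̂ is rounded to 0.04, and the Z-values 1.96 and 0.84 are used. The variable names are illustrative only.

```python
import math

s2_w, s2_a = 2209, 93                    # pilot-study variance components (Hsieh, 1988)
s2 = s2_a + s2_w                         # total variance, 2302
rho = round(s2_a / s2, 2)                # 93 / 2302 = 0.0404, rounded to 0.04 as on the slide
m = 70                                   # assumed subjects per worksite
inflation = 1 + (m - 1) * rho            # design effect IF
z_alpha, z_beta = 1.96, 0.84             # two-sided alpha = 0.05, 80% power
delta = 20                               # mean difference to detect, mg/dl
k = (z_alpha + z_beta) ** 2 * 2 * s2 * inflation / (m * delta ** 2)
print(f"rho = {rho}, IF = {inflation:.2f}, k = {k:.2f} -> {math.ceil(k)} worksites per group")
# prints: rho = 0.04, IF = 3.76, k = 4.85 -> 5 worksites per group
```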
  • Slide 18
  • Impact on Power of Increasing the Number of Clusters vs. Increasing Cluster Size: Let $d$ be the observed mean difference between intervention groups. Then $\mathrm{Var}(d) = \frac{2\sigma^2}{km}\left[1 + (m - 1)\rho\right]$. As the number of clusters $k \to \infty$, $\mathrm{Var}(d) \to 0$, but as the cluster size $m \to \infty$, $\mathrm{Var}(d) \to 2\sigma^2\rho/k = 2\sigma_A^2/k$. Trials randomizing clusters of between 30 and 50 individuals will tend to have almost the same statistical power as trials randomizing the same number of much larger units. But larger clusters are often recruited for very practical reasons (to reduce contamination, to avoid logistical or ethical problems, etc.).
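The trade-off between k and m can be tabulated directly from the variance formula. The values σ² = 1, ρ = 0.05 and k = 10 are assumptions chosen only for illustration. With them the limiting variance 2σ²ρ/k is 0.01.

```python
def var_diff(sigma2, rho, k, m):
    """Var(d) = 2 sigma^2 / (k m) * [1 + (m - 1) rho], d = difference in group means."""
    return 2.0 * sigma2 / (k * m) * (1.0 + (m - 1) * rho)

sigma2, rho, k = 1.0, 0.05, 10           # illustrative values, not taken from the slides
limit = 2.0 * sigma2 * rho / k           # Var(d) as m -> infinity, i.e. 2 sigma_A^2 / k
print("   m   Var(d)   Var(d)/limit")
for m in (10, 30, 50, 100, 1000):
    v = var_diff(sigma2, rho, k, m)
    print(f"{m:4d}  {v:.5f}   {v / limit:.2f}")
# Doubling the number of clusters halves Var(d) at any fixed cluster size.
print("k = 20, m = 30:", round(var_diff(sigma2, rho, 2 * k, 30), 5))
```

Under these assumptions, growing clusters from 50 to 1000 members lowers Var(d) only from 1.38 to about 1.02 times its floor, whereas doubling k halves it. How quickly the plateau is reached depends on ρ.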
  • Slide 19
  • Factors Influencing Loss of Precision: 1. Interventions are often applied on a group basis, with little or no attention given to individual study participants. 2. Some studies permit the immigration of new subjects after baseline. 3. Entire clusters, rather than just individuals, may be lost to follow-up. 4. Expectations regarding effect size may be over-optimistic.
  • Slide 20
  • Strategies for Improving Precision in Cluster Randomization Trials: 1. Establish cluster-level eligibility criteria so as to reduce between-cluster variability, e.g., geographical restrictions. 2. Consider increasing the number of clusters randomized, even if only in the control group. 3. Consider matching or stratifying in the design on baseline variables of prognostic importance. 4. Obtain baseline measurements on other potentially important prognostic variables. 5. Take repeated assessments over time from the same clusters or from different clusters of subjects. 6. Develop a detailed protocol for ensuring compliance and minimizing loss to follow-up.
  • Slide 21
  • The Importance of Cluster-Level Replication: Some investigators have designed community intervention trials in which exactly one cluster has been assigned to each intervention group. Such trials invariably result in interpretational difficulties caused by the total confounding of two sources of variation: the variation in response due to the effect of intervention, and the natural variation that exists between the two communities (clusters) even in the absence of an intervention effect. Analysis is only possible under the untenable assumption that there is no clustering of individuals' responses within communities. More attention to the effects of clustering when determining sample size might help to eliminate designs that lack replication.
  • Slide 22
  • Analysis of Binary Outcomes Objective: To assess the effects of interventions on individuals when clusters are the sampling unit. Statistical Issue: Responses on individuals within the same cluster tend to be positively correlated, violating the assumption of independence required for the application of standard statistical methods.
  • Slide 23
  • Example: Data obtained from a study evaluating the effect of school-based interventions in reducing adolescent tobacco use. 12 school units were randomly assigned to each of four conditions, including three intervention conditions and a control condition (existing curriculum). We compare here the effect of the SFG (Smoke Free Generation) intervention to the existing curriculum (EC) with respect to reducing the proportio

