

ARTICLE IN PRESS

Journal of Clinical Epidemiology - (2008) -

ORIGINAL ARTICLE

Trial sequential analysis reveals insufficient information size and potentially false positive results in many meta-analyses

Jesper Brok*, Kristian Thorlund, Christian Gluud, Jørn Wetterslev

Copenhagen Trial Unit, Centre for Clinical Intervention Research, The Cochrane Hepato-Biliary Group, Dept. 3344, Rigshospitalet, Copenhagen University Hospital, Blegdamsvej 9, DK-2100 Copenhagen, Denmark

Accepted 1 October 2007

Abstract

Objectives: To evaluate meta-analyses with trial sequential analysis (TSA). TSA adjusts for random error risk and provides the required number of participants (information size) in a meta-analysis. Meta-analyses not reaching information size are analyzed with trial sequential monitoring boundaries analogous to interim monitoring boundaries in a single trial.

Study Design and Setting: We applied TSA on meta-analyses performed in Cochrane Neonatal reviews. We calculated information sizes and monitoring boundaries with three different anticipated intervention effects of 30% relative risk reduction (TSA30%), 15% (TSA15%), or a risk reduction suggested by low-bias risk trials of the meta-analysis corrected for heterogeneity (TSALBHIS).

Results: A total of 174 meta-analyses were eligible; 79 out of 174 (45%) meta-analyses were statistically significant (P < 0.05). In the significant meta-analyses, TSA30% showed firm evidence in 61%. TSA15% and TSALBHIS found firm evidence in 33% and 73%, respectively. The remaining significant meta-analyses had potentially spurious evidence of effect. In the 95 statistically nonsignificant (P > 0.05) meta-analyses, TSA30% showed absence of evidence in 80% (insufficient information size). TSA15% and TSALBHIS found that 95% and 91% had absence of evidence. The remaining nonsignificant meta-analyses had evidence of lack of effect.

Conclusion: TSA reveals insufficient information size and potentially false positive results in many meta-analyses. © 2008 Elsevier Inc. All rights reserved.

Keywords: Meta-analysis; Trial sequential analysis; Heterogeneity; Information size; Sample size; Random error

1. Introduction

Meta-analyses aim to increase the power and precision of the estimated intervention effects [1,2]. Meta-analyses are, however, criticized because the included trials are inevitably clinically diverse regarding patients, interventions, outcomes, etc. Hence, pooling the potentially heterogeneous trial results is sometimes inappropriate [3,4]. Meta-analyses may also obtain false positive results (type I errors) or overestimate treatment effects due to systematic errors (bias) and random errors (play of chance). Bias may originate from publication bias [5–7], inclusion of trials with high-bias risk [8–11], outcome measure bias [12], premature stopping of “positive” trials [13], and small trial bias [14]. Meta-analyses could also be data driven because they are retrospectively conducted. Random errors may arise due to repetitive testing as data accrue and testing of multiple outcome measures, which inevitably, sooner or later, lead to type I errors [15].

The required number of participants (information size) for a meta-analysis should be at least as large as that of an adequately powered single trial. Trial sequential analysis (TSA) is an approach that provides the required information size in meta-analyses [16]. To adjust for random error risk, meta-analyses not reaching the required information size are analyzed with trial sequential monitoring boundaries analogous to interim monitoring boundaries in a single trial [16–21]. Trial sequential monitoring boundaries adjust the P-value that is required for obtaining statistical significance according to the number of participants and events in a meta-analysis. The fewer participants and events, the more restrictive the monitoring boundaries are and the lower the P-value required to obtain statistical significance.

The use of TSA in meta-analyses has been debated because the analysis ignores potential bias and heterogeneity [22], but adjustment for these factors seems possible [16]. We recently audited clinical guidelines taking Cochrane Neonatal Group reviews as the basis for deciding which intervention to use [23]. Therefore, we have examined meta-analyses in these reviews with TSA with and without bias and heterogeneity adjustment to reassess the evidence they provide.

* Corresponding author. Tel.: +45-35457109; fax: +45-35457101.

E-mail address: [email protected] (J. Brok).

0895-4356/08/$ – see front matter © 2008 Elsevier Inc. All rights reserved.

doi: 10.1016/j.jclinepi.2007.10.007

2. Methods

2.1. Material and bias definition

We identified all meta-analyses that included more than two trials reporting on a binary outcome measure from the 188 Cochrane Neonatal Group reviews in The Cochrane Library, Issue 4, 2004 [24]. From each review, whenever possible, we included three meta-analyses. We selected the meta-analyses on mortality outcomes and the first two eligible meta-analyses on clinical outcome measures according to the review authors’ priority (or three, in case mortality was not meta-analyzed).

The meta-analyses had to include at least one randomized trial with low-bias risk (i.e., adequate allocation concealment according to the authors of the Cochrane review). These selection criteria enabled us to construct bias-adjusted TSA [16]. Allocation concealment was chosen because it could be applied to all trials; adequate allocation concealment reduces the risk of selection bias; and unclear or inadequate allocation concealment is most consistently associated with a risk of biased overestimation of intervention effects [8–11].

2.2. Statistical methods

TSA necessitates a prespecified relevant (worthwhile) intervention effect and a prespecified risk of type I (α) and type II (β) errors. We set two-sided α = 5% and β = 20% (1 − β = 80% power). We used an a priori specified relative risk reduction (RRR) of 30% and 15%. With this information, the required information size (i.e., the number of participants in the meta-analysis required to accept or reject the prespecified intervention effect) can be calculated and the corresponding trial sequential monitoring boundaries for TSA30% or TSA15% can be constructed (Fig. 1). We also calculated a bias- and heterogeneity (I²)-adjusted TSALBHIS [16]. For TSALBHIS, we based the prespecified RRR on the pooled intervention effect in low-bias risk trials of the meta-analyses. We adjusted the TSALBHIS for heterogeneity. I² (inconsistency test) measures the proportion of variation in treatment effect estimates due to between-study heterogeneity rather than chance in a meta-analysis [25]. Accordingly, the information size increases with increasing heterogeneity and the monitoring boundary becomes more restrictive (Fig. 1).

We calculated the information size and applied the corresponding trial sequential monitoring boundaries for TSA30%, TSA15%, and TSALBHIS, respectively. We compared the accrued number of participants with the calculated information size. We constructed the cumulative Z-curve (i.e., Z-statistics after each trial) of each meta-analysis and assessed its crossing of Z = 1.96 (P = 0.05) and the monitoring boundaries with the fixed-effect model [26] or random-effects model [27] as used in the Cochrane reviews [28]. The monitoring boundaries should be crossed by the cumulative Z-curve to obtain firm evidence for an intervention effect (Fig. 1).

Data from each meta-analysis were extracted from the Cochrane neonatal review by one author (J.B.) and analyzed with a computer program, TSA v0.6. Data extraction was verified by comparing the Z-scores for each meta-analysis in TSA v0.6 with the Z-scores obtained in Review Manager 4.10 (Nordic Cochrane Collaboration, 2003), which is the standard program used by Cochrane review authors [1].

2.3. Outcomes

We classified the proportion of significant (P < 0.05) meta-analyses that had

• “Potentially spurious evidence of effect,” that is, the cumulative Z-curve did not cross the monitoring boundaries (Fig. 2, curve A)

• “Firm evidence of effect,” that is, the cumulative Z-curve crossed the monitoring boundaries (Fig. 2, curve B).

[Fig. 1: plot of cumulative Z-score (y-axis) against number of patients (x-axis), showing the cumulative Z-curve, the line Z = 1.96 (P = .05), the monitoring boundary, and the information size. Figure annotation: the monitoring boundary moves right and up if the event rate decreases, the intervention effect decreases, or heterogeneity increases (only TSALBHIS).]

Fig. 1. Example of upper half of two-sided trial sequential analysis. The cumulative Z-curve was constructed with each cumulated Z-value calculated after including a new trial according to publication date. Crossing of monitoring boundaries is needed to obtain reliable evidence adjusted for random error risk.
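The construction of a cumulative Z-curve, as described for Fig. 1, can be sketched as follows. This is only an illustration under stated assumptions: it pools log relative risks with fixed-effect inverse-variance weights, the trial counts are invented, and the function name `cumulative_z` is hypothetical. TSA v0.6 and Review Manager may pool differently, so their Z-scores need not match these. With this sign convention a beneficial effect (relative risk below 1) gives a negative Z, so the absolute value is compared with 1.96.

```python
import math

def cumulative_z(trials):
    """Return the Z-statistic after each trial is added, in the order given.

    Each trial is (events_exp, n_exp, events_ctl, n_ctl). Trials are assumed
    to be sorted by publication date and to have no zero cells.
    """
    sum_w = 0.0       # running sum of inverse-variance weights
    sum_w_lrr = 0.0   # running sum of weight * log relative risk
    z_curve = []
    for a, n1, c, n2 in trials:
        log_rr = math.log((a / n1) / (c / n2))
        var = 1 / a - 1 / n1 + 1 / c - 1 / n2   # variance of the log RR
        w = 1 / var
        sum_w += w
        sum_w_lrr += w * log_rr
        pooled = sum_w_lrr / sum_w              # pooled log RR so far
        se = math.sqrt(1 / sum_w)               # its standard error
        z_curve.append(pooled / se)
    return z_curve

# Hypothetical trials, invented for illustration only.
trials = [(8, 50, 12, 50), (15, 120, 25, 118), (30, 300, 45, 305)]
for k, z in enumerate(cumulative_z(trials), start=1):
    flag = "crosses |Z| = 1.96" if abs(z) >= 1.96 else "below 1.96"
    print(f"after trial {k}: Z = {z:.2f} ({flag})")
```

Crossing |Z| = 1.96 in such a curve corresponds only to "traditional" significance; whether the monitoring boundary is crossed has to be assessed separately.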

[Fig. 2: plot of cumulative Z-score (y-axis) against number of patients (x-axis), showing four cumulative Z-curves (A–D), the line Z = 1.96 (P = .05), the monitoring boundary, and the information size.]

Fig. 2. Examples of upper half of two-sided trial sequential analyses. The cumulative Z-curves (A–D) from four different meta-analyses were constructed. Crossing of Z = 1.96 provides a “traditionally” significant result (A). Crossing of the monitoring boundary before reaching the information size is needed to obtain reliable evidence adjusted for random error risk (B). Z-curves not crossing Z = 1.96 indicate absence of evidence if the information size is not reached (C) or lack of the predefined intervention effect if the information size is reached (D).

The proportion of nonsignificant (P > 0.05) meta-analyses that had

• “Absence of evidence,” that is, the meta-analysis included fewer patients than the required information size (Fig. 2, curve C)

• “Lack of effect,” that is, the meta-analysis included more patients than the required information size (Fig. 2, curve D).

For meta-analyses with “potentially spurious evidence of effect” or “absence of evidence,” we calculated the additional information size needed to obtain firm evidence (i.e., the difference between the information size and the accrued number of patients in the meta-analyses).
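The four outcome categories and the additional information size can be expressed as a small decision rule. The sketch below is illustrative only: the function names are hypothetical, the inputs are the values reported in Table 1 (TSA30% for Examples A, B, and D; TSA15% for Example C), and treating a meta-analysis whose size exactly equals the information size as having reached it is an assumption, since the text only distinguishes "fewer" from "more".

```python
def classify(significant, crossed_boundary, patients, information_size):
    """Assign one of the four outcome categories of Section 2.3."""
    if significant:
        if crossed_boundary:
            return "firm evidence of effect"
        return "potentially spurious evidence of effect"
    if patients < information_size:
        return "absence of evidence"
    return "lack of effect"  # information size reached (assumption: >=)

def additional_information(patients, information_size):
    """Patients still needed to reach the required information size."""
    return max(information_size - patients, 0)

# (label, significant, crossed boundary, patients, information size)
examples = [
    ("A, TSA30%", True, False, 197, 675),
    ("B, TSA30%", True, True, 755, 301),
    ("C, TSA15%", False, False, 3283, 1697),
    ("D, TSA30%", False, False, 1165, 1513),
]
for label, sig, crossed, n, info in examples:
    category = classify(sig, crossed, n, info)
    extra = additional_information(n, info)
    print(f"Example {label}: {category}; additional patients needed: {extra}")
```

For Example D this gives 1,513 − 1,165 = 348 additional patients, the figure quoted in the notes to Table 1.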

3. Results

3.1. Eligible meta-analyses

We identified 188 Cochrane Neonatal Group systematic reviews in The Cochrane Library, Issue 4, 2004 [24]. Of these, we excluded 76 because the review included fewer than three randomized clinical trials, 29 reviews because they did not report a binary outcome measure, and 6 reviews because all trials had high-bias risk (i.e., had unclear or inadequate allocation concealment). From the remaining 77 reviews, we included a total of 174 eligible meta-analyses.

3.2. Characteristics of meta-analyses

The 174 meta-analyses included a median of five randomized trials (range, 3–32) and a median of 507 patients (range, 53–7,454). A total of 172 out of 174 (99%) meta-analyses were analyzed with the fixed-effect model. Forty-eight out of 174 meta-analyses compared two different interventions and 126 out of 174 meta-analyses compared one intervention vs. placebo or no intervention. All neonatal meta-analyses used α = 5%. The cumulative Z-curve of the meta-analyses crossed Z = 1.96, that is, was statistically significant (P < 0.05), in 79 out of 174 (45%) of the meta-analyses (Fig. 3A).

3.3. Meta-analyses with “potentially spurious evidence of effect”

In 31 out of 79 (39%) significant meta-analyses (P < 0.05), the cumulative Z-curve did not cross the monitoring boundaries for TSA30% (Fig. 3B). For these meta-analyses, the median additional information size needed to obtain firm evidence of a RRR of 30% was 1,523 patients (range, 176–2,844).

In 53 out of 79 (67%) significant meta-analyses, the cumulative Z-curve did not cross the monitoring boundaries for TSA15% (Fig. 3C). The median additional information size needed to obtain firm evidence of a RRR of 15% was 3,017 patients (range, 284–20,970).

In 21 of 79 (27%) significant meta-analyses, the cumulative Z-curve did not cross the monitoring boundaries for TSALBHIS (Fig. 3D). The median additional information size needed to obtain firm evidence of a RRR identical to the pooled estimate from low-bias risk trials was 1,609 patients (range, 58–16,756) (Table 1, Example A).

3.4. Meta-analyses with “evidence of effect”

For the 79 significant meta-analyses (P < 0.05), the Z-curve crossed the monitoring boundaries for TSA30% in 48/79 (61%), TSA15% in 26/79 (33%), and TSALBHIS in 58/79 (73%) of these meta-analyses (Fig. 3B–D) (Table 1, Example B).

3.5. Meta-analyses with “absence of evidence”

A total of 76 of the 95 nonsignificant meta-analyses (P > 0.05) (80%) included fewer patients than the estimated information size to obtain firm meta-analytic evidence for a RRR of 30% (Fig. 3B). The median additional information size needed to obtain firm evidence for a RRR of 30% was 938 patients (range, 13–29,400) (Table 1, Example D).

Fig. 3. Different ways to assess the evidence in meta-analyses. “Traditional” meta-analyses (A), trial sequential boundary with a predefined intervention effect of 30% (TSA30% [B]), of 15% (TSA15% [C]), and trial sequential boundary adjusted for bias and heterogeneity (TSALBHIS [D]) applied on 174 neonatal meta-analyses.

For TSA15%, 90 of 95 nonsignificant meta-analyses (95%) needed more trials to detect or reject a RRR of 15% (Fig. 3C). The median additional information size needed to obtain firm evidence for a RRR of 15% was 5,277 patients (range, 181–108,700).

For TSALBHIS, 86 of 95 nonsignificant meta-analyses (91%) needed more trials to detect or reject a RRR similar to low-bias risk trials (Fig. 3D). The median additional sample size needed to obtain firm evidence for a RRR similar to low-bias risk trials was 3,864 patients (range, 18 to ∞).

3.6. Meta-analyses with “lack of effect”

For TSA30%, 19 out of 95 (20%) nonsignificant meta-analyses (P > 0.05) illustrate lack of a RRR of 30% because they included more patients than the estimated information size (Fig. 3B).

For TSA15%, 5 out of 95 (5%) nonsignificant meta-analyses illustrate lack of a RRR of 15% because they included more patients than the estimated information size (Fig. 3C) (Table 1, Example C).

For TSALBHIS, 9 out of 95 (9%) nonsignificant meta-analyses illustrate lack of a RRR similar to low-bias risk trials because they included more patients than the estimated information size (Fig. 3D).

4. Discussion

This study is the first to apply TSA on a large cohort of meta-analyses. Applying three different TSAs to Cochrane Neonatal Group meta-analyses revealed that many meta-analyses have insufficient information size and that there are several potentially false positive results. The respective TSAs supported the “traditional” significance (P < 0.05) in only 61% (TSA30%), 33% (TSA15%), and 73% (TSALBHIS) of 79 significant meta-analyses. Applying TSA30%, TSA15%, and TSALBHIS on the 95 nonsignificant (P > 0.05) meta-analyses showed that 80%, 95%, and 91% of meta-analyses, respectively, had absence of evidence and that further trials on these topics may be needed.

Table 1

The cumulative Z-curve relations to Z = 1.96 (P < 0.05) and monitoring boundaries in trial sequential analysis of four Cochrane neonatal meta-analyses

| Review | Outcome measure | No. of trials (patients) | Z = 1.96 (P-value) | Boundary 30% (a) (information size (b)) | Boundary 15% (a) (information size (b)) | Boundary LBHIS (a) (information size (b)) |
| --- | --- | --- | --- | --- | --- | --- |
| A) Continuous distending pressure for respiratory distress syndrome [38] | Mortality | 5 (197) | Crossed (= 0.01) | Not crossed (675) | Not crossed (2,840) | Not crossed (762) |
| B) Nitric oxide for respiratory failure in infants [39] | Extracorporeal oxygenation | 6 (755) | Crossed (< 0.0001) | Crossed (301) | Crossed (1,212) | Crossed (144) |
| C) Natural vs. synthetic surfactant for respiratory distress syndrome [40] | Patent ductus arteriosus | 7 (3,283) | Not crossed (= 0.63) | Not crossed (412) | Not crossed (1,697) | Not crossed (71,221) |
| D) Vitamin A supplementation for low birthweight [41] | Mortality | 6 (1,165) | Not crossed (= 0.24) | Not crossed (1,513) | Not crossed (6,489) | Not crossed (7,359) |

Example A: Continuous distending pressure for respiratory distress syndrome. The meta-analysis included 197 patients and was significant (P = 0.01). The cumulative Z-curve did not cross any monitoring boundaries. TSALBHIS estimates an information size almost similar to TSA30%, indicating that low-bias risk trials estimate a 30% RRR. Accordingly, P = 0.01 is a potentially false positive result if a RRR of 30% or 15% is relevant. More trials are needed to obtain firm meta-analytic evidence.

Example B: Nitric oxide for respiratory failure in infants. The significant meta-analysis (P < 0.0001) included 755 patients. The cumulative Z-curve crossed all monitoring boundaries, establishing firm evidence. Several trials were published even after crossing the monitoring boundaries. These trials were potentially redundant as the beneficial effect on the specific outcome was established.

Example C: Natural vs. synthetic surfactant for respiratory distress syndrome. The nonsignificant (P = 0.63) meta-analysis included more patients (3,283) than the estimated information size needed (1,696) to detect or reject a RRR of 15%. Accordingly, we have evidence of absence of effect for a RRR of 15% or more and no further trials are needed if this is considered the minimum worthwhile intervention effect. TSALBHIS estimates a very large information size (71,221) because low-bias risk trials estimate a very small intervention effect (RRR = 3%).

Example D: Vitamin A supplementation for low birthweight. The nonsignificant (P = 0.24) meta-analysis included fewer patients (1,165) than the estimated information size (1,513) required to detect or reject a RRR of 30%. Accordingly, we have absence of evidence and more trials (including at least 348 patients) are needed. TSA15% and TSALBHIS reach the similar conclusion that more trials with substantially more patients are needed.

(a) Trial sequential monitoring boundaries for TSA30%, TSA15%, and TSALBHIS. Crossing of monitoring boundaries gives firm evidence for the predefined intervention effect.

(b) Number of patients required to detect (or reject) a predefined intervention effect of 30% (TSA30%), 15% (TSA15%), or as low-bias risk trials corrected for heterogeneity (TSALBHIS).

4.1. Trial sequential analysis

The basic concepts of applying discrete sequential monitoring boundaries, designed for a single trial, on meta-analyses were introduced in 1997 [17,18]. Only a few publications have applied similar methods on small samples or single meta-analyses [16–18,21]. Applying TSA on meta-analyses seems reasonable for several reasons. First, it minimizes potentially false positive results due to random errors when the required information size has not been reached. We were surprised to find that 45% of the meta-analyses were significant; however, this tendency is in accordance with previous observations [29]. The proportion may be inflated because we excluded meta-analyses with fewer than three trials, which are more likely to be nonsignificant. Second, meta-analyses ought to be assessed with criteria that are at least as restrictive as the sample size estimation and sequential monitoring boundaries applied to a single trial. Third, TSA gives information about when reliable evidence is obtained, which can stop implementation of redundant trials. Additionally, using a relevant prespecified intervention effect, the calculated information size in nonsignificant meta-analyses provides a cutoff indicating whether more trials are needed (absence of evidence) or not (evidence of absence of effect) [30,31]. If more trials are needed, it is easy to reestimate the additional number of patients required to obtain firm evidence in the meta-analyses, thereby guiding trialists about sample size in future trials.
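To illustrate why monitoring boundaries are more restrictive when little information has accrued, the sketch below evaluates an O'Brien-Fleming-type alpha-spending function of the Lan-DeMets kind [20]. Which spending function TSA v0.6 uses is not stated here, so this choice is an assumption, and the function names are hypothetical. The spending function gives the cumulative type I error allowed at a given information fraction; full boundaries for later looks require numerical integration and are not computed here. At the first look only, the boundary follows directly from the alpha spent.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()

def obf_alpha_spent(info_fraction, alpha=0.05):
    """Cumulative two-sided type I error spent at a given information fraction
    under an O'Brien-Fleming-type spending function (assumed, see lead-in)."""
    if not 0 < info_fraction <= 1:
        raise ValueError("info_fraction must be in (0, 1]")
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    return 2 * (1 - _STD_NORMAL.cdf(z_alpha / info_fraction ** 0.5))

def first_look_boundary(info_fraction, alpha=0.05):
    """Two-sided Z boundary if the first analysis occurs at info_fraction."""
    spent = obf_alpha_spent(info_fraction, alpha)
    return _STD_NORMAL.inv_cdf(1 - spent / 2)

for t in (0.25, 0.5, 0.75, 1.0):
    print(f"information fraction {t:.2f}: "
          f"alpha spent = {obf_alpha_spent(t):.5f}, "
          f"first-look |Z| boundary = {first_look_boundary(t):.2f}")
```

Under this assumed spending function, almost no alpha is available early on, so a meta-analysis with a small fraction of the required information size needs a far more extreme Z-value than 1.96 to be declared significant.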

4.2. Strengths and limitations

We included only Cochrane neonatal meta-analyses that reported a binary outcome. TSA should also be applied to meta-analyses on continuous outcomes and is applicable to any type of meta-analysis. We did not assess bias risk in randomized trials ourselves, but relied on the Cochrane authors’ reporting, which may be flawed [32]. We acknowledge that meta-analyses are not independent observations if they are from the same review. However, had we used only one meta-analysis from each review, the results would not have changed noticeably.

Obviously, TSA provides more conservative conclusions, which may delay clinicians’ use of potentially beneficial interventions [33]. This conflicts with the current “societal opinion” that clinicians find it difficult to do nothing and that errors of omission are more reprehensible than errors of commission [34]. Retrospective analyses indicate that by reducing the risk of false positive results, TSA may delay implementation of interventions by 4 years [33]. Such a delay should be weighed against the risk of introducing harmful interventions based on inconclusive evidence.

Our observed number of meta-analyses with potentially spurious evidence of effect may be exaggerated because, surprisingly, almost all neonatal meta-analyses were assessed with the fixed-effect model as default [28]. If the more conservative random-effects model, which gives wider confidence intervals particularly in meta-analyses with considerable heterogeneity, had been used, fewer meta-analyses would have obtained traditional significance (P < 0.05). Though the choice of a model is debated [1], studies have found that the difference between the proportion of meta-analyses becoming significant with the fixed- and the random-effects models is limited [29]. This indicates that our high number of meta-analyses with potentially spurious evidence at the P < 0.05 level would only have been moderately reduced if the random-effects model, instead of the fixed-effect model, had been used in the Cochrane neonatal meta-analyses.

TSA with a prespecified intervention effect (TSA30% and TSA15%) can be calculated without data from previous trials addressing the same question. We have used RRRs of 30% and 15% as examples: these seem moderate compared to the frequent overestimation of intervention effects during the planning phase of randomized trials [35]. The prespecified intervention effect is simply a guess, and our findings are appropriate only under the assumption of these effect sizes. It could be judged that a smaller or larger prespecified intervention effect is more relevant for each individual meta-analysis.
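How strongly the required information size depends on the prespecified RRR can be sketched with a standard two-group sample size formula for a binary outcome. This is an assumption about the general form of the calculation, not a reproduction of TSA v0.6; the function name `information_size` is hypothetical and the 20% control event rate is invented for illustration. The defaults follow the paper's settings of two-sided α = 5% and β = 20%.

```python
from statistics import NormalDist

def information_size(control_rate, rrr, alpha=0.05, beta=0.20):
    """Total participants (both groups) needed to detect a relative risk
    reduction `rrr` from `control_rate`, two-sided alpha, power 1 - beta."""
    nd = NormalDist()
    z_alpha = nd.inv_cdf(1 - alpha / 2)
    z_beta = nd.inv_cdf(1 - beta)
    exp_rate = control_rate * (1 - rrr)       # event rate under intervention
    mean_rate = (control_rate + exp_rate) / 2
    delta = control_rate - exp_rate           # absolute risk difference
    variance = mean_rate * (1 - mean_rate)
    return 4 * (z_alpha + z_beta) ** 2 * variance / delta ** 2

# Hypothetical control event rate of 20%.
size_30 = information_size(0.20, 0.30)
size_15 = information_size(0.20, 0.15)
print(f"RRR 30%: about {size_30:.0f} participants")
print(f"RRR 15%: about {size_15:.0f} participants")
print(f"ratio: {size_15 / size_30:.1f}")
```

Halving the RRR roughly quadruples the requirement because the absolute risk difference enters as a squared term in the denominator, which is in line with the roughly fourfold differences between the TSA30% and TSA15% information sizes in Table 1.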

Conversely, the specified intervention effect in TSALBHIS seems more reliable, as it is based on data from previous low-bias risk trials [16]. However, if we have only a few small low-bias risk trials, then TSALBHIS may be inaccurately estimated. Besides allocation concealment, the distinction between low-bias risk and high-bias risk trials could also be made with other components, making TSALBHIS flexible. In individual multicenter trials, it is suggested that sample sizes are adjusted for between-center heterogeneity [36]. In this study, TSALBHIS adjusts the required information size for between-study heterogeneity in meta-analyses. This heterogeneity adjustment could also be applied to TSA30% and TSA15% [33].
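A minimal sketch of a heterogeneity adjustment, assuming the inflation factor 1/(1 − I²) that is commonly used in the TSA literature. The text above states only that the information size increases with heterogeneity, so the exact factor, the function name, and the example values are assumptions for illustration.

```python
def heterogeneity_adjusted_size(information_size, i_squared):
    """Inflate an information size for between-study heterogeneity.

    `i_squared` is I^2 expressed as a proportion in [0, 1).
    The factor 1 / (1 - I^2) is an assumed form of the adjustment.
    """
    if not 0 <= i_squared < 1:
        raise ValueError("i_squared must be in [0, 1)")
    return information_size / (1 - i_squared)

# Hypothetical unadjusted information size of 1,000 participants.
for i2 in (0.0, 0.25, 0.5, 0.75):
    adjusted = heterogeneity_adjusted_size(1000, i2)
    print(f"I^2 = {i2:.2f}: adjusted information size = {adjusted:.0f}")
```

Under this assumption, an I² of 50% doubles the required information size, which also pushes the monitoring boundary further out for any given number of accrued patients.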

4.3. Other comments

For simplicity, Fig. 1 presents only the upper half of the TSA, but TSA is a symmetric two-sided analysis, as the cumulative Z-curve can take negative and positive values. Thus, when comparing two different interventions, either the upper or the lower boundary needs to be crossed to obtain firm evidence for a significant difference. However, if an intervention is compared with placebo and the outcome is, for example, an adverse or harmful event, the TSA may be too restrictive, as such events are often poorly reported [37].

The main idea of TSA is to reduce the risk of random errors due to repetitive testing on accumulating data in meta-analysis. Therefore, if a substantial proportion of trials is biased, then any TSA would also provide misleading results. In general, crossing of monitoring boundaries for TSALBHIS before the information size is reached indicates that high-bias risk trials find a larger intervention effect compared to low-bias risk trials. However, TSA should still be combined with bias-risk assessment in, for example, subgroup analyses, funnel plots, and meta-regression analyses [1].

5. Conclusion

The interpretation of meta-analyses is complex. To adjust for random error risk in meta-analyses, we suggest applying TSA (e.g., TSA with a relevant prespecified intervention effect in combination with TSALBHIS) on meta-analyses. In this way, authors and readers of meta-analyses may reach a more balanced conclusion on the effect of interventions.

References

[1] Higgins JPT, Green S, editors. Cochrane handbook for systematic

reviews of interventions 4.2.5. Accessed August 31, 2006. Available

at http://www.cochrane.org/resources/handbook/hbook.htm.

[2] Young C, Horton R. Putting clinical trials into context. Lancet

2005;366(9480):107e8.

[3] LeLorier J, Gregoire G, Benhaddad A, Lapierre J, Derderian F. Dis-

crepancies between meta-analyses and subsequent large randomized,

controlled trials. N Engl J Med 1997;337(8):536e42.

[4] Goodman SN. Have you ever meta-analysis you didn’t like? Ann

Intern Med 1991;114(3):244e6.

[5] Dickersin K, Rennie D. Registering clinical trials. JAMA 2003;290:

516e23.

[6] Song F, Eastwood AJ, Gilbody S, Duley L, Sutton AJ. Publication

and related biases. Health Technol Assess 2000;4:1e15.

[7] Ioannidis JP. Contradicted and initially stronger effects in highly

cited clinical research. JAMA 2005;294:218e28.

[8] Schulz KF, Chalmers I, Hayes R, Altman DG. Empirical evidence of

bias. Dimensions of methodological quality associated with estimates

of treatment in controlled trials. JAMA 1995;273:408e12.

[9] Moher D, Pham B, Jones A, Cook DJ, Jadad AR, Moher M, et al.

Does quality of reports of randomised trials affect estimates of

intervention efficacy reported in meta-analyses. Lancet 1998;352:

609e13.

[10] Kjaergard LL, Villumsen J, Gluud C. Reported methodological qual-

ity and discrepancies between large and small randomized trials in

meta-analyses. Ann Intern Med 2001;135:982e9.

[11] Als-Nielsen B, Gluud LL, Gluud C. Methodological quality and

treatment effects in randomised trialsda review of six empirical

studies. [abstract]. 12th International Cochrane Colloquium, Ottawa

2004. Accessed August 31, 2006. Available at http://www.cochrane.

org/colloquia/abstracts/ottawa/O-072.htm.

[12] Chan AW, Hrobjartsson A, Haahr MT, Gotzsche PC, Altman DG.

Empirical evidence for selective reporting of outcomes in

7J. Brok et al. / Journal of Clinical Epidemiology - (2008) -

ARTICLE IN PRESS

randomized trials: comparison of protocols to published articles.

JAMA 2004;291:2457e65.

[13] Montori VM, Devereaux PJ, Adhikari NK, Burns KE, Eggert CH,

Briel M, et al. Randomised trials stopped early for benefit: a system-

atic review. JAMA 2005;294:2203e9.

[14] Als-Nielsen B, Chen W, Gluud LL, Siersma V, Hilden J, Gluud C. Are trial size and reported methodological quality associated with treatment effects? Observational study of 523 randomised trials. 12th International Cochrane Colloquium, Ottawa 2004. Accessed August 23, 2006. Available at http://pubhealth.ku.dk/~vosi/files/P-003.ppt.

[15] Ioannidis JP. Why most published research findings are false. PLoS Med 2005;2(8):e124.

[16] Wetterslev J, Thorlund K, Brok J, Gluud C. Trial sequential analysis may establish when firm evidence is reached in cumulative meta-analysis. J Clin Epidemiol 2008;61:64-75.

[17] Pogue J, Yusuf S. Overcoming the limitations of current meta-analysis of randomised controlled trials. Lancet 1998;351(9095):47-52.

[18] Pogue J, Yusuf S. Cumulating evidence from randomized trials: utilizing sequential monitoring boundaries for cumulative meta-analysis. Control Clin Trials 1997;18:580-93.

[19] Pocock SJ. Group sequential methods in the design and analysis of clinical trials. Biometrika 1977;64(2):191-9.

[20] Lan KKG, DeMets DL. Discrete sequential boundaries for clinical trials. Biometrika 1983;70:659-63.

[21] Devereaux PJ, Beattie WS, Choi PT, Badner NH, Guyatt GH, Villar JC, et al. How strong is the evidence for the use of perioperative beta blockers in non-cardiac surgery? Systematic review and meta-analysis of randomised controlled trials. BMJ 2005;331(7512):313-21.

[22] Egger M, Smith DG, Sterne JAC. Meta-analysis. Is moving the goal post the answer? Lancet 1998;351:1517.

[23] Brok J, Greisen G, Jacobsen T, Gluud LL, Gluud C. Agreement between Cochrane Neonatal Group reviews and clinical guidelines for newborns at a Copenhagen University Hospital. Acta Paediatr 2007;96:39-43.

[24] In: The Cochrane Library, Vol. 4. Chichester: Wiley; 2004. Accessed August 31, 2006. Available at http://www3.interscience.wiley.com/cgi-bin/mrwhome/106568753/HOME.

[25] Higgins JP, Thompson SG. Quantifying heterogeneity in a meta-analysis. Stat Med 2002;21:1539-58.

[26] DeMets DL. Methods for combining randomized clinical trials: strengths and limitations. Stat Med 1987;6:341-50.

[27] DerSimonian R, Laird N. Meta-analysis in clinical trials. Control Clin Trials 1986;7(3):177-88.

[28] Soll RF, Sinclair JC, Bracken MB, Horbar JD, Haughton DE. The Editorial Team Neonatal Group. The Cochrane Neonatal Review Group. About the Cochrane Collaboration (Collaborative Review Groups (CRGs)). Issue 4. Art. No.: NEONATAL. Available at http://neonatal.cochrane.org/en/index.html. Accessed Nov 14, 2007.

[29] Engels EA, Schmid CH, Terrin N, Olkin I, Lau J. Heterogeneity and statistical significance in meta-analysis: an empirical study of 125 meta-analyses. Stat Med 2000;19:1707-28.

[30] Gluud LL. Bias in clinical intervention research. Am J Epidemiol 2006;163(6):493-501.

[31] Alderson P. Absence of evidence is not evidence of absence. BMJ 2004;328(7438):476-7.

[32] Moja LP, Telaro E, D’Amico R, Moschetti I, Coe L, Liberati A. Assessment of methodological quality of primary studies by systematic reviews: results of the metaquality cross sectional study. BMJ 2005;330(7499):1053.

[33] Thorlund K, Devereaux PJ, Wetterslev JW, Guyatt G, Ioannidis JP, Thabane L, et al. Can trial sequential monitoring boundaries reduce spurious inferences from meta-analyses? Cochrane Colloquium, Sao Paulo, 2007. Abstract book, O54.

[34] Doust J, Del Mar C. Why do doctors use treatments that do not work? BMJ 2004;328(7438):474-5.

[35] Kumar A, Soares H, Djulbegovic B. High proportion of high quality randomized clinical trials conducted by the NCI are negative or inconclusive. 13th International Cochrane Colloquium, Melbourne 2005. Accessed August 31, 2006. Available at http://www.cochrane.org/colloquia/abstracts/melbourne/O-38.htm.

[36] Fedorov V, Jones B. The design of multicentre trials. Stat Methods Med Res 2005;14(3):205-48.

[37] Ioannidis JP, Lau J. Completeness of safety reporting in randomized trials: an evaluation of 7 medical areas. JAMA 2001;285:437-43.

[38] Ho JJ, Subramaniam P, Henderson-Smart DJ, Davis PG. Continuous distending pressure for respiratory distress syndrome in preterm infants. Cochrane Database Syst Rev 2002; Issue 2. Art. No.: CD002271. doi:10.1002/14651858.CD002271.

[39] Finer NN, Barrington KJ. Nitric oxide for respiratory failure in infants born at or near term. Cochrane Database Syst Rev 2001; Issue 4. Art. No.: CD000399. doi:10.1002/14651858.CD000399.

[40] Soll RF, Blanco F. Natural surfactant extract versus synthetic surfactant for neonatal respiratory distress syndrome. Cochrane Database Syst Rev 2001; Issue 2. Art. No.: CD000144. doi:10.1002/14651858.CD000144.

[41] Darlow BA, Graham PJ. Vitamin A supplementation for preventing morbidity and mortality in very low birthweight infants. Cochrane Database Syst Rev 2002; Issue 4. Art. No.: CD000501. doi:10.1002/14651858.CD000501.