
Page 1: NHS R&D HTA Programme HTA · 2006-05-15 · that high-quality research information on the costs, effectiveness and broader impact of health technologies is produced in the most efficient

Bayesian methods in health technology assessment: a review

DJ Spiegelhalter, JP Myles, DR Jones, KR Abrams

HTA: Health Technology Assessment NHS R&D HTA Programme

Health Technology Assessment 2000; Vol. 4: No. 38

Methodology

Copyright Notice
© Queen's Printer and Controller of HMSO 2001. HTA reports may be freely reproduced for the purposes of private research and study and may be included in professional journals provided that suitable acknowledgement is made and the reproduction is not associated with any form of advertising. Violations should be reported to [email protected]. Applications for commercial reproduction should be addressed to HMSO, The Copyright Unit, St Clements House, 2–16 Colegate, Norwich NR3 1BQ.

How to obtain copies of this and other HTA Programme reports. An electronic version of this publication, in Adobe Acrobat format, is available for downloading free of charge for personal use from the HTA website (http://www.ncchta.org). A fully searchable CD-ROM is also available (see below).

Printed copies of HTA monographs cost £20 each (post and packing free in the UK) to both public and private sector purchasers from our Despatch Agents, York Publishing Services.

Non-UK purchasers will have to pay a small fee for post and packing. For European countries the cost is £2 per monograph and for the rest of the world £3 per monograph.

You can order HTA monographs from our Despatch Agents, York Publishing Services, by:

– fax (with credit card or official purchase order)
– post (with credit card or official purchase order or cheque)
– phone during office hours (credit card only).

Additionally, the HTA website allows you either to pay securely by credit card or to print out your order and then post or fax it.

Contact details are as follows:
York Publishing Services, PO Box 642, YORK YO31 7WX, UK
Email: [email protected]
Tel: 0870 1616662
Fax: 0870 1616663
Fax from outside the UK: +44 1904 430868

NHS and public libraries can subscribe at a very reduced cost of £100 for each volume (normally comprising 30–40 titles). The commercial subscription rate is £300 per volume. Please contact York Publishing Services at the address above. Subscriptions can only be purchased for the current or forthcoming volume.

Payment methods

Paying by cheque
If you pay by cheque, the cheque must be in pounds sterling, made payable to York Publishing Distribution and drawn on a bank with a UK address.

Paying by credit card
The following cards are accepted by phone, fax, post or via the website ordering pages: Delta, Eurocard, Mastercard, Solo, Switch and Visa. We advise against sending credit card details in a plain email.

Paying by official purchase order
You can post or fax these, but they must be from public bodies (i.e. NHS or universities) within the UK. We cannot at present accept purchase orders from commercial companies or from outside the UK.

How do I get a copy of HTA on CD?

Please use the form on the HTA website (www.ncchta.org/htacd.htm). Or contact York Publishing Services (see contact details above) by email, post, fax or phone. HTA on CD is currently free of charge worldwide.

The website also provides information about the HTA Programme and lists the membership of the various committees.



Bayesian methods in health technology assessment: a review

DJ Spiegelhalter,1* JP Myles,1 DR Jones,2 KR Abrams2

1 MRC Biostatistics Unit, Institute of Public Health, Cambridge, UK
2 Department of Epidemiology and Public Health, University of Leicester, Leicester, UK

* Corresponding author

Competing interests: none declared

Published December 2000

This report should be referenced as follows:

Spiegelhalter DJ, Myles JP, Jones DR, Abrams KR. Bayesian methods in health technology assessment: a review. Health Technol Assess 2000;4(38).

Health Technology Assessment is indexed in Index Medicus/MEDLINE and Excerpta Medica/EMBASE. Copies of the Executive Summaries are available from the NCCHTA website (see opposite).


NHS R&D HTA Programme

The NHS R&D Health Technology Assessment (HTA) Programme was set up in 1993 to ensure that high-quality research information on the costs, effectiveness and broader impact of health technologies is produced in the most efficient way for those who use, manage and provide care in the NHS.

Initially, six HTA panels (pharmaceuticals, acute sector, primary and community care, diagnostics and imaging, population screening, methodology) helped to set the research priorities for the HTA Programme. However, during the past few years there have been a number of changes in and around NHS R&D, such as the establishment of the National Institute for Clinical Excellence (NICE) and the creation of three new research programmes: Service Delivery and Organisation (SDO); New and Emerging Applications of Technology (NEAT); and the Methodology Programme.

Although the National Coordinating Centre for Health Technology Assessment (NCCHTA) commissions research on behalf of the Methodology Programme, it is the Methodology Group that now considers and advises the Methodology Programme Director on the best research projects to pursue.

The research reported in this monograph was funded as project number 93/50/05.

The views expressed in this publication are those of the authors and not necessarily those of the Methodology Programme, HTA Programme or the Department of Health. The editors wish to emphasise that funding and publication of this research by the NHS should not be taken as implicit support for any recommendations made by the authors.

Methodology Programme Director: Professor Richard Lilford
HTA Programme Director: Professor Kent Woods
Series Editors: Professor Andrew Stevens, Dr Ken Stein and Professor John Gabbay
Monograph Editorial Manager: Melanie Corris

The editors and publisher have tried to ensure the accuracy of this report but do not accept liability for damages or losses arising from material published in this report. They would like to thank the referees for their constructive comments on the draft document.

ISSN 1366-5278

© Queen’s Printer and Controller of HMSO 2000

This monograph may be freely reproduced for the purposes of private research and study and may be included in professional journals provided that suitable acknowledgement is made and the reproduction is not associated with any form of advertising.

Applications for commercial reproduction should be addressed to HMSO, The Copyright Unit, St Clements House, 2–16 Colegate, Norwich, NR3 1BQ.

Published by Core Research, Alton, on behalf of the NCCHTA. Printed on acid-free paper in the UK by The Basingstoke Press, Basingstoke.

Criteria for inclusion in the HTA monograph series
Reports are published in the HTA monograph series if (1) they have resulted from work commissioned for the HTA Programme, and (2) they are of a sufficiently high scientific quality as assessed by the referees and editors.

Reviews in Health Technology Assessment are termed 'systematic' when the account of the search, appraisal and synthesis methods (to minimise biases and random errors) would, in theory, permit the replication of the review by others.


Contents

List of abbreviations ......... i
Executive summary ......... iii

1 Introduction ......... 1
   What are Bayesian methods? ......... 1
   Reasons for conducting a review ......... 1
   Objectives ......... 2
   Review methodology ......... 2
   Structure of the review ......... 3
   Key points ......... 4

2 An overview of the Bayesian philosophy in the context of health technology assessment ......... 5
   The 'classical' statistical approach in health technology assessment ......... 5
   Critique of the classical approach ......... 5
   Bayes's theorem ......... 7
   Reporting probability statements ......... 8
   The subjective interpretation of probability ......... 8
   The relation to the use of Bayes's theorem in diagnostic tests ......... 9
   The prior distribution ......... 9
   Predictions ......... 10
   Sequential analysis ......... 10
   Decision-making ......... 11
   Hypothesis testing ......... 11
   Design ......... 12
   Multiplicity ......... 12
   Complex modelling ......... 12
   Computational issues ......... 12
   The theoretical justification for the Bayesian approach ......... 13
   Schools of Bayesians ......... 13
   Making the health technology assessment context explicit ......... 13
   Further reading ......... 14
   Commentary ......... 14
   Key points ......... 15

3 Where does the prior distribution come from? ......... 17
   Introduction ......... 17
   Elicitation of opinion ......... 17
   Summary of evidence ......... 18
   Default priors ......... 19
   'Robust' priors ......... 20
   Exchangeability, hierarchical models and multiplicity ......... 21
   Empirical criticism of priors ......... 22
   Commentary ......... 22
   Key points ......... 23

4 A guide to the Bayesian health technology assessment literature: conduct of randomised controlled trials ......... 25
   General arguments ......... 25
   Ethics and randomisation ......... 25
   Specification of null hypotheses ......... 27
   Using historical controls ......... 27
   Design: sample size of non-sequential trials ......... 28
   Design and monitoring of sequential trials ......... 29
   The role of 'scepticism' in confirmatory studies ......... 33
   Reporting, sensitivity analysis, and robustness ......... 33
   Subset analysis ......... 35
   Multicentre analysis ......... 36
   Multiple end-points and treatments ......... 36
   Data-dependent allocation ......... 37
   Trial designs other than two parallel groups ......... 37
   Other aspects of drug development ......... 38
   Commentary ......... 39
   Key points ......... 41

5 A guide to the Bayesian health technology assessment literature: observational studies ......... 43
   Introduction ......... 43
   Case–control studies ......... 43
   Complex epidemiological models ......... 43
   Explicit modelling of biases ......... 44
   Institutional comparisons ......... 44
   Commentary ......... 44
   Key points ......... 44

6 A guide to the Bayesian health technology assessment literature: evidence synthesis ......... 47
   Meta-analysis ......... 47
   Cross-design synthesis ......... 48
   Confidence profile method ......... 49
   Key points ......... 49

7 A guide to the Bayesian health technology assessment literature: strategy, decisions and policy making ......... 51
   Contexts ......... 51
   Cost-effectiveness within trials ......... 51
   Cost-effectiveness of carrying out trials – 'payback models' ......... 52
   The regulatory perspective ......... 53
   Policy making and 'comprehensive decision modelling' ......... 54
   Key points ......... 54

8 BayesWatch: a Bayesian checklist for health technology assessment ......... 55
   Introduction ......... 55
   Methods ......... 55
   Results ......... 56
   Interpretation ......... 56
   Example ......... 56

9 Case study 1: the CHART (lung cancer) trial ......... 59

10 Case study 2: meta-analysis of magnesium sulphate following acute myocardial infarction ......... 63

11 Case study 3: confidence profiling revisited ......... 69
   Analysis of surveillance of colorectal cancer patients: a modelling exercise based entirely on judgements ......... 69
   Analysis of HIP trial of breast cancer screening: adjusting a trial's result for uncertain internal biases ......... 70
   Analysis of screening for maple syrup urine disease (MSUD): modelling using evidence from multiple studies ......... 72
   Analysis of colon cancer screening trial: power calculations allowing for cross-over between treatment arms ......... 74
   Commentary ......... 75

12 Case study 4: comparison of in vitro fertilisation clinics ......... 77

13 Conclusions and implications for future research ......... 83
   Introduction ......... 83
   Specific conclusions ......... 83
   General advantages and problems ......... 84
   Future research and development ......... 85

Acknowledgements ......... 87

References ......... 89

Appendix 1 Three-star applications ......... 105

Appendix 2 Websites and software ......... 121

Health Technology Assessment reports published to date ......... 123

Methodology Group ......... 129

HTA Commissioning Board ......... 130


List of abbreviations

6-MP      6-mercaptopurine
AMI       acute myocardial infarction
CALGB     Cancer and Leukaemia Group B
CDRH      Center for Devices and Radiological Health
CHART     continuous hyperfractionated accelerated radiotherapy
CMT       conventional medical treatment
CRM       continual reassessment method
DI        donor insemination
DMC       Data Monitoring Committee
ECMO      extracorporeal membrane oxygenation
ENBS      expected net benefit from sampling
EVPI      expected value of perfect information
EVSI      expected value of sample information
FDA       Food and Drug Administration
GREAT     Grampian Region Early Anistreplase Trial
GUSTO     Global Utilization of Streptokinase and t-PA for Occluded Coronary Arteries
HFEA      Human Fertilisation and Embryology Authority
HIB       Haemophilus influenzae type b
ISIS-4    Fourth International Study of Infarct Survival
IVF       in vitro fertilisation
LIMIT-2   Second Leicester Intravenous Magnesium Intervention Trial
MCMC      Markov chain Monte Carlo
MSUD      maple syrup urine disease
NCI       National Cancer Institute
NSABP     National Surgical Adjuvant Breast and Bowel Project
PORT      Patient Outcomes Research Team
SPPM      Stroke Prevention Policy Model
TRACE     Trandolapril Cardiac Evaluation

Executive summary

Background

Bayesian methods may be defined as the explicit quantitative use of external evidence in the design, monitoring, analysis, interpretation and reporting of a health technology assessment. In outline, the methods involve formal combination, through the use of Bayes's theorem, of:

1. a prior distribution or belief about the value of a quantity of interest (for example, a treatment effect) based on evidence not derived from the study under analysis, with

2. a summary of the information concerning the same quantity available from the data collected in the study (known as the likelihood), to yield

3. an updated or posterior distribution of the quantity of interest.

These methods thus directly address the question of how new evidence should change what we currently believe. They extend naturally into making predictions, synthesising evidence from multiple sources, and designing studies: in addition, if we are willing to quantify the value of different consequences as a 'loss function', Bayesian methods extend into a full decision-theoretic approach to study design, monitoring and eventual policy decision-making. Nonetheless, Bayesian methods are a controversial topic in that they may involve the explicit use of subjective judgements in what is conventionally supposed to be a rigorous scientific exercise.
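In symbols (a standard statement of Bayes's theorem rather than a formula quoted from this report), steps 1–3 combine as

```latex
\underbrace{p(\theta \mid \text{data})}_{\text{posterior (3)}}
  \;\propto\;
\underbrace{p(\text{data} \mid \theta)}_{\text{likelihood (2)}}
  \times
\underbrace{p(\theta)}_{\text{prior (1)}}
```

where θ denotes the quantity of interest, such as a treatment effect.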

Objectives

This report is intended to provide:

1. a brief review of the essential ideas of Bayesian analysis

2. a full structured review of applications of Bayesian methods to randomised controlled trials, observational studies, and the synthesis of evidence, in a form which should be reasonably straightforward to update

3. a critical commentary on similarities and differences between Bayesian and conventional approaches

4. criteria for assessing the reporting of a Bayesian analysis

5. a comprehensive list of published 'three-star' examples, in which a proper prior distribution has been used for the quantity of primary interest

6. tutorial case studies of a variety of types

7. recommendations on how Bayesian methods and approaches may be assimilated into health technology assessments in a variety of contexts and by a variety of participants in the research process.

Methods

The BIDS ISI database was searched using the terms 'Bayes' or 'Bayesian'. This yielded almost 4000 papers published in the period 1990–98. All resultant abstracts were reviewed for relevance to health technology assessment; about 250 were so identified, and used as the basis for forward and backward searches. In addition, EMBASE and MEDLINE databases were searched, along with websites of prominent authors and available personal collections of references, finally yielding nearly 500 relevant references. A comprehensive review of all references describing use of 'proper' Bayesian methods in health technology assessment (those which update an informative prior distribution through the use of Bayes's theorem) has been attempted, and around 30 such papers are reported in structured form. There has been very limited use of proper Bayesian methods in practice, and relevant studies appear to be relatively easily identified.

Results

Bayesian methods in the health technology assessment context

1. Different contexts may demand different statistical approaches. Prior opinions are most valuable when the assessment forms part of a series of similar studies. A decision-theoretic approach may be appropriate where the consequences of a study are reasonably predictable.



2. The prior distribution is important and not unique, and so a range of options should be examined in a sensitivity analysis. Bayesian methods are best seen as a transformation from initial to final opinion, rather than providing a single 'correct' inference.

3. The use of a prior is based on judgement, and hence a degree of subjectivity cannot be avoided. However, subjective priors tend to show predictable biases, and archetypal priors may be useful for identifying a reasonable range of prior opinion. For a prior to be taken seriously, its evidential basis must be explicitly given.

4. The Bayesian approach provides a framework for considering the ethics of randomisation.

5. Monitoring trials with sceptical and other priors may provide a unified approach to assessing whether the results of a trial should be convincing to a wide range of reasonable opinion, and could provide a formal tool for data-monitoring committees.
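As an illustration of the kind of calculation such a committee might make, the following is a minimal sketch on a normal approximation; the sceptical prior and interim figures are invented for illustration and are not taken from this report.

```python
from math import erf, sqrt

def posterior_prob_benefit(prior_mean, prior_sd, est, se):
    """Normal prior combined with a normal likelihood (conjugate update);
    returns the posterior probability that the treatment effect exceeds 0."""
    w0, w1 = 1 / prior_sd**2, 1 / se**2          # precisions of prior and data
    post_var = 1 / (w0 + w1)
    post_mean = post_var * (w0 * prior_mean + w1 * est)
    z = post_mean / sqrt(post_var)
    return 0.5 * (1 + erf(z / sqrt(2)))          # standard normal CDF at z

# Sceptical prior centred on 'no effect' (mean 0, sd 0.2), against a
# hypothetical interim estimate of 0.3 (se 0.15) favouring the new treatment
print(round(posterior_prob_benefit(0.0, 0.2, 0.3, 0.15), 3))
```

Under these invented numbers the sceptical posterior probability of benefit is about 0.95, lower than the roughly 0.98 the trial data alone would suggest; varying the prior standard deviation shows how much scepticism the accumulated evidence can overcome.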

6. In contrast to earlier phases of development, it is generally unrealistic to formulate a Phase III trial as a decision problem, except in circumstances where future treatments can be accurately predicted.

7. Observational data will generally require more complex analysis: the explicit modelling of potential biases may be widely applicable but needs some evidence base in order to be convincing.

8. A unified Bayesian approach is applicable to a wide range of problems concerned with evidence synthesis, for example in pooling studies of differing designs in the assessment of medical devices.

9. Priors for the degree of 'similarity' between alternative designs can be empirically informed by studies comparing the results of randomised controlled trials and observational data.

10. Increased attention to pharmaco-economics should lead to further investigation of decision-theoretic models for research planning, although this will not be straightforward.

11. Regulatory agencies are acknowledging Bayesian methods and have not ruled out their use, and the regulation of medical devices is leading the way in establishing the role of evidence synthesis.

12. 'Comprehensive decision modelling' is likely to become increasingly important in policy making.

13. The BayesWatch criteria described in this report may provide a basis for structured reporting of Bayesian analysis.

14. Summaries of fully fledged ('three-star') applications of Bayesian methods in health technology assessment contain few prospective analyses but provide useful guidance.

15. Four case studies show:
   a. Bayesian analyses using a sceptical prior can be useful to the data-monitoring committee of a cancer clinical trial.
   b. Bayesian methods can be used to temper overoptimistic conclusions based on meta-analysis of small trials.
   c. Modern graphical software can easily handle complex assessments previously analysed using the 'confidence profile' method.
   d. Bayesian methods provide a flexible tool for performance estimation and ranking of institutions.

Recommendations and implications for future research and development

Bayesian methods could be of great value within health technology assessment, but for a realistic appraisal of the methodology, it is necessary to distinguish the roles and requirements for five main participant groups in health technology assessment: methodological researchers, sponsors, investigators, reviewers and consumers. Two common themes for all participants can immediately be identified. First, the need for an extended set of case studies showing practical aspects of the Bayesian approach, in particular for prediction and handling multiple substudies, in which mathematical details are minimised but details of implementation are provided. Second, the development of standards for the performance and reporting of Bayesian analyses, possibly derived from the BayesWatch checklist.

Some specific potential areas of research and development include:

1. Design. Realistic development of payback models and consideration of 'open' studies.

2. Priors. Investigation of evidence-based prior distributions appropriate to the participant group, as well as reasonable default priors in non-standard situations.

3. Modelling. Efficient use of all available evidence by appropriate joint modelling of historical controls, related studies, and so on.

4. Reporting. Development of criteria along the lines of the BayesWatch checklist, so that future users can reproduce analyses.

5. Decision-making. Increased integration with a health-economic and policy perspective, together with flexible tools for implementation.


Chapter 1

Introduction

What are Bayesian methods?

Bayesian statistics began with a posthumous publication in 1763 by Thomas Bayes [36], a non-conformist minister from Tunbridge Wells [237]. His work was formalised as Bayes's theorem, which, when expressed mathematically, is a simple and uncontroversial result in probability theory. However, certain specific uses of the theorem have been the subject of continued controversy for over a century [128,170], giving rise to a steady stream of polemical arguments in a number of disciplines. In recent years a more balanced and pragmatic perspective has developed.

The basic idea of Bayesian analysis can be illustrated by a simple example and, although we shall try to keep mathematical notation to a minimum in this review, it will be very helpful if we are allowed one Greek letter θ (theta), to denote a currently unknown quantity of primary interest. Suppose our quantity θ is the median life-years gained by using an innovative rather than a standard therapy on a defined group of patients. A clinical trial is carried out, following which conventional statistical analysis of the results would typically produce a P value, an estimate and a confidence interval as summaries of what this particular trial tells us about θ. A Bayesian analysis supplements this by focusing on the question 'How should this trial change our opinion about θ?' This perspective forces the analyst to explicitly state

• a reasonable opinion concerning θ excluding the evidence from the trial (known as the prior distribution)

• the support for different values of θ based solely on data from the trial (known as the likelihood)

and to combine these two sources to produce

• a final opinion about θ (known as the posterior distribution).

The final combination is done using Bayes's theorem.
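To make the arithmetic concrete, here is a minimal sketch of such an update under normal approximations; the prior opinion and trial figures for θ below are invented for illustration and are not drawn from any study reviewed in this report.

```python
from math import sqrt

def normal_posterior(prior_mean, prior_sd, data_mean, data_se):
    """Combine a normal prior with a normal likelihood (conjugate update).

    The posterior mean is a precision-weighted average of the prior mean
    and the trial estimate, so the data pull the prior towards the trial."""
    p0, p1 = 1 / prior_sd**2, 1 / data_se**2     # precisions
    post_sd = sqrt(1 / (p0 + p1))
    post_mean = (p0 * prior_mean + p1 * data_mean) / (p0 + p1)
    return post_mean, post_sd

# Hypothetical numbers: prior opinion that theta (median life-years gained)
# is around 0.5 (sd 1.0); the trial alone estimates 2.0 (se 0.8)
m, s = normal_posterior(0.5, 1.0, 2.0, 0.8)
print(f"posterior: mean {m:.2f}, sd {s:.2f}")
```

The posterior mean lies between the prior opinion and the trial estimate, weighted by their precisions, and the posterior standard deviation is smaller than that of either source alone.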

What, then, is a Bayesian approach to health technology assessment? We have defined it [415] as "the explicit quantitative use of external evidence in the design, monitoring, analysis, interpretation and reporting of a health technology assessment".

Reasons for conducting a review

Much of the standard statistical methodology used in health technology assessment revolves around that for the classical randomised controlled trial: this includes power calculations at the design stage, methods for controlling type I error within sequential monitoring, calculation of P values and confidence intervals at the final analysis, and meta-analytic techniques for pooling the results of multiple studies. Such methods have served the medical research community well.

The increasing sophistication of health technology assessment studies is, however, highlighting the limitations of these traditional methods. For example, when carrying out a clinical trial, the many sources of evidence and judgement available before a trial may be inadequately summarised by a single 'alternative hypothesis', monitoring may be complicated by simultaneous publication of related studies, and multiple subgroups may need to be analysed and reported. Evidence from multiple sources may need to be combined in order to inform a policy decision, such as embarking on or continuing a research programme, regulatory approval of a drug or device, or recommendation of a treatment at an individual or population level. Standard statistical methods are designed for single studies, and have difficulty dealing with this pervading complexity.

A Bayesian perspective leads to an approach to clinical trials and observational studies that is claimed to be more flexible and ethical than traditional methods [262], and to elegant ways of handling multiple substudies, for example when simultaneously estimating the effects of a treatment on many subgroups [72]. Proponents have also argued that a Bayesian approach enables one to provide conclusions in a suitable form for making decisions: whether for specific patients, for planning research, or for public policy [299].

The increasing interest in the Bayesian approach is reflected both in the medical and statistical literature and in the popular scientific press [318].

Pharmaceutical companies are beginning to express an interest, possibly helped by the recent international regulatory authority statistical guidelines [248] explicitly mentioning the possibility of a Bayesian analysis. However, many outstanding questions remain: in particular, to what extent will the scientific community, or the regulatory authorities, allow the explicit introduction of evidence that is not totally derived from observed data, or the formal pooling of data from studies of differing designs?

Objectives

This report is intended to provide:

1. a brief review of the essential ideas of Bayesian analysis

2. a full structured review of applications of Bayesian methods to randomised controlled trials, observational studies and the synthesis of evidence, in a form which should be reasonably straightforward to update

3. a critical commentary on similarities and differences between Bayesian and conventional approaches

4. criteria for assessing the reporting of a Bayesian analysis

5. a comprehensive list of published ‘three-star’ examples (see the following section for a definition of this term), in which a proper prior distribution has been used for the quantity of primary interest

6. tutorial case studies of a variety of types

7. recommendations on how Bayesian methods and approaches may be assimilated into health technology assessments in a variety of contexts and by a variety of participants in the research process.

Review methodology

What do we mean by a ‘systematic’ review?

In common with all such methodological reviews, it is essential to define what we mean by ‘systematic’. We have identified three levels of review:

Comprehensive. This seeks to identify all relevant references, and has only been attempted for what we have termed ‘three-star’ Bayesian health technology assessment studies, which we define as those

1. intending to confirm the value of a technology

2. using an informative, carefully considered prior distribution for the primary quantity of interest

3. updating, or planning to update, this prior distribution by Bayes’s theorem.

Such three-star studies have been summarised according to a standard pro forma, and are reported in appendix 1. We note that we do not require such studies to be prospective, in that currently most Bayesian examples are re-analyses of previous studies.

The following are therefore not considered as three-star studies:

1. exploratory or Phase II studies

2. those using a ‘minimally informative’ or reference prior, or only using an informative prior for a nuisance parameter such as between-study heterogeneity

3. decision analyses in which expected utilities are assessed without any updating of beliefs using Bayes’s theorem.

Systematic. This is based on a structured search and reporting of the literature, and covers many areas which are not comprehensively reviewed, such as Bayesian analyses using reference priors and Phase I and Phase II studies. An attempt has been made to identify the majority of the relevant literature.

Peripheral. Minimal references are provided on topics such as using Bayes’s theorem for prognostic or diagnostic statements, ‘empirical Bayes’ analysis which uses elements of Bayesian modelling without giving a Bayesian interpretation to the conclusions, preclinical work, pharmacokinetics, decision analysis, and descriptive studies of clinicians’ personal beliefs.

We emphasise that our definition of ‘Bayesian’ may be more restrictive than that of other researchers who may, for example, place a much higher emphasis on decision-making using subjective probabilities, without necessarily requiring the use of Bayes’s theorem.

The search procedure

A Bayesian approach can be applied to many scientific issues, and a search of the BIDS ISI database using the term ‘Bayes’ or ‘Bayesian’ yielded nearly 4000 papers over the period 1990–98. All of the abstracts were handsearched for relevant material, and about 300 of these were relevant to health technology assessment. These were used as a source for forward and backward searches, and further techniques included searching other databases (EMBASE and MEDLINE), personal collections, handsearching recent journals, and internet searches of prominent authors.

Since it is very difficult to identify appropriate health technology assessment literature from keywords, we would recommend anyone conducting a search for Bayesian methods in health technology assessment to use ‘bayes’ and ‘bayesian’ in all searches and then view all abstracts.

We identified about 450 relevant papers, including around 30 reports of studies taking a ‘three-star’ Bayesian perspective. The published studies are dispersed throughout the literature, apart from one recent collection of papers,45 and the only general textbook which might be considered as covering Bayesian health technology assessment is on the confidence profile approach.150

It will be clear that the studies are mainly demonstrations of the approach rather than complete assessments, and in spite of numerous articles promoting the use of Bayesian methods, the practical take-up seems very low, although increasing. Possible reasons for this will be discussed in our review. There is also a preponderance of articles in the literature on methodology for clinical trials, and an apparent lack of articles on the more complex issues of synthesising data from studies of different designs, in spite of this being an area where Bayesian methods may have much to offer.

Structure of the review

• Chapter 2 briefly reviews the ‘classical’ statistical approach to health technology assessment, and then outlines the main features of the Bayesian philosophy, including the subjective interpretation of probability, the relation to diagnostic testing, predictions, decision-making and design. There is a brief description of computational methods, complex Bayesian modelling, and the theoretical justification for the approach. Schools of Bayesians are identified, and a commentary attempts to sort out the major ideological issues.

• Chapter 3 deals in detail with the possible sources of prior distributions and their possible criticism in the light of data, and introduces the concept of exchangeability and its relation to hierarchical or multilevel prior distributions.

• Chapter 4 attempts to structure the large literature on Bayesian approaches to all aspects of randomised controlled trials, including sequential analysis, reporting, cost-effectiveness analysis and the stages of drug development.

• Chapter 5 covers observational studies, such as case–control designs, the use of historical controls, and non-randomised comparisons of institutions.

• Chapter 6 considers meta-analysis and its generalisations, in which evidence from multiple studies, possibly of different designs, is pooled using a statistical model.

• Chapter 7 examines how Bayesian analyses for randomised or non-randomised studies may be placed in a concrete decision-making context in order to inform either commercial or public policies, possibly with explicit costs on the consequences of alternative strategies. The view of alternative ‘actors’ is emphasised.

• Chapter 8 discusses the reporting of Bayesian studies, sets out criteria for assessing the quality of a Bayesian analysis, and provides an example.

• Chapter 9 is a case study in which a sequential cancer clinical trial, the continuous hyperfractionated accelerated radiotherapy (CHART) study, was monitored using a Bayesian procedure.

• Chapter 10 is a case study concerning the much-studied issue of magnesium for acute myocardial infarction (AMI), in which a meta-analysis conflicted with a mega-trial. We show that a reasonably sceptical Bayesian meta-analysis would not have found the initial meta-analysis to be convincing evidence.

• Chapter 11 shows by four small case studies how modern Bayesian software can deal with the complex modelling problems previously analysed using the confidence profile approach.

• Chapter 12 provides an example of an institutional comparison, in which the success rates of UK in vitro fertilisation (IVF) clinics are compared.

• Chapter 13 provides a final summary, general discussion and some suggestions for future research.

• Appendix 1 summarises and lists the ‘three-star’ applications in a structured format, using the criteria outlined in chapter 9.


• Appendix 2 briefly describes available software and internet sites of interest.

Most of the chapters in the review finish with a critical commentary, in which the arguments against the Bayesian perspective are summarised, and the key points for each chapter are repeated in chapter 13. Mathematical and computational methods will be barely mentioned, and details should be sought in the references provided. The review is therefore structured by context rather than methods, although some methodological themes inevitably run throughout; for example, what form of prior distribution is appropriate, and is it reasonable to adopt an explicit loss function?

Finally, we should be quite explicit as to our own subjective biases, which will doubtless be apparent from the organisation and text of this review. We favour the Bayesian philosophy, and would like to see its use extended in health technology assessment, but feel that this should be carried out cautiously, with critical appraisal, and in parallel with the currently accepted methods.

Key points

1. Bayesian methods are defined as the explicit quantitative use of external evidence in the design, monitoring, analysis, interpretation and reporting of a health technology assessment.

2. Bayesian methods are a controversial topic in that they may involve the explicit use of subjective judgements in what is conventionally supposed to be a rigorous scientific exercise in health technology assessment.

3. There has been very limited use of proper Bayesian methods in practice, and relevant studies appear to be relatively easily identified.

4. The potential importance of Bayesian methods to a topic is not necessarily reflected in the volume of published literature: in particular, publications on the design and analysis of single clinical trials dominate those on the synthesis of evidence from studies of multiple designs.


Chapter 2

An overview of the Bayesian philosophy in the context of health technology assessment

In this chapter we give an overview of some of the generic features of the Bayesian philosophy that find application in health technology assessment. Limited references to the literature are given at this stage, but relevant sections within randomised trials, observational studies and evidence synthesis are identified. We shall use a simple running example to illustrate the general issues: the Grampian Region Early Anistreplase Trial (GREAT)204 of early thrombolytic treatment for myocardial infarction, which reported a 49% reduction in mortality (23/148 deaths on control versus 13/163 deaths on active treatment).

The ‘classical’ statistical approach in health technology assessment

It would be misleading to dichotomise statistical methods as either ‘classical’ or ‘Bayesian’, since both terms cover a bewildering range of techniques. It is a little fairer to divide conventional statistics into two broad schools, Fisherian and Neyman–Pearson; different Bayesian approaches will be discussed later in this chapter.

• The Fisherian approach to inference on an unknown intervention effect θ is based on the likelihood function mentioned previously, which expresses the relative support given to the different values of θ by the data. This gives rise to an estimate comprising the ‘most likely’ value for θ, intervals based on the range of values of θ most supported by the data, and the evidence against specified null hypotheses summarised by P values (the chance of getting a result as extreme as that observed were the null hypothesis true).

• The Neyman–Pearson approach is focused on the chances of making various types of error so that, for example, clinical trials are designed to have a fixed type I error α (the chance of incorrectly rejecting the null hypothesis), usually taken as 5% or 1%, and fixed power (one minus the type II error β, the chance of not detecting the alternative hypothesis), often 80% or 90%. The situation is made more complex if a sequential design is used, in which the data are periodically analysed and the trial stopped if sufficiently convincing results are obtained. Repeated analysis of the data has a strong effect on the type I error, since there are many opportunities to obtain a false-positive result, and thus the P value and the confidence interval need adjusting477 (although this rarely appears to be carried out in the published report of the trial). Such sequential analysis is just one example of a problem of ‘multiplicity’, in which adjustments need to be made because multiple analyses are carried out simultaneously. A standard example is the use of Bonferroni adjustments when estimating treatment effects in multiple subsets.

Clinical trials are generally designed from a Neyman–Pearson standpoint, but analysed from a Fisherian perspective.398 Methods used for observational studies and evidence synthesis tend to be more Fisherian.

Advantages of the traditional framework include its apparent separation of the evidence in the data from subjective factors, the general ease of computation, its wide acceptability and established criteria for ‘significance’, its relevance to the drug regulatory framework in which quality control of statistical submissions must be ensured, the availability of software, and the existence of robust non- and semi-parametric procedures.

Critique of the classical approach

Hypothesis testing and P values

Overemphasis on hypothesis testing has been strongly criticised (although shifting attention to confidence intervals does not avoid all the problems, since these are just the set of hypotheses that cannot be rejected at a certain α level). P values are explicitly concerned with the chance of observing the data (or something more extreme) given certain values of the unknown quantity θ, and use an inverse (and frequently misunderstood) argument for deriving statements about θ. Arguments against this procedure include: the null hypothesis may be neither plausible nor of great interest; the arbitrariness of the 0.05 and 0.01 levels; the focus on statistical rather than clinical significance; the problem over one- or two-sided tests; the fact that, even in some simple circumstances such as a 2 × 2 table, the definition of the P value is unclear; and that P values tend to create a false dichotomy between ‘significant’ and ‘non-significant’ which is inappropriate for consequent policy decisions.299 See Schervish392 for a general discussion.

Furthermore, Freeman184 gives a good example of the limitations of P values as expressions of evidence: Table 1 shows the results of four hypothetical trials in which equal numbers of patients are given treatments A and B and asked which they prefer, each trial resulting in an identical ‘significant’ P value of 0.04.

But, as Freeman states, the first trial would be considered too small to permit reliable conclusions, while the last trial (with a preference proportion of 50.07%) would be considered as evidence for rather than against equivalence. The importance of sample size and plausibility of benefits in interpreting P values has often been stressed, and the Fourth International Study of Infarct Survival (ISIS-4) investigators state that “when moderate benefits or negligibly small benefits are both much more plausible than extreme benefits, then a 2p = 0.001 effect in a large trial or overview would provide much stronger evidence of benefit than the same significance level in a small trial, a small overview, or a small subgroup analysis”.109 Sheiner400 provides a strong polemic against hypothesis testing and in favour of an approach in which “we gather data to model and quantify nature”.
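The identical P values in Freeman’s trials (Table 1) can be checked for the two smaller trials with an exact binomial test under the null hypothesis of no preference (the larger trials need a normal approximation to evaluate quickly). A minimal sketch, using the common convention of doubling the upper-tail probability for a two-sided P value:

```python
from math import comb

def two_sided_p(k: int, n: int) -> float:
    """Exact two-sided P value for k of n preferences under H0: p = 0.5,
    computed by doubling the upper-tail binomial probability."""
    upper = sum(comb(n, i) for i in range(k, n + 1)) * 0.5 ** n
    return min(1.0, 2 * upper)

print(round(two_sided_p(15, 20), 3))    # the 20-patient trial in Table 1
print(round(two_sided_p(115, 200), 3))  # the 200-patient trial
```

Both calls give a P value close to 0.04, despite the preference proportions being 75% and about 57%.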

Type I and type II error

Both Bayesians and Fisherians can express strong criticism of Neyman–Pearson theory. Anscombe,13 quoted by Herson,232 says “the concept of error probabilities of the first and second kinds … has no direct relevance to experimentation. The formalism of opinions, decisions concerning further experimentation and other required actions, are not dictated in a simple prearranged way by the formal analysis of the experiment, but call for judgement and imagination”, while Healy224 asks “Why the invariable 5% for α? Conditional on this, why the larger 10% or even 20% for β? Is it really more important not to make a fool of yourself than it is to discover something new?”. Criticism of fixed type I error has particularly been aimed at sequential analysis (see pages 10 and 29), and the relevance of hypothesis testing and decision-making to health technology assessment will be a running theme of this review (see page 14).

Multiple testing

We have already identified the crucial issue that arises in any context in which simultaneous analysis of multiple studies, or multiple analyses of the same study, is required. The traditional approach warns that repeated hypothesis testing is bound to raise the chance of a type I error (wrongly rejecting a true null hypothesis), and so suggests some adjustment, such as Bonferroni, to try to retain a specified overall type I error. This will typically give larger P values and wider confidence intervals: it has been shown that such results would be consistent with a rather odd prior distribution in which a constant probability is given to all the null hypotheses being true, regardless of their number.472
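As a minimal illustration of the Bonferroni idea (the number of tests and the P values below are invented for illustration):

```python
# Family-wise error control by Bonferroni: with m simultaneous tests,
# each P value is compared against alpha / m rather than alpha itself.
alpha, m = 0.05, 10
p_values = [0.003, 0.04, 0.20]   # invented P values

threshold = alpha / m            # per-test threshold
flags = [p <= threshold for p in p_values]
print(f"per-test threshold = {threshold}, significant: {flags}")
```

Note that 0.04 would be ‘significant’ on its own but fails the adjusted threshold, which is exactly the behaviour criticised in the epidemiological literature quoted below.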

The need for any such adjustment, which necessarily depends on the number of hypotheses being tested, has been strongly questioned from a non-Bayesian perspective, particularly in epidemiology;356,382 Cole108 states that “in every study, every association should be evaluated on its own merits: its prior credibility and its features in the study at hand. The number of other variables is irrelevant”, while Cook and Farewell110 say that it is generally reasonable to report unadjusted conclusions. However, Greenland and Robins207 are among the many who have argued that some adjustment is necessary, but rather than being based on type I errors it should be derived from an explicit model that reflects assumptions about variability. We shall return to this theme on page 12.

Other criticisms of conventional statistical analysis include that it fails to incorporate formally the inevitable background information that is available both at design and analysis and, from a more ideological perspective, that it disobeys certain reasonable axioms of rational behaviour (see page 13). Finally, there is no doubt that classical inferences are often inappropriately interpreted in a Bayesian way, in that P values are mistaken for probabilities of null hypotheses being true, and 95% confidence intervals as meaning there is a 95% chance of containing the true value.184

Number of patients receiving A and B    Proportion preferring A    P value
20                                      15:5                       0.04
200                                     115:85                     0.04
2,000                                   1,046:954                  0.04
2,000,000                               1,001,445:998,555          0.04

TABLE 1 Four theoretical studies all with the same P value

Bayes’s theorem

Suppose θ is some quantity that is currently unknown, for example a specific patient’s true diagnosis or the true success rate of a new therapy, and let p(θ) denote the probability of each possible value of θ (where for the moment we do not concern ourselves with the source of that probability). Suppose we have some observed evidence y whose likelihood of occurrence depends on θ, for example a diagnostic test or the results of a clinical trial. This dependence is formalised by a probability p(y|θ), which is the (conditional) probability of y for each possible value of θ. We would like to obtain the new probability for different values of θ, taking account of the evidence y; this probability has the conditioning reversed, and is denoted p(θ|y). Bayes’s theorem simply says

p(θ|y) ∝ p(y|θ) × p(θ)

(The proportionality is made into an equality by making the probabilities for all possible values of θ add to 1.) The usual term for p(θ) is the prior, for p(y|θ) the likelihood, and for p(θ|y) the posterior, and hence Bayes’s theorem simply says that the posterior distribution is proportional to the product of the prior and the likelihood.
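In the simplest discrete case, the theorem can be applied by direct multiplication and renormalisation. The grid of θ values, the prior weights and the binomial data below are all invented purely for illustration:

```python
from math import comb

# Candidate values of theta (a success rate) and invented prior weights p(theta)
thetas = [0.1, 0.3, 0.5, 0.7, 0.9]
prior = [0.10, 0.25, 0.30, 0.25, 0.10]

def likelihood(theta, successes=7, n=10):
    """p(y | theta): binomial probability of the observed evidence."""
    return comb(n, successes) * theta**successes * (1 - theta)**(n - successes)

# Bayes's theorem: posterior proportional to prior times likelihood,
# normalised so that the probabilities add to 1
unnormalised = [p * likelihood(t) for t, p in zip(thetas, prior)]
posterior = [u / sum(unnormalised) for u in unnormalised]

for t, po in zip(thetas, posterior):
    print(f"theta = {t}: posterior = {po:.3f}")
```

With 7 successes in 10 trials the posterior mass shifts away from the prior mode of 0.5 towards 0.7, illustrating how the likelihood reweights prior opinion.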


[Figure 1: three density plots, (a), (b) and (c), each on the scale ‘Change in risk in using home therapy (%)’, running from –70 to +10.]

FIGURE 1 (a) Prior, (b) likelihood (based on 23/148 versus 13/163 deaths) and (c) posterior distributions arising from the GREAT trial of home thrombolysis. (Reproduced by permission of the BMJ from Spiegelhalter et al.426)


Example

Pocock and Spiegelhalter363 discuss the GREAT trial, in which the unknown quantity θ is the true percentage change in risk of mortality from using home thrombolytic therapy. They obtained a prior distribution for θ expressing the belief that “a 15–20% reduction in mortality is highly plausible, while the extremes of no benefit and a 40% reduction are both unlikely”. This prior is shown in Figure 1a, while Figure 1b shows the likelihood expressing the support given by the data (23/148 deaths on control versus 13/163 deaths on active treatment) to various values of θ. In contrast to the prior distribution, Figure 1b displays strong support for values of θ representing a 40–60% risk reduction.

Figure 1c shows the posterior distribution, obtained by multiplying the prior and likelihood together and then making the total area under the curve equal to one (i.e. ‘certainty’). The evidence in the likelihood has been pulled back towards the prior distribution – a formal representation of the belief that the results were ‘too good to be true’.
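This pull towards the prior can be sketched with a conjugate normal approximation on the log(risk ratio) scale. The prior mean and standard deviation below are rough values chosen to mimic the cardiologist’s stated beliefs, not the parameters used in the published analysis, so the output only illustrates the mechanism:

```python
from math import exp, log, sqrt

# Approximate normal likelihood for the log(risk ratio) from the GREAT data:
# 13/163 deaths on active treatment versus 23/148 on control
rr = (13 / 163) / (23 / 148)
x = log(rr)                               # observed log risk ratio
se = sqrt(1/13 - 1/163 + 1/23 - 1/148)    # its approximate standard error

# Hypothetical normal prior on the log risk ratio, loosely encoding
# "15-20% reduction plausible; no benefit and 40% reduction both unlikely"
mu0, sd0 = log(0.8), 0.13

# Conjugate normal update: precisions add, and the posterior mean is the
# precision-weighted average of the prior mean and the data estimate
prec0, prec_lik = 1 / sd0**2, 1 / se**2
post_mean = (mu0 * prec0 + x * prec_lik) / (prec0 + prec_lik)
post_sd = sqrt(1 / (prec0 + prec_lik))

print(f"data alone: {100 * (1 - exp(x)):.0f}% risk reduction")
print(f"posterior:  {100 * (1 - exp(post_mean)):.0f}% risk reduction")
```

With these assumed values the posterior centres on a risk reduction far below the 49% seen in the data alone, the shrinkage visible in Figure 1c.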

Reporting probability statements

Having obtained a posterior distribution, producing probabilities of exceeding certain thresholds, or of lying in certain intervals, is only a computational task. In Figure 1, the posterior distribution provides an easily interpretable summary of the evidence, and probabilities for hypotheses of interest can then be read off the graph by calculating the relevant areas under the curve. For example, the most likely benefit is around a 24% risk reduction (half that observed in the trial), the posterior probability that the reduction is at least 50% is only 5%, and a 95% interval runs from a 43% to a 0% risk reduction.

Such an interval is generally termed a ‘credible interval’ – unlike a confidence interval, it can be directly interpreted as saying that, given the prior assumptions, the model and the data, there is a 95% chance that the true reduction lies between 0 and 43%.

In many standard situations a traditional confidence interval is essentially equivalent to a credible interval based on the likelihood alone, and hence equivalent to using a ‘flat’ prior. Burton82 claims that “it is already common practice in medical statistics to interpret a frequentist confidence interval as if it did represent a Bayesian posterior probability arising from a calculation invoking a prior density that is uniform on the fundamental scale of analysis”.

The subjective interpretation of probability

The standard use of probability describes long-run frequency properties of repeated random events. This is known as the frequency interpretation of probability, and so both the Fisherian and Neyman–Pearson schools are often referred to as ‘frequentist’. We have allowed probability to refer to generic uncertainty about any unknown quantity, and this is an application of the subjectivist interpretation of probability.

This subjective view of probability is not new, and used to be standard. Fienberg170 points out that Jakob Bernoulli in 1713 introduced “the subjective notion that the probability is personal and varies with an individual’s knowledge”, and that Laplace and Gauss both worked with posterior distributions, which became known as ‘the inverse method’. However, from the mid-nineteenth century the frequency approach started to dominate, and controversy has sporadically continued. Dempster128 quotes Edgeworth in 1884 as saying the critics who “heaped ridicule upon Bayes’s theorem and the inverse method” were trying to elicit “knowledge out of ignorance, something out of nothing”. Polemical opinions are still expressed: in defence of the explicit introduction of subjective judgement into scientific research, Matthews319 states that “it simply makes no sense to take seriously every apparent falsification of a plausible theory, any more than it makes sense to take seriously every new scientific idea”.

The Bayesian perspective thus extends the remit of standard statistical analysis, in that there is explicit concern for what it is reasonable for an observer to believe in the light of data. Thus the perspective of the consumer of the analysis is explicitly taken into account; for example, in a trial of a new drug being carried out by a pharmaceutical company, the viewpoints of the company, the regulatory authorities and the medical profession may be substantially different. The subjective nature of the analysis is therefore unapologetically emphasised. Berger and Berry38 state that “Bayesian statistics treats subjectivity with respect by placing it in the open and under the control of the consumer of data”.

The prior distribution shown in Figure 1a was based on the subjective judgement of a senior cardiologist, informed by empirical evidence derived from one unpublished and two published trials. Of course, conclusions strongly based on beliefs that cannot be supported by concrete evidence are unlikely to be widely regarded as convincing, and so it is important to attempt to find consensus on reasonable sources of external evidence. The assessment and use of prior beliefs is discussed further below and in chapter 3.

The relation to the use of Bayes’s theorem in diagnostic tests

If θ is something that is potentially observable, and p(θ) can be derived from known data, then the use of Bayes’s theorem is uncontroversial. For example, it has long been established that sensitivity and specificity are insufficient characteristics to judge the result of a diagnostic test for an individual – the disease prevalence is also needed.
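This prevalence argument is easy to make concrete. The sensitivity, specificity and prevalence figures below are invented, but show how the same positive result carries very different weight in different populations:

```python
def positive_predictive_value(sens: float, spec: float, prev: float) -> float:
    """P(disease | positive test) by Bayes's theorem."""
    true_pos = sens * prev                 # diseased and test positive
    false_pos = (1 - spec) * (1 - prev)    # healthy but test positive
    return true_pos / (true_pos + false_pos)

# The same (invented) test characteristics in two populations: the
# probability that a positive result is genuine collapses as prevalence falls
for prev in (0.20, 0.01):
    ppv = positive_predictive_value(sens=0.90, spec=0.95, prev=prev)
    print(f"prevalence {prev:.0%}: P(disease | positive) = {ppv:.2f}")
```

At 20% prevalence a positive result is convincing; at 1% prevalence most positives are false, despite an unchanged test.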

From the health technology assessment perspective, the more interesting and controversial context is that of an unknown θ which is a quantity that is not potentially directly observable, such as the mean benefit of a new therapy in a defined group of patients. There have been many arguments80,132,361,429 for the connection between the use of Bayes’s theorem in diagnostic testing and in general clinical research, pointing out that just as the prevalence is required for the assessment of a diagnostic test, so the prior distribution on θ is required to supplement the usual information (P values and confidence intervals) which summarises the likelihood. We need only think of the huge number of clinical trials that are carried out, with few clear successes, to realise that the ‘prevalence’ of truly effective treatments is low. We should thus be cautious about accepting extreme results, such as those observed in the GREAT trial, at face value: indeed, Grieve213 suggests a Bayesian approach provides “a yardstick against which a surprising finding may be measured”.

Brophy and Joseph76 defend a Bayesian approach to clinical studies by analogy with the differing levels of certainty which might be demanded of a diagnostic test under different circumstances, and Simon405 provides the example shown in Table 2. Suppose 200 trials are performed, but only 10% are of truly effective treatments. Suppose each trial is carried out with a type I error of 5% (the chance of claiming an ineffective treatment is effective) and a type II error of 20% (the chance of claiming an effective treatment is ineffective). Then Table 2 shows that 9/25 = 36% of trials with significant results are in fact of totally ineffective treatments: in diagnostic testing terms, the ‘predictive value positive’ is only 64%.

Simon refers to this as the ‘epidemiology of clinical trials’, and suggests this should imbue a spirit of scepticism about unexpected significant trial results, which is naturally handled within a Bayesian perspective.
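Simon’s table is simple arithmetic, and can be reproduced directly from the figures given in the text (200 trials, 10% truly effective, α = 5%, β = 20%):

```python
# Reproducing Simon's 'epidemiology of clinical trials' arithmetic
n_trials, prop_effective = 200, 0.10
alpha, beta = 0.05, 0.20

effective = n_trials * prop_effective       # 20 trials of effective treatments
ineffective = n_trials - effective          # 180 trials of ineffective ones

false_positives = alpha * ineffective       # 9 significant but ineffective
true_positives = (1 - beta) * effective     # 16 significant and effective

ppv = true_positives / (true_positives + false_positives)
print(f"predictive value positive = {ppv:.0%}")   # 16/25 = 64%
```

The 36% of ‘significant’ trials that are of ineffective treatments is exactly the false-positive analogue of a diagnostic test applied in a low-prevalence population.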

The prior distribution

Chapter 3 provides a full discussion of the source and use of prior distributions, including elicitation from experts, the use of ‘default’ priors to represent archetypal positions of ignorance, scepticism and enthusiasm and, when multiple related studies are being simultaneously analysed, the assumption of a common prior that may be ‘estimated’.

Four important points should be emphasised immediately:

1. Despite the name ‘prior’ suggesting a temporal relationship, it is quite feasible that a prior distribution is decided on after seeing the results of a study, since it is simply intended to summarise reasonable uncertainty given evidence external to the study in question. Cox117 states that “I was surprised to read that priors must be chosen before the data have been seen. Nothing in the formalism demands this. Prior does not refer to time, but to a situation, hypothetical when we have data, where we assess what our evidence would have been if we had had no data. This assessment may rationally be affected by having seen the data, although there are considerable dangers in this, rather similar to those in frequentist theory”. Naturally, when making predictions or decisions one’s prior distribution needs to be unambiguously specified, although even then it is reasonable to carry out sensitivity analyses with alternative choices.

Trial conclusion    Treatment truly ineffective    Treatment truly effective    Total
Not significant     171                            4                            175
Significant         9                              16                           25
Total               180                            20                           200

TABLE 2 The expected results when carrying out 200 clinical trials with α = 5% and β = 20% if only 10% of treatments are truly effective

2. There is no such thing as the ‘correct’ prior. Instead, researchers have suggested using a ‘community’ of prior distributions expressing a range of reasonable opinions. Thus, a Bayesian analysis of evidence is best seen as providing a mapping from specified prior beliefs to appropriate posterior beliefs.

3. When multiple related studies are being simultaneously analysed, it may be possible to ‘estimate’ the prior for each study – see page 12.

4. As the amount of data increases, the prior will, unless it is of a pathological nature, be overwhelmed by the likelihood and will exert negligible influence on the conclusions.

Predictions

Suppose we wish to predict some future observations z, which depend on the unknown quantity θ through a distribution p(z|θ): for example, z may be the outcomes to be observed on some future patients. Since our current uncertainty concerning θ is expressed by the posterior distribution p(θ|y), we can average over the current beliefs regarding the unknown θ to obtain the predictive distribution p(z|y) for the future observations z.

Such predictive distributions are useful in many contexts: Berry and Stangl45 describe their use in design and power calculations, model checking, and in deciding whether to conduct a future trial, while Grieve208 provides examples in bioequivalence, trial monitoring and toxicology. Applications of predictions include power calculations (see page 28), sequential analysis (see page 31), payback from research (see page 52) and health policy making (see page 54).

Sequential analysis

Sequential data fall naturally within the Bayesian framework, as the posterior distribution following each observation becomes the prior for the next. This is discussed with regard to clinical trials in chapter 4, but for illustration we consider the example of Berry,53 concerning the 6-mercaptopurine (6-MP) trial for acute leukaemia.186 In this study patients were allocated in pairs to 6-MP (A) or placebo (B), and the treatment assigned to whichever stayed in remission longer was considered the preference for that pair. Of the first 18 pairs, 15 preferred A, and the trial was stopped with P < 0.01; three subsequent pairs also preferred A. The data are shown in Table 3, with the frequentist two-sided P values and the posterior probability of B being the preferred treatment, assuming an initial uniform prior on the probability of B being preferable to A.

Overview of Bayesian philosophy

TABLE 3 Classical P values (adjusted for sequential examination of the data) and posterior probabilities of treatment superiority (no adjustment necessary because of repeated looks at the data)

Patient pair   Preferred   nA – nB   Two-sided P   P(B > A)
 1             A             1       1.0           0.25
 2             B             0       1.0           0.50
 3             A             1       1.0           0.31
 4             A             2       0.63          0.19
 5             A             3       0.38          0.11
 6             B             2       0.69          0.23
 7             A             3       0.45          0.14
 8             A             4       0.29          0.090
 9             A             5       0.18          0.055
10             A             6       0.11          0.033
11             A             7       0.065         0.019
12             A             8       0.039         0.011
13             A             9       0.022         0.0065
14             B             8       0.057         0.018
15             A             9       0.035         0.011
16             A            10       0.021         0.0064
17             A            11       0.013         0.0038
18             A            12       0.0075        0.0022
19             A            13       0.0044        0.0013
20             A            14       0.0026        0.0008
21             A            15       0.0015        0.0005

We note that the posterior probabilities can be calculated without regard to any stated stopping procedure, that the posterior probability that P(B > A) is simply the prior probability that the next pair will prefer B, and that the posterior probabilities suggest the trial could have ended considerably sooner.
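The final column of Table 3 can be reproduced (up to rounding) in a few lines. Under a uniform prior on the chance p that a pair prefers B, the posterior after the observed pairs is a beta distribution, and for integer beta parameters the tail probability P(p > 0.5) reduces to a binomial sum. A sketch using that identity (the function name is ours):

```python
from math import comb

def prob_B_better(n_B, n_A):
    """P(p > 0.5) under the Beta(1 + n_B, 1 + n_A) posterior (uniform prior),
    where p is the chance that a pair prefers B. For integer parameters
    (a, b), P(p > 0.5) = P(Binomial(a + b - 1, 0.5) <= a - 1)."""
    a, b = 1 + n_B, 1 + n_A
    n = a + b - 1
    return sum(comb(n, k) for k in range(a)) * 0.5 ** n

preferences = "ABAAABAAAAAAABAAAAAAA"  # the 21 pairs, in order
n_A = n_B = 0
for pair, pref in enumerate(preferences, start=1):
    n_A, n_B = n_A + (pref == "A"), n_B + (pref == "B")
    print(pair, pref, prob_B_better(n_B, n_A))
```

No adjustment for the repeated looks at the data is needed, as the text notes.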

The idea that the data influence the posterior only through the likelihood is known as the likelihood principle37,49 (see page 13), and underlies the strong Bayesian criticism of sequential analysis.

Decision-making

Suppose we wish to make one of a set of decisions, and that we are willing to assess some value u(d, θ), known as a utility, of taking the decision d when θ is the true unknown ‘state of nature’. The theory of optimal decision-making says we should choose the decision that maximises our expected utility, where the expectation is taken with respect to our current probability distribution for θ.307
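As a concrete sketch of this rule (the numbers and names are invented for illustration, not taken from the report), suppose we must decide whether to treat when the current belief is P(treatment effective) = 0.7:

```python
def best_decision(decisions, states, prob, utility):
    """Choose the decision d maximising expected utility
    E[u(d, theta)] = sum over theta of p(theta) * u(d, theta)."""
    return max(decisions,
               key=lambda d: sum(prob[s] * utility[(d, s)] for s in states))

states = ["effective", "ineffective"]
prob = {"effective": 0.7, "ineffective": 0.3}  # current beliefs about theta
utility = {("treat", "effective"): 10, ("treat", "ineffective"): -5,
           ("do not treat", "effective"): 0, ("do not treat", "ineffective"): 0}
print(best_decision(["treat", "do not treat"], states, prob, utility))
```

Here treating has expected utility 0.7 × 10 − 0.3 × 5 = 5.5, so it is preferred to the zero utility of not treating.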

The use of Bayesian ideas in decision-making is a huge area of research and application, in which attention is focused on the utility of consequences rather than the use of Bayesian methods to revise beliefs.471 This activity blends naturally into cost-effectiveness analysis (see page 51), but nevertheless the subjective interpretation of probability is essential, since the expressions of uncertainty required for a decision analysis can rarely be based purely on empirical data. There is a long history of attempts to apply this theory to medicine, and in particular there is a large literature on decision analysis, whether applied to the individual patient or for policy decisions.301 The journal Medical Decision Making contains an extensive collection of policy analyses based on maximising expected utility, some of which particularly stress the importance of Bayesian considerations.

Therefore there has been a long debate on the use of loss functions (just the negative of utility), in parallel to that concerning prior distributions. Berry53 and others have long argued that the design, monitoring and analysis of a study must explicitly take into account the consequences of eventual decisions. It is important to note that there is also a frequentist theory of decision-making that uses loss functions, but does not average with respect to prior or posterior distributions: the decision-making strategy is generally ‘minimax’, in which one minimises the maximum loss that could be incurred, whatever the true value of the parameter might be. This can be thought of as assuming the most pessimistic prior distribution: see, for example, Bather30 and Palmer.344 Thus all combinations of the use of prior distributions and/or loss functions are possible: this is further discussed in the commentaries to this chapter and chapter 4.

The explicit use of utility functions within the design and monitoring of clinical trials is controversial but has been explored in a number of contexts: for example, Berry and Stangl45 discuss whether to stop a Phase II trial based on estimating the number of women in the trial and in the future who will respond; whether to continue a vaccine trial by estimating the number of children who will contract the disease; and the use of adaptive allocation in a Phase III trial such that at each point the treatment which maximises the expected number of responders is chosen. See pages 32 and 37 for further discussion. A decision-theoretic approach also leads to formal techniques for assessing the payback from research (see page 52) and policy making (see page 54).

Hypothesis testing

Just as the Neyman–Pearson approach focuses on two competing hypotheses, it is possible to take a Bayesian hypothesis-comparison approach. This requires a prior distribution that places a ‘lump’ of probability on each competing hypothesis, which is updated to produce a posterior probability of, say, the null hypothesis that two drugs have equal effect (note that this is not the same as a P value, although such a mistaken interpretation of a P value is often made). Priors that explicitly consider the ‘truth’ of a null hypothesis are discussed on page 19. These were particularly promoted by Cornfield, who emphasised the ‘relative betting odds’ between two hypotheses, defined as the ratio of the posterior to the prior odds. This measure of the relative likelihood of two hypotheses is also known as the ‘Bayes factor’, and has been the subject of much research.266
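The arithmetic linking the lump prior, the Bayes factor and the posterior probability of the null is simple odds multiplication; a sketch (the function name is ours):

```python
def posterior_prob_H0(bf_01, prior_prob_H0):
    """Posterior probability of H0, given the Bayes factor
    BF01 = p(data | H0) / p(data | H1) and a 'lump' prior probability on H0:
    posterior odds = BF01 x prior odds."""
    prior_odds = prior_prob_H0 / (1.0 - prior_prob_H0)
    posterior_odds = bf_01 * prior_odds
    return posterior_odds / (1.0 + posterior_odds)

# data ten times more likely under H1, with a 50:50 lump prior
print(posterior_prob_H0(bf_01=0.1, prior_prob_H0=0.5))
```

With even prior odds the posterior probability of the null is 0.1/1.1 ≈ 0.09, which need not coincide with the P value from the same data.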


Design

Bayesian design of experiments can be considered as a natural combination of prediction and decision-making, in that the investigator is seeking to choose a design which will achieve the desired goals. Chaloner and Verdinelli99 provide a general review, while Simes401 emphasises that trials that are designed to help treatment decisions require specification of utilities and detailed subgroup information in order to individualise treatment decisions. A simple design problem, choosing the size of a clinical trial, is considered on page 28.

Sequential designs present a particular problem known as ‘backwards induction’,124 in which one must work backwards from the end of the study, examine all the possible decision points that one might face, and optimise the decision, allowing for all the possible circumstances in which one might find oneself. This is computationally very demanding since one must consider what to do in all possible future eventualities, but limited examples will be described on pages 32 and 37. Early phases of clinical trials have attracted this approach: Brunier and Whitehead81 consider the balancing of costs of experimentation and errors in treatment allocation (see page 38).
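The mechanics of backwards induction can be sketched on a toy two-stage tree (entirely invented numbers; real trial designs involve vastly larger trees): fold back from the terminal payoffs, taking expectations at chance nodes and maxima at decision nodes.

```python
def fold_back(node):
    """Evaluate a decision tree by backwards induction.
    A node is ('payoff', u), ('chance', [(p, subtree), ...])
    or ('decision', {action: subtree, ...})."""
    kind, content = node
    if kind == "payoff":
        return content
    if kind == "chance":
        return sum(p * fold_back(sub) for p, sub in content)
    return max(fold_back(sub) for sub in content.values())  # best action

tree = ("decision", {
    "stop now": ("payoff", 0.0),
    "continue": ("chance", [
        (0.6, ("decision", {"treat": ("payoff", 5.0),
                            "do not treat": ("payoff", 0.0)})),
        (0.4, ("decision", {"treat": ("payoff", -3.0),
                            "do not treat": ("payoff", 0.0)})),
    ]),
})
print(fold_back(tree))
```

Continuing is worth 0.6 × 5 + 0.4 × 0 = 3.0 once the later treatment decisions are optimised, so it beats stopping now; the computational burden grows with every future eventuality that must be enumerated.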

We emphasise, however, that the likelihood principle states that Bayesian inference is carried out independently of design, so that the reasons for obtaining particular data points are irrelevant: the only aspect of interest is the relative likelihood of observing those particular data given possible values of the parameters of interest.

Multiplicity

The context of clinical trials may present issues of “multiple analyses of accumulating data, analyses of multiple end-points, multiple subsets of patients, multiple treatment group contrasts and interpreting the results of multiple clinical trials”.404

Observational data may feature multiple institutions, and meta-analysis involves synthesis of multiple studies. The general Bayesian approach to multiplicity involves specifying a common prior distribution for the substudies that expresses a belief in the expected ‘similarity’ of all the individual unknown quantities being estimated. This produces a degree of pooling, in which an individual study’s results tend to be ‘shrunk’ towards the average result by an amount depending on the variability between studies and the precision of the individual study: relevant contexts include subset analysis, between-centre variability, and random-effects meta-analysis. This is essentially a random-effects approach, often labelled as ‘empirical Bayes’ or ‘multilevel’ modelling; we shall later distinguish these from a ‘fully Bayes’ approach.
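The shrinkage described here can be sketched for the normal case (invented numbers; the between-study variance `tau2` is assumed known for simplicity, and the function name is ours): each study estimate is pulled towards the precision-weighted overall mean, with imprecise studies pulled hardest.

```python
def shrunken_estimates(estimates, variances, tau2):
    """Normal random-effects shrinkage with known between-study variance tau2:
    study i, with within-study variance v_i, is shrunk towards the overall
    mean mu by a factor v_i / (v_i + tau2)."""
    weights = [1.0 / (v + tau2) for v in variances]
    mu = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)
    return [(tau2 * y + v * mu) / (v + tau2)
            for y, v in zip(estimates, variances)]

# a precise study at 0.2 and an imprecise one at 0.8
print(shrunken_estimates([0.2, 0.8], [0.01, 0.25], tau2=0.04))
```

The precise study barely moves, while the imprecise one is pulled most of the way towards the pooled mean of about 0.29.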

This is later discussed with respect to hierarchical models (see page 21), subset analysis (see page 35), between-centre variability (see page 36), and institutional comparisons (see page 44 and chapter 12).

Complex modelling

Health technology assessments will generally involve some synthesis of evidence from a variety of sources, and a single clinical trial will rarely provide a definitive conclusion as to a policy recommendation. Standard statistical methods are designed for summarising the evidence from single studies, and although there has been a huge growth in methods and application of meta-analysis, these have generally been concerned with pooling evidence from studies with very comparable designs, outcome measures and so on.

A Bayesian approach allows evidence from diverse sources to be pooled through assuming that their underlying probability models (their likelihoods) have parameters of interest in common. For example, the ‘true’ effect of an intervention will feature in models for both randomised trials and observational data, even though there may be additional adjustments for potential biases, different populations, cross-overs between treatments and so on. One context in which this has been emphasised is in the regulation of medical devices (see page 53). This ability to deal with very complex models is discussed in chapter 6, but in general this great flexibility brings with it mathematical and computational challenges.

Computational issues

The Bayesian approach applies probability theory to a model derived from substantive knowledge and can, in theory, deal with realistically complex situations – the approach can also be termed ‘full probability modelling’. It has to be acknowledged, however, that the computations may be difficult, with the specific problem being to carry out the integrations necessary to obtain the posterior distributions of quantities of interest in situations where non-standard prior distributions are used, or where there are additional ‘nuisance parameters’ in the model. These problems in integration for many years restricted Bayesian applications to rather simple examples. However, there has recently been enormous progress in methods for Bayesian computation, generally exploiting modern computer power to carry out simulations known as Markov chain Monte Carlo (MCMC) methods.189,193

Although we will not be particularly concerned with computational issues, we refer to Carlin et al.90 and Etzioni and Kadane162 who discuss a range of methods which may be used (normal approximations, Laplace approximations and numerical methods including MCMC), Gelman and Rubin’s189 review of MCMC methods in biostatistics, and van Houwelingen’s464 commentary on the importance of computational methods in the future of biostatistics.
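As a toy illustration of such simulation methods (our own minimal code, not anything used in the report), a random-walk Metropolis sampler for the posterior of a binomial proportion under a uniform prior:

```python
import math
import random

def log_posterior(p, successes, failures):
    """Log posterior (up to a constant) for a binomial proportion p
    under a uniform prior."""
    if not 0.0 < p < 1.0:
        return -math.inf
    return successes * math.log(p) + failures * math.log(1.0 - p)

def metropolis(successes, failures, n_iter=20000, step=0.1, seed=1):
    """Random-walk Metropolis: propose p' ~ Normal(p, step), accept with
    probability min(1, posterior ratio)."""
    rng = random.Random(seed)
    p, samples = 0.5, []
    for _ in range(n_iter):
        proposal = p + rng.gauss(0.0, step)
        delta = (log_posterior(proposal, successes, failures)
                 - log_posterior(p, successes, failures))
        if delta >= 0 or rng.random() < math.exp(delta):
            p = proposal
        samples.append(p)
    return samples[n_iter // 2:]  # discard the first half as burn-in

draws = metropolis(successes=15, failures=3)
print(sum(draws) / len(draws))  # approximates the exact posterior mean, 16/20
```

Here the exact posterior is Beta(16, 4), so the simulation can be checked against the known answer; the attraction of MCMC is that the same recipe still works when no closed form exists.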

The theoretical justification for the Bayesian approach

The theoretical foundations for the optimality of Bayesian inference have been discussed at length by Cornfield,112 DeGroot,124 Lindley,306 Bernardo and Smith43 and others. The crucial idea is that if one accepts certain intuitively plausible behavioural axioms, such as not placing bets that are guaranteed to lose money, then one should act according to decision theory based on subjective probabilities.

Apart from this somewhat ideological argument, the most important theoretical construct (aside from probability theory itself) is the likelihood principle,37 which states that all the information that the data provide about the parameter is contained in the likelihood. This entails, for example, that frequentist stopping rules that retain type I error (with their consequence that the inferences are influenced by what one would have done had one observed something different) are entirely irrelevant.

Schools of Bayesians

It is important to emphasise that there is no such thing as a single Bayesian approach, and that many ideological differences exist between the researchers whose work is discussed in this chapter. Four levels of Bayesian approach, of increasing ‘purity’, may be identified:

1. The empirical Bayes approach, in which a prior distribution is estimated from multiple experiments. Analyses and reporting are in frequentist terms.

2. The reference Bayes approach, in which a Bayesian interpretation is given to conclusions expressed as posterior distributions, but an attempt is made to use ‘objective’ or ‘reference’ prior distributions.

3. The proper Bayes approach, in which informative prior distributions are based on available evidence, but conclusions are summarised by posterior distributions without explicit incorporation of utility functions.

4. The decision-theoretic or ‘full’ Bayes approach, in which explicit utility functions are used to make decisions based on maximising expected utility.

Our focus in this review is primarily on the third, proper, school of Bayesianism.

Making the health technology assessment context explicit

Bayesian methods explicitly allow for the possibility that the conclusions of an analysis may depend on who is conducting it and their available evidence and opinion, and therefore the context of the study is vital. Apart from methodological researchers, at least four different viewpoints can be identified for any health technology assessment:

• sponsors, for example the pharmaceutical industry, medical charities or granting agencies

• investigators, that is, those responsible for the conduct of a study, whether industry or publicly funded

• reviewers, for example regulatory bodies or journal editors

• consumers, for example agencies setting health policy, clinicians or patients.

An analysis which might be carried out solely for the investigators, for example, may not be appropriate for presentation to reviewers or consumers, and Racine et al.370 point out that “experimentalists tend to draw a sharp distinction between providing their opinions and assessments for the purposes of experimental design and in-house discussion, and having them incorporated into any form of externally disseminated report”.


A characteristic of health technology assessment is that the investigators who plan and conduct a study are generally not the same body as those that make decisions on the basis of the evidence provided in part by that study: such decision-makers may be regulatory authorities or healthcare providers. This division is acknowledged both in the structure of this report (in which chapter 4 considers design and monitoring of trials, and chapter 7 examines decision-making) and in the lower profile given to decision-making compared with inferences.

Further reading

A wide-ranging introductory chapter by Berry and Stangl45 in their textbook61 covers a whole range of modelling issues, including elicitation, model choice, computation, prediction and decision-making. Non-technical tutorial articles include those by Lewis and Wears,295 Bland and Altman68 and Lilford and Braunholtz.299 Other authors emphasise different merits of Bayesian approaches in health technology assessment: Eddy et al.148 concentrate on the ability to deal with varieties of outcomes, designs and sources of bias, Breslow72 stresses the flexibility with which multiple similar studies can be handled, Etzioni and Kadane162 discuss general applications in the health sciences with an emphasis on decision-making, while Freedman177 and Lilford and Braunholtz299 concentrate on the ability to combine ‘objective’ evidence with clinical judgement. Stangl and Berry426 provide a recent review of biomedical applications.

There is a huge methodological statistical literature on general Bayesian methods, much of it quite mathematical. Dempster128 gives a historical background, while Cornfield112 provides a theoretical justification of the Bayesian approaches, in terms of ideas such as coherence. A rather old article156 is still one of the best technical introductions to the Bayesian philosophy.

Good tutorial introductions are provided by Lindley307 and Barnett,29 while more recent books, in order of increasing technical difficulty, include those by Berry,56 Lee,287 O’Hagan,338 Gelman et al.,188 Carlin and Louis,92 Berger36 and Bernardo and Smith.43

There is a large literature on using Bayesian methods for specific modelling issues, and only a few references are provided here. Specific problems that have attracted a Bayesian analysis in health technology assessment include survival analysis,2,7,218,427 longitudinal models,257,282 missing data and dropouts,116,270,285,354,385 and model criticism and selection.205

Commentary

As mentioned on page 11, we need to carefully distinguish the debate as to whether prior distributions should be used for inferences from the question of whether our objective should be estimation, hypothesis testing or a decision requiring a loss function of some kind. Table 4 considers all possible combinations of these elements.

All six approaches have been investigated in theory and, to some extent, in practice, and the resulting arguments become complex. Here we shall only give a very brief overview; see the commentaries in each chapter for further detailed objections.

Prior or no prior?

There have been generic warnings about a Bayesian approach, and Feinstein167 claims that “a statistical consultant who proposes a Bayesian analysis should therefore be expected to obtain a suitably informed consent from the clinical client whose data are to be subjected to the experiment”. However, some seem to misunderstand the Bayesian explicit acceptance of subjectivity: Lane280 states that “We are told that elimination of subjectivity by use of Bayesian inference paves the way to truly objective, evidence-based practice. Yet who but a statistically minded minority can begin to interpret Bayesian analysis?”.


TABLE 4 A taxonomy of six possible ‘philosophical’ approaches to health technology assessment, depending on their objective and their quantitative use of prior information

                Objective
            Inference (estimation)   Hypothesis testing   Decision (loss function)
No prior    Fisherian                Neyman–Pearson       Classical decision theory
Prior       Proper Bayesian          ‘Bayes’s factors’    Full decision-theoretic Bayesian


In general, the objections to the use of a prior distribution have been pragmatic, and strict ideological standpoints have been played down: Cox and Farewell118 say it would be “a great pity if differences of technical approach were exaggerated into differences about qualitative issues”, while Armitage19 maintains it is not appropriate to polarise the argument as a choice between two extremes. Practical difficulties in obtaining and using prior opinions will be described in the commentary to succeeding chapters: Fisher171 and O’Rourke343 provide lists of general objections, while Tukey454 suggests that the Bayesian approach to multiplicity has little to contribute because it is rarely appropriate to assume exchangeability.

Following the discussion on page 13, many authors have argued that the appropriateness of the Bayesian approach depends crucially on the circumstances and the precise question being asked: for example, both Koch272 and Whitehead476 claim that a proper Bayesian approach may be reasonable at early stages of a drug’s development but is not acceptable in Phase III trials.

Estimation, hypothesis testing or decision-making?

Whether or not to incorporate an explicit loss function would appear to depend on the question being asked, and the extent to which a health technology assessment should lead to an inference about a treatment effect or a decision as to future policy has been strongly debated.15,72,223,249,402

Important objections that have been raised to a decision-theoretic approach include the lack of a coherent theory for decision-making on behalf of multiple audiences with different utility functions, the difficulty of obtaining agreed utility values, and the fact that future treatments would be recommended on the basis of even marginal expected gains, without any concern with the confidence with which such a recommendation is made (see also page 39). We also note Kass and Greenhouse’s267 argument against the Bayes factor approach, claiming that “in most RCTs, estimation would be more appropriate than testing”.

Key points

1. Claims of advantages and disadvantages of Bayesian methods are now largely based on pragmatic reasons rather than blanket ideological positions.

2. A Bayesian approach can lead to flexible modelling of evidence from diverse sources.

3. Bayesian methods are best seen as a transformation from initial to final opinion, rather than providing a single ‘correct’ inference.

4. Different contexts may demand different statistical approaches, both regarding the role of prior opinion and the role of an explicit loss function. It is vital to establish contexts in which Bayesian approaches are appropriate.

5. A decision-theoretic approach may be appropriate where the consequences of a study are predictable, such as when dealing with rare diseases treated according to a protocol, within a pharmaceutical company, or in public health policy.


Introduction

There is no denying that quantifiable prior beliefs exist in medicine. For example, in the context of clinical trials, Peto and Baigent358 state that “it is generally unrealistic to hope for large treatment effects” but that “it might be reasonable to hope that a new treatment for acute stroke or AMI could reduce recurrent stroke or death rates in hospital from 10% to 9% or 8%, but not to hope that it could halve in-hospital mortality”. However, turning informally expressed opinions into a mathematical prior distribution is perhaps the most difficult aspect of Bayesian analysis. Four broad approaches are outlined below, ranging from the elicitation of subjective opinion to the use of statistical models to estimate the prior from data. Sensitivity analysis to alternative assumptions is considered vital, whatever the method used to construct the prior distribution.

From a mathematical and computational perspective, it is extremely convenient if the prior distribution is a member of a family of distributions that is conjugate to the form of the likelihood, in the sense that they ‘fit together’ to produce a posterior distribution that is in the same family as the prior distribution. For example, many likelihoods for treatment effects can be assumed to have an approximately normal shape,421 and thus in these circumstances it will be convenient to use a normally shaped prior (the conjugate family), provided it approximately summarises the appropriate external evidence. Similarly, when the observed data are to be proportions (implying a binomial likelihood), the conjugate prior is a member of the ‘beta’ family of distributions, which provide a flexible way of expressing beliefs about the magnitude of a true unknown frequency. Modern computing power is, however, reducing the need for conjugacy, and in this section we shall concentrate on the source of the prior rather than its precise mathematical form.
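The two conjugate updates mentioned above each take a line or two; a generic sketch (the function names are ours, not the report's):

```python
def beta_binomial_update(a, b, successes, failures):
    """Beta(a, b) prior + binomial data -> Beta(a + successes, b + failures)."""
    return a + successes, b + failures

def normal_normal_update(prior_mean, prior_var, estimate, estimate_var):
    """Normal prior + approximately normal likelihood for a treatment effect
    -> normal posterior, combining the two by precision (1/variance) weighting."""
    precision = 1.0 / prior_var + 1.0 / estimate_var
    mean = (prior_mean / prior_var + estimate / estimate_var) / precision
    return mean, 1.0 / precision

print(beta_binomial_update(1, 1, 15, 3))        # uniform prior, 15/18 successes
print(normal_normal_update(0.0, 1.0, 1.0, 1.0))
```

In both cases the posterior stays in the prior's family, which is exactly the convenience conjugacy buys.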

We should repeat the statements made on page 9 regarding the fact that there is no ‘correct’ prior: Bayesian analysis can be seen as a means of transforming prior into posterior opinions, rather than producing the posterior distribution. It is therefore vital to take into account the context and audience for the assessment (see page 13).

Elicitation of opinion

A true subjectivist Bayesian approach requires only a prior distribution that expresses the personal opinions of an individual but, if the health technology assessment is to be generally accepted by a wider community, it would appear to be essential that the prior distributions have some evidential or at least consensus support. In some circumstances there may, however, be little ‘objective’ evidence available and summaries of expert opinion may be indispensable.

There is a very large literature on eliciting subjective probability distributions from experts, with some good early references on statistical391 and psychological aspects,456 as well as on methods for pooling distributions obtained from multiple experts.190 The fact that people are generally not good probability assessors is well known, and the variety of biases they suffer are discussed by Kadane and Wolfson.265 Nevertheless it has been shown that training can improve experts’ ability to provide judgements that are ‘well calibrated’, in the sense that if a series of events are given a probability, say 0.6, then around 60% of these events will occur. Murphy and Winkler331 discuss this issue with regard to weather forecasting.

Chaloner97 provides a thorough review of methods for prior elicitation in clinical trials, including interviews with clinicians, postal questionnaires, and the use of an interactive computer program to draw a prior distribution. She concludes that fairly simple methods are adequate, using interactive feedback with a scripted interview, providing experts with a systematic literature review, basing elicitation on 2.5 and 97.5% percentiles, and using as many experts as possible.

Berry and Stangl45 discuss methods for eliciting conjugate priors and checking whether the assessments are consistent with each other. Kadane and Wolfson260 distinguish between experiments to learn, for which only the prior of the experimenter needs to be considered, and experiments to prove, in which the priors (and utilities) of an audience to be persuaded need to be considered. They discuss general methods of elicitation, and give an example applied to an observational study in healthcare. They emphasise the method by which experts may be asked for predictions about future observations; from these assessments an implicit prior distribution can be derived.

Chapter 3

Where does the prior distribution come from?

The methods used in case studies can be divided into four main categories, using increasingly formal methods:

1. Informal discussion. Examples include Rosner and Berry,381 who consider a trial of paclitaxel (Taxol®) in metastatic breast cancer, in which a beta prior for the overall success rate was assessed using a mean of 25% and 50% belief that the true success rate lay between 15 and 35%. Similarly, Lilford and Braunholtz299 describe how priors were obtained from two doctors for the relative risk of venous thrombosis from oral contraceptives. In the context of meta-analysis, Smith et al.412 thought that it was unlikely that the between-study odds would vary by more than an order of magnitude, and hence derived a prior distribution on the heterogeneity parameter.

2. Structured interviewing and formal pooling of opinion. Freedman and Spiegelhalter180 describe an interviewing technique in which a set of experts were individually interviewed and hand-drawn plots of their prior distributions elicited. Deliberate efforts were made to prevent the opinions being over-confident (too ‘tight’). The distributions were converted to histograms, and averaged to produce a composite prior. This was carried out twice for trials of thiotepa in superficial bladder cancer and these priors used later in discussing initial and interim power of the study416,418 (see pages 28 and 29). A similar exercise was carried out before a trial in osteosarcoma414 and used in a discussion of the power of the trial.420 Gore198 introduced the concept of ‘trial roulette’, in which 20 gaming chips, each representing 5% belief, could be distributed amongst the bins of a histogram: in a trial of artificial surfactant in premature babies, 12 collaborators were interviewed using this technique to obtain their opinion on the possible benefits of the treatment.442 Using an electronic tool so that individuals in a group could respond without attribution, Lilford et al.297 presented collaborators in a trial with a series of imaginary patients in order to elicit their opinions on the benefit of early delivery.

3. Structured questionnaires. The ‘trial roulette’ scheme described above was administered by post by Hughes242,243 for a trial in treatment of oesophageal varices, and by Abrams et al.1,421 for a trial of neutron therapy.

Parmar et al.348 elicited prior distributions for the effect of a new radiotherapy regime (CHART), in which the possible treatment effect was discretised into 5% bands and the form was sent by post to each of nine clinicians. Each provided a distribution over these bands and an arithmetic mean was then taken. Parmar et al. provide a copy of the form in their paper, and results are provided for both lung348 and head and neck cancer.421

4. Computer-based elicitation. Chaloner et al.98 provide a detailed case study of the use of a rather complex computer program that interactively elicited distributions from five clinicians for a trial of prophylactic therapy in AIDS (see also Carlin et al.90). Kadane258 reports the results of an hour-long telephone interview with each of five clinicians, using software to estimate prior parameters from the results of a series of questions eliciting predictive probability distributions for responses of various patient types. When a second round of elicitation became necessary, the proposal was met by “little enthusiasm”. Kadane and Wolfson260 provide an edited transcript of a computerised elicitation session in a non-trial context. DuMouchel141 describes a computer program for eliciting prior distributions in hierarchical models, in which beliefs about contrasts give rise to informative prior distributions on between-unit variability.
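Assessments like the one in category 1 above (a mean of 25% with 50% belief that the rate lies in 15–35%) can be turned into a beta prior numerically. The sketch below is entirely our own construction, not the method any of these authors used: it bisects on the first beta parameter (with the second fixed by the mean), using a Simpson-rule beta CDF so that only the standard library is needed.

```python
import math

def beta_cdf(x, a, b, n=2000):
    """P(p <= x) for Beta(a, b), by Simpson-rule integration (stdlib only)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    def pdf(t):
        if t <= 0.0 or t >= 1.0:
            return 0.0
        return math.exp(log_norm + (a - 1.0) * math.log(t)
                        + (b - 1.0) * math.log(1.0 - t))
    h = x / n
    s = pdf(0.0) + pdf(x)
    s += 4.0 * sum(pdf((2 * i - 1) * h) for i in range(1, n // 2 + 1))
    s += 2.0 * sum(pdf(2 * i * h) for i in range(1, n // 2))
    return s * h / 3.0

def fit_beta(mean=0.25, lo=0.15, hi=0.35, target=0.5):
    """Find Beta(a, b) with the given mean whose interval [lo, hi] carries
    probability `target`, by bisection on a (larger a means a tighter prior)."""
    a_min, a_max = 0.2, 50.0
    for _ in range(60):
        a = 0.5 * (a_min + a_max)
        b = a * (1.0 - mean) / mean
        coverage = beta_cdf(hi, a, b) - beta_cdf(lo, a, b)
        if coverage < target:
            a_min = a  # too diffuse: tighten the prior
        else:
            a_max = a
    return a, a * (1.0 - mean) / mean

a, b = fit_beta()
print(round(a, 2), round(b, 2))
```

Bisection works here because, with the mean held fixed, increasing a shrinks the prior variance and so monotonically increases the probability inside the interval.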

Summary of evidence

If the results of previous similar studies are available, it is clear that they may be used as the basis for a prior distribution. Three main approaches have been used:

1. Use of a single previous experimental result as a prior distribution. Brown et al.78 and Spiegelhalter et al.421 both provide examples of prior distributions proportional to the likelihoods arising from a single pilot trial.

Korn et al.276 could not identify a beta prior matching the characteristics of their past empirical experience, but Bring75 pointed out that had they fitted the mode rather than the mean of the prior to their past data then the problem would have disappeared.

The prior distribution

18

Page 29: NHS R&D HTA Programme HTA · 2006-05-15 · that high-quality research information on the costs, effectiveness and broader impact of health technologies is produced in the most efficient

2. Direct use of a meta-analysis of many previous studies. Lau et al.284 point out that cumulative meta-analysis can be given a Bayesian interpretation in which the prior for each trial is obtained from the meta-analysis of preceding studies, while Dersimonian129 derives priors for a trial of the effectiveness of calcium supplementation in the prevention of pre-eclampsia in pregnant women by a meta-analysis of previous trials using both random effects and fixed effects models. Smith et al.410 and Higgins and Whitehead233

both use a review of past meta-analyses as a basis for a prior distribution for the degree of heterogeneity in a current meta-analysis. Gilbert et al.192

report an empirical distribution of past trial results in surgery which could be used as a prior.

3. Use of previous data in some discounted form. Previous studies may not be directly related to the one in question and we may wish to discount their influence. Kass and Greenhouse267 state that “we wish to use this information, but we do not wish to use it as if the historical controls were simply a previous sample from the same population as the experimental controls”, and they investigate a range of such discounting methods (see appendix 1). Berry and Stangl45 discuss the use of historical information with trial data, giving an example of a Bayesian logistic regression analysis with historical data weighted by various amounts: see the section on the use of historical controls in randomised trials for further discussion. Brophy and Joseph76 pool two previous studies to form a prior but investigate the effect of downweighting their influence to 50 and 10%, Cronin et al.121

report a mixture of past evidence and ‘prior belief of what is a reasonable range of values’, while Greenhouse and Wasserman206 downweight a previous trial with 176 subjects to have weight equivalent to only 10 subjects.
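The kind of downweighting described in this item can be sketched for a normal likelihood: discounting n historical subjects to an effective sample size n_eff simply inflates the variance of the historical estimate, which is the idea sometimes called a power prior. The numbers below are illustrative, not those of the cited studies:

```python
import math

# Sketch of downweighting historical evidence: a historical estimate
# based on n subjects is given the weight of only n_eff subjects by
# inflating the variance of its likelihood. Numbers are illustrative.

def discounted_prior(mean, per_subject_var, n, n_eff):
    """Return (mean, variance) of the historical likelihood after
    downweighting n subjects to an effective sample size n_eff."""
    full_var = per_subject_var / n          # variance at full weight
    alpha = n_eff / n                       # power-prior discount factor
    return mean, full_var / alpha           # = per_subject_var / n_eff

# Historical trial of 176 subjects downweighted to 10, echoing the
# Greenhouse and Wasserman example; the effect scale is illustrative.
m, v = discounted_prior(mean=0.5, per_subject_var=4.0, n=176, n_eff=10)
print(m, v, math.sqrt(v))
```

The mean is untouched; only the precision of the historical evidence is reduced, exactly as if only n_eff subjects had been observed.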

Default priors

It would clearly be attractive to have prior distributions that could be taken ‘off-the-shelf’, rather than having to consider all available evidence external to the study in their construction. Four main suggestions can be identified.

‘Non-informative’ or ‘reference’ priors

There has been substantial research in the Bayesian literature into so-called non-informative or reference priors, which usually take the form of uniform distributions over the range of interest, possibly on a suitably transformed scale of the

parameter.70 Formally, a uniform distribution means the posterior distribution has the same shape as the likelihood function, which in turn means that the resulting Bayesian intervals and estimates will essentially match the traditional results, so that posterior tail areas will match one-sided P values (ignoring any adjustment for multiplicity). Hughes243 points out that a careful choice of scale is necessary and that a uniform prior on, say, a log(relative risk) might entail an inappropriate prior on a risk difference, while Kass and Wasserman268 review the current status of such reference priors.
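Hughes’ point about the choice of scale can be illustrated numerically: draw log(relative risk) uniformly and look at the implied distribution of the risk difference. The baseline risk of 0.2 and the range for the relative risk are illustrative assumptions:

```python
import math, random

# A prior uniform on one scale is not uniform on another: sample
# log(relative risk) uniformly and examine the implied risk difference,
# assuming an illustrative baseline risk of 0.2.

random.seed(1)
p0 = 0.2
draws = []
for _ in range(100_000):
    log_rr = random.uniform(math.log(0.5), math.log(2.0))
    rr = math.exp(log_rr)
    draws.append(p0 * rr - p0)   # risk difference p1 - p0

# The risk difference ranges over (-0.1, 0.2); if it were uniform,
# equal-width sub-intervals would carry equal probability.
lo = sum(-0.10 < d < -0.05 for d in draws) / len(draws)
hi = sum(0.15 < d < 0.20 for d in draws) / len(draws)
print(lo, hi)   # far from equal: the implied prior is not uniform
```

The lower sub-interval carries roughly three times the probability of the upper one, so ‘non-informative’ on the log(relative risk) scale is decidedly informative on the risk-difference scale.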

Results with reference priors are generally quoted as one part of a Bayesian analysis. In particular, Burton82 suggests that most doctors interpret frequentist confidence intervals as posterior distributions, and also that information prior to a study tends to be vague, and that therefore results from a study should be presented by performing a Bayesian analysis with a non-informative prior and quoting posterior probabilities for the parameter of interest being in various regions. He gives examples from a hypothetical logistic regression setting and from the evaluation of the effectiveness of a Haemophilus influenzae type b (HIB) vaccine in Aboriginal children.

Other applications of reference priors include Achcar et al.,7 Briggs73 on net benefit in cost-effectiveness studies, and almost all the confidence profile examples in Eddy et al.150 However, Fisher171

points out that “there is no such thing as a ‘noninformative’ prior. Even improper priors give information: all possible values are equally likely”.

‘Sceptical’ priors

Informative priors that express scepticism about large treatment effects have been put forward both as a reasonable expression of doubt, and as a way of controlling early stopping of trials on the basis of fortuitously positive results. Kass and Greenhouse267 suggest that a “cautious reasonable sceptic will recommend action only on the basis of fairly firm knowledge”, but that these sceptical “beliefs we specify need not be our own, nor need they be the beliefs of any actual person we happen to know, nor derived in some way from any group of ‘experts’ ”.

Mathematically speaking, a sceptical prior about a treatment effect will have a mean of zero and some spread which determines the degree of scepticism. Fletcher et al.173 use such a prior, while Spiegelhalter et al.421 argue that a reasonable

Health Technology Assessment 2000; Vol. 4: No. 38

19


degree of scepticism may be equivalent to feeling that the alternative hypothesis is optimistic, formalised by a prior with only a small probability (say 5%) that the treatment effect is as large as the alternative hypothesis (see Figure 2).
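This calibration can be sketched directly: choose the spread of a zero-mean normal prior so that the prior probability of exceeding the alternative hypothesis is a small value γ. The alternative-hypothesis value below is an illustrative assumption:

```python
import math

# Calibrating a sceptical prior as described above: a normal prior
# centred on zero whose spread is chosen so that the prior probability
# that the treatment effect exceeds the alternative hypothesis delta_A
# is a small gamma (here 5%).

def phi_inv(p, lo=-10.0, hi=10.0):
    """Standard normal quantile by bisection (no external libraries)."""
    def phi(z):
        return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if phi(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def sceptical_sd(delta_a, gamma=0.05):
    """SD of a N(0, sd^2) prior with P(theta > delta_a) = gamma."""
    return delta_a / phi_inv(1.0 - gamma)

# Illustrative alternative hypothesis: a log(hazard ratio) of 0.5
sd = sceptical_sd(0.5)
print(round(sd, 3))   # 0.5 / 1.645 ≈ 0.304
```

With γ = 5% the prior standard deviation is δA/1.645, so larger alternative hypotheses permit proportionally more diffuse scepticism.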

This approach has been used in a number of case studies,183,348 and has been suggested as a basis for monitoring trials166 (see page 29) and when considering whether or not a confirmatory study is justified (see page 33). Other users of sceptical priors include Dersimonian129 and Heitjan227 in the context of Phase II studies, while a senior US Food and Drug Administration (FDA) biostatistician339

has stated that he “would like to see [sceptical priors] applied in more routine fashion to provide insight into our decision making”.

‘Enthusiastic’ priors

As a counterbalance to the pessimism expressed by the sceptical prior, Spiegelhalter et al.421 suggest an ‘enthusiastic’ prior centred on the alternative hypothesis and with a low chance (say 5%) that the true treatment benefit is negative. Use of such a prior has been reported in case studies183,227 and as a basis for conservatism in the face of early negative results166 (see page 31). Digman et al.133

provide an example of such a prior, but call it ‘optimistic’.

Priors with a point mass at the null hypothesis (‘lump-and-smear’ priors)

The traditional statistical approach expresses a qualitative distinction between the role of a null hypothesis, generally of no treatment effect, and alternative hypotheses. A prior distribution that retains this distinction would place a ‘lump’ of probability on the null hypothesis, and ‘smear’ the remaining probability over the whole range of alternatives: Cornfield112 uses a normal distribution centred on the null hypothesis, while Hughes243

uses a uniform prior over a suitably restricted range. The resulting posterior distribution retains this structure, giving rise to a posterior probability of the truth of the null hypothesis: this is apparently analogous to a P value but is neither numerically nor conceptually equivalent (see page 11).

Cornfield repeatedly argued for this approach, which naturally gives rise to the ‘relative betting odds’ as a sequential monitoring tool, defined as the ratio of the likelihood of the data under the null hypothesis to the average likelihood (with respect to the prior) under the alternative.113 The relative betting odds is independent of the ‘lump’ of prior probability placed on the null (although it

does depend on the shape of the ‘smear’ over the alternatives), and does not suffer from the problem of ‘sampling to a foregone conclusion’ (see page 32). He suggests a ‘default’ prior under the alternative as a normal distribution centred on the null hypothesis and with expectation (conditional on the effect being positive) equal to the alternative hypothesis δA, that is, with prior standard deviation √(π/2)δA. This is essentially equivalent to the formulation of a sceptical prior described above, but with probability of exceeding the alternative hypothesis of γ = 0.21 – this is larger than the value of 5% often used for sceptical priors, but the lump of probability on the null hypothesis is already expressing considerable scepticism. Values for these prior distributions for 11 outcome measures are reported for the Urokinase Pulmonary Embolism Trial.389 This method was used in a number of major studies alongside more standard approaches,114,389,458 although relative betting odds were dropped from the final report of the Coronary Drug Project.115
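Cornfield’s default choice can be checked numerically: with a prior standard deviation of √(π/2)δA, the conditional expectation of the effect given that it is positive comes back to δA, and the prior probability of exceeding δA is approximately 0.21:

```python
import math, random

# Numerical check of Cornfield's 'default' smear described above:
# a N(0, sd^2) prior with sd = delta * sqrt(pi/2) has conditional mean
# delta given a positive effect, and prior probability about 0.21 of
# exceeding the alternative hypothesis delta.

delta = 1.0                      # alternative hypothesis (any scale)
sd = delta * math.sqrt(math.pi / 2.0)

# P(theta > delta) for theta ~ N(0, sd^2)
gamma = 0.5 * (1.0 - math.erf((delta / sd) / math.sqrt(2.0)))
print(round(gamma, 2))           # ≈ 0.21

# E[theta | theta > 0] by simulation: should come back to delta
random.seed(0)
positives = [x for x in (random.gauss(0.0, sd) for _ in range(200_000)) if x > 0]
print(round(sum(positives) / len(positives), 2))   # ≈ 1.0
```

The tail probability depends only on the ratio δA/sd = √(2/π), which is why γ = 0.21 holds whatever the scale of the alternative hypothesis.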

A mass of probability on the null hypothesis has also been used in a cancer trial182 and for sensitivity analysis in trial reporting243 (see page 33).

Although such an analysis provides an explicit probability that the null hypothesis is true, and so appears to answer the question of interest, the prior might be somewhat more realistic were the lump to be placed on a small range of values representing the more plausible null hypothesis of ‘no clinically effective difference’ (although Cornfield112 points out that the ‘lump’ is just a mathematical approximation to such a prior). Lachin277

has extended the approach to the situation where the null hypothesis forms an interval.

‘Robust’ priors

To save all the effort of eliciting a prior distribution, it seems reasonable when reporting or monitoring a study to identify how the prior affects the conclusions. This essentially means identifying a class of prior distributions, and then seeing how the posterior conclusions vary with priors within that class. Greenhouse and Wasserman206 carry out two such case studies, while Carlin and Sargent93,388

use ‘prior partitioning’ to identify the set of priors that would lead, say, to stopping a trial according to specified tail-area posterior probabilities. Those in authority then simply have to assess whether their prior lies in the appropriate partition. See pages 31 and 33 for further discussion of this approach in clinical trials.


Exchangeability, hierarchical models and multiplicity

Suppose we are interested in making inferences on many parameters θ1,…, θk measured on k ‘units’ which may, for example, be true treatment effects in subsets of patients, multiple clinics, or for each of a series of trials. We can identify three different assumptions:

1. All the θs are identical, in which case all the data can be pooled and the individual units forgotten.

2. All the θs are entirely unrelated, in which case the results from each unit are analysed independently (for example using a fully specified prior distribution).

3. The θs are assumed to be ‘similar’ in the following sense. Suppose we were blinded as to which unit was which, and all we had was a label for each, say A, B, C and so on. Then our prior opinion about any particular set of θs would not be affected by only knowing the labels, in that we have no reason to think specific units are systematically different. Such a set of θs are called ‘exchangeable’,43 and this assumption can be shown under broad conditions to be mathematically equivalent to assuming they are drawn at random from some population distribution,

just as in a traditional random effects model. Note that there does not need to be any actual sampling – perhaps these k units are the only ones that exist – since the probability structure is a consequence of the belief rather than a physical randomisation mechanism.

We emphasise that an assumption of exchangeability is a judgement based on our knowledge of the context. If there are known reasons to suspect specific units are systematically different, then those reasons need to be modelled. Dixon and Simon134 discuss the reasonableness of exchangeability assumptions.

The Bayesian approach to multiplicity is thus to integrate all the units into a single model, in which it is assumed that θ1,…, θk are drawn from some common prior distribution whose parameters are unknown: this is known as a hierarchical model. These unknown parameters may then be estimated directly from the data using the technique of ‘marginal maximum likelihood’ (known as the ‘empirical Bayes’ approach), or given a (usually minimally informative) prior distribution (known as the ‘full Bayes’ approach). The results from either an empirical or full Bayes analysis will generally be similar, and lead to the inferences for each unit having narrower intervals than if they are assumed independent, but biased towards the mean response. Thus there is a similar conservatism to


FIGURE 2 Sceptical (solid line) and enthusiastic (broken line) priors for a trial with alternative hypothesis δA; the horizontal axis is the benefit of the new treatment, with 0 and δA marked. The sceptics’ probability that the true difference is greater than δA is γ (shown shaded). This value has also been chosen for the enthusiasts’ probability that the true difference is less than 0


the traditional adjustment, but for a radically different reason.
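The shrinkage behaviour described above can be sketched for the normal-normal hierarchical model with a known sampling variance per unit, using a simple method-of-moments estimate of the between-unit variance in place of full marginal maximum likelihood; the data are illustrative:

```python
# Minimal empirical Bayes sketch for the normal-normal hierarchical
# model: unit estimates are shrunk towards the overall mean and their
# intervals narrow relative to independent analyses. Data illustrative.

y = [0.1, 0.4, 0.5, 0.9, 1.1]     # observed effects in k = 5 units
s2 = 0.09                          # known sampling variance per estimate

k = len(y)
mu = sum(y) / k
# method-of-moments estimate of the between-unit variance tau^2
var_y = sum((v - mu) ** 2 for v in y) / (k - 1)
tau2 = max(0.0, var_y - s2)

w = tau2 / (tau2 + s2)             # weight on each unit's own data
post_means = [mu + w * (v - mu) for v in y]
post_var = tau2 * s2 / (tau2 + s2) # always smaller than s2

print([round(m, 3) for m in post_means])
print(round(post_var, 4), "<", s2)
```

Every posterior mean lies between its raw estimate and the overall mean, and the common posterior variance is below the sampling variance: narrower intervals, but pulled towards the mean response.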

Cornfield112,113 was an early proponent of the Bayesian approach to multiplicity (see page 35). Breslow72 gives many examples of problems of multiplicity and reviews the use of empirical Bayes methods for longitudinal data, small area mapping, estimation of a large number of relative risks in a case–control study, and multiple tumour sites in toxicology experiments, while Louis309 reviews the area and provides a detailed case study. Other examples in clinical trials are described on page 35.

Empirical criticism of priors

The ability of subjective prior distributions to predict the true benefits of interventions is clearly of great interest, and Box69 suggested a methodology for comparing priors with subsequent data. The prior is used to create a predictive distribution of the future summary statistic, which provides the prior predictive probability of observing data as extreme as that actually seen. This probability may be termed Box’s P value. Spiegelhalter416 showed how this could be applied to a clinical trial, and applications have been reported for discrete data,369,422 while Spiegelhalter et al.421 display predictive distributions and Box’s P values in all their analyses (see chapter 9 for an application in the CHART trial). For normal prior and likelihood, Box’s P value is simply derived from the standardised difference (denoted Z) between the two distributions (difference in means divided by the square root of the sum of their variances).421

There have been a number of prospective elicitation exercises for clinical trials, and many of these trials have now reported their results. Table 5 shows a selection of results, including the intervals for the

prior distributions for treatment effects, the evidence from the likelihood, and Box’s P value summarising the conflict between the prior and the likelihood.

Table 5 shows the generally poor experience obtained from prior elicitation. The clinicians are universally optimistic about the new treatments (median of prior hazard ratios > 1), whereas only one of the trials eventually showed any evidence of benefit from the new treatment (likelihood hazard ratio > 1). This also reflects the experience of Carlin et al.90 in their elicitation exercise.

Commentary

Here we shall only consider comments on how priors may best be obtained: whether they should be used or not is discussed elsewhere (see pages 14 and 39).

There have been many criticisms of the process of eliciting subjective prior distributions in the health technology assessment context. Claims include:

1. Subjects are biased in their opinions. Gilbert et al.192 state that “innovations brought to the stage of randomised trials are usually expected by the innovators to be sure winners”, and Hughes242

points out that the very fact that clinicians are participating in a trial is likely to suggest they expect the new therapy to be of benefit; this appears to be borne out in the results shown in Table 5. Altman12 warns that investigators may even begin to exaggerate their prior beliefs in order to make their prospective trial appear more attractive (although this appears to happen already in both publicly funded and industry-funded studies). Fisher171 believes the effort put into elicitation is misplaced, since the measured beliefs are likely to be based more on emotion than scientific evidence.


TABLE 5 A comparison of some elicited subjective prior distributions and the consequent results of the clinical trials. In each case a pooled prior was provided, assumed normal on a log(hazard ratio) scale – Box’s P value is calculated on this scale. This is transformed to a hazard ratio scale where a hazard ratio > 1 corresponds to benefit of the new treatment: median and 95% intervals are given

Study                      Prior                                    Likelihood                               Z      P
                           Hazard ratio  95% interval   Reference   Hazard ratio  95% interval   Reference
CHART (lung)               1.37          0.87–2.14      348         1.32          1.09–1.59      390         0.14   0.89
Thiotepa X1                1.65          0.99–2.74      418         0.90          0.63–1.29      375         1.90   0.06
Osteo                      1.11          0.66–1.83      420         0.94          0.69–1.27      413         0.54   0.59
Neutron (clinical prior)   1.19          0.67–2.08      421         0.66          0.40–1.10      421         1.51   0.14
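The Z and Box’s P values in Table 5 can be approximately reproduced from the quoted medians and 95% intervals. The sketch below applies the formula in the text to the CHART (lung) row, assuming (as the table states) normality on the log(hazard ratio) scale, with the 95% intervals spanning ±1.96 standard deviations:

```python
import math

# Approximate reproduction of Z and Box's P for the CHART (lung) row
# of Table 5, assuming normality on the log(hazard ratio) scale.

def normal_from_interval(median_hr, lo_hr, hi_hr):
    """Mean and sd on the log(hazard ratio) scale from a median and
    95% interval quoted on the hazard ratio scale."""
    return math.log(median_hr), (math.log(hi_hr) - math.log(lo_hr)) / (2 * 1.96)

prior_m, prior_s = normal_from_interval(1.37, 0.87, 2.14)
lik_m, lik_s = normal_from_interval(1.32, 1.09, 1.59)

# standardised difference between prior and likelihood
z = abs(prior_m - lik_m) / math.sqrt(prior_s ** 2 + lik_s ** 2)
# Box's P: two-sided tail area for the standardised difference
p = 2 * (1 - 0.5 * (1 + math.erf(z / math.sqrt(2))))

print(round(z, 2), round(p, 2))   # close to the 0.14 and 0.89 in Table 5
```

Small discrepancies in the second decimal place are to be expected, since the table itself quotes rounded medians and interval limits.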


2. The choice of subject biases results. The biases discussed above mean that the choice of subject for elicitation is likely to influence the results. If we wish to know the distribution of opinions among well-informed clinicians, then trial investigators are not a random sample and will give biased conclusions. Lewis292 says statisticians reviewing the literature should provide much better prior distributions than clinicians, while Chalmers95 suggests even lay people are biased towards believing new therapies will be advances, and therefore we need empirical evidence on which to base the prior probability of superiority.

Fisher171 claims that uncertainty as to whose prior to use militates against any use of Bayesian methods. However, it can be argued that taking context into account (see page 13) means that it is quite reasonable to allow for differing perspectives.

3. Timing of elicitation has an influence. Senn397

objects to any retrospective elicitation of priors as “present remembrance of priors past is not the same as a true prior”, while Hughes242 points out that opinions are likely to be biased by what evidence has recently been presented and by whom.

4. Subjective judgement of exchangeability may be inappropriate. Tukey454 says that “to treat the true improvements for the classes concerned as a sample from a nicely behaved population … does not seem to me to be near enough the real world to be a satisfactory and trustworthy basis for the careful assessment of strength of evidence”.

These concerns have led to a call for the evidential basis for priors to be made explicit, and for effort to go into identifying reasons for disagreement and attempting to resolve these.171 Even advocates of Bayesian methods have suggested that the biases in

clinical priors suggest more attention to empirical evidence from past trials, possibly represented as sceptical priors: Fayers164 asks, given the long experience of negative trials, “should we not be using priors strongly centred around 0, irrespective of initial opinions, beliefs and hopes of clinicians?”, while Spiegelhalter et al.421 say that elicited priors from investigators show predictable positive bias and may possibly be replaced by archetypal ‘enthusiastic’ priors.

Key points

1. The use of a prior is based on judgement and hence a degree of subjectivity cannot be avoided.

2. The prior is important and not unique, and so a range of options should be examined in a sensitivity analysis.

3. The intended audience for the analysis needs to be explicitly specified.

4. The quality of subjective priors (as assessed by predictions) shows predictable biases in terms of enthusiasm.

5. For a prior to be taken seriously, its evidential basis must be explicitly given, as well as any assumptions made (e.g. downweighting of past data). Care must, however, be taken over bias in published results.

6. Archetypal priors may be useful for identifying a reasonable range of prior opinion.

7. Great care is required in using default priors intended to be minimally informative.

8. Exchangeability assumptions should not be made casually.


A Bayesian: one who asks you what you think before a clinical trial in order to tell you what you think afterwards.

Stephen Senn398

General arguments

Randomised controlled trials have provided fertile territory for arguments between alternative statistical philosophies. There are many specific issues in which a distinct Bayesian approach is identifiable, such as the ethics of randomisation, power calculations, monitoring, subset analysis, alternative designs and so on, and these are dealt with in separate sections below.

There are also a number of general discussion papers on the relevance of Bayesian methods to trials. These include tutorial introductions at a non-technical295 and slightly more technical level,1

while Brophy and Joseph76 explain the Bayesian approach by demonstrating its use in re-analysing data from the Global Utilization of Streptokinase and t-PA for Occluded Coronary Arteries (GUSTO) trial of different thrombolytic regimes, using prior information from other studies. More detailed but non-mathematical discussions are given by Cornfield113 and Kadane,262 who particularly emphasises the internal consistency of the Bayesian approach, and welcomes the need for explicit prior distributions and loss functions as producing scientific openness and honesty. Pocock and Hughes365 again provide a non-mathematical discussion concentrating on estimation issues in trials, while Armitage22 attempts a balanced view of the competing methodologies. Particular emphasis has been placed on the ability of Bayesian methods to take full advantage of the accumulating evidence provided by small trials.302,317 A special issue of Statistics in Medicine on ‘methodological and ethical issues in clinical trials’ contains papers both for53,420,459 and against476 the Bayesian perspective, and features incisive discussion by Armitage, Cox and others. Palmer and Rosenberger346 review non-standard

trial designs and suggest circumstances where they may be appropriate.

Somewhat more technical reviews are given by Spiegelhalter and colleagues.420,421 Berry52,53,55 has long argued for a Bayesian decision-theoretic basis for clinical trial design, and has described in detail methods for elicitation, monitoring, decision-making and using historical controls.

Earlier (see page 14), we identified the fundamental dual issues of prior distributions and loss functions, and we have followed this division by focusing on inferences from a clinical trial in this chapter, and delaying discussion of subsequent decisions, whether based on randomised or non-randomised evidence, to chapter 7. We realise this is a somewhat artificial separation, as the potential role for explicit statement of a loss function is a running theme throughout discussions on design, sample size, sequential analysis, adaptive allocation and payback from research programmes, and many would argue that the eventual decision is inseparable from design and analysis of a study. However, the health technology assessment context often means that the investigators who design and carry out a study are generally not the same body who make decisions on the basis of the evidence (see page 13), and so if we take a pragmatic rather than ideological perspective, then the attempt at separation appears reasonable.

Ethics and randomisation

Is randomisation necessary?

Randomisation has two traditional justifications: it ensures treatment groups are directly comparable (up to the play of chance), and it provides a fundamental basis for the probability distributions underlying conventional statistical procedures.

Since Bayesian probability models are derived from subjective beliefs and do not require any underlying random mechanism, the latter requirement is irrelevant, and this has led some to


Chapter 4

A guide to the Bayesian health technology assessment literature: conduct of randomised controlled trials


question the need for randomisation at all, provided alternative methods of balancing groups can be established. For example, Urbach459 argues that a “Bayesian analysis of clinical trials affords a valid, intuitively plausible rationale for selective controls, and marks out a more limited role for randomisation than it is generally accorded”, while Kadane261 suggests updating clinicians’ priors and only assigning treatments that at least one clinician considers optimal. Berry goes further in claiming46 “Randomised trials are inherently unethical”. Papineau347 refutes Urbach’s position and claims that, despite it not being essential for statistical inference, experimental randomisation plays a vital role in drawing causal conclusions (see also Rubin383). The relationship between randomisation and causal inferences is beyond the scope of this chapter, but in general the need for sound experimental design appears to dominate philosophical statistical issues.246 In fact Berry and Kadane65 suggest that if there are several parties who make different decisions and observe

different data, randomisation may be a strictly optimal procedure since it enables each observer to draw his or her own appropriate conclusions. Kadane and Seidenfeld263 make a useful distinction between experiments to learn and those to prove, which we will find useful when it comes to discussing confirmatory trials.

That careful analysis of databases can to some extent replace randomised trials has been argued by Howson and Urbach240 and Hlatky.235 Byar84 puts an opposing view.

When is it ethical to randomise?

If we agree that randomisation is in theory useful, then the issue arises of when it is ethical to randomise. This is closely associated with the process of deciding when to stop a trial (discussed further on page 29) and is often represented as a balance between individual and collective ethics.346,362 The ethics of randomisation and clinical trials have been covered in Edwards et al.155


FIGURE 3 Possible situations at any point in the progress of a trial, derived from superimposing an interval estimate (say 95%) on the range of equivalence. The horizontal axis runs from ‘old treatment superior’ through the range of equivalence to ‘new treatment superior’; intervals A to E and C* illustrate the possible verdicts: new superior, old not superior, equivalent, equivocal, new not superior and old superior


Freedman176 introduced the idea of professional equipoise, in which disagreement among the medical profession makes randomisation ethical. The trial design of Kadane258 (see appendix 1) is an expression of this principle, in that only a treatment that at least one clinician thought optimal could be given to a patient (although it unfortunately turned out that a computer bug meant that some patients were allocated to treatments that all clinicians felt were suboptimal). Perhaps a more appealing approach is the ‘uncertainty principle’, which is often argued as a basis for ethical randomisation85: this may be thought of as ‘personal equipoise’154 in which the clinician is uncertain as to the best treatment for the patient in front of him or her. However, a quantified degree of uncertainty has not been specified.

The Bayesian approach can be seen as formalising the uncertainty principle by explicitly representing, in theory, the belief of an individual clinician that a treatment may be beneficial to a specific patient – this could be provided by superimposing the clinician’s posterior distribution on the range of equivalence (see page 27) relevant to a particular patient.421 It has been argued that a Bayesian model naturally formalises the individual ethical position,300,345 in that it explicitly confronts the personal belief in the clinical superiority of one treatment. Berry,53 however, has suggested that if patients were honestly presented with numerical values for their clinician’s belief in the superiority of a treatment, then few might agree to be randomised. One option might be to randomise but with a varying probability that is dynamically weighted towards the currently favoured treatment: such adaptive allocation designs are discussed on page 37.

Kass and Greenhouse267 argue that “the purpose of a trial is to collect data that bring to conclusive consensus at termination opinions that had been diverse and indecisive at the outset”, and go on to state that “randomisation is ethically justifiable when a cautious reasonable sceptic would be unwilling to state a preference in favour of either the treatment or the control”. This approach leads naturally to the development of sceptical prior distributions (see page 19) and their use in monitoring sequential trials (see page 30).

Specification of null hypotheses

Attention in a trial usually focuses on the null hypothesis of treatment equivalence expressed by

θ = 0, but realistically this is often not the only hypothesis of interest. Increased costs, toxicity and so on may mean that a certain improvement would be necessary before the new treatment could be considered clinically superior, and we shall denote this value θS. Similarly, the new treatment might not actually be considered clinically inferior unless the true benefit were less than some threshold denoted θI. The interval between θI and θS has been termed the ‘range of equivalence’179 (see Figure 3): often θI is taken to be 0.

This is not a specifically Bayesian idea,22 and there are several published examples of the elicitation and use of such ranges of equivalence.172,180
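The verdicts of Figure 3 can be sketched as a classification rule over an interval estimate and the range of equivalence (θI, θS). The boundary conventions and the example intervals below are one plausible reading of the figure, not a specification taken from the text:

```python
# Sketch of the classification in Figure 3: superimpose a 95% interval
# (lo, hi) for the treatment effect on the range of equivalence
# (theta_i, theta_s). Boundary conventions here are an assumption.

def classify(lo, hi, theta_i=0.0, theta_s=0.1):
    if lo >= theta_s:
        return "new superior"             # interval wholly above theta_S
    if hi <= theta_i:
        return "old superior"             # interval wholly below theta_I
    if lo < theta_i and hi > theta_s:
        return "equivocal"                # spans the whole range
    if lo >= theta_i and hi <= theta_s:
        return "equivalent"               # inside the range
    if lo >= theta_i:
        return "old not superior"         # above theta_I, crosses theta_S
    return "new not superior"             # below theta_S, crosses theta_I

print(classify(0.15, 0.30))   # new superior
print(classify(0.02, 0.08))   # equivalent
print(classify(-0.05, 0.25))  # equivocal
```

As the trial accumulates data the interval narrows and the classification can migrate, which is what links the range of equivalence to the monitoring and stopping questions discussed elsewhere in the report.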

Using historical controls

A Bayesian basis for the use of historical controls in clinical trials, generally in addition to some contemporaneous controls, is based on the idea that it is wasteful and inefficient to ignore all past information on control groups when making a new comparison. This was first suggested by Pocock,360

and has since been particularly developed withinthe field of carcinogenicity studies.386 The crucialissue is the extent to which the historical informa-tion can be considered equivalent to contemp-oraneous data: essentially the inclusion of histor-ical controls is indistinguishable from using suchdata as the basis for a prior opinion (see page 18).

Five broad approaches have been taken:

1. Assume the historical control individuals are exchangeable with those in the current control group, which leads to a complete pooling of historical with experimental controls.

2. Assume the historical control groups are exchangeable with the current control group, and hence build or assume a hierarchical model for the response within each group.127,440 This leads to a degree of pooling between the control groups, depending on their observed or assumed heterogeneity. Gould199 suggests using past trials to augment current control group information, assuming exchangeable control groups. Rather than directly producing a posterior distribution on the contrast of interest, he uses this historical information to derive predictive probabilities of obtaining a significant result were a full trial to have taken place (see page 28) (this paper carefully avoids the term ‘Bayesian’, and we might suspect this was a deliberate policy).

Health Technology Assessment 2000; Vol. 4: No. 38


3. Assume that the parameter being estimated in historical data is some function of the parameter that is of interest, thus explicitly modelling potential biases, as in the confidence profile method150 (see page 49).

Atkinson25 suggests this technique in a context where the historical control data is available on a general population, but experimental therapy is only given to an identifiable subset of new patients. A posterior distribution for the benefit of treatment can be obtained by constructing a model for what would have been the control response on that subset.

4. If there is only one historical group, then assume a parameter representing the probability that any past individual is exchangeable with current individuals, so discounting the contribution of past data to the likelihood, as used by Berry44,45 in reanalysis of the extracorporeal membrane oxygenation (ECMO) study (see page 37).

5. Assume a certain prior probability that the historical control group exactly matches the contemporaneous controls and hence can be pooled. Racine et al.370 provide a Bayes’s factor formulation within the context of bioequivalence studies.
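In the spirit of approach 4, the discounting of historical data can be sketched for a binomial control response by down-weighting the historical likelihood (a power-prior-style device; all counts and the weight alpha are hypothetical):

```python
def discounted_beta_posterior(y_cur, n_cur, y_hist, n_hist, alpha,
                              a0=1.0, b0=1.0):
    """Beta posterior for a control event rate in which the historical
    likelihood is raised to the power alpha (0 = ignore history,
    1 = pool fully), so historical data contribute only alpha times
    their observed counts. a0, b0 are the initial Beta prior parameters."""
    a = a0 + y_cur + alpha * y_hist
    b = b0 + (n_cur - y_cur) + alpha * (n_hist - y_hist)
    return a, b

# Hypothetical data: 20/80 events currently, 30/100 historically,
# with the historical series given half weight
a, b = discounted_beta_posterior(20, 80, 30, 100, alpha=0.5)
post_mean = a / (a + b)
```

With alpha = 0.5 the historical series behaves like an extra 15 events in 50 patients, illustrating how the weight smoothly interpolates between ignoring and fully pooling the past controls.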

Design: sample size of non-sequential trials

Here we only deal with trials of fixed sample size: see page 29 for sequential designs.

We described a taxonomy of six different broad approaches to health technology assessment studies earlier (see page 14). Here we focus on how the four main concepts (ignoring the Bayesian hypothesis testing and the classical decision-theory approach) deal with selecting the size of an experiment.

1. Fisherian. In principle there is no need for preplanned sample sizes, but a choice may be made by selecting a particular precision of measurement and informally trading that off against the cost of experimentation.

2. Neyman–Pearson. Trials have traditionally been designed to have reasonable power, defined as the chance of correctly detecting an alternative hypothesis, best defined as being both ‘realistic and important’. Power is generally set to 80 or 90%.

3. Proper Bayesian. As in the Fisherian approach, there is in principle no need for preplanned sample sizes, as pointed out by Lilford et al.153,300,302 Alternatively, it is natural to focus on the eventual precision of the posterior distribution of the treatment effect. There is an extensive literature on non-power-based Bayesian sample size calculations which may be very relevant for trial design.9,247,255,256

Considerable attention has been paid to a hybrid between Neyman–Pearson and Bayesian approaches, in which the prior might be used in the design but not in reporting the analysis. Thus the requirement for a trial being large enough to detect a plausible difference naturally leads to the use of prior distributions: either the prior mean could be taken as the alternative hypothesis or the power for each value of θ could be averaged with respect to the prior distribution to obtain an ‘expected’ or ‘average’ power, which should be a more realistic assessment of the chance that the trial will yield a positive conclusion.

Prior distributions can thus be used for sample size calculations for trials that will be analysed in a traditional frequentist or Bayesian manner.421

The prior distributions might be from any of the sources described in chapter 3, and examples include sets of subjective assessments,150,180,414,418,442 a single previous study,78 or a meta-analysis of previous results.96,129

Brown78 predicts the chance of correctly detecting a positive improvement, rather than the overall chance of getting a positive result, and this might be considered a more reasonable target. Predictive power can also be applied to null hypotheses defined as ranges of equivalence.328 Gould considers nuisance parameters, for example overall response rate, and uses historical data200 or interim data201 for power calculations.

It is natural to express a cautionary note on projecting from previous studies,273 and possible techniques for discounting past studies are very relevant (see page 27).

4. Decision-theoretic Bayesian. If we are willing to express a utility function for the cost of experimentation and the potential benefit of the treatment, then sample sizes can be chosen to maximise the expected utility. This was considered by Canner,88 while Thompson452 estimated that a trial of electronic foetal monitoring with 180,000 cases per arm would cost US$22 million but would be expected to provide a benefit of US$118 million. Detsky131 conducted an early attempt to model the impact of a trial in terms of future lives saved, which required modelling beliefs about the future number to be treated and the true benefit of the treatment, while Tan and Smith439 fit the ideas in with ranges of equivalence. In other examples,107,238,239 sample sizes are explicitly determined by a trade-off between the cost of the trial and the expected future benefit: see page 39 for some comments on the consequences. This approach also attempts to answer the question ‘what is the expected net benefit from carrying out the trial?’, which is discussed further on page 52.

Instead of attempting to model the future benefit of a trial, Lindley305 considers a utility function based only on the final interval.
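The ‘expected’ or ‘average’ power mentioned under the proper Bayesian heading above can be sketched by Monte Carlo, averaging the conditional frequentist power over a normal prior for the treatment effect; the standard error, prior mean and prior spread below are all hypothetical design values:

```python
import random
from math import erf, sqrt

def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def conditional_power(theta, se, z_crit=1.96):
    """One-sided power of a z-test given true effect theta and the
    standard error se of the estimated treatment effect."""
    return 1.0 - normal_cdf(z_crit - theta / se)

def expected_power(prior_mean, prior_sd, se, n_sims=100_000, seed=1):
    """Average the conditional power over a normal prior for theta."""
    rng = random.Random(seed)
    return sum(conditional_power(rng.gauss(prior_mean, prior_sd), se)
               for _ in range(n_sims)) / n_sims

se = 0.1414                                  # hypothetical standard error
ep = expected_power(prior_mean=0.3, prior_sd=0.15, se=se)
cp = conditional_power(0.3, se)              # power at the prior mean alone
```

Here the expected power is a little below the power computed at the prior mean, reflecting the prior probability that the true effect is smaller than hoped, which is exactly why average power is described above as a more realistic assessment.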

Design and monitoring of sequential trials

Introduction

Whether or not to stop a trial early is a complex ethical, financial, organisational and scientific issue, in which statistical analysis plays a considerable role. Furthermore the monitoring of sequential clinical trials can be considered the ‘front line’ between Bayesian and frequentist approaches, and Etzioni and Kadane state that the reasons for their divergence “reach to the very foundations of the two paradigms”.162 Pocock,362 O’Brien336 and Whitehead478 provide good reviews.

Four main statistical approaches can be identified, again corresponding to the four main entries in Table 4.

1. Fisherian. This is perhaps best exemplified in trials influenced by Richard Peto’s group, in which protocols state109 that the data-monitoring committee should only alert the steering committee to stop the trial on efficacy grounds if there is “both (a) ‘proof beyond reasonable doubt’ that for all, or for some, types of patient one particular treatment is clearly indicated…, and (b) evidence that might reasonably be expected to influence the patient management of many clinicians who are already aware of the results of other main studies”.

2. Neyman–Pearson. This classical method attempts to retain a fixed type I error through pre-specified stopping boundaries. These may be considered as stopping guidelines: Demets126 states that “while they are not stopping rules, such methods can be useful in the decision-making process”, although regulatory authorities require good reasons for not adhering to such boundaries.248 Whitehead476,477 is a major proponent, and Jennison and Turnbull250 provide a detailed review.

3. Proper Bayesian. Probabilities derived from a posterior distribution may be used for monitoring, without formally prespecifying a stopping criterion – see page 30. There is no real need in this framework even to prespecify a sample size.53 As for fixed sample size trials, prior distributions have been used at the design stage but assuming a Neyman–Pearson analysis.180,321

4. Decision-theoretic Bayesian. This assumes we are willing to explicitly assess the losses associated with consequences of stopping or continuing the study (see page 32), and therefore requires a full specification of the ‘patient horizon’, the allocation rule and so on. This approach also quantifies the expected benefit of the trial and therefore helps decide whether to conduct the trial at all.

Here we briefly describe some of the huge literature on this subject.

A brief critique of Neyman–Pearson methods in sequential trials

As introduced on page 10, the need to adjust conclusions because the data have been looked at during the study has been roundly criticised. Anscombe13 baldly states that “Sequential analysis is a hoax”, and Meier323 considers that “provided the investigator has faithfully presented his methods and all of his results, it seems hard indeed to accept the notion that I should be influenced in my judgement by how frequently he peeked at the data while he was collecting it”. The crucial technical point is that Neyman–Pearson theory disobeys the likelihood principle (see page 13), and hence there is no need for Bayesians or Fisherians to take any account of what would have happened had something other been observed.50

If we were to assign weights to the relative importance of the two types of error that could be made, any resulting design would seek to minimise a linear combination of the type I error rate α and type II error rate β. The fact that such a design would obey the likelihood principle led Cornfield111 to point out that “the entire basis for sequential analysis depends upon nothing more profound than a preference for minimising β for given α rather than minimizing their linear combination. Rarely has so mighty a structure, and one so surprising to scientific common sense, rested on so frail a distinction and so delicate a preference”.

Freedman177 points out that there is no agreed method of estimation following a sequential trial, and Hughes and Pocock244,364 argue that frequentist sequential rules are “prone to exaggerate magnitude of treatment effect” since they would tend to stop when on a random high. However, Whitehead478 says one can adjust to get rid of such biases, and Hughes and Pocock are really looking at inconsistency with clinical opinion.

Armitage17 agrees that adjusted P values are “too tenuous to be quoted in an authoritative analysis of the data”, but still considers frequency properties of stopping rules may be useful guides for “mental adjustment”. Heitjan et al.228 say the loss function implicit in a sequential analysis reflects a focus on inference rather than decisions.

Pocock and Hughes364 say that “control of the overall type I error is a vital aid to restricting the flood of false positives in the medical literature”, but this appears to introduce the extraneous issue of selective reporting.

From a practical perspective, the responsibility for recommending the early termination of a trial is increasingly vested in an independent data and safety monitoring committee, which will need to take into account multiple sources of evidence when making their judgements. Classical sequential analysis may be a useful warning to them against overinterpretation of naive P values.

Monitoring using the posterior distribution

Following the ‘proper Bayesian’ approach, it is natural to consider terminating a trial when one is confident that one treatment is better than the other, and this may be formalised by assessing the posterior probability that the treatment benefit θ lies above or below some boundary, such as the ends of the range of equivalence. It is generally recommended that a range of priors are considered, and applications have been reported in a variety of trials.38,51,77,90,129,182,183,191,348,381 Explicit comparisons with boundaries obtained by frequentist procedures have been displayed,129,181,183 and the similar conservatism noted. Armitage17 and Simon405 discuss the choice of an appropriate boundary.

The informal stopping procedure described above explicitly takes into account the impact of the results on a range of clinical opinion, and so follows Kass and Greenhouse267 in claiming that a successful trial should contain sufficient evidence to bring both a sceptic and an enthusiast to broadly the same conclusions (see page 26). This may be formalised by using the concept of sceptical and enthusiastic priors (see page 19), in which stopping with a positive result might be considered if a posterior based on a sceptical prior suggested a high probability of treatment benefit, whereas stopping with a negative result may be based on whether the results were sufficiently disappointing to make a posterior based on an enthusiastic prior rule out a treatment benefit: in other words we should stop if we have convinced a reasonable adversary that he or she is wrong. Fayers et al.166 provide a tutorial on such an approach, and Freedman et al.178 consider it as part of an exercise for a data-monitoring committee. Digman et al.133 give an example in which the data have overwhelmed an optimistic prior centred on a 40% risk reduction, and hence justify assuming a negative result and early stopping.
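For normal approximations this dual-prior criterion can be written down directly; the priors, probability thresholds and interim values below are all hypothetical illustrations, not values from any of the cited trials:

```python
from math import erf, sqrt

def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def posterior(theta_hat, se, prior_mean, prior_sd):
    """Normal prior combined with a normal likelihood -> (mean, sd)."""
    w = prior_sd ** 2 / (prior_sd ** 2 + se ** 2)   # weight on the data
    mean = w * theta_hat + (1.0 - w) * prior_mean
    sd = sqrt(1.0 / (1.0 / prior_sd ** 2 + 1.0 / se ** 2))
    return mean, sd

def monitor(theta_hat, se, sceptic_sd, enth_mean, enth_sd, p_stop=0.95):
    """Stop for efficacy if even a sceptic (prior centred on 0) is
    convinced of benefit; stop for futility if even an enthusiast
    now assigns benefit a probability below 1 - p_stop."""
    m_s, s_s = posterior(theta_hat, se, 0.0, sceptic_sd)
    m_e, s_e = posterior(theta_hat, se, enth_mean, enth_sd)
    if 1.0 - normal_cdf(-m_s / s_s) > p_stop:       # P(theta > 0), sceptic
        return "stop: efficacy"
    if 1.0 - normal_cdf(-m_e / s_e) < 1.0 - p_stop: # P(theta > 0), enthusiast
        return "stop: futility"
    return "continue"
```

For instance, a strong interim effect convinces the sceptic (`monitor(0.5, 0.12, 0.2, 0.5, 0.2)` stops for efficacy), while a clearly negative one disillusions the enthusiast.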

Greenhouse and Wasserman206 and Carlin and Sargent93,388 consider stopping rules based on ‘robust priors’: Carlin89 argues that this enables statements of the form “given the data so far, the prior would have to place a mass of at least p on the range where the new treatment is considered superior, in order to avoid stopping now and rejecting this treatment as inferior”. Emerson159 retrospectively examines what class of priors would have been needed to replicate stopping used in a study (see page 20). Posterior probabilities of two responses can be monitored jointly, and stopping considered when an event of interest, such as either outcome occurring,161 exceeds a certain threshold.

A specific problem occurs when additional information becomes available as a trial is continuing, such as the publication of similar studies. Brophy and Joseph77 argue that this is a good opportunity for Bayesian methods, generating considerable discussion271 and letters.

This monitoring scheme has also been proposed for single-arm studies, and is discussed within the context of Phase I and II trials (see page 38). Criticisms of this procedure include its sampling properties (see page 32), lack of explicit loss function (see page 32), and dependence on prior (see page 39).

Monitoring using predictions

Investigators and funders are often concerned with the question ‘Given the data so far, what is the chance of getting a ‘significant’ result?’ The traditional approach to this question is ‘stochastic curtailment’,221 which calculates the conditional power of the study, given the data so far, for a range of alternative hypotheses.250

Digman et al.133 point out that it is not, however, reasonable to condition on a hypothesis that is no longer tenable. From a Bayesian perspective it is natural to average such conditional powers with respect to the current posterior distribution, just as the pretrial power was averaged with respect to the prior to produce the average or expected power (see page 28). The methodology has been illustrated in a variety of contexts103,104,286,368,419 in which initial trial results are used to predict the probability of eventually gaining ‘significance’. Since data are available at an interim stage a minimally informative pretrial prior may be used, and hence the method does not strictly speaking require a proper Bayesian justification. Armitage22 points out circumstances in which the predictions can be based on a pivotal quantity that does not depend on the parameter, and Lan and Wittes’s278 ‘B value’ enables calculation of the predictive probability of significance. Chang and Shuster100 use non-parametric predictions of survival times without a prior, while Frei et al.185 and Hilsenbeck234 provide practical examples of stopping studies due to the futility of continuing.
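With a flat pretrial prior and the usual normal approximation, this predictive probability has a closed form in terms of Lan and Wittes’s B value, B = Z√f; the sketch below reproduces the 29% value quoted in the caption of Figure 4:

```python
from math import erf, sqrt

def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def predictive_probability(z, f, z_crit=1.96):
    """Predictive probability that the final test statistic exceeds
    z_crit (two-sided P = 0.05 by default, in the currently favoured
    direction), given standardised statistic z at fraction f of the
    study, with a flat prior on the drift. Uses the B value B = z*sqrt(f):
    B(1) | B(f) is then normal with mean B/f and variance (1 - f)/f."""
    b = z * sqrt(f)
    return 1.0 - normal_cdf((z_crit - b / f) / sqrt((1.0 - f) / f))

# Halfway through a study with the effect one standard error from zero
p_half = predictive_probability(1.0, 0.5)
```

As Figure 4 illustrates, such probabilities are often surprisingly low: an interim Z of 1 at the halfway point gives only about a 29% chance of final significance at the two-sided 5% level.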

The fact that predicted probabilities of success are often surprisingly low has been emphasised and is shown in Figure 4.420 The chance that the result will even change sign may also be reported.48 The technique has been used with results that currently show approximate equivalence to justify the ‘futility’ of continuing,470 and may be particularly useful for data-monitoring committees and funders275


FIGURE 4 Predictive probability of obtaining a significant (two-sided P = 0.01 (a) or P = 0.05 (b)) result, given the fraction f of study completed (curves for f = 10, 25, 50, 75 and 90%) and the current standardised test statistic Z. For example, if halfway through a study (f = 50%), the treatment effect is currently one standard error away from 0 (Z = 1), then based on this information alone there is only a 29% chance that the trial will eventually show a significant (P < 0.05) benefit of treatment


when accrual or event rates are lower than expected.5

Nevertheless, Armitage17,18,22 warns against using this predictive procedure as any kind of formal stopping rule as it gives an undue weight to ‘significance’, and makes strong assumptions about the direct comparability of future data with those already observed – for example if future data involve extended follow-up there may be undue reliance on an assumption of proportional hazards.421

Monitoring using a formal loss function

The full Bayesian decision-theoretic approach requires the specification of losses associated with all combinations of possible true underlying states and all possible actions. The decision whether to terminate a trial is then, in theory, based on whether termination has a lower expected loss than continuing, where the expectation is with respect to the current posterior distribution, and the consequences of continuing have to consider all possible future actions – this ‘backwards induction’ requires the computationally intensive technique of ‘dynamic programming’.

Reasonably straightforward solutions can be found in some circumstances. For example, Anscombe13 considers n pairs of patients randomised equally to two groups, a total patient horizon of N, a uniform prior on true treatment benefit, and loss function proportional to the number of patients given the inferior treatment times the size of the inferiority: he concludes it is approximately optimal to stop and give the ‘best to the rest’ when the standard one-sided P value is less than n/N – half the proportion of patients already randomised. Berry and Pearson60 and others59,120,228 have extended such theory to allow for unequal stages and so on. Backwards induction is extremely computationally demanding, but Carlin et al.91 do a retrospective analysis on a trial,90 and claim it is computationally feasible using MCMC methods, in which forward sampling is used as an approximation to the optimal strategy. There is also an extensive theoretical literature on trials designed from a non-Bayesian decision-theoretic perspective.30

As a worked (but retrospective) example, Berry et al.64 consider a trial of influenza vaccine for Navajo children. They construct a model consisting of priors for the effectiveness of the vaccine and the placebo treatment, the probability of obtaining regulatory approval and the length of time taken to obtain it, and the probability of a superior vaccine appearing in the next 20 years and the length of time taken for it to appear. After each month the expected number of cases of the strain amongst Navajo children in the next 20 years is calculated in the case of stopping the trial, and continuing the trial (the latter being calculated by dynamic programming). The trial is stopped when the former exceeds the latter.

The level of detail required for such an analysis has been criticised as being unrealistic,72,421 but it has been argued that trade-offs between benefits for patients within and outside the trial should be explicitly confronted.162 See page 39 for further discussion.

A recent development is reported by Kadane et al.,264 who are to be allowed to elicit prior distributions and utilities from members of the data-monitoring committee for a large collaborative cancer trials group, the National Surgical Adjuvant Breast and Bowel Project (NSABP). They intend to do individual elicitations using predictive methods (see page 17) and use the forward sampling approach to solve the dynamic programming problem.91 Their success at this ambitious venture remains to be seen.

Frequentist properties of sequential Bayesian methods

Although the long-run sampling behaviour of sequential Bayesian procedures is irrelevant from a strict Bayesian perspective, a number of investigations have taken place which generally show good sampling properties.236,293,294,381 In particular, Grossman et al.217 show that a sceptical prior (see page 19), centred on zero and with precision equivalent to that arising from an ‘imaginary’ trial of around 26% of the planned sample size, gives rise to boundaries that have type I error around 5% for five interim analyses, with good power and expected sample size. Thus an ‘off-the-shelf’ Bayesian procedure has been shown to have good frequentist properties: essentially the conservative behaviour of a Neyman–Pearson approach is mirrored by that obtained from assuming a sceptical prior. The sampling properties of Bayesian designs have been particularly investigated in the context of Phase II trials (see page 28).

One contentious issue is ‘sampling to a foregone conclusion’.21 This mathematical result proves that repeated calculation of posterior tail areas will, even if the null hypothesis is true, eventually lead a Bayesian procedure to reject that null hypothesis. This does not, at first, seem an attractive frequentist property of a Bayesian procedure.


Nevertheless, Cornfield111 argued that “if one is seriously concerned about the probability that a stopping rule will certainly result in the rejection of a null hypothesis, it must be because some possibility of the truth of the hypothesis is being entertained”, and if this is the case then they should be placing a lump of probability on it, as discussed on page 19, and so fit within the Bayesian hypothesis testing framework (see page 11). He shows that if such a lump, however small, is assumed then the problem disappears in the sense that the probability of rejecting a true null hypothesis does not tend to one. Breslow72 agrees with this solution, but Armitage is not persuaded,16 claiming that even with a continuous prior distribution with no lump at the null hypothesis, one might still be interested in type I error rates at the null as giving a bound to those at non-null values.

The role of ‘scepticism’ in confirmatory studies

After a clinical trial has given a positive result for a new therapy, there remains the problem of whether a confirmatory study is needed. Fletcher et al.173 argue that the results of the first trial might be treated with scepticism, and Berry57 points out that using a sceptical prior is a means of dealing with ‘regression to the mean’, in which early extreme results tend to return to the average over time.

Example

Parmar et al.349 consider two trials and show that, when using a reasonable sceptical prior, doubt can remain after both the first trial and the confirmatory trial about whether the new treatment provides a clinically important benefit.

Figure 5 shows an example with a sceptical prior distribution with a median of 0 months benefit, which is equivalent to an ‘imaginary’ trial in which 33 patients died on each treatment. The dashed vertical lines display the null hypothesis of no improvement and the minimum clinically worthwhile improvement of 4 months: between these lie what can be termed the ‘range of equivalence’, and the figure shows that the sceptical prior expresses a probability of 41% that the true benefit lies in the range of equivalence, and only 9% that the new treatment is clinically superior.

The likelihood plot shows the inferences to be made from the data alone, assuming a ‘uniform’ prior on the range of possible improvements: Parmar et al. call this an ‘enthusiastic’ prior. The probability that the new treatment is actually inferior is 0.004 (equivalent to the one-sided P value of 0.008/2). The probability of clinical superiority is 80%, which might be considered sufficient to change treatment policy.

The posterior plot shows the impact of the sceptical prior, in that the chance of clinical superiority is reduced to 44% – hardly sufficient to change practice. In fact, Parmar et al. report that the National Cancer Institute (NCI) Intergroup Trial investigators were unconvinced by the Cancer and Leukemia Group B (CALGB) trial due to their previous negative experience, and so carried out a further confirmatory study. They found a significant median improvement but of only 2.4 months, suggesting the sceptical approach might have given a more reasonable estimate.
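The mechanics of combining the sceptical prior with the CALGB data can be sketched on the log(hazard ratio) scale, using the standard approximation that a log hazard ratio based on m events has variance roughly 4/m; the resulting numbers are illustrative of the shrinkage and are not the figure’s percentages, which are computed on the months-of-survival scale:

```python
from math import exp, log, sqrt

def combine(prior_mean, prior_events, loghr_hat, events):
    """Combine a normal prior for the log hazard ratio, expressed as an
    imaginary trial with prior_events total deaths, with a likelihood
    from a trial observing `events` deaths (variance approx 4/m)."""
    w = events / (events + prior_events)          # weight on the data
    post_mean = w * loghr_hat + (1.0 - w) * prior_mean
    post_sd = sqrt(4.0 / (events + prior_events))
    return post_mean, post_sd

# Sceptical prior: 33 deaths per arm (66 in total), centred on log(HR) = 0;
# CALGB likelihood: observed hazard ratio 1.63 after 120 deaths
post_mean, post_sd = combine(0.0, 66, log(1.63), 120)
post_hr = exp(post_mean)
```

The observed hazard ratio of 1.63 is pulled back towards 1 (to roughly 1.4 here), the same qualitative shrinkage that reduces the probability of clinical superiority in the figure.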

Reporting, sensitivity analysis, and robustness

The only ‘guidelines’ available for reporting Bayesian analyses appear to be those of Lang and Secic:281

1. Report the pretrial probabilities and specify how they were determined.

2. Report the post-trial probabilities and their probability intervals.

3. Interpret the post-trial probabilities.

These seem very limited: see chapter 8 for an attempt to set more stringent standards for reporting.

An integral part of any good statistical report is a sensitivity analysis of assumptions concerning the form of the model (the likelihood). Bayesian approaches have the additional concern of sensitivity to the prior distribution, both in view of its controversial nature and because it is by definition a subjective assumption that is open to valid disagreement. As part of the general discussion of priors in chapter 3, the need to consider the impact of a ‘community of priors’267 was stressed, and three main types of ‘community’ may be identified:

1. Discrete set. Many case studies carry out sensitivity analysis to a limited list of possible priors, possibly embodying scepticism, enthusiasm, clinical opinion and ‘ignorance’: the studies described in appendix 1 provide many examples.


2. Parametric family. O’Rourke343 emphasises that posterior probabilities “should be clearly and primarily stressed as being a ‘function’ of the prior probabilities and not the probability of treatment effects”. If the community of priors can be described by one varying parameter, then it is possible to graphically display the dependence of the main conclusion on that parameter. Hughes242 suggested examining sensitivity of conclusions both to priors based on previous trial results and to priors reflecting investigators’ opinions, and later243 gives an example which features a point-mass prior on zero, and an explicit plot of the posterior probability against the prior probability of this null hypothesis. Hughes’s approach of plotting prior against posterior summaries is an example of the ‘robust’ Bayesian approach, in which an attempt is made to characterise the class of priors leading to a given decision (see appendix 1).

3. Non-parametric family. The ‘robust’ Bayesian approach has been further explored by allowing the community of priors to be a non-parametric family in the neighbourhood of an initial prior (see page 20). For example, Gustafson219


FIGURE 5 (a) Sceptical prior (equivalent to 33 deaths in each group; probability below/within/above the range of equivalence: 50.0%/41.4%/8.6%), (b) likelihood (observed hazard ratio = 1.63 after 120 deaths; 0.4%/9.7%/79.9%) and (c) posterior distribution (1.6%/54.0%/44.4%) arising from the CALGB trial of standard radiotherapy versus additional chemotherapy in advanced lung cancer, each plotted against the median survival gained by the new treatment (months). The dashed lines give the boundaries of the range of clinical equivalence, taken to be 0–4 months median improvement in survival. Percentages show the probabilities of lying below, within and above the range of equivalence. (Reproduced by permission of the BMJ from Spiegelhalter et al.426)


considers the ECMO study (see page 37) with a community centred around a ‘non-informative’ prior but 20% ‘contaminated’ with a prior with minimal restrictions, such as being unimodal. The maximum and minimum posterior probability of the treatment’s superiority within such a class can be plotted, providing a sensitivity analysis. Such an approach has also been explored by Greenhouse and Wasserman206 and Carlin and Sargent.93
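Hughes’s plot of posterior against prior probability of a point null follows directly from Bayes’s theorem on the odds scale; the Bayes factor used below is hypothetical:

```python
def posterior_null_prob(prior_null, bayes_factor_null):
    """Posterior probability of a point null H0 as a function of its
    prior probability, given the Bayes factor p(data|H0)/p(data|H1):
    posterior odds = prior odds * Bayes factor."""
    odds = (prior_null / (1.0 - prior_null)) * bayes_factor_null
    return odds / (1.0 + odds)

# Sweep the prior null probability for a hypothetical Bayes factor of 0.2
curve = [(p / 10, posterior_null_prob(p / 10, 0.2)) for p in range(1, 10)]
```

Plotting `curve` gives exactly the kind of display described above: each reader can locate their own prior probability on the horizontal axis and read off the corresponding posterior probability of the null.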

Stangl and Berry426 emphasise the need for a fairly broad community, with sensitivity analysis not just to the spread of the prior but also its location, while they also stress that sensitivity to exchangeability and independence assumptions should be examined. In fact, the ‘community’ approach to prior distributions of treatment effects can also be applied to the distribution of random effects in hierarchical models: for example, Gustafson219 considers a non-parametric approach to the effect of non-normality in a random effects distribution. Stangl and Berry426 finish by saying that while sensitivity analysis is important, it should not serve as a substitute for careful thought about the form of the prior distribution.

There is limited experience of reporting such analyses in the medical literature, and it has been suggested by Koch,272 Hughes242 and Spiegelhalter et al.421 that a separate ‘interpretation’ section is required to display how the data in a study would add to a range of currently held opinions. It would be attractive for people to be able to carry out their own sensitivity analysis to their own prior opinion: Lehmann and Shachter290 describe a computing architecture for this, and available software and web pages are described in appendix 2.

Subset analysis

The discussion on multiplicity on page 12 has already described how multiple simultaneous inferences may be made by assuming a common prior distribution with unknown parameters, provided an assumption of exchangeability (the prior does not depend on the units’ identities) is appropriate. Within the context of clinical trials this has immediate relevance to the issue of estimating treatment effects in subgroups of patients.

By placing an identical prior on each subgroup, the unknown between-subgroup variability is essentially estimated from the data. As Cornfield113 points out, this procedure “leads to 1) pooling subgroups if the differences among them appear small, 2) keeping them separate if differences appear large, and 3) providing intermediate results for intermediate situations”.

Health Technology Assessment 2000; Vol. 4: No. 38

FIGURE 6 Traditional (solid line) and Bayesian (broken line) estimates of standardised treatment effects in a cancer clinical trial, for subgroups defined by performance status (0–1 versus 2–3), anaplasia grade (1–2 versus 3–4), measurable versus no measurable disease, symptomatic versus not symptomatic, age (< 70 versus > 70 years) and sex, plotted on a standardised treatment effect scale running from –2.0 to 2.0. The Bayesian estimates are pulled towards the overall treatment effect by a degree determined by the empirical heterogeneity of the subset results. (Reproduced by permission of the BMJ from Spiegelhalter et al.426)


Published applications are generally based on assigning a reference (uniform) prior to the overall treatment effect, and then assuming that the subgroup-specific deviations from that overall effect have a common prior distribution with zero mean. This prior expresses scepticism about widely differing subgroup effects, although the variability allowed by the prior is usually estimated from the data. Thus this specification avoids the need for detailed subjective input, which may be seen as an attractive feature. Many applications consider this an empirical Bayes procedure,123,309,365 which gives rise to traditional confidence intervals that are not given a Bayesian interpretation. Donner139 sets out the basic ideas, and Dixon, Simon and co-authors elaborate the techniques in several examples.134,135,136,403,405,406 Dixon and Simon134 discuss the reasonableness of the exchangeability assumption.

Example: Bayes’s theorem for subsetanalysisDixon and Simon135 describe a Bayesian approachto dealing with subset analysis in a clinical trial inadvanced colorectal cancer. The solid horizontallines in Figure 6 show the standardised treatmenteffects within a range of subgroups, using tradi-tional methods for estimating treatment bysubgroup interactions. Four of the 12 intervalsexclude zero, although due to the multiple hypoth-eses being tested an adjustment technique such asBonferroni might be used to decrease the apparentstatistical significance of these findings.

The Bayesian approach is to assume subgroup-specific deviations from the overall treatment effect have a prior distribution centred at zero but with an unknown variability: this variability parameter is then given its own prior distribution. Since the degree of scepticism is governed by the variance of the prior distribution, the observed heterogeneity of treatment effects between subgroups will influence the degree of scepticism being imposed.

The resulting Bayesian estimates are shown as dashed lines in Figure 6. They tend to be pulled towards each other, due to the prior scepticism about substantial subgroup-by-treatment interaction effects. Only one 95% interval now excludes zero: the subgroup with non-measurable metastatic disease. Dixon and Simon mention that this was the conclusion of the original trial, but that the Bayesian analysis has the advantages of not relying on somewhat arbitrary adjustment techniques, of generalising to any number of subsets, and of providing a unified means of producing both estimates and tests of hypotheses.
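The shrinkage in Figure 6 can be illustrated with a small empirical-Bayes sketch: a method-of-moments estimate of the between-subgroup variance is plugged in, and each subgroup estimate is pulled towards the precision-weighted overall mean. The data and function name are invented, and Dixon and Simon’s actual model places a prior on the variance rather than plugging in a point estimate, so this is a sketch of the general idea only.

```python
def shrink_subgroups(estimates, std_errs):
    """Empirical-Bayes shrinkage of subgroup treatment effects.

    A method-of-moments (DerSimonian-Laird-style) estimate of the
    between-subgroup variance tau^2 is plugged in, and each estimate is
    pulled towards the precision-weighted overall mean.
    """
    k = len(estimates)
    w = [1.0 / s**2 for s in std_errs]
    mu = sum(wi * yi for wi, yi in zip(w, estimates)) / sum(w)
    # moment estimate of the between-subgroup variance
    q = sum(wi * (yi - mu) ** 2 for wi, yi in zip(w, estimates))
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)
    # posterior means: precision-weighted compromise between each
    # subgroup's own estimate and the overall mean
    return [
        (yi / s**2 + mu / tau2) / (1.0 / s**2 + 1.0 / tau2) if tau2 > 0 else mu
        for yi, s in zip(estimates, std_errs)
    ]

# hypothetical standardised subgroup effects and their standard errors
effects = [-1.1, -0.4, 0.2, 0.6, 1.3, -0.2]
ses = [0.5, 0.4, 0.5, 0.4, 0.6, 0.5]
shrunk = shrink_subgroups(effects, ses)
```

The more heterogeneous the raw subgroup effects, the larger the estimated tau², and hence the weaker the shrinkage, exactly the behaviour described in the caption to Figure 6.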

Multicentre analysis

Methods for subset analysis (see page 35) naturally extend to multicentre analysis, in which the centre-by-treatment interaction is considered as a random effect drawn from some common prior distribution with unknown parameters. Explicit estimation of individual institutional effects may be carried out, which in turn relates strongly to the methods used for institutional comparisons of patient outcomes (see page 44).

There have been numerous examples of this procedure,203,218,409,425,427,428 generally adopting MCMC techniques owing to the intractability of the analyses. Recent case studies include Gould,202 who provides BUGS code (see appendix 2) for a Gibbs sampling analysis, and Jones et al.,251 who compare estimation methods. Matsuyama et al.316 allow a random centre effect on both the baseline hazard and the treatment effect, and examine the centres for outliers using a Student’s t prior distribution for the random effects.

Senn398 discusses the general issue of when a random-effects model for centre-by-treatment interaction is appropriate, emphasising the possible difficulty of interpreting the conclusions, particularly in view of the somewhat arbitrary definition of ‘centre’.

Multiple end-points and treatments
Multiple end-points in trials are often of interest when there is, say, simultaneous concern with toxicity and efficacy. This tends to occur in early-phase studies, and a Bayesian approach allows one to create a two-dimensional posterior distribution over toxicity and efficacy.137,161,451 General random effects models for more complex situations can be constructed.288 Naturally, a two-dimensional prior is required, and particular care must be taken over the dependence assumptions.

A similar situation arises with many treatments: if one is willing to make exchangeability assumptions between treatment effects, then a hierarchical model can be constructed to deal with the multiple comparison problem. This was proposed long ago by Waller and Duncan.468 Brant et al.71 update this procedure by assuming exchangeable treatments and setting the critical values for the posterior probabilities of treatment effects by using a decision-theoretic argument based on specifying the relative losses for type I and type II errors.

Data-dependent allocation

‘Bandit’ problemsSo far we have only covered standardrandomisation designs, but a full decision-theoreticapproach to trial design would consider data-dependent allocation so that, for example, in orderto minimise the number of patients getting theinferior treatment, the proportion randomised tothe apparently superior treatment could beincreased as the trial proceeded. Such ‘adaptive’designs would appear to satisfy ethical consider-ations for the patients under study303 (see page25). These designs can pose so-called ‘bandit’problems, as they are analogous in theory to agambler deciding which arm of a two-armed banditto pull in order to maximise the expected return.An extreme example is Zelen’s ‘play-the-winner’rule in which the next patient is given the currentlysuperior treatment, and randomisation isdispensed with entirely;485 there is an extensiveliterature on this and other designs.47

There has been considerable criticism of these ideas as not being practically rooted in the realities of clinical trials. Byar et al.86 identify as objections: (1) responses have to be observed without delay, (2) adaptation depends on a one-dimensional response, (3) sample sizes may have to be bigger and (4) patients may not be homogeneous throughout the trial. Armitage15 and Peto357 add that clinicians are likely to be unhappy with adaptive randomisation, that the trial will be complex and may deter recruitment, and that estimation of the treatment contrast will lose efficiency. Palmer and Rosenberger346 suggest that unbalanced allocation will make blinding difficult, as clinicians may guess which treatment is ‘in the lead’, but point out that modern technology is making such designs more feasible. Finally, Senn398 emphasises that future patients, greatly outnumbering those in the trial, would value a more precise treatment estimate.

A careful analysis has been carried out by Berry and Eick,58 who conclude that such adaptive designs are more likely to yield a large improvement in the expected number of successful treatments when a large proportion of patients with the disease are likely to be in the trial. Tamura et al.438 report one of the few adaptive clinical trials to take place, in patients with depressive disorder; the trial designed by Kadane258 also adapts its allocation rules, in a somewhat complex way, to the current evidence.

Although there are specific aspects of adaptive allocation that cause practical problems, it is possibly the formulation of a trial as a decision rather than as an inference that attracts most objections (see page 39).

The ECMO studies
Two studies of extracorporeal membrane oxygenation (ECMO) have been considered by many authors. ECMO is a rescue therapy for severe lung disease in newborns which carries the chance of greatly improved survival, but at the risk of side-effects. An initial Zelen play-the-winner study was followed by the Harvard two-phase adaptive study designed to minimise the number of infants exposed to an inferior treatment: the design was to randomise between ECMO and conventional medical treatment (CMT) in balanced blocks of four until one treatment had four deaths, at which point all subsequent patients were to receive the currently superior therapy. Nineteen patients were enrolled in the first phase, with 4/10 deaths under CMT and 0/9 deaths on ECMO; in the second phase, 20 patients were randomised to ECMO and one died. Interest has focused on what might have been concluded during the first phase of the study, and Bayesian analyses have been provided by Ware,469 Berry,45,46,51 Kass and Greenhouse,267 Greenhouse and Wasserman,206 and Gustafson.219 Each of these is considered in detail in appendix 1.

Ware469 and his discussants46,47,267 provide a range of views on the ethics and appropriate analysis of this trial, and on how existing information about both ECMO and the control therapy could have been used in formulating a prior opinion.

Trial designs other than two parallel groups

Equivalence trials
There is a large statistical literature on trials designed to establish equivalence between therapies. From a Bayesian perspective the solution is straightforward: define a region of equivalence (see page 27) and calculate the posterior probability that the treatment difference lies in this range; a threshold of 95 or 90% might be chosen to represent strong belief in equivalence. Several examples of this remarkably intuitive approach have been reported.31,72,174,211,314,332,370,394,395 Racine et al.371 consider two-stage designs, Metzler324 studies sample size, while Lindley308 takes a decision-theoretic approach.
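Under an approximately normal posterior for the treatment difference, the posterior probability of equivalence reduces to a difference of two normal tail areas. The numbers below are purely illustrative.

```python
from math import erf, sqrt

def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def prob_equivalent(post_mean, post_sd, lower, upper):
    """Posterior probability that the treatment difference lies in the
    region of equivalence (lower, upper), assuming an approximately
    normal posterior. Equivalence might be declared if this exceeds,
    say, 0.90 or 0.95."""
    return (phi((upper - post_mean) / post_sd)
            - phi((lower - post_mean) / post_sd))

# illustrative numbers on some treatment-difference scale:
# posterior centred near zero, region of equivalence (-0.5, 0.5)
p_equiv = prob_equivalent(post_mean=0.1, post_sd=0.2, lower=-0.5, upper=0.5)
```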

Cross-over trials
The Bayesian approach to cross-over designs, in which each patient is given two or more treatments in an order selected at random, is fully reviewed by Grieve.212 More recent references concentrate on Gibbs sampling approaches175 and on sensitivity analysis to different assumptions about carry-over effects.11,209,210,214,215,370

N-of-1 trials
N-of-1 studies can be thought of as repeated cross-over trials in which interest focuses on the response of an individual patient. Zucker et al.488 pool 23 N-of-1 studies using a hierarchical model, producing the standard shrinkage of the individual conclusions towards the average result. This can be thought of as an extreme example of the subset procedure described previously, in which the subsets have been reduced to individual patients.

Factorial designs
Factorial trials, in which multiple treatments are given simultaneously to patients in a structured design, can be seen as another example of multiplicity, and hence a candidate for hierarchical models. Simon and Freedman407 and Miller and Seaman325 suggest suitable prior assumptions that avoid the need to decide whether interactions do or do not exist.

Other aspects of drug development

Pharmacokinetics
The ‘population’ approach to pharmacokinetics, in which the parameters underlying each individual’s drug clearance curve are viewed as being drawn from some population, is well established and is essentially an empirical Bayes procedure. Proper Bayesian analysis of this problem has been extensively reported by Wakefield and colleagues,372,466 emphasising MCMC methods for estimating both population and individual parameters, as well as for individualising dose selection.467

Phase I trials
Phase I trials are conducted to determine the dosage of a new treatment that produces a level of risk of a toxic response deemed to be acceptable. The primary Bayesian contribution to the development of methodology for Phase I trials has been the continual reassessment method (CRM) developed by O’Quigley and colleagues.342 In CRM a parameter underlying a dose–toxicity curve is given a proper prior, which is updated sequentially and used to find the current ‘best’ estimate of the dosage that would produce the acceptable risk of a toxic event if given to the next subject, as well as giving the probability of a toxic response at the recommended dose at the end of the trial.340 High sensitivity of the posterior to the prior distribution187 has been reported in a similar procedure. Numerous simulations and modifications of the method have been proposed.10,102,163,197,222,274,326,329,341,445,479
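A minimal sketch of a CRM-style update follows, assuming a one-parameter power model P(toxicity at dose i) = skeleton_i^exp(a) with a normal prior on a, evaluated on a grid. Real CRM implementations differ in model, prior and safety constraints, and the skeleton, outcomes and target here are all invented.

```python
from math import exp, log

def crm_next_dose(skeleton, outcomes, target, prior_sd=1.0):
    """One step of a continual-reassessment-style dose update.

    Power model: P(toxicity at dose i) = skeleton[i] ** exp(a), with a
    Normal(0, prior_sd^2) prior on a, updated on a grid. outcomes is a
    list of (dose_index, had_toxicity) pairs. Returns the dose whose
    estimated toxicity probability is closest to the target.
    """
    grid = [i * 0.01 - 3.0 for i in range(601)]     # a in [-3, 3]
    post = []
    for a in grid:
        logp = -0.5 * (a / prior_sd) ** 2           # normal prior (up to a constant)
        for i, tox in outcomes:                     # binomial likelihood
            p = skeleton[i] ** exp(a)
            logp += log(p) if tox else log(1.0 - p)
        post.append(exp(logp))
    z = sum(post)
    a_mean = sum(a * w for a, w in zip(grid, post)) / z
    probs = [s ** exp(a_mean) for s in skeleton]    # plug-in toxicity estimates
    return min(range(len(skeleton)), key=lambda i: abs(probs[i] - target))

# hypothetical skeleton for four doses; target toxicity risk of 20%;
# two non-toxic outcomes at dose 1 and one toxicity at dose 2 observed
skeleton = [0.05, 0.15, 0.30, 0.50]
dose = crm_next_dose(skeleton, [(1, False), (1, False), (2, True)], 0.20)
```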

Etzioni and Pepe161 suggest monitoring a Phase I trial with two possible adverse outcomes via the joint posterior distribution of the probabilities of the two outcomes, with frequentist inference at the end of the trial.

Phase II trials
Phase II clinical trials are carried out in order to discover whether a new treatment is promising enough (in terms of efficacy) to be submitted to a controlled Phase III trial, and often a number of doses may be compared. Bayesian work has focused on monitoring and sample size determination. Monitoring on the basis of the posterior probability of exceeding a desired threshold response rate has been recommended by Mehta and Cain,322 while Heitjan227 adapts the sceptical and enthusiastic priors proposed for Phase III studies (see page 30). Korn et al.276 consider a Phase II study which was stopped after three out of four patients exhibited toxicity; Bring75 and Greenhouse and Wasserman206 re-examine this problem from a Bayesian perspective.
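For a binary response with a conjugate beta prior, monitoring of this kind reduces to a tail probability of a beta distribution. The following standard-library sketch uses an invented threshold and invented data; the crude numerical integration stands in for the incomplete beta function available in scientific libraries.

```python
from math import exp, lgamma, log

def beta_tail_prob(a, b, x, n_grid=20000):
    """P(p > x) for p ~ Beta(a, b), via midpoint-rule integration of
    the density -- crude, but adequate for a sketch."""
    log_norm = lgamma(a + b) - lgamma(a) - lgamma(b)
    h = (1.0 - x) / n_grid
    total = 0.0
    for i in range(n_grid):
        p = x + (i + 0.5) * h
        total += exp(log_norm + (a - 1.0) * log(p)
                     + (b - 1.0) * log(1.0 - p)) * h
    return total

# vague Beta(1, 1) prior; suppose 16 responses are seen in 40 patients
a_post, b_post = 1 + 16, 1 + 24

# posterior probability that the response rate exceeds a 30% threshold;
# the trial might continue while this probability remains high
p_exceeds = beta_tail_prob(a_post, b_post, 0.30)
```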

Herson231 used predictive probability calculations to select among designs with high power in regions of high prior probability. Thall and co-workers have also developed stopping boundaries for sequential Phase II studies based on posterior probabilities of clinically important events, but where the designs are selected according to frequentist properties derived from extensive simulation studies.160,408,444,446,447,448,449,450 However, Stallard424 has criticised this approach as demonstrably suboptimal when evaluated using a full decision-theoretic model with a monetary loss function.

Finally, Whitehead and colleagues81,473,474,477 have taken a full decision-theoretic approach to allocating subjects between Phase II and Phase III studies. For example, Brunier and Whitehead81 consider the case where a single treatment with a dichotomous outcome is being evaluated for a possible Phase III trial, and use Bayesian decision theory to determine the number of subjects needed. They place a prior on the probability of success and calculate the expected cost of performing or not performing a Phase III trial, using a cost function which includes consideration of the costs to future patients if the inferior treatment is eventually used, the power of the possible Phase III trial (which they assume will be carried out by frequentist methods), and the costs of experimentation. They show how to determine, for given parameter values, the expected cost of performing a Phase II trial of any particular size, and thus the optimal size for a trial. (The work is said to correct earlier work of Sylvester and Staquet.430,435,436,437)

When faced with selecting among a list of treatments and allocating patients, Pepple and Choi355 have considered two-stage designs, Yao et al.482 deal with screening multiple compounds and allocating patients within a programme, while Straus and Simon432 use a prior distribution and horizon.

Phase IV – safety monitoring
A considerable literature exists on Bayesian causality assessment in adverse drug reactions: see, for example, Hsu et al.241 and Lanctot et al.279

Commentary

Table 6 briefly summarises some major distinctions between the Bayesian and the frequentist approach to trial design and analysis.

Use of prior distributions
As well as the general issues concerning the specification and use of prior distributions (see page 22), specific questions arise in the context of clinical trials. These include:

1. Whose prior? Pocock366 states that the “hardened sceptical triallist, the hopeful clinician and the optimistic pharmaceutical company will inevitably have grossly different priors”, while Fisher171 also emphasises the differing views of participants in health technology assessment.

As mentioned on page 13, context is an essential aspect of any assessment, and chapter 7 describes the relevant perspectives of regulatory authorities and health policy makers.

2. What about type I error? A common objection to Bayesian methods is their apparent lack of concern with type I error, with the eventual certainty of rejecting a true null hypothesis.249 Counter-arguments to this position were given on pages 29 and 32.


TABLE 6 A brief comparison of Bayesian and frequentist methods in clinical trials

Issue | Frequentist | Bayesian
Information other than that in the study being analysed | Informally used in design | Used formally by specifying a prior probability distribution
Interpretation of the parameter of interest | A fixed state of nature | An unknown quantity which can have a probability distribution
Basic question | ‘How likely is the data given a particular value of the parameter?’ | ‘How likely is a particular value of the parameter given the data?’
Presentation of results | Likelihood functions, P values, confidence intervals | Plots of posterior distributions of the parameter, calculation of specific posterior probabilities of interest, and use of the posterior distribution in formal decision analysis
Interim analyses | P values and estimates adjusted for the number of analyses | Inference not affected by the number or timing of interim analyses
Interim predictions | Conditional power analyses | Predictive probability of getting a firm conclusion
Dealing with subsets in trials | Adjusted P values (e.g. Bonferroni) | Subset effects shrunk towards zero by a ‘sceptical’ prior


3. Stopping rule dependence. A somewhat more subtle objection, well described by Rosenbaum and Rubin,380 is that a Bayesian stopping rule based on posterior tail areas may be over-dependent on the precise prior distribution.249 A possible response is that Bayesian stopping should not be based on a strict rule derived from a single prior; instead, a variety of reasonable perspectives should be investigated and a trial stopped only if there is broad convergence of opinion.

4. What are the implications for the size of trials? There is no single implication for trial size. Matthews317 and Edwards et al.153 have suggested that small, open trials fit well into a Bayesian perspective in which all evidence contributes and there is no demand for high power to reject hypotheses.300 Alternatively, monitoring with a sceptical prior may demand larger than standard sample sizes in order to convince an archetypal sceptic of treatment superiority.

The distinction made by Kadane and Seidenfeld263 between ‘experiments to learn’ and ‘experiments to prove’ appears useful: trial design must naturally reflect one’s objectives, and different contexts may therefore demand differing designs.

Whitehead475,476,478 argues that where a limited group of investigators is engaged on a project, for example in a drug development programme, a full Bayesian approach with a loss function may be sensible, but that there is no place for prior opinions in publicly scrutinised Phase III studies.

As a final comment, O’Rourke343 suggests that all methods have arbitrary aspects; a Bayesian approach has at least the simplicity of collecting all of this arbitrariness into the prior.

Use of a loss function: is a clinical trial for inference or decision?
There has been a long and intense dispute about whether a clinical trial should be considered as a decision problem, with an accompanying loss function, or as an inference.

1. A clinical trial should be a decision. Lindley304 categorically states that “Clinical trials are not there for inference but to make decisions”, while Berry54 states that “deciding whether to stop a trial requires considering why we are running it in the first place, and this means assessing utilities”. Healy223 considers that “in my view the main objective of almost all trials on human subjects is (or should be) a decision concerning the treatment of patients in the future”.

Claxton106,107 argues from an economic perspective that a utility approach to clinical trial design and analysis is necessary in order to prevent conclusions based on inferential methods leading to health or monetary losses. He points out that “Once a price per effectiveness unit has been determined, costs can be incorporated, and the decision can then be based on (posterior) mean incremental net benefit measured in either monetary or effectiveness terms”. The assumptions that need to be made about rapid dissemination of superior treatments are reasonable, he claims, since we should be designing trials that are ‘best’ for a healthcare system: the need for incentives or arrangements to persuade decision makers is a separate issue.

See page 52 where methods for evaluating the‘payback’ of health research are discussed.

2. A clinical trial provides an inference. Armitage,15 Breslow,72 Demets,125 Simon402 and O’Rourke343 all describe how it is unrealistic to place clinical trials within a decision-theoretic context, primarily because the impact of stopping a trial and reporting the results cannot be predicted with any confidence. Peto357 states that “Bather, however, merely assumes … ‘it is implicit that the preferred treatment will then be used for all remaining patients’ and gives the problem no further attention! This is utterly unrealistic, and leads to potentially misleading mathematical conclusions”. Peto goes on to argue that a serious decision-theoretic formulation would have to model the subsequent dissemination of treatment; attempts to do this are discussed on page 52.

3. It depends on the context. Whitehead478 points out that the theory of optimal decision-making exists only for a single decision-maker, and that no optimal solution exists when making a decision on behalf of multiple parties with different beliefs and utilities. He therefore argues that internal company decisions at Phase I and Phase II of drug development can be modelled as decision problems, but that Phase III trials cannot be.476 Koch272 also provides a non-dogmatic discussion, in which the relevant approach depends on the question being asked.


Key points

1. The Bayesian approach provides a framework for considering the ethics of randomisation.

2. Monitoring trials with sceptical and other priors may provide a unified approach to assessing whether a trial’s results would be convincing to a wide range of reasonable opinion, and could provide a formal tool for data monitoring committees.

3. Various sources of multiplicity can be dealt with in a unified and coherent way.

4. In contrast to earlier phases of development, it is generally unrealistic to formulate a Phase III trial as a decision problem, except in circumstances where future treatments can be accurately predicted.

5. An empirical basis for prior opinions in clinical trials should be investigated, but archetypal prior opinions play a useful role.

6. The structure in which trials are conducted must be recognised, but can be taken into account by specifying a range of prior opinions.


Chapter 5

A guide to the Bayesian health technology assessment literature: observational studies

Introduction

Since the probability models used in Bayesian analysis do not need to be based on actual randomisation, non-randomised studies can be analysed in exactly the same manner as randomised comparisons, perhaps with extra attention to adjusting for covariates in an attempt to control for possible baseline differences in the treatment groups with respect to uncontrolled risk factors or exposures. For example, the results from an epidemiological study, whether of a cohort or case–control design, provide a likelihood which can be combined with prior information using standard Bayesian methods. The dangers of this approach have been well described in the medical literature,86 but nevertheless there are circumstances where randomisation is either impossible or where there is substantial valuable information available in historical data. There is, of course, a degree of subjective judgement about the comparability of groups, which fits well into the acknowledged judgement underlying all Bayesian reasoning. It is possible that the effect of unmeasured confounders can be modelled by means of a prior distribution, which could be considered as part of the explicit modelling of biases discussed on page 41. In this chapter we consider four aspects of non-randomised comparisons: case–control studies, complex epidemiological models, explicit modelling of biases, and comparisons of institutions on the basis of their outcomes. The discussion of historical controls within randomised trials (see page 27) is also relevant to the situation in which no contemporaneous controls are available.

Case–control studies

Case–control designs have been considered in detail by a number of authors,24,313,335,486,487 generally relying on analytic approximations in order to obtain reasonably simple analyses. For example, Ashby et al.24 examine two case–control studies (one being very small) and a cohort study of leukaemia following chemotherapy treatment for Hodgkin’s disease, and consider the consequences of various prior distributions based on past studies, possibly downweighted. Lilford and Braunholtz’s tutorial article concerns potential side-effects of oral contraceptives, with likelihoods arising from case–control studies299 (see appendix 1 for further details).

Complex epidemiological models

One approach to assessing the value of an intervention is to construct a model for the natural history of a chronic disease, and to predict the consequences of implementing a specific policy. Such models can be developed by synthesising evidence from multiple sources (see chapter 6) in order to provide a ‘comprehensive decision model’ for cost-effectiveness analyses. However, Craig et al.119 point out that predictions based on such a model require assumptions of parameter independence which do not need to be made if estimation and prediction are carried out simultaneously.

Such simultaneous analysis can be carried out if a large cohort database is available and the joint posterior distribution of the parameters of the model obtained, say through MCMC techniques. Craig et al.119 describe such an analysis of a population-based cohort of patients with diabetic retinopathy in order to evaluate different screening policies. They construct a Markov model for transitions between disease states, using reasonably non-informative prior distributions and MCMC estimation.
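The conjugate core of such a model is easy to sketch: with a Dirichlet prior on each row of the transition matrix, the posterior is again Dirichlet, and posterior draws can be propagated forward to predict the future state distribution. The states, counts and prior below are invented; Craig et al.'s actual model is considerably richer.

```python
import random

def posterior_transition_draw(counts, rng, prior=1.0):
    """One posterior draw of a Markov transition matrix.

    Row i of `counts` holds observed transition counts out of state i;
    with a symmetric Dirichlet(prior) on each row the posterior is again
    Dirichlet, sampled here via normalised gamma variates.
    """
    matrix = []
    for row in counts:
        draws = [rng.gammavariate(c + prior, 1.0) for c in row]
        total = sum(draws)
        matrix.append([d / total for d in draws])
    return matrix

def state_distribution(matrix, start, steps):
    """Propagate an initial state distribution `steps` transitions ahead."""
    dist = start[:]
    for _ in range(steps):
        dist = [sum(dist[i] * matrix[i][j] for i in range(len(dist)))
                for j in range(len(dist))]
    return dist

rng = random.Random(42)
# hypothetical counts for three ordered disease states
counts = [[80, 15, 5], [0, 70, 30], [0, 0, 100]]
trans = posterior_transition_draw(counts, rng)
# predicted state distribution five transitions ahead, starting in state 0
ahead = state_distribution(trans, [1.0, 0.0, 0.0], 5)
```

Repeating the draw-and-propagate step many times gives the full predictive distribution over future disease states, with parameter uncertainty carried through automatically, which is the point Craig et al. make about simultaneous estimation and prediction.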

There is also a substantial literature on Bayesian methods for complex epidemiological modelling, particularly concerning problems of spatial correlation,23,40,226,377 measurement error376 and missing covariate data.373 Analysis is now almost universally by MCMC methods, and considerable use has been made of Bayesian graphical modelling techniques,417 which are further explored in the Magnesium (see chapter 10) and confidence profile (see chapter 11) case studies.




Explicit modelling of biases

The fullest Bayesian analysis of non-randomised data for health technology assessment is probably the confidence profile method of Eddy and colleagues.150 This is seen primarily as a tool for evidence synthesis (discussed in chapter 6), but it also emphasises the ability to model explicitly the potential biases that occur both within studies and in the attempt to generalise studies outside their target population.

They identify as ‘biases to internal validity’ those features of a study that may mean that the effect of interest is not being appropriately estimated within the circumstances of that study: these include dilution and contamination due to those who are offered a treatment not receiving it (see chapter 11), errors in measurement of outcomes, errors in ascertainment of exposure to an intervention, loss to follow-up, and patient selection and confounding, in which the groups differ with respect to measurable features. These biases may occur singly or in combination.

‘Biases to external validity’ concern the ability of a study to generalise to defined populations or to be combined with studies carried out on different groups. These include ‘population bias’, in which the study and general populations differ with respect to known characteristics, ‘intensity bias’, in which the ‘dose’ of the intervention is varied when generalised, and differences in lengths of follow-up. Finally, when combining studies it is possible to downweight the likelihood, much as historical control data can be downweighted when forming a prior (see page 27).

Eddy et al.150 show how each of these biases can be given a mathematical formulation, and hence can be used to adjust the findings of any study. Two requirements follow. First, analytic solutions are rarely possible, and so approximations or simulations are necessary: chapter 11 shows how these are now reasonably straightforward. More serious are the assumptions required concerning the extent of the biases. Data may be available on which to base accurate estimates, but there is likely to be considerable judgemental input. Any unknown quantity can, of course, be given a prior distribution, and Eddy et al. claim this obviates the need for sensitivity analysis. Rittenhouse378 has argued that explicit allowance for external biases is necessary for cost-effectiveness studies.

Institutional comparisons

A classic ‘multiplicity’ problem arises in the use of performance indicators to compare institutions with regard to their health outcomes or use of particular procedures. Analogously to subset estimation (see page 35) and meta-analysis (see page 47), hierarchical models can be used to make inferences based on estimating a common prior or ‘population’ distribution.195,333 An additional benefit of using MCMC methods (see page 12) is the ability to derive uncertainty intervals around the rank order of each institution.312 Fully Bayesian methods have also been used in the analysis of panel agreement data on the appropriateness of coronary angiography.27
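The rank-uncertainty idea can be sketched in a few lines. This illustrative example (the data and institution labels are hypothetical) simulates draws from each institution's posterior and summarises the distribution of its rank; for simplicity it uses independent Beta posteriors, whereas the cited analyses would place a hierarchical prior across institutions.

```python
import random

random.seed(1)

# Hypothetical data: (successes, cases) for five institutions.
data = {"A": (30, 100), "B": (25, 100), "C": (40, 100), "D": (33, 100), "E": (28, 100)}

def rank_intervals(data, n_draws=5000):
    """For each institution, simulate posterior draws of its rate and
    summarise the distribution of its rank (1 = lowest rate)."""
    names = list(data)
    ranks = {name: [] for name in names}
    for _ in range(n_draws):
        # Independent Beta(1,1) priors give Beta(s+1, n-s+1) posteriors;
        # a full hierarchical analysis would instead share a common prior.
        draws = {name: random.betavariate(s + 1, n - s + 1)
                 for name, (s, n) in data.items()}
        ordered = sorted(names, key=draws.get)
        for r, name in enumerate(ordered, start=1):
            ranks[name].append(r)
    summary = {}
    for name in names:
        rs = sorted(ranks[name])
        summary[name] = (rs[len(rs) // 2],          # median rank
                         rs[int(0.025 * len(rs))],  # 2.5% point
                         rs[int(0.975 * len(rs))])  # 97.5% point
    return summary

print(rank_intervals(data))
```

Even with 100 cases per institution, the rank intervals for the middling institutions are typically wide, which is precisely the point of reporting rank uncertainty rather than a league table of point estimates.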

The case study in chapter 12 describes an analysis of success rates in in vitro fertilisation clinics, in which Bayesian methods are used both to make inferences on the true rank of each clinic and to estimate the true underlying success rates with and without an exchangeability assumption.

Commentary

To date there has been relatively little work done on Bayesian health technology assessment in an epidemiological setting compared with that in randomised controlled trials. One important reason for this is the often complex regression models that are used routinely in epidemiology to, for example, adjust for known confounders, with the corresponding computational difficulties of Bayesian analysis. The advent of computer-intensive methods (see page 12) has largely overcome that problem.

In the future we can expect demands for increasingly complex analyses, such as of institutional comparisons and epidemiological data concerned with gene–environment interactions, which will place great demands on traditional hypothesis testing and estimation procedures that were essentially intended for fairly simple, low-dimensional problems. This is likely to lead to increased demand for Bayesian analyses.

Key points

1. Epidemiological studies tend to demand a more complex analysis than randomised trials.

2. Computer-intensive Bayesian methods in epidemiology are becoming more common.


3. There are likely to be increased demands for Bayesian analyses, particularly in areas such as institutional comparisons and gene–environment interactions.

4. The explicit modelling of potential biases in observational data may be widely applicable but needs some evidence base in order to be convincing.


Chapter 6

A guide to the Bayesian health technology assessment literature: evidence synthesis

Meta-analysis

Bayesian meta-analysis follows the idea that has already been explored in other contexts of multiplicity, such as subset analysis (see page 35), multicentre trials (see page 36) and multiple outcomes (see page 36), in considering each trial to be a ‘unit’ of analysis within a hierarchical structure. The ‘true’ treatment effect in each trial is considered, assuming that the trials are exchangeable, as a random quantity drawn from some population distribution. This is exactly the same as the standard random-effects approach to meta-analysis,130 but the latter tends to focus on estimating an overall treatment effect, while a full Bayesian approach also concentrates on estimating trial-specific effects. The Bayesian approach also requires prior distributions to be specified for the mean effect size and for the between- and within-study variances: these will generally be default ‘reference’ priors, but including the uncertainty of all the parameters will tend to give wider interval estimates than a classical random-effects analysis. Sutton et al.434 review the whole area of meta-analysis, and Bayesian methods in particular; other reviews are provided by Jones,254 Normand334 and Hedges.225

Empirical Bayes approaches received most attention in the literature until recently, largely because of computational difficulties in the use of fully Bayesian modelling.374,431 However, the full Bayesian hierarchical model has been investigated extensively by DuMouchel and colleagues142,143,144,145 and Abrams and Samso4 using analytic approximations, and also using MCMC methods.327,412 Carlin,94 for example, considers meta-analyses of both clinical trials and case–control studies; he examines the sensitivity to choice of reference priors, and explores checking the assumption of normal random effects. There have been many comparative studies of the full Bayesian approach, including trials379,433,455 and observational studies.66,433,457 These comparative studies between different approximations show few substantial differences: the primary finding is that when there are few studies, and hence the between-study variability cannot be accurately estimated from the data alone, the prior for this parameter becomes important, and the empirical Bayes approach, in which the uncertainty about the between-study variability is ignored, tends to provide intervals that are too narrow.
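As a concrete sketch, the full Bayesian normal random-effects model can be fitted with a short Gibbs sampler. All data values below are hypothetical, and the improper prior p(tau^2) proportional to 1/tau^2 is one simple choice standing in for the reference priors discussed above.

```python
import random

random.seed(42)

# Hypothetical data: estimated log odds ratios and standard errors
# from six trials (illustrative values only).
y = [-0.65, -0.20, -0.40, 0.05, -0.55, -0.30]
s = [0.25, 0.30, 0.20, 0.35, 0.40, 0.30]

def gibbs_meta(y, s, n_iter=4000, burn=1000):
    """Gibbs sampler for the normal random-effects meta-analysis model:
    y[i] ~ N(theta[i], s[i]^2), theta[i] ~ N(mu, tau^2),
    with a flat prior on mu and p(tau^2) proportional to 1/tau^2."""
    k = len(y)
    mu, tau2 = 0.0, 0.1
    theta = list(y)
    mu_draws, tau_draws = [], []
    for it in range(n_iter):
        # Update each trial-specific effect (precision-weighted shrinkage).
        for i in range(k):
            prec = 1 / s[i] ** 2 + 1 / tau2
            m = (y[i] / s[i] ** 2 + mu / tau2) / prec
            theta[i] = random.gauss(m, (1 / prec) ** 0.5)
        # Update the population mean effect.
        mu = random.gauss(sum(theta) / k, (tau2 / k) ** 0.5)
        # Update the between-study variance: inverse-gamma full conditional.
        ss = sum((t - mu) ** 2 for t in theta)
        tau2 = 1 / random.gammavariate(k / 2, 2 / ss)
        if it >= burn:
            mu_draws.append(mu)
            tau_draws.append(tau2 ** 0.5)
    return mu_draws, tau_draws

mu_draws, tau_draws = gibbs_meta(y, s)
mu_hat = sum(mu_draws) / len(mu_draws)
print("posterior mean of mu:", round(mu_hat, 3))
print("P(mu < 0):", sum(m < 0 for m in mu_draws) / len(mu_draws))
```

Because tau^2 is sampled rather than plugged in, the posterior for mu propagates the uncertainty about between-study variability that the empirical Bayes approach ignores.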

It is natural to use a cumulative meta-analysis as external evidence when monitoring a clinical trial,230 and cumulative meta-analysis can also be given a Bayesian interpretation as providing a prior distribution284 (see page 18): in this situation the Bayesian approach relies on the assumption of exchangeability of trials but avoids concerns with retaining type I error over the entire course of the cumulative meta-analysis.

Others have investigated the relationship of treatment effect to underlying risk320,393,453 – see the accompanying case study for an application of this approach (see chapter 10). Priors on the heterogeneity parameter were considered in chapter 3: Higgins and Whitehead233 use proper priors derived from a series of meta-analyses. Another application is the investigation of publication bias, which has been modelled by Begg et al.34 and Givens et al.,194 while Daniels and Hughes122 pool studies in order to model a joint distribution of a surrogate end-point and eventual response.

Sutton et al.434 summarise the potential advantages of the Bayesian approach to meta-analysis as including:

1. Unified modelling. The conflict between fixed- and random-effects meta-analysis is overcome by explicitly modelling between-trial variability (which could be assumed to be small), as well as allowing regression models for the treatment effect in each trial.

2. Borrowing strength. As in all areas in which Bayesian hierarchical modelling is adopted, the exchangeability assumption leads to each experimental unit ‘borrowing’ information from the other units, leading to a shrinkage of the estimate towards the overall mean, and a reduction in the width of the interval estimate. This degree of pooling depends on the empirical similarity of the estimates from the individual units.

3. Allowing for all parameter uncertainty. The full uncertainty from all the parameters is reflected in the widths of the intervals for the parameter estimates.

4. Allowing for other sources of evidence. Other sources of evidence can be reflected in the prior distributions for parameters, or in pooling multiple types of study (see below).

5. Allowing direct probability statements. As with all Bayesian analyses, quantities of interest can be directly addressed, such as the probability that the true treatment effect in a typical trial is greater than 0.

6. Predictions. The ease of making predictions allows, for example, current meta-analyses to be used in designing future studies.
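Points 5 and 6 can be made concrete with a small sketch. Assuming a normal posterior for the mean effect and a point value for the between-study standard deviation (all numbers hypothetical), a direct probability statement and a predictive interval for a new trial are both one-line calculations.

```python
import math

def normal_cdf(x, mean, sd):
    """Normal cumulative distribution function via the error function."""
    return 0.5 * (1 + math.erf((x - mean) / (sd * math.sqrt(2))))

# Hypothetical posterior for the mean log odds ratio mu: N(-0.35, 0.15^2),
# with between-study standard deviation tau = 0.20 (illustrative values).
mu_mean, mu_sd, tau = -0.35, 0.15, 0.20

# Direct probability statement: P(true mean effect < 0).
print("P(mu < 0) =", round(normal_cdf(0.0, mu_mean, mu_sd), 3))

# Prediction: the effect in a new, exchangeable trial is
# N(mu_mean, mu_sd^2 + tau^2), so the predictive interval is wider.
pred_sd = math.sqrt(mu_sd ** 2 + tau ** 2)
print("95% predictive interval:",
      (round(mu_mean - 1.96 * pred_sd, 2), round(mu_mean + 1.96 * pred_sd, 2)))
```

Note how the predictive interval for a new trial is substantially wider than the interval for the mean effect itself, reflecting the between-study heterogeneity.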

Cross-design synthesis

The previous section has dealt with meta-analysis of studies with similar basic design, but a more general approach allows mixing of different types of study. Rubin384 emphasises pooling evidence through modelling in order to “build and extrapolate a response surface”, which models the true treatment effect conditional on both the design of the study and on subgroup factors.

As noted when discussing observational studies (see chapter 5), in some circumstances randomised evidence will be less than adequate due to economic, organisational or ethical considerations.67 Considering all the available evidence, including that from non-randomised studies, may then be necessary or advantageous. Droitcour et al.140 describe the limitations of using either randomised controlled trials or databases alone, in that randomised controlled trials may be rigorous but restricted, whereas databases have a wider range but may be biased. They introduce what they term cross-design synthesis, an approach for synthesising evidence from different sources, with the aim “not to eliminate studies of overall low quality from the synthesis, but rather to provide the information needed to compensate for specific weaknesses”.140 Although not a Bayesian approach, this follows the explicit modelling of biases considered by the confidence profile method (see pages 44 and 49), and work on generalising the results of clinical trials to broader populations. Cross-design synthesis was outlined in a report from the US General Accounting Office,337 but a Lancet editorial152 was critical of this approach, suggesting it would deflect attention from carrying out serious controlled trials; this was denied in a subsequent reply.101 A review by Begg33 suggested that its proponents had underestimated the difficulty of the task, and appeared to assume that randomised trials and databases could be reconciled by statistical adjustments, whereas selection biases and differences in experimental rigour could not be eliminated so easily.

It is natural to take a Bayesian approach to the synthesis of multiple trial designs, and a hierarchical model can specifically allow for the quantitative within- and between-sources heterogeneity, and for a priori beliefs regarding qualitative differences between the various sources of evidence. A full Bayesian version of cross-design synthesis was subsequently applied to data on breast cancer screening.3,411 The concept of combining different types of study in a model has also been termed ‘grouped meta-analysis’; Li and Begg296 combine controlled and uncontrolled studies, Larose and Dey283 integrate open and closed studies within a single model, while Dominici et al.138 pool open and closed studies on migraine using a graphical model, different treatment contrasts and different designs. Muller et al.330 combine case–control and prospective studies.

Other examples include Berry et al.,62 who consider a complex synthesis of studies concerning breast cancer therapy, facing up to issues such as unplanned analyses, multiple variables, lack of exchangeability across and within studies, and the problem of convincing practitioners on the basis of such a complex analysis. Belin et al.35 combine observational databases in order to evaluate interventions to increase screening rates, needing to impute missing data in some studies. Such integrated analyses naturally lead on to the ‘comprehensive decision models’ discussed in chapter 7.

It has yet to be established when such analyses are appropriate, as there is concern that including studies with poorer designs will weaken the analysis, though this issue is partially addressed by conducting sensitivity analyses under various assumptions. However, an example of such a synthesis is provided in the context of regulatory approval of medical devices (see page 53).


Confidence profile method

This approach was developed by Eddy and colleagues and promulgated in a book with numerous worked examples and accompanying software,150 as well as tutorial articles.146,148,149,151,399

They use directed conditional independence graphs to represent the qualitative way in which multiple contributing sources of evidence relate to the quantity of interest, explicitly allowing the user to discount studies due to their potential internal bias or their limited generalisability (see page 44). Their analysis is essentially Bayesian, although it is possible to avoid specification of priors and use only the likelihoods.

The software for carrying out the confidence profile method, FAST*PRO, has been used in meta-analysis of the benefits of antibiotic therapy,28 mammography of those aged under 50 years147 and angioplasty.8

The need to make explicit subjective judgements concerning the existence and extent of possible biases, and the limited capacity and friendliness of the software, have perhaps restricted the application of this technique. However, we show in chapter 11 that modern software allows straightforward implementation of the models promulgated by Eddy and co-workers.

Key points

1. A unified Bayesian approach appears to be applicable to a wide range of problems concerned with evidence synthesis.

2. In the past, prospective evaluation of clinical interventions concentrated on randomised controlled trials, but more recent interest has focused on more diffuse areas, such as healthcare delivery or broad public health measures. This means that methods that can synthesise the totality of evidence are required, for example in assessing medical devices.

3. Evaluations of current technologies may often be seen as unethical subjects for randomised controlled trials, and hence modelling of available evidence is likely to be necessary.

4. Perhaps one reason for lack of uptake is that syntheses are not seen as ‘clean’ methods, with each analysis being context-specific, less easy to set quality markers for, easier to criticise as subjective, and so on.

5. Priors for the degree of ‘similarity’ between alternative designs can be empirically informed by studies comparing the results of randomised controlled trials and observational data.


Chapter 7

A guide to the Bayesian health technology assessment literature: strategy, decisions and policy making

Contexts

Throughout this report we have emphasised that it is vital to take into account the context in which a health technology assessment is being made. The appropriate prior opinions, and the possibility of explicit loss functions, depend crucially on whose behalf any analysis is being reported or decision being made.

In this chapter we specifically address this issue using the broad categories of ‘actors’ introduced in chapter 2:

• Sponsors, for example the pharmaceutical industry, medical charities or granting agencies. In deciding whether to fund studies, they will be concerned with the potential ‘payback’ from research (see page 52), which within industry takes the form of a drug development programme.

• Investigators, that is, those responsible for the conduct of a study, whether industry or publicly funded. In previous chapters we have focused primarily on those carrying out a single study, who are mainly concerned with the accuracy of the inferences to be drawn from their work.

• Reviewers, for example regulatory bodies (see page 53) or journal editors. They will be concerned with the appropriateness of the inferences drawn from the studies, and so may adopt their own prior opinions and reporting standards (see page 33).

• Consumers, for example agencies setting health policy, clinicians or patients. Healthcare organisations may be concerned with the cost-effectiveness of an intervention (see page 51), although the sponsor or investigator may carry out this analysis on their behalf. Decisions about health policy, whether at a community or individual level, may involve explicit consideration of costs and benefits (see page 54) – here we distinguish between those based only on prior opinions (forming a decision analysis) and those requiring a full Bayesian statistical analysis.

In the remainder of this chapter we examine the potential Bayesian contribution to these different perspectives.

Cost-effectiveness within trials

The traditional tool for dealing with uncertainty in cost-effectiveness analysis has been sensitivity analysis, although there has been considerable recent work on developing classical confidence intervals for cost-effectiveness ratios. This area has recently been reviewed by Briggs and Gray,74 who mention the use of probabilistic sensitivity analysis, in which prior probability distributions are placed over uncertain inputs into the analysis and the resulting distribution of potential cost-effectiveness ratios is generated by simulation. This is a particular case of a Bayesian method: the general way in which uncertainty is handled by the Bayesian approach has been emphasised by Manning et al.311 and Jones.252,253

From a technical perspective, Grieve216 shows how pharmaco-economics naturally gives rise to a bivariate posterior distribution of costs and effectiveness, which can be plotted and from which the probability of specific conclusions may be obtained. Heitjan et al.229 illustrate this bivariate approach with a number of examples, while Briggs73 avoids the problem of possible zero denominators in the cost-effectiveness ratio by working directly with the prior and likelihood for net benefit relative to a specified baseline ratio.
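A minimal sketch of this kind of probabilistic sensitivity analysis, using the net-benefit formulation to avoid the zero-denominator problem just mentioned (the input distributions and the willingness-to-pay threshold are all hypothetical):

```python
import random

random.seed(3)

# Hypothetical priors over the uncertain inputs (illustrative values):
# incremental cost (GBP) and incremental effect (QALYs) of a new therapy.
def draw_cost():
    return random.gauss(1500, 400)

def draw_effect():
    return random.gauss(0.10, 0.05)

LAMBDA = 20000  # assumed willingness to pay per QALY

# Simulate the induced distribution of incremental net monetary benefit:
# NB = lambda * effect - cost, which is well defined even when the
# effect (the ratio's denominator) is near zero.
n = 20000
net_benefits = [LAMBDA * draw_effect() - draw_cost() for _ in range(n)]

p_cost_effective = sum(nb > 0 for nb in net_benefits) / n
mean_nb = sum(net_benefits) / n
print("mean incremental net benefit:", round(mean_nb))
print("P(net benefit > 0):", round(p_cost_effective, 3))
```

The summary P(net benefit > 0) is exactly the kind of direct probability statement the bivariate cost-effectiveness posterior supports.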

Luce and Claxton310 present a strong argument that Bayesian methods should be applied in cost-effectiveness studies and pharmaco-economics in general. They point out that hypothesis testing is of limited relevance in economic studies, and that additional evidence outside a study is likely to be relevant. Furthermore, when a cost-effectiveness analysis is being used as one of the inputs into a formal decision concerning drug regulation or health policy, they recommend a full decision-theoretic approach in which an explicit loss function of the decision maker is assessed. This view has led to a Bayesian initiative in pharmaco-economics (see appendix 2). Felli and Hazen168,169 extend this utility perspective to sensitivity analysis, suggesting that an analysis should be considered sensitive to a particular uncertain input if the expected gain in utility from eliminating the uncertainty about that input exceeds a specified threshold. The use of such an expected value of perfect information (EVPI) approach has also been recommended when deciding research priorities (see page 52).

Finally, Rittenhouse378 suggests that trial results may need to be adjusted in order to generalise the cost-effectiveness analysis to other populations of interest. This is essentially concerned with the type of adjustments used in cross-design synthesis (see page 48) and the explicit modelling of biases in observational studies (see page 44).

Cost-effectiveness of carrying out trials – ‘payback models’

Research planning in the public sector

Any organisation funding clinical trials must make decisions concerning the relative importance of alternative proposals, and there have been increased efforts to measure the potential ‘payback’ of expenditure on research. Buxton and Hanney83 review the issues and propose a staged semi-quantitative structure, while Eddy146 suggests a fully quantitative model based on assessing the future numbers to benefit and the expected benefit, with a subjective probability distribution over the potential benefits to be shown by the research. However, Eddy’s146 limited approach was not adopted by its sponsors, the US Institute of Medicine, who preferred a less structured model that employed weights.

It is clearly possible to extend this broad approach to increasingly sophisticated models within a Bayesian framework, and Hornberger and Eghtesady state that “by explicitly taking into consideration the costs and benefits of a trial, Bayesian statistical methods permit estimation of the value to a healthcare organisation of conducting a randomised trial instead of continuing to treat patients in the absence of more information”.238 Clearly this is a particular example of a decision-theoretic Bayesian approach, applied at the planning stage of a trial (see page 28) rather than at interim analyses (see page 32). Examples given on page 28 by Detsky,131 Hornberger238,239 and others explicitly calculate the expected utility of a trial in order to select sample sizes, and such calculations can also, in theory, be used to rank studies that are competing for resources, and hence to decide whether the trial is worth doing in the first place.

Detsky’s early analysis131 assumed that a trial would need to achieve statistical significance in order to have an impact on future treatments, but Claxton106 argues strongly that dependence on such inferential methods, whether classical or Bayesian, will lead to suboptimal use of health resources. He recommends a full decision-theoretic approach to both fixed107 and sequential106 trials, basing his analysis on a univariate scale comprising the net benefit relative to a prespecified standard measure of effectiveness per unit cost. The EVPI must exceed the cost of research in order to pass the first ‘hurdle’ for a proposed programme, and the expected value of sample information (EVSI) (essentially the EVPI allowing for the sampling error of a trial) must exceed the sample costs to overcome the hurdle for a specific proposed trial. This model allows for unbalanced allocation of patients between arms, and the ability to revise the design based on interim analyses,452 in order to optimise the expected net benefit from sampling (ENBS), which is the EVSI minus sample costs.
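The first ‘hurdle’ can be illustrated numerically. In this hypothetical two-option example, per-patient EVPI is computed from posterior draws of incremental net benefit, scaled to an assumed patient population, and compared with an assumed research cost (every figure below is illustrative, not taken from the cited analyses).

```python
import random

random.seed(7)

# Hypothetical posterior draws of incremental net benefit (GBP per patient)
# of a new treatment versus standard care (illustrative normal posterior).
draws = [random.gauss(500, 2000) for _ in range(20000)]

def evpi(inb_draws):
    """Expected value of perfect information for a two-option decision:
    E[max(INB, 0)] - max(E[INB], 0)."""
    mean_inb = sum(inb_draws) / len(inb_draws)
    value_now = max(mean_inb, 0.0)  # adopt the new treatment iff E[INB] > 0
    value_perfect = sum(max(d, 0.0) for d in inb_draws) / len(inb_draws)
    return value_perfect - value_now

per_patient = evpi(draws)
population_evpi = per_patient * 10000  # assumed number of future patients
research_cost = 2_000_000              # assumed cost of the proposed trial

print("population EVPI:", round(population_evpi))
print("passes first hurdle:", population_evpi > research_cost)
```

EVSI would replace ‘perfect information’ with the posterior expected after a trial of given size, so ENBS (EVSI minus sample costs) can then be maximised over candidate designs.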

The standard criticisms of decision-theoretic approaches to trials apply (see page 40), particularly regarding unrealistic assumptions concerning the impact of research results (which may not even be ‘significant’) on clinical practice. Claxton replies that the first step should be to establish a normative framework that best meets the needs of a system, and separately to conduct studies to see how to get the research into practice.106

More generally, the role of decision analysis in all aspects of health services research has been emphasised by Lilford and Royston.298

Research planning in the pharmaceutical industry

Berry49,59 has stressed the decision-theoretic approach to sequential trial design as being relevant to a pharmaceutical company seeking to maximise profit. However, here we are concerned with a whole research programme in which there are multiple competing projects at different stages of drug development. It is natural that the pharmaceutical industry would wish to organise its programme in a cost-effective way, and Bergman and Gittins39 review quantitative approaches to planning a pharmaceutical research programme. Many of the proposed methods are sophisticated uses of bandit theory in order to allocate resources in a dynamically changing environment, but Senn396,398 suggests a fairly straightforward scheme based on the Pearson index, which is the expected net present value divided by expected net present costs. He discusses the difficulties of eliciting suitable probabilities for the success of each stage of a drug development programme, conditional on the success of the previous stage, but suggests that formal Bayesian approaches involving subjective probability assessment and belief revision should be investigated in this context.
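One common formulation of the Pearson index can be shown with a small worked example (all probabilities, costs and the payoff are hypothetical): each stage's cost is incurred only if the programme reaches that stage, and the payoff accrues only if every stage succeeds.

```python
# Hypothetical three-stage development programme (illustrative figures).
# Each tuple: (probability of success given the previous stage succeeded,
#              discounted cost of running the stage, GBP million).
stages = [(0.6, 5.0), (0.5, 20.0), (0.8, 40.0)]
payoff = 400.0  # discounted net revenue if all stages succeed (GBP million)

def pearson_index(stages, payoff):
    """Expected net present value divided by expected net present costs.
    A stage's cost is incurred only if all earlier stages succeeded."""
    p_reach = 1.0
    expected_cost = 0.0
    for p_success, cost in stages:
        expected_cost += p_reach * cost  # cost paid on reaching this stage
        p_reach *= p_success             # probability of passing it
    expected_value = p_reach * payoff - expected_cost
    return expected_value / expected_cost

print("Pearson index:", round(pearson_index(stages, payoff), 2))
```

Projects competing for resources can then be ranked by this index, with the elicited conditional success probabilities being the obvious target for the Bayesian belief-revision methods Senn mentions.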

The regulatory perspective

Regulation of pharmaceuticals

Regulatory bodies have a duty to protect the public from unsafe or ineffective therapies, and are increasingly taking on responsibility for assuring cost-effectiveness.

Opinions on the relevance of Bayesian methods to drug or device regulation cover a broad spectrum: Whitehead478 and Koch272 see any use of priors as being controversial and inappropriate, while on the other hand Matthews319 claims that the use of sceptical priors “should not be optional but mandatory”. Keiding269 criticises the “ritual dances” currently prescribed for regulation, but wonders whether Bayesian methods will allow anything less ridiculous. Claxton105 suggests that agencies adopt a full decision-theoretic approach to regulation, evaluating the expected value of further investigation in order to assess whether sufficient evidence is available to permit approval. O’Neill,339 as a senior US FDA statistician, acknowledges the appropriate conservatism arising out of the use of sceptical priors, and considers that Bayesian methods should be investigated in parallel with other techniques.

The website of the FDA allows one to search for references to Bayesian methods among their published literature (see appendix 2). Much of the discussion concerns medical devices (see page 53). Guidelines for population pharmacokinetics are provided,462 which can be thought of as an empirical Bayes procedure (see page 38). There is also an interesting use of a Bayesian argument in the approval of the drug enoxaparin (Lovenox®). The transcript of the Cardiovascular and Renal Drugs Advisory Committee meeting on 26 June 1997461 shows that the pharmaceutical company had been asked to make a statement about the effectiveness of enoxaparin plus aspirin as compared with placebo (aspirin alone), whereas their clinical trial had used an active control of heparin plus aspirin. They therefore used meta-analysis data comparing heparin plus aspirin versus aspirin alone in order to produce a posterior distribution on the treatment comparison of interest: an example of cross-study inference (see page 7.2). Analyses were repeated using the meta-analysis data directly, but also expressing scepticism about its relevance and reducing its influence, with results being expressed as posterior probabilities of treatment superiority over placebo. The committee welcomed this analysis and voted to approve the drug.
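The style of argument can be sketched with normal approximations on the log odds ratio scale (all numbers are hypothetical, not the enoxaparin data): the trial's active-control comparison is chained to the meta-analysis comparison, and scepticism about the meta-analysis is expressed by inflating its variance.

```python
import math

def normal_cdf(x, mean, sd):
    return 0.5 * (1 + math.erf((x - mean) / (sd * math.sqrt(2))))

# Hypothetical estimates on the log odds ratio scale (illustrative only):
# d_eh: new drug + aspirin vs active control + aspirin, from the trial;
# d_ha: active control + aspirin vs aspirin alone, from a meta-analysis.
d_eh, se_eh = -0.10, 0.08
d_ha, se_ha = -0.30, 0.10

def indirect_comparison(discount=1.0):
    """Chain the direct trial estimate to the (possibly discounted)
    meta-analysis evidence; discount > 1 inflates the meta-analysis
    variance to express scepticism about its relevance."""
    d = d_eh + d_ha
    se = math.sqrt(se_eh ** 2 + discount * se_ha ** 2)
    p_superior = normal_cdf(0.0, d, se)  # P(log OR < 0): drug beats placebo
    return d, se, p_superior

for disc in (1.0, 2.0, 4.0):
    d, se, p = indirect_comparison(disc)
    print(f"discount {disc}: estimate {d:.2f} (SE {se:.3f}), P(superior) = {p:.3f}")
```

Repeating the calculation across a range of discounts, as the sponsor did with the real data, shows how robust the posterior probability of superiority is to doubts about the relevance of the external evidence.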

It is important to note that the latest international statistical guidelines for pharmaceutical submissions to regulatory agencies state that “the use of Bayesian and other approaches may be considered when the reasons for their use are clear and when the resulting conclusions are sufficiently robust”.248 Unfortunately, they do not go on to define what they mean by clear reasons and robust conclusions, and so it remains open as to what will constitute an appropriate Bayesian analysis for a pharmaceutical regulatory body.

Regulation of medical devices

The greatest enthusiasm for Bayesian methods appears to be in the FDA Center for Devices and Radiological Health (CDRH). They co-sponsored a workshop on Bayesian methods in November 1998, and are currently proposing to produce the document Statistical Guidance on Bayesian Methods in Medical Device Clinical Trials.460

Campbell recently described the potential for Bayesian methods in assessing medical devices,87 emphasising that devices differed from pharmaceuticals in having better understood physical mechanisms, which meant that effectiveness was generally robust to small changes. Since devices tended to develop in incremental steps, a large body of relevant evidence existed, and companies did not tend to follow established phases of drug development. The fact that an application for approval might include a variety of studies, including historical controls and registries, suggests that Bayesian methods for evidence synthesis might be appropriate. However, the standard conditions apply: the source and robustness of the prior information must be assessed, and Bayesian analysis does not compensate for poor science and experimental design.

Campbell draws attention to the Transcan Breast Scanner®, which was approved by the CDRH in April 1999.463 A primary ‘intended use’ study on 72 women was supplemented by two additional studies of differing designs, using a hierarchical multinomial logistic regression model with study introduced as a random effect. MCMC simulation methods were used by means of the BUGS software.423

Policy making and ‘comprehensive decision modelling’

The primary advantage of a Bayesian approach is that it allows the synthesis of all available sources of evidence, whether from RCTs, databases or expert judgement, into a single model that can then be used to evaluate the cost-effectiveness of alternative policies. The approach has been termed ‘comprehensive decision modelling’, and can be thought of as extending the evidence synthesis methods described in chapter 6 to allow for costs in particular and utilities in general.

Parmigiani and colleagues350,352 apply this idea to screening for breast cancer, in which many sources of evidence are brought together in a single model that predicts the consequences of alternative screening policies, while Cronin et al.121 use micro-simulation at the level of the individual patient to predict the consequences of different policy decisions on lowering expected mortality from prostate cancer. Samsa et al.387 consider ischaemic stroke, and construct a model for natural history using data from major epidemiological studies, and a model for the effect of interventions based on databases, meta-analysis of trials and Medicare claim records. They also use micro-simulation of the long-term consequences of different stroke prevention policies in order to compare their cost-effectiveness: Matchar, Parmigiani and colleagues315,351,353 consider further use of the stroke prevention policy model (SPPM), developed under the auspices of the Stroke PORT (Patient Outcomes Research Team).

Key points

1. A Bayesian approach allows explicit recognition of multiple perspectives.

2. Increased attention to pharmaco-economics may lead to decision-theoretic models for research planning being explored, although this will not be straightforward.

3. There appears to be great potential for formal methods for planning in the pharmaceutical industry.

4. The regulation of devices is leading the way inestablishing the role of evidence synthesis.

5. ‘Comprehensive decision modelling’ is likely tobecome increasingly important in policy making.


Chapter 8

BayesWatch: a Bayesian checklist for health technology assessment

In this chapter we present a checklist against which published accounts of Bayesian assessments of health technologies can be compared. We aim to ensure that an account which adequately covers all the points mentioned here would have the property that the analysis could be replicated by another investigator who has access to the full data.

These guidelines should be seen as complementary to the CONSORT (Consolidated Standards of Reporting Trials) guidelines, in that they focus on those aspects crucial to an accountable Bayesian analysis, in addition to standard sections concerning the technology, the design and the results.

Introduction

1. The technology. The intervention to be evaluated must, of course, be clearly described with regard to the population of interest and so on.

2. Objectives of study. It is important that a clear distinction is made between desired inferences on any quantity or quantities of interest, representing the parameters to be estimated, and any decisions or recommendations for action that are to be made subsequent to the inferences. The former will require a prior distribution, while the latter will require explicit or implicit consideration of a loss function/utility.

Methods

1. Design of study. This is a standard requirement, but when synthesising evidence, particular attention will be necessary to the similarity of studies in order to justify assumptions of exchangeability.

2. Statistical model. The probabilistic relationship between the parameter(s) of interest and the observed data should be explicitly described. The relationship should either be given mathematically, or its structure should be described in such a way as to allow its mathematical form to be unambiguously obtained by a competent reader. If this likelihood has been obtained by a method of model selection, whether Bayesian or not, this should be stated and the method described.

3. Prospective analysis? It needs to be made clear whether the prior and any loss function were constructed preceding the data collection, and whether analysis was carried out during the study.

4. Loss function. If an explicit method of deducing scientific consequences is decided prior to the study, this should be explicitly stated. This will often be a range of equivalence (a range of values such that, if the parameter of interest lies in that range, two different technologies may be regarded as being of equal effectiveness), or a loss function whose expected value is to be minimised with respect to the posterior distribution of the parameter of interest, yielding an estimated value of the parameter. If these have been obtained by an elicitation process from experts, this should be stated and the process described. Any intention to investigate the dependence of the final conclusion on the range of equivalence, etc., should be described.

5. Prior distribution. Explicit priors for the parameters of interest should be given, clearly showing whether an informative or 'non-informative' prior is being used. If they have been obtained by an elicitation process, this should be stated and described. If it is intended to examine the effect of using different priors on the conclusion of the study, the alternative priors should be explicitly stated. Any empirical evidence underlying the prior assessment should be provided.

6. Computations. These need to be described to the extent that a mathematically competent reader could, if necessary, repeat all the


calculations and obtain the required results. Details of any software used to obtain the results should be given. If MCMC methods are being used, the choice of starting values, the number and length of runs, and the convergence diagnostics used should be clearly stated and justified.
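As an illustration of the kind of convergence diagnostic being asked for here, the sketch below computes the Gelman–Rubin potential scale reduction factor (R-hat) from several chains. This is one common choice, not the only acceptable diagnostic, and the chains below are synthetic stand-ins for real MCMC output:

```python
import random
import statistics

def gelman_rubin(chains):
    """Potential scale reduction factor (R-hat) for equal-length chains.

    Values close to 1 suggest the chains have mixed over the same
    distribution; values well above 1 indicate non-convergence.
    """
    m = len(chains)
    n = len(chains[0])
    chain_means = [statistics.fmean(c) for c in chains]
    grand_mean = statistics.fmean(chain_means)
    # Between-chain variance B and mean within-chain variance W
    b = n / (m - 1) * sum((cm - grand_mean) ** 2 for cm in chain_means)
    w = statistics.fmean(statistics.variance(c) for c in chains)
    # Pooled estimate of the target variance, then the shrink factor
    var_plus = (n - 1) / n * w + b / n
    return (var_plus / w) ** 0.5

# Synthetic example: four chains drawn from the same target distribution
rng = random.Random(42)
chains = [[rng.gauss(0.0, 1.0) for _ in range(5000)] for _ in range(4)]
rhat = gelman_rubin(chains)
```

If one chain were stuck in a different region (for example, its values shifted by several standard deviations), R-hat would be far above 1, flagging the failure that the checklist asks authors to report.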

Results

1. Evidence from study. As much information about the observed data – sample sizes, measurements taken – as is compatible with brevity and data confidentiality should be given.

Interpretation

1. Reporting. The posterior distributions should be clearly summarised. In most cases, this should include a presentation of posterior credible intervals and a graphical presentation of the posterior distribution. If either a formal or informal loss function has been described, the results should be expressed in these terms.

It is also essential that the likelihood can be reconstructed, usually through information given under 'evidence from study', so that subsequent users can establish the contribution from the study to, say, a meta-analysis. There should be a careful distinction between the report as a current summary for action, in which case a synthesis of all relevant sources of evidence is appropriate, and the report as a contributor of information for future action.

2. Sensitivity analysis. If alternative priors and/or expressions of the consequences of decisions have been given in the sections above, the results of these should be presented.

Example

The following is a single example from the literature summarised using the BayesWatch headings. A full list of 'three-star' Bayesian health technology assessment studies is provided in appendix 1.

Author. Abrams K, Ashby D and Errington D.1

Title. Simple Bayesian analysis in clinical trials – a tutorial.

Year. 1994.

The technology. High-energy neutron therapy against standard photon therapy in treatment of pelvic cancers.

Objectives of study. To estimate the odds ratio for 12 month survival.

Design of study. Randomised controlled trial. Separate trials were run concurrently for cancers of the rectum, bladder, colon and cervix. Subjects were randomised in a ratio of 3:1 towards neutron therapy from 10 February 1986 until 11 January 1988, and then in a ratio of 1:1 until 12 February 1990.

Evidence from study. Twelve-month survival: 61 subjects on photon treatment, 36 alive, 25 dead; 90 subjects on neutron treatment, 44 alive, 46 dead.

Statistical model. Binomial model with 12 month mortality rate qp on photon treatment and qn on neutron treatment. Inference is on the log(odds ratio), log[qn(1 – qp)/(qp(1 – qn))].

Prospective analysis? Partly: prior distributions elicited prospectively, analysis performed retrospectively.

Loss function. No explicit loss function. On average, the 10 clinicians demanded an odds ratio of less than 0.63 before routinely preferring neutron therapy.

Prior distribution. Ten clinicians were asked to specify their prior for 12 month mortality on neutron therapy by the roulette method, assuming 12 month mortality on photon therapy to be 0.5. The arithmetic mean of these was taken and a beta prior superimposed by equating moments: qn ~ Beta(5.25, 6.17). The 12 month mortality on photon therapy was taken to have a beta prior with a mean of 0.5 (specified by clinicians) and a variance of 0.01 (suggested by the variability of previous studies): qp ~ Beta(12, 12).
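The moment-equating step can be sketched directly: given an elicited mean and variance, solve for the beta parameters. The Beta(12, 12) prior quoted above is exactly what this recovers from a mean of 0.5 and a variance of 0.01 (the function name is ours, not the paper's):

```python
def beta_from_moments(mean, variance):
    """Return (a, b) for a Beta(a, b) distribution with the given mean and variance.

    Requires 0 < mean < 1 and variance < mean * (1 - mean); otherwise no
    beta distribution has these moments.
    """
    if not 0 < mean < 1:
        raise ValueError("mean must lie strictly between 0 and 1")
    common = mean * (1 - mean) / variance - 1  # equals a + b
    if common <= 0:
        raise ValueError("variance too large for a beta distribution")
    return mean * common, (1 - mean) * common

a, b = beta_from_moments(0.5, 0.01)   # recovers the Beta(12, 12) prior
```

The same function round-trips the elicited Beta(5.25, 6.17) prior from its own mean and variance, which is a quick internal check on any moment-matched elicitation.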

Computations. Conjugate analysis for qp and qn, and normal approximation for the posterior of the log(odds ratio).
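The conjugate analysis itself is a one-line update: each beta prior becomes a beta posterior by adding the observed deaths and survivors. The sketch below then approximates the posterior of the log(odds ratio) by Monte Carlo rather than the paper's normal approximation; note also that the subscript labels in the extract above appear garbled, so we assign the elicited Beta(5.25, 6.17) prior to neutron mortality and the Beta(12, 12) prior to photon mortality, matching the verbal descriptions (the qualitative conclusion, a posterior favouring an odds ratio above 1, holds under either assignment):

```python
import math
import random

def beta_posterior(prior, deaths, survivors):
    """Conjugate update: Beta(a, b) prior -> Beta(a + deaths, b + survivors)."""
    a, b = prior
    return a + deaths, b + survivors

# Twelve-month mortality data: photon 25/61 dead, neutron 46/90 dead
post_photon = beta_posterior((12.0, 12.0), deaths=25, survivors=36)
post_neutron = beta_posterior((5.25, 6.17), deaths=46, survivors=44)

# Monte Carlo approximation to the posterior of the log(odds ratio)
rng = random.Random(0)

def sample_log_or():
    qp = rng.betavariate(*post_photon)
    qn = rng.betavariate(*post_neutron)
    return math.log(qn * (1 - qp) / (qp * (1 - qn)))

draws = [sample_log_or() for _ in range(20000)]
mean_log_or = sum(draws) / len(draws)
```

The posterior mass sits above zero (neutron mortality higher), which is the conflict with the optimistic prior noted in the comments below.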

Reporting. Kaplan–Meier survival curves. Plots are shown of the prior, likelihood ('reference prior') and posterior. Medians, 95% credible intervals and probabilities of the odds ratio being less than 1, and less than the clinical demand (0.63), are given.


Sensitivity analysis. The analysis was repeated using a reference prior.

Comments. The likelihood conflicts with the prior distribution in the direction of effect. There is deliberate use of an analytically tractable model for tutorial purposes. This study is also considered by Spiegelhalter et al.421 using a log(hazard ratio) scale.


Chapter 9

Case study 1: the CHART (lung cancer) trial

This case study briefly describes the prospective use of an informal Bayesian monitoring procedure. The information has been kindly provided by Dr Mahesh Parmar, MRC Clinical Trials Unit (Cancer Division), Cambridge. Primary references are Saunders et al.390 and Parmar et al.348

The technology. In 1986 a new radiotherapy technique called CHART was introduced. Its concept was to give radiotherapy continuously (no weekend breaks), in many small fractions (three a day) and accelerated (the course completed in 12 days). It should be clear that there are considerable logistical problems in efficiently delivering CHART. Promising non-randomised and pilot studies led the UK Medical Research Council to instigate two large randomised trials for head-and-neck and lung cancer. Only the lung cancer trial390 is considered here.

Objectives of study. To estimate the change in survival in lung cancer patients when given CHART compared with conventional radiotherapy, in particular the probability that it provides a clinically important difference in survival that compensates for any additional toxicity and problems of delivering the treatment.

Design of study. A 60:40 randomisation in favour of CHART was selected, with 90% power (with a two-sided 5% significance level) to detect a 10% improvement in 2 year survival over the 15% expected under conventional treatment. This would require around 600 patients with 470 expected events, with accrual expected to last 4 years and a further 1 year of follow-up. The alternative hypothesis of 10% is the mean of the subjective prior distribution expressed by 11 clinicians (see below).

The trial began recruitment in January 1990, with planned annual meetings of the Data Monitoring Committee (DMC) to review efficacy and toxicity data. No formal stopping procedure was specified in the protocol.

Evidence from study. The data reported at each meeting of the DMC are shown in Table 7.

Recruitment stopped in early 1995 after 563 patients had entered the trial. It is clear that the extremely beneficial early results were not retained as the data accumulated, although a clinically important and statistically significant difference was eventually found.

The statistical model. We assume a proportional hazards model where the hazard ratio is defined as the hazard under standard treatment divided by the hazard under CHART, and hence hazard ratios greater than 1 indicate the superiority of CHART. If randomisation were balanced between the arms of the trial, the likelihood for the log(hazard ratio) θ may be assumed to be normally distributed with mean 4L/m and variance 4/m, where L is the log-rank statistic (observed number of deaths in the CHART group minus the expected number under the null hypothesis) and m is the total number of deaths.421 Since the randomisation ratio is 60:40 in favour of CHART, this variance must be changed to 3.84/m. The log(hazard ratio) θ can be transformed, under the proportional hazards assumption, to an improvement in 2 year survival rate d through the relationship exp(θ) = log(p0)/log(d + p0), where p0 is the 2 year survival rate under conventional treatment, assumed to be 15%. This relationship allows us to obtain estimates and intervals for d from normal posterior distributions calculated on the θ scale.
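Under these normal approximations the Bayesian update is available in closed form. The sketch below combines the sceptical prior (mean 0, precision equivalent to 112 events) with the 1992 interim likelihood (observed hazard ratio 1.82 on 78 deaths, from Table 7), then inverts the survival transformation; it reproduces, to rounding, the 1992 row of Table 8 (function names are ours):

```python
import math

P0 = 0.15  # assumed 2 year survival rate under conventional treatment

def likelihood_params(hazard_ratio, deaths):
    """Normal likelihood for the log(hazard ratio): mean log(HR), variance 3.84/m."""
    return math.log(hazard_ratio), 3.84 / deaths

def posterior(prior_mean, prior_var, lik_mean, lik_var):
    """Precision-weighted normal-normal update; returns (mean, variance)."""
    prec = 1.0 / prior_var + 1.0 / lik_var
    mean = (prior_mean / prior_var + lik_mean / lik_var) / prec
    return mean, 1.0 / prec

def survival_improvement(theta):
    """Invert exp(theta) = log(P0)/log(d + P0) to get the improvement d."""
    return P0 ** math.exp(-theta) - P0

# Sceptical prior: mean 0, precision equivalent to 112 events (variance 3.84/112)
prior_mean, prior_var = 0.0, 3.84 / 112
lik_mean, lik_var = likelihood_params(1.82, 78)
post_mean, post_var = posterior(prior_mean, prior_var, lik_mean, lik_var)

hr = math.exp(post_mean)                       # ~1.27, the 1992 entry of Table 8
improvement = survival_improvement(post_mean)  # ~0.07, i.e. a 7% improvement
```

Note that survival_improvement(0) is exactly 0: a null log(hazard ratio) maps to no survival benefit, as the transformation requires.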


TABLE 7 Summary data reported at each meeting of the CHART lung cancer trial DMC

Date   No. of patients   No. of deaths   Observed hazard ratio   2 year % survival improvement (95% CI)   Two-sided P value
1992   256               78              1.82                    20 (5, 36)                               0.007
1993   346               175             1.69                    18 (7, 28)                               0.0004
1994   460               275             1.43                    12 (4, 20)                               0.003
1995   563               379             1.33                    9 (3, 16)                                0.004
1996   563               444             1.32                    9 (3, 15)                                0.003


Prospective analysis? The priors were elicited before the start of the trial, and the Bayesian results presented to the DMC at each of their meetings.

Loss function. No formal loss function was elicited, but a pretrial survey of 11 clinicians participating in the trial revealed that, on average, they would be willing to use CHART routinely if it conferred a 13.5% improvement in 2 year survival (from a baseline of 15%).348

Prior distribution. Although the participating clinicians were enthusiastic about CHART, there was considerable scepticism expressed by oncologists who declined to participate in the trial. Three forms of prior distribution are considered:

1. A reference prior comprising a locally uniform distribution on the log(hazard ratio) scale.

2. A clinical prior distribution was elicited from 11 clinicians before the trial started, using the methods of Spiegelhalter et al.421 (see page 17), who report the parallel elicitation exercise for the head-and-neck CHART trial. The prior distribution, when averaged over the clinicians, expressed a median anticipated 2 year survival benefit of 10%, and a 10% chance that CHART would offer no survival benefit at all. When transformed to a log(hazard ratio) scale, this subjective prior distribution had a mean of 0.314 (hazard ratio of 1.37) and a precision equivalent to a trial with 60:40 allocation in which 73 deaths had occurred (50 under CHART, 23 under standard treatment).

3. A sceptical prior was derived using the ideas in chapter 3: the prior mean is 0 and the precision is such that the prior probability that the true benefit exceeds the alternative hypothesis is low (5% in this case). This gives a prior equivalent to a trial with 60:40 allocation, 112 events and an observed log(hazard ratio) of 0.
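The construction of the sceptical prior can be sketched numerically: map the alternative hypothesis (a 10% improvement on a 15% baseline) to the log(hazard ratio) scale, choose the prior standard deviation so that only 5% of prior probability lies beyond it, and convert the variance to an equivalent number of events via the 3.84/m convention. This is a back-of-envelope reconstruction: the exact event count depends on rounding and variance conventions, so the 112 quoted in the text is only approximately reproduced.

```python
import math

P0 = 0.15       # 2 year survival under conventional treatment
BENEFIT = 0.10  # alternative hypothesis: absolute improvement in 2 year survival
Z_95 = 1.6449   # upper 5% point of the standard normal distribution

# Map the survival benefit to the log(hazard ratio) scale:
# exp(theta) = log(P0) / log(P0 + BENEFIT)
theta_alt = math.log(math.log(P0) / math.log(P0 + BENEFIT))

# Sceptical prior: mean 0, standard deviation chosen so P(theta > theta_alt) = 0.05
sd = theta_alt / Z_95

# Number of events m such that the prior variance equals 3.84/m
equivalent_events = 3.84 / sd ** 2
```

The alternative works out at about 0.314 on the log(hazard ratio) scale, matching the mean quoted for the clinical prior, and the implied prior is equivalent to roughly 106 events, close to (though not exactly) the 112 reported.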

Both the sceptical and clinical prior distributions are displayed in Figure 7. The probabilities of no improvement in 2 year survival, and of an improvement greater than 7% (corresponding to a log(hazard ratio) of 0.225), are shown. The value of 7% as a clinically important difference has been subjectively chosen to be half the 13.5% originally


FIGURE 7 The situation at the first interim analysis of the CHART lung cancer trial in 1992 (hazard ratio = 1.82 based on 78 events). The sceptical (solid) and clinical (dashed) (a) prior and (b) posterior distributions are shown, with probabilities of lying below, within and above the range of equivalence (corresponding to between a 0 and 7% improvement in the 2 year survival rate). (c) Likelihood (m = 78). (d) Predictive distribution


demanded by the clinicians, in view of the unexpected lack of toxicity of the new treatment.
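The tail-area summaries annotated on Figure 7 follow from the same normal machinery: the posterior probability of the log(hazard ratio) lying below, within and above the range of equivalence (0 to 0.225). The sketch below does this for the sceptical analysis at the 1992 interim look, assuming the sceptical prior (mean 0, 112 events) and the 1992 likelihood (hazard ratio 1.82, 78 deaths), with variances of the form 3.84/m throughout:

```python
import math

def phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def equivalence_probs(mean, var, low=0.0, high=0.225):
    """P(theta < low), P(low <= theta <= high), P(theta > high) for a normal posterior."""
    sd = math.sqrt(var)
    p_below = phi((low - mean) / sd)
    p_above = 1.0 - phi((high - mean) / sd)
    return p_below, 1.0 - p_below - p_above, p_above

# Sceptical prior and 1992 likelihood on the log(hazard ratio) scale
prior_prec = 112 / 3.84
lik_mean, lik_prec = math.log(1.82), 78 / 3.84
post_prec = prior_prec + lik_prec
post_mean = lik_mean * lik_prec / post_prec

below, within, above = equivalence_probs(post_mean, 1.0 / post_prec)
```

This gives roughly 0.04, 0.40 and 0.56 for the three regions, matching the sceptical posterior panel of Figure 7: even a sceptic, by 1992, put little probability on standard treatment being superior.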

Computations. All analysis has been carried out using S-plus BART functions previously used in Spiegelhalter et al.421

Reporting. The DMC were presented with survival curves, reference and sceptical posterior distributions, and tail areas.

Sensitivity analysis. The three priors provided the sensitivity analysis.

Figure 7 shows the sceptical and clinical prior distributions, the likelihood for the results available in 1992 (equivalent to the reference posterior), the corresponding sceptical and clinical posterior distributions, and the predictive distributions for the observed log(hazard ratios) under the two priors.

Figure 8 shows the results progressing over the 5 years of the study. Under the reference prior there is a substantial reduction in the estimated effect as the extreme early results are attenuated. The sceptical prior is remarkably stable, and its initial estimate in 1992 is essentially unchanged as the trial progresses.

The detailed results under the sceptical prior are shown in Table 8, showing the stable results over time.


FIGURE 8 Estimates and 95% intervals for the improvement in 2 year survival rate attributable to CHART treatment, under the reference, sceptical and clinical priors, at each analysis from 1992 to 1996


TABLE 8 Estimates presented to the CHART DMC in successive years, obtained under a sceptical prior distribution

Year   No. of deaths   Hazard ratio   2 year % survival improvement (95% CI)   P(improvement < 0%)   P(improvement < 7%)
1992   78              1.27           7 (–1, 17)                               0.044                 0.46
1993   175             1.37           10 (3, 18)                               0.003                 0.21
1994   275             1.28           8 (2, 15)                                0.006                 0.40
1995   379             1.25           7 (1, 13)                                0.006                 0.52
1996   444             1.24           7 (2, 12)                                0.004                 0.54


Chapter 10

Case study 2: meta-analysis of magnesium sulphate following acute myocardial infarction

The technology. This example has been considered at length in the medical and statistical literature, as it features an apparent contradiction between a meta-analysis and a 'mega-trial'. An abbreviated history follows. Epidemiology, animal models and biochemical studies have suggested that intravenous magnesium sulphate may have a protective effect in patients with acute myocardial infarction (AMI), particularly through preventing serious arrhythmias.443 A series of small randomised trials culminated in a meta-analysis in 1991443 which showed a highly significant (P < 0.001) 55% reduction in odds of death. The authors concluded that "further large scale trials to confirm (or refute) these findings are desirable", and in 1992 the Second Leicester Intravenous Magnesium Intervention Trial (LIMIT-2)481 published results showing a 24% reduction in mortality in over 2000 patients. An editorial in Circulation was entitled 'An effective, safe, simple and inexpensive treatment',484

but recommended further trials to obtain "a more precise estimate of the mortality benefit". Early results of the massive Fourth International Study of Infarct Survival (ISIS-4) trial pointed, however, to a lack of any benefit, and the final publication of this trial on 58,000 patients showed a non-significant adverse mortality effect of magnesium. ISIS-4 found no effect in any subgroups, and concluded that "Overall, there does not now seem to be any good clinical trial evidence for the routine use of magnesium in suspected acute MI".109

There have been many responses to this apparent contradiction between meta-analysis and mega-trial, which can be summarised under four broad headings:

• Essential scepticism about large effects. In response to the ISIS-4 results, Yusuf, the main author of the optimistic Circulation editorial, claimed "since most treatments produce either no effect or at least moderate effects on major outcomes such as mortality, investigators should be sceptical if the results obtained deviate substantially from this expectation ('too good to be true')".483 This expression of prior scepticism was echoed by Peto and colleagues,359 who argued that the risk reduction of the initial overview was "implausibly large", and that even when combined with the LIMIT-2 data it "still indicated an implausibly large reduction of one-third in mortality". However, Peto reports that the ISIS-4 steering committee was convinced there would be at least some benefit, right up until they were shown the results.

• Criticism of the meta-analysis. Egger and Davey-Smith157,158 claim that the meta-analysis was flawed, as a funnel plot (of numbers of participants against observed treatment effect) suggested smaller negative studies might not have been published, and sensitivity analysis could have prevented the misleading conclusions. Using a different argument, Pogue and Yusuf367 claim that a frequentist stopping rule applied to the meta-analysis, designed to have high power to detect a moderate effect (15% reduction in mortality), would also have led to the meta-analysis not being significant at the 1% level, even taking into account the LIMIT-2 data.

• Criticism of the mega-trial. Woods480 has argued that a mega-trial such as ISIS-4 will tend to bias results towards the null due to protocol violations and inaccurate data. In addition, he claims that the benefit of magnesium is to prevent reperfusion injury, and yet the ISIS-4 protocol expected all patients to be given thrombolytic therapy (which tends to induce reperfusion) before randomisation, and hence magnesium would generally be given too late to provide benefit. He claims the subgroup who did not receive thrombolytic therapy did not provide sufficient power to detect an important difference.

• Treatment effect depending on baseline risk. Antman14 and others have pointed out that in ISIS-4:
– the control group mortality was 7.2%, in contrast to the 11.0% observed in the data available at the time of the 1993 Circulation editorial
– patients were randomised at a median of 8 hours after onset
– 70% received thrombolytics and 94% received antiplatelets.



Antman concluded that "patients who are at low risk of mortality, at least in part due to other potent mortality-reducing therapies such as thrombolytics and aspirin, show little benefit from magnesium". The methodology for examining whether the benefit of treatment may depend on underlying risk has recently been explored in a series of papers,320,453 emphasising that simple techniques can detect a spurious relationship due to the natural correlation between baseline and change.

Objectives of analysis. To investigate how a Bayesian perspective might have influenced the interpretation of the published evidence on magnesium sulphate in AMI available in 1993. In particular, what degree of 'scepticism' would have been necessary in 1993 not to be convinced by the meta-analysis reported by Yusuf and colleagues,484 and is there evidence that the treatment effect depends on sample size or baseline risk?

Design of study. Meta-analysis of randomised trials, allowing for sceptical prior distributions and dependence of treatment effect on baseline risk.

Available evidence in study. We use the data shown in Table 9 as quoted by Yusuf and colleagues,484 which comprise the data in the 1991 meta-analysis443 and LIMIT-2.481

The log(odds ratio) for the first eight trials, including LIMIT-2, using the standard Peto fixed effect analysis, would be estimated as –0.43 with standard error 0.12. The corresponding likelihood is as if a single trial had been carried out with 261 deaths in total (compared with the actual 286 deaths) and an observed log(odds ratio) of –0.43.
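The 'equivalent single trial' statement can be checked against the rough identity that a log(odds ratio) from a balanced trial with m deaths has variance about 4/m. This is only a back-of-envelope approximation: the 261 quoted in the text comes from the exact likelihood, and the crude 4/m figure differs from it slightly.

```python
import math

def se_from_deaths(m):
    """Approximate standard error of a log(odds ratio) from a trial with m deaths."""
    return math.sqrt(4.0 / m)

def deaths_from_se(se):
    """Effective number of deaths implied by a given standard error."""
    return 4.0 / se ** 2

# The quoted summary: log(odds ratio) -0.43 with standard error 0.12
effective_deaths = deaths_from_se(0.12)  # roughly 280, the same order as the 261 quoted
```

Running the identity the other way, 261 deaths would imply a standard error of about 0.124, consistent with the 0.12 reported.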

Statistical models. The basic sampling model is assumed to have the following form:

r_i^C ~ Binomial(p_i^C, n_i^C),  r_i^T ~ Binomial(p_i^T, n_i^T)
logit(p_i^C) = μ_i
logit(p_i^T) = μ_i + δ_i

Different models (numbered 1 to 4 below) correspond to different assumptions about the form of the baseline risks μ_i and the treatment effects δ_i:

1. Fixed effect (pooled estimate) model: δ_i = δ.

2. Random effects model: δ_i ~ Normal(δ, σ_δ²).

3. Random effects model allowing the effect to depend on the (logarithm of) sample size: δ_i ~ Normal(d_i, σ_δ²), with d_i = δ + β(log n_i – mean(log n)).

4. Random effects model allowing the effect to depend on baseline risk (this is a Bayesian analogue of the bivariate meta-analysis model320,465): δ_i ~ Normal(d_i, σ_δ²), with d_i = δ + β(μ_i – μ_μ) and μ_i ~ Normal(μ_μ, σ_μ²).

Note that by assuming the μ_i come from a normal distribution, we are in fact assuming a bivariate normal distribution for the baseline risk and treatment effect.

The value μ_0 = μ_μ – δ/β is the baseline risk (on a logit scale) at which the treatment has no effect.

Prior distribution. We consider that a reasonable degree of scepticism is to think it unlikely (only a 10% chance) that magnesium would change the odds on mortality by more than 25%. This can be translated into a normal prior distribution, centred on 0 and with precision equivalent to a 'trial' with 65 deaths in each group: δ ~ Normal(0, 0.175²) (see chapter 3 for further discussion of such sceptical priors). An alternative, very sceptical prior was also examined, equivalent to having already observed 450 deaths in each group.

All other parameters are given proper minimally informative priors: Normal(0, 10^6) for location parameters and Gamma(0.001, 0.001) for precisions.

Loss function. No explicit loss function, but a 10% reduction in odds of death has been selected as a 'clinically important difference'.

Computations. Fixed effect analysis using the BART S-plus functions (see appendix 2); random effects analyses in BUGS.




Results and sensitivity analysis.

Fixed effect analysis. Figure 9 shows the prior, likelihood and posterior for two sceptical priors: a 'reasonable' one, and one designed to produce a posterior distribution with a 5% chance that there is no benefit from magnesium.

The prior necessary not to have found the meta-analysis 'significant', even at a one-sided 5% probability, is clearly a very extreme form of scepticism. Table 10 shows that a reasonably sceptical prior even finds the meta-analysis quite convincing concerning a clinically worthwhile improvement, in that there is a 97% chance that the treatment benefit is at least 10%.
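The fixed effect calculation behind Table 10 is a one-line normal-normal update. The sketch below reproduces the 'reasonably sceptical' row, taking the prior standard deviation as sqrt(4/130) ≈ 0.175 (65 deaths in each group, per the prior section) and the likelihood as a log(odds ratio) of –0.43 with standard error 0.12 (function names are ours):

```python
import math

def phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def posterior(prior_mean, prior_var, lik_mean, lik_var):
    """Precision-weighted normal-normal update; returns (mean, variance)."""
    prec = 1.0 / prior_var + 1.0 / lik_var
    mean = (prior_mean / prior_var + lik_mean / lik_var) / prec
    return mean, 1.0 / prec

# Reasonably sceptical prior: 65 deaths in each group -> sd = sqrt(4/130) ~ 0.175
prior_var = 4.0 / 130
lik_mean, lik_var = -0.43, 0.12 ** 2

mean, var = posterior(0.0, prior_var, lik_mean, lik_var)
sd = math.sqrt(var)

p_superior = phi((0.0 - mean) / sd)    # P(delta < 0): magnesium superior
p_clinical = phi((-0.1 - mean) / sd)   # P(delta < -0.1): clinically superior
```

This recovers the 0.998 and 0.97 of Table 10; repeating it with prior_var = 4.0/900 (450 deaths in each group) gives the 'very sceptical' row.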

We can therefore reject Yusuf and Flather's claim that a sceptical approach applied to their analysis would have led to caution.

Random effects analysis. The random effects analysis leads to a different conclusion. Figure 10 shows the 95% posterior credible intervals for the mortality odds ratio associated with magnesium, for both fixed and random effects analyses, under a 'flat' reference prior and the reasonably sceptical prior. The random effects analysis combined with a 'flat' reference prior, and the fixed effect analysis with a flat or sceptical prior (as also shown in Figure 9), all lead to highly 'significant' results. However, the random effects analysis with a sceptical prior leads to a 95% interval that includes 1, and hence the cautious result sought by Yusuf. Inclusion of the ISIS-4 results has a strong impact on the fixed effect analysis, but little influence on the random effects analysis (see discussion below).
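The qualitative behaviour described here – random effects widening the interval relative to a fixed effect analysis when the trials are heterogeneous – can be illustrated with a simple moment-based (DerSimonian–Laird) sketch. This is a frequentist analogue of the Bayesian random effects model, shown only for intuition, and the data below are entirely synthetic (they are not the magnesium trial results):

```python
def fixed_effect(effects, variances):
    """Inverse-variance weighted pooled estimate and its variance."""
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    est = sum(w * y for w, y in zip(weights, effects)) / total
    return est, 1.0 / total

def dersimonian_laird_tau2(effects, variances):
    """Moment estimate of the between-trial variance tau^2 (truncated at 0)."""
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    est = sum(w * y for w, y in zip(weights, effects)) / total
    q = sum(w * (y - est) ** 2 for w, y in zip(weights, effects))
    c = total - sum(w * w for w in weights) / total
    return max(0.0, (q - (len(effects) - 1)) / c)

def random_effects(effects, variances):
    """Random effects pooled estimate: inverse-variance weights with tau^2 added."""
    tau2 = dersimonian_laird_tau2(effects, variances)
    return fixed_effect(effects, [v + tau2 for v in variances])

# Synthetic log(odds ratios) and variances for four heterogeneous 'trials';
# the precise (low-variance) fourth trial disagrees with the other three
effects = [-0.8, -0.5, -1.2, 0.1]
variances = [0.10, 0.05, 0.20, 0.02]

fe_est, fe_var = fixed_effect(effects, variances)
re_est, re_var = random_effects(effects, variances)
```

Because tau^2 is added to every trial's variance, the large precise trial is downweighted and the pooled interval widens – the same mechanism that makes the sceptical random effects analysis of the magnesium data so much more cautious than the fixed effect one.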

Figure 11 shows the consequences of a fully Bayesian sceptical random effects analysis on the estimates given to the individual trials. The LIMIT-2 results


TABLE 10 Posterior probabilities of absolute and clinical superiority of magnesium, given two levels of sceptical prior

Prior                                     Magnesium superior, P(δ < 0)   Magnesium clinically superior, P(δ < –0.1)
Very sceptical (450 in each group)        0.95                           0.48
Reasonably sceptical (65 in each group)   0.998                          0.97

FIGURE 9 (a) Prior, (b) likelihood (m = 261, x = –0.43) and (c) posterior distributions for two sceptical prior distributions: the wider is a 'reasonable' expression of scepticism (equivalent to a 'trial' with 65 deaths in each group), while the narrower prior is the scepticism necessary not to have found the meta-analysis 'significant' (equivalent to a 'trial' with 450 deaths in each group)


are hardly changed, whereas the smaller studies are pulled towards a cautious conclusion.

Does the effect depend on sample size? Egger and Davey-Smith157,158 have claimed that one problem with the initial meta-analysis443 is publication bias against smaller negative trials. Using the data available to Yusuf et al. in 1993, and a minimally informative reference prior, we fit model 3 – δ_i ~ Normal(d_i, σ_δ²), with d_i = δ + β(log n_i – mean(log n)) – to see whether there is evidence that the treatment effect depends on sample size, as suggested by their funnel plot.

A positive β corresponds to smaller trials having smaller expected odds ratios, and hence a larger treatment effect of magnesium. Table 11 shows that the estimated βs are positive but with very wide intervals; there is therefore only weak evidence that smaller studies had more extreme results.

Relationship to underlying risk. Figure 12 shows the apparent relationship between the observed treatment effect and underlying risk. Fitting a regression line through these points, using the appropriate bivariate model described earlier, provides the results shown in Table 12, with and without the ISIS-4 data, where p0 is the risk in the control group at which the treatment has no effect.

The relationship to underlying risk is suggested before inclusion of ISIS-4, but with a very wide interval. Nevertheless, the model would have predicted that the treatment would not be effective with an underlying risk below 9.1%. Inclusion of the ISIS-4 results, whose underlying risk was 7.2%, strongly confirmed this relationship.

Discussion. We conclude that one would need to have been unreasonably sceptical not to have found the 1993 meta-analysis convincing, had one carried out a standard Peto fixed effect analysis. Reasonable scepticism combined with a random effects meta-analysis would have led to appropriate caution. There was limited evidence available in 1993 that the treatment effect was related to sample size or underlying risk, but both relationships have been confirmed by ISIS-4.

It is important to recognise the limitations of such a statistical analysis. There is obvious heterogeneity between the studies, and it is vital to investigate the possible reasons for this using substantive knowledge, through inclusion of covariates and so on. Many sorts of sensitivity analysis are necessary (and

Case study 2


FIGURE 10 Ninety-five per cent posterior credible intervals for the mortality odds ratio associated with magnesium, for both fixed and random effect analyses, and a 'flat' reference prior and the reasonably sceptical prior (10% chance of at least a 25% change in mortality odds) (solid line, Yusuf et al. (1993); broken line, Yusuf et al. plus ISIS-4)

The model referred to above is:

d̂i ~ Normal(di, si²)
di = d + b(log ni – log n̄)

where d̂i is the estimated log-odds ratio in trial i with standard error si, ni is the trial's sample size, and log n̄ is the mean log sample size.


feasible). The random effects methodology can be questioned when used on studies of very different sizes, in that the very large studies may be unreasonably downweighted. It suggests the question 'What is a "study"?'. We could always break down a large study into smaller ones to add weight: for example, ISIS-4 included 31 countries and 1086 hospitals, and it would be very interesting to investigate the heterogeneity between these centres in a structured way.

The Bayesian analysis, while not necessarily providing qualitatively different conclusions to a traditional analysis, does allow subjective judgements to be formally incorporated and the sensitivity of the conclusions to those beliefs explored within a coherent framework.

Health Technology Assessment 2000; Vol. 4: No. 38


FIGURE 11 Ninety-five per cent intervals for individual studies (Morton, Rasmussen, Smith, Abraham, Feldstedt, Shechter, Ceremuzynski, LIMIT-2 and a 'typical'/population value), from a random effects analysis with a reasonably sceptical prior (solid line, fixed effects; broken line, random effects with sceptical prior)

TABLE 11 Estimate and 95% interval for the influence of increased sample size on the effect of magnesium treatment

Model           b      95% interval
Fixed effect    0.50   (–1.45, 2.42)
Random effect   0.29   (–0.20, 0.68)

TABLE 12 The estimated influence of underlying risk on magnesium treatment effect: a negative b corresponds to the treatment effect becoming smaller as the underlying risk declines, with the treatment effect becoming zero when the underlying risk is p0

Model for baseline risks   b      95% interval    p0 (%)
Without ISIS-4             –1.4   (–24, 14)       9.1 (0.5–54)
With ISIS-4                –1.2   (–2.2, –0.2)    7.2 (1.9–8.4)


FIGURE 12 Plot of observed mortality odds ratio associated with magnesium, for seven trials contributing to the Teo et al. meta-analysis, LIMIT-2 and ISIS-4. The size of each point is proportional to the precision of estimation, so that the point corresponding to ISIS-4 is very large. A bivariate distribution of treatment effect and baseline risk has been fitted, and a predictive distribution around the regression line plotted


Here we revisit four short examples provided by Eddy et al. in their text on the confidence profile method150 (see chapter 6 for a discussion of their work and possible reasons for its lack of impact).

The confidence profile method made extensive use of graphical models for communication of the essential features of a model, although their software, FAST*PRO, was menu-driven. In contrast, more recent developments in Bayesian graphical modelling have led to software, for example WinBUGS (see appendix 2), in which the graph explicitly drives the analysis through generating the code for performing the required simulations (see chapter 2 for background on simulation-based Bayesian analysis). The re-analysis of some of their examples thus serves two purposes: to emphasise the versatility and power of their approach, and to show how current (freely available) software can make its implementation reasonably straightforward.

The four specific applications have been selected to illustrate various facets of Bayesian graphical modelling applied to evidence synthesis in health technology assessment, and each example is structured according to the BayesWatch criteria. However, they are clearly rather dated and hence should not be taken as having any substantive value, and we have simply reproduced the original description and have not attempted to carry out a full analysis to appropriate quality standards.

Analysis of surveillance of colorectal cancer patients: a modelling exercise based entirely on judgements

Reference. This example forms chapter 29 of Eddy et al.150

The technology. Surveillance of colorectal cancer patients in order to reduce the risk of liver metastases. No direct evidence for the effectiveness of this intervention is available.

Objectives of analysis. To estimate the reduction in mortality rate from liver metastasis due to introduction of surveillance, denoted dsurv.

Statistical model. We assume that a death is prevented if:

1. a case has a solitary liver metastasis at the time of examination (with probability pexist)

2. the metastasis is detected at examination (with probability pdetect.if.exist)

3. the detected metastasis is treatable (with probability ptrt.if.detect)

4. the treatment leads to the patient surviving 5 years when they would not otherwise have done so (the increase in survival probability is dsurv.if.trt).

The overall improvement in survival rate is given by the logical product:

dsurv = dsurv.if.trt × ptrt.if.detect × pdetect.if.exist × pexist

The graph in Figure 13 shows this logical dependence, using the DoodleBUGS graph-drawing facility in WinBUGS.

Prior distributions. The subjective prior distributions shown in Table 13 are those provided by Eddy and colleagues.

Computations. Since no data are involved, a forwards Monte Carlo simulation can be carried out without a burn-in stage and without concerns about convergence. A total of 100,000 iterations were carried out, taking 2 seconds on a 400 MHz personal computer.
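Because the model is purely a chain of logical relationships, the forwards simulation is easy to reproduce in a few lines; a sketch using the prior distributions of Table 13 (numpy is our choice here, not part of the original analysis):

```python
import numpy as np

rng = np.random.default_rng(1)
N = 100_000

# Subjective priors from Table 13 (Eddy and colleagues)
p_exist = rng.beta(4.83, 488, N)                   # solitary metastasis present
p_detect = rng.beta(5.5, 1.83, N)                  # detected, given present
p_treat = rng.beta(2.5, 5.83, N)                   # treatable, given detected
d_surv_if_trt = 2 * rng.beta(96.25, 78.75, N) - 1  # survival gain, given treated

# The logical product propagates the uncertainty forwards
d_surv = d_surv_if_trt * p_treat * p_detect * p_exist

mean = d_surv.mean()
lo, hi = np.percentile(d_surv, [2.5, 97.5])
```

The mean comes out at roughly 0.0002 (a 0.02% improvement in survival), with a markedly right-skewed interval straddling zero, in line with Table 14.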

Results. Summary statistics for the simulated posterior distributions are shown in Table 14.

The posterior distributions are the same as the prior distributions (up to simulation error) since no data have been observed. We estimate a 0.022% increase in survival (95% interval –0.010 to 0.092%), compared with Eddy and colleagues' estimate of 0.025% (95% interval –0.031 to 0.075%). Their estimate is based on assuming a normal posterior distribution for dsurv, and the


Chapter 11

Case study 3: confidence profiling revisited


skewness of the posterior distribution displayed in Figure 14 clearly shows this is inappropriate.

Sensitivity analysis. The whole case study can be thought of as a sensitivity analysis to the assumption of known parameter values – if the best guesses had been used the conclusion would have been an increase in survival, with no allowance for uncertainty.

Comments. This analysis illustrates the propagation of uncertainty through a logical model, akin to placing probability distributions on the inputs to a spreadsheet. The resulting skewness shows the dangers of making normal approximations to posterior distributions of logically transformed quantities.

Analysis of HIP trial of breast cancer screening: adjusting a trial's result for uncertain internal biases

Reference. This example forms chapter 19 of Eddy et al.150

The technology. Breast cancer screening offered to women aged under 50 years.

Objectives of analysis. To estimate the reduction in breast cancer mortality associated with accepting screening, denoted ed = qt – qc, where qc is the true mortality rate in those not screened, and qt is the rate in those actually screened, so that a negative ed represents a benefit of screening. We also require a 95% interval and the chance that the reduction is greater than 2/1000.

Design of study. Randomised controlled trial.

Available evidence in study. The HIP published in 1988 the results shown in Table 15. Note that the data reflect the mortality rate of those offered screening, whereas we wish to make statements about those actually taking up screening.

Statistical model. Let qoff and qc be the underlying mortality rates of those offered and not offered screening respectively, so that rt ~ Binomial(nt, qoff), and rc ~ Binomial(nc, qc). Eddy and colleagues consider four increasingly complex models:

1. ‘Intention to treat’: act as if those screened have the same mortality rate as those offered, that is qoff = qt.

2. Adjustment for dilution: assume the proportion d who do not accept screening has the same mortality rate as those not offered screening, that is qoff = (1 – d)qt + dqc. In this example d is initially assumed to be 45%.

3. Adjustment for selection bias: suppose it were hypothesised that those women who would reject screening were at lower average risk than


FIGURE 13 A 'doodle' from WinBUGS illustrating the structural dependencies of the model (nodes p.exist, p.detect.if.exist, p.trt.if.detect, delta.surv.if.trt and delta.survive): this graphical representation directly generates the code which carries out the simulations


those who would accept. Specifically, let pn and pt be the relative risks associated with the characteristics that would lead to screening being rejected, in the groups not offered and offered screening respectively. Then it can be shown (Eddy and colleagues,150 chapter 15) that

pn and pt are initially assumed to be 0.9.

4. Uncertainty on the biases: informative prior distributions are now placed on the bias parameters (see below) to represent the more realistic assumption that the biases are only imprecisely known.

Figure 15 shows the model for the first analysis.

Prior distribution. The prior distributions shown in Table 16 and provided by Eddy and colleagues are 'non-informative' (Jeffreys priors) for the primary parameters, and specify a standard deviation of 0.1 for the distributions of d, pn and pt. The distributions for pn and pt are log-normal, that is their logarithms are assumed to have a normal distribution with the appropriate mean and standard deviation.


TABLE 14 Results for cancer surveillance example

Parameter          Posterior mean (%)   95% credible interval
pexist             1.0                  (0.3, 2.0)
pdetect.if.exist   75.0                 (41.0, 96.7)
ptrt.if.detect     30.0                 (6.1, 62.6)
dsurv.if.trt       10.0                 (–4.9, 24.7)
dsurv              0.022                (–0.010, 0.092)

FIGURE 14 Posterior distribution of the change in survival attributable to screening, dsurv (delta.survive; 50,000 samples), showing a strongly skewed distribution

TABLE 13 Prior distributions provided by Eddy and colleagues for cancer surveillance example

Parameter          Best guess (%)   95% confidence range (%)   Distribution
pexist             1                0.3 to 2                   Beta(4.83, 488)
pdetect.if.exist   75               40 to 95                   Beta(5.5, 1.83)
ptrt.if.detect     30               5 to 65                    Beta(2.5, 5.83)
dsurv.if.trt       10               –5 to 25                   2 × Beta(96.25, 78.75) – 1

FIGURE 15 A 'doodle' illustrating the structural dependencies of the model (nodes r.t, n.t, theta.t, r.c, n.c, theta.c, e.d and prob.imp); thick lines represent logical dependencies, and thin lines represent stochastic dependencies

TABLE 15 Results of HIP screening trial published in 1988

                       Not offered screening   Offered screening
Breast cancer deaths   rc = 65                 rt = 49
Total                  nc = 12,000             nt = 12,000

The selection bias adjustment (model 3) referred to above is:

qoff = (1 – d)qt + dptqc / (1 – d + dpn)

which reduces to the dilution adjustment of model 2 when pn = pt = 1.


Computations. A total of 100,000 iterations were carried out, taking 2 seconds on a 400 MHz personal computer.

Results. The posterior distribution of ed had the properties shown in Table 17.

These agree closely with the results of Eddy et al. The crucial finding is that the results are very sensitive to the introduction of an allowance for bias (moving from model 1 to 2), but robust to specification of its precise nature.
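For model 1 ('intention to treat') the posteriors are conjugate, so the first row of Table 17 can be checked by direct sampling from the Beta posteriors implied by the Table 15 data; a sketch, not the WinBUGS program used in the text:

```python
import numpy as np

rng = np.random.default_rng(2)
N = 200_000

# Table 15 counts; a Jeffreys Beta(0.5, 0.5) prior gives a Beta(r + 0.5, n - r + 0.5) posterior
theta_t = rng.beta(49 + 0.5, 12_000 - 49 + 0.5, N)  # offered screening (model 1: q_off = q_t)
theta_c = rng.beta(65 + 0.5, 12_000 - 65 + 0.5, N)  # not offered screening

e_d = theta_t - theta_c              # negative values favour screening
mean = e_d.mean()
p_reduction = (e_d < -0.002).mean()  # chance the reduction exceeds 2 per 1000
```

This reproduces the first row of Table 17 (posterior mean about –0.0013, P(ed < –0.002) about 0.23) without any MCMC.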

Comments. As Eddy et al. point out, many additional issues might be addressed in this framework, including: varying degrees of dilution in which different proportions of women receive different

numbers of examinations, the possibility of contamination in the control group, loss to follow-up, errors in measurement of outcome, and the possibility that the technology might have improved over time. It would be interesting to contrast this analysis with recent investigations of this still-controversial topic.

Analysis of screening for maple syrup urine disease (MSUD): modelling using evidence from multiple studies

Reference. This example forms chapter 27 of Eddy et al.150


TABLE 16 Prior distributions provided by Eddy and colleagues for the cancer-screening example

Parameter   Mean     Standard deviation   Distribution
qt                                        Beta(0.5, 0.5)
qc                                        Beta(0.5, 0.5)
d           0.45     0.1                  Beta(10.69, 13.06)
log pn      –0.105   0.0953               N(–0.105, 1/111.1)
log pt      –0.105   0.0953               N(–0.105, 1/111.1)

TABLE 17 Results for the cancer-screening example

Model   Posterior mean   95% credible interval   P(ed < –0.002)
1       –0.0013          (–0.0031, 0.0004)       0.23
2       –0.0027          (–0.0061, 0.0005)       0.66
3       –0.0025          (–0.0058, 0.0005)       0.63
4       –0.0028          (–0.0061, 0.0006)       0.66

TABLE 18 Data used in the MSUD example

Factor                                               Notation   Outcomes   Observations
Probability of MSUD                                  r          7          724,262
Probability of early detection with screening        fs         253        276
Probability of early detection without screening     fn         8          18
Probability of retardation with early detection      qem        2          10
Probability of retardation without early detection   qlm        10         10

TABLE 19 Model and notation for the MSUD example

Factor                                                              Notation    Derivation
Probability of retardation for a case of MSUD who is screened       qsm         fsqem + (1 – fs)qlm
Probability of retardation for a case of MSUD who is not screened   qnm         fnqem + (1 – fn)qlm
Expected retardations per 100,000 newborns who are screened         100,000qs   100,000qsmr
Expected retardations per 100,000 newborns who are not screened     100,000qn   100,000qnmr
Change in retardations due to screening 100,000 newborns            ed          qs – qn


The technology. Neonatal screening for MSUD, an inborn error in amino acid metabolism, for which early detection should lead to reduced rates of retardation.

Objectives of analysis. To estimate the probability of retardation without screening, and the change in retardation rate associated with screening. The latter is denoted ed = qs – qn, where qn is the retardation rate in those not screened, and qs is the rate in those screened.

Design of study. Modelling exercise using results from multiple epidemiological cohort studies.

Available evidence. There was no direct evidence on the change in retardation rate in screened and unscreened populations. The data shown in Table 18 were used (references provided by Eddy and colleagues).

Statistical model. The data described in Table 18 are all assumed to arise from binomial distributions with the appropriate parameters. The functional relationships shown in Table 19 then exist. The graphical model is shown in Figure 16.

Prior distribution. The prior distributions for all the binomial parameters provided by Eddy and colleagues are 'non-informative' (Jeffreys priors – Beta(0.5, 0.5)).

Computations. A total of 100,000 iterations were carried out, taking 3 seconds on a 400 MHz personal computer.
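Since all five data sources are independent binomials with Jeffreys priors, the synthesis can again be sketched by direct conjugate sampling, following the Table 19 relationships (variable names follow the doodle; illustrative only):

```python
import numpy as np

rng = np.random.default_rng(3)
N = 100_000

def post(r, n):
    """Beta posterior for a binomial proportion under a Jeffreys Beta(0.5, 0.5) prior."""
    return rng.beta(r + 0.5, n - r + 0.5, N)

rho = post(7, 724_262)    # probability of MSUD
phi_s = post(253, 276)    # early detection with screening
phi_n = post(8, 18)       # early detection without screening
q_em = post(2, 10)        # retardation with early detection
q_lm = post(10, 10)       # retardation without early detection

# Table 19 relationships
q_sm = phi_s * q_em + (1 - phi_s) * q_lm
q_nm = phi_n * q_em + (1 - phi_n) * q_lm
q_s = 100_000 * q_sm * rho    # expected retardations per 100,000 screened
q_n = 100_000 * q_nm * rho    # expected retardations per 100,000 not screened
e_d = q_s - q_n
```

The posterior mean of q_n comes out at about 0.65 and of e_d at about –0.35, matching Table 20, with the skewness that makes a normal approximation misleading.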

Results. The posterior distribution of ed had the properties shown in Table 20.

Eddy and colleagues display a normal approximation to the posterior distribution for ed, with an


FIGURE 16 Doodle for the MSUD example (nodes n.r, n.s, n.n, n.em, n.lm; r.r, r.s, r.n, r.em, r.lm; phi.s, phi.n, theta.em, theta.lm, theta.sm, theta.nm, theta.s, theta.n, r and e.d)

TABLE 20 Results for the MSUD example

Parameter                                                           Notation   Posterior mean   95% credible interval
Expected retardations per 100,000 newborns who are not screened     qn         0.65             (0.25, 1.27)
Change in expected retardations due to screening 100,000 newborns   ed         –0.35            (–0.77, –0.11)


estimate of –0.35 (95% interval –0.69 to –0.19). Our wider interval accurately reflects the skewed posterior distribution.

Comments. This example illustrates the synthesis of evidence from multiple studies, with appropriate allowance for the uncertainty of the parameter estimates. Further extensions could include allowance for various biases and uncertainty on the inputs to the model.

Analysis of colon cancer screening trial: power calculations allowing for cross-over between treatment arms

Reference. This example forms chapter 30 of Eddyet al.150

The technology. Screening for colon cancer.

Objectives of analysis. To estimate the probability of a statistically significant result in a future trial


TABLE 21 Data to be observed in the colon cancer screening example

Treatment             Offered screening   Not offered screening
Colon cancer deaths   rt                  rc
Number of cases       nt                  nc

TABLE 22 Model and notation for colon cancer screening example

Factor                                                     Notation      Derivation
Proportion of those offered screening who cross over       dt
Proportion of those not offered screening who cross over   dc
Mortality rate in group offered screening                  pt            (1 – dt)qt + dtqc
Mortality rate in group not offered screening              pc            (1 – dc)qc + dcqt
Chi-squared statistic                                      chisquare     (nt + nc)[rt(nc – rc) – rc(nt – rt)]² / [ntnc(rt + rc)(nt + nc – rt – rc)]
Significant result?                                        significant   chisquare > 3.84

FIGURE 17 Doodle of the model for predicting the power of the colon cancer screening trial (nodes d.t, d.c, theta.t, theta.c, p.t, p.c, n.t, n.c, r.t, r.c, chisquare and significant)


(the power), assuming a two-sided type I error (a) of 0.05. This will be calculated assuming 50,000 individuals per group as an example. The analysis will be by intention-to-treat, but we want to adjust for the possibility of some of those offered screening 'crossing over' to the unscreened group, and some of those not offered screening crossing over to the screened group. Eddy terms this 'dilution'.

Design of study. Proposed randomised controlled trial.

Available evidence in study. None yet, but the final evidence will have the form shown in Table 21.

Statistical model. Let pt and pc be the underlying mortality rates of those offered and not offered screening, respectively, so that rc ~ Binomial(nc, pc), rt ~ Binomial(nt, pt). Let qt and qc be the assumed mortality rates in those actually obtaining and not obtaining screening, respectively. Under the cross-over assumption, we have the relationships shown in Table 22. The graph of the model is shown in Figure 17, from which the WinBUGS analysis is driven.

Prior distribution. Eddy and colleagues assume the following point estimates for the parameters: qc = 0.005, qt = 0.004, dt = 0.3, dc = 0.2. Thus they are attempting to detect a 20% mortality reduction, assuming 30% of those offered screening refuse,

and 20% of those not offered screening are screened outside the trial.

Computations. Sampling from the binomial distribution with a large denominator is slow, and so 5000 iterations took nearly 3 minutes on a 400 MHz personal computer.
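As a cross-check, trial outcomes can be simulated directly under the assumed point estimates; a sketch of the 50,000-per-group, with-dilution case, applying the Table 22 chi-squared statistic to each simulated trial:

```python
import numpy as np

rng = np.random.default_rng(4)
reps, n = 20_000, 50_000

# Point estimates from Eddy and colleagues, diluted by cross-over (Table 22)
theta_t, theta_c, d_t, d_c = 0.004, 0.005, 0.3, 0.2
p_t = (1 - d_t) * theta_t + d_t * theta_c    # group offered screening
p_c = (1 - d_c) * theta_c + d_c * theta_t    # group not offered screening

# Simulate many hypothetical trials and apply the chi-squared test
r_t = rng.binomial(n, p_t, reps).astype(float)
r_c = rng.binomial(n, p_c, reps).astype(float)
num = (2 * n) * (r_t * (n - r_c) - r_c * (n - r_t)) ** 2
den = (n * n) * (r_t + r_c) * (2 * n - r_t - r_c)
power = (num / den > 3.84).mean()            # proportion of significant trials
```

With dilution the estimated power for 50,000 per group comes out close to the 0.21 in Table 23; setting d_t = d_c = 0 raises it to around the 0.66 reported without dilution.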

Results. These are shown in Table 23, alongside the results of Eddy et al.; 'dilution' refers to the allowance for cross-over between intervention groups.

Comments. The results of Eddy et al. appear to be incorrect for the trial with dilution. It is clear that if a moderate amount of cross-over is plausible, then a clinical trial needs to be very large in order to have a reasonable chance of correctly obtaining a significant conclusion.

Commentary

The confidence profile technique can easily be transformed into the form of Bayesian graphical modelling used by the WinBUGS software. It is a very powerful method, but has perhaps been limited by its implementation, and WinBUGS appears to allow straightforward application. However, there are inevitably dangers with such a modelling exercise, which can only be as good as its structural assumptions and the quality of the data going into it.


TABLE 23 Power of the proposed colon cancer screening trial

                        With dilution      Without dilution
Sample size per group   WinBUGS   Eddy     WinBUGS   Eddy
50,000                  0.21      0.48     0.66      0.66
150,000                 0.53      0.91     0.98      0.98


The UK Human Fertilisation and Embryology Authority (HFEA) has a responsibility to monitor clinics in the UK licensed to carry out donor insemination (DI) and in vitro fertilisation (IVF). Their annual publication – The Patients' Guide to DI and IVF Clinics – is designed to help people who are considering fertility treatment to understand the services offered by licensed clinics and to decide which clinic is best for them.245

One of the statistics provided regarding IVF treatment at each clinic is an adjusted live-birth rate per treatment cycle started. A live birth is defined as any birth event in which at least one baby is born and survives for more than 1 month, and a treatment cycle begins with the administration of drugs to induce superovulation. The adjustment, which is intended to take account of the mix of patients treated by the clinic by using factors such as age, duration of infertility, number of previous treatment cycles and so on,441 varies from year to year and is based on a pooled logistic regression of all IVF treatments carried out in the UK in the relevant year. Also provided are associated 95% confidence intervals for each adjusted live-birth rate.

Marshall and Spiegelhalter312 analyse the data published in 1996, using Figure 18 to show the substantial range of success rates displayed by the clinics.

They then use MCMC techniques to derive posterior distributions for the ranks of the institutions:

this is easily done by calculating the current rank of each institution at each iteration of the simulation, and then summarising the distribution of these calculated ranks after many thousands of iterations. Figure 19 shows that there is considerable uncertainty in the true rank of an institution, even when there are substantial differences in performance.
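The rank calculation itself is simple to implement: rank the sampled rates at each iteration, then summarise the ranks. A toy sketch with five hypothetical clinics, assuming normal posteriors on the logit scale (the numbers are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(5)
N = 50_000

# Hypothetical posterior means and sds of logit(live-birth rate) for five clinics
m = np.array([-2.0, -1.9, -1.8, -1.6, -1.5])
s = np.array([0.1, 0.1, 0.1, 0.1, 0.1])

draws = rng.normal(m, s, size=(N, 5))               # one set of rates per iteration
ranks = draws.argsort(axis=1).argsort(axis=1) + 1   # rank 1 = lowest rate that iteration

median_rank = np.median(ranks, axis=0)
lo, hi = np.percentile(ranks, [2.5, 97.5], axis=0)  # 95% interval for each clinic's rank
```

Even with clearly separated point estimates, the rank intervals overlap substantially, which is the phenomenon displayed in Figure 19.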

We now assume the clinics are fully exchangeable (see page 21), with the true rates (on a logit scale) being drawn from a common normal distribution: if, after adjusting for case mix, we can find no other contextually meaningful way to differentiate between the institutions a priori, then the assumption of their exchangeability seems justified. It is clear from Figure 20 that there is substantial 'shrinkage' towards the overall mean performance, although there are still a number of clinics that would be considered 'significantly' above or below average. It can be argued that this adjustment is an appropriate means of dealing with the problem of multiple comparisons. In addition, this shrinkage should deal with 'regression to the mean', in which extreme institutions will tend back towards the overall average when they recover from their temporary run of good or bad luck.
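The shrinkage has a simple closed form in the normal-normal case, which conveys why imprecisely estimated (small) clinics move furthest towards the mean. A sketch with hypothetical values of the population mean mu and between-clinic sd tau (in the full analysis these are estimated from the data, not fixed):

```python
import numpy as np

def shrink(m, s, mu, tau):
    """Posterior mean when theta_i ~ N(mu, tau^2) a priori and the data give
    an estimate m_i with standard error s_i: a precision-weighted average."""
    w = (1 / s**2) / (1 / s**2 + 1 / tau**2)   # weight on the clinic's own data
    return w * m + (1 - w) * mu

m = np.array([-2.4, -1.2])    # logit rates: an extreme small clinic, a middling one
s = np.array([0.5, 0.1])      # the extreme clinic is also imprecisely estimated
mu, tau = -1.8, 0.3           # hypothetical population mean and sd on the logit scale

post = shrink(m, s, mu, tau)  # the small clinic is pulled most of the way towards mu
```

The precisely estimated clinic keeps most of its own estimate, while the noisy extreme clinic is shrunk strongly towards mu, exactly the pattern in Figure 20.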

The consequence of assuming exchangeability is to reduce the differences between clinics and hence to make their ranks even more uncertain. Figure 21 shows this is the case to a limited extent, although since many of the extreme clinics are also fairly large, their rank is not unduly affected.


Chapter 12

Case study 4: comparison of in vitro fertilisation clinics


FIGURE 18 Estimates and 95% intervals for the adjusted live-birth rate in each clinic. The vertical line represents the national average of 14%. The estimated adjusted live-birth rate for each clinic is given in parentheses, together with the number of treatment cycles started


FIGURE 19 Median and 95% intervals for the rank of each clinic. The estimated adjusted live-birth rate for each clinic is given in parentheses, together with the number of treatment cycles started. The dashed vertical lines divide the clinics into quarters according to their rank


FIGURE 20 Estimates and 95% intervals for the adjusted live-birth rate in each clinic, assuming exchangeability between clinics. The dashed vertical line represents the national average of 14%

[Figure not reproduced: each clinic's estimated rank with 95% interval, plotted on a rank axis from 0 to 50.]

FIGURE 21 Estimated true ranks and 95% intervals for each clinic, assuming exchangeability
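The exchangeable model behind Figures 20 and 21 can be sketched in a few lines. This is a simplified empirical-Bayes approximation on the log-odds scale, not the report's full case-mix-adjusted MCMC analysis; only five of the (rate, cycles) pairs from Figure 20 are used, and the rank intervals are obtained by simulation from the approximate posteriors.

```python
import numpy as np

# Five (adjusted live-birth rate %, n cycles) pairs read off Figure 20;
# the report's analysis uses all clinics and adjusts for case-mix.
clinics = {"A": (8.99, 1453), "B": (10.98, 147), "C": (13.89, 68),
           "D": (14.06, 1315), "E": (22.19, 861)}
names = list(clinics)
p = np.array([clinics[k][0] for k in names]) / 100
n = np.array([clinics[k][1] for k in names], dtype=float)

# Approximate normal likelihoods on the log-odds scale
y = np.log(p / (1 - p))                  # observed clinic log-odds
s2 = 1.0 / (n * p * (1 - p))             # approximate sampling variances

# Method-of-moments estimate of the between-clinic variance tau^2
w = 1.0 / s2
mu = np.sum(w * y) / np.sum(w)
q = np.sum(w * (y - mu) ** 2)
tau2 = max(0.0, (q - (len(y) - 1)) /
           (np.sum(w) - np.sum(w ** 2) / np.sum(w)))

# Shrinkage: each clinic is pulled towards the overall mean,
# small clinics (large s2) more strongly than large ones
B = s2 / (s2 + tau2)
post_mean = B * mu + (1 - B) * y
post_sd = np.sqrt(B * tau2)

# Simulate posterior draws to get rank intervals, as in Figure 21
rng = np.random.default_rng(1)
draws = rng.normal(post_mean, post_sd, size=(10_000, len(names)))
ranks = draws.argsort(axis=1).argsort(axis=1) + 1   # rank 1 = lowest rate
lo = np.percentile(ranks, 2.5, axis=0)
hi = np.percentile(ranks, 97.5, axis=0)

for i, k in enumerate(names):
    rate = 100 / (1 + np.exp(-post_mean[i]))
    print(f"clinic {k}: shrunk rate {rate:5.2f}%, "
          f"95% rank interval [{lo[i]:.0f}, {hi[i]:.0f}]")
```

The small clinics (B and C) are shrunk most towards the overall mean, while the rank intervals for mid-table clinics remain wide — the qualitative behaviour displayed in Figures 20 and 21.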

Chapter 13

Conclusions and implications for future research

Introduction

This review has described the general use of Bayesian methods in health technology assessment, and has considered a number of specific areas of application, for example randomised controlled trials. While in many of these areas the advantages of adopting a Bayesian approach have been clearly demonstrated, a number of problems have also been identified. This chapter summarises many of these advantages and disadvantages, and makes a series of recommendations for the main participant groups in health technology assessment. The next section summarises the specific conclusions that have been drawn in the preceding chapters, after which the general advantages and problems associated with adopting a Bayesian approach are considered. The chapter concludes by summarising the areas requiring further research.

Specific conclusions

Introduction

1. Bayesian methods are defined as the explicit quantitative use of external evidence in the design, monitoring, analysis, interpretation and reporting of a health technology assessment.
2. Bayesian methods are a controversial topic in that they may involve the explicit use of subjective judgements in what is conventionally supposed to be a rigorous scientific exercise in health technology assessment.
3. There has been very limited use of proper Bayesian methods in practice, and relevant studies appear to be relatively easily identified.
4. The potential importance of Bayesian methods to a topic is not necessarily reflected in the volume of published literature: in particular, publications on the design and analysis of single clinical trials dominate those on the synthesis of evidence from studies of multiple designs.

Bayesian methods in the health technology assessment context

1. Claims of advantages and disadvantages of Bayesian methods are now largely based on pragmatic reasons rather than blanket ideological positions.
2. A Bayesian approach can lead to flexible modelling of evidence from diverse sources.
3. Bayesian methods are best seen as a transformation from an initial to a final opinion, rather than providing a single 'correct' inference.
4. Different contexts may demand different statistical approaches, both regarding the role of prior opinion and the role of an explicit loss function. It is vital to establish contexts in which Bayesian approaches are appropriate.
5. A decision-theoretic approach may be appropriate where the consequences of a study are predictable, such as when dealing with rare diseases treated according to a protocol, within a pharmaceutical company, or in public health policy.
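The idea of inference as a transformation from an initial to a final opinion is easily illustrated with the simplest conjugate analysis; the prior parameters and trial results below are invented for illustration.

```python
import math

# Prior opinion about a response rate theta: Beta(a, b).
# a=3, b=7 encodes a prior mean of 0.3 (hypothetical figures).
a, b = 3.0, 7.0

# Hypothetical trial evidence: 20 responses in 50 patients
successes, n = 20, 50

# Conjugate update: Beta prior + binomial likelihood -> Beta posterior
a_post = a + successes
b_post = b + (n - successes)

prior_mean = a / (a + b)
post_mean = a_post / (a_post + b_post)
data_mean = successes / n

# The posterior mean is a precision-weighted compromise of prior and data
w_prior = (a + b) / (a + b + n)
compromise = w_prior * prior_mean + (1 - w_prior) * data_mean

post_sd = math.sqrt(a_post * b_post /
                    ((a_post + b_post) ** 2 * (a_post + b_post + 1)))

print(f"prior mean {prior_mean:.3f} -> posterior mean {post_mean:.3f} "
      f"(sd {post_sd:.3f})")
```

The final opinion sits between the initial opinion and the data, with the prior's influence governed by its effective sample size a + b.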

The prior distribution

1. The use of a prior is based on judgement, and hence a degree of subjectivity cannot be avoided.
2. The prior is important and not unique, and so a range of options should be examined in a sensitivity analysis.
3. The intended audience for the analysis needs to be explicitly specified.
4. The quality of subjective priors (as assessed by predictions) shows predictable biases in terms of enthusiasm.
5. For a prior to be taken seriously, its evidential basis must be explicitly given, as well as any assumptions made (e.g. downweighting of past data). Care must, however, be taken over bias in published results.
6. Archetypal priors may be useful for identifying a reasonable range of prior opinion.
7. Great care is required in using default priors intended to be minimally informative.
8. The exchangeability assumption should not be made lightly.
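A sensitivity analysis over a range of archetypal priors can be sketched with normal approximations; the trial estimate and all prior settings below are hypothetical.

```python
import math

def posterior(prior_mean, prior_sd, est, se):
    """Normal prior x normal likelihood -> normal posterior (precision-weighted)."""
    w0, w1 = 1 / prior_sd ** 2, 1 / se ** 2
    mean = (w0 * prior_mean + w1 * est) / (w0 + w1)
    sd = math.sqrt(1 / (w0 + w1))
    return mean, sd

def p_benefit(mean, sd):
    """P(log hazard ratio < 0), i.e. the treatment reduces the hazard."""
    z = (0 - mean) / sd
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Hypothetical trial result: estimated log(HR) = -0.30, se = 0.15
est, se = -0.30, 0.15

# Archetypal priors on log(HR); all values are illustrative
priors = {
    "sceptical":    (0.0, 0.10),    # centred on no effect, fairly tight
    "reference":    (0.0, 10.0),    # effectively flat
    "enthusiastic": (-0.30, 0.10),  # centred on the hoped-for benefit
}

for name, (m0, s0) in priors.items():
    m, s = posterior(m0, s0, est, se)
    print(f"{name:12s}: posterior log(HR) {m:+.3f} (sd {s:.3f}), "
          f"P(benefit) = {p_benefit(m, s):.3f}")
```

If the probability of benefit is high even under the sceptical prior, the evidence should convince a wide range of reasonable opinion; if the archetypal priors disagree, the data alone are not yet decisive.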

Randomised trials

1. The Bayesian approach provides a framework for considering the ethics of randomisation.
2. Monitoring trials with sceptical and other priors may provide a unified approach to assessing whether a trial's results should be convincing to a wide range of reasonable opinion, and could provide a formal tool for Data Monitoring Committees.


3. Various sources of multiplicity can be dealt with in a unified and coherent way.
4. In contrast to earlier phases of development, it is generally unrealistic to formulate a Phase III trial as a decision problem, except in circumstances where future treatments can be accurately predicted.
5. An empirical basis for prior opinions in clinical trials should be investigated, but archetypal prior opinions play a useful role.
6. The structure in which trials are conducted must be recognised, but can be taken into account by specifying a range of prior opinions.
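One formal monitoring tool of the kind mentioned in point 2 above is the predictive probability that the trial, if continued to completion, will reach a conventionally convincing result. The sketch below uses normal approximations and invented interim figures, and assumes a vague prior on the treatment effect.

```python
import math

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Interim look at a trial (all numbers hypothetical).
# theta is the log hazard ratio; estimates have sd sigma/sqrt(events).
sigma = 2.0
n1, N = 100, 400            # events observed so far, events planned
est1 = -0.20                # interim estimate of theta

# Posterior after the interim look (vague prior):
# theta | data ~ N(est1, sigma^2 / n1)

# Predictive distribution of the *final* estimate, averaging over both the
# posterior for theta and the sampling noise of the remaining n2 events
n2 = N - n1
pred_sd = (n2 / N) * math.sqrt(sigma ** 2 / n1 + sigma ** 2 / n2)

# 'Success' = final estimate below the conventional two-sided 5% boundary
crit = -1.96 * sigma / math.sqrt(N)
pred_prob = phi((crit - est1) / pred_sd)

print(f"predictive probability of a conventionally significant "
      f"final result: {pred_prob:.2f}")
```

A Data Monitoring Committee could use such a calculation to judge whether continuing the trial is likely to resolve the question; a sceptical prior would simply replace the vague one in the posterior step.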

Observational studies

1. Epidemiological studies tend to demand a more complex analysis than randomised trials.
2. Computer-intensive Bayesian methods in epidemiology are becoming more common.
3. There are likely to be increased demands, particularly in areas such as institutional comparisons and gene–environment interactions.
4. The explicit modelling of potential biases in observational data may be widely applicable but needs some evidence base in order to be convincing.

Evidence synthesis

1. A unified Bayesian approach appears to be applicable to a wide range of problems concerned with evidence synthesis.
2. In the past, prospective evaluation of clinical interventions concentrated on randomised controlled trials, but more recent interest has focused on more diffuse areas, such as healthcare delivery or broad public health measures. This means methods that can synthesise the totality of evidence are required, for example in assessing medical devices.
3. Evaluations of current technologies may often be seen as unethical subjects for randomised controlled trials, and hence modelling of available evidence is likely to be necessary.
4. Perhaps one reason for lack of uptake is that syntheses are not seen as 'clean' methods, with each analysis being context-specific, less easy to set quality markers for, easier to criticise as subjective and so on.
5. Priors for the degree of 'similarity' between alternative designs can be empirically informed by studies comparing the results of randomised controlled trials and observational data.
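One crude way of expressing a prior about the 'similarity' of designs is to pool estimates across randomised and observational studies while adding an assumed bias variance to the non-randomised ones. All estimates and the bias term below are invented.

```python
import math

# Evidence-synthesis sketch: precision-weighted pooling of log(OR) estimates
# from studies of different designs, with observational studies downweighted
# by an assumed extra bias variance. All numbers are illustrative.
studies = [
    # (estimate of log OR, standard error, design)
    (-0.25, 0.12, "rct"),
    (-0.10, 0.20, "rct"),
    (-0.45, 0.08, "observational"),
    (-0.35, 0.15, "observational"),
]
bias_sd = 0.15  # assumed extra sd for non-randomised evidence

def pool(data, bias_sd):
    """Fixed-effect pooling with an additive bias variance by design."""
    num = den = 0.0
    for est, se, design in data:
        var = se ** 2 + (bias_sd ** 2 if design == "observational" else 0.0)
        num += est / var
        den += 1.0 / var
    return num / den, math.sqrt(1.0 / den)

naive, naive_se = pool(studies, 0.0)        # treat all designs alike
pooled, pooled_se = pool(studies, bias_sd)  # downweight observational data

print(f"naive pooling:       {naive:+.3f} (se {naive_se:.3f})")
print(f"with bias allowance: {pooled:+.3f} (se {pooled_se:.3f})")
```

Allowing for bias pulls the pooled estimate back towards the randomised evidence and, appropriately, widens its uncertainty; empirical comparisons of trial and observational results could inform the size of `bias_sd`.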

Strategy, decisions and policy making

1. A Bayesian approach allows explicit recognition of multiple perspectives.
2. Increased attention to pharmaco-economics may lead decision-theoretic models for research planning to be explored, although this will not be straightforward.
3. There appears to be great potential for formal methods for planning in the pharmaceutical industry.
4. The regulation of devices is leading the way in establishing the role of evidence synthesis.
5. 'Comprehensive decision modelling' is likely to become increasingly important in policy making.

Practical examples and case studies

1. The BayesWatch criteria may provide a basis for structured reporting of Bayesian analysis.
2. Summaries of fully fledged ('three-star') applications of Bayesian methods in health technology assessment contain few prospective analyses but provide useful guidance.
3. Four case studies show:
a. Bayesian analyses using a sceptical prior can be useful to the data monitoring committee of a cancer clinical trial (case study 1 (chapter 9): the CHART trial).
b. Bayesian methods can be used to temper over-optimistic conclusions based on meta-analysis of small trials (case study 2 (chapter 10): magnesium sulphate after AMI).
c. Modern graphical software can easily handle complex assessments previously analysed using the 'confidence profile' method (case study 3 (chapter 11)).
d. Bayesian methods provide a flexible tool for performance estimation and ranking of institutions (case study 4 (chapter 12): IVF clinics).

General advantages and problems

Potential advantages of Bayesian approaches in health technology assessment

1. All evidence regarding a specific problem can be taken into account.
2. Specification of a prior distribution requires sponsors, investigators and policy makers to think carefully and be explicit about what external evidence and judgement they should include.
3. Hierarchical models, which can also be handled within a non-Bayesian framework, allow pooling of evidence and 'borrowing of strength' between multiple substudies.
4. Potential biases can be explicitly modelled, allowing the synthesis of studies of varying designs.


5. The Bayesian approach focuses on the vital question 'How should this piece of evidence change what we currently believe?'.
6. Probability statements can be made directly regarding quantities of interest, and predictive statements are easily derived.
7. Juxtaposition of current belief with clinical demands provides an intuitive and flexible mechanism for monitoring and reporting studies.
8. The inferential outputs from a Bayesian analysis feed naturally into a decision-theoretic and policy-making context.
9. Explicit recognition of the importance of context makes Bayesian methods particularly suitable for health technology assessment, in which multiple parties may well interpret the same evidence in different ways.

Generic problems

1. Unfamiliarity with Bayesian techniques, perhaps along with their perceived mathematical complexity, and some conservatism on the part of potential users, has resulted in limited use of proper Bayesian methods in health technology assessment practice to date.
2. The use of prior opinions acknowledges a subjective input into analyses, which may appear to contravene the scientific aim of objectivity.
3. Specification of priors, whether by elicitation or choice of defaults, is a contentious and difficult issue.
4. There are no established standards for the design, analysis and reporting of Bayesian studies.
5. A full decision-theoretic framework can lead to innovative but non-standard trial designs very different from those currently in use.
6. Specification of expected utilities is difficult and may require extensive assumptions about future use of a technology.
7. There are no automatic measures of statistical significance or lack of model fit, such as a P value or a deviance measure.
8. The computational complexity of the methods has been a major issue until recently.
9. Software for implementation of the methods is still limited in availability and user-friendliness.

Many of the issues raised above have been mentioned in more specific contexts by others, for example Sutton et al.434

Future research and development

Bayesian methods could be of great value within health technology assessment. For a realistic appraisal of the methodology, it is necessary to distinguish the roles and requirements for five main participant groups in health technology assessment: methodological researchers, sponsors, investigators, reviewers and consumers. However, two common themes for all participants can immediately be identified. First, the need for an extended set of case studies showing practical aspects of the Bayesian approach, in particular for prediction and handling multiple substudies, in which mathematical details are minimised but details of implementation are provided. Second, the development of standards for the performance and reporting of Bayesian analyses, possibly derived from the BayesWatch checklist described in this report.

The potential roles for each of the participant groups are summarised below under each aspect of a Bayesian assessment: design, priors, modelling, reporting and decision-making.

1. Methodological researchers:
a. Design. There is a need for transferable methods for sample size calculation that are not based on type I and type II error, such as targeting precision, and realistic development of payback models, with modelling of dissemination.
b. Priors. Simple and reliable elicitation methods for 'non-enthusiasts' require testing, as well as demonstrations of the use of empirical data as a basis for prior distributions. Reasonable default priors in non-standard situations need to be available.
c. Modelling. Methods for flexible model selection and robust MCMC analysis require development and dissemination. With regard to implementation, there is a need for user-friendly software for clinical trials and evidence synthesis.
d. Reporting. It is essential to have appraisal criteria along the lines of the BayesWatch checklist, with possible reformulation as guidelines along the lines of 'How to read a Bayesian study'. It would be useful to have the term 'Bayesian' in all relevant papers in order to aid literature searches.
e. Decision-making. Increased integration with a health economic and policy perspective is highly desirable, together with flexible tools for implementation.

2. Sponsors:
a. Design. Both public sector and industry could extend their perspective beyond the classical Neyman–Pearson criteria, and in particular investigate quantitative payback models. The pharmaceutical industry might also investigate formal project prioritisation schemes.
b. Priors. All sponsors could focus on the evidential basis for assumptions made concerning alternative hypotheses and the potential gains from technology, and use empirical reviews to establish reasonable prior opinions.

3. Investigators:
a. Design. Apart from the considerations given above under 'Sponsors', there is also potential for 'open' studies in which interim results are reported to investigators.
b. Priors. It would be valuable to gain experience in eliciting prior opinions from both enthusiasts and a general cross-section of the target community.
c. Modelling. There is great scope, when analysing data, to go beyond the usual limited list of models and consider a range of priors and structural assumptions.
d. Reporting. It is vital that any Bayesian reporting allows future users to include the evidence in their synthesis or decision. The use of BayesWatch or a similar scheme for reporting should help in this.

4. Reviewers/regulatory bodies:
a. Priors. Regulatory bodies could establish reasonable prior opinions based on past experience in order to provide default priors.
b. Modelling. Regulatory bodies could take a more flexible approach to the use of data, particularly in areas such as medical devices, and encourage efficient use of data by appropriate use of historical controls, evidence synthesis and so on.
c. Decision-making. More experimental would be the explicit modelling of the consequences of decisions in order to decide evidential criteria.

5. Consumers/policy makers. There is a need for careful case studies in which policy makers explicitly go through the following stages in reaching a conclusion based on a full Bayesian analysis:
a. Priors: specify prior opinions relevant at the time of decision-making.
b. Modelling: pool all available evidence into a coherent model.
c. Reporting: make predictive probability statements about the consequences of different policies.
d. Decision-making: assign costs to potential consequences and so assess (with sensitivity analysis) the expected value of different actions.
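Stages (c) and (d) can be sketched with a toy adoption decision. The posterior, cost and event-value figures below are all invented; the point is only the mechanics of turning posterior draws into an expected value of an action.

```python
import numpy as np

# Decision sketch: expected net benefit of adopting a new technology,
# under posterior uncertainty about its effect. All figures are invented.
rng = np.random.default_rng(0)

# Posterior draws for the absolute risk reduction achieved by adoption
arr = rng.normal(loc=0.03, scale=0.02, size=50_000)

cost_per_patient = 500.0           # extra cost of the new technology
value_per_event_avoided = 25_000.0 # assumed monetary value of one event avoided

# Net benefit per patient of adopting, evaluated on each posterior draw
net_benefit = value_per_event_avoided * arr - cost_per_patient

print(f"E[net benefit of adopting]: {net_benefit.mean():8.1f} per patient")
print(f"P(adopting is the wrong decision): {np.mean(net_benefit < 0):.3f}")
```

Here the expected net benefit is positive, so adoption is favoured, but there remains a substantial probability that it is the wrong call — exactly the quantity a sensitivity analysis, or a value-of-further-information calculation, would interrogate.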


Acknowledgements

This study was commissioned by the NHS R&D HTA programme. The authors are indebted to the HTA referees for their perseverance in reading this report and the quality of their comments. The views expressed in this report are those of the authors, who are responsible for any errors.

References

1. Abrams K, Ashby D, Errington D. Simple Bayesian analysis in clinical trials – a tutorial. Controlled Clin Trials 1994;15:349–59.

2. Abrams K, Ashby D, Houghton J, Riley D. Assessing drug interactions: tamoxifen and cyclophosphamide. In: Berry and Stangl,61 p. 3–66.
3. Abrams K, Jones DR. Meta-analysis and the synthesis of evidence. IMA J Math Appl Med Biol 1995;12:297–313.
4. Abrams K, Sansó B. Approximate Bayesian inference for random effects meta-analysis. Stat Med 1998;17:201–18.
5. Abrams KR. Monitoring randomised controlled trials. Parkinson's disease trial illustrates the dangers of stopping early. BMJ 1998;316:1183–4.
6. Abrams KR, Jones DR. Bayesian interim analysis of randomised trials. Lancet 1997;349:1911–12.
7. Achcar JA, Brookmeyer R, Hunter WG. An application of Bayesian analysis to medical follow-up data. Stat Med 1985;4:509–20.
8. Adar R, Critchfield GC, Eddy DM. A confidence profile analysis of the results of femoropopliteal percutaneous transluminal angioplasty in the treatment of lower-extremity ischemia. J Vasc Surg 1989;10:57–67.
9. Adcock CJ. The Bayesian approach to determination of sample sizes – some comments on the paper by Joseph, Wolfson and Du Berger. J R Stat Soc Ser D 1995;44:155–61.
10. Ahn C. An evaluation of Phase I cancer clinical trial designs. Stat Med 1998;17(14):1537–49.
11. Albert J, Chib S. Bayesian modelling of binary repeated measures data with application to cross-over trials. In: Berry and Stangl,61 p. 577–600.
12. Altman DG. Bayesian approaches to randomised trials – discussion. J R Stat Soc Ser A 1994;157:387–416.
13. Anscombe F. Sequential medical trials. J Am Stat Assoc 1963;58:365–83.
14. Antman EM. Magnesium in acute MI – timing is critical. Circulation 1995;92:2367–72.
15. Armitage P. The search for optimality in clinical trials. Int Stat Rev 1985;53:15–24.
16. Armitage P. Discussion of Breslow (1990). Stat Sci 1990;5.
17. Armitage P. Interim analysis in clinical trials. Stat Med 1991;10(6):925–37.
18. Armitage P. Letter to the editor. Controlled Clin Trials 1991;12:345.
19. Armitage P. A case for Bayesianism in clinical trials – discussion. Stat Med 1993;12:1395–404.
20. Armitage P, Colton T, editors. Encyclopaedia of biostatistics. Chichester: Wiley; 1998.
21. Armitage P, McPherson K, Rowe B. Repeated significance tests on accumulating data. J R Stat Soc A 1969;132:235–44.
22. Armitage PA. Inference and decision in clinical trials. J Clin Epidemiol 1989;42:293–9.
23. Ashby D, Hutton J. Bayesian epidemiology. In: Berry and Stangl,61 p. 109–138.
24. Ashby D, Hutton J, McGee M. Simple Bayesian analyses for case–control studies in cancer epidemiology. Statistician 1993;42:385–97.
25. Atkinson EN. A Bayesian strategy for evaluating treatments applicable only to a subset of patients. Stat Med 1997;16:1803–15.
26. Avins AL. Bayesian analysis and the GUSTO trial. J Am Med Assoc 1995;274:873.
27. Ayanian J, Landrum M, Normand S, Guadagnoli E, McNeil B. Rating the appropriateness of coronary angiography – do practicing physicians agree with an expert panel and with each other? N Eng J Med 1998;338:1896–904.
28. Baraff LJ, Oslund S, Prather M. Effect of antibiotic therapy and etiologic microorganism on the risk of bacterial meningitis in children with occult bacteremia. Pediatrics 1993;92:140–3.
29. Barnett V. Comparative statistical inference. 2nd ed. Chichester: Wiley; 1982.
30. Bather JA. On the allocation of treatments in sequential medical trials. Int Stat Rev 1985;53:1–13.
31. Baudoin C, O'Quigley J. Symmetrical intervals and confidence intervals. Biometr J 1994;36:927–34.
32. Bayes T. An essay towards solving a problem in the doctrine of chances. Phil Trans R Soc 1763;53:418.
33. Begg CB. Book review of 'Cross design synthesis'. Stat Med 1992;12:1627–30.


34. Begg CB, Dumouchel W, Harris J, Dobson A, Dear K, Givens GH, et al. Publication bias in meta-analysis: a Bayesian data-augmentation approach to account for issues exemplified in the passive smoking debate – comments and rejoinders. Stat Sci 1997;12:241–50.
35. Belin TR, Elashoff RM, Leung K, Nisembaum R, Bastani R, Nasseri K, et al. Combining information from multiple sources in the analysis of non-equivalent control group design. In: Gatsonis C, Hodges J, Kass R, Singpurwalla N, editors. Case studies in Bayesian statistics II. Berlin: Springer-Verlag; 1995. p. 241–260.
36. Berger J. Statistical decision theory and Bayesian inference. Berlin: Springer-Verlag; 1985.
37. Berger J, Wolpert R. The likelihood principle. 2nd ed. Hayward (CA): Institute of Mathematical Statistics; 1988.
38. Berger JO, Berry DA. Statistical analysis and the illusion of objectivity. Am Sci 1988;76:159–65.
39. Bergman S, Gittins J, editors. Statistical methods for pharmaceutical research planning. New York: Marcel Dekker; 1985.
40. Bernardinelli L, Clayton D, Pascutto C, Montomoli C, Ghislandi M, Songini M. Bayesian analysis of space-time variation in disease risk. Stat Med 1995;14:2433–43.
41. Bernardo J, Berger JO, Lindley DV, Smith AFM, editors. Bayesian statistics 4. Oxford: Oxford University Press; 1992.
42. Bernardo J, DeGroot MH, Lindley DV, Smith AFM, editors. Bayesian statistics 3. Oxford: Oxford University Press; 1988.
43. Bernardo JM, Smith AFM. Bayesian theory. Chichester: Wiley; 1994.
44. Berry D, Hardwick J. Using historical controls in clinical trials: application to ECMO. In: Berger J, Gupta S, editors. Statistical decision theory and related topics V. New York: Springer-Verlag; 1993. p. 141–56.
45. Berry D, Stangl D. Bayesian methods in health-related research. In: Berry and Stangl,61 p. 3–66.
46. Berry DA. Ethics and ECMO. Comments on 'Investigating therapies of potentially great benefit: ECMO' by J H Ware. Stat Sci 1989;4:306–10.
47. Berry DA, Hardwick J. Recent progress in clinical trial designs that adapt for ethical purposes: comments on 'Investigating therapies of potentially great benefit: ECMO' by J H Ware. Stat Sci 1989;4:327–36.
48. Berry DA. Interim analyses in clinical trials – classical vs Bayesian approaches. Stat Med 1985;4:521–6.
49. Berry DA. Interim analyses in clinical research. Cancer Invest 1987;5:469–77.
50. Berry DA. Interim analysis in clinical trials – the role of the likelihood principle. Am Stat 1987;41:117–22.
51. Berry DA. Monitoring accumulating data in a clinical trial. Biometrics 1989;45:1197–211.
52. Berry DA. Bayesian methods in Phase III trials. Drug Information J 1991;25:345–68.
53. Berry DA. A case for Bayesianism in clinical trials. Stat Med 1993;12:1377–93.
54. Berry DA. Bayesian approaches to randomised trials – discussion. J R Stat Soc Ser A 1994;157:387–416.
55. Berry DA. Decision analysis and Bayesian methods in clinical trials. Cancer Treatment Res 1995;75:125–54.
56. Berry DA. Statistics: a Bayesian perspective. Belmont (CA): Duxbury Press; 1996.
57. Berry DA. When is a confirmatory randomised clinical trial needed? J Natl Cancer Inst 1996;88:1606–7.
58. Berry DA, Eick SG. Adaptive assignment versus balanced randomization in clinical trials – a decision analysis. Stat Med 1995;14:231–46.
59. Berry DA, Ho CH. One-sided sequential stopping boundaries for clinical trials – a decision-theoretic approach. Biometrics 1988;44:219–27.
60. Berry DA, Pearson LM. Optimal designs for clinical trials with dichotomous responses. Stat Med 1985;4:597–608.
61. Berry DA, Stangl DK, editors. Bayesian biostatistics. New York: Marcel Dekker; 1996.
62. Berry DA, Thor A, Cirrincione C, Edgerton S, Muss H, Marks J, et al. Scientific inference and predictions: multiplicities and convincing stories: a case study in breast cancer therapy. In: Bernardo J, Berger JO, Lindley DV, Smith AFM, editors. Bayesian statistics 5. Oxford: Oxford University Press; 1996. p. 45–67.
63. Berry DA, Wolff MC, Sack D. Public health decision making: a sequential vaccine trial. In: Bernardo et al.,41 p. 79–96.
64. Berry DA, Wolff MC, Sack D. Decision making during a Phase III randomised controlled trial. Controlled Clin Trials 1994;15:360–78.
65. Berry SM, Kadane JB. Optimal Bayesian randomization. J R Stat Soc Ser B 1997;59:813–19.

66. Biggerstaff BJ, Tweedie RL, Mengersen KL. Passive smoking in the workplace: classical and Bayesian meta-analyses. Int Arch Occupat Environ Health 1994;66:269–77.

67. Black N. Why we need observational studies to evaluate the effectiveness of health care. BMJ 1996;312:1215–18.
68. Bland JM, Altman DG. Statistics notes: Bayesians and frequentists. BMJ 1998;317:1151.
69. Box GEP. Sampling and Bayes inference in scientific modelling and robustness (with discussion). J R Stat Soc A 1980;143:383–430.
70. Box GEP, Tiao GC. Bayesian inference in statistical analysis. Reading: Addison–Wesley; 1973.
71. Brant LJ, Duncan DB, Dixon DO. K-ratio t-tests for multiple comparisons involving several treatments and a control. Stat Med 1992;11:863–73.
72. Breslow N. Biostatistics and Bayes. Stat Sci 1990;5(3):269–84.
73. Briggs A. A Bayesian approach to stochastic cost-effectiveness analysis. Electronic Health Electronics Lett 1998;2(4):6–13.
74. Briggs AH, Gray AM. Handling uncertainty when performing economic evaluation of healthcare interventions. HTA 1999;3(2).
75. Bring J. Stopping a clinical trial early because of toxicity – the Bayesian approach. Controlled Clin Trials 1995;16:131–2.
76. Brophy JM, Joseph L. Placing trials in context using Bayesian analysis – GUSTO revisited by Reverend Bayes. J Am Med Assoc 1995;273:871–5.
77. Brophy JM, Joseph L. Bayesian interim statistical analysis of randomised trials. Lancet 1997;349:1166–8.
78. Brown BW, Herson J, Atkinson EN, Rozell ME. Projection from previous studies – a Bayesian and frequentist compromise. Controlled Clin Trials 1987;8:29–44.
79. Browne RH. Bayesian analysis and the GUSTO trial. J Am Med Assoc 1995;274:873.
80. Browner WS, Newman TB. Are all significant p values created equal? The analogy between diagnostic tests and clinical research. J Am Med Assoc 1987;257:2459–63.
81. Brunier HC, Whitehead J. Sample sizes for Phase II clinical trials derived from Bayesian decision theory. Stat Med 1994;13:2493–502.
82. Burton PR. Helping doctors to draw appropriate inferences from the analysis of medical studies. Stat Med 1994;13:1699–713.
83. Buxton M, Hanney S. Evaluating the NHS research and development programme: will the programme give value for money? J R Soc Med 1998;91:2–6.
84. Byar D. Why data bases should not replace clinical trials. Biometrics 1980;36:337–42.
85. Byar DP, and 21 others. Design considerations for AIDS trials. N Eng J Med 1990;323:1343–8.
86. Byar DP, Simon RM, Friedewald WT, Schlesselman JJ, DeMets DL, Ellenberg JH, et al. Randomised clinical trials: perspective on some recent ideas. N Eng J Med 1976;295:74–80.
87. Campbell G. A regulatory perspective for Bayesian clinical trials. Food and Drug Administration; 1999. URL: http://www.fda.gov/cdrh/present/Bayesian.pdf
88. Canner PL. Selecting one of two treatments when the responses are dichotomous. J Am Stat Assoc 1970;65:293–306.
89. Carlin BP. Bayesian approaches to randomised trials – discussion. J R Stat Soc Ser A 1994;157:387–416.
90. Carlin BP, Chaloner K, Church T, Louis TA, Matts JP. Bayesian approaches for monitoring clinical trials with an application to toxoplasmic encephalitis prophylaxis. Statistician 1993;42:355–67.
91. Carlin BP, Kadane JB, Gelfand AE. Approaches for optimal sequential decision analysis in clinical trials. Biometrics 1998;54:964–75.
92. Carlin BP, Louis TA. Bayes and empirical Bayes methods for data analysis. London: Chapman and Hall; 1996.
93. Carlin BP, Sargent DJ. Robust Bayesian approaches for clinical trial monitoring. Stat Med 1996;15:1093–106.
94. Carlin JB. Meta-analysis for 2×2 tables – a Bayesian approach. Stat Med 1992;11:141–58.
95. Chalmers I. What is the prior probability of a proposed new treatment being superior to established treatments? BMJ 1997;314:74–5.
96. Chalmers TC, Lau J. Changes in clinical trials mandated by the advent of meta-analysis. Stat Med 1996;15:1263–8.
97. Chaloner K. Elicitation of prior distributions. In: Berry and Stangl,61 p. 141–156.
98. Chaloner K, Church T, Louis TA, Matts JP. Graphical elicitation of a prior distribution for a clinical trial. Statistician 1993;42:341–53.
99. Chaloner K, Verdinelli I. Bayesian experimental design – a review. Stat Sci 1995;10:273–304.
100. Chang MN, Shuster JJ. Interim analysis for randomised clinical trials: simulating the predictive distribution of the log-rank test statistic. Biometrics 1994;50:827–33.

Health Technology Assessment 2000; Vol. 4: No. 38


101. Chelimsky E, Silberman G, Droitcour J. Cross design synthesis. Lancet 1993;341:498.

102. Chevret S. The continual reassessment method in cancer Phase I clinical trials: a simulation study. Stat Med 1993;12(12):1093–108.

103. Choi SC, Pepple PA. Monitoring clinical trials based on predictive probability of significance. Biometrics 1989;45:317–23.

104. Choi SC, Smith PJ, Becker DP. Early decision in clinical trials when the treatment differences are small: experience of a controlled trial in head trauma. Controlled Clin Trials 1985;6(4):280–8.

105. Claxton K. Bayesian approaches to the value of information: implications for the regulation of new pharmaceuticals. Health Econ 1999;8:269–74.

106. Claxton K. The irrelevance of inference: a decision-making approach to the stochastic evaluation of health care technologies. J Health Econ 1999;18:341–64.

107. Claxton K, Posnett J. An economic approach to clinical trial design and research priority-setting. Health Econ 1996;5:513–24.

108. Cole P. The evolving case–control study. J Chronic Dis 1979;32:15–27.

109. Collins R, Peto R, Flather M, ISIS-4 Collaborative Group. ISIS-4 – a randomised factorial trial assessing early oral captopril, oral mononitrate, and intravenous magnesium sulphate in 58,050 patients with suspected acute myocardial infarction. Lancet 1995;345:669–85.

110. Cook RJ, Farewell VT. Multiplicity considerations in the design and analysis of clinical trials. J R Stat Soc Ser A 1996;159:93–110.

111. Cornfield J. Sequential trials, sequential analysis and the likelihood principle. Am Stat 1966;20:18–23.

112. Cornfield J. The Bayesian outlook and its applications. Biometrics 1969;25:617–57.

113. Cornfield J. Recent methodological contributions to clinical trials. Am J Epidemiol 1976;104:408–21.

114. Coronary Drug Project Research Group. The Coronary Drug Project. Initial findings leading to a modification of its research protocol. J Am Med Assoc 1970;214:1301–13.

115. Coronary Drug Project Research Group. Clofibrate and niacin in coronary heart disease. J Am Med Assoc 1975;231:360–81.

116. Cowles MK, Carlin BP, Connett JE. Bayesian tobit modeling of longitudinal ordinal clinical trial compliance data with non-ignorable missingness. J Am Stat Assoc 1996;91:86–98.

117. Cox DR. Discussion of paper by Lindsey. J R Stat Soc Ser D 1999;48:30.

118. Cox DR, Farewell VT. Statistical basis of public policy – qualitative and quantitative aspects should not be confused. BMJ 1997;314:73.

119. Craig BA, Fryback DG, Klein R, Klein BEK. A Bayesian approach to modelling the natural history of a chronic condition from observations with intervention. Stat Med 1999;18:1355–72.

120. Cressie N, Biele J. A sample-size-optimal Bayesian procedure for sequential pharmaceutical trials. Biometrics 1994;50:700–11.

121. Cronin KA, Legler JM, Etzioni RD. Assessing uncertainty in microsimulation modelling with application to cancer screening interventions. Stat Med 1998;17(21):2509–23.

122. Daniels MJ, Hughes MD. Meta-analysis for the evaluation of potential surrogate markers. Stat Med 1997;16(17):1965–82.

123. Davis CE, Leffingwell DP. Empirical Bayes estimates of subgroup effects in clinical trials. Controlled Clin Trials 1990;11:37–42.

124. DeGroot MH. Optimal statistical decisions. Reading (MA): Addison-Wesley; 1970.

125. DeMets DL. Bayesian approaches to randomised trials – discussion. J R Stat Soc Ser A 1994;157:387–416.

126. DeMets DL. Stopping guidelines vs stopping rules – a practitioner’s point of view. Commun Stat Theory Methods 1984;13:2395–417.

127. Dempster A, Selwyn M, Weeks B. Combining historical and randomised controls for assessing trends in proportions. J Am Stat Assoc 1983;78:221–7.

128. Dempster AP. Bayesian methods. In: Armitage and Colton,20 p. 263–71.

129. DerSimonian R. Meta-analysis in the design and monitoring of clinical trials. Stat Med 1996;15:1237–48.

130. DerSimonian R, Laird N. Meta-analysis in clinical trials. Controlled Clin Trials 1986;7:177–88.

131. Detsky A. Using economic analysis to determine the resource consequences of choices made in planning clinical trials. J Chronic Dis 1985;38:753–65.

132. Diamond GA, Forrester JS. Clinical trials and statistical verdicts: probable grounds for appeal. Ann Intern Med 1983;98:385–94.

133. Dignam JJ, Bryant J, Wieand HS, Fisher B, Wolmark N. Early stopping of a clinical trial when there is evidence of no treatment benefit: protocol B-14 of the National Surgical Adjuvant Breast and Bowel Project. Controlled Clin Trials 1998;19:575–88.

134. Dixon DO, Simon R. Bayesian subset analysis. Biometrics 1991;47:871–81.

135. Dixon DO, Simon R. Bayesian subset analysis in a colorectal cancer clinical trial. Stat Med 1992;11:13–22.

136. Dixon DO, Simon R. Erratum: Bayesian subset analysis (Biometrics 1991;47:871–81). Biometrics 1994;50:322.

137. Dominici F. Testing simultaneous hypotheses in pharmaceutical trials: a Bayesian approach. J Biopharm Stat 1998;8(2):283–97.

138. Dominici F, Parmigiani G, Wolpert R, Hasselblad V. Meta-analysis of migraine headache treatments: combining information from heterogeneous designs. J Am Stat Assoc 1999;94:16–28.

139. Donner A. A Bayesian approach to the interpretation of subgroup results in clinical trials. J Chronic Dis 1982;35:429–35.

140. Droitcour J, Silberman G, Chelimsky E. Cross-design synthesis: a new form of meta-analysis for combining results from randomised clinical trials and medical-practice databases. Int J Technol Assess Health Care 1993;9:440–9.

141. DuMouchel W. A Bayesian model and graphical elicitation procedure for multiple comparisons. In: Bernardo et al.,42 p. 127–45.

142. DuMouchel W. Bayesian meta-analysis. In: Berry D, editor. Statistical methodology in the pharmaceutical sciences. New York: Marcel Dekker; 1990. p. 509–29.

143. DuMouchel W, Berry DA. Meta-analysis for dose–response models. Stat Med 1995;14:679–85.

144. DuMouchel W, Waternaux C. Discussion of ‘Hierarchical models for combining information and for meta-analyses’. In: Bernardo JM, Berger JO, Dawid AP, Smith AFM, editors. Bayesian statistics 4. Oxford: Clarendon Press; 1992. p. 338–41.

145. DuMouchel WH, Harris JE. Bayes methods for combining the results of cancer studies in humans and other species (with comment). J Am Stat Assoc 1983;78:293–308.

146. Eddy DM. The confidence profile method – a Bayesian method for assessing health technologies. Operations Res 1989;37(2):210–28.

147. Eddy DM, Hasselblad V, McGivney W, Hendee W. The value of mammography screening in women under age 50 years. J Am Med Assoc 1988;259(10):1512–19.

148. Eddy DM, Hasselblad V, Shachter R. A Bayesian method for synthesizing evidence: the confidence profile method. Int J Technol Assess Health Care 1990;6(1):31–55.

149. Eddy DM, Hasselblad V, Shachter R. An introduction to a Bayesian method for meta-analysis – the confidence profile method. Med Decision Making 1990;10:15–23.

150. Eddy DM, Hasselblad V, Shachter R. Meta-analysis by the confidence profile method: the statistical synthesis of evidence. San Diego (CA): Academic Press; 1992.

151. Eddy DM, Wolpert RL, Hasselblad V. Confidence profiles – a Bayesian method for synthesizing evidence. Med Decision Making 1987;7:287.

152. Editorial. Cross design synthesis: a new strategy for studying medical outcomes? Lancet 1992;340:944–6.

153. Edwards S, Lilford R, Braunholtz D, Jackson J. Why ‘underpowered’ trials are not necessarily unethical. Lancet 1997;350:804–7.

154. Edwards SJL, Lilford RJ, Hewison J. The ethics of randomised controlled trials from the perspectives of patients, the public, and healthcare professionals. BMJ 1998;317:1209–12.

155. Edwards SJL, Lilford RJ, Jackson JC, Hewison J, Thornton J. Ethical issues in the design and conduct of randomised controlled trials. Health Technol Assess 1998;2(14).

156. Edwards W, Lindman H, Savage L. Bayesian statistical inference for psychological research. Psychol Rev 1963;70:193–242.

157. Egger M, Smith GD. Magnesium and myocardial infarction. Lancet 1994;343:1285.

158. Egger M, Smith GD. Misleading meta-analysis. BMJ 1995;311:753–4.

159. Emerson SS. Stopping a clinical trial very early based on unplanned interim analyses: a group sequential approach. Biometrics 1995;51:1152–62.

160. Estey E, Thall P, David C. Design and analysis of trials of salvage therapy in acute myelogenous leukemia. Cancer Chemotherapy Pharmacol 1997;40:S9–S12.

161. Etzioni R, Pepe MS. Monitoring of a pilot toxicity study with 2 adverse outcomes. Stat Med 1994;13:2311–21.

162. Etzioni RD, Kadane JB. Bayesian statistical methods in public health and medicine. Annu Rev Public Health 1995;16:23–41.

163. Faries D. Practical modifications of the continual reassessment method for Phase I cancer clinical trials. J Biopharm Stat 1994;4:147–64.

164. Fayers P. Bayesian approaches to randomised trials – discussion. J R Stat Soc Ser A 1994;157:387–416.


165. Fayers P. Bayesian interim analysis of randomised trials. Lancet 1997;349:1911.

166. Fayers PM, Ashby D, Parmar MKB. Bayesian data monitoring in clinical trials. Stat Med 1997;16:1413–30.

167. Feinstein AR. Clinical Biostatistics XXXIX: the haze of Bayes, the aerial palaces of decision analysis, and the computerised Ouija board. Clin Pharmacol Therapeutics 1977;21:482–96.

168. Felli JC, Hazen GB. Sensitivity analysis and the expected value of perfect information. Med Decision Making 1998;18:95–109.

169. Felli JC, Hazen GB. A Bayesian approach to sensitivity analysis. Health Econ 1999;8:263–8.

170. Fienberg S. A brief history of statistics in three and one-half chapters: a review essay. Stat Sci 1992;7(2):208–25.

171. Fisher LD. Comments on Bayesian and frequentist analysis and interpretation of clinical trials – comment. Controlled Clin Trials 1996;17:423–34.

172. Fleming TR, Watelet LF. Approaches to monitoring clinical trials. J Natl Cancer Inst 1989;81:188–93.

173. Fletcher A, Spiegelhalter D, Staessen J, Thijs L, Bulpitt C. Implications for trials in progress of publication of positive results. Lancet 1993;342:653–7.

174. Fluehler H, Grieve AP, Mandallaz D, Mau J, Moser HA. Bayesian approach to bioequivalence assessment – an example. J Pharm Sci 1983;72:1178–81.

175. Forster JJ. A Bayesian approach to the analysis of binary cross-over data. Statistician 1994;43:61–8.

176. Freedman B. Equipoise and the ethics of clinical research. N Eng J Med 1987;317:141–5.

177. Freedman L. Bayesian statistical methods – a natural way to assess clinical evidence. BMJ 1996;313:569–70.

178. Freedman L, Anderson G, Kipnis V, Prentice R, Wang CY, Rossouw J, et al. Approaches to monitoring the results of long-term disease prevention trials: examples from the Women’s Health Initiative. Controlled Clin Trials 1996;17(6):509–25.

179. Freedman LS, Lowe D, Macaskill P. Stopping rules for clinical trials incorporating clinical opinion. Biometrics 1984;40:575–86.

180. Freedman LS, Spiegelhalter DJ. The assessment of subjective opinion and its use in relation to stopping rules for clinical trials. Statistician 1983;32:153–60.

181. Freedman LS, Spiegelhalter DJ. Comparison of Bayesian with group sequential methods for monitoring clinical trials. Controlled Clin Trials 1989;10:357–67.

182. Freedman LS, Spiegelhalter DJ. Application of Bayesian statistics to decision making during a clinical trial. Stat Med 1992;11:23–35.

183. Freedman LS, Spiegelhalter DJ, Parmar MKB. The what, why and how of Bayesian clinical trials monitoring. Stat Med 1994;13:1371–83.

184. Freeman P. The role of p-values in analyzing trial results. Stat Med 1993;12:1443–52.

185. Frei A, Cottier C, Wunderlich P, Ludin E. Glycerol and dextran combined in the therapy of acute stroke. Stroke 1987;18:373–9.

186. Freireich E, Gehan E, et al. The effect of 6-mercaptopurine on the duration of steroid-induced remissions in acute leukaemia: a model for evaluation of other potentially useful therapy. Blood 1963;21:699–716.

187. Gatsonis C, Greenhouse JB. Bayesian methods for Phase I clinical trials. Stat Med 1992;11:1377–89.

188. Gelman A, Carlin J, Stern H, Rubin DB. Bayesian data analysis. New York: Chapman and Hall; 1995.

189. Gelman A, Rubin DB. Markov chain Monte Carlo methods in biostatistics. Stat Methods Med Res 1996;5:339–55.

190. Genest C, Zidek J. Combining probability distributions: a critique and an annotated bibliography (with discussion). Stat Sci 1986;1:114–48.

191. George SL, Li CC, Berry DA, Green MR. Stopping a clinical trial early – frequentist and Bayesian approaches applied to a CALGB trial in non-small-cell lung cancer. Stat Med 1994;13:1313–27.

192. Gilbert JP, McPeek B, Mosteller F. Statistics and ethics in surgery and anesthesia. Science 1977;198:684–9.

193. Gilks WR, Richardson S, Spiegelhalter DJ. Markov chain Monte Carlo methods in practice. New York: Chapman and Hall; 1996.

194. Givens GH, Smith DD, Tweedie RL. Publication bias in meta-analysis: a Bayesian data-augmentation approach to account for issues exemplified in the passive smoking debate. Stat Sci 1997;12:221–40.

195. Goldstein H, Spiegelhalter DJ. Statistical aspects of institutional performance: league tables and their limitations (with discussion). J R Stat Soc Ser A 1996;159:385–444.

196. Goodman S, Langer A. Bayesian analysis and the GUSTO trial. J Am Med Assoc 1995;274:873–4.

197. Goodman SN, Zahurak ML, Piantadosi S. Some practical improvements in the continual reassessment method for Phase I studies. Stat Med 1995;14:1149–61.

198. Gore SM. Biostatistics and the Medical Research Council. Med Res Council News 1987;36:19–20.

199. Gould AL. Using prior findings to augment active-controlled trials and trials with small placebo groups. Drug Inform J 1991;25(3):369–80.

200. Gould AL. Sample sizes for event rate equivalence trials using prior information. Stat Med 1993;12(21):2009–23.

201. Gould AL. Planning and revising the sample size for a trial. Stat Med 1995;14:1039–51.

202. Gould AL. Multi-centre trial analysis revisited. Stat Med 1998;17:1779–97.

203. Gray RJ. A Bayesian analysis of institutional effects in a multicenter cancer clinical trial. Biometrics 1994;50:244–53.

204. GREAT group. Feasibility, safety and efficacy of domiciliary thrombolysis by general practitioners: Grampian region early anistreplase trial. BMJ 1992;305:548–53.

205. Greenhouse JB. On some applications of Bayesian methods in cancer clinical trials. Stat Med 1992;11:37–53.

206. Greenhouse JB, Wasserman L. Robust Bayesian methods for monitoring clinical trials. Stat Med 1995;14:1379–91.

207. Greenland S, Robins JM. Empirical-Bayes adjustments for multiple comparisons are sometimes useful. Epidemiology 1991;2:244–51.

208. Grieve A. Some uses of predictive distributions in pharmaceutical research. In: Okuno T, editor. Biometry – clinical trials and related topics. New York: Elsevier; 1988. p. 83–99.

209. Grieve A, Senn S. Estimating treatment effects in clinical cross-over trials. J Biopharm Stat 1998;8(2):191–247.

210. Grieve AP. A Bayesian analysis of the 2-period cross-over design for clinical trials. Biometrics 1985;41:979–90.

211. Grieve AP. Evaluation of bioequivalence studies. Eur J Clin Pharmacol 1991;40:201–2.

212. Grieve AP. Bayesian analyses of two-treatment cross-over studies. Stat Methods Med Res 1994;3:407–29.

213. Grieve AP. Bayesian approaches to randomised trials – discussion. J R Stat Soc Ser A 1994;157:387–416.

214. Grieve AP. Extending a Bayesian analysis of the two-period cross-over to allow for baseline measurements. Stat Med 1994;13:905–29.

215. Grieve AP. Extending a Bayesian analysis of the two-period cross-over to accommodate missing data. Biometrika 1995;82:277–86.

216. Grieve AP. Issues for statisticians in pharmacoeconomic evaluations. Stat Med 1998;17:1715–23.

217. Grossman J, Parmar MKB, Spiegelhalter DJ, Freedman LS. A unified method for monitoring and analysing controlled trials. Stat Med 1994;13:1815–26.

218. Gustafson P. A Bayesian analysis of bivariate survival data from a multicenter cancer clinical trial. Stat Med 1995;14:2523–35.

219. Gustafson P. Robustness considerations in Bayesian analysis. Stat Methods Med Res 1996;4:357–73.

220. Hall GH. Bayesian interim analysis of randomised trials. Lancet 1997;349:1910–11.

221. Halperin M, Lan KKG, Ware JH, Johnson NJ, DeMets DL. An aid to data monitoring in long-term clinical trials. Controlled Clin Trials 1982;3:311–23.

222. Hanauske AR, Edler L. New clinical trial designs for Phase I studies in hematology and oncology: principles and practice of the continual reassessment model. Onkologie 1996;19:404–9.

223. Healy M. New methodology in clinical trials. Biometrics 1978;34:709–12.

224. Healy MJR. Probability and decisions. Arch Dis Child 1994;71:90–4.

225. Hedges LV. Bayesian meta-analysis. In: Everitt BS, Dunn G, editors. Statistical analysis of medical data: new developments. London: Arnold; 1998. p. 251–76.

226. Heisterkamp SH, Doornbos G, Gankema M. Disease mapping using empirical Bayes and Bayes methods on mortality statistics in The Netherlands. Stat Med 1993;12:1895–913.

227. Heitjan DF. Bayesian interim analysis of Phase II cancer clinical trials. Stat Med 1997;16:1791–802.

228. Heitjan DF, Houts PS, Harvey HA. A decision-theoretic evaluation of early stopping rules. Stat Med 1992;11:673–83.

229. Heitjan DF, Moskowitz AJ, Whang W. Bayesian estimation of cost-effectiveness ratios from clinical trials. Health Econ 1999;8:191–201.

230. Henderson WG, Moritz T, Goldman S, Copeland J, Sethi G. Use of cumulative meta-analysis in the design, monitoring, and final analysis of a clinical trial: a case study. Controlled Clin Trials 1995;16(5):331–41.

231. Herson J. Predictive probability early termination for Phase II clinical trials. Biometrics 1979;35:775–83.


232. Herson J. Bayesian analysis of cancer clinical trials – an introduction to 4 papers. Stat Med 1992;11:1–3.

233. Higgins JP, Whitehead A. Borrowing strength from external trials in a meta-analysis. Stat Med 1996;15:2733–49.

234. Hilsenbeck SG. Early termination of a Phase II clinical trial. Controlled Clin Trials 1988;9(3):177–88.

235. Hlatky MA. Using databases to evaluate therapy. Stat Med 1991;10(4):647–52.

236. Ho CH. Some frequentist properties of a Bayesian method in clinical trials. Biometr J 1991;33:735–40.

237. Holland J. The Reverend Thomas Bayes, FRS (1702–61). J R Stat Soc Ser A 1962;125.

238. Hornberger J, Eghtesady P. The cost–benefit of a randomised trial to a health care organization. Controlled Clin Trials 1998;19:198–211.

239. Hornberger JC, Brown BW, Halpern J. Designing a cost-effective clinical trial. Stat Med 1995;14:2249–59.

240. Howson C, Urbach P. Scientific reasoning: the Bayesian approach. La Salle (IL): Open Court; 1989.

241. Hsu PH, Laddu A, Jordan DC, Stoll RW, Deverka PA. Use of Bayesian drug causality assessment of adverse events in a United States pharmaceutical environment. Clin Pharmacol Therapeutics 1992;51:124.

242. Hughes MD. Practical reporting of Bayesian analyses of clinical trials. Drug Inform J 1991;25:381–93.

243. Hughes MD. Reporting Bayesian analyses of clinical trials. Stat Med 1993;12:1651–63.

244. Hughes MD, Pocock SJ. Stopping rules and estimation problems in clinical trials. Stat Med 1988;7:1231–42.

245. Human Fertilisation and Embryology Authority. The patients’ guide to DI and IVF clinics. 2nd ed. London: The Authority; 1996.

246. Hutton JL. The ethics of randomised controlled trials: a matter of statistical belief? Health Care Analysis 1996;4:95–102.

247. Hutton JL, Owens RG. Bayesian sample size calculations and prior beliefs about child sexual abuse. Statistician 1993;42:399–404.

248. International Conference on Harmonisation E9 Expert Working Group. Statistical principles for clinical trials: ICH harmonised tripartite guideline. Stat Med 1999;18:1905–42. URL: http://www.ich.org/ich5e.html

249. Jennison C. Discussion of Breslow (1990). Stat Sci 1990;5(3).

250. Jennison C, Turnbull BW. Statistical approaches to interim monitoring of medical trials: a review and commentary. Stat Sci 1990;5:299–317.

251. Jones B, Teather D, Wang J, Lewis JA. A comparison of various estimators of a treatment difference for a multi-centre clinical trial. Stat Med 1998;17:1767–77.

252. Jones DA. A Bayesian approach to economic evaluation of health care technologies. In: Spilker B, editor. Quality of life and pharmacoeconomics in clinical trials. New York: Raven Press; 1995.

253. Jones DA. The role of probability distributions in economic evaluations. Br J Med Econ 1995;8:137–46.

254. Jones DR. Meta-analysis: weighing the evidence. Stat Med 1995;14:137–49.

255. Joseph L, Du Berger R, Belisle P. Bayesian and mixed Bayesian/likelihood criteria for sample size determination. Stat Med 1997;16:769–81.

256. Joseph L, Wolfson DB, Du Berger R. Sample size calculations for binomial proportions via highest posterior density intervals. J R Stat Soc Ser D 1995;44:143–54.

257. Joseph L, Wolfson DB, Du Berger R, Lyle RM. Change-point analysis of a randomised trial on the effects of calcium supplementation on blood pressure. In: Berry and Stangl.61

258. Kadane J. Bayesian methods and ethics in a clinical trial design. New York: Wiley; 1996.

259. Kadane J, Sedransk N. Towards a more ethical clinical trial. In: Bernardo J, DeGroot MH, Lindley DV, Smith AFM, editors. Bayesian statistics 1. Valencia: University Press; 1980. p. 329–38.

260. Kadane J, Wolfson L. Priors for the design and analysis of clinical trials. In: Berry and Stangl,61 p. 157–84.

261. Kadane JB. Progress toward a more ethical method for clinical trials. J Med Phil 1986;11:385–404.

262. Kadane JB. Prime time for Bayes. Controlled Clin Trials 1995;16:313–18.

263. Kadane JB, Seidenfeld T. Randomization in a Bayesian perspective. J Stat Planning Inference 1990;25:329–45.

264. Kadane JB, Vlachos P, Wieand S. Decision analysis for a data monitoring committee of a clinical trial. In: Giron FJ, Martinez ML, editors. Proceedings of the International Workshop on Decision Analysis Applications. Boston: Kluwer; 1998. p. 115–21.

265. Kadane JB, Wolfson L. Experiences in elicitation. Statistician 1997;46:1–17.


266. Kass R, Raftery A. Bayes factors and model uncertainty. J Am Stat Assoc 1995;90:773–95.

267. Kass RE, Greenhouse JB. Comments on ‘Investigating therapies of potentially great benefit: ECMO’ by J H Ware. Stat Sci 1989;4:310–17.

268. Kass RE, Wasserman L. The selection of prior distributions by formal rules. J Am Stat Assoc 1996;91:1343–70.

269. Keiding N. Bayesian approaches to randomised trials – discussion. J R Stat Soc Ser A 1994;157:387–416.

270. Kleinman KP, Ibrahim JG, Laird NM. A Bayesian framework for intent-to-treat analysis with missing data. Biometrics 1998;54:265–78.

271. Kober L, Torp-Pedersen C, Cole D, Hampton JR, Camm AJ. Bayesian interim statistical analysis of randomised trials: the case against. Lancet 1997;349:1168–9.

272. Koch GG. Summary and discussion for ‘Statistical issues in the pharmaceutical industry: analysis and reporting of Phase III clinical trials including kinetic/dynamic analysis and Bayesian analysis’. Drug Inform J 1991;25:433–7.

273. Korn EL. Projection from previous studies: a caution. Controlled Clin Trials 1990;11:67–9.

274. Korn EL, Midthune D, Chen TT, Rubinstein LV, Christian MC, Simon RM. A comparison of two Phase I trial designs. Stat Med 1994;13:1799–806.

275. Korn EL, Simon R. Data monitoring committees and problems of lower than expected accrual or event rates. Controlled Clin Trials 1996;17:526–35.

276. Korn EL, Yu KF, Miller LL. Stopping a clinical trial very early because of toxicity – summarizing the evidence. Controlled Clin Trials 1993;14:286–95.

277. Lachin JM. Sequential clinical trials for normal variates using interval composite hypotheses. Biometrics 1981;37:87–101.

278. Lan KKG, Wittes J. The B-value – a tool for monitoring data. Biometrics 1988;44:579–85.

279. Lanctot KL, Kwok MCO, Busto U, Naranjo CA. Bayesian evaluation of spontaneous reports of pancreatitis. Clin Pharmacol Therapeutics 1996;59:PII 4.

280. Lane N. Common sense, nonsense and statistics. J R Soc Med 1999;92:202–5.

281. Lang T, Secic M. Considering ‘prior probabilities’: reporting Bayesian statistical analyses. In: How to report statistics in medicine. American College of Physicians; 1997. p. 231–5.

282. Lange N, Carlin BP, Gelfand AE. Hierarchical Bayes models for the progression of HIV infection using longitudinal CD4 T-cell numbers. J Am Stat Assoc 1992;87:615–26.

283. Larose DT, Dey DK. Grouped random effects models for Bayesian meta-analysis. Stat Med 1997;16:1817–29.

284. Lau J, Schmid CH, Chalmers TC. Cumulative meta-analysis of clinical trials builds evidence for exemplary medical care. J Clin Epidemiol 1995;48:45–57.

285. Lavori PW, Dawson R, Shera D. A multiple imputation strategy for clinical trials with truncation of patient data. Stat Med 1995;14:1913–25.

286. Lecoutre B, Derzko G, Grouin JM. Bayesian predictive approach for inference about proportions. Stat Med 1995;14:1057–63.

287. Lee PM. Bayesian statistics: an introduction. London: Edward Arnold; 1989.

288. Legler JM, Ryan LM. Latent variable models for teratogenesis using multiple binary outcomes. J Am Stat Assoc 1997;92:13–20.

289. Lehmann HP, Nguyen B. Bayesian communication of research results over the world wide web. MD Comput 1997;14:353–9.

290. Lehmann HP, Shachter RD. A physician-based architecture for the construction and use of statistical models. Methods Inform Med 1994;33:423–32.

291. Lehmann HP, Wachter MR. Implementing the Bayesian paradigm: reporting research results over the world-wide web. In: Proceedings of the AMIA Annual Fall Symposium; 1996. p. 433–7.

292. Lewis J. Bayesian approaches to randomised trials – discussion. J R Stat Soc Ser A 1994;157:387–416.

293. Lewis R. Bayesian hypothesis testing: interim analysis of a clinical trial evaluating phenytoin for the prophylaxis of early post-traumatic seizures in children. In: Berry and Stangl,61 p. 279–96.

294. Lewis RJ, Berry DA. Group sequential clinical trials – a classical evaluation of Bayesian decision-theoretic designs. J Am Stat Assoc 1994;89:1528–34.

295. Lewis RJ, Wears RL. An introduction to the Bayesian analysis of clinical trials. Ann Emerg Med 1993;22:1328–36.

296. Li Z, Begg CB. Random effects models for combining results from controlled and uncontrolled studies in a meta-analysis. J Am Stat Assoc 1994;89:1523–7.

297. Lilford R, Gudmundsson S, James D, Mason G, Neales K, Pearce M, et al. Formal measurement of clinical uncertainty: prelude to a trial in perinatal medicine. BMJ 1994;308:111–12.


298. Lilford R, Royston G. Decision analysis in the selection, design and application of clinical and health services research. J Health Serv Res Policy 1998;3:159–66.

299. Lilford RJ, Braunholtz D. The statistical basis of public policy: a paradigm shift is overdue. BMJ 1996;313:603–7.

300. Lilford RJ, Jackson J. Equipoise and the ethics of randomization. J R Soc Med 1995;88:552–9.

301. Lilford RJ, Pauker SG, Braunholtz DA, Chard J. Getting research findings into practice: decision analysis and the implementation of research findings. BMJ 1998;317(7155):405–9.

302. Lilford RJ, Thornton JG, Braunholtz D. Clinical trials and rare diseases – a way out of a conundrum. BMJ 1995;311:1621–5.

303. Lindley D. The effect of ethical design considerations on statistical analysis. Appl Stat 1975;24:218–28.

304. Lindley D. Bayesian approaches to randomised trials – discussion. J R Stat Soc Ser A 1994;157:387–416.

305. Lindley D. The choice of sample size. Statistician 1997;46:129–38.

306. Lindley DV. Scoring rules and the inevitability of probability. Int Stat Rev 1982;50:1–26.

307. Lindley DV. Making decisions. 2nd ed. Chichester: Wiley; 1985.

308. Lindley DV. Decision analysis and bioequivalence trials. Stat Sci 1998;13:136–41.

309. Louis TA. Using empirical Bayes methods in biopharmaceutical research. Stat Med 1991;10:811–29.

310. Luce BR, Claxton K. Redefining the analytical approach to pharmacoeconomics. Health Econ 1999;8:187–9.

311. Manning WG, Fryback DG, Weinstein MC. Reflecting uncertainty in cost-effectiveness analysis. In: Gold MR, Siegel JR, Weinstein MC, Russell LB, editors. Cost effectiveness in health and medicine. New York: Oxford University Press; 1996. p. 247–75.

312. Marshall EC, Spiegelhalter DJ. League tables of in vitro fertilisation clinics: how confident can we be about the rankings? BMJ 1998;317:1701–4.

313. Marshall RJ. Bayesian analysis of case–control studies. Stat Med 1988;7:1223–30.

314. Martinez ML. Bioequivalence assessment: population and individual bioequivalence. An Real Acad Farm Inst Espana 1997;63(3):493–531.

315. Matchar DB, Samsa GP, Matthews JR, Ancukiewicz M, Parmigiani G, Hasselblad V, et al. The stroke prevention policy model: linking evidence and clinical decisions. Ann Intern Med 1997;127:704–11.

316. Matsuyama Y, Sakamoto J, Ohashi Y. A Bayesian hierarchical survival model for the institutional effects in a multi-centre cancer clinical trial. Stat Med 1998;17(17):1893–908.

317. Matthews JNS. Small clinical trials – are they all bad? Stat Med 1995;14:115–26.

318. Matthews R. Faith, hope and statistics. New Sci 1997;156:36.

319. Matthews R. Fact versus factions: the use and abuse of subjectivity in scientific research. Cambridge: The European Science and Environment Forum; 1998. Technical report.

320. McIntosh MW. The population risk as an explanatory variable in research synthesis of clinical trials. Stat Med 1996;15(16):1713–28.

321. McPherson K. On choosing the number of interim analyses in clinical trials. Stat Med 1982;1:25–36.

322. Mehta CR, Cain KC. Charts for the early stopping of pilot studies. J Clin Oncol 1984;2:676–82.

323. Meier P. Statistics and medical experimentation. Biometrics 1975;31:511–29.

324. Metzler CM. Sample sizes for bioequivalence studies. Stat Med 1991;10:961–70.

325. Miller MA, Seaman JW. A Bayesian approach to assessing the superiority of a dose combination. Biometr J 1998;40:43–55.

326. Moller S. An extension of the continual reassessment methods using a preliminary up-and-down design in a dose finding study in cancer patients, in order to investigate a greater range of doses. Stat Med 1995;14:911–22. Discussion: 923.

327. Morris CN, Normand SL. Hierarchical models for combining information and for meta-analysis. In: Bernardo et al.,41 p. 321–44.

328. Moussa MAA. Exact, conditional and predictive power in planning clinical trials. Controlled Clin Trials 1989;10:378–85.

329. Muliere P, Walker S. A Bayesian nonparametric approach to determining a maximum tolerated dose. J Stat Planning Inference 1997;61:339–53.

330. Müller P, Parmigiani G, Schildkraut J, Tardella L. A Bayesian hierarchical approach for combining case–control and prospective studies. Biometrics 1999;55:858–66.

331. Murphy A, Winkler R. Reliability of subjective probability forecasts of precipitation and temperature. Appl Stat 1977;26:41–7.

References


332. Nicolas P, Tod M, Petitjean O. Review and use of decision rules for bioequivalence studies. Therapie 1993;48:15–22.

333. Normand SL, Glickman ME, Gatsonis CA. Statistical methods for profiling providers of medical care: issues and applications. J Am Stat Assoc 1997;92:803–14.

334. Normand ST. Meta-analysis: formulating, evaluating, combining and reporting. Stat Med 1999;18:321–59.

335. Nurminen M, Mutanen P. Bayesian analysis of case–control studies. Stat Med 1989;8:1023–4.

336. O’Brien PC. Data and safety monitoring. In: Armitage and Colton,20 p. 1058–66.

337. US General Accounting Office. Cross design synthesis: a new strategy for medical effectiveness research. Washington, DC: General Accounting Office; 1992.

338. O’Hagan A. Kendall’s advanced theory of statistics. Vol 2B. Bayesian inference. London: Arnold; 1994.

339. O’Neill RT. Conclusions: 2. Stat Med 1994;13:1493–9.

340. O’Quigley J. Estimating the probability of toxicity at the recommended dose following a Phase I clinical trial in cancer. Biometrics 1992;48:853–62.

341. O’Quigley J, Chevret S. Methods for dose finding studies in cancer clinical trials – a review and results of a Monte Carlo study. Stat Med 1991;10:1647–64.

342. O’Quigley J, Pepe M, Fisher L. Continual reassessment method – a practical design for Phase I clinical trials in cancer. Biometrics 1990;46:33–48.

343. O’Rourke K. Two cheers for Bayes. Controlled Clin Trials 1996;17:350–2.

344. Palmer CR. A comparative Phase II clinical trials procedure for choosing the best of three treatments. Stat Med 1991;10:1327–40.

345. Palmer CR. Ethics and statistical methodology in clinical trials. J Med Ethics 1993;19:219–22.

346. Palmer CR, Rosenberger WF. Ethics and practice: alternative designs for Phase III randomised clinical trials. Controlled Clin Trials 1999;20:172–86.

347. Papineau D. The virtues of randomization. Br J Phil Sci 1994;45:437–50.

348. Parmar MKB, Spiegelhalter DJ, Freedman LS. The CHART trials: Bayesian design and monitoring in practice. Stat Med 1994;13:1297–312.

349. Parmar MKB, Ungerleider RS, Simon R. Assessing whether to perform a confirmatory randomised clinical trial. J Natl Cancer Inst 1996;88:1645–51.

350. Parmigiani G. Decision models in screening for breast cancer. In: Bernardo J, Berger J, Dawid A, Smith A, editors. Bayesian statistics 6. Oxford: Oxford University Press; 1999. p. 525–46.

351. Parmigiani G, Ancukiewicz M, Matchar D. Decision models in clinical recommendations development: the stroke prevention policy model. In: Berry and Stangl,61 p. 207–33.

352. Parmigiani G, Kamlet M. A cost-utility analysis of alternative strategies in screening for breast cancer. In: Gatsonis C, Hodges J, Kass R, Singpurwalla N, editors. Case studies in Bayesian statistics. Berlin: Springer-Verlag; 1993. p. 390–402.

353. Parmigiani G, Samsa GP, Ancukiewicz M, Lipscomb J, Hasselblad V, Matchar DB. Assessing uncertainty in cost-effectiveness analyses: application to a complex decision model. Med Decision Making 1997;17:390–401.

354. Pepple PA, Choi SC. Analysis of incomplete data under nonrandom mechanisms – Bayesian inference. Comm Stat Simulation Comput 1994;23:743–67.

355. Pepple PA, Choi SC. Bayesian approach to two-stage Phase II trial. J Biopharm Stat 1997;7(2):271–86.

356. Perneger T. What’s wrong with Bonferroni adjustments. BMJ 1998;316:1236–8.

357. Peto R. Discussion of ‘On the allocation of treatments in sequential medical trials’ by J Bather. Int Stat Rev 1985;53:1–13.

358. Peto R, Baigent C. Trials: the next 50 years. BMJ 1998;317:1170–1.

359. Peto R, Collins R, Gray R. Large-scale randomised evidence – large, simple trials and overviews of trials. J Clin Epidemiol 1995;48:23–40.

360. Pocock S. The combination of randomised and historical controls in clinical trials. J Chronic Dis 1976;29:175–88.

361. Pocock S. Clinical trials: a practical approach. Chichester: Wiley; 1983.

362. Pocock S. Statistical and ethical issues in monitoring clinical trials. BMJ 1992;305:235–40.

363. Pocock S, Spiegelhalter D. Domiciliary thrombolysis by general practitioners. BMJ 1992;305:1015.

364. Pocock SJ, Hughes MD. Practical problems in interim analyses, with particular regard to estimation. Controlled Clin Trials 1989;10:209S–21S.

365. Pocock SJ, Hughes MD. Estimation issues in clinical trials and overviews. Stat Med 1990;9:657–71.

366. Pocock SJ. Bayesian approaches to randomised trials – discussion. J R Stat Soc Ser A 1994;157:387–416.

Health Technology Assessment 2000; Vol. 4: No. 38


367. Pogue JM, Yusuf S. Cumulating evidence from randomised trials: utilizing sequential monitoring boundaries for cumulative meta-analysis. Controlled Clin Trials 1997;18(6):580–93.

368. Qian J, Stangl D, George S. A Weibull model for survival data: using prediction to decide when to stop a clinical trial. In: Berry and Stangl.61

369. Raab GM. Conflict between prior and current data. Appl Stat 1996;45(2):247–51.

370. Racine A, Grieve AP, Fluhler H, Smith AFM. Bayesian methods in practice – experiences in the pharmaceutical industry. Appl Stat 1986;35:93–150.

371. Racine-Poon A, Grieve AP, Fluhler H, Smith AFM. A 2-stage procedure for bioequivalence studies. Biometrics 1987;43:847–56.

372. Racine-Poon A, Wakefield J. Bayesian analysis of population pharmacokinetic and instantaneous pharmacodynamic relationships. In: Berry and Stangl,61 p. 321–54.

373. Raghunathan TE, Siscovick DS. A multiple-imputation analysis of a case–control study of the risk of primary cardiac arrest among pharmacologically treated hypertensives. Appl Stat J R Stat Soc Ser C 1996;45:335–52.

374. Raudenbush SW, Bryk AS. Empirical Bayes meta-analysis. J Educ Stat 1985;10:75–98.

375. Richards B, Blandy J, Bloom HGJ, MRC Bladder Cancer Working Party. The effect of intravesical thiotepa on tumor recurrence after endoscopic treatment of newly-diagnosed superficial bladder cancer – a further report with long-term follow-up of a Medical Research Council randomised trial. Br J Urol 1994;73:632–8.

376. Richardson S, Gilks WR. A Bayesian approach to measurement error problems in epidemiology using conditional independence models. Am J Epidemiol 1993;138:430–42.

377. Richardson S, Monfort C, Green M, Draper G, Muirhead C. Spatial variation of natural radiation and childhood leukaemia incidence in Great Britain. Stat Med 1995;14:2487–501.

378. Rittenhouse BE. Exorcising protocol-induced spirits: making the clinical trial relevant for economics. Med Decision Making 1997;17:331–9.

379. Rogatko A. Bayesian approach for meta-analysis of controlled clinical trials. Commun Stat Theory Methods 1992;21:1441–62.

380. Rosenbaum PR, Rubin DB. Sensitivity of Bayes inference with data-dependent stopping rules. Am Stat 1984;38:106–9.

381. Rosner GL, Berry DA. A Bayesian group sequential design for a multiple arm randomised clinical trial. Stat Med 1995;14:381–94.

382. Rothman K. No adjustments are needed for multiple comparisons. Epidemiology 1990;1:43–6.

383. Rubin D. Bayesian inference for causal effects: the role of randomization. Ann Stat 1978;6:34–58.

384. Rubin DB. A new perspective. In: Rubin D, Wachter K, Straf M, editors. The future of meta-analysis. New York: Russell Sage Foundation; 1992. p. 155–65.

385. Rubin DB. More powerful randomization-based p-values in double-blind trials with non-compliance. Stat Med 1998;17:371–85; Discussion: 387–9.

386. Ryan L. Using historical controls in the analysis of developmental toxicity data. Biometrics 1993;49:1126–35.

387. Samsa GP, Reutter RA, Parmigiani G, Ancukiewicz M, Abrahamse P, Lipscomb J, et al. Performing cost-effectiveness analysis by integrating randomised trial data with a comprehensive decision model: application to treatment of acute ischemic stroke. J Clin Epidemiol 1999;52:259–71.

388. Sargent D, Carlin B. Robust Bayesian design and analysis of clinical trials via prior partitioning (with discussion). In: Berger JO, editor. Bayesian robustness: IMS lecture notes – monograph series, 29. Hayward (CA): Institute of Mathematical Statistics; 1996. p. 175–93.

389. Sasahara A, Cole T, Ederer F, Murray J, Wenger N, Sherry S, et al. Urokinase pulmonary embolism trial, a national cooperative study. Circulation 1973;47(suppl 2):1–108.

390. Saunders M, Dische S, Barrett A, Harvey A, Gibson D, Parmar M, et al. Continuous hyperfractionated accelerated radiotherapy (CHART) versus conventional radiotherapy in non-small-cell lung cancer: a randomised multicentre trial. Lancet 1997;350:161–5.

391. Savage L. Elicitation of personal probabilities and expectations. J Am Stat Assoc 1971;66:783–801.

392. Schervish MJ. p values: what they are and what they are not. Am Stat 1996;50:203–6.

393. Schmid CH, Lau J, McIntosh MW, Cappelleri JC. An empirical study of the effect of the control rate as a predictor of treatment efficacy in meta-analysis of clinical trials. Stat Med 1998;17(17):1923–42.

394. Selwyn MR, Dempster AP, Hall NR. A Bayesian approach to bioequivalence for the 2 × 2 changeover design. Biometrics 1981;37:11–21.

395. Selwyn MR, Hall NR. On Bayesian methods for bioequivalence. Biometrics 1984;40:1103–8.

396. Senn S. Some statistical issues in project prioritization in the pharmaceutical industry. Stat Med 1996;15:2689–702.


397. Senn S. Statistical basis of public policy – present remembrance of priors past is not the same as a true prior. BMJ 1997;314:73.

398. Senn S. Statistical issues in drug development. Chichester: Wiley; 1997.

399. Shachter R, Eddy DM, Hasselblad V. An influence diagram approach to medical technology assessment. In: Oliver RM, Smith JQ, editors. Influence diagrams, belief nets and decision analysis. Chichester: Wiley; 1990. p. 321–50.

400. Sheiner LB. The intellectual health of clinical drug evaluation. Clin Pharmacol Therapeutics 1991;50:4–9.

401. Simes RJ. Application of statistical decision theory to treatment choices: implications for the design and analysis of clinical trials. Stat Med 1986;4:1401–9.

402. Simon R. Adaptive treatment assignment methods and clinical trials. Biometrics 1977;33:743–9.

403. Simon R. Statistical tools for subset analysis in clinical trials. Recent Results Cancer Res 1988;111:55–66.

404. Simon R. Problems of multiplicity in clinical trials. J Stat Planning Inference 1994;42:209–21.

405. Simon R. Some practical aspects of the interim monitoring of clinical trials. Stat Med 1994;13:1401–9.

406. Simon R, Dixon DO, Friedlin B. Bayesian subset analysis of a clinical trial for the treatment of HIV infections. In: Berry and Stangl,61 p. 555–76.

407. Simon R, Freedman LS. Bayesian design and analysis of two × two factorial clinical trials. Biometrics 1997;53:456–64.

408. Simon R, Thall PF, Ellenberg SS. New designs for the selection of treatments to be tested in randomised clinical trials. Stat Med 1994;13:417–29.

409. Skene AM, Wakefield JC. Hierarchical models for multicentre binary response studies. Stat Med 1990;9:919–29.

410. Smith T, Spiegelhalter DJ, Parmar MKB. Bayesian meta-analysis of randomised trials using graphical models and BUGS. In: Berry and Stangl,61 p. 411–27.

411. Smith TC, Abrams KR, Jones DR. Using hierarchical models in generalised synthesis of evidence: an example based on studies of breast cancer screening. Leicester: Department of Epidemiology and Public Health, University of Leicester; 1995. Technical report 95-02.

412. Smith TC, Spiegelhalter DJ, Thomas A. Bayesian approaches to random-effects meta-analysis: a comparative study. Stat Med 1995;14:2685–99.

413. Souhami RL, Craft AW, Van der Eijken, Nooij M, Spooner D, Bramwell VHC, et al. Randomised trial of two regimens of chemotherapy in operable osteosarcoma: a study of the European Osteosarcoma Intergroup. Lancet 1997;350:911–17.

414. Spiegelhalter D, Freedman L. Bayesian approaches to clinical trials. In: Bernardo et al.,42 p. 453–77.

415. Spiegelhalter D, Myles J, Jones D, Abrams K. An introduction to Bayesian methods in health technology assessment. BMJ 1999;319:508–12.

416. Spiegelhalter DJ. Probabilistic prediction in patient management and clinical trials. Stat Med 1986;5:421–33.

417. Spiegelhalter DJ. Bayesian graphical modelling: a case-study in monitoring health outcomes. Appl Stat J R Stat Soc Ser C 1998;47:115–33.

418. Spiegelhalter DJ, Freedman LS. A predictive approach to selecting the size of a clinical trial, based on subjective clinical opinion. Stat Med 1986;5:1–13.

419. Spiegelhalter DJ, Freedman LS, Blackburn PR. Monitoring clinical trials: conditional or predictive power? Controlled Clin Trials 1986;7:8–17.

420. Spiegelhalter DJ, Freedman LS, Parmar MKB. Applying Bayesian ideas in drug development and clinical trials. Stat Med 1993;12:1501–17.

421. Spiegelhalter DJ, Freedman LS, Parmar MKB. Bayesian approaches to randomised trials (with discussion). J R Stat Soc Ser A 1994;157:357–87.

422. Spiegelhalter DJ, Harris NL, Bull K, Franklin RCG. Empirical evaluation of prior beliefs about frequencies: methodology and a case study in congenital heart disease. J Am Stat Assoc 1994;89:435–43.

423. Spiegelhalter DJ, Thomas A, Best NG, Gilks WR. BUGS: Bayesian inference using Gibbs sampling, version 0.5 (version ii). Cambridge: MRC Biostatistics Unit; 1996.

424. Stallard N. Sample size determination for Phase II clinical trials based on Bayesian decision theory. Biometrics 1998;54:279–94.

425. Stangl D. Hierarchical analysis of continuous-time survival models. In: Berry and Stangl,61 p. 429–50.

426. Stangl D, Berry DA. Bayesian statistics in medicine: where we are and where we should be going. Sankhya Ser B 1998;60:176–95.

427. Stangl DK. Prediction and decision-making using Bayesian hierarchical models. Stat Med 1995;14:2173–90.

428. Stangl DK, Greenhouse JB. Assessing placebo response using Bayesian hierarchical survival models. Lifetime Data Anal 1998;4:5–28.


429. Staquet MJ, Rozencweig M, Von Hoff DD, Muggia FM. The delta and epsilon errors in the assessment of cancer clinical trials. Cancer Treatment Rep 1979;63:1917–21.

430. Staquet MJ, Sylvester RJ. A decision theory approach to Phase II clinical trials. Biomedicine 1977;26:262–6.

431. Stijnen T, Van Houwelingen JC. Empirical Bayes methods in clinical trials meta-analysis. Biometric J 1990;32:335–46.

432. Strauss N, Simon R. Investigating a sequence of randomised Phase II trials to discover promising treatments. Stat Med 1995;14(13):1479–89.

433. Su XY, Po ALW. Combining event rates from clinical trials: comparison of Bayesian and classical methods. Ann Pharmacotherapy 1996;30:460–5.

434. Sutton A, Abrams KR, Jones DR, Sheldon TA, Song F. Systematic reviews of trials and other studies. HTA 1998;2(19).

435. Sylvester RJ. A Bayesian approach to sample size determination in Phase II cancer clinical trials. Controlled Clin Trials 1984;5:305.

436. Sylvester RJ. A Bayesian approach to the design of Phase II clinical trials. Biometrics 1988;44:823–36.

437. Sylvester RJ, Staquet M. Design of Phase II trials in cancer using decision theory. Cancer Treatment Rep 1988;64:519–24.

438. Tamura RN, Faries DE, Andersen JS, Heiligenstein JH. A case-study of an adaptive clinical trial in the treatment of out-patients with depressive disorder. J Am Stat Assoc 1994;89:768–76.

439. Tan SB, Smith AFM. Exploratory thoughts on clinical trials with utilities. Stat Med 1998;17:2771–91.

440. Tarone R. The use of historical control information in testing for a trend in proportions. Biometrics 1982;38:215–20.

441. Templeton A, Morris JK, Parslow W. Factors that affect outcome of in vitro fertilisation treatment. Lancet 1996;348:1402–6.

442. Ten Centre Study Group. Ten centre study of artificial surfactant (artificial lung expanding compound) in very premature babies. BMJ 1987;294:991–6.

443. Teo KK, Yusuf S, Collins R, Held PH, Peto R. Effects of intravenous magnesium in suspected acute myocardial infarction: overview of randomised trials. BMJ 1991;303:1499–503.

444. Thall PF, Estey EH. A Bayesian strategy for screening cancer treatments prior to Phase II clinical evaluation. Stat Med 1993;12:1197–211.

445. Thall PF, Lee JJ, Tseng CH, Estey EH. Accrual strategies for Phase I trials with delayed patient outcome. Stat Med 1999;18:1155–69.

446. Thall PF, Russell KE. A strategy for dose-finding and safety monitoring based on efficacy and adverse outcomes in Phase I/II clinical trials. Biometrics 1998;54:251–64.

447. Thall PF, Simon R. A Bayesian approach to establishing sample size and monitoring criteria for Phase II clinical trials. Controlled Clin Trials 1994;15:463–81.

448. Thall PF, Simon R. Practical Bayesian guidelines for Phase IIb clinical trials. Biometrics 1994;50:337–49.

449. Thall PF, Simon RM, Estey EH. Bayesian sequential monitoring designs for single-arm clinical trials with multiple outcomes. Stat Med 1995;14:357–79.

450. Thall PF, Simon RM, Estey EH. New statistical strategy for monitoring safety and efficacy in single-arm clinical trials. J Clin Oncol 1996;14:296–303.

451. Thall PF, Sung H. Some extensions and applications of a Bayesian strategy for monitoring multiple outcomes in clinical trials. Stat Med 1998;17(14):1563–80.

452. Thompson M. Decision-analytic determination of study size. Med Decision Making 1981;1:165–79.

453. Thompson SG, Smith TC, Sharp SJ. Investigating underlying risk as a source of heterogeneity in meta-analysis. Stat Med 1997;16(23):2741–58.

454. Tukey J. Some thoughts on clinical trials, especially problems of multiplicity. Science 1977;198:679–84.

455. Tunis SR, Sheinhait IA, Schmid CH, Bishop DJ, Ross SD. Lansoprazole compared with histamine(2)-receptor antagonists in healing gastric ulcers: a meta-analysis. Clin Therapeutics 1997;19:743–57.

456. Tversky A. Assessing uncertainty (with discussion). J R Stat Soc B 1974;36:148–59.

457. Tweedie RL, Scott DJ, Biggerstaff BJ, Mengersen KL. Bayesian meta-analysis, with application to studies of ETS and lung cancer. Lung Cancer 1996;14:S171–S194.

458. University Group Diabetes Program. A study of the effects of hypoglycemic agents on vascular complications in patients with adult onset diabetes. Diabetes 1970;19(suppl 2):747–830.

459. Urbach P. The value of randomization and control in clinical trials. Stat Med 1993;12:1421–31.

460. US Food and Drug Administration. Semiannual guidance agenda. Federal Register 1998;63(212):59317–26.


461. US Food and Drug Administration. Transcript of Cardiovascular and Renal Drugs Advisory Committee meeting, 26 June 1997. URL: http://www.fda.gov/ohrms/dockets/ac/97/transcpt/3320t1.pdf; 1998.

462. US Food and Drug Administration. Guidance for industry: population pharmacokinetics. URL: http://www.fda.gov.cder/guidance/index.htm; 1999.

463. US Food and Drug Administration. Summary of safety and effectiveness data for T-scan breast scanner. URL: http://www.fda.gov/cdrh/pdf/p970033b.pdf; 1999.

464. Van Houwelingen HC. The future of biostatistics: expecting the unexpected. Stat Med 1997;16:2773–84.

465. Van Houwelingen HC, Zwinderman K, Stijnen T. A bivariate approach to meta-analysis. Stat Med 1993;12:2272–84.

466. Wakefield J, Bennett J. The Bayesian modeling of covariates for population pharmacokinetic models. J Am Stat Assoc 1996;91:917–27.

467. Wakefield J, Walker S. A population approach to initial dose selection. Stat Med 1997;16:1135–49.

468. Waller SE, Duncan D. A Bayes rule for the symmetric multiple comparison problem. J Am Stat Assoc 1969;64:1484–508.

469. Ware J. Investigating therapies of potentially great benefit: ECMO (with discussion). Stat Sci 1989;4:298–340.

470. Ware JH, Muller JE, Braunwald E. The futility index: an approach to the cost-effective termination of randomised clinical trials. Am J Med 1985;78:635–43.

471. Weinstein M, Fineberg H. Clinical decision analysis. Philadelphia: Saunders; 1980.

472. Westfall PH, Johnson WO, Utts JM. A Bayesian perspective on the Bonferroni adjustment. Biometrika 1997;84:419–27.

473. Whitehead J. Designing Phase II studies in the context of a program of clinical research. Biometrics 1985;41:373–83.

474. Whitehead J. Sample sizes for Phase II and Phase III clinical trials – an integrated approach. Stat Med 1986;5:459–64.

475. Whitehead J. Letter to the editor. Controlled Clin Trials 1991;12:340–4.

476. Whitehead J. The case for frequentism in clinical trials. Stat Med 1993;12:1405–19.

477. Whitehead J. Bayesian decision procedures with application to dose-finding studies. Int J Pharm Med 1997;11(4):201–7.

478. Whitehead J. The design and analysis of sequential clinical trials. 2nd ed. Chichester: Wiley; 1997.

479. Whitehead J, Brunier H. Bayesian decision procedures for dose determining experiments. Stat Med 1995;14:885–93.

480. Woods KL. Mega-trials and management of acute myocardial infarction. Lancet 1995;346:611–14.

481. Woods KL, Fletcher S, Roffe C, Haider Y. Intravenous magnesium sulphate in suspected acute myocardial infarction: results of the second Leicester intravenous magnesium intervention trial (LIMIT-2). Lancet 1992;339(8809):1553–8.

482. Yao TJ, Begg CB, Livingston PO. Optimal sample size for a series of pilot trials of new agents. Biometrics 1996;52:992–1001.

483. Yusuf S, Flather M. Magnesium in acute myocardial infarction. BMJ 1995;310:751–2.

484. Yusuf S, Teo K, Woods K. Intravenous magnesium in acute myocardial infarction: an effective, safe, simple and inexpensive treatment. Circulation 1993;87:2043–6.

485. Zelen M. Play the winner rule and the controlled clinical trial. J Am Stat Assoc 1969;64:131–46.

486. Zelen M. Discussion of Breslow (1990). Stat Sci 1990;5(3).

487. Zelen M, Parker RA. Case control studies and Bayesian inference. Stat Med 1986;5:261–9.

488. Zucker DR, Schmid CH, McIntosh MW, D’Agostino RB, Selker HP, Lau J. Combining single patient (n-of-1) trials to estimate population treatment effects and to evaluate individual patient responses to treatment. J Clin Epidemiol 1997;50:401–10.



Appendix 1

Three-star applications

TABLE 24 Summary of three-star applications with respect to BayesWatch criteria

Author(s) | Year | Design | Model | Prospective study | Elicited prior from experts | Loss function used | Computations | Sensitivity analysis
Abrams et al.1 * | 1994 | RCT | B | ✗ | ✓ | ✓ | C | ✓
Abrams et al.2 | 1996 | RCT | S | ✗ | ✗ | ✗ | L | ✓
Berry51 | 1989 | RCT | B | ✗ | ✗ | ✗ | NI | ✗
Berry et al.64 | 1994 | RCT | P | ✗ | ✗ | ✓✓ | D | ✓✓
Brophy and Joseph76 | 1995 | RCT | NA | ✗ | ✗ | ✗ | C | ✓✓
Brophy and Joseph77 | 1997 | RCT | NA | ✗ | ✗ | ✗ | C | ✗
Carlin et al.90 | 1993 | RCT | S | ✗ | ✓ | ✓ | NA/NI | ✓✓
DerSimonian129 | 1996 | RCT | NA | ✓ | ✗ | ✗ | C | ✓✓
Digman et al.133 | 1998 | RCT | NA | ✓ | ✗ | ✗ | C | ✓✓
Fayers et al.166 | 1997 | RCT | NA | ✓ | ✗ | ✓ | C | ✓✓
Fletcher et al.173 | 1993 | RCT | NA | ✗ | ✗ | ✓ | C | NR
Freedman and Spiegelhalter182 | 1992 | RCT | NA | ✗ | ✗ | ✓ | C | ✓✓
Freedman et al.183 | 1994 | RCT | NA | ✗ | ✗ | ✓ | C | ✓✓
George et al.191 | 1994 | RCT | S | ✗ | ✗ | ✗ | MCMC | ✓✓
Greenhouse and Wasserman206 | 1995 | RCT | B | ✗ | ✗ | ✗ | NI | ✓✓
Gustafson219 | 1996 | RCT | B | ✗ | ✗ | ✗ | NES | ✓✓
Hughes242 | 1991 | RCT | NA | ✓ | ✓ | ✗ | C | ✓✓
Kadane258 | 1996 | ETH | N | ✓ | ✓ | ✗ | C | ✓✓
Kass and Greenhouse267 | 1989 | RCT | B | ✗ | ✗ | ✗ | NES | ✓✓
Lewis293 | 1996 | RCT | B | ✓ | ✗ | ✓✓ | C | ✓✓
Lilford and Braunholtz299 | 1996 | M-A | NA | ✗ | ✓ | ✗ | C | ✓✓
Parmar et al.348 | 1994 | RCT | NA | ✓ | ✓ | ✓ | C | ✓✓
Parmar et al.349 | 1996 | RCT | NA | ✗ | ✗ | ✗ | C | ✓✓
Pocock and Hughes364 | 1989 | RCT | NA | ✗ | ✗ | ✗ | C | ✓✓
Pocock and Spiegelhalter363 | 1992 | RCT | NA | ✗ | ✓ | ✗ | C | ✗
Sasahara et al.389 | 1973 | RCT | NA | ✓ | ✗ | ✗ | C | ✗
Spiegelhalter et al.420 | 1993 | RCT | NA | ✓ | ✓ | ✓ | C | ✓✓
Spiegelhalter et al.421 | 1994 | RCT | NA | ✗ | ✗ | ✓ | C | ✓✓
Stangl427 | 1995 | RCT | S | ✗ | ✗ | ✗ | MCMC | ✗
Ware469 | 1989 | RCT | B | ✗ | ✗ | ✗ | A | ✗

Key: design (RCT, randomised controlled trial; ETH, ‘ethical’ study; M-A, meta-analysis); model (B, binomial; N, normal; NA, normal approximation; P, Poisson; S, survival); prospective analysis (✓, yes; ✗, no); elicited prior from experts (✓, yes; ✗, no); loss function used (✓✓, yes; ✓, just demands; ✗, no); computations (A, analytical/closed-form; C, conjugate; D, dynamic programming; L, Laplace; NA, normal approximations; NI, numerical integration; NES, not explicitly stated); sensitivity analysis (✓✓, full; ✓, reference only; ✗, no; NR, none reported)
* This three-star application is presented earlier in this report, in chapter 8 (see page 56)


What is a three-star application?

In chapter 1 we defined ‘three-star’ Bayesian health technology assessment studies as those:

1. intending to confirm the value of a technology
2. using an informative, carefully considered prior distribution for the primary quantity of interest, and
3. updating, or planning to update, this prior distribution by Bayes’s theorem.

Table 24 summarises the three-star applications identified by the review with respect to the BayesWatch criteria outlined in chapter 8, although we have placed ‘evidence from study’ earlier in the list. Of the 30 studies identified, all but two considered the application of Bayesian methods in a randomised controlled trial setting, with 17 studies adopting a normal approximation to the appropriate likelihood and a corresponding conjugate analysis. This is particularly interesting since the majority of papers appeared after 1993 yet only two used an MCMC technique, and highlights the wide applicability of a normal–normal conjugate model. Perhaps more disappointingly, only nine studies conducted the analysis prospectively, of which four also undertook elicitation of subjective prior beliefs from experts, though an additional four studies undertook some form of elicitation exercise before conducting a retrospective analysis. Although 11 of the 30 studies considered some form of clinical/policy demand, only two of these did so using a formal loss function. Particularly encouraging was the fact that all but five of the studies undertook some form of sensitivity analysis.
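
The normal–normal conjugate model whose wide applicability is noted above can be sketched in a few lines. This is an illustrative sketch only: the function name and the numbers passed to it (a sceptical prior centred on zero and a hypothetical trial estimate on a log hazard ratio scale) are our assumptions, not values from any study in Table 24.

```python
# Normal-normal conjugate update: prior N(mu0, sd0^2) for a treatment
# effect, combined with a trial estimate theta_hat with standard error se.
def normal_conjugate_update(mu0, sd0, theta_hat, se):
    """Return the posterior mean and standard deviation."""
    prec_prior = 1.0 / sd0 ** 2   # prior precision
    prec_data = 1.0 / se ** 2     # precision of the trial estimate
    prec_post = prec_prior + prec_data
    mean_post = (prec_prior * mu0 + prec_data * theta_hat) / prec_post
    return mean_post, prec_post ** -0.5

# Hypothetical numbers: sceptical prior centred on no effect (log hazard
# ratio 0, sd 0.2), updated by a trial estimate of -0.3 with se 0.15.
post_mean, post_sd = normal_conjugate_update(0.0, 0.2, -0.3, 0.15)
```

The posterior mean is pulled from the trial estimate towards the sceptical prior in proportion to the relative precisions, which is one reason such analyses are usually reported alongside a reference (flat) prior as a sensitivity check.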

Three-star applications

Author. Abrams K, Ashby D, Houghton J and Riley D.2

Title. Assessing drug interactions: tamoxifen and cyclophosphamide.

Year. 1996.

The technology. Tamoxifen and cyclophosphamide as treatments in early breast cancer.

Objectives of study. To compare overall and disease-free survival with each drug separately and in combination.

Design of study. Randomised controlled trial: 2 × 2 factorial study with 2230 women randomised.

Evidence from study. Survival curves and estimated hazard ratios.

Statistical model. Proportional hazards with a fully parametric exponential model.

Prospective analysis? No.

Loss function. No.

Prior distribution. Priors for main effects obtained from trial evidence available at the start of the trial (1980) – uniform reference prior on interaction.

Computations. Laplace approximation.

Reporting. Posterior distributions and probability that effects are less than 0, for survival and disease-free survival.

Sensitivity analysis. Reference and data-based prior.

Comments. Model checking of the proportional hazards assumption is carried out. The problem of using a clinical prior for the interaction is discussed. This is an example of the direct use of a previous trial’s results to provide a prior, although strictly speaking the parameter addressed is not the same in the two studies – the current model includes another main effect and an interaction term.

Author. Berry DA.51

Title. Monitoring accumulating data in a clinical trial.

Year. 1989.

The technology. ECMO.

Objectives of study. To determine the probability of superiority and the difference in expected mortality between the ECMO and CMT treatment groups.

Design of study. Adaptive randomised controlled trial.

Evidence from study. Four deaths out of 10 controls, and zero deaths out of nine ECMO patients.


Statistical model. Models the probability of death in each group via a logistic function.

Prospective analysis? No.

Loss function. No.

Prior distribution. Arbitrary prior assumed.

Computations. Numerical integration.

Reporting. Reports the posterior probability that a patient received the superior treatment, in terms of mortality, and the probability of death related to initial prognostic score.

Sensitivity analysis. Not explicitly reported.

Comments. See also Berry and Stangl,45 Greenhouse and Wasserman206 and Kass and Greenhouse.267
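
Berry’s analysis integrates a logistic model numerically. As a much simpler illustration of how a posterior probability of superiority can be computed from the 4/10 control versus 0/9 ECMO deaths quoted above, one can place independent uniform Beta(1, 1) priors on the two mortality risks and simulate; this is not Berry’s model, and the prior choice and simulation settings are assumptions for the sketch.

```python
import random

# Posterior probability that ECMO mortality is below control mortality,
# using independent Beta(1, 1) priors on each arm's death risk.
# Data: 4 deaths in 10 controls, 0 deaths in 9 ECMO patients.
def prob_superior(deaths_c, n_c, deaths_t, n_t, draws=200_000, seed=1):
    random.seed(seed)
    wins = 0
    for _ in range(draws):
        p_c = random.betavariate(1 + deaths_c, 1 + n_c - deaths_c)  # control
        p_t = random.betavariate(1 + deaths_t, 1 + n_t - deaths_t)  # ECMO
        wins += p_t < p_c
    return wins / draws

p_ecmo_better = prob_superior(4, 10, 0, 9)
```

With these data and priors the probability comes out high but not overwhelming, which is why the sensitivity of such small-sample conclusions to the prior matters.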

Author. Berry DA, Wolff MC and Sack D.64

Title. Decision making during a Phase III randomised controlled trial.

Year. 1994.

The technology. The paper describes a trial of the effectiveness of a vaccine that links the HIB capsular polysaccharide to the outer-membrane protein complex (OMPC) of Neisseria meningitidis serogroup B, in preventing HIB infection.

Objectives of study. To minimise the expected number of cases of HIB amongst Navajo children in the next 20 years.

Design of study. Randomised controlled trial.

Evidence from study. A total of 5190 Navajo children were vaccinated at 2 and 4 months, with either the vaccine being tested or a placebo. Evidence is the number of children who contract HIB in the vaccinated and unvaccinated group each ‘month’, where a ‘month’ is the length of time taken to enrol 105 children in each group.

Statistical model. Poisson event model.

Prospective analysis? No; retrospective analysis, with the prior assessed after the trial had taken place.

Loss function. Assumed to be linear in the number of HIB cases.

Prior distribution. λv, the rate of vaccinated children affected by HIB per child-month, was given a gamma prior with parameters (1, 3200), and λp, the rate for placebo children, was given a gamma prior with parameters (5, 10,700). This was based on the judgement of one of the authors, derived from published background information.

Computations. Dynamic programming.

Reporting. After each month, the authors calculated the expected number of cases of HIB under the following assumptions:

1. The probability of the vaccine being accepted by the regulatory authorities follows a model subjectively assessed by one of the authors.

2. The time taken for the vaccine to become available, once the vaccine is approved (if it is), is given by the smaller of 1 year and (2 × g) years, where g is the probability of accepting the vaccine as given above.

3. If the vaccine is rejected, it is assumed that a vaccine of 50% efficiency will be developed in 10 years. (Efficiency is defined as 1 – [(incidence among vaccinated)/(incidence among non-vaccinated)].)

Sensitivity analysis. The excess number of HIB cases in the case of stopping, as opposed to not stopping, is plotted against month for:

1. different horizons, that is, different lengths of time over which the expected number of HIB cases is to be minimised (5, 10, 40 and 80 years)

2. different priors for λv and λp

3. different values for the parameter s.

Comments. A rare example of a full decision-theoretic analysis, but a retrospective study. Mathematical details are described by Berry et al.63

Author. Brophy JM and Joseph L.76

Title. Placing trials in context using Bayesian analysis – GUSTO revisited by Reverend Bayes.

Year. 1995.

The technology. Thrombolytic therapy following myocardial infarction.

Objectives of study. To estimate the mortality and stroke rate difference between tissue plasminogen activator (t-PA) and streptokinase.

Health Technology Assessment 2000; Vol. 4: No. 38


Design of study. Randomised controlled trial (GUSTO).

Evidence from study. Event rate data from GUSTO.

Statistical model. Binomial.

Prospective analysis? No, carried out after publication of GUSTO results.

Loss function. No explicit loss.

Prior distribution. Based on pooled data from two previous trials (Gruppo Italiano per lo Studio della Sopravvivenza dell'Infarto Miocardico 2 (GISSI-2) and Third International Study of Infarct Survival (ISIS-3)), downweighted to 50 and 10%, respectively.

Computations. Conjugate normal approximations.

Reporting. Posterior plots and probabilities of net benefit.

Sensitivity analysis. Reference, full prior and discounted prior.
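The discounted-prior idea used here can be sketched as a conjugate normal update in which the prior precision is multiplied by the discount weight. All numbers below are illustrative placeholders, not the published GUSTO estimates:

```python
import math

# Sketch of a conjugate normal analysis with a downweighted data-based
# prior. `weight` < 1 discounts the prior by shrinking its precision.

def posterior_normal(prior_mean, prior_sd, lik_mean, lik_sd, weight=1.0):
    """Precision-weighted combination of a normal prior (downweighted by
    `weight`) and a normal likelihood."""
    w0 = weight / prior_sd**2          # downweighted prior precision
    w1 = 1.0 / lik_sd**2               # likelihood precision
    mean = (w0 * prior_mean + w1 * lik_mean) / (w0 + w1)
    sd = math.sqrt(1.0 / (w0 + w1))
    return mean, sd

# Illustrative: neutral prior, small observed mortality difference
m, s = posterior_normal(prior_mean=0.0, prior_sd=0.02,
                        lik_mean=-0.01, lik_sd=0.004, weight=0.5)
print(round(m, 4), round(s, 4))
```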

Comments. Letters: Avins26 says the null hypothesis should be shifted to a 1% difference, Browne79 says Bayesian methods are not appropriate, and Goodman and Langer196 warn about assuming exchangeability.

Author. Brophy JM and Joseph L.77

Title. Bayesian interim statistical analysis of randomised trials.

Year. 1997.

The technology. Angiotensin-converting enzyme (ACE) inhibitors for congestive heart failure.

Objectives of study. To estimate mortality benefit over placebo.

Design of study. Randomised controlled trial (Trandolapril Cardiac Evaluation, TRACE).

Evidence from study. Event rate data from TRACE.

Statistical model. Binomial.

Prospective analysis? No, carried out after publication of TRACE results.

Loss function. No explicit loss.

Prior distribution. Based on pooled data from two previous trials (Survival And Ventricular Enlargement (SAVE) and Acute Infarction Ramipril Efficacy (AIRE) studies).

Computations. Conjugate normal approximations.

Reporting. Posterior plots and probabilities of net benefit.

Sensitivity analysis. None.

Comments. The authors suggest that TRACE could have ended earlier owing to consistency with previous findings. In a reply, Kober et al.271 say previous trials were not sufficiently relevant. Letters: Hall220 points out a simple normal approximation, Fayers165 suggests monitoring using a sceptical prior, and Abrams and Jones6 point out that a DMC can also make predictions.

Author. Carlin BP, Chaloner K, Church T, Louis TA and Matts JP.90

Title. Bayesian approaches for monitoring clinical trials with an application to toxoplasmic encephalitis prophylaxis.

Year. 1993.

The technology. The use of the drug pyrimethamine in preventing toxoplasmic encephalitis among HIV-positive patients.

Objectives of study. To estimate the log(hazard ratio) associated with the treatment for time until development of toxoplasmic encephalitis.

Design of study. Randomised controlled trial.

Evidence from study. Survival data available at three interim analyses, with 12 events at the final analysis, when the trial was stopped early on an informal stopping rule.

Statistical model. Cox regression with two covariates: treatment and baseline CD4 counts.

Prospective analysis? Priors elicited prospectively, but analyses carried out retrospectively and not used in monitoring the study.

Appendix 1


Loss function. No explicit loss function, but a 25% reduction in hazard used as the lower bound of the range of equivalence.

Prior distribution. Priors elicited from five AIDS experts (three of them clinicians) using techniques described in Chaloner et al.98 Beliefs about 2 year survival were transformed into a 31-point histogram on the treatment coefficient. Priors were not given, but the prior of one of the experts was plotted.

Computations. Normal likelihood, exact posterior by discretisation, and normal approximation to posterior.

Reporting. Number of events at each reporting date; plots of likelihood, prior, posterior and normal approximation to posterior for the first two experts; and plots of probability of 25 and 50% reductions in hazard rates at each reporting stage with priors of the first two experts, with likelihood, exact posterior and normal approximation to posterior.

Sensitivity analysis. Five priors from different experts.

Comments. Marked conflict between the optimism of the prior distribution and the ineffectiveness of treatment (see page 22). The authors discuss possible reasons. This trial is further discussed by Carlin and Sargent,93 where the use of robust monitoring schemes (identifying what class of priors would lead to specific conclusions) is explored retrospectively.

Author. DerSimonian R.129

Title. Meta-analysis in the design and monitoring of clinical trials.

Year. 1996.

The technology. Calcium supplementation in the prevention of pre-eclampsia in pregnant women.

Objectives of study. To estimate the odds ratio associated with treatment with regard to prevention of pre-eclampsia.

Design of study. Randomised controlled trial of maximum size 4500 with interim analyses.

Evidence from study. None: trial is being designed.

Statistical model. Normal approximation to likelihood for log(odds ratio).

Prospective analysis? Yes.

Loss function. No explicit loss function.

Prior distribution. ‘Enthusiastic’ normal prior derived from the meta-analysis of previous studies, and sceptical prior centred at 0 with the same precision as the enthusiastic prior, equivalent to an experiment in which 8% of the intended sample size had been entered with no observed treatment effect.
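The "fraction of the intended sample size" reading of such a sceptical prior can be sketched directly, using the common approximation var(log odds ratio) ≈ 4/m for m subjects; the numbers below follow the entry but are otherwise illustrative:

```python
import math

# Sketch of a sceptical prior expressed as a "handicap": a normal prior
# centred at 0 whose precision equals that of a hypothetical trial in
# which a fraction f of the intended sample size showed no effect.
# Assumes the usual var(log odds ratio) ~= 4/m approximation.

def sceptical_prior_sd(intended_n, fraction):
    """SD of a normal prior on log(odds ratio) equivalent to `fraction`
    of `intended_n` subjects observed with no treatment effect."""
    return math.sqrt(4.0 / (intended_n * fraction))

# Entry's figures: maximum size 4500, prior worth 8% of that
sd = sceptical_prior_sd(intended_n=4500, fraction=0.08)
print(round(sd, 3))
```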

Computations. Conjugate analysis.

Reporting. Stopping boundaries plotted under all three priors, as well as those obtained under Pocock and O'Brien–Fleming rules.

Sensitivity analysis. Boundaries for priors with four combinations of mean and precision are compared.

Comments. None.

Author. Dignam JJ, Bryant J, Wieand HS, Fisher B and Wolmark N.133

Title. Early stopping of a clinical trial when there is evidence of no treatment benefit: protocol B-14 of the National Surgical Adjuvant Breast and Bowel Project.

Year. 1998.

The technology. Tamoxifen therapy for prevention of recurrence of breast cancer.

Objectives of study. To estimate the disease-free survival benefit from tamoxifen over placebo.

Design of study. Sequential randomised controlled study (protocol B-14 of the National Surgical Adjuvant Breast and Bowel Project) using O'Brien–Fleming stopping boundaries.

Evidence from study. At the third interim analysis there were 32 events on placebo and 56 on tamoxifen, giving a normalised likelihood with a mean of –0.53 and standard deviation of 0.21.

Statistical model. Proportional hazards.

Prospective analysis? Yes.


Loss function. No explicit loss.

Prior distribution. An ‘optimistic’ prior centred on a 40% hazard reduction and a 5% chance of no effect, equivalent on the log(hazard ratio) scale to a normal prior with a mean of 0.51 and standard deviation of 0.31.
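The stated prior can be checked arithmetically (a sketch, not part of the original paper): a 40% hazard reduction corresponds to −log 0.6 ≈ 0.51 on the log(hazard ratio) scale, and a normal prior with mean 0.51 and SD 0.31 places roughly 5% probability on no effect or worse:

```python
import math

# Quick consistency check of the optimistic prior's stated properties.

def normal_cdf(x, mu, sigma):
    """Standard normal CDF evaluated at (x - mu) / sigma, via erf."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

centre = -math.log(0.6)                        # ~0.51: 40% hazard reduction
p_no_effect = normal_cdf(0.0, mu=0.51, sigma=0.31)  # mass on "no effect or worse"
print(round(centre, 2), round(p_no_effect, 3))
```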

Computations. Conjugate normal.

Reporting. Posterior distribution, probability of benefit from tamoxifen, and 95% interval for the hazard ratio.

Sensitivity analysis. Range of prior distributions with means varying between optimistic and sceptical.

Comments. Analysis suggested that an optimist would still have a 13% belief in treatment benefit, and therefore would not rule out further trials. But a predictive calculation suggests that continued follow-up would almost certainly not lead to evidence of benefit for tamoxifen.

Author. Fayers PM, Ashby D and Parmar MKB.166

Title. Bayesian data monitoring in clinical trials.

Year. 1997.

The technology. Pre-operative chemotherapy for resectable oesophageal cancer.

Objectives of study. To estimate the log(hazard ratio) associated with treatment with respect to survival, and to suggest stopping criteria for a sequential trial.

Design of study. Randomised controlled trial, with interim analyses.

Evidence from study. Survival in each arm at interim analyses.

Statistical model. Proportional hazards model.

Prospective analysis? Yes: methods are being used for monitoring, but the study is continuing and data are confidential, so imaginary data were used in this tutorial.

Loss function. No explicit loss function, although the chances of 5 and 10% improvement in 2 year survival were used as quantities of interest.

Prior distribution. Archetypal reference, sceptical and enthusiastic priors, following the suggestions of Spiegelhalter et al.421

Computations. Conjugate analysis using normal approximation to the likelihood.

Reporting. Observed log(hazard ratio), and posterior probability of 0, 5 and 10% improvements in 2 year survival.

Sensitivity analysis. Results reported for reference, sceptical and enthusiastic priors.

Comments. This study suggests monitoring sequential trials according to whether a sceptic is ‘convinced’ of efficacy, or an enthusiast is convinced of inefficacy.

Author. Fletcher A, Spiegelhalter D, Staessen J, Thijs L and Bulpitt C.173

Title. Implications for trials in progress of publication of positive results.

Year. 1993.

The technology. The treatment of isolated systolic hypertension with the aim of preventing coronary heart disease and stroke in the elderly.

Objectives of study. To estimate the risk reduction associated with treatment and hence judge whether a confirmatory trial is justifiable.

Design of study. Randomised controlled trial (Systolic Hypertension in the Elderly Program, SHEP).

Evidence from study. Thirty-seven per cent reduction in fatal strokes (P = 0.0004; 95% confidence interval, 18 to 51%). Other reductions seen for fatal stroke, myocardial infarction or coronary heart disease deaths.

Statistical model. Normal approximation to likelihood for risk reduction.

Prospective analysis? No, retrospective analysis.

Loss function. No explicit loss function, but a range of equivalence of 0–15% risk reduction selected as being ‘reasonable’.


Prior distribution. Archetypal sceptical prior (“only to give a flavour of the approach”): normal, with mean 0 and Pr(risk reduction > 33%) = 0.05.

Computations. Conjugate normal.

Reporting. Observed values with confidence intervals and P values, plots of likelihood and posteriors for strokes and coronary heart disease deaths, probabilities of any effect greater than zero and of an effect greater than the upper point of the range of equivalence, and probability of future trials yielding a significant result.

Sensitivity analysis. Not mentioned.

Comments. None.

Author. Freedman LS and Spiegelhalter DJ.182

Title. Application of Bayesian statistics to decision making during a clinical trial.

Year. 1992.

The technology. The study described was intended to compare six treatments of colorectal cancer. This paper concentrates on comparisons between:

1. subjects receiving the standard treatment, the chemotherapeutic agent 5-fluorouracil

2. subjects receiving the standard treatment plus high-dose leucovorin

3. subjects receiving the standard treatment plus high-dose cisplatinum.

Objectives of study. Estimation of log(hazard ratio).

Design of study. Randomised controlled trial.

Evidence from study. 5-Fluorouracil versus 5-fluorouracil plus leucovorin after 70 subjects had entered each arm: 82 deaths had occurred and the estimated log(hazard ratio) was 1.65. 5-Fluorouracil versus 5-fluorouracil plus cisplatinum after 70 subjects had entered each arm: 96 deaths had occurred and the estimated log(hazard ratio) was –0.01.

Statistical model. Proportional hazards model.

Prospective analysis? No, retrospective analysis.

Loss function. No explicit loss function, but the range of equivalence was log(1)–log(1.5).

Prior distribution. As a prior distribution the authors use a mixture distribution with a mass of p0 on 0, and a mass of 1 – p0 on a normal distribution with a mean of 0 and a variance of 4/n0, truncated at 0. The authors use values of p0 = 0.25 and n0 = 25 as giving a 5% chance that the experimental regimen doubles survival time, with a small chance of no benefit “in view of the observed regression of tumours”.

Computations. Closed-form normal approximations.

Reporting. Plots of likelihood and posterior, mention of likelihood/prior agreement, z values, and posterior probabilities critical to stopping decisions.

Sensitivity analysis. In the 5-fluorouracil plus leucovorin trial, posterior probabilities are calculated for four different sets of parameter values for the prior.

Comments. Further analyses of this study are provided by Greenhouse205 and Dixon and Simon.135

Author. Freedman LS, Spiegelhalter DJ and Parmar MKB.183

Title. The what, why and how of Bayesian clinical trials monitoring.

Year. 1994.

The technology. The treatment of Dukes’ C colorectal cancer with levamisole plus 5-fluorouracil.

Objectives of study. To estimate the log(hazard ratio).

Design of study. Randomised controlled trial.

Evidence from study. Data from published study.

Statistical model. Proportional hazards, and approximate normal likelihood for log(hazard ratio).

Prospective analysis? No.

Loss function. No, but range of equivalence assessed: 0–log(1.33) on the log(hazard ratio) scale.


Prior distribution. A sceptical prior with a mean of 0 and a 5% chance that the log(hazard ratio) is greater than the alternative hypothesis of 0.3, and an enthusiastic prior with the same precision but a mean of 0.3.
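This sceptical/enthusiastic pair is fully determined by the alternative hypothesis, since the 5% tail condition fixes the common standard deviation. A hedged sketch of the construction:

```python
import math

# Sketch of the sceptical/enthusiastic prior pair: sceptical is normal
# with mean 0 and a 5% chance that log(hazard ratio) exceeds the
# alternative hypothesis; enthusiastic has the same SD but mean at the
# alternative. Solving alt / sd = z_0.95 gives the shared SD.

Z_95 = 1.6449  # upper 5% point of the standard normal

def prior_pair(alternative):
    """Return (sceptical, enthusiastic) as (mean, sd) tuples."""
    sd = alternative / Z_95
    return (0.0, sd), (alternative, sd)

sceptical, enthusiastic = prior_pair(0.3)
print(round(sceptical[1], 3))
```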

Computations. Conjugate normal.

Reporting. Number of deaths in each group; estimated log(hazard ratio); plots of prior, likelihood and posterior; posterior mean and standard deviation; and the probability of the log(hazard ratio) being in the range of equivalence and on either side of it.

Sensitivity analysis. Reference, sceptical and enthusiastic priors.

Comments. Discussed further by Spiegelhalter et al.421

Author. George SL, Li CC, Berry DA and Green MR.191

Title. Stopping a clinical trial early – frequentist and Bayesian approaches applied to a CALGB trial in non-small-cell lung cancer.

Year. 1994.

The technology. The addition of two cycles of induction chemotherapy prior to thoracic radiation in subjects with stage III non-small-cell lung cancer.

Objectives of study. To compare survival by estimating the log(hazard ratio).

Design of study. Sequential randomised controlled study (CALGB 8433) using O'Brien–Fleming stopping boundaries.

Evidence from study. The trial stopped at the fifth interim analysis, when there had been 56 deaths and the survival difference had an adjusted P value of 0.0015.

Statistical model. Exponential survival model.

Prospective analysis? No, prior distributions chosen after the trial.

Loss function. No explicit loss.

Prior distribution. A gamma prior for the rate on standard treatment assessed from explicit evidence, and a sceptical prior distribution for log(hazard ratio) with a standard deviation of 1, giving a 16% chance that the true effect exceeds the alternative hypothesis.

Computations. Gibbs sampling.

Reporting. Posterior distribution, probability of log(hazard ratio) being less than –0.25 and –0.5.

Sensitivity analysis. Analysis with proper and sceptical priors.

Comments. The authors simulate and display predictive distributions of the probabilities that would have been estimated had the trial run its full course, given the data at the final analysis. This trial is also analysed using a more complex Weibull model, but a matching prior opinion, by Qian et al.368

Author. Greenhouse JB and Wasserman L.206

Title. Robust Bayesian methods for monitoring clinical trials.

Year. 1995.

The technology. ECMO for premature babies.

Objectives of study. To compare survival with ECMO (new) and without (conventional regimen, CMT).

Design of study. Sequential adaptive randomised controlled trial.

Evidence from study. Six out of 10 survivors on CMT, and nine out of nine on ECMO.

Statistical model. Binomial.

Prospective analysis? No.

Loss function. No.

Prior distribution. Uniform prior on survival rates as the baseline; ε-contaminated class of priors around this.

Computations. Numerical integration.

Reporting. Upper and lower bounds on “P(ECMO superior to CMT)”, over all possible priors.

Sensitivity analysis. Search over all priors with ε = 0.1, 0.2, 0.3 and 0.4.
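The flavour of these bounds can be reproduced in a small sketch, assuming the unrestricted ε-contamination class {(1 − ε)π0 + εq} with the uniform baseline π0, for which the extreme bounds are attained at point-mass contaminations; Greenhouse and Wasserman's exact computation may differ in detail:

```python
import math

# Illustrative epsilon-contamination bounds on P(ECMO superior to CMT),
# with data 6/10 survivors on CMT and 9/9 on ECMO. Assumes independent
# uniform baseline priors and point-mass contaminations.

def binom_pmf(k, n, p):
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

# Marginal likelihood under uniform priors: P(k of n) = 1/(n+1) per arm.
m0 = (1 / 11) * (1 / 10)

# Baseline posterior P(theta_E > theta_C): Beta(10,1) vs Beta(7,5) gives
# 1 - E[theta_C^10] in closed form.
p0 = 1 - (11 * 10 * 9 * 8 * 7) / (21 * 20 * 19 * 18 * 17)

# Largest likelihood attainable by a point mass inside the hypothesis
# region (at the MLE) and outside it (grid search on the boundary).
L_in = binom_pmf(6, 10, 0.6) * binom_pmf(9, 9, 1.0)
L_out = max(binom_pmf(6, 10, t) * binom_pmf(9, 9, t)
            for t in (i / 1000 for i in range(1, 1000)))

def bounds(eps):
    a = (1 - eps) * m0
    upper = (a * p0 + eps * L_in) / (a + eps * L_in)
    lower = (a * p0) / (a + eps * L_out)
    return lower, upper

lo, hi = bounds(0.1)
print(round(lo, 3), round(hi, 3))
```

Even a 10% contamination leaves a wide range, which is the point of the robust analysis.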


Comments. The authors also consider the single-armed Phase II study of Korn et al.,276 in which 3/4 patients showed toxicity. They again use a robust prior based on past trial evidence (with and without discounting), reporting the bounds on the posterior probability of at least 20% toxicity.

Author. Gustafson P.219

Title. Robustness considerations in Bayesian analysis.

Year. 1996.

The technology. ECMO.

Objectives of study. To estimate the probability that survival is better for ECMO patients, that is, P(θE > θC | data).

Design of study. Adaptive randomised controlled trial.

Evidence from study. Four deaths out of 10 controls, and zero deaths out of nine ECMO patients.

Statistical model. Model/likelihood: (θC, θE), the survival probabilities for control and ECMO patients, respectively, are assumed independent.

Prospective analysis? No.

Loss function. No.

Prior distribution. Arbitrary, initial Beta(1.25, 1.25) prior distributions are assumed for both θC and θE.
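Under these baseline priors the headline posterior probability is easy to reproduce by simulation (a sketch of the baseline case only, not of the contamination-class bounds reported in the paper):

```python
import random

# Monte Carlo sketch: with Beta(1.25, 1.25) priors on the survival
# probabilities and the observed data (6/10 survivors on control,
# 9/9 on ECMO), estimate P(theta_E > theta_C | data) by sampling the
# two independent Beta posteriors.

random.seed(1)

def posterior_prob_superior(n_sims=100_000):
    wins = 0
    for _ in range(n_sims):
        theta_c = random.betavariate(1.25 + 6, 1.25 + 4)  # control survival
        theta_e = random.betavariate(1.25 + 9, 1.25 + 0)  # ECMO survival
        wins += theta_e > theta_c
    return wins / n_sims

print(round(posterior_prob_superior(), 2))
```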

Computations. Not explicitly reported.

Reporting. Ranges of posterior probabilities for the three different classes of prior distributions as data accumulate.

Sensitivity analysis. Uses three different classes of prior distributions (unrestricted, restricted and density-bounded contamination classes), assuming θC and θE are independent, and assuming that the prior for one parameter is fixed throughout.

Comments. None.

Author. Hughes MD.242

Title. Practical reporting of Bayesian analyses of clinical trials.

Year. 1991.

The technology. Beta-blocker treatment for prevention of bleeding oesophageal varices.

Objectives of study. To compare the incidence of bleeding and survival with a beta-blocker (new regimen) and placebo (existing regimen).

Design of study. Randomised controlled trial.

Evidence from study. No evidence available at the time of analysis: hypothetical data used.

Statistical model. Approximate normal likelihood for log(odds ratio).

Prospective analysis? Yes.

Loss function. No.

Prior distribution. Reference prior, data-based prior obtained from simple pooling of previous studies, and subjective prior elicited from six participating clinicians as a histogram on a risk-difference scale. Also considers Bayes factor analysis in which a lump of prior is placed at 0.

Computations. Conjugate normal analysis.

Reporting. Posterior distributions and probabilities of benefit.

Sensitivity analysis. Reference, clinical and data-based posterior distributions. For the ‘lump’ prior, the study explores the sensitivity of the “posterior probability of no effect” to the “prior probability of no effect”.

Comments. See Hughes243 for more details.

Author. Kadane JB.258

Title. Bayesian methods and ethics in a clinical trial design.

Year. 1996.

The technology. Drug treatment to control hypertension following open-heart surgery.


Objectives of study. To compare verapamil with nitroprusside with respect to post-operative blood pressure.

Design of study. ‘Ethical’ design, in which a patient will only receive a treatment if at least one member of a team of experts (represented by a model of his or her current posterior beliefs) considers that the treatment is optimal for the patient's covariates. If more than one member considers a treatment to be optimal, then the treatment is allocated to balance prognostic factors between groups.259,261

Evidence from study. A total of 49 patients were enrolled, of whom 30 could be studied: 12 on nitroprusside, 18 on verapamil. As the trial proceeded, there was an imbalance towards verapamil as the experts' opinions changed.

Statistical model. Normal linear model for blood pressure depending on four covariates and treatment.

Prospective analysis? Yes.

Loss function. No.

Prior distribution. Pilot data on five patients were available. Prior opinions were elicited from five participating clinicians through hour-long interactive computer sessions.

Computations. Conjugate normal/t analysis.

Reporting. Prior and posterior distributions were presented for each of the five experts, and for each of 16 types of patient. However, the primary results of the trial (see chapter 12) were presented using a non-Bayesian analysis.

Sensitivity analysis. Results for each expert were displayed.

Comments. The study was passed by the institutional review board of the Johns Hopkins Hospital. Further elicitations were necessary midway through the study when the safety criterion being updated was changed. The elicitation procedure is discussed in detail and difficulties acknowledged. A bug in the allocation program was found midway, which had meant that some of the early patients had not in fact been allocated to the appropriate treatment, although it is claimed that no adverse effects resulted.

Author. Kass RE and Greenhouse JB.267

Title. Comments on ‘Investigating therapies of potentially great benefit: ECMO’ by JH Ware.

Year. 1989.

The technology. ECMO.

Objectives of study. To estimate the log(mortality odds ratio) associated with ECMO.

Design of study. Adaptive randomised controlled trial.

Evidence from study. Four deaths out of 10 controls, and zero deaths out of nine ECMO patients.

Statistical model. Model/likelihood: (δ, γ), where δ = ηC – ηE, γ = (ηC + ηE)/2, and ηC and ηE are the log(odds) of death in the control and ECMO patients, respectively.

Prospective analysis? No.

Loss function. No, but a log(odds ratio) of 0.4 considered important, not based on clinical demands.

Prior distribution. Historical clinical series of 13 patients receiving conventional medical therapy, of whom two survived.

Computations. Not specifically reported, but possibly used Laplace approximations to the posterior distribution.

Reporting. Posterior probabilities, but the authors also consider the use of Bayes factors, based on each of the five prior-to-posterior analyses, in order to determine the evidence for and against a treatment effect of zero on the log(odds ratio) scale.

Sensitivity analysis. Five different prior distributions used in the analysis reported, with the historical evidence downweighted in each.

Comments. See also Berry and Stangl,45 Berry51 and Greenhouse and Wasserman.206

Author. Lewis RJ.293

Title. Bayesian hypothesis testing: interim analysis of a clinical trial evaluating phenytoin for the prophylaxis of early post-traumatic seizures in children.


Year. 1996.

The technology. Phenytoin for the prophylaxis of early post-traumatic seizures in children.

Objectives of study. To estimate the difference in 48 hour seizure rates with and without phenytoin.

Design of study. Randomised controlled trial, with 95% power to detect a 50% reduction in the seizure rate.

Evidence from study. First interim analysis, 0/7 versus 2/7 seizures on phenytoin.

Statistical model. Binomial model.

Prospective analysis? Yes.

Loss function. Yes, an implicit loss function is found that would lead to a decision-theoretic design with good type I and type II error rates.

Prior distribution. “Wide, pessimistic and optimistic” priors assessed.

Computations. Conjugate binomial analysis.

Reporting. Posterior distributions and probability that the rate difference is greater than 12.5% and 0.
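A conjugate binomial sketch of this interim calculation, assuming flat Beta(1, 1) priors to stand in for the 'wide' case (the paper's actual wide, pessimistic and optimistic priors are not reproduced here):

```python
import random

# Monte Carlo sketch: with 0/7 seizures on phenytoin and 2/7 on control,
# sample the conjugate Beta posteriors under flat priors and estimate
# P(rate difference > 12.5%).

random.seed(7)

def p_diff_exceeds(threshold=0.125, n_sims=100_000):
    hits = 0
    for _ in range(n_sims):
        p_control = random.betavariate(1 + 2, 1 + 5)   # 2/7 seizures
        p_treated = random.betavariate(1 + 0, 1 + 7)   # 0/7 seizures
        hits += (p_control - p_treated) > threshold
    return hits / n_sims

print(round(p_diff_exceeds(), 2))
```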

Sensitivity analysis. Results for the three priors are given.

Comments. The decision to continue was made, in spite of initially unencouraging results.

Author. Lilford RJ and Braunholtz D.299

Title. The statistical basis of public policy: a paradigm shift is overdue.

Year. 1996.

The technology. Third-generation contraceptive pill.

Objectives of study. To assess the risk of venous thrombosis associated with the third-generation contraceptive pill.

Design of study. Meta-analysis of case–control studies.

Evidence from study. Odds ratio of 2 and 95% interval of 1.4 to 2.7.

Statistical model. Normal likelihood for log(odds ratio).

Prospective analysis? No.

Loss function. No.

Prior distribution. Priors were subjectively assessed by two experts.

Computations. Conjugate normal.

Reporting. Posterior distributions and 95% intervals.

Sensitivity analysis. Different priors examined, as well as discounting the likelihood by 30% additional dispersion (undirected bias) and allowing for the possibility of a 30% overestimate (directed bias).

Comments. Critical letters included one by Cox and Farewell,118 who pointed out that bias could be explored through sensitivity analysis and that the authors were setting up an unrealistic “straw man” of traditional statistics as relying heavily on significance testing.

Author. Parmar MKB, Spiegelhalter DJ and Freedman LS.348

Title. The CHART trials: Bayesian design and monitoring in practice.

Year. 1994.

The technology. CHART radiotherapy regimen for non-small-cell lung and head-and-neck cancer.

Objectives of study. To compare survival with (new regimen) and without (existing regimen) CHART.

Design of study. Randomised controlled trial.

Evidence from study. No evidence available at the time of analysis: hypothetical data used.

Statistical model. Proportional hazards and approximate normal likelihood for log(hazard ratio).

Prospective analysis? Yes.

Loss function. No, but ranges of equivalence assessed.


Prior distribution. Elicited from 11 participating clinicians.

Computations. Conjugate normal analysis.

Reporting. Posterior distributions and probabilities relative to ranges of equivalence.

Sensitivity analysis. Reference, clinical and sceptical priors.

Comments. See chapter 9 for details of this study. The eventual results of these studies are discussed on page 4.7.

Author. Parmar MKB, Ungerleider RS and Simon R.349

Title. Assessing whether to perform a confirmatory randomised clinical trial.

Year. 1996.

The technology. Adjunct chemotherapy for non-small-cell lung cancer.

Objectives of study. To compare survival with (new regimen) and without (existing regimen) chemotherapy.

Design of study. Randomised controlled trial conducted between 1984 and 1987.

Evidence from study. Hazard ratio of 1.63 (1.14–2.33).

Statistical model. Proportional hazards and approximate normal likelihood for log(hazard ratio).

Prospective analysis? No.

Loss function. No.

Prior distribution. Default reference and sceptical priors.

Computations. Conjugate normal analysis.

Reporting. Posterior distributions and probabilities relative to median survival improvements of 3, 4 and 5 months.

Sensitivity analysis. Reference, clinical and sceptical priors.

Comments. See page 33 for details of this study. This paper also considers another chemotherapy study using the same prior distribution.

Author. Pocock SJ and Hughes MD.364

Title. Practical problems in interim analyses, with particular regard to estimation.

Year. 1989.

The technology. Anisoylated plasminogen streptokinase activator complex (APSAC) thrombolytic therapy after myocardial infarction.

Objectives of study. To compare 30 day mortality under APSAC (new regimen) and placebo (existing regimen).

Design of study. Randomised controlled trial (APSAC Intervention Mortality Study, AIMS).

Evidence from study. At the second interim analysis: 32 versus 61 deaths with 502 patients in each arm.

Statistical model. Approximate normal likelihood for log(risk ratio).

Prospective analysis? No.

Loss function. No.

Prior distribution. Normal on the log(risk ratio) scale, with a median risk ratio of 0.8, a 7% chance of a risk ratio above 1, and a 10% chance of a risk ratio below 0.67, which “seems plausible given earlier clinical trials of other thrombolytic agents”.
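These three judgements over-determine a two-parameter normal prior, so they can be checked for coherence. A sketch (1.4758 is the upper 7% point of the standard normal): the median fixes the mean, the 7% tail fixes the SD, and the implied chance of a risk ratio below 0.67 can then be compared with the stated 10%:

```python
import math

# Turning the stated judgements into a normal prior on log(risk ratio)
# and checking the third judgement against the implied value.

def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

Z_93 = 1.4758            # upper 7% point of the standard normal
mu = math.log(0.8)       # prior median risk ratio 0.8
sd = -mu / Z_93          # so that P(risk ratio > 1) = 7%

p_below_067 = normal_cdf((math.log(0.67) - mu) / sd)
print(round(sd, 3), round(p_below_067, 2))
```

The implied probability (about 12%) is close to, though not exactly, the stated 10%, which is as much as informal elicitation can promise.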

Computations. Conjugate normal analysis.

Reporting. Median and 95% intervals for the risk ratio.

Sensitivity analysis. Four choices of prior median and standard deviation explored.

Comments. Simulation exercise to see how the prior generates results, and what kind of biases would occur with sequential trials.

Author. Pocock SJ and Spiegelhalter DJ.363

Title. Domiciliary thrombolysis by general practitioners.


Year. 1992.

The technology. Home thrombolytic therapy after myocardial infarction.

Objectives of study. To compare the 30 day mortality rate under anistreplase (new regimen) and placebo (existing regimen).

Design of study. Randomised controlled trial (GREAT).

Evidence from study. At interim analysis: 23/148 deaths on control versus 13/163 on the new treatment.

Statistical model. Approximate normal likelihood for log(odds ratio).
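The approximate normal likelihood can be sketched from the 2×2 counts using the textbook log(odds ratio) estimate and Woolf standard error (the square root of the sum of reciprocal cell counts); this is the standard construction, not necessarily the authors' exact one:

```python
import math

# Approximate normal likelihood for the log(odds ratio) from the
# interim GREAT table: 23/148 deaths on control, 13/163 on treatment.

def log_or_and_se(a, b, c, d):
    """a/b = events/non-events on treatment; c/d = on control."""
    log_or = math.log((a / b) / (c / d))
    se = math.sqrt(1/a + 1/b + 1/c + 1/d)
    return log_or, se

log_or, se = log_or_and_se(a=13, b=163 - 13, c=23, d=148 - 23)
print(round(log_or, 2), round(se, 2))
```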

Prospective analysis? No.

Loss function. No.

Prior distribution. Elicited from an expert, who based his opinion on published and unpublished data.

Computations. Conjugate normal analysis.

Reporting. Posterior distribution of the risk ratio, mean and 95% interval.

Sensitivity analysis. None.

Comments. See chapter 2 for further discussion of this example.

Author. Sasahara AA, Cole TM, Ederer F, MurrayJA, Wenger NK, Sherry S and Stengle JM.389

Title. Urokinase Pulmonary Embolism Trial, anational cooperative study.

Year. 1973.

The technology. Urokinase treatment in pulmo-nary embolism.

Objectives of study. To compare thrombolyticcapability on urokinase (new regimen) withheparin (standard regimen).

Design of study. Randomised controlled trial.

Evidence from study. Multiple end-points on 160patients entered between 1968 and 1970.

Statistical model. Normal model.

Prospective analysis? Yes.

Loss function. No.

Prior distribution. Point mass on null hypothesis,with remainder normally distributed around 0,with standard deviation such that the expectedeffect, were it to be present, would be equalto the alternative hypothesis (see page 19). Alter-native hypotheses were “based on what appearedreasonable from previous experience withthrombolytics”.

Computations. Conjugate normal analysis.

Reporting. “Relative betting odds”, that is, posterior odds on null hypothesis.

Sensitivity analysis. None.

Comments. No P values were provided in the analysis.
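The “relative betting odds” reported in this trial follow from a conjugate normal analysis under a point-mass (“lump and smear”) prior. A minimal sketch, assuming a normal likelihood for the effect estimate; the numerical inputs below are hypothetical, since the entry does not reproduce the trial's estimates.

```python
import math

def normal_pdf(x, mean, var):
    """Density of N(mean, var) at x."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def relative_betting_odds(x, se, tau, prior_odds=1.0):
    """Posterior odds on the null under a point-mass prior: a lump of
    probability on theta = 0, with the remainder spread as theta ~ N(0, tau^2).
    x is the effect estimate with standard error se. The Bayes factor is the
    ratio of marginal likelihoods under H0 (x ~ N(0, se^2)) and H1
    (x ~ N(0, se^2 + tau^2), integrating theta out)."""
    bayes_factor = (normal_pdf(x, 0.0, se ** 2)
                    / normal_pdf(x, 0.0, se ** 2 + tau ** 2))
    return prior_odds * bayes_factor

# Hypothetical inputs: effect estimate 0.3 (SE 0.2), alternative spread 0.4
print(f"relative betting odds = {relative_betting_odds(0.3, 0.2, 0.4):.2f}")
```

Data near the null raise the odds on the null (for example, x = 0 with se = tau = 1 gives odds of exactly √2 against even prior odds), while large estimates drive them down.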

Author. Spiegelhalter DJ, Freedman LS and Parmar MKB.420

Title. Applying Bayesian ideas in drug development and clinical trials.

Year. 1993.

The technology. Chemotherapy for osteosarcoma.

Objectives of study. To compare survival in multi-drug (new) and two-drug (existing) regimens.

Design of study. Randomised controlled trial.

Evidence from study. None available at time of analysis.

Statistical model. Proportional hazards and approximate normal likelihood for log(hazard ratio).

Prospective analysis? Yes.

Loss function. No, but a range of equivalence of 0–10% improvement in 5 year survival.

Prior distribution. Elicited priors from seven participating oncologists, averaged to give a median hazard ratio of 1.11, and 95% interval (0.66–1.83) in favour of new therapy. Sceptical prior with the same precision but centred on 0.

Computations. Conjugate normal analysis.

Reporting. Posterior distributions and probabilities relative to range of equivalence.

Sensitivity analysis. Reference, clinical and sceptical priors.

Comments. Final results are now available413 showing no evidence of benefit: hazard ratio = 0.94 (0.69–1.27) (see page 4.7).

Author. Spiegelhalter DJ, Freedman LS and Parmar MKB.421

Title. Bayesian approaches to randomised trials (with discussion).

Year. 1994.

The technology. Misonidazole as adjunct chemotherapy for head and neck cancer.

Objectives of study. To compare primary control with (new regimen) and without (existing regimen) misonidazole.

Design of study. Randomised controlled trial.

Evidence from study. Third interim analysis: 108 events and a hazard ratio of 0.9 (0.62–1.31).

Statistical model. Proportional hazards and approximate normal likelihood for log(hazard ratio).

Prospective analysis? No.

Loss function. No, but a range of equivalence of 0–0.414 in log(hazard ratio), corresponding to a rise from 25 to 40% improvement in 2 year primary control.

Prior distribution. Default reference, sceptical and enthusiastic priors.

Computations. Conjugate normal analysis.

Reporting. Posterior distributions and probabilities relative to a range of equivalence.

Sensitivity analysis. Reference, clinical and sceptical priors.

Comments. The probability of clinical superiority was low even with an enthusiastic prior. In fact, the trial was terminated at this third analysis. This paper also considers ‘three-star’ analyses of the neutron study1 and the levamisole plus 5-fluorouracil study.183
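Under the reference prior, the posterior probabilities relative to the range of equivalence can be reproduced directly from the interval reported in this entry. A sketch, assuming the standard error can be back-calculated from the 95% confidence interval in the usual way:

```python
import math

def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Third interim analysis reported in this entry
hr, lo, hi = 0.9, 0.62, 1.31
log_hr = math.log(hr)
se = (math.log(hi) - math.log(lo)) / (2 * 1.96)   # back-calculated from 95% CI

# Range of equivalence: 0 to -0.414 on the log(hazard ratio) scale,
# so the new treatment is 'clinically superior' if log(HR) < -0.414
roe = 0.414

# Under a reference (flat) prior the posterior is simply the
# normal likelihood N(log_hr, se^2)
p_superior = phi((-roe - log_hr) / se)   # clinically superior
p_better = phi((0.0 - log_hr) / se)      # any benefit at all

print(f"P(clinically superior) = {p_superior:.3f}")
print(f"P(any benefit)         = {p_better:.3f}")
```

The probability of clinical superiority comes out at around 5%, consistent with the comment above that it was low even before a sceptical prior is applied.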

Author. Stangl DK.427

Title. Prediction and decision-making using Bayesian hierarchical models.

Year. 1995.

The technology. Imipramine hydrochloride for prevention of or delaying a return to depression.

Objectives of study. To compare the time to recurrent depression with (new regimen) and without (existing regimen) imipramine.

Design of study. Five-centre randomised controlled trial.

Evidence from study. Survival data from five centres.

Statistical model. Exponential survival model, with the option of a change point to allow for a non-constant hazard.

Prospective analysis? No.

Loss function. No.

Prior distribution. Gamma priors were obtained by considering the first gamma parameter to be a sample size, and making the second have a gamma prior which itself has a second parameter equal to the maximum likelihood estimate. Three different sets of priors were used.

Computations. MCMC (Gibbs sampling).

Reporting. The expectation of the time to recurrence for patients in each treatment group, and the expected difference between the treated and untreated groups, and at each clinic, were provided under each model and prior, and the predictive distribution of the difference at each clinic under one model and prior was plotted.

Sensitivity analysis. Three different priors were explored.

Comments. The use of decision theory is explored in deciding which treatment to give to a patient.


This study was also analysed using Laplace approximations425 in which further sensitivity analysis to prior assumptions is carried out.

Author. Ware J.469

Title. Investigating therapies of potentially great benefit: ECMO (with discussion).

Year. 1989.

The technology. ECMO.

Objectives of study. To estimate the difference in mortality associated with ECMO patients.

Design of study. Adaptive randomised controlled trial.

Evidence from study. Four deaths out of 10 controls, and zero deaths out of nine ECMO patients.

Statistical model. Model/likelihood: (pC, pE), the survival probabilities in the control and ECMO groups, respectively, such that a beta prior is assumed for pC, and the conditional distribution of pE, given pC, is such that P(pC < pE) = P(pC = pE) = P(pC > pE) = 1/3.

Prospective analysis? No.

Loss function. No.

Prior distribution. Historical clinical series of 13 patients receiving conventional medical therapy, of whom two survived.

Computations. Due to the model formulation, posterior probabilities could be obtained in a closed form.

Reporting. Posterior probabilities: P(pC < pE), P(pC = pE), P(pC > pE).

Sensitivity analysis. Both an informative data-based prior and a vague prior for pC were used.

Comments. None.
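A simplified Monte Carlo sketch of this kind of analysis. It replaces Ware's conditional prior construction with independent beta distributions (a data-based prior for pC from the historical series, a vague prior for pE), so the resulting probability is illustrative rather than a reproduction of the published closed-form posterior probabilities.

```python
import random

random.seed(1)

# Survival data from this entry: 6/10 controls and 9/9 ECMO patients survived
surv_c, n_c = 6, 10
surv_e, n_e = 9, 9

# Data-based beta prior for p_C from the historical series (2 of 13 survived),
# and a vague Beta(1,1) prior for p_E. NOTE: a simplification -- Ware's
# actual model makes p_E conditional on p_C so that
# P(p_C < p_E) = P(p_C = p_E) = P(p_C > p_E) = 1/3 a priori.
a_c, b_c = 1 + 2, 1 + 11
a_e, b_e = 1, 1

# Conjugate beta updates with the trial data
post_c = (a_c + surv_c, b_c + (n_c - surv_c))
post_e = (a_e + surv_e, b_e + (n_e - surv_e))

# Monte Carlo estimate of P(p_E > p_C)
n_sims = 100_000
wins = sum(random.betavariate(*post_e) > random.betavariate(*post_c)
           for _ in range(n_sims))
print(f"P(p_E > p_C) ~= {wins / n_sims:.3f}")
```

With zero deaths among nine ECMO patients against the historical and concurrent control experience, the estimated probability that ECMO has the higher survival rate is close to 1 under this simplified model.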

Appendix 2

Websites and software

Here we provide a selection of sites that currently provide useful material on Bayesian methods applicable to health technology assessment, and lists of links. This list is not exhaustive but should provide some entry into the huge range of material available on the Internet.

The sites below were all functioning in November 2000.

Bayesian methods in health technology assessment

http://www.fda.gov/cdrh/
This is the home page for the US Food and Drug Administration’s Center for Devices and Radiological Health, which contains a number of items relating to Bayesian methods. To identify these, use the search facility with the keyword ‘Bayesian’.

http://www.bayesian-initiative.com/
The Bayesian Initiative in Health Economics and Outcome Research provides useful background material on Bayesian approaches to pharmacoeconomics, but does not appear to have been updated for some time.

Bayesian software

http://www.shef.ac.uk/~st1ao/1b.html
The First Bayes software is freely available, and features good graphical presentation of conjugate analysis of basic data sets. It is suitable for teaching, and is strong on predictive distributions.

http://www.mrc-bsu.cam.ac.uk/bugs/
The BUGS software is designed for the analysis of complex models using MCMC methods. The WinBUGS version features an interface for specifying models as graphs. The software assumes familiarity with Bayesian methods and MCMC computation.

http://www.epi.mcgill.ca/~web2/Joseph/software.html
Lawrence Joseph’s Bayesian Software site provides downloadable code for a wide variety of sample size calculations using prior opinion.

http://omie.med.jhmi.edu/bayes/
The Bayesian Communication website is hosted by Harold Lehmann, and features a prototype example in which a Bayesian analysis can be carried out on-line.289–291

http://www.research.att.com/~volinsky/bma.html
The Bayesian Model Averaging web page provides S-plus and Fortran software for carrying out model averaging, as well as featuring reprints and links.

http://www.palisade.com/
The Palisade Corporation markets the @RISK® software, which is an add-on to spreadsheet packages that allows probability distributions to be placed over the inputs to spreadsheets. Predictive distributions over the outputs are then obtained by simulation. Demonstrator versions are available for downloading.

http://www-math.bgsu.edu/~albert/mini_bayes/info.html
This site is an adjunct to Jim Albert’s book Bayesian Computation Using Minitab, and features macros for carrying out a variety of analyses.

General Bayesian sites

http://bayes.stat.washington.edu/bayes_people.html
The Bayesian Statistics Personal Web Pages site has links to the home pages of many researchers in Bayesian methods. These provide a vast array of lecture notes, reprints and slide presentations.

http://www.bayesian.org/
The International Society for Bayesian Analysis provides information on its activities and useful links.

http://www.stat.ucla.edu/~jsanchez/sbssnews/sbssnews.html
The American Statistical Association Section on Bayesian Statistical Sciences (SBSS) has a preprint archive and links to other sites.

http://www.isds.duke.edu/sites/bayes.html
This web page hosted from Duke University provides a list of Bayesian sites.


Current and past membership details of all HTA ‘committees’ are available from the HTA website (see inside front cover for details)

Methodology Group

Members

Methodology Programme Director
Professor Richard Lilford
Director of Research and Development
NHS Executive – West Midlands, Birmingham

Chair
Professor Martin Buxton
Director, Health Economics Research Group
Brunel University, Uxbridge

Professor Douglas Altman
Professor of Statistics in Medicine
University of Oxford

Dr David Armstrong
Reader in Sociology as Applied to Medicine
King’s College, London

Professor Nicholas Black
Professor of Health Services Research
London School of Hygiene & Tropical Medicine

Professor Ann Bowling
Professor of Health Services Research
University College London Medical School

Professor David Chadwick
Professor of Neurology
The Walton Centre for Neurology & Neurosurgery, Liverpool

Dr Mike Clarke
Associate Director (Research)
UK Cochrane Centre, Oxford

Professor Paul Dieppe
Director, MRC Health Services Research Centre
University of Bristol

Professor Michael Drummond
Director, Centre for Health Economics
University of York

Dr Vikki Entwistle
Senior Research Fellow, Health Services Research Unit
University of Aberdeen

Professor Ewan B Ferlie
Professor of Public Services Management
Imperial College, London

Professor Ray Fitzpatrick
Professor of Public Health & Primary Care
University of Oxford

Dr Naomi Fulop
Deputy Director, Service Delivery & Organisation Programme
London School of Hygiene & Tropical Medicine

Mrs Jenny Griffin
Head, Policy Research Programme
Department of Health, London

Professor Jeremy Grimshaw
Programme Director, Health Services Research Unit
University of Aberdeen

Professor Stephen Harrison
Professor of Social Policy
University of Manchester

Mr John Henderson
Economic Advisor
Department of Health, London

Professor Theresa Marteau
Director, Psychology & Genetics Research Group
Guy’s, King’s & St Thomas’s School of Medicine, London

Dr Henry McQuay
Clinical Reader in Pain Relief
University of Oxford

Dr Nick Payne
Consultant Senior Lecturer in Public Health Medicine
ScHARR, University of Sheffield

Professor Joy Townsend
Director, Centre for Research in Primary & Community Care
University of Hertfordshire

Professor Kent Woods
Director, NHS HTA Programme, & Professor of Therapeutics
University of Leicester


HTA Commissioning Board


Members

Programme Director
Professor Kent Woods
Director, NHS HTA Programme, & Professor of Therapeutics
University of Leicester

Chair
Professor Shah Ebrahim
Professor of Epidemiology of Ageing
University of Bristol

Deputy Chair
Professor Jon Nicholl
Director, Medical Care Research Unit
University of Sheffield

Professor Douglas Altman
Director, ICRF Medical Statistics Group
University of Oxford

Professor John Bond
Director, Centre for Health Services Research
University of Newcastle-upon-Tyne

Ms Christine Clark
Freelance Medical Writer
Bury, Lancs

Professor Martin Eccles
Professor of Clinical Effectiveness
University of Newcastle-upon-Tyne

Dr Andrew Farmer
General Practitioner & NHS R&D Clinical Scientist
Institute of Health Sciences, University of Oxford

Professor Adrian Grant
Director, Health Services Research Unit
University of Aberdeen

Dr Alastair Gray
Director, Health Economics Research Centre
Institute of Health Sciences, University of Oxford

Professor Mark Haggard
Director, MRC Institute of Hearing Research
University of Nottingham

Professor Jenny Hewison
Senior Lecturer, School of Psychology
University of Leeds

Professor Alison Kitson
Director, Royal College of Nursing Institute, London

Dr Donna Lamping
Head, Health Services Research Unit
London School of Hygiene & Tropical Medicine

Professor David Neal
Professor of Surgery
University of Newcastle-upon-Tyne

Professor Gillian Parker
Nuffield Professor of Community Care
University of Leicester

Dr Tim Peters
Reader in Medical Statistics
University of Bristol

Professor Martin Severs
Professor in Elderly Health Care
University of Portsmouth

Dr Sarah Stewart-Brown
Director, Health Services Research Unit
University of Oxford

Professor Ala Szczepura
Director, Centre for Health Services Studies
University of Warwick

Dr Gillian Vivian
Consultant in Nuclear Medicine & Radiology
Royal Cornwall Hospitals Trust, Truro

Professor Graham Watt
Department of General Practice
University of Glasgow

Dr Jeremy Wyatt
Senior Fellow, Health Knowledge Management Centre
University College London


Copies of this report can be obtained from:

The National Coordinating Centre for Health Technology Assessment,
Mailpoint 728, Boldrewood,
University of Southampton,
Southampton, SO16 7PX, UK.
Fax: +44 (0) 23 8059 5639
Email: [email protected]
http://www.ncchta.org
ISSN 1366-5278

Health Technology Assessment 2000; Vol. 4: No. 38
Bayesian methods in HTA

Feedback

The HTA programme and the authors would like to know your views about this report.

The Correspondence Page on the HTA website (http://www.ncchta.org) is a convenient way to publish your comments. If you prefer, you can send your comments to the address below, telling us whether you would like us to transfer them to the website.

We look forward to hearing from you.