
UvA-DARE is a service provided by the library of the University of Amsterdam (http://dare.uva.nl)

UvA-DARE (Digital Academic Repository)

Improving evaluation of obstetric interventions

van 't Hooft, J.


Citation for published version (APA): van 't Hooft, J. (2016). Improving evaluation of obstetric interventions.

General rights

It is not permitted to download or to forward/distribute the text or part of it without the consent of the author(s) and/or copyright holder(s), other than for strictly personal, individual use, unless the work is under an open content license (like Creative Commons).

Disclaimer/Complaints regulations

If you believe that digital publication of certain material infringes any of your rights or (privacy) interests, please let the Library know, stating your reasons. In case of a legitimate complaint, the Library will make the material inaccessible and/or remove it from the website. Please Ask the Library: https://uba.uva.nl/en/contact, or send a letter to: Library of the University of Amsterdam, Secretariat, Singel 425, 1012 WP Amsterdam, The Netherlands. You will be contacted as soon as possible.

Download date: 24 Aug 2020

IMPROVING EVALUATION OF OBSTETRIC INTERVENTIONS

Janneke van 't Hooft


ISBN 978-94-6332-078-8

Painting cover: Rattner, Abraham (1895-1978). Mother and Child, 1938. New York, Museum of Modern Art (MoMA). Oil on canvas, 28 3/4 x 39 3/8" (73 x 100 cm).

Modern Art, New York/Scala, Florence.

Design: Ferdinand van Nispen tot Pannerden,

Citroenvlinder DTP&Vormgeving, my-thesis.nl

Printed by: GVO drukkers & vormgevers BV, Ede, The Netherlands

Financial support for printing this thesis was kindly provided by: Universiteit

van Amsterdam (UvA), Onze Lieve Vrouwe Gasthuis, Clara Angela Foundation,

Dutch Farm Experience, Paul S. Klussenbedrijf, ChipSoft, BMA BV (Mosos),

Medical Dynamics, IM Services BV and Memidis Pharma BV.

No part of this thesis may be reproduced, stored in a retrieval system or transmitted in any form or by any means without prior permission of the author.

ACADEMIC DISSERTATION

to obtain the degree of doctor

at the Universiteit van Amsterdam

on the authority of the Rector Magnificus

prof. dr. ir. K.I.J. Maex

before a committee appointed by the Doctorate Board,

to be defended in public in the Agnietenkapel

on Tuesday 8 November 2016, at 14:00

by

Janneke van 't Hooft

born in Pueblo Nuevo, Nicaragua

DOCTORAL COMMITTEE

Supervisor:

Prof. dr. B.W.J. Mol Universiteit van Amsterdam

Co-supervisors:

Dr. B.C. Opmeer Universiteit van Amsterdam

Dr. J.H. van der Lee Universiteit van Amsterdam

Other members:

Prof. dr. J.A.M. van der Post Universiteit van Amsterdam

Prof. dr. E. Pajkrt Universiteit van Amsterdam

Prof. dr. A.H.L.C. van Kaam Universiteit van Amsterdam

Prof. dr. T.J. Roseboom Universiteit van Amsterdam

Dr. M.E. van den Akker-van Marle Leids Universitair Medisch Centrum

Prof. dr. K.S. Khan Barts and the London School of Medicine

Faculty: Faculty of Medicine

For Jan

CONTENTS

Chapter 1 General introduction

Part I Core outcome set for obstetrical evaluation studies

Chapter 2 Core Outcome Set for Evaluation of Interventions to Prevent Preterm Birth

Part II Long-term outcomes of obstetrical evaluation studies

Chapter 3 Cervical pessary for preterm birth prevention in twin pregnancy with a short cervix: a 3-year follow-up

Chapter 4 Preventing preterm birth with progesterone in women with short cervical length: outcomes in children at 24 months of age

Part III Integrating outcomes of obstetrical evaluation studies

Chapter 5 Predicting developmental outcomes in premature infants by term equivalent MRI: systematic review and meta-analysis

Chapter 6 ST-analysis in electronic foetal monitoring is cost-effective from both the maternal and neonatal perspective

Chapter 7 Cost and health outcomes of effectiveness studies in obstetrics (Kosten en effecten van doelmatigheidsonderzoek in de obstetrie)

Chapter 8 Summary and general discussion

Appendices

Dutch summary (Nederlandse samenvatting)

List of co-authors and their contribution to the manuscript

PhD portfolio

List of publications

Curriculum Vitae

Acknowledgments (Dankwoord)

Drawings by children who participated in the ProTwinkids follow-up study (Chapter 3)

CHAPTER 1

GENERAL INTRODUCTION


In most pregnancies, the interplay between the mother and her unborn child is adequately balanced, resulting in the birth of the baby at the end of an uncomplicated pregnancy. Unfortunately, not all pregnancies and deliveries remain in such optimal balance. In fact, the day of birth is a high-risk event for both mother and child: the risk of dying is more than 5 times greater for the mother, and 400 times greater for the baby, than the risk of travelling 370 km by car.1 Moreover, pregnancies can be complicated by high maternal blood pressure, suboptimal growth of the foetus, foetal distress before or during labour, or preterm birth.

Many new and existing interventions can be offered to pregnant women who face a problem in pregnancy or during labour. To guide clinical as well as policy decision making, evaluation research is needed to establish evidence on the effectiveness and potential harm of these interventions. The randomized controlled trial (RCT) is considered worldwide to be the best instrument to evaluate the effectiveness of medical interventions. It is defined as a prospective study in human beings comparing the effect and value of one or more interventions against a control.2 By randomly allocating subjects, an RCT incorporates a control group that does not differ systematically from the intervention group except for the intervention being studied (Figure 1). However, although RCTs are the primary research design with the highest level of evidence, a single RCT is still prone to false-positive or false-negative results due to chance, limited generalisability and various forms of bias. In addition, the research relevant to evaluating whether an intervention is effective can be scattered throughout the literature and published in different languages. Systematic reviews and meta-analyses identify these relevant studies, appraise their quality and summarize their results using scientific methodology.3 The aggregated evidence gives a more balanced answer to a research question, and systematic reviews and meta-analyses are therefore considered to rank higher in the hierarchy of evidence than individual RCTs (Figure 1).4 An individual patient data (IPD) meta-analysis is a specific type of systematic review. Rather than extracting summary (aggregate) outcomes from study publications, the original research data are sought directly from the researchers responsible for each study. These data can then be re-analyzed centrally and, if appropriate, combined in a meta-analysis. IPD meta-analyses can provide additional relevant results by analyzing associations at the individual patient level. They allow in-depth exploration of patient factors and subgroups, and have been described as the gold standard of systematic reviews.5
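To make concrete what "combining" aggregate study results means, the sketch below pools risk ratios from three trials using fixed-effect inverse-variance weighting, one of the simplest aggregate-data meta-analysis methods. It is only an illustration under stated assumptions. The trial estimates are hypothetical, the method is not presented as the one used in any chapter of this thesis, and a real systematic review would also assess heterogeneity and usually consider a random-effects model.

```python
import math
from statistics import NormalDist


def pool_fixed_effect(log_rrs, standard_errors, level=0.95):
    """Fixed-effect inverse-variance pooling of log risk ratios.

    Returns the pooled risk ratio, its confidence limits and the pooled
    standard error on the log scale.
    """
    weights = [1 / se ** 2 for se in standard_errors]  # more precise trials weigh more
    pooled_log = sum(w * y for w, y in zip(weights, log_rrs)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    return (math.exp(pooled_log),
            math.exp(pooled_log - z * pooled_se),
            math.exp(pooled_log + z * pooled_se),
            pooled_se)


# Hypothetical trials: risk ratios and their standard errors on the log scale
trial_rrs = [0.80, 0.90, 0.70]
trial_ses = [0.20, 0.15, 0.25]
rr, lower, upper, se = pool_fixed_effect([math.log(r) for r in trial_rrs], trial_ses)
print(f"pooled RR {rr:.2f} [95% CI {lower:.2f} to {upper:.2f}]")
```

Because the pooled variance is the inverse of the summed weights, the pooled standard error is smaller than that of any single trial. This is the statistical basis for the more balanced answer attributed to meta-analysis above.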

Variation in outcomes used in obstetrical evaluation research

When an RCT or systematic review (SR) addresses a relevant question in a specific population, comparing an intervention group with a comparison group, it measures and reports key outcomes that provide a better understanding of the effectiveness and safety of that intervention at a specific time point (PICO structure: Population, Intervention, Comparison, Outcome; Figure 1). Ideally, the outcomes used in RCTs and SRs are of real importance to the population. However, if researchers address a more biological or mechanistic question, the outcomes chosen may differ from those used for more clinically oriented research questions. Within the context of clinical evaluation research, we limit our exploration of the problem of variation in outcomes used in RCTs and SRs to clinical outcomes of obstetric interventions. In the design phase of a clinical trial on the prevention of preterm birth, for example, the chosen ‘outcome’ could be ‘gestational age at delivery’, ‘admission to neonatal intensive care’ or ‘respiratory problems of the neonate’. Besides their (therapeutic) importance for patients,6 outcomes are also selected because validated (internally and externally) measurement tools are available for them. However, researchers may need to make pragmatic decisions when designing a trial. Because of funding and time limitations, outcomes that are easy to measure and have higher event rates may be more attractive: they increase the statistical power of the trial, but at the expense of relevance for patients. Historical precedent (outcomes already used by other researchers in the same field) and the special interests of the research team can also influence the choice of outcomes.
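The trade-off between event rate and statistical power can be illustrated with the standard normal-approximation formula for the number of participants per group needed to compare two proportions. The sketch below assumes a two-sided α of 0.05 and 80% power. It applies the same hypothetical 25% relative risk reduction to a frequent outcome (baseline risk 40%) and to a rare outcome (baseline risk 5%). All numbers are illustrative only.

```python
import math
from statistics import NormalDist


def n_per_group(p_control, p_treatment, alpha=0.05, power=0.80):
    """Participants per arm needed to detect a difference between two proportions
    (two-sided test, normal approximation, no continuity correction)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p_control - p_treatment) ** 2)


# Same 25% relative risk reduction, hypothetical baseline risks
n_common = n_per_group(0.40, 0.30)   # frequent outcome
n_rare = n_per_group(0.05, 0.0375)   # rare outcome
print(n_common, n_rare)
```

Under these assumptions the frequent outcome needs 354 participants per group and the rare outcome 4200, roughly twelve times as many. This explains why frequent, easily measured outcomes are attractive to trialists even when they matter less to patients.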

In clinical research on preterm birth, this lack of consistency in the choice of outcomes has led to over 72 different primary outcomes being reported in 103 clinical trials.7 The same inconsistency exists in SRs and meta-analyses: 33 Cochrane reviews on preterm birth reported 29 different primary outcomes.7


Figure 1. An overview of evaluation research in obstetrics and some of the problems we face

In principle, giving research teams complete freedom to choose the outcomes of their RCT or SR gives rise to several problems:

1) The selected outcomes may not be the most relevant ones, especially for patients and clinicians.

2) When relevant outcomes cannot be easily acquired, there may be a tendency to report intermediate, surrogate or proxy outcomes. This can give misleading (and even harmful) results, as explained further in the next section.

3) Similar trials may use a wide variety of outcomes, outcome measurement tools and definitions. This hampers comparison and meta-analysis of the results of trials with similar goals, and thus leads to inefficiency and research waste.

The inconsistency of outcomes reported in RCTs and SRs, and the failure to report relevant outcomes, can be addressed by introducing a core outcome set (COS) in research: a set of critical and important outcomes that should be measured and reported, as a minimum, in a standardised manner. A core outcome set captures the key outcomes to be used in trials on a specific topic, and is defined through international consensus among all relevant stakeholders (including patients) using appropriate methodology.8 The introduction of core outcome sets enhances the translation and integration of research into clinical decision making (i.e. evidence-based medicine).

Long-term outcomes

Many interventions applied in pregnancy are evaluated for their efficacy and safety by measuring short-term maternal and neonatal outcomes (Figure 1). Neonatal follow-up often ends when the child is discharged from hospital or within 6 to 10 weeks after the expected term date. Only a small minority (approximately 16%) of large obstetric RCTs evaluating the effect of a specific perinatal intervention report on long-term follow-up of the child.9 The short-term outcomes used can be surrogate outcomes or short-term clinical outcomes. An example of a surrogate outcome is the Apgar score at 1, 5 and 10 minutes after birth, used as a surrogate for short-term and long-term mortality and morbidity. Although a surrogate outcome may be associated with a long-term outcome (e.g. an Apgar score <7 at five minutes is associated with an increased risk of neurological disability), the vast majority of children born with a low Apgar score grow up without disability.10 Moreover, relying on surrogate outcomes in clinical research can have seriously harmful consequences. There are numerous examples of drugs for heart disease that were approved on the basis of surrogate outcomes but were ultimately shown to be harmful because they increased mortality.11 12 Restricting conclusions to short-term surrogate outcomes can therefore lead to seriously erroneous conclusions, because these outcomes may not reflect the true clinical effect.

Even more reliable short-term outcomes (e.g. admission to neonatal intensive care, or problems of early neonatal life such as respiratory distress syndrome) have drawbacks, because they do not provide the full scope of information needed to assess clinical impact.9 Restricting conclusions to short-term outcomes can therefore also be misleading, because the risk-benefit ratio of any perinatal intervention may change considerably, for both the pregnant woman and her infant, between the period immediately after birth and later childhood.13 This was shown, for example, by the ORACLE II study on the use of antibiotics in women in spontaneous imminent preterm labour.14 The initial trial showed no short-term benefit of antibiotics compared with placebo. At follow-up after seven years, however, a potentially harmful effect of erythromycin was found in the children, indicated by an increased risk of cerebral palsy (RR [95%CI] 1.69 [1.07 to 2.67]).15 Another trial, evaluating vitamin K and phenobarbital to prevent intracranial haemorrhage in newborns of less than 34 weeks' gestation, also showed no short-term effect, but found significantly lower Bayley scores in the treatment group than in the placebo group (mean scores (SD) 104 (21) vs 113 (22), p=0.023).16 Warning signs of long-term harm were also seen in trials evaluating progesterone17 and repeated doses of corticosteroids18 in women at high risk of imminent preterm labour. The OPPTIMUM trial showed an increased (although still infrequent) risk of renal, gastrointestinal and respiratory problems in the progesterone group (e.g. gastrointestinal disability in 4 (1%) children in the placebo group vs 9 (2%) in the progesterone group, OR [95%CI] 2.67 [1.37 to 5.20]). Another trial showed an increased, though not statistically significant, risk of cerebral palsy after repeated compared with single doses of corticosteroids in imminent preterm labour (RR [95%CI] 5.7 [0.7 to 46.7]).18

Another famous example of prenatal effects that only became apparent in adulthood is the Dutch famine study. This historical cohort provided information on the effects of famine exposure during specific periods of gestation on outcomes measured at birth and in adulthood. Data from 821 children exposed to famine in utero (divided into subgroups exposed in early, mid and late gestation) were compared with data from 1593 children conceived before or after the famine. Babies exposed to maternal famine in mid or late gestation were lighter, shorter and thinner, and had a smaller head circumference, than babies who had not been exposed.19 The long-term consequences found included the metabolic syndrome (high blood pressure, obesity, unfavourable lipid profiles and glucose intolerance), breast cancer, depression, airways disease and impaired renal function. These consequences were, however, largely independent of size at birth, underlining that programming may take place even without effects that are visible immediately after birth.19 20 Long-term follow-up of mothers and children participating in obstetrical trials is therefore pivotal.
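Effect estimates such as the risk ratios quoted above come from a standard calculation: the ratio of the risks in two groups, with a confidence interval constructed on the log scale. The sketch below shows that calculation using the usual large-sample (Wald) standard error. The counts are hypothetical and are not taken from any of the cited trials, and the method assumes at least one event in each group.

```python
import math
from statistics import NormalDist


def risk_ratio(events_a, total_a, events_b, total_b, level=0.95):
    """Risk ratio of group A versus group B with a Wald confidence interval
    on the log scale (requires at least one event in each group)."""
    rr = (events_a / total_a) / (events_b / total_b)
    se_log = math.sqrt(1 / events_a - 1 / total_a + 1 / events_b - 1 / total_b)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    lower = math.exp(math.log(rr) - z * se_log)
    upper = math.exp(math.log(rr) + z * se_log)
    return rr, lower, upper


# Hypothetical follow-up: 20/400 children with the outcome after the intervention,
# 10/400 after the control treatment
rr, lower, upper = risk_ratio(20, 400, 10, 400)
print(f"RR {rr:.2f} [95% CI {lower:.2f} to {upper:.2f}]")
```

Here the risk doubles, but because there are few events the interval still includes 1. The same mechanism explains how a large point estimate, such as the RR of 5.7 for repeated corticosteroids, can be statistically non-significant.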

Integrating outcomes of obstetrical evaluation studies to guide clinical decision making

Having introduced the importance of consistent (and relevant) outcomes in RCTs and SRs, and the added value of long-term outcomes of obstetric interventions, it should be clear that, ideally, the outcomes measured in clinical research have an impact on clinical practice. An efficient system of research addresses health problems of importance to populations, and interventions and outcomes considered important by patients and clinicians.21 However, much has been written about research waste due to low-priority questions, inappropriate study design, problems in access to study data and failure to obtain unbiased reports. This in turn hampers the implementation of research in clinical practice.21 22 A quote from Dr Ioannidis's essay ‘Why Most Clinical Research Is Not Useful’23 illustrates this: ‘Practicing doctors and other health care professionals will be familiar with how little of what they find in medical journals is useful. The term “clinical research” is meant to cover all types of investigation that address questions on the treatment, prevention, diagnosis/screening, or prognosis of disease or enhancement and maintenance of health. Experimental intervention studies (clinical trials) are the major design intended to answer such questions, but observational studies may also offer relevant evidence. “Useful clinical research” means that it can lead to a favorable change in decision making (when changes in benefits, harms, cost, and any other impact are considered) either by itself or when integrated with other studies and evidence in systematic reviews, meta-analyses, decision analyses, and guidelines’.

In this thesis, we address several clinically based research questions. We also discuss the integration of outcomes from obstetrical evaluation studies (as suggested by Ioannidis in the quote above) in systematic reviews/meta-analyses, cost-effectiveness analyses and budget impact analyses, in order to guide clinical decision making. We start with an example of clinically based research and then introduce some of the methodologies.

Clinically based research questions

An example of a clinically based research question addressed in this thesis originated from doctors working in the neonatal intensive care unit in Amsterdam. It is known that preterm birth is associated with an increased risk of neurodevelopmental problems.24 However, not all children born preterm develop such problems, and when problems do occur, they vary widely in type (cognitive, motor, visual, etc.) and severity (mild to severe). Predicting the long-term impact of a preterm birth can be of great value. It may help parents to prepare better for the future and improve the selection of children who may benefit from early intervention programs (e.g. physiotherapy or speech therapy) to improve outcomes. However, if the predictive value is poor, prediction may have unwanted effects, as parents may worry unnecessarily about the possible abnormal development of their child.25 Neonatologists were uncertain whether to perform brain MRI at term-equivalent age in all very preterm neonates. Several studies reported a high predictive value of term-equivalent MRI for the long-term development of these children, but no systematic review or meta-analysis was available on this topic. Nevertheless, discussions with international colleagues at conferences revealed that many of them were already convinced of the value of this technique and were using it as standard care in their clinical practice. Instead of implementing this imaging technique in standard care without such evidence, the department of neonatology conducted a systematic review and meta-analysis, which is included in this thesis (Chapter 5). This example shows that research answering a question that arises from clinical practice may have a higher chance of influencing clinical practice than research without such a close connection to daily clinical practice.

Meta-analysis using a bivariate model to assess the predictive value of a prognostic tool

Determining the predictive value of a prognostic tool is challenging. First, the time frame between the prognostic test (e.g. Apgar score, cord blood pH, brain MRI) and the outcome of interest (e.g. neurodevelopment) is long, so that, for reasons of feasibility, few studies evaluate this topic. Second, the outcome of interest (unfavourable neurodevelopment) can vary (e.g. cerebral palsy, visual and/or hearing problems, and motor, neurocognitive and behavioural problems). Nevertheless, information from cohort studies (e.g. studies prospectively following a consecutive sample of patients presenting with the prognostic dilemma) can be useful, acknowledging the increased risk of bias due to confounding in this type of study. These studies report estimates of the sensitivity (correctly detecting those with the target condition) and specificity (correctly identifying those without the target condition) of the prognostic test, which can be pooled using meta-analytic approaches. The sensitivity and specificity of a test are related and depend strongly on the chosen cut-off. The bivariate model therefore has the advantage of preserving the two-dimensional nature of the underlying data, and it gives insight into the optimal use of the predictive test in clinical practice.26

Cost-effectiveness analysis of an obstetrical intervention using data from clinical research

As rising health care expenditures and questions of affordability become increasingly relevant for health care decision making, interventions also need to be evaluated in terms of economic outcomes. Many RCTs are therefore complemented with an economic evaluation. Health care use and costs are estimated as economic outcomes and related to clinical outcomes through cost-effectiveness, cost-utility or cost-benefit analyses. A cost-effectiveness analysis most often compares two or more interventions (alternatives). It relates the health gains of one intervention to the additional costs (or cost savings) associated with that intervention, relative to the comparator. If health gains are achieved at increased cost, the question is whether the health gains are ‘worth’ the extra costs (willingness to pay).27 If an intervention leads to health gains and cost savings at the same time, implementing it would benefit patients while reducing health care expenditure.


Innovative (and costly) interventions and drugs often increase costs, which requires consideration of whether society is willing to pay for the anticipated benefits. In many other cases, where interventions optimize existing care arrangements, improvements in health and reductions in costs often go hand in hand, and cost savings can be expected.
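The reasoning above can be written as a small decision rule. The sketch compares a new strategy with a comparator by computing the incremental cost-effectiveness ratio (ICER) and the incremental net monetary benefit at an assumed willingness-to-pay threshold. It labels strategies that are both more effective and no more costly as dominant. The costs, health effects (as QALYs) and the €20,000 threshold are hypothetical, and this is a generic textbook formulation rather than the model used in Chapter 6.

```python
from dataclasses import dataclass


@dataclass
class Strategy:
    name: str
    cost: float    # mean cost per patient (hypothetical, euros)
    effect: float  # mean health effect per patient (hypothetical QALYs)


def incremental_analysis(new: Strategy, old: Strategy, wtp: float):
    """Compare a new strategy with a comparator at a willingness-to-pay threshold
    (euros per unit of effect). Returns (ICER, incremental net monetary benefit, verdict)."""
    d_cost = new.cost - old.cost
    d_effect = new.effect - old.effect
    icer = d_cost / d_effect if d_effect != 0 else None  # undefined without a health difference
    inmb = wtp * d_effect - d_cost  # positive when the health gain is 'worth' the extra cost
    if d_effect >= 0 and d_cost <= 0 and (d_effect, d_cost) != (0, 0):
        verdict = "dominant"   # better in at least one respect, worse in neither
    elif d_effect <= 0 and d_cost >= 0 and (d_effect, d_cost) != (0, 0):
        verdict = "dominated"  # worse in at least one respect, better in neither
    elif inmb > 0:
        verdict = "cost-effective at this threshold"
    else:
        verdict = "not cost-effective at this threshold"
    return icer, inmb, verdict


usual = Strategy("usual care", cost=3000.0, effect=0.80)
innovative = Strategy("innovative care", cost=3500.0, effect=0.82)  # gains at extra cost
optimised = Strategy("optimised care", cost=2800.0, effect=0.81)    # gains and savings

print(incremental_analysis(innovative, usual, wtp=20_000))
print(incremental_analysis(optimised, usual, wtp=20_000))
```

With these numbers, the innovative strategy costs about €25,000 per QALY gained, which exceeds the assumed threshold. The optimised strategy is dominant, mirroring the two situations described in the text.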

Budget impact analysis to explore the potential health and budget impact in a population before and after implementation of clinical trial results

Finally, if we succeed in performing clinical research that has an impact on clinical practice, we can estimate its potential health and economic impact by performing a budget impact analysis (BIA). The purpose of a BIA is to estimate the financial and health care consequences of the adoption and diffusion of a health care intervention within a specific health care setting or system.28 To this end, a BIA can be used to predict the shift in the prevalence of disease (or health conditions) and in costs after interventions found effective in evaluation research are implemented and interventions found ineffective are de-implemented. A BIA can thus provide additional information that helps motivate clinicians to implement the results of evaluation research in clinical practice.
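A minimal sketch of the core BIA arithmetic follows, under simplifying assumptions: a fixed eligible population each year, a new intervention that gradually replaces current care according to an assumed uptake curve, and a constant cost and risk of an adverse outcome per patient for each option. All inputs are hypothetical, and a full BIA following good-practice guidance28 would involve considerably more detail.

```python
def budget_impact(population, uptake_by_year, cost_current, cost_new,
                  risk_current, risk_new):
    """Yearly budget impact and adverse outcomes avoided when a new intervention
    gradually replaces current care in a fixed eligible population."""
    rows = []
    for year, uptake in enumerate(uptake_by_year, start=1):
        on_new = population * uptake       # patients receiving the new intervention
        on_current = population - on_new   # patients still receiving current care
        budget_without = population * cost_current
        budget_with = on_current * cost_current + on_new * cost_new
        outcomes_avoided = on_new * (risk_current - risk_new)
        rows.append((year, budget_with - budget_without, outcomes_avoided))
    return rows


for year, impact, avoided in budget_impact(
        population=10_000, uptake_by_year=[0.2, 0.5, 0.8],
        cost_current=1_200.0, cost_new=1_100.0,
        risk_current=0.10, risk_new=0.08):
    print(f"year {year}: budget impact {impact:+,.0f}, adverse outcomes avoided {avoided:.0f}")
```

A negative budget impact means that the new situation costs less than the current one. Reporting health effects alongside costs reflects the combined health and budget perspective described above.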

PROBLEMS

In summary, we face the following problems in evaluation research on obstetric interventions:

1) a lack of standardization in the selection and operationalization of outcomes, which may lead to inefficiency in research and waste of resources;

2) a lack of systematic follow-up data from randomized controlled trials, leaving a blind spot in clinical research;

3) a gap between clinical research and its impact on clinical decisions, so that patients do not fully benefit from the available evidence.

AIMS OF THE THESIS

This thesis focuses on improving evaluation research on obstetric interventions.

The aims are to:

• develop a core outcome set that can be used in obstetrical evaluation studies

• measure long-term outcomes of obstetrical evaluation studies

• integrate outcomes of obstetrical evaluation studies in order to guide clinical decision making

OUTLINE OF THE THESIS

The thesis is divided into three parts.

Part I describes the development of a core outcome set that can be used in obstetrical evaluation studies. Chapter 2 presents a core outcome set (COS) for studies on the prevention of preterm birth, developed with an international e-Delphi consensus group. This COS reflects the outcomes that are critically important to all relevant stakeholders (patients, obstetricians, midwives, neonatologists and researchers).

Part II explores ways to measure long-term outcomes of obstetrical intervention studies. Chapter 3 evaluates the long-term effects in children born to mothers with a short cervical length who participated in a randomized controlled trial of a cervical pessary during twin pregnancy. Chapter 4 evaluates the long-term effects in children born to mothers with a short cervical length in a singleton pregnancy. These women were included in a randomized controlled trial comparing vaginal progesterone in the second and third trimesters with placebo to prevent preterm birth.

Part III deals with the integration of outcomes of obstetrical evaluation studies in systematic reviews, cost-effectiveness analyses and budget impact analyses to guide clinical decision making. Chapter 5 evaluates the predictive value of brain MRI for long-term developmental outcomes in children born preterm or with a low birth weight. Chapter 6 models the short- and long-term costs and effects of an advanced form of foetal monitoring during labour (ST-analysis) compared with conventional foetal monitoring, from both a maternal and a neonatal perspective.

Finally, the ultimate goal of performing evaluation studies of obstetrical interventions is to improve the health outcomes of mothers and their children at an acceptable cost. Implementation of trial results is therefore a crucial step. Chapter 7 explores the potential impact on health and costs at a national level of implementing the results of nationwide evaluation studies in obstetrics. It assesses whether evaluation research leads to cost savings and whether these savings cover the cost of performing the research.

REFERENCES

1. Walker KF, Cohen AL, Walker SH, et al. The dangers of the day of birth. BJOG 2014;121(6):714-8.

2. Friedman LM, Furberg CD, DeMets DL. Fundamentals of Clinical Trials. 4th ed. Springer; 2010.

3. Khan K, Kunz R, Kleijnen J, et al. Systematic reviews to support evidence-based medicine. 2nd ed. 2011.

4. Sackett DL. Rules of evidence and clinical recommendations on the use of antithrombotic agents. Chest 1989;95(2 Suppl):2S-4S.

5. Stewart LA, Tierney JF. To IPD or not to IPD? Advantages and disadvantages of systematic reviews using individual patient data. Eval Health Prof 2002;25(1):76-97.

6. Schulz KF, Altman DG, Moher D, et al. CONSORT 2010 statement: updated guidelines for reporting parallel group randomised trials. BMJ 2010;340:c332.

7. Meher S, Alfirevic Z. Choice of primary outcomes in randomised trials and systematic reviews evaluating interventions for preterm birth prevention: a systematic review. BJOG 2014;121(10):1188-94.

8. Williamson PR, Altman DG, Blazeby JM, et al. Developing core outcome sets for clinical trials: issues to consider. Trials 2012;13:132.

9. Teune MJ, van Wassenaer AG, Malin GL, et al. Long-term child follow-up after large obstetric randomised controlled trials for the evaluation of perinatal interventions: a systematic review of the literature. BJOG 2013;120(1):15-22.

10. Ehrenstein V, Pedersen L, Grijota M, et al. Association of Apgar score at five minutes with long-term neurologic disability and cognitive function in a prevalence study of Danish conscripts. BMC Pregnancy Childbirth 2009;9:14.

11. Svensson S, Menkes DB, Lexchin J. Surrogate outcomes in clinical trials: a cautionary tale. JAMA Intern Med 2013;173(8):611-2.

12. Yudkin JS, Lipska KJ, Montori VM. The idolatry of the surrogate. BMJ 2011;343:d7995.

13. Mol BW, Daly M, Dodd JM. Progestogens and preterm birth--not the hoped for panacea? Lancet 2016;387(10033):2066-8.

14. Kenyon SL, Taylor DJ, Tarnow-Mordi W, et al. Broad-spectrum antibiotics for spontaneous preterm labour: the ORACLE II randomised trial. ORACLE Collaborative Group. Lancet 2001;357(9261):989-94.

15. Kenyon S, Pike K, Jones DR, et al. Childhood outcomes after prescription of antibiotics to pregnant women with spontaneous preterm labour: 7-year follow-up of the ORACLE II trial. Lancet 2008;372(9646):1319-27.

16. Thorp JA, O’Connor M, Jones AM, et al. Does perinatal phenobarbital exposure affect developmental outcome at age 2? Am J Perinatol 1999;16(2):51-60.

17. Norman JE, Marlow N, Messow CM, et al. Vaginal progesterone prophylaxis for preterm birth (the OPPTIMUM study): a multicentre, randomised, double-blind trial. Lancet 2016;387(10033):2106-16.

18. Wapner RJ, Sorokin Y, Mele L, et al. Long-term outcomes after repeat doses of antenatal corticosteroids. N Engl J Med 2007;357(12):1190-8.

19. Painter RC, Roseboom TJ, Bleker OP. Prenatal exposure to the Dutch famine and disease in later life: an overview. Reprod Toxicol 2005;20(3):345-52.

20. Roseboom T, de Rooij S, Painter R. The Dutch famine and its long-term consequences for adult health. Early Hum Dev 2006;82(8):485-91.

21. Chalmers I, Glasziou P. Avoidable waste in the production and reporting of research evidence. Lancet 2009;374(9683):86-9.

22. Bero LA, Grilli R, Grimshaw JM, et al. Closing the gap between research and practice: an overview of systematic reviews of interventions to promote the implementation of research findings. The Cochrane Effective Practice and Organization of Care Review Group. BMJ 1998;317(7156):465-8.

23. Ioannidis JP. Why Most Clinical Research Is Not Useful. PLoS Med 2016;13(6):e1002049.


24. Aarnoudse-Moens CS, Weisglas-Kuperus N, van Goudoever JB, et al. Meta-analysis of neurobehavioral outcomes in very preterm and/or very low birth weight children. Pediatrics 2009;124(2):717-28.

25. Janvier A, Barrington K. Trying to predict the future of ex-preterm infants: who benefits from a brain MRI at term? Acta Paediatr 2012;101(10):1016-7.

26. Reitsma JB, Glas AS, Rutjes AW, et al. Bivariate analysis of sensitivity and specificity produces informative summary measures in diagnostic reviews. J Clin Epidemiol 2005;58(10):982-90.

27. Ryder HF, McDonough C, Tosteson AN, et al. Decision Analysis and Cost-effectiveness Analysis. Semin Spine Surg 2009;21(4):216-22.

28. Mauskopf JA, Sullivan SD, Annemans L, et al. Principles of good practice for budget impact analysis: report of the ISPOR Task Force on good research practices--budget impact analysis. Value Health 2007;10(5):336-47.

PART I Core outcome set for obstetrical evaluation studies

CHAPTER 2

A CORE OUTCOME SET FOR EVALUATION OF INTERVENTIONS TO PREVENT PRETERM BIRTH

Janneke van 't Hooft, James M. N. Duffy, Mandy Daly, Paula R. Williamson, Shireen Meher, Elizabeth Thom, George R. Saade, Zarko Alfirevic, Ben Willem J. Mol, Khalid S. Khan.

On behalf of the Global Obstetrics Network (GONet)

Obstet Gynecol. 2016;127(1):49-58


ABSTRACT

Objective

To develop consensus on a set of key clinical outcomes for the evaluation of preventive interventions for preterm birth in asymptomatic pregnant women.

Methods

A two-stage web-based Delphi survey and a face-to-face meeting of key stakeholders were used to develop consensus on a set of critical and important outcomes. We approached five stakeholder groups (parents, midwives, obstetricians, neonatologists and researchers) from middle- and high-income countries. Outcomes entered into the Delphi survey were identified by systematic literature review and stakeholder input. Survey participants scored each outcome on a 9-point Likert scale anchored between 1 (limited importance) and 9 (critical importance). Between survey stages, they had the opportunity to reflect on feedback from the total group and from each stakeholder subgroup. Consensus, defined a priori, required that at least 70% of participants in each stakeholder group score an outcome as ‘critical’ and less than 15% score it as of ‘limited importance’.

Results

A total of 228 participants from five stakeholder groups in three lower-middle-income, seven upper-middle-income and 17 high-income countries were asked to score 31 outcomes. Of these participants, 195 completed the first survey and 174 the second. Consensus was reached on 13 core outcomes. Four related to pregnant women: maternal mortality, maternal infection or inflammation, prelabor rupture of membranes, and harm to mother from intervention. Nine related to offspring: gestational age at birth, offspring mortality, birth weight, early neurodevelopmental morbidity, late neurodevelopmental morbidity, gastrointestinal morbidity, infection, respiratory morbidity, and harm to offspring from intervention.

Conclusions

This core outcome set for studies that evaluate prevention of preterm birth, developed with an international multidisciplinary perspective, will ensure that data from trials that assess prevention of preterm birth can be compared and combined.

Core outcome set for preterm birth studies

INTRODUCTION

Clinical trials, systematic reviews and guidelines evaluate interventions by comparing outcomes chosen to reflect beneficial and harmful effects. Systematic reviews have the potential to minimize bias and to increase the precision of estimates of treatment effects by quantitative pooling (meta-analysis) of similar clinical trial outcomes. However, pooling is not possible if clinical trials collect different outcomes. The lack of consistency in outcomes reported in comparative health research evaluating interventions for preterm birth has led to 72 different primary outcomes being reported in 103 randomised trials.1 Such heterogeneity results in substantial outcome reporting bias and an inability to synthesise results across studies in systematic reviews.2 As a consequence, 33 Cochrane reviews on preterm birth have reported on 29 different primary outcomes.1 This problem could be addressed by the use of a core outcome set, that is, a set of critical and important outcomes that should be measured and reported, as a minimum, in a standardised manner in research.3 Such a set does not currently exist, and the need for one has recently been expressed in a systematic review of studies of preterm neonates, which found that the outcome ‘chronic lung disease’, considered an important outcome, was missing in 55% of relevant systematic reviews.2

A core outcome set captures the key outcomes from those that could be, or have been, used in trials on a specific topic. These core outcome sets should be included in future studies of that topic. This does not imply that a particular trial should be restricted to the outcomes in the core set. Ideally, core outcomes will always be collected and reported, while researchers continue to explore other outcomes.3 In many trials, the primary outcome would be expected to be one of those contained in the core set, although this is not part of the definition of a core outcome set. Successful implementation of a core outcome set for rheumatoid arthritis has improved the harmonization of research by establishing outcomes that are now measured more frequently by researchers.4

The aim of our project was to use robust consensus methods and engage all key stakeholders to identify a set of critical and important outcomes (core outcome set) for the evaluation of preventive interventions for preterm birth in asymptomatic pregnant women.


METHODS

A protocol with explicitly defined objectives, formal consensus development methods, criteria for participant identification and selection, and statistical methods was developed. The study was prospectively registered with the Core Outcome Measures in Effectiveness Trials (COMET) initiative (registration number 603, available online at www.comet-initiative.org/studies/details/603). The ethics board of the Academic Medical Center, Amsterdam, The Netherlands, advised that ethical approval was not required (reference number E2-172) because the project was considered service evaluation and development.

The target of the core outcome set was to capture important outcomes for individual studies, systematic reviews, and guidelines on preterm birth prevention in asymptomatic women. For our purposes, preterm birth was defined as the birth of a live neonate before 37 weeks of gestation.5 An asymptomatic woman was defined as one without symptoms of preterm labor (e.g. increased uterine contractions, menstrual cramps or backache, color change of vaginal discharge, prelabor rupture of membranes). Preventive treatment of preterm birth was defined as treatment started before any symptoms of preterm labor were present. This preventive strategy could be pharmacologic (e.g. progesterone, marine oils, probiotics) or non-pharmacologic (e.g. cerclage, pessary, lifestyle interventions and alternative therapies).

A Project Steering Committee was established to guide the different phases of this project. It consisted of two obstetricians (Irene de Graaf, Khalid S. Khan), two neonatologists (Timo de Haan, Stephen Kempley), two midwives (Felipe Castro, Birgit van der Goes), two patient representatives (Aoife Ahern, Mandy Daly) and three methodologists with experience in formal consensus and/or core outcome set methods (James Duffy, Brent Opmeer, and Paula Williamson).

A systematic literature review was undertaken by searching the Cochrane Pregnancy and Childbirth Group’s (PCG) Trials Register.1 This register is maintained by monthly searches of the Cochrane Central Register of Controlled Trials, weekly searches of EMBASE and MEDLINE, and hand-searches of 30 journals and conference proceedings (from January 1997 to January 2011). The register was searched using the register’s codes for preterm birth. Two reviewers (S.M. and Z.A.) independently screened titles and abstracts. They critically reviewed the full text of selected studies and extracted the reported outcomes. Any discrepancies were resolved by discussion.

In addition, all delegates (n=168) of the First European Spontaneous Preterm Birth Congress (Svendborg, Denmark, May 24-25, 2014), mainly obstetricians and researchers but also midwives, neonatologists and members of industry, were asked via e-mail to recommend potential outcomes. Patient representatives and parents were invited through social media (Twitter and patient forums on Facebook) to complete an online questionnaire sharing their opinions on outcomes relevant to preterm birth. Members of patient organisations, including the European Foundation for the Care of Newborn Infants, its partner organizations, and parental forums of neonatal units, were e-mailed an invitation to the online questionnaire by their own organization through an electronic newsletter. Patients also contributed their opinions through in-person semistructured interviews conducted by one of the authors (J.v.t.H.).

The Project Steering Committee identified outcomes that were duplicated because different stakeholders used different terminology, and grouped closely related outcomes into overarching domains. The resulting inventory of 29 outcomes was entered into a Delphi process (Figure 1).

We used a two-round electronic Delphi survey design. This is a well-established method for eliciting consensus through an iterative process with anonymous consultation, controlled feedback and quantified analysis of the responses.6 A priori, we agreed on the important methodological features of our Delphi process: [1] composition of the group; [2] anonymity; [3] how to assess the importance of outcomes; [4] method of feedback of results to participants; [5] how consensus would be reached; [6] how to assess possible attrition bias.

The setting for the Delphi survey was multinational, involving stakeholders from middle- and high-income countries. A formal written invitation was e-mailed to all members of the Cochrane Pregnancy and Childbirth Group (n=30), the Core Outcomes in Women’s Health initiative (n=77), the European Preterm Birth Congress (n=168), and the Global Obstetrics Network (n=237). Most members of these organizations are researchers (methodologists), obstetricians (mainly specialized in maternal-fetal medicine) or neonatologists. The European Foundation for the Care of Newborn Infants approached its own members, including its partner organizations in Australia, Belgium, Bulgaria, Canada, Chile, Croatia, Cyprus, Denmark, Finland, France, Germany, Greece, Hungary, Ireland, Israel, Italy, Lithuania, Mexico, the Netherlands, Norway, Poland, Portugal, Spain, Turkey, United Kingdom, and the United States. All midwives from ‘Barts Health Nursing and Midwifery’ (n=132), and some midwives from the School of Nursing and Midwifery (Galway, Ireland) and the Dutch Consortium for Healthcare Evaluation in Obstetrics and Gynaecology, were approached. With this approach we aimed to target midwives who were active in research (50%) and midwives who were not (50%). In total, 337 obstetricians, 152 midwives, 174 researchers, 75 neonatologists, and an unknown number of parents (through the patient organizations mentioned above) were invited.

We used LimeSurvey for the Delphi survey. The survey was first piloted by eight people representing every stakeholder group, and no changes were needed after the pilot. Each Delphi round closed 5 weeks after the date of invitation. E-mail reminders were sent to participants on days 7, 14, 21, and 28. Nonresponders in the first round were not invited to participate in the subsequent round.

Participants were asked to rate the importance of each outcome on a 9-point Likert scale anchored between 1 (‘limited importance’) and 9 (‘critical importance’). This scale is recommended by the Grading of Recommendations Assessment, Development and Evaluation (GRADE) working group: 1-3, limited importance; 4-6, important but not critical; 7-9, critical.7 At the end of the survey, participants were invited to recommend additional potential outcomes for consideration using free-text responses.
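The GRADE banding amounts to a fixed lookup from score to category. A minimal Python sketch follows. The function name `grade_category` is illustrative and is not part of the study's analysis, which was performed in SPSS.

```python
def grade_category(score: int) -> str:
    """Map a 1-9 Likert score to the GRADE importance band."""
    if not 1 <= score <= 9:
        raise ValueError(f"score must be between 1 and 9, got {score}")
    if score <= 3:
        return "limited importance"
    if score <= 6:
        return "important but not critical"
    return "critical"


if __name__ == "__main__":
    print([grade_category(s) for s in (2, 5, 8)])
    # ['limited importance', 'important but not critical', 'critical']
```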

The individual, stakeholder group and total results from the first round were relayed back to participants by e-mail. Individual responses were sent directly after the participant completed the first-round questionnaire. Stakeholder group and total group responses were fed back anonymously 1 day before the invitation to the second round of the Delphi survey. Furthermore, while completing the second survey, participants could see the mean value of the total group responses from the first Delphi round. Participants were asked to score all the individual outcomes again using the same 9-point Likert scale. No outcomes were excluded in this round, to ensure a holistic approach to scoring in round 2.

The Delphi survey responses were analyzed using SPSS version 21.0. For each outcome, the median and interquartile range were calculated. Frequency tables of all scores were generated, as well as boxplots for visualization, which were used to relay the whole-group and stakeholder group responses back to participants. We defined consensus a priori. Core outcomes required at least 70% of participants in each stakeholder group to score the outcome as ‘critical’ and less than 15% of participants in each stakeholder group to score it as of ‘limited importance’.8 Outcomes to be excluded from a core outcome set required at least 70% of participants in each stakeholder group to score the outcome as of ‘limited importance’ and less than 15% of participants in each stakeholder group to score it as ‘critical’. Outcomes that met neither criterion were classified as outcomes with no consensus. Attrition bias (e.g. a selective group not responding to the second round of the survey, or a selective group participating in the consultation meeting) was assessed by 1) comparing the distribution of median first-round scores across the outcomes between those who did and did not participate in the second round; and 2) comparing the distribution of median second-round scores across the outcomes between those who did and did not participate in the consultation meeting.

The final phase of the study was a face-to-face consultation meeting with participants of the Delphi exercise representing all stakeholder groups (Washington, DC, November 9, 2014). This meeting was held as part of a meeting of a prospective individual participant data analysis project for studies on the use of pessary to prevent preterm birth in asymptomatic women. Eleven participants of this prospective individual patient data project had also taken part in the Delphi survey. They mainly represented the stakeholder groups of obstetricians and researchers (methodologists). Representatives of the other stakeholder groups (parents, midwives and neonatologists) who lived close to the meeting location were also invited. In total, 23 obstetricians, 10 researchers, two neonatologists, two patient representatives, and one midwife were invited to attend this meeting. Information on the purpose of the consultation meeting and the Delphi round 2 results were sent to participants before the meeting. A plenary presentation on the Delphi survey outcomes was complemented by small group sessions (mixed groups) in which participants expressed their views on the candidate outcomes. Only outcomes that did not reach full consensus in the Delphi exercise were presented to the attendees, who voted anonymously using electronic touchpads. Consensus in the consultation meeting required a majority of 70% of participants from each stakeholder group rating an individual outcome as ‘critical’ on the 1-9 Likert scale. With the permission of the participants, the consultation meeting was recorded.
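The meeting rule differs from the Delphi rule in two ways. It has no 15% ‘limited importance’ clause, and it is evaluated per stakeholder group rather than over all attendees pooled. The hypothetical Python sketch below contrasts the per-group and pooled views. It reads ‘a majority of 70%’ as 70% or more. The group names and votes are invented for illustration.

```python
from typing import Dict, List


def share_critical(votes: List[int]) -> float:
    """Fraction of 1-9 votes falling in the 'critical' band (7-9)."""
    if not votes:
        raise ValueError("no votes recorded")
    return sum(1 for v in votes if 7 <= v <= 9) / len(votes)


def meeting_consensus(votes_by_group: Dict[str, List[int]],
                      threshold: float = 0.70) -> bool:
    """Consensus only if EVERY stakeholder group reaches the threshold,
    so each group carries equal weight regardless of its size."""
    return all(share_critical(v) >= threshold for v in votes_by_group.values())


def pooled_share(votes_by_group: Dict[str, List[int]]) -> float:
    """For contrast: the share of critical votes with all individuals pooled."""
    all_votes = [v for votes in votes_by_group.values() for v in votes]
    return share_critical(all_votes)


if __name__ == "__main__":
    # Hypothetical votes: a large group that agrees and a small group that does not.
    votes = {"obstetricians": [8] * 14, "parents": [7, 4]}
    print(pooled_share(votes))       # 0.9375: pooled, the outcome looks 'critical'
    print(meeting_consensus(votes))  # False: parents reach only 50%
```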

RESULTS

The systematic review yielded 170 randomised trials and 33 reviews and protocols. The flowchart and further information on the selection process and the systematic review are reported elsewhere.1 We identified 72 outcomes reported as primary outcomes and 155 outcomes reported as secondary outcomes. A further 25 outcomes were recommended by participants of the First European Spontaneous Preterm Birth Congress, and eight additional outcomes were recommended by patients through interviews or online questionnaires (Figure 1).

The Project Steering Committee considered all 260 identified outcomes. The committee excluded 36 outcomes that were not relevant to the study population, 92 that were outcome measurement instruments or definitions of a particular outcome rather than outcomes, and 17 that were duplicates (Appendix 1). Subsequently, 86 different outcomes (some closely related) were grouped into 29 outcome domains, which were entered into the Delphi process (Figure 1).

In round 1 of the survey, 195 (86%) of the 228 participants responded (Table 1). They came from five stakeholder groups representing three lower-middle-income, seven upper-middle-income and 17 high-income countries (World Bank classification: http://data.worldbank.org/about/country-and-lending-groups#Lower_middle_income). The Project Steering Committee considered the participants' free-text responses and entered two additional outcomes (offspring circulatory morbidity and offspring metabolic morbidity) into round 2 of the Delphi survey. The committee also considered changes to the wording of some outcomes (Appendix 2).
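Nonresponders to round 1 were not re-invited. The round 2 response rate therefore uses the 195 round 1 responders as its denominator, not the original 228. A short Python check of the reported percentages follows. The helper name is illustrative.

```python
def response_rate(responded: int, invited: int) -> int:
    """Whole-number response rate in percent."""
    if invited <= 0:
        raise ValueError("invited must be positive")
    return round(100 * responded / invited)


if __name__ == "__main__":
    round1 = response_rate(195, 228)   # all invited Delphi participants
    round2 = response_rate(174, 195)   # only round 1 responders were re-invited
    overall = response_rate(174, 228)  # completed both rounds, out of all 228
    print(round1, round2, overall)     # 86 89 76
```

By the same arithmetic, 174 of the 228 participants originally asked to score outcomes (76%) completed both rounds.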

Table 1. Total number and baseline characteristics of participants of the Delphi survey and consultation meeting

Columns: 1st Delphi round (n=195) | 2nd Delphi round (n=174) | Consultation meeting (n=29)

Stakeholder groups: n (% of total group); response %
Parents: 32 (16); 84 | 25 (14); 78
Midwives: 28 (14); 78 | 25 (14); 89
Neonatologists: 34 (18); 80 | 34 (20); 100
Obstetricians: 62 (32); 90 | 55 (32); 89 | 14 (48)
Researchers: 39 (20); 91 | 35 (20); 90 | 10 (34)
Total group response, %: 86 | 89 | 100

Characteristics of health professionals
Main work clinically related, %: 62 | 60 | 57
Main work research related, %: 36 | 40 | 43
Other, %: 2 | 0 | 0
Also representing other stakeholder groups, %:
  Parent experiencing preterm birth: 1 | 1 | 17
  Midwife: 6 | 6 | 7
  Obstetrician: 14 | 16 | 14
  Neonatologist: 4 | 5 | 3
  Researcher: 38 | 39 | 34
  Industry: 2 | 2 | 0
Part of CROWN or representing a journal, %: 22 | 22 | 18
Part of Cochrane Collaboration, %: 25 | 27 | 21
Systematic review related to preterm birth, %: 54 | 54 | -
Role in development of national/international guidelines, %: 60 | 61 | 64
Role in allocation of healthcare budgets, %: 9 | 9 | 21
Countries represented (countries healthcare professionals work in), n*†: 25 | 25 | 8
  High-income countries, n
  Upper-middle-income countries, n
  Lower-middle-income countries, n
Participants from middle-income countries, n (%): 20 (12) | 19 (13) | 2 (7)

Characteristics of parents
Female, %: 91 | 88 | -
Experienced preterm birth <37 weeks, %: 94 | 96 | -
  Once, %: 69 | 72 | -
  Twice, %: 25 | 24 | -
Gestational age of most premature baby, weeks, median (range): 30 (24-35) | 30 (24-35) | -
Highest degree of education, median (range): Master’s degree (high school to doctorate degree)
Ethnic group white, %: 100 | 100 | -
Involved in research before, %: 59 | 60 | -
  Participated in study, %: 31 | 36 | -
  Helped in a study giving advice from parental/patient perspective, %
  Worked as a researcher, %: 19 | 20 | -
Countries of residence represented, n‡
  High-income countries, n
  Upper-middle-income countries, n

*Countries represented by healthcare professionals: Argentina, Australia, Brazil, Canada, Chile, China, Denmark, Egypt, Germany, Hong Kong, Iran, Ireland, Italy, Lebanon, Mexico, Nigeria, Pakistan, Qatar, South Africa, Spain, Switzerland, the Netherlands, United Kingdom, Uruguay, USA.
†Countries represented at the consultation meeting: Australia, Brazil, China, France, Spain, the Netherlands, UK, USA.
‡Countries of residence represented by parents: Greece, Ireland, Serbia, the Netherlands, United Kingdom, USA.


Round 2 of the survey was completed by 174 participants (89% response rate). Participants reflected on the stakeholder group and total group responses for the 31 outcomes included in round 2 (Table 2). They reached full consensus in all stakeholder groups on 11 outcomes (Appendix 3) and failed to reach consensus on the remaining 20. Ten of these 20 outcomes were considered in the consultation meeting. Nine of the ten came out of the Delphi survey as consensus ‘in’ (i.e. meeting the consensus criterion within that stakeholder group) for at least one stakeholder group. The tenth was listed among the 10 most important outcomes.
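The rule for choosing which of the 20 non-consensus outcomes went forward to the consultation meeting can be written as a filter. In this hypothetical Python sketch, each outcome maps to a per-group flag that records whether the outcome reached consensus ‘in’ within that stakeholder group. The outcome labels and flags are placeholders, not study data.

```python
from typing import Dict, Iterable, List


def select_for_meeting(consensus_in_by_group: Dict[str, Dict[str, bool]],
                       top10: Iterable[str]) -> List[str]:
    """Outcomes without full Delphi consensus that reached consensus 'in' in at
    least one stakeholder group, or that ranked in the overall top 10."""
    top10 = set(top10)
    selected = []
    for outcome, by_group in consensus_in_by_group.items():
        if all(by_group.values()):
            continue  # full consensus already: a core outcome, not re-discussed
        if any(by_group.values()) or outcome in top10:
            selected.append(outcome)
    return selected


if __name__ == "__main__":
    flags = {
        "outcome_A": {"parents": True, "obstetricians": True},    # full consensus
        "outcome_B": {"parents": True, "obstetricians": False},   # one group 'in'
        "outcome_C": {"parents": False, "obstetricians": False},  # top 10 only
        "outcome_D": {"parents": False, "obstetricians": False},  # neither
    }
    print(select_for_meeting(flags, top10=["outcome_A", "outcome_C"]))
    # ['outcome_B', 'outcome_C']
```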

Participants who did not respond to the second-round Delphi survey had median first-round scores comparable to those of participants who completed both surveys, with overlapping interquartile ranges (Appendix 4). Likewise, the median second-round scores of participants who attended the consultation meeting did not differ significantly from those of participants who did not attend.
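The first attrition check compares first-round scores outcome by outcome. For each outcome, it sets participants who dropped out against those who completed both rounds, looking at the medians and at whether the interquartile ranges overlap. The minimal Python sketch below assumes hypothetical scores. Quartiles come from `statistics.quantiles` with its default method, which may interpolate slightly differently from SPSS, and each group needs at least two scores per outcome. The sketch does not reproduce the significance testing reported for the consultation meeting comparison.

```python
from statistics import median, quantiles
from typing import Dict, List, Tuple


def iqr(scores: List[int]) -> Tuple[float, float]:
    """First and third quartiles (statistics.quantiles, default method)."""
    q1, _, q3 = quantiles(scores, n=4)
    return q1, q3


def intervals_overlap(a: Tuple[float, float], b: Tuple[float, float]) -> bool:
    """True if the closed intervals a and b share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


def compare_groups(completers: Dict[str, List[int]],
                   dropouts: Dict[str, List[int]]) -> Dict[str, Tuple[float, float, bool]]:
    """Per outcome: (median of completers, median of dropouts, IQRs overlap?)."""
    report = {}
    for outcome, scores in completers.items():
        other = dropouts[outcome]
        report[outcome] = (median(scores), median(other),
                           intervals_overlap(iqr(scores), iqr(other)))
    return report


if __name__ == "__main__":
    # Hypothetical round 1 scores for one outcome.
    print(compare_groups({"x": [7, 8, 8, 9, 9, 6, 7]}, {"x": [7, 7, 8, 9, 6]}))
```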

At the stakeholder meeting in Washington, DC, 29 participants representing all stakeholder groups discussed and voted on the 10 outcomes that had not reached full consensus across all stakeholder groups in the Delphi exercise. These were the nine outcomes with consensus ‘in’ in one or more stakeholder groups and the outcome listed in the top 10 (Figure 1). Only birth weight was rated 7-9 by more than 70% of participants in every stakeholder group. Minutes of the discussion and the arguments for including or excluding outcomes are provided in Appendix 5.

The Project Steering Committee considered the results of the Delphi survey and consultation meeting. The committee discussed and ratified all 12 core outcomes selected through the Delphi and consultation meeting process. The committee agreed that the 12 outcomes should be presented as outcomes related to the pregnant woman (maternal set of outcomes) and outcomes related to the offspring (neonatal set of outcomes). The committee agreed unanimously that the outcome selected in the consultation meeting should be included in the final core outcome set, and that mother and offspring should have separate outcomes for ‘harm’. Therefore, the core outcome set consists of 13 rather than 12 outcomes. The final core outcome set comprises four outcomes related to pregnant women (maternal set): [1] maternal mortality; [2] maternal infection or inflammation; [3] prelabor rupture of membranes; [4] harm to mother from intervention. It also comprises nine outcomes related to the offspring (neonatal set): [5] gestational age at birth; [6] offspring mortality; [7] birth weight; [8] early neurodevelopmental morbidity; [9] late neurodevelopmental morbidity; [10] gastrointestinal morbidity; [11] infection; [12] respiratory morbidity; and [13] harm to offspring from intervention (Box 1).

Box 1. Final Core Outcome Set of 13 Outcomes Presented as a Maternal and Neonate Set.

MATERNAL SET OF OUTCOMES

Maternal mortality

Maternal infection or inflammation

Prelabor rupture of membranes

Harm to mother from intervention

NEONATAL SET OF OUTCOMES

Offspring mortality

Offspring infection

Gestational age at birth

Harm to offspring from intervention

Birth weight

Early neurodevelopmental morbidity

Late neurodevelopmental morbidity

Gastrointestinal morbidity

Respiratory morbidity

DISCUSSION

In this project, using formal consensus methods, we identified a core outcome set of 13 outcomes for comparative health research on preventive interventions for preterm birth in asymptomatic women. These outcomes can be used in future studies, reviews and guidelines on preterm birth prevention.

There are several strengths throughout the different phases of this project. First, we followed the guidelines for core outcome set development outlined by the Core Outcome Measures in Effectiveness Trials (COMET) initiative.3 Second, the identification of outcomes was not restricted to the results of a systematic literature review. Stakeholder input was also gathered through conference delegates, online questionnaires promoted via social media, and interviews. Third, the parental (patient) perspective was included. This is an important strength, as patients can identify outcomes not considered by other stakeholders or in the literature.9,10 In this project, eight outcomes were identified through parental participation that had not been identified by the other methods (Appendix 1). Four of these eight outcomes were clustered in the outcome ‘late neurodevelopmental morbidity’, which was selected for the core outcome set. We hope this will motivate future research to actively involve parents, because a recent systematic review found that only 16% of reported core outcome set studies mentioned that the public had been involved in the process.11 Fourth, we used a Delphi exercise. Compared with face-to-face discussions, this well-established method has the advantage of capturing a large number of geographically distant participants. Participants also have the chance to reconsider their opinion without pressure to agree with senior or domineering individuals.6

This project successfully involved a large number of participants from important stakeholder groups, with global representation from middle- and high-income countries. Most of the healthcare professionals involved have a prominent role in their specialties. For example, 60-64% of participating health professionals reported a role in national or international guideline development (Table 1). This broad involvement of key stakeholders should make the core outcome set globally representative and acceptable.

The first limitation of the study is the lack of a formal qualitative analysis of the semistructured interviews with patients. In addition, all patients involved were from a white ethnic group. Another limitation is that stakeholder group representation at the consultation meeting did not reflect the composition of the group during the Delphi process. Although all stakeholders were represented at the consultation meeting, midwives, neonatologists and parents (patients) were underrepresented. The two parents attending the meeting may have found it difficult to argue against the healthcare professionals. The Project Steering Committee addressed this underrepresentation in two ways. First, only the outcomes that did not reach full consensus (i.e. consensus ‘in’ by one or more stakeholder groups) were considered in the consultation meeting. Outcomes that were already consensus ‘in’ after the Delphi exercise were not discussed, and 11 of the 13 outcomes in the core outcome set were already agreed through the Delphi exercise. Second, the analysis of the consultation meeting was based on voting per stakeholder group. This means that every stakeholder group, rather than every individual, had the same weight in the decision to include an outcome in the core outcome set. Still, we cannot exclude the possibility that some outcomes did not achieve consensus ‘in’ because some stakeholder groups were underrepresented. A core outcome set is therefore not static and can be reviewed and adjusted in the future.

In the Delphi exercise, two individuals reported that they represented industry as their main stakeholder group. In the analysis, we assigned their scores to the second stakeholder group they also belonged to (e.g. obstetrician or researcher). Finally, the method of reflecting the first Delphi results to all participants before the second Delphi survey might have influenced the second round. Besides the stakeholder group responses relayed back to participants by e-mail, we summarised the total group responses in the survey. Because the whole-group summary is affected by the number of participants in each stakeholder group, participants may have been influenced by it without realizing that some groups were over- or underrepresented. However, by reflecting both responses (per stakeholder group and in total), we felt that participants received a complete overview of the results.
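The concern about whole-group summaries can be made concrete: a pooled median follows the largest stakeholder groups. The Python sketch below uses invented scores for a single outcome to show a per-group median diverging from the pooled one. It is an illustration, not a reanalysis of the study data.

```python
from statistics import median
from typing import Dict, List, Tuple


def feedback_summary(scores_by_group: Dict[str, List[int]]) -> Tuple[Dict[str, float], float]:
    """Median score per stakeholder group and for the pooled whole group."""
    per_group = {group: median(scores) for group, scores in scores_by_group.items()}
    pooled = median([s for scores in scores_by_group.values() for s in scores])
    return per_group, pooled


if __name__ == "__main__":
    # Hypothetical round 1 scores for one outcome.
    scores = {"obstetricians": [8, 8, 8, 8, 8, 9], "parents": [5, 5]}
    per_group, pooled = feedback_summary(scores)
    print(per_group, pooled)  # parents' median is 5, but the pooled median is 8
```

Showing both the per-group and the pooled summaries, as the study did, lets a participant see that the pooled figure is driven mainly by the larger group.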

The proposed 13 core outcomes guide researchers on what to measure. They do not tell researchers how to measure these outcomes, because they do not specify an outcome measurement instrument or definition for each outcome domain. Guidance for selecting high-quality outcome measurement instruments is being written by the Core Outcome Measurement Instrument Selection project group.12 For preterm birth, high-quality outcome measurement instruments and definitions are being developed in a separate project.13 Until then, we encourage researchers to report how each outcome was actually measured and the definition used in each trial.

Furthermore, once an outcome set is chosen, there may be continued concern that the outcomes in the set do not fit the needs of a particular study. Researchers will have their own hypotheses to test and will therefore need to consider the outcome(s) that reflect their specific hypothesis. Besides collecting hypothesis-specific outcomes, researchers should collect and report data on the core outcome set.

Studies evaluating treatments in symptomatic women (such as tocolytics) might consider using this core outcome set in addition to outcomes relevant to that particular study population, for example ‘successful prolongation of pregnancy for 48 hours or longer’. The selection and evaluation of the importance of those particular outcomes are beyond the scope of this core outcome set project.

Consistent measurement and reporting of the core outcome set in trials is only the first step in the attempt to improve impact and reduce waste.14 To address possible barriers to awareness of the core outcome set, a journal editors’ initiative, Core Outcomes in Women’s Health (CROWN), is encouraging researchers to implement core outcome sets in women’s health studies.15 More than 70 women’s health journals are now participating in this initiative (www.crown-initiative.org). Funders could also have an important role in encouraging consideration of a core outcome set.13

We reviewed the literature (MEDLINE and EMBASE search ‘premature infant [MeSH] AND core outcomes set’) and searched the Core Outcome Measures in Effectiveness Trials initiative website (http://www.comet-initiative.org/studies/search). Based on these searches, this is the first core outcome set developed to ensure consistency in preterm birth prevention research. The James Lind Alliance (a partnership regarding research priorities) has registered a core outcome set for very preterm birth from the patient perspective on the Core Outcome Measures in Effectiveness Trials website. This project is still ongoing (www.comet-initiative.org/studies/details/256), and it will be of interest to compare the results of the two approaches. Earlier work on a core outcome set for maternity care reported 48 outcomes to consider.16 That core outcome set did not target preterm birth research specifically, and we think that the set of 13 outcomes we recommend here will be more applicable to preterm birth prevention research. The importance of reporting all key outcomes has been highlighted in a recent systematic review, which concluded that most published trials in preterm birth are missing information on one of the most crucial outcomes in this population: chronic lung disease.2 Although this project does not provide definitions or advise on which outcome measurement instruments should be used, we suggest that the outcome ‘chronic lung disease’ is captured by the outcome ‘respiratory morbidity of the offspring’ used in this core outcome set.

In a related project involving the Global Obstetrics Network (www.globalobstetricsnetwork.org), 15 planned trials on the use of vaginal pessary to prevent preterm birth have already stated their intention to include this core outcome set in their study protocols and case report forms. This will facilitate a prospective individual patient data analysis collaboration (see the description of the consultation meeting in the Methods section). The participating research teams also intend to use the same measurement methods and the same definitions across studies.

Even when researchers intend to comply with the core outcome set, some core outcomes are likely to be difficult to collect. One example is 'long-term neurodevelopment', an outcome that is often not collected because of logistics or lack of funding. In a recent review, only 16% of large obstetric trials were able to report on follow-up,17 and only one study used this outcome as a primary outcome.18 We hope that the development of core outcomes will give researchers a strong incentive to argue for adequate funding to follow up the participants of their planned studies. When researchers fail to collect any of the core outcomes, we encourage them to explain why the outcome was not collected.

We are confident that the development and implementation of a core outcome set will benefit patients and health care providers by reducing the chance of reporting bias and improving the interpretation of evidence.14 It will also facilitate individual patient data analyses and allow adequately powered subgroup analyses.

When implemented in comparative health research, the core outcome set for studies on preterm birth prevention, developed from an international multidisciplinary perspective, will ensure that data from all trials can be compared and combined.

REFERENCES

1. Meher S, Alfirevic Z. Choice of primary outcomes in randomised trials and systematic reviews evaluating interventions for preterm birth prevention: A systematic review. BJOG 2014;1–9.

2. Ioannidis JPA, Horbar JD, Ovelman CM, Brosseau Y, Thorlund K, Buus-Frank ME, et al. Completeness of main outcomes across randomized trials in entire discipline: survey of chronic lung disease outcomes in preterm infants. BMJ 2015;350:e72.

3. Williamson PR, Altman DG, Blazeby JM, Clarke M, Devane D, Gargon E, et al. Developing core outcome sets for clinical trials: issues to consider. Trials 2012;13:132.

4. Kirkham JJ, Boers M, Tugwell P, Clarke M, Williamson PR. Outcome measures in rheumatoid arthritis randomised trials over the last 50 years. Trials 2013;14:324.

5. WHO. WHO: recommended definitions, terminology and format for statistical tables related to the perinatal period and use of a new certificate for cause of perinatal deaths. Modifications recommended by FIGO as amended October 14, 1976. Acta Obstet Gynecol Scand 1977;56:247–53.

6. Sinha IP, Smyth RL, Williamson PR. Using the Delphi technique to determine which outcomes to measure in clinical trials: Recommendations for the future based on a systematic review of existing studies. PLoS Med 2011;8.

7. Guyatt GH, Oxman AD, Kunz R, Atkins D, Brozek J, Vist G, et al. GRADE guidelines: 2. Framing the question and deciding on important outcomes. J Clin Epidemiol 2011;64:395–400.

8. Harman NL, Bruce IA, Callery P, Tierney S, Sharif MO, O'Brien K, et al. MOMENT--Management of Otitis Media with Effusion in Cleft Palate: protocol for a systematic review of the literature and identification of a core outcome set using a Delphi survey. Trials 2013;14:70.

9. De Wit M, Abma T, Koelewijn-van Loon M, Collins S, Kirwan J. Involving patient research partners has a significant impact on outcomes research: a responsive evaluation of the international OMERACT conferences. BMJ Open 2013;3:1–12.

10. Hewlett S, De Wit M, Richards P, Quest E, Hughes R, Heiberg T, et al. Patients and professionals as research partners: Challenges, practicalities, and benefits. Arthritis Care Res 2006;55:676–80.

11. Gargon E, Gurung B, Medley N, Altman DG, Blazeby JM, Clarke M, et al. Choosing important health outcomes for comparative effectiveness research: a systematic review. PLoS One 2014;9(6).

12. Prinsen CC, Vohra S, Rose MR, King-Jones S, Ishaque S, Bhaloo Z, et al. Core Outcome Measures in Effectiveness Trials (COMET) initiative: protocol for an international Delphi study to achieve consensus on how to select outcome measurement instruments for outcomes included in a “core outcome set”. Trials 2014;15:247.

13. Lawn JE, Gravett MG, Nunes TM, Rubens CE, Stanton CS, GAPPS Review Group. Global report on preterm birth and stillbirth (1 of 7): definitions, description of the burden and opportunities to improve data. BMC Pregnancy Childbirth 2010;10:S1.

14. Chalmers I, Glasziou P. Avoidable waste in the production and reporting of research evidence. Lancet 2009;374:786.

15. Khan K. The CROWN Initiative: Journal editors invite researchers to develop core outcomes in women’s health. BJOG 2014;126:201–2.

16. Devane D, Begley CM, Clarke M, Horey D, O'Boyle C. Evaluating maternity care: a core set of outcome measures. Birth 2007;34:164–72.

17. Teune MJ, van Wassenaer AG, Malin GL, et al. Long-term child follow-up after large obstetric randomised controlled trials for the evaluation of perinatal interventions: a systematic review of the literature. BJOG 2013;120:15–22.

18. Lees CC, Marlow N, van Wassenaer-Leemhuis A, Arabin B, Bilardo CM, Brezinka C, et al. 2 year neurodevelopmental and intermediate perinatal outcomes in infants with very preterm fetal growth restriction (TRUFFLE): a randomised trial. Lancet 2015;4:S0140–6736.


Appendix 2. Missing outcomes mentioned by participants of the online Delphi questionnaire round 1, with comments of the Project Steering Committee.

Outcomes suggested | Comment of the Project Steering Committee

1) Identification of microbial species associated with inflammation; 2) parental attachment, kangaroo care; 3) social morbidity for the father of the baby | 1) Included as an example of the outcome 'maternal infection or inflammation'; 2) included as an example of 'societal morbidity offspring'; 3) the outcomes 'maternal mental health' and 'maternal social morbidity' are reformulated as 'parental' instead of 'maternal'

Medical intervention: medically elective caesarean or induction prior to 39 weeks of gestation | This outcome is already incorporated in the outcomes 'mode of delivery' and 'harm'

Gestational age at delivery | Formulation of the outcome 'gestational age by mode of delivery' is changed to 'gestational age at delivery'

Intraventricular haemorrhage | This outcome is already included in 'early neurodevelopmental morbidity'. The term 'periventricular haemorrhage' is changed to 'intraventricular haemorrhage'

1) Very long-term cognitive development (school age); 2) quality of life perception; 3) indirect costs; 4) impact on family structure and functioning | 1) Overlap with outcome 'late neurodevelopmental morbidity'; 2) overlap with outcome 'patient experience'; 3) the outcomes 'service costs mother and offspring' are changed to 'indirect costs', and the outcomes 'service utilization mother and offspring' are changed to 'direct costs'. Argument: costs can always be calculated once service utilization has been collected, so the outcome 'costs' is an extension of 'service utilization'; 4) already in outcome 'parental social morbidity'

Maternal cardiovascular morbidity | Overlap with outcome 'harm'. The formulation of this outcome is changed by including the example 'incidence of maternal adverse events during pregnancy or labour'

Standardized definitions/categories for reporting fetal/neonatal death, as well as standardized categories for reporting gestational age at delivery | This is not the objective of this project. It will be defined in the consensus meeting or in a next phase of the Delphi survey defining how the outcomes should be measured.

Pre-existing maternal morbidity such as obesity, hypertension, diabetes or other chronic medical conditions | These are baseline characteristics, not outcomes

Retinopathy of prematurity | Already in outcome 'early neurodevelopmental morbidity'

Access to special care for preterm babies after discharge from NICU | Overlap with outcome 'service utilization offspring'

Paternal or partner side-effects (stress, cost, depression, ...) | All already included in existing outcomes

1) Quality of parental care; 2) evaluation of the implementation of beneficial care practices: iron supplementation, screening for bacteriuria, smoking, alcohol consumption and drug use cessation programs; 3) impact of occupational health | 1) This might be a reason why a child is delivered preterm, but it is not an outcome; 2) all baseline characteristics; 3) also a baseline characteristic

Birth weight centile for GA | This is an outcome measure, part of the outcome 'birth weight', which is already included

Patient's preference (in RCT); patient's risk perception | Overlap with outcome 'patient's experience'

Bayley score after 2 years for cognitive outcome, IQ at school age, decision making at school age, further cognitive development | These are all outcome measures and already included in the outcome 'late neurodevelopmental morbidity'

Cognitive child outcome, schooling | Already in outcome 'late neurodevelopmental morbidity'

Intra-uterine fetal death | Already in outcome 'offspring mortality'

IVH-PVLM | Formulation 'periventricular haemorrhage' changed to 'intraventricular haemorrhage'

Community services for patients and families | Formulation of outcome 'service costs mother' changed to 'societal service utilization parents/indirect costs'

1) Pregnancy complications (PE, haemorrhage); 2) induction or spontaneous initiation of labour, or elective CS | 1) Already in outcome 'harm'; 2) already in outcome 'mode of delivery'. Included as examples: mode of delivery (for example vaginal birth, labour induction or elective/emergency caesarean section)

Maternal socioeconomic status | Baseline characteristic

Need for transfer of offspring or mother to a centre providing a higher level of care | Already in outcome 'service utilization mother and offspring'

Educational attainment of offspring | Already in outcome 'late neurodevelopmental morbidity'

1) Cervical dilation status (if present, quantify the extent of dilation in centimetres and the gestational age at which dilation is observed); 2) history of cervical surgery (e.g. LEEP or conization) or uterine abnormalities (e.g. fibroids, with size specified) | 1) Already in outcome 'cervical length'; 2) baseline characteristics

Preterm birth | Already included in 'gestational age at delivery'

Intrauterine fetal death (as a specific fetal side effect of the intervention) | Already in outcome 'offspring mortality'. This is an outcome measure

You haven't mentioned surrogate outcomes. | The steering committee considers that several surrogate outcomes are included in the outcome list, for example 'cervical length' or 'biophysical measures'.

Client satisfaction with prevention measure | Already included in outcome 'patient's experience'. The formulation of this outcome is changed to 'patient-reported outcome measures'

Recurrent late-onset septicaemia (especially gram-negative sepsis) is also an independent risk factor for adverse neurodevelopmental outcome in premature babies. | This is about risk factors, not outcomes

BMI, gestational weight gain, smoking in pregnancy, years of education | Baseline characteristics, not outcomes

1) Seizures in the first 24 hours; 2) refractory hypotension; 3) severe intraventricular haemorrhage | 1) Already in outcome 'early neurodevelopmental morbidity'; 2) 'offspring circulatory morbidity' included as a new outcome; 3) already in outcome 'early neurodevelopmental morbidity'

I think comorbidity plays an important role in the prevention of preterm birth. Risk of hypoglycaemia and electrolyte disturbances in preterm infants? | 'Metabolic morbidity' included as a new outcome

1) Mother's pre-pregnancy BMI; 2) number of surgeries the baby underwent in the first year of life; 3) whether the baby had a PDA; 4) longer-term behavioural and developmental outcomes of the child; 5) longer-term mental health outcomes of the mother (and beliefs and fears for a second pregnancy) | 1) Baseline characteristic; 2) included as an example in the outcome 'health service utilization offspring/direct costs'; 3) included as an example in the new outcome 'circulatory morbidity'; 4) already in outcome 'late neurodevelopmental morbidity'; 5) already in outcomes 'parental mental health morbidity' and 'maternal reproductive morbidity'

Each table row represents the suggestion of one participant. Outcomes marked in bold are outcomes suggested by patient representatives.


PART II

Long-term outcomes of obstetrical evaluation studies

CHAPTER 3

CERVICAL PESSARY FOR PRETERM BIRTH PREVENTION IN TWIN PREGNANCY WITH SHORT CERVIX: A 3-YEAR FOLLOW-UP

Janneke van ’t Hooft, Johanna H. van der Lee, Brent C. Opmeer, Cuny Cuijpers,

Aleid G. van Wassenaer-Leemhuis, Anneloes L. van Baar, Leonie Steenis,

Sophie Liem, Ewoud Schuit, Elise Bleker, Margot E. Vinke, Noor Simons,

Irene de Graaf, Dick Bekedam, Ben Willem J. Mol, Cornelieke van de Beek.

Submitted


ABSTRACT

Objective

We recently showed that a cervical pessary prevented preterm birth and improved neonatal outcome in women with a multiple pregnancy and a cervical length (CL) < 38 mm. This follow-up study evaluates long-term developmental outcomes in the offspring.

Methods

We performed a follow-up of the ProTWIN trial, in which women with a multiple pregnancy had been randomised to pessary or no pessary between 2009 and 2012. Our current follow-up and analysis were limited to mothers with a midtrimester CL < 38 mm (78 and 55 mothers, 157 and 111 children in the pessary and control group, respectively). At 3 years of corrected age, surviving children were invited for an assessment with the Bayley Scales of Infant and Toddler Development, third edition (Bayley-III). We compared rates of death or neurodevelopmental disability (defined as a Bayley-III score below -1 SD) between the pessary and control groups according to the intention-to-treat principle, using multiple imputation for missing data. We compared mean Bayley-III scores in surviving children. A linear mixed-effects model was used to account for correlated data in twins and triplets.

Results

In total, 27 children died (6 in the pessary vs 21 in the control group, 5% versus 26%; adjusted odds ratio (aOR) 0.14 [95% CI 0.04 to 0.50]). Bayley-III outcomes were collected for 173 (72%) of 241 surviving children (114 (75%) pessary vs 59 (66%) control group). The cumulative incidence of death or survival with a neurodevelopmental disability was 12 (10%) vs 23 (29%) in the pessary and control group, respectively; aOR 0.26 [95% CI 0.09 to 0.75]. We found neither statistically significant nor clinically relevant differences in Bayley-III scores between surviving children in both groups. Multiple imputation yielded comparable results.

Conclusion

In women with a twin pregnancy and a CL < 38 mm, the use of a cervical pessary strongly improved survival of the children without affecting neurodevelopmental disability at three years corrected age.

Follow-up cervical pessary in twin pregnancy

INTRODUCTION

Preterm birth, defined as birth prior to 37 weeks of gestation, is a leading contributor to perinatal mortality and morbidity. With 50 to 60% of women with a multiple pregnancy delivering preterm, multiple pregnancy is a major risk factor for preterm birth.1 2

Several interventions, such as vaginal progesterone and 17α-hydroxyprogesterone caproate, have been studied for their capacity to prevent preterm birth in women with a multiple pregnancy. In unselected women with a twin pregnancy they have proven ineffective,3 4 whereas women with a twin pregnancy and a short cervix may benefit from these treatments.5 A promising intervention to prevent preterm birth in women with a multiple pregnancy and a short cervix is the cervical pessary. We recently reported the results of the ProTWIN trial, a randomized controlled trial comparing pessary versus no intervention in asymptomatic women with a multiple pregnancy, randomized between 12 and 20 weeks of gestation.6 The trial showed no benefit in the total group of women. However, in a subgroup of women with a short cervical length (CL) at screening (< 38 mm), the pessary significantly reduced the rate of preterm delivery below 32 weeks (11% vs 25%; RR 0.44 [95% CI 0.20 to 0.98]), corresponding to a 10-day extension of the median duration of pregnancy (35+0 weeks of gestation in the control group compared to 36+3 weeks in the pessary group; hazard ratio 0.49 [95% CI 0.32 to 0.77]). This extension had a strong impact on neonatal outcome. The primary outcome, a composite of poor perinatal outcome (including stillbirth, short-term neonatal morbidity and neonatal death within 6 weeks after the expected term date), occurred in 16 (10%) children in the pessary group and in 27 (24%) children in the no-pessary group; RR 0.42 [95% CI 0.19 to 0.91].
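As an arithmetic check, the risk ratio for the primary outcome and the 10-day prolongation can be recomputed from the figures reported above (16 of 157 vs 27 of 111 children; 36+3 vs 35+0 weeks). The sketch below is a crude calculation with a simple Wald interval on the log scale; it treats all children as independent and therefore ignores the correlation between twins from the same pregnancy. The helper names `risk_ratio` and `ga_days` are illustrative only.

```python
import math


def risk_ratio(events_a, n_a, events_b, n_b, z=1.96):
    """Crude risk ratio of group A vs group B with a Wald CI on the log scale.

    Children are treated as independent observations, so for twins and
    triplets this interval is too narrow.
    """
    rr = (events_a / n_a) / (events_b / n_b)
    se_log_rr = math.sqrt(1 / events_a - 1 / n_a + 1 / events_b - 1 / n_b)
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


def ga_days(weeks, days):
    """Convert a gestational age written as weeks+days (e.g. 36+3) to days."""
    return 7 * weeks + days


# Primary composite outcome, CL < 38 mm subgroup: pessary vs no pessary
rr, lower, upper = risk_ratio(16, 157, 27, 111)
print(f"RR {rr:.2f} (crude 95% CI {lower:.2f} to {upper:.2f})")

# Median duration of pregnancy: 36+3 (pessary) vs 35+0 (control)
print(ga_days(36, 3) - ga_days(35, 0), "days")
```

The crude risk ratio reproduces the reported 0.42, and the gestational ages differ by 10 days. The crude interval (about 0.24 to 0.74) is narrower than the published 0.19 to 0.91, which is what one would expect if the published analysis accounted for clustering within pregnancies.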

Previous studies have demonstrated that agents given to pregnant women with the aim of improving pregnancy outcomes can have unexpected long-term effects on children that may not be apparent at birth. The follow-up of the ORACLE study unexpectedly showed that, in women with threatened preterm labor without ruptured membranes, antibiotics are harmful in the long term.7 New data on vaginal progesterone for the prevention of preterm birth, reported in the OPPTIMUM study, show neither a short-term benefit nor a translation into long-term health benefit, and might even suggest possible long-term harmful consequences of progesterone use.8 Although a pessary, as a non-pharmacological device that works mechanically, will not have a direct effect on the fetuses, the prolongation of pregnancy itself might have harmful effects.9 Given that preterm labor has multiple etiologies, a possible association with chorioamnionitis suggests that keeping a child in utero for 10 days longer could have a harmful impact on the children. So, although the outcomes in twin children born to mothers with a short CL randomized to a pessary were beneficial in the short term, the long-term effects of the use of a pessary during pregnancy, including potential harm, are unknown. The aim of the current study, the ProTWINkids study, was to investigate long-term developmental outcomes in children born to mothers with a multiple pregnancy and a short cervix after placement of a cervical pessary, as compared to no pessary.

METHODS

The study population consisted of children born to mothers who participated in the ProTWIN trial and who had a short cervical length (< 38 mm) at screening. Details of the ProTWIN trial (NTR1858) have been described elsewhere.6 10 Briefly, this multicenter randomized controlled clinical trial was conducted in 40 hospitals in the Netherlands. A total of 812 women with a multiple pregnancy between 12 and 20 weeks of gestation were allocated to treatment with a pessary (n=401) versus no intervention (n=407). Cervical length was measured between 16 and 22 weeks' gestation, either before or shortly after randomization. Analysis of the primary outcome in the total group showed no difference between the pessary group and controls; RR 0.98 [95% CI 0.69 to 1.39]. In a subgroup analysis of women with a CL below the 25th centile (< 38 mm) (78 versus 55 mothers, 157 versus 111 children in the pessary and control group, respectively), women randomized to a pessary had a 60% reduction in poor perinatal outcome compared with no pessary. As positive effects of the pessary on pregnancy prolongation and neonatal outcome had only been seen in women with a CL < 38 mm, we limited follow-up to this specific group of 268 children (157 vs 111). A power calculation performed before the start of the follow-up study showed that a sample size of 56 children per group was sufficient to detect a difference of 8 points (> 0.5 SD) on the Bayley Scales of Infant and Toddler Development, third edition (Bayley-III), with 80% power (β = 0.20) and a two-sided α of 0.05.
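The text does not state which formula was used for the power calculation. A minimal sketch, assuming the standard normal-approximation formula for comparing two means, n = 2((z₁₋α/₂ + z₁₋β)·SD/Δ)², with the Bayley-III normed SD of 15, reproduces the reported 56 children per group. The function name `n_per_group` is hypothetical.

```python
import math
from statistics import NormalDist


def n_per_group(delta, sd, alpha=0.05, power=0.80):
    """Per-group sample size for a two-sided comparison of two means.

    Normal-approximation formula; children are assumed to be independent.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    n = 2 * ((z_alpha + z_beta) * sd / delta) ** 2
    return math.ceil(n)                            # round up to whole children


# 8-point difference on a Bayley-III index score with SD 15
print(n_per_group(delta=8, sd=15))  # 56
```

This formula treats children as independent. With twins and triplets, a design effect of 1 + (m - 1) × ICC (m children per pregnancy, ICC the intraclass correlation) would inflate the required number; whether the original calculation included such a correction is not stated.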

Follow-up assessment

The ProTWINkids study evaluated children born to mothers with a CL < 38 mm at three years corrected age, collecting data on survival, neurodevelopment and general health using the Bayley-III test and a parental questionnaire. Ethical approval for this follow-up assessment was given by the research ethics committee of the Academic Medical Center in Amsterdam (MEC E2-170).

Families were contacted by phone 3 to 4 months before the child's corrected age of 3 years. After informed consent of the parents, the cognitive, language and motor scales of the Bayley-III were assessed. A trained team of 7 people, masked to study group, performed all Bayley-III tests in a nearby consultation clinic (well-baby center) or hospital. If a Bayley-III test had already been performed in a standard care setting at two or three years corrected age (as part of a follow-up program for extremely premature infants or for other reasons) and the parents were not willing to repeat the test, those test results were collected (after informed consent of the parents) to minimize attrition bias.

Survival

Research nurses in participating centers were asked to check the medical records of all potential participants of the follow-up study for the death of one or more of the twins/triplets before contacting their parents.

Bayley-III Scales

The Dutch translation of the Bayley-III test was used to assess cognitive, motor and language development in infants and toddlers.11 12 The Bayley-III norms for the US population were used, as Dutch norms were not yet available.11 The test and its norms are used worldwide in health care settings as well as for scientific research. Developmental outcomes are reported on three scales, for cognition, language and motor development, each with a normed mean of 100 and an SD of 15 based on US norms. A score of ≤ 85 points (i.e. 1 SD or more below the mean) on any of the Bayley-III scales represents suboptimal development.11 12
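The scoring rule above can be sketched as a standardization against the normed mean and SD plus a cutoff check across the three scales. This is an illustration, assuming the "≤ 85 on any scale" rule from this paragraph; the constant and function names and the example child are hypothetical.

```python
BAYLEY_MEAN = 100  # normed mean of each Bayley-III index score (US norms)
BAYLEY_SD = 15     # normed standard deviation
CUTOFF = 85        # scores <= 85 taken as suboptimal, as described above


def z_score(index_score):
    """Standardize a Bayley-III index score against the normed mean and SD."""
    return (index_score - BAYLEY_MEAN) / BAYLEY_SD


def is_suboptimal(scores):
    """True if any scale (cognitive, language, motor) is at or below the cutoff."""
    return any(score <= CUTOFF for score in scores.values())


child = {"cognitive": 95, "language": 85, "motor": 102}  # hypothetical child
print({scale: round(z_score(s), 2) for scale, s in child.items()})
print(is_suboptimal(child))  # True: the language score is exactly -1 SD
```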

Health questionnaires

The questionnaire for parents included questions about demographic variables (e.g. stable or unstable family composition, education of both parents, use of daycare, bilingualism, being the eldest of the siblings) and health care use (e.g. need for healthcare providers, medication use, hospitalization and need for surgery in the last 3 years). This information was clustered into clinically relevant groups. For healthcare providers, for example, visits to the general practitioner were separated from visits to developmental specialists (i.e. physical therapy, occupational therapy or speech therapy) and visits to a medical specialist (e.g. pediatrician, surgeon). For medication use, clustering into the most prevalent and clinically relevant groups resulted in the following categories: antibiotics, anti-epileptic, anti-asthmatic and anti-eczema medication, and others. Frequencies of hospital admission and need for surgery were clustered into 0 to 2 or > 2.

In the ProTWINkids study three outcomes were assessed (Figure 1).

1. Death or survival with neurodevelopmental disability. This composite outcome includes deceased and disabled children, defined as a combination of mortality (stillbirth, death before discharge and death before the age of 3 years) and neurodevelopmental disability (a Bayley-III score below 85 points (-1 SD) on one or more index scores (motor, cognitive or language)).

2. Bayley-III scores in surviving children.

3. Health-related outcomes (need for healthcare providers, medication use, hospitalization and need for surgery in the last 3 years) derived from the health questionnaire.
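Coding the composite outcome per child can be sketched as follows: death counts as an event, a surviving child with a Bayley-III assessment is classified by the cutoff, and a surviving child without an assessment yields a missing value to be handled by complete-case analysis or imputation. This sketch assumes the "≤ 85" threshold from the Bayley-III section; the `Child` record and `composite_outcome` function are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Child:
    died: bool  # stillbirth, death before discharge, or death before age 3
    bayley: Optional[dict] = None  # index scores if assessed, else None


def composite_outcome(child: Child, cutoff: int = 85) -> Optional[bool]:
    """Death or survival with neurodevelopmental disability.

    Returns None for a survivor without a Bayley-III assessment (missing).
    """
    if child.died:
        return True
    if child.bayley is None:
        return None
    return any(score <= cutoff for score in child.bayley.values())


cohort = [
    Child(died=True),
    Child(died=False, bayley={"cognitive": 100, "language": 84, "motor": 97}),
    Child(died=False, bayley={"cognitive": 105, "language": 98, "motor": 110}),
    Child(died=False),  # survivor lost to follow-up
]
print([composite_outcome(c) for c in cohort])  # [True, True, False, None]
```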

Statistical analysis

Differences in demographic characteristics and short-term pregnancy and perinatal outcomes between the pessary and control groups were compared for participants of the initial ProTWIN trial and of the ProTWINkids study using the unpaired t-test, Mann-Whitney U test, chi-square test or Fisher's exact test, as appropriate. To compare the primary composite outcome (death or survival with neurodevelopmental disability) between the pessary and control groups, odds ratios (OR) and their 95% confidence intervals (CI) were calculated using a generalized linear mixed-effects model (GLMM) to account for the correlation between children from the same pregnancy.13 As the numbers were small, no adjusted OR were calculated. Mean cognitive, language and motor index scores of the Bayley-III test assessed at three years of age, corrected for prematurity, were compared between the pessary group and control group. We explored potential confounding variables using directed acyclic graphs.14 In the GLMM analysis, adjusted odds ratios were calculated for these potential confounders (parental education, smoking during pregnancy, ethnicity, child being the eldest in the family, use of day care, bilingualism, and breastfeeding for the first 6 months). A two-sided level of 0.05 was considered significant. We used two approaches to deal with missing data due to perinatal/neonatal mortality and loss to follow-up: complete case analysis and multiple (n=5) imputation. A minimal clinically important difference of 8 points on the Bayley-III scale (0.5 SD) was considered a difference indicating potential harm.
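The section does not say how the estimates from the five imputed datasets were combined. A common choice is Rubin's rules: average the point estimates, then add the average within-imputation variance to the between-imputation variance inflated by (1 + 1/m). The sketch below illustrates that technique only; the input values are hypothetical and are not study data, and `pool_rubin` is an illustrative name.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, std_errors):
    """Pool one parameter across m imputed datasets with Rubin's rules.

    estimates  -- point estimates (e.g. log odds ratios), one per dataset
    std_errors -- their standard errors, one per dataset
    Returns the pooled estimate and its total standard error.
    """
    m = len(estimates)
    q_bar = mean(estimates)                      # pooled point estimate
    within = mean(se ** 2 for se in std_errors)  # mean within-imputation variance
    between = variance(estimates)                # between-imputation variance
    total = within + (1 + 1 / m) * between
    return q_bar, math.sqrt(total)


# Hypothetical log odds ratios and SEs from m = 5 imputed datasets
log_ors = [-1.62, -1.71, -1.55, -1.68, -1.66]
ses = [0.70, 0.72, 0.69, 0.71, 0.70]
est, se = pool_rubin(log_ors, ses)
low, high = math.exp(est - 1.96 * se), math.exp(est + 1.96 * se)
print(f"pooled OR {math.exp(est):.2f} (95% CI {low:.2f} to {high:.2f})")
```

Pooling on the log scale and exponentiating afterwards keeps the odds ratio's confidence interval asymmetric, as it should be.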

Health-related problems (reflected as the need for medical specialist and/or developmental care, past and present medication use, hospital admissions and operations) were clustered into different categories to give insight into the range of health-related problems. To limit multiple testing, only one (predetermined) analysis was performed in each health-related category. All analyses were performed according to the intention-to-treat principle using IBM SPSS version 21 (NY, USA).


RESULTS

In the ProTWIN trial, 812 women gave birth to 1634 children (Figure 1). Cervical length was measured in 621 women (81% in the pessary group and 71% in the control group). The subgroup of women with a CL < 38 mm consisted of 133 women (78 vs 55 in the pessary and control group, respectively), who gave birth to 268 children (157 vs 111), of whom 27 children died between randomization and the end of the 3-year follow-up period. Of the 241 surviving children, 200 (response rate 83%) participated in some form of follow-up (Bayley-III and/or questionnaires). Bayley-III outcomes were collected in 173 children (response rate 72%; 114 (75%) pessary vs 59 (66%) control) at 3 years corrected age (mean 37 months, SD 3 months). Nine of these Bayley-III tests were performed as part of standard care, six of them at around 24 months corrected age (range 23 to 27 months). Parental questionnaires were returned for 123 children (82%) in the pessary group and 66 children (73%) in the control group (Figure 1).
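The follow-up flow can be checked with simple arithmetic: survivors per arm are the children born minus deaths, and response rates use survivors as the denominator. The sketch below uses only counts stated in the paragraph above; the variable and function names are illustrative.

```python
# Counts for the CL < 38 mm subgroup, taken from the Results text
randomized = {"pessary": 157, "control": 111}
deaths = {"pessary": 6, "control": 21}
bayley_done = {"pessary": 114, "control": 59}
any_follow_up = 200  # Bayley-III and/or questionnaire, both arms combined


def pct(part, whole):
    """Percentage rounded to a whole number."""
    return round(100 * part / whole)


survivors = {arm: randomized[arm] - deaths[arm] for arm in randomized}
total_survivors = sum(survivors.values())

print(survivors, total_survivors)  # {'pessary': 151, 'control': 90} 241
print("any follow-up:", pct(any_follow_up, total_survivors), "%")
print("Bayley-III overall:", pct(sum(bayley_done.values()), total_survivors), "%")
for arm in randomized:
    print(f"Bayley-III {arm}: {pct(bayley_done[arm], survivors[arm])} %")
```

These reproduce the reported 83% overall participation, 72% Bayley-III response, and 75% versus 66% by arm.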

The mothers of children assessed in the ProTWINkids study had characteristics comparable to those of the original sample of mothers with a short cervical length at entry to the ProTWIN trial (Table 1). For the short-term outcomes, the median gestational age at delivery was 36+0 weeks (IQR 32+7 to 37+1 weeks) for the total group, compared with 36+2 weeks (IQR 34+6 to 37+3 weeks) for the ProTWINkids participants at follow-up. Fewer children born < 28 weeks participated in the follow-up than in the original sample, as these children were mostly the ones that died (10 out of 12 mothers who delivered at GA < 28 weeks experienced perinatal death). No clinically relevant or statistically significant differences were seen between the pessary and control groups in the demographic characteristics of the mothers or the social background of the participating children in the ProTWINkids study (Table 1).


Death or survival with a neurodevelopmental disability

From randomization in the ProTWIN trial until the age of 3 years, a total of 6 children (5%) in the pessary group died during pregnancy or before discharge from hospital, with no deaths after discharge, compared with a total of 21 children (26%) in the control group, of whom 19 died during pregnancy or before discharge and 2 died after discharge but before the age of 3 years; OR 0.13 [95% CI 0.04 to 0.48] (Table 2). Among the children who had a Bayley-III assessment (N=173), 6 children (5.0%) in the pessary group had at least one Bayley-III scale score ≤ 85 points (-1 SD), compared with 2 children (2.5%) in the control group. The composite outcome of death or survival with a neurodevelopmental disability occurred in 12 (10%) vs 23 children (29%) in the pessary and control group, respectively; OR 0.26 [95% CI 0.09 to 0.75]; p=0.013.

Bayley-III outcomes in surviving children

Among the surviving children, the mean Bayley-III scores were all around 100, which is comparable to the general population. The 95% CIs of the differences in mean Bayley-III scores for cognitive, language and motor development between the pessary and control group did not exceed the minimal clinically important difference of 8 points (Table 3), both before and after adjustment for socio-demographic confounders (parental education, smoking, ethnic origin, child being the eldest in the family, use of daycare, bilingualism and breastfeeding for > 6 months). Analysis of the multiple imputed data showed comparable results (Table 3).

General health outcomes in surviving children

The number of surviving children who did not visit any medical specialist or developmental specialist up to the age of 3 years (apart from visits to a general practitioner) was higher in the pessary group than in the control group: 63 (51%) vs 20 (30%); aOR 2.63 [95% CI 1.09 to 6.3]. The number of children who had not used any medication up to the age of 3 years was also higher in the pessary group than in the control group: 46 (37%) vs 13 (20%); aOR 3.29 [95% CI 1.23 to 8.8]. No significant differences were seen between surviving children from the pessary and control groups in current use of medication, hospital admissions or operations in the past 3 years (Table 4).

Table 2. Death or survival with a neurodevelopmental disability. Results of linear mixed-effects model for complete case and multiple imputation analyses

Analysis and outcome | Pessary, n (%) | Control, n (%) | Odds ratio, unadjusted (95% CI) | P-value | NNT (95% CI)
Complete case analysis | n=120 | n=80 | | |
Death or survival with neurodevelopmental disability* | 12 (10.0) | 23 (28.8) | 0.26 (0.09 to 0.73) | 0.013 | 6 (3.3 to 13.4)
Total death | 6 (5.0) | 21 (26.3) | 0.13 (0.04 to 0.48) | 0.003 | -
Stillbirth | 3 (2.5) | 2 (2.5) | | |
Death before discharge | 3 (2.5) | 17 (21.25) | | |
Death before 3 yrs corrected age | 0 (0) | 2 (2.5) | | |
Disability | 6 (5.0) | 2 (2.5) | 1.43 (0.38 to 5.4) | 0.597 | -
Multiple imputation analysis (n=5 datasets) | 5 datasets of 157 children (n=785) | 5 datasets of 111 children (n=555) | | |
Death or survival with a neurodevelopmental disability* | 60 (7.6) | 115 (20.7) | 0.19 (0.046 to 0.77) | 0.035 | 8 (5.9 to 10.8)
Death | 30 (3.8) | 105 (18.9) | 0.09 (0.017 to 0.44) | 0.007 | -
Disability | 30 (3.8) | 10 (1.8) | 1.08 (0.79 to 1.50) | 0.624 | -

*Neurodevelopmental disability is defined as at least one Bayley-III score ≤85 points (i.e. 1 SD or more below the mean) in one or more subtests (cognitive, language and motor).
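The number needed to treat (NNT) in Table 2 is the reciprocal of the absolute risk reduction (ARR). The text does not state how its confidence interval was obtained. The minimal Python sketch below assumes a Wald (normal-approximation) interval for the ARR, inverted to give the NNT interval. Under that assumption it reproduces the complete case value of 6 (3.3 to 13.4). Applied to the stacked imputed counts (785 vs 555), it gives 8 (5.9 to 10.8). The function name `nnt_with_ci` is hypothetical. Stacking five imputed datasets inflates the apparent sample size fivefold, so an interval computed this way is narrower than one pooled with Rubin's rules. Neither calculation accounts for twin clustering.

```python
import math


def nnt_with_ci(events_t: int, n_t: int, events_c: int, n_c: int,
                z: float = 1.96):
    """Return (NNT, lower, upper) from the absolute risk reduction (ARR).

    The CI inverts a Wald (normal-approximation) CI for the ARR, which is
    only finite when the ARR interval excludes zero.
    """
    p_t = events_t / n_t          # risk in the treated (pessary) group
    p_c = events_c / n_c          # risk in the control group
    arr = p_c - p_t
    se = math.sqrt(p_t * (1 - p_t) / n_t + p_c * (1 - p_c) / n_c)
    arr_low, arr_high = arr - z * se, arr + z * se
    if arr_low <= 0:
        raise ValueError("ARR CI includes zero; the NNT CI is unbounded")
    # A larger ARR means fewer women to treat, so the bounds swap.
    return 1 / arr, 1 / arr_high, 1 / arr_low


# Death or survival with neurodevelopmental disability (Table 2).
for label, counts in [("complete case", (12, 120, 23, 80)),
                      ("stacked imputations", (60, 785, 115, 555))]:
    nnt, low, high = nnt_with_ci(*counts)
    print(f"{label}: NNT {nnt:.2f}, reported as {math.ceil(nnt)} "
          f"(95% CI {low:.1f} to {high:.1f})")
```

Rounding the NNT up to a whole number (5.33 to 6, 7.65 to 8) follows the usual convention of rounding NNTs up. This matches the "six to eight women" quoted in the Discussion.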

Chapter 3


Table 4. Results of multilevel analysis, adjusted for confounders, of health-related problems in the last three years in surviving children

Outcome | Pessary n=123, n (%) | Control n=66, n (%) | Odds ratio (95% CI), unadjusted | Odds ratio (95% CI), adjusted*
Involvement of healthcare providers | | | 2.55 (1.15 to 5.7)** | 2.63 (1.09 to 6.3)
No healthcare providers involved | 43 (35.0) | 12 (18.2) | |
General practitioner only | 20 (16.3) | 8 (12.1) | |
Developmental support only† | 5 (4.1) | 2 (3.0) | |
Specialist care only† | 10 (8.1) | 6 (9.1) | |
General practitioner and developmental support | 1 (0.8) | 8 (12.1) | |
General practitioner and specialist care | 26 (21.1) | 18 (27.3) | |
Developmental support and specialist care | 4 (3.2) | 2 (3.0) | |
General practitioner, developmental support and specialist care | 14 (11.4) | 10 (15.2) | |
Use of medication in the past | | | 2.54 (1.06 to 6.1)*** | 3.29 (1.23 to 8.8)
No medication | 46 (37.4) | 13 (19.7) | |
Antibiotics only | 30 (24.4) | 15 (22.7) | |
Anti-epileptic medication only | 0 (0) | 0 (0) | |
Anti-asthmatic medication only | 1 (0.8) | 5 (7.6) | |
Anti-eczema medication only | 9 (7.3) | 4 (6.1) | |
Other medication used (e.g. anti-laxans, anti-vomiting) | 10 (8.1) | 8 (12.1) | |
>1 medication | 27 (22.0) | 21 (31.8) | |
Current use of medication‡ | 16 (13) | 13 (19.7) | 0.60 (0.24 to 1.50) | 0.58 (0.21 to 1.58)
Admission to hospital in last 3 years§ | | | 0.75 (0.33 to 1.72)**** | 0.65 (0.27 to 1.62)
0 | 85 (69.1) | 49 (76.6) | |
1 | 30 (24.4) | 8 (12.5) | |
2 | 6 (4.9) | 3 (4.7) | |
>2 | 2 (1.6) | 4 (6.2) | |
Surgery in last 3 years | | | 0.77 (0.28 to 2.1)***** | 0.70 (0.23 to 2.1)
0 | 104 (84.5) | 58 (87.9) | |
1 | 14 (11.4) | 6 (9.1) | |
2 | 4 (3.3) | 1 (1.5) | |
>2 | 1 (0.8) | 1 (1.5) | |

*Adjusted for parental education (high-middle-low), smoking, ethnic origin, being the eldest child in the family, use of daycare, bilingualism and breastfeeding >6 months.
†Developmental support: physical therapy, occupational therapy and speech therapy. Specialist care: pediatrics, surgery, audiology, cardiology, dermatology, ENT, orthopedics, psychiatry, ophthalmology, genetics.
‡Current use of medication such as antibiotics, anti-epileptic, anti-asthmatic, anti-eczema and other medication. Only two children used >1 current medication.
§Missing data on hospital admission frequency for 2 children in the control group.
**Analysis comparing children seeing no healthcare provider or only a general practitioner with children seeing a specialist, receiving developmental support, or a combination.
***Analysis comparing children taking no medication with children taking one or more medications in the past.
****Analysis comparing children with no hospital admission with children with ≥1 hospital admission.
*****Analysis comparing children with no surgery with children with ≥1 operation.


DISCUSSION

This 3-year follow-up study shows long-term outcomes that are consistent with the previously reported short-term outcomes of the ProTWIN trial. In women with a twin pregnancy and a short cervix, use of a pessary reduces short-term mortality without affecting neurodevelopmental disability in the offspring.

Strengths and Limitations

This study has several strengths. First, it is a follow-up study of a randomized controlled trial, which is unfortunately uncommon for perinatal trials. In a systematic review evaluating long-term follow-up of children from large obstetrical trials, only 16% of the trials were able to report on follow-up.15 Second, to our knowledge, this is the first follow-up study evaluating long-term effects in children born after placement of a pessary in pregnancy. To date, two other randomized controlled trials have evaluated a pessary for the prevention of preterm birth in women with a twin pregnancy,16 17 and three have done so in women with a singleton pregnancy.18-20 No follow-up data from these trials have been published so far. Third, instead of relying on parental reports, this follow-up study used an objective, validated and standardized assessment (Bayley-III) to evaluate neurodevelopmental outcomes, and the assessor was blinded to study group. Fourth, Bayley-III results from assessments performed in standard care settings were also included. This prevented potentially higher attrition among children who were already receiving more medical monitoring.

A first limitation is that the number of missing cervical length measurements differed between the pessary and control groups. Cervical length was measured either before or shortly after randomization, between 16 and 22 weeks' gestation. It was not part of the inclusion criteria because the study targeted an unselected population of women with a multiple pregnancy between 12 and 20 weeks' gestation. Cervical length was not used for stratification at randomization. Even so, the number of patients randomized to each group can be considered sufficient to expect a roughly even distribution of baseline characteristics. A sensitivity analysis in the original ProTWIN trial showed that missing cervical length measurements did not alter the effect of the pessary, so missingness is unlikely to be a confounder.

A second limitation of this study is loss to follow-up. A follow-up rate of 83% (73% for Bayley-III participation) is reasonable, but there is still a substantial risk of bias. The first source of bias when comparing follow-up groups is mortality, since mortality was the main reason why extremely and very preterm infants (born <28 weeks and <32 weeks) could not be followed up. We therefore included deceased children in the analysis of long-term results, using a composite outcome, ‘survival without disability’, to account for bias due to differing survival rates.21 The second reason for loss to follow-up was unavailable or inaccurate contact details for approaching parents; for example, one hospital did not provide any contact data for follow-up. We assume that this loss to follow-up occurred at random and is therefore not associated with neonatal outcomes. The third reason was parental refusal to participate, which may cause attrition bias. However, we found no statistically significant differences in maternal demographic background between follow-up participants and participants in the original ProTWIN trial.

To date, the benefit of a pessary in twin pregnancy remains under debate because the studies published so far have reported contradictory results. After publication of the ProTWIN trial, the PECEP-Twins trial evaluated the effect of a pessary in 137 women with a twin pregnancy and a cervical length of ≤25 mm between 2011 and 2014.16 It observed a reduction in spontaneous preterm birth before 34 weeks of gestation (16.2% versus 39.4%; RR 0.41 [95% CI 0.22 to 0.76]). Nicolaides et al. performed a trial evaluating the effect of a pessary in unselected women with a twin pregnancy.17 This trial enrolled 1180 women between 2008 and 2011 and published its results 5 years after recruitment ended. Overall, this study found no reduction in the rate of spontaneous preterm birth <34 weeks (14% vs 13%, RR 1.05 [95% CI 0.78 to 1.41]). Post hoc analysis in the subgroup of 214 women with a short cervix (≤25 mm) also showed no benefit of the pessary (31% vs 26%, RR 1.20 [95% CI 0.78 to 1.84]).


Similarly contradictory results have been reported in women with a singleton pregnancy. Goya et al. reported a strong benefit of a pessary in women with a singleton pregnancy and a short cervical length (≤25 mm). In contrast, Nicolaides et al. found no benefit in women with a singleton pregnancy and a short cervical length (≤25 mm).18 20 A small study from Hong Kong, which was stopped because of recruitment problems, also found no benefit.19 These conflicting results may be due to differences between studies in the gestational age at which the pessary was inserted. In studies where the pessary was inserted at an earlier gestational age, an effect seems to be present in both singleton and twin pregnancies. The ProTWIN trial reported a mean gestational age of 16.9 weeks (SD 2.0).6 Goya et al. reported a mean gestational age at randomization of 22.1 weeks (SD 0.8) in their twin study.16 By comparison, Nicolaides et al. reported a median gestational age at randomization of 22.6 weeks (IQR 21.4 to 23.9).17 For singletons, Goya et al. reported a mean gestational age at randomization of 22.2 weeks (SD 0.9), and Nicolaides et al. a median of 23.4 weeks (IQR 22.6 to 24.3).18 20

We should emphasize that our report in women with a short cervix, although prespecified, was based on a subgroup analysis of an overall trial that was negative. The benefit seems to be present mostly in women with a short cervix, so future studies should restrict eligibility to this subgroup or at least ensure sufficient statistical power within it. In an exploratory analysis, we found that the treatment effect was present in all women with a cervix <38 mm but was related to cervical length: the shorter the cervix, the stronger the treatment effect.22 Worldwide, more trials are in progress, evaluating not only the pessary but also progesterone and cerclage in different populations. More than 10 trialists from five continents have expressed their intention to contribute to a prospective individual participant data meta-analysis (e.g. trials registered as NCT02328989, NCT02056639, NCT02056652, NCT02235181, NCT02405455, NCT00735137, NCT02511574, NCT02518594, NTR4414 and NTR4415). This international collaboration will hopefully provide further evidence on the optimal management of women with a multiple pregnancy to reduce their high risk of preterm birth.23

Translated into clinical practice, and bearing in mind the limitations mentioned above, the results of this follow-up study mean the following. Six to eight women with a twin pregnancy and a short cervix (<38 mm) at 12 to 20 weeks' gestation need to be treated with a cervical pessary to prevent one child from dying or surviving with neurodevelopmental disability. This information can be used when counselling women. The possible discomfort of wearing a pessary includes excessive vaginal discharge and the very small risk of cervical necrosis. In our opinion, a number needed to treat of 6 to 8 clinically justifies this discomfort, given the potential of preventing one child's death or survival with neurodevelopmental disability.

CONCLUSION

This follow-up study found no evidence of harm associated with the use of a cervical pessary in women with a twin pregnancy and a cervical length <38 mm, for surviving children at three years corrected age.


REFERENCES

1. Martin JA, Hamilton BE, Osterman MJ, et al. Births: final data for 2013. Natl Vital Stat Rep 2015;64(1):1-65.

2. Schaaf JM, Mol BW, Abu-Hanna A, et al. Trends in preterm birth: singleton and multiple pregnancies in the Netherlands, 2000-2007. BJOG 2011;118(10):1196-204.

3. Dodd JM, Jones L, Flenady V, et al. Prenatal administration of progesterone for preventing preterm birth in women considered to be at risk of preterm birth. Cochrane Database Syst Rev 2013(7):CD004947.

4. Koullali B, Oudijk MA, Nijman TA, et al. Risk assessment and management to prevent preterm birth. Semin Fetal Neonatal Med 2016;21(2):80-8.

5. Schuit E, Stock S, Rode L, et al. Effectiveness of progestogens to improve perinatal outcome in twin pregnancies: an individual participant data meta-analysis. BJOG 2015;122(1):27-37.

6. Liem S, Schuit E, Hegeman M, et al. Cervical pessaries for prevention of preterm birth in women with a multiple pregnancy (ProTWIN): a multicentre, open-label randomised controlled trial. Lancet 2013;382(9901):1341-9.

9. Alfirevic Z. Tocolytics: do they actually work? BMJ 2012;345:e6531.

10. Liem SM, Schuit E, van Pampus MG, et al. Cervical pessaries to prevent preterm birth in women with a multiple pregnancy: a per-protocol analysis of a randomized clinical trial. Acta Obstet Gynecol Scand 2016;95(4):444-51.

11. Bayley N. The Bayley scales of infant and toddler development, third edition. San Antonio, TX: Psychological Corporation, 2006.

12. Van Baar AL, Steenis LJ, Verhoeven M, et al. Bayley-III-NL; technische handleiding: Amsterdam, the Netherlands: Pearson Assessment and Information B.V., 2014.

13. Gates S, Brocklehurst P. How should randomised trials including multiple pregnancies be analysed? BJOG 2004;111(3):213-9.

14. VanderWeele TJ, Hernan MA, Robins JM. Causal directed acyclic graphs and the direction of unmeasured confounding bias. Epidemiology 2008;19(5):720-8.

16. Goya M, de la Calle M, Pratcorona L, et al. Cervical pessary to prevent preterm birth in women with twin gestation and sonographic short cervix: a multicenter randomized controlled trial (PECEP-Twins). Am J Obstet Gynecol 2016;214(2):145-52.

17. Nicolaides KH, Syngelaki A, Poon LC, et al. Cervical pessary placement for prevention of preterm birth in unselected twin pregnancies: a randomized controlled trial. Am J Obstet Gynecol 2016;214(1):3.e1-9.

18. Goya M, Pratcorona L, Merced C, et al. Cervical pessary in pregnant women with a short cervix (PECEP): an open-label randomised controlled trial. Lancet 2012;379(9828):1800-6.

19. Hui SY, Chor CM, Lau TK, et al. Cerclage pessary for preventing preterm birth in women with a singleton pregnancy and a short cervix at 20 to 24 weeks: a randomized controlled trial. Am J Perinatol 2013;30(4):283-8.

20. Nicolaides KH, Syngelaki A, Poon LC, et al. A Randomized Trial of a Cervical Pessary to Prevent Preterm Singleton Birth. N Engl J Med 2016;374(11):1044-52.

21. Parekh SA, Field DJ, Johnson S, et al. Accounting for deaths in neonatal trials: is there a correct approach? Arch Dis Child Fetal Neonatal Ed 2015;100(3):F193-7.

22. Tajik P, Monfrance M, van ‘t Hooft J, et al. A multivariable model to guide the decision for pessary placement to prevent preterm birth in women with a multiple pregnancy: a secondary analysis of the ProTWIN trial. Ultrasound Obstet Gynecol 2016;48(1):48-55.

23. Mol BW, Ruifrok AE; Global Obstetrics Network. Global alignment, coordination and collaboration in perinatal research: the Global Obstetrics Network (GONet) Initiative. Am J Perinatol 2013;30(3):163-6.

24. Potharst ES, van Wassenaer-Leemhuis AG, Houtzager BA, et al. Perinatal risk factors for neurocognitive impairments in preschool children born very preterm. Dev Med Child Neurol 2013;55(2):178-84.

CHAPTER 4

PREVENTING PRETERM BIRTH WITH PROGESTERONE IN

WOMEN WITH SHORT CERVICAL LENGTH: OUTCOMES IN

CHILDREN AT 24 MONTHS OF AGE

Janneke van ‘t Hooft, Cuny Cuijpers, Caroline Schneeberger,

Johanna H. van der Lee, Brent C. Opmeer, Leonie Steenis,

Cornelieke van de Beek, Melanie van Os, Jeanine van der Ven,

Christianne J.M. de Groot, Ben Willem J. Mol, Aleid G. van Wassenaer-Leemhuis.

Manuscript in preparation


ABSTRACT

Objective

To evaluate the effects of antenatal progesterone on long-term neurodevelopmental and physical outcomes in the children of women with a short cervix who were otherwise from a low-risk population.

Methods

We invited children born to 80 women randomised to progesterone (n=41) or placebo (n=39) for assessment at 2 years corrected age. The assessment comprised the Bayley Scales of Infant and Toddler Development, third edition (Bayley-III), and a neurological and physical examination. The mothers had singleton pregnancies and had been identified through a screening program by a cervical length ≤30 mm, but were otherwise without risk factors. Outcomes were assessed double-blinded. Mean cognitive and motor developmental scores were compared and corrected for potential confounders in linear regression analysis. Findings of physical, genital and neurological examinations were collected. Our sample size was dictated by the original sample of the Triple P trial and gave 95% power to detect a mean difference of 15 points (1 SD from the mean) in the Bayley scales.

Results

Of the 77 surviving children in the Triple P trial, 59 (77%) were reached for follow-up, and Bayley-III outcomes were collected for 57 (74%; n=28 progesterone vs n=29 placebo). Adjusted mean differences in cognitive and motor scores were -3.2 [95% CI -9.2 to 2.8] and -4.9 [95% CI -11.3 to 1.4], respectively. Minor congenital malformations were seen in 8 (30%) and 2 (11%) children in the progesterone and placebo groups respectively, RR 4.0 [95% CI 0.93 to 17.1]. No differences in physical, genital or neurological examination findings were seen between the groups.

Conclusion

In this sample of low-risk women with a short cervix, no differences in neurodevelopmental outcome at 2 years corrected age were found between offspring exposed to progesterone in the second and third trimesters and those exposed to placebo. The difference in (minor) congenital abnormalities should be explored further.

Follow-up progesterone in singleton pregnancy with short cervix

INTRODUCTION

Preterm birth is associated with increased rates of neonatal mortality and long-term morbidity.1 Preventing preterm birth therefore has substantial benefits for children's health and for healthcare costs. Women with a previous preterm birth and women with a multiple gestation are at the highest risk of preterm birth, but the majority of spontaneous preterm births occur in low-risk women.2 Low-risk women who will deliver prematurely can be identified by cervical length (CL) screening with transvaginal ultrasound at 20 to 22 weeks of gestation.

It has been known for two decades that women with a short cervix (≤30 mm) have a 3- to 4-fold increased risk of preterm birth.2 Several studies have evaluated progesterone for the prevention of preterm birth in women with a short cervix from a mixed population of high- and low-risk women (based on history of preterm birth, twin gestation, etc.). They have shown contradictory results regarding benefit on short-term neonatal outcomes.3-5 Fonseca et al. randomized women to vaginal progesterone or placebo and showed a reduction in spontaneous delivery before 34 weeks, RR 0.56 [95% CI 0.36 to 0.86], but no reduction in neonatal morbidity.3 The study of Romero et al. also found a reduction in spontaneous delivery before 33 weeks, RR 0.55 [95% CI 0.31 to 0.91]. It also found a reduction in respiratory distress syndrome (RDS), RR 0.39 [95% CI 0.17 to 0.91].4 A Cochrane analysis of four RCTs of (intramuscular and vaginal) progesterone confirmed the reduction in preterm birth. However, pooling the available data did not show benefits on neonatal morbidity (e.g. RDS, intraventricular haemorrhage, periventricular leukomalacia, necrotising enterocolitis and neonatal death).5

With regard to safety, there have historically been fears of masculinisation of the genital tract in female foetuses and hypospadias in male infants.6 However, several observational studies of exposed and non-exposed infants could neither confirm nor refute these concerns.7-9 Other concerns, regarding foetal loss and increased severe respiratory distress in neonates, have been described after exposure to 17-hydroxyprogesterone caproate, a synthetic progestogen.10


Follow-up studies of randomized trials comparing progesterone with placebo in the second and third trimesters of pregnancy are scarce. The studies published before 2016 were of limited quality because they used questionnaires instead of discriminative diagnostic physical and neurodevelopmental examinations.11-13 Recently, the OPPTIMUM trial compared vaginal progesterone with placebo in women at risk of preterm birth because of a previous preterm birth or a short cervix. It assessed 869 children up to 2 years of age with a discriminative neurodevelopmental tool (Bayley Scales of Infant and Toddler Development) and clinical assessment.14 This trial showed a similar distribution of neurodevelopmental impairment in both groups. It did, however, show a significant increase in somatic impairments of the renal, gastrointestinal and respiratory systems, although these occurred in at most 2% of the progesterone group and 1% of the placebo group. The PREDICT trial evaluated vaginal progesterone in women with a twin pregnancy. Its follow-up used questionnaires combined with medical histories in 989 twins up to 8 years of age. The authors reported a possible 8-fold increased risk of cardiac abnormalities (e.g. septal malformations, other malformations, murmur, rhythm disturbance and aortic aneurysm).15

To our knowledge, the Triple P trial was the first randomised controlled trial evaluating the effect of progesterone in women with a short cervix from a low-risk population only.16 Between 2009 and 2013, 20,234 women were screened for a short cervix. We identified 151 women with a cervical length ≤30 mm on two measurements. After informed consent, 80 of them were randomly allocated to progesterone (n=41) or placebo (n=39). Progesterone was given as capsules of 200 mg micronized progesterone, self-administered vaginally on a daily basis between 22 and 34 weeks of gestation. It resulted in a nonsignificant decrease in the primary outcome (adverse perinatal event), 5% vs 11%, RR 0.47 [95% CI 0.09 to 2.4]. Other outcomes showed similar nonsignificant decreases, such as preterm birth <32 weeks (2% vs 8%; RR 0.33 [95% CI 0.04 to 3.0]).

Previous studies have demonstrated that agents given to pregnant women with the aim of improving pregnancy outcomes can have unexpected long-term effects on children that may not be apparent at birth.17 18 In addition, recent long-term follow-up data on progesterone exposure are not reassuring. Long-term follow-up of children exposed to progesterone is therefore urgently needed. The aim of this study was to determine the long-term effects of prophylactic exposure to vaginal progesterone in the second and third trimesters of pregnancy. We studied children of low-risk women with a short cervix and assessed their health and neurodevelopmental outcomes at two years corrected age.

METHODS

This study evaluated outcomes at two years corrected age in children born to mothers who participated in the Triple P trial. The Triple P trial was a multicentre, double-blind, placebo-controlled randomised trial (NTR2078). The trial and its follow-up were approved by the Medical Ethics Committee of the Academic Medical Centre, Amsterdam, the Netherlands (MEC AMC 08-328).16 The trial was discontinued early because of the unexpectedly low rate of women with a short cervix at screening. After 4 years we had screened 20,234 women, of whom 151 were eligible, but only 80 of the planned 1,920 women were randomized.16

Follow-up assessment

Families were contacted by phone 3 months before the child reached a corrected age of 2 years. After parental informed consent, the cognitive and motor scales of the Bayley Scales of Infant and Toddler Development, third edition (Bayley-III) were assessed, followed by a physical and neurological examination. A trained team of psychologists and medical doctors performed all Bayley tests at home or in the hospital. Parents were asked to complete the questionnaires before or shortly after the visit. Parents, psychologists, doctors and researchers involved in data collection and entry remained blind to the allocated treatment.

Bayley-III Scales

The Dutch translation of the Bayley-III was used to assess cognitive and motor development in infants and toddlers.19 20 The Bayley-III norms for the US population were used, as Dutch norms were not yet available.20 The test and its norms are used worldwide in healthcare settings as well as for research purposes. Developmental outcomes are reported on three scales (cognition, language and motor development), each with a normed mean of 100 and an SD of 15 using US norms. A score of ≤85 points (i.e. 1 SD or more below the mean) on any of the Bayley-III scales represents suboptimal development.20
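The cut-off follows directly from the normed scale: 100 - 15 = 85. The minimal Python sketch below shows the classification, assuming a child is flagged when any assessed scale score is at or below 85. The helper names `bayley_z` and `suboptimal_development` are hypothetical and not part of any Bayley-III scoring software.

```python
from typing import Mapping

BAYLEY_MEAN = 100   # normed mean of each Bayley-III scale (US norms)
BAYLEY_SD = 15      # normed standard deviation
CUTOFF = BAYLEY_MEAN - BAYLEY_SD  # 85 points, i.e. -1 SD


def bayley_z(score: float) -> float:
    """Express a Bayley-III scale score in SD units from the normed mean."""
    return (score - BAYLEY_MEAN) / BAYLEY_SD


def suboptimal_development(scores: Mapping[str, float]) -> bool:
    """True if any assessed scale (e.g. cognitive, motor) is <= 85."""
    return any(score <= CUTOFF for score in scores.values())


child = {"cognitive": 95, "motor": 85}
print(bayley_z(child["motor"]))        # -1.0
print(suboptimal_development(child))   # True
```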

Physical examination

Physical examination was performed by one paediatrician using a standardized assessment format covering vision, hearing, heart, lungs, abdomen and skin, as well as genital and neurological abnormalities.

Questionnaires

Two validated parent-completed questionnaires were used: the Ages and Stages Questionnaire, 3rd edition (ASQ)21 and the Child Behaviour Checklist for ages 1.5-5 years (CBCL).22 An additional general health questionnaire provided further information on baseline characteristics. It also covered clinical history between birth and age 2 years, including medication use, hospital admission and need for surgery. The ASQ is a developmental screening tool that covers five domains of child development (communication, gross and fine motor development, problem-solving and personal-social skills).21 The CBCL measures parental perception of social competencies and behavioural problems during the past 2 months.22

Power calculation

An indicative power calculation showed that 17 children per group would give 80% power to detect a difference of 1 SD (15 points) in the Bayley-III test with a two-sided significance level of 0.05. A Bayley-III score of 85 (population mean minus 1 SD) indicates a mild developmental delay. A more subtle (but still clinically relevant) difference of 0.5 SD would require 64 children per group. The size of the study was predetermined by the number of women recruited to the Triple P trial. The study was therefore deemed sufficiently powered to demonstrate a 15-point difference in the Bayley-III test, but underpowered to detect smaller (but clinically relevant) differences.
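The text does not state which software or formula produced these numbers. They can be approximately reproduced with the standard two-sample formula, n = 2(z_a + z_b)^2 / d^2 per group. Here d is the difference in SD units, z_a is the standard normal quantile for 1 - alpha/2, and z_b is the quantile for the desired power. The Python sketch below assumes one particular variant: it adds the common small-sample t-test correction z_a^2 / 4 (attributed to Guenther). With that correction it gives 17 for d = 1 and 64 for d = 0.5. It also uses a normal approximation to estimate the power achieved with the 28 and 29 children who completed the Bayley-III. The function names are hypothetical.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # mean 0, SD 1


def n_per_group(effect_size_sd: float, alpha: float = 0.05,
                power: float = 0.80) -> int:
    """Approximate n per group for a two-sided two-sample t-test.

    Normal-approximation formula 2 * (z_a + z_b)**2 / d**2 plus the
    small-sample correction z_a**2 / 4, rounded up to a whole child.
    """
    z_a = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_b = _STD_NORMAL.inv_cdf(power)
    n = 2 * (z_a + z_b) ** 2 / effect_size_sd ** 2 + z_a ** 2 / 4
    return math.ceil(n)


def approx_power(effect_size_sd: float, n1: int, n2: int,
                 alpha: float = 0.05) -> float:
    """Normal-approximation power of a two-sided two-sample comparison
    (ignores the negligible opposite tail)."""
    z_a = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    noncentrality = effect_size_sd * math.sqrt(n1 * n2 / (n1 + n2))
    return _STD_NORMAL.cdf(noncentrality - z_a)


print(n_per_group(1.0))   # 1 SD = 15 Bayley points -> 17
print(n_per_group(0.5))   # 0.5 SD = 7.5 points     -> 64
print(f"{approx_power(1.0, 28, 29):.2f}")  # achieved groups, 1 SD
print(f"{approx_power(0.5, 28, 29):.2f}")  # achieved groups, 0.5 SD
```

With 28 and 29 children, the approximation gives roughly 96% power for a 1 SD difference, in line with the 95% stated in the abstract. For a 0.5 SD difference it gives under 50%, which is the underpowering noted above.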

The outcomes presented are based on the intention-to-treat principle. Analyses were performed with IBM SPSS Statistics version 21 (IBM Corp., Armonk, NY, USA). Social background characteristics of mothers and children participating in the Triple P follow-up study were compared between the progesterone and placebo groups. We used the unpaired t-test, Mann-Whitney U test, chi-square test or Fisher's exact test, as appropriate. Mean cognitive and motor scores on the Bayley-III were compared using linear regression, adjusting for parental educational level (high-middle-low).

Results of the physical examination were grouped into ‘congenital abnormalities’ (minor and major) and ‘syndromes/genetic disorders’. If more than one abnormality was found in the same child, it was counted once. Special attention was given to all abnormalities in the genital region. Consistent with the clinical use of the ASQ, two patterns of ASQ scores were scored as abnormal: scores 1 SD below the normative mean in two or more domains, or 2 SD below the normative mean in at least one domain.23 CBCL T scores were calculated, and scores above the 97th percentile (indicating serious behavioural problems) were scored as abnormal.22
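The ASQ and CBCL decision rules above can be written as two small predicates. This Python sketch assumes domain results are already expressed as z-scores against the normative mean. It treats "1 SD below" as z ≤ -1, although the text does not say whether the boundary is inclusive. The function names are hypothetical, and the ASQ itself is usually scored against published domain cut-off scores rather than z-scores.

```python
from typing import Mapping

ASQ_DOMAINS = ("communication", "gross_motor", "fine_motor",
               "problem_solving", "personal_social")


def asq_abnormal(domain_z: Mapping[str, float]) -> bool:
    """Abnormal if >= 2 domains are at least 1 SD below the normative
    mean, or >= 1 domain is at least 2 SD below it (z <= -1 / z <= -2)."""
    n_below_1sd = sum(1 for z in domain_z.values() if z <= -1)
    any_below_2sd = any(z <= -2 for z in domain_z.values())
    return n_below_1sd >= 2 or any_below_2sd


def cbcl_abnormal(percentile: float) -> bool:
    """CBCL score above the 97th percentile is scored as abnormal."""
    return percentile > 97


typical = dict.fromkeys(ASQ_DOMAINS, 0.0)
print(asq_abnormal({**typical, "communication": -1.5}))  # False: one domain
print(asq_abnormal({**typical, "communication": -1.5,
                    "fine_motor": -1.2}))                  # True: two domains
print(asq_abnormal({**typical, "gross_motor": -2.3}))     # True: -2 SD
print(cbcl_abnormal(98))                                  # True
```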

RESULTS

Of the 80 women and children enrolled in the Triple P trial (n=41 progesterone and n=39 placebo), 3 children died because of extreme prematurity (n=1 progesterone and n=2 placebo). This left 77 surviving children eligible for follow-up. In total, 59 (77%) children participated in the follow-up and 57 (74%) completed a Bayley-III test (n=28 progesterone and n=29 placebo). Physical examination was carried out in 54 children (Figure 1).

Social background characteristics of mothers and children participating in the follow-up study were broadly similar to those of mothers and children lost to follow-up, except for parental education and ethnicity (Appendix 1). Characteristics of the progesterone and placebo groups participating in follow-up were comparable (Table 1).

No differences were observed in mean cognitive and motor scores on the Bayley-III or in the number of children with a Bayley score below -1 SD (Table 2). Neurological examination showed no cases of cerebral palsy (CP). Mild neurological abnormalities, consisting of mild hypotonia or hypertonia of the extremities, occurred in two children in the progesterone group and one in the placebo group. Two children with developmental problems were identified in the placebo group: one child had severe eating problems and behavioural issues, and another had a delay in active language development.


Figure 1. Flowchart of children who participated in the Triple P follow-up study, starting from randomization of pregnant women with a singleton pregnancy and a short cervix in the Triple P trial

80 women were randomly assigned in the Triple P trial: progesterone n=41, placebo n=39.
Progesterone: deceased children n=1; children eligible for follow-up n=40; lost to follow-up n=11 (no contact data n=7, not willing to participate n=4); children seen at 2 years corrected age n=29 (73%), of whom Bayley + physical examination + questionnaires n=25, Bayley and questionnaires only n=1, other combinations n=3; Bayley tests completed n=28.
Placebo: deceased children n=2; children eligible for follow-up n=37; lost to follow-up n=7 (no contact data n=4, not willing to participate n=3); children seen at 2 years corrected age n=30 (81%), of whom Bayley + physical examination + questionnaires n=22, Bayley and questionnaires only n=3, other combinations n=5; Bayley tests completed n=29.

Miscellaneous minor congenital malformations were somewhat more frequent in the progesterone group than in the placebo group, 8 (30%) vs 2 (11%), RR 4.0 [95% CI 0.93 to 17.1]. In the progesterone group, haemangiomas were seen in two children. The following were each seen in one child:
- a café au lait spot combined with a small cardiac septal defect
- 2 dimples on the back combined with café au lait spots
- small depigmented stripes on the upper body
- a colour difference between the two irises
- a small umbilical hernia

In the placebo group, a haemangioma was seen in one child, and an isolated café au lait spot and ptosis of one eye in another.

Clinical-range CBCL scores were somewhat less frequent in the progesterone group than in the placebo group, 1 (4%) vs 4 (15%), RR 0.25 [95% CI 0.03 to 2.1]. Genital abnormalities were comparable between the groups. Neurological examination, ASQ scores and healthcare utilization (e.g. medication use, hospital admission) were also comparable between the groups (Table 2).
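Risk ratios such as those above can be checked with the usual log (Katz) confidence interval, SE(ln RR) = sqrt(1/a - 1/n1 + 1/c - 1/n2). The CBCL denominators are not reported. Purely for illustration, this minimal Python sketch assumes that 26 children per group completed the CBCL, which is consistent with the reported 4% and 15%. Under that assumption it reproduces RR 0.25 [95% CI 0.03 to 2.1]. The denominator of 26 and the function name `risk_ratio` are assumptions, not study data.

```python
import math


def risk_ratio(a: int, n1: int, c: int, n2: int, z: float = 1.96):
    """Risk ratio of group 1 vs group 2 with a Katz (log) 95% CI.

    a/n1 and c/n2 are events/totals; requires a > 0 and c > 0.
    """
    rr = (a / n1) / (c / n2)
    se_log = math.sqrt(1 / a - 1 / n1 + 1 / c - 1 / n2)
    return rr, rr * math.exp(-z * se_log), rr * math.exp(z * se_log)


# ASSUMPTION: 26 CBCL questionnaires per group (denominators not reported).
rr, low, high = risk_ratio(1, 26, 4, 26)
print(f"RR {rr:.2f} [95% CI {low:.2f} to {high:.1f}]")  # RR 0.25 [95% CI 0.03 to 2.1]
```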

Table 1. Sociodemographic characteristics of mothers and their children who participated in the TripleP 2-year follow-up study, and short-term pregnancy and neonatal outcomes

Mothers and their children assessed at two years follow-up with the Bayley-III test

| Characteristic | n/n | Progesterone (n=28) | Placebo (n=29) | P value |
|---|---|---|---|---|
| Social background characteristics, mothers | | | | |
| Maternal age at randomization, median (IQR) | 28/29 | 31 (26 to 34) | 31 (29 to 34) | 0.46 |
| Nulliparity, n (%) | 28/29 | 6 (21.4) | 11 (37.9) | 0.17 |
| Parental education*, n (%) | 28/28 | | | |
| High | | 17 (60.7) | 18 (64.3) | |
| Middle | | 5 (17.9) | 5 (17.9) | |
| Low | | 6 (21.4) | 5 (17.9) | |
| Ethnic origin white European, n (%) | 28/29 | 22 (78.6) | 22 (75.9) | 0.81 |
| Social background characteristics, children at the age of 2 years | | | | |
| Living in two-parent family†, n (%) | 28/27 | 25 (89.3) | 26 (96.3) | 0.51 |
| First-born child, n (%) | 28/27 | 22 (78.6) | 16 (59.3) | 0.12 |
| Dutch main language spoken at home, n (%) | 28/27 | 26 (92.9) | 21 (77.8) | 0.11 |
| Bilingual, n (%) | 28/27 | 7 (25) | 9 (33.3) | 0.50 |
| Breastfed in the first 6 months‡, n (%) | 27/27 | 6 (22.2) | 9 (33.3) | 0.36 |
| Use of day care, n (%) | 28/27 | 19 (67.9) | 18 (66.7) | 0.32 |
| Pregnancy outcomes | | | | |
| Corticosteroids during pregnancy, n (%) | 28/29 | 5 (17.9) | 5 (17.2) | 0.95 |
| PPROM, n (%) | 28/29 | 4 (14.3) | 3 (10.3) | 0.71 |
| Treatment compliance (>80% of medication taken), n (%) | 28/28 | 20 (71.4) | 15 (53.6) | 0.21 |
| Neonatal outcomes | | | | |
| Male sex, n (%) | 28/29 | 9 (32.1) | 16 (55.2) | 0.08 |
| Composite adverse neonatal outcome∫, n (%) | 28/29 | 0 (0) | 2 (6.9) | 0.49 |
| NICU admission, n (%) | 28/29 | 0 (0) | 5 (17.2) | 0.05 |
| Gestational age at birth (weeks+days), median (IQR) | 28/29 | 38+6 (37+1 to 40+2) | 38+5 (37+6 to 40+1) | |
| <32 weeks, n (%) | | 0 (0) | 3 (10.3) | 0.24 |
| <34 weeks, n (%) | | 2 (7.1) | 3 (10.3) | 1.00 |
| <37 weeks, n (%) | | 5 (17.9) | 4 (13.8) | 0.73 |
| Birthweight in grams, median (IQR) | 28/29 | 3013 (2602 to 3012) | 3360 (2915 to 3755) | |
| <2500 g, n (%) | | 5 (17.9) | 4 (13.8) | 0.73 |
| <1500 g, n (%) | | 0 (0) | 2 (6.9) | 0.49 |

* Parental education: “low level” (total years of post-elementary schooling <6) if at least one parent has a low level of education (unless the other parent is highly educated); “middle level” (total years of post-elementary schooling 6 to 8) if both parents have a middle level of education; “high level” (total years of post-elementary schooling >8) if at least one parent is highly educated.

† Families with both biological parents, or one biological and one non-biological parent.

‡ Breastfeeding, exclusive or in combination with formula, for at least 6 months.

∫ Composite adverse neonatal outcome until 10 weeks after the expected date of delivery, comprising respiratory distress syndrome, bronchopulmonary dysplasia, intracerebral haemorrhage >grade II, necrotizing enterocolitis >stage 1, proven sepsis, and death before discharge.
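The parental education rule in the footnote above applies a fixed precedence: a highly educated parent overrides everything, and a low-educated parent overrides a middle-level one. A minimal sketch of that rule, assuming years of post-elementary schooling are recorded per parent and the thresholds apply exactly as stated. The function names are hypothetical, and handling of single-parent families (a lone middle-level parent falls through to "middle") is an assumption the footnote does not address.

```python
def parent_level(years: float) -> str:
    """Education level of one parent from total years of post-elementary schooling."""
    if years < 6:
        return "low"
    if years <= 8:
        return "middle"
    return "high"


def family_education(*parent_years: float) -> str:
    """Combine parent levels with the footnote's precedence: high > low > middle."""
    if not parent_years:
        raise ValueError("at least one parent is required")
    levels = {parent_level(y) for y in parent_years}
    if "high" in levels:   # at least one highly educated parent
        return "high"
    if "low" in levels:    # a low-educated parent and no highly educated parent
        return "low"
    return "middle"        # all parents at the middle level


print(family_education(10, 3))  # high
print(family_education(7, 4))   # low
print(family_education(6, 8))   # middle
```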


† No cases of CP were found.

‡ Minor congenital abnormalities. Progesterone: haemangiomas (n=2), combination of a café au lait spot and a small cardiac septal defect (n=1), combination of two dimples on the back and café au lait spots (n=1), small depigmented stripes on the upper body (n=1), colour difference between the two irises (n=1), small umbilical hernia (n=1). Placebo: haemangioma (n=1), isolated café au lait spots (n=1), ptosis of one eye (n=1).

∫ Major congenital abnormality seen in one child: a combination of failure to thrive, need for a percutaneous endoscopic gastrostomy (PEG) tube and genital abnormality (small testes and a thin penis of normal length), without an underlying genetic cause identified as yet.

§ Genetic disorder seen in one child, diagnosed with Noonan syndrome.

Ω Genital abnormalities. Progesterone: small testes and a thin penis of normal length; café au lait spot of 5 cm on the labia majora; underdeveloped scrotal skin. Placebo: undescended testis; unretractable foreskin; labial adhesion over 70% of the labia minora.

¥ ASQ: child scores 1 SD below the normative mean on two or more domains, or 2 SD below the normative mean on at least one domain. This corresponds to immediate referral for further assessment in clinical practice.

CBCL: a score above the 97th percentile was defined as deviant.

Need for health care providers: either developmental support (physiotherapy, occupational therapy and/or speech therapy) and/or specialist care (paediatrics, surgery, audiology, cardiology, dermatology, ENT, orthopaedics, psychiatry, ophthalmology, genetics).
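The ASQ referral criterion in the footnote above combines two conditions: one domain far below the norm, or several domains moderately below it. A minimal sketch, assuming each domain score has already been expressed as a z-score against the normative mean (an assumption made for illustration; in practice the ASQ-3 is scored against domain cut-off values). The function name is hypothetical, and counting a score exactly at -1 or -2 SD as meeting the threshold is also an assumption.

```python
from typing import Mapping


def asq_referral(domain_z: Mapping[str, float]) -> bool:
    """Referral rule from the footnote, applied to per-domain z-scores.

    z <= -1 counts as '1 SD below the normative mean' and z <= -2 as '2 SD below'.
    """
    n_below_1sd = sum(1 for z in domain_z.values() if z <= -1.0)
    any_below_2sd = any(z <= -2.0 for z in domain_z.values())
    return any_below_2sd or n_below_1sd >= 2


# Hypothetical child: two domains between 1 and 2 SD below the mean -> referral
child = {"communication": 0.2, "gross motor": -1.3, "fine motor": -1.1,
         "problem solving": 0.0, "personal-social": 0.5}
print(asq_referral(child))  # True
```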

DISCUSSION

In this follow-up study of a randomised clinical trial in women with a singleton pregnancy and a short cervix, we could not demonstrate long-term effects of prenatal progesterone exposure on the physical, health or neurodevelopmental outcomes of children at two years corrected age.

Strengths and Limitations

This study has several strengths. First, it is a follow-up study of a randomized controlled trial in which blinding of parents, care providers and researchers was maintained during the follow-up measurements. Second, rather than relying on parental reports only, this follow-up study used a broad variety of validated instruments and assessments (the Bayley-III test and neurological and physical assessment by a paediatrician), together with validated behavioural and neurodevelopmental questionnaires. To our knowledge, the OPPTIMUM study is the only other study that has reported Bayley scales and a health assessment in children exposed to progesterone.14

A limitation of this study is the small number of women randomized in the original TripleP study. Because cervical length (CL) was measured in a low-risk population previously monitored by primary care midwives, the predefined cut-off value of 30 mm is likely to have influenced the distribution of CL measurements. As the measurement was not blinded, assessors may have favoured a measurement just above the cut-off value; together with our requirement that a short cervix be confirmed at two different time points, this may explain the unexpectedly low rate of women with a short cervix and thus the reduced number of eligible women.24 With the predetermined number of 80 randomized women in the TripleP study, we were only able to detect large (>1 SD) differences in Bayley outcomes between the groups. We could not perform power calculations for outcomes such as minor congenital abnormalities or genital abnormalities, but nevertheless considered it important to describe all of these after examination by a paediatrician blinded to randomization.
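The link between sample size and the smallest detectable difference can be made explicit with the usual normal-approximation formula for comparing two means. The chapter does not state the significance level, power or number per arm behind its ">1 SD" statement, so the defaults below (two-sided alpha of 0.05, 80% power, equal arms, normal approximation to the t-test) are assumptions, and the output should not be read as a re-analysis of the trial. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def min_detectable_effect(n_per_arm: int, alpha: float = 0.05, power: float = 0.80) -> float:
    """Smallest difference in means, in SD units, that a two-sided two-sample
    comparison with equal arms can detect (normal approximation)."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = std_normal.inv_cdf(power)           # about 0.84 for 80% power
    return (z_alpha + z_beta) * math.sqrt(2 / n_per_arm)


for n in (20, 40, 80):
    print(n, round(min_detectable_effect(n), 2))
```

Because the detectable difference scales with 1/√n, halving the number of children per arm inflates it by a factor of √2, which is why small follow-up cohorts can only rule out large effects.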

A second limitation of this study is loss to follow-up. Even though 77% (74% for Bayley participants) is a reasonable follow-up rate, there is still a risk of attrition bias. Appendix 1 shows that a substantial proportion of the children born preterm (<34 weeks) were not assessed at follow-up (5 out of 11), although 3 of these 5 could not be assessed because they had died. Children who were not assessed at follow-up had parents with a lower level of education and were less often of white European origin than children who were assessed. This may have influenced the results, as these factors are known to be associated with neurodevelopment.25

Consequently, the sample in our follow-up study may not be fully representative of the general population. We found lower proportions of mild cognitive developmental delay (Bayley score more than 1 SD below the mean; 3%, 2/62) and of mild fine and gross motor delay (0% each) than the 5%, 6% and 20%, respectively, expected in the general population.26 The high proportion of highly educated parents in the follow-up sample (more than 60% in both study arms) may have influenced these findings, as parental educational status is known to affect neurodevelopment, an effect that is even more pronounced in preterm-born infants.25

To date, prophylactic administration of vaginal progesterone is common practice in women with a history of spontaneous preterm birth.5 The use of progesterone for the prevention of preterm birth in other populations at increased risk (women from a low-risk population with a short cervical length, women with a multiple pregnancy, or women presenting with threatened preterm labour) is still debated, and practice variation is high. The 2013 Cochrane review on progesterone for preterm birth prevention included 36 randomized controlled trials (RCTs) but concluded that long-term follow-up data are urgently needed. After systematically searching the literature, we found five publications of long-term follow-up studies of RCTs and one conference abstract.11-15,27

Two follow-up publications of the PREDICT trial evaluated the long-term effects of progesterone exposure in twin pregnancy compared with placebo, at 6 and 18 months13 and from 48 months up to 8 years of age.15 No differences in neurodevelopment, measured with the ASQ, were found at 18 months (433 infants) or at 48 to 60 months (437 infants). No differences in deaths or medical problems, as reported in outpatient clinic and hospital admission records, were found in 989 children (492 progesterone vs 497 placebo) up to 8 years of age. In a subgroup analysis of children who had been admitted to hospital, an 8-fold increased risk of cardiac abnormalities (e.g. septal malformations, other malformations, murmur, rhythm disturbance and aortic aneurysm) was found in the progesterone group compared with controls. The follow-up of the STOPPIT trial, which also evaluated the long-term effects of progesterone compared with placebo in twins, used validated parent-completed questionnaires and found no differences between the two groups in deaths, congenital anomalies, hospitalization or routine child health assessments in 324 children aged 3 to 6 years.11

Child outcomes after intramuscular progesterone administration compared with placebo during singleton pregnancy in women with a previous spontaneous preterm birth were evaluated by Northen et al.12 No differences in ASQ scores were found in 278 children assessed between the ages of 3 and 5 years. The OPPTIMUM trial, which evaluated the effect of vaginal progesterone compared with placebo in singleton pregnancies with a previous spontaneous birth at ≤34 weeks or a cervical length ≤25 mm, included 2-year neurodevelopmental outcomes in its initial study protocol.14 Bayley-III and child health assessment data were available for 869 children. Although no differences in neurodevelopmental impairment were seen between the groups, renal, gastrointestinal and respiratory problems, though of low frequency, were more common in the progesterone group (e.g. gastrointestinal problems in 2% of the progesterone group vs 1% of the placebo group; OR 2.67 [95% CI 1.37 to 5.20]).


To our knowledge, there is no earlier follow-up study evaluating the effect of vaginal progesterone in a population of solely low-risk women with a short cervical length at mid-pregnancy screening. Findings from the TripleP follow-up study are consistent with those of earlier follow-up studies in higher-risk populations, which also showed no difference in neurodevelopment between children exposed to progesterone in the second and third trimester and those exposed to placebo. Unlike the OPPTIMUM study, we found no indication of increased renal, gastrointestinal or respiratory problems, and unlike the PREDICT follow-up study, no indication of increased cardiac problems. We did, however, find a fourfold higher rate of minor congenital abnormalities, mostly involving the skin (such as café au lait spots and haemangiomas). CBCL outcomes, in particular, seemed better in the progesterone group. Behavioural problems were previously measured in the OPPTIMUM trial using the Strengths and Difficulties Questionnaire (SDQ) and did not differ between the groups. Importantly, most of the current follow-up studies are underpowered or use less discriminative instruments. Furthermore, assessment at two years of age may not reflect problems that only become apparent later in childhood, and this should be further explored.28 Future meta-analyses, preferably of individual participant data, must answer whether progesterone can be considered safe for use in pregnancy.

CONCLUSION

In offspring of low-risk women with a mid-pregnancy short cervix, developmental outcome at 2 years corrected age did not differ between children exposed to progesterone and those exposed to placebo. Although our sample size was small, the tendency towards more minor congenital abnormalities in the progesterone group should encourage future studies to include this outcome in their follow-up protocols. Our data can contribute to future meta-analyses that must answer whether progesterone can be considered safe for use in pregnancy.

REFERENCES

1. Hille ET, Weisglas-Kuperus N, van Goudoever JB, et al. Functional outcomes and participation in young adulthood for very preterm and very low birth weight infants: the Dutch Project on Preterm and Small for Gestational Age Infants at 19 years of age. Pediatrics 2007;120(3):e587-95.

2. Iams JD, Goldenberg RL, Meis PJ, et al. The length of the cervix and the risk of spontaneous premature delivery. National Institute of Child Health and Human Development Maternal Fetal Medicine Unit Network. N Engl J Med 1996;334(9):567-72.

3. Fonseca EB, Celik E, Parra M, et al. Progesterone and the risk of preterm birth among women with a short cervix. N Engl J Med 2007;357(5):462-9.

4. Hassan SS, Romero R, Vidyadhari D, et al. Vaginal progesterone reduces the rate of preterm birth in women with a sonographic short cervix: a multicenter, randomized, double-blind, placebo-controlled trial. Ultrasound Obstet Gynecol 2011;38(1):18-31.

6. Harlap S, Prywes R, Davies AM. Letter: Birth defects and oestrogens and progesterones in pregnancy. Lancet 1975;1(7908):682-3.

7. Katz Z, Lancet M, Skornik J, et al. Teratogenicity of progestogens given during the first trimester of pregnancy. Obstet Gynecol 1985;65(6):775-80.

8. Resseguie LJ, Hick JF, Bruen JA, et al. Congenital malformations among offspring exposed in utero to progestins, Olmsted County, Minnesota, 1936-1974. Fertil Steril 1985;43(4):514-9.

9. Yovich JL, Turner SR, Draper R. Medroxyprogesterone acetate therapy in early pregnancy has no apparent fetal effects. Teratology 1988;38(2):135-44.

10. O’Brien JM. The safety of progesterone and 17-hydroxyprogesterone caproate administration for the prevention of preterm birth: an evidence-based assessment. Am J Perinatol 2012;29(9):665-72.

11. McNamara HC, Wood R, Chalmers J, et al. STOPPIT Baby Follow-up Study: the effect of prophylactic progesterone in twin pregnancy on childhood outcome. PLoS One 2015;10(4):e0122341.

12. Northen AT, Norman GS, Anderson K, et al. Follow-up of children exposed in utero to 17 alpha-hydroxyprogesterone caproate compared with placebo. Obstet Gynecol 2007;110(4):865-72.

13. Rode L, Klein K, Nicolaides KH, et al. Prevention of preterm delivery in twin gestations (PREDICT): a multicenter, randomized, placebo-controlled trial on the effect of vaginal micronized progesterone. Ultrasound Obstet Gynecol 2011;38(3):272-80.

15. Vedel C, Larsen H, Holmskov A, et al. Long-term effects of prenatal progesterone exposure: Neurophysiological development and hospital admissions in twins up to 8 years of age. Ultrasound Obstet Gynecol 2016.

16. van Os MA, van der Ven AJ, Kleinrouweler CE, et al. Preventing Preterm Birth with Progesterone in Women with a Short Cervical Length from a Low-Risk Population: A Multicenter Double-Blind Placebo-Controlled Randomized Trial. Am J Perinatol 2015;32(10):993-1000.

17. Crowther CA, Hiller JE, Haslam RR, et al. Australian Collaborative Trial of Antenatal Thyrotropin-Releasing Hormone: adverse effects at 12-month follow-up. ACTOBAT Study Group. Pediatrics 1997;99(3):311-7.


21. Squires J, Twombly E, Bricker D, et al. ASQ-3 User’s Guide. Baltimore: Brookes Publishing, 2009.

22. Achenbach T, Rescorla L. ASEBA Child Behavior Checklists for Ages 1.5-5 Years (CBCL/1.5-5). ASEBA.

23. Steenis LJ, Verhoeven M, Hessen DJ, et al. Parental and professional assessment of early child development: the ASQ-3 and the Bayley-III-NL. Early Hum Dev 2015;91(3):217-25.

24. van Os MA, Kleinrouweler CE, Schuit E, et al. Influence of cut-off value on the prevalence of short cervical length. Ultrasound Obstet Gynecol 2016.

25. Potharst ES, Schuengel C, Last BF, et al. Difference in mother-child interaction between preterm- and term-born preschoolers with and without disabilities. Acta Paediatr 2012;101(6):597-603.

26. Steenis LJ, Verhoeven M, Hessen DJ, et al. Performance of Dutch children on the Bayley III: a comparison study of US and Dutch norms. PLoS One 2015;10(8):e0132871.

27. O’Brien JM, Steichen JJ, Phillips JA, et al. Two year infant outcomes for children exposed to supplemental intravaginal progesterone gel in utero: secondary analysis of a multicenter, randomized, double-blind, placebo-controlled trial. Am J Obstet Gynecol Supplement to January 2012.

28. Potharst ES, Houtzager BA, van Sonderen L, et al. Prediction of cognitive abilities at the age of 5 years using developmental follow-up assessments at the age of 2 and 3 years in very preterm children. Dev Med Child Neurol 2012;54(3):240-6.

Appendix 1. Differences in sociodemographic characteristics between mothers and children who participated in the TripleP 2-year follow-up study (n=59) and those lost to follow-up (n=21)

Mothers and their children who were assessed at two years follow-up or lost to follow-up

| Characteristic | Follow-up (n=59) | Lost to follow-up (n=21) | P value | P value* |
|---|---|---|---|---|
| Social background characteristics, mothers | | | | |
| Maternal age at randomization, median (IQR) | 31 (27 to 34) | 29 (19 to 39) | 0.07 | - |
| Nulliparity, n (%) | 41 (69.5) | 14 (66.7) | 0.81 | 0.51 |
| Parental education†, n (%) | | | 0.004 | 0.001 |
| High | 37 (63.8) | 1 (11.1) | | |
| Middle | 10 (17.2) | 2 (22.2) | | |
| Low | 11 (19.0) | 6 (66.7) | | |
| Ethnic origin white European, n (%) | 44 (76.3) | 9 (42.9) | 0.005 | 0.011 |
| Pregnancy outcomes | | | | |
| Corticosteroids during pregnancy, n (%) | 10 (16.9) | 5 (25.0) | 0.43 | 0.73 |
| PPROM, n (%) | 7 (11.9) | 4 (19.0) | 0.47 | 1.00 |
| Treatment compliance (>80% of medication taken), n (%) | 37 (63.8) | 11 (64.7) | 0.95 | 0.56 |
| Neonatal outcomes | | | | |
| Male sex, n (%) | 33 (55.9) | 10 (47.6) | 0.51 | 1.00 |
| Composite adverse neonatal outcome∫, n (%) | 2 (3.4) | 4 (19.0) | 0.038 | 0.56 |
| Death before discharge, n (%) | 0 (0) | 3 (14.3) | 0.016 | - |
| NICU admission, n (%) | 5 (8.5) | 3 (14.3) | 0.43 | 0.66 |
| Gestational age at birth <32 weeks, n (%) | 3 (5.1) | 4 (19.0) | 0.073 | 0.94 |
| Gestational age at birth <34 weeks, n (%) | 5 (8.5) | 6 (28.6) | 0.022 | 0.38 |
| Gestational age at birth <37 weeks, n (%) | 9 (15.3) | 7 (33.3) | 0.075 | 0.49 |
| Birthweight <2500 g, n (%) | 9 (15.3) | 8 (38.1) | 0.028 | 0.23 |
| Birthweight <1500 g, n (%) | 2 (3.4) | 4 (19.0) | 0.038 | 0.56 |

* Maternal and neonatal characteristics of follow-up participants are compared with those of the group lost to follow-up because of mortality, unavailable contact details or parents' unwillingness to participate. The first p-value represents this comparison; the second (P value*) gives the same comparison excluding the deceased children.

† Parental education (high, middle, low): “low level” (total years of post-elementary schooling <6) if at least one parent has a low level of education (unless the other parent is highly educated); “middle level” (total years of post-elementary schooling 6 to 8) if both parents have a middle level of education; “high level” (total years of post-elementary schooling >8) if at least one parent is highly educated.
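The appendix does not state which test produced its p-values. A Fisher exact test is a common choice for counts this small; the sketch below implements the two-sided version using only the standard library, under that assumption. For the "death before discharge" row (0 of 59 vs 3 of 21) it returns about 0.016, in line with the reported value. The function name is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]].

    Rows are groups (e.g. follow-up vs lost to follow-up); columns are
    event / no event. The p-value sums the hypergeometric probabilities of all
    tables with the same margins that are no more likely than the observed one.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = comb(row1 + row2, col1)

    def table_prob(x: int) -> float:
        # Probability that x of the col1 events fall in row 1, margins fixed
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_observed = table_prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    return sum(
        table_prob(x)
        for x in range(lowest, highest + 1)
        if table_prob(x) <= p_observed * (1 + 1e-7)  # tolerance for floating-point ties
    )


# Death before discharge: 0 of 59 (follow-up) vs 3 of 21 (lost to follow-up)
print(round(fisher_exact_two_sided(0, 59, 3, 18), 3))  # 0.016
```

Applied to the NICU admission row of Table 1 (0 of 28 vs 5 of 29), the same function gives about 0.052, consistent with the reported 0.05.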

Part III

Integrating outcomes of obstetrical evaluation studies

CHAPTER 5

PREDICTING DEVELOPMENTAL OUTCOMES IN PREMATURE INFANTS BY TERM EQUIVALENT MRI: SYSTEMATIC REVIEW AND META-ANALYSIS

Janneke van ’t Hooft, Johanna H. van der Lee, Brent C. Opmeer,

Cornelieke SH. Aarnoudse-Moens, Arnold GE. Leenders,

Ben Willem J Mol, Timo R. de Haan.

Syst Rev. 2015;4:71.


ABSTRACT

Background

This study aims to determine the prognostic accuracy of term MRI in very preterm

born (≤32 weeks) or low-birth-weight (≤ 1500 g) infants for long-term (>18 months)

developmental outcomes.

Methods

We performed a systematic review searching Central, Medline, Embase, and PsycInfo.

Two independent reviewers performed study selection, data extraction and quality

assessment. We documented sensitivity and specificity for three different MRI

findings (white matter abnormalities (WMA), brain abnormality (BA), and diffuse

excessive high signal intensity (DEHSI)), related to developmental outcomes including

cerebral palsy (CP), visual and/or hearing problems, motor, neurocognitive, and

behavioral function. Using bivariate meta-analysis, we estimated pooled sensitivity

and specificity and plotted summary receiver operating characteristic (sROC) curves

for different cut-offs of MRI.

Results

We included 20 papers published between 2000 and 2013. Quality of included

studies varied. Pooled sensitivity and specificity values (95% confidence interval (CI))

for prediction of CP combining the three different MRI findings (using normal/mild

vs. moderate/severe cut-off) were 77% (53 to 91%) and 79% (51 to 93%), respectively.

For prediction of motor function, the values were 72% (52 to 86%) and 62% (29

to 87%), respectively. Prognostic accuracy for visual and/or hearing problems,

neurocognitive, and/or behavioral function was poor. sROC curves of the individual MRI findings showed that the presence of WMA provided the best prognostic accuracy, whereas DEHSI showed no potential prognostic value.

Conclusions

This study shows that the presence of moderate/severe WMA on MRI around term equivalent age can predict CP and motor function in very preterm or low-birth-weight infants with moderate sensitivity and specificity. Its ability to predict other long-term outcomes, such as neurocognitive and behavioral impairments, is limited. Other white-matter-related findings, such as BA and DEHSI, also demonstrated limited prognostic value.

Systematic review registration. PROSPERO CRD42013006362

MRI for prediction of developmental outcomes in prematures

BACKGROUND

Preterm birth is associated with an increased risk of neurodevelopmental problems.1 Magnetic resonance imaging (MRI) is increasingly being used to identify cerebral white matter lesions in preterm infants at term equivalent age. It is claimed to be a valuable tool for predicting neurodevelopmental outcomes in very preterm infants, and its clinical use is therefore being promoted.2,3 However, the prognostic accuracy of white-matter-related MRI abnormalities for long-term developmental outcomes is debatable, and its use as a standard of care is not yet recommended by the American Academy of Neurology Quality Standards.4 The debate is hampered by the lack of a meta-analytic synthesis of the primary studies reporting prognostic values, which tend to show conflicting results.

Consequently, the lack of knowledge about the prognostic accuracy of term MRI hampers adequate interpretation of this test. This may have unwanted effects, as parents may worry unnecessarily about the possible abnormal development of their child.5,6 However, if term MRI can predict neurodevelopmental outcomes accurately, the use of this expensive diagnostic procedure as part of standard care could be justified, as it may select high-risk infants for prolonged and intensive supportive care.

Our study aims to evaluate the following two questions:

1. What is the prognostic accuracy (in terms of sensitivity and

specificity) of white matter related abnormalities seen on term

MRI for long-term developmental outcomes of infants born very

preterm or with low birth weight?

2. Is there a difference in prognostic accuracy between the three types of white matter abnormalities seen on term MRI, i.e., white matter abnormality, a combination of cerebral white matter lesions defined as ‘brain abnormality’, and diffuse excessive high signal intensity?

To answer these questions, we performed a systematic review and meta-analysis.


METHODS

We performed a systematic review following the guidance of the PRISMA statement, the Cochrane Handbook for Systematic Reviews of Diagnostic Test Accuracy and other recommendations found in the literature,7–9 with a protocol prospectively published in the PROSPERO database (www.crd.york.ac.uk/PROSPERO/display_record.asp?ID=CRD42013006362#.VVMAX47tlBc).

Search strategy

We searched Central, Medline, Embase, and PsycInfo from their inception to

November 2013 for relevant studies. The search was performed by a trained

clinical librarian (AL) and two other authors (TdH and JvH). Broad text and MeSH

terms were used. Also, keywords of eligible papers were screened and included

in the final search. We did not apply any language restrictions. The search was

limited to studies including humans. The full search in all these databases can

be seen in Additional file 1. References from included studies were checked.

Abstracts and reports from meetings were included only if they related directly

to previously published work.

Eligibility criteria

The following inclusion criteria were used to select studies: (1) the study

pertained to infants born at a gestational age ≤ 32 weeks and/or birth weight

≤ 1500 g; (2) MRI should be planned at term equivalent age (37-42 weeks) with

a maximum range of 3 weeks earlier or later (34-45 weeks); (3) MRI findings

should be related to any developmental outcome; and (4) developmental

follow-up should be performed ≥18 months postnatal age. Isolated single case

studies and review articles were not included.

Abstracts were screened for eligibility by two independent reviewers (JvH and TdH). Full-text articles were retrieved if they were applicable to the core research question or if the abstract did not supply sufficient information. Any disagreement was resolved by discussion until consensus was reached. The same two reviewers appraised the methodological quality and performed the data extraction. Any disagreement at this stage was resolved by a third reviewer.

Methodological quality

Because no quality assessment tool for prognostic accuracy studies was available, we developed a modified version of the QUADAS-2 assessment tool10 to evaluate the risk of bias (see Additional file 2).

Data extraction

A standardized data extraction form (see Additional file 3) was used to record

study information. The results of white matter abnormalities (WMA) and brain

abnormalities (BA) are usually expressed as either no, mild, moderate or severe

abnormalities as described by Inder and Woodward et al.11,12 Where possible

we defined two cut-offs, i.e., (1) no abnormality vs. mild, moderate or severe

abnormality, reported as ‘normal vs. any’ and (2) no or mild abnormality vs.

moderate to severe abnormality, reported as ‘normal/mild vs. moderate/

severe’. BA was defined as a combination of WMA plus presence of other brain

abnormalities such as ventricular haemorrhage or increased ventricle size. For

diffuse excessive high signal intensity (DEHSI), the results are usually expressed

as either present or absent. Therefore, only one cut-off was used in the 2x2

tables presenting the results for these MRI findings.
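The two dichotomizations of graded WMA/BA results can be sketched as follows: each infant's MRI grade becomes a positive or negative test under the chosen cut-off, is cross-tabulated against the developmental outcome, and is summarized as sensitivity and specificity. The cohort below is invented purely for illustration, and the function and variable names are hypothetical; this is not the review's extraction code.

```python
from typing import FrozenSet, Iterable, Tuple

# Cut-offs as defined in the text: which MRI grades count as a positive test
NORMAL_VS_ANY = frozenset({"mild", "moderate", "severe"})
NORMAL_MILD_VS_MOD_SEVERE = frozenset({"moderate", "severe"})


def two_by_two(records: Iterable[Tuple[str, bool]],
               positive_grades: FrozenSet[str]) -> Tuple[int, int, int, int]:
    """Return (TP, FP, FN, TN) for (MRI grade, adverse outcome) pairs."""
    tp = fp = fn = tn = 0
    for grade, adverse in records:
        positive = grade in positive_grades
        if positive and adverse:
            tp += 1
        elif positive:
            fp += 1
        elif adverse:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def sensitivity_specificity(tp: int, fp: int, fn: int, tn: int) -> Tuple[float, float]:
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical cohort: (grade, infants without adverse outcome, infants with adverse outcome)
counts = [("none", 40, 2), ("mild", 20, 4), ("moderate", 5, 6), ("severe", 1, 4)]
records = ([(grade, False) for grade, n_without, _ in counts for _ in range(n_without)]
           + [(grade, True) for grade, _, n_with in counts for _ in range(n_with)])

for name, cut in [("normal vs any", NORMAL_VS_ANY),
                  ("normal/mild vs moderate/severe", NORMAL_MILD_VS_MOD_SEVERE)]:
    sens, spec = sensitivity_specificity(*two_by_two(records, cut))
    print(f"{name}: sensitivity {sens:.2f}, specificity {spec:.2f}")
```

Moving from ‘normal vs. any’ to the stricter ‘normal/mild vs. moderate/severe’ cut-off lowers sensitivity and raises specificity in this toy cohort, the same trade-off the review reports for WMA.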

The cut-off point for an unfavorable developmental outcome was defined as a score 2 standard deviations below the mean (-2 SD) for each MRI finding. If this cut-off was not reported (but, for example, only -1.25 or -1 SD), we used the reported cut-off in the meta-analysis.

In cases of duplicate reporting, i.e., when the same cohort was described in two papers or one paper reported developmental outcomes at different ages, we used the data reporting the developmental outcome at an age comparable with the other included papers. For example, if two papers reported motor skills at 2 years of age and one paper reported them at 2 and 6 years of age, the results at 2 years of age were used. If two papers reported the same cohort at similar ages, the study with the largest sample size and fewest quality concerns was selected. If the required data could not be extracted from the publication, authors were contacted by email. All data were entered in Review Manager (RevMan) version 5.3 (Copenhagen: The Nordic Cochrane Centre, The Cochrane Collaboration, 2012).


We performed a meta-analysis using a bivariate modelling approach.13 In view of the observed heterogeneity, a random-effects model was used. For each of the three MRI findings (WMA, BA and DEHSI) and all types of developmental outcomes, we compared pooled sensitivity and specificity (with 95% confidence intervals), likelihood ratios of positive and negative test results (LR+/LR-) calculated from the pooled sensitivity and specificity, diagnostic odds ratios (DOR) and post-test probabilities. Sensitivity and specificity of the individual studies and summary ROC (sROC) curves were plotted to visualize possible heterogeneity of the data and overall test accuracy.
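The derived quantities listed above follow from pooled sensitivity and specificity by standard formulas, with Bayes' theorem in odds form giving the post-test probabilities. A minimal sketch, using the pooled CP estimates reported in the Results (sensitivity 77%, specificity 79%) and an assumed pre-test probability of 10%, which is illustrative only and not taken from the review. It works on point estimates alone and does not reproduce the bivariate random-effects model or its confidence intervals; the function name is hypothetical.

```python
def accuracy_summary(sens: float, spec: float, pretest_prob: float) -> dict:
    """LR+, LR-, DOR and post-test probabilities from sensitivity and specificity."""
    lr_pos = sens / (1 - spec)   # factor by which a positive MRI multiplies the odds
    lr_neg = (1 - sens) / spec   # factor by which a negative MRI multiplies the odds
    dor = lr_pos / lr_neg        # equals (sens / (1 - sens)) / ((1 - spec) / spec)
    pretest_odds = pretest_prob / (1 - pretest_prob)

    def odds_to_prob(odds: float) -> float:
        return odds / (1 + odds)

    return {
        "LR+": lr_pos,
        "LR-": lr_neg,
        "DOR": dor,
        "post-test prob (positive MRI)": odds_to_prob(pretest_odds * lr_pos),
        "post-test prob (negative MRI)": odds_to_prob(pretest_odds * lr_neg),
    }


summary = accuracy_summary(sens=0.77, spec=0.79, pretest_prob=0.10)
for key, value in summary.items():
    print(f"{key}: {value:.3f}")
```

Under these assumptions, a positive MRI raises the probability of CP from 10% to roughly 29%, and a negative MRI lowers it to roughly 3%.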

RESULTS

Our search strategy yielded 1 311 citations after removal of duplicates (Figure

1). A total of 44 papers met the inclusion criteria, of which 27 papers provided

2x2 tables. One more relevant paper was identified by contact with the authors.

After excluding multiple publications from the same cohorts (8 papers), a total

of 20 papers were available for the meta-analysis.

The 20 papers were all published between 2000 and 2013. These papers

reported on 12 different cohort studies (2 retrospective and 10 prospective)

including 1 287 patients (682 male and 605 female). The extracted data

provided 54 2x2 tables for WMA, BA or DEHSI. These three MRI findings were

used for the prediction of various developmental outcomes: cerebral palsy

(CP), visual and/or hearing problems, motor, neurocognitive, and behavioral

function, as well as a combination of problems in these domains defined as

‘neurodevelopmental impairment’ (NDI). Study characteristics are shown in

Additional file 4: Table S1.

Studies from which 2x2 tables could not be derived (n=17 papers, not reported

in this manuscript) reported continuous data with no cut-offs. These studies

mostly reported the following MRI measures: cerebellar abnormalities, and volumes and diameters of the brain (total brain or specific regions such as the hippocampus, corpus callosum or ventricles).

Figure 1. Flowchart of study selection

Records identified through database search after removal of duplicates: 1311

Records identified through other sources: 2

Records screened: 1313

Records excluded: 1230

Full-text articles assessed for eligibility: 83

Full-text articles excluded, with reasons: 39 (no relation between MRI and follow-up, n=3; follow-up too short, n=4; MRI <34 and >45 weeks, n=3; MRI >45 weeks, n=9; included preterm infants >32 weeks, n=7; cohort of <10 patients, n=1; conference abstracts with no full text available, n=12)

Studies included in qualitative synthesis: 44

Studies not providing 2x2 tables: 17; studies publishing duplicate data: 8

Study with two 2x2 tables included after contact with authors: 1

Studies included in quantitative synthesis: 20 (meta-analysis including 59 2x2 tables)

Methodological quality of included studies

In general, 70 to 90% of the included studies scored positive on each of the QUADAS-2 quality assessment items (Figure 2). For example, 90% of the studies included in the meta-analysis used a consecutive sample of very preterm born and/or low-birth-weight neonates over a specific period of time in their clinic. In general, the MRI test and reference standard were well described, and all neonates who had an MRI underwent verification by the reference standard. However, almost 50% of the papers did not report blinding of the test results, i.e., whether the MRI findings were withheld from the person performing the follow-up neurodevelopmental test.


Figure 2. Quality assessment of included studies in meta-analysis (n=20).

Meta-analysis

The reported sensitivity and specificity were generally higher for the WMA tests than for BA or DEHSI findings (Table 1). Figure 3 shows the sROC curves for prediction of four different developmental delays by any MRI abnormality (combination of WMA, BA or DEHSI tests) using a ‘normal/mild vs. moderate/severe’ cut-off. The sROC curve for prediction of CP lies closest to the (optimal) upper left corner of the ROC space, and the sROC curve for prediction of motor function also tends towards that corner. The sROC curves for mental impairment and neurodevelopmental impairment lie closer to the diagonal (non-discriminating) line of the ROC space.


Figure 3. Pooled sensitivity and specificity with sROC reporting four developmental outcomes detected by any MRI abnormality (including White Matter Abnormality, Brain abnormality or Diffuse Excessive High Signal Intensity using ‘normal/mild vs. moderate/severe’ cut-off).

a. Cerebral palsy b. Motor function

c. Mental impairment d. neurodevelopmental impairment (NDI)

a (n=seven studies): pooled sensitivity 77% (53 to 91%) and specificity 79% (51 to 93%). b (n=seven studies): pooled sensitivity 72% (52 to 86%) and specificity 62% (29 to 87%). c (n=seven studies): pooled sensitivity 66% (41 to 84%) and specificity 53% (35 to 71%). d (n=four studies): pooled sensitivity 61% (34 to 83%) and specificity 85% (75 to 92%). The individual studies are visualized as squares with the horizontal axis corresponding to the total non-diseased neonates and vertical axis the total diseased neonates of that particular study population, i.e., a flat square represents a low prevalence of the disease and the surface of the square represents the size of the study population.

The pooled sensitivity and specificity (95% confidence interval (CI)) for the prediction of CP were 77% (53 to 91%) and 79% (51 to 93%), respectively. For the prediction of motor function, sensitivity was comparable at 72% (52 to 86%), but specificity was lower at 62% (29 to 87%). For mental development, both sensitivity and specificity were lower: 66% (41 to 84%) and 53% (35 to 71%), respectively. For NDI, sensitivity was lower at 61% (34 to 83%), whereas specificity was higher at 85% (75 to 92%). Using a 'normal vs. any' cut-off, the pooled sensitivity and specificity were 84% (45 to 97%) and 58% (27 to 84%) for the prediction of CP; 76% (48 to 92%) and 26% (8 to 57%) for motor function; and 85% (74 to 92%) and 36% (20 to 56%) for mental development, respectively.
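The pooled point estimates above can be converted into positive and negative likelihood ratios, LR+ = sensitivity / (1 - specificity) and LR- = (1 - sensitivity) / specificity, which are the quantities the Discussion turns to. The minimal sketch below does this for the reported point estimates only. The function name is hypothetical, confidence intervals are not propagated, and likelihood ratios computed from pooled sensitivity and specificity in this way are an approximation, so the output is illustrative rather than a result of this meta-analysis.

```python
"""Positive and negative likelihood ratios from pooled point estimates.

Illustrative sketch: takes the pooled point estimates reported in the text;
confidence intervals are not propagated.
"""


def likelihood_ratios(sensitivity: float, specificity: float) -> tuple[float, float]:
    """Return (LR+, LR-) for proportions on the 0-1 scale."""
    if not (0.0 < sensitivity < 1.0 and 0.0 < specificity < 1.0):
        raise ValueError("sensitivity and specificity must lie strictly between 0 and 1")
    lr_pos = sensitivity / (1.0 - specificity)  # factor by which an abnormal MRI raises the odds
    lr_neg = (1.0 - sensitivity) / specificity  # factor by which a normal MRI lowers the odds
    return lr_pos, lr_neg


# Pooled point estimates (sensitivity, specificity) as reported in the text.
POOLED = {
    ("CP", "normal/mild vs. moderate/severe"): (0.77, 0.79),
    ("Motor function", "normal/mild vs. moderate/severe"): (0.72, 0.62),
    ("Mental development", "normal/mild vs. moderate/severe"): (0.66, 0.53),
    ("NDI", "normal/mild vs. moderate/severe"): (0.61, 0.85),
    ("CP", "normal vs. any"): (0.84, 0.58),
    ("Motor function", "normal vs. any"): (0.76, 0.26),
    ("Mental development", "normal vs. any"): (0.85, 0.36),
}

if __name__ == "__main__":
    for (outcome, cutoff), (sens, spec) in POOLED.items():
        lr_pos, lr_neg = likelihood_ratios(sens, spec)
        print(f"{outcome:<20} {cutoff:<33} LR+ {lr_pos:5.2f}  LR- {lr_neg:5.2f}")
```

For CP, for example, this gives an LR+ of about 3.7 at the 'normal/mild vs. moderate/severe' cut-off versus about 2.0 at the 'normal vs. any' cut-off, a numerical view of the trade-off between the two cut-offs.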

Figure 4 shows the sROC curves for the two cut-offs, 'normal vs. any' and 'normal/mild vs. moderate/severe', when only the WMA results are considered for the prediction of various developmental outcomes. If only moderate to severe WMA lesions are coded as abnormal ('normal/mild vs. moderate/severe'), specificity increases and sensitivity decreases.

The spread of the individual studies around the sROC curves in Figures 3 and 4 indicates substantial heterogeneity in the collected data, which appears to be largely explained by a threshold effect. A threshold effect produces the same trade-off between sensitivity and specificity as described above, but without an explicit change in cut-off. It presumably results from studies implicitly using different thresholds, for example because of subjective judgments or differences in the calibration of diagnostic devices.
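A threshold effect is often explored by checking whether sensitivity and false-positive rate (1 - specificity) rise together across studies, as they would if the studies sat at different points on one common ROC curve. The sketch below computes a Spearman rank correlation for that check. It is an exploratory statistic assumed here for illustration, not taken from this chapter's methods; the helper names are hypothetical, no study-level data from this review are embedded, and with the few studies available per comparison the correlation would itself be very imprecise.

```python
"""Exploratory threshold-effect check: Spearman correlation between per-study
sensitivity and false-positive rate (1 - specificity).

Sketch only; study-level estimates must be supplied by the user.
"""
from math import sqrt
from statistics import mean


def average_ranks(values: list[float]) -> list[float]:
    """Rank values 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation; Spearman's rho is Pearson applied to ranks."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined when all values are equal")
    return sxy / sqrt(sxx * syy)


def threshold_effect_rho(sensitivities: list[float], specificities: list[float]) -> float:
    """Spearman rho between sensitivity and false-positive rate across studies.

    A clearly positive value is consistent with a threshold effect.
    """
    if len(sensitivities) != len(specificities) or len(sensitivities) < 3:
        raise ValueError("need at least three studies with paired estimates")
    fpr = [1.0 - sp for sp in specificities]
    return pearson(average_ranks(sensitivities), average_ranks(fpr))
```

Ties are given average ranks so that studies reporting identical estimates do not distort the correlation.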


Figure 4. Pooled sensitivity and specificity with sROC curves corresponding to two different cut-offs of WMA for the prediction of various developmental outcomes/delays (cerebral palsy, IQ, working memory, visual and/or hearing impairment, mental development, language and motor function delay)

a. Developmental delay in case of 'normal vs. any' WMA (n = 13 studies)

b. Developmental delay in case of 'normal/mild vs. moderate/severe' WMA (n = 15 studies)

The line represents the sROC curve and the black dot the pooled sensitivity and specificity. The open squares represent the individual studies, with the horizontal dimension corresponding to the total number of non-diseased and the vertical dimension to the total number of diseased neonates in that study population.

DISCUSSION

This study shows that the presence of moderate/severe WMA on MRI performed around term-equivalent age can predict CP and motor function in very preterm or low-birth-weight neonates with moderate sensitivity and specificity. The ability to predict other long-term outcomes, such as neurocognitive and behavioral impairments, is limited. Other white-matter-related findings, such as BA and DEHSI, showed limited to no prognostic value.

In the last decade, the use of MRI as a screening tool for very preterm and low-birth-weight neonates has been a topic of major interest, and several reviews have been published on its use.3,14–18 Most of these reviews are narrative (describing practical issues such as sedation for MRI and/or different types of MRI techniques) or examine the impact of preterm birth and brain abnormalities on long-term development through the use of MRI. Although none of them systematically reported the test accuracy of MRI for the prediction of developmental outcome, most of these reviews nevertheless recommended the use of term MRI in clinical practice. To the best of our knowledge, our study is the first to systematically review the prognostic accuracy of different MRI findings for various long-term developmental outcomes.

Clinical implications

The data in our meta-analysis suggest that the presence of moderate/severe WMA has a higher positive likelihood ratio, and the absence of any WMA a lower (that is, more informative) negative likelihood ratio, than the other tests currently used for preterm infants (e.g., cranial ultrasound or neurological examination).19 The prognostic accuracy of WMA findings on MRI therefore supports the use of MRI for preterm infants. Whether this alters clinical management, however, is a different question, and answering it was beyond the scope of our meta-analysis.

In our opinion, however, showing the potential prognostic accuracy of a test does not by itself justify its clinical use as a standard test. The usefulness of this tool for clinical decision-making requires the availability of a treatment or of specified follow-up strategies that follow from the MRI results.20 At present, no specific treatment is available that addresses the needs of infants with abnormal white matter on MRI. However, term MRI results may help to focus specific follow-up programs (e.g., by screening for developmental disorders at an earlier age) or improve the selection of neonates for early intervention programs (e.g., physiotherapy or speech therapy). Available MRI results may also help parents of prematurely born infants to better prepare for the future.

On the other hand, once all very preterm or low-birth-weight neonates have been screened with a term MRI, no other test with better accuracy is available. The possible harm due to false-positive and false-negative results must therefore be taken into consideration: the value of being informed in time (value of information) must be weighed against the possibility of unnecessary concern about an adverse outcome.21,22 For example, based on the results of this meta-analysis, we can expect that a finding of moderate to severe WMA in a very preterm born child will increase the probability of developing CP from the known prevalence of 7% in this population to 37% (Table 1). This raises the question of whether this increase in probability will change practice for both the clinician and the patient. More specifically: will the clinician offer a different follow-up program when the risk of developing CP is 37% instead of 7%? And will a negative post-test probability of 2.5% (i.e., 2.5% of children will still develop CP after a normal MRI result) justify withholding follow-up from those with a normal MRI?
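The step from a 7% prevalence to a 37% post-test probability follows Bayes' theorem in odds form: post-test odds = pre-test odds × likelihood ratio. The minimal sketch below, with hypothetical helper names, back-calculates the likelihood ratios implied by the rounded percentages quoted above and checks the round trip. It does not use the Table 1 estimates themselves, so the implied values are approximations.

```python
"""Pre-test to post-test probability via likelihood ratios (Bayes' theorem in odds form)."""


def prob_to_odds(p: float) -> float:
    return p / (1.0 - p)


def odds_to_prob(o: float) -> float:
    return o / (1.0 + o)


def post_test_probability(pre_test: float, likelihood_ratio: float) -> float:
    """Post-test probability = pre-test odds x LR, converted back to a probability."""
    if not 0.0 < pre_test < 1.0:
        raise ValueError("pre-test probability must lie strictly between 0 and 1")
    return odds_to_prob(prob_to_odds(pre_test) * likelihood_ratio)


def implied_likelihood_ratio(pre_test: float, post_test: float) -> float:
    """Back-calculate the LR that moves pre_test to post_test."""
    return prob_to_odds(post_test) / prob_to_odds(pre_test)


if __name__ == "__main__":
    prevalence = 0.07  # known CP prevalence quoted in the text
    # LRs implied by the quoted figures: 7% -> 37% (abnormal MRI), 7% -> 2.5% (normal MRI).
    lr_pos = implied_likelihood_ratio(prevalence, 0.37)
    lr_neg = implied_likelihood_ratio(prevalence, 0.025)
    print(f"implied LR+ ~ {lr_pos:.1f}, implied LR- ~ {lr_neg:.2f}")
    # Round trip: applying the implied LR+ recovers the 37% post-test probability.
    print(f"post-test probability after abnormal MRI: {post_test_probability(prevalence, lr_pos):.2f}")
```

With the quoted figures, the implied LR+ is roughly 7.8 and the implied LR- roughly 0.34. Because they are derived from rounded percentages, they approximate rather than reproduce the estimates in Table 1.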

Our meta-analysis also shows that adverse outcomes such as neurocognitive and behavioral impairments could not be reliably predicted by term MRI abnormalities. Compromised white matter may result in more 'subtle' impairments in these areas of the child's long-term functioning. The limited prognostic value of WMA for these specific outcomes also suggests that, regardless of MRI abnormalities, whether or not a child develops neurocognitive and behavioral impairments also depends on other factors, such as a stimulating home and/or school environment, the educational level of the parents and the use of therapy.23,24

A further consideration in deciding on the use of MRI for the prediction of developmental outcomes is the substantial health care cost associated with its use. In many neonatal units, MRI technology is unavailable or its use is severely restricted. Expert neuroradiologists are also needed for proper interpretation of the MRI results. Given its moderate prognostic capacity, it therefore remains debatable whether performing a standard term MRI is cost-effective.

Limitations

This meta-analysis has some limitations that need to be considered. Although a considerable number of studies on the subject were identified, only a limited number of data points were available for each specific combination of MRI finding and neonatal outcome. Although the results of as few as two studies can be pooled, the limited number of data points and the often small sample size per study imply limited power (and hence wide confidence intervals).25
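To illustrate why small per-study samples give wide intervals, the sketch below computes a 95% Wilson score interval for a single proportion (for instance, the sensitivity observed in one study) at two hypothetical sample sizes with the same observed proportion. It is a generic single-proportion interval chosen for illustration, not the pooling model used in this meta-analysis; the function name and counts are hypothetical.

```python
"""95% Wilson score interval for a proportion, showing how interval width
depends on sample size. The counts below are hypothetical."""
from math import sqrt
from statistics import NormalDist

Z_95 = NormalDist().inv_cdf(0.975)  # two-sided 95% standard normal quantile


def wilson_interval(successes: int, n: int, z: float = Z_95) -> tuple[float, float]:
    """Wilson score interval for successes/n."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("require n > 0 and 0 <= successes <= n")
    p = successes / n
    denom = 1.0 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


if __name__ == "__main__":
    # Same observed proportion (75%) at two hypothetical numbers of diseased infants.
    for diseased, detected in [(12, 9), (120, 90)]:
        lo, hi = wilson_interval(detected, diseased)
        print(f"n={diseased:>3}: 75% (95% CI {lo:.0%} to {hi:.0%})")
```

With 9 of 12 the interval spans roughly 47% to 91%, whereas with 90 of 120 it narrows to roughly 67% to 82%.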

The presence of heterogeneity may also raise the question of whether pooling of results is justified in our study. In prognostic meta-analysis, two main sources of heterogeneity are recognized: clinical heterogeneity, due to differences in the characteristics of the cohorts, and heterogeneity due to a threshold effect. We expect the impact of clinical heterogeneity to be smaller, as all cohorts included consecutive and comparable populations (although inadequate and inconsistent reporting of possible confounders in the studies, e.g., use of medication, birth weight and neonatal complications during admission, made it impossible to correct for potential confounders in our meta-analysis). Heterogeneity due to a threshold effect is common in systematic reviews of diagnostic tests and probably explains most of the heterogeneity in our meta-analysis.9 In MRI tests, the threshold effect arises from the relative subjectivity of interpreting MRI results; for example, a lesion on the MRI might be considered abnormal by one radiologist but not by another. The use of different scoring systems and differences in the background of the evaluators (neonatologists or radiologists) also contribute to this type of heterogeneity. In this review, heterogeneity due to different scoring systems probably applies to the studies describing 'brain abnormalities'. These studies included not only WMA but also a composite of other MRI findings (e.g., IVH and/or increased ventricle size). Because this heterogeneous definition of 'brain abnormality' reflects common practice, we nevertheless included these diverging MRI findings.

Furthermore, the quality of the included studies varied. The majority of the studies were generally of good quality, although the failure to report whether follow-up assessors were blinded to the MRI results in almost 50% of the papers is a point of concern. However, given the limited number of included studies, subgroup analyses excluding low-quality studies are unlikely to resolve this question, as they would merely lead to broader confidence intervals.26 As with all reviews, this systematic review is susceptible to publication bias. In particular, cohort studies that did not show any predictive value of MRI have a lower chance of being published. Publication bias may therefore have led to an overestimation of the predictive value of MRI in our meta-analysis.

Recommendations for clinical care and further research

There is solid evidence that very preterm birth and low birth weight have negative consequences for motor, neurocognitive and behavioral functioning.1,27,28 Preterm birth is also associated with variable degrees of brain injury and reduced brain volumes.18,29 A multitude of possible confounding factors play a role in the developmental outcomes of these fragile infants. Although MRI results can add valuable information to the prediction of long-term development, this information is in our opinion too marginal to be used on its own. A next step to consider is an Individual Patient Data (IPD) analysis, gathering the results at the individual level. First, this would improve correction for confounders across the different cohort studies. Second, this extensive data-analysis technique may be used to develop a prognostic model in which the presence of WMA on MRI is combined with other factors known to influence long-term development, such as gender, neonatal history, clinical symptoms such as infection,30 poor nutrition,31 use of steroids,32 low birth weight, socio-demographic factors, other imaging techniques such as ultrasonography,33 or other promising MRI techniques that might show moderate prognostic accuracy in the near future (e.g., MR spectroscopy, diffusion tensor imaging (DTI), and neurite orientation dispersion and density imaging (NODDI)).34 A model that statistically combines various relevant prognostic factors is likely to predict outcomes more accurately and may therefore be a more valuable tool for clinical use than MRI on its own.

CONCLUSIONS

This meta-analysis shows that the presence of moderate/severe WMA on MRI around term-equivalent age can predict CP and motor function in very preterm or low-birth-weight neonates with moderate sensitivity and specificity. The ability to predict other long-term outcomes, such as neurocognitive and behavioral impairments, is limited. Before this test is considered as a standard test in clinical practice, we encourage the continued use of routine MRI in a research setting to generate further evidence on its prognostic capacity together with other prognostic factors.

REFERENCES

1. Aarnoudse-Moens CS, Weisglas-Kuperus N, van Goudoever JB, Oosterlaan J. Meta-analysis of neurobehavioral outcomes in very preterm and/or very low birth weight children. Pediatrics 2009; 124: 717-28.

2. Keunen K, Kersbergen KJ, Groenendaal F, Isgum I, de Vries LS, Benders MJ. Brain tissue volumes in preterm infants: prematurity, perinatal risk factors and neurodevelopmental outcome: a systematic review. J Matern Fetal Neonatal Med 2012; 25: 89-100.

3. Ment LR, Hirtz D, Huppi PS. Imaging biomarkers of outcome in the developing preterm brain. Lancet Neurol 2009; 8: 1042-55.

4. Ment LR, Bada HS, Barnes P, Grant PE, Hirtz D, Papile LA, et al. Practice parameter: neuroimaging of the neonate: report of the Quality Standards Subcommittee of the American Academy of Neurology and the Practice Committee of the Child Neurology Society. Neurology 2002; 58: 1726-38.

5. Janvier A, Barrington K. Trying to predict the future of ex-preterm infants: who benefits from a brain MRI at term? Acta Paediatr 2012; 101: 1016-7.

6. Pearce R, Baardsnes J. Term MRI for small preterm babies: do parents really want to know and why has nobody asked them? Acta Paediatr 2012; 101: 1013-5.

7. Khan KS. Systematic reviews of diagnostic tests: a guide to methods and application. Best Pract Res Clin Obstet Gynaecol 2005; 19: 37-46.

8. Moher D, Liberati A, Tetzlaff J, Altman DG. Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. BMJ 2009; 339: b2535.

9. Macaskill P, Gatsonis C, Deeks J, Harbord R, Takwoingi Y. Cochrane Handbook for Systematic Reviews of Diagnostic Test Accuracy. Chapter 10: Analysing and Presenting Results. In: Deeks JJ, Bossuyt PM, Gatsonis C (editors). Version 1.0. The Cochrane Collaboration 2010;1–61. Available from: https://srdta.cochrane.org.

10. Whiting PF, Rutjes AWS, Westwood ME, Mallett S, Deeks JJ, Reitsma JB, et al. QUADAS-2: a revised tool for the quality assessment of diagnostic accuracy studies. Ann Intern Med 2011; 155: 529-36.

11. Inder TE, Wells SJ, Mogridge NB, Spencer C, Volpe JJ. Defining the nature of the cerebral abnormalities in the premature infant: a qualitative magnetic resonance imaging study. J Pediatr 2003; 143: 171-9.

12. Woodward LJ, Anderson PJ, Austin NC, Howard K, Inder TE. Neonatal MRI to predict neurodevelopmental outcomes in preterm infants. N Engl J Med 2006; 355: 685-94.

13. Reitsma JB, Glas AS, Rutjes AW, Scholten RJ, Bossuyt PM, Zwinderman AH. Bivariate analysis of sensitivity and specificity produces informative summary measures in diagnostic reviews. J Clin Epidemiol 2005; 58: 982-90.

14. Russ A, Hand IL. Preterm brain injury: imaging and neurodevelopmental outcome. Am J Perinatol 2004; 21: 167-72.

15. Ramenghi LA, Rutherford M, Fumagalli M, Bassi L, Messner H, Counsell S, et al. Neonatal neuroimaging: going beyond the pictures. Early Hum Dev 2009; 85: S75-S77.

16. Mathur A, Inder T. Magnetic resonance imaging-insights into brain injury and outcomes in premature infants. J Commun Disord 2009; 42: 248-55.

17. El-Dib M, Massaro AN, Bulas D, Aly H. Neuroimaging and neurodevelopmental outcome of premature infants. Am J Perinatol 2010; 27: 803-18.

18. de Kieviet JF, Zoetebier L, van Elburg RM, Vermeulen RJ, Oosterlaan J. Brain development of very preterm and very low-birthweight children in childhood and adolescence: a meta-analysis. Dev Med Child Neurol 2012; 54: 313-23.

19. Bosanquet M, Copeland L, Ware R, Boyd R. A systematic review of tests to predict cerebral palsy in young children. Dev Med Child Neurol 2013; 55: 418-26.

20. Altman DG. Systematic reviews of evaluations of prognostic variables. BMJ 2001; 323: 224-8.

21. Asch DA, Patton JP, Hershey JC. Knowing for the sake of knowing: the value of prognostic information. Med Decis Making 1990; 10: 47-57.


22. Vis JY, van Zwieten MC, Bossuyt PM, et al. The influence of medical testing on patients’ health: an overview from the gynecologists’ perspective. BMC Med Inform Decis Mak 2013; 13: 117.

23. Teune MJ, van Wassenaer AG, van Dommelen P, Mol BW, Opmeer BC. Perinatal risk indicators for long-term neurological morbidity among preterm neonates. Am J Obstet Gynecol 2011; 204: 396.

24. Weisglas-Kuperus N, Baerts W, Smrkovsky M, Sauer PJ. Effects of biological and social factors on the cognitive development of very low birth weight children. Pediatrics 1993; 92: 658-65.

25. Rosenthal R, DiMatteo MR. Meta-analysis: recent developments in quantitative methods for literature reviews. Annu Rev Psychol 2001; 52: 59-82.

26. Leeflang M, Reitsma J, Scholten R, Rutjes A, Di Nisio M, Deeks J, et al. Impact of adjustment for quality on results of metaanalyses of diagnostic accuracy. Clin Chem 2007; 53: 164-72.

27. Bhutta AT, Cleves MA, Casey PH, Cradock MM, Anand KJ. Cognitive and behavioral outcomes of school-aged children who were born preterm: a meta-analysis. JAMA 2002; 288: 728-37.

28. de Kieviet JF, Piek JP, Aarnoudse-Moens CS, Oosterlaan J. Motor development in very preterm and very low-birth-weight children from birth to adolescence: a meta-analysis. JAMA 2009; 302: 2235-42.

29. Bhutta AT, Anand KJ. Abnormal cognition and behavior in preterm neonates linked to smaller brain volumes. Trends Neurosci 2001; 24: 129-30.

30. Mitha A, Foix-L’Helias L, Arnaud C, Marret S, Vieux R, Aujard Y, et al. Neonatal infection and 5-year neurodevelopmental outcome of very preterm infants. Pediatrics 2013; 132: e372-e380.

31. Lucas A, Morley R, Cole TJ. Randomised trial of early diet in preterm babies and later intelligence quotient. BMJ 1998; 317: 1481-7.

32. Needelman H, Evans M, Roberts H, Sweney M, Bodensteiner JB. Effects of postnatal dexamethasone exposure on the developmental outcome of premature infants. J Child Neurol 2008; 23: 421-4.

33. Counsell SJ, Rutherford MA, Cowan FM, Edwards AD. Magnetic resonance imaging of preterm brain injury. Arch Dis Child Fetal Neonatal Ed 2003; 88: F269-F274.

34. Kwon SH, Vasung L, Ment LR, Huppi PS. The role of neuroimaging in predicting neurodevelopmental outcomes of preterm neonates. Clin Perinatol 2013; 41: 257-83.

35. Jeon TY, Kim JH, Yoo SY, Eo H, Kwon JY, Lee J, et al. Neurodevelopmental outcomes in preterm infants: comparison of infants with and without diffuse excessive high signal intensity on MR images at near-term-equivalent age. Radiology 2012; 263: 518-26.

36. Setänen S, Haataja L, Parkkola R, Lind A, Lehtonen L. Predictive value of neonatal brain MRI on the neurodevelopmental outcome of preterm infants by 5 years of age. Acta Paediatr 2013; 102: 492-7.

37. Woodward LJ, Clark CAC, Bora S, Inder TE. Neonatal white matter abnormalities an important predictor of neurocognitive outcome for very preterm children. PLoS One 2012; 7.

38. Howard K, Roberts G, Lim J, Lee KJ, Barre N, Treyvaud K, et al. Biological and environmental factors as predictors of language skills in very preterm children at 5 years of age. J Dev Behav Pediatr 2011; 32: 239-49.

39. Treyvaud K, Inder TE, Lee KJ, Northam EA, Doyle LW, Anderson PJ. Can the home environment promote resilience for children born very preterm in the context of social and medical risk? J Exp Child Psychol 2012; 112: 326-37.

40. Spittle AJ, Cheong J, Doyle LW, Roberts G, Lee KJ, Lim J, et al. Neonatal white matter abnormality predicts childhood motor impairment in very preterm children. Dev Med Child Neurol 2011; 53: 1000-6.

41. de Bruïne FT, van den Berg-Huysmans AA, Leijser LM, Rijken M, Steggerda SJ, van der Grond J, et al. Clinical implications of MR imaging findings in the white matter in very preterm infants: a 2-year follow-up study. Radiology 2011; 261: 899-906.

42. Skiöld B, Vollmer B, Böhm B, Hallberg B, Horsch S, Mosskin M, et al. Neonatal magnetic resonance imaging and outcome at age 30 months in extremely preterm infants. J Pediatr 2012; 160.

43. Beauchamp MH, Thompson DK, Howard K, Doyle LW, Egan GF, Inder TE, et al. Preterm infant hippocampal volumes correlate with later working memory deficits. Brain 2008; 131: 2986-94.

44. Clark CAC, Woodward LJ. Neonatal cerebral abnormalities and later verbal and visuospatial working memory abilities of children born very preterm. Dev Neuropsychol 2010; 35: 622-42.

45. Iwata S, Nakamura T, Hizume E, Kihara H, Takashima S, Matsuishi T, Iwata O. Qualitative brain MRI at term and cognitive outcomes at 9 years after very preterm birth. Pediatrics 2012; 129: e1138-47.

46. Munck P, Haataja L, Maunu J, Parkkola R, Rikalainen H, Lapinleimu H, et al. Cognitive outcome at 2 years of age in Finnish infants with very low birth weight born between 2001 and 2006. Acta Paediatr 2010; 99: 359-66.

47. Treyvaud K, Ure A, Doyle LW, Lee KJ, Rogers CE, Kidokoro H, et al. Psychiatric outcomes at age seven for very preterm children: rates and predictors. J Child Psychol Psychiatry 2013; 54: 772-9.

48. Mirmiran M, Barnes PD, Keller K, Constantinou JC, Fleisher BE, Hintz SR, Ariagno RL. Neonatal brain magnetic resonance imaging before discharge is better than serial cranial ultrasound in predicting cerebral palsy in very low birth weight preterm infants. Pediatrics 2004; 114: 992-8.

49. Valkama AM, Pääkkö EL, Vainionpää LK, Lanning FP, Ilkko EA, Koivisto ME. Magnetic resonance imaging at term and neuromotor outcome in preterm infants. Acta Paediatr 2000; 89: 348-55.

50. Augustine EM, Spielman DM, Barnes PD, Sutcliffe TL, Dermon JD, Mirmiran M, et al. Can magnetic resonance spectroscopy predict neurodevelopmental outcome in very low birth weight preterm infants? J Perinatol 2008; 28: 611-8.

51. Giannì ML, Picciolini O, Vegni C, Gardon L, Fumagalli M, Mosca F. Twelve-month neurofunctional assessment and cognitive performance at 36 months of age in extremely low birth weight infants. Pediatrics 2007; 120: 1012-9.

52. Arthur R. Magnetic resonance imaging in preterm infants. Pediatr Radiol 2006; 36: 593-607.

53. Kidokoro H, Anderson PJ, Doyle LW, Neil JJ, Inder TE. High signal intensity on T2-weighted MR imaging at term-equivalent age in preterm infants does not predict 2-year neurodevelopmental outcomes. AJNR Am J Neuroradiol 2011; 32: 2005-10.


Additional file 1. Full search strategies in the CENTRAL, MEDLINE, Embase and PsycInfo databases

CENTRAL (Cochrane Library)
#1 mri near/3 (neonat* or a term or term)
#2 (magnetic or resonance or imaging or spectroscopy or tensor or diffusion) near/5 (neonat* or newborn* or a term or term)
#3 volumetric mr imaging or fractional anisotropy or fluid attenuated inversion recovery or flair or apparent diffusion coefficient or diffuse excessive high signal intensity or dehsi or (diffusion near/4 imaging)
#4 (#1 or #2 or #3)
#5 MeSH descriptor: [Infant, Premature] explode all trees
#6 MeSH descriptor: [Intensive Care, Neonatal] explode all trees
#7 (birth weight near/4 week*) or (gestational age near/4 32 week*) or lower gestational age
#8 preterm* or prematur* or elbw or vlbw or low birth weight* or small for date or nicu
#9 #5 or #6 or #7 or #8
#10 #4 and #9

MEDLINE
1. (birth weight adj4 week*).tw.
2. ((gestational age adj4 32 week*) or lower gestational age).tw.
3. exp infant, low birth weight/ or infant, premature/ or neonatal intensive care/
4. (preterm* or prematur* or elbw or vlbw or low birth weight* or small for date or nicu).tw.
5. or/1-4
6. (mri adj4 (neonat* or newborn* or a term or a-term or term)).tw.
7. ((magnetic or resonance or imaging) adj5 (neonat* or newborn* or a term or term)).tw.
8. (volumetric mr imaging or fluid attenuated inversion recovery or flair or apparent diffusion coefficient or fractional anisotropy or diffuse excessive high signal intensity or dehsi or (diffusion adj4 imaging)).tw.
9. 6 or 7 or 8
10. (cohort or prospective or retrospective or longitudinal or prognosis or risk or case control or long term or longterm).tw.
11. exp cohort studies/ or exp prognosis/ or exp risk/ or case control studies/
12. 10 or 11
13. exp mental disorders diagnosed in childhood/
14. exp Nervous System Diseases/
15. exp mortality/
16. (corpus callosum or cerebrospinal fluid or white matter or grey matter or ((brain or cerebell*) adj2 (volum* or abnomalit* or atroph*))).tw.
17. (periventricular leu*omalacia or intraventricular h?emorrhage or cerebrospinal fluid or cerebellum).tw.
18. exp Cognition Disorders/ or cognition.tw.
19. (seizure* or epileps* or cerebral pals* or (learning adj3 disorder*) or deafness or blindness or (vision adj3 disorder*) or ((hearing adj3 disorder*) or visuospatial memory)).tw.
20. exp Intelligence Tests/
21. exp intelligence/
22. (intelligen* or stanford-binet or wechsler or bayley scal* or iq).tw.
23. exp Education, Special/
24. (outcome or neurological sequelae).mp.
25. (mental development index or psychomotor development index or social emotional development or movement assessment or executive function or neurodevelopment* or motor impairment or cognitive impairment or language skills or language development or language delay).tw.
26. or/13-25
27. 5 and 9 and 26
28. 5 and 9 and 12
29. 27 or 28
30. animal/ not (human/ and animal/)
31. 29 not 30
32. limit 31 to yr="1980 -Current"

EMBASE (1980- and weekly alerts)
1. (birth weight adj4 week*).tw.
2. ((gestational age adj4 32 week*) or lower gestational age).tw.
3. exp low birth weight/ or exp prematurity/ or neonatal intensive care.mp.
4. (preterm* or prematur* or elbw or vlbw or low birth weight* or small for date or nicu).tw.
5. or/1-4
6. (volumetric mr imaging or fractional anisotropy or fluid attenuated inversion recovery or flair or apparent diffusion coefficient or diffuse excessive high signal intensity or dehsi or (diffusion adj4 imaging)).tw.
7. (mri adj4 (neonat* or newborn* or a term or a-term or term)).tw.
8. ((magnetic or resonance or imaging) adj5 (neonat* or newborn* or a term or term)).tw.
9. 6 or 7 or 8
10. (cohort or prospective or retrospective or longitudinal or prognosis or risk or case control).tw.
11. cohort analyse/ or follow up/ or prospective study/ or retrospective study/ or exp prognosis/ or exp risk/ or case control study/
12. 10 or 11
13. exp mental disease/
14. exp Neurologic disease/
15. exp mortality/
16. (seizure* or epileps* or cerebral pals* or white matter or grey matter or ((brain or cerebell*) adj2 (volum* or abnomalit* or atroph*))).tw.
17. (periventricular leu*omalacia or intraventricular h?emorrhage or cerebrospinal fluid or cerebellum).tw.
18. cognitive defect/ or cognition.tw.
19. ((learning adj3 disorder*) or deafness or blindness or (vision adj3 disorder*) or ((hearing adj3 disorder*) or visuospatial memory)).tw.
20. exp Intelligence Test/
21. exp intelligence/
22. (intelligen* or stanford-binet or wechsler or bayley scal* or iq).tw.
23. exp special education/
24. (outcome or neurological sequelae).mp.
25. (mental development or psychomotor development or social emotional development or movement assessment or executive function or neurodevelopment* or motor impairment or cognitive impairment or language skills or language development or language delay).tw.
26. or/13-25
27. 5 and 9 and 26
28. 5 and 9 and 12
29. 27 or 28
30. (animal/ or nonhuman/) not (human/ or ((animal/ or nonhuman/) and human/))
31. 29 not 30

PsycInfo
1. (birth weight adj4 week*).tw,id.
2. ((gestational age adj4 32 week*) or lower gestational age).tw.
3. exp birth weight/ or premature birth/ or neonatal intensive care.mp,id.
4. (preterm* or prematur* or elbw or vlbw or low birth weight* or small for date or nicu).tw,id.
5. or/1-4
6. exp Magnetic Resonance Imaging/ or magnetic resonance spectroscopy.tw,id.
7. (mri or ((magnetic or cerebell*) adj3 imaging)).tw,id.
8. 6 or 7
9. neonatal period/ or newborn*.tw,id. or neonat*.tw,id.
10. 9 and 8
11. (mri adj4 (neonat* or newborn* or a term or a-term or term)).tw,id.
12. ((magnetic or resonance or imaging or spectroscopy or tensor or diffusion) adj5 (neonat* or newborn* or a term or term)).tw,id.
13. (volumetric mr imaging or fractional anisotropy or fluid attenuated inversion recovery or flair or apparent diffusion coefficient or diffuse excessive high signal intensity or dehsi or (diffusion adj4 imaging)).tw.
14. 11 or 12 or 13
15. 10 or 14
16. 5 and 15
17. (22* or 23* or 28* or 32*).cc.
18. (white matter or grey matter or cerebellum or ((brain or cerrebel*) adj2 (volum* or abnormalit* or atroph*)) or corpus callosum or periventricular leu*omalcia or intraventricular h?emorrhage or cerebrospinal fluid).mp,id.
19. 5 and 18
20. 5 and (8 or 13) and 18
21. 16 or 20
22. 17 and 21
23. limit 22 to yr="1980 -Current"

Additional file 2. Modified version of QUADAS-2 assessment tool

Scoring for methodologic quality: Risk of Bias and Applicability Judgement
Based on QUADAS-2
Rater:
Main concerns:
Article:
Author:
Date of publication:
Reference Manager Number:

Patient Selection
1. Describe methods of patient selection
2. Describe included patients (previous testing, presentation, intended use of MRI and setting)
3. Was a consecutive or random sample of patients enrolled? yes/no/unclear
4. Did the study avoid inappropriate exclusions? yes/no/unclear
5. Could the selection of patients have introduced bias? high/low/unclear
6. Are there concerns that the included patients do not match the review question? high/low/unclear

MRI
7. Describe how the MRI was conducted and interpreted
8. Were the MRI results interpreted without knowledge of the results of the test at follow-up (criterion test)? yes/no/unclear
9. If a threshold was used (on MRI findings), was it prespecified? yes/no/unclear
10. Could the conduct or interpretation of the MRI have introduced bias? high/low/unclear
11. Are there concerns that the MRI, its conduct or its interpretation differ from the review question? high/low/unclear

Test at follow-up (criterion test)
12. Describe the test at follow-up and how it was conducted and interpreted
13. Is the test at follow-up likely to correctly classify the target condition? yes/no/unclear
14. Were the results of the test at follow-up interpreted without knowledge of the results of the MRI? yes/no/unclear
15. Could the test at follow-up, its conduct, or its interpretation have introduced bias? high/low/unclear
16. Are there concerns that the target condition as defined by the follow-up does not match the review question? high/low/unclear

Flow and Timing
17. Describe any patients who did not receive the MRI or test at follow-up (loss to follow-up) or who were excluded from the 2x2 table (refer to flow diagram)
18. Describe the interval and any interventions between MRI and the test at follow-up
19. Did all patients receive a test at follow-up? yes/no/unclear
   a. Percentage of loss to follow-up ____%
20. Did all patients receive the same test at follow-up? yes/no/unclear
21. Were all patients included in the analysis? yes/no/unclear
22. Could the patient flow have introduced bias? high/low/unclear

Additional file 3. Standardized data extraction form

Standardized data extraction formRater:Author:Date of publication:Reference Manager number:

Characteristics of the studyParticipantsTotal number of cases Total: MRI: Follow-up:

Number of preterms <32 weeks GA male female

Number of cases by GA

Important Baseline Characteristics-Mean GA:-Mean birth weight:-Asphyxia-Sepsis/NEC-IUGR-Interventions-Other:

MRI performance
Gestational age at MRI

MRI technique used

Prognostic factor (e.g. white matter lesions/haemorrhage)

Classification used for the prognostic factor

Cut-off values/range used

Which part of the brain is imaged?

Chapter 5

Cerebrum / Cerebellum

Study design
Prospective / Retrospective / Unclear
Single centre / Multicentre

Test at follow-up
Age at follow-up: corrected?

Type of tests used

Cut-off points of tests used

Which areas do these tests cover? Neurologic performance / Behaviour / Cognitive / Somatic

Mortality during follow-up?

Results of the study
Main finding (related to prognostic factor)

2x2 table

Table 1: Used cut-off points MRI: ________
Used cut-off points follow-up: ________

                      | Abnormal outcome (follow-up) | Normal outcome (follow-up)
Test abnormal (MRI)   |                              |
Test normal (MRI)     |                              |

Table 2: Used cut-off points MRI: ________
Used cut-off points follow-up: ________

                      | Abnormal outcome (follow-up) | Normal outcome (follow-up)
Test abnormal (MRI)   |                              |
Test normal (MRI)     |                              |

Extra 2x2 tables

Table 3: Used cut-off points MRI: ________
Used cut-off points follow-up: ________

                      | Abnormal outcome (follow-up) | Normal outcome (follow-up)
Test abnormal (MRI)   |                              |
Test normal (MRI)     |                              |
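Once the cells of a 2x2 table are filled in, the prognostic accuracy of MRI is usually summarised as sensitivity, specificity and predictive values. The Python sketch below assumes that "test abnormal (MRI)" combined with "abnormal outcome (follow-up)" counts as a true positive; the class name `TwoByTwo` and the counts in the example are hypothetical and do not come from any included study.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """One 2x2 table from the extraction form (hypothetical helper)."""

    tp: int  # MRI abnormal, abnormal outcome at follow-up
    fp: int  # MRI abnormal, normal outcome at follow-up
    fn: int  # MRI normal, abnormal outcome at follow-up
    tn: int  # MRI normal, normal outcome at follow-up

    def sensitivity(self) -> float:
        # Share of children with an abnormal outcome whose MRI was abnormal
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Share of children with a normal outcome whose MRI was normal
        return self.tn / (self.tn + self.fp)

    def positive_predictive_value(self) -> float:
        # Probability of an abnormal outcome given an abnormal MRI
        return self.tp / (self.tp + self.fp)

    def negative_predictive_value(self) -> float:
        # Probability of a normal outcome given a normal MRI
        return self.tn / (self.tn + self.fn)


# Illustrative counts only
table = TwoByTwo(tp=30, fp=20, fn=10, tn=140)
print(f"Sensitivity {table.sensitivity():.2f}, specificity {table.specificity():.2f}")
```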


a Inder TE, Wells SJ, Mogridge NB, Spencer C, Volpe JJ. Defining the nature of the cerebral abnormalities in the premature infant: a qualitative magnetic resonance imaging study. J Pediatr 2003 Aug;143(2):171-9.

b Woodward LJ, Anderson PJ, Austin NC, Howard K, Inder TE. Neonatal MRI to predict neurodevelopmental outcomes in preterm infants. N Engl J Med 2006 Aug 17;355(7):685-94.

c Nanba Y, Matsui K, Aida N, Sato Y, Toyoshima K, Kawataki M, et al. Magnetic resonance imaging regional T1 abnormalities at term accurately predict motor outcome in preterm infants. Pediatrics 2007 Jul;120(1):e10-e19.

d Papile LA, Burstein J, Burstein R, Koffler H. Incidence and evolution of subependymal and intraventricular hemorrhage: a study of infants with birth weights less than 1,500 gm. J Pediatr 1978 Apr;92(4):529-34.

e Miller SP, Cozzio CC, Goldstein RB, Ferriero DM, Partridge JC, Vigneron DB, et al. Comparing the diagnosis of white matter injury in premature newborns with serial MR imaging and transfontanel ultrasonography findings. AJNR Am J Neuroradiol 2003 Sep;24(8):1661-9.

CHAPTER 6

ST-ANALYSIS IN ELECTRONIC FOETAL MONITORING IS COST-EFFECTIVE FROM BOTH THE MATERNAL AND NEONATAL PERSPECTIVE.

Janneke van ’t Hooft, Maarten Vink, Brent C. Opmeer,

Sabine Ensing, Anneke Kwee, Ben Willem J. Mol

J Matern Fetal Neonatal Med. 2016:1-6

Chapter 6

ABSTRACT

Objective

Electronic foetal monitoring (EFM) combined with non-invasive ST-analysis (STAN) has been suggested as superior to EFM alone for foetal surveillance aimed at preventing metabolic acidosis. This study compares the cost-effectiveness of the two techniques from both the maternal (short-term) and the neonatal (long-term) perspective to guide clinical decision-making.

Methods

We created two models: a maternal model, with mode of delivery as the most important outcome, and a neonatal Markov model, with metabolic acidosis (and its relationship to cerebral palsy, CP) as the most relevant outcome for estimating long-term cost-effectiveness. The cost to prevent one instrumental delivery was estimated in the maternal model. The cost to prevent one case of metabolic acidosis and the cost per quality-adjusted life year (QALY) were calculated in the neonatal model.

Results

The average costs of STAN are only €34 higher than those of EFM alone. From the maternal perspective, the cost of preventing one instrumental delivery was estimated at €2602. From the neonatal perspective, the cost to prevent one case of metabolic acidosis was €14 509. Over the long term, STAN becomes a dominant (cost-saving) strategy if >1% of the patients exposed to metabolic acidosis develop CP.

Conclusions

Our study suggests that STAN, when compared to EFM alone, can be a cost-effective strategy from both a maternal and a neonatal perspective.

Long-term cost and effects of ST-analysis in foetal monitoring

INTRODUCTION

Perinatal metabolic acidosis is associated with long-term developmental complications such as cerebral palsy (CP).1 CP has substantial consequences for quality of life and also affects healthcare and societal costs. Calculations based on a Danish national CP register estimated an additional lifetime cost of €860 000 for men and €800 000 for women, with societal cost components representing the largest proportion.2 Electronic foetal monitoring (EFM) is the primary method of foetal surveillance during labour worldwide and is used to prevent metabolic acidosis by identifying foetuses at risk. However, EFM alone produces many false-positive results, leading to more instrumental vaginal and operative deliveries without an improvement in foetal outcome.3

A relatively new technique, combining EFM with non-invasive ST-analysis (STAN) of the foetal electrocardiogram (STAN method, Neoventa Medical, Gothenburg, Sweden), has been suggested as an improvement over EFM alone for preventing perinatal metabolic acidosis.3–5 An individual patient data (IPD) meta-analysis by Schuit et al. has indeed shown benefits of STAN, which significantly reduced the rates of foetal blood sampling (FBS) and instrumental vaginal delivery compared with EFM alone.6

Although the findings suggested a possible reduction in metabolic acidosis, this reduction was not statistically significant. On the basis of these inconclusive findings alone, it is difficult to decide whether introducing STAN would be a cost-effective strategy from the maternal and/or neonatal perspective.

Two previously published cost-effectiveness studies of EFM versus STAN were based either on outdated data or on a single randomised trial, and have had little impact on current clinical decision-making.7,8 In contrast, the IPD meta-analysis of Schuit et al. analysed individual data from four randomised trials, providing more reliability and statistical power. Modelling these data, together with new evidence on the association between metabolic acidosis and CP, in a cost-effectiveness assessment of STAN compared to EFM alone would help guide medical decision-making.

We aimed to answer the following question: is the use of STAN during labour, when compared to EFM alone, a cost-effective strategy from both the maternal (short-term) and the neonatal (long-term) perspective?


METHODS

For our cost-effectiveness analysis comparing STAN with EFM alone for foetal monitoring during labour, we used the results of an IPD meta-analysis of data from 12 987 women and their newborn infants.6 In short, this IPD meta-analysis concluded that, compared with EFM alone, the use of STAN reduced the frequency of instrumental vaginal deliveries (RR 0.90; 95% CI, 0.83-0.99) and foetal blood sampling (RR 0.49; 95% CI, 0.44-0.55). Caesarean delivery rates were comparable between the groups. The risk of metabolic acidosis was lower (RR 0.76; 95% CI, 0.53-1.10), but the difference was not statistically significant.

In this cost-effectiveness study, two models were created: one from a maternal perspective and one from a neonatal perspective. The maternal model (Figure 1) focused on the difference in mode of delivery (vaginal, instrumental or caesarean section) as the most important outcome for short-term cost-effectiveness. The neonatal model (Figure 2) focused on the difference in metabolic acidosis, and its relationship to CP, as the most relevant outcome for estimating long-term cost-effectiveness.

Calculating cost-effectiveness requires information on probabilities, costs and effect measures. We followed the Consolidated Health Economic Evaluation Reporting Standards (CHEERS) statement for reporting health economic evaluations.9 The measures used for each model are detailed below.

Probabilities

For the maternal model, we used the differences in the probability (risk) of FBS and of each mode of delivery between the STAN and EFM groups, as reported in the IPD study of Schuit et al. (Table 1).

For the neonatal model, we used the difference in the occurrence of metabolic acidosis reported in the IPD study of Schuit et al. The review of Ellenberg et al. provided estimates of the occurrence of CP related to metabolic acidosis.10 This review included a population of term pregnancies (like the IPD of Schuit et al.) but concluded that the literature is inconsistent on the proportion of CP cases with metabolic acidosis as a precursor: in the reviewed articles, metabolic acidosis was found to be the cause of CP in 3% to 50% of the patients. Therefore, instead of choosing a single point estimate for the probability of CP in relation to metabolic acidosis, we used the whole range of 3% to 50% in the model. The reported prevalence of CP in the general population, 2.11 per 1000 live births (95% CI 1.98-2.25), was used to estimate the absolute number of patients with CP in the model.11 These data allowed us to estimate the distribution of CP after exposure or non-exposure to metabolic acidosis (Table 1).

Figure 1. Decision model short term (maternal) analysis

Figure 2. Decision model long term (neonatal) analysis
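The step from the 3% to 50% range to the conditional probabilities in Table 1 can be made explicit with Bayes' rule. The sketch below assumes (the text does not spell this out) that a fraction f of all CP cases is caused by metabolic acidosis, that CP prevalence is 2.11 per 1000 live births and that metabolic acidosis occurs in 1.13% of births (the EFM rate in Table 1). Under these assumptions, f = 50% gives P(CP | metabolic acidosis) of about 0.0934, the upper bound in Table 1, while f = 50% and f = 3% give P(CP | no metabolic acidosis) of about 0.00107 and 0.00207, matching that row. The function name `cp_probabilities` is hypothetical.

```python
def cp_probabilities(attributable_fraction: float,
                     cp_prevalence: float = 2.11 / 1000,
                     acidosis_rate: float = 0.0113) -> tuple[float, float]:
    """Return (P(CP | acidosis), P(CP | no acidosis)).

    attributable_fraction: assumed share of all CP cases caused by metabolic acidosis.
    """
    # Joint probabilities: CP with and without preceding metabolic acidosis
    cp_and_acidosis = attributable_fraction * cp_prevalence
    cp_and_no_acidosis = (1 - attributable_fraction) * cp_prevalence
    # Bayes' rule: divide each joint probability by the probability of the condition
    p_cp_given_acidosis = cp_and_acidosis / acidosis_rate
    p_cp_given_no_acidosis = cp_and_no_acidosis / (1 - acidosis_rate)
    return p_cp_given_acidosis, p_cp_given_no_acidosis


for f in (0.03, 0.50):
    given_acidosis, given_no_acidosis = cp_probabilities(f)
    print(f"f={f:.2f}: P(CP|acidosis)={given_acidosis:.5f}, "
          f"P(CP|no acidosis)={given_no_acidosis:.5f}")
```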

Table 1. Probabilities of maternal and neonatal outcomes for electronic foetal monitoring (EFM) and ST-analysis (STAN)6

Outcome | EFM, probability (95% CI) | EFM+STAN, probability (95% CI)
Vaginal delivery | 0.7407 (0.7298-0.7513) | 0.7561 (0.7455-0.7665)
Instrumental vaginal delivery | 0.1406 (0.1323-0.1494) | 0.1261 (0.1182-0.1345)
Caesarean section | 0.1185 (0.1107-0.1266) | 0.1177 (0.1100-0.1258)
Foetal blood sample | 0.1456 (0.1371-0.1544) | 0.0705 (0.0644-0.0770)
Acidosis | 0.0113 (0.0089-0.0142) | 0.0087 (0.0066-0.0113)
CP, metabolic acidosis | 0.00409-0.09340* | 0.00409-0.09340*
CP, no metabolic acidosis | 0.00107-0.00207* | 0.00107-0.00207*

*Calculated by combining the proportion of cerebral palsy (CP) cases with metabolic acidosis as the cause (range 3% to 50%)10 with the prevalence of CP in the general population (2.11 per 1000 live births).11

Outcome/effect measures

For the maternal model, we used mode of delivery (vaginal, instrumental or caesarean section) as the most important short-term maternal outcome. We assumed that vaginal delivery is the preferable maternal outcome compared with instrumental delivery and/or caesarean section. This effect measure was used to calculate the cost to prevent one instrumental delivery and/or caesarean section with STAN or EFM.

For the neonatal model, we used two effect measures. The first, the number of neonates born with metabolic acidosis under STAN or EFM, was used to calculate the cost to prevent one case of metabolic acidosis for both strategies. The second, a long-term effect measure, was quality-adjusted life years (QALYs), calculated by multiplying the reported utilities of CP (explained below) by the life expectancy of patients with CP.

Utilities: Utilities are a universally used measure representing the strength of individual preferences for specific health-related outcomes, elicited with standardized methods.12 A literature search in PubMed yielded two articles reporting utility scores for CP patients: Rosenbaum et al. reported the mean health-related quality of life (HRQoL) of an adolescent population with CP based on the Health Utilities Index Mark 3 (HUI3),13 and Young et al. reported quality-of-life scores of a combined population of young and adult patients with mild and moderate CP.14 In our base case analysis, we used the average of the utilities reported in these two articles (0.42 and 0.36, respectively).

Life expectancy: Strauss et al. reported crude death rates of CP patients stratified by age and severity of disease.15 We calculated the average crude death rate per age category. The crude death rates of the general population for the year 2012 were derived from Statistics Netherlands (Centraal Bureau voor de Statistiek, CBS).16

QALY: In the general population, quality of life decreases with age.17 However, no age-dependent QALYs of CP patients have been reported in the literature. We therefore performed our own calculations to establish age-dependent QALYs of CP patients (table and example calculation provided in Appendix A).

Costs related to EFM or STAN monitoring, delivery and CP are listed in Table 2. All reported costs were converted to 2012 euros using the consumer price index of the Netherlands. For the short-term costs in the maternal model, we used the upstream costs of FBS and of EFM and STAN. For the long-term costs in the neonatal model, we used the downstream costs of CP. To the best of our knowledge, the lifetime costs of CP have been estimated by studies from the United States, which report lifetime costs of CP from a societal perspective, and by one recent study from Denmark, which considered both health and social care costs. We therefore used the Danish study; its authors kindly supplied us with their raw data. This study estimated an average lifetime cost (until the age of 70) of €830 000 for a person born in 2000.2 Using the raw data, we calculated the costs at a discount rate of 3.5%, in accordance with the National Institute for Health and Care Excellence guidelines,18 and simulated a cohort of patients with CP using the crude death rates of patients with CP. Because the lifetime costs of patients with CP were calculated relative to the lifetime costs of a person in the general population, people without CP were assigned a lifetime cost of zero.
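Discounting at 3.5% per year weights a cost incurred in year t by 1/1.035^t. Combined with survival derived from crude death rates, the expected discounted lifetime cost is the sum over years of annual cost × probability of being alive × discount factor. A minimal sketch of that calculation; the flat annual excess cost and the constant 2% annual mortality in the example are hypothetical placeholders, not the Danish raw data or the Strauss et al. death rates, and the function name is illustrative.

```python
def discounted_lifetime_cost(annual_costs, annual_death_probs, rate=0.035):
    """Expected present value of an annual cost stream for one patient.

    annual_costs[t]: excess cost in year t if the patient is alive.
    annual_death_probs[t]: probability of dying during year t if alive at its start.
    Costs are counted at the start of each year (no half-cycle correction).
    """
    alive = 1.0  # probability of being alive at the start of year t
    total = 0.0
    for t, (cost, death_prob) in enumerate(zip(annual_costs, annual_death_probs)):
        total += alive * cost / (1 + rate) ** t  # discount back to year 0
        alive *= 1 - death_prob                  # survive to the next year
    return total


# Hypothetical inputs: flat excess cost and constant mortality up to age 70
costs = [10_000.0] * 70
death_probs = [0.02] * 70
print(round(discounted_lifetime_cost(costs, death_probs)))
```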


Analysis

In the maternal model, we performed a cost-effectiveness analysis to estimate the cost of preventing one instrumental delivery. In the neonatal model, a Markov model was used to account for the life expectancy of patients with CP and of the general population. Both cost-effectiveness analyses were performed using TreeAge Pro 2009 (Williamstown, MA).
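A Markov cohort model follows a cohort through yearly cycles, moving a fraction from "alive" to "dead" each cycle and summing life years and QALYs. TreeAge Pro handles this internally; the plain-Python sketch below only illustrates the mechanics, assuming a two-state structure, one-year cycles and no half-cycle correction. The death probabilities and the general-population utility in the example are hypothetical; 0.39 is the base-case average of the two reported CP utilities.

```python
def markov_cohort(death_probs, utilities, discount_rate=0.0):
    """Two-state (alive/dead) Markov cohort trace with one-year cycles.

    death_probs[t]: probability of moving from alive to dead during cycle t.
    utilities[t]: utility weight of a year lived in cycle t.
    Returns (life_years, qalys) per member of the starting cohort.
    """
    alive = 1.0  # fraction of the starting cohort alive at the start of the cycle
    life_years = 0.0
    qalys = 0.0
    for cycle, (death_prob, utility) in enumerate(zip(death_probs, utilities)):
        discount = 1 / (1 + discount_rate) ** cycle
        # Everyone alive at the start of a cycle is credited a full year
        life_years += alive * discount
        qalys += alive * utility * discount
        alive -= alive * death_prob  # transition alive -> dead
    return life_years, qalys


# Hypothetical inputs over 80 yearly cycles
cp = markov_cohort([0.03] * 80, [0.39] * 80)
general = markov_cohort([0.01] * 80, [0.85] * 80)
print(f"CP: {cp[0]:.1f} life years, {cp[1]:.1f} QALYs")
print(f"General population: {general[0]:.1f} life years, {general[1]:.1f} QALYs")
```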

Table 2. Costs: unit of resource use, unit cost and valuation method (2012 euros)

Item | Unit | Unit cost | Valuation method
Monitoring costs8
STAN + CTG | Unit | €50 | Bottom-up
CTG | Unit | €11 | Bottom-up
Foetal blood sampling | Procedure | €17 | Bottom-up
Delivery costs8
Caesarean section | Procedure | €2064 | Top-down calculation
Instrumental delivery | Procedure | €1382 | Top-down calculation
Spontaneous vaginal delivery | Procedure | €1170 | Top-down calculation
Costs of cerebral palsy2
Cerebral palsy* | Lifetime cost | €1 062 732 | Bottom-up

*These lifetime costs were calculated at a discount rate of 5%, using the consumer price index of 2000. Because the average lifetime costs of CP patients were compared with those of the general population, we used €0 as the lifetime cost for the general population.
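The short-term (maternal) decision tree amounts to weighting each unit cost in Table 2 by the corresponding probability in Table 1. A minimal sketch, assuming that EFM is priced as CTG alone, that FBS is costed once per sampled woman and that no other resources differ between strategies (assumptions, not details given in the text). Under these simplifications the incremental cost of STAN comes to about €34 per woman; the published TreeAge model may include components not shown in the tables, so the sketch is not guaranteed to reproduce every reported figure.

```python
# Unit costs from Table 2 (2012 euros)
UNIT_COSTS = {"vaginal": 1170.0, "instrumental": 1382.0, "caesarean": 2064.0, "fbs": 17.0}

# Monitoring cost per woman (EFM priced as CTG, an assumption) and probabilities from Table 1
STRATEGIES = {
    "EFM": {"monitoring": 11.0, "fbs": 0.1456,
            "vaginal": 0.7407, "instrumental": 0.1406, "caesarean": 0.1185},
    "EFM+STAN": {"monitoring": 50.0, "fbs": 0.0705,
                 "vaginal": 0.7561, "instrumental": 0.1261, "caesarean": 0.1177},
}


def expected_cost(strategy: dict) -> float:
    """Expected cost per woman: monitoring + expected FBS cost + expected delivery cost."""
    cost = strategy["monitoring"] + strategy["fbs"] * UNIT_COSTS["fbs"]
    for mode in ("vaginal", "instrumental", "caesarean"):
        cost += strategy[mode] * UNIT_COSTS[mode]
    return cost


efm = expected_cost(STRATEGIES["EFM"])
stan = expected_cost(STRATEGIES["EFM+STAN"])
print(f"EFM €{efm:.2f}, EFM+STAN €{stan:.2f}, difference €{stan - efm:.2f}")
```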

Sensitivity analysis

The robustness of our findings was evaluated with multiple univariate sensitivity analyses. In four models, we examined the influence of differences in probabilities and costs. Models 1 and 2 assessed the impact of different probabilities of instrumental vaginal delivery and metabolic acidosis, using the 95% confidence intervals reported in the IPD meta-analysis of Schuit et al.6 Models 3 and 4 assessed the impact of a decreased (-25%) or increased (+25%) cost difference between STAN and EFM.

RESULTS

Our cost-effectiveness analysis comparing STAN with EFM in the neonatal model showed that metabolic acidosis is reduced from 1100 to 900 per 100 000 newborns when a STAN-based strategy is followed. The average costs of STAN are €38 higher than those of EFM alone, resulting in a cost to prevent one case of metabolic acidosis of €14 509. In the maternal model, we found a 1.5% absolute reduction in instrumental deliveries in favour of STAN. The cost to prevent one instrumental delivery was estimated at €2602.
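Both ratios are incremental cost-effectiveness ratios (ICERs): the extra cost of STAN per woman divided by the absolute risk reduction it achieves. The sketch below takes the absolute risk reductions from Table 1 (1.45 percentage points for instrumental delivery, 0.26 for metabolic acidosis); multiplying each reported ratio back by its risk reduction gives an implied incremental cost of roughly €37.7 per woman in both models, consistent with the rounded €38 above. The helper name `icer` is illustrative.

```python
def icer(delta_cost: float, delta_effect: float) -> float:
    """Incremental cost-effectiveness ratio: extra cost per extra unit of effect."""
    if delta_effect == 0:
        raise ValueError("ICER is undefined when both strategies have the same effect")
    return delta_cost / delta_effect


# Absolute risk reductions with STAN (Table 1)
ARR_INSTRUMENTAL = 0.1406 - 0.1261  # about 0.0145
ARR_ACIDOSIS = 0.0113 - 0.0087      # about 0.0026

# Incremental cost implied by each reported ratio (ICER x risk reduction)
implied_maternal = 2602 * ARR_INSTRUMENTAL
implied_neonatal = 14509 * ARR_ACIDOSIS
print(f"Implied extra cost: €{implied_maternal:.2f} (maternal), €{implied_neonatal:.2f} (neonatal)")
print(f"ICER at €38: €{icer(38, ARR_INSTRUMENTAL):.0f} per instrumental delivery prevented")
```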

Long-term cost-effectiveness

The results of the long-term cost-effectiveness model of STAN compared with EFM alone depend on the probability of developing CP after metabolic acidosis. At the lower boundary (probability of 0.5%), we found a cost per QALY gained of €95 308; in that case, STAN is cost-effective at a willingness to pay (WTP) above €95 308. At the upper boundary (probability of 9.3%), we found a net benefit of €6025 per QALY, making STAN a dominant strategy (i.e., compared with EFM alone, STAN yields more health and saves costs at the same time). We found a break-even point at a 1.3% probability of CP after metabolic acidosis, meaning that STAN becomes the dominant foetal monitoring strategy if ≥1.3% of the patients exposed to metabolic acidosis develop CP. A WTP of €20 000 would correspond to a probability of CP after metabolic acidosis of 1.0%.
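The two situations described above (an ICER that must be compared with the WTP, and dominance) follow the standard decision rule for comparing two strategies. A sketch, where `delta_cost` and `delta_qaly` are per-patient differences (STAN minus EFM) that would come from the Markov model; the function name and the inputs in the example are hypothetical.

```python
def classify(delta_cost: float, delta_qaly: float, wtp: float) -> str:
    """Classify STAN versus EFM for a given willingness to pay (WTP) per QALY."""
    if delta_qaly >= 0 and delta_cost <= 0:
        return "dominant"   # at least as effective and no more costly
    if delta_qaly <= 0 and delta_cost >= 0:
        return "dominated"  # no more effective and at least as costly
    ratio = delta_cost / delta_qaly  # both deltas are non-zero with the same sign here
    if delta_qaly > 0:
        # More effective and more costly: accept if the ICER does not exceed the WTP
        return "cost-effective" if ratio <= wtp else "not cost-effective"
    # Less effective but cheaper: accept if the savings per QALY lost reach the WTP
    return "cost-effective" if ratio >= wtp else "not cost-effective"


# Hypothetical per-patient differences giving an ICER of about €95 308 per QALY
print(classify(delta_cost=95.308, delta_qaly=0.001, wtp=20_000))
```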

Sensitivity analyses

The univariate sensitivity analyses show that the cost to prevent one instrumental delivery varies between €1951 and €3252 under different assumptions. The cost to prevent one case of metabolic acidosis varies between €10 882 and €18 136 (Table 3).

Table 3. Sensitivity analysis (2012 euros)

Model | Description | Cost to prevent one instrumental delivery | Cost to prevent one case of metabolic acidosis
0 | Base case | €2 602 | €14 509
1 | Probability of instrumental delivery (lowest value of 95% CI) | €2 675 | €16 401
2 | Probability of instrumental delivery (highest value of 95% CI) | €2 532 | €13 008
3 | Cost difference EFM and STAN (decrease of 25%) | €1 951 | €10 882
4 | Cost difference EFM and STAN (increase of 25%) | €3 252 | €18 136
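Models 3 and 4 change only the numerator of each ratio, so the ICERs scale linearly with the cost difference. The sketch below, assuming that the effects are held fixed, reproduces the Model 3 and 4 rows of Table 3 from the base case to within €1 of rounding; the names are illustrative.

```python
BASE_CASE = {"instrumental delivery": 2602.0, "metabolic acidosis": 14509.0}


def scaled_icer(base_icer: float, cost_factor: float) -> float:
    """ICER after multiplying the incremental cost by cost_factor (effects unchanged)."""
    return base_icer * cost_factor


for label, factor in (("Model 3 (-25%)", 0.75), ("Model 4 (+25%)", 1.25)):
    row = {outcome: f"€{scaled_icer(value, factor):.2f}" for outcome, value in BASE_CASE.items()}
    print(label, row)
```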


DISCUSSION

This study suggests that STAN, when compared to EFM alone during labour, can be a cost-effective strategy from both a maternal (short-term) and a neonatal (long-term) perspective. In the short term, we found a cost to prevent one instrumental delivery of €2602. This means that STAN requires a small investment to realize the health gain (prevention of instrumental delivery); it is up to society to decide whether it is willing to pay this cost (WTP). In the long term, STAN is likely to be a cost-saving strategy, but this depends on the association between metabolic acidosis and CP. We found a cost to prevent one case of metabolic acidosis of €14 509. STAN becomes a cost-effective strategy if ≥1.0% of the patients exposed to metabolic acidosis develop CP. In sensitivity analyses, our findings were robust to different assumptions about probabilities and costs.

A limitation of our study is that the neonatal model uses a non-significant difference in metabolic acidosis between the STAN and EFM groups, as found in the IPD analysis of Schuit et al. We think, however, that this non-significance may reflect a lack of power in the IPD study, given the low incidence of the outcome. A power calculation showed that the difference in metabolic acidosis would become significant at α = 0.05 if 24 561 patients per group were included. As the IPD analysis included fewer patients, the lack of significance is possibly due to a lack of power. This lack of power can, however, affect the generalizability of the results, because of the greater uncertainty about the 'real' effect of STAN and hence the possible over- or underestimation of its cost-effectiveness.
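A power calculation of this kind can be outlined with the usual normal-approximation formula for comparing two proportions. The section does not say which power or formula was used, so this sketch assumes 80% power, a two-sided α of 0.05, no continuity correction and the Table 1 acidosis rates (1.13% versus 0.87%). Under these assumptions it gives roughly 23 000 women per group, the same order as the 24 561 quoted; other assumptions (for example a continuity correction or higher power) change the exact figure.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_per_group(p1: float, p2: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Sample size per group for two independent proportions
    (two-sided test, normal approximation, no continuity correction)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p1 - p2) ** 2)


print(n_per_group(0.0113, 0.0087))
```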

Second, the lack of consistent literature on the strength of the association between metabolic acidosis and the risk of developing CP makes the long-term cost-effectiveness analysis explorative (using ranges) rather than conclusive.

This analysis supports the use of STAN as a cost-effective device, at least from the maternal short-term perspective, by reducing the number of instrumental deliveries. This short-term economic benefit may also have long-term consequences. Fewer instrumental deliveries should mean fewer third- and fourth-degree perineal tears. This implies lower costs for perineal repair in the operating room, and also lower risks of a repeat perineal tear or of delivery by caesarean section in a subsequent pregnancy. The economic consequences of this potential reduction were not taken into account in our analysis because the outcome 'perineal tear' was not reported in the IPD meta-analysis of Schuit et al.6

The results of our cost-effectiveness analysis are consistent with the conclusions of previous cost-effectiveness studies.7,8 However, both of these analyses were restricted to the neonatal perspective. To date, several STAN trials have been performed, but none of them has had a major impact on the clinical implementation of STAN. This may be because STAN has not been shown to reduce the rate of metabolic acidosis at a statistically significant level. In addition, reports on the relationship between metabolic acidosis and the development of CP are still very heterogeneous. To prove a long-term cost-saving benefit of STAN through effective CP prevention, long-term follow-up data from the randomised trials of STAN versus EFM alone should be collected. Until these data become available, our cost-effectiveness study supports the clinical implementation of STAN from the maternal perspective (i.e., a reduction in instrumental deliveries at a small investment).

CONCLUSION

The use of STAN during labour can reduce the number of cases of metabolic acidosis and of instrumental deliveries at a small investment when compared to EFM alone. In the long term, STAN is a cost-saving strategy when the risk of developing cerebral palsy is >1% for patients exposed to metabolic acidosis at birth.


REFERENCES

1. Malin GL, Morris RK, Khan KS. Strength of association between umbilical cord pH and perinatal and long term outcomes: systematic review and meta-analysis. BMJ. 2010;340:c1471.

2. Kruse M, Michelsen SI, Flachs EM, Brønnum-Hansen H, Madsen M, Uldall P. Lifetime costs of cerebral palsy. Dev Med Child Neurol. 2009;51:622-8.

3. Norén H, Amer-Wåhlin I, Hagberg H, Herbst A, Kjellmer I, Maršál K, et al. Fetal electrocardiography in labor and neonatal outcome: data from the Swedish randomized controlled trial on intrapartum fetal monitoring. Am J Obstet Gynecol. 2003;188:183-92.

4. Amer-Wåhlin I, Hellsten C, Norén H, Hagberg H, Herbst A, Kjellmer I, et al. Cardiotocography only versus cardiotocography plus ST analysis of fetal electrocardiogram for intrapartum fetal monitoring: A Swedish randomised controlled trial. Lancet. 2001;358:534-8.

5. Westgate JA, Bennet L, Brabyn C, Williams CE, Gunn AJ. ST waveform changes during repeated umbilical cord occlusions in near-term fetal sheep. Am J Obstet Gynecol. 2001;184:743-51.

6. Schuit E, Amer-Wahlin I, Ojala K, Vayssière C, Westerhuis MEMH, Maršál K, et al. Effectiveness of electronic fetal monitoring with additional ST analysis in vertex singleton pregnancies at >36 weeks of gestation: an individual participant data metaanalysis. Am J Obstet Gynecol. 2013;208:1-13.

7. Heintz E, Brodtkorb TH, Nelson N, Levin LÅ. The long-term cost-effectiveness of fetal monitoring during labour: A comparison of cardiotocography complemented with ST analysis versus cardiotocography alone. BJOG An Int J Obstet Gynaecol. 2008;115:1676–87.

8. Vijgen SMC, Westerhuis MEMH, Opmeer BC, Visser GHA, Moons KGM, Porath MM, et al. Cost-effectiveness of cardiotocography plus ST analysis of the fetal electrocardiogram compared with cardiotocography only. Acta Obstet Gynecol Scand. 2011;90:772-8.

9. Husereau D, Drummond M, Petrou S, Carswell C, Moher D, Greenberg D, et al. Consolidated Health Economic Evaluation Reporting Standards (CHEERS) statement. Eur J Health Econ. 2013;14:367-72.

10. Ellenberg JH, Nelson KB. The association of cerebral palsy with birth asphyxia: a definitional quagmire. Dev Med Child Neurol. 2013;55:210–6.

11. Oskoui M, Coutinho F, Dykeman J, Jetté N, Pringsheim T. An update on the prevalence of cerebral palsy: a systematic review and meta-analysis. Dev Med Child Neurol. 2013 Jun;55:509-19.

12. Sculpher M. The use of quality-adjusted life-years in cost-effectiveness studies. Allergy Eur J Allergy Clin Immunol. 2006;61:527–30.

13. Rosenbaum PL, Livingston MH, Palisano RJ, Galuppi BE, Russell DJ. Quality of life and health-related quality of life of adolescents with cerebral palsy. Dev Med Child Neurol. 2007;49:516–21.

14. Young NL, Rochon TG, McCormick A, Law M, Wedge JH, Fehlings D. The Health and Quality of Life Outcomes Among Youth and Young Adults With Cerebral Palsy. Arch Phys Med Rehabil 2010;91:143–8.

15. Strauss D, Shavelle R, Reynolds R, Rosenbloom L, Day S. Survival in cerebral palsy in the last 20 years: Signs of improvement? Dev Med Child Neurol. 2007;49:86–92.

16. Centraal Bureau voor de Statistiek. Levensverwachting; geslacht en leeftijd, vanaf 1950 (per jaar) [cited 2014 Mar 1]. Available from: http://statline.cbs.nl/StatWeb/publication/?VW=T&DM=SLNL&PA=37360ned&LA=NL

17. Burström K, Johannesson M, Diderichsen F. Swedish population health-related quality of life results using the EQ-5D. Qual Life Res. 2001;10:621-35.

18. National Institute for Health and Care Excellence. Guide to the methods of technology appraisal 2013. [cited 2014 April 2]. Available from: http://www.nice.org.uk/media/D45/1E/GuideToMethodsTechnologyAppraisal2013.pdf

19. Xu X, Ivy JS, Patel D a, Patel SN, Smith DG, Ransom SB, et al. Pelvic floor consequences of cesarean delivery on maternal request in women with a single birth: a cost-effectiveness analysis. J Womens Health 2010;19:147–60.

Appendix A. Quality Adjusted Life Years of patients with cerebral palsy and general population.17

Age | General population | Patients with CP
< 30 y | 0.90 | 0.36
30-39 y | 0.88 | 0.35
40-49 y | 0.87 | 0.35
> 80 y | 0.69 | 0.28

An example of the formula used to calculate the QALY of a 40-year-old CP patient is shown below.7

$$\mathrm{QALY}_{\mathrm{sevCP}} = \frac{U_{\mathrm{sevCP}}}{U_{\mathrm{General},\leq 30}} \times U_{\mathrm{General},40} \times 1\ \text{year}$$
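The formula rescales the CP utility by the ratio of the age-specific to the youngest-age general-population utility. A minimal sketch, assuming U_sevCP = 0.36 (the value shown for patients with CP under 30 in the table above) and using only the age bands listed there; it reproduces the CP column after rounding to two decimals. The function names are illustrative.

```python
# General-population utilities by age band (Appendix A; intermediate bands not listed)
GENERAL_UTILITY = {"<30": 0.90, "30-39": 0.88, "40-49": 0.87, ">80": 0.69}


def cp_utility(age_band: str, u_sev_cp: float = 0.36, reference_band: str = "<30") -> float:
    """Scale the CP utility by the general-population age gradient."""
    return u_sev_cp * GENERAL_UTILITY[age_band] / GENERAL_UTILITY[reference_band]


def qaly_one_year(age_band: str) -> float:
    # One year lived at the age-adjusted CP utility
    return cp_utility(age_band) * 1.0


for band in GENERAL_UTILITY:
    print(band, round(cp_utility(band), 2))
```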

CHAPTER 7

KOSTEN EN EFFECTEN VAN DOELMATIGHEIDSONDERZOEK IN DE OBSTETRIE: EEN BUDGET-IMPACTANALYSE VAN 8 OBSTETRISCHE DOELMATIGHEIDSSTUDIES

[COSTS AND HEALTH OUTCOMES OF EFFECTIVENESS STUDIES IN OBSTETRICS: a budget impact analysis of 8 obstetric effectiveness studies]

Janneke van ’t Hooft, Brent C. Opmeer,

Margreet J. Teune, Luuk Versluis, Ben Willem J. Mol

Ned Tijdschr Geneeskd. 2013;157:A6287.

Chapter 7

ABSTRACT

To gain insight into the costs and health effects of effectiveness research in obstetrics at the national level.

Budget impact analysis.

Method

We searched for obstetric effectiveness studies. We examined the potential budget impact of implementation in all patients in the Netherlands, as well as the health gains for mothers and their children.

Results

We used 8 multicentre randomized trials with a total of almost 11 000 patients. The total potential cost saving of these trials amounted to €9.6 million per year, while the one-off costs of the trials amounted to €3.1 million. With good implementation of the results of these effectiveness studies, health gains can be achieved for pregnant women at term with hypertension or pre-eclampsia, for women whose labour is induced and for women receiving foetal monitoring; the same applies to the children of these women. In addition, healthcare costs can be reduced by abolishing, or not introducing, interventions that have no positive effect on health, namely prolonged tocolysis, the use of progesterone in twin pregnancies, intrauterine pressure catheters and immediate induction of labour in preterm ruptured membranes.

Conclusion

Adequate application of the results of effectiveness studies in obstetrics could yield considerable health gains and cost savings compared with providing care that is not scientifically substantiated.

Costs and effects of effectiveness research in obstetrics

EXTENDED ENGLISH ABSTRACT

Objective

To estimate the expected impact of nationwide effectiveness studies in obstetrics on health outcomes and costs at the national level.

Design

Budget impact analysis.

Method

We searched for all multicentre obstetric evaluation studies completed in the Netherlands between 2008 and 2012. We used the website of the Dutch 'consortium for women's health and reproductive studies', which gives an overview of all evaluation research performed at multicentre level. We also consulted gynaecologists to find out whether any other evaluation research performed in this period had been missed. To be included, study results had to be published in the MEDLINE or EMBASE database or be available in a final report of the (governmental) funder ZonMw. Moreover, either an economic analysis had to be available, or it had to be possible to perform a reliable cost-difference calculation from the available data.

We extrapolated the study findings to the national situation, based on the total number of patients in the Netherlands to whom the study results could be applied, and performed a budget impact analysis. In this analysis, we estimated the health improvements for mother and child, as well as potential cost savings, assuming a reasonable implementation of the study results in care after their publication. The robustness of the calculations was evaluated in sensitivity analyses varying the assumptions on implementation after the study, the difference in costs of the interventions performed, the difference in study-related costs and the difference in incidence of the obstetric condition studied.
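For a single trial, the core of such a budget impact calculation is the number of eligible patients per year, times the change in uptake of the better strategy, times the cost difference per patient; the sensitivity analyses then vary one input at a time. The sketch below is a hypothetical illustration of that structure, not the authors' spreadsheet: the function names and all numbers (10 000 eligible women per year, uptake rising from 50% to 80%, a saving of €500 per patient) are assumptions.

```python
def annual_budget_impact(eligible_per_year: float, uptake_before: float,
                         uptake_after: float, cost_difference: float) -> float:
    """Change in national annual costs when uptake of a strategy changes.

    cost_difference: cost per patient of the new strategy minus the old one
    (negative values are savings).
    """
    patients_switching = eligible_per_year * (uptake_after - uptake_before)
    return patients_switching * cost_difference


def one_way_sensitivity(base_inputs: dict, name: str, values) -> dict:
    """Recompute the budget impact while varying a single input."""
    return {v: annual_budget_impact(**{**base_inputs, name: v}) for v in values}


base = {"eligible_per_year": 10_000, "uptake_before": 0.5,
        "uptake_after": 0.8, "cost_difference": -500.0}
print(annual_budget_impact(**base))
print(one_way_sensitivity(base, "uptake_after", [0.6, 0.7, 0.8]))
```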

Results

We identified 19 obstetric evaluation studies, of which 11 did not yet have a final report on their outcomes; eight multicentre randomized controlled trials (RCTs), which together randomized 10 980 patients, met the inclusion criteria. The results of these trials had been published between 2009 and 2010 in The Lancet, JAMA, New England Journal of Medicine, Obstetrics and Gynecology, BMJ and PLoS Medicine. The studies evaluated include: (1) induction of labour versus expectant monitoring for gestational hypertension or mild pre-eclampsia (GH/PE) at term (HYPITAT trial); (2) induction of labour versus expectant monitoring for intrauterine growth restriction at term (DIGITAT trial); (3) induction of labour versus expectant management in women with preterm prelabour rupture of membranes between 34 and 37 weeks (PPROMEXIL trial); (4) induction of labour with a Foley catheter versus vaginal prostaglandin E2 gel at term (PROBAAT trial); (5) internal versus external tocodynamometry for monitoring labour (IUPC trial); (6) 17α-hydroxyprogesterone caproate for the prevention of adverse neonatal outcome in multiple pregnancies (AMPHIA trial); and (7) the effect of maintenance tocolysis with nifedipine in threatened preterm labour (APOSTEL II trial).

A potential health benefit for mothers and children was found in women with pregnancy-induced hypertension or mild pre-eclampsia, women with an unfavourable cervix in whom labour was induced, and women undergoing foetal monitoring. For example, among women with GH/PE at term, an increase in labour induction from 50% to 80% could potentially reduce severe maternal morbidity by 10%. Intravenous antihypertensive medication would no longer be needed for 32% of these women, and 20% of their neonates would not be born with an arterial pH below 7.05.

Furthermore, de-implementing ineffective practices such as prolonged tocolysis and induction of labour in preterm prelabour rupture of membranes, or not implementing a strategy such as progestagens for the prevention of preterm delivery in twins, generates substantial cost savings. The potential cost reduction of these eight studies was €9.6 million per year, against a one-time investment of €3.1 million for conducting the evaluation projects.

Conclusion

Evaluation of the effectiveness and efficiency of obstetric care can potentially result in considerable health gains and cost reductions compared with continuing non-evaluated treatment. The potential annual reduction in health care costs at the national level found in this study is three times the one-time trial costs. Adequate implementation and de-implementation of the results of effectiveness studies is essential. This economic analysis could be extended to other medical fields and might be extrapolated to a global level.

INTRODUCTION

Since 2003, multicentre health care efficiency research has been conducted in obstetrics in the Netherlands.1 These studies, largely funded by ZonMw, evaluate the effectiveness and efficiency of obstetric care with the aim of improving quality and controlling costs.2 In addition, identifying unnecessary care may be a realistic strategy for reducing the health care budget.3,4

Now, ten years later, more than 70 centres participate in this consortium, and a number of these efficiency studies have been completed and published in renowned journals such as the New England Journal of Medicine, The Lancet, the Journal of the American Medical Association and the British Medical Journal.5-9 To our knowledge, the potential effects of applying these study results on the quality and costs of care have not previously been studied systematically.

The aim of our study was to gain insight into the budget impact of efficiency research. To this end, we estimated the expected costs and health effects in the Netherlands after optimal implementation of the obstetric efficiency studies completed between 2008 and 2012, and compared these with the costs of conducting the research itself.

MATERIALS AND METHODS

Efficiency studies

We searched for multicentre evaluations of obstetric interventions performed in the Netherlands. For this we used the overview of efficiency studies on the website of the Consortium for women’s health and reproductive studies (www.studies-obsgyn.nl) and asked several gynaecologists for input. We included studies whose results were published between 2008 and 2012 and could be retrieved through Medline or Embase, or that were reported in a ZonMw final report. In addition, an economic analysis had to have been performed and either published or presented in a ZonMw report, or it had to be possible to make our own reliable calculation of the cost differences between treatments.

Calculation of health gains

To calculate the health gains, we used the following data from each study:

General data. We used data on the annual number of births in the Netherlands, gestational age, and whether the pregnancy was singleton or multiple (source: www.cbs.nl).10 The incidence of the obstetric problem studied was estimated from the literature.

Study data. We used the published percentage differences between the interventions studied for primary and secondary maternal and neonatal outcomes.

Implementation of the interventions before and after the study. These data were taken from the literature. Where no data were available, they were estimated by us (JvtH, BWJM) using the following decision rules:

(a) if the health gains differed, the care with the greatest health gain was chosen regardless of cost, on the assumption that a favourable study result will be implemented by a large part of the field;

(b) if the health gains were equal, the care with the lowest cost was chosen.

By calculating the difference in the number of patients with a primary or secondary outcome before and after implementation, the health gain or health loss could be quantified.

Budget impact analysis

In a budget impact analysis, the difference in costs between the interventions studied was calculated for the situation before and after implementation of the study results. Information on cost differences was obtained from the economic analyses of the studies. For the one-time costs of conducting the efficiency study itself, we used the ZonMw grant awarded, or the average of these grants if the study was not funded by ZonMw. The formula for the budget impact analysis is given in Appendix 1.

Sensitivity analyses

Because several inputs were based on assumptions, univariate sensitivity analyses were performed for 8 scenarios to assess the robustness of the calculations. In scenarios 1 and 2 we explored differences in the implementation rate after the study (cautious implementation: +10%; maximal implementation: up to 100%). In scenarios 3 and 4 we explored the cost differences between the strategies (-50% and +50% of the cost differences calculated in the economic analyses). In scenarios 5 and 6 we explored minimal and maximal study costs (-50% and +50%). In scenarios 7 and 8 we explored differences in the incidence of the condition studied (-25% and +25%).
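The eight one-way scenarios above can be sketched in code. This is a minimal Python illustration, not the authors' model: it applies the budget impact formula of Appendix 1 to a single study, subtracts the one-time study cost to obtain a first-year net saving (our assumption), and varies one input at a time. All input values are hypothetical placeholders, and reading "cautious implementation" as the pre-study rate plus 10 percentage points is also our assumption.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class StudyInputs:
    """Inputs for one study; the example values used below are hypothetical."""
    n_patients: float   # patients with the condition per year (N)
    pre_imp_a: float    # share receiving intervention A before the study
    post_imp_a: float   # share receiving intervention A after implementation
    cost_a: float       # cost per patient of intervention A (EUR)
    cost_b: float       # cost per patient of intervention B (EUR)
    study_cost: float   # one-time cost of conducting the study (EUR)


def annual_saving(x: StudyInputs) -> float:
    """Budget impact K: yearly cost before minus yearly cost after implementation."""
    before = x.n_patients * (x.pre_imp_a * x.cost_a + (1 - x.pre_imp_a) * x.cost_b)
    after = x.n_patients * (x.post_imp_a * x.cost_a + (1 - x.post_imp_a) * x.cost_b)
    return before - after


def net_first_year(x: StudyInputs) -> float:
    """Assumption: net result = one year of savings minus the one-time study cost."""
    return annual_saving(x) - x.study_cost


def one_way_scenarios(base: StudyInputs) -> dict[str, StudyInputs]:
    """Vary one input at a time, mirroring scenarios 1-8 described in the text.

    Assumes A is the intervention favoured by the study result.
    """
    diff = base.cost_a - base.cost_b  # cost difference between the strategies
    return {
        "base case": base,
        # Assumption: 'cautious' = pre-study rate plus 10 percentage points
        "1 cautious implementation": replace(base, post_imp_a=min(1.0, base.pre_imp_a + 0.10)),
        "2 maximal implementation": replace(base, post_imp_a=1.0),
        "3 cost difference -50%": replace(base, cost_a=base.cost_b + 0.5 * diff),
        "4 cost difference +50%": replace(base, cost_a=base.cost_b + 1.5 * diff),
        "5 study costs -50%": replace(base, study_cost=0.5 * base.study_cost),
        "6 study costs +50%": replace(base, study_cost=1.5 * base.study_cost),
        "7 incidence -25%": replace(base, n_patients=0.75 * base.n_patients),
        "8 incidence +25%": replace(base, n_patients=1.25 * base.n_patients),
    }


if __name__ == "__main__":
    base = StudyInputs(n_patients=10_000, pre_imp_a=0.5, post_imp_a=0.8,
                       cost_a=1_000.0, cost_b=1_500.0, study_cost=500_000.0)
    for name, inputs in one_way_scenarios(base).items():
        print(f"{name:28s} net first-year saving: EUR {net_first_year(inputs):,.0f}")
```

Using a frozen dataclass with `replace` keeps the base case unchanged, so each scenario differs from it in exactly one input, which is what "univariate" means here.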

RESULTS

Studies

In total, we found 19 obstetric efficiency studies that had completed recruitment between 2008 and 2012. For 10 of these studies no primary outcome had yet been published or reported in a ZonMw report, and in 1 case no economic analysis was available. Eight efficiency studies, together randomizing 10,980 patients, were included in our analysis.5-9;11-13 Table 1 lists the intervention evaluated in each study. Table 2 lists, per study, the data we used to calculate health gains and cost reductions.

Health gains

Three studies showed a significant difference between the interventions: the HYPITAT study showed a difference in primary and secondary outcomes, whereas the PROBAAT and STAN studies differed only in secondary outcomes (Table 3).

Table 3 shows, for example, the expected health gain after implementation of the HYPITAT results. The HYPITAT study compared induction of labour with expectant management in a total of 756 women at term with gestational hypertension or mild pre-eclampsia. The primary outcome (a composite maternal outcome) occurred in 31% of the induction group versus 44% of the expectant management group.8 Table 3 shows that if 80% of these women were induced, instead of the 50% induced before the study, 10% fewer poor maternal outcomes would occur each year.
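The 10% figure can be reproduced with a short calculation, assuming (our assumption) that the event rates observed in the trial, 31% after induction and 44% after expectant management, apply unchanged in practice and that only the share of women induced changes. Under that assumption the expected rate falls from 37.5% to 33.6%, an absolute reduction of 3.9 percentage points, which corresponds to a relative reduction of about 10%. Reading the 10% as a relative reduction is our interpretation.

```python
def expected_event_rate(share_induced: float, rate_induction: float,
                        rate_expectant: float) -> float:
    """Population event rate when a given share of women is induced."""
    return share_induced * rate_induction + (1 - share_induced) * rate_expectant


# Composite maternal outcome rates reported for the HYPITAT trial
RATE_INDUCTION = 0.31
RATE_EXPECTANT = 0.44

before = expected_event_rate(0.50, RATE_INDUCTION, RATE_EXPECTANT)  # 50% induced
after = expected_event_rate(0.80, RATE_INDUCTION, RATE_EXPECTANT)   # 80% induced
absolute_reduction = before - after
relative_reduction = absolute_reduction / before

print(f"before {before:.3f}, after {after:.3f}, "
      f"absolute {absolute_reduction:.3f}, relative {relative_reduction:.1%}")
```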

Table 1. Dutch obstetric efficiency studies included in the analysis of the budget impact of efficiency research

Acronym | Trial number | Intervention evaluated

PROBAAT | NTR1646 | Cervical priming for induction of labour with a Foley catheter versus vaginal prostaglandin E2 gel

HYPITAT | ISRCTN08132825 | Induction of labour versus expectant management for gestational hypertension and mild pre-eclampsia between 36 and 41 weeks

DIGITAT | ISRCTN10363217 | Induction of labour versus expectant management for intrauterine growth restriction between 36 and 41 weeks

PPROMEXIL | ISRCTN29313500 | Induction of labour versus expectant management for preterm rupture of membranes between 34 and 37 weeks

IUPC | ISRCTN13667534 | Intrauterine pressure catheter versus external tocodynamometry for monitoring contractions

APOSTEL II | NTR1336 | Maintenance tocolysis versus tocolysis for 48 hours to prevent preterm birth

AMPHIA | ISRCTN40512715 | 17α-hydroxyprogesterone caproate versus placebo to prevent preterm birth in twin pregnancies

STAN | ISRCTN95732366 | Cardiotocography plus ST analysis of the foetal electrocardiogram versus cardiotocography alone for foetal monitoring

Costs

For 4 studies we used a published economic analysis,14-17 and for 3 studies the ZonMw reports. Table 2 gives, per study, an overview of the perspective of the economic analyses. We explain the analyses of the IUPC and PROBAAT studies in more detail. The economic impact of the IUPC study was estimated by calculating the difference in costs between using a Koala catheter for intrauterine pressure measurement and using a tocodynamometer. For the PROBAAT study we deviated from the internationally published figures and instead used a sensitivity analysis from the economic analysis. This analysis accounted for an important difference: the PROBAAT study showed that women with a Foley catheter no longer need to be admitted to a delivery room but can stay on a cheaper maternity ward, which in our view made this calculation more applicable to the Dutch situation.

Table 2. [Table content not recoverable from the source.]

Table 3. [Table content not recoverable from the source.]

Figure 1 shows the expected cost savings per study for a base-case scenario built on the most realistic assumptions about study costs, the degree and costs of implementation, and the incidence of the conditions studied. Including study costs, 2 studies resulted in a net cost increase under our assumptions, namely STAN and AMPHIA (see Table 2). For the STAN study, the net cost increase of adding ST analysis of the foetal ECG to intrapartum cardiotocographic monitoring was €29 per patient. For the AMPHIA study, which gave no reason for a change in policy, the net costs equalled the one-time costs of the study. Six studies showed a net cost saving of €540,000 to €4.1 million per year. In total, the 8 studies yielded a potential cost saving of €9.6 million per year for a one-time investment of €3.1 million for conducting the studies.
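The ratio behind the statement that the annual savings are about 3 times the study costs can be checked directly from the two totals above. The payback period is an illustration only and assumes (our assumption) that savings accrue evenly over the year from the moment the results are implemented; on that basis the one-time investment would be recovered in roughly four months.

```python
ANNUAL_SAVING_MEUR = 9.6   # potential saving per year, all 8 studies (EUR million)
ONE_TIME_COST_MEUR = 3.1   # one-time cost of conducting the 8 studies (EUR million)

ratio = ANNUAL_SAVING_MEUR / ONE_TIME_COST_MEUR
# Illustrative assumption: savings accrue evenly within the year
payback_months = ONE_TIME_COST_MEUR / ANNUAL_SAVING_MEUR * 12

print(f"annual saving / study cost = {ratio:.1f}")
print(f"payback period = {payback_months:.1f} months")
```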

Sensitivity analyses showed that even under considerably different assumptions the results were largely comparable to the base case (Figure 2).

Figure 1. Expected annual cost savings of the 8 efficiency studies versus the one-time costs of conducting the studies.


DISCUSSION

Our calculations give an indication of the potential cost savings of 8 obstetric interventions set against the one-time investment in efficiency studies. We calculated that, at the national level, the probable annual cost savings of a policy change prompted by these efficiency studies are 3 times the one-time costs of conducting them. Evaluating existing interventions that are widely used but not supported by scientific evidence fits into current political discussions about how the health care market should be organized. The Dutch Healthcare Authority (Nederlandse Zorgautoriteit) reports that health care professionals and scientific societies should carefully determine whether all care should be provided in all cases, partly because the effectiveness of half of all treatments is unknown.18,19

Figure 2. Results of univariate sensitivity analyses compared with the base case

Bars from left to right: base case. Models 1 and 2: implementation of the study results in about 10% and 100% of patients (cautious and maximal implementation). Models 3 and 4: cost difference of the evaluated strategy of -50% and +50% relative to the base case. Models 5 and 6: costs of conducting the studies -50% and +50% relative to the base case. Models 7 and 8: incidence of the condition studied -25% and +25% relative to the base case.

More care does not automatically mean better care. This was shown earlier by American research from 2003, which found that large differences in treatment intensity between regions did not lead to differences in health outcomes.20 The report ‘Kwaliteit als medicijn’ (Quality as medicine) likewise recommends reducing overtreatment and practice variation as an important quality improvement.21 According to this report, it is precisely by improving the quality of care that 4 to 8 billion euros could potentially be saved of the estimated 30 billion euros spent on curative care. The report ‘Meten van zorguitkomsten: de heilige graal binnen handbereik’ (Measuring health care outcomes: the holy grail within reach) also concludes that in most cases quality improvements can lead to lower costs.22 This means that, in addition to evaluating innovative care, the evidence base for current care must also be built. Efficiency research should therefore be the rule rather than the exception.

The question is how efficiency research can be integrated into the health care system on a structural basis. At present, each study depends on grants, often obtained through ZonMw. More structural funding would be appropriate. We believe that health insurers could take a coordinating role in organizing this together with scientific societies, hospitals and medical specialties. The outcomes of this budget impact analysis should then encourage health insurers to make their own funds available for financing efficiency research and to actively steer towards rapid implementation of study results.

Study limitations

The process we used to include studies yielded a broad spectrum of obstetric studies. Nevertheless, selection bias cannot be ruled out, as studies with a clear outcome may be more likely to get their results published.

Cost savings. Our analysis also shows that the total cost savings are mainly attributable to 2 of the 8 studies: PROBAAT and HYPITAT. In our view, however, this does not make the calculations less realistic. Because the outcome of a study cannot be predicted in advance, investing in a study always involves uncertainty.23 The largest gains are naturally achieved by studies with a clearly positive result for a therapy that leads to lower costs. But studies that show no difference in effectiveness, such as APOSTEL II and IUPC, can also yield net savings, because they provide a basis for abolishing, or not introducing, expensive care.

Double counting. Our calculation may also involve double counting, because counts were made per condition rather than per pregnant woman. The combined patient numbers add up to 172,700 patients, whereas there are just over 184,000 births in the Netherlands each year.10 It is difficult to estimate the degree of overlap. One of many possible scenarios is, for example, a patient with gestational hypertension (HYPITAT study) whose labour is induced (PROBAAT study), with foetal monitoring (STAN study) and an intrauterine pressure catheter (IUPC study). However, the costs of these interventions are largely independent of each other, so it is defensible to retain any double counting.

Limitations of RCTs. The potential limitations of the RCT itself should not be overlooked either, and the method used in this study sometimes falls short in this respect. We assumed that if a treatment was not proven effective, the cheapest option could be chosen. In practice, first, doctors are not quick to change ingrained habits. Moreover, a single study is often not convincing enough to change medical practice. Especially when the results do not point convincingly in one direction, introducing or abolishing a particular practice will often meet resistance. For example, the authors of the DIGITAT study conclude that expectant management of intrauterine growth restriction at term may be justified, although their study lacked the power to detect a difference in antenatal death. Recent research using data from the Netherlands Perinatal Registry (Perinatale Registratie Nederland) did show a significant difference in intrauterine foetal death, in favour of induction.24 Although induction is the more expensive option in this case, one could decide on the basis of this evidence to induce earlier after all.

Implementation of evaluation research

Knowledge from efficiency research can only help reduce health care expenditure if medical practice is adapted. Conducting efficiency research is therefore not the only prerequisite for keeping the budget under control. At the very least, a structural registry is needed that records the implementation of research results in practice.


The results of efficiency research are implemented through guidelines. It is generally accepted that the results of efficiency research are first published in peer-reviewed journals, after which the guidelines are adapted in light of the results. Guideline development can also draw on foreign studies, since efficiency research is not carried out only in the Netherlands. There is currently no structure for setting an international research agenda. For example, the Dutch AMPHIA study on the effectiveness of progesterone for the prevention of preterm birth in multiple pregnancies has been repeated at 11 sites around the world, each time with the same result.25 We are currently trying to achieve coordination in conducting efficiency studies through the Global Obstetrics Network (www.globalobstetricsnetwork.org).26

CONCLUSION

Evaluating the effectiveness and efficiency of obstetric care can lead to health gains and cost savings. Our figures indicate that a policy change prompted by efficiency research may yield 3 times more than the costs of conducting that research. Given the hard-to-control rise in health care costs, this is an attractive alternative to the current situation, in which care that is not supported by evidence is simply reimbursed.

There is no adequate registry of the implementation of research results in practice. We believe that health insurers could take a coordinating role, together with scientific societies, hospitals and medical specialties, in organizing structural funding of efficiency research and registration of its implementation.

REFERENCES

(1) Consortium for women’s health and reproductive studies; cited 2013 Feb. Available from: www.studies-obsgyn.nl

(2) ZonMw. Programmatekst doelmatigheidsonderzoek 2010-2012; cited 2013 Feb. Available from: http://www.zonmw.nl/nl/programmas/programma-detail/doelmatigheidsonderzoek/publicaties/.

(3) Berwick DM, Hackbarth AD. Eliminating waste in US health care. JAMA 2012 Apr 11;307(14):1513-6.

(4) ZonMw. Signalement Verstandig Kiezen Kostenbesparing door bepaalde interventies ‘niet of minder te doen’. 2012. Cited 2013 Apr. Available from: http://www.zonmw.nl/nl/publicaties/detail/signalement-verstandig-kiezen/?no_cache=1&cHash=4a3b9aa0c8a8e0ad2b992a14d455f05f/

(5) Bakker JJ, Verhoeven CJ, Janssen PF, van Lith JM, van Oudgaarden ED, Bloemenkamp KW, et al. Outcomes after internal versus external tocodynamometry for monitoring labor. N Engl J Med 2010 Jan 28;362(4):306-13.

(6) Boers KE, Vijgen SM, Bijlenga D, van der Post JA, Bekedam DJ, Kwee A, et al. Induction versus expectant monitoring for intrauterine growth restriction at term: randomised equivalence trial (DIGITAT). BMJ 2010;341:c7087.

(7) Jozwiak M, Oude Rengerink K, Benthem M, van Beek E, Dijksterhuis MG, de Graaf IM, et al. Foley catheter versus vaginal prostaglandin E2 gel for induction of labour at term (PROBAAT trial): an open-label, randomised controlled trial. Lancet 2011 Dec 17;378(9809):2095-103.

(8) Koopmans CM, Bijlenga D, Groen H, Vijgen SM, Aarnoudse JG, Bekedam DJ, et al. Induction of labour versus expectant monitoring for gestational hypertension or mild pre-eclampsia after 36 weeks’ gestation (HYPITAT): a multicentre, open-label randomised controlled trial. Lancet 2009 Sep 19;374(9694):979-88.

(9) Roos C, Spaanderman ME, Schuit E, Bloemenkamp KW, Bolte AC, Cornette J, et al. Effect of maintenance tocolysis with nifedipine in threatened preterm labor on perinatal outcomes: a randomized controlled trial. JAMA 2013 Jan 2;309(1):41-7.

(10) Centraal Bureau voor de Statistiek (CBS). Perinatale en zuigelingensterfte; zwangerschapsduur en geslacht (2006-2008); cited 2012 Oct. Available from: www.cbs.nl.

(11) Lim AC, Schuit E, Bloemenkamp K, Bernardus RE, Duvekot JJ, Erwich JJ, et al. 17alpha-hydroxyprogesterone caproate for the prevention of adverse neonatal outcome in multiple pregnancies: a randomized controlled trial. Obstet Gynecol 2011 Sep;118(3):513-20.

(12) van der Ham DP, Vijgen SM, Nijhuis JG, van Beek JJ, Opmeer BC, Mulder AL, et al. Induction of labor versus expectant management in women with preterm prelabor rupture of membranes between 34 and 37 weeks: a randomized controlled trial. PLoS Med 2012;9(4):e1001208.

(13) Westerhuis ME, Visser GH, Moons KG, van Beek E, Benders MJ, Bijvoet SM, et al. Cardiotocography plus ST analysis of fetal electrocardiogram compared with cardiotocography only for intrapartum monitoring: a randomized controlled trial. Obstet Gynecol 2010 Jun;115(6):1173-80.

(14) van Baaren GJ, Jozwiak M, Opmeer BC, Oude Rengerink K, Benthem M, Dijksterhuis MG, et al. Cost-effectiveness of induction of labour at term with a foley catheter compared to vaginal prostaglandin E2 gel (PROBAAT trial). BJOG 2013; Mar 26. Epub ahead of print.

(15) Vijgen SM, Boers KE, Opmeer BC, Bijlenga D, Willekes C, Bloemenkamp K, et al. An economic analysis comparing induction of labour and expectant management for intrauterine growth restriction at term (Digitat Trial). Am J Obstet Gynecol 2009 Dec 1;2009.10.113.

(16) Vijgen SM, Koopmans CM, Opmeer BC, Groen H, Bijlenga D, Aarnoudse JG, et al. An economic analysis of induction of labour and expectant monitoring in women with gestational hypertension or pre-eclampsia at term (HYPITAT trial). BJOG 2010 Dec;117(13):1577-85.

(17) Vijgen SM, Westerhuis ME, Opmeer BC, Visser GH, Moons KG, Porath MM, et al. Cost-effectiveness of cardiotocography plus ST analysis of the fetal electrocardiogram compared with cardiotocography only. Acta Obstet Gynecol Scand 2011 Jul;90(7):772-8.

(18) British Medical Journal. What conclusions has Clinical Evidence drawn about what works, what doesn’t based on randomised controlled trial evidence?; cited 2013 Feb. Available from: www.clinicalevidence.bmj.com/x/set/static/cms/efficacy-categorisations.html


(19) Nederlandse Zorgautoriteit. Van fabels naar feiten. Stand van de zorgmarkten 2012; cited 2013 Jan. Available from: www.nza.nl/104107/426385/Stand_van_de_zorgmarkten_2012.pdf

(20) Fisher ES, Wennberg DE, Stukel TA, Gottlieb DJ, Lucas FL, Pinder EL. The implications of regional variations in Medicare spending. Part 1: the content, quality, and accessibility of care. Ann Intern Med 2003 Feb 18;138(4):273-87.

(21) Booz&co. Kwaliteit als medicijn. Aanpak voor betere zorg en lagere kosten; c2012 [cited 2013 Jan]. Available from: www.booz.com/media/uploads/BoozCo_Kwaliteit-als-medicijn.pdf

(22) KPMG Advisory N.V. Meten van zorguitkomsten: de heilige graal binnen handbereik; c2012 [cited 2013 Jan]. Available from: www.kpmg.com/NL/nl/IssuesAndInsights/ArticlesPublications/Documents/PDF/Healthcare/Meten-van-zorguitkomsten.pdf

(23) Djulbegovic B, Kumar A, Glasziou PP, Perera R, Reljic T, Dent L, et al. New treatments compared to established treatments in randomized trials. Cochrane Database Syst Rev 2012;10:MR000024.

(24) Kazemier B, Ravelli AC, Mol BW. Optimal timing of delivery in term small for gestational age infants, a national cohort study. Am J Obstet Gynecol 2013;208(1):S294-S295.

(25) Schuit E, Stock S, Groenwold RH, Maurel K, Combs CA, Garite T, et al. Progestogens to prevent preterm birth in twin pregnancies: an individual participant data meta-analysis of randomized trials. BMC Pregnancy Childbirth 2012;12:13.

(26) Mol BW, Ruifrok AE. Global Alignment, Coordination and Collaboration in Perinatal Research: The Global Obstetrics Network (GONet) Initiative. Am J Perinatol 2013;30(3):163-6.

(27) Nederlandse Vereniging Voor Obstetrie en Gynaecologie. Hypertensieve aandoeningen in de zwangerschap versie 2.0. Cited 2012 Jun. Available from: www.nvog.nl

(28) Nederlandse Vereniging Voor Obstetrie en Gynaecologie. Foetale groeibeperking versie 2.1. Cited 2012 Jul. Available from: www.nvog.nl

(29) Stichting Perinatale Registratie Nederland. Perinatale Zorg in Nederland 2008. Cited 2012 Jul. Available from: http://www.perinatreg.nl/uploads/150/122/Jaarboek_Zorg_in_Nederland_2008.

(30) van der Tuuk K, Koopmans CM, Groen H, Mol BW, van Pampus MG. Impact of the HYPITAT trial on doctors’ behaviour and prevalence of eclampsia in the Netherlands. BJOG 2011 Dec;118(13):1658-60.

Appendix 1. Budget impact analysis formula

$$K = N \left[ \mathrm{preIMP}_A \cdot \mathrm{Kost}_A + (1 - \mathrm{preIMP}_A) \cdot \mathrm{Kost}_B \right] - N \left[ \mathrm{postIMP}_A \cdot \mathrm{Kost}_A + (1 - \mathrm{postIMP}_A) \cdot \mathrm{Kost}_B \right]$$

K: cost saving of the study per year

N: number of patients with the condition per year

preIMP_A: implementation rate of intervention A (% of patients in the Netherlands receiving intervention A) before the study results were known

1 - preIMP_A: implementation rate of intervention B (% of patients in the Netherlands receiving intervention B) before the study results were known

postIMP_A: implementation rate of intervention A (% of patients in the Netherlands receiving intervention A) after the study results were known

1 - postIMP_A: implementation rate of intervention B (% of patients in the Netherlands receiving intervention B) after the study results were known

Kost_A: costs generated by the use of intervention A

Kost_B: costs generated by the use of intervention B
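The Appendix 1 formula can be written as a small function. This is a sketch: the parameter names mirror the symbols of Appendix 1 (N, preIMP_A, postIMP_A, Kost_A, Kost_B), and the example values are hypothetical. Expanding the brackets shows that the formula simplifies to K = N (preIMP_A - postIMP_A)(Kost_A - Kost_B), so the saving is positive when practice shifts towards the cheaper intervention; the second function states that closed form so the two can be compared.

```python
def budget_impact(n: float, pre_imp_a: float, post_imp_a: float,
                  cost_a: float, cost_b: float) -> float:
    """Annual cost saving K from Appendix 1 (positive = saving)."""
    for share in (pre_imp_a, post_imp_a):
        if not 0.0 <= share <= 1.0:
            raise ValueError("implementation rates must be fractions between 0 and 1")
    cost_before = n * (pre_imp_a * cost_a + (1 - pre_imp_a) * cost_b)
    cost_after = n * (post_imp_a * cost_a + (1 - post_imp_a) * cost_b)
    return cost_before - cost_after


def budget_impact_simplified(n: float, pre_imp_a: float, post_imp_a: float,
                             cost_a: float, cost_b: float) -> float:
    """Equivalent closed form: N * (preIMP_A - postIMP_A) * (Kost_A - Kost_B)."""
    return n * (pre_imp_a - post_imp_a) * (cost_a - cost_b)


# Hypothetical example: the share receiving A rises from 30% to 90%,
# and A costs EUR 200 less per patient than B.
k = budget_impact(n=5_000, pre_imp_a=0.3, post_imp_a=0.9, cost_a=800.0, cost_b=1_000.0)
print(f"K = EUR {k:,.0f} per year")
```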

CHAPTER 8

SUMMARY AND GENERAL DISCUSSION


SUMMARY

Pregnancy is an important life event, and a good start in life is important for the baby and her family. Whenever a pregnancy is complicated (e.g. by preterm birth, foetal distress during labour or high blood pressure), doctors face the challenge of finding the management with the most favourable outcomes. This thesis focuses on improving evaluation research of obstetric interventions by:

• improving the outcomes used in evaluation research

• measuring long-term outcomes of evaluation research

• integrating outcomes of obstetric evaluation research into other study designs to guide clinical decision-making.

The thesis is divided into three parts. Part I provides a core outcome set for obstetric evaluation studies. Chapter 2 presents the core outcome set for studies on the prevention of preterm birth, developed with an international e-Delphi consensus group. This core outcome set reflects the outcomes that are critically important to all relevant stakeholders (patients, obstetricians, midwives, neonatologists and researchers). One hundred and seventy-four individuals from twenty-five countries, representing these five stakeholder groups, participated in a modified e-Delphi procedure. We reduced 227 outcomes identified by a systematic review of the literature, plus 33 outcomes suggested by participants, to 13 consensus ‘core’ outcomes. This set contains four outcomes related to the pregnant woman: [1] maternal mortality; [2] maternal infection or inflammation; [3] preterm rupture of membranes; and [4] harm to the mother from the intervention. Nine core outcomes relate to the offspring: [1] gestational age at delivery; [2] offspring mortality; [3] birthweight; [4] early neurodevelopmental morbidity; [5] late neurodevelopmental morbidity; [6] gastrointestinal morbidity; [7] infectious morbidity; [8] respiratory morbidity; and [9] harm to the offspring from the intervention. Implementation of the core outcome set in future evaluation studies on preterm birth prevention will ensure that data from these trials can be compared and combined.


Part II of this thesis presents data on long-term outcomes of obstetrical

intervention studies.

Chapter 3 evaluates the long-term outcomes of children born to mothers with a short cervical length who were given a pessary during a twin pregnancy in a randomized controlled trial. The initial trial showed no benefit of a pessary in the overall group of asymptomatic women with a twin pregnancy compared with controls. However, in women with a short cervical length (CL) at screening (<38 mm), a short-term benefit (measured as a composite outcome of neonatal mortality and morbidity, including prolongation of pregnancy) was found in the pessary group. In this follow-up study, long-term survival and neurodevelopmental outcomes were measured at three years corrected age in the children born to mothers with a short cervix who were offered a pessary (n=157), compared with controls (n=111). After three years, 27 children had died: 6 (5%) in the pessary group vs 21 (26%) in the control group, adjusted OR [95% CI] 0.14 [0.04 to 0.50]. To assess neurodevelopmental outcome, Bayley-III scores were collected for 173 (72%) of the 241 surviving children (114 (75%) in the pessary group vs 59 (66%) in the control group). The cumulative incidence of death or survival with a neurodevelopmental disability was 12 (10%) vs 23 (29%) in the pessary and control groups, respectively, aOR [95% CI] 0.26 [0.09 to 0.75]. No statistically or clinically relevant differences in Bayley-III scores were found between the two groups. We concluded that in women with a twin pregnancy and a CL <38 mm, the use of a cervical pessary strongly improved survival of the children without affecting neurodevelopmental disability at three years corrected age.

Chapter 4 evaluates the long-term neurodevelopmental and physical outcomes of children born to mothers with a short cervical length (CL <30 mm) in a singleton pregnancy. These women were included in a randomized controlled trial (Triple P) comparing vaginal progesterone with placebo in the second and third trimester to prevent preterm birth. The initial trial showed no short-term benefit of progesterone but was underpowered. Because long-term outcome data after progesterone use in pregnancy are lacking, and its safety has recently been questioned, follow-up of this trial provides valuable information. This chapter reports on survival as well as neurodevelopmental, neurological and physical outcomes of the children at two years corrected age. Of the 77 surviving children in the Triple P trial, 59 (77%) were reached for follow-up, of whom 57 (28 progesterone vs 29 placebo) completed a Bayley-III test. No statistically or clinically relevant differences in Bayley-III scores were found between the two groups. Congenital malformations were seen in 8 (30%) and 2 (11%) children in the progesterone and placebo groups, respectively, RR [95% CI] 4.0 [0.93 to 17.1]. No differences in genital abnormalities or neurological examination were seen between the groups. We concluded that in low-risk women with a short cervix, the prescription of progesterone in the second and third trimester is not associated with moderate neurodevelopmental delay at 2 years corrected age, but that the difference in (minor) congenital anomalies should be explored further. Although the sample size of this follow-up study is too small to detect small differences between study groups, these data can contribute to future meta-analyses that must answer whether progesterone is a safe drug to use during pregnancy.

Part III of this thesis focuses on the integration of outcomes derived from obstetrical evaluations into a systematic review/meta-analysis, a cost-effectiveness analysis and a budget impact analysis in order to guide clinical decision making. Chapter 5 provides an overview of the existing literature on the predictive value of brain MRI results for long-term developmental outcomes in children born preterm or with a low birth weight. The study descriptively aggregates the results of 20 papers. The prognostic accuracy of MRI, performed at term-equivalent age and evaluating the presence of moderate to severe white matter lesions, was highest for the prediction of cerebral palsy, with a sensitivity of 51% and a specificity of 93%. Its ability to predict other long-term outcomes, such as neurocognitive and behavioural impairments, is limited. We concluded that routine use of MRI in clinical practice is not recommended because of its moderate predictive value. However, routine use of MRI in a research setting, with adequate and uniform recording of the data, can help to generate future evidence on its prognostic capacity.
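A sensitivity of 51% combined with a specificity of 93% still gives only moderate predictive value. To show why, the sketch below derives likelihood ratios and predictive values from these two reported figures. The cerebral palsy prevalence of 10% is a hypothetical assumption for illustration and is not a figure from Chapter 5. Predictive values shift with prevalence, but the likelihood ratios do not.

```python
def likelihood_ratios(sensitivity: float, specificity: float) -> tuple[float, float]:
    """Return (LR+, LR-) for a binary test."""
    lr_pos = sensitivity / (1.0 - specificity)
    lr_neg = (1.0 - sensitivity) / specificity
    return lr_pos, lr_neg


def predictive_values(sensitivity: float, specificity: float,
                      prevalence: float) -> tuple[float, float]:
    """Return (PPV, NPV) via Bayes' theorem for a given pre-test prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Reported accuracy of moderate-to-severe white matter lesions on
# term-equivalent MRI for predicting cerebral palsy.
SENS, SPEC = 0.51, 0.93
# HYPOTHETICAL prevalence of cerebral palsy in the scanned population (assumption).
PREVALENCE = 0.10

lr_pos, lr_neg = likelihood_ratios(SENS, SPEC)
ppv, npv = predictive_values(SENS, SPEC, PREVALENCE)
print(f"LR+ = {lr_pos:.1f}, LR- = {lr_neg:.2f}")
print(f"PPV = {ppv:.0%}, NPV = {npv:.0%}")
```

Under the assumed prevalence, a positive scan raises the probability of cerebral palsy to roughly 45%. Regardless of prevalence, however, about half (1 minus the sensitivity) of the children who later develop cerebral palsy would not have been flagged.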

As most clinical studies only collect short-term outcomes, long-term data are lacking in >80% of large evaluation studies in obstetrics.1 Especially for interventions that target long-term outcome improvement, the lack of long-term outcome data constitutes a blind spot for clinical decision-making. This is the case in studies that evaluate interventions to prevent perinatal asphyxia. Asphyxia is a clinical condition of impaired oxygen supply or blood flow to the foetus, which can occur before the onset of labour and during labour. It is known to have immediate consequences, but can also lead to long-term neurological impairment.2 3 In Chapter 6 we used the data of an individual participant data meta-analysis4 that assessed the use of ST-analysis in electronic foetal monitoring (STAN), which aims to reduce neonatal asphyxia during labour. In a cost-effectiveness analysis two models were created: one from a maternal perspective and the other from a neonatal perspective. Short- and long-term costs and effects were evaluated for women and children monitored with STAN compared with electronic foetal monitoring (EFM) only. Results from the neonatal model showed that the STAN strategy reduced metabolic acidosis (asphyxia) from 1100 to 900 per 100 000 newborns, at an additional cost of € 14 509 per case of metabolic acidosis prevented. In the maternal model we found a reduction in instrumental deliveries of 1.5% in favour of STAN. The cost to prevent one instrumental delivery was estimated at € 2602. The long-term benefit of STAN depends strongly on the association between short-term metabolic acidosis and long-term cerebral palsy (CP). Evidence on this association is very heterogeneous, with probabilities ranging from 0.5 to 9.3%. Exploratory analysis showed that STAN becomes a cost-saving strategy if ≥1.3% of children exposed to metabolic acidosis develop CP. This study therefore suggests that STAN, compared with EFM alone, can be a cost-effective strategy from both the maternal and the neonatal perspective in the short term, and is potentially cost-saving in the long term.
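The neonatal result can be read as an incremental cost-effectiveness ratio (ICER): the extra cost of STAN divided by the number of cases of metabolic acidosis it prevents. The sketch below uses only the two reported figures to back-calculate the implied extra cost per newborn. It also illustrates a deliberately simplified long-term rule: STAN saves money when the expected cerebral palsy costs avoided per prevented acidosis case exceed the ICER. This rule is an assumption for illustration. It ignores discounting and all other downstream costs and effects, and the lifetime CP cost it implies is derived here, not reported in Chapter 6.

```python
def icer(delta_cost: float, delta_effect: float) -> float:
    """Incremental cost-effectiveness ratio: extra cost per extra unit of effect."""
    return delta_cost / delta_effect


def net_cost_per_case_prevented(icer_eur: float, p_cp: float,
                                lifetime_cp_cost_eur: float) -> float:
    """SIMPLIFIED assumption: extra cost per prevented acidosis case minus the
    expected CP costs avoided. Negative values mean the strategy saves money."""
    return icer_eur - p_cp * lifetime_cp_cost_eur


# Reported neonatal-model figures.
risk_efm = 1100 / 100_000            # metabolic acidosis with EFM only
risk_stan = 900 / 100_000            # metabolic acidosis with STAN
cost_per_case_prevented = 14_509.0   # EUR per case of metabolic acidosis prevented

risk_difference = risk_efm - risk_stan          # cases prevented per newborn
# Back-calculated, not reported: extra cost of STAN per monitored newborn.
implied_extra_cost = cost_per_case_prevented * risk_difference
# Round trip: the ICER formula recovers the reported cost per case prevented.
assert abs(icer(implied_extra_cost, risk_difference) - cost_per_case_prevented) < 1e-6

# Under the simplified rule, the reported 1.3% break-even threshold implies a
# lifetime cost per CP case of ICER / 0.013 (derived here, not reported).
implied_cp_cost = cost_per_case_prevented / 0.013

print(f"Cases prevented per 100 000 newborns: {risk_difference * 100_000:.0f}")
print(f"Implied extra cost per newborn: EUR {implied_extra_cost:.2f}")
for p_cp in (0.005, 0.093):  # ends of the reported range of P(CP | acidosis)
    net = net_cost_per_case_prevented(cost_per_case_prevented, p_cp, implied_cp_cost)
    print(f"P(CP) = {p_cp:.1%}: net cost per case prevented EUR {net:,.0f}")
```

Under these simplifications, STAN adds cost at the low end of the reported probability range and saves money at the high end. This is why the conclusion hinges on the heterogeneous evidence about acidosis and CP.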

Finally, the ultimate goal of performing evaluation studies of obstetrical interventions is to improve the health outcomes of mothers and their children at acceptable costs. Implementation of trial results is therefore a crucial step. Chapter 7 presents the predicted health and financial impact of implementing the results of eight nationwide evaluation studies in obstetrics. The potential budget impact of the individual studies, in terms of costs and effects, was extrapolated to the situation in the Netherlands. When the results of these eight studies are implemented, a beneficial effect on health outcomes can be expected in: (1) women suffering from pregnancy-induced hypertension and mild pre-eclampsia at term; (2) women in whom labour is induced; and (3) women with foetal monitoring by STAN analysis. De-implementation of non-effective practices, such as (4) prolonged tocolysis; (5) intrauterine pressure catheters; (6) progestagens to prevent preterm delivery in twins; (7) immediate induction of labour in preterm prelabour ruptured membranes; and (8) induction of labour at 35 completed weeks of gestation in intrauterine growth restriction, reduces costs. The potential cost reduction was estimated to be € 9.6 million per year on the basis of a one-time investment cost of € 3.1 million for the eight evaluation projects. We concluded that the financial and health benefits of useful clinical research more than offset the costs of performing it.
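The Chapter 7 figures can be read as a simple payback calculation: a one-time investment of € 3.1 million against recurring savings of € 9.6 million per year. The sketch below computes the payback period and the cumulative net saving over a chosen horizon. For illustration only, it assumes that the annual saving is constant and starts immediately, and it applies no discounting. The budget impact analysis itself may handle these aspects differently.

```python
def payback_years(investment: float, annual_saving: float) -> float:
    """Years until cumulative savings equal the one-time investment (no discounting)."""
    return investment / annual_saving


def cumulative_net_saving(investment: float, annual_saving: float, years: int) -> float:
    """Net saving after `years` of constant annual savings, minus the one-time cost."""
    return annual_saving * years - investment


INVESTMENT_MEUR = 3.1      # one-time cost of the eight evaluation projects (EUR million)
ANNUAL_SAVING_MEUR = 9.6   # estimated cost reduction per year (EUR million)

months = payback_years(INVESTMENT_MEUR, ANNUAL_SAVING_MEUR) * 12
print(f"Payback after about {months:.1f} months")
for horizon in (1, 5):
    net = cumulative_net_saving(INVESTMENT_MEUR, ANNUAL_SAVING_MEUR, horizon)
    print(f"Net saving after {horizon} year(s): EUR {net:.1f} million")
```

Under these assumptions, the investment is recovered within the first half year, which matches the conclusion that the benefits more than offset the costs.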

GENERAL DISCUSSION

In perinatal medicine, RCTs can help us to identify the best policies and interventions. A well-structured research question following the PICO(T) criteria is an important ingredient of an RCT. Systematic reviews and meta-analyses provide an overview of clinical trials on a topic and thereby provide the highest level of evidence, which is meant to guide clinical decision making (Figure 1, introduction section).
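As an illustration of a research question structured by PICO(T) (commonly expanded as Population, Intervention, Comparison, Outcome and Time), the sketch below encodes the follow-up question of Chapter 3 as a small data structure. The class, field names and sentence template are illustrative choices, not an established schema.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class PicotQuestion:
    population: str
    intervention: str
    comparison: str
    outcome: str
    time: str

    def missing_elements(self) -> list[str]:
        """Names of PICO(T) elements that were left empty."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]

    def as_sentence(self) -> str:
        """Render the question as a single readable sentence."""
        return (f"In {self.population}, does {self.intervention} compared with "
                f"{self.comparison} change {self.outcome} at {self.time}?")


# Example based on the Chapter 3 follow-up study summarised earlier.
pessary_question = PicotQuestion(
    population="women with a twin pregnancy and a cervical length < 38 mm",
    intervention="a cervical pessary",
    comparison="no pessary (control group)",
    outcome="death or survival with a neurodevelopmental disability",
    time="three years corrected age",
)

print(pessary_question.as_sentence())
print(pessary_question.missing_elements())  # [] when every element is specified
```

Making each element explicit shows at a glance whether an element, such as the timing of outcome assessment, has been left undefined.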

There are, however, some problems in the design of randomised clinical trials and other clinical evaluation studies that hamper the usefulness of clinical research. First, there is a lack of standardization in the selection and operationalization of outcomes. This may lead to inefficiency in research and a waste of resources.5 Second, more than 90% of large clinical trials in obstetrics lack follow-up outcomes.1 This can create a blind spot in clinical practice, as not all effects (benefit and harm) become apparent in the short term. The Dutch famine study has provided examples of this phenomenon.6 Finally, we are facing a gap between clinical research and its impact on clinical decision making.5

This thesis focused on each of these three problems, all of which are barriers to the optimal use of clinical research to improve the health of pregnant women who face a problem in their pregnancy. Using different methodologies, we have explored several strategies that aim to improve evaluation research in obstetric interventions.

I Towards standardisation of outcome measurement in obstetrical evaluation research

At present, the RCT is considered the optimal design to answer questions about the effectiveness of clinical interventions. There are standards for what should be addressed in the protocol (e.g. the SPIRIT guideline7), for the conduct of the trial (e.g. the ICH Good Clinical Practice guidelines8) and for what should be addressed in the final report (e.g. the CONSORT guideline9). Also, registration in a public trial registry at or before the onset of patient enrolment is mandatory (e.g. ClinicalTrials.gov, WHO registry). One of the goals of these standard practices is the reduction of ‘waste’ across medical research, a phenomenon that has been estimated to consume 85% of the billions spent on medical research each year.10 One of the items frequently mentioned in the literature exploring the different sources of research ‘waste’ is the use of inadequate outcomes. The outcomes that researchers have measured have not always been those that patients regard as most relevant,11 and the variety of outcomes used hampers comparison and meta-analysis of results. In preterm birth research, for example, 72 different primary outcomes were reported in 103 clinical trials and 29 different outcomes in 33 Cochrane reviews.12

Now that international consensus has been achieved on how to perform and report good-quality trials, a next step is to establish international consensus on which outcome measures make trial results more suitable for clinical decision making. This could be achieved by defining so-called ‘core outcome sets’: minimal sets of outcomes that relevant stakeholders consider critical for evaluating interventions for specific health conditions.13

The idea of core outcome sets was already put into practice in the 1990s, when a group of rheumatology researchers started to develop core outcome sets for rheumatology-related health problems (Outcome Measures in Rheumatoid Arthritis Clinical Trials, OMERACT).14 This work has increased research effectiveness in this field through the availability of comparable data and the possibility to pool data from different studies.15 The OMERACT team also shared their experience and stressed the importance of patient involvement in this process.11

There are now several international initiatives that support the idea of core outcome sets and deliberate on how to improve their methodology and dissemination. The ‘Core Outcome Measures in Effectiveness Trials’ (COMET) initiative, launched in 2010, aims to foster methodological research in this area through publications on core outcome set methodology and by organising yearly scientific meetings (www.comet-initiative.org). It has also developed a publicly available, searchable database of completed and ongoing core outcome set development projects. This will help prevent duplication of core outcome set projects and inspire new, relevant ones. The ‘Core Outcomes in Women’s and Newborn Health’ (CROWN) initiative is led by journal editors to harmonise outcome measurement and reporting in women’s health research.16 This consortium of 76 women’s health journals aims to encourage researchers to develop core outcome sets using robust consensus methodology, and to organise peer review and effective dissemination of manuscripts describing core outcome sets. By facilitating the dissemination of core outcome sets, the final goal is to improve the synthesis of evidence to generate recommendations for clinical practice.

To date, the first core outcome set for preterm birth studies endorsed by CROWN has been published (chapter 2), together with summary publications in several journals and languages.17 18 At present, 23 ongoing core outcome set projects on pregnancy and childbirth are registered in the COMET database. These include core outcome sets for hypertensive disorders in pregnancy, pre-eclampsia, gestational diabetes, intrauterine growth restriction, postpartum haemorrhage, pain management in labour and hyperemesis gravidarum (www.comet-initiative.org).

Future implications

This work on core outcome sets will only pay off if these sets are implemented in future study protocols and trials. We would like to suggest a roadmap on how to move forward:

1) Researchers, reviewers and guideline developers who experience a lack of core outcomes in a specific research area targeting a specific population need to address this. They can do so either by initiating a core outcome set project together with multinational key stakeholders, or by recommending one in their research or review manuscript and guideline protocol. Guidelines on how to develop a proper core outcome set project are needed.19

2) Researchers and clinicians involved in national data registries can implement the core outcome set in these registries.

3) Researchers and reviewers should incorporate the set of core outcomes in their research protocol and justify any core outcomes of the set that they are not able to collect.

4) Journal reviewers should encourage authors to report all core outcomes included in the core outcome set of a specific health area. Eventually, this could even become mandatory, just like reporting of the trial registration number and ethical approval. The CROWN initiative is preparing a simple guideline for reviewers about core outcome sets.16
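Steps 3 and 4 of this roadmap amount to a simple comparison between the outcomes in a trial protocol and those in the relevant core outcome set. The sketch below shows such a check. The function name and all outcome names are hypothetical placeholders. They do not reflect the content of the preterm birth core outcome set described in chapter 2.

```python
def check_against_core_set(core_outcomes: set[str],
                           protocol_outcomes: set[str],
                           justifications: dict[str, str]) -> dict[str, list[str]]:
    """Compare protocol outcomes with a core outcome set.

    Returns the core outcomes that are collected, those that are missing but
    justified, and those that are missing without any justification.
    """
    missing = core_outcomes - protocol_outcomes
    return {
        "collected": sorted(core_outcomes & protocol_outcomes),
        "missing_justified": sorted(o for o in missing if o in justifications),
        "missing_unjustified": sorted(o for o in missing if o not in justifications),
    }


# HYPOTHETICAL outcome names, for illustration only.
core_set = {"outcome A", "outcome B", "outcome C", "outcome D"}
protocol = {"outcome A", "outcome B", "trial-specific outcome"}
reasons = {"outcome C": "not measurable within the trial's follow-up period"}

report = check_against_core_set(core_set, protocol, reasons)
for category, outcomes in report.items():
    print(category, outcomes)
```

Trial-specific outcomes beyond the core set are allowed and do not appear in the report. Only core outcomes that are missing without justification would need attention from authors or reviewers.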

Improving awareness and dissemination of core outcome sets will hopefully increase the amount of comparable data that incorporates relevant outcomes. This will contribute to reducing unnecessary duplication of randomized trials and meta-analyses addressing the same research question, and will provide a valuable source of information for clinical guidelines.

II Evaluation of long-term outcomes in perinatal trials

Many interventions applied in pregnancy are evaluated for their efficacy and safety by measuring short-term maternal and neonatal outcomes only. The literature offers numerous examples of evaluation studies showing either short-term benefit or no benefit, but remarkable long-term harm20 21 or warning signs of long-term harm.22 23 Follow-up measurements in obstetrical evaluation studies are therefore pivotal. In chapters 3 and 4 we evaluated the long-term follow-up of two RCTs. In chapter 3 we performed a follow-up study of children born to mothers who used a cervical pessary to prevent preterm birth in twin pregnancy. As positive effects of the pessary on pregnancy prolongation and neonatal outcome had only been seen in women with a short cervix, we limited follow-up to that group. This choice was partly financially driven. With the available resources, we judged that follow-up was most needed in the group of women with the potential short-term effect, as potential long-term effects of the use of a pessary in pregnancy (including harm) were unknown. We also calculated upfront that we had enough power to detect possible clinically important differences. This was different for the follow-up study performed in chapter 4. Here we were confronted with an underpowered RCT evaluating the use of progesterone to prevent preterm birth in women with a short cervical length at screening. After reviewing the available literature, we were surprised by the small amount of long-term data on progesterone use in the second and third trimester of pregnancy. Before the start of this follow-up study, only two studies had been published, both using parental questionnaires.24 25 This confirmed that this follow-up study was highly needed, given the blind spot in long-term information on this widely used drug. New data on potential long-term effects of progesterone published in the last year demonstrated that there are indeed some concerns related to long-term health. The OPPTIMUM trial described an increased risk (although still at low frequency) of problems related to the renal, gastrointestinal and respiratory systems in the progesterone group (e.g. gastrointestinal disability in 4 (1%) in the placebo vs 9 (2%) in the progesterone group, OR [95% CI] 2.67 [1.37 to 5.20]).23 The follow-up of the PREDICT trial also reported an 8-fold increased risk of a cardiac problem in children exposed to progesterone, at 8 years of age.26

Challenges in design

To date there are no international standards on whether and how to perform follow-up of (obstetrical) trials. This can be explained by the fact that the design of the follow-up depends on the questions addressed in the initial trial and on the potential harms expected from pathophysiological reasoning.27 However, as explained in the section above about standardising outcomes, there is a huge potential gain from consistency in the outcomes used, for short-term as well as long-term outcomes. Consistency in the timing of follow-up and in assessment methods will help to achieve follow-up data that can be compared or pooled across studies. In the Netherlands, a national working group is addressing the need for a national structured follow-up of all children discharged from a neonatal intensive care unit. In 2000, this group published recommendations on how to perform standardized follow-up in these children (Table 1).28 Only recently has this been implemented in clinical practice. With this collaborative effort, more compatible data will be generated.

Table 1. Recommendations of the national working group on neonatal follow-up for children discharged from neonatal intensive care who were born at a gestational age <30 weeks and/or with a birthweight <1000 g and/or a birthweight <1500 g if below the tenth percentile28 29

Corrected age at follow-up | Type of assessment
6 months | Neurological assessment and general physical examination by paediatrician
12 months | Neurological assessment and general physical examination by paediatrician
24 months | Neurological assessment and general physical examination by paediatrician; Bayley Scales of Infant and Toddler Development (Bayley) test30 31; Child Behaviour Checklist32
5 years | Neurological assessment and general physical examination by paediatrician; neurocognitive assessment (including IQ, language, executive function, visuomotor assessment); Movement ABC33; Child Behaviour Checklist
8 years | Neurological assessment and general physical examination by paediatrician; Wechsler Preschool and Primary Scale of Intelligence (WPPSI) test34; Movement ABC; Child Behaviour Checklist
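The caption of Table 1 combines three eligibility criteria with ‘and/or’, so a child qualifies if any one of them holds. The sketch below encodes that rule and the assessment schedule from the table. The function and variable names are hypothetical. The birthweight percentile is assumed to be supplied by the caller (for example, from a birthweight reference chart). The strict ‘<’ boundaries follow the caption's wording.

```python
# Assessment schedule from Table 1, keyed by corrected age.
PHYSICAL = "neurological assessment and general physical examination by paediatrician"
FOLLOW_UP_SCHEDULE = {
    "6 months": [PHYSICAL],
    "12 months": [PHYSICAL],
    "24 months": [PHYSICAL, "Bayley test", "Child Behaviour Checklist"],
    "5 years": [PHYSICAL, "neurocognitive assessment", "Movement ABC",
                "Child Behaviour Checklist"],
    "8 years": [PHYSICAL, "WPPSI test", "Movement ABC", "Child Behaviour Checklist"],
}


def eligible_for_nicu_follow_up(gestational_age_weeks: float,
                                birthweight_g: float,
                                birthweight_percentile: float) -> bool:
    """Any one criterion suffices ('and/or' in the Table 1 caption)."""
    return (
        gestational_age_weeks < 30
        or birthweight_g < 1000
        or (birthweight_g < 1500 and birthweight_percentile < 10)
    )


# HYPOTHETICAL example: a child born at 32 weeks weighing 1400 g (5th percentile)
# qualifies through the third criterion only.
if eligible_for_nicu_follow_up(32, 1400, 5):
    for age, assessments in FOLLOW_UP_SCHEDULE.items():
        print(f"{age}: {', '.join(assessments)}")
```

Keeping the schedule as data rather than prose makes it straightforward to check whether a given follow-up study uses the same assessment moments and tests. That consistency is what makes the resulting data comparable across studies.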

A second problem of follow-up studies is that the available follow-up tools and tests are sometimes not capable of detecting subtle but clinically relevant differences. Screening tools (e.g. questionnaires such as the ASQ) are used instead of diagnostic tools (e.g. the Bayley test), although their predictive value for long-term (neuro)development is poor.35 Furthermore, for logistic and financial reasons, it is hardly feasible to cover the whole spectrum of long-term developmental or healthcare-related outcomes. The interpretation of a follow-up study is therefore always restricted to the type and timing of the tests chosen.36

Third, a follow-up study faces the problem of loss to follow-up. This can be due to mortality, but there is also a high risk of attrition. Attrition can cause selection bias in the results and complicates the data analysis needed to adjust for such bias.37

Challenges in interpretation of long-term follow-up data

The need for follow-up should not be confused with the need to prove effectiveness in the long term.36 The reason is that some short-term outcomes indeed do not have long-term consequences, but can still be relevant for the first period of life. An example is the different interpretation of the follow-up results of ORACLE I by two medical specialist organisations. In the original study, women with preterm rupture of membranes without signs of clinical infection were randomized to erythromycin or placebo. The study concluded that erythromycin decreased the risk of the primary outcome, a composite of death, chronic neonatal lung disease or major cerebral abnormality (11.2% vs 14.4%, p=0.02).38 The 7-year follow-up study, however, found no difference in the proportion of children with any functional impairment after prescription of erythromycin compared with placebo (38.3% vs 40.4%, OR [95% CI] 0.91 [0.79 to 1.05]). It concluded that the prescription of antibiotics for women with preterm rupture of membranes did not show a persisting effect on the health of children.39 The Dutch society of obstetrics and gynaecology (NVOG) probably interpreted these findings as ‘antibiotics do not show an effect in the long term in this group of women, and are therefore not useful’: its guideline does not recommend the use of antibiotics and cites the ORACLE I follow-up in its references.40 The British Royal College of Obstetricians and Gynaecologists (RCOG) probably interpreted these findings as ‘antibiotics do show a relevant short-term effect and do not show harm in the long term, and are therefore useful in this group of women’, as its guideline recommends the use of antibiotics.41 Follow-up outcomes should therefore not always be regarded as proof of efficacy, but rather as proof of safety.36

Future implications

Long-term follow-up is thus important to remove potential blind spots in clinical research by evaluating the full scope of potential effect and harm. However, realising it is challenging. Financial, logistical and time constraints stand in the way, and follow-up studies are difficult to design appropriately and their results difficult to interpret. When long-term follow-up cannot be achieved in clinical trials, we can also rely on long-term outcomes of cohort studies. Combining short-term outcomes obtained from trials with long-term outcomes obtained from cohort studies might be the best achievable approach to assess the effectiveness and safety of a treatment.

Furthermore, working towards standardization of follow-up will help to generate more consistently documented long-term follow-up data. Initiatives such as the recommendations of the national working group on neonatal follow-up (Table 1) will enhance this. The Dutch consortium for health evaluation in obstetrics and gynaecology is also currently working on the standardization and feasibility of follow-up assessment (e.g. using electronic questionnaires and a mobile bus as test location) for the randomized controlled trials performed within this consortium (www.studies-obsgyn.nl). This will help to generate long-term follow-up data that allow comparison and pooling across studies. Subsequently, this framework might be a first step towards the development of a core outcome set for follow-up studies of obstetrical interventions.

III Towards integration of outcomes of evaluation studies to guide clinical decision making

There is no such thing as a perfect study. However, a well-thought-out, well-designed, appropriately conducted and analyzed clinical trial is an effective tool to generate valid and clinically relevant evidence. On the other hand, poorly designed and conducted trials can be misleading. Also, without supporting evidence, no single study ought to be definitive.42 Therefore, integration of outcomes of well-designed evaluation studies can be powerful in providing crucial information for clinical decision making. In chapter 5 the method of meta-analysis is used to aggregate data from cohort studies to assess the prognostic value of term MRI in preterm-born infants for long-term developmental outcomes. The work in this chapter started from a clinically based research question and therefore has a higher likelihood of being implemented in clinical practice. Furthermore, the Cochrane Collaboration approached our research team to repeat this work within the Cochrane framework.

The method of cost-effectiveness analysis allows us to model the expected benefits and costs of two or more interventions in order to determine whether the expected results of an intervention are ‘worth’ the added costs. In chapter 6 we used data from an individual patient data (IPD) meta-analysis to model the potential benefits and costs of ST-analysis in foetal monitoring during labour (STAN) compared with foetal monitoring only. We concluded that STAN can be a cost-effective strategy for both mother and child.

Finally, in this era of constrained health-care budgets, the underlying goal of public health-care allocation decisions is to attain maximal health benefit for a given budget. The ultimate goal of evaluation research is to maximize health benefit. When an intervention is found to be effective, this presumably results in a population health benefit when it is implemented in clinical practice. An example is the effect on perinatal health of implementing policies to discourage tobacco use during pregnancy in various populations (e.g. rates of stillbirth, neonatal mortality, preterm birth and low birth weight).43 44 When an existing intervention is found to be ineffective, its de-implementation in clinical practice might reduce costs while maintaining a comparable health benefit.

The method of budget impact analysis allows us to estimate the potential impact of implementation and de-implementation of interventions on budget and health. We used this method to evaluate the potential impact on national health and budget of implementing the results of obstetrical evaluation studies performed within the Dutch obstetrical consortium (chapter 7). Strikingly, the potential cost reduction was estimated to be € 9.6 million per year on the basis of a one-time investment cost of € 3.1 million for the eight evaluation projects. The potential cost reduction for the health-care budget driven by evidence from clinical research therefore more than offsets the costs of performing this research, a conclusion in line with earlier findings.45 This paper also contributed to a debate on whether health insurance companies should consider investing in evaluation research,46 as they are the first to profit from the cost reductions achieved by implementing effective interventions and de-implementing ineffective ones.

Future implications

Studies should be designed to make an optimal contribution to the body of evidence, so that patients, clinicians and decision makers can be confident about the magnitude and uncertainty of benefits and harms. These studies should be judged on their clinical impact and their ability to change practice. Ideally, studies that are launched should be clinically useful regardless of their eventual results.47 Future development should therefore focus on proper study design, standardised assessment of short- and long-term outcomes, and international collaboration. The ultimate goal is solid aggregation of data to guide clinical decision making. An individual participant data (IPD) meta-analysis is a design containing all these key elements. IPD meta-analysis can provide additional relevant results by using patient-level information, thus allowing evaluation of subgroups and individualized approaches to health-care delivery. An IPD meta-analysis can be performed retrospectively (when all studies have been completed) or prospectively (at the start or recruitment phase of several studies). Ideally, we should all collect data prospectively, anticipating future use within an international IPD collaboration. This will allow optimal use of research data to generate and improve the available evidence, enabling patients, clinicians and decision makers to be confident about the effects and possible harms of clinical interventions.
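The simplest way to combine results across studies is inverse-variance pooling of study-level effect estimates. This applies to aggregate data and also to the second stage of a two-stage IPD meta-analysis. The sketch below pools log odds ratios under a fixed-effect (common-effect) model. The three studies and their counts are entirely hypothetical. Real IPD analyses typically also consider random-effects or one-stage models and subgroup interactions, which this sketch does not implement.

```python
import math


def log_odds_ratio(events_t: int, total_t: int,
                   events_c: int, total_c: int) -> tuple[float, float]:
    """Log OR and its variance from one study's 2x2 table (no zero cells assumed)."""
    a, b = events_t, total_t - events_t
    c, d = events_c, total_c - events_c
    log_or = math.log((a * d) / (b * c))
    variance = 1 / a + 1 / b + 1 / c + 1 / d
    return log_or, variance


def fixed_effect_pool(studies):
    """Inverse-variance fixed-effect pooled OR with a 95% confidence interval."""
    estimates = [log_odds_ratio(*s) for s in studies]
    weights = [1 / var for _, var in estimates]
    pooled = sum(w * y for w, (y, _) in zip(weights, estimates)) / sum(weights)
    se = math.sqrt(1 / sum(weights))
    return tuple(math.exp(x) for x in (pooled, pooled - 1.96 * se, pooled + 1.96 * se))


# HYPOTHETICAL studies: (events treated, n treated, events control, n control).
studies = [(12, 100, 20, 100), (8, 80, 15, 85), (30, 250, 41, 240)]
or_pooled, lower, upper = fixed_effect_pool(studies)
print(f"Pooled OR {or_pooled:.2f} [95% CI {lower:.2f} to {upper:.2f}]")
```

With patient-level data, the per-study estimates in the first stage could be adjusted for the same covariates in every study. This consistency is one advantage of IPD over pooling published aggregate results.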


REFERENCES

2. Ehrenstein V, Pedersen L, Grijota M, et al. Association of Apgar score at five minutes with long-term neurologic disability and cognitive function in a prevalence study of Danish conscripts. BMC Pregnancy Childbirth 2009;9:14.

3. Malin GL, Morris RK, Khan KS. Strength of association between umbilical cord pH and perinatal and long term outcomes: systematic review and meta-analysis. BMJ 2010;340:c1471.

4. Schuit E, Amer-Wahlin I, Ojala K, et al. Effectiveness of electronic fetal monitoring with additional ST analysis in vertex singleton pregnancies at >36 weeks of gestation: an individual participant data meta-analysis. Am J Obstet Gynecol 2013;208(3):187.e1-187.e13.

6. Roseboom T, de Rooij S, Painter R. The Dutch famine and its long-term consequences for adult health. Early Hum Dev 2006;82(8):485-91.

7. Chan AW, Tetzlaff JM, Altman DG, et al. SPIRIT 2013: new guidance for content of clinical trial protocols. Lancet 2013;381(9861):91-2.

8. International Conference on Harmonization of Technical Requirements for Registration of Pharmaceuticals for Human Use. ICH Harmonized Tripartite Guideline. Guideline for Good Clinical Practice E6(R1). 1996.

9. Schulz KF, Altman DG, Moher D, et al. CONSORT 2010 statement: updated guidelines for reporting parallel group randomised trials. BMJ 2010;340:c332.

10. Macleod MR, Michie S, Roberts I, et al. Biomedical research: increasing value, reducing waste. Lancet 2014;383(9912):101-4.

11. Kirwan JR, Minnock P, Adebajo A, et al. Patient perspective: fatigue as a recommended patient centered outcome measure in rheumatoid arthritis. J Rheumatol 2007;34(5):1174-7.

12. Meher S, Alfirevic Z. Choice of primary outcomes in randomised trials and systematic reviews evaluating interventions for preterm birth prevention: a systematic review. BJOG 2014;121(10):1188-94.

14. OMERACT, Conference on Outcome Measures in Rheumatoid Arthritis Clinical Trials. Proceedings. Maastricht, The Netherlands, April 29-May 3, 1992. J Rheumatol 1993;20(3):527-91.

15. Kirkham JJ, Boers M, Tugwell P, et al. Outcome measures in rheumatoid arthritis randomised trials over the last 50 years. Trials 2013;14:324.

16. Khan K. The CROWN Initiative: journal editors invite researchers to develop core outcomes in women’s health. BJOG 2014;121(10):1181-2.

17. van ‘t Hooft J. Crucial outcomes for preterm birth studies [Cruciale uitkomsten voor vroeggeboortestudies]. NTOG 2016;129(03):114-16.

18. van ‘t Hooft J. A core outcome set for evaluation of interventions to prevent preterm birth: summary for CROWN. BJOG 2016;123(5):666.

19. Duffy J, McManus RJ. Influence of methodology upon the identification of potential core outcomes. Recommendations for core outcome set developers are needed. BJOG 2016.

21. Thorp JA, O’Connor M, Jones AM, et al. Does perinatal phenobarbital exposure affect developmental outcome at age 2? Am J Perinatol 1999;16(2):51-60.

22. Wapner RJ, Sorokin Y, Mele L, et al. Long-term outcomes after repeat doses of antenatal corticosteroids. N Engl J Med 2007;357(12):1190-8.

24. Northen AT, Norman GS, Anderson K, et al. Follow-up of children exposed in utero to 17 alpha-hydroxyprogesterone caproate compared with placebo. Obstet Gynecol 2007;110(4):865-72.

25. Rode L, Klein K, Nicolaides KH, et al. Prevention of preterm delivery in twin gestations (PREDICT): a multicenter, randomized, placebo-controlled trial on the effect of vaginal micronized progesterone. Ultrasound Obstet Gynecol 2011;38(3):272-80.

26. Vedel C, Larsen H, Holmskov A, et al. Long-term effects of prenatal progesterone exposure: Neurophysiological development and hospital admissions in twins up to 8 years of age. Ultrasound Obstet Gynecol 2016.

27. Marlow N. Measuring neurodevelopmental outcome in neonatal trials: a continuing and increasing challenge. Arch Dis Child Fetal Neonatal Ed 2013;98(6):F554-8.

28. Van Baar AL, den Ouden AL, Kollee LA. Development of children with perinatal risk factors: theoretical background, literature and implementation [Ontwikkeling van kinderen met perinatale risicofactoren: theoretische achtergrond, literatuurgegevens en implementatie in de praktijk]. Tijdschrift voor Kindergeneeskunde 2000;68(6):210-16.

29. Recommendations National Neonatal Follow-up - NICU follow-up [Aanbeveling Landelijke Neonatale Follow-up - NICU follow-up]. 2015. https://www.nvk.nl/Portals/0/richtlijnen/Neonatale%20Follow%20up%20NICU/Richtlijn%20LNF%20januari%202015.pdf.

32. Achenbach T, Rescorla L. ASEBA Child Behavior Checklists for Ages 1.5-5 Years (CBCL/1.5-5). ASEBA.

33. Henderson SE, Sugden DA. Movement ABC-2 NL Movement assessment battery for children: Pearson, 2010.

34. Hendriksen J, Hurks P. WPPSI-III-NL Wechsler preschool and primary scale of intelligence, 2009.

35. Steenis LJ, Verhoeven M, Hessen DJ, et al. Parental and professional assessment of early child development: the ASQ-3 and the Bayley-III-NL. Early Hum Dev 2015;91(3):217-25.

36. Marlow N. Is survival and neurodevelopmental impairment at 2 years of age the gold standard outcome for neonatal studies? Arch Dis Child Fetal Neonatal Ed 2015;100(1):F82-4.

37. Parekh SA, Field DJ, Johnson S, et al. Accounting for deaths in neonatal trials: is there a correct approach? Arch Dis Child Fetal Neonatal Ed 2015;100(3):F193-7.

38. Kenyon SL, Taylor DJ, Tarnow-Mordi W, et al. Broad-spectrum antibiotics for preterm, prelabour rupture of fetal membranes: the ORACLE I randomised trial. ORACLE Collaborative Group. Lancet 2001;357(9261):979-88.

39. Kenyon S, Pike K, Jones DR, et al. Childhood outcomes after prescription of antibiotics to pregnant women with preterm rupture of the membranes: 7-year follow-up of the ORACLE I trial. Lancet 2008;372(9646):1310-8.

40. NVOG. Richtlijn dreigende vroeggeboorte. Secondary Richtlijn dreigende vroeggeboorte 2012. http://nvog-documenten.nl.

41. Guidelines R. preterm prelabour rupture of membranes. Secondary preterm prelabour rupture of membranes 2010. www.rcog.org.uk.

42. Friedman LF, Furberg CD, DeMets DL. Fundamentals of Clinical Trials: Springer, 4th ed. 2010.43. Belfort MA, Saade GR, Thom E, et al. A Randomized Trial of Intrapartum Fetal ECG ST-Segment

Analysis. N Engl J Med 2015;373(7):632-41.44. Peelen MJ, Sheikh A, Kok M, et al. Tobacco control policies and perinatal health: a national quasi-

experimental study. Sci Rep 2016;6:23907.45. Been JV, Mackenbach JP, Millett C, et al. Tobacco control policies and perinatal and child health: a

systematic review and meta-analysis protocol. BMJ Open 2015;5(9):e008398.46. Detsky AS. Are clinical trials a cost-effective investment? JAMA 1989;262(13):1795-800.47. Visser E. Hele zorg halve waarheid. Volkskrant 2014.48. Ioannidis JP. Why Most Clinical Research Is Not Useful. PLoS Med 2016;13(6):e1002049.

APPENDICES

DUTCH SUMMARY (NEDERLANDSE SAMENVATTING)

LIST OF COAUTHORS AND THEIR CONTRIBUTION

PHD PORTFOLIO

LIST OF PUBLICATIONS

CURRICULUM VITAE

ACKNOWLEDGMENTS (DANKWOORD)

Appendices

DUTCH SUMMARY (NEDERLANDSE SAMENVATTING)

IMPROVING EVALUATION OF OBSTETRIC INTERVENTIONS

Most pregnant couples experience an uncomplicated pregnancy and delivery and go home with a healthy baby. Unfortunately, complications can also arise during pregnancy or delivery that lead to a worse outcome for mother and child. When a woman is confronted with a complication during her pregnancy or delivery (such as preterm birth, fetal distress during labor or high blood pressure), the challenge for the care provider is to improve the outcomes for mother and child.

But which intervention is best in this situation? The answer can be found by conducting evaluation studies. A randomized controlled trial (RCT) is regarded as the optimal design for studying the effect of an intervention. In this design, participants are allocated by lot (randomization) to two (or more) equal groups. Randomization makes the groups, on average, comparable in all aspects and patient characteristics except for the intervention they receive: one group receives intervention A and the other group intervention B. Although the results of a well-conducted RCT provide a sound answer, they cannot stand on their own when it comes to clinical decision-making.1 At the conventional 5% significance level, a trial of an intervention that has no real effect still has a 5% chance of finding an effect by chance alone. This risk is reduced by systematically collecting all RCTs on a given topic and pooling their data in a meta-analysis. Systematic reviews (SRs) and meta-analyses sit at the top of the pyramid of scientific evidence (Figure 1), and the information they provide is useful to support clinical decision-making.2
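Pooling in a meta-analysis can be illustrated with the inverse-variance (fixed-effect) method, one standard way of combining trial results; this summary does not say which pooling model the individual chapters used. The minimal sketch below uses three hypothetical trials reported as log risk ratios with standard errors; all numbers are invented for illustration.

```python
import math


def pool_fixed_effect(estimates, standard_errors):
    """Inverse-variance fixed-effect pooling of per-trial estimates (e.g. log risk ratios).

    Each trial is weighted by 1 / SE^2, so larger, more precise trials count more.
    Returns the pooled estimate and its standard error.
    """
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    pooled = sum(w * est for w, est in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled_se


# Hypothetical trials: risk ratios 0.80, 0.70 and 0.95 on the log scale, with their SEs
log_rr = [math.log(0.80), math.log(0.70), math.log(0.95)]
se = [0.30, 0.25, 0.35]

pooled, pooled_se = pool_fixed_effect(log_rr, se)
low, high = pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se
print(f"pooled RR {math.exp(pooled):.2f} (95% CI {math.exp(low):.2f} to {math.exp(high):.2f})")
print(f"pooled SE {pooled_se:.3f} vs smallest single-trial SE {min(se):.2f}")
```

The pooled standard error is smaller than that of any single trial, which is why a combined estimate is less easily driven by chance. When trials differ substantially, a random-effects model is the usual alternative.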

RCTs and SRs aim to answer a clinical question (structured according to PICO: Population, Intervention, Comparison, Outcome; Figure 1) that concerns a particular population. An intervention group is compared with a control group, and outcomes are measured that indicate whether the intervention is effective (and safe). Outcomes measured in obstetric evaluation studies include, for example, 'gestational age at birth' or 'admission of the baby to a neonatal intensive care unit'. These are examples of outcomes measured in the short term after birth. Outcomes measured in the long term after birth include 'motor, cognitive and language development at 2 years', but also outcomes that only become apparent later, such as 'cerebral palsy' (a condition in which children have difficulty with movement and may also develop hearing, vision and learning problems). For several reasons, RCTs and SRs often restrict themselves to short-term outcomes: only 16% of large obstetric RCTs report both short-term and long-term outcomes.3 In addition, there is much variation in the outcomes that are measured. An overview of all published RCTs on preterm birth found 72 different outcomes in 133 RCTs.4 SRs face the same problem: 33 SRs on preterm birth reported 29 different outcomes.4 Because so many different outcomes are measured, study results can no longer be compared with one another, nor can they be combined in an SR.

Another overarching problem is the gap between scientific research and the applicability of its results in clinical practice. The results of RCTs often fail to be translated into practice, so that the research does not benefit patients' health.5 6 Integrating RCT outcomes into other types of research (for example meta-analysis, as described above, but also cost-effectiveness and budget impact analyses) can help bridge this gap to clinical practice and guide clinical decision-making.


Figure 1. Schematic overview of evaluation research on obstetric interventions and the problems that arise

This thesis addresses three problems:

· Many different outcomes are measured in evaluation studies

· Few long-term outcomes are measured in evaluation studies

· Evaluation studies often fail to be translated into clinical practice

Together, these problems make evaluation research less efficient than it could be. The aim of this thesis is to propose ways in which these problems can be improved or addressed.

This thesis aims to:

· Define a standard set of important outcomes that should, at a minimum, be measured in all future evaluation studies on the same topic: a so-called 'core outcome set'.

· Measure long-term outcomes in two randomized evaluation studies in obstetrics.

· Integrate outcomes of evaluation studies into other studies whose methods (for example meta-analysis, cost-effectiveness analysis and budget impact analysis) focus on translating scientific evidence into clinical decision-making.

I Defining a standard set of important outcomes

After a general introduction in chapter 1, chapter 2 addresses the problem of the large number of different outcomes used in preterm birth studies, which prevents them from being compared with one another or pooled in meta-analyses. A core outcome set is a list of outcomes that a group of stakeholders (medical professionals, patients, government, researchers, industry) has identified as 'critical'.7 This chapter aimed to develop a core outcome set for studies evaluating preventive measures against preterm birth (e.g. progesterone, pessary, cerclage, lifestyle interventions). First, all potential outcomes were identified through a systematic review, questionnaires and interviews. Next, five stakeholder groups (patients, midwives, perinatologists, neonatologists and researchers) were asked to score the outcomes for relevance in an online Delphi survey. For each outcome, participants rated how relevant they considered it for evaluating effects in studies on preventing preterm birth, on a 9-point scale on which 1-3 represented limited relevance, 4-6 relevant but not critical, and 7-9 critical. In the Delphi method, participants can revise their answers from a previous round in light of the answers given by all stakeholders (and stakeholder groups); in this way, the process works towards consensus. In total, 25 parents of preterm-born children, 25 midwives, 55 perinatologists, 34 neonatologists and 35 researchers/methodologists from 25 (low-, middle- and high-income) countries took part in the two Delphi rounds. The care providers (midwives, perinatologists and neonatologists) reported working mainly in clinical practice; however, 61% of them were also involved in developing (inter)national guidelines. Of the 227 outcomes identified in the literature and 33 outcomes additionally suggested by participants, consensus was reached on a set of 13 'critical' outcomes (Table 1).
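The three-band reading of the 9-point scale can be made explicit in code. The sketch below classifies scores into the bands described above and then applies a consensus rule. That rule (at least 70% scoring 7-9 and fewer than 15% scoring 1-3) is a definition often used in core outcome set studies and is an assumption here, since this summary does not state the exact criterion used in chapter 2. The scores are hypothetical.

```python
from collections import Counter

BANDS = ("limited relevance", "relevant but not critical", "critical")


def relevance_band(score):
    """Map a 1-9 Delphi score to the three bands used in the survey."""
    if not 1 <= score <= 9:
        raise ValueError("Delphi scores must lie between 1 and 9")
    if score <= 3:
        return BANDS[0]
    if score <= 6:
        return BANDS[1]
    return BANDS[2]


def band_proportions(scores):
    """Share of participants whose score falls in each band."""
    counts = Counter(relevance_band(s) for s in scores)
    return {band: counts[band] / len(scores) for band in BANDS}


def reaches_consensus_in(scores, critical_min=0.70, limited_max=0.15):
    """ASSUMED rule: >= 70% score 7-9 and < 15% score 1-3 (not stated in this summary)."""
    shares = band_proportions(scores)
    return shares["critical"] >= critical_min and shares["limited relevance"] < limited_max


# Hypothetical scores given by one stakeholder group to one candidate outcome
scores = [9, 8, 7, 7, 9, 6, 8, 7, 9, 3]
print(band_proportions(scores))
print("consensus 'in':", reaches_consensus_in(scores))
```

In a multi-stakeholder Delphi such a rule would typically be checked per stakeholder group, with the group results fed back to participants before the next round.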

Table 1. Core outcome set for preterm birth studies: 13 critical outcomes defined by Delphi consensus among five stakeholder groups.

Maternal outcomes

· Maternal mortality
· Maternal infection or inflammation
· Preterm prelabor rupture of membranes (PPROM)
· Adverse effects/harm of the intervention to the mother

Neonatal outcomes

· Neonatal mortality
· Neonatal infection
· Gestational age at birth
· Adverse effects/harm of the intervention to the fetus/neonate
· Birth weight
· Early neurological morbidity
· Late developmental delay
· Gastrointestinal morbidity
· Respiratory morbidity

II Measuring long-term outcomes

There is a shortage of evaluation studies that measure long-term as well as short-term results. In chapters 3 and 4 we report long-term results of two RCTs that evaluated interventions to prevent preterm birth (pessary and progesterone).

Long-term results of the ProTWIN study

Women with a multiple pregnancy have a 50% chance of giving birth preterm (<37 weeks of gestation) and a 15% chance of giving birth before 32 weeks. Delivery before 32 weeks increases the risk of developmental delay in the children, so preventing preterm birth in this group could bring great benefit to mother and children. Preterm labor may be triggered by increased pressure on the cervix from the greater weight of a multiple pregnancy. A pessary (a rubber ring) placed around the cervix is considered a potential intervention to prevent preterm birth by giving the cervix extra support.

The ProTWIN study randomized women with a multiple pregnancy at 12 to 20 weeks of gestation to a pessary or no pessary.8 In total, 808 women took part. The study showed no beneficial effect of the pessary in the overall group. However, in the subgroup of women with a short cervix (i.e. women with reduced cervical support), a potential effect was seen: births before 32 weeks were more than halved in the pessary group compared with the control group (11% vs 25%), gestation was prolonged by 10 days on average, and the risk of a child with a poor outcome (including death and short-term morbidity) fell by 60% (12% vs 29%).8 Chapter 3 examines the long-term effects of pessary versus no pessary in this group of women with a short cervix (78 vs 55 mothers, 157 vs 111 children).
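The relative reductions quoted above follow from the reported percentages. A minimal sketch, computing the risk ratio (RR) and relative risk reduction (RRR) from the rounded percentages; the published analysis may use exact counts and adjustments, so small differences are expected.

```python
def relative_risk_reduction(risk_treated, risk_control):
    """RRR = 1 - RR, where RR = risk_treated / risk_control."""
    return 1.0 - risk_treated / risk_control


# Rounded percentages reported for ProTWIN women with a short cervix (pessary, control)
outcomes = {
    "birth before 32 weeks": (0.11, 0.25),
    "poor perinatal outcome": (0.12, 0.29),
}

for name, (p_pessary, p_control) in outcomes.items():
    rr = p_pessary / p_control
    rrr = relative_risk_reduction(p_pessary, p_control)
    print(f"{name}: RR {rr:.2f}, relative risk reduction {rrr:.0%}")
```

With these rounded inputs the reduction in poor outcome is about 59%, consistent with the 60% reported, and births before 32 weeks fall by about 56%.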

At three years of age, the children underwent a developmental assessment (Bayley-III). In total, 27 children died (6 in the pessary group and 21 in the control group) between randomization and the 3-year follow-up. Bayley-III outcomes were measured in 173 (72%) children (114 in the pessary group vs 59 in the control group). The cumulative rate of death or survival with developmental delay was 12 (10%) vs 23 (29%) in the pessary and control groups, respectively. Among surviving children, Bayley-III outcomes did not differ. We therefore conclude that in women with a multiple pregnancy and a short cervix, a pessary greatly increases the chance of survival without an increase in developmental delay among surviving children at three years of age.
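The follow-up figures can be cross-checked from the counts given above. In the sketch below, the follow-up rate is computed among children alive at three years. The denominator used for the composite outcome (assessed survivors plus deaths, i.e. children whose outcome is known) is our assumption: it reproduces the reported 10% and 29%, but this summary does not state it.

```python
# Counts reported above for the ProTWIN short-cervix follow-up (pessary vs control)
randomized = {"pessary": 157, "control": 111}     # children
deaths = {"pessary": 6, "control": 21}            # randomization to 3 years
assessed = {"pessary": 114, "control": 59}        # Bayley-III measured
death_or_delay = {"pessary": 12, "control": 23}   # composite outcome

survivors = {g: randomized[g] - deaths[g] for g in randomized}
follow_up_rate = sum(assessed.values()) / sum(survivors.values())
print(f"survivors {survivors}; Bayley-III follow-up {follow_up_rate:.0%} of survivors")

for group in randomized:
    # ASSUMED denominator: children with a known outcome (assessed survivors + deaths)
    known = assessed[group] + deaths[group]
    print(f"{group}: death or delay {death_or_delay[group]}/{known} "
          f"= {death_or_delay[group] / known:.1%}")
```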


Long-term results of the Triple P study

Although women with a multiple pregnancy and women with a previous preterm birth are at high risk of preterm birth, most preterm births occur in women with a singleton pregnancy who have neither of these risk factors. Women with a short cervix at the 20-week scan (cervical length ≤30 mm) have a 3- to 4-fold increased risk of preterm birth.9 Screening cervical length around 20 weeks of gestation and treating women identified with a short cervix could therefore potentially prevent a large number of preterm births. It has previously been shown that daily progesterone from 16 weeks of gestation reduces the risk of recurrent preterm birth in women with a previous preterm birth.10 Progesterone may therefore also have a preventive effect in women with a short cervix around 20 weeks.

The Triple P study investigated this by randomizing women with a singleton pregnancy and a short cervix found at screening to progesterone or placebo.11 In total, 80 women took part. Neither the doctors nor the women knew which treatment was given. The study showed no difference between the groups, but too few women had been enrolled to obtain reliable results: 1920 women were originally planned to be randomized, whereas only 80 were actually included. This was mainly due to an unexpectedly low number of women with a short cervix.12 Chapter 4 examines the long-term effects of progesterone versus placebo (41 vs 39 women). At two years of age, the children underwent a developmental assessment (Bayley-III). Of the 77 surviving children in the Triple P study, Bayley-III outcomes were measured in 57 (74%; 28 progesterone vs 29 placebo). No differences in development were found between the two groups, which suggests that progesterone use in pregnancy has no harmful long-term effects. However, because only a small number of children could be examined, these results do not establish the safety of this drug.
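Why a null result in 57 children cannot establish safety can be made concrete with a standard power calculation. The sketch below uses the normal approximation for comparing two means. It assumes Bayley-III composite scores (normed to a standard deviation of 15 points), a two-sided α of 0.05 and 80% power; these are illustrative assumptions, not the analysis reported in chapter 4.

```python
from math import sqrt
from statistics import NormalDist


def detectable_difference(n1, n2, sd, alpha=0.05, power=0.80):
    """Smallest true difference in means detectable with the given power.

    Normal approximation for a two-sided two-sample comparison with a common SD.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    standard_error = sd * sqrt(1 / n1 + 1 / n2)
    return (z_alpha + z_power) * standard_error


# Figures reported above for Triple P
planned, enrolled = 1920, 80
surviving, assessed = 77, 57
print(f"enrolled {enrolled / planned:.1%} of the planned sample")
print(f"Bayley-III follow-up: {assessed / surviving:.0%} of surviving children")

# ASSUMPTION: Bayley-III composite scores with a normed SD of 15 points
print(f"detectable difference: {detectable_difference(28, 29, sd=15):.1f} points")
```

With 28 and 29 children, only a true difference of roughly 11 points (about three-quarters of a standard deviation) would reliably be detected; smaller but still clinically relevant differences could go unnoticed.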

III Integrating outcomes of evaluation studies into other studies to translate scientific evidence into clinical decision-making

Meta-analysis

We described earlier the strength of pooling information from RCTs in systematic reviews and meta-analyses. In chapter 5 we use this method to examine whether long-term outcomes in preterm-born children can be predicted by brain MRI performed shortly before discharge from hospital. The analysis covered all children born before 32 weeks of gestation or with a birth weight below 1500 g, because these children have the highest risk of developmental delay. There are growing indications that white-matter abnormalities in the brain, visible on MRI, are an important predictor of developmental delay. Predicting developmental delay can be very valuable, because parents can then prepare better for the future and possible support and interventions to promote development can be considered earlier. However, if the predictive value of a test is low, parents may be worried unnecessarily or falsely reassured.13 In this chapter, all studies of MRI and long-term development were systematically identified and assessed for quality, and their data were pooled in a meta-analysis. In total, 20 studies were identified. The results showed that MRI is only moderately able to predict cerebral palsy: sensitivity (detecting children who develop the condition) and specificity (correctly identifying children who do not) were 51% and 93%, respectively. MRI was even less able to predict other developmental delays (in behavior, motor skills and cognition). We conclude that routine MRI is not (yet) suitable for clinical practice. However, MRI performed in a research setting may, in the future, give more insight into the added value of this technique for predicting children's development.
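Sensitivity and specificity alone do not tell parents how likely an individual MRI result is to be right; that also depends on how common cerebral palsy is in the children scanned. The sketch below applies Bayes' theorem to the pooled accuracy reported above. The prevalences are assumptions chosen only for illustration, not figures from chapter 5.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Positive and negative predictive value from Bayes' theorem."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    return true_pos / (true_pos + false_pos), true_neg / (true_neg + false_neg)


# Pooled MRI accuracy for cerebral palsy reported in chapter 5
SENSITIVITY, SPECIFICITY = 0.51, 0.93

# ASSUMED prevalences of cerebral palsy in the scanned group (illustrative only)
for prevalence in (0.05, 0.10):
    ppv, npv = predictive_values(SENSITIVITY, SPECIFICITY, prevalence)
    missed = 1000 * (1 - SENSITIVITY) * prevalence
    print(f"prevalence {prevalence:.0%}: PPV {ppv:.0%}, NPV {npv:.0%}, "
          f"cases missed per 1000 children scanned {missed:.0f}")
```

At an assumed prevalence of 10%, a positive MRI would be correct less than half the time, and about half of the children who develop cerebral palsy would have had a reassuring scan.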

Cost-effectiveness analysis

Another method that can help translate information from evaluation research into clinical practice is cost-effectiveness analysis (CEA). A CEA weighs the costs and effects of two or more interventions against each other, to determine whether the effects of an intervention are 'worth' the costs.14 In chapter 6, data from a meta-analysis were used to assess whether a specific fetal monitoring technique, ST analysis of the fetal ECG (STAN), adds value compared with conventional fetal monitoring by cardiotocography (CTG). Long-term costs for mother and child were incorporated in a model, giving an impression of the short- and long-term effects and costs for both mother and child. The results showed that STAN requires an additional investment of €14,509 per case of metabolic acidosis* prevented in the child. An additional investment of €2,602 is needed per instrumental vaginal delivery prevented in favor of a spontaneous vaginal delivery (with an equal number of caesarean sections). We conclude that STAN may be a cost-effective strategy compared with CTG for fetal monitoring.

*Metabolic acidosis: a low pH measured in the umbilical cord blood of the baby shortly after birth, which in 0.5 to 9% of cases may mean that the child later develops cerebral palsy.
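Figures such as "€14,509 per case of metabolic acidosis prevented" are incremental cost-effectiveness ratios (ICERs): the difference in costs divided by the difference in effects. The sketch below shows only the arithmetic; the costs and case counts are hypothetical and are not the inputs of the chapter 6 model.

```python
def icer(cost_new, cost_old, effect_new, effect_old):
    """Incremental cost-effectiveness ratio: extra cost per extra unit of health effect."""
    delta_effect = effect_new - effect_old
    if delta_effect == 0:
        raise ValueError("equal effects: ICER is undefined")
    return (cost_new - cost_old) / delta_effect


# HYPOTHETICAL inputs per 1000 monitored deliveries (not the chapter 6 model data)
deliveries = 1000
cost_stan, cost_ctg = 3_050.0 * deliveries, 3_000.0 * deliveries  # total euros
# Effect: deliveries WITHOUT metabolic acidosis (6 vs 10 cases assumed)
effect_stan, effect_ctg = deliveries - 6, deliveries - 10

ratio = icer(cost_stan, cost_ctg, effect_stan, effect_ctg)
print(f"EUR {ratio:,.0f} per case of metabolic acidosis prevented")
```

Whether such a ratio is acceptable depends on how much a decision-maker is willing to pay per case prevented; the ratio itself does not decide that.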

Budget impact analysis

Finally, a budget impact analysis can provide insight into the potential benefits of implementing the results of evaluation research. The failure to translate evaluation research into clinical practice has consequences for patients as well as for society and healthcare costs. Patients experience less health gain because the most effective intervention is not applied in the clinic. Society notices it through higher healthcare costs, because money is spent unnecessarily on ineffective interventions. In chapter 7 we show that implementation does indeed bring large potential health gains and cost savings. Eight evaluation studies in obstetrics were included. The results showed that implementing three of these studies improved health for (1) women with hypertension in pregnancy and mild pre-eclampsia, (2) women undergoing induction of labor and (3) women undergoing fetal monitoring during labor. De-implementation of ineffective interventions, namely (4) prolonged tocolysis, (5) the use of intrauterine pressure catheters during labor, (6) progesterone in twin pregnancies, (7) immediate induction of labor after preterm rupture of membranes and (8) immediate induction of labor in intrauterine growth restriction, reduced healthcare costs. The potential cost reduction was estimated at €9.6 million per year, against a one-off investment of €3.1 million to conduct these eight evaluation studies. We conclude that conducting evaluation studies recoups the investment more than three times over in the first year.
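The "more than three times" figure is a simple ratio of annual savings to the one-off research cost. The sketch below reproduces it; for the break-even estimate only, it assumes that savings are constant and accrue evenly over the year, which the chapter's own model may not assume.

```python
def net_savings(annual_savings, one_off_investment, years=1.0):
    """Cumulative savings minus a single up-front investment (savings assumed to accrue evenly)."""
    return annual_savings * years - one_off_investment


ANNUAL_SAVINGS = 9.6e6       # euros per year, chapter 7 estimate
RESEARCH_INVESTMENT = 3.1e6  # euros, one-off cost of the eight studies

return_ratio = ANNUAL_SAVINGS / RESEARCH_INVESTMENT
payback_years = RESEARCH_INVESTMENT / ANNUAL_SAVINGS
print(f"first-year return: {return_ratio:.1f}x the investment")
print(f"break-even after about {12 * payback_years:.1f} months")
print(f"net savings after 1 year: EUR {net_savings(ANNUAL_SAVINGS, RESEARCH_INVESTMENT):,.0f}")
```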

GENERAL DISCUSSION

When a woman is confronted with a complication during her pregnancy or delivery (such as preterm birth, fetal distress during labor or high blood pressure), the challenge for the care provider is to find a solution that improves the outcomes for mother and child. A well-formulated research question is the starting point of a randomized evaluation study (RCT). Combined in a systematic review and meta-analysis, multiple RCTs can guide clinical decision-making. However, evaluation research is currently not fully efficient, and improvements can be made in the outcome measures used, in measuring long-term outcomes more systematically, and in integrating outcomes into other types of study so as to translate scientific knowledge into clinical practice. If we succeed in this, and thereby implement evaluation studies better, the expected health gains and cost savings are large.


REFERENCES

1. Friedman LM, Furberg CD, DeMets DL. Fundamentals of Clinical Trials. 4th ed. Springer, 2010.

2. Khan K, Kunz R, Kleijnen J, et al. Systematic reviews to support evidence-based medicine. 2nd ed. 2011.

4. Meher S, Alfirevic Z. Choice of primary outcomes in randomised trials and systematic reviews evaluating interventions for preterm birth prevention: a systematic review. BJOG 2014;121(10):1188-94; discussion 95-6.

6. Ioannidis JP. Why Most Clinical Research Is Not Useful. PLoS Med 2016;13(6):e1002049.

8. Liem S, Schuit E, Hegeman M, et al. Cervical pessaries for prevention of preterm birth in women with a multiple pregnancy (ProTWIN): a multicentre, open-label randomised controlled trial. The Lancet 2013;382(9901):1341-49.

9. Iams JD, Goldenberg RL, Meis PJ, et al. The length of the cervix and the risk of spontaneous premature delivery. National Institute of Child Health and Human Development Maternal Fetal Medicine Unit Network. N Engl J Med 1996;334(9):567-72.

11. van Os MA, van der Ven AJ, Kleinrouweler CE, et al. Preventing Preterm Birth with Progesterone in Women with a Short Cervical Length from a Low-Risk Population: A Multicenter Double-Blind Placebo-Controlled Randomized Trial. Am J Perinatol 2015;32(10):993-1000.

12. van Os MA, Kleinrouweler CE, Schuit E, et al. Influence of cut-off value on the prevalence of short cervical length. Ultrasound Obstet Gynecol 2016.

13. Janvier A, Barrington K. Trying to predict the future of ex-preterm infants: who benefits from a brain MRI at term? Acta Paediatr 2012;101(10):1016-7.

14. Ryder HF, McDonough C, Tosteson AN, et al. Decision Analysis and Cost-effectiveness Analysis. Semin Spine Surg 2009;21(4):216-22.

List of co-authors and their contribution to the manuscript

LIST OF COAUTHORS AND THEIR CONTRIBUTION TO THE MANUSCRIPT

A Core Outcome Set for Evaluation of Interventions to Prevent Preterm Birth.

Obstet Gynecol. 2016;127(1):49-58

Janneke van ‘t Hooft Study concept and design, acquisition of data, data analysis, interpretation of data, drafting the manuscript, final approval of the manuscript.

James M. N. Duffy Study concept and design, interpretation of data, drafting the manuscript, critically reviewing the manuscript, final approval of the manuscript.

Mandy Daly Interpretation of data, critically reviewing the manuscript, final approval of the manuscript.

Paula R. Williamson Study concept and design, data analysis, interpretation of data, critically reviewing the manuscript, final approval of the manuscript.

Shireen Meher Acquisition of data, interpretation of data, critically reviewing the manuscript, final approval of the manuscript.

Elizabeth Thom Critically reviewing the manuscript, final approval of the manuscript.

George R. Saade Study concept and design, interpretation of data, critically reviewing the manuscript, final approval of the manuscript.

Zarko Alfirevic Study concept and design, acquisition of data, data analysis, interpretation of data, critically reviewing the manuscript, final approval of the manuscript.

Ben Willem J. Mol Study concept and design, interpretation of data, critically reviewing the manuscript, final approval of the manuscript.

Khalid S. Khan Study concept and design, acquisition of data, interpretation of data, critically reviewing the manuscript, final approval of the manuscript.

Cervical pessary for preterm birth prevention in twin pregnancy with short cervix: a 3 years follow-up. Submitted

Janneke van ‘t Hooft Study concept and design, acquisition of data, data analysis, interpretation

Johanna H. van der Lee Study concept and design, data analysis, interpretation of data, critically reviewing the manuscript, final approval of the manuscript.

Brent C. Opmeer Study concept and design, data analysis, interpretation of data, critically reviewing the manuscript, final approval of the manuscript.

Aleid G. van Wassenaer-Leemhuis Study concept and design, interpretation of data, critically reviewing the manuscript, final approval of the manuscript.

Anneloes L. van Baar Study concept and design, acquisition of data, interpretation of data, critically reviewing the manuscript, final approval of the manuscript.

Leonie Steenis Acquisition of data, interpretation of data, critically reviewing the manuscript, final approval of the manuscript.

Sophie Liem Study concept and design, acquisition of data, critically reviewing the manuscript, final approval of the manuscript.

Ewoud Schuit Study concept and design, acquisition of data, critically reviewing the manuscript, final approval of the manuscript.

Elise Bleker Acquisition of data, critically reviewing the manuscript, final approval of the manuscript.


Margot E. Vinke Acquisition of data, critically reviewing the manuscript, final approval of the manuscript.

Noor Simons Acquisition of data, critically reviewing the manuscript, final approval of the manuscript.

Irene de Graaf Study concept and design, critically reviewing the manuscript, final approval of the manuscript.

Dick Bekedam Study concept and design, critically reviewing the manuscript, final approval of the manuscript.

Cornelieke van de Beek Study concept and design, interpretation of data, critically reviewing the manuscript, final approval of the manuscript.

Preventing preterm birth with progesterone in women with short cervical length: outcomes in children at 24 months of age. Manuscript in preparation

Janneke van ‘t Hooft Study concept and design, acquisition of data, data analysis, interpretation

Cuny Cuijpers Acquisition of data, data analysis, interpretation of data, drafting the manuscript, final approval of the manuscript.

Caroline Schneeberger Study concept and design, acquisition of data, interpretation of data, critically reviewing the manuscript, final approval of the manuscript.

Johanna H. van der Lee Data analysis, interpretation of data, critically reviewing the manuscript, final approval of the manuscript.

Brent C. Opmeer Data analysis, interpretation of data, critically reviewing the manuscript, final approval of the manuscript.

Leonie Steenis Acquisition of data, interpretation of data, critically reviewing the manuscript, final approval of the manuscript.

Cornelieke van de Beek Interpretation of data, critically reviewing the manuscript, final approval of the manuscript.

Melanie van Os Acquisition of data, critically reviewing the manuscript, final approval of the manuscript.

Jeanine van der Ven Acquisition of data, critically reviewing the manuscript, final approval of the manuscript.

Christianne J. M. de Groot Study concept and design, critically reviewing the manuscript, final approval of the manuscript.

Aleid G. van Wassenaer-Leemhuis Study concept and design, acquisition of data, interpretation of data, critically reviewing the manuscript, final approval of the manuscript.


Predicting developmental outcomes in premature infants by term equivalent MRI: systematic review and meta-analysis. Syst Rev. 2015;4:71.

Janneke van ‘t Hooft Study concept and design, acquisition of data, interpretation of data, drafting the manuscript, final approval of the manuscript.

Johanna H. van der Lee Study concept and design, interpretation of data, critically reviewing the manuscript, final approval of the manuscript.

Brent C. Opmeer Data analysis, interpretation of data, critically reviewing the manuscript, final approval of the manuscript.

Cornelieke SH. Aarnoudse-Moens Study concept and design, acquisition of data, interpretation of data, critically reviewing the manuscript, final approval of the manuscript.

Arnold GE. Leenders Acquisition of data, critically reviewing the manuscript, final approval of the manuscript.

Ben Willem J. Mol Interpretation of data, critically reviewing the manuscript, final approval of the manuscript.

Timo R. de Haan Study concept and design, acquisition of data, interpretation of data, critically reviewing the manuscript, final approval of the manuscript.

ST-analysis in electronic foetal monitoring is cost-effective from both the maternal and neonatal perspective. J Matern Fetal Neonatal Med. 2016:1-6

Janneke van ‘t Hooft Study concept and design, data analysis, acquisition of data, interpretation of data, drafting the manuscript, final approval of the manuscript.

Maarten Vink Study concept and design, data analysis, acquisition of data, interpretation of data, drafting the manuscript, final approval of the manuscript.

Brent C. Opmeer Data analysis, interpretation of data, critically reviewing the manuscript, final approval of the manuscript.

Sabine Ensing Study concept and design, interpretation of data, critically reviewing the manuscript, final approval of the manuscript.

Anneke Kwee Acquisition of data, critically reviewing the manuscript, final approval of the manuscript.

Ben Willem J. Mol Study concept and design, critically reviewing the manuscript, final approval of the manuscript.


Kosten en effecten van doelmatigheidsonderzoek in de obstetrie [Costs and health outcomes of effectiveness studies in obstetrics: a budget impact analysis of 8 obstetric effectiveness studies]. Ned Tijdschr Geneeskd. 2013;157:A6287.

Janneke van ‘t Hooft Study concept and design, data analysis, acquisition of data, interpretation of data, drafting the manuscript, final approval of the manuscript.

Brent C. Opmeer Data analysis, interpretation of data, critically reviewing the manuscript, final approval of the manuscript.

Margreet J. Teune Study concept and design, critically reviewing the manuscript, final approval of the manuscript.

Luuk Versluis Study concept and design, interpretation of data, critically reviewing the manuscript, final approval of the manuscript.

Ben Willem J. Mol Study concept and design, acquisition of data, interpretation of data, critically reviewing the manuscript, final approval of the manuscript.


PHD PORTFOLIO

PhD period: 1-4-2012 to 1-8-2016

PhD training. General courses in epidemiology and more specific topics. Workload (ECTS) Year

Clinical Epidemiology. M. Leeflang. Graduate School, Amsterdam. 0.64 2012

The Practice of Epidemiologic Analysis. A. Ikram and M. Vernooij. NIHES, Rotterdam. 0.7 2012

The AMC World of Science. Graduate School, Amsterdam. 0.7 2012

Clinical Decision Analysis. Z. Voko. NIHES, Rotterdam. 0.7 2012

Markers and Prognostic Research. E. Steyerberg. NIHES, Rotterdam. 0.7 2012

Topics in Meta-analysis. M. Egger. NIHES, Rotterdam. 0.7 2012

Health Economics. K. Redekop. NIHES, Rotterdam. 0.7 2012

Methods of Health Services Research. N. Klazinga. NIHES, Rotterdam. 0.7 2013

Advanced Topics in Clinical Epidemiology. P. Bossuyt. Graduate School, Amsterdam. 1.14 2013

Introduction to Data-analysis. A. Albert. NIHES, Rotterdam. 0.7 2013

Conceptual Foundation of Epidemiologic Study Design. K. Rothman. NIHES, Rotterdam. 0.7 2013

Practical Biostatistics. B. Opmeer. Graduate School, Amsterdam. 1.1 2013

Causal Inference. M. Hernan. NIHES, Rotterdam. 0.7 2013

Advanced Topics in Decision Making in Medicine. M. Hunink. NIHES, Rotterdam. 1.9 2013

Scientific Writing in English for Publication. E. Hull. Graduate School, Amsterdam. 1.5 2014

Clinical Data Management. B. Opmeer. Graduate School, Amsterdam. 0.3 2014

Basic Course Legislation and Organization (BROK). Graduate School, Amsterdam. 1.0 2014

Clinical Epidemiology: Observational Epidemiology. P. Bossuyt. Graduate School. 0.6 2015

Advanced Topics in Biostatistics. A. Zwinderman. Graduate School. 2.1 2016


Oral presentations Year

Society for Clinical Trials, 34th Annual Meeting, Boston, USA, 2013. Invited session. Ben Willem Mol, Janneke van ’t Hooft, Katrien Oude Rengerink, Parvin Tajik, John Norrie. ‘Incorporation of Clinical Trials in Routine Patient Care – RCTs as the Standard of Care Rather than the Exception in Women’s Health’.

Society for Maternal-Fetal Medicine (SMFM), New Orleans, USA. Global Obstetrics Network members meeting. Van ’t Hooft J, presenting on behalf of Norman J. ‘Protocol of an open randomized trial of the Arabin pessary to prevent preterm birth in twin pregnancy – STOPPIT-2, UK’.

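Zorgvisie congress ‘Sturen op gepaste zorg’ [Steering towards appropriate care], Haarlem, 2015. Joint presentation by T. van Barneveld (on behalf of the Kennisinstituut) and J. van ’t Hooft (on behalf of the NVOG). ‘Zorgevaluatie: een no brainer. Winst voor patiënt en premiebetaler’ [Care evaluation: a no-brainer. Gains for patients and premium payers].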

ZonMw 15-year anniversary (lustrum), Utrecht, 2015. PechaKucha presentation on collaborative work in obstetric research.

Gynaecongres, Arnhem, 2015. Talent and research session. Presentation on ‘Cruciale uitkomstmaten in vroeggeboortestudies’ [Crucial outcome measures in preterm birth studies].

European Preterm Birth Conference, Gothenburg, 2016. Long term outcomes following a pessary placement to pregnant women with a multiple pregnancy: 3 years follow up of the ProTwin trial.

Poster presentations Year

Joint meeting of the Belgian Society for Reproductive Medicine (BSRM) and the Dutch Society for Reproductive Medicine (DSRM), Antwerp, 2011. van ’t Hooft J, Maas JWM, Pennings G. ‘High quality fertility care: Belgium vs. The Netherlands 1-0?’.

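Gynaecongres, Arnhem, 2011. van ‘t Hooft J, Rabotti C, Oei SG. ‘Evaluatie van het elektrohysterogram bij een zwangere met een uterus unicornis en preterme contracties’ [Evaluation of the electrohysterogram in a pregnant woman with a unicornuate uterus and preterm contractions].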

Society for Maternal-Fetal Medicine (SMFM), New Orleans, USA, 2014. Van ‘t Hooft J, Opmeer BC, Teune MJ, Versluis L, Mol BWJ. ‘Implementing Clinical Trial Results: A Budget Impact Analysis’.

Society for Maternal-Fetal Medicine (SMFM), New Orleans, Atlanta, 2015. van ’t Hooft J, van der Lee JH, Opmeer BC, Cuijpers C, Steenis L, de Graaf I, van Wassenaer-Leemhuis AG, van Baar AL, van der Post JAM, Mol BWJ, van de Beek C. Long term outcomes following a pessary placement to pregnant women with a multiple pregnancy: 3 years follow up of the ProTwin trial.

Teaching

UvA, Master EBP. Joint lecture with dr. B.C. Opmeer on cost-effectiveness analyses. 2014


LIST OF PUBLICATIONS

van ‘t Hooft J, Duffy JM, Daly M, Williamson PR, Meher S, Thom E, et al. A Core Outcome Set for Evaluation of Interventions to Prevent Preterm Birth. Obstet Gynecol. 2016;127(1):49-58

van ‘t Hooft J, Duffy JM, Daly M, Williamson PR, Meher S, Thom E, et al. A Core Outcome Set for Evaluation of Interventions to Prevent Preterm Birth. Obstet Gynecol (Indian edition; selected articles reprinted from Vol. 127(1), 2016). Issue 2, 2016.

van ‘t Hooft J, Duffy JM, Daly M, Williamson PR, Meher S, Thom E, et al. Conjunto de criterios de valoración centrales para la evaluación de intervenciones que prevengan el parto prematuro [A core outcome set for evaluation of interventions to prevent preterm birth]. Obstet Gynecol (Argentinian edition; selected articles reprinted from Vol. 127(1), 2016). Issue 2, May 2016.

van ‘t Hooft J. A core outcome set for evaluation of interventions to prevent preterm birth: summary for CROWN. BJOG. 2016 Apr;123(5):666

Duffy JMN, van ’t Hooft J, Gale C, Brown M, Grobman W, Fitzpatrick R, Karumanchi SA, Lucas N, Magee L, Mol B, Stark M, Thangaratinam S, Wilson M, von Dadelszen P, Williamson P, Khan KS, Ziebland S, McManus RJ; on behalf of the International Collaboration to Harmonise Outcomes for Pre-eclampsia (iHOPE). A protocol for developing, disseminating, and implementing a core outcome set for pre-eclampsia. Pregnancy Hypertension: An International Journal of Women’s Cardiovascular Health. 2016.

van ‘t Hooft J, Vink M, Opmeer BC, Ensing S, Kwee A, Mol BW. ST-analysis in electronic foetal monitoring is cost-effective from both the maternal and neonatal perspective. J Matern Fetal Neonatal Med. 2016:1-6.

Tajik P, Monfrance M, van ‘t Hooft J, Liem SM, Schuit E, Bloemenkamp KW, et al. A multivariable model to guide the decision for pessary placement to prevent preterm birth in women with a multiple pregnancy: a secondary analysis of the ProTWIN trial. Ultrasound Obstet Gynecol. 2016.

van ‘t Hooft J. Cruciale uitkomsten voor vroeggeboortestudies [Crucial outcomes in preterm birth studies] NTOG, March 2016.

van ’t Hooft J, van der Lee JH, Opmeer BC, Aarnoudse-Moens CS, Leenders AG, Mol BW, de Haan TR. Predicting developmental outcomes in premature infants by term equivalent MRI: systematic review and meta-analysis. Syst Rev. 2015;4:71.

van ’t Hooft J. Eerste stap naar betere uitkomstmaten in de verloskunde [First steps to better outcomes in obstetrics] NTOG, May 2015.

van der Heyden JL, Willekes C, van Baar AL, van Wassenaer-Leemhuis AG, Pajkrt E, Oudijk MA, Porath MM, Duvekot HJJ, Bloemenkamp KWM, Groenwout M, Woiski M, Bijvank BN, Bax CJ, van ‘t Hooft J, Sikkema MJM, Akerboom BMC, Mulder TLM, Nijhuis JG, Mol BWJM, van der Ham DP. Behavioural and neurodevelopmental outcome of 2-year-old children after preterm premature rupture of membranes: follow-up of a randomised clinical trial comparing induction of labour and expectant management. Eur J Obstet Gynecol Reprod Biol. 2015;194:17-23.

van ’t Hooft J, Opmeer BC, Teune MJ, Versluis L, Mol BWJ. Kosten en effecten van doelmatigheidsonderzoek in de obstetrie [Costs and health outcomes of effectiveness studies in obstetrics: a budget impact analysis of 8 obstetric effectiveness studies]. Ned Tijdschr Geneeskd. 2013;157:A6287.

van ‘t Hooft J, Rabotti C, Oei SG. Electrohysterographic evaluation of preterm contractions in a patient with a unicornuate uterus. Acta Obstet Gynecol Scand 2013 Jan 29.

van 't Hooft J, Maas JWM, Pennings G. Grensoverschrijdende fertiliteitszorg van Nederland naar België: een grens te ver? Kwalitatief onderzoek naar het perspectief van de gynaecoloog. [Cross-border reproductive care from the Netherlands to Belgium: a border too far? Qualitative research on the perspective of the gynaecologist] Tijdschrift voor Geneeskunde 2012;68:974-78.

Rabotti C, Oei SG, van ’t Hooft J, Mischi M. Electrohysterographic propagation velocity for preterm delivery prediction. Am J Obstet Gynecol 2011;205:e9-10.


CURRICULUM VITAE

Janneke van ’t Hooft was born on 15 October 1985 in Pueblo Nuevo, Nicaragua. Until the age of eleven she lived in Nicaragua and Bolivia, after which she moved to the Netherlands. In 2004, after graduating from secondary school, she travelled for a year, visiting the Foundation for Revitalisation of Local Health Traditions in Bangalore, India, as well as Cuba and Central America. In 2005 she started her medical studies at Maastricht University, with internships in hospitals in Nepal, Ecuador and Dublin. She became inspired to do research after joining the Eurotransplant kidney team collecting samples for kidney transplantation, and contributing to the ‘Electrohysterography’ project of the Máxima Medical Center in Veldhoven and TU Eindhoven. In August 2011 she graduated from medical school and started as a resident at the Department of Obstetrics and Gynaecology of the Kennemer Gasthuis in Haarlem. She had the opportunity to combine this work with a PhD project supervised by Prof. dr. B.W.J. Mol, dr. B.C. Opmeer and dr. J.H. van der Lee, the outcome of which is presented in this thesis.

During the PhD project she was trained as an epidemiologist at the Graduate School in Amsterdam and NIHES in Rotterdam. In 2013 she visited the Centro Rosarino de Estudios Perinatales (CREP) in Rosario, Argentina, as part of the European collaboration Evidence Based Medicine-Connect. For her work on Core Outcome Sets she visited the Women’s Health Research Unit at Barts and the London School of Medicine, UK, in 2014. Between 2013 and 2015 she was an active member of the Global Obstetrics Network (GONet), where she helped organize several annual meetings and participated in collaborative projects through prospective individual patient data meta-analyses. At the national level she supported a ‘Stichting Kwaliteitsgelden Medisch Specialisten’ (SKMS) project on prioritizing topics for evaluation research. Since 2015 she has been a member of the editorial board of the Dutch Journal of Obstetrics and Gynaecology (Nederlands Tijdschrift voor Obstetrie en Gynaecologie), representing the CROWN initiative (Core Outcomes in Women’s and Newborn Health). In August 2015 she started her residency in Obstetrics and Gynaecology at the Onze Lieve Vrouwe Gasthuis-Oost in Amsterdam, under the supervision of dr. E.M. Kaaijk. Janneke is married to Jan; they have a son, Mees.


ACKNOWLEDGEMENTS

Research is never done alone, least of all patient-based research. First of all, I would like to thank all the women and children who took part in the two follow-up studies described in this thesis. The lovely drawings and sometimes personal notes you enclosed made opening every returned envelope something special.

It was wonderful, from beginning to end, to be able to do this research. That is only possible with a team around you that supports you, both professionally and personally. I therefore want to thank everyone who contributed, directly or indirectly, to this thesis. You are too many to name individually, but I have learned a great deal from you and have been inspired by you.

Prof. dr. Mol, dear Ben Willem, thank you for your trust and supervision. From the very first moment you entrusted me with a presentation which, during our introductory meeting, I had said ‘could be designed a bit more attractively’. One week later you sent me a message from Hong Kong saying that the presentation had been well received. That was the start of a collaboration I have enjoyed enormously. Together with dr. Lips, you created a shared position for Irma and me, in which we both worked part-time as residents not in training (ANIOS) in Haarlem and as physician-researchers at the AMC. I can highly recommend it. Given my inexperience in epidemiology and effectiveness research, your supervision began in the role of mentor and teacher. You still fill that role, but recognition, collegiality and mutual understanding have since been added to it. I have greatly valued this trust and feel that it has allowed me to truly grow: in epidemiology and in my view of medical science; in the international scientific world, through the opportunities you gave me in GONet; and as a person, in finding the balance between being a mother, a doctor and a scientist. Many thanks for this.

Dr. B.C. Opmeer, dear Brent, you supervised and supported me throughout this project from the very beginning. Your commitment was great, and I thank you for it. With your critical eye you pointed out the right track, and you let me deliberately deviate from it, only for me to conclude afterwards that you had been right. That did, however, give me the opportunity to learn an enormous amount from you. It was also great fun to teach a lecture together.


Dr. J.H. van der Lee, dear Hanneke, you joined as a supervisor somewhat later, but you were no less valuable for it. Your perspective from paediatrics was a fine addition to the content of this thesis. You ‘scouted’ me at the dishwasher, where you apparently judged that a researcher who loads and unloads the cups would also be willing to take on a long-term job: a systematic review. And what a job it was, but we completed it successfully!

To the members of the doctoral committee, Prof. dr. J.A.M. van der Post, Prof. dr. E. Pajkrt, Prof. dr. A.H.L.C. van Kaam, Prof. dr. T.J. Roseboom and Dr. M.E. van den Akker-van Marle: thank you for assessing the manuscript and for taking a seat on the committee. Prof. dr. A.L. van Baar, thank you for agreeing to join the committee on the day of the defence. Prof. dr. K.S. Khan, thank you for being here today. Your international EBM collaborations have given me the opportunity to visit Argentina and London. I learned so much during my stay at the Women’s Health Research Network (thanks also to Shakila, James, Emilia and John). It is an honour that you are one of my opponents.

I would like to thank all co-authors for their contributions to the chapters in this thesis. In particular, Aleid van Wassenaer and Anneloes van Baar, for the collaboration with paediatrics and pedagogical sciences. I have enormous admiration for your drive in the Landelijke Neonatale Follow-up Club [national neonatal follow-up club] and for the sessions you attended in the consortium’s follow-up club. This really made a difference in conducting the follow-up within the consortium studies. I would like to thank Timo de Haan for his trust in me to carry out the systematic review, a collaboration that caught the attention of Cochrane, so that we will have the chance to continue it together. Cornelieke Aarnoudse-Moens, thank you for your support and the enjoyable brainstorming sessions; Leonie Steenis, for the pleasant collaboration in coordinating and conducting the Bayley assessments; and thanks to all the Bayley testers who travelled across the country for them. Cornelieke van de Beek, thank you for your kind support with the logistics and content of the follow-up studies. Caroline Schneeberger, thank you for trusting me with the completion of the TripleP follow-up. Your work in this study should not go unnoticed.


This research project would not have been nearly as enjoyable without the help, brainstorming sessions and good company of a number of science students. First of all Maarten Vink: your interest in economic analyses brought us together. Cuny Cuijpers, Elise Bleker, Margot Vinke and Noor Simons: you were the follow-up team. You collected and entered the data with great drive and precision. Checking each other’s data worked well, and you laid an important foundation for two fine articles.

Dear fellow researchers at the AMC and other staff of the Department of Obstetrics & Gynaecology at the AMC, many thanks for all your support. Sharing an office with Irma, Rosa, Gert, Joost, Fred, Raissa and later Bouchra, Laura, Chantal, Maud and Iris, it was tremendously valuable to bounce ideas off each other, and to relax together, as on Bart’s boat trips and at conferences with, among others, the NY ladies Brenda, Sabine and Larissa, the Boston crew Parvin and Katrien, and my fantastic Argentina buddy Babette. Katrien: SPSS explained on the bike ride to the station makes the best tutorial! Parvin, Brenda and Floortje, it is great that we can brainstorm about research together so well! And thank you, Floortje, for your critical eye in the final hour. Anneloes: setting up GONet started with you. Mirjam: always a pleasure over a cup of coffee. Thank you all for your friendship!

Sjaak Wijma, Veronique van Dooren, Sjoerd Repping, Teus van Barneveld and Maya Kruijt: the consortium, linked with the NVOG and the Kennisinstituut. I found it fantastic to work together towards the same goal during the transition from the consortium to Consortium 2.0, in which prioritizing research also came to the fore. It was great to have been part of that.

To all gynaecologists, residents, midwives and nurses of the Kennemer Gasthuis, now the Spaarne Gasthuis: it was with you that I learned the foundations of the profession and discovered the joy of it. Many thanks for the knowledge and skills you passed on to me!

My colleagues at OLVG-Oost: how wonderful to be able to start my residency training with you. I feel completely at home in a clinic where science and practice stand side by side, with the patient at the top. Thank you for the pleasant learning environment and for your support during the final stretch of this thesis!


My paranymphs, how nice that you are standing beside me during the defence. Laura, although we have known each other since primary school, our deep friendship arose in secondary school and on the first ‘world trips’ we made together as 15-year-olds. We each follow our own path, and yet a similar one, and I have enormous admiration for your energy and perseverance in finding what really suits you. Iris, together through the AMC with big bellies and high heels. We both gave birth in a flash and took on the challenge of a core outcome set project. You are my research buddy, and I love brainstorming with you about the finer points of statistics and syntax. The relaxed moments with our little boys are also very precious.

Willemien Spook: your art is inspiring. Many thanks for the beautiful illustration you made for this thesis. It is a lovely and lasting reminder of this work.

Dear friends, in-laws and family, thank you for your support and understanding. I could not have done it without you. You are too many to name individually, but the good times together, and the relaxation they brought, provided the perfect balance to the hours behind the computer.

I would particularly like to thank my grandparents, Wim and Els, for your love and support. The trip to Bolivia that we made together around your 80th year is the greatest gift you can give a grandchild. Paul, Suzanne, Onne, Teun and Josephine, what a great time you are going to have in New Zealand. In my thoughts I will be travelling with you during the defence. Marly and Ivan, this book contains a different kind of research than your field, but I hope you enjoy reading it. Theo, Maritza, Laura, Melissa, Matheo: though far away, always close at heart. What a wonderful visit you paid us in 2013; I hope to see you again soon. Paul, Be, Willem, Elselien, Rianne and Jans: I am lucky to have in-laws like you. Be: many thanks for your help with proofreading.

Jochum and Marleen, how fantastic that we can relax together so easily. Marleen: Mees adores you, and thank you for your support every now and then. Although I just cannot convince Jochum that gynaecology is the best specialty there is, I will have to accept that he leans more towards neurology. You’re in the flow, bro!


Dear Katrien, you read along with me from the very beginning. You helped me develop a systematic approach to scientific writing. Many thanks for your critical eye. You were somewhat less involved in the later pieces, but no less present in your support. You are the dearest mother, and I greatly admire how you have supported me on the path I wanted to take. Dick, thank you for joining our family!

Dear Jan, there are not enough words to express my appreciation for your love and support. Without you there would be no thesis, and so this work is also dedicated to you. It will never match your patience and support during the periods when I needed it. We understand each other perfectly, and although our characters are sometimes polar opposites, we flourish through each other’s energy and qualities. With our dear little son Mees we keep discovering the world together, and the three of us even sat at a table in San Diego with six professors of obstetrics from different countries. You know I have a soft spot for research, so fortunately the fact that I want to continue doing research after this PhD does not put you off. I enjoy all our moments together: finding peace with each other, but also our shared curiosity about other places in the world. I want to share countless more moments with you.

Janneke van ’t Hooft
