
Multivariable diagnostic models:

measuring their impact

HANS REITSMA, MD, PHD

Julius Center for Health Sciences and Primary Care

Cochrane Netherlands

THINC. The Healthcare Innovation Center

Utrecht, The Netherlands

Personal background

• Trained as an MD, working as a clinical epidemiologist

• Better decisions for patients

• Research into whether interventions produce better outcomes for patients or reduce costs

• Long-term interest in the evaluation of diagnostic tests, biomarkers and prediction models

• Innovation in the evaluation

University Medical Center Utrecht


Landscape of Test Evaluation

What do we use medical tests for?

Diagnosis in practice

• Key question: what is causing my symptoms?

– cross-sectional interest

– multiple pieces of information

• Probabilistic thinking: estimating the probability that a particular condition is present given the test results

• Diagnosis is relevant for:

– the likely course for the patient

– guiding therapy and further management

Multistage, Multidimensional Diagnostic Process

[Flow diagram: presenting symptoms & signs → further testing (1, 2), adding information from symptoms & signs, lab and imaging → diagnosis → therapy or no therapy → patient outcomes; uncertainty decreases along the process]


Problems in test evaluation

• Dominance of single test evaluation

• Accuracy is an intermediate outcome

Dominance of single test evaluation

• Diagnostic studies are typically single diagnostic accuracy studies using routine care data or a convenience sample

• However, the diagnostic process in practice is about pathways and multiple tests

• Informative questions are comparative and concern changes in specific pathways, such as triage, replacement and added value

Bossuyt et al. BMJ 2006;332:1089-92

Changes in diagnostic pathways


The career of a medical test

[Diagram: stages in the career of a medical test]

• Brilliant idea → birth of a new test: conceptually right?

• Technical validity: is the measurement true?

• Clinical validity: is it clinically meaningful? Ability to distinguish diseased from non-diseased?

• Clinical utility: what is the role of the test in practice (the outside world)? Will using the test improve patient-important outcomes?

Comparative accuracy questions

• Goal: to compare the accuracy of two or more index tests (A vs B)

• In the literature often non-comparative designs: single-arm or single-test studies

[Diagram: two non-comparative single-test studies. Study 1: series of patients → index test A → reference standard → cross tabulation. Study 2: series of patients → index test B → reference standard → cross tabulation]

Overview designs

[Diagram: overview of designs]

• Single arm/test design: series of patients → index test → reference standard → cross tabulation

• Parallel randomised design: series of patients → R (randomisation) → index test A or index test B → reference standard → cross tabulation

• Cross-over design: series of patients → both index test A and index test B in every patient → reference standard → cross tabulation

Comparative studies provide “direct evidence”

Key advantage: higher validity in comparative designs

• Gain in validity:

– Easier to keep everything else similar in comparative designs, e.g. target population, quality of the reference standard procedure

– Compare like with like

– If a difference in accuracy is found, more confidence that it is related to the difference between the tests


Efficiency of Paired Design

• Paired / cross-over design: a within-patient analysis leads to increased power over a parallel randomised design if correlation exists (see the sketch below)

– Even if no correlation exists, only half the number of patients is needed

• Insight into the correlation: do errors occur in the same or in different patients?

• Drawbacks: patient burden, one test influencing the other, the time interval between tests
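To make this efficiency argument concrete, the following minimal Python sketch (all numbers hypothetical: sensitivities, number of diseased patients, within-patient correlation) compares the variance of the estimated difference in sensitivity under a parallel design with n patients per arm and a paired design with n patients in total; with zero correlation the two variances coincide, and any positive correlation favours the paired design.

from math import sqrt

# Hypothetical inputs: sensitivities of tests A and B, number of diseased patients,
# and the within-patient correlation between the two test results.
se_a, se_b = 0.85, 0.80
n = 200          # per arm in the parallel design; in total in the paired design
rho = 0.5

# Parallel randomised design: two independent groups of n diseased patients.
var_parallel = se_a * (1 - se_a) / n + se_b * (1 - se_b) / n

# Paired (cross-over) design: both tests in the same n patients, so the covariance
# term reduces the variance of the estimated difference in sensitivity.
cov = rho * sqrt(se_a * (1 - se_a) * se_b * (1 - se_b))
var_paired = (se_a * (1 - se_a) + se_b * (1 - se_b) - 2 * cov) / n

print(f"Parallel design ({2 * n} diseased patients in total): var = {var_parallel:.5f}")
print(f"Paired design   ({n} diseased patients in total): var = {var_paired:.5f}")
print(f"Relative efficiency (parallel / paired): {var_parallel / var_paired:.2f}")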

Cross-over / paired design: specific quality issues

• Verify that validity threats associated with cross-over designs are absent

• Perform both tests in all subjects

• Blinded assessment of both index tests

• The first test does not influence the result of the second, e.g. learning effects

• No change in disease status between tests due to the time interval, a change in treatment, or performing the first test

Accuracy is an intermediate outcome

[Diagram: relation between tests and patient outcomes. Medical test → test results → clinical decision → treat or no treat → patient outcome]


[Diagram: RCT of test-treatment combinations. Study population randomised (R) to test A or test B; in each arm a positive result leads to treatment X and a negative result to treatment Y; outcomes are compared between arms]

RCT of test-treatment combinations

• Best evidence for assessing utility, but rare

• Documents both the intended positive effects and the unintended negative effects of testing

• Correct methods in designing such an RCT:

– Strict protocol linking test results to specific interventions

– Often requires large sample sizes

• Is a linked evidence approach an alternative?

Test A or B? Decision tree linking test accuracy to downstream consequences:

Test A
– True positive [p*seA] → treat → risk reduction as observed in trial + risk of side effects
– False negative [p*(1-seA)] → no treat → risk if untreated
– False positive [(1-p)*(1-spA)] → treat → risk of side effects, no treatment effect
– True negative [(1-p)*spA] → no treat → none

Test B
– True positive [p*seB] → treat → risk reduction as observed in trial + risk of side effects
– False negative [p*(1-seB)] → no treat → risk if untreated
– False positive [(1-p)*(1-spB)] → treat → risk of side effects, no treatment effect
– True negative [(1-p)*spB] → no treat → none

p = prevalence, se = sensitivity, sp = specificity

Sutton et al. Integration of meta-analysis and economic decision modelling for evaluating tests. MDM 2008
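The bookkeeping behind this tree can be sketched in a few lines of Python. The function below weights each cell (TP, FN, FP, TN) by its probability and maps it to the consequence listed above; the function name, the choice to model "risk reduction as observed in trial" as a relative risk reduction applied to the untreated risk, and all numerical inputs are illustrative assumptions rather than values from the handout.

def expected_event_risk(prev, se, sp, risk_untreated, rrr, risk_side_effects):
    """Expected probability of a bad outcome when treatment follows the test result.

    prev              : disease prevalence (p)
    se, sp            : sensitivity and specificity of the test
    risk_untreated    : event risk for a diseased patient who is not treated
    rrr               : relative risk reduction of treatment (assumed from a trial)
    risk_side_effects : risk of harm from treatment (applies to everyone treated)
    """
    tp = prev * se                  # treated: risk reduction + risk of side effects
    fn = prev * (1 - se)            # untreated disease: risk if untreated
    fp = (1 - prev) * (1 - sp)      # treated unnecessarily: side effects only
    tn = (1 - prev) * sp            # correctly reassured: no added risk

    return (tp * (risk_untreated * (1 - rrr) + risk_side_effects)
            + fn * risk_untreated
            + fp * risk_side_effects
            + tn * 0.0)

# Hypothetical comparison of two tests with different accuracy profiles
risk_a = expected_event_risk(prev=0.20, se=0.90, sp=0.70,
                             risk_untreated=0.30, rrr=0.5, risk_side_effects=0.02)
risk_b = expected_event_risk(prev=0.20, se=0.80, sp=0.90,
                             risk_untreated=0.30, rrr=0.5, risk_side_effects=0.02)
print(f"Expected event risk, test A: {risk_a:.3f}")
print(f"Expected event risk, test B: {risk_b:.3f}")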

New highly sensitive test

de Groot et al. Methodologic approaches to evaluating new highly sensitive diagnostic tests: avoiding overdiagnosis. CMAJ 2017;189:E64-E68


Multivariable prediction models:

Measuring their Impact

Clinical prediction models: general aims

• Risk prediction = foreseeing / foretelling

– … (the probability) of something that is as yet unknown

• Convert the predictor values of a subject into an absolute probability…

– … of having a particular disease → diagnosis

– … of developing a particular event → prognosis

• Different uses, including:

– informing patients / relatives

– risk adjustment in performance studies

– guiding clinical decisions

Example

BMJ 2004;329:206-210


Diagnostic research: results

Diagnostic research: results (continued)

Evaluation journey of prediction models

• Model development: identify variables, model building

• Validation: narrow, broad

• Updating: improve performance

• Impact: quantify impact, experimental design

• Dissemination & implementation: widespread use, barriers

Increasing level of evidence along these steps

Prediction models: clinical benefit

• … Not meant to replace physicians, but to complement their clinical intuition → a decision support tool

• Assumptions about how prediction models lead to clinical benefits:

– they generate accurately estimated probabilities…

– … which improve decision making by health care professionals…

– … which subsequently leads to clinical benefits


Measuring impact

Case study: HEART score

Impact of a risk stratification tool in chest pain patients

Judith Poldervaart, MD, PhD-student

Sponsor: ZON-MW, grant number 171202015

Chest pain

• A common reason to visit the ER: the underlying heart condition may be serious

• Diagnosis is challenging

• If an acute coronary syndrome is present, prompt treatment is required, but it is present in only 20% of patients

• The current approach is conservative: additional testing & admission in 75% of patients

Is there a role for prediction rules to triage these patients?

The HEART-score

• Diagnostic risk score for chest pain patients at the ED

• 5 clinical elements

• Supports a direct clinical decision (see the sketch below)

Risk category | HEART score | Policy
Low           | 0-3         | Discharge
Intermediate  | 4-6         | Observation
High          | 7-10        | Invasive
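A minimal sketch of the category-to-policy mapping in the table above (the scoring of the five individual HEART elements is not reproduced in this handout and is therefore not encoded; the function name is hypothetical):

def heart_policy(total_score: int) -> tuple[str, str]:
    """Return (risk category, proposed policy) for a HEART score of 0-10."""
    if not 0 <= total_score <= 10:
        raise ValueError("HEART score must be between 0 and 10")
    if total_score <= 3:
        return "low", "discharge"
    if total_score <= 6:
        return "intermediate", "observation"
    return "high", "early invasive strategy"

print(heart_policy(2))   # ('low', 'discharge')
print(heart_policy(5))   # ('intermediate', 'observation')
print(heart_policy(8))   # ('high', 'early invasive strategy')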

HEART score hypothesis

Actual use of the HEART score is safe and streamlines the management of patients with chest pain

• A more aggressive approach in high-risk patients

• Fewer admissions and less extra testing in low-risk patients


What evidence is enough?

HEART score: three validation studies

Risk category            | Study 1        | Study 2       | Study 3       | Total
Low risk (0-3)           | 3/303* (0.99%) | 15/870 (1.7%) | 14/820 (1.7%) | 32/1993 = 1.6% (95% CI 1.05-2.15)
Intermediate risk (4-6)  | 48/413         | 183/1101      | 160/1622      | 391/3136 = 12.5% (95% CI 11.34-13.66)
High risk (7-10)         | 107/164        | 209/417       | 200/464       | 516/1045 = 49.4% (95% CI 46.37-52.43)
Total number of patients | 880            | 2388          | 2906          | 6174

*Ratio of MACE / total number of patients
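As a quick check of the pooled low-risk figure, the sketch below recomputes the 1.6% rate and an approximate 95% confidence interval using a simple normal approximation; the handout does not state which interval method was used, so this is for illustration only.

from math import sqrt

events, n = 32, 1993                      # pooled low-risk MACE / patients from the table
p = events / n
se = sqrt(p * (1 - p) / n)                # standard error of a proportion
ci_low, ci_high = p - 1.96 * se, p + 1.96 * se
print(f"Pooled low-risk MACE rate: {p:.1%} (approx. 95% CI {ci_low:.2%} to {ci_high:.2%})")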

Rationale for decision model

• Formally structure the problem, specifying the relevant intervention options and key outcomes

• Perform multiple sensitivity analyses

• A helpful step in designing the primary study to generate direct evidence

• Can the decision model predict the results of the trial?

• Value of information analysis

The HEART score model

[Decision tree: chest pain population presenting at the ER → care as usual, or HEART score (with or without compliance; no compliance follows the usual-care path) → score 0-3: discharge home; score 4-6: non-invasive tests; score 7-10: invasive tests → positive results lead to admission, negative results to discharge → MACE or no MACE]


Impact studies: aim & rationale

• Aim: to determine whether the actual use of a prediction model truly improves physicians' decision making and subsequent health outcomes, and ideally the cost-effectiveness of care

Impact studies: design issues

• The prediction model is now a health care intervention (part of a complex intervention)

• The same methodological principles apply as in the evaluation of any intervention

• Key features:

– Part of a strategy: outcomes are not generated by the prediction model alone

– Comparative: relative to best current practice

– Outcomes: outcomes that matter to patients or society

Designing the HEART impact study

HEART impact study

• Design?


Before impact studies: alternatives

• Before performing the ultimate RCT, consider alternatives that are cheaper & easier

• Determine the best model: a review

• Comparative validation studies

• Cross-sectional studies with an intermediate outcome: therapeutic decisions

• If uncertainty remains, perform an impact study

HEART impact study

• Design / allocation?

• Options:

– Individual randomization

– Cluster randomized trial

– Before-after

– Stepped wedge

HEART impact study

• Stepped wedge design

HEART-Impact trial: stepped wedge design


HEART-Impact trial: stepped wedge design

Stepped wedge: key features

� Type of cluster randomized design:

– Comparability of intervention groups

– Clustered data

� Type of cross-over design, unidirectional:

– Improved logistics: 1 cluster switches at a time

– Time as confounder

– Statistical analyses & sample size & monitoring

– Within-cluster comparison possible

� All clusters experience new intervention:

– Increase participation

– Evaluate implementation

– De-implementation
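The allocation sketch below generates an illustrative stepped-wedge schedule: every cluster starts under usual care and switches to the intervention at a randomly assigned step, one per period, until all clusters are exposed. The number of clusters and the helper name are assumptions for illustration, not the HEART trial's actual schedule.

import random

def stepped_wedge_schedule(n_clusters: int, seed: int = 1) -> list[list[int]]:
    """Return a cluster-by-period matrix: 0 = usual care, 1 = intervention."""
    n_periods = n_clusters + 1                      # baseline period plus one switch per period
    switch_order = list(range(1, n_periods))        # candidate switch periods, one per cluster
    random.Random(seed).shuffle(switch_order)       # randomise which cluster switches when
    return [[1 if period >= switch_order[c] else 0 for period in range(n_periods)]
            for c in range(n_clusters)]

for cluster, row in enumerate(stepped_wedge_schedule(n_clusters=5), start=1):
    print(f"cluster {cluster}: {row}")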

HEART impact study

• Population?


HEART impact study: population

• Pragmatic research: broad inclusion of all patients in whom the model will be used in practice

• Domain: all chest pain patients presenting at the ED

Inclusion criteria:
• Chest pain
• >18 years
• Legal competence

Exclusion criteria:
• STEMI (ST-Elevation Myocardial Infarction)

HEART impact study

• Interventions?

HEART score | Risk of MACE | Proposed policy
0-3         | 1.6%         | Discharge
4-6         | 12.5%        | Observation + non-invasive diagnostics
7-10        | 49.4%        | Early invasive treatment (CAG/PCI)

‘HEART period’: assessment of a chest pain patient with active use of the HEART score

HEART impact study: HEART intervention

Use of the model

• Assistive vs. directive rules

– Assistive: only predicted probabilities

• Room for intuition

– Directive: probabilities with corresponding actions

• Appears to have greater impact


HEART impact study: control intervention

‘Usual care period’: assessment of a chest pain patient according to current local guidelines

HEART impact study

• Outcomes?

HEART impact study

• Measure multiple outcomes covering all relevant downstream consequences

• Take into account the hierarchy of outcomes

• HEART study:

– Safety: major adverse cardiac events (AMI, PCI, CABG, death) within 3 months

– Quality of life

– Costs & cost-effectiveness

– Number of patients with a low HEART score developing MACE

– Duration of stay at the ED

Stepped wedge trial results


Results: patient flow

Results: baseline characteristics

                                    | All patients (N=3,648) | Usual care (N=1,827) | HEART care (N=1,821)
Demographics
Male                                | 1980 (54%)             | 1005 (55%)           | 975 (54%)
Mean age (SD)                       | 62 (14)                | 62 (14)              | 62 (14)
History of cardiovascular disease   | 1266 (35%)             | 670 (37%)            | 596 (33%)
HEART score
HEART score 0-3 (low risk)          | -                      | -                    | 715 (40%)
HEART score 4-6 (intermediate risk) | -                      | -                    | 861 (49%)
HEART score 7-10 (high risk)        | -                      | -                    | 190 (11%)

Results: incidence of MACE

                                  | Usual care (n=1,827) | HEART care (n=1,821) | HEART 0-3 (n=715) | HEART 4-6 (n=861) | HEART 7-10 (n=190)
Number of patients with MACE      | 405 (22.2%)          | 345 (18.9%)          | 14 (2.0%)         | 175 (20.2%)       | 140 (73.7%)
MACE components*
Death – total                     | 9 (0.5%)             | 5 (0.3%)             | 1 (0.1%)          | 2 (0.2%)          | 2 (1.1%)
  Cardiovascular death            | 6                    | 1                    | 0                 | 0                 | 1
  Non-cardiovascular death        | 0                    | 1                    | 0                 | 0                 | 1
  Death by unknown cause          | 3                    | 3                    | 1                 | 2                 | 0
Cardiac ischemia – total          | 400 (21.9%)          | 329 (18.1%)          | 10 (1.4%)         | 162 (18.8%)       | 143 (75.3%)
  Unstable angina                 | 157                  | 105                  | 6                 | 70                | 25
  NSTEMI                          | 214                  | 211                  | 4                 | 91                | 107
  STEMI                           | 29                   | 13                   | 0                 | 1                 | 11
Significant stenosis – total      | 290 (15.9%)          | 247 (13.6%)          | 10 (1.4%)         | 117 (13.6%)       | 102 (11.8%)
  Stenosis managed conservatively | 39                   | 41                   | 1                 | 27                | 13
  PCI                             | 208                  | 158                  | 7                 | 70                | 66
  CABG                            | 43                   | 48                   | 2                 | 20                | 23
Total number of MACE              | 699                  | 581                  | 21                | 281               | 247

* The total of MACE components exceeds the MACE total because one patient can have more than one component.

Results: non-inferiority of HEART care
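The trial's formal analysis accounted for the stepped-wedge design; purely as a rough illustration, the sketch below compares the crude MACE proportions from the table above against an assumed non-inferiority margin (the 3% margin is a placeholder, not a value stated in this handout).

from math import sqrt

mace_heart, n_heart = 345, 1821
mace_usual, n_usual = 405, 1827
p_h, p_u = mace_heart / n_heart, mace_usual / n_usual

diff = p_h - p_u                                   # risk difference (HEART care minus usual care)
se_diff = sqrt(p_h * (1 - p_h) / n_heart + p_u * (1 - p_u) / n_usual)
upper_95 = diff + 1.96 * se_diff                   # upper bound of the two-sided 95% CI

margin = 0.03                                      # assumed non-inferiority margin (placeholder)
print(f"Risk difference: {diff:+.1%}, upper bound of 95% CI: {upper_95:+.1%}")
print("Non-inferior" if upper_95 < margin else "Non-inferiority not shown")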


Summary of results

• % admitted:

– Usual care: 34% versus HEART care: 31%

– Low HEART score patients: 9%

• 1 or more out-patient visits:

– Usual care: 60% versus HEART care: 70%

– Low HEART score patients: 53%

• 1 or more additional diagnostic tests:

– Usual care: 65% versus HEART care: 57%

– Low HEART score patients: 40%

• Quality of life & costs slightly better in HEART care patients

Implications of findings

• The HEART score is safe in the work-up of chest pain patients

• Identification of barriers to acceptance and use

• Chest pain remains a diagnostic dilemma:

– What is an acceptable risk?

Publication

Take home messages: impact studies

• A randomized clinical trial (RCT) provides the most valid and direct evidence on (cost-)effectiveness

• Disadvantages:

– long duration

– large sample sizes

– costly

– further changes in strategy likely: new marker, other treatments


Take home messages

• Great progress in methodology. Use it!

• Extensive validation and, if necessary, updating

• Use a decision model to structure the problem: incorporate all actions and downstream consequences

• Use the model to predict the results of the impact study

• Impact study: think carefully about when to do one

– select the best and most mature model

– link probabilities to actions (directive rules)

– use complex-intervention methodology

Further reading (1)

• Moons KG, Altman DG, Reitsma JB, et al. Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis (TRIPOD): explanation and elaboration. Ann Intern Med. 2015;162(1):W1-73.

• Steyerberg EW, et al. Prognosis Research Strategy (PROGRESS) 3: prognostic model research. PLoS Med 2013.

• Altman DG, et al. Prognosis and prognostic research: validating a prognostic model. BMJ 2009.

• Moons KG, et al. Prognosis and prognostic research: application and impact of prognostic models in clinical practice. BMJ 2009.

• Moons KG, Kengne AP, Grobbee DE, Royston P, Vergouwe Y, Altman DG, et al. Risk prediction models: II. External validation, model updating, and impact assessment. Heart. 2012;98:691-8.

• Poldervaart JM, Reitsma JB, Koffijberg H, et al. The impact of the HEART risk score in the early assessment of patients with acute chest pain: design of a stepped wedge, cluster randomised trial. BMC Cardiovasc Disord. 2013;13:77.

• Poldervaart JM, Reitsma JB, Backus BE, et al. Effect of using the HEART score in patients with chest pain in the emergency department: a stepped-wedge, cluster randomized trial. Ann Intern Med 2017;166:689-697.

Further reading (2)

• Hemming K, Haines TP, Chilton PJ, Girling AJ, Lilford RJ. The stepped wedge cluster randomised trial: rationale, design, analysis, and reporting. BMJ. 2015;350:h391.

• Davey C, Hargreaves J, Thompson JA, et al. Analysis and reporting of stepped wedge randomised controlled trials: synthesis and critical appraisal of published studies, 2010 to 2014. Trials. 2015;16:358.

• Zhan Z, de Bock GH, van den Heuvel ER. Statistical methods for unidirectional switch designs: past, present, and future. Stat Methods Med Res. 2017.

• Lord SJ, Irwig L, Simes RJ. When is measuring sensitivity and specificity sufficient to evaluate a diagnostic test, and when do we need randomized trials? Ann Intern Med 2006;144:850-5.

• Lijmer JG, Bossuyt PM. Various randomized designs can be used to evaluate medical tests. J Clin Epidemiol 2009;62:364-73.

• Ferrante di Ruffano L, Davenport C, Eisinga A, et al. A capture-recapture analysis demonstrated that randomized controlled trials evaluating the impact of diagnostic tests on patient outcomes are rare. J Clin Epidemiol 2012;65:282-7.

• de Groot JA, Naaktgeboren CA, Reitsma JB, Moons KG. Methodologic approaches to evaluating new highly sensitive diagnostic tests: avoiding overdiagnosis. CMAJ 2017;189:E64-E68.

Further reading (3)

• Staub LP, Dyer S, Lord SJ, et al. Linking the evidence: intermediate outcomes in medical test assessments. Int J Technol Assess Health Care 2012;28:52-8.

• Koffijberg H, van Zaane B, Moons KG. From accuracy to patient outcome and cost-effectiveness evaluations of diagnostic tests and biomarkers: an exemplary modelling study. BMC Med Res Methodol 2013;13:12.

• Schaafsma JD, van der Graaf Y, Rinkel GJ, et al. Decision analysis to complete diagnostic research by closing the gap between test characteristics and cost-effectiveness. J Clin Epidemiol 2009;62:1248-52.

• Siontis KC, et al. Diagnostic tests often fail to lead to changes in patient outcomes. J Clin Epidemiol 2014.

• El Dib R, et al. Systematic survey of randomized trials evaluating the impact of alternative diagnostic strategies on patient-important outcomes. J Clin Epidemiol 2017.