
Summarising and validating test accuracy results across multiple studies for use in clinical practice

Richard D. Riley, Professor of Biostatistics

Research Institute for Primary Care & Health Sciences

Thank you: Brian Willis, Thomas Debray, Kym Snell, Joie Ensor, Jon Deeks, Carl Moons, Julian Higgins, and others


Objectives

• Consider whether meta-analysis results are actually helpful (applicable) to clinical practice

• Consider methods to examine test performance in multiple settings that go beyond average results

• Focus on probabilistic inferences and validation


Meta-analysis & heterogeneity

• Meta-analysis should be producing clinically useful results

• Usually there will be heterogeneity in a meta-analysis of test accuracy studies, e.g. differences across studies in:
  - thresholds reported (we just heard about this!)
  - methods of measurement
  - reference standard
  - population characteristics (case-mix variation)
  - prevalence of disease

• Causes the TRUE sensitivity and TRUE specificity to vary from setting to setting


Meta-analysis & heterogeneity

• Yet this does not stop us (me) from doing meta-analysis

• Just use a random effects model!!!
  - accounts for unexplained between-study heterogeneity

• Gives a pooled result for sensitivity and specificity

• And – most importantly (?) – another publication for the CV

• But can the clinician actually use that pooled result?

• Is the summary sensitivity and specificity applicable to their population?

Time for us to do better … but can we?


Going beyond the average?

PART 1:

Probabilistic inferences for performance in clinical settings


Example 1: Accuracy of ear temperature for fever?

11 studies identified that

- All used > 38°C to define test positive

- All used rectal temperature as reference standard

- All used the ‘FirstTemp’ ear thermometer

- All used an electronic rectal thermometer

Bivariate meta-analysis used to combine 2 by 2 tables

Produces summary sensitivity and specificity
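For reference, the bivariate random-effects model typically used for this (the binomial-normal model described by Chu & Cole and Reitsma et al.) can be written, in generic notation, as:

\[
\begin{aligned}
& r_{1i} \sim \mathrm{Bin}(n_{1i}, \mathrm{sens}_i), \qquad r_{0i} \sim \mathrm{Bin}(n_{0i}, \mathrm{spec}_i) \\
& \begin{pmatrix} \mathrm{logit}(\mathrm{sens}_i) \\ \mathrm{logit}(\mathrm{spec}_i) \end{pmatrix}
\sim \mathrm{N}\!\left( \begin{pmatrix} \mu_1 \\ \mu_0 \end{pmatrix},
\begin{pmatrix} \tau_1^2 & \rho\,\tau_1\tau_0 \\ \rho\,\tau_1\tau_0 & \tau_0^2 \end{pmatrix} \right)
\end{aligned}
\]

where \(r_{1i}\) is the number of true positives out of \(n_{1i}\) diseased patients in study \(i\), \(r_{0i}\) is the number of true negatives out of \(n_{0i}\) non-diseased patients, and the summary sensitivity and specificity are \(\mathrm{logit}^{-1}(\hat{\mu}_1)\) and \(\mathrm{logit}^{-1}(\hat{\mu}_0)\).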

[Figure: plot of sensitivity against specificity, showing each study estimate, the summary point and the 95% confidence region]

Summary sensitivity = 65% (51% to 77%)

Summary specificity = 98% (96% to 99%)

But these relate to the average performance across all populations

- could sensitivity and specificity be different in particular settings?

Prediction regions (intervals) indicate the potential true test performance in a single setting

Can be derived after a meta-analysis

In a Bayesian framework they allow predictive inferences & distributions:

“What is the probability that sensitivity and specificity are > 80% in a single setting?”

[Figure: plot of sensitivity against specificity, showing each study estimate, the summary point, the 95% confidence region and the 95% prediction region]

Probability sensitivity and specificity will fall in this region (values > 80%) in a single setting = 0.18
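Probability statements of this kind can be obtained by sampling from the predictive distribution of the true logit-sensitivity and logit-specificity in a new setting. A minimal Python sketch is below; the parameter values are invented for illustration (not the estimates from the talk), and point estimates are plugged in rather than full posterior draws, which a genuinely Bayesian analysis would use.

import numpy as np

# Illustrative values only (not estimates from the talk):
# pooled logit-sensitivity/specificity, between-study SDs and correlation
# from a fitted bivariate random-effects meta-analysis.
mu = np.array([0.62, 3.9])     # summary logit(sensitivity), logit(specificity)
tau = np.array([0.80, 0.90])   # between-study SDs on the logit scale
rho = -0.4                     # between-study correlation
Sigma = np.array([[tau[0]**2, rho * tau[0] * tau[1]],
                  [rho * tau[0] * tau[1], tau[1]**2]])

rng = np.random.default_rng(1)
# Predictive draws of the TRUE (logit-sens, logit-spec) in a single new setting
draws = rng.multivariate_normal(mu, Sigma, size=100_000)
sens = 1 / (1 + np.exp(-draws[:, 0]))
spec = 1 / (1 + np.exp(-draws[:, 1]))

# Probability that BOTH sensitivity and specificity exceed 80% in the new setting
print(np.mean((sens > 0.8) & (spec > 0.8)))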

Example 2: Accuracy of PTH for hypocalcaemia?

5 studies identified that studied parathyroid hormone (PTH)

- Patients all had a thyroidectomy

- % change in PTH from before to after surgery

- Change > 65% indicates test positive

- Reference standard measured 48 hours later

• Accurate PTH test may help to send people home earlier

• Bivariate meta-analysis used to combine 2 by 2 tables

• Produces summary sensitivity and specificity

[Figure: plot of sensitivity against specificity for the PTH studies, showing each study estimate and the summary point]

Probability sensitivity and specificity will fall in this region (values > 80%) in a single setting = 0.57

Going beyond the average?

PART 2:

Tailoring PPV and NPV to clinical settings

(with validation)


PPV and NPV

PPV: probability of disease given a positive test result
NPV: probability of non-disease given a negative test result

• Clinicians are usually more interested in PPV and NPV

• They need to combine sensitivity and specificity with prevalence to obtain them (e.g. using Bayes' theorem, or an equation)

• But what values of sensitivity, specificity and prevalence do they take from a meta-analysis?

• Are the derived PPV and NPV reliable?

Deriving PPV and NPV

OPTION A:
• Take the pooled sensitivity, specificity and prevalence from the meta-analysis of existing studies (though this makes no use of local data)
• Applying Bayes' theorem we obtain predictions using:
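In standard notation, with \(\hat{S}\), \(\hat{C}\) and \(\hat{\pi}\) denoting the pooled sensitivity, specificity and prevalence from the meta-analysis, Bayes' theorem gives:

\[
\mathrm{PPV} = \frac{\hat{S}\,\hat{\pi}}{\hat{S}\,\hat{\pi} + (1-\hat{C})(1-\hat{\pi})},
\qquad
\mathrm{NPV} = \frac{\hat{C}\,(1-\hat{\pi})}{\hat{C}\,(1-\hat{\pi}) + (1-\hat{S})\,\hat{\pi}}
\]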

Deriving PPV and NPV

OPTION B:
• Take the pooled sensitivity & specificity from the meta-analysis
• Combine with the ‘known’ prevalence in the local setting
• Applying Bayes' theorem we obtain predictions as above, with the local prevalence in place of the pooled prevalence (a sketch follows below)
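A minimal Python sketch of option B; the helper simply applies Bayes' theorem, and the numbers passed in are illustrative rather than estimates from either example.

def ppv_npv(sens, spec, prev):
    # Bayes' theorem: predictive values from sensitivity, specificity and prevalence
    ppv = sens * prev / (sens * prev + (1 - spec) * (1 - prev))
    npv = spec * (1 - prev) / (spec * (1 - prev) + (1 - sens) * prev)
    return ppv, npv

# OPTION B: pooled sensitivity & specificity from the meta-analysis,
# combined with the 'known' prevalence in the local setting (illustrative values)
print(ppv_npv(sens=0.65, spec=0.98, prev=0.20))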

Deriving PPV and NPV

OPTION C:
• Develop a meta-regression with study-level covariates
• Then predict PPV and NPV using the fitted model

e.g. fit a bivariate meta-analysis of PPV and NPV from existing studies, with prevalence as a covariate (Leeflang et al.)

Obtain predictions using… (one possible form is sketched below)
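The exact parameterisation is not shown here, but one plausible form, following Leeflang et al. (2012), is a bivariate random-effects meta-regression of the logit predictive values with study prevalence as a covariate:

\[
\begin{pmatrix} \mathrm{logit}(\mathrm{PPV}_i) \\ \mathrm{logit}(\mathrm{NPV}_i) \end{pmatrix}
\sim \mathrm{N}\!\left( \begin{pmatrix} \alpha_1 + \beta_1\,\mathrm{prev}_i \\ \alpha_2 + \beta_2\,\mathrm{prev}_i \end{pmatrix}, \;\Omega \right)
\]

so that, for a new setting with prevalence \(\mathrm{prev}_{\mathrm{new}}\), the predicted values are \(\mathrm{logit}^{-1}(\hat{\alpha}_1 + \hat{\beta}_1\,\mathrm{prev}_{\mathrm{new}})\) and \(\mathrm{logit}^{-1}(\hat{\alpha}_2 + \hat{\beta}_2\,\mathrm{prev}_{\mathrm{new}})\).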


Are predicted PPV and NPV reliable?

As for any risk prediction equation, we must check performance
• Here good calibration is essential
• Does the predicted PPV & NPV agree with the observed PPV & NPV?

PROBLEM:
• It is well known that developed models are over-fitted (over-optimistic) in the data they were developed in (see work by Harrell, Steyerberg, etc.)

PROPOSAL: Use internal-external cross-validation (Royston et al., Debray et al.)

• Each cycle produces estimates of predictive performance (calibration statistics such as O:E)

• Can then use meta-analysis to summarise across all cycles

Internal-external cross-validation (IECV): example with 3 studies

Internal-external cross-validation (IECV)

• Helps answer the question:

“If I use meta-analysis to develop a prediction equation (model) for deriving PPV and NPV in a new population,
- is it likely to perform well?
- if not, will a different strategy work better?”
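A minimal, self-contained Python sketch of the IECV cycle for option B. The 2x2 tables are made up, and a crude fixed-effect pooling of counts stands in for the bivariate random-effects meta-analysis that would be fitted in practice.

import numpy as np

# Each study as a 2x2 table: TP, FP, FN, TN (illustrative numbers, not the PTH data)
studies = [
    dict(TP=40, FP=5, FN=10, TN=145),
    dict(TP=25, FP=3, FN=12, TN=160),
    dict(TP=60, FP=8, FN=15, TN=217),
    dict(TP=18, FP=2, FN=6,  TN=74),
    dict(TP=33, FP=4, FN=9,  TN=104),
]

def pooled_sens_spec(data):
    # Crude fixed-effect pooling of counts; a bivariate random-effects
    # meta-analysis would be used here in practice.
    TP = sum(s["TP"] for s in data); FN = sum(s["FN"] for s in data)
    TN = sum(s["TN"] for s in data); FP = sum(s["FP"] for s in data)
    return TP / (TP + FN), TN / (TN + FP)

o_over_e = []
for i, held_out in enumerate(studies):                  # IECV: leave one study out per cycle
    sens, spec = pooled_sens_spec(studies[:i] + studies[i + 1:])
    n = sum(held_out.values())
    prev = (held_out["TP"] + held_out["FN"]) / n        # 'local' prevalence in the held-out study
    pred_npv = spec * (1 - prev) / (spec * (1 - prev) + (1 - sens) * prev)  # option B prediction
    obs_npv = held_out["TN"] / (held_out["TN"] + held_out["FN"])            # observed NPV
    o_over_e.append(obs_npv / pred_npv)                 # calibration: observed / expected

# Each cycle gives an O/E estimate; these would then be summarised
# across cycles with a (random-effects) meta-analysis.
print(np.round(o_over_e, 3))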


Revisit the PTH example: can we use options A or B to produce reliable PPV and NPV for particular clinical settings?

(i) apply internal-external cross-validation
(ii) then summarise performance

OPTION A: Predict PPV and NPV from meta-analysis estimates


OPTION B: Tailor PPV and NPV using local prevalence


But there is large uncertainty due to the small number of studies: the probability is only 40% that the true O/E is between 0.9 and 1.1

NPV is most important for the PTH test

We want to know for sure that a patient does not have hypocalcaemia, so we can send them home

Q: “What is the potential true NPV, given a predicted NPV from option B?”

• Internal-external cross-validation followed by Bayesian meta-analysis of calibration performance tells us:

For a predicted NPV of 0.95, there is a 95% probability the true NPV is between 0.78 and 0.99 - acceptable error?


Risk prediction models

• Our examples focused on test research
• But the principles apply to risk prediction research in general, e.g. multivariable models for diagnosis or prognosis
• These allow the inclusion of patient-level (& study-level) covariates
• The aim is the same: we should want reliable model predictions in all clinical settings, not just on average


Time to go beyond the average performance and validate (improve) predictions in each setting, subgroup, and population of interest

External validation of a prognostic model in breast cancer patients

- does it calibrate well upon validation?

OPTION 1: No recalibration
OPTION 2: Recalibration of the baseline hazard

[Figures: results shown for each country used for external validation]

Summary

A challenge for us all:

- Are our meta-analysis results useful?

- Should we move away from summary results?

- Focus rather on probabilistic statements?

- Leave out studies to validate test performance?

Much related work:
- in particular the ‘tailored meta-analysis’ of Willis and Hyde
- overfitting & recalibration of prediction models (Steyerberg et al.)
- IPD meta-analysis of risk prediction studies (Debray et al.)

Other clinical measures highly relevant (e.g. net-benefit: Vickers et al)

Some references

• Willis BH, Hyde CJ. Estimating a test's accuracy using tailored meta-analysis: how setting-specific data may aid study selection. J Clin Epidemiol 2014;67(5):538-46.

• Riley RD, Ahmed I, Debray TP, et al. Summarising and validating test accuracy results across multiple studies for use in clinical practice. Stat Med 2015;34(13):2081-103.

• Leeflang MM, Bossuyt PM, Irwig L. Diagnostic test accuracy may vary with prevalence: implications for evidence-based diagnosis. J Clin Epidemiol 2009;62(1):5-12.

• Leeflang MM, Deeks JJ, Rutjes AW, et al. Bivariate meta-analysis of predictive values of diagnostic tests can be an alternative to bivariate meta-analysis of sensitivity and specificity. J Clin Epidemiol 2012;65(10):1088-97.

• Leeflang MM, Rutjes AW, Reitsma JB, et al. Variation of a test's sensitivity and specificity with disease prevalence. CMAJ 2013;185(11):E537-44.

• Debray TP, Moons KG, Ahmed I, et al. A framework for developing, implementing, and evaluating clinical prediction models in an individual participant data meta-analysis. Stat Med 2013;32(18):3158-80.

KEELE COURSE: Statistical methods for IPD meta-analysis, 6-7 December 2016