darlene goldstein 29 january 2003 receiver operating characteristic methodology

27
Darlene Goldstein 29 January 2003 Receiver Operating Characteristic Methodology

Post on 21-Dec-2015

243 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Darlene Goldstein 29 January 2003 Receiver Operating Characteristic Methodology

Darlene Goldstein 29 January 2003

Receiver Operating Characteristic Methodology

Page 2: Darlene Goldstein 29 January 2003 Receiver Operating Characteristic Methodology

Outline

• Introduction

• Hypothesis testing

• ROC curve

• Area under the ROC curve (AUC)

• Examples using ROC

• Concluding remarks

Page 3: Darlene Goldstein 29 January 2003 Receiver Operating Characteristic Methodology

Introduction to ROC curves

• ROC = Receiver Operating Characteristic

• Started in electronic signal detection theory (1940s - 1950s)

• Has become very popular in biomedical applications, particularly radiology and imaging

• Also used in machine learning applications to assess classifiers

• Can be used to compare tests/procedures

Page 4: Darlene Goldstein 29 January 2003 Receiver Operating Characteristic Methodology

ROC curves: simplest case

• Consider diagnostic test for a disease

• Test has 2 possible outcomes:

– ‘postive’ = suggesting presence of disease

– ‘negative’

• An individual can test either positive or negative for the disease

• Prof. Mean...

Page 5: Darlene Goldstein 29 January 2003 Receiver Operating Characteristic Methodology

Hypothesis testing refresher

• 2 ‘competing theories’ regarding a population parameter:– NULL hypothesis H (‘straw man’)– ALTERNATIVE hypothesis A (‘claim’, or

theory you wish to test)• H: NO DIFFERENCE

– any observed deviation from what we expect to see is due to chance variability

• A: THE DIFFERENCE IS REAL

Page 6: Darlene Goldstein 29 January 2003 Receiver Operating Characteristic Methodology

Test statistic

• Measure how far the observed data are from what is expected assuming the NULL H by computing the value of a test statistic (TS) from the data

• The particular TS computed depends on the parameter

• For example, to test the population mean , the TS is the sample mean (or standardized sample mean)

• The NULL is rejected fi the TS falls in a user-specified ‘rejection region’

Page 7: Darlene Goldstein 29 January 2003 Receiver Operating Characteristic Methodology

True disease state vs. Test result

not rejected rejected

No disease (D = 0)

specificity

XType I error (False +)

Disease (D = 1) X

Type II error (False -)

Power 1 - ; sensitivity

DiseaseTest

Page 8: Darlene Goldstein 29 January 2003 Receiver Operating Characteristic Methodology

Specific Example

Test Result

Pts Pts with with diseasdiseasee

Pts Pts without without the the diseasedisease

Page 9: Darlene Goldstein 29 January 2003 Receiver Operating Characteristic Methodology

Test Result

Call these patients “negative”

Call these patients “positive”

Threshold

Page 10: Darlene Goldstein 29 January 2003 Receiver Operating Characteristic Methodology

Test Result

Call these patients “negative”

Call these patients “positive”

without the diseasewith the disease

True Positives

Some definitions ...

Page 11: Darlene Goldstein 29 January 2003 Receiver Operating Characteristic Methodology

Test Result

Call these patients “negative”

Call these patients “positive”

without the diseasewith the disease

False Positives

Page 12: Darlene Goldstein 29 January 2003 Receiver Operating Characteristic Methodology

Test Result

Call these patients “negative”

Call these patients “positive”

without the diseasewith the disease

True negatives

Page 13: Darlene Goldstein 29 January 2003 Receiver Operating Characteristic Methodology

Test Result

Call these patients “negative”

Call these patients “positive”

without the diseasewith the disease

False negatives

Page 14: Darlene Goldstein 29 January 2003 Receiver Operating Characteristic Methodology

Test Result

without the diseasewith the disease

‘‘‘‘-’-’’’

‘‘‘‘+’+’’’

Moving the Threshold: right

Page 15: Darlene Goldstein 29 January 2003 Receiver Operating Characteristic Methodology

Test Result

without the diseasewith the disease

‘‘‘‘-’-’’’

‘‘‘‘+’+’’’

Moving the Threshold: left

Page 16: Darlene Goldstein 29 January 2003 Receiver Operating Characteristic Methodology

Tru

e P

osi

tive R

ate

(s

en

siti

vit

y)

0%

100%

False Positive Rate (1-specificity)

0%

100%

ROC curve

Page 17: Darlene Goldstein 29 January 2003 Receiver Operating Characteristic Methodology

Tru

e P

osi

tive

Ra

te

0%

100%

False Positive Rate0%

100%

Tru

e P

osi

tive

Ra

te

0%

100%

False Positive Rate0%

100%

A good test: A poor test:

ROC curve comparison

Page 18: Darlene Goldstein 29 January 2003 Receiver Operating Characteristic Methodology

Best Test: Worst test:T

rue

Po

sitiv

e R

ate

0%

100%

False Positive Rate

0%

100%

Tru

e P

osi

tive

R

ate

0%

100%

False Positive Rate

0%

100%

The distributions don’t overlap at all

The distributions overlap completely

ROC curve extremes

Page 19: Darlene Goldstein 29 January 2003 Receiver Operating Characteristic Methodology

‘Classical’ estimation

• Binormal model:

– X ~ N(0,1) in nondiseased population

– X ~ N(a, 1/b) in diseased population

• Then

ROC(t) = (a + b-1(t)) for 0 < t < 1

• Estimate a, b by ML using readings from sets of diseased and nondiseased patients

Page 20: Darlene Goldstein 29 January 2003 Receiver Operating Characteristic Methodology

ROC curve estimation with continuous data

• Many biochemical measurements are in fact continuous, e.g. blood glucose vs. diabetes

• Can also do ROC analysis for continuous (rather than binary or ordinal) data

• Estimate ROC curve (and smooth) based on empirical ‘survivor’ function (1 – cdf) in diseased and nondiseased groups

• Can also do regression modeling of the test result

• Another approach is to model the ROC curve directlyas a function of covariates

Page 21: Darlene Goldstein 29 January 2003 Receiver Operating Characteristic Methodology

Area under ROC curve (AUC)

• Overall measure of test performance

• Comparisons between two tests based on differences between (estimated) AUC

• For continuous data, AUC equivalent to Mann-Whitney U-statistic (nonparametric test of difference in location between two populations)

Page 22: Darlene Goldstein 29 January 2003 Receiver Operating Characteristic Methodology

Tru

e P

osi

tive

Ra

te

0%

100%

False Positive Rate

0%

100%

Tru

e P

osi

tive

R

ate

0%

100%

False Positive Rate

0%

100%

Tru

e P

osi

tive

R

ate

0%

100%

False Positive Rate

0%

100%

AUC = 50%

AUC = 90% AUC =

65%

AUC = 100%

Tru

e P

osi

tive

R

ate

0%

100%

False Positive Rate

0%

100%

AUC for ROC curves

Page 23: Darlene Goldstein 29 January 2003 Receiver Operating Characteristic Methodology

Interpretation of AUC

• AUC can be interpreted as the probability that the test result from a randomly chosen diseased individual is more indicative of disease than that from a randomly chosen nondiseased individual: P(Xi Xj | Di = 1, Dj = 0)

• So can think of this as a nonparametric distance between disease/nondisease test results

Page 24: Darlene Goldstein 29 January 2003 Receiver Operating Characteristic Methodology

Problems with AUC

• No clinically relevant meaning

• A lot of the area is coming from the range of large false positive values, no one cares what’s going on in that region (need to examine restricted regions)

• The curves might cross, so that there might be a meaningful difference in performance that is not picked up by AUC

Page 25: Darlene Goldstein 29 January 2003 Receiver Operating Characteristic Methodology

Examples using ROC analysis

• Threshold selection for ‘tuning’ an already trained classifier (e.g. neural nets)

• Defining signal thresholds in DNA microarrays (Bilban et al.)

• Comparing test statistics for identifying differentially expressed genes in replicated microarray data (Lönnstedt and Speed)

• Assessing performance of different protein prediction algorithms (Tang et al.)

• Inferring protein homology (Karwath and King)

Page 26: Darlene Goldstein 29 January 2003 Receiver Operating Characteristic Methodology

Homology Induction ROC

Page 27: Darlene Goldstein 29 January 2003 Receiver Operating Characteristic Methodology

Concluding remarks – remaining challenges in ROC methodology

• Inference for ROC curve when no ‘gold standard’

• Role of ROC in combining information?

• Incorporating time into ROC analysis

• Alternatives to ROC for describing test accuracy?

• Generalization of positive/negative predictive value to continuous test? (+/-) predictive value = proportion of patients with

(+/-) result who are correctly diagnosed = True/(True + False)