Trimming screening tests and modern psychometrics Paul K. Crane, MD MPH General Internal Medicine University of Washington

Post on 20-Jan-2018

Outline

• Background on screening tests
• 2x2 tables
• ROC curves
• Consideration of strategies for shortening tests
• A word or two on testlets

I. Background on Screening

Purposes of measurement

• Discriminative (e.g. screening)
• Evaluative (e.g. longitudinal analyses)
• Predictive (e.g. prognostication)
  – After Kirshner and Guyatt (1985)

Information curves applied to K&G

[Figure: MMSE test information (vertical axis, 0 to 100) plotted against theta (-3 to 3), with labels marking "Evaluative" and "Discriminative".]

Diagnostic medicine as a Bayesian task

[Figure: disease-probability axis with regions labeled "Don’t test", "test", and "treat".]

Diagnostic medicine as a Bayesian task

[Figure: the same disease-probability axis ("Don’t test", "test", "treat") with markers numbered 1, 2, and 3.]

Screening tests are the same – only different

• Screening implies applying the test to asymptomatic individuals in whom there is no specific reason to suspect the disease
  – In terms of the previous slides, this means a test/don’t test threshold of 0
• Often the result of a screening test is a need for further testing rather than a specific diagnosis
  – Need for biopsy rather than need for chemotherapy

Screening tests

[Figure: disease-probability axis with regions labeled "screen", "(don’t test)", "test further", and "treat".]

Rationale for screening

• Screening tests should be applied when they may make a difference
  – Effect on management
  – Some difference in outcome between disease detected in asymptomatic people as opposed to disease detected in symptomatic patients
    • (note people vs. patients)
    • If no difference in outcome, no benefit from expenditures on screening
  – Implies a disease model of worsening disease in which an intervention early prevents subsequent badness

Screening isn’t always a good idea

• Lung cancer in smokers with CT scans (disease grows too fast)

• Liver cancer with AFP in Hep C patients (yield too low, false negatives too common – test isn’t accurate enough)

• Breast cancer in young women (disease is less prevalent but more aggressive, breast biology leads to higher false-positive rates in young women, which in combination lead to unacceptable morbidity for a negligible mortality benefit)

What about dementia?

• No DMARD equivalents (so far); marginal benefit to early detection
  – Planning, QOL decisions, etc.
  – Potential harm in early detection?
• There are those who advocate population-based screening now (Borson 2004)
  – USPSTF says evidence insufficient to recommend for or against screening
  – Spiegel letter to editor (2007); Brodaty paper (2006)
• Primary purpose is research

What about CIND / MCI?

• Even less rationale for population-based screening
  – In several studies, while patients with MCI have an increased risk of progressing to AD, for any individual with MCI their risk for reverting to normal is higher than their risk for AD.
  – No intervention known to reduce rates of conversion
• Again, research rationale

Research rationale

• Parameter of interest is the rate of disease in the general population (or other denominator)
• Most valid way: gold standard test applied to entire population
  – Chicago study, ROS, some others
  – Problems: expensive
• General idea: apply a screening test / strategy to determine who should receive gold standard eval
  – Most of the epidemiological studies of cognitive functioning use this strategy

2-stage sampling

• 1st stage: everyone in enumerated sample receives a screening test/strategy

• 2nd stage: some decision rule is applied to the 1st stage results to identify people who receive further evaluations to definitively rule-in or rule-out disease

• Analysis: disease status from 2nd stage extrapolated back to the underlying sample from the 1st stage
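
The extrapolation step can be sketched in a few lines. This is a minimal illustration, assuming that within each screening stratum the people sent to the 2nd stage are a random sample of that stratum. The function name, the dictionary keys, and all counts are hypothetical and are not taken from any of the studies named in these slides.

```python
def two_stage_prevalence(strata):
    """Weight each stratum's 2nd-stage disease rate by its 1st-stage size."""
    total_screened = 0
    expected_cases = 0.0
    for s in strata:
        if s["n_evaluated"] == 0:
            raise ValueError("a stratum with nobody evaluated cannot be extrapolated")
        rate = s["n_diseased"] / s["n_evaluated"]   # disease rate among those evaluated
        expected_cases += rate * s["n_screened"]    # extrapolate to everyone screened
        total_screened += s["n_screened"]
    return expected_cases / total_screened


# Hypothetical counts: everyone below the cutpoint is evaluated, 10% above it.
strata = [
    {"n_screened": 200, "n_evaluated": 200, "n_diseased": 80},  # below cutpoint
    {"n_screened": 800, "n_evaluated": 80, "n_diseased": 2},    # above cutpoint
]
prevalence = two_stage_prevalence(strata)
print(round(prevalence, 3))
```

If nobody above the cutpoint is evaluated, that stratum's rate cannot be estimated at all, which is why the design that samples a small percentage above the cutpoint can address false negatives.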

Variations on a theme

• Simplest: single cut-point, no sampling over the cutpoint
  – EURODEM, ACT
• Slight elaboration: single cut-point, 100% below, small % above
  – Can address possibility of false negatives
  – CSHA
• Fancier still: age/education adjusted cutpoints, sampling (Cache County) (also case-cohort design, which is even more fancy)

Validity of the screening protocol

• Imagine an epidemiological risk factor study
• Risk factor is correlated with educational quality (e.g. smoking, obesity, untreated hypertension…)
• Educational quality is associated with DIF on the screening test
  – People with lower education have lower scores for a given degree of actual cognitive deficit
• Borderline people with higher education more likely to escape detection by the screening test; ignored by the study
• Rates extrapolated back: biased study

DIF in screening tests

• DIF thus becomes a key feature for the validity of epidemiological studies that employ 2-stage sampling designs
  – Crane et al. Int Psychogeriatr 2006; 18: 505-15.
• Overwhelmingly ignored in the literature
  – Entire session on epidemiological studies of HTN at VasCOG 2007 in which education and SES were not mentioned at all
• Not really the focus this year, but could be an important feature of validation

Test accuracy in 2-stage sampling

• Begg and Greenes. Assessment of diagnostic tests when disease verification is subject to selection bias. Biometrics. 1983;39:207-215 (web site)

• Straightforward way to extend accuracy estimates back to the original population
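
For a binary screening result, the correction can be sketched as below. This is an illustration rather than the paper's own notation: it assumes that who gets verified depends only on the screening result, and the function name, argument names, and counts are hypothetical.

```python
def begg_greenes(n_pos, n_neg, ver_pos_dis, ver_pos_total, ver_neg_dis, ver_neg_total):
    """Verification-bias-corrected sensitivity and specificity for a binary test.

    n_pos, n_neg            : screen-positive / screen-negative counts in the full sample
    ver_*_dis, ver_*_total  : diseased and total counts among those verified
    """
    p_pos = n_pos / (n_pos + n_neg)          # P(T+), from everyone screened
    p_neg = 1.0 - p_pos
    p_dis_pos = ver_pos_dis / ver_pos_total  # P(D+ | T+), from the verified only
    p_dis_neg = ver_neg_dis / ver_neg_total  # P(D+ | T-), from the verified only
    sens = p_pos * p_dis_pos / (p_pos * p_dis_pos + p_neg * p_dis_neg)
    spec = p_neg * (1 - p_dis_neg) / (p_neg * (1 - p_dis_neg) + p_pos * (1 - p_dis_pos))
    return sens, spec


# Hypothetical: all 200 screen-positives verified, 80 of 800 screen-negatives verified.
sens, spec = begg_greenes(n_pos=200, n_neg=800,
                          ver_pos_dis=80, ver_pos_total=200,
                          ver_neg_dis=2, ver_neg_total=80)
naive_sens = 80 / (80 + 2)  # uses verified cases only and overstates sensitivity
print(round(sens, 3), round(spec, 3), round(naive_sens, 3))
```

The naive figure counts only verified people, so the under-sampled screen-negatives contribute too few false negatives; the corrected version reweights them by their share of the original sample.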

Quality of papers on diagnostic tests

• STARD initiative. Ann Int Med 2003 (web site)
• Provides a guideline for high-quality articles about diagnostic or screening tests
• We should play by these rules
• There is a checklist (p. 42) and a flow chart (p. 43; next slide)
  – Nothing too surprising
  – Reviews on quality of papers about diagnostic and screening tests: quality is terrible.

STARD flowchart

II. 2x2 tables

Set up of 2x2 tables

                 Nature
                 +           -               Total
Test   +         TP          FP              Test pos
       -         FN          TN              Test neg
Total            Diseased    Non-diseased    N

Summaries of 2x2 tables: SN, SP

• Sensitivity
  – TP/diseased
  – Proportion of those with disease caught by the test
• Specificity
  – TN/non-diseased
  – Proportion of those who truly don’t have the disease correctly identified by the test

Summaries of 2x2 tables: predictive values and LR

• Positive predictive value (PPV)
  – TP/Test positives
  – Proportion of those with a positive test who actually have the disease
• Negative predictive value (NPV)
  – TN/Test negatives
  – Proportion of those with a negative test who actually don’t have the disease
• Likelihood ratios are built from SN and SP instead: Pos LR = SN/(1-SP), Neg LR = (1-SN)/SP
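
The 2x2 summaries can be computed directly from the four cells. The sketch below uses the standard definitions of sensitivity, specificity, predictive values, and likelihood ratios; the function name and the counts are hypothetical.

```python
def summarize_2x2(tp, fp, fn, tn):
    """Return the usual summaries of a 2x2 table (test result by true status)."""
    sens = tp / (tp + fn)            # TP / diseased
    spec = tn / (tn + fp)            # TN / non-diseased
    return {
        "sensitivity": sens,
        "specificity": spec,
        "ppv": tp / (tp + fp),       # TP / test positives
        "npv": tn / (tn + fn),       # TN / test negatives
        "lr_pos": sens / (1 - spec), # undefined when specificity is exactly 1
        "lr_neg": (1 - sens) / spec,
    }


# Hypothetical table: 100 diseased, 900 non-diseased.
summary = summarize_2x2(tp=80, fp=120, fn=20, tn=780)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

Sensitivity, specificity, and the likelihood ratios do not depend on prevalence, whereas the predictive values do.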

SPIN, SNOUT

• Need a (positive result on a) SPecific test to rule something IN

• Need a (negative result on a) SENsitive test to rule something OUT

• Decent rule of thumb but doesn’t apply pre-test probabilities
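
Bringing the pre-test probability back in is a one-line application of Bayes' rule on the odds scale: post-test odds = pre-test odds × likelihood ratio. A minimal sketch, with a hypothetical function name and illustrative numbers:

```python
def post_test_probability(pre_test_prob, likelihood_ratio):
    """Update a pre-test probability with a likelihood ratio via odds."""
    pre_odds = pre_test_prob / (1 - pre_test_prob)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


# Illustrative: the same positive result (LR+ = 6) at two pre-test probabilities.
print(round(post_test_probability(0.10, 6.0), 3))
print(round(post_test_probability(0.01, 6.0), 3))
```

The same positive result is far less convincing when the pre-test probability is low, which is the usual situation in screening.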

III. ROC curves

ROC curves

• "Signal Detection Theory"
• World War II -- analysis of radar images
• Radar operators had to decide whether a blip on the screen represented an enemy target, a friendly ship, or just noise
• Signal detection theory measured the ability of radar receiver operators to do this, hence the name Receiver Operator Characteristics
• In the 1970s signal detection theory was recognized as useful for interpreting medical test results

http://gim.unmc.edu/dxtests/roc3.htm

ROC basics

• ROC curves plot
  – sensitivity vs. (1-specificity)
  – (the true positive rate vs. the false positive rate)
  – at each possible cutpoint
• Useful for visualizing the impact of various potential cutoff points on a continuous measure (continuous → binary)
• Economic decision on cutpoint; no single right answer
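
To make the construction concrete, here is a small standard-library sketch that sweeps every observed cutpoint and computes the area under the resulting curve by the trapezoid rule. It assumes that low scores indicate impairment, so a person screens positive when the score is at or below the cutpoint; the function names and the eight scores are made up for illustration.

```python
def roc_points(scores, diseased):
    """Return (1 - specificity, sensitivity) pairs, one per candidate cutpoint."""
    n_dis = sum(1 for d in diseased if d)
    n_non = len(diseased) - n_dis
    points = [(0.0, 0.0)]  # cutpoint below every score: nobody screens positive
    for cut in sorted(set(scores)):
        tp = sum(1 for s, d in zip(scores, diseased) if d and s <= cut)
        fp = sum(1 for s, d in zip(scores, diseased) if not d and s <= cut)
        points.append((fp / n_non, tp / n_dis))
    return points


def auc(points):
    """Trapezoid-rule area under an ROC curve given as ordered points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical screening scores; 1 marks a person with disease.
scores = [40, 45, 50, 52, 55, 58, 60, 62]
diseased = [1, 1, 0, 1, 0, 0, 0, 0]
points = roc_points(scores, diseased)
print(points)
print(round(auc(points), 4))
```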

Limitations of ROC curves

• Not intended to help with choosing particular items or for improving tests

• Doesn’t tell us which parts of the test (items) are helpful in the region of interest

• Doesn’t help us in combining the best items from several tests

ROC curve for dementia from ACT

[Figure: ROC curve, sensitivity against 1 - Specificity (both axes 0.00 to 1.00), with points labeled by cutpoint score from about 30 to 62. Area under ROC curve = 0.9105.]

Optimality from an ROC curve

• Always tradeoffs between sensitivity and specificity

• Also numbers of people who need to be evaluated with the gold standard test (number of individuals who will screen positive)

• Optimal point depends on consequences of missing cases (false negatives), costs of working up false positives

• Breast cancer: 10:1 for sufficient sensitivity
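
One way to turn these tradeoffs into a cutpoint is to attach a cost to each kind of error and pick the cutpoint with the lowest total. The sketch below does that under simple assumptions: low scores indicate impairment, costs are constant per person, and the cost weights (including the 10:1 ratio used in the first call) are illustrative rather than taken from any study.

```python
def best_cutpoint(scores, diseased, cost_fn, cost_fp):
    """Return (cutpoint, total_cost) minimizing weighted misclassification cost.

    A person screens positive when score <= cutpoint.
    """
    best = None
    for cut in sorted(set(scores)):
        fn = sum(1 for s, d in zip(scores, diseased) if d and s > cut)       # missed cases
        fp = sum(1 for s, d in zip(scores, diseased) if not d and s <= cut)  # needless work-ups
        cost = cost_fn * fn + cost_fp * fp
        if best is None or cost < best[1]:
            best = (cut, cost)
    return best


# Hypothetical screening scores; 1 marks a person with disease.
scores = [40, 45, 50, 52, 55, 58, 60, 62]
diseased = [1, 1, 0, 1, 0, 0, 0, 0]

print(best_cutpoint(scores, diseased, cost_fn=10, cost_fp=1))  # missing a case is costly
print(best_cutpoint(scores, diseased, cost_fn=1, cost_fp=10))  # work-ups are costly
```

Changing only the cost ratio moves the chosen cutpoint, which is the sense in which there is no single right answer.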

What about normal/CIND/dementia?

• Chengjie Xiong: ROC surface (Stat Med 2006; 25:1251-1273)
  – May have the same issues in terms of tradeoffs
  – Does missing a case of CIND have the same impact as missing a case of dementia?
  – Should we try to use the same tool to do both tasks? Dementia/normal is an easier target than CIND/normal. Dementia/CIND is hard and primarily depends on whether deficits have a functional impact, which in turn is very hard to tease out

IV. Shortening of psychometric tests: strategies used in the literature

Search strategy

• “short*”
• “psychometric test*”
• #1 AND #2
• Convenience sample of resulting articles
  – One or two examples of each technique

CTT strategies

• Bengtssen et al (2007): dropped items with item:total correlations >0.80 or with >5% missing
  – Standard CTT approaches to limiting an item pool (also commonly see low item:total correlations excluded)
  – Doesn’t use disease status
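
A screen of this kind can be sketched as follows. The thresholds are the ones quoted above, but everything else is an assumption made for illustration: the function names, the use of the corrected item:total correlation (item against the sum of the other items), the restriction to complete rows, and the toy response matrix.

```python
from math import sqrt


def pearson(x, y):
    """Plain Pearson correlation for two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / sqrt(sxx * syy)


def screen_items(responses, max_r=0.80, max_missing=0.05):
    """Flag items by column index; None marks a missing response."""
    complete = [row for row in responses if None not in row]
    flagged = {}
    for j in range(len(responses[0])):
        missing = sum(1 for row in responses if row[j] is None) / len(responses)
        item = [row[j] for row in complete]
        rest = [sum(row) - row[j] for row in complete]  # total excluding this item
        reasons = []
        if pearson(item, rest) > max_r:
            reasons.append("item:total correlation too high")
        if missing > max_missing:
            reasons.append("too much missing data")
        if reasons:
            flagged[j] = reasons
    return flagged


# Toy data: six people, four dichotomous items, one missing response.
responses = [
    [1, 1, 0, 1],
    [0, 0, 0, 1],
    [1, 1, 1, None],
    [0, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
]
flagged = screen_items(responses)
print(flagged)
```

Nothing in the function looks at disease status, so it inherits the limitation noted on the slide.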

Brute force strategies

• Christensen et al. (2007) looked at all pairs of tests for each subdomain and compared them based on alpha and correlation with the subdomain (Psychological Assessment; WAIS-3 SF)
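
An exhaustive pairwise search of this kind is easy to sketch. The code below is not the published procedure: it assumes the subdomain score is the plain sum of its tests, uses Cronbach's alpha for each pair, and ranks pairs by the correlation of the pair total with the full subdomain total. The test names and scores are invented.

```python
from itertools import combinations
from math import sqrt
from statistics import variance


def pearson(x, y):
    """Plain Pearson correlation for two equal-length numeric lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def cronbach_alpha(columns):
    """Cronbach's alpha for a list of score columns (one list per test)."""
    k = len(columns)
    totals = [sum(vals) for vals in zip(*columns)]
    item_variance = sum(variance(col) for col in columns)
    return k / (k - 1) * (1 - item_variance / variance(totals))


def rank_pairs(tests):
    """Score every pair of tests; best correlation with the full total first."""
    full_total = [sum(vals) for vals in zip(*tests.values())]
    results = []
    for a, b in combinations(tests, 2):
        pair_total = [x + y for x, y in zip(tests[a], tests[b])]
        results.append((a, b, cronbach_alpha([tests[a], tests[b]]),
                        pearson(pair_total, full_total)))
    return sorted(results, key=lambda r: r[3], reverse=True)


# Invented scores for six people on four tests in one subdomain.
tests = {
    "test_a": [10, 12, 9, 14, 11, 13],
    "test_b": [9, 13, 8, 15, 10, 12],
    "test_c": [11, 11, 10, 13, 12, 14],
    "test_d": [8, 12, 9, 13, 11, 11],
}
ranked = rank_pairs(tests)
for a, b, alpha, r in ranked:
    print(f"{a} + {b}: alpha={alpha:.2f}, r with full total={r:.2f}")
```

Because each pair total is part of the full total, these correlations are inflated by overlap; that is worth keeping in mind when comparing short forms this way.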

Regression strategies

• Regress on total score (for evaluative tests) or use logistic regression approaches
  – Sapin et al (2004) used linear regression to predict a longer measure; nice series of validation steps including an independent validation sample
  – Eberhard-Gran et al (2007) used stepwise linear regression; no external validation
• Problems: overfitting, ignores collinearity of items
  – Need a 2nd confirmatory sample and/or some bootstrapping approach for model optimism
  – Also need a modeling strategy: Forwards/backwards stepwise? Best subsets?
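
The bootstrap idea for model optimism can be illustrated with a deliberately small case: a single short-form score predicting the long-form total by simple linear regression. Each bootstrap model is scored on its own resample and on the original data, and the average gap is subtracted from the apparent R². The function names and data are hypothetical. In a real item-selection exercise the whole selection procedure (e.g. the stepwise search) would have to be repeated inside each resample, which this sketch does not do.

```python
import random


def fit_line(x, y):
    """Least-squares slope and intercept for one predictor."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def r_squared(x, y, slope, intercept):
    """R-squared of a given line evaluated on the data (x, y)."""
    my = sum(y) / len(y)
    sse = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    sst = sum((b - my) ** 2 for b in y)
    return 1 - sse / sst


def optimism_corrected_r2(x, y, n_boot=200, seed=0):
    """Apparent R-squared minus the average bootstrap optimism."""
    rng = random.Random(seed)
    apparent = r_squared(x, y, *fit_line(x, y))
    n, optimism, used = len(x), 0.0, 0
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        bx, by = [x[i] for i in idx], [y[i] for i in idx]
        if len(set(bx)) < 2 or len(set(by)) < 2:
            continue  # degenerate resample: no line or no variance to explain
        slope, intercept = fit_line(bx, by)
        optimism += r_squared(bx, by, slope, intercept) - r_squared(x, y, slope, intercept)
        used += 1
    if used == 0:
        raise ValueError("no usable bootstrap resamples")
    return apparent - optimism / used


# Hypothetical short-form and long-form totals for ten people.
short_form = [3, 5, 4, 7, 8, 6, 9, 10, 12, 11]
long_form = [10, 14, 13, 20, 22, 17, 25, 27, 33, 30]
print(round(optimism_corrected_r2(short_form, long_form), 3))
```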

EFA strategies

• Rosen et al. (2007) looked at loadings from EFA and chose items with the highest loadings
  – No use of external (disease status) information
  – Highest loadings (analogous to highest item discriminations) have nothing to do with item difficulty; may well end up selecting highly discriminating items with no relevance to disease/no disease

CFA strategies

• Bunn et al. (2007) used MPLUS: CFA on a new sample, modified paths and eliminated items to improve fit statistics
  – No independent sample confirmation
  – No disease status reference

IRT strategies

• Gill et al. (2007) – Bayesian IRT but I can’t figure out how they reduced their scale
• Beevers et al. (2007) used nonparametric IRT – single sample. If the items looked bad they threw them out. Psych Assessment
  – Both of these papers relied only on IRT parameters to reduce the scale, not anything external (e.g. disease status)

Combining IRT with external information

• Combine item characteristic / information curves with some indicator of disease status
  – Paul’s old idea: ROC curves, identify region of interest, determine items with maximal information in that region
  – Rich’s new and simpler (and thus likely better) idea: box plots for diseased and non-diseased individuals superimposed on the ICCs/IICs
• Takes advantage of the fact that item difficulty and person ability are on the same scale
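
The "maximal information in a region of interest" idea can be sketched with the two-parameter logistic model, whose item information at theta is (Da)² P(1 − P). The code integrates that information over a theta interval and ranks items by the result. The 2PL form, the scaling constant D = 1.7, the region bounds on the theta scale, and the item parameters are all assumptions made for illustration.

```python
from math import exp


def item_information(theta, a, b, D=1.7):
    """2PL item information at a given theta."""
    p = 1 / (1 + exp(-D * a * (theta - b)))
    return (D * a) ** 2 * p * (1 - p)


def bounded_information(a, b, lo, hi, steps=200):
    """Trapezoid-rule integral of item information between lo and hi."""
    width = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        t0 = lo + i * width
        total += (item_information(t0, a, b) + item_information(t0 + width, a, b)) / 2 * width
    return total


def rank_items(items, lo, hi):
    """Item names ordered from most to least information inside [lo, hi]."""
    return sorted(items, key=lambda name: bounded_information(*items[name], lo, hi),
                  reverse=True)


# Hypothetical (discrimination, difficulty) pairs; region of interest around theta = 0.
items = {"easy": (1.0, -2.0), "target": (1.0, 0.0), "hard": (1.0, 2.0)}
ranking = rank_items(items, lo=-0.5, hi=0.5)
print(ranking)
```

Because difficulty and ability share a scale, an item contributes most where theta is near its difficulty, so items located inside the region separating diseased from non-diseased people rise to the top.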

Rich’s idea

[Figure: two panels of item curves, item mean (vertical axis) against Theta (24 item) from -3.5 to 3.5. First panel items: crying, elablity, sadness, fear, anxiety, swearing, anger, irritabi, combativ, impatien, restless, incspeed, grasping, apathy, lethargy. Second panel adds: slowness, consciou, staring, startled, sleepyda.]

Paul’s idea

[Figure: "Bounded Information between 37 and 53"; both axes run from 0 to 40.]

Extensions of IRT / external information approaches

• Targeted creation / addition of new items in particular regions of the theta scale seems like a reasonable strategy
• We have only talked about fixed forms – CAT is a reasonable extension
  – CAT likely more relevant for evaluative tests
  – Could terminate early if results became clear – reduced burden for those not close to the threshold

Other strategies

• Decision trees
  – A bit like PCA
  – Based entirely on relationships with disease
• Random forests
  – Machine learning technique; extension of decision trees
  – Microarray and GWA applications
  – Jonathan Gruhl – expertise obtained since he first heard of this topic on Monday

General comments

• Literature is pretty wide open
• Seems like IRT provides some useful tools
• IRT wedded to distribution of scores of diseased / non-diseased individuals seems like a good strategy
• Machine learning tools are interesting
  – Ignore covariation between items, the theory-item link
• Hope to compare/contrast strategies with CSHA data

V. Testlet response theory

Rationale

• IRT posits unidimensionality: a single underlying latent trait (domain) explains observed covariation between items
• Various tools to address this assumption
  – Literature essentially always concludes that scales are sufficiently unidimensional to do what the investigator wanted to do in the first place
• See JS Lai, D Cella, P Crane, “Factor analysis techniques for assessing sufficient unidimensionality of cancer related fatigue.” Qual Life Res 2006; 15: 1179-90.

Dimensionality of cognitive screening tests

• Initial MMSE and 3MS papers do not mention different cognitive domains
• Initial CASI paper (also by Evelyn Teng) describes 9 domains:
  – long-term memory
  – orientation
  – attention
  – concentration
  – short-term memory
  – language
  – visual construction
  – fluency
  – abstraction and judgment

Single factor (IRT) model

[Figure: path diagram with a single factor F and items i1 through i8.]

Bifactor (testlet) model

[Figure: path diagram with factors F1, F2, and F3 and items i1 through i8.]

Bradlow, Wainer, Wang (1999) model

• $$p(Y=1 \mid \theta,\gamma,a,b) = \frac{e^{Da(\theta-b-\gamma)}}{1+e^{Da(\theta-b-\gamma)}}$$
  – Primary ability θ
  – Secondary (subdomain) ability γ
  – Single difficulty parameter b
  – Single discrimination parameter a

Li, Bolt, Fu (2006)

• $$p(Y=1 \mid \theta,a_1,a_2,b,\gamma) = \frac{e^{Da_1(\theta-b)-Da_2\gamma}}{1+e^{Da_1(\theta-b)-Da_2\gamma}}$$
  – Same as before except relax the constraint that the slopes on the primary and secondary domains are the same (a1, a2)
• Compared this model with the constrained Bradlow et al. model
  – The more flexible model fit the data better
  – Yanmei Li wins 2005 Brenda H. Loyd Award for outstanding dissertation work in field of educational measurement from NCME
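
The relationship between the two models can be made concrete in code. The sketch below writes the general model with logit D·a1·(θ − b) − D·a2·γ and obtains the constrained model by setting a1 = a2. The sign attached to γ and the value D = 1.7 are conventions assumed here; the function names and the parameter values in the example are hypothetical.

```python
from math import exp


def p_general(theta, gamma, a1, a2, b, D=1.7):
    """Response probability with separate slopes on primary and secondary abilities."""
    z = D * a1 * (theta - b) - D * a2 * gamma
    return 1 / (1 + exp(-z))  # algebraically equal to e^z / (1 + e^z)


def p_constrained(theta, gamma, a, b, D=1.7):
    """Single-slope testlet model: the general model with a1 == a2 == a."""
    return p_general(theta, gamma, a, a, b, D)


# Hypothetical person and item: average primary ability, nonzero testlet effect.
print(round(p_constrained(theta=0.0, gamma=0.5, a=1.0, b=0.0), 3))
print(round(p_general(theta=0.0, gamma=0.5, a1=1.0, a2=0.3, b=0.0), 3))
```

Setting a2 to zero removes the secondary ability altogether and leaves an ordinary single-factor 2PL item.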

Why worry about the second dimension?

• Ignoring the second dimension results in over-estimating amount of information on primary domain
• Success on memory questions may be more highly related to success on other memory questions than success on questions from other cognitive domains
  – Theory suggests a testlet model rather than a single factor model
• Don’t know how big the effect is unless we measure it

Secondary domains in the 3MS

Table 2. Bi-factor results for the 3MS

Item          1° loading   2° loading   Name
Birth year    0.95         0.44         LTM
Birth date    0.83         0.57         LTM
Birth month   0.74         0.52         LTM
Birth state   1.22         0.29         LTM
Birth town    1.39         0.32         LTM
Date          0.82         0.60         ORI
Month         0.99         0.57         ORI
Year          1.16         0.51         ORI
Season        0.90         0.26         ORI
Week          0.94         0.40         ORI
State         1.35         0.09         ORI
Country       1.15         0.15         ORI
City          1.28         0.21         ORI
Hospital      1.15         0.14         ORI

Secondary domains, continued

Item          1° loading   2° loading   Name
Shirt         0.75         0.43         ATT
Brown         1.07         0.58         ATT
Honesty       1.11         0.51         ATT
Repeat        1.07         0.13         ATT
No ifs        0.76         0.47         ATT
Ands          0.88         0.64         ATT
Or buts       0.90         0.60         ATT
Count         1.28         n/a          CON
Spell         0.89         n/a          CON
Shirt1        0.72         0.55         STM
Brown1        0.87         0.60         STM
Honesty1      0.76         0.50         STM
Shirt2        0.79         0.62         STM
Brown2        0.91         0.65         STM
Honesty2      0.92         0.51         STM

Secondary domains, slide 3

Item          1° loading   2° loading   Name
Forehead      0.95         0.52         LAN
Chin          1.22         0.50         LAN
Shoulder      1.11         0.47         LAN
Elbow         0.89         0.54         LAN
Knuckle       0.92         0.36         LAN
Obey          1.09         0.30         LAN
Write would   1.02         0.48         LAN
Write like    1.01         0.51         LAN
Write to      1.03         0.56         LAN
Write go      1.01         0.58         LAN
Write out     1.05         0.57         LAN
Left hand     0.56         0.24         LAN
Fold          1.00         0.27         LAN
Return        0.53         0.28         LAN

Secondary domains, slide 4

Item          1° loading   2° loading   Name
L pentagon    0.88         0.56         VC
R pentagon    0.91         0.62         VC
Intersect     0.98         0.54         VC
Animals       1.27         n/a          FLU
Arm/leg       0.74         0.40         ABS
Laugh/cry     0.85         0.62         ABS
Eat/sleep     0.72         0.48         ABS

Summary of 3MS dimensionality data from CHS

• Loadings on primary factor are all high
  – McDonald (1999) cites 0.30 as cutpoint for salient loadings
  – “Sufficiently unidimensional” for IRT in MS
    • Original rationale for this analysis
• Loadings on the secondary factors are non-ignorable in many cases
• Fit for bi-factor model was better than fit for single factor model
  – Still a CFA model; not necessarily just relaxing constraints
• Note: FH 2004 data analyses

Options for fitting these models

• MPLUS (Rich’s favorite)
• WinBUGS using Bayesian approaches to IRT
  – Yanmei Li’s code is on the web site
  – WinBUGS is on my laptop (freeware anyway)
  – I’m so not an expert
• Quite possibly coming soon to an R01 near me – stay tuned for future developments!