ptp 560 research methods week 3 thomas ruediger, pt

PTP 560

• Research Methods

• Week 3

Thomas Ruediger, PT

Reliability
• Observed score = True score ± Error: X = T ± E
• Consistent
  – Score
  – Performance
• The true score (T) is free from error
• Measurement error (E)
  – Hypothetically it could be zero
  – Practically, it is…
• Systematic (e.g., scales, stadiometer)
• Random (the measurement varies for no identifiable reason)
• Or both

Types of Measurement Error
• Systematic
  – Biased (always present)
  – Consistent (recurs whenever the same instrument is used)
  – Often more of a validity concern, but affects reliability
  – Examples?
• Random
  – Unpredictable factors
  – As likely to be high as low
  – Examples?
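A minimal Python sketch of the two error types, using the X = T ± E model. The true weight, the +2 kg scale offset, and the 1 kg random-error SD are hypothetical values chosen for illustration. The point is only that systematic error shifts every reading by the same amount, while random error scatters readings high and low around the true score.

```python
import random
import statistics

TRUE_WEIGHT_KG = 70.0       # hypothetical true score (T)
SCALE_OFFSET_KG = 2.0       # hypothetical systematic error: scale always reads 2 kg high
RANDOM_ERROR_SD_KG = 1.0    # hypothetical random error SD: unpredictable, high or low


def weigh(rng, systematic=0.0, random_sd=0.0):
    """Observed score X = T + systematic error + random error."""
    return TRUE_WEIGHT_KG + systematic + rng.gauss(0.0, random_sd)


rng = random.Random(42)  # fixed seed so the sketch is repeatable
systematic_only = [weigh(rng, systematic=SCALE_OFFSET_KG) for _ in range(1000)]
random_only = [weigh(rng, random_sd=RANDOM_ERROR_SD_KG) for _ in range(1000)]

# Systematic: every reading is off by the same amount (biased, but consistent)
print("systematic:", statistics.fmean(systematic_only), statistics.pstdev(systematic_only))
# Random: readings scatter around T, so the mean stays close to the true score
print("random:", statistics.fmean(random_only), statistics.pstdev(random_only))
```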

Sources of Measurement Error
• Individual
  – Skill of the person taking the measure
  – Also called rater or tester error
• The instrument
  – Can be limited by using the same instrument
• Lability of the phenomenon (when the error is not from the instrument or tester)
  – An actual change from measurement to measurement, so a real difference is observed

Regression towards the mean
• Initial extreme high scores
  – Subsequent scores will tend toward the mean
  – Proportional to the amount of error
• Extreme low scores
  – Will also tend toward the mean subsequently
  – Proportional to the amount of error
• “Bell shaped” distribution
• Research repercussion
  – Group assignments based on scores
  – Intervention effect may be masked
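Regression toward the mean can be reproduced with a small simulation. This sketch assumes normally distributed true scores and independent random error on each of two test occasions (all parameters are hypothetical). It selects the top and bottom 10% on the first test only and compares their means on the retest. Setting the error SD to zero removes the effect, consistent with it being proportional to the amount of error.

```python
import random
import statistics


def simulate_regression_to_mean(n=10_000, true_mean=50.0, true_sd=10.0,
                                error_sd=5.0, seed=1):
    """Hypothetical simulation: two tests of the same people, X = T + E each time."""
    rng = random.Random(seed)
    true_scores = [rng.gauss(true_mean, true_sd) for _ in range(n)]
    test1 = [t + rng.gauss(0.0, error_sd) for t in true_scores]
    test2 = [t + rng.gauss(0.0, error_sd) for t in true_scores]

    # Group assignment based on the FIRST observed score only
    order = sorted(range(n), key=lambda i: test1[i])
    cut = n // 10
    low, high = order[:cut], order[-cut:]

    def mean_of(scores, idx):
        return statistics.fmean(scores[i] for i in idx)

    return {
        "high_test1": mean_of(test1, high),
        "high_test2": mean_of(test2, high),
        "low_test1": mean_of(test1, low),
        "low_test2": mean_of(test2, low),
    }


for err in (5.0, 0.0):
    print(f"error_sd={err}:", simulate_regression_to_mean(error_sd=err))
```

With error present, the high group's retest mean falls and the low group's rises even though no intervention occurred, which is how a real intervention effect can be masked.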

Reliability Coefficients
• True score variance / total variance
• Can range from 0 to 1
  – By convention 0.00 to 1.00
  – 0.00 = no reliability
  – 1.00 = perfect reliability
• Portney and Watkins guidelines *TESTABLE
  – Less than 0.50 = poor reliability
  – 0.50 to 0.75 = moderate reliability
  – 0.75 to 1.00 = good reliability
  – These are NOT standards
  – Acceptable level should be based on the application
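The ratio and the guideline bands can be written out directly. In this sketch the variances are hypothetical, total variance is taken as true score variance plus error variance (following X = T ± E with independent error), and a value of exactly 0.75 is assigned to "good" because the listed bands share that boundary; that tie-break is an assumption.

```python
def reliability_coefficient(true_score_var, error_var):
    """True score variance / total variance, with total = true + error variance."""
    return true_score_var / (true_score_var + error_var)


def portney_watkins_label(r):
    """Rough guideline only: these cut-offs are NOT standards."""
    if not 0.0 <= r <= 1.0:
        raise ValueError("reliability coefficients range from 0.00 to 1.00")
    if r < 0.50:
        return "poor"
    if r < 0.75:
        return "moderate"
    return "good"  # assumption: exactly 0.75 counts as good


for true_var, err_var in [(80.0, 20.0), (50.0, 50.0), (30.0, 70.0)]:
    r = reliability_coefficient(true_var, err_var)
    print(f"true={true_var}, error={err_var} -> r={r:.2f} ({portney_watkins_label(r)})")
```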

Correlation v Agreement

• Correlation: degree of association
  – Is X correlated/associated with Y?
• Usually not as clinically important for PT
  – We want to know whether measures agree, not just whether they are correlated; we want accuracy to be consistent
• We generally want to know agreement
  – Between tests
  – Between raters


In this case both correlation and agreement are perfect

In this case correlation is still perfect, but there is no agreement
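The two plotted cases can be reproduced numerically. In this sketch (hypothetical knee-flexion readings, in degrees) rater B either matches rater A exactly or reads 10 degrees higher every time. Pearson r is 1.0 in both cases, while the mean difference between raters exposes the loss of agreement.

```python
import math
import statistics


def pearson_r(x, y):
    """Pearson product-moment correlation: degree of linear association."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mean_difference(x, y):
    """Average of (y - x): zero when the two sets of readings agree on average."""
    return statistics.fmean(b - a for a, b in zip(x, y))


rater_a = [90, 100, 110, 120, 130]            # hypothetical readings, degrees
rater_b_same = list(rater_a)                  # perfect correlation and agreement
rater_b_offset = [v + 10 for v in rater_a]    # perfect correlation, no agreement

for label, rater_b in [("same", rater_b_same), ("offset", rater_b_offset)]:
    print(label, round(pearson_r(rater_a, rater_b), 3), mean_difference(rater_a, rater_b))
```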

Reliability
• Required to have validity
  – A valid measure must be reliable
  – But a measure does not have to be valid to be reliable
• Four general approaches
  – Test-retest
    • (Nominal data) Kappa statistic: percent agreement corrected for chance (e.g., Good vs. No Good)
    • (Ordinal data) Spearman rho
    • (Interval or ratio data) Pearson product-moment correlation
    • ICC (for ordinal, interval, and ratio data)
      – Reflects both association and agreement
      – The current preferred index
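Two of these indices, sketched in Python from their standard formulas. Cohen's kappa compares observed agreement with the agreement expected by chance from each occasion's marginal proportions. For the ICC, this sketch uses one specific form, ICC(2,1) (two-way random effects, absolute agreement, single measure, in Shrout and Fleiss notation); other ICC forms exist and would give different values. All ratings are hypothetical.

```python
import statistics
from collections import Counter


def cohens_kappa(ratings1, ratings2):
    """Cohen's kappa: observed agreement corrected for agreement expected by chance."""
    n = len(ratings1)
    observed = sum(r1 == r2 for r1, r2 in zip(ratings1, ratings2)) / n
    counts1, counts2 = Counter(ratings1), Counter(ratings2)
    categories = counts1.keys() | counts2.keys()
    # Chance agreement from each occasion's marginal proportions
    expected = sum((counts1[k] / n) * (counts2[k] / n) for k in categories)
    return (observed - expected) / (1 - expected)


def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single measure.
    `scores` has one row per subject and one column per rater."""
    n, k = len(scores), len(scores[0])
    grand = statistics.fmean(x for row in scores for x in row)
    row_means = [statistics.fmean(row) for row in scores]
    col_means = [statistics.fmean(row[j] for row in scores) for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


day1 = ["Good"] * 6 + ["No Good"] * 4                   # hypothetical test-retest
day2 = ["Good"] * 5 + ["No Good"] * 4 + ["Good"]
print("kappa:", round(cohens_kappa(day1, day2), 3))

rater_a = [90, 100, 110, 120, 130]                      # hypothetical, degrees
rater_b = [v + 10 for v in rater_a]                     # same ranking, 10 degrees higher
print("ICC(2,1), offset raters:", round(icc_2_1(list(zip(rater_a, rater_b))), 3))
print("ICC(2,1), identical raters:", icc_2_1(list(zip(rater_a, rater_a))))
```

Here percent agreement is 80%, but kappa is about 0.58 once chance agreement is removed. The constant 10-degree offset, which leaves Pearson r at 1.0, pulls ICC(2,1) down to about 0.83 because this form penalizes disagreement as well as poor association.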

Reliability
  – Rater reliability
    • ICC should be used
  – Alternate forms
    • Limits of agreement
  – Internal consistency (homogeneity)
    • Usually Cronbach’s alpha
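A sketch of the other two indices. Limits of agreement are computed here in the Bland-Altman form (mean difference ± 1.96 × SD of the differences), and Cronbach's alpha from item and total-score variances. Both functions and all scores are hypothetical illustrations, not a specific package's API.

```python
import statistics


def limits_of_agreement(form_a, form_b, z=1.96):
    """Bland-Altman limits: mean difference +/- z * SD of the differences."""
    diffs = [b - a for a, b in zip(form_a, form_b)]
    bias = statistics.fmean(diffs)
    sd = statistics.stdev(diffs)
    return bias - z * sd, bias + z * sd


def cronbach_alpha(item_scores):
    """Cronbach's alpha. `item_scores` has one row per respondent, one column per item."""
    k = len(item_scores[0])
    item_vars = [statistics.variance([row[j] for row in item_scores]) for j in range(k)]
    total_var = statistics.variance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


form_a = [10, 12, 14, 16]   # hypothetical scores on two alternate forms
form_b = [11, 13, 15, 18]
print("95% limits of agreement:", limits_of_agreement(form_a, form_b))

survey = [[3, 4, 3], [4, 5, 4], [2, 2, 3], [5, 5, 4], [3, 3, 3]]  # hypothetical 3-item scale
print("Cronbach's alpha:", round(cronbach_alpha(survey), 3))
```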

Reliability

• Generalizability
  – Reliability is not “owned” by the instrument
  – May not apply to:
    • Another population
    • Another rater (or group of raters)
    • A different time interval
• Minimum detectable difference
  – Or minimum detectable change
  – How much change is needed to say it is not due to chance
  – Not the same as MCID

Minimum detectable difference (MDD)?

• Smallest difference that reflects a true difference
• The better the reliability, the smaller the MDD
• Different from statistical difference
• MDD = 1.96 × SEM × √2 (1.96 corresponds to the 95% CI)
• Ask yourself: difference between 1 and 2?
  – Is it statistically different?
  – Is it clinically different? (Next slide)

Weir JP. Quantifying test-retest reliability using the intraclass correlation coefficient and the SEM. J Strength Cond Res. Feb 2005;19(1):231-240.

Eliasziw M, Young SL, Woodbury MG, Fryday-Field K. Statistical methodology for the concurrent assessment of interrater and intrarater reliability: using goniometric measurements as an example. Phys Ther. Aug 1994;74(8):777-788.
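The MDD formula above needs an SEM. This sketch assumes the commonly used estimate SEM = SD × √(1 - ICC) (Weir 2005 discusses this and alternatives) and a hypothetical between-subject SD of 10 degrees; it shows the MDD shrinking as reliability improves.

```python
import math


def sem_from_icc(sd, icc):
    """Standard error of measurement, assuming the common estimate SEM = SD * sqrt(1 - ICC)."""
    return sd * math.sqrt(1.0 - icc)


def mdd_95(sem):
    """Minimum detectable difference at 95% confidence: 1.96 * SEM * sqrt(2).
    The sqrt(2) reflects measurement error on both occasions of a change score."""
    return 1.96 * sem * math.sqrt(2.0)


SD_DEGREES = 10.0  # hypothetical between-subject SD
for icc in (0.95, 0.80, 0.60):
    sem = sem_from_icc(SD_DEGREES, icc)
    print(f"ICC={icc:.2f}  SEM={sem:.2f}  MDD95={mdd_95(sem):.2f} degrees")
```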

Minimum Clinically Important Difference (MCID)?

• Smallest difference considered clinically non-trivial
• Smallest difference that the patient perceives as beneficial
• Usually associated with either:
  – Expert judgment of the clinician
  – An external health status measure
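Putting MDD and MCID together: a hypothetical helper that first asks whether an observed change exceeds measurement error and then whether it is large enough to matter clinically. The 8.3- and 10-degree thresholds are made-up values, not published figures for any test.

```python
def interpret_change(change, mdd, mcid):
    """Hypothetical helper: check the MDD first (is it real?), then the MCID (does it matter?)."""
    size = abs(change)
    if size < mdd:
        return "could be measurement error"
    if size < mcid:
        return "real change, but possibly not clinically important"
    return "real and clinically important change"


MDD, MCID = 8.3, 10.0   # hypothetical thresholds, in degrees
for change in (5.0, 9.0, 15.0):
    print(change, interpret_change(change, MDD, MCID))
```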

Validity
• The measurement measures what it is intended to measure
• We use measurements to draw inferences in clinical use
  – Due to the indirect nature of measuring
  – To apply our result to a diagnostic challenge
  – Ex: Why do we do a manual muscle test?
• Validity
  – Is not something an instrument has
    • Is specific to the intended use
  – Is not required for reliability (i.e., just because a measure is reliable does not mean it is valid)

Validity
• Multiple types
  – Face validity (LEAST rigorous; looks like it should make sense)
  – Content (tests content; GRE content is a good predictor of passing the licensure exam)
  – Criterion-referenced (to a gold or reference standard)
    • Concurrent validity
    • Predictive validity
  – Construct (Figure 6.2 in P&W helpful here)
    • Part content
    • Part theoretical
    • Multiple ways to assess (I won’t test these!)

Validity of Change
• Change is often how we make clinical decisions
  – Evaluate treatment effect
  – Consider different options
• Validity is affected by four issues
  – Level of measurement (ordinal has the highest risk)
  – Reliability
    • There will likely be a change due to chance
    • There may be a true change
    • One suggestion: reliability > 0.50 to use change scores
  – Stability of the variable
  – Baseline scores
    • Floor effect
    • Ceiling effect

              Truth +          Truth -
Test +        a                b
Test -        c                d
              Sn = a/(a+c)     Sp = d/(b+d)

PPV = a/(a+b)
NPV = d/(c+d)
+LR = Sn/(1 - Sp)
-LR = (1 - Sn)/Sp
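The formulas in this table translate directly into code. A minimal sketch, assuming cell counts labelled a to d as in the table (a = true positive, b = false positive, c = false negative, d = true negative) and that no row or column total is zero; the counts in the usage line are hypothetical.

```python
import math


def diagnostic_metrics(a, b, c, d):
    """Accuracy statistics from a 2 x 2 table (a = TP, b = FP, c = FN, d = TN)."""
    sn = a / (a + c)   # sensitivity: positive tests among those who truly have the condition
    sp = d / (b + d)   # specificity: negative tests among those who truly do not
    return {
        "Sn": sn,
        "Sp": sp,
        "PPV": a / (a + b),
        "NPV": d / (c + d),
        "+LR": sn / (1 - sp) if sp < 1 else math.inf,
        "-LR": (1 - sn) / sp if sp > 0 else math.inf,
    }


print(diagnostic_metrics(a=40, b=10, c=10, d=40))  # hypothetical counts
```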

              Truth +          Truth -
Test +        99               b
Test -        1                d
              Sn = ?           Sp

Sn = a/(a+c)
Sp = d/(b+d)

In this example we picked 100 people with a known disorder, applied our clinical test, and got these results.

              Truth +          Truth -
Test +        a                20
Test -        c                80
              Sn               Sp = ?

Sn = a/(a+c)
Sp = d/(b+d)

In this example we picked 100 people known not to have the disorder, applied our clinical test, and got these results.

Now a patient comes in

• The history suggests to you that she has the disorder

• You do the clinical test

• The result of the test is negative

• Which is more useful?
  – SpPin, or
  – SnNout?

Another patient comes in

• The history suggests to you that she does not have the disorder

• She is very concerned that she has it

• You do the clinical test

• The result of the test is positive

• Which is more useful?
  – SpPin, or
  – SnNout?

              Truth +          Truth -
Test +        99               20
Test -        1                80

+LR = Sn/(1 - Sp) = ?
-LR = (1 - Sn)/Sp = ?
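Working the combined table through the 2 × 2 formulas (a = 99, b = 20, c = 1, d = 80) fills in the blanks: Sn = 0.99, Sp = 0.80, +LR ≈ 4.95, and -LR ≈ 0.0125. A minimal sketch of that arithmetic:

```python
# Combined table: 100 people with the disorder, 100 without
a, b, c, d = 99, 20, 1, 80

sn = a / (a + c)          # 99 / 100
sp = d / (b + d)          # 80 / 100
pos_lr = sn / (1 - sp)    # 0.99 / 0.20
neg_lr = (1 - sn) / sp    # 0.01 / 0.80

print(f"Sn = {sn:.2f}, Sp = {sp:.2f}")
print(f"+LR = {pos_lr:.2f}, -LR = {neg_lr:.4f}")
```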

Likelihood Ratios
• Allow us to quantify the likelihood of a condition (present or absent)
• Importance ↑ as they move away from 1
• An LR of 1 does not change our confidence
• Which number is further away from 1?
  – (Look at the nomogram)
• -LR is further away from 1 (this is a logarithmic scale)
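The nomogram is a graphical shortcut for odds arithmetic: post-test odds = pre-test odds × LR. This sketch uses the LRs from the 99/20/1/80 table and a hypothetical 70% pretest probability (like the first patient, whose history suggests the disorder). A negative result drops the estimate to roughly 3%, the SnNout situation, and the log distances confirm that the -LR sits further from 1 than the +LR.

```python
import math


def post_test_probability(pretest_prob, lr):
    """Probability -> odds, multiply by the LR, odds -> probability."""
    pre_odds = pretest_prob / (1 - pretest_prob)
    post_odds = pre_odds * lr
    return post_odds / (1 + post_odds)


POS_LR, NEG_LR = 4.95, 0.0125   # from the 99/20/1/80 table
PRETEST = 0.70                  # hypothetical pretest probability from the history

print(f"negative test: {post_test_probability(PRETEST, NEG_LR):.1%}")
print(f"positive test: {post_test_probability(PRETEST, POS_LR):.1%}")
print(f"LR = 1 leaves it unchanged: {post_test_probability(PRETEST, 1.0):.1%}")

# On a log scale an LR of 1 sits at 0, so distance from 1 is |ln(LR)|
print(f"|ln(+LR)| = {abs(math.log(POS_LR)):.2f}, |ln(-LR)| = {abs(math.log(NEG_LR)):.2f}")
```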
