PTP 560
• Research Methods
• Week 3
Thomas Ruediger, PT
Reliability
• Observed score = True score ± Error: X = T ± E
• Consistent
  – Score
  – Performance
• The true score (T) is free from error
• Measurement error (E)
  – Hypothetically it could be zero
  – Practically, it is…
    • Systematic: tied to the instrument used (e.g. scales, stadiometer)
    • Random: the measurement is done differently for no particular reason
    • Or both
Types of Measurement Error
• Systematic
  – Biased: always there
  – Consistent: present whenever the same instrument is used
  – Often more of a validity concern, but affects reliability
  – Examples?
• Random
  – Unpredictable factors
  – As likely to be high as low
  – Examples?
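The difference between the two error types can be sketched with a small simulation of X = T + E. This is only an illustration: the true height, the 2 cm bias, and the 1.5 cm noise level are made-up values, and modelling random error as Gaussian noise is an assumption, not something stated on the slide.

```python
import random
import statistics

def simulate_observed(true_score, n, bias=0.0, noise_sd=0.0, seed=0):
    """Return n observed scores X = T + E for one fixed true score T."""
    rng = random.Random(seed)  # seeded so the sketch is repeatable
    # E = constant bias (systematic) + zero-mean Gaussian noise (random)
    return [true_score + bias + rng.gauss(0.0, noise_sd) for _ in range(n)]

true_height_cm = 170.0  # hypothetical true score T

# Systematic error only: every reading is off by the same 2 cm.
systematic = simulate_observed(true_height_cm, 1000, bias=2.0)
# Random error only: readings scatter around the true score.
random_only = simulate_observed(true_height_cm, 1000, noise_sd=1.5)

mean_sys_error = statistics.mean(systematic) - true_height_cm
mean_rand_error = statistics.mean(random_only) - true_height_cm
sd_sys = statistics.pstdev(systematic)
sd_rand = statistics.pstdev(random_only)

print(f"systematic: mean error {mean_sys_error:.2f}, spread {sd_sys:.2f}")
print(f"random:     mean error {mean_rand_error:.2f}, spread {sd_rand:.2f}")
```

In this sketch the systematic readings are perfectly consistent but wrong on average, while the random readings are right on average but inconsistent, which is why systematic error is more of a validity concern.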
Sources of Measurement Error
• Individual
  – Skill of the person taking the measure
  – Also called rater or tester error
• The instrument: error can be limited by using the same instrument each time
• Lability of the phenomenon (when the variation is not from the instrument or tester)
  – If there is an actual change from measurement to measurement, then a real difference is observed
Regression towards the mean
• Initial extreme high scores
  – Subsequent scores will tend toward the mean
  – Proportional to the amount of error
• Extreme low scores
  – Will also tend toward the mean subsequently
  – Proportional to the amount of error
• “Bell Shaped”
• Research repercussion
  – Group assignments based on scores
  – Intervention effect may be masked
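Regression toward the mean can be seen in a short simulation. The sketch below assumes a hypothetical population (true scores with mean 50 and SD 10, independent random error with SD 10 on each test); none of these numbers come from the lecture.

```python
import random
import statistics

rng = random.Random(42)  # seeded so the sketch is repeatable
N = 10_000

# Hypothetical population: stable true scores plus independent random
# error on each of two testing occasions. No intervention is applied.
true_scores = [rng.gauss(50.0, 10.0) for _ in range(N)]
test1 = [t + rng.gauss(0.0, 10.0) for t in true_scores]
test2 = [t + rng.gauss(0.0, 10.0) for t in true_scores]

# Select the people with the most extreme high scores on the first test.
cutoff = sorted(test1)[int(0.9 * N)]
top = [i for i in range(N) if test1[i] >= cutoff]

top_first = statistics.mean(test1[i] for i in top)
top_retest = statistics.mean(test2[i] for i in top)
grand_mean = statistics.mean(test1)

print(f"grand mean:              {grand_mean:.1f}")
print(f"top group, first test:   {top_first:.1f}")
print(f"top group, retest:       {top_retest:.1f}")
```

The top group scores lower on retest even though nothing was done to them. If group assignment had been based on the first test, that drift could be mistaken for, or could mask, an intervention effect.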
Reliability Coefficients
• True score variance / Total variance
• Can range from 0 to 1
  – By convention 0.00 to 1.00
  – 0.00 = no reliability
  – 1.00 = perfect reliability
• Portney and Watkins Guidelines *TESTABLE
  – Less than 0.50 = poor reliability
  – 0.50 to 0.75 = moderate reliability
  – 0.75 to 1.00 = good reliability
  – These are NOT standards
  – Acceptable level should be based on application
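As a rough sketch, the ratio and the guideline bands can be written as two small functions. The function names are hypothetical, total variance is taken as true variance plus error variance, and treating exactly 0.75 as "good" is an assumption, since the bands on the slide share that boundary.

```python
def reliability_coefficient(true_variance, error_variance):
    """Reliability = true score variance / total variance."""
    total_variance = true_variance + error_variance
    return true_variance / total_variance

def describe_reliability(coefficient):
    """Label a coefficient using the Portney and Watkins guideline bands."""
    if not 0.0 <= coefficient <= 1.0:
        raise ValueError("coefficient must be between 0.00 and 1.00")
    if coefficient < 0.50:
        return "poor"
    if coefficient < 0.75:
        return "moderate"
    return "good"  # assumption: 0.75 itself is counted as good

# Hypothetical variances: 8 units of true variance, 2 units of error variance.
r = reliability_coefficient(true_variance=8.0, error_variance=2.0)
print(r, describe_reliability(r))
```

The labels are guidelines, not standards, so a real decision would still weigh the acceptable level against the application.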
Correlation v Agreement
• Correlation: degree of association
  – Is X correlated/associated with Y?
• Usually not as clinically important for PT
  – We want to know whether they agree, not just whether they are correlated. We want accuracy to be consistent.
• We generally want to know agreement
  – Between tests
  – Between raters
Correlation v Agreement
In this case both are perfect
Correlation v Agreement
In this case correlation is still perfect, but there is no agreement
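The two cases can be reproduced with made-up numbers. In the sketch below the ratings are hypothetical, and "agreement" is measured simply as the proportion of identical scores, which is one of several possible definitions.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation, computed from its definition."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def exact_agreement(x, y):
    """Proportion of paired scores that are identical."""
    return sum(a == b for a, b in zip(x, y)) / len(x)

rater_a = [10, 20, 30, 40, 50]        # hypothetical measurements
same = list(rater_a)                  # second rater records identical values
shifted = [v + 5 for v in rater_a]    # second rater always reads 5 units high

print(pearson_r(rater_a, same), exact_agreement(rater_a, same))
print(pearson_r(rater_a, shifted), exact_agreement(rater_a, shifted))
```

A constant offset leaves the correlation untouched but removes all agreement, which is why a correlation coefficient alone can overstate how interchangeable two raters or tests are.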
Reliability
• Required to have validity
  – A measure must be reliable to be valid
  – But a measure does not have to be valid to be reliable
• Four general approaches
  – Test-Retest
    • (Nominal data) Kappa statistic (percent agreement corrected for chance)
      – Good vs. No Good
    • (Ordinal data) Spearman rho
    • (Interval or ratio data) Pearson product-moment
    • ICC (for ordinal, interval, and ratio data)
      – Association and agreement reflected
      – The current preferred index
Reliability
  – Rater reliability
    • ICC should be used
  – Alternate forms
    • Limits of Agreement
  – Internal Consistency (Homogeneity)
    • Usually Cronbach’s alpha
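For the nominal "Good vs. No Good" case listed under test-retest, a kappa statistic can be sketched as below. This is Cohen's kappa for two sets of ratings, kappa = (observed agreement - chance agreement) / (1 - chance agreement); the ratings are invented for illustration and the function name is hypothetical.

```python
from collections import Counter

def cohens_kappa(ratings_1, ratings_2):
    """Cohen's kappa for two equally long lists of nominal ratings."""
    n = len(ratings_1)
    # Observed proportion of exact agreement (percent agreement / 100).
    observed = sum(a == b for a, b in zip(ratings_1, ratings_2)) / n
    # Agreement expected by chance, from each occasion's category proportions.
    counts_1 = Counter(ratings_1)
    counts_2 = Counter(ratings_2)
    categories = set(counts_1) | set(counts_2)
    expected = sum((counts_1[c] / n) * (counts_2[c] / n) for c in categories)
    # Undefined (division by zero) if chance agreement is exactly 1.
    return (observed - expected) / (1 - expected)

# Hypothetical test and retest ratings of the same ten patients.
test = ["good"] * 5 + ["no good"] * 5
retest = ["good"] * 4 + ["no good"] * 5 + ["good"]

percent_agreement = sum(a == b for a, b in zip(test, retest)) / len(test)
kappa = cohens_kappa(test, retest)
print(percent_agreement, round(kappa, 2))
```

Here the raw agreement is 0.80, but half of that would be expected by chance with these category proportions, so kappa is noticeably lower than the raw percent agreement.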
Reliability
• Generalizability
  – Reliability is not “owned” by the instrument
  – May not apply to:
    • Another population
    • Another rater (or group of raters)
    • Different time interval
• Minimum Detectable Difference
  – Or minimum detectable change
  – How much change is needed to say it’s not chance
  – Not the same as MCID
Minimum detectable difference (MDD)?
• Smallest difference that reflects true difference
• The better the reliability, the smaller the MDD
• Different from a statistical difference
• MDD = 1.96 × SEM × √2 (1.96 corresponds to the 95% CI)
• Ask yourself: Difference between 1 and 2?
  – Is it statistically different?
  – Is it clinically different? (Next slide)
Weir JP. Quantifying test-retest reliability using the intraclass correlation coefficient and the SEM. J Strength Cond Res. Feb 2005;19(1):231-240.
Eliasziw M, Young SL, Woodbury MG, Fryday-Field K. Statistical methodology for the concurrent assessment of interrater and intrarater reliability: using goniometric measurements as an example. Phys Ther. Aug 1994;74(8):777-788.
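The MDD formula from the slide, 1.96 × SEM × √2, can be sketched in a few lines. The slide does not say how the SEM is obtained; the sketch assumes the commonly used relation SEM = SD × √(1 − ICC), and the SD and ICC values are hypothetical.

```python
import math

def sem_from_icc(sd, icc):
    """Standard error of measurement.

    Assumption (not on the slide): SEM = SD * sqrt(1 - ICC).
    """
    return sd * math.sqrt(1.0 - icc)

def minimum_detectable_difference(sem, z=1.96):
    """MDD = z * SEM * sqrt(2); z = 1.96 corresponds to 95% confidence."""
    return z * sem * math.sqrt(2.0)

# Hypothetical goniometry data: SD of 5 degrees, ICC of 0.91.
sem = sem_from_icc(sd=5.0, icc=0.91)
mdd = minimum_detectable_difference(sem)

# Same SD with better reliability gives a smaller MDD.
mdd_better = minimum_detectable_difference(sem_from_icc(sd=5.0, icc=0.99))

print(f"SEM = {sem:.2f} degrees, MDD = {mdd:.2f} degrees")
print(f"MDD with ICC 0.99 = {mdd_better:.2f} degrees")
```

A measured change smaller than the MDD cannot be separated from measurement error at that confidence level, and whether a change larger than the MDD matters clinically is a separate question (the MCID).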
Minimum Clinically Important Difference (MCID)?
• Smallest difference considered clinically non-trivial
• Smallest that patient perceives as beneficial
• Usually associated with either:
  – Expert judgment of clinician
  – External health status measure
Validity
• The measurement measures what is intended
• We use measurements to draw inferences in clinical use
  – Due to indirect nature of measuring
  – To apply our result to a diagnostic challenge
  – Ex: Why do we do a manual muscle test?
• Validity
  – Is not something an instrument has
    • Is specific to the intended use
  – Is not required for reliability (i.e. just because a measure is reliable does not mean it is valid)
Validity
• Multiple types
  – Face validity (LEAST rigorous; looks like it should make sense)
  – Content (tests content; e.g. GRE content is a good predictor of passing the licensure exam)
  – Criterion-referenced (to a GOLD or a reference standard)
    • Concurrent validity
    • Predictive validity
  – Construct (Figure 6.2 in P & W helpful here)
    • Part content
    • Part theoretical
    • Multiple ways to assess (I won’t test these!)
Validity of Change
• Change is often how we make clinical decisions
  – Evaluate treatment effect
  – Consider different options
• Validity affected by four issues
  – Level of measurement (ordinal has highest risk)
  – Reliability
    • There will likely be a change due to chance
    • There may be a true change (one suggestion: reliability > 0.50 to use change scores)
  – Stability of variable
  – Baseline scores
    • Floor effect
    • Ceiling effect
                Truth
                +       -
Test    +       a       b       PPV = a/(a+b)
        -       c       d       NPV = d/(c+d)

Sn = a/(a+c)
Sp = d/(b+d)
+ LR = Sn/(1-Sp)
- LR = (1-Sn)/Sp
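The cell formulas above translate directly into a short function. This is a minimal sketch: the function name and the counts in the example are hypothetical, and the cells follow the slide's layout (a = true positives, b = false positives, c = false negatives, d = true negatives).

```python
def diagnostic_stats(a, b, c, d):
    """Summary statistics for a 2x2 table of test result v truth.

    a: test +, truth +    b: test +, truth -
    c: test -, truth +    d: test -, truth -
    """
    sn = a / (a + c)  # sensitivity
    sp = d / (b + d)  # specificity
    return {
        "Sn": sn,
        "Sp": sp,
        "PPV": a / (a + b),
        "NPV": d / (c + d),
        # Undefined (division by zero) when Sp = 1 or Sp = 0 respectively.
        "+LR": sn / (1 - sp),
        "-LR": (1 - sn) / sp,
    }

# Hypothetical counts, chosen only to exercise the formulas.
stats = diagnostic_stats(a=45, b=10, c=5, d=40)
for name, value in stats.items():
    print(f"{name}: {value:.3f}")
```

Sn and Sp are each computed within one truth column, while PPV and NPV are computed within one test row, so the predictive values depend on how many people in the sample truly have the disorder.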
                Truth
                +       -
Test    +       99      b
        -       1       d

Sn = a/(a+c) = ?
Sp = d/(b+d)

In this example we picked 100 people with a known disorder, applied our clinical test and got these results.
                Truth
                +       -
Test    +       a       20
        -       c       80

Sn = a/(a+c)
Sp = d/(b+d) = ?

In this example we picked 100 people known to not have the disorder, applied our clinical test and got these results.
Now a patient comes in
• The history suggests to you that she has the disorder
• You do the clinical test
• The result of the test is negative
• Which is more useful?
  – SpPin? or
  – SnNout?
Another patient comes in
• The history suggests to you that she does not have the disorder
• She is very concerned that she has it
• You do the clinical test
• The result of the test is positive
• Which is more useful?
  – SpPin? or
  – SnNout?
                Truth
                +       -
Test    +       99      20
        -       1       80

+ LR = Sn/(1-Sp) = ?
- LR = (1-Sn)/Sp = ?
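Working the combined table through, with a = 99, b = 20, c = 1, d = 80 taken from the two 100-person examples, gives the following. The arithmetic is shown as a check, not as part of the original slide.

```latex
\begin{align*}
\mathrm{Sn}  &= \frac{a}{a+c} = \frac{99}{99+1} = 0.99 \\
\mathrm{Sp}  &= \frac{d}{b+d} = \frac{80}{20+80} = 0.80 \\
+\mathrm{LR} &= \frac{\mathrm{Sn}}{1-\mathrm{Sp}} = \frac{0.99}{0.20} = 4.95 \\
-\mathrm{LR} &= \frac{1-\mathrm{Sn}}{\mathrm{Sp}} = \frac{0.01}{0.80} = 0.0125
\end{align*}
```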
Likelihood Ratios
• Allow us to quantify the likelihood of a condition (present or absent)
• Importance ↑ as they move away from 1
• An LR of 1 does not change our confidence
• Which number is further away from 1?
  – (look at the nomogram)
• - LR is further away from 1 (this is a logarithmic scale)
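Because the scale is logarithmic, "distance from 1" is better compared with logarithms than by subtraction. The sketch below uses Sn = 0.99 and Sp = 0.80 from the earlier 100-person examples; using log10 as the measure of distance is an assumption made to mirror the nomogram's scale.

```python
import math

def distance_from_one(lr):
    """Distance of a likelihood ratio from 1 on a log10 scale."""
    return abs(math.log10(lr))

sn, sp = 0.99, 0.80  # from the two 100-person examples

positive_lr = sn / (1 - sp)   # about 4.95
negative_lr = (1 - sn) / sp   # about 0.0125

print(f"+LR = {positive_lr:.2f}, log distance = {distance_from_one(positive_lr):.2f}")
print(f"-LR = {negative_lr:.4f}, log distance = {distance_from_one(negative_lr):.2f}")
```

On a linear scale 4.95 looks further from 1 than 0.0125 does, but on the log scale the negative likelihood ratio is the more extreme of the two, so for this test a negative result shifts confidence more than a positive one.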