data collection

EDWARD JAMES R GORGON, MPhysio, BCHPEd, PTRP

Department of Physical Therapy, College of Allied Medical Professions, University of the Philippines Manila

Email: [email protected]

Methods, tools, and issues

Learning objectives

Define reliability

Discuss potential sources of measurement error

Explain the types of reliability

Explain concepts in measurement reliability

Define validity

Explain the types of validity

Explain the concepts of sensitivity and specificity

Part One

Reliability and validity

Measurement reliability

Degree of consistency or agreement between repeated measurements taken when the underlying phenomenon has not changed

Reproducibility and repeatability of an instrument or procedure in measurement

Error = Variation without true change

Repeatability = Reproducibility


Potential sources of measurement error

Rater

Patient / subject

Equipment

Procedure


Error related to the RATER

Competence / skill

Preparation

Motivation / interest

Fatigue


Error related to the PATIENT / SUBJECT

Comprehension

Familiarization

Environment

Pain

Fatigue


Error related to the PATIENT / SUBJECT

Recovery / deterioration

Hawthorne effect


Error related to the EQUIPMENT

Operation

Maintenance

Calibration

Sensitivity


Error related to the PROCEDURE

Positioning

Handling

Stabilization

Instructions


Types of reliability

Internal consistency

Test-retest

Intra-rater

Inter-rater

[Figure: Reliability plot, Rater 1 vs. Rater 2]

[Figure: Reliability plot, Measurement 1 vs. Measurement 2]

Internal consistency

Degree of homogeneity of the test items within an instrument with respect to the attribute being measured

Measured at one point in time

Usually assessed using Cronbach’s alpha (α)
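
As a minimal sketch of how Cronbach's alpha can be computed from item-level data, the snippet below applies the standard formula alpha = [k / (k - 1)] x [1 - (sum of item variances / variance of total scores)], where k is the number of items. The `responses` data are hypothetical and are not taken from any study cited in these slides.

```python
from statistics import variance

def cronbach_alpha(scores):
    """Cronbach's alpha for a table with one row per respondent
    and one column per test item."""
    k = len(scores[0])                      # number of items
    items = list(zip(*scores))              # transpose: one tuple per item
    item_var_sum = sum(variance(col) for col in items)
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - item_var_sum / total_var)

# Hypothetical data: 5 respondents x 4 items, each scored 1-5
responses = [
    [3, 4, 3, 4],
    [2, 2, 3, 2],
    [4, 5, 4, 5],
    [1, 2, 1, 2],
    [3, 3, 4, 3],
]

alpha = cronbach_alpha(responses)
print(f"Cronbach's alpha = {alpha:.2f}")
```

All items are scored at a single point in time, which is why internal consistency needs only one administration of the instrument.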

Test-retest reliability

Degree to which an instrument is stable, based on repeated (at least 2) measurements on different occasions

Constant test conditions, including subjects and rater(s), on both occasions

Not possible to assess if the variable is labile

Test-retest reliability

Barthel Index, BADL (Sackley et al, 2006)

         WEEK 1   WEEK 2
SUBJ1    10       11
SUBJ2    10       10
SUBJ3    11       12
SUBJ4    13       13
SUBJ5    9        11
SUBJ6    11       12
SUBJ7    12       11
SUBJ8    10       9

Intra-rater reliability

Stability of data recorded by 1 rater across 2 or more trials done within 1 occasion of measurement

Constant test conditions, including subjects, across trials

Intra-rater reliability

Goniometry, knee flexion (Lin, 2003)

         TRIAL 1 (deg)   TRIAL 2 (deg)
SUBJ1    76              75
SUBJ2    90              87
SUBJ3    84              82
SUBJ4    83              85
SUBJ5    79              78
SUBJ6    87              86
SUBJ7    80              82
SUBJ8    77              79

Inter-rater reliability

Variation between 2 or more raters who measure the same group of subjects at least once each

Constant test conditions, including subjects

Potential bias from differences in raters’ training and experience levels

Inter-rater reliability

Peabody, language skills (van Kleeck et al., 2006)

         RATER 1   RATER 2
SUBJ1    45        69
SUBJ2    99        81
SUBJ3    84        75
SUBJ4    80        74
SUBJ5    79        72
SUBJ6    81        85
SUBJ7    60        82
SUBJ8    76        87

Reliability coefficient

Formula:

Reliability coefficient = true score variance / (true score variance + error variance)
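
The ratio can be illustrated with a short worked example. The variance values below are hypothetical; in practice, true score variance cannot be observed directly and has to be estimated from repeated measurements.

```python
def reliability_coefficient(true_variance, error_variance):
    """Proportion of total observed variance attributable to true scores."""
    return true_variance / (true_variance + error_variance)

# Hypothetical values: true score variance = 8, error variance = 2
r_xx = reliability_coefficient(8, 2)
print(f"Reliability coefficient = {r_xx:.2f}")

# With no measurement error the coefficient reaches its maximum of 1
print(reliability_coefficient(8, 0))
```

As error variance grows relative to true score variance, the coefficient falls toward 0.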

Kappa (κ)

Represents the rate of agreement, corrected for agreement expected by chance, across an entire set of categorical (e.g., yes/no) responses

Appropriate when data are nominal-level or ordinal-level

Typically varies from 0 – 1; negative values indicate agreement worse than chance (no units associated)
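
A minimal sketch of the calculation for two raters, assuming the Cohen's kappa form: kappa = (observed agreement - chance agreement) / (1 - chance agreement). The yes/no ratings below are hypothetical, and the slides do not specify which kappa variant is intended.

```python
from collections import Counter

def cohen_kappa(rater1, rater2):
    """Chance-corrected agreement between two raters on categorical data."""
    n = len(rater1)
    # Proportion of subjects on whom the raters gave the same rating
    observed = sum(a == b for a, b in zip(rater1, rater2)) / n
    # Agreement expected by chance, from each rater's category frequencies
    c1, c2 = Counter(rater1), Counter(rater2)
    categories = set(rater1) | set(rater2)
    expected = sum(c1[cat] * c2[cat] for cat in categories) / n ** 2
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1 - expected)

# Hypothetical yes/no ratings of 10 subjects by two raters
rater_a = ["yes", "yes", "no", "yes", "no", "no", "yes", "no", "yes", "yes"]
rater_b = ["yes", "no", "no", "yes", "no", "yes", "yes", "no", "yes", "yes"]

print(f"kappa = {cohen_kappa(rater_a, rater_b):.2f}")
```

Raw percentage agreement would overstate consistency here, because two raters answering yes/no will agree on some subjects by chance alone.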

Coefficient of variation (CoV)

Formula:

CoV = (Standard deviation / Mean) x 100%

Coefficient of variation (CoV)

The standard deviation expressed as a percentage of the mean

Useful when comparing variability in different groups

Appropriate when data are interval-level or ratio-level
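
As a sketch, the CoV can be computed for each trial of the knee flexion goniometry data shown in the intra-rater example (Lin, 2003). The use of the sample standard deviation (n - 1 denominator) is an assumption; the slides do not say which form is intended.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """Sample standard deviation expressed as a percentage of the mean."""
    return stdev(values) / mean(values) * 100

# Knee flexion (deg), 8 subjects, from the intra-rater example
trial_1 = [76, 90, 84, 83, 79, 87, 80, 77]
trial_2 = [75, 87, 82, 85, 78, 86, 82, 79]

print(f"CoV, trial 1 = {coefficient_of_variation(trial_1):.1f}%")
print(f"CoV, trial 2 = {coefficient_of_variation(trial_2):.1f}%")
```

Because the result is unit-free, it allows variability to be compared between groups measured on different scales.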

Intraclass correlation coefficient (ICC)

Ratio of between-person variance to total variance (between persons + within persons)

Reflects both the degree of correspondence and agreement among ratings

Varies from 0 – 1 (no units associated)

Interpreting reliability estimates

“Rule of thumb”

> 0.80 = Excellent

0.60 – 0.79 = Adequate

< 0.60 = Poor

HOWEVER, estimates are population-specific and use may be context-specific

Choosing reliable outcome measures

Rigor of standardization studies for reliability

Excellent: More than 2 well-designed reliability studies completed with adequate to excellent reliability values

Adequate: 1-2 well-designed reliability studies with adequate to excellent reliability values

Poor: Reliability studies poorly completed, or reliability studies showing poor levels of reliability

No evidence available

Measurement validity

Extent to which an instrument measures what it is supposed to measure

= TRUENESS OF A MEASURE

Validity implies that a measurement is relatively free from error, i.e., a valid test is also reliable

Validity allows generalizations beyond a specific score


Emphasis is placed on the objectives of a test and the ability to make inferences from test scores or measurements

Validity is specific: it is evaluated within the context of the test’s intended use and a specific population


How can we say that inferences from a test are valid?

Instrument output is related and proportional to the actual variable of interest

Values assigned to the variable are representative of the response

Types of validity

Face validity

Content validity

Criterion-related validity

Construct validity

Face validity

The extent to which an instrument appears to test what it is supposed to test

Determined by a non-rigorous process – ALL OR NONE

Insufficient for the overall validity of a test

Content validity

The extent to which the items in an instrument address and sample the relevant aspects of the concept / variable being measured or assessed

Content validity

Important characteristic of questionnaires, examinations, and interviews

Demands that a test is not influenced by factors irrelevant to the purpose of measurement

Criterion validity

The extent to which an instrument agrees with an external criterion measurement (a “gold standard”) of that concept

Ergo, outcomes of the instrument can be used as a substitute measure for the gold standard

If the correlation between the target test and criterion is high, the test is a valid predictor of the criterion score


Criterion must be reliable and relevant to the parameter measured by the target test

Criterion and target ratings should be independent and free from bias

If a gold standard does not exist, other similar measures are used


CONCURRENT validity: Target measurement and criterion measurement are taken at the same time

PREDICTIVE validity: The test is a valid predictor of a future criterion score

Construct validity

Ability of an instrument to measure an abstract (typically multidimensional) construct and the degree to which the instrument reflects the theoretical components of that construct

Construct validity

CONVERGENT validity: The extent to which an instrument agrees with conceptually similar instruments

DIVERGENT validity: The extent to which an instrument lacks correlation with instruments that, conceptually, are distinct

Validity estimates: Pearson’s r

Demonstrates the strength of linear relationship between 2 variables

Often used, albeit erroneously, as a reliability indicator

Varies from –1 through 0 through +1 (directionality of relationship indicated by the - / + sign)
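
A short sketch of why Pearson's r is a poor reliability indicator: it captures linear association, not agreement. In the hypothetical data below, a second rater reads exactly 10 degrees higher than the first on every subject. The two raters never agree, yet r equals 1.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical goniometry readings (deg) with a constant 10-degree bias
rater_1 = [76, 90, 84, 83, 79, 87, 80, 77]
rater_2 = [value + 10 for value in rater_1]

r = pearson_r(rater_1, rater_2)
mean_difference = sum(b - a for a, b in zip(rater_1, rater_2)) / len(rater_1)

print(f"Pearson's r = {r:.2f}")
print(f"Mean difference between raters = {mean_difference:.1f} deg")
```

A systematic bias of this kind goes undetected by r, which is one reason agreement-based indices such as the ICC are preferred for reliability.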

Sensitivity and specificity

Sensitivity

The ability of a test to yield a positive result when the condition is actually present

Specificity

The ability of a test to yield a negative result when the condition is actually absent

Sensitivity

Sensitivity = [a / (a + c)] x 100%

                 Condition +   Condition -   Total
Test result +    a             b             a + b
Test result -    c             d             c + d
Total            a + c         b + d

Specificity

Specificity = [d / (b + d)] x 100%

                 Condition +   Condition -   Total
Test result +    a             b             a + b
Test result -    c             d             c + d
Total            a + c         b + d
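
Both formulas can be applied to the same 2 x 2 table, as in the sketch below. The cell counts are hypothetical and chosen only to illustrate the arithmetic; the argument names follow the cell labels a, b, c, and d used in the table.

```python
def sensitivity(a, c):
    """Percentage of condition-positive cases with a positive test result.
    a = true positives, c = false negatives."""
    return 100 * a / (a + c)

def specificity(d, b):
    """Percentage of condition-negative cases with a negative test result.
    d = true negatives, b = false positives."""
    return 100 * d / (b + d)

# Hypothetical 2 x 2 table
a, b = 45, 10   # test result +: condition present, condition absent
c, d = 5, 40    # test result -: condition present, condition absent

print(f"Sensitivity = {sensitivity(a, c):.0f}%")
print(f"Specificity = {specificity(d, b):.0f}%")
```

Sensitivity uses only the Condition + column and specificity only the Condition - column, so neither value depends on how common the condition is in the sample.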

Measure development

Planning

Test construction

Reliability testing

Validation

Measure development

Appropriateness of the test for the target group

Interpretation of results in a meaningful way

Sufficient sensitivity to detect small but CLINICALLY RELEVANT change

Application of the test in varied settings and populations to determine useful properties

Selection criteria for measures

Appropriateness to the target group

Psychometric properties

Validity

Reliability

Sensitivity to clinically relevant change

Sensitivity and specificity, if the measure is used for diagnostic purposes

Selection criteria for measures

Clinical utility / practicality of administration

Clarity of instructions

Format (interview, questionnaire, task performance, naturalistic observation, other)

Ease of administration (time required to complete, scoring, interpretation)

Expertise / training required for administering and/or interpreting

Cost-effectiveness

In summary...

The End.

Thanks for listening.
