MEASUREMENT CHARACTERISTICS
Error & Confidence
Reliability, Validity, & Usability
ERROR & CONFIDENCE
Reducing error
All assessment scores contain some error
The goal is to minimize error so scores are accurate
Protocols & periodic staff training/retraining help reduce error

Increasing confidence
Confidence that results lead to correct placement
Comes from assessments that produce valid, reliable, & usable results
ASSESSMENT RESULTS
Norm-referenced
Individual’s score is compared to others in their peer/norm group
Example: school tests reported against a norm group (e.g., the 95th percentile)
The norm group needs to be representative of the test takers the test was designed for
ASSESSMENT RESULTS
Criterion-referenced
Individual’s score is compared to a preset standard or criterion
The standard doesn’t change based on the individual or group
Example: A = 250-295 points
VALIDITY
Describes how well the assessment results match their intended purpose
Are you measuring what you think you are measuring?
Relationship between program & assessment content
An assessment does not have validity for all purposes, populations, or times
VALIDITY
Depends on different types of evidence
Is a matter of degree (no tool is perfect)
Is a unitary concept (a change from past practice)
The former “types” of validity are now considered forms of evidence, e.g., content validity is now content-related evidence
FACE VALIDITY
Not listed in the text
Do the items appear, on the surface, to fit what is being measured?
CONTENT VALIDITY (Content-related evidence)
How well does the assessment measure the subject or content?
Representativeness
Completeness: covers all major areas
Nonstatistical
Established through review of the literature or expert opinion
A blueprint of the major components
Per Austin (1991), the minimum requirement for any assessment
CRITERION-RELATED VALIDITY (Criterion-related evidence)
Comparison of results with an outside criterion
Statistical
Reported as a validity or correlation coefficient
Ranges from +1 to -1 (±1 is a perfect relationship)
0 = no relationship
r = .73 is better than r = .52
r = ±.40 to ±.70 is the acceptable range
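A short Python sketch of how a criterion-related validity coefficient can be computed as the Pearson correlation between assessment scores and an outside criterion. The scores are invented, and statistics.correlation requires Python 3.10+.

```python
# Illustrative sketch only: validity coefficient as a Pearson correlation.
from statistics import correlation  # Python 3.10+

assessment_scores = [42, 55, 61, 48, 70, 66, 52, 59]
criterion_scores  = [40, 58, 65, 45, 72, 60, 50, 62]  # e.g., later outcome ratings

r = correlation(assessment_scores, criterion_scores)
print(f"validity coefficient r = {r:.2f}")
# Interpretation per the slide: ±.40 to ±.70 is the acceptable range,
# and r = .73 indicates a stronger relationship than r = .52.
```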
CRITERION-RELATED VALIDITY (Criterion-related evidence)
Coefficients of .30 to .40 may be used if statistically significant
If validity is reported, it is generally criterion-related validity
Two types: predictive & concurrent
PREDICTIVE VALIDITY
The ability of an assessment to predict future behaviors or outcomes
Measures are taken at different times
Examples: ACT or SAT scores & success in college; a leisure satisfaction score predicting outcomes at discharge
CONCURRENT VALIDITY
More than one instrument measures the same content
The desire is to predict one set of scores from another set taken at the same (or nearly the same) time and measuring the same variable
CONSTRUCT VALIDITY (Construct-related evidence)
Theoretical/conceptual
Content & criterion-related validity contribute to construct validity
Research on the conceptual framework on which the assessment is based also contributes to construct validity
Not demonstrated in a single project or statistical measure
Few TR assessments have it: their focus is behavior, not a construct
CONSTRUCT VALIDITY (Construct-related evidence)
Factor analysis
Convergent validity (evidence of what it measures)
Divergent validity (evidence of what it does not measure)
Expert panels are used here too
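A hypothetical Python sketch of convergent vs. divergent evidence using item correlations: items meant to tap the same construct should correlate highly with each other and only weakly with items measuring something else. All item data are invented, and statistics.correlation requires Python 3.10+.

```python
# Items A1/A2 are intended to measure the same construct; item B1 measures a different one.
from statistics import correlation

a1 = [3, 4, 5, 2, 4, 5, 3, 4]
a2 = [3, 5, 5, 2, 4, 4, 3, 5]   # should correlate highly with a1 (convergent)
b1 = [2, 5, 3, 3, 2, 4, 5, 4]   # should correlate weakly with a1 (divergent)

print(f"convergent evidence: r(A1, A2) = {correlation(a1, a2):.2f}")  # expect high
print(f"divergent evidence:  r(A1, B1) = {correlation(a1, b1):.2f}")  # expect low
```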
THREATS TO VALIDITY
The assessment should be valid for its intended use (e.g., research instruments)
Unclear directions
Unclear or ambiguous terms
Items at an inappropriate level for the subjects
Items not related to the construct being measured
THREATS TO VALIDITY
Too few items
Too many items
Items with an identifiable pattern of response
Method of administration
Testing conditions
Subjects’ health, reluctance, & attitudes
See Stumbo, 2002, pp. 41-42
VALIDITY
You can’t get valid results without reliable results, but you can get reliable results without valid results
Reliability is a necessary but not sufficient condition for validity
See Stumbo, 2002, p. 54
RELIABILITY
Accuracy or consistency of a measurement
Reproducible results
Statistical in nature
r = between 0 & 1 (with 1 being perfect)
Should not be lower than .80
Tells what portion of the variance is non-error variance
Increases with the length of the test & the spread of scores
STABILITY (Test-retest)
How stable is the assessment over time?
Results should not be overly influenced by the passage of time
The same group is assessed twice with the same instrument & the results of the two testings are correlated
Are the two sets of scores alike?
Consider time effects (longer or shorter intervals between testings)
EQUIVALENCY (Equivalent forms)
Also known as parallel-form or alternative-form reliability
How closely correlated are two or more forms of the same assessment?
Two forms have been developed & demonstrated to measure the same construct
The forms have similar, but not identical, items (e.g., the NCTRC exam)
Short & long forms are not equivalent forms
INTERNAL CONSISTENCY
How closely are the items on the assessment related to one another?
Split-half methods: 1st half vs. 2nd half, odd/even items, matched random subsets
If the test can’t be divided: Cronbach’s alpha, Kuder-Richardson, Spearman-Brown’s formula (see the sketch below)
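A minimal Python sketch of two internal-consistency estimates named on this slide, using made-up item-level data (rows are respondents, columns are items scored 1-5); statistics.correlation requires Python 3.10+.

```python
from statistics import variance, correlation

items = [
    [4, 4, 5, 3, 4],
    [2, 3, 2, 2, 3],
    [5, 4, 5, 4, 5],
    [3, 3, 4, 3, 3],
    [4, 5, 4, 4, 4],
    [1, 2, 2, 1, 2],
]

k = len(items[0])                                   # number of items
totals = [sum(row) for row in items]                # total score per respondent
item_vars = [variance([row[i] for row in items]) for i in range(k)]

# Cronbach's alpha = k/(k-1) * (1 - sum(item variances) / variance(totals))
alpha = k / (k - 1) * (1 - sum(item_vars) / variance(totals))
print(f"Cronbach's alpha = {alpha:.2f}")

# Split-half (odd vs. even items), corrected to full length with Spearman-Brown
odd  = [sum(row[0::2]) for row in items]
even = [sum(row[1::2]) for row in items]
r_half = correlation(odd, even)
r_full = 2 * r_half / (1 + r_half)                  # Spearman-Brown correction
print(f"split-half r = {r_half:.2f}, corrected = {r_full:.2f}")
```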
INTERRATER RELIABILITY
Percentage of agreements relative to the number of observations
Note the difference between agreement & accuracy
Raters are compared to each other
80% agreement is the usual benchmark
INTERRATER RELIABILITY
Simple agreement: based on the number of agreements & disagreements
Point-to-point agreement: takes each data point into consideration
Percentage of agreement for the occurrence of the target behavior
Kappa index (corrects agreement for chance; see the sketch below)
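An illustrative Python sketch of simple percent agreement and Cohen’s kappa for two raters coding the same 10 observations as “occurred” (1) or “did not occur” (0). The ratings are invented.

```python
rater_a = [1, 1, 0, 1, 0, 1, 1, 0, 1, 0]
rater_b = [1, 1, 0, 0, 0, 1, 1, 0, 1, 1]

n = len(rater_a)
observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
print(f"simple agreement = {observed:.0%}")      # compare against the 80% benchmark

# Cohen's kappa corrects observed agreement for agreement expected by chance.
p_a1 = sum(rater_a) / n                          # proportion of "occurred" codes, rater A
p_b1 = sum(rater_b) / n                          # proportion of "occurred" codes, rater B
expected = p_a1 * p_b1 + (1 - p_a1) * (1 - p_b1)
kappa = (observed - expected) / (1 - expected)
print(f"kappa = {kappa:.2f}")
```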
INTRARATER RELIABILITY
Not in the text
The rater’s ratings are compared with their own ratings
RELIABILITY
Assessment manuals often give this information
High reliability doesn’t indicate validity
Generally, a longer test has higher reliability because it lessens the influence of chance or guessing (see the sketch below)
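A brief Python sketch of why length matters, using the Spearman-Brown prophecy formula to project reliability when a test is lengthened or shortened by a factor k. The starting reliability of .70 is an invented example value.

```python
def spearman_brown(reliability: float, k: float) -> float:
    """Projected reliability of a test lengthened (or shortened) by factor k."""
    return k * reliability / (1 + (k - 1) * reliability)

r_original = 0.70
for k in (0.5, 1, 2, 3):      # half, same, double, triple the number of items
    print(f"length x{k}: projected reliability = {spearman_brown(r_original, k):.2f}")
```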
FAIRNESS
Reduction or elimination of undue bias related to language, ethnic or racial background, & gender
Items should be free of stereotypes & biases
Beginning to be a concern for TR
USABILITY & PRACTICALITY
Nonstatistical
Is this tool better than any other tool on the market, or one I could design myself?
Consider time, cost, staff qualifications, ease of administration, scoring, etc.