reliability and validity
DESCRIPTION
Presentation given at ICE 2010 (Atlanta) regarding basic concepts and vocabulary of test reliability and validity

TRANSCRIPT
The Many Faces of Reliability and Validity
November 17, 2010
James A. Penny, PhD; Stephen B. Johnson, PhD; Diane M. Talley, MA
Castle Worldwide
Why reliability & validity matter The Standards Reliability Validity Round Tables
Reliability & validity for small programs: focus on reliability, focus on validity
Wrap up and questions
The plan for our roundtable session
ICE 2010 Conference Atlanta Georgia
First principles – why they matter
We want to make a decision about someone’s competence (reliability)
We can’t physically “weigh” the concept
So we make logical inferences linking what can be observed to the concept (validity)
The Standards
The Standards: the degree to which all the accumulated evidence supports the intended interpretation of test scores for the proposed purpose
The Uniform Guidelines require a demonstration of validity if a test is used in employment selection
Especially if adverse impact exists
The Supreme Court – validity matters
Information from testing is like a radio signal
Measures (tests and items) have signal and noise like an AM radio.
When the signal is strong and the noise is weak, you’re happy.
But reliability of what and assessed how?
Consistency, replicability, and/or the precision of our measure. We have to infer reliability.
Reliability comes in many flavors: test-retest, parallel forms, split half, KR-20, KR-21, Cronbach's alpha, generalizability, decision consistency, and the standard error of measurement.
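Two of the quantities above can be illustrated directly from item-level data. Below is a minimal Python sketch of Cronbach's alpha and the standard error of measurement; the item responses and the score SD are invented for the example.

```python
from math import sqrt
from statistics import pvariance

def cronbach_alpha(items):
    """Cronbach's alpha from a list of per-item score lists (same examinees in each)."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # total score per examinee
    item_variance = sum(pvariance(scores) for scores in items)
    return (k / (k - 1)) * (1 - item_variance / pvariance(totals))

def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - reliability): the typical size of the error in a score."""
    return sd * sqrt(1 - reliability)

# Hypothetical 0/1 responses: 4 items, 6 examinees (one inner list per item)
items = [
    [1, 1, 0, 1, 1, 0],
    [1, 0, 0, 1, 1, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 1, 0, 1, 1, 0],
]
alpha = cronbach_alpha(items)                    # about 0.81 for these invented data
sem = standard_error_of_measurement(10, alpha)   # assuming a score SD of 10
```

For dichotomous (0/1) items like these, Cronbach's alpha reduces to KR-20.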
Factors influencing reliability
Test length: increasing the length increases consistency
Location of the cut score in the score distribution: a cut score away from the mean improves consistency
Test score variability: increasing variability increases consistency
Spread of candidate performance (heterogeneity)
Consistency of testing experiences
Quality of the item-writing process
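The test-length effect can be quantified with the Spearman-Brown prophecy formula, which predicts reliability when a test is lengthened with parallel items. A short sketch with illustrative values:

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability after multiplying the number of parallel items by length_factor."""
    n = length_factor
    return n * reliability / (1 + (n - 1) * reliability)

# Doubling a test whose reliability is .70:
doubled = spearman_brown(0.70, 2)  # about 0.82
```

The same formula run in reverse (length_factor < 1) shows how much reliability a shortened form gives up.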
Validity is about making a case
Data being collected supports the theory
Procedural evidence: how were the content requirements, items, and tests created?
Measurement evidence: linear factor analysis, structural equation modeling (SEM), regression analysis (prediction), relationship to other measures
The case should be able to stand up to court scrutiny.
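Criterion-related measurement evidence of this kind is often summarized as a correlation between test scores and an external criterion. A minimal sketch with invented exam scores and supervisor ratings:

```python
from statistics import mean, pstdev

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (pstdev(x) * pstdev(y))

# Hypothetical: certification exam scores vs. on-the-job performance ratings
scores = [62, 70, 75, 81, 88, 93]
ratings = [2.1, 2.8, 2.6, 3.4, 3.9, 4.2]
r = pearson_r(scores, ratings)  # about 0.97 for these invented data
```

A real validation study would, of course, need a representative sample and a defensible criterion measure, not six invented pairs.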
Like reliability, validity has many flavors
But what about face validity?
Factors influencing validity: process matters, purpose matters, theory matters, logical implications, the measurement tool must match the measurement goals, and the nature of the group (age, gender, background, heterogeneity).
Relationship between validity and reliability
Valid tests should be reliable; reliable tests may not be valid.
Validity is more important than reliability (according to the courts). BUT the more important the decision, the more reliable you must be.
To be useful, an instrument (test, scale) must be both reasonably reliable and valid.
Aim for validity first, and then try to make the test more reliable little by little.
Implications for creating assessments
Clearly define the construct(s) to be assessed.
Identify the logical outcomes of that definition.
Define how the construct(s) should be measured (e.g., format, time required). Don't let the format do the driving!
Define the process to create any required tools.
Identify who should be involved in creating any tools.
Identify how you will assess reliability and validity.
Questions?
James A. Penny [email protected]
Stephen B. Johnson [email protected]
Diane M. Talley [email protected]
919.572.6880
www.castleworldwide.com