reliability and validity
DESCRIPTION
Presentation given at ICE 2010 (Atlanta) regarding basic concepts and vocabulary of test reliability and validity

TRANSCRIPT
The Many Faces of Reliability and Validity
November 17, 2010
James A. Penny, PhD; Stephen B. Johnson, PhD; Diane M. Talley, MA
Castle Worldwide
Why reliability & validity matter The Standards Reliability Validity Round Tables
Reliability & validity for small programs: focus on reliability, focus on validity
Wrap up and questions
The plan for our roundtable session
ICE 2010 Conference Atlanta Georgia
First principles – why they matter
We want to make a decision about someone’s competence (reliability)
We can’t physically “weigh” the concept
So we make logical inferences linking what can be observed to the concept (validity)
The Standards
The Standards: the degree to which all the accumulated evidence supports the intended interpretation of test scores for the proposed purpose
The Uniform Guidelines require a demonstration of validity if a test is used in employment selection
Especially if adverse impact exists
The Supreme Court – validity matters
Information from testing is like a radio signal
Measures (tests and items) have signal and noise like an AM radio.
When the signal is strong and the noise is weak, you’re happy.
But reliability of what and assessed how?
Consistency, replicability, and/or the precision of our measure. We have to infer reliability.
Reliability comes in many flavors: test-retest, parallel forms, split half, KR-20, KR-21, Cronbach's alpha, generalizability, decision consistency, and the standard error of measurement.
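Two of the quantities above can be illustrated directly from item-level data. Below is a minimal Python sketch of Cronbach's alpha and the standard error of measurement; the item responses and the score SD are invented for the example.

```python
from math import sqrt
from statistics import pvariance

def cronbach_alpha(items):
    """Cronbach's alpha from a list of per-item score lists (same examinees in each)."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # total score per examinee
    item_variance = sum(pvariance(scores) for scores in items)
    return (k / (k - 1)) * (1 - item_variance / pvariance(totals))

def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - reliability): the typical size of the error in a score."""
    return sd * sqrt(1 - reliability)

# Hypothetical 0/1 responses: 4 items, 6 examinees (one inner list per item)
items = [
    [1, 1, 0, 1, 1, 0],
    [1, 0, 0, 1, 1, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 1, 0, 1, 1, 0],
]
alpha = cronbach_alpha(items)                    # about 0.81 for these invented data
sem = standard_error_of_measurement(10, alpha)   # assuming a score SD of 10
```

For dichotomous (0/1) items like these, Cronbach's alpha reduces to KR-20.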
Factors influencing reliability
Test length: increasing the length increases consistency
Location of the cut score in the score distribution: a cut score away from the mean improves consistency
Test score variability: increasing variability increases consistency
Spread of candidate performance (heterogeneity)
Consistency of testing experiences
Quality of the item-writing process
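The test-length effect can be quantified with the Spearman-Brown prophecy formula, which predicts reliability when a test is lengthened with parallel items. A short sketch with illustrative values:

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability after multiplying the number of parallel items by length_factor."""
    n = length_factor
    return n * reliability / (1 + (n - 1) * reliability)

# Doubling a test whose reliability is .70:
doubled = spearman_brown(0.70, 2)  # about 0.82
```

The same formula run in reverse (length_factor < 1) shows how much reliability a shortened form gives up.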
Validity is about making a case
Data being collected supports the theory
Procedural evidence: how were the content requirements, items, and tests created?
Measurement evidence: linear factor analysis, structural equation modeling (SEM), regression analysis (prediction), relationship to other measures
The case should be able to stand up to court scrutiny.
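Criterion-related measurement evidence of this kind is often summarized as a correlation between test scores and an external criterion. A minimal sketch with invented exam scores and supervisor ratings:

```python
from statistics import mean, pstdev

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (pstdev(x) * pstdev(y))

# Hypothetical: certification exam scores vs. on-the-job performance ratings
scores = [62, 70, 75, 81, 88, 93]
ratings = [2.1, 2.8, 2.6, 3.4, 3.9, 4.2]
r = pearson_r(scores, ratings)  # about 0.97 for these invented data
```

A real validation study would, of course, need a representative sample and a defensible criterion measure, not six invented pairs.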
Like reliability, validity has many flavors
But what about face validity?
Factors influencing validity: process matters, purpose matters, theory matters, logical implications, the measurement tool must match the measurement goals, and the nature of the group (age, gender, background, heterogeneity).
Relationship between validity and reliability
Valid tests should be reliable; reliable tests may not be valid.
Validity is more important than reliability (according to the courts). BUT the more important the decision, the more reliable you must be.
To be useful, an instrument (test, scale) must be both reasonably reliable and valid.
Aim for validity first, and then try to make the test more reliable little by little.
Implications for creating assessments
Clearly define the construct(s) to be assessed.
Identify the logical outcomes of that definition.
Define how the construct(s) should be measured (e.g., format, time required). Don't let the format do the driving!
Define the process to create any required tools.
Identify who should be involved in creating any tools.
Identify how you will assess reliability and validity.
Questions?
James A. Penny [email protected]
Stephen B. Johnson [email protected]
Diane M. Talley [email protected]
919.572.6880
www.castleworldwide.com