Michigan Assessment Consortium Common Assessment Development Series – Module 16: Validity
Posted on 19-Jan-2018
Michigan Assessment Consortium
Common Assessment Development Series
Module 16 – Validity
Developed by
Bruce R. Fay, PhD, Wayne RESA
James Gullen, PhD, Oakland Schools
Support
The Michigan Assessment Consortium professional development series in common assessment development is funded in part by the Michigan Association of Intermediate School Administrators in cooperation with …
In Module 16 you will learn about
Validity: what it is, what it isn't, why it's important
Types/sources of evidence for validity
Validity & Achievement Testing – The Old(er) View
Validity is the degree to which a test measures what it is intended to measure.
This view suggests that validity is a property of a test.
Validity and Achievement Testing – The New(er) View
Validity relates to the meaningful use of results
Validity is not a property of a test
Key question: Is it appropriate to use the results of this test to make the decision(s) we are trying to make?
Validity & Proposed Use
"Validity refers to the degree to which evidence and theory support the interpretations of test scores entailed by proposed uses of the tests." (AERA, APA, & NCME, 1999, p. 9)
Validity as Evaluation
"Validity is an integrated evaluative judgment of the degree to which empirical evidence and theoretical rationales support the adequacy and appropriateness of inferences and actions based on test scores or other modes of assessment." (Messick, 1989, p. 13)
Meaning in Context
Validity is contextual – it does not exist in a vacuum
Validity has to do with the degree to which test results can be meaningfully interpreted and correctly used with respect to a question to be answered or a decision to be made – it is not an all-or-nothing thing
Prerequisites to Validity
Certain things have to be in place before validity can be addressed
Reliability
A property of the test
Statistical in nature
"Consistency" or repeatability
The test actually measures something
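The "statistical in nature" point can be made concrete with a small example that is not part of the original module: Cronbach's alpha is one widely used index of internal-consistency reliability. The item scores below are invented illustration data, not from any real assessment.

```python
# Minimal sketch: Cronbach's alpha, one common statistical index of
# internal-consistency reliability. Illustration only; the scores are made up.
def cronbach_alpha(scores):
    """scores: list of per-student lists of item scores (equal length)."""
    k = len(scores[0])                      # number of items
    def var(xs):                            # population variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)
    item_vars = [var([s[i] for s in scores]) for i in range(k)]
    total_var = var([sum(s) for s in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

students = [
    [1, 1, 1, 0],   # each row: one student's scores on 4 items
    [1, 1, 0, 0],
    [0, 1, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]
print(round(cronbach_alpha(students), 3))
```

Values near 1 indicate that the items behave consistently with one another; a low value suggests the test is not measuring any one thing reliably.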
Fairness
Freedom from bias with respect to:
Content
Item construction
Test administration (testing environment)
Anything else that would cause differential performance based on factors other than the student's knowledge/ability with respect to the subject matter
The Natural Order of Things
Reliability precedes Fairness, which precedes Validity.
Only if a test is reliable can you determine whether it is fair, and only if it is fair can you make any defensible use of the results.
However, having a reliable, fair test does not guarantee valid use.
Validity Recap
Not a property of the test
Not essentially statistical
Interpretation of results
Meaning in context
Requires judgment
Types/Sources of Validity
Internal Validity: Face, Content, Response, Criterion (int), Construct
External Validity: Criterion (ext) – Concurrent, Predictive; Consequential
Internal Validity
Practical: Content, Response, Criterion (int)
Not so much: Face, Construct
External Validity
Criterion (ext): Usually statistical (measures of association or correlation)
Requires the existence of other tests or points of quantitative comparison
May require a "known good" assumption
Consequential: Relates directly to the "correctness" of decisions based on results
Usually established over multiple cases and time
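As an illustration of the "measures of association or correlation" used for external criterion evidence, the sketch below computes a Pearson correlation between scores on a local test and scores on a hypothetical "known good" external measure. The paired scores are made up for the example.

```python
# Minimal sketch of external criterion evidence: Pearson correlation
# between a local test and an assumed "known good" external measure.
# Illustration only; the paired scores are invented.
import math

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

local    = [12, 18, 9, 15, 20, 11]   # local common-assessment scores
external = [55, 70, 40, 62, 75, 50]  # scores on an established external test
print(round(pearson_r(local, external), 3))
```

A strong positive correlation is one piece of evidence (not proof) that the local test ranks students similarly to the established measure; note that this rests on the "known good" assumption about the external test.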
To Validate or Not to Validate…
…is that the question?
Decision-making without data…
…is just guessing.
But use of improperly validated data…
…leads to confidently arriving at potentially false conclusions.
Practical Realities
Although validity is not a statistical property of a test, both quantitative and qualitative methods are used to establish evidence for the validity of any particular use
Many of these methods are beyond the scope of what most schools/districts can do for themselves…but there are things you can do
Clear Purpose
Be clear and explicit about the intended purpose for which a test is developed and how the results are to be used
Documented Process
Implementing the process outlined in this training, with fidelity, will provide a big step in this direction, especially if you document what you are doing
Internal First, then External
Focus first on internal validity: Content, Response, Criterion
Focus next on external validity: Concurrent, Predictive, Consequential
Content & Criterion Evidence
Create the foundation for these by:
Using test blueprints to design and explicitly document the relationship (alignment and coverage) of the items on a test to content standards
Specifying appropriate numbers, types, and levels of items for the content to be assessed
More on Content & Criterion
Have test items written and reviewed by people with content/assessment expertise, using a defined process such as the one described in this series. Be sure to review for bias and other criteria.
Create rubrics, scoring guides, or answer keys as needed, and check them for accuracy
It's Not Just the Items…
Establish/document administration procedures
Determine how the results will be reported and to whom. Develop draft reporting formats.
Field Testing and Scoring
Field test your assessment
Evaluate the test administration
For open-ended items, train scorers and check that scoring is consistent (establish inter-rater reliability)
Create annotated scoring guides using actual (anonymous) student papers as exemplars
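One common way to check that scoring of open-ended items is consistent is to compute an exact-agreement rate and Cohen's kappa for a pair of scorers. The sketch below is illustrative only (the rubric ratings are made up) and assumes two raters scoring the same ten papers on a 0–3 rubric.

```python
# Minimal sketch of an inter-rater reliability check for open-ended items:
# exact-agreement rate and Cohen's kappa for two scorers on a 0-3 rubric.
# Illustration only; the paired ratings are invented.
from collections import Counter

def agreement_and_kappa(r1, r2):
    n = len(r1)
    agree = sum(a == b for a, b in zip(r1, r2)) / n
    c1, c2 = Counter(r1), Counter(r2)
    # expected chance agreement, from each rater's marginal score distribution
    expected = sum(c1[s] * c2[s] for s in set(r1) | set(r2)) / n ** 2
    kappa = (agree - expected) / (1 - expected)
    return agree, kappa

rater1 = [3, 2, 2, 1, 0, 3, 2, 1, 2, 3]
rater2 = [3, 2, 1, 1, 0, 3, 2, 2, 2, 3]
agree, kappa = agreement_and_kappa(rater1, rater2)
print(f"exact agreement: {agree:.0%}, kappa: {kappa:.2f}")
```

Kappa corrects raw agreement for the agreement expected by chance, so it gives a more honest picture than the percent-agreement figure alone; low values signal that scorers need retraining or a clearer rubric.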
Field Test Results Analysis
Analyze the field test results for reliability, bias, and response patterns
Make adjustments based on this analysis
Report results to field testers and evaluate their ability to interpret the data and make correct inferences/decisions
Repeat the field testing if needed
How Good is Good Enough?
Establish your initial performance standards in light of your field test data, and adjust if needed
Consider external validity by "comparing" pilot results to results from other "known good" tests or data points
Ready, Set, Go! (?)
When the test "goes live," take steps to ensure that it is administered properly; monitor and document this, noting any anomalies
Behind the Scenes
Ensure that tests are scored accurately. Pay particular attention to the scoring of open-ended items. Use a process that allows you to check on inter-rater reliability, at least on a sample basis
Making Meaning
Ensure that test results are reported:
Using previously developed formats
To the correct users
In a timely fashion
Follow up on whether the users can/do make meaningful use of the results
Conclusion