Michigan Assessment Consortium Common Assessment Development Series – Module 16: Validity
Posted on 19-Jan-2018
Michigan Assessment Consortium
Common Assessment Development Series
Module 16 – Validity
Developed by
Bruce R. Fay, PhD, Wayne RESA
James Gullen, PhD, Oakland Schools
Support
The Michigan Assessment Consortium professional development series in common assessment development is funded in part by the Michigan Association of Intermediate School Administrators in cooperation with …
In Module 16 you will learn about
Validity: what it is, what it isn't, why it's important
Types/sources of evidence for validity
Validity & Achievement Testing – The Old(er) View
Validity is the degree to which a test measures what it is intended to measure.
This view suggests that validity is a property of a test.
Validity and Achievement Testing – The New(er) View
Validity relates to the meaningful use of results
Validity is not a property of a test
Key question: Is it appropriate to use the results of this test to make the decision(s) we are trying to make?
Validity & Proposed Use
"Validity refers to the degree to which evidence and theory support the interpretations of test scores entailed by proposed uses of the tests." (AERA, APA, & NCME, 1999, p. 9)
Validity as Evaluation
"Validity is an integrated evaluative judgment of the degree to which empirical evidence and theoretical rationales support the adequacy and appropriateness of inferences and actions based on test scores or other modes of assessment." (Messick, 1989, p. 13)
Meaning in Context
Validity is contextual – it does not exist in a vacuum
Validity has to do with the degree to which test results can be meaningfully interpreted and correctly used with respect to a question to be answered or a decision to be made – it is not an all-or-nothing thing
Prerequisites to Validity
Certain things have to be in place before validity can be addressed
Reliability
A property of the test
Statistical in nature
"Consistency" or repeatability
The test actually measures something
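The "statistical in nature" point can be made concrete with a small example that is not part of the original module: Cronbach's alpha is one widely used index of internal-consistency reliability. The item scores below are invented illustration data, not from any real assessment.

```python
# Minimal sketch: Cronbach's alpha, one common statistical index of
# internal-consistency reliability. Illustration only; the scores are made up.
def cronbach_alpha(scores):
    """scores: list of per-student lists of item scores (equal length)."""
    k = len(scores[0])                      # number of items
    def var(xs):                            # population variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)
    item_vars = [var([s[i] for s in scores]) for i in range(k)]
    total_var = var([sum(s) for s in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

students = [
    [1, 1, 1, 0],   # each row: one student's scores on 4 items
    [1, 1, 0, 0],
    [0, 1, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]
print(round(cronbach_alpha(students), 3))
```

Values near 1 indicate that the items behave consistently with one another; a low value suggests the test is not measuring any one thing reliably.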
Fairness
Freedom from bias with respect to:
Content
Item construction
Test administration (testing environment)
Anything else that would cause differential performance based on factors other than the student's knowledge/ability with respect to the subject matter
The Natural Order of Things
Reliability precedes Fairness, which precedes Validity.
Only if a test is reliable can you determine whether it is fair, and only if it is fair can you make any defensible use of the results.
However, having a reliable, fair test does not guarantee valid use.
Validity Recap
Not a property of the test
Not essentially statistical
Interpretation of results
Meaning in context
Requires judgment
Types/Sources of Validity
Internal Validity: Face, Content, Response, Criterion (int), Construct
External Validity: Criterion (ext) – Concurrent, Predictive; Consequential
Internal Validity
Practical: Content, Response, Criterion (int)
Not so much: Face, Construct
External Validity
Criterion (ext): Usually statistical (measures of association or correlation)
Requires the existence of other tests or points of quantitative comparison
May require a "known good" assumption
Consequential: Relates directly to the "correctness" of decisions based on results
Usually established over multiple cases and time
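As an illustration of the "measures of association or correlation" used for external criterion evidence, the sketch below computes a Pearson correlation between scores on a local test and scores on a hypothetical "known good" external measure. The paired scores are made up for the example.

```python
# Minimal sketch of external criterion evidence: Pearson correlation
# between a local test and an assumed "known good" external measure.
# Illustration only; the paired scores are invented.
import math

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

local    = [12, 18, 9, 15, 20, 11]   # local common-assessment scores
external = [55, 70, 40, 62, 75, 50]  # scores on an established external test
print(round(pearson_r(local, external), 3))
```

A strong positive correlation is one piece of evidence (not proof) that the local test ranks students similarly to the established measure; note that this rests on the "known good" assumption about the external test.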
To Validate or Not to Validate…
…is that the question?
Decision-making without data…
…is just guessing.
But use of improperly validated data…
…leads to confidently arriving at potentially false conclusions.
Practical Realities
Although validity is not a statistical property of a test, both quantitative and qualitative methods are used to establish evidence for the validity of any particular use
Many of these methods are beyond the scope of what most schools/districts can do for themselves…but there are things you can do
Clear Purpose
Be clear and explicit about the intended purpose for which a test is developed and how the results are to be used
Documented Process
Implementing the process outlined in this training, with fidelity, will provide a big step in this direction, especially if you document what you are doing
Internal First, then External
Focus first on internal validity: Content, Response, Criterion
Focus next on external validity: Concurrent, Predictive, Consequential
Content & Criterion Evidence
Create the foundation for these by:
Using test blueprints to design and explicitly document the relationship (alignment and coverage) of the items on a test to content standards
Specifying appropriate numbers, types, and levels of items for the content to be assessed
More on Content & Criterion
Have test items written and reviewed by people with content/assessment expertise, using a defined process such as the one described in this series. Be sure to review for bias and other criteria.
Create rubrics, scoring guides, or answer keys as needed, and check them for accuracy
It's Not Just the Items…
Establish/document administration procedures
Determine how the results will be reported and to whom. Develop draft reporting formats.
Field Testing and Scoring
Field test your assessment
Evaluate the test administration
For open-ended items, train scorers and check that scoring is consistent (establish inter-rater reliability)
Create annotated scoring guides using actual (anonymous) student papers as exemplars
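One common way to check that scoring of open-ended items is consistent is to compute an exact-agreement rate and Cohen's kappa for a pair of scorers. The sketch below is illustrative only (the rubric ratings are made up) and assumes two raters scoring the same ten papers on a 0–3 rubric.

```python
# Minimal sketch of an inter-rater reliability check for open-ended items:
# exact-agreement rate and Cohen's kappa for two scorers on a 0-3 rubric.
# Illustration only; the paired ratings are invented.
from collections import Counter

def agreement_and_kappa(r1, r2):
    n = len(r1)
    agree = sum(a == b for a, b in zip(r1, r2)) / n
    c1, c2 = Counter(r1), Counter(r2)
    # expected chance agreement, from each rater's marginal score distribution
    expected = sum(c1[s] * c2[s] for s in set(r1) | set(r2)) / n ** 2
    kappa = (agree - expected) / (1 - expected)
    return agree, kappa

rater1 = [3, 2, 2, 1, 0, 3, 2, 1, 2, 3]
rater2 = [3, 2, 1, 1, 0, 3, 2, 2, 2, 3]
agree, kappa = agreement_and_kappa(rater1, rater2)
print(f"exact agreement: {agree:.0%}, kappa: {kappa:.2f}")
```

Kappa corrects raw agreement for the agreement expected by chance, so it gives a more honest picture than the percent-agreement figure alone; low values signal that scorers need retraining or a clearer rubric.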
Field Test Results Analysis
Analyze the field test results for reliability, bias, and response patterns
Make adjustments based on this analysis
Report results to field testers and evaluate their ability to interpret the data and make correct inferences/decisions
Repeat the field testing if needed
How Good is Good Enough?
Establish your initial performance standards in light of your field test data, and adjust if needed
Consider external validity by "comparing" pilot results to results from other "known good" tests or data points
Ready, Set, Go! (?)
When the test "goes live," take steps to ensure that it is administered properly; monitor and document this, noting any anomalies
Behind the Scenes
Ensure that tests are scored accurately. Pay particular attention to the scoring of open-ended items. Use a process that allows you to check on inter-rater reliability, at least on a sample basis
Making Meaning
Ensure that test results are reported:
Using previously developed formats
To the correct users
In a timely fashion
Follow up on whether the users can/do make meaningful use of the results
Conclusion