Assessing Learning for Students with Disabilities. Tom Haladyna, Arizona State University
TRANSCRIPT
Assessing Learning for Students with Disabilities
Tom Haladyna
Arizona State University
Useful Sources
Standards for Educational & Psychological Testing, AERA, APA, NCME (1999)
Tindal & Haladyna (2002). Large-scale assessment programs for all students: Validity, technical adequacy, and implementation.
Downing & Haladyna (2006). Handbook of test development.
Haladyna & Downing (2004). Construct-irrelevant variance in high-stakes testing. Educational Measurement: Issues and Practice.
Kane (2006). Content-related validity evidence. Handbook of test development.
Kane (in press). Validation. Educational Measurement (4th ed.)
Assessment vs. Testing
Assessment is the act of judging the indicators of student achievement for the benefit of planning future instruction.
Testing is a way of providing one valid source of information for assessment.
A test is NEVER a valid source of information for assessment unless corroborated by other evidence: multiple indicators.
Validity of a Test Score: Interpretation or Use
A way of reasoning about test scores.
Concerned with the accuracy of any interpretation or use.
Involves an argument about how an assessment or a test score can be validly interpreted or used.
Involves a claim by the developer/user.
Involves evidence that might support this claim.
Validation's Steps
Developmental phase
State a purpose for the test.
Define the trait (construct): content, cognitive demand.
Develop the test.
Investigative phase
Validate: conduct the study.
Two Types of Evidence
Evidence that supports our claim.
Evidence that weakens or threatens validity:
Construct underrepresentation
Construct-irrelevant variance
Two Types of Evidence
Includes procedures known to strengthen our argument and support our claim.
Includes statistical/empirical information that also strengthens our argument and supports our claim.
More Types of Evidence
Content-related
Reliability
Item quality
Test design
Test administration
Test scoring
Test reporting
Consequences
Content
Structure: sub-scores?
Concurrent: how it correlates with other information
Does it represent the construct (content)?
Reliability
A very important type of validity evidence.
Can be applied to individual or group scores.
Group scores tend to be very reliable.
Can focus on reliability at a decision point.
Subjective judgment is a factor in reliability.
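The slides do not name a specific reliability coefficient; as one common internal-consistency estimate, Cronbach's alpha can be sketched as follows (the student-by-item data here are hypothetical):

```python
from statistics import pvariance

def cronbach_alpha(scores):
    """Cronbach's alpha for a students-by-items score matrix:
    alpha = k/(k-1) * (1 - sum(item variances) / variance of total scores)."""
    k = len(scores[0])                                        # number of items
    item_vars = [pvariance([row[i] for row in scores]) for i in range(k)]
    total_var = pvariance([sum(row) for row in scores])       # variance of test totals
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Hypothetical data: 4 students, 3 dichotomously scored items
data = [[1, 1, 1], [1, 0, 1], [0, 0, 1], [0, 0, 0]]
print(round(cronbach_alpha(data), 2))  # -> 0.75
```

With real operational data, reliability would come from the scoring vendor or psychometric analysis, not a toy computation like this.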
Random Error
The basis for reliability.
Can be large or small.
Can be positive or negative.
We never know; we just guess.
Guessing allows us to speculate about where a student's true score lies and what action we take.
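That "guess" about where a true score lies is usually expressed with the standard error of measurement, SEM = SD * sqrt(1 - reliability). A minimal sketch, with hypothetical numbers (observed score, SD, and reliability are invented for illustration):

```python
import math

def true_score_band(observed, sd, reliability, z=1.96):
    """Approximate 95% band around an observed score, using the
    standard error of measurement: SEM = SD * sqrt(1 - reliability)."""
    sem = sd * math.sqrt(1 - reliability)
    return observed - z * sem, observed + z * sem

# Hypothetical: observed score 72, SD 10, reliability 0.91 -> SEM = 3.0
lo, hi = true_score_band(72, 10, 0.91)
print(f"{lo:.1f} to {hi:.1f}")  # -> 66.1 to 77.9
```

A band this wide is one reason a single test score should never drive a decision by itself; it only locates a plausible region for the true score.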
Item Quality
Universal item design
Format issues
Item reviews
Field tests
Test Design
Breadth
Scope
Depth
Length
Formats
Test Administration
Standardized
Accommodations
Standards
Test Scoring
Avoid errors.
Quality control is important.
Invalidate scores when evidence suggests it.
Score Reporting
Helpful to teachers for assessment
Meets requirements for accountability
Meets the Standards (Ryan, 2006)
Advice
Document what you do: a technical report.
Build the case for validity.
Do validity studies when possible.
Stay focused on the real reason for assessment and testing: helping students learn, not satisfying someone in DC.