© copyright 2006 Stephen G. Sireci

Stephen G. Sireci
Center for Educational Assessment
University of Massachusetts Amherst

Mary J. Pitoniak
Educational Testing Service

Assessment Accommodations: What Have We Learned From Research?

In this presentation we will

• Discuss validity issues in test accommodations

• List the most common test accommodations used to promote valid score interpretation

• Discuss research conducted on test accommodations

• Suggest areas for future research on test accommodations

Defining “Accommodation”

• The Standards for Educational and Psychological Testing

– use the terms “modification” and “accommodation” almost interchangeably,

– use accommodation “as the general term for any action taken in response to a determination that an individual’s disability requires a departure from standard testing protocol” (p. 101).

Current State Testing Programs

• “Accommodation” is used to refer to test or test administration changes that are not considered to alter the construct measured.

• “Modification” is used to refer to changes that are thought to alter the construct.

Validity Issues in Accommodations

• To support valid test score interpretations for students with disabilities, it is important to remove construct-irrelevant barriers to these students’ test performance, but it is also important to maintain “construct representation.”

• In situations where individuals who take accommodated versions of tests are compared with those who take the standard version, an additional validity issue is the comparability of scores across the different test formats.

The Psychometric Oxymoron

• Accommodated Standardized Test

– Promotes fairness in testing?

Or

– Provides an unfair advantage to some examinees?

What do the Standards for Educational and Psychological Testing say on this issue?

Standards for Educational and Psychological Testing

• Standard 10.1: “In testing individuals with disabilities, test developers, test administrators, and test users should take steps to ensure that the test score inferences accurately reflect the intended construct rather than any disabilities and their associated characteristics extraneous to the intent of the measurement” (AERA et al., p. 106).

Standards for Educational and Psychological Testing

• Standard 10.4: “If modifications are made or recommended by test developers. . . (unless) evidence of validity for a given inference has been established for individuals with the specific disabilities, test developers should issue cautionary statements in manuals or supplementary materials regarding confidence in interpretations based on such test scores” (AERA et al., p. 106).

“Cautionary statements”

• Flagging of test scores: Controversial—most research in this area focused on postsecondary and postgraduate admissions tests (Sireci, 2005).

• How do states handle score reporting issues for accommodated and alternate assessments?

Accommodated Tests and Accommodated Test Administrations Have the Potential to Undermine Validity in at Least 2 Ways:

1. Construct underrepresentation

2. Construct-irrelevant variance

As stated by Messick (1989):

“Tests are imperfect measures of constructs because they either leave out something that should be included…or else include something that should be left out, or both” (p. 34)

• When standardized tests are NOT accommodated for students with disabilities (SWD)

– Construct-irrelevant variance can interfere with test performance

• e.g., ability to see, hear, or focus interferes with measurement of math or reading proficiency

• When standardized tests ARE accommodated

– Construct underrepresentation may occur

• e.g., read-aloud for a reading assessment

What methods do states use to minimize construct-irrelevant variance, while maintaining construct representation?

Categories of Accommodations

• Presentation

• Timing

• Response

• Setting

Thompson, Blount, and Thurlow (2002)

Presentation Accommodations

• Oral (read-aloud, audiocassette)

• Paraphrasing

• Technological

• Braille/large print

• Sign language interpreter

• Encouragement (redirecting)

• Cueing

• Spelling assistance

• Use of manipulatives

Timing Accommodations

• Extended time

• Multiple days/sessions

• Separate sessions

Timing accommodations are not so much an issue on state standards-based assessments because most have generous time limits.

Response Accommodations

• Scribe

• Booklet versus answer sheet

• Marking booklet to maintain place

• Transcription

Setting Accommodations

• Individual administration

• Administration in a separate room

Other Accommodations

• Alternate assessment

• Others?

Psychometric Research on Test Accommodations Has Focused On

• Has the accommodation changed the construct measured?

– Speed

– Different skill

• Do accommodations help only those who need them?

– Interaction hypothesis

• Do test scores from accommodated and non-accommodated administrations have the same meaning?

Research on test accommodations for individuals with disabilities:

• Little empirical study

• Some literature reviews

– Willingham et al. (1988)

– Chiu & Pearson (1999)

– Tindal & Fuchs (2000)

– Pitoniak & Royer (2001)

– Thompson et al. (2002)

– Bolt & Thurlow (2004)

– Sireci, Scarpati, & Li (2005)

• Psychometric issues (Geisinger, 1994)

• Legal issues (Phillips, 1994)

• Also: Keeping Score for All (Koenig & Bachman, 2004)

Sireci, Scarpati, & Li (2005): Research Questions

• Do test accommodations improve the scores of students with disabilities (SWD)?

• If so, do such score gains reflect increased validity or unfair advantage?

– Interaction hypothesis

• What specific types of accommodations are best for specific types of students?

Interaction Hypothesis

Figure 1. Illustration of Interaction Hypothesis. [Plot: x-axis, Accommodation Condition (ACC, No ACC); y-axis, Mean Score (10 to 60); one line per GROUP (GEN, SWD/ELL).]

MacArthur & Cavalier (2004)

“Differential impact on students with and without disabilities provides evidence that the accommodation removes a barrier based on disability” (p. 55).

Fletcher et al. (2006)

“Because the source of variance is fundamentally irrelevant to the measurement of the construct, a valid accommodation will improve performance only for students with a disability” (p. 138).
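Read as a statement about cell means, the interaction hypothesis in the quotations above compares the accommodation gain of students with disabilities against the gain of the general group. The sketch below shows that comparison as a simple difference-in-differences. The function names and the cell means are hypothetical and chosen only for illustration; a real study would also test the interaction term statistically.

```python
def gains(means):
    """Return the accommodation gain (ACC minus No ACC) for each group.

    `means` maps a group name to {"acc": mean, "no_acc": mean}.
    """
    return {group: m["acc"] - m["no_acc"] for group, m in means.items()}


def interaction_contrast(means, target="SWD", reference="GEN"):
    """Difference-in-differences: target-group gain minus reference-group gain."""
    g = gains(means)
    return g[target] - g[reference]


# Hypothetical cell means, not taken from any study cited in this presentation.
example = {
    "GEN": {"acc": 50.0, "no_acc": 50.0},
    "SWD": {"acc": 40.0, "no_acc": 30.0},
}

print(gains(example))                 # gain per group
print(interaction_contrast(example))  # positive when the target group gains more
```

Under the strict form of the hypothesis, the reference-group gain would be about zero and the contrast positive.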

Are there any general conclusions regarding effects?

• Extended time seems to help and it helps SWD more than non-SWD.

• Oral accommodations show promise (math), but less uniformity across studies. Effects are considered unclear.

Review Process

• ERIC and PsycINFO searches

• E-mails to researchers in this area

Structure of review

• Dimension 1: SWD or ELL

• Dimension 2: Type of accommodation

• Dimension 3: Experimental or non-experimental study

Note that the review was primarily conducted in 2003 and so the results are somewhat dated. We have, however, reviewed additional research since then.

Characteristics of Studies

Research Design | Study Focused On: SWD | Study Focused On: ELL | Total
Experimental | 13 | 8 | 21
Quasi-experimental | 2 | 4 | 6
Non-experimental | 10 | 1 | 11
Total | 25 | 13 | 38

Studies pertaining exclusively to ELL will not be discussed in this presentation.

Types of Accommodations

Type(s) of Accommodation | # of Studies
Presentation: |
Oral* | 23
Paraphrase | 2
Technological | 2
Braille/Large Print | 1
Sign Language | 1
Encouragement | 1
Cueing | 1
Spelling assistance | 1
Manipulatives | 1

*Includes read-aloud, audiotape or videotape, and screen-reading software.

Note: Literature reviews and issues papers are not included in this table.

Types of Accommodations

Type(s) of Accommodation | # of Studies
Timing: |
Extended time | 14
Multi day/sessions | 1
Separate sessions | 1
Response: |
Scribes | 2
In booklet vs. answer sheet | 1
Mark task book to maintain place | 1
Transcription | 1
Setting (separate room) | 1

Note: Literature reviews and issues papers are not included in this table.

Characteristics of Studies

• Most of the studies focused on elementary school (2/3 between grades 3 and 8).

• Only 41% were published in peer-reviewed journals.

Results: Extended Time

• Most common findings were gains for both SWD and non-SWD.

– Contrast Camara et al. (1998) with Bridgeman et al. (in press)

• Most studies of extended time (6 of 8) looked at students with learning disabilities (SWLD)

Summary of Studies on Extended Time (1)

Study | Subject(s) | Design | Results | H1?
Elliott & Marquart (2004) | Math | Experimental | All student groups gained | No
Runyan (1991) | Reading | Experimental | Greater gains for SWD | Yes
Zurcher & Bryant (2001) | Analogy test | Quasi-experimental | No gains for either group | No
Huesman & Frisbie (2000) | Reading | Quasi-experimental | Gains for LD but not for non-LD groups | Yes
Alster (1997) | Math | Quasi-experimental | Greater gains for SWD | Yes

Summary of Studies on Extended Time (2)

Study | Subject(s) | Design | Results | H1?
Camara, Copeland, & Rothchild (1998) | SAT | Ex post facto | Gains for LD retesters 3x greater than gains of standard retesters | Yes
Ziomek & Andrews (1998) | ACT | Ex post facto | Gains for LD retesters 4x greater than gains of standard retesters | Yes
Zuriff (2000) | Reading, ACT, GRE | 5 experimental | Gains for both SWD and non-SWD | No
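The H1? column in the two extended-time summary tables can be tallied directly. The sketch below is a simple vote count, assuming that "Yes" in that column means the study supported the interaction hypothesis. The rows are copied from the tables; the function names are hypothetical. A vote count ignores sample size and effect magnitude, so it is only a rough summary.

```python
from collections import Counter

# (study, design, H1 supported?) as listed in the two extended-time tables.
extended_time = [
    ("Elliott & Marquart (2004)", "Experimental", False),
    ("Runyan (1991)", "Experimental", True),
    ("Zurcher & Bryant (2001)", "Quasi-experimental", False),
    ("Huesman & Frisbie (2000)", "Quasi-experimental", True),
    ("Alster (1997)", "Quasi-experimental", True),
    ("Camara, Copeland, & Rothchild (1998)", "Ex post facto", True),
    ("Ziomek & Andrews (1998)", "Ex post facto", True),
    ("Zuriff (2000)", "5 experimental", False),
]


def tally_h1(rows):
    """Count rows whose H1? entry is Yes versus No."""
    return Counter("Yes" if supported else "No" for _, _, supported in rows)


def tally_by_design(rows):
    """Map each design label to (number of Yes rows, total rows)."""
    counts = {}
    for _, design, supported in rows:
        yes, total = counts.get(design, (0, 0))
        counts[design] = (yes + int(supported), total + 1)
    return counts


print(tally_h1(extended_time))
print(tally_by_design(extended_time))
```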

Results: Oral

• Results depend on subject

– Gains for SWD only in Math

– No differential gain in other subject areas

– Tends to support oral accommodation for math tests

Study | Subject | Design | Results | H1?
Weston (2002) | Math | Experimental (b/w and w/in groups) | Greater gains for SWD | Yes
Tindal, Heath, et al. (1998) | Math | Experimental (b/w and w/in groups) | Sig. gain for SWD only | Yes
Calhoon, Fuchs, & Hamlett (2000) | Math | Experimental (w/in group) | Sig. gains for oral accom., no differences b/w teacher & computer | Yes
Johnson (2000) | Math | Experimental (b/w group) | Greater gains for SWD | Yes
Huynh, Meyer, & Gallant (2004) | Math | Ex post facto | Accommodated SWD > matched non-accom. SWD | Yes
Helwig & Tindal (2003) | Math | Quasi-experimental | Teachers not accurate in predicting benefit; no gains for either group. | No
Meloy, Deville, & Frisbie (2000) | Science, Math, Reading | Experimental (b/w and w/in groups) | Similar gains for SWD and non-SWD | No

Oral (continued)

Study | Subject | Design | Results | H1?
Brown & Augustine (2001) | Science, Social Studies | Experimental (b/w and w/in groups) | No gain | No
Kosciolek & Ysseldyke (2000) | Reading | Quasi-experimental | SWD had greater gains, but not statistically significant | No
McKevitt & Elliott (2003) | Reading | Experimental (b/w and w/in groups) | No sig. effect size differences b/w accom. & standard conditions for either group. | No

More Recent Research

• Extended time

– Cohen, Gregg, & Deng (2005)

– Wainer, Bridgeman, Najarian, & Trapani (2004)

• Oral

– Fletcher, Francis, Boudousquie, Copeland, Young, Kalinowski, & Vaughn (2006)

• Dictation software

– MacArthur & Cavalier (2004)

Cohen, Gregg, & Deng (2005)

• Looked at groups of students with and without accommodations and their performance on specific types of math items using differential item functioning (DIF) methods

– Accommodation status “only marginally related to the pattern of accommodation-related DIF”

– Different types of students benefited from the extra time

– DIF not due to accommodations, but to differences in students’ performance across different types of math items

Cohen, Gregg, & Deng (2005)

“Accommodations are more appropriately viewed as leveling the playing field; they do not supply the knowledge necessary to pass tests” (p. 231).

Wainer et al. (2004)

• Reanalysis of Bridgeman, Trapani, & Curley (2004) data

• Evaluated extended time by shortening experimental sections of SAT

• Little difference for verbal (about 5-point gain)

• Big difference for quantitative

– about 10-30 points, with larger gain associated with larger time extension

– Largest gains for highest-scoring students

Wainer et al. (2004)

• Looked at correlations b/w scores from standard and extended time with students’ HS math grades

– Claimed no relationship, but results (correlations and sample sizes) were not reported!

– Important idea to look at external validity criterion

Wainer et al. (2004)

• Claim that results support not flagging verbal scores but flagging quantitative scores

– Don’t acknowledge presence of undesired speededness

– SWD not included in study

• Hard to agree with conclusions

• Supports increasing time limit on SAT-Q

Fletcher et al. (2006)

• Experimental study involving Grade 3 students with (n=91) and without (n=91) decoding difficulties associated with dyslexia

• Oral accommodation vs. standard administration of a reading test (Texas)

Fletcher et al. (2006)

• Accommodation targeted for specific disability

– Oral reading of proper nouns, comprehension stems, & answer choices

– Designed to reduce the impact of word recognition difficulties

Fletcher et al. (2006)

• Results

– Significant group/accommodation interaction

– Only SWD benefited from the accommodation

– Seven times greater likelihood of passing the test with the accommodation

MacArthur & Cavalier (2004)

• Looked at accommodations for writing assessments

– Experimental study: SWD (n=21), students w/o documented disability (n=10)

– Three accommodation conditions:

• hand-written

• dictation to scribe

• dictation to speech recognition software

– 48 states allow dictation accommodation (17 exclude scores)

MacArthur & Cavalier (2004)

• Results:

– Dictation improved writing scores for SWD, with Scribe > speech recognition software > hand-written

– Dictation did not improve scores for students w/o disability

– No difference between student groups with respect to preference (hand vs. dictation)

MacArthur & Cavalier (2004)

• Caveat

– Small n (21, 10)

• Construct issue

– Dictation okay if construct = “composing”

– Not okay if construct = “writing”

Research on Equivalence of Test Structure

• One aspect of “construct equivalence”

– Rock, Bennett, Kaplan, & Jirele (1988)

– Tippets & Michaels (1997)

– Huynh, Meyer, & Gallant (2004)

– Huynh & Barton (2006)

– Cook, Eignor, Sawaki, Steinberg, & Cline (2006)

Research on Equivalence of Test Structure

Results tend to support similarity of test structure across accommodated and standard test administrations (oral, extended time, various).

Discussion (1)

• Do accommodations hurt or promote valid score interpretations for students with disabilities?

– Accommodations are designed to promote validity by removing barriers (irrelevant variance)

– In general, the research suggests the accommodations being used are sensible and defensible.

Discussion (2)

• Extended time seems to be a valid accommodation.

– Unintended test speededness could explain results for students w/o disabilities

– Results support revised interaction hypothesis or “differential boost.”

Interaction Hypothesis: Typical

[Figure: Illustration of Interaction Hypothesis. x-axis: Accommodation Condition (ACC, No ACC); y-axis: Mean Score (10 to 60); one line per GROUP (GEN, SWD/ELL).]

Interaction Hypothesis: Revised “Differential Boost” (Fuchs, Fuchs, Eaton, Hamlett, & Karns, 2000)

[Figure: Illustration of Revised Interaction Hypothesis. x-axis: Accommodation Condition (ACC, No ACC); y-axis: Mean Score (10 to 60); one line per GROUP (GEN, SWD/ELL).]
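The typical and revised forms of the hypothesis differ only in what they allow for the general group: the typical form expects no gain there, while "differential boost" allows both groups to gain as long as SWD gain more. A minimal sketch of that distinction follows. The function name, the labels, and the `tol` threshold are hypothetical, and the rule ignores sampling error, so it is not a substitute for a statistical test of the interaction.

```python
def classify_pattern(gain_swd, gain_gen, tol=0.0):
    """Label a pair of accommodation gains (ACC mean minus No ACC mean).

    `tol` is a hypothetical threshold at or below which a gain counts as zero.
    """
    if gain_swd <= tol:
        return "no gain for SWD"
    if gain_gen <= tol:
        return "typical interaction"    # only SWD gain
    if gain_swd > gain_gen:
        return "differential boost"     # both gain, SWD gain more
    return "no differential gain"       # both gain, SWD gain no more than GEN


# Hypothetical gains, for illustration only.
print(classify_pattern(10, 0))
print(classify_pattern(10, 4))
print(classify_pattern(5, 5))
```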

Discussion (3)

• Other accommodations have less consistent and convincing results, but no evidence of “harm” or “unfairness.”

• It should be noted that lots of solid and ingenious experimental research has been done in this area.

– Small n, but intense with respect to data collection

Discussion (4)

• Oral accommodation for math seems valid.

• Oral accommodation for reading involves consideration of specific construct changes

– Fletcher et al. (2006) results indicate matching disability and accommodation to one aspect of construct promotes validity

Discussion (5)

• Looking across various studies and accommodation conditions

– Lots of variability across studies with respect to

• Accommodation conditions and how they were implemented

• Student groups (within and between)

• Results

Future Directions for Test Design

• Test Development: Universal test design

– Build tests that are “accessible to all” (i.e., that do not need to be accommodated).

– Computer-based testing (CBT) could be particularly helpful in this regard.

– 19th & 20th century: Standardization

– 21st century? Adaptivity? (can’t be oxymoronic)

Future Directions for Research (1)

• Meta-analysis based on practice

– Non-published test accommodations being conducted in states

– Establish a data warehouse for teachers and test administrators to record results and make comments?

– Would address the small-n issue
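One way to see why pooling results would address the small-n issue: combining several imprecise estimates yields a more precise one. The sketch below uses fixed-effect inverse-variance weighting, a standard meta-analytic technique that this presentation does not itself specify. The effect sizes and variances are hypothetical placeholders, not results from any cited study.

```python
import math


def pooled_effect(effects, variances):
    """Fixed-effect inverse-variance pooled estimate and its standard error."""
    if len(effects) != len(variances) or not effects:
        raise ValueError("need equal-length, non-empty inputs")
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    estimate = sum(w * e for w, e in zip(weights, effects)) / total
    return estimate, math.sqrt(1.0 / total)


# Hypothetical small-n results: effect sizes and their sampling variances.
effects = [0.40, 0.10, 0.25, 0.30]
variances = [0.20, 0.20, 0.20, 0.20]

estimate, se = pooled_effect(effects, variances)
print(round(estimate, 4), round(se, 4))
```

With equal variances the pooled estimate is the plain average, and its standard error is smaller than that of any single study.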

Future Directions for Research (2)

• Larger sample sizes due to inclusion, coupled with improved school data management systems, should promote more research on

– Differential item functioning

– Structural equivalence

– Analysis of educational gains

Future Directions for Research (3)

• More needs to be done on potential changes to the construct

– Most often decided by logical analysis

– Structural equivalence research is limited

– Structural equivalence ≠ construct equivalence

Let’s go do it!

Thank you for your attention!
