
C R E S S T / U C L A

Impact of Linguistic Factors in Content-Based Assessment for ELL Students

Jamal Abedi

UCLA Graduate School of Education & Information Studies

Center for the Study of Evaluation

National Center for Research on Evaluation, Standards, and Student Testing

Paper presented at the 2003 Annual Meeting of the American Educational Research Association

Chicago

April 2003


Validity of Academic Achievement Measures

We will focus on construct and content validity approaches:

A test’s content validity involves the careful definition of the domain of behaviors to be measured by a test and the logical design of items to cover all the important areas of this domain (Allen & Yen, 1979, p. 96).

A test’s construct validity is the degree to which it measures the theoretical construct or trait that it was designed to measure (Allen & Yen, 1979, p. 108).

A content-based achievement test has construct validity if it measures the content that it is supposed to measure.

A content-based achievement test has content validity if the test content is representative of the content being measured.

Examples:


Two major questions on the psychometrics of academic achievement tests for ELLs:

1. Are there any sources of measurement error that may specifically influence ELL performance?

2. Do achievement tests accurately measure ELLs’ content knowledge?


Familiarity/frequency of non-math vocabulary: unfamiliar or infrequent words changed

census > video game

Length of nominals: long nominals shortened

last year’s class vice president > vice president

Question phrases: complex question phrases changed to simple question words

At which of the following times > When

Linguistic Modification Concerns


Conditional clauses: conditionals either replaced with separate sentences or order of conditional and main clause changed

If Lee delivers x newspapers > Lee delivers x newspapers

Relative clauses: relative clauses either removed or re-cast

A report that contains 64 sheets of paper > He needs 64 sheets of paper for each report

Linguistic Modification (continued)

Voice of verb phrase: passive verb forms changed to active

The weights of 3 objects were compared > Sandra compared the weights of 3 rabbits


CRESST Studies on the Assessment and Accommodation of ELL Students


Analyses of extant data (Abedi, Lord, & Plummer, 1995)

Used existing data from the 1992 NAEP assessments in math and science.

SAMPLE: Approximately 100,000 ELL and non-ELL students in grades 4, 8, and 12.

NAEP test items were grouped into long and short items.

Findings

ELL students performed significantly lower on the longer test items.

ELL students had higher proportions of omitted and/or not-reached items.

ELL students had higher scores on the less linguistically complex items.


Interview study (Abedi, Lord, & Plummer, 1997)

37 students were asked to express their preference between the original NAEP items and linguistically modified versions of the same items. Math test items were modified to reduce the level of linguistic complexity.

Findings

Over 80% of the students interviewed preferred the linguistically modified items over the original version.


Impact of linguistic factors on students’ performance (Abedi, Lord, & Plummer, 1997)

Two studies: testing performance and speed.

SAMPLE: 1,031 grade 8 ELL and non-ELL students. 41 classes from 21 southern California schools.

Findings

ELL students who received a linguistically modified version of the math test items performed significantly better than those receiving the original test items.


The impact of different types of accommodations on students with limited English proficiency (Abedi, Lord, & Hofstetter, 1997)

SAMPLE: 1,394 grade 8 students. 56 classes from 27 southern California schools.

Findings

Spanish translation of NAEP math test: Spanish speakers taking the Spanish translation performed significantly lower than Spanish speakers taking the English version. We believe that this is due to the impact of language of instruction on assessment.

Linguistic modification: Contributed to improved performance on 49% of the items.

Extra time: Helped grade 8 ELL students on NAEP math tests. Also aided non-ELL students. Limited potential as an assessment accommodation.


Impact of selected background variables on students’ NAEP math performance. (Abedi, Hofstetter, & Lord, 1998)

SAMPLE: 946 grade 8 ELL and non-ELL students. 38 classes from 19 southern California schools.

Findings

Four different accommodations were used (linguistically modified items, a glossary only, extra time only, and a glossary plus extra time). The glossary plus extra time was the most effective accommodation.

Glossary plus extra time accommodation: Non-ELLs showed a greater improvement (16%) than ELLs (13%). This is the opposite of what is expected and casts doubt on the validity of this accommodation.
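The reasoning behind this validity concern can be made explicit with a small sketch. It assumes, as the slide implies, that a valid accommodation should help ELL students more than it helps non-ELL students. The function name `differential_gain` is hypothetical and is not taken from the study.

```python
def differential_gain(ell_gain_pct: float, non_ell_gain_pct: float) -> float:
    """Gain for ELLs minus gain for non-ELLs, in percentage points.

    A clearly positive value is what one would hope for from a valid
    accommodation; a value at or below zero suggests the accommodation
    is helping everyone (or helping non-ELLs more) and may be altering
    what the test measures.
    """
    return ell_gain_pct - non_ell_gain_pct


# Improvements reported on the slide for glossary plus extra time.
gap = differential_gain(ell_gain_pct=13.0, non_ell_gain_pct=16.0)
print(f"Differential gain: {gap:+.1f} percentage points")
```

With the reported figures the differential gain is negative, which is the pattern the slide flags as casting doubt on the accommodation.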


The effects of accommodations on the assessment of LEP students in NAEP (Abedi, Lord, Kim, & Miyoshi, 2000)

SAMPLE: 422 grade 8 ELL and non-ELL students. 17 science classes from 9 southern California schools.

Findings

Some forms of accommodation may help the recipients with the content of the assessment. For example, a dictionary defines all the words in a test, both content and non-content.

Customized dictionary: Easier to use than a published dictionary; included only the non-content words in the test. ELL students showed significant improvement in performance, with no impact on non-ELL performance.


Language accommodation for large-scale assessment in science (Abedi, Courtney, Leon, Mirocha, & Goldberg, 2001)

SAMPLE: 612 grade 4 and grade 8 students. 25 classes from 14 southern California schools.

Findings

A published dictionary was both ineffective and administratively difficult to implement as an accommodation.


Language accommodation for large-scale assessment in science (Abedi, Courtney, & Leon, 2001)

SAMPLE: 1,856 grade 4 and 1,512 grade 8 ELL and non-ELL students. 132 classes from 40 school sites in four cities, three states.

Findings

Results suggested that linguistic modification of test items improved the performance of ELLs in grade 8.

There was no change in the performance of non-ELLs with the modified test.

The validity of the assessment was not compromised by the provision of an accommodation.


Impact of students’ language background on content-based performance: Analyses of extant data (Abedi & Leon, 1999)

Analyses were performed on extant data, such as Stanford 9 and ITBS.

SAMPLE: Over 900,000 students from four different sites nationwide.

Examining ELL and non-ELL student performance differences and their relationship to background factors (Abedi, Leon, & Mirocha, 2001)

Data were analyzed for the language impact on assessment and accommodations of ELL students.

SAMPLE: Over 700,000 students from four different sites nationwide.

Findings

The higher the level of language demand of the test items, the larger the performance gap between ELL and non-ELL students.

There was a large performance gap between ELL and non-ELL students in reading, science, and math problem solving (about 15 NCE score points).

This performance gap was reduced to zero in math computation.


Normal Curve Equivalent Means and Standard Deviations for Students in Grades 10 and 11, Site 3 School District

                  Reading         Science         Math
                  M      SD       M      SD       M      SD

Grade 10
SD only           16.4   12.7     25.5   13.3     22.5   11.7
LEP only          24.0   16.4     32.9   15.3     36.8   16.0
LEP & SD          16.3   11.2     24.8    9.3     23.6    9.8
Non-LEP & SD      38.0   16.0     42.6   17.2     39.6   16.9
All students      36.0   16.9     41.3   17.5     38.5   17.0

Grade 11
SD only           14.9   13.2     21.5   12.3     24.3   13.2
LEP only          22.5   16.1     28.4   14.4     45.5   18.2
LEP & SD          15.5   12.7     26.1   20.1     25.1   13.0
Non-LEP & SD      38.4   18.3     39.6   18.8     45.2   21.1
All students      36.2   19.0     38.2   18.9     44.0   21.2

Note. LEP = limited English proficient. SD = students with disabilities.
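As background for reading the table, a Normal Curve Equivalent (NCE) is a normalized score with a mean of 50 and a standard deviation of about 21.06, scaled so that NCEs of 1, 50, and 99 line up with percentile ranks 1, 50, and 99. The sketch below shows that conversion; the function name `percentile_to_nce` is hypothetical, and nothing here reproduces how the site's scores were actually derived.

```python
from statistics import NormalDist

NCE_MEAN = 50.0
NCE_SD = 21.06  # standard scaling constant for the NCE metric


def percentile_to_nce(percentile_rank: float) -> float:
    """Convert a national percentile rank (strictly between 0 and 100) to an NCE."""
    if not 0.0 < percentile_rank < 100.0:
        raise ValueError("percentile_rank must be strictly between 0 and 100")
    z = NormalDist().inv_cdf(percentile_rank / 100.0)  # normal deviate
    return NCE_MEAN + NCE_SD * z


for pr in (1, 25, 50, 75, 99):
    print(f"percentile {pr:>2} -> NCE {percentile_to_nce(pr):.1f}")
```

Because NCEs form an equal-interval scale, group means and gaps such as those in the table can be averaged and subtracted in a way that percentile ranks cannot.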


Disparity Index (DI) is an index of performance differences between LEP and non-LEP.

Site 3 Disparity Index (DI)

Non-LEP/Non-SD Students Compared to LEP-Only Students

Disparity Index (DI)

Grade    Reading    Math Total    Math Calculation    Math Analytical
3         53.4      25.8          12.9                32.8
6         81.6      37.6          22.2                46.1
8        125.2      36.9          25.2                44.0
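The slides do not spell out how the Disparity Index is computed. A minimal sketch follows, assuming the DI is the non-LEP mean minus the LEP mean, expressed as a percentage of the LEP mean; that formula, and the function name `disparity_index`, are assumptions rather than something stated in this presentation. The example inputs are the Grade 10 reading NCE means from the earlier table (38.0 for the Non-LEP & SD row, 24.0 for the LEP only row), used only for illustration; whether those are the same groups that underlie the DI table is also an assumption.

```python
def disparity_index(non_lep_mean: float, lep_mean: float) -> float:
    """Assumed DI: percentage by which the non-LEP mean exceeds the LEP mean."""
    if lep_mean == 0:
        raise ValueError("LEP mean must be non-zero")
    return (non_lep_mean - lep_mean) / lep_mean * 100.0


# Illustrative inputs: Grade 10 reading NCE means from the earlier table.
di_reading = disparity_index(non_lep_mean=38.0, lep_mean=24.0)
print(f"Illustrative reading DI: {di_reading:.1f}")
```

Under this reading, a DI of 0 means no gap, and a DI above 100 (as for grade 8 reading) means the non-LEP mean is more than double the LEP mean.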


Site 3 Grades 10 and 11 School District Item Level Data: Raw Score P-Value Difference with Non-LEP students as a Reference—Reading, Science, and Math Stanford 9 Scores

Percent of Items with Small, Moderate & Large p-value differences*

              Reading (54 Items)       Science (40 Items)       Math (48 Items)
              Small   Mod.   Large     Small   Mod.   Large     Small   Mod.   Large

Grade 10
All LEP       18%     54%    28%       88%     10%    2%        100%    0%     0%
Non-Accom.    54%     44%    2%        95%     5%     0%        100%    0%     0%
Accom.        11%     30%    59%       68%     22%    10%       88%     12%    0%

Grade 11
All LEP       11%     56%    33%       73%     23%    5%        98%     2%     0%
Non-Accom.    37%     52%    11%       85%     10%    5%        100%    0%     0%
Accom.        4%      30%    67%       68%     20%    13%       90%     10%    0%


Site 3 Grade 11 Stanford 9 Reading and Science Structural Modeling Results (DF = 24)

                      All cases    Even cases   Odd cases    Non-LEP      LEP
                      (N=7,176)    (N=3,588)    (N=3,588)    (N=6,932)    (N=244)

Goodness of fit
Chi Square            1786         943          870          1675         81
NFI                   .931         .926         .934         .932         .877
NNFI                  .898         .891         .904         .900         .862
CFI                   .932         .928         .936         .933         .908

Factor Loadings

Reading Variables
Composite 1           .733         .720         .745         .723         .761
Composite 2           .735         .730         .741         .727         .713
Composite 3           .784         .779         .789         .778         .782
Composite 4           .817         .822         .812         .816         .730
Composite 5           .633         .622         .644         .636         .435

Math Variables
Composite 1           .712         .719         .705         .709         .660
Composite 2           .695         .696         .695         .701         .581
Composite 3           .641         .628         .654         .644         .492
Composite 4           .450         .428         .470         .455         .257

Factor Correlation
Reading vs Math       .796         .796         .795         .797         .791

Note. NFI = Normed Fit Index. NNFI = Non-Normed Fit Index. CFI = Comparative Fit Index.


Site 2 Stanford 9 Sub-scale Reliabilities (1998) GRADE 9 Alpha’s

                        Non-LEP Students
Sub-scale (Items)       Hi SES       Low SES      English Only   FEP          RFEP         LEP

Reading                 N=205,092    N=35,855     N=181,202      N=37,876     N=21,869     N=52,720
-Vocabulary (30)        .828         .781         .835           .814         .759         .666
-Reading Comp. (54)     .912         .892         .916           .903         .877         .833
Average reliability     .870         .837         .876           .859         .818         .750

Math                    N=207,155    N=36,588     N=183,262      N=38,329     N=22,152     N=54,815
-Total (48)             .899         .853         .898           .898         .876         .802

Language                N=204,571    N=35,886     N=180,743      N=37,862     N=21,852     N=52,863
-Mechanics (24)         .801         .759         .803           .802         .755         .686
-Expression (24)        .818         .779         .823           .804         .757         .680
Average reliability     .810         .769         .813           .803         .756         .683

Science                 N=163,960    N=28,377     N=144,821      N=29,946     N=17,570     N=40,255
-Total (40)             .800         .723         .805           .778         .716         .597

Social Science          N=204,965    N=36,132     N=181,078      N=38,052     N=21,967     N=53,925
-Total (40)             .803         .702         .805           .784         .722         .530
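The alphas in this table are internal-consistency reliabilities (Cronbach's alpha). For readers who want to see what the coefficient measures, here is a minimal sketch of the standard formula applied to a tiny made-up response matrix; the function name `cronbach_alpha` and the toy data are illustrative only and have no connection to the Stanford 9 data.

```python
from statistics import pvariance


def cronbach_alpha(scores: list[list[float]]) -> float:
    """Cronbach's alpha for a students-by-items score matrix.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance of total scores)
    """
    k = len(scores[0])  # number of items
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_vars = [pvariance([row[i] for row in scores]) for i in range(k)]
    total_var = pvariance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Toy data (hypothetical): 5 students, 4 items scored right (1) or wrong (0).
toy = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(f"alpha = {cronbach_alpha(toy):.3f}")
```

A lower alpha for one group, as for LEP students in the science and social science rows, means the items hang together less consistently for that group, which is one way construct-irrelevant sources such as language demand can show up as measurement error.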
