CRESST/UCLA
Impact of Linguistic Factors in Content-Based Assessment for ELL Students
Jamal Abedi
UCLA Graduate School of Education & Information Studies
Center for the Study of Evaluation
National Center for Research on Evaluation, Standards, and Student Testing
Paper presented at the 2003 Annual Meeting of the American Educational Research Association
Chicago
April 2003
Validity of Academic Achievement Measures
We will focus on construct and content validity approaches:
A test’s content validity involves the careful definition of the domain of behaviors to be measured by a test and the logical design of items to cover all the important areas of this domain (Allen & Yen, 1979, p. 96).
A test’s construct validity is the degree to which it measures the theoretical construct or trait that it was designed to measure (Allen & Yen, 1979, p. 108).
A content-based achievement test has construct validity if it measures the content that it is supposed to measure.
A content-based achievement test has content validity if the test content is representative of the content being measured.
Two major questions on the psychometrics of academic achievement tests for ELLs:
1. Are there any sources of measurement error that may specifically influence ELL performance?
2. Do achievement tests accurately measure ELLs’ content knowledge?
Linguistic Modification Concerns
Familiarity/frequency of non-math vocabulary: unfamiliar or infrequent words changed
census > video game
Length of nominals: long nominals shortened
last year’s class vice president > vice president
Question phrases: complex question phrases changed to simple question words
At which of the following times > When
Linguistic Modification (continued)
Conditional clauses: conditionals either replaced with separate sentences or order of conditional and main clause changed
If Lee delivers x newspapers > Lee delivers x newspapers
Relative clauses: relative clauses either removed or recast
A report that contains 64 sheets of paper > He needs 64 sheets of paper for each report
Voice of verb phrase: passive verb forms changed to active
The weights of 3 objects were compared > Sandra compared the weights of 3 rabbits
CRESST Studies on the Assessment and Accommodation of ELL Students
Analyses of extant data (Abedi, Lord, & Plummer, 1995)
Used existing data from the 1992 NAEP assessments in math and science.
SAMPLE: Approximately 100,000 ELL and non-ELL students in grades 4, 8, and 12.
NAEP test items were grouped into long and short items.
Findings
ELL students performed significantly lower on the longer test items.
ELL students had higher proportions of omitted and/or not-reached items.
ELL students had higher scores on the less linguistically complex items.
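The slide does not say how the NAEP items were split into long and short. As a minimal sketch, assuming a median split on item word count (a hypothetical rule, not necessarily the one used in the 1995 analyses), the ELL/non-ELL gap in proportion correct can be compared across the two halves. The function name, input format, and example items are all illustrative.

```python
from statistics import mean, median


def gap_by_item_length(items):
    """Compare the non-ELL minus ELL p-value gap on short vs. long items.

    items: list of (word_count, p_non_ell, p_ell) tuples, where each p-value
    is the proportion of that group answering the item correctly.
    Hypothetical rule: items at or below the median word count are "short".
    """
    cut = median(word_count for word_count, _, _ in items)
    short_gaps = [p_non - p_ell for wc, p_non, p_ell in items if wc <= cut]
    long_gaps = [p_non - p_ell for wc, p_non, p_ell in items if wc > cut]
    if not short_gaps or not long_gaps:
        raise ValueError("median split left one group empty")
    return {"short": mean(short_gaps), "long": mean(long_gaps)}


# Made-up items for illustration only (not NAEP data).
example = [(10, 0.80, 0.70), (12, 0.70, 0.60), (40, 0.70, 0.40), (50, 0.60, 0.30)]
print(gap_by_item_length(example))
```

A larger mean gap on the long half would mirror the pattern reported above, although a word-count split alone does not separate linguistic complexity from content difficulty.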
Interview study (Abedi, Lord, & Plummer, 1997)
37 students were asked to state their preference between the original NAEP items and linguistically modified versions of the same items. Math test items were modified to reduce the level of linguistic complexity.
Findings
Over 80% of the students interviewed preferred the linguistically modified items over the original versions.
Impact of linguistic factors on students’ performance (Abedi, Lord, & Plummer, 1997)
Two studies: testing performance and speed.
SAMPLE: 1,031 grade 8 ELL and non-ELL students. 41 classes from 21 southern California schools.
Findings
ELL students who received a linguistically modified version of the math test items performed significantly better than those receiving the original test items.
The impact of different types of accommodations on students with limited English proficiency (Abedi, Lord, & Hofstetter, 1997)
SAMPLE: 1,394 grade 8 students. 56 classes from 27 southern California schools.
Findings
Spanish translation of NAEP math test
Spanish speakers taking the Spanish translation performed significantly lower than Spanish speakers taking the English version. We believe this is due to the impact of the language of instruction on assessment.
Linguistic Modification
Contributed to improved performance on 49% of the items.
Extra Time
Helped grade 8 ELL students on NAEP math tests.
Also aided non-ELL students.
Limited potential as an assessment accommodation.
Impact of selected background variables on students’ NAEP math performance (Abedi, Hofstetter, & Lord, 1998)
SAMPLE: 946 grade 8 ELL and non-ELL students. 38 classes from 19 southern California schools.
Findings
Four different accommodations were used (linguistic modification, glossary only, extra time only, and glossary plus extra time).
Glossary plus extra time was the most effective accommodation.
Glossary plus extra time accommodation
Non-ELLs showed a greater improvement (16%) than ELLs (13%). This is the opposite of what is expected and casts doubt on the validity of this accommodation.
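The expectation behind this finding can be made concrete by comparing relative gains: a valid language accommodation should raise ELL scores more than non-ELL scores. A minimal sketch, assuming gains are expressed as percent improvement over the standard-condition mean (the slide does not say how the 13% and 16% were computed); the function names and the means below are hypothetical.

```python
def percent_gain(standard_mean, accommodated_mean):
    """Percent improvement of the accommodated mean over the standard mean."""
    return (accommodated_mean - standard_mean) / standard_mean * 100


def differential_boost(ell_standard, ell_accom, non_ell_standard, non_ell_accom):
    """ELL percent gain minus non-ELL percent gain.

    Positive: the accommodation helped ELLs more (the expected pattern).
    Zero or negative: non-ELLs gained as much or more, which raises a
    validity concern like the one noted for glossary plus extra time.
    """
    return (percent_gain(ell_standard, ell_accom)
            - percent_gain(non_ell_standard, non_ell_accom))


# Made-up means chosen only so the gains come out near 13% and 16%.
boost = differential_boost(20.0, 22.6, 25.0, 29.0)
print(round(boost, 1))
```

Working with percent gains rather than raw score differences keeps groups with different baseline means comparable; a negative boost, as here, is the pattern the slide flags.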
The effects of accommodations on the assessment of LEP students in NAEP (Abedi, Lord, Kim, & Miyoshi, 2000)
SAMPLE: 422 grade 8 ELL and non-ELL students. 17 science classes from 9 southern California schools.
Findings
Some forms of accommodation may help recipients with the content being assessed. For example, a published dictionary defines all the words in a test, both content and non-content words.
A Customized Dictionary
Easier to use than a published dictionary
Included only the non-content words in the test
ELL students showed significant improvement in performance
No impact on non-ELL performance
Language accommodation for large-scale assessment in science (Abedi, Courtney, Leon, Mirocha, & Goldberg, 2001)
SAMPLE: 612 grades 4 and 8 students. 25 classes from 14 southern California schools.
Findings
A published dictionary was both ineffective and administratively difficult to implement as an accommodation.
Language accommodation for large-scale assessment in science (Abedi, Courtney, & Leon, 2001)
SAMPLE: 1,856 grade 4 and 1,512 grade 8 ELL and non-ELL students. 132 classes from 40 school sites in four cities in three states.
Findings
Results suggested that linguistic modification of test items improved the performance of ELLs in grade 8.
No change in the performance of non-ELLs with the modified test.
The validity of assessment was not compromised by the provision of an accommodation.
Impact of students’ language background on content-based performance: Analyses of extant data (Abedi & Leon, 1999)
Analyses were performed on extant data, such as Stanford 9 and ITBS.
SAMPLE: Over 900,000 students from four different sites nationwide.
Examining ELL and non-ELL student performance differences and their relationship to background factors (Abedi, Leon, & Mirocha, 2001)
Data were analyzed for the impact of language on the assessment and accommodation of ELL students.
SAMPLE: Over 700,000 students from four different sites nationwide.
Findings
The higher the level of language demand of the test items, the larger the performance gap between ELL and non-ELL students.
There was a large performance gap between ELL and non-ELL students in reading, science, and math problem solving (about 15 NCE score points).
This performance gap was reduced to zero in math computation.
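NCE (Normal Curve Equivalent) scores are normalized scores with a mean of 50 and a standard deviation of 21.06, chosen so that NCEs of 1, 50, and 99 line up with percentile ranks 1, 50, and 99. The sketch below (function names are illustrative) converts a percentile rank to an NCE and expresses an NCE gap in standard-deviation units; by this arithmetic the 15-point gap above is roughly 0.71 SD.

```python
from statistics import NormalDist

NCE_SD = 21.06  # NCE scale: mean 50, standard deviation 21.06


def percentile_to_nce(percentile_rank):
    """Convert a percentile rank (strictly between 0 and 100) to an NCE score."""
    if not 0 < percentile_rank < 100:
        raise ValueError("percentile rank must be strictly between 0 and 100")
    z = NormalDist().inv_cdf(percentile_rank / 100)  # standard normal deviate
    return 50 + NCE_SD * z


def nce_gap_in_sd(gap):
    """Express a difference in NCE points as a fraction of one standard deviation."""
    return gap / NCE_SD


print(round(percentile_to_nce(90), 1))
print(round(nce_gap_in_sd(15), 2))
```

Because NCEs are an equal-interval scale, differences such as the 15-point gap can be averaged and compared across tests, which percentile ranks do not support.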
Normal Curve Equivalent Means and Standard Deviations for Students in Grades 10 and 11, Site 3 School District
                   Reading       Science       Math
                   M      SD     M      SD     M      SD
Grade 10
  SD only          16.4   12.7   25.5   13.3   22.5   11.7
  LEP only         24.0   16.4   32.9   15.3   36.8   16.0
  LEP & SD         16.3   11.2   24.8   9.3    23.6   9.8
  Non-LEP/Non-SD   38.0   16.0   42.6   17.2   39.6   16.9
  All students     36.0   16.9   41.3   17.5   38.5   17.0
Grade 11
  SD only          14.9   13.2   21.5   12.3   24.3   13.2
  LEP only         22.5   16.1   28.4   14.4   45.5   18.2
  LEP & SD         15.5   12.7   26.1   20.1   25.1   13.0
  Non-LEP/Non-SD   38.4   18.3   39.6   18.8   45.2   21.1
  All students     36.2   19.0   38.2   18.9   44.0   21.2
Note. LEP = limited English proficient. In row labels, SD = students with disabilities; in column headings, M = mean and SD = standard deviation.
The Disparity Index (DI) is an index of the performance difference between LEP and non-LEP students.
Site 3 Disparity Index (DI)
Non-LEP/Non-SD Students Compared to LEP-Only Students
Grade   Reading     Math Total    Math Calculation  Math Analytical
3       53.4        25.8          12.9              32.8
6       81.6        37.6          22.2              46.1
8       125.2       36.9          25.2              44.0
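The slides do not spell out how the DI is computed. The sketch below assumes a definition that, as far as we know, matches related CRESST reports: the difference between the non-LEP and LEP means expressed as a percentage of the LEP mean, so a DI of 53.4 would mean the non-LEP mean is 53.4% higher than the LEP mean. As an illustration only, it applies that assumed formula to the grade 10 reading NCE means in the Site 3 table for non-LEP students (38.0) and LEP-only students (24.0); those grades differ from the grades in the DI table, so the result is not one of the reported values.

```python
def disparity_index(mean_non_lep, mean_lep):
    """Assumed DI definition: percent by which the non-LEP mean exceeds the LEP mean."""
    if mean_lep <= 0:
        raise ValueError("LEP mean must be positive")
    return (mean_non_lep - mean_lep) / mean_lep * 100


# Grade 10 reading NCE means from the Site 3 table (illustration only).
print(round(disparity_index(38.0, 24.0), 1))  # prints 58.3
```

Scaling by the LEP mean makes the index relative, which is why the reading DI can exceed 100 when LEP means are low.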
Site 3 Grades 10 and 11 School District Item-Level Data: Raw-Score P-Value Differences with Non-LEP Students as the Reference Group (Reading, Science, and Math Stanford 9 Scores)
Percent of Items with Small, Moderate, and Large p-Value Differences*
              Reading (54 items)   Science (40 items)   Math (48 items)
              Small  Mod.   Large  Small  Mod.   Large  Small  Mod.   Large
Grade 10
  All LEP     18%    54%    28%    88%    10%    2%     100%   0%     0%
  Non-Accom.  54%    44%    2%     95%    5%     0%     100%   0%     0%
  Accom.      11%    30%    59%    68%    22%    10%    88%    12%    0%
Grade 11
  All LEP     11%    56%    33%    73%    23%    5%     98%    2%     0%
  Non-Accom.  37%    52%    11%    85%    10%    5%     100%   0%     0%
  Accom.      4%     30%    67%    68%    20%    13%    90%    10%    0%
Note. Non-Accom. = LEP students tested without accommodations; Accom. = LEP students tested with accommodations.
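An item's p-value here is its proportion correct (item difficulty), not a significance level. The cutoffs for small, moderate, and large differences were given in the slide's footnote (*), which did not survive in this transcript, so the sketch below uses hypothetical cutoffs of 0.05 and 0.15 purely for illustration, with non-LEP students as the reference group. All function names and the response data are made up.

```python
def p_values(responses):
    """Proportion correct per item; responses holds one 0/1 list per student."""
    n_students = len(responses)
    n_items = len(responses[0])
    return [sum(student[i] for student in responses) / n_students
            for i in range(n_items)]


def classify_differences(ref_p, focal_p, small_cut=0.05, large_cut=0.15):
    """Label each |reference - focal| p-value difference (cutoffs are hypothetical)."""
    labels = []
    for ref, focal in zip(ref_p, focal_p):
        diff = abs(ref - focal)
        if diff < small_cut:
            labels.append("small")
        elif diff < large_cut:
            labels.append("moderate")
        else:
            labels.append("large")
    return labels


def percent_by_category(labels):
    """Percent of items in each category, as reported in the table."""
    return {cat: 100 * labels.count(cat) / len(labels)
            for cat in ("small", "moderate", "large")}


non_lep = p_values([[1, 1, 1, 1], [1, 1, 1, 0], [1, 0, 1, 1]])  # made-up data
lep = p_values([[1, 0, 0, 1], [1, 1, 0, 0], [1, 0, 1, 1]])      # made-up data
print(percent_by_category(classify_differences(non_lep, lep)))
```

Classifying by the absolute difference treats items that favor either group alike; a signed version would be needed to tell whether LEP students are consistently disadvantaged.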
Site 3 Grade 11 Stanford 9 Reading and Science Structural Modeling Results (DF = 24)
                       All cases   Even cases  Odd cases   Non-LEP     LEP
                       (N=7,176)   (N=3,588)   (N=3,588)   (N=6,932)   (N=244)
Goodness of fit
  Chi-square           1786        943         870         1675        81
  NFI                  .931        .926        .934        .932        .877
  NNFI                 .898        .891        .904        .900        .862
  CFI                  .932        .928        .936        .933        .908
Factor loadings
  Reading variables
    Composite 1        .733        .720        .745        .723        .761
    Composite 2        .735        .730        .741        .727        .713
    Composite 3        .784        .779        .789        .778        .782
    Composite 4        .817        .822        .812        .816        .730
    Composite 5        .633        .622        .644        .636        .435
  Math variables
    Composite 1        .712        .719        .705        .709        .660
    Composite 2        .695        .696        .695        .701        .581
    Composite 3        .641        .628        .654        .644        .492
    Composite 4        .450        .428        .470        .455        .257
Factor correlation
  Reading vs. Math     .796        .796        .795        .797        .791
Note. NFI = Normed Fit Index. NNFI = Non-Normed Fit Index. CFI = Comparative Fit Index.
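All three fit indices compare the model chi-square with the chi-square of a null (independence) model. The slide reports only the model chi-square, so the sketch below uses the standard NFI, NNFI (Tucker-Lewis), and CFI formulas with two labeled assumptions: a null-model chi-square back-calculated from the reported NFI (.931), and a null model on the 9 observed composites listed (5 reading, 4 math), which gives df = 9 × 8 / 2 = 36. Under those assumptions the reported NNFI and CFI for all cases are reproduced to three decimals, which suggests the three indices are mutually consistent.

```python
def fit_indices(chi2_model, df_model, chi2_null, df_null):
    """Standard NFI, NNFI (Tucker-Lewis) and CFI from model and null-model chi-squares."""
    nfi = (chi2_null - chi2_model) / chi2_null
    ratio_null = chi2_null / df_null
    nnfi = (ratio_null - chi2_model / df_model) / (ratio_null - 1)
    excess_model = max(chi2_model - df_model, 0)
    excess_null = max(chi2_null - df_null, excess_model)
    cfi = 1 - excess_model / excess_null if excess_null > 0 else 1.0
    return {"NFI": nfi, "NNFI": nnfi, "CFI": cfi}


chi2_model, df_model = 1786, 24          # "All cases" column
chi2_null = chi2_model / (1 - 0.931)     # back-calculated from NFI, not reported
df_null = 36                             # assumed: 9 composites, 9 * 8 / 2
indices = fit_indices(chi2_model, df_model, chi2_null, df_null)
print({name: round(value, 3) for name, value in indices.items()})
```

NNFI and CFI penalize chi-square relative to its degrees of freedom, which is why they stay below .95 here even though the very large sample inflates the raw chi-square.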
Site 2 Stanford 9 Sub-scale Reliabilities (1998), Grade 9: Alphas
Sub-scale (Items)        Non-LEP students
                         Hi SES       Low SES      English Only FEP          RFEP         LEP
Reading                  N=205,092    N=35,855     N=181,202    N=37,876     N=21,869     N=52,720
  Vocabulary (30)        .828         .781         .835         .814         .759         .666
  Reading Comp. (54)     .912         .892         .916         .903         .877         .833
  Average reliability    .870         .837         .876         .859         .818         .750
Math                     N=207,155    N=36,588     N=183,262    N=38,329     N=22,152     N=54,815
  Total (48)             .899         .853         .898         .898         .876         .802
Language                 N=204,571    N=35,886     N=180,743    N=37,862     N=21,852     N=52,863
  Mechanics (24)         .801         .759         .803         .802         .755         .686
  Expression (24)        .818         .779         .823         .804         .757         .680
  Average reliability    .810         .769         .813         .803         .756         .683
Science                  N=163,960    N=28,377     N=144,821    N=29,946     N=17,570     N=40,255
  Total (40)             .800         .723         .805         .778         .716         .597
Social Science           N=204,965    N=36,132     N=181,078    N=38,052     N=21,967     N=53,925
  Total (40)             .803         .702         .805         .784         .722         .530
Note. SES = socioeconomic status. FEP = fluent English proficient. RFEP = redesignated fluent English proficient.
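The table labels its reliabilities as alphas without giving a formula. The sketch below assumes they are Cronbach's alpha, α = k/(k − 1) × (1 − Σ item variances / total-score variance), computed from a students × items score matrix; the function name and example scores are made up. The "Average reliability" rows appear to be simple means of the subscale alphas, e.g., (.828 + .912) / 2 = .870 for Hi SES reading.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of per-student item-score lists.

    Uses population variances; the n vs. n - 1 choice cancels in the ratio.
    """
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_variances = [pvariance([student[i] for student in scores]) for i in range(k)]
    total_variance = pvariance([sum(student) for student in scores])
    if total_variance == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Made-up scores: three students, two items.
print(round(cronbach_alpha([[2, 1], [1, 2], [3, 3]]), 3))
```

Alpha rises when items covary strongly relative to their individual variances, so the lower LEP alphas are consistent with language adding item-specific noise to these content scores.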