Language Testing 2005 22 (2) 174–210; 10.1191/0265532205lt299oa; © 2005 Edward Arnold (Publishers) Ltd
Reading to learn and reading to integrate: new tasks for reading comprehension tests
Latricia Trites, Murray State University, and Mary McGroarty, Northern Arizona University
To address the concern that most traditional reading comprehension tests only measure basic comprehension, this study designed measures to assess more complex reading tasks: Reading to Learn and Reading to Integrate. The new measures were taken by 251 participants: 105 undergraduate native speakers of English, 106 undergraduate nonnative speakers and 40 graduate nonnative speakers. The research subproblems included determination of the influence of overall basic reading comprehension level, native language background, medium of presentation, level of education and computer familiarity on Reading to Learn and Reading to Integrate measures, and the relationships among measures of Basic Comprehension, Reading to Learn and Reading to Integrate. Results revealed that native language background and level of education had a significant effect on performance on both experimental measures, while other independent variables did not. While all reading measures showed some correlation, Reading to Learn and Reading to Integrate had lower correlations with Basic Comprehension, suggesting a possible distinction between Basic Comprehension and the new measures.
I Introduction
Each year, thousands of international students apply to American universities in the hope of obtaining a degree from an English-speaking university, and one of the hurdles they face is attaining a ‘passing’ score on the Test of English as a Foreign Language (TOEFL). Although not designed as a gatekeeper by Educational Testing Service (ETS), the TOEFL is often used as such by many institutions of higher education across the USA (Educational Testing Service, 1997). Prior to 2000, the test assessed basic reading and listening comprehension as well as grammatical ability. While these skills are essential, they represent the minimum needed to succeed in higher education. ETS, aware of this minimum standard and, as Bachman (2000) mentions, the need for task authenticity, embarked on a large-scale project to redesign TOEFL to better reflect the academic language skills required in higher education. Among other goals, the TOEFL 2000 project (Enright et al., 1998) outlined plans to establish reading tasks for four distinct purposes:
• finding information;
• achieving basic comprehension;
• learning from texts; and
• integrating information.
The latter two purposes for reading represent a departure from traditional reading tests and constitute more complex tasks that require more cognitive processing. Tasks appropriate to measure these new purposes needed to be developed and validated.
Address for correspondence: Latricia Trites, Assistant Professor, Murray State University, Department of English and Philosophy, 7C Faculty Hall, Murray, KY 42071, USA; email: latricia.trites@murraystate.edu
The project reported here pursued the creation and evaluation of these new task types, the development of scoring rubrics and the evaluation of native language effects on task and test performance (Educational Testing Service, 1998). In addition, because these were new reading tasks, some evidence for their validity was sought by establishing a baseline for native speakers and then comparing that baseline to performance of nonnative speakers. The TOEFL 2000 reading construct paper (Enright et al., 1998) suggested that a Reading to Learn task would require students to recognize the larger rhetorical frame organizing the information in a given text and carry out a task demonstrating awareness of this larger organizing frame. Enright et al. (1998) hold that, in reading to learn, readers must integrate and connect information presented by the author with what they already know. Thus, readers must rely on background knowledge of text structures to form a Situation Model, a representation of the content, and a Text Model, a representation of the rhetorical structures of the text, as postulated by van Dijk and Kintsch (1983) and discussed by Perfetti (1997). Goldman (1997: 362) asserted that, to learn from texts, readers must have an awareness of text structure and know how to use it to aid comprehension. Reading to Learn can be assessed in a variety of ways. McNamara and Kintsch (1996) suggested that inferencing and sorting tasks requiring readers to process the text based on domain-specific knowledge of the text structures could yield a representation of the readers’ ability to learn from the text. Hence, we postulated that one useful means of assessment would be to have participants recall information and reproduce information relationships reflecting their concept of text structure (Enright et al., 1998: 46–48). For the Reading to Learn task, we assessed readers’ knowledge model through their ability to recall and categorize information from a single text (Enright et al., 1998: 57).
Another goal of the project was to assess Reading to Integrate information, which requires readers to integrate information from multiple sources on the same topic. Reading to Integrate goes a step further than Reading to Learn because readers must integrate the rhetorical and contextual information found across the texts and generate their own representation of this interrelationship (Perfetti, 1997). Therefore, readers must assess the information presented in all sources read and accept or reject pieces of it as they create their own understanding. One means of assessing integration of information found in typical university assignments is the open-ended task of generating a synthesis based on one or more texts (Enright et al., 1998: 48–49). We used a writing task, specifically a writing prompt that elicited the reader’s perception of the authors’ communicative purposes (Enright et al., 1998: 56) as well as amount of information retained from two texts, to test Reading to Integrate.
II Related literature
Recent research has begun to explore the development of tasks that distinguish the constructs of Reading to Learn from basic comprehension. Researchers (van Dijk and Kintsch, 1983; McNamara and Kintsch, 1996; Goldman, 1997) have determined that reading to learn requires an interaction between the Text Model of a text and its Situation Model, thus resulting in a more difficult measure. These researchers further suggest that Reading to Learn can be assessed through measures that go beyond recall, summarization and text-based multiple-choice questions.
The construct of Reading to Integrate requires that readers not only integrate the Text Model with the Situation Model, but also that they create what Perfetti (1997: 346) calls a Documents Model, consisting of two critical elements: ‘An Intertext Model that links texts in terms of their rhetorical relations to each other and a Situations Model that represents situations described in one or more text with links to the texts’. He argues that the use of multiple texts, as opposed to a single text, brings into clearer focus the relationship between the Text Model and the Situation Model. This again suggests that Reading to Integrate should be more difficult than Reading to Learn.
Because these constructs go beyond basic comprehension, Reading to Learn and Reading to Integrate are hypothesized to be more difficult reading tasks than Reading to Find Information and Reading for Basic Comprehension. Perfetti (1997) further suggests that Reading to Integrate is a more difficult task than Reading to Learn because it not only requires an integration of a Text Model and a Situation Model, but requires an integration of multiple Text Models and multiple Situation Models. Thus, current reading theory suggests a difficulty hierarchy of reading tasks based on the level of integration necessary to complete the tasks successfully. Several studies (Perfetti et al., 1995; 1996; Britt et al., 1996; Wiley and Voss, 1999) have attempted to move beyond basic comprehension and examine readers’ ability to integrate the information from multiple texts into one cohesive knowledge base by having students make connections, compare or contrast information across texts.
Additionally, recent research has addressed the effects of computers on reading and assessment; such research is relevant to the current project because the new TOEFL is administered via computers. Reading-medium studies have shown that the only effect that computers have on reading is related to task (Reinking and Schreiner, 1985; Reinking, 1988; van den Berg and Watt, 1991; Lehto et al., 1995; Perfetti et al., 1995; 1996; Britt et al., 1996; Foltz, 1996; Wiley and Voss, 1999). Taylor et al. (1998) found that, after minimal computer training, familiarity with technology did not have a significant effect on examinees’ performance on TOEFL-like questions. Because of the relevance of computer familiarity to TOEFL administration, a brief measure of computer familiarity was included in the research.
For this project, we asked three research questions:
1) Is performance on a measure of Reading to Learn affected by medium of presentation (paper versus computer), technology familiarity, native language (native versus nonnative speakers of English) or level of education (graduate versus undergraduate)?
2) Is performance on a measure of Reading to Integrate affected by medium of presentation (paper versus computer), technology familiarity, native language (native versus nonnative) or level of education (graduate versus undergraduate)?
3) To what extent are measures of finding information/basic reading comprehension, Reading to Learn and Reading to Integrate related?
III Methods
1 Participants
Two hundred and fifty-one participants, the majority undergraduates, volunteered to take part in this study. The sample consisted of 105 undergraduate native speakers of English (NSUs), 106 undergraduate nonnative speakers (NNSUs) and 40 graduate nonnative speakers (NNSGs) of English at a midsized southwestern university. All data were collected between February and October 1999. All undergraduate participants were recruited through large undergraduate classes in the areas enrolling most NNSs (business administration, hotel management, engineering, social sciences and humanities). We tested all NNSs accessible at the institution at the time of data collection; compared to a national sample of international students from the prior academic year, we had a relatively larger proportion of undergraduate relative to graduate students. Nearly all undergraduate participants were young adults, with an average age of 21. Nonnative speakers were also recruited from students enrolled in the summer intensive English program, which is made up of students needing to increase TOEFL scores to at least 500 in order to enroll at a university. We included 46 participants (32% of NNS sample) with TOEFL scores below 500 in the nonnative sample. Graduate nonnative speakers (n = 40) were recruited from the entire university population and had an average age of 30.75. Nonnative speakers represented a range of language backgrounds. One third were Japanese, with other Asian, Germanic and Romance languages also substantially represented. Both the relatively modest sample size and the all-volunteer nature of the participant sample preclude direct generalization to the worldwide TOEFL population, but participants were representative of the levels of international students at the institution where they were enrolled. Participants who completed all four data collection sessions received a payment of US$10 per hour (US$40 for the entire project).
2 Instruments
This project used three existing instruments, two to determine initial reading levels and one to assess levels of computer familiarity, and two new instruments, one for Reading to Learn and one for Reading to Integrate; these were developed especially for the project. Each of the new measures also served as the basis for an additional measure of basic reading comprehension related directly to the text included in the new task. Thus, each participant completed a total of seven different instruments.
a Existing instruments: Initial levels of reading comprehension were determined based on the Nelson–Denny Reading Test (Nelson–Denny), Form G, used to identify the reading levels of the NSs, and three retired versions of the Institutional TOEFL Reading Comprehension Section (TOEFL Reading Comprehension), used to identify the reading levels of the NNSs. Although each of these tests was used to assess reading levels in the population for which it had been developed, all 251 participants took both tests in order to provide comparative data. All 251 participants also completed a brief computer familiarity questionnaire.
Participants’ computer familiarity was determined through an 11-item questionnaire based on a longer 23-item questionnaire previously developed by ETS (Eignor et al., 1998). In the present study, we used only the 11 items that loaded the most heavily on the major factors resulting from administration to a large sample of TOEFL participants. For these 11 items, developers determined the reliability to be .93 using a split-half method (Eignor et al., 1998: 22). This brief questionnaire took approximately 5 minutes to complete; reliability in our sample, using coefficient alpha, was .87.
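The two reliability estimates named above, split-half and coefficient alpha, can be illustrated with a minimal sketch. The response matrix is invented for illustration and is not data from the study. The odd/even item split with the Spearman–Brown correction is an assumption, since the source does not say how the items were halved.

```python
from statistics import mean, pvariance


def cronbach_alpha(rows):
    """Coefficient alpha; rows is a list of respondents, each a list of item scores."""
    k = len(rows[0])  # number of items
    item_vars = [pvariance([r[i] for r in rows]) for i in range(k)]
    total_var = pvariance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def pearson_r(x, y):
    """Pearson correlation between two equally long score lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def split_half_reliability(rows):
    """Odd/even split (an assumption) stepped up with the Spearman-Brown formula."""
    odd_half = [sum(r[0::2]) for r in rows]
    even_half = [sum(r[1::2]) for r in rows]
    r = pearson_r(odd_half, even_half)
    return 2 * r / (1 + r)


# Hypothetical responses: 5 respondents x 4 items on a 1-4 scale.
responses = [
    [4, 4, 3, 4],
    [3, 3, 3, 2],
    [2, 1, 2, 2],
    [4, 3, 4, 4],
    [1, 2, 1, 1],
]
print(round(cronbach_alpha(responses), 2), round(split_half_reliability(responses), 2))
```

Both estimates depend on the sample that produced them, so the two values are not expected to coincide, which is consistent with the different figures reported for the developers' sample and for the present one.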
b Texts used for new measures: In developing the new tasks, we selected texts that would conform to the design specifications of TOEFL 2000. They were problem/solution texts, recommended as one of the potentially relevant text types for TOEFL 2000 (Enright et al., 1998). Longer texts were used because these represented more challenging and authentic academic tasks (Enright et al., 1998). We used one 1200-word and two 600-word texts. The longer text (Tennesen, 1997) was used to assess Reading to Learn, and the two 600-word texts (Monks, 1997; Zimmerman, 1997) were used to assess Reading to Integrate. We chose these text lengths based on work by Meyer (1985a) and further research by the first author indicating that natural science texts between 1200 and 1500 words included representation of all necessary macro-rhetorical structures of problem/solution texts, with or without explicit signaling. While 1200–1500 word texts provide optimal representation of the macro-rhetorical structures, texts of 600 words provide all the basic macro-rhetorical structures present in problem/solution texts. Thus, these lengths were long enough for adequate argumentation, but not so long that they were excessively redundant (Enright et al., 1998). Texts were also matched for readability according to standard readability scales, such as the Flesch–Kincaid, Coleman–Liau and Bormuth scales, and averaged a minimum of grade level 11.0 to 12.0 on these scales. Also, all texts pertained to natural and social sciences; each text covered environmental issues such as air and water pollution (Enright et al., 1998). Thus, text topics were similar across tasks.
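As an illustration of how a grade-level readability estimate of the kind mentioned above is obtained, the following sketch computes the Flesch–Kincaid grade level from sentence, word and syllable counts. The vowel-group syllable counter and the regular-expression tokenization are rough assumptions made for this sketch. The source does not say which software was used, and published tools count syllables more carefully.

```python
import re


def count_syllables(word):
    """Rough heuristic (assumption): one syllable per run of vowels, minimum one."""
    vowel_runs = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(vowel_runs))


def flesch_kincaid_grade(text):
    """Flesch-Kincaid grade level: 0.39*(words/sentences) + 11.8*(syllables/words) - 15.59."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59


# Hypothetical sample passage, not one of the study texts.
sample = (
    "Air pollution threatens public health in many industrial regions. "
    "Regulators have proposed several solutions, including stricter emission limits."
)
print(round(flesch_kincaid_grade(sample), 1))
```

Matching texts on several scales, as the study did, guards against the idiosyncrasies of any single formula, since each scale weights sentence length and word difficulty differently.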
c New instruments used in the study: Three new reading measures were used in this study to assess Reading to Learn, Reading to Integrate and Basic Comprehension. Trites (2000: Chapters 2 and 3) presents a more extensive review of literature and rationale for development of the new measures.
• Reading to Learn: The first new measure, completion of a chart, was used to determine participants’ ability to read to learn. Spivey (1997: 69) suggests that readers’ categorization of information in text offers insight into their cognitive processes and their making of meaning. We designed a measure to be used with a 1200-word text that students read on either paper or computer. Students were asked to recall, identify and categorize information from the text on a chart reflecting macro-rhetorical structures, called macrostructures in this study (problems and solutions), and other types of information from problem/solution texts (causes, effects and examples), categories based on the work of Meyer (1985a). The scoring rubric, based on work by Meyer (1985b) and later modified by Jamieson et al. (1993), awarded points only for the upper levels of textual structure represented on the chart (for task and scoring rubric, see Appendix 1). We weighted the information supplied on the chart as follows: 10 points for correct information in the problem and solution categories, five points for correct information supplied in the cause and effect categories and one point for accurate examples. This weighting reflects Meyer’s (1985b) hierarchical levels, which characterize problem and solution propositions as higher order structures, while the other categories represent lower order propositions.¹ The theoretical maximum score for this scale was 241, which would result from maximum points given in all categories. The first author and two research assistants spent 35–40 hours creating, revising and norming the scoring rubric and developing the scoring guide (Trites, 2000: Chapter 3). To determine interrater reliability, we used coefficient alpha rather than percentage of agreement, because percentage of agreement inflates the likelihood of chance agreement (Hayes and Hatch, 1999). After norming, overall interrater reliability was .99 (coefficient alpha), with similarly high alpha coefficients for all subcategories.²
¹ Students received no points for information improperly placed or for information not found in the text.
• Reading to Integrate: The second new measure assessed Reading to Integrate. The task used to assess Reading to Integrate required participants to read two 600-word texts and compose a written synthesis. The prompt asked students to make connections across the range of ideas presented; thus, we asked readers to synthesize information rather than summarize or make comparisons (Wiley and Voss, 1999). This synthesis was scored based on an analytic scale ranging from 0 to 80, reflecting readers’ ability to recognize and manipulate the structure of the texts, include specific information and express connections across texts through the use of cohesive devices (for task and scoring rubric, see Appendix 2). The test was designed to measure the integration of content from both readings and did not assess other aspects of writing such as the creation of rhetorical style, grammaticality or mechanics. The rubric was composed of three subcategories: integration ability, macrostructure recognition and use of relevant details. The integration subscore was awarded the highest point values because this was the predominant skill being tested. It scored participants on their ability to make connections across texts based on the manipulation of the textual frames in both texts. The second subcategory awarded points for the ability to recognize and articulate the macrostructures (problem, cause, effect or solution) present in each text. This subcategory was similar to the categorizing task used in the Reading to Learn measure, with the additional constraint that participants had to express the connections overtly. The third subcategory in the scoring rubric analysed the ability to use relevant details as support in the written synthesis. The first author and two research assistants spent 30 hours revising and norming the scoring rubric and developing a decision guide, resulting in an overall interrater reliability of .99 (coefficient alpha), with similarly high alphas for all subcategories.
² We recognize that tasks requiring high inference measures plus extensive norming and revision of the scoring rubric pose feasibility issues in large-scale testing. Further research is needed to determine whether and how such scoring procedures could be adapted in standardized testing for numerous test-takers.
• Basic Comprehension: The third construct was measured by multiple-choice tests related specifically to the texts used in the new tasks. These tests were created by TOEFL Test Development staff and followed current TOEFL reading section specifications. We used two multiple-choice tests, Basic Comprehension Test 1 (BC1) and Basic Comprehension Test 2 (BC2), 20 items each: one for the longer passage used to assess Reading to Learn and one for the two passages used to assess Reading to Integrate. Both were scored based on number of items answered correctly. Reliability on BC1, calculated based on 251 participants, was .84 (coefficient alpha). Inadvertently, the order of the texts used in BC2 was different for the two different media; however, reliability on both versions of the test was high. For those who took BC2 based on paper texts (n = 127), reliability was .84 (coefficient alpha); for those who took BC2 based on computerized texts (n = 124), reliability was .86 (coefficient alpha).
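The 10/5/1 weighting described for the Reading to Learn chart can be sketched as a simple tally. The full rubric appears in Appendix 1 and is not reproduced here. The category labels, the sample response and the use of a boolean to stand in for a rater's judgement are assumptions made for illustration only.

```python
# Point values stated in the text: problem/solution 10, cause/effect 5, example 1.
WEIGHTS = {"problem": 10, "solution": 10, "cause": 5, "effect": 5, "example": 1}


def score_chart(entries):
    """Sum weighted points for chart entries.

    entries: iterable of (category, is_correct) pairs, where is_correct is the
    rater's judgement (hypothetical representation). Misplaced information or
    information not found in the text earns no points.
    """
    total = 0
    for category, is_correct in entries:
        if is_correct:
            total += WEIGHTS[category]
    return total


# Hypothetical response from one participant.
response = [
    ("problem", True),
    ("solution", True),
    ("cause", True),
    ("effect", False),   # misplaced on the chart, so no credit
    ("example", True),
    ("example", True),
]
print(score_chart(response))
```

Because higher order propositions carry ten times the weight of examples, a participant who identifies the main problem and solution outscores one who lists many details without recognizing the organizing frame.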
3 Design for data collection
This study used a 2 × 2 repeated measures design to examine performance on the new reading tasks. Native speaker undergraduates and nonnative speaker undergraduates were divided into two groups each, of equal ability as determined by performance on the baseline standardized measures of reading comprehension (Nelson–Denny or TOEFL). Half of each group read texts on paper; the other half read the same texts on a computer screen. A smaller group of nonnative speaker graduates, equally divided, was also included for a comparison between performance by graduate and undergraduate nonnative speakers. Additionally, the administration of the new measures was counterbalanced to control for any practice effect.
a Procedures: All participants met with the researchers in four sessions, each lasting about an hour. The first two sessions were devoted to administering the existing instruments. During Session 1, participants received an introduction to the study and took one of the two standardized basic reading comprehension measures (Nelson–Denny or TOEFL Reading Comprehension). Students completed the computer familiarity questionnaire and the Nelson–Denny Test at the same testing session because the Nelson–Denny was shorter than the TOEFL Reading Comprehension. During Session 2, participants took the other standardized basic reading comprehension measure.
Next, each participant group was subdivided into two subgroups for computer-based or paper reading of the texts for the new tasks. The subgroups were matched on their performance on initial reading measures; the Nelson–Denny was used for native speakers and the TOEFL Reading Comprehension was used for nonnative speakers. Independent t-tests run on these reading measures showed no significant difference in basic comprehension for the newly created subgroups assigned to each medium, ensuring that they were balanced for initial reading levels. Participants stayed in the same subgroups for the duration of the study. To ensure uniformity of response mode, all participants, whether they read the source texts on the computer or on paper, responded to the reading tasks using paper and pencil format.³
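The matching and balance check described above can be sketched as follows. The source does not report how participants were matched, so the ranked ABBA allocation below is only one plausible procedure, assumed for illustration. The participant identifiers and baseline scores are invented. The t statistic is the standard pooled-variance form; its p value would be read from a t table or statistical software, which this sketch does not attempt.

```python
from statistics import mean, variance


def assign_matched(baseline):
    """Split participants into two media subgroups with similar baseline scores.

    baseline: dict mapping participant id -> baseline reading score.
    Assumption: rank by score, then allocate in ABBA order.
    """
    ranked = sorted(baseline, key=baseline.get, reverse=True)
    paper, computer = [], []
    for i, pid in enumerate(ranked):
        (paper if i % 4 in (0, 3) else computer).append(pid)
    return paper, computer


def pooled_t(x, y):
    """Independent-samples t statistic (pooled variance) and its degrees of freedom."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    t = (mean(x) - mean(y)) / (pooled_var * (1 / nx + 1 / ny)) ** 0.5
    return t, nx + ny - 2


# Hypothetical baseline scores for eight participants.
scores = {"p1": 60, "p2": 58, "p3": 55, "p4": 53, "p5": 50, "p6": 48, "p7": 45, "p8": 43}
paper, computer = assign_matched(scores)
t, df = pooled_t([scores[p] for p in paper], [scores[p] for p in computer])
print(paper, computer, round(t, 3), df)
```

A t statistic near zero for the two subgroups is what the balance check looks for: it indicates that any later difference between media is unlikely to reflect initial reading level.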
The last two sessions, each lasting approximately one hour, were dedicated to administration of the new measures. The Reading to Learn session took slightly longer to administer because administrative procedures were longer for this novel task. The new tasks were counterbalanced to control for practice effect; thus, half of the participants took the Reading to Learn measure first and half took the Reading to Integrate measure first. During Session 3, we administered the first new measure (for ease of discussion, Reading to Learn is discussed first) and BC1. At this session, students were given 12 minutes to read a 1200-word passage, either on computer or on paper. We limited the time allowed for reading based on 100 words per minute, thought to be ample (Grabe, personal communication, 1998). After examinees read the text, they were given 4 minutes to take notes on a half sheet of paper. Participants were instructed to take minimal notes due to the time constraints. Next, the text was removed and examinees were allowed 15 minutes to complete a chart based on the reading with the aid of their notes. After completing this Reading to Learn activity, participants were allowed to use the text and were given 15 minutes to answer BC1. Following these new testing sessions, 49 participants were selected for a related interview concerning the cognitive processes used in task completion (for further details, see Trites, 2000: Chapter 6).
³ Although responses could have been entered and perhaps scored by computer, this would have introduced factors not directly related to our research questions and remains an area for further study.
During Session 4, students were given 12 minutes to read two short texts (600 words each), either on computer or paper. After participants read the assigned texts, they were given 4 minutes to take one-half page of notes (Enright et al., 1998). Next, the texts were removed and participants were asked to demonstrate Reading to Integrate by writing a synthesis of the texts with the aid of their notes (15 minutes allowed for this task). After completing the Reading to Integrate task, participants were allowed to see the texts again and answered BC2 (15 minutes allowed for this task). In one Reading to Integrate session, for unknown reasons, six of the seven participants read only one text. Because we cannot explain the cause of this anomalous session, we have eliminated scores from the session’s seven participants from subsequent analyses, thus slightly reducing the N size for the Reading to Integrate measure.
b Variables used in study: The six independent variables included three nominal (Native Language Background, Medium of Text Presentation and Level of Education) and three interval variables (Nelson–Denny, TOEFL Reading Comprehension and Computer Familiarity). The four dependent variables were Reading to Learn, Reading to Integrate, BC1 and BC2.
IV Results
First, we present the descriptive statistics for all reading measures, followed by a systematic analysis of independent variables that might affect participant performance on the new measures. Scatterplots were checked for all reading measures to ensure normality of data. Kurtosis and skewness levels for all reading measures were found to be within normal limits, indicating a relatively normal distribution. Descriptive statistics for all existing measures are shown in Table 1. Means for these measures show a consistent pattern: the native speaker undergraduates had the highest mean, followed by the nonnative speaker graduates, followed by the nonnative speaker undergraduates. On the reading measures, Nelson–Denny and TOEFL Reading Comprehension, the nonnative speaker undergraduates showed the largest variance in performance, while on the computer familiarity measure the variance of both nonnative speaker groups was substantially larger than that of the native speakers.
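The skewness and kurtosis screening mentioned above can be illustrated with moment-based estimates. The source does not state which estimator or which limits were applied, so the population-moment formulas and the cut-off of 2.0 in `roughly_normal` are assumptions for this sketch. Statistical packages often report bias-corrected versions that differ slightly in small samples.

```python
from statistics import mean


def central_moment(xs, order):
    """Population central moment of the given order."""
    m = mean(xs)
    return sum((x - m) ** order for x in xs) / len(xs)


def skewness(xs):
    """Moment-based skewness: zero for a symmetric distribution."""
    return central_moment(xs, 3) / central_moment(xs, 2) ** 1.5


def excess_kurtosis(xs):
    """Moment-based kurtosis minus 3, so a normal distribution scores zero."""
    return central_moment(xs, 4) / central_moment(xs, 2) ** 2 - 3


def roughly_normal(xs, limit=2.0):
    """Hypothetical screening rule: both indices within +/- limit (assumed cut-off)."""
    return abs(skewness(xs)) <= limit and abs(excess_kurtosis(xs)) <= limit


# Hypothetical scores, not data from the study.
scores = [42, 55, 61, 48, 70, 66, 53, 59, 45, 63]
print(round(skewness(scores), 2), round(excess_kurtosis(scores), 2), roughly_normal(scores))
```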
The same pattern emerged for the means on the new measures (see Table 2) as for the existing measures. The native speaker undergraduate group performed better on all new measures than both of the nonnative speaker groups. The nonnative speaker graduate group performed better than the nonnative speaker undergraduate group on all measures as well. This robust pattern of performance was also found in the variance of three of the four new measures. On BC1 and BC2, the performance of the native speaker undergraduates showed the least amount of variance, followed by the nonnative speaker graduates, followed by the nonnative speaker undergraduates. On Reading to Integrate, the native speaker undergraduate group showed substantially less variance than the nonnative speaker groups; however, the variance of the two nonnative speaker groups was almost identical. On Reading to Learn, all three groups showed considerable variance.
Table 3 reveals the range of awarded points achieved by all participant groups. The nature of the Reading to Learn point system created a maximum possible point value (241) that no participant achieved. We speculate that there are at least three possible causes of the discrepancy between the theoretical maximum and the range of observed scores:
• task novelty: no participant reported ever doing such a task previously;
• time allowed for task completion; and
• space on the response sheet: space constraints may have limited the amount of information that participants could include.
Future research would need to address these issues. However, for the Reading to Integrate measure, the full range of possible point totals was achieved by at least one participant in each group.
Table 1 Descriptive statistics for existing measures for three participant groups
Group                   n     Mean     sd      kMax
Nelson–Denny
  NSU                   105   126.48   16.46   156
  NNSU                  106    67.24   31.91   156
  NNSG                   40    88.88   21.88   156
  Total participants    251    95.47   36.93   156
TOEFL Reading Comprehension
  NSU                   105    61.30    4.24    67
  NNSU                  106    50.30    8.53    67
  NNSG                   40    57.15    4.55    67
  Total participants    251    55.99    8.19    67
Computer familiarity
  NSU                   104    38.08    3.60    44
  NNSU                  104    34.82    5.99    44
  NNSG                   40    35.63    6.02    44
  Total participants    248    36.31    5.33    44
Note: kMax = number of items or maximum possible score.
1 Computer familiarity
The overall plan for the analyses was to check the influence of the independent variables on the dependent measures, with computer familiarity being addressed first. Initially, we had proposed that, if computer familiarity was significantly different across groups, it would be entered into all calculations as a covariate. To determine this, it was necessary to conduct an Analysis of Variance (ANOVA) for computer familiarity across the six participant/medium subgroups.
Table 2 Descriptive statistics for new measures for three participant groups
Group                   n     Mean    sd      kMax
Reading to Learn (chart)
  NSU                   105   51.85   19.86   241
  NNSU                  106   31.73   19.50   241
  NNSG                   40   44.68   19.27   241
  Total participants    251   42.21   21.64   241
Basic Comprehension Test 1
  NSU                   105   16.98    2.47    20
  NNSU                  106   11.73    4.25    20
  NNSG                   40   14.98    3.50    20
  Total participants    251   14.44    4.23    20
Reading to Integrate (synthesis)
  NSU                   101   63.65   11.05    80
  NNSU                  103   37.24   21.76    80
  NNSG                   40   53.60   21.03    80
  Total participants    244   50.86   21.63    80
Basic Comprehension Test 2
  NSU                   105   15.91    2.78    20
  NNSU                  106    9.75    4.54    20
  NNSG                   40   12.85    3.61    20
  Total participants    251   12.82    4.72    20
Notes: kMax = number of items or maximum possible score; n size reduced for Reading to Integrate because of anomalous testing session.
The resulting ANOVA (F = 4.70, p < .05) showed a significant difference between subgroups on the computer familiarity questionnaire; therefore, a post hoc Scheffé test was done to locate significant contrasts. After analysis of all possible subgroup contrasts, the post hoc Scheffé revealed that the only significant difference in subgroups appeared between the native speaker undergraduates and nonnative speaker undergraduates who read texts on paper. Hence, although there was one significant contrast, it occurred in two subgroups reading on paper, not in any of the subgroups who read on computer. All groups generally scored high on computer familiarity although, as noted, variance of the nonnative groups was greater. It was thus established that computer familiarity had no significant effect on participants who read texts on computer, so we did not use computer familiarity as a covariate in further analyses and proceeded to the three research questions of central interest to this study.
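The logic of a one-way ANOVA followed by a Scheffé pairwise contrast, as used above, can be sketched with a few lines of arithmetic. The three small groups below are invented and do not reproduce the six subgroups or the values reported in the study. The critical F value needed to judge significance is not computed here and would be taken from a table or statistical software.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (F, df_between, df_within, ms_within) for a list of score lists."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [mean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_between, df_within = k - 1, n_total - k
    ms_within = ss_within / df_within
    f_ratio = (ss_between / df_between) / ms_within
    return f_ratio, df_between, df_within, ms_within


def scheffe_statistic(g1, g2, ms_within, k):
    """Scheffe F for a pairwise contrast; compare with critical F(k - 1, df_within)."""
    diff = mean(g1) - mean(g2)
    return diff ** 2 / (ms_within * (1 / len(g1) + 1 / len(g2)) * (k - 1))


# Hypothetical questionnaire totals for three subgroups.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_ratio, df_b, df_w, ms_w = one_way_anova(groups)
contrast = scheffe_statistic(groups[0], groups[2], ms_w, len(groups))
print(round(f_ratio, 2), df_b, df_w, round(contrast, 2))
```

The Scheffé procedure is conservative because it divides each contrast by the number of groups minus one, which helps explain how an overall significant F can resolve into a single significant pairwise contrast.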
Because Research Questions 1 and 2 are similar – except that they address the two different new reading measures, Reading to Learn and Reading to Integrate – we approached them in the same manner, through ANOVA, to identify the independent variables that could have significantly affected the results on the new measures.
2 Research Question 1
The first research question asked if performance on a measure of Reading to Learn was affected by medium of presentation, computer familiarity, native language or level of education. We calculated a univariate ANOVA with Type III sums of squares on Reading to Learn with group status, medium of text presentation and test order as possible contributing factors.⁴ Table 4 shows that there were no significant interactions for any of the group, medium or test order combinations. The only significant main effect was group membership.
Table 3 Range of scores for new measures for three participant groups
Group                   n     Minimum   Maximum   kMax
Reading to Learn (chart)
  NSU                   105   14        120       241
  NNSU                  106    0         86       241
  NNSG                   40    3         94       241
  Total participants    251    0        120       241
Reading to Integrate (synthesis)
  NSU                   101   38         80        80
  NNSU                  103    0         80        80
  NNSG                   40    5         80        80
  Total participants    244    0         80        80
Notes: kMax = number of items or maximum possible score; n size reduced for Reading to Integrate because of anomalous testing session.
Because group membership was a combined measure that included both native language background and level of education, post hoc analysis was needed to identify the significant contrasts. Table 5 shows that there was a significant difference in performance on the Reading to Learn measure between the native speaker undergraduate and the nonnative speaker undergraduate groups, as well as a significant difference between the nonnative speaker undergraduate and nonnative speaker graduate groups. There was no significant difference in performance between the native speaker undergraduate and the nonnative speaker graduate groups. Therefore, the answer to Research Question 1 is that native language background and level of education did have a significant effect on performance on the Reading to Learn measure, but that medium of text presentation did not. Further, order of testing (whether participants took Reading to Learn or Reading to Integrate first) had no significant effect.
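The pattern of significant and nonsignificant contrasts reported for Table 5 can be checked against the Scheffé criterion. The sketch below is illustrative only and is not part of the original analysis. It assumes the Table 5 entries read as 20.12 (SE 2.72), 7.17 (SE 3.67) and 12.95 (SE 3.66), takes the error degrees of freedom (239) from Table 4, and uses an approximate tabled critical value F(.05; 2, 239) of about 3.03, which is an assumption of this sketch rather than a figure reported in the study.

```python
import math

# Mean differences and standard errors as read from Table 5
# (decimal points are assumed; the extracted text has lost them).
contrasts = {
    ("NSU", "NNSU"): (20.12, 2.72),
    ("NSU", "NNSG"): (7.17, 3.67),
    ("NNSU", "NNSG"): (12.95, 3.66),
}

K_GROUPS = 3       # NSU, NNSU, NNSG
F_CRITICAL = 3.03  # ASSUMPTION: approximate tabled F(.05; 2, 239)


def scheffe_significant(diff, se, k=K_GROUPS, f_crit=F_CRITICAL):
    """Scheffe criterion: |diff| / SE must exceed sqrt((k - 1) * F_crit)."""
    threshold = math.sqrt((k - 1) * f_crit)
    return abs(diff) / se > threshold


results = {pair: scheffe_significant(d, se) for pair, (d, se) in contrasts.items()}
for pair, significant in results.items():
    print(pair, "significant" if significant else "not significant")
```

Under these assumptions the sketch flags the same two contrasts described in the text (NSU vs. NNSU and NNSU vs. NNSG) and leaves NSU vs. NNSG nonsignificant.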
Table 4 Performance on Reading to Learn measure by groups, medium and test order (n = 251) (univariate analysis of variance)

Source                          Type III sum    df     Mean square    F
                                of squares

Group                           21869.29          2    10934.64       28.10*
Medium                           1215.86          1     1215.86        3.13
Test order                        294.98          1      294.98        0.76
Group × medium                    334.81          2      167.40        0.43
Group × test order                  4.37          2        2.19        0.01
Medium × test order               391.73          1      391.73        1.01
Group × medium × test order       575.29          2      287.65        0.74
Error                           92997.45        239      389.11

Note: *p < .05

Footnote 4: Test order was added as an additional variable to double check that our counterbalancing had been effective in controlling for any practice effect.

3 Research Question 2

The second research question, related to the first, asked if performance on Reading to Integrate was affected by medium of presentation, computer familiarity, native language or level of education. Again, to ensure that counterbalancing of tests controlled for any practice effect, test order was added as an additional variable.
To answer this question, we calculated a univariate ANOVA on the Reading to Integrate measure, with group status, medium of text presentation and test order entered as possible contributing factors. The results (Table 6) show, as for Research Question 1, that there were no significant interactions for any of the group, medium or test order combinations; the only significant main effect was group membership. The answer for Research Question 2 is that native language background and educational level had a significant effect on Reading to Integrate, but medium of text presentation did not. Post hoc analysis of group contrasts showed that all three groups were distinct in their performance on Reading to Integrate (see Table 7).
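For readers less familiar with ANOVA tables, each F value is the mean square for an effect divided by the error mean square. The short sketch below recomputes a few F values for Reading to Integrate. It assumes the Table 6 mean squares read as 18147.17 (group), 192.95 (medium), 97.83 (test order), 1098.72 (medium by test order) and 325.13 (error); the decimal placement is an assumption made here because the extracted table has lost its decimal points.

```python
# Mean squares as read from Table 6 (decimal points assumed).
MS_ERROR = 325.13
mean_squares = {
    "Group": 18147.17,
    "Medium": 192.95,
    "Test order": 97.83,
    "Medium x test order": 1098.72,
}


def f_ratio(ms_effect, ms_error=MS_ERROR):
    """F statistic for an effect: effect mean square over error mean square."""
    return ms_effect / ms_error


f_values = {name: round(f_ratio(ms), 2) for name, ms in mean_squares.items()}
print(f_values)
```

The recomputed ratios agree with the F column of Table 6 when it is read with two decimal places, which supports that reading of the table.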
Table 5 Post hoc Scheffé for Reading to Learn measure (n = 251)

Group    n      Group    n      Mean difference    Standard error

NSU      105    NNSU     106    20.12*             2.72
                NNSG      40     7.17              3.67
NNSU     106    NNSG      40    12.95*             3.66

Note: *p < .05

Table 6 Performance on Reading to Integrate measure by groups, medium and test order (n = 244(a)) (univariate analysis of variance)

Source                          Type III sum    df     Mean square    F
                                of squares

Group                           36294.33          2    18147.17       55.82(b)
Medium                            192.95          1      192.95        0.59
Test order                         97.83          1       97.83        0.30
Group × medium                     11.82          2        5.91        0.02
Group × test order                 30.14          2       15.07        0.05
Medium × test order              1098.72          1     1098.72        3.38
Group × medium × test order      1488.58          2      744.29        2.29
Error                           75430.37        232      325.13

Notes: (a) n size reduced for Reading to Integrate because of anomalous testing session; (b) p < .05

4 Research Question 3

The third research question asked to what extent measures of basic comprehension, Reading to Learn and Reading to Integrate were related. We used correlational analysis as the first step in answering this question. Results for the total participant population (see Table 8) showed moderate to high correlations across all reading measures. However, the analyses done for Research Questions 1 and 2 revealed that group status had a significant effect on performance on Reading to Learn and Reading to Integrate. Further, we realize that correlations are sensitive to variance, so the high correlations seen in the total population could have been an artifact of combining the three groups. Therefore, we examined the correlations among all reading measures for each group (available in Trites, 2000: Appendix 1, pp. 230–33). While the reading measures were still correlated, often moderately, sometimes highly, magnitudes differed and sometimes dropped substantially. The text-specific multiple-choice measures, BC1 and BC2, consistently correlated more highly with the Nelson–Denny and TOEFL Reading Comprehension tests than with Reading to Learn and Reading to Integrate based on the same texts, suggesting a test method or construct effect. Because comparisons between different measures of basic comprehension were not a goal of the project, BC1 and BC2 were not used in further analyses. We conclude that, as expected, all reading measures were related, but the lower correlations between Reading to Learn and Reading to Integrate and the traditional basic comprehension measures led us to consider further types of analysis to identify the possible distinctiveness of the new measures.
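The concern that a pooled correlation can be an artifact of combining groups can be illustrated with a deliberately extreme toy example. The scores below are invented for illustration and are not study data; the function name pearson_r is likewise just a label used in this sketch. Within each invented group the two measures are uncorrelated, yet pooling the groups produces a high correlation simply because the groups differ in mean level on both measures.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation for two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# SYNTHETIC scores, invented for illustration only (not study data).
low_x, low_y = [10, 12, 14, 16], [22, 18, 24, 20]
high_x, high_y = [30, 32, 34, 36], [52, 48, 54, 50]

r_low = pearson_r(low_x, low_y)      # within the lower-scoring group
r_high = pearson_r(high_x, high_y)   # within the higher-scoring group
r_pooled = pearson_r(low_x + high_x, low_y + high_y)

print(r_low, r_high, r_pooled)
```

This is why the within-group correlations, rather than the total-sample correlations alone, are the more informative evidence about how the measures relate.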
Table 7 Post hoc Scheffé for Reading to Integrate measure (n = 244(a))

Group    n      Group    n      Mean difference    Standard error

NSU      101    NNSU     103    26.41(b)           2.53
                NNSG      40    10.05(b)           3.37
NNSU     103    NNSG      40    16.36(b)           3.36

Notes: (a) n size reduced for Reading to Integrate because of anomalous testing session; (b) p < .05

Table 8 Correlations for all reading measures for all participants (n = 251)

                               TOEFL Reading    Basic            Basic            Reading     Reading to
                               Comprehension    Comprehension    Comprehension    to Learn    Integrate(a)
                                                Test 1           Test 2

Nelson–Denny                   .90(b)           .85(b)           .84(b)           .66(b)      .69(b)
TOEFL Reading Comprehension    1.00             .85(b)           .84(b)           .64(b)      .69(b)
Basic Comprehension 1                           1.00             .84(b)           .68(b)      .68(b)
Basic Comprehension 2                                            1.00             .68(b)      .70(b)
Reading to Learn                                                                  1.00        .59(b)

Notes: (a) n size reduced for Reading to Integrate because of anomalous testing session; (b) p < .05

5 Discriminant analysis

Because we were interested in determining how constructs differed, we sought additional analyses to help us better characterize the new constructs. Of the several possible statistical methods that could have been employed, two are most plausible: multivariate analysis of variance, usually associated with experimental research, and discriminant analysis, usually associated with descriptive research (Tabachnick and Fidell, 1996). The present research was conducted with samples of naturally occurring student groups and was not experimental. Moreover, we were interested in finding ways to compare participant performance on the measures of the new constructs, Reading to Learn and Reading to Integrate, with performance on more traditional measures of basic comprehension. Thus, we opted to use discriminant analysis because of its parsimony of description and clarity of interpretation (Stevens, 1996). Discriminant analysis, a technique recommended to describe group differences or predict group membership based on a comparison of multiple predictors (Huberty, 1994), has been used in other areas of applied linguistic research to investigate creation of a student profile of success or failure on Computer Assisted Language Learning (CALL) lessons (Jamieson et al., 1993) and accurate classification of text types into registers (Biber, 1993), among other purposes.
To further distinguish basic comprehension from the new constructs, we conducted discriminant analysis on each of the two language groups (native and nonnative) to determine whether Reading to Learn and Reading to Integrate would classify participants in the same way that Basic Comprehension would. We divided the native speaker and nonnative speaker groups into three levels (high, middle (mid) and low reading ability scorers) based on the basic comprehension measure chosen for that group (Nelson–Denny for native speakers; TOEFL Reading Comprehension for nonnative speakers). Research methodologists (Tabachnick and Fidell, 1996: 513) note that robustness is expected when the smallest group has at least 20 cases; our smallest group was 25. To check the assumption of homogeneity of the variance–covariance matrices, we examined the outcomes of Box's M Test and found them all nonsignificant (Klecka, 1980). Each group was checked for outliers using Mahalanobis distance treated as Chi-Square, and no outliers were found (Tabachnick and Fidell, 1996). Thus, the data met all assumptions required for use of discriminant analysis.
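The outlier screen described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure actually run in SPSS: the scores are invented, the two predictors stand in for Reading to Learn and Reading to Integrate (so the chi-square reference distribution has 2 degrees of freedom), and the .001 cutoff is a conventional choice assumed here, not a value reported in the article. For 2 degrees of freedom the upper tail of chi-square is exp(-x/2), so the critical value has a closed form.

```python
import math


def mahalanobis_sq_2d(point, data):
    """Squared Mahalanobis distance of `point` from the centroid of 2-D `data`."""
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    # Sample variances and covariance (n - 1 denominator).
    sxx = sum((x - mean_x) ** 2 for x, _ in data) / (n - 1)
    syy = sum((y - mean_y) ** 2 for _, y in data) / (n - 1)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in data) / (n - 1)
    det = sxx * syy - sxy ** 2
    dx, dy = point[0] - mean_x, point[1] - mean_y
    # d^2 = [dx dy] * inverse(S) * [dx dy]^T, written out for the 2 x 2 case.
    return (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det


def chi_square_critical_df2(alpha):
    """Critical value for chi-square with 2 df (upper tail is exp(-x / 2))."""
    return -2.0 * math.log(alpha)


# SYNTHETIC (Reading to Learn, Reading to Integrate) pairs, invented for illustration.
group = [(40, 50), (45, 52), (50, 58), (55, 60), (60, 66), (65, 70), (70, 72), (75, 80)]

ALPHA = 0.001  # ASSUMPTION: conventional cutoff, not reported in the article
critical = chi_square_critical_df2(ALPHA)

in_sample_flags = [mahalanobis_sq_2d(case, group) > critical for case in group]
# A hypothetical case with a very low first score and a very high second score.
extreme_flag = mahalanobis_sq_2d((40, 80), group) > critical

print(round(critical, 2), any(in_sample_flags), extreme_flag)
```

In this toy data no observed case is flagged, while the hypothetical case that runs against the overall relationship between the two scores is.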
a Discriminant analysis for nonnative speakers. To organize the discriminant analysis in order to see if the Reading to Learn/Reading to Integrate Composite classified participants similarly to the measure of Basic Comprehension (for nonnative speakers, TOEFL Reading Comprehension), we divided the entire nonnative speaker group (n = 146) into three levels of basic comprehension: high, mid and low. These reading ability groups of high, mid and low were based on typical TOEFL Reading Comprehension score levels required for program entry. Participants were classified as high if their scores were greater than or equal to 550, or 56 and above on the scaled score on the TOEFL Reading Comprehension (550 is the cut score often used for graduate entry). Participants were classified as mid if scores ranged between 500 and 549, or 50 to 55 on the TOEFL Reading Comprehension. Participants were classified as low if their scores fell below 500, or 49 and below on TOEFL Reading Comprehension (500 is a minimum TOEFL score sometimes used for undergraduate admission, often with the proviso that students enroll in ESL classes either prior to official enrollment or concurrently). For our entire nonnative speaker group, descriptive statistics on basic comprehension reading ability group membership levels appear in Table 9.
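The three-way split on the TOEFL Reading Comprehension scaled score can be written as a simple rule. The function name below is hypothetical and the example scores are invented; only the cut points (56 and above, 50 to 55, 49 and below) come from the text.

```python
def toefl_reading_level(scaled_score):
    """Assign a reading ability group from a TOEFL Reading Comprehension scaled score."""
    if scaled_score >= 56:
        return "high"
    if scaled_score >= 50:
        return "mid"
    return "low"


# Invented example scores that straddle both cut points.
levels = [toefl_reading_level(score) for score in (59, 56, 55, 50, 49, 42)]
print(levels)
```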
Table 9 Descriptive statistics for nonnative speakers by TOEFL reading comprehension reading ability groups (n = 143)

Reading ability group    n      Mean on TOEFL reading comprehension    sd

High (≥56)                57    59.68                                  3.03
Mid (50–55)               42    52.81                                  1.67
Low (≤49)                 44    42.11                                  5.47

Total                    143    52.26                                  8.22

Note: Total n = 143 due to loss of three cases in anomalous Reading to Integrate session.

We ran SPSS Discriminant Analysis with initial grouping variables of high, mid and low reading ability. We compared initial reading ability levels with high, mid and low categories on the Reading to Learn/Reading to Integrate Composite, a new variable reflecting level of performance on the Reading to Learn and Reading to Integrate measures combined (see footnote 5). The discriminant analysis yielded one discriminant function with an eigenvalue of .92, responsible for 99.9% of the variance in outcomes. Wilks' Lambda for this function was .52, significant at .001; the associated Chi-Square value was extremely large (90.93) and highly significant (p < .001), indicating the group centroids on the composite Reading to Learn/Reading to Integrate function for the three nonnative speaker reading ability groups were significantly different. Both Reading to Learn and Reading to Integrate loaded significantly on the discriminant function, at .86 for Reading to Learn and .78 for Reading to Integrate (p < .05).

Footnote 5: We first calculated two separate discriminant analyses, one for Reading to Learn and one for Reading to Integrate, but we found that both loaded on a single function, so we used the composite in subsequent analyses.
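The relationship between the reported eigenvalue, Wilks' Lambda and the Chi-Square statistic can be sketched with the standard formulas: Lambda is the product of 1 / (1 + eigenvalue) over the discriminant functions, and Bartlett's approximation converts Lambda to Chi-Square. The sketch assumes an eigenvalue of .92, N = 143, two predictors (Reading to Learn and Reading to Integrate) and three groups, and it ignores the second function because the first is reported to carry 99.9% of the variance. These inputs are our reading of the text, not values confirmed by the authors' output, so the result should be expected to match the reported figures only approximately.

```python
import math


def wilks_lambda(eigenvalues):
    """Wilks' Lambda as the product of 1 / (1 + eigenvalue) over the functions."""
    lam = 1.0
    for eigenvalue in eigenvalues:
        lam *= 1.0 / (1.0 + eigenvalue)
    return lam


def bartlett_chi_square(lam, n_cases, n_predictors, n_groups):
    """Bartlett's chi-square approximation for testing Wilks' Lambda."""
    return -(n_cases - 1 - (n_predictors + n_groups) / 2.0) * math.log(lam)


# ASSUMED inputs: eigenvalue .92; N = 143; 2 predictors; 3 groups.
lam = wilks_lambda([0.92])
chi_square = bartlett_chi_square(lam, n_cases=143, n_predictors=2, n_groups=3)

print(round(lam, 2), round(chi_square, 1))
```

Under these assumptions Lambda rounds to the reported .52, and the Chi-Square value lands close to the reported 90.93; the small gap is consistent with rounding in the published eigenvalue.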
Over half (64.9%) of the high reading ability group remained high on the new measure; less than half (42.9%) of the mid reading ability group remained classified as mid. Hence, the Reading to Learn/Reading to Integrate Composite was particularly influential in reclassifying the mid reading ability group and, to a lesser extent, the high group. However, most (81.8%) of the low reading ability group remained low on the Reading to Learn/Reading to Integrate Composite (see Table 10). Of the 143 nonnative speaker participants, 91 (64%) remained in the initial basic comprehension category on the composite; the rest moved, but in different directions. Twenty-one participants (14.7%) were classified into a higher category on the Reading to Learn/Reading to Integrate Composite than their initial basic comprehension level would have suggested, while 31 (21.7%) were reclassified into a lower category. Thus, 52 participants (36.4%), just over one third of the sample, were classified differently based on their Reading to Learn/Reading to Integrate Composite performance.
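The reclassification totals can be tallied directly from the cell counts in Table 10. In the sketch below the matrix rows are the initial basic comprehension groups and the columns are the predicted groups on the composite, both ordered high, mid, low; the function name is hypothetical.

```python
# Rows: initial basic comprehension group; columns: predicted group (high, mid, low).
table_10 = [
    [37, 17, 3],    # initially high
    [13, 18, 11],   # initially mid
    [2, 6, 36],     # initially low
]


def reclassification_summary(matrix):
    """Count cases that stayed in place, moved to a higher level, or moved lower."""
    size = len(matrix)
    total = sum(sum(row) for row in matrix)
    same = sum(matrix[i][i] for i in range(size))
    # Index 0 is the highest level, so a smaller column index means moving up.
    higher = sum(matrix[i][j] for i in range(size) for j in range(i))
    lower = total - same - higher
    return {"total": total, "same": same, "higher": higher, "lower": lower}


summary = reclassification_summary(table_10)
percentages = {k: round(100 * v / summary["total"], 1) for k, v in summary.items()}
print(summary, percentages)
```

With the counts as printed in Table 10, the diagonal sums to 91 and the off-diagonal cells to 52 (21 higher, 31 lower), which matches the percentages quoted in the text.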
Table 10 Discriminant analysis comparison of nonnative speaker reading ability groups with Reading to Learn/Reading to Integrate Composite (n = 143)

Reading ability group     Predicted group membership for Reading to      Initial
                          Learn/Reading to Integrate Composite           classification
                          High       Mid        Low                      total

Count
  High (≥56)              37         17          3                        57
  Mid (50–55)             13         18         11                        42
  Low (≤49)                2          6         36                        44
  Reclassification total  52         41         50                       143

Percentage
  High (≥56)              64.9       29.8        5.3                     100.0
  Mid (50–55)             31.0       42.9       26.2                     100.0
  Low (≤49)                4.5       13.6       81.8                     100.0

Note: Total n = 143 due to loss of three cases in anomalous Reading to Integrate session.

b Discriminant analysis for native speakers. Because one of our goals in this project was to probe the possible validity of these new measures by assessing performance of two groups, native as well as nonnative speakers, we conducted a parallel discriminant analysis for native speakers. Thus, for the native speakers, we followed the same procedure, dividing the entire native speaker group (n = 105) into three levels of basic comprehension (high, mid and low) based on score distances of .5 standard deviations from the sample mean of the Nelson–Denny test for these participants. Native speakers need not take reading comprehension tests when entering the university, so the three-way split was based entirely on our sample data. Participants were classified as high if their scores on the Nelson–Denny were greater than or equal to 135. Participants were classified as mid if their scores ranged between 119 and 134 on the Nelson–Denny. Participants were classified as low if their scores fell at or below 118 on the Nelson–Denny. Descriptive statistics for the entire native speaker sample on basic comprehension group membership appear in Table 11.
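The Nelson–Denny cut points can be approximately reconstructed from the half-standard-deviation rule. The sketch below uses the sample mean (126.04) and standard deviation (16.52) as read from Table 11, which are based on n = 101 and so may differ slightly from the values the authors used for the full group of 105; rounding the band edges to whole scores is an assumption of this sketch, chosen because it reproduces the reported cut points.

```python
# Nelson-Denny mean and sd as read from Table 11 (decimal points assumed).
SAMPLE_MEAN = 126.04
SAMPLE_SD = 16.52
BAND = 0.5  # half a standard deviation on either side of the mean

# ASSUMPTION: band edges are rounded to whole scores.
low_top = round(SAMPLE_MEAN - BAND * SAMPLE_SD)   # highest score still classed as low
mid_top = round(SAMPLE_MEAN + BAND * SAMPLE_SD)   # highest score still classed as mid


def nelson_denny_level(score):
    """Assign a reading ability group from a Nelson-Denny score."""
    if score > mid_top:
        return "high"
    if score <= low_top:
        return "low"
    return "mid"


print(low_top, mid_top, [nelson_denny_level(s) for s in (118, 119, 134, 135)])
```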
Table 11 Descriptive statistics for native speakers by Nelson–Denny reading ability groups (n = 101)

Reading ability group    n      Mean on Nelson–Denny    sd

High (≥135)               39    141.26                   5.19
Mid (119–134)             37    125.65                   5.09
Low (≤118)                25    102.88                  10.98
Total                    101    126.04                  16.52

Note: Total n = 101 for discriminant analysis due to loss of four cases in anomalous Reading to Integrate session.

The discriminant analysis yielded one discriminant function with an eigenvalue of .31, responsible for 98.7% of the variance in outcomes. Wilks' Lambda for this function was .76, significant at .001; the associated Chi-Square value was large (26.39) and highly significant (p < .001), indicating the group centroids on the discriminant function for the three reading ability groups on the Reading to Learn/Reading to Integrate Composite were significantly different. Pooled within-groups correlations between discriminating variables showed that Reading to Learn correlated with the first discriminant function at a level of .81; Reading to Integrate correlated with the second discriminant function at .71. This contrasts with findings for the nonnative speakers, where scores on the combined new measures loaded significantly on only one discriminant function. For native speakers, then, there is evidence for two significant discriminant functions, although the first accounts for almost all of the variance. Although these two measures (Reading to Learn and Reading to Integrate) loaded on two separate discriminant functions, they still showed moderate correlations with the alternate function (see footnote 6), justifying the composite calculations.
Results of discriminant analysis for native speakers, seen in Table 12, show a different pattern than that observed for nonnative speakers. Nearly three-fourths (71.8%) of the native speakers classified as high in the basic comprehension reading ability group remained high on the Reading to Learn/Reading to Integrate Composite. For the mid group, however, only 18.9% remained classified as mid. Just over half (52%) of the low reading ability group members remained low on the Reading to Learn/Reading to Integrate Composite. As with the nonnative speakers, participants in the mid category on basic comprehension showed the most frequent reclassification. Forty-eight of the 101 (47.5%) native speaker participants remained in the initial classification categories. Twenty-two (21.8%) were reclassified into a higher category and 31 (30.7%) were reclassified into a lower category. Thus, 53 participants (52.5%), over half of the sample, were classified differently based on their performance on the Reading to Learn/Reading to Integrate Composite.
Footnote 6: Reading to Learn correlated with function 2 at -.58; Reading to Integrate with function 1 at .71.

Table 12 Discriminant analysis comparison of native speaker reading ability groups with Reading to Learn/Reading to Integrate Composite (n = 101)

Reading ability group     Predicted group membership for Reading to      Initial
                          Learn/Reading to Integrate Composite           classification
                          High       Mid        Low                      total

Count
  High (≥135)             28          3          8                        39
  Mid (119–134)           10          7         20                        37
  Low (≤118)               5          7         13                        25
  Reclassification total  43         17         41                       101

Percentage
  High (≥135)             71.8        7.7       20.5                     100.0
  Mid (119–134)           27.0       18.9       54.1                     100.0
  Low (≤118)              20.0       28.0       52.0                     100.0

Note: Total n = 101 for discriminant analysis due to loss of four cases in anomalous Reading to Integrate session.
V Interpretation
Analyses done to answer Research Questions 1 and 2 showed that performance on Reading to Learn and Reading to Integrate measures was significantly influenced by language background and level of education (graduate vs. undergraduate, for nonnative speakers only). Moreover, level of computer familiarity had no significant effect on Reading to Learn and Reading to Integrate performance.

Correlations showed that, as expected, all reading measures correlated to some degree, generally answering Research Question 3. The most interesting results came from the discriminant analyses because they showed a pattern of differential classification on the new tasks, sensitive to initial reading ability. This pattern differed for nonnative speakers and native speakers. Examination of the reclassifications based on the new tasks revealed some of the problems of using a basic comprehension-only test to predict performance on more challenging literacy tasks. These results also imply the need for further work on new measures of advanced literacy skills, such as Reading to Learn and Reading to Integrate, to reflect trends in construct-driven assessment (Pellegrino et al., 1999).
For the nonnative speakers, most (81.8%) of the participants with TOEFL Reading Comprehension below 50 remained classified as low on the Reading to Learn/Reading to Integrate Composite, suggesting the existence of a lower threshold of academic English proficiency. Participants below this threshold were unlikely to perform well on tasks assessing Reading to Learn and Reading to Integrate. Even tasks involving only selection (such as BC1 and BC2), rather than production of responses, were difficult for this group. However, for the nonnative speaker readers in the mid and high reading ability groups, basic comprehension level was not nearly as consistent a predictor of performance on the Reading to Learn/Reading to Integrate Composite. This was especially striking for those in the mid reading ability group. Approximately one fourth of the mid group dropped into the low category on the Reading to Learn/Reading to Integrate Composite, indicating that they experienced more difficulty in completing the new measures, but approximately one fourth could indeed categorize and synthesize information relatively well, despite middling performance on basic comprehension. This finding suggests that for nonnative speaker readers in the mid reading ability category, basic comprehension measures were insufficient to predict their performance on the new measures.
Almost two-thirds of the high basic comprehension nonnative speakers remained high on the Reading to Learn/Reading to Integrate Composite, but one third dropped, suggesting that they could not categorize and synthesize information from texts as well as they could recognize information. For this third, the recognition-only task, multiple-choice basic comprehension, overestimated ability to manipulate and synthesize information. For the high and mid reading ability nonnative speaker groups, basic comprehension-only tests may then overestimate academic English proficiency.

Results for the native speakers showed a more definitive pattern of reclassification. For the low reading ability group, approximately half were misclassified, suggesting that the basic comprehension measure underrepresented their ability to categorize and synthesize information in academic texts. Most of the mid reading ability group were reclassified as lower on the Reading to Learn/Reading to Integrate Composite, showing that basic comprehension results overestimated ability to succeed on more challenging tasks. On the other hand, almost one third of the mid reading ability group did better on the Reading to Learn/Reading to Integrate Composite; for them, basic comprehension underrepresented their ability to complete more challenging reading tasks. For the high ability readers, nearly one third dropped on the Reading to Learn/Reading to Integrate Composite. For them, the basic comprehension measure overestimated their level of academic English proficiency. The other two thirds of the high reading ability group remained high, suggesting a higher threshold of academic English proficiency.
Considering results from both the nonnative speakers and native speakers, we conclude that the new tasks did assess something different from basic comprehension once a lower level threshold of basic academic English proficiency had been achieved. Examination of scatterplots based on discriminant functions indicated an obvious separation between participants who could and could not perform on the Reading to Learn/Reading to Integrate Composite, thus providing some tentative evidence for concurrent validity (Messick, 1989; Chapelle, 1999). We had hoped to find clear evidence of a hierarchy, suggesting that Reading to Learn was demonstrably more difficult than basic comprehension and Reading to Integrate demonstrably more difficult than Reading to Learn, but results did not yield an obvious hierarchy. Results did, however, suggest an even simpler pattern: a dichotomy. For those nonnative speakers above the lower threshold of English language proficiency, which in our data would be approximately 500, or 49 on the scaled TOEFL Reading Comprehension scores, some could perform well on the new measures and some could not, revealing two rather than three groups on the Reading to Learn/Reading to Integrate Composite. These results lead us to speculate that the new measures tap additional skills, such as 'sophisticated discourse processes and critical thinking skills' (Enright and Schedl, 1999: 24), in addition to language proficiency. For native speakers, the discriminant loading suggested a possible hierarchy of difficulty, but the scatterplots and classification tables revealed a dichotomy. Reclassification based on the Reading to Learn/Reading to Integrate Composite nearly eliminated the mid basic comprehension reading ability category, with only 16.8% of the native speakers qualifying as mid on the Reading to Learn/Reading to Integrate Composite.
VI Conclusions
While we acknowledge that there are some limitations to this project, such as the time required to administer and score new measures, the results are illuminating and useful, and they suggest some considerations for future research. The first consideration relates to appropriate classification of students based on academic reading abilities. This corresponds to the use of TOEFL as a gatekeeper by many institutions. For TOEFL test-takers whose current TOEFL scores would be in an intermediate to high intermediate range, the new tasks could assess additional abilities relevant to academic performance. For such participants, additional test development efforts are warranted. The majority of high reading ability nonnative speakers (64.9%) could perform the new measures; a third could not. Our results suggest that some unknown number of mid reading ability nonnative speakers are more capable of succeeding at more challenging academic reading tasks than their current level of basic comprehension assessment would indicate. At the same time, there are some high and mid reading ability participants who could not perform the more demanding tasks; admitting such students directly into university programs could result in failure. These results suggest that for most students at lower levels of basic comprehension (in our study, those with TOEFL scores below 490 to 500), development of new tasks is unnecessary. Nevertheless, our results indicate that a certain small percentage (in our sample, 8 students, 18.2%) of nonnative speakers classified as low reading ability on basic comprehension did in fact perform adequately on more challenging tasks like Reading to Learn and Reading to Integrate. Given the very large number of TOEFL test-takers around the world, this finding merits further research. It would be potentially unfair to exclude such students from university study based on basic comprehension-only measures such as the current TOEFL. We would urge institutions that use TOEFL to consider scores on these new, more demanding tasks if doing so would address institutional needs. According to interview data collected in conjunction with this project (Trites, 2000: Chapter 6), participants perceived that successful completion of the new tasks required a thorough understanding of the texts, whereas the multiple-choice tests were less demanding, requiring only superficial grasp of content. This finding corroborates the observation of Freedle and Kostin (1999: 3), who note that often examinees do not need to comprehend the accompanying text to answer the test item. Therefore, for all TOEFL test-takers, incorporating more challenging tasks such as the Reading to Learn and Reading to Integrate measures into typical English language instruction could have positive washback effects.
The second area of consideration is the focus on additional relevant research. For predictive validity, it is important to determine the correlation between the Reading to Learn/Reading to Integrate Composite measures and actual academic performance in university classes. Correlations might differ depending on the degree of categorization and synthesis required by different major fields of study. Another area for future research is related to the novelty of the Reading to Learn task as an assessment technique. Current pedagogical trends in literacy instruction emphasize the use of graphic organizers as a means to understand and manipulate information in texts. To our knowledge, graphic organizers are typically used as classroom activities rather than assessment tools. They have good potential for use in assessment if students are familiar with them and if appropriate scoring systems can be developed. This project demonstrates that it is possible, though labor intensive, to develop reliable scoring systems. Further research is needed to explore refinement of this task type and related scoring systems for use in testing programs with large numbers of test-takers and scorers. Although the Reading to Integrate task (generating a synthesis) was more familiar, the scoring system was innovative because it reflected a reader's ability to recognize textual frames as well as integrate information. If test developers are interested in the abilities of nonnative speaker readers to perform such tasks, further development of similar tasks and scoring systems is warranted. While a substantial investment of time would be required to refine the administration and scoring systems needed for more complex tasks such as these, it should be weighed against the possible danger of under-representing the likelihood of academic success based on results of the basic comprehension-only measures still most often used to assess academic proficiency.
Acknowledgements
This project was funded by a grant from Educational Testing Service as part of the TOEFL 2000 effort. We appreciate the cooperation of the ETS staff members who assisted in the development of passage-specific tasks and other aspects of the research; however, no official endorsement of Educational Testing Service should be inferred.
VII References
Bachman, L. 2000: Modern language testing at the turn of the century: assuring that what we count counts. Language Testing 17, 1–42.

Biber, D. 1993: Using register-diversified corpora for general language studies. Computational Linguistics 19, 219–41.

Britt, M., Rouet, J. and Perfetti, C. 1996: Using hypertext to study and reason about historical evidence. In Rouet, J., Levonen, J., Dillon, A. and Spiro, R., editors, Hypertext and cognition. Mahwah, NJ: Lawrence Erlbaum, 43–72.

Chapelle, C. 1999: Validity in language assessment. Annual Review of Applied Linguistics 19, 1–19.

Educational Testing Service 1997: TOEFL test and score manual. Princeton, NJ: Educational Testing Service.

Educational Testing Service 1998: Draft TOEFL 2000 research agenda framework: areas of research. Research agenda three, TOEFL 2000 internal document. Princeton, NJ: Educational Testing Service.

Eignor, D., Taylor, C., Kirsch, I. and Jamieson, J. 1998: Development of a scale for assessing the level of computer familiarity of TOEFL examinees. TOEFL Research Report No. 60. Princeton, NJ: Educational Testing Service.

Enright, M. and Schedl, M. 1999: Reading for a reason: using reader purpose to guide test design. TOEFL 2000 Internal Report. Princeton, NJ: Educational Testing Service.

Enright, M., Grabe, W., Mosenthal, P., Mulcahy-Ernt, P. and Schedl, M. 1998: A TOEFL 2000 framework for testing reading comprehension: a working paper. Princeton, NJ: Educational Testing Service.

Foltz, P. 1996: Comprehension, coherence and strategies in hypertext. In Rouet, J., Levonen, J., Dillon, A. and Spiro, R., editors, Hypertext and cognition. Mahwah, NJ: Lawrence Erlbaum, 109–36.
Freedle, R. and Kostin, I. 1999: Does the text matter in a multiple choice test of comprehension? The case for the construct validity of TOEFL's minitalks. Language Testing 16, 2–32.

Goldman, S. 1997: Learning from text: reflections on the past and suggestions for the future. Discourse Processes 23, 357–98.

Hayes, J. and Hatch, J. 1999: Issues in measuring reliability: correlation versus percentage of agreement. Written Communication 16, 354–67.

Huberty, C. 1994: Applied discriminant analysis. New York: John Wiley.

Jamieson, J., Campbell, J., Norfleet, L. and Berbisada, N. 1993: Reliability of a computerized scoring routine for an open-ended task. System 21, 305–22.

Jamieson, J., Norfleet, L. and Berbisada, N. 1993: Successes, failures and dropouts in computer-assisted language lessons. Computer Assisted English Language Learning Journal 4, 12–20.

Klecka, W. 1980: Discriminant analysis. In Lewis-Beck, M., editor, Quantitative applications in the social sciences, Volume 19. Newbury Park, CA: Sage.

Lehto, M., Zhu, W. and Carpenter, B. 1995: The relative effectiveness of hypertext and text. International Journal of Human-Computer Interaction 7, 293–313.

McNamara, D. and Kintsch, W. 1996: Learning from texts: effects of prior knowledge and text coherence. Discourse Processes 22, 247–88.

Messick, S. 1989: Validity. In Linn, R.L., editor, Educational measurement, 3rd edition. New York: American Council on Education/Macmillan, 13–103.

Meyer, B. 1985a: Prose analyses: purposes, procedures and problems. In Britton, B. and Black, J., editors, Understanding expository text. Hillsdale, NJ: Lawrence Erlbaum, 11–64.

Meyer, B. 1985b: Prose analysis: purposes, procedures and problems, Part 2. In Britton, B. and Black, J., editors, Understanding expository text. Hillsdale, NJ: Lawrence Erlbaum, 269–97.

Monks, V. 1997, August/September: Two views, same waterway. National Wildlife 35, 36–37.

Pellegrino, J., Baxter, G. and Glaser, R. 1999: Addressing the 'two disciplines' problem: linking theories of cognition and learning with assessment and instructional practice. Review of Research in Education 24, 307–53.

Perfetti, C. 1997: Sentences, individual differences and multiple texts: three issues in text comprehension. Discourse Processes 23, 337–55.

Perfetti, C., Britt, M.A. and Georgi, M. 1995: Text-based learning and reasoning: studies in history. Hillsdale, NJ: Lawrence Erlbaum.

Perfetti, C., Marron, M. and Foltz, P. 1996: Sources of comprehension failure: theoretical perspectives and case studies. In Cornoldi, C. and Oakhill, J., editors, Reading comprehension difficulties: processes and intervention. Mahwah, NJ: Lawrence Erlbaum, 137–65.
Reinking, D. 1988: Computer-mediated text and comprehension differences: the role of reading time, reader preference, and estimation of learning. Reading Research Quarterly 23, 484–500.
Reinking, D. and Schreiner, R. 1985: The effects of computer-mediated text on measures of reading comprehension and reading behavior. Reading Research Quarterly 20, 536–53.
Spivey, N. 1997: The constructivist metaphor: reading, writing, and the making of meaning. San Diego, CA: Academic Press.
Stevens, J. 1996: Applied multivariate statistics for the social sciences, 3rd edition. Mahwah, NJ: Lawrence Erlbaum.
Tabachnick, B. and Fidell, L. 1996: Using multivariate statistics, 3rd edition. New York: Harper Collins.
Taylor, C., Jamieson, J., Eignor, D. and Kirsch, I. 1998: The relationship between computer familiarity and performance on computer-based TOEFL test tasks. TOEFL Research Report No. 61. Princeton, NJ: Educational Testing Service.
Tennesen, M. 1997, November/December: On a clear day. National Parks 71, 26–9.
Trites, L. 2000: Beyond basic comprehension: reading to learn and reading to integrate for native and non-native speakers. Unpublished doctoral dissertation, Northern Arizona University, Flagstaff, AZ.
Van den Berg, S. and Watt, J. 1991: Effects of educational setting on student responses to structured hypertext. Journal of Computer-Based Instruction 18, 118–24.
Van Dijk, T.A. and Kintsch, W. 1983: Strategies of discourse comprehension. New York: Academic Press.
Wiley, J. and Voss, J. 1999: Constructing arguments from multiple sources: tasks that promote understanding and not just memory for text. Journal of Educational Psychology 91, 310–11.
Zimmerman, T. 1997, December 29: Filter it with billions and billions of oysters: how to revive the Chesapeake Bay. US News and World Report 123, 63.
Appendix 1a Chart completion task
Directions: Complete the following chart. Fill in as much detail as possible from the text read by categorizing the information into the different areas on the chart. Do not use single words for your responses; form your responses in phrases or complete sentences. Include examples from the text.
Make no judgments about the accuracy of causes or effects or the effectiveness of the solutions mentioned in the text. Solutions are seen as any action taken in response to the problem(s). Solutions can take many forms, such as proposed solutions, attempted solutions, or failed solutions. Also, space is provided under each category for examples. Examples are specific examples found in the text that are used by the author(s) to exemplify the problems, causes, effects, or solutions in the text. However, there may not be examples for every category.
Points will be awarded for correct responses only. There is no penalty for incorrect responses. Points will be awarded in the following manner:
Problems and Solutions: 10 points each
Causes and Effects: 5 points each
Examples: 1 point each
Problems | Causes | Effects | Solutions
Examples | Examples | Examples | Examples
Appendix 1b Reading to Learn scoring rubric (0–241 total possible)
Problems (10 points), 1 word (6 points), 40 maximum
• Air pollution/Smog in the National Parks
• Opposition or Ignoring (or lack of cooperation) to environmental standards
• No true 'point sources' (myriad of smaller pollution sources rather than one large source)
• National Parks limited jurisdiction of pollution sources outside of the parks
Causes (5 points), 1 word (3 points), 55 maximum
• Sulfur/nitrogen emissions
• Acid rain
• Ground-level ozone
• Urban/Industrial emissions
• Automobile emissions
• Power plants/factories/plants emissions
• Emissions from Smokestacks (chimneys)
• Emissions from Kilns
• Unregulated pollution/smog (from other countries, Mexico)
• Smoke from Controlled burns
• Emissions from large cities
Effects (5 points), 1 word (3 points), 30 maximum
• Trees and plants affected (injured), growth hindered
• Visibility decreased
• Metals loosened into waters (surface/ground water, water polluted)
• Nutrients removed (leached) from soil and/or plants
• Public outcry over stance of big business/government
• Aquatic life injured (damaged)
Solutions (10 points), 1 word (6 points), 90 maximum
• Stricter Environmental Laws (resolutions, acts)
• 1977 Clean Air Act Amendments labeling NPs as Class I areas (regulations)
• Regional haze regulations proposed by the EPA
• Reduction of allowable pollution standards from industry
• Clean Air Act set visibility goals
• Objecting to construction permits
• Identification of pollution sources
• Industry modifications/Installation of devices (scrubbers) to reduce pollution
• Public pressure to protect parks
Appendix 1b (continued)
Problems: Examples (1 point), 5 maximum
• Great Smoky Mountain National Park
• Grand Canyon National Park
• Smoky Mountain Sources from Ohio, New York, Atlanta, etc.
• Grand Canyon Sources from CA, NV, UT, AZ, NM and Mexico
Causes: Examples (1 point, or 5 points with overarching cause listed above)
• Automobile emissions
• Power plants/factories/plants emissions
• Emissions from Smokestacks (chimneys)
• Emissions from Kilns
• Unregulated pollution/smog (from other countries, Mexico)
• Smoke from Controlled burns
• Emissions from large cities
Causes: Examples (1 point), 10 maximum
• Navajo Generating Station, Page, AZ (Power Plant, AZ)
• Power plants in TN and OH river valley
• Southern California Edison Plant, Laughlin, NV
• Tennessee Luttrell Kilns, TN
• Industry in AZ
• Cars in CA
• Smokestacks in NM, NV, UT
• Los Angeles
• Atlanta
• New York
Effects: Examples (1 point), 6 maximum
• Leaves of plants turning purple and brown (stippling)
• Hindering photosynthesis of plants and trees
• Reduction of visibility at Grand Canyon
• Visibility reduced 90% of the days
• See vague blue masses
• See half as far as in 1919
Solutions: Examples (1 point, or 10 points with overarching solution listed above)
• 1977 Clean Air Act Amendments labeling NPs as Class I areas
• Regional haze regulations proposed by the EPA
• Reduction of allowable pollution standards from industry
• TN Luttrell corp building permit situation
Solutions: Examples (1 point), 5 maximum
• Objecting to kilns in TN
• Researchers using scientific technology to ID sources (radioactive isotopes)
• Southern California Edison Plant in Laughlin, NV identified as point source
• Scrubbers installed at Navajo Generating Plant
• 10% reduction of pollution in 10–15 years (goal of no man-made pollution)
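The point values and maxima in the Reading to Learn rubric amount to a capped, weighted sum. The Python sketch below is an illustration only and is not the authors' scoring procedure: the function name, the dictionary layout, and the capping logic are assumptions, the example maxima are taken as best they can be read from the rubric, and the reduced values for one-word responses (6 and 3 points) are not modeled.

```python
# Point values per correct response and per-category maxima, as read
# from Appendix 1a/1b. All names here are hypothetical.
POINTS = {"problems": 10, "causes": 5, "effects": 5, "solutions": 10}
CATEGORY_MAX = {"problems": 40, "causes": 55, "effects": 30, "solutions": 90}
EXAMPLE_MAX = {"problems": 5, "causes": 10, "effects": 6, "solutions": 5}


def reading_to_learn_score(correct, examples):
    """Return a chart score from counts of correct responses.

    correct  -- dict: category -> number of correct category responses
    examples -- dict: category -> number of accurate examples (1 point each)
    Incorrect or misplaced responses earn nothing and carry no penalty,
    so they are simply not counted by the caller.
    """
    total = 0
    for category, points in POINTS.items():
        # Category responses, capped at the rubric maximum.
        total += min(correct.get(category, 0) * points, CATEGORY_MAX[category])
        # Examples, one point each, capped separately.
        total += min(examples.get(category, 0), EXAMPLE_MAX[category])
    return total


if __name__ == "__main__":
    full_chart = {"problems": 4, "causes": 11, "effects": 6, "solutions": 9}
    full_examples = {"problems": 5, "causes": 10, "effects": 6, "solutions": 5}
    print(reading_to_learn_score(full_chart, full_examples))
```

With every category filled to its maximum, the caps sum to 40 + 55 + 30 + 90 + 26 = 241, which matches the total stated in the rubric heading.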
Appendix 2a Reading to Integrate task: integration activity
Directions: For the next 15 minutes, reflect on what you read and compose a short essay that combines the information in the material read and makes connections across the range of ideas and arguments presented.
[Ruled lines for the written response]
Appendix 2b Reading to Integrate scoring rubric
Integration
50 Excellent: Integrates texts accurately and successfully on multiple levels and creates a true Documents Model. Generalizes at least two macrostructure concepts common across texts (this may be simply identifying the existence of a macrostructure followed by the supporting macrostructures from each text). Effectively integrates relevant support (i.e., details or macrostructures) to support generalizations and may still discuss each article separately to some degree. Integrates macrostructures present in both texts.
40 Good: Integrates through the creation of a well developed and accurate introduction and/or a conclusion, yet summarizes articles separately, AND/OR generalizes one major macrostructure that is fully developed with support. Uses some relevant support.
30 Fair: Attempts to integrate through the creation of a partially developed, possibly inaccurate introductory AND/OR concluding statement, usually related to the main topic of the articles, but has no substantive development. Creates a Text and Situation Model of each text separately. Does not make generalizations for integration beyond the one statement. May make evaluative or editorial statements; however, these may be inaccurate.
20 Poor: Creates a well developed and accurate Text Model and Situation Model for each of the texts, yet attempts no integrative connection across the texts.
10 Very Poor: Ineffectively or inaccurately attempts to summarize each article in a very reporting style, revealing little or no use of background knowledge, contextualization, or evaluation. Response is simply a recall of information from the texts. No introduction or conclusion, OR participant confuses texts and sees separate texts as one issue (problem).
0 No Response: Participant does not attempt response or only addresses one article.
Macrostructures
Participants will receive one point for each macrostructure accurately identified in each text, with a minimum score of 4 awarded for the inability to accurately identify any macrostructures.
25 Excellent (Total 8): Accurately identifies all 4 macrostructures present in both texts.
20 Good (Total 6/7): Accurately identifies all 4 macrostructures in one text and 3 in the other, OR accurately identifies 3 of the four macrostructures in both texts (or 4 in one and 2 in the other).
15 Fair (Total 4/5): Accurately identifies 3 of the four macrostructures in one text and 2 of the four in the other (or 4 in one and 1 in the other), OR accurately identifies 2 of the four macrostructures in both texts (or 4 in one and 0 in the other, or 3 in one and 1 in the other).
10 Poor (Total 2/3): Accurately identifies 2 of the macrostructures in one text and 1 in the other (or 3 in one and 0 in the other), OR accurately identifies 1 of the four macrostructures in both texts (or 2 in one and 0 in the other).
5 Very Poor (Total 0/1): Accurately identifies 1 of the four macrostructures in one text and 0 in the other, OR unable to accurately identify any macrostructures in the texts.
0 No Response: Participant does not attempt response.
Use of relevant details
5 Excellent: Effectively uses multiple relevant details as support; no irrelevant or erroneous details.
4 Good: Effectively uses relevant details as support; may include some inaccurate or irrelevant details. (More than 50% of details are relevant.)
3 Fair: Possibly uses one or two relevant details, yet several irrelevant details or inaccurate details (erroneous or fabricated) appear in the synthesis. (50% or less of details are relevant.)
2 Poor: Erroneous or fabricated details used in attempt to support arguments presented; does not include relevant details.
1 Very Poor: No details used or listed.
0 No Response: Participant does not attempt response.
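The three subscales of this rubric combine into the 0 to 80 analytic score used for Reading to Integrate. The sketch below is a hedged illustration, not the authors' scoring guide: the function names and the `attempted` flag are assumptions, the integration and detail ratings are taken as already assigned by a rater, and the macrostructure band follows the totals printed in the rubric (so an attempted response with no macrostructures identified receives 5).

```python
def macrostructure_band(text_a, text_b, attempted=True):
    """Map macrostructures identified in each text (0-4 each) to a band score."""
    if not attempted:
        return 0  # "No Response" row of the rubric
    for count in (text_a, text_b):
        if not 0 <= count <= 4:
            raise ValueError("each text has at most four macrostructures")
    total = text_a + text_b
    if total == 8:
        return 25  # Excellent
    if total >= 6:
        return 20  # Good (Total 6/7)
    if total >= 4:
        return 15  # Fair (Total 4/5)
    if total >= 2:
        return 10  # Poor (Total 2/3)
    return 5       # Very Poor (Total 0/1)


def reading_to_integrate_score(integration, text_a, text_b, details,
                               attempted=True):
    """Sum the three subscores; the maximum is 50 + 25 + 5 = 80."""
    if not attempted:
        return 0
    if integration not in (0, 10, 20, 30, 40, 50):
        raise ValueError("integration is rated in steps of 10 from 0 to 50")
    if details not in (0, 1, 2, 3, 4, 5):
        raise ValueError("use of relevant details is rated from 0 to 5")
    return integration + macrostructure_band(text_a, text_b) + details


if __name__ == "__main__":
    print(reading_to_integrate_score(40, 4, 2, 4))
```

Because the band depends only on the combined total, the parenthetical combinations in the rubric (for example, 4 in one text and 1 in the other) fall out of the same thresholds.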
higher education. ETS, aware of this minimum standard and, as Bachman (2000) mentions, the need for task authenticity, embarked on a large-scale project to redesign TOEFL to better reflect the academic language skills required in higher education. Among other goals, the TOEFL 2000 project (Enright et al., 1998) outlined plans to establish reading tasks for four distinct purposes:
• finding information;
• achieving basic comprehension;
• learning from texts; and
• integrating information.
The latter two purposes for reading represent a departure from traditional reading tests and constitute more complex tasks that require more cognitive processing. Tasks appropriate to measure these new purposes needed to be developed and validated.
The project reported here pursued the creation and evaluation of these new task types, the development of scoring rubrics, and the evaluation of native language effects on task and test performance (Educational Testing Service, 1998). In addition, because these were new reading tasks, some evidence for their validity was sought by establishing a baseline for native speakers and then comparing that baseline to performance of nonnative speakers. The TOEFL 2000 reading construct paper (Enright et al., 1998) suggested that a Reading to Learn task would require students to recognize the larger rhetorical frame organizing the information in a given text and carry out a task demonstrating awareness of this larger organizing frame. Enright et al. (1998) hold that, in reading to learn, readers must integrate and connect information presented by the author with what they already know. Thus, readers must rely on background knowledge of text structures to form a Situation Model, a representation of the content, and a Text Model, a representation of the rhetorical structures of the text, as postulated by van Dijk and Kintsch (1983) and discussed by Perfetti (1997). Goldman (1997: 362) asserted that, to learn from texts, readers must have an awareness of text structure and know how to use it to aid comprehension. Reading to Learn can be assessed in a variety of ways. McNamara and Kintsch (1996) suggested that inferencing and sorting tasks requiring readers to process the text based on domain-specific knowledge of the text structures could yield a representation of the readers' ability to learn from the text. Hence, we postulated that one useful means of assessment would be to have participants recall information and reproduce information relationships reflecting their concept of text structure (Enright et al., 1998: 46–48). For the Reading to Learn task, we assessed readers' knowledge model through their ability to recall and categorize information from a single text (Enright et al., 1998: 57).
Another goal of the project was to assess Reading to Integrate information, which requires readers to integrate information from multiple sources on the same topic. Reading to Integrate goes a step further than Reading to Learn because readers must integrate the rhetorical and contextual information found across the texts and generate their own representation of this interrelationship (Perfetti, 1997). Therefore, readers must assess the information presented in all sources read and accept or reject pieces of it as they create their own understanding. One means of assessing integration of information, found in typical university assignments, is the open-ended task of generating a synthesis based on one or more texts (Enright et al., 1998: 48–49). We used a writing task, specifically a writing prompt that elicited the reader's perception of the authors' communicative purposes (Enright et al., 1998: 56) as well as amount of information retained from two texts, to test Reading to Integrate.
II Related literature
Recent research has begun to explore the development of tasks that distinguish the construct of Reading to Learn from basic comprehension. Researchers (van Dijk and Kintsch, 1983; McNamara and Kintsch, 1996; Goldman, 1997) have determined that reading to learn requires an interaction between the Text Model of a text and its Situation Model, thus resulting in a more difficult measure. These researchers further suggest that Reading to Learn can be assessed through measures that go beyond recall, summarization, and text-based multiple-choice questions.
The construct of Reading to Integrate requires that readers not only integrate the Text Model with the Situation Model but also create what Perfetti (1997: 346) calls a Documents Model, consisting of two critical elements: 'An Intertext Model that links texts in terms of their rhetorical relations to each other and a Situations Model that represents situations described in one or more text with links to the texts'. He argues that the use of multiple texts, as opposed to a single text, brings into clearer focus the relationship between the Text Model and the Situation Model. This again suggests that Reading to Integrate should be more difficult than Reading to Learn.
Because these constructs go beyond basic comprehension, Reading to Learn and Reading to Integrate are hypothesized to be more difficult reading tasks than Reading to Find Information and Reading for Basic Comprehension. Perfetti (1997) further suggests that Reading to Integrate is a more difficult task than Reading to Learn because it requires not only an integration of a Text Model and a Situation Model but also an integration of multiple Text Models and multiple Situation Models. Thus, current reading theory suggests a difficulty hierarchy of reading tasks based on the level of integration necessary to complete the tasks successfully. Several studies (Perfetti et al., 1995; 1996; Britt et al., 1996; Wiley and Voss, 1999) have attempted to move beyond basic comprehension and examine readers' ability to integrate the information from multiple texts into one cohesive knowledge base by having students make connections, compare, or contrast information across texts.
Additionally, recent research has addressed the effects of computers on reading and assessment; such research is relevant to the current project because the new TOEFL is administered via computers. Reading-medium studies have shown that the only effect that computers have on reading is related to task (Reinking and Schreiner, 1985; Reinking, 1988; van den Berg and Watt, 1991; Lehto et al., 1995; Perfetti et al., 1995; 1996; Britt et al., 1996; Foltz, 1996; Wiley and Voss, 1999). Taylor et al. (1998) found that, after minimal computer training, familiarity with technology did not have a significant effect on examinees' performance on TOEFL-like questions. Because of the relevance of computer familiarity to TOEFL administration, a brief measure of computer familiarity was included in the research.
For this project, we asked three research questions:
1) Is performance on a measure of Reading to Learn affected by medium of presentation (paper versus computer), technology familiarity, native language (native versus nonnative speakers of English), or level of education (graduate versus undergraduate)?
2) Is performance on a measure of Reading to Integrate affected by medium of presentation (paper versus computer), technology familiarity, native language (native versus nonnative), or level of education (graduate versus undergraduate)?
3) To what extent are measures of finding information/basic reading comprehension, Reading to Learn, and Reading to Integrate related?
III Methods
1 Participants
Two hundred and fifty-one participants, the majority undergraduates, volunteered to take part in this study. The sample consisted of 105 undergraduate native speakers of English (NSUs), 106 undergraduate nonnative speakers (NNSUs), and 40 graduate nonnative speakers (NNSGs) of English at a midsized southwestern university. All data were collected between February and October 1999. All undergraduate participants were recruited through large undergraduate classes in the areas enrolling most NNSs (business administration, hotel management, engineering, social sciences, and humanities). We tested all NNSs accessible at the institution at the time of data collection; compared to a national sample of international students from the prior academic year, we had a larger proportion of undergraduate relative to graduate students. Nearly all undergraduate participants were young adults, with an average age of 21. Nonnative speakers were also recruited from students enrolled in the summer intensive English program, which is made up of students needing to increase TOEFL scores to at least 500 in order to enroll at a university. We included 46 participants (32% of NNS sample) with TOEFL scores below 500 in the nonnative sample. Graduate nonnative speakers (n = 40) were recruited from the entire university population and had an average age of 30.75. Nonnative speakers represented a range of language backgrounds. One third were Japanese, with other Asian, Germanic, and Romance languages also substantially represented. Both the relatively modest sample size and the all-volunteer nature of the participant sample preclude direct generalization to the worldwide TOEFL population, but participants were representative of the levels of international students at the institution where they were enrolled. Participants who completed all four data collection sessions received a payment of US$10 per hour (US$40 for the entire project).
2 Instruments
This project used three existing instruments, two to determine initial reading levels and one to assess levels of computer familiarity, and two new instruments, one for Reading to Learn and one for Reading to Integrate; these were developed especially for the project. Each of the new measures also served as the basis for an additional measure of basic reading comprehension related directly to the text included in the new task. Thus, each participant completed a total of seven different instruments.
a Existing instruments: Initial levels of reading comprehension were determined based on the Nelson–Denny Reading Test (Nelson–Denny), Form G, used to identify the reading levels of the NSs, and three retired versions of the Institutional TOEFL Reading Comprehension Section (TOEFL Reading Comprehension), used to identify the reading levels of the NNSs. Although each of these tests was used to assess reading levels in the population for which it had been developed, all 251 participants took both tests in order to provide comparative data. All 251 participants also completed a brief computer familiarity questionnaire.
Participants' computer familiarity was determined through an 11-item questionnaire based on a longer 23-item questionnaire previously developed by ETS (Eignor et al., 1998). In the present study, we used only the 11 items that loaded the most heavily on the major factors resulting from administration to a large sample of TOEFL participants. For these 11 items, developers determined the reliability to be .93 using a split-half method (Eignor et al., 1998: 22). This brief questionnaire took approximately 5 minutes to complete; reliability in our sample, using coefficient alpha, was .87.
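Coefficient alpha, reported here and for the scoring rubrics later in this section, can be computed from a respondents-by-items score matrix as k/(k − 1) × (1 − Σ item variances / variance of total scores). The sketch below illustrates that standard formula only; the function name and the list-of-lists input are assumptions, and the paper does not say which software was used. For interrater reliability, the same computation applies with raters in place of items.

```python
from statistics import variance


def coefficient_alpha(scores):
    """Cronbach's coefficient alpha.

    scores -- list of rows, one per respondent, each a list of k item
              scores (or k raters' scores for the same response).
    """
    k = len(scores[0])
    if k < 2 or len(scores) < 2:
        raise ValueError("need at least two items and two respondents")
    if any(len(row) != k for row in scores):
        raise ValueError("every row must have the same number of items")
    # Sample variance of each item (column) across respondents.
    item_variance_sum = sum(variance(column) for column in zip(*scores))
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


if __name__ == "__main__":
    # Hypothetical data: four respondents, three items.
    demo = [[4, 5, 4], [2, 3, 2], [5, 5, 4], [1, 2, 2]]
    print(round(coefficient_alpha(demo), 3))
```

The function divides by the variance of the total scores, so it is undefined when every respondent has the same total.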
b Texts used for new measures: In developing the new tasks, we selected texts that would conform to the design specifications of TOEFL 2000. They were problem/solution texts, recommended as one of the potentially relevant text types for TOEFL 2000 (Enright et al., 1998). Longer texts were used because these represented more challenging and authentic academic tasks (Enright et al., 1998). We used one 1200-word and two 600-word texts. The longer text (Tennesen, 1997) was used to assess Reading to Learn, and the two 600-word texts (Monks, 1997; Zimmerman, 1997) were used to assess Reading to Integrate. We chose these text lengths based on work by Meyer (1985a) and further research by the first author, indicating that natural science texts between 1200 and 1500 words included representation of all necessary macro-rhetorical structures of problem/solution texts, with or without explicit signaling. While 1200–1500 word texts provide optimal representation of the macro-rhetorical structures, texts of 600 words provide all the basic macro-rhetorical structures present in problem/solution texts. Thus, these lengths were long enough for adequate argumentation but not so long that they were excessively redundant (Enright et al., 1998). Texts were also matched for readability according to standard readability scales, such as the Flesch–Kincaid, Coleman–Liau, and Bormuth scales, and averaged a minimum of grade level 11.0 to 12.0 on these scales. Also, all texts pertained to natural and social sciences; each text covered environmental issues such as air and water pollution (Enright et al., 1998). Thus, text topics were similar across tasks.
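For reference, the Flesch–Kincaid grade level, one of the scales named above, is computed from word, sentence, and syllable counts. This is a minimal sketch of the published formula, assuming the counts are obtained elsewhere; the function name is illustrative, and the paper does not say which tool produced its readability figures.

```python
def flesch_kincaid_grade(words, sentences, syllables):
    """Flesch-Kincaid grade level from raw counts for one text."""
    if words <= 0 or sentences <= 0:
        raise ValueError("word and sentence counts must be positive")
    average_sentence_length = words / sentences
    average_syllables_per_word = syllables / words
    return (0.39 * average_sentence_length
            + 11.8 * average_syllables_per_word
            - 15.59)


if __name__ == "__main__":
    # Hypothetical counts for a 600-word passage.
    print(round(flesch_kincaid_grade(600, 28, 1020), 2))
```

Matching texts on a scale like this means checking that each candidate passage falls in the same band, here grade 11.0 to 12.0.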
c New instruments used in the study: Three new reading measures were used in this study to assess Reading to Learn, Reading to Integrate, and Basic Comprehension. Trites (2000: Chapters 2 and 3) presents a more extensive review of literature and rationale for development of the new measures.
bull Reading to Learn The first new measure completion of a chart was used to determine participantsrsquo ability to read to learnSpivey (1997 69) suggests that readersrsquo categorization of infor-mation in text offers insight into their cognitive processes andtheir making of meaning We designed a measure to be used with a 1200-word text that students read on either paper or com-puter Students were asked to recall identify and categorizeinformation from the text on a chart reflecting macro-rhetoricalstructures called macrostructures in this study (problems andsolutions) and other types of information from problemsolutiontexts (causes effects and examples) categories based on thework of Meyer (1985a) The scoring rubric based on work byMeyer (1985b) and later modified by Jamieson et al (1993)awarded points only for the upper levels of textual structurerepresented on the chart (for task and scoring rubric seeAppendix 1) We weighted the information supplied on the chartas follows 10 points for correct information in the problem andsolution categories five points for correct information suppliedin the cause and effect categories and one point for accurateexamples This weighting reflects Meyerrsquos (1985b) hierarchicallevels which characterize problem and solution propositions ashigher order structures while the other categories represent lowerorder propositions1 The theoretical maximum score for this scale
1Students received no points for information improperly placed or for information not found in thetext
Latricia Trites and Mary McGroarty 181
was 241 which would result from maximum points given in allcategories The first author and two research assistants spent35ndash40 hours creating revising norming the scoring rubric anddeveloping the scoring guide (Trites 2000 Chapter 3) To deter-mine interrater reliability we used coefficient alpha rather thanpercentage of agreement because percentage of agreementinflates the likelihood of chance agreement (Hayes and Hatch1999) After norming overall interrater reliability was 99(coefficient alpha) with similarly high reliabilities assessed withsimilarly high alpha coefficients for all subcategories2
² We recognize that tasks requiring high-inference measures, plus extensive norming and revision of the scoring rubric, pose feasibility issues in large-scale testing. Further research is needed to determine whether and how such scoring procedures could be adapted in standardized testing for numerous test-takers.
• Reading to Integrate: The second new measure assessed Reading to Integrate. The task required participants to read two 600-word texts and compose a written synthesis. The prompt asked students to make connections across the range of ideas presented; thus, we asked readers to synthesize information rather than summarize or make comparisons (Wiley and Voss, 1999). This synthesis was scored on an analytic scale ranging from 0 to 80, reflecting readers' ability to recognize and manipulate the structure of the texts, include specific information, and express connections across texts through the use of cohesive devices (for task and scoring rubric, see Appendix 2). The test was designed to measure the integration of content from both readings and did not assess other aspects of writing, such as the creation of rhetorical style, grammaticality, or mechanics. The rubric was composed of three subcategories: integration ability, macrostructure recognition, and use of relevant details. The integration subscore was awarded the highest point values because this was the predominant skill being tested. It scored participants on their ability to make connections across texts based on the manipulation of the textual frames in both texts. The second subcategory awarded points for the ability to recognize and articulate the macrostructures (problem, cause, effect, or solution) present in each text. This subcategory was similar to the categorizing task used in the Reading to Learn measure, with the additional constraint that participants had to express the connections overtly. The third subcategory in the scoring rubric analysed the ability to use relevant details as support in the written synthesis. The first author and two research assistants spent 30 hours revising and norming the scoring rubric and developing a decision guide, resulting in an overall interrater reliability of .99 (coefficient alpha), with similarly high alphas for all subcategories.
• Basic Comprehension: The third construct was measured by multiple-choice tests related specifically to the texts used in the new tasks. These tests were created by TOEFL Test Development staff and followed current TOEFL reading section specifications. We used two multiple-choice tests of 20 items each, Basic Comprehension Test 1 (BC1) and Basic Comprehension Test 2 (BC2): one for the longer passage used to assess Reading to Learn and one for the two passages used to assess Reading to Integrate. Both were scored on the number of items answered correctly. Reliability on BC1, calculated on 251 participants, was .84 (coefficient alpha). Inadvertently, the order of the texts used in BC2 differed between the two media; however, reliability on both versions of the test was high. For those who took BC2 based on paper texts (n = 127), reliability was .84 (coefficient alpha); for those who took BC2 based on computerized texts (n = 124), reliability was .86 (coefficient alpha).
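The coefficient alpha values reported for the measures above can be illustrated with a minimal sketch. It assumes the standard Cronbach formula and treats each rater (or each test item) as one column of scores; the function name `coefficient_alpha` and the ratings are hypothetical and are not the study's data or scoring software.

```python
from statistics import pvariance

def coefficient_alpha(columns):
    """Cronbach's coefficient alpha.

    columns: one list per rater (or per test item); each list holds the
    scores for the same participants in the same order.
    """
    k = len(columns)
    n = len(columns[0])
    if k < 2 or any(len(col) != n for col in columns):
        raise ValueError("need at least two equally long columns")
    # Total score per participant, summed across raters/items.
    totals = [sum(col[i] for col in columns) for i in range(n)]
    sum_of_column_variances = sum(pvariance(col) for col in columns)
    return (k / (k - 1)) * (1 - sum_of_column_variances / pvariance(totals))

# Hypothetical scores from three raters for five syntheses (not study data).
rater_a = [10, 20, 30, 40, 50]
rater_b = [12, 18, 33, 41, 52]
rater_c = [9, 22, 29, 38, 49]
alpha = coefficient_alpha([rater_a, rater_b, rater_c])
```

Unlike percentage of agreement, alpha rises only when raters rank and space participants consistently, which is why closely tracking raters such as these yield a value near 1.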
3 Design for data collection
This study used a 2 × 2 repeated measures design to examine performance on the new reading tasks. Native speaker undergraduates and nonnative speaker undergraduates were each divided into two groups of equal ability, as determined by performance on the baseline standardized measures of reading comprehension (Nelson–Denny or TOEFL). Half of each group read texts on paper; the other half read the same texts on a computer screen. A smaller group of nonnative speaker graduates, equally divided, was also included for a comparison between performance by graduate and undergraduate nonnative speakers. Additionally, the administration of the new measures was counterbalanced to control for any practice effect.
a Procedures: All participants met with the researchers in four sessions, each lasting about an hour. The first two sessions were devoted to administering the existing instruments. During Session 1, participants received an introduction to the study and took one of the two standardized basic reading comprehension measures (Nelson–Denny or TOEFL Reading Comprehension). Students completed the computer familiarity questionnaire and the Nelson–Denny Test at the same testing session because the Nelson–Denny was shorter than the TOEFL Reading Comprehension. During Session 2, participants took the other standardized basic reading comprehension measure.
Next, each participant group was subdivided into two subgroups for computer-based or paper reading of the texts for the new tasks. The subgroups were matched on their performance on initial reading measures: the Nelson–Denny was used for native speakers and the TOEFL Reading Comprehension was used for nonnative speakers. Independent t-tests run on these reading measures showed no significant difference in basic comprehension for the newly created subgroups assigned to each medium, ensuring that they were balanced for initial reading levels. Participants stayed in the same subgroups for the duration of the study. To ensure uniformity of response mode, all participants, whether they read the source texts on the computer or on paper, responded to the reading tasks using paper and pencil format.³
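The matching and balance check described above can be sketched as follows. The article does not say how the matched subgroups were formed, so the ABBA dealing rule, the function names, and the scores below are assumptions for illustration only; the t statistic is the usual pooled-variance form.

```python
from math import sqrt
from statistics import mean, variance

def matched_split(scores):
    """Split participants into two subgroups balanced on a baseline score.

    scores: dict mapping participant id -> baseline reading score.
    Participants are ranked by score and dealt out in an ABBA pattern so
    that neither subgroup always receives the stronger member of a pair.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    paper, computer = [], []
    for i, pid in enumerate(ranked):
        (paper if i % 4 in (0, 3) else computer).append(pid)
    return paper, computer

def pooled_t(x, y):
    """Independent-samples t statistic with pooled variance."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / sqrt(pooled_var * (1 / nx + 1 / ny))

# Hypothetical baseline scores for eight participants (not study data).
baseline = {"p1": 67, "p2": 64, "p3": 61, "p4": 58,
            "p5": 55, "p6": 52, "p7": 49, "p8": 46}
paper, computer = matched_split(baseline)
t = pooled_t([baseline[p] for p in paper], [baseline[p] for p in computer])
```

A t value close to zero, as with these evenly spaced scores, is what a successful match on the baseline measure looks like.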
³ Although responses could have been entered and perhaps scored by computer, this would have introduced factors not directly related to our research questions and remains an area for further study.
The last two sessions, each lasting approximately one hour, were dedicated to administration of the new measures. The Reading to Learn session took slightly longer to administer because administrative procedures were longer for this novel task. The new tasks were counterbalanced to control for practice effect; thus, half of the participants took the Reading to Learn measure first and half took the Reading to Integrate measure first. During Session 3, we administered the first new measure (for ease of discussion, Reading to Learn is discussed first) and BC1. At this session, students were given 12 minutes to read a 1200-word passage, either on computer or on paper. We limited the time allowed for reading based on a rate of 100 words per minute, thought to be ample (Grabe, personal communication, 1998). After examinees read the text, they were given 4 minutes to take notes on a half sheet of paper. Participants were instructed to take minimal notes due to the time constraints. Next, the text was removed and examinees were allowed 15 minutes to complete a chart based on the reading with the aid of their notes. After completing this Reading to Learn activity, participants were allowed to use the text and were given 15 minutes to answer BC1. Following these new testing sessions, 49 participants were selected for a related interview concerning the cognitive processes used in task completion (for further details, see Trites, 2000: Chapter 6).
During Session 4, students were given 12 minutes to read two short texts (600 words each), either on computer or paper. After participants read the assigned texts, they were given 4 minutes to take one-half page of notes (Enright et al., 1998). Next, the texts were removed and participants were asked to demonstrate Reading to Integrate by writing a synthesis of the texts with the aid of their notes (15 minutes allowed for this task). After completing the Reading to Integrate task, participants were allowed to see the texts again and answered BC2 (15 minutes allowed for this task). In one Reading to Integrate session, for unknown reasons, six of the seven participants read only one text. Because we cannot explain the cause of this anomalous session, we have eliminated the scores of that session's seven participants from subsequent analyses, thus slightly reducing the N size for the Reading to Integrate measure.
b Variables used in study: The six independent variables included three nominal variables (Native Language Background, Medium of Text Presentation, and Level of Education) and three interval variables (Nelson–Denny, TOEFL Reading Comprehension, and Computer Familiarity). The four dependent variables were Reading to Learn, Reading to Integrate, BC1, and BC2.
IV Results
First, we present the descriptive statistics for all reading measures, followed by a systematic analysis of independent variables that might affect participant performance on the new measures. Scatterplots were checked for all reading measures to ensure normality of data. Kurtosis and skewness levels for all reading measures were found to be within normal limits, indicating a relatively normal distribution. Descriptive statistics for all existing measures are shown in Table 1. Means for these measures show a consistent pattern: the native speaker undergraduates had the highest mean, followed by the nonnative speaker graduates, followed by the nonnative speaker undergraduates. On the reading measures, Nelson–Denny and TOEFL Reading Comprehension, the nonnative speaker undergraduates showed the largest variance in performance, while on the computer familiarity measure the variance of both nonnative speaker groups was substantially larger than that of the native speakers.
The same pattern emerged for the means on the new measures (see Table 2) as for the existing measures. The native speaker undergraduate group performed better on all new measures than both of the nonnative speaker groups. The nonnative speaker graduate group performed better than the nonnative speaker undergraduate group on all measures as well. This robust pattern of performance was also found in the variance of three of the four new measures. On BC1 and BC2, the performance of the native speaker undergraduates showed the least variance, followed by the nonnative speaker graduates, followed by the nonnative speaker undergraduates. On Reading to Integrate, the native speaker undergraduate group showed substantially less variance than the nonnative speaker groups; however, the variance of the two nonnative speaker groups was almost identical. On Reading to Learn, all three groups showed considerable variance.
Table 3 reveals the range of points awarded in each participant group. The Reading to Learn point system created a maximum possible point value (241) that no participant achieved. We speculate that there are at least three possible causes of the discrepancy between the theoretical maximum and the range of observed scores:
Table 1 Descriptive statistics for existing measures for three participant groups

Group                          n      Mean     sd      kMax
Nelson–Denny
  NSU                          105    126.48   16.46   156
  NNSU                         106     67.24   31.91   156
  NNSG                          40     88.88   21.88   156
  Total participants           251     95.47   36.93   156
TOEFL Reading Comprehension
  NSU                          105     61.30    4.24    67
  NNSU                         106     50.30    8.53    67
  NNSG                          40     57.15    4.55    67
  Total participants           251     55.99    8.19    67
Computer familiarity
  NSU                          104     38.08    3.60    44
  NNSU                         104     34.82    5.99    44
  NNSG                          40     35.63    6.02    44
  Total participants           248     36.31    5.33    44

Note: kMax = number of items or maximum possible score.
• task novelty: no participant reported ever doing such a task previously;
• time allowed for task completion; and
• space on the response sheet: space constraints may have limited the amount of information that participants could include.
Future research would need to address these issues. However, for the Reading to Integrate measure, the maximum possible point total (80) was achieved by at least one participant in each group.
1 Computer familiarity
The overall plan for the analyses was to check the influence of the independent variables on the dependent measures, with computer familiarity being addressed first. Initially, we had proposed that if computer familiarity was significantly different across groups, it would be entered into all calculations as a covariate. To determine this, it was necessary to conduct an Analysis of Variance (ANOVA) for computer familiarity across the six participant/medium subgroups.
Table 2 Descriptive statistics for new measures for three participant groups

Group                            n      Mean    sd      kMax
Reading to Learn (chart)
  NSU                            105    51.85   19.86   241
  NNSU                           106    31.73   19.50   241
  NNSG                            40    44.68   19.27   241
  Total participants             251    42.21   21.64   241
Basic Comprehension Test 1
  NSU                            105    16.98    2.47    20
  NNSU                           106    11.73    4.25    20
  NNSG                            40    14.98    3.50    20
  Total participants             251    14.44    4.23    20
Reading to Integrate (synthesis)
  NSU                            101    63.65   11.05    80
  NNSU                           103    37.24   21.76    80
  NNSG                            40    53.60   21.03    80
  Total participants             244    50.86   21.63    80
Basic Comprehension Test 2
  NSU                            105    15.91    2.78    20
  NNSU                           106     9.75    4.54    20
  NNSG                            40    12.85    3.61    20
  Total participants             251    12.82    4.72    20

Notes: kMax = number of items or maximum possible score; n size reduced for Reading to Integrate because of anomalous testing session.
The resulting ANOVA (F = 4.70, p < .05) showed a significant difference between subgroups on the computer familiarity questionnaire; therefore, a post hoc Scheffé test was done to locate significant contrasts. After analysis of all possible subgroup contrasts, the post hoc Scheffé revealed that the only significant difference appeared between the native speaker undergraduates and the nonnative speaker undergraduates who read texts on paper. Hence, although there was one significant contrast, it occurred between two subgroups reading on paper, not in any of the subgroups who read on computer. All groups generally scored high on computer familiarity, although, as noted, variance of the nonnative groups was greater. It was thus established that computer familiarity had no significant effect on participants who read texts on computer, so we did not use computer familiarity as a covariate in further analyses and proceeded to the three research questions of central interest to this study.
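For readers who want to see how an F ratio of this kind is formed, the following is a minimal sketch of a one-way ANOVA. It is not the authors' SPSS procedure, the function name is hypothetical, and the scores are invented; with six subgroups and 248 questionnaire respondents, the corresponding degrees of freedom in the study would be 5 and 242.

```python
from statistics import mean

def one_way_anova(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    groups: list of lists, one list of scores per subgroup.
    """
    all_scores = [x for g in groups for x in g]
    grand_mean = mean(all_scores)
    # Between-groups sum of squares: group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-groups sum of squares: scores around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

# Hypothetical questionnaire totals for three small subgroups (not study data).
f, df_b, df_w = one_way_anova([[1, 2, 3], [2, 3, 4], [3, 4, 5]])
```

A significant omnibus F only says that some subgroup means differ; locating which pairs differ is the job of a post hoc procedure such as the Scheffé test used in the study.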
Because Research Questions 1 and 2 are similar, except that they address the two different new reading measures (Reading to Learn and Reading to Integrate), we approached them in the same manner, through ANOVA, to identify the independent variables that could have significantly affected the results on the new measures.
2 Research Question 1
The first research question asked if performance on a measure of Reading to Learn was affected by medium of presentation, computer familiarity, native language, or level of education. We calculated a univariate ANOVA with Type III sums of squares on Reading to Learn, with group status, medium of text presentation, and test order as possible contributing factors.⁴ Table 4 shows that there were no significant interactions for any of the group, medium, or test order combinations. The only significant main effect was group membership.

Table 3 Range of scores for new measures for three participant groups

Group                            n      Minimum   Maximum   kMax
Reading to Learn (chart)
  NSU                            105    14        120       241
  NNSU                           106     0         86       241
  NNSG                            40     3         94       241
  Total participants             251     0        120       241
Reading to Integrate (synthesis)
  NSU                            101    38         80        80
  NNSU                           103     0         80        80
  NNSG                            40     5         80        80
  Total participants             244     0         80        80

Notes: kMax = number of items or maximum possible score; n size reduced for Reading to Integrate because of anomalous testing session.
Because group membership was a combined measure that included both native language background and level of education, post hoc analysis was needed to identify the significant contrasts. Table 5 shows that there was a significant difference in performance on the Reading to Learn measure between the native speaker undergraduate and the nonnative speaker undergraduate groups, as well as a significant difference between the nonnative speaker undergraduate and nonnative speaker graduate groups. There was no significant difference in performance between the native speaker undergraduate and the nonnative speaker graduate groups. Therefore, the answer to Research Question 1 is that native language background and level of education did have a significant effect on performance on the Reading to Learn measure, but that medium of text presentation did not. Further, order of testing (whether participants took Reading to Learn or Reading to Integrate first) had no significant effect.
3 Research Question 2
The second research question, related to the first, asked if performance on Reading to Integrate was affected by medium of presentation, computer familiarity, native language, or level of education. Again, to ensure that counterbalancing of tests controlled for any practice effect, test order was added as an additional variable.

Table 4 Performance on Reading to Learn measure by groups, medium, and test order (n = 251) (univariate analysis of variance)

Source                         Type III sum of squares   df    Mean square   F
Group                          21869.29                    2   10934.64      28.10*
Medium                          1215.86                    1    1215.86       3.13
Test order                       294.98                    1     294.98       0.76
Group × medium                   334.81                    2     167.40       0.43
Group × test order                 4.37                    2       2.19       0.01
Medium × test order              391.73                    1     391.73       1.01
Group × medium × test order      575.29                    2     287.65       0.74
Error                          92997.45                  239     389.11

Note: *p < .05.

⁴ Test order was added as an additional variable to double-check that our counterbalancing had been effective in controlling for any practice effect.
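The F column of Table 4 can be checked against its mean squares, since each F is the effect mean square divided by the error mean square. The sketch below assumes the decimal placement restored from the extracted digits (two decimal places); the dictionary keys and the function name are hypothetical labels.

```python
# Mean squares as read from Table 4. Decimal placement is an assumption:
# two decimal places were restored to the extracted digits.
MS_ERROR = 389.11
mean_squares = {
    "Group": 10934.64,
    "Medium": 1215.86,
    "Test order": 294.98,
    "Group x medium": 167.40,
    "Group x test order": 2.19,
    "Medium x test order": 391.73,
    "Group x medium x test order": 287.65,
}

def f_ratios(effects, ms_error):
    """F for each effect = effect mean square / error mean square."""
    return {name: round(ms / ms_error, 2) for name, ms in effects.items()}

ratios = f_ratios(mean_squares, MS_ERROR)
```

Recomputed this way, the ratios agree with the printed F values to within rounding of the published mean squares, and only the Group effect is large.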
To answer this question, we calculated a univariate ANOVA on the Reading to Integrate measure, with group status, medium of text presentation, and test order entered as possible contributing factors. The results (Table 6) show, as for Research Question 1, that there were no significant interactions for any of the group, medium, or test order combinations; the only significant main effect was group membership. The answer for Research Question 2 is that native language background and educational level had a significant effect on Reading to Integrate, but medium of text presentation did not. Post hoc analysis of group contrasts showed that all three groups were distinct in their performance on Reading to Integrate (see Table 7).
4 Research Question 3
The third research question asked to what extent measures of basic comprehension, Reading to Learn, and Reading to Integrate were related. We used correlational analysis as the first step in answering this question. Results for the total participant population (see Table 8) showed moderate to high correlations across all reading measures. However, the analyses done for Research Questions 1 and 2 revealed that group status had a significant effect on performance on Reading to Learn and Reading to Integrate. Further, we realize that correlations are sensitive to variance, so the high correlations seen in the total population could have been an artifact of combining the three groups. Therefore, we examined the correlations among all reading measures for each group (available in Trites, 2000: Appendix 1, pp. 230–33). While the reading measures were still correlated, often moderately, sometimes highly, magnitudes differed and sometimes dropped substantially. The text-specific multiple-choice measures, BC1 and BC2, consistently correlated more highly with the Nelson–Denny and TOEFL Reading Comprehension tests than with Reading to Learn and Reading to Integrate based on the same texts, suggesting a test method or construct effect. Because comparisons between different measures of basic comprehension were not a goal of the project, BC1 and BC2 were not used in further analyses. We conclude that, as expected, all reading measures were related, but the lower correlations between Reading to Learn and Reading to Integrate and the traditional basic comprehension measures led us to consider further types of analysis to identify the possible distinctiveness of the new measures.

Table 5 Post hoc Scheffé for Reading to Learn measure (n = 251)

Group   n      Group   n      Mean difference   Standard error
NSU     105    NNSU    106    20.12*            2.72
               NNSG     40     7.17             3.67
NNSU    106    NNSG     40    12.95*            3.66

Note: *p < .05.

Table 6 Performance on Reading to Integrate measure by groups, medium, and test order (n = 244ᵃ) (univariate analysis of variance)

Source                         Type III sum of squares   df    Mean square   F
Group                          36294.33                    2   18147.17      55.82ᵇ
Medium                           192.95                    1     192.95       0.59
Test order                        97.83                    1      97.83       0.30
Group × medium                    11.82                    2       5.91       0.02
Group × test order                30.14                    2      15.07       0.05
Medium × test order             1098.72                    1    1098.72       3.38
Group × medium × test order     1488.58                    2     744.29       2.29
Error                          75430.37                  232     325.13

Notes: ᵃ n size reduced for Reading to Integrate because of anomalous testing session; ᵇ p < .05.
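The concern raised for Research Question 3, that pooling groups with very different means can inflate a correlation, can be illustrated with a small constructed example. The data below are invented for the purpose and the helper name is hypothetical; within each group the two scores correlate only moderately, yet the pooled correlation is far higher because most of the pooled variance comes from the gap between groups.

```python
from math import sqrt
from statistics import mean

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Invented scores on two measures for a lower-scoring and a higher-scoring group.
low_x, low_y = [1, 2, 3, 4], [2, 1, 4, 3]
high_x, high_y = [11, 12, 13, 14], [12, 11, 14, 13]

r_low = pearson_r(low_x, low_y)
r_high = pearson_r(high_x, high_y)
r_pooled = pearson_r(low_x + high_x, low_y + high_y)
```

This is the reason for examining correlations separately within each participant group before interpreting the whole-sample values.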
5 Discriminant analysis
Because we were interested in determining how constructs differed, we sought additional analyses to help us better characterize the new constructs. Of the several possible statistical methods that could have been employed, two are most plausible: multivariate analysis of variance, usually associated with experimental research, and discriminant analysis, usually associated with descriptive research (Tabachnick and Fidell, 1996). The present research was conducted with samples of naturally occurring student groups and was not experimental. Moreover, we were interested in finding ways to compare participant performance on the measures of the new constructs, Reading to Learn and Reading to Integrate, with performance on more traditional measures of basic comprehension. Thus, we opted to use discriminant analysis because of its parsimony of description and clarity of interpretation (Stevens, 1996). Discriminant analysis, a technique recommended to describe group differences or predict group membership based on a comparison of multiple predictors (Huberty, 1994), has been used in other areas of applied linguistic research to investigate creation of a student profile of success or failure on Computer Assisted Language Learning (CALL) lessons (Jamieson et al., 1993) and accurate classification of text types into registers (Biber, 1993), among other purposes.

Table 7 Post hoc Scheffé for Reading to Integrate measure (n = 244ᵃ)

Group   n      Group   n      Mean difference   Standard error
NSU     101    NNSU    103    26.41ᵇ            2.53
               NNSG     40    10.05ᵇ            3.37
NNSU    103    NNSG     40    16.36ᵇ            3.36

Notes: ᵃ n size reduced for Reading to Integrate because of anomalous testing session; ᵇ p < .05.

Table 8 Correlations for all reading measures for all participants (n = 251)

                              TOEFL Reading    Basic Comprehension   Basic Comprehension   Reading    Reading to
                              Comprehension    Test 1                Test 2                to Learn   Integrateᵃ
Nelson–Denny                  .90ᵇ             .85ᵇ                  .84ᵇ                  .66ᵇ       .69ᵇ
TOEFL Reading Comprehension   1.00             .85ᵇ                  .84ᵇ                  .64ᵇ       .69ᵇ
Basic Comprehension 1                          1.00                  .84ᵇ                  .68ᵇ       .68ᵇ
Basic Comprehension 2                                                1.00                  .68ᵇ       .70ᵇ
Reading to Learn                                                                           1.00       .59ᵇ

Notes: ᵃ n size reduced for Reading to Integrate because of anomalous testing session; ᵇ p < .05.
To further distinguish basic comprehension from the new constructs, we conducted discriminant analysis on each of the two language groups (native and nonnative) to determine whether Reading to Learn and Reading to Integrate would classify participants in the same way that Basic Comprehension would. We divided the native speaker and nonnative speaker groups into three levels, high, middle (mid), and low reading ability scorers, based on the basic comprehension measure chosen for that group (Nelson–Denny for native speakers; TOEFL Reading Comprehension for nonnative speakers). Research methodologists (Tabachnick and Fidell, 1996: 513) note that robustness is expected when the smallest group has at least 20 cases; our smallest group had 25. To check the assumption of homogeneity of the variance/covariance matrices, we examined the outcomes of Box's M Test and found them all nonsignificant (Klecka, 1980). Each group was checked for outliers using Mahalanobis distance treated as Chi-Square, and no outliers were found (Tabachnick and Fidell, 1996). Thus, the data met all assumptions required for use of discriminant analysis.
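The outlier screen mentioned above can be sketched for the two-predictor case (Reading to Learn and Reading to Integrate). The sketch assumes a chi-square criterion at the .001 level with 2 degrees of freedom, a conventional choice that the article does not state explicitly; the function name and the data points are hypothetical.

```python
from statistics import mean

# Chi-square critical value for df = 2 at the .001 level (assumed criterion).
CHI2_CRIT_DF2_P001 = 13.816

def mahalanobis_sq(points):
    """Squared Mahalanobis distance of each (x, y) point from the group centroid."""
    n = len(points)
    mx = mean(p[0] for p in points)
    my = mean(p[1] for p in points)
    # Sample variance/covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    distances = []
    for x, y in points:
        dx, dy = x - mx, y - my
        # d' S^-1 d, using the closed-form inverse of a 2 x 2 matrix.
        distances.append((syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det)
    return distances

# Invented (Reading to Learn, Reading to Integrate) pairs for one small group.
group = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6), (6, 5)]
d2 = mahalanobis_sq(group)
outliers = [p for p, d in zip(group, d2) if d > CHI2_CRIT_DF2_P001]
```

In a group this small no case can exceed the criterion, so the screen only becomes informative at sample sizes like those in the study.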
a Discriminant analysis for nonnative speakers: To see whether the Reading to Learn/Reading to Integrate Composite classified participants similarly to the measure of Basic Comprehension (for nonnative speakers, TOEFL Reading Comprehension), we divided the entire nonnative speaker group (n = 146) into three levels of basic comprehension: high, mid, and low. These reading ability groups were based on typical TOEFL Reading Comprehension score levels required for program entry. Participants were classified as high if their scores were greater than or equal to 550, or 56 and above on the scaled score on the TOEFL Reading Comprehension (550 is the cut score often used for graduate entry). Participants were classified as mid if scores ranged between 500 and 549, or 50 to 55 on the TOEFL Reading Comprehension. Participants were classified as low if their scores fell below 500, or 49 and below on TOEFL Reading Comprehension (500 is a minimum TOEFL score sometimes used for undergraduate admission, often with the proviso that students enroll in ESL classes either prior to official enrollment or concurrently). For our entire nonnative speaker group, descriptive statistics on basic comprehension reading ability group membership levels appear in Table 9.
We ran SPSS Discriminant Analysis with initial grouping variables of high, mid, and low reading ability. We compared initial reading ability levels with high, mid, and low categories on the Reading to Learn/Reading to Integrate Composite, a new variable reflecting level of performance on the Reading to Learn and Reading to Integrate measures combined.⁵ The discriminant analysis yielded one discriminant function with an eigenvalue of .92, responsible for 99.9% of the variance in outcomes. Wilks' Lambda for this function was .52, significant at .001; the associated Chi-Square value was extremely large (90.93) and highly significant (p < .001), indicating that the group centroids on the composite Reading to Learn/Reading to Integrate function for the three nonnative speaker reading ability groups were significantly different. Both Reading to Learn and Reading to Integrate loaded significantly on the discriminant function, at .86 for Reading to Learn and .78 for Reading to Integrate (p < .05).
⁵ We first calculated two separate discriminant analyses, one for Reading to Learn and one for Reading to Integrate, but we found that both loaded on a single function, so we used the composite in subsequent analyses.

Table 9 Descriptive statistics for nonnative speakers by TOEFL Reading Comprehension reading ability groups (n = 143)

Reading ability group   n      Mean on TOEFL Reading Comprehension   sd
High (≥ 56)              57    59.68                                 3.03
Mid (50–55)              42    52.81                                 1.67
Low (≤ 49)               44    42.11                                 5.47
Total                   143    52.26                                 8.22

Note: Total n = 143 due to loss of three cases in anomalous Reading to Integrate session.
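The eigenvalue and Wilks' Lambda reported for the nonnative speaker analysis are linked by a standard identity: Lambda is the product of 1 / (1 + eigenvalue) over the discriminant functions. The sketch below checks that the reported figures are mutually consistent. The second eigenvalue is not reported in the article; it is inferred here from the 99.9% variance share and should be read as an assumption, as should the restored decimal points.

```python
def wilks_lambda(eigenvalues):
    """Wilks' Lambda: product of 1 / (1 + eigenvalue) over the functions."""
    lam = 1.0
    for ev in eigenvalues:
        lam *= 1.0 / (1.0 + ev)
    return lam

def variance_share(eigenvalues):
    """Proportion of discriminating variance attributed to each function."""
    total = sum(eigenvalues)
    return [ev / total for ev in eigenvalues]

# First eigenvalue as reported (.92). The second is NOT reported; it is
# inferred from the 99.9% share and is an assumption of this sketch.
first = 0.92
second = first * (1 - 0.999) / 0.999
lam = wilks_lambda([first, second])
shares = variance_share([first, second])
```

Rounded to two places, the result matches the reported Lambda of .52; the same identity applied to an eigenvalue of .31 with a 98.7% share gives about .76.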
Over half (64.9%) of the high reading ability group remained high on the new measure; less than half (42.9%) of the mid reading ability group remained classified as mid. Hence, the Reading to Learn/Reading to Integrate Composite was particularly influential in reclassifying the mid reading ability group and, to a lesser extent, the high group. However, most (81.8%) of the low reading ability group remained low on the Reading to Learn/Reading to Integrate Composite (see Table 10). Of the 143 nonnative speaker participants, 91 (63.6%) remained in the initial basic comprehension category on the composite; the rest moved, but in different directions. Twenty-one participants (14.7%) were classified into a higher category on the Reading to Learn/Reading to Integrate Composite than their initial basic comprehension level would have suggested, while 31 (21.7%) were reclassified into a lower category. Thus, 52 participants (36.4%), just over one third of the sample, were classified differently based on their Reading to Learn/Reading to Integrate Composite performance.
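The reclassification tallies can be recomputed directly from the cell counts printed in Table 10. The sketch below is only a bookkeeping check with a hypothetical function name; from the printed counts, the diagonal sums to 91 and the off-diagonal cells to 52 (21 moving up and 31 moving down), which is the split that corresponds to 36.4% of 143.

```python
# Rows: initial TOEFL Reading Comprehension level; columns: level predicted
# from the composite. Both are ordered high, mid, low. Counts as printed
# in Table 10.
table10 = [
    [37, 17, 3],
    [13, 18, 11],
    [2, 6, 36],
]

def reclassification_summary(matrix):
    """Count participants who stayed, moved to a higher level, or moved lower."""
    same = higher = lower = 0
    for i, row in enumerate(matrix):
        for j, count in enumerate(row):
            if i == j:
                same += count
            elif j < i:  # predicted column is a higher level than the initial row
                higher += count
            else:
                lower += count
    return {"same": same, "higher": higher, "lower": lower,
            "total": same + higher + lower}

summary = reclassification_summary(table10)
# Percentage of each initial group that kept its classification.
row_hit_rates = [round(100 * row[i] / sum(row), 1) for i, row in enumerate(table10)]
```

The same function applied to the native speaker counts in Table 12 gives the corresponding figures for that group.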
Table 10 Discriminant analysis comparison of nonnative speaker reading ability groups with Reading to Learn/Reading to Integrate Composite (n = 143)

                          Predicted group membership for Reading to
                          Learn/Reading to Integrate Composite          Initial
Reading ability group     High      Mid       Low                       classification total
Count
  High (≥ 56)             37        17         3                         57
  Mid (50–55)             13        18        11                         42
  Low (≤ 49)               2         6        36                         44
  Reclassification total  52        41        50                        143
Percentage
  High (≥ 56)             64.9      29.8       5.3                      100.0
  Mid (50–55)             31.0      42.9      26.2                      100.0
  Low (≤ 49)               4.5      13.6      81.8                      100.0

Note: Total n = 143 due to loss of three cases in anomalous Reading to Integrate session.

b Discriminant analysis for native speakers: Because one of our goals in this project was to probe the possible validity of these new measures by assessing performance of two groups, native as well as nonnative speakers, we conducted a parallel discriminant analysis for native speakers. Thus, for the native speakers we followed the same procedure, dividing the entire native speaker group (n = 105) into three levels of basic comprehension (high, mid, and low) based on score distances of .5 standard deviations from the sample mean of the Nelson–Denny test for these participants. Native speakers need not take reading comprehension tests when entering the university, so the three-way split was based entirely on our sample data. Participants were classified as high if their scores on the Nelson–Denny were greater than or equal to 135. Participants were classified as mid if their scores ranged from 119 to 134 on the Nelson–Denny. Participants were classified as low if their scores fell at or below 118 on the Nelson–Denny. Descriptive statistics for the entire native speaker sample on basic comprehension group membership appear in Table 11.
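The three-way split for native speakers can be reproduced from the sample statistics. The sketch assumes the cut points are the mean plus or minus .5 standard deviations, using the Nelson–Denny mean and sd for the full native speaker sample in Table 1 (126.48 and 16.46, with decimal placement restored from the extracted digits); the function names are hypothetical.

```python
import math

def half_sd_band(sample_mean, sample_sd, width=0.5):
    """Integer score bounds of the mid band: mean +/- width * sd."""
    lower = sample_mean - width * sample_sd
    upper = sample_mean + width * sample_sd
    # Whole-number scores that fall inside the interval form the mid band.
    return math.ceil(lower), math.floor(upper)

def classify(score, mid_low, mid_high):
    """Assign a Nelson-Denny score to the high, mid, or low group."""
    if score > mid_high:
        return "high"
    if score >= mid_low:
        return "mid"
    return "low"

# Mean and sd for the full native speaker sample (n = 105), read from Table 1.
mid_low, mid_high = half_sd_band(126.48, 16.46)
levels = [classify(s, mid_low, mid_high) for s in (118, 119, 134, 135)]
```

Under these assumptions the mid band works out to the 119 to 134 range given in the text, with 135 and above classified as high and 118 and below as low.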
The discriminant analysis yielded one discriminant function with an eigenvalue of .31, responsible for 98.7% of the variance in outcomes. Wilks' Lambda for this function was .76, significant at .001; the associated Chi-Square value was large (26.39) and highly significant (p < .001), indicating that the group centroids on the discriminant function for the three reading ability groups on the Reading to Learn/Reading to Integrate Composite were significantly different. Pooled within-groups correlations between discriminating variables showed that Reading to Learn correlated with the first discriminant function at a level of .81; Reading to Integrate correlated with the second discriminant function at .71. This contrasts with findings for the nonnative speakers, where scores on the combined new measures loaded significantly on only one discriminant function. For native speakers, then, there is evidence for two significant discriminant functions, although the first accounts for almost all of the variance. Although these two measures (Reading to Learn and Reading to Integrate) loaded on two separate discriminant functions, they still showed moderate correlations with the alternate function,⁶ justifying the composite calculations.

Table 11 Descriptive statistics for native speakers by Nelson–Denny reading ability groups (n = 101)

Reading ability group   n      Mean on Nelson–Denny   sd
High (≥ 135)             39    141.26                  5.19
Mid (119–134)            37    125.65                  5.09
Low (≤ 118)              25    102.88                 10.98
Total                   101    126.04                 16.52

Note: Total n = 101 for discriminant analysis due to loss of four cases in anomalous Reading to Integrate session.
Results of discriminant analysis for native speakers, seen in Table 12, show a different pattern from that observed for nonnative speakers. Nearly three-fourths (71.8%) of the native speakers classified as high in the basic comprehension reading ability group remained high on the Reading to Learn/Reading to Integrate Composite. For the mid group, however, only 18.9% remained classified as mid. Just over half (52%) of the low reading ability group members remained low on the Reading to Learn/Reading to Integrate Composite. As with the nonnative speakers, participants in the mid category on basic comprehension showed the most frequent reclassification. Forty-eight of the 101 (47.5%) native speaker participants remained in the initial classification categories. Twenty-two (21.8%) were reclassified into a higher category, and 31 (30.7%) were reclassified into a lower category. Thus, 53 participants (52.5%), over half of the sample, were classified differently based on their performance on the Reading to Learn/Reading to Integrate Composite.
⁶ Reading to Learn correlated with function 2 at −.58; Reading to Integrate with function 1 at .71.
Table 12 Discriminant analysis comparison of native speaker reading ability groups with Reading to Learn/Reading to Integrate Composite (n = 101)

                          Predicted group membership for Reading to
                          Learn/Reading to Integrate Composite          Initial
Reading ability group     High      Mid       Low                       classification total
Count
  High (≥ 135)            28         3         8                         39
  Mid (119–134)           10         7        20                         37
  Low (≤ 118)              5         7        13                         25
  Reclassification total  43        17        41                        101
Percentage
  High (≥ 135)            71.8       7.7      20.5                      100.0
  Mid (119–134)           27.0      18.9      54.1                      100.0
  Low (≤ 118)             20.0      28.0      52.0                      100.0

Note: Total n = 101 for discriminant analysis due to loss of four cases in anomalous Reading to Integrate session.
V Interpretation
Analyses done to answer Research Questions 1 and 2 showed that performance on Reading to Learn and Reading to Integrate measures was significantly influenced by language background and level of education (graduate vs. undergraduate, for nonnative speakers only). Moreover, level of computer familiarity had no significant effect on Reading to Learn and Reading to Integrate performance.
Correlations showed that, as expected, all reading measures correlated to some degree, generally answering Research Question 3. The most interesting results came from the discriminant analyses, because they showed a pattern of differential classification on the new tasks sensitive to initial reading ability. This pattern differed for nonnative speakers and native speakers. Examination of the reclassifications based on the new tasks revealed some of the problems of using a basic comprehension-only test to predict performance on more challenging literacy tasks. These results also imply the need for further work on new measures of advanced literacy skills, such as Reading to Learn and Reading to Integrate, to reflect trends in construct-driven assessment (Pellegrino et al., 1999).
For the nonnative speakers most (818) of the participants withTOEFL Reading Comprehension below 50 remained classified as lowon the Reading to LearnReading to Integrate Composite suggestingthe existence of a lower threshold of academic English proficiencyParticipants below this threshold were unlikely to perform well ontasks assessing Reading to Learn and Reading to Integrate Even tasksinvolving only selection (such as BC1 and BC2) rather than produc-tion of responses were difficult for this group However for thenonnative speaker readers in the mid and high reading ability groupsbasic comprehension level was not nearly as consistent a predictor of performance on the Reading to LearnReading to IntegrateComposite This was especially striking for those in the mid readingability group Approximately one fourth of the mid group droppedinto the low category on the Reading to LearnReading to IntegrateComposite indicating that they experienced more difficulty in com-pleting the new measures but approximately one fourth could indeedcategorize and synthesize information relatively well despite mid-dling performance on basic comprehension This finding suggeststhat for nonnative speaker readers in the mid reading ability categorybasic comprehension measures were insufficient to predict theirperformance on the new measures
Almost two-thirds of the high basic comprehension nonnative speakers remained high on the Reading to Learn/Reading to Integrate Composite, but one third dropped, suggesting that they could not categorize and synthesize information from texts as well as they could recognize information. For this third, the recognition-only task, multiple-choice basic comprehension, overestimated ability to manipulate and synthesize information. For the high and mid reading ability nonnative speaker groups, basic comprehension-only tests may then overestimate academic English proficiency.
Results for the native speakers showed a more definitive pattern of reclassification. For the low reading ability group, approximately half were misclassified, suggesting that the basic comprehension measure underrepresented their ability to categorize and synthesize information in academic texts. Most of the mid reading ability group were reclassified as lower on the Reading to Learn/Reading to Integrate Composite, showing that basic comprehension results overestimated ability to succeed on more challenging tasks. On the other hand, almost one third of the mid reading ability group did better on the Reading to Learn/Reading to Integrate Composite; for them, basic comprehension underrepresented their ability to complete more challenging reading tasks. For the high ability readers, nearly one third dropped on the Reading to Learn/Reading to Integrate Composite. For them, the basic comprehension measure overestimated their level of academic English proficiency. The other two thirds of the high reading ability group remained high, suggesting a higher threshold of academic English proficiency.
Considering results from both the nonnative speakers and native speakers, we conclude that the new tasks did assess something different from basic comprehension once a lower level threshold of basic academic English proficiency had been achieved. Examination of scatterplots based on discriminant functions indicated an obvious separation between participants who could and could not perform on the Reading to Learn/Reading to Integrate Composite, thus providing some tentative evidence for concurrent validity (Messick, 1989; Chapelle, 1999). We had hoped to find clear evidence of a hierarchy suggesting that Reading to Learn was demonstrably more difficult than basic comprehension, and Reading to Integrate demonstrably more difficult than Reading to Learn, but results did not yield an obvious hierarchy. Results did, however, suggest an even simpler pattern: a dichotomy. For those nonnative speakers above the lower threshold of English language proficiency, which in our data would be approximately 500, or 49 on the scaled TOEFL Reading Comprehension scores, some could perform well on the new measures and some could not, revealing two rather than three groups on the Reading to Learn/Reading to Integrate Composite. These results lead us to speculate that the new measures tap additional skills, such as 'sophisticated discourse processes and critical thinking skills' (Enright and Schedl, 1999: 24), in addition to language proficiency. For native speakers, the discriminant loading suggested a possible hierarchy of difficulty, but the scatterplots and classification tables revealed a dichotomy. Reclassification based on the Reading to Learn/Reading to Integrate Composite nearly eliminated the mid basic comprehension reading ability category, with only 16.8% of the native speakers qualifying as mid on the Reading to Learn/Reading to Integrate Composite.
VI Conclusions
While we acknowledge that there are some limitations to this project, such as the time required to administer and score new measures, the results are illuminating and useful, and they suggest some considerations for future research. The first consideration relates to appropriate classification of students based on academic reading abilities. This corresponds to the use of TOEFL as a gatekeeper by many institutions. For TOEFL test-takers whose current TOEFL scores would be in an intermediate to high intermediate range, the new tasks could assess additional abilities relevant to academic performance. For such participants, additional test development efforts are warranted. The majority of high reading ability nonnative speakers (64.9%) could perform the new measures; a third could not. Our results suggest that some unknown number of mid reading ability nonnative speakers are more capable of succeeding at more challenging academic reading tasks than their current level of basic comprehension assessment would indicate. At the same time, there are some high and mid reading ability participants who could not perform the more demanding tasks; admitting such students directly into university programs could result in failure. These results suggest that, for most students at lower levels of basic comprehension (in our study, those with TOEFL scores below 490 to 500), development of new tasks is unnecessary. Nevertheless, our results indicate that a certain small percentage (in our sample, 8 students, 18.2%) of nonnative speakers classified as low reading ability on basic comprehension did in fact perform adequately on more challenging tasks like Reading to Learn and Reading to Integrate. Given the very large number of TOEFL test-takers around the world, this finding merits further research. It would be potentially unfair to exclude such students from university study based on basic comprehension-only measures such as the current TOEFL. We would urge institutions that use TOEFL to consider scores on these new, more demanding tasks if doing so would address institutional needs. According to interview data collected in conjunction with this project (Trites, 2000: Chapter 6), participants perceived that successful completion of the new tasks required a thorough understanding of the texts, whereas the multiple-choice tests were less demanding, requiring only superficial grasp of content. This finding corroborates the observation of Freedle and Kostin (1999: 3), who note that often examinees do not need to comprehend the accompanying text to answer the test item. Therefore, for all TOEFL test-takers, incorporating more challenging tasks such as the Reading to Learn and Reading to Integrate measures into typical English language instruction could have positive washback effects.
The second area of consideration is the focus on additional relevant research. For predictive validity, it is important to determine the correlation between the Reading to Learn/Reading to Integrate Composite measures and actual academic performance in university classes. Correlations might differ depending on the degree of categorization and synthesis required by different major fields of study. Another area for future research is related to the novelty of the Reading to Learn task as an assessment technique. Current pedagogical trends in literacy instruction emphasize the use of graphic organizers as a means to understand and manipulate information in texts. To our knowledge, graphic organizers are typically used as classroom activities rather than assessment tools. They have good potential for use in assessment if students are familiar with them and if appropriate scoring systems can be developed. This project demonstrates that it is possible, though labor intensive, to develop reliable scoring systems. Further research is needed to explore refinement of this task type and related scoring systems for use in testing programs with large numbers of test-takers and scorers. Although the Reading to Integrate task (generating a synthesis) was more familiar, the scoring system was innovative because it reflected a reader's ability to recognize textual frames as well as integrate information. If test developers are interested in the abilities of nonnative speaker readers to perform such tasks, further development of similar tasks and scoring systems is warranted. While a substantial investment of time would be required to refine the administration and scoring systems needed for more complex tasks such as these, it should be weighed against the possible danger of under-representing the likelihood of academic success based on results of the basic comprehension-only measures still most often used to assess academic proficiency.
Acknowledgements
This project was funded by a grant from Educational Testing Service as part of the TOEFL 2000 effort. We appreciate the cooperation of the ETS staff members who assisted in the development of passage-specific tasks and other aspects of the research; however, no official endorsement of Educational Testing Service should be inferred.
VII References
Bachman, L. 2000: Modern language testing at the turn of the century: assuring that what we count counts. Language Testing 17, 1–42.
Biber, D. 1993: Using register-diversified corpora for general language studies. Computational Linguistics 19, 219–41.
Britt, M., Rouet, J. and Perfetti, C. 1996: Using hypertext to study and reason about historical evidence. In Rouet, J., Levonen, J., Dillon, A. and Spiro, R., editors, Hypertext and cognition. Mahwah, NJ: Lawrence Erlbaum, 43–72.
Chapelle, C. 1999: Validity in language assessment. Annual Review of Applied Linguistics 19, 1–19.
Educational Testing Service 1997: TOEFL test and score manual. Princeton, NJ: Educational Testing Service.
Educational Testing Service 1998: Draft TOEFL 2000 research agenda framework: areas of research. Research agenda three. TOEFL 2000 internal document. Princeton, NJ: Educational Testing Service.
Eignor, D., Taylor, C., Kirsch, I. and Jamieson, J. 1998: Development of a scale for assessing the level of computer familiarity of TOEFL examinees. TOEFL Research Report No. 60. Princeton, NJ: Educational Testing Service.
Enright, M. and Schedl, M. 1999: Reading for a reason: using reader purpose to guide test design. TOEFL 2000 Internal Report. Princeton, NJ: Educational Testing Service.
Enright, M., Grabe, W., Mosenthal, P., Mulcahy-Ernt, P. and Schedl, M. 1998: A TOEFL 2000 framework for testing reading comprehension: a working paper. Princeton, NJ: Educational Testing Service.
Foltz, P. 1996: Comprehension, coherence, and strategies in hypertext. In Rouet, J., Levonen, J., Dillon, A. and Spiro, R., editors, Hypertext and cognition. Mahwah, NJ: Lawrence Erlbaum, 109–36.
Freedle, R. and Kostin, I. 1999: Does the text matter in a multiple choice test of comprehension? The case for the construct validity of TOEFL's minitalks. Language Testing 16, 2–32.
Goldman, S. 1997: Learning from text: reflections on the past and suggestions for the future. Discourse Processes 23, 357–98.
Hayes, J. and Hatch, J. 1999: Issues in measuring reliability: correlation versus percentage of agreement. Written Communication 16, 354–67.
Huberty, C. 1994: Applied discriminant analysis. New York: John Wiley.
Jamieson, J., Campbell, J., Norfleet, L. and Berbisada, N. 1993: Reliability of a computerized scoring routine for an open-ended task. System 21, 305–22.
Jamieson, J., Norfleet, L. and Berbisada, N. 1993: Successes, failures and dropouts in computer-assisted language lessons. Computer Assisted English Language Learning Journal 4, 12–20.
Klecka, W. 1980: Discriminant analysis. In Lewis-Beck, M., editor, Quantitative applications in the social sciences, Volume 19. Newbury Park, CA: Sage.
Lehto, M., Zhu, W. and Carpenter, B. 1995: The relative effectiveness of hypertext and text. International Journal of Human-Computer Interaction 7, 293–313.
McNamara, D. and Kintsch, W. 1996: Learning from texts: effects of prior knowledge and text coherence. Discourse Processes 22, 247–88.
Messick, S. 1989: Validity. In Linn, R.L., editor, Educational measurement, 3rd edition. New York: American Council on Education/Macmillan, 13–103.
Meyer, B. 1985a: Prose analyses: purposes, procedures, and problems. In Britton, B. and Black, J., editors, Understanding expository text. Hillsdale, NJ: Lawrence Erlbaum, 11–64.
Meyer, B. 1985b: Prose analysis: purposes, procedures, and problems, Part 2. In Britton, B. and Black, J., editors, Understanding expository text. Hillsdale, NJ: Lawrence Erlbaum, 269–97.
Monks, V. 1997, August/September: Two views, same waterway. National Wildlife 35, 36–37.
Pellegrino, J., Baxter, G. and Glaser, R. 1999: Addressing the 'two disciplines' problem: linking theories of cognition and learning with assessment and instructional practice. Review of Research in Education 24, 307–53.
Perfetti, C. 1997: Sentences, individual differences, and multiple texts: three issues in text comprehension. Discourse Processes 23, 337–55.
Perfetti, C., Britt, M.A. and Georgi, M. 1995: Text-based learning and reasoning: studies in history. Hillsdale, NJ: Lawrence Erlbaum.
Perfetti, C., Marron, M. and Foltz, P. 1996: Sources of comprehension failure: theoretical perspectives and case studies. In Cornoldi, C. and Oakhill, J., editors, Reading comprehension difficulties: processes and intervention. Mahwah, NJ: Lawrence Erlbaum, 137–65.
Reinking, D. 1988: Computer-mediated text and comprehension differences: the role of reading time, reader preference, and estimation of learning. Reading Research Quarterly 23, 484–500.
Reinking, D. and Schreiner, R. 1985: The effects of computer-mediated text on measures of reading comprehension and reading behavior. Reading Research Quarterly 20, 536–53.
Spivey, N. 1997: The constructivist metaphor: reading, writing, and the making of meaning. San Diego, CA: Academic Press.
Stevens, J. 1996: Applied multivariate statistics for the social sciences, 3rd edition. Mahwah, NJ: Lawrence Erlbaum.
Tabachnick, B. and Fidell, L. 1996: Using multivariate statistics, 3rd edition. New York: Harper Collins.
Taylor, C., Jamieson, J., Eignor, D. and Kirsch, I. 1998: The relationship between computer familiarity and performance on computer-based TOEFL test tasks. TOEFL Research Report No. 61. Princeton, NJ: Educational Testing Service.
Tennesen, M. 1997, November/December: On a clear day. National Parks 71, 26–9.
Trites, L. 2000: Beyond basic comprehension: reading to learn and reading to integrate for native and non-native speakers. Unpublished doctoral dissertation, Northern Arizona University, Flagstaff, AZ.
Van den Berg, S. and Watt, J. 1991: Effects of educational setting on student responses to structured hypertext. Journal of Computer-Based Instruction 18, 118–24.
Van Dijk, T.A. and Kintsch, W. 1983: Strategies of discourse comprehension. New York: Academic Press.
Wiley, J. and Voss, J. 1999: Constructing arguments from multiple sources: tasks that promote understanding and not just memory for text. Journal of Educational Psychology 91, 310–11.
Zimmerman, T. 1997, December 29: Filter it with billions and billions of oysters: how to revive the Chesapeake Bay. US News and World Report 123, 63.
Appendix 1a Chart completion task
Directions: Complete the following chart. Fill in as much detail as possible from the text read by categorizing the information into the different areas on the chart. Do not use single words for your responses; form your responses in phrases or complete sentences. Include examples from the text.
Make no judgments about the accuracy of causes or effects or the effectiveness of the solutions mentioned in the text. Solutions are seen as any action taken in response to the problem(s). Solutions can take many forms, such as proposed solutions, attempted solutions, or failed solutions. Also, space is provided under each category for examples. Examples are specific examples found in the text that are used by the author(s) to exemplify the problems, causes, effects, or solutions in the text. However, there may not be examples for every category.
Points will be awarded for correct responses only. There is no penalty for incorrect responses. Points will be awarded in the following manner:

Problems and Solutions    10 points each
Causes and Effects        5 points each
Examples                  1 point each

Problems        Causes          Effects         Solutions

Examples        Examples        Examples        Examples
Appendix 1b Reading to Learn scoring rubric: 0–241 total possible

Problems (10 points; 1 word, 6 points): 40 maximum
• Air pollution/Smog in the National Parks
• Opposition or Ignoring (or lack of cooperation) to environmental standards
• No true 'point sources' (myriad of smaller pollution sources rather than one large source)
• National Parks limited jurisdiction of pollution sources outside of the parks

Causes (5 points; 1 word, 3 points): 55 maximum
• Sulfur/nitrogen emissions
• Acid rain
• Ground-level ozone
Urban/Industrial emissions
• Automobile emissions
• Power plants/factories/plants emissions
• Emissions from Smokestacks (chimneys)
• Emissions from Kilns
• Unregulated pollution/smog (from other countries, Mexico)
• Smoke from Controlled burns
• Emissions from large cities

Effects (5 points; 1 word, 3 points): 30 maximum
• Trees and plants affected (injured), growth hindered
• Visibility decreased
• Metals loosened into waters (surface/ground water, water polluted)
• Nutrients removed (leached) from soil and/or plants
• Public outcry over stance of big business/government
• Aquatic life injured (damaged)

Solutions (10 points; 1 word, 6 points): 90 maximum
Stricter Environmental Laws (resolutions, acts)
• 1977 Clean Air Act Amendments labeling NPs as Class I areas (regulations)
• Regional haze regulations proposed by the EPA
• Reduction of allowable pollution standards from industry
• Clean Air Act set visibility goals
• Objecting to construction permits
• Identification of pollution sources
• Industry modifications/Installation of devices (scrubbers) to reduce pollution
• Public pressure to protect parks

Examples for Problems (1 point): 5 maximum
• Great Smoky Mountain National Park
• Grand Canyon National Park
• Smoky Mountain Sources from Ohio, New York, Atlanta, etc.
• Grand Canyon Sources from CA, NV, UT, AZ, NM and Mexico
• TN Luttrell corp. building permit situation

Examples for Causes (1 point, or 5 points with overarching cause listed above)
• Automobile emissions
• Power plants/factories/plants emissions
• Emissions from Smokestacks (chimneys)
• Emissions from Kilns
• Unregulated pollution/smog (from other countries, Mexico)
• Smoke from Controlled burns
• Emissions from large cities

Examples for Causes (1 point): 10 maximum
• Navajo Generating Station, Page, AZ (Power Plant, AZ)
• Power plants in TN and OH river valley
• Southern California Edison Plant, Laughlin, NV
• Tennessee Luttrell Kilns, TN
• Industry in AZ
• Cars in CA
• Smokestacks in NM, NV, UT
• Los Angeles
• Atlanta
• New York

Examples for Effects (1 point): 6 maximum
• Leaves of plants turning purple and brown (stippling)
• Hindering photosynthesis of plants and trees
• Reduction of visibility at Grand Canyon
• Visibility reduced 90% of the days
• See vague blue masses
• See half as far as in 1919

Examples for Solutions (1 point, or 10 points with overarching solution listed above)
• 1977 Clean Air Act Amendments labeling NPs as Class I areas
• Regional haze regulations proposed by the EPA
• Reduction of allowable pollution standards from industry

Examples for Solutions (1 point): 5 maximum
• Objecting to kilns in TN
• Researchers using scientific technology to ID sources (radioactive isotopes)
• Southern California Edison Plant in Laughlin, NV identified as point source
• Scrubbers installed at Navajo Generating Plant
• 10% reduction of pollution in 10–15 years (goal of no man-made pollution)
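The column maxima of the Reading to Learn rubric can be checked against its stated total of 241 points. The sketch below is an illustration only: the function name `chart_score`, the dictionary layout, and the item counts (derived here by dividing each column maximum by its per-item value, for example 55 / 5 = 11 causes) are assumptions, not the scoring procedure used in the study, and partial credit for one-word responses and the "overarching" example rule are deliberately left out.

```python
# Point values and item counts as read from the rubric.
# Each entry: (points per full response, number of creditable items).
MAIN_CATEGORIES = {
    "problems": (10, 4),    # 40 maximum
    "causes": (5, 11),      # 55 maximum
    "effects": (5, 6),      # 30 maximum
    "solutions": (10, 9),   # 90 maximum
}
# Maximum number of one-point examples credited per category.
EXAMPLE_CAPS = {"problems": 5, "causes": 10, "effects": 6, "solutions": 5}


def chart_score(full_responses, examples):
    """Score a chart from counts of correct responses per category.

    Counts above the rubric's caps earn no extra credit, and incorrect
    responses are simply not counted (the task carries no penalty).
    """
    score = 0
    for name, (points, cap) in MAIN_CATEGORIES.items():
        score += points * min(full_responses.get(name, 0), cap)
    for name, cap in EXAMPLE_CAPS.items():
        score += min(examples.get(name, 0), cap)
    return score


if __name__ == "__main__":
    # A chart that reaches every cap should earn the stated maximum.
    everything = {name: 99 for name in MAIN_CATEGORIES}
    print(chart_score(everything, everything))
```

Capping each category separately mirrors the way the rubric lists a maximum for every column rather than a single overall ceiling.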
Appendix 2a Reading to Integrate task integration activity
Directions: For the next 15 minutes, reflect on what you read and compose a short essay that combines the information in the material read and makes connections across the range of ideas and arguments presented.

[Ruled lines for the participant's written response]
Appendix 2b Reading to Integrate scoring rubric
Integration

50  Excellent    Integrates texts accurately and successfully on multiple levels and creates a true Documents Model. Generalizes at least two macrostructure concepts common across texts (this may be simply identifying the existence of a macrostructure followed by the supporting macrostructures from each text). Effectively integrates relevant support (i.e., details or macrostructures) to support generalizations and may still discuss each article separately to some degree. Integrates macrostructures present in both texts.
40  Good         Integrates through the creation of a well developed and accurate introduction and/or a conclusion, yet summarizes articles separately, AND/OR generalizes one major macrostructure that is fully developed with support. Uses some relevant support.
30  Fair         Attempts to integrate through the creation of a partially developed, possibly inaccurate, introductory AND/OR concluding statement, usually related to the main topic of the articles, but has no substantive development. Creates a Text and Situation Model of each text separately. Does not make generalizations for integration beyond the one statement. May make evaluative or editorial statements; however, these may be inaccurate.
20  Poor         Creates a well developed and accurate Text Model and Situation Model for each of the texts, yet attempts no integrative connection across the texts.
10  Very Poor    Ineffectively or inaccurately attempts to summarize each article in a very reporting style, revealing little or no use of background knowledge, contextualization, or evaluation. Response is simply a recall of information from the texts. No introduction or conclusion, OR participant confuses texts and sees separate texts as one issue (problem).
0   No Response  Participant does not attempt response or only addresses one article.

Macrostructures

Participants will receive one point for each macrostructure accurately identified in each text, with a minimum score of 4 awarded for the inability to accurately identify any macrostructures.

25  Excellent (Total 8)    Accurately identifies all 4 macrostructures present in both texts.
20  Good (Total 6/7)       Accurately identifies all 4 macrostructures in one text and 3 in the other, OR accurately identifies 3 of the four macrostructures in both texts (or 4 in one and 2 in the other).
15  Fair (Total 4/5)       Accurately identifies 3 of the four macrostructures in one text and 2 of the four in the other (or 4 in one and 1 in the other), OR accurately identifies 2 of the four macrostructures in both texts (or 4 in one and 0 in the other, or 3 in one and 1 in the other).
10  Poor (Total 2/3)       Accurately identifies 2 of the macrostructures in one text and 1 in the other (or 3 in one and 0 in the other), OR accurately identifies 1 of the four macrostructures in both texts (or 2 in one and 0 in the other).
5   Very Poor (Total 0/1)  Accurately identifies 1 of the four macrostructures in one text and 0 in the other, OR unable to accurately identify any macrostructures in the texts.
0   No Response            Participant does not attempt response.

Use of relevant details

5   Excellent    Effectively uses multiple relevant details as support; no irrelevant or erroneous details.
4   Good         Effectively uses relevant details as support; may include some inaccurate or irrelevant details. (More than 50% of details are relevant.)
3   Fair         Possibly uses one or two relevant details, yet several irrelevant details or inaccurate details (erroneous or fabricated) appear in the synthesis. (50% or less of details are relevant.)
2   Poor         Erroneous or fabricated details used in attempt to support arguments presented; does not include relevant details.
1   Very Poor    No details used or listed.
0   No Response  Participant does not attempt response.
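The Macrostructures component of the rubric is in effect a lookup from the total number of macrostructures identified across the two texts (0 to 8) to a band score. As a minimal sketch under that reading, with the function name and the `attempted` flag being illustrative assumptions rather than part of the published rubric:

```python
def macrostructure_score(found_text_a, found_text_b, attempted=True):
    """Map macrostructures identified in each text (0-4) to the band score."""
    if not attempted:
        return 0  # No Response
    for n in (found_text_a, found_text_b):
        if not 0 <= n <= 4:
            raise ValueError("each text has four macrostructures (0-4)")
    total = found_text_a + found_text_b
    if total == 8:
        return 25  # Excellent
    if total >= 6:
        return 20  # Good
    if total >= 4:
        return 15  # Fair
    if total >= 2:
        return 10  # Poor
    return 5       # Very Poor (0 or 1 identified)


if __name__ == "__main__":
    # 3 macrostructures in one text and 2 in the other falls in the Fair band.
    print(macrostructure_score(3, 2))
```

Working from the summed total reproduces every combination the rubric spells out, for instance 4 and 0, 3 and 1, and 2 and 2 all land in the Fair band.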
(Enright et al., 1998: 46–48). For the Reading to Learn task, we assessed readers' knowledge model through their ability to recall and categorize information from a single text (Enright et al., 1998: 57).
Another goal of the project was to assess Reading to Integrate information, which requires readers to integrate information from multiple sources on the same topic. Reading to Integrate goes a step further than Reading to Learn because readers must integrate the rhetorical and contextual information found across the texts and generate their own representation of this interrelationship (Perfetti, 1997). Therefore, readers must assess the information presented in all sources read and accept or reject pieces of it as they create their own understanding. One means of assessing integration of information found in typical university assignments is the open-ended task of generating a synthesis based on one or more texts (Enright et al., 1998: 48–49). We used a writing task, specifically a writing prompt that elicited the reader's perception of the authors' communicative purposes (Enright et al., 1998: 56) as well as amount of information retained from two texts, to test Reading to Integrate.
II Related literature
Recent research has begun to explore the development of tasks that distinguish the constructs of Reading to Learn from basic comprehension. Researchers (van Dijk and Kintsch, 1983; McNamara and Kintsch, 1996; Goldman, 1997) have determined that reading to learn requires an interaction between the Text Model of a text and its Situation Model, thus resulting in a more difficult measure. These researchers further suggest that Reading to Learn can be assessed through measures that go beyond recall, summarization, and text-based multiple-choice questions.
The construct of Reading to Integrate requires that readers not only integrate the Text Model with the Situation Model, but also that they create what Perfetti (1997: 346) calls a Documents Model, consisting of two critical elements: 'An Intertext Model that links texts in terms of their rhetorical relations to each other and a Situations Model that represents situations described in one or more text with links to the texts'. He argues that the use of multiple texts, as opposed to a single text, brings into clearer focus the relationship between the Text Model and the Situation Model. This again suggests that Reading to Integrate should be more difficult than Reading to Learn.
Because these constructs go beyond basic comprehension, Reading to Learn and Reading to Integrate are hypothesized to be more difficult reading tasks than Reading to Find Information and Reading for Basic Comprehension. Perfetti (1997) further suggests that Reading to Integrate is a more difficult task than Reading to Learn because it not only requires an integration of a Text Model and a Situation Model, but requires an integration of multiple Text Models and multiple Situation Models. Thus, current reading theory suggests a difficulty hierarchy of reading tasks based on the level of integration necessary to complete the tasks successfully. Several studies (Perfetti et al., 1995; 1996; Britt et al., 1996; Wiley and Voss, 1999) have attempted to move beyond basic comprehension and examine readers' ability to integrate the information from multiple texts into one cohesive knowledge base by having students make connections, compare, or contrast information across texts.
Additionally, recent research has addressed the effects of computers on reading and assessment; such research is relevant to the current project because the new TOEFL is administered via computers. Reading-medium studies have shown that the only effect that computers have on reading is related to task (Reinking and Schreiner, 1985; Reinking, 1988; van den Berg and Watt, 1991; Lehto et al., 1995; Perfetti et al., 1995; 1996; Britt et al., 1996; Foltz, 1996; Wiley and Voss, 1999). Taylor et al. (1998) found that, after minimal computer training, familiarity with technology did not have a significant effect on examinees' performance on TOEFL-like questions. Because of the relevance of computer familiarity to TOEFL administration, a brief measure of computer familiarity was included in the research.
For this project, we asked three research questions:
1) Is performance on a measure of Reading to Learn affected by medium of presentation (paper versus computer), technology familiarity, native language (native versus nonnative speakers of English), or level of education (graduate versus undergraduate)?
2) Is performance on a measure of Reading to Integrate affected by medium of presentation (paper versus computer), technology familiarity, native language (native versus nonnative), or level of education (graduate versus undergraduate)?
3) To what extent are measures of finding information/basic reading comprehension, Reading to Learn, and Reading to Integrate related?
III Methods
1 Participants
Two hundred and fifty-one participants, the majority undergraduates, volunteered to take part in this study. The sample consisted of 105 undergraduate native speakers of English (NSUs), 106 undergraduate nonnative speakers (NNSUs) and 40 graduate nonnative speakers (NNSGs) of English at a midsized southwestern university. All data were collected between February and October 1999. All undergraduate participants were recruited through large undergraduate classes in the areas enrolling most NNSs (business administration, hotel management, engineering, social sciences and humanities). We tested all NNSs accessible at the institution at the time of data collection; compared to a national sample of international students from the prior academic year, we had a larger proportion of undergraduate relative to graduate students. Nearly all undergraduate participants were young adults, with an average age of 21. Nonnative speakers were also recruited from students enrolled in the summer intensive English program, which is made up of students needing to increase TOEFL scores to at least 500 in order to enroll at a university. We included 46 participants (32% of the NNS sample) with TOEFL scores below 500 in the nonnative sample. Graduate nonnative speakers (n = 40) were recruited from the entire university population and had an average age of 30.75. Nonnative speakers represented a range of language backgrounds. One third were Japanese, with other Asian, Germanic and Romance languages also substantially represented. Both the relatively modest sample size and the all-volunteer nature of the participant sample preclude direct generalization to the worldwide TOEFL population, but participants were representative of the levels of international students at the institution where they were enrolled. Participants who completed all four data collection sessions received a payment of US$10 per hour (US$40 for the entire project).
2 Instruments
This project used three existing instruments, two to determine initial reading levels and one to assess levels of computer familiarity, and two new instruments, one for Reading to Learn and one for Reading to Integrate; these were developed especially for the project. Each of the new measures also served as the basis for an additional measure of basic reading comprehension related directly to the text included in the new task. Thus, each participant completed a total of seven different instruments.
a Existing instruments: Initial levels of reading comprehension were determined based on the Nelson–Denny Reading Test (Nelson–Denny), Form G, used to identify the reading levels of the NSs, and three retired versions of the Institutional TOEFL Reading Comprehension Section (TOEFL Reading Comprehension), used to identify the reading levels of the NNSs. Although each of these tests was used to assess reading levels in the population for which it had been developed, all 251 participants took both tests in order to provide comparative data. All 251 participants also completed a brief computer familiarity questionnaire.
Participants’ computer familiarity was determined through an 11-item questionnaire based on a longer, 23-item questionnaire previously developed by ETS (Eignor et al., 1998). In the present study, we used only the 11 items that loaded the most heavily on the major factors resulting from administration to a large sample of TOEFL participants. For these 11 items, developers determined the reliability to be .93 using a split-half method (Eignor et al., 1998: 22). This brief questionnaire took approximately 5 minutes to complete; reliability in our sample, using coefficient alpha, was .87.
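As an illustration of the reliability index named above, the following minimal sketch computes coefficient alpha from a respondents-by-items score matrix using the standard formula alpha = k/(k - 1) x (1 - sum of item variances / variance of total scores). The function name `coefficient_alpha` and the demonstration responses are hypothetical and are not data from the study.

```python
from statistics import variance


def coefficient_alpha(scores):
    """Coefficient (Cronbach's) alpha for a list of respondent rows.

    scores: one row per respondent, one numeric score per item.
    """
    k = len(scores[0])                       # number of items
    items = list(zip(*scores))               # transpose: one column per item
    item_variance_sum = sum(variance(col) for col in items)
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Hypothetical responses of five people to a four-item questionnaire.
demo = [
    [4, 4, 3, 4],
    [2, 3, 2, 2],
    [3, 3, 3, 4],
    [1, 2, 1, 1],
    [4, 3, 4, 4],
]
print(round(coefficient_alpha(demo), 2))
```

The same function applies whether the columns are questionnaire items or raters, which is how alpha can serve as an interrater index when each rater's score is treated as an item.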
b Texts used for new measures: In developing the new tasks, we selected texts that would conform to the design specifications of TOEFL 2000. They were problem/solution texts, recommended as one of the potentially relevant text types for TOEFL 2000 (Enright et al., 1998). Longer texts were used because these represented more challenging and authentic academic tasks (Enright et al., 1998). We used one 1200-word and two 600-word texts. The longer text (Tennesen, 1997) was used to assess Reading to Learn, and the two 600-word texts (Monks, 1997; Zimmerman, 1997) were used to assess Reading to Integrate. We chose these text lengths based on work by Meyer (1985a) and further research by the first author indicating that natural science texts between 1200 and 1500 words included representation of all necessary macro-rhetorical structures of problem/solution texts, with or without explicit signaling. While 1200–1500 word texts provide optimal representation of the macro-rhetorical structures, texts of 600 words provide all the basic macro-rhetorical structures present in problem/solution texts. Thus, these lengths were long enough for adequate argumentation but not so long that they were excessively redundant (Enright et al., 1998). Texts were also matched for readability according to standard readability scales, such as the Flesch–Kincaid, Coleman–Liau and Bormuth scales, and averaged a minimum of grade level 11.0 to 12.0 on these scales. Also, all texts pertained to natural and social sciences; each text covered environmental issues, such as air and water pollution (Enright et al., 1998). Thus, text topics were similar across tasks.
c New instruments used in the study: Three new reading measures were used in this study to assess Reading to Learn, Reading to Integrate, and Basic Comprehension. Trites (2000: Chapters 2 and 3) presents a more extensive review of literature and rationale for development of the new measures.
• Reading to Learn: The first new measure, completion of a chart, was used to determine participants’ ability to read to learn. Spivey (1997: 69) suggests that readers’ categorization of information in text offers insight into their cognitive processes and their making of meaning. We designed a measure to be used with a 1200-word text that students read on either paper or computer. Students were asked to recall, identify, and categorize information from the text on a chart reflecting macro-rhetorical structures, called macrostructures in this study (problems and solutions), and other types of information from problem/solution texts (causes, effects, and examples), categories based on the work of Meyer (1985a). The scoring rubric, based on work by Meyer (1985b) and later modified by Jamieson et al. (1993), awarded points only for the upper levels of textual structure represented on the chart (for task and scoring rubric, see Appendix 1). We weighted the information supplied on the chart as follows: 10 points for correct information in the problem and solution categories, five points for correct information supplied in the cause and effect categories, and one point for accurate examples. This weighting reflects Meyer’s (1985b) hierarchical levels, which characterize problem and solution propositions as higher order structures, while the other categories represent lower order propositions.1 The theoretical maximum score for this scale was 241, which would result from maximum points given in all categories. The first author and two research assistants spent 35–40 hours creating, revising and norming the scoring rubric and developing the scoring guide (Trites, 2000: Chapter 3). To determine interrater reliability, we used coefficient alpha rather than percentage of agreement because percentage of agreement can be inflated by chance agreement (Hayes and Hatch, 1999). After norming, overall interrater reliability was .99 (coefficient alpha), with similarly high alpha coefficients for all subcategories.2
• Reading to Integrate: The second new measure assessed Reading to Integrate. The task required participants to read two 600-word texts and compose a written synthesis. The prompt asked students to make connections across the range of ideas presented; thus, we asked readers to synthesize information rather than summarize or make comparisons (Wiley and Voss, 1999). This synthesis was scored on an analytic scale ranging from 0 to 80, reflecting readers’ ability to recognize and manipulate the structure of the texts, include specific information, and express connections across texts through the use of cohesive devices (for task and scoring rubric, see Appendix 2). The test was designed to measure the integration of content from both readings and did not assess other aspects of writing, such as the creation of rhetorical style, grammaticality, or mechanics. The rubric was composed of three subcategories: integration ability, macrostructure recognition, and use of relevant details. The integration subscore was awarded the highest point values because this was the predominant skill being tested. It scored participants on their ability to make connections across texts based on the manipulation of the textual frames in both texts. The second subcategory awarded points for the ability to recognize and articulate the macrostructures (problem, cause, effect, or solution) present in each text. This subcategory was similar to the categorizing task used in the Reading to Learn measure, with the additional constraint that participants had to express the connections overtly. The third subcategory in the scoring rubric analysed the ability to use relevant details as support in the written synthesis. The first author and two research assistants spent 30 hours revising and norming the scoring rubric and developing a decision guide, resulting in an overall interrater reliability of .99 (coefficient alpha), with similarly high alphas for all subcategories.
• Basic Comprehension: The third construct was measured by multiple-choice tests related specifically to the texts used in the new tasks. These tests were created by TOEFL Test Development staff and followed current TOEFL reading section specifications. We used two multiple-choice tests, Basic Comprehension Test 1 (BC1) and Basic Comprehension Test 2 (BC2), of 20 items each: one for the longer passage used to assess Reading to Learn and one for the two passages used to assess Reading to Integrate. Both were scored based on number of items answered correctly. Reliability on BC1, calculated based on 251 participants, was .84 (coefficient alpha). Inadvertently, the order of the texts used in BC2 was different for the two different media; however, reliability on both versions of the test was high. For those who took BC2 based on paper texts (n = 127), reliability was .84 (coefficient alpha); for those who took BC2 based on computerized texts (n = 124), reliability was .86 (coefficient alpha).
1 Students received no points for information improperly placed or for information not found in the text.
2 We recognize that tasks requiring high inference measures plus extensive norming and revision of the scoring rubric pose feasibility issues in large-scale testing. Further research is needed to determine whether and how such scoring procedures could be adapted in standardized testing for numerous test-takers.
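The weighting scheme described for the Reading to Learn chart (10, 5 and 1 points, with no credit for misplaced or unsupported information) can be sketched as follows. This is only an illustration under stated assumptions: the category keys, the idea labels such as `P1`, and the helper `score_chart` are hypothetical, and the actual rubric in Appendix 1 may apply further rules not modelled here.

```python
# Points per correct entry, following the weights stated in the text.
WEIGHTS = {"problem": 10, "solution": 10, "cause": 5, "effect": 5, "example": 1}


def score_chart(responses, answer_key):
    """Score a completed chart.

    responses / answer_key: dict mapping category -> set of idea labels.
    An idea earns points only if the key lists it under the same category,
    so misplaced ideas and ideas absent from the text earn nothing.
    """
    total = 0
    for category, weight in WEIGHTS.items():
        credited = responses.get(category, set()) & answer_key.get(category, set())
        total += weight * len(credited)
    return total


# Hypothetical answer key and one hypothetical response.
key = {
    "problem": {"P1", "P2"},
    "solution": {"S1"},
    "cause": {"C1"},
    "effect": {"E1"},
    "example": {"X1", "X2"},
}
response = {
    "problem": {"P1", "C1"},   # C1 is misplaced here and earns nothing
    "cause": {"C1"},
    "example": {"X1", "Z9"},   # Z9 is not in the text and earns nothing
}
print(score_chart(response, key))  # 10 + 5 + 1 = 16
```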
3 Design for data collection
This study used a 2 × 2 repeated measures design to examine performance on the new reading tasks. Native speaker undergraduates and nonnative speaker undergraduates were divided into two groups, each of equal ability as determined by performance on the baseline standardized measures of reading comprehension (Nelson–Denny or TOEFL). Half of each group read texts on paper; the other half read the same texts on a computer screen. A smaller group of nonnative speaker graduates, equally divided, was also included for a comparison between performance by graduate and undergraduate nonnative speakers. Additionally, the administration of the new measures was counterbalanced to control for any practice effect.
a Procedures: All participants met with the researchers in four sessions, each lasting about an hour. The first two sessions were devoted to administering the existing instruments. During Session 1, participants received an introduction to the study and took one of the two standardized basic reading comprehension measures (Nelson–Denny or TOEFL Reading Comprehension). Students completed the computer familiarity questionnaire and the Nelson–Denny Test at the same testing session because the Nelson–Denny was shorter than the TOEFL Reading Comprehension. During Session 2, participants took the other standardized basic reading comprehension measure.
Next, each participant group was subdivided into two subgroups for computer-based or paper reading of the texts for the new tasks. The subgroups were matched on their performance on initial reading measures: the Nelson–Denny was used for native speakers and the TOEFL Reading Comprehension was used for nonnative speakers. Independent t-tests run on these reading measures showed no significant difference in basic comprehension for the newly created subgroups assigned to each medium, ensuring that they were balanced for initial reading levels. Participants stayed in the same subgroups for the duration of the study. To ensure uniformity of response mode, all participants, whether they read the source texts on the computer or on paper, responded to the reading tasks using paper and pencil format.3
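The matching-and-checking step can be sketched as below. The source does not say how participants were allocated to the matched subgroups, so the ranked ABBA allocation in `split_matched` is purely an assumption for illustration, as are the baseline scores; `independent_t` is the standard pooled-variance Student's t statistic for two independent groups.

```python
from math import sqrt
from statistics import mean, variance


def split_matched(baseline_scores):
    """Rank scores and deal them into two subgroups in an ABBA pattern
    (hypothetical allocation rule, not the authors' documented procedure)."""
    ranked = sorted(baseline_scores, reverse=True)
    paper, computer = [], []
    for i, score in enumerate(ranked):
        # Positions 0 and 3 of every block of four go to the paper subgroup.
        (paper if i % 4 in (0, 3) else computer).append(score)
    return paper, computer


def independent_t(a, b):
    """Pooled-variance t statistic and degrees of freedom for two groups."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2


# Hypothetical baseline reading scores for eight participants.
scores = [67, 64, 61, 58, 55, 52, 49, 46]
paper, computer = split_matched(scores)
t, df = independent_t(paper, computer)
print(paper, computer, round(t, 3), df)
```

A t value near zero, as in this toy case, is what "no significant difference" between subgroups would look like; the critical value for a given df would come from a t table or a statistics package.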
3 Although responses could have been entered and perhaps scored by computer, this would have introduced factors not directly related to our research questions and remains an area for further study.
The last two sessions, each lasting approximately one hour, were dedicated to administration of the new measures. The Reading to Learn session took slightly longer to administer because administrative procedures were longer for this novel task. The new tasks were counterbalanced to control for practice effect; thus, half of the participants took the Reading to Learn measure first and half took the Reading to Integrate measure first. During Session 3, we administered the first new measure (for ease of discussion, Reading to Learn is discussed first) and BC1. At this session, students were given 12 minutes to read a 1200-word passage, either on computer or on paper. We limited the time allowed for reading based on 100 words per minute, thought to be ample (Grabe, personal communication, 1998). After examinees read the text, they were given 4 minutes to take notes on a half sheet of paper. Participants were instructed to take minimal notes due to the time constraints. Next, the text was removed and examinees were allowed 15 minutes to complete a chart based on the reading with the aid of their notes. After completing this Reading to Learn activity, participants were allowed to use the text and were given 15 minutes to answer BC1. Following these new testing sessions, 49 participants were selected for a related interview concerning the cognitive processes used in task completion (for further details, see Trites, 2000: Chapter 6).
During Session 4, students were given 12 minutes to read two short texts (600 words each), either on computer or paper. After participants read the assigned texts, they were given 4 minutes to take one-half page of notes (Enright et al., 1998). Next, the texts were removed and participants were asked to demonstrate Reading to Integrate by writing a synthesis of the texts with the aid of their notes (15 minutes allowed for this task). After completing the Reading to Integrate task, participants were allowed to see the texts again and answered BC2 (15 minutes allowed for this task). In one Reading to Integrate session, for unknown reasons, six of the seven participants read only one text. Because we cannot explain the cause of this anomalous session, we have eliminated scores from the session’s seven participants from subsequent analyses, thus slightly reducing the N size for the Reading to Integrate measure.
b Variables used in study: The six independent variables included three nominal (Native Language Background, Medium of Text Presentation, and Level of Education) and three interval variables (Nelson–Denny, TOEFL Reading Comprehension, and Computer Familiarity). The four dependent variables were Reading to Learn, Reading to Integrate, BC1, and BC2.
IV Results
First, we present the descriptive statistics for all reading measures, followed by a systematic analysis of independent variables that might affect participant performance on the new measures. Scatterplots were checked for all reading measures to ensure normality of data. Kurtosis and skewness levels for all reading measures were found to be within normal limits, indicating a relatively normal distribution. Descriptive statistics for all existing measures are shown in Table 1. Means for these measures show a consistent pattern: the native speaker undergraduates had the highest mean, followed by the nonnative speaker graduates, followed by the nonnative speaker undergraduates. On the reading measures, Nelson–Denny and TOEFL Reading Comprehension, the nonnative speaker undergraduates showed the largest variance in performance, while on the computer familiarity measure the variance of both nonnative speaker groups was substantially larger than that of the native speakers.

Table 1 Descriptive statistics for existing measures for three participant groups

Group                            n     Mean      sd   kMax
Nelson–Denny
  NSU                          105   126.48   16.46    156
  NNSU                         106    67.24   31.91    156
  NNSG                          40    88.88   21.88    156
  Total participants           251    95.47   36.93    156
TOEFL Reading Comprehension
  NSU                          105    61.30    4.24     67
  NNSU                         106    50.30    8.53     67
  NNSG                          40    57.15    4.55     67
  Total participants           251    55.99    8.19     67
Computer familiarity
  NSU                          104    38.08    3.60     44
  NNSU                         104    34.82    5.99     44
  NNSG                          40    35.63    6.02     44
  Total participants           248    36.31    5.33     44

Note: kMax = number of items or maximum possible score.

The same pattern emerged for the means on the new measures (see Table 2) as for the existing measures. The native speaker undergraduate group performed better on all new measures than both of the nonnative speaker groups. The nonnative speaker graduate group performed better than the nonnative speaker undergraduate group on all measures as well. This robust pattern of performance was also found in the variance of three of the four new measures. On BC1 and BC2, the performance of the native speaker undergraduates showed the least amount of variance, followed by the nonnative speaker graduates, followed by the nonnative speaker undergraduates. On Reading to Integrate, the native speaker undergraduate group showed substantially less variance than the nonnative speaker groups; however, the variance of the two nonnative speaker groups was almost identical. On Reading to Learn, all three groups showed considerable variance.
Table 3 reveals the range of awarded points achieved by all participant groups. The nature of the Reading to Learn point system created a maximum possible point value (241) that no participant achieved. We speculate that there are at least three possible causes of the discrepancy between the theoretical maximum and the range of observed scores:
• task novelty: no participant reported ever doing such a task previously;
• time allowed for task completion; and
• space on the response sheet: space constraints may have limited the amount of information that participants could include.
Future research would need to address these issues. However, for the Reading to Integrate measure, the maximum possible point total (80) was achieved by at least one participant in each group.
1 Computer familiarity
The overall plan for the analyses was to check the influence of the independent variables on the dependent measures, with computer familiarity being addressed first. Initially, we had proposed that if computer familiarity was significantly different across groups, it would be entered into all calculations as a covariate. To determine this, it was necessary to conduct an Analysis of Variance (ANOVA) for computer familiarity across the six participant/medium subgroups.

Table 2 Descriptive statistics for new measures for three participant groups

Group                                n     Mean      sd   kMax
Reading to Learn (chart)
  NSU                              105    51.85   19.86    241
  NNSU                             106    31.73   19.50    241
  NNSG                              40    44.68   19.27    241
  Total participants               251    42.21   21.64    241
Basic Comprehension Test 1
  NSU                              105    16.98    2.47     20
  NNSU                             106    11.73    4.25     20
  NNSG                              40    14.98    3.50     20
  Total participants               251    14.44    4.23     20
Reading to Integrate (synthesis)
  NSU                              101    63.65   11.05     80
  NNSU                             103    37.24   21.76     80
  NNSG                              40    53.60   21.03     80
  Total participants               244    50.86   21.63     80
Basic Comprehension Test 2
  NSU                              105    15.91    2.78     20
  NNSU                             106     9.75    4.54     20
  NNSG                              40    12.85    3.61     20
  Total participants               251    12.82    4.72     20

Notes: kMax = number of items or maximum possible score; n size reduced for Reading to Integrate because of anomalous testing session.

The resulting ANOVA (F = 4.70, p < .05) showed a significant difference between subgroups on the computer familiarity questionnaire; therefore, a post hoc Scheffé test was done to locate significant contrasts. After analysis of all possible subgroup contrasts, the post hoc Scheffé revealed that the only significant difference in subgroups appeared between the native speaker undergraduates and nonnative speaker undergraduates who read texts on paper. Hence, although there was one significant contrast, it occurred in two subgroups reading on paper, not in any of the subgroups who read on computer. All groups generally scored high on computer familiarity although, as noted, variance of the nonnative groups was greater. It was thus established that computer familiarity had no significant effect on participants who read texts on computer, so we did not use computer familiarity as a covariate in further analyses and proceeded to the three research questions of central interest to this study.
Because Research Questions 1 and 2 are similar, except that they address the two different new reading measures (Reading to Learn and Reading to Integrate), we approached them in the same manner, through ANOVA, to identify the independent variables that could have significantly affected the results on the new measures.

Table 3 Range of scores for new measures for three participant groups

Group                                n   Minimum   Maximum   kMax
Reading to Learn (chart)
  NSU                              105        14       120    241
  NNSU                             106         0        86    241
  NNSG                              40         3        94    241
  Total participants               251         0       120    241
Reading to Integrate (synthesis)
  NSU                              101        38        80     80
  NNSU                             103         0        80     80
  NNSG                              40         5        80     80
  Total participants               244         0        80     80

Notes: kMax = number of items or maximum possible score; n size reduced for Reading to Integrate because of anomalous testing session.

2 Research Question 1
The first research question asked if performance on a measure of Reading to Learn was affected by medium of presentation, computer familiarity, native language, or level of education. We calculated a univariate ANOVA with Type III sums of squares on Reading to Learn, with group status, medium of text presentation, and test order as possible contributing factors.4 Table 4 shows that there were no significant interactions for any of the group, medium, or test order combinations. The only significant main effect was group membership.
Because group membership was a combined measure that included both native language background and level of education, post hoc analysis was needed to identify the significant contrasts. Table 5 shows that there was a significant difference in performance on the Reading to Learn measure between the native speaker undergraduate and the nonnative speaker undergraduate groups, as well as a significant difference between the nonnative speaker undergraduate and nonnative speaker graduate groups. There was no significant difference in performance between the native speaker undergraduate and the nonnative speaker graduate groups. Therefore, the answer to Research Question 1 is that native language background and level of education did have a significant effect on performance on the Reading to Learn measure, but that medium of text presentation did not. Further, order of testing (whether participants took Reading to Learn or Reading to Integrate first) had no significant effect.
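The F ratios reported in Table 4 follow from the usual relation F = (SS effect / df effect) / (SS error / df error). The sketch below re-derives two of them as a consistency check. The sums of squares are read from Table 4 on the assumption that the decimal points lost in this transcript fall two places from the right (for example, 21869.29 for Group); the function name `f_ratios` is hypothetical.

```python
def f_ratios(effects, error_ss, error_df):
    """Return F = MS_effect / MS_error for each named effect.

    effects: dict mapping effect name -> (sum of squares, degrees of freedom).
    """
    ms_error = error_ss / error_df
    return {name: (ss / df) / ms_error for name, (ss, df) in effects.items()}


# Values read from Table 4 (decimal placement is an assumption, see lead-in).
table4_effects = {
    "Group": (21869.29, 2),
    "Test order": (294.98, 1),
}
ratios = f_ratios(table4_effects, error_ss=92997.45, error_df=239)
for name, f in ratios.items():
    print(f"{name}: F = {f:.2f}")
```

Under that reading of the table, the Group ratio comes out far larger than the Test order ratio, in line with the text's statement that group membership was the only significant main effect.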

Table 4 Performance on Reading to Learn measure by groups, medium, and test order (n = 251) (univariate analysis of variance)

Source                          Type III sum    df   Mean square        F
                                of squares
Group                               21869.29     2      10934.64   28.10*
Medium                               1215.86     1       1215.86    3.13
Test order                            294.98     1        294.98    0.76
Group × medium                        334.81     2        167.40    0.43
Group × test order                      4.37     2          2.19    0.01
Medium × test order                   391.73     1        391.73    1.01
Group × medium × test order           575.29     2        287.65    0.74
Error                               92997.45   239        389.11

Note: *p < .05.

4 Test order was added as an additional variable to double check that our counterbalancing had been effective in controlling for any practice effect.
3 Research Question 2
The second research question, related to the first, asked if performance on Reading to Integrate was affected by medium of presentation, computer familiarity, native language, or level of education. Again, to ensure that counterbalancing of tests controlled for any practice effect, test order was added as an additional variable.
To answer this question, we proceeded to calculate a univariate ANOVA on the Reading to Integrate measure, with group status, medium of text presentation, and test order entered as possible contributing factors. The results (Table 6) show, as for Research Question 1, that there were no significant interactions for any of the group, medium, or test order combinations; the only significant main effect was group membership. The answer for Research Question 2 is that native language background and educational level had a significant effect on Reading to Integrate, but medium of text presentation did not. Post hoc analysis of group contrasts showed that all three groups were distinct in their performance on Reading to Integrate (see Table 7).

Table 5 Post hoc Scheffé for Reading to Learn measure (n = 251)

Group      n    Group      n    Mean difference   Standard error
NSU      105    NNSU     106             20.12*             2.72
                NNSG      40              7.17              3.67
NNSU     106    NNSG      40             12.95*             3.66

Note: *p < .05.

Table 6 Performance on Reading to Integrate measure by groups, medium, and test order (n = 244a) (univariate analysis of variance)

Source                          Type III sum    df   Mean square        F
                                of squares
Group                               36294.33     2      18147.17   55.82b
Medium                                192.95     1        192.95    0.59
Test order                             97.83     1         97.83    0.30
Group × medium                         11.82     2          5.91    0.02
Group × test order                     30.14     2         15.07    0.05
Medium × test order                  1098.72     1       1098.72    3.38
Group × medium × test order          1488.58     2        744.29    2.29
Error                               75430.37   232        325.13

Notes: a n size reduced for Reading to Integrate because of anomalous testing session; b p < .05.

4 Research Question 3
The third research question asked to what extent measures of basic comprehension, Reading to Learn, and Reading to Integrate were related. We used correlational analysis as the first step in answering this question. Results for the total participant population (see Table 8) showed moderate to high correlations across all reading measures. However, the analyses done for Research Questions 1 and 2 revealed that group status had a significant effect on performance on Reading to Learn and Reading to Integrate. Further, we realize that correlations are sensitive to variance, so the high correlations seen in the total population could have been an artifact of combining the three groups. Therefore, we examined the correlations among all reading measures for each group (available in Trites, 2000: Appendix 1, pp. 230–33). While the reading measures were still correlated, often moderately, sometimes highly, magnitudes differed and sometimes dropped substantially. The text-specific multiple-choice measures, BC1 and BC2, consistently correlated more highly with the Nelson–Denny and TOEFL Reading Comprehension tests than with Reading to Learn and Reading to Integrate based on the same texts, suggesting a test method or construct effect. Because comparisons between different measures of basic comprehension were not a goal of the project, BC1 and BC2 were not used in further analyses. We conclude that, as expected, all reading measures were related, but the lower correlations between Reading to Learn and Reading to Integrate and the traditional basic comprehension measures led us to consider further types of analysis to identify the possible distinctiveness of the new measures.
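The concern that pooled correlations can be an artifact of combining groups with different means can be illustrated with a small constructed example. The scores below are invented for the demonstration and bear no relation to the study data; `pearson_r` is a plain implementation of the Pearson product-moment correlation.

```python
from math import sqrt
from statistics import mean


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Two hypothetical groups: within each, the two measures are uncorrelated,
# but the second group scores about ten points higher on both.
low_x, low_y = [1, 2, 3, 4], [2, 4, 1, 3]
high_x, high_y = [11, 12, 13, 14], [12, 14, 11, 13]

r_low = pearson_r(low_x, low_y)
r_high = pearson_r(high_x, high_y)
r_pooled = pearson_r(low_x + high_x, low_y + high_y)
print(r_low, r_high, round(r_pooled, 2))
```

Here the pooled coefficient is driven almost entirely by the difference between group means, which is why examining correlations group by group is a reasonable check.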

Table 7 Post hoc Scheffé for Reading to Integrate measure (n = 244a)

Group      n    Group      n    Mean difference   Standard error
NSU      101    NNSU     103             26.41b             2.53
                NNSG      40             10.05b             3.37
NNSU     103    NNSG      40             16.36b             3.36

Notes: a n size reduced for Reading to Integrate because of anomalous testing session; b p < .05.

Table 8 Correlations for all reading measures for all participants (n = 251)

                               TOEFL Reading    Basic            Basic            Reading     Reading to
                               Comprehension    Comprehension    Comprehension    to Learn    Integratea
                                                Test 1           Test 2
Nelson–Denny                   .90b             .85b             .84b             .66b        .69b
TOEFL Reading Comprehension    1.00             .85b             .84b             .64b        .69b
Basic Comprehension 1                           1.00             .84b             .68b        .68b
Basic Comprehension 2                                            1.00             .68b        .70b
Reading to Learn                                                                  1.00        .59b

Notes: a n size reduced for Reading to Integrate because of anomalous testing session; b p < .05.

5 Discriminant analysis
Because we were interested in determining how constructs differed, we sought additional analyses to help us better characterize the new constructs. Of the several possible statistical methods that could have been employed, two are most plausible: multivariate analysis of variance, usually associated with experimental research, and discriminant analysis, usually associated with descriptive research (Tabachnick and Fidell, 1996). The present research was conducted with samples of naturally occurring student groups and was not experimental. Moreover, we were interested in finding ways to compare participant performance on the measures of the new constructs, Reading to Learn and Reading to Integrate, with performance on more traditional measures of basic comprehension. Thus, we opted to use discriminant analysis because of its parsimony of description and clarity of interpretation (Stevens, 1996). Discriminant analysis, a technique recommended to describe group differences or predict group membership based on a comparison of multiple predictors (Huberty, 1994), has been used in other areas of applied linguistic research to investigate creation of a student profile of success or failure on Computer Assisted Language Learning (CALL) lessons (Jamieson et al., 1993) and accurate classification of text types into registers (Biber, 1993), among other purposes.
To further distinguish basic comprehension from the new constructs, we conducted discriminant analysis on each of the two language groups (native and nonnative) to determine whether Reading to Learn and Reading to Integrate would classify participants in the same way that Basic Comprehension would. We divided the native speaker and nonnative speaker groups into three levels (high, middle (mid), and low reading ability scorers) based on the basic comprehension measure chosen for that group (Nelson–Denny for native speakers, TOEFL Reading Comprehension for nonnative speakers). Research methodologists (Tabachnick and Fidell, 1996: 513) note that robustness is expected when the smallest group has at least 20; our smallest group was 25. To check the assumption of homogeneity of the variance/covariance matrices, we examined the outcomes of Box’s M Test and found them all nonsignificant (Klecka, 1980). Each group was checked for outliers using Mahalanobis distance treated as Chi-Square, and no outliers were found (Tabachnick and Fidell, 1996). Thus, the data met all assumptions required for use of discriminant analysis.
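The outlier screen mentioned above can be sketched for the two-predictor case. Several details are assumptions rather than facts from the source: that the two predictors are the Reading to Learn and Reading to Integrate scores, that the criterion is alpha = .001 (a commonly used convention for this check), and the invented score pairs. With two predictors the squared distance is compared with a chi-square value on 2 degrees of freedom, whose cutoff has the closed form -2 ln(alpha).

```python
from math import log
from statistics import mean


def mahalanobis_sq(points):
    """Squared Mahalanobis distance of each (x, y) pair from the centroid,
    using the sample (n - 1) covariance matrix of the same points."""
    n = len(points)
    mx = mean(p[0] for p in points)
    my = mean(p[1] for p in points)
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    distances = []
    for x, y in points:
        dx, dy = x - mx, y - my
        # d^T S^-1 d written out for a 2 x 2 covariance matrix
        distances.append((syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det)
    return distances


def chi2_cutoff_df2(alpha):
    """Chi-square critical value for 2 df: survival function is exp(-x / 2)."""
    return -2 * log(alpha)


# Hypothetical (Reading to Learn, Reading to Integrate) score pairs.
pairs = [(1, 2), (2, 1), (3, 5), (4, 3), (5, 6)]
d2 = mahalanobis_sq(pairs)
cutoff = chi2_cutoff_df2(0.001)
outliers = [p for p, d in zip(pairs, d2) if d > cutoff]
print([round(d, 2) for d in d2], round(cutoff, 2), outliers)
```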
a Discriminant analysis for nonnative speakers: To organize the discriminant analysis in order to see if the Reading to Learn/Reading to Integrate Composite classified participants similarly to the measure of Basic Comprehension (for nonnative speakers, TOEFL Reading Comprehension), we divided the entire nonnative speaker group (n = 146) into three levels of basic comprehension: high, mid, and low. These reading ability groups of high, mid, and low were based on typical TOEFL Reading Comprehension score levels required for program entry. Participants were classified as high if their scores were greater than or equal to 550, or 56 and above on the scaled score on the TOEFL Reading Comprehension (550 is the cut score often used for graduate entry). Participants were classified as mid if scores ranged between 500 and 549, or 50 to 55 on the TOEFL Reading Comprehension. Participants were classified as low if their scores fell below 500, or 49 and below on TOEFL Reading Comprehension (500 is a minimum TOEFL score sometimes used for undergraduate admission, often with the proviso that students enroll in ESL classes either prior to official enrollment or concurrently). For our entire nonnative speaker group, descriptive statistics on basic comprehension reading ability group membership levels appear in Table 9.
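The three-way split on the scaled TOEFL Reading Comprehension score can be expressed as a small rule. The sketch below only restates the cut points given in the text (56 and above, 50 to 55, 49 and below); the function name is hypothetical.

```python
def toefl_reading_group(scaled_score):
    """Return 'high', 'mid', or 'low' for a TOEFL Reading Comprehension scaled score."""
    if scaled_score >= 56:
        return "high"  # 56 and above
    if scaled_score >= 50:
        return "mid"   # 50 to 55
    return "low"       # 49 and below


# Hypothetical scaled scores for four participants.
groups = [toefl_reading_group(s) for s in (60, 55, 50, 42)]
```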
Table 9 Descriptive statistics for nonnative speakers by TOEFL reading comprehension reading ability groups (n = 143)

Reading ability group    n      Mean on TOEFL reading comprehension    sd
High (≥ 56)              57     59.68                                  3.03
Mid (50–55)              42     52.81                                  1.67
Low (≤ 49)               44     42.11                                  5.47
Total                    143    52.26                                  8.22

Note: Total n = 143 due to loss of three cases in anomalous Reading to Integrate session.

We ran SPSS Discriminant Analysis with initial grouping variables of high, mid, and low reading ability. We compared initial reading ability levels with high, mid, and low categories on the Reading to Learn/Reading to Integrate Composite, a new variable reflecting level of performance on the Reading to Learn and Reading to Integrate measures combined.⁵ The discriminant analysis yielded one discriminant function with an eigenvalue of .92, responsible for 99.9% of the variance in outcomes. Wilks' Lambda for this function was .52, significant at .001; the associated Chi-Square value was extremely large (90.93) and highly significant (p < .001), indicating the group centroids on the composite Reading to Learn/Reading to Integrate function for the three nonnative speaker reading ability groups were significantly different. Both Reading to Learn and Reading to Integrate loaded significantly on the discriminant function, at .86 for Reading to Learn and .78 for Reading to Integrate (p < .05).

⁵ We first calculated two separate discriminant analyses, one for Reading to Learn and one for Reading to Integrate, but we found that both loaded on a single function, so we used the composite in subsequent analyses.
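The reported eigenvalue, Wilks' Lambda, and Chi-Square for the nonnative speaker analysis can be roughly cross-checked against one another. The sketch below rests on several assumptions that the text does not state: that the analysis had two predictors and three groups (hence two functions), that the unreported second eigenvalue can be inferred from the 99.9% share of variance, and that the Chi-Square was obtained with Bartlett's approximation. Under those assumptions the computation reproduces the reported Lambda and comes close to, but does not exactly match, the reported Chi-Square.

```python
import math


def wilks_lambda(eigenvalues):
    """Wilks' Lambda as the product of 1 / (1 + eigenvalue) over all functions."""
    result = 1.0
    for eigenvalue in eigenvalues:
        result *= 1.0 / (1.0 + eigenvalue)
    return result


def bartlett_chi_square(lam, n_cases, n_predictors, n_groups):
    """Bartlett's Chi-Square approximation for a given Wilks' Lambda."""
    multiplier = n_cases - 1 - (n_predictors + n_groups) / 2.0
    return -multiplier * math.log(lam)


first = 0.92    # reported eigenvalue of the first function
share = 0.999   # reported proportion of variance for that function
# Assumption: the second eigenvalue is inferred from the variance share.
second = first * (1 - share) / share

lam = wilks_lambda([first, second])
chi2 = bartlett_chi_square(lam, n_cases=143, n_predictors=2, n_groups=3)
degrees_of_freedom = 2 * (3 - 1)  # predictors x (groups - 1)
```

The small gap between the computed and reported Chi-Square is what one would expect from working with rounded published values.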
Over half (64.9%) of the high reading ability group remained high on the new measure; less than half (42.9%) of the mid reading ability group remained classified as mid. Hence, the Reading to Learn/Reading to Integrate Composite was particularly influential in reclassifying the mid reading ability group and, to a lesser extent, the high group. However, most (81.8%) of the low reading ability group remained low on the Reading to Learn/Reading to Integrate Composite (see Table 10). Of the 143 nonnative speaker participants, 91 (64%) remained in the initial basic comprehension category on the composite; the rest moved, but in different directions. Twenty-one participants (14.7%) were classified into a higher category on the Reading to Learn/Reading to Integrate Composite than their initial basic comprehension level would have suggested, while 31 (21.7%) were reclassified into a lower category. Thus, 52 participants (36.4%), just over one third of the sample, were classified differently based on their Reading to Learn/Reading to Integrate Composite performance.
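The counts of participants who stayed in place, moved up, or moved down follow directly from the cell counts in Table 10. As an illustration, the tally can be sketched as below; the function name and the dictionary layout are hypothetical, and the matrix is simply the count panel of Table 10 with rows as initial groups and columns as predicted groups, both ordered high, mid, low.

```python
def reclassification_summary(table):
    """Tally cases that kept, raised, or lowered their category.

    table[i][j] is the number of cases with initial level i and predicted
    level j, with levels ordered from highest (index 0) to lowest.
    """
    size = len(table)
    total = sum(sum(row) for row in table)
    same = sum(table[i][i] for i in range(size))
    # A predicted index smaller than the initial index is a move to a higher category.
    higher = sum(table[i][j] for i in range(size) for j in range(size) if j < i)
    lower = total - same - higher
    return {"total": total, "same": same, "higher": higher, "lower": lower}


table_10 = [
    [37, 17, 3],   # initial high
    [13, 18, 11],  # initial mid
    [2, 6, 36],    # initial low
]
summary = reclassification_summary(table_10)
moved = summary["higher"] + summary["lower"]
moved_percent = round(100.0 * moved / summary["total"], 1)
```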
Table 10 Discriminant analysis comparison of nonnative speaker reading ability groups with Reading to Learn/Reading to Integrate Composite (n = 143)

                          Predicted group membership for Reading to
                          Learn/Reading to Integrate Composite
Reading ability group     High     Mid      Low      Initial classification total

Count
High (≥ 56)               37       17       3        57
Mid (50–55)               13       18       11       42
Low (≤ 49)                2        6        36       44
Reclassification total    52       41       50       143

Percentage
High (≥ 56)               64.9     29.8     5.3      100.0
Mid (50–55)               31.0     42.9     26.2     100.0
Low (≤ 49)                4.5      13.6     81.8     100.0

Note: Total n = 143 due to loss of three cases in anomalous Reading to Integrate session.

b Discriminant analysis for native speakers: Because one of our goals in this project was to probe the possible validity of these new measures by assessing performance of two groups, native as well as nonnative speakers, we conducted a parallel discriminant analysis for native speakers. Thus, for the native speakers, we followed the same procedure, dividing the entire native speaker group (n = 105) into three levels of basic comprehension, high, mid, and low, based on score distances of .5 standard deviations from the sample mean of the Nelson–Denny test for these participants. Native speakers need not take reading comprehension tests when entering the university, so the three-way split was based entirely on our sample data. Participants were classified as high if their scores on the Nelson–Denny were greater than or equal to 135. Participants were classified as mid if their scores ranged from 119 to 134 on the Nelson–Denny. Participants were classified as low if their scores fell at or below 118 on the Nelson–Denny. Descriptive statistics for the entire native speaker sample on basic comprehension group membership appear in Table 11.
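The way cut scores fall out of a split at .5 standard deviations around the sample mean can be sketched as follows. This is an illustration under stated assumptions: it uses the mean (126.04) and standard deviation (16.52) reported in Table 11 for the 101 cases in the discriminant analysis, which may differ slightly from the values the authors used for the full sample of 105, and it assumes the cut points were rounded to whole scores. The function names are hypothetical.

```python
def half_sd_cutoffs(mean, sd, width=0.5):
    """Return integer (low_cut, high_cut) placed `width` SDs below and above the mean."""
    return round(mean - width * sd), round(mean + width * sd)


def nelson_denny_group(score, low_cut, high_cut):
    """Classify a score as 'low' (at or below low_cut), 'mid', or 'high' (above high_cut)."""
    if score <= low_cut:
        return "low"
    if score <= high_cut:
        return "mid"
    return "high"


# Table 11 values, taken here as stand-ins for the sample statistics.
low_cut, high_cut = half_sd_cutoffs(126.04, 16.52)
```

Under these assumptions the rounded cut points agree with the bands given in the text: low at or below 118, mid from 119 to 134, and high at 135 or above.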
Table 11 Descriptive statistics for native speakers by Nelson–Denny reading ability groups (n = 101)

Reading ability group    n      Mean on Nelson–Denny    sd
High (≥ 135)             39     141.26                  5.19
Mid (119–134)            37     125.65                  5.09
Low (≤ 118)              25     102.88                  10.98
Total                    101    126.04                  16.52

Note: Total n = 101 for discriminant analysis due to loss of four cases in anomalous Reading to Integrate session.

The discriminant analysis yielded one discriminant function with an eigenvalue of .31, responsible for 98.7% of the variance in outcomes. Wilks' Lambda for this function was .76, significant at .001; the associated Chi-Square value was large (26.39) and highly significant (p < .001), indicating the group centroids on the discriminant function for the three reading ability groups on the Reading to Learn/Reading to Integrate Composite were significantly different. Pooled within-groups correlations between discriminating variables showed that Reading to Learn correlated with the first discriminant function at a level of .81; Reading to Integrate correlated with the second discriminant function at .71. This contrasts with findings for the nonnative speakers, where scores on the combined new measures loaded significantly on only one discriminant function. For native speakers, then, there is evidence for two significant discriminant functions, although the first accounts for almost all of the variance. Although these two measures (Reading to Learn and Reading to Integrate) loaded on two separate discriminant functions, they still showed moderate correlations with the alternate function,⁶ justifying the composite calculations.
Results of discriminant analysis for native speakers, seen in Table 12, show a different pattern than that observed for nonnative speakers. Nearly three-fourths (71.8%) of the native speakers classified as high in the basic comprehension reading ability group remained high on the Reading to Learn/Reading to Integrate Composite. For the mid group, however, only 18.9% remained classified as mid. Just over half (52%) of the low reading ability group members remained low on the Reading to Learn/Reading to Integrate Composite. As with the nonnative speakers, participants in the mid category on basic comprehension showed the most frequent reclassification. Forty-eight of the 101 (47.5%) native speaker participants remained in the initial classification categories. Twenty-two (21.8%) were reclassified into a higher category and 31 (30.7%) were reclassified into a lower category. Thus, 53 participants (52.5%), over half of the sample, were classified differently based on their performance on the Reading to Learn/Reading to Integrate Composite.
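The percentage panel of Table 12 is the count panel expressed as a share of each initial group, and the reclassification totals are the column sums. A minimal sketch of both computations, with hypothetical function names and the Table 12 counts entered by hand (rows are initial groups, columns are predicted groups, both ordered high, mid, low):

```python
def row_percentages(table):
    """Express each cell as a percentage of its row total, rounded to one decimal."""
    return [[round(100.0 * cell / sum(row), 1) for cell in row] for row in table]


def column_totals(table):
    """Sum each column, giving the number of cases predicted into each category."""
    return [sum(row[j] for row in table) for j in range(len(table[0]))]


table_12 = [
    [28, 3, 8],    # initial high (n = 39)
    [10, 7, 20],   # initial mid (n = 37)
    [5, 7, 13],    # initial low (n = 25)
]
percentages = row_percentages(table_12)
predicted = column_totals(table_12)
# Share of all native speakers placed in the mid category by the composite.
predicted_mid_percent = round(100.0 * predicted[1] / sum(predicted), 1)
```

The thin middle column is the feature the authors later describe as the near elimination of the mid category for native speakers.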
⁶ Reading to Learn correlated with function 2 at −.58; Reading to Integrate with function 1 at .71.
Table 12 Discriminant analysis comparison of native speaker reading ability groups with Reading to Learn/Reading to Integrate Composite (n = 101)

                          Predicted group membership for Reading to
                          Learn/Reading to Integrate Composite
Reading ability group     High     Mid      Low      Initial classification total

Count
High (≥ 135)              28       3        8        39
Mid (119–134)             10       7        20       37
Low (≤ 118)               5        7        13       25
Reclassification total    43       17       41       101

Percentage
High (≥ 135)              71.8     7.7      20.5     100.0
Mid (119–134)             27.0     18.9     54.1     100.0
Low (≤ 118)               20.0     28.0     52.0     100.0

Note: Total n = 101 for discriminant analysis due to loss of four cases in anomalous Reading to Integrate session.
V Interpretation
Analyses done to answer Research Questions 1 and 2 showed that performance on Reading to Learn and Reading to Integrate measures was significantly influenced by language background and level of education (graduate vs. undergraduate, for nonnative speakers only). Moreover, level of computer familiarity had no significant effect on Reading to Learn and Reading to Integrate performance.
Correlations showed that, as expected, all reading measures correlated to some degree, generally answering Research Question 3. The most interesting results came from the discriminant analyses because they showed a pattern of differential classification on the new tasks, sensitive to initial reading ability. This pattern differed for nonnative speakers and native speakers. Examination of the reclassifications based on the new tasks revealed some of the problems of using a basic comprehension-only test to predict performance on more challenging literacy tasks. These results also imply the need for further work on new measures of advanced literacy skills, such as Reading to Learn and Reading to Integrate, to reflect trends in construct-driven assessment (Pellegrino et al., 1999).
For the nonnative speakers, most (81.8%) of the participants with TOEFL Reading Comprehension below 50 remained classified as low on the Reading to Learn/Reading to Integrate Composite, suggesting the existence of a lower threshold of academic English proficiency. Participants below this threshold were unlikely to perform well on tasks assessing Reading to Learn and Reading to Integrate. Even tasks involving only selection (such as BC1 and BC2) rather than production of responses were difficult for this group. However, for the nonnative speaker readers in the mid and high reading ability groups, basic comprehension level was not nearly as consistent a predictor of performance on the Reading to Learn/Reading to Integrate Composite. This was especially striking for those in the mid reading ability group. Approximately one fourth of the mid group dropped into the low category on the Reading to Learn/Reading to Integrate Composite, indicating that they experienced more difficulty in completing the new measures, but approximately one fourth could indeed categorize and synthesize information relatively well, despite middling performance on basic comprehension. This finding suggests that, for nonnative speaker readers in the mid reading ability category, basic comprehension measures were insufficient to predict their performance on the new measures.
Almost two-thirds of the high basic comprehension nonnative speakers remained high on the Reading to Learn/Reading to Integrate Composite, but one third dropped, suggesting that they could not categorize and synthesize information from texts as well as they could recognize information. For this third, the recognition-only task, multiple-choice basic comprehension, overestimated ability to manipulate and synthesize information. For the high and mid reading ability nonnative speaker groups, basic comprehension-only tests may then overestimate academic English proficiency.
Results for the native speakers showed a more definitive pattern of reclassification. For the low reading ability group, approximately half were misclassified, suggesting that the basic comprehension measure underrepresented their ability to categorize and synthesize information in academic texts. Most of the mid reading ability group were reclassified as lower on the Reading to Learn/Reading to Integrate Composite, showing that basic comprehension results overestimated ability to succeed on more challenging tasks. On the other hand, almost one third of the mid reading ability group did better on the Reading to Learn/Reading to Integrate Composite; for them, basic comprehension underrepresented their ability to complete more challenging reading tasks. For the high ability readers, nearly one third dropped on the Reading to Learn/Reading to Integrate Composite. For them, the basic comprehension measure overestimated their level of academic English proficiency. The other two thirds of the high reading ability group remained high, suggesting a higher threshold of academic English proficiency.
Considering results from both the nonnative speakers and native speakers, we conclude that the new tasks did assess something different from basic comprehension once a lower level threshold of basic academic English proficiency had been achieved. Examination of scatterplots based on discriminant functions indicated an obvious separation between participants who could and could not perform on the Reading to Learn/Reading to Integrate Composite, thus providing some tentative evidence for concurrent validity (Messick, 1989; Chapelle, 1999). We had hoped to find clear evidence of a hierarchy, suggesting that Reading to Learn was demonstrably more difficult than basic comprehension and Reading to Integrate demonstrably more difficult than Reading to Learn, but results did not yield an obvious hierarchy. Results did, however, suggest an even simpler pattern: a dichotomy. For those nonnative speakers above the lower threshold of English language proficiency, which in our data would be approximately 500, or 49 on the scaled TOEFL Reading Comprehension scores, some could perform well on the new measures and some could not, revealing two rather than three groups on the Reading to Learn/Reading to Integrate Composite. These results lead us to speculate that the new measures tap additional skills such as 'sophisticated discourse processes and critical thinking skills' (Enright and Schedl, 1999: 24) in addition to language proficiency. For native speakers, the discriminant loading suggested a possible hierarchy of difficulty, but the scatterplots and classification tables revealed a dichotomy. Reclassification based on the Reading to Learn/Reading to Integrate Composite nearly eliminated the mid basic comprehension reading ability category, with only 16.8% of the native speakers qualifying as mid on the Reading to Learn/Reading to Integrate Composite.
VI Conclusions
While we acknowledge that there are some limitations to this project, such as the time required to administer and score new measures, the results are illuminating and useful, and they suggest some considerations for future research. The first consideration relates to appropriate classification of students based on academic reading abilities. This corresponds to the use of TOEFL as a gatekeeper by many institutions. For TOEFL test-takers whose current TOEFL scores would be in an intermediate to high intermediate range, the new tasks could assess additional abilities relevant to academic performance. For such participants, additional test development efforts are warranted. The majority of high reading ability nonnative speakers (64.9%) could perform the new measures; a third could not. Our results suggest that some unknown number of mid reading ability nonnative speakers are more capable of succeeding at more challenging academic reading tasks than their current level of basic comprehension assessment would indicate. At the same time, there are some high and mid reading ability participants who could not perform the more demanding tasks; admitting such students directly into university programs could result in failure. These results suggest that, for most students at lower levels of basic comprehension (in our study, those with TOEFL scores below 490 to 500), development of new tasks is unnecessary. Nevertheless, our results indicate that a certain small percentage (in our sample, 8 students, 18.2%) of nonnative speakers classified as low reading ability on basic comprehension did in fact perform adequately on more challenging tasks like Reading to Learn and Reading to Integrate. Given the very large number of TOEFL test-takers around the world, this finding merits further research. It would be potentially unfair to exclude such students from university study based on basic comprehension-only measures such as the current TOEFL. We would urge institutions that use TOEFL to consider scores on these new, more demanding tasks if doing so would address institutional needs. According to interview data collected in conjunction with this project (Trites, 2000: Chapter 6), participants perceived that successful completion of the new tasks required a thorough understanding of the texts, whereas the multiple-choice tests were less demanding, requiring only superficial grasp of content. This finding corroborates the observation of Freedle and Kostin (1999: 3), who note that often examinees do not need to comprehend the accompanying text to answer the test item. Therefore, for all TOEFL test-takers, incorporating more challenging tasks such as the Reading to Learn and Reading to Integrate measures into typical English language instruction could have positive washback effects.
The second area of consideration is the focus on additional relevant research. For predictive validity, it is important to determine the correlation between the Reading to Learn/Reading to Integrate Composite measures and actual academic performance in university classes. Correlations might differ depending on the degree of categorization and synthesis required by different major fields of study. Another area for future research is related to the novelty of the Reading to Learn task as an assessment technique. Current pedagogical trends in literacy instruction emphasize the use of graphic organizers as a means to understand and manipulate information in texts. To our knowledge, graphic organizers are typically used as classroom activities rather than assessment tools. They have good potential for use in assessment if students are familiar with them and if appropriate scoring systems can be developed. This project demonstrates that it is possible, though labor intensive, to develop reliable scoring systems. Further research is needed to explore refinement of this task type and related scoring systems for use in testing programs with large numbers of test-takers and scorers. Although the Reading to Integrate task (generating a synthesis) was more familiar, the scoring system was innovative because it reflected a reader's ability to recognize textual frames as well as integrate information. If test developers are interested in the abilities of nonnative speaker readers to perform such tasks, further development of similar tasks and scoring systems is warranted. While a substantial investment of time would be required to refine the administration and scoring systems needed for more complex tasks such as these, it should be weighed against the possible danger of under-representing the likelihood of academic success based on results of the basic comprehension-only measures still most often used to assess academic proficiency.
Acknowledgements
This project was funded by a grant from Educational Testing Service as part of the TOEFL 2000 effort. We appreciate the cooperation of the ETS staff members who assisted in the development of passage-specific tasks and other aspects of the research; however, no official endorsement of Educational Testing Service should be inferred.
VII References
Bachman, L. 2000: Modern language testing at the turn of the century: assuring that what we count counts. Language Testing 17, 1–42.

Biber, D. 1993: Using register-diversified corpora for general language studies. Computational Linguistics 19, 219–41.

Britt, M., Rouet, J. and Perfetti, C. 1996: Using hypertext to study and reason about historical evidence. In Rouet, J., Levonen, J., Dillon, A. and Spiro, R., editors, Hypertext and cognition. Mahwah, NJ: Lawrence Erlbaum, 43–72.

Chapelle, C. 1999: Validity in language assessment. Annual Review of Applied Linguistics 19, 1–19.

Educational Testing Service 1997: TOEFL test and score manual. Princeton, NJ: Educational Testing Service.

Educational Testing Service 1998: Draft TOEFL 2000 research agenda framework: areas of research. Research agenda three. TOEFL 2000 internal document. Princeton, NJ: Educational Testing Service.

Eignor, D., Taylor, C., Kirsch, I. and Jamieson, J. 1998: Development of a scale for assessing the level of computer familiarity of TOEFL examinees. TOEFL Research Report No. 60. Princeton, NJ: Educational Testing Service.

Enright, M. and Schedl, M. 1999: Reading for a reason: using reader purpose to guide test design. TOEFL 2000 Internal Report. Princeton, NJ: Educational Testing Service.

Enright, M., Grabe, W., Mosenthal, P., Mulcahy-Ernt, P. and Schedl, M. 1998: A TOEFL 2000 framework for testing reading comprehension: a working paper. Princeton, NJ: Educational Testing Service.

Foltz, P. 1996: Comprehension, coherence, and strategies in hypertext. In Rouet, J., Levonen, J., Dillon, A. and Spiro, R., editors, Hypertext and cognition. Mahwah, NJ: Lawrence Erlbaum, 109–36.
Freedle, R. and Kostin, I. 1999: Does the text matter in a multiple-choice test of comprehension? The case for the construct validity of TOEFL's minitalks. Language Testing 16, 2–32.

Goldman, S. 1997: Learning from text: reflections on the past and suggestions for the future. Discourse Processes 23, 357–98.

Hayes, J. and Hatch, J. 1999: Issues in measuring reliability: correlation versus percentage of agreement. Written Communication 16, 354–67.

Huberty, C. 1994: Applied discriminant analysis. New York: John Wiley.

Jamieson, J., Campbell, J., Norfleet, L. and Berbisada, N. 1993: Reliability of a computerized scoring routine for an open-ended task. System 21, 305–22.

Jamieson, J., Norfleet, L. and Berbisada, N. 1993: Successes, failures, and dropouts in computer-assisted language lessons. Computer Assisted English Language Learning Journal 4, 12–20.

Klecka, W. 1980: Discriminant analysis. In Lewis-Beck, M., editor, Quantitative applications in the social sciences, Volume 19. Newbury Park, CA: Sage.

Lehto, M., Zhu, W. and Carpenter, B. 1995: The relative effectiveness of hypertext and text. International Journal of Human-Computer Interaction 7, 293–313.

McNamara, D. and Kintsch, W. 1996: Learning from texts: effects of prior knowledge and text coherence. Discourse Processes 22, 247–88.

Messick, S. 1989: Validity. In Linn, R.L., editor, Educational measurement, 3rd edition. New York: American Council on Education/Macmillan, 13–103.

Meyer, B. 1985a: Prose analysis: purposes, procedures, and problems. In Britton, B. and Black, J., editors, Understanding expository text. Hillsdale, NJ: Lawrence Erlbaum, 11–64.

Meyer, B. 1985b: Prose analysis: purposes, procedures, and problems: Part 2. In Britton, B. and Black, J., editors, Understanding expository text. Hillsdale, NJ: Lawrence Erlbaum, 269–97.

Monks, V. 1997, August/September: Two views, same waterway. National Wildlife 35, 36–37.

Pellegrino, J., Baxter, G. and Glaser, R. 1999: Addressing the 'two disciplines' problem: linking theories of cognition and learning with assessment and instructional practice. Review of Research in Education 24, 307–53.

Perfetti, C. 1997: Sentences, individual differences, and multiple texts: three issues in text comprehension. Discourse Processes 23, 337–55.

Perfetti, C., Britt, M.A. and Georgi, M. 1995: Text-based learning and reasoning: studies in history. Hillsdale, NJ: Lawrence Erlbaum.

Perfetti, C., Marron, M. and Foltz, P. 1996: Sources of comprehension failure: theoretical perspectives and case studies. In Cornoldi, C. and Oakhill, J., editors, Reading comprehension difficulties: processes and intervention. Mahwah, NJ: Lawrence Erlbaum, 137–65.
Reinking, D. 1988: Computer-mediated text and comprehension differences: the role of reading time, reader preference, and estimation of learning. Reading Research Quarterly 23, 484–500.

Reinking, D. and Schreiner, R. 1985: The effects of computer-mediated text on measures of reading comprehension and reading behavior. Reading Research Quarterly 20, 536–53.

Spivey, N. 1997: The constructivist metaphor: reading, writing, and the making of meaning. San Diego, CA: Academic Press.

Stevens, J. 1996: Applied multivariate statistics for the social sciences, 3rd edition. Mahwah, NJ: Lawrence Erlbaum.

Tabachnick, B. and Fidell, L. 1996: Using multivariate statistics, 3rd edition. New York: Harper Collins.

Taylor, C., Jamieson, J., Eignor, D. and Kirsch, I. 1998: The relationship between computer familiarity and performance on computer-based TOEFL test tasks. TOEFL Research Report No. 61. Princeton, NJ: Educational Testing Service.

Tennesen, M. 1997, November/December: On a clear day. National Parks 71, 26–9.

Trites, L. 2000: Beyond basic comprehension: reading to learn and reading to integrate for native and non-native speakers. Unpublished doctoral dissertation, Northern Arizona University, Flagstaff, AZ.

Van den Berg, S. and Watt, J. 1991: Effects of educational setting on student responses to structured hypertext. Journal of Computer-Based Instruction 18, 118–24.

Van Dijk, T.A. and Kintsch, W. 1983: Strategies of discourse comprehension. New York: Academic Press.

Wiley, J. and Voss, J. 1999: Constructing arguments from multiple sources: tasks that promote understanding and not just memory for text. Journal of Educational Psychology 91, 310–11.

Zimmerman, T. 1997, December 29: Filter it with billions and billions of oysters: how to revive the Chesapeake Bay. U.S. News and World Report 123, 63.
Appendix 1a Chart completion task
Directions: Complete the following chart. Fill in as much detail as possible from the text read by categorizing the information into the different areas on the chart. Do not use single words for your responses; form your responses in phrases or complete sentences. Include examples from the text.
Make no judgments about the accuracy of causes or effects, or the effectiveness of the solutions mentioned in the text. Solutions are seen as any action taken in response to the problem(s). Solutions can take many forms, such as proposed solutions, attempted solutions, or failed solutions. Also, space is provided under each category for examples. Examples are specific examples found in the text that are used by the author(s) to exemplify the problems, causes, effects, or solutions in the text. However, there may not be examples for every category.
Points will be awarded for correct responses only. There is no penalty for incorrect responses. Points will be awarded in the following manner:

Problems and Solutions: 10 points each
Causes and Effects: 5 points each
Examples: 1 point each

Problems        Causes          Effects         Solutions

Examples        Examples        Examples        Examples
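The point scheme in the directions can be illustrated with a short tally. This sketch covers only the three point values stated to test-takers (10, 5, and 1) and the absence of a penalty for incorrect responses; the names and the sample counts are hypothetical.

```python
# Point values as stated in the directions to test-takers.
POINTS = {"problems": 10, "solutions": 10, "causes": 5, "effects": 5, "examples": 1}


def chart_score(correct_counts):
    """Sum points over categories; only correct responses are counted, with no penalty."""
    return sum(POINTS[category] * count for category, count in correct_counts.items())


# Hypothetical chart: 2 problems, 3 causes, 1 effect, 2 solutions, 4 examples correct.
score = chart_score(
    {"problems": 2, "causes": 3, "effects": 1, "solutions": 2, "examples": 4}
)
```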
Appendix 1b Reading to Learn scoring rubric: 0–241 total possible

Problems (10 points); 1 word (6 points); 40 maximum
• Air pollution/Smog in the National Parks
• Opposition or ignoring (or lack of cooperation) to environmental standards (regulations)
• No true 'point sources' (myriad of smaller pollution sources rather than one large source)
• National Parks limited jurisdiction of pollution sources outside of the parks

Causes (5 points); 1 word (3 points); 55 maximum
• Sulfur/nitrogen emissions
• Acid rain
• Ground-level ozone
• Urban/Industrial emissions
• Automobile emissions
• Power plants/factories/plants emissions
• Emissions from smokestacks (chimneys)
• Emissions from kilns
• Unregulated pollution/smog (from other countries, Mexico)
• Smoke from controlled burns
• Emissions from large cities

Effects (5 points); 1 word (3 points); 30 maximum
• Trees and plants affected (injured), growth hindered
• Visibility decreased
• Metals loosened into waters (surface/ground water, water polluted)
• Nutrients removed (leached) from soil and/or plants
• Public outcry over stance of big business/government
• Aquatic life injured (damaged)

Solutions (10 points); 1 word (6 points); 90 maximum
• Stricter environmental laws (resolutions, acts)
• 1977 Clean Air Act Amendments labeling NPs as Class I areas
• Regional haze regulations proposed by the EPA
• Reduction of allowable pollution standards from industry
• Clean Air Act set visibility goals
• Objecting to construction permits
• Identification of pollution sources
• Industry modifications/Installation of devices (scrubbers) to reduce pollution
• Public pressure to protect parks

Examples for Problems (1 point); 5 maximum
• Great Smoky Mountain National Park
• Grand Canyon National Park
• Smoky Mountain sources from Ohio, New York, Atlanta, etc.
• Grand Canyon sources from CA, NV, UT, AZ, NM, and Mexico
• TN Luttrell corp. building permit situation

Examples for Causes (1 point, or 5 points with overarching cause listed above)
• Automobile emissions
• Power plants/factories/plants emissions
• Emissions from smokestacks (chimneys)
• Emissions from kilns
• Unregulated pollution/smog (from other countries, Mexico)
• Smoke from controlled burns
• Emissions from large cities

Examples for Causes (1 point); 10 maximum
• Navajo Generating Station, Page, AZ (Power Plant, AZ)
• Power plants in TN and OH river valley
• Southern California Edison Plant, Laughlin, NV
• Tennessee Luttrell Kilns, TN
• Industry in AZ
• Cars in CA
• Smokestacks in NM, NV, UT
• Los Angeles
• Atlanta
• New York

Examples for Effects (1 point); 6 maximum
• Leaves of plants turning purple and brown (stippling)
• Hindering photosynthesis of plants and trees
• Reduction of visibility at Grand Canyon
• Visibility reduced 90% of the days
• See vague blue masses
• See half as far as in 1919

Examples for Solutions (1 point, or 10 points with overarching solution listed above)
• 1977 Clean Air Act Amendments labeling NPs as Class I areas
• Regional haze regulations proposed by the EPA
• Reduction of allowable pollution standards from industry

Examples for Solutions (1 point); 5 maximum
• Objecting to kilns in TN
• Researchers using scientific technology to ID sources (radioactive isotopes)
• Southern California Edison Plant in Laughlin, NV identified as point source
• Scrubbers installed at Navajo Generating Plant
• 10% reduction of pollution in 10–15 years (goal of no man-made pollution)
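The category maxima in the rubric can be checked against the stated total of 241 with a short sketch. The interpretation built into it is an assumption rather than something the rubric spells out: the '1 word' values (6 or 3 points) are treated as reduced credit for single-word responses, and each category score is capped at its maximum. All names and the example counts are hypothetical.

```python
# category: (points per full response, points per single-word response, category maximum)
RUBRIC = {
    "problems": (10, 6, 40),
    "causes": (5, 3, 55),
    "effects": (5, 3, 30),
    "solutions": (10, 6, 90),
}

# Maximum number of 1-point examples credited under each category.
EXAMPLE_MAXIMA = {"problems": 5, "causes": 10, "effects": 6, "solutions": 5}


def category_score(category, full_responses, single_word_responses=0):
    """Score one category, assuming reduced credit for single words and a hard cap."""
    full, single, cap = RUBRIC[category]
    return min(cap, full * full_responses + single * single_word_responses)


def maximum_total():
    """Highest possible score: all category maxima plus all example maxima."""
    category_caps = sum(cap for _, _, cap in RUBRIC.values())
    return category_caps + sum(EXAMPLE_MAXIMA.values())


# Hypothetical response: three full problems and one single-word problem.
problems_score = category_score("problems", 3, 1)
```

The four category maxima sum to 215 and the example maxima to 26, which together give the 241 points named in the rubric heading.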
Appendix 2a Reading to Integrate task integration activity
Directions For the next 15 minutes reflect on what you read and compose a short essay that combines the information in thematerial read and makes connections across the range of ideas andarguments presented
[Ruled lines for the participant's written response]
Appendix 2b Reading to Integrate scoring rubric

Integration

50 Excellent: Integrates texts accurately and successfully on multiple levels and creates a true Documents Model. Generalizes at least two macrostructure concepts common across texts (this may be simply identifying the existence of a macrostructure, followed by the supporting macrostructures from each text). Effectively integrates relevant support (i.e., details or macrostructures) to support generalizations and may still discuss each article separately to some degree. Integrates macrostructures present in both texts.
40 Good: Integrates through the creation of a well developed and accurate introduction and/or a conclusion, yet summarizes articles separately, AND/OR generalizes one major macrostructure that is fully developed with support. Uses some relevant support.
30 Fair: Attempts to integrate through the creation of a partially developed, possibly inaccurate, introductory AND/OR concluding statement, usually related to the main topic of the articles, but has no substantive development. Creates a Text and Situation Model of each text separately. Does not make generalizations for integration beyond the one statement. May make evaluative or editorial statements; however, these may be inaccurate.
20 Poor: Creates a well developed and accurate Text Model and Situation Model for each of the texts, yet attempts no integrative connection across the texts.
10 Very Poor: Ineffectively or inaccurately attempts to summarize each article in a very reporting style, revealing little or no use of background knowledge, contextualization, or evaluation. Response is simply a recall of information from the texts. No introduction or conclusion, OR participant confuses texts and sees separate texts as one issue (problem).
0 No Response: Participant does not attempt response or only addresses one article.
Macrostructures

Participants will receive one point for each macrostructure accurately identified in each text, with a minimum score of 4 awarded for the inability to accurately identify any macrostructures.

25 Excellent (Total 8): Accurately identifies all 4 macrostructures present in both texts.
20 Good (Total 6–7): Accurately identifies all 4 macrostructures in one text and 3 in the other, OR accurately identifies 3 of the four macrostructures in both texts (or 4 in one and 2 in the other).
15 Fair (Total 4–5): Accurately identifies 3 of the four macrostructures in one text and 2 of the four in the other (or 4 in one and 1 in the other), OR accurately identifies 2 of the four macrostructures in both texts (or 4 in one and 0 in the other, or 3 in one and 1 in the other).
10 Poor (Total 2–3): Accurately identifies 2 of the macrostructures in one text and 1 in the other (or 3 in one and 0 in the other). Accurately identifies 1 of the four macrostructures in both texts (or 2 in one and 0 in the other).
5 Very Poor (Total 0–1): Accurately identifies 1 of the four macrostructures in one text and 0 in the other, OR unable to accurately identify any macrostructures in the texts.
0 No Response: Participant does not attempt response.
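The macrostructure bands above amount to a lookup from the combined count of correctly identified macrostructures (0 to 4 per text, 0 to 8 overall) to a point value. The following is only an illustrative sketch of that lookup, not the authors' scoring procedure; the function name and the `attempted` flag are hypothetical, and the sketch follows the band table (5 points for the Very Poor band) rather than the introductory sentence.

```python
def macrostructure_points(found_text_a: int, found_text_b: int,
                          attempted: bool = True) -> int:
    """Map macrostructures identified in two texts (0-4 each) to rubric points."""
    if not attempted:
        return 0  # "No Response" band
    for count in (found_text_a, found_text_b):
        if not 0 <= count <= 4:
            raise ValueError("each text contains at most 4 macrostructures")
    total = found_text_a + found_text_b
    if total == 8:
        return 25  # Excellent (Total 8)
    if total >= 6:
        return 20  # Good (Total 6-7)
    if total >= 4:
        return 15  # Fair (Total 4-5)
    if total >= 2:
        return 10  # Poor (Total 2-3)
    return 5       # Very Poor (Total 0-1)


if __name__ == "__main__":
    # Hypothetical response: 3 macrostructures found in one text, 1 in the other.
    print(macrostructure_points(3, 1))
```

Because the bands depend only on the combined total, the per-text combinations listed in the rubric (for example, 4 and 0 versus 2 and 2) fall into the same band automatically.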
Use of relevant details

5 Excellent: Effectively uses multiple relevant details as support; no irrelevant or erroneous details.
4 Good: Effectively uses relevant details as support; may include some inaccurate or irrelevant details. (More than 50% of details are relevant.)
3 Fair: Possibly uses one or two relevant details, yet several irrelevant details or inaccurate details (erroneous or fabricated) appear in the synthesis. (50% or less of details are relevant.)
2 Poor: Erroneous or fabricated details used in attempt to support arguments presented; does not include relevant details.
1 Very Poor: No details used or listed.
0 No Response: Participant does not attempt response.
Because these constructs go beyond basic comprehension, Reading to Learn and Reading to Integrate are hypothesized to be more difficult reading tasks than Reading to Find Information and Reading for Basic Comprehension. Perfetti (1997) further suggests that Reading to Integrate is a more difficult task than Reading to Learn because it not only requires an integration of a Text Model and a Situation Model but requires an integration of multiple Text Models and multiple Situation Models. Thus, current reading theory suggests a difficulty hierarchy of reading tasks based on the level of integration necessary to complete the tasks successfully. Several studies (Perfetti et al., 1995; 1996; Britt et al., 1996; Wiley and Voss, 1999) have attempted to move beyond basic comprehension and examine readers' ability to integrate the information from multiple texts into one cohesive knowledge base by having students make connections, compare, or contrast information across texts.
Additionally, recent research has addressed the effects of computers on reading and assessment; such research is relevant to the current project because the new TOEFL is administered via computers. Reading-medium studies have shown that the only effect that computers have on reading is related to task (Reinking and Schreiner, 1985; Reinking, 1988; van den Berg and Watt, 1991; Lehto et al., 1995; Perfetti et al., 1995; 1996; Britt et al., 1996; Foltz, 1996; Wiley and Voss, 1999). Taylor et al. (1998) found that, after minimal computer training, familiarity with technology did not have a significant effect on examinees' performance on TOEFL-like questions. Because of the relevance of computer familiarity to TOEFL administration, a brief measure of computer familiarity was included in the research.
For this project, we asked three research questions:
1) Is performance on a measure of Reading to Learn affected by medium of presentation (paper versus computer), technology familiarity, native language (native versus nonnative speakers of English), or level of education (graduate versus undergraduate)?
2) Is performance on a measure of Reading to Integrate affected by medium of presentation (paper versus computer), technology familiarity, native language (native versus nonnative), or level of education (graduate versus undergraduate)?
3) To what extent are measures of finding information/basic reading comprehension, Reading to Learn, and Reading to Integrate related?
III Methods
1 Participants
Two hundred and fifty-one participants, the majority undergraduates, volunteered to take part in this study. The sample consisted of 105 undergraduate native speakers of English (NSUs), 106 undergraduate nonnative speakers (NNSUs), and 40 graduate nonnative speakers (NNSGs) of English at a midsized southwestern university. All data were collected between February and October 1999. All undergraduate participants were recruited through large undergraduate classes in the areas enrolling most NNSs (business administration, hotel management, engineering, social sciences, and humanities). We tested all NNSs accessible at the institution at the time of data collection; compared to a national sample of international students from the prior academic year, we had a relatively larger proportion of undergraduate relative to graduate students. Nearly all undergraduate participants were young adults, with an average age of 21. Nonnative speakers were also recruited from students enrolled in the summer intensive English program, which is made up of students needing to increase TOEFL scores to at least 500 in order to enroll at a university. We included 46 participants (32% of NNS sample) with TOEFL scores below 500 in the nonnative sample. Graduate nonnative speakers (n = 40) were recruited from the entire university population and had an average age of 30.75. Nonnative speakers represented a range of language backgrounds. One third were Japanese, with other Asian, Germanic, and Romance languages also substantially represented. Both the relatively modest sample size and the all-volunteer nature of the participant sample preclude direct generalization to the worldwide TOEFL population, but participants were representative of the levels of international students at the institution where they were enrolled. Participants who completed all four data collection sessions received a payment of US$10 per hour (US$40 for the entire project).
2 Instruments
This project used three existing instruments, two to determine initial reading levels and one to assess levels of computer familiarity, and two new instruments, one for Reading to Learn and one for Reading to Integrate; these were developed especially for the project. Each of the new measures also served as the basis for an additional measure of basic reading comprehension related directly to the text included in the new task. Thus, each participant completed a total of seven different instruments.
a Existing instruments: Initial levels of reading comprehension were determined based on the Nelson–Denny Reading Test (Nelson–Denny), Form G, used to identify the reading levels of the NSs, and three retired versions of the Institutional TOEFL Reading Comprehension Section (TOEFL Reading Comprehension), used to identify the reading levels of the NNSs. Although each of these tests was used to assess reading levels in the population for which it had been developed, all 251 participants took both tests in order to provide comparative data. All 251 participants also completed a brief computer familiarity questionnaire.
Participants' computer familiarity was determined through an 11-item questionnaire based on a longer 23-item questionnaire previously developed by ETS (Eignor et al., 1998). In the present study, we used only the 11 items that loaded the most heavily on the major factors resulting from administration to a large sample of TOEFL participants. For these 11 items, developers determined the reliability to be .93 using a split-half method (Eignor et al., 1998: 22). This brief questionnaire took approximately 5 minutes to complete; reliability in our sample, using coefficient alpha, was .87.
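For readers unfamiliar with coefficient alpha, the sketch below shows the standard computation (Cronbach's alpha) from a participants-by-items score matrix. It is a minimal illustration only: the function name and the response data are hypothetical, and nothing here reproduces the questionnaire data or the reported value of .87.

```python
from statistics import variance


def coefficient_alpha(responses):
    """Cronbach's alpha for rows of per-participant item scores.

    responses: list of equal-length lists, one list per participant.
    """
    k = len(responses[0])                       # number of items
    items = list(zip(*responses))               # transpose to item columns
    sum_item_var = sum(variance(col) for col in items)
    total_var = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum_item_var / total_var)


if __name__ == "__main__":
    # Hypothetical data: 4 participants answering 3 items on a 1-5 scale.
    demo = [[4, 5, 4], [2, 3, 3], [5, 5, 4], [1, 2, 2]]
    print(round(coefficient_alpha(demo), 3))
```

Alpha rises as item scores covary more strongly relative to their individual variances, which is why it is commonly read as an index of internal consistency.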
b Texts used for new measures: In developing the new tasks, we selected texts that would conform to the design specifications of TOEFL 2000. They were problem/solution texts, recommended as one of the potentially relevant text types for TOEFL 2000 (Enright et al., 1998). Longer texts were used because these represented more challenging and authentic academic tasks (Enright et al., 1998). We used one 1200-word and two 600-word texts. The longer text (Tennesen, 1997) was used to assess Reading to Learn, and the two 600-word texts (Monks, 1997; Zimmerman, 1997) were used to assess Reading to Integrate. We chose these text lengths based on work by Meyer (1985a) and further research by the first author indicating that natural science texts between 1200 and 1500 words included representation of all necessary macro-rhetorical structures of problem/solution texts, with or without explicit signaling. While 1200–1500 word texts provide optimal representation of the macro-rhetorical structures, texts of 600 words provide all the basic macro-rhetorical structures present in problem/solution texts. Thus, these lengths were long enough for adequate argumentation but not so long that they were excessively redundant (Enright et al., 1998). Texts were also matched for readability according to standard readability scales, such as the Flesch–Kincaid, Coleman–Liau, and Bormuth scales, and averaged a minimum of grade level 11.0 to 12.0 on these scales. Also, all texts pertained to natural and social sciences; each text covered environmental issues such as air and water pollution (Enright et al., 1998). Thus, text topics were similar across tasks.
c New instruments used in the study: Three new reading measures were used in this study to assess Reading to Learn, Reading to Integrate, and Basic Comprehension. Trites (2000: Chapters 2 and 3) presents a more extensive review of literature and rationale for development of the new measures.
• Reading to Learn: The first new measure, completion of a chart, was used to determine participants' ability to read to learn. Spivey (1997: 69) suggests that readers' categorization of information in text offers insight into their cognitive processes and their making of meaning. We designed a measure to be used with a 1200-word text that students read on either paper or computer. Students were asked to recall, identify, and categorize information from the text on a chart reflecting macro-rhetorical structures, called macrostructures in this study (problems and solutions), and other types of information from problem/solution texts (causes, effects, and examples), categories based on the work of Meyer (1985a). The scoring rubric, based on work by Meyer (1985b) and later modified by Jamieson et al. (1993), awarded points only for the upper levels of textual structure represented on the chart (for task and scoring rubric, see Appendix 1). We weighted the information supplied on the chart as follows: 10 points for correct information in the problem and solution categories, five points for correct information supplied in the cause and effect categories, and one point for accurate examples. This weighting reflects Meyer's (1985b) hierarchical levels, which characterize problem and solution propositions as higher order structures, while the other categories represent lower order propositions.¹ The theoretical maximum score for this scale was 241, which would result from maximum points given in all categories. The first author and two research assistants spent 35–40 hours creating, revising, and norming the scoring rubric and developing the scoring guide (Trites, 2000: Chapter 3). To determine interrater reliability, we used coefficient alpha rather than percentage of agreement because percentage of agreement inflates the likelihood of chance agreement (Hayes and Hatch, 1999). After norming, overall interrater reliability was .99 (coefficient alpha), with similarly high alpha coefficients for all subcategories.²

¹ Students received no points for information improperly placed or for information not found in the text.
• Reading to Integrate: The second new measure assessed Reading to Integrate. The task used to assess Reading to Integrate required participants to read two 600-word texts and compose a written synthesis. The prompt asked students to make connections across the range of ideas presented; thus, we asked readers to synthesize information rather than summarize or make comparisons (Wiley and Voss, 1999). This synthesis was scored based on an analytic scale ranging from 0 to 80, reflecting readers' ability to recognize and manipulate the structure of the texts, include specific information, and express connections across texts through the use of cohesive devices (for task and scoring rubric, see Appendix 2). The test was designed to measure the integration of content from both readings and did not assess other aspects of writing such as the creation of rhetorical style, grammaticality, or mechanics. The rubric was composed of three subcategories: integration ability, macrostructure recognition, and use of relevant details. The integration subscore was awarded the highest point values because this was the predominant skill being tested. It scored participants on their ability to make connections across texts based on the manipulation of the textual frames in both texts. The second subcategory awarded points for the ability to recognize and articulate the macrostructures (problem, cause, effect, or solution) present in each text. This subcategory was similar to the categorizing task used in the Reading to Learn measure, with the additional constraint that participants had to express the connections overtly. The third subcategory in the scoring rubric analysed the ability to use relevant details as support in the written synthesis. The first author and two research assistants spent 30 hours revising and norming the scoring rubric and developing a decision guide, resulting in an overall interrater reliability of .99 (coefficient alpha), with similarly high alphas for all subcategories.

² We recognize that tasks requiring high inference measures plus extensive norming and revision of the scoring rubric pose feasibility issues in large-scale testing. Further research is needed to determine whether and how such scoring procedures could be adapted in standardized testing for numerous test-takers.
• Basic Comprehension: The third construct was measured by multiple-choice tests related specifically to the texts used in the new tasks. These tests were created by TOEFL Test Development staff and followed current TOEFL reading section specifications. We used two multiple-choice tests, Basic Comprehension Test 1 (BC1) and Basic Comprehension Test 2 (BC2), 20 items each: one for the longer passage used to assess Reading to Learn and one for the two passages used to assess Reading to Integrate. Both were scored based on number of items answered correctly. Reliability on BC1, calculated based on 251 participants, was .84 (coefficient alpha). Inadvertently, the order of the texts used in BC2 was different for the two different media; however, reliability on both versions of the test was high. For those who took BC2 based on paper texts (n = 127), reliability was .84 (coefficient alpha); for those who took BC2 based on computerized texts (n = 124), reliability was .86 (coefficient alpha).
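The category weighting described for the Reading to Learn chart (10 points for problems and solutions, 5 for causes and effects, 1 for examples) can be sketched as a weighted sum. This is an illustration under stated assumptions rather than the authors' scoring guide: the names `WEIGHTS` and `chart_score` are hypothetical, the input is assumed to count only correct, correctly placed entries, and any per-category maximums on the actual chart are not modelled.

```python
# Point weights per chart category, as described for the Reading to Learn measure.
WEIGHTS = {"problem": 10, "solution": 10, "cause": 5, "effect": 5, "example": 1}


def chart_score(correct_counts):
    """Weighted chart score.

    correct_counts maps a category name to the number of correct,
    correctly placed entries; misplaced or invented entries earn nothing
    and should simply not be counted.
    """
    unknown = set(correct_counts) - set(WEIGHTS)
    if unknown:
        raise ValueError(f"unknown categories: {sorted(unknown)}")
    return sum(WEIGHTS[cat] * n for cat, n in correct_counts.items())


if __name__ == "__main__":
    # Hypothetical participant: 2 problems, 1 solution, 3 causes, 4 examples.
    print(chart_score({"problem": 2, "solution": 1, "cause": 3, "example": 4}))
```

The steep weights mean that a single correctly identified problem or solution outweighs several correct examples, mirroring the emphasis on higher order structures.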
3 Design for data collection
This study used a 2 × 2 repeated measures design to examine performance on the new reading tasks. Native speaker undergraduates and nonnative speaker undergraduates were divided into two groups each, of equal ability as determined by performance on the baseline standardized measures of reading comprehension (Nelson–Denny or TOEFL). Half of each group read texts on paper; the other half read the same texts on a computer screen. A smaller group of nonnative speaker graduates, equally divided, were also included for a comparison between performance by graduate and undergraduate nonnative speakers. Additionally, the administration of the new measures was counterbalanced to control for any practice effect.
a Procedures: All participants met with the researchers in four sessions, each lasting about an hour. The first two sessions were devoted to administering the existing instruments. During Session 1, participants received an introduction to the study and took one of the two standardized basic reading comprehension measures (Nelson–Denny or TOEFL Reading Comprehension). Students completed the computer familiarity questionnaire and the Nelson–Denny Test at the same testing session because the Nelson–Denny was shorter than the TOEFL Reading Comprehension. During Session 2, participants took the other standardized basic reading comprehension measure.
Next, each participant group was subdivided into two subgroups for computer-based or paper reading of the texts for the new tasks. The subgroups were matched on their performance on initial reading measures; the Nelson–Denny was used for native speakers and the TOEFL Reading Comprehension was used for nonnative speakers. Independent t-tests run on these reading measures showed no significant difference in basic comprehension for the newly created subgroups assigned to each medium, ensuring that they were balanced for initial reading levels. Participants stayed in the same subgroups for the duration of the study. To ensure uniformity of response mode, all participants, whether they read the source texts on the computer or on paper, responded to the reading tasks using paper and pencil format.³
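The balance check described above can be illustrated with a pooled-variance independent-samples t statistic. The sketch below is an assumption-laden illustration: the source does not say which t-test variant or software was used, the function name is hypothetical, the scores are invented, and the resulting statistic would still need to be compared against a t distribution with the returned degrees of freedom.

```python
from math import sqrt
from statistics import mean, variance


def independent_t(a, b):
    """Pooled-variance t statistic and degrees of freedom for two samples."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))
    return t, df


if __name__ == "__main__":
    # Hypothetical baseline reading scores for two medium subgroups.
    paper = [52, 48, 55, 50, 47]
    computer = [51, 49, 54, 50, 48]
    t, df = independent_t(paper, computer)
    print(f"t = {t:.3f}, df = {df}")
```

A t value near zero for the baseline measure is what one would expect when subgroups have been matched on initial reading level.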
The last two sessions, each lasting approximately one hour, were dedicated to administration of the new measures. The Reading to Learn session took slightly longer to administer because administrative procedures were longer for this novel task. The new tasks were counterbalanced to control for practice effect; thus, half of the participants took the Reading to Learn measure first and half took the Reading to Integrate measure first. During Session 3, we administered the first new measure (for ease of discussion, Reading to Learn is discussed first) and BC1. At this session, students were given 12 minutes to read a 1200-word passage, either on computer or on paper. We limited the time allowed for reading based on 100 words per minute, thought to be ample (Grabe, personal communication, 1998). After examinees read the text, they were given 4 minutes to take notes on a half sheet of paper. Participants were instructed to take minimal notes due to the time constraints. Next, the text was removed and examinees were allowed 15 minutes to complete a chart based on the reading with the aid of their notes. After completing this Reading to Learn activity, participants were allowed to use the text and were given 15 minutes to answer BC1. Following these new testing sessions, 49 participants were selected for a related interview concerning the cognitive processes used in task completion (for further details, see Trites, 2000: Chapter 6).

³ Although responses could have been entered and perhaps scored by computer, this would have introduced factors not directly related to our research questions and remains an area for further study.
During Session 4, students were given 12 minutes to read two short texts (600 words each), either on computer or paper. After participants read the assigned texts, they were given 4 minutes to take one-half page of notes (Enright et al., 1998). Next, the texts were removed and participants were asked to demonstrate Reading to Integrate by writing a synthesis of the texts with the aid of their notes (15 minutes allowed for this task). After completing the Reading to Integrate task, participants were allowed to see the texts again and answered BC2 (15 minutes allowed for this task). In one Reading to Integrate session, for unknown reasons, six of the seven participants read only one text. Because we cannot explain the cause of this anomalous session, we have eliminated scores from the session's seven participants from subsequent analyses, thus slightly reducing the N size for the Reading to Integrate measure.
b Variables used in study: The six independent variables included three nominal (Native Language Background, Medium of Text Presentation, and Level of Education) and three interval variables (Nelson–Denny, TOEFL Reading Comprehension, and Computer Familiarity). The four dependent variables were Reading to Learn, Reading to Integrate, BC1, and BC2.
IV Results
First, we present the descriptive statistics for all reading measures, followed by a systematic analysis of independent variables that might affect participant performance on the new measures. Scatterplots were checked for all reading measures to ensure normality of data. Kurtosis and skewness levels for all reading measures were found to be within normal limits, indicating a relatively normal distribution. Descriptive statistics for all existing measures are shown in Table 1. Means for these measures show a consistent pattern: the native speaker undergraduates had the highest mean, followed by the nonnative speaker graduates, followed by the nonnative speaker undergraduates. On the reading measures, Nelson–Denny and TOEFL Reading Comprehension, the nonnative speaker undergraduates showed the largest variance in performance, while on the computer familiarity measure the variance of both nonnative speaker groups was substantially larger than that of the native speakers.
The same pattern emerged for the means on the new measures (see Table 2) as for the existing measures. The native speaker undergraduate group performed better on all new measures than both of the nonnative speaker groups. The nonnative speaker graduate group performed better than the nonnative speaker undergraduate group on all measures as well. This robust pattern of performance was also found in the variance of three of the four new measures. On BC1 and BC2, the performance of the native speaker undergraduates showed the least amount of variance, followed by the nonnative speaker graduates, followed by the nonnative speaker undergraduates. On Reading to Integrate, the native speaker undergraduate group showed substantially less variance than the nonnative speaker groups; however, the variance of the two nonnative speaker groups was almost identical. On Reading to Learn, all three groups showed considerable variance.
Table 3 reveals the range of awarded points achieved by all participant groups. The nature of the Reading to Learn point system created a maximum possible point value (241) that no participant achieved. We speculate that there are at least three possible causes of the discrepancy between the theoretical maximum and the range of observed scores:

• task novelty: no participant reported ever doing such a task previously;
• time allowed for task completion; and
• space on the response sheet: space constraints may have limited the amount of information that participants could include.

Future research would need to address these issues. However, for the Reading to Integrate measure, the full range of possible point totals was achieved by at least one participant in each group.

Table 1 Descriptive statistics for existing measures for three participant groups

Group                 n      Mean      sd       kMax

Nelson–Denny
NSU                   105    126.48    16.46    156
NNSU                  106     67.24    31.91    156
NNSG                   40     88.88    21.88    156
Total participants    251     95.47    36.93    156

TOEFL Reading Comprehension
NSU                   105     61.30     4.24     67
NNSU                  106     50.30     8.53     67
NNSG                   40     57.15     4.55     67
Total participants    251     55.99     8.19     67

Computer familiarity
NSU                   104     38.08     3.60     44
NNSU                  104     34.82     5.99     44
NNSG                   40     35.63     6.02     44
Total participants    248     36.31     5.33     44

Note: kMax = number of items or maximum possible score.
1 Computer familiarity
The overall plan for the analyses was to check the influence of the independent variables on the dependent measures, with computer familiarity being addressed first. Initially, we had proposed that if computer familiarity was significantly different across groups, it would be entered into all calculations as a covariate. To determine this, it was necessary to conduct an Analysis of Variance (ANOVA) for computer familiarity across the six participant/medium subgroups.
Table 2 Descriptive statistics for new measures for three participant groups

Group                 n      Mean      sd       kMax

Reading to Learn (chart)
NSU                   105     51.85    19.86    241
NNSU                  106     31.73    19.50    241
NNSG                   40     44.68    19.27    241
Total participants    251     42.21    21.64    241

Basic Comprehension Test 1
NSU                   105     16.98     2.47     20
NNSU                  106     11.73     4.25     20
NNSG                   40     14.98     3.50     20
Total participants    251     14.44     4.23     20

Reading to Integrate (synthesis)
NSU                   101     63.65    11.05     80
NNSU                  103     37.24    21.76     80
NNSG                   40     53.60    21.03     80
Total participants    244     50.86    21.63     80

Basic Comprehension Test 2
NSU                   105     15.91     2.78     20
NNSU                  106      9.75     4.54     20
NNSG                   40     12.85     3.61     20
Total participants    251     12.82     4.72     20

Notes: kMax = number of items or maximum possible score; n size reduced for Reading to Integrate because of anomalous testing session.
The resulting ANOVA (F = 4.70, p < .05) showed a significant difference between subgroups on the computer familiarity questionnaire; therefore, a post hoc Scheffé test was done to locate significant contrasts. After analysis of all possible subgroup contrasts, the post hoc Scheffé revealed that the only significant difference in subgroups appeared between the native speaker undergraduates and nonnative speaker undergraduates who read texts on paper. Hence, although there was one significant contrast, it occurred in two subgroups reading on paper, not in any of the subgroups who read on computer. All groups generally scored high on computer familiarity, although, as noted, variance of the nonnative groups was greater. It was thus established that computer familiarity had no significant effect on participants who read texts on computer, so we did not use computer familiarity as a covariate in further analyses and proceeded to the three research questions of central interest to this study.
Because both Research Questions 1 and 2 are similar – except that they address the two different new reading measures, Reading to Learn and Reading to Integrate – we approached them in the same manner, through ANOVA, to identify the independent variables that could have significantly affected the results on the new measures.
2 Research Question 1
The first research question asked if performance on a measure of Reading to Learn was affected by medium of presentation, computer familiarity, native language, or level of education. We calculated a univariate ANOVA with Type III sums of squares on Reading to Learn, with group status, medium of text presentation, and test order as possible contributing factors.⁴ Table 4 shows that there were no significant interactions for any of the group, medium, or test order combinations. The only significant main effect was group membership.

Table 3 Range of scores for new measures for three participant groups

Group                 n      Minimum   Maximum   kMax

Reading to Learn (chart)
NSU                   105    14        120       241
NNSU                  106     0         86       241
NNSG                   40     3         94       241
Total participants    251     0        120       241

Reading to Integrate (synthesis)
NSU                   101    38         80        80
NNSU                  103     0         80        80
NNSG                   40     5         80        80
Total participants    244     0         80        80

Notes: kMax = number of items or maximum possible score; n size reduced for Reading to Integrate because of anomalous testing session.
Because group membership was a combined measure that included both native language background and level of education, post hoc analysis was needed to identify the significant contrasts. Table 5 shows that there was a significant difference in performance on the Reading to Learn measure between the native speaker undergraduate and the nonnative speaker undergraduate groups, as well as a significant difference between the nonnative speaker undergraduate and nonnative speaker graduate groups. There was no significant difference in performance between the native speaker undergraduate and the nonnative speaker graduate groups. Therefore, the answer to Research Question 1 is that native language background and level of education did have a significant effect on performance on the Reading to Learn measure, but that medium of text presentation did not. Further, order of testing, whether participants took Reading to Learn or Reading to Integrate first, had no significant effect.
3 Research Question 2
Table 4 Performance on Reading to Learn measure by groups, medium, and test order (n = 251) (univariate analysis of variance)

Source                          Type III sum of squares   df    Mean square   F
Group                           21869.29                  2     10934.64      28.10*
Medium                          1215.86                   1     1215.86       3.13
Test order                      294.98                    1     294.98        0.76
Group × medium                  334.81                    2     167.40        0.43
Group × test order              4.37                      2     2.19          0.01
Medium × test order             391.73                    1     391.73        1.01
Group × medium × test order     575.29                    2     287.65        0.74
Error                           92997.45                  239   389.11

Note: *p < .05

4 Test order was added as an additional variable to double check that our counterbalancing had been effective in controlling for any practice effect.

The second research question, related to the first, asked whether performance on Reading to Integrate was affected by medium of presentation, computer familiarity, native language, or level of education. Again, to ensure that counterbalancing of tests controlled for any practice effect, test order was added as an additional variable.
To answer this question, we proceeded to calculate a univariate ANOVA on the Reading to Integrate measure, with group status, medium of text presentation, and test order entered as possible contributing factors. The results (Table 6) show, as for Research Question 1, that there were no significant interactions for any of the group, medium, or test order combinations; the only significant main effect was group membership. The answer for Research Question 2 is that native language background and educational level had a significant effect on Reading to Integrate, but medium of text presentation did not. Post hoc analysis of group contrasts showed that all three groups were distinct in their performance on Reading to Integrate (see Table 7).
4 Research Question 3
Table 5 Post hoc Scheffé for Reading to Learn measure (n = 251)

Group   n     Group   n     Mean difference   Standard error
NSU     105   NNSU    106   20.12*            2.72
              NNSG    40    7.17              3.67
NNSU    106   NNSG    40    12.95*            3.66

Note: *p < .05

Table 6 Performance on Reading to Integrate measure by groups, medium, and test order (n = 244 [a]) (univariate analysis of variance)

Source                          Type III sum of squares   df    Mean square   F
Group                           36294.33                  2     18147.17      55.82 [b]
Medium                          192.95                    1     192.95        0.59
Test order                      97.83                     1     97.83         0.30
Group × medium                  11.82                     2     5.91          0.02
Group × test order              30.14                     2     15.07         0.05
Medium × test order             1098.72                   1     1098.72       3.38
Group × medium × test order     1488.58                   2     744.29        2.29
Error                           75430.37                  232   325.13

Notes: [a] n size reduced for Reading to Integrate because of anomalous testing session; [b] p < .05

The third research question asked to what extent measures of basic comprehension, Reading to Learn, and Reading to Integrate were related. We used correlational analysis as the first step in answering this question. Results for the total participant population (see Table 8) showed moderate to high correlations across all reading measures. However, the analyses done for Research Questions 1 and 2 revealed that group status had a significant effect on performance on Reading to Learn and Reading to Integrate. Further, we realize that correlations are sensitive to variance, so the high correlations seen in the total population could have been an artifact of combining the three groups. Therefore, we examined the correlations among all reading measures for each group (available in Trites, 2000: Appendix 1, pp. 230–33). While the reading measures were still correlated, often moderately, sometimes highly, magnitudes differed and sometimes dropped substantially. The text-specific multiple-choice measures, BC1 and BC2, consistently correlated more highly with the Nelson–Denny and TOEFL Reading Comprehension tests than with Reading to Learn and Reading to Integrate based on the same texts, suggesting a test method or construct effect. Because comparisons between different measures of basic comprehension were not a goal of the project, BC1 and BC2 were not used in further analyses. We conclude that, as expected, all reading measures were related, but the lower correlations between Reading to Learn and Reading to Integrate and the traditional basic comprehension measures led us to consider further types of analysis to identify the possible distinctiveness of the new measures.
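The concern about pooling groups can be illustrated with a small sketch. The scores below are invented purely for illustration (they are not data from this study), and `pearson_r` is our own helper. Within each hypothetical group the two measures are uncorrelated, yet combining two groups that differ in mean level produces a high overall correlation.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation computed from deviation scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical scores (invented): two groups that differ only in mean level.
low_x, low_y = [1, 2, 3, 4], [3, 1, 4, 2]
high_x, high_y = [11, 12, 13, 14], [13, 11, 14, 12]

r_low = pearson_r(low_x, low_y)        # within the lower-scoring group
r_high = pearson_r(high_x, high_y)     # within the higher-scoring group
r_pooled = pearson_r(low_x + high_x, low_y + high_y)  # groups combined

print(f"within groups: {r_low:.2f}, {r_high:.2f}; pooled: {r_pooled:.2f}")
```

This is the reason for inspecting correlations separately for each group before interpreting the values obtained for the total population.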
5 Discriminant analysis
Table 7 Post hoc Scheffé for Reading to Integrate measure (n = 244 [a])

Group   n     Group   n     Mean difference   Standard error
NSU     101   NNSU    103   26.41 [b]         2.53
              NNSG    40    10.05 [b]         3.37
NNSU    103   NNSG    40    16.36 [b]         3.36

Notes: [a] n size reduced for Reading to Integrate because of anomalous testing session; [b] p < .05

Table 8 Correlations for all reading measures for all participants (n = 251)

                               TOEFL Reading   Basic Comprehension   Basic Comprehension   Reading    Reading to
                               Comprehension   Test 1                Test 2                to Learn   Integrate [a]
Nelson–Denny                   .90 [b]         .85 [b]               .84 [b]               .66 [b]    .69 [b]
TOEFL Reading Comprehension    1.00            .85 [b]               .84 [b]               .64 [b]    .69 [b]
Basic Comprehension 1                          1.00                  .84 [b]               .68 [b]    .68 [b]
Basic Comprehension 2                                                1.00                  .68 [b]    .70 [b]
Reading to Learn                                                                           1.00       .59 [b]

Notes: [a] n size reduced for Reading to Integrate because of anomalous testing session; [b] p < .05

Because we were interested in determining how constructs differed, we sought additional analyses to help us better characterize the new constructs. Of the several possible statistical methods that could have been employed, two are most plausible: multivariate analysis of variance, usually associated with experimental research, and discriminant analysis, usually associated with descriptive research (Tabachnick and Fidell, 1996). The present research was conducted with samples of naturally occurring student groups and was not experimental. Moreover, we were interested in finding ways to compare participant performance on the measures of the new constructs, Reading to Learn and Reading to Integrate, with performance on more traditional measures of basic comprehension. Thus, we opted to use discriminant analysis because of its parsimony of description and clarity of interpretation (Stevens, 1996). Discriminant analysis, a technique recommended to describe group differences or predict group membership based on a comparison of multiple predictors (Huberty, 1994), has been used in other areas of applied linguistic research to investigate creation of a student profile of success or failure on Computer Assisted Language Learning (CALL) lessons (Jamieson et al., 1993) and accurate classification of text types into registers (Biber, 1993), among other purposes.
To further distinguish basic comprehension from the new constructs, we conducted discriminant analysis on each of the two language groups (native and nonnative) to determine whether Reading to Learn and Reading to Integrate would classify participants in the same way that Basic Comprehension would. We divided the native speaker and nonnative speaker groups into three levels: high, middle (mid), and low reading ability scorers, based on the basic comprehension measure chosen for that group (Nelson–Denny for native speakers, TOEFL Reading Comprehension for nonnative speakers). Research methodologists (Tabachnick and Fidell, 1996: 513) note that robustness is expected when the smallest group has at least 20; our smallest group was 25. To check the assumption of homogeneity of the variance–covariance matrices, we examined the outcomes of Box's M Test and found them all nonsignificant (Klecka, 1980). Each group was checked for outliers using Mahalanobis distance treated as Chi-Square, and no outliers were found (Tabachnick and Fidell, 1996). Thus, the data met all assumptions required for use of discriminant analysis.
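The outlier screen described above can be sketched as follows for the two-predictor case (Reading to Learn and Reading to Integrate). This is only an illustration under stated assumptions: the score pairs are invented, the alpha level of .001 is our assumption (the text does not state the criterion used), and the closed-form critical value holds only for 2 degrees of freedom.

```python
import math


def mahalanobis_squared(point, sample):
    """Squared Mahalanobis distance of a 2-D point from the sample centroid,
    using the sample (n - 1) variance-covariance matrix."""
    n = len(sample)
    mean_x = sum(p[0] for p in sample) / n
    mean_y = sum(p[1] for p in sample) / n
    var_x = sum((p[0] - mean_x) ** 2 for p in sample) / (n - 1)
    var_y = sum((p[1] - mean_y) ** 2 for p in sample) / (n - 1)
    cov_xy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in sample) / (n - 1)
    det = var_x * var_y - cov_xy ** 2  # determinant of the 2 x 2 matrix
    dx, dy = point[0] - mean_x, point[1] - mean_y
    # Quadratic form with the inverse of the 2 x 2 covariance matrix.
    return (var_y * dx * dx - 2 * cov_xy * dx * dy + var_x * dy * dy) / det


def chi_square_critical_2df(alpha):
    """Upper-tail critical value of chi-square with 2 df (closed form)."""
    return -2.0 * math.log(alpha)


# Hypothetical (Reading to Learn, Reading to Integrate) score pairs, invented.
scores = [(40, 30), (55, 42), (62, 35), (70, 51),
          (81, 48), (90, 60), (35, 20), (120, 75)]

critical = chi_square_critical_2df(0.001)  # assumed criterion
distances = [mahalanobis_squared(p, scores) for p in scores]
outliers = [p for p, d in zip(scores, distances) if d > critical]

print(f"critical value = {critical:.2f}, outliers flagged = {len(outliers)}")
```

A case is flagged when its squared distance exceeds the chi-square critical value; with more than two predictors the critical value would have to come from a chi-square table or a statistics library.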
a Discriminant analysis for nonnative speakers: To organize the discriminant analysis in order to see if the Reading to Learn/Reading to Integrate Composite classified participants similarly to the measure of Basic Comprehension (for nonnative speakers, TOEFL Reading Comprehension), we divided the entire nonnative speaker group (n = 146) into three levels of basic comprehension: high, mid, and low. These reading ability groups were based on typical TOEFL Reading Comprehension score levels required for program entry. Participants were classified as high if their scores were greater than or equal to 550, or 56 and above on the scaled score on the TOEFL Reading Comprehension (550 is the cut score often used for graduate entry). Participants were classified as mid if scores ranged between 500 and 549, or 50 to 55 on the TOEFL Reading Comprehension. Participants were classified as low if their scores fell below 500, or 49 and below on TOEFL Reading Comprehension (500 is a minimum TOEFL score sometimes used for undergraduate admission, often with the proviso that students enroll in ESL classes either prior to official enrollment or concurrently). For our entire nonnative speaker group, descriptive statistics on basic comprehension reading ability group membership levels appear in Table 9.
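The three-way split just described amounts to a simple banding rule on the TOEFL Reading Comprehension scaled score. A minimal sketch, with a function name of our own choosing:

```python
def toefl_reading_group(scaled_score):
    """Assign a reading ability group from a TOEFL Reading Comprehension
    scaled score, using the bands given in the text:
    56 and above = high, 50 to 55 = mid, 49 and below = low."""
    if scaled_score >= 56:
        return "high"
    if scaled_score >= 50:
        return "mid"
    return "low"


# Example: one score from each band, including the boundary values.
for score in (49, 50, 55, 56):
    print(score, toefl_reading_group(score))
```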
Table 9 Descriptive statistics for nonnative speakers by TOEFL Reading Comprehension reading ability groups (n = 143)

Reading ability group   n     Mean on TOEFL Reading Comprehension   sd
High (≥56)              57    59.68                                 3.03
Mid (50–55)             42    52.81                                 1.67
Low (≤49)               44    42.11                                 5.47
Total                   143   52.26                                 8.22

Note: Total n = 143 due to loss of three cases in anomalous Reading to Integrate session.

We ran SPSS Discriminant Analysis with initial grouping variables of high, mid, and low reading ability. We compared initial reading ability levels with high, mid, and low categories on the Reading to Learn/Reading to Integrate Composite, a new variable reflecting level of performance on the Reading to Learn and Reading to Integrate measures combined (see footnote 5). The discriminant analysis yielded one discriminant function with an eigenvalue of .92, responsible for 99.9% of the variance in outcomes. Wilks' Lambda for this function was .52, significant at .001; the associated Chi-Square value was extremely large (90.93) and highly significant (p < .001), indicating the group centroids on the composite Reading to Learn/Reading to Integrate function for the three nonnative speaker reading ability groups were significantly different. Both Reading to Learn and Reading to Integrate loaded significantly on the discriminant function, at .86 for Reading to Learn and .78 for Reading to Integrate (p < .05).

5 We first calculated two separate discriminant analyses, one for Reading to Learn and one for Reading to Integrate, but we found that both loaded on a single function, so we used the composite in subsequent analyses.
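The reported eigenvalue, Wilks' Lambda, and Chi-Square for the nonnative speakers are linked by standard formulas, and the sketch below checks that they are mutually consistent. It assumes the figures read .92, 143 cases, two predictors, and three groups; it ignores the negligible second eigenvalue; and it uses Bartlett's chi-square approximation, which may differ slightly from the exact computation in SPSS. The function names are ours.

```python
import math


def wilks_lambda(eigenvalues):
    """Wilks' Lambda as the product of 1 / (1 + eigenvalue)."""
    lam = 1.0
    for ev in eigenvalues:
        lam *= 1.0 / (1.0 + ev)
    return lam


def bartlett_chi_square(lam, n_cases, n_predictors, n_groups):
    """Bartlett's chi-square approximation for testing Wilks' Lambda."""
    return -(n_cases - 1 - (n_predictors + n_groups) / 2.0) * math.log(lam)


# Values assumed from the text (nonnative speakers).
lam = wilks_lambda([0.92])
chi2 = bartlett_chi_square(lam, n_cases=143, n_predictors=2, n_groups=3)
df = 2 * (3 - 1)  # predictors x (groups - 1)

print(f"Lambda = {lam:.2f}, chi-square = {chi2:.1f}, df = {df}")
```

Under these assumptions the computed values fall close to the reported Lambda of .52 and Chi-Square of 90.93; the small gap reflects rounding of the eigenvalue.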
Over half (64.9%) of the high reading ability group remained high on the new measure; less than half (42.9%) of the mid reading ability group remained classified as mid. Hence, the Reading to Learn/Reading to Integrate Composite was particularly influential in reclassifying the mid reading ability group and, to a lesser extent, the high group. However, most (81.8%) of the low reading ability group remained low on the Reading to Learn/Reading to Integrate Composite (see Table 10). Of the 143 nonnative speaker participants, 91 (64%) remained in the initial basic comprehension category on the composite; the rest moved, but in different directions. Twenty-one participants (14.7%) were classified into a higher category on the Reading to Learn/Reading to Integrate Composite than their initial basic comprehension level would have suggested, while 31 (21.7%) were reclassified into a lower category. Thus, 52 participants (36.4%), just over one third of the sample, were classified differently based on their Reading to Learn/Reading to Integrate Composite performance.
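The reclassification counts can be tallied directly from the cross-classification in Table 10. The sketch below is illustrative; the matrix layout (rows for the initial group, columns for the group predicted from the composite, both ordered high, mid, low) and the function name are our own choices.

```python
# Counts from Table 10. Rows: initial basic comprehension group;
# columns: group predicted from the composite. Order: high, mid, low.
table_10 = [
    [37, 17, 3],
    [13, 18, 11],
    [2, 6, 36],
]


def reclassification_summary(matrix):
    """Count cases that stayed in, moved above, or moved below their
    initial category."""
    stayed = moved_up = moved_down = 0
    for i, row in enumerate(matrix):
        for j, count in enumerate(row):
            if i == j:
                stayed += count
            elif j < i:          # smaller index means a higher category
                moved_up += count
            else:
                moved_down += count
    total = stayed + moved_up + moved_down
    return {"total": total, "stayed": stayed,
            "moved_up": moved_up, "moved_down": moved_down}


summary = reclassification_summary(table_10)
for key in ("stayed", "moved_up", "moved_down"):
    share = 100.0 * summary[key] / summary["total"]
    print(f"{key:10s} {summary[key]:3d} ({share:.1f}%)")
```

With the Table 10 counts, the diagonal sums to 91 participants who kept their initial category, with 21 moving to a higher and 31 to a lower category. The same function can be applied to the native speaker counts in Table 12.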
Table 10 Discriminant analysis: comparison of nonnative speaker reading ability groups with Reading to Learn/Reading to Integrate Composite (n = 143)

                        Predicted group membership for Reading to
                        Learn/Reading to Integrate Composite         Initial
Reading ability group   High      Mid       Low                      classification total
Count
  High (≥56)            37        17        3                        57
  Mid (50–55)           13        18        11                       42
  Low (≤49)             2         6         36                       44
  Reclassification total 52       41        50                       143
Percentage
  High (≥56)            64.9      29.8      5.3                      100.0
  Mid (50–55)           31.0      42.9      26.2                     100.0
  Low (≤49)             4.5       13.6      81.8                     100.0

Note: Total n = 143 due to loss of three cases in anomalous Reading to Integrate session.

b Discriminant analysis for native speakers: Because one of our goals in this project was to probe the possible validity of these new measures by assessing performance of two groups, native as well as nonnative speakers, we conducted a parallel discriminant analysis for native speakers. Thus, for the native speakers we followed the same procedure, dividing the entire native speaker group (n = 105) into three levels of basic comprehension (high, mid, and low) based on score distances of .5 standard deviations from the sample mean of the Nelson–Denny test for these participants. Native speakers need not take reading comprehension tests when entering the university, so the three-way split was based entirely on our sample data. Participants were classified as high if their scores on the Nelson–Denny were greater than or equal to 135. Participants were classified as mid if their scores ranged between 119 and 134 on the Nelson–Denny. Participants were classified as low if their scores fell at or below 118 on the Nelson–Denny. Descriptive statistics for the entire native speaker sample on basic comprehension group membership appear in Table 11.
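The half-standard-deviation split can be sketched as below. The mean and standard deviation are taken from the Total row of Table 11 (n = 101) with decimal points assumed (126.04 and 16.52), so the computed cut points only approximate the published integer bands, which may have been derived from the full sample of 105. The function names are ours.

```python
def half_sd_cut_points(mean, sd, width=0.5):
    """Lower and upper cut points at `width` standard deviations
    either side of the sample mean."""
    return mean - width * sd, mean + width * sd


def nelson_denny_group(score):
    """Assign a reading ability group using the published integer bands:
    135 and above = high, 119 to 134 = mid, 118 and below = low."""
    if score >= 135:
        return "high"
    if score >= 119:
        return "mid"
    return "low"


# Assumed values from Table 11 (Total row).
low_cut, high_cut = half_sd_cut_points(mean=126.04, sd=16.52)
print(f"cut points: {low_cut:.2f} and {high_cut:.2f}")

for score in (118, 119, 134, 135):
    print(score, nelson_denny_group(score))
```

The computed cut points land near 118 and 134, in line with the bands reported in the text.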
Table 11 Descriptive statistics for native speakers by Nelson–Denny reading ability groups (n = 101)

Reading ability group   n     Mean on Nelson–Denny   sd
High (≥135)             39    141.26                 5.19
Mid (119–134)           37    125.65                 5.09
Low (≤118)              25    102.88                 10.98
Total                   101   126.04                 16.52

Note: Total n = 101 for discriminant analysis due to loss of four cases in anomalous Reading to Integrate session.

The discriminant analysis yielded one discriminant function with an eigenvalue of .31, responsible for 98.7% of the variance in outcomes. Wilks' Lambda for this function was .76, significant at .001; the associated Chi-Square value was large (26.39) and highly significant (p < .001), indicating the group centroids on the discriminant function for the three reading ability groups on the Reading to Learn/Reading to Integrate Composite were significantly different. Pooled within-groups correlations between discriminating variables showed that Reading to Learn correlated with the first discriminant function at a level of .81; Reading to Integrate correlated with the second discriminant function at .71. This contrasts with findings for the nonnative speakers, where scores on the combined new measures loaded significantly on only one discriminant function. For native speakers, then, there is evidence for two significant discriminant functions, although the first accounts for almost all of the variance. Although these two measures (Reading to Learn and Reading to Integrate) loaded on two separate discriminant functions, they still showed moderate correlations with the alternate function (see footnote 6), justifying the composite calculations.
Results of discriminant analysis for native speakers, seen in Table 12, show a different pattern than that observed for nonnative speakers. Nearly three-fourths (71.8%) of the native speakers classified as high in the basic comprehension reading ability group remained high on the Reading to Learn/Reading to Integrate Composite. For the mid group, however, only 18.9% remained classified as mid. Just over half (52%) of the low reading ability group members remained low on the Reading to Learn/Reading to Integrate Composite. As with the nonnative speakers, participants in the mid category on basic comprehension showed the most frequent reclassification. Forty-eight of the 101 (47.5%) native speaker participants remained in the initial classification categories. Twenty-two (21.8%) were reclassified into a higher category, and 31 (30.7%) were reclassified into a lower category. Thus, 53 participants (52.5%), over half of the sample, were classified differently based on their performance on the Reading to Learn/Reading to Integrate Composite.
6 Reading to Learn correlated with function 2 at -.58; Reading to Integrate with function 1 at .71.
Table 12 Discriminant analysis: comparison of native speaker reading ability groups with Reading to Learn/Reading to Integrate Composite (n = 101)

                        Predicted group membership for Reading to
                        Learn/Reading to Integrate Composite         Initial
Reading ability group   High      Mid       Low                      classification total
Count
  High (≥135)           28        3         8                        39
  Mid (119–134)         10        7         20                       37
  Low (≤118)            5         7         13                       25
  Reclassification total 43       17        41                       101
Percentage
  High (≥135)           71.8      7.7       20.5                     100.0
  Mid (119–134)         27.0      18.9      54.1                     100.0
  Low (≤118)            20.0      28.0      52.0                     100.0

Note: Total n = 101 for discriminant analysis due to loss of four cases in anomalous Reading to Integrate session.
V Interpretation
Analyses done to answer Research Questions 1 and 2 showed that performance on Reading to Learn and Reading to Integrate measures was significantly influenced by language background and level of education (graduate vs. undergraduate, for nonnative speakers only). Moreover, level of computer familiarity had no significant effect on Reading to Learn and Reading to Integrate performance.
Correlations showed that, as expected, all reading measures correlated to some degree, generally answering Research Question 3. The most interesting results came from the discriminant analyses, because they showed a pattern of differential classification on the new tasks, sensitive to initial reading ability. This pattern differed for nonnative speakers and native speakers. Examination of the reclassifications based on the new tasks revealed some of the problems of using a basic comprehension-only test to predict performance on more challenging literacy tasks. These results also imply the need for further work on new measures of advanced literacy skills, such as Reading to Learn and Reading to Integrate, to reflect trends in construct-driven assessment (Pellegrino et al., 1999).
For the nonnative speakers, most (81.8%) of the participants with TOEFL Reading Comprehension below 50 remained classified as low on the Reading to Learn/Reading to Integrate Composite, suggesting the existence of a lower threshold of academic English proficiency. Participants below this threshold were unlikely to perform well on tasks assessing Reading to Learn and Reading to Integrate. Even tasks involving only selection (such as BC1 and BC2) rather than production of responses were difficult for this group. However, for the nonnative speaker readers in the mid and high reading ability groups, basic comprehension level was not nearly as consistent a predictor of performance on the Reading to Learn/Reading to Integrate Composite. This was especially striking for those in the mid reading ability group. Approximately one fourth of the mid group dropped into the low category on the Reading to Learn/Reading to Integrate Composite, indicating that they experienced more difficulty in completing the new measures, but approximately one fourth could indeed categorize and synthesize information relatively well, despite middling performance on basic comprehension. This finding suggests that, for nonnative speaker readers in the mid reading ability category, basic comprehension measures were insufficient to predict their performance on the new measures.
Almost two-thirds of the high basic comprehension nonnative speakers remained high on the Reading to Learn/Reading to Integrate Composite, but one third dropped, suggesting that they could not categorize and synthesize information from texts as well as they could recognize information. For this third, the recognition-only task (multiple-choice basic comprehension) overestimated ability to manipulate and synthesize information. For the high and mid reading ability nonnative speaker groups, basic comprehension-only tests may then overestimate academic English proficiency.
Results for the native speakers showed a more definitive pattern of reclassification. For the low reading ability group, approximately half were misclassified, suggesting that the basic comprehension measure underrepresented their ability to categorize and synthesize information in academic texts. Most of the mid reading ability group were reclassified as lower on the Reading to Learn/Reading to Integrate Composite, showing that basic comprehension results overestimated ability to succeed on more challenging tasks. On the other hand, almost one third of the mid reading ability group did better on the Reading to Learn/Reading to Integrate Composite; for them, basic comprehension underrepresented their ability to complete more challenging reading tasks. For the high ability readers, nearly one third dropped on the Reading to Learn/Reading to Integrate Composite. For them, the basic comprehension measure overestimated their level of academic English proficiency. The other two thirds of the high reading ability group remained high, suggesting a higher threshold of academic English proficiency.
Considering results from both the nonnative speakers and native speakers, we conclude that the new tasks did assess something different from basic comprehension once a lower level threshold of basic academic English proficiency had been achieved. Examination of scatterplots based on discriminant functions indicated an obvious separation between participants who could and could not perform on the Reading to Learn/Reading to Integrate Composite, thus providing some tentative evidence for concurrent validity (Messick, 1989; Chapelle, 1999). We had hoped to find clear evidence of a hierarchy, suggesting that Reading to Learn was demonstrably more difficult than basic comprehension, and Reading to Integrate demonstrably more difficult than Reading to Learn, but results did not yield an obvious hierarchy. Results did, however, suggest an even simpler pattern: a dichotomy. For those nonnative speakers above the lower threshold of English language proficiency, which in our data would be approximately 500, or 49 on the scaled TOEFL Reading Comprehension scores, some could perform well on the new measures and some could not, revealing two rather than three groups on the Reading to Learn/Reading to Integrate Composite. These results lead us to speculate that the new measures tap additional skills, such as ‘sophisticated discourse processes and critical thinking skills’ (Enright and Schedl, 1999: 24), in addition to language proficiency. For native speakers, the discriminant loading suggested a possible hierarchy of difficulty, but the scatterplots and classification tables revealed a dichotomy. Reclassification based on the Reading to Learn/Reading to Integrate Composite nearly eliminated the mid basic comprehension reading ability category, with only 16.8% of the native speakers qualifying as mid on the Reading to Learn/Reading to Integrate Composite.
VI Conclusions
While we acknowledge that there are some limitations to this project, such as the time required to administer and score new measures, the results are illuminating and useful, and suggest some considerations for future research. The first consideration relates to appropriate classification of students based on academic reading abilities. This corresponds to the use of TOEFL as a gatekeeper by many institutions. For TOEFL test-takers whose current TOEFL scores would be in an intermediate to high intermediate range, the new tasks could assess additional abilities relevant to academic performance. For such participants, additional test development efforts are warranted. The majority of high reading ability nonnative speakers (64.9%) could perform the new measures; a third could not. Our results suggest that some unknown number of mid reading ability nonnative speakers are more capable of succeeding at more challenging academic reading tasks than their current level of basic comprehension assessment would indicate. At the same time, there are some high and mid reading ability participants who could not perform the more demanding tasks; admitting such students directly into university programs could result in failure. These results suggest that, for most students at lower levels of basic comprehension (in our study, those with TOEFL scores below 490 to 500), development of new tasks is unnecessary. Nevertheless, our results indicate that a certain small percentage (in our sample 8 students, 18.2%) of nonnative speakers classified as low reading ability on basic comprehension did in fact perform adequately on more challenging tasks like Reading to Learn and Reading to Integrate. Given the very large number of TOEFL test-takers around the world, this finding merits further research. It would be potentially unfair to exclude such students from university study based on basic comprehension-only measures such as the current TOEFL. We would urge institutions that use TOEFL to consider scores on these new, more demanding tasks if doing so would address institutional needs. According to interview data collected in conjunction with this project (Trites, 2000: Chapter 6), participants perceived that successful completion of the new tasks required a thorough understanding of the texts, whereas the multiple-choice tests were less demanding, requiring only superficial grasp of content. This finding corroborates the observation of Freedle and Kostin (1999: 3), who note that often examinees do not need to comprehend the accompanying text to answer the test item. Therefore, for all TOEFL test-takers, incorporating more challenging tasks such as the Reading to Learn and Reading to Integrate measures into typical English language instruction could have positive washback effects.
The second area of consideration is the focus on additional relevant research. For predictive validity, it is important to determine the correlation between the Reading to Learn/Reading to Integrate Composite measures and actual academic performance in university classes. Correlations might differ depending on the degree of categorization and synthesis required by different major fields of study. Another area for future research is related to the novelty of the Reading to Learn task as an assessment technique. Current pedagogical trends in literacy instruction emphasize the use of graphic organizers as a means to understand and manipulate information in texts. To our knowledge, graphic organizers are typically used as classroom activities rather than assessment tools. They have good potential for use in assessment if students are familiar with them and if appropriate scoring systems can be developed. This project demonstrates that it is possible, though labor intensive, to develop reliable scoring systems. Further research is needed to explore refinement of this task type and related scoring systems for use in testing programs with large numbers of test-takers and scorers. Although the Reading to Integrate task (generating a synthesis) was more familiar, the scoring system was innovative because it reflected a reader's ability to recognize textual frames as well as integrate information. If test developers are interested in the abilities of nonnative speaker readers to perform such tasks, further development of similar tasks and scoring systems is warranted. While a substantial investment of time would be required to refine the administration and scoring systems needed for more complex tasks such as these, it should be weighed against the possible danger of under-representing the likelihood of academic success based on results of the basic comprehension-only measures still most often used to assess academic proficiency.
Acknowledgements
This project was funded by a grant from Educational Testing Service as part of the TOEFL 2000 effort. We appreciate the cooperation of the ETS staff members who assisted in the development of passage-specific tasks and other aspects of the research; however, no official endorsement of Educational Testing Service should be inferred.
VII References
Bachman, L. 2000: Modern language testing at the turn of the century: assuring that what we count counts. Language Testing 17, 1–42.
Biber, D. 1993: Using register-diversified corpora for general language studies. Computational Linguistics 19, 219–41.
Britt, M., Rouet, J. and Perfetti, C. 1996: Using hypertext to study and reason about historical evidence. In Rouet, J., Levonen, J., Dillon, A. and Spiro, R., editors, Hypertext and cognition. Mahwah, NJ: Lawrence Erlbaum, 43–72.
Chapelle, C. 1999: Validity in language assessment. Annual Review of Applied Linguistics 19, 1–19.
Educational Testing Service 1997: TOEFL test and score manual. Princeton, NJ: Educational Testing Service.
Educational Testing Service 1998: Draft TOEFL 2000 research agenda framework: areas of research. Research agenda three. TOEFL 2000 internal document. Princeton, NJ: Educational Testing Service.
Eignor, D., Taylor, C., Kirsch, I. and Jamieson, J. 1998: Development of a scale for assessing the level of computer familiarity of TOEFL examinees. TOEFL Research Report No. 60. Princeton, NJ: Educational Testing Service.
Enright, M. and Schedl, M. 1999: Reading for a reason: using reader purpose to guide test design. TOEFL 2000 Internal Report. Princeton, NJ: Educational Testing Service.
Enright, M., Grabe, W., Mosenthal, P., Mulcahy-Ernt, P. and Schedl, M. 1998: A TOEFL 2000 framework for testing reading comprehension: a working paper. Princeton, NJ: Educational Testing Service.
Foltz, P. 1996: Comprehension, coherence and strategies in hypertext. In Rouet, J., Levonen, J., Dillon, A. and Spiro, R., editors, Hypertext and cognition. Mahwah, NJ: Lawrence Erlbaum, 109–36.
Freedle, R. and Kostin, I. 1999: Does the text matter in a multiple-choice test of comprehension? The case for the construct validity of TOEFL's minitalks. Language Testing 16, 2–32.
Goldman, S. 1997: Learning from text: reflections on the past and suggestions for the future. Discourse Processes 23, 357–98.
Hayes, J. and Hatch, J. 1999: Issues in measuring reliability: correlation versus percentage of agreement. Written Communication 16, 354–67.
Huberty, C. 1994: Applied discriminant analysis. New York: John Wiley.
Jamieson, J., Campbell, J., Norfleet, L. and Berbisada, N. 1993: Reliability of a computerized scoring routine for an open-ended task. System 21, 305–22.
Jamieson, J., Norfleet, L. and Berbisada, N. 1993: Successes, failures and dropouts in computer-assisted language lessons. Computer Assisted English Language Learning Journal 4, 12–20.
Klecka, W. 1980: Discriminant analysis. In Lewis-Beck, M., editor, Quantitative applications in the social sciences, Volume 19. Newbury Park, CA: Sage.
Lehto, M., Zhu, W. and Carpenter, B. 1995: The relative effectiveness of hypertext and text. International Journal of Human-Computer Interaction 7, 293–313.
McNamara, D. and Kintsch, W. 1996: Learning from texts: effects of prior knowledge and text coherence. Discourse Processes 22, 247–88.
Messick, S. 1989: Validity. In Linn, R.L., editor, Educational measurement, 3rd edition. New York: American Council on Education/Macmillan, 13–103.
Meyer, B. 1985a: Prose analyses: purposes, procedures and problems. In Britton, B. and Black, J., editors, Understanding expository text. Hillsdale, NJ: Lawrence Erlbaum, 11–64.
Meyer, B. 1985b: Prose analysis: purposes, procedures and problems, Part 2. In Britton, B. and Black, J., editors, Understanding expository text. Hillsdale, NJ: Lawrence Erlbaum, 269–97.
Monks, V. 1997, August/September: Two views, same waterway. National Wildlife 35, 36–37.
Pellegrino, J., Baxter, G. and Glaser, R. 1999: Addressing the ‘two disciplines’ problem: linking theories of cognition and learning with assessment and instructional practice. Review of Research in Education 24, 307–53.
Perfetti, C. 1997: Sentences, individual differences and multiple texts: three issues in text comprehension. Discourse Processes 23, 337–55.
Perfetti, C., Britt, M.A. and Georgi, M. 1995: Text-based learning and reasoning: studies in history. Hillsdale, NJ: Lawrence Erlbaum.
Perfetti, C., Marron, M. and Foltz, P. 1996: Sources of comprehension failure: theoretical perspectives and case studies. In Cornoldi, C. and Oakhill, J., editors, Reading comprehension difficulties: processes and intervention. Mahwah, NJ: Lawrence Erlbaum, 137–65.
Reinking, D. 1988: Computer-mediated text and comprehension differences: the role of reading time, reader preference and estimation of learning. Reading Research Quarterly 23, 484–500.
Reinking, D. and Schreiner, R. 1985: The effects of computer-mediated text on measures of reading comprehension and reading behavior. Reading Research Quarterly 20, 536–53.
Spivey, N. 1997: The constructivist metaphor: reading, writing and the making of meaning. San Diego, CA: Academic Press.
Stevens, J. 1996: Applied multivariate statistics for the social sciences, 3rd edition. Mahwah, NJ: Lawrence Erlbaum.
Tabachnick, B. and Fidell, L. 1996: Using multivariate statistics, 3rd edition. New York: Harper Collins.
Taylor, C., Jamieson, J., Eignor, D. and Kirsch, I. 1998: The relationship between computer familiarity and performance on computer-based TOEFL test tasks. TOEFL Research Report No. 61. Princeton, NJ: Educational Testing Service.
Tennesen, M. 1997, November/December: On a clear day. National Parks 71, 26–9.
Trites, L. 2000: Beyond basic comprehension: reading to learn and reading to integrate for native and non-native speakers. Unpublished doctoral dissertation, Northern Arizona University, Flagstaff, AZ.
Van den Berg, S. and Watt, J. 1991: Effects of educational setting on student responses to structured hypertext. Journal of Computer-Based Instruction 18, 118–24.
Van Dijk, T.A. and Kintsch, W. 1983: Strategies of discourse comprehension. New York: Academic Press.
Wiley, J. and Voss, J. 1999: Constructing arguments from multiple sources: tasks that promote understanding and not just memory for text. Journal of Educational Psychology 91, 310–11.
Zimmerman, T. 1997, December 29: Filter it with billions and billions of oysters: how to revive the Chesapeake Bay. US News and World Report 123, 63.
Appendix 1a Chart completion task
Directions: Complete the following chart. Fill in as much detail as possible from the text read by categorizing the information into the different areas on the chart. Do not use single words for your responses; form your responses in phrases or complete sentences. Include examples from the text.

Make no judgments about the accuracy of causes or effects or the effectiveness of the solutions mentioned in the text. Solutions are seen as any action taken in response to the problem(s). Solutions can take many forms, such as proposed solutions, attempted solutions, or failed solutions. Also, space is provided under each category for examples. Examples are specific examples found in the text that are used by the author(s) to exemplify the problems, causes, effects, or solutions in the text. However, there may not be examples for every category.

Points will be awarded for correct responses only. There is no penalty for incorrect responses. Points will be awarded in the following manner:

Problems and Solutions: 10 points each
Causes and Effects: 5 points each
Examples: 1 point each

Chart column headings: Problems | Causes | Effects | Solutions
Chart row beneath each column: Examples | Examples | Examples | Examples
Appendix 1b Reading to Learn scoring rubric (0–241 total possible)

Problems (10 points); 1 word (6 points); 40 maximum
• Air pollution/smog in the National Parks
• Opposition or ignoring (or lack of cooperation) to environmental standards (regulations)
• No true 'point sources' (myriad of smaller pollution sources rather than one large source)
• National Parks limited jurisdiction of pollution sources outside of the parks

Problems: Examples (1 point); 5 maximum
• Great Smoky Mountain National Park
• Grand Canyon National Park
• Smoky Mountain sources from Ohio, New York, Atlanta, etc.
• Grand Canyon sources from CA, NV, UT, AZ, NM and Mexico
• TN Luttrell corp. building permit situation

Causes (5 points); 1 word (3 points); 55 maximum
• Sulfur/nitrogen emissions
• Acid rain
• Ground-level ozone
• Urban/industrial emissions
• Automobile emissions
• Power plants/factories/plants emissions
• Emissions from smokestacks (chimneys)
• Emissions from kilns
• Unregulated pollution/smog (from other countries/Mexico)
• Smoke from controlled burns
• Emissions from large cities

Causes: Examples (1 point, or 5 points with overarching cause listed above)
• Automobile emissions
• Power plants/factories/plants emissions
• Emissions from smokestacks (chimneys)
• Emissions from kilns
• Unregulated pollution/smog (from other countries/Mexico)
• Smoke from controlled burns
• Emissions from large cities

Causes: Examples (1 point); 10 maximum
• Navajo Generating Station, Page, AZ (Power Plant AZ)
• Power plants in TN and OH river valley
• Southern California Edison Plant, Laughlin, NV
• Tennessee Luttrell Kilns, TN
• Industry in AZ
• Cars in CA
• Smokestacks in NM, NV, UT
• Los Angeles
• Atlanta
• New York

Effects (5 points); 1 word (3 points); 30 maximum
• Trees and plants affected (injured); growth hindered
• Visibility decreased
• Metals loosened into waters (surface/ground water polluted)
• Nutrients removed (leached) from soil and/or plants
• Public outcry over stance of big business/government
• Aquatic life injured (damaged)

Effects: Examples (1 point); 6 maximum
• Leaves of plants turning purple and brown (stippling)
• Hindering photosynthesis of plants and trees
• Reduction of visibility at Grand Canyon
• Visibility reduced 90% of the days
• See vague blue masses
• See half as far as in 1919

Solutions (10 points); 1 word (6 points); 90 maximum
• Stricter environmental laws (resolutions/acts)
• 1977 Clean Air Act Amendments labeling NPs as Class I areas
• Regional haze regulations proposed by the EPA
• Reduction of allowable pollution standards from industry
• Clean Air Act set visibility goals
• Objecting to construction permits
• Identification of pollution sources
• Industry modifications/installation of devices (scrubbers) to reduce pollution
• Public pressure to protect parks

Solutions: Examples (1 point, or 10 points with overarching solution listed above)
• 1977 Clean Air Act Amendments labeling NPs as Class I areas
• Regional haze regulations proposed by the EPA
• Reduction of allowable pollution standards from industry

Solutions: Examples (1 point); 5 maximum
• Objecting to kilns in TN
• Researchers using scientific technology to ID sources (radioactive isotopes)
• Southern California Edison Plant in Laughlin, NV identified as point source
• Scrubbers installed at Navajo Generating Plant
• 10% reduction of pollution in 10–15 years (goal of no man-made pollution)
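The arithmetic behind the 241-point ceiling can be checked with a short sketch. It is a simplified illustration, not the authors' scoring procedure: it uses only the full-credit point values and category maxima stated in the rubric, ignores the reduced credit for single-word responses and the higher credit for examples that restate an overarching cause or solution, and all function and variable names are hypothetical.

```python
# Hypothetical helper names; point values and caps are taken from the rubric.
FULL_POINTS = {"problems": 10, "causes": 5, "effects": 5, "solutions": 10}
CATEGORY_MAX = {"problems": 40, "causes": 55, "effects": 30, "solutions": 90}
EXAMPLE_MAX = {"problems": 5, "causes": 10, "effects": 6, "solutions": 5}


def reading_to_learn_score(correct_items, correct_examples):
    """Sum full-credit points per category, capped at the rubric maxima.

    Both arguments map a category name to a count of correct responses.
    Simplification: every correct example is worth exactly 1 point.
    """
    total = 0
    for category, points in FULL_POINTS.items():
        earned = correct_items.get(category, 0) * points
        total += min(earned, CATEGORY_MAX[category])
        total += min(correct_examples.get(category, 0), EXAMPLE_MAX[category])
    return total


def theoretical_maximum():
    """Category maxima (215) plus 1-point example maxima (26)."""
    return sum(CATEGORY_MAX.values()) + sum(EXAMPLE_MAX.values())
```

Under these assumptions the category maxima sum to 215 and the example maxima to 26, which reproduces the stated total of 241.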
Appendix 2a Reading to Integrate task integration activity
Directions: For the next 15 minutes, reflect on what you read and compose a short essay that combines the information in the material read and makes connections across the range of ideas and arguments presented.

[Ruled lines for the written response]
Appendix 2b Reading to Integrate scoring rubric
Integration

50 Excellent: Integrates texts accurately and successfully on multiple levels and creates a true Documents Model. Generalizes at least two macrostructure concepts common across texts (this may be simply identifying the existence of a macrostructure followed by the supporting macrostructures from each text). Effectively integrates relevant support (i.e., details or macrostructures) to support generalizations and may still discuss each article separately to some degree. Integrates macrostructures present in both texts.

40 Good: Integrates through the creation of a well-developed and accurate introduction and/or a conclusion, yet summarizes articles separately, AND/OR generalizes one major macrostructure that is fully developed with support. Uses some relevant support.

30 Fair: Attempts to integrate through the creation of a partially developed, possibly inaccurate, introductory AND/OR concluding statement, usually related to the main topic of the articles, but has no substantive development. Creates a Text and Situation Model of each text separately. Does not make generalizations for integration beyond the one statement. May make evaluative or editorial statements; however, these may be inaccurate.

20 Poor: Creates a well-developed and accurate Text Model and Situation Model for each of the texts, yet attempts no integrative connection across the texts.

10 Very Poor: Ineffectively or inaccurately attempts to summarize each article in a very reporting style, revealing little or no use of background knowledge, contextualization, or evaluation. Response is simply a recall of information from the texts. No introduction or conclusion, OR participant confuses texts and sees separate texts as one issue (problem).

0 No Response: Participant does not attempt response or only addresses one article.

Macrostructures

Participants will receive one point for each macrostructure accurately identified in each text, with a minimum score of 4 awarded for the inability to accurately identify any macrostructures.

25 Excellent (Total 8): Accurately identifies all 4 macrostructures present in both texts.

20 Good (Total 6–7): Accurately identifies all 4 macrostructures in one text and 3 in the other, OR accurately identifies 3 of the four macrostructures in both texts (or 4 in one and 2 in the other).

15 Fair (Total 4–5): Accurately identifies 3 of the four macrostructures in one text and 2 of the four in the other (or 4 in one and 1 in the other), OR accurately identifies 2 of the four macrostructures in both texts (or 4 in one and 0 in the other, or 3 in one and 1 in the other).

10 Poor (Total 2–3): Accurately identifies 2 of the macrostructures in one text and 1 in the other (or 3 in one and 0 in the other). Accurately identifies 1 of the four macrostructures in both texts (or 2 in one and 0 in the other).

5 Very Poor (Total 0–1): Accurately identifies 1 of the four macrostructures in one text and 0 in the other, OR unable to accurately identify any macrostructures in the texts.

0 No Response: Participant does not attempt response.

Use of relevant details

5 Excellent: Effectively uses multiple relevant details as support; no irrelevant or erroneous details.

4 Good: Effectively uses relevant details as support; may include some inaccurate or irrelevant details. (More than 50% of details are relevant.)

3 Fair: Possibly uses one or two relevant details, yet several irrelevant details or inaccurate details (erroneous or fabricated) appear in the synthesis. (50% or less of details are relevant.)

2 Poor: Erroneous or fabricated details used in attempt to support arguments presented; does not include relevant details.

1 Very Poor: No details used or listed.

0 No Response: Participant does not attempt response.
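The three subscales combine into the 0–80 synthesis score. The sketch below illustrates that combination under stated assumptions: the integration and detail ratings are supplied by a human rater, the macrostructure band is derived from the total count of macrostructures identified across the two texts (0–8) as the band labels indicate, and the function names are hypothetical rather than taken from the study.

```python
# Hypothetical names; band boundaries follow the "Total" labels in the rubric.
INTEGRATION_LEVELS = (0, 10, 20, 30, 40, 50)


def macrostructure_points(identified_text_a, identified_text_b):
    """Map macrostructures identified per text (0-4 each) to a band score."""
    for count in (identified_text_a, identified_text_b):
        if not 0 <= count <= 4:
            raise ValueError("each text has four macrostructures")
    total = identified_text_a + identified_text_b
    if total == 8:
        return 25  # Excellent
    if total >= 6:
        return 20  # Good
    if total >= 4:
        return 15  # Fair
    if total >= 2:
        return 10  # Poor
    return 5  # Very Poor (attempted, 0-1 identified)


def reading_to_integrate_score(integration, macro_a, macro_b, details, attempted=True):
    """Add the three subscores; a missing response scores 0 overall."""
    if not attempted:
        return 0
    if integration not in INTEGRATION_LEVELS:
        raise ValueError("integration is rated in steps of 10 from 0 to 50")
    if not 0 <= details <= 5:
        raise ValueError("details are rated from 0 to 5")
    return integration + macrostructure_points(macro_a, macro_b) + details
```

A response rated Excellent on all three subscales receives 50 + 25 + 5 = 80, the top of the analytic scale.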
III Methods
1 Participants
Two hundred and fifty-one participants, the majority undergraduates, volunteered to take part in this study. The sample consisted of 105 undergraduate native speakers of English (NSUs), 106 undergraduate nonnative speakers (NNSUs), and 40 graduate nonnative speakers (NNSGs) of English at a midsized southwestern university. All data were collected between February and October 1999. All undergraduate participants were recruited through large undergraduate classes in the areas enrolling most NNSs (business administration, hotel management, engineering, social sciences, and humanities). We tested all NNSs accessible at the institution at the time of data collection; compared to a national sample of international students from the prior academic year, we had a relatively larger proportion of undergraduate relative to graduate students. Nearly all undergraduate participants were young adults, with an average age of 21. Nonnative speakers were also recruited from students enrolled in the summer intensive English program, which is made up of students needing to increase TOEFL scores to at least 500 in order to enroll at a university. We included 46 participants (32% of NNS sample) with TOEFL scores below 500 in the nonnative sample. Graduate nonnative speakers (n = 40) were recruited from the entire university population and had an average age of 30.75. Nonnative speakers represented a range of language backgrounds. One third were Japanese, with other Asian, Germanic, and Romance languages also substantially represented. Both the relatively modest sample size and the all-volunteer nature of the participant sample preclude direct generalization to the worldwide TOEFL population, but participants were representative of the levels of international students at the institution where they were enrolled. Participants who completed all four data collection sessions received a payment of US$10 per hour (US$40 for the entire project).
2 Instruments
This project used three existing instruments, two to determine initial reading levels and one to assess levels of computer familiarity, and two new instruments, one for Reading to Learn and one for Reading to Integrate; these were developed especially for the project. Each of the new measures also served as the basis for an additional measure of basic reading comprehension related directly to the text included in the new task. Thus, each participant completed a total of seven different instruments.
a Existing instruments: Initial levels of reading comprehension were determined based on the Nelson–Denny Reading Test (Nelson–Denny), Form G, used to identify the reading levels of the NSs, and three retired versions of the Institutional TOEFL Reading Comprehension Section (TOEFL Reading Comprehension), used to identify the reading levels of the NNSs. Although each of these tests was used to assess reading levels in the population for which it had been developed, all 251 participants took both tests in order to provide comparative data. All 251 participants also completed a brief computer familiarity questionnaire.

Participants' computer familiarity was determined through an 11-item questionnaire based on a longer 23-item questionnaire previously developed by ETS (Eignor et al., 1998). In the present study, we used only the 11 items that loaded the most heavily on the major factors resulting from administration to a large sample of TOEFL participants. For these 11 items, developers determined the reliability to be .93, using a split-half method (Eignor et al., 1998: 22). This brief questionnaire took approximately 5 minutes to complete; reliability in our sample, using coefficient alpha, was .87.
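For readers unfamiliar with coefficient alpha, the following sketch shows the standard formula, alpha = k/(k − 1) × (1 − sum of item variances / variance of total scores), applied to a respondents-by-items matrix. It is a generic illustration with a hypothetical function name and made-up data, not the computation or the questionnaire data used in the study.

```python
from statistics import pvariance


def coefficient_alpha(scores):
    """Cronbach's coefficient alpha.

    scores: one row per respondent, each row a list of k item scores.
    Population variances are used throughout; the ratio is unchanged if
    sample variances are used consistently instead.
    """
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_variances = [pvariance([row[i] for row in scores]) for i in range(k)]
    total_variance = pvariance([sum(row) for row in scores])
    if total_variance == 0:
        raise ValueError("total scores do not vary")
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Made-up responses from four respondents on two items.
example = [[2, 1], [1, 2], [3, 3], [0, 0]]
alpha = coefficient_alpha(example)
```

The same formula applies to interrater reliability when raters are treated as the "items" and scored responses as the rows.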
b Texts used for new measures: In developing the new tasks, we selected texts that would conform to the design specifications of TOEFL 2000. They were problem/solution texts, recommended as one of the potentially relevant text types for TOEFL 2000 (Enright et al., 1998). Longer texts were used because these represented more challenging and authentic academic tasks (Enright et al., 1998). We used one 1200-word and two 600-word texts. The longer text (Tennesen, 1997) was used to assess Reading to Learn, and the two 600-word texts (Monks, 1997; Zimmerman, 1997) were used to assess Reading to Integrate. We chose these text lengths based on work by Meyer (1985a) and further research by the first author indicating that natural science texts between 1200 and 1500 words included representation of all necessary macro-rhetorical structures of problem/solution texts, with or without explicit signaling. While 1200–1500 word texts provide optimal representation of the macro-rhetorical structures, texts of 600 words provide all the basic macro-rhetorical structures present in problem/solution texts. Thus, these lengths were long enough for adequate argumentation but not so long that they were excessively redundant (Enright et al., 1998). Texts were also matched for readability according to standard readability scales such as the Flesch–Kincaid, Coleman–Liau, and Bormuth scales and averaged a minimum of grade level 11.0 to 12.0 on these scales. Also, all texts pertained to natural and social sciences; each text covered environmental issues such as air and water pollution (Enright et al., 1998). Thus, text topics were similar across tasks.
c New instruments used in the study: Three new reading measures were used in this study to assess Reading to Learn, Reading to Integrate, and Basic Comprehension. Trites (2000: Chapters 2 and 3) presents a more extensive review of literature and rationale for development of the new measures.
• Reading to Learn: The first new measure, completion of a chart, was used to determine participants' ability to read to learn. Spivey (1997: 69) suggests that readers' categorization of information in text offers insight into their cognitive processes and their making of meaning. We designed a measure to be used with a 1200-word text that students read on either paper or computer. Students were asked to recall, identify, and categorize information from the text on a chart reflecting macro-rhetorical structures, called macrostructures in this study (problems and solutions), and other types of information from problem/solution texts (causes, effects, and examples), categories based on the work of Meyer (1985a). The scoring rubric, based on work by Meyer (1985b) and later modified by Jamieson et al. (1993), awarded points only for the upper levels of textual structure represented on the chart (for task and scoring rubric, see Appendix 1). We weighted the information supplied on the chart as follows: 10 points for correct information in the problem and solution categories, five points for correct information supplied in the cause and effect categories, and one point for accurate examples. This weighting reflects Meyer's (1985b) hierarchical levels, which characterize problem and solution propositions as higher order structures, while the other categories represent lower order propositions.[1] The theoretical maximum score for this scale was 241, which would result from maximum points given in all categories. The first author and two research assistants spent 35–40 hours creating, revising, and norming the scoring rubric and developing the scoring guide (Trites, 2000: Chapter 3). To determine interrater reliability, we used coefficient alpha rather than percentage of agreement because percentage of agreement inflates the likelihood of chance agreement (Hayes and Hatch, 1999). After norming, overall interrater reliability was .99 (coefficient alpha), with similarly high alpha coefficients for all subcategories.[2]

• Reading to Integrate: The second new measure assessed Reading to Integrate. The task used to assess Reading to Integrate required participants to read two 600-word texts and compose a written synthesis. The prompt asked students to make connections across the range of ideas presented; thus, we asked readers to synthesize information rather than summarize or make comparisons (Wiley and Voss, 1999). This synthesis was scored based on an analytic scale ranging from 0 to 80, reflecting readers' ability to recognize and manipulate the structure of the texts, include specific information, and express connections across texts through the use of cohesive devices (for task and scoring rubric, see Appendix 2). The test was designed to measure the integration of content from both readings and did not assess other aspects of writing such as the creation of rhetorical style, grammaticality, or mechanics. The rubric was composed of three subcategories: integration ability, macrostructure recognition, and use of relevant details. The integration subscore was awarded the highest point values because this was the predominant skill being tested. It scored participants on their ability to make connections across texts based on the manipulation of the textual frames in both texts. The second subcategory awarded points for the ability to recognize and articulate the macrostructures (problem, cause, effect, or solution) present in each text. This subcategory was similar to the categorizing task used in the Reading to Learn measure, with the additional constraint that participants had to express the connections overtly. The third subcategory in the scoring rubric analysed the ability to use relevant details as support in the written synthesis. The first author and two research assistants spent 30 hours revising and norming the scoring rubric and developing a decision guide, resulting in an overall interrater reliability of .99 (coefficient alpha), with similarly high alphas for all subcategories.

• Basic Comprehension: The third construct was measured by multiple-choice tests related specifically to the texts used in the new tasks. These tests were created by TOEFL Test Development staff and followed current TOEFL reading section specifications. We used two multiple-choice tests, Basic Comprehension Test 1 (BC1) and Basic Comprehension Test 2 (BC2), 20 items each, one for the longer passage used to assess Reading to Learn and one for the two passages used to assess Reading to Integrate. Both were scored based on number of items answered correctly. Reliability on BC1, calculated based on 251 participants, was .84 (coefficient alpha). Inadvertently, the order of the texts used in BC2 was different for the two different media; however, reliability on both versions of the test was high. For those who took BC2 based on paper texts (n = 127), reliability was .84 (coefficient alpha); for those who took BC2 based on computerized texts (n = 124), reliability was .86 (coefficient alpha).

[1] Students received no points for information improperly placed or for information not found in the text.

[2] We recognize that tasks requiring high inference measures, plus extensive norming and revision of the scoring rubric, pose feasibility issues in large-scale testing. Further research is needed to determine whether and how such scoring procedures could be adapted in standardized testing for numerous test-takers.
3 Design for data collection
This study used a 2 × 2 repeated measures design to examine performance on the new reading tasks. Native speaker undergraduates and nonnative speaker undergraduates were divided into two groups each, of equal ability as determined by performance on the baseline standardized measures of reading comprehension (Nelson–Denny or TOEFL). Half of each group read texts on paper; the other half read the same texts on a computer screen. A smaller group of nonnative speaker graduates, equally divided, were also included for a comparison between performance by graduate and undergraduate nonnative speakers. Additionally, the administration of the new measures was counterbalanced to control for any practice effect.
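One way to form two subgroups of equal ability from baseline scores, and then check the balance with an independent-samples t statistic, is sketched below. The study does not describe its matching procedure, so the ABBA dealing rule, the pooled-variance t formula, the function names, and the data are all assumptions made for illustration.

```python
from math import sqrt
from statistics import mean, variance


def split_matched(scores_by_id):
    """Rank participants by baseline score and deal them out A, B, B, A, ...

    Returns two lists of participant ids (paper group, computer group).
    """
    ranked = sorted(scores_by_id, key=scores_by_id.get, reverse=True)
    paper, computer = [], []
    for position, participant in enumerate(ranked):
        if position % 4 in (0, 3):
            paper.append(participant)
        else:
            computer.append(participant)
    return paper, computer


def pooled_t(sample_a, sample_b):
    """Independent-samples t statistic with a pooled variance estimate."""
    n_a, n_b = len(sample_a), len(sample_b)
    pooled = ((n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)) / (n_a + n_b - 2)
    return (mean(sample_a) - mean(sample_b)) / sqrt(pooled * (1 / n_a + 1 / n_b))


# Made-up baseline scores for eight participants.
baseline = {f"p{i}": 40 + i for i in range(8)}
paper_ids, computer_ids = split_matched(baseline)
t_value = pooled_t([baseline[p] for p in paper_ids], [baseline[p] for p in computer_ids])
```

A t statistic near zero indicates that the two medium subgroups start from comparable baseline reading levels.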
a Procedures: All participants met with the researchers in four sessions, each lasting about an hour. The first two sessions were devoted to administering the existing instruments. During Session 1, participants received an introduction to the study and took one of the two standardized basic reading comprehension measures (Nelson–Denny or TOEFL Reading Comprehension). Students completed the computer familiarity questionnaire and the Nelson–Denny Test at the same testing session because the Nelson–Denny was shorter than the TOEFL Reading Comprehension. During Session 2, participants took the other standardized basic reading comprehension measure.
Next, each participant group was subdivided into two subgroups for computer-based or paper reading of the texts for the new tasks. The subgroups were matched on their performance on initial reading measures: the Nelson–Denny was used for native speakers and the TOEFL Reading Comprehension was used for nonnative speakers. Independent t-tests run on these reading measures showed no significant difference in basic comprehension for the newly created subgroups assigned to each medium, ensuring that they were balanced for initial reading levels. Participants stayed in the same subgroups for the duration of the study. To ensure uniformity of response mode, all participants, whether they read the source texts on the computer or on paper, responded to the reading tasks using paper and pencil format.[3]

[3] Although responses could have been entered and perhaps scored by computer, this would have introduced factors not directly related to our research questions and remains an area for further study.

The last two sessions, each lasting approximately one hour, were dedicated to administration of the new measures. The Reading to Learn session took slightly longer to administer because administrative procedures were longer for this novel task. The new tasks were counterbalanced to control for practice effect; thus, half of the participants took the Reading to Learn measure first and half took the Reading to Integrate measure first. During Session 3, we administered the first new measure (for ease of discussion, Reading to Learn is discussed first) and BC1. At this session, students were given 12 minutes to read a 1200-word passage, either on computer or on paper. We limited the time allowed for reading based on 100 words per minute, thought to be ample (Grabe, personal communication, 1998). After examinees read the text, they were given 4 minutes to take notes on a half sheet of paper. Participants were instructed to take minimal notes due to the time constraints. Next, the text was removed and examinees were allowed 15 minutes to complete a chart based on the reading with the aid of their notes. After completing this Reading to Learn activity, participants were allowed to use the text and were given 15 minutes to answer BC1. Following these new testing sessions, 49 participants were selected for a related interview concerning the cognitive processes used in task completion (for further details, see Trites, 2000: Chapter 6).
During Session 4, students were given 12 minutes to read two short texts (600 words each), either on computer or paper. After participants read the assigned texts, they were given 4 minutes to take one-half page of notes (Enright et al., 1998). Next, the texts were removed and participants were asked to demonstrate Reading to Integrate by writing a synthesis of the texts with the aid of their notes (15 minutes allowed for this task). After completing the Reading to Integrate task, participants were allowed to see the texts again and answered BC2 (15 minutes allowed for this task). In one Reading to Integrate session, for unknown reasons, six of the seven participants read only one text. Because we cannot explain the cause of this anomalous session, we have eliminated scores from the session's seven participants from subsequent analyses, thus slightly reducing the N size for the Reading to Integrate measure.
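The timed portions of Sessions 3 and 4 follow the same pattern, which the sketch below totals. It assumes only the figures given in the procedure (a reading allowance of 100 words per minute, 4 minutes for notes, 15 minutes for the new task, and 15 minutes for the basic comprehension test); the function names are hypothetical.

```python
WORDS_PER_MINUTE = 100  # reading allowance stated in the procedure


def reading_minutes(word_counts, words_per_minute=WORDS_PER_MINUTE):
    """Reading time allowed for one or more texts, in minutes."""
    return sum(word_counts) / words_per_minute


def timed_minutes(word_counts, notes=4, new_task=15, basic_comprehension=15):
    """Total timed portion of a session, excluding instructions."""
    return reading_minutes(word_counts) + notes + new_task + basic_comprehension


session_3 = timed_minutes([1200])      # one 1200-word text
session_4 = timed_minutes([600, 600])  # two 600-word texts
```

Both sessions come to 46 timed minutes, which leaves the remainder of the roughly one-hour session for instructions and administration.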
b Variables used in study: The six independent variables included three nominal (Native Language Background, Medium of Text Presentation, and Level of Education) and three interval variables (Nelson–Denny, TOEFL Reading Comprehension, and Computer Familiarity). The four dependent variables were Reading to Learn, Reading to Integrate, BC1, and BC2.
IV Results
First, we present the descriptive statistics for all reading measures, followed by a systematic analysis of independent variables that might affect participant performance on the new measures. Scatterplots were checked for all reading measures to ensure normality of data. Kurtosis and skewness levels for all reading measures were found to be within normal limits, indicating a relatively normal distribution. Descriptive statistics for all existing measures are shown in Table 1. Means for these measures show a consistent pattern: the native speaker undergraduates had the highest mean, followed by the nonnative speaker graduates, followed by the nonnative speaker undergraduates. On the reading measures, Nelson–Denny and TOEFL Reading Comprehension, the nonnative speaker undergraduates showed the largest variance in performance, while on the computer familiarity measure the variance of both nonnative speaker groups was substantially larger than that of the native speakers.

The same pattern emerged for the means on the new measures (see Table 2) as for the existing measures. The native speaker undergraduate group performed better on all new measures than both of the nonnative speaker groups. The nonnative speaker graduate group performed better than the nonnative speaker undergraduate group on all measures as well. This robust pattern of performance was also found in the variance of three of the four new measures. On BC1 and BC2, the performance of the native speaker undergraduates showed the least amount of variance, followed by the nonnative speaker graduates, followed by the nonnative speaker undergraduates. On Reading to Integrate, the native speaker undergraduate group showed substantially less variance than the nonnative speaker groups; however, the variance of the two nonnative speaker groups was almost identical. On Reading to Learn, all three groups showed considerable variance.
Table 3 reveals the range of awarded points achieved by all partici-pant groups The nature of the Reading to Learn point system created amaximum possible point value (241) that no participant achieved Wespeculate that there are at least three possible causes of the discrepancybetween the theoretical maximum and the range of observed scores
Table 1 Descriptive statistics for existing measures for three participant groups

Group                            n     Mean      sd   kMax

Nelson–Denny
  NSU                          105   126.48   16.46    156
  NNSU                         106    67.24   31.91    156
  NNSG                          40    88.88   21.88    156
  Total participants           251    95.47   36.93    156

TOEFL Reading Comprehension
  NSU                          105    61.30    4.24     67
  NNSU                         106    50.30    8.53     67
  NNSG                          40    57.15    4.55     67
  Total participants           251    55.99    8.19     67

Computer familiarity
  NSU                          104    38.08    3.60     44
  NNSU                         104    34.82    5.99     44
  NNSG                          40    35.63    6.02     44
  Total participants           248    36.31    5.33     44

Note: kMax = number of items or maximum possible score.
• task novelty: no participant reported ever doing such a task previously;
• time allowed for task completion; and
• space on the response sheet: space constraints may have limited the amount of information that participants could include.
Future research would need to address these issues. However, for the Reading to Integrate measure, the full range of possible point totals was achieved by at least one participant in each group.
1 Computer familiarity
The overall plan for the analyses was to check the influence of the independent variables on the dependent measures, with computer familiarity being addressed first. Initially, we had proposed that if computer familiarity was significantly different across groups, it would be entered into all calculations as a covariate. To determine this, it was necessary to conduct an Analysis of Variance (ANOVA) for computer familiarity across the six participant/medium subgroups.
Table 2 Descriptive statistics for new measures for three participant groups

Group                                n     Mean      sd   kMax

Reading to Learn (chart)
  NSU                              105    51.85   19.86    241
  NNSU                             106    31.73   19.50    241
  NNSG                              40    44.68   19.27    241
  Total participants               251    42.21   21.64    241

Basic Comprehension Test 1
  NSU                              105    16.98    2.47     20
  NNSU                             106    11.73    4.25     20
  NNSG                              40    14.98    3.50     20
  Total participants               251    14.44    4.23     20

Reading to Integrate (synthesis)
  NSU                              101    63.65   11.05     80
  NNSU                             103    37.24   21.76     80
  NNSG                              40    53.60   21.03     80
  Total participants               244    50.86   21.63     80

Basic Comprehension Test 2
  NSU                              105    15.91    2.78     20
  NNSU                             106     9.75    4.54     20
  NNSG                              40    12.85    3.61     20
  Total participants               251    12.82    4.72     20

Notes: kMax = number of items or maximum possible score; n size reduced for Reading to Integrate because of anomalous testing session.
The resulting ANOVA (F = 4.70, p < .05) showed a significant difference between subgroups on the computer familiarity questionnaire; therefore, a post hoc Scheffé test was done to locate significant contrasts. After analysis of all possible subgroup contrasts, the post hoc Scheffé revealed that the only significant difference in subgroups appeared between the native speaker undergraduates and nonnative speaker undergraduates who read texts on paper. Hence, although there was one significant contrast, it occurred in two subgroups reading on paper, not in any of the subgroups who read on computer. All groups generally scored high on computer familiarity, although, as noted, variance of the nonnative groups was greater. It was thus established that computer familiarity had no significant effect on participants who read texts on computer, so we did not use computer familiarity as a covariate in further analyses and proceeded to the three research questions of central interest to this study.
Because Research Questions 1 and 2 are similar, except that they address the two different new reading measures (Reading to Learn and Reading to Integrate), we approached them in the same manner, through ANOVA, to identify the independent variables that could have significantly affected the results on the new measures.
2 Research Question 1
The first research question asked if performance on a measure of Reading to Learn was affected by medium of presentation, computer familiarity, native language or level of education. We calculated a univariate ANOVA with Type III sums of squares on Reading to Learn, with
Table 3 Range of scores for new measures for three participant groups

Group                                n   Minimum   Maximum   kMax

Reading to Learn (chart)
  NSU                              105        14       120    241
  NNSU                             106         0        86    241
  NNSG                              40         3        94    241
  Total participants               251         0       120    241

Reading to Integrate (synthesis)
  NSU                              101        38        80     80
  NNSU                             103         0        80     80
  NNSG                              40         5        80     80
  Total participants               244         0        80     80

Notes: kMax = number of items or maximum possible score; n size reduced for Reading to Integrate because of anomalous testing session.
group status, medium of text presentation and test order as possible contributing factors.⁴ Table 4 shows that there were no significant interactions for any of the group, medium or test order combinations. The only significant main effect was group membership.
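The F ratios in a univariate ANOVA table follow directly from the sums of squares and degrees of freedom: each mean square is SS/df, and each F is the effect mean square divided by the error mean square. As a hedged illustration (not the SPSS procedure itself), the sketch below recomputes the ratios from the Table 4 entries, assuming the flattened figures carry two decimal places; the helper name `f_ratios` is hypothetical.

```python
# Type III sums of squares and degrees of freedom transcribed from Table 4.
# Decimal placement is an assumption of this sketch.
table4 = {
    "Group": (21869.29, 2),
    "Medium": (1215.86, 1),
    "Test order": (294.98, 1),
    "Group x medium": (334.81, 2),
    "Group x test order": (4.37, 2),
    "Medium x test order": (391.73, 1),
    "Group x medium x test order": (575.29, 2),
}
error_ss, error_df = 92997.45, 239

def f_ratios(effects, error_ss, error_df):
    """Return F = (SS_effect / df_effect) / (SS_error / df_error) per effect."""
    ms_error = error_ss / error_df
    return {name: (ss / df) / ms_error for name, (ss, df) in effects.items()}

ratios = f_ratios(table4, error_ss, error_df)
for name, f_value in ratios.items():
    print(f"{name:30s} F = {f_value:.2f}")
```

Small differences in the second decimal place relative to the published table are to be expected, because the inputs here are already rounded.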
Because group membership was a combined measure that included both native language background and level of education, post hoc analysis was needed to identify the significant contrasts. Table 5 shows that there was a significant difference in performance on the Reading to Learn measure between the native speaker undergraduate and the nonnative speaker undergraduate groups, as well as a significant difference between the nonnative speaker undergraduate and nonnative speaker graduate groups. There was no significant difference in performance between the native speaker undergraduate and the nonnative speaker graduate groups. Therefore, the answer to Research Question 1 is that native language background and level of education did have a significant effect on performance on the Reading to Learn measure, but that medium of text presentation did not. Further, order of testing (whether participants took Reading to Learn or Reading to Integrate first) had no significant effect.
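The logic of a Scheffé contrast can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the authors' SPSS output: the error mean square (389.11) and error degrees of freedom (239) are taken from Table 4, the group sizes and Reading to Learn means from Table 2, and the critical value F(.05; 2, 239) ≈ 3.03 is an approximate tabled figure that does not appear in the article. A pairwise difference counts as significant when it exceeds sqrt((k − 1) × F_crit) times its standard error.

```python
import math

# Assumed inputs (see lead-in): MS_ERROR from Table 4, means from Table 2.
# F_CRIT is an approximate tabled value, not a figure reported in the article.
MS_ERROR, K_GROUPS, F_CRIT = 389.11, 3, 3.03
groups = {"NSU": (105, 51.85), "NNSU": (106, 31.73), "NNSG": (40, 44.68)}

def scheffe(a, b):
    """Return (mean difference, standard error, significant?) for one contrast."""
    (n_a, mean_a), (n_b, mean_b) = groups[a], groups[b]
    diff = abs(mean_a - mean_b)
    se = math.sqrt(MS_ERROR * (1 / n_a + 1 / n_b))
    # Scheffe criterion: the difference must exceed sqrt((k - 1) * F_crit) * SE.
    critical = math.sqrt((K_GROUPS - 1) * F_CRIT) * se
    return diff, se, diff > critical

for pair in [("NSU", "NNSU"), ("NSU", "NNSG"), ("NNSU", "NNSG")]:
    diff, se, significant = scheffe(*pair)
    print(pair, round(diff, 2), round(se, 2), significant)
```

Under these assumptions the computed standard errors can be compared against the Standard error column of Table 5, and the significance flags against the contrasts described in the text.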
3 Research Question 2
The second research question, related to the first, asked if performance on Reading to Integrate was affected by medium of presentation,
Table 4 Performance on Reading to Learn measure by groups, medium and test order (n = 251) (univariate analysis of variance)

Source                           Type III sum    df   Mean square       F
                                 of squares

Group                               21869.29      2      10934.64   28.10*
Medium                               1215.86      1       1215.86    3.13
Test order                            294.98      1        294.98    0.76
Group × medium                        334.81      2        167.40    0.43
Group × test order                      4.37      2          2.19    0.01
Medium × test order                   391.73      1        391.73    1.01
Group × medium × test order           575.29      2        287.65    0.74
Error                               92997.45    239        389.11

Note: *p < .05.
⁴ Test order was added as an additional variable to double check that our counterbalancing had been effective in controlling for any practice effect.
computer familiarity, native language or level of education. Again, to ensure that counterbalancing of tests controlled for any practice effect, test order was added as an additional variable.
To answer this question, we proceeded to calculate a univariate ANOVA on the Reading to Integrate measure, with group status, medium of text presentation and test order entered as possible contributing factors. The results (Table 6) show, as for Research Question 1, that there were no significant interactions for any of the group, medium or test order combinations; the only significant main effect was group membership. The answer for Research Question 2 is that native language background and educational level had a significant effect on Reading to Integrate, but medium of text presentation did not. Post hoc analysis of group contrasts showed that all three groups were distinct in their performance on Reading to Integrate (see Table 7).
4 Research Question 3
The third research question asked to what extent measures of basic comprehension, Reading to Learn and Reading to Integrate were
Table 5 Post hoc Scheffé for Reading to Learn measure (n = 251)

Group     n     Group     n     Mean difference   Standard error

NSU     105     NNSU    106          20.12*             2.72
                NNSG     40           7.17              3.67
NNSU    106     NNSG     40          12.95*             3.66

Note: *p < .05.
Table 6 Performance on Reading to Integrate measure by groups, medium and test order (n = 244ᵃ) (univariate analysis of variance)

Source                           Type III sum    df   Mean square       F
                                 of squares

Group                               36294.33      2      18147.17   55.82ᵇ
Medium                                192.95      1        192.95    0.59
Test order                             97.83      1         97.83    0.30
Group × medium                         11.82      2          5.91    0.02
Group × test order                     30.14      2         15.07    0.05
Medium × test order                  1098.72      1       1098.72    3.38
Group × medium × test order          1488.58      2        744.29    2.29
Error                               75430.37    232        325.13

Notes: ᵃ n size reduced for Reading to Integrate because of anomalous testing session; ᵇ p < .05.
related. We used correlational analysis as the first step in answering this question. Results for the total participant population (see Table 8) showed moderate to high correlations across all reading measures. However, the analyses done for Research Questions 1 and 2 revealed that group status had a significant effect on performance on Reading to Learn and Reading to Integrate. Further, we realize that correlations are sensitive to variance, so the high correlations seen in the total population could have been an artifact of combining the three groups. Therefore, we examined the correlations among all reading measures for each group (available in Trites, 2000: Appendix 1, pp. 230–33). While the reading measures were still correlated, often moderately, sometimes highly, magnitudes differed and sometimes dropped substantially. The text-specific multiple-choice measures, BC1 and BC2, consistently correlated more highly with the Nelson–Denny and TOEFL Reading Comprehension tests than with Reading to Learn and Reading to Integrate based on the same texts, suggesting a test method or construct effect. Because comparisons between different measures of basic comprehension were not a goal of the project, BC1 and BC2 were not used in further analyses. We conclude that, as expected, all reading measures were related, but the lower correlations between Reading to Learn and Reading to Integrate and the traditional basic comprehension measures led us to consider further types of analysis to identify the possible distinctiveness of the new measures.
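The concern that pooled correlations can be inflated by combining groups with different average levels can be illustrated with a small constructed example. The scores below are hypothetical and are not data from the study; `pearson_r` is a plain implementation of the Pearson product-moment correlation written for this sketch. Each group shows only a modest within-group relationship, yet pooling the two groups produces a much higher coefficient because the between-group difference dominates the variance.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation for two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical scores: two groups with the same modest within-group relationship
# but very different average levels on both measures.
low_x, low_y = [0, 1, 2, 3], [1, 0, 3, 2]
high_x, high_y = [10, 11, 12, 13], [11, 10, 13, 12]

r_low = pearson_r(low_x, low_y)
r_high = pearson_r(high_x, high_y)
r_pooled = pearson_r(low_x + high_x, low_y + high_y)
print(round(r_low, 2), round(r_high, 2), round(r_pooled, 2))
```

This is why examining correlations separately for each participant group, as described above, is a reasonable safeguard before interpreting the total-sample coefficients.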
5 Discriminant analysis
Because we were interested in determining how constructs differed, we sought additional analyses to help us better characterize the new constructs. Of the several possible statistical methods that could have been employed, two are most plausible: multivariate analysis of variance, usually associated with experimental research,
Table 7 Post hoc Scheffé for Reading to Integrate measure (n = 244ᵃ)

Group     n     Group     n     Mean difference   Standard error

NSU     101     NNSU    103          26.41ᵇ             2.53
                NNSG     40          10.05ᵇ             3.37
NNSU    103     NNSG     40          16.36ᵇ             3.36

Notes: ᵃ n size reduced for Reading to Integrate because of anomalous testing session; ᵇ p < .05.
Table 8 Correlations for all reading measures for all participants (n = 251)

                                TOEFL Reading    Basic Comprehension   Basic Comprehension   Reading    Reading to
                                Comprehension    Test 1                Test 2                to Learn   Integrateᵃ

Nelson–Denny                        .90ᵇ              .85ᵇ                  .84ᵇ               .66ᵇ       .69ᵇ
TOEFL Reading Comprehension        1.00               .85ᵇ                  .84ᵇ               .64ᵇ       .69ᵇ
Basic Comprehension 1                                1.00                   .84ᵇ               .68ᵇ       .68ᵇ
Basic Comprehension 2                                                      1.00                .68ᵇ       .70ᵇ
Reading to Learn                                                                              1.00        .59ᵇ

Notes: ᵃ n size reduced for Reading to Integrate because of anomalous testing session; ᵇ p < .05.
and discriminant analysis, usually associated with descriptive research (Tabachnick and Fidell, 1996). The present research was conducted with samples of naturally occurring student groups and was not experimental. Moreover, we were interested in finding ways to compare participant performance on the measures of the new constructs, Reading to Learn and Reading to Integrate, with performance on more traditional measures of basic comprehension. Thus, we opted to use discriminant analysis because of its parsimony of description and clarity of interpretation (Stevens, 1996). Discriminant analysis, a technique recommended to describe group differences or predict group membership based on a comparison of multiple predictors (Huberty, 1994), has been used in other areas of applied linguistic research to investigate creation of a student profile of success or failure on Computer Assisted Language Learning (CALL) lessons (Jamieson et al., 1993) and accurate classification of text types into registers (Biber, 1993), among other purposes.
To further distinguish basic comprehension from the new constructs, we conducted discriminant analysis on each of the two language groups (native and nonnative) to determine whether Reading to Learn and Reading to Integrate would classify participants in the same way that Basic Comprehension would. We divided the native speaker and nonnative speaker groups into three levels (high, middle (mid) and low reading ability scorers) based on the basic comprehension measure chosen for that group (Nelson–Denny for native speakers; TOEFL Reading Comprehension for nonnative speakers). Research methodologists (Tabachnick and Fidell, 1996: 513) note that robustness is expected when the smallest group has at least 20 cases; our smallest group was 25. To check the assumption of homogeneity of the variance/covariance matrices, we examined the outcomes of Box's M test and found them all nonsignificant (Klecka, 1980). Each group was checked for outliers using Mahalanobis distance treated as Chi-Square, and no outliers were found (Tabachnick and Fidell, 1996). Thus, the data met all assumptions required for use of discriminant analysis.
a Discriminant analysis for nonnative speakers: To organize the discriminant analysis in order to see if the Reading to Learn/Reading to Integrate Composite classified participants similarly to the measure of Basic Comprehension (for nonnative speakers, TOEFL Reading Comprehension), we divided the entire nonnative speaker group (n = 146) into three levels of basic comprehension: high, mid
and low. These reading ability groups of high, mid and low were based on typical TOEFL Reading Comprehension score levels required for program entry. Participants were classified as high if their scores were greater than or equal to 550, or 56 and above on the scaled score on the TOEFL Reading Comprehension (550 is the cut score often used for graduate entry). Participants were classified as mid if scores ranged between 500 and 549, or 50 to 55 on the TOEFL Reading Comprehension. Participants were classified as low if their scores fell below 500, or 49 and below on TOEFL Reading Comprehension (500 is a minimum TOEFL score sometimes used for undergraduate admission, often with the proviso that students enroll in ESL classes either prior to official enrollment or concurrently). For our entire nonnative speaker group, descriptive statistics on basic comprehension reading ability group membership levels appear in Table 9.
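The three-way grouping rule just described can be written as a short function. This is only a sketch of the stated cut points; the function name `toefl_reading_level` and the sample scores are hypothetical and are not taken from the study's data.

```python
def toefl_reading_level(scaled_score):
    """Assign a reading ability level from a scaled TOEFL Reading score."""
    if scaled_score >= 56:   # corresponds to a total score of 550 or above
        return "high"
    if scaled_score >= 50:   # 50 to 55, corresponding to 500-549
        return "mid"
    return "low"             # 49 and below, corresponding to below 500

# Hypothetical scaled scores used only to exercise each branch.
sample_scores = [61, 56, 55, 50, 49, 38]
levels = [toefl_reading_level(score) for score in sample_scores]
print(levels)
```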
We ran SPSS Discriminant Analysis with initial grouping variables of high, mid and low reading ability. We compared initial reading ability levels with high, mid and low categories on the Reading to Learn/Reading to Integrate Composite, a new variable reflecting level of performance on the Reading to Learn and Reading to Integrate measures combined.⁵ The discriminant analysis yielded one discriminant function with an eigenvalue of .92, responsible for 99.9% of the variance in outcomes. Wilks' Lambda for this function was .52, significant at .001; the associated Chi-Square value was extremely large (90.93) and highly significant (p < .001), indicating that the group centroids on the composite Reading to Learn/Reading to Integrate function for the three nonnative speaker reading ability groups were significantly different. Both Reading to Learn and
Table 9 Descriptive statistics for nonnative speakers by TOEFL Reading Comprehension reading ability groups (n = 143)

Reading ability group      n     Mean on TOEFL Reading Comprehension     sd

High (≥56)                57                   59.68                    3.03
Mid (50–55)               42                   52.81                    1.67
Low (≤49)                 44                   42.11                    5.47
Total                    143                   52.26                    8.22

Note: Total n = 143 due to loss of three cases in anomalous Reading to Integrate session.
⁵ We first calculated two separate discriminant analyses, one for Reading to Learn and one for Reading to Integrate, but we found that both loaded on a single function, so we used the composite in subsequent analyses.
Reading to Integrate loaded significantly on the discriminant function, at .86 for Reading to Learn and .78 for Reading to Integrate (p < .05).
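The relationship among the eigenvalue, Wilks' Lambda and the Chi-Square statistic can be sketched numerically. The code below is an approximation under stated assumptions rather than a reproduction of the SPSS output: it treats the second eigenvalue as negligible (only the first, .92, is reported), takes Lambda as the product of 1/(1 + eigenvalue), and uses Bartlett's chi-square approximation, −[N − 1 − (p + g)/2] ln Λ with p(g − 1) degrees of freedom, where p = 2 predictors and g = 3 groups. The function names are hypothetical.

```python
import math

def wilks_lambda(eigenvalues):
    """Wilks' Lambda as the product of 1 / (1 + eigenvalue) over all functions."""
    result = 1.0
    for eigenvalue in eigenvalues:
        result *= 1.0 / (1.0 + eigenvalue)
    return result

def bartlett_chi_square(lam, n_cases, n_predictors, n_groups):
    """Bartlett's chi-square approximation and its degrees of freedom."""
    multiplier = n_cases - 1 - (n_predictors + n_groups) / 2
    df = n_predictors * (n_groups - 1)
    return -multiplier * math.log(lam), df

# Only the first eigenvalue (.92) is reported; the second is assumed to be
# negligible because the first accounts for 99.9% of the variance.
lam = wilks_lambda([0.92])
chi2, df = bartlett_chi_square(lam, n_cases=143, n_predictors=2, n_groups=3)
print(round(lam, 2), round(chi2, 1), df)
```

Because the inputs are rounded, the result should be read as a consistency check on the reported Lambda (.52) and Chi-Square (90.93), not as an exact recomputation.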
Over half (64.9%) of the high reading ability group remained high on the new measure; less than half (42.9%) of the mid reading ability group remained classified as mid. Hence, the Reading to Learn/Reading to Integrate Composite was particularly influential in reclassifying the mid reading ability group and, to a lesser extent, the high group. However, most (81.8%) of the low reading ability group remained low on the Reading to Learn/Reading to Integrate Composite (see Table 10). Of the 143 nonnative speaker participants, 91 (63.6%) remained in the initial basic comprehension category on the composite; the rest moved, but in different directions. Twenty-one participants (14.7%) were classified into a higher category on the Reading to Learn/Reading to Integrate Composite than their initial basic comprehension level would have suggested, while 31 (21.7%) were reclassified into a lower category. Thus, 52 participants (36.4%), just over one third of the sample, were classified differently based on their Reading to Learn/Reading to Integrate Composite performance.
b Discriminant analysis for native speakers: Because one of our goals in this project was to probe the possible validity of these new measures by assessing performance of two groups, native as well as
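The reclassification tallies can be derived mechanically from the classification counts in Table 10. The following sketch is illustrative; the helper `reclassification_summary` is hypothetical. It counts participants on the diagonal (same category), below the diagonal (moved to a higher category) and above it (moved to a lower category), with rows and columns ordered high, mid, low.

```python
# Rows: initial TOEFL reading ability group (high, mid, low).
# Columns: predicted group on the composite (high, mid, low). Counts from Table 10.
table10 = [
    [37, 17, 3],
    [13, 18, 11],
    [2, 6, 36],
]

def reclassification_summary(matrix):
    """Tally participants who stayed, moved higher, or moved lower."""
    size = len(matrix)
    total = sum(sum(row) for row in matrix)
    same = sum(matrix[i][i] for i in range(size))
    # With categories ordered high to low, a column index smaller than the row
    # index means the participant moved to a higher category.
    higher = sum(matrix[i][j] for i in range(size) for j in range(i))
    lower = total - same - higher
    return {"total": total, "same": same, "higher": higher, "lower": lower}

summary = reclassification_summary(table10)
for key in ("same", "higher", "lower"):
    print(key, summary[key], f"{100 * summary[key] / summary['total']:.1f}%")
```

The same function applies unchanged to any other square classification table with categories in the same order.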
Table 10 Discriminant analysis comparison of nonnative speaker reading ability groups with Reading to Learn/Reading to Integrate Composite (n = 143)

                            Predicted group membership for Reading to
                            Learn/Reading to Integrate Composite         Initial
Reading ability group         High        Mid        Low                 classification total

Count
  High (≥56)                    37         17          3                      57
  Mid (50–55)                   13         18         11                      42
  Low (≤49)                      2          6         36                      44
  Reclassification total        52         41         50                     143

Percentage
  High (≥56)                  64.9       29.8        5.3                   100.0
  Mid (50–55)                 31.0       42.9       26.2                   100.0
  Low (≤49)                    4.5       13.6       81.8                   100.0

Note: Total n = 143 due to loss of three cases in anomalous Reading to Integrate session.
nonnative speakers, we conducted a parallel discriminant analysis for native speakers. Thus, for the native speakers, we followed the same procedure, dividing the entire native speaker group (n = 105) into three levels of basic comprehension (high, mid and low) based on score distances of .5 standard deviations from the sample mean of the Nelson–Denny test for these participants. Native speakers need not take reading comprehension tests when entering the university, so the three-way split was based entirely on our sample data. Participants were classified as high if their scores on the Nelson–Denny were greater than or equal to 135. Participants were classified as mid if their scores ranged from 119 to 134 on the Nelson–Denny. Participants were classified as low if their scores fell at or below 118 on the Nelson–Denny. Descriptive statistics for the entire native speaker sample on basic comprehension group membership appear in Table 11.
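The cut points can be checked against the sample statistics. The sketch below assumes, which the article does not state explicitly, that the boundaries were obtained by taking the mean plus or minus .5 standard deviations and rounding to whole Nelson–Denny scores; it also assumes the Table 1 values for the native speaker group are 126.48 (mean) and 16.46 (sd). The function name `half_sd_cutoffs` is hypothetical.

```python
import math

def half_sd_cutoffs(mean, sd, width=0.5):
    """Return (lowest 'high' score, highest 'low' score) for integer scores."""
    upper = mean + width * sd
    lower = mean - width * sd
    # Scores are whole numbers, so round the upper bound up and the lower down.
    return math.ceil(upper), math.floor(lower)

# Nelson-Denny mean and sd for the native speaker group (Table 1, n = 105).
high_min, low_max = half_sd_cutoffs(126.48, 16.46)
print(high_min, low_max)
```

Scores between the two returned values fall in the mid group under this reading of the procedure.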
The discriminant analysis yielded one discriminant function with an eigenvalue of .31, responsible for 98.7% of the variance in outcomes. Wilks' Lambda for this function was .76, significant at .001; the associated Chi-Square value was large (26.39) and highly significant (p < .001), indicating that the group centroids on the discriminant function for the three reading ability groups on the Reading to Learn/Reading to Integrate Composite were significantly different. Pooled within-groups correlations between discriminating variables showed that Reading to Learn correlated with the first discriminant function at a level of .81; Reading to Integrate correlated with the second discriminant function at .71. This contrasts with findings for the nonnative speakers, where scores on the combined new measures loaded significantly on only one discriminant function. For native speakers, then, there is evidence for two significant discriminant functions, although the first accounts for almost all of the variance. Although these two measures (Reading to Learn and Reading to Integrate) loaded on two separate discriminant functions, they still
Table 11 Descriptive statistics for native speakers by Nelson–Denny reading ability groups (n = 101)

Reading ability group      n     Mean on Nelson–Denny       sd

High (≥135)               39           141.26              5.19
Mid (119–134)             37           125.65              5.09
Low (≤118)                25           102.88             10.98
Total                    101           126.04             16.52

Note: Total n = 101 for discriminant analysis due to loss of four cases in anomalous Reading to Integrate session.
showed moderate correlations with the alternate function,⁶ justifying the composite calculations.
Results of discriminant analysis for native speakers, seen in Table 12, show a different pattern than that observed for nonnative speakers. Nearly three-fourths (71.8%) of the native speakers classified as high in the basic comprehension reading ability group remained high on the Reading to Learn/Reading to Integrate Composite. For the mid group, however, only 18.9% remained classified as mid. Just over half (52%) of the low reading ability group members remained low on the Reading to Learn/Reading to Integrate Composite. As with the nonnative speakers, participants in the mid category on basic comprehension showed the most frequent reclassification. Forty-eight of the 101 (47.5%) native speaker participants remained in the initial classification categories. Twenty-two (21.8%) were reclassified into a higher category and 31 (30.7%) were reclassified into a lower category. Thus, 53 participants (52.5%), over half of the sample, were classified differently based on their performance on the Reading to Learn/Reading to Integrate Composite.
⁶ Reading to Learn correlated with function 2 at −.58; Reading to Integrate with function 1 at .71.
Table 12 Discriminant analysis comparison of native speaker reading ability groups with Reading to Learn/Reading to Integrate Composite (n = 101)

                            Predicted group membership for Reading to
                            Learn/Reading to Integrate Composite         Initial
Reading ability group         High        Mid        Low                 classification total

Count
  High (≥135)                   28          3          8                      39
  Mid (119–134)                 10          7         20                      37
  Low (≤118)                     5          7         13                      25
  Reclassification total        43         17         41                     101

Percentage
  High (≥135)                 71.8        7.7       20.5                   100.0
  Mid (119–134)               27.0       18.9       54.1                   100.0
  Low (≤118)                  20.0       28.0       52.0                   100.0

Note: Total n = 101 for discriminant analysis due to loss of four cases in anomalous Reading to Integrate session.
V Interpretation
Analyses done to answer Research Questions 1 and 2 showed that performance on Reading to Learn and Reading to Integrate measures was significantly influenced by language background and level of education (graduate vs. undergraduate, for nonnative speakers only). Moreover, level of computer familiarity had no significant effect on Reading to Learn and Reading to Integrate performance.
Correlations showed that, as expected, all reading measures correlated to some degree, generally answering Research Question 3. The most interesting results came from the discriminant analyses because they showed a pattern of differential classification on the new tasks, sensitive to initial reading ability. This pattern differed for nonnative speakers and native speakers. Examination of the reclassifications based on the new tasks revealed some of the problems of using a basic comprehension-only test to predict performance on more challenging literacy tasks. These results also imply the need for further work on new measures of advanced literacy skills, such as Reading to Learn and Reading to Integrate, to reflect trends in construct-driven assessment (Pellegrino et al., 1999).
For the nonnative speakers, most (81.8%) of the participants with TOEFL Reading Comprehension below 50 remained classified as low on the Reading to Learn/Reading to Integrate Composite, suggesting the existence of a lower threshold of academic English proficiency. Participants below this threshold were unlikely to perform well on tasks assessing Reading to Learn and Reading to Integrate. Even tasks involving only selection (such as BC1 and BC2), rather than production of responses, were difficult for this group. However, for the nonnative speaker readers in the mid and high reading ability groups, basic comprehension level was not nearly as consistent a predictor of performance on the Reading to Learn/Reading to Integrate Composite. This was especially striking for those in the mid reading ability group. Approximately one fourth of the mid group dropped into the low category on the Reading to Learn/Reading to Integrate Composite, indicating that they experienced more difficulty in completing the new measures, but approximately one fourth could indeed categorize and synthesize information relatively well, despite middling performance on basic comprehension. This finding suggests that, for nonnative speaker readers in the mid reading ability category, basic comprehension measures were insufficient to predict their performance on the new measures.
Almost two-thirds of the high basic comprehension nonnative speakers remained high on the Reading to Learn/Reading to Integrate Composite, but one third dropped, suggesting that they could not categorize and synthesize information from texts as well as they could recognize information. For this third, the recognition-only task, multiple-choice basic comprehension, overestimated ability to manipulate and synthesize information. For the high and mid reading ability nonnative speaker groups, basic comprehension-only tests may then overestimate academic English proficiency.
Results for the native speakers showed a more definitive pattern of reclassification. For the low reading ability group, approximately half were misclassified, suggesting that the basic comprehension measure underrepresented their ability to categorize and synthesize information in academic texts. Most of the mid reading ability group were reclassified as lower on the Reading to Learn/Reading to Integrate Composite, showing that basic comprehension results overestimated ability to succeed on more challenging tasks. On the other hand, almost one third of the mid reading ability group did better on the Reading to Learn/Reading to Integrate Composite; for them, basic comprehension underrepresented their ability to complete more challenging reading tasks. For the high ability readers, nearly one third dropped on the Reading to Learn/Reading to Integrate Composite. For them, the basic comprehension measure overestimated their level of academic English proficiency. The other two thirds of the high reading ability group remained high, suggesting a higher threshold of academic English proficiency.
Considering results from both the nonnative speakers and native speakers, we conclude that the new tasks did assess something different from basic comprehension, once a lower level threshold of basic academic English proficiency had been achieved. Examination of scatterplots based on discriminant functions indicated an obvious separation between participants who could and could not perform on the Reading to Learn/Reading to Integrate Composite, thus providing some tentative evidence for concurrent validity (Messick, 1989; Chapelle, 1999). We had hoped to find clear evidence of a hierarchy, suggesting that Reading to Learn was demonstrably more difficult than basic comprehension, and Reading to Integrate demonstrably more difficult than Reading to Learn, but results did not yield an obvious hierarchy. Results did, however, suggest an even simpler pattern: a dichotomy. For those nonnative speakers above the lower threshold of English language proficiency, which in our data would be approximately 500, or 49 on the scaled TOEFL Reading
Comprehension scores, some could perform well on the new measures and some could not, revealing two rather than three groups on the Reading to Learn/Reading to Integrate Composite. These results lead us to speculate that the new measures tap additional skills, such as 'sophisticated discourse processes and critical thinking skills' (Enright and Schedl, 1999: 24), in addition to language proficiency. For native speakers, the discriminant loading suggested a possible hierarchy of difficulty, but the scatterplots and classification tables revealed a dichotomy. Reclassification based on the Reading to Learn/Reading to Integrate Composite nearly eliminated the mid basic comprehension reading ability category, with only 16.8% of the native speakers qualifying as mid on the Reading to Learn/Reading to Integrate Composite.
VI Conclusions
While we acknowledge that there are some limitations to this project, such as the time required to administer and score new measures, the results are illuminating and useful, and they suggest some considerations for future research. The first consideration relates to appropriate classification of students based on academic reading abilities. This corresponds to the use of TOEFL as a gatekeeper by many institutions. For TOEFL test-takers whose current TOEFL scores would be in an intermediate to high intermediate range, the new tasks could assess additional abilities relevant to academic performance. For such participants, additional test development efforts are warranted. The majority of high reading ability nonnative speakers (64.9%) could perform the new measures; a third could not. Our results suggest that some unknown number of mid reading ability nonnative speakers are more capable of succeeding at more challenging academic reading tasks than their current level of basic comprehension assessment would indicate. At the same time, there are some high and mid reading ability participants who could not perform the more demanding tasks; admitting such students directly into university programs could result in failure. These results suggest that, for most students at lower levels of basic comprehension (in our study, those with TOEFL scores below 490 to 500), development of new tasks is unnecessary. Nevertheless, our results indicate that a certain small percentage (in our sample, 8 students, 18.2%) of nonnative speakers classified as low reading ability on basic comprehension did in fact
perform adequately on more challenging tasks like Reading to Learn and Reading to Integrate. Given the very large number of TOEFL test-takers around the world, this finding merits further research. It would be potentially unfair to exclude such students from university study based on basic comprehension-only measures such as the current TOEFL. We would urge institutions that use TOEFL to consider scores on these new, more demanding tasks if doing so would address institutional needs. According to interview data collected in conjunction with this project (Trites, 2000: Chapter 6), participants perceived that successful completion of the new tasks required a thorough understanding of the texts, whereas the multiple-choice tests were less demanding, requiring only superficial grasp of content. This finding corroborates the observation of Freedle and Kostin (1999: 3), who note that often examinees do not need to comprehend the accompanying text to answer the test item. Therefore, for all TOEFL test-takers, incorporating more challenging tasks such as the Reading to Learn and Reading to Integrate measures into typical English language instruction could have positive washback effects.
The second area of consideration is the focus on additional relevant research. For predictive validity, it is important to determine the correlation between the Reading to Learn/Reading to Integrate Composite measures and actual academic performance in university classes. Correlations might differ depending on the degree of categorization and synthesis required by different major fields of study. Another area for future research is related to the novelty of the Reading to Learn task as an assessment technique. Current pedagogical trends in literacy instruction emphasize the use of graphic organizers as a means to understand and manipulate information in texts. To our knowledge, graphic organizers are typically used as classroom activities rather than assessment tools. They have good potential for use in assessment if students are familiar with them and if appropriate scoring systems can be developed. This project demonstrates that it is possible, though labor intensive, to develop reliable scoring systems. Further research is needed to explore refinement of this task type and related scoring systems for use in testing programs with large numbers of test-takers and scorers. Although the Reading to Integrate task (generating a synthesis) was more familiar, the scoring system was innovative because it reflected a reader's ability to recognize textual frames as well as integrate information. If test developers are interested in the abilities of nonnative speaker readers to perform such tasks, further development of similar tasks and scoring systems is warranted. While a substantial investment of time
would be required to refine the administration and scoring systems needed for more complex tasks such as these, it should be weighed against the possible danger of under-representing the likelihood of academic success based on results of the basic comprehension-only measures still most often used to assess academic proficiency.
Acknowledgements
This project was funded by a grant from Educational Testing Service as part of the TOEFL 2000 effort. We appreciate the cooperation of the ETS staff members who assisted in the development of passage-specific tasks and other aspects of the research; however, no official endorsement of Educational Testing Service should be inferred.
VII References
Bachman, L. 2000: Modern language testing at the turn of the century: assuring that what we count counts. Language Testing 17, 1–42.
Biber, D. 1993: Using register-diversified corpora for general language studies. Computational Linguistics 19, 219–41.
Britt, M., Rouet, J. and Perfetti, C. 1996: Using hypertext to study and reason about historical evidence. In Rouet, J., Levonen, J., Dillon, A. and Spiro, R., editors, Hypertext and cognition. Mahwah, NJ: Lawrence Erlbaum, 43–72.
Chapelle, C. 1999: Validity in language assessment. Annual Review of Applied Linguistics 19, 1–19.
Educational Testing Service 1997: TOEFL test and score manual. Princeton, NJ: Educational Testing Service.
Educational Testing Service 1998: Draft TOEFL 2000 research agenda framework: areas of research. Research agenda three. TOEFL 2000 internal document. Princeton, NJ: Educational Testing Service.
Eignor, D., Taylor, C., Kirsch, I. and Jamieson, J. 1998: Development of a scale for assessing the level of computer familiarity of TOEFL examinees. TOEFL Research Report No. 60. Princeton, NJ: Educational Testing Service.
Enright, M. and Schedl, M. 1999: Reading for a reason: using reader purpose to guide test design. TOEFL 2000 Internal Report. Princeton, NJ: Educational Testing Service.
Enright, M., Grabe, W., Mosenthal, P., Mulcahy-Ernt, P. and Schedl, M. 1998: A TOEFL 2000 framework for testing reading comprehension: a working paper. Princeton, NJ: Educational Testing Service.
Foltz, P. 1996: Comprehension, coherence and strategies in hypertext. In Rouet, J., Levonen, J., Dillon, A. and Spiro, R., editors, Hypertext and cognition. Mahwah, NJ: Lawrence Erlbaum, 109–36.
Freedle, R. and Kostin, I. 1999: Does the text matter in a multiple choice test of comprehension? The case for the construct validity of TOEFL's minitalks. Language Testing 16, 2–32.
Goldman, S. 1997: Learning from text: reflections on the past and suggestions for the future. Discourse Processes 23, 357–98.
Hayes, J. and Hatch, J. 1999: Issues in measuring reliability: correlation versus percentage of agreement. Written Communication 16, 354–67.
Huberty, C. 1994: Applied discriminant analysis. New York: John Wiley.
Jamieson, J., Campbell, J., Norfleet, L. and Berbisada, N. 1993: Reliability of a computerized scoring routine for an open-ended task. System 21, 305–22.
Jamieson, J., Norfleet, L. and Berbisada, N. 1993: Successes, failures and dropouts in computer-assisted language lessons. Computer Assisted English Language Learning Journal 4, 12–20.
Klecka, W. 1980: Discriminant analysis. In Lewis-Beck, M., editor, Quantitative applications in the social sciences, Volume 19. Newbury Park, CA: Sage.
Lehto, M., Zhu, W. and Carpenter, B. 1995: The relative effectiveness of hypertext and text. International Journal of Human-Computer Interaction 7, 293–313.
McNamara, D. and Kintsch, W. 1996: Learning from texts: effects of prior knowledge and text coherence. Discourse Processes 22, 247–88.
Messick, S. 1989: Validity. In Linn, R.L., editor, Educational measurement, 3rd edition. New York: American Council on Education/Macmillan, 13–103.
Meyer, B. 1985a: Prose analyses: purposes, procedures and problems. In Britton, B. and Black, J., editors, Understanding expository text. Hillsdale, NJ: Lawrence Erlbaum, 11–64.
Meyer, B. 1985b: Prose analysis: purposes, procedures and problems, Part 2. In Britton, B. and Black, J., editors, Understanding expository text. Hillsdale, NJ: Lawrence Erlbaum, 269–97.
Monks, V. 1997, August/September: Two views, same waterway. National Wildlife 35, 36–37.
Pellegrino, J., Baxter, G. and Glaser, R. 1999: Addressing the 'two disciplines' problem: linking theories of cognition and learning with assessment and instructional practice. Review of Research in Education 24, 307–53.
Perfetti, C. 1997: Sentences, individual differences and multiple texts: three issues in text comprehension. Discourse Processes 23, 337–55.
Perfetti, C., Britt, M.A. and Georgi, M. 1995: Text-based learning and reasoning: studies in history. Hillsdale, NJ: Lawrence Erlbaum.
Perfetti, C., Marron, M. and Foltz, P. 1996: Sources of comprehension failure: theoretical perspectives and case studies. In Cornoldi, C. and Oakhill, J., editors, Reading comprehension difficulties: processes and intervention. Mahwah, NJ: Lawrence Erlbaum, 137–65.
Reinking, D. 1988: Computer-mediated text and comprehension differences: the role of reading time, reader preference and estimation of learning. Reading Research Quarterly 23, 484–500.
Reinking, D. and Schreiner, R. 1985: The effects of computer-mediated text on measures of reading comprehension and reading behavior. Reading Research Quarterly 20, 536–53.
Spivey, N. 1997: The constructivist metaphor: reading, writing and the making of meaning. San Diego, CA: Academic Press.
Stevens, J. 1996: Applied multivariate statistics for the social sciences, 3rd edition. Mahwah, NJ: Lawrence Erlbaum.
Tabachnick, B. and Fidell, L. 1996: Using multivariate statistics, 3rd edition. New York: Harper Collins.
Taylor, C., Jamieson, J., Eignor, D. and Kirsch, I. 1998: The relationship between computer familiarity and performance on computer-based TOEFL test tasks. TOEFL Research Report No. 61. Princeton, NJ: Educational Testing Service.
Tennesen, M. 1997, November/December: On a clear day. National Parks 71, 26–9.
Trites, L. 2000: Beyond basic comprehension: reading to learn and reading to integrate for native and non-native speakers. Unpublished doctoral dissertation, Northern Arizona University, Flagstaff, AZ.
Van den Berg, S. and Watt, J. 1991: Effects of educational setting on student responses to structured hypertext. Journal of Computer-Based Instruction 18, 118–24.
Van Dijk, T.A. and Kintsch, W. 1983: Strategies of discourse comprehension. New York: Academic Press.
Wiley, J. and Voss, J. 1999: Constructing arguments from multiple sources: tasks that promote understanding and not just memory for text. Journal of Educational Psychology 91, 310–11.
Zimmerman, T. 1997, December 29: Filter it with billions and billions of oysters: how to revive the Chesapeake Bay. US News and World Report 123, 63.
Appendix 1a Chart completion task
Directions: Complete the following chart. Fill in as much detail as possible from the text read by categorizing the information into the different areas on the chart. Do not use single words for your responses; form your responses in phrases or complete sentences. Include examples from the text.
Make no judgments about the accuracy of causes or effects or the effectiveness of the solutions mentioned in the text. Solutions are seen
as any action taken in response to the problem(s). Solutions can take many forms, such as proposed solutions, attempted solutions or failed solutions. Also, space is provided under each category for examples. Examples are specific examples found in the text that are used by the author(s) to exemplify the problems, causes, effects or solutions in the text. However, there may not be examples for every category.
Points will be awarded for correct responses only. There is no penalty for incorrect responses. Points will be awarded in the following manner:
Problems and Solutions: 10 points each
Causes and Effects: 5 points each
Examples: 1 point each
Problems Causes Effects Solutions
Examples Examples Examples Examples
Appendix 1b Reading to Learn scoring rubric (0–241 total possible)

Problems (10 points; 1 word, 6 points): 40 maximum
• Air pollution/Smog in the National Parks
• Opposition or Ignoring (or lack of cooperation) to environmental standards (regulations)
• No true 'point sources' (myriad of smaller pollution sources rather than one large source)
• National Parks limited jurisdiction of pollution sources outside of the parks

Causes (5 points; 1 word, 3 points): 55 maximum
• Sulfur/nitrogen emissions
• Acid rain
• Ground-level ozone
• Urban/Industrial emissions
• Automobile emissions
• Power plants/factories/plants emissions
• Emissions from Smokestacks (chimneys)
• Emissions from Kilns
• Unregulated pollution/smog (from other countries, Mexico)
• Smoke from Controlled burns
• Emissions from large cities

Effects (5 points; 1 word, 3 points): 30 maximum
• Trees and plants affected (injured), growth hindered
• Visibility decreased
• Metals loosened into waters (surface/ground water, water polluted)
• Nutrients removed (leached) from soil and/or plants
• Public outcry over stance of big business/government
• Aquatic life injured (damaged)

Solutions (10 points; 1 word, 6 points): 90 maximum
• Stricter Environmental Laws (resolutions, acts)
• 1977 Clean Air Act Amendments labeling NPs as Class I areas
• Regional haze regulations proposed by the EPA
• Reduction of allowable pollution standards from industry
• Clean Air Act set visibility goals
• Objecting to construction permits
• Identification of pollution sources
• Industry modifications/Installation of devices (scrubbers) to reduce pollution
• Public pressure to protect parks

Appendix 1b (continued)

Examples for Problems (1 point): 5 maximum
• Great Smoky Mountain National Park
• Grand Canyon National Park
• Smoky Mountain Sources from Ohio, New York, Atlanta, etc.
• Grand Canyon Sources from CA, NV, UT, AZ, NM and Mexico
• TN Luttrell corp building permit situation

Examples for Causes (1 point, or 5 points with overarching cause listed above)
• Automobile emissions
• Power plants/factories/plants emissions
• Emissions from Smokestacks (chimneys)
• Emissions from Kilns
• Unregulated pollution/smog (from other countries, Mexico)
• Smoke from Controlled burns
• Emissions from large cities

Examples for Causes (1 point): 10 maximum
• Navajo Generating Station, Page, AZ (Power Plant, AZ)
• Power plants in TN and OH river valley
• Southern California Edison Plant, Laughlin, NV
• Tennessee Luttrell Kilns, TN
• Industry in AZ
• Cars in CA
• Smokestacks in NM, NV, UT
• Los Angeles
• Atlanta
• New York

Examples for Effects (1 point): 6 maximum
• Leaves of plants turning purple and brown (stippling)
• Hindering photosynthesis of plants and trees
• Reduction of visibility at Grand Canyon
• Visibility reduced 90% of the days
• See vague blue masses
• See half as far as in 1919

Examples for Solutions (1 point, or 10 points with overarching solution listed above)
• 1977 Clean Air Act Amendments labeling NPs as Class I areas
• Regional haze regulations proposed by the EPA
• Reduction of allowable pollution standards from industry

Examples for Solutions (1 point): 5 maximum
• Objecting to kilns in TN
• Researchers using scientific technology to ID sources (radioactive isotopes)
• Southern California Edison Plant in Laughlin, NV identified as point source
• Scrubbers installed at Navajo Generating Plant
• 10% reduction of pollution in 10–15 years (goal of no man-made pollution)
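The point values in the Appendix 1b rubric can be illustrated with a short scoring sketch. It is only an approximation under stated assumptions: the function and variable names are hypothetical, the reduced value for one-word responses and the per-category maxima are read from the rubric headings, and the special case in which a listed cause or solution is counted as an example is left out. It is not the scoring guide used by the raters.

```python
# Hypothetical sketch of the Reading to Learn chart scoring (Appendix 1b).
# Values are read from the rubric headings; names are illustrative only.

FULL_POINTS = {"problems": 10, "causes": 5, "effects": 5, "solutions": 10}
ONE_WORD_POINTS = {"problems": 6, "causes": 3, "effects": 3, "solutions": 6}
CATEGORY_MAX = {"problems": 40, "causes": 55, "effects": 30, "solutions": 90}
EXAMPLE_MAX = {"problems": 5, "causes": 10, "effects": 6, "solutions": 5}


def score_chart(responses, examples):
    """Score one completed chart.

    responses: {category: [(is_correct, is_single_word), ...]}
    examples:  {category: number of correct examples (1 point each)}
    Incorrect or misplaced responses earn nothing and carry no penalty.
    """
    total = 0
    for category, cap in CATEGORY_MAX.items():
        points = 0
        for is_correct, is_single_word in responses.get(category, []):
            if not is_correct:
                continue
            if is_single_word:
                points += ONE_WORD_POINTS[category]
            else:
                points += FULL_POINTS[category]
        total += min(points, cap)  # each category is capped at its maximum
    for category, cap in EXAMPLE_MAX.items():
        total += min(examples.get(category, 0), cap)
    return total


def theoretical_maximum():
    """Sum of all category and example maxima."""
    return sum(CATEGORY_MAX.values()) + sum(EXAMPLE_MAX.values())
```

Under these assumptions the category maxima (40 + 55 + 30 + 90) and the example maxima (5 + 10 + 6 + 5) add up to the 241 points stated in the rubric heading.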
Appendix 2a Reading to Integrate task integration activity
Directions: For the next 15 minutes, reflect on what you read and compose a short essay that combines the information in the material read and makes connections across the range of ideas and arguments presented.
(ruled lines for the participant's written response)
Appendix 2b Reading to Integrate scoring rubric
Integration

50 Excellent: Integrates texts accurately and successfully on multiple levels and creates a true Documents Model. Generalizes at least two macrostructure concepts common across texts (this may be simply identifying the existence of a macrostructure followed by the supporting macrostructures from each text). Effectively integrates relevant support (i.e., details or macrostructures) to support generalizations and may still discuss each article separately to some degree. Integrates macrostructures present in both texts.

40 Good: Integrates through the creation of a well developed and accurate introduction and/or a conclusion, yet summarizes articles separately, AND/OR generalizes one major macrostructure that is fully developed with support. Uses some relevant support.

30 Fair: Attempts to integrate through the creation of a partially developed, possibly inaccurate introductory AND/OR concluding statement, usually related to the main topic of the articles, but has no substantive development. Creates a Text and Situation Model of each text separately. Does not make generalizations for integration beyond the one statement. May make evaluative or editorial statements; however, these may be inaccurate.

20 Poor: Creates a well developed and accurate Text Model and Situation Model for each of the texts, yet attempts no integrative connection across the texts.

10 Very Poor: Ineffectively or inaccurately attempts to summarize each article in a very reporting style, revealing little or no use of background knowledge, contextualization or evaluation. Response is simply a recall of information from the texts. No introduction or conclusion, OR participant confuses texts and sees separate texts as one issue (problem).

0 No Response: Participant does not attempt response or only addresses one article.
Macrostructures

Participants will receive one point for each macrostructure accurately identified in each text, with a minimum score of 4 awarded for the inability to accurately identify any macrostructures.

25 Excellent (Total 8): Accurately identifies all 4 macrostructures present in both texts.

20 Good (Total 6–7): Accurately identifies all 4 macrostructures in one text and 3 in the other, OR accurately identifies 3 of the four macrostructures in both texts. (Or 4 in one and 2 in the other.)

15 Fair (Total 4–5): Accurately identifies 3 of the four macrostructures in one text and 2 of the four in the other. (Or 4 in one and 1 in the other.) OR accurately identifies 2 of the four macrostructures in both texts. (Or 4 in one and 0 in the other, or 3 in one and 1 in the other.)

10 Poor (Total 2–3): Accurately identifies 2 of the macrostructures in one text and 1 in the other. (Or 3 in one and 0 in the other.) Accurately identifies 1 of the four macrostructures in both texts. (Or 2 in one and 0 in the other.)

5 Very Poor (Total 0–1): Accurately identifies 1 of the four macrostructures in one text and 0 in the other, OR unable to accurately identify any macrostructures in the texts.

0 No Response: Participant does not attempt response.
Use of relevant details

5 Excellent: Effectively uses multiple relevant details as support; no irrelevant or erroneous details.

4 Good: Effectively uses relevant details as support; may include some inaccurate or irrelevant details. (More than 50% of details are relevant.)

3 Fair: Possibly uses one or two relevant details, yet several irrelevant details or inaccurate details (erroneous or fabricated) appear in the synthesis. (50% or less of details are relevant.)

2 Poor: Erroneous or fabricated details used in attempt to support arguments presented; does not include relevant details.

1 Very Poor: No details used or listed.

0 No Response: Participant does not attempt response.
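As an illustration of how the three Appendix 2b subscales combine into the 0 to 80 analytic score, the sketch below maps band labels and macrostructure counts to points. The names are hypothetical, the band for the macrostructure subscale follows the "Total" ranges in the table above, and the holistic judgment behind each integration and detail band is assumed to have been made already by a trained rater.

```python
# Hypothetical sketch of the Reading to Integrate composite (Appendix 2b).
# A rater supplies the integration and detail bands; only the macrostructure
# subscale is derived from counts. Names are illustrative only.

INTEGRATION_POINTS = {
    "excellent": 50, "good": 40, "fair": 30,
    "poor": 20, "very poor": 10, "no response": 0,
}
DETAIL_POINTS = {
    "excellent": 5, "good": 4, "fair": 3,
    "poor": 2, "very poor": 1, "no response": 0,
}


def macrostructure_points(found_text_a, found_text_b, attempted=True):
    """Map macrostructures identified per text (0-4 each) to rubric points."""
    if not attempted:
        return 0
    for found in (found_text_a, found_text_b):
        if not 0 <= found <= 4:
            raise ValueError("each text has four macrostructures")
    total = found_text_a + found_text_b
    if total == 8:
        return 25
    if total >= 6:
        return 20
    if total >= 4:
        return 15
    if total >= 2:
        return 10
    return 5  # total of 0 or 1, per the 'Very Poor' row of the table


def synthesis_score(integration_band, found_text_a, found_text_b, detail_band):
    """Sum the three subscales; the maximum is 50 + 25 + 5 = 80."""
    return (
        INTEGRATION_POINTS[integration_band]
        + macrostructure_points(found_text_a, found_text_b)
        + DETAIL_POINTS[detail_band]
    )
```

For instance, a synthesis rated "good" on integration that identifies three macrostructures in one text and one in the other, with "fair" use of details, would receive 40 + 15 + 3 = 58 under this sketch.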
of basic reading comprehension related directly to the text included in the new task. Thus, each participant completed a total of seven different instruments.
a Existing instruments: Initial levels of reading comprehension were determined based on the Nelson–Denny Reading Test (Nelson–Denny), Form G, used to identify the reading levels of the NSs, and three retired versions of the Institutional TOEFL Reading Comprehension Section (TOEFL Reading Comprehension), used to identify the reading levels of the NNSs. Although each of these tests was used to assess reading levels in the population for which it had been developed, all 251 participants took both tests in order to provide comparative data. All 251 participants also completed a brief computer familiarity questionnaire.
Participants' computer familiarity was determined through an 11-item questionnaire based on a longer 23-item questionnaire previously developed by ETS (Eignor et al., 1998). In the present study, we used only the 11 items that loaded the most heavily on the major factors resulting from administration to a large sample of TOEFL participants. For these 11 items, developers determined the reliability to be .93 using a split-half method (Eignor et al., 1998: 22). This brief questionnaire took approximately 5 minutes to complete; reliability in our sample, using coefficient alpha, was .87.
b Texts used for new measures: In developing the new tasks, we selected texts that would conform to the design specifications of TOEFL 2000. They were problem/solution texts, recommended as one of the potentially relevant text types for TOEFL 2000 (Enright et al., 1998). Longer texts were used because these represented more challenging and authentic academic tasks (Enright et al., 1998). We used one 1200-word and two 600-word texts. The longer text (Tennesen, 1997) was used to assess Reading to Learn, and the two 600-word texts (Monks, 1997; Zimmerman, 1997) were used to assess Reading to Integrate. We chose these text lengths based on work by Meyer (1985a) and further research by the first author indicating that natural science texts between 1200 and 1500 words included representation of all necessary macro-rhetorical structures of problem/solution texts, with or without explicit signaling. While 1200–1500 word texts provide optimal representation of the macro-rhetorical structures, texts of 600 words provide all the basic macro-rhetorical structures present in problem/solution texts. Thus these
lengths were long enough for adequate argumentation but not so long that they were excessively redundant (Enright et al., 1998). Texts were also matched for readability according to standard readability scales such as the Flesch–Kincaid, Coleman–Liau and Bormuth scales and averaged a minimum of grade level 11.0 to 12.0 on these scales. Also, all texts pertained to natural and social sciences; each text covered environmental issues such as air and water pollution (Enright et al., 1998). Thus text topics were similar across tasks.
c New instruments used in the study: Three new reading measures were used in this study to assess Reading to Learn, Reading to Integrate and Basic Comprehension. Trites (2000: Chapters 2 and 3) presents a more extensive review of literature and rationale for development of the new measures.
• Reading to Learn: The first new measure, completion of a chart, was used to determine participants' ability to read to learn. Spivey (1997: 69) suggests that readers' categorization of information in text offers insight into their cognitive processes and their making of meaning. We designed a measure to be used with a 1200-word text that students read on either paper or computer. Students were asked to recall, identify and categorize information from the text on a chart reflecting macro-rhetorical structures, called macrostructures in this study (problems and solutions), and other types of information from problem/solution texts (causes, effects and examples), categories based on the work of Meyer (1985a). The scoring rubric, based on work by Meyer (1985b) and later modified by Jamieson et al. (1993), awarded points only for the upper levels of textual structure represented on the chart (for task and scoring rubric, see Appendix 1). We weighted the information supplied on the chart as follows: 10 points for correct information in the problem and solution categories, five points for correct information supplied in the cause and effect categories, and one point for accurate examples. This weighting reflects Meyer's (1985b) hierarchical levels, which characterize problem and solution propositions as higher order structures, while the other categories represent lower order propositions.¹ The theoretical maximum score for this scale
¹ Students received no points for information improperly placed or for information not found in the text.
was 241, which would result from maximum points given in all categories. The first author and two research assistants spent 35–40 hours creating, revising and norming the scoring rubric and developing the scoring guide (Trites, 2000: Chapter 3). To determine interrater reliability, we used coefficient alpha rather than percentage of agreement because percentage of agreement inflates the likelihood of chance agreement (Hayes and Hatch, 1999). After norming, overall interrater reliability was .99 (coefficient alpha), with similarly high alpha coefficients for all subcategories.²
• Reading to Integrate: The second new measure assessed Reading to Integrate. The task used to assess Reading to Integrate required participants to read two 600-word texts and compose a written synthesis. The prompt asked students to make connections across the range of ideas presented; thus we asked readers to synthesize information rather than summarize or make comparisons (Wiley and Voss, 1999). This synthesis was scored based on an analytic scale ranging from 0 to 80, reflecting readers' ability to recognize and manipulate the structure of the texts, include specific information and express connections across texts through the use of cohesive devices (for task and scoring rubric, see Appendix 2). The test was designed to measure the integration of content from both readings and did not assess other aspects of writing such as the creation of rhetorical style, grammaticality or mechanics. The rubric was composed of three subcategories: integration ability, macrostructure recognition and use of relevant details. The integration subscore was awarded the highest point values because this was the predominant skill being tested. It scored participants on their ability to make connections across texts based on the manipulation of the textual frames in both texts. The second subcategory awarded points for the ability to recognize and articulate the macrostructures (problem, cause, effect or solution) present in each text. This subcategory was similar to the categorizing task used in the Reading to Learn measure, with the additional constraint that participants had to express the connections overtly. The third subcategory in the scoring rubric analysed the ability to use
² We recognize that tasks requiring high inference measures plus extensive norming and revision of the scoring rubric pose feasibility issues in large-scale testing. Further research is needed to determine whether and how such scoring procedures could be adapted in standardized testing for numerous test-takers.
relevant details as support in the written synthesis. The first author and two research assistants spent 30 hours revising and norming the scoring rubric and developing a decision guide, resulting in an overall interrater reliability of .99 (coefficient alpha), with similarly high alphas for all subcategories.
• Basic Comprehension: The third construct was measured by multiple-choice tests related specifically to the texts used in the new tasks. These tests were created by TOEFL Test Development staff and followed current TOEFL reading section specifications. We used two multiple-choice tests, Basic Comprehension Test 1 (BC1) and Basic Comprehension Test 2 (BC2), 20 items each: one for the longer passage used to assess Reading to Learn and one for the two passages used to assess Reading to Integrate. Both were scored based on number of items answered correctly. Reliability on BC1, calculated based on 251 participants, was .84 (coefficient alpha). Inadvertently, the order of the texts used in BC2 was different for the two different media; however, reliability on both versions of the test was high. For those who took BC2 based on paper texts (n = 127), reliability was .84 (coefficient alpha); for those who took BC2 based on computerized texts (n = 124), reliability was .86 (coefficient alpha).
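Coefficient alpha is reported above both for interrater reliability and for the internal consistency of the tests. The sketch below shows the standard formula, alpha = k/(k - 1) × (1 - sum of component variances / variance of totals), with each rater (or test item) treated as one component. This is a minimal sketch with hypothetical names; the study does not state which software or exact data layout was used.

```python
from statistics import variance


def coefficient_alpha(components):
    """Coefficient (Cronbach's) alpha.

    components: list of k score lists, one per rater (or per item), each
    holding one score per participant in the same participant order.
    """
    k = len(components)
    if k < 2:
        raise ValueError("need at least two raters or items")
    n = len(components[0])
    if n < 2 or any(len(scores) != n for scores in components):
        raise ValueError("each component needs the same two or more participants")

    # Variance of each rater's (or item's) scores across participants.
    component_variance_sum = sum(variance(scores) for scores in components)

    # Variance of the per-participant totals across all components.
    totals = [sum(per_participant) for per_participant in zip(*components)]
    total_variance = variance(totals)
    if total_variance == 0:
        raise ValueError("total scores have no variance")

    return (k / (k - 1)) * (1 - component_variance_sum / total_variance)
```

When raters rank participants identically and on the same scale, the totals carry all the variance and alpha approaches 1, which is the pattern the very high interrater values reported above would reflect.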
3 Design for data collection
This study used a 2 × 2 repeated measures design to examine performance on the new reading tasks. Native speaker undergraduates and nonnative speaker undergraduates were divided into two groups each, of equal ability as determined by performance on the baseline standardized measures of reading comprehension (Nelson–Denny or TOEFL). Half of each group read texts on paper; the other half read the same texts on a computer screen. A smaller group of nonnative speaker graduates, equally divided, was also included for a comparison between performance by graduate and undergraduate nonnative speakers. Additionally, the administration of the new measures was counterbalanced to control for any practice effect.
a Procedures: All participants met with the researchers in four sessions, each lasting about an hour. The first two sessions were devoted to administering the existing instruments. During Session 1, participants received an introduction to the study and took one of the two
standardized basic reading comprehension measures (Nelson–Denny or TOEFL Reading Comprehension). Students completed the computer familiarity questionnaire and the Nelson–Denny Test at the same testing session because the Nelson–Denny was shorter than the TOEFL Reading Comprehension. During Session 2, participants took the other standardized basic reading comprehension measure.
Next, each participant group was subdivided into two subgroups for computer-based or paper reading of the texts for the new tasks. The subgroups were matched on their performance on initial reading measures: the Nelson–Denny was used for native speakers, and the TOEFL Reading Comprehension was used for nonnative speakers. Independent t-tests run on these reading measures showed no significant difference in basic comprehension for the newly created subgroups assigned to each medium, ensuring that they were balanced for initial reading levels. Participants stayed in the same subgroups for the duration of the study. To ensure uniformity of response mode, all participants, whether they read the source texts on the computer or on paper, responded to the reading tasks using paper and pencil format.³
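The subgroup matching and balance check described above can be sketched as follows. The text does not say how participants were matched, so the pairing procedure here (rank by baseline score, then randomly split each adjacent pair between media) is an assumption for illustration only, as are all names. The t statistic uses the pooled-variance formula for two independent samples; obtaining a p-value would additionally require the t distribution, which is omitted.

```python
import random
from statistics import mean, variance


def split_matched(baseline_scores, seed=0):
    """Split participants into two media subgroups of similar ability.

    baseline_scores: {participant_id: baseline reading score}
    Returns (paper_ids, computer_ids). Assumed procedure: rank by score and
    randomly assign one member of each adjacent pair to each medium.
    """
    rng = random.Random(seed)
    ranked = sorted(baseline_scores, key=baseline_scores.get, reverse=True)
    paper, computer = [], []
    for start in range(0, len(ranked), 2):
        pair = ranked[start:start + 2]
        rng.shuffle(pair)
        paper.append(pair[0])
        if len(pair) == 2:
            computer.append(pair[1])
    return paper, computer


def independent_t(sample_a, sample_b):
    """Pooled-variance t statistic for two independent samples."""
    n_a, n_b = len(sample_a), len(sample_b)
    pooled = (
        (n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)
    ) / (n_a + n_b - 2)
    standard_error = (pooled * (1 / n_a + 1 / n_b)) ** 0.5
    return (mean(sample_a) - mean(sample_b)) / standard_error
```

A t statistic near zero on the baseline measure for the two resulting subgroups is what the balance check reported above is looking for.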
The last two sessions, each lasting approximately one hour, were dedicated to administration of the new measures. The Reading to Learn session took slightly longer to administer because administrative procedures were longer for this novel task. The new tasks were counterbalanced to control for practice effect; thus half of the participants took the Reading to Learn measure first and half took the Reading to Integrate measure first. During Session 3, we administered the first new measure (for ease of discussion, Reading to Learn is discussed first) and BC1. At this session, students were given 12 minutes to read a 1200-word passage, either on computer or on paper. We limited the time allowed for reading based on 100 words per minute, thought to be ample (Grabe, personal communication, 1998). After examinees read the text, they were given 4 minutes to take notes on a half sheet of paper. Participants were instructed to take minimal notes due to the time constraints. Next, the text was removed and examinees were allowed 15 minutes to complete a chart based on the reading with the aid of their notes. After completing this Reading to Learn activity, participants were allowed to use the text and
³ Although responses could have been entered and perhaps scored by computer, this would have introduced factors not directly related to our research questions and remains an area for further study.
were given 15 minutes to answer BC1. Following these new testing sessions, 49 participants were selected for a related interview concerning the cognitive processes used in task completion (for further details, see Trites, 2000: Chapter 6).
During Session 4, students were given 12 minutes to read two short texts (600 words each), either on computer or paper. After participants read the assigned texts, they were given 4 minutes to take one-half page of notes (Enright et al., 1998). Next, the texts were removed and participants were asked to demonstrate Reading to Integrate by writing a synthesis of the texts with the aid of their notes (15 minutes allowed for this task). After completing the Reading to Integrate task, participants were allowed to see the texts again and answered BC2 (15 minutes allowed for this task). In one Reading to Integrate session, for unknown reasons, six of the seven participants read only one text. Because we cannot explain the cause of this anomalous session, we have eliminated scores from the session's seven participants from subsequent analyses, thus slightly reducing the N size for the Reading to Integrate measure.
b Variables used in study: The six independent variables included three nominal variables (Native Language Background, Medium of Text Presentation, and Level of Education) and three interval variables (Nelson–Denny, TOEFL Reading Comprehension, and Computer Familiarity). The four dependent variables were Reading to Learn, Reading to Integrate, BC1, and BC2.
IV Results
First, we present the descriptive statistics for all reading measures, followed by a systematic analysis of independent variables that might affect participant performance on the new measures. Scatterplots were checked for all reading measures to ensure normality of data. Kurtosis and skewness levels for all reading measures were found to be within normal limits, indicating a relatively normal distribution. Descriptive statistics for all existing measures are shown in Table 1. Means for these measures show a consistent pattern: the native speaker undergraduates had the highest mean, followed by the nonnative speaker graduates, followed by the nonnative speaker undergraduates. On the reading measures, Nelson–Denny and TOEFL Reading Comprehension, the nonnative speaker undergraduates showed the largest variance in performance, while on the computer familiarity measure, the variance of both nonnative speaker groups was substantially larger than that of the native speakers.
The same pattern emerged for the means on the new measures (see Table 2) as for the existing measures. The native speaker undergraduate group performed better on all new measures than both of the nonnative speaker groups. The nonnative speaker graduate group performed better than the nonnative speaker undergraduate group on all measures as well. This robust pattern of performance was also found in the variance of three of the four new measures. On BC1 and BC2, the performance of the native speaker undergraduates showed the least amount of variance, followed by the nonnative speaker graduates, followed by the nonnative speaker undergraduates. On Reading to Integrate, the native speaker undergraduate group showed substantially less variance than the nonnative speaker groups; however, the variance of the two nonnative speaker groups was almost identical. On Reading to Learn, all three groups showed considerable variance.
Table 3 reveals the range of awarded points achieved by all participant groups. The nature of the Reading to Learn point system created a maximum possible point value (241) that no participant achieved. We speculate that there are at least three possible causes of the discrepancy between the theoretical maximum and the range of observed scores:
Table 1 Descriptive statistics for existing measures for three participant groups

Group                          n      Mean      sd       kMax

Nelson–Denny
  NSU                          105    126.48    16.46    156
  NNSU                         106     67.24    31.91    156
  NNSG                          40     88.88    21.88    156
  Total participants           251     95.47    36.93    156

TOEFL Reading Comprehension
  NSU                          105     61.30     4.24     67
  NNSU                         106     50.30     8.53     67
  NNSG                          40     57.15     4.55     67
  Total participants           251     55.99     8.19     67

Computer familiarity
  NSU                          104     38.08     3.60     44
  NNSU                         104     34.82     5.99     44
  NNSG                          40     35.63     6.02     44
  Total participants           248     36.31     5.33     44

Note: kMax = number of items or maximum possible score.
• task novelty: no participant reported ever doing such a task previously;
• time allowed for task completion; and
• space on the response sheet: space constraints may have limited the amount of information that participants could include.
Future research would need to address these issues. However, for the Reading to Integrate measure, the full range of possible point totals was achieved by at least one participant in each group.
1 Computer familiarity
The overall plan for the analyses was to check the influence of the independent variables on the dependent measures, with computer familiarity being addressed first. Initially, we had proposed that if computer familiarity was significantly different across groups, it would be entered into all calculations as a covariate. To determine this, it was necessary to conduct an Analysis of Variance (ANOVA) for computer familiarity across the six participant/medium subgroups.
Table 2 Descriptive statistics for new measures for three participant groups

Group                          n      Mean     sd       kMax

Reading to Learn (chart)
  NSU                          105    51.85    19.86    241
  NNSU                         106    31.73    19.50    241
  NNSG                          40    44.68    19.27    241
  Total participants           251    42.21    21.64    241

Basic Comprehension Test 1
  NSU                          105    16.98     2.47     20
  NNSU                         106    11.73     4.25     20
  NNSG                          40    14.98     3.50     20
  Total participants           251    14.44     4.23     20

Reading to Integrate (synthesis)
  NSU                          101    63.65    11.05     80
  NNSU                         103    37.24    21.76     80
  NNSG                          40    53.60    21.03     80
  Total participants           244    50.86    21.63     80

Basic Comprehension Test 2
  NSU                          105    15.91     2.78     20
  NNSU                         106     9.75     4.54     20
  NNSG                          40    12.85     3.61     20
  Total participants           251    12.82     4.72     20

Notes: kMax = number of items or maximum possible score; n size reduced for Reading to Integrate because of anomalous testing session.
The resulting ANOVA (F = 4.70, p < .05) showed a significant difference between subgroups on the computer familiarity questionnaire; therefore, a post hoc Scheffé test was done to locate significant contrasts. After analysis of all possible subgroup contrasts, the post hoc Scheffé revealed that the only significant difference in subgroups appeared between the native speaker undergraduates and nonnative speaker undergraduates who read texts on paper. Hence, although there was one significant contrast, it occurred in two subgroups reading on paper, not in any of the subgroups who read on computer. All groups generally scored high on computer familiarity, although, as noted, variance of the nonnative groups was greater. It was thus established that computer familiarity had no significant effect on participants who read texts on computer, so we did not use computer familiarity as a covariate in further analyses and proceeded to the three research questions of central interest to this study.
Because Research Questions 1 and 2 are similar (except that they address the two different new reading measures, Reading to Learn and Reading to Integrate), we approached them in the same manner, through ANOVA, to identify the independent variables that could have significantly affected the results on the new measures.
2 Research Question 1
The first research question asked if performance on a measure of Reading to Learn was affected by medium of presentation, computer familiarity, native language, or level of education. We calculated a univariate ANOVA with Type III sums of squares on Reading to Learn, with
Table 3 Range of scores for new measures for three participant groups

Group                          n      Minimum    Maximum    kMax

Reading to Learn (chart)
  NSU                          105    14         120        241
  NNSU                         106     0          86        241
  NNSG                          40     3          94        241
  Total participants           251     0         120        241

Reading to Integrate (synthesis)
  NSU                          101    38          80         80
  NNSU                         103     0          80         80
  NNSG                          40     5          80         80
  Total participants           244     0          80         80

Notes: kMax = number of items or maximum possible score; n size reduced for Reading to Integrate because of anomalous testing session.
group status, medium of text presentation, and test order as possible contributing factors (see footnote 4). Table 4 shows that there were no significant interactions for any of the group, medium, or test order combinations. The only significant main effect was group membership.
Because group membership was a combined measure that included both native language background and level of education, post hoc analysis was needed to identify the significant contrasts. Table 5 shows that there was a significant difference in performance on the Reading to Learn measure between the native speaker undergraduate and the nonnative speaker undergraduate groups, as well as a significant difference between the nonnative speaker undergraduate and nonnative speaker graduate groups. There was no significant difference in performance between the native speaker undergraduate and the nonnative speaker graduate groups. Therefore, the answer to Research Question 1 is that native language background and level of education did have a significant effect on performance on the Reading to Learn measure, but that medium of text presentation did not. Further, order of testing (whether participants took Reading to Learn or Reading to Integrate first) had no significant effect.
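As a reading aid for the ANOVA table, the following minimal Python sketch shows how each F value follows from the sums of squares and degrees of freedom: the mean square is the sum of squares divided by its df, and F is the effect mean square divided by the error mean square. The figures are transcribed from Table 4 on the assumption that each value carries two decimal places (for example, 2186929 read as 21869.29); the function and variable names are illustrative and are not taken from the study.

```python
# Sums of squares and df transcribed from Table 4 (Reading to Learn).
# ASSUMPTION: the extracted figures lost their decimal points and each
# value has two decimal places (e.g. "2186929" is read as 21869.29).
MAIN_EFFECTS = {
    "Group": (21869.29, 2),
    "Medium": (1215.86, 1),
    "Test order": (294.98, 1),
}
ERROR_SS, ERROR_DF = 92997.45, 239


def f_ratio(ss_effect, df_effect, ss_error, df_error):
    """Return F = (SS_effect / df_effect) / (SS_error / df_error)."""
    ms_effect = ss_effect / df_effect
    ms_error = ss_error / df_error
    return ms_effect / ms_error


f_values = {
    name: f_ratio(ss, df, ERROR_SS, ERROR_DF)
    for name, (ss, df) in MAIN_EFFECTS.items()
}

for name, value in f_values.items():
    print(f"{name}: F = {value:.2f}")
```

Under that decimal-place assumption, the computed ratios agree with the F column of Table 4 to within rounding, which also serves as a check on the reading of the table.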
3 Research Question 2
The second research question, related to the first, asked if performance on Reading to Integrate was affected by medium of presentation,
Table 4 Performance on Reading to Learn measure by groups, medium, and test order (n = 251) (univariate analysis of variance)

Source                           Type III sum    df     Mean square    F
                                 of squares

Group                            21869.29          2    10934.64       28.10*
Medium                            1215.86          1     1215.86        3.13
Test order                         294.98          1      294.98        0.76
Group × medium                     334.81          2      167.40        0.43
Group × test order                   4.37          2        2.19        0.01
Medium × test order                391.73          1      391.73        1.01
Group × medium × test order        575.29          2      287.65        0.74
Error                            92997.45        239      389.11

Note: *p < .05
4 Test order was added as an additional variable to double-check that our counterbalancing had been effective in controlling for any practice effect.
computer familiarity, native language, or level of education. Again, to ensure that counterbalancing of tests controlled for any practice effect, test order was added as an additional variable.
To answer this question, we calculated a univariate ANOVA on the Reading to Integrate measure, with group status, medium of text presentation, and test order entered as possible contributing factors. The results (Table 6) show, as for Research Question 1, that there were no significant interactions for any of the group, medium, or test order combinations; the only significant main effect was group membership. The answer for Research Question 2 is that native language background and educational level had a significant effect on Reading to Integrate, but medium of text presentation did not. Post hoc analysis of group contrasts showed that all three groups were distinct in their performance on Reading to Integrate (see Table 7).
4 Research Question 3
The third research question asked to what extent measures of basic comprehension, Reading to Learn, and Reading to Integrate were
Table 5 Post hoc Scheffé for Reading to Learn measure (n = 251)

Group    n      Group    n      Mean difference    Standard error

NSU      105    NNSU     106    20.12*             2.72
                NNSG      40     7.17              3.67
NNSU     106    NNSG      40    12.95*             3.66

Note: *p < .05
Table 6 Performance on Reading to Integrate measure by groups, medium, and test order (n = 244(a)) (univariate analysis of variance)

Source                           Type III sum    df     Mean square    F
                                 of squares

Group                            36294.33          2    18147.17       55.82(b)
Medium                             192.95          1      192.95        0.59
Test order                          97.83          1       97.83        0.30
Group × medium                      11.82          2        5.91        0.02
Group × test order                  30.14          2       15.07        0.05
Medium × test order               1098.72          1     1098.72        3.38
Group × medium × test order       1488.58          2      744.29        2.29
Error                            75430.37        232      325.13

Notes: (a) n size reduced for Reading to Integrate because of anomalous testing session; (b) p < .05
related. We used correlational analysis as the first step in answering this question. Results for the total participant population (see Table 8) showed moderate to high correlations across all reading measures. However, the analyses done for Research Questions 1 and 2 revealed that group status had a significant effect on performance on Reading to Learn and Reading to Integrate. Further, we realize that correlations are sensitive to variance, so the high correlations seen in the total population could have been an artifact of combining the three groups. Therefore, we examined the correlations among all reading measures for each group (available in Trites, 2000: Appendix 1, pp. 230–33). While the reading measures were still correlated, often moderately, sometimes highly, magnitudes differed and sometimes dropped substantially. The text-specific multiple-choice measures, BC1 and BC2, consistently correlated more highly with the Nelson–Denny and TOEFL Reading Comprehension tests than with Reading to Learn and Reading to Integrate based on the same texts, suggesting a test method or construct effect. Because comparisons between different measures of basic comprehension were not a goal of the project, BC1 and BC2 were not used in further analyses. We conclude that, as expected, all reading measures were related, but the lower correlations between Reading to Learn and Reading to Integrate and the traditional basic comprehension measures led us to consider further types of analysis to identify the possible distinctiveness of the new measures.
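The concern that pooled correlations can be an artifact of combining groups can be illustrated with a small synthetic example. The Python sketch below uses invented data points, not the study's scores: within each of three hypothetical groups the two measures are uncorrelated, but because the group means differ on both measures, the pooled correlation is high. All names and values are assumptions made for illustration only.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# SYNTHETIC data: four points per group arranged so that the two measures
# are uncorrelated inside every group.
OFFSETS = [(-1, -1), (-1, 1), (1, -1), (1, 1)]
GROUP_CENTRES = {"low": (0, 0), "mid": (5, 5), "high": (10, 10)}

groups = {
    name: [(cx + dx, cy + dy) for dx, dy in OFFSETS]
    for name, (cx, cy) in GROUP_CENTRES.items()
}

within_r = {
    name: pearson_r([p[0] for p in pts], [p[1] for p in pts])
    for name, pts in groups.items()
}

pooled = [point for pts in groups.values() for point in pts]
pooled_r = pearson_r([p[0] for p in pooled], [p[1] for p in pooled])

print("within-group r:", within_r)
print("pooled r:", round(pooled_r, 3))
```

In this constructed case the pooled coefficient reflects only the differences between group means, which is why examining correlations separately for each group, as described above, is the more cautious step.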
5 Discriminant analysis
Because we were interested in determining how constructs differed, we sought additional analyses to help us better characterize the new constructs. Of the several possible statistical methods that could have been employed, two are most plausible: multivariate analysis of variance, usually associated with experimental research,
Table 7 Post hoc Scheffé for Reading to Integrate measure (n = 244(a))

Group    n      Group    n      Mean difference    Standard error

NSU      101    NNSU     103    26.41(b)           2.53
                NNSG      40    10.05(b)           3.37
NNSU     103    NNSG      40    16.36(b)           3.36

Notes: (a) n size reduced for Reading to Integrate because of anomalous testing session; (b) p < .05
Table 8 Correlations for all reading measures for all participants (n = 251)

                               TOEFL Reading    Basic            Basic            Reading     Reading to
                               Comprehension    Comprehension    Comprehension    to Learn    Integrate(a)
                                                Test 1           Test 2

Nelson–Denny                   .90(b)           .85(b)           .84(b)           .66(b)      .69(b)
TOEFL Reading Comprehension    1.00             .85(b)           .84(b)           .64(b)      .69(b)
Basic Comprehension 1                           1.00             .84(b)           .68(b)      .68(b)
Basic Comprehension 2                                            1.00             .68(b)      .70(b)
Reading to Learn                                                                  1.00        .59(b)

Notes: (a) n size reduced for Reading to Integrate because of anomalous testing session; (b) p < .05
and discriminant analysis, usually associated with descriptive research (Tabachnick and Fidell, 1996). The present research was conducted with samples of naturally occurring student groups and was not experimental. Moreover, we were interested in finding ways to compare participant performance on the measures of the new constructs, Reading to Learn and Reading to Integrate, with performance on more traditional measures of basic comprehension. Thus, we opted to use discriminant analysis because of its parsimony of description and clarity of interpretation (Stevens, 1996). Discriminant analysis, a technique recommended to describe group differences or predict group membership based on a comparison of multiple predictors (Huberty, 1994), has been used in other areas of applied linguistic research to investigate creation of a student profile of success or failure on Computer Assisted Language Learning (CALL) lessons (Jamieson et al., 1993) and accurate classification of text types into registers (Biber, 1993), among other purposes.
To further distinguish basic comprehension from the new constructs, we conducted discriminant analysis on each of the two language groups (native and nonnative) to determine whether Reading to Learn and Reading to Integrate would classify participants in the same way that Basic Comprehension would. We divided the native speaker and nonnative speaker groups into three levels (high, middle (mid), and low reading ability scorers) based on the basic comprehension measure chosen for that group (Nelson–Denny for native speakers; TOEFL Reading Comprehension for nonnative speakers). Research methodologists (Tabachnick and Fidell, 1996: 513) note that robustness is expected when the smallest group has at least 20 cases; our smallest group was 25. To check the assumption of homogeneity of the variance-covariance matrices, we examined the outcomes of Box's M Test and found them all nonsignificant (Klecka, 1980). Each group was checked for outliers using Mahalanobis distance treated as Chi-Square, and no outliers were found (Tabachnick and Fidell, 1996). Thus, the data met all assumptions required for use of discriminant analysis.
a Discriminant analysis for nonnative speakers: To organize the discriminant analysis in order to see if the Reading to Learn/Reading to Integrate Composite classified participants similarly to the measure of Basic Comprehension (for nonnative speakers, TOEFL Reading Comprehension), we divided the entire nonnative speaker group (n = 146) into three levels of basic comprehension: high, mid, and low. These reading ability groups were based on typical TOEFL Reading Comprehension score levels required for program entry. Participants were classified as high if their scores were greater than or equal to 550, or 56 and above on the scaled score on the TOEFL Reading Comprehension (550 is the cut score often used for graduate entry). Participants were classified as mid if scores ranged between 500 and 549, or 50 to 55 on the TOEFL Reading Comprehension. Participants were classified as low if their scores fell below 500, or 49 and below on TOEFL Reading Comprehension (500 is a minimum TOEFL score sometimes used for undergraduate admission, often with the proviso that students enroll in ESL classes either prior to official enrollment or concurrently). For our entire nonnative speaker group, descriptive statistics on basic comprehension reading ability group membership levels appear in Table 9.
We ran SPSS Discriminant Analysis with initial grouping variables of high, mid, and low reading ability. We compared initial reading ability levels with high, mid, and low categories on the Reading to Learn/Reading to Integrate Composite, a new variable reflecting level of performance on the Reading to Learn and Reading to Integrate measures combined (see footnote 5). The discriminant analysis yielded one discriminant function with an eigenvalue of .92, responsible for 99.9% of the variance in outcomes. Wilks' Lambda for this function was .52, significant at .001; the associated Chi-Square value was extremely large (90.93) and highly significant (p < .001), indicating that the group centroids on the composite Reading to Learn/Reading to Integrate function for the three nonnative speaker reading ability groups were significantly different. Both Reading to Learn and
Table 9 Descriptive statistics for nonnative speakers by TOEFL Reading Comprehension reading ability groups (n = 143)

Reading ability group    n      Mean on TOEFL Reading Comprehension    sd

High (≥ 56)               57    59.68                                  3.03
Mid (50–55)               42    52.81                                  1.67
Low (≤ 49)                44    42.11                                  5.47
Total                    143    52.26                                  8.22

Note: Total n = 143 due to loss of three cases in anomalous Reading to Integrate session.
5 We first calculated two separate discriminant analyses, one for Reading to Learn and one for Reading to Integrate, but we found that both loaded on a single function, so we used the composite in subsequent analyses.
Reading to Integrate loaded significantly on the discriminant function, at .86 for Reading to Learn and .78 for Reading to Integrate (p < .05).
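For readers who want to see how a Chi-Square value of this size relates to Wilks' Lambda, the sketch below applies Bartlett's standard approximation, chi-square = -(N - 1 - (p + g)/2) ln(Lambda), with df = p(g - 1). The inputs are assumptions based on the reported figures: Lambda = .52, N = 143 cases (as in Table 9), p = 2 predictors (Reading to Learn and Reading to Integrate), and g = 3 reading ability groups. It is not a reconstruction of the SPSS output, and the function name is illustrative.

```python
import math


def bartlett_chi_square(wilks_lambda, n_cases, n_predictors, n_groups):
    """Bartlett's chi-square approximation for Wilks' Lambda.

    Returns the chi-square statistic and its degrees of freedom.
    """
    multiplier = n_cases - 1 - (n_predictors + n_groups) / 2
    chi_square = -multiplier * math.log(wilks_lambda)
    df = n_predictors * (n_groups - 1)
    return chi_square, df


# ASSUMED inputs: Lambda rounded to .52, N = 143, two predictors, three groups.
chi_square, df = bartlett_chi_square(0.52, 143, 2, 3)
print(f"chi-square = {chi_square:.2f} on {df} df")
```

Because the reported Lambda is rounded to two decimals, the result is close to, but not identical with, the published value of 90.93.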
Over half (64.9%) of the high reading ability group remained high on the new measure; less than half (42.9%) of the mid reading ability group remained classified as mid. Hence, the Reading to Learn/Reading to Integrate Composite was particularly influential in reclassifying the mid reading ability group and, to a lesser extent, the high group. However, most (81.8%) of the low reading ability group remained low on the Reading to Learn/Reading to Integrate Composite (see Table 10). Of the 143 nonnative speaker participants, 91 (64%) remained in the initial basic comprehension category on the composite; the rest moved, but in different directions. Twenty-one participants (14.7%) were classified into a higher category on the Reading to Learn/Reading to Integrate Composite than their initial basic comprehension level would have suggested, while 31 (21.7%) were reclassified into a lower category. Thus, 52 participants (36.4%), just over one third of the sample, were classified differently based on their Reading to Learn/Reading to Integrate Composite performance.
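The reclassification totals can be tallied directly from the count rows of Table 10. The following Python sketch is an illustration, not the authors' procedure: it treats the table as a 3 × 3 matrix with rows as initial reading ability groups and columns as predicted groups (both ordered High, Mid, Low), then sums the diagonal (unchanged), the cells below it (moved to a higher category), and the cells above it (moved to a lower category). The function name is hypothetical.

```python
# Count rows of Table 10: rows = initial group, columns = predicted group,
# both in the order High, Mid, Low.
TABLE_10_COUNTS = [
    [37, 17, 3],   # initially High
    [13, 18, 11],  # initially Mid
    [2, 6, 36],    # initially Low
]


def reclassification_summary(table):
    """Tally cases that stayed, moved up, or moved down a category."""
    same = higher = lower = 0
    for initial, row in enumerate(table):
        for predicted, count in enumerate(row):
            if predicted == initial:
                same += count
            elif predicted < initial:
                # A smaller column index means a higher category.
                higher += count
            else:
                lower += count
    total = same + higher + lower
    return {
        "total": total,
        "same": same,
        "higher": higher,
        "lower": lower,
        "pct_changed": round(100 * (higher + lower) / total, 1),
    }


summary = reclassification_summary(TABLE_10_COUNTS)
print(summary)
```

The same function can be applied to any square classification table of this kind, provided the rows and columns are listed in the same order.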
b Discriminant analysis for native speakers: Because one of our goals in this project was to probe the possible validity of these new measures by assessing performance of two groups, native as well as
Table 10 Discriminant analysis: comparison of nonnative speaker reading ability groups with Reading to Learn/Reading to Integrate Composite (n = 143)

                         Predicted group membership for Reading to
                         Learn/Reading to Integrate Composite
Reading ability group    High      Mid       Low       Initial classification total

Count
  High (≥ 56)            37        17         3         57
  Mid (50–55)            13        18        11         42
  Low (≤ 49)              2         6        36         44
  Reclassification total 52        41        50        143

Percentage
  High (≥ 56)            64.9      29.8       5.3      100.0
  Mid (50–55)            31.0      42.9      26.2      100.0
  Low (≤ 49)              4.5      13.6      81.8      100.0

Note: Total n = 143 due to loss of three cases in anomalous Reading to Integrate session.
nonnative speakers, we conducted a parallel discriminant analysis for native speakers. Thus, for the native speakers, we followed the same procedure, dividing the entire native speaker group (n = 105) into three levels of basic comprehension (high, mid, and low) based on score distances of .5 standard deviations from the sample mean of the Nelson–Denny test for these participants. Native speakers need not take reading comprehension tests when entering the university, so the three-way split was based entirely on our sample data. Participants were classified as high if their scores on the Nelson–Denny were greater than or equal to 135. Participants were classified as mid if their scores ranged from 119 to 134 on the Nelson–Denny. Participants were classified as low if their scores fell at or below 118 on the Nelson–Denny. Descriptive statistics for the entire native speaker sample on basic comprehension group membership appear in Table 11.
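A three-way split at half a standard deviation on either side of the mean can be sketched as follows. The mean (126.48) and standard deviation (16.46) are the native speaker Nelson–Denny values from Table 1, read on the assumption that the extracted figures carry two decimal places; the function names, and the choice to treat the cut-offs as inclusive for whole-number scores, are illustrative assumptions rather than the authors' documented procedure.

```python
def half_sd_cutoffs(mean, sd, width=0.5):
    """Return (low_cut, high_cut) placed `width` SDs either side of the mean."""
    return mean - width * sd, mean + width * sd


def classify(score, low_cut, high_cut):
    """Assign a reading ability level from a score and two cut-offs."""
    if score >= high_cut:
        return "high"
    if score <= low_cut:
        return "low"
    return "mid"


# ASSUMED inputs: native speaker Nelson-Denny mean and sd from Table 1.
LOW_CUT, HIGH_CUT = half_sd_cutoffs(126.48, 16.46)
print(f"low cut = {LOW_CUT:.2f}, high cut = {HIGH_CUT:.2f}")

for score in (118, 119, 134, 135):
    print(score, classify(score, LOW_CUT, HIGH_CUT))
```

With these inputs the cut-offs fall at about 118.25 and 134.71, so whole-number scores of 118 and below are classified as low and scores of 135 and above as high, in line with the score bands given in the text.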
The discriminant analysis yielded one discriminant function with an eigenvalue of .31, responsible for 98.7% of the variance in outcomes. Wilks' Lambda for this function was .76, significant at .001; the associated Chi-Square value was large (26.39) and highly significant (p < .001), indicating that the group centroids on the discriminant function for the three reading ability groups on the Reading to Learn/Reading to Integrate Composite were significantly different. Pooled within-groups correlations between discriminating variables showed that Reading to Learn correlated with the first discriminant function at a level of .81; Reading to Integrate correlated with the second discriminant function at .71. This contrasts with findings for the nonnative speakers, where scores on the combined new measures loaded significantly on only one discriminant function. For native speakers, then, there is evidence for two significant discriminant functions, although the first accounts for almost all of the variance. Although these two measures (Reading to Learn and Reading to Integrate) loaded on two separate discriminant functions, they still
Table 11 Descriptive statistics for native speakers by Nelson–Denny reading ability groups (n = 101)

Reading ability group    n      Mean on Nelson–Denny    sd

High (≥ 135)              39    141.26                   5.19
Mid (119–134)             37    125.65                   5.09
Low (≤ 118)               25    102.88                  10.98
Total                    101    126.04                  16.52

Note: Total n = 101 for discriminant analysis due to loss of four cases in anomalous Reading to Integrate session.
showed moderate correlations with the alternate function (see footnote 6), justifying the composite calculations.
Results of discriminant analysis for native speakers, seen in Table 12, show a different pattern from that observed for nonnative speakers. Nearly three-fourths (71.8%) of the native speakers classified as high in the basic comprehension reading ability group remained high on the Reading to Learn/Reading to Integrate Composite. For the mid group, however, only 18.9% remained classified as mid. Just over half (52%) of the low reading ability group members remained low on the Reading to Learn/Reading to Integrate Composite. As with the nonnative speakers, participants in the mid category on basic comprehension showed the most frequent reclassification. Forty-eight of the 101 (47.5%) native speaker participants remained in the initial classification categories. Twenty-two (21.8%) were reclassified into a higher category and 31 (30.7%) were reclassified into a lower category. Thus, 53 participants (52.5%), over half of the sample, were classified differently based on their performance on the Reading to Learn/Reading to Integrate Composite.
6 Reading to Learn correlated with function 2 at -.58; Reading to Integrate with function 1 at .71.
Table 12 Discriminant analysis: comparison of native speaker reading ability groups with Reading to Learn/Reading to Integrate Composite (n = 101)

                         Predicted group membership for Reading to
                         Learn/Reading to Integrate Composite
Reading ability group    High      Mid       Low       Initial classification total

Count
  High (≥ 135)           28         3         8         39
  Mid (119–134)          10         7        20         37
  Low (≤ 118)             5         7        13         25
  Reclassification total 43        17        41        101

Percentage
  High (≥ 135)           71.8       7.7      20.5      100.0
  Mid (119–134)          27.0      18.9      54.1      100.0
  Low (≤ 118)            20.0      28.0      52.0      100.0

Note: Total n = 101 for discriminant analysis due to loss of four cases in anomalous Reading to Integrate session.
V Interpretation
Analyses done to answer Research Questions 1 and 2 showed that performance on Reading to Learn and Reading to Integrate measures was significantly influenced by language background and level of education (graduate vs. undergraduate, for nonnative speakers only). Moreover, level of computer familiarity had no significant effect on Reading to Learn and Reading to Integrate performance.
Correlations showed that, as expected, all reading measures correlated to some degree, generally answering Research Question 3. The most interesting results came from the discriminant analyses, because they showed a pattern of differential classification on the new tasks sensitive to initial reading ability. This pattern differed for nonnative speakers and native speakers. Examination of the reclassifications based on the new tasks revealed some of the problems of using a basic comprehension-only test to predict performance on more challenging literacy tasks. These results also imply the need for further work on new measures of advanced literacy skills, such as Reading to Learn and Reading to Integrate, to reflect trends in construct-driven assessment (Pellegrino et al., 1999).
For the nonnative speakers, most (81.8%) of the participants with TOEFL Reading Comprehension below 50 remained classified as low on the Reading to Learn/Reading to Integrate Composite, suggesting the existence of a lower threshold of academic English proficiency. Participants below this threshold were unlikely to perform well on tasks assessing Reading to Learn and Reading to Integrate. Even tasks involving only selection (such as BC1 and BC2), rather than production of responses, were difficult for this group. However, for the nonnative speaker readers in the mid and high reading ability groups, basic comprehension level was not nearly as consistent a predictor of performance on the Reading to Learn/Reading to Integrate Composite. This was especially striking for those in the mid reading ability group. Approximately one fourth of the mid group dropped into the low category on the Reading to Learn/Reading to Integrate Composite, indicating that they experienced more difficulty in completing the new measures, but approximately one fourth could indeed categorize and synthesize information relatively well, despite middling performance on basic comprehension. This finding suggests that, for nonnative speaker readers in the mid reading ability category, basic comprehension measures were insufficient to predict their performance on the new measures.
Almost two-thirds of the high basic comprehension nonnative speakers remained high on the Reading to Learn/Reading to Integrate Composite, but one third dropped, suggesting that they could not categorize and synthesize information from texts as well as they could recognize information. For this third, the recognition-only task, multiple-choice basic comprehension, overestimated ability to manipulate and synthesize information. For the high and mid reading ability nonnative speaker groups, basic comprehension-only tests may then overestimate academic English proficiency.
Results for the native speakers showed a more definitive pattern of reclassification. For the low reading ability group, approximately half were misclassified, suggesting that the basic comprehension measure underrepresented their ability to categorize and synthesize information in academic texts. Most of the mid reading ability group were reclassified as lower on the Reading to Learn/Reading to Integrate Composite, showing that basic comprehension results overestimated ability to succeed on more challenging tasks. On the other hand, almost one third of the mid reading ability group did better on the Reading to Learn/Reading to Integrate Composite; for them, basic comprehension underrepresented their ability to complete more challenging reading tasks. For the high ability readers, nearly one third dropped on the Reading to Learn/Reading to Integrate Composite. For them, the basic comprehension measure overestimated their level of academic English proficiency. The other two thirds of the high reading ability group remained high, suggesting a higher threshold of academic English proficiency.
Considering results from both the nonnative speakers and native speakers, we conclude that the new tasks did assess something different from basic comprehension once a lower level threshold of basic academic English proficiency had been achieved. Examination of scatterplots based on discriminant functions indicated an obvious separation between participants who could and could not perform on the Reading to Learn/Reading to Integrate Composite, thus providing some tentative evidence for concurrent validity (Messick, 1989; Chapelle, 1999). We had hoped to find clear evidence of a hierarchy, suggesting that Reading to Learn was demonstrably more difficult than basic comprehension and Reading to Integrate demonstrably more difficult than Reading to Learn, but results did not yield an obvious hierarchy. Results did, however, suggest an even simpler pattern: a dichotomy. For those nonnative speakers above the lower threshold of English language proficiency, which in our data would be approximately 500, or 49 on the scaled TOEFL Reading Comprehension scores, some could perform well on the new measures and some could not, revealing two rather than three groups on the Reading to Learn/Reading to Integrate Composite. These results lead us to speculate that the new measures tap additional skills, such as 'sophisticated discourse processes and critical thinking skills' (Enright and Schedl, 1999: 24), in addition to language proficiency. For native speakers, the discriminant loading suggested a possible hierarchy of difficulty, but the scatterplots and classification tables revealed a dichotomy. Reclassification based on the Reading to Learn/Reading to Integrate Composite nearly eliminated the mid basic comprehension reading ability category, with only 16.8% of the native speakers qualifying as mid on the Reading to Learn/Reading to Integrate Composite.
VI Conclusions
While we acknowledge that there are some limitations to this project, such as the time required to administer and score new measures, the results are illuminating and useful, and suggest some considerations for future research. The first consideration relates to appropriate classification of students based on academic reading abilities. This corresponds to the use of TOEFL as a gatekeeper by many institutions. For TOEFL test-takers whose current TOEFL scores would be in an intermediate to high intermediate range, the new tasks could assess additional abilities relevant to academic performance. For such participants, additional test development efforts are warranted. The majority of high reading ability nonnative speakers (64.9%) could perform the new measures; a third could not. Our results suggest that some unknown number of mid reading ability nonnative speakers are more capable of succeeding at more challenging academic reading tasks than their current level of basic comprehension assessment would indicate. At the same time, there are some high and mid reading ability participants who could not perform the more demanding tasks; admitting such students directly into university programs could result in failure. These results suggest that, for most students at lower levels of basic comprehension (in our study, those with TOEFL scores below 490 to 500), development of new tasks is unnecessary. Nevertheless, our results indicate that a certain small percentage (in our sample, 8 students, 18.2%) of nonnative speakers classified as low reading ability on basic comprehension did in fact perform adequately on more challenging tasks like Reading to Learn and Reading to Integrate. Given the very large number of TOEFL test-takers around the world, this finding merits further research. It would be potentially unfair to exclude such students from university study based on basic comprehension-only measures such as the current TOEFL. We would urge institutions that use TOEFL to consider scores on these new, more demanding tasks if doing so would address institutional needs. According to interview data collected in conjunction with this project (Trites, 2000: Chapter 6), participants perceived that successful completion of the new tasks required a thorough understanding of the texts, whereas the multiple-choice tests were less demanding, requiring only superficial grasp of content. This finding corroborates the observation of Freedle and Kostin (1999: 3), who note that often examinees do not need to comprehend the accompanying text to answer the test item. Therefore, for all TOEFL test-takers, incorporating more challenging tasks such as the Reading to Learn and Reading to Integrate measures into typical English language instruction could have positive washback effects.
The second area of consideration is the focus on additional relevant research. For predictive validity, it is important to determine the correlation between the Reading to Learn/Reading to Integrate Composite measures and actual academic performance in university classes. Correlations might differ depending on the degree of categorization and synthesis required by different major fields of study. Another area for future research is related to the novelty of the Reading to Learn task as an assessment technique. Current pedagogical trends in literacy instruction emphasize the use of graphic organizers as a means to understand and manipulate information in texts. To our knowledge, graphic organizers are typically used as classroom activities rather than assessment tools. They have good potential for use in assessment if students are familiar with them and if appropriate scoring systems can be developed. This project demonstrates that it is possible, though labor intensive, to develop reliable scoring systems. Further research is needed to explore refinement of this task type and related scoring systems for use in testing programs with large numbers of test-takers and scorers. Although the Reading to Integrate task (generating a synthesis) was more familiar, the scoring system was innovative because it reflected a reader’s ability to recognize textual frames as well as integrate information. If test developers are interested in the abilities of nonnative speaker readers to perform such tasks, further development of similar tasks and scoring systems is warranted. While a substantial investment of time would be required to refine the administration and scoring systems needed for more complex tasks such as these, it should be weighed against the possible danger of under-representing the likelihood of academic success based on results of the basic comprehension-only measures still most often used to assess academic proficiency.
Acknowledgements
This project was funded by a grant from Educational Testing Service as part of the TOEFL 2000 effort. We appreciate the cooperation of the ETS staff members who assisted in the development of passage-specific tasks and other aspects of the research; however, no official endorsement of Educational Testing Service should be inferred.
VII References
Bachman, L. 2000: Modern language testing at the turn of the century: assuring that what we count counts. Language Testing 17, 1–42.
Biber, D. 1993: Using register-diversified corpora for general language studies. Computational Linguistics 19, 219–41.
Britt, M., Rouet, J. and Perfetti, C. 1996: Using hypertext to study and reason about historical evidence. In Rouet, J., Levonen, J., Dillon, A. and Spiro, R., editors, Hypertext and cognition. Mahwah, NJ: Lawrence Erlbaum, 43–72.
Chapelle, C. 1999: Validity in language assessment. Annual Review of Applied Linguistics 19, 1–19.
Educational Testing Service 1997: TOEFL test and score manual. Princeton, NJ: Educational Testing Service.
Educational Testing Service 1998: Draft TOEFL 2000 research agenda framework: areas of research. Research agenda three. TOEFL 2000 internal document. Princeton, NJ: Educational Testing Service.
Eignor, D., Taylor, C., Kirsch, I. and Jamieson, J. 1998: Development of a scale for assessing the level of computer familiarity of TOEFL examinees. TOEFL Research Report No. 60. Princeton, NJ: Educational Testing Service.
Enright, M. and Schedl, M. 1999: Reading for a reason: using reader purpose to guide test design. TOEFL 2000 Internal Report. Princeton, NJ: Educational Testing Service.
Enright, M., Grabe, W., Mosenthal, P., Mulcahy-Ernt, P. and Schedl, M. 1998: A TOEFL 2000 framework for testing reading comprehension: a working paper. Princeton, NJ: Educational Testing Service.
Foltz, P. 1996: Comprehension, coherence and strategies in hypertext. In Rouet, J., Levonen, J., Dillon, A. and Spiro, R., editors, Hypertext and cognition. Mahwah, NJ: Lawrence Erlbaum, 109–36.
Freedle, R. and Kostin, I. 1999: Does the text matter in a multiple choice test of comprehension? The case for the construct validity of TOEFL’s minitalks. Language Testing 16, 2–32.
Goldman, S. 1997: Learning from text: reflections on the past and suggestions for the future. Discourse Processes 23, 357–98.
Hayes, J. and Hatch, J. 1999: Issues in measuring reliability: correlation versus percentage of agreement. Written Communication 16, 354–67.
Huberty, C. 1994: Applied discriminant analysis. New York: John Wiley.
Jamieson, J., Campbell, J., Norfleet, L. and Berbisada, N. 1993: Reliability of a computerized scoring routine for an open-ended task. System 21, 305–22.
Jamieson, J., Norfleet, L. and Berbisada, N. 1993: Successes, failures and dropouts in computer-assisted language lessons. Computer Assisted English Language Learning Journal 4, 12–20.
Klecka, W. 1980: Discriminant analysis. In Lewis-Beck, M., editor, Quantitative applications in the social sciences, Volume 19. Newbury Park, CA: Sage.
Lehto, M., Zhu, W. and Carpenter, B. 1995: The relative effectiveness of hypertext and text. International Journal of Human-Computer Interaction 7, 293–313.
McNamara, D. and Kintsch, W. 1996: Learning from texts: effects of prior knowledge and text coherence. Discourse Processes 22, 247–88.
Messick, S. 1989: Validity. In Linn, R.L., editor, Educational measurement, 3rd edition. New York: American Council on Education/Macmillan, 13–103.
Meyer, B. 1985a: Prose analyses: purposes, procedures and problems. In Britton, B. and Black, J., editors, Understanding expository text. Hillsdale, NJ: Lawrence Erlbaum, 11–64.
Meyer, B. 1985b: Prose analysis: purposes, procedures and problems: Part 2. In Britton, B. and Black, J., editors, Understanding expository text. Hillsdale, NJ: Lawrence Erlbaum, 269–97.
Monks, V. 1997, August/September: Two views, same waterway. National Wildlife 35, 36–37.
Pellegrino, J., Baxter, G. and Glaser, R. 1999: Addressing the ‘two disciplines’ problem: linking theories of cognition and learning with assessment and instructional practice. Review of Research in Education 24, 307–53.
Perfetti, C. 1997: Sentences, individual differences and multiple texts: three issues in text comprehension. Discourse Processes 23, 337–55.
Perfetti, C., Britt, M.A. and Georgi, M. 1995: Text-based learning and reasoning: studies in history. Hillsdale, NJ: Lawrence Erlbaum.
Perfetti, C., Marron, M. and Foltz, P. 1996: Sources of comprehension failure: theoretical perspectives and case studies. In Cornoldi, C. and Oakhill, J., editors, Reading comprehension difficulties: processes and intervention. Mahwah, NJ: Lawrence Erlbaum, 137–65.
Reinking, D. 1988: Computer-mediated text and comprehension differences: the role of reading time, reader preference and estimation of learning. Reading Research Quarterly 23, 484–500.
Reinking, D. and Schreiner, R. 1985: The effects of computer-mediated text on measures of reading comprehension and reading behavior. Reading Research Quarterly 20, 536–53.
Spivey, N. 1997: The constructivist metaphor: reading, writing and the making of meaning. San Diego, CA: Academic Press.
Stevens, J. 1996: Applied multivariate statistics for the social sciences, 3rd edition. Mahwah, NJ: Lawrence Erlbaum.
Tabachnick, B. and Fidell, L. 1996: Using multivariate statistics, 3rd edition. New York: Harper Collins.
Taylor, C., Jamieson, J., Eignor, D. and Kirsch, I. 1998: The relationship between computer familiarity and performance on computer-based TOEFL test tasks. TOEFL Research Report No. 61. Princeton, NJ: Educational Testing Service.
Tennesen, M. 1997, November/December: On a clear day. National Parks 71, 26–9.
Trites, L. 2000: Beyond basic comprehension: reading to learn and reading to integrate for native and non-native speakers. Unpublished doctoral dissertation, Northern Arizona University, Flagstaff, AZ.
Van den Berg, S. and Watt, J. 1991: Effects of educational setting on student responses to structured hypertext. Journal of Computer-Based Instruction 18, 118–24.
Van Dijk, T.A. and Kintsch, W. 1983: Strategies of discourse comprehension. New York: Academic Press.
Wiley, J. and Voss, J. 1999: Constructing arguments from multiple sources: tasks that promote understanding and not just memory for text. Journal of Educational Psychology 91, 310–11.
Zimmerman, T. 1997, December 29: Filter it with billions and billions of oysters: how to revive the Chesapeake Bay. US News and World Report 123, 63.
Appendix 1a Chart completion task
Directions: Complete the following chart. Fill in as much detail as possible from the text read by categorizing the information into the different areas on the chart. Do not use single words for your responses; form your responses in phrases or complete sentences. Include examples from the text.

Make no judgments about the accuracy of causes or effects or the effectiveness of the solutions mentioned in the text. Solutions are seen as any action taken in response to the problem(s). Solutions can take many forms, such as proposed solutions, attempted solutions or failed solutions. Also, space is provided under each category for examples. Examples are specific examples found in the text that are used by the author(s) to exemplify the problems, causes, effects or solutions in the text. However, there may not be examples for every category.

Points will be awarded for correct responses only. There is no penalty for incorrect responses. Points will be awarded in the following manner:

Problems and Solutions: 10 points each
Causes and Effects: 5 points each
Examples: 1 point each
Problems Causes Effects Solutions
Examples Examples Examples Examples
Appendix 1b Reading to Learn scoring rubric (0–241 total possible)

Problems (10 points; 1 word: 6 points); 40 maximum
• Air pollution/Smog in the National Parks
• Opposition or ignoring (or lack of cooperation) to environmental standards (regulations)
• No true ‘point sources’ (myriad of smaller pollution sources rather than one large source)
• National Parks limited jurisdiction of pollution sources outside of the parks

Causes (5 points; 1 word: 3 points); 55 maximum
• Sulfur/nitrogen emissions
• Acid rain
• Ground-level ozone
• Urban/Industrial emissions
• Automobile emissions
• Power plants/factories/plants emissions
• Emissions from smokestacks (chimneys)
• Emissions from kilns
• Unregulated pollution/smog (from other countries/Mexico)
• Smoke from controlled burns
• Emissions from large cities

Effects (5 points; 1 word: 3 points); 30 maximum
• Trees and plants affected (injured); growth hindered
• Visibility decreased
• Metals loosened into waters (surface/ground water polluted)
• Nutrients removed (leached) from soil and/or plants
• Public outcry over stance of big business/government
• Aquatic life injured (damaged)

Solutions (10 points; 1 word: 6 points); 90 maximum
• Stricter environmental laws (resolutions, acts)
• 1977 Clean Air Act Amendments labeling NPs as Class I areas
• Regional haze regulations proposed by the EPA
• Reduction of allowable pollution standards from industry
• Clean Air Act set visibility goals
• Objecting to construction permits
• Identification of pollution sources
• Industry modifications/Installation of devices (scrubbers) to reduce pollution
• Public pressure to protect parks

Appendix 1b (continued)

Problems: Examples (1 point); 5 maximum
• Great Smoky Mountain National Park
• Grand Canyon National Park
• Smoky Mountain sources from Ohio, New York, Atlanta, etc.
• Grand Canyon sources from CA, NV, UT, AZ, NM and Mexico
• TN Luttrell corp. building permit situation

Causes: Examples (1 point, or 5 points with overarching cause listed above)
• Automobile emissions
• Power plants/factories/plants emissions
• Emissions from smokestacks (chimneys)
• Emissions from kilns
• Unregulated pollution/smog (from other countries/Mexico)
• Smoke from controlled burns
• Emissions from large cities

Causes: Examples (1 point); 10 maximum
• Navajo Generating Station, Page, AZ (Power Plant, AZ)
• Power plants in TN and OH river valley
• Southern California Edison Plant, Laughlin, NV
• Tennessee Luttrell Kilns, TN
• Industry in AZ
• Cars in CA
• Smokestacks in NM, NV, UT
• Los Angeles
• Atlanta
• New York

Effects: Examples (1 point); 6 maximum
• Leaves of plants turning purple and brown (stippling)
• Hindering photosynthesis of plants and trees
• Reduction of visibility at Grand Canyon
• Visibility reduced 90% of the days
• See vague blue masses
• See half as far as in 1919

Solutions: Examples (1 point, or 10 points with overarching solution listed above)
• 1977 Clean Air Act Amendments labeling NPs as Class I areas
• Regional haze regulations proposed by the EPA
• Reduction of allowable pollution standards from industry

Solutions: Examples (1 point); 5 maximum
• Objecting to kilns in TN
• Researchers using scientific technology to ID sources (radioactive isotopes)
• Southern California Edison Plant in Laughlin, NV identified as point source
• Scrubbers installed at Navajo Generating Plant
• 10% reduction of pollution in 10–15 years (goal of no man-made pollution)
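The arithmetic behind the 241-point maximum can be checked with a short sketch. It uses only the point values and category maxima listed in Appendices 1a and 1b; the function names, the input layout (counts of correct responses per category) and the simple capping logic are illustrative assumptions, not the scoring guide the raters actually used. The rule that awards a cause or solution example extra points when an overarching cause or solution is listed is omitted here.

```python
# Hypothetical sketch of the Reading to Learn chart arithmetic.
# Tuple layout: (points per phrase/sentence response, points per one-word
# response, category maximum), as listed in Appendix 1b.
CATEGORY_RULES = {
    "problems": (10, 6, 40),
    "causes": (5, 3, 55),
    "effects": (5, 3, 30),
    "solutions": (10, 6, 90),
}

# Maximum points available for examples (1 point each) in each category.
EXAMPLE_CAPS = {"problems": 5, "causes": 10, "effects": 6, "solutions": 5}


def score_chart(counts):
    """Total a chart from counts of correct responses per category.

    `counts` maps a category name to a dict with the assumed keys
    "full", "one_word" and "examples"; missing keys count as zero.
    """
    total = 0
    for category, (full_pts, word_pts, cap) in CATEGORY_RULES.items():
        entry = counts.get(category, {})
        earned = full_pts * entry.get("full", 0) + word_pts * entry.get("one_word", 0)
        total += min(earned, cap)  # category responses cannot exceed the maximum
        total += min(entry.get("examples", 0), EXAMPLE_CAPS[category])
    return total


def theoretical_maximum():
    """Sum of all category maxima plus all example maxima."""
    category_total = sum(cap for _, _, cap in CATEGORY_RULES.values())
    return category_total + sum(EXAMPLE_CAPS.values())
```

Under these assumptions the category maxima sum to 215 and the example maxima to 26, which reproduces the 241 total stated in the rubric heading.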
Appendix 2a Reading to Integrate task integration activity
Directions: For the next 15 minutes, reflect on what you read and compose a short essay that combines the information in the material read and makes connections across the range of ideas and arguments presented.

[Lined space for the written response]
Appendix 2b Reading to Integrate scoring rubric

Integration

50 Excellent: Integrates texts accurately and successfully on multiple levels and creates a true Documents Model. Generalizes at least two macrostructure concepts common across texts (this may be simply identifying the existence of a macrostructure followed by the supporting macrostructures from each text). Effectively integrates relevant support (i.e., details or macrostructures) to support generalizations and may still discuss each article separately to some degree. Integrates macrostructures present in both texts.

40 Good: Integrates through the creation of a well developed and accurate introduction and/or a conclusion, yet summarizes articles separately AND/OR generalizes one major macrostructure that is fully developed with support. Uses some relevant support.

30 Fair: Attempts to integrate through the creation of a partially developed, possibly inaccurate introductory AND/OR concluding statement, usually related to the main topic of the articles, but has no substantive development. Creates a Text and Situation Model of each text separately. Does not make generalizations for integration beyond the one statement. May make evaluative or editorial statements; however, these may be inaccurate.

20 Poor: Creates a well developed and accurate Text Model and Situation Model for each of the texts, yet attempts no integrative connection across the texts.

10 Very Poor: Ineffectively or inaccurately attempts to summarize each article in a very reporting style, revealing little or no use of background knowledge, contextualization or evaluation. Response is simply a recall of information from the texts. No introduction or conclusion OR participant confuses texts and sees separate texts as one issue (problem).

0 No Response: Participant does not attempt response or only addresses one article.

Macrostructures

Participants will receive one point for each macrostructure accurately identified in each text, with a minimum score of 4 awarded for the inability to accurately identify any macrostructures.

25 Excellent (Total 8): Accurately identifies all 4 macrostructures present in both texts.

20 Good (Total 6–7): Accurately identifies all 4 macrostructures in one text and 3 in the other OR accurately identifies 3 of the four macrostructures in both texts. (Or 4 in one and 2 in the other.)

15 Fair (Total 4–5): Accurately identifies 3 of the four macrostructures in one text and 2 of the four in the other. (Or 4 in one and 1 in the other.) OR accurately identifies 2 of the four macrostructures in both texts. (Or 4 in one and 0 in the other, or 3 in one and 1 in the other.)

10 Poor (Total 2–3): Accurately identifies 2 of the macrostructures in one text and 1 in the other. (Or 3 in one and 0 in the other.) Accurately identifies 1 of the four macrostructures in both texts. (Or 2 in one and 0 in the other.)

5 Very Poor (Total 0–1): Accurately identifies 1 of the four macrostructures in one text and 0 in the other OR unable to accurately identify any macrostructures in the texts.

0 No Response: Participant does not attempt response.

Use of relevant details

5 Excellent: Effectively uses multiple relevant details as support; no irrelevant or erroneous details.

4 Good: Effectively uses relevant details as support; may include some inaccurate or irrelevant details. (More than 50% of details are relevant.)

3 Fair: Possibly uses one or two relevant details, yet several irrelevant details or inaccurate details (erroneous or fabricated) appear in the synthesis. (50% or less of details are relevant.)

2 Poor: Erroneous or fabricated details used in attempt to support arguments presented; does not include relevant details.

1 Very Poor: No details used or listed.

0 No Response: Participant does not attempt response.
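The way the three subscores combine into the 0 to 80 analytic scale can be sketched as follows. The band boundaries follow the macrostructure totals in the rubric above; the function names, argument names and validation are hypothetical and only illustrate the arithmetic, since the integration and detail subscores themselves depend on rater judgment.

```python
def macrostructure_score(total_identified, responded=True):
    """Map the number of macrostructures identified (0 to 8 across both
    texts) to the rubric points for the Macrostructures subcategory."""
    if not responded:
        return 0
    if not 0 <= total_identified <= 8:
        raise ValueError("total_identified must be between 0 and 8")
    if total_identified == 8:
        return 25  # Excellent
    if total_identified >= 6:
        return 20  # Good
    if total_identified >= 4:
        return 15  # Fair
    if total_identified >= 2:
        return 10  # Poor
    return 5       # Very Poor: 0 or 1 identified, but a response was attempted


def integrate_total(integration, total_identified, details, responded=True):
    """Combine the three rater-assigned subscores into the analytic total."""
    if not responded:
        return 0
    if integration not in (0, 10, 20, 30, 40, 50):
        raise ValueError("integration must be one of 0, 10, 20, 30, 40, 50")
    if details not in (0, 1, 2, 3, 4, 5):
        raise ValueError("details must be an integer from 0 to 5")
    return integration + macrostructure_score(total_identified) + details
```

With the highest band in each subcategory (50, 25 and 5) the total is 80, the top of the analytic scale.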
lengths were long enough for adequate argumentation but not so long that they were excessively redundant (Enright et al., 1998). Texts were also matched for readability according to standard readability scales such as the Flesch–Kincaid, Coleman–Liau and Bormuth scales, and averaged a minimum of grade level 11.0 to 12.0 on these scales. Also, all texts pertained to natural and social sciences; each text covered environmental issues such as air and water pollution (Enright et al., 1998). Thus, text topics were similar across tasks.
c New instruments used in the study: Three new reading measures were used in this study to assess Reading to Learn, Reading to Integrate and Basic Comprehension. Trites (2000: Chapters 2 and 3) presents a more extensive review of literature and rationale for development of the new measures.
• Reading to Learn: The first new measure, completion of a chart, was used to determine participants’ ability to read to learn. Spivey (1997: 69) suggests that readers’ categorization of information in text offers insight into their cognitive processes and their making of meaning. We designed a measure to be used with a 1200-word text that students read on either paper or computer. Students were asked to recall, identify and categorize information from the text on a chart reflecting macro-rhetorical structures, called macrostructures in this study (problems and solutions), and other types of information from problem/solution texts (causes, effects and examples), categories based on the work of Meyer (1985a). The scoring rubric, based on work by Meyer (1985b) and later modified by Jamieson et al. (1993), awarded points only for the upper levels of textual structure represented on the chart (for task and scoring rubric, see Appendix 1). We weighted the information supplied on the chart as follows: 10 points for correct information in the problem and solution categories, five points for correct information supplied in the cause and effect categories, and one point for accurate examples. This weighting reflects Meyer’s (1985b) hierarchical levels, which characterize problem and solution propositions as higher order structures, while the other categories represent lower order propositions.¹ The theoretical maximum score for this scale was 241, which would result from maximum points given in all categories. The first author and two research assistants spent 35–40 hours creating, revising and norming the scoring rubric and developing the scoring guide (Trites, 2000: Chapter 3). To determine interrater reliability, we used coefficient alpha rather than percentage of agreement because percentage of agreement inflates the likelihood of chance agreement (Hayes and Hatch, 1999). After norming, overall interrater reliability was .99 (coefficient alpha), with similarly high alpha coefficients for all subcategories.²

• Reading to Integrate: The second new measure assessed Reading to Integrate. The task used to assess Reading to Integrate required participants to read two 600-word texts and compose a written synthesis. The prompt asked students to make connections across the range of ideas presented; thus, we asked readers to synthesize information rather than summarize or make comparisons (Wiley and Voss, 1999). This synthesis was scored based on an analytic scale ranging from 0 to 80, reflecting readers’ ability to recognize and manipulate the structure of the texts, include specific information and express connections across texts through the use of cohesive devices (for task and scoring rubric, see Appendix 2). The test was designed to measure the integration of content from both readings and did not assess other aspects of writing such as the creation of rhetorical style, grammaticality or mechanics. The rubric was composed of three subcategories: integration ability, macrostructure recognition and use of relevant details. The integration subscore was awarded the highest point values because this was the predominant skill being tested. It scored participants on their ability to make connections across texts based on the manipulation of the textual frames in both texts. The second subcategory awarded points for the ability to recognize and articulate the macrostructures (problem, cause, effect or solution) present in each text. This subcategory was similar to the categorizing task used in the Reading to Learn measure, with the additional constraint that participants had to express the connections overtly. The third subcategory in the scoring rubric analysed the ability to use relevant details as support in the written synthesis. The first author and two research assistants spent 30 hours revising and norming the scoring rubric and developing a decision guide, resulting in an overall interrater reliability of .99 (coefficient alpha), with similarly high alphas for all subcategories.

• Basic Comprehension: The third construct was measured by multiple-choice tests related specifically to the texts used in the new tasks. These tests were created by TOEFL Test Development staff and followed current TOEFL reading section specifications. We used two multiple-choice tests, Basic Comprehension Test 1 (BC1) and Basic Comprehension Test 2 (BC2), of 20 items each: one for the longer passage used to assess Reading to Learn and one for the two passages used to assess Reading to Integrate. Both were scored based on number of items answered correctly. Reliability on BC1, calculated based on 251 participants, was .84 (coefficient alpha). Inadvertently, the order of the texts used in BC2 was different for the two different media; however, reliability on both versions of the test was high. For those who took BC2 based on paper texts (n = 127), reliability was .84 (coefficient alpha); for those who took BC2 based on computerized texts (n = 124), reliability was .86 (coefficient alpha).

¹ Students received no points for information improperly placed or for information not found in the text.

² We recognize that tasks requiring high inference measures plus extensive norming and revision of the scoring rubric pose feasibility issues in large-scale testing. Further research is needed to determine whether and how such scoring procedures could be adapted in standardized testing for numerous test-takers.
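For readers unfamiliar with coefficient alpha as an interrater statistic, the computation can be sketched as below. This is the standard Cronbach formula with each rater treated as an "item"; the function name and the list-of-lists input layout are assumptions for illustration, and nothing here reproduces the study's actual scoring data.

```python
from statistics import variance


def coefficient_alpha(ratings):
    """Cronbach's coefficient alpha with raters treated as items.

    `ratings` is a list with one inner list per rater; each inner list
    holds that rater's scores for the same responses, in the same order.
    """
    k = len(ratings)
    if k < 2:
        raise ValueError("at least two raters are required")
    n = len(ratings[0])
    if n < 2 or any(len(r) != n for r in ratings):
        raise ValueError("each rater must score the same two or more responses")

    # Variance of the summed scores each response received across raters.
    totals = [sum(scores) for scores in zip(*ratings)]
    total_var = variance(totals)
    if total_var == 0:
        raise ValueError("summed scores have no variance")

    # Sum of each rater's own score variance.
    rater_var_sum = sum(variance(r) for r in ratings)
    return k / (k - 1) * (1 - rater_var_sum / total_var)
```

Alpha approaches 1 when raters rank and space the responses alike, which is why it is less sensitive to the chance-agreement problem than a raw percentage of identical scores.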
3 Design for data collection
This study used a 2 × 2 repeated measures design to examine performance on the new reading tasks. Native speaker undergraduates and nonnative speaker undergraduates were each divided into two groups of equal ability, as determined by performance on the baseline standardized measures of reading comprehension (Nelson–Denny or TOEFL). Half of each group read texts on paper; the other half read the same texts on a computer screen. A smaller group of nonnative speaker graduates, equally divided, was also included for a comparison between performance by graduate and undergraduate nonnative speakers. Additionally, the administration of the new measures was counterbalanced to control for any practice effect.
a Procedures: All participants met with the researchers in four sessions, each lasting about an hour. The first two sessions were devoted to administering the existing instruments. During Session 1, participants received an introduction to the study and took one of the two standardized basic reading comprehension measures (Nelson–Denny or TOEFL Reading Comprehension). Students completed the computer familiarity questionnaire and the Nelson–Denny Test at the same testing session because the Nelson–Denny was shorter than the TOEFL Reading Comprehension. During Session 2, participants took the other standardized basic reading comprehension measure.
Next, each participant group was subdivided into two subgroups for computer-based or paper reading of the texts for the new tasks. The subgroups were matched on their performance on initial reading measures; the Nelson–Denny was used for native speakers and the TOEFL Reading Comprehension was used for nonnative speakers. Independent t-tests run on these reading measures showed no significant difference in basic comprehension for the newly created subgroups assigned to each medium, ensuring that they were balanced for initial reading levels. Participants stayed in the same subgroups for the duration of the study. To ensure uniformity of response mode, all participants, whether they read the source texts on the computer or on paper, responded to the reading tasks using paper and pencil format.³
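The matching-then-checking logic described above can be illustrated with a small sketch. The text does not say how the subgroups were formed, so the ranked ABBA assignment below is purely an assumed procedure, and the function and variable names are hypothetical; the t statistic is the ordinary pooled-variance formula for two independent samples.

```python
from math import sqrt
from statistics import mean, variance


def assign_matched_subgroups(baseline):
    """Split participants into two subgroups of similar baseline ability.

    `baseline` maps participant id -> baseline reading score. Participants
    are ranked by score and dealt out in an ABBA pattern (an assumption).
    """
    ranked = sorted(baseline, key=baseline.get, reverse=True)
    paper, computer = [], []
    for position, participant in enumerate(ranked):
        # Positions 0, 3, 4, 7, ... go to paper; 1, 2, 5, 6, ... to computer.
        target = paper if position % 4 in (0, 3) else computer
        target.append(participant)
    return paper, computer


def independent_t(sample_a, sample_b):
    """Pooled-variance t statistic for two independent samples."""
    n_a, n_b = len(sample_a), len(sample_b)
    pooled = ((n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)) / (
        n_a + n_b - 2
    )
    return (mean(sample_a) - mean(sample_b)) / sqrt(pooled * (1 / n_a + 1 / n_b))


# Invented baseline scores, for illustration only.
baseline_scores = {"p1": 60, "p2": 58, "p3": 55, "p4": 52,
                   "p5": 50, "p6": 47, "p7": 45, "p8": 40}
paper_group, computer_group = assign_matched_subgroups(baseline_scores)
t_statistic = independent_t(
    [baseline_scores[p] for p in paper_group],
    [baseline_scores[p] for p in computer_group],
)
```

A t statistic near zero on the baseline measure, as in this toy example, is the pattern that would support treating the two media subgroups as balanced for initial reading level.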
The last two sessions, each lasting approximately one hour, were dedicated to administration of the new measures. The Reading to Learn session took slightly longer to administer because administrative procedures were longer for this novel task. The new tasks were counterbalanced to control for practice effect; thus, half of the participants took the Reading to Learn measure first and half took the Reading to Integrate measure first. During Session 3, we administered the first new measure (for ease of discussion, Reading to Learn is discussed first) and BC1. At this session, students were given 12 minutes to read a 1200-word passage, either on computer or on paper. We limited the time allowed for reading based on 100 words per minute, thought to be ample (Grabe, personal communication, 1998). After examinees read the text, they were given 4 minutes to take notes on a half sheet of paper. Participants were instructed to take minimal notes due to the time constraints. Next, the text was removed and examinees were allowed 15 minutes to complete a chart based on the reading with the aid of their notes. After completing this Reading to Learn activity, participants were allowed to use the text and were given 15 minutes to answer BC1. Following these new testing sessions, 49 participants were selected for a related interview concerning the cognitive processes used in task completion (for further details, see Trites, 2000: Chapter 6).

³ Although responses could have been entered and perhaps scored by computer, this would have introduced factors not directly related to our research questions and remains an area for further study.
During Session 4, students were given 12 minutes to read two short texts (600 words each), either on computer or paper. After participants read the assigned texts, they were given 4 minutes to take one-half page of notes (Enright et al., 1998). Next, the texts were removed and participants were asked to demonstrate Reading to Integrate by writing a synthesis of the texts with the aid of their notes (15 minutes allowed for this task). After completing the Reading to Integrate task, participants were allowed to see the texts again and answered BC2 (15 minutes allowed for this task). In one Reading to Integrate session, for unknown reasons, six of the seven participants read only one text. Because we cannot explain the cause of this anomalous session, we have eliminated scores from the session’s seven participants from subsequent analyses, thus slightly reducing the N size for the Reading to Integrate measure.
b Variables used in study: The six independent variables included three nominal variables (Native Language Background, Medium of Text Presentation and Level of Education) and three interval variables (Nelson–Denny, TOEFL Reading Comprehension and Computer Familiarity). The four dependent variables were Reading to Learn, Reading to Integrate, BC1 and BC2.
IV Results
First, we present the descriptive statistics for all reading measures, followed by a systematic analysis of independent variables that might affect participant performance on the new measures. Scatterplots were checked for all reading measures to ensure normality of data. Kurtosis and skewness levels for all reading measures were found to be within normal limits, indicating a relatively normal distribution. Descriptive statistics for all existing measures are shown in Table 1. Means for these measures show a consistent pattern: the native speaker undergraduates had the highest mean, followed by the nonnative speaker graduates, followed by the nonnative speaker undergraduates. On the reading measures (Nelson–Denny and TOEFL Reading Comprehension), the nonnative speaker undergraduates
showed the largest variance in performance, while on the computer familiarity measure the variance of both nonnative speaker groups was substantially larger than that of the native speakers.
The same pattern emerged for the means on the new measures (see Table 2) as for the existing measures. The native speaker undergraduate group performed better on all new measures than both of the nonnative speaker groups. The nonnative speaker graduate group performed better than the nonnative speaker undergraduate group on all measures as well. This robust pattern of performance was also found in the variance of three of the four new measures. On BC1 and BC2, the performance of the native speaker undergraduates showed the least amount of variance, followed by the nonnative speaker graduates, followed by the nonnative speaker undergraduates. On Reading to Integrate, the native speaker undergraduate group showed substantially less variance than the nonnative speaker groups; however, the variance of the two nonnative speaker groups was almost identical. On Reading to Learn, all three groups showed considerable variance.
Table 3 reveals the range of awarded points achieved by all participant groups. The nature of the Reading to Learn point system created a maximum possible point value (241) that no participant achieved. We speculate that there are at least three possible causes of the discrepancy between the theoretical maximum and the range of observed scores:
Table 1 Descriptive statistics for existing measures for three participant groups

Group                    n      Mean      sd       kMax

Nelson–Denny
  NSU                    105    126.48    16.46    156
  NNSU                   106     67.24    31.91    156
  NNSG                    40     88.88    21.88    156
  Total participants     251     95.47    36.93    156

TOEFL Reading Comprehension
  NSU                    105     61.30     4.24     67
  NNSU                   106     50.30     8.53     67
  NNSG                    40     57.15     4.55     67
  Total participants     251     55.99     8.19     67

Computer familiarity
  NSU                    104     38.08     3.60     44
  NNSU                   104     34.82     5.99     44
  NNSG                    40     35.63     6.02     44
  Total participants     248     36.31     5.33     44

Note: kMax = number of items or maximum possible score.
• task novelty: no participant reported ever doing such a task previously;
• time allowed for task completion; and
• space on the response sheet: space constraints may have limited the amount of information that participants could include.
Future research would need to address these issues. However, for the Reading to Integrate measure, the full range of possible point totals was achieved by at least one participant in each group.
1 Computer familiarity
The overall plan for the analyses was to check the influence of the independent variables on the dependent measures, with computer familiarity being addressed first. Initially, we had proposed that, if computer familiarity was significantly different across groups, it would be entered into all calculations as a covariate. To determine this, it was necessary to conduct an Analysis of Variance (ANOVA) for computer familiarity across the six participant/medium subgroups.
Table 2 Descriptive statistics for new measures for three participant groups

Group                    n      Mean     sd       kMax

Reading to Learn (chart)
  NSU                    105    51.85    19.86    241
  NNSU                   106    31.73    19.50    241
  NNSG                    40    44.68    19.27    241
  Total participants     251    42.21    21.64    241

Basic Comprehension Test 1
  NSU                    105    16.98     2.47     20
  NNSU                   106    11.73     4.25     20
  NNSG                    40    14.98     3.50     20
  Total participants     251    14.44     4.23     20

Reading to Integrate (synthesis)
  NSU                    101    63.65    11.05     80
  NNSU                   103    37.24    21.76     80
  NNSG                    40    53.60    21.03     80
  Total participants     244    50.86    21.63     80

Basic Comprehension Test 2
  NSU                    105    15.91     2.78     20
  NNSU                   106     9.75     4.54     20
  NNSG                    40    12.85     3.61     20
  Total participants     251    12.82     4.72     20

Notes: kMax = number of items or maximum possible score; n size reduced for Reading to Integrate because of anomalous testing session.
The resulting ANOVA (F = 4.70, p < .05) showed a significant difference between subgroups on the computer familiarity questionnaire; therefore, a post hoc Scheffé test was done to locate significant contrasts. After analysis of all possible subgroup contrasts, the post hoc Scheffé revealed that the only significant difference in subgroups appeared between the native speaker undergraduates and nonnative speaker undergraduates who read texts on paper. Hence, although there was one significant contrast, it occurred in two subgroups reading on paper, not in any of the subgroups who read on computer. All groups generally scored high on computer familiarity although, as noted, variance of the nonnative groups was greater. It was thus established that computer familiarity had no significant effect on participants who read texts on computer, so we did not use computer familiarity as a covariate in further analyses and proceeded to the three research questions of central interest to this study.
Because Research Questions 1 and 2 are similar – except that they address the two different new reading measures, Reading to Learn and Reading to Integrate – we approached them in the same manner, through ANOVA, to identify the independent variables that could have significantly affected the results on the new measures.
2 Research Question 1
The first research question asked if performance on a measure of Reading to Learn was affected by medium of presentation, computer familiarity, native language or level of education. We calculated a univariate ANOVA with Type III sums of squares on Reading to Learn with
Table 3 Range of scores for new measures for three participant groups

Group                    n      Minimum    Maximum    kMax

Reading to Learn (chart)
  NSU                    105    14         120        241
  NNSU                   106     0          86        241
  NNSG                    40     3          94        241
  Total participants     251     0         120        241

Reading to Integrate (synthesis)
  NSU                    101    38          80         80
  NNSU                   103     0          80         80
  NNSG                    40     5          80         80
  Total participants     244     0          80         80

Notes: kMax = number of items or maximum possible score; n size reduced for Reading to Integrate because of anomalous testing session.
group status, medium of text presentation and test order as possible contributing factors.⁴ Table 4 shows that there were no significant interactions for any of the group, medium or test order combinations. The only significant main effect was group membership.
Because group membership was a combined measure that included both native language background and level of education, post hoc analysis was needed to identify the significant contrasts. Table 5 shows that there was a significant difference in performance on the Reading to Learn measure between the native speaker undergraduate and the nonnative speaker undergraduate groups, as well as a significant difference between the nonnative speaker undergraduate and nonnative speaker graduate groups. There was no significant difference in performance between the native speaker undergraduate and the nonnative speaker graduate groups. Therefore, the answer to Research Question 1 is that native language background and level of education did have a significant effect on performance on the Reading to Learn measure, but that medium of text presentation did not. Further, order of testing (whether participants took Reading to Learn or Reading to Integrate first) had no significant effect.
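The F ratios reported in Table 4 can be recovered from its sums of squares: each effect's mean square is its Type III sum of squares divided by its degrees of freedom, and F is that mean square divided by the error mean square. The sketch below is only a check on the tabled arithmetic, not the authors' analysis code; the names are ours, and the Table 4 entries are read with two decimal places, which is an assumption about the original typesetting.

```python
# Type III sums of squares and degrees of freedom as read from Table 4.
# Assumption: every tabled value carries two decimal places.
EFFECTS = {
    "Group": (21869.29, 2),
    "Medium": (1215.86, 1),
    "Test order": (294.98, 1),
    "Group x medium": (334.81, 2),
    "Group x test order": (4.37, 2),
    "Medium x test order": (391.73, 1),
    "Group x medium x test order": (575.29, 2),
}
ERROR_SS, ERROR_DF = 92997.45, 239


def f_ratios(effects, error_ss, error_df):
    """Return {effect: F}, where F = (SS / df) / (error SS / error df)."""
    ms_error = error_ss / error_df
    return {name: (ss / df) / ms_error for name, (ss, df) in effects.items()}


if __name__ == "__main__":
    for name, f_value in f_ratios(EFFECTS, ERROR_SS, ERROR_DF).items():
        print(f"{name:30s} F = {f_value:6.2f}")
```

Under that reading, the recomputed ratios agree with the tabled F column to two decimal places, with group membership far larger than any other effect.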
3 Research Question 2
The second research question, related to the first, asked if performance on Reading to Integrate was affected by medium of presentation,
Table 4 Performance on Reading to Learn measure by groups, medium and test order (n = 251) (univariate analysis of variance)

Source                           Type III sum    df     Mean square    F
                                 of squares

Group                            21869.29          2    10934.64       28.10*
Medium                            1215.86          1     1215.86        3.13
Test order                         294.98          1      294.98        0.76
Group × medium                     334.81          2      167.40        0.43
Group × test order                   4.37          2        2.19        0.01
Medium × test order                391.73          1      391.73        1.01
Group × medium × test order        575.29          2      287.65        0.74
Error                            92997.45        239      389.11

Note: *p < .05
⁴ Test order was added as an additional variable to double check that our counterbalancing had been effective in controlling for any practice effect.
computer familiarity, native language or level of education. Again, to ensure that counterbalancing of tests controlled for any practice effect, test order was added as an additional variable.
To answer this question, we calculated a univariate ANOVA on the Reading to Integrate measure with group status, medium of text presentation and test order entered as possible contributing factors. The results (Table 6) show, as for Research Question 1, that there were no significant interactions for any of the group, medium or test order combinations; the only significant main effect was group membership. The answer for Research Question 2 is that native language background and educational level had a significant effect on Reading to Integrate, but medium of text presentation did not. Post hoc analysis of group contrasts showed that all three groups were distinct in their performance on Reading to Integrate (see Table 7).
4 Research Question 3
The third research question asked to what extent measures of basic comprehension, Reading to Learn and Reading to Integrate were
Table 5 Post hoc Scheffé for Reading to Learn measure (n = 251)

Group    n      Group    n      Mean difference    Standard error

NSU      105    NNSU     106    20.12*             2.72
                NNSG      40     7.17              3.67
NNSU     106    NNSG      40    12.95*             3.66

Note: *p < .05
Table 6 Performance on Reading to Integrate measure by groups, medium and test order (n = 244ᵃ) (univariate analysis of variance)

Source                           Type III sum    df     Mean square    F
                                 of squares

Group                            36294.33          2    18147.17       55.82ᵇ
Medium                             192.95          1      192.95        0.59
Test order                          97.83          1       97.83        0.30
Group × medium                      11.82          2        5.91        0.02
Group × test order                  30.14          2       15.07        0.05
Medium × test order               1098.72          1     1098.72        3.38
Group × medium × test order       1488.58          2      744.29        2.29
Error                            75430.37        232      325.13

Notes: ᵃ n size reduced for Reading to Integrate because of anomalous testing session; ᵇ p < .05
related. We used correlational analysis as the first step in answering this question. Results for the total participant population (see Table 8) showed moderate to high correlations across all reading measures. However, the analyses done for Research Questions 1 and 2 revealed that group status had a significant effect on performance on Reading to Learn and Reading to Integrate. Further, we realize that correlations are sensitive to variance, so the high correlations seen in the total population could have been an artifact of combining the three groups. Therefore, we examined the correlations among all reading measures for each group (available in Trites, 2000: Appendix 1, pp. 230–33). While the reading measures were still correlated, often moderately, sometimes highly, magnitudes differed and sometimes dropped substantially. The text-specific multiple-choice measures, BC1 and BC2, consistently correlated more highly with the Nelson–Denny and TOEFL Reading Comprehension tests than with Reading to Learn and Reading to Integrate based on the same texts, suggesting a test method or construct effect. Because comparisons between different measures of basic comprehension were not a goal of the project, BC1 and BC2 were not used in further analyses. We conclude that, as expected, all reading measures were related, but the lower correlations between Reading to Learn and Reading to Integrate and the traditional basic comprehension measures led us to consider further types of analysis to identify the possible distinctiveness of the new measures.
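The concern that pooled correlations may be an artifact of combining groups can be illustrated with a small numerical sketch. The scores below are synthetic and chosen by us purely for illustration; they are not data from this study. Within each of three hypothetical groups the two measures are uncorrelated, yet because the groups differ in mean level on both measures, the pooled correlation is very high.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Synthetic pattern (hypothetical): uncorrelated within a group.
BASE_X = [0, 1, 2, 3]
BASE_Y = [2, 0, 3, 1]
# Each hypothetical group is shifted upward on both measures.
GROUP_OFFSETS = [0, 10, 20]

groups = [
    ([x + off for x in BASE_X], [y + off for y in BASE_Y])
    for off in GROUP_OFFSETS
]
within_group_r = [pearson(xs, ys) for xs, ys in groups]
pooled_r = pearson(
    [x for xs, _ in groups for x in xs],
    [y for _, ys in groups for y in ys],
)

if __name__ == "__main__":
    print("within-group r:", [round(r, 2) for r in within_group_r])
    print("pooled r:", round(pooled_r, 2))
```

This is the reason for inspecting correlations group by group before interpreting the values for the total population.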
5 Discriminant analysis
Because we were interested in determining how constructs differed, we sought additional analyses to help us better characterize the new constructs. Of the several possible statistical methods that could have been employed, two are most plausible: multivariate analysis of variance, usually associated with experimental research,
Table 7 Post hoc Scheffé for Reading to Integrate measure (n = 244ᵃ)

Group    n      Group    n      Mean difference    Standard error

NSU      101    NNSU     103    26.41ᵇ             2.53
                NNSG      40    10.05ᵇ             3.37
NNSU     103    NNSG      40    16.36ᵇ             3.36

Notes: ᵃ n size reduced for Reading to Integrate because of anomalous testing session; ᵇ p < .05
Table 8 Correlations for all reading measures for all participants (n = 251)

                               TOEFL Reading    Basic            Basic            Reading     Reading to
                               Comprehension    Comprehension    Comprehension    to Learn    Integrateᵃ
                                                Test 1           Test 2

Nelson–Denny                    .90ᵇ             .85ᵇ             .84ᵇ            .66ᵇ        .69ᵇ
TOEFL Reading Comprehension    1.00              .85ᵇ             .84ᵇ            .64ᵇ        .69ᵇ
Basic Comprehension 1                           1.00              .84ᵇ            .68ᵇ        .68ᵇ
Basic Comprehension 2                                            1.00             .68ᵇ        .70ᵇ
Reading to Learn                                                                 1.00         .59ᵇ

Notes: ᵃ n size reduced for Reading to Integrate because of anomalous testing session; ᵇ p < .05
and discriminant analysis, usually associated with descriptive research (Tabachnick and Fidell, 1996). The present research was conducted with samples of naturally occurring student groups and was not experimental. Moreover, we were interested in finding ways to compare participant performance on the measures of the new constructs, Reading to Learn and Reading to Integrate, with performance on more traditional measures of basic comprehension. Thus, we opted to use discriminant analysis because of its parsimony of description and clarity of interpretation (Stevens, 1996). Discriminant analysis, a technique recommended to describe group differences or predict group membership based on a comparison of multiple predictors (Huberty, 1994), has been used in other areas of applied linguistic research to investigate creation of a student profile of success or failure on Computer Assisted Language Learning (CALL) lessons (Jamieson et al., 1993) and accurate classification of text types into registers (Biber, 1993), among other purposes.
To further distinguish basic comprehension from the new constructs, we conducted discriminant analysis on each of the two language groups (native and nonnative) to determine whether Reading to Learn and Reading to Integrate would classify participants in the same way that Basic Comprehension would. We divided the native speaker and nonnative speaker groups into three levels – high, middle (mid) and low reading ability scorers – based on the basic comprehension measure chosen for that group (Nelson–Denny for native speakers, TOEFL Reading Comprehension for nonnative speakers). Research methodologists (Tabachnick and Fidell, 1996: 513) note that robustness is expected when the smallest group has at least 20 cases; our smallest group had 25. To check the assumption of homogeneity of the variance/covariance matrices, we examined the outcomes of Box's M test and found them all nonsignificant (Klecka, 1980). Each group was checked for outliers using Mahalanobis distance treated as Chi-Square, and no outliers were found (Tabachnick and Fidell, 1996). Thus, the data met all assumptions required for use of discriminant analysis.
a Discriminant analysis for nonnative speakers: To organize the discriminant analysis in order to see if the Reading to Learn/Reading to Integrate Composite classified participants similarly to the measure of Basic Comprehension (for nonnative speakers, TOEFL Reading Comprehension), we divided the entire nonnative speaker group (n = 146) into three levels of basic comprehension: high, mid
and low. These reading ability groups of high, mid and low were based on typical TOEFL Reading Comprehension score levels required for program entry. Participants were classified as high if their scores were greater than or equal to 550, or 56 and above on the scaled score on the TOEFL Reading Comprehension (550 is the cut score often used for graduate entry). Participants were classified as mid if scores ranged between 500 and 549, or 50 to 55 on the TOEFL Reading Comprehension. Participants were classified as low if their scores fell below 500, or 49 and below on TOEFL Reading Comprehension (500 is a minimum TOEFL score sometimes used for undergraduate admission, often with the proviso that students enroll in ESL classes either prior to official enrollment or concurrently). For our entire nonnative speaker group, descriptive statistics on basic comprehension reading ability group membership levels appear in Table 9.
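The three-way split described above amounts to a simple threshold rule on the scaled TOEFL Reading Comprehension score. A minimal sketch follows; the function name and string labels are ours and hypothetical, while the cut points (56 and above, 50 to 55, 49 and below) are those stated in the text.

```python
def toefl_reading_level(scaled_score: int) -> str:
    """Assign a reading ability group from a scaled TOEFL Reading score.

    Cut points follow the text: 56 and above is high, 50 to 55 is mid,
    and 49 and below is low.
    """
    if scaled_score >= 56:
        return "high"
    if scaled_score >= 50:
        return "mid"
    return "low"


if __name__ == "__main__":
    for score in (42, 49, 50, 55, 56, 60):
        print(score, toefl_reading_level(score))
```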
We ran SPSS Discriminant Analysis with initial grouping variables of high, mid and low reading ability. We compared initial reading ability levels with high, mid and low categories on the Reading to Learn/Reading to Integrate Composite, a new variable reflecting level of performance on the Reading to Learn and Reading to Integrate measures combined.⁵ The discriminant analysis yielded one discriminant function with an eigenvalue of .92, responsible for 99.9% of the variance in outcomes. Wilks' Lambda for this function was .52, significant at .001; the associated Chi-Square value was extremely large (90.93) and highly significant (p < .001), indicating that the group centroids on the composite Reading to Learn/Reading to Integrate function for the three nonnative speaker reading ability groups were significantly different. Both Reading to Learn and
Table 9 Descriptive statistics for nonnative speakers by TOEFL Reading Comprehension reading ability groups (n = 143)

Reading ability group    n      Mean on TOEFL Reading Comprehension    sd

High (≥56)                57    59.68                                  3.03
Mid (50–55)               42    52.81                                  1.67
Low (≤49)                 44    42.11                                  5.47
Total                    143    52.26                                  8.22

Note: Total n = 143 due to loss of three cases in anomalous Reading to Integrate session.
⁵ We first calculated two separate discriminant analyses, one for Reading to Learn and one for Reading to Integrate, but we found that both loaded on a single function, so we used the composite in subsequent analyses.
Reading to Integrate loaded significantly on the discriminant function, at .86 for Reading to Learn and .78 for Reading to Integrate (p < .05).
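The eigenvalue and Wilks' Lambda reported for the nonnative speaker analysis are linked by a standard identity: Lambda is the product of 1 / (1 + eigenvalue) over the discriminant functions, so a single function with a large eigenvalue gives a small Lambda. The sketch below is a consistency check only; it reads the reported figures as .92 and .52 (decimal points assumed), and the function name is ours.

```python
from math import prod


def wilks_lambda(eigenvalues):
    """Wilks' Lambda as the product of 1 / (1 + eigenvalue) over functions."""
    return prod(1.0 / (1.0 + ev) for ev in eigenvalues)


# Eigenvalue reported for the single nonnative speaker function
# (assumption: the reported value is .92).
NONNATIVE_EIGENVALUE = 0.92

if __name__ == "__main__":
    value = wilks_lambda([NONNATIVE_EIGENVALUE])
    print(f"Wilks' Lambda = {value:.2f}")
```

With an eigenvalue of .92 this gives about .52, matching the reported Lambda.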
Over half (64.9%) of the high reading ability group remained high on the new measure; less than half (42.9%) of the mid reading ability group remained classified as mid. Hence, the Reading to Learn/Reading to Integrate Composite was particularly influential in reclassifying the mid reading ability group and, to a lesser extent, the high group. However, most (81.8%) of the low reading ability group remained low on the Reading to Learn/Reading to Integrate Composite (see Table 10). Of the 143 nonnative speaker participants, 91 (64%) remained in the initial basic comprehension category on the composite; the rest moved, but in different directions. Twenty-one participants (14.7%) were classified into a higher category on the Reading to Learn/Reading to Integrate Composite than their initial basic comprehension level would have suggested, while 31 (21.7%) were reclassified into a lower category. Thus, 52 participants (36.4%), just over one third of the sample, were classified differently based on their Reading to Learn/Reading to Integrate Composite performance.
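The reclassification figures in this paragraph can be tallied directly from the counts in Table 10. In the sketch below, rows are the initial basic comprehension groups and columns the predicted groups on the composite, both ordered high, mid, low; the function and variable names are ours. Summing the diagonal gives 91 unchanged classifications and the off-diagonal cells give 52 changed ones, which is what the reported 64% and 36.4% correspond to.

```python
# Counts from Table 10. Rows: initial group (high, mid, low);
# columns: predicted group on the composite (high, mid, low).
TABLE_10 = [
    [37, 17, 3],
    [13, 18, 11],
    [2, 6, 36],
]


def reclassification_summary(matrix):
    """Count cases that stayed, moved to a higher group, or moved lower."""
    size = len(matrix)
    total = sum(sum(row) for row in matrix)
    same = sum(matrix[i][i] for i in range(size))
    # Column index below row index means a higher predicted category.
    higher = sum(matrix[i][j] for i in range(size) for j in range(i))
    lower = sum(matrix[i][j] for i in range(size) for j in range(i + 1, size))
    return {"total": total, "same": same, "higher": higher, "lower": lower}


if __name__ == "__main__":
    summary = reclassification_summary(TABLE_10)
    for key in ("same", "higher", "lower"):
        share = 100 * summary[key] / summary["total"]
        print(f"{key:6s} {summary[key]:3d}  ({share:.1f}%)")
```

The same function can be applied to any square classification table whose rows and columns share one ordering from highest to lowest group.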
b Discriminant analysis for native speakers: Because one of our goals in this project was to probe the possible validity of these new measures by assessing performance of two groups, native as well as
Table 10 Discriminant analysis: comparison of nonnative speaker reading ability groups with Reading to Learn/Reading to Integrate Composite (n = 143)

Reading ability group      Predicted group membership for Reading to      Initial
                           Learn/Reading to Integrate Composite           classification
                           High        Mid         Low                    total

Count
  High (≥56)               37          17           3                      57
  Mid (50–55)              13          18          11                      42
  Low (≤49)                 2           6          36                      44
  Reclassification total   52          41          50                     143

Percentage
  High (≥56)               64.9        29.8         5.3                   100.0
  Mid (50–55)              31.0        42.9        26.2                   100.0
  Low (≤49)                 4.5        13.6        81.8                   100.0

Note: Total n = 143 due to loss of three cases in anomalous Reading to Integrate session.
nonnative speakers, we conducted a parallel discriminant analysis for native speakers. Thus, for the native speakers, we followed the same procedure, dividing the entire native speaker group (n = 105) into three levels of basic comprehension (high, mid and low) based on score distances of .5 standard deviations from the sample mean of the Nelson–Denny test for these participants. Native speakers need not take reading comprehension tests when entering the university, so the three-way split was based entirely on our sample data. Participants were classified as high if their scores on the Nelson–Denny were greater than or equal to 135. Participants were classified as mid if their scores ranged from 119 to 134 on the Nelson–Denny. Participants were classified as low if their scores fell at or below 118 on the Nelson–Denny. Descriptive statistics for the entire native speaker sample on basic comprehension group membership appear in Table 11.
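The native speaker cut points can be related to the sample statistics with a short calculation. The sketch below assumes the band was formed as the sample mean plus or minus half a standard deviation, using the native speaker Nelson–Denny mean and sd from Table 1 read as 126.48 and 16.46 (decimal placement assumed); the function names are ours, and the text does not state exactly how the band edges were rounded to the integer cut points of 118 and 135.

```python
def half_sd_band(mean, sd, k=0.5):
    """Return (lower, upper) edges of the band mean +/- k standard deviations."""
    return mean - k * sd, mean + k * sd


def nelson_denny_level(score):
    """Assign a reading ability group using the cut points stated in the text."""
    if score >= 135:
        return "high"
    if score >= 119:
        return "mid"
    return "low"


if __name__ == "__main__":
    # Native speaker Nelson-Denny mean and sd (Table 1, decimals assumed).
    lower, upper = half_sd_band(126.48, 16.46)
    print(f"band: {lower:.2f} to {upper:.2f}")
    for score in (118, 119, 134, 135):
        print(score, nelson_denny_level(score))
```

Under these assumptions the band runs from roughly 118 to just under 135, in line with the stated cut points.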
The discriminant analysis yielded one discriminant function with an eigenvalue of .31, responsible for 98.7% of the variance in outcomes. Wilks' Lambda for this function was .76, significant at .001; the associated Chi-Square value was large (26.39) and highly significant (p < .001), indicating that the group centroids on the discriminant function for the three reading ability groups on the Reading to Learn/Reading to Integrate Composite were significantly different. Pooled within-groups correlations between discriminating variables showed that Reading to Learn correlated with the first discriminant function at a level of .81; Reading to Integrate correlated with the second discriminant function at .71. This contrasts with findings for the nonnative speakers, where scores on the combined new measures loaded significantly on only one discriminant function. For native speakers, then, there is evidence for two significant discriminant functions, although the first accounts for almost all of the variance. Although these two measures (Reading to Learn and Reading to Integrate) loaded on two separate discriminant functions, they still
Table 11 Descriptive statistics for native speakers by Nelson–Denny reading ability groups (n = 101)

Reading ability group    n      Mean on Nelson–Denny    sd

High (≥135)               39    141.26                   5.19
Mid (119–134)             37    125.65                   5.09
Low (≤118)                25    102.88                  10.98
Total                    101    126.04                  16.52

Note: Total n = 101 for discriminant analysis due to loss of four cases in anomalous Reading to Integrate session.
showed moderate correlations with the alternate function,⁶ justifying the composite calculations.
Results of discriminant analysis for native speakers, seen in Table 12, show a different pattern from that observed for nonnative speakers. Nearly three-fourths (71.8%) of the native speakers classified as high in the basic comprehension reading ability group remained high on the Reading to Learn/Reading to Integrate Composite. For the mid group, however, only 18.9% remained classified as mid. Just over half (52%) of the low reading ability group members remained low on the Reading to Learn/Reading to Integrate Composite. As with the nonnative speakers, participants in the mid category on basic comprehension showed the most frequent reclassification. Forty-eight of the 101 (47.5%) native speaker participants remained in the initial classification categories. Twenty-two (21.8%) were reclassified into a higher category and 31 (30.7%) were reclassified into a lower category. Thus, 53 participants (52.5%), over half of the sample, were classified differently based on their performance on the Reading to Learn/Reading to Integrate Composite.
⁶ Reading to Learn correlated with function 2 at −.58; Reading to Integrate with function 1 at .71.
Table 12 Discriminant analysis: comparison of native speaker reading ability groups with Reading to Learn/Reading to Integrate Composite (n = 101)

Reading ability group      Predicted group membership for Reading to      Initial
                           Learn/Reading to Integrate Composite           classification
                           High        Mid         Low                    total

Count
  High (≥135)              28           3           8                      39
  Mid (119–134)            10           7          20                      37
  Low (≤118)                5           7          13                      25
  Reclassification total   43          17          41                     101

Percentage
  High (≥135)              71.8         7.7        20.5                   100.0
  Mid (119–134)            27.0        18.9        54.1                   100.0
  Low (≤118)               20.0        28.0        52.0                   100.0

Note: Total n = 101 for discriminant analysis due to loss of four cases in anomalous Reading to Integrate session.
V Interpretation
Analyses done to answer Research Questions 1 and 2 showed that performance on Reading to Learn and Reading to Integrate measures was significantly influenced by language background and level of education (graduate vs. undergraduate, for nonnative speakers only). Moreover, level of computer familiarity had no significant effect on Reading to Learn and Reading to Integrate performance.
Correlations showed that, as expected, all reading measures correlated to some degree, generally answering Research Question 3. The most interesting results came from the discriminant analyses, because they showed a pattern of differential classification on the new tasks sensitive to initial reading ability. This pattern differed for nonnative speakers and native speakers. Examination of the reclassifications based on the new tasks revealed some of the problems of using a basic comprehension-only test to predict performance on more challenging literacy tasks. These results also imply the need for further work on new measures of advanced literacy skills, such as Reading to Learn and Reading to Integrate, to reflect trends in construct-driven assessment (Pellegrino et al., 1999).
For the nonnative speakers, most (81.8%) of the participants with TOEFL Reading Comprehension below 50 remained classified as low on the Reading to Learn/Reading to Integrate Composite, suggesting the existence of a lower threshold of academic English proficiency. Participants below this threshold were unlikely to perform well on tasks assessing Reading to Learn and Reading to Integrate. Even tasks involving only selection (such as BC1 and BC2) rather than production of responses were difficult for this group. However, for the nonnative speaker readers in the mid and high reading ability groups, basic comprehension level was not nearly as consistent a predictor of performance on the Reading to Learn/Reading to Integrate Composite. This was especially striking for those in the mid reading ability group. Approximately one fourth of the mid group dropped into the low category on the Reading to Learn/Reading to Integrate Composite, indicating that they experienced more difficulty in completing the new measures, but approximately one fourth could indeed categorize and synthesize information relatively well, despite middling performance on basic comprehension. This finding suggests that, for nonnative speaker readers in the mid reading ability category, basic comprehension measures were insufficient to predict their performance on the new measures.
Almost two-thirds of the high basic comprehension nonnative speakers remained high on the Reading to Learn/Reading to Integrate Composite, but one third dropped, suggesting that they could not categorize and synthesize information from texts as well as they could recognize information. For this third, the recognition-only task, multiple-choice basic comprehension, overestimated ability to manipulate and synthesize information. For the high and mid reading ability nonnative speaker groups, basic comprehension-only tests may then overestimate academic English proficiency.
Results for the native speakers showed a more definitive pattern of reclassification. For the low reading ability group, approximately half were misclassified, suggesting that the basic comprehension measure underrepresented their ability to categorize and synthesize information in academic texts. Most of the mid reading ability group were reclassified as lower on the Reading to Learn/Reading to Integrate Composite, showing that basic comprehension results overestimated ability to succeed on more challenging tasks. On the other hand, almost one third of the mid reading ability group did better on the Reading to Learn/Reading to Integrate Composite; for them, basic comprehension underrepresented their ability to complete more challenging reading tasks. For the high ability readers, nearly one third dropped on the Reading to Learn/Reading to Integrate Composite. For them, the basic comprehension measure overestimated their level of academic English proficiency. The other two thirds of the high reading ability group remained high, suggesting a higher threshold of academic English proficiency.
Considering results from both the nonnative speakers and native speakers, we conclude that the new tasks did assess something different from basic comprehension once a lower level threshold of basic academic English proficiency had been achieved. Examination of scatterplots based on discriminant functions indicated an obvious separation between participants who could and could not perform on the Reading to Learn/Reading to Integrate Composite, thus providing some tentative evidence for concurrent validity (Messick, 1989; Chapelle, 1999). We had hoped to find clear evidence of a hierarchy, suggesting that Reading to Learn was demonstrably more difficult than basic comprehension and Reading to Integrate demonstrably more difficult than Reading to Learn, but results did not yield an obvious hierarchy. Results did, however, suggest an even simpler pattern: a dichotomy. For those nonnative speakers above the lower threshold of English language proficiency, which in our data would be approximately 500, or 49 on the scaled TOEFL Reading
Comprehension scores, some could perform well on the new measures and some could not, revealing two rather than three groups on the Reading to Learn/Reading to Integrate Composite. These results lead us to speculate that the new measures tap additional skills, such as 'sophisticated discourse processes and critical thinking skills' (Enright and Schedl, 1999: 24), in addition to language proficiency. For native speakers, the discriminant loading suggested a possible hierarchy of difficulty, but the scatterplots and classification tables revealed a dichotomy. Reclassification based on the Reading to Learn/Reading to Integrate Composite nearly eliminated the mid basic comprehension reading ability category, with only 16.8% of the native speakers qualifying as mid on the Reading to Learn/Reading to Integrate Composite.
VI Conclusions
While we acknowledge that there are some limitations to this project, such as the time required to administer and score new measures, the results are illuminating and useful, and suggest some considerations for future research. The first consideration relates to appropriate classification of students based on academic reading abilities. This corresponds to the use of TOEFL as a gatekeeper by many institutions. For TOEFL test-takers whose current TOEFL scores would be in an intermediate to high intermediate range, the new tasks could assess additional abilities relevant to academic performance. For such participants, additional test development efforts are warranted. The majority of high reading ability nonnative speakers (64.9%) could perform the new measures; a third could not. Our results suggest that some unknown number of mid reading ability nonnative speakers are more capable of succeeding at more challenging academic reading tasks than their current level of basic comprehension assessment would indicate. At the same time, there are some high and mid reading ability participants who could not perform the more demanding tasks; admitting such students directly into university programs could result in failure. These results suggest that, for most students at lower levels of basic comprehension (in our study, those with TOEFL scores below 490 to 500), development of new tasks is unnecessary. Nevertheless, our results indicate that a certain small percentage (in our sample, 8 students, 18.2%) of nonnative speakers classified as low reading ability on basic comprehension did in fact
Latricia Trites and Mary McGroarty 199
perform adequately on more challenging tasks like Reading to Learnand Reading to Integrate Given the very large number of TOEFLtest-takers around the world this finding merits further research Itwould be potentially unfair to exclude such students from universitystudy based on basic comprehension-only measures such as the cur-rent TOEFL We would urge institutions that use TOEFL to considerscores on these new more demanding tasks if doing so wouldaddress institutional needs According to interview data collected inconjunction with this project (Trites 2000 Chapter 6) participantsperceived that successful completion of the new tasks required athorough understanding of the texts whereas the multiple-choicetests were less demanding requiring only superficial grasp of con-tent This finding corroborates the observation of Freedle and Kostin(1999 3) who note that often examinees do not need to comprehendthe accompanying text to answer the test item Therefore for allTOEFL test-takers incorporating more challenging tasks such as theReading to Learn and Reading to Integrate measures into typicalEnglish language instruction could have positive washback effects
The second area of consideration is the focus on additional relevant research. For predictive validity, it is important to determine the correlation between the Reading to Learn/Reading to Integrate Composite measures and actual academic performance in university classes. Correlations might differ depending on the degree of categorization and synthesis required by different major fields of study. Another area for future research is related to the novelty of the Reading to Learn task as an assessment technique. Current pedagogical trends in literacy instruction emphasize the use of graphic organizers as a means to understand and manipulate information in texts. To our knowledge, graphic organizers are typically used as classroom activities rather than assessment tools. They have good potential for use in assessment if students are familiar with them and if appropriate scoring systems can be developed. This project demonstrates that it is possible, though labor intensive, to develop reliable scoring systems. Further research is needed to explore refinement of this task type and related scoring systems for use in testing programs with large numbers of test-takers and scorers. Although the Reading to Integrate task (generating a synthesis) was more familiar, the scoring system was innovative because it reflected a reader’s ability to recognize textual frames as well as integrate information. If test developers are interested in the abilities of nonnative speaker readers to perform such tasks, further development of similar tasks and scoring systems is warranted. While a substantial investment of time would be required to refine the administration and scoring systems needed for more complex tasks such as these, it should be weighed against the possible danger of under-representing the likelihood of academic success based on results of the basic comprehension-only measures still most often used to assess academic proficiency.
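The predictive-validity question raised above comes down to a correlation between composite scores and a later measure of academic performance. The following is a minimal sketch of a Pearson correlation, assuming a grade point average as the performance measure; the function name `pearson_r` and all data values are hypothetical illustrations, not results from this study.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with at least 2 values")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations around each mean.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical composite scores and grade point averages (illustration only).
composite = [35.0, 48.5, 52.0, 61.5, 70.0, 44.0]
gpa = [2.4, 2.9, 3.1, 3.4, 3.8, 2.7]
print(round(pearson_r(composite, gpa), 3))
```

Running the same computation separately for each major field of study would be one way to examine whether the correlation differs with the amount of categorization and synthesis a field requires.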
Acknowledgements
This project was funded by a grant from Educational Testing Service as part of the TOEFL 2000 effort. We appreciate the cooperation of the ETS staff members who assisted in the development of passage-specific tasks and other aspects of the research; however, no official endorsement of Educational Testing Service should be inferred.
VII References
Bachman, L. 2000: Modern language testing at the turn of the century: assuring that what we count counts. Language Testing 17, 1–42.
Biber, D. 1993: Using register-diversified corpora for general language studies. Computational Linguistics 19, 219–41.
Britt, M., Rouet, J. and Perfetti, C. 1996: Using hypertext to study and reason about historical evidence. In Rouet, J., Levonen, J., Dillon, A. and Spiro, R., editors, Hypertext and cognition. Mahwah, NJ: Lawrence Erlbaum, 43–72.
Chapelle, C. 1999: Validity in language assessment. Annual Review of Applied Linguistics 19, 1–19.
Educational Testing Service 1997: TOEFL test and score manual. Princeton, NJ: Educational Testing Service.
Educational Testing Service 1998: Draft TOEFL 2000 research agenda framework: areas of research. Research agenda three. TOEFL 2000 internal document. Princeton, NJ: Educational Testing Service.
Eignor, D., Taylor, C., Kirsch, I. and Jamieson, J. 1998: Development of a scale for assessing the level of computer familiarity of TOEFL examinees. TOEFL Research Report No. 60. Princeton, NJ: Educational Testing Service.
Enright, M. and Schedl, M. 1999: Reading for a reason: using reader purpose to guide test design. TOEFL 2000 Internal Report. Princeton, NJ: Educational Testing Service.
Enright, M., Grabe, W., Mosenthal, P., Mulcahy-Ernt, P. and Schedl, M. 1998: A TOEFL 2000 framework for testing reading comprehension: a working paper. Princeton, NJ: Educational Testing Service.
Foltz, P. 1996: Comprehension, coherence, and strategies in hypertext. In Rouet, J., Levonen, J., Dillon, A. and Spiro, R., editors, Hypertext and cognition. Mahwah, NJ: Lawrence Erlbaum, 109–36.
Freedle, R. and Kostin, I. 1999: Does the text matter in a multiple-choice test of comprehension? The case for the construct validity of TOEFL’s minitalks. Language Testing 16, 2–32.
Goldman, S. 1997: Learning from text: reflections on the past and suggestions for the future. Discourse Processes 23, 357–98.
Hayes, J. and Hatch, J. 1999: Issues in measuring reliability: correlation versus percentage of agreement. Written Communication 16, 354–67.
Huberty, C. 1994: Applied discriminant analysis. New York: John Wiley.
Jamieson, J., Campbell, J., Norfleet, L. and Berbisada, N. 1993: Reliability of a computerized scoring routine for an open-ended task. System 21, 305–22.
Jamieson, J., Norfleet, L. and Berbisada, N. 1993: Successes, failures and dropouts in computer-assisted language lessons. Computer Assisted English Language Learning Journal 4, 12–20.
Klecka, W. 1980: Discriminant analysis. In Lewis-Beck, M., editor, Quantitative applications in the social sciences, Volume 19. Newbury Park, CA: Sage.
Lehto, M., Zhu, W. and Carpenter, B. 1995: The relative effectiveness of hypertext and text. International Journal of Human-Computer Interaction 7, 293–313.
McNamara, D. and Kintsch, W. 1996: Learning from texts: effects of prior knowledge and text coherence. Discourse Processes 22, 247–88.
Messick, S. 1989: Validity. In Linn, R.L., editor, Educational measurement, 3rd edition. New York: American Council on Education/Macmillan, 13–103.
Meyer, B. 1985a: Prose analysis: purposes, procedures, and problems. In Britton, B. and Black, J., editors, Understanding expository text. Hillsdale, NJ: Lawrence Erlbaum, 11–64.
Meyer, B. 1985b: Prose analysis: purposes, procedures, and problems: Part 2. In Britton, B. and Black, J., editors, Understanding expository text. Hillsdale, NJ: Lawrence Erlbaum, 269–97.
Monks, V. 1997, August/September: Two views, same waterway. National Wildlife 35, 36–37.
Pellegrino, J., Baxter, G. and Glaser, R. 1999: Addressing the ‘two disciplines’ problem: linking theories of cognition and learning with assessment and instructional practice. Review of Research in Education 24, 307–53.
Perfetti, C. 1997: Sentences, individual differences, and multiple texts: three issues in text comprehension. Discourse Processes 23, 337–55.
Perfetti, C., Britt, M.A. and Georgi, M. 1995: Text-based learning and reasoning: studies in history. Hillsdale, NJ: Lawrence Erlbaum.
Perfetti, C., Marron, M. and Foltz, P. 1996: Sources of comprehension failure: theoretical perspectives and case studies. In Cornoldi, C. and Oakhill, J., editors, Reading comprehension difficulties: processes and intervention. Mahwah, NJ: Lawrence Erlbaum, 137–65.
Reinking, D. 1988: Computer-mediated text and comprehension differences: the role of reading time, reader preference, and estimation of learning. Reading Research Quarterly 23, 484–500.
Reinking, D. and Schreiner, R. 1985: The effects of computer-mediated text on measures of reading comprehension and reading behavior. Reading Research Quarterly 20, 536–53.
Spivey, N. 1997: The constructivist metaphor: reading, writing, and the making of meaning. San Diego, CA: Academic Press.
Stevens, J. 1996: Applied multivariate statistics for the social sciences, 3rd edition. Mahwah, NJ: Lawrence Erlbaum.
Tabachnick, B. and Fidell, L. 1996: Using multivariate statistics, 3rd edition. New York: Harper Collins.
Taylor, C., Jamieson, J., Eignor, D. and Kirsch, I. 1998: The relationship between computer familiarity and performance on computer-based TOEFL test tasks. TOEFL Research Report No. 61. Princeton, NJ: Educational Testing Service.
Tennesen, M. 1997, November/December: On a clear day. National Parks 71, 26–9.
Trites, L. 2000: Beyond basic comprehension: reading to learn and reading to integrate for native and non-native speakers. Unpublished doctoral dissertation, Northern Arizona University, Flagstaff, AZ.
Van den Berg, S. and Watt, J. 1991: Effects of educational setting on student responses to structured hypertext. Journal of Computer-Based Instruction 18, 118–24.
Van Dijk, T.A. and Kintsch, W. 1983: Strategies of discourse comprehension. New York: Academic Press.
Wiley, J. and Voss, J. 1999: Constructing arguments from multiple sources: tasks that promote understanding and not just memory for text. Journal of Educational Psychology 91, 310–11.
Zimmerman, T. 1997, December 29: Filter it with billions and billions of oysters: how to revive the Chesapeake Bay. US News and World Report 123, 63.
Appendix 1a Chart completion task
Directions: Complete the following chart. Fill in as much detail as possible from the text read by categorizing the information into the different areas on the chart. Do not use single words for your responses; form your responses in phrases or complete sentences. Include examples from the text.
Make no judgments about the accuracy of causes or effects or the effectiveness of the solutions mentioned in the text. Solutions are seen as any action taken in response to the problem(s). Solutions can take many forms, such as proposed solutions, attempted solutions, or failed solutions. Also, space is provided under each category for examples. Examples are specific examples found in the text that are used by the author(s) to exemplify the problems, causes, effects, or solutions in the text. However, there may not be examples for every category.
Points will be awarded for correct responses only. There is no penalty for incorrect responses. Points will be awarded in the following manner:
Problems and Solutions: 10 points each
Causes and Effects: 5 points each
Examples: 1 point each
Problems    Causes    Effects    Solutions
Examples    Examples    Examples    Examples
Appendix 1b Reading to Learn scoring rubric (0–241 total possible)
Problems (10 points), 1 word (6 points), 40 maximum
• Air pollution/Smog in the National Parks
• Opposition or Ignoring (or lack of cooperation) to environmental standards (regulations)
• No true ‘point sources’ (myriad of smaller pollution sources rather than one large source)
• National Parks limited jurisdiction of pollution sources outside of the parks
Causes (5 points), 1 word (3 points), 55 maximum
• Sulfur/nitrogen emissions
• Acid rain
• Ground-level ozone
• Urban/Industrial emissions
• Automobile emissions
• Power plants/factories/plants emissions
• Emissions from Smokestacks (chimneys)
• Emissions from Kilns
• Unregulated pollution/smog (from other countries, Mexico)
• Smoke from Controlled burns
• Emissions from large cities
Effects (5 points), 1 word (3 points), 30 maximum
• Trees and plants affected (injured), growth hindered
• Visibility decreased
• Metals loosened into waters (surface/ground water; water polluted)
• Nutrients removed (leached) from soil and/or plants
• Public outcry over stance of big business/government
• Aquatic life injured (damaged)
Solutions (10 points), 1 word (6 points), 90 maximum
• Stricter Environmental Laws (resolutions, acts)
• 1977 Clean Air Act Amendments labeling NPs as Class I areas
• Regional haze regulations proposed by the EPA
• Reduction of allowable pollution standards from industry
• Clean Air Act set visibility goals
• Objecting to construction permits
• Identification of pollution sources
• Industry modifications/Installation of devices (scrubbers) to reduce pollution
• Public pressure to protect parks
Problems: Examples (1 point), 5 maximum
• Great Smoky Mountain National Park
• Grand Canyon National Park
• Smoky Mountain Sources from Ohio, New York, Atlanta, etc.
• Grand Canyon Sources from CA, NV, UT, AZ, NM and Mexico
• TN Luttrell corp building permit situation
Causes: Examples (1 point, or 5 points with overarching cause listed above)
• Automobile emissions
• Power plants/factories/plants emissions
• Emissions from Smokestacks (chimneys)
• Emissions from Kilns
• Unregulated pollution/smog (from other countries, Mexico)
• Smoke from Controlled burns
• Emissions from large cities
Causes: Examples (1 point), 10 maximum
• Navajo Generating Station, Page AZ (Power Plant AZ)
• Power plants in TN and OH river valley
• Southern California Edison Plant, Laughlin NV
• Tennessee Luttrell Kilns, TN
• Industry in AZ
• Cars in CA
• Smokestacks in NM, NV, UT
• Los Angeles
• Atlanta
• New York
Effects: Examples (1 point), 6 maximum
• Leaves of plants turning purple and brown (stippling)
• Hindering photosynthesis of plants and trees
• Reduction of visibility at Grand Canyon
• Visibility reduced 90% of the days
• See vague blue masses
• See half as far as in 1919
Solutions: Examples (1 point, or 10 points with overarching solution listed above)
• 1977 Clean Air Act Amendments labeling NPs as Class I areas
• Regional haze regulations proposed by the EPA
• Reduction of allowable pollution standards from industry
Solutions: Examples (1 point), 5 maximum
• Objecting to kilns in TN
• Researchers using scientific technology to ID sources (radioactive isotopes)
• Southern California Edison Plant in Laughlin NV identified as point source
• Scrubbers installed at Navajo Generating Plant
• 10% reduction of pollution in 10–15 years (goal of no man-made pollution)
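The rubric's category maxima can be checked against the stated 241-point total with a few lines of arithmetic. The sketch below assumes the main-category maxima are 40, 55, 30 and 90 and reads the example maxima as 5, 10, 6 and 5 for Problems, Causes, Effects and Solutions; the helper `capped_score` is a hypothetical simplification that counts full-credit items only and ignores the reduced one-word credit and the overarching cause/solution credit.

```python
# Category maxima as read from the rubric (an assumption about the layout).
MAIN_MAX = {"Problems": 40, "Causes": 55, "Effects": 30, "Solutions": 90}
EXAMPLE_MAX = {"Problems": 5, "Causes": 10, "Effects": 6, "Solutions": 5}
POINTS_PER_ITEM = {"Problems": 10, "Causes": 5, "Effects": 5, "Solutions": 10}


def theoretical_maximum():
    """Sum of all main-category and example maxima."""
    return sum(MAIN_MAX.values()) + sum(EXAMPLE_MAX.values())


def capped_score(category, n_full_credit_items, n_examples):
    """Hypothetical helper: score one category, capped at its maxima.

    Counts full-credit items and one-point examples only.
    """
    main = min(n_full_credit_items * POINTS_PER_ITEM[category],
               MAIN_MAX[category])
    examples = min(n_examples, EXAMPLE_MAX[category])
    return main + examples


print(theoretical_maximum())  # 40 + 55 + 30 + 90 + 5 + 10 + 6 + 5
```

Under these assumptions the sum is 241, which agrees with the total given in the rubric heading.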
Appendix 2a Reading to Integrate task integration activity
Directions: For the next 15 minutes, reflect on what you read and compose a short essay that combines the information in the material read and makes connections across the range of ideas and arguments presented.
[Ruled lines provided for the written response]
Appendix 2b Reading to Integrate scoring rubric
Integration
50 Excellent: Integrates texts accurately and successfully on multiple levels and creates a true Documents Model. Generalizes at least two macrostructure concepts common across texts (this may be simply identifying the existence of a macrostructure followed by the supporting macrostructures from each text). Effectively integrates relevant support (i.e., details or macrostructures) to support generalizations and may still discuss each article separately to some degree. Integrates macrostructures present in both texts.
40 Good: Integrates through the creation of a well developed and accurate introduction and/or a conclusion, yet summarizes articles separately, AND/OR generalizes one major macrostructure that is fully developed with support. Uses some relevant support.
30 Fair: Attempts to integrate through the creation of a partially developed, possibly inaccurate introductory AND/OR concluding statement, usually related to the main topic of the articles, but has no substantive development. Creates a Text and Situation Model of each text separately. Does not make generalizations for integration beyond the one statement. May make evaluative or editorial statements; however, these may be inaccurate.
20 Poor: Creates a well developed and accurate Text Model and Situation Model for each of the texts, yet attempts no integrative connection across the texts.
10 Very Poor: Ineffectively or inaccurately attempts to summarize each article in a very reporting style, revealing little or no use of background knowledge, contextualization, or evaluation. Response is simply a recall of information from the texts. No introduction or conclusion, OR participant confuses texts and sees separate texts as one issue (problem).
0 No Response: Participant does not attempt response or only addresses one article.
Macrostructures
Participants will receive one point for each macrostructure accurately identified in each text, with a minimum score of 4 awarded for the inability to accurately identify any macrostructures.
25 Excellent (Total 8): Accurately identifies all 4 macrostructures present in both texts.
20 Good (Total 6–7): Accurately identifies all 4 macrostructures in one text and 3 in the other, OR accurately identifies 3 of the four macrostructures in both texts (or 4 in one and 2 in the other).
15 Fair (Total 4–5): Accurately identifies 3 of the four macrostructures in one text and 2 of the four in the other (or 4 in one and 1 in the other), OR accurately identifies 2 of the four macrostructures in both texts (or 4 in one and 0 in the other, or 3 in one and 1 in the other).
10 Poor (Total 2–3): Accurately identifies 2 of the macrostructures in one text and 1 in the other (or 3 in one and 0 in the other). Accurately identifies 1 of the four macrostructures in both texts (or 2 in one and 0 in the other).
5 Very Poor (Total 0–1): Accurately identifies 1 of the four macrostructures in one text and 0 in the other, OR unable to accurately identify any macrostructures in the texts.
0 No Response: Participant does not attempt response.
Use of relevant details
5 Excellent: Effectively uses multiple relevant details as support; no irrelevant or erroneous details.
4 Good: Effectively uses relevant details as support; may include some inaccurate or irrelevant details. (More than 50% of details are relevant.)
3 Fair: Possibly uses one or two relevant details, yet several irrelevant details or inaccurate details (erroneous or fabricated) appear in the synthesis. (50% or less of details are relevant.)
2 Poor: Erroneous or fabricated details used in attempt to support arguments presented; does not include relevant details.
1 Very Poor: No details used or listed.
0 No Response: Participant does not attempt response.
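The three subscales of this rubric combine into the 0 to 80 analytic score (50 + 25 + 5). The sketch below shows one way to turn rater judgments into a total, assuming the macrostructure band is looked up from the combined count across the two texts using the band table above (so a combined count of 0 or 1 maps to 5 points); the function names are hypothetical and this is not the authors' scoring guide.

```python
INTEGRATION_BANDS = (0, 10, 20, 30, 40, 50)
DETAIL_BANDS = (0, 1, 2, 3, 4, 5)


def macrostructure_band(count_text_a, count_text_b):
    """Map macrostructures identified in each text (0-4 each) to band points."""
    for count in (count_text_a, count_text_b):
        if not 0 <= count <= 4:
            raise ValueError("each text has at most 4 macrostructures")
    total = count_text_a + count_text_b
    if total == 8:
        return 25  # Excellent
    if total >= 6:
        return 20  # Good
    if total >= 4:
        return 15  # Fair
    if total >= 2:
        return 10  # Poor
    return 5       # Very Poor (combined count of 0 or 1)


def integrate_total(integration, count_text_a, count_text_b, details,
                    attempted=True):
    """Total Reading to Integrate score on the 0-80 scale."""
    if not attempted:
        return 0  # No Response on every subscale
    if integration not in INTEGRATION_BANDS:
        raise ValueError("integration must be one of the rubric bands")
    if details not in DETAIL_BANDS:
        raise ValueError("details must be one of the rubric bands")
    return (integration
            + macrostructure_band(count_text_a, count_text_b)
            + details)


print(integrate_total(40, 4, 3, 4))  # 40 + 20 + 4
```

A response rated Good on all three subscales would therefore receive 64 of the 80 possible points under this reading of the rubric.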
was 241, which would result from maximum points given in all categories. The first author and two research assistants spent 35–40 hours creating, revising, and norming the scoring rubric and developing the scoring guide (Trites, 2000: Chapter 3). To determine interrater reliability, we used coefficient alpha rather than percentage of agreement because percentage of agreement inflates the likelihood of chance agreement (Hayes and Hatch, 1999). After norming, overall interrater reliability was .99 (coefficient alpha), with similarly high alpha coefficients for all subcategories.2
2 We recognize that tasks requiring high inference measures plus extensive norming and revision of the scoring rubric pose feasibility issues in large-scale testing. Further research is needed to determine whether and how such scoring procedures could be adapted in standardized testing for numerous test-takers.
• Reading to Integrate: The second new measure assessed Reading to Integrate. The task used to assess Reading to Integrate required participants to read two 600-word texts and compose a written synthesis. The prompt asked students to make connections across the range of ideas presented; thus, we asked readers to synthesize information rather than summarize or make comparisons (Wiley and Voss, 1999). This synthesis was scored based on an analytic scale ranging from 0 to 80, reflecting readers’ ability to recognize and manipulate the structure of the texts, include specific information, and express connections across texts through the use of cohesive devices (for task and scoring rubric, see Appendix 2). The test was designed to measure the integration of content from both readings and did not assess other aspects of writing such as the creation of rhetorical style, grammaticality, or mechanics. The rubric was composed of three subcategories: integration ability, macrostructure recognition, and use of relevant details. The integration subscore was awarded the highest point values because this was the predominant skill being tested. It scored participants on their ability to make connections across texts based on the manipulation of the textual frames in both texts. The second subcategory awarded points for the ability to recognize and articulate the macrostructures (problem, cause, effect, or solution) present in each text. This subcategory was similar to the categorizing task used in the Reading to Learn measure, with the additional constraint that participants had to express the connections overtly. The third subcategory in the scoring rubric analysed the ability to use relevant details as support in the written synthesis. The first author and two research assistants spent 30 hours revising and norming the scoring rubric and developing a decision guide, resulting in an overall interrater reliability of .99 (coefficient alpha), with similarly high alphas for all subcategories.
• Basic Comprehension: The third construct was measured by multiple-choice tests related specifically to the texts used in the new tasks. These tests were created by TOEFL Test Development staff and followed current TOEFL reading section specifications. We used two multiple-choice tests, Basic Comprehension Test 1 (BC1) and Basic Comprehension Test 2 (BC2), 20 items each: one for the longer passage used to assess Reading to Learn and one for the two passages used to assess Reading to Integrate. Both were scored based on number of items answered correctly. Reliability on BC1, calculated based on 251 participants, was .84 (coefficient alpha). Inadvertently, the order of the texts used in BC2 was different for the two different media; however, reliability on both versions of the test was high. For those who took BC2 based on paper texts (n = 127), reliability was .84 (coefficient alpha); for those who took BC2 based on computerized texts (n = 124), reliability was .86 (coefficient alpha).
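Coefficient alpha is reported above both for interrater reliability and for the multiple-choice tests. The following is a minimal sketch of the standard formula, alpha = k/(k − 1) × (1 − sum of column variances / variance of row totals), where the columns are raters (or test items) and the rows are participants. The rater scores are hypothetical, and `percent_agreement` is included only to show the contrasting index mentioned in the text.

```python
from statistics import variance


def coefficient_alpha(columns):
    """Cronbach's coefficient alpha.

    columns: list of k lists (raters or items), one score per participant.
    """
    k = len(columns)
    if k < 2:
        raise ValueError("need at least two raters or items")
    n = len(columns[0])
    if n < 2 or any(len(col) != n for col in columns):
        raise ValueError("columns must be equal length with 2+ participants")
    totals = [sum(row) for row in zip(*columns)]  # per-participant totals
    total_var = variance(totals)
    if total_var == 0:
        raise ValueError("alpha is undefined when totals do not vary")
    column_var_sum = sum(variance(col) for col in columns)
    return (k / (k - 1)) * (1 - column_var_sum / total_var)


def percent_agreement(rater_1, rater_2):
    """Proportion of participants given identical scores by two raters."""
    matches = sum(a == b for a, b in zip(rater_1, rater_2))
    return matches / len(rater_1)


# Hypothetical scores from three raters for five responses (illustration only).
rater_a = [40, 55, 62, 30, 48]
rater_b = [42, 54, 60, 31, 50]
rater_c = [41, 56, 63, 29, 47]
print(round(coefficient_alpha([rater_a, rater_b, rater_c]), 3))
print(percent_agreement(rater_a, rater_b))
```

In this invented example the raters never assign identical scores, so exact agreement is zero, yet they rank the responses almost identically and alpha is high. The two indices answer different questions, which is one reason the choice between them matters.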
3 Design for data collection
This study used a 2 × 2 repeated measures design to examine performance on the new reading tasks. Native speaker undergraduates and nonnative speaker undergraduates were divided into two groups each, of equal ability as determined by performance on the baseline standardized measures of reading comprehension (Nelson–Denny or TOEFL). Half of each group read texts on paper; the other half read the same texts on a computer screen. A smaller group of nonnative speaker graduates, equally divided, was also included for a comparison between performance by graduate and undergraduate nonnative speakers. Additionally, the administration of the new measures was counterbalanced to control for any practice effect.
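One way to picture the matched split by medium and the counterbalanced task order is the sketch below. It ranks participants by baseline score and assigns media in an ABBA pattern so that neither medium systematically receives the stronger member of each adjacent pair. This procedure, the function name `assign_matched`, and the sample data are assumptions for illustration; the text does not specify the matching algorithm that was actually used.

```python
def assign_matched(participants):
    """Assign medium and first task to participants ranked by baseline score.

    participants: list of (participant_id, baseline_score) tuples.
    Returns a dict mapping participant_id -> (medium, first_task).
    """
    ranked = sorted(participants, key=lambda p: p[1], reverse=True)
    tasks = ("Reading to Learn", "Reading to Integrate")
    plan = {}
    for i, (pid, _score) in enumerate(ranked):
        # ABBA pattern over each block of four ranked participants.
        medium = "paper" if i % 4 in (0, 3) else "computer"
        # Alternate the first task between successive ranked pairs.
        first_task = tasks[(i // 2) % 2]
        plan[pid] = (medium, first_task)
    return plan


# Hypothetical baseline scores (illustration only).
sample = [("p%d" % i, 100 - i) for i in range(8)]
for pid, assignment in assign_matched(sample).items():
    print(pid, assignment)
```

With eight participants this yields four per medium and, within each medium, two participants starting with each task.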
a Procedures: All participants met with the researchers in four sessions, each lasting about an hour. The first two sessions were devoted to administering the existing instruments. During Session 1, participants received an introduction to the study and took one of the two standardized basic reading comprehension measures (Nelson–Denny or TOEFL Reading Comprehension). Students completed the computer familiarity questionnaire and the Nelson–Denny Test at the same testing session because the Nelson–Denny was shorter than the TOEFL Reading Comprehension. During Session 2, participants took the other standardized basic reading comprehension measure.
Next, each participant group was subdivided into two subgroups for computer-based or paper reading of the texts for the new tasks. The subgroups were matched on their performance on initial reading measures: the Nelson–Denny was used for native speakers and the TOEFL Reading Comprehension was used for nonnative speakers. Independent t-tests run on these reading measures showed no significant difference in basic comprehension for the newly created subgroups assigned to each medium, ensuring that they were balanced for initial reading levels. Participants stayed in the same subgroups for the duration of the study. To ensure uniformity of response mode, all participants, whether they read the source texts on the computer or on paper, responded to the reading tasks using paper and pencil format.3
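The balance check described above can be sketched as an independent-samples t statistic. The version below uses the pooled-variance (Student) form, which is an assumption, since the text does not say whether a pooled or unequal-variance test was run. It returns only the statistic and degrees of freedom; a p-value would come from a t table or a statistics package. The subgroup scores are hypothetical.

```python
import math
from statistics import mean, variance


def independent_t(group_a, group_b):
    """Pooled-variance t statistic and degrees of freedom for two groups."""
    n_a, n_b = len(group_a), len(group_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two scores")
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / df
    std_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    if std_error == 0:
        raise ValueError("t is undefined when neither group varies")
    return (mean(group_a) - mean(group_b)) / std_error, df


# Hypothetical baseline scores for two medium subgroups (illustration only).
paper = [52, 47, 61, 55, 49, 58]
computer = [51, 48, 60, 56, 50, 57]
t_value, df = independent_t(paper, computer)
print(round(t_value, 3), df)
```

A t value near zero, as in this invented example, is what one would expect when subgroups have been matched on the baseline measure.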
3 Although responses could have been entered and perhaps scored by computer, this would have introduced factors not directly related to our research questions and remains an area for further study.
The last two sessions, each lasting approximately one hour, were dedicated to administration of the new measures. The Reading to Learn session took slightly longer to administer because administrative procedures were longer for this novel task. The new tasks were counterbalanced to control for practice effect; thus, half of the participants took the Reading to Learn measure first and half took the Reading to Integrate measure first. During Session 3, we administered the first new measure (for ease of discussion, Reading to Learn is discussed first) and BC1. At this session, students were given 12 minutes to read a 1200-word passage, either on computer or on paper. We limited the time allowed for reading based on 100 words per minute, thought to be ample (Grabe, personal communication, 1998). After examinees read the text, they were given 4 minutes to take notes on a half sheet of paper. Participants were instructed to take minimal notes due to the time constraints. Next, the text was removed and examinees were allowed 15 minutes to complete a chart based on the reading with the aid of their notes. After completing this Reading to Learn activity, participants were allowed to use the text and were given 15 minutes to answer BC1. Following these new testing sessions, 49 participants were selected for a related interview concerning the cognitive processes used in task completion (for further details, see Trites, 2000: Chapter 6).
During Session 4, students were given 12 minutes to read two short texts (600 words each), either on computer or on paper. After participants read the assigned texts, they were given 4 minutes to take one-half page of notes (Enright et al., 1998). Next, the texts were removed and participants were asked to demonstrate Reading to Integrate by writing a synthesis of the texts with the aid of their notes (15 minutes allowed for this task). After completing the Reading to Integrate task, participants were allowed to see the texts again and answered BC2 (15 minutes allowed for this task). In one Reading to Integrate session, for unknown reasons, six of the seven participants read only one text. Because we cannot explain the cause of this anomalous session, we have eliminated scores from the session’s seven participants from subsequent analyses, thus slightly reducing the N size for the Reading to Integrate measure.
b Variables used in study: The six independent variables included three nominal variables (Native Language Background, Medium of Text Presentation, and Level of Education) and three interval variables (Nelson–Denny, TOEFL Reading Comprehension, and Computer Familiarity). The four dependent variables were Reading to Learn, Reading to Integrate, BC1, and BC2.
IV Results
First, we present the descriptive statistics for all reading measures, followed by a systematic analysis of independent variables that might affect participant performance on the new measures. Scatterplots were checked for all reading measures to ensure normality of data. Kurtosis and skewness levels for all reading measures were found to be within normal limits, indicating a relatively normal distribution. Descriptive statistics for all existing measures are shown in Table 1. Means for these measures show a consistent pattern: the native speaker undergraduates had the highest mean, followed by the nonnative speaker graduates, followed by the nonnative speaker undergraduates. On the reading measures, Nelson–Denny and TOEFL Reading Comprehension, the nonnative speaker undergraduates showed the largest variance in performance, while on the computer familiarity measure the variance of both nonnative speaker groups was substantially larger than that of the native speakers.
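The skewness and kurtosis screening mentioned above can be illustrated with the plain moment-based estimators. This is a sketch only: statistical packages often apply small-sample corrections, so their values can differ slightly, and the text does not state which estimator or which cut-offs for 'normal limits' were used. The scores are hypothetical.

```python
def skewness_and_excess_kurtosis(values):
    """Moment-based skewness and excess kurtosis (both 0 for a normal curve)."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least three values")
    m = sum(values) / n
    # Central moments of order 2, 3 and 4.
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    m4 = sum((v - m) ** 4 for v in values) / n
    if m2 == 0:
        raise ValueError("undefined for a constant sample")
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    return skewness, excess_kurtosis


# Hypothetical reading scores (illustration only).
scores = [42, 51, 38, 60, 47, 55, 49, 44, 53, 58]
skew, kurt = skewness_and_excess_kurtosis(scores)
print(round(skew, 3), round(kurt, 3))
```

Values close to zero on both indices are consistent with an approximately normal distribution; a long right tail gives positive skewness.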
The same pattern emerged for the means on the new measures (see Table 2) as for the existing measures. The native speaker undergraduate group performed better on all new measures than both of the nonnative speaker groups. The nonnative speaker graduate group performed better than the nonnative speaker undergraduate group on all measures as well. This robust pattern of performance was also found in the variance of three of the four new measures. On BC1 and BC2, the performance of the native speaker undergraduates showed the least amount of variance, followed by the nonnative speaker graduates, followed by the nonnative speaker undergraduates. On Reading to Integrate, the native speaker undergraduate group showed substantially less variance than the nonnative speaker groups; however, the variance of the two nonnative speaker groups was almost identical. On Reading to Learn, all three groups showed considerable variance.
Table 3 reveals the range of awarded points achieved by all partici-pant groups The nature of the Reading to Learn point system created amaximum possible point value (241) that no participant achieved Wespeculate that there are at least three possible causes of the discrepancybetween the theoretical maximum and the range of observed scores
Table 1 Descriptive statistics for existing measures for three participant groups

Group                          n     Mean     sd      kMax

Nelson–Denny
  NSU                          105   126.48   16.46   156
  NNSU                         106    67.24   31.91   156
  NNSG                          40    88.88   21.88   156
  Total participants           251    95.47   36.93   156

TOEFL Reading Comprehension
  NSU                          105    61.30    4.24    67
  NNSU                         106    50.30    8.53    67
  NNSG                          40    57.15    4.55    67
  Total participants           251    55.99    8.19    67

Computer familiarity
  NSU                          104    38.08    3.60    44
  NNSU                         104    34.82    5.99    44
  NNSG                          40    35.63    6.02    44
  Total participants           248    36.31    5.33    44

Note: kMax = number of items or maximum possible score.
• task novelty: no participant reported ever doing such a task previously;
• time allowed for task completion; and
• space on the response sheet: space constraints may have limited the amount of information that participants could include.
Future research would need to address these issues. However, for the Reading to Integrate measure, the full range of possible point totals was achieved by at least one participant in each group.
1 Computer familiarity
The overall plan for the analyses was to check the influence of the independent variables on the dependent measures, with computer familiarity being addressed first. Initially, we had proposed that if computer familiarity was significantly different across groups, it would be entered into all calculations as a covariate. To determine this, it was necessary to conduct an Analysis of Variance (ANOVA) for computer familiarity across the six participant/medium subgroups.
Table 2 Descriptive statistics for new measures for three participant groups

Group                              n     Mean    sd      kMax

Reading to Learn (chart)
  NSU                              105   51.85   19.86   241
  NNSU                             106   31.73   19.50   241
  NNSG                              40   44.68   19.27   241
  Total participants               251   42.21   21.64   241

Basic Comprehension Test 1
  NSU                              105   16.98    2.47    20
  NNSU                             106   11.73    4.25    20
  NNSG                              40   14.98    3.50    20
  Total participants               251   14.44    4.23    20

Reading to Integrate (synthesis)
  NSU                              101   63.65   11.05    80
  NNSU                             103   37.24   21.76    80
  NNSG                              40   53.60   21.03    80
  Total participants               244   50.86   21.63    80

Basic Comprehension Test 2
  NSU                              105   15.91    2.78    20
  NNSU                             106    9.75    4.54    20
  NNSG                              40   12.85    3.61    20
  Total participants               251   12.82    4.72    20

Notes: kMax = number of items or maximum possible score; n size reduced for Reading to Integrate because of anomalous testing session.
The resulting ANOVA (F = 4.70, p < .05) showed a significant difference between subgroups on the computer familiarity questionnaire; therefore, a post hoc Scheffé test was done to locate significant contrasts. After analysis of all possible subgroup contrasts, the post hoc Scheffé revealed that the only significant difference in subgroups appeared between the native speaker undergraduates and nonnative speaker undergraduates who read texts on paper. Hence, although there was one significant contrast, it occurred in two subgroups reading on paper, not in any of the subgroups who read on computer. All groups generally scored high on computer familiarity, although, as noted, variance of the nonnative groups was greater. It was thus established that computer familiarity had no significant effect on participants who read texts on computer, so we did not use computer familiarity as a covariate in further analyses and proceeded to the three research questions of central interest to this study.
Because Research Questions 1 and 2 are similar (they differ only in addressing the two different new reading measures, Reading to Learn and Reading to Integrate), we approached them in the same manner, through ANOVA, to identify the independent variables that could have significantly affected the results on the new measures.
2 Research Question 1
The first research question asked if performance on a measure of Reading to Learn was affected by medium of presentation, computer familiarity, native language, or level of education. We calculated a univariate ANOVA with Type III sums of squares on Reading to Learn, with
Table 3 Range of scores for new measures for three participant groups

Group                              n     Minimum   Maximum   kMax

Reading to Learn (chart)
  NSU                              105   14        120       241
  NNSU                             106    0         86       241
  NNSG                              40    3         94       241
  Total participants               251    0        120       241

Reading to Integrate (synthesis)
  NSU                              101   38         80        80
  NNSU                             103    0         80        80
  NNSG                              40    5         80        80
  Total participants               244    0         80        80

Notes: kMax = number of items or maximum possible score; n size reduced for Reading to Integrate because of anomalous testing session.
group status, medium of text presentation, and test order as possible contributing factors.⁴ Table 4 shows that there were no significant interactions for any of the group, medium, or test order combinations. The only significant main effect was group membership.
Because group membership was a combined measure that included both native language background and level of education, post hoc analysis was needed to identify the significant contrasts. Table 5 shows that there was a significant difference in performance on the Reading to Learn measure between the native speaker undergraduate and the nonnative speaker undergraduate groups, as well as a significant difference between the nonnative speaker undergraduate and nonnative speaker graduate groups. There was no significant difference in performance between the native speaker undergraduate and the nonnative speaker graduate groups. Therefore, the answer to Research Question 1 is that native language background and level of education did have a significant effect on performance on the Reading to Learn measure, but that medium of text presentation did not. Further, order of testing (whether participants took Reading to Learn or Reading to Integrate first) had no significant effect.
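The arithmetic behind the group effect and the post hoc contrasts can be retraced from the published summary values. The sketch below is illustrative only: it assumes the Table 4 mean squares read 10934.64 (group) and 389.11 (error) once decimal points are restored, takes the group means from Table 2, and uses the textbook Scheffé formula; the function and variable names are ours, and the article does not describe how the statistical package carried out the computation.

```python
import math

# Values read from Tables 2 and 4 with decimal points restored (an assumption).
MS_GROUP = 10934.64   # Table 4, mean square for Group
MS_ERROR = 389.11     # Table 4, mean square for Error
N_GROUPS = 3

groups = {            # name: (n, mean on Reading to Learn)
    "NSU": (105, 51.85),
    "NNSU": (106, 31.73),
    "NNSG": (40, 44.68),
}


def f_ratio(ms_effect, ms_error):
    """F statistic as the ratio of two mean squares."""
    return ms_effect / ms_error


def scheffe_contrast(a, b):
    """Return (mean difference, standard error, Scheffe F) for a pair."""
    n_a, mean_a = groups[a]
    n_b, mean_b = groups[b]
    diff = mean_a - mean_b
    se = math.sqrt(MS_ERROR * (1 / n_a + 1 / n_b))
    # Scheffe divides the squared t-like ratio by (k - 1); the result is
    # compared against the critical F with (k - 1, error df) degrees of freedom.
    scheffe_f = (diff / se) ** 2 / (N_GROUPS - 1)
    return diff, se, scheffe_f


print(round(f_ratio(MS_GROUP, MS_ERROR), 2))
for pair in [("NSU", "NNSU"), ("NSU", "NNSG"), ("NNSG", "NNSU")]:
    diff, se, f_s = scheffe_contrast(*pair)
    print(pair, round(diff, 2), round(se, 2), round(f_s, 2))
```

With 2 and 239 degrees of freedom the critical F at the .05 level is approximately 3.03, so a pair whose Scheffé F falls below that value (here, NSU versus NNSG) would not count as a significant contrast.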
3 Research Question 2
The second research question, related to the first, asked if performance on Reading to Integrate was affected by medium of presentation,
Table 4 Performance on Reading to Learn measure by groups, medium, and test order (n = 251) (univariate analysis of variance)

Source                         Type III sum   df    Mean square   F
                               of squares

Group                          21869.29         2   10934.64      28.10*
Medium                          1215.86         1    1215.86       3.13
Test order                       294.98         1     294.98       0.76
Group × medium                   334.81         2     167.40       0.43
Group × test order                 4.37         2       2.19       0.01
Medium × test order              391.73         1     391.73       1.01
Group × medium × test order      575.29         2     287.65       0.74
Error                          92997.45       239     389.11

Note: *p < .05
⁴ Test order was added as an additional variable to double-check that our counterbalancing had been effective in controlling for any practice effect.
computer familiarity, native language, or level of education. Again, to ensure that counterbalancing of tests controlled for any practice effect, test order was added as an additional variable.
To answer this question, we calculated a univariate ANOVA on the Reading to Integrate measure, with group status, medium of text presentation, and test order entered as possible contributing factors. The results (Table 6) show, as for Research Question 1, that there were no significant interactions for any of the group, medium, or test order combinations; the only significant main effect was group membership. The answer for Research Question 2 is that native language background and educational level had a significant effect on Reading to Integrate, but medium of text presentation did not. Post hoc analysis of group contrasts showed that all three groups were distinct in their performance on Reading to Integrate (see Table 7).
4 Research Question 3
The third research question asked to what extent measures of basic comprehension, Reading to Learn, and Reading to Integrate were
Table 5 Post hoc Scheffé for Reading to Learn measure (n = 251)

Group   n     Group   n     Mean difference   Standard error

NSU     105   NNSU    106   20.12*            2.72
              NNSG     40    7.17             3.67
NNSU    106   NNSG     40   12.95*            3.66

Note: *p < .05
Table 6 Performance on Reading to Integrate measure by groups, medium, and test order (n = 244ᵃ) (univariate analysis of variance)

Source                         Type III sum   df    Mean square   F
                               of squares

Group                          36294.33         2   18147.17      55.82ᵇ
Medium                           192.95         1     192.95       0.59
Test order                        97.83         1      97.83       0.30
Group × medium                    11.82         2       5.91       0.02
Group × test order                30.14         2      15.07       0.05
Medium × test order             1098.72         1    1098.72       3.38
Group × medium × test order     1488.58         2     744.29       2.29
Error                          75430.37       232     325.13

Notes: ᵃ n size reduced for Reading to Integrate because of anomalous testing session; ᵇ p < .05
related. We used correlational analysis as the first step in answering this question. Results for the total participant population (see Table 8) showed moderate to high correlations across all reading measures. However, the analyses done for Research Questions 1 and 2 revealed that group status had a significant effect on performance on Reading to Learn and Reading to Integrate. Further, we realize that correlations are sensitive to variance, so the high correlations seen in the total population could have been an artifact of combining the three groups. Therefore, we examined the correlations among all reading measures for each group (available in Trites, 2000: Appendix 1, pp. 230–33). While the reading measures were still correlated, often moderately, sometimes highly, magnitudes differed and sometimes dropped substantially. The text-specific multiple-choice measures, BC1 and BC2, consistently correlated more highly with the Nelson–Denny and TOEFL Reading Comprehension tests than with Reading to Learn and Reading to Integrate, which were based on the same texts, suggesting a test method or construct effect. Because comparisons between different measures of basic comprehension were not a goal of the project, BC1 and BC2 were not used in further analyses. We conclude that, as expected, all reading measures were related, but the lower correlations between Reading to Learn and Reading to Integrate and the traditional basic comprehension measures led us to consider further types of analysis to identify the possible distinctiveness of the new measures.
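The concern that pooled correlations can be inflated by combining groups of different overall level can be seen in a toy example. The scores below are invented purely for illustration and are not data from this study: within each hypothetical group the two measures correlate only moderately, but pooling the groups adds between-group variance and pushes the correlation much higher.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical scores on two tests for two groups that differ in overall level.
low_group = ([1, 2, 3, 4], [2, 1, 4, 3])
high_group = ([11, 12, 13, 14], [12, 11, 14, 13])

r_low = pearson_r(*low_group)
r_high = pearson_r(*high_group)
r_pooled = pearson_r(low_group[0] + high_group[0],
                     low_group[1] + high_group[1])
print(round(r_low, 2), round(r_high, 2), round(r_pooled, 2))
```

This is why within-group correlations are the safer basis for comparing measures when group membership itself affects performance.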
5 Discriminant analysis
Because we were interested in determining how constructs differed, we sought additional analyses to help us better characterize the new constructs. Of the several possible statistical methods that could have been employed, two are most plausible: multivariate analysis of variance, usually associated with experimental research,
Table 7 Post hoc Scheffé for Reading to Integrate measure (n = 244ᵃ)

Group   n     Group   n     Mean difference   Standard error

NSU     101   NNSU    103   26.41ᵇ            2.53
              NNSG     40   10.05ᵇ            3.37
NNSU    103   NNSG     40   16.36ᵇ            3.36

Notes: ᵃ n size reduced for Reading to Integrate because of anomalous testing session; ᵇ p < .05
Table 8 Correlations for all reading measures for all participants (n = 251)

                              TOEFL           Basic           Basic           Reading    Reading to
                              Reading         Comprehension   Comprehension   to Learn   Integrateᵃ
                              Comprehension   Test 1          Test 2

Nelson–Denny                  .90ᵇ            .85ᵇ            .84ᵇ            .66ᵇ       .69ᵇ
TOEFL Reading Comprehension   1.00            .85ᵇ            .84ᵇ            .64ᵇ       .69ᵇ
Basic Comprehension 1                         1.00            .84ᵇ            .68ᵇ       .68ᵇ
Basic Comprehension 2                                         1.00            .68ᵇ       .70ᵇ
Reading to Learn                                                              1.00       .59ᵇ

Notes: ᵃ n size reduced for Reading to Integrate because of anomalous testing session; ᵇ p < .05
and discriminant analysis, usually associated with descriptive research (Tabachnick and Fidell, 1996). The present research was conducted with samples of naturally occurring student groups and was not experimental. Moreover, we were interested in finding ways to compare participant performance on the measures of the new constructs, Reading to Learn and Reading to Integrate, with performance on more traditional measures of basic comprehension. Thus, we opted to use discriminant analysis because of its parsimony of description and clarity of interpretation (Stevens, 1996). Discriminant analysis, a technique recommended to describe group differences or predict group membership based on a comparison of multiple predictors (Huberty, 1994), has been used in other areas of applied linguistic research to investigate creation of a student profile of success or failure on Computer Assisted Language Learning (CALL) lessons (Jamieson et al., 1993) and accurate classification of text types into registers (Biber, 1993), among other purposes.
To further distinguish basic comprehension from the new constructs, we conducted discriminant analysis on each of the two language groups (native and nonnative) to determine whether Reading to Learn and Reading to Integrate would classify participants in the same way that Basic Comprehension would. We divided the native speaker and nonnative speaker groups into three levels: high, middle (mid), and low reading ability scorers, based on the basic comprehension measure chosen for that group (Nelson–Denny for native speakers; TOEFL Reading Comprehension for nonnative speakers). Research methodologists (Tabachnick and Fidell, 1996: 513) note that robustness is expected when the smallest group has at least 20 cases; our smallest group had 25. To check the assumption of homogeneity of the variance/covariance matrices, we examined the outcomes of Box's M test and found them all nonsignificant (Klecka, 1980). Each group was checked for outliers using Mahalanobis distance, treated as Chi-Square, and no outliers were found (Tabachnick and Fidell, 1996). Thus, the data met all assumptions required for use of discriminant analysis.
a Discriminant analysis for nonnative speakers: To organize the discriminant analysis in order to see if the Reading to Learn/Reading to Integrate Composite classified participants similarly to the measure of Basic Comprehension (for nonnative speakers, TOEFL Reading Comprehension), we divided the entire nonnative speaker group (n = 146) into three levels of basic comprehension: high, mid,
and low. These reading ability groups of high, mid, and low were based on typical TOEFL Reading Comprehension score levels required for program entry. Participants were classified as high if their scores were greater than or equal to 550, or 56 and above on the scaled score on the TOEFL Reading Comprehension (550 is the cut score often used for graduate entry). Participants were classified as mid if scores ranged between 500 and 549, or 50 to 55 on the TOEFL Reading Comprehension. Participants were classified as low if their scores fell below 500, or 49 and below on TOEFL Reading Comprehension (500 is a minimum TOEFL score sometimes used for undergraduate admission, often with the proviso that students enroll in ESL classes either prior to official enrollment or concurrently). For our entire nonnative speaker group, descriptive statistics on basic comprehension reading ability group membership levels appear in Table 9.
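The three-way split described above amounts to a simple threshold rule on the scaled section score. The following is a minimal sketch; the function and parameter names are hypothetical and chosen only for illustration.

```python
def reading_ability_group(scaled_score, high_cut=56, mid_cut=50):
    """Classify a TOEFL Reading Comprehension scaled score as high, mid or low.

    Cut points follow the text: 56 and above is high, 50 to 55 is mid,
    49 and below is low. The function and parameter names are illustrative.
    """
    if scaled_score >= high_cut:
        return "high"
    if scaled_score >= mid_cut:
        return "mid"
    return "low"


for score in (49, 50, 55, 56):
    print(score, reading_ability_group(score))
```

Keeping the cut points as parameters makes it easy to apply the same rule with a different pair of thresholds for another basic comprehension measure.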
We ran SPSS Discriminant Analysis with initial grouping variables of high, mid, and low reading ability. We compared initial reading ability levels with high, mid, and low categories on the Reading to Learn/Reading to Integrate Composite, a new variable reflecting level of performance on the Reading to Learn and Reading to Integrate measures combined.⁵ The discriminant analysis yielded one discriminant function with an eigenvalue of .92, responsible for 99.9% of the variance in outcomes. Wilks' Lambda for this function was .52, significant at .001; the associated Chi-Square value was extremely large (90.93) and highly significant (p < .001), indicating that the group centroids on the composite Reading to Learn/Reading to Integrate function for the three nonnative speaker reading ability groups were significantly different. Both Reading to Learn and
Table 9 Descriptive statistics for nonnative speakers by TOEFL Reading Comprehension reading ability groups (n = 143)

Reading ability group   n     Mean on TOEFL Reading Comprehension   sd

High (≥56)               57   59.68                                 3.03
Mid (50–55)              42   52.81                                 1.67
Low (≤49)                44   42.11                                 5.47
Total                   143   52.26                                 8.22

Note: Total n = 143 due to loss of three cases in anomalous Reading to Integrate session.
⁵ We first calculated two separate discriminant analyses, one for Reading to Learn and one for Reading to Integrate, but we found that both loaded on a single function, so we used the composite in subsequent analyses.
Reading to Integrate loaded significantly on the discriminant function, at .86 for Reading to Learn and .78 for Reading to Integrate (p < .05).
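The reported eigenvalue, Wilks' Lambda, and Chi-Square for the nonnative speaker analysis are linked by standard formulas, which the sketch below retraces. It assumes two predictors (Reading to Learn and Reading to Integrate), three groups, and Bartlett's usual chi-square approximation; the article does not state how the statistical package computed its value, so a small discrepancy from the reported 90.93 is to be expected from rounding. Function names are ours.

```python
import math


def wilks_lambda(eigenvalues):
    """Wilks' Lambda as the product of 1 / (1 + eigenvalue) over functions."""
    result = 1.0
    for ev in eigenvalues:
        result *= 1.0 / (1.0 + ev)
    return result


def bartlett_chi_square(lam, n_cases, n_predictors, n_groups):
    """Bartlett's chi-square approximation for testing Wilks' Lambda."""
    multiplier = n_cases - 1 - (n_predictors + n_groups) / 2
    chi_square = -multiplier * math.log(lam)
    df = n_predictors * (n_groups - 1)
    return chi_square, df


# Reported eigenvalue for the nonnative speaker analysis; the second, very
# small function (about 0.1% of the variance) is ignored here.
lam = wilks_lambda([0.92])
chi_square, df = bartlett_chi_square(lam, n_cases=143, n_predictors=2,
                                     n_groups=3)
print(round(lam, 2), round(chi_square, 1), df)
```

The Lambda recovered from the eigenvalue rounds to the reported .52, and the approximate chi-square lands close to the reported figure, which supports reading the garbled values as .92, .52, and 90.93.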
Over half (64.9%) of the high reading ability group remained high on the new measure; less than half (42.9%) of the mid reading ability group remained classified as mid. Hence, the Reading to Learn/Reading to Integrate Composite was particularly influential in reclassifying the mid reading ability group and, to a lesser extent, the high group. However, most (81.8%) of the low reading ability group remained low on the Reading to Learn/Reading to Integrate Composite (see Table 10). Of the 143 nonnative speaker participants, 91 (64%) remained in the initial basic comprehension category on the composite; the rest moved, but in different directions. Twenty-one participants (14.7%) were classified into a higher category on the Reading to Learn/Reading to Integrate Composite than their initial basic comprehension level would have suggested, while 31 (21.7%) were reclassified into a lower category. Thus, 52 participants (36.4%), just over one third of the sample, were classified differently based on their Reading to Learn/Reading to Integrate Composite performance.
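The stayed/higher/lower tallies follow directly from the cell counts in Table 10: the diagonal holds participants whose category did not change, cells below the diagonal hold those predicted into a higher category, and cells above it hold those predicted into a lower one. A short sketch, with illustrative names of our own, makes the bookkeeping explicit.

```python
# Rows: initial TOEFL reading ability group; columns: predicted group on the
# composite. Order in both directions is high, mid, low (counts from Table 10).
counts = [
    [37, 17, 3],
    [13, 18, 11],
    [2, 6, 36],
]


def reclassification_summary(matrix):
    """Count cases that stayed, moved to a higher category, or moved lower."""
    stayed = moved_up = moved_down = 0
    for row_index, row in enumerate(matrix):
        for col_index, count in enumerate(row):
            if col_index == row_index:
                stayed += count
            elif col_index < row_index:   # predicted category is higher
                moved_up += count
            else:                         # predicted category is lower
                moved_down += count
    return stayed, moved_up, moved_down


stayed, moved_up, moved_down = reclassification_summary(counts)
total = stayed + moved_up + moved_down
for label, value in [("stayed", stayed), ("up", moved_up),
                     ("down", moved_down)]:
    print(label, value, round(100 * value / total, 1))
```

The diagonal of Table 10 sums to 91, and the off-diagonal cells sum to 21 and 31, which together account for all 143 participants.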
b Discriminant analysis for native speakers: Because one of our goals in this project was to probe the possible validity of these new measures by assessing performance of two groups, native as well as
Table 10 Discriminant analysis comparison of nonnative speaker reading ability groups with Reading to Learn/Reading to Integrate Composite (n = 143)

Reading ability group     Predicted group membership for            Initial
                          Reading to Learn/Reading to               classification
                          Integrate Composite                       total
                          High      Mid       Low

Count
  High (≥56)              37        17         3                     57
  Mid (50–55)             13        18        11                     42
  Low (≤49)                2         6        36                     44
  Reclassification total  52        41        50                    143

Percentage
  High (≥56)              64.9      29.8       5.3                  100.0
  Mid (50–55)             31.0      42.9      26.2                  100.0
  Low (≤49)                4.5      13.6      81.8                  100.0

Note: Total n = 143 due to loss of three cases in anomalous Reading to Integrate session.
nonnative speakers, we conducted a parallel discriminant analysis for native speakers. Thus, for the native speakers, we followed the same procedure, dividing the entire native speaker group (n = 105) into three levels of basic comprehension (high, mid, and low) based on score distances of .5 standard deviations from the sample mean of the Nelson–Denny test for these participants. Native speakers need not take reading comprehension tests when entering the university, so the three-way split was based entirely on our sample data. Participants were classified as high if their scores on the Nelson–Denny were greater than or equal to 135. Participants were classified as mid if their scores ranged from 119 to 134 on the Nelson–Denny. Participants were classified as low if their scores fell at or below 118 on the Nelson–Denny. Descriptive statistics for the entire native speaker sample on basic comprehension group membership appear in Table 11.
The discriminant analysis yielded one discriminant function with an eigenvalue of .31, responsible for 98.7% of the variance in outcomes. Wilks' Lambda for this function was .76, significant at .001; the associated Chi-Square value was large (26.39) and highly significant (p < .001), indicating that the group centroids on the discriminant function for the three reading ability groups on the Reading to Learn/Reading to Integrate Composite were significantly different. Pooled within-groups correlations between discriminating variables showed that Reading to Learn correlated with the first discriminant function at a level of .81; Reading to Integrate correlated with the second discriminant function at .71. This contrasts with findings for the nonnative speakers, where scores on the combined new measures loaded significantly on only one discriminant function. For native speakers, then, there is evidence for two significant discriminant functions, although the first accounts for almost all of the variance. Although these two measures (Reading to Learn and Reading to Integrate) loaded on two separate discriminant functions, they still
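The native speaker cut scores can be retraced from the sample statistics. The sketch below assumes the split used the Nelson–Denny mean and standard deviation reported for the 105 native speakers in Table 1 (126.48 and 16.46 once decimal points are restored) and a distance of half a standard deviation on either side of the mean; variable names are ours.

```python
import math

# Nelson-Denny mean and sd for the 105 native speakers (Table 1), with
# decimal points restored; treating these as the basis of the split is an
# assumption.
MEAN, SD = 126.48, 16.46

lower_bound = MEAN - 0.5 * SD   # scores below this boundary are low
upper_bound = MEAN + 0.5 * SD   # scores above this boundary are high

low_max = math.floor(lower_bound)    # highest whole score still in the low band
high_min = math.ceil(upper_bound)    # lowest whole score in the high band
print(low_max, low_max + 1, high_min - 1, high_min)
```

Under these assumptions the boundaries fall at 118.25 and 134.71, which reproduces the whole-score bands given in the text: 118 and below, 119 to 134, and 135 and above.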
Table 11 Descriptive statistics for native speakers by Nelson–Denny reading ability groups (n = 101)

Reading ability group   n     Mean on Nelson–Denny   sd

High (≥135)              39   141.26                  5.19
Mid (119–134)            37   125.65                  5.09
Low (≤118)               25   102.88                 10.98
Total                   101   126.04                 16.52

Note: Total n = 101 for discriminant analysis due to loss of four cases in anomalous Reading to Integrate session.
showed moderate correlations with the alternate function,⁶ justifying the composite calculations.
Results of discriminant analysis for native speakers, seen in Table 12, show a different pattern than that observed for nonnative speakers. Nearly three-fourths (71.8%) of the native speakers classified as high in the basic comprehension reading ability group remained high on the Reading to Learn/Reading to Integrate Composite. For the mid group, however, only 18.9% remained classified as mid. Just over half (52%) of the low reading ability group members remained low on the Reading to Learn/Reading to Integrate Composite. As with the nonnative speakers, participants in the mid category on basic comprehension showed the most frequent reclassification. Forty-eight of the 101 (47.5%) native speaker participants remained in the initial classification categories. Twenty-two (21.8%) were reclassified into a higher category and 31 (30.7%) were reclassified into a lower category. Thus, 53 participants (52.5%), over half of the sample, were classified differently based on their performance on the Reading to Learn/Reading to Integrate Composite.
⁶ Reading to Learn correlated with function 2 at −.58; Reading to Integrate with function 1 at .71.
Table 12 Discriminant analysis comparison of native speaker reading ability groups with Reading to Learn/Reading to Integrate Composite (n = 101)

Reading ability group     Predicted group membership for            Initial
                          Reading to Learn/Reading to               classification
                          Integrate Composite                       total
                          High      Mid       Low

Count
  High (≥135)             28         3         8                     39
  Mid (119–134)           10         7        20                     37
  Low (≤118)               5         7        13                     25
  Reclassification total  43        17        41                    101

Percentage
  High (≥135)             71.8       7.7      20.5                  100.0
  Mid (119–134)           27.0      18.9      54.1                  100.0
  Low (≤118)              20.0      28.0      52.0                  100.0

Note: Total n = 101 for discriminant analysis due to loss of four cases in anomalous Reading to Integrate session.
V Interpretation
Analyses done to answer Research Questions 1 and 2 showed that performance on Reading to Learn and Reading to Integrate measures was significantly influenced by language background and level of education (graduate vs. undergraduate, for nonnative speakers only). Moreover, level of computer familiarity had no significant effect on Reading to Learn and Reading to Integrate performance.
Correlations showed that, as expected, all reading measures correlated to some degree, generally answering Research Question 3. The most interesting results came from the discriminant analyses, because they showed a pattern of differential classification on the new tasks sensitive to initial reading ability. This pattern differed for nonnative speakers and native speakers. Examination of the reclassifications based on the new tasks revealed some of the problems of using a basic comprehension-only test to predict performance on more challenging literacy tasks. These results also imply the need for further work on new measures of advanced literacy skills, such as Reading to Learn and Reading to Integrate, to reflect trends in construct-driven assessment (Pellegrino et al., 1999).
For the nonnative speakers, most (81.8%) of the participants with TOEFL Reading Comprehension below 50 remained classified as low on the Reading to Learn/Reading to Integrate Composite, suggesting the existence of a lower threshold of academic English proficiency. Participants below this threshold were unlikely to perform well on tasks assessing Reading to Learn and Reading to Integrate. Even tasks involving only selection (such as BC1 and BC2), rather than production of responses, were difficult for this group. However, for the nonnative speaker readers in the mid and high reading ability groups, basic comprehension level was not nearly as consistent a predictor of performance on the Reading to Learn/Reading to Integrate Composite. This was especially striking for those in the mid reading ability group. Approximately one fourth of the mid group dropped into the low category on the Reading to Learn/Reading to Integrate Composite, indicating that they experienced more difficulty in completing the new measures, but approximately one fourth could indeed categorize and synthesize information relatively well, despite middling performance on basic comprehension. This finding suggests that, for nonnative speaker readers in the mid reading ability category, basic comprehension measures were insufficient to predict their performance on the new measures.
Almost two-thirds of the high basic comprehension nonnative speakers remained high on the Reading to Learn/Reading to Integrate Composite, but one third dropped, suggesting that they could not categorize and synthesize information from texts as well as they could recognize information. For this third, the recognition-only task, multiple-choice basic comprehension, overestimated ability to manipulate and synthesize information. For the high and mid reading ability nonnative speaker groups, basic comprehension-only tests may then overestimate academic English proficiency.
Results for the native speakers showed a more definitive pattern of reclassification. For the low reading ability group, approximately half were misclassified, suggesting that the basic comprehension measure underrepresented their ability to categorize and synthesize information in academic texts. Most of the mid reading ability group were reclassified as lower on the Reading to Learn/Reading to Integrate Composite, showing that basic comprehension results overestimated ability to succeed on more challenging tasks. On the other hand, almost one third of the mid reading ability group did better on the Reading to Learn/Reading to Integrate Composite; for them, basic comprehension underrepresented their ability to complete more challenging reading tasks. For the high ability readers, nearly one third dropped on the Reading to Learn/Reading to Integrate Composite. For them, the basic comprehension measure overestimated their level of academic English proficiency. The other two thirds of the high reading ability group remained high, suggesting a higher threshold of academic English proficiency.
Considering results from both the nonnative speakers and native speakers, we conclude that the new tasks did assess something different from basic comprehension, once a lower level threshold of basic academic English proficiency had been achieved. Examination of scatterplots based on discriminant functions indicated an obvious separation between participants who could and could not perform on the Reading to Learn/Reading to Integrate Composite, thus providing some tentative evidence for concurrent validity (Messick, 1989; Chapelle, 1999). We had hoped to find clear evidence of a hierarchy, suggesting that Reading to Learn was demonstrably more difficult than basic comprehension and Reading to Integrate demonstrably more difficult than Reading to Learn, but results did not yield an obvious hierarchy. Results did, however, suggest an even simpler pattern: a dichotomy. For those nonnative speakers above the lower threshold of English language proficiency, which in our data would be approximately 500, or 49 on the scaled TOEFL Reading
Comprehension scores, some could perform well on the new measures; some could not, revealing two rather than three groups on the Reading to Learn/Reading to Integrate Composite. These results lead us to speculate that the new measures tap additional skills, such as 'sophisticated discourse processes and critical thinking skills' (Enright and Schedl, 1999: 24), in addition to language proficiency. For native speakers, the discriminant loading suggested a possible hierarchy of difficulty, but the scatterplots and classification tables revealed a dichotomy. Reclassification based on the Reading to Learn/Reading to Integrate Composite nearly eliminated the mid basic comprehension reading ability category, with only 16.8% of the native speakers qualifying as mid on the Reading to Learn/Reading to Integrate Composite.
VI Conclusions
While we acknowledge that there are some limitations to this project, such as the time required to administer and score new measures, the results are illuminating and useful, and suggest some considerations for future research. The first consideration relates to appropriate classification of students based on academic reading abilities. This corresponds to the use of TOEFL as a gatekeeper by many institutions. For TOEFL test-takers whose current TOEFL scores would be in an intermediate to high intermediate range, the new tasks could assess additional abilities relevant to academic performance. For such participants, additional test development efforts are warranted. The majority of high reading ability nonnative speakers (64.9%) could perform the new measures; a third could not. Our results suggest that some unknown number of mid reading ability nonnative speakers are more capable of succeeding at more challenging academic reading tasks than their current level of basic comprehension assessment would indicate. At the same time, there are some high and mid reading ability participants who could not perform the more demanding tasks; admitting such students directly into university programs could result in failure. These results suggest that, for most students at lower levels of basic comprehension (in our study, those with TOEFL scores below 490 to 500), development of new tasks is unnecessary. Nevertheless, our results indicate that a certain small percentage (in our sample, 8 students, 18.2%) of nonnative speakers classified as low reading ability on basic comprehension did in fact
perform adequately on more challenging tasks like Reading to Learn and Reading to Integrate. Given the very large number of TOEFL test-takers around the world, this finding merits further research. It would be potentially unfair to exclude such students from university study based on basic comprehension-only measures such as the current TOEFL. We would urge institutions that use TOEFL to consider scores on these new, more demanding tasks, if doing so would address institutional needs. According to interview data collected in conjunction with this project (Trites, 2000: Chapter 6), participants perceived that successful completion of the new tasks required a thorough understanding of the texts, whereas the multiple-choice tests were less demanding, requiring only superficial grasp of content. This finding corroborates the observation of Freedle and Kostin (1999: 3), who note that often examinees do not need to comprehend the accompanying text to answer the test item. Therefore, for all TOEFL test-takers, incorporating more challenging tasks such as the Reading to Learn and Reading to Integrate measures into typical English language instruction could have positive washback effects.
The second area of consideration is the focus on additional relevant research. For predictive validity, it is important to determine the correlation between the Reading to Learn/Reading to Integrate Composite measures and actual academic performance in university classes. Correlations might differ depending on the degree of categorization and synthesis required by different major fields of study. Another area for future research is related to the novelty of the Reading to Learn task as an assessment technique. Current pedagogical trends in literacy instruction emphasize the use of graphic organizers as a means to understand and manipulate information in texts. To our knowledge, graphic organizers are typically used as classroom activities rather than assessment tools. They have good potential for use in assessment if students are familiar with them and if appropriate scoring systems can be developed. This project demonstrates that it is possible, though labor intensive, to develop reliable scoring systems. Further research is needed to explore refinement of this task type and related scoring systems for use in testing programs with large numbers of test-takers and scorers. Although the Reading to Integrate task (generating a synthesis) was more familiar, the scoring system was innovative because it reflected a reader's ability to recognize textual frames as well as integrate information. If test developers are interested in the abilities of nonnative speaker readers to perform such tasks, further development of similar tasks and scoring systems is warranted. While a substantial investment of time
200 New tasks for reading comprehension tests
would be required to refine the administration and scoring systemsneeded for more complex tasks such as these it should be weighedagainst the possible danger of under-representing the likelihood ofacademic success based on results of the basic comprehension-onlymeasures still most often used to assess academic proficiency
Acknowledgements
This project was funded by a grant from Educational Testing Service as part of the TOEFL 2000 effort. We appreciate the cooperation of the ETS staff members who assisted in the development of passage-specific tasks and other aspects of the research; however, no official endorsement of Educational Testing Service should be inferred.
VII References
Bachman, L. 2000: Modern language testing at the turn of the century: assuring that what we count counts. Language Testing 17, 1–42.
Biber, D. 1993: Using register-diversified corpora for general language studies. Computational Linguistics 19, 219–41.
Britt, M., Rouet, J. and Perfetti, C. 1996: Using hypertext to study and reason about historical evidence. In Rouet, J., Levonen, J., Dillon, A. and Spiro, R., editors, Hypertext and cognition. Mahwah, NJ: Lawrence Erlbaum, 43–72.
Chapelle, C. 1999: Validity in language assessment. Annual Review of Applied Linguistics 19, 1–19.
Educational Testing Service 1997: TOEFL test and score manual. Princeton, NJ: Educational Testing Service.
Educational Testing Service 1998: Draft TOEFL 2000 research agenda framework: areas of research. Research agenda three. TOEFL 2000 internal document. Princeton, NJ: Educational Testing Service.
Eignor, D., Taylor, C., Kirsch, I. and Jamieson, J. 1998: Development of a scale for assessing the level of computer familiarity of TOEFL examinees. TOEFL Research Report No. 60. Princeton, NJ: Educational Testing Service.
Enright, M. and Schedl, M. 1999: Reading for a reason: using reader purpose to guide test design. TOEFL 2000 Internal Report. Princeton, NJ: Educational Testing Service.
Enright, M., Grabe, W., Mosenthal, P., Mulcahy-Ernt, P. and Schedl, M. 1998: A TOEFL 2000 framework for testing reading comprehension: a working paper. Princeton, NJ: Educational Testing Service.
Foltz, P. 1996: Comprehension, coherence and strategies in hypertext. In Rouet, J., Levonen, J., Dillon, A. and Spiro, R., editors, Hypertext and cognition. Mahwah, NJ: Lawrence Erlbaum, 109–36.
Freedle, R. and Kostin, I. 1999: Does the text matter in a multiple choice test of comprehension? The case for the construct validity of TOEFL’s minitalks. Language Testing 16, 2–32.
Goldman, S. 1997: Learning from text: reflections on the past and suggestions for the future. Discourse Processes 23, 357–98.
Hayes, J. and Hatch, J. 1999: Issues in measuring reliability: correlation versus percentage of agreement. Written Communication 16, 354–67.
Huberty, C. 1994: Applied discriminant analysis. New York: John Wiley.
Jamieson, J., Campbell, J., Norfleet, L. and Berbisada, N. 1993: Reliability of a computerized scoring routine for an open-ended task. System 21, 305–22.
Jamieson, J., Norfleet, L. and Berbisada, N. 1993: Successes, failures and dropouts in computer-assisted language lessons. Computer Assisted English Language Learning Journal 4, 12–20.
Klecka, W. 1980: Discriminant analysis. In Lewis-Beck, M., editor, Quantitative applications in the social sciences, Volume 19. Newbury Park, CA: Sage.
Lehto, M., Zhu, W. and Carpenter, B. 1995: The relative effectiveness of hypertext and text. International Journal of Human-Computer Interaction 7, 293–313.
McNamara, D. and Kintsch, W. 1996: Learning from texts: effects of prior knowledge and text coherence. Discourse Processes 22, 247–88.
Messick, S. 1989: Validity. In Linn, R.L., editor, Educational measurement, 3rd edition. New York: American Council on Education/Macmillan, 13–103.
Meyer, B. 1985a: Prose analyses: purposes, procedures and problems. In Britton, B. and Black, J., editors, Understanding expository text. Hillsdale, NJ: Lawrence Erlbaum, 11–64.
Meyer, B. 1985b: Prose analysis: purposes, procedures and problems: Part 2. In Britton, B. and Black, J., editors, Understanding expository text. Hillsdale, NJ: Lawrence Erlbaum, 269–97.
Monks, V. 1997, August/September: Two views, same waterway. National Wildlife 35, 36–37.
Pellegrino, J., Baxter, G. and Glaser, R. 1999: Addressing the ‘two disciplines’ problem: linking theories of cognition and learning with assessment and instructional practice. Review of Research in Education 24, 307–53.
Perfetti, C. 1997: Sentences, individual differences and multiple texts: three issues in text comprehension. Discourse Processes 23, 337–55.
Perfetti, C., Britt, M.A. and Georgi, M. 1995: Text-based learning and reasoning: studies in history. Hillsdale, NJ: Lawrence Erlbaum.
Perfetti, C., Marron, M. and Foltz, P. 1996: Sources of comprehension failure: theoretical perspectives and case studies. In Cornoldi, C. and Oakhill, J., editors, Reading comprehension difficulties: processes and intervention. Mahwah, NJ: Lawrence Erlbaum, 137–65.
Reinking, D. 1988: Computer-mediated text and comprehension differences: the role of reading time, reader preference and estimation of learning. Reading Research Quarterly 23, 484–500.
Reinking, D. and Schreiner, R. 1985: The effects of computer-mediated text on measures of reading comprehension and reading behavior. Reading Research Quarterly 20, 536–53.
Spivey, N. 1997: The constructivist metaphor: reading, writing and the making of meaning. San Diego, CA: Academic Press.
Stevens, J. 1996: Applied multivariate statistics for the social sciences, 3rd edition. Mahwah, NJ: Lawrence Erlbaum.
Tabachnick, B. and Fidell, L. 1996: Using multivariate statistics, 3rd edition. New York: Harper Collins.
Taylor, C., Jamieson, J., Eignor, D. and Kirsch, I. 1998: The relationship between computer familiarity and performance on computer-based TOEFL test tasks. TOEFL Research Report No. 61. Princeton, NJ: Educational Testing Service.
Tennesen, M. 1997, November/December: On a clear day. National Parks 71, 26–9.
Trites, L. 2000: Beyond basic comprehension: reading to learn and reading to integrate for native and non-native speakers. Unpublished doctoral dissertation, Northern Arizona University, Flagstaff, AZ.
Van den Berg, S. and Watt, J. 1991: Effects of educational setting on student responses to structured hypertext. Journal of Computer-Based Instruction 18, 118–24.
Van Dijk, T.A. and Kintsch, W. 1983: Strategies of discourse comprehension. New York: Academic Press.
Wiley, J. and Voss, J. 1999: Constructing arguments from multiple sources: tasks that promote understanding and not just memory for text. Journal of Educational Psychology 91, 310–11.
Zimmerman, T. 1997, December 29: Filter it with billions and billions of oysters: how to revive the Chesapeake Bay. US News and World Report 123, 63.
Appendix 1a Chart completion task
Directions: Complete the following chart. Fill in as much detail as possible from the text read by categorizing the information into the different areas on the chart. Do not use single words for your responses; form your responses in phrases or complete sentences. Include examples from the text.
Make no judgments about the accuracy of causes or effects or the effectiveness of the solutions mentioned in the text. Solutions are seen as any action taken in response to the problem(s). Solutions can take many forms, such as proposed solutions, attempted solutions or failed solutions. Also, space is provided under each category for examples. Examples are specific examples found in the text that are used by the author(s) to exemplify the problems, causes, effects or solutions in the text. However, there may not be examples for every category.
Points will be awarded for correct responses only. There is no penalty for incorrect responses. Points will be awarded in the following manner:
Problems and Solutions: 10 points each
Causes and Effects: 5 points each
Examples: 1 point each

Problems      Causes        Effects       Solutions

Examples      Examples      Examples      Examples
Appendix 1b Reading to Learn scoring rubric (0–241 total possible)

Problems (10 points); 1 word (6 points); 40 maximum
• Air pollution/Smog in the National Parks
• Opposition or Ignoring (or lack of cooperation) to environmental standards (regulations)
• No true ‘point sources’ (myriad of smaller pollution sources rather than one large source)
• National Parks limited jurisdiction of pollution sources outside of the parks

Examples (1 point); 5 maximum
• Great Smoky Mountain National Park
• Grand Canyon National Park
• Smoky Mountain Sources from Ohio, New York, Atlanta etc
• Grand Canyon Sources from CA, NV, UT, AZ, NM and Mexico
• TN Luttrell corp building permit situation

Causes (5 points); 1 word (3 points); 55 maximum
• Sulfur/nitrogen emissions
• Acid rain
• Ground-level ozone
• Urban/Industrial emissions
• Automobile emissions
• Power plants/factories/plants emissions
• Emissions from Smokestacks (chimneys)
• Emissions from Kilns
• Unregulated pollution/smog (from other countries, Mexico)
• Smoke from Controlled burns
• Emissions from large cities

Examples (1 point, or 5 points with overarching cause listed above)
• Automobile emissions
• Power plants/factories/plants emissions
• Emissions from Smokestacks (chimneys)
• Emissions from Kilns
• Unregulated pollution/smog (from other countries, Mexico)
• Smoke from Controlled burns
• Emissions from large cities

Examples (1 point); 10 maximum
• Navajo Generating Station, Page AZ (Power Plant AZ)
• Power plants in TN and OH river valley
• Southern California Edison Plant, Laughlin NV
• Tennessee Luttrell Kilns TN
• Industry in AZ
• Cars in CA
• Smokestacks in NM, NV, UT
• Los Angeles
• Atlanta
• New York

Effects (5 points); 1 word (3 points); 30 maximum
• Trees and plants affected (injured), growth hindered
• Visibility decreased
• Metals loosened into waters (surface/ground water, water polluted)
• Nutrients removed (leached) from soil and/or plants
• Public outcry over stance of big business/government
• Aquatic life injured (damaged)

Examples (1 point); 6 maximum
• Leaves of plants turning purple and brown (stippling)
• Hindering photosynthesis of plants and trees
• Reduction of visibility at Grand Canyon
• Visibility reduced 90% of the days
• See vague blue masses
• See half as far as in 1919

Solutions (10 points); 1 word (6 points); 90 maximum
• Stricter Environmental Laws (resolutions, acts)
• 1977 Clean Air Act Amendments labeling NPs as Class I areas
• Regional haze regulations proposed by the EPA
• Reduction of allowable pollution standards from industry
• Clean Air Act set visibility goals
• Objecting to construction permits
• Identification of pollution sources
• Industry modifications: Installation of devices (scrubbers) to reduce pollution
• Public pressure to protect parks

Examples (1 point, or 10 points with overarching solution listed above)
• 1977 Clean Air Act Amendments labeling NPs as Class I areas
• Regional haze regulations proposed by the EPA
• Reduction of allowable pollution standards from industry

Examples (1 point); 5 maximum
• Objecting to kilns in TN
• Researchers using scientific technology to ID sources (radioactive isotopes)
• Southern California Edison Plant in Laughlin NV identified as point source
• Scrubbers installed at Navajo Generating Plant
• 10% reduction of pollution in 10–15 years (goal of no man-made pollution)
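The 241-point maximum can be checked against the category values in the rubric above. The following is a minimal Python sketch rather than the authors' scoring procedure: the dictionary and function names are hypothetical, the values are read from the rubric, and the alternative 5- or 10-point credit for an overarching cause or solution listed as an example is not modeled.

```python
# Illustrative sketch; names are hypothetical, values are read from the rubric.
# Each main category maps to (points per full response, category maximum).
MAIN_CATEGORIES = {
    "problems": (10, 40),
    "causes": (5, 55),
    "effects": (5, 30),
    "solutions": (10, 90),
}
# Specific examples earn 1 point each, up to these maxima.
EXAMPLE_MAXIMA = {"problems": 5, "causes": 10, "effects": 6, "solutions": 5}


def max_total() -> int:
    """Return the highest total the rubric allows."""
    main = sum(cap for _, cap in MAIN_CATEGORIES.values())
    return main + sum(EXAMPLE_MAXIMA.values())


def category_score(category: str, full: int, one_word: int, examples: int) -> int:
    """Score one column from counts of full, one-word and example responses."""
    points, cap = MAIN_CATEGORIES[category]
    reduced = points * 6 // 10  # one-word credit: 6 of 10 points, 3 of 5
    main = min(full * points + one_word * reduced, cap)
    return main + min(examples, EXAMPLE_MAXIMA[category])


if __name__ == "__main__":
    print(max_total())
    print(category_score("causes", full=2, one_word=1, examples=3))
```

The category maxima (40 + 55 + 30 + 90) and the example maxima (5 + 10 + 6 + 5) together account for the stated total.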
Appendix 2a Reading to Integrate task integration activity
Directions: For the next 15 minutes, reflect on what you read and compose a short essay that combines the information in the material read and makes connections across the range of ideas and arguments presented.
[Ruled lines for the participant's written response]
Appendix 2b Reading to Integrate scoring rubric
Integration

50 Excellent: Integrates texts accurately and successfully on multiple levels and creates a true Documents Model. Generalizes at least two macrostructure concepts common across texts (this may be simply identifying the existence of a macrostructure followed by the supporting macrostructures from each text). Effectively integrates relevant support (i.e., details or macrostructures) to support generalizations and may still discuss each article separately to some degree. Integrates macrostructures present in both texts.

40 Good: Integrates through the creation of a well developed and accurate introduction and/or a conclusion, yet summarizes articles separately, AND/OR generalizes one major macrostructure that is fully developed with support. Uses some relevant support.

30 Fair: Attempts to integrate through the creation of a partially developed, possibly inaccurate, introductory AND/OR concluding statement, usually related to the main topic of the articles, but has no substantive development. Creates a Text and Situation Model of each text separately. Does not make generalizations for integration beyond the one statement. May make evaluative or editorial statements; however, these may be inaccurate.

20 Poor: Creates a well developed and accurate Text Model and Situation Model for each of the texts, yet attempts no integrative connection across the texts.

10 Very Poor: Ineffectively or inaccurately attempts to summarize each article in a very reporting style, revealing little or no use of background knowledge, contextualization or evaluation. Response is simply a recall of information from the texts. No introduction or conclusion, OR participant confuses texts and sees separate texts as one issue (problem).

0 No Response: Participant does not attempt response or only addresses one article.
Macrostructures

Participants will receive one point for each macrostructure accurately identified in each text, with a minimum score of 4 awarded for the inability to accurately identify any macrostructures.

25 Excellent (Total 8): Accurately identifies all 4 macrostructures present in both texts.

20 Good (Total 6/7): Accurately identifies all 4 macrostructures in one text and 3 in the other, OR accurately identifies 3 of the four macrostructures in both texts (or 4 in one and 2 in the other).

15 Fair (Total 4/5): Accurately identifies 3 of the four macrostructures in one text and 2 of the four in the other (or 4 in one and 1 in the other), OR accurately identifies 2 of the four macrostructures in both texts (or 4 in one and 0 in the other, or 3 in one and 1 in the other).

10 Poor (Total 2/3): Accurately identifies 2 of the macrostructures in one text and 1 in the other (or 3 in one and 0 in the other), OR accurately identifies 1 of the four macrostructures in both texts (or 2 in one and 0 in the other).

5 Very Poor (Total 0/1): Accurately identifies 1 of the four macrostructures in one text and 0 in the other, OR unable to accurately identify any macrostructures in the texts.

0 No Response: Participant does not attempt response.
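The macrostructure bands above amount to a lookup on the combined count of macrostructures identified across the two texts. A minimal Python sketch of that mapping follows; the function and parameter names are hypothetical. Note that the introductory sentence of this subsection mentions a minimum score of 4, whereas the band listing gives 5 for a total of 0 or 1; the sketch follows the band listing, which is an assumption.

```python
def macrostructure_score(found_text_1: int, found_text_2: int,
                         attempted: bool = True) -> int:
    """Map macrostructures identified in each text (0-4) to a band score."""
    if not attempted:
        return 0  # No Response
    for found in (found_text_1, found_text_2):
        if not 0 <= found <= 4:
            raise ValueError("each text has four macrostructures")
    total = found_text_1 + found_text_2
    if total == 8:
        return 25  # Excellent
    if total >= 6:
        return 20  # Good (total 6 or 7)
    if total >= 4:
        return 15  # Fair (total 4 or 5)
    if total >= 2:
        return 10  # Poor (total 2 or 3)
    return 5       # Very Poor (total 0 or 1)
```

Every combination named in the rubric (for example, 4 in one text and 0 in the other for Fair) falls in the band given by its total, so the total alone is sufficient.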
Use of relevant details
5 Excellent: Effectively uses multiple relevant details as support; no irrelevant or erroneous details.

4 Good: Effectively uses relevant details as support; may include some inaccurate or irrelevant details. (More than 50% of details are relevant.)

3 Fair: Possibly uses one or two relevant details, yet several irrelevant details or inaccurate details (erroneous or fabricated) appear in the synthesis. (50% or less of details are relevant.)

2 Poor: Erroneous or fabricated details used in attempt to support arguments presented; does not include relevant details.

1 Very Poor: No details used or listed.

0 No Response: Participant does not attempt response.
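The three subscales of this rubric (Integration, Macrostructures, Use of relevant details) can be combined into a single Reading to Integrate score. The appendix does not state the combination rule; the sketch below assumes a simple sum, which is consistent with the 80-point maximum reported for this measure (50 + 25 + 5). The class and constant names are hypothetical.

```python
from dataclasses import dataclass

# Permitted ratings on each subscale, as listed in the rubric.
INTEGRATION_LEVELS = (0, 10, 20, 30, 40, 50)
MACROSTRUCTURE_LEVELS = (0, 5, 10, 15, 20, 25)
DETAIL_LEVELS = (0, 1, 2, 3, 4, 5)


@dataclass(frozen=True)
class SynthesisRating:
    """One rater's scores for a written synthesis (hypothetical container)."""
    integration: int
    macrostructures: int
    details: int

    def __post_init__(self) -> None:
        if self.integration not in INTEGRATION_LEVELS:
            raise ValueError("invalid integration rating")
        if self.macrostructures not in MACROSTRUCTURE_LEVELS:
            raise ValueError("invalid macrostructure rating")
        if self.details not in DETAIL_LEVELS:
            raise ValueError("invalid detail rating")

    def total(self) -> int:
        """Assumed composite: the plain sum of the three subscales."""
        return self.integration + self.macrostructures + self.details
```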
relevant details as support in the written synthesis. The first author and two research assistants spent 30 hours revising, norming the scoring rubric and developing a decision guide, resulting in an overall interrater reliability of .99 (coefficient alpha), with similarly high alphas for all subcategories.
• Basic Comprehension: The third construct was measured by multiple-choice tests related specifically to the texts used in the new tasks. These tests were created by TOEFL Test Development staff and followed current TOEFL reading section specifications. We used two multiple-choice tests, Basic Comprehension Test 1 (BC1) and Basic Comprehension Test 2 (BC2), 20 items each: one for the longer passage used to assess Reading to Learn and one for the two passages used to assess Reading to Integrate. Both were scored based on number of items answered correctly. Reliability on BC1, calculated based on 251 participants, was .84 (coefficient alpha). Inadvertently, the order of the texts used in BC2 was different for the two different media; however, reliability on both versions of the test was high. For those who took BC2 based on paper texts (n = 127), reliability was .84 (coefficient alpha); for those who took BC2 based on computerized texts (n = 124), reliability was .86 (coefficient alpha).
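Coefficient alpha, the reliability index reported for these measures, can be computed from an item-by-participant score matrix. The sketch below uses the standard formula (the number of items over that number minus one, times one minus the ratio of summed item variances to total-score variance); the function name and the tiny data set are hypothetical and are not the study's data.

```python
from statistics import variance


def coefficient_alpha(scores: list[list[float]]) -> float:
    """Cronbach's alpha; scores[p][i] is participant p's score on item i."""
    n_items = len(scores[0])
    if n_items < 2 or len(scores) < 2:
        raise ValueError("need at least two items and two participants")
    item_variances = [variance([row[i] for row in scores])
                      for i in range(n_items)]
    total_variance = variance([sum(row) for row in scores])
    if total_variance == 0:
        raise ValueError("total scores do not vary")
    return n_items / (n_items - 1) * (1 - sum(item_variances) / total_variance)


if __name__ == "__main__":
    # Hypothetical right/wrong (1/0) answers: four participants, three items.
    demo = [[1, 1, 0], [1, 0, 0], [1, 1, 1], [0, 0, 0]]
    print(round(coefficient_alpha(demo), 2))
```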
3 Design for data collection
This study used a 2 × 2 repeated measures design to examine performance on the new reading tasks. Native speaker undergraduates and nonnative speaker undergraduates were divided into two groups each, of equal ability as determined by performance on the baseline standardized measures of reading comprehension (Nelson–Denny or TOEFL). Half of each group read texts on paper; the other half read the same texts on a computer screen. A smaller group of nonnative speaker graduates, equally divided, was also included for a comparison between performance by graduate and undergraduate nonnative speakers. Additionally, the administration of the new measures was counterbalanced to control for any practice effect.
a Procedures: All participants met with the researchers in four sessions, each lasting about an hour. The first two sessions were devoted to administering the existing instruments. During Session 1, participants received an introduction to the study and took one of the two standardized basic reading comprehension measures (Nelson–Denny or TOEFL Reading Comprehension). Students completed the computer familiarity questionnaire and the Nelson–Denny Test at the same testing session because the Nelson–Denny was shorter than the TOEFL Reading Comprehension. During Session 2, participants took the other standardized basic reading comprehension measure.
Next, each participant group was subdivided into two subgroups for computer-based or paper reading of the texts for the new tasks. The subgroups were matched on their performance on initial reading measures: the Nelson–Denny was used for native speakers and the TOEFL Reading Comprehension was used for nonnative speakers. Independent t-tests run on these reading measures showed no significant difference in basic comprehension for the newly created subgroups assigned to each medium, ensuring that they were balanced for initial reading levels. Participants stayed in the same subgroups for the duration of the study. To ensure uniformity of response mode, all participants, whether they read the source texts on the computer or on paper, responded to the reading tasks using paper and pencil format.3
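The balance check described above compares the baseline reading scores of the two medium subgroups with an independent-samples t-test. A minimal sketch of the pooled-variance t statistic follows; the function name and the scores are hypothetical, and whether the study used the pooled or an unequal-variance form of the test is not stated, so the pooled form is an assumption.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(group_a: list[float], group_b: list[float]) -> tuple[float, int]:
    """Return the independent-samples t statistic and its degrees of freedom."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / df
    std_err = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / std_err, df


if __name__ == "__main__":
    # Hypothetical baseline scores for a paper and a computer subgroup.
    paper = [52, 47, 58, 50, 55]
    computer = [51, 49, 57, 48, 56]
    t_stat, df = pooled_t(paper, computer)
    print(f"t({df}) = {t_stat:.2f}")
```

A t value close to zero, relative to the critical value for the given degrees of freedom, indicates that the subgroups do not differ reliably in baseline reading level.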
3 Although responses could have been entered and perhaps scored by computer, this would have introduced factors not directly related to our research questions and remains an area for further study.

The last two sessions, each lasting approximately one hour, were dedicated to administration of the new measures. The Reading to Learn session took slightly longer to administer because administrative procedures were longer for this novel task. The new tasks were counterbalanced to control for practice effect; thus, half of the participants took the Reading to Learn measure first and half took the Reading to Integrate measure first. During Session 3, we administered the first new measure (for ease of discussion, Reading to Learn is discussed first) and BC1. At this session, students were given 12 minutes to read a 1200-word passage, either on computer or on paper. We limited the time allowed for reading based on 100 words per minute, thought to be ample (Grabe, personal communication, 1998). After examinees read the text, they were given 4 minutes to take notes on a half sheet of paper. Participants were instructed to take minimal notes due to the time constraints. Next, the text was removed and examinees were allowed 15 minutes to complete a chart based on the reading with the aid of their notes. After completing this Reading to Learn activity, participants were allowed to use the text and were given 15 minutes to answer BC1. Following these new testing sessions, 49 participants were selected for a related interview concerning the cognitive processes used in task completion (for further details, see Trites, 2000: Chapter 6).
During Session 4, students were given 12 minutes to read two short texts (600 words each), either on computer or paper. After participants read the assigned texts, they were given 4 minutes to take one-half page of notes (Enright et al., 1998). Next, the texts were removed and participants were asked to demonstrate Reading to Integrate by writing a synthesis of the texts with the aid of their notes (15 minutes allowed for this task). After completing the Reading to Integrate task, participants were allowed to see the texts again and answered BC2 (15 minutes allowed for this task). In one Reading to Integrate session, for unknown reasons, six of the seven participants read only one text. Because we cannot explain the cause of this anomalous session, we have eliminated scores from the session’s seven participants from subsequent analyses, thus slightly reducing the N size for the Reading to Integrate measure.
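The reading limits in Sessions 3 and 4 both follow from the stated rate of 100 words per minute, and the timed components of each session can be totalled the same way. The short sketch below only restates that arithmetic; the function and variable names are hypothetical, and untimed administrative steps are not included.

```python
WORDS_PER_MINUTE = 100  # reading rate used to set the time limits


def reading_minutes(word_counts: list[int],
                    words_per_minute: int = WORDS_PER_MINUTE) -> float:
    """Minutes allowed to read texts of the given lengths."""
    return sum(word_counts) / words_per_minute


# Timed components in minutes: reading, note-taking, new task, BC test.
SESSION_3 = {"reading": reading_minutes([1200]), "notes": 4, "chart": 15, "BC1": 15}
SESSION_4 = {"reading": reading_minutes([600, 600]), "notes": 4, "synthesis": 15, "BC2": 15}

if __name__ == "__main__":
    for name, session in (("Session 3", SESSION_3), ("Session 4", SESSION_4)):
        print(name, sum(session.values()), "timed minutes")
```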
b Variables used in study: The six independent variables included three nominal (Native Language Background, Medium of Text Presentation and Level of Education) and three interval variables (Nelson–Denny, TOEFL Reading Comprehension and Computer Familiarity). The four dependent variables were Reading to Learn, Reading to Integrate, BC1 and BC2.
IV Results
First, we present the descriptive statistics for all reading measures, followed by a systematic analysis of independent variables that might affect participant performance on the new measures. Scatterplots were checked for all reading measures to ensure normality of data. Kurtosis and skewness levels for all reading measures were found to be within normal limits, indicating a relatively normal distribution. Descriptive statistics for all existing measures are shown in Table 1. Means for these measures show a consistent pattern: the native speaker undergraduates had the highest mean, followed by the nonnative speaker graduates, followed by the nonnative speaker undergraduates. On the reading measures, Nelson–Denny and TOEFL Reading Comprehension, the nonnative speaker undergraduates showed the largest variance in performance, while on the computer familiarity measure the variance of both nonnative speaker groups was substantially larger than that of the native speakers.
The same pattern emerged for the means on the new measures (see Table 2) as for the existing measures. The native speaker undergraduate group performed better on all new measures than both of the nonnative speaker groups. The nonnative speaker graduate group performed better than the nonnative speaker undergraduate group on all measures as well. This robust pattern of performance was also found in the variance of three of the four new measures. On BC1 and BC2, the performance of the native speaker undergraduates showed the least amount of variance, followed by the nonnative speaker graduates, followed by the nonnative speaker undergraduates. On Reading to Integrate, the native speaker undergraduate group showed substantially less variance than the nonnative speaker groups; however, the variance of the two nonnative speaker groups was almost identical. On Reading to Learn, all three groups showed considerable variance.
Table 3 reveals the range of awarded points achieved by all participant groups. The nature of the Reading to Learn point system created a maximum possible point value (241) that no participant achieved. We speculate that there are at least three possible causes of the discrepancy between the theoretical maximum and the range of observed scores:

• task novelty: no participant reported ever doing such a task previously;
• time allowed for task completion; and
• space on the response sheet: space constraints may have limited the amount of information that participants could include.

Future research would need to address these issues. However, for the Reading to Integrate measure, the full range of possible point totals was achieved by at least one participant in each group.

Table 1 Descriptive statistics for existing measures for three participant groups

Group                     n      Mean       sd     kMax

Nelson–Denny
  NSU                    105    126.48    16.46    156
  NNSU                   106     67.24    31.91    156
  NNSG                    40     88.88    21.88    156
  Total participants     251     95.47    36.93    156

TOEFL Reading Comprehension
  NSU                    105     61.30     4.24     67
  NNSU                   106     50.30     8.53     67
  NNSG                    40     57.15     4.55     67
  Total participants     251     55.99     8.19     67

Computer familiarity
  NSU                    104     38.08     3.60     44
  NNSU                   104     34.82     5.99     44
  NNSG                    40     35.63     6.02     44
  Total participants     248     36.31     5.33     44

Note: kMax = number of items or maximum possible score.
1 Computer familiarity
The overall plan for the analyses was to check the influence of the independent variables on the dependent measures, with computer familiarity being addressed first. Initially, we had proposed that if computer familiarity was significantly different across groups, it would be entered into all calculations as a covariate. To determine this, it was necessary to conduct an Analysis of Variance (ANOVA) for computer familiarity across the six participant/medium subgroups.
Table 2 Descriptive statistics for new measures for three participant groups

Group                     n      Mean       sd     kMax

Reading to Learn (chart)
  NSU                    105     51.85    19.86    241
  NNSU                   106     31.73    19.50    241
  NNSG                    40     44.68    19.27    241
  Total participants     251     42.21    21.64    241

Basic Comprehension Test 1
  NSU                    105     16.98     2.47     20
  NNSU                   106     11.73     4.25     20
  NNSG                    40     14.98     3.50     20
  Total participants     251     14.44     4.23     20

Reading to Integrate (synthesis)
  NSU                    101     63.65    11.05     80
  NNSU                   103     37.24    21.76     80
  NNSG                    40     53.60    21.03     80
  Total participants     244     50.86    21.63     80

Basic Comprehension Test 2
  NSU                    105     15.91     2.78     20
  NNSU                   106      9.75     4.54     20
  NNSG                    40     12.85     3.61     20
  Total participants     251     12.82     4.72     20

Notes: kMax = number of items or maximum possible score; n size reduced for Reading to Integrate because of anomalous testing session.
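As a consistency check on Table 2, each "Total participants" mean should equal the n-weighted average of the three group means. The sketch below reads the table's means with two decimal places (an assumption about the printed values) and recomputes the totals; the dictionary and function names are hypothetical.

```python
# (n, mean) for NSU, NNSU and NNSG, plus the reported overall mean.
TABLE_2 = {
    "Reading to Learn": ([(105, 51.85), (106, 31.73), (40, 44.68)], 42.21),
    "Basic Comprehension Test 1": ([(105, 16.98), (106, 11.73), (40, 14.98)], 14.44),
    "Reading to Integrate": ([(101, 63.65), (103, 37.24), (40, 53.60)], 50.86),
    "Basic Comprehension Test 2": ([(105, 15.91), (106, 9.75), (40, 12.85)], 12.82),
}


def weighted_mean(groups: list[tuple[int, float]]) -> float:
    """Average of group means weighted by group size."""
    total_n = sum(n for n, _ in groups)
    return sum(n * m for n, m in groups) / total_n


if __name__ == "__main__":
    for measure, (groups, reported) in TABLE_2.items():
        print(f"{measure}: recomputed {weighted_mean(groups):.2f}, "
              f"reported {reported:.2f}")
```

The recomputed values agree with the reported totals to within rounding of the group means, which supports the two-decimal reading of the table.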
The resulting ANOVA (F = 4.70, p < .05) showed a significant difference between subgroups on the computer familiarity questionnaire; therefore, a post hoc Scheffé test was done to locate significant contrasts. After analysis of all possible subgroup contrasts, the post hoc Scheffé revealed that the only significant difference in subgroups appeared between the native speaker undergraduates and nonnative speaker undergraduates who read texts on paper. Hence, although there was one significant contrast, it occurred in two subgroups reading on paper, not in any of the subgroups who read on computer. All groups generally scored high on computer familiarity, although, as noted, variance of the nonnative groups was greater. It was thus established that computer familiarity had no significant effect on participants who read texts on computer, so we did not use computer familiarity as a covariate in further analyses and proceeded to the three research questions of central interest to this study.
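For readers unfamiliar with the procedure, a one-way ANOVA of the kind described above compares between-subgroup variation with within-subgroup variation. The following pure-Python sketch computes the F statistic and its degrees of freedom; the function name and the data are hypothetical and do not reproduce the study's computer familiarity scores.

```python
from statistics import mean


def one_way_anova(groups: list[list[float]]) -> tuple[float, int, int]:
    """Return (F, df_between, df_within) for independent groups."""
    all_scores = [x for group in groups for x in group]
    grand_mean = mean(all_scores)
    # Between-groups sum of squares: group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-groups sum of squares: scores around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


if __name__ == "__main__":
    # Hypothetical scores for three subgroups.
    f_stat, df_b, df_w = one_way_anova([[1, 2, 3], [2, 3, 4], [3, 4, 5]])
    print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```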
Because both Research Questions 1 and 2 are similar – except that they address the two different new reading measures, Reading to Learn and Reading to Integrate – we approached them in the same manner, through ANOVA, to identify the independent variables that could have significantly affected the results on the new measures.
2 Research Question 1
Table 3 Range of scores for new measures for three participant groups

Group                     n     Minimum   Maximum   kMax

Reading to Learn (chart)
  NSU                    105       14       120      241
  NNSU                   106        0        86      241
  NNSG                    40        3        94      241
  Total participants     251        0       120      241

Reading to Integrate (synthesis)
  NSU                    101       38        80       80
  NNSU                   103        0        80       80
  NNSG                    40        5        80       80
  Total participants     244        0        80       80

Notes: kMax = number of items or maximum possible score; n size reduced for Reading to Integrate because of anomalous testing session.

The first research question asked if performance on a measure of Reading to Learn was affected by medium of presentation, computer familiarity, native language or level of education. We calculated a univariate ANOVA with Type III sums of squares on Reading to Learn, with group status, medium of text presentation and test order as possible contributing factors.4 Table 4 shows that there were no significant interactions for any of the group, medium or test order combinations. The only significant main effect was group membership.
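The F ratios in Table 4 can be recomputed from its sums of squares and degrees of freedom, since each F is the effect's mean square divided by the error mean square. The sketch below reads the table's entries with two decimal places (an assumption about the printed values); the dictionary and function names are hypothetical.

```python
# effect: (Type III sum of squares, df, reported F), read from Table 4.
TABLE_4 = {
    "Group": (21869.29, 2, 28.10),
    "Medium": (1215.86, 1, 3.13),
    "Test order": (294.98, 1, 0.76),
    "Group x medium": (334.81, 2, 0.43),
    "Group x test order": (4.37, 2, 0.01),
    "Medium x test order": (391.73, 1, 1.01),
    "Group x medium x test order": (575.29, 2, 0.74),
}
ERROR_SS, ERROR_DF = 92997.45, 239


def f_ratio(sum_of_squares: float, df: int) -> float:
    """Effect mean square divided by the error mean square."""
    return (sum_of_squares / df) / (ERROR_SS / ERROR_DF)


if __name__ == "__main__":
    for effect, (ss, df, reported) in TABLE_4.items():
        print(f"{effect}: recomputed F = {f_ratio(ss, df):.2f}, "
              f"reported F = {reported:.2f}")
```

The recomputed ratios agree with the reported values to within rounding, and only the Group effect is large relative to the others.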
Because group membership was a combined measure that included both native language background and level of education, post hoc analysis was needed to identify the significant contrasts. Table 5 shows that there was a significant difference in performance on the Reading to Learn measure between the native speaker undergraduate and the nonnative speaker undergraduate groups, as well as a significant difference between the nonnative speaker undergraduate and nonnative speaker graduate groups. There was no significant difference in performance between the native speaker undergraduate and the nonnative speaker graduate groups. Therefore, the answer to Research Question 1 is that native language background and level of education did have a significant effect on performance on the Reading to Learn measure, but that medium of text presentation did not. Further, order of testing, whether participants took Reading to Learn or Reading to Integrate first, had no significant effect.
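The pattern of post hoc contrasts described above can be illustrated with the Scheffé criterion, under which a pairwise contrast is significant when its squared difference, divided by its squared standard error and by the number of groups minus one, exceeds the critical F. The sketch below is an illustration under stated assumptions, not the authors' computation: group sizes and Reading to Learn means are read from Table 2, the error mean square (389.11) from Table 4, and the critical value of about 3.03 for F(2, 239) at the .05 level is an approximate table value supplied here. All names are hypothetical.

```python
from math import sqrt

MS_ERROR = 389.11   # error mean square, read from Table 4
K_GROUPS = 3        # NSU, NNSU, NNSG
F_CRITICAL = 3.03   # approximate F(2, 239) at .05; an assumed table value

GROUP_N = {"NSU": 105, "NNSU": 106, "NNSG": 40}
GROUP_MEAN = {"NSU": 51.85, "NNSU": 31.73, "NNSG": 44.68}


def scheffe_contrast(group_a: str, group_b: str) -> tuple[float, float, bool]:
    """Return (mean difference, standard error, significant?) for a pair."""
    difference = GROUP_MEAN[group_a] - GROUP_MEAN[group_b]
    std_err = sqrt(MS_ERROR * (1 / GROUP_N[group_a] + 1 / GROUP_N[group_b]))
    f_contrast = (difference / std_err) ** 2 / (K_GROUPS - 1)
    return difference, std_err, f_contrast > F_CRITICAL


if __name__ == "__main__":
    for pair in (("NSU", "NNSU"), ("NSU", "NNSG"), ("NNSU", "NNSG")):
        diff, se, significant = scheffe_contrast(*pair)
        print(f"{pair[0]} vs {pair[1]}: difference {diff:.2f}, "
              f"SE {se:.2f}, significant: {significant}")
```

Under these assumptions the sketch yields the same pattern as the text: the two contrasts involving the nonnative speaker undergraduates are significant, and the contrast between native speaker undergraduates and nonnative speaker graduates is not.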
3 Research Question 2
The second research question, related to the first, asked if performance on Reading to Integrate was affected by medium of presentation,
Table 4 Performance on Reading to Learn measure by groups, medium, and test order (n = 251) (univariate analysis of variance)

Source                          Type III sum of squares   df    Mean square   F
Group                           21869.29                    2    10934.64      28.10*
Medium                           1215.86                    1     1215.86       3.13
Test order                        294.98                    1      294.98       0.76
Group × medium                    334.81                    2      167.40       0.43
Group × test order                  4.37                    2        2.19       0.01
Medium × test order               391.73                    1      391.73       1.01
Group × medium × test order       575.29                    2      287.65       0.74
Error                           92997.45                  239      389.11

Note: *p < .05
4 Test order was added as an additional variable to double-check that our counterbalancing had been effective in controlling for any practice effect.
computer familiarity, native language, or level of education. Again, to ensure that counterbalancing of tests controlled for any practice effect, test order was added as an additional variable.
To answer this question, we calculated a univariate ANOVA on the Reading to Integrate measure, with group status, medium of text presentation, and test order entered as possible contributing factors. The results (Table 6) show, as for Research Question 1, that there were no significant interactions for any of the group, medium, or test order combinations; the only significant main effect was group membership. The answer for Research Question 2 is that native language background and educational level had a significant effect on Reading to Integrate, but medium of text presentation did not. Post hoc analysis of group contrasts showed that all three groups were distinct in their performance on Reading to Integrate (see Table 7).
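The post hoc contrasts for Reading to Integrate can be sketched in the same spirit. The example below assumes the textbook form of the Scheffé test, in which a pairwise contrast is significant when (difference / standard error)² exceeds (k − 1) times the critical F, with the standard error taken as sqrt(MSE × (1/n1 + 1/n2)). The error mean square and group sizes come from Tables 6 and 7; the critical value `F_CRIT = 3.03` is an approximate tabled F(.05; 2, 232) supplied here as an assumption, and all names are hypothetical.

```python
import math

MSE = 325.13        # error mean square from Table 6
K_GROUPS = 3        # NSU, NNSU, NNSG
F_CRIT = 3.03       # assumed approximate critical F(.05; 2, 232)

N = {"NSU": 101, "NNSU": 103, "NNSG": 40}
MEAN_DIFF = {       # mean differences as read from Table 7
    ("NSU", "NNSU"): 26.41,
    ("NSU", "NNSG"): 10.05,
    ("NNSU", "NNSG"): 16.36,
}


def standard_error(n1, n2, mse=MSE):
    """Standard error of a difference between two group means."""
    return math.sqrt(mse * (1 / n1 + 1 / n2))


def scheffe_significant(diff, se, k=K_GROUPS, f_crit=F_CRIT):
    """Scheffe criterion for a pairwise contrast."""
    return (diff / se) ** 2 > (k - 1) * f_crit


results = {}
for (g1, g2), diff in MEAN_DIFF.items():
    se = standard_error(N[g1], N[g2])
    results[(g1, g2)] = (se, scheffe_significant(diff, se))
```

With these inputs the computed standard errors land close to the tabled ones, and all three contrasts pass the criterion, consistent with the statement that the three groups were distinct.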
4 Research Question 3
The third research question asked to what extent measures of basic comprehension, Reading to Learn, and Reading to Integrate were
Table 5 Post hoc Scheffé for Reading to Learn measure (n = 251)

Group    n      Group    n      Mean difference   Standard error
NSU      105    NNSU     106    20.12*            2.72
                NNSG      40     7.17             3.67
NNSU     106    NNSG      40    12.95*            3.66

Note: *p < .05
Table 6 Performance on Reading to Integrate measure by groups, medium, and test order (n = 244^a) (univariate analysis of variance)

Source                          Type III sum of squares   df    Mean square   F
Group                           36294.33                    2    18147.17      55.82^b
Medium                            192.95                    1      192.95       0.59
Test order                         97.83                    1       97.83       0.30
Group × medium                     11.82                    2        5.91       0.02
Group × test order                 30.14                    2       15.07       0.05
Medium × test order              1098.72                    1     1098.72       3.38
Group × medium × test order      1488.58                    2      744.29       2.29
Error                           75430.37                  232      325.13

Notes: ^a n size reduced for Reading to Integrate because of anomalous testing session; ^b p < .05
related. We used correlational analysis as the first step in answering this question. Results for the total participant population (see Table 8) showed moderate to high correlations across all reading measures. However, the analyses done for Research Questions 1 and 2 revealed that group status had a significant effect on performance on Reading to Learn and Reading to Integrate. Further, we realize that correlations are sensitive to variance, so the high correlations seen in the total population could have been an artifact of combining the three groups. Therefore, we examined the correlations among all reading measures for each group (available in Trites, 2000: Appendix 1, pp. 230–33). While the reading measures were still correlated, often moderately, sometimes highly, magnitudes differed and sometimes dropped substantially. The text-specific multiple-choice measures, BC1 and BC2, consistently correlated more highly with the Nelson–Denny and TOEFL Reading Comprehension tests than with Reading to Learn and Reading to Integrate based on the same texts, suggesting a test method or construct effect. Because comparisons between different measures of basic comprehension were not a goal of the project, BC1 and BC2 were not used in further analyses. We conclude that, as expected, all reading measures were related, but the lower correlations between Reading to Learn and Reading to Integrate and the traditional basic comprehension measures led us to consider further types of analysis to identify the possible distinctiveness of the new measures.
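The concern that pooled correlations can be an artifact of combining groups is easy to illustrate with a toy case. The sketch below uses entirely hypothetical scores (not data from this study): within each group the two measures are uncorrelated, yet pooling two groups with very different means produces a correlation near 1. The helper `pearson_r` is written out by hand so the example needs only the standard library.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical scores on two measures for a low-scoring and a high-scoring group.
low_a, low_b = [0, 1, 0, 1], [0, 0, 1, 1]
high_a, high_b = [10, 11, 10, 11], [10, 10, 11, 11]

r_low = pearson_r(low_a, low_b)        # within-group correlation
r_high = pearson_r(high_a, high_b)     # within-group correlation
r_pooled = pearson_r(low_a + high_a, low_b + high_b)

print(r_low, r_high, round(r_pooled, 3))
```

The pooled coefficient here reflects only the gap between group means, which is why per-group correlations are the more informative check when groups differ as sharply as they did on the new measures.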
5 Discriminant analysis
Because we were interested in determining how constructs differed, we sought additional analyses to help us better characterize the new constructs. Of the several possible statistical methods that could have been employed, two are most plausible: multivariate analysis of variance, usually associated with experimental research,
Table 7 Post hoc Scheffé for Reading to Integrate measure (n = 244^a)

Group    n      Group    n      Mean difference   Standard error
NSU      101    NNSU     103    26.41^b           2.53
                NNSG      40    10.05^b           3.37
NNSU     103    NNSG      40    16.36^b           3.36

Notes: ^a n size reduced for Reading to Integrate because of anomalous testing session; ^b p < .05
Table 8 Correlations for all reading measures for all participants (n = 251)

                               TOEFL Reading   Basic           Basic           Reading    Reading to
                               Comprehension   Comprehension   Comprehension   to Learn   Integrate^a
                                               Test 1          Test 2
Nelson–Denny                   .90^b           .85^b           .84^b           .66^b      .69^b
TOEFL Reading Comprehension    1.00            .85^b           .84^b           .64^b      .69^b
Basic Comprehension 1                          1.00            .84^b           .68^b      .68^b
Basic Comprehension 2                                          1.00            .68^b      .70^b
Reading to Learn                                                               1.00       .59^b

Notes: ^a n size reduced for Reading to Integrate because of anomalous testing session; ^b p < .05
and discriminant analysis, usually associated with descriptive research (Tabachnick and Fidell, 1996). The present research was conducted with samples of naturally occurring student groups and was not experimental. Moreover, we were interested in finding ways to compare participant performance on the measures of the new constructs, Reading to Learn and Reading to Integrate, with performance on more traditional measures of basic comprehension. Thus, we opted to use discriminant analysis because of its parsimony of description and clarity of interpretation (Stevens, 1996). Discriminant analysis, a technique recommended to describe group differences or predict group membership based on a comparison of multiple predictors (Huberty, 1994), has been used in other areas of applied linguistic research to investigate creation of a student profile of success or failure on Computer Assisted Language Learning (CALL) lessons (Jamieson et al., 1993) and accurate classification of text types into registers (Biber, 1993), among other purposes.
To further distinguish basic comprehension from the new constructs, we conducted discriminant analysis on each of the two language groups (native and nonnative) to determine whether Reading to Learn and Reading to Integrate would classify participants in the same way that Basic Comprehension would. We divided the native speaker and nonnative speaker groups into three levels (high, middle (mid), and low reading ability scorers) based on the basic comprehension measure chosen for that group (Nelson–Denny for native speakers, TOEFL Reading Comprehension for nonnative speakers). Research methodologists (Tabachnick and Fidell, 1996: 513) note that robustness is expected when the smallest group has at least 20 cases; our smallest group was 25. To check the assumption of homogeneity of the variance/covariance matrices, we examined the outcomes of Box's M Test and found them all nonsignificant (Klecka, 1980). Each group was checked for outliers using Mahalanobis distance treated as Chi-Square, and no outliers were found (Tabachnick and Fidell, 1996). Thus, the data met all assumptions required for use of discriminant analysis.
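The outlier screen described above can be sketched as follows. This is a minimal illustration, not the authors' SPSS run: it assumes two predictors (so the squared Mahalanobis distance is referred to a chi-square distribution with 2 degrees of freedom) and an alpha of .001, a common convention that the text does not state. The scores are hypothetical, and `mahalanobis_sq` and `chi2_critical_df2` are illustrative names.

```python
import math


def mahalanobis_sq(points):
    """Squared Mahalanobis distance of each (x, y) pair from the sample centroid."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    # Sample variances and covariance (n - 1 denominator).
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    # Quadratic form with the inverse of the 2 x 2 covariance matrix.
    return [
        (syy * (x - mx) ** 2 - 2 * sxy * (x - mx) * (y - my) + sxx * (y - my) ** 2) / det
        for x, y in points
    ]


def chi2_critical_df2(alpha):
    """Upper-tail critical value of chi-square with 2 df (closed form: -2 ln alpha)."""
    return -2 * math.log(alpha)


# Hypothetical (Reading to Learn, Reading to Integrate) score pairs.
scores = [(52, 40), (60, 45), (71, 50), (48, 30), (90, 62), (65, 41)]
cutoff = chi2_critical_df2(0.001)
d2 = mahalanobis_sq(scores)
outliers = [pair for pair, d in zip(scores, d2) if d > cutoff]
```

A case would be flagged only if its squared distance exceeded the cutoff; with more than two predictors the closed-form critical value no longer applies and a chi-square table or statistics library would be needed.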
a Discriminant analysis for nonnative speakers: To organize the discriminant analysis in order to see if the Reading to Learn/Reading to Integrate Composite classified participants similarly to the measure of Basic Comprehension (for nonnative speakers, TOEFL Reading Comprehension), we divided the entire nonnative speaker group (n = 146) into three levels of basic comprehension: high, mid,
and low. These reading ability groups of high, mid, and low were based on typical TOEFL Reading Comprehension score levels required for program entry. Participants were classified as high if their scores were greater than or equal to 550, or 56 and above on the scaled score on the TOEFL Reading Comprehension (550 is the cut score often used for graduate entry). Participants were classified as mid if scores ranged between 500 and 549, or 50 to 55 on the TOEFL Reading Comprehension. Participants were classified as low if their scores fell below 500, or 49 and below on TOEFL Reading Comprehension (500 is a minimum TOEFL score sometimes used for undergraduate admission, often with the proviso that students enroll in ESL classes either prior to official enrollment or concurrently). For our entire nonnative speaker group, descriptive statistics on basic comprehension reading ability group membership levels appear in Table 9.
We ran SPSS Discriminant Analysis with initial grouping variables of high, mid, and low reading ability. We compared initial reading ability levels with high, mid, and low categories on the Reading to Learn/Reading to Integrate Composite, a new variable reflecting level of performance on the Reading to Learn and Reading to Integrate measures combined (footnote 5). The discriminant analysis yielded one discriminant function with an eigenvalue of .92, responsible for 99.9% of the variance in outcomes. Wilks' Lambda for this function was .52, significant at .001; the associated Chi-Square value was extremely large (90.93) and highly significant (p < .001), indicating the group centroids on the composite Reading to Learn/Reading to Integrate function for the three nonnative speaker reading ability groups were significantly different. Both Reading to Learn and
Table 9 Descriptive statistics for nonnative speakers by TOEFL reading comprehension reading ability groups (n = 143)

Reading ability group    n      Mean on TOEFL reading comprehension   sd
High (≥ 56)               57    59.68                                 3.03
Mid (50–55)               42    52.81                                 1.67
Low (≤ 49)                44    42.11                                 5.47
Total                    143    52.26                                 8.22

Note: Total n = 143 due to loss of three cases in anomalous Reading to Integrate session.
5 We first calculated two separate discriminant analyses, one for Reading to Learn and one for Reading to Integrate, but we found that both loaded on a single function, so we used the composite in subsequent analyses.
Reading to Integrate loaded significantly on the discriminant function, at .86 for Reading to Learn and .78 for Reading to Integrate (p < .05).
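The reported eigenvalue, Wilks' Lambda, and Chi-Square for the nonnative speakers are linked by standard relationships, which the sketch below reproduces approximately. It assumes the textbook identities: Lambda is the product of 1 / (1 + eigenvalue) over the discriminant functions, and Bartlett's approximation gives chi-square = −[N − 1 − (p + g) / 2] ln(Lambda) for N cases, p predictors, and g groups. Treating the second function's eigenvalue as negligible is an assumption, and the function names are hypothetical.

```python
import math


def wilks_lambda(eigenvalues):
    """Wilks' Lambda as the product of 1 / (1 + eigenvalue) over all functions."""
    result = 1.0
    for ev in eigenvalues:
        result *= 1.0 / (1.0 + ev)
    return result


def bartlett_chi_square(n_cases, n_predictors, n_groups, lam):
    """Bartlett's chi-square approximation for a Wilks' Lambda value."""
    return -(n_cases - 1 - (n_predictors + n_groups) / 2) * math.log(lam)


# Nonnative speakers: first eigenvalue .92; the second is assumed negligible.
lam_nns = wilks_lambda([0.92])
chi_nns = bartlett_chi_square(n_cases=143, n_predictors=2, n_groups=3, lam=lam_nns)

print(round(lam_nns, 2), round(chi_nns, 1))
```

Under these assumptions the results fall close to the reported Lambda of .52 and Chi-Square of 90.93; small differences are expected because the published eigenvalue is rounded.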
Over half (64.9%) of the high reading ability group remained high on the new measure; less than half (42.9%) of the mid reading ability group remained classified as mid. Hence, the Reading to Learn/Reading to Integrate Composite was particularly influential in reclassifying the mid reading ability group and, to a lesser extent, the high group. However, most (81.8%) of the low reading ability group remained low on the Reading to Learn/Reading to Integrate Composite (see Table 10). Of the 143 nonnative speaker participants, 91 (64%) remained in the initial basic comprehension category on the composite; the rest moved, but in different directions. Twenty-one participants (14.7%) were classified into a higher category on the Reading to Learn/Reading to Integrate Composite than their initial basic comprehension level would have suggested, while 31 (21.7%) were reclassified into a lower category. Thus, 52 participants (36.4%), just over one third of the sample, were classified differently based on their Reading to Learn/Reading to Integrate Composite performance.
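The reclassification totals in this paragraph can be derived directly from the cell counts in Table 10. The sketch below tallies how many participants stayed in their initial category, moved to a higher one, or moved to a lower one; the matrix holds the Table 10 counts, while `TABLE10` and `reclassification_summary` are illustrative names.

```python
# Rows: initial TOEFL reading ability group (high, mid, low).
# Columns: predicted group on the composite (high, mid, low), from Table 10.
TABLE10 = [
    [37, 17, 3],    # initial high
    [13, 18, 11],   # initial mid
    [2, 6, 36],     # initial low
]


def reclassification_summary(matrix):
    """Count cases that kept, raised, or lowered their category."""
    size = len(matrix)
    total = sum(sum(row) for row in matrix)
    same = sum(matrix[i][i] for i in range(size))
    # A column index smaller than the row index means a higher predicted category.
    higher = sum(matrix[i][j] for i in range(size) for j in range(i))
    lower = total - same - higher
    return {"total": total, "same": same, "higher": higher, "lower": lower}


summary = reclassification_summary(TABLE10)
percent = {
    key: round(100 * value / summary["total"], 1)
    for key, value in summary.items()
    if key != "total"
}
print(summary, percent)
```

The same function applies unchanged to the native speaker counts in Table 12, which makes the two patterns of movement easy to compare.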
b Discriminant analysis for native speakers: Because one of our goals in this project was to probe the possible validity of these new measures by assessing performance of two groups, native as well as
Table 10 Discriminant analysis: comparison of nonnative speaker reading ability groups with Reading to Learn/Reading to Integrate Composite (n = 143)

                         Predicted group membership for Reading to
                         Learn/Reading to Integrate Composite         Initial classification
Reading ability group    High        Mid         Low                  total
Count
  High (≥ 56)            37          17           3                    57
  Mid (50–55)            13          18          11                    42
  Low (≤ 49)              2           6          36                    44
  Reclassification total 52          41          50                   143
Percentage
  High (≥ 56)            64.9        29.8         5.3                 100.0
  Mid (50–55)            31.0        42.9        26.2                 100.0
  Low (≤ 49)              4.5        13.6        81.8                 100.0

Note: Total n = 143 due to loss of three cases in anomalous Reading to Integrate session.
nonnative speakers, we conducted a parallel discriminant analysis for native speakers. Thus, for the native speakers, we followed the same procedure, dividing the entire native speaker group (n = 105) into three levels of basic comprehension (high, mid, and low) based on score distances of .5 standard deviations from the sample mean of the Nelson–Denny test for these participants. Native speakers need not take reading comprehension tests when entering the university, so the three-way split was based entirely on our sample data. Participants were classified as high if their scores on the Nelson–Denny were greater than or equal to 135. Participants were classified as mid if their scores ranged between 119 and 134 on the Nelson–Denny. Participants were classified as low if their scores fell at or below 118 on the Nelson–Denny. Descriptive statistics for the entire native speaker sample on basic comprehension group membership appear in Table 11.
The discriminant analysis yielded one discriminant function with an eigenvalue of .31, responsible for 98.7% of the variance in outcomes. Wilks' Lambda for this function was .76, significant at .001; the associated Chi-Square value was large (26.39) and highly significant (p < .001), indicating the group centroids on the discriminant function for the three reading ability groups on the Reading to Learn/Reading to Integrate Composite were significantly different. Pooled within-groups correlations between discriminating variables showed that Reading to Learn correlated with the first discriminant function at a level of .81; Reading to Integrate correlated with the second discriminant function at .71. This contrasts with findings for the nonnative speakers, where scores on the combined new measures loaded significantly on only one discriminant function. For native speakers, then, there is evidence for two significant discriminant functions, although the first accounts for almost all of the variance. Although these two measures (Reading to Learn and Reading to Integrate) loaded on two separate discriminant functions, they still
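The half-standard-deviation split can be sketched as follows. The example assumes the summary statistics reported for the discriminant sample in Table 11 (mean 126.04, sd 16.52, n = 101); the authors based their cut scores on their own sample data, so the bands computed here are only approximately the published ones. The function names are hypothetical.

```python
def half_sd_cut_points(mean, sd, width=0.5):
    """Return (low_cut, high_cut) placed `width` standard deviations from the mean."""
    return mean - width * sd, mean + width * sd


def classify(score, low_cut, high_cut):
    """Assign a reading ability level from a score and two cut points."""
    if score > high_cut:
        return "high"
    if score < low_cut:
        return "low"
    return "mid"


# Assumed inputs: Nelson-Denny mean and sd for the n = 101 discriminant sample.
low_cut, high_cut = half_sd_cut_points(mean=126.04, sd=16.52)

print(round(low_cut, 2), round(high_cut, 2))
print(classify(140, low_cut, high_cut), classify(125, low_cut, high_cut), classify(100, low_cut, high_cut))
```

The computed cut points sit near the published boundaries of 118/119 and 134/135, though a score of exactly 118 falls on different sides under the two versions, so the published integer cut scores should be preferred when replicating the grouping.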
Table 11 Descriptive statistics for native speakers by Nelson–Denny reading ability groups (n = 101)

Reading ability group    n      Mean on Nelson–Denny   sd
High (≥ 135)              39    141.26                  5.19
Mid (119–134)             37    125.65                  5.09
Low (≤ 118)               25    102.88                 10.98
Total                    101    126.04                 16.52

Note: Total n = 101 for discriminant analysis due to loss of four cases in anomalous Reading to Integrate session.
showed moderate correlations with the alternate function (footnote 6), justifying the composite calculations.
Results of discriminant analysis for native speakers, seen in Table 12, show a different pattern than that observed for nonnative speakers. Nearly three-fourths (71.8%) of the native speakers classified as high in the basic comprehension reading ability group remained high on the Reading to Learn/Reading to Integrate Composite. For the mid group, however, only 18.9% remained classified as mid. Just over half (52%) of the low reading ability group members remained low on the Reading to Learn/Reading to Integrate Composite. As with the nonnative speakers, participants in the mid category on basic comprehension showed the most frequent reclassification. Forty-eight of the 101 (47.5%) native speaker participants remained in the initial classification categories. Twenty-two (21.8%) were reclassified into a higher category and 31 (30.7%) were reclassified into a lower category. Thus, 53 participants (52.5%), over half of the sample, were classified differently based on their performance on the Reading to Learn/Reading to Integrate Composite.
6 Reading to Learn correlated with function 2 at -.58; Reading to Integrate with function 1 at .71.
Table 12 Discriminant analysis: comparison of native speaker reading ability groups with Reading to Learn/Reading to Integrate Composite (n = 101)

                         Predicted group membership for Reading to
                         Learn/Reading to Integrate Composite         Initial classification
Reading ability group    High        Mid         Low                  total
Count
  High (≥ 135)           28           3           8                    39
  Mid (119–134)          10           7          20                    37
  Low (≤ 118)             5           7          13                    25
  Reclassification total 43          17          41                   101
Percentage
  High (≥ 135)           71.8         7.7        20.5                 100.0
  Mid (119–134)          27.0        18.9        54.1                 100.0
  Low (≤ 118)            20.0        28.0        52.0                 100.0

Note: Total n = 101 for discriminant analysis due to loss of four cases in anomalous Reading to Integrate session.
V Interpretation
Analyses done to answer Research Questions 1 and 2 showed that performance on Reading to Learn and Reading to Integrate measures was significantly influenced by language background and level of education (graduate vs. undergraduate, for nonnative speakers only). Moreover, level of computer familiarity had no significant effect on Reading to Learn and Reading to Integrate performance.
Correlations showed that, as expected, all reading measures correlated to some degree, generally answering Research Question 3. The most interesting results came from the discriminant analyses, because they showed a pattern of differential classification on the new tasks, sensitive to initial reading ability. This pattern differed for nonnative speakers and native speakers. Examination of the reclassifications based on the new tasks revealed some of the problems of using a basic comprehension-only test to predict performance on more challenging literacy tasks. These results also imply the need for further work on new measures of advanced literacy skills, such as Reading to Learn and Reading to Integrate, to reflect trends in construct-driven assessment (Pellegrino et al., 1999).
For the nonnative speakers, most (81.8%) of the participants with TOEFL Reading Comprehension below 50 remained classified as low on the Reading to Learn/Reading to Integrate Composite, suggesting the existence of a lower threshold of academic English proficiency. Participants below this threshold were unlikely to perform well on tasks assessing Reading to Learn and Reading to Integrate. Even tasks involving only selection (such as BC1 and BC2), rather than production of responses, were difficult for this group. However, for the nonnative speaker readers in the mid and high reading ability groups, basic comprehension level was not nearly as consistent a predictor of performance on the Reading to Learn/Reading to Integrate Composite. This was especially striking for those in the mid reading ability group. Approximately one fourth of the mid group dropped into the low category on the Reading to Learn/Reading to Integrate Composite, indicating that they experienced more difficulty in completing the new measures, but approximately one fourth could indeed categorize and synthesize information relatively well, despite middling performance on basic comprehension. This finding suggests that, for nonnative speaker readers in the mid reading ability category, basic comprehension measures were insufficient to predict their performance on the new measures.
Almost two-thirds of the high basic comprehension nonnative speakers remained high on the Reading to Learn/Reading to Integrate Composite, but one third dropped, suggesting that they could not categorize and synthesize information from texts as well as they could recognize information. For this third, the recognition-only task, multiple-choice basic comprehension, overestimated ability to manipulate and synthesize information. For the high and mid reading ability nonnative speaker groups, basic comprehension-only tests may then overestimate academic English proficiency.
Results for the native speakers showed a more definitive pattern of reclassification. For the low reading ability group, approximately half were misclassified, suggesting that the basic comprehension measure underrepresented their ability to categorize and synthesize information in academic texts. Most of the mid reading ability group were reclassified as lower on the Reading to Learn/Reading to Integrate Composite, showing that basic comprehension results overestimated ability to succeed on more challenging tasks. On the other hand, almost one third of the mid reading ability group did better on the Reading to Learn/Reading to Integrate Composite; for them, basic comprehension underrepresented their ability to complete more challenging reading tasks. For the high ability readers, nearly one third dropped on the Reading to Learn/Reading to Integrate Composite. For them, the basic comprehension measure overestimated their level of academic English proficiency. The other two thirds of the high reading ability group remained high, suggesting a higher threshold of academic English proficiency.
Considering results from both the nonnative speakers and native speakers, we conclude that the new tasks did assess something different from basic comprehension once a lower level threshold of basic academic English proficiency had been achieved. Examination of scatterplots based on discriminant functions indicated an obvious separation between participants who could and could not perform on the Reading to Learn/Reading to Integrate Composite, thus providing some tentative evidence for concurrent validity (Messick, 1989; Chapelle, 1999). We had hoped to find clear evidence of a hierarchy, suggesting that Reading to Learn was demonstrably more difficult than basic comprehension and Reading to Integrate demonstrably more difficult than Reading to Learn, but results did not yield an obvious hierarchy. Results did, however, suggest an even simpler pattern: a dichotomy. For those nonnative speakers above the lower threshold of English language proficiency, which in our data would be approximately 500, or 49 on the scaled TOEFL Reading
Comprehension scores, some could perform well on the new measures; some could not, revealing two rather than three groups on the Reading to Learn/Reading to Integrate Composite. These results lead us to speculate that the new measures tap additional skills, such as 'sophisticated discourse processes and critical thinking skills' (Enright and Schedl, 1999: 24), in addition to language proficiency. For native speakers, the discriminant loading suggested a possible hierarchy of difficulty, but the scatterplots and classification tables revealed a dichotomy. Reclassification based on the Reading to Learn/Reading to Integrate Composite nearly eliminated the mid basic comprehension reading ability category, with only 16.8% of the native speakers qualifying as mid on the Reading to Learn/Reading to Integrate Composite.
VI Conclusions
While we acknowledge that there are some limitations to this project, such as the time required to administer and score new measures, the results are illuminating and useful, and they suggest some considerations for future research. The first consideration relates to appropriate classification of students based on academic reading abilities. This corresponds to the use of TOEFL as a gatekeeper by many institutions. For TOEFL test-takers whose current TOEFL scores would be in an intermediate to high intermediate range, the new tasks could assess additional abilities relevant to academic performance. For such participants, additional test development efforts are warranted. The majority of high reading ability nonnative speakers (64.9%) could perform the new measures; a third could not. Our results suggest that some unknown number of mid reading ability nonnative speakers are more capable of succeeding at more challenging academic reading tasks than their current level of basic comprehension assessment would indicate. At the same time, there are some high and mid reading ability participants who could not perform the more demanding tasks; admitting such students directly into university programs could result in failure. These results suggest that for most students at lower levels of basic comprehension (in our study, those with TOEFL scores below 490 to 500), development of new tasks is unnecessary. Nevertheless, our results indicate that a certain small percentage (in our sample, 8 students, 18.2%) of nonnative speakers classified as low reading ability on basic comprehension did in fact
perform adequately on more challenging tasks like Reading to Learn and Reading to Integrate. Given the very large number of TOEFL test-takers around the world, this finding merits further research. It would be potentially unfair to exclude such students from university study based on basic comprehension-only measures such as the current TOEFL. We would urge institutions that use TOEFL to consider scores on these new, more demanding tasks if doing so would address institutional needs. According to interview data collected in conjunction with this project (Trites, 2000: Chapter 6), participants perceived that successful completion of the new tasks required a thorough understanding of the texts, whereas the multiple-choice tests were less demanding, requiring only superficial grasp of content. This finding corroborates the observation of Freedle and Kostin (1999: 3), who note that often examinees do not need to comprehend the accompanying text to answer the test item. Therefore, for all TOEFL test-takers, incorporating more challenging tasks such as the Reading to Learn and Reading to Integrate measures into typical English language instruction could have positive washback effects.
The second area of consideration is the focus on additional relevant research. For predictive validity, it is important to determine the correlation between the Reading to Learn/Reading to Integrate Composite measures and actual academic performance in university classes. Correlations might differ depending on the degree of categorization and synthesis required by different major fields of study. Another area for future research is related to the novelty of the Reading to Learn task as an assessment technique. Current pedagogical trends in literacy instruction emphasize the use of graphic organizers as a means to understand and manipulate information in texts. To our knowledge, graphic organizers are typically used as classroom activities rather than assessment tools. They have good potential for use in assessment if students are familiar with them and if appropriate scoring systems can be developed. This project demonstrates that it is possible, though labor intensive, to develop reliable scoring systems. Further research is needed to explore refinement of this task type and related scoring systems for use in testing programs with large numbers of test-takers and scorers. Although the Reading to Integrate task (generating a synthesis) was more familiar, the scoring system was innovative because it reflected a reader's ability to recognize textual frames as well as integrate information. If test developers are interested in the abilities of nonnative speaker readers to perform such tasks, further development of similar tasks and scoring systems is warranted. While a substantial investment of time
would be required to refine the administration and scoring systems needed for more complex tasks such as these, it should be weighed against the possible danger of under-representing the likelihood of academic success based on results of the basic comprehension-only measures still most often used to assess academic proficiency.
Acknowledgements
This project was funded by a grant from Educational Testing Service as part of the TOEFL 2000 effort. We appreciate the cooperation of the ETS staff members who assisted in the development of passage-specific tasks and other aspects of the research; however, no official endorsement of Educational Testing Service should be inferred.
VII References
Bachman, L. 2000: Modern language testing at the turn of the century: assuring that what we count counts. Language Testing 17, 1–42.
Biber, D. 1993: Using register-diversified corpora for general language studies. Computational Linguistics 19, 219–41.
Britt, M., Rouet, J. and Perfetti, C. 1996: Using hypertext to study and reason about historical evidence. In Rouet, J., Levonen, J., Dillon, A. and Spiro, R., editors, Hypertext and cognition. Mahwah, NJ: Lawrence Erlbaum, 43–72.
Chapelle, C. 1999: Validity in language assessment. Annual Review of Applied Linguistics 19, 1–19.
Educational Testing Service 1997: TOEFL test and score manual. Princeton, NJ: Educational Testing Service.
Educational Testing Service 1998: Draft TOEFL 2000 research agenda framework: areas of research. Research agenda three. TOEFL 2000 internal document. Princeton, NJ: Educational Testing Service.
Eignor, D., Taylor, C., Kirsch, I. and Jamieson, J. 1998: Development of a scale for assessing the level of computer familiarity of TOEFL examinees. TOEFL Research Report No. 60. Princeton, NJ: Educational Testing Service.
Enright, M. and Schedl, M. 1999: Reading for a reason: using reader purpose to guide test design. TOEFL 2000 Internal Report. Princeton, NJ: Educational Testing Service.
Enright, M., Grabe, W., Mosenthal, P., Mulcahy-Ernt, P. and Schedl, M. 1998: A TOEFL 2000 framework for testing reading comprehension: a working paper. Princeton, NJ: Educational Testing Service.
Foltz, P. 1996: Comprehension, coherence, and strategies in hypertext. In Rouet, J., Levonen, J., Dillon, A. and Spiro, R., editors, Hypertext and cognition. Mahwah, NJ: Lawrence Erlbaum, 109–36.
Freedle, R. and Kostin, I. 1999: Does the text matter in a multiple-choice test of comprehension? The case for the construct validity of TOEFL's minitalks. Language Testing 16, 2–32.
Goldman, S. 1997: Learning from text: reflections on the past and suggestions for the future. Discourse Processes 23, 357–98.
Hayes, J. and Hatch, J. 1999: Issues in measuring reliability: correlation versus percentage of agreement. Written Communication 16, 354–67.
Huberty, C. 1994: Applied discriminant analysis. New York: John Wiley.
Jamieson, J., Campbell, J., Norfleet, L. and Berbisada, N. 1993: Reliability of a computerized scoring routine for an open-ended task. System 21, 305–22.
Jamieson, J., Norfleet, L. and Berbisada, N. 1993: Successes, failures, and dropouts in computer-assisted language lessons. Computer Assisted English Language Learning Journal 4, 12–20.
Klecka, W. 1980: Discriminant analysis. In Lewis-Beck, M., editor, Quantitative applications in the social sciences, Volume 19. Newbury Park, CA: Sage.
Lehto, M., Zhu, W. and Carpenter, B. 1995: The relative effectiveness of hypertext and text. International Journal of Human-Computer Interaction 7, 293–313.
McNamara, D. and Kintsch, W. 1996: Learning from texts: effects of prior knowledge and text coherence. Discourse Processes 22, 247–88.
Messick, S. 1989: Validity. In Linn, R.L., editor, Educational measurement, 3rd edition. New York: American Council on Education/Macmillan, 13–103.
Meyer, B. 1985a: Prose analyses: purposes, procedures, and problems. In Britton, B. and Black, J., editors, Understanding expository text. Hillsdale, NJ: Lawrence Erlbaum, 11–64.
Meyer, B. 1985b: Prose analysis: purposes, procedures, and problems, Part 2. In Britton, B. and Black, J., editors, Understanding expository text. Hillsdale, NJ: Lawrence Erlbaum, 269–97.
Monks, V. 1997, August/September: Two views, same waterway. National Wildlife 35, 36–37.
Pellegrino, J., Baxter, G. and Glaser, R. 1999: Addressing the 'two disciplines' problem: linking theories of cognition and learning with assessment and instructional practice. Review of Research in Education 24, 307–53.
Perfetti, C. 1997: Sentences, individual differences, and multiple texts: three issues in text comprehension. Discourse Processes 23, 337–55.
Perfetti, C., Britt, M.A. and Georgi, M. 1995: Text-based learning and reasoning: studies in history. Hillsdale, NJ: Lawrence Erlbaum.
Perfetti, C., Marron, M. and Foltz, P. 1996: Sources of comprehension failure: theoretical perspectives and case studies. In Cornoldi, C. and Oakhill, J., editors, Reading comprehension difficulties: processes and intervention. Mahwah, NJ: Lawrence Erlbaum, 137–65.
Reinking, D. 1988: Computer-mediated text and comprehension differences: the role of reading time, reader preference, and estimation of learning. Reading Research Quarterly 23, 484–500.
Reinking, D. and Schreiner, R. 1985: The effects of computer-mediated text on measures of reading comprehension and reading behavior. Reading Research Quarterly 20, 536–53.
Spivey, N. 1997: The constructivist metaphor: reading, writing, and the making of meaning. San Diego, CA: Academic Press.
Stevens, J. 1996: Applied multivariate statistics for the social sciences, 3rd edition. Mahwah, NJ: Lawrence Erlbaum.
Tabachnick, B. and Fidell, L. 1996: Using multivariate statistics, 3rd edition. New York: Harper Collins.
Taylor, C., Jamieson, J., Eignor, D. and Kirsch, I. 1998: The relationship between computer familiarity and performance on computer-based TOEFL test tasks. TOEFL Research Report No. 61. Princeton, NJ: Educational Testing Service.
Tennesen, M. 1997, November/December: On a clear day. National Parks 71, 26–9.
Trites, L. 2000: Beyond basic comprehension: reading to learn and reading to integrate for native and non-native speakers. Unpublished doctoral dissertation, Northern Arizona University, Flagstaff, AZ.
Van den Berg, S. and Watt, J. 1991: Effects of educational setting on student responses to structured hypertext. Journal of Computer-Based Instruction 18, 118–24.
Van Dijk, T.A. and Kintsch, W. 1983: Strategies of discourse comprehension. New York: Academic Press.
Wiley, J. and Voss, J. 1999: Constructing arguments from multiple sources: tasks that promote understanding and not just memory for text. Journal of Educational Psychology 91, 310–11.
Zimmerman, T. 1997, December 29: Filter it with billions and billions of oysters: how to revive the Chesapeake Bay. US News and World Report 123, 63.
Appendix 1a Chart completion task
Directions: Complete the following chart. Fill in as much detail as possible from the text read by categorizing the information into the different areas on the chart. Do not use single words for your responses; form your responses in phrases or complete sentences. Include examples from the text.
Make no judgments about the accuracy of causes or effects or the effectiveness of the solutions mentioned in the text. Solutions are seen as any action taken in response to the problem(s). Solutions can take many forms, such as proposed solutions, attempted solutions, or failed solutions. Also, space is provided under each category for examples. Examples are specific examples found in the text that are used by the author(s) to exemplify the problems, causes, effects, or solutions in the text. However, there may not be examples for every category.
Points will be awarded for correct responses only. There is no penalty for incorrect responses. Points will be awarded in the following manner:

Problems and Solutions: 10 points each
Causes and Effects: 5 points each
Examples: 1 point each
Problems Causes Effects Solutions
Examples Examples Examples Examples
Appendix 1b Reading to Learn scoring rubric (0–241 total possible)

Problems (10 points), 1 word (6 points), 40 maximum
• Air pollution/smog in the National Parks
• Opposition or ignoring (or lack of cooperation) to environmental standards (regulations)
• No true 'point sources' (myriad of smaller pollution sources rather than one large source)
• National Parks limited jurisdiction of pollution sources outside of the parks

Causes (5 points), 1 word (3 points), 55 maximum
• Sulfur/nitrogen emissions
• Acid rain
• Ground-level ozone
• Urban/industrial emissions
• Automobile emissions
• Power plants/factories/plants emissions
• Emissions from smokestacks (chimneys)
• Emissions from kilns
• Unregulated pollution/smog (from other countries, Mexico)
• Smoke from controlled burns
• Emissions from large cities

Effects (5 points), 1 word (3 points), 30 maximum
• Trees and plants affected (injured), growth hindered
• Visibility decreased
• Metals loosened into waters (surface/ground water, water polluted)
• Nutrients removed (leached) from soil and/or plants
• Public outcry over stance of big business/government
• Aquatic life injured (damaged)

Solutions (10 points), 1 word (6 points), 90 maximum
• Stricter environmental laws (resolutions, acts)
• 1977 Clean Air Act Amendments labeling NPs as Class I areas
• Regional haze regulations proposed by the EPA
• Reduction of allowable pollution standards from industry
• Clean Air Act set visibility goals
• Objecting to construction permits
• Identification of pollution sources
• Industry modifications: installation of devices (scrubbers) to reduce pollution
• Public pressure to protect parks
Appendix 1b (continued)

Problems: Examples (1 point), 5 maximum
• Great Smoky Mountain National Park
• Grand Canyon National Park
• Smoky Mountain: sources from Ohio, New York, Atlanta, etc.
• Grand Canyon: sources from CA, NV, UT, AZ, NM, and Mexico
• TN: Luttrell corp. building permit situation

Causes: Examples (1 point, or 5 points with overarching cause listed above)
• Automobile emissions
• Power plants/factories/plants emissions
• Emissions from smokestacks (chimneys)
• Emissions from kilns
• Unregulated pollution/smog (from other countries, Mexico)
• Smoke from controlled burns
• Emissions from large cities

Causes: Examples (1 point), 10 maximum
• Navajo Generating Station, Page, AZ (Power Plant, AZ)
• Power plants in TN and OH river valley
• Southern California Edison Plant, Laughlin, NV
• Tennessee Luttrell Kilns, TN
• Industry in AZ
• Cars in CA
• Smokestacks in NM, NV, UT
• Los Angeles
• Atlanta
• New York

Effects: Examples (1 point), 6 maximum
• Leaves of plants turning purple and brown (stippling)
• Hindering photosynthesis of plants and trees
• Reduction of visibility at Smoky Mountain/Grand Canyon
• Visibility reduced 90% of the days
• See vague blue masses
• See half as far as in 1919

Solutions: Examples (1 point, or 10 points with overarching solution listed above)
• 1977 Clean Air Act Amendments labeling NPs as Class I areas
• Regional haze regulations proposed by the EPA
• Reduction of allowable pollution standards from industry

Solutions: Examples (1 point), 5 maximum
• Objecting to kilns in TN
• Researchers using scientific technology to ID sources (radioactive isotopes)
• Southern California Edison Plant in Laughlin, NV identified as point source
• Scrubbers installed at Navajo Generating Plant
• 10% reduction of pollution in 10–15 years (goal of no man-made pollution)
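The 0–241 total in Appendix 1b can be cross-checked against the category maxima and example caps listed in the rubric. The sketch below is only an illustration: the item counts (4, 11, 6, 9) are inferred by dividing each category maximum by its point value, the function names are hypothetical, and the partial-credit rules (reduced points for one-word answers, higher credit for examples tied to an overarching cause or solution) are deliberately ignored.

```python
# Rubric structure inferred from Appendix 1b: (number of creditable items, points per item).
# The item counts are an assumption: category maximum divided by points per item.
MAIN_CATEGORIES = {
    "problems": (4, 10),   # 40 maximum
    "causes": (11, 5),     # 55 maximum
    "effects": (6, 5),     # 30 maximum
    "solutions": (9, 10),  # 90 maximum
}

# Caps on 1-point examples per category, as listed in the rubric.
EXAMPLE_CAPS = {"problems": 5, "causes": 10, "effects": 6, "solutions": 5}


def theoretical_maximum():
    """Sum of all category maxima plus all example caps."""
    main = sum(items * points for items, points in MAIN_CATEGORIES.values())
    return main + sum(EXAMPLE_CAPS.values())


def chart_score(main_hits, example_hits):
    """Simplified chart score from counts of correct entries per category.

    Counts above the rubric limits are capped; partial credit is not modeled.
    """
    total = 0
    for name, (items, points) in MAIN_CATEGORIES.items():
        total += min(main_hits.get(name, 0), items) * points
        total += min(example_hits.get(name, 0), EXAMPLE_CAPS[name])
    return total


if __name__ == "__main__":
    print(theoretical_maximum())
    print(chart_score({"problems": 1, "causes": 2}, {"effects": 7}))
```

Capping each category separately mirrors the rubric layout, in which extra entries beyond the listed items earn no additional points.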
Appendix 2a Reading to Integrate task integration activity
Directions: For the next 15 minutes, reflect on what you read and compose a short essay that combines the information in the material read and makes connections across the range of ideas and arguments presented.

[Ruled lines for the written response]
Appendix 2b Reading to Integrate scoring rubric
Integration

50 Excellent: Integrates texts accurately and successfully on multiple levels and creates a true Documents Model. Generalizes at least two macrostructure concepts common across texts (this may be simply identifying the existence of a macrostructure followed by the supporting macrostructures from each text). Effectively integrates relevant support (i.e., details or macrostructures) to support generalizations and may still discuss each article separately to some degree. Integrates macrostructures present in both texts.

40 Good: Integrates through the creation of a well developed and accurate introduction and/or a conclusion, yet summarizes articles separately, AND/OR generalizes one major macrostructure that is fully developed with support. Uses some relevant support.

30 Fair: Attempts to integrate through the creation of a partially developed, possibly inaccurate, introductory AND/OR concluding statement, usually related to the main topic of the articles, but has no substantive development. Creates a Text and Situation Model of each text separately. Does not make generalizations for integration beyond the one statement. May make evaluative or editorial statements; however, these may be inaccurate.

20 Poor: Creates a well developed and accurate Text Model and Situation Model for each of the texts, yet attempts no integrative connection across the texts.

10 Very Poor: Ineffectively or inaccurately attempts to summarize each article in a very reporting style, revealing little or no use of background knowledge, contextualization, or evaluation. Response is simply a recall of information from the texts. No introduction or conclusion, OR participant confuses texts and sees separate texts as one issue (problem).
0 No Response: Participant does not attempt response or only addresses one article.

Macrostructures

Participants will receive one point for each macrostructure accurately identified in each text, with a minimum score of 4 awarded for the inability to accurately identify any macrostructures.

25 Excellent (Total 8): Accurately identifies all 4 macrostructures present in both texts.

20 Good (Total 6/7): Accurately identifies all 4 macrostructures in one text and 3 in the other, OR accurately identifies 3 of the four macrostructures in both texts (or 4 in one and 2 in the other).

15 Fair (Total 4/5): Accurately identifies 3 of the four macrostructures in one text and 2 of the four in the other (or 4 in one and 1 in the other), OR accurately identifies 2 of the four macrostructures in both texts (or 4 in one and 0 in the other, or 3 in one and 1 in the other).

10 Poor (Total 2/3): Accurately identifies 2 of the macrostructures in one text and 1 in the other (or 3 in one and 0 in the other). Accurately identifies 1 of the four macrostructures in both texts (or 2 in one and 0 in the other).

5 Very Poor (Total 0/1): Accurately identifies 1 of the four macrostructures in one text and 0 in the other, OR unable to accurately identify any macrostructures in the texts.

0 No Response: Participant does not attempt response.
Use of relevant details
5 Excellent: Effectively uses multiple relevant details as support; no irrelevant or erroneous details.

4 Good: Effectively uses relevant details as support; may include some inaccurate or irrelevant details. (More than 50% of details are relevant.)

3 Fair: Possibly uses one or two relevant details, yet several irrelevant details or inaccurate details (erroneous or fabricated) appear in the synthesis. (50% or less of details are relevant.)

2 Poor: Erroneous or fabricated details used in attempt to support arguments presented; does not include relevant details.

1 Very Poor: No details used or listed.

0 No Response: Participant does not attempt response.
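The three subscales of the Reading to Integrate rubric (Integration, Macrostructures, Use of relevant details) can be combined as sketched below. This is an illustration under stated assumptions: the subscales are assumed to be simply added, which is consistent with the maximum possible score of 80 reported for this measure (50 + 25 + 5), and the function names are hypothetical rather than taken from the study.

```python
# Points for the Macrostructures subscale, keyed by the total number of
# macrostructures accurately identified across the two texts (0 to 8).
MACRO_BANDS = {8: 25, 7: 20, 6: 20, 5: 15, 4: 15, 3: 10, 2: 10, 1: 5, 0: 5}


def macrostructure_points(found_text_a, found_text_b, attempted=True):
    """Map macrostructures found in each text (0-4 each) to rubric points."""
    if not attempted:
        return 0  # "No Response" band
    for found in (found_text_a, found_text_b):
        if not 0 <= found <= 4:
            raise ValueError("each text has at most 4 macrostructures")
    return MACRO_BANDS[found_text_a + found_text_b]


def integrate_score(integration, found_text_a, found_text_b, details):
    """Assumed additive composite: Integration + Macrostructures + details."""
    if integration not in (0, 10, 20, 30, 40, 50):
        raise ValueError("integration must be one of the rubric bands")
    if details not in range(0, 6):
        raise ValueError("details must be between 0 and 5")
    return integration + macrostructure_points(found_text_a, found_text_b) + details


if __name__ == "__main__":
    print(integrate_score(50, 4, 4, 5))
    print(macrostructure_points(3, 1))
```

Keying the band table on the combined total reproduces every combination listed in the rubric (for example, 4 in one text and 0 in the other falls in the same band as 2 in each).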
standardized basic reading comprehension measures (Nelson–Denny or TOEFL Reading Comprehension). Students completed the computer familiarity questionnaire and the Nelson–Denny Test at the same testing session because the Nelson–Denny was shorter than the TOEFL Reading Comprehension. During Session 2, participants took the other standardized basic reading comprehension measure.
Next, each participant group was subdivided into two subgroups for computer-based or paper reading of the texts for the new tasks. The subgroups were matched on their performance on initial reading measures: the Nelson–Denny was used for native speakers and the TOEFL Reading Comprehension was used for nonnative speakers. Independent t-tests run on these reading measures showed no significant difference in basic comprehension for the newly created subgroups assigned to each medium, ensuring that they were balanced for initial reading levels. Participants stayed in the same subgroups for the duration of the study. To ensure uniformity of response mode, all participants, whether they read the source texts on the computer or on paper, responded to the reading tasks using paper and pencil format.3
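The balance check described above rests on an independent-samples t-test. The following is a minimal sketch of the pooled-variance (Student) form of that test; whether the study used the pooled or the unequal-variance form is not stated, so the choice here is an assumption, and the scores are invented for illustration only.

```python
from math import sqrt
from statistics import mean, variance


def independent_t(sample_a, sample_b):
    """Pooled-variance t statistic and degrees of freedom for two groups."""
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2
    # Weighted average of the two sample variances.
    pooled_var = ((n_a - 1) * variance(sample_a)
                  + (n_b - 1) * variance(sample_b)) / df
    standard_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (mean(sample_a) - mean(sample_b)) / standard_error
    return t, df


if __name__ == "__main__":
    # Hypothetical basic comprehension scores for two matched subgroups.
    computer_subgroup = [52, 55, 49, 58, 50, 54]
    paper_subgroup = [53, 54, 50, 57, 51, 53]
    t, df = independent_t(computer_subgroup, paper_subgroup)
    print(t, df)
```

A t value near zero, judged against the critical value for the given degrees of freedom, is what "no significant difference" between the matched subgroups would look like.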
The last two sessions, each lasting approximately one hour, were dedicated to administration of the new measures. The Reading to Learn session took slightly longer to administer because administrative procedures were longer for this novel task. The new tasks were counterbalanced to control for practice effect; thus, half of the participants took the Reading to Learn measure first and half took the Reading to Integrate measure first. During Session 3, we administered the first new measure (for ease of discussion, Reading to Learn is discussed first) and BC1. At this session, students were given 12 minutes to read a 1200-word passage, either on computer or on paper. We limited the time allowed for reading based on 100 words per minute, thought to be ample (Grabe, personal communication, 1998). After examinees read the text, they were given 4 minutes to take notes on a half sheet of paper. Participants were instructed to take minimal notes due to the time constraints. Next, the text was removed and examinees were allowed 15 minutes to complete a chart based on the reading with the aid of their notes. After completing this Reading to Learn activity, participants were allowed to use the text and were given 15 minutes to answer BC1. Following these new testing sessions, 49 participants were selected for a related interview concerning the cognitive processes used in task completion (for further details, see Trites, 2000: Chapter 6).

3 Although responses could have been entered and perhaps scored by computer, this would have introduced factors not directly related to our research questions and remains an area for further study.
During Session 4, students were given 12 minutes to read two short texts (600 words each), either on computer or paper. After participants read the assigned texts, they were given 4 minutes to take one-half page of notes (Enright et al., 1998). Next, the texts were removed and participants were asked to demonstrate Reading to Integrate by writing a synthesis of the texts with the aid of their notes (15 minutes allowed for this task). After completing the Reading to Integrate task, participants were allowed to see the texts again and answered BC2 (15 minutes allowed for this task). In one Reading to Integrate session, for unknown reasons, six of the seven participants read only one text. Because we cannot explain the cause of this anomalous session, we have eliminated scores from the session's seven participants from subsequent analyses, thus slightly reducing the N size for the Reading to Integrate measure.
b Variables used in study: The six independent variables included three nominal (Native Language Background, Medium of Text Presentation, and Level of Education) and three interval variables (Nelson–Denny, TOEFL Reading Comprehension, and Computer Familiarity). The four dependent variables were Reading to Learn, Reading to Integrate, BC1, and BC2.
IV Results
First, we present the descriptive statistics for all reading measures, followed by a systematic analysis of independent variables that might affect participant performance on the new measures. Scatterplots were checked for all reading measures to ensure normality of data. Kurtosis and skewness levels for all reading measures were found to be within normal limits, indicating a relatively normal distribution. Descriptive statistics for all existing measures are shown in Table 1. Means for these measures show a consistent pattern: the native speaker undergraduates had the highest mean, followed by the nonnative speaker graduates, followed by the nonnative speaker undergraduates. On the reading measures, Nelson–Denny and TOEFL Reading Comprehension, the nonnative speaker undergraduates showed the largest variance in performance, while on the computer familiarity measure the variance of both nonnative speaker groups was substantially larger than that of the native speakers.
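The skewness and kurtosis screening mentioned above can be illustrated with simple moment-based estimators. This is a sketch only: statistical packages such as SPSS report bias-adjusted versions of these statistics, so values from the function below (a hypothetical helper, not part of the study) would differ slightly from package output, and the cutoffs for "normal limits" are not given in the text.

```python
from statistics import mean


def skewness_and_excess_kurtosis(values):
    """Moment-based skewness (g1) and excess kurtosis (g2) of a sample."""
    n = len(values)
    center = mean(values)
    # Central moments of order 2, 3, and 4.
    m2 = sum((x - center) ** 2 for x in values) / n
    m3 = sum((x - center) ** 3 for x in values) / n
    m4 = sum((x - center) ** 4 for x in values) / n
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3  # 0 for a normal distribution
    return skewness, excess_kurtosis


if __name__ == "__main__":
    # Hypothetical scores; a symmetric set has skewness 0.
    print(skewness_and_excess_kurtosis([1, 2, 3, 4, 5]))
```

Values of both statistics close to zero are what a roughly normal distribution would produce.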
The same pattern emerged for the means on the new measures (see Table 2) as for the existing measures. The native speaker undergraduate group performed better on all new measures than both of the nonnative speaker groups. The nonnative speaker graduate group performed better than the nonnative speaker undergraduate group on all measures as well. This robust pattern of performance was also found in the variance of three of the four new measures. On BC1 and BC2, the performance of the native speaker undergraduates showed the least amount of variance, followed by the nonnative speaker graduates, followed by the nonnative speaker undergraduates. On Reading to Integrate, the native speaker undergraduate group showed substantially less variance than the nonnative speaker groups; however, the variance of the two nonnative speaker groups was almost identical. On Reading to Learn, all three groups showed considerable variance.
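The "Total participants" rows of the descriptive tables can be recovered from the group rows alone. The sketch below is an illustration, not the authors' procedure: it combines the Reading to Learn group statistics from Table 2, read with decimal points restored (for example, a printed 5185 taken as 51.85, which is an assumption about the extraction), into an overall mean and standard deviation.

```python
from math import sqrt


def pooled_mean_sd(groups):
    """Overall mean and sd from a list of (n, mean, sd) group summaries."""
    total_n = sum(n for n, _, _ in groups)
    grand_mean = sum(n * m for n, m, _ in groups) / total_n
    # Total sum of squares = within-group part + between-group part.
    within = sum((n - 1) * sd ** 2 for n, _, sd in groups)
    between = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups)
    return grand_mean, sqrt((within + between) / (total_n - 1))


# Reading to Learn rows of Table 2: NSU, NNSU, NNSG (decimals assumed).
READING_TO_LEARN = [(105, 51.85, 19.86), (106, 31.73, 19.50), (40, 44.68, 19.27)]

if __name__ == "__main__":
    overall_mean, overall_sd = pooled_mean_sd(READING_TO_LEARN)
    print(round(overall_mean, 2), round(overall_sd, 2))
```

The between-group term shows why the total standard deviation exceeds each group's own: differences among the group means add spread that no single group contains.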
Table 3 reveals the range of awarded points achieved by all participant groups. The nature of the Reading to Learn point system created a maximum possible point value (241) that no participant achieved. We speculate that there are at least three possible causes of the discrepancy between the theoretical maximum and the range of observed scores:

• task novelty: no participant reported ever doing such a task previously;
• time allowed for task completion; and
• space on the response sheet: space constraints may have limited the amount of information that participants could include.

Future research would need to address these issues. However, for the Reading to Integrate measure, the full range of possible point totals was achieved by at least one participant in each group.

Table 1 Descriptive statistics for existing measures for three participant groups

Group                   n      Mean     sd      kMax

Nelson–Denny
NSU                     105    126.48   16.46   156
NNSU                    106     67.24   31.91   156
NNSG                     40     88.88   21.88   156
Total participants      251     95.47   36.93   156

TOEFL Reading Comprehension
NSU                     105     61.30    4.24    67
NNSU                    106     50.30    8.53    67
NNSG                     40     57.15    4.55    67
Total participants      251     55.99    8.19    67

Computer familiarity
NSU                     104     38.08    3.60    44
NNSU                    104     34.82    5.99    44
NNSG                     40     35.63    6.02    44
Total participants      248     36.31    5.33    44

Note: kMax = number of items or maximum possible score.
1 Computer familiarity
The overall plan for the analyses was to check the influence of the independent variables on the dependent measures, with computer familiarity being addressed first. Initially, we had proposed that if computer familiarity was significantly different across groups, it would be entered into all calculations as a covariate. To determine this, it was necessary to conduct an Analysis of Variance (ANOVA) for computer familiarity across the six participant/medium subgroups.
Table 2 Descriptive statistics for new measures for three participant groups

Group                   n      Mean     sd      kMax

Reading to Learn (chart)
NSU                     105    51.85    19.86   241
NNSU                    106    31.73    19.50   241
NNSG                     40    44.68    19.27   241
Total participants      251    42.21    21.64   241

Basic Comprehension Test 1
NSU                     105    16.98     2.47    20
NNSU                    106    11.73     4.25    20
NNSG                     40    14.98     3.50    20
Total participants      251    14.44     4.23    20

Reading to Integrate (synthesis)
NSU                     101    63.65    11.05    80
NNSU                    103    37.24    21.76    80
NNSG                     40    53.60    21.03    80
Total participants      244    50.86    21.63    80

Basic Comprehension Test 2
NSU                     105    15.91     2.78    20
NNSU                    106     9.75     4.54    20
NNSG                     40    12.85     3.61    20
Total participants      251    12.82     4.72    20

Notes: kMax = number of items or maximum possible score; n size reduced for Reading to Integrate because of anomalous testing session.
The resulting ANOVA (F = 4.70, p < .05) showed a significant difference between subgroups on the computer familiarity questionnaire; therefore, a post hoc Scheffé test was done to locate significant contrasts. After analysis of all possible subgroup contrasts, the post hoc Scheffé revealed that the only significant difference in subgroups appeared between the native speaker undergraduates and nonnative speaker undergraduates who read texts on paper. Hence, although there was one significant contrast, it occurred in two subgroups reading on paper, not in any of the subgroups who read on computer. All groups generally scored high on computer familiarity, although, as noted, variance of the nonnative groups was greater. It was thus established that computer familiarity had no significant effect on participants who read texts on computer, so we did not use computer familiarity as a covariate in further analyses and proceeded to the three research questions of central interest to this study.
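The one-way ANOVA used for the computer familiarity comparison partitions variation into between-subgroup and within-subgroup parts. The following is a minimal sketch of that computation with invented scores; the function name and data are hypothetical, and the study's own subgroup data are not reproduced here.

```python
from statistics import mean


def one_way_anova(groups):
    """F statistic with its degrees of freedom for a list of score lists."""
    all_values = [x for group in groups for x in group]
    grand_mean = mean(all_values)
    # Variation of group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of scores around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


if __name__ == "__main__":
    # Hypothetical familiarity scores for three subgroups.
    print(one_way_anova([[1, 2, 3], [2, 3, 4], [3, 4, 5]]))
```

With six participant/medium subgroups, as in the study, the between-groups degrees of freedom would be 5; a significant F then only signals that some contrast differs, which is why a post hoc test is needed to locate it.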
Because Research Questions 1 and 2 are similar, differing only in that they address the two different new reading measures (Reading to Learn and Reading to Integrate), we approached them in the same manner, through ANOVA, to identify the independent variables that could have significantly affected the results on the new measures.
Table 3 Range of scores for new measures for three participant groups

Group                   n      Minimum   Maximum   kMax

Reading to Learn (chart)
NSU                     105    14        120       241
NNSU                    106     0         86       241
NNSG                     40     3         94       241
Total participants      251     0        120       241

Reading to Integrate (synthesis)
NSU                     101    38         80        80
NNSU                    103     0         80        80
NNSG                     40     5         80        80
Total participants      244     0         80        80

Notes: kMax = number of items or maximum possible score; n size reduced for Reading to Integrate because of anomalous testing session.

2 Research Question 1

The first research question asked if performance on a measure of Reading to Learn was affected by medium of presentation, computer familiarity, native language, or level of education. We calculated a univariate ANOVA with Type III sums of squares on Reading to Learn with group status, medium of text presentation, and test order as possible contributing factors.4 Table 4 shows that there were no significant interactions for any of the group, medium, or test order combinations. The only significant main effect was group membership.
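Each F value in Table 4 is the effect's mean square divided by the error mean square, so the table can be checked from its sums of squares and degrees of freedom. The sketch below does this with the Table 4 entries read with decimal points restored (for example, a printed 2186929 taken as 21869.29); that reading is an assumption about the extraction, although the recomputed ratios agree with the printed F column.

```python
# Table 4 effects as (Type III sum of squares, df); decimals are assumed.
TABLE_4_EFFECTS = {
    "Group": (21869.29, 2),
    "Medium": (1215.86, 1),
    "Test order": (294.98, 1),
    "Group x medium": (334.81, 2),
    "Group x test order": (4.37, 2),
    "Medium x test order": (391.73, 1),
    "Group x medium x test order": (575.29, 2),
}
ERROR_SS, ERROR_DF = 92997.45, 239


def f_ratios(effects, error_ss, error_df):
    """F for each effect: (SS / df) divided by the error mean square."""
    mean_square_error = error_ss / error_df
    return {name: (ss / df) / mean_square_error
            for name, (ss, df) in effects.items()}


if __name__ == "__main__":
    for name, f in f_ratios(TABLE_4_EFFECTS, ERROR_SS, ERROR_DF).items():
        print(f"{name}: F = {f:.2f}")
```

Only the group effect yields a large ratio; the remaining ratios are small, in line with the finding that no interaction and no other main effect was significant.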
Because group membership was a combined measure that included both native language background as well as level of education, post hoc analysis was needed to identify the significant contrasts. Table 5 shows that there was a significant difference in performance on the Reading to Learn measure between the native speaker undergraduate and the nonnative speaker undergraduate groups, as well as a significant difference between the nonnative speaker undergraduate and nonnative speaker graduate groups. There was no significant difference in performance between the native speaker undergraduate and the nonnative speaker graduate groups. Therefore, the answer to Research Question 1 is that native language background and level of education did have a significant effect on performance on the Reading to Learn measure, but that medium of text presentation did not. Further, order of testing, whether participants took Reading to Learn or Reading to Integrate first, had no significant effect.
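The Scheffé contrasts in Table 5 can be approximated from the group sizes, the mean differences, and the error mean square of the Reading to Learn ANOVA. The sketch below is illustrative and rests on assumptions: the error mean square is read as 389.11 (decimal point restored), the standard error is taken as the square root of MSE × (1/n1 + 1/n2), and the critical value of about 3.03 is an approximate tabled F for 2 and 239 degrees of freedom at the .05 level, not a figure from the article.

```python
from math import sqrt

MSE = 389.11        # error mean square from Table 4 (decimal assumed)
K_GROUPS = 3        # NSU, NNSU, NNSG
F_CRITICAL = 3.03   # approximate F(.05; 2, 239); an assumption


def scheffe_contrast(mean_diff, n_a, n_b, mse=MSE, k_groups=K_GROUPS):
    """Standard error and Scheffe F statistic for a pairwise contrast."""
    standard_error = sqrt(mse * (1 / n_a + 1 / n_b))
    # Scheffe: squared t statistic divided by (k - 1), compared with F critical.
    f_scheffe = (mean_diff / standard_error) ** 2 / (k_groups - 1)
    return standard_error, f_scheffe


if __name__ == "__main__":
    contrasts = {
        "NSU vs NNSU": (20.12, 105, 106),
        "NSU vs NNSG": (7.17, 105, 40),
        "NNSU vs NNSG": (12.95, 106, 40),
    }
    for label, (diff, n_a, n_b) in contrasts.items():
        se, f = scheffe_contrast(diff, n_a, n_b)
        print(label, round(se, 2), round(f, 2), f > F_CRITICAL)
```

Under these assumptions the computed standard errors match those printed in Table 5, and only the contrast between native speaker undergraduates and nonnative speaker graduates falls below the critical value, which agrees with the pattern reported in the text.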
Table 4 Performance on Reading to Learn measure by groups, medium, and test order (n = 251) (univariate analysis of variance)

Source                          Type III sum of squares   df    Mean square   F

Group                           21869.29                    2   10934.64      28.10*
Medium                           1215.86                    1    1215.86       3.13
Test order                        294.98                    1     294.98        .76
Group × medium                    334.81                    2     167.40        .43
Group × test order                  4.37                    2       2.19        .01
Medium × test order               391.73                    1     391.73       1.01
Group × medium × test order       575.29                    2     287.65        .74
Error                           92997.45                  239     389.11

Note: *p < .05

4 Test order was added as an additional variable to double check that our counterbalancing had been effective in controlling for any practice effect.

3 Research Question 2

The second research question, related to the first, asked if performance on Reading to Integrate was affected by medium of presentation, computer familiarity, native language, or level of education. Again, to ensure that counterbalancing of tests controlled for any practice effect, test order was added as an additional variable.
To answer this question, we proceeded to calculate a univariate ANOVA on the Reading to Integrate measure with group status, medium of text presentation, and test order entered as possible contributing factors. The results (Table 6) show, as for Research Question 1, that there were no significant interactions for any of the group, medium, or test order combinations; the only significant main effect was group membership. The answer for Research Question 2 is that native language background and educational level had a significant effect on Reading to Integrate, but medium of text presentation did not. Post hoc analysis of group contrasts showed that all three groups were distinct in their performance on Reading to Integrate (see Table 7).
Table 5 Post hoc Scheffé for Reading to Learn measure (n = 251)

Group   n     Group   n     Mean difference   Standard error

NSU     105   NNSU    106   20.12*            2.72
              NNSG     40    7.17             3.67
NNSU    106   NNSG     40   12.95*            3.66

Note: *p < .05

Table 6 Performance on Reading to Integrate measure by groups, medium, and test order (n = 244a) (univariate analysis of variance)

Source                          Type III sum of squares   df    Mean square   F

Group                           36294.33                    2   18147.17      55.82b
Medium                            192.95                    1     192.95        .59
Test order                         97.83                    1      97.83        .30
Group × medium                     11.82                    2       5.91        .02
Group × test order                 30.14                    2      15.07        .05
Medium × test order              1098.72                    1    1098.72       3.38
Group × medium × test order      1488.58                    2     744.29       2.29
Error                           75430.37                  232     325.13

Notes: a n size reduced for Reading to Integrate because of anomalous testing session; b p < .05

4 Research Question 3

The third research question asked to what extent measures of basic comprehension, Reading to Learn, and Reading to Integrate were related. We used correlational analysis as the first step in answering this question. Results for the total participant population (see Table 8) showed moderate to high correlations across all reading measures. However, the analyses done for Research Questions 1 and 2 revealed that group status had a significant effect on performance on Reading to Learn and Reading to Integrate. Further, we realize that correlations are sensitive to variance, so the high correlations seen in the total population could have been an artifact of combining the three groups. Therefore, we examined the correlations among all reading measures for each group (available in Trites, 2000: Appendix 1, pp. 230–33). While the reading measures were still correlated, often moderately, sometimes highly, magnitudes differed and sometimes dropped substantially. The text-specific multiple-choice measures, BC1 and BC2, consistently correlated more highly with the Nelson–Denny and TOEFL Reading Comprehension tests than with Reading to Learn and Reading to Integrate based on the same texts, suggesting a test method or construct effect. Because comparisons between different measures of basic comprehension were not a goal of the project, BC1 and BC2 were not used in further analyses. We conclude that, as expected, all reading measures were related, but the lower correlations between Reading to Learn and Reading to Integrate and the traditional basic comprehension measures led us to consider further types of analysis to identify the possible distinctiveness of the new measures.
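The concern that high correlations in the total population could be an artifact of combining groups can be shown with a small constructed example. The data below are invented for illustration and are not the study's scores: within each hypothetical group the two measures are uncorrelated, yet pooling groups with different means produces a high correlation.

```python
from math import sqrt
from statistics import mean


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    mean_x, mean_y = mean(xs), mean(ys)
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariation / (spread_x * spread_y)


# Hypothetical lower-scoring group: the two measures are unrelated.
LOW_X, LOW_Y = [1, 2, 3, 4], [2, 4, 1, 3]
# Hypothetical higher-scoring group: same pattern shifted up by 10 points.
HIGH_X, HIGH_Y = [11, 12, 13, 14], [12, 14, 11, 13]

if __name__ == "__main__":
    print(pearson_r(LOW_X, LOW_Y))
    print(pearson_r(HIGH_X, HIGH_Y))
    print(pearson_r(LOW_X + HIGH_X, LOW_Y + HIGH_Y))
```

Examining correlations separately for each group, as described above, removes the contribution of between-group mean differences.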
Table 7 Post hoc Scheffé for Reading to Integrate measure (n = 244a)

Group   n     Group   n     Mean difference   Standard error

NSU     101   NNSU    103   26.41b            2.53
              NNSG     40   10.05b            3.37
NNSU    103   NNSG     40   16.36b            3.36

Notes: a n size reduced for Reading to Integrate because of anomalous testing session; b p < .05

Table 8 Correlations for all reading measures for all participants (n = 251)

                               TOEFL Reading    Basic Comprehension   Basic Comprehension   Reading    Reading to
                               Comprehension    Test 1                Test 2                to Learn   Integratea

Nelson–Denny                   .90b             .85b                  .84b                  .66b       .69b
TOEFL Reading Comprehension    1.00             .85b                  .84b                  .64b       .69b
Basic Comprehension 1                           1.00                  .84b                  .68b       .68b
Basic Comprehension 2                                                 1.00                  .68b       .70b
Reading to Learn                                                                            1.00       .59b

Notes: a n size reduced for Reading to Integrate because of anomalous testing session; b p < .05

5 Discriminant analysis

Because we were interested in determining how constructs differed, we sought additional analyses to help us better characterize the new constructs. Of the several possible statistical methods that could have been employed, two are most plausible: multivariate analysis of variance, usually associated with experimental research, and discriminant analysis, usually associated with descriptive research (Tabachnick and Fidell, 1996). The present research was conducted with samples of naturally occurring student groups and was not experimental. Moreover, we were interested in finding ways to compare participant performance on the measures of the new constructs, Reading to Learn and Reading to Integrate, with performance on more traditional measures of basic comprehension. Thus, we opted to use discriminant analysis because of its parsimony of description and clarity of interpretation (Stevens, 1996). Discriminant analysis, a technique recommended to describe group differences or predict group membership based on a comparison of multiple predictors (Huberty, 1994), has been used in other areas of applied linguistic research to investigate creation of a student profile of success or failure on Computer Assisted Language Learning (CALL) lessons (Jamieson et al., 1993) and accurate classification of text types into registers (Biber, 1993), among other purposes.
To further distinguish basic comprehension from the new constructs, we conducted discriminant analysis on each of the two language groups (native and nonnative) to determine whether Reading to Learn and Reading to Integrate would classify participants in the same way that Basic Comprehension would. We divided the native speaker and nonnative speaker groups into three levels (high, middle (mid), and low reading ability scorers) based on the basic comprehension measure chosen for that group (Nelson–Denny for native speakers, TOEFL Reading Comprehension for nonnative speakers). Research methodologists (Tabachnick and Fidell, 1996: 513) note that robustness is expected when the smallest group has at least 20; our smallest group was 25. To check the assumption of homogeneity of the variance/covariance matrices, we examined the outcomes of Box's M Test and found them all nonsignificant (Klecka, 1980). Each group was checked for outliers using Mahalanobis distance treated as Chi-Square, and no outliers were found (Tabachnick and Fidell, 1996). Thus, the data met all assumptions required for use of discriminant analysis.
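The outlier screen described above compares each case's squared Mahalanobis distance with a Chi-Square critical value. The sketch below handles the two-predictor case with invented scores; the use of two predictors, the alpha level of .001, and the resulting critical value of 13.816 (Chi-Square with 2 degrees of freedom) are assumptions for illustration, since the text does not state them, and the helper name is hypothetical.

```python
from statistics import mean

CHI_SQUARE_CRITICAL = 13.816  # Chi-Square, df = 2, alpha = .001 (assumed)


def mahalanobis_squared(points):
    """Squared Mahalanobis distance of each (x, y) case from the centroid."""
    n = len(points)
    mean_x = mean(p[0] for p in points)
    mean_y = mean(p[1] for p in points)
    # Sample variances and covariance (n - 1 in the denominator).
    var_x = sum((x - mean_x) ** 2 for x, _ in points) / (n - 1)
    var_y = sum((y - mean_y) ** 2 for _, y in points) / (n - 1)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in points) / (n - 1)
    determinant = var_x * var_y - cov_xy ** 2
    distances = []
    for x, y in points:
        dx, dy = x - mean_x, y - mean_y
        # Quadratic form with the inverse of the 2 x 2 covariance matrix.
        distances.append(
            (var_y * dx * dx - 2 * cov_xy * dx * dy + var_x * dy * dy) / determinant
        )
    return distances


if __name__ == "__main__":
    # Hypothetical (Reading to Learn, Reading to Integrate) score pairs.
    cases = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6)]
    flagged = [d > CHI_SQUARE_CRITICAL for d in mahalanobis_squared(cases)]
    print(flagged)
```

A case would be treated as a multivariate outlier only if its squared distance exceeded the critical value; in very small samples such as this toy example the distances cannot get that large, so the check is meaningful only for groups of realistic size.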
a Discriminant analysis for nonnative speakers: To see whether the Reading to Learn/Reading to Integrate Composite classified participants similarly to the measure of Basic Comprehension (for nonnative speakers, TOEFL Reading Comprehension), we divided the entire nonnative speaker group (n = 146) into three levels of basic comprehension: high, mid,
and low. These reading ability groups of high, mid, and low were based on typical TOEFL Reading Comprehension score levels required for program entry. Participants were classified as high if their scores were greater than or equal to 550, or 56 and above on the scaled score on the TOEFL Reading Comprehension (550 is the cut score often used for graduate entry). Participants were classified as mid if scores ranged between 500 and 549, or 50 to 55 on the TOEFL Reading Comprehension. Participants were classified as low if their scores fell below 500, or 49 and below on TOEFL Reading Comprehension (500 is a minimum TOEFL score sometimes used for undergraduate admission, often with the proviso that students enroll in ESL classes either prior to official enrollment or concurrently). For our entire nonnative speaker group, descriptive statistics on basic comprehension reading ability group membership levels appear in Table 9.
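The three-way split on the scaled TOEFL Reading Comprehension score can be written as a short rule. This is a minimal sketch; the function name is illustrative, and only the cut points (56 and above, 50 to 55, 49 and below) come from the text.

```python
def toefl_reading_band(scaled_score):
    """Assign a reading ability group from a scaled TOEFL Reading Comprehension score."""
    if scaled_score >= 56:
        return "high"
    if scaled_score >= 50:
        return "mid"
    return "low"

for score in (58, 52, 47):
    print(score, toefl_reading_band(score))
```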
We ran SPSS Discriminant Analysis with initial grouping variables of high, mid, and low reading ability. We compared initial reading ability levels with high, mid, and low categories on the Reading to Learn/Reading to Integrate Composite, a new variable reflecting level of performance on the Reading to Learn and Reading to Integrate measures combined (see footnote 5). The discriminant analysis yielded one discriminant function with an eigenvalue of .92, responsible for 99.9% of the variance in outcomes. Wilks' Lambda for this function was .52, significant at .001; the associated Chi-Square value was extremely large (90.93) and highly significant (p < .001), indicating the group centroids on the composite Reading to Learn/Reading to Integrate function for the three nonnative speaker reading ability groups were significantly different. Both Reading to Learn and
Table 9 Descriptive statistics for nonnative speakers by TOEFL Reading Comprehension reading ability groups (n = 143)

Reading ability group    n      Mean on TOEFL Reading Comprehension    sd
High (≥ 56)              57     59.68                                  3.03
Mid (50–55)              42     52.81                                  1.67
Low (≤ 49)               44     42.11                                  5.47
Total                    143    52.26                                  8.22

Note: Total n = 143 due to loss of three cases in anomalous Reading to Integrate session.
5 We first calculated two separate discriminant analyses, one for Reading to Learn and one for Reading to Integrate, but we found that both loaded on a single function, so we used the composite in subsequent analyses.
Reading to Integrate loaded significantly on the discriminant function, at .86 for Reading to Learn and .78 for Reading to Integrate (p < .05).
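The reported eigenvalue and Wilks' Lambda are tied by a standard identity: Lambda is the product of 1 / (1 + eigenvalue) over the discriminant functions. The sketch below applies it to the first function only, since that is the only eigenvalue reported here; treating the remaining function as negligible is an assumption, and the function name is illustrative.

```python
def wilks_lambda(eigenvalues):
    """Wilks' Lambda as the product of 1 / (1 + eigenvalue) over discriminant functions."""
    result = 1.0
    for eigenvalue in eigenvalues:
        result *= 1.0 / (1.0 + eigenvalue)
    return result

# First discriminant function for the nonnative speaker analysis.
print(round(wilks_lambda([0.92]), 2))
```

The result rounds to the Lambda reported for the nonnative speaker analysis, which is what one would expect when a single function carries nearly all of the between-group variance.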
Over half (64.9%) of the high reading ability group remained high on the new measure; less than half (42.9%) of the mid reading ability group remained classified as mid. Hence, the Reading to Learn/Reading to Integrate Composite was particularly influential in reclassifying the mid reading ability group and, to a lesser extent, the high group. However, most (81.8%) of the low reading ability group remained low on the Reading to Learn/Reading to Integrate Composite (see Table 10). Of the 143 nonnative speaker participants, 91 (63.6%) remained in the initial basic comprehension category on the composite; the rest moved, but in different directions. Twenty-one participants (14.7%) were classified into a higher category on the Reading to Learn/Reading to Integrate Composite than their initial basic comprehension level would have suggested, while 31 (21.7%) were reclassified into a lower category. Thus, 52 participants (36.4%), just over one third of the sample, were classified differently based on their Reading to Learn/Reading to Integrate Composite performance.
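The tallies in this paragraph can be re-derived from the counts in Table 10. The following is a minimal sketch; the function name and the dictionary keys are illustrative, and the rows and columns are assumed to be ordered high, mid, low.

```python
def reclassification_summary(table):
    """Count cases that stayed, moved up, or moved down.

    table[i][j] is the number of participants with initial group i who were
    predicted into group j, with both axes ordered high, mid, low.
    """
    size = len(table)
    total = sum(sum(row) for row in table)
    same = sum(table[i][i] for i in range(size))
    # A column index smaller than the row index means a higher predicted group.
    higher = sum(table[i][j] for i in range(size) for j in range(i))
    lower = total - same - higher
    return {"total": total, "same": same, "higher": higher, "lower": lower}

# Counts from Table 10 (nonnative speakers).
table_10 = [
    [37, 17, 3],   # initial high
    [13, 18, 11],  # initial mid
    [2, 6, 36],    # initial low
]
summary = reclassification_summary(table_10)
for key, count in summary.items():
    print(key, count, round(100.0 * count / summary["total"], 1))
```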
b Discriminant analysis for native speakers: Because one of our goals in this project was to probe the possible validity of these new measures by assessing performance of two groups, native as well as
Table 10 Discriminant analysis: comparison of nonnative speaker reading ability groups with Reading to Learn/Reading to Integrate Composite (n = 143)

                            Predicted group membership for Reading to
                            Learn/Reading to Integrate Composite
Reading ability group       High     Mid      Low      Initial classification total
Count
  High (≥ 56)               37       17       3        57
  Mid (50–55)               13       18       11       42
  Low (≤ 49)                2        6        36       44
  Reclassification total    52       41       50       143
Percentage
  High (≥ 56)               64.9     29.8     5.3      100.0
  Mid (50–55)               31.0     42.9     26.2     100.0
  Low (≤ 49)                4.5      13.6     81.8     100.0

Note: Total n = 143 due to loss of three cases in anomalous Reading to Integrate session.
nonnative speakers, we conducted a parallel discriminant analysis for native speakers. Thus, for the native speakers we followed the same procedure, dividing the entire native speaker group (n = 105) into three levels of basic comprehension (high, mid, and low) based on score distances of .5 standard deviations from the sample mean of the Nelson–Denny test for these participants. Native speakers need not take reading comprehension tests when entering the university, so the three-way split was based entirely on our sample data. Participants were classified as high if their scores on the Nelson–Denny were greater than or equal to 135. Participants were classified as mid if their scores ranged from 119 to 134 on the Nelson–Denny. Participants were classified as low if their scores fell at or below 118 on the Nelson–Denny. Descriptive statistics for the entire native speaker sample on basic comprehension group membership appear in Table 11.
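The half-standard-deviation split can be illustrated with a short calculation. The sketch below uses the total mean and standard deviation shown in Table 11 (126.04 and 16.52, n = 101); whether the original cut points were computed on exactly these values or on the full sample of 105 is an assumption, and the function names are illustrative.

```python
def half_sd_cutoffs(mean, sd, width=0.5):
    """Lower and upper boundaries lying `width` standard deviations from the mean."""
    return mean - width * sd, mean + width * sd

def nelson_denny_band(score):
    """Assign a reading ability group using the cut scores stated in the text."""
    if score >= 135:
        return "high"
    if score >= 119:
        return "mid"
    return "low"

lower, upper = half_sd_cutoffs(126.04, 16.52)
print(round(lower, 2), round(upper, 2))
print(nelson_denny_band(140), nelson_denny_band(125), nelson_denny_band(110))
```

The two boundaries fall just below 118 and just above 134, consistent with the integer cut scores of 118 and 135 given above.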
The discriminant analysis yielded one discriminant function with an eigenvalue of .31, responsible for 98.7% of the variance in outcomes. Wilks' Lambda for this function was .76, significant at .001; the associated Chi-Square value was large (26.39) and highly significant (p < .001), indicating the group centroids on the discriminant function for the three reading ability groups on the Reading to Learn/Reading to Integrate Composite were significantly different. Pooled within-groups correlations between discriminating variables showed that Reading to Learn correlated with the first discriminant function at a level of .81; Reading to Integrate correlated with the second discriminant function at .71. This contrasts with findings for the nonnative speakers, where scores on the combined new measures loaded significantly on only one discriminant function. For native speakers, then, there is evidence for two significant discriminant functions, although the first accounts for almost all of the variance. Although these two measures (Reading to Learn and Reading to Integrate) loaded on two separate discriminant functions, they still
Table 11 Descriptive statistics for native speakers by Nelson–Denny reading ability groups (n = 101)

Reading ability group    n      Mean on Nelson–Denny    sd
High (≥ 135)             39     141.26                  5.19
Mid (119–134)            37     125.65                  5.09
Low (≤ 118)              25     102.88                  10.98
Total                    101    126.04                  16.52

Note: Total n = 101 for discriminant analysis due to loss of four cases in anomalous Reading to Integrate session.
showed moderate correlations with the alternate function (see footnote 6), justifying the composite calculations.
Results of discriminant analysis for native speakers, seen in Table 12, show a different pattern from that observed for nonnative speakers. Nearly three-fourths (71.8%) of the native speakers classified as high in the basic comprehension reading ability group remained high on the Reading to Learn/Reading to Integrate Composite. For the mid group, however, only 18.9% remained classified as mid. Just over half (52%) of the low reading ability group members remained low on the Reading to Learn/Reading to Integrate Composite. As with the nonnative speakers, participants in the mid category on basic comprehension showed the most frequent reclassification. Forty-eight of the 101 (47.5%) native speaker participants remained in the initial classification categories. Twenty-two (21.8%) were reclassified into a higher category, and 31 (30.7%) were reclassified into a lower category. Thus, 53 participants (52.5%), over half of the sample, were classified differently based on their performance on the Reading to Learn/Reading to Integrate Composite.
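The percentages quoted for each initial group are row percentages of the Table 12 counts. A minimal sketch of that conversion follows; the function name is illustrative, and rows and columns are assumed to be ordered high, mid, low.

```python
def row_percentages(table):
    """Convert each row of counts to percentages of that row's total, to one decimal."""
    return [[round(100.0 * count / sum(row), 1) for count in row] for row in table]

# Counts from Table 12 (native speakers).
table_12 = [
    [28, 3, 8],   # initial high
    [10, 7, 20],  # initial mid
    [5, 7, 13],   # initial low
]
for label, row in zip(("high", "mid", "low"), row_percentages(table_12)):
    print(label, row)
```

The diagonal entries of the result are the shares that kept their initial classification in each group.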
6 Reading to Learn correlated with function 2 at -.58; Reading to Integrate with function 1 at .71.
Table 12 Discriminant analysis: comparison of native speaker reading ability groups with Reading to Learn/Reading to Integrate Composite (n = 101)

                            Predicted group membership for Reading to
                            Learn/Reading to Integrate Composite
Reading ability group       High     Mid      Low      Initial classification total
Count
  High (≥ 135)              28       3        8        39
  Mid (119–134)             10       7        20       37
  Low (≤ 118)               5        7        13       25
  Reclassification total    43       17       41       101
Percentage
  High (≥ 135)              71.8     7.7      20.5     100.0
  Mid (119–134)             27.0     18.9     54.1     100.0
  Low (≤ 118)               20.0     28.0     52.0     100.0

Note: Total n = 101 for discriminant analysis due to loss of four cases in anomalous Reading to Integrate session.
V Interpretation
Analyses done to answer Research Questions 1 and 2 showed that performance on Reading to Learn and Reading to Integrate measures was significantly influenced by language background and level of education (graduate vs. undergraduate, for nonnative speakers only). Moreover, level of computer familiarity had no significant effect on Reading to Learn and Reading to Integrate performance.
Correlations showed that, as expected, all reading measures correlated to some degree, generally answering Research Question 3. The most interesting results came from the discriminant analyses, because they showed a pattern of differential classification on the new tasks sensitive to initial reading ability. This pattern differed for nonnative speakers and native speakers. Examination of the reclassifications based on the new tasks revealed some of the problems of using a basic comprehension-only test to predict performance on more challenging literacy tasks. These results also imply the need for further work on new measures of advanced literacy skills, such as Reading to Learn and Reading to Integrate, to reflect trends in construct-driven assessment (Pellegrino et al., 1999).
For the nonnative speakers, most (81.8%) of the participants with TOEFL Reading Comprehension below 50 remained classified as low on the Reading to Learn/Reading to Integrate Composite, suggesting the existence of a lower threshold of academic English proficiency. Participants below this threshold were unlikely to perform well on tasks assessing Reading to Learn and Reading to Integrate. Even tasks involving only selection (such as BC1 and BC2) rather than production of responses were difficult for this group. However, for the nonnative speaker readers in the mid and high reading ability groups, basic comprehension level was not nearly as consistent a predictor of performance on the Reading to Learn/Reading to Integrate Composite. This was especially striking for those in the mid reading ability group. Approximately one fourth (26.2%) of the mid group dropped into the low category on the Reading to Learn/Reading to Integrate Composite, indicating that they experienced more difficulty in completing the new measures, but a somewhat larger share (31.0%) could indeed categorize and synthesize information relatively well, despite middling performance on basic comprehension. This finding suggests that, for nonnative speaker readers in the mid reading ability category, basic comprehension measures were insufficient to predict their performance on the new measures.
Almost two-thirds of the high basic comprehension nonnative speakers remained high on the Reading to Learn/Reading to Integrate Composite, but one third dropped, suggesting that they could not categorize and synthesize information from texts as well as they could recognize information. For this third, the recognition-only task, multiple-choice basic comprehension, overestimated ability to manipulate and synthesize information. For the high and mid reading ability nonnative speaker groups, basic comprehension-only tests may then overestimate academic English proficiency.
Results for the native speakers showed a more definitive pattern of reclassification. For the low reading ability group, approximately half were misclassified, suggesting that the basic comprehension measure underrepresented their ability to categorize and synthesize information in academic texts. Most of the mid reading ability group were reclassified as lower on the Reading to Learn/Reading to Integrate Composite, showing that basic comprehension results overestimated ability to succeed on more challenging tasks. On the other hand, almost one third of the mid reading ability group did better on the Reading to Learn/Reading to Integrate Composite; for them, basic comprehension underrepresented their ability to complete more challenging reading tasks. For the high ability readers, nearly one third dropped on the Reading to Learn/Reading to Integrate Composite. For them, the basic comprehension measure overestimated their level of academic English proficiency. The other two thirds of the high reading ability group remained high, suggesting a higher threshold of academic English proficiency.
Considering results from both the nonnative speakers and native speakers, we conclude that the new tasks did assess something different from basic comprehension once a lower level threshold of basic academic English proficiency had been achieved. Examination of scatterplots based on discriminant functions indicated an obvious separation between participants who could and could not perform on the Reading to Learn/Reading to Integrate Composite, thus providing some tentative evidence for concurrent validity (Messick, 1989; Chapelle, 1999). We had hoped to find clear evidence of a hierarchy, suggesting that Reading to Learn was demonstrably more difficult than basic comprehension and Reading to Integrate demonstrably more difficult than Reading to Learn, but results did not yield an obvious hierarchy. Results did, however, suggest an even simpler pattern: a dichotomy. For those nonnative speakers above the lower threshold of English language proficiency, which in our data would be approximately 500, or 49 on the scaled TOEFL Reading
Comprehension scores, some could perform well on the new measures, some could not, revealing two rather than three groups on the Reading to Learn/Reading to Integrate Composite. These results lead us to speculate that the new measures tap additional skills, such as ‘sophisticated discourse processes and critical thinking skills’ (Enright and Schedl, 1999: 24), in addition to language proficiency. For native speakers, the discriminant loading suggested a possible hierarchy of difficulty, but the scatterplots and classification tables revealed a dichotomy. Reclassification based on the Reading to Learn/Reading to Integrate Composite nearly eliminated the mid basic comprehension reading ability category, with only 16.8% of the native speakers qualifying as mid on the Reading to Learn/Reading to Integrate Composite.
VI Conclusions
While we acknowledge that there are some limitations to this project, such as the time required to administer and score new measures, the results are illuminating and useful, and they suggest some considerations for future research. The first consideration relates to appropriate classification of students based on academic reading abilities. This corresponds to the use of TOEFL as a gatekeeper by many institutions. For TOEFL test-takers whose current TOEFL scores would be in an intermediate to high intermediate range, the new tasks could assess additional abilities relevant to academic performance. For such participants, additional test development efforts are warranted. The majority of high reading ability nonnative speakers (64.9%) could perform the new measures; a third could not. Our results suggest that some unknown number of mid reading ability nonnative speakers are more capable of succeeding at more challenging academic reading tasks than their current level of basic comprehension assessment would indicate. At the same time, there are some high and mid reading ability participants who could not perform the more demanding tasks; admitting such students directly into university programs could result in failure. These results suggest that, for most students at lower levels of basic comprehension (in our study, those with TOEFL scores below 490 to 500), development of new tasks is unnecessary. Nevertheless, our results indicate that a certain small percentage (in our sample, 8 students, or 18.2%) of nonnative speakers classified as low reading ability on basic comprehension did in fact
perform adequately on more challenging tasks like Reading to Learn and Reading to Integrate. Given the very large number of TOEFL test-takers around the world, this finding merits further research. It would be potentially unfair to exclude such students from university study based on basic comprehension-only measures such as the current TOEFL. We would urge institutions that use TOEFL to consider scores on these new, more demanding tasks if doing so would address institutional needs. According to interview data collected in conjunction with this project (Trites, 2000: Chapter 6), participants perceived that successful completion of the new tasks required a thorough understanding of the texts, whereas the multiple-choice tests were less demanding, requiring only superficial grasp of content. This finding corroborates the observation of Freedle and Kostin (1999: 3), who note that often examinees do not need to comprehend the accompanying text to answer the test item. Therefore, for all TOEFL test-takers, incorporating more challenging tasks such as the Reading to Learn and Reading to Integrate measures into typical English language instruction could have positive washback effects.
The second area of consideration is the focus on additional relevant research. For predictive validity, it is important to determine the correlation between the Reading to Learn/Reading to Integrate Composite measures and actual academic performance in university classes. Correlations might differ depending on the degree of categorization and synthesis required by different major fields of study. Another area for future research is related to the novelty of the Reading to Learn task as an assessment technique. Current pedagogical trends in literacy instruction emphasize the use of graphic organizers as a means to understand and manipulate information in texts. To our knowledge, graphic organizers are typically used as classroom activities rather than assessment tools. They have good potential for use in assessment if students are familiar with them and if appropriate scoring systems can be developed. This project demonstrates that it is possible, though labor intensive, to develop reliable scoring systems. Further research is needed to explore refinement of this task type and related scoring systems for use in testing programs with large numbers of test-takers and scorers. Although the Reading to Integrate task (generating a synthesis) was more familiar, the scoring system was innovative because it reflected a reader's ability to recognize textual frames as well as integrate information. If test developers are interested in the abilities of nonnative speaker readers to perform such tasks, further development of similar tasks and scoring systems is warranted. While a substantial investment of time
would be required to refine the administration and scoring systems needed for more complex tasks such as these, it should be weighed against the possible danger of under-representing the likelihood of academic success based on results of the basic comprehension-only measures still most often used to assess academic proficiency.
Acknowledgements
This project was funded by a grant from Educational Testing Service as part of the TOEFL 2000 effort. We appreciate the cooperation of the ETS staff members who assisted in the development of passage-specific tasks and other aspects of the research; however, no official endorsement of Educational Testing Service should be inferred.
VII References
Bachman, L. 2000: Modern language testing at the turn of the century: assuring that what we count counts. Language Testing 17, 1–42.
Biber, D. 1993: Using register-diversified corpora for general language studies. Computational Linguistics 19, 219–41.
Britt, M., Rouet, J. and Perfetti, C. 1996: Using hypertext to study and reason about historical evidence. In Rouet, J., Levonen, J., Dillon, A. and Spiro, R., editors, Hypertext and cognition. Mahwah, NJ: Lawrence Erlbaum, 43–72.
Chapelle, C. 1999: Validity in language assessment. Annual Review of Applied Linguistics 19, 1–19.
Educational Testing Service 1997: TOEFL test and score manual. Princeton, NJ: Educational Testing Service.
Educational Testing Service 1998: Draft TOEFL 2000 research agenda framework: areas of research. Research agenda three. TOEFL 2000 internal document. Princeton, NJ: Educational Testing Service.
Eignor, D., Taylor, C., Kirsch, I. and Jamieson, J. 1998: Development of a scale for assessing the level of computer familiarity of TOEFL examinees. TOEFL Research Report No. 60. Princeton, NJ: Educational Testing Service.
Enright, M. and Schedl, M. 1999: Reading for a reason: using reader purpose to guide test design. TOEFL 2000 Internal Report. Princeton, NJ: Educational Testing Service.
Enright, M., Grabe, W., Mosenthal, P., Mulcahy-Ernt, P. and Schedl, M. 1998: A TOEFL 2000 framework for testing reading comprehension: a working paper. Princeton, NJ: Educational Testing Service.
Foltz, P. 1996: Comprehension, coherence and strategies in hypertext. In Rouet, J., Levonen, J., Dillon, A. and Spiro, R., editors, Hypertext and cognition. Mahwah, NJ: Lawrence Erlbaum, 109–36.
Freedle, R. and Kostin, I. 1999: Does the text matter in a multiple choice test of comprehension? The case for the construct validity of TOEFL's minitalks. Language Testing 16, 2–32.
Goldman, S. 1997: Learning from text: reflections on the past and suggestions for the future. Discourse Processes 23, 357–98.
Hayes, J. and Hatch, J. 1999: Issues in measuring reliability: correlation versus percentage of agreement. Written Communication 16, 354–67.
Huberty, C. 1994: Applied discriminant analysis. New York: John Wiley.
Jamieson, J., Campbell, J., Norfleet, L. and Berbisada, N. 1993: Reliability of a computerized scoring routine for an open-ended task. System 21, 305–22.
Jamieson, J., Norfleet, L. and Berbisada, N. 1993: Successes, failures and dropouts in computer-assisted language lessons. Computer Assisted English Language Learning Journal 4, 12–20.
Klecka, W. 1980: Discriminant analysis. In Lewis-Beck, M., editor, Quantitative applications in the social sciences, Volume 19. Newbury Park, CA: Sage.
Lehto, M., Zhu, W. and Carpenter, B. 1995: The relative effectiveness of hypertext and text. International Journal of Human-Computer Interaction 7, 293–313.
McNamara, D. and Kintsch, W. 1996: Learning from texts: effects of prior knowledge and text coherence. Discourse Processes 22, 247–88.
Messick, S. 1989: Validity. In Linn, R.L., editor, Educational measurement, 3rd edition. New York: American Council on Education/Macmillan, 13–103.
Meyer, B. 1985a: Prose analysis: purposes, procedures and problems. In Britton, B. and Black, J., editors, Understanding expository text. Hillsdale, NJ: Lawrence Erlbaum, 11–64.
Meyer, B. 1985b: Prose analysis: purposes, procedures and problems. Part 2. In Britton, B. and Black, J., editors, Understanding expository text. Hillsdale, NJ: Lawrence Erlbaum, 269–97.
Monks, V. 1997: August/September. Two views, same waterway. National Wildlife 35, 36–37.
Pellegrino, J., Baxter, G. and Glaser, R. 1999: Addressing the ‘two disciplines’ problem: linking theories of cognition and learning with assessment and instructional practice. Review of Research in Education 24, 307–53.
Perfetti, C. 1997: Sentences, individual differences and multiple texts: three issues in text comprehension. Discourse Processes 23, 337–55.
Perfetti, C., Britt, M.A. and Georgi, M. 1995: Text-based learning and reasoning: studies in history. Hillsdale, NJ: Lawrence Erlbaum.
Perfetti, C., Marron, M. and Foltz, P. 1996: Sources of comprehension failure: theoretical perspectives and case studies. In Cornoldi, C. and Oakhill, J., editors, Reading comprehension difficulties: processes and intervention. Mahwah, NJ: Lawrence Erlbaum, 137–65.
Reinking, D. 1988: Computer-mediated text and comprehension differences: the role of reading time, reader preference and estimation of learning. Reading Research Quarterly 23, 484–500.
Reinking, D. and Schreiner, R. 1985: The effects of computer-mediated text on measures of reading comprehension and reading behavior. Reading Research Quarterly 20, 536–53.
Spivey, N. 1997: The constructivist metaphor: reading, writing and the making of meaning. San Diego, CA: Academic Press.
Stevens, J. 1996: Applied multivariate statistics for the social sciences, 3rd edition. Mahwah, NJ: Lawrence Erlbaum.
Tabachnick, B. and Fidell, L. 1996: Using multivariate statistics, 3rd edition. New York: Harper Collins.
Taylor, C., Jamieson, J., Eignor, D. and Kirsch, I. 1998: The relationship between computer familiarity and performance on computer-based TOEFL test tasks. TOEFL Research Report No. 61. Princeton, NJ: Educational Testing Service.
Tennesen, M. 1997: November/December. On a clear day. National Parks 71, 26–9.
Trites, L. 2000: Beyond basic comprehension: reading to learn and reading to integrate for native and non-native speakers. Unpublished doctoral dissertation, Northern Arizona University, Flagstaff, AZ.
Van den Berg, S. and Watt, J. 1991: Effects of educational setting on student responses to structured hypertext. Journal of Computer-Based Instruction 18, 118–24.
Van Dijk, T.A. and Kintsch, W. 1983: Strategies of discourse comprehension. New York: Academic Press.
Wiley, J. and Voss, J. 1999: Constructing arguments from multiple sources: tasks that promote understanding and not just memory for text. Journal of Educational Psychology 91, 310–11.
Zimmerman, T. 1997: December 29. Filter it with billions and billions of oysters: how to revive the Chesapeake Bay. US News and World Report 123, 63.
Appendix 1a Chart completion task
Directions: Complete the following chart. Fill in as much detail as possible from the text read by categorizing the information into the different areas on the chart. Do not use single words for your responses; form your responses in phrases or complete sentences. Include examples from the text.
Make no judgments about the accuracy of causes or effects or the effectiveness of the solutions mentioned in the text. Solutions are seen
as any action taken in response to the problem(s). Solutions can take many forms, such as proposed solutions, attempted solutions, or failed solutions. Also, space is provided under each category for examples. Examples are specific examples found in the text that are used by the author(s) to exemplify the problems, causes, effects, or solutions in the text. However, there may not be examples for every category.
Points will be awarded for correct responses only. There is no penalty for incorrect responses. Points will be awarded in the following manner:
Problems and Solutions: 10 points each
Causes and Effects: 5 points each
Examples: 1 point each
Problems Causes Effects Solutions
Examples Examples Examples Examples
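The point scheme in the directions amounts to a weighted count of correct responses. The following sketch is illustrative only: the function and key names are hypothetical, and it covers just the per-response values stated in the directions, not any further adjustments a scoring rubric might apply.

```python
# Points per correct response, as stated in the task directions.
POINTS_PER_RESPONSE = {
    "problems": 10,
    "solutions": 10,
    "causes": 5,
    "effects": 5,
    "examples": 1,
}

def chart_score(correct_counts):
    """Total points for a completed chart; incorrect responses carry no penalty."""
    return sum(POINTS_PER_RESPONSE[category] * count
               for category, count in correct_counts.items())

# Hypothetical response sheet: number of correct entries in each category.
print(chart_score({"problems": 2, "causes": 3, "effects": 1, "solutions": 1, "examples": 4}))
```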
Appendix 1b Reading to Learn scoring rubric (0–241 total possible)

Problems (10 points; 1 word: 6 points), 40 maximum
• Air pollution/Smog in the National Parks
• Opposition or Ignoring (or lack of cooperation) to environmental standards
• No true ‘point sources’ (myriad of smaller pollution sources rather than one large source)
• National Parks limited jurisdiction of pollution sources outside of the parks

Causes (5 points; 1 word: 3 points), 55 maximum
• Sulfur/nitrogen emissions
• Acid rain
• Ground-level ozone
• Urban/Industrial emissions
• Automobile emissions
• Power plants/factories/plants emissions
• Emissions from Smokestacks (chimneys)
• Emissions from Kilns
• Unregulated pollution/smog (from other countries/Mexico)
• Smoke from Controlled burns
• Emissions from large cities

Effects (5 points; 1 word: 3 points), 30 maximum
• Trees and plants affected (injured); growth hindered
• Visibility decreased
• Metals loosened into waters (surface/ground water polluted)
• Nutrients removed (leached) from soil and/or plants
• Public outcry over stance of big business/government
• Aquatic life injured (damaged)

Solutions (10 points; 1 word: 6 points), 90 maximum
• Stricter Environmental Laws (resolutions/acts)
• 1977 Clean Air Act Amendments labeling NPs as Class I areas (regulations)
• Regional haze regulations proposed by the EPA
• Reduction of allowable pollution standards from industry
• Clean Air Act set visibility goals
• Objecting to construction permits
• Identification of pollution sources
• Industry modifications/Installation of devices (scrubbers) to reduce pollution
• Public pressure to protect parks

Appendix 1b (continued)

Examples for Problems (1 point), 5 maximum
• Great Smoky Mountain National Park
• Grand Canyon National Park
• Smoky Mountain Sources from Ohio, New York, Atlanta, etc.
• Grand Canyon Sources from CA, NV, UT, AZ, NM and Mexico
• TN Luttrell corp building permit situation

Examples for Causes (1 point, or 5 points with overarching cause listed above)
• Automobile emissions
• Power plants/factories/plants emissions
• Emissions from Smokestacks (chimneys)
• Emissions from Kilns
• Unregulated pollution/smog (from other countries/Mexico)
• Smoke from Controlled burns
• Emissions from large cities
Examples (1 point), 10 maximum
• Navajo Generating Station, Page, AZ (Power Plant, AZ)
• Power plants in TN and OH river valley
• Southern California Edison Plant, Laughlin, NV
• Tennessee Luttrell Kilns, TN
• Industry in AZ
• Cars in CA
• Smokestacks in NM, NV, UT
• Los Angeles
• Atlanta
• New York

Examples for Effects (1 point), 6 maximum
• Leaves of plants turning purple and brown (stippling)
• Hindering photosynthesis of plants and trees
• Reduction of visibility at Grand Canyon
• Visibility reduced 90% of the days
• See vague blue masses
• See half as far as in 1919

Examples for Solutions (1 point, or 10 points with overarching solution listed above)
• 1977 Clean Air Act Amendments labeling NPs as Class I areas
• Regional haze regulations proposed by the EPA
• Reduction of allowable pollution standards from industry
Examples (1 point), 5 maximum
• Objecting to kilns in TN
• Researchers using scientific technology to ID sources (radioactive isotopes)
• Southern California Edison Plant in Laughlin, NV identified as point source
• Scrubbers installed at Navajo Generating Plant
• 10% reduction of pollution in 10–15 years (goal of no man-made pollution)
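As a check on the rubric, the category maxima can be summed to confirm the stated 0–241 range. This is a sketch under assumptions: the assignment of each example maximum to a particular column is our reading of the rubric layout, and the names are illustrative. The total does not depend on that assignment.

```python
# Maximum points per category, as stated in the rubric headings.
CATEGORY_MAX = {"problems": 40, "causes": 55, "effects": 30, "solutions": 90}
# Maximum example points per category (column assignment is an assumption).
EXAMPLE_MAX = {"problems": 5, "causes": 10, "effects": 6, "solutions": 5}

def capped_total(category_points, example_points):
    """Sum earned points, never crediting more than the stated maximum per category."""
    total = 0
    for name, cap in CATEGORY_MAX.items():
        total += min(category_points.get(name, 0), cap)
    for name, cap in EXAMPLE_MAX.items():
        total += min(example_points.get(name, 0), cap)
    return total

# A response sheet that reaches every maximum yields the top of the scale.
print(capped_total(CATEGORY_MAX, EXAMPLE_MAX))
```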
Appendix 2a Reading to Integrate task: integration activity
Directions: For the next 15 minutes, reflect on what you read and compose a short essay that combines the information in the material read and makes connections across the range of ideas and arguments presented.
Appendix 2b Reading to Integrate scoring rubric
Integration

50  Excellent    Integrates texts accurately and successfully on multiple levels and creates a true Documents Model. Generalizes at least two macrostructure concepts common across texts (this may be simply identifying the existence of a macrostructure followed by the supporting macrostructures from each text). Effectively integrates relevant support (i.e., details or macrostructures) to support generalizations and may still discuss each article separately to some degree. Integrates macrostructures present in both texts.
40  Good         Integrates through the creation of a well developed and accurate introduction and/or a conclusion, yet summarizes articles separately, AND/OR generalizes one major macrostructure that is fully developed with support. Uses some relevant support.
30  Fair         Attempts to integrate through the creation of a partially developed, possibly inaccurate introductory AND/OR concluding statement, usually related to the main topic of the articles, but has no substantive development. Creates a Text and Situation Model of each text separately. Does not make generalizations for integration beyond the one statement. May make evaluative or editorial statements; however, these may be inaccurate.
20  Poor         Creates a well developed and accurate Text Model and Situation Model for each of the texts, yet attempts no integrative connection across the texts.
10  Very Poor    Ineffectively or inaccurately attempts to summarize each article in a very reporting style, revealing little or no use of background knowledge, contextualization, or evaluation. Response is simply a recall of information from the texts. No introduction or conclusion, OR participant confuses texts and sees separate texts as one issue (problem).
0   No Response  Participant does not attempt response or only addresses one article.
Macrostructures

Participants will receive one point for each macrostructure accurately identified in each text, with a minimum score of 4 awarded for the inability to accurately identify any macrostructures.

25  Excellent (Total 8)    Accurately identifies all 4 macrostructures present in both texts.
20  Good (Total 6–7)       Accurately identifies all 4 macrostructures in one text and 3 in the other, OR accurately identifies 3 of the four macrostructures in both texts (or 4 in one and 2 in the other).
15  Fair (Total 4–5)       Accurately identifies 3 of the four macrostructures in one text and 2 of the four in the other (or 4 in one and 1 in the other), OR accurately identifies 2 of the four macrostructures in both texts (or 4 in one and 0 in the other, or 3 in one and 1 in the other).
10  Poor (Total 2–3)       Accurately identifies 2 of the macrostructures in one text and 1 in the other (or 3 in one and 0 in the other). Accurately identifies 1 of the four macrostructures in both texts (or 2 in one and 0 in the other).
5   Very Poor (Total 0–1)  Accurately identifies 1 of the four macrostructures in one text and 0 in the other, OR unable to accurately identify any macrostructures in the texts.
0   No Response            Participant does not attempt response.
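The macrostructure bands above amount to a lookup on the total number of macrostructures identified across the two texts. The following is a minimal sketch of that lookup; the function name is hypothetical, and treating a non-response (0 points) as a separate case outside the function is our assumption.

```python
def macrostructure_score(text_a: int, text_b: int) -> int:
    """Map macrostructures identified in each text (0-4) to rubric points."""
    for count in (text_a, text_b):
        if not 0 <= count <= 4:
            raise ValueError("each text has exactly 4 macrostructures")
    total = text_a + text_b
    if total == 8:
        return 25  # Excellent
    if total >= 6:
        return 20  # Good (total 6-7)
    if total >= 4:
        return 15  # Fair (total 4-5)
    if total >= 2:
        return 10  # Poor (total 2-3)
    return 5       # Very Poor (total 0-1); non-response is scored 0 elsewhere


print(macrostructure_score(4, 3))  # a response with 7 macrostructures in total
```

Because every band is defined by the total alone, the alternative combinations listed in the rubric (for example, 4 and 2 versus 3 and 3) fall out of the same comparison.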
Use of relevant details
5  Excellent    Effectively uses multiple relevant details as support; no irrelevant or erroneous details.
4  Good         Effectively uses relevant details as support; may include some inaccurate or irrelevant details. (More than 50% of details are relevant.)
3  Fair         Possibly uses one or two relevant details, yet several irrelevant details or inaccurate details (erroneous or fabricated) appear in the synthesis. (50% or less of details are relevant.)
2  Poor         Erroneous or fabricated details used in attempt to support arguments presented; does not include relevant details.
1  Very Poor    No details used or listed.
0  No Response  Participant does not attempt response.
were given 15 minutes to answer BC1. Following these new testing sessions, 49 participants were selected for a related interview concerning the cognitive processes used in task completion (for further details, see Trites, 2000: Chapter 6).
During Session 4, students were given 12 minutes to read two short texts (600 words each), either on computer or on paper. After participants read the assigned texts, they were given 4 minutes to take one-half page of notes (Enright et al., 1998). Next, the texts were removed and participants were asked to demonstrate Reading to Integrate by writing a synthesis of the texts with the aid of their notes (15 minutes allowed for this task). After completing the Reading to Integrate task, participants were allowed to see the texts again and answered BC2 (15 minutes allowed for this task). In one Reading to Integrate session, for unknown reasons, six of the seven participants read only one text. Because we cannot explain the cause of this anomalous session, we have eliminated scores from the session's seven participants from subsequent analyses, thus slightly reducing the N size for the Reading to Integrate measure.
b Variables used in study: The six independent variables included three nominal variables (Native Language Background, Medium of Text Presentation, and Level of Education) and three interval variables (Nelson–Denny, TOEFL Reading Comprehension, and Computer Familiarity). The four dependent variables were Reading to Learn, Reading to Integrate, BC1, and BC2.
IV Results
First, we present the descriptive statistics for all reading measures, followed by a systematic analysis of independent variables that might affect participant performance on the new measures. Scatterplots were checked for all reading measures to ensure normality of data. Kurtosis and skewness levels for all reading measures were found to be within normal limits, indicating a relatively normal distribution. Descriptive statistics for all existing measures are shown in Table 1. Means for these measures show a consistent pattern: the native speaker undergraduates had the highest mean, followed by the nonnative speaker graduates, followed by the nonnative speaker undergraduates. On the reading measures, Nelson–Denny and TOEFL Reading Comprehension, the nonnative speaker undergraduates showed the largest variance in performance, while on the computer familiarity measure the variance of both nonnative speaker groups was substantially larger than that of the native speakers.
The same pattern emerged for the means on the new measures (see Table 2) as for the existing measures. The native speaker undergraduate group performed better on all new measures than both of the nonnative speaker groups. The nonnative speaker graduate group performed better than the nonnative speaker undergraduate group on all measures as well. This robust pattern of performance was also found in the variance of three of the four new measures. On BC1 and BC2, the performance of the native speaker undergraduates showed the least amount of variance, followed by the nonnative speaker graduates, followed by the nonnative speaker undergraduates. On Reading to Integrate, the native speaker undergraduate group showed substantially less variance than the nonnative speaker groups; however, the variance of the two nonnative speaker groups was almost identical. On Reading to Learn, all three groups showed considerable variance.
Table 3 reveals the range of awarded points achieved by all participant groups. The nature of the Reading to Learn point system created a maximum possible point value (241) that no participant achieved. We speculate that there are at least three possible causes of the discrepancy between the theoretical maximum and the range of observed scores:
Table 1 Descriptive statistics for existing measures for three participant groups

Group                   n      Mean     sd      kMax

Nelson–Denny
NSU                     105    126.48   16.46   156
NNSU                    106     67.24   31.91   156
NNSG                     40     88.88   21.88   156
Total participants      251     95.47   36.93   156

TOEFL Reading Comprehension
NSU                     105     61.30    4.24    67
NNSU                    106     50.30    8.53    67
NNSG                     40     57.15    4.55    67
Total participants      251     55.99    8.19    67

Computer familiarity
NSU                     104     38.08    3.60    44
NNSU                    104     34.82    5.99    44
NNSG                     40     35.63    6.02    44
Total participants      248     36.31    5.33    44

Note: kMax = number of items or maximum possible score.
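The "Total participants" rows in Table 1 can be recovered from the group rows as n-weighted means. The sketch below illustrates this for the Nelson–Denny; it assumes the table values carry two decimal places (our reading of the extracted digits), and the function name is hypothetical.

```python
def pooled_mean(groups):
    """Return the n-weighted mean of (n, mean) pairs."""
    total_n = sum(n for n, _ in groups)
    return sum(n * mean for n, mean in groups) / total_n


# (n, mean) for NSU, NNSU, and NNSG on the Nelson-Denny, read from Table 1
nelson_denny = [(105, 126.48), (106, 67.24), (40, 88.88)]

print(round(pooled_mean(nelson_denny), 2))  # 95.47, the reported total mean
```

The agreement with the reported total (95.47) supports the two-decimal reading of the table entries.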
• task novelty: no participant reported ever doing such a task previously;
• time allowed for task completion; and
• space on the response sheet: space constraints may have limited the amount of information that participants could include.
Future research would need to address these issues. However, for the Reading to Integrate measure, the full range of possible point totals was achieved by at least one participant in each group.
1 Computer familiarity
The overall plan for the analyses was to check the influence of the independent variables on the dependent measures, with computer familiarity being addressed first. Initially, we had proposed that if computer familiarity was significantly different across groups, it would be entered into all calculations as a covariate. To determine this, it was necessary to conduct an Analysis of Variance (ANOVA) for computer familiarity across the six participant/medium subgroups.
Table 2 Descriptive statistics for new measures for three participant groups

Group                   n      Mean    sd      kMax

Reading to Learn (chart)
NSU                     105    51.85   19.86   241
NNSU                    106    31.73   19.50   241
NNSG                     40    44.68   19.27   241
Total participants      251    42.21   21.64   241

Basic Comprehension Test 1
NSU                     105    16.98    2.47    20
NNSU                    106    11.73    4.25    20
NNSG                     40    14.98    3.50    20
Total participants      251    14.44    4.23    20

Reading to Integrate (synthesis)
NSU                     101    63.65   11.05    80
NNSU                    103    37.24   21.76    80
NNSG                     40    53.60   21.03    80
Total participants      244    50.86   21.63    80

Basic Comprehension Test 2
NSU                     105    15.91    2.78    20
NNSU                    106     9.75    4.54    20
NNSG                     40    12.85    3.61    20
Total participants      251    12.82    4.72    20

Notes: kMax = number of items or maximum possible score; n size reduced for Reading to Integrate because of anomalous testing session.
The resulting ANOVA (F = 4.70, p < .05) showed a significant difference between subgroups on the computer familiarity questionnaire; therefore, a post hoc Scheffé test was done to locate significant contrasts. After analysis of all possible subgroup contrasts, the post hoc Scheffé revealed that the only significant difference in subgroups appeared between the native speaker undergraduates and nonnative speaker undergraduates who read texts on paper. Hence, although there was one significant contrast, it occurred in two subgroups reading on paper, not in any of the subgroups who read on computer. All groups generally scored high on computer familiarity, although, as noted, variance of the nonnative groups was greater. It was thus established that computer familiarity had no significant effect on participants who read texts on computer, so we did not use computer familiarity as a covariate in further analyses and proceeded to the three research questions of central interest to this study.
Because Research Questions 1 and 2 are similar, except that they address the two different new reading measures (Reading to Learn and Reading to Integrate), we approached them in the same manner, through ANOVA, to identify the independent variables that could have significantly affected the results on the new measures.
2 Research Question 1
The first research question asked if performance on a measure of Reading to Learn was affected by medium of presentation, computer familiarity, native language, or level of education. We calculated a univariate ANOVA with Type III sums of squares on Reading to Learn with
Table 3 Range of scores for new measures for three participant groups

Group                   n      Minimum   Maximum   kMax

Reading to Learn (chart)
NSU                     105    14        120       241
NNSU                    106     0         86       241
NNSG                     40     3         94       241
Total participants      251     0        120       241

Reading to Integrate (synthesis)
NSU                     101    38         80        80
NNSU                    103     0         80        80
NNSG                     40     5         80        80
Total participants      244     0         80        80

Notes: kMax = number of items or maximum possible score; n size reduced for Reading to Integrate because of anomalous testing session.
group status, medium of text presentation, and test order as possible contributing factors.⁴ Table 4 shows that there were no significant interactions for any of the group, medium, or test order combinations. The only significant main effect was group membership.
Because group membership was a combined measure that included both native language background and level of education, post hoc analysis was needed to identify the significant contrasts. Table 5 shows that there was a significant difference in performance on the Reading to Learn measure between the native speaker undergraduate and the nonnative speaker undergraduate groups, as well as a significant difference between the nonnative speaker undergraduate and nonnative speaker graduate groups. There was no significant difference in performance between the native speaker undergraduate and the nonnative speaker graduate groups. Therefore, the answer to Research Question 1 is that native language background and level of education did have a significant effect on performance on the Reading to Learn measure, but that medium of text presentation did not. Further, order of testing (whether participants took Reading to Learn or Reading to Integrate first) had no significant effect.
3 Research Question 2
The second research question, related to the first, asked if performance on Reading to Integrate was affected by medium of presentation,
Table 4 Performance on Reading to Learn measure by groups, medium, and test order (n = 251) (univariate analysis of variance)

Source                          Type III sum of squares   df    Mean square   F

Group                           21869.29                    2   10934.64      28.10*
Medium                           1215.86                    1    1215.86       3.13
Test order                        294.98                    1     294.98       0.76
Group × medium                    334.81                    2     167.40       0.43
Group × test order                  4.37                    2       2.19       0.01
Medium × test order               391.73                    1     391.73       1.01
Group × medium × test order       575.29                    2     287.65       0.74
Error                           92997.45                  239     389.11

Note: *p < .05
⁴ Test order was added as an additional variable to double-check that our counterbalancing had been effective in controlling for any practice effect.
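The F values in Table 4 follow from dividing each effect's mean square by the error mean square. The sketch below reproduces two of them; it assumes the table entries carry two decimal places (our reading of the extracted digits), and the names are hypothetical.

```python
MS_ERROR = 389.11  # error mean square from Table 4 (df = 239)


def f_ratio(ms_effect: float, ms_error: float = MS_ERROR) -> float:
    """F statistic for one effect: effect mean square over error mean square."""
    return ms_effect / ms_error


mean_squares = {"group": 10934.64, "test order": 294.98}
for source, ms in mean_squares.items():
    print(source, round(f_ratio(ms), 2))
```

Recomputed values may differ from the printed table in the last digit for some rows, because the published mean squares are themselves rounded.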
computer familiarity, native language, or level of education. Again, to ensure that counterbalancing of tests controlled for any practice effect, test order was added as an additional variable.
To answer this question, we proceeded to calculate a univariate ANOVA on the Reading to Integrate measure, with group status, medium of text presentation, and test order entered as possible contributing factors. The results (Table 6) show, as for Research Question 1, that there were no significant interactions for any of the group, medium, or test order combinations; the only significant main effect was group membership. The answer for Research Question 2 is that native language background and educational level had a significant effect on Reading to Integrate, but medium of text presentation did not. Post hoc analysis of group contrasts showed that all three groups were distinct in their performance on Reading to Integrate (see Table 7).
4 Research Question 3
The third research question asked to what extent measures of basic comprehension, Reading to Learn, and Reading to Integrate were
Table 5 Post hoc Scheffé for Reading to Learn measure (n = 251)

Group   n      Group   n      Mean difference   Standard error

NSU     105    NNSU    106    20.12*            2.72
               NNSG     40     7.17             3.67
NNSU    106    NNSG     40    12.95*            3.66

Note: *p < .05
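The contrasts in Table 5 can be checked by hand. A Scheffé contrast between two groups is significant when the absolute mean difference exceeds the square root of (k − 1) × F-critical, multiplied by the standard error of the difference. The sketch below is an illustration under stated assumptions: the error mean square (389.11) is read from Table 4 with two decimal places, the critical value of about 3.03 for F(.05; 2, 239) is an approximate tabled value that we supply, and the function names are hypothetical.

```python
import math

MS_ERROR = 389.11  # error mean square from Table 4
K_GROUPS = 3       # NSU, NNSU, NNSG
F_CRIT = 3.03      # approximate tabled F(.05; 2, 239); an assumption


def scheffe_se(n1: int, n2: int, ms_error: float = MS_ERROR) -> float:
    """Standard error of the difference between two group means."""
    return math.sqrt(ms_error * (1 / n1 + 1 / n2))


def scheffe_significant(mean_diff: float, n1: int, n2: int) -> bool:
    """True if the pairwise contrast exceeds the Scheffe critical difference."""
    critical = math.sqrt((K_GROUPS - 1) * F_CRIT) * scheffe_se(n1, n2)
    return abs(mean_diff) > critical


print(round(scheffe_se(105, 106), 2))        # standard error for NSU vs NNSU
print(scheffe_significant(20.12, 105, 106))  # NSU vs NNSU
print(scheffe_significant(7.17, 105, 40))    # NSU vs NNSG
print(scheffe_significant(12.95, 106, 40))   # NNSU vs NNSG
```

Under these assumptions the check agrees with the pattern reported for Reading to Learn: the NSU–NNSU and NNSU–NNSG contrasts are significant, and the NSU–NNSG contrast is not.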
Table 6 Performance on Reading to Integrate measure by groups, medium, and test order (n = 244ᵃ) (univariate analysis of variance)

Source                          Type III sum of squares   df    Mean square   F

Group                           36294.33                    2   18147.17      55.82ᵇ
Medium                            192.95                    1     192.95       0.59
Test order                         97.83                    1      97.83       0.30
Group × medium                     11.82                    2       5.91       0.02
Group × test order                 30.14                    2      15.07       0.05
Medium × test order              1098.72                    1    1098.72       3.38
Group × medium × test order      1488.58                    2     744.29       2.29
Error                           75430.37                  232     325.13

Notes: ᵃ n size reduced for Reading to Integrate because of anomalous testing session; ᵇ p < .05
related. We used correlational analysis as the first step in answering this question. Results for the total participant population (see Table 8) showed moderate to high correlations across all reading measures. However, the analyses done for Research Questions 1 and 2 revealed that group status had a significant effect on performance on Reading to Learn and Reading to Integrate. Further, we realize that correlations are sensitive to variance, so the high correlations seen in the total population could have been an artifact of combining the three groups. Therefore, we examined the correlations among all reading measures for each group (available in Trites, 2000: Appendix 1, pp. 230–33). While the reading measures were still correlated, often moderately, sometimes highly, magnitudes differed and sometimes dropped substantially. The text-specific multiple-choice measures, BC1 and BC2, consistently correlated more highly with the Nelson–Denny and TOEFL Reading Comprehension tests than with Reading to Learn and Reading to Integrate based on the same texts, suggesting a test method or construct effect. Because comparisons between different measures of basic comprehension were not a goal of the project, BC1 and BC2 were not used in further analyses. We conclude that, as expected, all reading measures were related, but the lower correlations between Reading to Learn and Reading to Integrate and the traditional basic comprehension measures led us to consider further types of analysis to identify the possible distinctiveness of the new measures.
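The concern that pooled correlations may be an artifact of combining groups can be illustrated with a small synthetic example. The data below are invented solely for illustration and are not drawn from the study; within each hypothetical group the two measures are uncorrelated, yet pooling the groups produces a high correlation because the groups differ in level on both measures.

```python
import math


def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical scores for a low-scoring and a high-scoring group
low_x, low_y = [0, 1, 2], [1, 0, 1]
high_x, high_y = [10, 11, 12], [11, 10, 11]

print(pearson(low_x, low_y))                     # within-group correlation
print(pearson(high_x, high_y))                   # within-group correlation
print(pearson(low_x + high_x, low_y + high_y))   # pooled correlation
```

This is why examining the correlations separately for each participant group is a useful check on the values computed for the total population.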
5 Discriminant analysis
Because we were interested in determining how constructs differed, we sought additional analyses to help us better characterize the new constructs. Of the several possible statistical methods that could have been employed, two are most plausible: multivariate analysis of variance, usually associated with experimental research,
Table 7 Post hoc Scheffé for Reading to Integrate measure (n = 244ᵃ)

Group   n      Group   n      Mean difference   Standard error

NSU     101    NNSU    103    26.41ᵇ            2.53
               NNSG     40    10.05ᵇ            3.37
NNSU    103    NNSG     40    16.36ᵇ            3.36

Notes: ᵃ n size reduced for Reading to Integrate because of anomalous testing session; ᵇ p < .05
Table 8 Correlations for all reading measures for all participants (n = 251)

                               TOEFL Reading    Basic Compre-    Basic Compre-    Reading     Reading to
                               Comprehension    hension Test 1   hension Test 2   to Learn    Integrateᵃ

Nelson–Denny                   .90ᵇ             .85ᵇ             .84ᵇ             .66ᵇ        .69ᵇ
TOEFL Reading Comprehension    1.00             .85ᵇ             .84ᵇ             .64ᵇ        .69ᵇ
Basic Comprehension 1                           1.00             .84ᵇ             .68ᵇ        .68ᵇ
Basic Comprehension 2                                            1.00             .68ᵇ        .70ᵇ
Reading to Learn                                                                  1.00        .59ᵇ

Notes: ᵃ n size reduced for Reading to Integrate because of anomalous testing session; ᵇ p < .05
and discriminant analysis, usually associated with descriptive research (Tabachnick and Fidell, 1996). The present research was conducted with samples of naturally occurring student groups and was not experimental. Moreover, we were interested in finding ways to compare participant performance on the measures of the new constructs, Reading to Learn and Reading to Integrate, with performance on more traditional measures of basic comprehension. Thus, we opted to use discriminant analysis because of its parsimony of description and clarity of interpretation (Stevens, 1996). Discriminant analysis, a technique recommended to describe group differences or predict group membership based on a comparison of multiple predictors (Huberty, 1994), has been used in other areas of applied linguistic research to investigate creation of a student profile of success or failure on Computer Assisted Language Learning (CALL) lessons (Jamieson et al., 1993) and accurate classification of text types into registers (Biber, 1993), among other purposes.
To further distinguish basic comprehension from the new constructs, we conducted discriminant analysis on each of the two language groups (native and nonnative) to determine whether Reading to Learn and Reading to Integrate would classify participants in the same way that Basic Comprehension would. We divided the native speaker and nonnative speaker groups into three levels (high, middle (mid), and low reading ability scorers) based on the basic comprehension measure chosen for that group (Nelson–Denny for native speakers; TOEFL Reading Comprehension for nonnative speakers). Research methodologists (Tabachnick and Fidell, 1996: 513) note that robustness is expected when the smallest group has at least 20 cases; our smallest group was 25. To check the assumption of homogeneity of the variance/covariance matrices, we examined the outcomes of Box's M Test and found them all nonsignificant (Klecka, 1980). Each group was checked for outliers using Mahalanobis distance treated as Chi-Square, and no outliers were found (Tabachnick and Fidell, 1996). Thus, the data met all assumptions required for use of discriminant analysis.
a Discriminant analysis for nonnative speakers: To organize the discriminant analysis in order to see if the Reading to Learn/Reading to Integrate Composite classified participants similarly to the measure of Basic Comprehension (for nonnative speakers, TOEFL Reading Comprehension), we divided the entire nonnative speaker group (n = 146) into three levels of basic comprehension: high, mid,
and low. These reading ability groups of high, mid, and low were based on typical TOEFL Reading Comprehension score levels required for program entry. Participants were classified as high if their scores were greater than or equal to 550, or 56 and above on the scaled score on the TOEFL Reading Comprehension (550 is the cut score often used for graduate entry). Participants were classified as mid if scores ranged between 500 and 549, or 50 to 55 on the TOEFL Reading Comprehension. Participants were classified as low if their scores fell below 500, or 49 and below on TOEFL Reading Comprehension (500 is a minimum TOEFL score sometimes used for undergraduate admission, often with the proviso that students enroll in ESL classes either prior to official enrollment or concurrently). For our entire nonnative speaker group, descriptive statistics on basic comprehension reading ability group membership levels appear in Table 9.
We ran SPSS Discriminant Analysis with initial grouping variables of high, mid, and low reading ability. We compared initial reading ability levels with high, mid, and low categories on the Reading to Learn/Reading to Integrate Composite, a new variable reflecting level of performance on the Reading to Learn and Reading to Integrate measures combined.⁵ The discriminant analysis yielded one discriminant function with an eigenvalue of .92, responsible for 99.9% of the variance in outcomes. Wilks' Lambda for this function was .52, significant at .001; the associated Chi-Square value was extremely large (90.93) and highly significant (p < .001), indicating the group centroids on the composite Reading to Learn/Reading to Integrate function for the three nonnative speaker reading ability groups were significantly different. Both Reading to Learn and
Table 9 Descriptive statistics for nonnative speakers by TOEFL Reading Comprehension reading ability groups (n = 143)

Reading ability group   n      Mean on TOEFL Reading Comprehension   sd

High (≥ 56)              57    59.68                                 3.03
Mid (50–55)              42    52.81                                 1.67
Low (≤ 49)               44    42.11                                 5.47
Total                   143    52.26                                 8.22

Note: Total n = 143 due to loss of three cases in anomalous Reading to Integrate session.
⁵ We first calculated two separate discriminant analyses, one for Reading to Learn and one for Reading to Integrate, but we found that both loaded on a single function, so we used the composite in subsequent analyses.
Reading to Integrate loaded significantly on the discriminant function, at .86 for Reading to Learn and .78 for Reading to Integrate (p < .05).
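The reported eigenvalue and Wilks' Lambda for the nonnative speaker analysis are linked by a standard identity: Lambda is the product of 1/(1 + eigenvalue) over the discriminant functions. The sketch below applies that identity to the first function only, treating the remaining 0.1% of variance as negligible (our assumption); the function names are hypothetical.

```python
import math


def wilks_lambda(eigenvalues):
    """Wilks' Lambda as the product of 1 / (1 + eigenvalue) over all functions."""
    return math.prod(1 / (1 + ev) for ev in eigenvalues)


def canonical_correlation(eigenvalue: float) -> float:
    """Canonical correlation implied by one discriminant function's eigenvalue."""
    return math.sqrt(eigenvalue / (1 + eigenvalue))


print(round(wilks_lambda([0.92]), 2))           # 0.52, matching the reported value
print(round(canonical_correlation(0.92), 2))    # implied canonical correlation
```

An eigenvalue of .92 therefore corresponds to a Lambda of about .52, which is consistent with the values reported above.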
Over half (64.9%) of the high reading ability group remained high on the new measure; less than half (42.9%) of the mid reading ability group remained classified as mid. Hence, the Reading to Learn/Reading to Integrate Composite was particularly influential in reclassifying the mid reading ability group and, to a lesser extent, the high group. However, most (81.8%) of the low reading ability group remained low on the Reading to Learn/Reading to Integrate Composite (see Table 10). Of the 143 nonnative speaker participants, 91 (64%) remained in the initial basic comprehension category on the composite; the rest moved, but in different directions. Twenty-one participants (14.7%) were classified into a higher category on the Reading to Learn/Reading to Integrate Composite than their initial basic comprehension level would have suggested, while 31 (21.7%) were reclassified into a lower category. Thus, 52 participants (36.4%), just over one third of the sample, were classified differently based on their Reading to Learn/Reading to Integrate Composite performance.
b Discriminant analysis for native speakers: Because one of our goals in this project was to probe the possible validity of these new measures by assessing performance of two groups, native as well as
Table 10 Discriminant analysis: comparison of nonnative speaker reading ability groups with Reading to Learn/Reading to Integrate Composite (n = 143)

                          Predicted group membership for Reading to
                          Learn/Reading to Integrate Composite         Initial
Reading ability group     High      Mid       Low                      classification total

Count
High (≥ 56)               37        17         3                        57
Mid (50–55)               13        18        11                        42
Low (≤ 49)                 2         6        36                        44
Reclassification total    52        41        50                       143

Percentage
High (≥ 56)               64.9      29.8       5.3                     100.0
Mid (50–55)               31.0      42.9      26.2                     100.0
Low (≤ 49)                 4.5      13.6      81.8                     100.0

Note: Total n = 143 due to loss of three cases in anomalous Reading to Integrate session.
nonnative speakers, we conducted a parallel discriminant analysis for native speakers. Thus, for the native speakers, we followed the same procedure, dividing the entire native speaker group (n = 105) into three levels of basic comprehension (high, mid, and low) based on score distances of .5 standard deviations from the sample mean of the Nelson–Denny test for these participants. Native speakers need not take reading comprehension tests when entering the university, so the three-way split was based entirely on our sample data. Participants were classified as high if their scores on the Nelson–Denny were greater than or equal to 135. Participants were classified as mid if their scores ranged between 119 and 134 on the Nelson–Denny. Participants were classified as low if their scores fell at or below 118 on the Nelson–Denny. Descriptive statistics for the entire native speaker sample on basic comprehension group membership appear in Table 11.
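The three-way split for native speakers can be reproduced from the sample statistics. The sketch below assumes the Nelson–Denny mean and standard deviation for the full native speaker group (126.48 and 16.46, our two-decimal reading of Table 1) and a band of half a standard deviation on either side of the mean; the function names are hypothetical.

```python
def half_sd_cutoffs(mean: float, sd: float, width: float = 0.5):
    """Return the (low, high) cut points at mean -/+ width * sd."""
    return mean - width * sd, mean + width * sd


def classify(score: float, low_cut: float, high_cut: float) -> str:
    """Assign a reading ability level from the two cut points."""
    if score > high_cut:
        return "high"
    if score < low_cut:
        return "low"
    return "mid"


low_cut, high_cut = half_sd_cutoffs(126.48, 16.46)
print(round(low_cut, 2), round(high_cut, 2))  # 118.25 134.71
for score in (118, 119, 134, 135):
    print(score, classify(score, low_cut, high_cut))
```

With whole-number test scores, these cut points yield the ranges stated above: 118 and below is low, 119 to 134 is mid, and 135 and above is high.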
The discriminant analysis yielded one discriminant function with an eigenvalue of .31, responsible for 98.7% of the variance in outcomes. Wilks' Lambda for this function was .76, significant at .001; the associated Chi-Square value was large (26.39) and highly significant (p < .001), indicating the group centroids on the discriminant function for the three reading ability groups on the Reading to Learn/Reading to Integrate Composite were significantly different. Pooled within-groups correlations between discriminating variables showed that Reading to Learn correlated with the first discriminant function at a level of .81; Reading to Integrate correlated with the second discriminant function at .71. This contrasts with findings for the nonnative speakers, where scores on the combined new measures loaded significantly on only one discriminant function. For native speakers, then, there is evidence for two significant discriminant functions, although the first accounts for almost all of the variance. Although these two measures (Reading to Learn and Reading to Integrate) loaded on two separate discriminant functions, they still
Table 11 Descriptive statistics for native speakers by Nelson–Denny reading ability groups (n = 101)

Reading ability group   n      Mean on Nelson–Denny   sd

High (≥ 135)             39    141.26                  5.19
Mid (119–134)            37    125.65                  5.09
Low (≤ 118)              25    102.88                 10.98
Total                   101    126.04                 16.52

Note: Total n = 101 for discriminant analysis due to loss of four cases in anomalous Reading to Integrate session.
showed moderate correlations with the alternate function,⁶ justifying the composite calculations.
Results of discriminant analysis for native speakers, seen in Table 12, show a different pattern than that observed for nonnative speakers. Nearly three-fourths (71.8%) of the native speakers classified as high in the basic comprehension reading ability group remained high on the Reading to Learn/Reading to Integrate Composite. For the mid group, however, only 18.9% remained classified as mid. Just over half (52%) of the low reading ability group members remained low on the Reading to Learn/Reading to Integrate Composite. As with the nonnative speakers, participants in the mid category on basic comprehension showed the most frequent reclassification. Forty-eight of the 101 (47.5%) native speaker participants remained in the initial classification categories. Twenty-two (21.8%) were reclassified into a higher category and 31 (30.7%) were reclassified into a lower category. Thus, 53 participants (52.5%), over half of the sample, were classified differently based on their performance on the Reading to Learn/Reading to Integrate Composite.
⁶ Reading to Learn correlated with function 2 at -.58; Reading to Integrate with function 1 at .71.
Table 12 Discriminant analysis: comparison of native speaker reading ability groups with Reading to Learn/Reading to Integrate Composite (n = 101)

                          Predicted group membership for Reading to
                          Learn/Reading to Integrate Composite         Initial
Reading ability group     High      Mid       Low                      classification total

Count
High (≥ 135)              28         3         8                        39
Mid (119–134)             10         7        20                        37
Low (≤ 118)                5         7        13                        25
Reclassification total    43        17        41                       101

Percentage
High (≥ 135)              71.8       7.7      20.5                     100.0
Mid (119–134)             27.0      18.9      54.1                     100.0
Low (≤ 118)               20.0      28.0      52.0                     100.0

Note: Total n = 101 for discriminant analysis due to loss of four cases in anomalous Reading to Integrate session.
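The reclassification totals for native speakers can be tallied directly from the counts in Table 12. The sketch below assumes rows are the initial levels and columns the predicted levels, both ordered high, mid, low; the names are hypothetical.

```python
# Rows: initial level (high, mid, low); columns: predicted level (high, mid, low)
table_12 = [
    [28, 3, 8],
    [10, 7, 20],
    [5, 7, 13],
]


def reclassification_summary(matrix):
    """Count participants who stayed, moved to a higher level, or moved lower."""
    same = higher = lower = 0
    for initial, row in enumerate(matrix):
        for predicted, count in enumerate(row):
            if predicted == initial:
                same += count
            elif predicted < initial:  # an earlier column is a higher level
                higher += count
            else:
                lower += count
    return same, higher, lower


same, higher, lower = reclassification_summary(table_12)
total = same + higher + lower
print(same, higher, lower)                 # 48 22 31
print(round(100 * same / total, 1))        # 47.5
```

The same tally applies to any square classification table with levels in the same order on both axes.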
V Interpretation
Analyses done to answer Research Questions 1 and 2 showed that performance on Reading to Learn and Reading to Integrate measures was significantly influenced by language background and level of education (graduate vs. undergraduate, for nonnative speakers only). Moreover, level of computer familiarity had no significant effect on Reading to Learn and Reading to Integrate performance.
Correlations showed that, as expected, all reading measures correlated to some degree, generally answering Research Question 3. The most interesting results came from the discriminant analyses, because they showed a pattern of differential classification on the new tasks sensitive to initial reading ability. This pattern differed for nonnative speakers and native speakers. Examination of the reclassifications based on the new tasks revealed some of the problems of using a basic comprehension-only test to predict performance on more challenging literacy tasks. These results also imply the need for further work on new measures of advanced literacy skills, such as Reading to Learn and Reading to Integrate, to reflect trends in construct-driven assessment (Pellegrino et al., 1999).
For the nonnative speakers, most (81.8%) of the participants with TOEFL Reading Comprehension below 50 remained classified as low on the Reading to Learn/Reading to Integrate Composite, suggesting the existence of a lower threshold of academic English proficiency. Participants below this threshold were unlikely to perform well on tasks assessing Reading to Learn and Reading to Integrate. Even tasks involving only selection (such as BC1 and BC2) rather than production of responses were difficult for this group. However, for the nonnative speaker readers in the mid and high reading ability groups, basic comprehension level was not nearly as consistent a predictor of performance on the Reading to Learn/Reading to Integrate Composite. This was especially striking for those in the mid reading ability group. Approximately one fourth of the mid group dropped into the low category on the Reading to Learn/Reading to Integrate Composite, indicating that they experienced more difficulty in completing the new measures, but approximately one fourth could indeed categorize and synthesize information relatively well despite middling performance on basic comprehension. This finding suggests that, for nonnative speaker readers in the mid reading ability category, basic comprehension measures were insufficient to predict their performance on the new measures.
Almost two-thirds of the high basic comprehension nonnative speakers remained high on the Reading to Learn/Reading to Integrate Composite, but one third dropped, suggesting that they could not categorize and synthesize information from texts as well as they could recognize information. For this third, the recognition-only task, multiple-choice basic comprehension, overestimated ability to manipulate and synthesize information. For the high and mid reading ability nonnative speaker groups, basic comprehension-only tests may then overestimate academic English proficiency.
Results for the native speakers showed a more definitive pattern of reclassification. For the low reading ability group, approximately half were misclassified, suggesting that the basic comprehension measure underrepresented their ability to categorize and synthesize information in academic texts. Most of the mid reading ability group were reclassified as lower on the Reading to Learn/Reading to Integrate Composite, showing that basic comprehension results overestimated ability to succeed on more challenging tasks. On the other hand, almost one third of the mid reading ability group did better on the Reading to Learn/Reading to Integrate Composite; for them, basic comprehension underrepresented their ability to complete more challenging reading tasks. For the high ability readers, nearly one third dropped on the Reading to Learn/Reading to Integrate Composite. For them, the basic comprehension measure overestimated their level of academic English proficiency. The other two thirds of the high reading ability group remained high, suggesting a higher threshold of academic English proficiency.
Considering results from both the nonnative speakers and native speakers, we conclude that the new tasks did assess something different from basic comprehension once a lower level threshold of basic academic English proficiency had been achieved. Examination of scatterplots based on discriminant functions indicated an obvious separation between participants who could and could not perform on the Reading to Learn/Reading to Integrate Composite, thus providing some tentative evidence for concurrent validity (Messick, 1989; Chapelle, 1999). We had hoped to find clear evidence of a hierarchy suggesting that Reading to Learn was demonstrably more difficult than basic comprehension, and Reading to Integrate demonstrably more difficult than Reading to Learn, but results did not yield an obvious hierarchy. Results did, however, suggest an even simpler pattern: a dichotomy. For those nonnative speakers above the lower threshold of English language proficiency, which in our data would be approximately 500, or 49 on the scaled TOEFL Reading Comprehension scores, some could perform well on the new measures and some could not, revealing two rather than three groups on the Reading to Learn/Reading to Integrate Composite. These results lead us to speculate that the new measures tap additional skills, such as ‘sophisticated discourse processes and critical thinking skills’ (Enright and Schedl, 1999: 24), in addition to language proficiency. For native speakers, the discriminant loading suggested a possible hierarchy of difficulty, but the scatterplots and classification tables revealed a dichotomy. Reclassification based on the Reading to Learn/Reading to Integrate Composite nearly eliminated the mid basic comprehension reading ability category, with only 16.8% of the native speakers qualifying as mid on the Reading to Learn/Reading to Integrate Composite.
VI Conclusions
While we acknowledge that there are some limitations to this project, such as the time required to administer and score new measures, the results are illuminating and useful, and they suggest some considerations for future research. The first consideration relates to appropriate classification of students based on academic reading abilities. This corresponds to the use of TOEFL as a gatekeeper by many institutions. For TOEFL test-takers whose current TOEFL scores would be in an intermediate to high intermediate range, the new tasks could assess additional abilities relevant to academic performance. For such participants, additional test development efforts are warranted. The majority of high reading ability nonnative speakers (64.9%) could perform the new measures; a third could not. Our results suggest that some unknown number of mid reading ability nonnative speakers are more capable of succeeding at more challenging academic reading tasks than their current level of basic comprehension assessment would indicate. At the same time, there are some high and mid reading ability participants who could not perform the more demanding tasks; admitting such students directly into university programs could result in failure. These results suggest that, for most students at lower levels of basic comprehension (in our study, those with TOEFL scores below 490 to 500), development of new tasks is unnecessary. Nevertheless, our results indicate that a certain small percentage (in our sample, 8 students, 18.2%) of nonnative speakers classified as low reading ability on basic comprehension did in fact perform adequately on more challenging tasks like Reading to Learn and Reading to Integrate. Given the very large number of TOEFL test-takers around the world, this finding merits further research. It would be potentially unfair to exclude such students from university study based on basic comprehension-only measures such as the current TOEFL. We would urge institutions that use TOEFL to consider scores on these new, more demanding tasks if doing so would address institutional needs. According to interview data collected in conjunction with this project (Trites, 2000: Chapter 6), participants perceived that successful completion of the new tasks required a thorough understanding of the texts, whereas the multiple-choice tests were less demanding, requiring only superficial grasp of content. This finding corroborates the observation of Freedle and Kostin (1999: 3), who note that often examinees do not need to comprehend the accompanying text to answer the test item. Therefore, for all TOEFL test-takers, incorporating more challenging tasks such as the Reading to Learn and Reading to Integrate measures into typical English language instruction could have positive washback effects.
The second area of consideration is the focus on additional relevant research. For predictive validity, it is important to determine the correlation between the Reading to Learn/Reading to Integrate Composite measures and actual academic performance in university classes. Correlations might differ depending on the degree of categorization and synthesis required by different major fields of study. Another area for future research is related to the novelty of the Reading to Learn task as an assessment technique. Current pedagogical trends in literacy instruction emphasize the use of graphic organizers as a means to understand and manipulate information in texts. To our knowledge, graphic organizers are typically used as classroom activities rather than assessment tools. They have good potential for use in assessment if students are familiar with them and if appropriate scoring systems can be developed. This project demonstrates that it is possible, though labor intensive, to develop reliable scoring systems. Further research is needed to explore refinement of this task type and related scoring systems for use in testing programs with large numbers of test-takers and scorers. Although the Reading to Integrate task (generating a synthesis) was more familiar, the scoring system was innovative because it reflected a reader’s ability to recognize textual frames as well as integrate information. If test developers are interested in the abilities of nonnative speaker readers to perform such tasks, further development of similar tasks and scoring systems is warranted. While a substantial investment of time would be required to refine the administration and scoring systems needed for more complex tasks such as these, it should be weighed against the possible danger of under-representing the likelihood of academic success based on results of the basic comprehension-only measures still most often used to assess academic proficiency.
Acknowledgements
This project was funded by a grant from Educational Testing Service as part of the TOEFL 2000 effort. We appreciate the cooperation of the ETS staff members who assisted in the development of passage-specific tasks and other aspects of the research; however, no official endorsement of Educational Testing Service should be inferred.
VII References
Bachman, L. 2000: Modern language testing at the turn of the century: assuring that what we count counts. Language Testing 17, 1–42.
Biber, D. 1993: Using register-diversified corpora for general language studies. Computational Linguistics 19, 219–41.
Britt, M., Rouet, J. and Perfetti, C. 1996: Using hypertext to study and reason about historical evidence. In Rouet, J., Levonen, J., Dillon, A. and Spiro, R., editors, Hypertext and cognition. Mahwah, NJ: Lawrence Erlbaum, 43–72.
Chapelle, C. 1999: Validity in language assessment. Annual Review of Applied Linguistics 19, 1–19.
Educational Testing Service 1997: TOEFL test and score manual. Princeton, NJ: Educational Testing Service.
Educational Testing Service 1998: Draft TOEFL 2000 research agenda framework: areas of research. Research agenda three. TOEFL 2000 internal document. Princeton, NJ: Educational Testing Service.
Eignor, D., Taylor, C., Kirsch, I. and Jamieson, J. 1998: Development of a scale for assessing the level of computer familiarity of TOEFL examinees. TOEFL Research Report No. 60. Princeton, NJ: Educational Testing Service.
Enright, M. and Schedl, M. 1999: Reading for a reason: using reader purpose to guide test design. TOEFL 2000 Internal Report. Princeton, NJ: Educational Testing Service.
Enright, M., Grabe, W., Mosenthal, P., Mulcahy-Ernt, P. and Schedl, M. 1998: A TOEFL 2000 framework for testing reading comprehension: a working paper. Princeton, NJ: Educational Testing Service.
Foltz, P. 1996: Comprehension, coherence and strategies in hypertext. In Rouet, J., Levonen, J., Dillon, A. and Spiro, R., editors, Hypertext and cognition. Mahwah, NJ: Lawrence Erlbaum, 109–36.
Freedle, R. and Kostin, I. 1999: Does the text matter in a multiple-choice test of comprehension? The case for the construct validity of TOEFL’s minitalks. Language Testing 16, 2–32.
Goldman, S. 1997: Learning from text: reflections on the past and suggestions for the future. Discourse Processes 23, 357–98.
Hayes, J. and Hatch, J. 1999: Issues in measuring reliability: correlation versus percentage of agreement. Written Communication 16, 354–67.
Huberty, C. 1994: Applied discriminant analysis. New York: John Wiley.
Jamieson, J., Campbell, J., Norfleet, L. and Berbisada, N. 1993: Reliability of a computerized scoring routine for an open-ended task. System 21, 305–22.
Jamieson, J., Norfleet, L. and Berbisada, N. 1993: Successes, failures and dropouts in computer-assisted language lessons. Computer Assisted English Language Learning Journal 4, 12–20.
Klecka, W. 1980: Discriminant analysis. In Lewis-Beck, M., editor, Quantitative applications in the social sciences, Volume 19. Newbury Park, CA: Sage.
Lehto, M., Zhu, W. and Carpenter, B. 1995: The relative effectiveness of hypertext and text. International Journal of Human-Computer Interaction 7, 293–313.
McNamara, D. and Kintsch, W. 1996: Learning from texts: effects of prior knowledge and text coherence. Discourse Processes 22, 247–88.
Messick, S. 1989: Validity. In Linn, R.L., editor, Educational measurement, 3rd edition. New York: American Council on Education/Macmillan, 13–103.
Meyer, B. 1985a: Prose analyses: purposes, procedures and problems. In Britton, B. and Black, J., editors, Understanding expository text. Hillsdale, NJ: Lawrence Erlbaum, 11–64.
Meyer, B. 1985b: Prose analysis: purposes, procedures and problems. Part 2. In Britton, B. and Black, J., editors, Understanding expository text. Hillsdale, NJ: Lawrence Erlbaum, 269–97.
Monks, V. 1997, August/September: Two views, same waterway. National Wildlife 35, 36–37.
Pellegrino, J., Baxter, G. and Glaser, R. 1999: Addressing the ‘two disciplines’ problem: linking theories of cognition and learning with assessment and instructional practice. Review of Research in Education 24, 307–53.
Perfetti, C. 1997: Sentences, individual differences and multiple texts: three issues in text comprehension. Discourse Processes 23, 337–55.
Perfetti, C., Britt, M.A. and Georgi, M. 1995: Text-based learning and reasoning: studies in history. Hillsdale, NJ: Lawrence Erlbaum.
Perfetti, C., Marron, M. and Foltz, P. 1996: Sources of comprehension failure: theoretical perspectives and case studies. In Cornoldi, C. and Oakhill, J., editors, Reading comprehension difficulties: processes and intervention. Mahwah, NJ: Lawrence Erlbaum, 137–65.
Reinking, D. 1988: Computer-mediated text and comprehension differences: the role of reading time, reader preference and estimation of learning. Reading Research Quarterly 23, 484–500.
Reinking, D. and Schreiner, R. 1985: The effects of computer-mediated text on measures of reading comprehension and reading behavior. Reading Research Quarterly 20, 536–53.
Spivey, N. 1997: The constructivist metaphor: reading, writing and the making of meaning. San Diego, CA: Academic Press.
Stevens, J. 1996: Applied multivariate statistics for the social sciences, 3rd edition. Mahwah, NJ: Lawrence Erlbaum.
Tabachnick, B. and Fidell, L. 1996: Using multivariate statistics, 3rd edition. New York: Harper Collins.
Taylor, C., Jamieson, J., Eignor, D. and Kirsch, I. 1998: The relationship between computer familiarity and performance on computer-based TOEFL test tasks. TOEFL Research Report No. 61. Princeton, NJ: Educational Testing Service.
Tennesen, M. 1997, November/December: On a clear day. National Parks 71, 26–9.
Trites, L. 2000: Beyond basic comprehension: reading to learn and reading to integrate for native and non-native speakers. Unpublished doctoral dissertation, Northern Arizona University, Flagstaff, AZ.
Van den Berg, S. and Watt, J. 1991: Effects of educational setting on student responses to structured hypertext. Journal of Computer-Based Instruction 18, 118–24.
Van Dijk, T.A. and Kintsch, W. 1983: Strategies of discourse comprehension. New York: Academic Press.
Wiley, J. and Voss, J. 1999: Constructing arguments from multiple sources: tasks that promote understanding and not just memory for text. Journal of Educational Psychology 91, 310–11.
Zimmerman, T. 1997, December 29: Filter it with billions and billions of oysters: how to revive the Chesapeake Bay. US News and World Report 123, 63.
Appendix 1a Chart completion task
Directions: Complete the following chart. Fill in as much detail as possible from the text read by categorizing the information into the different areas on the chart. Do not use single words for your responses; form your responses in phrases or complete sentences. Include examples from the text.

Make no judgments about the accuracy of causes or effects or the effectiveness of the solutions mentioned in the text. Solutions are seen as any action taken in response to the problem(s). Solutions can take many forms, such as proposed solutions, attempted solutions or failed solutions. Also, space is provided under each category for examples. Examples are specific examples found in the text that are used by the author(s) to exemplify the problems, causes, effects or solutions in the text. However, there may not be examples for every category.

Points will be awarded for correct responses only. There is no penalty for incorrect responses. Points will be awarded in the following manner:

Problems and Solutions: 10 points each
Causes and Effects: 5 points each
Examples: 1 point each
Problems Causes Effects Solutions
Examples Examples Examples Examples
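The point scheme given in the directions can be expressed as a small scoring function. This is only a sketch of the weights stated in the directions; the function name and input format are hypothetical, and any further rules in the full scoring rubric (such as partial credit for one-word answers or category maxima) are not modeled.

```python
# Points per correct response, as stated in the task directions.
POINTS = {
    "problems": 10,
    "solutions": 10,
    "causes": 5,
    "effects": 5,
    "examples": 1,
}


def chart_score(correct_counts):
    """Sum points for the number of correct responses in each category.

    correct_counts maps a category name to a non-negative count of correct
    responses. Incorrect responses carry no penalty, so they are not counted.
    """
    score = 0
    for category, count in correct_counts.items():
        if category not in POINTS:
            raise ValueError(f"unknown category: {category}")
        if count < 0:
            raise ValueError("counts must be non-negative")
        score += POINTS[category] * count
    return score
```

For instance, one correct problem, two causes, one effect, one solution and three examples would earn 10 + 10 + 5 + 10 + 3 = 38 points under these weights.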
Appendix 1b Reading to Learn scoring rubric (0–241 total possible)

Problems (10 points; 1 word, 6 points): 40 maximum
• Air pollution/Smog in the National Parks
• Opposition or Ignoring (or lack of cooperation) to environmental standards (regulations)
• No true ‘point sources’ emission (myriad of smaller pollution sources rather than one large source)
• Public outcry over stance of big business/government

Causes (5 points; 1 word, 3 points): 55 maximum
• Sulfur/nitrogen emissions
• Acid rain
• Ground-level ozone
• Urban/Industrial emissions
• Automobile emissions
• Power plants/factories/plants
• Emissions from Smokestacks (chimneys)
• Emissions from Kilns
• Unregulated pollution/smog (from other countries, Mexico)
• Smoke from Controlled burns
• Emissions from large cities

Effects (5 points; 1 word, 3 points): 30 maximum
• Trees and plants affected (injured), growth hindered
• Visibility decreased
• Metals loosened into waters (surface/ground water, water polluted)
• Nutrients removed (leached) from soil and/or plants
• National Parks limited jurisdiction of pollution sources outside of the parks
• Aquatic life injured (damaged)

Solutions (10 points; 1 word, 6 points): 90 maximum
• Stricter Environmental Laws (resolutions, acts)
• 1977 Clean Air Act Amendments labeling NPs as Class I areas
• Regional haze regulations proposed by the EPA
• Reduction of allowable pollution standards from industry
• Clean Air Act set visibility goals
• Objecting to construction permits
• Identification of pollution sources
• Industry modifications: Installation of devices (scrubbers) to reduce pollution
• Public pressure to protect parks

Examples for Problems (1 point): 5 maximum
• Great Smoky Mountain National Park
• Grand Canyon National Park
• Smoky Mountain Sources from Ohio, New York, Atlanta, etc.
• Grand Canyon Sources from CA, NV, UT, AZ, NM and Mexico
• TN Luttrell corp building permit situation

Examples for Causes (1 point, or 5 points with overarching cause listed above)
• Automobile emissions
• Power plants/factories/plants emissions
• Emissions from Smokestacks (chimneys)
• Emissions from Kilns
• Unregulated pollution/smog (from other countries, Mexico)
• Smoke from Controlled burns
• Emissions from large cities

Examples for Causes (1 point): 10 maximum
• Navajo Generating Station, Page, AZ (Power Plant, AZ)
• Power plants in TN and OH river valley
• Southern California Edison Plant, Laughlin, NV
• Tennessee Luttrell Kilns, TN
• Industry in AZ
• Cars in CA
• Smokestacks in NM, NV, UT
• Los Angeles
• Atlanta
• New York

Examples for Effects (1 point): 6 maximum
• Leaves of plants turning purple and brown (stippling)
• Hindering photosynthesis of plants and trees
• Reduction of visibility at Grand Canyon
• Visibility reduced 90% of the days
• See vague blue masses
• See half as far as in 1919

Examples for Solutions (1 point, or 10 points with overarching solution listed above)
• 1977 Clean Air Act Amendments labeling NPs as Class I areas
• Regional haze regulations proposed by the EPA
• Reduction of allowable pollution standards from industry

Examples for Solutions (1 point): 5 maximum
• Objecting to kilns in TN
• Researchers using scientific technology to ID sources (radioactive isotopes)
• Southern California Edison Plant in Laughlin, NV identified as point source
• Scrubbers installed at Navajo Generating Plant
• 10% reduction of pollution in 10–15 years (goal of no man-made pollution)
Appendix 2a Reading to Integrate task integration activity
Directions: For the next 15 minutes, reflect on what you read and compose a short essay that combines the information in the material read and makes connections across the range of ideas and arguments presented.
[Lined space for the participant's response]
Appendix 2b Reading to Integrate scoring rubric
Integration

50 Excellent: Integrates texts accurately and successfully on multiple levels and creates a true Documents Model. Generalizes at least two macrostructure concepts common across texts (this may be simply identifying the existence of a macrostructure followed by the supporting macrostructures from each text). Effectively integrates relevant support (i.e., details or macrostructures) to support generalizations and may still discuss each article separately to some degree. Integrates macrostructures present in both texts.

40 Good: Integrates through the creation of a well developed and accurate introduction and/or a conclusion, yet summarizes articles separately, AND/OR generalizes one major macrostructure that is fully developed with support. Uses some relevant support.

30 Fair: Attempts to integrate through the creation of a partially developed, possibly inaccurate introductory AND/OR concluding statement, usually related to the main topic of the articles, but has no substantive development. Creates a Text and Situation Model of each text separately. Does not make generalizations for integration beyond the one statement. May make evaluative or editorial statements; however, these may be inaccurate.

20 Poor: Creates a well developed and accurate Text Model and Situation Model for each of the texts, yet attempts no integrative connection across the texts.

10 Very Poor: Ineffectively or inaccurately attempts to summarize each article in a very reporting style, revealing little or no use of background knowledge, contextualization or evaluation. Response is simply a recall of information from the texts. No introduction or conclusion, OR participant confuses texts and sees separate texts as one issue (problem).
0 No Response: Participant does not attempt response or only addresses one article.

Macrostructures

Participants will receive one point for each macrostructure accurately identified in each text, with a minimum score of 4 awarded for the inability to accurately identify any macrostructures.

25 Excellent (Total 8): Accurately identifies all 4 macrostructures present in both texts.

20 Good (Total 6/7): Accurately identifies all 4 macrostructures in one text and 3 in the other, OR accurately identifies 3 of the four macrostructures in both texts. (Or 4 in one and 2 in the other.)

15 Fair (Total 4/5): Accurately identifies 3 of the four macrostructures in one text and 2 of the four in the other. (Or 4 in one and 1 in the other.) OR accurately identifies 2 of the four macrostructures in both texts. (Or 4 in one and 0 in the other, or 3 in one and 1 in the other.)

10 Poor (Total 2/3): Accurately identifies 2 of the macrostructures in one text and 1 in the other. (Or 3 in one and 0 in the other.) Accurately identifies 1 of the four macrostructures in both texts. (Or 2 in one and 0 in the other.)

5 Very Poor (Total 0/1): Accurately identifies 1 of the four macrostructures in one text and 0 in the other, OR unable to accurately identify any macrostructures in the texts.

0 No Response: Participant does not attempt response.
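The macrostructure bands reduce to a lookup on the total number of macrostructures identified across the two texts. A minimal sketch of that mapping follows, assuming each text contributes a count from 0 to 4 as the band descriptions imply; the function name and the `attempted` flag are hypothetical conveniences, not part of the rubric.

```python
def macrostructure_score(text_a, text_b, attempted=True):
    """Map per-text macrostructure counts (0-4 each) to the rubric band score."""
    if not attempted:
        return 0  # "No Response" band
    for count in (text_a, text_b):
        if not 0 <= count <= 4:
            raise ValueError("each text has between 0 and 4 macrostructures")
    total = text_a + text_b
    if total == 8:
        return 25  # Excellent
    if total >= 6:
        return 20  # Good (total 6 or 7)
    if total >= 4:
        return 15  # Fair (total 4 or 5)
    if total >= 2:
        return 10  # Poor (total 2 or 3)
    return 5       # Very Poor (total 0 or 1)
```

Because only the total matters, the combinations listed in each band (for example, 4 and 2 or 3 and 3 under Good) fall out of the same comparison.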
Use of relevant details
5 Excellent: Effectively uses multiple relevant details as support; no irrelevant or erroneous details.

4 Good: Effectively uses relevant details as support; may include some inaccurate or irrelevant details. (More than 50% of details are relevant.)

3 Fair: Possibly uses one or two relevant details, yet several irrelevant details or inaccurate details (erroneous or fabricated) appear in the synthesis. (50% or less of details are relevant.)

2 Poor: Erroneous or fabricated details used in attempt to support arguments presented; does not include relevant details.

1 Very Poor: No details used or listed.

0 No Response: Participant does not attempt response.
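The three rubric components have maxima of 50, 25 and 5, which sum to the maximum of 80 listed for Reading to Integrate in Table 2. Assuming the component scores are simply added (an inference from those maxima, not a rule stated in the rubric itself), a total could be computed as sketched below; the function and constant names are hypothetical.

```python
# Allowed band values for each rubric component.
INTEGRATION_BANDS = {0, 10, 20, 30, 40, 50}
MACROSTRUCTURE_BANDS = {0, 5, 10, 15, 20, 25}
DETAIL_BANDS = {0, 1, 2, 3, 4, 5}


def reading_to_integrate_total(integration, macrostructures, details):
    """Add the three component scores after checking each is a valid band."""
    checks = (
        ("integration", integration, INTEGRATION_BANDS),
        ("macrostructures", macrostructures, MACROSTRUCTURE_BANDS),
        ("details", details, DETAIL_BANDS),
    )
    for name, value, allowed in checks:
        if value not in allowed:
            raise ValueError(f"{name} score {value} is not a rubric band")
    # Assumption: the total is the plain sum of the three components.
    return integration + macrostructures + details
```

Under this assumption, a response rated Fair on all three components would total 30 + 15 + 3 = 48 out of 80.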
showed the largest variance in performance, while on the computer familiarity measure the variance of both nonnative speaker groups was substantially larger than that of the native speakers.
The same pattern emerged for the means on the new measures (see Table 2) as for the existing measures. The native speaker undergraduate group performed better on all new measures than both of the nonnative speaker groups. The nonnative speaker graduate group performed better than the nonnative speaker undergraduate group on all measures as well. This robust pattern of performance was also found in the variance of three of the four new measures. On BC1 and BC2, the performance of the native speaker undergraduates showed the least amount of variance, followed by the nonnative speaker graduates, followed by the nonnative speaker undergraduates. On Reading to Integrate, the native speaker undergraduate group showed substantially less variance than the nonnative speaker groups; however, the variance of the two nonnative speaker groups was almost identical. On Reading to Learn, all three groups showed considerable variance.
Table 3 reveals the range of awarded points achieved by all participant groups. The nature of the Reading to Learn point system created a maximum possible point value (241) that no participant achieved. We speculate that there are at least three possible causes of the discrepancy between the theoretical maximum and the range of observed scores:

• task novelty: no participant reported ever doing such a task previously;
• time allowed for task completion; and
• space on the response sheet: space constraints may have limited the amount of information that participants could include.

Future research would need to address these issues. However, for the Reading to Integrate measure, the full range of possible point totals was achieved by at least one participant in each group.

Table 1 Descriptive statistics for existing measures for three participant groups

Group                  n      Mean      sd       kMax
Nelson–Denny
  NSU                  105    126.48    16.46    156
  NNSU                 106    67.24     31.91    156
  NNSG                 40     88.88     21.88    156
  Total participants   251    95.47     36.93    156
TOEFL Reading Comprehension
  NSU                  105    61.30     4.24     67
  NNSU                 106    50.30     8.53     67
  NNSG                 40     57.15     4.55     67
  Total participants   251    55.99     8.19     67
Computer familiarity
  NSU                  104    38.08     3.60     44
  NNSU                 104    34.82     5.99     44
  NNSG                 40     35.63     6.02     44
  Total participants   248    36.31     5.33     44

Note: kMax = number of items or maximum possible score.
1 Computer familiarity
The overall plan for the analyses was to check the influence of the independent variables on the dependent measures, with computer familiarity being addressed first. Initially, we had proposed that if computer familiarity was significantly different across groups, it would be entered into all calculations as a covariate. To determine this, it was necessary to conduct an Analysis of Variance (ANOVA) for computer familiarity across the six participant/medium subgroups.
Table 2 Descriptive statistics for new measures for three participant groups

Group                  n      Mean     sd       kMax
Reading to Learn (chart)
  NSU                  105    51.85    19.86    241
  NNSU                 106    31.73    19.50    241
  NNSG                 40     44.68    19.27    241
  Total participants   251    42.21    21.64    241
Basic Comprehension Test 1
  NSU                  105    16.98    2.47     20
  NNSU                 106    11.73    4.25     20
  NNSG                 40     14.98    3.50     20
  Total participants   251    14.44    4.23     20
Reading to Integrate (synthesis)
  NSU                  101    63.65    11.05    80
  NNSU                 103    37.24    21.76    80
  NNSG                 40     53.60    21.03    80
  Total participants   244    50.86    21.63    80
Basic Comprehension Test 2
  NSU                  105    15.91    2.78     20
  NNSU                 106    9.75     4.54     20
  NNSG                 40     12.85    3.61     20
  Total participants   251    12.82    4.72     20

Notes: kMax = number of items or maximum possible score; n size reduced for Reading to Integrate because of anomalous testing session.
The resulting ANOVA (F = 4.70, p < .05) showed a significant difference between subgroups on the computer familiarity questionnaire; therefore, a post hoc Scheffé test was done to locate significant contrasts. After analysis of all possible subgroup contrasts, the post hoc Scheffé revealed that the only significant difference in subgroups appeared between the native speaker undergraduates and nonnative speaker undergraduates who read texts on paper. Hence, although there was one significant contrast, it occurred in two subgroups reading on paper, not in any of the subgroups who read on computer. All groups generally scored high on computer familiarity, although, as noted, variance of the nonnative groups was greater. It was thus established that computer familiarity had no significant effect on participants who read texts on computer, so we did not use computer familiarity as a covariate in further analyses and proceeded to the three research questions of central interest to this study.
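To make the logic of the F test concrete, a one-way ANOVA can be computed from group sizes, means and standard deviations alone. The following Python sketch does so for the computer familiarity rows of Table 1, read with decimal points (for example, a mean of 38.08 and an sd of 3.60). It is an illustration only: it compares the three participant groups rather than the six participant/medium subgroups analysed in the study, so its F ratio should not be expected to equal the reported value, and the function name is hypothetical.

```python
from math import fsum


def one_way_anova_from_summary(groups):
    """One-way ANOVA from summary rows of (n, mean, sd).

    Returns (F ratio, df between, df within).
    """
    total_n = sum(n for n, _, _ in groups)
    grand_mean = fsum(n * mean for n, mean, _ in groups) / total_n
    # Between-groups sum of squares: weighted squared deviations of group means.
    ss_between = fsum(n * (mean - grand_mean) ** 2 for n, mean, _ in groups)
    # Within-groups sum of squares: recovered from each group's variance.
    ss_within = fsum((n - 1) * sd ** 2 for n, _, sd in groups)
    df_between = len(groups) - 1
    df_within = total_n - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# Computer familiarity rows of Table 1 as (n, mean, sd).
# ASSUMPTION: three groups only, not the six subgroups used in the study.
computer_familiarity = [
    (104, 38.08, 3.60),  # NSU
    (104, 34.82, 5.99),  # NNSU
    (40, 35.63, 6.02),   # NNSG
]

f_ratio, df_b, df_w = one_way_anova_from_summary(computer_familiarity)
print(f"F({df_b}, {df_w}) = {f_ratio:.2f}")
```

The same function applies to any row block of Tables 1 and 2, since it needs only the summary statistics reported there.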
Because Research Questions 1 and 2 are similar, except that they address the two different new reading measures (Reading to Learn and Reading to Integrate), we approached them in the same manner, through ANOVA, to identify the independent variables that could have significantly affected the results on the new measures.
2 Research Question 1
The first research question asked if performance on a measure of Reading to Learn was affected by medium of presentation, computer familiarity, native language or level of education. We calculated a univariate ANOVA with Type III sums of squares on Reading to Learn with
Table 3 Range of scores for new measures for three participant groups

Group                          n   Minimum   Maximum   kMax

Reading to Learn (chart)
  NSU                        105        14       120    241
  NNSU                       106         0        86    241
  NNSG                        40         3        94    241
  Total participants         251         0       120    241

Reading to Integrate (synthesis)
  NSU                        101        38        80     80
  NNSU                       103         0        80     80
  NNSG                        40         5        80     80
  Total participants         244         0        80     80

Notes: kMax = number of items or maximum possible score; n size reduced for Reading to Integrate because of anomalous testing session.
group status, medium of text presentation and test order as possible contributing factors.⁴ Table 4 shows that there were no significant interactions for any of the group, medium or test order combinations. The only significant main effect was group membership.
Because group membership was a combined measure that included both native language background and level of education, post hoc analysis was needed to identify the significant contrasts. Table 5 shows that there was a significant difference in performance on the Reading to Learn measure between the native speaker undergraduate and the nonnative speaker undergraduate groups, as well as a significant difference between the nonnative speaker undergraduate and nonnative speaker graduate groups. There was no significant difference in performance between the native speaker undergraduate and the nonnative speaker graduate groups. Therefore, the answer to Research Question 1 is that native language background and level of education did have a significant effect on performance on the Reading to Learn measure, but that medium of text presentation did not. Further, order of testing (whether participants took Reading to Learn or Reading to Integrate first) had no significant effect.
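The pattern of significant contrasts can be checked from the published summary values. The sketch below, in Python, computes each pairwise standard error from the error mean square in Table 4 (read as 389.11) and the group sizes, and then applies the Scheffé criterion, under which a contrast is significant when the squared ratio of difference to standard error exceeds (k - 1) times the critical F. The critical value used here is an approximate tabled figure supplied as an assumption, not a value reported in the study, and all names are hypothetical.

```python
from math import sqrt

MS_ERROR = 389.11   # error mean square, Table 4
DF_GROUPS = 2       # k - 1 for three participant groups
F_CRIT = 3.03       # ASSUMPTION: approximate tabled F(.05; 2, 239)

GROUP_N = {"NSU": 105, "NNSU": 106, "NNSG": 40}
# Reading to Learn means, Table 2
GROUP_MEAN = {"NSU": 51.85, "NNSU": 31.73, "NNSG": 44.68}


def scheffe_contrast(first, second):
    """Return (mean difference, standard error, significant?) for one pair."""
    diff = GROUP_MEAN[first] - GROUP_MEAN[second]
    se = sqrt(MS_ERROR * (1 / GROUP_N[first] + 1 / GROUP_N[second]))
    # Scheffe criterion for a pairwise contrast.
    significant = (diff / se) ** 2 > DF_GROUPS * F_CRIT
    return diff, se, significant


for pair in [("NSU", "NNSU"), ("NSU", "NNSG"), ("NNSU", "NNSG")]:
    diff, se, sig = scheffe_contrast(*pair)
    print(f"{pair[0]} vs {pair[1]}: diff = {diff:.2f}, se = {se:.2f}, significant = {sig}")
```

Because the observed ratios lie well away from the criterion, the conclusions do not depend on the exact critical value chosen.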
3 Research Question 2
The second research question, related to the first, asked if performance on Reading to Integrate was affected by medium of presentation,
Table 4 Performance on Reading to Learn measure by groups, medium and test order (n = 251) (univariate analysis of variance)

Source                          Type III sum of squares    df   Mean square        F

Group                                          21869.29     2      10934.64   28.10*
Medium                                          1215.86     1       1215.86    3.13
Test order                                       294.98     1        294.98    0.76
Group × medium                                   334.81     2        167.40    0.43
Group × test order                                 4.37     2          2.19    0.01
Medium × test order                              391.73     1        391.73    1.01
Group × medium × test order                      575.29     2        287.65    0.74
Error                                          92997.45   239        389.11

Note: *p < .05

⁴ Test order was added as an additional variable to double check that our counterbalancing had been effective in controlling for any practice effect.
computer familiarity, native language or level of education. Again, to ensure that counterbalancing of tests controlled for any practice effect, test order was added as an additional variable.
To answer this question, we proceeded to calculate a univariate ANOVA on the Reading to Integrate measure, with group status, medium of text presentation and test order entered as possible contributing factors. The results (Table 6) show, as for Research Question 1, that there were no significant interactions for any of the group, medium or test order combinations; the only significant main effect was group membership. The answer for Research Question 2 is that native language background and educational level had a significant effect on Reading to Integrate, but medium of text presentation did not. Post hoc analysis of group contrasts showed that all three groups were distinct in their performance on Reading to Integrate (see Table 7).
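The entries of an ANOVA table are linked by simple arithmetic: each mean square is the sum of squares divided by its degrees of freedom, and each F ratio is that mean square divided by the error mean square. The Python sketch below recomputes the F column of Table 6 from the sums of squares and degrees of freedom, read with two decimal places (for example, 36294.33 for Group). Small differences in the last digit are to be expected because the published sums of squares are themselves rounded; the variable names are hypothetical.

```python
# (source, Type III sum of squares, df) as read from Table 6
TABLE_6 = [
    ("Group", 36294.33, 2),
    ("Medium", 192.95, 1),
    ("Test order", 97.83, 1),
    ("Group x medium", 11.82, 2),
    ("Group x test order", 30.14, 2),
    ("Medium x test order", 1098.72, 1),
    ("Group x medium x test order", 1488.58, 2),
]
ERROR_SS, ERROR_DF = 75430.37, 232

# The error mean square is the denominator of every F ratio.
ms_error = ERROR_SS / ERROR_DF

f_ratios = {}
for source, sum_sq, df in TABLE_6:
    mean_square = sum_sq / df
    f_ratios[source] = mean_square / ms_error

for source, f_value in f_ratios.items():
    print(f"{source:<30} F = {f_value:.2f}")
```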
4 Research Question 3
The third research question asked to what extent measures of basic comprehension, Reading to Learn and Reading to Integrate were
Table 5 Post hoc Scheffé for Reading to Learn measure (n = 251)

Group      n    Group      n    Mean difference   Standard error

NSU      105    NNSU     106        20.12*             2.72
                NNSG      40         7.17              3.67
NNSU     106    NNSG      40        12.95*             3.66

Note: *p < .05
Table 6 Performance on Reading to Integrate measure by groups, medium and test order (n = 244ᵃ) (univariate analysis of variance)

Source                          Type III sum of squares    df   Mean square        F

Group                                          36294.33     2      18147.17   55.82ᵇ
Medium                                           192.95     1        192.95    0.59
Test order                                        97.83     1         97.83    0.30
Group × medium                                    11.82     2          5.91    0.02
Group × test order                                30.14     2         15.07    0.05
Medium × test order                             1098.72     1       1098.72    3.38
Group × medium × test order                     1488.58     2        744.29    2.29
Error                                          75430.37   232        325.13

Notes: ᵃ n size reduced for Reading to Integrate because of anomalous testing session; ᵇ p < .05
related. We used correlational analysis as the first step in answering this question. Results for the total participant population (see Table 8) showed moderate to high correlations across all reading measures. However, the analyses done for Research Questions 1 and 2 revealed that group status had a significant effect on performance on Reading to Learn and Reading to Integrate. Further, we realize that correlations are sensitive to variance, so the high correlations seen in the total population could have been an artifact of combining the three groups. Therefore, we examined the correlations among all reading measures for each group (available in Trites, 2000: Appendix 1, pp. 230–33). While the reading measures were still correlated, often moderately, sometimes highly, magnitudes differed and sometimes dropped substantially. The text-specific multiple-choice measures, BC1 and BC2, consistently correlated more highly with the Nelson–Denny and TOEFL Reading Comprehension tests than with Reading to Learn and Reading to Integrate based on the same texts, suggesting a test method or construct effect. Because comparisons between different measures of basic comprehension were not a goal of the project, BC1 and BC2 were not used in further analyses. We conclude that, as expected, all reading measures were related, but the lower correlations between Reading to Learn and Reading to Integrate and the traditional basic comprehension measures led us to consider further types of analysis to identify the possible distinctiveness of the new measures.
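The concern that pooling groups can inflate a correlation is easy to demonstrate. In the Python sketch below, two groups each show only a moderate within-group correlation, but because the groups differ greatly in their means on both measures, the pooled correlation is very high. The scores are entirely hypothetical and chosen for clarity; they are not data from this study.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# HYPOTHETICAL scores on two measures for a lower and a higher scoring group.
low_x, low_y = [1, 2, 3, 4], [2, 1, 4, 3]
high_x, high_y = [11, 12, 13, 14], [12, 11, 14, 13]

r_low = pearson_r(low_x, low_y)
r_high = pearson_r(high_x, high_y)
# Pooling adds the large between-group difference to the shared variance.
r_pooled = pearson_r(low_x + high_x, low_y + high_y)

print(f"within low group:  r = {r_low:.2f}")
print(f"within high group: r = {r_high:.2f}")
print(f"pooled sample:     r = {r_pooled:.2f}")
```

This is why correlations computed separately for each participant group are the more conservative basis for comparing the measures.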
5 Discriminant analysis
Because we were interested in determining how constructs differed, we sought additional analyses to help us better characterize the new constructs. Of the several possible statistical methods that could have been employed, two are most plausible: multivariate analysis of variance, usually associated with experimental research,
Table 7 Post hoc Scheffé for Reading to Integrate measure (n = 244ᵃ)

Group      n    Group      n    Mean difference   Standard error

NSU      101    NNSU     103        26.41ᵇ             2.53
                NNSG      40        10.05ᵇ             3.37
NNSU     103    NNSG      40        16.36ᵇ             3.36

Notes: ᵃ n size reduced for Reading to Integrate because of anomalous testing session; ᵇ p < .05
Table 8 Correlations for all reading measures for all participants (n = 251)

                               TOEFL Reading    Basic Comprehension   Basic Comprehension   Reading    Reading to
                               Comprehension    Test 1                Test 2                to Learn   Integrateᵃ

Nelson–Denny                       .90ᵇ              .85ᵇ                  .84ᵇ               .66ᵇ       .69ᵇ
TOEFL Reading Comprehension       1.00               .85ᵇ                  .84ᵇ               .64ᵇ       .69ᵇ
Basic Comprehension 1                               1.00                   .84ᵇ               .68ᵇ       .68ᵇ
Basic Comprehension 2                                                     1.00                .68ᵇ       .70ᵇ
Reading to Learn                                                                             1.00        .59ᵇ

Notes: ᵃ n size reduced for Reading to Integrate because of anomalous testing session; ᵇ p < .05
and discriminant analysis, usually associated with descriptive research (Tabachnick and Fidell, 1996). The present research was conducted with samples of naturally occurring student groups and was not experimental. Moreover, we were interested in finding ways to compare participant performance on the measures of the new constructs, Reading to Learn and Reading to Integrate, with performance on more traditional measures of basic comprehension. Thus, we opted to use discriminant analysis because of its parsimony of description and clarity of interpretation (Stevens, 1996). Discriminant analysis, a technique recommended to describe group differences or predict group membership based on a comparison of multiple predictors (Huberty, 1994), has been used in other areas of applied linguistic research to investigate creation of a student profile of success or failure on Computer Assisted Language Learning (CALL) lessons (Jamieson et al., 1993) and accurate classification of text types into registers (Biber, 1993), among other purposes.
To further distinguish basic comprehension from the new constructs, we conducted discriminant analysis on each of the two language groups (native and nonnative) to determine whether Reading to Learn and Reading to Integrate would classify participants in the same way that Basic Comprehension would. We divided the native speaker and nonnative speaker groups into three levels (high, middle (mid) and low reading ability scorers) based on the basic comprehension measure chosen for that group (Nelson–Denny for native speakers, TOEFL Reading Comprehension for nonnative speakers). Research methodologists (Tabachnick and Fidell, 1996: 513) note that robustness is expected when the smallest group has at least 20 cases; our smallest group was 25. To check the assumption of homogeneity of the variance/covariance matrices, we examined the outcomes of Box's M Test and found them all nonsignificant (Klecka, 1980). Each group was checked for outliers using Mahalanobis distance treated as Chi-Square, and no outliers were found (Tabachnick and Fidell, 1996). Thus, the data met all assumptions required for use of discriminant analysis.
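The outlier screen can be sketched as follows, assuming two predictors (Reading to Learn and Reading to Integrate scores). Each case's squared Mahalanobis distance from the group centroid is compared with a Chi-Square critical value with 2 degrees of freedom; for 2 degrees of freedom the critical value has the closed form -2 ln(alpha). The alpha of .001 is a commonly used convention and is an assumption here, since the criterion applied in the study is not stated, and the scores below are hypothetical.

```python
from math import log


def mahalanobis_squared(points):
    """Squared Mahalanobis distance of each (x, y) case from the centroid."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Sample variances and covariance (n - 1 denominator).
    var_x = sum((x - mean_x) ** 2 for x, _ in points) / (n - 1)
    var_y = sum((y - mean_y) ** 2 for _, y in points) / (n - 1)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in points) / (n - 1)
    det = var_x * var_y - cov_xy ** 2
    distances = []
    for x, y in points:
        dx, dy = x - mean_x, y - mean_y
        # Quadratic form with the inverse of the 2 x 2 covariance matrix.
        distances.append((var_y * dx * dx - 2 * cov_xy * dx * dy + var_x * dy * dy) / det)
    return distances


ALPHA = 0.001                 # ASSUMPTION: conventional outlier criterion
CRITICAL = -2 * log(ALPHA)    # Chi-Square critical value for 2 df

# HYPOTHETICAL (Reading to Learn, Reading to Integrate) scores for one group.
scores = [(52, 64), (31, 37), (45, 54), (60, 70), (28, 40), (49, 58), (38, 45), (55, 61)]

d_squared = mahalanobis_squared(scores)
outliers = [case for case, d2 in zip(scores, d_squared) if d2 > CRITICAL]
print(f"critical value = {CRITICAL:.2f}, outliers = {outliers}")
```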
a Discriminant analysis for nonnative speakers: To organize the discriminant analysis in order to see if the Reading to Learn/Reading to Integrate Composite classified participants similarly to the measure of Basic Comprehension (for nonnative speakers, TOEFL Reading Comprehension), we divided the entire nonnative speaker group (n = 146) into three levels of basic comprehension: high, mid
and low. These reading ability groups of high, mid and low were based on typical TOEFL Reading Comprehension score levels required for program entry. Participants were classified as high if their scores were greater than or equal to 550, or 56 and above on the scaled score on the TOEFL Reading Comprehension (550 is the cut score often used for graduate entry). Participants were classified as mid if scores ranged between 500 and 549, or 50 to 55 on the TOEFL Reading Comprehension. Participants were classified as low if their scores fell below 500, or 49 and below on TOEFL Reading Comprehension (500 is a minimum TOEFL score sometimes used for undergraduate admission, often with the proviso that students enroll in ESL classes either prior to official enrollment or concurrently). For our entire nonnative speaker group, descriptive statistics on basic comprehension reading ability group membership levels appear in Table 9.
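The three-way split described above amounts to a simple rule on the scaled TOEFL Reading Comprehension score. A minimal Python sketch follows; the function name is hypothetical and the cut points are those stated in the text (56 and above, 50 to 55, 49 and below).

```python
def toefl_reading_level(scaled_score):
    """Classify a scaled TOEFL Reading Comprehension score as high, mid or low."""
    if scaled_score >= 56:   # corresponds to a total score of 550 or above
        return "high"
    if scaled_score >= 50:   # corresponds to total scores of 500 to 549
        return "mid"
    return "low"             # 49 and below


for score in (42, 50, 55, 56, 60):
    print(score, toefl_reading_level(score))
```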
We ran SPSS Discriminant Analysis with initial grouping variables of high, mid and low reading ability. We compared initial reading ability levels with high, mid and low categories on the Reading to Learn/Reading to Integrate Composite, a new variable reflecting level of performance on the Reading to Learn and Reading to Integrate measures combined.⁵ The discriminant analysis yielded one discriminant function with an eigenvalue of .92, responsible for 99.9% of the variance in outcomes. Wilks' Lambda for this function was .52, significant at .001; the associated Chi-Square value was extremely large (90.93) and highly significant (p < .001), indicating the group centroids on the composite Reading to Learn/Reading to Integrate function for the three nonnative speaker reading ability groups were significantly different. Both Reading to Learn and
Table 9 Descriptive statistics for nonnative speakers by TOEFL Reading Comprehension reading ability groups (n = 143)

Reading ability group     n    Mean on TOEFL Reading Comprehension     sd

High (≥ 56)              57                 59.68                    3.03
Mid (50–55)              42                 52.81                    1.67
Low (≤ 49)               44                 42.11                    5.47
Total                   143                 52.26                    8.22

Note: Total n = 143 due to loss of three cases in anomalous Reading to Integrate session.

⁵ We first calculated two separate discriminant analyses, one for Reading to Learn and one for Reading to Integrate, but we found that both loaded on a single function, so we used the composite in subsequent analyses.
Reading to Integrate loaded significantly on the discriminant function, at .86 for Reading to Learn and .78 for Reading to Integrate (p < .05).
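The reported eigenvalue, Wilks' Lambda and Chi-Square are arithmetically linked, and the link can be checked approximately. Wilks' Lambda is the product of 1/(1 + eigenvalue) over all discriminant functions, and Bartlett's approximation converts Lambda into a Chi-Square statistic. The Python sketch below assumes two predictors (Reading to Learn and Reading to Integrate) and three groups, and infers the small second eigenvalue from the reported 99.9% share of variance; both are assumptions made for illustration, so the result should agree with the published figures only to within rounding.

```python
from math import log


def wilks_lambda(eigenvalues):
    """Wilks' Lambda as the product of 1 / (1 + eigenvalue)."""
    result = 1.0
    for eigenvalue in eigenvalues:
        result /= 1.0 + eigenvalue
    return result


def bartlett_chi_square(lam, n_cases, n_predictors, n_groups):
    """Bartlett's Chi-Square approximation for Wilks' Lambda."""
    return -(n_cases - 1 - (n_predictors + n_groups) / 2) * log(lam)


FIRST_EIGENVALUE = 0.92
SHARE_OF_VARIANCE = 0.999
# ASSUMPTION: the second eigenvalue is inferred from the reported share.
second_eigenvalue = FIRST_EIGENVALUE * (1 - SHARE_OF_VARIANCE) / SHARE_OF_VARIANCE

lam = wilks_lambda([FIRST_EIGENVALUE, second_eigenvalue])
# ASSUMPTION: 2 predictors and 3 reading ability groups, n = 143.
chi_square = bartlett_chi_square(lam, n_cases=143, n_predictors=2, n_groups=3)

print(f"Wilks' Lambda = {lam:.2f}")
print(f"Chi-Square (approx.) = {chi_square:.1f}")
```

The same two functions can be applied to the native speaker analysis by substituting the eigenvalue, share of variance and sample size reported for that group.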
Over half (64.9%) of the high reading ability group remained high on the new measure; less than half (42.9%) of the mid reading ability group remained classified as mid. Hence, the Reading to Learn/Reading to Integrate Composite was particularly influential in reclassifying the mid reading ability group and, to a lesser extent, the high group. However, most (81.8%) of the low reading ability group remained low on the Reading to Learn/Reading to Integrate Composite (see Table 10). Of the 143 nonnative speaker participants, 91 (64%) remained in the initial basic comprehension category on the composite; the rest moved, but in different directions. Twenty-one participants (14.7%) were classified into a higher category on the Reading to Learn/Reading to Integrate Composite than their initial basic comprehension level would have suggested, while 31 (21.7%) were reclassified into a lower category. Thus, 52 participants (36.4%), just over one third of the sample, were classified differently based on their Reading to Learn/Reading to Integrate Composite performance.
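The reclassification totals can be recomputed directly from the counts in Table 10. In the Python sketch below, rows are the initial basic comprehension groups and columns are the predicted groups on the composite, both ordered high, mid, low; cases on the diagonal kept their category, cases left of the diagonal moved to a higher category, and cases right of it moved lower. The function name is hypothetical.

```python
# Rows: initial group (high, mid, low); columns: predicted group (high, mid, low).
TABLE_10 = [
    [37, 17, 3],
    [13, 18, 11],
    [2, 6, 36],
]


def reclassification_summary(counts):
    """Tally cases that stayed, moved higher or moved lower."""
    same = higher = lower = 0
    for initial, row in enumerate(counts):
        for predicted, count in enumerate(row):
            if predicted == initial:
                same += count
            elif predicted < initial:   # predicted column is a higher level
                higher += count
            else:
                lower += count
    return {"same": same, "higher": higher, "lower": lower,
            "total": same + higher + lower}


summary = reclassification_summary(TABLE_10)
for key in ("same", "higher", "lower"):
    share = 100 * summary[key] / summary["total"]
    print(f"{key}: {summary[key]} ({share:.1f}%)")
```

The same function can be applied to any three-by-three classification table with rows and columns in this order.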
b Discriminant analysis for native speakers: Because one of our goals in this project was to probe the possible validity of these new measures by assessing performance of two groups, native as well as
Table 10 Discriminant analysis comparison of nonnative speaker reading ability groups with Reading to Learn/Reading to Integrate Composite (n = 143)

                           Predicted group membership for Reading to
                           Learn/Reading to Integrate Composite
Reading ability group      High      Mid      Low      Initial classification total

Count
  High (≥ 56)                37       17        3       57
  Mid (50–55)                13       18       11       42
  Low (≤ 49)                  2        6       36       44
  Reclassification total     52       41       50      143

Percentage
  High (≥ 56)              64.9     29.8      5.3    100.0
  Mid (50–55)              31.0     42.9     26.2    100.0
  Low (≤ 49)                4.5     13.6     81.8    100.0

Note: Total n = 143 due to loss of three cases in anomalous Reading to Integrate session.
nonnative speakers, we conducted a parallel discriminant analysis for native speakers. Thus, for the native speakers, we followed the same procedure, dividing the entire native speaker group (n = 105) into three levels of basic comprehension (high, mid and low) based on score distances of .5 standard deviations from the sample mean of the Nelson–Denny test for these participants. Native speakers need not take reading comprehension tests when entering the university, so the three-way split was based entirely on our sample data. Participants were classified as high if their scores on the Nelson–Denny were greater than or equal to 135. Participants were classified as mid if their scores ranged from 119 to 134 on the Nelson–Denny. Participants were classified as low if their scores fell at or below 118 on the Nelson–Denny. Descriptive statistics for the entire native speaker sample on basic comprehension group membership appear in Table 11.
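The cut scores follow from the sample statistics. Taking the Nelson–Denny mean and standard deviation for the 105 native speakers in Table 1 (read as 126.48 and 16.46), the band within .5 standard deviations of the mean runs from about 118.25 to 134.71, so whole-number scores of 118 and below fall under the band and scores of 135 and above fall over it. The Python sketch below illustrates this; the names are hypothetical, and the use of the Table 1 values as the basis for the split is an assumption drawn from the description above.

```python
# ASSUMPTION: Nelson-Denny mean and sd for the native speaker group, Table 1.
SAMPLE_MEAN = 126.48
SAMPLE_SD = 16.46

lower_cut = SAMPLE_MEAN - 0.5 * SAMPLE_SD
upper_cut = SAMPLE_MEAN + 0.5 * SAMPLE_SD


def nelson_denny_level(score):
    """Classify a Nelson-Denny score by distance from the sample mean."""
    if score > upper_cut:
        return "high"
    if score < lower_cut:
        return "low"
    return "mid"


print(f"band: {lower_cut:.2f} to {upper_cut:.2f}")
for score in (118, 119, 134, 135):
    print(score, nelson_denny_level(score))
```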
The discriminant analysis yielded one discriminant function with an eigenvalue of .31, responsible for 98.7% of the variance in outcomes. Wilks' Lambda for this function was .76, significant at .001; the associated Chi-Square value was large (26.39) and highly significant (p < .001), indicating the group centroids on the discriminant function for the three reading ability groups on the Reading to Learn/Reading to Integrate Composite were significantly different. Pooled within-groups correlations between discriminating variables showed that Reading to Learn correlated with the first discriminant function at a level of .81; Reading to Integrate correlated with the second discriminant function at .71. This contrasts with findings for the nonnative speakers, where scores on the combined new measures loaded significantly on only one discriminant function. For native speakers, then, there is evidence for two significant discriminant functions, although the first accounts for almost all of the variance. Although these two measures (Reading to Learn and Reading to Integrate) loaded on two separate discriminant functions, they still
Table 11 Descriptive statistics for native speakers by Nelson–Denny reading ability groups (n = 101)

Reading ability group     n    Mean on Nelson–Denny      sd

High (≥ 135)             39          141.26            5.19
Mid (119–134)            37          125.65            5.09
Low (≤ 118)              25          102.88           10.98
Total                   101          126.04           16.52

Note: Total n = 101 for discriminant analysis due to loss of four cases in anomalous Reading to Integrate session.
showed moderate correlations with the alternate function,⁶ justifying the composite calculations.
Results of discriminant analysis for native speakers, seen in Table 12, show a different pattern from that observed for nonnative speakers. Nearly three-fourths (71.8%) of the native speakers classified as high in the basic comprehension reading ability group remained high on the Reading to Learn/Reading to Integrate Composite. For the mid group, however, only 18.9% remained classified as mid. Just over half (52%) of the low reading ability group members remained low on the Reading to Learn/Reading to Integrate Composite. As with the nonnative speakers, participants in the mid category on basic comprehension showed the most frequent reclassification. Forty-eight of the 101 (47.5%) native speaker participants remained in the initial classification categories. Twenty-two (21.8%) were reclassified into a higher category and 31 (30.7%) were reclassified into a lower category. Thus, 53 participants (52.5%), over half of the sample, were classified differently based on their performance on the Reading to Learn/Reading to Integrate Composite.
⁶ Reading to Learn correlated with function 2 at -.58; Reading to Integrate with function 1 at .71.
Table 12 Discriminant analysis comparison of native speaker reading ability groups with Reading to Learn/Reading to Integrate Composite (n = 101)

                           Predicted group membership for Reading to
                           Learn/Reading to Integrate Composite
Reading ability group      High      Mid      Low      Initial classification total

Count
  High (≥ 135)               28        3        8       39
  Mid (119–134)              10        7       20       37
  Low (≤ 118)                 5        7       13       25
  Reclassification total     43       17       41      101

Percentage
  High (≥ 135)             71.8      7.7     20.5    100.0
  Mid (119–134)            27.0     18.9     54.1    100.0
  Low (≤ 118)              20.0     28.0     52.0    100.0

Note: Total n = 101 for discriminant analysis due to loss of four cases in anomalous Reading to Integrate session.
V Interpretation
Analyses done to answer Research Questions 1 and 2 showed that performance on Reading to Learn and Reading to Integrate measures was significantly influenced by language background and level of education (graduate vs. undergraduate, for nonnative speakers only). Moreover, level of computer familiarity had no significant effect on Reading to Learn and Reading to Integrate performance.
Correlations showed that, as expected, all reading measures correlated to some degree, generally answering Research Question 3. The most interesting results came from the discriminant analyses because they showed a pattern of differential classification on the new tasks, sensitive to initial reading ability. This pattern differed for nonnative speakers and native speakers. Examination of the reclassifications based on the new tasks revealed some of the problems of using a basic comprehension-only test to predict performance on more challenging literacy tasks. These results also imply the need for further work on new measures of advanced literacy skills, such as Reading to Learn and Reading to Integrate, to reflect trends in construct-driven assessment (Pellegrino et al., 1999).
For the nonnative speakers, most (81.8%) of the participants with TOEFL Reading Comprehension below 50 remained classified as low on the Reading to Learn/Reading to Integrate Composite, suggesting the existence of a lower threshold of academic English proficiency. Participants below this threshold were unlikely to perform well on tasks assessing Reading to Learn and Reading to Integrate. Even tasks involving only selection (such as BC1 and BC2) rather than production of responses were difficult for this group. However, for the nonnative speaker readers in the mid and high reading ability groups, basic comprehension level was not nearly as consistent a predictor of performance on the Reading to Learn/Reading to Integrate Composite. This was especially striking for those in the mid reading ability group. Approximately one fourth of the mid group dropped into the low category on the Reading to Learn/Reading to Integrate Composite, indicating that they experienced more difficulty in completing the new measures, but approximately one fourth could indeed categorize and synthesize information relatively well, despite middling performance on basic comprehension. This finding suggests that for nonnative speaker readers in the mid reading ability category, basic comprehension measures were insufficient to predict their performance on the new measures.
Almost two-thirds of the high basic comprehension nonnative speakers remained high on the Reading to Learn/Reading to Integrate Composite, but one third dropped, suggesting that they could not categorize and synthesize information from texts as well as they could recognize information. For this third, the recognition-only task, multiple-choice basic comprehension, overestimated ability to manipulate and synthesize information. For the high and mid reading ability nonnative speaker groups, basic comprehension-only tests may then overestimate academic English proficiency.
Results for the native speakers showed a more definitive pattern of reclassification. For the low reading ability group, approximately half were misclassified, suggesting that the basic comprehension measure underrepresented their ability to categorize and synthesize information in academic texts. Most of the mid reading ability group were reclassified as lower on the Reading to Learn/Reading to Integrate Composite, showing that basic comprehension results overestimated ability to succeed on more challenging tasks. On the other hand, almost one third of the mid reading ability group did better on the Reading to Learn/Reading to Integrate Composite; for them, basic comprehension underrepresented their ability to complete more challenging reading tasks. For the high ability readers, nearly one third dropped on the Reading to Learn/Reading to Integrate Composite. For them, the basic comprehension measure overestimated their level of academic English proficiency. The other two thirds of the high reading ability group remained high, suggesting a higher threshold of academic English proficiency.
Considering results from both the nonnative speakers and native speakers, we conclude that the new tasks did assess something different from basic comprehension once a lower level threshold of basic academic English proficiency had been achieved. Examination of scatterplots based on discriminant functions indicated an obvious separation between participants who could and could not perform on the Reading to Learn/Reading to Integrate Composite, thus providing some tentative evidence for concurrent validity (Messick, 1989; Chapelle, 1999). We had hoped to find clear evidence of a hierarchy, suggesting that Reading to Learn was demonstrably more difficult than basic comprehension and Reading to Integrate demonstrably more difficult than Reading to Learn, but results did not yield an obvious hierarchy. Results did, however, suggest an even simpler pattern: a dichotomy. For those nonnative speakers above the lower threshold of English language proficiency, which in our data would be approximately 500, or 49 on the scaled TOEFL Reading
Comprehension scores, some could perform well on the new measures, some could not, revealing two rather than three groups on the Reading to Learn/Reading to Integrate Composite. These results lead us to speculate that the new measures tap additional skills, such as 'sophisticated discourse processes and critical thinking skills' (Enright and Schedl, 1999: 24), in addition to language proficiency. For native speakers, the discriminant loading suggested a possible hierarchy of difficulty, but the scatterplots and classification tables revealed a dichotomy. Reclassification based on the Reading to Learn/Reading to Integrate Composite nearly eliminated the mid basic comprehension reading ability category, with only 16.8% of the native speakers qualifying as mid on the Reading to Learn/Reading to Integrate Composite.
VI Conclusions
While we acknowledge that there are some limitations to this project, such as the time required to administer and score new measures, the results are illuminating and useful, and suggest some considerations for future research. The first consideration relates to appropriate classification of students based on academic reading abilities. This corresponds to the use of TOEFL as a gatekeeper by many institutions. For TOEFL test-takers whose current TOEFL scores would be in an intermediate to high intermediate range, the new tasks could assess additional abilities relevant to academic performance. For such participants, additional test development efforts are warranted. The majority of high reading ability nonnative speakers (64.9%) could perform the new measures; a third could not. Our results suggest that some unknown number of mid reading ability nonnative speakers are more capable of succeeding at more challenging academic reading tasks than their current level of basic comprehension assessment would indicate. At the same time, there are some high and mid reading ability participants who could not perform the more demanding tasks; admitting such students directly into university programs could result in failure. These results suggest that for most students at lower levels of basic comprehension (in our study, those with TOEFL scores below 490 to 500), development of new tasks is unnecessary. Nevertheless, our results indicate that a certain small percentage (in our sample, 8 students, 18.2%) of nonnative speakers classified as low reading ability on basic comprehension did in fact
perform adequately on more challenging tasks like Reading to Learn and Reading to Integrate. Given the very large number of TOEFL test-takers around the world, this finding merits further research. It would be potentially unfair to exclude such students from university study based on basic comprehension-only measures such as the current TOEFL. We would urge institutions that use TOEFL to consider scores on these new, more demanding tasks if doing so would address institutional needs. According to interview data collected in conjunction with this project (Trites, 2000: Chapter 6), participants perceived that successful completion of the new tasks required a thorough understanding of the texts, whereas the multiple-choice tests were less demanding, requiring only superficial grasp of content. This finding corroborates the observation of Freedle and Kostin (1999: 3), who note that often examinees do not need to comprehend the accompanying text to answer the test item. Therefore, for all TOEFL test-takers, incorporating more challenging tasks such as the Reading to Learn and Reading to Integrate measures into typical English language instruction could have positive washback effects.
The second area of consideration is the focus on additional relevant research. For predictive validity, it is important to determine the correlation between the Reading to Learn/Reading to Integrate Composite measures and actual academic performance in university classes. Correlations might differ depending on the degree of categorization and synthesis required by different major fields of study. Another area for future research is related to the novelty of the Reading to Learn task as an assessment technique. Current pedagogical trends in literacy instruction emphasize the use of graphic organizers as a means to understand and manipulate information in texts. To our knowledge, graphic organizers are typically used as classroom activities rather than assessment tools. They have good potential for use in assessment if students are familiar with them and if appropriate scoring systems can be developed. This project demonstrates that it is possible, though labor intensive, to develop reliable scoring systems. Further research is needed to explore refinement of this task type and related scoring systems for use in testing programs with large numbers of test-takers and scorers. Although the Reading to Integrate task (generating a synthesis) was more familiar, the scoring system was innovative because it reflected a reader's ability to recognize textual frames as well as integrate information. If test developers are interested in the abilities of nonnative speaker readers to perform such tasks, further development of similar tasks and scoring systems is warranted. While a substantial investment of time
would be required to refine the administration and scoring systems needed for more complex tasks such as these, it should be weighed against the possible danger of under-representing the likelihood of academic success based on results of the basic comprehension-only measures still most often used to assess academic proficiency.
Acknowledgements
This project was funded by a grant from Educational Testing Service as part of the TOEFL 2000 effort. We appreciate the cooperation of the ETS staff members who assisted in the development of passage-specific tasks and other aspects of the research; however, no official endorsement of Educational Testing Service should be inferred.
VII References
Bachman, L. 2000: Modern language testing at the turn of the century: assuring that what we count counts. Language Testing 17, 1–42.
Biber, D. 1993: Using register-diversified corpora for general language studies. Computational Linguistics 19, 219–41.
Britt, M., Rouet, J. and Perfetti, C. 1996: Using hypertext to study and reason about historical evidence. In Rouet, J., Levonen, J., Dillon, A. and Spiro, R., editors, Hypertext and cognition. Mahwah, NJ: Lawrence Erlbaum, 43–72.
Chapelle, C. 1999: Validity in language assessment. Annual Review of Applied Linguistics 19, 1–19.
Educational Testing Service 1997: TOEFL test and score manual. Princeton, NJ: Educational Testing Service.
Educational Testing Service 1998: Draft TOEFL 2000 research agenda framework, areas of research. Research agenda three. TOEFL 2000 internal document. Princeton, NJ: Educational Testing Service.
Eignor, D., Taylor, C., Kirsch, I. and Jamieson, J. 1998: Development of a scale for assessing the level of computer familiarity of TOEFL examinees. TOEFL Research Report No. 60. Princeton, NJ: Educational Testing Service.
Enright, M. and Schedl, M. 1999: Reading for a reason: using reader purpose to guide test design. TOEFL 2000 Internal Report. Princeton, NJ: Educational Testing Service.
Enright, M., Grabe, W., Mosenthal, P., Mulcahy-Ernt, P. and Schedl, M. 1998: A TOEFL 2000 framework for testing reading comprehension: a working paper. Princeton, NJ: Educational Testing Service.
Foltz, P. 1996: Comprehension, coherence and strategies in hypertext. In Rouet, J., Levonen, J., Dillon, A. and Spiro, R., editors, Hypertext and cognition. Mahwah, NJ: Lawrence Erlbaum, 109–36.
Latricia Trites and Mary McGroarty 201
Freedle R and Kostin I 1999 Does the text matter in a multiple choice testof comprehension The case for the construct validity of TOEFLrsquosminitalks Language Testing 16 2ndash32
Goldman S 1997 Learning from text reflections on the past and suggestionsfor the future Discourse Processes 23 357ndash98
Hayes J and Hatch J 1999 Issues in measuring reliability correlationversus percentage of agreement Written Communication 16 354ndash67
Huberty C 1994 Applied discriminant analysis New York John WileyJamieson J Campbell J Norfleet L and Berbisada N 1993 Reliability
of a computerized scoring routine for an open-ended task System 21305ndash22
Jamieson J Norfleet L and Berbisada N 1993 Successes failures anddropouts in computer-assisted language lessons Computer AssistedEnglish Language Learning Journal 4 12ndash20
Klecka W 1980 Discriminant analysis In Lewis-Beck M editorQuantitative applications in the social sciences Volume 19 NewburyPark CA Sage
Lehto M Zhu W and Carpenter B 1995 The relative effectiveness ofhypertext and text International Journal of Human-ComputerInteraction 7 293ndash313
McNamara D and Kintsch W 1996 Learning from texts effects of prior knowl-edge and text coherence Discourse Processes 22 247ndash88
Messick S 1989 Validity In Linn RL editor Educational measurement 3rdedition New York American Council on Education Macmillan 13ndash103
Meyer B 1985a Prose analyses purposes procedures and problems InBritton B and Black J editors Understanding expository text HillsideNJ Lawrence Erlbaum 11ndash64
mdashmdash 1985b Prose analysis purposes procedures and problems Part 2 InBritton B and Black J editors Understanding expository text HillsideNJ Lawrence Erlbaum 269ndash97
Monks V 1997 AugustSeptember Two views same waterway NationalWildlife 35 36ndash37
Pellegrino J Baxter G and Glaser R 1999 Addressing the lsquotwo disci-plinesrsquo problem linking theories of cognition and learning with assess-ment and instructional practice Review of Research in Education 24307ndash53
Perfetti C 1997 Sentences individual differences and multiple texts threeissues in text comprehension Discourse Processes 23 337ndash55
Perfetti C Britt MA and Georgi M 1995 Text-based learning andreasoning studies in history Hillside NJ Lawrence Erlbaum
Perfetti C Marron M and Foltz P 1996 Sources of comprehension failuretheoretical perspectives and case studies In Cornoldi C and Oakhill Jeditors Reading comprehension difficulties processes and interventionMahwah NJ Lawrence Erlbaum 137ndash65
202 New tasks for reading comprehension tests
Reinking D 1988 Computer-mediated text and comprehension differencesthe role of reading time reader preference and estimation of learningReading Research Quarterly 23 484ndash500
Reinking D and Schreiner R 1985 The effects of computer-mediated texton measures of reading comprehension and reading behavior ReadingResearch Quarterly 20 536ndash53
Spivey N 1997 The constructivist metaphor reading writing and themaking of meaning San Diego CA Academic Press
Stevens J 1996 Applied multivariate statistics for the social sciences 3rdedition Mahwah NJ Lawrence Erlbaum
Tabachnick B and Fidell L 1996 Using multivariate statistics 3rd editionNew York Harper Collins
Taylor C Jamieson J Eignor D and Kirsch I 1998 The relationshipbetween computer familiarity and performance on computer-basedTOEFL test tasks TOEFL Research Report No 61 Princeton NJEducational Testing Service
Tennesen M 1997 NovemberDecember On a clear day National Parks 7126ndash9
Trites L 2000 Beyond basic comprehension reading to learn and reading tointegrate for native and non-native speakers Unpublished doctoraldissertation Northern Arizona University Flagstaff AZ
Van den Berg S and Watt J 1991 Effects of educational setting on studentresponses to structured hypertext Journal of Computer-BasedInstruction 18 118ndash24
Van Dijk TA and Kintsch W 1983 Strategies of discourse comprehensionNew York Academic Press
Wiley J and Voss J 1999 Constructing arguments from multiple sourcestasks that promote understanding and not just memory for text Journalof Educational Psychology 91 310ndash11
Zimmerman T 1997 December 29 Filter it with billions and billions of oys-ters how to revive the Chesapeake Bay US News and World Report 12363
Appendix 1a Chart completion task
Directions: Complete the following chart. Fill in as much detail as possible from the text read by categorizing the information into the different areas on the chart. Do not use single words for your responses; form your responses in phrases or complete sentences. Include examples from the text.

Make no judgments about the accuracy of causes or effects or the effectiveness of the solutions mentioned in the text. Solutions are seen as any action taken in response to the problem(s). Solutions can take many forms, such as proposed solutions, attempted solutions or failed solutions. Also, space is provided under each category for examples. Examples are specific examples found in the text that are used by the author(s) to exemplify the problems, causes, effects or solutions in the text. However, there may not be examples for every category.

Points will be awarded for correct responses only. There is no penalty for incorrect responses. Points will be awarded in the following manner:

Problems and Solutions: 10 points each
Causes and Effects: 5 points each
Examples: 1 point each

Problems      Causes        Effects       Solutions

Examples      Examples      Examples      Examples
Appendix 1b Reading to Learn scoring rubric (0–241 total possible)

Problems (10 points; 1 word, 6 points): 40 maximum
• Air pollution/smog in the National Parks
• Opposition or ignoring (or lack of cooperation) to environmental standards (regulations)
• No true 'point sources' (myriad of smaller pollution sources rather than one large source)
• National Parks limited jurisdiction of pollution sources outside of the parks

Causes (5 points; 1 word, 3 points): 55 maximum
• Sulfur/nitrogen emissions
• Acid rain
• Ground-level ozone
• Urban/industrial emissions
• Automobile emissions
• Power plants/factories/plants emissions
• Emissions from smokestacks (chimneys)
• Emissions from kilns
• Unregulated pollution/smog (from other countries/Mexico)
• Smoke from controlled burns
• Emissions from large cities

Effects (5 points; 1 word, 3 points): 30 maximum
• Trees and plants affected (injured)/growth hindered
• Visibility decreased
• Metals loosened into waters (surface/ground water; water polluted)
• Nutrients removed (leached) from soil and/or plants
• Public outcry over stance of big business/government
• Aquatic life injured (damaged)

Solutions (10 points; 1 word, 6 points): 90 maximum
• Stricter environmental laws (resolutions, acts)
• 1977 Clean Air Act Amendments labeling NPs as Class I areas
• Regional haze regulations proposed by the EPA
• Reduction of allowable pollution standards from industry
• Clean Air Act set visibility goals
• Objecting to construction permits
• Identification of pollution sources
• Industry modifications: installation of devices (scrubbers) to reduce pollution
• Public pressure to protect parks
Appendix 1b (continued)

Problems: Examples (1 point), 5 maximum
• Great Smoky Mountain National Park
• Grand Canyon National Park
• Smoky Mountain sources from Ohio, New York, Atlanta etc.
• Grand Canyon sources from CA, NV, UT, AZ, NM and Mexico

Causes: Examples (1 point, or 5 points with overarching cause listed above)
• Automobile emissions
• Power plants/factories/plants emissions
• Emissions from smokestacks (chimneys)
• Emissions from kilns
• Unregulated pollution/smog (from other countries/Mexico)
• Smoke from controlled burns
• Emissions from large cities

Effects: Examples (1 point), 6 maximum
• Leaves of plants turning purple and brown (stippling)
• Hindering photosynthesis of plants and trees
• Reduction of visibility at Grand Canyon
• Visibility reduced 90% of the days
• See vague blue masses
• See half as far as in 1919

Solutions: Examples (1 point, or 10 points with overarching solution listed above)
• 1977 Clean Air Act Amendments labeling NPs as Class I areas
• Regional haze regulations proposed by the EPA
• Reduction of allowable pollution standards from industry
• TN Luttrell corp building permit situation

Causes: Examples (1 point), 10 maximum
• Navajo Generating Station, Page, AZ (Power Plant, AZ)
• Power plants in TN and OH river valley
• Southern California Edison Plant, Laughlin, NV
• Tennessee Luttrell Kilns, TN
• Industry in AZ
• Cars in CA
• Smokestacks in NM, NV, UT
• Los Angeles
• Atlanta
• New York

Solutions: Examples (1 point), 5 maximum
• Objecting to kilns in TN
• Researchers using scientific technology to ID sources (radioactive isotopes)
• Southern California Edison Plant in Laughlin, NV identified as point source
• Scrubbers installed at Navajo Generating Plant
• 10% reduction of pollution in 10–15 years (goal of no man-made pollution)
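The point values and maxima in the Appendix 1b rubric can be combined into a small scoring routine. The following is only a sketch under stated assumptions, not the scoring procedure actually used in the study: the names `Category`, `RUBRIC`, `category_score` and `chart_score` are hypothetical, the assignment of the example maxima to categories (5, 10, 6 and 5) is read from the rubric layout, and the rule that an example may earn the overarching points when the overarching cause or solution is missing is deliberately left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Category:
    full_points: int      # points for a phrase or sentence response
    one_word_points: int  # reduced points for a one-word response
    max_points: int       # cap on points for overarching responses
    max_examples: int     # cap on 1-point examples (assumed mapping)


# Values taken from the rubric; the category-to-example-cap mapping is an assumption.
RUBRIC = {
    "problems": Category(10, 6, 40, 5),
    "causes": Category(5, 3, 55, 10),
    "effects": Category(5, 3, 30, 6),
    "solutions": Category(10, 6, 90, 5),
}


def category_score(cat: Category, full: int, one_word: int, examples: int) -> int:
    """Score one chart column, applying the rubric caps."""
    responses = full * cat.full_points + one_word * cat.one_word_points
    return min(responses, cat.max_points) + min(examples, cat.max_examples)


def chart_score(counts: dict) -> int:
    """counts maps a category name to (full, one_word, examples) tallies."""
    return sum(
        category_score(RUBRIC[name], *counts.get(name, (0, 0, 0)))
        for name in RUBRIC
    )


# The caps add up to the rubric's stated total of 241 points.
MAX_TOTAL = sum(c.max_points + c.max_examples for c in RUBRIC.values())
```

A call such as `chart_score({"problems": (1, 1, 2)})` would award 10 points for one full response, 6 for one single-word response and 2 for two examples.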
Appendix 2a Reading to Integrate task: integration activity
Directions: For the next 15 minutes, reflect on what you read and compose a short essay that combines the information in the material read and makes connections across the range of ideas and arguments presented.
Appendix 2b Reading to Integrate scoring rubric

Integration

50 Excellent: Integrates texts accurately and successfully on multiple levels and creates a true Documents Model. Generalizes at least two macrostructure concepts common across texts (this may be simply identifying the existence of a macrostructure followed by the supporting macrostructures from each text). Effectively integrates relevant support (i.e., details or macrostructures) to support generalizations and may still discuss each article separately to some degree. Integrates macrostructures present in both texts.

40 Good: Integrates through the creation of a well developed and accurate introduction and/or a conclusion, yet summarizes articles separately AND/OR generalizes one major macrostructure that is fully developed with support. Uses some relevant support.

30 Fair: Attempts to integrate through the creation of a partially developed, possibly inaccurate introductory AND/OR concluding statement, usually related to the main topic of the articles, but has no substantive development. Creates a Text and Situation Model of each text separately. Does not make generalizations for integration beyond the one statement. May make evaluative or editorial statements; however, these may be inaccurate.

20 Poor: Creates a well developed and accurate Text Model and Situation Model for each of the texts, yet attempts no integrative connection across the texts.

10 Very Poor: Ineffectively or inaccurately attempts to summarize each article in a very reporting style, revealing little or no use of background knowledge, contextualization or evaluation. Response is simply a recall of information from the texts. No introduction or conclusion OR participant confuses texts and sees separate texts as one issue (problem).

0 No Response: Participant does not attempt response or only addresses one article.

Macrostructures

Participants will receive one point for each macrostructure accurately identified in each text, with a minimum score of 4 awarded for the inability to accurately identify any macrostructures.

25 Excellent (Total 8): Accurately identifies all 4 macrostructures present in both texts.

20 Good (Total 6–7): Accurately identifies all 4 macrostructures in one text and 3 in the other OR accurately identifies 3 of the four macrostructures in both texts. (Or 4 in one and 2 in the other.)

15 Fair (Total 4–5): Accurately identifies 3 of the four macrostructures in one text and 2 of the four in the other (or 4 in one and 1 in the other) OR accurately identifies 2 of the four macrostructures in both texts. (Or 4 in one and 0 in the other, or 3 in one and 1 in the other.)

10 Poor (Total 2–3): Accurately identifies 2 of the macrostructures in one text and 1 in the other (or 3 in one and 0 in the other). Accurately identifies 1 of the four macrostructures in both texts. (Or 2 in one and 0 in the other.)

5 Very Poor (Total 0–1): Accurately identifies 1 of the four macrostructures in one text and 0 in the other OR unable to accurately identify any macrostructures in the texts.

0 No Response: Participant does not attempt response.

Use of relevant details

5 Excellent: Effectively uses multiple relevant details as support; no irrelevant or erroneous details.

4 Good: Effectively uses relevant details as support; may include some inaccurate or irrelevant details. (More than 50% of details are relevant.)

3 Fair: Possibly uses one or two relevant details, yet several irrelevant details or inaccurate details (erroneous or fabricated) appear in the synthesis. (50% or less of details are relevant.)

2 Poor: Erroneous or fabricated details used in attempt to support arguments presented; does not include relevant details.

1 Very Poor: No details used or listed.

0 No Response: Participant does not attempt response.
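The three components of the Appendix 2b rubric add up to the 80-point maximum reported for Reading to Integrate (50 + 25 + 5). The sketch below shows one way the component scores could be combined; the function names are hypothetical, and the macrostructure bands follow the 'Total' ranges given in the rubric (8, 6–7, 4–5, 2–3, 0–1). It is an illustration under those assumptions, not the raters' actual procedure.

```python
INTEGRATION_LEVELS = {0, 10, 20, 30, 40, 50}
DETAIL_LEVELS = {0, 1, 2, 3, 4, 5}


def macrostructure_points(text_a: int, text_b: int, responded: bool = True) -> int:
    """Convert macrostructures identified per text (0-4 each) to rubric points."""
    if not responded:
        return 0
    if not (0 <= text_a <= 4 and 0 <= text_b <= 4):
        raise ValueError("each text has at most 4 macrostructures")
    total = text_a + text_b
    if total == 8:
        return 25
    if total >= 6:
        return 20
    if total >= 4:
        return 15
    if total >= 2:
        return 10
    return 5  # totals of 0 or 1 still earn the Very Poor band


def synthesis_score(integration: int, text_a: int, text_b: int, details: int) -> int:
    """Sum the three rubric components for a response that was attempted."""
    if integration not in INTEGRATION_LEVELS:
        raise ValueError("integration must be one of the rubric levels")
    if details not in DETAIL_LEVELS:
        raise ValueError("details must be one of the rubric levels")
    return integration + macrostructure_points(text_a, text_b) + details
```

For instance, a response rated Good on integration (40), with 3 and 1 macrostructures identified (15) and Fair use of details (3), would total 58 under this sketch.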
• task novelty: no participant reported ever doing such a task previously;
• time allowed for task completion; and
• space on the response sheet: space constraints may have limited the amount of information that participants could include.

Future research would need to address these issues. However, for the Reading to Integrate measure, the full range of possible point totals was achieved by at least one participant in each group.
1 Computer familiarity
The overall plan for the analyses was to check the influence of the independent variables on the dependent measures, with computer familiarity being addressed first. Initially, we had proposed that if computer familiarity was significantly different across groups, it would be entered into all calculations as a covariate. To determine this, it was necessary to conduct an Analysis of Variance (ANOVA) for computer familiarity across the six participant/medium subgroups.
Table 2 Descriptive statistics for new measures for three participant groups

Group                  n      Mean     sd      kMax

Reading to Learn (chart)
NSU                    105    51.85    19.86   241
NNSU                   106    31.73    19.50   241
NNSG                   40     44.68    19.27   241
Total participants     251    42.21    21.64   241

Basic Comprehension Test 1
NSU                    105    16.98    2.47    20
NNSU                   106    11.73    4.25    20
NNSG                   40     14.98    3.50    20
Total participants     251    14.44    4.23    20

Reading to Integrate (synthesis)
NSU                    101    63.65    11.05   80
NNSU                   103    37.24    21.76   80
NNSG                   40     53.60    21.03   80
Total participants     244    50.86    21.63   80

Basic Comprehension Test 2
NSU                    105    15.91    2.78    20
NNSU                   106    9.75     4.54    20
NNSG                   40     12.85    3.61    20
Total participants     251    12.82    4.72    20

Notes: kMax = number of items or maximum possible score; n size reduced for Reading to Integrate because of anomalous testing session.
The resulting ANOVA (F = 4.70, p < .05) showed a significant difference between subgroups on the computer familiarity questionnaire; therefore, a post hoc Scheffé test was done to locate significant contrasts. After analysis of all possible subgroup contrasts, the post hoc Scheffé revealed that the only significant difference in subgroups appeared between the native speaker undergraduates and nonnative speaker undergraduates who read texts on paper. Hence, although there was one significant contrast, it occurred in two subgroups reading on paper, not in any of the subgroups who read on computer. All groups generally scored high on computer familiarity, although, as noted, variance of the nonnative groups was greater. It was thus established that computer familiarity had no significant effect on participants who read texts on computer, so we did not use computer familiarity as a covariate in further analyses and proceeded to the three research questions of central interest to this study.
Because both Research Questions 1 and 2 are similar (except that they address the two different new reading measures, Reading to Learn and Reading to Integrate), we approached them in the same manner, through ANOVA, to identify the independent variables that could have significantly affected the results on the new measures.
Table 3 Range of scores for new measures for three participant groups

Group                  n      Minimum   Maximum   kMax

Reading to Learn (chart)
NSU                    105    14        120       241
NNSU                   106    0         86        241
NNSG                   40     3         94        241
Total participants     251    0         120       241

Reading to Integrate (synthesis)
NSU                    101    38        80        80
NNSU                   103    0         80        80
NNSG                   40     5         80        80
Total participants     244    0         80        80

Notes: kMax = number of items or maximum possible score; n size reduced for Reading to Integrate because of anomalous testing session.

2 Research Question 1

The first research question asked if performance on a measure of Reading to Learn was affected by medium of presentation, computer familiarity, native language or level of education. We calculated a univariate ANOVA with Type III sums of squares on Reading to Learn, with group status, medium of text presentation and test order as possible contributing factors.⁴ Table 4 shows that there were no significant interactions for any of the group, medium or test order combinations. The only significant main effect was group membership.
Because group membership was a combined measure that included both native language background and level of education, post hoc analysis was needed to identify the significant contrasts. Table 5 shows that there was a significant difference in performance on the Reading to Learn measure between the native speaker undergraduate and the nonnative speaker undergraduate groups, as well as a significant difference between the nonnative speaker undergraduate and nonnative speaker graduate groups. There was no significant difference in performance between the native speaker undergraduate and the nonnative speaker graduate groups. Therefore, the answer to Research Question 1 is that native language background and level of education did have a significant effect on performance on the Reading to Learn measure, but that medium of text presentation did not. Further, order of testing (whether participants took Reading to Learn or Reading to Integrate first) had no significant effect.
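The pattern of contrasts described for Table 5 can be retraced from summary figures alone. The sketch below is an illustration under stated assumptions rather than the SPSS procedure used in the study: it takes the Reading to Learn group means and sizes from Table 2 and the error mean square from Table 4 (all read with two decimal places), computes each pairwise standard error as sqrt(MS_error × (1/n1 + 1/n2)), and applies the usual Scheffé criterion, (difference/SE)² > (k − 1) × F_crit. The critical value 3.03 is an approximate tabled F(.05; 2, 239) supplied here as an assumption, and the function name `scheffe` is hypothetical.

```python
import math

MS_ERROR = 389.11  # error mean square for Reading to Learn (Table 4)
K_GROUPS = 3
F_CRIT = 3.03      # assumed approximate critical F(.05; 2, 239)

# (n, mean) on Reading to Learn, from Table 2
GROUPS = {"NSU": (105, 51.85), "NNSU": (106, 31.73), "NNSG": (40, 44.68)}


def scheffe(a: str, b: str):
    """Return (mean difference, standard error, significant?) for one contrast."""
    n_a, mean_a = GROUPS[a]
    n_b, mean_b = GROUPS[b]
    diff = mean_a - mean_b
    se = math.sqrt(MS_ERROR * (1 / n_a + 1 / n_b))
    significant = (diff / se) ** 2 > (K_GROUPS - 1) * F_CRIT
    return diff, se, significant


for pair in [("NSU", "NNSU"), ("NSU", "NNSG"), ("NNSG", "NNSU")]:
    diff, se, sig = scheffe(*pair)
    print(pair, round(diff, 2), round(se, 2), sig)
```

Under these assumptions only the contrast between native speaker undergraduates and nonnative speaker graduates falls short of the criterion, which agrees with the description above.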
Table 4 Performance on Reading to Learn measure by groups, medium and test order (n = 251) (univariate analysis of variance)

Source                          Type III sum of squares   df    Mean square   F

Group                           21869.29                  2     10934.64      28.10*
Medium                          1215.86                   1     1215.86       3.13
Test order                      294.98                    1     294.98        0.76
Group × medium                  334.81                    2     167.40        0.43
Group × test order              4.37                      2     2.19          0.01
Medium × test order             391.73                    1     391.73        1.01
Group × medium × test order     575.29                    2     287.65        0.74
Error                           92997.45                  239   389.11

Note: *p < .05

⁴ Test order was added as an additional variable to double check that our counterbalancing had been effective in controlling for any practice effect.

3 Research Question 2

The second research question, related to the first, asked if performance on Reading to Integrate was affected by medium of presentation, computer familiarity, native language or level of education. Again, to ensure that counterbalancing of tests controlled for any practice effect, test order was added as an additional variable.
To answer this question, we proceeded to calculate a univariate ANOVA on the Reading to Integrate measure, with group status, medium of text presentation and test order entered as possible contributing factors. The results (Table 6) show, as for Research Question 1, that there were no significant interactions for any of the group, medium or test order combinations; the only significant main effect was group membership. The answer for Research Question 2 is that native language background and educational level had a significant effect on Reading to Integrate, but medium of text presentation did not. Post hoc analysis of group contrasts showed that all three groups were distinct in their performance on Reading to Integrate (see Table 7).
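Each F ratio in an ANOVA table of this kind is the effect's mean square divided by the error mean square. As a hedged check on the Table 6 entries (read here with two decimal places, which is an assumption about the printed table), the sketch below recomputes the ratios; the dictionary names are hypothetical.

```python
MS_ERROR = 325.13  # error mean square for Reading to Integrate (Table 6)

# Mean squares and reported F values, as read from Table 6
MEAN_SQUARES = {
    "group": 18147.17,
    "medium": 192.95,
    "test order": 97.83,
    "group x medium": 5.91,
    "group x test order": 15.07,
    "medium x test order": 1098.72,
    "group x medium x test order": 744.29,
}
REPORTED_F = {
    "group": 55.82,
    "medium": 0.59,
    "test order": 0.30,
    "group x medium": 0.02,
    "group x test order": 0.05,
    "medium x test order": 3.38,
    "group x medium x test order": 2.29,
}

# F = MS_effect / MS_error for every source of variation
F_RATIOS = {source: ms / MS_ERROR for source, ms in MEAN_SQUARES.items()}

for source, f_value in F_RATIOS.items():
    print(f"{source}: F = {f_value:.2f} (reported {REPORTED_F[source]:.2f})")
```

The recomputed ratios agree with the reported values to within rounding, and group membership is the only effect with a large F.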
Table 5 Post hoc Scheffé for Reading to Learn measure (n = 251)

Group   n      Group   n      Mean difference   Standard error

NSU     105    NNSU    106    20.12*            2.72
               NNSG    40     7.17              3.67
NNSU    106    NNSG    40     12.95*            3.66

Note: *p < .05

Table 6 Performance on Reading to Integrate measure by groups, medium and test order (n = 244a) (univariate analysis of variance)

Source                          Type III sum of squares   df    Mean square   F

Group                           36294.33                  2     18147.17      55.82b
Medium                          192.95                    1     192.95        0.59
Test order                      97.83                     1     97.83         0.30
Group × medium                  11.82                     2     5.91          0.02
Group × test order              30.14                     2     15.07         0.05
Medium × test order             1098.72                   1     1098.72       3.38
Group × medium × test order     1488.58                   2     744.29        2.29
Error                           75430.37                  232   325.13

Notes: a n size reduced for Reading to Integrate because of anomalous testing session; b p < .05

4 Research Question 3

The third research question asked to what extent measures of basic comprehension, Reading to Learn and Reading to Integrate were related. We used correlational analysis as the first step in answering this question. Results for the total participant population (see Table 8) showed moderate to high correlations across all reading measures. However, the analyses done for Research Questions 1 and 2 revealed that group status had a significant effect on performance on Reading to Learn and Reading to Integrate. Further, we realize that correlations are sensitive to variance, so the high correlations seen in the total population could have been an artifact of combining the three groups. Therefore, we examined the correlations among all reading measures for each group (available in Trites, 2000: Appendix 1, pp. 230–33). While the reading measures were still correlated, often moderately, sometimes highly, magnitudes differed and sometimes dropped substantially. The text-specific multiple-choice measures, BC1 and BC2, consistently correlated more highly with the Nelson–Denny and TOEFL Reading Comprehension tests than with Reading to Learn and Reading to Integrate based on the same texts, suggesting a test method or construct effect. Because comparisons between different measures of basic comprehension were not a goal of the project, BC1 and BC2 were not used in further analyses. We conclude that, as expected, all reading measures were related, but the lower correlations between Reading to Learn and Reading to Integrate and the traditional basic comprehension measures led us to consider further types of analysis to identify the possible distinctiveness of the new measures.
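The concern that a pooled correlation can be an artifact of combining groups with different means can be illustrated with a toy example. The data below are synthetic and invented purely for illustration (they are not study data): within each of two hypothetical groups the two scores are uncorrelated, yet the pooled correlation is very high because the groups differ in overall level.

```python
import math


def pearson(pairs):
    """Pearson correlation for a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    return cov / math.sqrt(var_x * var_y)


# Synthetic scores on two measures for a low-scoring and a high-scoring group
LOW_GROUP = [(0, 1), (1, 0), (2, 1)]
HIGH_GROUP = [(10, 11), (11, 10), (12, 11)]

r_low = pearson(LOW_GROUP)
r_high = pearson(HIGH_GROUP)
r_pooled = pearson(LOW_GROUP + HIGH_GROUP)

print(r_low, r_high, r_pooled)
```

Examining correlations separately within each group, as described above, guards against reading this kind of between-group difference as a relationship between the measures themselves.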
Table 7 Post hoc Scheffé for Reading to Integrate measure (n = 244a)

Group   n      Group   n      Mean difference   Standard error

NSU     101    NNSU    103    26.41b            2.53
               NNSG    40     10.05b            3.37
NNSU    103    NNSG    40     16.36b            3.36

Notes: a n size reduced for Reading to Integrate because of anomalous testing session; b p < .05

Table 8 Correlations for all reading measures for all participants (n = 251)

                               TOEFL Reading    Basic            Basic            Reading     Reading to
                               Comprehension    Comprehension    Comprehension    to Learn    Integrate(a)
                                                Test 1           Test 2

Nelson–Denny                   .90b             .85b             .84b             .66b        .69b
TOEFL Reading Comprehension    1.00             .85b             .84b             .64b        .69b
Basic Comprehension 1                           1.00             .84b             .68b        .68b
Basic Comprehension 2                                            1.00             .68b        .70b
Reading to Learn                                                                  1.00        .59b

Notes: a n size reduced for Reading to Integrate because of anomalous testing session; b p < .05

5 Discriminant analysis

Because we were interested in determining how constructs differed, we sought additional analyses to help us better characterize the new constructs. Of the several possible statistical methods that could have been employed, two are most plausible: multivariate analysis of variance, usually associated with experimental research, and discriminant analysis, usually associated with descriptive research (Tabachnick and Fidell, 1996). The present research was conducted with samples of naturally occurring student groups and was not experimental. Moreover, we were interested in finding ways to compare participant performance on the measures of the new constructs, Reading to Learn and Reading to Integrate, with performance on more traditional measures of basic comprehension. Thus, we opted to use discriminant analysis because of its parsimony of description and clarity of interpretation (Stevens, 1996). Discriminant analysis, a technique recommended to describe group differences or predict group membership based on a comparison of multiple predictors (Huberty, 1994), has been used in other areas of applied linguistic research to investigate creation of a student profile of success or failure on Computer Assisted Language Learning (CALL) lessons (Jamieson et al., 1993) and accurate classification of text types into registers (Biber, 1993), among other purposes.
To further distinguish basic comprehension from the new constructs, we conducted discriminant analysis on each of the two language groups (native and nonnative) to determine whether Reading to Learn and Reading to Integrate would classify participants in the same way that Basic Comprehension would. We divided the native speaker and nonnative speaker groups into three levels, high, middle (mid) and low reading ability scorers, based on the basic comprehension measure chosen for that group (Nelson–Denny for native speakers; TOEFL Reading Comprehension for nonnative speakers). Research methodologists (Tabachnick and Fidell, 1996: 513) note that robustness is expected when the smallest group has at least 20; our smallest group was 25. To check the assumption of homogeneity of the variance/covariance matrices, we examined the outcomes of Box's M Test and found them all nonsignificant (Klecka, 1980). Each group was checked for outliers using Mahalanobis distance, treated as Chi-Square, and no outliers were found (Tabachnick and Fidell, 1996). Thus, the data met all assumptions required for use of discriminant analysis.
a Discriminant analysis for nonnative speakers: To organize the discriminant analysis in order to see if the Reading to Learn/Reading to Integrate Composite classified participants similarly to the measure of Basic Comprehension (for nonnative speakers, TOEFL Reading Comprehension), we divided the entire nonnative speaker group (n = 146) into three levels of basic comprehension: high, mid and low. These reading ability groups of high, mid and low were based on typical TOEFL Reading Comprehension score levels required for program entry. Participants were classified as high if their scores were greater than or equal to 550, or 56 and above on the scaled score on the TOEFL Reading Comprehension (550 is the cut score often used for graduate entry). Participants were classified as mid if scores ranged between 500 and 549, or 50 to 55 on the TOEFL Reading Comprehension. Participants were classified as low if their scores fell below 500, or 49 and below on TOEFL Reading Comprehension (500 is a minimum TOEFL score sometimes used for undergraduate admission, often with the proviso that students enroll in ESL classes either prior to official enrollment or concurrently). For our entire nonnative speaker group, descriptive statistics on basic comprehension reading ability group membership levels appear in Table 9.
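The grouping rule just described reduces to two cut points on the TOEFL Reading Comprehension scaled score. A minimal sketch follows; the function name `reading_ability_group` is hypothetical.

```python
def reading_ability_group(scaled_score: int) -> str:
    """Classify a TOEFL Reading Comprehension scaled score as high, mid or low."""
    if scaled_score >= 56:   # 56 and above (total score of 550 or more)
        return "high"
    if scaled_score >= 50:   # 50 to 55 (total score of 500 to 549)
        return "mid"
    return "low"             # 49 and below (total score under 500)


print([reading_ability_group(s) for s in (60, 56, 55, 50, 49, 42)])
```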
We ran SPSS Discriminant Analysis with initial grouping variables of high, mid and low reading ability. We compared initial reading ability levels with high, mid and low categories on the Reading to Learn/Reading to Integrate Composite, a new variable reflecting level of performance on the Reading to Learn and Reading to Integrate measures combined.5 The discriminant analysis yielded one discriminant function with an eigenvalue of .92, responsible for 99.9% of the variance in outcomes. Wilks' Lambda for this function was .52, significant at .001; the associated Chi-Square value was extremely large (90.93) and highly significant (p < .001), indicating the group centroids on the composite Reading to Learn/Reading to Integrate function for the three nonnative speaker reading ability groups were significantly different. Both Reading to Learn and Reading to Integrate loaded significantly on the discriminant function, at .86 for Reading to Learn and .78 for Reading to Integrate (p < .05).
Table 9 Descriptive statistics for nonnative speakers by TOEFL reading comprehension reading ability groups (n = 143)
Reading ability group    n      Mean on TOEFL reading comprehension    sd
High (≥56)               57     59.68                                  3.03
Mid (50–55)              42     52.81                                  1.67
Low (≤49)                44     42.11                                  5.47
Total                    143    52.26                                  8.22
Note: Total n = 143 due to loss of three cases in anomalous Reading to Integrate session.
5 We first calculated two separate discriminant analyses, one for Reading to Learn and one for Reading to Integrate, but we found that both loaded on a single function, so we used the composite in subsequent analyses.
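The reported statistics for the nonnative speaker analysis can be cross-checked against one another. The sketch below assumes that Wilks' Lambda is the product of 1 / (1 + eigenvalue) over the functions tested, and that the chi-square is Bartlett's approximation; the source reports SPSS output but does not state the formulas, so both are assumptions here. The second function's eigenvalue is not reported and is treated as negligible, since the first function carries 99.9% of the variance. Function names are hypothetical.

```python
import math


def wilks_lambda(eigenvalues):
    """Wilks' Lambda as the product of 1 / (1 + eigenvalue) over the functions tested."""
    result = 1.0
    for eigenvalue in eigenvalues:
        result *= 1.0 / (1.0 + eigenvalue)
    return result


def bartlett_chi_square(lam, n_cases, n_predictors, n_groups):
    """Bartlett's chi-square approximation for a given Wilks' Lambda (assumed formula)."""
    multiplier = n_cases - 1 - (n_predictors + n_groups) / 2.0
    return -multiplier * math.log(lam)


# Nonnative speakers: eigenvalue .92, n = 143, two predictors, three groups.
lam = wilks_lambda([0.92])
chi2 = bartlett_chi_square(lam, n_cases=143, n_predictors=2, n_groups=3)
degrees_of_freedom = 2 * (3 - 1)  # predictors x (groups - 1)
```

Under these assumptions Lambda rounds to .52 and the chi-square comes out at about 91, close to the reported 90.93; the small gap is consistent with the eigenvalue having been rounded to two decimals.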
Over half (64.9%) of the high reading ability group remained high on the new measure; less than half (42.9%) of the mid reading ability group remained classified as mid. Hence, the Reading to Learn/Reading to Integrate Composite was particularly influential in reclassifying the mid reading ability group and, to a lesser extent, the high group. However, most (81.8%) of the low reading ability group remained low on the Reading to Learn/Reading to Integrate Composite (see Table 10). Of the 143 nonnative speaker participants, 91 (64%) remained in the initial basic comprehension category on the composite; the rest moved, but in different directions. Twenty-one participants (14.7%) were classified into a higher category on the Reading to Learn/Reading to Integrate Composite than their initial basic comprehension level would have suggested, while 31 (21.7%) were reclassified into a lower category. Thus, 52 participants (36.4%), just over one third of the sample, were classified differently based on their Reading to Learn/Reading to Integrate Composite performance.
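The counts of participants who stayed, moved up, or moved down follow directly from the classification counts in Table 10. As a minimal sketch (the helper name `reclassification_summary` is hypothetical), the diagonal of the table gives those who kept their initial category, and the cells on either side give the two directions of movement:

```python
# Rows: initial group (high, mid, low); columns: predicted group (high, mid, low).
# Counts are taken from Table 10 (nonnative speakers, n = 143).
table_10 = [
    [37, 17, 3],   # initial high
    [13, 18, 11],  # initial mid
    [2, 6, 36],    # initial low
]


def reclassification_summary(counts):
    """Count cases that stayed in, moved above, or moved below their initial group."""
    same = higher = lower = 0
    for initial, row in enumerate(counts):
        for predicted, n in enumerate(row):
            if predicted == initial:
                same += n
            elif predicted < initial:  # index 0 is "high", so a smaller index is higher
                higher += n
            else:
                lower += n
    total = same + higher + lower
    return {
        "total": total,
        "same": same,
        "higher": higher,
        "lower": lower,
        "changed": higher + lower,
        "pct_changed": round(100 * (higher + lower) / total, 1),
    }


summary = reclassification_summary(table_10)
```

The same helper can be applied to any three-by-three classification table laid out in this row and column order.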
Table 10 Discriminant analysis comparison of nonnative speaker reading ability groups with Reading to Learn/Reading to Integrate Composite (n = 143)
Predicted group membership for Reading to Learn/Reading to Integrate Composite
Reading ability group     High    Mid     Low     Initial classification total
Count
High (≥56)                37      17      3       57
Mid (50–55)               13      18      11      42
Low (≤49)                 2       6       36      44
Reclassification total    52      41      50      143
Percentage
High (≥56)                64.9    29.8    5.3     100.0
Mid (50–55)               31.0    42.9    26.2    100.0
Low (≤49)                 4.5     13.6    81.8    100.0
Note: Total n = 143 due to loss of three cases in anomalous Reading to Integrate session.
b Discriminant analysis for native speakers: Because one of our goals in this project was to probe the possible validity of these new measures by assessing performance of two groups, native as well as nonnative speakers, we conducted a parallel discriminant analysis for native speakers. Thus, for the native speakers we followed the same procedure, dividing the entire native speaker group (n = 105) into three levels of basic comprehension (high, mid and low) based on score distances of .5 standard deviations from the sample mean of the Nelson–Denny test for these participants. Native speakers need not take reading comprehension tests when entering the university, so the three-way split was based entirely on our sample data. Participants were classified as high if their scores on the Nelson–Denny were greater than or equal to 135. Participants were classified as mid if their scores ranged between 119 and 134 on the Nelson–Denny. Participants were classified as low if their scores fell at or below 118 on the Nelson–Denny. Descriptive statistics for the entire native speaker sample on basic comprehension group membership appear in Table 11.
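The Nelson–Denny cut points can be approximately reproduced from the sample statistics. The sketch below uses the total-row mean and standard deviation from Table 11 (n = 101), whereas the split itself was defined on the full sample of 105, so it is an approximation. Rounding the boundaries to whole scores is also an assumption, and the function name is hypothetical.

```python
# Total-row statistics from Table 11 (n = 101); the original split used n = 105.
mean, sd = 126.04, 16.52

# Boundaries half a standard deviation either side of the mean,
# rounded to whole scores (the rounding rule is an assumption).
low_cut = round(mean - 0.5 * sd)
high_cut = round(mean + 0.5 * sd)


def nelson_denny_group(score: int) -> str:
    """Classify a Nelson-Denny score using the rounded half-sd boundaries."""
    if score > high_cut:
        return "high"
    if score > low_cut:
        return "mid"
    return "low"
```

With these inputs the boundaries land at 118 and 134, which matches the groups stated in the text: low at or below 118, mid from 119 to 134, and high at 135 or above.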
The discriminant analysis yielded one discriminant function with an eigenvalue of .31, responsible for 98.7% of the variance in outcomes. Wilks' Lambda for this function was .76, significant at .001; the associated Chi-Square value was large (26.39) and highly significant (p < .001), indicating the group centroids on the discriminant function for the three reading ability groups on the Reading to Learn/Reading to Integrate Composite were significantly different. Pooled within-groups correlations between discriminating variables showed that Reading to Learn correlated with the first discriminant function at a level of .81; Reading to Integrate correlated with the second discriminant function at .71. This contrasts with findings for the nonnative speakers, where scores on the combined new measures loaded significantly on only one discriminant function. For native speakers, then, there is evidence for two significant discriminant functions, although the first accounts for almost all of the variance. Although these two measures (Reading to Learn and Reading to Integrate) loaded on two separate discriminant functions, they still showed moderate correlations with the alternate function,6 justifying the composite calculations.
Table 11 Descriptive statistics for native speakers by Nelson–Denny reading ability groups (n = 101)
Reading ability group    n      Mean on Nelson–Denny    sd
High (≥135)              39     141.26                  5.19
Mid (119–134)            37     125.65                  5.09
Low (≤118)               25     102.88                  10.98
Total                    101    126.04                  16.52
Note: Total n = 101 for discriminant analysis due to loss of four cases in anomalous Reading to Integrate session.
Results of discriminant analysis for native speakers, seen in Table 12, show a different pattern from that observed for nonnative speakers. Nearly three-fourths (71.8%) of the native speakers classified as high in the basic comprehension reading ability group remained high on the Reading to Learn/Reading to Integrate Composite. For the mid group, however, only 18.9% remained classified as mid. Just over half (52%) of the low reading ability group members remained low on the Reading to Learn/Reading to Integrate Composite. As with the nonnative speakers, participants in the mid category on basic comprehension showed the most frequent reclassification. Forty-eight of the 101 (47.5%) native speaker participants remained in the initial classification categories. Twenty-two (21.8%) were reclassified into a higher category and 31 (30.7%) were reclassified into a lower category. Thus, 53 participants (52.5%), over half of the sample, were classified differently based on their performance on the Reading to Learn/Reading to Integrate Composite.
6 Reading to Learn correlated with function 2 at -.58; Reading to Integrate with function 1 at .71.
Table 12 Discriminant analysis comparison of native speaker reading ability groups with Reading to Learn/Reading to Integrate Composite (n = 101)
Predicted group membership for Reading to Learn/Reading to Integrate Composite
Reading ability group     High    Mid     Low     Initial classification total
Count
High (≥135)               28      3       8       39
Mid (119–134)             10      7       20      37
Low (≤118)                5       7       13      25
Reclassification total    43      17      41      101
Percentage
High (≥135)               71.8    7.7     20.5    100.0
Mid (119–134)             27.0    18.9    54.1    100.0
Low (≤118)                20.0    28.0    52.0    100.0
Note: Total n = 101 for discriminant analysis due to loss of four cases in anomalous Reading to Integrate session.
V Interpretation
Analyses done to answer Research Questions 1 and 2 showed that performance on Reading to Learn and Reading to Integrate measures was significantly influenced by language background and level of education (graduate vs. undergraduate, for nonnative speakers only). In contrast, level of computer familiarity had no significant effect on Reading to Learn and Reading to Integrate performance.
Correlations showed that, as expected, all reading measures correlated to some degree, generally answering Research Question 3. The most interesting results came from the discriminant analyses because they showed a pattern of differential classification on the new tasks, sensitive to initial reading ability. This pattern differed for nonnative speakers and native speakers. Examination of the reclassifications based on the new tasks revealed some of the problems of using a basic comprehension-only test to predict performance on more challenging literacy tasks. These results also imply the need for further work on new measures of advanced literacy skills, such as Reading to Learn and Reading to Integrate, to reflect trends in construct-driven assessment (Pellegrino et al., 1999).
For the nonnative speakers, most (81.8%) of the participants with TOEFL Reading Comprehension below 50 remained classified as low on the Reading to Learn/Reading to Integrate Composite, suggesting the existence of a lower threshold of academic English proficiency. Participants below this threshold were unlikely to perform well on tasks assessing Reading to Learn and Reading to Integrate. Even tasks involving only selection (such as BC1 and BC2) rather than production of responses were difficult for this group. However, for the nonnative speaker readers in the mid and high reading ability groups, basic comprehension level was not nearly as consistent a predictor of performance on the Reading to Learn/Reading to Integrate Composite. This was especially striking for those in the mid reading ability group. Approximately one fourth (26.2%) of the mid group dropped into the low category on the Reading to Learn/Reading to Integrate Composite, indicating that they experienced more difficulty in completing the new measures, but nearly one third (31.0%) could indeed categorize and synthesize information relatively well despite middling performance on basic comprehension. This finding suggests that, for nonnative speaker readers in the mid reading ability category, basic comprehension measures were insufficient to predict their performance on the new measures.
Almost two-thirds of the high basic comprehension nonnative speakers remained high on the Reading to Learn/Reading to Integrate Composite, but one third dropped, suggesting that they could not categorize and synthesize information from texts as well as they could recognize information. For this third, the recognition-only task (multiple-choice basic comprehension) overestimated ability to manipulate and synthesize information. For the high and mid reading ability nonnative speaker groups, basic comprehension-only tests may then overestimate academic English proficiency.
Results for the native speakers showed a more definitive pattern of reclassification. For the low reading ability group, approximately half were misclassified, suggesting that the basic comprehension measure underrepresented their ability to categorize and synthesize information in academic texts. Most of the mid reading ability group were reclassified as lower on the Reading to Learn/Reading to Integrate Composite, showing that basic comprehension results overestimated ability to succeed on more challenging tasks. On the other hand, almost one third of the mid reading ability group did better on the Reading to Learn/Reading to Integrate Composite; for them, basic comprehension underrepresented their ability to complete more challenging reading tasks. For the high ability readers, nearly one third dropped on the Reading to Learn/Reading to Integrate Composite. For them, the basic comprehension measure overestimated their level of academic English proficiency. The other two thirds of the high reading ability group remained high, suggesting a higher threshold of academic English proficiency.
Considering results from both the nonnative speakers and native speakers, we conclude that the new tasks did assess something different from basic comprehension once a lower level threshold of basic academic English proficiency had been achieved. Examination of scatterplots based on discriminant functions indicated an obvious separation between participants who could and could not perform on the Reading to Learn/Reading to Integrate Composite, thus providing some tentative evidence for concurrent validity (Messick, 1989; Chapelle, 1999). We had hoped to find clear evidence of a hierarchy, suggesting that Reading to Learn was demonstrably more difficult than basic comprehension and Reading to Integrate demonstrably more difficult than Reading to Learn, but results did not yield an obvious hierarchy. Results did, however, suggest an even simpler pattern: a dichotomy. For those nonnative speakers above the lower threshold of English language proficiency, which in our data would be approximately 500, or 49 on the scaled TOEFL Reading Comprehension scores, some could perform well on the new measures and some could not, revealing two rather than three groups on the Reading to Learn/Reading to Integrate Composite. These results lead us to speculate that the new measures tap additional skills, such as 'sophisticated discourse processes and critical thinking skills' (Enright and Schedl, 1999: 24), in addition to language proficiency. For native speakers, the discriminant loading suggested a possible hierarchy of difficulty, but the scatterplots and classification tables revealed a dichotomy. Reclassification based on the Reading to Learn/Reading to Integrate Composite nearly eliminated the mid basic comprehension reading ability category, with only 16.8% of the native speakers qualifying as mid on the Reading to Learn/Reading to Integrate Composite.
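The share of native speakers placed in each predicted category follows from the reclassification totals in Table 12. A minimal sketch, with illustrative variable names, shows how thin the mid category becomes on the composite:

```python
# Reclassification totals (predicted group sizes) from Table 12, native speakers.
predicted_totals = {"high": 43, "mid": 17, "low": 41}

n = sum(predicted_totals.values())

# Percentage of the sample predicted into each category, to one decimal place.
shares = {
    group: round(100 * count / n, 1)
    for group, count in predicted_totals.items()
}
```

The mid share works out to the 16.8% cited in the text, with the remaining participants split almost evenly between high and low, which is the two-group pattern the authors describe.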
VI Conclusions
While we acknowledge that there are some limitations to this project, such as the time required to administer and score new measures, the results are illuminating and useful, and they suggest some considerations for future research. The first consideration relates to appropriate classification of students based on academic reading abilities. This corresponds to the use of TOEFL as a gatekeeper by many institutions. For TOEFL test-takers whose current TOEFL scores would be in an intermediate to high intermediate range, the new tasks could assess additional abilities relevant to academic performance. For such participants, additional test development efforts are warranted. The majority of high reading ability nonnative speakers (64.9%) could perform the new measures; a third could not. Our results suggest that some unknown number of mid reading ability nonnative speakers are more capable of succeeding at more challenging academic reading tasks than their current level of basic comprehension assessment would indicate. At the same time, there are some high and mid reading ability participants who could not perform the more demanding tasks; admitting such students directly into university programs could result in failure. These results suggest that, for most students at lower levels of basic comprehension (in our study, those with TOEFL scores below 490 to 500), development of new tasks is unnecessary. Nevertheless, our results indicate that a certain small percentage (in our sample, 8 students, 18.2%) of nonnative speakers classified as low reading ability on basic comprehension did in fact perform adequately on more challenging tasks like Reading to Learn and Reading to Integrate. Given the very large number of TOEFL test-takers around the world, this finding merits further research. It would be potentially unfair to exclude such students from university study based on basic comprehension-only measures such as the current TOEFL. We would urge institutions that use TOEFL to consider scores on these new, more demanding tasks if doing so would address institutional needs. According to interview data collected in conjunction with this project (Trites, 2000: Chapter 6), participants perceived that successful completion of the new tasks required a thorough understanding of the texts, whereas the multiple-choice tests were less demanding, requiring only superficial grasp of content. This finding corroborates the observation of Freedle and Kostin (1999: 3), who note that often examinees do not need to comprehend the accompanying text to answer the test item. Therefore, for all TOEFL test-takers, incorporating more challenging tasks such as the Reading to Learn and Reading to Integrate measures into typical English language instruction could have positive washback effects.
The second area of consideration is the focus on additional relevant research. For predictive validity, it is important to determine the correlation between the Reading to Learn/Reading to Integrate Composite measures and actual academic performance in university classes. Correlations might differ depending on the degree of categorization and synthesis required by different major fields of study. Another area for future research is related to the novelty of the Reading to Learn task as an assessment technique. Current pedagogical trends in literacy instruction emphasize the use of graphic organizers as a means to understand and manipulate information in texts. To our knowledge, graphic organizers are typically used as classroom activities rather than assessment tools. They have good potential for use in assessment if students are familiar with them and if appropriate scoring systems can be developed. This project demonstrates that it is possible, though labor intensive, to develop reliable scoring systems. Further research is needed to explore refinement of this task type and related scoring systems for use in testing programs with large numbers of test-takers and scorers. Although the Reading to Integrate task (generating a synthesis) was more familiar, the scoring system was innovative because it reflected a reader's ability to recognize textual frames as well as integrate information. If test developers are interested in the abilities of nonnative speaker readers to perform such tasks, further development of similar tasks and scoring systems is warranted. While a substantial investment of time would be required to refine the administration and scoring systems needed for more complex tasks such as these, it should be weighed against the possible danger of under-representing the likelihood of academic success based on results of the basic comprehension-only measures still most often used to assess academic proficiency.
Acknowledgements
This project was funded by a grant from Educational Testing Service as part of the TOEFL 2000 effort. We appreciate the cooperation of the ETS staff members who assisted in the development of passage-specific tasks and other aspects of the research; however, no official endorsement of Educational Testing Service should be inferred.
VII References
Bachman, L. 2000: Modern language testing at the turn of the century: assuring that what we count counts. Language Testing 17, 1–42.
Biber, D. 1993: Using register-diversified corpora for general language studies. Computational Linguistics 19, 219–41.
Britt, M., Rouet, J. and Perfetti, C. 1996: Using hypertext to study and reason about historical evidence. In Rouet, J., Levonen, J., Dillon, A. and Spiro, R., editors, Hypertext and cognition. Mahwah, NJ: Lawrence Erlbaum, 43–72.
Chapelle, C. 1999: Validity in language assessment. Annual Review of Applied Linguistics 19, 1–19.
Educational Testing Service 1997: TOEFL test and score manual. Princeton, NJ: Educational Testing Service.
Educational Testing Service 1998: Draft TOEFL 2000 research agenda framework: areas of research. Research agenda three. TOEFL 2000 internal document. Princeton, NJ: Educational Testing Service.
Eignor, D., Taylor, C., Kirsch, I. and Jamieson, J. 1998: Development of a scale for assessing the level of computer familiarity of TOEFL examinees. TOEFL Research Report No. 60. Princeton, NJ: Educational Testing Service.
Enright, M. and Schedl, M. 1999: Reading for a reason: using reader purpose to guide test design. TOEFL 2000 Internal Report. Princeton, NJ: Educational Testing Service.
Enright, M., Grabe, W., Mosenthal, P., Mulcahy-Ernt, P. and Schedl, M. 1998: A TOEFL 2000 framework for testing reading comprehension: a working paper. Princeton, NJ: Educational Testing Service.
Foltz, P. 1996: Comprehension, coherence and strategies in hypertext. In Rouet, J., Levonen, J., Dillon, A. and Spiro, R., editors, Hypertext and cognition. Mahwah, NJ: Lawrence Erlbaum, 109–36.
Freedle, R. and Kostin, I. 1999: Does the text matter in a multiple-choice test of comprehension? The case for the construct validity of TOEFL's minitalks. Language Testing 16, 2–32.
Goldman, S. 1997: Learning from text: reflections on the past and suggestions for the future. Discourse Processes 23, 357–98.
Hayes, J. and Hatch, J. 1999: Issues in measuring reliability: correlation versus percentage of agreement. Written Communication 16, 354–67.
Huberty, C. 1994: Applied discriminant analysis. New York: John Wiley.
Jamieson, J., Campbell, J., Norfleet, L. and Berbisada, N. 1993: Reliability of a computerized scoring routine for an open-ended task. System 21, 305–22.
Jamieson, J., Norfleet, L. and Berbisada, N. 1993: Successes, failures and dropouts in computer-assisted language lessons. Computer Assisted English Language Learning Journal 4, 12–20.
Klecka, W. 1980: Discriminant analysis. In Lewis-Beck, M., editor, Quantitative applications in the social sciences, Volume 19. Newbury Park, CA: Sage.
Lehto, M., Zhu, W. and Carpenter, B. 1995: The relative effectiveness of hypertext and text. International Journal of Human-Computer Interaction 7, 293–313.
McNamara, D. and Kintsch, W. 1996: Learning from texts: effects of prior knowledge and text coherence. Discourse Processes 22, 247–88.
Messick, S. 1989: Validity. In Linn, R.L., editor, Educational measurement, 3rd edition. New York: American Council on Education/Macmillan, 13–103.
Meyer, B. 1985a: Prose analysis: purposes, procedures and problems. In Britton, B. and Black, J., editors, Understanding expository text. Hillsdale, NJ: Lawrence Erlbaum, 11–64.
Meyer, B. 1985b: Prose analysis: purposes, procedures and problems, Part 2. In Britton, B. and Black, J., editors, Understanding expository text. Hillsdale, NJ: Lawrence Erlbaum, 269–97.
Monks, V. 1997, August/September: Two views, same waterway. National Wildlife 35, 36–37.
Pellegrino, J., Baxter, G. and Glaser, R. 1999: Addressing the 'two disciplines' problem: linking theories of cognition and learning with assessment and instructional practice. Review of Research in Education 24, 307–53.
Perfetti, C. 1997: Sentences, individual differences and multiple texts: three issues in text comprehension. Discourse Processes 23, 337–55.
Perfetti, C., Britt, M.A. and Georgi, M. 1995: Text-based learning and reasoning: studies in history. Hillsdale, NJ: Lawrence Erlbaum.
Perfetti, C., Marron, M. and Foltz, P. 1996: Sources of comprehension failure: theoretical perspectives and case studies. In Cornoldi, C. and Oakhill, J., editors, Reading comprehension difficulties: processes and intervention. Mahwah, NJ: Lawrence Erlbaum, 137–65.
Reinking, D. 1988: Computer-mediated text and comprehension differences: the role of reading time, reader preference and estimation of learning. Reading Research Quarterly 23, 484–500.
Reinking, D. and Schreiner, R. 1985: The effects of computer-mediated text on measures of reading comprehension and reading behavior. Reading Research Quarterly 20, 536–53.
Spivey, N. 1997: The constructivist metaphor: reading, writing and the making of meaning. San Diego, CA: Academic Press.
Stevens, J. 1996: Applied multivariate statistics for the social sciences, 3rd edition. Mahwah, NJ: Lawrence Erlbaum.
Tabachnick, B. and Fidell, L. 1996: Using multivariate statistics, 3rd edition. New York: Harper Collins.
Taylor, C., Jamieson, J., Eignor, D. and Kirsch, I. 1998: The relationship between computer familiarity and performance on computer-based TOEFL test tasks. TOEFL Research Report No. 61. Princeton, NJ: Educational Testing Service.
Tennesen, M. 1997, November/December: On a clear day. National Parks 71, 26–9.
Trites, L. 2000: Beyond basic comprehension: reading to learn and reading to integrate for native and non-native speakers. Unpublished doctoral dissertation, Northern Arizona University, Flagstaff, AZ.
Van den Berg, S. and Watt, J. 1991: Effects of educational setting on student responses to structured hypertext. Journal of Computer-Based Instruction 18, 118–24.
Van Dijk, T.A. and Kintsch, W. 1983: Strategies of discourse comprehension. New York: Academic Press.
Wiley, J. and Voss, J. 1999: Constructing arguments from multiple sources: tasks that promote understanding and not just memory for text. Journal of Educational Psychology 91, 310–11.
Zimmerman, T. 1997, December 29: Filter it with billions and billions of oysters: how to revive the Chesapeake Bay. U.S. News and World Report 123, 63.
Appendix 1a Chart completion task
Directions: Complete the following chart. Fill in as much detail as possible from the text read by categorizing the information into the different areas on the chart. Do not use single words for your responses; form your responses in phrases or complete sentences. Include examples from the text.
Make no judgments about the accuracy of causes or effects or the effectiveness of the solutions mentioned in the text. Solutions are seen as any action taken in response to the problem(s). Solutions can take many forms, such as proposed solutions, attempted solutions, or failed solutions. Also, space is provided under each category for examples. Examples are specific examples found in the text that are used by the author(s) to exemplify the problems, causes, effects, or solutions in the text. However, there may not be examples for every category.
Points will be awarded for correct responses only. There is no penalty for incorrect responses. Points will be awarded in the following manner:
Problems and Solutions: 10 points each
Causes and Effects: 5 points each
Examples: 1 point each
Problems      Causes        Effects       Solutions
Examples      Examples      Examples      Examples
Appendix 1b Reading to Learn scoring rubric (0–241 total possible)
Problems (10 points; 1 word: 6 points; 40 maximum)
• Air pollution/smog in the National Parks
• Opposition or ignoring (or lack of cooperation) to environmental standards
• No true 'point sources' (myriad of smaller pollution sources rather than one large source)
• National Parks limited jurisdiction of pollution sources outside of the parks
Causes (5 points; 1 word: 3 points; 55 maximum)
• Sulfur/nitrogen emissions
• Acid rain
• Ground-level ozone
• Urban/industrial emissions
• Automobile emissions
• Power plants/factories/plants emissions
• Emissions from smokestacks (chimneys)
• Emissions from kilns
• Unregulated pollution/smog (from other countries, Mexico)
• Smoke from controlled burns
• Emissions from large cities
Effects (5 points; 1 word: 3 points; 30 maximum)
• Trees and plants affected (injured), growth hindered
• Visibility decreased
• Metals loosened into waters (surface/ground water polluted)
• Nutrients removed (leached) from soil and/or plants
• Public outcry over stance of big business/government
• Aquatic life injured (damaged)
Solutions (10 points; 1 word: 6 points; 90 maximum)
• Stricter environmental laws (resolutions, acts)
• 1977 Clean Air Act Amendments labeling NPs as Class I areas (regulations)
• Regional haze regulations proposed by the EPA
• Reduction of allowable pollution standards from industry
• Clean Air Act set visibility goals
• Objecting to construction permits
• Identification of pollution sources
• Industry modifications/installation of devices (scrubbers) to reduce pollution
• Public pressure to protect parks
Appendix 1b (continued)
Problems: Examples (1 point), 5 maximum
• Great Smoky Mountain National Park
• Grand Canyon National Park
• Smoky Mountain sources from Ohio, New York, Atlanta, etc.
• Grand Canyon sources from CA, NV, UT, AZ, NM and Mexico
Causes: Examples (1 point, or 5 points with overarching cause listed above)
• Automobile emissions
• Power plants/factories/plants emissions
• Emissions from smokestacks (chimneys)
• Emissions from kilns
• Unregulated pollution/smog (from other countries, Mexico)
• Smoke from controlled burns
• Emissions from large cities
Causes: Examples (1 point), 10 maximum
• Navajo Generating Station, Page, AZ (Power Plant, AZ)
• Power plants in TN and OH river valley
• Southern California Edison Plant, Laughlin, NV
• Tennessee Luttrell Kilns, TN
• Industry in AZ
• Cars in CA
• Smokestacks in NM, NV, UT
• Los Angeles
• Atlanta
• New York
Effects: Examples (1 point), 6 maximum
• Leaves of plants turning purple and brown (stippling)
• Hindering photosynthesis of plants and trees
• Reduction of visibility at Grand Canyon
• Visibility reduced 90% of the days
• See vague blue masses
• See half as far as in 1919
Solutions: Examples (1 point, or 10 points with overarching solution listed above)
• 1977 Clean Air Act Amendments labeling NPs as Class I areas
• Regional haze regulations proposed by the EPA
• Reduction of allowable pollution standards from industry
• TN Luttrell corp. building permit situation
Solutions: Examples (1 point), 5 maximum
• Objecting to kilns in TN
• Researchers using scientific technology to ID sources (radioactive isotopes)
• Southern California Edison Plant in Laughlin, NV identified as point source
• Scrubbers installed at Navajo Generating Plant
• 10% reduction of pollution in 10–15 years (goal of no man-made pollution)
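The 241-point ceiling of the Reading to Learn rubric can be checked by adding the category maxima to the example maxima. The sketch below is illustrative only: the assignment of the 10-point and 5-point example caps to Causes and Solutions is a reading of the appendix layout, and treating one-word answers as partial credit is an interpretation of the '1 word' entries. All names are hypothetical.

```python
# Per-category point values and caps as read from the rubric headers.
RUBRIC = {
    "problems": {"full": 10, "one_word": 6, "max": 40},
    "causes": {"full": 5, "one_word": 3, "max": 55},
    "effects": {"full": 5, "one_word": 3, "max": 30},
    "solutions": {"full": 10, "one_word": 6, "max": 90},
}

# Caps on 1-point examples; the column assignment is an assumption.
EXAMPLE_MAX = {"problems": 5, "causes": 10, "effects": 6, "solutions": 5}


def category_score(category, n_full, n_one_word=0):
    """Score one chart column, capped at that column's maximum."""
    rule = RUBRIC[category]
    raw = n_full * rule["full"] + n_one_word * rule["one_word"]
    return min(raw, rule["max"])


total_possible = (
    sum(rule["max"] for rule in RUBRIC.values()) + sum(EXAMPLE_MAX.values())
)
```

For instance, `category_score("causes", 2, 1)` gives 13 points for two full responses and one single-word response. The category caps sum to 215 and the example caps to 26, which together give the 241 stated in the rubric title.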
Appendix 2a Reading to Integrate task: integration activity
Directions: For the next 15 minutes, reflect on what you read and compose a short essay that combines the information in the material read and makes connections across the range of ideas and arguments presented.
[Ruled lines provided for the participant's essay response]
208 New tasks for reading comprehension tests
Appendix 2b Reading to Integrate scoring rubric

Integration
50 Excellent: Integrates texts accurately and successfully on multiple levels and creates a true Documents Model. Generalizes at least two macrostructure concepts common across texts (this may be simply identifying the existence of a macrostructure followed by the supporting macrostructures from each text). Effectively integrates relevant support (i.e., details or macrostructures) to support generalizations and may still discuss each article separately to some degree. Integrates macrostructures present in both texts.
40 Good: Integrates through the creation of a well developed and accurate introduction and/or a conclusion, yet summarizes articles separately, AND/OR generalizes one major macrostructure that is fully developed with support. Uses some relevant support.
30 Fair: Attempts to integrate through the creation of a partially developed, possibly inaccurate introductory AND/OR concluding statement, usually related to the main topic of the articles, but has no substantive development. Creates a Text and Situation Model of each text separately. Does not make generalizations for integration beyond the one statement. May make evaluative or editorial statements; however, these may be inaccurate.
20 Poor: Creates a well developed and accurate Text Model and Situation Model for each of the texts, yet attempts no integrative connection across the texts.
10 Very Poor: Ineffectively or inaccurately attempts to summarize each article in a very reporting style, revealing little or no use of background knowledge, contextualization, or evaluation. Response is simply a recall of information from the texts. No introduction or conclusion, OR participant confuses texts and sees separate texts as one issue (problem).
0 No Response: Participant does not attempt response or only addresses one article.

Macrostructures
Participants will receive one point for each macrostructure accurately identified in each text, with a minimum score of 4 awarded for the inability to accurately identify any macrostructures.
25 Excellent (Total 8): Accurately identifies all 4 macrostructures present in both texts.
20 Good (Total 6–7): Accurately identifies all 4 macrostructures in one text and 3 in the other, OR accurately identifies 3 of the four macrostructures in both texts (or 4 in one and 2 in the other).
15 Fair (Total 4–5): Accurately identifies 3 of the four macrostructures in one text and 2 of the four in the other (or 4 in one and 1 in the other), OR accurately identifies 2 of the four macrostructures in both texts (or 4 in one and 0 in the other, or 3 in one and 1 in the other).
10 Poor (Total 2–3): Accurately identifies 2 of the macrostructures in one text and 1 in the other (or 3 in one and 0 in the other). Accurately identifies 1 of the four macrostructures in both texts (or 2 in one and 0 in the other).
5 Very Poor (Total 0–1): Accurately identifies 1 of the four macrostructures in one text and 0 in the other, OR unable to accurately identify any macrostructures in the texts.
0 No Response: Participant does not attempt response.
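The macrostructure bands above reduce to a lookup on the combined count of macrostructures identified across the two texts. The following is a minimal sketch of that lookup, not the authors' scoring procedure; the function name and the `attempted` flag are hypothetical, and the band boundaries assume the totals read as 8, 6–7, 4–5, 2–3, and 0–1.

```python
def macrostructure_score(found_text1, found_text2, attempted=True):
    """Map macrostructure counts (0-4 per text) to the rubric's band score.

    Hypothetical helper: the names and the `attempted` flag are illustrative
    assumptions, not part of the published rubric.
    """
    if not attempted:
        return 0  # "No Response" row
    for count in (found_text1, found_text2):
        if not 0 <= count <= 4:
            raise ValueError("each text has four macrostructures, so counts must be 0-4")
    total = found_text1 + found_text2  # combined total, 0 to 8
    if total == 8:
        return 25  # Excellent
    if total >= 6:
        return 20  # Good
    if total >= 4:
        return 15  # Fair
    if total >= 2:
        return 10  # Poor
    return 5       # Very Poor (total of 0 or 1)
```

Because the bands depend only on the combined total, the parenthetical alternatives in each rubric row (for example, 4 in one text and 2 in the other) fall into the same band as the primary description.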
Use of relevant details
5 Excellent: Effectively uses multiple relevant details as support; no irrelevant or erroneous details.
4 Good: Effectively uses relevant details as support; may include some inaccurate or irrelevant details. (More than 50% of details are relevant.)
3 Fair: Possibly uses one or two relevant details, yet several irrelevant details or inaccurate details (erroneous or fabricated) appear in the synthesis. (50% or less of details are relevant.)
2 Poor: Erroneous or fabricated details used in attempt to support arguments presented; does not include relevant details.
1 Very Poor: No details used or listed.
0 No Response: Participant does not attempt response.
The resulting ANOVA (F = 4.70, p < .05) showed a significant difference between subgroups on the computer familiarity questionnaire; therefore, a post hoc Scheffé test was done to locate significant contrasts. After analysis of all possible subgroup contrasts, the post hoc Scheffé revealed that the only significant difference in subgroups appeared between the native speaker undergraduates and nonnative speaker undergraduates who read texts on paper. Hence, although there was one significant contrast, it occurred in two subgroups reading on paper, not in any of the subgroups who read on computer. All groups generally scored high on computer familiarity, although, as noted, variance of the nonnative groups was greater. It was thus established that computer familiarity had no significant effect on participants who read texts on computer, so we did not use computer familiarity as a covariate in further analyses and proceeded to the three research questions of central interest to this study.
Because both Research Questions 1 and 2 are similar – except that they address the two different new reading measures, Reading to Learn and Reading to Integrate – we approached them in the same manner, through ANOVA, to identify the independent variables that could have significantly affected the results on the new measures.
2 Research Question 1
The first research question asked if performance on a measure of Reading to Learn was affected by medium of presentation, computer familiarity, native language, or level of education. We calculated a univariate ANOVA with Type III sums of squares on Reading to Learn with
Table 3 Range of scores for new measures for three participant groups

Group                    n      Minimum   Maximum   kMax
Reading to Learn (chart)
  NSU                    105    14        120       241
  NNSU                   106    0         86        241
  NNSG                   40     3         94        241
  Total participants     251    0         120       241
Reading to Integrate (synthesis)
  NSU                    101    38        80        80
  NNSU                   103    0         80        80
  NNSG                   40     5         80        80
  Total participants     244    0         80        80

Notes: kMax = number of items or maximum possible score; n size reduced for Reading to Integrate because of anomalous testing session.
group status, medium of text presentation, and test order as possible contributing factors (see footnote 4). Table 4 shows that there were no significant interactions for any of the group, medium, or test order combinations. The only significant main effect was group membership.
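The F ratios in Table 4 can be checked from the sums of squares and degrees of freedom, since each F is the effect mean square divided by the error mean square. The sketch below assumes the Table 4 entries are read with two decimal places (for example, a Group sum of squares of 21869.29); the dictionary and function names are illustrative only.

```python
# (Type III sum of squares, df) for each source, transcribed from Table 4.
# ASSUMPTION: table values are read with two decimal places.
TABLE4_EFFECTS = {
    "Group": (21869.29, 2),
    "Medium": (1215.86, 1),
    "Test order": (294.98, 1),
    "Group x medium": (334.81, 2),
    "Group x test order": (4.37, 2),
    "Medium x test order": (391.73, 1),
    "Group x medium x test order": (575.29, 2),
}
ERROR_SS, ERROR_DF = 92997.45, 239


def f_ratios(effects, error_ss, error_df):
    """Return F = (SS_effect / df_effect) / (SS_error / df_error) per source."""
    ms_error = error_ss / error_df
    return {name: (ss / df) / ms_error for name, (ss, df) in effects.items()}


if __name__ == "__main__":
    for source, f_value in f_ratios(TABLE4_EFFECTS, ERROR_SS, ERROR_DF).items():
        print(f"{source:30s} F = {f_value:.2f}")
```

Under that reading, the recomputed ratios agree with the tabled F values to within rounding, which also supports the placement of the decimal points.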
Because group membership was a combined measure that included both native language background as well as level of education, post hoc analysis was needed to identify the significant contrasts. Table 5 shows that there was a significant difference in performance on the Reading to Learn measure between the native speaker undergraduate and the nonnative speaker undergraduate groups, as well as a significant difference between the nonnative speaker undergraduate and nonnative speaker graduate groups. There was no significant difference in performance between the native speaker undergraduate and the nonnative speaker graduate groups. Therefore, the answer to Research Question 1 is that native language background and level of education did have a significant effect on performance on the Reading to Learn measure, but that medium of text presentation did not. Further, order of testing (whether participants took Reading to Learn or Reading to Integrate first) had no significant effect.
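The pattern of significant contrasts can be illustrated with the usual Scheffé criterion for a pairwise comparison: the mean difference divided by its standard error is compared with the square root of (k - 1) times the critical F. The sketch below is an illustration under stated assumptions rather than a reproduction of the authors' SPSS output: the error mean square (389.11) and mean differences are read from Tables 4 and 5 with two decimal places, and the critical value F(.05; 2, 239) is hard-coded as approximately 3.03.

```python
import math

MS_ERROR = 389.11   # error mean square, read from Table 4 (assumed decimal placement)
K_GROUPS = 3        # NSU, NNSU, NNSG
F_CRIT = 3.03       # ASSUMPTION: approximate tabled F(.05; 2, 239), not from the paper

GROUP_N = {"NSU": 105, "NNSU": 106, "NNSG": 40}
MEAN_DIFF = {       # mean differences read from Table 5
    ("NSU", "NNSU"): 20.12,
    ("NSU", "NNSG"): 7.17,
    ("NNSU", "NNSG"): 12.95,
}


def scheffe_contrast(diff, n_a, n_b, ms_error=MS_ERROR, k=K_GROUPS, f_crit=F_CRIT):
    """Return (standard error, significant?) for one pairwise Scheffe contrast."""
    se = math.sqrt(ms_error * (1 / n_a + 1 / n_b))   # SE of the mean difference
    statistic = abs(diff) / se
    critical = math.sqrt((k - 1) * f_crit)           # Scheffe critical value
    return se, statistic > critical


if __name__ == "__main__":
    for (a, b), diff in MEAN_DIFF.items():
        se, significant = scheffe_contrast(diff, GROUP_N[a], GROUP_N[b])
        print(f"{a} vs {b}: diff = {diff:.2f}, SE = {se:.2f}, significant = {significant}")
```

The standard errors recomputed this way match those listed in Table 5 to two decimal places, which suggests the tabled values are standard errors of the mean difference based on the pooled error term.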
3 Research Question 2
The second research question, related to the first, asked if performance on Reading to Integrate was affected by medium of presentation,
Table 4 Performance on Reading to Learn measure by groups, medium, and test order (n = 251) (univariate analysis of variance)

Source                          Type III sum of squares   df    Mean square   F
Group                           21869.29                  2     10934.64      28.10*
Medium                          1215.86                   1     1215.86       3.13
Test order                      294.98                    1     294.98        0.76
Group × medium                  334.81                    2     167.40        0.43
Group × test order              4.37                      2     2.19          0.01
Medium × test order             391.73                    1     391.73        1.01
Group × medium × test order     575.29                    2     287.65        0.74
Error                           92997.45                  239   389.11

Note: *p < .05

4 Test order was added as an additional variable to double check that our counterbalancing had been effective in controlling for any practice effect.
computer familiarity, native language, or level of education. Again, to ensure that counterbalancing of tests controlled for any practice effect, test order was added as an additional variable.
To answer this question, we proceeded to calculate a univariate ANOVA on the Reading to Integrate measure with group status, medium of text presentation, and test order entered as possible contributing factors. The results (Table 6) show, as for Research Question 1, that there were no significant interactions for any of the group, medium, or test order combinations; the only significant main effect was group membership. The answer for Research Question 2 is that native language background and educational level had a significant effect on Reading to Integrate, but medium of text presentation did not. Post hoc analysis of group contrasts showed that all three groups were distinct in their performance on Reading to Integrate (see Table 7).
4 Research Question 3
The third research question asked to what extent measures of basic comprehension, Reading to Learn, and Reading to Integrate were
Table 5 Post hoc Scheffé for Reading to Learn measure (n = 251)

Group   n     Group   n     Mean difference   Standard error
NSU     105   NNSU    106   20.12*            2.72
              NNSG    40    7.17              3.67
NNSU    106   NNSG    40    12.95*            3.66

Note: *p < .05
Table 6 Performance on Reading to Integrate measure by groups, medium, and test order (n = 244 a) (univariate analysis of variance)

Source                          Type III sum of squares   df    Mean square   F
Group                           36294.33                  2     18147.17      55.82 b
Medium                          192.95                    1     192.95        0.59
Test order                      97.83                     1     97.83         0.30
Group × medium                  11.82                     2     5.91          0.02
Group × test order              30.14                     2     15.07         0.05
Medium × test order             1098.72                   1     1098.72       3.38
Group × medium × test order     1488.58                   2     744.29        2.29
Error                           75430.37                  232   325.13

Notes: a = n size reduced for Reading to Integrate because of anomalous testing session; b = p < .05
related. We used correlational analysis as the first step in answering this question. Results for the total participant population (see Table 8) showed moderate to high correlations across all reading measures. However, the analyses done for Research Questions 1 and 2 revealed that group status had a significant effect on performance on Reading to Learn and Reading to Integrate. Further, we realize that correlations are sensitive to variance, so the high correlations seen in the total population could have been an artifact of combining the three groups. Therefore, we examined the correlations among all reading measures for each group (available in Trites, 2000: Appendix 1, pp. 230–33). While the reading measures were still correlated, often moderately, sometimes highly, magnitudes differed and sometimes dropped substantially. The text-specific multiple-choice measures, BC1 and BC2, consistently correlated more highly with the Nelson–Denny and TOEFL Reading Comprehension tests than with Reading to Learn and Reading to Integrate based on the same texts, suggesting a test method or construct effect. Because comparisons between different measures of basic comprehension were not a goal of the project, BC1 and BC2 were not used in further analyses. We conclude that, as expected, all reading measures were related, but the lower correlations between Reading to Learn and Reading to Integrate and the traditional basic comprehension measures led us to consider further types of analysis to identify the possible distinctiveness of the new measures.
5 Discriminant analysis
Because we were interested in determining how constructs differed, we sought additional analyses to help us better characterize the new constructs. Of the several possible statistical methods that could have been employed, two are most plausible: multivariate analysis of variance, usually associated with experimental research,
Table 7 Post hoc Scheffé for Reading to Integrate measure (n = 244 a)

Group   n     Group   n     Mean difference   Standard error
NSU     101   NNSU    103   26.41 b           2.53
              NNSG    40    10.05 b           3.37
NNSU    103   NNSG    40    16.36 b           3.36

Notes: a = n size reduced for Reading to Integrate because of anomalous testing session; b = p < .05
Table 8 Correlations for all reading measures for all participants (n = 251)

                              TOEFL Reading   Basic Comprehension   Basic Comprehension   Reading    Reading to
                              Comprehension   Test 1                Test 2                to Learn   Integrate (a)
Nelson–Denny                  .90 b           .85 b                 .84 b                 .66 b      .69 b
TOEFL Reading Comprehension   1.00            .85 b                 .84 b                 .64 b      .69 b
Basic Comprehension 1                         1.00                  .84 b                 .68 b      .68 b
Basic Comprehension 2                                               1.00                  .68 b      .70 b
Reading to Learn                                                                          1.00       .59 b

Notes: a = n size reduced for Reading to Integrate because of anomalous testing session; b = p < .05
and discriminant analysis, usually associated with descriptive research (Tabachnick and Fidell, 1996). The present research was conducted with samples of naturally occurring student groups and was not experimental. Moreover, we were interested in finding ways to compare participant performance on the measures of the new constructs, Reading to Learn and Reading to Integrate, with performance on more traditional measures of basic comprehension. Thus, we opted to use discriminant analysis because of its parsimony of description and clarity of interpretation (Stevens, 1996). Discriminant analysis, a technique recommended to describe group differences or predict group membership based on a comparison of multiple predictors (Huberty, 1994), has been used in other areas of applied linguistic research to investigate creation of a student profile of success or failure on Computer Assisted Language Learning (CALL) lessons (Jamieson et al., 1993) and accurate classification of text types into registers (Biber, 1993), among other purposes.
To further distinguish basic comprehension from the new constructs, we conducted discriminant analysis on each of the two language groups (native and nonnative) to determine whether Reading to Learn and Reading to Integrate would classify participants in the same way that Basic Comprehension would. We divided the native speaker and nonnative speaker groups into three levels (high, middle (mid), and low reading ability scorers) based on the basic comprehension measure chosen for that group (Nelson–Denny for native speakers; TOEFL Reading Comprehension for nonnative speakers). Research methodologists (Tabachnick and Fidell, 1996: 513) note that robustness is expected when the smallest group has at least 20; our smallest group was 25. To check the assumption of homogeneity of the variance/covariance matrices, we examined the outcomes of Box's M test and found them all nonsignificant (Klecka, 1980). Each group was checked for outliers using Mahalanobis distance treated as Chi-Square, and no outliers were found (Tabachnick and Fidell, 1996). Thus, the data met all assumptions required for use of discriminant analysis.
a Discriminant analysis for nonnative speakers: To organize the discriminant analysis in order to see if the Reading to Learn/Reading to Integrate Composite classified participants similarly to the measure of Basic Comprehension (for nonnative speakers, TOEFL Reading Comprehension), we divided the entire nonnative speaker group (n = 146) into three levels of basic comprehension: high, mid,
and low. These reading ability groups of high, mid, and low were based on typical TOEFL Reading Comprehension score levels required for program entry. Participants were classified as high if their scores were greater than or equal to 550, or 56 and above on the scaled score on the TOEFL Reading Comprehension (550 is the cut score often used for graduate entry). Participants were classified as mid if scores ranged between 500 and 549, or 50 to 55 on the TOEFL Reading Comprehension. Participants were classified as low if their scores fell below 500, or 49 and below on TOEFL Reading Comprehension (500 is a minimum TOEFL score sometimes used for undergraduate admission, often with the proviso that students enroll in ESL classes either prior to official enrollment or concurrently). For our entire nonnative speaker group, descriptive statistics on basic comprehension reading ability group membership levels appear in Table 9.
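The three-way grouping just described amounts to two cut points on the scaled TOEFL Reading Comprehension score. A minimal sketch follows; the function name is hypothetical, and the cut points are those stated in the text (56 and above, 50 to 55, 49 and below).

```python
def toefl_reading_level(scaled_score):
    """Classify a scaled TOEFL Reading Comprehension score as high, mid, or low.

    Hypothetical helper; cut points follow the text: high is 56 and above,
    mid is 50 to 55, and low is 49 and below.
    """
    if scaled_score >= 56:
        return "high"
    if scaled_score >= 50:
        return "mid"
    return "low"
```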
We ran SPSS Discriminant Analysis with initial grouping variables of high, mid, and low reading ability. We compared initial reading ability levels with high, mid, and low categories on the Reading to Learn/Reading to Integrate Composite, a new variable reflecting level of performance on the Reading to Learn and Reading to Integrate measures combined (see footnote 5). The discriminant analysis yielded one discriminant function with an eigenvalue of .92, responsible for 99.9% of the variance in outcomes. Wilks' Lambda for this function was .52, significant at .001; the associated Chi-Square value was extremely large (90.93) and highly significant (p < .001), indicating the group centroids on the composite Reading to Learn/Reading to Integrate function for the three nonnative speaker reading ability groups were significantly different. Both Reading to Learn and
Table 9 Descriptive statistics for nonnative speakers by TOEFL reading comprehension reading ability groups (n = 143)

Reading ability group   n     Mean on TOEFL reading comprehension   sd
High (≥56)              57    59.68                                 3.03
Mid (50–55)             42    52.81                                 1.67
Low (≤49)               44    42.11                                 5.47
Total                   143   52.26                                 8.22

Note: Total n = 143 due to loss of three cases in anomalous Reading to Integrate session.

5 We first calculated two separate discriminant analyses, one for Reading to Learn and one for Reading to Integrate, but we found that both loaded on a single function, so we used the composite in subsequent analyses.
Reading to Integrate loaded significantly on the discriminant function, at .86 for Reading to Learn and .78 for Reading to Integrate (p < .05).
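The reported eigenvalue, Wilks' Lambda, and Chi-Square are linked by standard textbook relations: Lambda is the product of 1/(1 + eigenvalue) over the discriminant functions, and Bartlett's approximation converts Lambda to a Chi-Square statistic. The sketch below illustrates those relations and is not a reproduction of the SPSS computation; it assumes two predictors (Reading to Learn and Reading to Integrate), three groups, and only the one reported eigenvalue, so small rounding differences from the published values are expected.

```python
import math


def wilks_lambda(eigenvalues):
    """Wilks' Lambda as the product of 1 / (1 + eigenvalue) over all functions."""
    lam = 1.0
    for eigenvalue in eigenvalues:
        lam *= 1.0 / (1.0 + eigenvalue)
    return lam


def bartlett_chi_square(wilks, n_cases, n_predictors, n_groups):
    """Bartlett's approximation: chi2 = -[N - 1 - (p + g) / 2] * ln(Lambda)."""
    multiplier = n_cases - 1 - (n_predictors + n_groups) / 2
    chi2 = -multiplier * math.log(wilks)
    df = n_predictors * (n_groups - 1)
    return chi2, df


if __name__ == "__main__":
    # ASSUMPTION: eigenvalue .92 and n = 143 as read from the text; p = 2, g = 3.
    lam = wilks_lambda([0.92])
    chi2, df = bartlett_chi_square(lam, n_cases=143, n_predictors=2, n_groups=3)
    print(f"Lambda = {lam:.2f}, chi-square = {chi2:.2f}, df = {df}")
```

Under these assumptions the recomputed Lambda rounds to the reported .52, and the Chi-Square falls within rounding distance of the reported value.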
Over half (64.9%) of the high reading ability group remained high on the new measure; less than half (42.9%) of the mid reading ability group remained classified as mid. Hence, the Reading to Learn/Reading to Integrate Composite was particularly influential in reclassifying the mid reading ability group and, to a lesser extent, the high group. However, most (81.8%) of the low reading ability group remained low on the Reading to Learn/Reading to Integrate Composite (see Table 10). Of the 143 nonnative speaker participants, 91 (63.6%) remained in the initial basic comprehension category on the composite; the rest moved, but in different directions. Twenty-one participants (14.7%) were classified into a higher category on the Reading to Learn/Reading to Integrate Composite than their initial basic comprehension level would have suggested, while 31 (21.7%) were reclassified into a lower category. Thus, 52 participants (36.4%), just over one third of the sample, were classified differently based on their Reading to Learn/Reading to Integrate Composite performance.
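The counts of participants who stayed, moved up, or moved down can be tallied directly from the classification counts in Table 10. The sketch below treats the table as a 3 × 3 matrix with rows as initial groups and columns as predicted groups, both ordered high, mid, low; the variable and function names are illustrative.

```python
# Rows: initial reading ability group (high, mid, low).
# Columns: predicted group on the composite (high, mid, low).
# Counts transcribed from Table 10.
TABLE10_COUNTS = [
    [37, 17, 3],    # initial high
    [13, 18, 11],   # initial mid
    [2, 6, 36],     # initial low
]


def reclassification_summary(matrix):
    """Return (total, unchanged, moved_higher, moved_lower) for a square count matrix."""
    size = len(matrix)
    total = sum(sum(row) for row in matrix)
    unchanged = sum(matrix[i][i] for i in range(size))
    # Categories are ordered high to low, so a smaller column index than the
    # row index means the participant was placed in a higher category.
    moved_higher = sum(matrix[i][j] for i in range(size) for j in range(i))
    moved_lower = sum(matrix[i][j] for i in range(size) for j in range(i + 1, size))
    return total, unchanged, moved_higher, moved_lower


if __name__ == "__main__":
    total, unchanged, higher, lower = reclassification_summary(TABLE10_COUNTS)
    for label, count in (("unchanged", unchanged), ("higher", higher), ("lower", lower)):
        print(f"{label:10s} {count:4d} ({100 * count / total:.1f}%)")
```

The same function applies unchanged to any other 3 × 3 classification table with the same row and column ordering.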
b Discriminant analysis for native speakers: Because one of our goals in this project was to probe the possible validity of these new measures by assessing performance of two groups, native as well as
Table 10 Discriminant analysis: comparison of nonnative speaker reading ability groups with Reading to Learn/Reading to Integrate Composite (n = 143)

                          Predicted group membership for Reading to
                          Learn/Reading to Integrate Composite
Reading ability group     High     Mid      Low      Initial classification total
Count
  High (≥56)              37       17       3        57
  Mid (50–55)             13       18       11       42
  Low (≤49)               2        6        36       44
  Reclassification total  52       41       50       143
Percentage
  High (≥56)              64.9     29.8     5.3      100.0
  Mid (50–55)             31.0     42.9     26.2     100.0
  Low (≤49)               4.5      13.6     81.8     100.0

Note: Total n = 143 due to loss of three cases in anomalous Reading to Integrate session.
nonnative speakers, we conducted a parallel discriminant analysis for native speakers. Thus, for the native speakers, we followed the same procedure, dividing the entire native speaker group (n = 105) into three levels of basic comprehension (high, mid, and low) based on score distances of .5 standard deviations from the sample mean of the Nelson–Denny test for these participants. Native speakers need not take reading comprehension tests when entering the university, so the three-way split was based entirely on our sample data. Participants were classified as high if their scores on the Nelson–Denny were greater than or equal to 135. Participants were classified as mid if their scores ranged between 119 and 134 on the Nelson–Denny. Participants were classified as low if their scores fell at or below 118 on the Nelson–Denny. Descriptive statistics for the entire native speaker sample on basic comprehension group membership appear in Table 11.
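The native speaker cut scores can be approximated from the sample mean and standard deviation. The sketch below assumes the split was placed half a standard deviation on either side of the mean and uses the Table 11 totals read as a mean of 126.04 and a standard deviation of 16.52; those totals are for n = 101, whereas the cut scores were set on the group of 105, so the match is approximate. Function names are hypothetical.

```python
def half_sd_cut_points(mean, sd, width=0.5):
    """Return (lower, upper) boundaries at `width` standard deviations from the mean."""
    return mean - width * sd, mean + width * sd


def nelson_denny_level(score, low_cut=118, high_cut=135):
    """Classify a Nelson-Denny score using the cut scores stated in the text."""
    if score >= high_cut:
        return "high"
    if score <= low_cut:
        return "low"
    return "mid"


if __name__ == "__main__":
    # ASSUMPTION: mean and sd taken from the Total row of Table 11 (n = 101).
    lower, upper = half_sd_cut_points(126.04, 16.52)
    print(f"boundaries: {lower:.2f} and {upper:.2f}")
```

Boundaries near 118 and 134 are consistent with the stated groups of 118 and below, 119 to 134, and 135 and above.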
The discriminant analysis yielded one discriminant function with an eigenvalue of .31, responsible for 98.7% of the variance in outcomes. Wilks' Lambda for this function was .76, significant at .001; the associated Chi-Square value was large (26.39) and highly significant (p < .001), indicating the group centroids on the discriminant function for the three reading ability groups on the Reading to Learn/Reading to Integrate Composite were significantly different. Pooled within-groups correlations between discriminating variables showed that Reading to Learn correlated with the first discriminant function at a level of .81; Reading to Integrate correlated with the second discriminant function at .71. This contrasts with findings for the nonnative speakers, where scores on the combined new measures loaded significantly on only one discriminant function. For native speakers, then, there is evidence for two significant discriminant functions, although the first accounts for almost all of the variance. Although these two measures (Reading to Learn and Reading to Integrate) loaded on two separate discriminant functions, they still
Table 11 Descriptive statistics for native speakers by Nelson–Denny reading ability groups (n = 101)

Reading ability group   n     Mean on Nelson–Denny   sd
High (≥135)             39    141.26                 5.19
Mid (119–134)           37    125.65                 5.09
Low (≤118)              25    102.88                 10.98
Total                   101   126.04                 16.52

Note: Total n = 101 for discriminant analysis due to loss of four cases in anomalous Reading to Integrate session.
showed moderate correlations with the alternate function (see footnote 6), justifying the composite calculations.
Results of discriminant analysis for native speakers, seen in Table 12, show a different pattern than that observed for nonnative speakers. Nearly three-fourths (71.8%) of the native speakers classified as high in the basic comprehension reading ability group remained high on the Reading to Learn/Reading to Integrate Composite. For the mid group, however, only 18.9% remained classified as mid. Just over half (52%) of the low reading ability group members remained low on the Reading to Learn/Reading to Integrate Composite. As with the nonnative speakers, participants in the mid category on basic comprehension showed the most frequent reclassification. Forty-eight of the 101 (47.5%) native speaker participants remained in the initial classification categories. Twenty-two (21.8%) were reclassified into a higher category, and 31 (30.7%) were reclassified into a lower category. Thus, 53 participants (52.5%), over half of the sample, were classified differently based on their performance on the Reading to Learn/Reading to Integrate Composite.

6 Reading to Learn correlated with function 2 at -.58; Reading to Integrate with function 1 at .71.
Table 12 Discriminant analysis: comparison of native speaker reading ability groups with Reading to Learn/Reading to Integrate Composite (n = 101)

                          Predicted group membership for Reading to
                          Learn/Reading to Integrate Composite
Reading ability group     High     Mid      Low      Initial classification total
Count
  High (≥135)             28       3        8        39
  Mid (119–134)           10       7        20       37
  Low (≤118)              5        7        13       25
  Reclassification total  43       17       41       101
Percentage
  High (≥135)             71.8     7.7      20.5     100.0
  Mid (119–134)           27.0     18.9     54.1     100.0
  Low (≤118)              20.0     28.0     52.0     100.0

Note: Total n = 101 for discriminant analysis due to loss of four cases in anomalous Reading to Integrate session.
V Interpretation
Analyses done to answer Research Questions 1 and 2 showed that performance on Reading to Learn and Reading to Integrate measures was significantly influenced by language background and level of education (graduate vs. undergraduate, for nonnative speakers only). Moreover, level of computer familiarity had no significant effect on Reading to Learn and Reading to Integrate performance.
Correlations showed that, as expected, all reading measures correlated to some degree, generally answering Research Question 3. The most interesting results came from the discriminant analyses because they showed a pattern of differential classification on the new tasks, sensitive to initial reading ability. This pattern differed for nonnative speakers and native speakers. Examination of the reclassifications based on the new tasks revealed some of the problems of using a basic comprehension-only test to predict performance on more challenging literacy tasks. These results also imply the need for further work on new measures of advanced literacy skills, such as Reading to Learn and Reading to Integrate, to reflect trends in construct-driven assessment (Pellegrino et al., 1999).
For the nonnative speakers, most (81.8%) of the participants with TOEFL Reading Comprehension below 50 remained classified as low on the Reading to Learn/Reading to Integrate Composite, suggesting the existence of a lower threshold of academic English proficiency. Participants below this threshold were unlikely to perform well on tasks assessing Reading to Learn and Reading to Integrate. Even tasks involving only selection (such as BC1 and BC2) rather than production of responses were difficult for this group. However, for the nonnative speaker readers in the mid and high reading ability groups, basic comprehension level was not nearly as consistent a predictor of performance on the Reading to Learn/Reading to Integrate Composite. This was especially striking for those in the mid reading ability group. Approximately one fourth of the mid group dropped into the low category on the Reading to Learn/Reading to Integrate Composite, indicating that they experienced more difficulty in completing the new measures, but approximately one fourth could indeed categorize and synthesize information relatively well despite middling performance on basic comprehension. This finding suggests that, for nonnative speaker readers in the mid reading ability category, basic comprehension measures were insufficient to predict their performance on the new measures.
Almost two-thirds of the high basic comprehension nonnative speakers remained high on the Reading to Learn/Reading to Integrate Composite, but one third dropped, suggesting that they could not categorize and synthesize information from texts as well as they could recognize information. For this third, the recognition-only task, multiple-choice basic comprehension, overestimated ability to manipulate and synthesize information. For the high and mid reading ability nonnative speaker groups, basic comprehension-only tests may then overestimate academic English proficiency.
Results for the native speakers showed a more definitive pattern of reclassification. For the low reading ability group, approximately half were misclassified, suggesting that the basic comprehension measure underrepresented their ability to categorize and synthesize information in academic texts. Most of the mid reading ability group were reclassified as lower on the Reading to Learn/Reading to Integrate Composite, showing that basic comprehension results overestimated ability to succeed on more challenging tasks. On the other hand, almost one third of the mid reading ability group did better on the Reading to Learn/Reading to Integrate Composite; for them, basic comprehension underrepresented their ability to complete more challenging reading tasks. For the high ability readers, nearly one third dropped on the Reading to Learn/Reading to Integrate Composite. For them, the basic comprehension measure overestimated their level of academic English proficiency. The other two thirds of the high reading ability group remained high, suggesting a higher threshold of academic English proficiency.
Considering results from both the nonnative speakers and native speakers, we conclude that the new tasks did assess something different from basic comprehension once a lower level threshold of basic academic English proficiency had been achieved. Examination of scatterplots based on discriminant functions indicated an obvious separation between participants who could and could not perform on the Reading to Learn/Reading to Integrate Composite, thus providing some tentative evidence for concurrent validity (Messick, 1989; Chapelle, 1999). We had hoped to find clear evidence of a hierarchy, suggesting that Reading to Learn was demonstrably more difficult than basic comprehension and Reading to Integrate demonstrably more difficult than Reading to Learn, but results did not yield an obvious hierarchy. Results did, however, suggest an even simpler pattern: a dichotomy. For those nonnative speakers above the lower threshold of English language proficiency, which in our data would be approximately 500, or 49 on the scaled TOEFL Reading
Comprehension scores, some could perform well on the new measures, some could not, revealing two rather than three groups on the Reading to Learn/Reading to Integrate Composite. These results lead us to speculate that the new measures tap additional skills, such as 'sophisticated discourse processes and critical thinking skills' (Enright and Schedl, 1999: 24), in addition to language proficiency. For native speakers, the discriminant loading suggested a possible hierarchy of difficulty, but the scatterplots and classification tables revealed a dichotomy. Reclassification based on the Reading to Learn/Reading to Integrate Composite nearly eliminated the mid basic comprehension reading ability category, with only 16.8% of the native speakers qualifying as mid on the Reading to Learn/Reading to Integrate Composite.
VI Conclusions
While we acknowledge that there are some limitations to this project, such as the time required to administer and score new measures, the results are illuminating and useful, and they suggest some considerations for future research. The first consideration relates to appropriate classification of students based on academic reading abilities. This corresponds to the use of TOEFL as a gatekeeper by many institutions. For TOEFL test-takers whose current TOEFL scores would be in an intermediate to high intermediate range, the new tasks could assess additional abilities relevant to academic performance. For such participants, additional test development efforts are warranted. The majority of high reading ability nonnative speakers (64.9%) could perform the new measures; a third could not. Our results suggest that some unknown number of mid reading ability nonnative speakers are more capable of succeeding at more challenging academic reading tasks than their current level of basic comprehension assessment would indicate. At the same time, there are some high and mid reading ability participants who could not perform the more demanding tasks; admitting such students directly into university programs could result in failure. These results suggest that, for most students at lower levels of basic comprehension (in our study, those with TOEFL scores below 490 to 500), development of new tasks is unnecessary. Nevertheless, our results indicate that a certain small percentage (in our sample, 8 students, 18.2%) of nonnative speakers classified as low reading ability on basic comprehension did in fact
perform adequately on more challenging tasks like Reading to Learnand Reading to Integrate Given the very large number of TOEFLtest-takers around the world this finding merits further research Itwould be potentially unfair to exclude such students from universitystudy based on basic comprehension-only measures such as the cur-rent TOEFL We would urge institutions that use TOEFL to considerscores on these new more demanding tasks if doing so wouldaddress institutional needs According to interview data collected inconjunction with this project (Trites 2000 Chapter 6) participantsperceived that successful completion of the new tasks required athorough understanding of the texts whereas the multiple-choicetests were less demanding requiring only superficial grasp of con-tent This finding corroborates the observation of Freedle and Kostin(1999 3) who note that often examinees do not need to comprehendthe accompanying text to answer the test item Therefore for allTOEFL test-takers incorporating more challenging tasks such as theReading to Learn and Reading to Integrate measures into typicalEnglish language instruction could have positive washback effects
The second area of consideration is the focus on additional relevant research. For predictive validity, it is important to determine the correlation between the Reading to Learn/Reading to Integrate Composite measures and actual academic performance in university classes. Correlations might differ depending on the degree of categorization and synthesis required by different major fields of study. Another area for future research is related to the novelty of the Reading to Learn task as an assessment technique. Current pedagogical trends in literacy instruction emphasize the use of graphic organizers as a means to understand and manipulate information in texts. To our knowledge, graphic organizers are typically used as classroom activities rather than assessment tools. They have good potential for use in assessment if students are familiar with them and if appropriate scoring systems can be developed. This project demonstrates that it is possible, though labor intensive, to develop reliable scoring systems. Further research is needed to explore refinement of this task type and related scoring systems for use in testing programs with large numbers of test-takers and scorers. Although the Reading to Integrate task (generating a synthesis) was more familiar, the scoring system was innovative because it reflected a reader's ability to recognize textual frames as well as integrate information. If test developers are interested in the abilities of nonnative speaker readers to perform such tasks, further development of similar tasks and scoring systems is warranted. While a substantial investment of time
would be required to refine the administration and scoring systems needed for more complex tasks such as these, it should be weighed against the possible danger of under-representing the likelihood of academic success based on results of the basic comprehension-only measures still most often used to assess academic proficiency.
Acknowledgements
This project was funded by a grant from Educational Testing Service as part of the TOEFL 2000 effort. We appreciate the cooperation of the ETS staff members who assisted in the development of passage-specific tasks and other aspects of the research; however, no official endorsement of Educational Testing Service should be inferred.
VII References
Bachman, L. 2000: Modern language testing at the turn of the century: assuring that what we count counts. Language Testing 17, 1–42.
Biber, D. 1993: Using register-diversified corpora for general language studies. Computational Linguistics 19, 219–41.
Britt, M., Rouet, J. and Perfetti, C. 1996: Using hypertext to study and reason about historical evidence. In Rouet, J., Levonen, J., Dillon, A. and Spiro, R., editors, Hypertext and cognition. Mahwah, NJ: Lawrence Erlbaum, 43–72.
Chapelle, C. 1999: Validity in language assessment. Annual Review of Applied Linguistics 19, 1–19.
Educational Testing Service 1997: TOEFL test and score manual. Princeton, NJ: Educational Testing Service.
Educational Testing Service 1998: Draft TOEFL 2000 research agenda framework: areas of research. Research agenda three. TOEFL 2000 internal document. Princeton, NJ: Educational Testing Service.
Eignor, D., Taylor, C., Kirsch, I. and Jamieson, J. 1998: Development of a scale for assessing the level of computer familiarity of TOEFL examinees. TOEFL Research Report No. 60. Princeton, NJ: Educational Testing Service.
Enright, M. and Schedl, M. 1999: Reading for a reason: using reader purpose to guide test design. TOEFL 2000 Internal Report. Princeton, NJ: Educational Testing Service.
Enright, M., Grabe, W., Mosenthal, P., Mulcahy-Ernt, P. and Schedl, M. 1998: A TOEFL 2000 framework for testing reading comprehension: a working paper. Princeton, NJ: Educational Testing Service.
Foltz, P. 1996: Comprehension, coherence and strategies in hypertext. In Rouet, J., Levonen, J., Dillon, A. and Spiro, R., editors, Hypertext and cognition. Mahwah, NJ: Lawrence Erlbaum, 109–36.
Freedle, R. and Kostin, I. 1999: Does the text matter in a multiple choice test of comprehension? The case for the construct validity of TOEFL's minitalks. Language Testing 16, 2–32.
Goldman, S. 1997: Learning from text: reflections on the past and suggestions for the future. Discourse Processes 23, 357–98.
Hayes, J. and Hatch, J. 1999: Issues in measuring reliability: correlation versus percentage of agreement. Written Communication 16, 354–67.
Huberty, C. 1994: Applied discriminant analysis. New York: John Wiley.
Jamieson, J., Campbell, J., Norfleet, L. and Berbisada, N. 1993: Reliability of a computerized scoring routine for an open-ended task. System 21, 305–22.
Jamieson, J., Norfleet, L. and Berbisada, N. 1993: Successes, failures and dropouts in computer-assisted language lessons. Computer Assisted English Language Learning Journal 4, 12–20.
Klecka, W. 1980: Discriminant analysis. In Lewis-Beck, M., editor, Quantitative applications in the social sciences, Volume 19. Newbury Park, CA: Sage.
Lehto, M., Zhu, W. and Carpenter, B. 1995: The relative effectiveness of hypertext and text. International Journal of Human-Computer Interaction 7, 293–313.
McNamara, D. and Kintsch, W. 1996: Learning from texts: effects of prior knowledge and text coherence. Discourse Processes 22, 247–88.
Messick, S. 1989: Validity. In Linn, R.L., editor, Educational measurement, 3rd edition. New York: American Council on Education/Macmillan, 13–103.
Meyer, B. 1985a: Prose analyses: purposes, procedures and problems. In Britton, B. and Black, J., editors, Understanding expository text. Hillsdale, NJ: Lawrence Erlbaum, 11–64.
Meyer, B. 1985b: Prose analysis: purposes, procedures and problems, Part 2. In Britton, B. and Black, J., editors, Understanding expository text. Hillsdale, NJ: Lawrence Erlbaum, 269–97.
Monks, V. 1997, August/September: Two views, same waterway. National Wildlife 35, 36–37.
Pellegrino, J., Baxter, G. and Glaser, R. 1999: Addressing the 'two disciplines' problem: linking theories of cognition and learning with assessment and instructional practice. Review of Research in Education 24, 307–53.
Perfetti, C. 1997: Sentences, individual differences and multiple texts: three issues in text comprehension. Discourse Processes 23, 337–55.
Perfetti, C., Britt, M.A. and Georgi, M. 1995: Text-based learning and reasoning: studies in history. Hillsdale, NJ: Lawrence Erlbaum.
Perfetti, C., Marron, M. and Foltz, P. 1996: Sources of comprehension failure: theoretical perspectives and case studies. In Cornoldi, C. and Oakhill, J., editors, Reading comprehension difficulties: processes and intervention. Mahwah, NJ: Lawrence Erlbaum, 137–65.
Reinking, D. 1988: Computer-mediated text and comprehension differences: the role of reading time, reader preference and estimation of learning. Reading Research Quarterly 23, 484–500.
Reinking, D. and Schreiner, R. 1985: The effects of computer-mediated text on measures of reading comprehension and reading behavior. Reading Research Quarterly 20, 536–53.
Spivey, N. 1997: The constructivist metaphor: reading, writing and the making of meaning. San Diego, CA: Academic Press.
Stevens, J. 1996: Applied multivariate statistics for the social sciences, 3rd edition. Mahwah, NJ: Lawrence Erlbaum.
Tabachnick, B. and Fidell, L. 1996: Using multivariate statistics, 3rd edition. New York: Harper Collins.
Taylor, C., Jamieson, J., Eignor, D. and Kirsch, I. 1998: The relationship between computer familiarity and performance on computer-based TOEFL test tasks. TOEFL Research Report No. 61. Princeton, NJ: Educational Testing Service.
Tennesen, M. 1997, November/December: On a clear day. National Parks 71, 26–9.
Trites, L. 2000: Beyond basic comprehension: reading to learn and reading to integrate for native and non-native speakers. Unpublished doctoral dissertation, Northern Arizona University, Flagstaff, AZ.
Van den Berg, S. and Watt, J. 1991: Effects of educational setting on student responses to structured hypertext. Journal of Computer-Based Instruction 18, 118–24.
Van Dijk, T.A. and Kintsch, W. 1983: Strategies of discourse comprehension. New York: Academic Press.
Wiley, J. and Voss, J. 1999: Constructing arguments from multiple sources: tasks that promote understanding and not just memory for text. Journal of Educational Psychology 91, 310–11.
Zimmerman, T. 1997, December 29: Filter it with billions and billions of oysters: how to revive the Chesapeake Bay. US News and World Report 123, 63.
Appendix 1a Chart completion task
Directions: Complete the following chart. Fill in as much detail as possible from the text read by categorizing the information into the different areas on the chart. Do not use single words for your responses; form your responses in phrases or complete sentences. Include examples from the text.
Make no judgments about the accuracy of causes or effects or the effectiveness of the solutions mentioned in the text. Solutions are seen
as any action taken in response to the problem(s). Solutions can take many forms, such as proposed solutions, attempted solutions or failed solutions. Also, space is provided under each category for examples. Examples are specific examples found in the text that are used by the author(s) to exemplify the problems, causes, effects or solutions in the text. However, there may not be examples for every category.
Points will be awarded for correct responses only. There is no penalty for incorrect responses. Points will be awarded in the following manner:
Problems and Solutions: 10 points each
Causes and Effects: 5 points each
Examples: 1 point each
Problems Causes Effects Solutions
Examples Examples Examples Examples
Appendix 1b Reading to Learn scoring rubric (0–241 total possible)

Problems (10 points) 1 word (6 points) 40 maximum
• Air pollution/Smog in the National Parks
• Opposition or Ignoring (or lack of cooperation) to environmental standards
• No true 'point sources' (myriad of smaller pollution sources rather than one large source)
• National Parks limited jurisdiction of pollution sources outside of the parks

Causes (5 points) 1 word (3 points) 55 maximum
• Sulfur/nitrogen emissions
• Acid rain
• Ground-level ozone
• Urban/Industrial emissions
• Automobile emissions
• Power plants/factories/plants emissions
• Emissions from Smokestacks (chimneys)
• Emissions from Kilns
• Unregulated pollution/smog (from other countries, Mexico)
• Smoke from Controlled burns
• Emissions from large cities

Effects (5 points) 1 word (3 points) 30 maximum
• Trees and plants affected (injured), growth hindered
• Visibility decreased
• Metals loosened into waters (surface/ground water polluted)
• Nutrients removed (leached) from soil and/or plants
• Public outcry over stance of big business/government
• Aquatic life injured (damaged)

Solutions (10 points) 1 word (6 points) 90 maximum
• Stricter Environmental Laws (resolutions, acts)
• 1977 Clean Air Act Amendments labeling NPs as Class I areas (regulations)
• Regional haze regulations proposed by the EPA
• Reduction of allowable pollution standards from industry
• Clean Air Act set visibility goals
• Objecting to construction permits
• Identification of pollution sources
• Industry modifications/Installation of devices (scrubbers) to reduce pollution
• Public pressure to protect parks

Appendix 1b (continued)

Problems: Examples (1 point) 5 maximum
• Great Smoky Mountain National Park
• Grand Canyon National Park
• Smoky Mountain Sources from Ohio, New York, Atlanta, etc.
• Grand Canyon Sources from CA, NV, UT, AZ, NM and Mexico
• TN Luttrell corp. building permit situation

Causes: Examples (1 point or 5 points with overarching cause listed above)
• Automobile emissions
• Power plants/factories/plants emissions
• Emissions from Smokestacks (chimneys)
• Emissions from Kilns
• Unregulated pollution/smog (from other countries, Mexico)
• Smoke from Controlled burns
• Emissions from large cities

Causes: Examples (1 point) 10 maximum
• Navajo Generating Station, Page, AZ (Power Plant, AZ)
• Power plants in TN and OH river valley
• Southern California Edison Plant, Laughlin, NV
• Tennessee Luttrell Kilns, TN
• Industry in AZ
• Cars in CA
• Smokestacks in NM, NV, UT
• Los Angeles
• Atlanta
• New York

Effects: Examples (1 point) 6 maximum
• Leaves of plants turning purple and brown (stippling)
• Hindering photosynthesis of plants and trees
• Reduction of visibility at Grand Canyon
• Visibility reduced 90% of the days
• See vague blue masses
• See half as far as in 1919

Solutions: Examples (1 point or 10 points with overarching solution listed above)
• 1977 Clean Air Act Amendments labeling NPs as Class I areas
• Regional haze regulations proposed by the EPA
• Reduction of allowable pollution standards from industry

Solutions: Examples (1 point) 5 maximum
• Objecting to kilns in TN
• Researchers using scientific technology to ID sources (radioactive isotopes)
• Southern California Edison Plant in Laughlin, NV identified as point source
• Scrubbers installed at Navajo Generating Plant
• 10% reduction of pollution in 10–15 years (goal of no man-made pollution)
Appendix 2a Reading to Integrate task integration activity
Directions: For the next 15 minutes, reflect on what you read and compose a short essay that combines the information in the material read and makes connections across the range of ideas and arguments presented.
[Ruled lines provided for the participant's written response]
Appendix 2b Reading to Integrate scoring rubric
Integration

50 Excellent: Integrates texts accurately and successfully on multiple levels and creates a true Documents Model. Generalizes at least two macrostructure concepts common across texts (this may be simply identifying the existence of a macrostructure followed by the supporting macrostructures from each text). Effectively integrates relevant support (i.e., details or macrostructures) to support generalizations and may still discuss each article separately to some degree. Integrates macrostructures present in both texts.

40 Good: Integrates through the creation of a well developed and accurate introduction and/or a conclusion, yet summarizes articles separately, AND/OR generalizes one major macrostructure that is fully developed with support. Uses some relevant support.

30 Fair: Attempts to integrate through the creation of a partially developed, possibly inaccurate, introductory AND/OR concluding statement, usually related to the main topic of the articles, but has no substantive development. Creates a Text and Situation Model of each text separately. Does not make generalizations for integration beyond the one statement. May make evaluative or editorial statements; however, these may be inaccurate.

20 Poor: Creates a well developed and accurate Text Model and Situation Model for each of the texts, yet attempts no integrative connection across the texts.

10 Very Poor: Ineffectively or inaccurately attempts to summarize each article in a very reporting style, revealing little or no use of background knowledge, contextualization or evaluation. Response is simply a recall of information from the texts. No introduction or conclusion, OR participant confuses texts and sees separate texts as one issue (problem).
0 No Response: Participant does not attempt response or only addresses one article.
Macrostructures

Participants will receive one point for each macrostructure accurately identified in each text, with a minimum score of 4 awarded for the inability to accurately identify any macrostructures.

25 Excellent (Total 8): Accurately identifies all 4 macrostructures present in both texts.

20 Good (Total 6–7): Accurately identifies all 4 macrostructures in one text and 3 in the other, OR accurately identifies 3 of the four macrostructures in both texts. (Or 4 in one and 2 in the other.)

15 Fair (Total 4–5): Accurately identifies 3 of the four macrostructures in one text and 2 of the four in the other. (Or 4 in one and 1 in the other.) OR accurately identifies 2 of the four macrostructures in both texts. (Or 4 in one and 0 in the other, or 3 in one and 1 in the other.)

10 Poor (Total 2–3): Accurately identifies 2 of the macrostructures in one text and 1 in the other. (Or 3 in one and 0 in the other.) Accurately identifies 1 of the four macrostructures in both texts. (Or 2 in one and 0 in the other.)

5 Very Poor (Total 0–1): Accurately identifies 1 of the four macrostructures in one text and 0 in the other, OR unable to accurately identify any macrostructures in the texts.

0 No Response: Participant does not attempt response.
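The Macrostructures scale amounts to a lookup from the total number of macrostructures identified across the two texts to a rubric score. A minimal Python sketch of that lookup follows. The function name `macrostructure_score` and the `attempted` flag are illustrative assumptions rather than part of the original scoring materials, and the band boundaries (8, 6–7, 4–5, 2–3, 0–1) come from reading the rubric's 'Total' figures as ranges.

```python
def macrostructure_score(text1: int, text2: int, attempted: bool = True) -> int:
    """Map macrostructures identified in each text (0-4 each) to a rubric score.

    Hypothetical helper: bands follow the 'Total' figures in the rubric.
    """
    if not attempted:
        return 0  # No Response
    for count in (text1, text2):
        if not 0 <= count <= 4:
            raise ValueError("each text has between 0 and 4 macrostructures")
    total = text1 + text2
    if total == 8:
        return 25  # Excellent
    if total >= 6:
        return 20  # Good
    if total >= 4:
        return 15  # Fair
    if total >= 2:
        return 10  # Poor
    return 5       # Very Poor (total of 0 or 1)


if __name__ == "__main__":
    for pair in [(4, 4), (4, 2), (3, 1), (2, 0), (1, 0)]:
        print(pair, macrostructure_score(*pair))
```

Each combination the rubric lists in parentheses (for example, 4 in one text and 2 in the other) falls into the same band as its total, which is why a single sum is enough.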
Use of relevant details

5 Excellent: Effectively uses multiple relevant details as support; no irrelevant or erroneous details.

4 Good: Effectively uses relevant details as support; may include some inaccurate or irrelevant details. (More than 50% of details are relevant.)

3 Fair: Possibly uses one or two relevant details, yet several irrelevant details or inaccurate details (erroneous or fabricated) appear in the synthesis. (50% or less of details are relevant.)

2 Poor: Erroneous or fabricated details used in attempt to support arguments presented; does not include relevant details.

1 Very Poor: No details used or listed.

0 No Response: Participant does not attempt response.
group status, medium of text presentation and test order as possible contributing factors.4 Table 4 shows that there were no significant interactions for any of the group, medium or test order combinations. The only significant main effect was group membership.
Because group membership was a combined measure that included both native language background and level of education, post hoc analysis was needed to identify the significant contrasts. Table 5 shows that there was a significant difference in performance on the Reading to Learn measure between the native speaker undergraduate and the nonnative speaker undergraduate groups, as well as a significant difference between the nonnative speaker undergraduate and nonnative speaker graduate groups. There was no significant difference in performance between the native speaker undergraduate and the nonnative speaker graduate groups. Therefore, the answer to Research Question 1 is that native language background and level of education did have a significant effect on performance on the Reading to Learn measure, but that medium of text presentation did not. Further, order of testing (whether participants took Reading to Learn or Reading to Integrate first) had no significant effect.
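As an arithmetic check on the univariate ANOVA reported in Table 4, each F value should equal the effect's mean square divided by the error mean square. The Python sketch below recomputes those ratios. The decimal placement of the table values (for example, 389.11 for the error mean square) is inferred from the printed figures and should be treated as an assumption; the variable and function names are illustrative.

```python
# Mean squares and reported F values as read from Table 4 (Reading to Learn).
# Decimal placement is an assumption inferred from the printed figures.
MS_ERROR = 389.11

table4 = {
    "Group": (10934.64, 28.10),
    "Medium": (1215.86, 3.13),
    "Test order": (294.98, 0.76),
    "Group x medium": (167.40, 0.43),
    "Group x test order": (2.19, 0.01),
    "Medium x test order": (391.73, 1.01),
    "Group x medium x test order": (287.65, 0.74),
}


def f_ratio(mean_square: float, ms_error: float = MS_ERROR) -> float:
    """F statistic for one effect: effect mean square over error mean square."""
    return mean_square / ms_error


recomputed = {source: round(f_ratio(ms), 2) for source, (ms, _) in table4.items()}

if __name__ == "__main__":
    for source, (_, reported) in table4.items():
        print(f"{source:<28} recomputed F = {recomputed[source]:6.2f}   "
              f"reported F = {reported:6.2f}")
```

Under this reading of the table, the recomputed ratios agree with the reported F values to within rounding of the printed mean squares.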
3 Research Question 2
The second research question, related to the first, asked if performance on Reading to Integrate was affected by medium of presentation,
Table 4 Performance on Reading to Learn measure by groups, medium and test order (n = 251) (univariate analysis of variance)

Source                           Type III sum of squares    df     Mean square    F
Group                            21869.29                     2     10934.64       28.10*
Medium                            1215.86                     1      1215.86        3.13
Test order                         294.98                     1       294.98         .76
Group × medium                     334.81                     2       167.40         .43
Group × test order                   4.37                     2         2.19         .01
Medium × test order                391.73                     1       391.73        1.01
Group × medium × test order        575.29                     2       287.65         .74
Error                            92997.45                   239       389.11

Note: *p < .05

4 Test order was added as an additional variable to double check that our counterbalancing had been effective in controlling for any practice effect.
computer familiarity, native language or level of education. Again, to ensure that counterbalancing of tests controlled for any practice effect, test order was added as an additional variable.
To answer this question, we proceeded to calculate a univariate ANOVA on the Reading to Integrate measure with group status, medium of text presentation and test order entered as possible contributing factors. The results (Table 6) show, as for Research Question 1, that there were no significant interactions for any of the group, medium or test order combinations; the only significant main effect was group membership. The answer for Research Question 2 is that native language background and educational level had a significant effect on Reading to Integrate, but medium of text presentation did not. Post hoc analysis of group contrasts showed that all three groups were distinct in their performance on Reading to Integrate (see Table 7).
4 Research Question 3
The third research question asked to what extent measures of basic comprehension, Reading to Learn and Reading to Integrate were
Table 5 Post hoc Scheffé for Reading to Learn measure (n = 251)

Group    n      Group    n      Mean difference    Standard error
NSU      105    NNSU     106    20.12*             2.72
                NNSG      40     7.17              3.67
NNSU     106    NNSG      40    12.95*             3.66

Note: *p < .05

Table 6 Performance on Reading to Integrate measure by groups, medium and test order (n = 244a) (univariate analysis of variance)

Source                           Type III sum of squares    df     Mean square    F
Group                            36294.33                     2     18147.17       55.82b
Medium                             192.95                     1       192.95         .59
Test order                          97.83                     1        97.83         .30
Group × medium                      11.82                     2         5.91         .02
Group × test order                  30.14                     2        15.07         .05
Medium × test order               1098.72                     1      1098.72        3.38
Group × medium × test order       1488.58                     2       744.29        2.29
Error                            75430.37                   232       325.13

Notes: a n size reduced for Reading to Integrate because of anomalous testing session; b p < .05
related. We used correlational analysis as the first step in answering this question. Results for the total participant population (see Table 8) showed moderate to high correlations across all reading measures. However, the analyses done for Research Questions 1 and 2 revealed that group status had a significant effect on performance on Reading to Learn and Reading to Integrate. Further, we realize that correlations are sensitive to variance, so the high correlations seen in the total population could have been an artifact of combining the three groups. Therefore, we examined the correlations among all reading measures for each group (available in Trites, 2000: Appendix 1, pp. 230–33). While the reading measures were still correlated, often moderately, sometimes highly, magnitudes differed and sometimes dropped substantially. The text-specific multiple-choice measures, BC1 and BC2, consistently correlated more highly with the Nelson–Denny and TOEFL Reading Comprehension tests than with Reading to Learn and Reading to Integrate based on the same texts, suggesting a test method or construct effect. Because comparisons between different measures of basic comprehension were not a goal of the project, BC1 and BC2 were not used in further analyses. We conclude that, as expected, all reading measures were related, but the lower correlations between Reading to Learn and Reading to Integrate and the traditional basic comprehension measures led us to consider further types of analysis to identify the possible distinctiveness of the new measures.
5 Discriminant analysis
Because we were interested in determining how constructs differed, we sought additional analyses to help us better characterize the new constructs. Of the several possible statistical methods that could have been employed, two are most plausible: multivariate analysis of variance, usually associated with experimental research,
Table 7 Post hoc Scheffé for Reading to Integrate measure (n = 244a)

Group    n      Group    n      Mean difference    Standard error
NS       101    NNSU     103    26.41b             2.53
                NNSG      40    10.05b             3.37
NNSU     103    NNSG      40    16.36b             3.36

Notes: a n size reduced for Reading to Integrate because of anomalous testing session; b p < .05
Table 8 Correlations for all reading measures for all participants (n = 251)

                               TOEFL Reading    Basic Comprehension    Basic Comprehension    Reading     Reading to
                               Comprehension    Test 1                 Test 2                 to Learn    Integrate(a)
Nelson–Denny                   .90b             .85b                   .84b                   .66b        .69b
TOEFL Reading Comprehension    1.00             .85b                   .84b                   .64b        .69b
Basic Comprehension 1                           1.00                   .84b                   .68b        .68b
Basic Comprehension 2                                                  1.00                   .68b        .70b
Reading to Learn                                                                              1.00        .59b

Notes: a n size reduced for Reading to Integrate because of anomalous testing session; b p < .05
and discriminant analysis, usually associated with descriptive research (Tabachnick and Fidell, 1996). The present research was conducted with samples of naturally occurring student groups and was not experimental. Moreover, we were interested in finding ways to compare participant performance on the measures of the new constructs, Reading to Learn and Reading to Integrate, with performance on more traditional measures of basic comprehension. Thus, we opted to use discriminant analysis because of its parsimony of description and clarity of interpretation (Stevens, 1996). Discriminant analysis, a technique recommended to describe group differences or predict group membership based on a comparison of multiple predictors (Huberty, 1994), has been used in other areas of applied linguistic research to investigate creation of a student profile of success or failure on Computer Assisted Language Learning (CALL) lessons (Jamieson et al., 1993) and accurate classification of text types into registers (Biber, 1993), among other purposes.
To further distinguish basic comprehension from the new constructs, we conducted discriminant analysis on each of the two language groups (native and nonnative) to determine whether Reading to Learn and Reading to Integrate would classify participants in the same way that Basic Comprehension would. We divided the native speaker and nonnative speaker groups into three levels (high, middle (mid) and low reading ability scorers) based on the basic comprehension measure chosen for that group (Nelson–Denny for native speakers; TOEFL Reading Comprehension for nonnative speakers). Research methodologists (Tabachnick and Fidell, 1996: 513) note that robustness is expected when the smallest group has at least 20 cases; our smallest group was 25. To check the assumption of homogeneity of the variance/covariance matrices, we examined the outcomes of Box's M Test and found them all nonsignificant (Klecka, 1980). Each group was checked for outliers using Mahalanobis distance treated as Chi-Square, and no outliers were found (Tabachnick and Fidell, 1996). Thus, the data met all assumptions required for use of discriminant analysis.
a Discriminant analysis for nonnative speakers: To organize the discriminant analysis in order to see if the Reading to Learn/Reading to Integrate Composite classified participants similarly to the measure of Basic Comprehension (for nonnative speakers, TOEFL Reading Comprehension), we divided the entire nonnative speaker group (n = 146) into three levels of basic comprehension: high, mid
and low. These reading ability groups of high, mid and low were based on typical TOEFL Reading Comprehension score levels required for program entry. Participants were classified as high if their scores were greater than or equal to 550, or 56 and above on the scaled score on the TOEFL Reading Comprehension (550 is the cut score often used for graduate entry). Participants were classified as mid if scores ranged between 500 and 549, or 50 to 55 on the TOEFL Reading Comprehension. Participants were classified as low if their scores fell below 500, or 49 and below on TOEFL Reading Comprehension (500 is a minimum TOEFL score sometimes used for undergraduate admission, often with the proviso that students enroll in ESL classes either prior to official enrollment or concurrently). For our entire nonnative speaker group, descriptive statistics on basic comprehension reading ability group membership levels appear in Table 9.
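The grouping rule described above reduces to a three-way cut on the TOEFL Reading Comprehension scaled score. The short Python sketch below expresses it; the function name `reading_ability_group` is hypothetical, and the thresholds (56 and 50) are those stated in the text.

```python
def reading_ability_group(scaled_score: int) -> str:
    """Classify a TOEFL Reading Comprehension scaled score as high, mid or low.

    Hypothetical helper: 56 and above is high, 50 to 55 is mid, 49 and below is low.
    """
    if scaled_score >= 56:
        return "high"
    if scaled_score >= 50:
        return "mid"
    return "low"


if __name__ == "__main__":
    for score in (60, 56, 55, 50, 49, 42):
        print(score, reading_ability_group(score))
```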
We ran SPSS Discriminant Analysis with initial grouping variables of high, mid and low reading ability. We compared initial reading ability levels with high, mid and low categories on the Reading to Learn/Reading to Integrate Composite, a new variable reflecting level of performance on the Reading to Learn and Reading to Integrate measures combined.5 The discriminant analysis yielded one discriminant function with an eigenvalue of .92, responsible for 99.9% of the variance in outcomes. Wilks' Lambda for this function was .52, significant at .001; the associated Chi-Square value was extremely large (90.93) and highly significant (p < .001), indicating the group centroids on the composite Reading to Learn/Reading to Integrate function for the three nonnative speaker reading ability groups were significantly different. Both Reading to Learn and
Table 9 Descriptive statistics for nonnative speakers by TOEFL reading comprehension reading ability groups (n = 143)

Reading ability group    n      Mean on TOEFL reading comprehension    sd
High (≥ 56)               57    59.68                                  3.03
Mid (50–55)               42    52.81                                  1.67
Low (≤ 49)                44    42.11                                  5.47
Total                    143    52.26                                  8.22

Note: Total n = 143 due to loss of three cases in anomalous Reading to Integrate session

5 We first calculated two separate discriminant analyses, one for Reading to Learn and one for Reading to Integrate, but we found that both loaded on a single function, so we used the composite in subsequent analyses.
Reading to Integrate loaded significantly on the discriminant function, at .86 for Reading to Learn and .78 for Reading to Integrate (p < .05).
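The statistics reported for the nonnative speaker analysis can be cross-checked against one another. Wilks' Lambda is the product of 1/(1 + eigenvalue) over the discriminant functions, and a commonly used chi-square approximation (Bartlett's) is -[N - 1 - (p + g)/2] ln(Lambda) for p predictors and g groups. The Python sketch below applies both, reading the printed values as an eigenvalue of .92 and a 99.9% share of variance. It assumes N = 143, p = 2 (Reading to Learn and Reading to Integrate) and g = 3; whether the original software used exactly this approximation is also an assumption.

```python
import math


def wilks_lambda(eigenvalues):
    """Wilks' Lambda as the product of 1 / (1 + eigenvalue) over all functions."""
    return math.prod(1.0 / (1.0 + ev) for ev in eigenvalues)


def bartlett_chi_square(lam, n_cases, n_predictors, n_groups):
    """Bartlett's chi-square approximation for testing Wilks' Lambda."""
    return -(n_cases - 1 - (n_predictors + n_groups) / 2.0) * math.log(lam)


# Values read from the text; the second eigenvalue is derived from the
# 99.9% share attributed to the first function (an assumption).
first_eigenvalue = 0.92
first_share = 0.999
second_eigenvalue = first_eigenvalue * (1 - first_share) / first_share

lam = wilks_lambda([first_eigenvalue, second_eigenvalue])
chi2 = bartlett_chi_square(lam, n_cases=143, n_predictors=2, n_groups=3)

if __name__ == "__main__":
    print(f"Wilks' Lambda = {lam:.2f}, chi-square = {chi2:.1f}")
```

Under these assumptions the recomputed Lambda rounds to .52 and the chi-square comes out near 91, close to the reported 90.93; the small gap is consistent with rounding in the printed eigenvalue.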
Over half (64.9%) of the high reading ability group remained high on the new measure; less than half (42.9%) of the mid reading ability group remained classified as mid. Hence, the Reading to Learn/Reading to Integrate Composite was particularly influential in reclassifying the mid reading ability group and, to a lesser extent, the high group. However, most (81.8%) of the low reading ability group remained low on the Reading to Learn/Reading to Integrate Composite (see Table 10). Of the 143 nonnative speaker participants, 91 (64%) remained in the initial basic comprehension category on the composite; the rest moved, but in different directions. Twenty-one participants (14.7%) were classified into a higher category on the Reading to Learn/Reading to Integrate Composite than their initial basic comprehension level would have suggested, while 31 (21.7%) were reclassified into a lower category. Thus, 52 participants (36.4%), just over one third of the sample, were classified differently based on their Reading to Learn/Reading to Integrate Composite performance.
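The reclassification figures can be tallied directly from the counts in Table 10. The Python sketch below treats the table as a 3 × 3 matrix (rows: initial reading ability group; columns: predicted group on the composite; both ordered high, mid, low) and counts participants who stayed in the same category, moved up, or moved down. Variable names are illustrative.

```python
# Counts from Table 10: rows are initial groups, columns are predicted groups,
# both ordered High, Mid, Low.
LEVELS = ["High", "Mid", "Low"]
counts = [
    [37, 17, 3],   # initially High
    [13, 18, 11],  # initially Mid
    [2, 6, 36],    # initially Low
]

total = sum(sum(row) for row in counts)
same = sum(counts[i][i] for i in range(3))
# A smaller column index than row index means a higher predicted category.
higher = sum(counts[i][j] for i in range(3) for j in range(3) if j < i)
lower = sum(counts[i][j] for i in range(3) for j in range(3) if j > i)


def pct(n: int) -> float:
    """Share of the whole sample, as a percentage rounded to one decimal."""
    return round(100.0 * n / total, 1)


if __name__ == "__main__":
    print(f"total = {total}")
    print(f"same category   = {same} ({pct(same)}%)")
    print(f"moved higher    = {higher} ({pct(higher)}%)")
    print(f"moved lower     = {lower} ({pct(lower)}%)")
    print(f"reclassified    = {higher + lower} ({pct(higher + lower)}%)")
    for level, row in zip(LEVELS, counts):
        print(level, [round(100.0 * c / sum(row), 1) for c in row])
```

The row percentages printed at the end correspond to the percentage panel of Table 10.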
Table 10 Discriminant analysis comparison of nonnative speaker reading ability groups with Reading to Learn/Reading to Integrate Composite (n = 143)

Predicted group membership for Reading to Learn/Reading to Integrate Composite (columns High, Mid, Low)

Count

| Reading ability group | High | Mid | Low | Initial classification total |
|---|---|---|---|---|
| High (≥ 56) | 37 | 17 | 3 | 57 |
| Mid (50–55) | 13 | 18 | 11 | 42 |
| Low (≤ 49) | 2 | 6 | 36 | 44 |
| Reclassification total | 52 | 41 | 50 | 143 |

Percentage

| Reading ability group | High | Mid | Low | Initial classification total |
|---|---|---|---|---|
| High (≥ 56) | 64.9 | 29.8 | 5.3 | 100.0 |
| Mid (50–55) | 31.0 | 42.9 | 26.2 | 100.0 |
| Low (≤ 49) | 4.5 | 13.6 | 81.8 | 100.0 |

Note: Total n = 143 due to loss of three cases in anomalous Reading to Integrate session.

b Discriminant analysis for native speakers: Because one of our goals in this project was to probe the possible validity of these new measures by assessing performance of two groups, native as well as nonnative speakers, we conducted a parallel discriminant analysis for native speakers. Thus, for the native speakers, we followed the same procedure, dividing the entire native speaker group (n = 105) into three levels of basic comprehension (high, mid, and low) based on score distances of .5 standard deviations from the sample mean of the Nelson–Denny test for these participants. Native speakers need not take reading comprehension tests when entering the university, so the three-way split was based entirely on our sample data. Participants were classified as high if their scores on the Nelson–Denny were greater than or equal to 135, as mid if their scores ranged from 119 to 134, and as low if their scores fell at or below 118. Descriptive statistics for the entire native speaker sample on basic comprehension group membership appear in Table 11.
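The three-way split can be sketched as follows. The cut scores (135 and 118) are those stated above; the helper names are hypothetical. As a rough check, the half-standard-deviation band is computed from the Table 11 statistics (mean 126.04, sd 16.52, n = 101). Because the original split used the full sample of 105, the band only approximates the published cut scores.

```python
def half_sd_bounds(mean, sd, width=0.5):
    """Return the (lower, upper) bounds lying `width` standard deviations from the mean."""
    return (mean - width * sd, mean + width * sd)


def classify_nelson_denny(score, low_max=118, high_min=135):
    """Assign a reading ability group using the cut scores stated in the text."""
    if score >= high_min:
        return "high"
    if score <= low_max:
        return "low"
    return "mid"


# Assumption: Table 11 statistics (n = 101) stand in for the full-sample values (n = 105).
lower, upper = half_sd_bounds(126.04, 16.52)
print(round(lower, 2), round(upper, 2))
print([classify_nelson_denny(s) for s in (102, 119, 134, 141)])
```

The computed band falls within about one score point of the published boundaries, which is consistent with cut scores derived from a slightly different sample mean.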
Table 11 Descriptive statistics for native speakers by Nelson–Denny reading ability groups (n = 101)

| Reading ability group | n | Mean on Nelson–Denny | sd |
|---|---|---|---|
| High (≥ 135) | 39 | 141.26 | 5.19 |
| Mid (119–134) | 37 | 125.65 | 5.09 |
| Low (≤ 118) | 25 | 102.88 | 10.98 |
| Total | 101 | 126.04 | 16.52 |

Note: Total n = 101 for discriminant analysis due to loss of four cases in anomalous Reading to Integrate session.

The discriminant analysis yielded one discriminant function with an eigenvalue of .31, responsible for 98.7% of the variance in outcomes. Wilks' Lambda for this function was .76, significant at .001; the associated Chi-Square value was large (26.39) and highly significant (p < .001), indicating that the group centroids on the discriminant function for the three reading ability groups on the Reading to Learn/Reading to Integrate Composite were significantly different. Pooled within-groups correlations between discriminating variables showed that Reading to Learn correlated with the first discriminant function at a level of .81; Reading to Integrate correlated with the second discriminant function at .71. This contrasts with findings for the nonnative speakers, where scores on the combined new measures loaded significantly on only one discriminant function. For native speakers, then, there is evidence for two significant discriminant functions, although the first accounts for almost all of the variance. Although these two measures (Reading to Learn and Reading to Integrate) loaded on two separate discriminant functions, they still showed moderate correlations with the alternate function,⁶ justifying the composite calculations.
Results of discriminant analysis for native speakers, seen in Table 12, show a different pattern from that observed for nonnative speakers. Nearly three-fourths (71.8%) of the native speakers classified as high in the basic comprehension reading ability group remained high on the Reading to Learn/Reading to Integrate Composite. For the mid group, however, only 18.9% remained classified as mid. Just over half (52%) of the low reading ability group members remained low on the Reading to Learn/Reading to Integrate Composite. As with the nonnative speakers, participants in the mid category on basic comprehension showed the most frequent reclassification. Forty-eight of the 101 (47.5%) native speaker participants remained in the initial classification categories. Twenty-two (21.8%) were reclassified into a higher category, and 31 (30.7%) were reclassified into a lower category. Thus, 53 participants (52.5%), over half of the sample, were classified differently based on their performance on the Reading to Learn/Reading to Integrate Composite.
⁶ Reading to Learn correlated with function 2 at −.58; Reading to Integrate with function 1 at .71.
Table 12 Discriminant analysis comparison of native speaker reading ability groups with Reading to Learn/Reading to Integrate Composite (n = 101)

Predicted group membership for Reading to Learn/Reading to Integrate Composite (columns High, Mid, Low)

Count

| Reading ability group | High | Mid | Low | Initial classification total |
|---|---|---|---|---|
| High (≥ 135) | 28 | 3 | 8 | 39 |
| Mid (119–134) | 10 | 7 | 20 | 37 |
| Low (≤ 118) | 5 | 7 | 13 | 25 |
| Reclassification total | 43 | 17 | 41 | 101 |

Percentage

| Reading ability group | High | Mid | Low | Initial classification total |
|---|---|---|---|---|
| High (≥ 135) | 71.8 | 7.7 | 20.5 | 100.0 |
| Mid (119–134) | 27.0 | 18.9 | 54.1 | 100.0 |
| Low (≤ 118) | 20.0 | 28.0 | 52.0 | 100.0 |

Note: Total n = 101 for discriminant analysis due to loss of four cases in anomalous Reading to Integrate session.
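The reclassification summaries reported for both groups can be recomputed directly from the cell counts in Tables 10 and 12. The following is a minimal sketch; the function and constant names are hypothetical. Rows are initial groups and columns are predicted groups, both ordered high, mid, low, so a predicted column to the left of the diagonal means a move into a higher category.

```python
def reclassification_summary(matrix):
    """Count cases that stayed, moved up, or moved down.

    matrix[i][j] is the number of cases initially in group i and predicted
    into group j, with groups ordered high, mid, low.
    """
    size = len(matrix)
    total = sum(sum(row) for row in matrix)
    same = sum(matrix[i][i] for i in range(size))
    higher = sum(matrix[i][j] for i in range(size) for j in range(size) if j < i)
    lower = sum(matrix[i][j] for i in range(size) for j in range(size) if j > i)
    return {"total": total, "same": same, "higher": higher, "lower": lower}


# Cell counts copied from Table 10 (nonnative speakers) and Table 12 (native speakers).
NONNATIVE = [[37, 17, 3], [13, 18, 11], [2, 6, 36]]
NATIVE = [[28, 3, 8], [10, 7, 20], [5, 7, 13]]

for label, matrix in (("nonnative", NONNATIVE), ("native", NATIVE)):
    s = reclassification_summary(matrix)
    moved_pct = round(100 * (s["higher"] + s["lower"]) / s["total"], 1)
    print(label, s, moved_pct)
```

For the nonnative speakers, the diagonal of Table 10 (37 + 18 + 36) gives 91 unchanged cases and 52 reclassified cases; for the native speakers, the Table 12 diagonal gives 48 unchanged and 53 reclassified.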
V Interpretation
Analyses done to answer Research Questions 1 and 2 showed that performance on Reading to Learn and Reading to Integrate measures was significantly influenced by language background and level of education (graduate vs. undergraduate, for nonnative speakers only). Moreover, level of computer familiarity had no significant effect on Reading to Learn and Reading to Integrate performance.
Correlations showed that, as expected, all reading measures correlated to some degree, generally answering Research Question 3. The most interesting results came from the discriminant analyses because they showed a pattern of differential classification on the new tasks sensitive to initial reading ability. This pattern differed for nonnative speakers and native speakers. Examination of the reclassifications based on the new tasks revealed some of the problems of using a basic comprehension-only test to predict performance on more challenging literacy tasks. These results also imply the need for further work on new measures of advanced literacy skills, such as Reading to Learn and Reading to Integrate, to reflect trends in construct-driven assessment (Pellegrino et al., 1999).
For the nonnative speakers, most (81.8%) of the participants with TOEFL Reading Comprehension below 50 remained classified as low on the Reading to Learn/Reading to Integrate Composite, suggesting the existence of a lower threshold of academic English proficiency. Participants below this threshold were unlikely to perform well on tasks assessing Reading to Learn and Reading to Integrate. Even tasks involving only selection (such as BC1 and BC2), rather than production of responses, were difficult for this group. However, for the nonnative speaker readers in the mid and high reading ability groups, basic comprehension level was not nearly as consistent a predictor of performance on the Reading to Learn/Reading to Integrate Composite. This was especially striking for those in the mid reading ability group. Approximately one fourth of the mid group dropped into the low category on the Reading to Learn/Reading to Integrate Composite, indicating that they experienced more difficulty in completing the new measures, but approximately one fourth could indeed categorize and synthesize information relatively well despite middling performance on basic comprehension. This finding suggests that, for nonnative speaker readers in the mid reading ability category, basic comprehension measures were insufficient to predict their performance on the new measures.
Almost two-thirds of the high basic comprehension nonnative speakers remained high on the Reading to Learn/Reading to Integrate Composite, but one third dropped, suggesting that they could not categorize and synthesize information from texts as well as they could recognize information. For this third, the recognition-only task, multiple-choice basic comprehension, overestimated ability to manipulate and synthesize information. For the high and mid reading ability nonnative speaker groups, basic comprehension-only tests may then overestimate academic English proficiency.
Results for the native speakers showed a more definitive pattern of reclassification. For the low reading ability group, approximately half were misclassified, suggesting that the basic comprehension measure underrepresented their ability to categorize and synthesize information in academic texts. Most of the mid reading ability group were reclassified as lower on the Reading to Learn/Reading to Integrate Composite, showing that basic comprehension results overestimated ability to succeed on more challenging tasks. On the other hand, almost one third of the mid reading ability group did better on the Reading to Learn/Reading to Integrate Composite; for them, basic comprehension underrepresented their ability to complete more challenging reading tasks. For the high ability readers, nearly one third dropped on the Reading to Learn/Reading to Integrate Composite. For them, the basic comprehension measure overestimated their level of academic English proficiency. The other two thirds of the high reading ability group remained high, suggesting a higher threshold of academic English proficiency.
Considering results from both the nonnative speakers and native speakers, we conclude that the new tasks did assess something different from basic comprehension once a lower level threshold of basic academic English proficiency had been achieved. Examination of scatterplots based on discriminant functions indicated an obvious separation between participants who could and could not perform on the Reading to Learn/Reading to Integrate Composite, thus providing some tentative evidence for concurrent validity (Messick, 1989; Chapelle, 1999). We had hoped to find clear evidence of a hierarchy, suggesting that Reading to Learn was demonstrably more difficult than basic comprehension and Reading to Integrate demonstrably more difficult than Reading to Learn, but results did not yield an obvious hierarchy. Results did, however, suggest an even simpler pattern: a dichotomy. For those nonnative speakers above the lower threshold of English language proficiency, which in our data would be approximately 500, or 49 on the scaled TOEFL Reading Comprehension scores, some could perform well on the new measures and some could not, revealing two, rather than three, groups on the Reading to Learn/Reading to Integrate Composite. These results lead us to speculate that the new measures tap additional skills, such as 'sophisticated discourse processes and critical thinking skills' (Enright and Schedl, 1999: 24), in addition to language proficiency. For native speakers, the discriminant loading suggested a possible hierarchy of difficulty, but the scatterplots and classification tables revealed a dichotomy. Reclassification based on the Reading to Learn/Reading to Integrate Composite nearly eliminated the mid basic comprehension reading ability category, with only 16.8% of the native speakers qualifying as mid on the Reading to Learn/Reading to Integrate Composite.
VI Conclusions
While we acknowledge that there are some limitations to this project, such as the time required to administer and score new measures, the results are illuminating and useful, and they suggest some considerations for future research. The first consideration relates to appropriate classification of students based on academic reading abilities. This corresponds to the use of TOEFL as a gatekeeper by many institutions. For TOEFL test-takers whose current TOEFL scores would be in an intermediate to high intermediate range, the new tasks could assess additional abilities relevant to academic performance. For such participants, additional test development efforts are warranted. The majority of high reading ability nonnative speakers (64.9%) could perform the new measures; a third could not. Our results suggest that some unknown number of mid reading ability nonnative speakers are more capable of succeeding at more challenging academic reading tasks than their current level of basic comprehension assessment would indicate. At the same time, there are some high and mid reading ability participants who could not perform the more demanding tasks; admitting such students directly into university programs could result in failure. These results suggest that, for most students at lower levels of basic comprehension (in our study, those with TOEFL scores below 490 to 500), development of new tasks is unnecessary. Nevertheless, our results indicate that a certain small percentage (in our sample, 8 students, 18.2%) of nonnative speakers classified as low reading ability on basic comprehension did in fact perform adequately on more challenging tasks like Reading to Learn and Reading to Integrate. Given the very large number of TOEFL test-takers around the world, this finding merits further research. It would be potentially unfair to exclude such students from university study based on basic comprehension-only measures such as the current TOEFL. We would urge institutions that use TOEFL to consider scores on these new, more demanding tasks if doing so would address institutional needs. According to interview data collected in conjunction with this project (Trites, 2000: Chapter 6), participants perceived that successful completion of the new tasks required a thorough understanding of the texts, whereas the multiple-choice tests were less demanding, requiring only superficial grasp of content. This finding corroborates the observation of Freedle and Kostin (1999: 3), who note that often examinees do not need to comprehend the accompanying text to answer the test item. Therefore, for all TOEFL test-takers, incorporating more challenging tasks such as the Reading to Learn and Reading to Integrate measures into typical English language instruction could have positive washback effects.
The second area of consideration is the focus on additional relevant research. For predictive validity, it is important to determine the correlation between the Reading to Learn/Reading to Integrate Composite measures and actual academic performance in university classes. Correlations might differ depending on the degree of categorization and synthesis required by different major fields of study. Another area for future research is related to the novelty of the Reading to Learn task as an assessment technique. Current pedagogical trends in literacy instruction emphasize the use of graphic organizers as a means to understand and manipulate information in texts. To our knowledge, graphic organizers are typically used as classroom activities rather than assessment tools. They have good potential for use in assessment if students are familiar with them and if appropriate scoring systems can be developed. This project demonstrates that it is possible, though labor intensive, to develop reliable scoring systems. Further research is needed to explore refinement of this task type and related scoring systems for use in testing programs with large numbers of test-takers and scorers. Although the Reading to Integrate task (generating a synthesis) was more familiar, the scoring system was innovative because it reflected a reader's ability to recognize textual frames as well as integrate information. If test developers are interested in the abilities of nonnative speaker readers to perform such tasks, further development of similar tasks and scoring systems is warranted. While a substantial investment of time would be required to refine the administration and scoring systems needed for more complex tasks such as these, it should be weighed against the possible danger of under-representing the likelihood of academic success based on results of the basic comprehension-only measures still most often used to assess academic proficiency.
Acknowledgements
This project was funded by a grant from Educational Testing Service as part of the TOEFL 2000 effort. We appreciate the cooperation of the ETS staff members who assisted in the development of passage-specific tasks and other aspects of the research; however, no official endorsement of Educational Testing Service should be inferred.
VII References
Bachman, L. 2000: Modern language testing at the turn of the century: assuring that what we count counts. Language Testing 17, 1–42.
Biber, D. 1993: Using register-diversified corpora for general language studies. Computational Linguistics 19, 219–41.
Britt, M., Rouet, J. and Perfetti, C. 1996: Using hypertext to study and reason about historical evidence. In Rouet, J., Levonen, J., Dillon, A. and Spiro, R., editors, Hypertext and cognition. Mahwah, NJ: Lawrence Erlbaum, 43–72.
Chapelle, C. 1999: Validity in language assessment. Annual Review of Applied Linguistics 19, 1–19.
Educational Testing Service 1997: TOEFL test and score manual. Princeton, NJ: Educational Testing Service.
Educational Testing Service 1998: Draft TOEFL 2000 research agenda framework: areas of research. Research agenda three, TOEFL 2000 internal document. Princeton, NJ: Educational Testing Service.
Eignor, D., Taylor, C., Kirsch, I. and Jamieson, J. 1998: Development of a scale for assessing the level of computer familiarity of TOEFL examinees. TOEFL Research Report No. 60. Princeton, NJ: Educational Testing Service.
Enright, M. and Schedl, M. 1999: Reading for a reason: using reader purpose to guide test design. TOEFL 2000 Internal Report. Princeton, NJ: Educational Testing Service.
Enright, M., Grabe, W., Mosenthal, P., Mulcahy-Ernt, P. and Schedl, M. 1998: A TOEFL 2000 framework for testing reading comprehension: a working paper. Princeton, NJ: Educational Testing Service.
Foltz, P. 1996: Comprehension, coherence, and strategies in hypertext. In Rouet, J., Levonen, J., Dillon, A. and Spiro, R., editors, Hypertext and cognition. Mahwah, NJ: Lawrence Erlbaum, 109–36.
Freedle, R. and Kostin, I. 1999: Does the text matter in a multiple choice test of comprehension? The case for the construct validity of TOEFL's minitalks. Language Testing 16, 2–32.
Goldman, S. 1997: Learning from text: reflections on the past and suggestions for the future. Discourse Processes 23, 357–98.
Hayes, J. and Hatch, J. 1999: Issues in measuring reliability: correlation versus percentage of agreement. Written Communication 16, 354–67.
Huberty, C. 1994: Applied discriminant analysis. New York: John Wiley.
Jamieson, J., Campbell, J., Norfleet, L. and Berbisada, N. 1993: Reliability of a computerized scoring routine for an open-ended task. System 21, 305–22.
Jamieson, J., Norfleet, L. and Berbisada, N. 1993: Successes, failures, and dropouts in computer-assisted language lessons. Computer Assisted English Language Learning Journal 4, 12–20.
Klecka, W. 1980: Discriminant analysis. In Lewis-Beck, M., editor, Quantitative applications in the social sciences, Volume 19. Newbury Park, CA: Sage.
Lehto, M., Zhu, W. and Carpenter, B. 1995: The relative effectiveness of hypertext and text. International Journal of Human-Computer Interaction 7, 293–313.
McNamara, D. and Kintsch, W. 1996: Learning from texts: effects of prior knowledge and text coherence. Discourse Processes 22, 247–88.
Messick, S. 1989: Validity. In Linn, R.L., editor, Educational measurement, 3rd edition. New York: American Council on Education/Macmillan, 13–103.
Meyer, B. 1985a: Prose analysis: purposes, procedures, and problems. In Britton, B. and Black, J., editors, Understanding expository text. Hillsdale, NJ: Lawrence Erlbaum, 11–64.
Meyer, B. 1985b: Prose analysis: purposes, procedures, and problems: Part 2. In Britton, B. and Black, J., editors, Understanding expository text. Hillsdale, NJ: Lawrence Erlbaum, 269–97.
Monks, V. 1997, August/September: Two views, same waterway. National Wildlife 35, 36–37.
Pellegrino, J., Baxter, G. and Glaser, R. 1999: Addressing the 'two disciplines' problem: linking theories of cognition and learning with assessment and instructional practice. Review of Research in Education 24, 307–53.
Perfetti, C. 1997: Sentences, individual differences, and multiple texts: three issues in text comprehension. Discourse Processes 23, 337–55.
Perfetti, C., Britt, M.A. and Georgi, M. 1995: Text-based learning and reasoning: studies in history. Hillsdale, NJ: Lawrence Erlbaum.
Perfetti, C., Marron, M. and Foltz, P. 1996: Sources of comprehension failure: theoretical perspectives and case studies. In Cornoldi, C. and Oakhill, J., editors, Reading comprehension difficulties: processes and intervention. Mahwah, NJ: Lawrence Erlbaum, 137–65.
Reinking, D. 1988: Computer-mediated text and comprehension differences: the role of reading time, reader preference, and estimation of learning. Reading Research Quarterly 23, 484–500.
Reinking, D. and Schreiner, R. 1985: The effects of computer-mediated text on measures of reading comprehension and reading behavior. Reading Research Quarterly 20, 536–53.
Spivey, N. 1997: The constructivist metaphor: reading, writing, and the making of meaning. San Diego, CA: Academic Press.
Stevens, J. 1996: Applied multivariate statistics for the social sciences, 3rd edition. Mahwah, NJ: Lawrence Erlbaum.
Tabachnick, B. and Fidell, L. 1996: Using multivariate statistics, 3rd edition. New York: Harper Collins.
Taylor, C., Jamieson, J., Eignor, D. and Kirsch, I. 1998: The relationship between computer familiarity and performance on computer-based TOEFL test tasks. TOEFL Research Report No. 61. Princeton, NJ: Educational Testing Service.
Tennesen, M. 1997, November/December: On a clear day. National Parks 71, 26–9.
Trites, L. 2000: Beyond basic comprehension: reading to learn and reading to integrate for native and non-native speakers. Unpublished doctoral dissertation, Northern Arizona University, Flagstaff, AZ.
Van den Berg, S. and Watt, J. 1991: Effects of educational setting on student responses to structured hypertext. Journal of Computer-Based Instruction 18, 118–24.
Van Dijk, T.A. and Kintsch, W. 1983: Strategies of discourse comprehension. New York: Academic Press.
Wiley, J. and Voss, J. 1999: Constructing arguments from multiple sources: tasks that promote understanding and not just memory for text. Journal of Educational Psychology 91, 310–11.
Zimmerman, T. 1997, December 29: Filter it with billions and billions of oysters: how to revive the Chesapeake Bay. US News and World Report 123, 63.
Appendix 1a Chart completion task
Directions: Complete the following chart. Fill in as much detail as possible from the text read by categorizing the information into the different areas on the chart. Do not use single words for your responses; form your responses in phrases or complete sentences. Include examples from the text.

Make no judgments about the accuracy of causes or effects or the effectiveness of the solutions mentioned in the text. Solutions are seen as any action taken in response to the problem(s). Solutions can take many forms, such as proposed solutions, attempted solutions, or failed solutions. Also, space is provided under each category for examples. Examples are specific examples found in the text that are used by the author(s) to exemplify the problems, causes, effects, or solutions in the text. However, there may not be examples for every category.

Points will be awarded for correct responses only. There is no penalty for incorrect responses. Points will be awarded in the following manner:

Problems and Solutions: 10 points each
Causes and Effects: 5 points each
Examples: 1 point each

| Problems | Causes | Effects | Solutions |
|---|---|---|---|
| | | | |
| Examples | Examples | Examples | Examples |
| | | | |
Appendix 1b Reading to Learn scoring rubric (0–241 total possible)

Problems (10 points), 1 word (6 points), 40 maximum
• Air pollution/smog in the National Parks
• Opposition or ignoring (or lack of cooperation) to environmental standards (regulations)
• No true 'point sources' (myriad of smaller pollution sources rather than one large source)
• National Parks limited jurisdiction of pollution sources outside of the parks

Causes (5 points), 1 word (3 points), 55 maximum
• Sulfur/nitrogen emissions
• Acid rain
• Ground-level ozone
• Urban/industrial emissions
• Automobile emissions
• Power plants/factories/plants emissions
• Emissions from smokestacks (chimneys)
• Emissions from kilns
• Unregulated pollution/smog (from other countries: Mexico)
• Smoke from controlled burns
• Emissions from large cities

Effects (5 points), 1 word (3 points), 30 maximum
• Trees and plants affected (injured); growth hindered
• Visibility decreased
• Metals loosened into waters (surface/ground water; water polluted)
• Nutrients removed (leached) from soil and/or plants
• Public outcry over stance of big business/government
• Aquatic life injured (damaged)

Solutions (10 points), 1 word (6 points), 90 maximum
• Stricter environmental laws (resolutions, acts)
• 1977 Clean Air Act Amendments labeling NPs as Class I areas
• Regional haze regulations proposed by the EPA
• Reduction of allowable pollution standards from industry
• Clean Air Act set visibility goals
• Objecting to construction permits
• Identification of pollution sources
• Industry modifications/installation of devices (scrubbers) to reduce pollution
• Public pressure to protect parks

Problems: Examples (1 point), 5 maximum
• Great Smoky Mountain National Park
• Grand Canyon National Park
• Smoky Mountain sources from Ohio, New York, Atlanta, etc.
• Grand Canyon sources from CA, NV, UT, AZ, NM, and Mexico
• TN Luttrell corp. building permit situation

Causes: Examples (1 point, or 5 points with overarching cause listed above)
• Automobile emissions
• Power plants/factories/plants emissions
• Emissions from smokestacks (chimneys)
• Emissions from kilns
• Unregulated pollution/smog (from other countries: Mexico)
• Smoke from controlled burns
• Emissions from large cities

Causes: Examples (1 point), 10 maximum
• Navajo Generating Station, Page, AZ (power plant, AZ)
• Power plants in TN and OH river valley
• Southern California Edison Plant, Laughlin, NV
• Tennessee Luttrell Kilns, TN
• Industry in AZ
• Cars in CA
• Smokestacks in NM, NV, UT
• Los Angeles
• Atlanta
• New York

Effects: Examples (1 point), 6 maximum
• Leaves of plants turning purple and brown (stippling)
• Hindering photosynthesis of plants and trees
• Reduction of visibility at Grand Canyon
• Visibility reduced 90% of the days
• See vague blue masses
• See half as far as in 1919

Solutions: Examples (1 point, or 10 points with overarching solution listed above)
• 1977 Clean Air Act Amendments labeling NPs as Class I areas
• Regional haze regulations proposed by the EPA
• Reduction of allowable pollution standards from industry

Solutions: Examples (1 point), 5 maximum
• Objecting to kilns in TN
• Researchers using scientific technology to ID sources (radioactive isotopes)
• Southern California Edison Plant in Laughlin, NV identified as point source
• Scrubbers installed at Navajo Generating Plant
• 10% reduction of pollution in 10–15 years (goal of no man-made pollution)
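The stated ceiling of 241 points can be reproduced from the category maxima in the rubric. The sketch below is illustrative: the dictionary layout and function names are hypothetical, and the number of scorable items per category is inferred by dividing each category maximum by its point value (for example, 55 / 5 = 11 causes) rather than taken from an explicit statement in the rubric.

```python
# Point values and maxima copied from the rubric; item counts are inferred
# (category maximum divided by points per item), which is an assumption.
RUBRIC = {
    "problems": {"items": 4, "points": 10, "example_max": 5},
    "causes": {"items": 11, "points": 5, "example_max": 10},
    "effects": {"items": 6, "points": 5, "example_max": 6},
    "solutions": {"items": 9, "points": 10, "example_max": 5},
}


def category_max(spec):
    """Maximum points for the main entries of one category."""
    return spec["items"] * spec["points"]


def total_possible(rubric):
    """Sum of category maxima plus the capped example points."""
    return sum(category_max(spec) + spec["example_max"] for spec in rubric.values())


for name, spec in RUBRIC.items():
    print(name, category_max(spec), spec["example_max"])
print("total possible:", total_possible(RUBRIC))
```

The category maxima (40, 55, 30, 90) plus the example maxima (5, 10, 6, 5) sum to the 241 points given in the rubric heading.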
Appendix 2a Reading to Integrate task: integration activity

Directions: For the next 15 minutes, reflect on what you read and compose a short essay that combines the information in the material read and makes connections across the range of ideas and arguments presented.
[Lined space provided for the participant's written response]
Appendix 2b Reading to Integrate scoring rubric

Integration

50 Excellent: Integrates texts accurately and successfully on multiple levels and creates a true Documents Model. Generalizes at least two macrostructure concepts common across texts (this may be simply identifying the existence of a macrostructure followed by the supporting macrostructures from each text). Effectively integrates relevant support (i.e., details or macrostructures) to support generalizations and may still discuss each article separately to some degree. Integrates macrostructures present in both texts.

40 Good: Integrates through the creation of a well developed and accurate introduction and/or a conclusion, yet summarizes articles separately, AND/OR generalizes one major macrostructure that is fully developed with support. Uses some relevant support.

30 Fair: Attempts to integrate through the creation of a partially developed, possibly inaccurate, introductory AND/OR concluding statement, usually related to the main topic of the articles but has no substantive development. Creates a Text and Situation Model of each text separately. Does not make generalizations for integration beyond the one statement. May make evaluative or editorial statements; however, these may be inaccurate.

20 Poor: Creates a well developed and accurate Text Model and Situation Model for each of the texts, yet attempts no integrative connection across the texts.

10 Very Poor: Ineffectively or inaccurately attempts to summarize each article in a very reporting style, revealing little or no use of background knowledge, contextualization, or evaluation. Response is simply a recall of information from the texts. No introduction or conclusion, OR participant confuses texts and sees separate texts as one issue (problem).

0 No Response: Participant does not attempt response or only addresses one article.
Macrostructures

Participants will receive one point for each macrostructure accurately identified in each text, with a minimum score of 4 awarded for the inability to accurately identify any macrostructures.

25 Excellent (Total 8): Accurately identifies all 4 macrostructures present in both texts.

20 Good (Total 6–7): Accurately identifies all 4 macrostructures in one text and 3 in the other OR accurately identifies 3 of the four macrostructures in both texts (or 4 in one and 2 in the other).

15 Fair (Total 4–5): Accurately identifies 3 of the four macrostructures in one text and 2 of the four in the other (or 4 in one and 1 in the other) OR accurately identifies 2 of the four macrostructures in both texts (or 4 in one and 0 in the other, or 3 in one and 1 in the other).

10 Poor (Total 2–3): Accurately identifies 2 of the macrostructures in one text and 1 in the other (or 3 in one and 0 in the other) OR accurately identifies 1 of the four macrostructures in both texts (or 2 in one and 0 in the other).

5 Very Poor (Total 0–1): Accurately identifies 1 of the four macrostructures in one text and 0 in the other OR unable to accurately identify any macrostructures in the texts.

0 No Response: Participant does not attempt response.
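The macrostructure bands can be read as a lookup from the total number of macrostructures identified across the two texts (0 to 8) to a rubric score. The following is only an illustrative sketch of that reading; the function name `macrostructure_score` and the `attempted` flag are hypothetical and are not part of the original scoring materials.

```python
def macrostructure_score(count_text_1: int, count_text_2: int, attempted: bool = True) -> int:
    """Map macrostructures identified in each text (0-4 each) to the rubric score.

    Hypothetical helper: band boundaries follow the 'Total' ranges in the rubric.
    """
    if not attempted:
        return 0  # "No Response": participant does not attempt a response
    for count in (count_text_1, count_text_2):
        if not 0 <= count <= 4:
            raise ValueError("each text has exactly four macrostructures")
    total = count_text_1 + count_text_2
    if total == 8:
        return 25  # Excellent
    if total >= 6:
        return 20  # Good (total 6-7)
    if total >= 4:
        return 15  # Fair (total 4-5)
    if total >= 2:
        return 10  # Poor (total 2-3)
    return 5       # Very Poor (total 0-1)


# Example: 4 macrostructures in one text and 1 in the other falls in the Fair band.
example_score = macrostructure_score(4, 1)
```

Because every combination listed in the rubric reduces to a total, the lookup only needs the sum of the two counts.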
Use of relevant details

5 Excellent: Effectively uses multiple relevant details as support; no irrelevant or erroneous details.

4 Good: Effectively uses relevant details as support; may include some inaccurate or irrelevant details. (More than 50% of details are relevant.)

3 Fair: Possibly uses one or two relevant details, yet several irrelevant or inaccurate details (erroneous or fabricated) appear in the synthesis. (50% or less of details are relevant.)

2 Poor: Erroneous or fabricated details used in attempt to support arguments presented; does not include relevant details.

1 Very Poor: No details used or listed.

0 No Response: Participant does not attempt response.
computer familiarity, native language, or level of education. Again, to ensure that counterbalancing of tests controlled for any practice effect, test order was added as an additional variable.

To answer this question, we calculated a univariate ANOVA on the Reading to Integrate measure, with group status, medium of text presentation, and test order entered as possible contributing factors. The results (Table 6) show, as for Research Question 1, that there were no significant interactions for any of the group, medium, or test order combinations; the only significant main effect was group membership. The answer for Research Question 2 is that native language background and educational level had a significant effect on Reading to Integrate, but medium of text presentation did not. Post hoc analysis of group contrasts showed that all three groups were distinct in their performance on Reading to Integrate (see Table 7).
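The F ratios in Table 6 follow from the reported sums of squares and degrees of freedom: each mean square is the sum of squares divided by its df, and each F is the effect mean square divided by the error mean square. The sketch below simply reruns that arithmetic on the Table 6 values; the variable and function names are illustrative, not taken from the study.

```python
# Sums of squares and degrees of freedom as reported in Table 6.
effects = {
    "Group": (36294.33, 2),
    "Medium": (192.95, 1),
    "Test order": (97.83, 1),
    "Group x medium": (11.82, 2),
    "Group x test order": (30.14, 2),
    "Medium x test order": (1098.72, 1),
    "Group x medium x test order": (1488.58, 2),
}
error_ss, error_df = 75430.37, 232


def f_ratios(effects, error_ss, error_df):
    """Return {effect: F}, where F = (SS_effect / df_effect) / (SS_error / df_error)."""
    ms_error = error_ss / error_df
    return {name: (ss / df) / ms_error for name, (ss, df) in effects.items()}


ratios = f_ratios(effects, error_ss, error_df)
for name, f_value in ratios.items():
    print(f"{name}: F = {f_value:.2f}")
```

The recomputed ratios agree with the tabled F values to within rounding, and only the Group effect is large.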
Table 5 Post hoc Scheffé for Reading to Learn measure (n = 251)

Group   n     Group   n     Mean difference   Standard error
NSU     105   NNSU    106   20.12             2.72
              NNSG    40     7.17             3.67
NNSU    106   NNSG    40    12.95             3.66

Note: p < .05

Table 6 Performance on Reading to Integrate measure by groups, medium, and test order (n = 244a) (univariate analysis of variance)

Source                          Type III sum of squares   df    Mean square   F
Group                           36294.33                  2     18147.17      55.82b
Medium                            192.95                  1       192.95        .59
Test order                         97.83                  1        97.83        .30
Group × medium                     11.82                  2         5.91        .02
Group × test order                 30.14                  2        15.07        .05
Medium × test order              1098.72                  1      1098.72       3.38
Group × medium × test order      1488.58                  2       744.29       2.29
Error                           75430.37                  232     325.13

Notes: a n size reduced for Reading to Integrate because of anomalous testing session; b p < .05

4 Research Question 3

The third research question asked to what extent measures of basic comprehension, Reading to Learn, and Reading to Integrate were related. We used correlational analysis as the first step in answering this question. Results for the total participant population (see Table 8) showed moderate to high correlations across all reading measures. However, the analyses done for Research Questions 1 and 2 revealed that group status had a significant effect on performance on Reading to Learn and Reading to Integrate. Further, we realize that correlations are sensitive to variance, so the high correlations seen in the total population could have been an artifact of combining the three groups. Therefore, we examined the correlations among all reading measures for each group (available in Trites, 2000: Appendix 1, pp. 230–33). While the reading measures were still correlated, often moderately, sometimes highly, magnitudes differed and sometimes dropped substantially. The text-specific multiple-choice measures, BC1 and BC2, consistently correlated more highly with the Nelson–Denny and TOEFL Reading Comprehension tests than with Reading to Learn and Reading to Integrate based on the same texts, suggesting a test method or construct effect. Because comparisons between different measures of basic comprehension were not a goal of the project, BC1 and BC2 were not used in further analyses. We conclude that, as expected, all reading measures were related, but the lower correlations between Reading to Learn and Reading to Integrate and the traditional basic comprehension measures led us to consider further types of analysis to identify the possible distinctiveness of the new measures.
Table 7 Post hoc Scheffé for Reading to Integrate measure (n = 244a)

Group   n     Group   n     Mean difference   Standard error
NS      101   NNSU    103   26.41b            2.53
              NNSG    40    10.05b            3.37
NNSU    103   NNSG    40    16.36b            3.36

Notes: a n size reduced for Reading to Integrate because of anomalous testing session; b p < .05

Table 8 Correlations for all reading measures for all participants (n = 251)

                              TOEFL Reading   Basic Comprehension   Basic Comprehension   Reading    Reading to
                              Comprehension   Test 1                Test 2                to Learn   Integratea
Nelson–Denny                  .90b            .85b                  .84b                  .66b       .69b
TOEFL Reading Comprehension   1.00            .85b                  .84b                  .64b       .69b
Basic Comprehension 1                         1.00                  .84b                  .68b       .68b
Basic Comprehension 2                                               1.00                  .68b       .70b
Reading to Learn                                                                          1.00       .59b

Notes: a n size reduced for Reading to Integrate because of anomalous testing session; b p < .05

5 Discriminant analysis

Because we were interested in determining how constructs differed, we sought additional analyses to help us better characterize the new constructs. Of the several possible statistical methods that could have been employed, two are most plausible: multivariate analysis of variance, usually associated with experimental research, and discriminant analysis, usually associated with descriptive research (Tabachnick and Fidell, 1996). The present research was conducted with samples of naturally occurring student groups and was not experimental. Moreover, we were interested in finding ways to compare participant performance on the measures of the new constructs, Reading to Learn and Reading to Integrate, with performance on more traditional measures of basic comprehension. Thus, we opted to use discriminant analysis because of its parsimony of description and clarity of interpretation (Stevens, 1996). Discriminant analysis, a technique recommended to describe group differences or predict group membership based on a comparison of multiple predictors (Huberty, 1994), has been used in other areas of applied linguistic research to investigate creation of a student profile of success or failure on Computer Assisted Language Learning (CALL) lessons (Jamieson et al., 1993) and accurate classification of text types into registers (Biber, 1993), among other purposes.
To further distinguish basic comprehension from the new constructs, we conducted discriminant analysis on each of the two language groups (native and nonnative) to determine whether Reading to Learn and Reading to Integrate would classify participants in the same way that Basic Comprehension would. We divided the native speaker and nonnative speaker groups into three levels, high, middle (mid), and low reading ability scorers, based on the basic comprehension measure chosen for that group (Nelson–Denny for native speakers; TOEFL Reading Comprehension for nonnative speakers). Research methodologists (Tabachnick and Fidell, 1996: 513) note that robustness is expected when the smallest group has at least 20 cases; our smallest group was 25. To check the assumption of homogeneity of the variance/covariance matrices, we examined the outcomes of Box's M test and found them all nonsignificant (Klecka, 1980). Each group was checked for outliers using Mahalanobis distance treated as chi-square, and no outliers were found (Tabachnick and Fidell, 1996). Thus, the data met all assumptions required for use of discriminant analysis.
a Discriminant analysis for nonnative speakers: To organize the discriminant analysis in order to see if the Reading to Learn/Reading to Integrate Composite classified participants similarly to the measure of Basic Comprehension (for nonnative speakers, TOEFL Reading Comprehension), we divided the entire nonnative speaker group (n = 146) into three levels of basic comprehension: high, mid, and low. These reading ability groups of high, mid, and low were based on typical TOEFL Reading Comprehension score levels required for program entry. Participants were classified as high if their scores were greater than or equal to 550, or 56 and above on the scaled score on the TOEFL Reading Comprehension (550 is the cut score often used for graduate entry). Participants were classified as mid if scores ranged between 500 and 549, or 50 to 55 on the TOEFL Reading Comprehension. Participants were classified as low if their scores fell below 500, or 49 and below on TOEFL Reading Comprehension (500 is a minimum TOEFL score sometimes used for undergraduate admission, often with the proviso that students enroll in ESL classes either prior to official enrollment or concurrently). For our entire nonnative speaker group, descriptive statistics on basic comprehension reading ability group membership levels appear in Table 9.
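The three-way split for nonnative speakers amounts to two cut points on the scaled TOEFL Reading Comprehension score. A minimal sketch of that rule follows; the function name `toefl_reading_group` is hypothetical and is used here only for illustration.

```python
def toefl_reading_group(scaled_score: int) -> str:
    """Assign a reading ability group from a scaled TOEFL Reading Comprehension score.

    Cut points follow the text: 56 and above is high, 50 to 55 is mid, 49 and below is low.
    """
    if scaled_score >= 56:
        return "high"
    if scaled_score >= 50:
        return "mid"
    return "low"


# Example: scores on either side of each cut point.
example_groups = {score: toefl_reading_group(score) for score in (49, 50, 55, 56)}
```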
Table 9 Descriptive statistics for nonnative speakers by TOEFL Reading Comprehension reading ability groups (n = 143)

Reading ability group   n     Mean on TOEFL Reading Comprehension   sd
High (≥ 56)             57    59.68                                 3.03
Mid (50–55)             42    52.81                                 1.67
Low (≤ 49)              44    42.11                                 5.47
Total                   143   52.26                                 8.22

Note: Total n = 143 due to loss of three cases in anomalous Reading to Integrate session.

We ran SPSS Discriminant Analysis with initial grouping variables of high, mid, and low reading ability. We compared initial reading ability levels with high, mid, and low categories on the Reading to Learn/Reading to Integrate Composite, a new variable reflecting level of performance on the Reading to Learn and Reading to Integrate measures combined (see note 5). The discriminant analysis yielded one discriminant function with an eigenvalue of .92, responsible for 99.9% of the variance in outcomes. Wilks' Lambda for this function was .52, significant at .001; the associated chi-square value was extremely large (90.93) and highly significant (p < .001), indicating the group centroids on the composite Reading to Learn/Reading to Integrate function for the three nonnative speaker reading ability groups were significantly different. Both Reading to Learn and Reading to Integrate loaded significantly on the discriminant function, at .86 for Reading to Learn and .78 for Reading to Integrate (p < .05).

5 We first calculated two separate discriminant analyses, one for Reading to Learn and one for Reading to Integrate, but we found that both loaded on a single function, so we used the composite in subsequent analyses.
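The eigenvalue, Wilks' Lambda, and chi-square reported for the nonnative speaker analysis are linked by standard formulas. The sketch below is a rough consistency check, not the authors' computation. It assumes Lambda is the product of 1/(1 + eigenvalue) over the functions tested. It also assumes the chi-square comes from Bartlett's approximation, with two predictors (Reading to Learn and Reading to Integrate), three groups, and 143 cases. The source does not state these computational details, and the function names are illustrative.

```python
import math


def wilks_lambda(eigenvalues):
    """Wilks' Lambda as the product of 1 / (1 + eigenvalue) over discriminant functions."""
    lam = 1.0
    for eigenvalue in eigenvalues:
        lam *= 1.0 / (1.0 + eigenvalue)
    return lam


def bartlett_chi_square(lam, n_cases, n_predictors, n_groups):
    """Bartlett's chi-square approximation for testing Wilks' Lambda (assumed here)."""
    multiplier = n_cases - 1 - (n_predictors + n_groups) / 2.0
    return -multiplier * math.log(lam)


# Reported eigenvalue for the single nonnative speaker function; other values are assumptions.
lam = wilks_lambda([0.92])
chi_square = bartlett_chi_square(lam, n_cases=143, n_predictors=2, n_groups=3)
print(f"Wilks' Lambda = {lam:.2f}, chi-square = {chi_square:.1f}")
```

Under these assumptions the recomputed Lambda rounds to the reported .52 and the chi-square lands close to the reported 90.93; the small gap is what one would expect from working with a rounded eigenvalue.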
Over half (64.9%) of the high reading ability group remained high on the new measure; less than half (42.9%) of the mid reading ability group remained classified as mid. Hence, the Reading to Learn/Reading to Integrate Composite was particularly influential in reclassifying the mid reading ability group and, to a lesser extent, the high group. However, most (81.8%) of the low reading ability group remained low on the Reading to Learn/Reading to Integrate Composite (see Table 10). Of the 143 nonnative speaker participants, 91 (64%) remained in the initial basic comprehension category on the composite; the rest moved, but in different directions. Twenty-one participants (14.7%) were classified into a higher category on the Reading to Learn/Reading to Integrate Composite than their initial basic comprehension level would have suggested, while 31 (21.7%) were reclassified into a lower category. Thus, 52 participants (36.4%), just over one third of the sample, were classified differently based on their Reading to Learn/Reading to Integrate Composite performance.
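The reclassification figures can be tallied directly from the counts in Table 10. In this illustrative sketch, rows are the initial reading ability groups and columns are the predicted groups, both ordered high, mid, low. The function name `reclassification_summary` is hypothetical.

```python
# Counts from Table 10: rows = initial group, columns = predicted group (high, mid, low).
table_10 = [
    [37, 17, 3],   # initially high
    [13, 18, 11],  # initially mid
    [2, 6, 36],    # initially low
]


def reclassification_summary(matrix):
    """Count cases that stayed, moved to a higher category, or moved to a lower one."""
    size = len(matrix)
    total = sum(sum(row) for row in matrix)
    same = sum(matrix[i][i] for i in range(size))
    # Index 0 is the highest category, so a smaller column index means a move upward.
    higher = sum(matrix[i][j] for i in range(size) for j in range(size) if j < i)
    lower = sum(matrix[i][j] for i in range(size) for j in range(size) if j > i)
    return {"total": total, "same": same, "higher": higher, "lower": lower}


summary = reclassification_summary(table_10)
for key in ("same", "higher", "lower"):
    print(f"{key}: {summary[key]} ({100 * summary[key] / summary['total']:.1f}%)")
```

On the Table 10 counts, the diagonal sums to 91 cases that kept their initial category, with 21 moving up and 31 moving down, for 52 reclassified cases out of 143.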
Table 10 Discriminant analysis comparison of nonnative speaker reading ability groups with Reading to Learn/Reading to Integrate Composite (n = 143)

                        Predicted group membership for Reading to
                        Learn/Reading to Integrate Composite         Initial
Reading ability group   High      Mid       Low                      classification total

Count
High (≥ 56)             37        17        3                        57
Mid (50–55)             13        18        11                       42
Low (≤ 49)              2         6         36                       44
Reclassification total  52        41        50                       143

Percentage
High (≥ 56)             64.9      29.8      5.3                      100.0
Mid (50–55)             31.0      42.9      26.2                     100.0
Low (≤ 49)              4.5       13.6      81.8                     100.0

Note: Total n = 143 due to loss of three cases in anomalous Reading to Integrate session.

b Discriminant analysis for native speakers: Because one of our goals in this project was to probe the possible validity of these new measures by assessing performance of two groups, native as well as nonnative speakers, we conducted a parallel discriminant analysis for native speakers. Thus, for the native speakers we followed the same procedure, dividing the entire native speaker group (n = 105) into three levels of basic comprehension, high, mid, and low, based on score distances of .5 standard deviations from the sample mean of the Nelson–Denny test for these participants. Native speakers need not take reading comprehension tests when entering the university, so the three-way split was based entirely on our sample data. Participants were classified as high if their scores on the Nelson–Denny were greater than or equal to 135. Participants were classified as mid if their scores ranged between 119 and 134 on the Nelson–Denny. Participants were classified as low if their scores fell at or below 118 on the Nelson–Denny. Descriptive statistics for the entire native speaker sample on basic comprehension group membership appear in Table 11.
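The native speaker cut points can be approximated by taking the sample mean plus and minus half a standard deviation. The sketch below uses the total mean (126.04) and standard deviation (16.52) reported in Table 11 as stand-ins. Those values describe the 101 cases retained for the discriminant analysis, whereas the split itself was made on the full sample of 105, so the result is only an approximation. The function names are illustrative.

```python
def half_sd_cut_points(mean: float, sd: float, width: float = 0.5):
    """Return (lower, upper) cut points at mean -/+ width * sd."""
    return mean - width * sd, mean + width * sd


def nelson_denny_group(score: int) -> str:
    """Assign a reading ability group using the cut scores stated in the text."""
    if score >= 135:
        return "high"
    if score >= 119:
        return "mid"
    return "low"


# Table 11 totals (n = 101), used here as an approximation of the full-sample values.
lower_cut, upper_cut = half_sd_cut_points(mean=126.04, sd=16.52)
print(f"lower cut = {lower_cut:.2f}, upper cut = {upper_cut:.2f}")
```

The computed boundaries fall just below 118 and just above 134, consistent with the whole-number cut scores of 118 and 135 given in the text.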
Table 11 Descriptive statistics for native speakers by Nelson–Denny reading ability groups (n = 101)

Reading ability group   n     Mean on Nelson–Denny   sd
High (≥ 135)            39    141.26                 5.19
Mid (119–134)           37    125.65                 5.09
Low (≤ 118)             25    102.88                 10.98
Total                   101   126.04                 16.52

Note: Total n = 101 for discriminant analysis due to loss of four cases in anomalous Reading to Integrate session.

The discriminant analysis yielded one discriminant function with an eigenvalue of .31, responsible for 98.7% of the variance in outcomes. Wilks' Lambda for this function was .76, significant at .001; the associated chi-square value was large (26.39) and highly significant (p < .001), indicating the group centroids on the discriminant function for the three reading ability groups on the Reading to Learn/Reading to Integrate Composite were significantly different. Pooled within-groups correlations between discriminating variables showed that Reading to Learn correlated with the first discriminant function at a level of .81; Reading to Integrate correlated with the second discriminant function at .71. This contrasts with findings for the nonnative speakers, where scores on the combined new measures loaded significantly on only one discriminant function. For native speakers, then, there is evidence for two significant discriminant functions, although the first accounts for almost all of the variance. Although these two measures (Reading to Learn and Reading to Integrate) loaded on two separate discriminant functions, they still showed moderate correlations with the alternate function (see note 6), justifying the composite calculations.
Results of discriminant analysis for native speakers, seen in Table 12, show a different pattern than that observed for nonnative speakers. Nearly three-fourths (71.8%) of the native speakers classified as high in the basic comprehension reading ability group remained high on the Reading to Learn/Reading to Integrate Composite. For the mid group, however, only 18.9% remained classified as mid. Just over half (52%) of the low reading ability group members remained low on the Reading to Learn/Reading to Integrate Composite. As with the nonnative speakers, participants in the mid category on basic comprehension showed the most frequent reclassification. Forty-eight of the 101 (47.5%) native speaker participants remained in the initial classification categories. Twenty-two (21.8%) were reclassified into a higher category and 31 (30.7%) were reclassified into a lower category. Thus, 53 participants (52.5%), over half of the sample, were classified differently based on their performance on the Reading to Learn/Reading to Integrate Composite.
6 Reading to Learn correlated with function 2 at -.58; Reading to Integrate with function 1 at .71.
Table 12 Discriminant analysis comparison of native speaker reading ability groups with Reading to Learn/Reading to Integrate Composite (n = 101)

                        Predicted group membership for Reading to
                        Learn/Reading to Integrate Composite         Initial
Reading ability group   High      Mid       Low                      classification total

Count
High (≥ 135)            28        3         8                        39
Mid (119–134)           10        7         20                       37
Low (≤ 118)             5         7         13                       25
Reclassification total  43        17        41                       101

Percentage
High (≥ 135)            71.8      7.7       20.5                     100.0
Mid (119–134)           27.0      18.9      54.1                     100.0
Low (≤ 118)             20.0      28.0      52.0                     100.0

Note: Total n = 101 for discriminant analysis due to loss of four cases in anomalous Reading to Integrate session.
V Interpretation
Analyses done to answer Research Questions 1 and 2 showed that performance on Reading to Learn and Reading to Integrate measures was significantly influenced by language background and level of education (graduate vs. undergraduate, for nonnative speakers only). Moreover, level of computer familiarity had no significant effect on Reading to Learn and Reading to Integrate performance.

Correlations showed that, as expected, all reading measures correlated to some degree, generally answering Research Question 3. The most interesting results came from the discriminant analyses because they showed a pattern of differential classification on the new tasks sensitive to initial reading ability. This pattern differed for nonnative speakers and native speakers. Examination of the reclassifications based on the new tasks revealed some of the problems of using a basic comprehension-only test to predict performance on more challenging literacy tasks. These results also imply the need for further work on new measures of advanced literacy skills, such as Reading to Learn and Reading to Integrate, to reflect trends in construct-driven assessment (Pellegrino et al., 1999).
For the nonnative speakers, most (81.8%) of the participants with TOEFL Reading Comprehension below 50 remained classified as low on the Reading to Learn/Reading to Integrate Composite, suggesting the existence of a lower threshold of academic English proficiency. Participants below this threshold were unlikely to perform well on tasks assessing Reading to Learn and Reading to Integrate. Even tasks involving only selection (such as BC1 and BC2) rather than production of responses were difficult for this group. However, for the nonnative speaker readers in the mid and high reading ability groups, basic comprehension level was not nearly as consistent a predictor of performance on the Reading to Learn/Reading to Integrate Composite. This was especially striking for those in the mid reading ability group. Approximately one fourth (26.2%) of the mid group dropped into the low category on the Reading to Learn/Reading to Integrate Composite, indicating that they experienced more difficulty in completing the new measures, but nearly one third (31.0%) could indeed categorize and synthesize information relatively well, despite middling performance on basic comprehension. This finding suggests that, for nonnative speaker readers in the mid reading ability category, basic comprehension measures were insufficient to predict their performance on the new measures.
Almost two-thirds of the high basic comprehension nonnative speakers remained high on the Reading to Learn/Reading to Integrate Composite, but one third dropped, suggesting that they could not categorize and synthesize information from texts as well as they could recognize information. For this third, the recognition-only task, multiple-choice basic comprehension, overestimated ability to manipulate and synthesize information. For the high and mid reading ability nonnative speaker groups, basic comprehension-only tests may then overestimate academic English proficiency.

Results for the native speakers showed a more definitive pattern of reclassification. For the low reading ability group, approximately half were misclassified, suggesting that the basic comprehension measure underrepresented their ability to categorize and synthesize information in academic texts. Most of the mid reading ability group were reclassified as lower on the Reading to Learn/Reading to Integrate Composite, showing that basic comprehension results overestimated ability to succeed on more challenging tasks. On the other hand, almost one third of the mid reading ability group did better on the Reading to Learn/Reading to Integrate Composite; for them, basic comprehension underrepresented their ability to complete more challenging reading tasks. For the high ability readers, nearly one third dropped on the Reading to Learn/Reading to Integrate Composite. For them, the basic comprehension measure overestimated their level of academic English proficiency. The other two thirds of the high reading ability group remained high, suggesting a higher threshold of academic English proficiency.
Considering results from both the nonnative speakers and native speakers, we conclude that the new tasks did assess something different from basic comprehension once a lower level threshold of basic academic English proficiency had been achieved. Examination of scatterplots based on discriminant functions indicated an obvious separation between participants who could and could not perform on the Reading to Learn/Reading to Integrate Composite, thus providing some tentative evidence for concurrent validity (Messick, 1989; Chapelle, 1999). We had hoped to find clear evidence of a hierarchy suggesting that Reading to Learn was demonstrably more difficult than basic comprehension and Reading to Integrate demonstrably more difficult than Reading to Learn, but results did not yield an obvious hierarchy. Results did, however, suggest an even simpler pattern: a dichotomy. For those nonnative speakers above the lower threshold of English language proficiency, which in our data would be approximately 500, or 49 on the scaled TOEFL Reading Comprehension scores, some could perform well on the new measures and some could not, revealing two rather than three groups on the Reading to Learn/Reading to Integrate Composite. These results lead us to speculate that the new measures tap additional skills such as 'sophisticated discourse processes and critical thinking skills' (Enright and Schedl, 1999: 24) in addition to language proficiency. For native speakers, the discriminant loading suggested a possible hierarchy of difficulty, but the scatterplots and classification tables revealed a dichotomy. Reclassification based on the Reading to Learn/Reading to Integrate Composite nearly eliminated the mid basic comprehension reading ability category, with only 16.8% of the native speakers qualifying as mid on the Reading to Learn/Reading to Integrate Composite.
VI Conclusions
While we acknowledge that there are some limitations to this project, such as the time required to administer and score new measures, the results are illuminating and useful, and they suggest some considerations for future research. The first consideration relates to appropriate classification of students based on academic reading abilities. This corresponds to the use of TOEFL as a gatekeeper by many institutions. For TOEFL test-takers whose current TOEFL scores would be in an intermediate to high intermediate range, the new tasks could assess additional abilities relevant to academic performance. For such participants, additional test development efforts are warranted. The majority of high reading ability nonnative speakers (64.9%) could perform the new measures; a third could not. Our results suggest that some unknown number of mid reading ability nonnative speakers are more capable of succeeding at more challenging academic reading tasks than their current level of basic comprehension assessment would indicate. At the same time, there are some high and mid reading ability participants who could not perform the more demanding tasks; admitting such students directly into university programs could result in failure. These results suggest that, for most students at lower levels of basic comprehension (in our study, those with TOEFL scores below 490 to 500), development of new tasks is unnecessary. Nevertheless, our results indicate that a certain small percentage (in our sample, 8 students, 18.2%) of nonnative speakers classified as low reading ability on basic comprehension did in fact perform adequately on more challenging tasks like Reading to Learn and Reading to Integrate. Given the very large number of TOEFL test-takers around the world, this finding merits further research. It would be potentially unfair to exclude such students from university study based on basic comprehension-only measures such as the current TOEFL. We would urge institutions that use TOEFL to consider scores on these new, more demanding tasks if doing so would address institutional needs. According to interview data collected in conjunction with this project (Trites, 2000: Chapter 6), participants perceived that successful completion of the new tasks required a thorough understanding of the texts, whereas the multiple-choice tests were less demanding, requiring only superficial grasp of content. This finding corroborates the observation of Freedle and Kostin (1999: 3), who note that often examinees do not need to comprehend the accompanying text to answer the test item. Therefore, for all TOEFL test-takers, incorporating more challenging tasks such as the Reading to Learn and Reading to Integrate measures into typical English language instruction could have positive washback effects.
The second area of consideration is the focus on additional relevant research. For predictive validity, it is important to determine the correlation between the Reading to Learn/Reading to Integrate Composite measures and actual academic performance in university classes. Correlations might differ depending on the degree of categorization and synthesis required by different major fields of study. Another area for future research is related to the novelty of the Reading to Learn task as an assessment technique. Current pedagogical trends in literacy instruction emphasize the use of graphic organizers as a means to understand and manipulate information in texts. To our knowledge, graphic organizers are typically used as classroom activities rather than assessment tools. They have good potential for use in assessment if students are familiar with them and if appropriate scoring systems can be developed. This project demonstrates that it is possible, though labor intensive, to develop reliable scoring systems. Further research is needed to explore refinement of this task type and related scoring systems for use in testing programs with large numbers of test-takers and scorers. Although the Reading to Integrate task (generating a synthesis) was more familiar, the scoring system was innovative because it reflected a reader's ability to recognize textual frames as well as integrate information. If test developers are interested in the abilities of nonnative speaker readers to perform such tasks, further development of similar tasks and scoring systems is warranted. While a substantial investment of time would be required to refine the administration and scoring systems needed for more complex tasks such as these, it should be weighed against the possible danger of under-representing the likelihood of academic success based on results of the basic comprehension-only measures still most often used to assess academic proficiency.
Acknowledgements
This project was funded by a grant from Educational Testing Service as part of the TOEFL 2000 effort. We appreciate the cooperation of the ETS staff members who assisted in the development of passage-specific tasks and other aspects of the research; however, no official endorsement of Educational Testing Service should be inferred.
VII References
Bachman, L. 2000. Modern language testing at the turn of the century: assuring that what we count counts. Language Testing 17, 1–42.

Biber, D. 1993. Using register-diversified corpora for general language studies. Computational Linguistics 19, 219–41.

Britt, M., Rouet, J. and Perfetti, C. 1996. Using hypertext to study and reason about historical evidence. In Rouet, J., Levonen, J., Dillon, A. and Spiro, R., editors, Hypertext and cognition. Mahwah, NJ: Lawrence Erlbaum, 43–72.

Chapelle, C. 1999. Validity in language assessment. Annual Review of Applied Linguistics 19, 1–19.

Educational Testing Service 1997. TOEFL test and score manual. Princeton, NJ: Educational Testing Service.

Educational Testing Service 1998. Draft TOEFL 2000 research agenda framework: areas of research. Research agenda three: TOEFL 2000 internal document. Princeton, NJ: Educational Testing Service.

Eignor, D., Taylor, C., Kirsch, I. and Jamieson, J. 1998. Development of a scale for assessing the level of computer familiarity of TOEFL examinees. TOEFL Research Report No. 60. Princeton, NJ: Educational Testing Service.

Enright, M. and Schedl, M. 1999. Reading for a reason: using reader purpose to guide test design. TOEFL 2000 Internal Report. Princeton, NJ: Educational Testing Service.

Enright, M., Grabe, W., Mosenthal, P., Mulcahy-Ernt, P. and Schedl, M. 1998. A TOEFL 2000 framework for testing reading comprehension: a working paper. Princeton, NJ: Educational Testing Service.

Foltz, P. 1996. Comprehension, coherence, and strategies in hypertext. In Rouet, J., Levonen, J., Dillon, A. and Spiro, R., editors, Hypertext and cognition. Mahwah, NJ: Lawrence Erlbaum, 109–36.
Freedle, R. and Kostin, I. 1999. Does the text matter in a multiple-choice test of comprehension? The case for the construct validity of TOEFL's minitalks. Language Testing 16, 2–32.

Goldman, S. 1997. Learning from text: reflections on the past and suggestions for the future. Discourse Processes 23, 357–98.

Hayes, J. and Hatch, J. 1999. Issues in measuring reliability: correlation versus percentage of agreement. Written Communication 16, 354–67.

Huberty, C. 1994. Applied discriminant analysis. New York: John Wiley.

Jamieson, J., Campbell, J., Norfleet, L. and Berbisada, N. 1993. Reliability of a computerized scoring routine for an open-ended task. System 21, 305–22.

Jamieson, J., Norfleet, L. and Berbisada, N. 1993. Successes, failures, and dropouts in computer-assisted language lessons. Computer Assisted English Language Learning Journal 4, 12–20.

Klecka, W. 1980. Discriminant analysis. In Lewis-Beck, M., editor, Quantitative applications in the social sciences, Volume 19. Newbury Park, CA: Sage.

Lehto, M., Zhu, W. and Carpenter, B. 1995. The relative effectiveness of hypertext and text. International Journal of Human-Computer Interaction 7, 293–313.

McNamara, D. and Kintsch, W. 1996. Learning from texts: effects of prior knowledge and text coherence. Discourse Processes 22, 247–88.

Messick, S. 1989. Validity. In Linn, R.L., editor, Educational measurement, 3rd edition. New York: American Council on Education/Macmillan, 13–103.

Meyer, B. 1985a. Prose analysis: purposes, procedures, and problems. In Britton, B. and Black, J., editors, Understanding expository text. Hillsdale, NJ: Lawrence Erlbaum, 11–64.

Meyer, B. 1985b. Prose analysis: purposes, procedures, and problems. Part 2. In Britton, B. and Black, J., editors, Understanding expository text. Hillsdale, NJ: Lawrence Erlbaum, 269–97.

Monks, V. 1997, August/September. Two views, same waterway. National Wildlife 35, 36–37.

Pellegrino, J., Baxter, G. and Glaser, R. 1999. Addressing the 'two disciplines' problem: linking theories of cognition and learning with assessment and instructional practice. Review of Research in Education 24, 307–53.

Perfetti, C. 1997. Sentences, individual differences, and multiple texts: three issues in text comprehension. Discourse Processes 23, 337–55.

Perfetti, C., Britt, M.A. and Georgi, M. 1995. Text-based learning and reasoning: studies in history. Hillsdale, NJ: Lawrence Erlbaum.

Perfetti, C., Marron, M. and Foltz, P. 1996. Sources of comprehension failure: theoretical perspectives and case studies. In Cornoldi, C. and Oakhill, J., editors, Reading comprehension difficulties: processes and intervention. Mahwah, NJ: Lawrence Erlbaum, 137–65.
Reinking, D. 1988: Computer-mediated text and comprehension differences: the role of reading time, reader preference, and estimation of learning. Reading Research Quarterly 23, 484–500.
Reinking, D. and Schreiner, R. 1985: The effects of computer-mediated text on measures of reading comprehension and reading behavior. Reading Research Quarterly 20, 536–53.
Spivey, N. 1997: The constructivist metaphor: reading, writing, and the making of meaning. San Diego, CA: Academic Press.
Stevens, J. 1996: Applied multivariate statistics for the social sciences, 3rd edition. Mahwah, NJ: Lawrence Erlbaum.
Tabachnick, B. and Fidell, L. 1996: Using multivariate statistics, 3rd edition. New York: Harper Collins.
Taylor, C., Jamieson, J., Eignor, D. and Kirsch, I. 1998: The relationship between computer familiarity and performance on computer-based TOEFL test tasks. TOEFL Research Report No. 61. Princeton, NJ: Educational Testing Service.
Tennesen, M. 1997, November/December: On a clear day. National Parks 71, 26–9.
Trites, L. 2000: Beyond basic comprehension: reading to learn and reading to integrate for native and non-native speakers. Unpublished doctoral dissertation, Northern Arizona University, Flagstaff, AZ.
Van den Berg, S. and Watt, J. 1991: Effects of educational setting on student responses to structured hypertext. Journal of Computer-Based Instruction 18, 118–24.
Van Dijk, T.A. and Kintsch, W. 1983: Strategies of discourse comprehension. New York: Academic Press.
Wiley, J. and Voss, J. 1999: Constructing arguments from multiple sources: tasks that promote understanding and not just memory for text. Journal of Educational Psychology 91, 310–11.
Zimmerman, T. 1997, December 29: Filter it with billions and billions of oysters: how to revive the Chesapeake Bay. US News and World Report 123, 63.
Appendix 1a Chart completion task
Directions: Complete the following chart. Fill in as much detail as possible from the text read by categorizing the information into the different areas on the chart. Do not use single words for your responses; form your responses in phrases or complete sentences. Include examples from the text.
Make no judgments about the accuracy of causes or effects, or the effectiveness of the solutions mentioned in the text. Solutions are seen as any action taken in response to the problem(s). Solutions can take many forms, such as proposed solutions, attempted solutions or failed solutions. Also, space is provided under each category for examples. Examples are specific examples found in the text that are used by the author(s) to exemplify the problems, causes, effects or solutions in the text. However, there may not be examples for every category.
Points will be awarded for correct responses only. There is no penalty for incorrect responses. Points will be awarded in the following manner:
Problems and Solutions: 10 points each
Causes and Effects: 5 points each
Examples: 1 point each
Chart columns:
Problems      Causes      Effects      Solutions
Examples      Examples    Examples     Examples
Appendix 1b Reading to Learn scoring rubric (0–241 total possible)
Problems (10 points; 1 word: 6 points; 40 maximum)
• Air pollution/smog in the National Parks
• Opposition or ignoring (or lack of cooperation) to environmental standards (regulations)
• No true 'point sources' (myriad of smaller pollution sources rather than one large source)
• National Parks limited jurisdiction of pollution sources outside of the parks
Causes (5 points; 1 word: 3 points; 55 maximum)
• Sulfur/nitrogen emissions
• Acid rain
• Ground-level ozone
• Urban/industrial emissions
• Automobile emissions
• Power plants/factories/plants emissions
• Emissions from smokestacks (chimneys)
• Emissions from kilns
• Unregulated pollution/smog (from other countries, Mexico)
• Smoke from controlled burns
• Emissions from large cities
Effects (5 points; 1 word: 3 points; 30 maximum)
• Trees and plants affected (injured), growth hindered
• Visibility decreased
• Metals loosened into waters (surface/ground water, water polluted)
• Nutrients removed (leached) from soil and/or plants
• Public outcry over stance of big business/government
• Aquatic life injured (damaged)
Solutions (10 points; 1 word: 6 points; 90 maximum)
• Stricter environmental laws (resolutions, acts)
• 1977 Clean Air Act Amendments labeling NPs as Class I areas
• Regional haze regulations proposed by the EPA
• Reduction of allowable pollution standards from industry
• Clean Air Act set visibility goals
• Objecting to construction permits
• Identification of pollution sources
• Industry modifications: installation of devices (scrubbers) to reduce pollution
• Public pressure to protect parks
Appendix 1b (continued)
Examples for Problems (1 point each; 5 maximum)
• Great Smoky Mountain National Park
• Grand Canyon National Park
• Smoky Mountain sources from Ohio, New York, Atlanta, etc.
• Grand Canyon sources from CA, NV, UT, AZ, NM and Mexico
Examples for Causes (1 point, or 5 points with overarching cause listed above)
• Automobile emissions
• Power plants/factories/plants emissions
• Emissions from smokestacks (chimneys)
• Emissions from kilns
• Unregulated pollution/smog (from other countries, Mexico)
• Smoke from controlled burns
• Emissions from large cities
Examples for Causes (1 point each; 10 maximum)
• Navajo Generating Station, Page, AZ (power plant, AZ)
• Power plants in TN and OH river valley
• Southern California Edison plant, Laughlin, NV
• Tennessee Luttrell kilns, TN
• Industry in AZ
• Cars in CA
• Smokestacks in NM, NV, UT
• Los Angeles
• Atlanta
• New York
Examples for Effects (1 point each; 6 maximum)
• Leaves of plants turning purple and brown (stippling)
• Hindering photosynthesis of plants and trees
• Reduction of visibility at Grand Canyon
• Visibility reduced 90% of the days
• See vague blue masses
• See half as far as in 1919
Examples for Solutions (1 point, or 10 points with overarching solution listed above)
• 1977 Clean Air Act Amendments labeling NPs as Class I areas
• Regional haze regulations proposed by the EPA
• Reduction of allowable pollution standards from industry
• TN Luttrell corp. building permit situation
Examples for Solutions (1 point each; 5 maximum)
• Objecting to kilns in TN
• Researchers using scientific technology to ID sources (radioactive isotopes)
• Southern California Edison plant in Laughlin, NV identified as point source
• Scrubbers installed at Navajo Generating Plant
• 10% reduction of pollution in 10–15 years (goal of no man-made pollution)
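The point values in this rubric can be tallied mechanically. The following is a minimal sketch, not the authors' scoring routine: the dictionary layout, the function name `score_chart` and the reading of '1 word' as a reduced-credit response are assumptions made for illustration, and the special rule that an example may earn 5 or 10 points when its overarching cause or solution is listed is deliberately left out.

```python
# Hypothetical encoding of the Reading to Learn rubric (names are illustrative).
RUBRIC = {
    "problems":  {"full": 10, "one_word": 6, "max": 40, "examples_max": 5},
    "causes":    {"full": 5,  "one_word": 3, "max": 55, "examples_max": 10},
    "effects":   {"full": 5,  "one_word": 3, "max": 30, "examples_max": 6},
    "solutions": {"full": 10, "one_word": 6, "max": 90, "examples_max": 5},
}

def score_chart(tally):
    """Sum capped points per category.

    tally maps a category name to counts of correct responses, e.g.
    {"problems": {"full": 2, "one_word": 1, "examples": 3}}.
    Incorrect responses are simply not counted (no penalty).
    """
    total = 0
    for category, rule in RUBRIC.items():
        counts = tally.get(category, {})
        main = (counts.get("full", 0) * rule["full"]
                + counts.get("one_word", 0) * rule["one_word"])
        examples = counts.get("examples", 0)  # 1 point each
        total += min(main, rule["max"]) + min(examples, rule["examples_max"])
    return total

# The category and example maxima together give the stated 241-point ceiling.
MAX_TOTAL = sum(r["max"] + r["examples_max"] for r in RUBRIC.values())
```

Capping each category separately mirrors the per-column maxima in the rubric, so a long list in one column cannot compensate for an empty one.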
Appendix 2a Reading to Integrate task integration activity
Directions: For the next 15 minutes, reflect on what you read and compose a short essay that combines the information in the material read and makes connections across the range of ideas and arguments presented.
[Ruled lines provided for the essay response]
Appendix 2b Reading to Integrate scoring rubric
Integration
50 Excellent: Integrates texts accurately and successfully on multiple levels and creates a true Documents Model. Generalizes at least two macrostructure concepts common across texts (this may be simply identifying the existence of a macrostructure followed by the supporting macrostructures from each text). Effectively integrates relevant support (i.e., details or macrostructures) to support generalizations and may still discuss each article separately to some degree. Integrates macrostructures present in both texts.
40 Good: Integrates through the creation of a well developed and accurate introduction and/or a conclusion, yet summarizes articles separately, AND/OR generalizes one major macrostructure that is fully developed with support. Uses some relevant support.
30 Fair: Attempts to integrate through the creation of a partially developed, possibly inaccurate introductory AND/OR concluding statement, usually related to the main topic of the articles, but has no substantive development. Creates a Text and Situation Model of each text separately. Does not make generalizations for integration beyond the one statement. May make evaluative or editorial statements; however, these may be inaccurate.
20 Poor: Creates a well developed and accurate Text Model and Situation Model for each of the texts, yet attempts no integrative connection across the texts.
10 Very Poor: Ineffectively or inaccurately attempts to summarize each article in a very reporting style, revealing little or no use of background knowledge, contextualization or evaluation. Response is simply a recall of information from the texts. No introduction or conclusion, OR participant confuses texts and sees separate texts as one issue (problem).
0 No Response: Participant does not attempt response or only addresses one article.
Macrostructures
Participants will receive one point for each macrostructure accurately identified in each text, with a minimum score of 4 awarded for the inability to accurately identify any macrostructures.
25 Excellent (Total 8): Accurately identifies all 4 macrostructures present in both texts.
20 Good (Total 6–7): Accurately identifies all 4 macrostructures in one text and 3 in the other, OR accurately identifies 3 of the four macrostructures in both texts (or 4 in one and 2 in the other).
15 Fair (Total 4–5): Accurately identifies 3 of the four macrostructures in one text and 2 of the four in the other (or 4 in one and 1 in the other), OR accurately identifies 2 of the four macrostructures in both texts (or 4 in one and 0 in the other, or 3 in one and 1 in the other).
10 Poor (Total 2–3): Accurately identifies 2 of the macrostructures in one text and 1 in the other (or 3 in one and 0 in the other). Accurately identifies 1 of the four macrostructures in both texts (or 2 in one and 0 in the other).
5 Very Poor (Total 0–1): Accurately identifies 1 of the four macrostructures in one text and 0 in the other, OR unable to accurately identify any macrostructures in the texts.
0 No Response: Participant does not attempt response.
Use of relevant details
5 Excellent: Effectively uses multiple relevant details as support; no irrelevant or erroneous details.
4 Good: Effectively uses relevant details as support; may include some inaccurate or irrelevant details. (More than 50% of details are relevant.)
3 Fair: Possibly uses one or two relevant details, yet several irrelevant details or inaccurate details (erroneous or fabricated) appear in the synthesis. (50% or less of details are relevant.)
2 Poor: Erroneous or fabricated details used in attempt to support arguments presented; does not include relevant details.
1 Very Poor: No details used or listed.
0 No Response: Participant does not attempt response.
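The Macrostructures bands in this rubric depend only on the combined count of macrostructures identified across the two texts (0 to 8). A short sketch of that mapping follows; the function name and the choice to handle only attempted responses (a non-attempt scores 0 under the rubric and is not modelled here) are assumptions for illustration.

```python
def macrostructure_score(text1_count: int, text2_count: int) -> int:
    """Map macrostructures identified in each text (0-4) to the rubric band.

    Applies to attempted responses only; the rubric's 'No Response' score
    of 0 is outside this helper.
    """
    for count in (text1_count, text2_count):
        if not 0 <= count <= 4:
            raise ValueError("each text has four macrostructures (0-4 allowed)")
    total = text1_count + text2_count
    if total == 8:
        return 25  # Excellent
    if total >= 6:
        return 20  # Good (total 6-7)
    if total >= 4:
        return 15  # Fair (total 4-5)
    if total >= 2:
        return 10  # Poor (total 2-3)
    return 5       # Very Poor (total 0-1)
```

Every combination spelled out in the rubric (for example 4 in one text and 0 in the other as Fair) falls in the same band as its total, which is why a single sum is enough.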
related. We used correlational analysis as the first step in answering this question. Results for the total participant population (see Table 8) showed moderate to high correlations across all reading measures. However, the analyses done for Research Questions 1 and 2 revealed that group status had a significant effect on performance on Reading to Learn and Reading to Integrate. Further, we realize that correlations are sensitive to variance, so the high correlations seen in the total population could have been an artifact of combining the three groups. Therefore, we examined the correlations among all reading measures for each group (available in Trites, 2000: Appendix 1, pp. 230–33). While the reading measures were still correlated, often moderately, sometimes highly, magnitudes differed and sometimes dropped substantially. The text-specific multiple-choice measures, BC1 and BC2, consistently correlated more highly with the Nelson–Denny and TOEFL Reading Comprehension tests than with Reading to Learn and Reading to Integrate based on the same texts, suggesting a test method or construct effect. Because comparisons between different measures of basic comprehension were not a goal of the project, BC1 and BC2 were not used in further analyses. We conclude that, as expected, all reading measures were related, but the lower correlations between Reading to Learn and Reading to Integrate and the traditional basic comprehension measures led us to consider further types of analysis to identify the possible distinctiveness of the new measures.
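The concern that pooled correlations can be an artifact of combining groups can be illustrated with a toy example. The scores below are invented for the illustration and are not data from the study; the helper names are likewise hypothetical. Within each group the two measures are uncorrelated, yet pooling groups with different means yields a very high correlation.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def r_for(pairs):
    """Correlation between the first and second score of each pair."""
    xs, ys = zip(*pairs)
    return pearson_r(xs, ys)

# Invented toy scores (NOT study data): two groups with different means.
group_a = [(1, 2), (2, 1), (1, 1), (2, 2)]
group_b = [(11, 12), (12, 11), (11, 11), (12, 12)]

r_a = r_for(group_a)                 # within-group correlation
r_b = r_for(group_b)                 # within-group correlation
r_pooled = r_for(group_a + group_b)  # correlation after combining groups
```

This is why examining correlations separately for each group, as described above, is a sensible check before interpreting the pooled coefficients.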
5 Discriminant analysis
Because we were interested in determining how constructs differed, we sought additional analyses to help us better characterize the new constructs. Of the several possible statistical methods that could have been employed, two are most plausible: multivariate analysis of variance, usually associated with experimental research,
Table 7 Post hoc Scheffé for Reading to Integrate measure (n = 244ᵃ)
Group   n     Group   n     Mean difference   Standard error
NS      101   NNSU    103   26.41ᵇ            2.53
              NNSG    40    10.05ᵇ            3.37
NNSU    103   NNSG    40    16.36ᵇ            3.36
Notes: ᵃ n size reduced for Reading to Integrate because of anomalous testing session; ᵇ p < .05
Table 8 Correlations for all reading measures for all participants (n = 251)
                              TOEFL Reading    Basic Comprehension   Basic Comprehension   Reading    Reading to
                              Comprehension    Test 1                Test 2                to Learn   Integrateᵃ
Nelson–Denny                  .90ᵇ             .85ᵇ                  .84ᵇ                  .66ᵇ       .69ᵇ
TOEFL Reading Comprehension   1.00             .85ᵇ                  .84ᵇ                  .64ᵇ       .69ᵇ
Basic Comprehension 1                          1.00                  .84ᵇ                  .68ᵇ       .68ᵇ
Basic Comprehension 2                                                1.00                  .68ᵇ       .70ᵇ
Reading to Learn                                                                           1.00       .59ᵇ
Notes: ᵃ n size reduced for Reading to Integrate because of anomalous testing session; ᵇ p < .05
and discriminant analysis, usually associated with descriptive research (Tabachnick and Fidell, 1996). The present research was conducted with samples of naturally occurring student groups and was not experimental. Moreover, we were interested in finding ways to compare participant performance on the measures of the new constructs, Reading to Learn and Reading to Integrate, with performance on more traditional measures of basic comprehension. Thus, we opted to use discriminant analysis because of its parsimony of description and clarity of interpretation (Stevens, 1996). Discriminant analysis, a technique recommended to describe group differences or predict group membership based on a comparison of multiple predictors (Huberty, 1994), has been used in other areas of applied linguistic research to investigate creation of a student profile of success or failure on Computer Assisted Language Learning (CALL) lessons (Jamieson et al., 1993) and accurate classification of text types into registers (Biber, 1993), among other purposes.
To further distinguish basic comprehension from the new constructs, we conducted discriminant analysis on each of the two language groups (native and nonnative) to determine whether Reading to Learn and Reading to Integrate would classify participants in the same way that Basic Comprehension would. We divided the native speaker and nonnative speaker groups into three levels (high, middle (mid) and low reading ability scorers) based on the basic comprehension measure chosen for that group (Nelson–Denny for native speakers, TOEFL Reading Comprehension for nonnative speakers). Research methodologists (Tabachnick and Fidell, 1996: 513) note that robustness is expected when the smallest group has at least 20 cases; our smallest group was 25. To check the assumption of homogeneity of the variance/covariance matrices, we examined the outcomes of Box's M Test and found them all nonsignificant (Klecka, 1980). Each group was checked for outliers using Mahalanobis distance treated as Chi-Square, and no outliers were found (Tabachnick and Fidell, 1996). Thus, the data met all assumptions required for use of discriminant analysis.
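The outlier screen described above can be sketched as follows. This is an illustrative reconstruction rather than the authors' procedure: it assumes two predictors per case (so the squared distance is referred to a chi-square distribution with 2 degrees of freedom), and the alpha of .001 is a commonly used convention that the text does not state. For 2 degrees of freedom the chi-square critical value has the closed form −2 ln(α), so no statistics library is needed.

```python
import math

def mahalanobis_sq_2d(points):
    """Squared Mahalanobis distance of each (x, y) case from the group centroid,
    using the sample covariance matrix (n - 1 denominator)."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    det = sxx * syy - sxy ** 2  # determinant of the 2x2 covariance matrix
    distances = []
    for x, y in points:
        dx, dy = x - mx, y - my
        # d' S^{-1} d written out for the 2x2 case
        distances.append((syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det)
    return distances

def chi2_critical_df2(alpha):
    """Upper-tail critical value of chi-square with 2 df (closed form)."""
    return -2.0 * math.log(alpha)

def flag_outliers(points, alpha=0.001):
    """Indices of cases whose squared distance exceeds the critical value."""
    cutoff = chi2_critical_df2(alpha)
    return [i for i, d in enumerate(mahalanobis_sq_2d(points)) if d > cutoff]
```

In use, each reading ability group would be screened separately, passing its (Reading to Learn, Reading to Integrate) score pairs to `flag_outliers`.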
a Discriminant analysis for nonnative speakers: To organize the discriminant analysis in order to see if the Reading to Learn/Reading to Integrate Composite classified participants similarly to the measure of Basic Comprehension (for nonnative speakers, TOEFL Reading Comprehension), we divided the entire nonnative speaker group (n = 146) into three levels of basic comprehension: high, mid and low. These reading ability groups of high, mid and low were based on typical TOEFL Reading Comprehension score levels required for program entry. Participants were classified as high if their scores were greater than or equal to 550, or 56 and above on the scaled score on the TOEFL Reading Comprehension (550 is the cut score often used for graduate entry). Participants were classified as mid if scores ranged between 500 and 549, or 50 to 55 on the TOEFL Reading Comprehension. Participants were classified as low if their scores fell below 500, or 49 and below on TOEFL Reading Comprehension (500 is a minimum TOEFL score sometimes used for undergraduate admission, often with the proviso that students enroll in ESL classes either prior to official enrollment or concurrently). For our entire nonnative speaker group, descriptive statistics on basic comprehension reading ability group membership levels appear in Table 9.
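The three-way split for nonnative speakers amounts to two cut scores on the scaled TOEFL Reading Comprehension score. A minimal sketch, with a hypothetical function name:

```python
def toefl_reading_group(scaled_score: int) -> str:
    """Assign a reading ability group from the scaled TOEFL Reading
    Comprehension score: high = 56 and above, mid = 50 to 55,
    low = 49 and below."""
    if scaled_score >= 56:
        return "high"
    if scaled_score >= 50:
        return "mid"
    return "low"
```

For example, a participant with a scaled score of 55 falls in the mid group, one point short of the level associated with the 550 cut score often used for graduate entry.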
We ran SPSS Discriminant Analysis with initial grouping variables of high, mid and low reading ability. We compared initial reading ability levels with high, mid and low categories on the Reading to Learn/Reading to Integrate Composite, a new variable reflecting level of performance on the Reading to Learn and Reading to Integrate measures combined.⁵ The discriminant analysis yielded one discriminant function with an eigenvalue of .92, responsible for 99.9% of the variance in outcomes. Wilks' Lambda for this function was .52, significant at .001; the associated Chi-Square value was extremely large (90.93) and highly significant (p < .001), indicating the group centroids on the composite Reading to Learn/Reading to Integrate function for the three nonnative speaker reading ability groups were significantly different. Both Reading to Learn and
Table 9 Descriptive statistics for nonnative speakers by TOEFL reading comprehension reading ability groups (n = 143)
Reading ability group   n     Mean on TOEFL reading comprehension   sd
High (≥ 56)             57    59.68                                 3.03
Mid (50–55)             42    52.81                                 1.67
Low (≤ 49)              44    42.11                                 5.47
Total                   143   52.26                                 8.22
Note: Total n = 143 due to loss of three cases in anomalous Reading to Integrate session
⁵ We first calculated two separate discriminant analyses, one for Reading to Learn and one for Reading to Integrate, but we found that both loaded on a single function, so we used the composite in subsequent analyses.
Reading to Integrate loaded significantly on the discriminant function, at .86 for Reading to Learn and .78 for Reading to Integrate (p < .05).
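The reported statistics can be cross-checked against one another. The sketch below uses the standard identity that Wilks' Lambda is the product of 1/(1 + eigenvalue) over the discriminant functions, together with Bartlett's chi-square approximation; whether SPSS computed the reported value in exactly this way is an assumption, and the function names are illustrative. Reading the reported eigenvalue as .92, with 143 cases, 2 predictors and 3 groups, the result lands close to the reported Lambda and Chi-Square; small differences are expected because the inputs are rounded and the second, near-zero function is ignored.

```python
import math

def wilks_lambda(eigenvalues):
    """Wilks' Lambda as the product of 1 / (1 + eigenvalue)."""
    lam = 1.0
    for ev in eigenvalues:
        lam *= 1.0 / (1.0 + ev)
    return lam

def bartlett_chi_square(lam, n_cases, n_predictors, n_groups):
    """Bartlett's approximation: -(N - 1 - (p + g) / 2) * ln(Lambda)."""
    return -(n_cases - 1 - (n_predictors + n_groups) / 2) * math.log(lam)

# Nonnative speakers: eigenvalue read as .92, N = 143, p = 2, g = 3.
nns_lambda = wilks_lambda([0.92])
nns_chi_square = bartlett_chi_square(nns_lambda, 143, 2, 3)
```

The associated degrees of freedom under this approximation are p(g − 1), which is 4 for two predictors and three groups.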
Over half (64.9%) of the high reading ability group remained high on the new measure; less than half (42.9%) of the mid reading ability group remained classified as mid. Hence, the Reading to Learn/Reading to Integrate Composite was particularly influential in reclassifying the mid reading ability group and, to a lesser extent, the high group. However, most (81.8%) of the low reading ability group remained low on the Reading to Learn/Reading to Integrate Composite (see Table 10). Of the 143 nonnative speaker participants, 91 (64%) remained in the initial basic comprehension category on the composite; the rest moved, but in different directions. Twenty-one participants (14.7%) were classified into a higher category on the Reading to Learn/Reading to Integrate Composite than their initial basic comprehension level would have suggested, while 31 (21.7%) were reclassified into a lower category. Thus, 52 participants (36.4%), just over one third of the sample, were classified differently based on their Reading to Learn/Reading to Integrate Composite performance.
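The reclassification counts in this paragraph follow directly from the cross-tabulation in Table 10. The sketch below recomputes them; the variable and function names are illustrative, and the matrix is simply the Count block of Table 10 with rows as initial groups and columns as predicted groups, both ordered high, mid, low.

```python
# Count block of Table 10: rows = initial group, columns = predicted group,
# both in the order high, mid, low.
TABLE_10_COUNTS = [
    [37, 17, 3],   # initial high
    [13, 18, 11],  # initial mid
    [2, 6, 36],    # initial low
]

def reclassification_summary(matrix):
    """Count cases that stay in, move above, or move below their initial group."""
    size = len(matrix)
    total = sum(sum(row) for row in matrix)
    same = sum(matrix[i][i] for i in range(size))
    # A smaller column index than row index means a higher predicted category.
    higher = sum(matrix[i][j] for i in range(size) for j in range(i))
    lower = total - same - higher
    return {"total": total, "same": same, "higher": higher, "lower": lower}

nns_summary = reclassification_summary(TABLE_10_COUNTS)
# Share of participants classified differently, as a percentage.
nns_moved_pct = 100 * (nns_summary["higher"] + nns_summary["lower"]) / nns_summary["total"]
```

The same function can be applied to any three-by-three classification table laid out in this order, including the native speaker results.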
b Discriminant analysis for native speakers: Because one of our goals in this project was to probe the possible validity of these new measures by assessing performance of two groups, native as well as
Table 10 Discriminant analysis: comparison of nonnative speaker reading ability groups with Reading to Learn/Reading to Integrate Composite (n = 143)
Columns High, Mid and Low give predicted group membership for the Reading to Learn/Reading to Integrate Composite.
Reading ability group    High    Mid     Low     Initial classification total
Count
High (≥ 56)              37      17      3       57
Mid (50–55)              13      18      11      42
Low (≤ 49)               2       6       36      44
Reclassification total   52      41      50      143
Percentage
High (≥ 56)              64.9    29.8    5.3     100.0
Mid (50–55)              31.0    42.9    26.2    100.0
Low (≤ 49)               4.5     13.6    81.8    100.0
Note: Total n = 143 due to loss of three cases in anomalous Reading to Integrate session
nonnative speakers, we conducted a parallel discriminant analysis for native speakers. Thus, for the native speakers, we followed the same procedure, dividing the entire native speaker group (n = 105) into three levels of basic comprehension (high, mid and low) based on score distances of .5 standard deviations from the sample mean of the Nelson–Denny test for these participants. Native speakers need not take reading comprehension tests when entering the university, so the three-way split was based entirely on our sample data. Participants were classified as high if their scores on the Nelson–Denny were greater than or equal to 135. Participants were classified as mid if their scores ranged between 119 and 134 on the Nelson–Denny. Participants were classified as low if their scores fell at or below 118 on the Nelson–Denny. Descriptive statistics for the entire native speaker sample on basic comprehension group membership appear in Table 11.
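Reading the split as half a standard deviation either side of the sample mean, the cut points can be approximated from summary statistics. The sketch below uses the total-row values reported in Table 11 for the 101 cases retained in the discriminant analysis (mean 126.04, sd 16.52); these may differ slightly from the figures for the full group of 105 that the authors would have used, so the computed cuts are only expected to fall near the stated boundaries of 118/119 and 134/135. The function name is hypothetical.

```python
def half_sd_cuts(mean, sd, width=0.5):
    """Return (low_cut, high_cut) placed `width` standard deviations
    below and above the sample mean."""
    return mean - width * sd, mean + width * sd

# Total-row statistics from Table 11 (n = 101); an approximation to the
# n = 105 values, which are not reported in this section.
low_cut, high_cut = half_sd_cuts(126.04, 16.52)
```

Because Nelson–Denny scores are whole numbers, boundaries computed this way have to be rounded to integer score ranges such as those given in the text.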
The discriminant analysis yielded one discriminant function with an eigenvalue of .31, responsible for 98.7% of the variance in outcomes. Wilks' Lambda for this function was .76, significant at .001; the associated Chi-Square value was large (26.39) and highly significant (p < .001), indicating the group centroids on the discriminant function for the three reading ability groups on the Reading to Learn/Reading to Integrate Composite were significantly different. Pooled within-groups correlations between discriminating variables showed that Reading to Learn correlated with the first discriminant function at a level of .81; Reading to Integrate correlated with the second discriminant function at .71. This contrasts with findings for the nonnative speakers, where scores on the combined new measures loaded significantly on only one discriminant function. For native speakers, then, there is evidence for two significant discriminant functions, although the first accounts for almost all of the variance. Although these two measures (Reading to Learn and Reading to Integrate) loaded on two separate discriminant functions, they still
Table 11 Descriptive statistics for native speakers by Nelson–Denny reading ability groups (n = 101)
Reading ability group   n     Mean on Nelson–Denny   sd
High (≥ 135)            39    141.26                 5.19
Mid (119–134)           37    125.65                 5.09
Low (≤ 118)             25    102.88                 10.98
Total                   101   126.04                 16.52
Note: Total n = 101 for discriminant analysis due to loss of four cases in anomalous Reading to Integrate session
showed moderate correlations with the alternate function,⁶ justifying the composite calculations.
Results of discriminant analysis for native speakers, seen in Table 12, show a different pattern than that observed for nonnative speakers. Nearly three-fourths (71.8%) of the native speakers classified as high in the basic comprehension reading ability group remained high on the Reading to Learn/Reading to Integrate Composite. For the mid group, however, only 18.9% remained classified as mid. Just over half (52%) of the low reading ability group members remained low on the Reading to Learn/Reading to Integrate Composite. As with the nonnative speakers, participants in the mid category on basic comprehension showed the most frequent reclassification. Forty-eight of the 101 (47.5%) native speaker participants remained in the initial classification categories. Twenty-two (21.8%) were reclassified into a higher category and 31 (30.7%) were reclassified into a lower category. Thus, 53 participants (52.5%), over half of the sample, were classified differently based on their performance on the Reading to Learn/Reading to Integrate Composite.
⁶ Reading to Learn correlated with function 2 at −.58; Reading to Integrate with function 1 at .71.
Table 12 Discriminant analysis: comparison of native speaker reading ability groups with Reading to Learn/Reading to Integrate Composite (n = 101)
Columns High, Mid and Low give predicted group membership for the Reading to Learn/Reading to Integrate Composite.
Reading ability group    High    Mid     Low     Initial classification total
Count
High (≥ 135)             28      3       8       39
Mid (119–134)            10      7       20      37
Low (≤ 118)              5       7       13      25
Reclassification total   43      17      41      101
Percentage
High (≥ 135)             71.8    7.7     20.5    100.0
Mid (119–134)            27.0    18.9    54.1    100.0
Low (≤ 118)              20.0    28.0    52.0    100.0
Note: Total n = 101 for discriminant analysis due to loss of four cases in anomalous Reading to Integrate session
V Interpretation
Analyses done to answer Research Questions 1 and 2 showed that performance on Reading to Learn and Reading to Integrate measures was significantly influenced by language background and level of education (graduate vs. undergraduate, for nonnative speakers only). Moreover, level of computer familiarity had no significant effect on Reading to Learn and Reading to Integrate performance.
Correlations showed that, as expected, all reading measures correlated to some degree, generally answering Research Question 3. The most interesting results came from the discriminant analyses because they showed a pattern of differential classification on the new tasks, sensitive to initial reading ability. This pattern differed for nonnative speakers and native speakers. Examination of the reclassifications based on the new tasks revealed some of the problems of using a basic comprehension-only test to predict performance on more challenging literacy tasks. These results also imply the need for further work on new measures of advanced literacy skills, such as Reading to Learn and Reading to Integrate, to reflect trends in construct-driven assessment (Pellegrino et al., 1999).
For the nonnative speakers, most (81.8%) of the participants with TOEFL Reading Comprehension below 50 remained classified as low on the Reading to Learn/Reading to Integrate Composite, suggesting the existence of a lower threshold of academic English proficiency. Participants below this threshold were unlikely to perform well on tasks assessing Reading to Learn and Reading to Integrate. Even tasks involving only selection (such as BC1 and BC2) rather than production of responses were difficult for this group. However, for the nonnative speaker readers in the mid and high reading ability groups, basic comprehension level was not nearly as consistent a predictor of performance on the Reading to Learn/Reading to Integrate Composite. This was especially striking for those in the mid reading ability group. Approximately one fourth of the mid group dropped into the low category on the Reading to Learn/Reading to Integrate Composite, indicating that they experienced more difficulty in completing the new measures, but approximately one fourth could indeed categorize and synthesize information relatively well, despite middling performance on basic comprehension. This finding suggests that for nonnative speaker readers in the mid reading ability category, basic comprehension measures were insufficient to predict their performance on the new measures.
Almost two-thirds of the high basic comprehension nonnative speakers remained high on the Reading to Learn/Reading to Integrate Composite, but one third dropped, suggesting that they could not categorize and synthesize information from texts as well as they could recognize information. For this third, the recognition-only task (multiple-choice basic comprehension) overestimated ability to manipulate and synthesize information. For the high and mid reading ability nonnative speaker groups, basic comprehension-only tests may then overestimate academic English proficiency.
Results for the native speakers showed a more definitive pattern of reclassification. For the low reading ability group, approximately half were misclassified, suggesting that the basic comprehension measure underrepresented their ability to categorize and synthesize information in academic texts. Most of the mid reading ability group were reclassified as lower on the Reading to Learn/Reading to Integrate Composite, showing that basic comprehension results overestimated ability to succeed on more challenging tasks. On the other hand, almost one third of the mid reading ability group did better on the Reading to Learn/Reading to Integrate Composite; for them, basic comprehension underrepresented their ability to complete more challenging reading tasks. For the high ability readers, nearly one third dropped on the Reading to Learn/Reading to Integrate Composite. For them, the basic comprehension measure overestimated their level of academic English proficiency. The other two thirds of the high reading ability group remained high, suggesting a higher threshold of academic English proficiency.
Considering results from both the nonnative speakers and nativespeakers we conclude that the new tasks did assess something dif-ferent from basic comprehension once a lower level threshold ofbasic academic English proficiency had been achieved Examinationof scatterplots based on discriminant functions indicated an obviousseparation between participants who could and could not perform onthe Reading to LearnReading to Integrate Composite thus provid-ing some tentative evidence for concurrent validity (Messick 1989Chapelle 1999) We had hoped to find clear evidence of a hierarchysuggesting that Reading to Learn was demonstrably more difficultthan basic comprehension and Reading to Integrate demonstrablymore difficult than Reading to Learn but results did not yield anobvious hierarchy Results did however suggest an even simplerpattern a dichotomy For those nonnative speakers above the lowerthreshold of English language proficiency which in our data wouldbe approximately 500 or 49 on the scaled TOEFL Reading
Comprehension scores, some could perform well on the new measures and some could not, revealing two rather than three groups on the Reading to Learn/Reading to Integrate Composite. These results lead us to speculate that the new measures tap additional skills, such as ‘sophisticated discourse processes and critical thinking skills’ (Enright and Schedl, 1999: 24), in addition to language proficiency. For native speakers, the discriminant loading suggested a possible hierarchy of difficulty, but the scatterplots and classification tables revealed a dichotomy. Reclassification based on the Reading to Learn/Reading to Integrate Composite nearly eliminated the mid basic comprehension reading ability category, with only 16.8% of the native speakers qualifying as mid on the Reading to Learn/Reading to Integrate Composite.
VI Conclusions
While we acknowledge that there are some limitations to this project, such as the time required to administer and score new measures, the results are illuminating and useful, and suggest some considerations for future research. The first consideration relates to appropriate classification of students based on academic reading abilities. This corresponds to the use of TOEFL as a gatekeeper by many institutions. For TOEFL test-takers whose current TOEFL scores would be in an intermediate to high intermediate range, the new tasks could assess additional abilities relevant to academic performance. For such participants, additional test development efforts are warranted. The majority of high reading ability nonnative speakers (64.9%) could perform the new measures; a third could not. Our results suggest that some unknown number of mid reading ability nonnative speakers are more capable of succeeding at more challenging academic reading tasks than their current level of basic comprehension assessment would indicate. At the same time, there are some high and mid reading ability participants who could not perform the more demanding tasks; admitting such students directly into university programs could result in failure. These results suggest that, for most students at lower levels of basic comprehension (in our study, those with TOEFL scores below 490 to 500), development of new tasks is unnecessary. Nevertheless, our results indicate that a certain small percentage (in our sample, 8 students, 18.2%) of nonnative speakers classified as low reading ability on basic comprehension did in fact
perform adequately on more challenging tasks like Reading to Learn and Reading to Integrate. Given the very large number of TOEFL test-takers around the world, this finding merits further research. It would be potentially unfair to exclude such students from university study based on basic comprehension-only measures such as the current TOEFL. We would urge institutions that use TOEFL to consider scores on these new, more demanding tasks if doing so would address institutional needs. According to interview data collected in conjunction with this project (Trites, 2000: Chapter 6), participants perceived that successful completion of the new tasks required a thorough understanding of the texts, whereas the multiple-choice tests were less demanding, requiring only superficial grasp of content. This finding corroborates the observation of Freedle and Kostin (1999: 3), who note that often examinees do not need to comprehend the accompanying text to answer the test item. Therefore, for all TOEFL test-takers, incorporating more challenging tasks such as the Reading to Learn and Reading to Integrate measures into typical English language instruction could have positive washback effects.
The second area of consideration is the focus on additional relevant research. For predictive validity, it is important to determine the correlation between the Reading to Learn/Reading to Integrate Composite measures and actual academic performance in university classes. Correlations might differ depending on the degree of categorization and synthesis required by different major fields of study. Another area for future research is related to the novelty of the Reading to Learn task as an assessment technique. Current pedagogical trends in literacy instruction emphasize the use of graphic organizers as a means to understand and manipulate information in texts. To our knowledge, graphic organizers are typically used as classroom activities rather than assessment tools. They have good potential for use in assessment if students are familiar with them and if appropriate scoring systems can be developed. This project demonstrates that it is possible, though labor intensive, to develop reliable scoring systems. Further research is needed to explore refinement of this task type and related scoring systems for use in testing programs with large numbers of test-takers and scorers. Although the Reading to Integrate task (generating a synthesis) was more familiar, the scoring system was innovative because it reflected a reader’s ability to recognize textual frames as well as integrate information. If test developers are interested in the abilities of nonnative speaker readers to perform such tasks, further development of similar tasks and scoring systems is warranted. While a substantial investment of time
would be required to refine the administration and scoring systems needed for more complex tasks such as these, it should be weighed against the possible danger of under-representing the likelihood of academic success based on results of the basic comprehension-only measures still most often used to assess academic proficiency.
Acknowledgements
This project was funded by a grant from Educational Testing Service as part of the TOEFL 2000 effort. We appreciate the cooperation of the ETS staff members who assisted in the development of passage-specific tasks and other aspects of the research; however, no official endorsement of Educational Testing Service should be inferred.
VII References
Bachman, L. 2000: Modern language testing at the turn of the century: assuring that what we count counts. Language Testing 17, 1–42.
Biber, D. 1993: Using register-diversified corpora for general language studies. Computational Linguistics 19, 219–41.
Britt, M., Rouet, J. and Perfetti, C. 1996: Using hypertext to study and reason about historical evidence. In Rouet, J., Levonen, J., Dillon, A. and Spiro, R., editors, Hypertext and cognition. Mahwah, NJ: Lawrence Erlbaum, 43–72.
Chapelle, C. 1999: Validity in language assessment. Annual Review of Applied Linguistics 19, 1–19.
Educational Testing Service 1997: TOEFL test and score manual. Princeton, NJ: Educational Testing Service.
Educational Testing Service 1998: Draft TOEFL 2000 research agenda framework: areas of research. Research agenda three. TOEFL 2000 internal document. Princeton, NJ: Educational Testing Service.
Eignor, D., Taylor, C., Kirsch, I. and Jamieson, J. 1998: Development of a scale for assessing the level of computer familiarity of TOEFL examinees. TOEFL Research Report No. 60. Princeton, NJ: Educational Testing Service.
Enright, M. and Schedl, M. 1999: Reading for a reason: using reader purpose to guide test design. TOEFL 2000 Internal Report. Princeton, NJ: Educational Testing Service.
Enright, M., Grabe, W., Mosenthal, P., Mulcahy-Ernt, P. and Schedl, M. 1998: A TOEFL 2000 framework for testing reading comprehension: a working paper. Princeton, NJ: Educational Testing Service.
Foltz, P. 1996: Comprehension, coherence and strategies in hypertext. In Rouet, J., Levonen, J., Dillon, A. and Spiro, R., editors, Hypertext and cognition. Mahwah, NJ: Lawrence Erlbaum, 109–36.
Freedle, R. and Kostin, I. 1999: Does the text matter in a multiple choice test of comprehension? The case for the construct validity of TOEFL’s minitalks. Language Testing 16, 2–32.
Goldman, S. 1997: Learning from text: reflections on the past and suggestions for the future. Discourse Processes 23, 357–98.
Hayes, J. and Hatch, J. 1999: Issues in measuring reliability: correlation versus percentage of agreement. Written Communication 16, 354–67.
Huberty, C. 1994: Applied discriminant analysis. New York: John Wiley.
Jamieson, J., Campbell, J., Norfleet, L. and Berbisada, N. 1993: Reliability of a computerized scoring routine for an open-ended task. System 21, 305–22.
Jamieson, J., Norfleet, L. and Berbisada, N. 1993: Successes, failures and dropouts in computer-assisted language lessons. Computer Assisted English Language Learning Journal 4, 12–20.
Klecka, W. 1980: Discriminant analysis. In Lewis-Beck, M., editor, Quantitative applications in the social sciences, Volume 19. Newbury Park, CA: Sage.
Lehto, M., Zhu, W. and Carpenter, B. 1995: The relative effectiveness of hypertext and text. International Journal of Human-Computer Interaction 7, 293–313.
McNamara, D. and Kintsch, W. 1996: Learning from texts: effects of prior knowledge and text coherence. Discourse Processes 22, 247–88.
Messick, S. 1989: Validity. In Linn, R.L., editor, Educational measurement, 3rd edition. New York: American Council on Education/Macmillan, 13–103.
Meyer, B. 1985a: Prose analyses: purposes, procedures and problems. In Britton, B. and Black, J., editors, Understanding expository text. Hillsdale, NJ: Lawrence Erlbaum, 11–64.
Meyer, B. 1985b: Prose analysis: purposes, procedures and problems, Part 2. In Britton, B. and Black, J., editors, Understanding expository text. Hillsdale, NJ: Lawrence Erlbaum, 269–97.
Monks, V. 1997, August/September: Two views, same waterway. National Wildlife 35, 36–37.
Pellegrino, J., Baxter, G. and Glaser, R. 1999: Addressing the ‘two disciplines’ problem: linking theories of cognition and learning with assessment and instructional practice. Review of Research in Education 24, 307–53.
Perfetti, C. 1997: Sentences, individual differences and multiple texts: three issues in text comprehension. Discourse Processes 23, 337–55.
Perfetti, C., Britt, M.A. and Georgi, M. 1995: Text-based learning and reasoning: studies in history. Hillsdale, NJ: Lawrence Erlbaum.
Perfetti, C., Marron, M. and Foltz, P. 1996: Sources of comprehension failure: theoretical perspectives and case studies. In Cornoldi, C. and Oakhill, J., editors, Reading comprehension difficulties: processes and intervention. Mahwah, NJ: Lawrence Erlbaum, 137–65.
Reinking, D. 1988: Computer-mediated text and comprehension differences: the role of reading time, reader preference and estimation of learning. Reading Research Quarterly 23, 484–500.
Reinking, D. and Schreiner, R. 1985: The effects of computer-mediated text on measures of reading comprehension and reading behavior. Reading Research Quarterly 20, 536–53.
Spivey, N. 1997: The constructivist metaphor: reading, writing and the making of meaning. San Diego, CA: Academic Press.
Stevens, J. 1996: Applied multivariate statistics for the social sciences, 3rd edition. Mahwah, NJ: Lawrence Erlbaum.
Tabachnick, B. and Fidell, L. 1996: Using multivariate statistics, 3rd edition. New York: Harper Collins.
Taylor, C., Jamieson, J., Eignor, D. and Kirsch, I. 1998: The relationship between computer familiarity and performance on computer-based TOEFL test tasks. TOEFL Research Report No. 61. Princeton, NJ: Educational Testing Service.
Tennesen, M. 1997, November/December: On a clear day. National Parks 71, 26–9.
Trites, L. 2000: Beyond basic comprehension: reading to learn and reading to integrate for native and non-native speakers. Unpublished doctoral dissertation, Northern Arizona University, Flagstaff, AZ.
Van den Berg, S. and Watt, J. 1991: Effects of educational setting on student responses to structured hypertext. Journal of Computer-Based Instruction 18, 118–24.
Van Dijk, T.A. and Kintsch, W. 1983: Strategies of discourse comprehension. New York: Academic Press.
Wiley, J. and Voss, J. 1999: Constructing arguments from multiple sources: tasks that promote understanding and not just memory for text. Journal of Educational Psychology 91, 310–11.
Zimmerman, T. 1997, December 29: Filter it with billions and billions of oysters: how to revive the Chesapeake Bay. US News and World Report 123, 63.
Appendix 1a Chart completion task
Directions: Complete the following chart. Fill in as much detail as possible from the text read by categorizing the information into the different areas on the chart. Do not use single words for your responses; form your responses in phrases or complete sentences. Include examples from the text.
Make no judgments about the accuracy of causes or effects or the effectiveness of the solutions mentioned in the text. Solutions are seen
as any action taken in response to the problem(s). Solutions can take many forms, such as proposed solutions, attempted solutions or failed solutions. Also, space is provided under each category for examples. Examples are specific examples found in the text that are used by the author(s) to exemplify the problems, causes, effects or solutions in the text. However, there may not be examples for every category.
Points will be awarded for correct responses only. There is no penalty for incorrect responses. Points will be awarded in the following manner:
Problems and Solutions: 10 points each
Causes and Effects: 5 points each
Examples: 1 point each
Problems Causes Effects Solutions
Examples Examples Examples Examples
Latricia Trites and Mary McGroarty 205
Appendix 1b Reading to Learn scoring rubric (0–241 total possible)

Problems (10 points; 1 word, 6 points; 40 maximum)
• Air pollution/smog in the National Parks
• Opposition or ignoring (or lack of cooperation) to environmental standards (regulations)
• No true ‘point sources’ (myriad of smaller pollution sources rather than one large source)
• National Parks limited jurisdiction of pollution sources outside of the parks

Examples for Problems (1 point; 5 maximum)
• Great Smoky Mountain National Park
• Grand Canyon National Park
• Smoky Mountain sources from Ohio, New York, Atlanta, etc.
• Grand Canyon sources from CA, NV, UT, AZ, NM and Mexico
• TN Luttrell corp. building permit situation

Causes (5 points; 1 word, 3 points; 55 maximum)
• Sulfur/nitrogen emissions
• Acid rain
• Ground-level ozone
• Urban/industrial emissions
• Automobile emissions
• Power plants/factories/plants emissions
• Emissions from smokestacks (chimneys)
• Emissions from kilns
• Unregulated pollution/smog (from other countries, Mexico)
• Smoke from controlled burns
• Emissions from large cities

Examples for Causes (1 point, or 5 points with overarching cause listed above)
• Automobile emissions
• Power plants/factories/plants emissions
• Emissions from smokestacks (chimneys)
• Emissions from kilns
• Unregulated pollution/smog (from other countries, Mexico)
• Smoke from controlled burns
• Emissions from large cities

Examples for Causes (1 point; 10 maximum)
• Navajo Generating Station, Page, AZ (Power Plant, AZ)
• Power plants in TN and OH river valley
• Southern California Edison Plant, Laughlin, NV
• Tennessee Luttrell Kilns, TN
• Industry in AZ
• Cars in CA
• Smokestacks in NM, NV, UT
• Los Angeles
• Atlanta
• New York

Effects (5 points; 1 word, 3 points; 30 maximum)
• Trees and plants affected (injured), growth hindered
• Visibility decreased
• Metals loosened into waters (surface, ground water; water polluted)
• Nutrients removed (leached) from soil and/or plants
• Public outcry over stance of big business/government
• Aquatic life injured (damaged)

Examples for Effects (1 point; 6 maximum)
• Leaves of plants turning purple and brown (stippling)
• Hindering photosynthesis of plants and trees
• Reduction of visibility at Grand Canyon
• Visibility reduced 90% of the days
• See vague blue masses
• See half as far as in 1919

Solutions (10 points; 1 word, 6 points; 90 maximum)
• Stricter environmental laws (resolutions, acts)
• 1977 Clean Air Act Amendments labeling NPs as Class I areas
• Regional haze regulations proposed by the EPA
• Reduction of allowable pollution standards from industry
• Clean Air Act set visibility goals
• Objecting to construction permits
• Identification of pollution sources
• Industry modifications: installation of devices (scrubbers) to reduce pollution
• Public pressure to protect parks

Examples for Solutions (1 point, or 10 points with overarching solution listed above)
• 1977 Clean Air Act Amendments labeling NPs as Class I areas
• Regional haze regulations proposed by the EPA
• Reduction of allowable pollution standards from industry

Examples for Solutions (1 point; 5 maximum)
• Objecting to kilns in TN
• Researchers using scientific technology to ID sources (radioactive isotopes)
• Southern California Edison Plant in Laughlin, NV identified as point source
• Scrubbers installed at Navajo Generating Plant
• 10% reduction of pollution in 10–15 years (goal of no man-made pollution)
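The point arithmetic of the Reading to Learn rubric can be illustrated with a short sketch. It is not the authors' scoring program: the dictionary and function names are hypothetical, the per-category maxima (40, 55, 30, 90) and example maxima (5, 10, 6, 5) are taken from the rubric, and the treatment of one-word answers as earning the reduced values (6 or 3 points) is an assumption about how the rubric was applied. The rule that an example may instead earn full points when it restates an overarching cause or solution is left out for simplicity.

```python
# Values taken from the Appendix 1b rubric; all names are hypothetical.
CATEGORY_MAX = {"problems": 40, "causes": 55, "effects": 30, "solutions": 90}
EXAMPLE_MAX = {"problems": 5, "causes": 10, "effects": 6, "solutions": 5}
FULL_POINTS = {"problems": 10, "causes": 5, "effects": 5, "solutions": 10}
# Assumption: a correct but one-word response earns the reduced value.
ONE_WORD_POINTS = {"problems": 6, "causes": 3, "effects": 3, "solutions": 6}


def category_score(category, full=0, one_word=0, examples=0):
    """Score one chart column, capping idea points and example points separately."""
    idea_points = full * FULL_POINTS[category] + one_word * ONE_WORD_POINTS[category]
    idea_points = min(idea_points, CATEGORY_MAX[category])
    example_points = min(examples, EXAMPLE_MAX[category])  # 1 point per example
    return idea_points + example_points


def maximum_total():
    """Largest possible chart score under the rubric's stated maxima."""
    return sum(CATEGORY_MAX.values()) + sum(EXAMPLE_MAX.values())


if __name__ == "__main__":
    print(maximum_total())
    print(category_score("problems", full=2, one_word=1, examples=7))
```

Summing the stated maxima is a useful consistency check on the rubric, since the category and example caps together should reproduce the total possible score given in the rubric heading.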
Appendix 2a Reading to Integrate task integration activity
Directions: For the next 15 minutes, reflect on what you read and compose a short essay that combines the information in the material read and makes connections across the range of ideas and arguments presented.
[Ruled lines for the participant's written response]
Appendix 2b Reading to Integrate scoring rubric
Integration
50 Excellent: Integrates texts accurately and successfully on multiple levels and creates a true Documents Model. Generalizes at least two macrostructure concepts common across texts (this may be simply identifying the existence of a macrostructure, followed by the supporting macrostructures from each text). Effectively integrates relevant support (i.e., details or macrostructures) to support generalizations and may still discuss each article separately to some degree. Integrates macrostructures present in both texts.
40 Good: Integrates through the creation of a well developed and accurate introduction and/or a conclusion, yet summarizes articles separately, AND/OR generalizes one major macrostructure that is fully developed with support. Uses some relevant support.
30 Fair: Attempts to integrate through the creation of a partially developed, possibly inaccurate, introductory AND/OR concluding statement, usually related to the main topic of the articles, but has no substantive development. Creates a Text and Situation Model of each text separately. Does not make generalizations for integration beyond the one statement. May make evaluative or editorial statements; however, these may be inaccurate.
20 Poor: Creates a well developed and accurate Text Model and Situation Model for each of the texts, yet attempts no integrative connection across the texts.
10 Very Poor: Ineffectively or inaccurately attempts to summarize each article in a very reporting style, revealing little or no use of background knowledge, contextualization or evaluation. Response is simply a recall of information from the texts. No introduction or conclusion, OR participant confuses texts and sees separate texts as one issue (problem).
0 No Response: Participant does not attempt response or only addresses one article.

Macrostructures
Participants will receive one point for each macrostructure accurately identified in each text, with a minimum score of 4 awarded for the inability to accurately identify any macrostructures.
25 Excellent (Total 8): Accurately identifies all 4 macrostructures present in both texts.
20 Good (Total 6–7): Accurately identifies all 4 macrostructures in one text and 3 in the other, OR accurately identifies 3 of the four macrostructures in both texts (or 4 in one and 2 in the other).
15 Fair (Total 4–5): Accurately identifies 3 of the four macrostructures in one text and 2 of the four in the other (or 4 in one and 1 in the other), OR accurately identifies 2 of the four macrostructures in both texts (or 4 in one and 0 in the other, or 3 in one and 1 in the other).
10 Poor (Total 2–3): Accurately identifies 2 of the macrostructures in one text and 1 in the other (or 3 in one and 0 in the other), OR accurately identifies 1 of the four macrostructures in both texts (or 2 in one and 0 in the other).
5 Very Poor (Total 0–1): Accurately identifies 1 of the four macrostructures in one text and 0 in the other, OR unable to accurately identify any macrostructures in the texts.
0 No Response: Participant does not attempt response.
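Because every band in the macrostructure rubric is defined by the total number of macrostructures identified across the two texts, the banding can be expressed as a small lookup. The sketch below is only an illustration of that mapping; the function name and the `attempted` flag are hypothetical, and it follows the band table (0–1 total earns 5 points) rather than the introductory sentence's stated minimum.

```python
def macrostructure_score(found_in_text_a, found_in_text_b, attempted=True):
    """Map macrostructures identified per text (0-4 each) to the rubric band score."""
    if not attempted:
        return 0  # "No Response" band
    for count in (found_in_text_a, found_in_text_b):
        if not 0 <= count <= 4:
            raise ValueError("each text has exactly four macrostructures")
    total = found_in_text_a + found_in_text_b
    if total == 8:
        return 25  # Excellent
    if total >= 6:
        return 20  # Good
    if total >= 4:
        return 15  # Fair
    if total >= 2:
        return 10  # Poor
    return 5       # Very Poor


if __name__ == "__main__":
    for pair in [(4, 4), (4, 2), (3, 1), (2, 0), (1, 0)]:
        print(pair, macrostructure_score(*pair))
```

Keying the bands on the total, rather than on each listed combination, reproduces every combination the rubric enumerates (for example, 4 and 2, or 3 and 3, both fall in the Good band).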
Use of relevant details
5 Excellent: Effectively uses multiple relevant details as support; no irrelevant or erroneous details.
4 Good: Effectively uses relevant details as support; may include some inaccurate or irrelevant details. (More than 50% of details are relevant.)
3 Fair: Possibly uses one or two relevant details, yet several irrelevant details or inaccurate details (erroneous or fabricated) appear in the synthesis. (50% or less of details are relevant.)
2 Poor: Erroneous or fabricated details used in attempt to support arguments presented; does not include relevant details.
1 Very Poor: No details used or listed.
0 No Response: Participant does not attempt response.
Table 8 Correlations for all reading measures for all participants (n = 251)

                              TOEFL Reading   Basic Comp.   Basic Comp.   Reading    Reading to
                              Comprehension   Test 1        Test 2        to Learn   Integrate a
Nelson–Denny                  .90b            .85b          .84b          .66b       .69b
TOEFL Reading Comprehension   1.00            .85b          .84b          .64b       .69b
Basic Comprehension 1                         1.00          .84b          .68b       .68b
Basic Comprehension 2                                       1.00          .68b       .70b
Reading to Learn                                                          1.00       .59b

Notes: a n size reduced for Reading to Integrate because of anomalous testing session; b p < .05
and discriminant analysis, usually associated with descriptive research (Tabachnick and Fidell, 1996). The present research was conducted with samples of naturally occurring student groups and was not experimental. Moreover, we were interested in finding ways to compare participant performance on the measures of the new constructs, Reading to Learn and Reading to Integrate, with performance on more traditional measures of basic comprehension. Thus, we opted to use discriminant analysis because of its parsimony of description and clarity of interpretation (Stevens, 1996). Discriminant analysis, a technique recommended to describe group differences or predict group membership based on a comparison of multiple predictors (Huberty, 1994), has been used in other areas of applied linguistic research to investigate creation of a student profile of success or failure on Computer Assisted Language Learning (CALL) lessons (Jamieson et al., 1993) and accurate classification of text types into registers (Biber, 1993), among other purposes.
To further distinguish basic comprehension from the new constructs, we conducted discriminant analysis on each of the two language groups (native and nonnative) to determine whether Reading to Learn and Reading to Integrate would classify participants in the same way that Basic Comprehension would. We divided the native speaker and nonnative speaker groups into three levels (high, middle (mid) and low reading ability scorers) based on the basic comprehension measure chosen for that group (Nelson–Denny for native speakers; TOEFL Reading Comprehension for nonnative speakers). Research methodologists (Tabachnick and Fidell, 1996: 513) note that robustness is expected when the smallest group has at least 20 cases; our smallest group was 25. To check the assumption of homogeneity of the variance/covariance matrices, we examined the outcomes of Box’s M Test and found them all nonsignificant (Klecka, 1980). Each group was checked for outliers using Mahalanobis distance treated as Chi-Square, and no outliers were found (Tabachnick and Fidell, 1996). Thus, the data met all assumptions required for use of discriminant analysis.
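The outlier screen mentioned above (Mahalanobis distance referred to a chi-square distribution) can be sketched for the two-predictor case. This is an illustration under stated assumptions, not the SPSS procedure the authors ran: it assumes exactly two predictors per participant (a Reading to Learn score and a Reading to Integrate score), so that squared distances are compared with a chi-square critical value on 2 degrees of freedom, and it assumes an alpha of .001, which the text does not report. All function names are hypothetical.

```python
import math


def mahalanobis_squared_2d(points):
    """Squared Mahalanobis distance of each (x, y) pair from the group centroid."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Sample variances and covariance (n - 1 in the denominator).
    s_xx = sum((x - mean_x) ** 2 for x, _ in points) / (n - 1)
    s_yy = sum((y - mean_y) ** 2 for _, y in points) / (n - 1)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in points) / (n - 1)
    det = s_xx * s_yy - s_xy ** 2
    distances = []
    for x, y in points:
        dx, dy = x - mean_x, y - mean_y
        # d' S^-1 d, using the closed-form inverse of a 2 x 2 covariance matrix.
        distances.append((s_yy * dx * dx - 2 * s_xy * dx * dy + s_xx * dy * dy) / det)
    return distances


def chi_square_critical_2df(alpha):
    """Critical value for 2 df, where the upper-tail probability is exp(-x / 2)."""
    return -2.0 * math.log(alpha)


def flag_outliers(points, alpha=0.001):
    """True for each case whose squared distance exceeds the critical value."""
    cutoff = chi_square_critical_2df(alpha)
    return [d2 > cutoff for d2 in mahalanobis_squared_2d(points)]


if __name__ == "__main__":
    # Hypothetical (Reading to Learn, Reading to Integrate) scores for one group.
    group = [(120, 45), (98, 30), (150, 60), (110, 50), (135, 40)]
    print(flag_outliers(group))
```

With more than two predictors the same logic applies, but the covariance matrix must be inverted numerically and the critical value taken from a chi-square table for the corresponding degrees of freedom.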
a Discriminant analysis for nonnative speakers: To organize the discriminant analysis in order to see if the Reading to Learn/Reading to Integrate Composite classified participants similarly to the measure of Basic Comprehension (for nonnative speakers, TOEFL Reading Comprehension), we divided the entire nonnative speaker group (n = 146) into three levels of basic comprehension: high, mid
and low. These reading ability groups of high, mid and low were based on typical TOEFL Reading Comprehension score levels required for program entry. Participants were classified as high if their scores were greater than or equal to 550, or 56 and above on the scaled score on the TOEFL Reading Comprehension (550 is the cut score often used for graduate entry). Participants were classified as mid if scores ranged between 500 and 549, or 50 to 55 on the TOEFL Reading Comprehension. Participants were classified as low if their scores fell below 500, or 49 and below on TOEFL Reading Comprehension (500 is a minimum TOEFL score sometimes used for undergraduate admission, often with the proviso that students enroll in ESL classes either prior to official enrollment or concurrently). For our entire nonnative speaker group, descriptive statistics on basic comprehension reading ability group membership levels appear in Table 9.
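The three-way grouping rule described above reduces to two cut points on the scaled TOEFL Reading Comprehension score. A minimal sketch follows; the function name is hypothetical and the thresholds are the ones stated in the text (56 and above, 50 to 55, 49 and below).

```python
def toefl_reading_group(scaled_score):
    """Assign a reading ability group from a scaled TOEFL Reading Comprehension score."""
    if scaled_score >= 56:
        return "high"  # corresponds to a total score of 550 or above
    if scaled_score >= 50:
        return "mid"   # corresponds to 500-549
    return "low"       # corresponds to below 500


if __name__ == "__main__":
    for score in (59, 55, 50, 49, 42):
        print(score, toefl_reading_group(score))
```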
We ran SPSS Discriminant Analysis with initial grouping variables of high, mid and low reading ability. We compared initial reading ability levels with high, mid and low categories on the Reading to Learn/Reading to Integrate Composite, a new variable reflecting level of performance on the Reading to Learn and Reading to Integrate measures combined.⁵ The discriminant analysis yielded one discriminant function with an eigenvalue of .92, responsible for 99.9% of the variance in outcomes. Wilks’ Lambda for this function was .52, significant at .001; the associated Chi-Square value was extremely large (90.93) and highly significant (p < .001), indicating the group centroids on the composite Reading to Learn/Reading to Integrate function for the three nonnative speaker reading ability groups were significantly different. Both Reading to Learn and
Table 9 Descriptive statistics for nonnative speakers by TOEFL reading comprehension reading ability groups (n = 143)

Reading ability group    n      Mean on TOEFL reading comprehension    sd
High (≥56)               57     59.68                                  3.03
Mid (50–55)              42     52.81                                  1.67
Low (≤49)                44     42.11                                  5.47
Total                    143    52.26                                  8.22

Note: Total n = 143 due to loss of three cases in anomalous Reading to Integrate session
⁵ We first calculated two separate discriminant analyses, one for Reading to Learn and one for Reading to Integrate, but we found that both loaded on a single function, so we used the composite in subsequent analyses.
Reading to Integrate loaded significantly on the discriminant function, at .86 for Reading to Learn and .78 for Reading to Integrate (p < .05).
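The reported eigenvalue, Wilks' Lambda and Chi-Square for the nonnative speaker analysis are linked by standard formulas, which the sketch below applies as a rough consistency check. Two assumptions are made that the text does not state: that Lambda is the product of 1/(1 + eigenvalue) over the discriminant functions, and that the Chi-Square comes from Bartlett's approximation, χ² = −[N − 1 − (p + g)/2] ln Λ with p(g − 1) degrees of freedom, using N = 143 cases, p = 2 predictors and g = 3 groups. The function names are hypothetical. Under those assumptions an eigenvalue of .92 gives a Lambda that rounds to .52, and the resulting Chi-Square falls close to the reported 90.93, with the small gap attributable to rounding in the published figures.

```python
import math


def wilks_lambda(eigenvalues):
    """Wilks' Lambda as the product of 1 / (1 + eigenvalue) over all functions."""
    result = 1.0
    for eigenvalue in eigenvalues:
        result *= 1.0 / (1.0 + eigenvalue)
    return result


def bartlett_chi_square(lam, n_cases, n_predictors, n_groups):
    """Bartlett's chi-square approximation for Wilks' Lambda, with its df."""
    factor = n_cases - 1 - (n_predictors + n_groups) / 2.0
    chi_square = -factor * math.log(lam)
    degrees_of_freedom = n_predictors * (n_groups - 1)
    return chi_square, degrees_of_freedom


if __name__ == "__main__":
    lam = wilks_lambda([0.92])  # eigenvalue reported for the nonnative analysis
    chi_square, df = bartlett_chi_square(round(lam, 2), 143, 2, 3)
    print(round(lam, 2), round(chi_square, 2), df)
```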
Over half (64.9%) of the high reading ability group remained high on the new measure; less than half (42.9%) of the mid reading ability group remained classified as mid. Hence, the Reading to Learn/Reading to Integrate Composite was particularly influential in reclassifying the mid reading ability group and, to a lesser extent, the high group. However, most (81.8%) of the low reading ability group remained low on the Reading to Learn/Reading to Integrate Composite (see Table 10). Of the 143 nonnative speaker participants, 91 (64%) remained in the initial basic comprehension category on the composite; the rest moved, but in different directions. Twenty-one participants (14.7%) were classified into a higher category on the Reading to Learn/Reading to Integrate Composite than their initial basic comprehension level would have suggested, while 31 (21.7%) were reclassified into a lower category. Thus, 52 participants (36.4%), just over one third of the sample, were classified differently based on their Reading to Learn/Reading to Integrate Composite performance.
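The tallies of participants who stayed in, rose above, or fell below their initial category can be recomputed directly from the cell counts in Table 10. The sketch below does this; the variable and function names are hypothetical, and the only data used are the nine counts printed in the table (rows are initial groups, columns are predicted groups, both ordered high, mid, low).

```python
# Table 10 cell counts: rows = initial group, columns = predicted group,
# both ordered high, mid, low.
TABLE_10_COUNTS = [
    [37, 17, 3],   # initially high
    [13, 18, 11],  # initially mid
    [2, 6, 36],    # initially low
]


def reclassification_summary(table):
    """Count cases that stayed, moved to a higher group, or moved to a lower group."""
    size = len(table)
    total = sum(sum(row) for row in table)
    stayed = sum(table[i][i] for i in range(size))
    # Index 0 is the highest group, so a smaller column index means a higher group.
    higher = sum(table[i][j] for i in range(size) for j in range(size) if j < i)
    lower = sum(table[i][j] for i in range(size) for j in range(size) if j > i)
    return {"total": total, "stayed": stayed, "higher": higher, "lower": lower}


def as_percentages(summary):
    """Express each tally as a percentage of the total, to one decimal place."""
    total = summary["total"]
    return {key: round(100.0 * value / total, 1)
            for key, value in summary.items() if key != "total"}


if __name__ == "__main__":
    summary = reclassification_summary(TABLE_10_COUNTS)
    print(summary)
    print(as_percentages(summary))
```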
b Discriminant analysis for native speakers: Because one of our goals in this project was to probe the possible validity of these new measures by assessing performance of two groups, native as well as
Table 10 Discriminant analysis: comparison of nonnative speaker reading ability groups with Reading to Learn/Reading to Integrate Composite (n = 143)

Reading ability group     Predicted group membership for Reading to      Initial classification
                          Learn/Reading to Integrate Composite           total
                          High       Mid        Low
Count
  High (≥56)              37         17         3                        57
  Mid (50–55)             13         18         11                       42
  Low (≤49)               2          6          36                       44
  Reclassification total  52         41         50                       143
Percentage
  High (≥56)              64.9       29.8       5.3                      100.0
  Mid (50–55)             31.0       42.9       26.2                     100.0
  Low (≤49)               4.5        13.6       81.8                     100.0

Note: Total n = 143 due to loss of three cases in anomalous Reading to Integrate session
Latricia Trites and Mary McGroarty 195
nonnative speakers we conducted a parallel discriminant analysisfor native speakers Thus for the native speakers we followed thesame procedure dividing the entire native speaker group (n 105)into three levels of basic comprehension high mid and low basedon score distances of 5 standard deviations from the sample meanof the NelsonndashDenny test for these participants Native speakersneed not take reading comprehension tests when entering the univer-sity so the three-way split was based entirely on our sample dataParticipants were classified as high if their scores on the Nelson-Denny were greater than or equal to 135 Participants were classifiedas mid if their scores ranged between 119ndash134 on the Nelson-DennyParticipants were classified as low if their scores fell at or below 118on the Nelson-Denny Descriptive statistics for the entire nativespeaker sample on basic comprehension group membership appearin Table 11
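The three-way split described above can be illustrated with a short sketch. It is not the authors' procedure: the function names are hypothetical, and the Table 11 mean and standard deviation (n = 101) are used as stand-ins because the full-sample (n = 105) values are not reported here. The integer cutoffs 118 and 135 are the ones stated in the text.

```python
def half_sd_bounds(mean, sd, width=0.5):
    """Return (lower, upper) cut points at mean -/+ width * sd."""
    offset = width * sd
    return mean - offset, mean + offset


def classify(score, low_max=118, high_min=135):
    """Assign a Nelson-Denny score to a basic comprehension level.

    low_max and high_min default to the cutoffs stated in the text;
    everything strictly between them counts as mid.
    """
    if score >= high_min:
        return "high"
    if score <= low_max:
        return "low"
    return "mid"


# Stand-in summary statistics taken from Table 11 (n = 101), an assumption
# made only so the sketch has numbers to work with.
lower, upper = half_sd_bounds(126.04, 16.52)
print(f"cut points: {lower:.2f} and {upper:.2f}")
print([classify(s) for s in (141, 126, 103)])
```

With these stand-in values the cut points fall at about 117.8 and 134.3, close to the stated group boundaries of 118 and 135.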
Table 11 Descriptive statistics for native speakers by Nelson–Denny reading ability groups (n = 101)

Reading ability group    n      Mean on Nelson–Denny    sd
High (≥135)              39     141.26                  5.19
Mid (119–134)            37     125.65                  5.09
Low (≤118)               25     102.88                  10.98
Total                    101    126.04                  16.52

Note: Total n = 101 for discriminant analysis due to loss of four cases in anomalous Reading to Integrate session.

The discriminant analysis yielded one discriminant function with an eigenvalue of .31, responsible for 98.7% of the variance in outcomes. Wilks' Lambda for this function was .76, significant at .001; the associated Chi-Square value was large (26.39) and highly significant (p < .001), indicating that the group centroids on the discriminant function for the three reading ability groups on the Reading to Learn/Reading to Integrate Composite were significantly different. Pooled within-groups correlations between discriminating variables showed that Reading to Learn correlated with the first discriminant function at a level of .81; Reading to Integrate correlated with the second discriminant function at .71. This contrasts with findings for the nonnative speakers, where scores on the combined new measures loaded significantly on only one discriminant function. For native speakers, then, there is evidence for two significant discriminant functions, although the first accounts for almost all of the variance. Although these two measures (Reading to Learn and Reading to Integrate) loaded on two separate discriminant functions, they still showed moderate correlations with the alternate function,6 justifying the composite calculations.
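As a rough consistency check on the reported statistics, the sketch below uses two standard textbook relations that are not taken from this article: Wilks' Lambda as the product of 1/(1 + eigenvalue) over all functions, and Bartlett's chi-square approximation, −[N − 1 − (p + g)/2] ln Λ. The second eigenvalue is inferred from the reported 98.7% share, and p = 2 predictors and g = 3 groups are assumptions read off the design. Because the inputs are rounded, the results only approximate the reported values.

```python
import math


def wilks_lambda(eigenvalues):
    """Wilks' Lambda as the product of 1 / (1 + eigenvalue)."""
    lam = 1.0
    for ev in eigenvalues:
        lam *= 1.0 / (1.0 + ev)
    return lam


def bartlett_chi_square(lam, n_cases, n_predictors, n_groups):
    """Bartlett's chi-square approximation for Wilks' Lambda."""
    factor = n_cases - 1 - (n_predictors + n_groups) / 2.0
    return -factor * math.log(lam)


first = 0.31          # reported eigenvalue of the first function
share_first = 0.987   # reported share of variance for the first function
# Inferred, not reported: the eigenvalue that leaves 1.3% for function 2.
second = first * (1.0 - share_first) / share_first

lam = wilks_lambda([first, second])
chi2 = bartlett_chi_square(lam, n_cases=101, n_predictors=2, n_groups=3)
df = 2 * (3 - 1)  # predictors * (groups - 1)
print(f"Lambda = {lam:.2f}, chi-square = {chi2:.1f}, df = {df}")
```

Under these assumptions Lambda comes out at about .76 and the chi-square at about 26.7, in line with the reported .76 and 26.39.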
Results of discriminant analysis for native speakers, seen in Table 12, show a different pattern from that observed for nonnative speakers. Nearly three-fourths (71.8%) of the native speakers classified as high in the basic comprehension reading ability group remained high on the Reading to Learn/Reading to Integrate Composite. For the mid group, however, only 18.9% remained classified as mid. Just over half (52%) of the low reading ability group members remained low on the Reading to Learn/Reading to Integrate Composite. As with the nonnative speakers, participants in the mid category on basic comprehension showed the most frequent reclassification. Forty-eight of the 101 (47.5%) native speaker participants remained in the initial classification categories. Twenty-two (21.8%) were reclassified into a higher category and 31 (30.7%) were reclassified into a lower category. Thus, 53 participants (52.5%), over half of the sample, were classified differently based on their performance on the Reading to Learn/Reading to Integrate Composite.
6 Reading to Learn correlated with function 2 at −.58; Reading to Integrate with function 1 at .71.
Table 12 Discriminant analysis comparison of native speaker reading ability groups with Reading to Learn/Reading to Integrate Composite (n = 101)

                          Predicted group membership for Reading to
                          Learn/Reading to Integrate Composite
Reading ability group     High      Mid       Low       Initial classification total

Count
High (≥135)               28        3         8         39
Mid (119–134)             10        7         20        37
Low (≤118)                5         7         13        25
Reclassification total    43        17        41        101

Percentage
High (≥135)               71.8      7.7       20.5      100.0
Mid (119–134)             27.0      18.9      54.1      100.0
Low (≤118)                20.0      28.0      52.0      100.0

Note: Total n = 101 for discriminant analysis due to loss of four cases in anomalous Reading to Integrate session.
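The reclassification summaries reported for the two groups can be recomputed directly from the count matrices in Tables 10 and 12. The sketch below is illustrative only; the function and variable names are hypothetical, and rows and columns are assumed to be ordered high, mid, low, as in the tables.

```python
LEVELS = ("high", "mid", "low")

# Rows: initial basic comprehension group; columns: predicted group.
TABLE_10 = [[37, 17, 3], [13, 18, 11], [2, 6, 36]]   # nonnative speakers
TABLE_12 = [[28, 3, 8], [10, 7, 20], [5, 7, 13]]     # native speakers


def reclassification_summary(counts):
    """Count cases that stayed, moved up, or moved down a category."""
    n = len(LEVELS)
    total = sum(sum(row) for row in counts)
    same = sum(counts[i][i] for i in range(n))
    # Levels run high -> low, so a smaller column index means a higher group.
    higher = sum(counts[i][j] for i in range(n) for j in range(n) if j < i)
    lower = sum(counts[i][j] for i in range(n) for j in range(n) if j > i)

    def pct(k):
        return round(100.0 * k / total, 1)

    return {
        "total": total,
        "same": (same, pct(same)),
        "higher": (higher, pct(higher)),
        "lower": (lower, pct(lower)),
        "moved": (higher + lower, pct(higher + lower)),
    }


for name, table in (("Table 10", TABLE_10), ("Table 12", TABLE_12)):
    print(name, reclassification_summary(table))
```

Summing the diagonal gives the participants who kept their initial category; the cells below and above it give those reclassified into higher and lower categories, respectively.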
V Interpretation
Analyses done to answer Research Questions 1 and 2 showed that performance on Reading to Learn and Reading to Integrate measures was significantly influenced by language background and level of education (graduate vs. undergraduate, for nonnative speakers only). Moreover, level of computer familiarity had no significant effect on Reading to Learn and Reading to Integrate performance.
Correlations showed that, as expected, all reading measures correlated to some degree, generally answering Research Question 3. The most interesting results came from the discriminant analyses because they showed a pattern of differential classification on the new tasks, sensitive to initial reading ability. This pattern differed for nonnative speakers and native speakers. Examination of the reclassifications based on the new tasks revealed some of the problems of using a basic comprehension-only test to predict performance on more challenging literacy tasks. These results also imply the need for further work on new measures of advanced literacy skills, such as Reading to Learn and Reading to Integrate, to reflect trends in construct-driven assessment (Pellegrino et al., 1999).
For the nonnative speakers, most (81.8%) of the participants with TOEFL Reading Comprehension below 50 remained classified as low on the Reading to Learn/Reading to Integrate Composite, suggesting the existence of a lower threshold of academic English proficiency. Participants below this threshold were unlikely to perform well on tasks assessing Reading to Learn and Reading to Integrate. Even tasks involving only selection (such as BC1 and BC2) rather than production of responses were difficult for this group. However, for the nonnative speaker readers in the mid and high reading ability groups, basic comprehension level was not nearly as consistent a predictor of performance on the Reading to Learn/Reading to Integrate Composite. This was especially striking for those in the mid reading ability group. Approximately one fourth of the mid group dropped into the low category on the Reading to Learn/Reading to Integrate Composite, indicating that they experienced more difficulty in completing the new measures, but approximately one fourth could indeed categorize and synthesize information relatively well, despite middling performance on basic comprehension. This finding suggests that, for nonnative speaker readers in the mid reading ability category, basic comprehension measures were insufficient to predict their performance on the new measures.
Almost two-thirds of the high basic comprehension nonnative speakers remained high on the Reading to Learn/Reading to Integrate Composite, but one third dropped, suggesting that they could not categorize and synthesize information from texts as well as they could recognize information. For this third, the recognition-only task, multiple-choice basic comprehension, overestimated ability to manipulate and synthesize information. For the high and mid reading ability nonnative speaker groups, basic comprehension-only tests may then overestimate academic English proficiency.
Results for the native speakers showed a more definitive pattern of reclassification. For the low reading ability group, approximately half were misclassified, suggesting that the basic comprehension measure underrepresented their ability to categorize and synthesize information in academic texts. Most of the mid reading ability group were reclassified as lower on the Reading to Learn/Reading to Integrate Composite, showing that basic comprehension results overestimated ability to succeed on more challenging tasks. On the other hand, almost one third of the mid reading ability group did better on the Reading to Learn/Reading to Integrate Composite; for them, basic comprehension underrepresented their ability to complete more challenging reading tasks. For the high ability readers, nearly one third dropped on the Reading to Learn/Reading to Integrate Composite. For them, the basic comprehension measure overestimated their level of academic English proficiency. The other two thirds of the high reading ability group remained high, suggesting a higher threshold of academic English proficiency.
Considering results from both the nonnative speakers and native speakers, we conclude that the new tasks did assess something different from basic comprehension once a lower level threshold of basic academic English proficiency had been achieved. Examination of scatterplots based on discriminant functions indicated an obvious separation between participants who could and could not perform on the Reading to Learn/Reading to Integrate Composite, thus providing some tentative evidence for concurrent validity (Messick, 1989; Chapelle, 1999). We had hoped to find clear evidence of a hierarchy, suggesting that Reading to Learn was demonstrably more difficult than basic comprehension and Reading to Integrate demonstrably more difficult than Reading to Learn, but results did not yield an obvious hierarchy. Results did, however, suggest an even simpler pattern: a dichotomy. For those nonnative speakers above the lower threshold of English language proficiency, which in our data would be approximately 500, or 49 on the scaled TOEFL Reading Comprehension scores, some could perform well on the new measures and some could not, revealing two rather than three groups on the Reading to Learn/Reading to Integrate Composite. These results lead us to speculate that the new measures tap additional skills, such as 'sophisticated discourse processes and critical thinking skills' (Enright and Schedl, 1999: 24), in addition to language proficiency. For native speakers, the discriminant loading suggested a possible hierarchy of difficulty, but the scatterplots and classification tables revealed a dichotomy. Reclassification based on the Reading to Learn/Reading to Integrate Composite nearly eliminated the mid basic comprehension reading ability category, with only 16.8% of the native speakers qualifying as mid on the Reading to Learn/Reading to Integrate Composite.
VI Conclusions
While we acknowledge that there are some limitations to this project, such as the time required to administer and score new measures, the results are illuminating and useful, and they suggest some considerations for future research. The first consideration relates to appropriate classification of students based on academic reading abilities. This relates to the use of TOEFL as a gatekeeper by many institutions. For TOEFL test-takers whose current TOEFL scores would be in an intermediate to high intermediate range, the new tasks could assess additional abilities relevant to academic performance. For such participants, additional test development efforts are warranted. The majority of high reading ability nonnative speakers (64.9%) could perform the new measures; a third could not. Our results suggest that some unknown number of mid reading ability nonnative speakers are more capable of succeeding at more challenging academic reading tasks than their current level of basic comprehension assessment would indicate. At the same time, there are some high and mid reading ability participants who could not perform the more demanding tasks; admitting such students directly into university programs could result in failure. These results suggest that for most students at lower levels of basic comprehension (in our study, those with TOEFL scores below 490 to 500), development of new tasks is unnecessary. Nevertheless, our results indicate that a certain small percentage (in our sample, 8 students, 18.2%) of nonnative speakers classified as low reading ability on basic comprehension did in fact perform adequately on more challenging tasks like Reading to Learn and Reading to Integrate. Given the very large number of TOEFL test-takers around the world, this finding merits further research. It would be potentially unfair to exclude such students from university study based on basic comprehension-only measures such as the current TOEFL. We would urge institutions that use TOEFL to consider scores on these new, more demanding tasks if doing so would address institutional needs. According to interview data collected in conjunction with this project (Trites, 2000: Chapter 6), participants perceived that successful completion of the new tasks required a thorough understanding of the texts, whereas the multiple-choice tests were less demanding, requiring only superficial grasp of content. This finding corroborates the observation of Freedle and Kostin (1999: 3), who note that often examinees do not need to comprehend the accompanying text to answer the test item. Therefore, for all TOEFL test-takers, incorporating more challenging tasks such as the Reading to Learn and Reading to Integrate measures into typical English language instruction could have positive washback effects.
The second area of consideration is the focus on additional relevant research. For predictive validity, it is important to determine the correlation between the Reading to Learn/Reading to Integrate Composite measures and actual academic performance in university classes. Correlations might differ depending on the degree of categorization and synthesis required by different major fields of study. Another area for future research is related to the novelty of the Reading to Learn task as an assessment technique. Current pedagogical trends in literacy instruction emphasize the use of graphic organizers as a means to understand and manipulate information in texts. To our knowledge, graphic organizers are typically used as classroom activities rather than assessment tools. They have good potential for use in assessment if students are familiar with them and if appropriate scoring systems can be developed. This project demonstrates that it is possible, though labor intensive, to develop reliable scoring systems. Further research is needed to explore refinement of this task type and related scoring systems for use in testing programs with large numbers of test-takers and scorers. Although the Reading to Integrate task (generating a synthesis) was more familiar, the scoring system was innovative because it reflected a reader's ability to recognize textual frames as well as integrate information. If test developers are interested in the abilities of nonnative speaker readers to perform such tasks, further development of similar tasks and scoring systems is warranted. While a substantial investment of time would be required to refine the administration and scoring systems needed for more complex tasks such as these, it should be weighed against the possible danger of under-representing the likelihood of academic success based on results of the basic comprehension-only measures still most often used to assess academic proficiency.
Acknowledgements
This project was funded by a grant from Educational Testing Service as part of the TOEFL 2000 effort. We appreciate the cooperation of the ETS staff members who assisted in the development of passage-specific tasks and other aspects of the research; however, no official endorsement of Educational Testing Service should be inferred.
VII References
Bachman, L. 2000: Modern language testing at the turn of the century: assuring that what we count counts. Language Testing 17, 1–42.
Biber, D. 1993: Using register-diversified corpora for general language studies. Computational Linguistics 19, 219–41.
Britt, M., Rouet, J. and Perfetti, C. 1996: Using hypertext to study and reason about historical evidence. In Rouet, J., Levonen, J., Dillon, A. and Spiro, R., editors, Hypertext and cognition. Mahwah, NJ: Lawrence Erlbaum, 43–72.
Chapelle, C. 1999: Validity in language assessment. Annual Review of Applied Linguistics 19, 1–19.
Educational Testing Service 1997: TOEFL test and score manual. Princeton, NJ: Educational Testing Service.
Educational Testing Service 1998: Draft TOEFL 2000 research agenda framework: areas of research. Research agenda three. TOEFL 2000 internal document. Princeton, NJ: Educational Testing Service.
Eignor, D., Taylor, C., Kirsch, I. and Jamieson, J. 1998: Development of a scale for assessing the level of computer familiarity of TOEFL examinees. TOEFL Research Report No. 60. Princeton, NJ: Educational Testing Service.
Enright, M. and Schedl, M. 1999: Reading for a reason: using reader purpose to guide test design. TOEFL 2000 Internal Report. Princeton, NJ: Educational Testing Service.
Enright, M., Grabe, W., Mosenthal, P., Mulcahy-Ernt, P. and Schedl, M. 1998: A TOEFL 2000 framework for testing reading comprehension: a working paper. Princeton, NJ: Educational Testing Service.
Foltz, P. 1996: Comprehension, coherence, and strategies in hypertext. In Rouet, J., Levonen, J., Dillon, A. and Spiro, R., editors, Hypertext and cognition. Mahwah, NJ: Lawrence Erlbaum, 109–36.
Freedle, R. and Kostin, I. 1999: Does the text matter in a multiple choice test of comprehension? The case for the construct validity of TOEFL's minitalks. Language Testing 16, 2–32.
Goldman, S. 1997: Learning from text: reflections on the past and suggestions for the future. Discourse Processes 23, 357–98.
Hayes, J. and Hatch, J. 1999: Issues in measuring reliability: correlation versus percentage of agreement. Written Communication 16, 354–67.
Huberty, C. 1994: Applied discriminant analysis. New York: John Wiley.
Jamieson, J., Campbell, J., Norfleet, L. and Berbisada, N. 1993: Reliability of a computerized scoring routine for an open-ended task. System 21, 305–22.
Jamieson, J., Norfleet, L. and Berbisada, N. 1993: Successes, failures, and dropouts in computer-assisted language lessons. Computer Assisted English Language Learning Journal 4, 12–20.
Klecka, W. 1980: Discriminant analysis. In Lewis-Beck, M., editor, Quantitative applications in the social sciences, Volume 19. Newbury Park, CA: Sage.
Lehto, M., Zhu, W. and Carpenter, B. 1995: The relative effectiveness of hypertext and text. International Journal of Human-Computer Interaction 7, 293–313.
McNamara, D. and Kintsch, W. 1996: Learning from texts: effects of prior knowledge and text coherence. Discourse Processes 22, 247–88.
Messick, S. 1989: Validity. In Linn, R.L., editor, Educational measurement, 3rd edition. New York: American Council on Education/Macmillan, 13–103.
Meyer, B. 1985a: Prose analyses: purposes, procedures, and problems. In Britton, B. and Black, J., editors, Understanding expository text. Hillsdale, NJ: Lawrence Erlbaum, 11–64.
Meyer, B. 1985b: Prose analysis: purposes, procedures, and problems, Part 2. In Britton, B. and Black, J., editors, Understanding expository text. Hillsdale, NJ: Lawrence Erlbaum, 269–97.
Monks, V. 1997, August/September: Two views, same waterway. National Wildlife 35, 36–37.
Pellegrino, J., Baxter, G. and Glaser, R. 1999: Addressing the 'two disciplines' problem: linking theories of cognition and learning with assessment and instructional practice. Review of Research in Education 24, 307–53.
Perfetti, C. 1997: Sentences, individual differences, and multiple texts: three issues in text comprehension. Discourse Processes 23, 337–55.
Perfetti, C., Britt, M.A. and Georgi, M. 1995: Text-based learning and reasoning: studies in history. Hillsdale, NJ: Lawrence Erlbaum.
Perfetti, C., Marron, M. and Foltz, P. 1996: Sources of comprehension failure: theoretical perspectives and case studies. In Cornoldi, C. and Oakhill, J., editors, Reading comprehension difficulties: processes and intervention. Mahwah, NJ: Lawrence Erlbaum, 137–65.
Reinking, D. 1988: Computer-mediated text and comprehension differences: the role of reading time, reader preference, and estimation of learning. Reading Research Quarterly 23, 484–500.
Reinking, D. and Schreiner, R. 1985: The effects of computer-mediated text on measures of reading comprehension and reading behavior. Reading Research Quarterly 20, 536–53.
Spivey, N. 1997: The constructivist metaphor: reading, writing, and the making of meaning. San Diego, CA: Academic Press.
Stevens, J. 1996: Applied multivariate statistics for the social sciences, 3rd edition. Mahwah, NJ: Lawrence Erlbaum.
Tabachnick, B. and Fidell, L. 1996: Using multivariate statistics, 3rd edition. New York: Harper Collins.
Taylor, C., Jamieson, J., Eignor, D. and Kirsch, I. 1998: The relationship between computer familiarity and performance on computer-based TOEFL test tasks. TOEFL Research Report No. 61. Princeton, NJ: Educational Testing Service.
Tennesen, M. 1997, November/December: On a clear day. National Parks 71, 26–9.
Trites, L. 2000: Beyond basic comprehension: reading to learn and reading to integrate for native and non-native speakers. Unpublished doctoral dissertation, Northern Arizona University, Flagstaff, AZ.
Van den Berg, S. and Watt, J. 1991: Effects of educational setting on student responses to structured hypertext. Journal of Computer-Based Instruction 18, 118–24.
Van Dijk, T.A. and Kintsch, W. 1983: Strategies of discourse comprehension. New York: Academic Press.
Wiley, J. and Voss, J. 1999: Constructing arguments from multiple sources: tasks that promote understanding and not just memory for text. Journal of Educational Psychology 91, 310–11.
Zimmerman, T. 1997, December 29: Filter it with billions and billions of oysters: how to revive the Chesapeake Bay. US News and World Report 123, 63.
Appendix 1a Chart completion task
Directions: Complete the following chart. Fill in as much detail as possible from the text read by categorizing the information into the different areas on the chart. Do not use single words for your responses; form your responses in phrases or complete sentences. Include examples from the text.
Make no judgments about the accuracy of causes or effects or the effectiveness of the solutions mentioned in the text. Solutions are seen as any action taken in response to the problem(s). Solutions can take many forms, such as proposed solutions, attempted solutions, or failed solutions. Also, space is provided under each category for examples. Examples are specific examples found in the text that are used by the author(s) to exemplify the problems, causes, effects, or solutions in the text. However, there may not be examples for every category.
Points will be awarded for correct responses only. There is no penalty for incorrect responses. Points will be awarded in the following manner:
Problems and Solutions: 10 points each
Causes and Effects: 5 points each
Examples: 1 point each

Problems      Causes        Effects       Solutions

Examples      Examples      Examples      Examples
Appendix 1b Reading to Learn scoring rubric (0–241 total possible)

Problems (10 points; 1 word, 6 points): 40 maximum
• Air pollution/Smog in the National Parks
• Opposition or ignoring (or lack of cooperation) to environmental standards (regulations)
• No true 'point sources' (myriad of smaller pollution sources rather than one large source)
• National Parks limited jurisdiction of pollution sources outside of the parks

Causes (5 points; 1 word, 3 points): 55 maximum
• Sulfur/nitrogen emissions
• Acid rain
• Ground-level ozone
• Urban/Industrial emissions
• Automobile emissions
• Power plants/factories/plants emissions
• Emissions from Smokestacks (chimneys) and/or plants
• Emissions from Kilns
• Unregulated pollution/smog (from other countries: Mexico)
• Smoke from Controlled burns
• Emissions from large cities

Effects (5 points; 1 word, 3 points): 30 maximum
• Trees and plants affected (injured), growth hindered
• Visibility decreased
• Metals loosened into waters (surface/ground water, water polluted)
• Nutrients removed (leached) from soil
• Public outcry over stance of big business/government
• Aquatic life injured (damaged)

Solutions (10 points; 1 word, 6 points): 90 maximum
• Stricter Environmental Laws (resolutions/acts)
• 1977 Clean Air Act Amendments labeling NPs as Class I areas
• Regional haze regulations proposed by the EPA
• Reduction of allowable pollution standards from industry
• Clean Air Act set visibility goals
• Objecting to construction permits
• Identification of pollution sources
• Industry modifications/Installation of devices (scrubbers) to reduce pollution
• Public pressure to protect parks

Appendix 1b (continued)

Examples for Problems (1 point): 5 maximum
• Great Smoky Mountain National Park
• Grand Canyon National Park
• Smoky Mountain sources from Ohio, New York, Atlanta, etc.
• Grand Canyon sources from CA, NV, UT, AZ, NM, and Mexico

Examples for Causes (1 point, or 5 points with overarching cause listed above)
• Automobile emissions
• Power plants/factories/plants emissions
• Emissions from Smokestacks (chimneys)
• Emissions from Kilns
• Unregulated pollution/smog (from other countries: Mexico)
• Smoke from Controlled burns
• Emissions from large cities

Examples for Causes (1 point): 10 maximum
• Navajo Generating Station, Page, AZ (Power Plant, AZ)
• Power plants in TN and OH river valley
• Southern California Edison Plant, Laughlin, NV
• Tennessee Luttrell Kilns, TN
• Industry in AZ
• Cars in CA
• Smokestacks in NM, NV, UT
• Los Angeles
• Atlanta
• New York

Examples for Effects (1 point): 6 maximum
• Leaves of plants turning purple and brown (stippling)
• Hindering photosynthesis of plants and trees
• Reduction of visibility at Grand Canyon
• Visibility reduced 90% of the days
• See vague blue masses
• See half as far as in 1919

Examples for Solutions (1 point, or 10 points with overarching solution listed above)
• 1977 Clean Air Act Amendments labeling NPs as Class I areas
• Regional haze regulations proposed by the EPA
• Reduction of allowable pollution standards from industry
• TN Luttrell corp. building permit situation

Examples for Solutions (1 point): 5 maximum
• Objecting to kilns in TN
• Researchers using scientific technology to ID sources (radioactive isotopes)
• Southern California Edison Plant in Laughlin, NV, identified as point source
• Scrubbers installed at Navajo Generating Plant
• 10% reduction of pollution in 10–15 years (goal of no man-made pollution)
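The point values and caps in the rubric can be tallied in a few lines. The sketch below is only an illustration of the arithmetic, not the scoring routine used in the study: the function names are hypothetical, the '1 word' figures are read here as reduced credit for single-word responses (an assumption), and the rule that an example may earn 5 or 10 points when it matches an overarching cause or solution is not modeled.

```python
CATEGORIES = ("problems", "causes", "effects", "solutions")

FULL_POINTS = {"problems": 10, "causes": 5, "effects": 5, "solutions": 10}
# Assumption: the rubric's "1 word" values are reduced credit for
# single-word responses.
ONE_WORD_POINTS = {"problems": 6, "causes": 3, "effects": 3, "solutions": 6}
CATEGORY_MAX = {"problems": 40, "causes": 55, "effects": 30, "solutions": 90}
EXAMPLE_MAX = {"problems": 5, "causes": 10, "effects": 6, "solutions": 5}


def score_category(category, n_full, n_one_word=0, n_examples=0):
    """Score one chart column, capping items and examples separately."""
    items = n_full * FULL_POINTS[category] + n_one_word * ONE_WORD_POINTS[category]
    items = min(items, CATEGORY_MAX[category])
    examples = min(n_examples, EXAMPLE_MAX[category])  # 1 point each
    return items + examples


def max_total():
    """Highest score reachable under the caps listed in the rubric."""
    return sum(CATEGORY_MAX.values()) + sum(EXAMPLE_MAX.values())


print(max_total())
print(score_category("causes", n_full=2, n_one_word=1, n_examples=3))
```

The category and example caps sum to the 241 total stated in the rubric heading, which supports this reading of the column maxima.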
Appendix 2a Reading to Integrate task: integration activity
Directions: For the next 15 minutes, reflect on what you read and compose a short essay that combines the information in the material read and makes connections across the range of ideas and arguments presented.
[Lined response space]
Appendix 2b Reading to Integrate scoring rubric
Integration50 Excellent Integrates texts accurately and successfully on
multiple levels and creates a true DocumentsModel Generalizes at least two macrostructureconcepts common across texts (this may besimply identifying the existence of amacrostructure followed by the supportingmacrostructures from each text) Effectivelyintegrates relevant support (ie details ormacrostructures) to support generalizations andmay still discuss each article separately to somedegree Integrates macrostructures present inboth texts
40 Good Integrates through the creation of a welldeveloped and accurate introduction andor aconclusion yet summarizes articles separatelyANDOR generalizes one major macrostructurethat is fully developed with support Uses somerelevant support
30 Fair Attempts to integrate through the creation of apartially developed possibly inaccurateintroductory ANDOR concluding statementusually related to the main topic of the articlesbut has no substantive development Creates aText and Situation Model of each text separatelyDoes not make generalizations for integrationbeyond the one statement May make evaluativeor editorial statements however these may beinaccurate
20 Poor Creates a well developed and accurate TextModel and Situation Model for each of thetexts yet attempts no integrative connectionacross the texts
10 Very Poor Ineffectively or inaccurately attempts tosummarize each article in a very reporting stylerevealing little or no use of backgroundknowledge contextualization or evaluationResponse is simply a recall of information fromthe texts No introduction or conclusion ORparticipant confuses texts and sees separatetexts as one issue (problem)
Latricia Trites and Mary McGroarty 209
0 No Response: Participant does not attempt response or only addresses one article.
Macrostructures
Participants will receive one point for each macrostructure accurately identified in each text, with a minimum score of 4 awarded for the inability to accurately identify any macrostructures.
25 Excellent (Total 8): Accurately identifies all 4 macrostructures present in both texts.
20 Good (Total 6–7): Accurately identifies all 4 macrostructures in one text and 3 in the other, OR accurately identifies 3 of the four macrostructures in both texts (or 4 in one and 2 in the other).
15 Fair (Total 4–5): Accurately identifies 3 of the four macrostructures in one text and 2 of the four in the other (or 4 in one and 1 in the other), OR accurately identifies 2 of the four macrostructures in both texts (or 4 in one and 0 in the other, or 3 in one and 1 in the other).
10 Poor (Total 2–3): Accurately identifies 2 of the macrostructures in one text and 1 in the other (or 3 in one and 0 in the other), OR accurately identifies 1 of the four macrostructures in both texts (or 2 in one and 0 in the other).
5 Very Poor (Total 0–1): Accurately identifies 1 of the four macrostructures in one text and 0 in the other, OR unable to accurately identify any macrostructures in the texts.
0 No Response: Participant does not attempt response.
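The Macrostructures bands above amount to a lookup on the total number of macrostructures identified across the two texts. A minimal sketch of that mapping follows; the function name and the `attempted` flag are illustrative assumptions, not part of the authors' scoring materials.

```python
def macrostructure_score(text1_found, text2_found, attempted=True):
    """Map macrostructures identified in each text (0-4) to the rubric band.

    Hypothetical helper: the band boundaries follow the rubric totals
    (8, 6-7, 4-5, 2-3, 0-1); everything else is an illustration.
    """
    if not attempted:
        return 0  # "No Response"
    for found in (text1_found, text2_found):
        if not 0 <= found <= 4:
            raise ValueError("each text has between 0 and 4 macrostructures")
    total = text1_found + text2_found
    if total == 8:
        return 25  # Excellent
    if total >= 6:
        return 20  # Good
    if total >= 4:
        return 15  # Fair
    if total >= 2:
        return 10  # Poor
    return 5       # Very Poor (total of 0 or 1)


if __name__ == "__main__":
    print(macrostructure_score(4, 3))  # falls in the "Good" band
```

Because only the total matters, every combination listed within a band (for example 4 and 2, or 3 and 3) receives the same score.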
Use of relevant details
5 Excellent: Effectively uses multiple relevant details as support; no irrelevant or erroneous details.
4 Good: Effectively uses relevant details as support; may include some inaccurate or irrelevant details. (More than 50% of details are relevant.)
3 Fair: Possibly uses one or two relevant details, yet several irrelevant or inaccurate details (erroneous or fabricated) appear in the synthesis. (50% or less of details are relevant.)
2 Poor: Erroneous or fabricated details used in attempt to support arguments presented; does not include relevant details.
1 Very Poor: No details used or listed.
0 No Response: Participant does not attempt response.
and discriminant analysis, usually associated with descriptive research (Tabachnick and Fidell, 1996). The present research was conducted with samples of naturally occurring student groups and was not experimental. Moreover, we were interested in finding ways to compare participant performance on the measures of the new constructs, Reading to Learn and Reading to Integrate, with performance on more traditional measures of basic comprehension. Thus, we opted to use discriminant analysis because of its parsimony of description and clarity of interpretation (Stevens, 1996). Discriminant analysis, a technique recommended to describe group differences or predict group membership based on a comparison of multiple predictors (Huberty, 1994), has been used in other areas of applied linguistic research to investigate creation of a student profile of success or failure on Computer Assisted Language Learning (CALL) lessons (Jamieson et al., 1993) and accurate classification of text types into registers (Biber, 1993), among other purposes.
To further distinguish basic comprehension from the new constructs, we conducted discriminant analysis on each of the two language groups (native and nonnative) to determine whether Reading to Learn and Reading to Integrate would classify participants in the same way that Basic Comprehension would. We divided the native speaker and nonnative speaker groups into three levels, high, middle (mid) and low reading ability scorers, based on the basic comprehension measure chosen for that group (Nelson–Denny for native speakers; TOEFL Reading Comprehension for nonnative speakers). Research methodologists (Tabachnick and Fidell, 1996: 513) note that robustness is expected when the smallest group has at least 20 cases; our smallest group was 25. To check the assumption of homogeneity of the variance/covariance matrices, we examined the outcomes of Box's M Test and found them all nonsignificant (Klecka, 1980). Each group was checked for outliers using Mahalanobis distance treated as Chi-Square, and no outliers were found (Tabachnick and Fidell, 1996). Thus, the data met all assumptions required for use of discriminant analysis.
a Discriminant analysis for nonnative speakers: To organize the discriminant analysis in order to see if the Reading to Learn/Reading to Integrate Composite classified participants similarly to the measure of Basic Comprehension (for nonnative speakers, TOEFL Reading Comprehension), we divided the entire nonnative speaker group (n = 146) into three levels of basic comprehension: high, mid
and low. These reading ability groups of high, mid and low were based on typical TOEFL Reading Comprehension score levels required for program entry. Participants were classified as high if their scores were greater than or equal to 550, or 56 and above on the scaled score on the TOEFL Reading Comprehension (550 is the cut score often used for graduate entry). Participants were classified as mid if scores ranged between 500 and 549, or 50 to 55 on the TOEFL Reading Comprehension. Participants were classified as low if their scores fell below 500, or 49 and below on TOEFL Reading Comprehension (500 is a minimum TOEFL score sometimes used for undergraduate admission, often with the proviso that students enroll in ESL classes either prior to official enrollment or concurrently). For our entire nonnative speaker group, descriptive statistics on basic comprehension reading ability group membership levels appear in Table 9.
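The three-way split described above can be written as a simple threshold rule on the scaled TOEFL Reading Comprehension score. The sketch below is only an illustration of that rule; the function name and the sample scores are hypothetical.

```python
from collections import Counter


def toefl_reading_group(scaled_score):
    """Classify a scaled TOEFL Reading Comprehension score.

    Cut points follow the text: 56 and above is high, 50 to 55 is mid,
    49 and below is low. The function itself is an illustrative assumption.
    """
    if scaled_score >= 56:
        return "high"
    if scaled_score >= 50:
        return "mid"
    return "low"


if __name__ == "__main__":
    sample_scores = [61, 55, 50, 49, 42]  # hypothetical scores, not study data
    print(Counter(toefl_reading_group(s) for s in sample_scores))
```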
We ran SPSS Discriminant Analysis with initial grouping variables of high, mid and low reading ability. We compared initial reading ability levels with high, mid and low categories on the Reading to Learn/Reading to Integrate Composite, a new variable reflecting level of performance on the Reading to Learn and Reading to Integrate measures combined.5 The discriminant analysis yielded one discriminant function with an eigenvalue of .92, responsible for 99.9% of the variance in outcomes. Wilks' Lambda for this function was .52, significant at .001; the associated Chi-Square value was extremely large (90.93) and highly significant (p < .001), indicating the group centroids on the composite Reading to Learn/Reading to Integrate function for the three nonnative speaker reading ability groups were significantly different. Both Reading to Learn and
Table 9 Descriptive statistics for nonnative speakers by TOEFL reading comprehension reading ability groups (n = 143)
Reading ability group    n      Mean on TOEFL reading comprehension    sd
High (≥ 56)              57     59.68                                  3.03
Mid (50–55)              42     52.81                                  1.67
Low (≤ 49)               44     42.11                                  5.47
Total                    143    52.26                                  8.22
Note: Total n = 143 due to loss of three cases in anomalous Reading to Integrate session.
5 We first calculated two separate discriminant analyses, one for Reading to Learn and one for Reading to Integrate, but we found that both loaded on a single function, so we used the composite in subsequent analyses.
Reading to Integrate loaded significantly on the discriminant function, at .86 for Reading to Learn and .78 for Reading to Integrate (p < .05).
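The reported statistics can be cross-checked against each other. As a rough sketch, Wilks' Lambda for a set of discriminant functions is the product of 1/(1 + eigenvalue), and a chi-square value is commonly obtained from it with Bartlett's approximation. The code below assumes that approximation was the one used, that the two predictors were Reading to Learn and Reading to Integrate, and that the unreported second eigenvalue is negligible; all function names are illustrative.

```python
import math


def wilks_lambda(eigenvalues):
    """Wilks' Lambda as the product of 1 / (1 + eigenvalue) over functions."""
    result = 1.0
    for eigenvalue in eigenvalues:
        result *= 1.0 / (1.0 + eigenvalue)
    return result


def bartlett_chi_square(lam, n_cases, n_predictors, n_groups):
    """Bartlett's chi-square approximation for a given Wilks' Lambda."""
    multiplier = n_cases - 1 - (n_predictors + n_groups) / 2.0
    return -multiplier * math.log(lam)


if __name__ == "__main__":
    # Nonnative speakers: eigenvalue .92 (second function assumed negligible).
    print(round(wilks_lambda([0.92]), 2))                      # about 0.52
    # n = 143 cases, 2 predictors (assumed), 3 reading ability groups.
    print(round(bartlett_chi_square(0.52, 143, 2, 3), 1))      # about 91.2
```

Under these assumptions the eigenvalue of .92 reproduces the reported Lambda of .52, and the chi-square lands close to the reported 90.93; the small gap is consistent with Lambda having been rounded to two decimals.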
Over half (64.9%) of the high reading ability group remained high on the new measure; less than half (42.9%) of the mid reading ability group remained classified as mid. Hence, the Reading to Learn/Reading to Integrate Composite was particularly influential in reclassifying the mid reading ability group and, to a lesser extent, the high group. However, most (81.8%) of the low reading ability group remained low on the Reading to Learn/Reading to Integrate Composite (see Table 10). Of the 143 nonnative speaker participants, 92 (64%) remained in the initial basic comprehension category on the composite; the rest moved, but in different directions. Twenty-one participants (14.7%) were classified into a higher category on the Reading to Learn/Reading to Integrate Composite than their initial basic comprehension level would have suggested, while 31 (21.7%) were reclassified into a lower category. Thus 51 participants (36.4%), just over one third of the sample, were classified differently based on their Reading to Learn/Reading to Integrate Composite performance.
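The reclassification figures in this paragraph can be recomputed from the cell counts reported in Table 10. The sketch below hard-codes those counts (initial group by predicted group) and tallies who stayed, moved up, or moved down; the variable and function names are illustrative. Read directly from the table cells, the diagonal sums to 91 and the off-diagonal cells to 52, one off from the 92 and 51 given in the running text, while the 36.4% figure matches 52 of 143.

```python
ORDER = ["high", "mid", "low"]  # from highest to lowest reading ability

# Counts from Table 10: initial group -> predicted group on the composite.
TABLE_10 = {
    "high": {"high": 37, "mid": 17, "low": 3},
    "mid": {"high": 13, "mid": 18, "low": 11},
    "low": {"high": 2, "mid": 6, "low": 36},
}


def reclassification_summary(counts):
    """Return (stayed, moved_higher, moved_lower) from a classification table."""
    stayed = higher = lower = 0
    for i, initial in enumerate(ORDER):
        for j, predicted in enumerate(ORDER):
            n = counts[initial][predicted]
            if i == j:
                stayed += n
            elif j < i:   # predicted group ranks above the initial group
                higher += n
            else:
                lower += n
    return stayed, higher, lower


def row_percentages(counts, initial):
    """Percentage of an initial group falling in each predicted group."""
    total = sum(counts[initial].values())
    return {k: round(100.0 * v / total, 1) for k, v in counts[initial].items()}


if __name__ == "__main__":
    print(reclassification_summary(TABLE_10))
    print(row_percentages(TABLE_10, "high"))
```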
b Discriminant analysis for native speakers: Because one of our goals in this project was to probe the possible validity of these new measures by assessing performance of two groups, native as well as
Table 10 Discriminant analysis comparison of nonnative speaker reading ability groups with Reading to Learn/Reading to Integrate Composite (n = 143)
Predicted group membership for Reading to Learn/Reading to Integrate Composite
Reading ability group    High    Mid     Low     Initial classification total
Count
High (≥ 56)              37      17      3       57
Mid (50–55)              13      18      11      42
Low (≤ 49)               2       6       36      44
Reclassification total   52      41      50      143
Percentage
High (≥ 56)              64.9    29.8    5.3     100.0
Mid (50–55)              31.0    42.9    26.2    100.0
Low (≤ 49)               4.5     13.6    81.8    100.0
Note: Total n = 143 due to loss of three cases in anomalous Reading to Integrate session.
nonnative speakers, we conducted a parallel discriminant analysis for native speakers. Thus, for the native speakers, we followed the same procedure, dividing the entire native speaker group (n = 105) into three levels of basic comprehension, high, mid and low, based on score distances of .5 standard deviations from the sample mean of the Nelson–Denny test for these participants. Native speakers need not take reading comprehension tests when entering the university, so the three-way split was based entirely on our sample data. Participants were classified as high if their scores on the Nelson–Denny were greater than or equal to 135. Participants were classified as mid if their scores ranged between 119 and 134 on the Nelson–Denny. Participants were classified as low if their scores fell at or below 118 on the Nelson–Denny. Descriptive statistics for the entire native speaker sample on basic comprehension group membership appear in Table 11.
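As a rough check on the half-standard-deviation split, the bounds can be computed from a sample mean and standard deviation. The sketch below takes the Table 11 totals as 126.04 and 16.52; since those totals describe the 101 cases retained for the discriminant analysis rather than the full sample of 105, the computed bounds only approximate the reported cut points. The function names are illustrative.

```python
def half_sd_bounds(mean, sd, k=0.5):
    """Lower and upper bounds at k standard deviations from the mean."""
    return mean - k * sd, mean + k * sd


def nelson_denny_group(score):
    """Classify a Nelson-Denny score using the cut points stated in the text."""
    if score >= 135:
        return "high"
    if score >= 119:
        return "mid"
    return "low"


if __name__ == "__main__":
    lower, upper = half_sd_bounds(126.04, 16.52)
    print(round(lower, 2), round(upper, 2))  # roughly 117.78 and 134.3
    print(nelson_denny_group(126))
```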
The discriminant analysis yielded one discriminant function with an eigenvalue of .31, responsible for 98.7% of the variance in outcomes. Wilks' Lambda for this function was .76, significant at .001; the associated Chi-Square value was large (26.39) and highly significant (p < .001), indicating the group centroids on the discriminant function for the three reading ability groups on the Reading to Learn/Reading to Integrate Composite were significantly different. Pooled within-groups correlations between discriminating variables showed that Reading to Learn correlated with the first discriminant function at a level of .81; Reading to Integrate correlated with the second discriminant function at .71. This contrasts with findings for the nonnative speakers, where scores on the combined new measures loaded significantly on only one discriminant function. For native speakers, then, there is evidence for two significant discriminant functions, although the first accounts for almost all of the variance. Although these two measures (Reading to Learn and Reading to Integrate) loaded on two separate discriminant functions, they still
Table 11 Descriptive statistics for native speakers by Nelson–Denny reading ability groups (n = 101)
Reading ability group    n      Mean on Nelson–Denny    sd
High (≥ 135)             39     141.26                  5.19
Mid (119–134)            37     125.65                  5.09
Low (≤ 118)              25     102.88                  10.98
Total                    101    126.04                  16.52
Note: Total n = 101 for discriminant analysis due to loss of four cases in anomalous Reading to Integrate session.
showed moderate correlations with the alternate function,6 justifying the composite calculations.
Results of discriminant analysis for native speakers, seen in Table 12, show a different pattern than that observed for nonnative speakers. Nearly three-fourths (71.8%) of the native speakers classified as high in the basic comprehension reading ability group remained high on the Reading to Learn/Reading to Integrate Composite. For the mid group, however, only 18.9% remained classified as mid. Just over half (52%) of the low reading ability group members remained low on the Reading to Learn/Reading to Integrate Composite. As with the nonnative speakers, participants in the mid category on basic comprehension showed the most frequent reclassification. Forty-eight of the 101 (47.5%) native speaker participants remained in the initial classification categories. Twenty-two (21.8%) were reclassified into a higher category and 31 (30.7%) were reclassified into a lower category. Thus 53 participants (52.5%), over half of the sample, were classified differently based on their performance on the Reading to Learn/Reading to Integrate Composite.
6 Reading to Learn correlated with function 2 at -.58; Reading to Integrate with function 1 at .71.
Table 12 Discriminant analysis comparison of native speaker reading ability groups with Reading to Learn/Reading to Integrate Composite (n = 101)
Predicted group membership for Reading to Learn/Reading to Integrate Composite
Reading ability group    High    Mid     Low     Initial classification total
Count
High (≥ 135)             28      3       8       39
Mid (119–134)            10      7       20      37
Low (≤ 118)              5       7       13      25
Reclassification total   43      17      41      101
Percentage
High (≥ 135)             71.8    7.7     20.5    100.0
Mid (119–134)            27.0    18.9    54.1    100.0
Low (≤ 118)              20.0    28.0    52.0    100.0
Note: Total n = 101 for discriminant analysis due to loss of four cases in anomalous Reading to Integrate session.
V Interpretation
Analyses done to answer Research Questions 1 and 2 showed that performance on Reading to Learn and Reading to Integrate measures was significantly influenced by language background and level of education (graduate vs. undergraduate, for nonnative speakers only). Moreover, level of computer familiarity had no significant effect on Reading to Learn and Reading to Integrate performance.
Correlations showed that, as expected, all reading measures correlated to some degree, generally answering Research Question 3. The most interesting results came from the discriminant analyses because they showed a pattern of differential classification on the new tasks, sensitive to initial reading ability. This pattern differed for nonnative speakers and native speakers. Examination of the reclassifications based on the new tasks revealed some of the problems of using a basic comprehension-only test to predict performance on more challenging literacy tasks. These results also imply the need for further work on new measures of advanced literacy skills, such as Reading to Learn and Reading to Integrate, to reflect trends in construct-driven assessment (Pellegrino et al., 1999).
For the nonnative speakers, most (81.8%) of the participants with TOEFL Reading Comprehension below 50 remained classified as low on the Reading to Learn/Reading to Integrate Composite, suggesting the existence of a lower threshold of academic English proficiency. Participants below this threshold were unlikely to perform well on tasks assessing Reading to Learn and Reading to Integrate. Even tasks involving only selection (such as BC1 and BC2), rather than production of responses, were difficult for this group. However, for the nonnative speaker readers in the mid and high reading ability groups, basic comprehension level was not nearly as consistent a predictor of performance on the Reading to Learn/Reading to Integrate Composite. This was especially striking for those in the mid reading ability group. Approximately one fourth of the mid group dropped into the low category on the Reading to Learn/Reading to Integrate Composite, indicating that they experienced more difficulty in completing the new measures, but approximately one fourth could indeed categorize and synthesize information relatively well, despite middling performance on basic comprehension. This finding suggests that, for nonnative speaker readers in the mid reading ability category, basic comprehension measures were insufficient to predict their performance on the new measures.
Almost two-thirds of the high basic comprehension nonnative speakers remained high on the Reading to Learn/Reading to Integrate Composite, but one third dropped, suggesting that they could not categorize and synthesize information from texts as well as they could recognize information. For this third, the recognition-only task, multiple-choice basic comprehension, overestimated ability to manipulate and synthesize information. For the high and mid reading ability nonnative speaker groups, basic comprehension-only tests may then overestimate academic English proficiency.
Results for the native speakers showed a more definitive pattern of reclassification. For the low reading ability group, approximately half were misclassified, suggesting that the basic comprehension measure underrepresented their ability to categorize and synthesize information in academic texts. Most of the mid reading ability group were reclassified as lower on the Reading to Learn/Reading to Integrate Composite, showing that basic comprehension results overestimated ability to succeed on more challenging tasks. On the other hand, almost one third of the mid reading ability group did better on the Reading to Learn/Reading to Integrate Composite; for them, basic comprehension underrepresented their ability to complete more challenging reading tasks. For the high ability readers, nearly one third dropped on the Reading to Learn/Reading to Integrate Composite. For them, the basic comprehension measure overestimated their level of academic English proficiency. The other two thirds of the high reading ability group remained high, suggesting a higher threshold of academic English proficiency.
Considering results from both the nonnative speakers and native speakers, we conclude that the new tasks did assess something different from basic comprehension once a lower level threshold of basic academic English proficiency had been achieved. Examination of scatterplots based on discriminant functions indicated an obvious separation between participants who could and could not perform on the Reading to Learn/Reading to Integrate Composite, thus providing some tentative evidence for concurrent validity (Messick, 1989; Chapelle, 1999). We had hoped to find clear evidence of a hierarchy, suggesting that Reading to Learn was demonstrably more difficult than basic comprehension and Reading to Integrate demonstrably more difficult than Reading to Learn, but results did not yield an obvious hierarchy. Results did, however, suggest an even simpler pattern: a dichotomy. For those nonnative speakers above the lower threshold of English language proficiency, which in our data would be approximately 500, or 49 on the scaled TOEFL Reading
Comprehension scores, some could perform well on the new measures and some could not, revealing two rather than three groups on the Reading to Learn/Reading to Integrate Composite. These results lead us to speculate that the new measures tap additional skills, such as 'sophisticated discourse processes and critical thinking skills' (Enright and Schedl, 1999: 24), in addition to language proficiency. For native speakers, the discriminant loading suggested a possible hierarchy of difficulty, but the scatterplots and classification tables revealed a dichotomy. Reclassification based on the Reading to Learn/Reading to Integrate Composite nearly eliminated the mid basic comprehension reading ability category, with only 16.8% of the native speakers qualifying as mid on the Reading to Learn/Reading to Integrate Composite.
VI Conclusions
While we acknowledge that there are some limitations to this project, such as the time required to administer and score new measures, the results are illuminating and useful, and suggest some considerations for future research. The first consideration relates to appropriate classification of students based on academic reading abilities. This corresponds to the use of TOEFL as a gatekeeper by many institutions. For TOEFL test-takers whose current TOEFL scores would be in an intermediate to high intermediate range, the new tasks could assess additional abilities relevant to academic performance. For such participants, additional test development efforts are warranted. The majority of high reading ability nonnative speakers (64.9%) could perform the new measures; a third could not. Our results suggest that some unknown number of mid reading ability nonnative speakers are more capable of succeeding at more challenging academic reading tasks than their current level of basic comprehension assessment would indicate. At the same time, there are some high and mid reading ability participants who could not perform the more demanding tasks; admitting such students directly into university programs could result in failure. These results suggest that, for most students at lower levels of basic comprehension (in our study, those with TOEFL scores below 490 to 500), development of new tasks is unnecessary. Nevertheless, our results indicate that a certain small percentage (in our sample, 8 students, 18.2%) of nonnative speakers classified as low reading ability on basic comprehension did in fact
perform adequately on more challenging tasks like Reading to Learn and Reading to Integrate. Given the very large number of TOEFL test-takers around the world, this finding merits further research. It would be potentially unfair to exclude such students from university study based on basic comprehension-only measures such as the current TOEFL. We would urge institutions that use TOEFL to consider scores on these new, more demanding tasks if doing so would address institutional needs. According to interview data collected in conjunction with this project (Trites, 2000: Chapter 6), participants perceived that successful completion of the new tasks required a thorough understanding of the texts, whereas the multiple-choice tests were less demanding, requiring only superficial grasp of content. This finding corroborates the observation of Freedle and Kostin (1999: 3), who note that often examinees do not need to comprehend the accompanying text to answer the test item. Therefore, for all TOEFL test-takers, incorporating more challenging tasks, such as the Reading to Learn and Reading to Integrate measures, into typical English language instruction could have positive washback effects.
The second area of consideration is the focus on additional relevant research. For predictive validity, it is important to determine the correlation between the Reading to Learn/Reading to Integrate Composite measures and actual academic performance in university classes. Correlations might differ depending on the degree of categorization and synthesis required by different major fields of study. Another area for future research is related to the novelty of the Reading to Learn task as an assessment technique. Current pedagogical trends in literacy instruction emphasize the use of graphic organizers as a means to understand and manipulate information in texts. To our knowledge, graphic organizers are typically used as classroom activities rather than assessment tools. They have good potential for use in assessment if students are familiar with them and if appropriate scoring systems can be developed. This project demonstrates that it is possible, though labor intensive, to develop reliable scoring systems. Further research is needed to explore refinement of this task type and related scoring systems for use in testing programs with large numbers of test-takers and scorers. Although the Reading to Integrate task (generating a synthesis) was more familiar, the scoring system was innovative because it reflected a reader's ability to recognize textual frames as well as integrate information. If test developers are interested in the abilities of nonnative speaker readers to perform such tasks, further development of similar tasks and scoring systems is warranted. While a substantial investment of time
would be required to refine the administration and scoring systems needed for more complex tasks such as these, it should be weighed against the possible danger of under-representing the likelihood of academic success based on results of the basic comprehension-only measures still most often used to assess academic proficiency.
Acknowledgements
This project was funded by a grant from Educational Testing Service as part of the TOEFL 2000 effort. We appreciate the cooperation of the ETS staff members who assisted in the development of passage-specific tasks and other aspects of the research; however, no official endorsement of Educational Testing Service should be inferred.
VII References
Bachman, L. 2000: Modern language testing at the turn of the century: assuring that what we count counts. Language Testing 17, 1–42.
Biber, D. 1993: Using register-diversified corpora for general language studies. Computational Linguistics 19, 219–41.
Britt, M., Rouet, J. and Perfetti, C. 1996: Using hypertext to study and reason about historical evidence. In Rouet, J., Levonen, J., Dillon, A. and Spiro, R., editors, Hypertext and cognition. Mahwah, NJ: Lawrence Erlbaum, 43–72.
Chapelle, C. 1999: Validity in language assessment. Annual Review of Applied Linguistics 19, 1–19.
Educational Testing Service 1997: TOEFL test and score manual. Princeton, NJ: Educational Testing Service.
Educational Testing Service 1998: Draft TOEFL 2000 research agenda framework: areas of research. Research agenda three. TOEFL 2000 internal document. Princeton, NJ: Educational Testing Service.
Eignor, D., Taylor, C., Kirsch, I. and Jamieson, J. 1998: Development of a scale for assessing the level of computer familiarity of TOEFL examinees. TOEFL Research Report No. 60. Princeton, NJ: Educational Testing Service.
Enright, M. and Schedl, M. 1999: Reading for a reason: using reader purpose to guide test design. TOEFL 2000 Internal Report. Princeton, NJ: Educational Testing Service.
Enright, M., Grabe, W., Mosenthal, P., Mulcahy-Ernt, P. and Schedl, M. 1998: A TOEFL 2000 framework for testing reading comprehension: a working paper. Princeton, NJ: Educational Testing Service.
Foltz, P. 1996: Comprehension, coherence and strategies in hypertext. In Rouet, J., Levonen, J., Dillon, A. and Spiro, R., editors, Hypertext and cognition. Mahwah, NJ: Lawrence Erlbaum, 109–36.
Freedle, R. and Kostin, I. 1999: Does the text matter in a multiple choice test of comprehension? The case for the construct validity of TOEFL's minitalks. Language Testing 16, 2–32.
Goldman, S. 1997: Learning from text: reflections on the past and suggestions for the future. Discourse Processes 23, 357–98.
Hayes, J. and Hatch, J. 1999: Issues in measuring reliability: correlation versus percentage of agreement. Written Communication 16, 354–67.
Huberty, C. 1994: Applied discriminant analysis. New York: John Wiley.
Jamieson, J., Campbell, J., Norfleet, L. and Berbisada, N. 1993: Reliability of a computerized scoring routine for an open-ended task. System 21, 305–22.
Jamieson, J., Norfleet, L. and Berbisada, N. 1993: Successes, failures and dropouts in computer-assisted language lessons. Computer Assisted English Language Learning Journal 4, 12–20.
Klecka, W. 1980: Discriminant analysis. In Lewis-Beck, M., editor, Quantitative applications in the social sciences, Volume 19. Newbury Park, CA: Sage.
Lehto, M., Zhu, W. and Carpenter, B. 1995: The relative effectiveness of hypertext and text. International Journal of Human-Computer Interaction 7, 293–313.
McNamara, D. and Kintsch, W. 1996: Learning from texts: effects of prior knowledge and text coherence. Discourse Processes 22, 247–88.
Messick, S. 1989: Validity. In Linn, R.L., editor, Educational measurement, 3rd edition. New York: American Council on Education/Macmillan, 13–103.
Meyer, B. 1985a: Prose analyses: purposes, procedures and problems. In Britton, B. and Black, J., editors, Understanding expository text. Hillsdale, NJ: Lawrence Erlbaum, 11–64.
Meyer, B. 1985b: Prose analysis: purposes, procedures and problems: Part 2. In Britton, B. and Black, J., editors, Understanding expository text. Hillsdale, NJ: Lawrence Erlbaum, 269–97.
Monks, V. 1997, August/September: Two views, same waterway. National Wildlife 35, 36–37.
Pellegrino, J., Baxter, G. and Glaser, R. 1999: Addressing the 'two disciplines' problem: linking theories of cognition and learning with assessment and instructional practice. Review of Research in Education 24, 307–53.
Perfetti, C. 1997: Sentences, individual differences and multiple texts: three issues in text comprehension. Discourse Processes 23, 337–55.
Perfetti, C., Britt, M.A. and Georgi, M. 1995: Text-based learning and reasoning: studies in history. Hillsdale, NJ: Lawrence Erlbaum.
Perfetti, C., Marron, M. and Foltz, P. 1996: Sources of comprehension failure: theoretical perspectives and case studies. In Cornoldi, C. and Oakhill, J., editors, Reading comprehension difficulties: processes and intervention. Mahwah, NJ: Lawrence Erlbaum, 137–65.
Reinking, D. 1988: Computer-mediated text and comprehension differences: the role of reading time, reader preference and estimation of learning. Reading Research Quarterly 23, 484–500.
Reinking, D. and Schreiner, R. 1985: The effects of computer-mediated text on measures of reading comprehension and reading behavior. Reading Research Quarterly 20, 536–53.
Spivey, N. 1997: The constructivist metaphor: reading, writing and the making of meaning. San Diego, CA: Academic Press.
Stevens, J. 1996: Applied multivariate statistics for the social sciences, 3rd edition. Mahwah, NJ: Lawrence Erlbaum.
Tabachnick, B. and Fidell, L. 1996: Using multivariate statistics, 3rd edition. New York: Harper Collins.
Taylor, C., Jamieson, J., Eignor, D. and Kirsch, I. 1998: The relationship between computer familiarity and performance on computer-based TOEFL test tasks. TOEFL Research Report No. 61. Princeton, NJ: Educational Testing Service.
Tennesen, M. 1997, November/December: On a clear day. National Parks 71, 26–9.
Trites, L. 2000: Beyond basic comprehension: reading to learn and reading to integrate for native and non-native speakers. Unpublished doctoral dissertation, Northern Arizona University, Flagstaff, AZ.
Van den Berg, S. and Watt, J. 1991: Effects of educational setting on student responses to structured hypertext. Journal of Computer-Based Instruction 18, 118–24.
Van Dijk, T.A. and Kintsch, W. 1983: Strategies of discourse comprehension. New York: Academic Press.
Wiley, J. and Voss, J. 1999: Constructing arguments from multiple sources: tasks that promote understanding and not just memory for text. Journal of Educational Psychology 91, 310–11.
Zimmerman, T. 1997, December 29: Filter it with billions and billions of oysters: how to revive the Chesapeake Bay. US News and World Report 123, 63.
Appendix 1a Chart completion task
Directions: Complete the following chart. Fill in as much detail as possible from the text read by categorizing the information into the different areas on the chart. Do not use single words for your responses; form your responses in phrases or complete sentences. Include examples from the text.
Make no judgments about the accuracy of causes or effects or the effectiveness of the solutions mentioned in the text. Solutions are seen
as any action taken in response to the problem(s). Solutions can take many forms, such as proposed solutions, attempted solutions or failed solutions. Also, space is provided under each category for examples. Examples are specific examples found in the text that are used by the author(s) to exemplify the problems, causes, effects or solutions in the text. However, there may not be examples for every category.
Points will be awarded for correct responses only. There is no penalty for incorrect responses. Points will be awarded in the following manner:
Problems and Solutions: 10 points each
Causes and Effects: 5 points each
Examples: 1 point each
Problems Causes Effects Solutions
Examples Examples Examples Examples
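The point scheme stated in the directions is a weighted count of correct responses per category. A minimal sketch of that arithmetic follows; the dictionary keys, the function name and the sample counts are illustrative assumptions, and the sketch ignores any partial credit or category maximums that a full scoring rubric might apply.

```python
# Point values as stated in the task directions.
POINTS_PER_CORRECT = {
    "problems": 10,
    "solutions": 10,
    "causes": 5,
    "effects": 5,
    "examples": 1,
}


def chart_score(correct_counts):
    """Sum points for correct responses; incorrect responses carry no penalty."""
    return sum(POINTS_PER_CORRECT[category] * count
               for category, count in correct_counts.items())


if __name__ == "__main__":
    # Hypothetical participant: counts of correct entries per category.
    participant = {"problems": 2, "causes": 3, "effects": 1,
                   "solutions": 1, "examples": 4}
    print(chart_score(participant))  # 20 + 15 + 5 + 10 + 4 = 54
```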
Appendix 1b Reading to Learn scoring rubric (0–241 total possible)
Problems (10 points), 1 word (6 points), 40 maximum
• Air pollution/Smog in the National Parks
• Opposition or Ignoring (or lack of cooperation) to environmental standards
• No true 'point sources' (myriad of smaller pollution sources rather than one large source)
• National Parks limited jurisdiction of pollution sources outside of the parks
Causes (5 points), 1 word (3 points), 55 maximum
• Sulfur/nitrogen emissions
• Acid rain
• Ground-level ozone
• Urban/Industrial emissions
• Automobile emissions
• Power plants/factories/plants emissions
• Emissions from Smokestacks (chimneys)
• Emissions from Kilns
• Unregulated pollution/smog (from other countries, Mexico)
• Smoke from Controlled burns
• Emissions from large cities
Effects (5 points), 1 word (3 points), 30 maximum
• Trees and plants affected (injured), growth hindered
• Visibility decreased
• Metals loosened into waters (surface/ground water polluted)
• Nutrients removed (leached) from soil and/or plants
• Public outcry over stance of big business/government
• Aquatic life injured (damaged)
Solutions (10 points), 1 word (6 points), 90 maximum
• Stricter Environmental Laws (resolutions, acts)
• 1977 Clean Air Act Amendments labeling NPs as Class I areas (regulations)
• Regional haze regulations proposed by the EPA
• Reduction of allowable pollution standards from industry
• Clean Air Act set visibility goals
• Objecting to construction permits
• Identification of pollution sources
• Industry modifications/Installation of devices (scrubbers) to reduce pollution
• Public pressure to protect parks
206 New tasks for reading comprehension testsA
pp
en
dix
1b
(co
nti
nu
ed)
Exam
ple
s (
1 p
oin
t) 5
maxim
um
Exam
ple
s (
1 p
oin
t o
r 5 p
oin
ts
Exam
ple
s (
1 p
oin
t) 6
maxim
um
Exam
ple
s (
1 p
oin
t o
r 10 p
oin
ts
wit
h o
vera
rch
ing
cau
se
wit
h o
vera
rch
ing
so
luti
on
liste
d a
bo
ve)
liste
d a
bo
ve)
bullG
reat
Sm
oky M
ou
nta
inbull
Au
tom
ob
ile
emis
sio
ns
bullLe
aves
of
pla
nts
tu
rnin
g
bull19
77 C
lean
Air
Act
N
atio
nal
Par
kbull
Po
wer
pla
nts
fac
tori
es
pu
rple
an
d b
row
n (
stip
plin
g)
Am
en
dm
en
ts
lab
elin
gN
Ps
bullG
ran
d C
an
yo
np
lan
ts e
mis
sio
ns
bullH
ind
erin
g p
ho
tosyn
thesis
of
as C
lass I
are
as
Nat
ion
al P
ark
bullE
mis
sio
ns
fro
m
pla
nts
an
d t
rees
bullS
mo
ky M
ou
nta
in S
ou
rces
S
mo
kesta
cks
(ch
imn
eys)
bullR
ed
ucti
on
of
vis
ibilit
yat
bullR
egio
nal
haz
ere
gu
lati
on
s
fro
m O
hio
New
Yo
rk
bullE
mis
sio
ns
fro
m K
iln
sG
ran
d C
an
yo
np
rop
ose
d b
y th
e E
PA
Atl
anta
etc
bull
Un
reg
ula
ted
po
lluti
on
sm
og
bull
Vis
ibili
ty r
educ
ed 9
0 o
f bull
Red
ucti
on
of
allo
wab
lebull
Gra
nd
Can
yon
So
urc
es(f
rom
oth
er c
ou
ntr
ies
Mex
ico
)th
e da
ysp
ollu
tio
n s
tan
dard
s f
rom
fro
m C
A N
V U
T A
Z N
M
bullS
mok
e fr
om C
on
tro
lled
bu
rns
bullS
ee v
ague
blu
e m
asse
sin
du
stry
and
Mex
ico
bullE
mis
sio
ns
fro
m la
rge
citi
esbull
See
hal
f as
far
as
in 1
919
bullT
N L
utt
rell
corp
bu
ildin
g
per
mit
sit
uat
ion
Exam
ple
s (
1 p
oin
t)
10 m
axim
um
Exam
ple
s (
1 p
oin
t)
5 m
axim
um
bullN
avaj
o G
ener
atin
g S
tati
on
bull
Ob
ject
ing
to
kiln
s in
TN
Pag
e A
Z (
Po
wer
Pla
nt
AZ
)bull
Res
earc
her
s u
sin
g s
cien
tifi
cbull
Po
wer
pla
nts
in T
N a
nd
OH
tech
no
log
y to
ID s
ou
rces
rive
r va
lley
(rad
ioac
tive
iso
top
es)
bullS
ou
ther
n C
alif
orn
ia E
dis
on
bull
So
uth
ern
Cal
ifo
rnia
Ed
iso
nP
lan
t L
aug
hlin
NV
Pla
nt
in L
aug
hlin
NV
bull
Ten
nes
see
Lutt
rell
Kiln
s T
Nid
enti
fied
as
po
int
sou
rce
bullIn
du
stry
in A
Zbull
Scru
bber
sin
stal
led
at
bullC
ars
in C
AN
avaj
o G
ener
atin
g P
lan
tbull
Sm
oke
stac
ks in
NM
NV
UT
bull10
r
edu
ctio
n o
f p
ollu
tio
nbull
Los
An
gel
esin
10ndash
15 y
ears
(g
oal
of
no
bullA
tlan
tam
an-m
ade
po
lluti
on
)bull
New
Yo
rk
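As a check on the rubric's stated range of 0–241, the category maxima can be summed in a few lines. This is a sketch under assumptions: the full-credit, one-word and maximum values are read from the rubric's column headers, the assignment of the 10-point and 5-point example maxima to Causes and Solutions is our reading of the layout, the cap-per-category behaviour is assumed, and the alternative credit for examples that repeat an overarching cause or solution is not modelled. All names are hypothetical.

```python
# (full credit, one-word credit, category maximum), as read from the rubric headers.
RUBRIC = {
    "problems": (10, 6, 40),
    "causes": (5, 3, 55),
    "effects": (5, 3, 30),
    "solutions": (10, 6, 90),
}
# Maximum points for 1-point examples under each category (assumed column assignment).
EXAMPLE_MAXIMA = {"problems": 5, "causes": 10, "effects": 6, "solutions": 5}


def category_score(category, full_responses, one_word_responses=0):
    """Score one category, capping at the category maximum (assumed behaviour)."""
    full_pts, word_pts, cap = RUBRIC[category]
    return min(full_responses * full_pts + one_word_responses * word_pts, cap)


def max_total():
    """Highest possible score across the four categories and their examples."""
    return sum(cap for _, _, cap in RUBRIC.values()) + sum(EXAMPLE_MAXIMA.values())


print(max_total())  # 215 category points + 26 example points = 241
```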
Appendix 2a Reading to Integrate task: integration activity

Directions: For the next 15 minutes, reflect on what you read and compose a short essay that combines the information in the material read and makes connections across the range of ideas and arguments presented.
Appendix 2b Reading to Integrate scoring rubric
Integration

50 Excellent: Integrates texts accurately and successfully on multiple levels and creates a true Documents Model. Generalizes at least two macrostructure concepts common across texts (this may be simply identifying the existence of a macrostructure followed by the supporting macrostructures from each text). Effectively integrates relevant support (i.e., details or macrostructures) to support generalizations and may still discuss each article separately to some degree. Integrates macrostructures present in both texts.

40 Good: Integrates through the creation of a well developed and accurate introduction and/or a conclusion, yet summarizes articles separately AND/OR generalizes one major macrostructure that is fully developed with support. Uses some relevant support.

30 Fair: Attempts to integrate through the creation of a partially developed, possibly inaccurate, introductory AND/OR concluding statement, usually related to the main topic of the articles, but has no substantive development. Creates a Text and Situation Model of each text separately. Does not make generalizations for integration beyond the one statement. May make evaluative or editorial statements; however, these may be inaccurate.

20 Poor: Creates a well developed and accurate Text Model and Situation Model for each of the texts, yet attempts no integrative connection across the texts.

10 Very Poor: Ineffectively or inaccurately attempts to summarize each article in a very reporting style, revealing little or no use of background knowledge, contextualization or evaluation. Response is simply a recall of information from the texts. No introduction or conclusion, OR participant confuses texts and sees separate texts as one issue (problem).

0 No Response: Participant does not attempt response or only addresses one article.
Macrostructures

Participants will receive one point for each macrostructure accurately identified in each text, with a minimum score of 4 awarded for the inability to accurately identify any macrostructures.

25 Excellent (Total 8): Accurately identifies all 4 macrostructures present in both texts.

20 Good (Total 6–7): Accurately identifies all 4 macrostructures in one text and 3 in the other, OR accurately identifies 3 of the four macrostructures in both texts. (Or 4 in one and 2 in the other.)

15 Fair (Total 4–5): Accurately identifies 3 of the four macrostructures in one text and 2 of the four in the other (or 4 in one and 1 in the other), OR accurately identifies 2 of the four macrostructures in both texts. (Or 4 in one and 0 in the other, or 3 in one and 1 in the other.)

10 Poor (Total 2–3): Accurately identifies 2 of the macrostructures in one text and 1 in the other (or 3 in one and 0 in the other). Accurately identifies 1 of the four macrostructures in both texts. (Or 2 in one and 0 in the other.)

5 Very Poor (Total 0–1): Accurately identifies 1 of the four macrostructures in one text and 0 in the other, OR unable to accurately identify any macrostructures in the texts.

0 No Response: Participant does not attempt response.
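The macrostructure bands reduce to a lookup on the combined count across the two texts. The sketch below follows the banded totals (8, 6–7, 4–5, 2–3, 0–1) for responses that were attempted; the function name and argument names are hypothetical, and a non-attempt would be scored 0 outside this function.

```python
def macrostructure_score(found_in_text_a, found_in_text_b):
    """Map macrostructures identified in each text (0-4 each) to the rubric band score."""
    for count in (found_in_text_a, found_in_text_b):
        if not 0 <= count <= 4:
            raise ValueError("each text has four macrostructures")
    total = found_in_text_a + found_in_text_b
    if total == 8:
        return 25  # Excellent
    if total >= 6:
        return 20  # Good
    if total >= 4:
        return 15  # Fair
    if total >= 2:
        return 10  # Poor
    return 5       # Very Poor (attempted, but 0 or 1 identified)


print(macrostructure_score(4, 2))  # total 6 falls in the Good band: 20
```

Because the bands depend only on the total, the listed combinations (for example 4 and 2, or 3 and 3) land in the same band without being enumerated separately.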
Use of relevant details

5 Excellent: Effectively uses multiple relevant details as support; no irrelevant or erroneous details.

4 Good: Effectively uses relevant details as support; may include some inaccurate or irrelevant details. (More than 50% of details are relevant.)

3 Fair: Possibly uses one or two relevant details, yet several irrelevant details or inaccurate details (erroneous or fabricated) appear in the synthesis. (50% or less of details are relevant.)

2 Poor: Erroneous or fabricated details used in attempt to support arguments presented; does not include relevant details.

1 Very Poor: No details used or listed.

0 No Response: Participant does not attempt response.
and low. These reading ability groups of high, mid and low were based on typical TOEFL Reading Comprehension score levels required for program entry. Participants were classified as high if their scores were greater than or equal to 550, or 56 and above on the scaled score on the TOEFL Reading Comprehension (550 is the cut score often used for graduate entry). Participants were classified as mid if scores ranged between 500 and 549, or 50 to 55 on the TOEFL Reading Comprehension. Participants were classified as low if their scores fell below 500, or 49 and below on TOEFL Reading Comprehension (500 is a minimum TOEFL score sometimes used for undergraduate admission, often with the proviso that students enroll in ESL classes either prior to official enrollment or concurrently). For our entire nonnative speaker group, descriptive statistics on basic comprehension reading ability group membership levels appear in Table 9.
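The three-way grouping on the scaled TOEFL Reading Comprehension score amounts to two cut points. A minimal sketch, using the scaled-score bounds given above (56 and above, 50 to 55, 49 and below); the function name and the sample scores are hypothetical.

```python
from collections import Counter


def reading_ability_group(scaled_reading_score):
    """Classify a scaled TOEFL Reading Comprehension score as high, mid or low."""
    if scaled_reading_score >= 56:
        return "high"
    if scaled_reading_score >= 50:
        return "mid"
    return "low"


# Hypothetical scaled scores, for illustration only.
scores = [61, 56, 55, 50, 49, 38]
print(Counter(reading_ability_group(s) for s in scores))
```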
Table 9 Descriptive statistics for nonnative speakers by TOEFL reading comprehension reading ability groups (n = 143)

Reading ability group    n      Mean on TOEFL reading comprehension    sd
High (≥56)               57     59.68                                  3.03
Mid (50–55)              42     52.81                                  1.67
Low (≤49)                44     42.11                                  5.47
Total                    143    52.26                                  8.22

Note: Total n = 143 due to loss of three cases in anomalous Reading to Integrate session.

We ran SPSS Discriminant Analysis with initial grouping variables of high, mid and low reading ability. We compared initial reading ability levels with high, mid and low categories on the Reading to Learn/Reading to Integrate Composite, a new variable reflecting level of performance on the Reading to Learn and Reading to Integrate measures combined.⁵ The discriminant analysis yielded one discriminant function with an eigenvalue of .92, responsible for 99.9% of the variance in outcomes. Wilks' Lambda for this function was .52, significant at .001; the associated Chi-Square value was extremely large (90.93) and highly significant (p < .001), indicating the group centroids on the composite Reading to Learn/Reading to Integrate function for the three nonnative speaker reading ability groups were significantly different. Both Reading to Learn and Reading to Integrate loaded significantly on the discriminant function, at .86 for Reading to Learn and .78 for Reading to Integrate (p < .05).

⁵ We first calculated two separate discriminant analyses, one for Reading to Learn and one for Reading to Integrate, but we found that both loaded on a single function, so we used the composite in subsequent analyses.
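The reported eigenvalue, Wilks' Lambda and Chi-Square are linked by standard textbook relations, which the sketch below reproduces approximately. It assumes that Lambda is the product of 1/(1 + eigenvalue) over the functions tested and that the Chi-Square is Bartlett's approximation with two predictors and three groups; the text does not state how SPSS computed its values, only the dominant eigenvalue is used, and the inputs are rounded, so small differences from the published figures are expected.

```python
import math


def wilks_lambda(eigenvalues):
    """Wilks' Lambda as the product of 1 / (1 + eigenvalue) over discriminant functions."""
    result = 1.0
    for eigenvalue in eigenvalues:
        result *= 1.0 / (1.0 + eigenvalue)
    return result


def bartlett_chi_square(lam, n_cases, n_predictors, n_groups):
    """Bartlett's chi-square approximation for testing Wilks' Lambda."""
    return -(n_cases - 1 - (n_predictors + n_groups) / 2) * math.log(lam)


lam = wilks_lambda([0.92])                    # dominant function only
chi2 = bartlett_chi_square(lam, 143, 2, 3)    # 143 cases, 2 measures, 3 groups
print(round(lam, 2), round(chi2, 1))          # close to the reported .52 and 90.93
```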
Over half (64.9%) of the high reading ability group remained high on the new measure; less than half (42.9%) of the mid reading ability group remained classified as mid. Hence, the Reading to Learn/Reading to Integrate Composite was particularly influential in reclassifying the mid reading ability group and, to a lesser extent, the high group. However, most (81.8%) of the low reading ability group remained low on the Reading to Learn/Reading to Integrate Composite (see Table 10). Of the 143 nonnative speaker participants, 91 (64%) remained in the initial basic comprehension category on the composite; the rest moved, but in different directions. Twenty-one participants (14.7%) were classified into a higher category on the Reading to Learn/Reading to Integrate Composite than their initial basic comprehension level would have suggested, while 31 (21.7%) were reclassified into a lower category. Thus 52 participants (36.4%), just over one third of the sample, were classified differently based on their Reading to Learn/Reading to Integrate Composite performance.
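The counts of participants who stayed, moved up or moved down can be tallied directly from the cell counts in Table 10. The sketch below assumes rows are the initial groups and columns the predicted groups, both ordered high, mid, low; the function and variable names are hypothetical.

```python
# Table 10 counts: rows = initial group, columns = predicted group (high, mid, low).
TABLE_10 = [
    [37, 17, 3],   # initially high
    [13, 18, 11],  # initially mid
    [2, 6, 36],    # initially low
]


def reclassification_summary(matrix):
    """Count cases that stayed in, rose above, or fell below their initial group."""
    same = higher = lower = 0
    for initial, row in enumerate(matrix):
        for predicted, count in enumerate(row):
            if predicted == initial:
                same += count
            elif predicted < initial:   # an earlier column is a higher category
                higher += count
            else:
                lower += count
    return {"same": same, "higher": higher, "lower": lower,
            "total": same + higher + lower}


summary = reclassification_summary(TABLE_10)
for key in ("same", "higher", "lower"):
    share = 100 * summary[key] / summary["total"]
    print(key, summary[key], round(share, 1))
```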
Table 10 Discriminant analysis comparison of nonnative speaker reading ability groups with Reading to Learn/Reading to Integrate Composite (n = 143)

Predicted group membership for Reading to Learn/Reading to Integrate Composite

Reading ability group       High    Mid     Low     Initial classification total
Count
High (≥56)                  37      17      3       57
Mid (50–55)                 13      18      11      42
Low (≤49)                   2       6       36      44
Reclassification total      52      41      50      143
Percentage
High (≥56)                  64.9    29.8    5.3     100.0
Mid (50–55)                 31.0    42.9    26.2    100.0
Low (≤49)                   4.5     13.6    81.8    100.0

Note: Total n = 143 due to loss of three cases in anomalous Reading to Integrate session.

b Discriminant analysis for native speakers: Because one of our goals in this project was to probe the possible validity of these new measures by assessing performance of two groups, native as well as nonnative speakers, we conducted a parallel discriminant analysis for native speakers. Thus, for the native speakers, we followed the same procedure, dividing the entire native speaker group (n = 105) into three levels of basic comprehension (high, mid and low) based on score distances of .5 standard deviations from the sample mean of the Nelson–Denny test for these participants. Native speakers need not take reading comprehension tests when entering the university, so the three-way split was based entirely on our sample data. Participants were classified as high if their scores on the Nelson–Denny were greater than or equal to 135. Participants were classified as mid if their scores ranged from 119 to 134 on the Nelson–Denny. Participants were classified as low if their scores fell at or below 118 on the Nelson–Denny. Descriptive statistics for the entire native speaker sample on basic comprehension group membership appear in Table 11.
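The half-standard-deviation split can be illustrated with a short calculation. The sketch below uses the total mean (126.04) and standard deviation (16.52) reported in Table 11 for the 101 cases retained in the discriminant analysis; the actual split was based on the full sample of 105, whose mean and standard deviation are not reported here, so this is only an approximate reconstruction. The function names are hypothetical.

```python
def half_sd_cutpoints(mean, sd, width=0.5):
    """Return the lower and upper boundaries at mean -/+ width standard deviations."""
    return mean - width * sd, mean + width * sd


def nelson_denny_group(score, low_cut, high_cut):
    """Classify a score relative to the two boundaries."""
    if score > high_cut:
        return "high"
    if score < low_cut:
        return "low"
    return "mid"


low_cut, high_cut = half_sd_cutpoints(126.04, 16.52)
print(round(low_cut, 2), round(high_cut, 2))  # boundaries near 118 and 134
print(nelson_denny_group(135, low_cut, high_cut))
```

With whole-number scores, boundaries near 117.8 and 134.3 correspond closely to the reported bands of 135 and above, 119 to 134, and 118 and below.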
Table 11 Descriptive statistics for native speakers by Nelson–Denny reading ability groups (n = 101)

Reading ability group    n      Mean on Nelson–Denny    sd
High (≥135)              39     141.26                  5.19
Mid (119–134)            37     125.65                  5.09
Low (≤118)               25     102.88                  10.98
Total                    101    126.04                  16.52

Note: Total n = 101 for discriminant analysis due to loss of four cases in anomalous Reading to Integrate session.

The discriminant analysis yielded one discriminant function with an eigenvalue of .31, responsible for 98.7% of the variance in outcomes. Wilks' Lambda for this function was .76, significant at .001; the associated Chi-Square value was large (26.39) and highly significant (p < .001), indicating the group centroids on the discriminant function for the three reading ability groups on the Reading to Learn/Reading to Integrate Composite were significantly different. Pooled within-groups correlations between discriminating variables showed that Reading to Learn correlated with the first discriminant function at a level of .81; Reading to Integrate correlated with the second discriminant function at .71. This contrasts with findings for the nonnative speakers, where scores on the combined new measures loaded significantly on only one discriminant function. For native speakers, then, there is evidence for two significant discriminant functions, although the first accounts for almost all of the variance. Although these two measures (Reading to Learn and Reading to Integrate) loaded on two separate discriminant functions, they still showed moderate correlations with the alternate function,⁶ justifying the composite calculations.
Results of discriminant analysis for native speakers, seen in Table 12, show a different pattern than that observed for nonnative speakers. Nearly three-fourths (71.8%) of the native speakers classified as high in the basic comprehension reading ability group remained high on the Reading to Learn/Reading to Integrate Composite. For the mid group, however, only 18.9% remained classified as mid. Just over half (52%) of the low reading ability group members remained low on the Reading to Learn/Reading to Integrate Composite. As with the nonnative speakers, participants in the mid category on basic comprehension showed the most frequent reclassification. Forty-eight of the 101 (47.5%) native speaker participants remained in the initial classification categories. Twenty-two (21.8%) were reclassified into a higher category and 31 (30.7%) were reclassified into a lower category. Thus 53 participants (52.5%), over half of the sample, were classified differently based on their performance on the Reading to Learn/Reading to Integrate Composite.

⁶ Reading to Learn correlated with function 2 at -.58; Reading to Integrate with function 1 at .71.
Table 12 Discriminant analysis comparison of native speaker reading ability groups with Reading to Learn/Reading to Integrate Composite (n = 101)

Predicted group membership for Reading to Learn/Reading to Integrate Composite

Reading ability group       High    Mid     Low     Initial classification total
Count
High (≥135)                 28      3       8       39
Mid (119–134)               10      7       20      37
Low (≤118)                  5       7       13      25
Reclassification total      43      17      41      101
Percentage
High (≥135)                 71.8    7.7     20.5    100.0
Mid (119–134)               27.0    18.9    54.1    100.0
Low (≤118)                  20.0    28.0    52.0    100.0

Note: Total n = 101 for discriminant analysis due to loss of four cases in anomalous Reading to Integrate session.
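The percentage panel of Table 12 follows from the counts by dividing each cell by its row total, and the share of all native speakers predicted as mid follows from the column total. A brief sketch, with hypothetical names, assuming the counts as printed in the table:

```python
# Table 12 counts by initial group: predicted high, mid, low.
TABLE_12 = {"high": [28, 3, 8], "mid": [10, 7, 20], "low": [5, 7, 13]}


def row_percentages(counts):
    """Express each cell as a percentage of its row total, to one decimal place."""
    total = sum(counts)
    return [round(100 * count / total, 1) for count in counts]


percentages = {group: row_percentages(row) for group, row in TABLE_12.items()}
for group, row in percentages.items():
    print(group, row)

# Share of the whole sample predicted as mid (second column).
predicted_mid = sum(row[1] for row in TABLE_12.values())
sample_size = sum(sum(row) for row in TABLE_12.values())
print(round(100 * predicted_mid / sample_size, 1))
```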
V Interpretation
Analyses done to answer Research Questions 1 and 2 showed that performance on Reading to Learn and Reading to Integrate measures was significantly influenced by language background and level of education (graduate vs. undergraduate, for nonnative speakers only). Moreover, level of computer familiarity had no significant effect on Reading to Learn and Reading to Integrate performance.

Correlations showed that, as expected, all reading measures correlated to some degree, generally answering Research Question 3. The most interesting results came from the discriminant analyses, because they showed a pattern of differential classification on the new tasks sensitive to initial reading ability. This pattern differed for nonnative speakers and native speakers. Examination of the reclassifications based on the new tasks revealed some of the problems of using a basic comprehension-only test to predict performance on more challenging literacy tasks. These results also imply the need for further work on new measures of advanced literacy skills, such as Reading to Learn and Reading to Integrate, to reflect trends in construct-driven assessment (Pellegrino et al., 1999).
For the nonnative speakers, most (81.8%) of the participants with TOEFL Reading Comprehension below 50 remained classified as low on the Reading to Learn/Reading to Integrate Composite, suggesting the existence of a lower threshold of academic English proficiency. Participants below this threshold were unlikely to perform well on tasks assessing Reading to Learn and Reading to Integrate. Even tasks involving only selection (such as BC1 and BC2) rather than production of responses were difficult for this group. However, for the nonnative speaker readers in the mid and high reading ability groups, basic comprehension level was not nearly as consistent a predictor of performance on the Reading to Learn/Reading to Integrate Composite. This was especially striking for those in the mid reading ability group. Approximately one fourth of the mid group dropped into the low category on the Reading to Learn/Reading to Integrate Composite, indicating that they experienced more difficulty in completing the new measures, but approximately one fourth could indeed categorize and synthesize information relatively well, despite middling performance on basic comprehension. This finding suggests that, for nonnative speaker readers in the mid reading ability category, basic comprehension measures were insufficient to predict their performance on the new measures.

Almost two-thirds of the high basic comprehension nonnative speakers remained high on the Reading to Learn/Reading to Integrate Composite, but one third dropped, suggesting that they could not categorize and synthesize information from texts as well as they could recognize information. For this third, the recognition-only task, multiple-choice basic comprehension, overestimated ability to manipulate and synthesize information. For the high and mid reading ability nonnative speaker groups, basic comprehension-only tests may then overestimate academic English proficiency.
Results for the native speakers showed a more definitive pattern of reclassification. For the low reading ability group, approximately half were misclassified, suggesting that the basic comprehension measure underrepresented their ability to categorize and synthesize information in academic texts. Most of the mid reading ability group were reclassified as lower on the Reading to Learn/Reading to Integrate Composite, showing that basic comprehension results overestimated ability to succeed on more challenging tasks. On the other hand, almost one third of the mid reading ability group did better on the Reading to Learn/Reading to Integrate Composite; for them, basic comprehension underrepresented their ability to complete more challenging reading tasks. For the high ability readers, nearly one third dropped on the Reading to Learn/Reading to Integrate Composite. For them, the basic comprehension measure overestimated their level of academic English proficiency. The other two thirds of the high reading ability group remained high, suggesting a higher threshold of academic English proficiency.
Considering results from both the nonnative speakers and native speakers, we conclude that the new tasks did assess something different from basic comprehension once a lower level threshold of basic academic English proficiency had been achieved. Examination of scatterplots based on discriminant functions indicated an obvious separation between participants who could and could not perform on the Reading to Learn/Reading to Integrate Composite, thus providing some tentative evidence for concurrent validity (Messick, 1989; Chapelle, 1999). We had hoped to find clear evidence of a hierarchy, suggesting that Reading to Learn was demonstrably more difficult than basic comprehension and Reading to Integrate demonstrably more difficult than Reading to Learn, but results did not yield an obvious hierarchy. Results did, however, suggest an even simpler pattern: a dichotomy. For those nonnative speakers above the lower threshold of English language proficiency, which in our data would be approximately 500, or 49 on the scaled TOEFL Reading Comprehension scores, some could perform well on the new measures and some could not, revealing two rather than three groups on the Reading to Learn/Reading to Integrate Composite. These results lead us to speculate that the new measures tap additional skills, such as 'sophisticated discourse processes and critical thinking skills' (Enright and Schedl, 1999: 24), in addition to language proficiency. For native speakers, the discriminant loading suggested a possible hierarchy of difficulty, but the scatterplots and classification tables revealed a dichotomy. Reclassification based on the Reading to Learn/Reading to Integrate Composite nearly eliminated the mid basic comprehension reading ability category, with only 16.8% of the native speakers qualifying as mid on the Reading to Learn/Reading to Integrate Composite.
VI Conclusions
While we acknowledge that there are some limitations to this project, such as the time required to administer and score new measures, the results are illuminating and useful, and they suggest some considerations for future research. The first consideration relates to appropriate classification of students based on academic reading abilities. This corresponds to the use of TOEFL as a gatekeeper by many institutions. For TOEFL test-takers whose current TOEFL scores would be in an intermediate to high intermediate range, the new tasks could assess additional abilities relevant to academic performance. For such participants, additional test development efforts are warranted. The majority of high reading ability nonnative speakers (64.9%) could perform the new measures; a third could not. Our results suggest that some unknown number of mid reading ability nonnative speakers are more capable of succeeding at more challenging academic reading tasks than their current level of basic comprehension assessment would indicate. At the same time, there are some high and mid reading ability participants who could not perform the more demanding tasks; admitting such students directly into university programs could result in failure. These results suggest that for most students at lower levels of basic comprehension (in our study, those with TOEFL scores below 490 to 500), development of new tasks is unnecessary. Nevertheless, our results indicate that a certain small percentage (in our sample, 8 students, 18.2%) of nonnative speakers classified as low reading ability on basic comprehension did in fact perform adequately on more challenging tasks like Reading to Learn and Reading to Integrate. Given the very large number of TOEFL test-takers around the world, this finding merits further research. It would be potentially unfair to exclude such students from university study based on basic comprehension-only measures such as the current TOEFL. We would urge institutions that use TOEFL to consider scores on these new, more demanding tasks if doing so would address institutional needs. According to interview data collected in conjunction with this project (Trites, 2000: Chapter 6), participants perceived that successful completion of the new tasks required a thorough understanding of the texts, whereas the multiple-choice tests were less demanding, requiring only superficial grasp of content. This finding corroborates the observation of Freedle and Kostin (1999: 3), who note that often examinees do not need to comprehend the accompanying text to answer the test item. Therefore, for all TOEFL test-takers, incorporating more challenging tasks such as the Reading to Learn and Reading to Integrate measures into typical English language instruction could have positive washback effects.
The second area of consideration is the focus on additional relevant research. For predictive validity, it is important to determine the correlation between the Reading to Learn/Reading to Integrate Composite measures and actual academic performance in university classes. Correlations might differ depending on the degree of categorization and synthesis required by different major fields of study. Another area for future research is related to the novelty of the Reading to Learn task as an assessment technique. Current pedagogical trends in literacy instruction emphasize the use of graphic organizers as a means to understand and manipulate information in texts. To our knowledge, graphic organizers are typically used as classroom activities rather than assessment tools. They have good potential for use in assessment if students are familiar with them and if appropriate scoring systems can be developed. This project demonstrates that it is possible, though labor intensive, to develop reliable scoring systems. Further research is needed to explore refinement of this task type and related scoring systems for use in testing programs with large numbers of test-takers and scorers. Although the Reading to Integrate task (generating a synthesis) was more familiar, the scoring system was innovative because it reflected a reader's ability to recognize textual frames as well as integrate information. If test developers are interested in the abilities of nonnative speaker readers to perform such tasks, further development of similar tasks and scoring systems is warranted. While a substantial investment of time would be required to refine the administration and scoring systems needed for more complex tasks such as these, it should be weighed against the possible danger of under-representing the likelihood of academic success based on results of the basic comprehension-only measures still most often used to assess academic proficiency.
Acknowledgements
This project was funded by a grant from Educational Testing Service as part of the TOEFL 2000 effort. We appreciate the cooperation of the ETS staff members who assisted in the development of passage-specific tasks and other aspects of the research; however, no official endorsement of Educational Testing Service should be inferred.
VII References
Bachman L 2000 Modern language testing at the turn of the century assur-ing that what we count counts Language Testing 17 1ndash42
Biber D 1993 Using register-diversified corpora for general language studiesComputational Linguistics 19 219ndash41
Britt M Rouet J and Perfetti C 1996 Using hypertext to study and reasonabout historical evidence In Rouet J Levonen J Dillon A and Spiro Reditors Hypertext and cognition Mahwah NJ Lawrence Erlbaum 43ndash72
Chapelle C 1999 Validity in language assessment Annual Review of AppliedLinguistics 19 1ndash19
Educational Testing Service 1997 TOEFL test and score manual PrincetonNJ Educational Testing Service
mdashmdash 1998 Draft TOEFL 2000 research agenda framework areas of researchResearch agenda three TOEFL 2000 internal document Princeton NJEducational Testing Service
Eignor D Taylor C Kirsch I and Jamieson J 1998 Development of ascale for assessing the level of computer familiarity of TOEFL exami-nees TOEFL Research Report No 60 Princeton NJ EducationalTesting Service
Enright M and Schedl M 1999 Reading for a reason using reader purposeto guide test design TOEFL 2000 Internal Report Princeton NJEducational Testing Service
Enright M Grabe W Mosenthal P Mulcahy-Ernt P and Schedl M1998 A TOEFL 2000 framework for testing reading comprehension aworking paper Princeton NJ Educational Testing Service
Foltz P 1996 Comprehension coherence and strategies in hypertext InRouet J Levonen J Dillon A and Spiro R editors Hypertext and cog-nition Mahwah NJ Lawrence Erlbaum 109ndash36
Latricia Trites and Mary McGroarty 201
Freedle, R. and Kostin, I. 1999. Does the text matter in a multiple choice test of comprehension? The case for the construct validity of TOEFL’s minitalks. Language Testing 16, 2–32.
Goldman, S. 1997. Learning from text: reflections on the past and suggestions for the future. Discourse Processes 23, 357–98.
Hayes, J. and Hatch, J. 1999. Issues in measuring reliability: correlation versus percentage of agreement. Written Communication 16, 354–67.
Huberty, C. 1994. Applied discriminant analysis. New York: John Wiley.
Jamieson, J., Campbell, J., Norfleet, L. and Berbisada, N. 1993. Reliability of a computerized scoring routine for an open-ended task. System 21, 305–22.
Jamieson, J., Norfleet, L. and Berbisada, N. 1993. Successes, failures and dropouts in computer-assisted language lessons. Computer Assisted English Language Learning Journal 4, 12–20.
Klecka, W. 1980. Discriminant analysis. In Lewis-Beck, M., editor, Quantitative applications in the social sciences, Volume 19. Newbury Park, CA: Sage.
Lehto, M., Zhu, W. and Carpenter, B. 1995. The relative effectiveness of hypertext and text. International Journal of Human-Computer Interaction 7, 293–313.
McNamara, D. and Kintsch, W. 1996. Learning from texts: effects of prior knowledge and text coherence. Discourse Processes 22, 247–88.
Messick, S. 1989. Validity. In Linn, R.L., editor, Educational measurement, 3rd edition. New York: American Council on Education/Macmillan, 13–103.
Meyer, B. 1985a. Prose analysis: purposes, procedures and problems. In Britton, B. and Black, J., editors, Understanding expository text. Hillsdale, NJ: Lawrence Erlbaum, 11–64.
Meyer, B. 1985b. Prose analysis: purposes, procedures and problems, Part 2. In Britton, B. and Black, J., editors, Understanding expository text. Hillsdale, NJ: Lawrence Erlbaum, 269–97.
Monks, V. 1997, August/September. Two views, same waterway. National Wildlife 35, 36–37.
Pellegrino, J., Baxter, G. and Glaser, R. 1999. Addressing the ‘two disciplines’ problem: linking theories of cognition and learning with assessment and instructional practice. Review of Research in Education 24, 307–53.
Perfetti, C. 1997. Sentences, individual differences and multiple texts: three issues in text comprehension. Discourse Processes 23, 337–55.
Perfetti, C., Britt, M.A. and Georgi, M. 1995. Text-based learning and reasoning: studies in history. Hillsdale, NJ: Lawrence Erlbaum.
Perfetti, C., Marron, M. and Foltz, P. 1996. Sources of comprehension failure: theoretical perspectives and case studies. In Cornoldi, C. and Oakhill, J., editors, Reading comprehension difficulties: processes and intervention. Mahwah, NJ: Lawrence Erlbaum, 137–65.
Reinking, D. 1988. Computer-mediated text and comprehension differences: the role of reading time, reader preference and estimation of learning. Reading Research Quarterly 23, 484–500.
Reinking, D. and Schreiner, R. 1985. The effects of computer-mediated text on measures of reading comprehension and reading behavior. Reading Research Quarterly 20, 536–53.
Spivey, N. 1997. The constructivist metaphor: reading, writing and the making of meaning. San Diego, CA: Academic Press.
Stevens, J. 1996. Applied multivariate statistics for the social sciences, 3rd edition. Mahwah, NJ: Lawrence Erlbaum.
Tabachnick, B. and Fidell, L. 1996. Using multivariate statistics, 3rd edition. New York: Harper Collins.
Taylor, C., Jamieson, J., Eignor, D. and Kirsch, I. 1998. The relationship between computer familiarity and performance on computer-based TOEFL test tasks. TOEFL Research Report No. 61. Princeton, NJ: Educational Testing Service.
Tennesen, M. 1997, November/December. On a clear day. National Parks 71, 26–9.
Trites, L. 2000. Beyond basic comprehension: reading to learn and reading to integrate for native and non-native speakers. Unpublished doctoral dissertation, Northern Arizona University, Flagstaff, AZ.
Van den Berg, S. and Watt, J. 1991. Effects of educational setting on student responses to structured hypertext. Journal of Computer-Based Instruction 18, 118–24.
Van Dijk, T.A. and Kintsch, W. 1983. Strategies of discourse comprehension. New York: Academic Press.
Wiley, J. and Voss, J. 1999. Constructing arguments from multiple sources: tasks that promote understanding and not just memory for text. Journal of Educational Psychology 91, 310–11.
Zimmerman, T. 1997, December 29. Filter it with billions and billions of oysters: how to revive the Chesapeake Bay. US News and World Report 123, 63.
Appendix 1a Chart completion task
Directions: Complete the following chart. Fill in as much detail as possible from the text read by categorizing the information into the different areas on the chart. Do not use single words for your responses; form your responses in phrases or complete sentences. Include examples from the text.
Make no judgments about the accuracy of causes or effects, or the effectiveness of the solutions mentioned in the text. Solutions are seen as any action taken in response to the problem(s). Solutions can take many forms, such as proposed solutions, attempted solutions or failed solutions. Also, space is provided under each category for examples. Examples are specific examples found in the text that are used by the author(s) to exemplify the problems, causes, effects or solutions in the text. However, there may not be examples for every category.
Points will be awarded for correct responses only. There is no penalty for incorrect responses. Points will be awarded in the following manner:
Problems and Solutions: 10 points each
Causes and Effects: 5 points each
Examples: 1 point each
Problems Causes Effects Solutions
Examples Examples Examples Examples
Appendix 1b Reading to Learn scoring rubric (0–241 total possible)
Problems (10 points; 1 word: 6 points), 40 maximum
• Air pollution/Smog in the National Parks
• Opposition or Ignoring (or lack of cooperation) to environmental standards (regulations)
• No true ‘point sources’ emissions (myriad of smaller pollution sources rather than one large source)
• National Parks limited jurisdiction of pollution sources outside of the parks
Causes (5 points; 1 word: 3 points), 55 maximum
• Sulfur/nitrogen emissions
• Acid rain
• Ground-level ozone
• Urban/Industrial emissions
• Automobile emissions
• Power plants/factories/plants
• Emissions from Smokestacks (chimneys)
• Emissions from Kilns
• Unregulated pollution/smog (from other countries, Mexico)
• Smoke from Controlled burns
• Emissions from large cities
Effects (5 points; 1 word: 3 points), 30 maximum
• Trees and plants affected (injured), growth hindered
• Visibility decreased
• Metals loosened into waters (surface/ground water, water polluted)
• Nutrients removed (leached) from soil and/or plants
• Public outcry over stance of big business/government
• Aquatic life injured (damaged)
Solutions (10 points; 1 word: 6 points), 90 maximum
• Stricter Environmental Laws (resolutions/acts)
• 1977 Clean Air Act Amendments labeling NPs as Class I areas
• Regional haze regulations proposed by the EPA
• Reduction of allowable pollution standards from industry
• Clean Air Act set visibility goals
• Objecting to construction permits
• Identification of pollution sources
• Industry modifications: Installation of devices (scrubbers) to reduce pollution
• Public pressure to protect parks
Examples under Problems (1 point), 5 maximum
• Great Smoky Mountain National Park
• Grand Canyon National Park
• Smoky Mountain Sources from Ohio, New York, Atlanta, etc.
• Grand Canyon Sources from CA, NV, UT, AZ, NM and Mexico
• TN Luttrell corp building permit situation
Examples under Causes (1 point, or 5 points with overarching cause listed above)
• Automobile emissions
• Power plants/factories/plants emissions
• Emissions from Smokestacks (chimneys)
• Emissions from Kilns
• Unregulated pollution/smog (from other countries, Mexico)
• Smoke from Controlled burns
• Emissions from large cities
Examples under Causes (1 point), 10 maximum
• Navajo Generating Station, Page, AZ (Power Plant, AZ)
• Power plants in TN and OH river valley
• Southern California Edison Plant, Laughlin, NV
• Tennessee Luttrell Kilns, TN
• Industry in AZ
• Cars in CA
• Smokestacks in NM, NV, UT
• Los Angeles
• Atlanta
• New York
Examples under Effects (1 point), 6 maximum
• Leaves of plants turning purple and brown (stippling)
• Hindering photosynthesis of plants and trees
• Reduction of visibility at Grand Canyon
• Visibility reduced 90% of the days
• See vague blue masses
• See half as far as in 1919
Examples under Solutions (1 point, or 10 points with overarching solution listed above)
• 1977 Clean Air Act Amendments labeling NPs as Class I areas
• Regional haze regulations proposed by the EPA
• Reduction of allowable pollution standards from industry
Examples under Solutions (1 point), 5 maximum
• Objecting to kilns in TN
• Researchers using scientific technology to ID sources (radioactive isotopes)
• Southern California Edison Plant in Laughlin, NV identified as point source
• Scrubbers installed at Navajo Generating Plant
• 10% reduction of pollution in 10–15 years (goal of no man-made pollution)
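As a rough check on the Reading to Learn rubric, the stated total of 241 points can be recomputed from the category maxima. The Python sketch below is illustrative only: the item counts per category (4, 11, 6 and 9) are inferred from the stated maxima and point values, and the dictionary and variable names are assumptions rather than anything used in the study.

```python
# Category maxima implied by the rubric: (number of scorable items, points each).
# Item counts are inferred from the stated maxima (40, 55, 30, 90).
CATEGORIES = {
    "Problems": (4, 10),
    "Causes": (11, 5),
    "Effects": (6, 5),
    "Solutions": (9, 10),
}

# Stated maxima for 1-point examples under each category.
EXAMPLE_MAXIMA = {"Problems": 5, "Causes": 10, "Effects": 6, "Solutions": 5}

category_points = {name: n * pts for name, (n, pts) in CATEGORIES.items()}
max_total = sum(category_points.values()) + sum(EXAMPLE_MAXIMA.values())

for name, points in category_points.items():
    print(f"{name}: {points} + {EXAMPLE_MAXIMA[name]} example points")
print("Maximum Reading to Learn score:", max_total)
```

Under these assumptions the category maxima and example maxima together reproduce the rubric's stated ceiling.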
Appendix 2a Reading to Integrate task integration activity
Directions: For the next 15 minutes, reflect on what you read and compose a short essay that combines the information in the material read and makes connections across the range of ideas and arguments presented.
Appendix 2b Reading to Integrate scoring rubric
Integration
50 Excellent: Integrates texts accurately and successfully on multiple levels and creates a true Documents Model. Generalizes at least two macrostructure concepts common across texts (this may be simply identifying the existence of a macrostructure followed by the supporting macrostructures from each text). Effectively integrates relevant support (i.e., details or macrostructures) to support generalizations and may still discuss each article separately to some degree. Integrates macrostructures present in both texts.
40 Good: Integrates through the creation of a well developed and accurate introduction and/or a conclusion, yet summarizes articles separately, AND/OR generalizes one major macrostructure that is fully developed with support. Uses some relevant support.
30 Fair: Attempts to integrate through the creation of a partially developed, possibly inaccurate introductory AND/OR concluding statement, usually related to the main topic of the articles, but has no substantive development. Creates a Text and Situation Model of each text separately. Does not make generalizations for integration beyond the one statement. May make evaluative or editorial statements; however, these may be inaccurate.
20 Poor: Creates a well developed and accurate Text Model and Situation Model for each of the texts, yet attempts no integrative connection across the texts.
10 Very Poor: Ineffectively or inaccurately attempts to summarize each article in a very reporting style, revealing little or no use of background knowledge, contextualization or evaluation. Response is simply a recall of information from the texts. No introduction or conclusion, OR participant confuses texts and sees separate texts as one issue (problem).
0 No Response: Participant does not attempt response or only addresses one article.
Macrostructures
Participants will receive one point for each macrostructure accurately identified in each text, with a minimum score of 4 awarded for the inability to accurately identify any macrostructures.
25 Excellent (Total 8): Accurately identifies all 4 macrostructures present in both texts.
20 Good (Total 6/7): Accurately identifies all 4 macrostructures in one text and 3 in the other, OR accurately identifies 3 of the four macrostructures in both texts (or 4 in one and 2 in the other).
15 Fair (Total 4/5): Accurately identifies 3 of the four macrostructures in one text and 2 of the four in the other (or 4 in one and 1 in the other), OR accurately identifies 2 of the four macrostructures in both texts (or 4 in one and 0 in the other, or 3 in one and 1 in the other).
10 Poor (Total 2/3): Accurately identifies 2 of the macrostructures in one text and 1 in the other (or 3 in one and 0 in the other), OR accurately identifies 1 of the four macrostructures in both texts (or 2 in one and 0 in the other).
5 Very Poor (Total 0/1): Accurately identifies 1 of the four macrostructures in one text and 0 in the other, OR unable to accurately identify any macrostructures in the texts.
0 No Response: Participant does not attempt response.
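The macrostructure bands above depend only on the combined number of macrostructures identified across the two texts, so they can be expressed as a small lookup. The following Python sketch is one possible encoding, offered as an illustration; the function name, argument names and the `attempted` flag are hypothetical and do not come from the original scoring procedure.

```python
def macrostructure_score(text_a, text_b, attempted=True):
    """Map macrostructures identified in each text (0-4 each) to a rubric score.

    Bands follow the rubric: total 8 -> 25, 6-7 -> 20, 4-5 -> 15,
    2-3 -> 10, 0-1 -> 5; no attempted response -> 0.
    """
    if not attempted:
        return 0
    for count in (text_a, text_b):
        if not 0 <= count <= 4:
            raise ValueError("each text has between 0 and 4 macrostructures")
    total = text_a + text_b
    if total == 8:
        return 25
    if total >= 6:
        return 20
    if total >= 4:
        return 15
    if total >= 2:
        return 10
    return 5


# Every split listed in the 'Fair' band gives the same score.
print([macrostructure_score(a, b) for a, b in [(3, 2), (4, 1), (2, 2), (4, 0), (3, 1)]])
```

Keying the score to the total rather than to each listed combination keeps the function short, and it agrees with every combination the rubric enumerates.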
Use of relevant details
5 Excellent: Effectively uses multiple relevant details as support; no irrelevant or erroneous details.
4 Good: Effectively uses relevant details as support; may include some inaccurate or irrelevant details. (More than 50% of details are relevant.)
3 Fair: Possibly uses one or two relevant details, yet several irrelevant details or inaccurate details (erroneous or fabricated) appear in the synthesis. (50% or less of details are relevant.)
2 Poor: Erroneous or fabricated details used in attempt to support arguments presented; does not include relevant details.
1 Very Poor: No details used or listed.
0 No Response: Participant does not attempt response.
Reading to Integrate loaded significantly on the discriminant function, at .86 for Reading to Learn and .78 for Reading to Integrate (p < .05).
Over half (64.9%) of the high reading ability group remained high on the new measure; less than half (42.9%) of the mid reading ability group remained classified as mid. Hence the Reading to Learn/Reading to Integrate Composite was particularly influential in reclassifying the mid reading ability group and, to a lesser extent, the high group. However, most (81.8%) of the low reading ability group remained low on the Reading to Learn/Reading to Integrate Composite (see Table 10). Of the 143 nonnative speaker participants, 91 (64%) remained in the initial basic comprehension category on the composite; the rest moved, but in different directions. Twenty-one participants (14.7%) were classified into a higher category on the Reading to Learn/Reading to Integrate Composite than their initial basic comprehension level would have suggested, while 31 (21.7%) were reclassified into a lower category. Thus 52 participants (36.4%), just over one third of the sample, were classified differently based on their Reading to Learn/Reading to Integrate Composite performance.
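The reclassification counts for the nonnative speakers can be recomputed from the cell counts reported in Table 10. The Python sketch below is illustrative only; the function and variable names are assumptions, and it simply tallies the diagonal and off-diagonal cells of the classification table rather than reproducing the discriminant analysis itself.

```python
# Cell counts from Table 10. Rows are the initial basic comprehension group,
# columns the predicted group on the composite; both ordered High, Mid, Low.
TABLE_10 = [
    [37, 17, 3],
    [13, 18, 11],
    [2, 6, 36],
]


def reclassification_summary(counts):
    """Return (stayed, moved_up, moved_down, total) for a square count table."""
    stayed = moved_up = moved_down = 0
    for i, row in enumerate(counts):
        for j, cell in enumerate(row):
            if i == j:
                stayed += cell
            elif j < i:  # predicted category is higher than the initial one
                moved_up += cell
            else:  # predicted category is lower than the initial one
                moved_down += cell
    return stayed, moved_up, moved_down, stayed + moved_up + moved_down


stayed, up, down, total = reclassification_summary(TABLE_10)
for label, n in (("same category", stayed), ("higher", up), ("lower", down)):
    print(f"{label}: {n} of {total} ({100 * n / total:.1f}%)")
```

The same function can be applied to any square classification table whose rows and columns share one ordering of groups.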
Table 10 Discriminant analysis: comparison of nonnative speaker reading ability groups with Reading to Learn/Reading to Integrate Composite (n = 143)
Columns High, Mid and Low give predicted group membership for the Reading to Learn/Reading to Integrate Composite.
Reading ability group      High    Mid     Low     Initial classification total
Count
High (≥56)                 37      17      3       57
Mid (50–55)                13      18      11      42
Low (≤49)                  2       6       36      44
Reclassification total     52      41      50      143
Percentage
High (≥56)                 64.9    29.8    5.3     100.0
Mid (50–55)                31.0    42.9    26.2    100.0
Low (≤49)                  4.5     13.6    81.8    100.0
Note: Total n = 143 due to loss of three cases in anomalous Reading to Integrate session.
b Discriminant analysis for native speakers: Because one of our goals in this project was to probe the possible validity of these new measures by assessing performance of two groups, native as well as nonnative speakers, we conducted a parallel discriminant analysis for native speakers. Thus, for the native speakers we followed the same procedure, dividing the entire native speaker group (n = 105) into three levels of basic comprehension (high, mid and low) based on score distances of .5 standard deviations from the sample mean of the Nelson–Denny test for these participants. Native speakers need not take reading comprehension tests when entering the university, so the three-way split was based entirely on our sample data. Participants were classified as high if their scores on the Nelson–Denny were greater than or equal to 135. Participants were classified as mid if their scores ranged between 119 and 134 on the Nelson–Denny. Participants were classified as low if their scores fell at or below 118 on the Nelson–Denny. Descriptive statistics for the entire native speaker sample on basic comprehension group membership appear in Table 11.
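The three-way split described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' procedure: the mean and standard deviation used here (126.04 and 16.52) are the Table 11 totals for the 101 cases retained in the discriminant analysis, whereas the original split was made on the full group of 105, so the computed band only approximates the reported cutoffs. The function names are hypothetical.

```python
def half_sd_band(mean, sd, width=0.5):
    """Return the (lower, upper) bounds lying `width` SDs either side of the mean."""
    return mean - width * sd, mean + width * sd


def classify_nelson_denny(score):
    """Apply the cutoffs reported in the text: >= 135 high, 119-134 mid, <= 118 low."""
    if score >= 135:
        return "high"
    if score >= 119:
        return "mid"
    return "low"


lower, upper = half_sd_band(126.04, 16.52)
print(f"half-SD band around the mean: {lower:.2f} to {upper:.2f}")
print([classify_nelson_denny(s) for s in (141, 126, 103)])
```

The band computed from the Table 11 figures runs from roughly 118 to 134, which is close to the boundaries of the mid group reported in the text.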
Table 11 Descriptive statistics for native speakers by Nelson–Denny reading ability groups (n = 101)
Reading ability group      n       Mean on Nelson–Denny    sd
High (≥135)                39      141.26                  5.19
Mid (119–134)              37      125.65                  5.09
Low (≤118)                 25      102.88                  10.98
Total                      101     126.04                  16.52
Note: Total n = 101 for discriminant analysis due to loss of four cases in anomalous Reading to Integrate session.
The discriminant analysis yielded one discriminant function with an eigenvalue of .31, responsible for 98.7% of the variance in outcomes. Wilks’ Lambda for this function was .76, significant at .001; the associated Chi-Square value was large (26.39) and highly significant (p < .001), indicating that the group centroids on the discriminant function for the three reading ability groups on the Reading to Learn/Reading to Integrate Composite were significantly different. Pooled within-groups correlations between discriminating variables showed that Reading to Learn correlated with the first discriminant function at a level of .81; Reading to Integrate correlated with the second discriminant function at .71. This contrasts with findings for the nonnative speakers, where scores on the combined new measures loaded significantly on only one discriminant function. For native speakers, then, there is evidence for two significant discriminant functions, although the first accounts for almost all of the variance. Although these two measures (Reading to Learn and Reading to Integrate) loaded on two separate discriminant functions, they still showed moderate correlations with the alternate function (see note 6), justifying the composite calculations.
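The reported eigenvalue and Wilks' Lambda can be checked against each other, because Wilks' Lambda for a set of discriminant functions is the product of 1/(1 + eigenvalue) over those functions. The Python sketch below is a consistency check only. It assumes two functions, and the second eigenvalue is not reported in the text; it is inferred here from the statement that the first function accounts for 98.7% of the variance.

```python
import math


def wilks_lambda(eigenvalues):
    """Wilks' Lambda as the product of 1 / (1 + eigenvalue) over all functions."""
    return math.prod(1.0 / (1.0 + ev) for ev in eigenvalues)


first_eigenvalue = 0.31   # reported in the text
first_share = 0.987       # reported share of variance for function 1

# Assumption: the remaining 1.3% of variance belongs to a second function.
second_eigenvalue = first_eigenvalue * (1 - first_share) / first_share

lam = wilks_lambda([first_eigenvalue, second_eigenvalue])
print(f"inferred second eigenvalue: {second_eigenvalue:.4f}")
print(f"Wilks' Lambda: {lam:.2f}")
```

Under these assumptions the product rounds to the reported value of .76, which supports reading the extracted figures as .31, 98.7% and .76.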
Results of discriminant analysis for native speakers, seen in Table 12, show a different pattern than that observed for nonnative speakers. Nearly three-fourths (71.8%) of the native speakers classified as high in the basic comprehension reading ability group remained high on the Reading to Learn/Reading to Integrate Composite. For the mid group, however, only 18.9% remained classified as mid. Just over half (52%) of the low reading ability group members remained low on the Reading to Learn/Reading to Integrate Composite. As with the nonnative speakers, participants in the mid category on basic comprehension showed the most frequent reclassification. Forty-eight of the 101 (47.5%) native speaker participants remained in the initial classification categories. Twenty-two (21.8%) were reclassified into a higher category and 31 (30.7%) were reclassified into a lower category. Thus 53 participants (52.5%), over half of the sample, were classified differently based on their performance on the Reading to Learn/Reading to Integrate Composite.
6 Reading to Learn correlated with function 2 at −.58; Reading to Integrate with function 1 at .71.
Table 12 Discriminant analysis: comparison of native speaker reading ability groups with Reading to Learn/Reading to Integrate Composite (n = 101)
Columns High, Mid and Low give predicted group membership for the Reading to Learn/Reading to Integrate Composite.
Reading ability group      High    Mid     Low     Initial classification total
Count
High (≥135)                28      3       8       39
Mid (119–134)              10      7       20      37
Low (≤118)                 5       7       13      25
Reclassification total     43      17      41      101
Percentage
High (≥135)                71.8    7.7     20.5    100.0
Mid (119–134)              27.0    18.9    54.1    100.0
Low (≤118)                 20.0    28.0    52.0    100.0
Note: Total n = 101 for discriminant analysis due to loss of four cases in anomalous Reading to Integrate session.
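The percentage rows of Table 12 are row percentages of the count rows, which a short script can confirm. The Python sketch below is illustrative; the dictionary layout and function name are assumptions made for the example.

```python
# Count rows from Table 12: initial group -> predicted [High, Mid, Low].
TABLE_12_COUNTS = {
    "High": [28, 3, 8],
    "Mid": [10, 7, 20],
    "Low": [5, 7, 13],
}


def row_percentages(counts):
    """Convert each row of counts to percentages of that row's total."""
    return {
        group: [round(100 * cell / sum(row), 1) for cell in row]
        for group, row in counts.items()
    }


percentages = row_percentages(TABLE_12_COUNTS)
for group, row in percentages.items():
    print(group, row)
```

The diagonal entries of the result correspond to the shares of each initial group that kept the same classification on the composite.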
V Interpretation
Analyses done to answer Research Questions 1 and 2 showed that performance on Reading to Learn and Reading to Integrate measures was significantly influenced by language background and level of education (graduate vs. undergraduate, for nonnative speakers only). Moreover, level of computer familiarity had no significant effect on Reading to Learn and Reading to Integrate performance.
Correlations showed that, as expected, all reading measures correlated to some degree, generally answering Research Question 3. The most interesting results came from the discriminant analyses, because they showed a pattern of differential classification on the new tasks, sensitive to initial reading ability. This pattern differed for nonnative speakers and native speakers. Examination of the reclassifications based on the new tasks revealed some of the problems of using a basic comprehension-only test to predict performance on more challenging literacy tasks. These results also imply the need for further work on new measures of advanced literacy skills, such as Reading to Learn and Reading to Integrate, to reflect trends in construct-driven assessment (Pellegrino et al., 1999).
For the nonnative speakers, most (81.8%) of the participants with TOEFL Reading Comprehension below 50 remained classified as low on the Reading to Learn/Reading to Integrate Composite, suggesting the existence of a lower threshold of academic English proficiency. Participants below this threshold were unlikely to perform well on tasks assessing Reading to Learn and Reading to Integrate. Even tasks involving only selection (such as BC1 and BC2) rather than production of responses were difficult for this group. However, for the nonnative speaker readers in the mid and high reading ability groups, basic comprehension level was not nearly as consistent a predictor of performance on the Reading to Learn/Reading to Integrate Composite. This was especially striking for those in the mid reading ability group. Approximately one fourth of the mid group dropped into the low category on the Reading to Learn/Reading to Integrate Composite, indicating that they experienced more difficulty in completing the new measures, but approximately one fourth could indeed categorize and synthesize information relatively well, despite middling performance on basic comprehension. This finding suggests that, for nonnative speaker readers in the mid reading ability category, basic comprehension measures were insufficient to predict their performance on the new measures.
Almost two-thirds of the high basic comprehension nonnative speakers remained high on the Reading to Learn/Reading to Integrate Composite, but one third dropped, suggesting that they could not categorize and synthesize information from texts as well as they could recognize information. For this third, the recognition-only task (multiple-choice basic comprehension) overestimated ability to manipulate and synthesize information. For the high and mid reading ability nonnative speaker groups, basic comprehension-only tests may then overestimate academic English proficiency.
Results for the native speakers showed a more definitive pattern of reclassification. For the low reading ability group, approximately half were misclassified, suggesting that the basic comprehension measure underrepresented their ability to categorize and synthesize information in academic texts. Most of the mid reading ability group were reclassified as lower on the Reading to Learn/Reading to Integrate Composite, showing that basic comprehension results overestimated ability to succeed on more challenging tasks. On the other hand, almost one third of the mid reading ability group did better on the Reading to Learn/Reading to Integrate Composite; for them, basic comprehension underrepresented their ability to complete more challenging reading tasks. For the high ability readers, nearly one third dropped on the Reading to Learn/Reading to Integrate Composite. For them, the basic comprehension measure overestimated their level of academic English proficiency. The other two thirds of the high reading ability group remained high, suggesting a higher threshold of academic English proficiency.
Considering results from both the nonnative speakers and native speakers, we conclude that the new tasks did assess something different from basic comprehension once a lower level threshold of basic academic English proficiency had been achieved. Examination of scatterplots based on discriminant functions indicated an obvious separation between participants who could and could not perform on the Reading to Learn/Reading to Integrate Composite, thus providing some tentative evidence for concurrent validity (Messick, 1989; Chapelle, 1999). We had hoped to find clear evidence of a hierarchy, suggesting that Reading to Learn was demonstrably more difficult than basic comprehension and Reading to Integrate demonstrably more difficult than Reading to Learn, but results did not yield an obvious hierarchy. Results did, however, suggest an even simpler pattern: a dichotomy. For those nonnative speakers above the lower threshold of English language proficiency, which in our data would be approximately 500, or 49 on the scaled TOEFL Reading Comprehension scores, some could perform well on the new measures and some could not, revealing two rather than three groups on the Reading to Learn/Reading to Integrate Composite. These results lead us to speculate that the new measures tap additional skills such as ‘sophisticated discourse processes and critical thinking skills’ (Enright and Schedl, 1999: 24) in addition to language proficiency. For native speakers, the discriminant loading suggested a possible hierarchy of difficulty, but the scatterplots and classification tables revealed a dichotomy. Reclassification based on the Reading to Learn/Reading to Integrate Composite nearly eliminated the mid basic comprehension reading ability category, with only 16.8% of the native speakers qualifying as mid on the Reading to Learn/Reading to Integrate Composite.
VI Conclusions
While we acknowledge that there are some limitations to this project, such as the time required to administer and score new measures, the results are illuminating and useful, and suggest some considerations for future research. The first consideration relates to appropriate classification of students based on academic reading abilities. This corresponds to the use of TOEFL as a gatekeeper by many institutions. For TOEFL test-takers whose current TOEFL scores would be in an intermediate to high intermediate range, the new tasks could assess additional abilities relevant to academic performance. For such participants, additional test development efforts are warranted. The majority of high reading ability nonnative speakers (64.9%) could perform the new measures; a third could not. Our results suggest that some unknown number of mid reading ability nonnative speakers are more capable of succeeding at more challenging academic reading tasks than their current level of basic comprehension assessment would indicate. At the same time, there are some high and mid reading ability participants who could not perform the more demanding tasks; admitting such students directly into university programs could result in failure. These results suggest that, for most students at lower levels of basic comprehension (in our study, those with TOEFL scores below 490 to 500), development of new tasks is unnecessary. Nevertheless, our results indicate that a certain small percentage (in our sample, 8 students, 18.2%) of nonnative speakers classified as low reading ability on basic comprehension did in fact perform adequately on more challenging tasks like Reading to Learn and Reading to Integrate. Given the very large number of TOEFL test-takers around the world, this finding merits further research. It would be potentially unfair to exclude such students from university study based on basic comprehension-only measures such as the current TOEFL. We would urge institutions that use TOEFL to consider scores on these new, more demanding tasks if doing so would address institutional needs. According to interview data collected in conjunction with this project (Trites, 2000: Chapter 6), participants perceived that successful completion of the new tasks required a thorough understanding of the texts, whereas the multiple-choice tests were less demanding, requiring only superficial grasp of content. This finding corroborates the observation of Freedle and Kostin (1999: 3), who note that often examinees do not need to comprehend the accompanying text to answer the test item. Therefore, for all TOEFL test-takers, incorporating more challenging tasks such as the Reading to Learn and Reading to Integrate measures into typical English language instruction could have positive washback effects.
The second area of consideration is the focus on additional relevant research. For predictive validity, it is important to determine the correlation between the Reading to Learn/Reading to Integrate Composite measures and actual academic performance in university classes. Correlations might differ depending on the degree of categorization and synthesis required by different major fields of study. Another area for future research is related to the novelty of the Reading to Learn task as an assessment technique. Current pedagogical trends in literacy instruction emphasize the use of graphic organizers as a means to understand and manipulate information in texts. To our knowledge, graphic organizers are typically used as classroom activities rather than assessment tools. They have good potential for use in assessment if students are familiar with them and if appropriate scoring systems can be developed. This project demonstrates that it is possible, though labor intensive, to develop reliable scoring systems. Further research is needed to explore refinement of this task type and related scoring systems for use in testing programs with large numbers of test-takers and scorers. Although the Reading to Integrate task (generating a synthesis) was more familiar, the scoring system was innovative because it reflected a reader’s ability to recognize textual frames as well as integrate information. If test developers are interested in the abilities of nonnative speaker readers to perform such tasks, further development of similar tasks and scoring systems is warranted. While a substantial investment of time would be required to refine the administration and scoring systems needed for more complex tasks such as these, it should be weighed against the possible danger of under-representing the likelihood of academic success based on results of the basic comprehension-only measures still most often used to assess academic proficiency.
Acknowledgements
This project was funded by a grant from Educational Testing Service as part of the TOEFL 2000 effort. We appreciate the cooperation of the ETS staff members who assisted in the development of passage-specific tasks and other aspects of the research; however, no official endorsement of Educational Testing Service should be inferred.
VII References
Bachman, L. 2000: Modern language testing at the turn of the century: assuring that what we count counts. Language Testing 17, 1–42.
Biber, D. 1993: Using register-diversified corpora for general language studies. Computational Linguistics 19, 219–41.
Britt, M., Rouet, J. and Perfetti, C. 1996: Using hypertext to study and reason about historical evidence. In Rouet, J., Levonen, J., Dillon, A. and Spiro, R., editors, Hypertext and cognition. Mahwah, NJ: Lawrence Erlbaum, 43–72.
Chapelle, C. 1999: Validity in language assessment. Annual Review of Applied Linguistics 19, 1–19.
Educational Testing Service 1997: TOEFL test and score manual. Princeton, NJ: Educational Testing Service.
Educational Testing Service 1998: Draft TOEFL 2000 research agenda framework: areas of research. Research agenda three. TOEFL 2000 internal document. Princeton, NJ: Educational Testing Service.
Eignor, D., Taylor, C., Kirsch, I. and Jamieson, J. 1998: Development of a scale for assessing the level of computer familiarity of TOEFL examinees. TOEFL Research Report No. 60. Princeton, NJ: Educational Testing Service.
Enright, M. and Schedl, M. 1999: Reading for a reason: using reader purpose to guide test design. TOEFL 2000 Internal Report. Princeton, NJ: Educational Testing Service.
Enright, M., Grabe, W., Mosenthal, P., Mulcahy-Ernt, P. and Schedl, M. 1998: A TOEFL 2000 framework for testing reading comprehension: a working paper. Princeton, NJ: Educational Testing Service.
Foltz, P. 1996: Comprehension, coherence, and strategies in hypertext. In Rouet, J., Levonen, J., Dillon, A. and Spiro, R., editors, Hypertext and cognition. Mahwah, NJ: Lawrence Erlbaum, 109–36.
Freedle, R. and Kostin, I. 1999: Does the text matter in a multiple choice test of comprehension? The case for the construct validity of TOEFL's minitalks. Language Testing 16, 2–32.
Goldman, S. 1997: Learning from text: reflections on the past and suggestions for the future. Discourse Processes 23, 357–98.
Hayes, J. and Hatch, J. 1999: Issues in measuring reliability: correlation versus percentage of agreement. Written Communication 16, 354–67.
Huberty, C. 1994: Applied discriminant analysis. New York: John Wiley.
Jamieson, J., Campbell, J., Norfleet, L. and Berbisada, N. 1993: Reliability of a computerized scoring routine for an open-ended task. System 21, 305–22.
Jamieson, J., Norfleet, L. and Berbisada, N. 1993: Successes, failures and dropouts in computer-assisted language lessons. Computer Assisted English Language Learning Journal 4, 12–20.
Klecka, W. 1980: Discriminant analysis. In Lewis-Beck, M., editor, Quantitative applications in the social sciences, Volume 19. Newbury Park, CA: Sage.
Lehto, M., Zhu, W. and Carpenter, B. 1995: The relative effectiveness of hypertext and text. International Journal of Human-Computer Interaction 7, 293–313.
McNamara, D. and Kintsch, W. 1996: Learning from texts: effects of prior knowledge and text coherence. Discourse Processes 22, 247–88.
Messick, S. 1989: Validity. In Linn, R.L., editor, Educational measurement, 3rd edition. New York: American Council on Education/Macmillan, 13–103.
Meyer, B. 1985a: Prose analyses: purposes, procedures, and problems. In Britton, B. and Black, J., editors, Understanding expository text. Hillsdale, NJ: Lawrence Erlbaum, 11–64.
Meyer, B. 1985b: Prose analysis: purposes, procedures, and problems: Part 2. In Britton, B. and Black, J., editors, Understanding expository text. Hillsdale, NJ: Lawrence Erlbaum, 269–97.
Monks, V. 1997, August/September: Two views, same waterway. National Wildlife 35, 36–37.
Pellegrino, J., Baxter, G. and Glaser, R. 1999: Addressing the 'two disciplines' problem: linking theories of cognition and learning with assessment and instructional practice. Review of Research in Education 24, 307–53.
Perfetti, C. 1997: Sentences, individual differences, and multiple texts: three issues in text comprehension. Discourse Processes 23, 337–55.
Perfetti, C., Britt, M.A. and Georgi, M. 1995: Text-based learning and reasoning: studies in history. Hillsdale, NJ: Lawrence Erlbaum.
Perfetti, C., Marron, M. and Foltz, P. 1996: Sources of comprehension failure: theoretical perspectives and case studies. In Cornoldi, C. and Oakhill, J., editors, Reading comprehension difficulties: processes and intervention. Mahwah, NJ: Lawrence Erlbaum, 137–65.
Reinking, D. 1988: Computer-mediated text and comprehension differences: the role of reading time, reader preference, and estimation of learning. Reading Research Quarterly 23, 484–500.
Reinking, D. and Schreiner, R. 1985: The effects of computer-mediated text on measures of reading comprehension and reading behavior. Reading Research Quarterly 20, 536–53.
Spivey, N. 1997: The constructivist metaphor: reading, writing, and the making of meaning. San Diego, CA: Academic Press.
Stevens, J. 1996: Applied multivariate statistics for the social sciences, 3rd edition. Mahwah, NJ: Lawrence Erlbaum.
Tabachnick, B. and Fidell, L. 1996: Using multivariate statistics, 3rd edition. New York: Harper Collins.
Taylor, C., Jamieson, J., Eignor, D. and Kirsch, I. 1998: The relationship between computer familiarity and performance on computer-based TOEFL test tasks. TOEFL Research Report No. 61. Princeton, NJ: Educational Testing Service.
Tennesen, M. 1997, November/December: On a clear day. National Parks 71, 26–9.
Trites, L. 2000: Beyond basic comprehension: reading to learn and reading to integrate for native and non-native speakers. Unpublished doctoral dissertation, Northern Arizona University, Flagstaff, AZ.
Van den Berg, S. and Watt, J. 1991: Effects of educational setting on student responses to structured hypertext. Journal of Computer-Based Instruction 18, 118–24.
Van Dijk, T.A. and Kintsch, W. 1983: Strategies of discourse comprehension. New York: Academic Press.
Wiley, J. and Voss, J. 1999: Constructing arguments from multiple sources: tasks that promote understanding and not just memory for text. Journal of Educational Psychology 91, 310–11.
Zimmerman, T. 1997, December 29: Filter it with billions and billions of oysters: how to revive the Chesapeake Bay. US News and World Report 123, 63.
Appendix 1a Chart completion task
Directions: Complete the following chart. Fill in as much detail as possible from the text read by categorizing the information into the different areas on the chart. Do not use single words for your responses; form your responses in phrases or complete sentences. Include examples from the text.
Make no judgments about the accuracy of causes or effects or the effectiveness of the solutions mentioned in the text. Solutions are seen as any action taken in response to the problem(s). Solutions can take many forms, such as proposed solutions, attempted solutions, or failed solutions. Also, space is provided under each category for examples. Examples are specific examples found in the text that are used by the author(s) to exemplify the problems, causes, effects, or solutions in the text. However, there may not be examples for every category.
Points will be awarded for correct responses only. There is no penalty for incorrect responses. Points will be awarded in the following manner:
Problems and Solutions: 10 points each
Causes and Effects: 5 points each
Examples: 1 point each
Problems    Causes      Effects     Solutions
Examples    Examples    Examples    Examples
Appendix 1b Reading to Learn scoring rubric (0–241 total possible)
Problems (10 points) 1 word (6 points) 40 maximum
• Air pollution/Smog in the National Parks
• Opposition or Ignoring (or lack of cooperation) to environmental standards (regulations)
• No true 'point sources' (myriad of smaller pollution sources rather than one large source)
• National Parks limited jurisdiction of pollution sources outside of the parks
Causes (5 points) 1 word (3 points) 55 maximum
• Sulfur/nitrogen emissions
• Acid rain
• Ground-level ozone
Urban/Industrial emissions
• Automobile emissions
• Power plants/factories/plants emissions
• Emissions from Smokestacks (chimneys)
• Emissions from Kilns
• Unregulated pollution/smog (from other countries, Mexico)
• Smoke from Controlled burns
• Emissions from large cities
Effects (5 points) 1 word (3 points) 30 maximum
• Trees and plants affected (injured), growth hindered
• Visibility decreased
• Metals loosened into waters (surface/ground water, water polluted)
• Nutrients removed (leached) from soil and/or plants
• Public outcry over stance of big business/government
• Aquatic life injured (damaged)
Solutions (10 points) 1 word (6 points) 90 maximum
Stricter Environmental Laws (resolutions, acts)
• 1977 Clean Air Act Amendments labeling NPs as Class I areas
• Regional haze regulations proposed by the EPA
• Reduction of allowable pollution standards from industry
• Clean Air Act set visibility goals
• Objecting to construction permits
• Identification of pollution sources
• Industry modifications/Installation of devices (scrubbers) to reduce pollution
• Public pressure to protect parks
Appendix 1b (continued)
Problems: Examples (1 point) 5 maximum
• Great Smoky Mountain National Park
• Grand Canyon National Park
• Smoky Mountain Sources from Ohio, New York, Atlanta, etc.
• Grand Canyon Sources from CA, NV, UT, AZ, NM and Mexico
• TN Luttrell corp. building permit situation
Causes: Examples (1 point or 5 points with overarching cause listed above)
• Automobile emissions
• Power plants/factories/plants emissions
• Emissions from Smokestacks (chimneys)
• Emissions from Kilns
• Unregulated pollution/smog (from other countries, Mexico)
• Smoke from Controlled burns
• Emissions from large cities
Causes: Examples (1 point) 10 maximum
• Navajo Generating Station, Page, AZ (Power Plant AZ)
• Power plants in TN and OH river valley
• Southern California Edison Plant, Laughlin, NV
• Tennessee Luttrell Kilns, TN
• Industry in AZ
• Cars in CA
• Smokestacks in NM, NV, UT
• Los Angeles
• Atlanta
• New York
Effects: Examples (1 point) 6 maximum
• Leaves of plants turning purple and brown (stippling)
• Hindering photosynthesis of plants and trees
• Reduction of visibility at Grand Canyon
• Visibility reduced 90% of the days
• See vague blue masses
• See half as far as in 1919
Solutions: Examples (1 point or 10 points with overarching solution listed above)
• 1977 Clean Air Act Amendments labeling NPs as Class I areas
• Regional haze regulations proposed by the EPA
• Reduction of allowable pollution standards from industry
Solutions: Examples (1 point) 5 maximum
• Objecting to kilns in TN
• Researchers using scientific technology to ID sources (radioactive isotopes)
• Southern California Edison Plant in Laughlin, NV identified as point source
• Scrubbers installed at Navajo Generating Plant
• 10% reduction of pollution in 10–15 years (goal of no man-made pollution)
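The point values in the Appendix 1b rubric can be tallied mechanically. The following is a minimal sketch, not the scoring routine used in the study. It assumes that "1 word" responses earn the reduced value shown in each header, that each category is capped at its stated maximum, and that a response sheet has already been hand-coded into counts of correct entries. The function name, the input format, and the omission of the "overarching cause/solution" bonus rule are all simplifications introduced here for illustration.

```python
# Values transcribed from the Appendix 1b headers:
# (points per full response, points per one-word response,
#  maximum for main cells, maximum for example cells)
RUBRIC = {
    "problems":  (10, 6, 40, 5),
    "causes":    (5, 3, 55, 10),
    "effects":   (5, 3, 30, 6),
    "solutions": (10, 6, 90, 5),
}


def score_chart(responses):
    """Return a Reading to Learn chart score.

    `responses` is a hypothetical input format: it maps a category name to a
    tuple (n_full, n_one_word, n_examples) of correct entries as judged by a
    human scorer. Missing categories count as zero.
    """
    total = 0
    for category, (full_pts, word_pts, main_cap, example_cap) in RUBRIC.items():
        n_full, n_one_word, n_examples = responses.get(category, (0, 0, 0))
        # Main cells: full and one-word responses, capped at the category maximum.
        main = min(n_full * full_pts + n_one_word * word_pts, main_cap)
        # Example cells: 1 point each, capped at the example maximum.
        examples = min(n_examples, example_cap)
        total += main + examples
    return total


# Sum of all category maxima; matches the "0-241 total possible" in the rubric title.
MAX_SCORE = sum(main_cap + example_cap for _, _, main_cap, example_cap in RUBRIC.values())
```

Summing the stated maxima (40 + 55 + 30 + 90 for the main cells and 5 + 10 + 6 + 5 for the examples) reproduces the 241-point ceiling given in the rubric title.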
Appendix 2a Reading to Integrate task integration activity
Directions: For the next 15 minutes, reflect on what you read and compose a short essay that combines the information in the material read and makes connections across the range of ideas and arguments presented.
[Ruled lines for the participant's written response]
Appendix 2b Reading to Integrate scoring rubric
Integration
50 Excellent: Integrates texts accurately and successfully on multiple levels and creates a true Documents Model. Generalizes at least two macrostructure concepts common across texts (this may be simply identifying the existence of a macrostructure followed by the supporting macrostructures from each text). Effectively integrates relevant support (i.e., details or macrostructures) to support generalizations and may still discuss each article separately to some degree. Integrates macrostructures present in both texts.
40 Good: Integrates through the creation of a well developed and accurate introduction and/or a conclusion, yet summarizes articles separately, AND/OR generalizes one major macrostructure that is fully developed with support. Uses some relevant support.
30 Fair: Attempts to integrate through the creation of a partially developed, possibly inaccurate, introductory AND/OR concluding statement, usually related to the main topic of the articles, but has no substantive development. Creates a Text and Situation Model of each text separately. Does not make generalizations for integration beyond the one statement. May make evaluative or editorial statements; however, these may be inaccurate.
20 Poor: Creates a well developed and accurate Text Model and Situation Model for each of the texts, yet attempts no integrative connection across the texts.
10 Very Poor: Ineffectively or inaccurately attempts to summarize each article in a very reporting style, revealing little or no use of background knowledge, contextualization, or evaluation. Response is simply a recall of information from the texts. No introduction or conclusion, OR participant confuses texts and sees separate texts as one issue (problem).
0 No Response: Participant does not attempt response or only addresses one article.
Macrostructures
Participants will receive one point for each macrostructure accurately identified in each text, with a minimum score of 4 awarded for the inability to accurately identify any macrostructures.
25 Excellent (Total 8): Accurately identifies all 4 macrostructures present in both texts.
20 Good (Total 6–7): Accurately identifies all 4 macrostructures in one text and 3 in the other, OR accurately identifies 3 of the four macrostructures in both texts (or 4 in one and 2 in the other).
15 Fair (Total 4–5): Accurately identifies 3 of the four macrostructures in one text and 2 of the four in the other (or 4 in one and 1 in the other), OR accurately identifies 2 of the four macrostructures in both texts (or 4 in one and 0 in the other, or 3 in one and 1 in the other).
10 Poor (Total 2–3): Accurately identifies 2 of the macrostructures in one text and 1 in the other (or 3 in one and 0 in the other), OR accurately identifies 1 of the four macrostructures in both texts (or 2 in one and 0 in the other).
5 Very Poor (Total 0–1): Accurately identifies 1 of the four macrostructures in one text and 0 in the other, OR unable to accurately identify any macrostructures in the texts.
0 No Response: Participant does not attempt response.
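The macrostructure bands above reduce to a lookup on the combined count of macrostructures identified across the two texts. The sketch below illustrates that mapping only. The function name and its arguments are hypothetical, and it follows the band table, which assigns 5 points to totals of 0 or 1.

```python
def macrostructure_score(found_text_a, found_text_b, attempted=True):
    """Map macrostructure counts (0-4 per text) to the rubric band score.

    `attempted=False` represents the "No Response" row of the rubric.
    """
    if not attempted:
        return 0
    for count in (found_text_a, found_text_b):
        if not 0 <= count <= 4:
            raise ValueError("each text has at most 4 macrostructures")
    total = found_text_a + found_text_b
    if total == 8:
        return 25  # Excellent
    if total >= 6:
        return 20  # Good (total 6-7)
    if total >= 4:
        return 15  # Fair (total 4-5)
    if total >= 2:
        return 10  # Poor (total 2-3)
    return 5       # Very Poor (total 0-1)
```

Because every listed combination within a band has the same total (for example, 4 + 2 and 3 + 3 both fall under Good), scoring on the sum alone reproduces each alternative spelled out in the rubric.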
Use of relevant details
5 Excellent: Effectively uses multiple relevant details as support; no irrelevant or erroneous details.
4 Good: Effectively uses relevant details as support; may include some inaccurate or irrelevant details. (More than 50% of details are relevant.)
3 Fair: Possibly uses one or two relevant details, yet several irrelevant or inaccurate details (erroneous or fabricated) appear in the synthesis. (50% or less of details are relevant.)
2 Poor: Erroneous or fabricated details used in attempt to support arguments presented; does not include relevant details.
1 Very Poor: No details used or listed.
0 No Response: Participant does not attempt response.
nonnative speakers, we conducted a parallel discriminant analysis for native speakers. Thus, for the native speakers we followed the same procedure, dividing the entire native speaker group (n = 105) into three levels of basic comprehension (high, mid, and low) based on score distances of .5 standard deviations from the sample mean of the Nelson–Denny test for these participants. Native speakers need not take reading comprehension tests when entering the university, so the three-way split was based entirely on our sample data. Participants were classified as high if their scores on the Nelson–Denny were greater than or equal to 135. Participants were classified as mid if their scores ranged between 119 and 134 on the Nelson–Denny. Participants were classified as low if their scores fell at or below 118 on the Nelson–Denny. Descriptive statistics for the entire native speaker sample on basic comprehension group membership appear in Table 11.
The discriminant analysis yielded one discriminant function with an eigenvalue of .31, responsible for 98.7% of the variance in outcomes. Wilks' Lambda for this function was .76, significant at .001; the associated Chi-Square value was large (26.39) and highly significant (p < .001), indicating the group centroids on the discriminant function for the three reading ability groups on the Reading to Learn/Reading to Integrate Composite were significantly different. Pooled within-groups correlations between discriminating variables showed that Reading to Learn correlated with the first discriminant function at a level of .81; Reading to Integrate correlated with the second discriminant function at .71. This contrasts with findings for the nonnative speakers, where scores on the combined new measures loaded significantly on only one discriminant function. For native speakers, then, there is evidence for two significant discriminant functions, although the first accounts for almost all of the variance. Although these two measures (Reading to Learn and Reading to Integrate) loaded on two separate discriminant functions, they still showed moderate correlations with the alternate function (note 6), justifying the composite calculations.
Table 11 Descriptive statistics for native speakers by Nelson–Denny reading ability groups (n = 101)
Reading ability group    n      Mean on Nelson–Denny    sd
High (≥ 135)             39     141.26                  5.19
Mid (119–134)            37     125.65                  5.09
Low (≤ 118)              25     102.88                  10.98
Total                    101    126.04                  16.52
Note: Total n = 101 for discriminant analysis due to loss of four cases in anomalous Reading to Integrate session.
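The three-way split of native speakers by Nelson–Denny score, described above as a band of .5 standard deviations around the sample mean, can be illustrated with a short sketch. The function names are hypothetical. The published cutoffs (135 and 118) are used for classification. The helper that derives the band is evaluated with the Table 11 totals (mean 126.04, sd 16.52), which are based on n = 101 rather than the n = 105 used to set the cutoffs, so it only approximates the published boundaries.

```python
# Cutoffs as reported in the text for the native speaker sample.
HIGH_CUTOFF = 135  # high: score >= 135
LOW_CUTOFF = 118   # low: score <= 118


def half_sd_bounds(mean, sd, width=0.5):
    """Return (lower, upper) bounds lying `width` standard deviations from the mean."""
    return mean - width * sd, mean + width * sd


def classify(score):
    """Assign a Nelson-Denny score to the high, mid, or low reading ability group."""
    if score >= HIGH_CUTOFF:
        return "high"
    if score <= LOW_CUTOFF:
        return "low"
    return "mid"


# Approximate check against the Table 11 totals.
lower, upper = half_sd_bounds(126.04, 16.52)
```

With these inputs the band runs from roughly 117.8 to 134.3, which is consistent with the reported integer cutoffs of 118 and 135.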
Results of discriminant analysis for native speakers, seen in Table 12, show a different pattern than that observed for nonnative speakers. Nearly three-fourths (71.8%) of the native speakers classified as high in the basic comprehension reading ability group remained high on the Reading to Learn/Reading to Integrate Composite. For the mid group, however, only 18.9% remained classified as mid. Just over half (52%) of the low reading ability group members remained low on the Reading to Learn/Reading to Integrate Composite. As with the nonnative speakers, participants in the mid category on basic comprehension showed the most frequent reclassification. Forty-eight of the 101 (47.5%) native speaker participants remained in the initial classification categories. Twenty-two (21.8%) were reclassified into a higher category, and 31 (30.7%) were reclassified into a lower category. Thus 53 participants (52.5%), over half of the sample, were classified differently based on their performance on the Reading to Learn/Reading to Integrate Composite.
6 Reading to Learn correlated with function 2 at −.58; Reading to Integrate with function 1 at .71.
Table 12 Discriminant analysis: comparison of native speaker reading ability groups with Reading to Learn/Reading to Integrate Composite (n = 101)
Reading ability group    Predicted group membership for Reading to       Initial classification
                         Learn/Reading to Integrate Composite            total
                         High      Mid       Low
Count
High (≥ 135)             28        3         8                           39
Mid (119–134)            10        7         20                          37
Low (≤ 118)              5         7         13                          25
Reclassification total   43        17        41                          101
Percentage
High (≥ 135)             71.8      7.7       20.5                        100.0
Mid (119–134)            27.0      18.9      54.1                        100.0
Low (≤ 118)              20.0      28.0      52.0                        100.0
Note: Total n = 101 for discriminant analysis due to loss of four cases in anomalous Reading to Integrate session.
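The reclassification figures reported for native speakers follow directly from the counts in Table 12. As a worked check, the sketch below recomputes them from the 3 × 3 table of initial group by predicted group. The variable and function names are introduced here for illustration only.

```python
# Rows: initial basic comprehension group (high, mid, low).
# Columns: predicted group on the Composite (high, mid, low).
TABLE_12 = [
    [28, 3, 8],    # initially high
    [10, 7, 20],   # initially mid
    [5, 7, 13],    # initially low
]


def reclassification_summary(matrix):
    """Count participants who stayed, moved to a higher group, or moved lower."""
    size = len(matrix)
    n = sum(sum(row) for row in matrix)
    same = sum(matrix[i][i] for i in range(size))
    # Groups are ordered high -> low, so a smaller column index than row
    # index means the participant moved to a higher group.
    higher = sum(matrix[i][j] for i in range(size) for j in range(i))
    lower = n - same - higher
    return {"n": n, "same": same, "higher": higher, "lower": lower}


def percent(count, n):
    """Percentage rounded to one decimal place, as in the article."""
    return round(100 * count / n, 1)


summary = reclassification_summary(TABLE_12)
```

The diagonal gives 28 + 7 + 13 = 48 participants who kept their initial classification, 10 + 5 + 7 = 22 moved to a higher group, and 3 + 8 + 20 = 31 moved to a lower group, which matches the 47.5%, 21.8%, and 30.7% stated in the text.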
V Interpretation
Analyses done to answer Research Questions 1 and 2 showed that performance on Reading to Learn and Reading to Integrate measures was significantly influenced by language background and level of education (graduate vs. undergraduate, for nonnative speakers only). Moreover, level of computer familiarity had no significant effect on Reading to Learn and Reading to Integrate performance.
Correlations showed that, as expected, all reading measures correlated to some degree, generally answering Research Question 3. The most interesting results came from the discriminant analyses because they showed a pattern of differential classification on the new tasks sensitive to initial reading ability. This pattern differed for nonnative speakers and native speakers. Examination of the reclassifications based on the new tasks revealed some of the problems of using a basic comprehension-only test to predict performance on more challenging literacy tasks. These results also imply the need for further work on new measures of advanced literacy skills such as Reading to Learn and Reading to Integrate to reflect trends in construct-driven assessment (Pellegrino et al., 1999).
For the nonnative speakers, most (81.8%) of the participants with TOEFL Reading Comprehension below 50 remained classified as low on the Reading to Learn/Reading to Integrate Composite, suggesting the existence of a lower threshold of academic English proficiency. Participants below this threshold were unlikely to perform well on tasks assessing Reading to Learn and Reading to Integrate. Even tasks involving only selection (such as BC1 and BC2) rather than production of responses were difficult for this group. However, for the nonnative speaker readers in the mid and high reading ability groups, basic comprehension level was not nearly as consistent a predictor of performance on the Reading to Learn/Reading to Integrate Composite. This was especially striking for those in the mid reading ability group. Approximately one fourth of the mid group dropped into the low category on the Reading to Learn/Reading to Integrate Composite, indicating that they experienced more difficulty in completing the new measures, but approximately one fourth could indeed categorize and synthesize information relatively well despite middling performance on basic comprehension. This finding suggests that for nonnative speaker readers in the mid reading ability category, basic comprehension measures were insufficient to predict their performance on the new measures.
Almost two-thirds of the high basic comprehension nonnative speakers remained high on the Reading to Learn/Reading to Integrate Composite, but one third dropped, suggesting that they could not categorize and synthesize information from texts as well as they could recognize information. For this third, the recognition-only task, multiple-choice basic comprehension, overestimated ability to manipulate and synthesize information. For the high and mid reading ability nonnative speaker groups, basic comprehension-only tests may then overestimate academic English proficiency.
Results for the native speakers showed a more definitive pattern of reclassification. For the low reading ability group, approximately half were misclassified, suggesting that the basic comprehension measure underrepresented their ability to categorize and synthesize information in academic texts. Most of the mid reading ability group were reclassified as lower on the Reading to Learn/Reading to Integrate Composite, showing that basic comprehension results overestimated ability to succeed on more challenging tasks. On the other hand, almost one third of the mid reading ability group did better on the Reading to Learn/Reading to Integrate Composite; for them, basic comprehension underrepresented their ability to complete more challenging reading tasks. For the high ability readers, nearly one third dropped on the Reading to Learn/Reading to Integrate Composite. For them, the basic comprehension measure overestimated their level of academic English proficiency. The other two thirds of the high reading ability group remained high, suggesting a higher threshold of academic English proficiency.
Considering results from both the nonnative speakers and native speakers, we conclude that the new tasks did assess something different from basic comprehension once a lower level threshold of basic academic English proficiency had been achieved. Examination of scatterplots based on discriminant functions indicated an obvious separation between participants who could and could not perform on the Reading to Learn/Reading to Integrate Composite, thus providing some tentative evidence for concurrent validity (Messick, 1989; Chapelle, 1999). We had hoped to find clear evidence of a hierarchy suggesting that Reading to Learn was demonstrably more difficult than basic comprehension, and Reading to Integrate demonstrably more difficult than Reading to Learn, but results did not yield an obvious hierarchy. Results did, however, suggest an even simpler pattern: a dichotomy. For those nonnative speakers above the lower threshold of English language proficiency, which in our data would be approximately 500, or 49 on the scaled TOEFL Reading Comprehension scores, some could perform well on the new measures and some could not, revealing two rather than three groups on the Reading to Learn/Reading to Integrate Composite. These results lead us to speculate that the new measures tap additional skills such as 'sophisticated discourse processes and critical thinking skills' (Enright and Schedl, 1999: 24) in addition to language proficiency. For native speakers, the discriminant loading suggested a possible hierarchy of difficulty, but the scatterplots and classification tables revealed a dichotomy. Reclassification based on the Reading to Learn/Reading to Integrate Composite nearly eliminated the mid basic comprehension reading ability category, with only 16.8% of the native speakers qualifying as mid on the Reading to Learn/Reading to Integrate Composite.
VI Conclusions
While we acknowledge that there are some limitations to this project, such as the time required to administer and score new measures, the results are illuminating and useful, and suggest some considerations for future research. The first consideration relates to appropriate classification of students based on academic reading abilities. This corresponds to the use of TOEFL as a gatekeeper by many institutions. For TOEFL test-takers whose current TOEFL scores would be in an intermediate to high intermediate range, the new tasks could assess additional abilities relevant to academic performance. For such participants, additional test development efforts are warranted. The majority of high reading ability nonnative speakers (64.9%) could perform the new measures; a third could not. Our results suggest that some unknown number of mid reading ability nonnative speakers are more capable of succeeding at more challenging academic reading tasks than their current level of basic comprehension assessment would indicate. At the same time, there are some high and mid reading ability participants who could not perform the more demanding tasks; admitting such students directly into university programs could result in failure. These results suggest that for most students at lower levels of basic comprehension (in our study, those with TOEFL scores below 490 to 500), development of new tasks is unnecessary. Nevertheless, our results indicate that a certain small percentage (in our sample, 8 students, 18.2%) of nonnative speakers classified as low reading ability on basic comprehension did in fact perform adequately on more challenging tasks like Reading to Learn and Reading to Integrate. Given the very large number of TOEFL test-takers around the world, this finding merits further research. It would be potentially unfair to exclude such students from university study based on basic comprehension-only measures such as the current TOEFL. We would urge institutions that use TOEFL to consider scores on these new, more demanding tasks if doing so would address institutional needs. According to interview data collected in conjunction with this project (Trites, 2000: Chapter 6), participants perceived that successful completion of the new tasks required a thorough understanding of the texts, whereas the multiple-choice tests were less demanding, requiring only superficial grasp of content. This finding corroborates the observation of Freedle and Kostin (1999: 3), who note that often examinees do not need to comprehend the accompanying text to answer the test item. Therefore, for all TOEFL test-takers, incorporating more challenging tasks such as the Reading to Learn and Reading to Integrate measures into typical English language instruction could have positive washback effects.
The second area of consideration is the focus on additional rele-vant research For predictive validity it is important to determine thecorrelation between the Reading to LearnReading to IntegrateComposite measures and actual academic performance in universityclasses Correlations might differ depending on the degree of catego-rization and synthesis required by different major fields of studyAnother area for future research is related to the novelty of theReading to Learn task as an assessment technique Current pedagog-ical trends in literacy instruction emphasize the use of graphic organ-izers as a means to understand and manipulate information in textsTo our knowledge graphic organizers are typically used as classroomactivities rather than assessment tools They have good potential foruse in assessment if students are familiar with them and if appropri-ate scoring systems can be developed This project demonstrates that it is possible though labor intensive to develop reliable scoringsystems Further research is needed to explore refinement of this tasktype and related scoring systems for use in testing programs withlarge numbers of test-takers and scorers Although the Reading toIntegrate task (generating a synthesis) was more familiar the scoringsystem was innovative because it reflected a readerrsquos ability torecognize textual frames as well as integrate information If testdevelopers are interested in the abilities of nonnative speaker readersto perform such tasks further development of similar tasks andscoring systems is warranted While a substantial investment of time
200 New tasks for reading comprehension tests
would be required to refine the administration and scoring systemsneeded for more complex tasks such as these it should be weighedagainst the possible danger of under-representing the likelihood ofacademic success based on results of the basic comprehension-onlymeasures still most often used to assess academic proficiency
Acknowledgements
This project was funded by a grant from Educational Testing Service as part of the TOEFL 2000 effort. We appreciate the cooperation of the ETS staff members who assisted in the development of passage-specific tasks and other aspects of the research; however, no official endorsement of Educational Testing Service should be inferred.
VII References
Bachman, L. 2000: Modern language testing at the turn of the century: assuring that what we count counts. Language Testing 17, 1–42.
Biber, D. 1993: Using register-diversified corpora for general language studies. Computational Linguistics 19, 219–41.
Britt, M., Rouet, J. and Perfetti, C. 1996: Using hypertext to study and reason about historical evidence. In Rouet, J., Levonen, J., Dillon, A. and Spiro, R., editors, Hypertext and cognition. Mahwah, NJ: Lawrence Erlbaum, 43–72.
Chapelle, C. 1999: Validity in language assessment. Annual Review of Applied Linguistics 19, 1–19.
Educational Testing Service 1997: TOEFL test and score manual. Princeton, NJ: Educational Testing Service.
Educational Testing Service 1998: Draft TOEFL 2000 research agenda framework: areas of research. Research agenda three. TOEFL 2000 internal document. Princeton, NJ: Educational Testing Service.
Eignor, D., Taylor, C., Kirsch, I. and Jamieson, J. 1998: Development of a scale for assessing the level of computer familiarity of TOEFL examinees. TOEFL Research Report No. 60. Princeton, NJ: Educational Testing Service.
Enright, M. and Schedl, M. 1999: Reading for a reason: using reader purpose to guide test design. TOEFL 2000 Internal Report. Princeton, NJ: Educational Testing Service.
Enright, M., Grabe, W., Mosenthal, P., Mulcahy-Ernt, P. and Schedl, M. 1998: A TOEFL 2000 framework for testing reading comprehension: a working paper. Princeton, NJ: Educational Testing Service.
Foltz, P. 1996: Comprehension, coherence, and strategies in hypertext. In Rouet, J., Levonen, J., Dillon, A. and Spiro, R., editors, Hypertext and cognition. Mahwah, NJ: Lawrence Erlbaum, 109–36.
Freedle, R. and Kostin, I. 1999: Does the text matter in a multiple choice test of comprehension? The case for the construct validity of TOEFL's minitalks. Language Testing 16, 2–32.
Goldman, S. 1997: Learning from text: reflections on the past and suggestions for the future. Discourse Processes 23, 357–98.
Hayes, J. and Hatch, J. 1999: Issues in measuring reliability: correlation versus percentage of agreement. Written Communication 16, 354–67.
Huberty, C. 1994: Applied discriminant analysis. New York: John Wiley.
Jamieson, J., Campbell, J., Norfleet, L. and Berbisada, N. 1993: Reliability of a computerized scoring routine for an open-ended task. System 21, 305–22.
Jamieson, J., Norfleet, L. and Berbisada, N. 1993: Successes, failures and dropouts in computer-assisted language lessons. Computer Assisted English Language Learning Journal 4, 12–20.
Klecka, W. 1980: Discriminant analysis. In Lewis-Beck, M., editor, Quantitative applications in the social sciences, Volume 19. Newbury Park, CA: Sage.
Lehto, M., Zhu, W. and Carpenter, B. 1995: The relative effectiveness of hypertext and text. International Journal of Human-Computer Interaction 7, 293–313.
McNamara, D. and Kintsch, W. 1996: Learning from texts: effects of prior knowledge and text coherence. Discourse Processes 22, 247–88.
Messick, S. 1989: Validity. In Linn, R.L., editor, Educational measurement, 3rd edition. New York: American Council on Education/Macmillan, 13–103.
Meyer, B. 1985a: Prose analyses: purposes, procedures, and problems. In Britton, B. and Black, J., editors, Understanding expository text. Hillsdale, NJ: Lawrence Erlbaum, 11–64.
Meyer, B. 1985b: Prose analysis: purposes, procedures, and problems: Part 2. In Britton, B. and Black, J., editors, Understanding expository text. Hillsdale, NJ: Lawrence Erlbaum, 269–97.
Monks, V. 1997, August/September: Two views, same waterway. National Wildlife 35, 36–37.
Pellegrino, J., Baxter, G. and Glaser, R. 1999: Addressing the 'two disciplines' problem: linking theories of cognition and learning with assessment and instructional practice. Review of Research in Education 24, 307–53.
Perfetti, C. 1997: Sentences, individual differences, and multiple texts: three issues in text comprehension. Discourse Processes 23, 337–55.
Perfetti, C., Britt, M.A. and Georgi, M. 1995: Text-based learning and reasoning: studies in history. Hillsdale, NJ: Lawrence Erlbaum.
Perfetti, C., Marron, M. and Foltz, P. 1996: Sources of comprehension failure: theoretical perspectives and case studies. In Cornoldi, C. and Oakhill, J., editors, Reading comprehension difficulties: processes and intervention. Mahwah, NJ: Lawrence Erlbaum, 137–65.
Reinking, D. 1988: Computer-mediated text and comprehension differences: the role of reading time, reader preference, and estimation of learning. Reading Research Quarterly 23, 484–500.
Reinking, D. and Schreiner, R. 1985: The effects of computer-mediated text on measures of reading comprehension and reading behavior. Reading Research Quarterly 20, 536–53.
Spivey, N. 1997: The constructivist metaphor: reading, writing, and the making of meaning. San Diego, CA: Academic Press.
Stevens, J. 1996: Applied multivariate statistics for the social sciences, 3rd edition. Mahwah, NJ: Lawrence Erlbaum.
Tabachnick, B. and Fidell, L. 1996: Using multivariate statistics, 3rd edition. New York: Harper Collins.
Taylor, C., Jamieson, J., Eignor, D. and Kirsch, I. 1998: The relationship between computer familiarity and performance on computer-based TOEFL test tasks. TOEFL Research Report No. 61. Princeton, NJ: Educational Testing Service.
Tennesen, M. 1997, November/December: On a clear day. National Parks 71, 26–9.
Trites, L. 2000: Beyond basic comprehension: reading to learn and reading to integrate for native and non-native speakers. Unpublished doctoral dissertation, Northern Arizona University, Flagstaff, AZ.
Van den Berg, S. and Watt, J. 1991: Effects of educational setting on student responses to structured hypertext. Journal of Computer-Based Instruction 18, 118–24.
Van Dijk, T.A. and Kintsch, W. 1983: Strategies of discourse comprehension. New York: Academic Press.
Wiley, J. and Voss, J. 1999: Constructing arguments from multiple sources: tasks that promote understanding and not just memory for text. Journal of Educational Psychology 91, 310–11.
Zimmerman, T. 1997, December 29: Filter it with billions and billions of oysters: how to revive the Chesapeake Bay. US News and World Report 123, 63.
Appendix 1a Chart completion task
Directions: Complete the following chart. Fill in as much detail as possible from the text read by categorizing the information into the different areas on the chart. Do not use single words for your responses; form your responses in phrases or complete sentences. Include examples from the text.
Make no judgments about the accuracy of causes or effects or the effectiveness of the solutions mentioned in the text. Solutions are seen as any action taken in response to the problem(s). Solutions can take many forms, such as proposed solutions, attempted solutions, or failed solutions. Also, space is provided under each category for examples. Examples are specific examples found in the text that are used by the author(s) to exemplify the problems, causes, effects, or solutions in the text. However, there may not be examples for every category.
Points will be awarded for correct responses only. There is no penalty for incorrect responses. Points will be awarded in the following manner:
Problems and Solutions: 10 points each
Causes and Effects: 5 points each
Examples: 1 point each
Problems        Causes          Effects         Solutions
Examples        Examples        Examples        Examples
Appendix 1b Reading to Learn scoring rubric (0–241 total possible)
Problems (10 points), 1 word (6 points), 40 maximum
• Air pollution/Smog in the National Parks
• Opposition or ignoring (or lack of cooperation) to environmental standards (regulations)
• No true 'point sources' (myriad of smaller pollution sources rather than one large source)
• National Parks limited jurisdiction of pollution sources outside of the parks
Causes (5 points), 1 word (3 points), 55 maximum
• Sulfur/nitrogen emissions
• Acid rain
• Ground-level ozone
• Urban/Industrial emissions
• Automobile emissions
• Power plants/factories/plants emissions
• Emissions from smokestacks (chimneys) and/or plants
• Emissions from kilns
• Unregulated pollution/smog (from other countries, Mexico)
• Smoke from controlled burns
• Emissions from large cities
Effects (5 points), 1 word (3 points), 30 maximum
• Trees and plants affected (injured), growth hindered
• Visibility decreased
• Metals loosened into waters (surface/ground water polluted)
• Nutrients removed (leached) from soil
• Public outcry over stance of big business/government
• Aquatic life injured (damaged)
Solutions (10 points), 1 word (6 points), 90 maximum
• Stricter environmental laws (resolutions, acts)
• 1977 Clean Air Act Amendments labeling NPs as Class I areas
• Regional haze regulations proposed by the EPA
• Reduction of allowable pollution standards from industry
• Clean Air Act set visibility goals
• Objecting to construction permits
• Identification of pollution sources
• Industry modifications/Installation of devices (scrubbers) to reduce pollution
• Public pressure to protect parks
Appendix 1b (continued)
Problems: Examples (1 point), 5 maximum
• Great Smoky Mountain National Park
• Grand Canyon National Park
• Smoky Mountain sources from Ohio, New York, Atlanta, etc.
• Grand Canyon sources from CA, NV, UT, AZ, NM, and Mexico
• TN Luttrell corp. building permit situation
Causes: Examples (1 point, or 5 points with overarching cause listed above)
• Automobile emissions
• Power plants/factories/plants emissions
• Emissions from smokestacks (chimneys)
• Emissions from kilns
• Unregulated pollution/smog (from other countries, Mexico)
• Smoke from controlled burns
• Emissions from large cities
Causes: Examples (1 point), 10 maximum
• Navajo Generating Station, Page, AZ (Power Plant, AZ)
• Power plants in TN and OH river valley
• Southern California Edison Plant, Laughlin, NV
• Tennessee Luttrell Kilns, TN
• Industry in AZ
• Cars in CA
• Smokestacks in NM, NV, UT
• Los Angeles
• Atlanta
• New York
Effects: Examples (1 point), 6 maximum
• Leaves of plants turning purple and brown (stippling)
• Hindering photosynthesis of plants and trees
• Reduction of visibility at Grand Canyon
• Visibility reduced 90% of the days
• See vague blue masses
• See half as far as in 1919
Solutions: Examples (1 point, or 10 points with overarching solution listed above)
• 1977 Clean Air Act Amendments labeling NPs as Class I areas
• Regional haze regulations proposed by the EPA
• Reduction of allowable pollution standards from industry
Solutions: Examples (1 point), 5 maximum
• Objecting to kilns in TN
• Researchers using scientific technology to ID sources (radioactive isotopes)
• Southern California Edison Plant in Laughlin, NV identified as point source
• Scrubbers installed at Navajo Generating Plant
• 10% reduction of pollution in 10–15 years (goal of no man-made pollution)
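The point values and maxima in the Appendix 1b rubric can be checked with a short calculation. The following is a minimal sketch, not the scoring procedure the authors actually ran: the names `Category`, `RUBRIC`, `max_total`, and `chart_score` are hypothetical, the reading of "1 word" as a reduced award for one-word responses is an assumption, and the higher award for overarching causes or solutions listed as examples is left out for simplicity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Category:
    full_points: int      # points for a response given as a phrase or sentence
    one_word_points: int  # assumed reduced points for a one-word response
    cap: int              # category maximum stated in the rubric
    example_cap: int      # maximum points for examples (1 point each)


# Values copied from the rubric; the structure itself is illustrative.
RUBRIC = {
    "Problems": Category(10, 6, 40, 5),
    "Causes": Category(5, 3, 55, 10),
    "Effects": Category(5, 3, 30, 6),
    "Solutions": Category(10, 6, 90, 5),
}


def max_total(rubric=RUBRIC):
    """Sum the category maxima and the example maxima."""
    return sum(c.cap + c.example_cap for c in rubric.values())


def chart_score(responses, rubric=RUBRIC):
    """Score a chart; responses maps category -> (full, one_word, examples) counts."""
    total = 0
    for name, (full, one_word, examples) in responses.items():
        c = rubric[name]
        earned = full * c.full_points + one_word * c.one_word_points
        total += min(earned, c.cap)            # category points are capped
        total += min(examples, c.example_cap)  # example points are capped
    return total


print(max_total())
print(chart_score({"Problems": (2, 1, 3), "Causes": (4, 0, 2)}))
```

Under these assumptions the category and example maxima sum to the 241 points named in the rubric title, which is a useful consistency check on the rubric itself.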
Appendix 2a Reading to Integrate task integration activity
Directions: For the next 15 minutes, reflect on what you read and compose a short essay that combines the information in the material read and makes connections across the range of ideas and arguments presented.
(Ruled lines for the participant's response followed these directions.)
Appendix 2b Reading to Integrate scoring rubric
Integration
50 Excellent: Integrates texts accurately and successfully on multiple levels and creates a true Documents Model. Generalizes at least two macrostructure concepts common across texts (this may be simply identifying the existence of a macrostructure followed by the supporting macrostructures from each text). Effectively integrates relevant support (i.e., details or macrostructures) to support generalizations and may still discuss each article separately to some degree. Integrates macrostructures present in both texts.
40 Good: Integrates through the creation of a well developed and accurate introduction and/or a conclusion, yet summarizes articles separately AND/OR generalizes one major macrostructure that is fully developed with support. Uses some relevant support.
30 Fair: Attempts to integrate through the creation of a partially developed, possibly inaccurate introductory AND/OR concluding statement usually related to the main topic of the articles but has no substantive development. Creates a Text and Situation Model of each text separately. Does not make generalizations for integration beyond the one statement. May make evaluative or editorial statements; however, these may be inaccurate.
20 Poor: Creates a well developed and accurate Text Model and Situation Model for each of the texts yet attempts no integrative connection across the texts.
10 Very Poor: Ineffectively or inaccurately attempts to summarize each article in a very reporting style, revealing little or no use of background knowledge, contextualization, or evaluation. Response is simply a recall of information from the texts. No introduction or conclusion OR participant confuses texts and sees separate texts as one issue (problem).
0 No Response: Participant does not attempt response or only addresses one article.
Macrostructures
Participants will receive one point for each macrostructure accurately identified in each text, with a minimum score of 4 awarded for the inability to accurately identify any macrostructures.
25 Excellent (Total 8): Accurately identifies all 4 macrostructures present in both texts.
20 Good (Total 6–7): Accurately identifies all 4 macrostructures in one text and 3 in the other OR accurately identifies 3 of the four macrostructures in both texts. (Or 4 in one and 2 in the other.)
15 Fair (Total 4–5): Accurately identifies 3 of the four macrostructures in one text and 2 of the four in the other. (Or 4 in one and 1 in the other.) OR accurately identifies 2 of the four macrostructures in both texts. (Or 4 in one and 0 in the other, or 3 in one and 1 in the other.)
10 Poor (Total 2–3): Accurately identifies 2 of the macrostructures in one text and 1 in the other. (Or 3 in one and 0 in the other.) Accurately identifies 1 of the four macrostructures in both texts. (Or 2 in one and 0 in the other.)
5 Very Poor (Total 0–1): Accurately identifies 1 of the four macrostructures in one text and 0 in the other OR unable to accurately identify any macrostructures in the texts.
0 No Response: Participant does not attempt response.
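The macrostructure bands amount to a lookup on the total number of macrostructures identified across the two texts. The sketch below shows that mapping under stated assumptions: the function name `macrostructure_band` and the `attempted` flag are hypothetical, and the band values are taken from the band list itself rather than from the introductory sentence about a minimum score.

```python
def macrostructure_band(text_a, text_b, attempted=True):
    """Map macrostructures identified in each text (0-4) to a rubric band.

    Returns a (score, label) pair. The `attempted` flag is an illustrative
    way to represent the No Response band.
    """
    if not attempted:
        return 0, "No Response"
    for count in (text_a, text_b):
        if not 0 <= count <= 4:
            raise ValueError("each text has between 0 and 4 macrostructures")
    total = text_a + text_b
    if total == 8:
        return 25, "Excellent"
    if total >= 6:
        return 20, "Good"
    if total >= 4:
        return 15, "Fair"
    if total >= 2:
        return 10, "Poor"
    return 5, "Very Poor"


print(macrostructure_band(4, 3))
print(macrostructure_band(3, 1))
```

Every combination listed in the band descriptions (for example 4 and 2, or 3 and 0) falls in the band its total implies, so a single threshold on the total is sufficient here.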
Use of relevant details
5 Excellent: Effectively uses multiple relevant details as support; no irrelevant or erroneous details.
4 Good: Effectively uses relevant details as support; may include some inaccurate or irrelevant details. (More than 50% of details are relevant.)
3 Fair: Possibly uses one or two relevant details, yet several irrelevant details or inaccurate details (erroneous or fabricated) appear in the synthesis. (50% or less of details are relevant.)
2 Poor: Erroneous or fabricated details used in attempt to support arguments presented; does not include relevant details.
1 Very Poor: No details used or listed.
0 No Response: Participant does not attempt response.
showed moderate correlations with the alternate function,6 justifying the composite calculations.
Results of discriminant analysis for native speakers, seen in Table 12, show a different pattern than that observed for nonnative speakers. Nearly three-fourths (71.8%) of the native speakers classified as high in the basic comprehension reading ability group remained high on the Reading to Learn/Reading to Integrate Composite. For the mid group, however, only 18.9% remained classified as mid. Just over half (52%) of the low reading ability group members remained low on the Reading to Learn/Reading to Integrate Composite. As with the nonnative speakers, participants in the mid category on basic comprehension showed the most frequent reclassification. Forty-eight of the 101 (47.5%) native speaker participants remained in the initial classification categories. Twenty-two (21.8%) were reclassified into a higher category, and 31 (30.7%) were reclassified into a lower category. Thus, 53 participants (52.5%), over half of the sample, were classified differently based on their performance on the Reading to Learn/Reading to Integrate Composite.
6 Reading to Learn correlated with function 2 at -.58; Reading to Integrate with function 1 at .71.
Table 12 Discriminant analysis: comparison of native speaker reading ability groups with Reading to Learn/Reading to Integrate Composite (n = 101)
Reading ability group      Predicted group membership for Reading to      Initial classification
                           Learn/Reading to Integrate Composite            total
                           High       Mid        Low
Count
  High (≥135)              28         3          8                         39
  Mid (119–134)            10         7          20                        37
  Low (≤118)               5          7          13                        25
  Reclassification total   43         17         41                        101
Percentage
  High (≥135)              71.8       7.7        20.5                      100.0
  Mid (119–134)            27.0       18.9       54.1                      100.0
  Low (≤118)               20.0       28.0       52.0                      100.0
Note: Total n = 101 for discriminant analysis due to loss of four cases in anomalous Reading to Integrate session.
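The percentages and reclassification counts reported for Table 12 follow directly from the cell counts. The sketch below recomputes them as a check; it is illustrative only and is not the discriminant analysis itself. The names `COUNTS`, `row_percentages`, and `reclassification_summary` are hypothetical.

```python
# Counts from Table 12. Rows are the initial basic comprehension groups;
# columns are predicted groups on the composite, in the order High, Mid, Low.
LEVELS = ["High", "Mid", "Low"]
COUNTS = {
    "High": [28, 3, 8],
    "Mid": [10, 7, 20],
    "Low": [5, 7, 13],
}


def row_percentages(counts):
    """Express each row as percentages of its row total, to one decimal."""
    return {
        group: [round(100 * n / sum(row), 1) for n in row]
        for group, row in counts.items()
    }


def reclassification_summary(counts, levels):
    """Count participants who stayed, moved to a higher, or moved to a lower group."""
    same = higher = lower = 0
    for i, group in enumerate(levels):
        for j, n in enumerate(counts[group]):
            if j == i:
                same += n
            elif j < i:      # a smaller column index is a higher category
                higher += n
            else:
                lower += n
    return same, higher, lower


total = sum(sum(row) for row in COUNTS.values())
same, higher, lower = reclassification_summary(COUNTS, LEVELS)
print(row_percentages(COUNTS))
print(same, higher, lower, total)
print([round(100 * n / total, 1) for n in (same, higher, lower)])
```

The diagonal cells give the participants who kept their initial classification; cells below the diagonal are moves to a higher category and cells above it are moves to a lower one.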
V Interpretation
Analyses done to answer Research Questions 1 and 2 showed that performance on Reading to Learn and Reading to Integrate measures was significantly influenced by language background and level of education (graduate vs. undergraduate, for nonnative speakers only). Moreover, level of computer familiarity had no significant effect on Reading to Learn and Reading to Integrate performance.
Correlations showed that, as expected, all reading measures correlated to some degree, generally answering Research Question 3. The most interesting results came from the discriminant analyses because they showed a pattern of differential classification on the new tasks sensitive to initial reading ability. This pattern differed for nonnative speakers and native speakers. Examination of the reclassifications based on the new tasks revealed some of the problems of using a basic comprehension-only test to predict performance on more challenging literacy tasks. These results also imply the need for further work on new measures of advanced literacy skills such as Reading to Learn and Reading to Integrate to reflect trends in construct-driven assessment (Pellegrino et al., 1999).
For the nonnative speakers, most (81.8%) of the participants with TOEFL Reading Comprehension below 50 remained classified as low on the Reading to Learn/Reading to Integrate Composite, suggesting the existence of a lower threshold of academic English proficiency. Participants below this threshold were unlikely to perform well on tasks assessing Reading to Learn and Reading to Integrate. Even tasks involving only selection (such as BC1 and BC2) rather than production of responses were difficult for this group. However, for the nonnative speaker readers in the mid and high reading ability groups, basic comprehension level was not nearly as consistent a predictor of performance on the Reading to Learn/Reading to Integrate Composite. This was especially striking for those in the mid reading ability group. Approximately one fourth of the mid group dropped into the low category on the Reading to Learn/Reading to Integrate Composite, indicating that they experienced more difficulty in completing the new measures, but approximately one fourth could indeed categorize and synthesize information relatively well, despite middling performance on basic comprehension. This finding suggests that, for nonnative speaker readers in the mid reading ability category, basic comprehension measures were insufficient to predict their performance on the new measures.
Almost two-thirds of the high basic comprehension nonnative speakers remained high on the Reading to Learn/Reading to Integrate Composite, but one third dropped, suggesting that they could not categorize and synthesize information from texts as well as they could recognize information. For this third, the recognition-only task, multiple-choice basic comprehension, overestimated ability to manipulate and synthesize information. For the high and mid reading ability nonnative speaker groups, basic comprehension-only tests may then overestimate academic English proficiency.
Results for the native speakers showed a more definitive pattern of reclassification. For the low reading ability group, approximately half were misclassified, suggesting that the basic comprehension measure underrepresented their ability to categorize and synthesize information in academic texts. Most of the mid reading ability group were reclassified as lower on the Reading to Learn/Reading to Integrate Composite, showing that basic comprehension results overestimated ability to succeed on more challenging tasks. On the other hand, almost one third of the mid reading ability group did better on the Reading to Learn/Reading to Integrate Composite; for them, basic comprehension underrepresented their ability to complete more challenging reading tasks. For the high ability readers, nearly one third dropped on the Reading to Learn/Reading to Integrate Composite. For them, the basic comprehension measure overestimated their level of academic English proficiency. The other two thirds of the high reading ability group remained high, suggesting a higher threshold of academic English proficiency.
Considering results from both the nonnative speakers and native speakers, we conclude that the new tasks did assess something different from basic comprehension once a lower level threshold of basic academic English proficiency had been achieved. Examination of scatterplots based on discriminant functions indicated an obvious separation between participants who could and could not perform on the Reading to Learn/Reading to Integrate Composite, thus providing some tentative evidence for concurrent validity (Messick, 1989; Chapelle, 1999). We had hoped to find clear evidence of a hierarchy, suggesting that Reading to Learn was demonstrably more difficult than basic comprehension and Reading to Integrate demonstrably more difficult than Reading to Learn, but results did not yield an obvious hierarchy. Results did, however, suggest an even simpler pattern: a dichotomy. For those nonnative speakers above the lower threshold of English language proficiency, which in our data would be approximately 500, or 49 on the scaled TOEFL Reading Comprehension scores, some could perform well on the new measures and some could not, revealing two rather than three groups on the Reading to Learn/Reading to Integrate Composite. These results lead us to speculate that the new measures tap additional skills such as 'sophisticated discourse processes and critical thinking skills' (Enright and Schedl, 1999: 24) in addition to language proficiency. For native speakers, the discriminant loading suggested a possible hierarchy of difficulty, but the scatterplots and classification tables revealed a dichotomy. Reclassification based on the Reading to Learn/Reading to Integrate Composite nearly eliminated the mid basic comprehension reading ability category, with only 16.8% of the native speakers qualifying as mid on the Reading to Learn/Reading to Integrate Composite.
VI Conclusions
While we acknowledge that there are some limitations to this project, such as the time required to administer and score new measures, the results are illuminating and useful, and suggest some considerations for future research. The first consideration relates to appropriate classification of students based on academic reading abilities. This corresponds to the use of TOEFL as a gatekeeper by many institutions. For TOEFL test-takers whose current TOEFL scores would be in an intermediate to high intermediate range, the new tasks could assess additional abilities relevant to academic performance. For such participants, additional test development efforts are warranted. The majority of high reading ability nonnative speakers (64.9%) could perform the new measures; a third could not. Our results suggest that some unknown number of mid reading ability nonnative speakers are more capable of succeeding at more challenging academic reading tasks than their current level of basic comprehension assessment would indicate. At the same time, there are some high and mid reading ability participants who could not perform the more demanding tasks; admitting such students directly into university programs could result in failure. These results suggest that for most students at lower levels of basic comprehension (in our study, those with TOEFL scores below 490 to 500), development of new tasks is unnecessary. Nevertheless, our results indicate that a certain small percentage (in our sample, 8 students, 18.2%) of nonnative speakers classified as low reading ability on basic comprehension did in fact perform adequately on more challenging tasks like Reading to Learn and Reading to Integrate. Given the very large number of TOEFL test-takers around the world, this finding merits further research. It would be potentially unfair to exclude such students from university study based on basic comprehension-only measures such as the current TOEFL. We would urge institutions that use TOEFL to consider scores on these new, more demanding tasks, if doing so would address institutional needs. According to interview data collected in conjunction with this project (Trites, 2000: Chapter 6), participants perceived that successful completion of the new tasks required a thorough understanding of the texts, whereas the multiple-choice tests were less demanding, requiring only superficial grasp of content. This finding corroborates the observation of Freedle and Kostin (1999: 3), who note that often examinees do not need to comprehend the accompanying text to answer the test item. Therefore, for all TOEFL test-takers, incorporating more challenging tasks such as the Reading to Learn and Reading to Integrate measures into typical English language instruction could have positive washback effects.
The second area of consideration is the focus on additional relevant research. For predictive validity, it is important to determine the correlation between the Reading to Learn/Reading to Integrate Composite measures and actual academic performance in university classes. Correlations might differ depending on the degree of categorization and synthesis required by different major fields of study. Another area for future research is related to the novelty of the Reading to Learn task as an assessment technique. Current pedagogical trends in literacy instruction emphasize the use of graphic organizers as a means to understand and manipulate information in texts. To our knowledge, graphic organizers are typically used as classroom activities rather than assessment tools. They have good potential for use in assessment if students are familiar with them and if appropriate scoring systems can be developed. This project demonstrates that it is possible, though labor intensive, to develop reliable scoring systems. Further research is needed to explore refinement of this task type and related scoring systems for use in testing programs with large numbers of test-takers and scorers. Although the Reading to Integrate task (generating a synthesis) was more familiar, the scoring system was innovative because it reflected a reader's ability to recognize textual frames as well as integrate information. If test developers are interested in the abilities of nonnative speaker readers to perform such tasks, further development of similar tasks and scoring systems is warranted. While a substantial investment of time would be required to refine the administration and scoring systems needed for more complex tasks such as these, it should be weighed against the possible danger of under-representing the likelihood of academic success based on results of the basic comprehension-only measures still most often used to assess academic proficiency.
Acknowledgements
This project was funded by a grant from Educational Testing Serviceas part of the TOEFL 2000 effort We appreciate the cooperation ofthe ETS staff members who assisted in the development of passagespecific tasks and other aspects of the research however no officialendorsement of Educational Testing Service should be inferred
VII References
Bachman, L. 2000: Modern language testing at the turn of the century: assuring that what we count counts. Language Testing 17, 1–42.
Biber, D. 1993: Using register-diversified corpora for general language studies. Computational Linguistics 19, 219–41.
Britt, M., Rouet, J. and Perfetti, C. 1996: Using hypertext to study and reason about historical evidence. In Rouet, J., Levonen, J., Dillon, A. and Spiro, R., editors, Hypertext and cognition. Mahwah, NJ: Lawrence Erlbaum, 43–72.
Chapelle, C. 1999: Validity in language assessment. Annual Review of Applied Linguistics 19, 1–19.
Educational Testing Service 1997: TOEFL test and score manual. Princeton, NJ: Educational Testing Service.
Educational Testing Service 1998: Draft TOEFL 2000 research agenda framework: areas of research. Research agenda three. TOEFL 2000 internal document. Princeton, NJ: Educational Testing Service.
Eignor, D., Taylor, C., Kirsch, I. and Jamieson, J. 1998: Development of a scale for assessing the level of computer familiarity of TOEFL examinees. TOEFL Research Report No. 60. Princeton, NJ: Educational Testing Service.
Enright, M. and Schedl, M. 1999: Reading for a reason: using reader purpose to guide test design. TOEFL 2000 Internal Report. Princeton, NJ: Educational Testing Service.
Enright, M., Grabe, W., Mosenthal, P., Mulcahy-Ernt, P. and Schedl, M. 1998: A TOEFL 2000 framework for testing reading comprehension: a working paper. Princeton, NJ: Educational Testing Service.
Foltz, P. 1996: Comprehension, coherence, and strategies in hypertext. In Rouet, J., Levonen, J., Dillon, A. and Spiro, R., editors, Hypertext and cognition. Mahwah, NJ: Lawrence Erlbaum, 109–36.
Freedle, R. and Kostin, I. 1999: Does the text matter in a multiple choice test of comprehension? The case for the construct validity of TOEFL's minitalks. Language Testing 16, 2–32.
Goldman, S. 1997: Learning from text: reflections on the past and suggestions for the future. Discourse Processes 23, 357–98.
Hayes, J. and Hatch, J. 1999: Issues in measuring reliability: correlation versus percentage of agreement. Written Communication 16, 354–67.
Huberty, C. 1994: Applied discriminant analysis. New York: John Wiley.
Jamieson, J., Campbell, J., Norfleet, L. and Berbisada, N. 1993: Reliability of a computerized scoring routine for an open-ended task. System 21, 305–22.
Jamieson, J., Norfleet, L. and Berbisada, N. 1993: Successes, failures, and dropouts in computer-assisted language lessons. Computer Assisted English Language Learning Journal 4, 12–20.
Klecka, W. 1980: Discriminant analysis. In Lewis-Beck, M., editor, Quantitative applications in the social sciences, Volume 19. Newbury Park, CA: Sage.
Lehto, M., Zhu, W. and Carpenter, B. 1995: The relative effectiveness of hypertext and text. International Journal of Human-Computer Interaction 7, 293–313.
McNamara, D. and Kintsch, W. 1996: Learning from texts: effects of prior knowledge and text coherence. Discourse Processes 22, 247–88.
Messick, S. 1989: Validity. In Linn, R.L., editor, Educational measurement, 3rd edition. New York: American Council on Education/Macmillan, 13–103.
Meyer, B. 1985a: Prose analyses: purposes, procedures, and problems. In Britton, B. and Black, J., editors, Understanding expository text. Hillsdale, NJ: Lawrence Erlbaum, 11–64.
Meyer, B. 1985b: Prose analysis: purposes, procedures, and problems, Part 2. In Britton, B. and Black, J., editors, Understanding expository text. Hillsdale, NJ: Lawrence Erlbaum, 269–97.
Monks, V. 1997, August/September: Two views, same waterway. National Wildlife 35, 36–37.
Pellegrino, J., Baxter, G. and Glaser, R. 1999: Addressing the 'two disciplines' problem: linking theories of cognition and learning with assessment and instructional practice. Review of Research in Education 24, 307–53.
Perfetti, C. 1997: Sentences, individual differences, and multiple texts: three issues in text comprehension. Discourse Processes 23, 337–55.
Perfetti, C., Britt, M.A. and Georgi, M. 1995: Text-based learning and reasoning: studies in history. Hillsdale, NJ: Lawrence Erlbaum.
Perfetti, C., Marron, M. and Foltz, P. 1996: Sources of comprehension failure: theoretical perspectives and case studies. In Cornoldi, C. and Oakhill, J., editors, Reading comprehension difficulties: processes and intervention. Mahwah, NJ: Lawrence Erlbaum, 137–65.
Reinking, D. 1988: Computer-mediated text and comprehension differences: the role of reading time, reader preference, and estimation of learning. Reading Research Quarterly 23, 484–500.
Reinking, D. and Schreiner, R. 1985: The effects of computer-mediated text on measures of reading comprehension and reading behavior. Reading Research Quarterly 20, 536–53.
Spivey, N. 1997: The constructivist metaphor: reading, writing, and the making of meaning. San Diego, CA: Academic Press.
Stevens, J. 1996: Applied multivariate statistics for the social sciences, 3rd edition. Mahwah, NJ: Lawrence Erlbaum.
Tabachnick, B. and Fidell, L. 1996: Using multivariate statistics, 3rd edition. New York: Harper Collins.
Taylor, C., Jamieson, J., Eignor, D. and Kirsch, I. 1998: The relationship between computer familiarity and performance on computer-based TOEFL test tasks. TOEFL Research Report No. 61. Princeton, NJ: Educational Testing Service.
Tennesen, M. 1997, November/December: On a clear day. National Parks 71, 26–9.
Trites, L. 2000: Beyond basic comprehension: reading to learn and reading to integrate for native and non-native speakers. Unpublished doctoral dissertation, Northern Arizona University, Flagstaff, AZ.
Van den Berg, S. and Watt, J. 1991: Effects of educational setting on student responses to structured hypertext. Journal of Computer-Based Instruction 18, 118–24.
Van Dijk, T.A. and Kintsch, W. 1983: Strategies of discourse comprehension. New York: Academic Press.
Wiley, J. and Voss, J. 1999: Constructing arguments from multiple sources: tasks that promote understanding and not just memory for text. Journal of Educational Psychology 91, 310–11.
Zimmerman, T. 1997, December 29: Filter it with billions and billions of oysters: how to revive the Chesapeake Bay. U.S. News and World Report 123, 63.
Appendix 1a Chart completion task
Directions: Complete the following chart. Fill in as much detail as possible from the text read by categorizing the information into the different areas on the chart. Do not use single words for your responses; form your responses in phrases or complete sentences. Include examples from the text.
Make no judgments about the accuracy of causes or effects or the effectiveness of the solutions mentioned in the text. Solutions are seen as any action taken in response to the problem(s). Solutions can take many forms, such as proposed solutions, attempted solutions, or failed solutions. Also, space is provided under each category for examples. Examples are specific examples found in the text that are used by the author(s) to exemplify the problems, causes, effects, or solutions in the text. However, there may not be examples for every category.
Points will be awarded for correct responses only. There is no penalty for incorrect responses. Points will be awarded in the following manner:
Problems and Solutions: 10 points each
Causes and Effects: 5 points each
Examples: 1 point each
Problems Causes Effects Solutions
Examples Examples Examples Examples
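The point scheme in the directions can be illustrated with a minimal sketch. The point values come from the directions; the function name, the dictionary layout and the sample counts are hypothetical, and the sketch ignores the per-category maxima and one-word partial credit that the scoring rubric in Appendix 1b adds.

```python
# Point values taken from the chart directions; all names here are
# hypothetical and chosen only for this illustration.
POINTS = {"problems": 10, "solutions": 10, "causes": 5, "effects": 5, "examples": 1}


def chart_score(correct_counts):
    """Sum points for correct responses only; incorrect responses carry no penalty."""
    return sum(POINTS[category] * n for category, n in correct_counts.items())


# Invented response sheet: number of correct entries per category.
sample = {"problems": 2, "causes": 3, "effects": 1, "solutions": 2, "examples": 4}
print(chart_score(sample))  # 2*10 + 3*5 + 1*5 + 2*10 + 4*1 = 64
```

Because incorrect entries simply earn nothing, the score depends only on the count of correct entries in each category.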
Appendix 1b Reading to Learn scoring rubric (0–241 total possible)

Problems (10 points; 1 word, 6 points); 40 maximum
• Air pollution/smog in the National Parks
• Opposition or ignoring (or lack of cooperation) to environmental standards (regulations)
• No true 'point sources' (myriad of smaller pollution sources rather than one large source)
• National Parks limited jurisdiction of pollution sources outside of the parks

Causes (5 points; 1 word, 3 points); 55 maximum
• Sulfur, nitrogen emissions
• Acid rain
• Ground-level ozone
• Urban, industrial emissions
• Automobile emissions
• Power plants, factories/plants emissions
• Emissions from smokestacks (chimneys)
• Emissions from kilns
• Unregulated pollution/smog (from other countries, Mexico)
• Smoke from controlled burns
• Emissions from large cities

Effects (5 points; 1 word, 3 points); 30 maximum
• Trees and plants affected (injured), growth hindered
• Visibility decreased
• Metals loosened into waters (surface/ground water, water polluted)
• Nutrients removed (leached) from soil and/or plants
• Public outcry over stance of big business/government
• Aquatic life injured (damaged)

Solutions (10 points; 1 word, 6 points); 90 maximum
• Stricter environmental laws (resolutions, acts)
• 1977 Clean Air Act Amendments labeling NPs as Class I areas
• Regional haze regulations proposed by the EPA
• Reduction of allowable pollution standards from industry
• Clean Air Act set visibility goals
• Objecting to construction permits
• Identification of pollution sources
• Industry modifications: installation of devices (scrubbers) to reduce pollution
• Public pressure to protect parks

Examples for Problems (1 point); 5 maximum
• Great Smoky Mountain National Park
• Grand Canyon National Park
• Smoky Mountain sources from Ohio, New York, Atlanta, etc.
• Grand Canyon sources from CA, NV, UT, AZ, NM and Mexico
• TN Luttrell corp. building permit situation

Examples for Causes (1 point, or 5 points with overarching cause listed above)
• Automobile emissions
• Power plants, factories/plants emissions
• Emissions from smokestacks (chimneys)
• Emissions from kilns
• Unregulated pollution/smog (from other countries, Mexico)
• Smoke from controlled burns
• Emissions from large cities
Examples (1 point); 10 maximum
• Navajo Generating Station, Page, AZ (power plant, AZ)
• Power plants in TN and OH river valley
• Southern California Edison plant, Laughlin, NV
• Tennessee Luttrell kilns, TN
• Industry in AZ
• Cars in CA
• Smokestacks in NM, NV, UT
• Los Angeles
• Atlanta
• New York

Examples for Effects (1 point); 6 maximum
• Leaves of plants turning purple and brown (stippling)
• Hindering photosynthesis of plants and trees
• Reduction of visibility at Grand Canyon
• Visibility reduced 90% of the days
• See vague blue masses
• See half as far as in 1919

Examples for Solutions (1 point, or 10 points with overarching solution listed above)
• 1977 Clean Air Act Amendments labeling NPs as Class I areas
• Regional haze regulations proposed by the EPA
• Reduction of allowable pollution standards from industry
Examples (1 point); 5 maximum
• Objecting to kilns in TN
• Researchers using scientific technology to ID sources (radioactive isotopes)
• Southern California Edison plant in Laughlin, NV identified as point source
• Scrubbers installed at Navajo Generating Plant
• 10% reduction of pollution in 10–15 years (goal of no man-made pollution)
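As a check on the rubric's arithmetic, the following sketch adds up the category maxima and confirms that they reach the stated total of 241. The numbers are read from the rubric; the assignment of the example maxima (5, 10, 6, 5) to the four columns is our reading of the extracted table, and all variable and function names are hypothetical.

```python
# Per-category values as read from the rubric (hypothetical variable names).
MAIN = {  # category: (points per full response, category maximum)
    "problems": (10, 40),
    "causes": (5, 55),
    "effects": (5, 30),
    "solutions": (10, 90),
}
# Example maxima; the column assignment is an assumption, the sum is not.
EXAMPLE_MAX = {"problems": 5, "causes": 10, "effects": 6, "solutions": 5}

max_main = sum(cap for _, cap in MAIN.values())
max_examples = sum(EXAMPLE_MAX.values())
print(max_main, max_examples, max_main + max_examples)  # 215 26 241


def capped(category, full_items):
    """Points for a number of full-credit responses, limited by the category maximum."""
    points, cap = MAIN[category]
    return min(points * full_items, cap)
```

Each category maximum is a whole multiple of its per-response value (for example, 40 = 4 × 10 for Problems), which matches the number of scorable items listed in each column.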
Appendix 2a Reading to Integrate task integration activity
Directions: For the next 15 minutes, reflect on what you read and compose a short essay that combines the information in the material read and makes connections across the range of ideas and arguments presented.
[Ruled lines for the participant's essay]
Appendix 2b Reading to Integrate scoring rubric
Integration

50 Excellent: Integrates texts accurately and successfully on multiple levels and creates a true Documents Model. Generalizes at least two macrostructure concepts common across texts (this may be simply identifying the existence of a macrostructure followed by the supporting macrostructures from each text). Effectively integrates relevant support (i.e., details or macrostructures) to support generalizations and may still discuss each article separately to some degree. Integrates macrostructures present in both texts.
40 Good: Integrates through the creation of a well-developed and accurate introduction and/or a conclusion, yet summarizes articles separately, AND/OR generalizes one major macrostructure that is fully developed with support. Uses some relevant support.
30 Fair: Attempts to integrate through the creation of a partially developed, possibly inaccurate introductory AND/OR concluding statement, usually related to the main topic of the articles, but has no substantive development. Creates a Text and Situation Model of each text separately. Does not make generalizations for integration beyond the one statement. May make evaluative or editorial statements; however, these may be inaccurate.
20 Poor: Creates a well-developed and accurate Text Model and Situation Model for each of the texts, yet attempts no integrative connection across the texts.
10 Very Poor: Ineffectively or inaccurately attempts to summarize each article in a very reporting style, revealing little or no use of background knowledge, contextualization, or evaluation. Response is simply a recall of information from the texts. No introduction or conclusion, OR participant confuses texts and sees separate texts as one issue (problem).
0 No Response: Participant does not attempt response or only addresses one article.
Macrostructures

Participants will receive one point for each macrostructure accurately identified in each text, with a minimum score of 4 awarded for the inability to accurately identify any macrostructures.

25 Excellent (Total 8): Accurately identifies all 4 macrostructures present in both texts.
20 Good (Total 6 or 7): Accurately identifies all 4 macrostructures in one text and 3 in the other, OR accurately identifies 3 of the four macrostructures in both texts (or 4 in one and 2 in the other).
15 Fair (Total 4 or 5): Accurately identifies 3 of the four macrostructures in one text and 2 of the four in the other (or 4 in one and 1 in the other), OR accurately identifies 2 of the four macrostructures in both texts (or 4 in one and 0 in the other, or 3 in one and 1 in the other).
10 Poor (Total 2 or 3): Accurately identifies 2 of the macrostructures in one text and 1 in the other (or 3 in one and 0 in the other), OR accurately identifies 1 of the four macrostructures in both texts (or 2 in one and 0 in the other).
5 Very Poor (Total 0 or 1): Accurately identifies 1 of the four macrostructures in one text and 0 in the other, OR unable to accurately identify any macrostructures in the texts.
0 No Response: Participant does not attempt response.
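The macrostructure bands reduce to a lookup on the combined count across the two texts. The sketch below is one way to express that mapping; the function name and the `attempted` flag are hypothetical, and the score values are used exactly as they are printed in this copy of the rubric.

```python
def macrostructure_score(text_a, text_b, attempted=True):
    """Map macrostructures identified in two texts (0-4 each) to a rubric score."""
    if not attempted:
        return 0  # "No Response" band
    if not (0 <= text_a <= 4 and 0 <= text_b <= 4):
        raise ValueError("each text has four macrostructures")
    total = text_a + text_b
    if total == 8:
        return 25  # Excellent
    if total >= 6:
        return 20  # Good (total 6 or 7)
    if total >= 4:
        return 15  # Fair (total 4 or 5)
    if total >= 2:
        return 10  # Poor (total 2 or 3)
    return 5       # Very Poor (total 0 or 1)


print(macrostructure_score(4, 3))  # 20
print(macrostructure_score(3, 1))  # 15
```

Only the combined total matters: every split listed within a band (for example, 4 and 2 or 3 and 3) yields the same score.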
Use of relevant details
5 Excellent: Effectively uses multiple relevant details as support; no irrelevant or erroneous details.
4 Good: Effectively uses relevant details as support; may include some inaccurate or irrelevant details. (More than 50% of details are relevant.)
3 Fair: Possibly uses one or two relevant details, yet several irrelevant details or inaccurate details (erroneous or fabricated) appear in the synthesis. (50% or less of details are relevant.)
2 Poor: Erroneous or fabricated details used in attempt to support arguments presented; does not include relevant details.
1 Very Poor: No details used or listed.
0 No Response: Participant does not attempt response.
V Interpretation
Analyses done to answer Research Questions 1 and 2 showed that performance on Reading to Learn and Reading to Integrate measures was significantly influenced by language background and level of education (graduate vs. undergraduate, for nonnative speakers only). Moreover, level of computer familiarity had no significant effect on Reading to Learn and Reading to Integrate performance.
Correlations showed that, as expected, all reading measures correlated to some degree, generally answering Research Question 3. The most interesting results came from the discriminant analyses because they showed a pattern of differential classification on the new tasks sensitive to initial reading ability. This pattern differed for nonnative speakers and native speakers. Examination of the reclassifications based on the new tasks revealed some of the problems of using a basic comprehension-only test to predict performance on more challenging literacy tasks. These results also imply the need for further work on new measures of advanced literacy skills, such as Reading to Learn and Reading to Integrate, to reflect trends in construct-driven assessment (Pellegrino et al., 1999).
For the nonnative speakers, most (81.8%) of the participants with TOEFL Reading Comprehension below 50 remained classified as low on the Reading to Learn/Reading to Integrate Composite, suggesting the existence of a lower threshold of academic English proficiency. Participants below this threshold were unlikely to perform well on tasks assessing Reading to Learn and Reading to Integrate. Even tasks involving only selection (such as BC1 and BC2) rather than production of responses were difficult for this group. However, for the nonnative speaker readers in the mid and high reading ability groups, basic comprehension level was not nearly as consistent a predictor of performance on the Reading to Learn/Reading to Integrate Composite. This was especially striking for those in the mid reading ability group. Approximately one fourth of the mid group dropped into the low category on the Reading to Learn/Reading to Integrate Composite, indicating that they experienced more difficulty in completing the new measures, but approximately one fourth could indeed categorize and synthesize information relatively well despite middling performance on basic comprehension. This finding suggests that, for nonnative speaker readers in the mid reading ability category, basic comprehension measures were insufficient to predict their performance on the new measures.
Almost two-thirds of the high basic comprehension nonnative speakers remained high on the Reading to Learn/Reading to Integrate Composite, but one third dropped, suggesting that they could not categorize and synthesize information from texts as well as they could recognize information. For this third, the recognition-only task, multiple-choice basic comprehension, overestimated ability to manipulate and synthesize information. For the high and mid reading ability nonnative speaker groups, basic comprehension-only tests may then overestimate academic English proficiency.
Results for the native speakers showed a more definitive pattern of reclassification. For the low reading ability group, approximately half were misclassified, suggesting that the basic comprehension measure underrepresented their ability to categorize and synthesize information in academic texts. Most of the mid reading ability group were reclassified as lower on the Reading to Learn/Reading to Integrate Composite, showing that basic comprehension results overestimated ability to succeed on more challenging tasks. On the other hand, almost one third of the mid reading ability group did better on the Reading to Learn/Reading to Integrate Composite; for them, basic comprehension underrepresented their ability to complete more challenging reading tasks. For the high ability readers, nearly one third dropped on the Reading to Learn/Reading to Integrate Composite. For them, the basic comprehension measure overestimated their level of academic English proficiency. The other two thirds of the high reading ability group remained high, suggesting a higher threshold of academic English proficiency.
Considering results from both the nonnative speakers and native speakers, we conclude that the new tasks did assess something different from basic comprehension once a lower level threshold of basic academic English proficiency had been achieved. Examination of scatterplots based on discriminant functions indicated an obvious separation between participants who could and could not perform on the Reading to Learn/Reading to Integrate Composite, thus providing some tentative evidence for concurrent validity (Messick, 1989; Chapelle, 1999). We had hoped to find clear evidence of a hierarchy, suggesting that Reading to Learn was demonstrably more difficult than basic comprehension and Reading to Integrate demonstrably more difficult than Reading to Learn, but results did not yield an obvious hierarchy. Results did, however, suggest an even simpler pattern: a dichotomy. For those nonnative speakers above the lower threshold of English language proficiency, which in our data would be approximately 500, or 49 on the scaled TOEFL Reading Comprehension scores, some could perform well on the new measures and some could not, revealing two rather than three groups on the Reading to Learn/Reading to Integrate Composite. These results lead us to speculate that the new measures tap additional skills, such as 'sophisticated discourse processes and critical thinking skills' (Enright and Schedl, 1999: 24), in addition to language proficiency. For native speakers, the discriminant loading suggested a possible hierarchy of difficulty, but the scatterplots and classification tables revealed a dichotomy. Reclassification based on the Reading to Learn/Reading to Integrate Composite nearly eliminated the mid basic comprehension reading ability category, with only 16.8% of the native speakers qualifying as mid on the Reading to Learn/Reading to Integrate Composite.
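The reclassification percentages discussed in this section are, in effect, row percentages of a cross-tabulation of basic comprehension group against composite group. The sketch below shows only that tabulation step, on invented toy data; it does not reproduce the discriminant analysis that produced the study's classifications, and the function and variable names are hypothetical.

```python
from collections import Counter

LEVELS = ["low", "mid", "high"]


def reclassification_rates(basic, composite):
    """Percent of each basic-comprehension group that stayed, dropped, or rose."""
    counts = {level: Counter() for level in LEVELS}
    for b, c in zip(basic, composite):
        diff = LEVELS.index(c) - LEVELS.index(b)
        outcome = "stayed" if diff == 0 else ("rose" if diff > 0 else "dropped")
        counts[b][outcome] += 1
    rates = {}
    for level, tally in counts.items():
        n = sum(tally.values())
        if n:  # skip groups with no participants
            rates[level] = {k: round(100 * v / n, 1) for k, v in tally.items()}
    return rates


# Invented toy data, not the study's: four readers from the mid group.
basic = ["mid", "mid", "mid", "mid"]
composite = ["low", "mid", "mid", "high"]
print(reclassification_rates(basic, composite))
# {'mid': {'dropped': 25.0, 'stayed': 50.0, 'rose': 25.0}}
```

Read this way, a statement such as "one third dropped" is the "dropped" share within a single basic comprehension group, not a share of the whole sample.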
VI Conclusions
While we acknowledge that there are some limitations to this project, such as the time required to administer and score new measures, the results are illuminating and useful, and they suggest some considerations for future research. The first consideration relates to appropriate classification of students based on academic reading abilities. This corresponds to the use of TOEFL as a gatekeeper by many institutions. For TOEFL test-takers whose current TOEFL scores would be in an intermediate to high intermediate range, the new tasks could assess additional abilities relevant to academic performance. For such participants, additional test development efforts are warranted. The majority of high reading ability nonnative speakers (64.9%) could perform the new measures; a third could not. Our results suggest that some unknown number of mid reading ability nonnative speakers are more capable of succeeding at more challenging academic reading tasks than their current level of basic comprehension assessment would indicate. At the same time, there are some high and mid reading ability participants who could not perform the more demanding tasks; admitting such students directly into university programs could result in failure. These results suggest that, for most students at lower levels of basic comprehension (in our study, those with TOEFL scores below 490 to 500), development of new tasks is unnecessary. Nevertheless, our results indicate that a certain small percentage (in our sample, 8 students, 18.2%) of nonnative speakers classified as low reading ability on basic comprehension did in fact perform adequately on more challenging tasks like Reading to Learn and Reading to Integrate. Given the very large number of TOEFL test-takers around the world, this finding merits further research. It would be potentially unfair to exclude such students from university study based on basic comprehension-only measures such as the current TOEFL. We would urge institutions that use TOEFL to consider scores on these new, more demanding tasks if doing so would address institutional needs. According to interview data collected in conjunction with this project (Trites, 2000: Chapter 6), participants perceived that successful completion of the new tasks required a thorough understanding of the texts, whereas the multiple-choice tests were less demanding, requiring only superficial grasp of content. This finding corroborates the observation of Freedle and Kostin (1999: 3), who note that often examinees do not need to comprehend the accompanying text to answer the test item. Therefore, for all TOEFL test-takers, incorporating more challenging tasks such as the Reading to Learn and Reading to Integrate measures into typical English language instruction could have positive washback effects.
The second area of consideration is the focus on additional rele-vant research For predictive validity it is important to determine thecorrelation between the Reading to LearnReading to IntegrateComposite measures and actual academic performance in universityclasses Correlations might differ depending on the degree of catego-rization and synthesis required by different major fields of studyAnother area for future research is related to the novelty of theReading to Learn task as an assessment technique Current pedagog-ical trends in literacy instruction emphasize the use of graphic organ-izers as a means to understand and manipulate information in textsTo our knowledge graphic organizers are typically used as classroomactivities rather than assessment tools They have good potential foruse in assessment if students are familiar with them and if appropri-ate scoring systems can be developed This project demonstrates that it is possible though labor intensive to develop reliable scoringsystems Further research is needed to explore refinement of this tasktype and related scoring systems for use in testing programs withlarge numbers of test-takers and scorers Although the Reading toIntegrate task (generating a synthesis) was more familiar the scoringsystem was innovative because it reflected a readerrsquos ability torecognize textual frames as well as integrate information If testdevelopers are interested in the abilities of nonnative speaker readersto perform such tasks further development of similar tasks andscoring systems is warranted While a substantial investment of time
200 New tasks for reading comprehension tests
would be required to refine the administration and scoring systemsneeded for more complex tasks such as these it should be weighedagainst the possible danger of under-representing the likelihood ofacademic success based on results of the basic comprehension-onlymeasures still most often used to assess academic proficiency
Acknowledgements
This project was funded by a grant from Educational Testing Serviceas part of the TOEFL 2000 effort We appreciate the cooperation ofthe ETS staff members who assisted in the development of passagespecific tasks and other aspects of the research however no officialendorsement of Educational Testing Service should be inferred
VII References
Bachman L 2000 Modern language testing at the turn of the century assur-ing that what we count counts Language Testing 17 1ndash42
Biber D 1993 Using register-diversified corpora for general language studiesComputational Linguistics 19 219ndash41
Britt M Rouet J and Perfetti C 1996 Using hypertext to study and reasonabout historical evidence In Rouet J Levonen J Dillon A and Spiro Reditors Hypertext and cognition Mahwah NJ Lawrence Erlbaum 43ndash72
Chapelle C 1999 Validity in language assessment Annual Review of AppliedLinguistics 19 1ndash19
Educational Testing Service 1997 TOEFL test and score manual PrincetonNJ Educational Testing Service
mdashmdash 1998 Draft TOEFL 2000 research agenda framework areas of researchResearch agenda three TOEFL 2000 internal document Princeton NJEducational Testing Service
Eignor D Taylor C Kirsch I and Jamieson J 1998 Development of ascale for assessing the level of computer familiarity of TOEFL exami-nees TOEFL Research Report No 60 Princeton NJ EducationalTesting Service
Enright M and Schedl M 1999 Reading for a reason using reader purposeto guide test design TOEFL 2000 Internal Report Princeton NJEducational Testing Service
Enright M Grabe W Mosenthal P Mulcahy-Ernt P and Schedl M1998 A TOEFL 2000 framework for testing reading comprehension aworking paper Princeton NJ Educational Testing Service
Foltz, P. 1996: Comprehension, coherence and strategies in hypertext. In Rouet, J., Levonen, J., Dillon, A. and Spiro, R., editors, Hypertext and cognition. Mahwah, NJ: Lawrence Erlbaum, 109–36.
Freedle, R. and Kostin, I. 1999: Does the text matter in a multiple-choice test of comprehension? The case for the construct validity of TOEFL’s minitalks. Language Testing 16, 2–32.
Goldman, S. 1997: Learning from text: reflections on the past and suggestions for the future. Discourse Processes 23, 357–98.
Hayes, J. and Hatch, J. 1999: Issues in measuring reliability: correlation versus percentage of agreement. Written Communication 16, 354–67.
Huberty, C. 1994: Applied discriminant analysis. New York: John Wiley.
Jamieson, J., Campbell, J., Norfleet, L. and Berbisada, N. 1993: Reliability of a computerized scoring routine for an open-ended task. System 21, 305–22.
Jamieson, J., Norfleet, L. and Berbisada, N. 1993: Successes, failures and dropouts in computer-assisted language lessons. Computer Assisted English Language Learning Journal 4, 12–20.
Klecka, W. 1980: Discriminant analysis. In Lewis-Beck, M., editor, Quantitative applications in the social sciences, Volume 19. Newbury Park, CA: Sage.
Lehto, M., Zhu, W. and Carpenter, B. 1995: The relative effectiveness of hypertext and text. International Journal of Human-Computer Interaction 7, 293–313.
McNamara, D. and Kintsch, W. 1996: Learning from texts: effects of prior knowledge and text coherence. Discourse Processes 22, 247–88.
Messick, S. 1989: Validity. In Linn, R.L., editor, Educational measurement, 3rd edition. New York: American Council on Education/Macmillan, 13–103.
Meyer, B. 1985a: Prose analyses: purposes, procedures and problems. In Britton, B. and Black, J., editors, Understanding expository text. Hillsdale, NJ: Lawrence Erlbaum, 11–64.
Meyer, B. 1985b: Prose analysis: purposes, procedures and problems, Part 2. In Britton, B. and Black, J., editors, Understanding expository text. Hillsdale, NJ: Lawrence Erlbaum, 269–97.
Monks, V. 1997, August/September: Two views, same waterway. National Wildlife 35, 36–37.
Pellegrino, J., Baxter, G. and Glaser, R. 1999: Addressing the ‘two disciplines’ problem: linking theories of cognition and learning with assessment and instructional practice. Review of Research in Education 24, 307–53.
Perfetti, C. 1997: Sentences, individual differences and multiple texts: three issues in text comprehension. Discourse Processes 23, 337–55.
Perfetti, C., Britt, M.A. and Georgi, M. 1995: Text-based learning and reasoning: studies in history. Hillsdale, NJ: Lawrence Erlbaum.
Perfetti, C., Marron, M. and Foltz, P. 1996: Sources of comprehension failure: theoretical perspectives and case studies. In Cornoldi, C. and Oakhill, J., editors, Reading comprehension difficulties: processes and intervention. Mahwah, NJ: Lawrence Erlbaum, 137–65.
Reinking, D. 1988: Computer-mediated text and comprehension differences: the role of reading time, reader preference and estimation of learning. Reading Research Quarterly 23, 484–500.
Reinking, D. and Schreiner, R. 1985: The effects of computer-mediated text on measures of reading comprehension and reading behavior. Reading Research Quarterly 20, 536–53.
Spivey, N. 1997: The constructivist metaphor: reading, writing and the making of meaning. San Diego, CA: Academic Press.
Stevens, J. 1996: Applied multivariate statistics for the social sciences, 3rd edition. Mahwah, NJ: Lawrence Erlbaum.
Tabachnick, B. and Fidell, L. 1996: Using multivariate statistics, 3rd edition. New York: Harper Collins.
Taylor, C., Jamieson, J., Eignor, D. and Kirsch, I. 1998: The relationship between computer familiarity and performance on computer-based TOEFL test tasks. TOEFL Research Report No. 61. Princeton, NJ: Educational Testing Service.
Tennesen, M. 1997, November/December: On a clear day. National Parks 71, 26–9.
Trites, L. 2000: Beyond basic comprehension: reading to learn and reading to integrate for native and non-native speakers. Unpublished doctoral dissertation, Northern Arizona University, Flagstaff, AZ.
Van den Berg, S. and Watt, J. 1991: Effects of educational setting on student responses to structured hypertext. Journal of Computer-Based Instruction 18, 118–24.
Van Dijk, T.A. and Kintsch, W. 1983: Strategies of discourse comprehension. New York: Academic Press.
Wiley, J. and Voss, J. 1999: Constructing arguments from multiple sources: tasks that promote understanding and not just memory for text. Journal of Educational Psychology 91, 310–11.
Zimmerman, T. 1997, December 29: Filter it with billions and billions of oysters: how to revive the Chesapeake Bay. US News and World Report 123, 63.
Appendix 1a Chart completion task

Directions: Complete the following chart. Fill in as much detail as possible from the text read by categorizing the information into the different areas on the chart. Do not use single words for your responses; form your responses in phrases or complete sentences. Include examples from the text.

Make no judgments about the accuracy of causes or effects or the effectiveness of the solutions mentioned in the text. Solutions are seen as any action taken in response to the problem(s). Solutions can take many forms, such as proposed solutions, attempted solutions or failed solutions. Also, space is provided under each category for examples. Examples are specific examples found in the text that are used by the author(s) to exemplify the problems, causes, effects or solutions in the text. However, there may not be examples for every category.

Points will be awarded for correct responses only. There is no penalty for incorrect responses. Points will be awarded in the following manner:

Problems and Solutions: 10 points each
Causes and Effects: 5 points each
Examples: 1 point each

Problems    Causes      Effects     Solutions
Examples    Examples    Examples    Examples
Appendix 1b Reading to Learn scoring rubric (0–241 total possible)

Problems (10 points; 1 word: 6 points); 40 maximum
• Air pollution/smog in the National Parks
• Opposition to or ignoring of (or lack of cooperation with) environmental standards (regulations)
• No true ‘point sources’ (myriad of smaller pollution sources rather than one large source)
• National Parks' limited jurisdiction over pollution sources outside of the parks
Examples (1 point); 5 maximum
• Great Smoky Mountain National Park
• Grand Canyon National Park
• Smoky Mountain sources from Ohio, New York, Atlanta, etc.
• Grand Canyon sources from CA, NV, UT, AZ, NM and Mexico
• TN Luttrell corp. building permit situation

Causes (5 points; 1 word: 3 points); 55 maximum
• Sulfur/nitrogen emissions
• Acid rain
• Ground-level ozone
• Urban/industrial emissions
• Automobile emissions
• Power plants/factories/plants emissions
• Emissions from smokestacks (chimneys)
• Emissions from kilns
• Unregulated pollution/smog (from other countries, Mexico)
• Smoke from controlled burns
• Emissions from large cities
Examples (1 point, or 5 points with overarching cause listed above)
• Automobile emissions
• Power plants/factories/plants emissions
• Emissions from smokestacks (chimneys)
• Emissions from kilns
• Unregulated pollution/smog (from other countries, Mexico)
• Smoke from controlled burns
• Emissions from large cities
Examples (1 point); 10 maximum
• Navajo Generating Station, Page, AZ (power plant, AZ)
• Power plants in TN and OH river valley
• Southern California Edison plant, Laughlin, NV
• Tennessee Luttrell kilns, TN
• Industry in AZ
• Cars in CA
• Smokestacks in NM, NV, UT
• Los Angeles
• Atlanta
• New York

Effects (5 points; 1 word: 3 points); 30 maximum
• Trees and plants affected (injured)/growth hindered
• Visibility decreased
• Metals loosened into waters (surface/ground water; water polluted)
• Nutrients removed (leached) from soil and/or plants
• Public outcry over stance of big business/government
• Aquatic life injured (damaged)
Examples (1 point); 6 maximum
• Leaves of plants turning purple and brown (stippling)
• Hindering photosynthesis of plants and trees
• Reduction of visibility at Grand Canyon
• Visibility reduced 90% of the days
• See vague blue masses
• See half as far as in 1919

Solutions (10 points; 1 word: 6 points); 90 maximum
• Stricter environmental laws (resolutions/acts)
• 1977 Clean Air Act Amendments labeling NPs as Class I areas
• Regional haze regulations proposed by the EPA
• Reduction of allowable pollution standards from industry
• Clean Air Act set visibility goals
• Objecting to construction permits
• Identification of pollution sources
• Industry modifications/installation of devices (scrubbers) to reduce pollution
• Public pressure to protect parks
Examples (1 point, or 10 points with overarching solution listed above)
• 1977 Clean Air Act Amendments labeling NPs as Class I areas
• Regional haze regulations proposed by the EPA
• Reduction of allowable pollution standards from industry
Examples (1 point); 5 maximum
• Objecting to kilns in TN
• Researchers using scientific technology to ID sources (radioactive isotopes)
• Southern California Edison plant in Laughlin, NV identified as point source
• Scrubbers installed at Navajo Generating Plant
• 10% reduction of pollution in 10–15 years (goal of no man-made pollution)
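The arithmetic of the Appendix 1b rubric can be sketched in a few lines of Python. This is a minimal illustration, not the authors' scoring procedure: the function name `score_chart` and its three count dictionaries are hypothetical, the one-word values (6 and 3 points) and the category maxima are read from the rubric as it survives in this copy, and the rule that an example may earn 5 or 10 points when listed with an overarching cause or solution is deliberately not modeled.

```python
# Assumed reading of the Appendix 1b rubric:
# category -> (points per phrase/sentence response,
#              points per one-word response,
#              category maximum)
CATEGORY_RULES = {
    "problems": (10, 6, 40),
    "causes": (5, 3, 55),
    "effects": (5, 3, 30),
    "solutions": (10, 6, 90),
}

# Maximum number of 1-point examples credited per category.
EXAMPLE_MAXIMA = {"problems": 5, "causes": 10, "effects": 6, "solutions": 5}


def score_chart(full, one_word, examples):
    """Return a Reading to Learn chart score.

    Each argument is a dict mapping a category name to a count of
    correct responses of that kind; missing categories count as zero.
    Incorrect responses carry no penalty, so they are simply not counted.
    """
    total = 0
    for category, (full_pts, word_pts, cap) in CATEGORY_RULES.items():
        raw = full.get(category, 0) * full_pts + one_word.get(category, 0) * word_pts
        total += min(raw, cap)  # category score cannot exceed its maximum
        total += min(examples.get(category, 0), EXAMPLE_MAXIMA[category])
    return total


# A response listing every rubric item in phrase form reaches the ceiling.
perfect = score_chart(
    full={"problems": 4, "causes": 11, "effects": 6, "solutions": 9},
    one_word={},
    examples={"problems": 5, "causes": 10, "effects": 6, "solutions": 5},
)
```

Under these assumptions the category maxima (40 + 55 + 30 + 90) and example maxima (5 + 10 + 6 + 5) sum to the 241 points stated in the rubric heading, which is a useful consistency check on the reconstruction.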
Appendix 2a Reading to Integrate task: integration activity

Directions: For the next 15 minutes, reflect on what you read and compose a short essay that combines the information in the material read and makes connections across the range of ideas and arguments presented.

[Ruled lines for the essay response]
Appendix 2b Reading to Integrate scoring rubric

Integration

50 Excellent: Integrates texts accurately and successfully on multiple levels and creates a true Documents Model. Generalizes at least two macrostructure concepts common across texts (this may be simply identifying the existence of a macrostructure followed by the supporting macrostructures from each text). Effectively integrates relevant support (i.e., details or macrostructures) to support generalizations and may still discuss each article separately to some degree. Integrates macrostructures present in both texts.
40 Good: Integrates through the creation of a well-developed and accurate introduction and/or a conclusion, yet summarizes articles separately, AND/OR generalizes one major macrostructure that is fully developed with support. Uses some relevant support.
30 Fair: Attempts to integrate through the creation of a partially developed, possibly inaccurate, introductory AND/OR concluding statement, usually related to the main topic of the articles, but has no substantive development. Creates a Text and Situation Model of each text separately. Does not make generalizations for integration beyond the one statement. May make evaluative or editorial statements; however, these may be inaccurate.
20 Poor: Creates a well-developed and accurate Text Model and Situation Model for each of the texts, yet attempts no integrative connection across the texts.
10 Very Poor: Ineffectively or inaccurately attempts to summarize each article in a very reporting style, revealing little or no use of background knowledge, contextualization or evaluation. Response is simply a recall of information from the texts. No introduction or conclusion, OR participant confuses texts and sees separate texts as one issue (problem).
0 No Response: Participant does not attempt response or only addresses one article.

Macrostructures

Participants will receive one point for each macrostructure accurately identified in each text, with a minimum score of 4 awarded for the inability to accurately identify any macrostructures.

25 Excellent (Total 8): Accurately identifies all 4 macrostructures present in both texts.
20 Good (Total 6/7): Accurately identifies all 4 macrostructures in one text and 3 in the other, OR accurately identifies 3 of the four macrostructures in both texts (or 4 in one and 2 in the other).
15 Fair (Total 4/5): Accurately identifies 3 of the four macrostructures in one text and 2 of the four in the other (or 4 in one and 1 in the other), OR accurately identifies 2 of the four macrostructures in both texts (or 4 in one and 0 in the other, or 3 in one and 1 in the other).
10 Poor (Total 2/3): Accurately identifies 2 of the macrostructures in one text and 1 in the other (or 3 in one and 0 in the other). Accurately identifies 1 of the four macrostructures in both texts (or 2 in one and 0 in the other).
5 Very Poor (Total 0/1): Accurately identifies 1 of the four macrostructures in one text and 0 in the other, OR unable to accurately identify any macrostructures in the texts.
0 No Response: Participant does not attempt response.

Use of relevant details

5 Excellent: Effectively uses multiple relevant details as support; no irrelevant or erroneous details.
4 Good: Effectively uses relevant details as support; may include some inaccurate or irrelevant details. (More than 50% of details are relevant.)
3 Fair: Possibly uses one or two relevant details, yet several irrelevant or inaccurate details (erroneous or fabricated) appear in the synthesis. (50% or less of details are relevant.)
2 Poor: Erroneous or fabricated details used in attempt to support arguments presented; does not include relevant details.
1 Very Poor: No details used or listed.
0 No Response: Participant does not attempt response.
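The macrostructure portion of the Appendix 2b rubric amounts to a lookup from the total number of macrostructures identified across the two texts to a band. The sketch below is illustrative only: the function name `macrostructure_band` is hypothetical, and the band values (25, 20, 15, 10, 5, 0) are reproduced exactly as they appear in this copy of the rubric rather than verified against the published scale.

```python
def macrostructure_band(text_a, text_b, attempted=True):
    """Map macrostructures identified in each text (0-4) to a rubric band.

    Returns a (band_value, label) tuple. Band values follow the
    Appendix 2b rubric as printed in this copy (an assumption).
    """
    if not attempted:
        return 0, "No Response"
    for count in (text_a, text_b):
        if not 0 <= count <= 4:
            raise ValueError("each text has between 0 and 4 macrostructures")
    total = text_a + text_b
    if total == 8:
        return 25, "Excellent"
    if total >= 6:
        return 20, "Good"
    if total >= 4:
        return 15, "Fair"
    if total >= 2:
        return 10, "Poor"
    return 5, "Very Poor"


# Every split of the same total lands in the same band, e.g. 4+0, 3+1 and 2+2.
fair_cases = [macrostructure_band(4, 0), macrostructure_band(3, 1), macrostructure_band(2, 2)]
```

Keying the bands on the total, rather than on each listed combination, reproduces every case the rubric enumerates (for instance, 4 in one text and 2 in the other falls in the same band as 3 in both).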
Almost two-thirds of the high basic comprehension nonnative speakers remained high on the Reading to Learn/Reading to Integrate Composite, but one third dropped, suggesting that they could not categorize and synthesize information from texts as well as they could recognize information. For this third, the recognition-only task (multiple-choice basic comprehension) overestimated ability to manipulate and synthesize information. For the high and mid reading ability nonnative speaker groups, basic comprehension-only tests may then overestimate academic English proficiency.
Results for the native speakers showed a more definitive pattern of reclassification. For the low reading ability group, approximately half were misclassified, suggesting that the basic comprehension measure underrepresented their ability to categorize and synthesize information in academic texts. Most of the mid reading ability group were reclassified as lower on the Reading to Learn/Reading to Integrate Composite, showing that basic comprehension results overestimated ability to succeed on more challenging tasks. On the other hand, almost one third of the mid reading ability group did better on the Reading to Learn/Reading to Integrate Composite; for them, basic comprehension underrepresented their ability to complete more challenging reading tasks. For the high ability readers, nearly one third dropped on the Reading to Learn/Reading to Integrate Composite. For them, the basic comprehension measure overestimated their level of academic English proficiency. The other two thirds of the high reading ability group remained high, suggesting a higher threshold of academic English proficiency.
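The reclassification pattern described above is, in effect, a cross-tabulation of each participant's basic comprehension level against their level on the Composite, read row by row as percentages. The following Python sketch shows one way such a table could be computed. It is not the discriminant analysis used in the study; the function name, the level labels and the sample data are all hypothetical and serve only to illustrate the bookkeeping.

```python
from collections import Counter

LEVELS = ("low", "mid", "high")


def reclassification_table(pairs):
    """Cross-tabulate (basic_level, composite_level) pairs.

    Returns {basic_level: {composite_level: percent of that basic group}},
    omitting basic levels with no participants.
    """
    counts = Counter(pairs)
    table = {}
    for basic in LEVELS:
        row_total = sum(counts[(basic, composite)] for composite in LEVELS)
        if row_total == 0:
            continue
        table[basic] = {
            composite: round(100 * counts[(basic, composite)] / row_total, 1)
            for composite in LEVELS
        }
    return table


# Hypothetical participants, NOT the study's data.
sample = (
    [("high", "high")] * 2
    + [("high", "mid")]
    + [("mid", "low")] * 3
    + [("mid", "high")]
)
table = reclassification_table(sample)
```

In a table of this kind, the diagonal cells hold participants whose classification is unchanged; cells below the diagonal correspond to cases where basic comprehension overestimated performance on the more demanding tasks, and cells above it to cases where it underestimated that performance.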
Considering results from both the nonnative speakers and native speakers, we conclude that the new tasks did assess something different from basic comprehension once a lower-level threshold of basic academic English proficiency had been achieved. Examination of scatterplots based on discriminant functions indicated an obvious separation between participants who could and could not perform on the Reading to Learn/Reading to Integrate Composite, thus providing some tentative evidence for concurrent validity (Messick, 1989; Chapelle, 1999). We had hoped to find clear evidence of a hierarchy, suggesting that Reading to Learn was demonstrably more difficult than basic comprehension and Reading to Integrate demonstrably more difficult than Reading to Learn, but results did not yield an obvious hierarchy. Results did, however, suggest an even simpler pattern: a dichotomy. For those nonnative speakers above the lower threshold of English language proficiency, which in our data would be approximately 500, or 49 on the scaled TOEFL Reading Comprehension scores, some could perform well on the new measures and some could not, revealing two rather than three groups on the Reading to Learn/Reading to Integrate Composite. These results lead us to speculate that the new measures tap additional skills, such as ‘sophisticated discourse processes and critical thinking skills’ (Enright and Schedl, 1999: 24), in addition to language proficiency. For native speakers, the discriminant loading suggested a possible hierarchy of difficulty, but the scatterplots and classification tables revealed a dichotomy. Reclassification based on the Reading to Learn/Reading to Integrate Composite nearly eliminated the mid basic comprehension reading ability category, with only 16.8% of the native speakers qualifying as mid on the Reading to Learn/Reading to Integrate Composite.
VI Conclusions
While we acknowledge that there are some limitations to this project, such as the time required to administer and score new measures, the results are illuminating and useful, and suggest some considerations for future research. The first consideration relates to appropriate classification of students based on academic reading abilities. This corresponds to the use of TOEFL as a gatekeeper by many institutions. For TOEFL test-takers whose current TOEFL scores would be in an intermediate to high-intermediate range, the new tasks could assess additional abilities relevant to academic performance. For such participants, additional test development efforts are warranted. The majority of high reading ability nonnative speakers (64.9%) could perform the new measures; a third could not. Our results suggest that some unknown number of mid reading ability nonnative speakers are more capable of succeeding at more challenging academic reading tasks than their current level of basic comprehension assessment would indicate. At the same time, there are some high and mid reading ability participants who could not perform the more demanding tasks; admitting such students directly into university programs could result in failure. These results suggest that for most students at lower levels of basic comprehension (in our study, those with TOEFL scores below 490 to 500), development of new tasks is unnecessary. Nevertheless, our results indicate that a certain small percentage (in our sample, 8 students, 18.2%) of nonnative speakers classified as low reading ability on basic comprehension did in fact perform adequately on more challenging tasks like Reading to Learn and Reading to Integrate. Given the very large number of TOEFL test-takers around the world, this finding merits further research. It would be potentially unfair to exclude such students from university study based on basic comprehension-only measures such as the current TOEFL. We would urge institutions that use TOEFL to consider scores on these new, more demanding tasks if doing so would address institutional needs. According to interview data collected in conjunction with this project (Trites, 2000: Chapter 6), participants perceived that successful completion of the new tasks required a thorough understanding of the texts, whereas the multiple-choice tests were less demanding, requiring only superficial grasp of content. This finding corroborates the observation of Freedle and Kostin (1999: 3), who note that often examinees do not need to comprehend the accompanying text to answer the test item. Therefore, for all TOEFL test-takers, incorporating more challenging tasks such as the Reading to Learn and Reading to Integrate measures into typical English language instruction could have positive washback effects.
The second area of consideration is the focus on additional relevant research. For predictive validity, it is important to determine the correlation between the Reading to Learn/Reading to Integrate Composite measures and actual academic performance in university classes. Correlations might differ depending on the degree of categorization and synthesis required by different major fields of study. Another area for future research is related to the novelty of the Reading to Learn task as an assessment technique. Current pedagogical trends in literacy instruction emphasize the use of graphic organizers as a means to understand and manipulate information in texts. To our knowledge, graphic organizers are typically used as classroom activities rather than assessment tools. They have good potential for use in assessment if students are familiar with them and if appropriate scoring systems can be developed. This project demonstrates that it is possible, though labor intensive, to develop reliable scoring systems. Further research is needed to explore refinement of this task type and related scoring systems for use in testing programs with large numbers of test-takers and scorers. Although the Reading to Integrate task (generating a synthesis) was more familiar, the scoring system was innovative because it reflected a reader’s ability to recognize textual frames as well as integrate information. If test developers are interested in the abilities of nonnative speaker readers to perform such tasks, further development of similar tasks and scoring systems is warranted. While a substantial investment of time would be required to refine the administration and scoring systems needed for more complex tasks such as these, it should be weighed against the possible danger of under-representing the likelihood of academic success based on results of the basic comprehension-only measures still most often used to assess academic proficiency.
Acknowledgements
This project was funded by a grant from Educational Testing Service as part of the TOEFL 2000 effort. We appreciate the cooperation of the ETS staff members who assisted in the development of passage-specific tasks and other aspects of the research; however, no official endorsement of Educational Testing Service should be inferred.
VII References
Bachman L 2000 Modern language testing at the turn of the century assur-ing that what we count counts Language Testing 17 1ndash42
Biber D 1993 Using register-diversified corpora for general language studiesComputational Linguistics 19 219ndash41
Britt M Rouet J and Perfetti C 1996 Using hypertext to study and reasonabout historical evidence In Rouet J Levonen J Dillon A and Spiro Reditors Hypertext and cognition Mahwah NJ Lawrence Erlbaum 43ndash72
Chapelle C 1999 Validity in language assessment Annual Review of AppliedLinguistics 19 1ndash19
Educational Testing Service 1997 TOEFL test and score manual PrincetonNJ Educational Testing Service
mdashmdash 1998 Draft TOEFL 2000 research agenda framework areas of researchResearch agenda three TOEFL 2000 internal document Princeton NJEducational Testing Service
Eignor D Taylor C Kirsch I and Jamieson J 1998 Development of ascale for assessing the level of computer familiarity of TOEFL exami-nees TOEFL Research Report No 60 Princeton NJ EducationalTesting Service
Enright M and Schedl M 1999 Reading for a reason using reader purposeto guide test design TOEFL 2000 Internal Report Princeton NJEducational Testing Service
Enright M Grabe W Mosenthal P Mulcahy-Ernt P and Schedl M1998 A TOEFL 2000 framework for testing reading comprehension aworking paper Princeton NJ Educational Testing Service
Foltz P 1996 Comprehension coherence and strategies in hypertext InRouet J Levonen J Dillon A and Spiro R editors Hypertext and cog-nition Mahwah NJ Lawrence Erlbaum 109ndash36
Latricia Trites and Mary McGroarty 201
Freedle R and Kostin I 1999 Does the text matter in a multiple choice testof comprehension The case for the construct validity of TOEFLrsquosminitalks Language Testing 16 2ndash32
Goldman S 1997 Learning from text reflections on the past and suggestionsfor the future Discourse Processes 23 357ndash98
Hayes J and Hatch J 1999 Issues in measuring reliability correlationversus percentage of agreement Written Communication 16 354ndash67
Huberty C 1994 Applied discriminant analysis New York John WileyJamieson J Campbell J Norfleet L and Berbisada N 1993 Reliability
of a computerized scoring routine for an open-ended task System 21305ndash22
Jamieson J Norfleet L and Berbisada N 1993 Successes failures anddropouts in computer-assisted language lessons Computer AssistedEnglish Language Learning Journal 4 12ndash20
Klecka W 1980 Discriminant analysis In Lewis-Beck M editorQuantitative applications in the social sciences Volume 19 NewburyPark CA Sage
Lehto M Zhu W and Carpenter B 1995 The relative effectiveness ofhypertext and text International Journal of Human-ComputerInteraction 7 293ndash313
McNamara D and Kintsch W 1996 Learning from texts effects of prior knowl-edge and text coherence Discourse Processes 22 247ndash88
Messick S 1989 Validity In Linn RL editor Educational measurement 3rdedition New York American Council on Education Macmillan 13ndash103
Meyer B 1985a Prose analyses purposes procedures and problems InBritton B and Black J editors Understanding expository text HillsideNJ Lawrence Erlbaum 11ndash64
mdashmdash 1985b Prose analysis purposes procedures and problems Part 2 InBritton B and Black J editors Understanding expository text HillsideNJ Lawrence Erlbaum 269ndash97
Monks V 1997 AugustSeptember Two views same waterway NationalWildlife 35 36ndash37
Pellegrino J Baxter G and Glaser R 1999 Addressing the lsquotwo disci-plinesrsquo problem linking theories of cognition and learning with assess-ment and instructional practice Review of Research in Education 24307ndash53
Perfetti C 1997 Sentences individual differences and multiple texts threeissues in text comprehension Discourse Processes 23 337ndash55
Perfetti C Britt MA and Georgi M 1995 Text-based learning andreasoning studies in history Hillside NJ Lawrence Erlbaum
Perfetti C Marron M and Foltz P 1996 Sources of comprehension failuretheoretical perspectives and case studies In Cornoldi C and Oakhill Jeditors Reading comprehension difficulties processes and interventionMahwah NJ Lawrence Erlbaum 137ndash65
202 New tasks for reading comprehension tests
Reinking D 1988 Computer-mediated text and comprehension differencesthe role of reading time reader preference and estimation of learningReading Research Quarterly 23 484ndash500
Reinking D and Schreiner R 1985 The effects of computer-mediated texton measures of reading comprehension and reading behavior ReadingResearch Quarterly 20 536ndash53
Spivey N 1997 The constructivist metaphor reading writing and themaking of meaning San Diego CA Academic Press
Stevens J 1996 Applied multivariate statistics for the social sciences 3rdedition Mahwah NJ Lawrence Erlbaum
Tabachnick B and Fidell L 1996 Using multivariate statistics 3rd editionNew York Harper Collins
Taylor C Jamieson J Eignor D and Kirsch I 1998 The relationshipbetween computer familiarity and performance on computer-basedTOEFL test tasks TOEFL Research Report No 61 Princeton NJEducational Testing Service
Tennesen M 1997 NovemberDecember On a clear day National Parks 7126ndash9
Trites L 2000 Beyond basic comprehension reading to learn and reading tointegrate for native and non-native speakers Unpublished doctoraldissertation Northern Arizona University Flagstaff AZ
Van den Berg S and Watt J 1991 Effects of educational setting on studentresponses to structured hypertext Journal of Computer-BasedInstruction 18 118ndash24
Van Dijk TA and Kintsch W 1983 Strategies of discourse comprehensionNew York Academic Press
Wiley J and Voss J 1999 Constructing arguments from multiple sourcestasks that promote understanding and not just memory for text Journalof Educational Psychology 91 310ndash11
Zimmerman T 1997 December 29 Filter it with billions and billions of oys-ters how to revive the Chesapeake Bay US News and World Report 12363
Appendix 1a Chart completion task
Directions Complete the following chart Fill in as much detail aspossible from the text read by categorizing the information into thedifferent areas on the chart Do not use single words for yourresponses form your responses in phrases or complete sentencesInclude examples from the text
Make no judgments about the accuracy of causes or effects or theeffectiveness of the solutions mentioned in the text Solutions are seen
Latricia Trites and Mary McGroarty 203
204 New tasks for reading comprehension tests
as any action taken in response to the problem(s) Solutions can takemany forms such as proposed solutions attempted solutions or failedsolutions Also space is provided under each category for examplesExamples are specific examples found in the text that are used by theauthor(s) to exemplify the problems causes effects or solutions inthe text However there may not be examples for every category
Points will be awarded for correct responses only There is no penaltyfor incorrect responses Points will be awarded in the following manner
Problems and Solutions 10 points eachCauses and Effects 5 points eachExamples 1 point each
Problems Causes Effects Solutions
Examples Examples Examples Examples
Appendix 1b: Reading to Learn scoring rubric (0–241 total possible)

Problems (10 points; 1 word, 6 points; 40 maximum)
• Air pollution/Smog in the National Parks
• Opposition or Ignoring (or lack of cooperation) to environmental standards (regulations)
• No true 'point sources' (myriad of smaller pollution sources rather than one large source)
• National Parks limited jurisdiction of pollution sources outside of the parks

Examples for Problems (1 point; 5 maximum)
• Great Smoky Mountain National Park
• Grand Canyon National Park
• Smoky Mountain Sources from Ohio, New York, Atlanta etc.
• Grand Canyon Sources from CA, NV, UT, AZ, NM and Mexico
• TN Luttrell corp building permit situation

Causes (5 points; 1 word, 3 points; 55 maximum)
• Sulfur/nitrogen emissions
• Acid rain
• Ground-level ozone
Urban/Industrial emissions
• Automobile emissions
• Power plants, factories/plants emissions
• Emissions from Smokestacks (chimneys)
• Emissions from Kilns
• Unregulated pollution/smog (from other countries: Mexico)
• Smoke from Controlled burns
• Emissions from large cities

Examples for Causes (1 point, or 5 points with overarching cause listed above)
• Automobile emissions
• Power plants, factories/plants emissions
• Emissions from Smokestacks (chimneys)
• Emissions from Kilns
• Unregulated pollution/smog (from other countries: Mexico)
• Smoke from Controlled burns
• Emissions from large cities

Examples for Causes (1 point; 10 maximum)
• Navajo Generating Station, Page, AZ (Power Plant, AZ)
• Power plants in TN and OH river valley
• Southern California Edison Plant, Laughlin, NV
• Tennessee Luttrell Kilns, TN
• Industry in AZ
• Cars in CA
• Smokestacks in NM, NV, UT
• Los Angeles
• Atlanta
• New York

Effects (5 points; 1 word, 3 points; 30 maximum)
• Trees and plants affected (injured); growth hindered
• Visibility decreased
• Metals loosened into waters (surface/ground water; water polluted)
• Nutrients removed (leached) from soil and/or plants
• Public outcry over stance of big business/government
• Aquatic life injured (damaged)

Examples for Effects (1 point; 6 maximum)
• Leaves of plants turning purple and brown (stippling)
• Hindering photosynthesis of plants and trees
• Reduction of visibility at Grand Canyon
• Visibility reduced 90% of the days
• See vague blue masses
• See half as far as in 1919

Solutions (10 points; 1 word, 6 points; 90 maximum)
Stricter Environmental Laws (resolutions; acts)
• 1977 Clean Air Act Amendments labeling NPs as Class I areas
• Regional haze regulations proposed by the EPA
• Reduction of allowable pollution standards from industry
• Clean Air Act set visibility goals
• Objecting to construction permits
• Identification of pollution sources
• Industry modifications: Installation of devices (scrubbers) to reduce pollution
• Public pressure to protect parks

Examples for Solutions (1 point, or 10 points with overarching solution listed above)
• 1977 Clean Air Act Amendments labeling NPs as Class I areas
• Regional haze regulations proposed by the EPA
• Reduction of allowable pollution standards from industry

Examples for Solutions (1 point; 5 maximum)
• Objecting to kilns in TN
• Researchers using scientific technology to ID sources (radioactive isotopes)
• Southern California Edison Plant in Laughlin, NV identified as point source
• Scrubbers installed at Navajo Generating Plant
• 10% reduction of pollution in 10–15 years (goal of no man-made pollution)
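The arithmetic behind the Reading to Learn rubric can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' scoring procedure: the names `RUBRIC`, `score_category` and `total_score` are hypothetical, the per-category point values and maxima are those listed in Appendix 1b, and the "1 point or 5/10 points with overarching cause/solution" rule for examples is deliberately left out for simplicity.

```python
# Hypothetical sketch of the Appendix 1b arithmetic (not the authors' code).
# Point values and maxima are taken from the rubric; the special rule that
# upgrades some examples to 5 or 10 points is NOT modelled here.
RUBRIC = {
    "problems":  {"full": 10, "one_word": 6, "cap": 40, "example_cap": 5},
    "causes":    {"full": 5,  "one_word": 3, "cap": 55, "example_cap": 10},
    "effects":   {"full": 5,  "one_word": 3, "cap": 30, "example_cap": 6},
    "solutions": {"full": 10, "one_word": 6, "cap": 90, "example_cap": 5},
}


def score_category(name, full=0, one_word=0, examples=0):
    """Score one chart column from counts of correct responses.

    full      -- correct responses written as phrases or sentences
    one_word  -- correct responses given as a single word (reduced credit)
    examples  -- correct examples (1 point each, capped per category)
    """
    r = RUBRIC[name]
    main = min(full * r["full"] + one_word * r["one_word"], r["cap"])
    return main + min(examples, r["example_cap"])


def total_score(responses):
    """Sum the category scores; `responses` maps category name to counts."""
    return sum(score_category(name, **counts) for name, counts in responses.items())


# The category maxima plus example maxima reproduce the stated 241-point ceiling.
MAX_TOTAL = sum(r["cap"] + r["example_cap"] for r in RUBRIC.values())
```

For instance, a participant credited with two full-phrase problems, one single-word problem and three examples would receive 20 + 6 + 3 = 29 points for the Problems column under this sketch. Incorrect responses carry no penalty, so only counts of correct responses are needed.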
Appendix 2a: Reading to Integrate task: integration activity

Directions: For the next 15 minutes, reflect on what you read and compose a short essay that combines the information in the material read and makes connections across the range of ideas and arguments presented.

[Ruled lines for the participant's essay]
Appendix 2b: Reading to Integrate scoring rubric

Integration

50 Excellent: Integrates texts accurately and successfully on multiple levels and creates a true Documents Model. Generalizes at least two macrostructure concepts common across texts (this may be simply identifying the existence of a macrostructure followed by the supporting macrostructures from each text). Effectively integrates relevant support (i.e., details or macrostructures) to support generalizations and may still discuss each article separately to some degree. Integrates macrostructures present in both texts.

40 Good: Integrates through the creation of a well developed and accurate introduction and/or a conclusion, yet summarizes articles separately, AND/OR generalizes one major macrostructure that is fully developed with support. Uses some relevant support.

30 Fair: Attempts to integrate through the creation of a partially developed, possibly inaccurate, introductory AND/OR concluding statement, usually related to the main topic of the articles, but has no substantive development. Creates a Text and Situation Model of each text separately. Does not make generalizations for integration beyond the one statement. May make evaluative or editorial statements; however, these may be inaccurate.

20 Poor: Creates a well developed and accurate Text Model and Situation Model for each of the texts, yet attempts no integrative connection across the texts.

10 Very Poor: Ineffectively or inaccurately attempts to summarize each article in a very reporting style, revealing little or no use of background knowledge, contextualization, or evaluation. Response is simply a recall of information from the texts. No introduction or conclusion, OR participant confuses texts and sees separate texts as one issue (problem).

0 No Response: Participant does not attempt response or only addresses one article.
Macrostructures

Participants will receive one point for each macrostructure accurately identified in each text, with a minimum score of 4 awarded for the inability to accurately identify any macrostructures.

25 Excellent (Total 8): Accurately identifies all 4 macrostructures present in both texts.

20 Good (Total 6–7): Accurately identifies all 4 macrostructures in one text and 3 in the other, OR accurately identifies 3 of the four macrostructures in both texts (or 4 in one and 2 in the other).

15 Fair (Total 4–5): Accurately identifies 3 of the four macrostructures in one text and 2 of the four in the other (or 4 in one and 1 in the other), OR accurately identifies 2 of the four macrostructures in both texts (or 4 in one and 0 in the other, or 3 in one and 1 in the other).

10 Poor (Total 2–3): Accurately identifies 2 of the macrostructures in one text and 1 in the other (or 3 in one and 0 in the other), OR accurately identifies 1 of the four macrostructures in both texts (or 2 in one and 0 in the other).

5 Very Poor (Total 0–1): Accurately identifies 1 of the four macrostructures in one text and 0 in the other, OR unable to accurately identify any macrostructures in the texts.

0 No Response: Participant does not attempt response.
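The macrostructure bands reduce to a lookup on the combined count of macrostructures identified across the two texts (0 to 8). The following Python sketch illustrates that mapping; the function name and its arguments are hypothetical and not part of the study's materials. It follows the band table, which assigns 5 points to the lowest band, whereas the introductory sentence of the rubric mentions a minimum score of 4; that difference is left unresolved here.

```python
def macrostructure_score(text_a: int, text_b: int, attempted: bool = True) -> int:
    """Map macrostructures identified in each text (0-4 each) to a band score.

    Hypothetical helper following the Total 8 / 6-7 / 4-5 / 2-3 / 0-1 bands
    of the Macrostructures rubric; `attempted=False` models "No Response".
    """
    if not attempted:
        return 0
    for count in (text_a, text_b):
        if not 0 <= count <= 4:
            raise ValueError("each text has between 0 and 4 macrostructures")
    total = text_a + text_b
    if total == 8:
        return 25  # Excellent
    if total >= 6:
        return 20  # Good
    if total >= 4:
        return 15  # Fair
    if total >= 2:
        return 10  # Poor
    return 5       # Very Poor (band table value)
```

Because the bands depend only on the sum, every combination listed in the rubric (for example, 4 in one text and 2 in the other, or 3 in both) falls into the band stated for its total.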
Use of relevant details

5 Excellent: Effectively uses multiple relevant details as support; no irrelevant or erroneous details.

4 Good: Effectively uses relevant details as support; may include some inaccurate or irrelevant details. (More than 50% of details are relevant.)

3 Fair: Possibly uses one or two relevant details, yet several irrelevant details or inaccurate details (erroneous or fabricated) appear in the synthesis. (50% or less of details are relevant.)

2 Poor: Erroneous or fabricated details used in attempt to support arguments presented; does not include relevant details.

1 Very Poor: No details used or listed.

0 No Response: Participant does not attempt response.
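The percentage thresholds in the details rubric can be made concrete with a small Python sketch. It is a simplification under stated assumptions: the function name and the two counts are hypothetical, "multiple" is read as two or more, and the rater's qualitative judgement of whether details are used "effectively" is not modelled.

```python
def details_score(relevant: int, flawed: int, attempted: bool = True) -> int:
    """Approximate the 'Use of relevant details' bands from two counts.

    relevant -- number of relevant, accurate details in the synthesis
    flawed   -- number of irrelevant, erroneous or fabricated details
    Hypothetical helper; a human rater would also judge how well the
    details support the argument, which this sketch ignores.
    """
    if not attempted:
        return 0  # No Response
    if relevant < 0 or flawed < 0:
        raise ValueError("counts must be non-negative")
    total = relevant + flawed
    if total == 0:
        return 1  # Very Poor: no details used or listed
    if relevant == 0:
        return 2  # Poor: only erroneous or fabricated details
    if flawed == 0 and relevant >= 2:
        return 5  # Excellent: multiple relevant details, none flawed
    if relevant / total > 0.5:
        return 4  # Good: more than 50% of details are relevant
    return 3      # Fair: 50% or less of details are relevant
```

Under this reading, a synthesis with three relevant details and one fabricated detail would land in the Good band, while one relevant detail alongside one irrelevant detail (exactly 50%) would be Fair.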
Comprehension scores, some could perform well on the new measures; some could not, revealing two rather than three groups on the Reading to Learn/Reading to Integrate Composite. These results lead us to speculate that the new measures tap additional skills, such as 'sophisticated discourse processes and critical thinking skills' (Enright and Schedl, 1999: 24), in addition to language proficiency. For native speakers, the discriminant loading suggested a possible hierarchy of difficulty, but the scatterplots and classification tables revealed a dichotomy. Reclassification based on the Reading to Learn/Reading to Integrate Composite nearly eliminated the mid basic comprehension reading ability category, with only 16.8% of the native speakers qualifying as mid on the Reading to Learn/Reading to Integrate Composite.
VI Conclusions
While we acknowledge that there are some limitations to this project, such as the time required to administer and score new measures, the results are illuminating and useful, and they suggest some considerations for future research. The first consideration relates to appropriate classification of students based on academic reading abilities. This corresponds to the use of TOEFL as a gatekeeper by many institutions. For TOEFL test-takers whose current TOEFL scores would be in an intermediate to high intermediate range, the new tasks could assess additional abilities relevant to academic performance. For such participants, additional test development efforts are warranted. The majority of high reading ability nonnative speakers (64.9%) could perform the new measures; a third could not. Our results suggest that some unknown number of mid reading ability nonnative speakers are more capable of succeeding at more challenging academic reading tasks than their current level of basic comprehension assessment would indicate. At the same time, there are some high and mid reading ability participants who could not perform the more demanding tasks; admitting such students directly into university programs could result in failure. These results suggest that for most students at lower levels of basic comprehension (in our study, those with TOEFL scores below 490 to 500), development of new tasks is unnecessary. Nevertheless, our results indicate that a certain small percentage (in our sample, 8 students, 18.2%) of nonnative speakers classified as low reading ability on basic comprehension did in fact perform adequately on more challenging tasks like Reading to Learn and Reading to Integrate. Given the very large number of TOEFL test-takers around the world, this finding merits further research. It would be potentially unfair to exclude such students from university study based on basic comprehension-only measures such as the current TOEFL. We would urge institutions that use TOEFL to consider scores on these new, more demanding tasks if doing so would address institutional needs. According to interview data collected in conjunction with this project (Trites, 2000: Chapter 6), participants perceived that successful completion of the new tasks required a thorough understanding of the texts, whereas the multiple-choice tests were less demanding, requiring only superficial grasp of content. This finding corroborates the observation of Freedle and Kostin (1999: 3), who note that often examinees do not need to comprehend the accompanying text to answer the test item. Therefore, for all TOEFL test-takers, incorporating more challenging tasks such as the Reading to Learn and Reading to Integrate measures into typical English language instruction could have positive washback effects.

The second area of consideration is the focus on additional relevant research. For predictive validity, it is important to determine the correlation between the Reading to Learn/Reading to Integrate Composite measures and actual academic performance in university classes. Correlations might differ depending on the degree of categorization and synthesis required by different major fields of study. Another area for future research is related to the novelty of the Reading to Learn task as an assessment technique. Current pedagogical trends in literacy instruction emphasize the use of graphic organizers as a means to understand and manipulate information in texts. To our knowledge, graphic organizers are typically used as classroom activities rather than assessment tools. They have good potential for use in assessment if students are familiar with them and if appropriate scoring systems can be developed. This project demonstrates that it is possible, though labor intensive, to develop reliable scoring systems. Further research is needed to explore refinement of this task type and related scoring systems for use in testing programs with large numbers of test-takers and scorers. Although the Reading to Integrate task (generating a synthesis) was more familiar, the scoring system was innovative because it reflected a reader's ability to recognize textual frames as well as integrate information. If test developers are interested in the abilities of nonnative speaker readers to perform such tasks, further development of similar tasks and scoring systems is warranted. While a substantial investment of time would be required to refine the administration and scoring systems needed for more complex tasks such as these, it should be weighed against the possible danger of under-representing the likelihood of academic success based on results of the basic comprehension-only measures still most often used to assess academic proficiency.
Acknowledgements
This project was funded by a grant from Educational Testing Service as part of the TOEFL 2000 effort. We appreciate the cooperation of the ETS staff members who assisted in the development of passage-specific tasks and other aspects of the research; however, no official endorsement of Educational Testing Service should be inferred.
VII References
Bachman, L. 2000: Modern language testing at the turn of the century: assuring that what we count counts. Language Testing 17, 1–42.
Biber, D. 1993: Using register-diversified corpora for general language studies. Computational Linguistics 19, 219–41.
Britt, M., Rouet, J. and Perfetti, C. 1996: Using hypertext to study and reason about historical evidence. In Rouet, J., Levonen, J., Dillon, A. and Spiro, R., editors, Hypertext and cognition. Mahwah, NJ: Lawrence Erlbaum, 43–72.
Chapelle, C. 1999: Validity in language assessment. Annual Review of Applied Linguistics 19, 1–19.
Educational Testing Service 1997: TOEFL test and score manual. Princeton, NJ: Educational Testing Service.
Educational Testing Service 1998: Draft TOEFL 2000 research agenda framework: areas of research. Research agenda three. TOEFL 2000 internal document. Princeton, NJ: Educational Testing Service.
Eignor, D., Taylor, C., Kirsch, I. and Jamieson, J. 1998: Development of a scale for assessing the level of computer familiarity of TOEFL examinees. TOEFL Research Report No. 60. Princeton, NJ: Educational Testing Service.
Enright, M. and Schedl, M. 1999: Reading for a reason: using reader purpose to guide test design. TOEFL 2000 Internal Report. Princeton, NJ: Educational Testing Service.
Enright, M., Grabe, W., Mosenthal, P., Mulcahy-Ernt, P. and Schedl, M. 1998: A TOEFL 2000 framework for testing reading comprehension: a working paper. Princeton, NJ: Educational Testing Service.
Foltz, P. 1996: Comprehension, coherence and strategies in hypertext. In Rouet, J., Levonen, J., Dillon, A. and Spiro, R., editors, Hypertext and cognition. Mahwah, NJ: Lawrence Erlbaum, 109–36.
Freedle, R. and Kostin, I. 1999: Does the text matter in a multiple-choice test of comprehension? The case for the construct validity of TOEFL's minitalks. Language Testing 16, 2–32.
Goldman, S. 1997: Learning from text: reflections on the past and suggestions for the future. Discourse Processes 23, 357–98.
Hayes, J. and Hatch, J. 1999: Issues in measuring reliability: correlation versus percentage of agreement. Written Communication 16, 354–67.
Huberty, C. 1994: Applied discriminant analysis. New York: John Wiley.
Jamieson, J., Campbell, J., Norfleet, L. and Berbisada, N. 1993: Reliability of a computerized scoring routine for an open-ended task. System 21, 305–22.
Jamieson, J., Norfleet, L. and Berbisada, N. 1993: Successes, failures and dropouts in computer-assisted language lessons. Computer Assisted English Language Learning Journal 4, 12–20.
Klecka, W. 1980: Discriminant analysis. In Lewis-Beck, M., editor, Quantitative applications in the social sciences, Volume 19. Newbury Park, CA: Sage.
Lehto, M., Zhu, W. and Carpenter, B. 1995: The relative effectiveness of hypertext and text. International Journal of Human-Computer Interaction 7, 293–313.
McNamara, D. and Kintsch, W. 1996: Learning from texts: effects of prior knowledge and text coherence. Discourse Processes 22, 247–88.
Messick, S. 1989: Validity. In Linn, R.L., editor, Educational measurement, 3rd edition. New York: American Council on Education/Macmillan, 13–103.
Meyer, B. 1985a: Prose analyses: purposes, procedures and problems. In Britton, B. and Black, J., editors, Understanding expository text. Hillsdale, NJ: Lawrence Erlbaum, 11–64.
Meyer, B. 1985b: Prose analysis: purposes, procedures and problems, Part 2. In Britton, B. and Black, J., editors, Understanding expository text. Hillsdale, NJ: Lawrence Erlbaum, 269–97.
Monks, V. 1997, August/September: Two views, same waterway. National Wildlife 35, 36–37.
Pellegrino, J., Baxter, G. and Glaser, R. 1999: Addressing the 'two disciplines' problem: linking theories of cognition and learning with assessment and instructional practice. Review of Research in Education 24, 307–53.
Perfetti, C. 1997: Sentences, individual differences and multiple texts: three issues in text comprehension. Discourse Processes 23, 337–55.
Perfetti, C., Britt, M.A. and Georgi, M. 1995: Text-based learning and reasoning: studies in history. Hillsdale, NJ: Lawrence Erlbaum.
Perfetti, C., Marron, M. and Foltz, P. 1996: Sources of comprehension failure: theoretical perspectives and case studies. In Cornoldi, C. and Oakhill, J., editors, Reading comprehension difficulties: processes and intervention. Mahwah, NJ: Lawrence Erlbaum, 137–65.
Reinking, D. 1988: Computer-mediated text and comprehension differences: the role of reading time, reader preference and estimation of learning. Reading Research Quarterly 23, 484–500.
Reinking, D. and Schreiner, R. 1985: The effects of computer-mediated text on measures of reading comprehension and reading behavior. Reading Research Quarterly 20, 536–53.
Spivey, N. 1997: The constructivist metaphor: reading, writing and the making of meaning. San Diego, CA: Academic Press.
Stevens, J. 1996: Applied multivariate statistics for the social sciences, 3rd edition. Mahwah, NJ: Lawrence Erlbaum.
Tabachnick, B. and Fidell, L. 1996: Using multivariate statistics, 3rd edition. New York: Harper Collins.
Taylor, C., Jamieson, J., Eignor, D. and Kirsch, I. 1998: The relationship between computer familiarity and performance on computer-based TOEFL test tasks. TOEFL Research Report No. 61. Princeton, NJ: Educational Testing Service.
Tennesen, M. 1997, November/December: On a clear day. National Parks 71, 26–9.
Trites, L. 2000: Beyond basic comprehension: reading to learn and reading to integrate for native and non-native speakers. Unpublished doctoral dissertation, Northern Arizona University, Flagstaff, AZ.
Van den Berg, S. and Watt, J. 1991: Effects of educational setting on student responses to structured hypertext. Journal of Computer-Based Instruction 18, 118–24.
Van Dijk, T.A. and Kintsch, W. 1983: Strategies of discourse comprehension. New York: Academic Press.
Wiley, J. and Voss, J. 1999: Constructing arguments from multiple sources: tasks that promote understanding and not just memory for text. Journal of Educational Psychology 91, 310–11.
Zimmerman, T. 1997, December 29: Filter it with billions and billions of oysters: how to revive the Chesapeake Bay. US News and World Report 123, 63.
Appendix 1a Chart completion task
Directions Complete the following chart Fill in as much detail aspossible from the text read by categorizing the information into thedifferent areas on the chart Do not use single words for yourresponses form your responses in phrases or complete sentencesInclude examples from the text
Make no judgments about the accuracy of causes or effects or theeffectiveness of the solutions mentioned in the text Solutions are seen
Latricia Trites and Mary McGroarty 203
204 New tasks for reading comprehension tests
as any action taken in response to the problem(s) Solutions can takemany forms such as proposed solutions attempted solutions or failedsolutions Also space is provided under each category for examplesExamples are specific examples found in the text that are used by theauthor(s) to exemplify the problems causes effects or solutions inthe text However there may not be examples for every category
Points will be awarded for correct responses only There is no penaltyfor incorrect responses Points will be awarded in the following manner
Problems and Solutions 10 points eachCauses and Effects 5 points eachExamples 1 point each
Problems Causes Effects Solutions
Examples Examples Examples Examples
Latricia Trites and Mary McGroarty 205
Ap
pen
dix
1b
Rea
din
g t
o L
earn
sco
rin
g r
ub
ric
0ndash2
41 t
ota
l po
ssib
le
Pro
ble
ms
(10
po
ints
) 1
wo
rd
Cau
ses
(5p
oin
ts)
1 w
ord
E
ffec
ts (
5 p
oin
ts)
1 w
ord
S
olu
tio
ns
(10
po
ints
en
d)
1 w
ord
(6 p
oin
ts)
40
max
imu
m(3
po
ints
) 5
5 m
axim
um
(3 p
oin
ts)
30
max
imu
m(6
po
ints
) 9
0 m
axim
um
bullA
ir p
ollu
tio
nS
mo
g i
n t
he
bullS
ulf
ur
nit
rog
en e
mis
sio
ns
bullTre
es a
nd
pla
nts
aff
ecte
dS
tric
ter
En
vir
on
men
tal
Law
s
Nati
on
al
Park
sbull
Aci
d r
ain
(in
jure
d)
gro
wth
hin
der
ed(r
eso
luti
on
s
acts
)
bullO
pp
osit
ion
or
Ign
ori
ng
bull
Gro
un
d-l
evel
ozo
ne
bullV
isib
ilit
y d
ecre
ased
bull19
77 C
lean
Air
Act
(or
lack
of
coo
per
atio
n)
to
Urb
an
In
du
str
ial
em
issio
ns
bullM
eta
ls l
oo
sen
ed
into
wat
ers
Am
en
dm
en
ts l
ab
elin
gN
Ps
envi
ron
men
tal s
tan
dar
ds
bullA
uto
mo
bile
emis
sio
ns
(su
rfac
eg
rou
nd
wat
er
as C
lass I
are
as
(reg
ula
tio
ns)
bullP
ow
er
pla
nts
fac
tori
esp
lan
ts
wat
er p
ollu
ted
)bull
Reg
ion
al h
aze
reg
ula
tio
ns
bullN
o t
rue
lsquopo
int
sou
rces
rsquoem
issi
on
sbull
Nu
trie
nts
rem
oved
pro
po
sed
by
the
EPA
(myri
ad
of
sm
aller
po
llu
tio
n
bullE
mis
sio
ns
fro
m S
mo
kesta
cks
(lea
ched
) fr
om
so
il bull
Red
ucti
on
of
allo
wab
leso
urc
es
rath
er t
han
on
e (c
him
neys)
and
or
pla
nts
po
llu
tio
n s
tan
dard
s f
rom
la
rge
sou
rce)
bullE
mis
sio
ns
fro
m K
iln
sbull
Pu
blic o
utc
ryo
ver
stan
ce
ind
ust
rybull
Nati
on
al
Park
s l
imit
ed
bull
Un
reg
ula
ted
po
lluti
on
sm
og
o
f b
ig b
usi
nes
sg
ove
rnm
ent
bullC
lean
Air
Act
set
ju
risd
icti
on
of
po
lluti
on
(f
rom
oth
er c
ou
ntr
ies
Mex
ico
)bull
Aq
uati
c l
ife in
jure
d (
dam
aged
)vis
ibilit
y g
oals
sou
rces
ou
tsid
e o
f th
e p
arks
bullS
mo
ke f
rom
Co
ntr
olled
bu
rns
bullO
bje
cti
ng
to c
on
str
ucti
on
bullE
mis
sio
ns
fro
m la
rge
citi
esp
erm
its
bullId
en
tifi
cati
on
of
po
lluti
on
so
urc
es
bullIn
du
str
y m
od
ificati
on
s
Insta
llati
on
o
f d
evic
es
(scr
ub
ber
s)
to r
edu
ce
po
lluti
on
bullP
ub
lic p
ressu
re t
op
rote
ct p
arks
206 New tasks for reading comprehension testsA
pp
en
dix
1b
(co
nti
nu
ed)
Exam
ple
s (
1 p
oin
t) 5
maxim
um
Exam
ple
s (
1 p
oin
t o
r 5 p
oin
ts
Exam
ple
s (
1 p
oin
t) 6
maxim
um
Exam
ple
s (
1 p
oin
t o
r 10 p
oin
ts
wit
h o
vera
rch
ing
cau
se
wit
h o
vera
rch
ing
so
luti
on
liste
d a
bo
ve)
liste
d a
bo
ve)
bullG
reat
Sm
oky M
ou
nta
inbull
Au
tom
ob
ile
emis
sio
ns
bullLe
aves
of
pla
nts
tu
rnin
g
bull19
77 C
lean
Air
Act
N
atio
nal
Par
kbull
Po
wer
pla
nts
fac
tori
es
pu
rple
an
d b
row
n (
stip
plin
g)
Am
en
dm
en
ts
lab
elin
gN
Ps
bullG
ran
d C
an
yo
np
lan
ts e
mis
sio
ns
bullH
ind
erin
g p
ho
tosyn
thesis
of
as C
lass I
are
as
Nat
ion
al P
ark
bullE
mis
sio
ns
fro
m
pla
nts
an
d t
rees
bullS
mo
ky M
ou
nta
in S
ou
rces
S
mo
kesta
cks
(ch
imn
eys)
bullR
ed
ucti
on
of
vis
ibilit
yat
bullR
egio
nal
haz
ere
gu
lati
on
s
fro
m O
hio
New
Yo
rk
bullE
mis
sio
ns
fro
m K
iln
sG
ran
d C
an
yo
np
rop
ose
d b
y th
e E
PA
Atl
anta
etc
bull
Un
reg
ula
ted
po
lluti
on
sm
og
bull
Vis
ibili
ty r
educ
ed 9
0 o
f bull
Red
ucti
on
of
allo
wab
lebull
Gra
nd
Can
yon
So
urc
es(f
rom
oth
er c
ou
ntr
ies
Mex
ico
)th
e da
ysp
ollu
tio
n s
tan
dard
s f
rom
fro
m C
A N
V U
T A
Z N
M
bullS
mok
e fr
om C
on
tro
lled
bu
rns
bullS
ee v
ague
blu
e m
asse
sin
du
stry
and
Mex
ico
bullE
mis
sio
ns
fro
m la
rge
citi
esbull
See
hal
f as
far
as
in 1
919
bullT
N L
utt
rell
corp
bu
ildin
g
per
mit
sit
uat
ion
Exam
ple
s (
1 p
oin
t)
10 m
axim
um
Exam
ple
s (
1 p
oin
t)
5 m
axim
um
bullN
avaj
o G
ener
atin
g S
tati
on
bull
Ob
ject
ing
to
kiln
s in
TN
Pag
e A
Z (
Po
wer
Pla
nt
AZ
)bull
Res
earc
her
s u
sin
g s
cien
tifi
cbull
Po
wer
pla
nts
in T
N a
nd
OH
tech
no
log
y to
ID s
ou
rces
rive
r va
lley
(rad
ioac
tive
iso
top
es)
bullS
ou
ther
n C
alif
orn
ia E
dis
on
bull
So
uth
ern
Cal
ifo
rnia
Ed
iso
nP
lan
t L
aug
hlin
NV
Pla
nt
in L
aug
hlin
NV
bull
Ten
nes
see
Lutt
rell
Kiln
s T
Nid
enti
fied
as
po
int
sou
rce
bullIn
du
stry
in A
Zbull
Scru
bber
sin
stal
led
at
bullC
ars
in C
AN
avaj
o G
ener
atin
g P
lan
tbull
Sm
oke
stac
ks in
NM
NV
UT
bull10
r
edu
ctio
n o
f p
ollu
tio
nbull
Los
An
gel
esin
10ndash
15 y
ears
(g
oal
of
no
bullA
tlan
tam
an-m
ade
po
lluti
on
)bull
New
Yo
rk
Latricia Trites and Mary McGroarty 207
Appendix 2a Reading to Integrate task integration activity
Directions For the next 15 minutes reflect on what you read and compose a short essay that combines the information in thematerial read and makes connections across the range of ideas andarguments presented
mdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdash
mdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdash
mdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdash
mdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdash
mdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdash
mdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdash
mdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdash
mdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdash
mdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdash
mdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdash
mdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdash
mdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdash
mdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdash
mdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdash
mdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdash
mdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdash
mdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdash
mdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdash
mdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdashmdash
208 New tasks for reading comprehension tests
Appendix 2b Reading to Integrate scoring rubric
Integration50 Excellent Integrates texts accurately and successfully on
multiple levels and creates a true DocumentsModel Generalizes at least two macrostructureconcepts common across texts (this may besimply identifying the existence of amacrostructure followed by the supportingmacrostructures from each text) Effectivelyintegrates relevant support (ie details ormacrostructures) to support generalizations andmay still discuss each article separately to somedegree Integrates macrostructures present inboth texts
perform adequately on more challenging tasks like Reading to Learn and Reading to Integrate. Given the very large number of TOEFL test-takers around the world, this finding merits further research. It would be potentially unfair to exclude such students from university study based on basic comprehension-only measures such as the current TOEFL. We would urge institutions that use TOEFL to consider scores on these new, more demanding tasks if doing so would address institutional needs. According to interview data collected in conjunction with this project (Trites, 2000: Chapter 6), participants perceived that successful completion of the new tasks required a thorough understanding of the texts, whereas the multiple-choice tests were less demanding, requiring only superficial grasp of content. This finding corroborates the observation of Freedle and Kostin (1999: 3), who note that often examinees do not need to comprehend the accompanying text to answer the test item. Therefore, for all TOEFL test-takers, incorporating more challenging tasks such as the Reading to Learn and Reading to Integrate measures into typical English language instruction could have positive washback effects.
The second area of consideration is the focus on additional relevant research. For predictive validity, it is important to determine the correlation between the Reading to Learn/Reading to Integrate Composite measures and actual academic performance in university classes. Correlations might differ depending on the degree of categorization and synthesis required by different major fields of study. Another area for future research is related to the novelty of the Reading to Learn task as an assessment technique. Current pedagogical trends in literacy instruction emphasize the use of graphic organizers as a means to understand and manipulate information in texts. To our knowledge, graphic organizers are typically used as classroom activities rather than assessment tools. They have good potential for use in assessment if students are familiar with them and if appropriate scoring systems can be developed. This project demonstrates that it is possible, though labor intensive, to develop reliable scoring systems. Further research is needed to explore refinement of this task type and related scoring systems for use in testing programs with large numbers of test-takers and scorers. Although the Reading to Integrate task (generating a synthesis) was more familiar, the scoring system was innovative because it reflected a reader's ability to recognize textual frames as well as integrate information. If test developers are interested in the abilities of nonnative speaker readers to perform such tasks, further development of similar tasks and scoring systems is warranted. While a substantial investment of time would be required to refine the administration and scoring systems needed for more complex tasks such as these, it should be weighed against the possible danger of under-representing the likelihood of academic success based on results of the basic comprehension-only measures still most often used to assess academic proficiency.
Acknowledgements
This project was funded by a grant from Educational Testing Service as part of the TOEFL 2000 effort. We appreciate the cooperation of the ETS staff members who assisted in the development of passage-specific tasks and other aspects of the research; however, no official endorsement of Educational Testing Service should be inferred.
VII References
Bachman, L. 2000: Modern language testing at the turn of the century: assuring that what we count counts. Language Testing 17, 1–42.
Biber, D. 1993: Using register-diversified corpora for general language studies. Computational Linguistics 19, 219–41.
Britt, M., Rouet, J. and Perfetti, C. 1996: Using hypertext to study and reason about historical evidence. In Rouet, J., Levonen, J., Dillon, A. and Spiro, R., editors, Hypertext and cognition. Mahwah, NJ: Lawrence Erlbaum, 43–72.
Chapelle, C. 1999: Validity in language assessment. Annual Review of Applied Linguistics 19, 1–19.
Educational Testing Service 1997: TOEFL test and score manual. Princeton, NJ: Educational Testing Service.
Educational Testing Service 1998: Draft TOEFL 2000 research agenda framework: areas of research. Research agenda three. TOEFL 2000 internal document. Princeton, NJ: Educational Testing Service.
Eignor, D., Taylor, C., Kirsch, I. and Jamieson, J. 1998: Development of a scale for assessing the level of computer familiarity of TOEFL examinees. TOEFL Research Report No. 60. Princeton, NJ: Educational Testing Service.
Enright, M. and Schedl, M. 1999: Reading for a reason: using reader purpose to guide test design. TOEFL 2000 Internal Report. Princeton, NJ: Educational Testing Service.
Enright, M., Grabe, W., Mosenthal, P., Mulcahy-Ernt, P. and Schedl, M. 1998: A TOEFL 2000 framework for testing reading comprehension: a working paper. Princeton, NJ: Educational Testing Service.
Foltz, P. 1996: Comprehension, coherence, and strategies in hypertext. In Rouet, J., Levonen, J., Dillon, A. and Spiro, R., editors, Hypertext and cognition. Mahwah, NJ: Lawrence Erlbaum, 109–36.
Freedle, R. and Kostin, I. 1999: Does the text matter in a multiple choice test of comprehension? The case for the construct validity of TOEFL's minitalks. Language Testing 16, 2–32.
Goldman, S. 1997: Learning from text: reflections on the past and suggestions for the future. Discourse Processes 23, 357–98.
Hayes, J. and Hatch, J. 1999: Issues in measuring reliability: correlation versus percentage of agreement. Written Communication 16, 354–67.
Huberty, C. 1994: Applied discriminant analysis. New York: John Wiley.
Jamieson, J., Campbell, J., Norfleet, L. and Berbisada, N. 1993: Reliability of a computerized scoring routine for an open-ended task. System 21, 305–22.
Jamieson, J., Norfleet, L. and Berbisada, N. 1993: Successes, failures, and dropouts in computer-assisted language lessons. Computer Assisted English Language Learning Journal 4, 12–20.
Klecka, W. 1980: Discriminant analysis. In Lewis-Beck, M., editor, Quantitative applications in the social sciences, Volume 19. Newbury Park, CA: Sage.
Lehto, M., Zhu, W. and Carpenter, B. 1995: The relative effectiveness of hypertext and text. International Journal of Human-Computer Interaction 7, 293–313.
McNamara, D. and Kintsch, W. 1996: Learning from texts: effects of prior knowledge and text coherence. Discourse Processes 22, 247–88.
Messick, S. 1989: Validity. In Linn, R.L., editor, Educational measurement, 3rd edition. New York: American Council on Education/Macmillan, 13–103.
Meyer, B. 1985a: Prose analyses: purposes, procedures, and problems. In Britton, B. and Black, J., editors, Understanding expository text. Hillsdale, NJ: Lawrence Erlbaum, 11–64.
Meyer, B. 1985b: Prose analysis: purposes, procedures, and problems, Part 2. In Britton, B. and Black, J., editors, Understanding expository text. Hillsdale, NJ: Lawrence Erlbaum, 269–97.
Monks, V. 1997, August/September: Two views, same waterway. National Wildlife 35, 36–37.
Pellegrino, J., Baxter, G. and Glaser, R. 1999: Addressing the 'two disciplines' problem: linking theories of cognition and learning with assessment and instructional practice. Review of Research in Education 24, 307–53.
Perfetti, C. 1997: Sentences, individual differences, and multiple texts: three issues in text comprehension. Discourse Processes 23, 337–55.
Perfetti, C., Britt, M.A. and Georgi, M. 1995: Text-based learning and reasoning: studies in history. Hillsdale, NJ: Lawrence Erlbaum.
Perfetti, C., Marron, M. and Foltz, P. 1996: Sources of comprehension failure: theoretical perspectives and case studies. In Cornoldi, C. and Oakhill, J., editors, Reading comprehension difficulties: processes and intervention. Mahwah, NJ: Lawrence Erlbaum, 137–65.
Reinking, D. 1988: Computer-mediated text and comprehension differences: the role of reading time, reader preference, and estimation of learning. Reading Research Quarterly 23, 484–500.
Reinking, D. and Schreiner, R. 1985: The effects of computer-mediated text on measures of reading comprehension and reading behavior. Reading Research Quarterly 20, 536–53.
Spivey, N. 1997: The constructivist metaphor: reading, writing, and the making of meaning. San Diego, CA: Academic Press.
Stevens, J. 1996: Applied multivariate statistics for the social sciences, 3rd edition. Mahwah, NJ: Lawrence Erlbaum.
Tabachnick, B. and Fidell, L. 1996: Using multivariate statistics, 3rd edition. New York: Harper Collins.
Taylor, C., Jamieson, J., Eignor, D. and Kirsch, I. 1998: The relationship between computer familiarity and performance on computer-based TOEFL test tasks. TOEFL Research Report No. 61. Princeton, NJ: Educational Testing Service.
Tennesen, M. 1997, November/December: On a clear day. National Parks 71, 26–9.
Trites, L. 2000: Beyond basic comprehension: reading to learn and reading to integrate for native and non-native speakers. Unpublished doctoral dissertation, Northern Arizona University, Flagstaff, AZ.
Van den Berg, S. and Watt, J. 1991: Effects of educational setting on student responses to structured hypertext. Journal of Computer-Based Instruction 18, 118–24.
Van Dijk, T.A. and Kintsch, W. 1983: Strategies of discourse comprehension. New York: Academic Press.
Wiley, J. and Voss, J. 1999: Constructing arguments from multiple sources: tasks that promote understanding and not just memory for text. Journal of Educational Psychology 91, 310–11.
Zimmerman, T. 1997, December 29: Filter it with billions and billions of oysters: how to revive the Chesapeake Bay. US News and World Report 123, 63.
Appendix 1a Chart completion task

Directions: Complete the following chart. Fill in as much detail as possible from the text read by categorizing the information into the different areas on the chart. Do not use single words for your responses; form your responses in phrases or complete sentences. Include examples from the text.

Make no judgments about the accuracy of causes or effects or the effectiveness of the solutions mentioned in the text. Solutions are seen as any action taken in response to the problem(s). Solutions can take many forms, such as proposed solutions, attempted solutions, or failed solutions. Also, space is provided under each category for examples. Examples are specific examples found in the text that are used by the author(s) to exemplify the problems, causes, effects, or solutions in the text. However, there may not be examples for every category.

Points will be awarded for correct responses only. There is no penalty for incorrect responses. Points will be awarded in the following manner:

Problems and Solutions: 10 points each
Causes and Effects: 5 points each
Examples: 1 point each

Chart columns:  Problems   Causes     Effects    Solutions
Second row:     Examples   Examples   Examples   Examples
Appendix 1b Reading to Learn scoring rubric (0–241 total possible)

Problems (10 points); 1 word (6 points); 40 maximum
• Air pollution/Smog in the National Parks
• Opposition or Ignoring (or lack of cooperation) to environmental standards (regulations)
• No true 'point sources' (myriad of smaller pollution sources rather than one large source)
• National Parks limited jurisdiction of pollution sources outside of the parks

Causes (5 points); 1 word (3 points); 55 maximum
• Sulfur/nitrogen emissions
• Acid rain
• Ground-level ozone
• Urban/Industrial emissions
• Automobile emissions
• Power plants/factories/plants emissions
• Emissions from Smokestacks (chimneys)
• Emissions from Kilns
• Unregulated pollution/smog (from other countries, Mexico)
• Smoke from Controlled burns
• Emissions from large cities

Effects (5 points); 1 word (3 points); 30 maximum
• Trees and plants affected (injured), growth hindered
• Visibility decreased
• Metals loosened into waters (surface/ground water; water polluted)
• Nutrients removed (leached) from soil and/or plants
• Public outcry over stance of big business/government
• Aquatic life injured (damaged)

Solutions (10 points); 1 word (6 points); 90 maximum
• Stricter Environmental Laws (resolutions/acts)
• 1977 Clean Air Act Amendments labeling NPs as Class I areas
• Regional haze regulations proposed by the EPA
• Reduction of allowable pollution standards from industry
• Clean Air Act set visibility goals
• Objecting to construction permits
• Identification of pollution sources
• Industry modifications/Installation of devices (scrubbers) to reduce pollution
• Public pressure to protect parks

Appendix 1b (continued)

Examples of problems (1 point); 5 maximum
• Great Smoky Mountain National Park
• Grand Canyon National Park
• Smoky Mountain Sources from Ohio, New York, Atlanta, etc.
• Grand Canyon Sources from CA, NV, UT, AZ, NM and Mexico
• TN Luttrell corp. building permit situation

Examples of causes (1 point, or 5 points with overarching cause listed above)
• Automobile emissions
• Power plants/factories/plants emissions
• Emissions from Smokestacks (chimneys)
• Emissions from Kilns
• Unregulated pollution/smog (from other countries, Mexico)
• Smoke from Controlled burns
• Emissions from large cities

Examples of causes (1 point); 10 maximum
• Navajo Generating Station, Page, AZ (Power Plant, AZ)
• Power plants in TN and OH river valley
• Southern California Edison Plant, Laughlin, NV
• Tennessee Luttrell Kilns, TN
• Industry in AZ
• Cars in CA
• Smokestacks in NM, NV, UT
• Los Angeles
• Atlanta
• New York

Examples of effects (1 point); 6 maximum
• Leaves of plants turning purple and brown (stippling)
• Hindering photosynthesis of plants and trees
• Reduction of visibility at Grand Canyon
• Visibility reduced 90% of the days
• See vague blue masses
• See half as far as in 1919

Examples of solutions (1 point, or 10 points with overarching solution listed above)
• 1977 Clean Air Act Amendments labeling NPs as Class I areas
• Regional haze regulations proposed by the EPA
• Reduction of allowable pollution standards from industry

Examples of solutions (1 point); 5 maximum
• Objecting to kilns in TN
• Researchers using scientific technology to ID sources (radioactive isotopes)
• Southern California Edison Plant in Laughlin, NV identified as point source
• Scrubbers installed at Navajo Generating Plant
• 10% reduction of pollution in 10–15 years (goal of no man-made pollution)
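The point values and category maxima in the Reading to Learn rubric can be read as a capped tally: each correct response earns its category's points, a one-word response earns the reduced value, and each category and each example row stops counting at its stated maximum. The following Python sketch illustrates that arithmetic only. The `CategoryTally` layout and the function name are hypothetical, treating "1 word" as partial credit is our reading of the rubric, and the alternative rule that awards a full category value to an example listed together with its overarching cause or solution is deliberately left out.

```python
from dataclasses import dataclass

# Values as read from Appendices 1a and 1b; the dictionary layout is an assumption.
FULL_POINTS = {"problems": 10, "causes": 5, "effects": 5, "solutions": 10}
ONE_WORD_POINTS = {"problems": 6, "causes": 3, "effects": 3, "solutions": 6}
CATEGORY_MAX = {"problems": 40, "causes": 55, "effects": 30, "solutions": 90}
EXAMPLE_MAX = {"problems": 5, "causes": 10, "effects": 6, "solutions": 5}


@dataclass
class CategoryTally:
    """Hypothetical container for one rater's counts in one category."""
    full: int = 0      # correct responses written as phrases or sentences
    one_word: int = 0  # correct responses written as a single word
    examples: int = 0  # correct examples, 1 point each


def reading_to_learn_score(tallies):
    """Sum capped category points and capped example points (0 to 241)."""
    total = 0
    for category, tally in tallies.items():
        main = (tally.full * FULL_POINTS[category]
                + tally.one_word * ONE_WORD_POINTS[category])
        total += min(main, CATEGORY_MAX[category])
        total += min(tally.examples, EXAMPLE_MAX[category])
    return total


if __name__ == "__main__":
    # Illustrative counts only, not data from the study.
    sample = {
        "problems": CategoryTally(full=1, one_word=1, examples=2),
        "causes": CategoryTally(full=3),
        "effects": CategoryTally(one_word=2, examples=1),
        "solutions": CategoryTally(full=2),
    }
    print(reading_to_learn_score(sample))
```

Under these assumptions, a response that hits every listed item and every example reaches the rubric's stated ceiling of 241 (40 + 55 + 30 + 90 for the categories plus 5 + 10 + 6 + 5 for the examples).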
Appendix 2a Reading to Integrate task: integration activity

Directions: For the next 15 minutes, reflect on what you read and compose a short essay that combines the information in the material read and makes connections across the range of ideas and arguments presented.

[Ruled lines for the participant's written response]
Appendix 2b Reading to Integrate scoring rubric

Integration

50 Excellent: Integrates texts accurately and successfully on multiple levels and creates a true Documents Model. Generalizes at least two macrostructure concepts common across texts (this may be simply identifying the existence of a macrostructure followed by the supporting macrostructures from each text). Effectively integrates relevant support (i.e., details or macrostructures) to support generalizations and may still discuss each article separately to some degree. Integrates macrostructures present in both texts.

40 Good: Integrates through the creation of a well-developed and accurate introduction and/or a conclusion, yet summarizes articles separately, AND/OR generalizes one major macrostructure that is fully developed with support. Uses some relevant support.

30 Fair: Attempts to integrate through the creation of a partially developed, possibly inaccurate introductory AND/OR concluding statement, usually related to the main topic of the articles, but has no substantive development. Creates a Text and Situation Model of each text separately. Does not make generalizations for integration beyond the one statement. May make evaluative or editorial statements; however, these may be inaccurate.

20 Poor: Creates a well-developed and accurate Text Model and Situation Model for each of the texts, yet attempts no integrative connection across the texts.

10 Very Poor: Ineffectively or inaccurately attempts to summarize each article in a very reporting style, revealing little or no use of background knowledge, contextualization, or evaluation. Response is simply a recall of information from the texts. No introduction or conclusion, OR participant confuses texts and sees separate texts as one issue (problem).

0 No Response: Participant does not attempt response or only addresses one article.

Macrostructures

Participants will receive one point for each macrostructure accurately identified in each text, with a minimum score of 4 awarded for the inability to accurately identify any macrostructures.

25 Excellent (Total 8): Accurately identifies all 4 macrostructures present in both texts.

20 Good (Total 6–7): Accurately identifies all 4 macrostructures in one text and 3 in the other, OR accurately identifies 3 of the four macrostructures in both texts. (Or 4 in one and 2 in the other.)

15 Fair (Total 4–5): Accurately identifies 3 of the four macrostructures in one text and 2 of the four in the other. (Or 4 in one and 1 in the other.) OR accurately identifies 2 of the four macrostructures in both texts. (Or 4 in one and 0 in the other, or 3 in one and 1 in the other.)

10 Poor (Total 2–3): Accurately identifies 2 of the macrostructures in one text and 1 in the other. (Or 3 in one and 0 in the other.) Accurately identifies 1 of the four macrostructures in both texts. (Or 2 in one and 0 in the other.)

5 Very Poor (Total 0–1): Accurately identifies 1 of the four macrostructures in one text and 0 in the other, OR unable to accurately identify any macrostructures in the texts.

0 No Response: Participant does not attempt response.

Use of relevant details

5 Excellent: Effectively uses multiple relevant details as support; no irrelevant or erroneous details.

4 Good: Effectively uses relevant details as support; may include some inaccurate or irrelevant details. (More than 50% of details are relevant.)

3 Fair: Possibly uses one or two relevant details, yet several irrelevant details or inaccurate details (erroneous or fabricated) appear in the synthesis. (50% or less of details are relevant.)

2 Poor: Erroneous or fabricated details used in attempt to support arguments presented; does not include relevant details.

1 Very Poor: No details used or listed.

0 No Response: Participant does not attempt response.
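The Macrostructures part of the rubric is effectively a lookup from the combined count of macrostructures identified across the two texts (0 to 8) to a band score. A minimal Python sketch of that mapping follows. The function name and the `attempted` flag are hypothetical, and the sketch uses the band table itself (5 points for a total of 0 or 1) rather than the prose statement about a minimum score.

```python
def macrostructure_band(count_text_a, count_text_b, attempted=True):
    """Map macrostructures identified in each text (0-4 each) to a band score.

    Bands follow the rubric totals: 8 -> 25, 6-7 -> 20, 4-5 -> 15,
    2-3 -> 10, 0-1 -> 5. A participant who gives no response scores 0.
    """
    if not attempted:
        return 0
    for count in (count_text_a, count_text_b):
        if not 0 <= count <= 4:
            raise ValueError("each text has four macrostructures (0-4)")
    total = count_text_a + count_text_b
    if total == 8:
        return 25
    if total >= 6:
        return 20
    if total >= 4:
        return 15
    if total >= 2:
        return 10
    return 5


if __name__ == "__main__":
    # The parenthetical cases in the rubric fall out of the totals:
    # 4 in one text and 2 in the other is "Good", 3 and 1 is "Fair".
    print(macrostructure_band(4, 2), macrostructure_band(3, 1))
```

Working from the total rather than the individual pairings reproduces every combination the rubric lists, which suggests the parenthetical alternatives are enumerations of the same totals and not separate rules.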
would be required to refine the administration and scoring systemsneeded for more complex tasks such as these it should be weighedagainst the possible danger of under-representing the likelihood ofacademic success based on results of the basic comprehension-onlymeasures still most often used to assess academic proficiency
Acknowledgements
This project was funded by a grant from Educational Testing Serviceas part of the TOEFL 2000 effort We appreciate the cooperation ofthe ETS staff members who assisted in the development of passagespecific tasks and other aspects of the research however no officialendorsement of Educational Testing Service should be inferred
VII References
Bachman L 2000 Modern language testing at the turn of the century assur-ing that what we count counts Language Testing 17 1ndash42
Biber D 1993 Using register-diversified corpora for general language studiesComputational Linguistics 19 219ndash41
Britt M Rouet J and Perfetti C 1996 Using hypertext to study and reasonabout historical evidence In Rouet J Levonen J Dillon A and Spiro Reditors Hypertext and cognition Mahwah NJ Lawrence Erlbaum 43ndash72
Chapelle C 1999 Validity in language assessment Annual Review of AppliedLinguistics 19 1ndash19
Educational Testing Service 1997 TOEFL test and score manual PrincetonNJ Educational Testing Service
mdashmdash 1998 Draft TOEFL 2000 research agenda framework areas of researchResearch agenda three TOEFL 2000 internal document Princeton NJEducational Testing Service
Eignor D Taylor C Kirsch I and Jamieson J 1998 Development of ascale for assessing the level of computer familiarity of TOEFL exami-nees TOEFL Research Report No 60 Princeton NJ EducationalTesting Service
Enright M and Schedl M 1999 Reading for a reason using reader purposeto guide test design TOEFL 2000 Internal Report Princeton NJEducational Testing Service
Enright M Grabe W Mosenthal P Mulcahy-Ernt P and Schedl M1998 A TOEFL 2000 framework for testing reading comprehension aworking paper Princeton NJ Educational Testing Service
Foltz P 1996 Comprehension coherence and strategies in hypertext InRouet J Levonen J Dillon A and Spiro R editors Hypertext and cog-nition Mahwah NJ Lawrence Erlbaum 109ndash36
Latricia Trites and Mary McGroarty 201
Freedle R and Kostin I 1999 Does the text matter in a multiple choice testof comprehension The case for the construct validity of TOEFLrsquosminitalks Language Testing 16 2ndash32
Goldman S 1997 Learning from text reflections on the past and suggestionsfor the future Discourse Processes 23 357ndash98
Hayes J and Hatch J 1999 Issues in measuring reliability correlationversus percentage of agreement Written Communication 16 354ndash67
Huberty C 1994 Applied discriminant analysis New York John WileyJamieson J Campbell J Norfleet L and Berbisada N 1993 Reliability
of a computerized scoring routine for an open-ended task System 21305ndash22
Jamieson J Norfleet L and Berbisada N 1993 Successes failures anddropouts in computer-assisted language lessons Computer AssistedEnglish Language Learning Journal 4 12ndash20
Klecka W 1980 Discriminant analysis In Lewis-Beck M editorQuantitative applications in the social sciences Volume 19 NewburyPark CA Sage
Lehto M Zhu W and Carpenter B 1995 The relative effectiveness ofhypertext and text International Journal of Human-ComputerInteraction 7 293ndash313
McNamara D and Kintsch W 1996 Learning from texts effects of prior knowl-edge and text coherence Discourse Processes 22 247ndash88
Messick S 1989 Validity In Linn RL editor Educational measurement 3rdedition New York American Council on Education Macmillan 13ndash103
Meyer B 1985a Prose analyses purposes procedures and problems InBritton B and Black J editors Understanding expository text HillsideNJ Lawrence Erlbaum 11ndash64
mdashmdash 1985b Prose analysis purposes procedures and problems Part 2 InBritton B and Black J editors Understanding expository text HillsideNJ Lawrence Erlbaum 269ndash97
Monks V 1997 AugustSeptember Two views same waterway NationalWildlife 35 36ndash37
Pellegrino J Baxter G and Glaser R 1999 Addressing the lsquotwo disci-plinesrsquo problem linking theories of cognition and learning with assess-ment and instructional practice Review of Research in Education 24307ndash53
Perfetti C 1997 Sentences individual differences and multiple texts threeissues in text comprehension Discourse Processes 23 337ndash55
Perfetti C Britt MA and Georgi M 1995 Text-based learning andreasoning studies in history Hillside NJ Lawrence Erlbaum
Perfetti C Marron M and Foltz P 1996 Sources of comprehension failuretheoretical perspectives and case studies In Cornoldi C and Oakhill Jeditors Reading comprehension difficulties processes and interventionMahwah NJ Lawrence Erlbaum 137ndash65
202 New tasks for reading comprehension tests
Reinking D 1988 Computer-mediated text and comprehension differencesthe role of reading time reader preference and estimation of learningReading Research Quarterly 23 484ndash500
Reinking D and Schreiner R 1985 The effects of computer-mediated texton measures of reading comprehension and reading behavior ReadingResearch Quarterly 20 536ndash53
Spivey N 1997 The constructivist metaphor reading writing and themaking of meaning San Diego CA Academic Press
Stevens J 1996 Applied multivariate statistics for the social sciences 3rdedition Mahwah NJ Lawrence Erlbaum
Tabachnick B and Fidell L 1996 Using multivariate statistics 3rd editionNew York Harper Collins
Taylor C Jamieson J Eignor D and Kirsch I 1998 The relationshipbetween computer familiarity and performance on computer-basedTOEFL test tasks TOEFL Research Report No 61 Princeton NJEducational Testing Service
Tennesen M 1997 NovemberDecember On a clear day National Parks 7126ndash9
Trites L 2000 Beyond basic comprehension reading to learn and reading tointegrate for native and non-native speakers Unpublished doctoraldissertation Northern Arizona University Flagstaff AZ
Van den Berg S and Watt J 1991 Effects of educational setting on studentresponses to structured hypertext Journal of Computer-BasedInstruction 18 118ndash24
Van Dijk TA and Kintsch W 1983 Strategies of discourse comprehensionNew York Academic Press
Wiley J and Voss J 1999 Constructing arguments from multiple sourcestasks that promote understanding and not just memory for text Journalof Educational Psychology 91 310ndash11
Zimmerman T 1997 December 29 Filter it with billions and billions of oys-ters how to revive the Chesapeake Bay US News and World Report 12363
Appendix 1a Chart completion task
Directions Complete the following chart Fill in as much detail aspossible from the text read by categorizing the information into thedifferent areas on the chart Do not use single words for yourresponses form your responses in phrases or complete sentencesInclude examples from the text
Make no judgments about the accuracy of causes or effects or theeffectiveness of the solutions mentioned in the text Solutions are seen
Latricia Trites and Mary McGroarty 203
204 New tasks for reading comprehension tests
as any action taken in response to the problem(s) Solutions can takemany forms such as proposed solutions attempted solutions or failedsolutions Also space is provided under each category for examplesExamples are specific examples found in the text that are used by theauthor(s) to exemplify the problems causes effects or solutions inthe text However there may not be examples for every category
Points will be awarded for correct responses only There is no penaltyfor incorrect responses Points will be awarded in the following manner
Problems and Solutions 10 points eachCauses and Effects 5 points eachExamples 1 point each
Problems Causes Effects Solutions
Examples Examples Examples Examples
Latricia Trites and Mary McGroarty 205
Appendix 1b Reading to Learn scoring rubric (0–241 total possible)

Problems (10 points); 1 word (6 points); 40 maximum
• Air pollution/smog in the National Parks
• Opposition to or ignoring (or lack of cooperation with) environmental standards (regulations)
• No true 'point sources' (myriad of smaller pollution sources rather than one large source)
• National Parks' limited jurisdiction over pollution sources outside of the parks

Causes (5 points); 1 word (3 points); 55 maximum
• Sulfur/nitrogen emissions
• Acid rain
• Ground-level ozone
• Urban/industrial emissions
• Automobile emissions
• Power plants/factories emissions
• Emissions from smokestacks (chimneys)
• Emissions from kilns
• Unregulated pollution/smog (from other countries, Mexico)
• Smoke from controlled burns
• Emissions from large cities

Effects (5 points); 1 word (3 points); 30 maximum
• Trees and plants affected (injured), growth hindered
• Visibility decreased
• Metals loosened into waters (surface/ground water, plants, water polluted)
• Nutrients removed (leached) from soil and/or plants
• Public outcry over stance of big business/government
• Aquatic life injured (damaged)

Solutions (10 points); 1 word (6 points); 90 maximum
• Stricter environmental laws (resolutions, acts)
• 1977 Clean Air Act Amendments labeling NPs as Class I areas
• Regional haze regulations proposed by the EPA
• Reduction of allowable pollution standards from industry
• Clean Air Act set visibility goals
• Objecting to construction permits
• Identification of pollution sources
• Industry modifications: installation of devices (scrubbers) to reduce pollution
• Public pressure to protect parks
Appendix 1b (continued)

Problems: Examples (1 point); 5 maximum
• Great Smoky Mountain National Park
• Grand Canyon National Park
• Smoky Mountain sources from Ohio, New York, Atlanta, etc.
• Grand Canyon sources from CA, NV, UT, AZ, NM and Mexico

Causes: Examples (1 point, or 5 points with overarching cause listed above)
• Automobile emissions
• Power plants/factories/plants emissions
• Emissions from smokestacks (chimneys)
• Emissions from kilns
• Unregulated pollution/smog (from other countries, Mexico)
• Smoke from controlled burns
• Emissions from large cities

Causes: Examples (1 point); 10 maximum
• Navajo Generating Station, Page, AZ (power plant, AZ)
• Power plants in TN and OH river valley
• Southern California Edison plant, Laughlin, NV
• Tennessee Luttrell kilns, TN
• Industry in AZ
• Cars in CA
• Smokestacks in NM, NV, UT
• Los Angeles
• Atlanta
• New York

Effects: Examples (1 point); 6 maximum
• Leaves of plants turning purple and brown (stippling)
• Hindering photosynthesis of plants and trees
• Reduction of visibility at Grand Canyon
• Visibility reduced 90% of the days
• See vague blue masses
• See half as far as in 1919

Solutions: Examples (1 point, or 10 points with overarching solution listed above)
• 1977 Clean Air Act Amendments labeling NPs as Class I areas
• Regional haze regulations proposed by the EPA
• Reduction of allowable pollution standards from industry
• TN Luttrell corp. building permit situation

Solutions: Examples (1 point); 5 maximum
• Objecting to kilns in TN
• Researchers using scientific technology to ID sources (radioactive isotopes)
• Southern California Edison plant in Laughlin, NV identified as point source
• Scrubbers installed at Navajo Generating Plant
• 10% reduction of pollution in 10–15 years (goal of no man-made pollution)
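The point values and category maximums in Appendix 1b can be read as a simple capped tally. The following is a minimal sketch in Python, not the authors' scoring procedure: the names `CATEGORY_RULES`, `EXAMPLE_CAPS`, `CategoryTally` and `score_reading_to_learn` are hypothetical, and the sketch assumes that a correct one-word response earns the reduced value (6 or 3 points) and that every example is counted at 1 point, ignoring the higher credit the rubric allows when an example restates an overarching cause or solution.

```python
from dataclasses import dataclass

# (points per full response, points per one-word response, category maximum),
# as read from the Appendix 1b column headers.
CATEGORY_RULES = {
    "problems": (10, 6, 40),
    "causes": (5, 3, 55),
    "effects": (5, 3, 30),
    "solutions": (10, 6, 90),
}

# Maximum number of 1-point examples credited per category.
EXAMPLE_CAPS = {"problems": 5, "causes": 10, "effects": 6, "solutions": 5}


@dataclass
class CategoryTally:
    full: int = 0      # correct responses written as phrases or sentences
    one_word: int = 0  # correct responses written as a single word
    examples: int = 0  # correct examples, each assumed to be worth 1 point


def score_category(name: str, tally: CategoryTally) -> int:
    """Score one chart column, applying the category and example caps."""
    full_pts, word_pts, cap = CATEGORY_RULES[name]
    main = min(tally.full * full_pts + tally.one_word * word_pts, cap)
    examples = min(tally.examples, EXAMPLE_CAPS[name])
    return main + examples


def score_reading_to_learn(tallies: dict) -> int:
    """Sum the four columns; categories missing from `tallies` score zero."""
    return sum(
        score_category(name, tallies.get(name, CategoryTally()))
        for name in CATEGORY_RULES
    )
```

Under these assumptions a chart with every listed item correct reaches the stated ceiling of 241 points (40 + 55 + 30 + 90 for the main entries, plus 5 + 10 + 6 + 5 for examples).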
Appendix 2a Reading to Integrate task: integration activity
Directions: For the next 15 minutes, reflect on what you read and compose a short essay that combines the information in the material read and makes connections across the range of ideas and arguments presented.
[Ruled lines provided for the participant's essay]
Appendix 2b Reading to Integrate scoring rubric
Integration

50 Excellent: Integrates texts accurately and successfully on multiple levels and creates a true Documents Model. Generalizes at least two macrostructure concepts common across texts (this may be simply identifying the existence of a macrostructure, followed by the supporting macrostructures from each text). Effectively integrates relevant support (i.e., details or macrostructures) to support generalizations and may still discuss each article separately to some degree. Integrates macrostructures present in both texts.

40 Good: Integrates through the creation of a well-developed and accurate introduction and/or a conclusion, yet summarizes articles separately, AND/OR generalizes one major macrostructure that is fully developed with support. Uses some relevant support.

30 Fair: Attempts to integrate through the creation of a partially developed, possibly inaccurate, introductory AND/OR concluding statement, usually related to the main topic of the articles, but has no substantive development. Creates a Text and Situation Model of each text separately. Does not make generalizations for integration beyond the one statement. May make evaluative or editorial statements; however, these may be inaccurate.

20 Poor: Creates a well-developed and accurate Text Model and Situation Model for each of the texts, yet attempts no integrative connection across the texts.

10 Very Poor: Ineffectively or inaccurately attempts to summarize each article in a very reporting style, revealing little or no use of background knowledge, contextualization, or evaluation. Response is simply a recall of information from the texts. No introduction or conclusion, OR participant confuses texts and sees separate texts as one issue (problem).

0 No Response: Participant does not attempt response or only addresses one article.
Macrostructures

Participants will receive one point for each macrostructure accurately identified in each text, with a minimum score of 4 awarded for the inability to accurately identify any macrostructures.

25 Excellent (Total 8): Accurately identifies all 4 macrostructures present in both texts.

20 Good (Total 6–7): Accurately identifies all 4 macrostructures in one text and 3 in the other, OR accurately identifies 3 of the four macrostructures in both texts (or 4 in one and 2 in the other).

15 Fair (Total 4–5): Accurately identifies 3 of the four macrostructures in one text and 2 of the four in the other (or 4 in one and 1 in the other), OR accurately identifies 2 of the four macrostructures in both texts (or 4 in one and 0 in the other, or 3 in one and 1 in the other).

10 Poor (Total 2–3): Accurately identifies 2 of the macrostructures in one text and 1 in the other (or 3 in one and 0 in the other). Accurately identifies 1 of the four macrostructures in both texts (or 2 in one and 0 in the other).

5 Very Poor (Total 0–1): Accurately identifies 1 of the four macrostructures in one text and 0 in the other, OR unable to accurately identify any macrostructures in the texts.

0 No Response: Participant does not attempt response.
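Every combination listed in the macrostructure bands depends only on the total number of macrostructures identified across the two texts, so the bands can be expressed as a lookup on that total. The sketch below illustrates this reading; the function name `macrostructure_score` and the `attempted` flag are hypothetical conveniences, not part of the rubric.

```python
def macrostructure_score(found_text_1: int, found_text_2: int,
                         attempted: bool = True) -> int:
    """Map macrostructure counts (0-4 per text) to the rubric band score."""
    if not attempted:
        return 0  # No Response
    for count in (found_text_1, found_text_2):
        if not 0 <= count <= 4:
            raise ValueError("each text has four macrostructures (count 0-4)")
    total = found_text_1 + found_text_2
    if total == 8:
        return 25  # Excellent
    if total >= 6:
        return 20  # Good (total 6-7)
    if total >= 4:
        return 15  # Fair (total 4-5)
    if total >= 2:
        return 10  # Poor (total 2-3)
    return 5       # Very Poor (total 0-1)
```

For instance, a response that identifies three macrostructures in one text and one in the other has a total of 4 and falls in the Fair band.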
Use of relevant details
5 Excellent: Effectively uses multiple relevant details as support; no irrelevant or erroneous details.

4 Good: Effectively uses relevant details as support; may include some inaccurate or irrelevant details. (More than 50% of details are relevant.)

3 Fair: Possibly uses one or two relevant details, yet several irrelevant details or inaccurate details (erroneous or fabricated) appear in the synthesis. (50% or less of details are relevant.)

2 Poor: Erroneous or fabricated details used in attempt to support arguments presented; does not include relevant details.

1 Very Poor: No details used or listed.

0 No Response: Participant does not attempt response.
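The bands for use of relevant details turn on the share of details that are relevant. One possible operationalization is sketched below. It is an illustration only: the function name `detail_score` is hypothetical, 'multiple' is assumed to mean at least two relevant details, and a rater's judgment of whether details are used 'effectively' is not modeled.

```python
def detail_score(relevant: int, irrelevant: int, attempted: bool = True) -> int:
    """Map counts of relevant and irrelevant/erroneous details to a 0-5 band."""
    if not attempted:
        return 0  # No Response
    if relevant < 0 or irrelevant < 0:
        raise ValueError("detail counts cannot be negative")
    total = relevant + irrelevant
    if total == 0:
        return 1  # Very Poor: no details used or listed
    if relevant == 0:
        return 2  # Poor: only erroneous or fabricated details
    if irrelevant == 0 and relevant >= 2:
        return 5  # Excellent: multiple relevant details, none irrelevant
    if relevant / total > 0.5:
        return 4  # Good: more than 50% of details are relevant
    return 3      # Fair: 50% or less of details are relevant
```

Under these assumptions a single relevant detail with no irrelevant ones scores 4 rather than 5, because the top band asks for multiple relevant details.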
Appendix 1a Chart completion task
Directions Complete the following chart Fill in as much detail aspossible from the text read by categorizing the information into thedifferent areas on the chart Do not use single words for yourresponses form your responses in phrases or complete sentencesInclude examples from the text
Make no judgments about the accuracy of causes or effects or theeffectiveness of the solutions mentioned in the text Solutions are seen
Latricia Trites and Mary McGroarty 203
204 New tasks for reading comprehension tests
as any action taken in response to the problem(s) Solutions can takemany forms such as proposed solutions attempted solutions or failedsolutions Also space is provided under each category for examplesExamples are specific examples found in the text that are used by theauthor(s) to exemplify the problems causes effects or solutions inthe text However there may not be examples for every category
Points will be awarded for correct responses only There is no penaltyfor incorrect responses Points will be awarded in the following manner
Problems and Solutions 10 points eachCauses and Effects 5 points eachExamples 1 point each
Problems Causes Effects Solutions
Examples Examples Examples Examples
Latricia Trites and Mary McGroarty 205
Ap
pen
dix
1b
Rea
din
g t
o L
earn
sco
rin
g r
ub
ric
0ndash2
41 t
ota
l po
ssib
le
Pro
ble
ms
(10
po
ints
) 1
wo
rd
Cau
ses
(5p
oin
ts)
1 w
ord
E
ffec
ts (
5 p
oin
ts)
1 w
ord
S
olu
tio
ns
(10
po
ints
en
d)
1 w
ord
(6 p
oin
ts)
40
max
imu
m(3
po
ints
) 5
5 m
axim
um
(3 p
oin
ts)
30
max
imu
m(6
po
ints
) 9
0 m
axim
um
bullA
ir p
ollu
tio
nS
mo
g i
n t
he
bullS
ulf
ur
nit
rog
en e
mis
sio
ns
bullTre
es a
nd
pla
nts
aff
ecte
dS
tric
ter
En
vir
on
men
tal
Law
s
Nati
on
al
Park
sbull
Aci
d r
ain
(in
jure
d)
gro
wth
hin
der
ed(r
eso
luti
on
s
acts
)
bullO
pp
osit
ion
or
Ign
ori
ng
bull
Gro
un
d-l
evel
ozo
ne
bullV
isib
ilit
y d
ecre
ased
bull19
77 C
lean
Air
Act
(or
lack
of
coo
per
atio
n)
to
Urb
an
In
du
str
ial
em
issio
ns
bullM
eta
ls l
oo
sen
ed
into
wat
ers
Am
en
dm
en
ts l
ab
elin
gN
Ps
envi
ron
men
tal s
tan
dar
ds
bullA
uto
mo
bile
emis
sio
ns
(su
rfac
eg
rou
nd
wat
er
as C
lass I
are
as
(reg
ula
tio
ns)
bullP
ow
er
pla
nts
fac
tori
esp
lan
ts
wat
er p
ollu
ted
)bull
Reg
ion
al h
aze
reg
ula
tio
ns
bullN
o t
rue
lsquopo
int
sou
rces
rsquoem
issi
on
sbull
Nu
trie
nts
rem
oved
pro
po
sed
by
the
EPA
(myri
ad
of
sm
aller
po
llu
tio
n
bullE
mis
sio
ns
fro
m S
mo
kesta
cks
(lea
ched
) fr
om
so
il bull
Red
ucti
on
of
allo
wab
leso
urc
es
rath
er t
han
on
e (c
him
neys)
and
or
pla
nts
po
llu
tio
n s
tan
dard
s f
rom
la
rge
sou
rce)
bullE
mis
sio
ns
fro
m K
iln
sbull
Pu
blic o
utc
ryo
ver
stan
ce
ind
ust
rybull
Nati
on
al
Park
s l
imit
ed
bull
Un
reg
ula
ted
po
lluti
on
sm
og
o
f b
ig b
usi
nes
sg
ove
rnm
ent
bullC
lean
Air
Act
set
ju
risd
icti
on
of
po
lluti
on
(f
rom
oth
er c
ou
ntr
ies
Mex
ico
)bull
Aq
uati
c l
ife in
jure
d (
dam
aged
)vis
ibilit
y g
oals
sou
rces
ou
tsid
e o
f th
e p
arks
bullS
mo
ke f
rom
Co
ntr
olled
bu
rns
bullO
bje
cti
ng
to c
on
str
ucti
on
bullE
mis
sio
ns
fro
m la
rge
citi
esp
erm
its
bullId
en
tifi
cati
on
of
po
lluti
on
so
urc
es
bullIn
du
str
y m
od
ificati
on
s
Insta
llati
on
o
f d
evic
es
(scr
ub
ber
s)
to r
edu
ce
po
lluti
on
bullP
ub
lic p
ressu
re t
op
rote
ct p
arks
as any action taken in response to the problem(s). Solutions can take many forms, such as proposed solutions, attempted solutions, or failed solutions. Also, space is provided under each category for examples. Examples are specific examples found in the text that are used by the author(s) to exemplify the problems, causes, effects, or solutions in the text. However, there may not be examples for every category.

Points will be awarded for correct responses only. There is no penalty for incorrect responses. Points will be awarded in the following manner:

Problems and Solutions: 10 points each
Causes and Effects: 5 points each
Examples: 1 point each

Problems      Causes        Effects       Solutions

Examples      Examples      Examples      Examples
Appendix 1b Reading to Learn scoring rubric (0–241 total possible)

Problems (10 points), 1 word (6 points), 40 maximum
• Air pollution/Smog in the National Parks
• Opposition or ignoring (or lack of cooperation) to environmental standards (regulations)
• No true 'point sources' (myriad of smaller pollution sources rather than one large source)
• National Parks limited jurisdiction of pollution sources outside of the parks

Causes (5 points), 1 word (3 points), 55 maximum
• Sulfur/nitrogen emissions
• Acid rain
• Ground-level ozone
Urban/Industrial emissions
• Automobile emissions
• Power plants/factories/plants emissions
• Emissions from smokestacks (chimneys)
• Emissions from kilns
• Unregulated pollution/smog (from other countries, Mexico)
• Smoke from controlled burns
• Emissions from large cities

Effects (5 points), 1 word (3 points), 30 maximum
• Trees and plants affected (injured), growth hindered
• Visibility decreased
• Metals loosened into waters (surface/ground water polluted)
• Nutrients removed (leached) from soil and/or plants
• Public outcry over stance of big business/government
• Aquatic life injured (damaged)

Solutions (10 points), 1 word (6 points), 90 maximum
Stricter Environmental Laws (resolutions/acts)
• 1977 Clean Air Act Amendments labeling NPs as Class I areas
• Regional haze regulations proposed by the EPA
• Reduction of allowable pollution standards from industry
• Clean Air Act set visibility goals
• Objecting to construction permits
• Identification of pollution sources
• Industry modifications: installation of devices (scrubbers) to reduce pollution
• Public pressure to protect parks
Appendix 1b (continued)

Problems: Examples (1 point), 5 maximum
• Great Smoky Mountain National Park
• Grand Canyon National Park
• Smoky Mountain sources from Ohio, New York, Atlanta, etc.
• Grand Canyon sources from CA, NV, UT, AZ, NM, and Mexico

Causes: Examples (1 point, or 5 points with overarching cause listed above)
• Automobile emissions
• Power plants/factories/plants emissions
• Emissions from smokestacks (chimneys)
• Emissions from kilns
• Unregulated pollution/smog (from other countries, Mexico)
• Smoke from controlled burns
• Emissions from large cities

Causes: Examples (1 point), 10 maximum
• Navajo Generating Station, Page, AZ (Power Plant, AZ)
• Power plants in TN and OH river valley
• Southern California Edison Plant, Laughlin, NV
• Tennessee Luttrell Kilns, TN
• Industry in AZ
• Cars in CA
• Smokestacks in NM, NV, UT
• Los Angeles
• Atlanta
• New York

Effects: Examples (1 point), 6 maximum
• Leaves of plants turning purple and brown (stippling)
• Hindering photosynthesis of plants and trees
• Reduction of visibility at Grand Canyon
• Visibility reduced 90% of the days
• See vague blue masses
• See half as far as in 1919

Solutions: Examples (1 point, or 10 points with overarching solution listed above)
• 1977 Clean Air Act Amendments labeling NPs as Class I areas
• Regional haze regulations proposed by the EPA
• Reduction of allowable pollution standards from industry
• TN Luttrell corp building permit situation

Solutions: Examples (1 point), 5 maximum
• Objecting to kilns in TN
• Researchers using scientific technology to ID sources (radioactive isotopes)
• Southern California Edison Plant in Laughlin, NV identified as point source
• Scrubbers installed at Navajo Generating Plant
• 10% reduction of pollution in 10–15 years (goal of no man-made pollution)
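As a rough illustration of how the Appendix 1b point values and category maxima combine, the following Python sketch tallies a Reading to Learn score. The names RUBRIC, CategoryTally, and score_reading_to_learn are hypothetical and do not come from the study. The sketch assumes that "1 word" denotes reduced credit for a correct one-word response, and it leaves out the alternative 5- or 10-point credit for overarching causes and solutions.

```python
from dataclasses import dataclass

# Values read from Appendix 1b.
# Tuple layout: (full points, one-word points, category maximum, examples maximum)
RUBRIC = {
    "problems": (10, 6, 40, 5),
    "causes": (5, 3, 55, 10),
    "effects": (5, 3, 30, 6),
    "solutions": (10, 6, 90, 5),
}


@dataclass
class CategoryTally:
    """Counts of correct responses in one category (hypothetical structure)."""
    full: int = 0      # correct, fully stated responses
    one_word: int = 0  # correct one-word responses (assumed partial credit)
    examples: int = 0  # correct examples, 1 point each


def score_reading_to_learn(tallies):
    """Sum capped category and example points; incorrect responses carry no penalty."""
    total = 0
    for name, (full_pts, word_pts, cat_max, ex_max) in RUBRIC.items():
        tally = tallies.get(name, CategoryTally())
        main = min(tally.full * full_pts + tally.one_word * word_pts, cat_max)
        examples = min(tally.examples, ex_max)
        total += main + examples
    return total


# The category and example maxima together give the rubric's stated total.
MAX_TOTAL = sum(cat_max + ex_max for _, _, cat_max, ex_max in RUBRIC.values())
```

Under these assumptions the maxima sum to 241, which matches the "0–241 total possible" stated in the rubric heading.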
Appendix 2a Reading to Integrate task: integration activity

Directions: For the next 15 minutes, reflect on what you read and compose a short essay that combines the information in the material read and makes connections across the range of ideas and arguments presented.

[Lined space for the participant's essay]
Appendix 2b Reading to Integrate scoring rubric

Integration

50 Excellent: Integrates texts accurately and successfully on multiple levels and creates a true Documents Model. Generalizes at least two macrostructure concepts common across texts (this may be simply identifying the existence of a macrostructure followed by the supporting macrostructures from each text). Effectively integrates relevant support (i.e., details or macrostructures) to support generalizations and may still discuss each article separately to some degree. Integrates macrostructures present in both texts.

40 Good: Integrates through the creation of a well developed and accurate introduction and/or a conclusion, yet summarizes articles separately, AND/OR generalizes one major macrostructure that is fully developed with support. Uses some relevant support.

30 Fair: Attempts to integrate through the creation of a partially developed, possibly inaccurate introductory AND/OR concluding statement, usually related to the main topic of the articles, but has no substantive development. Creates a Text and Situation Model of each text separately. Does not make generalizations for integration beyond the one statement. May make evaluative or editorial statements; however, these may be inaccurate.

20 Poor: Creates a well developed and accurate Text Model and Situation Model for each of the texts, yet attempts no integrative connection across the texts.

10 Very Poor: Ineffectively or inaccurately attempts to summarize each article in a very reporting style, revealing little or no use of background knowledge, contextualization, or evaluation. Response is simply a recall of information from the texts. No introduction or conclusion, OR participant confuses texts and sees separate texts as one issue (problem).

0 No Response: Participant does not attempt response or only addresses one article.
Macrostructures

Participants will receive one point for each macrostructure accurately identified in each text, with a minimum score of 4 awarded for the inability to accurately identify any macrostructures.

25 Excellent (Total 8): Accurately identifies all 4 macrostructures present in both texts.

20 Good (Total 6-7): Accurately identifies all 4 macrostructures in one text and 3 in the other, OR accurately identifies 3 of the four macrostructures in both texts. (Or 4 in one and 2 in the other.)

15 Fair (Total 4-5): Accurately identifies 3 of the four macrostructures in one text and 2 of the four in the other. (Or 4 in one and 1 in the other.) OR accurately identifies 2 of the four macrostructures in both texts. (Or 4 in one and 0 in the other, or 3 in one and 1 in the other.)

10 Poor (Total 2-3): Accurately identifies 2 of the macrostructures in one text and 1 in the other. (Or 3 in one and 0 in the other.) OR accurately identifies 1 of the four macrostructures in both texts. (Or 2 in one and 0 in the other.)

5 Very Poor (Total 0-1): Accurately identifies 1 of the four macrostructures in one text and 0 in the other, OR unable to accurately identify any macrostructures in the texts.

0 No Response: Participant does not attempt response.
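Every combination listed in the Macrostructures bands reduces to the total number of macrostructures identified across the two texts (0 to 8). A minimal Python sketch of that lookup follows. The function name and the attempted flag are hypothetical, and the band values are copied as they are printed in this rubric.

```python
def macrostructure_score(text_a, text_b, attempted=True):
    """Map per-text macrostructure counts (0-4 each) to the rubric band value."""
    if not attempted:
        return 0  # No Response
    for count in (text_a, text_b):
        if not 0 <= count <= 4:
            raise ValueError("each text has four macrostructures; counts must be 0-4")
    total = text_a + text_b
    if total == 8:
        return 25  # Excellent
    if total >= 6:
        return 20  # Good (total 6 or 7)
    if total >= 4:
        return 15  # Fair (total 4 or 5)
    if total >= 2:
        return 10  # Poor (total 2 or 3)
    return 5       # Very Poor (total 0 or 1)
```

For example, a response that identifies 3 macrostructures in one text and 1 in the other has a total of 4 and falls in the Fair band, as the rubric's parenthetical states.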
Use of relevant details

5 Excellent: Effectively uses multiple relevant details as support; no irrelevant or erroneous details.

4 Good: Effectively uses relevant details as support; may include some inaccurate or irrelevant details. (More than 50% of details are relevant.)

3 Fair: Possibly uses one or two relevant details, yet several irrelevant or inaccurate details (erroneous or fabricated) appear in the synthesis. (50% or less of details are relevant.)

2 Poor: Erroneous or fabricated details used in attempt to support arguments presented; does not include relevant details.

1 Very Poor: No details used or listed.

0 No Response: Participant does not attempt response.
Appendix 1b (continued)

Examples (1 point), 5 maximum
• Great Smoky Mountain National Park
• Grand Canyon National Park
• Smoky Mountain Sources from Ohio, New York, Atlanta, etc.
• Grand Canyon Sources from CA, NV, UT, AZ, NM

Examples (1 point or 5 points with overarching cause listed above)
• Automobile emissions
• Power plants factories plants emissions
• Emissions from Smokestacks (chimneys)
• Emissions from Kilns
• Unregulated pollution smog (from other countries, Mexico)
• Smoke from Controlled burns
• Emissions from large cities

Examples (1 point), 6 maximum
• Leaves of plants turning purple and brown (stippling)
• Hindering photosynthesis of plants and trees
• Reduction of visibility at Grand Canyon
• Visibility reduced 90% of the days
• See vague blue masses
• See half as far as in 1919

Examples (1 point or 10 points with overarching solution listed above)
• 1977 Clean Air Act Amendments labeling NPs as Class I areas
• Regional haze regulations proposed by the EPA
• Reduction of allowable pollution standards from industry and Mexico
• TN Luttrell corp building permit situation

Examples (1 point), 10 maximum
• Navajo Generating Station, Page, AZ (Power Plant, AZ)
• Power plants in TN and OH river valley
• Southern California Edison Plant, Laughlin, NV
• Tennessee Luttrell Kilns, TN
• Industry in AZ
• Cars in CA
• Smokestacks in NM, NV, UT
• Los Angeles
• Atlanta
• New York

Examples (1 point), 5 maximum
• Objecting to kilns in TN
• Researchers using scientific technology to ID sources (radioactive isotopes)
• Southern California Edison Plant in Laughlin, NV identified as point source
• Scrubbers installed at Navajo Generating Plant
• 10% reduction of pollution in 10–15 years (goal of no man-made pollution)
Appendix 2a Reading to Integrate task: integration activity
Directions: For the next 15 minutes, reflect on what you read and compose a short essay that combines the information in the material read and makes connections across the range of ideas and arguments presented.
[Ruled lines for the participant's response]
Appendix 2b Reading to Integrate scoring rubric

Integration

50 Excellent: Integrates texts accurately and successfully on multiple levels and creates a true Documents Model. Generalizes at least two macrostructure concepts common across texts (this may be simply identifying the existence of a macrostructure followed by the supporting macrostructures from each text). Effectively integrates relevant support (i.e., details or macrostructures) to support generalizations and may still discuss each article separately to some degree. Integrates macrostructures present in both texts.

40 Good: Integrates through the creation of a well-developed and accurate introduction and/or a conclusion, yet summarizes articles separately, AND/OR generalizes one major macrostructure that is fully developed with support. Uses some relevant support.

30 Fair: Attempts to integrate through the creation of a partially developed, possibly inaccurate introductory AND/OR concluding statement, usually related to the main topic of the articles, but has no substantive development. Creates a Text and Situation Model of each text separately. Does not make generalizations for integration beyond the one statement. May make evaluative or editorial statements; however, these may be inaccurate.

20 Poor: Creates a well-developed and accurate Text Model and Situation Model for each of the texts, yet attempts no integrative connection across the texts.

10 Very Poor: Ineffectively or inaccurately attempts to summarize each article in a very reporting style, revealing little or no use of background knowledge, contextualization, or evaluation. Response is simply a recall of information from the texts. No introduction or conclusion, OR participant confuses texts and sees separate texts as one issue (problem).

0 No Response: Participant does not attempt response or only addresses one article.
Macrostructures

Participants will receive one point for each macrostructure accurately identified in each text, with a minimum score of 4 awarded for the inability to accurately identify any macrostructures.

25 Excellent (Total 8): Accurately identifies all 4 macrostructures present in both texts.

20 Good (Total 6–7): Accurately identifies all 4 macrostructures in one text and 3 in the other, OR accurately identifies 3 of the four macrostructures in both texts (or 4 in one and 2 in the other).

15 Fair (Total 4–5): Accurately identifies 3 of the four macrostructures in one text and 2 of the four in the other (or 4 in one and 1 in the other), OR accurately identifies 2 of the four macrostructures in both texts (or 4 in one and 0 in the other, or 3 in one and 1 in the other).

10 Poor (Total 2–3): Accurately identifies 2 of the macrostructures in one text and 1 in the other (or 3 in one and 0 in the other), OR accurately identifies 1 of the four macrostructures in both texts (or 2 in one and 0 in the other).

5 Very Poor (Total 0–1): Accurately identifies 1 of the four macrostructures in one text and 0 in the other, OR unable to accurately identify any macrostructures in the texts.

0 No Response: Participant does not attempt response.
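The macrostructure bands above reduce to a lookup on the combined count across the two texts. The following is a minimal sketch of that lookup, not the authors' scoring procedure: the function name `macrostructure_score` and the `attempted` flag are hypothetical, and the sketch follows the band table (so an attempted response with no macrostructures identified falls in the 5-point band).

```python
def macrostructure_score(text_a: int, text_b: int, attempted: bool = True) -> int:
    """Map macrostructure counts (0-4 per text) to a rubric band score.

    Hypothetical helper: names and signature are illustrative assumptions.
    """
    if not attempted:
        return 0  # "No Response" band
    for count in (text_a, text_b):
        if not 0 <= count <= 4:
            raise ValueError("each text has between 0 and 4 macrostructures")
    total = text_a + text_b  # one point per macrostructure identified
    if total == 8:
        return 25  # Excellent (Total 8)
    if total >= 6:
        return 20  # Good (Total 6-7)
    if total >= 4:
        return 15  # Fair (Total 4-5)
    if total >= 2:
        return 10  # Poor (Total 2-3)
    return 5       # Very Poor (Total 0-1)
```

Because every combination listed in the rubric depends only on the sum of the two counts, a single threshold chain on the total covers all of the enumerated cases (for example, 4 in one text and 0 in the other lands in the Fair band, as the rubric states).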
Use of relevant details

5 Excellent: Effectively uses multiple relevant details as support; no irrelevant or erroneous details.
4 Good: Effectively uses relevant details as support; may include some inaccurate or irrelevant details (more than 50% of details are relevant).
3 Fair: Possibly uses one or two relevant details, yet several irrelevant details or inaccurate details (erroneous or fabricated) appear in the synthesis (50% or less of details are relevant).
2 Poor: Erroneous or fabricated details used in attempt to support arguments presented; does not include relevant details.
1 Very Poor: No details used or listed.
0 No Response: Participant does not attempt response.