  • Examining dialogue: another approach to content specification and to validating inferences drawn from test scores

    Merrill Swain, University of Toronto

    In this article one aspect of the many interfaces between second language (L2) learning and L2 testing is examined. The aspect that is examined is the oral interaction, the dialogue, that occurs within small groups. Discussed from within a sociocultural theory of mind, the point is made that, in a group, performance is jointly constructed and distributed across the participants. Dialogues construct cognitive and strategic processes which in turn construct student performance, information which may be invaluable in validating inferences drawn from test scores. Furthermore, student dialogues provide opportunities for language learning, i.e., opportunities for the joint construction of knowledge. It is suggested that an examination of the content of these dialogues can provide test developers with targets for measurement. Other implications for L2 testing are also discussed.

    I Introduction

    In this article I examine one aspect of the many interfaces between the fields of second language (L2) assessment and L2 learning: small groups and the oral interaction that occurs within them.

    Small groups consist of two or more individuals. Small groups are different from interview contexts where asymmetry in the exchanges is a given, and where one person is solely responsible for beginning and ending the interaction [and] for ending one topic and introducing a new topic . . . (van Lier, 1989: 489). Both language educators and language assessors are interested in what individuals say to each other in small groups. I am interested because what individuals say to each other can inform us about L2 learning processes and strategies. Language assessors are interested because there are tests which evaluate the performance of individuals as they interact in pairs or small groups. Importantly, some of those tests are high-stakes tests.

    Address for correspondence: Merrill Swain, The Ontario Institute for Studies in Education of the University of Toronto, 252 Bloor St. W., Toronto, Ontario, M5S 1V6, Canada; email: MSwain@oise.utoronto.ca

    Language Testing 2001 18 (3) 275–302

  • 276 Content specification & validating inferences drawn from test scores

    Given that interaction in small groups is one point of overlap among our interests, I discuss some of the research that I am currently engaged in and the theoretical orientation within which this research is being conducted. Small groups are important to them both. My goal is to suggest some of what we, as researchers, might discover about L2 learners and about L2 test-takers by examining their dialogues as they jointly construct their performance in group activities. Specifically, I raise two issues for consideration:

    1) Might these dialogues generate content which could serve as new targets for measurement?

    2) Might analyses of the dialogues provide additional insights for examining the validity of the inferences we draw from a test and the uses we make of them?

    It is perhaps important to say at this point that the ways in which we have been examining dialogue (e.g., Swain and Lapkin, 1998) are somewhat different from the text and discourse analyses that are now quite commonly applied in our respective fields. Text and discourse analyses focus on linguistic and interactional features of speaking, rather than on its content and on the underlying cognitive and strategic processes that both generate that talk and are generated by it. Our literatures are rich in text and discourse analyses, which have been used, for example, to examine the similarities and differences between test-speak and the language used outside of the testing situation (e.g., Lazaraton, 1992), or between the performances generated by different oral tasks. A large number of features such as lexical density, fluency, structural complexity and turn-taking have been examined (e.g., Shohamy et al., 1993; Young and He, 1998). The ways in which we have been examining dialogue are also somewhat different from other verbal protocol techniques (Green, 1998), in that the data we have examined are the dialogues that occur between participating individuals, not those which occur in solo think-aloud reports and retrospective accounts. What I propose could complement these analyses as a source of validity evidence.

    In the next section I provide two examples of high-stakes tests which make use of small groups, along with a sampling of some of the issues that have been considered by researchers in the testing field related to small group testing. This is done as a reminder of the importance of understanding the dynamics of small group interaction to assessment practices. In Section III I introduce some ideas emanating from a theoretical perspective, a sociocultural theory of mind, which has recently been attracting interest amongst some L2 learning researchers. In Section IV I then consider aspects of our research, which is being conducted from within this theoretical framework. The purpose of this is to explore the relevance of this theoretical orientation and research to language testing.

    II Small group tests and related issues

    A number of high- and lower-stakes tests incorporate a speaking section in which two or more candidates are required to talk to each other. For example, the five EFL Examinations in UCLES's main suite each have a speaking component that involves interaction among the candidates. The non-interview-like part of the speaking section of the First Certificate in English, which lasts for approximately seven minutes, is described in the UCLES handbook as follows: "The candidates are given visual prompts (photographs, line drawings, diagrams, etc.) which generate discussion through engaging in tasks such as planning, problem solving, decision making, prioritising, speculating, etc." (UCLES, 1996: 84) (three minutes). The examiner then encourages a discussion among the participants of matters related to the theme of the visual prompt (four minutes).

    A second example among high-stakes tests comes from Hong Kong. There, performance on the Hong Kong Use of English A/S level Examination determines whether a student can gain entry into Hong Kong's tertiary institutions. Since 1994, this test has included a two-part oral component, the second part involving groups of four students interacting in a university-like setting, replicating, for example, a small academic seminar.

    There are a number of reasons why some tests now include pairs or small groups of individuals interacting together to debate an issue or to solve a problem. Dissatisfaction with the oral interview as the sole means of assessing oral proficiency and a search for other tasks that elicit different aspects of oral proficiency (Shohamy et al., 1986) are concomitant reasons. An attempt to influence teaching practices (Hilsdon, 1991) or, alternatively, to mirror teaching practices has also played an important role. Economic reasons, too, have played their part: where there are many students to be tested, it can be less expensive to test them in groups (Berry, 2000).

    Given that small group testing occurs in even one high-stakes test, and given the reasons for its use, it is surprising that so little validation work has been carried out. How are scores based on interaction among participants to be interpreted as an indication of individual performance ability? Can they be interpreted as individual performance ability at all? McNamara, in his thought-provoking paper entitled "Interaction in second language performance assessment: Whose performance?" (1997), has raised precisely these questions. His questions encompass a broad range of interactions including those between interviewer and interviewee, and those between test-raters and test-takers. The research of Lumley and Brown (1996), for example, found that features of an interviewer's language behaviour differentially support or handicap a test candidate's performance. The point here is that performance is not solo performance, but rests on a joint construction by the participating individuals.