
Teaching writing teachers about assessment

Sara Cushing Weigle *

Department of Applied Linguistics & ESL, Georgia State University,

P.O. Box 4099, Atlanta, GA 30302-4099, USA

Abstract

The assessment of student writing is an essential task for writing teachers, and yet many graduate

programs do not require students to take a course in assessment or evaluation, and courses on teaching

writing often devote only a limited amount of time to the discussion of assessment. Furthermore, teachers

frequently need to prepare their students for externally mandated large-scale writing assessments, and thus

they need to have an understanding of the uses and misuses of such tests. This article outlines some of the

essential considerations in classroom and large-scale assessments and provides suggestions for how to

incorporate considerations about assessment into a course on teaching writing or as a stand-alone course.

© 2007 Elsevier Inc. All rights reserved.

Keywords: Second language writing; Writing assessment; Teacher education

Assessment of student writing is an essential task for writing teachers. Unfortunately,

however, many graduate programs in TESOL and rhetoric/composition do not require students to

take a course in assessment or evaluation, and courses on teaching writing often devote only a

limited amount of time to the discussion of assessment. Moreover, teachers often feel that

assessment is a necessary evil rather than a central aspect of teaching that has the potential to be

beneficial to both teacher and students. They may believe, rightly or wrongly, that assessment

courses focus too much on statistics and large-scale assessment and have little to offer classroom

teachers. As a result, teachers sometimes avoid learning about assessment or, worse, delay

thinking about how they will assess their students until they are forced to do so, a situation which

unfortunately decreases the chances that assessments will be fair and valid.

At the same time, writing teachers often find themselves in a position of having to prepare

their students for externally imposed assessments such as departmental or university-wide exit

examinations or large-scale, high-stakes tests such as the Test of English as a Foreign Language


(TOEFL). Teachers sometimes feel that such assessments have little to do with the skills they are

trying to teach their students; consequently, they may approach these tests with some resistance

and, unfortunately, little understanding of how such tests are constructed or scored and whether

or not they have been validated for the purpose for which they are being used.

It is my belief that writing teachers must be adequately prepared to construct, administer, score,

and communicate the results of valid and reliable classroom tests, and that, similarly, they should

have an understanding of the uses and misuses of large-scale assessments so that they can be critical

users of such tests and effective advocates for their students in the face of mandatory assessments not

of their own making. In this paper, I start by outlining some of the fundamental principles of

assessment in general, and then discuss the process of test development, some of the considerations

that teachers must think about in designing classroom writing assessments, and some suggestions

for how teacher trainers might approach these issues in a course on second language writing issues

or on assessment. Finally, I discuss large-scale assessment and some of the ways in which teachers

can be empowered by a deeper understanding of these assessments that affect their students.

Classroom assessment

For any teacher, the ability to design fair and valid ways of assessing their own students’

progress and achievement is an essential skill. In order to do so, teachers need to understand the

range of possibilities for assessing students, what the essential qualities of a good assessment

instrument are, and how to develop assessments that maximize these essential qualities within the

constraints of time and resources that teachers face.

It may be useful at first to clarify some terminology and to outline various types of

assessments. Assessment is a broad term that encompasses all sorts of activities that teachers

engage in to evaluate their students’ progress, learning needs, and achievements. As Brown

(2004) notes, teachers are constantly evaluating their students in informal ways, and these

informal evaluations are an important part of assessment, just as more formal tests are. Informal

assessments include such things as clarification checks to make sure students understand

particular teaching points, eliciting responses to questions on style and usage from students, or

circulating among students doing peer response work to ensure that they are on task. Formal

assessments can be defined as ‘‘exercises or procedures specifically designed to tap into a

storehouse of skills and knowledge’’ (Brown, 2004, p. 6). For a writing class, formal assessments

may include traditional writing tests, for example, an exercise in which students are required to

generate one or more pieces of connected discourse in a limited time period, which are then

scored on some sort of numerical scale (Hamp-Lyons, 1991a,b), and other activities, in particular,

response to and evaluation of artifacts such as portfolios, homework assignments, or out-of-class

writing assignments. It is important for teachers to recognize that all of these activities – informal

assessments, and various types of formal assessments, including tests – have a place in a teacher’s

assessment toolbox, all are appropriate under certain circumstances, and all need to be evaluated

according to the most important qualities of effective assessments: in particular, reliability,

validity, and practicality. Thorough treatments of these qualities can be found in a variety of

sources, including those listed in the Appendix. I include only a brief discussion of them here.

A good test is reliable; that is, it is consistent. A student should get the same score on a test one

day as on the next day (assuming, of course, that no additional learning has taken place in the

interim) or from one grader/rater as from another. If there is a choice of topics or tasks, they

should be equivalent in difficulty so that a student’s chances of performing optimally do not

depend on which topic they choose. Finally, conditions of administration should be as similar as


possible so that factors not related to the skills being assessed do not affect student performance.

For example, students should all be given the same amount of time to complete the assessment.
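To make rater consistency concrete, the brief Python sketch below computes three simple indices that can be calculated when two raters score the same set of essays: exact agreement, adjacent agreement, and the correlation between the two sets of scores. The sketch is my own illustration; the scores are invented, and a real analysis would use actual ratings.

```python
# A minimal sketch of inter-rater consistency checks for essay scores.
# The score lists below are invented for illustration only.

from statistics import correlation  # requires Python 3.10+

rater_a = [4, 3, 5, 2, 4, 3, 5, 1, 4, 3]  # holistic scores on a 1-5 scale
rater_b = [4, 3, 4, 2, 5, 3, 5, 2, 4, 4]  # same essays, second rater

n = len(rater_a)

# Exact agreement: both raters assign the identical score.
exact = sum(a == b for a, b in zip(rater_a, rater_b)) / n

# Adjacent agreement: scores differ by no more than one band.
adjacent = sum(abs(a - b) <= 1 for a, b in zip(rater_a, rater_b)) / n

# Pearson correlation: do the raters rank the essays similarly?
r = correlation(rater_a, rater_b)

print(f"exact agreement:    {exact:.0%}")
print(f"adjacent agreement: {adjacent:.0%}")
print(f"correlation:        {r:.2f}")
```

High adjacent agreement combined with a low correlation, for instance, would suggest that the raters cluster on the same part of the scale but rank essays differently, a sign that rater training or clearer descriptors may be needed.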

A good test is valid for the purposes for which it is being used. Validity is a complex issue that

is discussed at length in many references on assessment (e.g., Bachman, 1990; Bachman &

Palmer, 1996; Hamp-Lyons, 1991a,b; Hudson & Brown, 2002; McNamara, 1996). In essence,

validity has to do with the appropriateness of decisions that will be made on the basis of the test so

that, for example, students who are capable of demonstrating excellent work in class are able to

do so on the test, and those who are not as capable are not able to pass the test by other means (for

instance, by lucky guessing or by memorizing a response). For most classroom purposes, the

most important validity consideration is that the content of the test is representative of the skill(s)

and knowledge that are being taught in the course, both in terms of covering the range of skills

adequately, and also in terms of not assessing skills that are not being taught in the course.

A good test is practical; that is, it can be developed, administered, and scored within the

constraints of available resources, particularly time. For teachers, practicality is an overriding

concern; writing teachers in particular know how time-consuming it is to grade papers. Teachers

need to have realistic expectations about how much time they can devote to developing

assessments, as well as how long it will take to administer and score any assessment of writing.

Reliability, validity, and practicality are not the only considerations for assessment. For

example, Bachman and Palmer (1996) include interactiveness, authenticity, and impact, or the

effect of an assessment on learners, teachers, and other stakeholders, in their model of test

usefulness; for classroom teachers, however, reliability, validity, and practicality are perhaps the most critical qualities to be familiar with.

The test development process

Whether one is writing a test for an individual classroom or for large-scale administration, the

essential steps are the same. Many books on language testing provide guidance for test

development at the classroom level and for large-scale tests which go into greater detail than is

possible here (see, for example, Alderson, Clapham, & Wall, 1995; Bachman & Palmer, 1996;

Weigle, 2002). For any classroom test, there are four major considerations that go into an

assessment procedure. These are:

• setting measurable objectives,
• deciding on how to assess objectives (formally and informally),
• setting tasks,
• scoring.

Specifying measurable objectives

One of the most fundamental lessons about assessment is that decisions about assessment

should not be left until the end of instruction, but rather should be taken into account from the

very beginning, preferably in the earliest planning stages for a course. Teachers need to learn how

to articulate precisely what it is they hope students will learn in their courses so that they can

develop ways of assessing whether their students have, in fact, mastered the course objectives.

For this reason, it is helpful to state course objectives in terms of observable behaviors or products

so that they can be evaluated appropriately.

Many writing course syllabi contain general objective statements such as ‘‘students will learn

the basics of academic writing’’ or ‘‘after completing this class, you will know how to revise and


edit your writing.’’ The problem with statements that are framed in this way is that they do not

provide any guidance for developing assessments that will help teachers judge whether students

have met these objectives. As a teacher, how does one know when a student has ‘‘learned the

basics of academic writing’’? Does the writing of a student who has accomplished this objective

differ from that of a student who has not? Without a clearer statement of measurable outcomes, it

will be impossible for teachers to know whether they have been successful. This problem of

vaguely worded objectives is compounded in a multi-level program where students need to

progress through two or more levels, and different sections of the same course are taught by

different teachers. Statements such as those above do not provide useful ways of articulating

between levels.

It is, therefore, much more helpful to start out by stating objectives in such a way that it is clear

when the objectives have been met. There are many sources in the educational literature for

writing clear objectives (see, for example, Gronlund, 2004), but my own inspiration in this area

comes from business rather than education. David Allen, in his excellent book Getting Things

Done, provides a three-step model for stating outcomes, which can be applied just as easily to

teaching as to the business world.

The steps are:

1. View the project from beyond the completion date.

2. Envision ‘‘WILD SUCCESS.’’

3. Capture features, aspects, qualities you imagine in place (p. 69).

In terms of teaching writing, these steps can be conceptualized as follows:

1. Imagine the class and the students at the end of the term.

2. Think about the very best piece of writing that could come from this class.

3. Describe its attributes. What does it look like? What makes it stand out? Is it the correct use of verb tenses? Is it the vivid details or the insightful thinking that went into the writing? Is it the use of transitions and other cohesive devices? Has the student revised appropriately in response to instructor and/or peer feedback?

Teachers who can articulate what they imagine their best writers can accomplish at the end of a term are in a good position to begin developing assessments. Furthermore, by defining one's objectives in this way at the beginning of instruction, teachers can begin to plan out how they will assist students in reaching these goals, thus allowing concerns about assessment to inform instruction from the very beginning.

One activity that can help teachers articulate their objectives is to have them write an

imaginary endnote to a final draft of a writing assignment from their course, as in Figure 1. Note

that the questions cover three main areas: what the student has done well (i.e., the student’s


Fig. 1. Imaginary endnote to a final draft of a writing assignment.


strengths, whether specifically learned in the course or not), ways in which the student has

improved (i.e., what the student has learned from the course), and what the student could focus on

for the future (i.e., what the student may not yet have mastered but is ready to learn). The

questions are flexible enough to cover linguistic, content, rhetorical, or process dimensions of

writing.

Once teachers have determined what a successful paper would look like, they are ready to

write outcome statements that contain measurable objectives. One rule of thumb for specifying

objectives is to include three characteristics: a description of the performance itself, or what the student is expected to do, the conditions under which the performance will be elicited, and the level of performance that will be deemed acceptable (Mager, 1975, cited in Ferris & Hedgcock, 2005). An objective meeting these criteria might read: given a short opinion article and 30 minutes (conditions), the student will write a one-paragraph summary (performance) that accurately identifies the author's main claim and at least two supporting arguments (acceptable level).

Some teachers may object that setting goals in this way is inappropriate for teaching writing,

especially those who view personal expression as the main goal of writing instruction (see

Raimes, 1991, for an overview of different perspectives on the goals of writing courses). Indeed,

one of the dangers of writing objectives in this way is that what is measurable is not always what

is essential, so that the focus often turns to easily quantifiable traits of essays such as error counts.

Teachers need to find a compromise that they can live with between too much specificity, which

can lead to an unhealthy focus on lower level skills to the detriment of the big picture, and too

much generality, which can make it nearly impossible to ascertain how successfully the course

objectives have been met. One example of such a compromise can be found in Figure 2. Note that

the outcome statements cover objectives related to the range of written products, the use of

language, and the writing process, and are written using verbs that describe observable behavior

(uses, writes, etc.).

The benefits of specifying outcomes in this way are numerous. Teachers can use these

outcome statements to make teaching decisions and to design rubrics for evaluating writing, and

students benefit because what is expected of them becomes much more clear.

One assignment that can help students practice writing clear course

objectives is to provide a sample syllabus (see, for example, Ferris & Hedgcock, 2005, pp. 110–

118). The students’ task is to evaluate the course objectives in terms of whether they are specific

and measurable. For those that are not, students need to rewrite the objectives so that they are

specific and measurable. For those objectives that are already specific, students can discuss how

they would write assignments that measure those objectives.

Deciding on how to assess objectives

Once teachers have a list of objectives, the next step in the process is to decide which

objectives will be assessed informally, which will be assessed formally through tests, and which

will be assessed formally through means other than tests. For example, objectives related to

critical thinking skills might best be assessed through informal means such as observing

participation in class discussions or responding to reading journals, while more specific

language-related objectives such as the correct use of verb tenses might be assessed as part of a

test, either as a controlled exercise or as part of the evaluation of a timed writing assignment.

Teachers need to be aware of the multiplicity of ways in which various objectives might be

tested. The bibliography at the end of this article contains numerous resources for designing

assessment tasks, testing books in particular. Cohen (1994) and Hughes (2002) contain chapters

devoted to various ways of assessing writing skills either holistically or as discrete subskills. In

this next section, I will focus on setting tasks for independent writing (either as timed single-draft


essays or as untimed multiple-draft essays) rather than for testing subskills such as grammatical

knowledge or the ability to paraphrase. Following this, I describe portfolio assessment as a

potentially more valid way of assessing many aspects of writing than can be assessed in a single

test. First, however, I will explore the issue of whether one should test writing at all in a writing

course; that is, under what circumstances is a writing test appropriate?

In-class versus out-of-class writing

Writing teachers frequently face the dilemma of whether to assess in-class as well as out-of-

class writing. Particularly in classes where the writing process is emphasized, many teachers feel

that it is counterproductive to assess students on a single draft of a paper, especially on an

impromptu topic that students may not have had time to think about before the day of the


Fig. 2. Examples of outcome statements.


assessment. If a final examination for a writing course consists of impromptu writing only,

students are given a mixed message about what kind of writing is actually important for them to

be able to master.

Furthermore, most writing outside of testing situations in the real world is not completed

under time pressure. This is particularly true for academic writing. The process of writing

involves reflection, discussion, reading, feedback, and revision, and one’s best work is usually not

produced in a single draft within 30 or 60 minutes.

A final reason for emphasizing out-of-class writing is that some L2 students may have

difficulties on timed writing tests even if they are successful in other academic writing tasks


(Byrd & Nelson, 1995; Johns, 1991). Furthermore, English teachers without ESL training may be

susceptible to basing their evaluations of nonnative speakers' (NNS) writing more on sentence-level concerns than on content or rhetorical concerns (Sweedler-Brown, 1993). NNS writers may not be able to perform as well

under time pressure as their native-speaking peers, and this may be especially noticeable in timed

writing.

However, there are at least three important reasons why teachers would want to include some

sort of in-class writing assessment as part of their assessment of students’ abilities. The first

reason is simply a pragmatic one: timed writing tests are a fact of life for many students. Writing

tests have become standard on large-scale high-stakes tests such as the TOEFL and the GRE, and

such tests can have a profound effect on students’ futures. Furthermore, in content courses such

as history or psychology – at the undergraduate level, at least – students are frequently expected

to write short essays on their examinations (Carson, Chase, Gibson, & Hargrove, 1992). The

ability to compose under time pressure is thus critical for many students, and the writing class can

be a valuable place to learn strategies for timed writing and to practice this skill.

In addition, while collaboration in writing is often seen as an important component of the

writing process, there are times when teachers want to know what students can do on their own

without assistance. In out-of-class writing assignments there is always the danger that students

have received inappropriate amounts and kinds of help from tutors, friends, or roommates. In

particular, second language writers may ask their native speaker friends to proofread their papers

and fix sentence-level errors. Teachers are certainly justified in asking students to produce at least

some writing in class where they are unable to rely on such outside support.

A third reason for testing writing in class under timed conditions comes from second language

acquisition theory. From a psycholinguistic viewpoint, in-class writing can serve as a test of

automatized knowledge of English. In general, adults writing in their first language have

automatic access to lexical and syntactic resources, while for many second language writers,

particularly at lower levels of proficiency, these processes are not yet automatic, so writers need

to focus conscious attention on retrieving words and explicit grammar rules from long-term

memory. This need to pay attention to word and sentence level concerns makes it difficult to

focus on macro-level issues such as overall structure and organization and writing strategies that

they may use in their first language (see Weigle, 2005, for a summary of research in this area).

Furthermore, as Ellis (2005) demonstrates, different tasks evoke implicit and explicit

knowledge. Ellis found that an untimed grammaticality judgment test evoked explicit or rule-

governed knowledge, particularly for those sentences that were ungrammatical, while a timed

test evoked implicit knowledge. One might hypothesize on the basis of these results that timed

and untimed writing assignments would evoke different knowledge types, and, therefore, if one is

interested in knowing how much linguistic knowledge is implicit and automatized, a timed

writing assessment may be an appropriate vehicle for this purpose.

For these reasons, although many writing teachers feel that in-class writing does not allow

students to demonstrate their best ability, one can justify assessing both in-class and out-of-class

writing as complementary sources of information about student abilities, particularly when it

comes to making high-stakes decisions such as passing or not passing a course. In such cases, as

assessment specialists (e.g., Brown & Hudson, 1998) frequently point out, it is particularly

critical to use multiple sources of information, as no single test of an ability is without error.

In assessing in-class or timed writing, however, classroom teachers have advantages that

developers of large-scale tests do not, in that they can modify the timed impromptu essay to take

advantage of the extended time they spend with students before a test. Elsewhere (Weigle, 2002),

I have presented ways of modifying the timed impromptu essay to fit the classroom environment.


Possibilities include strategies such as discussing a topic in class and doing preliminary

brainstorming, allowing students to write an essay outline before writing their drafts in class, and/

or writing an in-class draft for a grade, followed by revising it out of class based on teacher or

peer feedback for a separate grade. Because of the difficulties that second language writers often

have managing both the content and linguistic demands of a writing assignment, giving students

the opportunity to prepare the content in advance of the writing may allow them to demonstrate

their best writing.

Setting tasks

Whether one is evaluating in-class or out-of-class writing, a useful approach to task

development is to begin by drafting test specifications as a way of articulating clearly what one is

attempting to assess. Specifications are particularly important in developing large-scale tests, but

even for an individual teacher, specifications can be helpful as a tool for planning out an

assessment. Specifications can benefit teachers in at least three ways: (1) the process of

developing specifications helps to ensure that teachers have considered the specific aspects of

writing that they are attempting to assess and how those aspects are operationalized in tasks and

scoring procedures; (2) within a given program, teachers can share specifications so that courses

at the same level can maintain the same evaluation standards and procedures; and (3) sharing

specifications with students allows them to know exactly how they will be assessed, in terms of

what sorts of tasks they can be expected to perform and how they will be evaluated (Weigle,

2002).

Specifications can take many forms, but one useful format is that described in detail in

Davidson and Lynch (2002). The main parts of the specification are (a) a general description of

the skill(s) or ability(ies) being tested, including a rationale for why these particular skills are

important for the given testing context; (b) a description of the prompt, or the instructions to

the student about what to write, including a description of any additional stimulus material

such as reading passages, pictures, or graphs; and (c) a description of the scoring guide, or

rating scale.
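To illustrate, the sketch below records these three parts as a small Python data structure that could be shared among teachers in a program. The field names and sample content are my own hypothetical illustration, not a format prescribed by Davidson and Lynch.

```python
# A minimal sketch of a writing test specification, loosely following
# the three-part format described above. All names and sample content
# are illustrative, not a published standard.

from dataclasses import dataclass

@dataclass
class WritingTestSpec:
    general_description: str       # skill(s) tested and rationale
    prompt_description: str        # instructions given to the student
    stimulus_materials: list[str]  # readings, pictures, graphs, etc.
    scoring_guide: str             # rubric or rating scale to be applied

exit_essay_spec = WritingTestSpec(
    general_description=(
        "Timed argumentative essay assessing the ability to state and "
        "support a position, a skill required in first-year courses."
    ),
    prompt_description=(
        "In 45 minutes, write an essay of about 300 words agreeing or "
        "disagreeing with a given statement; address a general academic "
        "audience and support your position with reasons and examples."
    ),
    stimulus_materials=["one short opinion statement; no reading passage"],
    scoring_guide="program-wide five-band analytic rubric",
)
```

Because the record makes every decision explicit, two teachers writing prompts from the same specification should produce comparably difficult tasks, which supports the reliability concerns discussed earlier.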

In my experience, teachers in training are often skeptical of the value of specifications and

sometimes resistant to the notion of spending time on specifications until they actually go

through the process of developing a specification and a test. However, they usually find that

writing a specification is helpful in clarifying their thinking and anticipating potential difficulties,

and that, in the long run, writing a specification saves time. As one student wrote in an online

posting for an assessment course:

When we began discussing test specifications I felt sooo lost and had no clue where to

begin. After reading and discussing in class, I thought that writing the specs would be

difficult but not impossible. Now, I can just look at my specs and create test items with

much more understanding of how the process works. I would just like to advertise for spec

writing and say that they really are the blueprints and make things so much clearer when it

comes to creating a test that is relevant. I finally see the light even though I still have much

to learn and perfect when it comes to writing tests.

As noted above, specifications should include a description of the prompt (instructions to the

student) and of the expected response. Useful guidelines for designing prompts can be found in

Kroll and Reid (1994). Depending on the goals of the assessment, specifications can include any

of the dimensions for writing tasks (from Weigle, 2002) outlined in Table 1. For example, one


might specify that students will write a one-page (length) narrative letter (rhetorical task/genre) to

a close friend (audience) using a series of picture prompts (stimulus) as input, and so on.

Scoring

One of the most troublesome aspects of assessing writing for many teachers is assigning letter

grades or numerical scores to their students’ work. One reason for this difficulty is that many

teachers feel much more comfortable in the role of supportive coach than of evaluator. Another

reason is that teachers sometimes begin their assessment with some idea of how many points a

particular assignment is worth, but without a clear notion of how those points should be awarded

or the criteria they should use to grade their students' work.

For these reasons, among others, teachers need to have a systematic process for assigning

scores to essays or other written work and some sort of written rubric that outlines the criteria for

grading. Sources for writing rubrics abound in print and online, so there is little need for a teacher

to start from scratch in developing a rubric for grading.

In creating a rubric, teachers need to be familiar with the main types of rubrics. Rubrics vary

along two dimensions: whether they are general (to be used across a variety of assignments/

writing tasks) or specific to an assignment, and whether they are holistic (a single overall score is given) or analytic (i.e., separate scores are given for different aspects of writing, such as content, organization, and use of language). Much has been written about the advantages and

disadvantages of different types of scoring rubrics; see in particular Hamp-Lyons (1991a,b) and

Weigle (2002, chap. 6). While arguments can be made for either type of scoring rubric, research

suggests that, while holistic scales are faster and more efficient, analytic scales tend to be

somewhat more reliable than holistic scales, and certainly provide more useful feedback to

students, as scores on different aspects of writing can tell students where their respective

strengths and weaknesses are.
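The difference between the two scale types can be made concrete with a short sketch. The analytic scoring below combines per-category band scores into a weighted total; the categories, weights, and band scale are illustrative assumptions on my part (loosely echoing the heavy weighting of content in analytic schemes such as the Jacobs et al. profile), not any published rubric.

```python
# A minimal sketch of analytic scoring: each aspect of writing gets its
# own band score, and a weighted total is computed. Categories, weights,
# and the five-band scale are illustrative, not a published rubric.

WEIGHTS = {
    "content": 0.30,
    "organization": 0.20,
    "vocabulary": 0.20,
    "language_use": 0.25,
    "mechanics": 0.05,
}

def analytic_total(scores: dict[str, int], max_band: int = 5) -> float:
    """Combine per-category bands (1..max_band) into a 0-100 total."""
    assert scores.keys() == WEIGHTS.keys(), "every category must be scored"
    return sum(WEIGHTS[c] * scores[c] / max_band * 100 for c in WEIGHTS)

# Strong content but weak mechanics: the profile of subscores shows the
# student exactly where points were gained and lost, which is the
# feedback advantage of analytic scales noted above.
essay = {"content": 4, "organization": 4, "vocabulary": 3,
         "language_use": 3, "mechanics": 2}
print(round(analytic_total(essay), 1))  # -> 69.0

# A holistic rubric, by contrast, would reduce the same performance to a
# single overall band with no breakdown.
```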


Table 1
Dimensions of tasks for writing assessment

Subject matter: Self, family, school, technology, etc.
Stimulus: Text, multiple texts, graph, table
Genre: Essay, letter, informal note, advertisement
Rhetorical task: Narration, description, exposition, argument
Pattern of exposition: Process, comparison/contrast, cause/effect, classification, definition
Cognitive demands: Reproduce facts/ideas; organize/reorganize information; apply/analyze/synthesize/evaluate
Specification of:
  Audience: Self, teacher, classmates, general public
  Role: Self/detached observer, other/assumed persona
  Tone, style: Formal, informal
Length: Less than 1/2 page, 1/2 to 1 page, 2–5 pages
Time allowed: Less than 30 min, 30–59 min, 1–2 h
Prompt wording: Question vs. statement, implicit vs. explicit, amount of context provided
Choice of prompts: Choice vs. no choice
Transcription mode: Hand-written vs. word-processed
Scoring criteria: Primarily content and organization, primarily linguistic accuracy, unspecified

Note. Weigle (2002). Adapted from Purves, Soter, Takala, and Vahapassi (1984, pp. 397–398) and Hale et al. (1996).


In training teachers, it is useful to have them try out existing scoring rubrics on a set of essays

on a given topic (perhaps one holistic rubric such as the TOEFL writing rubric and one analytic rubric such as that proposed by Jacobs, Zinkgraf, Wormuth, Hartfiel, and Hughey, 1981) and compare their scores in small groups. Teachers in training usually learn from this experience

that (a) without exemplars at different levels the various descriptors are difficult to interpret

consistently; (b) they can usually agree on the best and the worst essays, but the ones in the

middle are more difficult to agree on; and (c) different raters read different things into papers and

bring their own values and experiences into the rating process, which highlights the importance

of rater training to clarify how the scale should be used in a given context so that raters can learn

to apply similar standards.

To summarize, there are many things that novice teachers need to learn about developing their

own classroom writing assessments—in particular, how to articulate their course objectives

clearly so that their assessments match their instruction as closely as possible, how to construct

prompts that can elicit reliable samples of writing that are valid indicators of their students’

ability, and how to score writing reliably and efficiently. This article has only scratched the

surface of these issues; the interested reader is referred to the sources listed in the Appendix for

additional information in these areas.

Portfolio assessment

Experienced writing teachers and scholars agree that writing tests such as those described in the previous section are quite limited in their usefulness for assessing the complete range of a

student’s writing ability. Writing ability is perhaps best conceptualized as the ability to compose

texts in a variety of genres that are appropriate for their audience and purpose, and it is difficult, if

not impossible, to generalize from a single text on a single topic composed under time constraints

to this broader universe of writing. For this reason, many individual teachers and writing

programs have adopted portfolio assessment as a (potentially) more valid approach to writing

assessment. A complete discussion of portfolio assessment is beyond the scope of this paper;

interested readers are referred to Hamp-Lyons and Condon (2000), Mabry (1999), Weigle (2002,

chap. 9), and Wolcott and Leggett (1998) for more thorough treatments of portfolio assessment.

Here, I will briefly define portfolio assessment and provide an overview of some of its advantages and constraints.

A portfolio is ‘‘a purposeful collection of student works that exhibits to the student (and/or

others) the student’s efforts, progress, or achievement in a given area’’ (Northwest Evaluation

Association, 1991, p. 4, cited in Wolcott & Leggett, 1998). Portfolios vary greatly depending on

the age of students, the purpose of the course, and the learning context, among other variables, but

three essential components of a portfolio are collection, reflection, and selection (Hamp-Lyons & Condon, 2000). A portfolio is a collection of written products rather than a single writing

sample, but it is the process of selecting and arranging the specific contents through deliberate

reflection that distinguishes a portfolio from a pile of papers or a large folder (p. 119). Another

important component of most portfolio assessment programs is delayed evaluation, which gives

students both motivation and time to revise their papers based on feedback and self-reflection

before turning them in for a final grade.

Portfolio assessment has several advantages over traditional writing tests as a means for

evaluating student growth and achievement in writing. First and foremost, portfolios allow

assessment and instruction to be integrated seamlessly, as everything that happens in the writing

class contributes directly to the process of assembling the portfolio. Furthermore, portfolio


assessment allows students to demonstrate their mastery of different genres and registers, as well

as their mastery of different aspects of the writing process such as the ability to revise one’s

writing based on feedback and to edit one’s writing for sentence-level errors. For second

language writers, in particular, portfolio assessment has the advantage of affording extra time for

revision and editing to students who may not perform as well under timed conditions.

Despite these advantages, however, implementing a portfolio assessment program is not without

its difficulties. One potentially problematic aspect of portfolio assessment is reliability of scoring:

individual portfolios may contain writing samples that vary greatly in quality, which makes it

difficult to assign a single score or grade to the portfolio, and the content of portfolios assembled by

different students may vary considerably, making it difficult to score consistently across portfolios.

Another area of potential difficulty has to do with practicality: setting up and maintaining a

successful portfolio program requires a great deal of advance planning and investment of

time and effort on the part of teachers, administrators, and students. These difficulties are not

insurmountable, however, and many teachers who have successfully implemented portfolio

assessment will state unequivocally that the benefits of portfolios far outweigh the difficulties.

What teachers should know about externally mandated assessments

In addition to knowing about classroom assessments, writing teachers need to be aware of

many issues related to large-scale assessment. In many programs and institutions, teachers are

obligated to prepare their students for large-scale examinations, ranging from locally produced

exit examinations to professionally written tests such as the TOEFL. Teachers frequently have

one of two attitudes towards these tests. Some teachers feel mistrustful of standardized tests and

the companies that make and administer them. They believe – not completely without

justification – that many externally imposed tests are thrust upon them and their students for

political reasons, and that such tests are created by people who are out of touch with the world of

education. Others are all too willing to trust the judgments of the ‘‘experts’’

rather than their own expertise. They tend to assume that a test is valid simply because it was

written by a professional test writer and do not take the time to examine the test closely or look at

the match between the test and their own goals and objectives in teaching.

As a teacher trainer, I find it important to explore both of these points of view and point out

some of the dangers and misconceptions involved in each. Several scholars have pointed out the

inherently political nature of large-scale assessment, as tests can be used as gatekeeping

mechanisms that allow or restrict access to educational resources and opportunities (see for

example, Shohamy, 1998; White, Lutz, & Kamusikiri, 1996). Questions about whose agenda is

being served by large-scale tests, who has the right to determine what ‘‘good writing’’ means, and

what the intended and unintended effects of policy decisions about testing are on students,

teachers, and programs, need to be continually asked, particularly by teachers, who are among

those most affected by large-scale tests and most immediately aware of how these tests affect

their students. On the other hand, teachers may too easily adopt the view described by Scharton

(1996) of ‘‘right-minded teachers struggl[ing] against ruthless big-company test designers who

merely want to sell a test score to administrators interested in a quick fix’’ (p. 56) and may not

appreciate the professionalism behind the development of large-scale tests. My own experience

as a member of the TOEFL Committee of Examiners for three years helped disabuse me of that

particular perspective; I found that the people involved in developing the TOEFL were deeply

committed to creating high-quality, fair, and valid assessments and were just as concerned with

mitigating negative consequences to students as any teacher.


Of course, most teachers will not have the opportunity to get an up-close look at the inner

workings of testing companies, and teachers’ busy schedules make it difficult to devote time to

advocacy issues. However, one valuable assignment that I have used with teachers in training to

raise their awareness of some of these issues is to critique an existing test from the point of view

of reliability, validity, authenticity, practicality, and washback. Students have critiqued large-

scale tests and tests that are used in their institutions and are often surprised at what they find out.

For example, students have discovered that placement and exit tests used in local language

schools and community colleges frequently have no handbook or technical manual, no record of

how they were developed or validated, and often use writing prompts that are not pretested or

equated, so that there is no way to determine the effect of particular prompts or prompt types on

the scores given. At the same time, students come to appreciate that large testing organizations

such as Educational Testing Service, whom many are used to thinking of in negative terms

because of their dominance in high-stakes tests, in fact take tremendous care in defining

constructs, designing valid and reliable assessments, and maintaining a program of research to

ensure that their tests are of high quality.

As test users and as advocates for their students, teachers have a responsibility to understand

the powerful role that tests play in their students’ lives, and where relevant, to challenge misuses

of tests, for example, the use of a single essay test to make high-stakes decisions such as exit or

admissions, or the practice of administering writing prompts that have not been validated or even

pre-tested. Huot (1996) proposes a set of principles for assessing writing that take into account

the needs and concerns of all stakeholders in assessment; these principles can be a useful starting

point in evaluating any mandated assessments that are in place at an institution. Huot believes that writing assessment should be site-based, that is, developed in response to a need at a specific site; locally controlled by the institution involved; context-sensitive, taking into

account the instructional goals as well as the cultural and social environment of the institution;

rhetorically-based, adhering to ‘‘recognizable principles integral to the thoughtful expression

and reflective interpretation of text’’; and accessible, so that procedures for creating and scoring

writing assessments are available to all stakeholders, including the test takers themselves.

Teachers can also be advocates for fair testing by insisting that those who are administering

and scoring tests adhere to a code of practice and ethics, such as that promulgated by the

International Language Testing Association (http://iltaonline.com/). The ILTA code of ethics

consists of nine principles that should guide the professional behavior of language testers, each

elaborated upon with a set of annotations. For example, Principle 1 states: ‘‘Language testers

shall have respect for the humanity and dignity of each of their test takers. They shall provide

them with the best possible professional consideration and shall respect all persons’ needs, values

and cultures in the provision of their language testing service.’’ The draft code of practice outlines

responsibilities and obligations for test writers, institutions, and users of test results with regard to

good testing practices. Teachers should not hesitate to ask questions about the reliability and

validity of the tests that their students are required to take and how test results will be used, and

teachers should be proactive in bringing issues of questionable testing practices to the attention of

administrators.

Conclusion

In this paper, I have briefly touched upon several issues related to assessment that writing

teachers should be aware of, both in terms of tests that teachers develop for their own courses and

in terms of large-scale tests. Because assessment is such an integral component of teaching, it is


regrettable that many graduate programs in composition and TESOL do not require an

assessment course, and thus many teachers enter the classroom without a thorough grounding in

assessment issues. Fortunately, teachers are not without resources; there are regional associations

of language testing specialists that hold annual conferences, and, increasingly, there are

assessment-related sessions at major international conferences such as TESOL, as well as many excellent volumes that discuss assessment issues in clear, understandable terms. A solid

understanding of assessment issues should be part of every teacher’s knowledge base, and

teachers should be encouraged to equip themselves with this knowledge as part of their ongoing

professional development.

Acknowledgments

I am indebted to Diane Belcher, Alan Hirvela, and two anonymous reviewers for their helpful

suggestions on earlier drafts of this manuscript.

References

Alderson, J. C., Clapham, C., & Wall, D. (1995). Language test construction and evaluation. Cambridge: Cambridge

University Press.

Allen, D. (2002). Getting things done: The art of stress-free productivity. London: Piatkus Books.

Bachman, L. F. (1990). Fundamental considerations in language testing. Oxford: Oxford University Press.

Bachman, L., & Palmer, A. (1996). Language testing in practice. Oxford: Oxford University Press.

Brown, H. (2004). Language assessment: Principles and classroom practices. White Plains, NY: Pearson Education.

Brown, J. D., & Hudson, T. (1998). The alternatives in language assessment. TESOL Quarterly, 32, 653–675.

Byrd, P., & Nelson, G. (1995). NNS performance on writing proficiency exams: Focus on students who failed. Journal of

Second Language Writing, 4, 273–285.

Carson, J. G., Chase, N. D., Gibson, S. U., & Hargrove, M. (1992). Literacy demands of the undergraduate curriculum.

Reading Research and Instruction, 31(4), 25–50.

Cohen, A. D. (1994). Assessing language ability in the classroom. Boston, MA: Heinle and Heinle.

Davidson, F., & Lynch, B. K. (2002). Testcraft: A teacher’s guide to writing and using language test specifications. New

Haven, CT: Yale University Press.

Ellis, R. (2005). Measuring implicit and explicit knowledge of a second language: A psychometric study. Studies in

Second Language Acquisition, 27, 141–172.

Ferris, D. R., & Hedgcock, J. R. (2005). Teaching ESL composition: Purpose, process, and practice. Mahwah, NJ:

Lawrence Erlbaum Associates.

Gronlund, N. (2004). Writing instructional objectives for teaching and assessment (7th ed.). Upper Saddle River, NJ:

Pearson Education.

Hale, G., Taylor, C., Bridgeman, B., Carson, J., Kroll, B., & Kantor, R. (1996). A study of writing tasks assigned in academic degree programs (TOEFL Research Report No. 54). Princeton, NJ: Educational Testing Service.

Hamp-Lyons, L. (1991a). Assessing second language writing in academic contexts. Norwood, NJ: Ablex.

Hamp-Lyons, L. (1991b). Scoring procedures for ESL contexts. In L. Hamp-Lyons (Ed.), Assessing second language

writing in academic contexts (pp. 241–276). Norwood, NJ: Ablex.

Hamp-Lyons, L., & Condon, W. (2000). Assessing the portfolio: Principles for practice, theory, and research. Cresskill,

NJ: Hampton Press.

Hudson, T., & Brown, J. D. (2002). Criterion-referenced language testing. Cambridge: Cambridge University Press.

Hughes, A. (2002). Testing for language teachers (2nd ed.). Cambridge: Cambridge University Press.

Huot, B. (1996). Toward a new theory of writing assessment. College Composition and Communication, 47, 549–566.

International Language Testing Association. (2000). Code of ethics for ILTA. Retrieved October 3, 2007, from http://www.iltaonline.com/code.pdf.

Jacobs, H. L., Zinkgraf, S. A., Wormuth, D. R., Hartfiel, V. F., & Hughey, J. B. (1981). Testing ESL composition: A

practical approach. Rowley, MA: Newbury House.


Johns, A. M. (1991). Interpreting an English competency examination: The frustrations of an ESL science student. Written

Communication, 8, 379–401.

Kroll, B., & Reid, J. (1994). Guidelines for designing writing prompts: Clarifications, caveats, and cautions. Journal of

Second Language Writing, 3, 231–255.

Mabry, L. (1999). Portfolios plus: A critical guide to alternative assessment. Thousand Oaks, CA: Corwin.

McNamara, T. F. (1996). Measuring second language performance. London: Longman.

Purves, A. C., Soter, A., Takala, S., & Vahapassi, A. (1984). Towards a domain-referenced system for classifying

assignments. Research in the Teaching of English, 18(4), 385–416.

Raimes, A. (1991). Out of the woods: Emerging traditions in the teaching of writing. TESOL Quarterly, 25, 407–

430.

Scharton, M. (1996). The politics of validity. In E. M. White, W. D. Lutz, & S. Kamusikiri (Eds.), Assessment of writing:

Politics, policies, practices. New York: The Modern Language Association of America.

Shohamy, E. (1998). Critical language testing and beyond. Studies in Educational Evaluation, 24, 331–345.

Sweedler-Brown, C. O. (1993). ESL essay evaluation: The influence of sentence-level and rhetorical features. Journal of

Second Language Writing, 2, 3–17.

Weigle, S. (2002). Assessing writing. Cambridge: Cambridge University Press.

Weigle, S. (2005). Second language writing expertise. In K. Johnson (Ed.), Expertise in language learning and teaching

(pp. 128–149). Hampshire, England: Palgrave Macmillan.

White, E. M., Lutz, W. D., & Kamusikiri, S. (Eds.). (1996). Assessment of writing: Politics, policies, practices. New York: The

Modern Language Association.

Wolcott, W., & Leggett, S. M. (1998). An overview of writing assessment: Theory, research and practice. Urbana, IL:

National Council of Teachers of English.

Appendix. Selected references on assessment

Alderson, J. C., Clapham, C., & Wall, D. (1995). Language test construction and evaluation.

Cambridge: Cambridge University Press.

Bachman, L. F. (1990). Fundamental considerations in language testing. Oxford: Oxford

University Press.

Bachman, L., & Palmer, A. (1996). Language testing in practice. Oxford: Oxford University

Press.

Brown, H. (2004). Language assessment: Principles and classroom practices. White Plains,

NY: Pearson Education.

Brown, J. D., & Hudson, T. (1998). The alternatives in language assessment. TESOL

Quarterly, 32, 653–675.

Cohen, A. D. (1994). Assessing language ability in the classroom. Boston: Heinle and Heinle.

Davidson, F., & Lynch, B. K. (2002). Testcraft: A teacher's guide to writing and using language

test specifications. New Haven, CT: Yale University Press.

Hamp-Lyons, L. (1990). Second language writing: Assessment issues. In B. Kroll (Ed.),

Second language writing: Research insights for the classroom (pp. 69–87). Cambridge:

Cambridge University Press.

Hamp-Lyons, L. (1991). Assessing second language writing in academic contexts. Norwood,

NJ: Ablex.

Hamp-Lyons, L., & Kroll, B. (1997). TOEFL 2000-writing: Composition, community, and

assessment (TOEFL Monograph Series Report No. 5). Princeton, NJ: Educational Testing

Service.

Hudson, T., & Brown, J. D. (2002). Criterion-referenced language testing. Cambridge:

Cambridge University Press.

Huot, B. (1990). The literature of direct writing assessment: Major concerns and prevailing

trends. Review of Educational Research, 60, 237–263.


Shohamy, E. (2001). The power of tests: A critical perspective on the uses of language tests.

London: Longman/Pearson Education.

Weigle, S. (2002). Assessing writing. Cambridge: Cambridge University Press.

White, E. (1994). Teaching and assessing writing: Recent advances in understanding,

evaluating, and improving student performance (2nd ed.). San Francisco: Jossey-Bass.

Wolcott, W., & Leggett, S. M. (1998). An overview of writing assessment: Theory, research

and practice. Urbana, IL: National Council of Teachers of English.
