

Durham Research Online

Deposited in DRO:

05 February 2007

Version of attached file:

Published Version

Peer-review status of attached file:

Not peer-reviewed

Citation for published item:

Ridgway, J., McCusker, S. and Pead, D. (2004) 'Literature review of e-assessment.' Futurelab, Bristol.

Further information on publisher's website:

http://www.futurelab.org.uk/resources/publications-reports-articles/literature-reviews/Literature-Review204

Publisher's copyright statement:

This Open Access Policy allows anyone to access our text content (where we have copyright - please note the exceptions listed below) electronically without charge, as long as certain conditions are met. Users are welcome to download, save, perform or distribute this work electronically or in any other format, including in foreign language translation, without written permission subject to the conditions set out in the Futurelab Open Access Licence, some of which are as follows:

• The material cannot be used for commercial gain including professional, political or promotional uses or for any financial gain.

• The material must be used in full and without alterations or amendments.

• Futurelab and the authors must be acknowledged (with the Futurelab logo - see press page for download details) as the original source of the material and the relevant Futurelab webpage (where the material can be found) must be given in a prominent position. It should also acknowledge that its use is subject to the terms of this licence.

• Only text is covered by this licence - no pictures, images, diagrams, moving images, sound, video, downloads of prototypes or software is to be used.

• No more than 500 copies of any one piece of work may be reproduced.

• You must advise Futurelab of where and when the work will be reproduced - please send an e-mail to [email protected].

For a full list of the criteria to be met, please read the Futurelab Open Access Licence.

Additional information:

Report number 10.

Use policy

The full-text may be used and/or reproduced, and given to third parties in any format or medium, without prior permission or charge, for personal research or study, educational, or not-for-profit purposes provided that:

• a full bibliographic reference is made to the original source

• a link is made to the metadata record in DRO

• the full-text is not changed in any way

The full-text must not be sold in any format or medium without the formal permission of the copyright holders.

Please consult the full DRO policy for further details.

Durham University Library, Stockton Road, Durham DH1 3LY, United Kingdom
Tel: +44 (0)191 334 3042 | Fax: +44 (0)191 334 2971

http://dro-test.dur.ac.uk


Literature Review of E-assessment

REPORT 10:

FUTURELAB SERIES

Jim Ridgway and Sean McCusker, School of Education, University of Durham
Daniel Pead, School of Education, University of Nottingham


FOREWORD

I have to admit to being someone who for many years has avoided thinking about assessment – it somehow always seemed distant from my interests, divorced from my concerns about how children learn with technologies and, to be honest, just a little less interesting than other things I was working on… In recent years, however, working in the field of education and technology, it has become clear that anyone with an interest in how we create equitable, engaging and relevant education systems needs to think long and hard about assessment. Futurelab’s conference ‘Beyond the Exam’ in November 2003 further highlighted this point, as committed and engaged educators, software and media developers came together to raise a rallying cry for a rethink of our current assessment practices.

What I and many others working in this area have come to realise is that we can’t just ignore assessment, or simply see it as ‘someone else’s job’. Assessment practices shape, possibly more than any other factor, what is taught and how it is taught in schools. At the same time, these assessment practices serve as the focus (perhaps the only focus in this day and age) for a shared societal debate about what we, as a society, think are the core purposes and values of education. If we wish to create an education system that reflects and contributes to the development of our changing world, then we need to ask how we might change assessment practices to achieve this.

The authors of this review provide a compelling argument for the central role of assessment in shaping educational practice. They outline the challenges and opportunities posed by the changing global world around us, and the potential role of technologies in our assessment practices. Both optimistic and practical, the review summarises existing research and emergent practice, and provides a blueprint for thinking about the risks and potential that await us in this area.

We look forward to hearing your response to this review.

Keri Facer, Director of Learning Research [email protected]


CONTENTS:

EXECUTIVE SUMMARY

PURPOSE

SECTION 1: ASSESSMENT DRIVES EDUCATION

SECTION 2: HOW AND WHERE MIGHT ASSESSMENT BE DRIVEN?

SECTION 3: CURRENT DEVELOPMENTS IN E-ASSESSMENT

SECTION 4: OPPORTUNITIES AND CHALLENGES FOR E-ASSESSMENT

GLOSSARY

BIBLIOGRAPHY

APPENDIX: FUNDAMENTALS OF ASSESSMENT



EXECUTIVE SUMMARY

“E-assessment must not simply invent new technologies which recycle our current ineffective practices.” Martin Ripley, QCA, 2004

Assessment is central to educational practice. High-stakes assessments exemplify curriculum ambitions, define what is worth knowing, and drive classroom practices. It is essential to develop systems for assessment which reflect our core educational goals, and which reward students for developing skills and attributes which will be of long-term benefit to them and to society. There is good research evidence to show that well designed assessment systems lead to improved student performance. In contrast, the USA provides some spectacular examples of systems where narrowly focused high-stakes assessment systems produce illusory student gains; this ‘friendly fire’ results at best in lost opportunities, and at worst in damaged students, teachers and communities.

ICT provides a link between learning, teaching and assessment. In school, ICT is used to support learning. Currently, we have bizarre assessment practices where students use ICT tools such as word processors and graphics calculators as an integral part of learning, and are then restricted to paper and pencil when their ‘knowledge’ is assessed.

Assessment systems drive education, but are themselves driven by a number of factors, which sometimes are in conflict. To understand likely developments in assessment, we need to examine some of these drivers of change. Implications of technology, globalisation, the EU, multinational companies, and the need to defend democracy are discussed. All of these influences are drivers for increased uses of ICT in assessment. Many of the developments require the assessment of higher-order thinking. However, there is a constant danger that assessment systems are driven in undesirable ways, where things that are easy to measure are valued more highly than things that are more important to learn (but harder to assess). In order to satisfy educational goals, we need to develop ways to make important things easier to measure - and ICT can help.

All is not well with education. The Tomlinson Report (2004) identifies major problems with current educational provision at ages 14-19 years: there is a plethora of qualifications; too few students engage with education; the drop-out rate is scandalously high; and the most able students are not stretched by their studies. Young people are not being equipped with the generic skills, knowledge and personal attributes they will need in the future. A radical approach to qualifications is suggested which (in our view) can only be introduced if there is a widespread adoption of e-assessment.

The UK government is committed to a bold e-assessment strategy. Components include: ICT support for current paper-based assessment systems; some online, on-demand testing; and the development of radical, ICT-set and assessed tests of ICT capability. Some good progress has been made with these developments.

E-assessment can be justified in a number of ways. It can help avoid the meltdown of current paper-based systems; it can assess valuable life skills; it can be better for users – for example by providing on-demand tests with immediate feedback, and perhaps diagnostic feedback, and more accurate results via adaptive testing; it can help improve the technical quality of tests by improving the reliability of scoring.

E-assessment can support current educational goals. Paper and pencil tests can be made more authentic by allowing students to word process essays, or to use spreadsheets, calculators or computer algebra systems in paper-based examinations. It can support current UK examination processes by using Electronic Data Exchange to smooth communications between schools and examinations authorities; current processes of training markers and recording scores can be improved. Systems where student work is scanned then distributed have advantages over conventional systems in terms of logistics (posting and tracking large volumes of paper, for example), and continuous monitoring can ensure high marker reliability. Current work is pushing boundaries in areas such as text comprehension, and automated analysis of student processes and strategies.
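The report does not say how such ‘continuous monitoring’ of markers would be implemented. One plausible mechanism (an assumption offered for illustration, not a description of any awarding body’s actual system) is to seed each marker’s queue of scanned scripts with items that already carry agreed reference marks, and to flag markers whose agreement with those reference marks falls below a threshold. A minimal sketch in Python:

```python
def check_marker_reliability(awarded, reference, tolerance=1, min_agreement=0.9):
    """Compare a marker's scores on seeded scripts with agreed reference marks.

    awarded and reference are dicts mapping seed-script id -> mark (hypothetical
    data structures for this sketch). A seed counts as agreed when the marker is
    within `tolerance` marks of the reference. Returns the agreement rate and
    whether it meets the threshold, so a marking system could pause or re-train
    the marker before releasing more live scripts to them.
    """
    common = [sid for sid in awarded if sid in reference]
    if not common:
        return None, False
    agreed = sum(abs(awarded[s] - reference[s]) <= tolerance for s in common)
    rate = agreed / len(common)
    return rate, rate >= min_agreement


# Example: a marker who drifts high on two of five seeded scripts.
awarded = {"seed1": 14, "seed2": 17, "seed3": 9, "seed4": 20, "seed5": 12}
reference = {"seed1": 14, "seed2": 15, "seed3": 9, "seed4": 17, "seed5": 12}
print(check_marker_reliability(awarded, reference))  # (0.6, False)
```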

E-assessment can be used to assess ‘new’ educational goals. Interactive displays which show changes in variables over time, microworlds and simulations, interfaces that present complex data in ways that are easy to control, all facilitate the assessment of problem-solving and process skills such as understanding and representing problems, controlling variables, generating and testing hypotheses, and finding rules and relationships. ICT facilitates new representations, which can be powerful aids to learning. Little is known about the cognitive implications of these representations; however, it seems likely that complex ideas (notably in reasoning from evidence of various sorts) will be acquired better and earlier than they are at present, and that the standards of performance demanded of students will rise dramatically. Here, we also explore ways to assess important but ill-defined goals such as the development of metacognitive skills, creativity, communication skills, and the ability to work productively in groups.
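The review does not specify how the automated analysis of student processes and strategies in such environments would work. As one illustrative possibility only – the microworld, variable names and scoring rule below are invented for the example – an assessment engine that logs each experiment a student runs in a simulation could score how consistently the student controls variables by changing one factor at a time:

```python
def varies_one_factor_at_a_time(trials):
    """Given a list of trials from a simulation log (each a dict of variable
    settings), return the proportion of consecutive trial pairs in which
    exactly one variable was changed - a crude indicator of a
    'control of variables' strategy."""
    if len(trials) < 2:
        return 0.0
    controlled_pairs = sum(
        1
        for before, after in zip(trials, trials[1:])
        if sum(before[var] != after[var] for var in before) == 1
    )
    return controlled_pairs / (len(trials) - 1)


# Example log from a hypothetical plant-growth microworld.
log = [
    {"light": "low", "water": "low", "soil": "sand"},
    {"light": "high", "water": "low", "soil": "sand"},   # only light changed
    {"light": "high", "water": "high", "soil": "clay"},  # two variables changed
]
print(varies_one_factor_at_a_time(log))  # 0.5
```

A measure of this kind is only a proxy for process skill; it would sit alongside human or automated judgement of whether the hypotheses generated and conclusions drawn are actually warranted by the data.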

A major problem with education policy and practice in England is the separation of ‘academic’ and ‘practical’ subjects. In the worst case, to be able to invent and create something of value is taken to be a sure sign of feeble-mindedness; whereas to opine on the work of others shows towering intellectual power. A diet of academic subjects with no opportunities to act upon the world fails to equip students with ways to deal with their environments; a diet of practical subjects which do not engage higher-order thinking throughout the creative process equips students only to become workers for others. Both streams produce one-handed people, and polarised societies. E-portfolios can provide working environments and assessment frameworks which support project-based work across the curriculum, and can offer an escape from one of the most pernicious historical legacies in education. E-portfolios solve problems of storing student work, and make the activity of documenting the process of creation and reflection relatively easy. Reliable teacher assessment is enabled. There is likely to be extensive use of teacher assessment of those aspects of performance best judged by humans (including extended pieces of work assembled into portfolios), and more extensive use made of on-demand tests of those aspects of performance which can be done easily by computer, or which are done best by computer.

The issue for e-assessment is not if it will happen, but rather, what, when and how it will happen. E-assessment is a stimulus for rethinking the whole curriculum, as well as all current assessment systems. New educational goals continue to emerge, and the process of critical reflection on what is important to learn, and how this might be assessed authentically, needs to be institutionalised into curriculum planning.

E-assessment is certain to play a major role in defining and implementing curriculum change in the UK. There is a strong government commitment to high quality e-assessment, and good initial progress has been made; nevertheless, there is a need to be vigilant that the design of assessment systems is not driven by considerations of cost.

Major challenges of ‘going to scale’ have yet to be faced. A good deal of innovative work is needed, coupled with a grounded approach to system-wide implementation.

PURPOSE

The purpose of this report is:

• to assert the centrality of assessment in education systems

• to identify ‘drivers’ of assessment, and their likely impact on assessment, and thence on education systems

• to describe current, radical plans for increased use of high-stakes e-assessment in the UK

• to describe and exemplify current uses of ICT in assessment

• to explore the potential of new technologies for enhancing current assessment (and pedagogic) practices

• to identify opportunities and to suggest ways forward

• to ‘drip feed’ criteria for good assessment throughout (set out explicitly in an appendix).

This report has been designed to: present key findings on research in assessment; describe current UK government plans, and likely future developments; provide links to interesting examples of e-assessment; offer speculations on possible future developments; and to stimulate a debate on the role of e-assessment in assessment, teaching, and learning.

The key findings and implications of the report are presented within the Executive Summary.


1 ASSESSMENT DRIVES EDUCATION

Assessment is an integral part of being. We all make myriads of assessments in the course of everyday life. Is Jane a good friend? Which Rachel Whiteread do I like best? Does my bum look big in this? The questions we ask, and the referents, give an insight into the way we see ourselves and the world (eg Groucho Marx’s “Please accept my resignation. I don’t want to belong to any club that will accept me as a member”). For aspects of our lives that are goal-directed (getting promoted, going shopping), assessment is essential to progress. To be effective, it is necessary to know something of the intended goal; in well-defined situations, this will be relatively easy, and goals will be specified clearly. In ill-defined situations, such as creative acts, and research, the goals themselves might not be well specified, but the criteria for assessing products and processes may well be.

1.1 ASSESSMENT AND EDUCATION

Assessment is central to the practice of education. For students, good performance on ‘high-stakes’ assessment gives access to further educational opportunities and employment. For teachers and schools, it provides evidence of success as individuals and organisations. Cultures of accountability drive everyone to be ‘instrumental’ – how do I demonstrate success (without compromising my deep values)? Assessment systems provide the ways to measure individual and organisational success, and so can have a profound driving influence on systems they were designed to serve.

There is an intimate association between teaching, learning and assessment, illustrated in Fig 1. Robitaille et al (1993) distinguish three components of the curriculum: the intended curriculum (set out in policy statements), the implemented curriculum (which can only be known by studying classroom practices) and the attained curriculum (which is what students can do at the end of a course of study). The links between these three aspects of the curriculum are not straightforward. The ‘top down’ ambitions of some policy makers are hostages to a number of other factors. The assessment system – tests and scoring guides - provides a far clearer definition of what is to be learned than does any verbal description (and perhaps provides the only clear definition), and so is a far better basis for curriculum planning at classroom level than are grand statements of educational ambitions. Teachers’ values and competences also mediate policy and attainment; however, the assessment system is the most potent driver of classroom practice.


Fig 1: Adapted from Pellegrino, Chudowsky and Glaser (2001). [Diagram linking Learning, Assessment and Pedagogy.]


In the UK, there is a long-standing belief (eg Cockcroft 1982) that assessment systems have a direct effect on curriculum and on classroom practices. In Australia, Barnes, Clarke and Stevens (2000) traced the effects of changing a high-stakes assessment on classroom practice, and claimed evidence for a direct causal link. Mathews (1985) traced the distorting effects on the whole school curriculum of formal examinations for university entrance (now A-levels), introduced when the university sector expanded beyond Cambridge, Durham and Oxford – to accommodate as much as 5% of the population. There was a perceived need for entrance tests to pre-university courses (O-levels) – designed for about 20% of the population - followed by a perceived need to align all certification in the education system (notably O-levels and CSE). This linkage between assessment for university admission and the assessment of low-attaining students had a direct and often damaging impact on courses of study for lower attaining students (Cockcroft 1982).

Ill-conceived assessment can damage educational systems. Klein, Hamilton, McCaffrey and Stecher (2000) present evidence on the ‘Texas Miracle’. Here, scores on a rather narrow test designed by the State of Texas showed very large gains over a period of just four years. This test is used to determine the funding received by individual schools. Unfortunately, scores on a national test which supposedly measured the same sort of student attainment were largely unchanged in the same time interval. So scores on narrow tests can rise, even when underlying student attainment does not. The ‘Texas Miracle’ was used in the election campaign of President Bush, as evidence of his effectiveness as a governor in raising educational standards.

Linn (2000) points to an underhand method sometimes used by incoming superintendents of school districts to show the effectiveness of their leadership. Most commercially available multiple choice tests of educational attainment have a number of ‘parallel test forms’, designed to measure the same knowledge and skills in the same way, but with slightly different formats (so ‘12 men take six days, how long will six men take?’ becomes ‘12 men take six days, how long will four men take?’). These tests are designed in such a way that student scores on two parallel forms would be the same (plus or minus measurement error). Test designers do this so that school districts can change the test form every year, in order that tests measure the underlying knowledge and skills, not the ability to memorise the answers to specific questions. Linn (2000) gives an example where an incoming superintendent decides to use a new test form and also chooses to use this same test form in successive years. The result is a steady increase in student scores simply because of poor test security – students are taught to memorise answers. It appears that the superintendent has worked miracles with student attainment, because scores have gone up so much. However, when students are tested on a new parallel form, and have to work out the answers and not rely on memory, then scores plummet. So the high reputation for increasing student performance is built upon deliberate deceit. This is bad for teachers and students, and bad for public morality.
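(To make the parallelism concrete: the item fixes the amount of work at 12 × 6 = 72 man-days, so the answer to the first form is 72 ÷ 6 = 12 days and to the second is 72 ÷ 4 = 18 days; both forms exercise exactly the same inverse-proportion reasoning, and only the surface numbers differ.)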

High-stakes assessment systems define what is rewarded by a culture, and therefore the knowledge that is valuable. It is unsurprising that high-stakes assessment has a profound effect on both learning and teaching. Decisions about assessment systems are not made in a vacuum; the educational community in the UK (but not universally) is involved in the design of assessment systems, and these decisions are usually grounded in discussions on what is worth knowing, and in the practicalities of teaching different concepts and techniques to students of different ages.

1.2 THE IMPACT OF ASSESSMENT ON ATTAINMENT

An extensive literature review by Black and Wiliam (2002) showed that well designed formative assessment is associated with major gains in student attainment on a wide range of conventional measures of attainment. This result was found across all ages and all subject disciplines. Topping (1998) reviewed the impact of peer assessment between students in higher education on writing, and found large positive effects. A major literature review commissioned by the EPPI Centre (2002) showed that regular summative assessment had a large negative effect on the attainment of low-attaining students, but did little harm to high-attaining students. These studies provide strong evidence that good assessment practices produce large performance gains. These gains are amongst the largest gains found in any educational ‘treatments’. Similarly, poor assessment systems have negative – not neutral – effects on the performance of weak students. It follows that when we consider the introduction of e-assessment, we should be aware that we are working with a very sharp sword.

1.3 ICT AND ASSESSMENT

ICT perturbs the links between learning, teaching and assessment in a number of distinct ways:

1 ICT has changed the ways that research is conducted in most disciplines. Linguists analyse large corpuses of text; geographers use GIS systems; scientists and engineers use modelling packages. Everyone uses word processors, databases and spreadsheets. Students should use contemporary research methods; if they do not, school-based learning will become increasingly irrelevant to understanding developments in knowledge. Assessment should reinforce good curriculum practice. We are approaching a bizarre situation where students use powerful and appropriate tools to support learning and solve problems in class, but are then denied access to these tools when their ‘knowledge’ is assessed.

2 ICT can support educational goals that have been judged to be desirable for a long time, but hard to achieve via conventional teaching methods. In particular, ICT can support the development of higher-order thinking skills such as critiquing, reflection on cognitive processes, and ‘learning to learn’, and can facilitate group work, and engagement with extended projects; ICT competence is itself a (moving) target for assessment.

3 New technologies raise an important set of questions about what is worth learning in an ICT-rich environment; what can be taught, given new pedagogic tools; and how assessment systems can be designed which put pressure on educational systems to help students achieve these new goals. If we ignore these important questions, we run the risk that e-assessment will be designed on the basis of convenience, with disastrous consequences for educational practice.

1.4 ON THE NATURE OF SUMMATIVE AND FORMATIVE ASSESSMENT

We should distinguish between summative and formative assessment, which are different in conception and function. In principle, it is easy to distinguish between them. Summative assessment takes place at the end of some course of study, and is designed to summarise performance and attainment at the time of testing; high-stakes, end of schooling assessment such as GCSE provides a good example. Formative assessment takes place in mid-course, and is intended to enhance students’ final performance; comments on the first draft of an essay provide an example.

Summative and formative assessments differ on a number of dimensions. These include:

Consequences: summative assessment is often highly significant for the student and teacher, whereas formative assessments need not be.

Exchange value: summative assessments often have a value outside the classroom - for certification, access to further courses, and careers; formative assessment usually has no currency outside a small group.

Audience: summative evaluations often have a large audience: the student and teacher, parent, school, employer and educational system. Formative evaluation can have a small audience; perhaps just the student and teacher (and parent in younger years).

Mendacity quotient: in summative assessment, students are advised to focus on things they do best and hide areas of ignorance; in formative assessment, it is more sensible for students to focus on things they understand least well.

Agency: summative assessment is often done to students, perhaps without their willing participation. Formative assessment is often actively sought out by the student; good formative feedback depends on student engagement in the process of revision.

Validation methods: summative assessment is often judged in terms of predictive validity - are students who got A grades more likely to get top grades in college (but see Messick 1995)? Formative assessment might be judged in terms of its usefulness in undoing predictive validity – what feedback can we give to students with C grades, so that they perform as well in college as anyone else?

Quality of the assessment: for summative assessment, the assessment method should achieve appropriately high standards of reliability and validity; for formative assessment, ‘reliability and validity’ are negotiable between teacher and student.

Resources required: the nature of summative assessment can be influenced by considerations of cost and time. In terms of cost, the estimation of the cost of testing is often done very badly, especially in the USA. There, it is common for ‘cost’ to be equated with the money paid for the test and its scoring, not the real cost, which is the opportunity cost, measured in terms of the reduction in time spent learning which has been diverted to useless ‘test prep’. Formative evaluation should be an integral part of the work of teaching, so estimation of cost focuses naturally on opportunity costs – just what is an effective allocation of teaching and learning time to formative evaluation? In terms of time, for summative assessment time is easy to measure (so long as useless ‘test prep’ is counted in); again, formative assessment is an integral part of teaching.

Knowledge and the knowledge community: summative assessment is explicit about what is being assessed, and ideas about the nature of knowledge are shared within a wide community; with formative evaluation, ideas about the nature of knowledge might be negotiated by just two people.

Status of the assessment: in summative assessment, the assessment can be ignored by the student; formative assessment simply isn’t formative assessment unless the student does something with it to improve performance.

Focal domain: it is useful to distinguish between cognitive, social and emotional aspects of performance. Summative assessment commonly focuses on cognitive performance; formative assessment can run wild in the social and affective domains.

Theory dependence: summative assessment rarely rests on theory; formative assessment is likely to be ‘theory-genic’ as participants discuss progress, what is known, how to learn and remember things, and how best to use evidence.

Tool types: summative assessment commonly uses timed written assessments where the structure is specified in advance, and which are scored using a common set of rules. Tests are often designed to discriminate between students, and to put them into a rank order in terms of performance. Formative assessment commonly uses a variety of methods such as portfolios of work, student draft work, student annotations of their work, concept mapping tools, diagnostic interviews and diagnostic tests. Each student is their own referent – comparison with other students may not be useful, and is often harmful to learning.

1.4.1 Reflecting on summative and formative assessment

Despite the differences highlighted here, the two sorts of assessment have many areas of overlap:

• a student can change their study methods on the basis of an end-of-year examination result (summative assessment used for formative purposes)

• summative evaluation of students can provide formative evaluation for teachers, schools and educational systems

• formative assessment always rests on some sort of summative assessment – feedback and discussion must rest on some assessment of the current state of knowledge

• some summative assessment should include the ability to benefit from formative assessment – learning to learn is an important educational goal, and should be assessed, formally

• summative assessment (eg of student teachers) should include the ability to provide formative assessment.

1.5 SUMMARY OF SECTION 1

Assessment lies at the heart of education. Assessment systems exemplify the goals and values of education systems. High-stakes assessment systems have a direct influence on classroom practices. Any discussion of assessment raises important questions about what is worth knowing, the extent to which such knowledge can be taught, and the best ways to support knowledge acquisition.

Well designed assessment systems are associated with large increases in student performances; frequent testing and reporting of scores damages weaker students. Badly designed high-stakes assessment systems can have strong negative consequences for students, communities and societies.

In this section, we distinguish between summative assessment (assessment of learning) and formative assessment (assessment for learning), and compare their characteristics.

ICT has changed the ways that academic work is done; this should be reflected in the tools used in education for both learning and assessment. Bizarre current practices where ICT is an integral part of learning, but where students are denied access to technology during assessment, must be reformed as a matter of urgency. Skills in ICT are essential for much modern living, and so should be a target for assessment.


2 HOW AND WHERE MIGHT ASSESSMENT BE DRIVEN?

There is a comforting belief that decisions about education and education systems are made within those systems, and that outside agents – notably foreign outside agents – have little or no influence on internal affairs. This has been true in the UK for a long time, but has not been true in countries which (for example) make use of UK examinations to certify students. If we are to explore plausible scenarios about the future impact of ICT on assessment, it is necessary to take account of ‘drivers of change’. Here, we consider technology, globalisation, the rise of mass education, problems of political stability, current government plans, and likely government plans, as drivers of educational change and, in parallel, of likely changes in assessment systems.

2.1 TECHNOLOGY AS DRIVER OF SOCIAL CHANGE

Technology is a key driver of social change. Technology has transformed the ways we work, our leisure activities, and the ways we interact with each other. The use of the web is growing at an extraordinary rate, and people increasingly have access to rich sources of information. Metcalfe’s law states that the value of a network rises dramatically as more people join in – its value doesn’t just increase steadily. The capability of computer hardware and software continues to improve, and features are being added (such as high quality video) which make computer use increasingly attractive, and well suited to supporting human-human interactions. The web is an increasingly valuable resource which is becoming progressively easier to use, and is attracting users at an increasing rate. Technology is ubiquitous: as well as computers in the form of desktops and laptops, there has been an explosion of distributed computer power in the form of mobile phones which are also fully functioning personal digital assistants (PDAs), containing features such as a spreadsheet, database and word processor. It has been estimated that there are over three billion mobile phones worldwide (Bennett 2002); as before, this number is growing very fast, and new phones are manufactured with an increasing range of features. Technology as a driver has a number of likely effects on assessment. New skills (and so new assessments) are needed for work and social functioning, which require fluent use of ICT; technology has had a profound effect on many labour intensive work practices, many of which resemble educational assessment. The use of ICT for assessment has hardly begun, and some new technologies such as mobile phones offer great promise not only because of their ubiquity (which might solve a current problem of access which has restricted widespread use of ICT in assessment in the past), but also because new technologies have become a natural form of communication for very many young people.
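(On one common formulation of Metcalfe’s law – the report itself does not give the formula – a network of n users supports n(n-1)/2 potential pairwise connections, so its value grows roughly with the square of the number of users rather than in simple proportion to it.)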

2.2 GLOBALISATION

Globalisation is probably the most obvious driver of change. Significant features for the current discussion are: the mobility of capital, employment opportunities (jobs), and people. Cooperation between countries (eg in the European Union), and the pervasive influence of multinational companies also have profound social effects.


The mobility of capital and jobs has changed the profile of the job market, with new kinds of jobs being created (eg in ICT) and old ones disappearing (eg in manufacturing industries). It is very easy to export jobs and capital from the developed world to the developing world (eg by relocating telephone call centres, or by establishing factories in countries with low wage costs). For people (and economies) to be successful, they must continue to learn new skills, and to adapt to change. Retraining will often require re-certification of competence, with the obvious consequence of further assessment, and the need to design assessment systems appropriate to the new needs of employment. These are pressures for more, and effective, systems of competence-based assessment.

Migration for work and education raises similar issues. The developed world has a need to import highly skilled workers; universities worldwide seek international students. In both cases, there is a need to certify the competence of applicants, and to reject those least likely to be effective workers, or to complete courses successfully (because of a lack of fluency in the language of instruction, for example). Financial considerations make it impractical for testing to take place in the target country, and so a good deal of testing takes place in the country supplying workers or students. Again, it is common to use competence tests which are externally mandated and designed. Language testing provides a good example; a computer-based version of the Test of English as a Foreign Language (TOEFL) has been developed which adjusts the difficulty level of the questions in the light of the performance of the candidate on the test (see www.ets.org/toefl).

For developed economies to maintain their global dominance, their economies must be geared to ‘adding value’ to raw materials (or to creating value from nothing, as in the entertainment and finance industries). This requires changes in the education system which encourage creative activities, and good problem-solving ability. Employment in a post-industrial society is likely to depend on higher-order thinking skills, such as ‘learning to learn’. This requires that these thinking skills be exemplified and assessed, if they are to receive appropriate attention in school.

Cooperation between countries in Europe will also have an effect on assessment systems. Currently, there is a problem that qualifications in different member states (‘architect’, ‘engineer’) are gained after rather different amounts of training, and equip people for quite different levels of professional responsibility. This makes job mobility very difficult. The Bologna Accord is an agreement between EU member states that all universities will adopt the same pattern of professional training (typically a three-year undergraduate degree followed by a two-year professional qualification) in order to make qualifications in different member states more comparable. Convergence of course structure is likely to lead to a convergence of assessment systems, in line with the desire to increase mobility (see www.engc.org.uk/international/bologna.asp for an analysis of the impact of the Bologna, Washington and Sydney Accords on engineering).

Globalisation is having a profound effect on educational systems worldwide. In higher education, Slaughter and Leslie (1997) describe the response of universities in several countries to ‘academic capitalism’ – a global trend to view knowledge as a ‘product’ to be created and controlled, and to see universities as organisations which produce knowledge and more knowledgeable people as efficiently as possible. They document the changes in university structures and functioning which have been a response to such pressures; these include greater collaboration on teaching between universities, and mutual accreditation of courses. Again, the need for comparability of course difficulty and student attainment will lead to a careful re-examination of assessment systems, and some homogenisation.

Multinational companies also drive changes in assessment practices. These companies are successful in part because of their emphasis on uniform standards; one is unlikely to get a badly cooked hamburger in McDonald’s, or a copy of Excel that functions worse than other copies. This emphasis on quality control extends to job qualifications, and to standards required of workers. In fast changing markets such as technology provision, retraining workers and checking their competence to use, install or repair new equipment or software requires appropriate assessment of competence. The needs of employers for large numbers of staff who are able to use ICT effectively as part of their job have led to trans-national qualifications such as the European Computer Driving Licence (www.ecdl.co.uk). Such examples are interesting because they are set by international organisations, or commercial organisations, and in some cases (eg the Microsoft Academy programme - www.microsoft.com/education/msitacademy/ITAPApplyOnline.aspx), state-funded educational organisations must submit themselves for examination by a commercial company before they are allowed to certify student competence.

The scale on which such examinations are taken is impressive. Bennett (2002) describes the National Computer Rank Examination, China, which is a proficiency exam to assess knowledge of computer science and the ability to use it; two million examinations were taken in 2002. Tests for the European Computer Driving Licence have been taken by more than a million people.

2.3 MASS EDUCATION

Mass education has developed rapidly and recently. In the last 30 years, the percentage of the UK population being educated at university has risen from about 5% to about 40%. This puts pressures on academic systems to develop efficient assessment systems.

There is now a great deal of distance education. China plans to have five million students in 50-100 online colleges by 2005. At least 35 US states have virtual universities (Bennett 2002). (The recent failure of the E-university in the UK - www.parliament.uk/post/pn200.pdf - and of the US Open University, shows that such ventures are not always successful!) A great deal of curriculum material is delivered via a variety of technologies (the Massachusetts Institute of Technology is in the process of putting all its course material online, for example – see http://ocw.mit.edu/index.html). Over 3,000 textbooks are freely available online at the National Academy Press (www.nap.edu). The use of technology in the assessment process is a logical consequence of these developments.


2.4 DEFENDING DEMOCRACY

Problems of potential political instability provide another driver of change. The rise of fundamentalism (both Christian and Moslem) can be seen as a loss for rationalism. Electoral apathy is a threat to the democratic process. One problem for politicians is to explain complex policies to citizens. This is made difficult if citizens understand little about modelling (such as ideas of multiple causality, feedback in systems, lead and lag times of effects etc). Informed citizens need to understand something about ways to describe and model complex systems, in order that they do not give up on democracy simply because they do not understand the policy arguments being made. Understanding arguments about causality and some experience of modelling systems via ICT should be major educational goals. These goals will need to be exemplified and valued by high-stakes assessment systems, if they are to become part of students’ educational experiences.

Education for citizenship has received increasing emphasis in the UK. Some of the educational goals – such as understanding different perspectives, increased empathy, and community engagement - seem intangible. However, ICT can play a role in posing authentic questions (for example via video) and could play a role in formative assessment, and perhaps in summative assessment (using portfolios).

2.5 GOVERNMENT-LED REFORMS IN CURRICULUM AND ASSESSMENT

Governments are responsive to global pressures, and analyses of the limitations of current national systems. Two current UK initiatives are likely to lead to radical changes in assessment practices, notably to increase the use of e-assessment. One is the DfES E-assessment Strategy (www.dfes.gov.uk/elearningstrategy/default.stm) which maps out a tight timeline for change in current examination systems; the other is the Tomlinson (2004) Report 14-19 Curriculum And Qualifications Reform, which proposes radical changes in educational provision itself (with direct consequences for e-assessment).

The Tomlinson Report (2002) into A-level standards argued that the examinations system is operating at, or perhaps beyond, capacity. According to Tomlinson (2002), in 2001, 24 million examination scripts and coursework assignments were produced at GCSE, AS and A level. In terms of the number of students being assessed, in 2002 there were around six million GCSE entries and nearly two million children sat Key Stage tests. More students are engaging in post-compulsory education; the introduction of modular A-levels, and the popularity of AS courses has resulted in an increase in the number of examinations taken (Tomlinson reports a growth of 158% over a 20-year period). There is an associated problem concerning the supply of examiners, in terms of both recruitment and training. Roan (2003) estimated that about 50,000 examiners were involved in the assessment of GCSEs, GNVQs and A-levels. Continued expansion of the current examination system without some changes does not seem a viable option. ICT support for current activities, described later, might well be of benefit.

ICT-based assessment is now part of UK government policy, and will be introduced progressively, but on a tight timescale. The DfES E-learning Strategy will be accompanied by radical changes to the assessment process, for which the Qualifications and Curriculum Authority are responsible (www.qca.org.uk/adultlearning/workforce/6877.html). Over the next five years, the following activities are planned:

“All new qualifications should include assessment on-screen

Awarding bodies set up to accept and assess e-portfolios

Most examinations should be available optionally on-screen, where appropriate

National curriculum tests available on-screen for those schools that want to use them

The first on-demand GCSE examinations are starting to be introduced

10 new qualifications specifically designed for electronic delivery and assessment”

QCA Blueprint (2004)

The timescale for these changes is short. For example, in 2005, 75% of basic and key skills tests will be delivered on-screen; in 2006, each major examination board will offer live GCSE examinations in two subjects, and will pilot at least one qualification specifically designed for electronic delivery and assessment; in 2007, 10% of GCSE examinations will be administered on-screen; in 2008, there will be on-demand testing for GCSEs in at least two subjects.

Good progress has been made with these developments. For example, Edexcel is carrying out a pilot scheme for online GCSEs in chemistry, biology, physics and geography with 200 schools and colleges across the West Midlands and the west of England. AQA conducted a live trial in March 2004 on 20,000 scripts (Adams and Hudson 2004); in Summer 2004, about 500,000 marks (5% of the total) will be collected; by 2007, 100% of marks will be captured electronically.

The Tomlinson Report (2004, in prep) will offer a more radical challenge to assessment practices. The Interim Report (Tomlinson 2004) identified a number of problems with the existing system. These include concerns about:

• excellence – the current system does not stretch the most able young people (in 2003, over 20% of A-level entries resulted in grade A)

• vocational training – there is an historic failure to provide high-quality vocational courses that stretch young people and prepare them for work

• vocational learning is often assessed by external written examinations, not practical and continuous assessment

• assessment – the burden on students and teachers is too high

• disaffection – our high drop-out rates are scandalous

• the plethora of qualifications – currently around 4,000

• curricula – are often narrow, overfull, and limit in-depth learning

• too few students develop high levels of competence in mathematical skills, communication, working with others, or problem-solving

• failure to equip young people with the generic skills, knowledge and personal attributes they will need in the future.


The Report proposes a single qualifications framework, based on diplomas set at four levels (Entry, Foundation, Intermediate and Advanced). Students are expected to progress at a pace appropriate to their attainment, rather than their age. Each diploma shares some common features. These require students to demonstrate evidence of:

• mathematical skills, communication and ICT skills

• successful completion of an extended project

• participation in activities based on personal interest, contribution to the community as active citizens, and experience of employment

• personal planning, review and making informed choices

• engagement in ‘main learning’ – the major part of the diploma – chosen by the student in order to open access to further opportunities (eg in employment or education).

These recommendations are exciting and very ambitious, but deeply problematic, unless there are radical changes to current assessment systems – notably in the large-scale adoption of e-assessment. We consider ways these recommendations might be met in Section 3.

2.6 SUMMARY OF SECTION 2

A number of ‘drivers’ are shaping both assessment and ICT; these need to be taken into account in any discussion of future developments. These drivers provide conflicting pressures. The drivers considered here include the increasing power and ubiquity of ICT, and the explosion of its usefulness and use in everyday life. These provide pressures for more relevant skills to be assessed, and also provide an assessment medium which is largely unexplored. Demands for lifelong learning, for people who can innovate and create new ideas, and the needs for informed citizenship are all pressures for education (and associated assessment systems) that rewards higher-order thinking, and personal development. Conversely, drivers such as the need to retrain and recertify staff, to ensure common standards across organisations in different countries, and to allow access to well-qualified migrants for jobs and education, emphasise assessments which transcend national boundaries and which are based on well-defined competencies (and where assessment design is sometimes based on perceived commercial imperatives). These drivers require different approaches to assessment, and all require new sorts of assessments and assessment systems to be developed.

In the UK, there are a number of problems with current assessment systems. First, they serve students very badly; second, they might soon collapse under their own weight. There is now the political will to develop pervasive, high quality e-assessment on a tight timeline, aligned with current and emerging educational goals. There is also an urgent need to invent and apply new sorts of e-assessment on a large scale.


3 CURRENT DEVELOPMENTS IN E-ASSESSMENT

The UK government has embarked on a very ambitious project to extend the use of e-assessment. The issue for education is not if e-assessment will play a major role, but when, what, and how. E-assessment can take a number of forms, including automating administrative procedures; digitising paper-based systems; and online testing – which extends from banal multiple choice tests to interactive assessments of problem-solving skills. In this section, we focus on current developments in e-assessment for summative purposes that can be used across the educational system. In Section 4 we address important but less well-defined targets for e-assessment.

Before we begin this section exploring different aspects of e-assessment, we should remember some of the virtues of paper-based tests, in order that we do not become so enamoured of new technologies that we lose sight of the benefits of current assessment systems. With paper:

• all stakeholders are familiar with all aspects of the medium

• paper is robust – it can be dropped, and it still functions

• there are rarely problems of legibility

• high resolution displays are readily available

• students can take questions in any order

• users can input cursive script, diagrams, graphs, tables

• a number of equity issues have been solved – it is easy to create large fonts and to solve other access problems

• paper-based testing systems are well established – it is relatively easy to prevent candidates from copying from each other, for example

• paper is easy to distribute, and can be used in most locations

• in extreme circumstances, it is possible to copy an examination paper, and find another desk

• human judgements are brought to bear throughout the process, so the scope of questions is unconstrained.

3.1 SOME MOTIVES FOR COMPUTER-BASED TESTING

A number of justifications have been put forward for computer-based testing, and are set out below. Not all justifications apply to every use of computers in assessment.

Avoiding meltdown: it may well be impossible to maintain existing paper-based assessment systems in the face of the current growth in the number of students being tested. Scanning technologies can help.

Valuable life skills: much of everyday life (including professional life) requires people to use computers. Not using computers for assessment seems perverse.

Alignment of curriculum and assessment: there is a danger of an emerging gap between classroom practices and the assessment system. It is very common for students (and almost all professionals) to use word processors when they write; in mathematics and science, the use of graphics calculators, spreadsheets, computer algebra systems (CAS) and modelling software is commonplace (and universal in professional practice). Assessment systems that do not allow access to these tools are requiring students to work in unfamiliar and maladaptive ways. Non-ICT-based assessment can be a drag on curriculum reform, rather than a useful driver (see Section 1.2).

On-demand testing: in many situations (for example, students engaged in part-time study; students taking courses designed to develop competencies; students on short courses) it is appropriate to test students whenever they are judged (or judge themselves) to be ready. City and Guilds tests provide an illustration; 75,000 online tests have been taken, and candidates book a test time that suits them. Saturday is the third most popular day for assessment (Ripley 2004).

Students progress at different rates: currently, the UK examination system acts as a force against differentiation in the curriculum. Summative end-of-year tests make it attractive to schools to teach year groups together and to enter them in a common set of examinations. On-demand testing would enable students to take tests such as GCSEs when they are ready, and to progress through different academic subjects at different rates. In the USA, the Advanced Placement system allows students to take university-level courses in school, be tested, and have success rewarded by college credits – so a student might enter the second year of a university course, for example. The Tomlinson Report (2004) argues for a more differentiated curriculum.

Adaptive testing: in some circumstances, the group to be tested is heterogeneous, as in the case of language testing and selection tests for employment. Systems of assessment that change the tasks taken in the light of progress so far can be useful in such circumstances. The principle is straightforward: candidates are presented with tasks of intermediate difficulty; if they are successful, the difficulty level increases; if they are unsuccessful, it decreases. This allows a more accurate estimate of the level of attainment. Adaptive tests can work well when there is a single scale of difficulty – for example in number skill, or vocabulary. They require careful development when a number of different factors affect performance (such as technical as well as problem-solving skills), and are unlikely to be useful where extended responses are required, because the adaptive system has too little to work on. Examples in the school system can be found in Victoria, Australia (AIM Online 2003), where adaptive tests of English and mathematics are used.
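To make the principle concrete, a minimal sketch of such a loop is given below. It is an illustration only, not a description of AIM Online or any other operational system: the item bank, the one-step difficulty adjustment and the crude attainment estimate are all assumptions made for the example (operational adaptive tests typically use item response theory to estimate ability).

```python
# Illustrative sketch of a simple adaptive test loop: item difficulty moves up
# after a correct answer and down after an incorrect one. The 1-10 difficulty
# scale and the attainment estimate are invented for this example.

from dataclasses import dataclass

@dataclass
class Item:
    prompt: str
    difficulty: int  # 1 (easiest) to 10 (hardest)

def run_adaptive_test(item_bank, ask, n_items=10, start_difficulty=5):
    """Present up to n_items, adjusting the target difficulty after each response.

    `ask` is a callback that presents an Item to the candidate and returns
    True for a correct response, False otherwise.
    """
    difficulty = start_difficulty
    unused = list(item_bank)
    answered = []                      # (item, correct) pairs
    for _ in range(min(n_items, len(unused))):
        # Choose the unused item whose difficulty is closest to the target.
        item = min(unused, key=lambda i: abs(i.difficulty - difficulty))
        unused.remove(item)
        correct = ask(item)
        answered.append((item, correct))
        # Step the target difficulty up or down, staying on the 1-10 scale.
        difficulty = min(10, difficulty + 1) if correct else max(1, difficulty - 1)
    # Crude attainment estimate: mean difficulty of the items answered correctly.
    solved = [item.difficulty for item, ok in answered if ok]
    return sum(solved) / len(solved) if solved else 0.0
```

In this sketch a candidate who keeps answering correctly is quickly moved towards the hardest items in the bank, which is the sense in which an adaptive test can give a more accurate estimate of attainment from fewer questions.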

Better immediate feedback: candidates can often be given information immediately about success, as is the case in the tests that all trainee teachers are required to take in English, mathematics and ICT (Teacher Training Agency 2003). (This is not necessarily an advantage, if this testing method encourages an ‘instrumental’ approach, where students learn in order to pass tests rather than to learn things. It could also force assessment design to focus on objective knowledge rather than the development of process skills, if immediate feedback became a requirement for all testing.) In principle, candidates could also be given diagnostic information about those aspects of performance most in need of improvement.

Motivational gains: there are claims (Richardson, Baird, Ridgway, Ripley, Shorrocks-Taylor and Swan 2002; Ripley 2004) that students prefer e-assessment to paper-based assessment, because the users feel more in control; interfaces are judged to be friendly; and because some tests use games and simulations, which resemble both learning environments and recreational activities.

Better exemplification for students and teachers: posting examples of work which meets certain standards can be beneficial. In South Australia, excellent student work in technology is displayed on the web (see www.ssabsa.sa.edu.au/tech/2004techsho/index.htm).

Better ‘system’ feedback: having full sets of response data from students available at the time of Examiners’ Reports can improve the quality of feedback. Details of questions, and parts of questions, that proved relatively difficult and easy should improve the quality of Examiners’ Reports (which are based currently on examiners’ experiences of a sample of scripts, and rarely on candidate success on questions and part-questions). This information will be useful both for improving the quality of questions and for providing information to teachers about topics that have not been learned well.

Faster information for higher education: universities need assessment results in a timely fashion. UK universities receive A-level results quite late in the academic year, and engage in a frenetic process to fill places with appropriately qualified applicants when students do and do not achieve the grades that were a condition of entry. These pressures would be eased if results were delivered earlier.

Better task design: it is easier for test constructors to change tasks on the basis of information gathered during testing and pre-testing, because of the immediacy of data collection. This can range from the rejection of items that do not function well (for example, items that students who score well overall are nevertheless likely to fail) to improved test design (for example, ensuring that there are a lot of items set around critical cut-off points – especially the pass/fail boundary – so that the test is most reliable there).
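The item-functioning check mentioned above can be approximated with a classical discrimination index. The sketch below is illustrative only: it assumes a simple matrix of 0/1 item scores, compares the top and bottom thirds of candidates by total score, and flags items where high scorers do little or no better than low scorers; the threshold is an arbitrary choice for the example.

```python
# Illustrative sketch: flag items with poor discrimination, i.e. items that
# candidates with high total scores answer no better (or worse) than
# candidates with low total scores. Input is assumed to be 0/1 item scores.

def item_discrimination(responses):
    """responses: list of per-candidate lists of 0/1 item scores (non-empty)."""
    n_items = len(responses[0])
    totals = [sum(r) for r in responses]
    ranked = sorted(range(len(responses)), key=lambda c: totals[c])
    third = max(1, len(ranked) // 3)
    low, high = ranked[:third], ranked[-third:]
    stats = []
    for i in range(n_items):
        p_high = sum(responses[c][i] for c in high) / len(high)
        p_low = sum(responses[c][i] for c in low) / len(low)
        stats.append(p_high - p_low)   # classical discrimination index
    return stats

def flag_poor_items(responses, threshold=0.1):
    """Return indices of items whose discrimination falls below the threshold."""
    return [i for i, d in enumerate(item_discrimination(responses)) if d < threshold]
```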

Cost: it is common to claim that e-assessment can save money – it is clear that online multiple choice tests can be cheap to administer and score. However, if we are to exploit the potential of ICT to improve assessment – for example by presenting simulations or video as an integral part of a test – then the costs of testing are likely to increase.

3.2 USES OF E-ASSESSMENT TO SUPPORT CURRENT EDUCATIONAL GOALS

3.2.1 Using ICT to support Multiple Choice Tests

This is a well-established technology, particularly well suited to assessing declarative knowledge (‘knowing that’) in well-defined domains. Developing tasks to identify student misconceptions is also possible. It is harder to assess procedural knowledge (‘knowing how’). MCT are unsuited to eliciting student explanations, or other open responses. MCT have the great advantage that they can be very cheap to create and use. Some of this cheapness is illusory, because the costs of designing good items can be high. Over-use of MCT can be very expensive if it leads to a distortion of the curriculum in favour of atomised declarative knowledge, divorced from conceptual structures that students can use to work on the world effectively. MCT are used extensively in the USA for high-stakes assessment, and are increasingly presented via the web. For example, web-based high-stakes state tests are available in Dakota and Georgia; the Graduate Record Examination (GRE), used to determine access to graduate school at many US colleges, is available online.

3.2.2 Creating more authentic paper and pencil tests

It makes sense to allow students access to the tools they use in class, such as word processors, and that professionals use at work, such as graphing tools and modelling packages, during testing. It makes no sense at all to always forbid students to use ‘tools of the trade’ when being assessed. E-learning changes the nature of the skills required. E-assessment allows examiners to focus more on conceptual understanding of what needs to be done to solve problems, and less on telling students what to do, then assessing them on their competence in using the manual techniques required to get the answer. In Australia, the State of Victoria (www.vcaa.vic.edu.au/prep10) has a system for essay marking where students key in their responses to questions, which are then distributed electronically and marked by human markers. Computer Algebra Systems (CAS) can be used in the Baccalauréat Général Mathématiques examination in France; the International Baccalaureate Organisation (IBO) is running a CAS pilot for its Higher Level Mathematics Diploma from September 2004. In the USA, CAS can be used when taking the College Board’s Advanced Placement Calculus test.

3.2.3 Using ICT to support current UK examination processes

A number of ways in which ICT can improve current examination practices are set out below.

Better school-examination board communication: Tomlinson (2002) points to existing extensive use of ICT by awarding bodies in the examination process, and argues for more use of Electronic Data Interchange (EDI) systems, which enable schools and colleges to submit examination entries and information about candidates online and to receive results automatically.

Supporting the current marking and moderation process: a challenge faced by large-scale tests that require human markers is to ensure the comparability of standards across markers, and over time for all markers during the grading process. Chief examiners create scoring rubrics to guide other markers, and there is usually a process of standardisation where markers use the scoring rubrics to score a sample of scripts, and attend a standardising meeting where standards are compared, discrepancies are discussed, and the rubric is tuned. Once markers have reached an appropriate level of marking accuracy, they mark examinations independently. Systems vary in terms of the extent of the moderation used. In some systems, scripts are sampled by chief examiners, and serious deviation from the rubric can lead to the remarking of all the scripts sent to a particular examiner. ICT can be used to support this process. Sample scripts typical of different categories of student work can be put online, for easy reference by markers. Entry of marks can be done via templates that ensure that markers complete every section, and the tedious process of aggregating marks from different parts of the script is done automatically and without error. Data is collected in a way that facilitates rapid and detailed analysis, at the level of responses to different parts of questions, whole questions, and the distribution of test scores.
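A sketch of the kind of template-based mark entry and automatic aggregation described above is given below. The mark scheme, field names and rules are invented for the illustration; a real system would load the awarding body's own scheme.

```python
# Illustrative sketch: validate that a marker has entered a mark for every
# part-question, and aggregate part marks into question and script totals.
# The mark scheme structure below is invented for this example.

MARK_SCHEME = {            # question -> {part: maximum mark}
    "Q1": {"a": 2, "b": 3},
    "Q2": {"a": 4, "b": 2, "c": 3},
}

def validate_and_aggregate(entered):
    """entered: {question: {part: mark}} as keyed in by the marker."""
    errors, question_totals = [], {}
    for q, parts in MARK_SCHEME.items():
        total = 0
        for part, maximum in parts.items():
            mark = entered.get(q, {}).get(part)
            if mark is None:
                errors.append(f"{q}({part}): no mark entered")
            elif not 0 <= mark <= maximum:
                errors.append(f"{q}({part}): {mark} outside 0-{maximum}")
            else:
                total += mark
        question_totals[q] = total
    return errors, question_totals, sum(question_totals.values())

# Example: the marker has forgotten Q2(c), so the template refuses to submit.
errors, per_question, script_total = validate_and_aggregate(
    {"Q1": {"a": 2, "b": 1}, "Q2": {"a": 3, "b": 2}})
print(errors, per_question, script_total)
```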

Replacing paper: in the USA (and increasingly in the UK), there is widespread use of systems where students take paper-based examinations, and the scripts are scanned electronically (this is analogous to Optical Mark Recognition for multiple choice tests, which has been available for many years). Once in this format, the documents can be sent electronically to markers, who can be working almost anywhere. These systems have a number of advantages over paper-based systems. First, there are considerable problems in tracking the distribution and return of large volumes of paper to and from markers; there are security issues in sending examination papers by post, and scripts can get lost. Second, moderation of the quality of scoring can be done easily. Pre-scored ‘anchor’ papers can be sent to markers during the course of their marking, to ensure they are maintaining standards; markers who do not perform adequately can be told to take a break, or can be removed from the pool of markers. The whole process can be monitored in terms of the rate at which scripts are being marked. There is flexibility in the ways that scoring is done. Markers can be asked to score whole scripts, or individual questions. So a newly appointed marker might be sent questions judged to be easy to mark, and more experienced markers might be sent questions which require deeper subject knowledge. The reliability of scoring can be increased. Scripts judged to be around key borderlines on first marking can be sent to other markers; scripts judged to be well away from boundaries need be scored only once. Online support can be provided; markers can ask for help with specific student responses. Data is captured in a form suitable for a number of subsequent analyses.
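The anchor-script check might be implemented along the following lines. This is a sketch under assumed rules, not a description of any live marking system: the tolerance, the use of mean absolute deviation and the suggested action are all invented for the example.

```python
# Illustrative sketch: compare a marker's scores on pre-scored 'anchor'
# scripts with the official scores, and pause markers whose mean absolute
# deviation exceeds a tolerance. Both the tolerance and the action taken
# are assumptions made for this example.

def check_marker(anchor_official, anchor_awarded, tolerance=1.0):
    """Both arguments are lists of marks for the same anchor scripts, in order."""
    deviations = [abs(a - o) for a, o in zip(anchor_awarded, anchor_official)]
    mean_dev = sum(deviations) / len(deviations)
    return {
        "mean_deviation": round(mean_dev, 2),
        "action": ("pause marking and re-standardise" if mean_dev > tolerance
                   else "continue marking"),
    }

# Example: official anchor marks vs one marker's awarded marks.
print(check_marker([12, 7, 15, 9], [13, 7, 12, 9]))
```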

An interesting variant of this approach, which obviates the need for scanning, would be to require candidates to use ‘intelligent pens’. These pens have two distinct functions. The first is to write like a conventional pen. The second is to record their movements (exactly) on the page. This is done by using specially prepared stationery. Imagine you could see a small square area of a banknote. The pattern across the whole surface is never repeated, so that, given sufficient time, you could find exactly where the square is located on the note. The pen works in a similar way, recording its position on the page over the course of the examination. The pen is then connected to a computer, and all the data is downloaded. The whole student response can then be reconstructed. Clearly, this approach would have to be subjected to extensive trialling before any widespread adoption.

3.2.4 Online assessment: turning a GCSE paper into ‘computer-only’ e-assessment

An interesting challenge is to devise ways to replace paper-based tests with ICT-based tests, and to score them automatically. Some virtues of paper-based tests are unlikely to be replicated for a number of reasons, so setting tests on-screen is likely to bring about changes in the nature of what is assessed. Here, we consider one specimen GCSE mathematics paper to illustrate the problems.

Measuring and drawing: about 10% of the marks in the paper-based assessment required the use of actual ‘instruments’ (ruler, protractor, compasses). One approach for translation onto screen would be to simulate the physical instruments, eg to provide a virtual protractor that can be dragged around the screen and rotated. Another is to provide CAD or interactive geometry packages. The latter would require a substantial change to the syllabus, but could provide real benefits in terms of student learning.

Mathematical expressions: about 20% of the marks required the student to write down answers that could not be keyed in using a standard keyboard. These included fractions, division expressions, and powers.

Rough work and partial credit: almost every question in the paper format included space for rough work, and about 30% of the total marks could potentially be awarded on the basis of this work, in the form of partial credit awarded where the final answer is incorrect (these marks are usually awarded in full if the final answer is correct). There are two distinct problems in translating this to a digital format – first, capturing the rough work, and second, allocating partial credit. Computer capture is very difficult, given current interfaces; the rules for allocating partial credit would have to be specified in very fine detail for them to be used as part of an automatic scoring routine.
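To illustrate why the partial-credit rules would need such fine specification, the sketch below encodes a toy rule set for a single invented question ('calculate the area of a 7 cm by 4 cm rectangle', worth two marks). Every detail – the recognised forms of the working, the award of full marks for a correct final answer – has to be made explicit before a machine can apply it; the question and rules here are not taken from any real mark scheme.

```python
# Illustrative sketch: rule-based partial credit for one invented question.
# Full marks for a correct final answer; otherwise one method mark if the
# working shows the correct product in a recognised form.

def score_rectangle_question(working: str, final_answer: str) -> int:
    """Return 0-2 marks for the invented 7 cm x 4 cm rectangle question."""
    answer = final_answer.replace(" ", "").lower()
    if answer in ("28", "28cm2", "28cm^2"):
        return 2  # correct answer earns full marks regardless of working shown
    w = working.replace(" ", "").lower()
    if any(p in w for p in ("7*4", "4*7", "7x4", "4x7")):
        return 1  # partial credit: correct method with a wrong final answer
    return 0

print(score_rectangle_question("area = 7 x 4 = 27", "27"))  # -> 1
print(score_rectangle_question("", "28"))                   # -> 2
```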

3.2.5 Scoring of open responses

GCSE questions often require students to answer questions in their own way, and to explain things – scoring these responses automatically is inherently difficult. Automated scoring of open student responses is the focus of a good deal of ongoing work, and a number of approaches have been taken to the problem. One is based on the analysis of the surface features of the response (Cohen, Ben-Simon and Hovav 2003), such as the number of characters entered, the number of sentences, sentence length, the number of low-frequency words used, and the like. The success of such methods can be judged by comparing the correlation between computer and human judges with the correlation between scores given by two sets of human judges. Cohen, Ben-Simon and Hovav (2003) looked at the scoring of a range of essay types by humans and computer, and report that the correlation between the number of characters keyed by the student and the scores given by human judges is as high as the correlation between the scores given by two sets of human judges. Nevertheless, these scoring systems do not provide a panacea. In the USA, double marking is used to ensure reliability (this is rarely done in the UK). ICT can be used to moderate human markers (and save money) – if the computer and the human disagree, the paper is re-marked by a human. Machine-only scoring is unlikely to be useful in UK contexts, for two reasons. First, the UK culture requires that scoring schemes be described in ways that are useful to teachers and students. Second, the consequential validity of such scoring systems would be dire – the advice to students would be to improve their scores simply by using more keystrokes. A second approach, which could improve the quality of scoring and reduce costs, is being used to assess student responses on tasks in contexts where the range of acceptable responses can be well defined, such as short-answer science tasks (eg Sukkarieh, Pulman and Raikes 2003). Here, appropriate (‘the Earth rotates around the sun’) and inappropriate (‘the sun rotates around the Earth’) responses are defined. Lists of synonyms are generated for nouns (‘our globe’) and verbs (‘circles’), and alternative grammatical forms are defined, based on analyses of large numbers of student responses. Student responses are parsed using techniques borrowed from Natural Language Processing, and are compared with stored appropriate and inappropriate responses, using a variety of Information Extraction techniques (see Cowie and Lehnert 1996). Mitchell, Aldridge, Williamson and Broomhead (2003) describe work at the Dundee Medical School, where all students take the same examination at the end of every year. Academics are presented with all the responses to the same question, with the computer’s judgement on the correctness or otherwise of each answer, and an estimate of the confidence of that judgement. Human scoring time is dramatically reduced, and staff report positive benefits in terms of the quality of the questions they ask, both in rewriting ambiguous questions (which produce student responses that are difficult to score) and in writing questions which highlight student misconceptions. This approach requires a good deal of work prior to live testing, so is well suited to situations where tasks will be used repeatedly.
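A heavily simplified sketch of the synonym-matching idea is shown below. It is not the system described by Sukkarieh, Pulman and Raikes (2003), which parses responses and applies information extraction; the synonym lists, the word-order heuristic and the fall-back to a human marker are all assumptions made for the illustration.

```python
# Illustrative sketch: match a short science answer against synonym lists for
# the agent, the action and the object, and use word order as a crude proxy
# for who orbits whom. The lists and rules are invented for this example.

SYNONYMS = {
    "earth": {"earth", "our globe", "the planet"},
    "sun": {"sun"},
    "orbits": {"rotates around", "orbits", "circles", "goes around",
               "revolves around"},
}

def contains_any(text, phrases):
    return any(p in text for p in phrases)

def score_orbit_answer(response: str) -> str:
    text = response.lower()
    has_earth = contains_any(text, SYNONYMS["earth"])
    has_sun = contains_any(text, SYNONYMS["sun"])
    has_verb = contains_any(text, SYNONYMS["orbits"])
    if has_earth and has_sun and has_verb:
        earth_pos = min(text.find(p) for p in SYNONYMS["earth"] if p in text)
        sun_pos = min(text.find(p) for p in SYNONYMS["sun"] if p in text)
        # Appropriate if the Earth (the thing that moves) is mentioned first.
        return "appropriate" if earth_pos < sun_pos else "inappropriate"
    return "refer to human marker"

print(score_orbit_answer("Our globe circles the sun"))      # appropriate
print(score_orbit_answer("The sun goes around the earth"))  # inappropriate
```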

In the USA, the Graduate Management Admission Test (GMAT) - used to determine access to business schools - uses automated scoring of text. Here again, the test is scored by both human and machine, to offer some sort of reliability check for the human marker.

3.3 ICT SUPPORT FOR CURRENT ‘NEW’ EDUCATIONAL GOALS

There is an emerging consensus worldwide on ‘new’ educational goals, focused on problem solving using mathematics and science, supported by an increased use of information technology (compare, for example, UK developments with those in New Zealand, www.minedu.govt.nz; and Singapore, www1.moe.edu.sg/iteducation). These new goals involve the development of higher-order thinking, and a range of social skills such as communication and working in groups. There is an honourable tradition of assessing problem solving via the use of extended tasks, such as those developed by the APU (eg Archenhold, Bell, Donnelly, Johnson and Welford 1988). However, the computer offers some unique features in terms of representation, interaction, and its support for modelling. Here, we describe some recent developments which make use of these unique features.

3.3.1 The development of World Class Tests

Tests were designed to identify high-attaining students in problem solving in mathematics, science and technology at ages 9 and 13 years, as part of the work on the World Class Arena (www.worldclassarena.org). Computers make it easy to present new sorts of tasks, for example tasks where dynamic displays show changes in several variables over time, or which present video of a situation which students must model. A wide variety of representations can be supported, and students can be asked to switch between them. The interactive properties of computers make them well suited to the assessment of process skills.

Using computers to give students control over how data is presented allows them to work with complex data sets of a sort that would be very difficult to work with on paper. Tasks can be set in realistic contexts, using realistic data to address problems of considerable complexity, using resources and methods that are familiar to professionals working in the relevant field. Two examples are presented here: Oxygen and Bean Lab.

Further examples of tasks can be found in Ridgway and McCusker (2003). Skills assessed include:

Understanding and representing problems: traditional educational goals, such as the ability to interpret tables and graphs, and to translate information coded in one representation into information coded in another representation, continue to be vital skills for mathematical and scientific literacy. Computers allow fast and reversible transformations of information from one representation to another, and students can be asked to explain the relationships between them.

Assessing process skills in science and mathematics: the desire to assess process skills is not new. Traditionally, students would be presented with tasks in laboratories, or would be required to keep logs and portfolios of their laboratory work. However, the laboratory setting can introduce elements which reduce the reliability of the assessment, such as instruments which fail to function properly, or materials whose properties are less than ideal. Students are required to physically manipulate apparatus – chance differences between students in terms of their previous exposure to particular equipment can both reduce reliability and add an extra cognitive load to the intellectual task being performed. In some situations, issues of health and safety arise. Some education systems are unwilling to accept teacher ratings of students for the purposes of high-stakes testing, with the result that process skills in science are not assessed at all. Computer-based assessment permits the assessment of these valuable aspects of learning science, at modest cost. A range of different process skills can be identified, which include:

• working systematically (for example, choosing tests systematically, controlling variables and recording results systematically)

• generating and testing hypotheses

• finding rules and relationships

• handling complex data

• testing solutions

• seeking completeness and rigour (in many real-world situations, exemplified by diagnosis and remediation in spheres such as medicine and industrial process control, it is important to find all of the faults in a system).

Five sets of live tests have been administered in the UK and elsewhere, each of which was preceded by extensive pre-testing. A notable result was the ease with which students interacted with computers. The affective response from students was very strong – they really enjoy working on these tasks. This might be related to the sustained challenge the tasks present, which is similar to the reported reasons why they like computer-based games (Kirriemuir and McFarlane 2004).

Students performed better on some tasks than one might expect – notably tasks that require them to reason from complex data sets (eg data with two independent variables and one dependent variable at age 9 years). We take this as a very positive sign that computers can play a leading role in the development of the skills which constitute the new educational agenda. In many respects, student performance was poor – work characterised by guessing, too little use of systematic methods, poor hypothesis generation, and poor generalisation. On many tasks, students were able to show evidence of good reasoning skills; however, explanations were often weak. Given the earlier discussion of the impact of assessment on the curriculum, it is to be hoped that the use of e-assessment of process skills will lead to better student performance on a range of important activities.

World Class Tests focused on summative assessment in science, mathematics and technology, and used a variety of contexts, including geography and economics, as well as biology, physics, and engineering. The ideas are generic, and can be applied to many curriculum areas. On the basis of analyses of student performance on WCT, teaching modules for whole-class use have been developed, targeted on weak process skills. These teaching modules provide a good deal of formative assessment, and require students to engage in reflective activities such as critiquing student work and explaining their own solution strategies.

We discuss ‘new’ educational goals that are less amenable to summative assessment – such as the ability to work in groups, to communicate, to learn to learn – in Section 4.

3.3.2 Assessing ICT at Key Stage 3

Ongoing work funded by QCA sets out to assess student attainment in ICT at age 13 years. A key principle for the design of these tests is that students should be tested on their performance on extended tasks (‘create a web page about topic X for audience Y, using a particular set of resources - a database, ‘clients’ accessible via e-mail, spreadsheets for planning, web page creation tools’), not on a series of sub-tasks (‘use a spreadsheet to add up these numbers’). An extraordinarily ambitious goal is to present tasks and score performance entirely by computer. This is a laudable aim, and shows a government commitment to high quality e-assessment (including £20m for the project).

3.3.3 Digital portfolios

An historical legacy which bedevils the current education system in the UK is the distinction between ‘academic’ and ‘practical’ subjects. This was enshrined in the 1944 Education Act, which created grammar, technical and secondary modern schools (Tattersall 2003). Abstract thinking is important; appropriate action in context that rests on practical competence is important. Neither is much use on its own, and students should be taught both to abstract and to apply. For this to become a classroom reality, assessment systems must require students to show the full spectrum of competencies in a number of school subjects. If high-stakes assessment systems fail to reward such behaviours, they are unlikely to be the focus of much work in school. E-portfolios offer a way forward.

There are three distinct uses for portfolios. The first is to provide a repository for student work; the second is to provide a stimulus for reflective activity – which might involve reflection by the student, and critical and creative input from peers and tutors; the third is as a showcase, which might be selected by the student to represent their ‘best work’ (as in an artist’s portfolio) or to show that the student has satisfied some externally defined criteria, as in some teacher accreditation systems (eg Schulman 1998). These uses are not mutually exclusive. Students may well wish to archive all their work; reflective activities and feedback from others will be based on a subset of this work; the final ‘presentation portfolio’ will be selected from this corpus.

These different uses of portfolios reflect different, but not always incompatible, theories of learning. A behaviourist approach will focus on defining ‘core competencies’ that are impossible to assess in timed examinations, and on the need for fast and efficient feedback on student products. A social constructivist view will focus on the importance of reflection and sense making by a group (including the tutor), which will include the negotiation of educational goals.

ICT provides an opportunity to introduce manageable, high quality coursework as part of the summative assessment process. Student portfolios have been advocated for a long time, and have been used on a limited basis. From the viewpoint of assessment, the rationale for portfolios is clear: there are a number of valuable activities and attainments that cannot be assessed using the format of timed tests. The abilities to create, design, reflect, modify and persevere are all important goals of education. It is entirely appropriate to assess these processes by collecting evidence on the ability to engage in an extended piece of work, and to bring it to a successful conclusion by the creation of some product – lab report, video, installation etc. Part of the portfolio can (and should) provide evidence of the range of personal skills demonstrated, perhaps under the headings suggested in the Tomlinson Report (2004): students’ self-awareness – of themselves, the ways they learn, and what they know; how students appear to, and interact with, others; and thinking about possible futures and making informed decisions. A section of the portfolio in the form of a viva, or simply annotations of products, where students show their attainments in these three aspects of performance, is appropriate.

A number of problems are associated with portfolios and other sorts of coursework. One is the problem of storage – especially in design projects and in art. ICT can solve this problem by holding images of the artefacts created. A second problem is student misbehaviour, which can take a number of forms. One is simply that work is plagiarised; another is that students create some artefact, then ‘back-fill’ by inventing the development process (which is often assessed as part of the final mark) post hoc. ICT can help with both of these problems by requiring the submission of images of intermediate products, with time stamps. On a more positive note, the ability to store and work with images (photographs, video) is likely to make teaching of the design process more effective. Devices such as mobile phones with built-in cameras and facilities for audio recording make it easy to document the evolution of ideas and artefacts. This facility serves a number of functions. First, it simplifies the documentation of the development of work – reducing the ‘busy work’ students might otherwise have had to engage in. The process of documentation via a portfolio of work supports student reflection on processes – on decisions made deliberately, those forced by circumstances, and those that just sort of happened. Digital images are easy to manipulate and present. Student presentations of work on the development of artefacts are easy, once images are captured digitally.

In some subjects, such as design and technology, and art, extended projects are at the heart of the discipline. The use of e-portfolios maps directly onto current conceptions of the domain, and offers practical solutions to some common problems (eg Kimbell 2003). This work is important, and is likely to be applicable on a large scale in the near future. A very large number of institutions have made use of portfolio systems; the American Association for Higher Education (AAHE) Portfolio Clearinghouse (www.aahe.org/teaching/portfolio_db.htm) provides an online searchable database of profiles of electronic portfolio projects and resources in higher education, and is a valuable source of ideas.

3.4 SUMMARY OF SECTION 3

There are a number of exciting developments in the use of e-assessment for both summative and formative purposes, and several UK developments are at the leading edge worldwide. In the UK, the government has decided that extensive use will be made of e-assessment. Some of these developments are a response to current problems associated with increases in the volume of assessment; some reflect a desire to improve the technical quality of assessment (such as increased scoring reliability), and to make the assessment process more convenient and more useful to users (by the introduction of on-demand testing, and fast reporting of results, for example). E-assessment also makes it possible to assess aspects of performance that have been seen as desirable for a long time – such as the assessment of process skills, and the efficient handling of student portfolios. Using e-assessment to test student ICT capability represents an extremely ambitious goal of presenting holistic tasks to assess performance, rather than a collection of short tasks which are symptoms, rather than exemplars, of ICT capability. Nevertheless, some major challenges face these new developments. Paper tests have a number of advantages in terms of the quality of the image presented, and the variety of ways in which students can respond; automatic scoring of responses will be very difficult, and in some cases impossible to achieve via computer.

A complete reliance on paper-based assessment has a number of drawbacks. First, such assessments are increasingly ‘inauthentic’ as classroom and professional practices embrace ICT. Second, such assessments constrain progress, and have a negative effect on students who have to learn (just for the exam) how to do things on paper that are done far more effectively with ICT. A third major constraint is that current innovative suggestions for curriculum reform, which rely on student portfolios for their implementation, will be impossible to manage on a large scale without extensive use of ICT.

E-assessment is a stimulus for rethinking the whole curriculum, as well as all current assessment systems. E-assessment provides a cost-effective way to integrate high quality portfolio assessment with externally set and marked tests, in any combination. This makes it likely that there will be significant changes in the structure of summative assessments, because of the range of student attainments that can now be assessed reliably. There is likely to be extensive use of teacher assessment of those aspects of performance best judged by humans (including extended pieces of work assembled into portfolios), and more extensive use made of on-demand tests of those aspects of performance which can be done easily by computer, or which are done best by computer.

4 OPPORTUNITIES AND CHALLENGES FOR E-ASSESSMENT

Here, we consider some issues which need to be addressed as a matter of urgency. First are some speculations on how we might assess process skills - essential but often ill-defined educational goals. It will be important to establish the value of such assessments as part of large-scale summative assessment, in contrast to their roles as potentially useful components of formative assessment. It will also be important to establish the appropriate scale of such assessments, and their locus in the curriculum, in terms of educational gains and manageability. Second, we consider the problems of ‘going to scale’. Large-scale innovation – especially where computers are involved – does not always run smoothly.

4.1 ASSESSING PROCESS SKILLS

4.1.1 Assessing metacognition

As we move towards a knowledge-based society, the development of metacognitive skills increases in importance, and they become educational goals in themselves. Currently, these goals are ill-defined, in that there is not yet a consensus in the educational community about their exact nature or how they can be assessed. Goals can be described, and recognised when they are achieved, but exemplification needs further work, and a general sharing of ideas. Ridgway, Swan and Burkhardt (2001) exemplify this process as part of ‘Assessing Mathematical Thinking’ in materials developed for the US National Institute for Science Education (www.wcer.wisc.edu/nise/cl1).

Here, examples of metacognition are given under four headings: knowing how to use knowledge; analysing and improving cognitive processes; supporting reflection and critical skills; and assessing competence with different thinking styles.

Knowing how to use knowledge: the web offers great opportunities and pitfalls for assessment. Most obviously, the existence of the web means that successful use of it should be an educational target. Expertise in navigation – such as knowing how to bookmark useful sources and how to refine searches – is useful, but is subsidiary to a set of meta-knowledge skills about the nature of knowledge – how it is constructed, presented, and used by different people for different purposes. There is a need for students to develop sophisticated theories-in-action about knowledge. These theories should include accounts of the nature of knowledge – its generation, and the various functions it serves (including its use as just another rhetorical device!). Students also need to know about their own knowing – what they do and do not know, how they acquire, lose and change their own knowledge – and how they control their cognitive processes when solving problems.

We address the first goal elsewhere, in the discussion on assessing competence in ICT. The latter goal is illustrated by Lord Armstrong’s remark “power is knowing how to use knowledge”. The common corruption to “knowledge is power” misses Armstrong’s point almost entirely. Our educational ambitions should be to encourage students to become sophisticated users and creators of knowledge. Good formative assessment should contribute to students’ development; web-based sources can be part of both formative and summative assessment of these key elements of student performance.

Key aspects of performance relate to the exploration of the origins of the source, analysis of its qualities as a source, and its relation to a wider set of information. Successful formative assessment helps students to internalise questions and question styles. For summative assessment, we expect students to ask questions about the nature of the information source. The originator can be important – dietary advice from Kellogg’s should be treated more cautiously than advice from the British Medical Association. Who created it? For what purpose? From what perspective was this written? The poor quality of much of the information on the web can be a virtue, pedagogically, because students see the sense in challenging the authority of any source, and can do so easily by considering alternative sources (eg Downes and Zammit 2000).

Skills in analysing documents in terms of their style and their use of particular rhetorical devices, and in creating documents for different audiences and in different writing genres, are being developed and used in English (and in sociology and philosophy at university level). Again, the ubiquitous use of web sources provides both a rationale for the value of these analytic and creative activities, and a rich supply of resources for assessment purposes.

The web makes it easy to compare and contrast different interpretations of ‘the same’ events by different ‘news’ providers, and by the same provider over time. In terms of assessment, students can be asked to compare and contrast different presentations, and to describe the evolution of a news event over time. This requires analysis of the way that evidence is selected, and of the ways that ‘events’ are reconstructed over time.

A further key aspect of knowledge use is the ability to relate a particular source to a larger body of knowledge. It will always be important for learners to develop rich schemas of knowledge – facts, skills, and procedures and their interconnections – as the basis for judging the value or otherwise of putative new information, or of a theoretical account. In science, a simple example is a digital image of a mammal with horns and claws. Students would be expected to say it is most unlikely, because horns are associated with herbivores, and claws with carnivores. At a higher level of abstraction, students might be asked to resolve famous conflicts in scientific ideas, in terms of what was known at the time. For example, Lord Kelvin – probably the most distinguished scientist of his day – argued against the theory of evolution, on the grounds that the timescale was impossible. The core of the Earth is largely molten, but if the Earth were really the millions of years old needed for evolutionary processes to work, it would have cooled down long ago. What didn’t he know (or is his criticism valid)? The web is a source of information that challenges current knowledge – students can be asked to relate ‘breaking’ research to a wider set of knowledge. The recent scare over the MMR vaccine (and the damage that will be done to children by an under-analysed and over-publicised piece of research) provides an example.

A vivid example of summative evaluation which requires both a deep knowledge schema and powerful skills in knowledge deconstruction and reconstruction is provided by a final undergraduate examination on the art history course at Goldsmiths, University of London, where students are presented with two pictures, side by side, which they are to compare and contrast. They are required to name the artist, deconstruct the iconography, and interpret each work in its historical context. This could be presented via ICT, and could be extended to film, and to other contexts.

Another approach to supporting reflection about knowledge acquisition and creation is to incorporate assignments that require a reflective account of the process of creating some artefact (an object or a piece of writing). Students can be asked process questions about sources of information – ways to find good sources (perhaps in the form of ‘advice to someone with a similar job to do’), and about the sources themselves. They can be asked about problems faced, and the ways they were solved, in these ‘meta-learning’ essays.

‘Open-web’ examinations offer a parallel to open-book examinations. One virtue of such examinations is that they are more ‘authentic’ than conventional examinations, in that, outside educational contexts, one rarely has to answer a substantive question without any resources. They allow the examiner to set a broader range of questions, because students are not expected to retain all the relevant information in memory. An adaptive strategy for success in such examinations is to develop meta-knowledge of the whole area, and to index sources very carefully. A large information bank with no index is of little use. Compare the preparation necessary for this sort of examination with the ‘cramming’ strategy that can be effective when preparing for conventional examinations. There, the danger is that students hold information in a relatively temporary state for the purpose of the examination, then forget the information once the examination is over. Open-web examinations are likely to have desirable ‘consequential validity’ – that is to say, they are likely to lead to desirable learning (and learning strategies). The unpopularity of open-book examinations (which probably arises because they require serious thought about the subject matter) is likely to apply equally to open-web examinations. The potential for fraudulent behaviour by students (such as e-mailing for advice in situations where the purpose of testing is to assess the ability to search the web, or searching the web when the purpose of the assessment is to assess ‘networking’ skills) means that student activities will need to be constrained in appropriate ways. Nevertheless, open-web assessment should be explored further.

Analysing and improving cognitive processes: interactive whiteboards can provide the facility to work as a whole class on a problem or simulation, then to replay and critique the sequence of actions. This provides the opportunity to discuss seemingly abstract concepts such as ‘strategy’ and exemplify them with concrete examples. Analogies with the analysis of games (eg tennis) can make the activity seem natural in class (of course, analysis of on-screen video of ongoing games is a specific example of the sorts of analyses being described here). The long-term intention is to help students develop metacognitive skills that will be applicable in a wide variety of situations. By looking at different solution attempts, students can be asked high-level questions such as ‘how do you solve problems of this sort?’ – which can be assessed more formally by tasks such as ‘write some guidance for someone else that will help them to solve problems like this one’. A requirement for summative e-portfolios could be that sample reflective analyses of processes be included.

These techniques have great potential when the focus is on the social and emotional education of students. Topics raised in personal and social education, such as approaches to bullying, can be approached by presenting students with video vignettes, and asking them to describe situations, the interactions that take place, and the feelings of participants. Parallel information channels (provided by the participants) can provide students with feedback on the correctness or otherwise of their insights. At a lower level, assessing children’s ability to identify the emotions being expressed in different faces can give insights into their developmental state (or, in more extreme cases, into pathological states such as autism). If summative information is appropriate, it can be based on the analysis of such vignettes.

Supporting reflection and critical skills: an important higher-order skill is the ability to review and improve work. This can be done via paper and pencil (for example by writing on every third line, and changing pen colour at every revision cycle), but is made very easy by the use of ICT, with facilities such as ‘track changes’ in MS Word. Students can be asked to provide examples of their ability to improve work on the basis of others’ and their own suggestions, and of their ability to critique the work of others. Another way to assess critical thinking is to require students to annotate work to show where it meets the assessment criteria.

Courtenay (personal communication, 2004) described an activity designed to support creative writing in English in a night class comprised of 30 non-native speakers at an early stage of learning English. Courtenay focuses on creation and critique, and seeks to spend as much time as possible interacting with his students. Each student writes online, and when they are satisfied with their composition, it is posted to a shared server. Every student is required to offer constructive comments on five compositions, and to revise their own writing in the light of five sets of comments. The teacher is able to tour and coach individuals as they write. With little effort, this approach could be extended to providing summative assessment. Students could be required to submit their comments on others’ writing to be evaluated, and could provide evidence of their ability to use comments on their own work. An assessment system like this would reinforce rather than distort the educational ambitions of the teacher.

Peer assessment is attractive for a number of reasons. (Topping’s 1998 review demonstrated that, in higher education, it is associated with gains on conventional performance measures.) Students can be asked to create far more pieces of work than could be marked by a single tutor. It can avoid the problem that, as a class size gets bigger, the load on the tutor increases directly, along with the time taken to provide feedback to students. Students must understand the criteria for assessment, and must acquire a range of higher-order skills – such as abstracting ideas, detecting errors and misconceptions, critiquing and suggesting improvements – if they are to engage in peer assessment. Peer assessment is a fact of life outside education, so it is far more ‘authentic’ than some forms of assessment such as multiple choice tests. Possible disadvantages relate to the possibility of an enhanced workload on students, unreliable feedback, and biased feedback.

A number of commercially available systems have been designed to support peer assessment. Calibrated Peer Review™ (Chapman and Fiore 2001) was designed to support the peer assessment of essays in molecular science, but has been applied in a variety of subjects, and with students across the education system. Students write short essays, and are asked questions designed to foster their critical thinking. Students are presented with three ‘calibration’ essays to grade, and must demonstrate their competence before they progress. Two of the essays contain errors and misconceptions which students must identify and correct. Students are also asked questions on style and grammar. The scores they give to the assignments are compared with ‘official’ scores, and a calibration report is created for the student and the tutor. If performance is inadequate, more instruction is provided, and the student must repeat the activity. Once they have shown that they can assess essays effectively and reliably, they are asked to grade three essays by peers, and finally are asked to grade their own essay. The student and the instructor receive comments and scores.
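The calibration step might work along the lines sketched below. This is not the actual Calibrated Peer Review algorithm: the grade scale, the tolerance and the pass rule are invented for the illustration.

```python
# Illustrative sketch of a calibration check in the spirit of Calibrated Peer
# Review: a student's grades for the calibration essays are compared with the
# 'official' grades, and the student only progresses to grading peers if the
# deviations are small. The tolerance and pass rule are assumptions.

def calibration_report(official, student, tolerance=1):
    """official, student: dicts mapping calibration essay id -> grade."""
    deviations = {eid: abs(student[eid] - official[eid]) for eid in official}
    passed = all(d <= tolerance for d in deviations.values())
    return {
        "deviations": deviations,
        "may_grade_peers": passed,
        "next_step": ("grade three peer essays, then your own" if passed
                      else "complete further instruction and repeat calibration"),
    }

# Example: the student badly under-grades the third calibration essay.
print(calibration_report({"cal1": 8, "cal2": 5, "cal3": 9},
                         {"cal1": 7, "cal2": 5, "cal3": 6}))
```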

CPR is not restricted to essays in science; the idea is generic, and can be applied to literary criticism, commentaries on a piece of art, or laboratory reports, for example. The tutor must select the focus of the assignment, write an exemplar answer for calibration, and select two pieces of student work which contain interesting errors or omissions. Each of these has to be graded by the tutor, and relevant comments have to be written. The tutor also writes key questions on content and style. CPR is designed to overcome the potential weaknesses of peer assessment in terms of unreliable assessment (via training and moderation) and bias (via anonymity). The authors claim considerable gains in students’ ability to ‘learn to learn’, because their attention is focused on abstracting ideas and arguments, on describing, analysing and assessing the quality of material, and on review. CPR also increases the amount of writing that students do.

Doiron and Isaac (2002) have developed a novel form of online peer review designed to complement the American College of Surgeons Advanced Trauma Life Support Course for fourth-year medical students. Their system involves self-assessment, peer evaluation, feedback and debate. There is an inherent problem in giving large numbers of students direct experience of Emergency Room procedures. Here, students are presented with a realistic case study, and must prevent the patient from dying, conduct clinical tests, then request appropriate lab work, followed by diagnosis and recommendation of a treatment. Students reflect on, and self-assess, their knowledge. They submit a diagnosis and proposed treatment plan to the whole group. For peer review, they are presented with two other diagnoses and treatments – one from the tutor, prepared to contain errors, for critique. If the student fails to detect the errors, they get individual feedback from the tutor. Students then review ‘live’ reports from two of their peers (so three reviews are considered together). Where there are disagreements, the two views are presented to a larger group (four to ten students), who must all offer their own view and debate the issue. Similar work is being conducted on a health psychology course, and in engineering.

Assessing competence with different thinking styles: mobile phone technology might provide a means of assessing thinking styles via simulated group work. Here, each student works in a simulated environment, where responses from other ‘group members’ are pre-specified, and some responses to the actions of the student are pre-defined. This environment is artificial for a number of obvious reasons – contact is via phone (or e-mail) rather than face-to-face, and the range of dynamic interactions is constrained. However, these constraints mean that students can be assessed in relatively standardised conditions, and sequences can be replayed for analysis and reflection as part of formative assessment.

Analysing the ability to engage in de Bono’s (2000) ‘Thinking Hats’ activity provides a concrete example. De Bono has identified a number of thinking styles, all of which are useful when solving problems. None is effective on its own. He argues that people differ in their preferences for these different thinking styles, and often stick with a particular style of thinking. In terms of group dynamics, individuals can become ego-involved with a particular style of thinking, with negative consequences for the productivity of the group. De Bono argues that these different thinking styles should be made explicit, and that every group member should engage with every thinking style in the course of group work. He suggests a formal mechanism for this, where thinking styles are associated with hats of different colours, and group members are invited to take particular roles – sometimes as individuals, and sometimes as a whole group. Thinking styles include asking about what is known or what is needed (the White Hat); saying why an idea won’t work (the Black Hat); generating ideas and alternatives (the Green Hat); describing feelings, hunches and intuitions (the Red Hat); managing group processes (the Blue Hat); and the optimistic advocacy of ideas (the Yellow Hat).

Given some specific suggestions for actions via mobile phone or e-mail, students can be asked to work in Red, Yellow and Black Hat styles; or, given a stream of (simulated) input to a conference, students can be asked to work in Blue Hat mode. Their responses provide information on their strengths and weaknesses in working in different thinking styles. This idea is not restricted to de Bono’s framework, but is a generic idea for assessing individual skills in group settings.

4.1.2 Assessing group projects

A valuable skill is the ability to work productively in groups. This requires good communication skills, understanding the criteria for effective group work, understanding different roles, the ability to assess one’s own work and the work of others, and the ability to respond positively to formative and summative feedback. The assessment of group work is problematic for a number of reasons: problems can be caused by ‘social loafing’ and the allocation of equal marks for unequal contributions; undesirable effects of students rating peers; and time-hungry procedures for gathering accurate evidence on student performance.

SPARK (Self and Peer Assessment Resource Kit - www.educ.dab.uts.edu.au/darrall/sparksite) is an academic open source project designed to support the effective evaluation of group work, which has been used in a variety of contexts in higher education. It requires a clear specification of the tasks to be performed by the group and the assessment criteria. Students reflect on group processes during the performance of the task, and rate all the group members, and themselves, against the criteria provided. The tutor monitors the work of the group, grades the product of the group work, uses SPARK to convert group marks into individual marks, and provides individual summative and formative feedback (eg that a student rates their own contribution to the group far higher than other group members do). Evaluations of SPARK by its authors in a variety of higher education contexts have been positive (eg Freeman and McKenzie 2002).
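The report does not set out SPARK's algorithm, so the sketch below uses one commonly described self-and-peer-assessment weighting as an illustration: each member's factor is the square root of their rating total relative to the group average, applied to the tutor's group mark. The data, names and weighting rule are assumptions for the purpose of the example, not a description of SPARK itself.

# Illustrative sketch only: converting a tutor's group mark into individual
# marks using self and peer ratings. The weighting (square root of each
# member's rating total relative to the group average) is one commonly
# described approach; it is not claimed to be SPARK's exact algorithm.
from math import sqrt

def individual_marks(group_mark, ratings):
    """
    group_mark: tutor's mark for the group product (eg out of 100).
    ratings: {rater: {ratee: score}} - every member rates every member,
             including themselves, against the agreed criteria.
    Returns {member: individual mark}.
    """
    members = list(ratings)
    totals = {m: sum(ratings[r][m] for r in members) for m in members}
    mean_total = sum(totals.values()) / len(members)
    factors = {m: sqrt(totals[m] / mean_total) for m in members}
    return {m: round(group_mark * factors[m], 1) for m in members}

if __name__ == "__main__":
    demo = {                      # hypothetical three-person group, ratings 1-5
        "Asha": {"Asha": 5, "Ben": 3, "Cai": 4},
        "Ben":  {"Asha": 4, "Ben": 4, "Cai": 4},
        "Cai":  {"Asha": 5, "Ben": 2, "Cai": 4},
    }
    print(individual_marks(70, demo))  # Ben's mark falls below the group mark of 70

A useful by-product of collecting the ratings is that discrepancies become visible – for example, a student whose self-rating is far higher than the ratings given by peers can be picked out for exactly the kind of formative feedback described above.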

4.1.3 Assessing creativity

‘Creativity’ involves the production of a new idea or artefact that is judged by some community to be of value. Many writers have made a distinction between analytic and creative thinking. Analytic thinking has been characterised as: linear, rational, logical, conscious and deliberate. Creative thinking has been described as: parallel, unconstrained, illogical, unconscious, and chaotic. Creativity became a bandwagon for education in the 1960s, in part as a healthy corrective to an over-emphasis on ‘Intelligence’. A problem with some of the early proponents of ‘creativity’ (eg Getzels and Jackson 1962) was that they accepted many of the philosophical assumptions of the Intelligence movement, and many of its methods, but were incompetent in their use. The result was a movement that was based on some good ideas, but which was poorly theorised, and supported by flawed evidence. Just as there are many styles of analytic thinking, coloured and improved by knowledge of particular domains and by different ways of representing information, so too are there many styles of creative thinking, again influenced by knowledge and experience in a variety of domains. Creativity (as defined above) requires an intimate interplay of creative and analytic thinking. It is important to develop creativity, and to evaluate the products of creative thinking. Creativity should be evaluated by an analysis of product, and by an analysis of student processes, using methods described earlier (notably, tracking the design process, and reflective accounts of this process).

It can be difficult to obtain good paper-based accounts of student processes and results after students have engaged with an extended piece of work; asking students to present and explain their work in a visual, dynamic form instead can be a desirable activity for a number of reasons. First, it requires students to translate knowledge from one form to another, and to consider the needs of a different audience – notably, moving from a static written form whose primary audience is the teacher to a visual and dynamic form for some predefined audience, who will have a range of understandings about the topic in hand. Second, it is inherently valuable as a skill. Digital cameras and whiteboards make it easy for students to show their work (which might be on paper, in the form of manipulatives, or some artefact that has been created) and to explain what they have done, justify their answer, and describe the design decisions they took.

4.1.4 Assessing communication skills

Mobile phones could be used more extensively for assessment. A simple example would be to use mobile phones for the aural comprehension aspect of language learning. Current practice of using an analogue tape recorder at the front of a classroom is inherently unfair: the quality of the sound will differ as a function of the tape machine used, and the sound intensity at the front of the room will be dramatically higher than at the back. Using conventional computer technology, South Australia uses MP3 files to test language comprehension (see www.ssabsa.sa.edu.au) – clearly, good practice.

The eVIVA project (www.qca.org.uk/adultlearning/downloads/eviva_project.pdf, www.eviva.tv) uses phones as the medium for oral testing with portfolio-based Key Stage 3 ICT assessment. Students can book a test session, and so can have (almost) on-demand testing. The phones are also used for recording ‘voice postcards’ of learning milestones, and posting these to a central website. The ‘voice postcards’ can be used by a student to support the piece of portfolio evidence which they are presenting.

As speech recognition technologies continue to improve, one can envisage a situation where questions are posed orally by telephone, and student responses are scored automatically. In the case of language learning, this could be applied to elementary aspects of learning such as pronunciation, to vocabulary, and to correcting sentence structure ‘mistakes’ presented to students. Given test technologies that support ‘tailored testing’, the phone system could be used to provide on-demand testing of some aspects of language use. Such systems are unlikely to be useable (in the short term at least) for high-stakes testing, because of problems of impersonation. These problems may be removed if effective person recognition systems are developed and introduced on a large scale.
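The glossary entry on adaptive testing describes the underlying idea; the sketch below shows the shape of the simplest possible ‘tailored testing’ loop – choose the unused item whose difficulty is closest to the current ability estimate, then nudge the estimate after each response. It is a toy illustration under stated assumptions: a real system would use a calibrated item response theory model, and the item bank, step sizes and simulated student here are invented.

# Minimal sketch of a tailored-testing loop: pick the unused item whose
# difficulty is closest to the current ability estimate, then move the
# estimate up or down according to the response. Illustrative only.
def tailored_test(items, answer, start=0.0, step=1.0, length=5):
    """
    items:  {item_id: difficulty} on an arbitrary scale (eg -3 to +3).
    answer: callback(item_id) -> True if the student answers correctly.
    Returns (final ability estimate, administered item ids).
    """
    ability, remaining, administered = start, dict(items), []
    for _ in range(min(length, len(items))):
        item = min(remaining, key=lambda i: abs(remaining[i] - ability))
        correct = answer(item)            # eg a spoken reply, scored automatically
        ability += step if correct else -step
        step *= 0.7                       # smaller moves as the estimate settles
        administered.append(item)
        del remaining[item]
    return ability, administered

if __name__ == "__main__":
    bank = {"q1": -2.0, "q2": -1.0, "q3": 0.0, "q4": 1.0, "q5": 2.0}
    # Hypothetical student who can answer anything of difficulty 0.5 or below.
    estimate, used = tailored_test(bank, lambda i: bank[i] <= 0.5)
    print(round(estimate, 2), used)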

4.2 NATIONAL CURRICULA, NATIONAL ASSESSMENT

The Tomlinson Report (2004) addresses fundamental questions about curriculum design and assessment, and describes a number of serious problems with current systems. Assessment exemplifies educational goals, and has a major effect on educational practice. Unless assessment systems are aligned with educational goals, they will distort curriculum ambitions. There is a general desire for more school-based assessment, and more process-based assessment, and an insistence that current high standards of equity and probity in the examination process are maintained. E-assessment (eg via e-portfolios) can provide the means to empower teachers and schools, while ensuring that high standards of assessment are met. ICT can support the whole process of teacher preparation, and the establishment of procedures to ensure comparability of standards across schools. School-based judgements could be moderated by external computer-based tests. E-assessment can extend the range of reliable assessments that can be conducted, and so can widen the debate on curriculum and assessment design. On-demand testing will have considerable implications for curriculum planning. Students could take summative tests at different times, and could progress through the curriculum at different rates.

E-assessment could reduce the damage caused by current tests. At present, new SAT papers are created each year, and all students answer the same questions. If the purpose of testing is to establish the performance of some system (such as a school or an LEA), better methods could be employed. If there were a large bank of tasks available in electronic form, and different students received a different set of tasks, then coverage of the curriculum could be better, and there would be no need to report individual student scores. This would have the advantage that a larger variety of task types could be used, and would avoid the current distortions caused by teachers ‘teaching to the SAT’.
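The sampling idea described in this paragraph is usually known as matrix sampling: each student attempts only a small subset of tasks, the subsets are spread so that the cohort as a whole covers the bank, and results are aggregated at school or LEA level rather than reported per student. The sketch below illustrates the bookkeeping involved; the task bank, strand names and scores are invented for the example.

# Illustrative sketch of matrix sampling: each student sits a few tasks drawn
# across curriculum strands, and results are reported by strand at school
# level. The bank and the scores are hypothetical.
import random
from collections import defaultdict

BANK = {                          # strand -> task ids (hypothetical)
    "number":   ["N1", "N2", "N3", "N4"],
    "algebra":  ["A1", "A2", "A3", "A4"],
    "geometry": ["G1", "G2", "G3", "G4"],
    "data":     ["D1", "D2", "D3", "D4"],
}

def allocate(students, seed=0):
    """Give each student one randomly chosen task from each strand."""
    rng = random.Random(seed)
    return {s: [rng.choice(tasks) for tasks in BANK.values()] for s in students}

def school_report(results):
    """results: {(student, task): score between 0 and 1}. Aggregate by strand."""
    strand_of = {t: strand for strand, tasks in BANK.items() for t in tasks}
    sums, counts = defaultdict(float), defaultdict(int)
    for (_, task), score in results.items():
        sums[strand_of[task]] += score
        counts[strand_of[task]] += 1
    return {strand: round(sums[strand] / counts[strand], 2) for strand in sums}

if __name__ == "__main__":
    allocation = allocate([f"pupil{i}" for i in range(30)])
    fake_results = {(s, t): random.random() for s, tasks in allocation.items() for t in tasks}
    print(school_report(fake_results))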

4.3 EVOLUTION AND REVOLUTION

Even where there is a shared vision on future curricula, there can be considerable problems in implementation. Ridgway (1998) draws analogies between ecological restoration and educational change, and describes the sorts of research needed for successful change. This style is close to research in fast-changing fields such as electronics, where discoveries and inventions drive practice and theory, in contrast to well-established fields where theory can lead practice. It is important to be aware that some goals are easy to achieve from most starting points, whilst others need a good deal of capacity building before they can be reached. It will be important to phase the introduction of e-assessment in such a way that the load on students, teachers, schools and systems is lower than the current assessment load. Some barriers are discussed below.

Establishing the credibility of e-assessment: in some areas, such as competency-based assessment, the case for e-assessment is self-evident. In other areas, reasonable sceptics will have to be convinced of its value. They will have concerns about the construct validity of new tests (exactly what do they measure?); the reliability of new tests in comparison with existing tests; and the educational standards required – both in relation to current tests, and across tests such as those given ‘on-demand’ in different places and at different times. Each of these questions will need to be addressed for each family of e-assessments, usually by means of an empirical study.

Building system capacity: there is an urgent need to build capacity for e-assessment, ranging from test design, through test delivery and processing, to expertise in schools. Each of these is problematic.

Task and test design: very few people have expertise in creating e-assessments, in comparison to the large numbers of people competent to create conventional tests. There is an urgent need to create new task types and to explore their reliability and validity. If we do not continue to explore, students will be faced with a set of tasks which recently were innovative, but which are now hackneyed.


Establishing technical standards: currently, there are three sets of technical standards. We need a consensus document. The needs of students with special needs must be addressed. Standards for monitoring the quality of the assessments given in schools (actually a rather hostile environment for ICT, because of the plethora of machines and operating systems), and the procedures put in place by examination authorities, need to be written and validated in practical settings.

ICT infrastructure: good broadband systems are needed – in particular, very high specification systems are needed for big schools. Currently, about 40% of primary schools, and about 100% of secondary schools, have broadband access, but not necessarily at the levels needed for online assessment (Rt Hon Charles Clarke MP 2004). The proposals set out in the Tomlinson Report are only feasible if a national database of student achievement is established. At school level, extensive investment in ICT will be needed, and costs will recur.

The examination process: dealing with e-assessment poses serious challenges to paper-based examination authorities. They need to develop a robust technology infrastructure, and (at least as important) the competencies of staff to make these systems function effectively. A good start has been made here, for example in the work on the assessment of basic and key skills. However, there are salutary messages from the QCA Report on implementation (QCA 2004). AQA report (Adams and Hudson 2004) that their surveys show considerable satisfaction from examiners. Examiners report that the software is easy to use; they like the increased accuracy and validation at input, the auto-totalling of marks by the computer, and the electronic management of reporting and discrepancies.

On examiners and examining: high quality training is an essential aspect of reliable assessment. Tomlinson recommends (paras 134–136) “a thorough professionalisation of the role of markers and examiners, including coursework markers”, and the Report makes a number of specific recommendations on how this might be institutionalised via schemes for professional development, accreditation, and appropriate professional reward systems. The Secondary Heads Association has argued for the establishment of ‘Chartered Examiners’ in schools and colleges, who would give their organisations the right to take more control over examination assessment.

School and test-centre expertise: this presents a massive challenge for professional development. Schools need to develop systems which are robust.

Plagiarism: plagiarism poses a major threat to all assessment systems (eg Ridgway and Smith 2004). Threats range from downloading work directly from the internet, through commissioning work from others, to impersonation. Assessment systems will need to be resistant to such attacks.

Equity issues: it is important that e-assessment does not create a ‘digital divide’ which privileges some students over others on the basis of opportunities of access.


4.4 RELIABLE TEACHER ASSESSMENT VIA E-PORTFOLIOS

A key decision for educational systems is exactly how much of students’ time should be devoted to working on extended projects, and how much should be based on shorter activities. A related decision is the balance to be struck between portfolio systems assessed in school and timed external assessments. A key issue is to establish robust and reliable systems of school-based assessment. It is worth highlighting the extreme positions that different systems adopt. In some systems, all assessment is done externally; in others – for example Queensland, Australia – all assessment is school-based. Queensland provides extensive systems for training teachers, and for moderating their judgements. ICT can facilitate this process. All student submissions can be put onto the web, and systems of cross-moderation can be established. Externally defined tests can be used to guide the moderation process.

4.5 DUMBING-DOWN ASSESSMENT

There is a danger that considerations of cost and ease of assessment will lead to the introduction of ‘cheap’ assessment systems which prove to be very expensive in terms of the damage they do to students’ educational experiences. At the time of writing, this seems most unlikely in the UK. QCA have funded some innovative e-assessment developments at investment levels beyond the reach of most companies, and have a large group focused on developing and sharing expertise in e-assessment (www.qca.org.uk).

4.6 SUMMARY OF SECTION 4

New educational goals continue to emerge, and the process of critical reflection on what is important to learn, and how this might be assessed authentically, needs to be institutionalised into curriculum planning. In this section, we explore ways to assess metacognition, group projects, creativity and communication skills. E-assessment is certain to play a major role in defining and implementing curriculum change in the UK. There is a strong government commitment to e-assessment, and good initial progress has been made. Major challenges of ‘going to scale’ have yet to be faced. A good deal of innovative work is needed, coupled with a grounded approach to system-wide implementation.


ACKNOWLEDGEMENTS

We wish to thank a number of people who have commented constructively on this document, in particular Keri Facer, Annika Small, Jeremy Tafler, and Kathleen Tattersall. We are grateful to them for their input. All the faults and errors of omission are our own.

GLOSSARY

Adaptive testing a sequential form of individual testing in which successive items in the test are chosen based primarily on the psychometric properties and content of the items, and the participant’s response to previous items

A-level (AS/A2) General Certificate of Education (GCE) Advanced Level. Study usually consists of a two-year academic course and students will usually select two or three subjects from subjects studied at AS-levels to continue to A-level (called A2)

Anchor(s) a sample of student work that exemplifies a specific level of performance. Markers use anchors to score student work, usually comparing the student performance to the anchor

AQA an awarding body: Assessment and Qualifications Alliance, formed from the merger of Associated Examining Board (AEB) and the Northern Examinations and Assessment Board (NEAB) in 2000

AS-levels General Certificate of Advanced Supplementary Level, considered to be the equivalent of half an A-level. Young people are now expected to study four AS-levels during Year 12 at school or college

Assessment any systematic method of obtaining evidence from tests, examinations, questionnaires, surveys and collateral sources used to draw inferences about characteristics of people, objects or programs for a specific purpose

Basic skills the ability to read, write and speak in English and use mathematics at a level necessary to function and progress at work and society in general

CAS Computer Algebra System. Software package used for the manipulation of mathematical formulae. Automates tedious and sometimes difficult algebraic manipulation tasks. Systems vary and may include facilities for graphing equations or provide a programming language for the user to define their own procedures

City and Guilds major awarding body for vocational qualifications in the UK

Competency-based assessment assessment process based on the collection of evidence on which judgments are made concerning progress towards satisfaction of standard performance criteria

Concept map the arrangement of ideas into a visual layout highlighting connections between associated ideas, revealing the structural pattern in the information

Criterion referenced assessment assessment linked to predefined standards (eg ‘Can swim 25 metres in a swimming pool’)

CSE Certificate of Secondary Education: former system of British examinations taken in a range of subjects, usually at the age of 16

Diagnostic testing testing used to identify conceptions and misconceptions with a view to providing appropriate remedial experiences


Discrimination the ability to distinguish between and among different levels of work or achievement

E-assessment electronic assessment: processes involving the implementation of ICT for the recording, transmission, presentation and processing of assessment material

Edexcel UK examining and awarding body providing a range of qualifications, including at higher education level

EiC Excellence in Cities. Government initiative aimed at raising the educational aspirations and attainment of children in inner cities

European Computer Driving Licence European-wide qualification allowing candidates to demonstrate competence in computer skills, covering the areas of basic concepts of IT, using the computer and managing files, word processing, spreadsheets, database, presentation, and information and communication

Formative assessment often called assessment for learning. Assessment used to support teaching and learning, which identifies strengths and weaknesses of the student

GCE General Certificate of Education

GCSE General Certificate of Secondary Education. The main secondary school examinations, usually taken at 16, which replaced the previous system of GCE O-levels and CSEs

GIS Geographic Information System. System of software used for the storage, retrieval, mapping and analysis of spatial data, such as mortality by different regions

GNVQ General National Vocational Qualification. Vocational qualification, often taken as an alternative to GCSE or A-levels, usually after compulsory schooling. Available at three levels: Foundation, Intermediate, and Advanced

High-stakes assessment assessment that has important consequences or implications for students, staff or schools

ICT Information and Communications Technology

Key skills a group of skills valued by employers as being central to all work and learning, including communication, information technology, application of numbers, working with others, and improving own learning and performance

Key Stages the four stages of the National Curriculum: KS1 for pupils aged 5-7; KS2 for 7-11; KS3 for 11-14; KS4 for 14-16

NVQ National Vocational Qualifications. Work-based vocational qualifications. They are portfolio-based qualifications which show skills, knowledge and ability in specific work areas. Can be taken at five levels, depending on the level of expertise and responsibility of the job

O-level also GCE Ordinary level. Former system of British examinations taken in a range of subjects, usually at the age of 16. Ran in parallel with, but at a higher level than, CSE. Both systems now replaced by the current GCSE

Parallel forms tests that are created to measure the same constructs, and to produce the same scores, if they were given to individuals on different occasions

PDA Personal Digital Assistant; a small hand-held computer. Depending on level of sophistication, may allow e-mail, word processing, music playback, internet access, digital photography or GPS reception


Pedagogy philosophy of approach to schooling, learning, and teaching, including what is taught, how teaching occurs, and how learning occurs

Portfolio a representative collection of a candidate’s work, which is used to demonstrate or exemplify either that a range of criteria has been met, or to showcase the very best that a candidate is capable of

Portfolio assessment assessment based on judgment made about the work shown as evidence within a portfolio

Predictive validity the extent to which scores on a test predict some future performance. For example, a student’s GCSE grade can be used to predict their likely A-level grade – in some subjects, the prediction is better than in other subjects

QCA UK public body, sponsored by the Department for Education and Skills (DfES). Roles include the maintenance and development of the national curriculum and associated assessments, tests and examinations

Reliability reliability in measurement and testing is a measure of the accuracy of the score achieved, with respect to the likelihood that the score would be constant if the test were re-taken, or the same performance were re-scored by another marker, or if another test from a test bank of ostensibly equivalent items is used

Summative assessment assessment used to measure performance, usually at the end of a course of study

TIMSS Trends in International Mathematics and Science Study, formerly Third International Mathematics and Science Study. Comprehensive study offering data on students’ mathematics and science achievement from an international perspective. Data from 1995, 1999, and 2003

UCLES University of Cambridge Local Examinations Syndicate, comprising three business units: Cambridge ESOL (English for Speakers of Other Languages), providing examinations in English as a foreign language and qualifications for language teachers; CIE (University of Cambridge International Examinations), providing international school examinations and international vocational awards; and OCR (Oxford, Cambridge and RSA Examinations), providing general and vocational qualifications

Validity the appropriateness of the interpretation and use of the results for any assessment procedure

Value added the increase in learning that occurs during a course of education. Based either on the gains of an individual or a group of students. Requires a baseline measurement for comparison


BIBLIOGRAPHY

Adams, C and Hudson, G (2004). AQA and DRS electronic mark capture, presented at the QCA E-assessment Summit, 24 April

Aim Online P–10 Supplement (2003). Supplement to the VCAA Bulletin No 6, September 2003. AIM Online: www.aimonline.vic.edu.au

Archenhold, WF, Bell, J, Donnelly, J, Johnson, S and Welford, G (1988). Science at Age 15: a Review of APU Findings 1980-1984. London: HMSO

Barnes, M, Clarke, D and Stephens, M (2000). Assessment: the engine of systemic curriculum reform? Journal of Curriculum Studies, 32(5), 623-650

Bennett, RE (2002). Inexorable and inevitable: the continuing story of technology and assessment. Journal of Technology, Learning, and Assessment, 1(1). Available from www.jtla.org

Black, P and Wiliam, D (2002). Assessment for Learning: Beyond the Black Box. www.assessment-reform-group.org.uk/publications.html

Chapman, OL and Fiore, MA (2001). Calibrated peer review: a writing and critical thinking instructional tool. The White Paper: a Description of CPR. http://cpr.molsci.ucla.edu/

Cockcroft, WH (1982). Mathematics Counts. London: HMSO

Cohen, Y, Ben-Simon, A and Hovav, M (2003). The effect of specific language features on the complexity of systems for automated essay scoring. Paper presented to the 29th Annual Conference of the International Association for Educational Assessment. www.aqa.org.uk/support/iaea/papers/ben-cohen-hovav.pdf

Cowie, J and Lehnert, W (1996). Information extraction. Communications of the ACM, vol 39(1), pp80-91

De Bono, E (2000). Six Thinking Hats. London: Penguin Books

Doiron, G and Isaac, JR (2002). Designing an ER online role play for medical students. 2nd Symposium on Teaching and Learning in Higher Education: Paradigm Shift in Higher Education, National University of Singapore, 4-6 September 2002

Downes, T and Zammit, K (2000). New literacies for connected learning in global classrooms, in: H Taylor and P Hogenbirk (Eds) Information and Communication Technologies: the School of the Future. London: Kluwer Academic Publishers

EPPI Centre (2002). A Systematic Review of the Impact of Summative Assessment and Tests on Students’ Motivation for Learning. http://eppi.ioe.ac.uk

Frederikson, JR and Collins, A (1989). A system approach to educational testing. Educational Researcher, 18(9), 27-32

Freeman, MA and McKenzie, J (2002). Implementing and evaluating SPARK, a confidential web-based template for self and peer assessment of student teamwork: benefits of evaluating across different subjects. British Journal of Educational Technology, 33(5), pp553-572. Cited at www.educ.dab.uts.edu.au/darrall/sparksite

Getzels, JW and Jackson, PW (1962). Creativity and Intelligence: Explorations with Gifted Students. New York: John Wiley

Kimbell, R (2003). Performance assessment: assessing the inaccessible. Paper presented at Futurelab’s Beyond the Exam conference, 19-20 November 2003, Bristol


Kirriemuir, J and McFarlane, A (2003). Literature Review in Games and Learning (2004). Bristol: Futurelab. Retrieved 05/09/2004 from www.futurelab.org.uk/research/lit_reviews.htm

Klein, SP, Hamilton, LS, McCaffrey, DF and Stecher, BM (2000). What do test scores in Texas tell us? RAND Issues Paper. www.rand.org/publications/IP/IP202

Koretz and Barron (1998). The Validity of Gains in Scores on the Kentucky Instructional Results Information System (KIRIS). www.rand.org/publications/MR/MR1014/MR1014.pref.pdf

Linn, RL (2000). Assessments and accountability. ER Online, 29(2). www.aera.net/pubs/er/arts/29-02/linn01.htm

Mathews, JC (1985). Examinations: a Commentary. London: George Allen and Unwin

Messick, S (1995). Validity of psychological assessment. American Psychologist, vol 50, no 9, pp741-749

Mitchell, T, Aldridge, N, Williamson, W and Broomhead, P (2003). Computer based testing of medical knowledge. Proceedings of the 7th International Computer Assisted Assessment Conference, Loughborough, pp249-267

Pellegrino, JW, Chudowski, N and Glaser, R (Eds) (2001). Knowing What Students Know. Washington DC: National Academy of Sciences

QCA (2004). The Basic and Key Skills (BKS) E-assessment Experience Report. www.qca.org.uk/adultlearning/downloads/bks_e-assessment_experience.pdf

Richardson, M, Baird, J, Ridgway, J, Ripley, M, Shorrocks-Taylor, D and Swan, M (2002). Challenging minds? Students’ perceptions of computer-based World Class Tests of problem solving. Computers and Human Behaviour, 18(6), 633-649

Ridgway, J and Passey, D (1993). An international view of mathematics assessment - through a class, darkly, in: Niss, M (Ed) Investigations into Assessment in Mathematics Education. Kluwer Academic Publishers, pp57-72

Ridgway, J (1998). The Modelling of Systems and Macro-Systemic Change - Lessons for Evaluation from Epidemiology and Ecology. National Institute for Science Education Monograph 8, University of Wisconsin-Madison

Ridgway, J and Smith, H (2004). Against plagiarism: strategies for defending the validity of assessment systems. EARLI Assessment SIG, Bergen, Norway

Ridgway, J, Swan, M and Burkhardt, H (2001). Assessing mathematical thinking via FLAG, in: D Holton and M Niss (Eds) Teaching and Learning Mathematics at University Level - An ICMI Study. Dordrecht: Kluwer Academic Publishers, pp423-430. Field-Tested Learning Assessment Guide (FLAG). www.wcer.wisc.edu/nise/cl1

Ridgway, J and McCusker, S (2003). Using computers to assess new educational goals. Assessment in Education: Principles, Policy and Practice, vol 10, no 3, pp309-328

Ripley, M (2004). E-assessment question 2004 – QCA keynote speech, e-assessment: an overview. Presentation given by Martin Ripley at Delivering E-assessment - a Fair Deal for Learners, a summit held by QCA on 20 April 2004

Roan, M (2003). Computerised assessment: changes in marking UK examinations – are we ready yet? Paper presented to the 29th Annual Conference of the International Association for Educational Assessment. www.aqa.org.uk/support/iaea/papers/roan.pdf

Robitaille, DF, Schmidt, WH, Raizen, S, McKnight, C, Britton, E and Nicol, C (1993). Curriculum frameworks for mathematics and science. TIMSS Monograph No 1. Vancouver: Pacific Educational Press

Rt Hon Charles Clarke MP, Secretary of State for Education and Skills (2004). Keynote speech at Delivering E-assessment - a Fair Deal for Learners, a summit held by QCA on 20 April 2004

Schulman, L (1998). Teacher portfolios: a theoretical activity, in: N Lyons (Ed) With Portfolio in Hand: Validating the New Teacher Professionalism (pp23-37). NY: Teachers College Press

Slaughter, S and Leslie, LL (1997). Academic Capitalism: Politics, Policies and the Entrepreneurial University. Baltimore: The Johns Hopkins University Press

Sukkarieh, JZ, Pulman, SG and Raikes, N (2003). Auto-marking: using computational linguistics to score short, free text responses. Paper presented to the 29th Annual Conference of the International Association for Educational Assessment. www.aqa.org.uk/support/iaea/papers/sukkarieh-pulman-raikes.pdf

Tattersall, K (2003). Ringing the changes: educational and assessment policies, 1900 to the present, in: Setting the Standard. Manchester: AQA, pp7-27

Teacher Training Agency (2003). Qualifying to Teach: Professional Standards for Qualified Teacher Status and Requirements for Initial Teacher Training

Tomlinson, M (2002). Inquiry into A Level Standards. London: DfES

Tomlinson, M (2004). 14-19 Curriculum and Qualifications Reform: Interim Report of the Working Group on 14-19 Reform. London: DfES. www.14-19reform.gov.uk

Topping, KJ (1998). Peer assessment between students in college and university. Review of Educational Research, 68(3), 249-276


APPENDIX: FUNDAMENTALS OF ASSESSMENT

How shall they be judged?

Here we consider some of the criteria against which tests and testing systems can be judged.

Validity and reliability are often written about as if they were separate things. Actually, they are intimately entwined, but it is worth starting with two simple definitions: validity is concerned with the nature of what is being measured, while reliability is concerned with the quality of the measurement instrument.

A loose set of criteria can be set out under the heading of educational validity (Frederikson and Collins (1989) use the term ‘systemic validity’). Educational validity encompasses a number of aspects which are set out below.

Consequential validity: refers to the effects that assessment has on the educational system (Ridgway and Passey (1993) use ‘generative validity’). Messick (1995) argues that consequential validity is probably the most important criterion on which to judge an assessment system. For example, high-stakes testing regimes which focus exclusively on timed multiple choice items in a narrow domain can produce severe distortions of the educational process, including rewarding both students and teachers for cheating. Klein, Hamilton, McCaffrey and Stecher (2000), and Koretz and Barron (1998), provide examples where scores on high-stakes State tests rise dramatically over a four-year period, while national tests taken by the same students, which measure the same constructs, show little change.

Construct validity: refers to the extent to which a test measures what it purports to measure. There is a need for a clear description of the whole topic area (the domain definition) covered by the test. There is a need for a clear statement of the design of the test (the test blueprint), with examples in the form of tasks and sample tests. Construct validity requires supporting evidence on the match between the domain definition and the test. Construct validity can be approached in a number of ways. It is important to check on the following (a minimal sketch of the correlational checks is given after the list):

• content validity: are items fully representative of the topic being measured?

• convergent validity: given the domain definition, are constructs which should be related to each other actually observed to be related to each other?

• discriminant validity: given the domain definition, are constructs which should not be related to each other actually observed to be unrelated?

• concurrent validity: does the test correlate highly with other tests which supposedly measure the same things?
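As signalled above, the convergent, discriminant and concurrent checks are usually made concrete as correlations. The sketch below computes Pearson correlations for invented data: two sub-scores of a hypothetical new test, a variable that should be unrelated (handwriting quality), and an established test of the same construct.

# Illustrative sketch of the correlational validity checks listed above.
# All scores are invented solely to show the shape of the calculation.
from math import sqrt

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical scores for ten students.
algebra_part     = [12, 15, 9, 18, 14, 11, 16, 13, 10, 17]   # new test, part A
geometry_part    = [11, 14, 10, 17, 15, 10, 15, 12, 9, 16]   # new test, part B
handwriting      = [8, 8, 5, 5, 9, 6, 5, 8, 6, 6]            # should be unrelated
established_test = [55, 70, 48, 82, 68, 52, 75, 60, 50, 78]  # same construct

print("convergent  :", round(pearson(algebra_part, geometry_part), 2))    # expect high
print("discriminant:", round(pearson(algebra_part, handwriting), 2))      # expect near 0
print("concurrent  :", round(pearson(algebra_part, established_test), 2)) # expect high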

The essential idea about reliability is that test scores should be a lot better than random numbers. Test situations have lots of reliabilities. The over-arching question concerning reliability is: if we could test identical students on different occasions using the same tests, would we get the same results?

Take the measurement of student height as an example. The concept is easy to define; we have good reason to believe that ‘height’ can be measured on a single dimension (contrast this with ‘athletic ability’, or ‘creativity’, where a number of different components need to be considered). However, the accurate measurement of height needs care.


Height is affected by the circumstances of measurement – students should take off their shoes and hats, and should not slump when they are measured. The measuring instrument is important – a yard stick will provide a crude estimate, good for identifying students who are exceptionally short or exceptionally tall, but not capable of fine discriminations between students; using a tape measure is likely to lead to more measurement error than using a fixed vertical ruler with a bar which rests on each student’s head. Time of day should be considered (people are taller in the morning); so should the time between measurements. If we assess the reliability of measurement by comparing measurements on successive occasions, we will under-estimate reliability if the measures are taken too far apart, and students grow different amounts in the intervening period.

Exploration of reliability raises a set of finer-grained questions. Here are some examples:

• is the phenomenon being measured relatively stable? What inherent variation do we expect? (mood is likely to be less stable than vocabulary size)

• to what extent do different markers assign the same marks as each other to a set of student responses? (a simple check of this kind is sketched after the list)

• do students of equal ability get the same marks no matter which version of the test they take?
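The second question in the list – marker agreement – is sketched below for invented data: two markers grade the same ten responses, and the script reports raw exact agreement together with Cohen's kappa, which corrects that agreement for chance. It illustrates the kind of check required; it is not a prescribed procedure, and real studies would use many more scripts and markers.

# Illustrative sketch of an inter-marker reliability check: exact agreement
# and Cohen's kappa for two markers grading the same responses. Grades invented.
from collections import Counter

def agreement_and_kappa(marker_a, marker_b):
    n = len(marker_a)
    observed = sum(a == b for a, b in zip(marker_a, marker_b)) / n
    ca, cb = Counter(marker_a), Counter(marker_b)
    expected = sum((ca[g] / n) * (cb[g] / n) for g in set(ca) | set(cb))
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa

if __name__ == "__main__":
    marker_a = ["A", "B", "B", "C", "A", "C", "B", "A", "D", "B"]
    marker_b = ["A", "B", "C", "C", "A", "B", "B", "A", "D", "B"]
    observed, kappa = agreement_and_kappa(marker_a, marker_b)
    print(f"exact agreement: {observed:.0%}, Cohen's kappa: {kappa:.2f}")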

Fitness for purpose: the quality of any design can be judged in terms of its ‘fitness for purpose’. Tests are designed for a variety of purposes, and so the criteria for judging a particular test will shift as a function of its intended purpose; the same test may be well suited to one purpose and ill suited to another.

Usability: people using an assessment system – notably students and teachers – need to understand and be sympathetic to its purposes.

Practicality: few designers work in arenas where cost is irrelevant. In educational settings, a major restriction on design is the total cost of the assessment system. The key principle here is that test administration and scoring must be manageable within existing financial resources, and should be cost-effective in the context of the education of students.

Equity: equity issues must be addressed - inequitable tests are (by definition) unfair, illegal, and can have negative social consequences.


About Futurelab

Futurelab is passionate about transforming the way people learn. Tapping into the huge potential offered by digital and other technologies, we are developing innovative learning resources and practices that support new approaches to education for the 21st century.

Working in partnership with industry, policy and practice, Futurelab:

• incubates new ideas, taking them from the lab to the classroom
• offers hard evidence and practical advice to support the design and use of innovative learning tools
• communicates the latest thinking and practice in educational ICT
• provides the space for experimentation and the exchange of ideas between the creative, technology and education sectors.

A not-for-profit organisation, Futurelab is committed to sharing the lessons learnt from our research and development in order to inform positive change to educational policy and practice.

Futurelab
1 Canons Road
Harbourside
Bristol BS1 5UH
United Kingdom

tel +44 (0)117 915 8200
fax +44 (0)117 915 [email protected]

www.futurelab.org.uk

Registered charity 1113051


Creative Commons

© Futurelab 2006. All rights reserved; Futurelab has an open access policy which encourages circulation of our work, including this report, under certain copyright conditions - however, please ensure that Futurelab is acknowledged. For full details of our Creative Commons licence, go to www.futurelab.org.uk/open_access.htm

Disclaimer

These reviews have been published to present useful and timely information and to stimulate thinking and debate. It should be recognised that the opinions expressed in this document are personal to the author and should not be taken to reflect the views of Futurelab. Futurelab does not guarantee the accuracy of the information or opinion contained within the review.

This publication is available to download from the Futurelab website – www.futurelab.org.uk/research/lit_reviews.htm

Also from Futurelab:

Literature Reviews and Research Reports
Written by leading academics, these publications provide comprehensive surveys of research and practice in a range of different fields.

Handbooks
Drawing on Futurelab's in-house R&D programme as well as projects from around the world, these handbooks offer practical advice and guidance to support the design and development of new approaches to education.

Opening Education Series
Focusing on emergent ideas in education and technology, this series of publications opens up new areas for debate and discussion.

We encourage the use and circulation of the text content of these publications, which are available to download from the Futurelab website – www.futurelab.org.uk/research. For full details of our open access policy, go to www.futurelab.org.uk/open_access.htm.


FUTURELAB SERIES

REPORT 10

ISBN: 0-9544695-8-5
Futurelab © 2004