-
Do all tests have washback?
J Charles Alderson, Lancaster University
-
Tests
Washback
Diagnosis
-
A four-letter word
Elicitation device: getting somebody to perform their competence
Description of performance
Procedure for making judgements based on criteria
Measurement, not the same as assessment
Not observation
-
Tests whose results are seen, rightly or wrongly, by students, teachers, administrators, parents or the general public as being used to make important decisions that immediately and directly affect them.
(Madaus, 1988)
-
Relates to the effects of tests on classroom practices, particularly teaching and learning.
Can be positive or negative, to the extent
that it either promotes or impedes the
accomplishment of educational goals held
by learners and/or programme personnel.
(Bailey, 1996)
-
Mismatch between the stated goals of
instruction and the focus of assessment
May lead to the abandonment of
instructional goals in favour of test
preparation
Forces teachers to do things they would
not normally do
-
If a test has positive washback,
there is no difference between teaching the curriculum and teaching to the test.
(Weigle & Jensen, 1997, p. 205)
-
Tests can be a powerful, low-cost means of
influencing the quality of what teachers
teach and what learners learn at school.
(Heyneman & Ransom, 1992)
-
psychometric imperialism
Leads to cramming
Narrows the curriculum
Focuses the attention on skills that are easy to test
Restricts teacher and student creativity
Demeans the professional judgement of teachers
-
A test will influence teaching.
A test will influence learning.
A test will influence what teachers teach.
A test will influence what learners learn.
A test will influence how teachers teach.
A test will influence how learners learn.
(Alderson and Wall, 1993)
-
A test will influence the rate and sequence, and the degree and depth of teaching.
A test will influence the rate and sequence, and the degree and depth of learning.
A test will influence attitudes to the content, method, etc. of teaching and learning.
Tests will have washback on all teachers and learners.
Tests will have washback on some teachers and some learners but not on others.
(Alderson and Wall, 1993)
-
Wall & Alderson, 1993: different amounts of washback on content, methods, means of assessment
Alderson & Hamp-Lyons, 1996; Watanabe, 1996: teachers are affected by tests in different ways
Shohamy, Donitsa-Schmitt & Ferman, 1996: the washback of tests can change over time
Tsagari, 2006, The complexity of washback: Participants' perceptions, material design and classroom applications
Virtually all studies relate to high-stakes tests
-
Curriculum: contents of curriculum, timetabling
Teaching materials: choice of textbooks, use of past papers, teacher-made materials
Teaching methods: choice of methods, teaching of test-taking skills
Attitudes and feelings: of learners and teachers
Learning: Do test results improve? Does learning improve?
(Spratt, 2005)
-
The exam
Teacher beliefs
Teacher attitudes
Teacher training
Resources
The school
Cultural factors
(Spratt, 2005)
-
Very under-developed and under-theorised in language testing and teaching
Focus on learners' strengths and weaknesses; on their prediction, even explanation
Diagnosis requires a better understanding of what the nature might be of strengths and weaknesses in particular language skills
There are very few diagnostic SFL tests
(Alderson 2005, 2007; Huhta 2008)
-
NOT Proficiency
NOT Achievement
NOT Progress
NOT Placement
NOT Aptitude
BUT all the above could yield useful diagnostic information
HOWEVER, better is diagnosis by design
-
Bachman, 1990: 60
Virtually any test has some potential for providing diagnostic information
But he then goes on to say:
When we speak of a diagnostic test ... we are generally referring to a test that has been designed and developed specifically to provide detailed information about the specific content domains that are covered in a given program or that are part of a general theory of language proficiency. Thus, diagnostic tests may be either theory or syllabus-based
-
Yet Alderson (2005: 6) points out:
It would appear that we have a problem here: diagnosis (is said to be) useful, most language tests are (said to be) usable for diagnosis, it is common for universities to administer diagnostic tests (actually placement tests), and yet diagnostic tests are rare!
Two examples of diagnosis in action and in research into theory: DIALANG and DIALUKI
-
DIALANG
Diagnosis in action
Diagnosis by design
-
Computer-based diagnostic language testing system
14 European languages
Delivers tests across the Internet
Supports language learners
Institutional or private use, free of charge
Still widely used throughout Europe and beyond, 8 years after launch
-
DIALANG is an application of the Common European Framework of Reference
DIALANG uses Common European Framework scales and self-assessment statements (modified)
DIALANG provides some evidence of their validity
-
to provide language users and learners with diagnostic information about their strengths and weaknesses and to help them to find ways of improving their proficiency
-
to raise the learners' awareness of their own language proficiency, of language learning and proficiency in general, and of the role that language tests might have in the learning process
this takes place through the use of self-assessment and various kinds of feedback and information services
-
first large-scale system for diagnosis / feedback rather than certification
on-line, Internet-delivered, universally available, not restricted to a particular place or time
-
available for all kinds & levels of learners & can support them throughout their language learning career
multi-lingual (14 languages): tests, interface (instructions, help screens), self-assessment & advice / feedback
-
ASSESSMENT PROCEDURE (flowchart, steps 1-3)
1. Client enters DIALANG
2. Selection of section: reading, writing, listening, structures, vocabulary
3. Vocabulary Size Placement Test
-
ASSESSMENT PROCEDURE (flowchart, steps 4-7)
4. Self-assessment
5. Responding to tasks
6. Feedback
7. Selection: another section/language, or EXIT (Goodbye!)
-
Reading Comprehension (CEFR)
Listening Comprehension (CEFR)
Writing (CEFR)
Structures
Vocabulary
no overall section (nor grade & feedback)
from beginners to advanced
-
Danish
Dutch
English
Finnish
French
German
Greek
Icelandic
Irish
Italian
Norwegian
Portuguese
Spanish
Swedish
-
VSPT: score band and description
Results (and self-assessment): CEFR scales and report on self-assessment
Explanatory feedback: why self-assessment may not match test result
Advisory feedback: what you can do and how to progress, based on CEFR
Item review
-
http://www.lancs.ac.uk/researchenterprise/dialang/about
-
Validity relates to what the test is intended to measure
Design for diagnosis, don't retrofit
Diagnosis should relate to future treatment
Treatment should be teachable or learnable
Diagnosis should be based on theory: what we know about what affects learning
-
Informed by SLA research
Focus on weaknesses rather than strengths
Enable a detailed analysis and report
Give detailed feedback which can be acted on
Provide immediate results
Involve little anxiety
Based on content covered in instruction
Less authentic
Discrete-point rather than integrated
More likely to focus on low-level language skills than higher-order skills, which are more integrated
Likely to be enhanced by being computer-based
-
DIALUKI
Understanding Diagnosis
Researching Diagnosis
-
Diagnosing Reading and Writing in a Second or Foreign Language
Research project 2010-2013: work in progress
Funded by the Academy of Finland, the University of Jyväskylä and the UK Economic and Social Research Council (ESRC)
Cooperation between language testers, other applied linguists and psychologists (L1 reading)
-
The main research questions:
Can different L1 and L2 linguistic, psycholinguistic, motivation and background measures predict difficulties in SFL R/W?
How does SFL proficiency in R/W develop in psycholinguistic and linguistic terms?
Which features or combinations of features characterise different CEFR proficiency levels?
-
Study 1: A cross-sectional study with 850 students. Data collection: 2010-11. Exploring the value of a range of L1 & L2 measures in predicting L2 reading & writing, in order to select the best predictors for further studies.
Study 2: Longitudinal study. Data collection: 2010-13. The development of literacy skills, and the relationship of this development to the diagnostic measures.
Study 3: Intervention study. Data collection: 2012-13. The effects of training on SFL reading and writing.
-
Finnish-speaking learners of English as FL
primary school, 4th grade (age 10; N = 210)
lower secondary school, 8th grade (age 14; N = 208)
Gymnasium, 2nd year students (age 17; N = 218)
Russian-speaking learners of Finnish as SL
primary school (3rd-6th grade; N = 186)
lower secondary school (7th-9th grade; N = 78)
-
Independent predictor variables in L1 and FL
-
Instruments in DIALUKI STUDY ONE
English as a Foreign Language
Group tasks
-
QUESTIONNAIRES | 4th grade (age 10-11) | 8th grade (age 14-15) | Gymnasium (age 17-18)
Parents' questionnaire | X | X | X
Students' questionnaire | X | X | X
Motivational questionnaire | 49 statements | 58 statements | 58 statements
Self-assessment: reading & writing L1 (2 x 18 items) | DIALANG, listed for two of the three groups
Self-assessment: reading & writing L2 (2 x 18 items) | DIALANG, listed for two of the three groups
-
LINGUISTIC MEASURES | 4th grade (age 10-11) | 8th grade (age 14-15) | Gymnasium (age 17-18)
Reading L1 | ALLU (1 text, 12 items) | PISA 2009 (3 texts, 11 items) | PISA 2009 (3 texts, 11 items)
Reading L2 | Pearson Young Learners (20 items) | Pearson PTE General (25 items); Dialang (30 items) | Pearson PTE General (25 items); Dialang (30 items)
Writing L1 | An opinion: Mobile phones / Internet | An opinion: School food / Summer job; A complaint | An opinion: School food / Summer job; A complaint
Writing L2 | A message to a friend; How do you travel? | An opinion: Mobile phones / Boys and girls on different classes; An article | An opinion: Mobile phones / Boys and girls on different classes
Vocabulary L1 | Dialang (75 items) | Dialang (75 items) | Dialang (75 items)
Vocabulary L2 | Selected from 1000 most common English words (60 items) | Selected from 3000 most common English words (90 words) | Selected from 5000 most common English words + AWL (120 words)
Segmentation L1 | Text: Isoisä (Grandpa), 36 items | Text: Lilli (73 items) | Text: Lilli (73 items)
Segmentation L2 | Text: Little pigs (51 items) | Text: Australia (59 items) | Text: Coffee (71 items)
Typing errors L1 | NMI test (100 words / 3 min 30 sec), listed for two of the three groups
Dictation L2 | 12 units with 2-4 words (32 words) | 10 units with 3-11 words (52 words) | 12 units with 3-11 words (77 words)
-
Instruments in DIALUKI STUDY ONE
English as a Foreign Language
Individual tasks (1)
-
PSYCHOLINGUISTIC AND COGNITIVE TASKS | 4th grade (age 10-11) | 8th grade (age 14-15) | Gymnasium (age 17-18)
Backwards digit span L1 | 2-8 digits, 14 items (numbers 1-9) | 2-8 digits, 14 items (numbers 1-9) | 2-8 digits, 14 items (numbers 1-9)
Backwards digit span L2 | 2-5 digits, 8 items (numbers 1-6) | 2-5 digits, 8 items (numbers 1-6) | 2-5 digits, 8 items (numbers 1-6)
Rapidly presented words L1 | 14 words (2-8 letters) | 14 words (2-8 letters) | 14 words (2-8 letters)
Rapidly presented words L2 | 8 words (2-4 letters) | 12 words (2-9 letters) | 12 words (2-9 letters)
List reading L1 | 105 words, time limit 60 sec (Lukilasse) | 105 words, time limit 60 sec (Lukilasse) | 105 words, time limit 60 sec (Lukilasse)
List reading L2 | 105 words, time limit 60 sec | 105 words, time limit 60 sec | 105 words, time limit 60 sec
Non-word reading L1 (mlkenti) | 10 non-words with 3-4 syllables | 10 non-words with 3-4 syllables | 10 non-words with 3-4 syllables
Non-word reading L2 (kipthirm) | 10 non-words (Snowling et al 1996: Graded Nonword Reading Test), listed for two of the three groups
Non-word repetition L1 (vrelyytti) | 10 non-words with 2-5 syllables | 10 non-words with 2-5 syllables | 10 non-words with 2-5 syllables
-
PSYCHOLINGUISTIC AND COGNITIVE TASKS | 4th grade (age 10-11) | 8th grade (age 14-15) | Gymnasium (age 17-18)
Non-word repetition L2 (bassodoke) | 10 non-words (selected from Gupta et al 2005) | 10 non-words (selected from Gupta et al 2005) | 10 non-words (selected from Gupta et al 2005)
Non-word spelling L1 (peunumiile) | 12 non-words with 4 syllables | 12 non-words with 4 syllables | 12 non-words with 4 syllables
Phoneme deletion L1 (hamsa → hama) | 12 non-words with 1-3 syllables | 12 non-words with 1-3 syllables | 12 non-words with 1-3 syllables
Phoneme deletion L2 (nolcrid → olcrid) | 8 non-words | 10 non-words | 10 non-words
Common unit L1 (lauhkua - terike) | 10 pairs of non-words | 10 pairs of non-words | 10 pairs of non-words
Common unit L2 (filk - maf) | 10 pairs of non-words, listed for two of the three groups
Rapid automatic naming L1 | Mixed list of numbers, letters and colours (50 items) | Mixed list of numbers, letters and colours (50 items) | Mixed list of numbers, letters and colours (50 items)
Rapid automatic naming L2 | Mixed list of numbers, colours and objects (30 items) | Mixed list of numbers, letters and colours (50 items) | Mixed list of numbers, letters and colours (50 items)
-
Example Instruments
-
Reading rapidly presented words
***
-
Reading rapidly presented words
day
-
Reading rapidly presented words
%
-
Cognitive and psycholinguistic tasks (2)
RAN Rapid Automatized Naming L1 and FL
Mixed stimuli:
numbers, letters and colours (L1)
numbers, objects and colours (FL)
-
Backward digit span memory test in L1 and FL
repeat the numbers you hear but backwards
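
As a rough illustration of how responses on a backward digit span task could be scored, here is a minimal Python sketch. The scoring rule (count of correct trials plus the longest sequence reversed correctly) and the function names are assumptions for illustration only; the slides do not say how DIALUKI scored this task.

```python
def is_correct(presented, response):
    """A response is correct when it repeats the presented digits in reverse order."""
    return list(response) == list(reversed(presented))


def score_backward_span(trials):
    """Score a list of (presented, response) trials.

    Returns (number of correct trials, length of the longest correctly
    reversed sequence). This scoring rule is a hypothetical choice.
    """
    n_correct = 0
    longest = 0
    for presented, response in trials:
        if is_correct(presented, response):
            n_correct += 1
            longest = max(longest, len(presented))
    return n_correct, longest


# Invented example data, not taken from the study.
trials = [
    ([3, 7], [7, 3]),              # correct
    ([4, 1, 9], [9, 1, 4]),        # correct
    ([2, 8, 5, 6], [6, 5, 2, 8]),  # incorrect: the last two digits are swapped
]
print(score_backward_span(trials))
```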
-
Rapid reading (aloud) of a list of real L1 words
read as many as you can in one minute
-
Non-word reading task
L1: 1. viepere, 2. larvaanto, 3. mlkenti, 4. seivolssi, 5. euksatus, 6. kylmnsi, 7. hiemakkola, 8. sertsapeivo, 9. vaastiloima, 10. ahkontalsi
L2: 1. hast, 2. mosp, 3. prab, 4. gromp, 5. trolb, 6. tegwop, 7. molsmit, 8. twamket, 9. hinshink, 10. kipthirm
-
Non-word repetition task
L1: 1. seitu, 2. ronksa, 3. minksakka, 4. kletsoma, 5. vrelyytti, 6. peunivatna, 7. ysipulentti, 8. restomeliitti, 9. plotiskntsingis, 10. intjirinanttiin
L2: 1. bassim, 2. peggut, 3. bipup, 4. gaypoom, 5. bassodoke, 6. kotiesote, 7. doosennane, 8. keegulol, 9. beenodoofop, 10. daysomaysice
-
Common Unit task
L1: 1. lauhkua - terike, 2. mustele - kyhinty, 3. vommiras - thmykkyyn, 4. tookselo - murlain, 5. vapi - lumpe, 6. vaaso - leikua, 7. hirattu - vnkki, 8. kanttuuso - vyyrt, 9. aamestus - hilpialli, 10. tlkys - angilme
L2: 1. mip - pank, 2. auk - honch, 3. skey - twisp, 4. brang - peb, 5. kelpit - membro, 6. madast - wordle, 7. prinkle - mapgom, 8. sloskon - nagar, 9. larsk - mambron, 10. filk - maf
-
Phoneme deletion task
L1: 1. Tauk → auk, 2. Hok → ok, 3. Peuk → euk, 4. gooK → goo, 5. hamSa → hama, 6. pokRi → poki, 7. mesTo → meso, 8. puLke → puke, 9. kelaMpa → kelapa, 10. makalTo → makalo, 11. sinepTe → sinepe, 12. halneSko → halneko
L2: 1. kisP → kis, 2. Drant → rant, 3. Apren → pren, 4. balraS → balra, 5. Nolcrid → olcrid, 6. stanseRt → stanset, 7. dockOAn → dockn, 8. pronaTE → prona, 9. driggLE → drigg, 10. norCH → nor
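
In the items above, the letters to be deleted appear to be marked in upper case (hamSa becomes hama, norCH becomes nor). Assuming that convention, a short Python sketch can derive the expected answer from each stimulus and check a learner's written response against it. The function names are hypothetical, and the real task may well have been administered and scored orally.

```python
def expected_answer(stimulus):
    """Drop the upper-case (marked) letters to obtain the target form."""
    return "".join(ch for ch in stimulus if not ch.isupper())


def is_correct(stimulus, response):
    """Compare a response with the target form, ignoring case and outer spaces."""
    return response.strip().lower() == expected_answer(stimulus)


# Stimuli copied from the L2 list on the slide.
for stimulus in ["kisP", "Drant", "dockOAn", "pronaTE", "norCH"]:
    print(stimulus, "->", expected_answer(stimulus))
```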
-
Segmentation task in L2 (4th graders' version)
Example:
|thepigsweresohappytheysangthissong|
|the|pigs|were|so|happy|they|sang|this|song|
Task:
|sothenextdaythethreelittlepigslefthomethefirstpigmadeahomefromstrawthesecondpig|
|madeahomefromsticksbutthethirdpigwascleverhemadehishomefrombricksonedaythebig|
|badwolfcametothestrawhouseheknockedonthedoor|
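
One way to score a segmentation response is to compare the word boundaries a learner marks with those in an answer key. The Python sketch below does this for the example sentence on the slide. The metric (proportion of key boundaries found, with misses and false alarms counted separately) is an assumption for illustration; the slides do not define how "segmentation accuracy" was computed in DIALUKI, and the learner response is invented.

```python
def boundary_positions(segmented, sep="|"):
    """Return the character offsets at which a word boundary was marked."""
    positions = set()
    offset = 0
    words = segmented.strip(sep).split(sep)
    for word in words[:-1]:  # no boundary is counted after the final word
        offset += len(word)
        positions.add(offset)
    return positions


def score_segmentation(key, response, sep="|"):
    """Compare marked boundaries with the key (hypothetical scoring rule)."""
    key_pos = boundary_positions(key, sep)
    resp_pos = boundary_positions(response, sep)
    hits = len(key_pos & resp_pos)
    return {
        "hits": hits,
        "missed": len(key_pos - resp_pos),
        "false_alarms": len(resp_pos - key_pos),
        "accuracy": hits / len(key_pos) if key_pos else 0.0,
    }


key = "|the|pigs|were|so|happy|they|sang|this|song|"
# Invented response: misses the so/happy boundary and splits "song".
response = "|the|pigs|were|sohappy|they|sang|this|so|ng|"
result = score_segmentation(key, response)
print(result)
```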
-
L2 Vocabulary, OSA 1 (Part 1), 2000
1 birth 2 dust 3 operation 4 row 5 sport 6 victory
___ urheilu
___ voitto
___ syntyminen
1 adopt 2 climb 3 examine 4 pour 5 satisfy 6 surround
___ kiivetä, nousta
___ katsoa tarkasti
___ olla joka puolella
1 choice 2 crop 3 flesh 4 salary 5 secret 6 temperature
___ lämpö
___ liha
___ palkka
1 bake 2 connect 3 inquire 4 limit 5 recognize 6 wander
___ yhdistää
___ kävellä ilman päämäärää
___ rajoittaa
1 cap 2 education 3 journey 4 parent 5 scale 6 trick
___ koulutus
___ asteikko
___ matka
1 burst 2 concern 3 deliver 4 fold 5 improve 6 urge
___ särkyä, puhjeta
___ tehdä paremmaksi
___ viedä jotakin jollekulle
1 attack 2 charm 3 lack 4 pen 5 shadow 6 treasure
___ aarre
___ lumous, viehätysvoima
___ puuttua, olla vailla jotakin
1 original 2 private 3 royal 4 slow 5 sorry 6 total
___ alkuperäinen
___ yksityinen
___ yhteensä
1 cream 2 factory 3 nail 4 pupil 5 sacrifice 6 wealth
___ kerma
___ rikkaudet, varallisuus
___ oppilas
1 brave 2 electric 3 firm 4 hungry 5 local 6 usual
___ tavallinen
___ nälkäinen
___ urhea, rohkea
-
Motivation
English Self-concept
Intrinsic interest
Instrumentality
Motivational Intensity
Parental Encouragement
Self-regulation
Anxiety
-
ENGLISH SELF-CONCEPT
Compared to other students, I'm good at English
I have always done well in English.
Studying English is easy for me.
I get good marks in English.
I learn English quickly.
I'm better at English than most of my classmates.
Items dropped
I am hopeless when it comes to English
I am satisfied with how well I do in English.
-
Independent variables: Students
How much homework do you normally do during a normal school day?
o Not at all
o Half an hour or less a day
o From half an hour to an hour a day
o 1-2 hours a day
o Over 2 hours a day
How do you feel about reading in your free time?
o I like reading a lot
o I like reading somewhat
o I don't like reading
-
Independent variables: Students
How often do you read the following things in your free time?
Response options: Daily or nearly daily | 1-2 times a week | 1-2 times a month | Rarely or never
I read:
a) text messages
b) email
c) Facebook or Twitter conversations
d) messages in chats (e.g. MSN, IRC)
e) internet chat forums
f) blogs or home pages
g) news or other newspaper articles online
h) online non-fiction texts (e.g. Wikipedia)
-
Independent variables: Parents
Parents' education (answered for a) child's mother and b) child's father)
1 Compulsory school
2 Vocational school or institute
3 Gymnasium
4 Bachelor's degree
5 Master's degree
Parents' occupation (answered for a) child's mother and b) child's father)
1 Working
2 Retired
3 Student
4 Housewife/husband
5 Unemployed
-
Independent variables: Parents
Before the child learned to read, was somebody in the family engaged in the following activities with the child?
Response scale: 1 Rarely or never | 2 1-2 times a month | 3 1-2 times a week | 4 Every day or nearly every day
a) Read books or told stories
b) Talked about everyday activities or events
c) Sang
d) Played with letter-toys (e.g. blocks)
e) Played word-games
f) Wrote letters or words
g) Read aloud signs or labels
-
Structural Equation Modelling (SEM) Cognitive variables, 4th graders
Three latent variables (path model)
-
Structural Equation Modelling (SEM) Cognitive variables, Gymnasium
Three latent variables (path model)
-
4th Grade
Dependent variable: Pearson Young Learners Test in English
Adjusted R Squared: .526 (53% variance)
First IV: Size of English Vocab (.664)
Second IV: Writing in L1 Finnish (.419)
Third IV: L2 segmentation accuracy (-.584)
Fourth IV: L1 Finnish Reading (ALLU) (.403)

8th Grade
Dependent variable: Pearson General + DIALANG Medium
Adjusted R Squared: .671 (67% variance)
First IV: Size of English Vocab (.740)
Second IV: Writing in English (.696)
Third IV: L2 segmentation accuracy (-.641)
Fourth IV: Size of Finnish Vocab (.282)

Gymnasium
Dependent variable: Pearson General + DIALANG Advanced
Adjusted R Squared: .708 (71% variance)
First IV: English dictation (.795)
Second IV: Size of English Vocab (.747)
Third IV: L1 Finnish Reading (PISA) (.418)
Fourth IV: L2 segmentation accuracy (-.677)
Fifth IV: Writing in English (.680)
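
For readers less familiar with the statistic reported here, adjusted R squared corrects the ordinary R squared for the number of predictors in a regression: adj R² = 1 - (1 - R²)(n - 1)/(n - p - 1). The Python sketch below implements that standard formula and shows how the reported values round to the "% variance" figures. The sample size and unadjusted R squared in the usage lines are invented for illustration, since the slides report neither.

```python
def adjusted_r_squared(r_squared, n_obs, n_predictors):
    """Standard adjustment of R squared for the number of predictors."""
    denominator = n_obs - n_predictors - 1
    if denominator <= 0:
        raise ValueError("need more observations than predictors plus one")
    return 1 - (1 - r_squared) * (n_obs - 1) / denominator


def as_percent_variance(adj_r2):
    """Express adjusted R squared as a whole percentage of variance explained."""
    return round(adj_r2 * 100)


# Hypothetical inputs: R squared of 0.54 with 200 learners and 4 predictors.
print(adjusted_r_squared(0.54, 200, 4))

# The adjusted values reported on the slide, converted to percentages.
for value in (0.526, 0.671, 0.708):
    print(value, "->", as_percent_variance(value), "%")
```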
-
Positive or negative?
On teaching? On learning?
On content? On method?
On rate and sequence of learning?
On degree and depth of learning?
On attitudes?
On all teachers and learners?
On some teachers and learners?
-
Curriculum: contents of curriculum, timetabling
Teaching materials: choice of textbooks, use of past papers, teacher-made materials
Teaching methods: choice of methods, teaching of test-taking skills
Attitudes and feelings: of learners and teachers
Learning: Do test results improve? Does learning improve?
-
What might be the possible unintended negative consequences of diagnostic testing? The fact is that so far we have no research into the washback or impact of diagnostic tests. Empirical research is urgently needed: How might such research be designed and conducted?
-
Thank you for your attention!