Simplifying reading: Implications for instruction
Janet Vousden, University of Warwick
Michelle Ellefson, Nick Chater, Jonathan Solity
Overview
English spelling-to-sound inconsistency and reading
rational analysis of English reading
applying the simplicity principle
analysis of some common reading programmes
Spelling-to-sound mappings
spelling-to-sound mappings in English are not transparent at the sub-lexical level
some spellings are consistent: “ck”: duck - /dʌk/, mock - /mɒk/, etc., and a simple grapheme-phoneme rule will suffice: ck - /k/
others are not: “ea”: beach - /biːtʃ/, real - /rɪəl/, great - /ɡreɪt/, or head - /hɛd/
most obvious at the grapheme level - the “ou” grapheme is credited with having 10 different pronunciations (Gontijo, Gontijo, & Shillcock, 2003), e.g., round, group, should, four, country, tenuous, soul, journal, cough, pompous
an overall measure of (in)consistency in a language is its orthographic depth (OD): the average number of pronunciations per grapheme
for English, orthographic depth estimates are 2.1 - 2.4 for polysyllabic text (Berndt, Reggia, & Mitchum, 1987; Gontijo, Gontijo, & Shillcock, 2003) and 1.7 for monosyllabic text (Vousden, 2008)
compare e.g. Serbo-Croat, which has an OD of 1
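As a rough illustration of the orthographic depth measure, the sketch below averages the number of pronunciations per grapheme over a toy table. The table holds only the "ck" and "ea" examples from this slide, and the unweighted mean is an assumption; the published estimates may weight graphemes differently.

```python
# Toy inventory built only from the examples on this slide (not a full corpus).
pronunciations = {
    "ck": {"k"},                      # duck, mock
    "ea": {"iː", "ɪə", "eɪ", "ɛ"},    # beach, real, great, head
}

def orthographic_depth(table):
    """Unweighted mean number of pronunciations per grapheme (assumed definition)."""
    return sum(len(sounds) for sounds in table.values()) / len(table)

print(orthographic_depth(pronunciations))  # 2.5
```

A fully consistent orthography would give a value of 1 under this definition.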
how do literacy levels in English compare with other languages?
Data: % correct reading scores (adapted from Seymour, Aro, & Erskine, 2003).

Language           Real-words   Non-words
Greek              98           92
Finnish            98           95
German             98           94
Italian            95           89
Spanish            95           89
Swedish            95           88
Dutch              95           82
Icelandic          94           86
Norwegian          92           91
French             79           85
Portuguese         73           77
Danish             71           54
Scottish English   34           29

can differences in consistency account for the difficulty in learning to read English?
yes - inconsistency clearly increases difficulty of learning to read compared with more consistent languages (Frith, Wimmer & Landerl, 1998)
Data: non-word reading accuracy (reproduced from Frith, Wimmer, & Landerl, 1998)
[Figure: % correct (0-100) against age (6-13); series: German, English]
lag in performance persists through school years
Most often, vowel graphemes are inconsistent, but immediate context can be used to resolve ambiguity
C V C - C V or V C
ambiguity can be resolved by considering the following consonant (a rime unit) rather than the previous consonant (Treiman et al., 1995)
“ea” is pronounced to rhyme with breath ~80% of the time when followed by ‘d’
“ea” is pronounced to rhyme with meat 100% of the time when followed by ‘p’
also, rime units are more consistent than graphemes: 23% of graphemes are inconsistent, 15% of rimes are inconsistent
Choosing spelling-to-sound mappings
so many to choose from: ~2000 rime mappings, ~300 grapheme mappings
and many are inconsistent: 15% of rimes, 23% of graphemes
variety of approaches from reading schemes (Rhymeworld, THRASS, etc.)
influences from developmental literature (do rimes or GPCs predict reading ability?)
Rational analysis
Attempt to explain behaviour in terms of adaptation to the environment, independent of details of cognitive architecture
e.g., Anderson & Schooler (1991) showed that the probability that a memory will be needed over time matches the availability of human memories
the same factors that predict memory performance also predict the odds that an item will be needed, i.e. reliable effects of recency and frequency
Solution adopted by cognitive architecture should reflect structure of environment
factors that affect performance of skilled readers should be reflected in the statistical structure of the language, e.g. frequency and consistency
by examining linguistic factors that skilled readers have adapted to, could the input be more optimally structured for learners?
effects of word frequency in naming and lexical decision
effects of rime frequency on word-likeness judgements and pronunciation
effects of grapheme frequency in letter search and word priming experiments
Analyses of spelling-to-sound mappings
rational analysis predicts that the most frequent and consistent mappings best predict pronunciation
interested in the frequency & consistency of mappings at the level of words, rimes, and graphemes, and their ability to predict correct pronunciation
CELEX database: 7,297 different monosyllabic words, 10,924,491 words in total
Words
[Figure: proportion of text read (0-1) against number of words (0-500)]
[Figure: frequency in 100,000's (0-14) against rank order of frequency (0-200)]
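The "proportion of text read" curve can be sketched as cumulative coverage: sort word types by frequency and ask what share of running text the top N account for. The sample sentence and the `coverage` helper below are illustrative assumptions, not the CELEX data or the authors' code.

```python
from collections import Counter

def coverage(tokens, n_words):
    """Proportion of running text covered by the n_words most frequent word types."""
    counts = Counter(tokens)
    covered = sum(freq for _, freq in counts.most_common(n_words))
    return covered / len(tokens)

# Illustrative sample text, not the CELEX corpus.
tokens = "the cat sat on the mat and the dog sat on the log".split()
for n in (1, 3, 8):
    print(n, round(coverage(tokens, n), 2))
```

With a skewed frequency distribution, the first few word types account for a disproportionate share of the text, which is the pattern the figures describe.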
Onsets and rimes
Exclude the 100 most frequent words: 7,197 different words, total of 2,263,264 words
Create table of onset and rime mapping frequencies, remove all but the most frequent of inconsistent mappings

All mappings:
Rime            Frequency
oul - /əʊl/     731
oul - /aʊl/     175
oul - /uːl/     12
ove - /ʌv/      4779
ove - /uːv/     1852
ove - /əʊv/     838

Mappings retained:
Rime            Frequency
oul - /əʊl/     731
ove - /ʌv/      4779
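The filtering step (keep only the most frequent pronunciation of each inconsistent spelling) can be sketched as follows, using the "oul" and "ove" counts from the table. The function name and the tuple layout are assumptions made for illustration.

```python
# (spelling, pronunciation, frequency) rows copied from the slide's table.
rime_freqs = [
    ("oul", "əʊl", 731), ("oul", "aʊl", 175), ("oul", "uːl", 12),
    ("ove", "ʌv", 4779), ("ove", "uːv", 1852), ("ove", "əʊv", 838),
]

def keep_most_frequent(mappings):
    """For each spelling, retain only its most frequent pronunciation."""
    best = {}
    for spelling, sound, freq in mappings:
        if spelling not in best or freq > best[spelling][1]:
            best[spelling] = (sound, freq)
    return [(spelling, sound, freq) for spelling, (sound, freq) in best.items()]

print(keep_most_frequent(rime_freqs))
```

The same helper would apply unchanged to onset or GPC tables, since it only relies on the spelling acting as the grouping key.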
[Figure: Onsets - frequency in 1000's (0-500) against rank order of frequency (0-50)]
[Figure: Rimes - frequency in 1000's (0-50) against rank order of frequency (0-100)]
[Figure: proportion of text read (%) (0-60) against number of rimes (0-100); series: 0, 10, 20, 30, 40 and 80 onsets]
GPCs
exclude the 100 most frequent words: 7,197 different words, total of 2,263,264 words
create table of GPC mapping frequencies, remove all but the most frequent of inconsistent mappings

All mappings:
GPC           Frequency
g - /g/       133930
g - /dʒ/      33342
g - /ʒ/       98
i - /ɪ/       153606
i - /aɪ/      46628
i - /iː/      455

Mappings retained:
GPC           Frequency
g - /g/       133930
i - /ɪ/       153606
[Figure: GPCs - frequency in 100,000's (0-10) against rank order of frequency (0-80)]
[Figure: percentage of text read (0-60) against number of GPCs (0-100)]
[Figure: percentage of text read (70-100) against number of mappings (0-300); series: Grapheme-Phoneme, Onset/Rime]
Summary
some words are much more frequent than others, therefore sight vocabulary is very effective for a small number of words, up to ~100
sub-lexical units also have a skewed frequency distribution, and learning the most frequent mappings predicts high potential outcome
high initial gains with GPCs, greater overall gain with rimes in the long run
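The value of a ~100-word sight vocabulary can be checked from the corpus totals reported earlier: 10,924,491 tokens in total, and 2,263,264 once the 100 most frequent words are excluded. A minimal sketch of the arithmetic (variable names are illustrative):

```python
total_tokens = 10_924_491            # all monosyllabic word tokens (CELEX figure from the slides)
tokens_without_top_100 = 2_263_264   # tokens left after excluding the 100 most frequent words

top_100_tokens = total_tokens - tokens_without_top_100
share = top_100_tokens / total_tokens
print(f"{share:.1%}")  # 79.3%
```

On these figures, the 100 most frequent words account for roughly four fifths of the running monosyllabic text.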
What is the optimal size unit to learn?
Can we measure the potential benefit from, and cost of, remembering mappings for: GPCs, onset/rimes, or a combination of both?
Potential benefits for reading outcome are larger for onset/rimes, but is this outweighed by the cost of remembering many more mappings?
The Simplicity Principle
reading, like much high-level cognition, involves finding patterns in data, but many patterns are compatible with any finite set of data - so how does the cognitive system choose from the possibilities?
Using the simplicity principle, choose the simplest explanation of the data - intuitive, and has a long history (Occam’s razor)
can quantify simplicity by measuring the (shortest) description from which the data can be reconstructed - trade off brevity against goodness of fit
cognition as compression
implement with minimum description length (MDL): more regularity = more compression; no regularity = no compression, just reproduce the data
can measure compression with Shannon’s (1948) coding theorem - more probable events are assigned shorter code lengths:
length/bits = log2(1/p)
measure code length to specify:
hypothesis about data (mappings)
data, given hypothesis (decoding accuracy, given mappings)
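The code-length formula can be tried out directly. The sketch below is a plain transcription of length = log2(1/p); the function name and the input check are additions for illustration.

```python
import math

def code_length_bits(p):
    """Shannon code length in bits for an event of probability p."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    return math.log2(1 / p)

for p in (1.0, 0.5, 0.25):
    print(p, code_length_bits(p))  # 0.0, 1.0 and 2.0 bits respectively
```

Halving the probability adds one bit, so rare mappings and rare decoding outcomes are the expensive ones to describe.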
Method
for each mapping size, determine the code length to describe:
the mappings
decoding accuracy, given the mappings
determine mappings & frequencies from monosyllabic corpus of children’s reading materials (Stuart et al., 2003), for mapping sizes: words, CV/C (head/coda), C/VC (onset/rime), GPCs

Table 1. A list of reading schemes/series used by over a third of schools in the survey

Name of scheme               % using scheme   Included in database?
Ginn 360                     74%              Yes
Storychest                   58%              Yes
Magic Circle                 58%              Yes
1 2 3 and Away               50%              Yes
Griffin Pirates              43%              Yes
Breakthrough to Literacy     41%
Bangers and Mash             40%              Yes
Wide range readers           38%              Yes
Dragon Pirates               37%              Yes
Through the rainbow          34%
Ladybird read-it-yourself    33%              Yes
Humming birds                32%
Thunder the dinosaur         29%              Yes
Link Up                      29%
Gay Way                      27%              Yes
Monster                      27%              Yes
Oxford Reading Tree          27%              Yes
Once Upon a Time             26%              Yes
Trog                         26%
Example mappings for each mapping size:

words    CV/C                   C/VC                   GPCs
         letter  sound  freq    letter  sound  freq    letter  sound  freq
wiː      bi      bI     1       d       d      7       t       t      17
kæn      ca      kæ     1       g       g      4       i       I      11
bʌt      do      dɒ     1       f       f      1       a       æ      11
ɒn       fro     frɒ    2       wh      w      1       g       g      7
miː      ra      ræ     1                              k       k      4
                                                       c       k      4
bæk      n       n      9       an      æn     4       ay      eɪ     2
kʌm      k       k      1       og      ɒg     2       a*e     eɪ     1
frɒg     lp      lp     1       elp     elp    1
Code length for mappings
for a whole-word entry, e.g. wiː:
length = log2(1/p(w)) + log2(1/p(iː)) + log2(1/p(newline))
for a sub-lexical entry, e.g. the CV/C mapping bi - bI:
length = log2(1/p(b)) + log2(1/p(i)) + log2(1/p(space)) + log2(1/p(b)) + log2(1/p(I)) + log2(1/p(newline))
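The per-entry sums can be sketched in code. The slides do not say how the symbol probabilities p(.) are estimated, so this sketch assumes they are relative frequencies over the whole serialized mapping list, with letters, sounds, `<space>` and `<newline>` treated as a single alphabet. The helper names are hypothetical.

```python
import math
from collections import Counter

def symbol_probabilities(entries):
    """Relative frequency of each symbol across all entries (assumed estimator)."""
    counts = Counter(sym for entry in entries for sym in entry)
    total = sum(counts.values())
    return {sym: n / total for sym, n in counts.items()}

def entry_code_length(entry, probs):
    """Sum of log2(1/p) over the symbols needed to write one mapping."""
    return sum(math.log2(1 / probs[sym]) for sym in entry)

# Two CV/C entries from the example table: letters, separator, sounds, terminator.
entries = [
    ["b", "i", "<space>", "b", "I", "<newline>"],
    ["c", "a", "<space>", "k", "æ", "<newline>"],
]
probs = symbol_probabilities(entries)
total_bits = sum(entry_code_length(e, probs) for e in entries)
print(round(total_bits, 1))
```

Summing over all entries gives the cost of stating the hypothesis; a longer rule list, or one full of rare symbols, costs more bits.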
Code length for decoding accuracy
apply letter-to-sound rules to produce a list of pronunciations
bread: breId, bri:d, brɛd
arrange in rank order of most probable (computed from letter-to-sound frequencies) & note rank of correct pronunciation
bread: bri:d, brɛd, breId
code length for data, given hypothesis = log2(1/p(rank=2))
[Figure: log2(1/p) (0-14) against p (0-1)]
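The ranking step for "bread" can be sketched as below. The GPC counts are invented for illustration, candidate probability is taken as the product of each mapping's relative frequency (an assumption about how "most probable" is computed), and p(rank) is estimated from how often each rank occurs across a list of words.

```python
import math
from collections import Counter
from itertools import product

# Hypothetical grapheme -> {phoneme: count} table; counts are illustrative only.
gpc_freqs = {
    "b": {"b": 10},
    "r": {"r": 10},
    "ea": {"iː": 6, "ɛ": 3, "eɪ": 1},
    "d": {"d": 10},
}

def ranked_pronunciations(graphemes, table):
    """All candidate pronunciations, most probable first."""
    options = []
    for g in graphemes:
        total = sum(table[g].values())
        options.append([(ph, n / total) for ph, n in table[g].items()])
    candidates = []
    for combo in product(*options):
        pron = "".join(ph for ph, _ in combo)
        prob = math.prod(p for _, p in combo)
        candidates.append((pron, prob))
    return sorted(candidates, key=lambda c: c[1], reverse=True)

def rank_of(correct, ranked):
    """1-based position of the correct pronunciation in the ranked list."""
    return [pron for pron, _ in ranked].index(correct) + 1

def rank_code_lengths(ranks):
    """Bits needed to state each rank, from the observed distribution of ranks."""
    counts = Counter(ranks)
    return {r: math.log2(len(ranks) / n) for r, n in counts.items()}

ranked = ranked_pronunciations(["b", "r", "ea", "d"], gpc_freqs)
rank = rank_of("brɛd", ranked)  # 2 here, matching the slide's example
print(rank, rank_code_lengths([1, 1, 2, 1]))
```

When most words decode correctly at rank 1, stating the rank is cheap; every word whose correct pronunciation sits further down the list adds bits to the data-given-hypothesis term.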
Simulations
how does code length vary as a function of size of vocabulary for each unit size?
optimize number of mappings by removing those whose removal reduces total code length
overall comparison between different unit sizes for whole vocabulary
compare different reading schemes
Comparing different unit sizes for whole vocabulary
[Figure: cost/bits (0-90,000) by rule size (Words, CV/C, C/VC, GPCs); series: rules, acc, tot]
Code length as a function of vocabulary size
[Figure: cost/bits (0-90,000) against vocabulary size (0-3000); series: words, CV/C, C/VC, GPC]
[Figure: cost/bits (0-4,000) against vocabulary size (10-90); series: words, CV/C, C/VC, GPC]
Optimizing number of mappings

               All mappings                Mappings remaining
Mapping size   N      Total code length    N      Total code length
Words          3000   77,825               -      -
Onset/rimes    1141   48,612               404    29,420
GPCs           240    10,845               114    8,536

GPCs: Description length reduced by removing mainly inconsistent, low frequency mappings
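One way to picture the optimization is a greedy backward pass: drop any mapping whose removal shortens the total code length, and repeat until nothing improves. The slides do not state the search procedure, so the greedy loop is an assumption, and the toy cost function and counts below are invented to keep the example self-contained.

```python
def prune_mappings(mappings, total_code_length):
    """Greedily drop mappings while doing so reduces the total code length."""
    kept = list(mappings)
    best = total_code_length(kept)
    improved = True
    while improved:
        improved = False
        for m in list(kept):
            trial = [k for k in kept if k != m]
            cost = total_code_length(trial)
            if cost < best:
                kept, best, improved = trial, cost, True
    return kept, best

# Toy costs (illustrative, not from the slides): each rule costs 12 bits to state,
# and each token whose mapping is missing costs 3 extra bits to decode.
freqs = {"i-/ɪ/": 40, "i-/aɪ/": 10, "i-/iː/": 1, "g-/g/": 30, "g-/ʒ/": 2}
RULE_BITS, MISS_BITS = 12, 3

def toy_total(kept):
    rules = RULE_BITS * len(kept)
    misses = MISS_BITS * sum(n for m, n in freqs.items() if m not in kept)
    return rules + misses

kept, best = prune_mappings(list(freqs), toy_total)
print(kept, best)
```

In this toy setting only the two rarest mappings are dropped, since stating them costs more than the decoding errors they prevent; that mirrors the observation that mainly inconsistent, low frequency mappings are removed.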
Comparing different reading schemes

Scheme                          N GPC rules
Jolly Phonics                   43
Hutzler et al. (2004)           67
ERR (Solity & Vousden, 2008)    77
Letters & Sounds                94
THRASS                          106
[Figure: cost/bits (0-30,000) by scheme: Jolly N=43, Hutzler N=71, ERR N=77, LettSou N=94, THRASS N=106, simplicity N=111, all N=240; series: rules, acc, total]
[Figure: cost/bits (0-30,000) against N rules (43, 71, 77, 94, 106, 111, 240); series: rules, acc, total]
Decoding accuracy by scheme
[Figure: % correct (0-80) by scheme: Jolly N=43, Hutzler N=71, ERR N=77, LettSou N=94, THRASS N=106, all N=240, simplicity N=111; series: Schemes, Simplicity]
ERR implemented as a reading intervention in 12 Essex schools:
increase in reading scores significantly greater for ERR schools
[Figure: BAS raw score (0-60) by school age (base, YR, Y1, Y2); series: Comparison, ERR]
Data: from Shapiro & Solity (2008)

                             Comparison   ERR
reading difficulty           20%          5%
serious reading difficulty   5%           1%
Some conclusions
a small amount of sight vocabulary accounts for a large proportion of text, but only small vocabularies are most simply described by whole words
Complements recent work by Treiman and colleagues that shows children learn better when the association between sound and print is non-arbitrary
As a homogeneous set, GPCs provide a simpler explanation of the data
choosing the best set could be important