Simplifying reading: Implications for instruction

Janet Vousden, University of Warwick

Michelle Ellefson, Nick Chater, Jonathan Solity

Overview

English spelling-to-sound inconsistency and reading

rational analysis of English reading

applying the simplicity principle

analysis of some common reading programmes

Spelling-to-sound mappings

spelling-to-sound mappings in English are not transparent at the sub-lexical level

some spellings are consistent: “ck”: duck - /dʌk/, mock - /mɒk/, etc., and a simple grapheme-phoneme rule will suffice: ck - /k/

others are not: “ea”: beach - /biːtʃ/, real - /rɪəl/, great - /ɡreɪt/, or head - /hɛd/

most obvious at the grapheme level: the “ou” grapheme is credited with having 10 different pronunciations (Gontijo, Gontijo, & Shillcock, 2003), e.g., round, group, should, four, country, tenuous, soul, journal, cough, pompous

an overall measure of (in)consistency in a language is its orthographic depth: the average number of pronunciations per grapheme

for English, orthographic depth estimates are 2.1 - 2.4 for polysyllabic text (Berndt, Reggia, & Mitchum, 1987; Gontijo, Gontijo, & Shillcock, 2003) and 1.7 for monosyllabic text (Vousden, 2008)

compare e.g. Serbo-Croat, which has an orthographic depth of 1
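As a rough illustration of the orthographic depth measure, the Python sketch below averages the number of distinct pronunciations per grapheme. The tiny mapping table contains only the “ck” and “ea” examples from this slide, so the value it prints is illustrative and is not an estimate for English. The function name `orthographic_depth` is an assumption made for this sketch.

```python
def orthographic_depth(mappings):
    """Average number of distinct pronunciations per grapheme."""
    total = sum(len(set(phonemes)) for phonemes in mappings.values())
    return total / len(mappings)

# Toy table built from the slide's examples only (not a real corpus).
toy_mappings = {
    "ck": ["k"],                    # duck, mock
    "ea": ["iː", "ɪə", "eɪ", "ɛ"],  # beach, real, great, head
}

depth = orthographic_depth(toy_mappings)
print(depth)  # 2.5 for this toy table
```

A fully consistent orthography would give a value of 1 under this measure, since every grapheme would have exactly one pronunciation.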

how do literacy levels in English compare with other languages?

Data: % correct reading scores (adapted from Seymour, Aro, & Erskine, 2003).

Language           Real-words   Non-words
Greek              98           92
Finnish            98           95
German             98           94
Italian            95           89
Spanish            95           89
Swedish            95           88
Dutch              95           82
Icelandic          94           86
Norwegian          92           91
French             79           85
Portuguese         73           77
Danish             71           54
Scottish English   34           29

can differences in consistency account for the difficulty in learning to read English?

yes - inconsistency clearly increases difficulty of learning to read compared with more consistent languages (Frith, Wimmer & Landerl, 1998)


Figure: non-word reading accuracy (% correct) by age (6 to 13) for German and English readers. Data reproduced from Frith, Wimmer, & Landerl (1998).

lag in performance persists through school years

most often, vowel graphemes are inconsistent, but immediate context can be used to resolve the ambiguity

C V C - C V or V C

the ambiguity can be resolved by considering the following consonant (a rime unit) rather than the previous consonant (Treiman et al., 1995)

“ea” is pronounced to rhyme with breath ~80% of the time when followed by ‘d’, and to rhyme with meat 100% of the time when followed by ‘p’

also, rime units are more consistent than graphemes: 23% of graphemes are inconsistent, 15% of rimes are inconsistent

Choosing spelling-to-sound mappings

so many to choose from: ~2000 rime mappings, ~300 grapheme mappings

and many are inconsistent: 15% of rimes, 23% of graphemes

variety of approaches from reading schemes (Rhymeworld, THRASS, etc.)

influences from developmental literature (do rimes or GPCs predict reading ability?)

Rational analysis

attempt to explain behaviour in terms of adaptation to the environment, independent of details of cognitive architecture

e.g., Anderson & Schooler (1991) showed that the probability that a memory will be needed over time matches the availability of human memories: the same factors that predict memory performance also predict the odds that an item will be needed, i.e. reliable effects of recency and frequency

the solution adopted by the cognitive architecture should reflect the structure of the environment

factors that affect the performance of skilled readers should be reflected in the statistical structure of the language, e.g. frequency and consistency

by examining linguistic factors that skilled readers have adapted to, could the input be more optimally structured for learners?

effects of word frequency in naming and lexical decision

effects of rime frequency on word-likeness judgements and pronunciation

effects of grapheme frequency in letter search and word priming experiments

Analyses of spelling-to-sound mappings

rational analysis predicts that the most frequent and consistent mappings best predict pronunciation

interested in the frequency & consistency of mappings at the level of words, rimes, and graphemes, and their ability to predict correct pronunciation

CELEX database: 7,297 different monosyllabic words, 10,924,491 words in total

Words

Figure: proportion of text read (y-axis) against number of words (x-axis, 0 to 500).

Figure: word frequency in 100,000's (y-axis) against rank order of frequency (x-axis, 0 to 200).
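A minimal Python sketch of how a "proportion of text read" curve could be computed from word frequencies: sort the types by frequency and accumulate their share of all tokens. The counts below are hypothetical and chosen only for illustration; they are not CELEX values, and the function name `coverage_curve` is an assumption.

```python
from itertools import accumulate

def coverage_curve(frequencies):
    """Cumulative proportion of tokens covered by the N most frequent types."""
    ranked = sorted(frequencies, reverse=True)
    total = sum(ranked)
    return [running / total for running in accumulate(ranked)]

# Hypothetical token counts for six word types (illustration only).
toy_counts = [500, 250, 125, 75, 30, 20]
curve = coverage_curve(toy_counts)
print(curve[1])  # 0.75: share of tokens covered by the two most frequent types
```

With a skewed distribution like this one, the curve rises steeply at first and then flattens, which is the pattern the sight-vocabulary argument relies on.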

Onsets and rimes

exclude the 100 most frequent words: 7,197 different words, total of 2,263,264 words

create a table of onset and rime mapping frequencies, then remove all but the most frequent of the inconsistent mappings

Before:

Rime           Frequency
oul - /əʊl/    731
oul - /aʊl/    175
oul - /uːl/    12
ove - /ʌv/     4779
ove - /uːv/    1852
ove - /əʊv/    838

After:

Rime           Frequency
oul - /əʊl/    731
ove - /ʌv/     4779
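The pruning step (keep only the most frequent pronunciation of each inconsistent spelling) can be sketched in Python as follows. The rime frequencies are the ones listed on this slide; the function name `keep_most_frequent` and the dictionary layout are assumptions made for this sketch.

```python
def keep_most_frequent(mapping_counts):
    """For each spelling, keep only its most frequent pronunciation."""
    best = {}
    for (spelling, sound), freq in mapping_counts.items():
        if spelling not in best or freq > best[spelling][1]:
            best[spelling] = (sound, freq)
    return best

# Rime frequencies as listed on the slide.
rime_counts = {
    ("oul", "əʊl"): 731,
    ("oul", "aʊl"): 175,
    ("oul", "uːl"): 12,
    ("ove", "ʌv"): 4779,
    ("ove", "uːv"): 1852,
    ("ove", "əʊv"): 838,
}

pruned = keep_most_frequent(rime_counts)
print(pruned)
```

The same function would apply unchanged to a table of GPC frequencies, since it only assumes (spelling, sound) pairs with counts.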

Figure: frequency in 1000's (y-axis) against rank order of frequency (x-axis), in two panels: Onsets and Rimes.

Figure: proportion of text read in % (y-axis) against number of rimes (x-axis, 0 to 100), plotted separately for 0, 10, 20, 30, 40, and 80 onsets.

GPCs

exclude the 100 most frequent words: 7,197 different words, total of 2,263,264 words

create a table of GPC mapping frequencies, then remove all but the most frequent of the inconsistent mappings

Before:

GPC          Frequency
g - /g/      133930
g - /dʒ/     33342
g - /ʒ/      98
i - /ɪ/      153606
i - /aɪ/     46628
i - /iː/     455

After:

GPC          Frequency
g - /g/      133930
i - /ɪ/      153606

Figure: GPC frequency in 100,000's (y-axis); x-axis 0 to 80.

Figure: percentage of text read (y-axis) against number of GPCs (x-axis, 0 to 100).

Figure: percentage of text read (y-axis, 70 to 100) against number of mappings (x-axis, 0 to 300), comparing grapheme-phoneme and onset/rime mappings.

Summary

some words are much more frequent than others, therefore sight vocabulary is very effective for a small number of words, up to ~100

sub-lexical units also have a skewed frequency distribution, and learning the most frequent mappings predicts a high potential outcome

high initial gains with GPCs, greater overall gain with rimes in the long run

What is the optimal size unit to learn?

Can we measure the potential benefit from, and cost of, remembering mappings for GPCs, for onset/rimes, or for a combination of both?

Potential benefits for reading outcome are larger for onset/rimes, but is this outweighed by the cost of remembering many more mappings?

The Simplicity Principle

reading, like much high-level cognition, involves finding patterns in data, but many patterns are compatible with any finite set of data - so how does the cognitive system choose from the possibilities?

the simplicity principle: choose the simplest explanation of the data - intuitive, and has a long history (Occam’s razor)

can quantify simplicity by measuring the (shortest) description from which the data can be reconstructed - trade off brevity against goodness of fit: cognition as compression

implement with minimum description length (MDL): more regularity = more compression; no regularity = no compression, just reproduce the data

can measure compression with Shannon’s (1948) coding theorem - more probable events are assigned shorter code lengths:

length/bits = log2(1/p)

measure the code length needed to specify: the hypothesis about the data (mappings), and the data given the hypothesis (decoding accuracy, given mappings)
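A small Python sketch of the two quantities involved: the Shannon code length log2(1/p) for a single event, and a two-part total that adds the bits for the hypothesis to the bits for the data given the hypothesis. The probabilities passed in at the end are hypothetical, and the function names are assumptions made for this sketch.

```python
import math

def code_length(p):
    """Shannon code length in bits for an event of probability p."""
    return math.log2(1 / p)

def total_description_length(hypothesis_probs, data_probs):
    """Two-part code: bits for the hypothesis plus bits for the data given it."""
    hypothesis_bits = sum(code_length(p) for p in hypothesis_probs)
    data_bits = sum(code_length(p) for p in data_probs)
    return hypothesis_bits + data_bits

print(code_length(0.5))    # 1.0 bit
print(code_length(0.125))  # 3.0 bits

# Hypothetical probabilities, for illustration only.
total = total_description_length([0.5, 0.25], [0.125])
print(total)  # 6.0 bits
```

More probable events get shorter codes, so a set of mappings that makes the correct pronunciations highly probable lowers the second term, at the price of a longer first term.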

Method

determine mappings & frequencies from a monosyllabic corpus of children’s reading materials (Stuart et al., 2003), for mapping sizes: words, CV/C (head/coda), C/VC (onset/rime), GPCs

for each mapping size, determine the code length to describe the mappings, and the decoding accuracy given the mappings

Table 1. A list of reading schemes/series used by over a third of schools in the survey

Name of scheme               % using scheme   Included in database?
Ginn 360                     74%              Yes
Storychest                   58%              Yes
Magic Circle                 58%              Yes
1 2 3 and Away               50%              Yes
Griffin Pirates              43%              Yes
Breakthrough to Literacy     41%
Bangers and Mash             40%              Yes
Wide range readers           38%              Yes
Dragon Pirates               37%              Yes
Through the rainbow          34%
Ladybird read-it-yourself    33%              Yes
Humming birds                32%
Thunder the dinosaur         29%              Yes
Link Up                      29%
Gay Way                      27%              Yes
Monster                      27%              Yes
Oxford Reading Tree          27%              Yes
Once Upon a Time             26%              Yes
Trog                         26%

words    CV/C                    C/VC                    GPCs
         letter  sound  freq     letter  sound  freq     letter  sound  freq
wiː      bi      bI     1        d       d      7        t       t      17
kæn      ca      kæ     1        g       g      4        i       I      11
bʌt      do      dɒ     1        f       f      1        a       æ      11
ɒn       fro     frɒ    2        wh      w      1        g       g      7
miː      ra      ræ     1                                k       k      4
                                                         c       k      4
bæk      n       n      9        an      æn     4        ay      eɪ     2
kʌm      k       k      1        og      ɒg     2        a*e     eɪ     1
frɒg     lp      lp     1        elp     elp    1

Code length for mappings

e.g., for the word entry wiː:

length = log2(1/p(w)) + log2(1/p(iː)) + log2(1/p(newline))

e.g., for the CV/C entry bi - bI:

length = log2(1/p(b)) + log2(1/p(i)) + log2(1/p(space)) + log2(1/p(b)) + log2(1/p(I)) + log2(1/p(newline))
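The per-entry sum above can be sketched in Python as below. One assumption is made loudly here: symbol probabilities are estimated as relative frequencies over all symbols in the mapping list itself, including the space and newline separators. The slides do not state how p is estimated, so this is only one plausible choice, and the names `tokenize` and `mapping_code_lengths` are hypothetical.

```python
import math
from collections import Counter

SPACE, NEWLINE = "<space>", "<newline>"

def tokenize(entry):
    """Turn (letters, sounds) into: letters, space, sounds, newline."""
    letters, sounds = entry
    return list(letters) + [SPACE] + list(sounds) + [NEWLINE]

def mapping_code_lengths(entries):
    """Bits per entry; symbol probabilities come from the whole list (assumption)."""
    sequences = [tokenize(entry) for entry in entries]
    counts = Counter(sym for seq in sequences for sym in seq)
    total = sum(counts.values())
    # log2(total / count) equals log2(1 / p) with p = count / total.
    return [sum(math.log2(total / counts[sym]) for sym in seq) for seq in sequences]

# Two CV/C head entries from the example table; sounds are lists of phoneme symbols.
entries = [("bi", ["b", "I"]), ("ca", ["k", "æ"])]
lengths = mapping_code_lengths(entries)
print(lengths)
```

Entries that reuse common symbols come out shorter than entries built from rare ones, which is how a larger, less regular rule set ends up with a longer description.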

Code length for decoding accuracy

apply letter-to-sound rules to produce a list of pronunciations:

bread: breId, bri:d, brɛd

arrange in rank order of most probable (computed from letter-to-sound frequencies) & note the rank of the correct pronunciation:

bread: bri:d, brɛd, breId

code length for data, given hypothesis = log2(1/p(rank=2))

Figure: code length log2(1/p) (y-axis) as a function of p (x-axis, 0 to 1).
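A hedged Python sketch of the ranking step for "bread": generate every candidate pronunciation from the GPC options, score each candidate, sort, and read off the rank of the correct one. Scoring a candidate by the product of the relative frequencies of its GPC choices is an assumption (the slides only say the ranking is computed from letter-to-sound frequencies), and the frequencies below are hypothetical values chosen so that the ordering matches the example.

```python
from itertools import product

def ranked_pronunciations(graphemes, gpc_freqs):
    """List candidate pronunciations, most probable first.

    A candidate's score is the product of the relative frequencies of the
    grapheme-to-phoneme choices it uses (an assumption of this sketch).
    """
    options = []
    for g in graphemes:
        total = sum(gpc_freqs[g].values())
        options.append([(ph, f / total) for ph, f in gpc_freqs[g].items()])
    candidates = []
    for combo in product(*options):
        score = 1.0
        for _, p in combo:
            score *= p
        candidates.append(("".join(ph for ph, _ in combo), score))
    candidates.sort(key=lambda c: c[1], reverse=True)
    return [pron for pron, _ in candidates]

# Hypothetical frequencies, chosen only so the ordering matches the slide.
toy_freqs = {
    "b": {"b": 10},
    "r": {"r": 10},
    "ea": {"iː": 6, "ɛ": 3, "eɪ": 1},
    "d": {"d": 10},
}
ranking = ranked_pronunciations(["b", "r", "ea", "d"], toy_freqs)
rank = ranking.index("brɛd") + 1
print(ranking, rank)  # correct pronunciation is ranked 2nd
```

The code length for the word is then log2(1/p(rank=2)); how p(rank) is estimated is not shown on the slides, so that step is left out here.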

Simulations

how does code length vary as a function of size of vocabulary for each unit size?

optimize the number of mappings by removing those whose removal reduces total code length

overall comparison between different unit sizes for whole vocabulary

compare different reading schemes
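One way the optimization step could be carried out is a greedy search: repeatedly drop any mapping whose removal lowers the total code length, and stop when no single removal helps. The slides do not specify the search procedure, so the greedy loop is an assumption, and the cost function below is a toy stand-in with hypothetical numbers rather than the real rules-plus-accuracy code length.

```python
def prune_mappings(mappings, total_cost):
    """Greedily drop mappings while doing so lowers the total code length."""
    kept = set(mappings)
    best = total_cost(kept)
    improved = True
    while improved:
        improved = False
        for m in sorted(kept):
            trial = kept - {m}
            cost = total_cost(trial)
            if cost < best:
                kept, best, improved = trial, cost, True
                break  # restart the scan from the reduced set
    return kept, best

# Toy stand-in: each rule costs 10 bits to describe and saves the listed
# number of bits when encoding the data (all values hypothetical).
RULE_BITS = 10
BASE_DATA_BITS = 100
savings = {"t->t": 40, "ea->iː": 25, "ea->eɪ": 4, "g->ʒ": 1}

def toy_total_cost(kept):
    rule_bits = RULE_BITS * len(kept)
    data_bits = BASE_DATA_BITS - sum(savings[m] for m in kept)
    return rule_bits + data_bits

kept, cost = prune_mappings(savings, toy_total_cost)
print(sorted(kept), cost)
```

In this toy case the two low-benefit rules are dropped because each costs more bits to state than it saves in decoding.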

Comparing different unit sizes for whole vocabulary

Figure: cost in bits (y-axis, 0 to 90,000) by rule size (Words, CV/C, C/VC, GPCs), with series rules, acc, and tot.

Code length as a function of vocabulary size

Figure: cost in bits (y-axis) against vocabulary size (x-axis, 0 to 3000) for words, CV/C, C/VC, and GPC mappings.

Figure: cost in bits (y-axis) against vocabulary size (x-axis, 10 to 90) for words, CV/C, C/VC, and GPC mappings.

Optimizing number of mappings

                All mappings                  Mappings remaining
Mapping size    N       Total code length     N       Total code length
Words           3000    77,825                -       -
Onset/rimes     1141    48,612                404     29,420
GPCs            240     10,845                114     8,536

GPCs: description length reduced by removing mainly inconsistent, low frequency mappings

Comparing different reading schemes

Scheme                          N GPC rules
Jolly Phonics                   43
Hutzler et al. (2004)           67
ERR (Solity & Vousden, 2008)    77
Letters & Sounds                94
THRASS                          106

Figure: cost in bits (y-axis, 0 to 30,000) by scheme, with series rules, acc, and total: Jolly (N=43), Hutzler (N=71), ERR (N=77), Letters & Sounds (N=94), THRASS (N=106), simplicity (N=111), all (N=240).

Figure: cost in bits (y-axis, 0 to 30,000) against N rules (43, 71, 77, 94, 106, 111, 240), with series rules, acc, and total.

Decoding accuracy by scheme

Figure: % correct (y-axis, 0 to 80) by scheme: Jolly (N=43), Hutzler (N=71), ERR (N=77), Letters & Sounds (N=94), THRASS (N=106), and all (N=240), compared with the simplicity set (N=111).

ERR implemented as a reading intervention in 12 Essex schools:

increase in reading scores significantly greater for ERR schools

Figure: BAS raw score (y-axis, 0 to 60) by school age (base, YR, Y1, Y2) for Comparison and ERR schools. Data: from Shapiro & Solity (2008)

                              Comparison   ERR
reading difficulty            20%          5%
serious reading difficulty    5%           1%

Some conclusions

a small amount of sight vocabulary accounts for a large proportion of text, but only small vocabularies are most simply described by whole words

this complements recent work by Treiman and colleagues showing that children learn better when the association between sound and print is non-arbitrary

as a homogeneous set, GPCs provide a simpler explanation of the data

choosing the best set could be important
