
Multi-sensory learning and learning to read

Leo Blomert, Dries Froyen

PII: S0167-8760(10)00169-8
DOI: 10.1016/j.ijpsycho.2010.06.025
Reference: INTPSY 10166

To appear in: International Journal of Psychophysiology

Please cite this article as: Blomert, Leo, Froyen, Dries, Multi-sensory learning and learning to read, International Journal of Psychophysiology (2010), doi: 10.1016/j.ijpsycho.2010.06.025

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.


Keynote #3

Multi-sensory learning and learning to read

Leo Blomert & Dries Froyen

Department of Cognitive Neuroscience & Maastricht Brain Imaging Centre

Faculty of Psychology & Neuroscience, Maastricht University, The Netherlands

Running head: Multi-sensory learning and learning to read

Key words: letter-speech sound correspondences; multi-sensory processing;

audiovisual integration; reading development

Correspondence to

Leo Blomert Ph.D.

Faculty of Psychology & Neuroscience

Maastricht University

P.O. Box 616

6200 MD Maastricht

The Netherlands

Tel.: +31-43-3881949

e-mail: [email protected]


Abstract

The basis of literacy acquisition in alphabetic orthographies is the learning of the

associations between the letters and the corresponding speech sounds. In spite of this

primacy in learning to read, there is only scarce knowledge of how this audiovisual

integration process works and which mechanisms are involved. Recent

electrophysiological studies of letter – speech sound processing have revealed that

normally developing readers take years to automate these associations and dyslexic

readers hardly exhibit automation of these associations. It is argued that the reason for

this effortful learning may reside in the nature of the audiovisual process that is

recruited for the integration of in principle arbitrarily linked elements. It is shown that

letter-speech sound integration does not resemble the processes involved in the

integration of natural audiovisual objects such as audiovisual speech. The automatic

symmetrical recruitment of the assumedly uni-sensory visual and auditory cortices in

audiovisual speech integration does not occur for letter and speech sound integration.

It is also argued that letter-speech sound integration only partly resembles the

integration of arbitrarily linked unfamiliar audiovisual objects. Letter-sound

integration and artificial audiovisual objects share the necessity of a narrow time

window for integration to occur. However, they differ from these artificial objects,

because they constitute an integration of partly familiar elements which acquire

meaning through the learning of an orthography. Although letter – speech sound pairs

share similarities with audiovisual speech processing as well as with unfamiliar,

arbitrary objects, it seems that letter – speech sound pairs develop into unique

audiovisual objects that furthermore have to be processed in a unique way in order to

enable fluent reading and thus very likely recruit other neurobiological learning


mechanisms than the ones involved in learning natural or arbitrary unfamiliar

audiovisual associations.

1. The role of multi-sensory learning in learning to read

The importance of written language in today’s society can hardly be overestimated. It

allows us to transcend our communication limits in space and time and proficient

literacy has thus become a crucial marker of the quality of life (UNESCO, 2005). In

the last decade neuroimaging studies have identified a brain region that shows

specialisation for fast visual word recognition; i.e. the putative Visual Word Form

Area (Cohen et al., 2000) in the occipito-temporal cortex. Since fluency and

automaticity are the most salient features of experienced reading it is indeed plausible

that a neural network involved in visual object recognition has specialised for

recognising visual letters and word forms (McCandliss, Cohen, & Dehaene, 2003).

Fluency of reading however is an intriguing characteristic, since we need years of

explicit instruction and practice before we start to exhibit any fluency in visual word

recognition (Vaessen & Blomert, 2010) and persons with dyslexia may never attain a

really fluent reading performance (Gabrieli, 2009). This long-lasting process contrasts

sharply with the way we learn to master spoken language. Infants and young children

start to pick up and develop the many complexities of spoken language without

explicit instructions at a time when literacy instruction is still a potential event in the far


future (e.g., Jusczyk, 1997). So we may ask the question: What makes learning to

read so effortful?

One obvious answer to this question is that writing systems have only emerged fairly

recently in evolution; i.e. a few thousand years ago and for the majority of people

only a few hundred years ago (Rayner & Pollatsek, 1989). In contrast, spoken

communication is an old habit and speech probably arose some 60,000 years ago

(Lieberman, 2006). Therefore, it is very likely that our brains are evolutionarily

prepared for speech, but not for learning to read. So, we may further ask: Which

mechanisms enable the brain to learn to read?

A first hint may come from the fact that spoken language development not only

precedes the learning of written language evolutionarily, but also ontogenetically: we

speak and listen before we write and read. To find an answer we thus need a closer

look at the beginnings of reading. The very first step in learning to read is establishing

associations between letters and speech sounds (Frith, 1985; Marsh, Friedman, Welch,

& Desberg, 1981). In alphabetic languages written words are created out of a limited

set of elements, i.e. letters. These letters purportedly represent their spoken

counterparts, i.e. speech sounds. Since children have already mastered spoken

language to a considerable degree when they enter school, it has been suggested that

written language builds on the spoken language system, particularly on the

mechanisms for processing speech sounds (Liberman, 1973; Mattingly, 1972). By

now it has been generally accepted that the ability to manipulate speech sounds and

learning to read reciprocally influence each other during reading development

(Perfetti, Beck, Bell, & Hughes, 1987). It has indeed been shown that the brain

responses to phonological stimulation of healthy illiterates differed from the responses

of normal readers (Castro-Caldas, Petersson, Reis, Stone-Elander, & Ingvar, 1998).


This influence from learning to read on spoken phonological representations may

occur because before learning to read the smallest elements of the spoken language

system are not isolated speech sounds or phonemes, but larger chunks of sound

information which do not directly match onto the newly learned letters (Ziegler &

Goswami, 2005). So, although the impression may be that the learning of an

orthography directly connects to the already existing phonological representations of

speech sounds, this may in effect consist of a re-shaping of the relevant spoken

language elements, thus changing permanently the spoken language system. Although

the mechanisms enabling these cross-modal influences are still unknown, we

hypothesize that the formation of letter-sound associations very likely constitutes the

vehicle via which learning to read changes spoken phonological representations.

Considering the importance of letter – speech sound associations for learning to read,

we may again rephrase our question to: How does the brain establish associations

between letters and speech sounds?

Recent research has made clear that multi-sensory information processing is part and

parcel of object perception and recognition in daily life, whereby the brain integrates

the information from different modalities into a coherent percept (Ghazanfar &

Schroeder, 2006). Neuroimaging research pointed to the superior temporal sulcus

(STS) as an important brain area for audiovisual integration processes (e.g.

Beauchamp, Argall, Bodurka, Duyn, & Martin, 2004; Calvert, 2001). Although this

latter study mainly found activations for meaningful objects like animals and tools in

this area, it was speculated that STS might also be instrumental for other kinds of

audiovisual associations (see Hocking & Price, 2008 for a similar interpretation). The

findings of a posterior temporal network for audiovisual speech processing (lip-

reading), including the assumedly uni-sensory visual and auditory sensory cortices


(e.g. Calvert, Brammer, & Iversen, 1998; Calvert et al., 1997) also sparked interest in

letter-speech sound pairs as a special kind of audiovisual objects (Hashimoto & Sakai,

2004; Herdman et al., 2006; Raij, Uutela, & Hari, 2000). The interest in letters and

their corresponding speech sounds partly stems from the fact that they are recent

cultural inventions, sharply contrasting with e.g. audiovisual speech. The attraction

also partly resides in the fact that letter-speech sound pairs are highly over-learned

multi-sensory associations allowing people to manipulate the congruency between the

elements without activating higher order cognitive processes. Letter-speech sound

pairs may thus be conceptualized as in principle arbitrary associations, which acquire

meaning by learning a specific orthography. Although, as discussed above, one

element of these associations seems already in place when learning to read starts, this

is only partly true: the exact phonemes corresponding to letters or letter strings are not

part of the neural and behavioural repertoire of spoken language before learning to

read (Blomert & Willems, in revision; Morais, Cary, Alegria, & Bertelson, 1979;

Wimmer, Landerl, Linortner, & Hummer, 1991) and existing representations of

speech sounds probably need a fundamental re-modelling to fit the requirements for

adequate letter-speech sound associations. Thus, it is more apt to state that one

element of the pair, i.e., phonemic speech sounds, is familiar in kind but not in type,

when learning to read starts. These considerations point to the potentially ambiguous

status of letter-sound pairs as audiovisual objects: Do letter-speech sound associations

resemble natural audiovisual objects with known elements or do they resemble

arbitrary associations between unfamiliar elements? And lastly, is it possible that the

type of association (natural versus artificial) and the way these associations are used

during reading implies different association mechanisms?


Before we review the findings on the neural correlates of letter-speech sound

processing, we first need to clarify our concept of letter-sound associations. Although

influential models of reading (Coltheart, Rastle, Perry, Langdon, & Ziegler, 2001) and

reading development (Ehri, 1995; Share, 1995) implemented a central role for the

learning of letter-speech sound relations and its role in the learning of new words, any

fundamental insights in the nature and workings of this association process are

basically missing. This lack of basic research findings might, in part, be attributed to

the widespread opinion that the associations between letters and speech sounds are

mastered within a few months by most children in most alphabetic orthographies

(e.g., Ziegler & Goswami, 2005). The assumed fast and easy learning process would

by implication transfer the burden for explaining effortful and long-lasting fluent

reading development to other processes. Therefore we want to emphasize the

difference between the learning of letter speech sound associations and “letter

knowledge” or “letter-sound knowledge”, as “it is possible in principle for a child to

know the modal pronunciations for all letters and still have not in place any notion

that these sounds are parts of words” (Byrne & Fielding-Barnsley, 1989). Recent

evidence indeed showed that even dyslexic children, who exhibited serious problems

learning to read and learning letter-speech sound associations nevertheless showed

full letter-knowledge mastery just like their normal reading peers towards the end of

first grade (Blomert & Willems, in revision). Furthermore, it was recently shown that

the latency of letter-sound association processes systematically decreased over the full

range of primary school grades without reaching a floor in sixth grade in normal

readers (Blomert & Vaessen, 2009), suggesting an ongoing automation of these

associations (Chein & Schneider, 2005). The salient differences in learning rate

between letter knowledge and letter-speech sound associations suggest that an


exploration of the type of audiovisual association that is formed when learning

letter-speech sound correspondences may provide key insights for understanding

reading development.

2.1 Magneto-encephalographic (MEG) insights into letter-speech sound

associations

A rare early behavioural study revealed a first basic insight into letter-speech sound

processing by investigating the influence of letter primes on the recognition of a

speech sound in spoken syllables (Dijkstra, Schreuder, & Frauenfelder, 1989).

Subjects were asked to identify the vowel in a spoken syllable consisting of a

consonant and a vowel. The target vowel was primed by a letter prime that was either

congruent or incongruent with the target. The results showed clear decreases of

response latencies if prime and target were congruent, thus indicating automatic cross-

modal activations of speech sounds by letters.
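To make the logic of such a priming comparison concrete, the following sketch (Python; the latencies are invented illustrative values, not Dijkstra et al.'s data) computes the congruency effect as the latency difference between incongruent and congruent prime-target pairs:

```python
import numpy as np
from scipy import stats

# Hypothetical response latencies (ms) for vowel identification in spoken
# syllables, primed by a congruent vs. incongruent letter (one value per subject).
rt_congruent = np.array([412, 398, 431, 405, 420, 388, 415, 402])
rt_incongruent = np.array([455, 447, 462, 439, 470, 451, 444, 458])

# The congruency (priming) effect is the latency advantage for congruent pairs.
effect = rt_incongruent.mean() - rt_congruent.mean()

# Paired t-test: within-subject comparison of the two prime conditions.
t, p = stats.ttest_rel(rt_incongruent, rt_congruent)
print(f"priming effect = {effect:.1f} ms, t = {t:.2f}, p = {p:.4f}")
```

A reliable latency decrease for congruent pairs, as in this comparison, is what licenses the inference of automatic cross-modal activation.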

It took another decade before a study appeared which investigated the neural

correlates of the automatic audiovisual integration of letters and speech sounds with

an emphasis on its temporal dynamics (Raij, Uutela, & Hari, 2000). This magneto-

encephalographic (MEG) study reported no letter-specific cross-modal interaction

effects in temporal sensory specific cortices. A first difference between the processing

of letters (matching and non-matching) and non-letter control stimuli was recorded in

the temporo-occipital-parietal junction around 225 milliseconds after stimulus onset.

And only the superior temporal sulci (STS) revealed strong interactions between

letters and speech sounds 380-450 ms after stimulus onset. It is noteworthy that the

interactions in the right hemisphere started almost 70 ms later than in the left STS,

suggesting that the audiovisual integration process of letters and speech sounds


mainly and first occurred in left STS. These results thus established that STS also

served as a convergence and integration site for arbitrarily linked objects like letters

and speech sounds. The involvement of the visual and auditory cortices, which was

reported for audiovisual speech processing (Calvert, 2001), was not confirmed for

these culturally defined letter – speech sound associations.

A recent MEG study investigated changes in cortical oscillations consequent upon the

perception of congruent and incongruent letter-speech sound pairs (Herdman et al.,

2006). Subjects saw Japanese Hiragana graphemes, which have a fully transparent

relation with the auditorily presented corresponding phonemes and were asked to press

a button to indicate congruency or incongruency between the letters and speech

sounds. The reaction times showed faster responses for congruent pairs, confirming

the findings of the early behavioural study (Dijkstra, Schreuder, & Frauenfelder,

1989). The MEG results showed greater response power to congruent letter-sound

pairs than incongruent ones in the left auditory cortex, and a later congruency effect in

the bilateral visual cortices. Although the latter effect might also indicate that

phonemes directly influence the processing of a letter, the authors suggested that this

was due to feedback from multi-sensory integration sites like STS given the later time

window. The fact that these activations in ‘sensory specific cortices’ were not

reported in the earlier MEG study by Raij et al. (2000) was attributed to the fact that

this study relied on minimum current estimates to estimate the evoked fields, whereas

their own MEG study capitalized on total change in signal power of neural activation,

a method closer to the BOLD signal as measured with fMRI. This might explain why

in the study by Herdman and co-authors auditory cortex involvement was found, as

was also reported in an earlier fMRI study of letter – speech sound processing (Van

Atteveldt, Formisano, Goebel, & Blomert, 2004).


Together, these studies convincingly point to an automated cross-modal process for

integrating letters and speech sounds with a focus in the left temporal cortex.

However, the role of low level sensory specific cortices in the integration process

needs further clarification.

2.2 Functional magnetic resonance imaging (fMRI) insights into letter – speech

sound associations

A first basic insight into the neural network for processing letters and speech sounds

was reported in an fMRI study manipulating the congruency of letter-sound pairs (Van

Atteveldt, Formisano, Goebel, & Blomert, 2004). Adult experienced readers were

asked to watch letters and listen to speech sounds presented simultaneously without

having to execute a task. The results revealed that areas in STS not only responded to

isolated letters and speech sounds, but also showed an enhanced response to bimodal

stimulation whether congruent or incongruent, thus confirming the role of STS as an

audiovisual integration site also for in principle arbitrary associations. Interestingly,

the ‘sensory specific’ response from auditory cortex showed an enhancement to

congruent and a suppression to incongruent letter-sound pairs in comparison to the

processing of speech sounds alone. This modulating influence of letters on speech

sound processing, presumably as a consequence of feedback from STS, was

reminiscent of the activation patterns in temporal cortex for audiovisual speech

(Calvert, 2001). Since the time window for the integration of audiovisual speech in

STS was reportedly rather wide (Massaro, Cohen, & Smeele, 1996; Munhall, Gribble,

Sacco, & Ward, 1996; Van Wassenhove, Grant, & Poeppel, 2007), we tested if this

was also the case for associations like letter-speech sound correspondences. An fMRI

study in which we systematically varied the stimulus onset asynchrony (SOA)


between the letters and the speech sounds confirmed the wide temporal window for

integration in STS, but interestingly, feedback to auditory cortex only occurred if the

stimuli were presented simultaneously (Van Atteveldt, Formisano, Blomert, &

Goebel, 2007). Although the earlier fMRI studies investigating audiovisual speech

and also the MEG study by Herdman and co-authors investigating letter-sound

associations reported feedback to visual cortex, we did not find congruency effects

and thus no support for potential feedback to visual cortices. It should therefore be

noted that the MEG letter-sound study used an active task (subjects had to decide if

letters and speech sounds were congruent or incongruent by button press) whereas the

fMRI study used a passive design (subjects did not have to do anything other than

watch letters and listen to speech sounds). It is therefore relevant to report the results

of a third fMRI study in which the congruency effect found for the passive task in

auditory cortex disappeared in the active task (Van Atteveldt, Formisano, Goebel, &

Blomert, 2007). The disappearance of the congruency effect probably resulted from

the fact that congruent and incongruent letter-sound pairs become equally relevant in

a decision task, whereas only the congruent pairs have relevance during more natural

processing. This indicates that the passive design is the more ecologically valid design

for triggering reading related processes.

In summary: The results of the MEG and fMRI studies confirm that STS is not only

an integration site for natural associations, but also for in principle arbitrary

associations such as letter-speech sound pairs. The difference between types of cross-

modal associations so far seems to reside in bidirectional feedback from hetero-modal

integration sites to visual and auditory cortices in the case of natural associations like

audiovisual speech, whereas it is still unclear if there is similar or asymmetric

feedback (mainly to auditory cortex) in the case of letter-speech sound associations.


2.3 Electrophysiological (MMN) insights into letter – speech sound associations

The cross-modal MMN paradigm for letter-speech sound associations

Since the time course of letter-sound processing and the absence of a task proved

critical for revealing neural pathways for the association and/or integration of letters

and speech sounds we chose a method of investigation characterized by a high

temporal resolution and which does not require a task: the auditory Mismatch

Negativity (MMN) paradigm (Näätänen, 1995). The MMN is an automatic deviance

detection mechanism, known to be evoked between 100 and 200 ms when, in a

sequence of auditory stimuli, a rarely presented sound (the deviant) deviates in one or

more aspects from the sound that is frequently presented (the standard). The MMN is

considered to reflect the neurophysiological correlate of a comparison process

between an incoming auditory stimulus and the memory trace formed by the repetitive

aspect of the standard stimulus (Näätänen, 2000; Picton, Alain, Otten, Ritter, &

Achim, 2000; Schröger, 1998). The MMN has repeatedly been shown to be sensitive

not only to auditory deviants, but also to language-specific speech sound

representations in adults and in children (Bonte, Mitterer, Zellagui, Poelmans, &

Blomert, 2005; Bonte, Poelmans, & Blomert, 2007; Mitterer & Blomert, 2003;

Näätänen, 2001; Winkler et al., 1999). The child MMN is furthermore suggested to be

a stable component resembling the adult MMN (Csépe, 2003), and is particularly

useful for research with children, because its evocation does not require sustained

attention or completion of a task. Moreover, the MMN has been used before to

investigate phonological and auditory processing deficits in children and adults with

reading disabilities (Bishop, 2007; Bonte, Poelmans, & Blomert, 2007; Csépe, 2003;

Kujala & Näätänen, 2001).
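For readers unfamiliar with the paradigm, a minimal sketch of how an MMN difference wave is commonly derived, using the open-source MNE-Python library; the file name and trigger codes are hypothetical, and this is not the authors' actual analysis pipeline:

```python
import mne

# Hypothetical preprocessed EEG recording with event triggers:
# 1 = standard speech sound /a/, 2 = deviant speech sound /o/.
raw = mne.io.read_raw_fif("oddball_raw.fif", preload=True)
events = mne.find_events(raw)
event_id = {"standard": 1, "deviant": 2}

# Epoch from 100 ms before to 500 ms after sound onset, baseline-corrected.
epochs = mne.Epochs(raw, events, event_id, tmin=-0.1, tmax=0.5,
                    baseline=(None, 0), preload=True)

# The MMN is the deviant-minus-standard difference wave, typically evoked
# between 100 and 200 ms after stimulus onset.
mmn = mne.combine_evoked([epochs["deviant"].average(),
                          epochs["standard"].average()],
                         weights=[1, -1])
mmn.plot(picks="eeg")
```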


Although the MMN is considered a purely auditory deviance detection mechanism, it

has also been successfully used to investigate neural correlates of audiovisual

integration. Most of these studies, however, investigated how a deviant visual

stimulus evoked an auditory MMN by illusorily changing the percept of the standard

sound (e.g. the McGurk or ventriloquist illusions, see later paragraphs). Although

letters are not expected to cause illusory effects on speech sound processing and may

thus not be expected to evoke an auditory MMN, we speculated that the automated

binding of highly over-learned letters and speech sounds might exert an influence

comparable in strength to the aforementioned illusions. This speculation was inspired by a

study showing that subjects listening to speech sounds were very capable of

imagining the physically absent letters corresponding to these speech sounds (Raij,

1999). The majority of subjects showed neural activations during mental imagery,

close to the ones later reported for audiovisual integration of letters and speech sounds

(Raij, Uutela, & Hari, 2000).

In all electrophysiological studies to be reported below a cross-modal MMN paradigm

is used. In our MMN studies, participants were presented with speech sound standards

/a/ and deviants /o/ appearing either in isolation (auditory only) or with a standard

letter “a” (audiovisual) (Figure 1). The rationale for this design resided in the

assumption that if letters and speech sounds are automatically integrated, this might

be reflected in an effect of letters on the MMN to speech sounds due to a double

cross-modal violation by the deviant speech sound /o/ of both the standard speech

sound /a/ and the standard letter “a”. We thus predicted an effect of the audiovisual

presentation on the MMN amplitude or latency over and above the standard auditory-


only deviancy effect. In addition, we investigated the temporal window within which

letters and speech sounds were processed as an integrated audiovisual object. Letters

were presented either simultaneously with or preceding the speech sounds by 100 or

200 ms. Participants were normally reading adults (Froyen, van Atteveldt, Bonte, &

Blomert, 2008), normally reading children (Froyen, Bonte, van Atteveldt, & Blomert,

2009) and dyslexic children (Froyen, Willems, & Blomert, in revision).
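A schematic sketch of how such a cross-modal oddball sequence could be generated (Python; trial count and deviant probability are illustrative placeholders, not the published parameters; only the audiovisual condition is shown):

```python
import random

def make_crossmodal_oddball(n_trials=300, p_deviant=0.15, soa_ms=0, seed=1):
    """Trial list for the cross-modal MMN design: the visual standard
    letter 'a' appears on every audiovisual trial; the sound is the
    standard /a/ or, rarely, the deviant /o/. soa_ms is the interval by
    which the letter precedes the sound (0, 100 or 200 ms)."""
    rng = random.Random(seed)
    trials = []
    for _ in range(n_trials):
        sound = "/o/" if rng.random() < p_deviant else "/a/"
        trials.append({"letter": "a",          # visual standard, never changes
                       "letter_onset_ms": 0,   # letter first (or simultaneous)
                       "sound": sound,
                       "sound_onset_ms": soa_ms})
    return trials

# The three SOA conditions used with adults: simultaneous presentation,
# and the letter leading by 100 or 200 ms.
for soa in (0, 100, 200):
    seq = make_crossmodal_oddball(soa_ms=soa)
    n_dev = sum(t["sound"] == "/o/" for t in seq)
    print(f"SOA {soa} ms: {len(seq)} trials, {n_dev} deviants")
```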

Letter-speech sound processing in adults

To probe the feasibility of the cross-modal paradigm for studying letter-speech sound

processing we first investigated normal reading adults for whom letter – speech sound

processing is expected to be fully automated (Froyen, van Atteveldt, Bonte, &

Blomert, 2008). Results showed that the MMN amplitude to the speech sound deviant

presented together with letters was enhanced in comparison with the same

speech sound deviant presented in isolation (Figure 2C). However, this effect was

only found if letters and speech sounds were presented simultaneously. The MMN

amplitude enhancement induced by letters diminished linearly with increasing

stimulus onset asynchrony (SOA); when the letter preceded the speech sound by 100

ms the cross-modal enhancement effect was no longer significant, and when letters

and speech sounds were presented with 200 ms SOA the MMN amplitude differed

significantly from the MMN amplitude in the condition with 0 ms SOA. Considering

the early and automatic evocation of the MMN, these results strongly indicated that

letters and speech sounds were automatically integrated in experienced readers’

brains, within an early but also narrow temporal window of integration.

To investigate if this effect was specific for letter-speech sound presentations we

conducted a replication study, again with speech sounds in isolation and with


simultaneously presented letters, this time however also with non-letter visual stimuli

(Froyen, de Doelder, & Blomert, submitted). The non-letter was a scrambled version

of the letter, and thus contained the same basic visual features as the letter but not its

letter-specific content. In this way we were able to check if the MMN amplitude

enhancement as found in the earlier described adult study (Froyen, van Atteveldt,

Bonte, & Blomert, 2008) was genuinely letter-specific. The results revealed, first, that

in both audiovisual conditions (letters and scrambled letters) the MMN amplitude was

enhanced in comparison with the auditory only condition. However, the MMN

amplitude enhancement in the audiovisual letter condition was significantly stronger

than the enhancement in the audiovisual non-letter condition. The letter-specificity of

the MMN enhancement effect was further bolstered by the finding of a significant and

substantial correlation between reading fluency and the letter-specific MMN

enhancement. These results not only replicated the previously reported letter - speech

sound integration effect (Froyen, van Atteveldt, Bonte, & Blomert, 2008), but also

established that this effect was specific for visual letters and strongly related with

reading. And lastly, these results together indicated the MMN as a valid and valuable

tool for the investigation of letter-speech sound integration.

Development of letter – speech sound processing

Learning the correspondences between letters and speech sounds

is a crucial step towards reading acquisition. As discussed in the introduction there are

widely different interpretations of the time it takes to learn these associations; varying

from a few months to several years. The only available direct evidence is based on

behavioural measurements of the speed of letter-speech sound identification and

discrimination and strongly favours an extended development (Blomert & Vaessen,


2009). However, to the best of our knowledge, no neural correlates of letter – speech

sound association development have been reported yet. The availability of a valid

cross-modal MMN paradigm, well suited for research with children, presented the

opportunity to investigate the development of pre-attentive letter – speech sound

processing in beginner as well as in advanced readers in primary education (Froyen,

Bonte, van Atteveldt, & Blomert, 2009). We presented eight and eleven year old

children, with one and four years of reading instruction respectively, with the

same letter – speech sound MMN paradigm as was used previously with adults

(Froyen, van Atteveldt, Bonte, & Blomert, 2008). Audiovisual stimuli were either

presented simultaneously or the letter preceded the speech sound by 200 milliseconds.

The results revealed that eight year old beginner readers showed full letter knowledge

mastery at the time of the experiment. Despite this letter knowledge, we found no

modulation of the MMN by letters and therefore no signs of letter-speech sound

integration in either SOA condition. Surprisingly, even in eleven year old advanced

readers we did not observe the MMN modulation by letters found in adults during

simultaneous audiovisual presentation. However we did find a significant MMN

amplitude enhancement when the letter preceded the speech sound by 200 ms, a time

window much too wide for adults to show audiovisual integration. We interpreted

these findings as an indication that letter – speech sound integration had become

automatic after four years of reading and reading instruction. The finding of a wider

temporal window for integration than in adults was interpreted as a probable

consequence of a still maturing brain. This main finding of early automatic letter-

speech sound integration was complemented by a second finding in a much later time

window: Whenever there was no early integration effect, there was a systematic

influence of letters on speech sound processing at 650 ms after speech sound onset.


Interestingly, both eight and eleven year old beginner and advanced readers showed

this late pattern of letter influence on speech sound processing. Moreover, the

influence of SOA on this late 650 ms effect mirrored the pattern found for the SOA

effects on the MMN in eleven year old children and adults: the older children

(eleven year olds) showed the effect at simultaneous presentation, whereas the

younger group only revealed the effect if the letter preceded the speech sound by 200

ms (see table 1). This SOA effect was again speculatively interpreted as a

consequence of a difference in brain maturation, leading to a difference in processing

speed. Because the early and late effects of letters on speech sound processing might

reflect differences in reading experience, we re-analyzed the adult data (Froyen, van

Atteveldt, Bonte, & Blomert, 2008) and only found an early congruency effect on the

MMN, but no late effects. We interpreted this effect to mean that if a reader has

become fully experienced, letter-speech sound integration is fully automatic and no

further processing is required after the initial fast stimulus integration. If, in contrast,

reading has just started, letters and speech sounds are not yet automatically integrated,

hence no effects on the MMN will be found. The fact that young beginning readers

nevertheless exhibited full letter knowledge, thus knowing which letter belonged to

which speech sound, may indicate that the much later effect of letters on speech sound

processing represented still weak associations between letters and speech sounds. The

eleven year olds showed a pattern partly reminiscent of automatic adult letter-sound

integration and partly reminiscent of the still effortful association also found in beginning

readers. They did show early MMN effects, but only at the 200 ms SOA,

and they did show late association effects only when letters and speech sounds were

presented simultaneously. Together these results indicated a transition from the mere

association of letters and speech sounds in beginner readers to more automatic, but


still not “adult-like”, integration in advanced readers. The regularity of these effects

over time suggests a gradual development of letter-speech sound integration processes

resulting from a dynamic interplay between brain maturation and developing reading

experience.

Letter-speech sound processing in dyslexia

If the automation of letter-speech sound associations in normally reading children

takes years to develop, one wonders what happens in children suffering from a

constitutional developmental reading disorder, like dyslexia. Results from recent

neuro-imaging studies indeed revealed reduced activation for letter-sound associations

in dyslexic children and adults. But more interestingly, these results also showed

comparable activation for congruent as well as incongruent letter-sound pairs in

dyslexic readers, whereas all normal readers, adults and children, immediately

suppressed the activations for incongruent pairs (Blau, van Atteveldt, Ekkebus, Goebel, &

Blomert, 2009; Blau et al., 2010). These results indicated that letter-speech sound

association in dyslexia was not only anomalous at the start of reading development,

but remained anomalous into adulthood.

To investigate the nature of these anomalous associations we applied the same

cross-modal MMN paradigm used in our earlier studies to eleven year old

dyslexic children (Froyen, Willems, & Blomert, in revision). The results revealed that,

in contrast to their normal reading peers, children with dyslexia showed no effect of

letters on the MMN and thus no signs of early automatic integration even after four

years of reading instruction. Since our previous study with normally reading children

(Froyen, Bonte, van Atteveldt, & Blomert, 2009) had indicated that in the absence of


early integration, deviant letters still might reveal an effect on letter-speech sound

association in a much later time-window, we searched for possible letter effects in the

650 ms time-window. We indeed found a late effect of deviant letters, but this effect

differed from what we found for their normal reading peers in our previous study

(Table 1). These eleven year old normal reading children did show a late effect only if

letters and speech sounds were presented simultaneously. In contrast, the dyslexic

children only showed a late effect when letters preceded speech sounds by 200 ms,

resembling the late effect found for the much younger normal reading children who

had only received one year of reading instruction. We interpreted the finding that

dyslexic children still showed weak associations of letters and speech sounds and no

integration, after four years of reading instruction, as an indication that letter-speech

sound association deficits may be a proximal cause of their reading problems.

The validity of this interpretation was further reinforced by the finding of a strong

correlation between the late letter effect and their word reading performance.

In closing it is of interest to note that the dyslexic readers showed very similar speech

sound processing to that of their age matched normal reading peers when speech sounds

were presented in isolation. They did however show problems in processing letter-

speech sound pairs, in which the speech sound was the same as the one presented in

isolation. It is thus unlikely that the letter – speech sound association/integration

problems evidenced in this MMN study resulted from impoverished or otherwise poor

phonological representations or poor processing of the speech sounds involved. In line

with this result we found no correlation between the quality of the late letter-speech

sound processing effect and their phonological awareness performance. These

findings also suggested that the reduced speech sound processing reported in the

fMRI studies revealing anomalous associations (Blau, van Atteveldt, Ekkebus, Goebel, &


Blomert, 2009; Blau et al., 2010) might have been more consequence than cause of

these cross-modal problems.

Since all dyslexic children did show normal phonological discrimination of speech

sounds in isolation and also did show full mastery of ‘letter knowledge’, like their

normal reading peers, it is appropriate to ask if there was some deficiency with the

association per se. This then brings us back to the question of whether the specific

audiovisual nature of the letter-speech sound associations may be of influence on the

quality and learning rate of the associations in reading success as well as in reading

failure. In the following we will therefore try to clarify the nature of letter-speech

sound associations and thus their status as audiovisual objects.

2.4 Summary of main findings

The learning of letter-speech sound associations is the very basis of reading

acquisition and consists of the formation of integrated audiovisual objects presumably

necessary for the development of fluent reading. Recent electrophysiological evidence

shows that letter-speech sound integration takes years to fully automate and never

seems to reach adequate automatic integration in dyslexic persons, despite the fact

that all normal and dyslexic children know which letter belongs to which sound

within a year of reading instruction. It is therefore necessary to differentiate between

‘letter knowledge’ and the learning of cross-modal letter-sound integration for the

purpose of the development of fluent reading.

Letter-speech sound associations are cultural inventions and thus in principle arbitrary

audiovisual objects, of which the speech sound element may be regarded as familiar

in kind, but not in type, because the particular speech sounds which match directly

onto letters or letter strings only develop as a consequence of learning to read. The


integration of letters and speech sounds occurs mainly in the left STS and feedback to

the auditory cortex modulates the processing of speech, but only if both stimuli

are perceived in synchrony. The involvement of low level visual areas is however still

unclear and will be further evaluated in the following paragraphs. Since the processes

involved in learning letter-sound associations do not seem to coincide with the

mechanisms for associating natural audiovisual objects, we will further explore the

status of letter-sound associations as audiovisual objects.

3. Letter-speech sound associations as audiovisual objects

By clarifying the status of letter – speech sound pairs as audiovisual objects we might

also gain some insights into the foundations of normal and abnormal reading

development. We will therefore evaluate in what aspects letter – speech sound

processing resembles natural associations like audiovisual speech and artificial

audiovisual object processing of unfamiliar elements like flashes and beeps, and in

which aspects it differs from them.

3.1 Does letter-speech sound integration resemble natural audiovisual

integration?

Whenever another person speaks to us, we focus mainly on the auditory speech

signal. However, the lip movements accompanying the speech signal have been

shown to contribute substantially to the processing of audiovisual speech, e.g. by

improving the intelligibility of speech in noisy environments by 20 dB (Sumby &

Pollack, 1954). Audiovisual integration of communication signals may have occurred


early in evolution and the automation and strength of audiovisual speech integration

has been convincingly demonstrated with the famous McGurk-paradigm: The

auditory signal /ba/ is perceived as /da/ when synchronized to a face articulating /ga/

(McGurk & MacDonald, 1976). This visually induced auditory illusion has even been

found to be able to evoke a genuine auditory MMN (Colin et al., 2002; Möttönen,

Krause, Tiippana, & Sams, 2002; Sams et al., 1991), confirming the early and

automatic integration of audiovisual speech. A recent behavioural study also provided

evidence for influence in the other direction; i.e. from speech sounds to the perception

of lip-movements (Baart & Vroomen, 2010). The symmetrical influence of lip-

movements on speech perception as well as vice versa coincides well with the

symmetrical involvement of both low level auditory and low level visual brain areas

during audiovisual speech processing as found with fMRI (Calvert et al., 1999;

Calvert, Campbell, & Brammer, 2000; Macaluso, George, Dolan, Spence, & Driver,

2004). It is generally assumed that these low level auditory and visual responses

constitute feedback from multi-sensory integration sites, but feed forward connections

have also been suggested to contribute to audiovisual interactions. Finally,

audiovisual speech processing is characterized by being relatively insensitive to

temporal asynchrony between the visual and auditory signal; i.e. audiovisual speech

recognition remains robust for up to 300 ms asynchrony (Massaro, Cohen, & Smeele,

1996; Munhall, Gribble, Sacco, & Ward, 1996; Van Wassenhove, Grant, & Poeppel,

2007).

In contrast to audiovisual speech processing, no integration effects were found in low

level visual areas during passive letter – speech sound processing, while low level

auditory areas did show audiovisual integration effects (Van Atteveldt, Formisano,

Blomert, & Goebel, 2007; Van Atteveldt, Formisano, Goebel, & Blomert, 2007).


Although the involvement of low level visual areas was reported in two other studies

(Blau, van Atteveldt, Formisano, Goebel, & Blomert, 2008; Herdman et al., 2006), the

possibility that this may have occurred as a consequence of the use of an active task,

which has the potential of changing the relevance of incongruent and congruent

stimuli and thus the way the audiovisual object is processed, cannot be excluded (Van

Atteveldt, Formisano, Goebel, & Blomert, 2007). Furthermore, the use of degraded

visual stimuli in the study by Blau et al. may have induced early visual cortex

involvement.

To our knowledge, the role of low level visual areas during passive letter – speech

sound integration with a high temporal resolution method has not been investigated

before. In analogy to the use of the auditory MMN-paradigm to show influences of

letters on speech sound processing, we now used the visual counterpart of the MMN

(vMMN) in a crossmodal design to investigate the influences of speech sounds on

letter processing (Froyen, van Atteveldt, & Blomert, 2010). The vMMN is described

as a negativity measured at the occipital electrodes between 150 and 350 ms after

the onset of an infrequent (deviant) visual stimulus in a sequence of frequently

presented (standard) visual stimuli (Czigler, 2007; Pazo-Alvarez, Cadaveira, &

Amenedo, 2003). The vMMN is suggested to have similar properties to the aMMN: it

can be evoked pre-attentively and it reflects the use of a memory representation of

regularities of visual stimulation (Czigler, 2007).

We used this vMMN paradigm to investigate the exact opposite of what we did in the

previously described aMMN studies: The vMMN evoked by a deviant letter in a

visual-only experiment (Figure 3A) is compared with the vMMN evoked by the same


deviant letter accompanied by a standard speech sound (audiovisual, Figure 2). The

visual stimulation in both experiments is exactly the same: standard letter “a” and

deviant letter “o”. Results revealed no effect of speech sounds on the amplitude of the

vMMN to letter processing, and thus not a reversal of the effects of letters on speech

sounds as reported in our auditory MMN studies (Froyen, Bonte, van Atteveldt, &

Blomert, 2009; Froyen, van Atteveldt, Bonte, & Blomert, 2008). In addition, we also

presented a visual control stimulus “*” in order to control for any non-specific

crossmodal influences. Interestingly, the vMMN amplitude to non-letter processing

was significantly reduced in the cross-modal condition compared to the visual only

condition. Since the letter processing was not suppressed in the cross-modal

condition, this may indirectly reflect a content-related, letter-specific cross-modal

effect. It was hypothesized that the crossmodal suppression of non-letter processing

by speech sounds constituted a baseline effect, implying that the non-modulation of

the letter processing by speech sounds reflected a content-related crossmodal effect.

This would mean that letters were recognised as a relevant part of an audiovisual

object during processing in visual cortex. However, low level visual cortices did not

reveal an influence of speech sounds on letter processing as a consequence of cross-

modal integration. To summarize, these results indicate that speech sounds do not

automatically influence standard or deviant letter processing in a way comparable to

the automatic and systematic modulation of speech sound processing by letters.

Whereas low level auditory processing is automatically involved in letter – speech

sound integration (Froyen, van Atteveldt, Bonte, & Blomert, 2008), this does not

seem to hold for low level visual processing as was already indicated by previous

fMRI studies also using a passive task design (Van Atteveldt, Formisano, Blomert, &

Goebel, 2007; Van Atteveldt, Formisano, Goebel, & Blomert, 2007).
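By analogy with the auditory sketch given earlier, the vMMN comparison underlying this conclusion can be expressed schematically (Python/NumPy; the waveforms are simulated placeholders, not recorded data):

```python
import numpy as np

rng = np.random.default_rng(0)
times = np.arange(-100, 500)  # ms relative to letter onset

def simulated_occipital_erp(vmmn_amplitude):
    """Toy occipital ERP: noise plus a negativity at 150-350 ms."""
    erp = rng.normal(0, 0.05, times.size)
    window = (times >= 150) & (times <= 350)
    erp[window] -= vmmn_amplitude
    return erp

# Deviant-minus-standard difference waves for the two experiments. Under the
# reported result, adding a standard speech sound leaves the letter vMMN
# unchanged (no cross-modal modulation), so both amplitudes are equal here.
vmmn_visual_only = simulated_occipital_erp(1.0) - simulated_occipital_erp(0.0)
vmmn_audiovisual = simulated_occipital_erp(1.0) - simulated_occipital_erp(0.0)

window = (times >= 150) & (times <= 350)
print("visual-only vMMN mean (150-350 ms):", vmmn_visual_only[window].mean())
print("audiovisual vMMN mean (150-350 ms):", vmmn_audiovisual[window].mean())
```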


Secondly, both audiovisual speech and letter-speech sound integration resulted

in feedback to low level auditory cortex. It is therefore interesting to explore if this

influence differed, given the different origin of the respective associations, i.e. an

evolutionary versus a cultural background. Recall that the perception of lip-

movements could alter the perception of an unaltered speech sound (McGurk effect)

and even evoke an auditory MMN (Colin et al., 2002; Möttönen, Krause, Tiippana, &

Sams, 2002; Sams et al., 1991). We reasoned that if letter-sound integration is of a

comparable strength, it might in principle be possible that deviant letters violating a

standard letter may cause a change in the perception of an accompanying unaltered

speech sound. In analogy to the McGurk effect, we investigated if a deviant letter

violating a standard letter, but accompanied by an unaltered speech sound, also

invoked an auditory MMN: the standard speech sound was always /a/, presented

either with the standard letter “a” or the deviant letter “o”. If the incongruent deviant

letter “o” would evoke an auditory MMN, despite the standard speech sound /a/ being

unaltered, then this effect would be comparable to the McGurk effect and indicate that

the mechanism for letter – speech sound integration is comparable to audiovisual

speech integration. However, the deviant letter did not evoke an auditory MMN

(Froyen, van Atteveldt, & Blomert, 2010) pointing to an integration mechanism for

letter-speech sound pairs that is different from that for audiovisual speech. A further

argument for this difference may be inferred from the fact that all studies in which an

aMMN was evoked by a deviating visual part of an audiovisual stimulus employed an

auditory illusion (Besle, Fort, & Giard, 2005).


One last finding further supports the status of letter-sound pairs as a special type of

audiovisual objects: the feedback from STS to auditory cortex only occurred if letters

and speech sounds were presented in synchrony, although the time window for

integration in STS itself was rather wide (Van Atteveldt, Formisano, Blomert, &

Goebel, 2007), in agreement with the wide window for audiovisual speech. This

preference for a very narrow time window in which letters were allowed to modulate

speech sound processing in auditory cortex may be directly related to the finding of a

narrow time-window (<100 ms) during which congruency effects between letters and

speech sounds were found in the above reviewed aMMN studies (Froyen, Bonte, van

Atteveldt, & Blomert, 2009; Froyen, van Atteveldt, Bonte, & Blomert, 2008; Froyen,

Willems, & Blomert, in revision). This proximity-in-time principle governing letter-

speech sound integration, but not audiovisual speech, might be related to the lack of

shared characteristics between letters and speech sounds and thus to the inherently

arbitrary relation between them.

The nature of the link between the two modalities of a stimulus may thus not only be

critical for the automatic involvement of both low level sensory cortices (Calvert,

2001), but also for the time window of integration. When we see and hear speech, the

auditory speech signal shares time varying aspects with the concurrent lip movements

(Amedi, von Kriegstein, Van Atteveldt, Beauchamp, & Naumer, 2005; Calvert,

Brammer, & Iversen, 1998; Munhall & Vatikiotis-Bateson, 1998). These shared time

varying aspects constitute a strong natural cross-modal binding factor, which may

automatically recruit both low level sensory areas. Letters, however, are culturally

defined symbols without any natural relation with their corresponding speech sounds

and we hypothesize that the narrow temporal window of integration may compensate

for this lack of shared features. This effect of the type of link might also play a role in


the huge differences in learning rate and learning mode between both types of

audiovisual stimuli. While our MMN studies revealed a year-long development

towards automated letter-speech sound integration aided by explicit reading

instruction (Froyen, Bonte, van Atteveldt, & Blomert, 2009), audiovisual speech

integration is already observable very early in development (Burnham & Dodd, 2004)

without the requirement of explicit instructions. Burnham and Dodd reported that four

and a half month old infants showed habituation responses to the sound /da/ when

presented with an auditory /ba/ and the lip-movements of a /ga/, thus strongly

indicating audiovisual speech integration early in infancy. In further studies they

found that this early audiovisual speech effect did not change qualitatively during

development, but became increasingly more robust in children six, eight and eleven

years of age (Burnham & Sekiyama, in press; Sekiyama & Burnham, 2008).

Audiovisual speech integration is thus already measurable a few months after birth

and its magnitude increases during childhood development. This supports our

hypothesis that the human brain is well adapted to integrate naturally linked

audiovisual objects, but not in principle arbitrary audiovisual objects like letter –

speech pairs. This may explain why the learning of these non-natural audiovisual

objects may require explicit instruction and is much harder.

To conclude: letter – speech sound integration differs from audiovisual speech

integration in the involvement of low level visual areas, the type of integration

mechanism and the temporal window for integration. The arbitrary link between

letters and speech sounds might account for the recruitment of a different neural

mechanism with different properties than the neural mechanism involved in processing

naturally linked audiovisual stimuli like audiovisual speech. We may thus ask: Does


letter-sound association and integration resemble arbitrary associations of unfamiliar

audiovisual objects?

3.2 Do letter-speech sound associations resemble artificial audiovisual objects?

Many ERP-studies have investigated the exact timing of processing arbitrarily linked

unfamiliar audiovisual objects. Visual stimuli consisted, amongst others, of

geometrical figures like ellipses and circles, square wave gratings or a flash of a light-

emitting diode (LED). Auditory stimuli varied from rich tones shifting linearly in

frequency to tone pips or ‘pink’ noise bursts. It is generally accepted that the earliest

reliable interaction effects for such arbitrary audiovisual objects can be observed from

100 ms after stimulus onset onwards (Fort, Delpuech, Pernier, & Giard, 2002a, 2002b; Fort

& Giard, 2004; Giard & Peronnet, 1999; Molholm et al., 2002; Talsma, Doty, &

Woldorff, 2007; Talsma & Woldorff, 2005; Teder-Salejarvi, McDonald, Di Russo, &

Hillyard, 2002), which is in accordance with the timing of the MMN and the

crossmodal congruency effects reported in our MMN-studies on letter – speech sound

processing (Froyen, Bonte, van Atteveldt, & Blomert, 2009; Froyen, van Atteveldt,

Bonte, & Blomert, 2008; Froyen, Willems, & Blomert, in revision). Furthermore, the

temporal window for integrating arbitrarily linked audiovisual objects is very narrow;

i.e. within 25 to 50 ms (Lewald, Ehrensteinb, & Guski, 2001; Lewald & Guski, 2003;

Lewkowicz, 1996; Zampini, Guest, Shore, & Spence, 2005), again in accordance with

the narrow temporal window observed for letter – speech sound integration, less than

100 ms (Froyen, Bonte, van Atteveldt, & Blomert, 2009; Froyen, van Atteveldt,

Bonte, & Blomert, 2008; Froyen, Willems, & Blomert, in revision). It seems that, if a

visual and an auditory stimulus do not share any common features, proximity in time

is a necessary condition for audiovisual linking to occur. Letter-speech sound pairs


and artificial audiovisual objects are both characterized by a narrow time window for

audiovisual integration, probably as a consequence of the fact that the relation

between the elements of each audiovisual object is arbitrary.
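The contrast between the temporal windows reviewed in this and the preceding section can be summarized in a small sketch (the window sizes are the approximate values cited above, used as illustrative constants, not exact experimental cut-offs):

```python
# Approximate tolerated audiovisual asynchrony (ms) per object type,
# taken from the values cited in the text; illustrative constants only.
INTEGRATION_WINDOW_MS = {
    "audiovisual speech": 300,   # robust up to ~300 ms asynchrony
    "letter-speech sound": 100,  # congruency effects only below ~100 ms
    "artificial AV object": 50,  # flashes/beeps: roughly 25-50 ms
}

def integrates(object_type, asynchrony_ms):
    """Does an audiovisual pair of this type still bind at this asynchrony?"""
    return abs(asynchrony_ms) <= INTEGRATION_WINDOW_MS[object_type]

# A 200 ms letter-sound asynchrony falls outside the adult integration
# window, while audiovisual speech would still bind at that asynchrony.
print(integrates("letter-speech sound", 200))   # False
print(integrates("audiovisual speech", 200))    # True
```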

Letter-speech sound pairs, although arbitrarily linked by cultural convention, differ from artificial unfamiliar audiovisual objects in that they have become highly familiar to experienced readers. Two recent studies revealed that different brain regions are involved in the integration of familiar animal sounds and visual images versus arbitrarily linked unfamiliar artificial sounds and images: the inferior frontal cortex (IFC) was found to be involved in processing unfamiliar objects and incongruent familiar audiovisual objects, while the superior temporal sulcus (STS) was involved in processing familiar audiovisual stimuli (Hein et al., 2007; Naumer et al., 2009). The finding of STS as the integration site for letters and speech sounds (Blau, van Atteveldt, Formisano, Goebel, & Blomert, 2008; Van Atteveldt, Formisano, Blomert, & Goebel, 2007; Van Atteveldt, Formisano, Goebel, & Blomert, 2004; Van Atteveldt, Formisano, Goebel, & Blomert, 2007) is in line with these results, since letter-speech sound associations are highly over-learned and thus constitute familiar audiovisual objects for experienced readers. However, we did not find the frontal activations reported for incongruent (familiar) objects, since congruence and incongruence were both expressed in the feedback from STS to auditory cortex (Van Atteveldt, Formisano, Goebel, & Blomert, 2004) and were thus handled within the temporal integration network for letter-sound processing.

Although letter-speech sound associations become familiar audiovisual objects through reading experience, one may object that they constitute artificial unfamiliar audiovisual objects for a beginning reader. However, as explained in the introduction, this is only partly true, since only one element of a letter-sound pair is new and


unfamiliar, i.e. the visual letter. At age six, when children start learning to read, they are already familiar with auditory speech sounds. It is therefore particularly interesting that Hashimoto and Sakai (2004) investigated the neural changes accompanying the formation of new audiovisual associations between familiar Japanese speech sounds and unfamiliar Korean Hangul letters. In contrast with the studies by Naumer and colleagues, in which both the visual and the auditory stimuli were unfamiliar, Hashimoto and Sakai did not report differential effects in IFC, but in the left posterior inferior temporal gyrus (PITG) and the left parieto-occipital cortex (PO), and in the connection between these areas. When subjects saw familiar Japanese Kana letters and heard familiar Japanese speech sounds, activation was found in the STS region (Hashimoto & Sakai, 2004). Clearly, learning to associate unfamiliar letters with unrelated but familiar speech sounds engages a different neural network than the one for learning the unfamiliar arbitrary audiovisual objects used in the Hein et al. (2007) study.

In sum, the arbitrary link between letters and speech sounds probably accounts for the narrow temporal window for integration, as was also observed for the association of unfamiliar, arbitrary audiovisual stimuli. On the other hand, letter-speech sound pairs differ from unfamiliar arbitrary audiovisual objects, not only because they are highly familiar to experienced readers, but also because one of the elements of the association is already quite familiar to beginning readers. Consequently, a different mechanism is involved in processing these more or less familiar letter-speech sound pairs than in processing arbitrary unfamiliar audiovisual objects.

4. Conclusion


Fluency is the quintessence of skilled reading: it takes years to develop in normal readers and does not develop in disabled readers. A closer look at the beginnings of reading may shed light on the reasons for such an effortful and long learning process. During a first encounter with the written letters and words of an alphabetic orthography, a child has to learn to associate letters with speech sounds to enable reading acquisition. Recent electrophysiological evidence showed that it takes several years of reading instruction and practice before the first signs of automatic integration of letters and speech sounds appear in normally developing children. This gradual and highly systematic development of letter-speech sound integration was interpreted as a result of the dynamic interplay of brain maturation and reading experience. The validity of this interpretation was supported by strong correlations between the electrophysiological indices of letter-speech sound integration and behavioural reading performance. The present meta-study indicated that this protracted learning process in normal readers might be a consequence of the emergence of letter-sound pairs as a rather specific type of audiovisual object. The finding that dyslexic children and adults do not develop adequate and automatic integration of letters and speech sounds, despite years of reading training, also indicates a potential role for this multisensory learning process in reading failure.
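As an illustration of the kind of brain-behaviour correlation referred to above, the following sketch relates a per-child MMN-based integration index to a reading-fluency score. The data, variable names, sample size and effect size are all invented; it only shows the form of the analysis, not the original one.

```python
# Illustrative correlation between a hypothetical cross-modal MMN
# enhancement (an electrophysiological integration index) and a
# hypothetical behavioural reading-fluency measure. Synthetic data only.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n_children = 40
# Assumed cross-modal MMN enhancement in microvolts (AV minus A).
mmn_enhancement = rng.normal(1.0, 0.5, n_children)
# Assumed fluency in words per minute, partly driven by the MMN index.
reading_fluency = 40 + 15 * mmn_enhancement + rng.normal(0, 8, n_children)

r, p = stats.pearsonr(mmn_enhancement, reading_fluency)
print(f"r = {r:.2f}, p = {p:.4f}")
```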

Letter-speech sound associations are cultural inventions and therefore biologically arbitrary in nature. This arbitrariness stems from the lack of shared features between the elements that form the integrated audiovisual object. The integration of artificially linked objects is characterized by a narrow time window, which was indeed also found for letter-sound integration. However, letter-speech sound pairs differ from artificial unfamiliar audiovisual objects, because one element of the association is already familiar when the reading process starts and, furthermore, letter-


speech sound pairs become highly over-learned and thus familiar audiovisual objects in more experienced readers. Despite this familiarity, letter-speech sound pairs remain in principle arbitrary associations, differing in many respects from natural associations such as audiovisual speech. Although letter-speech sound pairs and audiovisual speech both show integration in the multi-sensory left superior temporal sulcus (STS), only the natural integration process automatically recruits both the unisensory auditory and the unisensory visual cortices. Letter-speech sound integration recruits only the auditory cortex, by means of a modulating feedback mechanism from STS. This modulating feedback, however, only occurred within a very narrow time window, emphasizing the basically arbitrary nature of letter-speech sound pairs, independent of their familiarity to experienced readers. This familiarity aspect is nevertheless important, since arbitrary, unfamiliar audiovisual objects are mainly processed in the inferior frontal cortex, whereas letter-speech sound pairs are not. In short, letters and speech sounds are integrated in a left temporal network involving STS/STG, but not the visual cortex, probably as a consequence of the development of familiarity. This integration only occurs within a very narrow time window, probably as a consequence of the arbitrary link between letters and speech sounds. Although letter-speech sound pairs thus share similarities with audiovisual speech as well as with unfamiliar arbitrary audiovisual objects, they seem to develop into unique audiovisual objects that, moreover, have to be processed in a unique way in order to enable fluent reading. Future research should provide insight into how far the unique multi-sensory nature of letters and words permeates the neural network for reading.


Acknowledgements

The main collaborators of our studies included in this review were Nienke van Atteveldt (ERP and fMRI studies) and Vera Blau, Elia Formisano and Rainer Goebel (fMRI studies). The main grants supporting this research were: Dutch Health Care Insurance Board (CVZ 608/001/2005) to Leo Blomert, and European Union 6th Framework Programme (LSHM/CT/2005/018696) to Leo Blomert and Rainer Goebel.


References

Amedi, A., von Kriegstein, K., van Atteveldt, N. M., Beauchamp, M. S., & Naumer, M. J., 2005. Functional imaging of human crossmodal identification and object recognition. Experimental Brain Research, 166, 559-571.

Baart, M., & Vroomen, J., 2010. Do you see what you are hearing? Cross-modal effects of speech sounds on lipreading. Neuroscience Letters, 471, 100-103.

Beauchamp, M. S., Argall, B. D., Bodurka, J., Duyn, J., & Martin, A., 2004. Unraveling multisensory integration: patchy organization within human STS multisensory cortex. Nature Neuroscience, 7, 1190-1192.

Besle, J., Fort, A., & Giard, M., 2005. Is the auditory sensory memory sensitive to visual information? Experimental Brain Research, 166, 337-344.

Bishop, D. V. M., 2007. Using mismatch negativity to study central auditory processing in developmental language and literacy impairments: Where are we, and where should we be going? Psychological Bulletin, 133, 651-672.

Blau, V., van Atteveldt, N., Ekkebus, M., Goebel, R., & Blomert, L., 2009. Reduced neural integration of letters and speech sounds links phonological and reading deficits in adult dyslexia. Current Biology, 19, 503-508.

Blau, V., Reithler, J., van Atteveldt, N., Seitz, J., Gerretsen, P., Goebel, R., & Blomert, L., 2010. Deviant processing of letters and speech sounds as proximate cause of reading failure: A functional magnetic resonance imaging study of dyslexic children. Brain, doi:10.1093/brain/awp308

Blau, V., van Atteveldt, N., Formisano, E., Goebel, R., & Blomert, L., 2008. Task-irrelevant visual letters interact with the processing of speech sounds in


heteromodal and unimodal cortex. European Journal of Neuroscience, 28(3),

500-509.

Blomert, L., & Vaessen, A., 2009. 3DM Differentiaal diagnose voor dyslexie: Cognitieve analyse van lezen en spelling [3DM Differential diagnostics for dyslexia: Cognitive analysis of reading and spelling]. Amsterdam: Boom Test Publishers.

Blomert, L., & Willems, G., in revision. Is there a causal link from a phonological awareness deficit to reading failure in children at familial risk for dyslexia?

Bonte, M., Mitterer, H., Zellagui, N., Poelmans, H., & Blomert, L., 2005. Auditory cortical tuning to statistical regularities in phonology. Clinical Neurophysiology, 116(12), 2765-2774.

Bonte, M., Poelmans, H., & Blomert, L., 2007. Deviant neurophysiological responses to phonological regularities in speech in dyslexic children. Neuropsychologia, 45, 1427-1437.

Burnham, D., & Dodd, B., 2004. Auditory-visual speech integration by prelinguistic infants: Perception of an emergent consonant in the McGurk effect. Developmental Psychobiology, 45, 204-220.

Burnham, D., & Sekiyama, K., in press. Investigating auditory-visual speech perception development using the ontogenetic and differential language methods. In E. Vatikiotis-Bateson, P. Perrier & G. Bailly (Eds.), Advances in auditory-visual speech processing. Cambridge: MIT Press.

Byrne, B., & Fielding-Barnsley, R., 1989. Phonemic awareness and letter knowledge in the child's acquisition of the alphabetic principle. Journal of Educational Psychology, 81, 313-321.


Calvert, G. A., 2001. Crossmodal processing in the human brain: insights from functional neuroimaging studies. Cerebral Cortex, 11(12), 1110-1123.

Calvert, G. A., Brammer, M. J., Bullmore, E. T., Campbell, R., Iversen, S. D., & David, A. S., 1999. Response amplification in sensory-specific cortices during crossmodal binding. Neuroreport, 10(12), 2619-2623.

Calvert, G. A., Brammer, M. J., & Iversen, S. D., 1998. Crossmodal identification. Trends in Cognitive Sciences, 2, 247-253.

Calvert, G. A., Bullmore, E. T., Brammer, M. J., Campbell, R., Williams, S. C., McGuire, P. K., et al., 1997. Activation of auditory cortex during silent lipreading. Science, 276(5312), 593-596.

Calvert, G. A., Campbell, R., & Brammer, M. J., 2000. Evidence from functional magnetic resonance imaging of crossmodal binding in the human heteromodal cortex. Current Biology, 10(11), 649-657.

Castro-Caldas, A., Petersson, K. M., Reis, A., Stone-Elander, S., & Ingvar, M., 1998. The illiterate brain. Learning to read and write during childhood influences the functional organization of the adult brain. Brain, 121(Pt 6), 1053-1063.

Chein, J. M., & Schneider, W., 2005. Neuroimaging studies of practice-related change: fMRI and meta-analytic evidence of a domain-general control network for learning. Cognitive Brain Research, 25, 607-623.

Cohen, L., Dehaene, S., Naccache, L., Lehericy, S., Dehaene-Lambertz, G., Henaff, M. A., et al., 2000. The visual word form area: spatial and temporal characterization of an initial stage of reading in normal subjects and posterior split-brain patients. Brain, 123(Pt 2), 291-307.

Colin, C., Radeau, M., Soquet, A., Demolin, D., Colin, F., & Deltenre, P., 2002. Mismatch negativity evoked by the McGurk-MacDonald effect: a phonetic


representation within short-term memory. Clinical Neurophysiology, 113(4), 495-506.

Coltheart, M., Rastle, K., Perry, C., Langdon, R., & Ziegler, J., 2001. DRC: A dual route cascaded model of visual word recognition and reading aloud. Psychological Review, 108, 204-256.

Csépe, V., 2003. Dyslexia: Different Brain, Different Behavior. New York: Kluwer Academic/Plenum Publishers.

Czigler, I., 2007. Visual mismatch negativity: Violation of nonattended environmental regularities. Journal of Psychophysiology, 21, 224-230.

Dijkstra, T., Schreuder, R., & Frauenfelder, U. H., 1989. Grapheme context effects on phonemic processing. Language and Speech, 32, 89-108.

Ehri, L. C., 1995. Phases of development in learning to read words by sight. Journal of Research in Reading, 18, 116-125.

Fort, A., Delpuech, C., Pernier, J., & Giard, M., 2002a. Dynamics of cortico-subcortical cross-modal operations involved in audio-visual object detection in humans. Cerebral Cortex, 12(10), 1031-1039.

Fort, A., Delpuech, C., Pernier, J., & Giard, M., 2002b. Early auditory-visual interactions in human cortex during nonredundant target identification. Cognitive Brain Research, 14(1), 20-30.

Fort, A., & Giard, M., 2004. Multiple electrophysiological mechanisms of audiovisual integration in human perception. In G. A. Calvert, C. Spence & B. E. Stein (Eds.), The Handbook of Multisensory Processes (pp. 503-513). Cambridge, MA: MIT Press.


Frith, U., 1985. Beneath the surface of developmental dyslexia. In K. E. Patterson, J. C. Marshall & M. Coltheart (Eds.), Surface dyslexia. London: Routledge & Kegan Paul.

Froyen, D., Bonte, M., van Atteveldt, N., & Blomert, L., 2009. The long road to automation: Neurocognitive development of letter-speech sound processing. Journal of Cognitive Neuroscience, 21, 567-580.

Froyen, D., de Doelder, N., & Blomert, L., submitted. Cross-modal letter-specific influences on speech sound processing.

Froyen, D., van Atteveldt, N., & Blomert, L., 2010. Exploring the role of low level visual processing in letter-speech sound integration: a visual MMN study. Frontiers in Integrative Neuroscience, 4(9).

Froyen, D., van Atteveldt, N., Bonte, M., & Blomert, L., 2008. Cross-modal enhancement of the MMN to speech sounds indicates early and automatic integration of letters and speech sounds. Neuroscience Letters, 430, 23-28.

Froyen, D., Willems, G., & Blomert, L., in revision. Evidence for a specific cross-modal binding deficit in dyslexia: An MMN-study of letter-speech sound processing.

Gabrieli, J. D. E., 2009. Dyslexia: A new synergy between education and cognitive neuroscience. Science, 325, 280-283.

Ghazanfar, A. A., & Schroeder, C. E., 2006. Is the neocortex essentially multisensory? Trends in Cognitive Sciences, 10, 278-285.

Giard, M. H., & Peronnet, F., 1999. Auditory-visual integration during multimodal object recognition in humans: a behavioral and electrophysiological study. Journal of Cognitive Neuroscience, 11(5), 473-490.


Hashimoto, R., & Sakai, K. L., 2004. Learning letters in adulthood: direct visualization of cortical plasticity for forming a new link between orthography and phonology. Neuron, 42(2), 311-322.

Hein, G., Doehrmann, O., Müller, N. G., Kaiser, J., Muckli, L., & Naumer, M. J., 2007. Object familiarity and semantic congruency modulate responses in cortical audiovisual integration areas. The Journal of Neuroscience, 27, 7881-7887.

Herdman, A. T., Fujioka, T., Chau, W., Ross, B., Pantev, C., & Picton, T. W., 2006. Cortical oscillations related to processing congruent and incongruent grapheme-phoneme pairs. Neuroscience Letters, 399, 61-66.

Hocking, J., & Price, C. J., 2008. The role of the posterior superior temporal sulcus in audiovisual processing. Cerebral Cortex, 18, 2439-2449.

Jusczyk, P. W., 1997. The discovery of spoken language. Cambridge, MA: MIT Press.

Kujala, T., & Näätänen, R., 2001. The mismatch negativity in evaluating central auditory dysfunction in dyslexia. Neuroscience and Biobehavioral Reviews, 25(6), 535-543.

Lewald, J., Ehrenstein, W. H., & Guski, R., 2001. Spatio-temporal constraints for auditory-visual integration. Behavioural Brain Research, 121(1-2), 69-79.

Lewald, J., & Guski, R., 2003. Cross-modal perceptual integration of spatially and temporally disparate auditory and visual stimuli. Cognitive Brain Research, 16, 468-478.

Lewkowicz, D. J., 1996. Perception of auditory-visual temporal synchrony in human infants. Journal of Experimental Psychology: Human Perception and Performance, 22, 1094-1106.


Liberman, I. Y., 1973. Segmentation of the spoken word and reading acquisition. Bulletin of the Orton Society, 23, 65-77.

Lieberman, P., 2006. Toward an evolutionary biology of language. Cambridge, MA: MIT Press.

Macaluso, E., George, N., Dolan, R., Spence, C., & Driver, J., 2004. Spatial and temporal factors during processing of audiovisual speech: a PET study. NeuroImage, 21, 725-732.

Marsh, G., Friedman, M., Welch, V., & Desberg, P., 1981. A cognitive-developmental theory of reading acquisition. In G. E. MacKinnon & T. G. Waller (Eds.), Reading research: advances in theory and practice. New York: Academic Press.

Massaro, D. W., Cohen, M. M., & Smeele, P. M., 1996. Perception of asynchronous and conflicting visual and auditory speech. Journal of the Acoustical Society of America, 100, 1777-1786.

Mattingly, I. G., 1972. Reading, the linguistic process, and linguistic awareness. In J. F. Kavanagh & I. G. Mattingly (Eds.), Language by ear and by eye: The relationship between speech and reading. Cambridge, MA: MIT Press.

McCandliss, B. D., Cohen, L., & Dehaene, S., 2003. The Visual Word Form Area: expertise for reading in the fusiform gyrus. Trends in Cognitive Sciences, 7(7), 293-299.

McGurk, H., & MacDonald, J., 1976. Hearing lips and seeing voices. Nature, 264, 746-748.

Mitterer, H., & Blomert, L., 2003. Coping with phonological assimilation in speech perception: evidence for early compensation. Perception & Psychophysics, 65, 956-969.


Molholm, S., Ritter, W., Murray, M. M., Javitt, D. C., Schroeder, C. E., & Foxe, J. J., 2002. Multisensory auditory-visual interactions during early sensory processing in humans: a high-density electrical mapping study. Cognitive Brain Research, 14(1), 115-128.

Morais, J., Cary, L., Alegria, J., & Bertelson, P., 1979. Does awareness of speech as a sequence of phones arise spontaneously? Cognition, 7, 323-331.

Möttönen, R., Krause, C. M., Tiippana, K., & Sams, M., 2002. Processing of changes in visual speech in the human auditory cortex. Cognitive Brain Research, 13(3), 417-425.

Munhall, K., Gribble, P., Sacco, L., & Ward, M., 1996. Temporal constraints on the McGurk effect. Perception & Psychophysics, 58(3), 351-362.

Munhall, K., & Vatikiotis-Bateson, E., 1998. The moving face during speech communication. In R. Campbell, B. Dodd & D. Burnham (Eds.), Hearing by Eye, Part 2: The psychology of speechreading and audiovisual speech (pp. 123-139). London, UK: Taylor & Francis, Psychology Press.

Näätänen, R., 1995. The Mismatch Negativity: A powerful tool for cognitive neuroscience. Ear and Hearing, 16, 6-18.

Näätänen, R., 2000. Mismatch negativity (MMN): perspectives for application. International Journal of Psychophysiology, 37, 3-10.

Näätänen, R., 2001. The perception of speech sounds by the human brain as reflected by the mismatch negativity (MMN) and its magnetic equivalent (MMNm). Psychophysiology, 38, 1-21.

Naumer, M. J., Doehrmann, O., Müller, N. G., Muckli, L., Kaiser, J., & Hein, G., 2009. Cortical plasticity of audio-visual object representations. Cerebral Cortex, 19, 1641-1653.


Pazo-Alvarez, P., Cadaveira, F., & Amenedo, E., 2003. MMN in the visual modality: a review. Biological Psychology, 63(3), 199-236.

Perfetti, C. A., Beck, I., Bell, L., & Hughes, C., 1987. Phonemic knowledge and learning to read are reciprocal: A longitudinal study of first grade children. Merrill-Palmer Quarterly, 33, 283-319.

Picton, T. W., Alain, C., Otten, L., Ritter, W., & Achim, A., 2000. Mismatch negativity: different water in the same river. Audiology & Neuro-Otology, 5, 111-139.

Raij, T., 1999. Patterns of brain activity during visual imagery of letters. Journal of Cognitive Neuroscience, 11, 282-299.

Raij, T., Uutela, K., & Hari, R., 2000. Audiovisual integration of letters in the human brain. Neuron, 28, 617-625.

Rayner, K., & Pollatsek, A., 1989. The psychology of reading. New Jersey: Prentice-Hall.

Sams, M., Aulanko, R., Hamalainen, M., Hari, R., Lounasmaa, O. V., Lu, S. T., et al., 1991. Seeing speech: visual information from lip movements modifies activity in the human auditory cortex. Neuroscience Letters, 127(1), 141-145.

Schröger, E., 1998. Measurement and interpretation of the mismatch negativity. Behavior Research Methods, Instruments, & Computers, 30, 131-145.

Sekiyama, K., & Burnham, D., 2008. Impact of language on development of auditory-visual speech perception. Developmental Science, 11, 303-317.

Share, D. L., 1995. Phonological recoding and self-teaching: sine qua non of reading acquisition. Cognition, 55, 151-218.

Sumby, W. H., & Pollack, I., 1954. Visual contribution to speech intelligibility in noise. The Journal of the Acoustical Society of America, 26(2), 212-215.


Talsma, D., Doty, T. J., & Woldorff, M. G., 2007. Selective attention and audiovisual integration: Is attending to both modalities a prerequisite for early integration? Cerebral Cortex, 17, 679-690.

Talsma, D., & Woldorff, M. G., 2005. Selective attention and multisensory integration: Multiple phases of effects on the evoked brain activity. Journal of Cognitive Neuroscience, 17, 1098-1114.

Teder-Salejarvi, W. A., McDonald, J. J., Di Russo, F., & Hillyard, S. A., 2002. An analysis of audio-visual crossmodal integration by means of event-related potential (ERP) recordings. Cognitive Brain Research, 14(1), 106-114.

UNESCO, 2005. Education for all: Literacy for life. Paris: UNESCO Publishing.

Vaessen, A., & Blomert, L., 2010. Long-term cognitive dynamics of fluent reading development. Journal of Experimental Child Psychology, 105, 213-231.

Van Atteveldt, N., Formisano, E., Blomert, L., & Goebel, R., 2007. The effect of temporal asynchrony on the multisensory integration of letters and speech sounds. Cerebral Cortex, 17, 962-974.

Van Atteveldt, N., Formisano, E., Goebel, R., & Blomert, L., 2004. Integration of letters and speech sounds in the human brain. Neuron, 43, 271-282.

Van Atteveldt, N., Formisano, E., Goebel, R., & Blomert, L., 2007. Top-down task effects overrule automatic multisensory responses to letter-sound pairs in auditory association cortex. NeuroImage, 36, 1345-1360.

Van Wassenhove, V., Grant, K. W., & Poeppel, D., 2007. Temporal window of integration in auditory-visual speech perception. Neuropsychologia, 45, 598-607.


Wimmer, H., Landerl, K., Linortner, R., & Hummer, P., 1991. The relationship of phonemic awareness to reading acquisition: More consequence than precondition but still important. Cognition, 40, 219-249.

Winkler, I., Kujala, A., Tiitinen, H., Sivonen, P., Alku, P., Lehtokoski, A., et al., 1999. Brain responses reveal the learning of foreign language phonemes. Psychophysiology, 36, 638-642.

Zampini, M., Guest, S., Shore, D., & Spence, C., 2005. Audio-visual simultaneity judgments. Perception & Psychophysics, 67, 531-544.

Ziegler, J. C., & Goswami, U., 2005. Reading acquisition, developmental dyslexia, and skilled reading across languages: a psycholinguistic grain size theory. Psychological Bulletin, 131(1), 3-29.


Figure 1.

Design of the audiovisual MMN studies with the auditory only condition (A) and the

audiovisual condition (B). “A” represents the auditory stimulus presentation, “V”

represents the visual stimulus presentation. The arrow indicates the violation of the

standard speech sound in the auditory condition (A) and the double violation of both

the standard speech sound and the letter in the audiovisual condition (B).
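A minimal sketch of how an oddball trial sequence with this structure could be generated, assuming a standard speech sound /a/, a deviant /o/ and the letter 'a' as the visual stimulus; the deviant probability and trial count are illustrative choices of this example, not parameters of the original studies.

```python
# Generate oddball trial lists for an auditory-only condition (A) and an
# audiovisual condition (B). In the AV condition the letter is always the
# standard 'a', so a deviant sound is a double (sound + letter) violation.
import random

def oddball_sequence(n_trials=400, p_deviant=0.125, audiovisual=False, seed=1):
    """Return a list of trials; each trial is (auditory, visual-or-None)."""
    rng = random.Random(seed)
    trials = []
    for _ in range(n_trials):
        sound = "/o/" if rng.random() < p_deviant else "/a/"
        letter = "a" if audiovisual else None
        trials.append((sound, letter))
    return trials

auditory_only = oddball_sequence(audiovisual=False)  # condition A
audiovisual = oddball_sequence(audiovisual=True)     # condition B

print(auditory_only[:5])
print(sum(s == "/o/" for s, _ in audiovisual))       # number of double violations
```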


Figure 2.

Design of the audiovisual MMN studies with the auditory only condition (A) and the

audiovisual condition (B). “A” represents the auditory stimulus presentation, “V”

represents the visual stimulus presentation. The arrow indicates the violation of the

standard speech sound in the auditory condition (A) and the double violation of both

the standard speech sound and the letter in the audiovisual condition (B).


Figure 3.

Mean amplitudes of the visual MMN averaged over the three occipital electrodes (O1,

O2 and Oz).
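For concreteness, the following sketch computes, on synthetic data, the quantity summarized in Figure 3: the deviant-minus-standard visual MMN difference wave averaged over the occipital electrodes O1, O2 and Oz. The array shapes, channel order and injected effect are assumptions of this example, not of the original study.

```python
# Visual MMN: average epochs per condition, subtract standard from deviant,
# then average the difference wave over the three occipital channels.
import numpy as np

channels = ["O1", "O2", "Oz", "Pz", "Cz", "Fz"]   # assumed channel layout
n_epochs, n_channels, n_times = 100, len(channels), 300

rng = np.random.default_rng(3)
standard = rng.normal(0, 1, (n_epochs, n_channels, n_times))
deviant = rng.normal(0, 1, (n_epochs, n_channels, n_times))
deviant[:, :3, 100:150] -= 1.5     # inject a toy MMN at the occipital sites

difference = deviant.mean(axis=0) - standard.mean(axis=0)
occipital = [channels.index(ch) for ch in ("O1", "O2", "Oz")]
vmmn = difference[occipital].mean(axis=0)
print(f"peak visual MMN amplitude: {vmmn.min():.2f} (arbitrary units)")
```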
