Multi-sensory learning and learning to read
Leo Blomert, Dries Froyen
PII: S0167-8760(10)00169-8
DOI: 10.1016/j.ijpsycho.2010.06.025
Reference: INTPSY 10166
To appear in: International Journal of Psychophysiology
Please cite this article as: Blomert, Leo, Froyen, Dries, Multi-sensory learning and learning to read, International Journal of Psychophysiology (2010), doi:10.1016/j.ijpsycho.2010.06.025
This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
Keynote #3
Multi-sensory learning and learning to read
Leo Blomert & Dries Froyen
Department of Cognitive Neuroscience & Maastricht Brain Imaging Centre
Faculty of Psychology & Neuroscience, Maastricht University, The Netherlands
Running head: Multi-sensory learning and learning to read
Key words: letters and speech sound correspondences; multi-sensory processing;
audiovisual integration; reading development
Correspondence to
Leo Blomert Ph.D.
Faculty of Psychology & Neuroscience
Maastricht University
P.O. Box 616
6200 MD Maastricht
The Netherlands
Tel.: +31-43-3881949
e-mail: [email protected]
Abstract
The basis of literacy acquisition in alphabetic orthographies is the learning of the
associations between the letters and the corresponding speech sounds. In spite of this
primacy in learning to read, there is only scarce knowledge of how this audiovisual
integration process works and which mechanisms are involved. Recent
electrophysiological studies of letter – speech sound processing have revealed that
normally developing readers take years to automate these associations and dyslexic
readers hardly exhibit automation of these associations. It is argued that the reason for
this effortful learning may reside in the nature of the audiovisual process that is
recruited for the integration of in principle arbitrarily linked elements. It is shown that
letter-speech sound integration does not resemble the processes involved in the
integration of natural audiovisual objects such as audiovisual speech. The automatic
symmetrical recruitment of the assumedly uni-sensory visual and auditory cortices in
audiovisual speech integration does not occur for letter and speech sound integration.
It is also argued that letter-speech sound integration only partly resembles the
integration of arbitrarily linked unfamiliar audiovisual objects. Letter-sound
integration and artificial audiovisual objects share the necessity of a narrow time
window for integration to occur. However, they differ from these artificial objects,
because they constitute an integration of partly familiar elements which acquire
meaning through the learning of an orthography. Although letter – speech sound pairs
share similarities with audiovisual speech processing as well as with unfamiliar,
arbitrary objects, it seems that letter – speech sound pairs develop into unique
audiovisual objects that furthermore have to be processed in a unique way in order to
enable fluent reading and thus very likely recruit other neurobiological learning
mechanisms than those involved in learning natural or arbitrary unfamiliar
audiovisual associations.
1. The role of multi-sensory learning in learning to read
The importance of written language in today’s society can hardly be overestimated. It
allows us to transcend our communication limits in space and time and proficient
literacy has thus become a crucial marker of the quality of life (UNESCO, 2005). In
the last decade neuroimaging studies have identified a brain region that shows
specialisation for fast visual word recognition; i.e. the putative Visual Word Form
Area (Cohen et al., 2000) in the occipito-temporal cortex. Since fluency and
automaticity are the most salient features of experienced reading it is indeed plausible
that a neural network involved in visual object recognition has specialised for
recognising visual letters and word forms (McCandliss, Cohen, & Dehaene, 2003).
Fluency of reading, however, is an intriguing characteristic, since we need years of
explicit instruction and practice before we start to exhibit any fluency in visual word
recognition (Vaessen & Blomert, 2010) and persons with dyslexia may never attain a
really fluent reading performance (Gabrieli, 2009). This long-lasting process contrasts
sharply with the way we learn to master spoken language. Infants and young children
start to pick up and develop the many complexities of spoken language without
explicit instruction at a time when literacy instruction is still a potential event in the far
future (e.g., Jusczyk, 1997). So we may ask the question: What makes learning to
read so effortful?
One obvious answer to this question is that writing systems have only emerged fairly
recently in evolution; i.e. a few thousand years ago and for the majority of people
only a few hundred years ago (Rayner & Pollatsek, 1989). In contrast, spoken
communication is an old habit, and speech probably arose some 60,000 years ago
(Lieberman, 2006). Therefore, it is very likely that our brains are evolutionarily
prepared for speech, but not for learning to read. So, we may further ask: Which
mechanisms enable the brain to learn to read?
A first hint may come from the fact that spoken language development not only
precedes the learning of written language evolutionarily, but also ontogenetically: we
speak and listen before we write and read. To find an answer we thus need a closer
look at the beginnings of reading. The very first step in learning to read is establishing
associations between letters and speech sounds (Frith, 1985; Marsh, Friedman, Welch,
& Desberg, 1981). In alphabetic languages written words are created out of a limited
set of elements, i.e. letters. These letters purportedly represent their spoken
counterparts, i.e. speech sounds. Since children have already mastered spoken
language to a considerable degree when they enter school, it has been suggested that
written language builds on the spoken language system, particularly on the
mechanisms for processing speech sounds (Liberman, 1973; Mattingly, 1972). By
now it has been generally accepted that the ability to manipulate speech sounds and
learning to read reciprocally influence each other during reading development
(Perfetti, Beck, Bell, & Hughes, 1987). It has indeed been shown that the brain
responses to phonological stimulation of healthy illiterates differed from the responses
of normal readers (Castro-Caldas, Petersson, Reis, Stone-Elander, & Ingvar, 1998).
This influence from learning to read on spoken phonological representations may
occur because, before learning to read, the smallest elements of the spoken language
system are not isolated speech sounds or phonemes, but larger chunks of sound
information which do not directly map onto the newly learned letters (Ziegler &
Goswami, 2005). So, although the impression may be that the learning of an
orthography directly connects to the already existing phonological representations of
speech sounds, this may in effect consist of a re-shaping of the relevant spoken
language elements, thus permanently changing the spoken language system. Although
the mechanisms enabling these cross-modal influences are still unknown, we
hypothesize that the formation of letter-sound associations very likely constitutes the
vehicle via which learning to read changes spoken phonological representations.
Considering the importance of letter – speech sound associations for learning to read,
we may again rephrase our question to: How does the brain establish associations
between letters and speech sounds?
Recent research has made clear that multi-sensory information processing is part and
parcel of object perception and recognition in daily life, whereby the brain integrates
the information from different modalities into a coherent percept (Ghazanfar &
Schroeder, 2006). Neuroimaging research pointed to the superior temporal sulcus
(STS) as an important brain area for audiovisual integration processes (e.g.
Beauchamp, Argall, Bodurka, Duyn, & Martin, 2004; Calvert, 2001). Although this
latter study mainly found activations for meaningful objects like animals and tools in
this area, it was speculated that STS might also be instrumental for other kinds of
audiovisual associations (see Hocking & Price, 2008 for a similar interpretation). The
findings of a posterior temporal network for audiovisual speech processing (lip-
reading), including the assumedly uni-sensory visual and auditory sensory cortices
(e.g. Calvert, Brammer, & Iversen, 1998; Calvert et al., 1997) also sparked interest in
letter-speech sound pairs as a special kind of audiovisual objects (Hashimoto & Sakai,
2004; Herdman et al., 2006; Raij, Uutela, & Hari, 2000). The interest in letters and
their corresponding speech sounds partly stems from the fact that they are recent
cultural inventions, sharply contrasting with e.g. audiovisual speech. The attraction
also partly resides in the fact that letter-speech sound pairs are highly over-learned
multi-sensory associations allowing people to manipulate the congruency between the
elements without activating higher order cognitive processes. Letter-speech sound
pairs may thus be conceptualized as in principle arbitrary associations, which acquire
meaning by learning a specific orthography. Although, as discussed above, one
element of these associations seems already in place when learning to read starts, this
is only partly true: the exact phonemes corresponding to letters or letter strings are not
part of the neural and behavioural repertoire of spoken language before learning to
read (Blomert & Willems, in revision; Morais, Cary, Alegria, & Bertelson, 1979;
Wimmer, Landerl, Linortner, & Hummer, 1991) and existing representations of
speech sounds probably need a fundamental re-modelling to fit the requirements for
adequate letter-speech sound associations. Thus, it is more apt to say that one
element of the pair, i.e., phonemic speech sounds, is familiar in kind but not in type,
when learning to read starts. These considerations point to the potentially ambiguous
status of letter-sound pairs as audiovisual objects: Do letter-speech sound associations
resemble natural audiovisual objects with known elements or do they resemble
arbitrary associations between unfamiliar elements? And lastly, is it possible that the
type of association (natural versus artificial) and the way these associations are used
during reading implies different association mechanisms?
Before we review the findings on the neural correlates of letter-speech sound
processing, we first need to clarify our concept of letter-sound associations. Although
influential models of reading (Coltheart, Rastle, Perry, Langdon, & Ziegler, 2001) and
reading development (Ehri, 1995; Share, 1995) implemented a central role for the
learning of letter-speech sound relations and its role in the learning of new words, any
fundamental insights into the nature and workings of this association process are
basically missing. This lack of basic research findings might, in part, be attributed to
the widespread opinion that the associations between letters and speech sounds are
mastered within a few months by most children in most alphabetic orthographies
(e.g., Ziegler & Goswami, 2005). The assumed fast and easy learning process would
by implication transfer the burden for explaining effortful and long-lasting fluent
reading development to other processes. Therefore we want to emphasize the
difference between the learning of letter speech sound associations and “letter
knowledge” or “letter-sound knowledge”, as “it is possible in principle for a child to
know the modal pronunciations for all letters and still have not in place any notion
that these sounds are parts of words” (Byrne & Fielding-Barnsley, 1989). Recent
evidence indeed showed that even dyslexic children, who exhibited serious problems
learning to read and learning letter-speech sound associations, nevertheless showed
full letter-knowledge mastery just like their normal reading peers towards the end of
first grade (Blomert & Willems, in revision). Furthermore, it was recently shown that
the time needed for letter-sound association processing systematically decreased over the full
range of primary school grades without reaching a floor in sixth grade in normal
readers (Blomert & Vaessen, 2009) suggesting an ongoing automation of these
associations (Chein & Schneider, 2005). The salient differences in learning rate
between letter knowledge and letter-speech sound associations suggest that an
exploration of the type of audiovisual association, which is formed when learning
letter-speech sound correspondences may provide key insights for understanding
reading development.
2.1 Magneto-encephalographic (MEG) insights into letter-speech sound associations
A rare early behavioural study revealed a first basic insight into letter-speech sound
processing by investigating the influence of letter primes on the recognition of a
speech sound in spoken syllables (Dijkstra, Schreuder, & Frauenfelder, 1989).
Subjects were asked to identify the vowel in a spoken syllable consisting of a
consonant and a vowel. The target vowel was primed by a letter prime that was either
congruent or incongruent with the target. The results showed clear decreases in
response latencies when prime and target were congruent, thus indicating automatic
cross-modal activation of speech sounds by letters.
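The priming design just described can be made concrete with a minimal sketch of the trial list it implies, in which each spoken vowel target is preceded by a congruent or incongruent letter prime. The vowel set, trial counts, and field names below are illustrative assumptions, not parameters reported by Dijkstra and colleagues.

```python
import random

def make_priming_trials(vowels=("a", "o", "e"), n_per_cell=10, seed=0):
    """Build a trial list for a letter-prime / spoken-vowel identification task.

    Each trial pairs a visual letter prime with a spoken consonant-vowel
    target; the prime is either congruent (same vowel as the target) or
    incongruent (a different vowel). Sketch only: vowel set, counts and
    field names are assumptions, not taken from the original study.
    """
    rng = random.Random(seed)
    trials = []
    for target in vowels:
        for _ in range(n_per_cell):
            # Congruent trial: the prime letter matches the target vowel.
            trials.append({"prime": target, "target": target, "congruent": True})
            # Incongruent trial: the prime letter is a different vowel.
            foil = rng.choice([v for v in vowels if v != target])
            trials.append({"prime": foil, "target": target, "congruent": False})
    rng.shuffle(trials)
    return trials
```

The prediction tested in such a design is then simply that mean identification latency is lower on congruent than on incongruent trials.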
It took another decade before a study appeared which investigated the neural
correlates of the automatic audiovisual integration of letters and speech sounds with
an emphasis on its temporal dynamics (Raij, Uutela, & Hari, 2000). This magneto-
encephalographic (MEG) study reported no letter–specific cross-modal interaction
effects in temporal sensory specific cortices. A first difference between the processing
of letters (matching and non-matching) and non-letter control stimuli was recorded in
the temporo-occipital-parietal junction around 225 milliseconds after stimulus onset.
Only the superior temporal sulci (STS) revealed strong interactions between
letters and speech sounds 380-450 ms after stimulus onset. It is noteworthy that the
interactions in the right hemisphere started almost 70 ms later than in the left STS,
suggesting that the audiovisual integration process of letters and speech sounds
mainly and first occurred in left STS. These results thus established that STS also
served as a convergence and integration site for arbitrarily linked objects like letters
and speech sounds. The involvement of the visual and auditory cortices, which was
reported for audiovisual speech processing (Calvert, 2001), was not confirmed for
these culturally defined letter – speech sound associations.
A recent MEG study investigated changes in cortical oscillations consequent upon the
perception of congruent and incongruent letter-speech sound pairs (Herdman et al.,
2006). Subjects saw Japanese Hiragana graphemes, which have a fully transparent
relation with the auditorily presented corresponding phonemes, and were asked to press
a button to indicate congruency or incongruency between the letters and speech
sounds. The reaction times showed faster responses for congruent pairs, confirming
the findings of the early behavioural study (Dijkstra, Schreuder, & Frauenfelder,
1989). The MEG results showed greater response power to congruent letter-sound
pairs than incongruent ones in the left auditory cortex, and a later congruency effect in
the bilateral visual cortices. Although the latter effect might also indicate that
phonemes directly influence the processing of a letter, the authors suggested that this
was due to feedback from multi-sensory integration sites like STS given the later time
window. The fact that these activations in ‘sensory specific cortices’ were not
reported in the earlier MEG study by Raij et al. (2000) was attributed to the fact that
this study relied on minimum current estimates to estimate the evoked fields, whereas
their own MEG study capitalized on total change in signal power of neural activation,
a method closer to the BOLD signal as measured with fMRI. This might explain why
in the study by Herdman and co-authors auditory cortex involvement was found, as
was also reported in an earlier fMRI study of letter – speech sound processing (Van
Atteveldt, Formisano, Goebel, & Blomert, 2004).
Together, these studies convincingly point to an automated cross-modal process for
integrating letters and speech sounds with a focus in the left temporal cortex.
However, the role of low level sensory specific cortices in the integration process
needs further clarification.
2.2 Functional magnetic resonance imaging (fMRI) insights into letter – speech sound associations
A first basic insight in the neural network for processing letters and speech sounds
was reported in an fMRI study manipulating the congruency of letter-sound pairs (Van
Atteveldt, Formisano, Goebel, & Blomert, 2004). Adult experienced readers were
asked to watch letters and listen to speech sounds presented simultaneously without
having to execute a task. The results revealed that areas in STS not only responded to
isolated letters and speech sounds, but also showed an enhanced response to bimodal
stimulation whether congruent or incongruent, thus confirming the role of STS as an
audiovisual integration site also for in principle arbitrary associations. Interestingly,
the ‘sensory specific’ response from auditory cortex showed an enhancement to
congruent and a suppression to incongruent letter-sound pairs in comparison to the
processing of speech sounds alone. This modulating influence of letters on speech
sound processing, presumably as a consequence of feedback from STS, was
reminiscent of the activation patterns in temporal cortex for audiovisual speech
(Calvert, 2001). Since the time window for the integration of audiovisual speech in
STS was reportedly rather wide (Massaro, Cohen, & Smeele, 1996; Munhall, Gribble,
Sacco, & Ward, 1996; Van Wassenhove, Grant, & Poeppel, 2007), we tested if this
was also the case for associations like letter-speech sound correspondences. An fMRI
study in which we systematically varied the stimulus onset asynchrony (SOA)
between the letters and the speech sounds confirmed the wide temporal window for
integration in STS, but interestingly, feedback to auditory cortex only occurred if the
stimuli were presented simultaneously (Van Atteveldt, Formisano, Blomert, &
Goebel, 2007). Although the earlier fMRI studies investigating audiovisual speech
and also the MEG study by Herdman and co-authors investigating letter-sound
associations reported feedback to visual cortex, we did not find congruency effects
and thus no support for potential feedback to visual cortices. It should, however, be
noted that the MEG letter-sound study used an active task (subjects had to decide if
letters and speech sounds were congruent or incongruent by button press) whereas the
fMRI study used a passive design (subjects did not have to do anything other than
watch letters and listen to speech sounds). It is therefore relevant to report the results
of a third fMRI study in which the congruency effect found for the passive task in
auditory cortex disappeared in the active task (Van Atteveldt, Formisano, Goebel, &
Blomert, 2007). The disappearance of the congruency effect probably resulted from
the fact that congruent and incongruent letter-sound pairs become equally relevant in
a decision task, whereas only the congruent pairs have relevance during more natural
processing. This indicates that the passive design is the more ecologically valid design
for triggering reading related processes.
In summary: The results of the MEG and fMRI studies confirm that STS is not only
an integration site for natural associations, but also for in principle arbitrary
associations such as letter-speech sound pairs. The difference between types of cross-
modal associations so far seems to reside in bidirectional feedback from hetero-modal
integration sites to visual and auditory cortices in the case of natural associations like
audiovisual speech, whereas it is still unclear if there is similar or asymmetric
feedback (mainly to auditory cortex) in the case of letter-speech sound associations.
2.3 Electrophysiological (MMN) insights into letter – speech sound associations
The cross-modal MMN paradigm for letter-speech sound associations
Since the time course of letter-sound processing and the absence of a task proved
critical for revealing neural pathways for the association and/or integration of letters
and speech sounds we chose a method of investigation characterized by a high
temporal resolution and which does not require a task: the auditory MisMatch
Negativity (MMN) paradigm (Näätänen, 1995). The MMN is an automatic deviance
detection mechanism, known to be evoked between 100 and 200 ms when, in a
sequence of auditory stimuli, a rarely presented sound (the deviant) deviates in one or
more aspects from the sound that is frequently presented (the standard). The MMN is
considered to reflect the neurophysiological correlate of a comparison process
between an incoming auditory stimulus and the memory trace formed by the repetitive
aspect of the standard stimulus (Näätänen, 2000; Picton, Alain, Otten, Ritter, &
Achim, 2000; Schröger, 1998). The MMN has repeatedly been shown to be sensitive
not only to auditory deviants, but also to language-specific speech sound
representations in adults and in children (Bonte, Mitterer, Zellagui, Poelmans, &
Blomert, 2005; Bonte, Poelmans, & Blomert, 2007; Mitterer & Blomert, 2003;
Näätänen, 2001; Winkler et al., 1999). The child MMN is furthermore suggested to be
a stable component resembling the adult MMN (Csépe, 2003), and is particularly
useful for research with children, because its evocation does not require sustained
attention or completion of a task. Moreover, the MMN has been used before to
investigate phonological and auditory processing deficits in children and adults with
reading disabilities (Bishop, 2007; Bonte, Poelmans, & Blomert, 2007; Csépe, 2003;
Kujala & Näätänen, 2001).
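The deviance-detection logic behind the MMN can be illustrated with a minimal sketch: the MMN is typically quantified as the difference wave between the averaged deviant and standard ERPs, with its peak sought in roughly the 100-200 ms post-onset window mentioned above. The sampling grid and amplitudes below are toy values chosen for illustration.

```python
def mmn_difference_wave(deviant_erp, standard_erp):
    """Difference wave: averaged deviant ERP minus averaged standard ERP."""
    return [d - s for d, s in zip(deviant_erp, standard_erp)]

def peak_mmn(diff_wave, times_ms, window=(100, 200)):
    """Return (amplitude, latency) of the most negative deflection of the
    difference wave within the given post-onset window (in ms)."""
    in_window = [(amp, t) for amp, t in zip(diff_wave, times_ms)
                 if window[0] <= t <= window[1]]
    return min(in_window)  # smallest amplitude = largest negativity

# Toy example: a flat standard ERP and a deviant ERP with a negativity at 150 ms.
times = [0, 50, 100, 150, 200, 250]
standard = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
deviant = [0.0, 0.0, -0.5, -2.5, -1.0, 0.0]
amplitude, latency = peak_mmn(mmn_difference_wave(deviant, standard), times)
```

In practice the ERPs are averages over many trials per condition; the sketch only captures the subtraction-and-peak-detection step.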
Although the MMN is considered a purely auditory deviance detection mechanism, it
has also been successfully used to investigate neural correlates of audiovisual
integration. Most of these studies, however, investigated how a deviant visual
stimulus evoked an auditory MMN by illusorily changing the percept of the standard
sound (e.g. the McGurk or ventriloquist illusions, see later paragraphs). Although
letters are not expected to cause illusory effects on speech sound processing and may
thus not be expected to evoke an auditory MMN, we speculated that the automated
binding of highly over-learned letters and speech sounds might exert an influence
comparable to that of the aforementioned illusions. This speculation was inspired by a
study showing that subjects listening to speech sounds were very capable of
imagining the physically absent letters corresponding to these speech sounds (Raij,
1999). The majority of subjects showed neural activations during mental imagery,
close to the ones later reported for audiovisual integration of letters and speech sounds
(Raij, Uutela, & Hari, 2000).
In all electrophysiological studies reported below, a cross-modal MMN paradigm
was used. In our MMN studies, participants were presented with speech sound standards
/a/ and deviants /o/ appearing either in isolation (auditory only) or with a standard
letter “a” (audiovisual) (Figure 1). The rationale for this design resided in the
assumption that if letters and speech sounds are automatically integrated, this might
be reflected in an effect of letters on the MMN to speech sounds, because the deviant
speech sound /o/ constitutes a double cross-modal violation: against the standard speech
sound /a/ and against the standard letter “a”. We thus predicted an effect of the audiovisual
presentation on the MMN amplitude or latency over and above the standard auditory-
only deviancy effect. In addition, we investigated the temporal window within which
letters and speech sounds were processed as an integrated audiovisual object. Letters
were presented either simultaneously with or preceding the speech sounds by 100 or
200 ms. Participants were normally reading adults (Froyen, van Atteveldt, Bonte, &
Blomert, 2008), normally reading children (Froyen, Bonte, van Atteveldt, & Blomert,
2009) and dyslexic children (Froyen, Willems, & Blomert, in revision).
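The cross-modal oddball design described above can be sketched as a block of trials in which a rare deviant sound occurs among auditory standards, each paired with the constant standard letter. The deviant probability, trial count, and no-consecutive-deviants constraint below are illustrative assumptions rather than parameters reported in these studies.

```python
import random

def make_oddball_block(n_trials=200, p_deviant=0.15, letter_lead_ms=0, seed=1):
    """One block of the cross-modal MMN design: auditory standards /a/ and
    rare deviants /o/, each paired with the constant standard letter "a".

    letter_lead_ms is the SOA by which the letter precedes the speech sound
    (0, 100 or 200 ms in the studies described). Deviant probability, trial
    count and the no-consecutive-deviants rule are assumptions of this sketch.
    """
    rng = random.Random(seed)
    block = []
    prev_deviant = True  # forces the block to open with a standard
    for _ in range(n_trials):
        is_deviant = (not prev_deviant) and rng.random() < p_deviant
        block.append({
            "sound": "/o/" if is_deviant else "/a/",
            "letter": "a",                  # visual standard, never changes
            "letter_lead_ms": letter_lead_ms,
            "deviant": is_deviant,
        })
        prev_deviant = is_deviant
    return block
```

In the audiovisual conditions, the deviant /o/ thus violates both the auditory standard /a/ and the simultaneously (or earlier) presented letter “a”, which is the double cross-modal violation the paradigm exploits.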
Letter-speech sound processing in adults
To probe the feasibility of the cross-modal paradigm for studying letter-speech sound
processing we first investigated normal reading adults for whom letter – speech sound
processing is expected to be fully automated (Froyen, van Atteveldt, Bonte, &
Blomert, 2008). Results showed that the MMN amplitude to the speech sound deviant
presented together with letters was enhanced in comparison with the same
speech sound deviant presented in isolation (Figure 2C). However, this effect was
only found if letters and speech sounds were presented simultaneously. The MMN
amplitude enhancement induced by letters diminished linearly with increasing
stimulus onset asynchrony (SOA); when the letter preceded the speech sound by 100
ms the cross-modal enhancement effect was no longer significant, and when letters
and speech sounds were presented with a 200 ms SOA, the MMN amplitude differed
significantly from the MMN amplitude in the condition with 0 ms SOA. Considering
the early and automatic evocation of the MMN, these results strongly indicated that
letters and speech sounds were automatically integrated in experienced readers’
brains, within an early but also narrow temporal window of integration.
To investigate if this effect was specific for letter-speech sound presentations we
conducted a replication study, again with speech sounds in isolation and with
simultaneously presented letters, this time however also with non-letter visual stimuli
(Froyen, de Doelder, & Blomert, submitted). The non-letter was a scrambled version
of the letter, and thus contained the same basic visual features as the letter but not its
letter-specific content. In this way we were able to check if the MMN amplitude
enhancement as found in the earlier described adult study (Froyen, van Atteveldt,
Bonte, & Blomert, 2008) was genuinely letter-specific. The results revealed, first, that
in both audiovisual conditions (letters and scrambled letters) the MMN amplitude was
enhanced in comparison with the auditory only condition. However, the MMN
amplitude enhancement in the audiovisual letter condition was significantly stronger
than the enhancement in the audiovisual non-letter condition. The letter-specificity of
the MMN enhancement effect was further bolstered by the finding of a significant and
substantial correlation between reading fluency and the letter-specific MMN
enhancement. These results not only replicated the previously reported letter - speech
sound integration effect (Froyen, van Atteveldt, Bonte, & Blomert, 2008), but also
established that this effect was specific for visual letters and strongly related with
reading. And lastly, these results together indicated the MMN as a valid and valuable
tool for the investigation of letter-speech sound integration.
Development of letter – speech sound processing
Learning the correspondences between letters and speech sounds
is a crucial step towards reading acquisition. As discussed in the introduction there are
widely different interpretations of the time it takes to learn these associations, varying
from a few months to several years. The only available direct evidence is based on
behavioural measurements of the speed of letter-speech sound identification and
discrimination and strongly favours an extended development (Blomert & Vaessen,
2009). However, to the best of our knowledge, no neural correlates of letter – speech
sound association development have been reported yet. The availability of a valid
cross-modal MMN paradigm, well suited for research with children, presented the
opportunity to investigate the development of pre-attentive letter – speech sound
processing in beginner as well as in advanced readers in primary education (Froyen,
Bonte, van Atteveldt, & Blomert, 2009). We presented eight- and eleven-year-old
children, with one and four years of reading instruction respectively, with the exact
same letter – speech sound MMN paradigm as was used previously with adults
(Froyen, van Atteveldt, Bonte, & Blomert, 2008). Audiovisual stimuli were either
presented simultaneously or the letter preceded the speech sound by 200 milliseconds.
The results revealed that eight-year-old beginner readers showed full letter knowledge
mastery at the time of the experiment. Despite this letter knowledge, we found no
modulation of the MMN by letters and therefore no signs of letter-speech sound
integration in either SOA condition. Surprisingly, even in eleven-year-old advanced
readers we did not observe the MMN modulation by letters found in adults during
simultaneous audiovisual presentation. However, we did find a significant MMN
amplitude enhancement when the letter preceded the speech sound by 200 ms, a time
window much too wide for adults to show audiovisual integration. We interpreted
these findings as an indication that letter – speech sound integration had become
automatic after four years of reading and reading instruction. The finding of a wider
temporal window for integration than in adults was interpreted as a probable
consequence of a still maturing brain. This main finding of early automatic letter-
speech sound integration was complemented by a second finding in a much later time
window: Whenever there was no early integration effect, there was a systematic
influence of letters on speech sound processing at 650 ms after speech sound onset.
Interestingly, both the eight-year-old beginner and the eleven-year-old advanced readers showed
this late pattern of letter influence on speech sound processing. Moreover, the
influence of SOA on this late 650 ms effect mirrored the pattern of SOA
effects on the MMN found for eleven-year-old children and adults: the older children
(eleven-year-olds) showed the effect at simultaneous presentation, whereas the
younger group only revealed the effect if the letter preceded the speech sound by 200
ms (see Table 1). This SOA effect was again speculatively interpreted as a
consequence of a difference in brain maturation, leading to a difference in processing
speed. Because the early and late effects of letters on speech sound processing might
reflect differences in reading experience, we re-analyzed the adult data (Froyen, van
Atteveldt, Bonte, & Blomert, 2008) and only found an early congruency effect on the
MMN, but no late effects. We interpreted this effect to mean that if a reader has
become fully experienced, letter-speech sound integration is fully automatic and no
further processing is required after the first fast stimuli integration. If, in contrast,
reading has just started, letters and speech sounds are not yet automatically integrated,
hence no effects on the MMN will be found. The fact that young beginning readers
nevertheless exhibited full letter knowledge, thus knowing which letter belonged to
which speech sound, may indicate that the much later effect of letters on speech sound
processing represented still weak associations between letters and speech sounds. The
eleven year olds showed a pattern, partly reminiscent of automatic adult letter-sound
integration and partly reminiscent of still effortful association also found in beginning
readers. They did show early MMN effects, but only if the time window was 200 ms
and they did show late association effects only when letters and speech sounds were
presented simultaneously. Together these results indicated a transition from the mere
association of letters and speech sounds in beginner readers to more automatic, but
still not “adult-like”, integration in advanced readers. The regularity of these effects
over time suggests a gradual development of letter-speech sound integration processes
resulting from a dynamic interplay between brain maturation and developing reading
experience.
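The MMN logic underlying these studies, a deviant-minus-standard difference wave whose mean amplitude is read out in a fixed latency window, can be sketched in a few lines of Python. The sketch is illustrative only: the synthetic single-channel data, the sampling rate and the injected negativity are our assumptions for demonstration, not the recording parameters of the experiments reviewed here.

```python
import numpy as np

def difference_wave(standard_epochs, deviant_epochs):
    """MMN-style difference wave: mean deviant ERP minus mean standard ERP.

    Epoch arrays are (n_trials, n_samples), values in microvolts.
    """
    return deviant_epochs.mean(axis=0) - standard_epochs.mean(axis=0)

def mean_amplitude(wave, times, t_start, t_end):
    """Mean amplitude of a difference wave inside a latency window (seconds)."""
    mask = (times >= t_start) & (times < t_end)
    return wave[mask].mean()

# Synthetic illustration: 100 standard and 20 deviant trials, 1 s epochs at 500 Hz.
rng = np.random.default_rng(0)
fs, n_samples = 500, 500
times = np.arange(n_samples) / fs
standards = rng.normal(0.0, 1.0, size=(100, n_samples))
deviants = rng.normal(0.0, 1.0, size=(20, n_samples))
# Inject a negativity peaking ~200 ms after deviant onset, mimicking an MMN.
deviants -= 3.0 * np.exp(-((times - 0.2) ** 2) / (2 * 0.03 ** 2))

mmn = difference_wave(standards, deviants)
amp = mean_amplitude(mmn, times, 0.1, 0.3)  # a classic MMN latency window
print(f"mean amplitude 100-300 ms: {amp:.2f} uV")  # clearly negative
```

Comparing such window amplitudes across conditions (letter absent vs. present, 0 ms vs. 200 ms SOA) is, in essence, how the modulation effects discussed above are quantified.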
Letter-speech sound processing in dyslexia
If the automation of letter-speech sound associations in normally reading children
takes years to develop, one wonders what happens in children suffering from a
constitutional developmental reading disorder like dyslexia. Results from recent
neuroimaging studies indeed revealed reduced activation for letter-sound associations
in dyslexic children and adults. More interestingly, these results also showed
comparable activation for congruent and incongruent letter-sound pairs in dyslexic
readers, whereas all normal readers, adults and children alike, immediately
suppressed the activation for incongruent pairs (Blau, Atteveldt, Ekkebus, Goebel, &
Blomert, 2009; Blau et al., 2010). These results indicated that letter-speech sound
association in dyslexia was not only anomalous at the start of reading development,
but remained anomalous into adulthood.
To investigate the nature of these anomalous associations, we applied the same
cross-modal MMN paradigm used in our earlier studies to eleven-year-old dyslexic
children (Froyen, Willems, & Blomert, in revision). The results revealed that, in
contrast to their normal reading peers, children with dyslexia showed no effect of
letters on the MMN, and thus no sign of early automatic integration, even after four
years of reading instruction. Since our previous study with normally reading children
(Froyen, Bonte, van Atteveldt, & Blomert, 2009) had indicated that in the absence of
early integration, deviant letters might still reveal an effect on letter-speech sound
association in a much later time window, we searched for possible letter effects in the
650 ms time window. We indeed found a late effect of deviant letters, but this effect
differed from what we found for their normal reading peers in our previous study
(Table 1). The eleven-year-old normal reading children showed a late effect only if
letters and speech sounds were presented simultaneously. In contrast, the dyslexic
children showed a late effect only when letters preceded speech sounds by 200 ms,
resembling the late effect found for the much younger normal reading children who
had received only one year of reading instruction. We interpreted the finding that
dyslexic children still showed weak associations of letters and speech sounds, and no
integration, after four years of reading instruction as an indication that letter-speech
sound association deficits may constitute a proximal cause of their reading problems.
The validity of this interpretation was further reinforced by the finding of a strong
correlation between the late letter effect and word reading performance.
In closing, it is of interest to note that the dyslexic readers showed speech sound
processing very similar to that of their age-matched normal reading peers when
speech sounds were presented in isolation. They did, however, show problems in
processing letter-speech sound pairs in which the speech sound was the same as the
one presented in isolation. It is thus unlikely that the letter-speech sound
association/integration problems evidenced in this MMN study resulted from
impoverished or otherwise poor phonological representations or from poor processing
of the speech sounds involved. In line with this result, we found no correlation
between the quality of the late letter-speech sound processing effect and phonological
awareness performance. These findings also suggested that the reduced speech sound
processing reported in the fMRI studies revealing anomalous associations (Blau,
Atteveldt, Ekkebus, Goebel, &
Blomert, 2009; Blau et al., 2010) might have been more a consequence than a cause of
these cross-modal problems.
Since all dyslexic children showed normal phonological discrimination of speech
sounds in isolation and, like their normal reading peers, full mastery of 'letter
knowledge', it is appropriate to ask whether there was some deficiency in the
association per se. This brings us back to the question of whether the specific
audiovisual nature of letter-speech sound associations may influence the quality and
learning rate of the associations in reading success as well as in reading failure. In
the following we will therefore try to clarify the nature of letter-speech sound
associations and thus their status as audiovisual objects.
2.4 Summary of main findings
The learning of letter-speech sound associations is the very basis of reading
acquisition and consists of the formation of integrated audiovisual objects,
presumably necessary for the development of fluent reading. Recent
electrophysiological evidence shows that letter-speech sound integration takes years
to fully automate and never seems to reach adequate automatic integration in dyslexic
persons, despite the fact that all normal and dyslexic children know which letter
belongs to which sound within a year of reading instruction. It is therefore necessary
to differentiate between 'letter knowledge' and the learning of cross-modal
letter-sound integration for the purpose of the development of fluent reading.
Letter-speech sound associations are cultural inventions and thus in principle arbitrary
audiovisual objects, of which the speech sound element may be regarded as familiar
in kind, but not in type, because the particular speech sounds which match directly
onto letters or letter strings only develop as a consequence of learning to read. The
integration of letters and speech sounds occurs mainly in the left STS, and feedback to
the auditory cortex modulates the processing of speech, but only if both stimuli are
perceived in synchrony. The involvement of low-level visual areas is, however, still
unclear and will be further evaluated in the following paragraphs. Since the processes
involved in learning letter-sound associations do not seem to coincide with the
mechanisms for associating natural audiovisual objects, we will further explore the
status of letter-sound associations as audiovisual objects.
3. Letter-speech sound associations as audiovisual objects
By clarifying the status of letter-speech sound pairs as audiovisual objects we might
also gain some insight into the fundamentals of normal and abnormal reading
development. We will therefore evaluate in which respects letter-speech sound
processing resembles the processing of natural associations like audiovisual speech
and of artificial audiovisual objects composed of unfamiliar elements like flashes and
beeps, and in which respects it differs from them.
3.1 Does letter-speech sound integration resemble natural audiovisual
integration?
Whenever another person speaks to us, we focus mainly on the auditory speech
signal. However, the lip movements accompanying the speech signal have been
shown to contribute substantially to the processing of audiovisual speech, e.g. by
improving the intelligibility of speech in noisy environments by 20 dB (Sumby &
Pollack, 1954). Audiovisual integration of communication signals may have occurred
early in evolution, and the automation and strength of audiovisual speech integration
has been convincingly demonstrated with the famous McGurk paradigm: the auditory
signal /ba/ is perceived as /da/ when synchronized to a face articulating /ga/
(McGurk & MacDonald, 1976). This visually induced auditory illusion has even been
found to evoke a genuine auditory MMN (Colin et al., 2002; Möttönen, Krause,
Tiippana, & Sams, 2002; Sams et al., 1991), confirming the early and automatic
integration of audiovisual speech. A recent behavioural study also provided evidence
for an influence in the other direction, i.e. from speech sounds to the perception of lip
movements (Baart & Vroomen, 2010). The symmetrical influence of lip movements
on speech perception, and vice versa, coincides well with the symmetrical
involvement of both low-level auditory and low-level visual brain areas during
audiovisual speech processing as found with fMRI (Calvert et al., 1999; Calvert,
Campbell, & Brammer, 2000; Macaluso, George, Dolan, Spence, & Driver, 2004). It
is generally assumed that these low-level auditory and visual responses constitute
feedback from multi-sensory integration sites, but feed-forward connections have also
been suggested to contribute to audiovisual interactions. Finally, audiovisual speech
processing is relatively insensitive to temporal asynchrony between the visual and
auditory signal; i.e. audiovisual speech recognition remains robust for up to 300 ms of
asynchrony (Massaro, Cohen, & Smeele, 1996; Munhall, Gribble, Sacco, & Ward,
1996; Van Wassenhove, Grant, & Poeppel, 2007).
In contrast to audiovisual speech processing, no integration effects were found in low
level visual areas during passive letter – speech sound processing, while low level
auditory areas did show audiovisual integration effects (Van Atteveldt, Formisano,
Blomert, & Goebel, 2007; Van Atteveldt, Formisano, Goebel, & Blomert, 2007).
Although the involvement of low-level visual areas was reported in two other studies
(Blau, van Atteveldt, Formisano, Goebel, & Blomert, 2008; Herdman et al., 2006),
the possibility that this occurred as a consequence of the use of an active task, which
has the potential of changing the relevance of incongruent and congruent stimuli and
thus the way the audiovisual object is processed, cannot be excluded (Van Atteveldt,
Formisano, Goebel, & Blomert, 2007). Furthermore, the use of degraded visual
stimuli in the study by Blau et al. may have induced early visual cortex involvement.
To our knowledge, the role of low-level visual areas during passive letter-speech
sound integration has not previously been investigated with a high temporal
resolution method. In analogy to the use of the auditory MMN paradigm to show
influences of letters on speech sound processing, we used the visual counterpart of the
MMN (vMMN) in a crossmodal design to investigate the influences of speech sounds
on letter processing (Froyen, van Atteveldt, & Blomert, 2010). The vMMN is
described as a negativity measured at the occipital electrodes between 150 and 350 ms
after the onset of an infrequent (deviant) visual stimulus in a sequence of frequently
presented (standard) visual stimuli (Czigler, 2007; Pazo-Alvarez, Cadaveira, &
Amenedo, 2003). The vMMN is suggested to have similar properties to the aMMN: it
can be evoked pre-attentively and it reflects the use of a memory representation of
regularities of visual stimulation (Czigler, 2007).
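The oddball designs behind both the auditory and the visual MMN share one structural ingredient: a long sequence of standards interrupted by infrequent, non-adjacent deviants. A minimal sequence generator might look as follows; the roughly 12% deviant rate and the no-two-deviants-in-a-row constraint are common design choices, not necessarily the exact parameters of the studies reviewed here.

```python
import random

def oddball_sequence(n_trials, standard, deviant, p_deviant=0.12, seed=7):
    """Pseudo-random oddball sequence: deviants at roughly p_deviant rate,
    never in immediate succession (a common MMN design constraint)."""
    rng = random.Random(seed)
    seq = []
    for _ in range(n_trials):
        if seq and seq[-1] == deviant:
            seq.append(standard)  # force at least one standard between deviants
        elif rng.random() < p_deviant:
            seq.append(deviant)
        else:
            seq.append(standard)
    return seq

# Illustrative labels matching the vMMN experiment described in the text.
seq = oddball_sequence(400, standard="a", deviant="o")
print(seq.count("o") / len(seq))  # effective deviant rate, a bit below p_deviant
```

The forced-standard rule slightly lowers the effective deviant rate below `p_deviant`, which is why published designs usually report the realized proportion of deviants rather than the nominal probability.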
We used this vMMN paradigm to investigate the exact opposite of the previously
described aMMN studies: the vMMN evoked by a deviant letter in a visual-only
experiment (Figure 3A) is compared with the vMMN evoked by the same
deviant letter accompanied by a standard speech sound (audiovisual, Figure 2). The
visual stimulation in both experiments is exactly the same: standard letter "a" and
deviant letter "o". Results revealed no effect of speech sounds on the amplitude of the
vMMN to letter processing, and thus no reversal of the effects of letters on speech
sounds reported in our auditory MMN studies (Froyen, Bonte, van Atteveldt, &
Blomert, 2009; Froyen, van Atteveldt, Bonte, & Blomert, 2008). In addition, we also
presented a visual control stimulus "*" in order to control for any non-specific
crossmodal influences. Interestingly, the vMMN amplitude to non-letter processing
was significantly reduced in the cross-modal condition compared to the visual-only
condition. Since letter processing was not suppressed in the cross-modal condition,
the crossmodal suppression of non-letter processing by speech sounds was
hypothesized to constitute a baseline effect, implying that the non-modulation of
letter processing by speech sounds reflected a content-related, letter-specific
crossmodal effect. This would mean that letters were recognised as a relevant part of
an audiovisual object during processing in visual cortex. However, low-level visual
cortices did not reveal an influence of speech sounds on letter processing as a
consequence of cross-modal integration. To summarize, these results indicate that
speech sounds do not automatically influence standard or deviant letter processing in
a way comparable to the automatic and systematic modulation of speech sound
processing by letters.
Whereas low-level auditory processing is automatically involved in letter-speech
sound integration (Froyen, van Atteveldt, Bonte, & Blomert, 2008), this does not
seem to hold for low-level visual processing, as was already indicated by previous
fMRI studies also using a passive task design (Van Atteveldt, Formisano, Blomert, &
Goebel, 2007; Van Atteveldt, Formisano, Goebel, & Blomert, 2007).
Secondly, both audiovisual speech and letter-speech sound integration resulted in
feedback to low-level auditory cortex. It is therefore interesting to explore whether
this influence differed, given the different origins of the respective associations, i.e.
an evolutionary versus a cultural background. Recall that the perception of lip
movements could alter the perception of an unaltered speech sound (the McGurk
effect) and even evoke an auditory MMN (Colin et al., 2002; Möttönen, Krause,
Tiippana, & Sams, 2002; Sams et al., 1991). We reasoned that if letter-sound
integration is of comparable strength, a deviant letter violating a standard letter might
in principle cause a change in the perception of an accompanying unaltered speech
sound. In analogy to the McGurk effect, we therefore investigated whether a deviant
letter violating a standard letter, but accompanied by an unaltered speech sound,
evoked an auditory MMN: the standard speech sound was always /a/, presented either
with the standard letter "a" or the deviant letter "o". If the incongruent deviant letter
"o" evoked an auditory MMN, despite the standard speech sound /a/ being unaltered,
this effect would be comparable to the McGurk effect and would indicate that the
mechanism for letter-speech sound integration is comparable to audiovisual speech
integration. However, the deviant letter did not evoke an auditory MMN (Froyen, van
Atteveldt, & Blomert, 2010), pointing to an integration mechanism for letter-speech
sound pairs that is different from that for audiovisual speech. A further argument for
this difference may be inferred from the fact that all studies in which an aMMN was
evoked by a deviating visual part of an audiovisual stimulus employed an auditory
illusion (Besle, Fort, & Giard, 2005).
One last finding further supports the status of letter-sound pairs as a special type of
audiovisual objects: the feedback from STS to auditory cortex only occurred if letters
and speech sounds were presented in synchrony, although the time window for
integration in STS itself was rather wide (Van Atteveldt, Formisano, Blomert, &
Goebel, 2007), in agreement with the wide window for audiovisual speech. This
preference for a very narrow time window in which letters were allowed to modulate
speech sound processing in auditory cortex may be directly related to the finding of a
narrow time-window (<100 ms) during which congruency effects between letters and
speech sounds were found in the above reviewed aMMN studies (Froyen, Bonte, van
Atteveldt, & Blomert, 2009; Froyen, van Atteveldt, Bonte, & Blomert, 2008; Froyen,
Willems, & Blomert, in revision). This proximity-in-time principle, governing
letter-speech sound integration but not audiovisual speech, might be related to the
lack of shared characteristics between letters and speech sounds and thus to the
inherently arbitrary relation between them.
The nature of the link between the two modalities of a stimulus may thus not only be
critical for the automatic involvement of both low level sensory cortices (Calvert,
2001), but also for the time window of integration. When we see and hear speech, the
auditory speech signal shares time varying aspects with the concurrent lip movements
(Amedi, von Kriegstein, Van Atteveldt, Beauchamp, & Naumer, 2005; Calvert,
Brammer, & Iversen, 1998; Munhall & Vatikiotis-Bateson, 1998). These shared time
varying aspects constitute a strong natural cross-modal binding factor, which may
automatically recruit both low level sensory areas. Letters, however, are culturally
defined symbols without any natural relation with their corresponding speech sounds
and we hypothesize that the narrow temporal window of integration may compensate
for this lack of shared features. This effect of the type of link might also play a role in
the huge differences in learning rate and learning mode between the two types of
audiovisual stimuli. While our MMN studies revealed a years-long development
towards automated letter-speech sound integration aided by explicit reading
instruction (Froyen, Bonte, van Atteveldt, & Blomert, 2009), audiovisual speech
integration is already observable very early in development (Burnham & Dodd, 2004)
without the requirement of explicit instruction. Burnham and Dodd reported that
four-and-a-half-month-old infants, presented with an auditory /ba/ and the lip
movements of a /ga/, showed habituation responses to the sound /da/, strongly
indicating audiovisual speech integration early in infancy. In further studies they
found that this early audiovisual speech effect did not change qualitatively during
development, but became increasingly robust in children of six, eight and eleven
years of age (Burnham & Sekiyama, in press; Sekiyama & Burnham, 2008).
Audiovisual speech integration is thus already measurable a few months after birth,
and its magnitude increases during childhood development. This supports our
hypothesis that the human brain is well adapted to integrate naturally linked
audiovisual objects, but not inherently arbitrary audiovisual objects like letter-speech
sound pairs. This may explain why the learning of these non-natural audiovisual
objects may require explicit instruction and is much harder to acquire.
To conclude: letter-speech sound integration differs from audiovisual speech
integration in the involvement of low-level visual areas, the type of integration
mechanism and the temporal window for integration. The arbitrary link between
letters and speech sounds might account for the recruitment of a neural mechanism
with different properties from the one involved in processing naturally linked
audiovisual stimuli like audiovisual speech. We may thus ask: Does
letter-sound association and integration resemble arbitrary associations of unfamiliar
audiovisual objects?
3.2 Do letter-speech sound associations resemble artificial audiovisual objects?
Many ERP-studies have investigated the exact timing of processing arbitrarily linked
unfamiliar audiovisual objects. Visual stimuli consisted, amongst others, of
geometrical figures like ellipses and circles, square wave gratings or a flash of a light-
emitting diode (LED). Auditory stimuli varied from rich tones shifting linearly in
frequency to tone pips or ‘pink’ noise bursts. It is generally accepted that the earliest
reliable interaction effects for such arbitrary audiovisual objects can be observed
from 100 ms after stimulus onset onwards (Fort, Delpuech, Pernier, & Giard, 2002a, 2002b; Fort
& Giard, 2004; Giard & Peronnet, 1999; Molholm et al., 2002; Talsma, Doty, &
Woldorff, 2007; Talsma & Woldorff, 2005; Teder-Salejarvi, McDonald, Di Russo, &
Hillyard, 2002), which is in accordance with the timing of the MMN and the
crossmodal congruency effects reported in our MMN-studies on letter – speech sound
processing (Froyen, Bonte, van Atteveldt, & Blomert, 2009; Froyen, van Atteveldt,
Bonte, & Blomert, 2008; Froyen, Willems, & Blomert, in revision). Furthermore, the
temporal window for integrating arbitrarily linked audiovisual objects is very narrow;
i.e. within 25 to 50 ms (Lewald, Ehrensteinb, & Guski, 2001; Lewald & Guski, 2003;
Lewkowicz, 1996; Zampini, Guest, Shore, & Spence, 2005), again in accordance with
the narrow temporal window observed for letter – speech sound integration, less than
100 ms (Froyen, Bonte, van Atteveldt, & Blomert, 2009; Froyen, van Atteveldt,
Bonte, & Blomert, 2008; Froyen, Willems, & Blomert, in revision). It seems that, if a
visual and an auditory stimulus do not share any common features, proximity in time
is a necessary condition for audiovisual linking to occur. Letter-speech sound pairs
and artificial audiovisual objects are both characterized by a narrow time window for
audiovisual integration, probably as a consequence of the fact that the relation
between the elements of each audiovisual object is arbitrary.
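The contrast between these temporal windows of integration can be condensed into a small lookup. The numeric cutoffs below are rounded from the values reviewed in the text and are purely illustrative; real windows are graded and stimulus-dependent rather than hard thresholds.

```python
# Approximate temporal windows of integration (ms) for the three kinds of
# audiovisual objects discussed in the text; illustrative round numbers only.
INTEGRATION_WINDOW_MS = {
    "audiovisual_speech": 300,    # speech recognition robust up to ~300 ms asynchrony
    "letter_speech_sound": 100,   # congruency effects only below ~100 ms in adults
    "artificial_pair": 50,        # arbitrary flash/beep pairs: ~25-50 ms
}

def integrates(pair_type: str, soa_ms: float) -> bool:
    """Would a pair with this stimulus-onset asynchrony (SOA) still integrate?"""
    return abs(soa_ms) <= INTEGRATION_WINDOW_MS[pair_type]

# A 200 ms SOA is tolerated by audiovisual speech, but not by adult
# letter-speech sound integration, mirroring the contrast drawn in the text.
print(integrates("audiovisual_speech", 200))   # True
print(integrates("letter_speech_sound", 200))  # False
```

Framed this way, letter-speech sound pairs pattern with artificial pairs on the temporal dimension, even though, as argued next, they differ from them in familiarity.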
Letter-speech sound pairs, although arbitrarily linked by cultural convention, differ
from artificial unfamiliar audiovisual objects in that they have become highly familiar
in experienced readers. Two recent studies revealed that different brain regions were
involved in the integration of familiar animal sounds and visual images versus
arbitrarily linked unfamiliar artificial sounds and images: The inferior frontal cortex
(IFC) was found to be involved in processing unfamiliar and incongruent familiar
audiovisual objects, while the superior temporal sulcus (STS) was involved in
processing familiar audiovisual stimuli (Hein et al., 2007; Naumer et al., 2009). The
finding of STS as integration site for letters and speech sounds (Blau, van Atteveldt,
Formisano, Goebel, & Blomert, 2008; Van Atteveldt, Formisano, Blomert, & Goebel,
2007; Van Atteveldt, Formisano, Goebel, & Blomert, 2004; Van Atteveldt,
Formisano, Goebel, & Blomert, 2007) may be considered in line with these findings,
since letter-speech sound associations are highly over-learned and thus familiar
audiovisual objects in experienced readers. However, we did not find frontal
activations for incongruent (familiar) objects, since congruence and incongruence
were both expressed in the feedback from STS to auditory cortex (Van Atteveldt,
Formisano, Goebel, & Blomert, 2004) and were thus handled within the temporal
integration network for letter-sound processing.
Although letter-speech sound associations become familiar audiovisual objects
through reading experience, one may object that they constitute artificial unfamiliar
audiovisual objects for a beginning reader. However, as explained in the introduction,
this is only partly true, since only one element of a letter-sound pair is new and
unfamiliar, i.e. the visual letter. At age six, when a child starts learning to read, it is
already familiar with auditory speech sounds. It is therefore particularly interesting
that Hashimoto and Sakai (2004) investigated the neural changes accompanying the
formation of new audiovisual associations between familiar Japanese speech sounds
and unfamiliar new Korean Hangul letters. In contrast with the studies by Naumer and
colleagues, in which unfamiliar visual and unfamiliar auditory stimuli were used,
Hashimoto and Sakai did not report differential effects in the IFC, but in the left
posterior inferior temporal gyrus (PITG) and left parieto-occipital cortex (PO), and in
the connection between these areas. If the subjects saw familiar Japanese Kana letters and heard
familiar Japanese speech sounds, activation was found in the STS region (Hashimoto
& Sakai, 2004). Clearly, learning to associate unfamiliar letters with unrelated but
familiar speech sounds engages a different neural network than learning unfamiliar
arbitrary audiovisual objects such as those used in the Hein et al. study.
In sum: the arbitrary link between letters and speech sounds probably accounts for the
narrow temporal window for integration, as was also observed for the association of
unfamiliar, arbitrary audiovisual stimuli. On the other hand, letter-speech sound
pairs differ from unfamiliar arbitrary audiovisual objects not only because they are
highly familiar in experienced readers, but also because one of the elements of the
association is already quite familiar to beginner readers. Consequently, a different
mechanism is involved in processing these more or less familiar letter-speech sound
pairs than in processing arbitrary unfamiliar audiovisual objects.
4. Conclusion
Fluency is the quintessence of skilled reading: it takes years to develop in normal
readers and does not develop in disabled readers. A closer look at the beginnings of
reading may shed light on the reasons for such an effortful and long learning process.
During a first encounter with the written letters and words of an alphabetic
orthography, a child has to learn to associate letters with speech sounds to enable
reading acquisition. Recent electrophysiological evidence showed that it takes several
years of reading instruction and practice before the first signs of automatic integration
of letters and speech sounds appear in normally developing children. This gradual and
highly systematic development of letter-speech sound integration was interpreted as a
result of the dynamic interplay of brain maturation and reading experience. The
validity of this interpretation was supported by strong correlations between the
electrophysiological indices of letter-speech sound integration and behavioural
reading performance. The present meta-study indicated that this protracted learning
process in normal readers might be a consequence of the emergence of letter-sound
pairs as a rather specific type of audiovisual object. The finding that dyslexic children
and adults do not develop adequate and automatic integration of letters and speech
sounds, despite years of reading training, also indicates a potential role for this
multisensory learning process.
Letter-speech sound associations are cultural inventions and therefore biologically
arbitrary in nature. This arbitrariness stems from the lack of shared features between
the elements that form the integrated audiovisual object. The integration of
artificially linked objects is characterized by a narrow time window, which was
indeed also found for letter-sound integration. However, letter-speech sound pairs
differ from artificial unfamiliar audiovisual objects because one element of the
association is already familiar when the reading process starts, and furthermore letter-
speech sound pairs become highly over-learned, and thus familiar, audiovisual
objects in more experienced readers. Despite this familiarity, letter-speech sound
pairs remain in principle arbitrary associations, differing in many respects from
natural associations like audiovisual speech. Although letter-speech sound pairs and
audiovisual speech both show integration in the multi-sensory left superior temporal
sulcus (STS), only the natural integration process automatically recruits both the
uni-sensory auditory and the uni-sensory visual cortices. Letter-speech sound
integration recruits only the auditory cortex, by means of a modulating feedback
mechanism from STS. This modulating feedback, however, only occurred in a very
narrow time window, emphasizing the basically arbitrary nature of letter-speech
sound pairs, independent of their familiarity in experienced readers. This familiarity
aspect is nevertheless important, since arbitrary and unfamiliar audiovisual objects
are mainly processed in the inferior frontal cortex, whereas letter-speech sound pairs
are not. In short: letters and speech sounds are integrated in a left temporal network
involving STS/STG, but not visual cortex, probably as a consequence of the
development of familiarity. These integration processes only occur in a very narrow
time window, probably as a consequence of the arbitrary link between letters and
speech sounds. Although letter-speech sound pairs thus share similarities with
audiovisual speech as well as with unfamiliar arbitrary audiovisual objects, they seem
to develop into unique audiovisual objects that furthermore have to be processed in a
unique way in order to enable fluent reading. Future research should provide insights
into how far the unique multi-sensory nature of letters and words permeates the
neural network for reading.
Acknowledgements
The main collaborators of our studies included in this review were Nienke van
Atteveldt (ERP and fMRI studies) and Vera Blau, Elia Formisano and Rainer Goebel
(fMRI studies). The main grants supporting this research were: Dutch Health Care
Insurance Board (CVZ 608/001/2005) to Leo Blomert, and European Union 6th
Framework Programme (LSHM/CT/2005/018696) to Leo Blomert and Rainer Goebel.
References
Amedi, A., von Kriegstein, K., Van Atteveldt, N. M., Beauchamp, M. S., & Naumer,
M. J., 2005. Functional imaging of human crossmodal identification and
object recognition. Experimental Brain Research, 166, 559-571.
Baart, M., & Vroomen, J., 2010. Do you see what you are hearing? Cross-modal
effects of speech sounds on lipreading. Neuroscience letters, 471, 100-103.
Beauchamp, M. S., Argall, B. D., Bodurka, J., Duyn, J., & Martin, A., 2004.
Unraveling multisensory integration: patchy organization within human STS
multisensory cortex. Nature Neuroscience, 7, 1190 - 1192.
Besle, J., Fort, A., & Giard, M., 2005. Is the auditory memory sensitive to visual
information? Experimental Brain Research, 166, 337-344.
Bishop, D. V. M., 2007. Using mismatch negativity to study central auditory
processing in developmental language and literacy impairments: Where are
we, and where should we be going? Psychological bulletin, 133, 651-672.
Blau, V., van Atteveldt, N., Ekkebus, M., Goebel, R., & Blomert, L., 2009. Reduced
neural integration of letters and speech sounds links phonological and reading
deficits in adult dyslexia. Current Biology, 19, 503-508.
Blau, V., Reithler, J., van Atteveldt, N., Seitz, J., Gerretsen, P., Goebel, R., &
Blomert, L., 2010. Deviant processing of letters and speech sounds as
proximate cause of reading failure: A functional magnetic resonance imaging
study of dyslexic children. Brain, doi:10.1093/brain/awp308
Blau, V., van Atteveldt, N., Formisano, E., Goebel, R., & Blomert, L., 2008. Task-
irrelevant visual letters interact with the processing of speech sounds in
heteromodal and unimodal cortex. European Journal of Neuroscience, 28(3),
500-509.
Blomert, L., & Vaessen, A., 2009. 3DM Differentiaal diagnose voor dyslexie:
Cognitieve analyse van lezen en spelling [3DM Differential diagnostics for
dyslexia: Cognitive analysis of reading and spelling]. Amsterdam: Boom Test
Publishers.
Blomert, L., & Willems, G. (in revision). Is there a causal link from a phonological
awareness deficit to reading failure in children at familial risk for dyslexia?
Bonte, M., Mitterer, H., Zellagui, N., Poelmans, H., & Blomert, L., 2005. Auditory
cortical tuning to statistical regularities in phonology. Clinical
Neurophysiology, 116(12), 2765-2774.
Bonte, M., Poelmans, H., & Blomert, L., 2007. Deviant neurophysiological responses
to phonological regularities in speech in dyslexic children. Neuropsychologia,
45, 1427-1437.
Burnham, D., & Dodd, B., 2004. Auditory - visual speech integration by prelinguistic
infants: Perception of an emergent consonant in the McGurk effect.
Developmental Psychobiology, 45, 204 - 220.
Burnham, D., & Sekiyama, K. (in press). Investigating auditory-visual speech
perception development using the ontogenetic and differential language
methods. In E. Vatikiotis-Bateson, P. Perrier & G. Bailly (Eds.), Advances in
auditory–visual speech processing. Cambridge: MIT Press.
Byrne, B., & Fielding-Barnsley, R., 1989. Phonemic awareness and letter knowledge
in the child's acquisition of the alphabetic principle. Journal of Educational
Psychology, 80, 313-321.
Calvert, G. A., 2001. Crossmodal processing in the human brain: insights from
functional neuroimaging studies. Cereb Cortex, 11(12), 1110-1123.
Calvert, G. A., Brammer, M. J., Bullmore, E. T., Campbell, R., Iversen, S. D., &
David, A. S., 1999. Response amplification in sensory-specific cortices during
crossmodal binding. Neuroreport, 10(12), 2619-2623.
Calvert, G. A., Brammer, M. J., & Iversen, S. D., 1998. Crossmodal identification.
Trends in Cognitive Sciences, 2, 247-253.
Calvert, G. A., Bullmore, E. T., Brammer, M. J., Campbell, R., Williams, S. C.,
McGuire, P. K., et al., 1997. Activation of auditory cortex during silent
lipreading. Science, 276(5312), 593-596.
Calvert, G. A., Campbell, R., & Brammer, M. J., 2000. Evidence from functional
magnetic resonance imaging of crossmodal binding in the human heteromodal
cortex. Current Biology, 10(11), 649-657.
Castro-Caldas, A., Petersson, K. M., Reis, A., Stone-Elander, S., & Ingvar, M., 1998.
The illiterate brain. Learning to read and write during childhood influences the
functional organization of the adult brain. Brain, 121 ( Pt 6), 1053-1063.
Chein, J. M., & Schneider, W., 2005. Neuroimaging studies of practice-related
change: fMRI and meta-analytic evidence of a domain-general control
network for learning. Cognitive Brain Research, 25, 607-623.
Cohen, L., Dehaene, S., Naccache, L., Lehericy, S., Dehaene-Lambertz, G., Henaff,
M. A., et al., 2000. The visual word form area: spatial and temporal
characterization of an initial stage of reading in normal subjects and posterior
split-brain patients. Brain, 123 ( Pt 2), 291-307.
Colin, C., Radeau, M., Soquet, A., Demolin, D., Colin, F., & Deltenre, P., 2002.
Mismatch negativity evoked by the McGurk-MacDonald effect: a phonetic
representation within short-term memory. Clin Neurophysiol, 113(4), 495-
506.
Coltheart, M., Rastle, K., Perry, C., Langdon, R., & Ziegler, J., 2001. DRC: A dual
route cascaded model of visual word recognition and reading aloud.
Psychological review, 108, 204 - 256.
Csépe, V., 2003. Dyslexia: Different Brain, Different Behavior. New York: Kluwer
Academic/ Plenum Publishers.
Czigler, I., 2007. Visual mismatch negativity: Violation of nonattended environmental
regularities. Journal of Psychophysiology, 21, 224-230.
Dijkstra, T., Schreuder, R., & Frauenfelder, U. H., 1989. Grapheme context effects
on phonemic processing. Language and Speech, 32, 89-108.
Ehri, L. C., 1995. Phases of development in learning to read words by sight. Journal
of Research in Reading, 18, 116-125.
Fort, A., Delpuech, C., Pernier, J., & Giard, M., 2002a. Dynamics of cortico-
subcortical cross-modal operations involved in audio-visual object detection in
humans. Cerebral Cortex, 12(10), 1031-1039.
Fort, A., Delpuech, C., Pernier, J., & Giard, M., 2002b. Early auditory-visual
interactions in human cortex during nonredundant target identification.
Cognitive Brain Research, 14(1), 20-30.
Fort, A., & Giard, M., 2004. Multiple electrophysiological mechanisms of audiovisual
integration in human perception. In G. A. Calvert, C. Spence & B. E. Stein
(Eds.), The Handbook of Multisensory Processes (pp. 503-513). London: The
MIT Press.
Frith, U., 1985. Beneath the surface of developmental dyslexia. In K. E. Patterson, J.
C. Marshall & M. Coltheart (Eds.), Surface dyslexia. London: Routledge &
Kegan-Paul.
Froyen, D., Bonte, M., van Atteveldt, N., & Blomert, L., 2009. The long road to
automation: Neurocognitive development of letter-speech sound processing.
Journal of Cognitive Neuroscience, 21, 567 - 580.
Froyen, D., de Doelder, N., & Blomert, L. (submitted). Cross-modal letter-specific
influences on speech sound processing.
Froyen, D., van Atteveldt, N., & Blomert, L., 2010. Exploring the role of low level
visual processing in letter–speech sound integration: a visual MMN study.
Frontiers in Integrative Neuroscience, 4(9).
Froyen, D., van Atteveldt, N., Bonte, M., & Blomert, L., 2008. Cross-modal
enhancement of the MMN to speech sounds indicates early and automatic
integration of letters and speech sounds. Neuroscience Letters, 430, 23-28.
Froyen, D., Willems, G., & Blomert, L. (in revision). Evidence for a specific cross-
modal binding deficit in dyslexia: An MMN-study of letter – speech sound
processing.
Gabrieli, J. D. E., 2009. Dyslexia: A new synergy between education and cognitive
neuroscience. Science, 325, 280-283.
Ghazanfar, A. A., & Schroeder, C. E., 2006. Is the neocortex essentially
multisensory? Trends in Cognitive Science, 10, 278-285.
Giard, M. H., & Peronnet, F., 1999. Auditory-visual integration during multimodal
object recognition in humans: a behavioral and electrophysiological study.
Journal of Cognitive Neuroscience, 11(5), 473-490.
Hashimoto, R., & Sakai, K. L., 2004. Learning letters in adulthood: direct
visualization of cortical plasticity for forming a new link between orthography
and phonology. Neuron, 42(2), 311-322.
Hein, G., Doehrmann, O., Müller, N. G., Kaiser, J., Muckli, L., & Naumer, M. J.,
2007. Object familiarity and semantic congruency modulate responses in
cortical audiovisual integration areas. The Journal of Neuroscience, 27, 7881-
7887.
Herdman, A. T., Fujioka, T., Chau, W., Ross, B., Pantev, C., & Picton, T. W., 2006.
Cortical oscillations related to processing congruent and incongruent
grapheme-phoneme pairs. Neuroscience Letters, 399, 61 - 66.
Hocking, J., & Price, C. J., 2008. The role of the posterior superior temporal sulcus in
audiovisual processing. Cerebral Cortex, 18, 2439 - 2449.
Jusczyk, P. W., 1997. The discovery of spoken language. Cambridge, MA: MIT
Press.
Kujala, T., & Naatanen, R., 2001. The mismatch negativity in evaluating central
auditory dysfunction in dyslexia. Neurosci Biobehav Rev, 25(6), 535-543.
Lewald, J., Ehrenstein, W. H., & Guski, R., 2001. Spatio-temporal constraints for
auditory-visual integration. Behavioural Brain Research, 121(1-2), 69-79.
Lewald, J., & Guski, R., 2003. Cross-modal perceptual integration of spatially and
temporally disparate auditory and visual stimuli. Cognitive Brain Research,
16, 468-478.
Lewkowicz, D. J., 1996. Perception of auditory-visual temporal synchrony in human
infants. Journal of Experimental Psychology: Human Perception and
Performance, 22, 1094-1106.
Liberman, I. Y., 1973. Segmentation of the spoken word and reading acquisition.
Bulletin of the Orton Society, 23, 65-77.
Lieberman, P., 2006. Toward an evolutionary biology of language. Cambridge, MA:
MIT Press.
Macaluso, E., George, N., Dolan, R., Spence, C., & Driver, J., 2004. Spatial and
temporal factors during processing of audiovisual speech: a PET study.
NeuroImage, 21, 725-732.
Marsh, G., Friedman, M., Welch, V., & Desberg, P., 1981. A cognitive-
developmental theory of reading acquisition. In G. E. MacKinnon & T. G.
Waller (Eds.), Reading research: advances in theory and practice. New York:
Academic Press.
Massaro, D. W., Cohen, M. M., & Smeele, P. M., 1996. Perception of asynchronous
and conflicting visual and auditory speech. Journal of the Acoustical Society
of America, 100, 1777-1786.
Mattingly, I. G., 1972. Reading, the linguistic process, and linguistic awareness. In J.
F. Kavanagh & I. G. Mattingly (Eds.), Language by ear and by eye: The
relationship between speech and reading. Cambridge, MA: MIT Press.
McCandliss, B. D., Cohen, L., & Dehaene, S., 2003. The Visual Word Form Area:
expertise for reading in the fusiform gyrus. Trends in Cognitive Sciences, 7(7),
293-299.
McGurk, H., & MacDonald, J., 1976. Hearing lips and seeing voices. Nature, 263,
747.
Mitterer, H., & Blomert, L., 2003. Coping with phonological assimilation in speech
perception: evidence for early compensation. Perception & Psychophysics, 65,
956-969.
Molholm, S., Ritter, W., Murray, M. M., Javitt, D. C., Schroeder, C. E., & Foxe, J. J.,
2002. Multisensory auditory-visual interactions during early sensory
processing in humans: a high-density electrical mapping study. Brain Res
Cogn Brain Res, 14(1), 115-128.
Morais, J., Cary, L., Alegria, J., & Bertelson, P., 1979. Does awareness of speech as a
sequence of phones arise spontaneously? Cognition, 49, 957-958.
Möttönen, R., Krause, C. M., Tiippana, K., & Sams, M., 2002. Processing of changes
in visual speech in the human auditory cortex. Brain Res Cogn Brain Res,
13(3), 417-425.
Munhall, K., Gribble, P., Sacco, L., & Ward, M. 1996. Temporal constraints on the
McGurk effect. Perception & Psychophysics, 58(3), 351-362.
Munhall, K., & Vatikiotis-Bateson, E., 1998. The moving face during speech
communication. In R. Campbell, B. Dodd & D. Burnham (Eds.), Hearing by
Eye, Part 2: The psychology of speechreading and audiovisual speech (pp.
123-139). London, UK: Taylor & Francis, Psychology Press.
Näätänen, R., 1995. The Mismatch Negativity: A powerful tool for cognitive
neuroscience. Ear and Hearing, 16, 6-18.
Näätänen, R., 2000. Mismatch negativity (MMN): perspectives for application. Int J
Psychophysiol, 37, 3-10.
Näätänen, R., 2001. The perception of speech sounds by the human brain as reflected
by the mismatch negativity (MMN) and its magnetic equivalent (MMNm).
Psychophysiology, 38, 1-21.
Naumer, M. J., Doehrmann, O., Müller, N. G., Muckli, L., Kaiser, J., & Hein, G.,
2009. Cortical plasticity of audio-visual object representations. Cerebral
Cortex, 19, 1641-1653.
Pazo-Alvarez, P., Cadaveira, F., & Amenedo, E., 2003. MMN in the visual modality:
a review. Biol Psychol, 63(3), 199-236.
Perfetti, C. A., Beck, I., Bell, L., & Hughes, C., 1987. Phonemic knowledge and
learning to read are reciprocal: A longitudinal study of first grade children.
Merrill-Palmer Quarterly, 33, 283-319.
Picton, T. W., Alain, C., Otten, L., Ritter, W., & Achim, A., 2000. Mismatch
negativity: different water in the same river. Audiol Neurootol, 5, 111-139.
Raij, T., 1999. Patterns of brain activity during visual imagery of letters. J Cogn
Neurosci, 11, 282-299.
Raij, T., Uutela, K., & Hari, R., 2000. Audiovisual integration of letters in the human
brain. Neuron, 28, 617-625.
Rayner, K., & Pollatsek, A., 1989. The psychology of reading. New Jersey: Prentice-
Hall.
Sams, M., Aulanko, R., Hamalainen, M., Hari, R., Lounasmaa, O. V., Lu, S. T., et al.,
1991. Seeing speech: visual information from lip movements modifies activity
in the human auditory cortex. Neuroscience Letters, 127(1), 141-145.
Schröger, E., 1998. Measurement and interpretation of the mismatch negativity.
Behavior Research Methods Instruments & Computers, 30, 131-145.
Sekiyama, K., & Burnham, D., 2008. Impact of language on development of auditory-
visual speech perception. Developmental Science, 11, 303-317.
Share, D. L., 1995. Phonological recoding and self-teaching: sine qua non of reading
acquisition. Cognition, 55, 151-218.
Sumby, W. H., & Pollack, I., 1954. Visual contribution to speech intelligibility in
noise. The Journal of the Acoustical Society of America, 26(2), 212-215.
Talsma, D., Doty, T. J., & Woldorff, M. G., 2007. Selective attention and audiovisual
integration: Is attending to both modalities a prerequisite for early integration?
Cerebral Cortex, 17, 679-690.
Talsma, D., & Woldorff, M. G., 2005. Selective attention and multisensory
integration: Multiple phases of effects on the evoked brain activity. Journal of
Cognitive Neuroscience, 17, 1098-1114.
Teder-Salejarvi, W. A., McDonald, J. J., Di Russo, F., & Hillyard, S. A., 2002. An
analysis of audio-visual crossmodal integration by means of event-related
potential (ERP) recordings. Brain Res Cogn Brain Res, 14(1), 106-114.
UNESCO, 2005. Education for all: Literacy for life. Paris: UNESCO Publishing.
Vaessen, A., & Blomert, L., 2010. Long-term cognitive dynamics of fluent reading
development. Journal of Experimental Child Psychology, 105, 213-231.
Van Atteveldt, N., Formisano, E., Blomert, L., & Goebel, R., 2007. The effect of
temporal asynchrony on the multisensory integration of letters and speech
sounds. Cerebral Cortex, 13, 962-974.
Van Atteveldt, N., Formisano, E., Goebel, R., & Blomert, L., 2004. Integration of
letters and speech sounds in the human brain. Neuron, 43, 271-282.
Van Atteveldt, N., Formisano, E., Goebel, R., & Blomert, L., 2007. Top-down task
effects overrule automatic multisensory responses to letter-sound pairs in
auditory association cortex. NeuroImage, 36, 1345-1360.
Van Wassenhove, V., Grant, K. W., & Poeppel, D., 2007. Temporal window of
integration in auditory-visual speech perception. Neuropsychologia, 45, 598-
607.
Wimmer, H., Landerl, K., Linortner, R., & Hummer, P., 1991. The relationship of
phonemic arareness to reading acquisition: More consequence than
precondition but still important. Cognition, 40, 219-249.
Winkler, I., Kujala, A., Tiitinen, H., Sivonen, P., Alku, P., Lehtokoski, A., et al.,
1999. Brain responses reveal the learning of foreign language phonemes.
Psychophysiology, 36, 638-642.
Zampini, M., Guest, S., Shore, D., & Spence, C., 2005. Audio-visual simultaneity
judgments. Perception and Psychophysics, 67, 531-544.
Ziegler, J. C., & Goswami, U., 2005. Reading acquisition, developmental dyslexia,
and skilled reading across languages: a psycholinguistic grain size theory.
Psychol Bull, 131(1), 3-29.
Figure 1.
Design of the audiovisual MMN studies with the auditory only condition (A) and the
audiovisual condition (B). “A” represents the auditory stimulus presentation, “V”
represents the visual stimulus presentation. The arrow indicates the violation of the
standard speech sound in the auditory condition (A) and the double violation of both
the standard speech sound and the letter in the audiovisual condition (B).
Figure 2.
Design of the audiovisual MMN studies with the auditory only condition (A) and the
audiovisual condition (B). “A” represents the auditory stimulus presentation, “V”
represents the visual stimulus presentation. The arrow indicates the violation of the
standard speech sound in the auditory condition (A) and the double violation of both
the standard speech sound and the letter in the audiovisual condition (B).
Figure 3.
Mean amplitudes of the visual MMN averaged over the three occipital electrodes (O1,
O2 and Oz).