
www.elsevier.com/locate/ynimg

NeuroImage 34 (2007) 784–800

Impact of modality and linguistic complexity during reading and listening tasks

G. Jobard,a M. Vigneau,a B. Mazoyer,a,b and N. Tzourio-Mazoyer a,⁎

a Groupe d’Imagerie Neurofonctionnelle, UMR 6194, CNRS/CEA/Univ. Caen and Paris 5, France
b IRM CHU Caen, Institut Universitaire de France, France

Received 15 March 2006; revised 6 June 2006; accepted 18 June 2006

Reading and understanding speech are usually considered as different manifestations of a single cognitive ability, that of language. In this study, we were interested in characterizing the specific contributions of input modality and linguistic complexity to the neural networks involved when subjects understand language. We conducted an fMRI study during which 10 right-handed male subjects had to read and listen to words, sentences and texts in different runs. By comparing reading to listening tasks, we were able to show that the cerebral regions specifically recruited by a given modality were circumscribed to the unimodal and associative unimodal cortices associated with the task, indicating that the higher cognitive processes required by the task may be common to both modalities. Such cognitive processes involved a common phonological network as well as lexico-semantic activations, as revealed by the conjunction between all reading and listening tasks. The restriction of modality-specific regions to their corresponding unimodal cortices was replicated when looking at brain areas showing a greater increase during the comprehension of linguistic units more complex than words (such as sentences and texts) for each modality. Finally, we discuss the possible roles of regions showing a pure effect of linguistic complexity, such as the anterior part of the superior temporal gyrus and the ventro-posterior part of the middle temporal gyrus, which were activated for sentences and texts but not for isolated words, as well as a text-specific region found in the left posterior STS.
© 2006 Elsevier Inc. All rights reserved.

Introduction

Language is one of the most prominent and characteristic cognitive abilities of humans. Although it is usually accomplished through vocal production addressed to the interlocutor’s ears, humans have developed ways of using language through other sensory modalities. At this very moment, indeed, the reader is confronted with information aimed at the visual system rather than the ears, although this remains unarguably a language activity. At the brain level, this ability to understand language

⁎ Corresponding author. Groupe d'Imagerie Neurofonctionnelle, Centre Cyceron, 22, Boulevard Becquerel, BP5229, 14074 Caen Cedex, France. Fax: +33 231 470 222.

E-mail address: [email protected] (N. Tzourio-Mazoyer).

Available online on ScienceDirect (www.sciencedirect.com).

1053-8119/$ - see front matter © 2006 Elsevier Inc. All rights reserved.
doi:10.1016/j.neuroimage.2006.06.067

originating from different sensory modalities can be explained either by the existence of centralized amodal language resources to which information from every modality would converge to be processed, and/or by the dedication of distinct sets of cerebral regions to the processing of modality-specific information. In the framework of listening and reading, former works dating back to the beginnings of neuropsychology typically assumed the existence of lexicons where the auditory (Wernicke, 1874) and visual (Geschwind, 1970) forms of words would be separately stored, while information would ultimately be processed by the same common resources. Using visually or orally presented isolated words, Booth showed that subjects performing a rhyme judgment task (relying on auditory word forms) recruited phonological areas situated in associative auditory cortex, while a spelling judgment (relying on visual word forms) engaged associative visual areas (Booth et al., 2002a). More importantly, they demonstrated that information delivered to either the eyes or the ears was sufficient to activate these regions and complete the task, thereby exhibiting a great overlap of the networks involved in a given task for both modalities. This result was recently confirmed with the same tasks by Cohen (Cohen et al., 2004), who found a large multimodal network as well as a very restricted set of associative regions responding more to either visual or auditory words (in the fusiform gyrus and superior temporal gyrus, respectively). The putative role of these regions now differs substantially from the earlier view of lexicons, but it remains unclear how far this modality specificity extends in the cortex, and how specific to words these regions are (see Vigneau et al., 2005; Jobard et al., 2003; Price and Devlin, 2003, 2004 for a discussion of the nature of the specialization occurring in the occipito-temporal junction). While these studies showed a high degree of overlap between the brain networks involved in tasks completed in different modalities, one could argue that these similarities were induced by the very nature of the tasks, since they required a cross-modal conversion (i.e. from visual to auditory during phonological tasks, or from auditory to visual during orthographical tasks). The commonality of word processing independently of entry modality has been more directly tested by asking subjects to perform semantic judgments on words. Most of these studies demonstrated the supramodal implication of anterior regions (anterior SMA, anterior prefrontal, pre-motor and


inferior frontal gyrus) during semantic-related tasks, but variations have been observed within the temporal lobe (Chee et al., 1999; Buckner et al., 2000; Booth et al., 2002a,b). Using a word stem completion task, Buckner showed the modality-independent implication of an inferior temporal region that Chee had described as being activated during abstract/concrete decisions on auditory words but not on visual words. Additionally, Booth described the supramodal implication of yet another region, the posterior middle temporal region, when a semantic task was contrasted to a rhyming task, although it was more active in the auditory modality (Booth et al., 2002b). Beyond the supramodal networks involved at the mere lexical level, some studies attempted to uncover to what extent the same cerebral regions are involved when subjects process more complex language stimuli such as sentences. Using semantic and syntactic error detection tasks on three-word sentences, Carpentier described the common implication of the inferior frontal and superior temporal gyri during exposure to spoken and written language (Carpentier et al., 2001). While a similar overlap was reported in part by Michael, also with sentences, this author pointed out some differences between visual and auditory modalities in multimodal areas, such as a more anterior activation in the temporal gyrus and a greater involvement of the anterior part of the left inferior frontal region during audition (Michael et al., 2001).

Because the overlap between modalities may be affected by the level of processing, the present study aimed at exploring the effect of modality on brain networks while subjects process language stimuli of differing complexity. Rather than comparing levels of activity associated with increasing semantic or syntactic complexity, as has previously been done, we chose to explore the effects of linguistic complexity by using an ecological comprehension task with gradually more complex units of language: words, sentences and texts.

Materials and methods

Subjects

This study was approved by the local ethics committee. Ten healthy male volunteers aged 18–26 years (mean=22.4±3.12) took part in this study after giving their informed written consent. All subjects were right-handed as attested by the Edinburgh inventory questionnaire (mean=86.2±16) (Oldfield, 1971), had a high level of education and reported French as their mother tongue.

Experimental paradigm

The protocol included three sessions of passive reading as well as three sessions of passive listening, during which three levels of linguistic complexity were explored: words, sentences and texts.

Each reading run consisted of a block design alternating five 30-s periods of cross fixation with four 30-s reading periods of either words, sentences or texts (Fig. 1). The three listening runs likewise alternated four 30-s periods of word, sentence or text listening with five 30-s periods of cross fixation. A cross-fixation baseline was therefore used during both the reading and listening tasks. Visual stimuli were projected in white font onto a black screen, while auditory stimuli were delivered through earphones. During the reading task, subjects were asked to silently read the stimuli appearing on the screen. In the listening task, subjects were instructed to fixate a central crosshair while attending to the auditory stimuli. As an incentive for the subjects to carefully attend to the stimuli, they were informed that they would have to perform a recognition test after the scanning session.
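As a minimal sketch of this design (not the authors' code; the function and constant names below are ours), the run timeline can be written down explicitly. With five fixation and four task blocks of 30 s each, one run lasts 270 s, which matches the 45 EPI volumes analyzed per run at TR = 6 s (50 acquired minus the 5 discarded ones; see Image acquisition below).

```python
# Sketch of one run's block schedule: F T F T F T F T F (fixation/task).
TR = 6.0       # seconds per EPI volume (see Image acquisition)
BLOCK = 30.0   # seconds per block

def block_schedule(n_task_blocks=4):
    """Return (onset_s, duration_s, condition) triplets for one run."""
    schedule, t = [], 0.0
    for i in range(2 * n_task_blocks + 1):
        condition = "fixation" if i % 2 == 0 else "task"
        schedule.append((t, BLOCK, condition))
        t += BLOCK
    return schedule

run = block_schedule()
total_s = sum(duration for _, duration, _ in run)
print(len(run), "blocks,", total_s, "s,", total_s / TR, "volumes")
# -> 9 blocks, 270.0 s, 45.0 volumes
```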

In order to avoid contamination of a task by the tasks relying on higher linguistic levels, subjects were exposed to runs of increasing linguistic complexity (words, then sentences, then texts). We were indeed concerned that subjects having first read sentences (or texts) might spontaneously try to combine the stimuli of the word (or sentence) runs to reconstitute similarly constructed sentences (or texts). Furthermore, within each level of linguistic complexity, we randomly varied the order of modality (visual or auditory) from one subject to another.

Word reading and listening

The word lists used in the reading and listening conditions were matched for length, imageability and category. Common French words were extracted from the “Brulex” database (Content et al., 1990). Half of these words were frequent and highly imageable common nouns, 25% were non-conjugated verbs and 25% were adjectives. The numbers of words denoting natural and manufactured objects were equivalent. The mean length of these words was 6.5 (±1.8) letters. Each 30-s period of listening and reading was made up of 32 words, each word being presented for 940 ms.

Sentence reading and listening

All sentences were made up of common French words selected from the “Brulex” database. Each sentence had the same invariable structure: “I + present perfect verb + direct object + adjective + adverbial phrase of place”. Sentences were composed of three common nouns (or two common nouns and one adjective) and one verb, and included the functional words (personal pronoun, article) critical for a consistent syntactic form. The sentence reading and listening tasks were matched for length, imageability and category, and were equated with those of the words. Thus, each 30-s period was made up of eight sentences, namely 32 content words plus 36.8 functional words on average. Each sentence was visually or aurally presented for 3.75 s.

Text reading and listening

The texts were constructed to resemble small scripts relating a routine action performed by the narrator, such as “When I do the housework”, “When I repair my bike”, or “When I go to the doctor”. These texts were made up of sentences matched with those of the sentence condition for grammatical form: “I + present perfect verb + direct object + adjective + adverbial phrase of place”. Each text was visually or orally presented during 30 s and consisted of eight sentences, i.e. 32 common French content words plus 40 functional words (conjunctions, personal pronouns, articles) used to preserve the syntactic coherence of the text.

On average, each sentence block thus contained 36.8 more function words than a word block, while each text block included 40 more. One reason for this inequality lies in the fact that it takes more time to read four isolated words presented sequentially than four words presented at once. Had we equated the number of words in the word condition to that of the sentence or text conditions, the presentation rate in the word condition would have been too high for the subjects to identify the stimuli correctly. Presenting an equal number of words would thus have necessitated different presentation times across conditions, which would in turn have resulted in an unequal number of brain images among conditions.

We decided rather to keep the same number of brain images per condition and to equate the conditions on the number of content words included: since function words generally do not carry meaning by themselves, we chose not to include them in the word condition. This choice ensured that the word condition would activate as many concepts as the sentence and text conditions.

Fig. 1. Block design alternating 30-s periods of cognitive task with 30-s periods of cross fixation. Top: Reading tasks consisted of four periods of covert reading of either words, sentences or texts, alternating with five periods of cross fixation. Bottom: Listening tasks, during which four periods of passive listening to either words, sentences or texts alternated with five periods of cross fixation.

Moreover, presenting function words in the stream of isolated words could have induced the subjects to search for links between successive stimuli, and possibly to try to reconstruct sentences from these elements. Such a search would have altered the spontaneous strategy involved in lexical access that we wanted to capture.
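The bookkeeping behind this design choice can be summarized in a few lines. The sketch below is ours, not the authors' (the helper name is hypothetical); the numbers are taken from the text, and it simply checks that all three block types last about 30 s and contain 32 content words, differing only in the number of function words.

```python
# Per-block stimulus bookkeeping, using the figures quoted in the text.
def block_stats(n_items, item_duration_s, content_words, function_words):
    return {"duration_s": round(n_items * item_duration_s, 2),
            "content_words": content_words,
            "function_words": function_words}

blocks = {
    "words": block_stats(32, 0.940, 32, 0),       # 32 words at 940 ms each
    "sentences": block_stats(8, 3.75, 32, 36.8),  # 8 sentences of 3.75 s
    "texts": block_stats(1, 30.0, 32, 40),        # one 30-s text (8 sentences)
}
for name, stats in blocks.items():
    print(name, stats)
# e.g. words -> {'duration_s': 30.08, 'content_words': 32, 'function_words': 0}
```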

Image acquisition

MRI acquisitions were conducted on a GE Signa 1.5-T Horizon Echospeed scanner (General Electric, BUC, France). The session started with two anatomical acquisitions. First, a high-resolution structural T1-weighted sequence (T1-MRI) was acquired using a spoiled gradient recalled sequence (SPGR-3D, FOV=256×256×186 mm³, sampling=0.94×0.94×1.5 mm³) to provide detailed anatomical images. The second anatomical acquisition consisted of a double-echo proton density/T2-weighted (PD-MRI/T2-MRI) sequence (FOV=256×256×121 mm³, sampling=0.94×0.94×3.8 mm³) and was used to define the location of the 32 axial slices to be acquired during the functional sequences. Note that, because of the limited size of the field of view, the cerebellum was most of the time not entirely imaged. Each functional session consisted of a time series of 50 EPI volumes (TR=6 s, TE=60 ms, FA=90°, sampling=3.75×3.75×3.8 mm³).
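For reference, the functional acquisition parameters quoted above can be collected into a small configuration block. This is a hedged summary of ours (the key names are not a scanner protocol format), with one consistency check: a run lasts 300 s, i.e. the 270 s of blocks plus the 30 s covered by the five discarded volumes (see Functional image pre-processing).

```python
# Functional (EPI) acquisition parameters as quoted in the text.
EPI = {
    "TR_s": 6.0,
    "TE_ms": 60,
    "flip_angle_deg": 90,
    "voxel_mm": (3.75, 3.75, 3.8),
    "n_slices": 32,
    "n_volumes": 50,
}

run_length_s = EPI["TR_s"] * EPI["n_volumes"]
print(run_length_s)  # 300.0 s = 9 blocks x 30 s + 5 discarded volumes x 6 s
```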

Functional image pre-processing

The first five volumes of each functional acquisition were discarded to allow for signal stabilization, and differences in slice acquisition timing were corrected (SPM99). The sixth volume of the run was taken as the reference functional volume (fMRI0). For registration of fMRI0 onto the stereotactic Montreal Neurological Institute (MNI) template, rigid (fMRI0 onto T2-MRI and PD-MRI onto T1-MRI) and non-linear (T1-MRI onto the MNI template) registration matrices were computed and then combined. The registration of the fMRI0 and T1-MRI volumes into MNI space was thereafter visually checked with the MPI Tool software (Max-Planck Institute, Germany) and manually corrected when necessary.

Then, each fMRI volume was registered onto the fMRI0 volume (SPM99) and resampled in MNI space using the registration parameters calculated in the first procedure. Finally, data were spatially smoothed with an 8-mm FWHM Gaussian kernel, leading to an image smoothness of approximately 11 mm in the three directions.
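The original preprocessing was done in SPM99; as an illustration only, the final smoothing step can be reproduced with standard Python tools. The sketch below is ours: it converts the 8-mm FWHM into a per-axis Gaussian sigma expressed in voxels (using the EPI voxel size quoted above) and applies it to a 4-D run, assuming the volumes are already slice-time corrected and registered to MNI space.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

VOXEL_MM = np.array([3.75, 3.75, 3.8])  # EPI sampling (see Image acquisition)
FWHM_MM = 8.0
# FWHM -> sigma: sigma = FWHM / (2 * sqrt(2 * ln 2)), then convert mm -> voxels.
SIGMA_VOX = FWHM_MM / (2.0 * np.sqrt(2.0 * np.log(2.0))) / VOXEL_MM

def preprocess_run(run, n_discard=5):
    """run: 4-D array (x, y, z, t) of EPI volumes, assumed already
    slice-time corrected and resampled into MNI space (SPM99 steps)."""
    run = run[..., n_discard:]  # drop the first five volumes
    smoothed = [gaussian_filter(run[..., t], sigma=SIGMA_VOX)
                for t in range(run.shape[-1])]
    return np.stack(smoothed, axis=-1)

demo = preprocess_run(np.random.rand(64, 64, 32, 50))
print(demo.shape)  # (64, 64, 32, 45)
```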

Image analysis

The preprocessed functional MRI data were analyzed and integrated into a statistical model with the semi-automated SPM99 software (Friston et al., 1995). In an initial fixed-effect analysis, we obtained, for each subject, a set of contrast maps containing the values of BOLD signal variation between each cognitive task and the reference condition (cross fixation). This first stage resulted in six contrast maps for each subject: word reading–fixation (WORD_R), sentence reading–fixation (SENT_R), text reading–fixation (TEXT_R), word listening–fixation (WORD_L), sentence listening–fixation (SENT_L) and text listening–fixation (TEXT_L).

These contrast maps were entered into a random-effect statistical analysis including the search for four different types of effect at the group level. (1) The first contrast analysis was aimed at evidencing the neural networks whose BOLD signal varied with modality, regardless of linguistic complexity. This effect of modality was explored by contrasting reading (WORD_R+SENT_R+TEXT_R) to listening (WORD_L+SENT_L+TEXT_L), together with the reverse contrast (i.e. listening minus reading). (2) A conjunction analysis of the reading and listening tasks was performed to identify the neural networks common to both modalities irrespective of linguistic complexity: (WORD_R+SENT_R+TEXT_R)∩(WORD_L+SENT_L+TEXT_L). (3) We then searched for an interaction between modality and linguistic complexity, considering words, sentences and texts as the three ordered levels of a complexity factor. We therefore searched for greater BOLD signal increases with linguistic complexity for the visual compared to the auditory modality ((TEXT_R−WORD_R)−(TEXT_L−WORD_L)), as well as the opposite effect (i.e. greater increases with complexity for the auditory than the visual modality: (TEXT_L−WORD_L)−(TEXT_R−WORD_R)). The same procedure was used to compare sentences to words ((SENT_L−WORD_L)−(SENT_R−WORD_R) and the opposite comparison) and texts to sentences ((TEXT_L−SENT_L)−(TEXT_R−SENT_R) and the opposite). (4) Finally, the impact of complexity (irrespective of input modality) was explored with the contrast text minus word comprehension ((TEXT_R+TEXT_L)−(WORD_R+WORD_L)). The resulting map was then exclusively masked by the interaction contrast maps obtained in (3) in order to remove the cerebral regions showing modality-specific increases of activity with complexity. Similar contrast maps were obtained for sentences compared to words and for texts compared to sentences.
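To make the logic of these group-level effects concrete, the sketch below (ours, not the authors' SPM batch) writes three of them as weight vectors over the six first-level maps; the conjunction in (2) is noted in a comment because it is a joint thresholding of maps rather than a single weight vector.

```python
import numpy as np

# Order of the six first-level (task minus fixation) contrast maps:
conditions = ["WORD_R", "SENT_R", "TEXT_R", "WORD_L", "SENT_L", "TEXT_L"]

# (1) Modality effect: reading minus listening (negate for the reverse).
reading_vs_listening = np.array([1, 1, 1, -1, -1, -1])

# (3) Modality-by-complexity interaction, texts vs. words:
# (TEXT_R - WORD_R) - (TEXT_L - WORD_L)
interaction_text_word = np.array([-1, 0, 1, 1, 0, -1])

# (4) Complexity effect irrespective of modality, later masked by (3):
# (TEXT_R + TEXT_L) - (WORD_R + WORD_L)
complexity_text_word = np.array([-1, 0, 1, -1, 0, 1])

# (2) The conjunction is not a weight vector: each of the two summed maps
# is thresholded at p < 0.001, giving a joint threshold of p < 0.000001
# (0.001 squared) for the conjunction map.

for name, c in [("reading>listening", reading_vs_listening),
                ("interaction texts-words", interaction_text_word),
                ("complexity texts-words", complexity_text_word)]:
    print(name, dict(zip(conditions, c)))
```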

For each of these analyses, the beta values associated with the six initial contrast maps (WORD_R, WORD_L, SENT_R, SENT_L, TEXT_R and TEXT_L) were extracted from the significantly activated regions in order to be plotted and to provide detailed information about their activation profiles during each task.

The thresholds for the contrast maps used throughout the second-level analysis were set at p<0.001 uncorrected for multiple comparisons, and at p<0.000001 for the conjunction analysis (p<0.001 for each map included in the conjunction). Activation clusters encompassing fewer than 10 voxels were discarded, except in the analyses concerning the impact of complexity on the brain networks (‘texts minus sentences’ and ‘sentences minus words’, see Table 4), since those clusters were already present and above threshold in the ‘texts minus words’ contrast.

Results

Post-fMRI behavioral results

Although subjects were given no explicit instruction to encode the material presented, they went through a post-fMRI session designed to measure their incidental memorization, as indirect evidence of their attention during the passive tasks they had to accomplish during the fMRI acquisition.

Word and sentence free recognition tasks

For the word and sentence conditions, the post-fMRI questionnaires consisted of recognizing the materials presented either orally or visually among other, non-presented items. Two lists (one for each modality of presentation) containing 128 words, and two lists made of 32 sentences, were presented to the subjects, who simply had to circle the items they recognized. Unknown to the subjects, these lists were made of half distractors and half targets. Because of these unconstrained instructions, results were quite variable from one subject to another, some subjects choosing to circle only the few items they were absolutely sure about, others circling a greater number of stimuli.

For technical reasons, data concerning the words presented orally could not be gathered. On average, 20.8 (±10.64) words out of 64 were correctly recognized from the word reading condition, representing 81.24% of correct answers among the circled items. In the sentence reading condition, subjects correctly recognized 8.1 (±2.47) sentences out of 16 (85.74% of the circled items), while they recognized 6.7 (±2.45) of the 16 sentences presented orally (74.26% of the circled items).

Text comprehension tasks

For the text conditions, subjects' attention was tested by asking six questions about each of the four texts presented either orally or visually. On average, subjects scored 15.5 (±4.01) out of 24 for the text reading condition and 15.9 (±5.34) for the text listening condition, indicating adequate comprehension of the texts presented.

Effect of input modality

Reading minus listening

The activation pattern associated with reading, compared to that of listening, included three clusters located within the bilateral occipital, parietal and frontal lobes. The most posterior cluster was centered on the medial part of the bilateral occipital lobe (i.e. the bilateral calcarine fissure) and spread dorsally towards the bilateral middle occipital gyrus and rostrally towards the fusiform gyrus, including in the left hemisphere the occipito-temporal junction often described as the Visual Word Form Area (VWFA). In the parietal lobe, the activation peaks were situated in the bilateral superior parietal gyri. The left frontal cluster was located in the precentral gyrus at the level of the inferior frontal sulcus. Finally, a significant activation was detected within the bilateral lateral geniculate nuclei (Table 1, Fig. 2).

All activation peaks exhibited significant activation during reading and a small deactivation during listening, except for the left precentral gyrus and the VWFA, which were also active during listening, although to a lesser degree.

Listening minus reading

In the left hemisphere, the main cluster activated preferentially during the listening task was observed in the temporal cortex. It was situated in the superior temporal gyrus, included some parts of the middle temporal gyrus, and stretched along the y axis from the temporal pole to the posterior part of the superior temporal gyrus, including both the Heschl gyrus and the planum temporale. Surprisingly, some restricted activations were also observed in the visual cortex (middle occipital gyrus) as well as in the postcentral gyrus and the medial temporal cortex (anterior part of the fusiform gyrus; Table 1, Fig. 3).

Table 1
Modality effect: mean activation peaks identified during reading minus listening and the reverse contrast, irrespective of linguistic complexity (N=10)

Anatomical localization              N voxels    X     Y     Z    Z score

Reading > Listening [(WORD_R+SENT_R+TEXT_R) − (WORD_L+SENT_L+TEXT_L)]
Occipito-temporal lobe
  L calcarine*                       6394      −16   −94    −6   Inf
  R lingual*                                    24   −90    −4   7.71
  L fusiform*                                  −40   −72   −18   7.08
  L middle occipital*                          −26   −84    20   6.22
  R middle occipital*                           32   −78    22   5.43
  L VWFA                                       −48   −52   −12   3.72
  R fusiform                           61       38   −44   −22   4.10
Parietal lobe
  L superior parietal*                433      −28   −62    52   6.46
  R superior parietal                  24       28   −68    62   3.71
Frontal lobe
  L precentral*                       264      −52     0    34   5.17
  L lat geniculate nucleus*           156      −22   −32    −2   5.97
  R lat geniculate nucleus             76       26   −30     0   5.62

Listening > Reading [(WORD_L+SENT_L+TEXT_L) − (WORD_R+SENT_R+TEXT_R)]
Temporal cortex
  L superior temporal*               4811      −52   −20     6   (Inf)
  L mid/superior temporal*                     −60   −36    16   (6.41)
  L temporal pole                              −44    −4   −14   (4.64)
  L fusiform                           54      −24   −44   −12   (3.92)
  R superior temporal*               5720       56   −14     0   (Inf)
  R superior temporal*                          66   −18     0   (Inf)
  R temporal pole*                              58     0    −8   (Inf)
  R STS*                                        62   −40     6   (5.72)
Occipital cortex
  L middle occipital                   87      −56   −74    16   (4.51)
  R precuneus                         123       24   −48    14   (4.58)
Parietal cortex
  L postcentral                        10      −40   −36    64   (3.97)
Frontal cortex
  R IFG triangularis                  108       44    20    20   (4.18)
  L posterior cingulum                 98      −14   −42    14   (4.78)
  R cerebellum                         29        8   −44   −16   (3.58)

Statistical maps were set at p<0.001 uncorrected for multiple comparisons. The reading minus listening contrast map was inclusively masked by reading minus fixation (and the reverse contrast by the listening minus cross fixation map; p<0.05 uncorrected). Anatomical localization is given in the MNI stereotactic space. Activation peaks still significantly activated at the corrected threshold of p<0.05 are indicated by an asterisk (L: left, R: right, STS: superior temporal sulcus, IFG: inferior frontal gyrus, VWFA: visual word form area, lat: lateral, mid: middle).


In the right hemisphere, a large cluster, homologous to that observed in the left hemisphere, was detected in the temporal lobe. It was located in the superior and middle temporal gyri, overlapped the superior temporal sulcus, and extended from the temporal pole to the posterior part of the superior temporal gyrus. A small activation was also detected in the pars triangularis of the inferior frontal gyrus.

All clusters of this contrast exhibited activation during listening and a small deactivation during reading, except for the right middle temporal gyrus, which showed a small BOLD increase during the reading tasks.

Conjunction of reading and listening

The neural network shared by the listening and reading tasks was observed bilaterally, with a larger extent in the left hemisphere. In the left hemisphere, two large clusters of activation located within the temporal and frontal cortices were jointly activated by reading and listening. The left temporal cluster was centered on the posterior part of the middle temporal gyrus, close to the superior temporal sulcus, and extended ventrally into the occipito-temporal junction, including the VWFA and the inferior temporal gyrus. In the left frontal cortex, a large activation was centered on the upper part of the precentral gyrus and extended ventrally towards both the pars triangularis and the pars opercularis of the inferior frontal gyrus (IFG) (Table 2, Fig. 4).

In the right hemisphere, the common activations consisted of three clusters, also located within the frontal and temporal lobes. In the frontal cortex, a first peak was centered on the precentral gyrus, encompassing the middle frontal gyrus, while a second cluster was located within both the pars orbitalis and the pars triangularis of the IFG. In the temporal cortex, an activation peak was detected in the posterior part of the superior temporal sulcus. Although its location appears homologous to the one observed in the left hemisphere, its extent was much smaller.

Fig. 2. Modality effect: BOLD signal increases identified in the whole group (N=10) during reading minus listening, irrespective of linguistic complexity. The contrast was set at p<0.001 uncorrected and masked by reading (p<0.05 uncorrected, inclusive). Activation peaks were projected onto the MNI template. Graph: BOLD signal variations observed during reading (yellow) and listening tasks (blue). VWFA: visual word form area.


Additional common activations were seen within the supplementary motor area, the left cerebellum and the cerebellar vermis.

Interaction between modality and linguistic complexity

By looking for regions presenting a greater BOLD increase from words to texts in the reading compared to the listening conditions, we could unravel the complexity-dependent areas that were specific to the visual modality. The reverse contrast naturally showed the regions in which the difference of activity elicited by the texts vs. words comparison was higher for the auditory than the visual modality. These two analyses characterized the interaction between the modality of presentation and linguistic complexity.

Fig. 3. Modality effect: BOLD signal increases identified in the whole group (N=10) for the contrast listening minus reading, irrespective of linguistic complexity. The contrast was set at p<0.001 uncorrected and was masked by listening (p<0.05 uncorrected, inclusive). Activation peaks were projected onto the MNI template. Graph: BOLD signal variations observed during reading (yellow) and listening tasks (blue).

Effect of linguistic complexity in the visual modality-specific networks

In the bilateral occipital cortex, a large medial cluster of activation was centered on the polar part of the calcarine fissure, encompassing the lingual gyrus, and elicited larger BOLD variations during the reading than the listening task. In this bilateral activation peak, a gradual BOLD increase from words to texts was seen for the visual modality, whereas no variation according to linguistic complexity could be observed in this region for the auditory input (Table 3, Fig. 5A).

Table 2
Common networks: mean activation peaks identified by conjunction of reading and listening, irrespective of linguistic complexity [(WORD_L+SENT_L+TEXT_L) ∩ (WORD_R+SENT_R+TEXT_R)]

Anatomical localization              N voxels    X     Y     Z    Z score

Temporal lobe
  L posterior MTG*                   1347      −56   −48     6   Inf
  L post t2 (BTLA)*                            −50   −44   −10   Inf
  L VWFA*                                      −48   −56   −12   7.42
  R superior temporal                  14       52   −42    14   5.26
Frontal lobe
  L superior precentral*             1931      −48     2    54   Inf
  L IFG pars triangularis*                     −58    30     2   7.46
  L IFG pars triangularis*                     −46    26    16   6.65
  L inf precentral sulcus*                     −42     6    28   5.98
  R IFG pars orbitalis*               535       46    28    −4   7.14
  R IFG pars triangularis*                      60    28    −2   6.88
  R superior precentral*              356       58    −6    44   Inf
  R middle frontal*                             46     6    58   5.69
  SMA*                                114        0     8    58   5.54
Cerebellum
  R cerebellum*                        40       32   −62   −28   6.98
  L cerebellum*                        73      −26   −68   −26   6.53
  Vermis*                             253        0   −36    −4   7.36
  L lat geniculate nucleus*                    −14   −30    −6   6.08
  Vermis*                              63       −2   −56   −36   6.02

The conjunction statistical map was thresholded at p<0.000001 uncorrected (p<0.001 for each contrast map). Coordinates are given in the MNI stereotactic space. Activation peaks still significantly activated at the corrected threshold of p<0.0025 (corrected p<0.05 for each contrast map) are indicated by an asterisk (L: left, R: right, MTG: middle temporal gyrus, IFG: inferior frontal gyrus, SMA: supplementary motor area, VWFA: visual word form area, t2: inferior temporal sulcus, BTLA: basal temporal language area, inf: inferior, lat: lateral).


This interaction contrast map also presented an activation peak located in the right lateral geniculate nucleus, where BOLD signal variations were larger during the reading than the listening task and increased linearly with linguistic complexity during reading only.

Effect of linguistic complexity in the auditory modality-specific networks

One peak, located in the right Heschl gyrus, showed an increase in BOLD activity with linguistic complexity during listening, while no signal variation – or a small decrease – was observed from words to texts during reading (Table 3, Fig. 5B).

Impact of linguistic complexity irrespective of modality

When the BOLD signal variations of words were subtracted from those of texts, three activation peaks located within the temporal lobe showed a complexity effect in the left hemisphere. One peak was located in the posterior part of the STS, one was situated in the anterior part of the superior temporal sulcus, and the last resided in the posterior part of the middle temporal gyrus. In addition, a small activation was present within the anterior fusiform gyrus (Table 4, Fig. 6).

Within the left frontal cortex, an activation was detected in the upper part of the precentral sulcus.

In the right hemisphere, the only activation peak we could observe was located within the temporal pole.

These regions could be categorized according to their activation pattern as a function of linguistic complexity, as revealed by the direct comparisons of (a) sentences minus words and (b) texts minus sentences. (1) The left precentral sulcus and left IFG pars triangularis were already activated at the word level and were gradually more recruited as linguistic complexity rose, although this difference only became significant when directly comparing words to texts. (2) Some regions were gradually more activated for sentences and texts, with a deactivation during word comprehension, as was seen in the anterior fusiform gyrus, the anterior part of the left superior temporal sulcus, the ventral part of the left middle temporal gyrus and the right temporal pole. (3) Finally, the only text-specific region, i.e. the left posterior part of the superior temporal sulcus (STS), exhibited a small deactivation during all tasks but text comprehension.

Discussion

Impact of input modality

The most striking result concerning the impact of entry modality was the confinement of modality-specific activations mainly to unimodal association cortex. Indeed, no higher-order area previously described as involved in language comprehension contributed to a greater degree to one modality than to the other.

During reading more than listening, activity was observed in unimodal visual areas situated in the bilateral occipital cortices, extending ventrally towards the fusiform and inferior temporal gyri, at the junction between the occipital and temporal cortices. Beyond the self-evident involvement of occipital regions in processing the flow of visual information, the greater implication of the left occipito-temporal junction during reading than listening is in agreement with previous reports presenting this region as critical for establishing the connection between a perceived visual form and its corresponding representation. This region was introduced by Cohen as the ‘visual word form area’ (Cohen et al., 2000) and has been attributed functional characteristics (i.e. dedication to words compared to pictures, dedication to word-like stimuli compared to alphabetically illegal strings, dedication to the visual modality) that are still under debate (Vigneau et al., 2005; Jobard et al., 2003; Price and Devlin, 2003, 2004). Among these characteristics, its purported dedication to the visual modality has been questioned (Price and Devlin, 2004). While Dehaene showed no trend for activation in the VWFA during auditory word comprehension (Dehaene et al., 2002), some studies reported the implication of the VWFA during tasks with auditorily delivered words (Price et al., 2003; Cohen et al., 2004; Booth et al., 2002b). The present results confirm the preference of the VWFA for visual stimulation, since we found it more activated by reading than listening, but the conjunction analysis discussed below revealed that this region is also significantly activated by word listening. This activation in the auditory modality is still at the center of an ongoing debate on the functional role of the VWFA. For some authors, its activation during auditory words simply results from the subjects picturing the orthographic forms of the words in their mind (Dehaene et al., 2002), which remains compatible with the initially conceived role and specificity of the VWFA. According to others, since pictures

Fig. 4. Common networks: Activation peaks identified in the whole group (N=10) by conjunction of reading and listening, irrespective of complexity. The contrast was set at p<0.000001 uncorrected (p<0.001 uncorrected for each contrast map) and was projected onto the MNI template. Graph: BOLD signal variations observed during reading (yellow) and listening tasks (blue). STS: superior temporal sulcus, IFG: inferior frontal gyrus, VWFA: visual word form area, SMA: supplementary motor area, BTLA: basal temporal language area.


Table 3
Mean activation peaks identified for the interaction between modality (reading and listening) and linguistic complexity (N=10)

Anatomical localization              N voxels    X     Y     Z    Z score

Texts minus words
Reading > Listening [(TEXT_R−WORD_R) − (TEXT_L−WORD_L)]
  L calcarine*                       3208      −12   −98    −2   6.08
  L lingual*                                   −24   −82   −14   4.98
  R calcarine*                                  16   −88     4   5.19
  R middle occipital*                           24   −94     8   5.16
  R lingual                                     16   −62    −8   3.75
  R lateral geniculate nucleus         33       24   −30    −2   3.81
Listening > Reading [(TEXT_L−WORD_L) − (TEXT_R−WORD_R)]
  R Heschl                             13       44   −20     8   3.40

Texts minus sentences
Reading > Listening [(TEXT_R−SENT_R) − (TEXT_L−SENT_L)]
  L middle occipital                   55      −14  −100     0   3.98
  L middle occipital                   16      −28   −90     0   3.61
  R middle occipital                   68       28   −94     6   4.26
Listening > Reading [(TEXT_L−SENT_L) − (TEXT_R−SENT_R)]
  No cluster above threshold

Sentences minus words
Reading > Listening [(SENT_R−WORD_R) − (SENT_L−WORD_L)]
  L calcarine                          78      −10   −86     2   3.47
  L middle occipital                           −14   −92    −8   3.35
  R calcarine                         120       14   −76     8   3.87
  R lingual                            23       14   −70    −6   3.34
Listening > Reading [(SENT_L−WORD_L) − (SENT_R−WORD_R)]
  No cluster above threshold

Statistical maps were thresholded at p<0.001 uncorrected and inclusively masked by the main activation maps (i.e. TEXT_R and TEXT_L for the texts minus words and texts minus sentences comparisons, SENT_R and SENT_L for the sentences minus words comparison). The threshold used for the mask was set at p<0.05 uncorrected. Coordinates of activation peaks are given in the MNI stereotactic space. Activation peaks still significantly activated at the corrected threshold of p<0.05 are indicated by an asterisk (L: left, R: right).


activate the VWFA to a similar degree (Price et al., 2003), its activity during word listening could correspond to subjects relating the meaning of the words to the visual attributes of the objects named (Vigneau et al., 2005). Its role would then be broader than a purely orthographical/alphabetical processing of visual words, and would consist in binding all visual representations (lexical and pre-lexical letter assemblies for words, but also images of objects) to their verbal counterparts.

During listening more than reading, the superior temporal cortex, known to encompass the primary and associative auditory areas, was strongly activated in both hemispheres, extending from the temporal pole to the back of the gyrus and spreading to the mid part of the middle temporal gyrus. Besides the numerous studies detailing its contribution to phonological tasks and human voice processing (see Vigneau et al., 2006 for a review), the preference of the superior temporal cortex – together with the superior temporal sulcus – for auditory over written language processing has been reported in previous studies (Chee et al., 1999; Booth et al., 2002b). It is noteworthy that this region of activation encompasses an area located in the anterior part of the superior temporal gyrus – spanning the superior bank of the superior temporal sulcus – that has previously been associated with the auditory equivalent of the VWFA. This auditory word form area was originally identified on the basis of auditory word repetition suppression and its lack of activation during written word processing (Price et al., 2005; Cohen et al., 2004). Similarly to the VWFA, its dedication to the processing of spoken words is also under debate, as it also corresponds to the voice area and shows enhanced activity during the perception of vocal sounds compared to other types of sounds (Belin et al., 2000, 2002). Rather than being specific to words, this region could be triggered by auditory processes relative to the frequency components occurring in speech, but also in other types of human vocal sounds (laughs, coughs…).

Multimodal network of language comprehension

Complementarily to the previous analysis, which opposed visual and auditory conditions in order to test for cerebral preferences associated with a given modality, we also looked for the brain areas jointly activated in both modalities. It is important to keep in mind that such a conjunction analysis explores the networks common to both reading and listening independently of linguistic complexity. As will be shown in the next paragraphs, multimodal regions may still exist that would not be uncovered by this conjunction, simply because their implication does not span all levels of linguistic complexity. This conjunction is nevertheless very informative in that it highlights the most stable brain networks involved in the comprehension of variously complex linguistic units. We thus revealed the involvement in language comprehension tasks of the precentral, inferior frontal and temporal gyri, as well as of the posterior part of the middle temporal gyrus, close to the superior temporal sulcus.

Phonological processing

A quite large frontal cluster of activation was found to be commonly recruited by the auditory and visual tasks. This cluster extended from the superior part of the left precentral gyrus into the inferior frontal gyrus. Situated immediately anterior to the mouth primary motor area, this bilateral precentral region has been reported in many studies calling for overt or covert repetition of syllables (Wildgruber et al., 2001), phonemes (Bookheimer et al., 1998, 2000) and words (Price et al., 1996), as well as syllable discrimination (Poeppel et al., 2004). Given that this area activates during both the production and the perception of monosyllabic speech sounds, Wilson recently proposed an involvement of this area in the retrieval of the phonetic codes postulated by Liberman (Wilson et al., 2004). According to Liberman, the articulatory motor commands at the origin of speech sounds most accurately define the phonemes of a given language. These commands would constitute phonetic codes that serve during listening as a basis for pairing the sound heard with the adequate phoneme it represents (Liberman and Whalen, 2000). In the context of speech perception, the retrieval of phonetic codes would help disambiguate the auditory flow analyzed by the superior temporal cortex and permit access to phonemes. While reading, the precentral involvement would manifest the activation of these articulatory codes during the subvocal pronunciation of the visually presented words.

Whereas this precentral activation is probably tied to a more motor aspect of phonology, we believe this region works in close relationship with a sub-part of the inferior frontal gyrus also implicated in phonology. The dorsal aspect of the triangular part of Broca’s area has indeed been described in contrasts maximizing the role of phonology, such as visual word rhyming tasks (Booth et al., 2002a), non-word vs. word reading (Paulesu et al., 2000) and pseudoword repetition compared to verb generation (Warburton et al., 1996).

Fig. 5. Modality/complexity interaction. Activation peaks showing a greater involvement for texts than words (A) during reading than listening and (B) during listening than reading in the whole group (N=10). Both contrasts were set at p<0.001 uncorrected. The reading minus listening contrast was masked by text reading, the reverse contrast by text listening (p<0.05 uncorrected, inclusive). Activation peaks were projected onto the MNI template. Graph: BOLD signal variations observed during reading (yellow) and listening tasks (blue).

Table 4
Complexity effect: mean activation peaks identified for the contrasts between the three complexity levels, irrespective of modality (N=10)

Anatomical localization              N voxels    X     Y     Z    Z score

Texts minus words [(TEXT_R+TEXT_L) − (WORD_R+WORD_L)]
Frontal lobe
  L superior precentral                37      −42     4    50   3.39
  L superior precentral                        −46     4    54   3.34
Temporal lobe
  L posterior STS*                    372      −48   −52    20   4.77
  L post middle temporal ventral               −58   −56    −2   4.46
  L middle temporal                            −50   −66    20   3.94
  L ant STS                           107      −58    −8   −16   4.12
  L ant middle temporal                        −56   −16   −12   3.87
  L middle temporal                    24      −54   −36    −6   3.53
  L fusiform                           15      −22   −42   −12   3.66
  R temporal pole                      10       56    12   −24   3.76

Texts minus sentences [(TEXT_R+TEXT_L) − (SENT_R+SENT_L)]
  L posterior STS*                    250      −48   −54    20   4.73

Sentences minus words [(SENT_R+SENT_L) − (WORD_R+WORD_L)]
Temporal lobe
  L cerebellum                        142      −18   −42   −18   4.23
  L fusiform                                   −30   −40   −18   3.81
  L fusiform                                   −24   −50   −16   3.59
  L ant STS                             5      −56    −8   −14   3.44
  L post middle temporal ventral        2      −58   −56    −2   3.18
  R temporal pole                      10       54    12   −22   3.40
  Vermis                               24        6   −36    −8   3.80

Statistical contrast maps were set at p<0.001 uncorrected, inclusively masked by (TEXT_R+TEXT_L) or (SENT_R+SENT_L) (p<0.05 uncorrected) and exclusively masked by the corresponding interaction maps (p<0.05 uncorrected). Anatomical localization is given in the MNI stereotaxic space. Activation peaks still significantly activated at the corrected threshold of p<0.05 are indicated by an asterisk. Activation clusters encompassing fewer than 10 voxels were discarded, except in the restrictive ‘texts minus sentences’ and ‘sentences minus words’ contrasts, where they were already present and above threshold for the ‘texts minus words’ contrast (L: left, R: right, STS: superior temporal sulcus, ant: anterior, post: posterior).


On the basis of our previous meta-analysis, we expected the superior temporal gyrus to appear in the phonological network common to reading and hearing, since its activation had been shown during the phonological reconstruction of written words and during the phonological analysis of heard words (Jobard et al., 2003). Its absence from the present conjunction may reflect the fact that subjects did not rely systematically on the phonological reconstruction of written words during reading, and may instead have accessed the words by way of the lexico-semantic route, i.e. by direct visual form-to-meaning association. This form-to-meaning mapping allows a faster, word-length independent access to words, provided the subject has encountered the target words often enough to encode such connections. Given that the stimuli were quite frequent and that our subjects were all expert readers, they were very likely to use this highly efficient route built on their reading experience, resulting in the combined activation of the VWFA and semantic regions. With less frequent words, or less experienced readers, a greater involvement of grapho-phonological conversion would probably have been observed, resulting in a greater involvement of the superior temporal sulcus necessary for grapheme-to-phoneme conversion. This difference in word access during the reading tasks would then have contributed to the appearance of this region in the conjunction.

Lexico-semantic access

In studies relying on auditory or written language, the triangular part of the inferior frontal gyrus, the inferior temporal gyrus and the posterior part of the middle temporal gyrus have all been described as playing a role in semantics. The ventral aspect of the pars triangularis of the left inferior frontal gyrus is indeed repeatedly reported when subjects have to monitor semantic attributes of stimuli (Binder et al., 1996; Adams and Janata, 2002; Bright et al., 2004) or to generate verbs associated with a target word (Petersen et al., 1998; Poldrack et al., 1999). The posterior part of the middle temporal gyrus, close to the superior temporal sulcus, sometimes referred to by some authors as Wernicke’s area,¹ is systematically involved during access to word meaning in the visual (Pugh et al., 1996; Price et al., 1997; Fiebach et al., 2002) and auditory modalities (Thompson-Schill et al., 1999; Bookheimer et al., 1998, 2000). The inferior temporal region seems to correspond to the Basal Temporal Language Area (BTLA) described in neurological studies (Luders et al., 1991).² The BTLA is generally described as situated in the left ventral temporal cortex, with varying degrees of exactitude, at a location ranging from 3 to 7 cm from the tip of the temporal lobe. By observing impairment of object naming, Krauss situated it 4.6±0.5 cm from the temporal tip (Krauss et al., 1996), which approximately coincides with the location observed here. Electrical stimulation of the BTLA has been associated with various language deficits affecting naming and comprehension, although some studies found no long-lasting language impairment after BTLA resection (Krauss et al., 1996). In the functional imaging field, based on coordinates around y=−40, its implication has been reported during written (Fiez et al., 1999; Hagoort et al., 1999; Cappa et al., 1998) and auditory word comprehension (Binder et al., 1996; Démonet et al., 1992, 1994), as well as in semantic decisions on objects (Thompson-Schill et al., 1999). Büchel moreover showed its recruitment by sighted and blind subjects during a reading task (a Braille reading task for the blind subjects), with no effect of word imageability (Buchel et al., 1998).

¹ Although this label can be confusing, given the semantic glide from the initial ‘auditory word form center’ to ‘semantic integration center’.
² Note that this cluster of activation is situated at the edge of the mask delimiting the actual functional data for our group of subjects. Because of strong susceptibility artefacts in this region, we could not measure fMRI signal anterior to this point. It may therefore well be that this BTLA activation in fact extends more anteriorly, towards the tip of the temporal pole, than what we observe here.

The lack of BTLA modulation by word imageability suggests a role in accessing the properties of stimuli evoked by words in a truly supramodal fashion. This region was formerly reported by Chee et al. (1999) as being active only during auditory abstract/concrete decisions, and not during decisions on visual words. Buckner however, in accordance with our results, observed its implication in both modalities (Buckner et al., 2000).

Our former suggestion that the set of areas just described forms a lexico-semantic network (Jobard et al., 2003) finds further support in this study with the conjoint activation of all of its constituents (i.e. the inferior frontal and inferior temporal gyri, as well as the posterior part of the middle temporal gyrus, close to the superior temporal sulcus). Furthermore, this result confirms the independence of the whole network from input modality, since all regions were activated to a similar degree during both the auditory and the visual tasks (see Fig. 4).

Fig. 6. Complexity effect: Activation peaks identified in the whole group (N=10) for the contrast texts minus words, independently of input modality. The contrast between texts and words was set at p<0.001 uncorrected and was masked by text (p<0.05 uncorrected, inclusive). Activation peaks were projected onto the MNI template. Graph: BOLD signal variations observed during reading (yellow) and listening tasks (blue).

Functional inhomogeneity of the ventral occipito-temporal activation cluster

The results of the conjunction between visual and auditory comprehension led us to conclude that the ventral cluster we described as the BTLA (coordinates: −50 −44 −10) plays a semantic role. A surprising result, however, concerned the posterior extent of this activation cluster. In its most posterior part, this region actually corresponds to the occipito-temporal junction we reported above as being more engaged by the visual modality (coordinates: −48 −56 −12). While the conjunction analysis reveals areas commonly activated during visual and auditory tasks, it does not exclude the possibility that a region of this network is actually more activated in a given task. The activation profiles within the resulting network may thus differ largely between the two tasks, depending on the local maximum under scrutiny. When looking at the most anterior local maximum, corresponding to the BTLA, the auditory and visual tasks elicited a similar degree of activation (see the occipito-temporal histograms in Fig. 4). On the contrary, the most

posterior local maximum – the occipito-temporal junction possiblycorresponding to the VWFA – was jointly activated during visualand auditory comprehension, but it was still significantly moreactivated during reading tasks since it appeared in the visual vs.auditory comparison. Cohen recently described similar behaviorswithin this occipito-temporal complex during phonological andorthographic tasks: (1) A joint and equal activation for bothmodalities in a left inferior temporal region he named ‘Left InferiorMultimodal Area’ – LIMA – and (2) a joint activation during readand spoken words (with a higher level of activation during thevisual modality) observed in a more posterior region he referred toas the VWFA.
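The distinction at stake here, that a conjunction across modalities does not rule out a between-modality difference at the same voxel, can be made concrete with a toy computation. The sketch below is ours and purely illustrative (invented per-subject effect sizes, hypothetical variable names); it is not part of the original analysis.

    # Toy illustration (invented data): a region can survive a conjunction of
    # visual > baseline and auditory > baseline while still being
    # significantly more active during the visual task.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    visual = rng.normal(2.5, 1.0, 10)    # per-subject effects, visual task (N = 10)
    auditory = rng.normal(1.0, 1.0, 10)  # per-subject effects, auditory task

    _, p_vis = stats.ttest_1samp(visual, 0.0)      # visual vs. baseline
    _, p_aud = stats.ttest_1samp(auditory, 0.0)    # auditory vs. baseline
    _, p_diff = stats.ttest_rel(visual, auditory)  # direct visual vs. auditory test

    print(f"visual > baseline:   p = {p_vis:.4f}")
    print(f"auditory > baseline: p = {p_aud:.4f}")
    print(f"visual vs. auditory: p = {p_diff:.4f}")

With effect sizes of this order, all three tests will typically be significant: such a voxel enters the conjunction yet still appears in the visual vs. auditory comparison, which is precisely the profile reported here for the occipito-temporal junction.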

Table 5
Coordinates and activation profiles of the occipito-temporal junction's local maxima, compared to those found in Cohen (2004) and in Jobard's meta-analysis (2003)

              Visual* > Auditory           Visual* > Auditory*          Visual* ≈ Auditory*
              X    Y    Z    Label         X    Y    Z    Label         X    Y    Z    Label
Jobard 2003                                −44  −58  −15  VWFA          −48  −41  −16  BTLA
Cohen 2004                                 −44  −68  −4   VWFA          −48  −60  −16  LIMA
This study    −40  −72  −18  L fusiform    −48  −56  −12  VWFA          −50  −44  −10  BTLA

While the activation profile of Cohen's 2004 VWFA matches that of the present study's VWFA, the coordinates of the VWFA observed in this study and in our meta-analysis are more anterior than Cohen's 2004 VWFA and rather correspond to the LIMA. The asterisk (*) attached to a modality indicates that the region was significantly activated when that modality was compared to baseline, at p < 0.001 uncorrected in our study and at p < 0.01 in Cohen (2004).

While the activation profiles of the LIMA and VWFA described by Cohen do correspond to those of the BTLA and the VWFA of this study, respectively, the coordinates associated with each profile do not match (see Table 5). Cohen's Lateral Inferior Multimodal Area is indeed 16 mm more posterior than the Basal Temporal Language Area of this study, and is rather in the vicinity of the occipito-temporal region – 'our' VWFA – that exhibited more activation for visual than auditory tasks. 'Cohen's 2004 VWFA', however, is 12 mm posterior and 8 mm superior to 'ours', a location quite different from his earlier works (Cohen 2000: −43 −54 −12 and Cohen 2002: −42 −57 −15, as opposed to Cohen 2004: −44 −68 −4). While we cannot explain the discrepancy with Cohen's results in terms of spatial location, note that the present coordinates are very consistent with the antero-posterior gradient between semantics and visual processing described in the literature along the ventral pathway. As shown in Table 5, the coordinates of this fMRI study are in fact very close to the clusters found in our earlier meta-analysis based on the works of several different research groups (Jobard et al., 2003), as well as to Cohen's earlier coordinates.
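As a worked check of the distances quoted above, the short script below (purely illustrative, not part of the original analysis) recomputes the displacements between the Table 5 peaks; coordinates are MNI millimeters, where more negative y is more posterior and larger z is more superior.

    # Worked check of the spatial relations between the Table 5 peaks (MNI, mm).
    import numpy as np

    btla_here = np.array([-50, -44, -10])  # BTLA, this study
    lima_2004 = np.array([-48, -60, -16])  # LIMA, Cohen 2004
    vwfa_here = np.array([-48, -56, -12])  # VWFA, this study
    vwfa_2004 = np.array([-44, -68, -4])   # VWFA, Cohen 2004

    # Cohen's LIMA lies 16 mm posterior (along y) to the present BTLA:
    print(lima_2004[1] - btla_here[1])             # -16

    # Cohen's 2004 VWFA lies 12 mm posterior and 8 mm superior to 'ours':
    print(vwfa_2004 - vwfa_here)                   # [  4 -12   8]

    # Euclidean separation between the two candidate VWFA peaks:
    print(np.linalg.norm(vwfa_2004 - vwfa_here))   # ~15.0 mm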

On the basis of the activation profiles, we believe the present BTLA may correspond to Cohen's LIMA. In our view, the equal activation of this region in the visual and auditory modalities, together with the body of results from neurology and functional neuroimaging, strongly suggests its involvement in semantics. As discussed earlier, the non-exclusive preference of our more posterior VWFA for the visual modality originates from two different processes in auditory and visual word presentation, although both are related in nature: during reading, the perceived words would trigger a stronger implication of the VWFA by initiating an identification and binding of orthographic units to semantic representations, while the weaker VWFA activation during audition would attest to a top-down binding between the names and the visual aspects of the objects.

Modality-specific increases associated with linguistic complexity

Having found areas specifically involved in each modality and a semantic integration network that did not depend on the modality of input, we tested whether the activity of some regions increased specifically in a given modality as a function of linguistic complexity. More precisely, we were interested in discovering whether some (possibly high-level) regions would be recruited independently by vision or audition to enable the comprehension of more complex linguistic units.

The regions exhibiting a modality by complexity interaction were very few and were situated in unimodal association cortices. In bilateral occipital areas spreading from the calcarine fissure to the lingual gyri, as well as in the lateral geniculate nuclei, a gradual increase was observed from word to text only for the visual modality. Since the primary visual cortex and lateral geniculate nuclei are retinotopically organized and the lingual gyrus is sensitive to the size of the visually perceived stimuli (Indefrey et al., 1997), their gradual increase in activation during visual tasks likely corresponds to the increase in screen occupation as the subject is confronted with words, sentences and texts spanning several lines. Similarly, the only regions showing an increase from word to text specifically during audition were situated in a unimodal auditory region of the right hemisphere, in Heschl's gyrus. Because increasing the linguistic complexity required the addition of function words to preserve the semantic coherence of sentences and texts, it was associated with an increase in the number of words presented in each condition (on average, each block of sentences contained 36.8 more function words than word blocks, while text blocks included 40 more function words than word blocks). While this effect adds to the screen-occupation effect in visual areas during reading, we believe that the Heschl's gyrus activation during listening is related to the greater overall number of words perceived in the text condition compared to the word condition. Such an interpretation is in agreement with previous work on the effect of stimulus presentation rate during reading (Price et al., 1996) and listening to words (Price, 1992).
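One way to make this word-count confound explicit, which was not done in the present analysis but is easy to sketch, is to enter the number of words shown in each block as a mean-centered nuisance covariate alongside the condition boxcars of the GLM. The sketch below uses an invented block ordering and invented word counts; note that the covariate is only separable from the boxcars if counts vary across blocks of the same condition.

    # Minimal sketch: per-block word count as a nuisance covariate in a
    # block-design GLM (invented ordering and counts; OLS via numpy).
    import numpy as np

    rng = np.random.default_rng(0)
    blocks = [("words", 18), ("sentences", 55), ("texts", 61),
              ("words", 22), ("sentences", 59), ("texts", 59)]
    cond_index = {"words": 0, "sentences": 1, "texts": 2}
    scans_per_block = 30
    n_scans = scans_per_block * len(blocks)

    X = np.zeros((n_scans, 4))              # 3 condition boxcars + 1 covariate
    for b, (cond, count) in enumerate(blocks):
        sl = slice(scans_per_block * b, scans_per_block * (b + 1))
        X[sl, cond_index[cond]] = 1.0       # condition boxcar
        X[sl, 3] = count                    # words shown in this block
    X[:, 3] -= X[:, 3].mean()               # mean-center the covariate

    y = rng.standard_normal(n_scans)        # synthetic voxel time course
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    print("condition betas:", beta[:3].round(3))
    print("word-count beta:", beta[3].round(3))

Condition effects estimated this way are adjusted for the number of words presented, at the cost of reduced efficiency when counts and conditions are strongly correlated, as they necessarily are in the design discussed here.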

Impact of linguistic complexity: the neural basis of supramodal integrative semantic processing of linguistic items

Our approach to linguistic complexity was somewhat different from earlier work in the field, since we were primarily interested in discovering the differences in brain networks when subjects process different linguistic units, rather than in highlighting areas engaged in the comprehension of similar units that are syntactically more complex, as is usually the case. This methodology has some drawbacks, since it introduces a variation between conditions in the total number of stimuli (see above), and because it prevents us from achieving fine-grained analyses of the nature of syntactic or semantic processing. However, this paradigm enabled us to address the cerebral bases involved in the processing of ecological units of language, and to sort out regions that were common to all levels of processing (see the common networks described above) as well as brain areas that seemed to be specific to one of these linguistic units.

Regions presenting a gradual increase from word to text

Only one region (the precentral sulcus) was found significantly active at the word level with a gradual increase in activity as subjects were reading sentences and texts. This brain area was reported in the present study as being part of a phonological network common to both modalities, and we think this gradual increase may originate from the greater number of stimuli to be analyzed or subvocally pronounced in each condition. Following the same logic, a similar effect of the number of stimuli could have been expected in regions engaged by lexico-semantic access as well, but the fact that only function words with no intrinsic meaning were added between conditions may have prevented a significant increase from occurring within this network.

Regions activated at the sentence level

Several temporal regions were deactivated during word comprehension but actually showed some activity when subjects were confronted with sentences and texts. This pattern of activation rules out a possible effect of the unequal number of stimuli and is more likely due to specific processes triggered at least at the sentence level.

We found the anterior part of the left superior temporal sulcus, close to the temporal pole, to be selectively activated for sentences and texts but not for words. Previous studies had shown its implication during the auditory comprehension of linguistic units more complex than isolated words, such as texts (Mazoyer et al., 1993; Crinion et al., 2003) and sentences (von Kriegstein et al., 2003; Kuperberg et al., 2000; Scott et al., 2000). Although they concerned isolated words, restrictive contrasts comparing associative semantic decision tasks to phonological decisions yielded an activation in this region (Scott et al., 2000; Roskies et al., 2001; Binder et al., 1996). These tasks required the subjects to relate several concepts together ("can 'messenger' apply to a human?", "is this animal both found in the US and used by people?") and therefore point to a role of this part of the brain in establishing the local coherence of the conceptual units present within the sentence. A confirmation of this hypothesis can be seen in the work of Partiot (Partiot et al., 1996), who showed a greater involvement of the anterior part of the STS when subjects had to judge whether visually presented sentences referred to actions belonging to a given script (e.g. does "go in the water" belong to the script "speaking/talking"?). This role is close to the purported function of the more anterior temporal pole during language processing as proposed by several authors (Vigneau et al., 2006; Mazoyer et al., 1993; Vandenberghe et al., 2002), and we think the present activation may be linked to the temporal pole. As a matter of fact, a loss of signal prevented us from detecting any BOLD changes more anteriorly along the left sulcus, probably because of susceptibility artifacts inherent to the fMRI technique.3

3 Both Mazoyer and Crinion, who reported an activation in the temporal pole, used PET rather than fMRI.

The only right hemisphere region showing specificity to sentences and texts compared to words was situated in the right temporal pole. This region was reported by the same studies that reported an implication of the left anterior STS (Mazoyer et al., 1993; Crinion et al., 2003). Considering the results obtained by studies focusing on the processing of the human voice, we believe this region could in fact be related to prosody, be it directly perceived or reconstructed by the subject while reading. Von Kriegstein and Imaizumi indeed showed a greater involvement of a very similar region when subjects had to focus on the voice of the speaker rather than on the semantic content of the sentence being pronounced (von Kriegstein et al., 2003; Imaizumi et al., 1997).

The last sentence- and text-specific activation we detected was situated in the posterior part of the ventral middle temporal gyrus. This region was initially shown by Martin to be specifically recruited when subjects had to generate an action related to a visually presented object, compared to when they had to generate a color (Martin et al., 1995). The authors concluded that this region 'may be a critical site for stored knowledge about the visual patterns of motion associated with the use of objects'. Phillips recently found similar results by demonstrating that this region was activated during the retrieval of actions associated with objects, while no activation could be observed with similar objects during judgments pertaining to their size (Phillips et al., 2002). This ventral part of the middle temporal gyrus, situated 2 cm anterior to the motion-sensitive visual area, would therefore not be in charge of motion processing per se, but rather involved in associating possible human actions with the perceived objects they can be applied to. This interpretation is in agreement with a cross-paradigm study involving 60 subjects that showed a preferential involvement of this area during naming or semantic decisions involving man-made objects (particularly tools) rather than living things (Devlin et al., 2002). Devlin concluded that the posterior ventral part of the left middle temporal gyrus is part of a network responsible for the identification of graspable objects and is closely related to the knowledge of the movements associated with the perceived object. In our study, sentences and texts described everyday actions in a script-like fashion, narrated in the first person, which readily engages visual imagery of the scene. We therefore believe that the implication of this region for sentences and texts reflects the subjects' mental representation of the actions described by the verbal material. This activation is thus likely to be related to the semantic content of our material, rather than to specific processes involved by more complex linguistic units.

Regions activated at the text level

The only text-specific region of this study was situated in the posterior part of the STS, a region previously reported by some authors during text comprehension (Crinion et al., 2003; Vingerhoets et al., 2003). This region corresponded to the STSp described in Vigneau's meta-analysis, and is thought to be involved in the processing of context, particularly when sentences are linked together (Vigneau et al., 2006). Specifically, we believe this region may be responsible for establishing the sequential coherence of the representation constructed from the text. In their studies using scripts, both Partiot and Crozier reported a very similar activation when subjects had to judge the temporal ordering of actions within a script (Partiot et al., 1996; Crozier et al., 1999). Such an activation in the present study during the comprehension of texts, and not sentences, could corroborate this role, with subjects building an ordered representation of all the actions described within each script.

Conclusion

The results of this study are in agreement with theories of brain organization postulating the dedication of unimodal brain regions to the processing of low-level information, while modality-independent regions would take charge of more abstract levels (Mesulam, 1998). Regions preferentially involved in one modality compared to the other were indeed quite restricted to unimodal associative cortex, even when subjects had to process complex linguistic units such as sentences or texts. We demonstrate the existence of a large common brain network, constituted of frontal and temporal regions, necessary to extract the meaning of words and to process their phonological form. Except for the so-called VWFA, which was more active during reading than listening, this network common to both modalities was engaged in equal proportions during the comprehension of spoken and written language. Among the modality-independent regions, the use of gradually more complex linguistic units revealed several regions specifically involved for sentences, although it does not lend support to the existence of brain areas merely involved in syntax. Indeed, the sentence-specific regions are discussed in terms of prosody, of the post-lexical semantics needed to establish an interpretative context, and of the imagination of the human movements described by the content of our sentences. The role of the posterior part of the left superior temporal sulcus, solely involved during text comprehension, is thought to be related to the temporal ordering of events, although further investigations will be needed to confirm this hypothesis.

References

Adams, R.B., Janata, P., 2002. A comparison of neural circuits underlying auditory and visual object categorization. NeuroImage 16.2, 361–377.

Belin, P., et al., 2000. Voice-selective areas in human auditory cortex. Nature 403.6767, 309–312.

Belin, P., Zatorre, R.J., Ahad, P., 2002. Human temporal-lobe response to vocal sounds. Brain Res. Cogn. Brain Res. 13.1, 17–26.

Binder, J.R., et al., 1996. Function of the left planum temporale in auditory and linguistic processing. Brain 119, 1239–1247.

Bookheimer, S.Y., Zeffiro, T.A., Blaxton, T.A., Gaillard, W.D., Malow, B., Theodore, W.H., 1998. Regional cerebral blood flow during auditory responsive naming: evidence for cross-modality neural activation. NeuroReport 9.10, 2409–2413.

Bookheimer, S.Y., et al., 2000. Activation of language cortex with automatic speech tasks. Neurology 55.8, 1151–1157.

Booth, J.R., et al., 2002a. Functional anatomy of intra- and cross-modal lexical tasks. NeuroImage 16.1, 7–22.

Booth, J.R., et al., 2002b. Modality independence of word comprehension. Hum. Brain Mapp. 16.4, 251–261.

Bright, P., Moss, H., Tyler, L.K., 2004. Unitary vs. multiple semantics: PET studies of word and picture processing. Brain Lang. 89.3, 417–432.

Buchel, C., Price, C., Friston, K., 1998. A multimodal language region in the ventral visual pathway. Nature 394.6690, 274–277.

Buckner, R.L., et al., 2000. Functional MRI evidence for a role of frontal and inferior temporal cortex in amodal components of priming. Brain 123 (Pt. 3), 620–640.

Cappa, S.F., et al., 1998. The effects of semantic category and knowledge type on lexical-semantic access: a PET study. NeuroImage 8.4, 350–359.

Carpentier, A., Pugh, K.R., Westerveld, M., Studholme, C., Skrinjar, O., Thompson, J.L., Spencer, D.D., Constable, R.T., 2001. Functional MRI of language processing: dependence on input modality and temporal lobe epilepsy. Epilepsia 42.10, 1241–1254.

Chee, M.W., et al., 1999. Processing of visually presented sentences in Mandarin and English studied with fMRI. Neuron 23.1, 127–137.

Cohen, L., et al., 2000. The visual word form area: spatial and temporal characterization of an initial stage of reading in normal subjects and posterior split-brain patients. Brain 123 (Pt. 2), 291–307.

Cohen, L., et al., 2004. Distinct unimodal and multimodal regions for word processing in the left temporal cortex. NeuroImage 23.4, 1256–1270.

Content, A., Mousty, P., Radeau, M., 1990. Brulex, une base de données lexicales informatisée pour le français écrit et parlé. Année Psychol. 90, 551–566.

Crinion, J.T., et al., 2003. Temporal lobe regions engaged during normal speech comprehension. Brain 126 (Pt. 5), 1193–1201.

Crozier, S., et al., 1999. Distinct prefrontal activations in processing sequence at the sentence and script level: an fMRI study. Neuropsychologia 37.13, 1469–1476.

Dehaene, S., et al., 2002. The visual word form area: a prelexical representation of visual words in the fusiform gyrus. NeuroReport 13.3, 321–325.

Devlin, J.T., et al., 2002. Anatomic constraints on cognitive theories of category specificity. NeuroImage 15.3, 675–685.

Démonet, J.F., et al., 1992. The anatomy of phonological and semantic processing in normal subjects. Brain 115, 1753–1768.

Démonet, J.F., et al., 1994. Differential activation of right and left posterior sylvian regions by semantic and phonological tasks: a positron-emission tomography study in normal human subjects. Neurosci. Lett. 182, 25–28.

Fiebach, C.J., et al., 2002. fMRI evidence for dual routes to the mental lexicon in visual word recognition. J. Cogn. Neurosci. 14.1, 11–23.

Fiez, J.A., et al., 1999. Effects of lexicality, frequency, and spelling-to-sound consistency on the functional anatomy of reading. Neuron 24.1, 205–218.

Friston, K.J., et al., 1995. Analysis of fMRI time-series revisited. NeuroImage 2.1, 45–53.

Geschwind, N., 1970. The organization of language and the brain. Science 170, 940–944.

Hagoort, P., et al., 1999. The neural circuitry involved in the reading of German words and pseudowords: a PET study. J. Cogn. Neurosci. 11.4, 383–398.

Imaizumi, S., et al., 1997. Vocal identification of speaker and emotion activates different brain regions. NeuroReport 8, 2809–2812.

Indefrey, P., et al., 1997. Equivalent responses to lexical and nonlexical visual stimuli in occipital cortex: a functional magnetic resonance imaging study. NeuroImage 5.1, 78–81.

Jobard, G., Crivello, F., Tzourio-Mazoyer, N., 2003. Evaluation of the dual route theory of reading: a metanalysis of 35 neuroimaging studies. NeuroImage 20.2, 693–712.

Krauss, G.L., et al., 1996. Cognitive effects of resecting basal temporal language areas. Epilepsia 37.5, 476–483.


Kuperberg, G.R., et al., 2000. Common and distinct neural substrates for pragmatic, semantic, and syntactic processing of spoken sentences: an fMRI study. J. Cogn. Neurosci. 12.2, 321–341.

Liberman, A.M., Whalen, D.H., 2000. On the relation of speech to language. TICS 4.5, 187–196.

Luders, H., et al., 1991. Basal temporal language area. Brain 114 (Pt. 2), 743–754.

Martin, A., et al., 1995. Discrete cortical regions associated with knowledge of color and knowledge of action. Science 270, 102–105.

Mazoyer, B., et al., 1993. The cortical representation of speech. J. Cogn. Neurosci. 5, 467–479.

Mesulam, M.M., 1998. From sensation to cognition. Brain 121, 1013–1052.

Michael, E.B., et al., 2001. fMRI investigation of sentence comprehension by eye and by ear: modality fingerprints on cognitive processes. Hum. Brain Mapp. 13.4, 239–252.

Oldfield, R.C., 1971. The assessment and analysis of handedness: the Edinburgh inventory. Neuropsychologia 9, 97–113.

Partiot, A., et al., 1996. Brain activation during script event processing. NeuroReport 7.3, 761–766.

Paulesu, E., et al., 2000. A cultural effect on brain function. Nat. Neurosci. 3.1, 91–96.

Petersen, S.E., et al., 1998. The effects of practice on the functional anatomy of task performance. Proc. Natl. Acad. Sci. 95.3, 853–860.

Phillips, J.A., et al., 2002. Can segregation within the semantic system account for category-specific deficits? Brain 125 (Pt. 9), 2067–2080.

Poeppel, D., et al., 2004. Auditory lexical decision, categorical perception, and FM direction discrimination differentially engage left and right auditory cortex. Neuropsychologia 42.2, 183–200.


Poldrack, R.A., et al., 1999. Functional specialization for semantic and phonological processing in the left inferior prefrontal cortex. NeuroImage 10.1, 15–35.

Price, C.J., Devlin, J.T., 2003. The myth of the visual word form area. NeuroImage 19, 473–481.

Price, C.J., Devlin, J.T., 2004. The pro and cons of labelling a left occipitotemporal region: "the visual word form area". NeuroImage 22.1, 477–479.

Price, C., 1992. Regional response differences within the human auditory cortex when listening to words. Neurosci. Lett. 146, 179–182.

Price, C.J., et al., 1996. Hearing and saying: the functional neuroanatomy of auditory word processing. Brain 119, 919–931.

Price, C.J., et al., 1997. Segregating semantic from phonological processes during reading. J. Cogn. Neurosci. 9.6, 727–733.

Price, C.J., et al., 2003. Cortical localisation of the visual and auditory word form areas: a reconsideration of the evidence. Brain Lang. 86.2, 272–286.

Price, C.J., Thierry, G., Griffiths, T., 2005. Speech-specific auditory processing: where is it? Trends Cogn. Sci. 9.6, 271–276.

Pugh, K.R., et al., 1996. Cerebral organization of component processes in reading. Brain 119 (Pt. 4), 1221–1238.

Roskies, A.L., et al., 2001. Task-dependent modulation of regions in the left inferior frontal cortex during semantic processing. J. Cogn. Neurosci. 13.6, 829–843.

Scott, S.K., et al., 2000. Identification of a pathway for intelligible speech in the left temporal lobe. Brain 123 (Pt. 12), 2400–2406.

Thompson-Schill, S.L., et al., 1999. A neural basis for category and modality specificity of semantic knowledge. Neuropsychologia 37, 671–676.

Vandenberghe, R., Nobre, A.C., Price, C.J., 2002. The response of left temporal cortex to sentences. J. Cogn. Neurosci. 14.4, 550–560.

Vigneau, M., et al., 2005. Word and non-word reading: what role for the visual word form area? NeuroImage 27.3, 694–705.

Vigneau, M., Beaucousin, V., Hervé, P.Y., Duffau, H., Crivello, F., Houdé, O., Mazoyer, B., Tzourio-Mazoyer, N., 2006. Meta-analyzing left hemisphere language areas: phonology, semantics, and sentence processing. NeuroImage 30, 1414–1432.

Vingerhoets, G., et al., 2003. Multilingualism: an fMRI study. NeuroImage 20.4, 2181–2196.

von Kriegstein, K., et al., 2003. Modulation of neural responses to speech by directing attention to voices or verbal content. Brain Res. Cogn. Brain Res. 17.1, 48–55.

Warburton, E.A., et al., 1996. Noun and verb retrieval by normal subjects. Studies with PET. Brain 119, 159–179.

Wernicke, C., 1874. Der aphasische Symptomencomplex.

Wildgruber, D., Ackermann, H., Grodd, W., 2001. Differential contributions of motor cortex, basal ganglia, and cerebellum to speech motor control: effects of syllable repetition rate evaluated by fMRI. NeuroImage 13.1, 101–109.

Wilson, S.M., et al., 2004. Listening to speech activates motor areas involved in speech production. Nat. Neurosci. 7.7, 701–702.