
Exp Brain Res (2002) 147:332–343
DOI 10.1007/s00221-002-1262-y

RESEARCH ARTICLE

Francesca Frassinetti · Nadia Bolognini · Elisabetta Làdavas

Enhancement of visual perception by crossmodal visuo-auditory interaction

Received: 6 March 2002 / Accepted: 21 August 2002 / Published online: 10 October 2002
© Springer-Verlag 2002

Abstract Neurophysiological studies have shown in animals that a sudden sound enhances perceptual processing of subsequent visual stimuli. In the present study, we explored the possibility that such enhancement also exists in humans and can be explained through crossmodal integration effects, whereby the interaction occurs at the level of bimodal neurons. Subjects were required to detect visual stimuli in a unimodal visual condition or in crossmodal audio-visual conditions. The spatial and the temporal proximity of the multisensory stimuli were systematically varied. An enhancement of perceptual sensitivity (d') for luminance detection was found when the audiovisual stimuli followed the rather clear spatial and temporal rules governing multisensory integration at the neuronal level.

Keywords Crossmodal integration · Visual-acoustic interaction · Perceptual sensitivity

Introduction

In several brain structures, including the superior colliculus (Stein and Meredith 1993) and regions of the cortex (Wallace et al. 1992), there are neurons that respond to stimuli in more than one modality. They typically have spatially corresponding receptive fields and can show overadditive responses to multimodal stimulation at the same location, compared with responses to unimodal stimulation (Wallace et al. 1992; Stein and Meredith 1993). Their activity shows a response gradient: visual and acoustic stimuli occurring in close spatial and temporal proximity produce neuronal response enhancement, whereas spatially and temporally disparate stimuli produce either depression or no change in neuronal response (the so-called spatial and temporal rules) (Stein and Meredith 1993). Furthermore, multisensory neurons show proportionately greater response enhancement when the unimodal stimuli are less effective (e.g. lower intensity) (Stein and Meredith 1993; Stein et al. 1996). The capacity of a stimulus from one sensory modality (auditory) to affect the salience of a cue from another modality (visual) is influenced by the efficacy of the stimuli used: when unimodal stimuli evoke no response, their combination becomes effective and produces a surprisingly vigorous response (the so-called inverse effectiveness rule) (Meredith and Stein 1986).

In order to investigate whether the integrative effects observed in animals could also be found in humans, an experiment was conducted in which signal detection measures were used and the visual stimuli were presented below threshold. Signal detection measures allow the perceptual level and the decision level to be separated, with the d' parameter reflecting the subject's accuracy in discerning a sensory event from its background (perceptual level), and the b parameter reflecting the subject's decision criterion of response (decision level). Moreover, because the multisensory response occurs with weak unimodal stimuli, i.e. stimuli unable to elicit a neural response when presented alone (Stein and Meredith 1993), the visual target was degraded by using a visual mask.
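As a rough illustration (our own sketch, not the authors' analysis code), d' and b can be computed from raw trial counts under the standard equal-variance Gaussian signal detection model; the function name is hypothetical:

```python
import math
from statistics import NormalDist


def sdt_measures(hits, misses, false_alarms, correct_rejections):
    """Return perceptual sensitivity (d') and likelihood-ratio
    criterion (b), assuming the equal-variance Gaussian model."""
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    hit_rate = hits / (hits + misses)
    fa_rate = false_alarms / (false_alarms + correct_rejections)
    z_h, z_f = z(hit_rate), z(fa_rate)
    d_prime = z_h - z_f  # separation of signal and noise distributions
    b = math.exp((z_f ** 2 - z_h ** 2) / 2)  # likelihood ratio at the criterion
    return d_prime, b
```

With equal hit and correct-rejection rates (e.g. 69 hits / 31 misses against 31 false alarms / 69 correct rejections), the criterion is unbiased (b = 1) while d' is close to 1, illustrating how the two parameters dissociate perceptual from decisional effects.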

Subjects were asked to fixate a central point and to detect masked visual targets displayed horizontally at 8°, 24°, 40° and 56° of eccentricity, in the left visual field (LVF) as well as in the right visual field (RVF). The task was performed either in a unimodal condition (i.e. only visual stimuli were presented) or in crossmodal conditions (i.e. a sound was presented together with the visual target). In the crossmodal conditions, sounds could be presented either at the same spatial position as the visual target (spatially coincident crossmodal condition), or at one of the remaining seven spatial positions (spatially disparate crossmodal conditions) (see Fig. 1).

F. Frassinetti (✉) · N. Bolognini · E. Làdavas
Dipartimento di Psicologia, Università degli Studi di Bologna, Viale Berti Pichat 5, 40127 Bologna, Italy
e-mail: [email protected]
Tel.: +39-051-2091847
Fax: +39-051-2091844

F. Frassinetti · N. Bolognini · E. Làdavas
Centro di Neuroscienze Cognitive, Cesena, Italy

Moreover, in order to determine the effect of temporal proximity on auditory-visual interaction, the acoustic and the visual stimuli were presented simultaneously (experiment 1a) or the acoustic stimulus preceded the visual stimulus by 500 ms (experiment 1b).

According to the crossmodal integration hypothesis, the d' parameter should vary with the degree of spatial and temporal correspondence between the acoustic and the visual stimuli: if crossmodal integration facilitates visual perceptual processes, an increased perceptual sensitivity (d') is expected when temporally overlapping visual and acoustic stimuli are presented in the same spatial position. Conversely, if crossmodal integration affects post-perceptual decision processes, then a reduction of the decision criterion parameter, b, is expected; this is because subjects might require less visual information to decide whether a target has been presented when a simultaneous sound is presented at the same location.

Experiment 1

Materials and method

Subjects

Participants were 13 right-handed volunteers (aged 22–27 years). They were students of the University of Bologna and received course credits for their participation. All had normal or corrected vision and normal hearing. All were unaware of the purpose of the experiment.

Apparatus and stimuli

The apparatus consisted of eight piezoelectric loudspeakers (0.4 W, 8 Ω), arranged horizontally at the subject's ear level. The loudspeakers, located at eccentricities of 8°, 24°, 40° and 56° to the left and right of the central fixation point, were mounted on a horizontal plastic arc (height 30 cm, length 200 cm) and were covered by a strip of black fabric. Eight light displays, placed directly in front of each loudspeaker, were visible poking out from the black fabric. Each light display contained four red light-emitting diodes (LEDs) arranged in a 1° square and one green LED positioned at the center of the square. All light/speaker arrays were equidistant from the subject's head. We refer to the auditory positions by labels A1 to A8 moving from left to right, and similarly we refer to the corresponding visual target positions by labels V1 to V8.

The acoustic stimulus was a white noise burst (80 dB) and the visual target was a flash of the green LED (luminance 90 cd/m2). The visual mask was a simultaneous flash of all four red LEDs (luminance 80 cd/m2 each). Target and mask never appeared at different locations. The duration of the visual target was gradually reduced from 100 to 60 ms prior to data collection and before each block of trials, in order to maintain a 60–70% hit rate. Immediately following the visual target, a visual mask was presented. The duration of the visual mask (from 10 to 50 ms) increased as the duration of the visual target decreased, so that the visual stimulation (visual target + visual mask) always had the same duration (110 ms). The acoustic stimulus had the same duration as the visual target (from 100 to 60 ms).

Procedure

Participants sat approximately 70 cm in front of the apparatus and were instructed to fix their gaze on a small white triangle (1°) pasted at the center of the apparatus.

Four different combinations of visual and acoustic stimuli could be presented: 1) a single visual stimulus, that is a visual target followed by a visual mask (unimodal visual condition); 2) a visual mask presented alone, without any preceding visual target (catch trials for the unimodal visual condition); 3) a visual stimulus (a visual target followed by a visual mask) and an auditory stimulus presented together (crossmodal condition); 4) a single auditory stimulus without any visual target, followed by a visual mask (catch trials for the crossmodal condition). In the crossmodal condition, for each visual position, the auditory stimulus could be presented either in the same position as the visual target (spatially coincident crossmodal condition) or in one of the remaining seven positions (spatially disparate crossmodal conditions). In the same way, in the catch trials for the crossmodal condition, the auditory stimulus could be presented either in the same position as the visual mask (catch trials for the spatially coincident crossmodal condition) or in one of the remaining seven positions (catch trials for the spatially disparate crossmodal condition).

Fig. 1 Bird's eye schematic view of the position of light displays and loudspeakers

The acoustic and the visual stimulus could be presented simultaneously (experiment 1a), or the acoustic stimulus preceded the visual stimulus by 500 ms (experiment 1b).

In each experiment there were, for each spatial position, the following trials: 12 unimodal visual trials, 12 visual catch trials, 12 spatially coincident crossmodal trials, 42 spatially disparate crossmodal trials (six trials for each of the other seven positions), 12 spatially coincident crossmodal catch trials and 42 spatially disparate crossmodal catch trials (six trials for each of the other seven positions). As a consequence, on half of the trials the target was presented in the display, and on the other half of the trials the target was absent. In each experiment, the total number of trials (1056, i.e. 132 for each of the eight positions) was equally distributed over 24 experimental blocks (44 trials each), for a total of 48 experimental blocks given in a random order and run in two sessions. Each session lasted approximately 1 h; the sessions were run on consecutive days.
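The trial arithmetic above can be checked directly (a throwaway sketch; the variable names are ours):

```python
# Trial counts per spatial position in experiment 1, as stated in the text.
per_position = {
    "unimodal visual": 12,
    "visual catch": 12,
    "coincident crossmodal": 12,
    "disparate crossmodal": 6 * 7,  # six trials for each of the other seven positions
    "coincident crossmodal catch": 12,
    "disparate crossmodal catch": 6 * 7,
}

trials_per_position = sum(per_position.values())  # 132
total_trials = trials_per_position * 8            # eight positions -> 1056
blocks = 24
print(trials_per_position, total_trials, total_trials // blocks)  # 132 1056 44
```

Target-present trials (12 + 12 + 42 = 66 per position) indeed match target-absent trials (66 per position), giving the stated 50/50 split.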

Subjects were instructed to detect the presence of a visual target (a green flash of light) by pressing a button with the index finger of the right hand, and to refrain from pressing when the target was absent. Each subject participated in experiments 1a and 1b.

Results

To assess the effects of the auditory cue on visual detection performance, d' values and b values were analyzed separately for each LED position (V1, V2, V3, V4, V5, V6, V7 and V8), using separate one-way ANOVAs. Each ANOVA was carried out with Condition as the main factor: one unimodal visual condition (e.g. V1) and eight crossmodal conditions, in which the acoustic stimulus was presented either in the same visual position (e.g. V1-A1) or in one of the other seven positions (e.g. V1-A2, V1-A3, V1-A4, V1-A5, V1-A6, V1-A7 and V1-A8). Whenever necessary, pairwise comparisons were conducted using the Newman-Keuls test. Only comparisons that turned out to be significant are reported.
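The reported F(8,96) values are consistent with a one-way repeated-measures ANOVA over nine conditions and 13 subjects (9 − 1 = 8 condition df; (9 − 1) × (13 − 1) = 96 error df). A minimal sketch of that computation (our own code, not the authors'):

```python
def rm_anova_f(data):
    """One-way repeated-measures ANOVA F statistic.
    data[s][c] holds the score of subject s in condition c."""
    n_subj, n_cond = len(data), len(data[0])
    grand = sum(map(sum, data)) / (n_subj * n_cond)
    cond_means = [sum(row[c] for row in data) / n_subj for c in range(n_cond)]
    subj_means = [sum(row) / n_cond for row in data]
    ss_cond = n_subj * sum((m - grand) ** 2 for m in cond_means)
    ss_subj = n_cond * sum((m - grand) ** 2 for m in subj_means)
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_error = ss_total - ss_cond - ss_subj  # condition-by-subject residual
    df_cond = n_cond - 1
    df_error = (n_cond - 1) * (n_subj - 1)   # 8 and 96 for 9 conditions, 13 subjects
    f = (ss_cond / df_cond) / (ss_error / df_error)
    return f, df_cond, df_error
```

Removing the between-subjects sum of squares from the error term is what distinguishes this repeated-measures form from an independent-groups ANOVA.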

Each subject considered in the present study made atleast one false alarm and one omission error in eachcondition.

Experiment 1a

Signal detection analyses (d') The mean d' values showed a selective increment of perceptual sensitivity (d') in the spatially coincident crossmodal condition compared with the unimodal visual condition, for all the spatial positions considered in the present study: V1 [F(8,96)=2.9, P<0.007] (d' 1.7 versus 0.5, P<0.001); V2 [F(8,96)=3.6, P<0.001] (d' 2 versus 0.8, P<0.0009); V3 [F(8,96)=3.9, P<0.0005] (d' 2.3 versus 1.2, P<0.0002); V4 [F(8,96)=4.8, P<0.00006] (d' 3.1 versus 1.6, P<0.0002); V5 [F(8,96)=3.1, P<0.004] (d' 2.1 versus 1.2, P<0.02); V6 [F(8,96)=4.4, P<0.0002] (d' 2.5 versus 1.4, P<0.0004); V7 [F(8,96)=4.1, P<0.0003] (d' 2.2 versus 1, P<0.0006) and V8 [F(8,96)=3.3, P<0.002] (d' 1.9 versus 0.7, P<0.001) (see Fig. 2a–h, respectively).

Response criterion analyses (b) Analysis of the response criterion (b) data for positions V1, V2, V3, V7 and V8 showed that crossmodal stimulation did not influence the participants' willingness to respond for most of the spatial positions considered (P values always >0.05). By contrast, subjects' responses to visual stimuli placed near the fovea (V4, V5 and V6) were influenced by the presence of an acoustic stimulus, but not in a spatially specific way: V4 [F(8,96)=9.2, P<0.0000001], V5 [F(8,96)=3.3, P<0.002] and V6 [F(8,96)=6.4, P<0.000001] (see Fig. 3).

Experiment 1b

Signal detection analyses (d') The d' values did not show a selective increment of perceptual sensitivity (d') in the crossmodal compared with the unimodal visual conditions. The d' values are reported in Fig. 4.

Response criterion analyses (b) The results related to b were not very consistent across the eight spatial positions (see Fig. 5). No difference was found between unimodal and crossmodal conditions for positions V1, V4 and V7. Instead, a decrement of the b value in the spatially disparate crossmodal conditions as compared with the unimodal condition was found in V2, V3, V5, V6 and V8: V2 [F(8,96)=2.9, P<0.005], V3 [F(8,96)=6.8, P<0.0000001], V5 [F(8,96)=4.2, P<0.0002], V6 [F(8,96)=6.8, P<0.000001] and V8 [F(8,96)=2.6, P<0.01]. Moreover, in V8 a significant difference was also found between the unimodal and the spatially coincident crossmodal condition (P<0.04).

Discussion

The results of experiment 1 show that an auditory stimulus presented at one spatial location facilitates responses to a visual target at that location. By contrast, when the same visual and auditory stimuli were presented at spatially disparate loci, the detectability of the visual stimulus did not improve. Moreover, the capacity of an acoustic stimulus to enhance the detectability of a visual stimulus was evident only when the two stimuli were presented simultaneously (experiment 1a). When the acoustic stimulus preceded the visual stimulus by 500 ms, no improvement in visual detectability was found (experiment 1b). Moreover, the results from the signal detection analyses showed that a sound influenced vision at early perceptual levels more than at later, decision-related levels. The presentation of temporally overlapping visual and acoustic stimuli in the same spatial position produced an increment of the d' parameter, i.e. the ability to discern a sensory event from its background. On the contrary, analysis of the response criterion (b) data showed that the presentation of an auditory cue also influenced the participants' "willingness" to respond, in both experiments 1a and 1b, but the effect was not spatially specific.

Fig. 2 Means of perceptual sensitivity (d'), in experiment 1a, for each LED position (a–h for V1, V2, V3, V4, V5, V6, V7 and V8, respectively). Dotted bars represent unimodal visual conditions; black bars represent crossmodal visual-auditory conditions; asterisks indicate significant pairwise comparisons between unimodal and crossmodal conditions

An anonymous referee pointed out that real differences between the spatially disparate crossmodal conditions and the unimodal condition may have been missed because of the unequal sample sizes used: only six trials were run for each spatially disparate condition, compared with the 12 trials used in the spatially coincident condition. Moreover, she/he suggested reducing the number of locations in order to reduce the number of comparisons. For these reasons, another experiment was run in which the numbers of trials in the spatially coincident and spatially disparate crossmodal conditions were the same and the number of visual stimulus locations was reduced from eight to four. In this experiment, visual stimuli were located at eccentricities of 40° and 56° to the right and to the left of the central fixation point. In this way, we varied the spatial disparity between the two stimuli from 0° (spatially coincident crossmodal condition) to 16° and 32° (spatially disparate crossmodal conditions). Spatial disparities greater than 32° were not tested, because in the previous experiment (see also Frassinetti et al. 2002, in right brain damaged patients with neglect) facilitatory effects were not found when the auditory and the visual stimuli were located at a spatial disparity greater than 32°.

Fig. 3 Means of response criterion (b), in experiment 1a, for each LED position (a–h for V1, V2, V3, V4, V5, V6, V7 and V8, respectively). Dotted bars represent unimodal visual conditions; black bars represent crossmodal visual-auditory conditions; asterisks indicate significant pairwise comparisons between unimodal and crossmodal conditions

Fig. 4 Means of perceptual sensitivity (d'), in experiment 1b, for each LED position (a–h for V1, V2, V3, V4, V5, V6, V7 and V8, respectively). Dotted bars represent unimodal visual conditions; black bars represent crossmodal visual-auditory conditions

Fig. 5 Means of response criterion (b), in experiment 1b, for each LED position (a–h for V1, V2, V3, V4, V5, V6, V7 and V8, respectively). Dotted bars represent unimodal visual conditions; black bars represent crossmodal visual-auditory conditions; asterisks indicate significant pairwise comparisons between unimodal and crossmodal conditions

Experiment 2

Materials and method

Subjects

Thirteen participants, all students of the University of Bologna, received course credits for their participation. The mean age of the subjects was 22 years. They had normal or corrected vision and normal hearing, and were right-handed. The participants were naive as to the purpose of the experiment.

Apparatus and stimuli

The apparatus was the same as that used in experiment 1. The number and the locations of the loudspeakers were the same as in the previous experiment. By contrast, the number of light displays was reduced from eight to four: two light displays in the LVF and two in the RVF were removed (LVF: V3, V4; RVF: V5, V6). The four remaining light displays were located at eccentricities of 40° and 56° to the right and to the left of the central fixation point, directly in front of the corresponding loudspeakers. Note that we refer to the auditory positions by labels A1 to A8 moving from left to right, and similarly we refer to the visual target positions in the LVF by labels V1 and V2, and in the RVF by labels V7 and V8. The characteristics and the durations of the visual stimuli (visual target + visual mask) and the acoustic stimuli were the same as in the previous experiment.

Procedure

Participants sat 70 cm in front of the apparatus and fixated the central fixation point.

Four different kinds of sensory stimulation were presented: 1) a single visual stimulus, that is a visual target followed by a visual mask (unimodal visual condition); 2) a visual mask presented alone, without any preceding visual target (catch trials for the unimodal visual condition); 3) a visual stimulus (a visual target followed by a visual mask) and an auditory stimulus presented together (crossmodal condition); 4) a single auditory stimulus without any visual target, followed by a visual mask (catch trials for the crossmodal condition). In the crossmodal condition, for each visual position, the auditory stimulus could be presented either in the same position as the visual stimulus (spatially coincident crossmodal condition) or at a spatial disparity of 16° or 32° from the visual stimulus (spatially disparate crossmodal conditions). More precisely, in the spatially disparate crossmodal conditions there were ten possible crossmodal stimulations: V1: V1-A2, V1-A3; V2: V2-A1, V2-A3, V2-A4; V7: V7-A5, V7-A6, V7-A8; V8: V8-A7, V8-A6.

In the same way, in the catch trials for the crossmodal condition, the auditory stimulus could be presented either in the same position as the visual mask (catch trials for the spatially coincident crossmodal condition) or at a spatial disparity of 16° or 32° from the visual stimulus (catch trials for the spatially disparate crossmodal condition). The acoustic and the visual stimulus could be presented simultaneously (experiment 2a), or the acoustic stimulus preceded the visual stimulus by 500 ms (experiment 2b).

In each experiment there were the following trials: 72 unimodal visual trials and 72 visual catch trials (18 trials for each of the four visual positions), 24 spatially coincident crossmodal trials and 24 spatially coincident crossmodal catch trials (six trials for each of the four visual positions), and 60 spatially disparate crossmodal trials and 60 spatially disparate crossmodal catch trials (six trials for each of the ten spatially disparate crossmodal stimulations). In this way, on half of the trials the target was presented in the display, and on the other half of the trials the target was not presented. In each experiment, the total number of trials (312) was equally distributed over eight experimental blocks (39 trials each), for a total of 16 experimental blocks given in a random order and run on consecutive days.
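As with experiment 1, the design arithmetic for experiment 2 can be verified in a few lines (a throwaway check, variable names ours):

```python
# Experiment 2 trial counts, as stated in the text.
target_present = {
    "unimodal": 18 * 4,      # 72: 18 trials x 4 visual positions
    "coincident": 6 * 4,     # 24
    "disparate": 6 * 10,     # 60: 6 trials x 10 disparate stimulations
}
target_absent = {"visual catch": 72, "coincident catch": 24, "disparate catch": 60}

present = sum(target_present.values())  # 156
absent = sum(target_absent.values())    # 156
total = present + absent                # 312
print(present, absent, total, total // 8)  # 156 156 312 39
```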

Subjects participated in both experiments 2a and 2b and were instructed to detect only the presence of the visual target (green LED) and to ignore the sound. They responded by pressing a button with the index finger of the right hand.

Results

Experiment 2a

Signal detection analyses (d') The mean d' values showed a selective increment of perceptual sensitivity (d') in the spatially coincident crossmodal conditions compared with the unimodal visual conditions, for all the spatial positions considered in the present experiment: V1 [F(3,36)=11.9, P<0.0001] (d' 2 versus 1, P<0.0003); V2 [F(4,48)=6.3, P<0.0004] (d' 2 versus 1.3, P<0.01); V7 [F(4,48)=11.6, P<0.0001] (d' 2.2 versus 1.2, P<0.0009) and V8 [F(3,36)=5.9, P<0.002] (d' 1.8 versus 1, P<0.005). Moreover, an increment of perceptual sensitivity was also found at a spatial disparity of 16° from the visual stimulus, in a relatively lateral (temporal) position (V7-A8: d' 1.9, P<0.001) (see Fig. 6a–d for positions V1, V2, V7 and V8, respectively).

Response criterion analyses (b) Analysis of the response criterion (b) data showed a significant effect of condition for all spatial positions: V1 [F(3,36)=8.8, P<0.0002]; V2 [F(4,48)=51.3, P<0.0001]; V7 [F(4,48)=9.7, P<0.0001] and V8 [F(3,36)=13.4, P<0.0001]. Post-hoc comparisons showed a reduction of b values in the crossmodal compared with the unimodal condition, without any spatially specific effect (see Fig. 6e–h for positions V1, V2, V7 and V8, respectively).

Experiment 2b

Signal detection analyses (d') In experiment 2b, the ANOVA did not show a significant difference between d' values in the unimodal and crossmodal conditions for any of the spatial positions considered in this experiment. The d' values are reported in Fig. 7 (see Fig. 7a–d for positions V1, V2, V7 and V8, respectively).

Response criterion analyses (b) Analysis of the response criterion (b) data showed a significant effect of condition for all spatial positions: V1 [F(3,36)=3.1, P<0.04]; V2 [F(4,48)=34.7, P<0.0001]; V7 [F(4,48)=6.6, P<0.0003] and V8 [F(3,36)=13.2, P<0.0001]. Post-hoc comparisons showed a reduction of b values in the crossmodal compared with the unimodal condition, without any spatially specific effect (see Fig. 7e–h for positions V1, V2, V7 and V8, respectively).

Discussion

The results of experiment 2 confirm the results of experiment 1: an increment of signal detection sensitivity (d') was found in the spatially coincident crossmodal condition compared with the unimodal condition only when the audio-visual stimuli were presented simultaneously, in accordance with the spatial and temporal rules of multisensory integration. No improvement of visual detection accuracy was found in the spatially disparate crossmodal conditions, with the exception of position V7 when the acoustic stimulus was 16° lateral (temporal) to the visual stimulus (V7-A8). Again, the analysis of the response criterion (b) data showed that the presentation of an auditory cue also influenced the participants' willingness to respond, but the effect was not spatially specific.

Fig. 6 Means of perceptual sensitivity (d') (a–d) and means of response criterion (b) (e–h), in experiment 2a, for each LED position (V1, V2, V7 and V8, respectively). Dotted bars represent unimodal visual conditions; black bars represent crossmodal visual-auditory conditions; asterisks indicate significant pairwise comparisons between unimodal and crossmodal conditions

General discussion

The results of the present study show the existence of an integrated visuo-acoustic system in humans. In particular, we show that auditory stimuli may enhance the efficiency of the visual system in a difficult detection task: the perceptual sensitivity to subthreshold masked visual stimuli was indeed improved by concurrent acoustic stimuli. Moreover, perceptual sensitivity to the visual stimulus increased when the two temporally overlapping visual and acoustic stimuli were presented in the same rather than in different spatial positions.

These results are in keeping with the spatial rule described in electrophysiological recordings by Meredith and Stein (1986): multimodal neurons in the superior colliculus only show multisensory enhancement when acoustic and visual stimuli are presented within the receptive fields of such neurons. Indeed, in the superior colliculus, the visual and auditory maps are in rough topographic register with one another (e.g. see Gordon 1975; Middlebrooks and Knudsen 1984; King and Palmer 1985; Meredith and Stein 1986), and individual multisensory cells (e.g. visual-auditory) have their unimodal receptive fields overlapping in space. Consequently, spatially coincident multisensory stimuli located within the excitatory region of one receptive field will also fall within the excitatory region of the other, and the responses they evoke will enhance one another. Since visual and auditory receptive fields also often have inhibitory regions bordering them (Wise and Irvine 1983; Middlebrooks and Knudsen 1984; Meredith and Stein 1986), moving one of these stimuli (e.g. the auditory one) out of coincidence with the other and into the inhibitory region of its receptive field will depress that cell's responses to the visual stimulus.

Fig. 7 Means of perceptual sensitivity (d') (a–d) and means of response criterion (b) (e–h), in experiment 2b, for each LED position (V1, V2, V7 and V8, respectively). Dotted bars represent unimodal visual conditions; black bars represent crossmodal visual-auditory conditions; asterisks indicate significant pairwise comparisons between unimodal and crossmodal conditions

Interestingly, we found that, in experiment 2a (see position V7), visual perceptual sensitivity (d') was enhanced not only by spatially coincident auditory stimuli, but also when the two stimuli were located at a spatial disparity of 16°. This effect was found only when the auditory stimuli were presented in a lateral (temporal), rather than medial (nasal), spatial location with respect to the visual stimulus. Again, this result is similar to the electrophysiological data. In multimodal neurons, auditory receptive fields are larger than visual receptive fields (Middlebrooks and Knudsen 1984). As a consequence, an auditory stimulus can excite neurons over a larger region than a visual stimulus can. Moreover, auditory receptive fields are rarely symmetric; instead, their temporal borders extend into peripheral space more than their nasal borders do (Middlebrooks and Knudsen 1984; Stein et al. 1989). As a consequence, in the case of a relatively small spatial disparity between auditory and visual stimuli, an auditory stimulus may still enhance responses to a visual stimulus when presented relatively more temporally, but not more nasally, to it. The results from position V7 in experiment 2a are in line with such neurophysiological data.

The multimodal audio-visual integration found in the present study is likely to reflect a perceptual, rather than a post-perceptual, mechanism. This becomes evident when the results for the decision criterion parameter, b, are considered. A decrement of b in the crossmodal compared with the unimodal stimulation was found for some spatial positions in experiments 1a and 2a, but never in a spatially specific way. Since the decrement of b was not selective for the spatially coincident crossmodal conditions, it cannot be explained by the spatially specific rules of crossmodal integration. Instead, it can be interpreted by considering that an acoustic stimulus acts as an alerting signal enhancing visual responses.

The results of the present study highlighted another interesting functional characteristic of this crossmodal integration, i.e. multisensory interaction only takes place when the periods of peak activity of the unimodal discharge trains overlap. We found spatially specific effects only when stimuli in the two different modalities were presented simultaneously; at an interval of 500 ms, no response enhancement (d’) was found. These data are consistent with the temporal rule of multisensory integration (Stein and Meredith 1993): stimuli occurring in close temporal proximity affect one another, while those separated by long intervals are processed as separate events. In the latter case, an acoustic stimulus might act simply as an alerting signal (Robertson et al. 1998), as documented by a reduction of the b value in most crossmodal compared to unimodal conditions, in both experiments 1b and 2b.

The results of the present study are similar to those reported by Stein and co-workers (1989) at the behavioral level in animals. They examined the orientation behaviors of cats in response to unimodal and multimodal stimuli. In their study, the performance of animals trained to approach a low-intensity visual cue was significantly enhanced when a brief, low-intensity auditory stimulus was also presented. Stein and co-workers showed that such multimodal enhancement of orienting to visual stimuli was only present when the concurrent auditory stimulus was presented at the same location as the visual stimulus; by contrast, performance was significantly depressed when the auditory stimulus was presented at a different spatial location (Stein et al. 1989). Moreover, our results are in keeping with previous neuropsychological data from right-brain-damaged patients with left visual spatial neglect (Frassinetti et al. 2002). In neglect patients, a sound presented simultaneously and at the same spatial location as a visual stimulus improved detection of previously neglected visual targets.

By contrast, previous studies in normal subjects failed to demonstrate the validity of the rules of audiovisual integration. Stein and co-workers (1996) examined the possibility that auditory stimuli might alter the judgement of visual intensity, and do so in a manner consistent with the spatial rule of multisensory integration. They found that perceived visual intensity was enhanced regardless of whether an auditory stimulus was spatially coincident with, or displaced 45° to the right or left of, a visual stimulus. One possible explanation of Stein et al.’s results is that multisensory neurons, which only show multimodal integration for visual and auditory stimuli in close spatial register, are not involved in functions for which stimulus localization in space is not essential. The perceptual task of assessing stimulus intensity is likely to be one of these.

Visual facilitation with auditory cues was also found by other authors (e.g. Spence and Driver 1997; McDonald et al. 2000). Spence and Driver (1997) showed that an uninformative auditory cue affected elevation judgments for visual targets, that is, elevation discriminations were faster on the cued side. Since they adopted very few spatial positions for the visual stimulus (two vertically aligned positions on the left and on the right side), the precise spatial mapping of audiovisual multimodal integration could not be assessed.


Similar facilitatory effects were found by McDonald and coworkers (2000) using only one possible spatial position on the right or on the left of fixation. They provided psychophysical evidence that a sudden sound improves the detectability of a subsequent flash appearing at the same location when the delay between the cue and the target is less than 300 ms. They interpreted the finding as evidence for crossmodal consequences of involuntary/exogenous shifts in spatial attention that are probably not linked to multisensory interaction at the neural level (see also McDonald and Ward 2000; McDonald et al. 2001a). In light of the present results, we suggest that a distinction between crossmodal integration and crossmodal attention may be drawn for the case of endogenous attention (Driver 1996), but it is less straightforward for the stimulus-driven case of exogenous attention (see also Macaluso et al. 2001; but see McDonald et al. 2001b for a different opinion). In multimodal brain areas, there may be an overlap between the physiological mechanism responsible for stimulus-driven crossmodal spatial integration and that responsible for stimulus-driven (exogenous) crossmodal spatial attention. Further studies, however, are needed to clarify whether and how these systems may interact.

In conclusion, the present study, by considering different spatial positions for visual and auditory stimulation, provides the first demonstration of the existence of an integrated spatial visuo-auditory system in normal subjects, with functional properties similar to those described at the neuronal and behavioral levels in animals.

Acknowledgement This work was supported by grants from MURST to Elisabetta Làdavas.

References

Driver J (1996) Enhancement of selective listening by illusory mislocation of speech sounds due to lip-reading. Nature 381:66–68

Frassinetti F, Pavani F, Làdavas E (2002) Acoustic vision of neglected stimuli: interaction among spatially converging audiovisual inputs in neglect patients. J Cognit Neurosci 14:62–69

Gordon BG (1975) Superior colliculus: structure, physiology and possible functions. MTP Int Rev Sci 3:185–230

King AJ, Palmer AR (1985) Integration of visual and auditory information in bimodal neurones in the guinea-pig superior colliculus. Exp Brain Res 60:492–500

Macaluso E, Frith C, Driver J (2001) Multisensory integration and crossmodal attention effects in the human brain – response. Science 292:1791

McDonald JJ, Ward LM (2000) Involuntary listening aids seeing: evidence from human electrophysiology. Psychol Sci 11:167–171

McDonald JJ, Teder-Salejarvi WA, Hillyard SA (2000) Involuntary orienting to sound improves visual perception. Nature 407:906–908

McDonald JJ, Teder-Salejarvi WA, Heraldez D, Hillyard SA (2001a) Electrophysiological evidence for the “missing link” in crossmodal attention. Can J Exp Psychol 55:141–149

McDonald JJ, Teder-Salejarvi WA, Ward LM (2001b) Multisensory integration and crossmodal attention effects in the human brain. Science 292:1791

Meredith MA, Stein BE (1986) Spatial factors determine the activity of multisensory neurons in cat superior colliculus. Brain Res 369:350–354

Middlebrooks JC, Knudsen EI (1984) A neural code for auditory space in the cat’s superior colliculus. J Neurosci 4:2621–2634

Robertson IH, Mattingley JB, Rorden C, Driver J (1998) Phasic alerting of neglect patients overcomes their spatial deficit in visual awareness. Nature 395:169–172

Spence C, Driver J (1997) Audiovisual links in exogenous covert spatial orienting. Percept Psychophys 59:1–22

Stein BE, Meredith MA (1993) Merging of the senses. MIT Press, Cambridge, Mass.

Stein BE, Meredith MA, Huneycutt WS, McDade LW (1989) Behavioral indices of multisensory integration: orientation to visual cues is affected by auditory stimuli. J Cognit Neurosci 1:12–24

Stein BE, London N, Wilkinson LK, Price DD (1996) Enhancement of perceived visual intensity by auditory stimuli: a psychophysical analysis. J Cognit Neurosci 8:497–506

Wallace MT, Meredith MA, Stein BE (1992) Integration of multiple sensory modalities in cat cortex. Exp Brain Res 91:484–488

Wise LZ, Irvine DRF (1983) Auditory response properties of neurons in deep layers of cat superior colliculus. J Neurophysiol 49:674–685
