

Modulation of the Auditory Cortex during Speech: An MEG Study

John F. Houde*,1, Srikantan S. Nagarajan*,1, Kensuke Sekihara2, and Michael M. Merzenich1

Abstract

Several behavioral and brain imaging studies have demonstrated a significant interaction between speech perception and speech production. In this study, auditory cortical responses to speech were examined during self-production and feedback alteration. Magnetic field recordings were obtained from both hemispheres in subjects who spoke while hearing controlled acoustic versions of their speech feedback via earphones. These responses were compared to recordings made while subjects listened to a tape playback of their production. The amplitude of tape playback was adjusted to match the amplitude of self-produced speech. Recordings of evoked responses to both self-produced and tape-recorded speech were obtained free of movement-related artifacts. Responses to self-produced speech were weaker than were responses to tape-recorded speech. Responses to tones were also weaker during speech production, when compared with responses to tones recorded in the presence of speech from tape playback. However, responses evoked by gated noise stimuli did not differ for recordings made during self-produced speech versus recordings made during tape-recorded speech playback. These data suggest that during speech production, the auditory cortex (1) attenuates its sensitivity and (2) modulates its activity as a function of the expected acoustic feedback.

INTRODUCTION

Behavioral experiments have shown that different aspects of speech production are sensitive to auditory feedback. Adding noise to a speaker's auditory feedback results in an elevation of voice volume (Lane & Tranel, 1971). Delaying a speaker's auditory feedback by more than about 50 msec produces noticeable disruptions in speech (Yates, 1963; Lee, 1950). Shifting the spectrum of auditory feedback causes shifts in the spectrum of produced speech (Gracco, Ross, Kalinowski, & Stuart, 1994). Perturbing perceived pitch causes the subject to alter his pitch to compensate for those perturbations (Kawahara, 1993). Altering perceived formants induces compensating changes in the production of vowels (Houde & Jordan, 1998). Prolonged listening to one phoneme causes small but significant changes in the production of that phoneme (Cooper, 1979).

A number of physiological studies have shown that activity at different levels of the auditory system is modulated by the act of vocal production. Early evidence came from animal studies in bats, birds, and monkeys. In bats, responses in the lateral lemniscus of the midbrain are attenuated by about 15 dB during vocalization (Suga & Schlegel, 1972; Suga & Shimozawa, 1974). In birds, it is now well known that vocal production is disrupted by perturbation of the auditory feedback, and that this feedback alteration affects the responses in higher-order neurons of the avian auditory forebrain (Leonardo & Konishi, 1999; Schmidt & Konishi, 1998; McCasland & Konishi, 1981). In monkeys, activity in the auditory cortex is inhibited during either spontaneous vocalizations or vocalizations evoked by electrical stimulation of the cingulate cortex (Muller-Preuss & Ploog, 1981). Similar, albeit less consistent, results have been reported for recordings from the human temporal lobe (Creutzfeldt & Ojemann, 1989; Creutzfeldt, Ojemann, & Lettich, 1989a, 1989b). However, the exact characteristics of this attenuation have not been adequately investigated in animal studies. Furthermore, the dynamics of this attenuation are poorly understood.

Recent functional brain imaging studies in humans have investigated the activity of the auditory system during speech perception. The analysis of phonological and semantic characteristics of externally generated speech is found to engage the left inferior frontal, left temporal, and posterior parietal cortices (Demonet et al., 1992; Zatorre, Evans, Meyer, & Gjedde, 1992), while processing of prosodic qualities engages analogous regions in the right hemisphere (Zatorre et al., 1992). It has been hypothesized that similar areas also

1University of California, San Francisco; 2Tokyo Metropolitan Institute of Technology. *These authors contributed equally to this paper.

© 2002 Massachusetts Institute of Technology. Journal of Cognitive Neuroscience, 14:8, pp. 1125–1138.

participate in monitoring these aspects of speech that are self-generated (Hickok & Poeppel, 2000; Levelt, 1983, 1989).

PET studies have shown that the response of the primary auditory cortex (A1) to self-produced speech is minimal (Hirano et al., 1996; Hirano, Naito, et al., 1997). However, this A1 response activity increased if the subjects heard an altered version of their normal speech (Hirano, Kojima, et al., 1997). In other PET studies, varying the rate of speech output while the speech-contingent auditory input was masked by constant white noise produced significant modulation of the response from the secondary auditory cortex of the left hemisphere (McGuire, Silbersweig, & Frith, 1996). Furthermore, these studies have reported that the medial prefrontal cortex and the left frontal insula/operculum were differentially activated during speech feedback alteration. However, these studies have not examined the dynamics of activity in the different auditory regions engaged during speech production.

More recently, magnetoencephalography (MEG) studies have shown that responses from the auditory cortex to self-produced speech are attenuated on the millisecond time scale when compared with responses to tape-recorded speech (Curio, Neuloh, Numminen, Jousmaki, & Hari, 2000; Numminen & Curio, 1999; Numminen, Salmelin, & Hari, 1999). These initial MEG studies also reported a lack of a "change" response to self-produced stimuli. The study by Curio et al. (2000) examined 100-msec poststimulus (M100) response differences between speaking and tape playback conditions. The authors found significant differences in M100 amplitude across listening conditions in the left hemisphere and significant differences in M100 latencies across listening conditions in both hemispheres. Numminen et al. have investigated this phenomenon in two studies. In one study, they reported that M100 responses to 1-kHz tones were slightly delayed and significantly inhibited during overt speech as compared with silent speech (Numminen et al., 1999). In a related study, they also reported that M100 responses to short recorded vowel sounds, heard while subjects also heard either self-produced or tape-recorded vocalizations, were delayed and dampened relative to background-free presentations (Numminen & Curio, 1999).

In this study, we sought to further examine this attenuation of responses to self-produced speech. In particular, we were interested in discerning between two possible mechanisms of the M100 response reduction. One possibility is that activity in the auditory system is generally suppressed during speech. Such "nonspecific" attenuation could be the indirect result of middle-ear muscle activity during speaking (Papanicolaou, Raz, Loring, & Eisenberg, 1986). Alternatively, the M100 response reduction during speaking could result from a comparison between actual and predicted auditory feedback, that is, an auditory version of Held's "reafference hypothesis" (Hein & Held, 1962). Motor system activity during speaking could generate an internal representation of the expected auditory feedback, and a match between expected and actual feedback could reduce the M100 response.

To test between the efference copy and nonspecific attenuation hypotheses, we performed a series of experiments in which MEG was used to examine M100 responses to speech in a variety of conditions. In these experiments, magnetic field recordings were obtained from the auditory cortices in both hemispheres of subjects who spoke while hearing controlled versions of their speech feedback via earphones. These responses were compared to MEG recordings made while subjects listened to a tape recording of their production.

Three experiments were conducted. First, we repeated the experiments of Numminen et al. and examined the dynamics of the attenuation of responses to self-produced speech. Second, we tested the specificity of the recorded signal attenuation by examining M100 responses to tone pulses added to the speech heard by the subject. Third, we tested the "reafference hypothesis" by examining M100 responses to speech feedback altered to "mismatch" the expected speech feedback.

RESULTS

Experiment 1

Figure 1 shows the apparatus used in the experiments. Magnetic fields were recorded from both hemispheres in a shielded room using two 37-channel biomagnetometers. A directional microphone was placed in the chamber at a distance where it did not distort the

Figure 1. Apparatus used in the experiments.


recorded magnetic fields. Acoustic input was delivered to the subject via air-tube-based earphones. Triggering of the data acquisition system and of stimuli delivered to the ears was controlled and differed for each of the three experiments in the study.

The schematic for our first experiment is shown in Figure 2. This experiment consisted of two successive conditions: speaking and tape playback. In the speaking condition (Figure 2a), subjects were instructed to produce the short vowel sound by phonating with their jaw and tongue relaxed and stationary. The subject's speech was picked up by a microphone and fed to a tape recorder, his earphones, and the trigger input for MEG data acquisition. A total of 100 utterances were recorded on tape. In the subsequent tape playback condition (Figure 2b), subjects were instructed to remain silent while they heard a tape recording of their utterances from the speaking condition.

Results for one subject in the speaking condition are shown in Figure 3, which shows the evoked magnetic field response recorded at each detector position, averaged over the 100 utterance trials. In the figure, the traces show the average response at each detector position, aligned to the onset of vocalization (gray tick marks). For each condition (speaking, tape playback), the RMS responses at each detector were averaged together for the left and right hemispheres, as shown in Figure 4. The figure shows RMS responses averaged over detectors in the left (upper panel) and right (lower panel) hemisphere detector arrays. In each panel, the thick trace shows the average RMS response in the speaking condition and the thin trace shows the same for the tape playback condition. The vertical axis of each panel indicates the RMS field strength (fT) of the response, while the horizontal axis indicates time (msec) relative to the onset of the vocalization, which occurs at 0 msec. Many differences between responses in the speaking and

Figure 3. RMS responses across detector arrays recorded in the speaking condition of Experiment 1 from a single representative subject. The traces show the average response at each detector position, aligned to the onset of vocalization (gray tick marks). (Gaps in the detector arrays are due to offline detectors.)

Figure 2. Setup for the two conditions of Experiment 1.


tape playback conditions can be seen in the figure. At about 100 msec poststimulus onset, most of the traces show a peak in RMS amplitude. This peak is referred to as the M100 response, and there are large differences in its amplitude across conditions in both hemispheres for this subject. In the left hemisphere, the M100 response in the speaking condition is greatly suppressed compared to that seen in the tape playback condition. In the right hemisphere, this suppression is even greater: in the tape playback condition, there is quite a large M100 response, but in the speaking condition, the M100 is essentially absent.

Figure 5 shows the same response waveforms of the previous figure, averaged across all eight subjects. In this figure, in addition to the average response waveforms, vertical bars indicating standard errors are also shown. The asterisks in each panel indicate latencies at which the speaking and tape playback responses differed significantly (p < .001). These plots show that in both hemispheres, mean M100 responses were significantly smaller in the speaking condition than in the tape playback condition. It appears that left hemisphere responses are larger than right hemisphere responses in the tape playback condition, but that both hemispheres are inhibited to about the same level in the speaking condition. However, even in the speaking condition, the left hemisphere M100 response appears more focal in time than that of the right hemisphere. Less consistent are the longer latency responses. For example, the amplitude of the M200 response in the left hemisphere was significantly larger in the tape playback condition, while in the right hemisphere it was significantly larger in the speaking condition.

The acoustic signals reaching the ears via air conduction, the so-called "side-tone" sound amplitudes, were adjusted to be the same for both the speaking and tape playback conditions; all our subjects reported that the intensities of the mic and tape signals were perceptually identical. However, in the speaking condition, because the subject hears both the side tone and bone conduction of his voice, the speech heard by a subject is actually about twice the amplitude of (i.e., 3 dB louder than) the speech heard in the tape playback condition (von Bekesy, 1949). Thus, a priori, we would expect this amplitude difference to be a possible confound in the experiment. The size of the evoked magnetic field response in the auditory cortex, and in particular the amplitude of the M100 response, has been shown to be a monotonically increasing function of audio input level (Stufflebeam, Poeppel, Rowley, & Roberts, 1998). From this, we would expect the M100 response to be larger in the voiced condition than in the tape playback condition. However, this response difference should be small: at the audio levels used in this experiment (80 dBA at the earphones), the effect of audio amplitude on the M100 response has nearly saturated. Additionally, as we have seen, the actual M100 response recorded in the speaking condition was much less than that recorded in the tape playback condition, a result opposite of what would be predicted based on audio amplitude differences.
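As a hedged arithmetic aside (ours, not the authors'): if bone conduction contributes roughly as much acoustic energy as the air-conducted side tone, the power at the ear doubles, and a doubling of power corresponds to

$$\Delta L = 10\log_{10}\frac{P_{\text{speaking}}}{P_{\text{tape}}} = 10\log_{10} 2 \approx 3\ \text{dB}.$$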

Source localizations were performed on the spatio-temporal magnetic field responses recorded 50–200 msec following stimulus onset. Across all our stimulus conditions (mic vs. tape vs. tone response), we found no statistically significant difference (p > .5) in the location of an equivalent dipole in each hemisphere that accounts

Figure 5. RMS response waveforms in Experiment 1, averaged across all subjects. The vertical bars in each trace show standard errors, while the asterisks show intervals where the response waveforms from the microphone condition (thick traces) and tape condition (thin traces) differ significantly (p < .001) for both hemispheres.

Figure 4. RMS responses recorded in Experiment 1, averaged over all 37 detectors in each hemisphere's detector array, from a single representative subject. In each plot, the thick traces are responses in the speaking condition; the thin traces are responses in the tape playback condition. The onset of the audio stimulus is at 0 msec.


for these magnetic field responses. These localization data suggest that the observed response suppression/attenuation arises from a restricted cortical region, presumably A1 and its immediate environs.

Overall, all subjects' M100 responses were localized to the temporal cortex in the area of the auditory cortex. We infer this because lesion studies, intracranial recordings, and source modeling studies provide converging evidence that the neuronal generators of the M100 response were located in the auditory regions of the temporal lobes and include A1 and its immediate environs (Ahissar et al., 2001; Picton et al., 1999; Liegeois-Chauvel, Musolino, Badier, Marquis, & Chauvel, 1994; Reite et al., 1994; Richer, Alain, Achim, Bouvier, & Saint-Hilaire, 1989; Scherg & von Cramon, 1985, 1986; Woods, Knight, & Neville, 1984; Hari, Aittoniemi, Jarvinen, Katila, & Varpula, 1980).

Experiment 2

The results of Experiment 1 clearly confirmed that the M100 response is suppressed in the speaking condition. One possible mechanism that could cause this suppression is nonspecific attenuation. That is, neural signals arising as a result of activity in the motor cortex might directly and broadly inhibit auditory cortex activity.

If auditory cortex activity is nonspecifically suppressed during speaking, then it should be less responsive to all auditory signals. In Experiment 2, we tested this possibility by examining the M100 response to tones heard during speech that was either self-produced or played back from tape.

In this experiment, we measured the responses evoked by tone pips under three successive conditions, as shown in Figure 6. In the first condition (tones alone, Figure 6a), subjects remained silent and heard 1-kHz tone pips. In the second condition (tones and speaking, Figure 6b), subjects again heard these tone pips, but this time the tones were presented while subjects produced long utterances of the vowel (again with no movement of their jaw or tongue). In the third condition (tones and tape playback, Figure 6c), subjects again remained silent and heard both the tone pips and a tape playback of the speech they produced in the previous condition. For the last two conditions, in which the subject heard tone pips and speech, the speech was attenuated by 20 dB to reduce its masking of the tone pips.

The results of Experiment 2 are summarized in Figure 7. The figure shows a plot of the mean RMS response across detectors and subjects for each condition and in each hemisphere. Dashed traces show subjects' responses to hearing the tones in silence (tones alone condition). Thick traces show responses to hearing the tones while subjects produced speech (tones and speaking condition). Thin traces show responses to hearing the tones along with the taped speech playback (tones and tape playback condition). As in Figure 5, the vertical bars are standard errors. Asterisks indicate time intervals during which responses to tones heard while producing speech (tones and speaking condition) differed significantly (p < .001) from the responses to tones heard with taped speech playback (tones and tape playback condition).

For most of the traces in Figure 7, the M100 and M200 responses to the tones across conditions stand out clearly. For both hemispheres, these responses are very prominent for tones heard in silence (dashed traces). In the left hemisphere, there was a reduction in the M100 and M200 responses to the tones in the speaking (thick traces) and tape playback (thin traces) conditions. Interestingly, the response reduction in the speaking condition appeared to be greater than the response reduction in the tape playback condition. Indeed, the statistical analysis confirmed that the M100 response in

Figure 6. Setup for the three conditions of Experiment 2.


the tones and speaking condition was significantly less than the M100 response in the tones and tape playback condition. Nevertheless, the difference was small. In the right hemisphere, there was an approximately equal reduction in the M100 and M200 responses to the tones in the speaking and tape playback conditions. Within the M100–M200 region, responses in these two conditions were significantly different at only one latency, as shown by the asterisk.

It appears that, at most, the act of speaking creates only a limited degree of nonspecific attenuation of the responses to the tones in the left hemisphere that is not seen in the tape playback condition. To quantify how much nonspecific attenuation contributes to the suppression of the response to self-produced speech, we can estimate the signal reductions needed to produce the response reductions seen in these experiments. From the results of Experiment 1, we have seen that, compared to speech from the tape playback, self-production of speech created a 30% reduction in M100 amplitude in the left hemisphere and a 15% reduction in the right hemisphere, equivalent to 13- and 7-dB decreases in effective input signal, respectively (see Note 1). In Experiment 2, in the left hemisphere, there was possibly an extra 7% reduction in the M100 response to tones while subjects spoke, as compared to the M100 response to tones during tape playback, which is equivalent to only a 3-dB reduction in effective input signal. Thus, it appears that nonspecific attenuation cannot account for all the response suppression recorded for self-produced speech.
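These dB equivalences can be roughly checked against the scaling reported in Note 1 (our gloss; the linear interpolation is only a first-order approximation, since the amplitude function saturates at high SPL). A 15-dB change in SPL producing a 34% change in M100 RMS amplitude gives

$$\frac{34\%}{15\ \text{dB}} \approx 2.3\%/\text{dB}, \qquad \frac{30\%}{2.3\%/\text{dB}} \approx 13\ \text{dB}, \qquad \frac{15\%}{2.3\%/\text{dB}} \approx 7\ \text{dB}, \qquad \frac{7\%}{2.3\%/\text{dB}} \approx 3\ \text{dB}.$$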

It has also been argued that any attenuation in the responses during vocalization could occur due to bone conduction and activation of middle-ear muscles, or due to inhibition at the level of the brainstem (Papanicolaou et al., 1986). Thus, to further test the hypothesis of nonspecific attenuation, we measured brainstem evoked responses to click trains in five subjects under three conditions identical to those in Experiment 2: silence, produced speech, and tape-recorded speech. In the latter two conditions, we attenuated the speech background by 20 dB to obtain a clear evoked response to the clicks. For each of these conditions, we recorded the brainstem evoked response and measured the latency and amplitude of Waves I–V. The results from these experiments indicated that there were no statistical differences in either the amplitudes or the latencies of any of the waves between the three background conditions (p > .2). These results provide further support for the hypothesis that our observed effects were cortical in origin.

Experiment 3

Another possible cause of the response reduction toself-produced speech is that activity in the auditory

Figure 7. RMS waveforms for responses to 1-kHz tone pips, averaged across all subjects, for the three conditions of Experiment 2: the tones alone condition (dashed traces), the tones and speaking condition (thick traces), and the tones and tape playback condition (thin traces). The vertical bars in each trace show standard errors, while the asterisks show intervals where the response waveforms from the tones and speaking condition (red) and the tones and tape playback condition (blue) differed significantly (p < .001) for both hemispheres.

Figure 8. Setup for the two conditions of Experiment 3.


cortex is the result of a comparison between the incoming signal and an internally generated prediction of what that signal would be. If this were the case, then altering the incoming signal should create a mismatch with that expectation that would reduce or abolish the response suppression.

In Experiment 3, we tested this possibility by altering the auditory feedback to the subject during self-production of speech. The setup for Experiment 3 is shown in Figure 8. Like Experiment 1, this experiment consisted of two conditions: speaking and tape playback. However, instead of hearing their speech (self-produced or from tape playback), subjects heard a sum of their speech plus the output of a white noise generator that produced noise bursts gated to the duration of the speech utterance. Subjects reported that they could not hear their own speech and that they only heard noise gated to their vowel production.

The results of Experiment 3 are shown in Figure 9. For both hemispheres, the thick traces show the average evoked response of subjects hearing gated noise in the speaking condition, while the thin traces show their responses to gated noise in the tape playback condition. Asterisks mark poststimulus latencies where the evoked responses in the two conditions differ significantly (p < .001) for both hemispheres. The figure shows that subjects' evoked responses in the two conditions are significantly different at a number of latencies. In the left hemisphere, the responses in the speaking and tape playback conditions differ significantly around the M100 responses and at a latency of 150 msec. In the right hemisphere, the speaking and tape playback responses

differ significantly at the 50- and 200-msec latencies. Since we are limiting our analysis of these results to considering only M100 responses, the only response differences of interest are the significant differences seen in the M100 region of the left hemisphere responses. It is clear from the figure, however, that these response differences arise principally from a latency difference between M100 responses: the M100 response to gated noise in the tape playback condition is about 20 msec earlier than the M100 response to gated noise in the speaking condition. The amplitudes of the two M100 responses appear nearly identical.

In sum, Figure 9 shows that the M100 amplitude suppression seen in Experiment 1 is abolished in both hemispheres: the amplitude of the M100 response to noise generated from subjects' speech via the microphone is as large as the M100 response to noise generated from the tape playback of their speech. Thus, when we altered subjects' speech feedback in Experiment 3, we found that the response suppression to self-produced speech disappeared, which is consistent with the suppression resulting from auditory input matching an internally expected input.

DISCUSSION

The present study demonstrates that the human auditory cortex responds differently to self-produced speech than to externally produced speech. In Experiment 1, significant suppression in the amplitude of M100 responses was recorded in the speaking condition in both hemispheres, confirming that self-production of speech suppresses the response in the auditory cortex to that speech. In Experiment 2, M100 responses to tone pips were seen to be modestly more strongly suppressed by self-produced than by externally produced speech. However, this relatively weak extra suppression was not sufficient to explain the suppression seen in Experiment 1. On the other hand, in Experiment 3, we were able to abolish the suppression of M100 responses to self-produced speech by altering the feedback heard by subjects. This last result is consistent with the hypothesis that the suppression of the response in the auditory cortex to self-produced speech results from a match with expected auditory feedback.

Comparison with Other Studies

Experiment 1 is quite similar to the study by Curio et al. (2000), which also examined M100 response differences between speaking and tape playback conditions. Those authors found M100 amplitude differences across listening conditions in both hemispheres, but unlike in the present study, these differences were not significant in the right hemisphere. Further, in contradiction to our studies, Curio et al. described significant differences in M100 latencies across listening conditions in both

Figure 9. RMS response waveforms in Experiment 3, averaged across all subjects. The vertical bars in each trace show standard errors, while the asterisks show intervals where the response waveforms from the speaking condition (thick traces) and tape playback condition (thin traces) differ significantly (p < .001) for both hemispheres.


hemispheres. Different results in M100 latencies could be accounted for by the fact that we matched the acoustic waveform presented to each subject, while Curio et al. behaviorally matched the perceptual intensity between the listening conditions. Because subjects could set the tape playback volume to be different from their speech feedback volume, differing audio levels in the two experiment conditions could account for some of the M100 latency differences that they report. With MEG data acquisition triggered by speech signal onset, triggering occurs when the audio signal exceeds some threshold voltage. As a result, triggering time is sensitive to the amplification of the signal: the more a signal is amplified, the faster it will reach the trigger threshold. Thus, if tape playback volume levels were less than speech feedback volume levels, triggering from tape playback could be delayed compared to the speaking condition. If so, then for the same true M100 latency, M100 peaks would appear to have shorter latencies in the tape playback condition.
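The threshold-trigger argument can be made concrete with a small simulation (a hedged sketch of the general mechanism, not the authors' trigger hardware; all signal parameters below are invented for illustration):

```python
# Illustration: with a voltage-threshold trigger, lower playback gain
# delays the trigger time, shifting measured poststimulus latencies.
import numpy as np

fs = 10_000                                   # sampling rate (Hz), assumed
t = np.arange(0, 0.2, 1 / fs)                 # 200 msec of signal
onset = np.clip(t / 0.05, 0.0, 1.0)           # 50-msec linear onset ramp
signal = onset * np.sin(2 * np.pi * 150 * t)  # toy voiced waveform

def trigger_time(sig, threshold=0.4):
    """First time (sec) the rectified signal crosses the trigger threshold."""
    crossed = np.abs(sig) >= threshold
    return t[np.argmax(crossed)] if crossed.any() else None

for gain in (1.0, 0.7, 0.5):                  # quieter playback -> later trigger
    print(f"gain {gain:.1f}: trigger at {1000 * trigger_time(gain * signal):.1f} msec")
```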

In Experiment 3, the amplitude difference in M100 responses between the speaking and tape playback conditions was abolished by using gated noise to distort the feedback heard by subjects. Interestingly, however, this feedback manipulation created a latency difference between the two conditions. In the left hemisphere, the M100 response to gated noise triggered by tape playback was 15 msec earlier than that for gated noise triggered by the subject's own speech, while in the right hemisphere such a latency difference was not seen. Curio et al. (2000) also found that the reduced latency of M100 responses to tape playback versus self-produced speech was significantly more pronounced in the left hemisphere. Like Curio et al., we attribute this left hemisphere latency difference to two factors. First, this latency difference may represent differential processing of externally produced versus self-produced auditory feedback. Although the gated noise altered the spectral characteristics of subjects' speech feedback, such that it was likely to mismatch any internally generated expectations of how their feedback would sound, the gated noise did not alter the temporal characteristics of their speech feedback: subjects still heard auditory feedback begin at the same time as they commenced phonation. This temporal predictability could potentially have allowed subjects to process gated noise arising from their own speech, whose onset they could predict, differently from gated noise arising from tape playback, whose onset they could not predict. Second, the fact that this latency difference was seen primarily in the left hemisphere is consistent with the left hemisphere's dominance in the production of speech in right-handed speakers. Other aspects of our results may also reflect this left hemisphere dominance. In Experiment 1, the M100 response to tape playback speech was larger in the left hemisphere than in the right, and the M100 response to self-produced speech was temporally sharper in the left hemisphere than in the right. Additionally, in Experiment 2, in the left hemisphere, the M100 response to tones was significantly more suppressed by self-produced speech than by tape playback speech, a difference not seen in the right hemisphere.

Results from Experiment 2 are mostly consistent with the recent experiments by Numminen and Curio (1999). They reported that neuromagnetic responses to short recorded vowel sounds were delayed and dampened relative to background-free presentations. They did not report by how much they externally attenuated the produced speech signal in order to obtain a clear response to the probe-vowel sound. In our experiments, we used probe tones instead of vowel sounds, and we attenuated the produced speech signal at the ear by 20 dB. Surprisingly, we did not observe any latency differences in the M100 responses to tones between the three conditions. We believe that the lack of a latency effect could be accounted for by the fact that we used tones instead of vowel sounds (Diesch & Luce, 1997; Poeppel et al., 1996; Kuriki & Murase, 1989). Furthermore, by an argument analogous to that discussed above for the Curio et al. (2000) latency results, it is also reasonable that any amplitude mismatches in the masking sounds could account for discrepancies in the latency effect (Hari & Makela, 1988). However, while Numminen and Curio identified auditory interference as the main cause of these modifications, their results also provide evidence against the nonspecific attenuation hypothesis, although they do not specifically discuss their results in this form. For instance, they found an additional 6–9% decrease in amplitude in response to probe-vowel sounds during a produced-speech background when compared with a tape-recorded speech background. These amplitude differences are consistent with our Experiment 2 results.

Gunji et al. have found that vocalization-related cortical magnetic fields activated six sources that temporally overlapped in the period 0–100 msec after vocalization onset (Gunji, Hoshiyama, & Kakigi, 2000; Gunji, Kakigi, & Hoshiyama, 2000). Sources 1 and 2 were activated approximately 150 msec before vocalization onset and were located in the laryngeal motor areas of the left and right hemispheres, respectively. Kuriki, Mori, and Hirata (1999) have also found a motor planning center for speech articulation that is activated 120–320 msec before the onset of vocalization and located in a region around the superior end of the left insula. Gunji et al.'s Sources 5 and 6 were located in the truncal motor area of each hemisphere and were similar to Sources 1 and 2. In contrast, their Sources 3 and 4 were located in the auditory cortices of the left and right hemispheres, respectively, and were activated after vocalization onset. However, these experiments did not explore either the modulation of the responses in the auditory or motor cortex due to vocalization or feedback alteration. In our


experiments, because we were unable to simultaneously measure activity from motor and supplementary motor areas using a 37-channel sensor array over each hemisphere, we focus here only on the activity from the auditory cortices of both hemispheres.

More relevant to our results are the studies of Hirano et al. (Hirano et al., 1996; Hirano, Kojima, et al., 1997; Hirano, Naito, et al., 1997), who used PET to look at cortical activation during speaking. In their first study, Hirano et al. found a lack of activation in the auditory cortex while subjects vocalized. In their second study, Hirano et al. altered how subjects heard their own vocalizations and this time recorded significant activation of the auditory cortex. These results are consistent with our findings in Experiments 1 and 3.

Why Is the Response to Self-Produced Speech Suppressed?

The present study, along with recent MEG studies by Curio et al. and other brain imaging studies that have used PET or fMRI, demonstrates that during speech production the auditory cortex suppresses its response to expected acoustic signals (Hirano et al., 1996; Hirano, Kojima, et al., 1997; Hirano, Naito, et al., 1997). Does that suppression serve any function? We speculate about two possibilities.

Auditory Perception

One possibility is that the suppression results from the process of distinguishing self-produced from externally produced sources. It would be useful for sensory systems to make this distinction, since sensory information from these two types of sources is commonly used for different purposes. Sensory information from self-produced motor actions could potentially be used for feedback control. On the other hand, sensory information from external sources is primarily used for recognition.

But how could the sensory systems distinguish self-produced from externally produced sensory information? It has been proposed by previous researchers that this is done by having motor actions produce expectations of their sensory consequences (Jeannerod, 1988). These sensory outcome predictions are then compared with the actual sensory input: whatever matches the outcome predictions is inhibited. In that way, the sensory consequences of self-produced actions could be filtered from incoming sensory data, thus highlighting any unpredicted, and therefore external (or novel), stimulus that should potentially be attended to.
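A minimal sketch of this comparator idea (our illustration in Python, not a model from the paper; all signals and parameters are invented):

```python
# Sensory input matching an efference-copy prediction is inhibited,
# so only unpredicted (external) energy survives the comparison.
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 1000)

self_produced = np.sin(2 * np.pi * 5 * t)                 # consequence of own action
external = (t > 0.6) * 0.8 * np.sin(2 * np.pi * 40 * t)   # unpredicted external event
sensory_input = self_produced + external + 0.05 * rng.standard_normal(t.size)

prediction = self_produced            # efference copy: expected feedback
residual = sensory_input - prediction # comparator output: matches are cancelled

print(f"input power:    {np.mean(sensory_input**2):.3f}")
print(f"residual power: {np.mean(residual**2):.3f}  (mostly the external event)")
```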

For a number of reasons, the results of our experiments are consistent with the above account. First, although we found significant suppression of the responses of the auditory cortex to self-produced speech, we found little evidence that the act of producing speech suppresses responses to other sounds (in our case, the 1-kHz tone). This finding is consistent with the hypothesis that only responses to self-produced sounds are being filtered from the auditory cortex. In addition, we found that altering the auditory feedback with gated noise abolished the suppression in the auditory cortex induced by speaking, consistent with the hypothesis that the filtering out of auditory information arising from self-produced speech is done by comparison with predicted auditory feedback.

Recently, Blakemore et al. have recorded the same types of response suppression phenomena in the somatosensory cortex, suggesting that the need to distinguish self-produced from externally generated sensory input may be a general property of sensory systems. In their series of experiments, Blakemore et al. used both behavioral and physiological procedures to look at responses to both self-produced and externally produced somatosensory stimulation. In one experiment, they examined the well-known phenomenon that people cannot tickle themselves (Weiskrantz, Elliott, & Darlington, 1971). This phenomenon could be explained by supposing that somatosensory responses are inhibited when the incoming sensory (touch) information matches a prediction of that input generated by the person's own motor system. To test this hypothesis, these investigators had subjects tickle their palm via an apparatus that could introduce a time delay between their movement and the touch stimulation of their palm (Blakemore, Frith, & Wolpert, 1999). When this was done, subjects reported that with increasing delay, the stimulation of their palm was increasingly ticklish. That the perceived stimulation increased with delay could be explained by supposing that the delay made the incoming sensory information mismatch in time the predicted sensory input. This experiment was followed up in an fMRI study looking at activation of the somatosensory cortex and cerebellum to both self-produced and externally generated palm stimulation (Blakemore, Wolpert, & Frith, 1999). It was found that activation of the somatosensory cortex to self-produced palm stimulation was suppressed compared to externally produced palm stimulation, a direct analog of the results recorded in the auditory cortex in our experiments.

Speech Motor Control

The suppression could also serve to regulate how much auditory feedback is used in the ongoing control of speech.

A problem with using auditory feedback to control speech is the delay incurred by the processing of the auditory input. To control speech, auditory feedback must convey information about the positions of the vocal tract articulators. From speech acoustics, we know that formant frequencies and pitch are the acoustic parameters that convey this information (Stevens, 1999). Extracting these parameters from the type of


signal provided by the cochlea requires significant signal processing, which in turn suggests the possibility of significant delay if a succession of auditory brain areas is needed for this processing (Jurgens, 2002; Burnett, Freedland, Larson, & Hain, 1998; Perkell, 1997). In addition, there will be delays associated with the generation of a response in the motor cortex. There will also be intrinsic delays in transmitting the neural signals to the appropriate muscles, as well as delays in muscle responses to these signals (Jurgens, 2002). Together, these factors could contribute to a significant feedback loop delay between auditory input and motor output.

Unfortunately, we know from control theory that, in general, using feedback to control a system with a large feedback loop delay will not work because the system becomes unstable (Franklin, Powell, & Emami-Naeini, 1991). Intuitively, this instability arises because the delay makes the feedback uncorrelated with the state of the system: sensory feedback from some time in the past does not entirely reflect the current state of the system. Yet auditory feedback does appear to be involved in the online control of some aspects of speech production. As noted in the Introduction, if the pitch of a speaker's feedback is perturbed, they will typically compensate within 100–200 msec (Burnett et al., 1998; Kawahara, 1993; Elman, 1981). Thus, in spite of a large (100–200 msec) feedback loop delay, auditory feedback does stably regulate a speaker's pitch. How is this possible? Again from control theory, we know that when feedback is delayed, corrupted by noise, or otherwise made uncorrelated with current motor output, a successful approach to feedback control is to attenuate the sensory feedback to the degree that it is uncorrelated. Specifically, in feedback control based on Kalman filtering, the Kalman gain on sensory feedback is set in part to be inversely proportional to the degree to which sensory feedback is uncorrelated with the current system state (Jacobs, 1993). In an analogous way, it may be that the suppression of the auditory cortex during speaking functions to attenuate the auditory feedback to a level commensurate with its delay, such that it can be used properly in the control of speech.
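The point about delay and gain can be illustrated with a toy simulation (ours, not the authors' model; the plant and all numbers are arbitrary): a simple system driven by delayed negative feedback diverges at full gain but remains stable when the feedback is attenuated, mirroring the proposed role of auditory suppression.

```python
# Delayed negative feedback destabilizes a simple integrator unless
# the feedback gain is attenuated.
import numpy as np

def simulate(gain, delay, steps=200):
    """First-order system x[k+1] = x[k] - gain * x[k - delay]."""
    x = np.zeros(steps)
    x[0] = 1.0                                   # initial perturbation
    for k in range(steps - 1):
        fb = x[k - delay] if k >= delay else 0.0 # feedback arrives late
        x[k + 1] = x[k] - gain * fb
    return np.max(np.abs(x))

for gain in (1.0, 0.2):                          # full vs. attenuated feedback
    print(f"delay=5, gain={gain}: max |x| = {simulate(gain, 5):.3g}")
```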

Finally, it is possible that both of the above accounts are true. It may be that the suppression of self-produced speech acts as a filter for incoming sensory data, allowing attention to be paid to externally produced sensations, while at the same time properly attenuating the auditory feedback for use in the online control of speech. Future experiments must be designed to distinguish between these hypothesized possibilities.

METHODS

Subjects

Eight male volunteers (ages 25–40) participated in Experiments 1 and 3 of this study. Of these volunteers, six also participated in Experiment 2. All subjects gave their informed consent. All studies were performed with the approval of the UCSF Committee for Human Research.

Experimental Design

Magnetic fields were recorded from both hemispheres in a shielded room using two 37-channel biomagnetometers with SQUID-based first-order gradiometer sensors (Magnes II, 4-D Neuroimaging, San Diego, CA). A schematic of the experimental setup is shown in Figure 1. A directional microphone was placed in the chamber at a distance where it did not distort the recorded magnetic fields. Triggering of the data acquisition system and of stimuli delivered to the ears was controlled, and it differed for each of the three experiments in the study.

Fiduciary points were marked on the skin for later coregistration with structural magnetic resonance images, and the head shape was digitized to constrain subsequent source modeling. The sensor was initially positioned over the estimated location of the auditory cortices in both hemispheres such that a dipolar M100 response was evoked by single 400-msec duration tone pips (1 kHz, 5-msec rise/fall ramps, 90 dB SPL). Data acquisition epochs were 600 msec in total duration, with a 100-msec prestimulus period referenced to the onset of the first speech stimulus. Data were acquired at a sampling rate of 1041 Hz. The position of the sensor was then refined so that a single dipole localization model of the averaged evoked magnetic field responses to 100 tones resulted in a correlation and goodness-of-fit greater than 0.95.

With satisfactory sensor positioning over the auditory cortices in both hemispheres, but before starting the experiments, subjects practiced producing the neutral vowel sound by phonating while relaxing their upper vocal tract, generating no movement of their jaw or tongue. Preliminary experiments revealed that this form of speech production produced no movement artifacts in these magnetic field recordings.

During the speaking condition of Experiments 1 and 3, the audio onset of each utterance picked up by the microphone triggered the MEG data acquisition system, which recorded 1 sec of data (300 msec pretrigger, 700 msec posttrigger) at a sampling rate of 1041 Hz. Subjects were instructed to produce short vowel sounds by phonating their vocal cords without any movement of their jaw or tongue. Responses to 100 such utterances were recorded, and these magnetic field recordings were then averaged to obtain the evoked magnetic field responses to self-produced speech for each MEG detector channel. In the subsequent tape playback condition, the audio signal from the tape recorder was fed to the subject's earphones and the trigger input for MEG data acquisition. Again, magnetic field recordings were averaged to obtain the evoked magnetic field responses.


During the tape playback condition of Experiments 1 and 3, the audio level of the signal fed to the earphones and trigger was adjusted to be the same as that of the speaking condition. Because of this, the trigger of MEG data acquisition (which was based on an audio voltage threshold) occurred at the same time relative to utterance onset in both conditions. The equal audio levels ensured that triggering latency differences could not account for any differences between the evoked responses to self-produced and tape-recorded speech. In Experiment 1, the subject heard these equal audio levels in the earphones, which meant that his side tone amplitude was the same in both the speaking and tape playback conditions. In Experiment 3, the audio signal for each utterance was used to trigger both the MEG data acquisition system and a white noise generator (20 Hz–20 kHz bandwidth, 85 dB SPL). In this case, a sum of the subject's audio signal and noise was presented to the subject.

In Experiment 2, the onset of the tone pips triggered MEG data acquisition in each of the three conditions (tones alone, tones and speaking, and tones and tape playback). These 1-kHz tones each had a 400-msec duration, 5-msec onset and offset ramps, and a 0.8–1.2-sec ISI between tones. The subject heard the tones at 90 dB SPL. As with Experiment 1, data acquisition epochs were 1.0 sec in total duration, with a 300-msec prestimulus period. Data were acquired at a sampling rate of 1041 Hz. In each condition, the evoked magnetic field responses to the tone pips were obtained by averaging 100 epochs.

Epochs 1500 msec long, with a 500-msec prestimulus period, were collected at a sampling rate of 1041 Hz. The raw epoch magnetic field data from these 100 utterances were then averaged to obtain the evoked magnetic field responses to gated noise during self-production.

The voltage level of the audio signal fed to the earphones and trigger in this condition was adjusted to be identical to the voltage levels recorded during the speaking condition, ensuring that triggering differences could not account for any differences between the evoked responses to noise in the self-produced and tape playback blocks.

Data Analysis

Data from the two sensor arrays positioned over each hemisphere were separately analyzed. From the evoked magnetic field response for each condition in each channel, the RMS value of evoked magnetic field strength averaged across channels was computed. The RMS method of averaging (square root of the mean of squared channel field strengths) was used to avoid the cancellation that occurs when the magnetic fields from detectors on opposite sides of the current dipole (whose fields are roughly equal in magnitude but opposite in sign) are averaged together. This process resulted in one RMS time waveform for each hemisphere, for each condition, for each subject. For statistical comparisons across conditions, a one-way analysis of variance was performed with condition as a factor and time as a repeated measure.
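A minimal sketch of this RMS measure (our reconstruction in Python; the array shapes are assumptions, not the authors' code):

```python
# RMS across channels avoids the sign cancellation of a plain average.
import numpy as np

def rms_waveform(evoked: np.ndarray) -> np.ndarray:
    """evoked: (n_channels, n_times) averaged evoked field for one
    hemisphere and condition. Returns the square root of the mean of
    squared channel field strengths at each time point."""
    return np.sqrt(np.mean(evoked ** 2, axis=0))

# Example: 37 detectors, 1-sec epoch at 1041 Hz (stand-in data, fT)
evoked = np.random.randn(37, 1041)
rms = rms_waveform(evoked)   # one RMS time waveform, shape (1041,)
```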

The RMS across channels is a convenient measure of activity that is independent of the position of individual sensors. Following optimal placement of the sensor for a significant M100 dipolar response, RMS has been shown to correlate well with the "equivalent current-dipole strength" (also called the "Q value") of a dipole positioned in and around the auditory cortex. The RMS measure is also correlated with the data obtained from (1) the channel with the biggest signal-to-noise ratio and (2) the first two principal components of the array response (Ahissar et al., 2001; Mahncke, 1998).

In sensor arrays comprising planar gradiometers (like the Neuromag 122 or 306), the channel with the largest signal-to-noise ratio is often closest to the source (as dictated by the physics). However, in sensor arrays comprising axial gradiometers (as in our Magnes II system), the channel with the largest signal-to-noise ratio reflects the peak of the magnetic field, which is not the channel closest to a source. Moreover, the "best" channel could reflect either a peak "inward" or "outward" field, requiring sign inversion while averaging such data across subjects. Finally, since the sensor array position can vary considerably across subjects, it is difficult to use procedures typical of EEG and whole-head studies, wherein single-channel data are often picked for analysis across subjects. These considerations led us to use RMS instead of picking specific channels for our analysis.

Single Dipole Model Fit

A single equivalent current-dipole model was calculated separately for each hemisphere for each poststimulus time point, using standard 4-D Neuroimaging software operating on data that had been filtered between 1 and 20 Hz. The localization algorithm used an iterative least-squares minimization to compute the strength and location of a single dipole in a spherical volume of uniform conductivity that can account for the sensor data. Dipole fits were accepted based on a local correlation maximum criterion of 0.95 and goodness-of-fit values greater than 0.95.
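A hedged sketch of such an iterative least-squares dipole fit (our illustration only: a free-space magnetic-dipole forward model stands in for the spherical-head current-dipole model used by the 4-D Neuroimaging software, and all names and numbers are invented):

```python
# Toy single-dipole fit: minimize the residual between measured fields
# and a parametric forward model, then report a goodness-of-fit value.
import numpy as np
from scipy.optimize import least_squares

MU0_4PI = 1e-7  # mu_0 / (4 * pi)

def dipole_field(params, sensors):
    """Field (z component) at each sensor from a dipole at params[:3]
    with moment params[3:], in arbitrary units."""
    pos, mom = params[:3], params[3:]
    r = sensors - pos                                   # (n_sensors, 3)
    d = np.linalg.norm(r, axis=1)[:, None]
    b = MU0_4PI * (3.0 * r * (r @ mom)[:, None] / d**5 - mom / d**3)
    return b[:, 2]

def fit_dipole(measured, sensors, x0):
    """Iterative least-squares fit of dipole position and moment."""
    res = least_squares(lambda p: dipole_field(p, sensors) - measured, x0)
    gof = 1.0 - np.sum(res.fun**2) / np.sum(measured**2)  # goodness of fit
    return res.x, gof

# Synthetic check: recover a known dipole from its simulated field
sensors = np.random.randn(37, 3) * 0.02 + np.array([0.0, 0.0, 0.12])
true = np.array([0.0, 0.0, 0.06, 1.0, 0.0, 0.0])          # position + moment
params, gof = fit_dipole(dipole_field(true, sensors), sensors, x0=true * 0.9)
```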

The MEG anatomical reference frame was established using a digital device called the "sensor position indicator." A set of three table-mounted receivers and one array-mounted receiver triangulates the signal from a stylus transmitter positioned at fiducial reference points on the subject's head surface (typically nasion, left and right preauricular points, Cz, and inion). The same stylus transmitter arrangement served to define the curvature of the head by tracing the surface, from which a "local sphere" model of the head was generated. To coregister the dipole locations on the MRI image, the fiduciary points serve as the basis for a common coordinate system. By superimposing these fiducial landmarks on MR images


of the subject, it was possible to define the position of the computed point sources with an accuracy of 5 mm.

Multiple Dipole Localization Analysis

Multiple dipole localization analyses of the spatio-temporal evoked magnetic fields were also performed using Multiple Signal Classification (MUSIC) (Sekihara, Poeppel, Marantz, Koizumi, & Miyashita, 1997; Mosher, Lewis, & Leahy, 1992) and beamformer algorithms (Mosher et al., 1992; Sekihara, Nagarajan, Poeppel, Marantz, & Miyashita, 2001; Robinson & Vrba, 1999). MUSIC methods are based on estimation of a signal "subspace" from the entire spatio-temporal MEG data using singular-value decomposition (SVD). A version of the MUSIC algorithm referred to as the "conventional" MUSIC algorithm was implemented in MATLAB under the assumption that the sources contributing to the MEG data arose from multiple dipoles (<37 in number) that were stationary and were located within a spherical volume of uniform conductivity (Sekihara et al., 1997). The locations of dipoles are typically determined by conducting a search over a three-dimensional grid of interest within the head. Given the sensor positions and the coordinates of the origin of a "local sphere" approximation of the head shape for each subject, a lead-field matrix was computed for each point in this 3-D grid. At each point in this grid, from the lead-field matrix and the covariance matrix of the spatio-temporal MEG data, the value of a MUSIC localizer function can then be computed (Sekihara et al., 1997, Equation (4)). Maxima of this localizer function then correspond to the locations of dipolar sources. We used the MUSIC algorithm to compute the localizer functions over a three-dimensional grid in the hemisphere contralateral to the stimulus delivery. The localizer was computed using the evoked magnetic field responses over a time window of 1000 msec following the onset of the stimuli. For each subject, at each point in a 3-D grid (−4 < x < 6, 0 < y < 8, 3 < z < 11) in the left hemisphere, the localizer function was computed using the averaged evoked auditory magnetic field responses.
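A hedged sketch of a conventional MUSIC localizer of the kind described above (our reconstruction, not the authors' MATLAB code; the lead-field computation is left as an input, and the grid and subspace dimension are assumptions):

```python
# MUSIC localizer: project each grid point's (normalized) lead-field
# vector onto the signal subspace estimated by SVD of the evoked data;
# the localizer peaks where the lead field lies in the subspace.
import numpy as np

def music_localizer(data, leads, n_sources):
    """data:  (n_channels, n_times) spatio-temporal evoked field.
    leads: (n_grid, n_channels) lead-field vector per grid point.
    Returns one localizer value per grid point."""
    u, s, _ = np.linalg.svd(data, full_matrices=False)
    sig = u[:, :n_sources]                   # signal subspace
    loc = np.empty(len(leads))
    for i, l in enumerate(leads):
        l = l / np.linalg.norm(l)
        proj = np.linalg.norm(sig.T @ l) ** 2   # fraction in signal subspace
        loc[i] = 1.0 / (1.0 - proj + 1e-12)     # large where lead ~ in subspace
    return loc
```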

Acknowledgments

This work was funded in part by Scientific Learning Co. and a grant from the Whitaker Foundation.

Reprint requests should be sent to John F. Houde, W. M. Keck Center for Integrative Neuroscience, University of California, San Francisco, 513 Parnassus Avenue, S-762, San Francisco, CA 94143, USA, or via e-mail: houde@phy.ucsf.edu.

Note

1. In a recent study of M100 responses to tones, it was found that near detection thresholds, a 15-dB change in SPL caused a 34% change in M100 RMS amplitude (Stufflebeam et al., 1998).

REFERENCES

Ahissar E Nagarajan S Ahissar M Protopapas A MahnckeH amp Merzenich M M (2001) Speech comprehension iscorrelated with temporal response patterns recorded fromauditory cortex Proceedings of the National Academy ofSciences USA 98 13367 ndash13372

Blakemore S J Frith C D amp Wolpert D M (1999)Spatio-temporal prediction modulates the perception ofself-produced stimuli Journal of Cognitive Neuroscience11 551 ndash559

Blakemore S J Wolpert D M amp Frith C D (1999) Thecerebellum contributes to somatosensory cortical activityduring self-produced tactile stimulation Neuroimage 10448 ndash459

Burnett T A Freedland M B Larson C R amp Hain T C(1998) Voice f0 responses to manipulations in pitchfeedback Journal of the Acoustical Society of America103 3153ndash3161

Cooper W E (1979) Speech perception and productionStudies in selective adaptation Norwood NJ Ablex

Creutzfeldt O amp Ojemann G (1989) Neuronal activityin the human lateral temporal lobe III Activity changesduring music Experimental Brain Research 77490 ndash498

Creutzfeldt O Ojemann G amp Lettich E (1989a) Neuronalactivity in the human lateral temporal lobe I Responses tospeech Experimental Brain Research 77 451 ndash475

Creutzfeldt O Ojemann G amp Lettich E (1989b) Neuronalactivity in the human lateral temporal lobe II Responses tothe subjects own voice Experimental Brain Research 77476 ndash489

Curio G Neuloh G Numminen J Jousmaki V amp Hari R(2000) Speaking modifies voice-evoked activity in thehuman auditory cortex Human Brain Mapping 9183 ndash191

Demonet J F Chollet F Ramsay S Cardebat DNespoulous J L Wise R Rascol A amp Frackowiak R(1992) The anatomy of phonological and semanticprocessing in normal subjects Brain 115 1753 ndash1768

Diesch E amp Luce T (1997) Magnetic fields elicited bytones and vowel formants reveal tonotopy and nonlinearsummation of cortical activation Psychophysiology 34501 ndash510

Elman J L (1981) Effects of frequency-shifted feedback onthe pitch of vocal productions Journal of the AcousticalSociety of America 70 45ndash50

Franklin G F Powell J D amp Emami-Naeini A (1991)Feedback control of dynamic systems (2nd ed) ReadingMA Addison-Wesley

Gracco V L Ross D Kalinowski J amp Stuart A (1994)Articulatory changes following spectral and temporalmodifications in auditory feedback Journal of theAcoustical Society of America 95 2821

Gunji A Hoshiyama M amp Kakigi R (2000) Identificationof auditory evoked potentials of onersquos own voice ClinicalNeurophysiology 111 214 ndash219

Gunji A Kakigi R amp Hoshiyama M (2000) Spatiotemporalsource analysis of vocalization-associated magnetic fieldsCognitive Brain Research 9 157 ndash163

Hari R Aittoniemi K Jarvinen M L Katila T amp Varpula T(1980) Auditory evoked transient and sustained magneticfields of the human brain Localization of neural generatorsExperimental Brain Research 40 237 ndash240

Hari R amp Makela J P (1988) Modification of neuromagneticresponses of the human auditory cortex by masking soundsExperimental Brain Research 71 87ndash92

Hein A V amp Held R (1962) A neural model for labile

1136 Journal of Cognitive Neuroscience Volume 14 Number 8

sensorimotor coordination In E Bernard amp M Hare(Eds) Biological prototypes and synthetic systems(vol 1 pp 71ndash74) New York Plenum

Hickok G amp Poeppel D (2000) Towards a functionalneuroanatomy of speech perception Trends in CognitiveSciences 4 131 ndash138

Hirano S Kojima H Naito Y Honjo I Kamoto YOkazawa H Ishizu K Yonekura Y Nagahama YFukuyama H amp Konishi J (1996) Cortical speechprocessing mechanisms while vocalizing visually presentedlanguages NeuroReport 8 363ndash367

Hirano S Kojima H Naito Y Honjo I Kamoto YOkazawa H Ishizu K Yonekura Y Nagahama YFukuyama H amp Konishi J (1997) Cortical processingmechanism for vocalization with auditory verbal feedbackNeuroReport 8 2379ndash2382

Hirano S Naito Y Okazawa H Kojima H Honjo I IshizuK Yenokura Y Nagahama Y Fukuyama H amp Konishi J(1997) Cortical activation by monaural speech soundstimulation demonstrated by positron emission tomographyExperimental Brain Research 113 75ndash80

Houde J F amp Jordan M I (1998) Sensorimotor adaptationin speech production Science 279 1213 ndash1216

Jacobs O L R (1993) Introduction to control theory(2nd ed) Oxford UK Oxford University Press

Jeannerod M (1988) The neural and behaviouralorganisation of goal-directed movements Oxford UKOxford University Press

Jurgens U (2002) Neural pathways underlying vocalcontrol Neuroscience and Biobehavioral Reviews 26235 ndash258

Kawahara H (1993) Transformed auditory feedback Effectsof fundamental frequency perturbation Journal of theAcoustical Society of America 94 1883

Kuriki S Mori T amp Hirata Y (1999) Motor planningcenter for speech articulation in the normal human brainNeuroReport 10 765 ndash769

Kuriki S amp Murase M (1989) Neuromagnetic study of theauditory responses in right and left hemispheres of thehuman brain evoked by pure tones and speech soundsExperimental Brain Research 77 127 ndash134

Lane H amp Tranel B (1971) The lombard sign and the roleof hearing in speech Journal of Speech and HearingResearch 14 677ndash709

Lee B S (1950) Some effects of side-tone delay Journal ofthe Acoustical Society of America 22 639 ndash640

Leonardo A amp Konishi M (1999) Decrystallization of adultbirdsong by perturbation of auditory feedback Nature 399466 ndash470

Levelt W J (1983) Monitoring and self-repair in speechCognition 14 41ndash104

Levelt W J M (1989) Speaking From intention toarticulation Cambridge MIT Press

Liegeois-Chauvel C Musolino A Badier J M Marquis P ampChauvel P (1994) Evoked potentials recorded from theauditory cortex in man Evaluation and topography of themiddle latency components Electroencephalography andClinical Neurophysiology 92 204 ndash214

Mahncke H (1998) Oscillations and temporal processingin sensory cortex Unpublished doctoral dissertationUniversity of California San Francisco

McCasland J S amp Konishi M (1981) Interaction betweenauditory and motor activities in an avian song controlnucleus Proceedings of the National Academy of SciencesUSA 78 7815ndash7819

McGuire P K Silbersweig D A amp Frith C D (1996)Functional neuroanatomy of verbal self-monitoring Brain119 907 ndash917

Mosher J C Lewis P S amp Leahy R M (1992) Multipledipole modeling and localization from spatio-temporal megdata IEEE Transactions on Biomedical Engineering 39541 ndash557

Muller-Preuss P amp Ploog D (1981) Inhibition of auditorycortical neurons during phonation Brain Research 21561ndash76

Numminen J amp Curio G (1999) Differential effects of overtcovert and replayed speech on vowel-evoked responsesof the human auditory cortex Neuroscience Letters 27229ndash32

Numminen J Salmelin R amp Hari R (1999) Subjectrsquos ownspeech reduces reactivity of the human auditory cortexNeuroscience Letters 265 119 ndash122

Papanicolaou A C Raz N Loring D W amp EisenbergH M (1986) Brain stem evoked response suppressionduring speech production Brain and Language 2750ndash55

Perkell J (1997) Articulatory processes In W J Hardcastleamp J Laver (Eds) The handbook of phonetic sciences(pp 333 ndash370) Oxford UK Blackwell Publishers

Picton T W Alain C Woods D L John M S Scherg MValdes-Sosa P Bosch-Bayard J amp Trujillo N J (1999)Intracerebral sources of human auditory-evoked potentialsAudiology and Neuro-otology 4 64ndash79

Poeppel D Yellin E Phillips C Roberts T P Rowley H AWexler K amp Marantz A (1996) Task-induced asymmetry ofthe auditory evoked m100 neuromagnetic field elicited byspeech sounds Brain Research Cognitive Brain Research4 231 ndash242

Reite M Adams M Simon J Teale P Sheeder JRichardson D amp Grabbe R (1994) Auditory m100component 1 Relationship to heschlrsquos gyri BrainResearch Cognitive Brain Research 2 13ndash20

Richer F Alain C Achim A Bouvier G amp Saint-Hilaire J M(1989) Intracerebral amplitude distributions of the auditoryevoked potential Electroencephalography and ClinicalNeurophysiology 74 202 ndash208

Robinson S amp Vrba J (1999) Functional neuroimaging bysynthetic aperture magnetometry (SAM) In Yoshimoto et al(Eds) Recent advances in biomagnetism (pp 302 ndash305)Sendai Japan Tokyo University Press

Scherg M amp Von Cramon D (1985) Two bilateral sources ofthe late aep as identified by a spatio-temporal dipole modelElectroencephalography and Clinical Neurophysiology 6232ndash44

Scherg M amp Von Cramon D (1986) Evoked dipolesource potentials of the human auditory cortexElectroencephalography and Clinical Neurophysiology65 344 ndash360

Schmidt M F amp Konishi M (1998) Gating of auditoryresponses in the vocal control system of awake songbirdsNature Neuroscience 1 513ndash518

Sekihara K Nagarajan S S Poeppel D Marantz Aamp Miyashita Y (2001) Reconstructing spatio-temporalactivities of neural sources using an meg vectorbeamformer technique IEEE Transactions on BiomedicalEngineering 48 760ndash771

Sekihara K Poeppel D Marantz A Koizumi H ampMiyashita Y (1997) Noise covariance incorporatedmeg-music algorithm A method for multiple-dipoleestimation tolerant of the influence of background brainactivity IEEE Transactions on Biomedical Engineering44 839 ndash847

Stevens K N (1999) Acoustic phonetics CambridgeMIT Press

Stufflebeam S M Poeppel D Rowley H A amp RobertsT P (1998) Peri-threshold encoding of stimulus

Houde et al 1137

frequency and intensity in the m100 latency NeuroReport9 91ndash94

Suga N amp Schlegel P (1972) Neural attenuation ofresponses to emitted sounds in echolocating rats Science177 82ndash84

Suga N amp Shimozawa T (1974) Site of neural attenuation ofresponses to self-vocalized sounds in echolocating batsScience 183 1211 ndash1213

von Bekesy G (1949) The structure of the middle ear and thehearing of onersquos own voice by bone conduction Journal ofthe Acoustical Society of America 21 217 ndash232

Weiskrantz L Elliott J amp Darlington C (1971) Preliminaryobservations on tickling oneself Nature 230 598 ndash599

Woods D L Knight R T amp Neville H J (1984)Bitemporal lesions dissociate auditory evoked potentialsand perception Electroencephalography and ClinicalNeurophysiology 57 208 ndash220

Yates A J (1963) Delayed auditory feedback PsychologicalBulletin 60 213 ndash232

Zatorre R J Evans A C Meyer E amp Gjedde A (1992)Lateralization of phonetic and pitch discrimination in speechprocessing Science 256 846 ndash849

1138 Journal of Cognitive Neuroscience Volume 14 Number 8

participate in monitoring these aspects of speech that are self-generated (Hickok & Poeppel, 2000; Levelt, 1983, 1989).

PET studies have shown that the response of the primary auditory cortex (A1) to self-produced speech is minimal (Hirano et al., 1996; Hirano, Naito, et al., 1997). However, this A1 response activity increased if the subject heard an altered version of their normal speech (Hirano, Kojima, et al., 1997). In other PET studies, varying the rate of speech output while the speech-contingent auditory input was masked by constant white noise produced significant modulation of the response from the secondary auditory cortex of the left hemisphere (McGuire, Silbersweig, & Frith, 1996). Furthermore, these studies have reported that the medial prefrontal cortex and the left frontal insula/operculum were differentially activated during speech feedback alteration. However, these studies have not examined the dynamics of activity in the different auditory regions engaged during speech production.

More recently, magnetoencephalography (MEG) studies have shown that the responses of the auditory cortex to self-produced speech are attenuated on the millisecond time scale when compared with responses to tape-recorded speech (Curio, Neuloh, Numminen, Jousmaki, & Hari, 2000; Numminen & Curio, 1999; Numminen, Salmelin, & Hari, 1999). These initial MEG studies also reported a lack of a "change" response to self-produced stimuli. The study by Curio et al. (2000) examined 100-msec poststimulus (M100) response differences between speaking and tape playback conditions. The authors found significant differences in M100 amplitude across listening conditions in the left hemisphere and significant differences in M100 latencies across listening conditions in both hemispheres. Numminen et al. have investigated this phenomenon in two studies. In one study, they reported that M100 responses to 1-kHz tones were slightly delayed and significantly inhibited during overt speech as compared with silent speech (Numminen et al., 1999). In a related study, they also reported that M100 responses to short recorded vowel sounds, heard while subjects also heard either self-produced or tape-recorded vocalizations, were delayed and dampened relative to background-free presentations (Numminen & Curio, 1999).

In this study, we sought to further examine this attenuation of responses to self-produced speech. In particular, we were interested in distinguishing between two possible mechanisms of the M100 response reduction. One possibility is that activity in the auditory system is generally suppressed during speech. Such "nonspecific" attenuation could be the indirect result of middle-ear muscle activity during speaking (Papanicolaou, Raz, Loring, & Eisenberg, 1986).

Alternatively, the M100 response reduction during speaking could result from a comparison between actual and predicted auditory feedback, that is, an auditory version of Held's "reafference hypothesis" (Hein & Held, 1962). Motor system activity during speaking could generate an internal representation of the expected auditory feedback, and a match between expected and actual feedback could reduce the M100 response.

To test between the efference copy and nonspecific attenuation hypotheses, we performed a series of experiments in which MEG was used to examine M100 responses to speech in a variety of conditions. In these experiments, magnetic field recordings were obtained from the auditory cortices in both hemispheres of subjects who spoke while hearing controlled versions of their speech feedback via earphones. These responses were compared to MEG recordings made while subjects listened to a tape recording of their production.

Three experiments were conducted. First, we repeated the experiments of Numminen et al. and examined the dynamics of the attenuation of responses to self-produced speech. Second, we tested the specificity of the recorded signal attenuation by examining M100 responses to tone pulses added to the speech heard by the subject. Third, we tested the "reafference hypothesis" by examining M100 responses to speech feedback altered to "mismatch" the expected speech feedback.

RESULTS

Experiment 1

Figure 1 shows the apparatus used in the experiments. Magnetic fields were recorded from both hemispheres in a shielded room using two 37-channel biomagnetometers.

Figure 1. Apparatus used in the experiments.


A directional microphone was placed in the chamber at a distance where it did not distort the recorded magnetic fields. Acoustic input was delivered to the subject via air-tube-based earphones. Triggering of the data acquisition system and of stimuli delivered to the ears was controlled, and it differed for each of the three experiments in the study.

The schematic for our first experiment is shown in Figure 2. This experiment consisted of two successive conditions: speaking and tape playback. In the speaking condition (Figure 2a), subjects were instructed to produce the short vowel sound by phonating with their jaw and tongue relaxed and stationary. The subject's speech was picked up by a microphone and fed to a tape recorder, his earphones, and the trigger input for MEG data acquisition. A total of 100 utterances were recorded on tape. In the subsequent tape playback condition (Figure 2b), subjects were instructed to remain silent while they heard a tape recording of their utterances from the speaking condition.

Results for one subject in the speaking condition are shown in Figure 3, which shows the evoked magnetic field response recorded at each detector position, averaged over the 100 utterance trials. In the figure, the traces show the average response at each detector position, aligned to the onset of vocalization (gray tick marks). For each condition (speaking, tape playback), the RMS responses at each detector were averaged together for the left and right hemispheres, as shown in Figure 4. The figure shows RMS responses averaged over detectors in the left (upper panel) and right (lower panel) hemisphere detector arrays.

In each panel, the thick trace shows the average RMS response in the speaking condition and the thin trace shows the same for the tape playback condition. The vertical axis of each panel indicates the RMS field strength (fT) of the response, while the horizontal axis indicates time (msec) relative to the onset of vocalization, which occurs at 0 msec.

Figure 3. RMS responses across detector arrays recorded in the speaking condition of Experiment 1 from a single representative subject. The traces show the average response at each detector position, aligned to the onset of vocalization (gray tick marks). (Gaps in the detector arrays are due to offline detectors.)

Figure 2. Setup for the two conditions of Experiment 1.


Many differences between responses in the speaking and tape playback conditions can be seen in the figure. At about 100 msec poststimulus onset, most of the traces show a peak in RMS amplitude. This peak is referred to as the M100 response, and there are large differences in its amplitude across conditions in both hemispheres for this subject. In the left hemisphere, the M100 response in the speaking condition is greatly suppressed compared to that seen in the tape playback condition. In the right hemisphere, this suppression is even greater: in the tape playback condition there is quite a large M100 response, but in the speaking condition the M100 is essentially absent.

Figure 5 shows the same response waveforms as the previous figure, averaged across all eight subjects. In this figure, in addition to the average response waveforms, vertical bars indicating standard errors are also shown. The asterisks in each panel indicate latencies at which the speaking and tape playback responses differed significantly (p < .001). These plots show that in both hemispheres, mean M100 responses were significantly smaller in the speaking condition than in the tape playback condition. It appears that left hemisphere responses are larger than right hemisphere responses in the tape playback condition, but that both hemispheres are inhibited to about the same level in the speaking condition. However, even in the speaking condition, the left hemisphere M100 response appears more focal in time than that of the right hemisphere. Less consistent are the longer latency responses. For example, the amplitude of the M200 response in the left hemisphere was significantly larger in the tape playback condition, while in the right hemisphere it was significantly larger in the speaking condition.

The amplitudes of the acoustic signals reaching the ears via air conduction (the so-called "side tone") were adjusted to be the same for both the speaking and tape playback conditions; all our subjects reported that the intensities of the mic and tape signals were perceptually identical. However, because in the speaking condition the subject hears both the side tone and bone conduction of his voice, the speech heard by a subject is actually about twice the amplitude of (i.e., 3 dB louder than) the speech heard in the tape playback condition (von Bekesy, 1949). Thus, a priori, we would expect this amplitude difference to be a possible confound in the experiment. The size of the evoked magnetic field response in the auditory cortex, and in particular the amplitude of the M100 response, has been shown to be a monotonically increasing function of audio input level (Stufflebeam, Poeppel, Rowley, & Roberts, 1998). From this, we would expect the M100 response to be larger in the voiced condition than in the tape playback condition. However, this response difference should be small: at the audio levels used in this experiment (80 dBA at the earphones), the effect of audio amplitude on the M100 response has nearly saturated. Additionally, as we have seen, the actual M100 response recorded in the speaking condition was much smaller than that recorded in the tape playback condition, a result opposite to what would be predicted from the audio amplitude differences.
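
As a rough sanity check on the 3-dB figure (our arithmetic, under the assumption that the air-conducted and bone-conducted components are approximately equal in level and add incoherently, so that their powers sum):

\[
\Delta L = 10\log_{10}\frac{P_{\mathrm{side}} + P_{\mathrm{bone}}}{P_{\mathrm{side}}} \approx 10\log_{10} 2 \approx 3\ \mathrm{dB}.
\]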

Source localizations were performed on the spatio-temporal magnetic field responses recorded 50–200 msec following stimulus onset. Across all our stimulus conditions (mic vs. tape vs. tone response), we found no statistically significant difference (p > .5) in the location of the single equivalent dipole in each hemisphere that accounts for these magnetic field responses.

Figure 5. RMS response waveforms in Experiment 1, averaged across all subjects. The vertical bars in each trace show standard errors, while the asterisks show intervals where the response waveforms from the microphone condition (thick traces) and tape condition (thin traces) differ significantly (p < .001) for both hemispheres.

Figure 4. RMS responses recorded in Experiment 1, averaged over all 37 detectors in each hemisphere's detector array, from a single representative subject. In each plot, the thick traces are responses in the speaking condition; the thin traces are responses in the tape playback condition. The onset of the audio stimulus is at 0 msec.


These localization data suggest that the observed response suppression/attenuation arises from a restricted cortical region, presumably A1 and its immediate environs.

Overall, all subjects' M100 responses were localized to the temporal cortex in the area of the auditory cortex. We infer this because lesion studies, intracranial recordings, and source modeling studies provide converging evidence that the neuronal generators of the M100 response are located in the auditory regions of the temporal lobes and include A1 and its immediate environs (Ahissar et al., 2001; Picton et al., 1999; Liegeois-Chauvel, Musolino, Badier, Marquis, & Chauvel, 1994; Reite et al., 1994; Richer, Alain, Achim, Bouvier, & Saint-Hilaire, 1989; Scherg & Von Cramon, 1985, 1986; Woods, Knight, & Neville, 1984; Hari, Aittoniemi, Jarvinen, Katila, & Varpula, 1980).

Experiment 2

The results of Experiment 1 clearly confirmed that the M100 response is suppressed in the speaking condition. One possible mechanism that could cause this suppression is nonspecific attenuation. That is, neural signals arising as a result of activity in the motor cortex might directly and broadly inhibit auditory cortex activity.

If auditory cortex activity is nonspecifically suppressed during speaking, then it should be less responsive to all auditory signals. In Experiment 2, we tested this possibility by examining the M100 response to tones heard during speech that was either self-produced or played back from tape.

In this experiment, we measured the responses evoked by tone pips under three successive conditions, as shown in Figure 6. In the first condition (tones alone; Figure 6a), subjects remained silent and heard 1-kHz tone pips. In the second condition (tones and speaking; Figure 6b), subjects again heard these tone pips, but this time the tones were presented while subjects produced long utterances of the vowel (again with no movement of their jaw or tongue). In the third condition (tones and tape playback; Figure 6c), subjects again remained silent and heard both the tone pips and tape playback of their speech produced in the previous condition. For the last two conditions, in which the subject heard tone pips and speech, the speech was attenuated by 20 dB to reduce its masking of the tone pips.

The results of Experiment 2 are summarized in Figure 7. The figure shows a plot of the mean RMS response across detectors and subjects for each condition and in each hemisphere. Dashed traces show subjects' responses to hearing the tones in silence (tones alone condition). Thick traces show responses to hearing the tones while subjects produced speech (tones and speaking condition). Thin traces show responses to hearing the tones along with the taped speech playback (tones and tape playback condition).

As in Figure 5, the vertical bars are standard errors. Asterisks indicate time intervals during which responses to tones heard while producing speech (tones and speaking condition) differed significantly (p < .001) from the responses to tones heard with taped speech playback (tones and tape playback condition).

For most of the traces in Figure 7, the M100 and M200 responses to the tones stand out clearly across conditions. For both hemispheres, these responses are very prominent for tones heard in silence (dashed traces). In the left hemisphere, there was a reduction in the M100 and M200 responses to the tones in the speaking (thick traces) and tape playback (thin traces) conditions. Interestingly, the response reduction in the speaking condition appeared to be greater than the response reduction in the tape playback condition.

Figure 6. Setup for the three conditions of Experiment 2.


Indeed, the statistical analysis confirmed that the M100 response in the tones and speaking condition was significantly smaller than the M100 response in the tones and tape playback condition. Nevertheless, the difference was small. In the right hemisphere, there was an approximately equal reduction in the M100 and M200 responses to the tones in the speaking and tape playback conditions. Within the M100–M200 region, responses in these two conditions were significantly different at only one latency, as shown by the asterisk.

It appears that, at most, the act of speaking creates only a limited degree of nonspecific attenuation of the responses to the tones in the left hemisphere that is not seen in the tape playback condition. To quantify how much nonspecific attenuation contributes to the suppression of the response to self-produced speech, we can estimate the signal reductions needed to produce the response reductions seen in these experiments. From the results of Experiment 1, we have seen that, compared to speech from the tape playback, self-production of speech created a 30% reduction in M100 amplitude in the left hemisphere and a 15% reduction in the right hemisphere, equivalent to 13- and 7-dB decreases in effective input signal, respectively (Note 1). In Experiment 2, in the left hemisphere, there was possibly an extra 7% reduction in the M100 response to tones while subjects spoke, as compared to the M100 response to tones during tape playback, which is equivalent to only a 3-dB reduction in effective input signal. Thus, it appears that nonspecific attenuation cannot account for all the response suppression recorded for self-produced speech.
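
These dB equivalents follow from the near-threshold scaling cited in Note 1 (our arithmetic, assuming the roughly 3.4% change in M100 RMS amplitude per 1.5-dB change in SPL can be applied linearly):

\[
\frac{30\%}{3.4\%/1.5\ \mathrm{dB}} \approx 13\ \mathrm{dB},\qquad
\frac{15\%}{3.4\%/1.5\ \mathrm{dB}} \approx 7\ \mathrm{dB},\qquad
\frac{7\%}{3.4\%/1.5\ \mathrm{dB}} \approx 3\ \mathrm{dB}.
\]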

It has also been argued that any attenuation of the responses during vocalization could occur due to bone conduction and activation of the middle-ear muscles, or due to inhibition at the level of the brain stem (Papanicolaou et al., 1986). Thus, to further test the hypothesis of nonspecific attenuation, we measured brainstem evoked responses to click trains in five subjects under three conditions identical to those in Experiment 2: silence, produced speech, and tape-recorded speech. In the latter two conditions, we attenuated the speech background by 20 dB to obtain a clear evoked response to the clicks. For each of these conditions, we recorded the brainstem evoked response and measured the latency and amplitude of Waves I–V. The results from these experiments indicated that there were no statistical differences in either the amplitudes or the latencies of any of these waves between the three background conditions (p > .2). These results provide further support for the hypothesis that our observed effects were cortical in origin.

Experiment 3

Another possible cause of the response reduction to self-produced speech is that activity in the auditory cortex is the result of a comparison between the incoming signal and an internally generated prediction of what that signal would be.

Figure 7. RMS waveforms for responses to 1-kHz tone pips, averaged across all subjects, for the three conditions of Experiment 2: the tones alone condition (dashed traces), the tones and speaking condition (thick traces), and the tones and tape playback condition (thin traces). The vertical bars in each trace show standard errors, while the asterisks show intervals where the response waveforms from the tones and speaking condition (red) and the tones and tape playback condition (blue) differed significantly (p < .001) for both hemispheres.

Figure 8. Setup for the two conditions of Experiment 3.


If this were the case, then altering the incoming signal should create a mismatch with that expectation, which would reduce or abolish the response suppression.

In Experiment 3, we tested this possibility by altering the auditory feedback to the subject during self-production of speech. The setup for Experiment 3 is shown in Figure 8. Like Experiment 1, this experiment consisted of two conditions: speaking and tape playback. However, instead of hearing their speech (self-produced or from tape playback), subjects heard a sum of their speech plus the output of a white noise generator that produced noise bursts gated to the duration of the speech utterance. Subjects reported that they could not hear their own speech and that they heard only noise gated to their vowel production.

The results of Experiment 3 are shown in Figure 9. For both hemispheres, the thick traces show the average evoked response of subjects hearing gated noise in the speaking condition, while the thin traces show their responses to gated noise in the tape playback condition. Asterisks mark poststimulus latencies where the evoked responses in the two conditions differ significantly (p < .001) for both hemispheres. The figure shows that subjects' evoked responses in the two conditions are significantly different at a number of latencies. In the left hemisphere, the responses in the speaking and tape playback conditions differ significantly around the M100 responses and at a latency of 150 msec.

In the right hemisphere, the speaking and tape playback responses differ significantly at the 50- and 200-msec latencies. Since we are limiting our analysis of these results to M100 responses, the only response differences of interest are the significant differences seen in the M100 region of the left hemisphere responses. It is clear from the figure, however, that these response differences arise principally from a latency difference between the M100 responses: the M100 response to gated noise in the tape playback condition is about 20 msec earlier than the M100 response to gated noise in the speaking condition. The amplitudes of the two M100 responses appear nearly identical.

In sum, Figure 9 shows that the M100 amplitude suppression seen in Experiment 1 is abolished in both hemispheres: the amplitude of the M100 response to noise generated from subjects' speech via the microphone is as large as the M100 response to noise generated from the tape playback of their speech. Thus, when we altered subjects' speech feedback in Experiment 3, we found that the response suppression to self-produced speech disappeared, which is consistent with the suppression resulting from auditory input matching an internally expected input.

DISCUSSION

The present study demonstrates that the human auditory cortex responds differently to self-produced speech than to externally produced speech. In Experiment 1, significant suppression in the amplitude of M100 responses was recorded in the speaking condition in both hemispheres, confirming that self-production of speech suppresses the response of the auditory cortex to that speech. In Experiment 2, M100 responses to tone pips were seen to be modestly more strongly suppressed by self-produced than by externally produced speech. However, this relatively weak extra suppression was not sufficient to explain the suppression seen in Experiment 1. On the other hand, in Experiment 3, we were able to abolish the suppression of M100 responses to self-produced speech by altering the feedback heard by subjects. This last result is consistent with the hypothesis that the suppression of the response in the auditory cortex to self-produced speech results from a match with the expected auditory feedback.

Comparison with Other Studies

Experiment 1 is quite similar to the study by Curio et al. (2000), which also examined M100 response differences between speaking and tape playback conditions. The authors found M100 amplitude differences across listening conditions in both hemispheres, but unlike in the present study, these differences were not significant in the right hemisphere. Further, in contradiction to our studies, Curio et al. described significant differences in M100 latencies across listening conditions in both hemispheres.

Figure 9. RMS response waveforms in Experiment 3, averaged across all subjects. The vertical bars in each trace show standard errors, while the asterisks show intervals where the response waveforms from the speaking condition (thick traces) and tape playback condition (thin traces) differ significantly (p < .001) for both hemispheres.


The different results in M100 latencies could be accounted for by the fact that we matched the acoustic waveform presented to each subject, while Curio et al. behaviorally matched the perceptual intensity between the listening conditions. Because subjects could set the tape playback volume to be different from their speech feedback volume, differing audio levels in the two experimental conditions could account for some of the M100 latency differences that they report. With MEG data acquisition triggered by speech signal onset, triggering occurs when the audio signal exceeds some threshold voltage. As a result, the triggering time is sensitive to the amplification of the signal: the more a signal is amplified, the faster it will reach the trigger threshold. Thus, if tape playback volume levels were lower than speech feedback volume levels, triggering from tape playback could be delayed compared to the speaking condition. If so, then for the same true M100 latency, M100 peaks would appear to have shorter latencies in the tape playback condition.
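
The trigger-threshold effect is easy to demonstrate in a minimal simulation (our sketch, not the acquisition software; the onset ramp, sampling rate, and threshold values are illustrative assumptions):

```python
import numpy as np

# A voltage-threshold trigger fires earlier on a louder copy of the same
# utterance onset, so latencies measured from the trigger shift.
fs = 10_000                                  # sampling rate (Hz), assumed
t = np.arange(0, 0.1, 1 / fs)                # 100 msec around voice onset
envelope = np.clip(t / 0.05, 0.0, 1.0)       # 50-msec linear amplitude ramp
utterance = envelope * np.sin(2 * np.pi * 120 * t)  # 120-Hz voiced onset
threshold = 0.2                              # trigger threshold (arbitrary)

def trigger_time_msec(signal: np.ndarray) -> float:
    """Time at which the rectified signal first exceeds the threshold."""
    return 1000 * np.argmax(np.abs(signal) > threshold) / fs

loud = trigger_time_msec(2.0 * utterance)    # e.g., louder speech feedback
soft = trigger_time_msec(1.0 * utterance)    # e.g., quieter tape playback
print(f"loud: {loud:.1f} msec, soft: {soft:.1f} msec")  # loud triggers first
```

Because the quieter signal trips the trigger later, a response with the same true latency peaks earlier relative to that trigger, which is the direction of the artifact described above.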

In Experiment 3, the amplitude difference in M100 responses between the speaking and tape playback conditions was abolished by using gated noise to distort the feedback heard by subjects. Interestingly, however, this feedback manipulation created a latency difference between the two conditions. In the left hemisphere, the M100 response to gated noise triggered by tape playback was 15 msec earlier than that for gated noise triggered by the subject's own speech, while in the right hemisphere such a latency difference was not seen. Curio et al. (2000) also found that the reduced latency of M100 responses to tape playback versus self-produced speech was significantly more pronounced in the left hemisphere. Like Curio et al., we attribute this left hemisphere latency difference to two factors. First, this latency difference may represent differential processing of externally produced versus self-produced auditory feedback. Although the gated noise altered the spectral characteristics of the speech feedback, such that it was likely to mismatch any internally generated expectations of how the feedback would sound, the gated noise did not alter the temporal characteristics of the speech feedback: subjects still heard auditory feedback begin at the same time as they commenced phonation. This temporal predictability could potentially have allowed subjects to process gated noise arising from their own speech, whose onset they could predict, differently from gated noise arising from tape playback, whose onset they could not predict. Second, the fact that this latency difference was seen primarily in the left hemisphere is consistent with the left hemisphere's dominance in the production of speech in right-handed speakers. Other aspects of our results may also reflect this left hemisphere dominance. In Experiment 1, the M100 response to tape playback speech was larger in the left hemisphere than in the right, and the M100 response to self-produced speech was temporally sharper in the left hemisphere than in the right. Additionally, in Experiment 2, in the left hemisphere, the M100 response to tones was significantly more suppressed by self-produced speech than by tape playback speech, a difference not seen in the right hemisphere.

Results from Experiment 2 are mostly consistent with the recent experiments of Numminen and Curio (1999). They reported that neuromagnetic responses to short recorded vowel sounds were delayed and dampened relative to background-free presentations. They did not report how much they externally attenuated the produced speech signal in order to obtain a clear response to the probe-vowel sound. In our experiments, we used probe tones instead of vowel sounds, and we attenuated the produced speech signal at the ear by 20 dB. Surprisingly, we did not observe any latency differences in the M100 responses to the tones between the three conditions. We believe that the lack of a latency effect could be accounted for by the fact that we used tones instead of vowel sounds (Diesch & Luce, 1997; Poeppel et al., 1996; Kuriki & Murase, 1989). Furthermore, by an argument analogous to that discussed above for the Curio et al. (2000) latency results, it is also reasonable that any amplitude mismatches in the masking sounds could account for discrepancies in the latency effect (Hari & Makela, 1988). However, while Numminen and Curio identified auditory interference as the main cause of these modifications, their results also provide evidence against the nonspecific attenuation hypothesis, although they do not specifically discuss their results in this form. For instance, they found an additional 6–9% decrease in the amplitude of responses to probe-vowel sounds during a produced speech background when compared with a tape-recorded speech background. These amplitude differences are consistent with our Experiment 2 results.

Gunji et al. found that vocalization-related cortical magnetic fields reflect six sources that temporally overlap in the period 0–100 msec after vocalization onset (Gunji, Hoshiyama, & Kakigi, 2000; Gunji, Kakigi, & Hoshiyama, 2000). Sources 1 and 2 were activated approximately 150 msec before the vocalization onset and were located in the laryngeal motor areas of the left and right hemispheres, respectively. Kuriki, Mori, and Hirata (1999) have also found a motor planning center for speech articulation that is activated 120–320 msec before the onset of vocalization and is located in a region around the superior end of the left insula. Gunji et al.'s Sources 5 and 6 were located in the truncal motor area in each hemisphere and were similar to Sources 1 and 2. In contrast, their Sources 3 and 4 were located in the auditory cortices of the left and right hemispheres, respectively, and were activated after vocalization onset. However, these experiments did not explore the modulation of responses in the auditory or motor cortex due to vocalization or feedback alteration.


In our experiments, because we were unable to simultaneously measure activity from motor and supplementary motor areas using a 37-channel sensor array over each hemisphere, we focus here only on the activity from the auditory cortices of both hemispheres.

More relevant to our results are the studies of Hirano et al. (Hirano et al., 1996; Hirano, Kojima, et al., 1997; Hirano, Naito, et al., 1997), who used PET to look at cortical activation during speaking. In their first study, Hirano et al. found a lack of activation in the auditory cortex while subjects vocalized. In their second study, Hirano et al. altered how subjects heard their own vocalizations and this time recorded significant activation of the auditory cortex. These results are consistent with our findings in Experiments 1 and 3 of our study.

Why Is the Response to Self-Produced Speech Suppressed?

The present study, along with recent MEG studies by Curio et al. and other brain imaging studies that have used PET or fMRI, demonstrates that during speech production, the auditory cortex suppresses its response to expected acoustic signals (Hirano et al., 1996; Hirano, Kojima, et al., 1997; Hirano, Naito, et al., 1997). Does that suppression serve any function? We speculate about two possibilities.

Auditory Perception

One possibility is that the suppression results from the process of distinguishing self-produced from externally produced sources. It would be useful for sensory systems to make this distinction, since sensory information from these two types of sources is commonly used for different purposes. Sensory information from self-produced motor actions could potentially be used for feedback control. On the other hand, sensory information from external sources is primarily used for recognition.

But how could the sensory systems distinguish self-produced from externally produced sensory information? Previous researchers have proposed that this is done by having motor actions produce expectations of their sensory consequences (Jeannerod, 1988). These sensory outcome predictions are then compared with the actual sensory input, and whatever matches the outcome predictions is inhibited. In that way, the sensory consequences of self-produced actions can be filtered from incoming sensory data, thus highlighting any unpredicted, and therefore external (or novel), stimulus that should potentially be attended to.
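
A minimal numerical sketch of this comparator idea (our illustration only; the identity forward model and the suppression gain are placeholder assumptions, not claims about the underlying circuitry):

```python
import numpy as np

def predicted_feedback(motor_command: np.ndarray) -> np.ndarray:
    """Hypothetical internal forward model mapping the efference copy of a
    motor command to expected sensory input; identity is a placeholder."""
    return motor_command

def afferent_response(actual_input, motor_command, gain=0.9):
    """Inhibit the predicted component; unpredicted input passes through."""
    return actual_input - gain * predicted_feedback(motor_command)

speech = np.array([1.0, 0.8, 0.6])               # self-produced feedback
external = speech + np.array([0.0, 0.5, 0.0])    # plus an unpredicted sound

print(afferent_response(speech, speech))    # [0.1 0.08 0.06]: suppressed
print(afferent_response(external, speech))  # [0.1 0.58 0.06]: novelty survives
```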

For a number of reasons, the results of our experiments are consistent with the above account. First, although we found significant suppression of the responses of the auditory cortex to self-produced speech, we found little evidence that the act of producing speech suppresses responses to other sounds (in our case, the 1-kHz tone). This finding is consistent with the hypothesis that only responses to self-produced sounds are being filtered from the auditory cortex. In addition, we found that altering the auditory feedback with gated noise abolished the suppression in the auditory cortex induced by speaking, consistent with the hypothesis that the filtering out of auditory information arising from self-produced speech is done by comparison with predicted auditory feedback.

Recently, Blakemore et al. have recorded the same types of response suppression phenomena in the somatosensory cortex, suggesting that the need to distinguish self-produced from externally generated sensory input may be a general property of sensory systems. In their series of experiments, Blakemore et al. used both behavioral and physiological procedures to look at responses to both self-produced and externally produced somatosensory stimulation. In one experiment, they examined the well-known phenomenon that people cannot tickle themselves (Weiskrantz, Elliott, & Darlington, 1971). This phenomenon could be explained by supposing that somatosensory responses are inhibited when the incoming sensory (touch) information matches a prediction of that input generated by the person's own motor system. To test this hypothesis, these investigators had subjects tickle their palm via an apparatus that could introduce a time delay between their movement and the touch stimulation of their palm (Blakemore, Frith, & Wolpert, 1999). When this was done, subjects reported that with increasing delay, the stimulation of their palm was increasingly stimulating (ticklish). That the perceived stimulation increased with delay could be explained by supposing that the delay made the incoming sensory information mismatch in time the predicted sensory input. This experiment was followed up in an fMRI imaging study looking at activation of the somatosensory cortex and cerebellum by both self-produced and externally generated palm stimulation (Blakemore, Wolpert, & Frith, 1999). It was found that activation of the somatosensory cortex by self-produced palm stimulation was suppressed compared to externally produced palm stimulation, a direct analog of the results recorded in the auditory cortex in our experiments.

Speech Motor Control

The suppression could also serve to regulate how much auditory feedback is used in the ongoing control of speech.

A problem with using auditory feedback to control speech is the delay incurred by the processing of the auditory input. To control speech, auditory feedback must convey information about the positions of the vocal tract articulators. From speech acoustics, we know that formant frequencies and pitch are the acoustic parameters that convey this information (Stevens, 1999).


Extracting these parameters from the type of signal provided by the cochlea requires significant signal processing, which in turn suggests the possibility of significant delay if a succession of auditory brain areas is needed for this processing (Jurgens, 2002; Burnett, Freedland, Larson, & Hain, 1998; Perkell, 1997). In addition, there will be delays associated with the generation of a response in the motor cortex. There will also be intrinsic delays in transmitting the neural signals to the appropriate muscles, as well as delays in the muscle responses to these signals (Jurgens, 2002). Together, these factors could contribute to a significant feedback loop delay between auditory input and motor output.

Unfortunately, we know from control theory that, in general, using feedback to control a system with a large feedback loop delay will not work, because the system becomes unstable (Franklin, Powell, & Emami-Naeini, 1991). Intuitively, this instability arises because the delay makes the feedback uncorrelated with the state of the system: sensory feedback from some time in the past does not entirely reflect the current state of the system. Yet auditory feedback does appear to be involved in the online control of some aspects of speech production. As noted in the Introduction, if the pitch of a speaker's feedback is perturbed, they will typically compensate within 100–200 msec (Burnett et al., 1998; Kawahara, 1993; Elman, 1981). Thus, in spite of a large (100–200 msec) feedback loop delay, auditory feedback does stably regulate a speaker's pitch. How is this possible? Again from control theory, we know that when feedback is delayed, corrupted by noise, or otherwise made uncorrelated with the current motor output, a successful approach to feedback control is to attenuate the sensory feedback to the degree that it is uncorrelated. Specifically, in feedback control based on Kalman filtering, the Kalman gain on sensory feedback is set, in part, to be inversely proportional to the degree to which the sensory feedback is correlated with the current system state (Jacobs, 1993). In an analogous way, it may be that the suppression of the auditory cortex during speaking functions to attenuate the auditory feedback to a level commensurate with its delay, such that it can be used properly in the control of speech.
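
A one-dimensional sketch of the Kalman-gain point (ours, following the standard scalar update rather than anything specific in the paper): as the variance attributed to the sensory measurement grows, reflecting delayed or decorrelated feedback, the optimal weight on that feedback shrinks.

```python
def kalman_gain(prior_var: float, meas_var: float) -> float:
    """Scalar Kalman gain K = P / (P + R): the weight placed on sensory
    feedback when correcting the internal state estimate."""
    return prior_var / (prior_var + meas_var)

prior_var = 1.0  # uncertainty of the internal estimate of the system state
for meas_var in (0.1, 1.0, 10.0):  # increasingly unreliable feedback
    print(f"R = {meas_var:5.1f} -> feedback gain K = "
          f"{kalman_gain(prior_var, meas_var):.2f}")
# K falls from 0.91 to 0.09: unreliable feedback is strongly attenuated,
# the proposed analog of auditory cortical suppression during speaking.
```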

Finally, it is possible that both of the above accounts are true. It may be that the suppression of self-produced speech acts as a filter for incoming sensory data, allowing attention to be paid to externally produced sensations, while at the same time properly attenuating the auditory feedback for use in the online control of speech. Future experiments must be designed to distinguish between the above hypothesized possibilities.

METHODS

Subjects

Eight male volunteers (ages 25–40) participated in Experiments 1 and 3 of this study. Of these volunteers, six also participated in Experiment 2. All subjects gave their informed consent. All studies were performed with the approval of the UCSF Committee for Human Research.

Experimental Design

Magnetic fields were recorded from both hemispheres in a shielded room using two 37-channel biomagnetometers with SQUID-based first-order gradiometer sensors (Magnes II, 4-D Neuroimaging, San Diego, CA). A schematic of the experimental setup is shown in Figure 1. A directional microphone was placed in the chamber at a distance where it did not distort the recorded magnetic fields. Triggering of the data acquisition system and of stimuli delivered to the ears was controlled, and it differed for each of the three experiments in the study.

Fiduciary points were marked on the skin for later coregistration with structural magnetic resonance images, and the head shape was digitized to constrain subsequent source modeling. The sensor was initially positioned over the estimated location of the auditory cortices in both hemispheres such that a dipolar M100 response was evoked by single 400-msec duration tone pips (1 kHz, 5-msec rise/fall ramps, 90 dB SPL). Data acquisition epochs were 600 msec in total duration, with a 100-msec prestimulus period referenced to the onset of the first speech stimulus. Data were acquired at a sampling rate of 1041 Hz. The position of the sensor was then refined so that a single dipole localization model of the averaged evoked magnetic field responses to 100 tones resulted in a correlation and goodness-of-fit greater than 0.95.

With satisfactory sensor positioning over the auditory cortices in both hemispheres, but before starting the experiments, subjects practiced producing the neutral vowel sound by phonating while relaxing their upper vocal tract, generating no movement of their jaw or tongue. Preliminary experiments revealed that this form of speech production produced no movement artifacts in these magnetic field recordings.

During the speaking condition of Experiments 1 and 3, the audio onset of each utterance picked up by the microphone triggered the MEG data acquisition system, which recorded 1 sec of data (300 msec pretrigger, 700 msec posttrigger) at a sampling rate of 1041 Hz. Subjects were instructed to produce short vowel sounds by phonating their vocal cords without any movement of their jaw or tongue. Responses to 100 such utterances were recorded, and these magnetic field recordings were then averaged to obtain the evoked magnetic field responses to self-produced speech for each MEG detector channel. In the subsequent tape playback condition, the audio signal from the tape recorder was fed to the subject's earphones and the trigger input for MEG data acquisition. Again, magnetic field recordings were averaged to obtain the evoked magnetic field responses.


During the tape playback condition of Experiments 1 and 3, the audio level of the signal fed to the earphones and trigger was adjusted to be the same as that of the speaking condition. Because of this, the trigger for MEG data acquisition (which was based on an audio voltage threshold) occurred at the same time relative to utterance onset in both conditions. The equal audio levels ensured that triggering latency differences could not account for any differences between the evoked responses to self-produced and tape-recorded speech. In Experiment 1, the subject heard these equal audio levels in the earphones, which meant that his side tone amplitude was the same in both the speaking and tape playback conditions. In Experiment 3, the audio signal for each utterance was used to trigger both the MEG data acquisition system and a white noise generator (20 Hz–20 kHz bandwidth, 85 dB SPL). In this case, the sum of the audio signal and noise was presented to the subject.

In Experiment 2, the onset of the tone pips triggered MEG data acquisition in each of the three conditions (tones alone, tones and speaking, and tones and tape playback). These 1-kHz tones each had a 400-msec duration, 5-msec onset and offset ramps, and a 0.8–1.2-sec ISI between tones. The subject heard the tones at 90 dB SPL. As with Experiment 1, data acquisition epochs were 1.0 sec in total duration, with a 300-msec prestimulus period. Data were acquired at a sampling rate of 1041 Hz. In each condition, the evoked magnetic field responses to the tone pips were obtained by averaging 100 epochs.

Epochs 1500 msec long, with a 500-msec prestimulus period, were collected at a sampling rate of 1041 Hz. The raw epoch magnetic field data from these 100 utterances were then averaged to obtain the evoked magnetic field responses to gated noise during self-production.

The voltage level of the audio signal fed to the earphone and trigger in this condition was adjusted to be identical to the voltage levels recorded during the speaking condition, ensuring that triggering differences could not account for any differences between the evoked responses to noise in the self-produced and tape playback blocks.

Data Analysis

Data from the two sensor arrays positioned over each hemisphere were analyzed separately. From the evoked magnetic field response for each condition in each channel, the RMS value of the evoked magnetic field strength averaged across channels was computed. The RMS method of averaging (the square root of the mean of the squared channel field strengths) was used to avoid the cancellation that occurs when the magnetic fields from detectors on opposite sides of the current dipole (whose fields are roughly equal in magnitude but opposite in sign) are averaged together. This process resulted in one RMS time waveform for each hemisphere, for each condition, for each subject.

For statistical comparisons across conditions, a one-way analysis of variance was performed, with condition as a factor and time as a repeated measure.
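
In code, the per-hemisphere measure reduces to the following (a sketch under assumed array shapes, not the original analysis scripts):

```python
import numpy as np

def rms_waveform(evoked: np.ndarray) -> np.ndarray:
    """evoked: (n_channels, n_times) averaged evoked field (fT).
    Returns the (n_times,) RMS across channels; squaring first prevents
    the oppositely signed fields on the two sides of a dipole from
    canceling, unlike a plain channel average."""
    return np.sqrt(np.mean(evoked ** 2, axis=0))

# e.g., one 37-channel hemisphere array over ~600 msec at 1041 Hz
evoked = np.random.randn(37, 625)     # placeholder for real averaged data
print(rms_waveform(evoked).shape)     # (625,) -> one waveform per condition
```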

The RMS across channels is a convenient measure of activity that is independent of the position of individual sensors. Following optimal placement of the sensor for a significant M100 dipolar response, the RMS has been shown to correlate well with the "equivalent current-dipole strength" (also called the "Q value") of a dipole positioned in and around the auditory cortex. The RMS measure is also correlated with the data obtained from (1) the channel with the biggest signal-to-noise ratio and (2) the first two principal components of the array response (Ahissar et al., 2001; Mahncke, 1998).

In sensor arrays comprising planar gradiometers (like the Neuromag 122 or 306), the channel with the largest signal-to-noise ratio is often the one closest to the source (as dictated by the physics). However, in sensor arrays comprising axial gradiometers (as in our Magnes II system), the channel with the largest signal-to-noise ratio reflects a peak of the magnetic field, which is not the channel closest to the source. Moreover, the "best" channel could reflect either a peak "inward" or "outward" field, requiring sign inversion when averaging such data across subjects. Finally, since the sensor array position can vary considerably across subjects, it is difficult to use procedures typical of EEG and whole-head studies, wherein single-channel data are often picked for analysis across subjects. These considerations led us to use the RMS instead of picking specific channels for our analysis.

Single Dipole Model Fit

A single equivalent current-dipole model was calculated separately for each hemisphere for each poststimulus time point using standard 4-D Neuroimaging software operating on data that had been filtered between 1 and 20 Hz. The localization algorithm used an iterative least-squares minimization to compute the strength and location of a single dipole, in a spherical volume of uniform conductivity, that could account for the sensor data. Dipole fits were accepted based on a local correlation maximum criterion of 0.95 and goodness-of-fit values greater than 0.95.
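
The fitting loop has roughly the following shape (our skeleton; `forward_field` is a stand-in for the spherical-head forward model in the 4-D Neuroimaging software, which we do not reproduce here):

```python
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(0)
sensor_pos = rng.normal(size=(37, 3))        # assumed sensor geometry

def forward_field(params: np.ndarray) -> np.ndarray:
    """params = (x, y, z, qx, qy, qz): dipole location and moment.
    Toy dipolar falloff standing in for the true conductor model."""
    loc, q = params[:3], params[3:]
    d = sensor_pos - loc
    r = np.linalg.norm(d, axis=1) + 1e-9
    return (d @ q) / r**3

true_params = np.array([0.1, 0.2, 0.3, 1.0, 0.0, 0.5])
measured = forward_field(true_params) + 0.01 * rng.normal(size=37)

# Iterative least-squares minimization of the residual field
fit = least_squares(lambda p: forward_field(p) - measured,
                    x0=np.full(6, 0.05))

gof = 1 - np.sum(fit.fun ** 2) / np.sum(measured ** 2)
print(fit.x.round(2), f"goodness of fit = {gof:.3f}")  # accept if > 0.95
```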

The MEG anatomical reference frame was established using a digital device called the "sensor position indicator." A set of three table-mounted receivers and one array-mounted receiver triangulates the signal from a stylus transmitter positioned at fiducial reference points on the subject's head surface (typically the nasion, the left and right preauricular points, Cz, and the inion). The same stylus transmitter arrangement served to define the curvature of the head by tracing the surface, from which a "local sphere" model of the head was generated. To coregister the dipole locations on the MRI image, the fiduciary points serve as the basis for a common coordinate system.


By superimposing these fiducial landmarks on MR images of the subject, it was possible to define the positions of the computed point sources with an accuracy of 5 mm.

Multiple Dipole Localization Analysis

Multiple dipole localization analysis of the spatio-temporal evoked magnetic fields was also performed using Multiple Signal Classification (MUSIC) (Sekihara, Poeppel, Marantz, Koizumi, & Miyashita, 1997; Mosher, Lewis, & Leahy, 1992) and beamformer algorithms (Mosher et al., 1992; Sekihara, Nagarajan, Poeppel, Marantz, & Miyashita, 2001; Robinson & Vrba, 1999). MUSIC methods are based on estimating a signal "subspace" from the entire spatio-temporal MEG data set using singular-value decomposition (SVD). A version of the MUSIC algorithm referred to as the "conventional" MUSIC algorithm was implemented in MATLAB under the assumption that the sources contributing to the MEG data arose from multiple dipoles (<37 in number) that were stationary and located within a spherical volume of uniform conductivity (Sekihara et al., 1997). The locations of the dipoles are typically determined by conducting a search over a three-dimensional grid of interest within the head. Given the sensor positions and the coordinates of the origin of a "local sphere" approximation of the head shape for each subject, a lead-field matrix was computed for each point in this 3-D grid. At each point in the grid, the value of a MUSIC localizer function can then be computed from the lead-field matrix and the covariance matrix of the spatio-temporal MEG data (Sekihara et al., 1997, Equation 4). Maxima of this localizer function then correspond to the locations of dipolar sources. We used the MUSIC algorithm to compute the localizer functions over a three-dimensional grid in the hemisphere contralateral to stimulus delivery. The localizer was computed using the evoked magnetic field responses over a time window of 1000 msec following the onset of the sentences. For each subject, at each point in a 3-D grid (−4 < x < 6, 0 < y < 8, 3 < z < 11) in the left hemisphere, the localizer function was computed using the averaged evoked auditory magnetic field responses.
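
As a sketch of the scan described above (ours; the lead field and geometry are placeholders, and a real implementation would use the subject's "local sphere" model and measured sensor positions):

```python
import numpy as np

def lead_field(grid_point, sensor_pos):
    """Placeholder lead-field vector for a fixed-orientation dipole."""
    d = sensor_pos - grid_point
    r = np.linalg.norm(d, axis=1) + 1e-9
    return d[:, 0] / r**3

def music_localizer(data, grid, sensor_pos, n_sources):
    """data: (n_channels, n_times) evoked field. The SVD of the data gives
    the signal subspace; the localizer peaks wherever a grid point's
    (normalized) lead field lies almost entirely inside that subspace."""
    U, _, _ = np.linalg.svd(data, full_matrices=False)
    Es = U[:, :n_sources]                     # estimated signal subspace
    out = []
    for point in grid:
        l = lead_field(point, sensor_pos)
        l /= np.linalg.norm(l)
        proj = np.linalg.norm(Es.T @ l) ** 2  # subspace correlation, <= 1
        out.append(1.0 / (1.0 - proj + 1e-12))
    return np.array(out)

# Toy scan over a coarse grid with placeholder geometry and data
rng = np.random.default_rng(1)
sensors = rng.normal(size=(37, 3)) + np.array([0.0, 0.0, 2.0])
grid = [np.array([x, y, z]) for x in (-1, 0, 1)
        for y in (-1, 0, 1) for z in (0, 1)]
data = rng.normal(size=(37, 500))             # stand-in for evoked responses
print(music_localizer(data, grid, sensors, n_sources=2).round(2))
```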

Acknowledgments

This work was funded in part by Scientific Learning Co and agrant from the Whitaker Foundation

Reprint requests should be sent to John F Houde W M KeckCenter for Integrative Neuroscience University of CaliforniaSan Francisco 513 Parnassus Avenue S-762 San Francisco CA94143 USA or via e-mail houdephyucsfedu

Note

1 In a recent study of M100 responses to tones it was foundthat near detection thresholds a 15-dB change in SPL caused a34 change in M100 RMS amplitude (Stufflebeam et al 1998)

REFERENCES

Ahissar E Nagarajan S Ahissar M Protopapas A MahnckeH amp Merzenich M M (2001) Speech comprehension iscorrelated with temporal response patterns recorded fromauditory cortex Proceedings of the National Academy ofSciences USA 98 13367 ndash13372

Blakemore S J Frith C D amp Wolpert D M (1999)Spatio-temporal prediction modulates the perception ofself-produced stimuli Journal of Cognitive Neuroscience11 551 ndash559

Blakemore S J Wolpert D M amp Frith C D (1999) Thecerebellum contributes to somatosensory cortical activityduring self-produced tactile stimulation Neuroimage 10448 ndash459

Burnett T A Freedland M B Larson C R amp Hain T C(1998) Voice f0 responses to manipulations in pitchfeedback Journal of the Acoustical Society of America103 3153ndash3161

Cooper W E (1979) Speech perception and productionStudies in selective adaptation Norwood NJ Ablex

Creutzfeldt O amp Ojemann G (1989) Neuronal activityin the human lateral temporal lobe III Activity changesduring music Experimental Brain Research 77490 ndash498

Creutzfeldt O Ojemann G amp Lettich E (1989a) Neuronalactivity in the human lateral temporal lobe I Responses tospeech Experimental Brain Research 77 451 ndash475

Creutzfeldt O Ojemann G amp Lettich E (1989b) Neuronalactivity in the human lateral temporal lobe II Responses tothe subjects own voice Experimental Brain Research 77476 ndash489

Curio G Neuloh G Numminen J Jousmaki V amp Hari R(2000) Speaking modifies voice-evoked activity in thehuman auditory cortex Human Brain Mapping 9183 ndash191

Demonet J F Chollet F Ramsay S Cardebat DNespoulous J L Wise R Rascol A amp Frackowiak R(1992) The anatomy of phonological and semanticprocessing in normal subjects Brain 115 1753 ndash1768

Diesch E amp Luce T (1997) Magnetic fields elicited bytones and vowel formants reveal tonotopy and nonlinearsummation of cortical activation Psychophysiology 34501 ndash510

Elman J L (1981) Effects of frequency-shifted feedback onthe pitch of vocal productions Journal of the AcousticalSociety of America 70 45ndash50

Franklin G F Powell J D amp Emami-Naeini A (1991)Feedback control of dynamic systems (2nd ed) ReadingMA Addison-Wesley

Gracco V L Ross D Kalinowski J amp Stuart A (1994)Articulatory changes following spectral and temporalmodifications in auditory feedback Journal of theAcoustical Society of America 95 2821

Gunji A Hoshiyama M amp Kakigi R (2000) Identificationof auditory evoked potentials of onersquos own voice ClinicalNeurophysiology 111 214 ndash219

Gunji A Kakigi R amp Hoshiyama M (2000) Spatiotemporalsource analysis of vocalization-associated magnetic fieldsCognitive Brain Research 9 157 ndash163

Hari R Aittoniemi K Jarvinen M L Katila T amp Varpula T(1980) Auditory evoked transient and sustained magneticfields of the human brain Localization of neural generatorsExperimental Brain Research 40 237 ndash240

Hari R amp Makela J P (1988) Modification of neuromagneticresponses of the human auditory cortex by masking soundsExperimental Brain Research 71 87ndash92

Hein A V amp Held R (1962) A neural model for labile

1136 Journal of Cognitive Neuroscience Volume 14 Number 8

sensorimotor coordination In E Bernard amp M Hare(Eds) Biological prototypes and synthetic systems(vol 1 pp 71ndash74) New York Plenum

Hickok G amp Poeppel D (2000) Towards a functionalneuroanatomy of speech perception Trends in CognitiveSciences 4 131 ndash138

Hirano S Kojima H Naito Y Honjo I Kamoto YOkazawa H Ishizu K Yonekura Y Nagahama YFukuyama H amp Konishi J (1996) Cortical speechprocessing mechanisms while vocalizing visually presentedlanguages NeuroReport 8 363ndash367

Hirano S Kojima H Naito Y Honjo I Kamoto YOkazawa H Ishizu K Yonekura Y Nagahama YFukuyama H amp Konishi J (1997) Cortical processingmechanism for vocalization with auditory verbal feedbackNeuroReport 8 2379ndash2382

Hirano S Naito Y Okazawa H Kojima H Honjo I IshizuK Yenokura Y Nagahama Y Fukuyama H amp Konishi J(1997) Cortical activation by monaural speech soundstimulation demonstrated by positron emission tomographyExperimental Brain Research 113 75ndash80

Houde J F amp Jordan M I (1998) Sensorimotor adaptationin speech production Science 279 1213 ndash1216

Jacobs O L R (1993) Introduction to control theory(2nd ed) Oxford UK Oxford University Press

Jeannerod M (1988) The neural and behaviouralorganisation of goal-directed movements Oxford UKOxford University Press

Jurgens U (2002) Neural pathways underlying vocalcontrol Neuroscience and Biobehavioral Reviews 26235 ndash258

Kawahara H (1993) Transformed auditory feedback Effectsof fundamental frequency perturbation Journal of theAcoustical Society of America 94 1883

Kuriki S Mori T amp Hirata Y (1999) Motor planningcenter for speech articulation in the normal human brainNeuroReport 10 765 ndash769

Kuriki S amp Murase M (1989) Neuromagnetic study of theauditory responses in right and left hemispheres of thehuman brain evoked by pure tones and speech soundsExperimental Brain Research 77 127 ndash134

Lane H amp Tranel B (1971) The lombard sign and the roleof hearing in speech Journal of Speech and HearingResearch 14 677ndash709

Lee B S (1950) Some effects of side-tone delay Journal ofthe Acoustical Society of America 22 639 ndash640

Leonardo A amp Konishi M (1999) Decrystallization of adultbirdsong by perturbation of auditory feedback Nature 399466 ndash470

Levelt W J (1983) Monitoring and self-repair in speechCognition 14 41ndash104

Levelt W J M (1989) Speaking From intention toarticulation Cambridge MIT Press

Liegeois-Chauvel C Musolino A Badier J M Marquis P ampChauvel P (1994) Evoked potentials recorded from theauditory cortex in man Evaluation and topography of themiddle latency components Electroencephalography andClinical Neurophysiology 92 204 ndash214

Mahncke H (1998) Oscillations and temporal processingin sensory cortex Unpublished doctoral dissertationUniversity of California San Francisco

McCasland J S amp Konishi M (1981) Interaction betweenauditory and motor activities in an avian song controlnucleus Proceedings of the National Academy of SciencesUSA 78 7815ndash7819

McGuire P K Silbersweig D A amp Frith C D (1996)Functional neuroanatomy of verbal self-monitoring Brain119 907 ndash917

Mosher J C Lewis P S amp Leahy R M (1992) Multipledipole modeling and localization from spatio-temporal megdata IEEE Transactions on Biomedical Engineering 39541 ndash557

Muller-Preuss P amp Ploog D (1981) Inhibition of auditorycortical neurons during phonation Brain Research 21561ndash76

Numminen J amp Curio G (1999) Differential effects of overtcovert and replayed speech on vowel-evoked responsesof the human auditory cortex Neuroscience Letters 27229ndash32

Numminen J Salmelin R amp Hari R (1999) Subjectrsquos ownspeech reduces reactivity of the human auditory cortexNeuroscience Letters 265 119 ndash122

Papanicolaou A C Raz N Loring D W amp EisenbergH M (1986) Brain stem evoked response suppressionduring speech production Brain and Language 2750ndash55

Perkell J (1997) Articulatory processes In W J Hardcastleamp J Laver (Eds) The handbook of phonetic sciences(pp 333 ndash370) Oxford UK Blackwell Publishers

Picton T W Alain C Woods D L John M S Scherg MValdes-Sosa P Bosch-Bayard J amp Trujillo N J (1999)Intracerebral sources of human auditory-evoked potentialsAudiology and Neuro-otology 4 64ndash79

Poeppel D Yellin E Phillips C Roberts T P Rowley H AWexler K amp Marantz A (1996) Task-induced asymmetry ofthe auditory evoked m100 neuromagnetic field elicited byspeech sounds Brain Research Cognitive Brain Research4 231 ndash242

Reite M Adams M Simon J Teale P Sheeder JRichardson D amp Grabbe R (1994) Auditory m100component 1 Relationship to heschlrsquos gyri BrainResearch Cognitive Brain Research 2 13ndash20

Richer F Alain C Achim A Bouvier G amp Saint-Hilaire J M(1989) Intracerebral amplitude distributions of the auditoryevoked potential Electroencephalography and ClinicalNeurophysiology 74 202 ndash208

Robinson S amp Vrba J (1999) Functional neuroimaging bysynthetic aperture magnetometry (SAM) In Yoshimoto et al(Eds) Recent advances in biomagnetism (pp 302 ndash305)Sendai Japan Tokyo University Press

Scherg M amp Von Cramon D (1985) Two bilateral sources ofthe late aep as identified by a spatio-temporal dipole modelElectroencephalography and Clinical Neurophysiology 6232ndash44

Scherg M amp Von Cramon D (1986) Evoked dipolesource potentials of the human auditory cortexElectroencephalography and Clinical Neurophysiology65 344 ndash360

Schmidt M F amp Konishi M (1998) Gating of auditoryresponses in the vocal control system of awake songbirdsNature Neuroscience 1 513ndash518

Sekihara K Nagarajan S S Poeppel D Marantz Aamp Miyashita Y (2001) Reconstructing spatio-temporalactivities of neural sources using an meg vectorbeamformer technique IEEE Transactions on BiomedicalEngineering 48 760ndash771

Sekihara K Poeppel D Marantz A Koizumi H ampMiyashita Y (1997) Noise covariance incorporatedmeg-music algorithm A method for multiple-dipoleestimation tolerant of the influence of background brainactivity IEEE Transactions on Biomedical Engineering44 839 ndash847

Stevens K N (1999) Acoustic phonetics CambridgeMIT Press

Stufflebeam S M Poeppel D Rowley H A amp RobertsT P (1998) Peri-threshold encoding of stimulus

Houde et al 1137

frequency and intensity in the m100 latency NeuroReport9 91ndash94

Suga N amp Schlegel P (1972) Neural attenuation ofresponses to emitted sounds in echolocating rats Science177 82ndash84

Suga N amp Shimozawa T (1974) Site of neural attenuation ofresponses to self-vocalized sounds in echolocating batsScience 183 1211 ndash1213

von Bekesy G (1949) The structure of the middle ear and thehearing of onersquos own voice by bone conduction Journal ofthe Acoustical Society of America 21 217 ndash232

Weiskrantz L Elliott J amp Darlington C (1971) Preliminaryobservations on tickling oneself Nature 230 598 ndash599

Woods D L Knight R T amp Neville H J (1984)Bitemporal lesions dissociate auditory evoked potentialsand perception Electroencephalography and ClinicalNeurophysiology 57 208 ndash220

Yates A J (1963) Delayed auditory feedback PsychologicalBulletin 60 213 ndash232

Zatorre R J Evans A C Meyer E amp Gjedde A (1992)Lateralization of phonetic and pitch discrimination in speechprocessing Science 256 846 ndash849

1138 Journal of Cognitive Neuroscience Volume 14 Number 8

recorded magnetic fields. Acoustic input was delivered to the subject via air tube-based earphones. Triggering of the data acquisition system and of the stimuli delivered to the ears was controlled and differed for each of the three experiments in the study.

The schematic for our first experiment is shown in Figure 2. This experiment consisted of two successive conditions: speaking and tape playback. In the speaking condition (Figure 2a), subjects were instructed to produce the short vowel sound by phonating with their jaw and tongue relaxed and stationary. The subject's speech was picked up by a microphone and fed to a tape recorder, his earphones, and the trigger input for MEG data acquisition. A total of 100 utterances were recorded on tape. In the subsequent tape playback condition (Figure 2b), subjects were instructed to remain silent while they heard a tape recording of their utterances from the speaking condition.

Results for one subject in the speaking condition are shown in Figure 3, which shows the evoked magnetic field response recorded at each detector position, averaged over the 100 utterance trials. In the figure, the traces show the average response at each detector position, aligned to the onset of vocalization (gray tick marks). For each condition (speaking, tape playback), the RMS responses at each detector were averaged together for the left and right hemispheres, as shown in Figure 4. The figure shows RMS responses averaged over detectors in the left (upper panel) and right (lower panel) hemisphere detector arrays. In each panel, the thick trace shows the average RMS response in the speaking condition and the thin trace shows the same for the tape playback condition. The vertical axis of each panel indicates the RMS field strength (fT) of the response, while the horizontal axis indicates time (msec) relative to the onset of vocalization, which occurs at 0 msec. Many differences between responses in the speaking and tape playback conditions can be seen in the figure.

Figure 3. RMS responses across detector arrays recorded in the speaking condition of Experiment 1 from a single representative subject. The traces show the average response at each detector position, aligned to the onset of vocalization (gray tick marks). (Gaps in the detector arrays are due to offline detectors.)

Figure 2. Setup for the two conditions of Experiment 1.


At about 100 msec poststimulus onset, most of the traces show a peak in RMS amplitude. This peak is referred to as the M100 response, and there are large differences in its amplitude across conditions in both hemispheres for this subject. In the left hemisphere, the M100 response in the speaking condition is greatly suppressed compared to that seen in the tape playback condition. In the right hemisphere, this suppression is even greater: in the tape playback condition there is quite a large M100 response, but in the speaking condition the M100 is essentially absent.

Figure 5 shows the same response waveforms as the previous figure, averaged across all eight subjects. In this figure, in addition to the average response waveforms, vertical bars indicating standard errors are also shown. The asterisks in each panel indicate latencies at which the speaking and tape playback responses differed significantly at p < .001. These plots show that in both hemispheres, mean M100 responses were significantly smaller in the speaking condition than in the tape playback condition. It appears that left hemisphere responses are larger than right hemisphere responses in the tape playback condition, but that both hemispheres are inhibited to about the same level in the speaking condition. However, even in the speaking condition, the left hemisphere M100 response appears more focal in time than that of the right hemisphere. Less consistent are the longer latency responses. For example, the amplitude of the M200 response in the left hemisphere was significantly larger in the tape playback condition, while in the right hemisphere it was significantly larger in the speaking condition.

The acoustic signals reaching the ears via air conduction (the so-called "side-tone" sound) were adjusted to have the same amplitude in both the speaking and tape playback conditions; all our subjects reported that the intensities of the microphone and tape signals were perceptually identical. However, in the speaking condition, because the subject hears both the side tone and bone conduction of his voice, the speech heard by a subject is actually about twice the intensity of (i.e., about 3 dB louder than) the speech heard in the tape playback condition (von Bekesy, 1949). Thus, a priori, we would expect this amplitude difference to be a possible confound in the experiment. The size of the evoked magnetic field response in the auditory cortex, and in particular the amplitude of the M100 response, has been shown to be a monotonically increasing function of audio input level (Stufflebeam, Poeppel, Rowley, & Roberts, 1998). From this we would expect the M100 response to be larger in the speaking condition than in the tape playback condition. However, this response difference should be small: at the audio levels used in this experiment (80 dBA at the earphones), the effect of audio amplitude on the M100 response has nearly saturated. Additionally, as we have seen, the actual M100 response recorded in the speaking condition was much smaller than that recorded in the tape playback condition, a result opposite to what would be predicted from the audio amplitude differences.
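The 3-dB figure can be checked with a one-line calculation. If the bone-conducted and air-conducted (side tone) components are assumed to contribute roughly equal acoustic power (an illustrative assumption on our part; von Bekesy, 1949, gives the detailed account), the combined signal carries about twice the power of either component alone, an increase of

    10 log10(2) ≈ 3 dB.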

Source localizations were performed on the spatio-temporal magnetic field responses recorded 50–200 msec following stimulus onset. Across all our stimulus conditions (mic vs. tape vs. tone response), we found no statistically significant difference (p > .5) in the location of the equivalent dipole in each hemisphere that accounts for these magnetic field responses.

Figure 5. RMS response waveforms in Experiment 1, averaged across all subjects. The vertical bars in each trace show standard errors, while the asterisks show intervals where the response waveforms from the microphone condition (thick traces) and tape condition (thin traces) differ significantly at p < .001, for both hemispheres.

Figure 4. RMS responses recorded in Experiment 1, averaged over all 37 detectors in each hemisphere's detector array, from a single representative subject. In each plot, the thick traces are responses in the speaking condition; the thin traces are responses in the tape playback condition. The onset of the audio stimulus is at 0 msec.


These localization data suggest that the observed response suppression/attenuation arises from a restricted cortical region, presumably A1 and its immediate environs.

Overall, all subjects' M100 responses were localized to the temporal cortex in the area of the auditory cortex. We infer this because lesion studies, intracranial recordings, and source modeling studies provide converging evidence that the neuronal generators of the M100 response are located in the auditory regions of the temporal lobes and include A1 and its immediate environs (Ahissar et al., 2001; Picton et al., 1999; Liegeois-Chauvel, Musolino, Badier, Marquis, & Chauvel, 1994; Reite et al., 1994; Richer, Alain, Achim, Bouvier, & Saint-Hilaire, 1989; Scherg & Von Cramon, 1985, 1986; Woods, Knight, & Neville, 1984; Hari, Aittoniemi, Jarvinen, Katila, & Varpula, 1980).

Experiment 2

The results of Experiment 1 clearly confirmed that the M100 response is suppressed in the speaking condition. One possible mechanism that could cause this suppression is nonspecific attenuation; that is, neural signals arising from activity in the motor cortex might directly and broadly inhibit auditory cortex activity.

If auditory cortex activity is nonspecifically suppressed during speaking, then it should be less responsive to all auditory signals. In Experiment 2, we tested this possibility by examining the M100 response to tones heard during speech that was either self-produced or played back from tape.

In this experiment, we measured the responses evoked by tone pips under three successive conditions, as shown in Figure 6. In the first condition (tones alone; Figure 6a), subjects remained silent and heard 1-kHz tone pips. In the second condition (tones and speaking; Figure 6b), subjects again heard these tone pips, but this time the tones were presented while subjects produced long utterances of the vowel (again with no movement of their jaw or tongue). In the third condition (tones and tape playback; Figure 6c), subjects again remained silent and heard both the tone pips and tape playback of their speech produced in the previous condition. For the last two conditions, in which the subject heard tone pips and speech, the speech was attenuated by 20 dB to reduce its masking of the tone pips.

The results of Experiment 2 are summarized in Figure 7. The figure shows a plot of the mean RMS response across detectors and subjects for each condition and in each hemisphere. Dashed traces show subjects' responses to hearing the tones in silence (tones alone condition). Thick traces show responses to hearing the tones while subjects produced speech (tones and speaking condition). Thin traces show responses to hearing the tones along with the taped speech playback (tones and tape playback condition). As in Figure 5, the vertical bars are standard errors. Asterisks indicate time intervals during which responses to tones heard while producing speech (tones and speaking condition) differed significantly (p < .001) from responses to tones heard with taped speech playback (tones and tape playback condition).

For most of the traces in Figure 7, the M100 and M200 responses to the tones stand out clearly across conditions. For both hemispheres, these responses are very prominent for tones heard in silence (dashed traces). In the left hemisphere, there was a reduction in the M100 and M200 responses to the tones in the speaking (thick traces) and tape playback (thin traces) conditions. Interestingly, the response reduction in the speaking condition appeared to be greater than the response reduction in the tape playback condition. Indeed, the statistical analysis confirmed that the M100 response in the tones and speaking condition was significantly smaller than the M100 response in the tones and tape playback condition.

Figure 6. Setup for the three conditions of Experiment 2.


Nevertheless, the difference was small. In the right hemisphere, there was an approximately equal reduction in the M100 and M200 responses to the tones in the speaking and tape playback conditions. Within the M100–M200 region, responses in these two conditions differed significantly at only one latency, as shown by the asterisk.

It appears that, at most, the act of speaking creates only a limited degree of nonspecific attenuation of responses to the tones in the left hemisphere that is not seen in the tape playback condition. To quantify how much nonspecific attenuation contributes to the suppression of the response to self-produced speech, we can estimate the signal reductions needed to produce the response reductions seen in these experiments. From the results of Experiment 1, we have seen that, compared to speech from the tape playback, self-production of speech created a 30% reduction in M100 amplitude in the left hemisphere and a 15% reduction in the right hemisphere, equivalent to 13- and 7-dB decreases in effective input signal, respectively (Note 1). In Experiment 2, in the left hemisphere, there was possibly an extra 7% reduction in the M100 response to tones while subjects spoke, as compared to the M100 response to tones during tape playback, which is equivalent to only a 3-dB reduction in effective input signal. Thus, it appears that nonspecific attenuation cannot account for all of the response suppression recorded for self-produced speech.
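These decibel equivalents follow from a rough linear application of the slope reported in Note 1 (a 34% change in M100 RMS amplitude per 15-dB change in SPL):

    (30% / 34%) × 15 dB ≈ 13 dB
    (15% / 34%) × 15 dB ≈ 7 dB
    (7% / 34%) × 15 dB ≈ 3 dB.

The conversion is only approximate, since the slope was measured near detection threshold rather than at the suprathreshold levels used here.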

It has also been argued that any attenuation of responses during vocalization could occur due to bone conduction and activation of middle-ear muscles, or due to inhibition at the level of the brainstem (Papanicolaou et al., 1986). Thus, to further test the hypothesis of nonspecific attenuation, we measured brainstem evoked responses to click trains in five subjects under three conditions identical to those in Experiment 2: silence, produced speech, and tape-recorded speech. In the latter two conditions, we attenuated the speech background by 20 dB to obtain a clear evoked response to the clicks. For each of these conditions, we recorded the brainstem evoked response and measured the latency and amplitude of Waves I–V. The results from these experiments indicated that there were no statistical differences in either the amplitudes or the latencies of these waves between the three background conditions (p > .2). These results provide further support for the hypothesis that our observed effects were cortical in origin.

Experiment 3

Another possible cause of the response reduction to self-produced speech is that activity in the auditory cortex reflects a comparison between the incoming signal and an internally generated prediction of what that signal would be.

Figure 7. RMS waveforms for responses to 1-kHz tone pips, averaged across all subjects, for the three conditions of Experiment 2: the tones alone condition (dashed traces), the tones and speaking condition (thick traces), and the tones and tape playback condition (thin traces). The vertical bars in each trace show standard errors, while the asterisks show intervals where the response waveforms from the tones and speaking condition and the tones and tape playback condition differed significantly at p < .001, for both hemispheres.

Figure 8. Setup for the two conditions of Experiment 3.


If this were the case, then altering the incoming signal should create a mismatch with that expectation, which should reduce or abolish the response suppression.

In Experiment 3, we tested this possibility by altering the auditory feedback to the subject during self-production of speech. The setup for Experiment 3 is shown in Figure 8. Like Experiment 1, this experiment consisted of two conditions: speaking and tape playback. However, instead of hearing their speech (self-produced or from tape playback), subjects heard a sum of their speech plus the output of a white noise generator that produced noise bursts gated to the duration of the speech utterance. Subjects reported that they could not hear their own speech and that they heard only noise gated to their vowel production.

The results of Experiment 3 are shown in Figure 9. For both hemispheres, the thick traces show the average evoked response of subjects hearing gated noise in the speaking condition, while the thin traces show their responses to gated noise in the tape playback condition. Asterisks mark poststimulus latencies where the evoked responses in the two conditions differ significantly at p < .001, for both hemispheres. The figure shows that subjects' evoked responses in the two conditions are significantly different at a number of latencies. In the left hemisphere, the responses in the speaking and tape playback conditions differ significantly around the M100 responses and at a latency of 150 msec. In the right hemisphere, the speaking and tape playback responses differ significantly at the 50- and 200-msec latencies. Since we are limiting our analysis of these results to M100 responses, the only response differences of interest are the significant differences seen in the M100 region of the left hemisphere responses. It is clear from the figure, however, that these response differences arise principally from a latency difference between M100 responses: the M100 response to gated noise in the tape playback condition is about 20 msec earlier than the M100 response to gated noise in the speaking condition. The amplitudes of the two M100 responses appear nearly identical.

In sum, Figure 9 shows that the M100 amplitude suppression seen in Experiment 1 was abolished in both hemispheres: the amplitude of the M100 response to noise generated from subjects' speech via the microphone was as large as the M100 response to noise generated from the tape playback of their speech. Thus, when we altered subjects' speech feedback in Experiment 3, we found that the response suppression to self-produced speech disappeared, which is consistent with the suppression resulting from auditory input matching an internally expected input.

DISCUSSION

The present study demonstrates that the human auditory cortex responds differently to self-produced speech than to externally produced speech. In Experiment 1, significant suppression in the amplitude of M100 responses was recorded in the speaking condition in both hemispheres, confirming that self-production of speech suppresses the response of the auditory cortex to that speech. In Experiment 2, M100 responses to tone pips were seen to be modestly more strongly suppressed by self-produced than by externally produced speech. However, this relatively weak extra suppression was not sufficient to explain the suppression seen in Experiment 1. On the other hand, in Experiment 3, we were able to abolish the suppression of M100 responses to self-produced speech by altering the feedback heard by subjects. This last result is consistent with the hypothesis that the suppression of the auditory cortex response to self-produced speech results from a match with expected auditory feedback.

Comparison with Other Studies

Experiment 1 is quite similar to the study by Curio et al. (2000), which also examined M100 response differences between speaking and tape playback conditions. Those authors found M100 amplitude differences across listening conditions in both hemispheres, but, unlike in the present study, these differences were not significant in the right hemisphere. Further, in contradiction to our studies, Curio et al. described significant differences in M100 latencies across listening conditions in both hemispheres.

Figure 9. RMS response waveforms in Experiment 3, averaged across all subjects. The vertical bars in each trace show standard errors, while the asterisks show intervals where the response waveforms from the speaking condition (thick traces) and tape playback condition (thin traces) differ significantly at p < .001, for both hemispheres.


The different results for M100 latencies could be accounted for by the fact that we matched the acoustic waveform presented to each subject, while Curio et al. behaviorally matched the perceptual intensity between the listening conditions. Because subjects could set the tape playback volume to be different from their speech feedback volume, differing audio levels in the two experimental conditions could account for some of the M100 latency differences that they report. With MEG data acquisition triggered by speech signal onset, triggering occurs when the audio signal exceeds some threshold voltage. As a result, the triggering time is sensitive to the amplification of the signal: the more a signal is amplified, the faster it reaches the trigger threshold. Thus, if tape playback volume levels were lower than speech feedback volume levels, triggering from tape playback could be delayed compared to the speaking condition. If so, then for the same true M100 latency, M100 peaks would appear to have shorter latencies in the tape playback condition.
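The triggering argument is easy to verify numerically. The sketch below is our illustration (the envelope shape, gain values, and threshold are arbitrary assumptions, not parameters from the recordings): a fixed voltage threshold fires later on an attenuated copy of the same onset.

    import numpy as np

    fs = 1041.0                          # sampling rate (Hz), as in the recordings
    t = np.arange(0, 0.05, 1 / fs)       # 50 msec of time axis
    envelope = 1.0 - np.exp(-t / 0.01)   # rising vocalization onset (arbitrary shape)
    threshold = 0.5                      # trigger voltage threshold (arbitrary units)

    for gain in (1.0, 0.7):              # louder speech feedback vs. quieter playback
        crossing = np.argmax(gain * envelope > threshold)
        print(f"gain {gain}: trigger fires at {1000 * t[crossing]:.1f} msec")

With the lower gain, the trigger fires several milliseconds later, so a peak with a fixed true latency would appear to have a shorter latency relative to the delayed trigger.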

In Experiment 3, the amplitude difference in M100 responses between the speaking and tape playback conditions was abolished by using gated noise to distort the feedback heard by subjects. Interestingly, however, this feedback manipulation created a latency difference between the two conditions. In the left hemisphere, the M100 response to gated noise triggered by tape playback was 15 msec earlier than that for gated noise triggered by the subject's own speech, while in the right hemisphere such a latency difference was not seen. Curio et al. (2000) also found that the reduced latency of M100 responses to tape playback versus self-produced speech was significantly more pronounced in the left hemisphere. Like Curio et al., we attribute this left hemisphere latency difference to two factors. First, this latency difference may represent differential processing of externally produced versus self-produced auditory feedback. Although the gated noise altered the spectral characteristics of subjects' speech feedback, such that it was likely to mismatch any internally generated expectations of how their feedback would sound, the gated noise did not alter the temporal characteristics of their speech feedback: subjects still heard auditory feedback begin at the same time as they commenced phonation. This temporal predictability could potentially have allowed subjects to process gated noise arising from their own speech, whose onset they could predict, differently from gated noise arising from tape playback, whose onset they could not predict. Second, the fact that this latency difference was seen primarily in the left hemisphere is consistent with the left hemisphere's dominance in the production of speech in right-handed speakers. Other aspects of our results may also reflect this left hemisphere dominance. In Experiment 1, the M100 response to tape playback speech was larger in the left hemisphere than in the right, and the M100 response to self-produced speech was temporally sharper in the left hemisphere than in the right. Additionally, in Experiment 2, in the left hemisphere the M100 response to tones was significantly more suppressed by self-produced speech than by tape playback speech, a difference not seen in the right hemisphere.

The results from Experiment 2 are mostly consistent with the recent experiments of Numminen and Curio (1999). They reported that neuromagnetic responses to short recorded vowel sounds were delayed and dampened relative to background-free presentations. They did not report how much they externally attenuated the produced speech signal in order to obtain a clear response to the probe-vowel sound. In our experiments, we used probe tones instead of vowel sounds, and we attenuated the produced speech signal at the ear by 20 dB. Surprisingly, we did not observe any latency differences in the M100 responses to tones between the three conditions. We believe that the lack of a latency effect could be accounted for by the fact that we used tones instead of vowel sounds (Diesch & Luce, 1997; Poeppel et al., 1996; Kuriki & Murase, 1989). Furthermore, by an argument analogous to that discussed above for the Curio et al. (2000) latency results, it is also plausible that amplitude mismatches in the masking sounds could account for discrepancies in the latency effect (Hari & Makela, 1988). While Numminen and Curio identified auditory interference as the main cause of these modifications, their results also provide evidence against the nonspecific attenuation hypothesis, although they do not discuss their results in this form. For instance, they found an additional 6–9% decrease in the amplitude of responses to probe-vowel sounds during a produced-speech background when compared with a tape-recorded speech background. These amplitude differences are consistent with our Experiment 2 results.

Gunji et al. found that vocalization-related cortical magnetic fields reflected six sources whose activities temporally overlapped in the period 0–100 msec after vocalization onset (Gunji, Hoshiyama, & Kakigi, 2000; Gunji, Kakigi, & Hoshiyama, 2000). Sources 1 and 2 were activated approximately 150 msec before vocalization onset and were located in the laryngeal motor areas of the left and right hemispheres, respectively. Kuriki, Mori, and Hirata (1999) have also found a motor planning center for speech articulation that is activated 120–320 msec before the onset of vocalization and is located in a region around the superior end of the left insula. Gunji et al.'s Sources 5 and 6 were located in the truncal motor area of each hemisphere and behaved similarly to Sources 1 and 2. In contrast, their Sources 3 and 4 were located in the auditory cortices of the left and right hemispheres, respectively, and were activated after vocalization onset. However, these experiments did not explore the modulation of responses in the auditory or motor cortex due to vocalization or feedback alteration.


In our experiments, because we were unable to simultaneously measure activity from motor and supplementary motor areas using a 37-channel sensor array over each hemisphere, we focus here only on activity from the auditory cortices of both hemispheres.

More relevant to our results are the studies of Hirano et al. (Hirano et al., 1996; Hirano, Kojima, et al., 1997; Hirano, Naito, et al., 1997), who used PET to examine cortical activation during speaking. In their first study, Hirano et al. found a lack of activation in the auditory cortex while subjects vocalized. In their second study, Hirano et al. altered how subjects heard their own vocalizations and this time recorded significant activation of the auditory cortex. These results are consistent with our findings in Experiments 1 and 3.

Why Is the Response to Self-Produced Speech Suppressed?

The present study, along with recent MEG studies by Curio et al. and other brain imaging studies using PET or fMRI, demonstrates that during speech production the auditory cortex suppresses its response to expected acoustic signals (Hirano et al., 1996; Hirano, Kojima, et al., 1997; Hirano, Naito, et al., 1997). Does this suppression serve any function? We speculate about two possibilities.

Auditory Perception

One possibility is that the suppression results from the process of distinguishing self-produced from externally produced sources. It would be useful for sensory systems to make this distinction, since sensory information from these two types of sources is commonly used for different purposes: sensory information from self-produced motor actions could potentially be used for feedback control, whereas sensory information from external sources is primarily used for recognition.

But how could the sensory systems distinguish self-produced from externally produced sensory information? Previous researchers have proposed that this is done by having motor actions produce expectations of their sensory consequences (Jeannerod, 1988). These sensory outcome predictions are then compared with the actual sensory input, and whatever matches the predictions is inhibited. In that way, the sensory consequences of self-produced actions could be filtered from incoming sensory data, thus highlighting any unpredicted, and therefore external (or novel), stimulus that should potentially be attended to.
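As a caricature of this idea (our toy illustration, not a model from the paper or the cited literature), the comparison can be written in a few lines: input that matches the prediction within some tolerance is attenuated, while mismatching input passes through. The tolerance, gain, and sample-wise matching are all placeholder assumptions.

    import numpy as np

    def comparator(actual, predicted, tolerance=0.1, gain=0.3):
        # Attenuate input samples that match the predicted feedback;
        # leave mismatching (putatively external) input unchanged.
        match = np.abs(actual - predicted) < tolerance
        return np.where(match, gain * actual, actual)

The point is only that a match-dependent gain yields suppression for predicted input and none for altered input, as in Experiment 3.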

For a number of reasons, the results of our experiments are consistent with the above account. First, although we found significant suppression of the responses of the auditory cortex to self-produced speech, we found little evidence that the act of producing speech suppresses responses to other sounds (in our case, the 1-kHz tone). This finding is consistent with the hypothesis that only responses to self-produced sounds are filtered from the auditory cortex. In addition, we found that altering the auditory feedback with gated noise abolished the speaking-induced suppression in the auditory cortex, consistent with the hypothesis that the filtering out of auditory information arising from self-produced speech is done by comparison with predicted auditory feedback.

Recently, Blakemore et al. have recorded the same types of response suppression phenomena in the somatosensory cortex, suggesting that the need to distinguish self-produced from externally generated sensory input may be a general property of sensory systems. In their series of experiments, Blakemore et al. used both behavioral and physiological procedures to examine responses to both self-produced and externally produced somatosensory stimulation. In one experiment, they examined the well-known phenomenon that people cannot tickle themselves (Weiskrantz, Elliott, & Darlington, 1971). This phenomenon could be explained by supposing that somatosensory responses are inhibited when the incoming sensory (touch) information matches a prediction of that input generated by the person's own motor system. To test this hypothesis, these investigators had subjects tickle their palm via an apparatus that could introduce a time delay between their movement and the touch stimulation of their palm (Blakemore, Frith, & Wolpert, 1999). When this was done, subjects reported that with increasing delay the stimulation of their palm became increasingly ticklish. That the perceived stimulation increased with delay could be explained by supposing that the delay made the incoming sensory information mismatch in time the predicted sensory input. This experiment was followed up in an fMRI study examining activation of the somatosensory cortex and cerebellum in response to both self-produced and externally generated palm stimulation (Blakemore, Wolpert, & Frith, 1999). It was found that activation of the somatosensory cortex by self-produced palm stimulation was suppressed compared to externally produced palm stimulation, a direct analog of the results recorded in the auditory cortex in our experiments.

Speech Motor Control

The suppression could also serve to regulate how much auditory feedback is used in the ongoing control of speech.

A problem with using auditory feedback to control speech is the delay incurred by the processing of the auditory input. To control speech, auditory feedback must convey information about the positions of the vocal tract articulators. From speech acoustics, we know that formant frequencies and pitch are the acoustic parameters that convey this information (Stevens, 1999).


Extracting these parameters from the type of signal provided by the cochlea requires significant signal processing, which in turn suggests the possibility of significant delay if a succession of auditory brain areas is needed for this processing (Jurgens, 2002; Burnett, Freedland, Larson, & Hain, 1998; Perkell, 1997). In addition, there will be delays associated with the generation of a response in the motor cortex. There will also be intrinsic delays in transmitting the neural signals to the appropriate muscles, as well as delays in the muscles' responses to these signals (Jurgens, 2002). Together, these factors could contribute to a significant feedback loop delay between auditory input and motor output.

Unfortunately, we know from control theory that, in general, using feedback to control a system with a large feedback loop delay will not work, because the system becomes unstable (Franklin, Powell, & Emami-Naeini, 1991). Intuitively, this instability arises because the delay makes the feedback uncorrelated with the state of the system: sensory feedback from some time in the past does not entirely reflect the current state of the system. Yet auditory feedback does appear to be involved in the online control of some aspects of speech production. As noted in the Introduction, if the pitch of a speaker's feedback is perturbed, the speaker will typically compensate within 100–200 msec (Burnett et al., 1998; Kawahara, 1993; Elman, 1981). Thus, in spite of a large (100–200 msec) feedback loop delay, auditory feedback does stably regulate a speaker's pitch. How is this possible? Again from control theory, we know that when feedback is delayed, corrupted by noise, or otherwise made uncorrelated with the current motor output, a successful approach to feedback control is to attenuate the sensory feedback to the degree that it is uncorrelated. Specifically, in feedback control based on Kalman filtering, the gain on sensory feedback is reduced in proportion to the degree to which that feedback fails to reflect the current system state (Jacobs, 1993). In an analogous way, it may be that the suppression of the auditory cortex during speaking functions to attenuate the auditory feedback to a level commensurate with its delay, such that it can be used properly in the control of speech.
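A scalar example makes the Kalman-gain point concrete (a textbook sketch under assumed numbers, not a model fit to these data): the gain applied to feedback, K = P / (P + R), shrinks as the measurement-noise variance R grows, and delayed, decorrelated feedback behaves like feedback with a large R.

    # Scalar Kalman gain, K = P / (P + R), for a direct observation of the state.
    P = 1.0                        # prior variance of the state estimate (assumed)
    for R in (0.1, 1.0, 10.0):     # increasingly unreliable sensory feedback
        K = P / (P + R)
        print(f"R = {R:5.1f} -> feedback gain K = {K:.2f}")

As R rises from 0.1 to 10, K falls from about 0.91 to 0.09; that is, the controller attenuates feedback it cannot trust.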

Finally, it is possible that both of the above accounts are true. It may be that the suppression of self-produced speech acts as a filter for incoming sensory data, allowing attention to be paid to externally produced sensations, while at the same time properly attenuating the auditory feedback for use in the online control of speech. Future experiments must be designed to distinguish between these hypothesized possibilities.

METHODS

Subjects

Eight male volunteers (ages 25–40) participated in Experiments 1 and 3 of this study. Of these volunteers, six also participated in Experiment 2. All subjects gave their informed consent. All studies were performed with the approval of the UCSF Committee for Human Research.

Experimental Design

Magnetic fields were recorded from both hemispheres in a shielded room using two 37-channel biomagnetometers with SQUID-based first-order gradiometer sensors (Magnes II, 4-D Neuroimaging, San Diego, CA). A schematic of the experimental setup is shown in Figure 1. A directional microphone was placed in the chamber at a distance where it did not distort the recorded magnetic fields. Triggering of the data acquisition system and of the stimuli delivered to the ears was controlled, and it differed for each of the three experiments in the study.

Fiduciary points were marked on the skin for later coregistration with structural magnetic resonance images, and the head shape was digitized to constrain subsequent source modeling. The sensor was initially positioned over the estimated location of the auditory cortices in both hemispheres, such that a dipolar M100 response was evoked by single 400-msec-duration tone pips (1 kHz, 5-msec rise/fall ramps, 90 dB SPL). Data acquisition epochs were 600 msec in total duration, with a 100-msec prestimulus period referenced to stimulus onset. Data were acquired at a sampling rate of 1041 Hz. The position of the sensor was then refined so that a single-dipole localization model of the averaged evoked magnetic field responses to 100 tones resulted in a correlation and goodness-of-fit greater than 95%.

With satisfactory sensor positioning over the auditory cortices in both hemispheres, but before starting the experiments, subjects practiced producing the neutral vowel sound by phonating while relaxing their upper vocal tract, generating no movement of their jaw or tongue. Preliminary experiments revealed that this form of speech production produced no movement artifacts in these magnetic field recordings.

During the speaking condition of Experiments 1 and 3, the audio onset of each utterance, picked up by the microphone, triggered the MEG data acquisition system, which recorded 1 sec of data (300 msec pretrigger, 700 msec posttrigger) at a sampling rate of 1041 Hz. Subjects were instructed to produce short vowel sounds by phonating their vocal cords without any movement of their jaw or tongue. Responses to 100 such utterances were recorded, and these magnetic field recordings were then averaged to obtain the evoked magnetic field responses to self-produced speech for each MEG detector channel. In the subsequent tape playback condition, the audio signal from the tape recorder was fed to the subject's earphones and the trigger input for MEG data acquisition. Again, the magnetic field recordings were averaged to obtain the evoked magnetic field responses.


During the tape playback condition of Experiments 1 and 3, the audio level of the signal fed to the earphones and trigger was adjusted to be the same as that of the speaking condition. Because of this, the trigger of MEG data acquisition (which was based on an audio voltage threshold) occurred at the same time relative to utterance onset in both conditions. The equal audio levels ensured that triggering latency differences could not account for any differences between the evoked responses to self-produced and tape-recorded speech. In Experiment 1, the subject heard these equal audio levels in the earphones, which meant that his side tone amplitude was the same in both the speaking and tape playback conditions. In Experiment 3, the audio signal for each utterance was used to trigger both the MEG data acquisition system and a white noise generator (20 Hz–20 kHz bandwidth, 85 dB SPL). In this case, the sum of the subject's audio signal and the noise was presented to the subject.

In Experiment 2, the onset of the tone pips triggered MEG data acquisition in each of the three conditions (tones alone, tones and speaking, and tones and tape playback). These 1-kHz tones each had a 400-msec duration, 5-msec onset and offset ramps, and a 0.8–1.2-sec ISI between tones. The subject heard the tones at 90 dB SPL. As with Experiment 1, data acquisition epochs were 1.0 sec in total duration, with a 300-msec prestimulus period. Data were acquired at a sampling rate of 1041 Hz. In each condition, the evoked magnetic field responses to the tone pips were obtained by averaging 100 epochs.

Epochs 1500 msec long, with a 500-msec prestimulus period, were collected at a sampling rate of 1041 Hz. The raw epoch magnetic field data from these 100 utterances were then averaged to obtain the evoked magnetic field responses to gated noise during self-production.

The voltage level of the audio signal fed to the earphones and the trigger in this condition was adjusted to be identical to the voltage levels recorded during the speaking condition, ensuring that triggering differences could not account for any differences between the evoked responses to noise in the self-produced and tape playback blocks.

Data Analysis

Data from the two sensor arrays positioned over each hemisphere were analyzed separately. From the evoked magnetic field response for each condition in each channel, the RMS value of the evoked magnetic field strength averaged across channels was computed. The RMS method of averaging (the square root of the mean of the squared channel field strengths) was used to avoid the cancellation that occurs when the magnetic fields from detectors on opposite sides of the current dipole (whose fields are roughly equal in magnitude but opposite in sign) are averaged together. This process resulted in one RMS time waveform for each hemisphere, for each condition, for each subject. For statistical comparisons across conditions, a one-way analysis of variance was performed, with condition as a factor and time as a repeated measure.

The RMS across channels is a convenient measure of activity that is independent of the position of individual sensors. Following optimal placement of the sensor for a significant M100 dipolar response, the RMS has been shown to correlate well with the "equivalent current-dipole strength" (also called the "Q value") of a dipole positioned in and around the auditory cortex. The RMS measure is also correlated with the data obtained from (1) the channel with the largest signal-to-noise ratio and (2) the first two principal components of the array response (Ahissar et al., 2001; Mahncke, 1998).
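In array terms the measure is simple to state (a sketch, with an assumed 37-channel-by-time evoked response standing in for real data):

    import numpy as np

    rng = np.random.default_rng(0)
    evoked = rng.standard_normal((37, 625))      # placeholder: channels x time, in fT

    # Square, average across channels, then take the square root, so that fields
    # of opposite polarity on the two sides of the dipole do not cancel.
    rms = np.sqrt(np.mean(evoked ** 2, axis=0))  # one RMS value per time point

This yields the one-waveform-per-hemisphere summary used throughout the Results.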

In sensor arrays composed of planar gradiometers (like the Neuromag 122 or 306), the channel with the largest signal-to-noise ratio is often closest to the source (as dictated by the physics). However, in sensor arrays composed of axial gradiometers (as in our Magnes II system), the channel with the largest signal-to-noise ratio reflects the peak of the magnetic field, which is not the channel closest to the source. Moreover, the "best" channel could reflect either a peak "inward" or "outward" field, requiring signed inversion when averaging such data across subjects. Finally, since the sensor array position can vary considerably across subjects, it is difficult to use procedures typical of EEG and whole-head studies, wherein single-channel data are often picked for analysis across subjects. These considerations led us to use the RMS instead of picking specific channels for our analysis.

Single Dipole Model Fit

A single equivalent current-dipole model was calculated separately for each hemisphere, for each poststimulus time point, using standard 4-D Neuroimaging software operating on data that had been filtered between 1 and 20 Hz. The localization algorithm used an iterative least-squares minimization to compute the strength and location of a single dipole, in a spherical volume of uniform conductivity, that could account for the sensor data. Dipole fits were accepted based on a local correlation maximum criterion of 95% and goodness-of-fit values greater than 95%.

The MEG anatomical reference frame was established using a digital device called the "sensor position indicator." A set of three table-mounted receivers and one array-mounted receiver triangulates the signal from a stylus transmitter positioned at fiducial reference points on the subject's head surface (typically the nasion, the left and right preauricular points, Cz, and the inion). The same stylus transmitter arrangement served to define the curvature of the head by tracing the surface, from which a "local sphere" model of the head was generated. To coregister the dipole locations on the MRI image, the fiduciary points serve as the basis for a common coordinate system. By superimposing these fiducial landmarks on MR images of the subject, it was possible to define the position of the computed point sources with an accuracy of 5 mm.

Multiple Dipole Localization Analysis

Multiple dipole localization analyses of the spatio-temporal evoked magnetic fields were also performed using Multiple Signal Classification (MUSIC) (Sekihara, Poeppel, Marantz, Koizumi, & Miyashita, 1997; Mosher, Lewis, & Leahy, 1992) and beamformer algorithms (Mosher et al., 1992; Sekihara, Nagarajan, Poeppel, Marantz, & Miyashita, 2001; Robinson & Vrba, 1999). MUSIC methods are based on estimation of a signal "subspace" from the entire spatio-temporal MEG data set using singular-value decomposition (SVD). A version of the MUSIC algorithm referred to as the "conventional" MUSIC algorithm was implemented in MATLAB under the assumption that the sources contributing to the MEG data arose from multiple dipoles (<37 in number) that were stationary and located within a spherical volume of uniform conductivity (Sekihara et al., 1997). The locations of the dipoles are typically determined by conducting a search over a three-dimensional grid of interest within the head. Given the sensor positions and the coordinates of the origin of a "local sphere" approximation of the head shape for each subject, a lead-field matrix was computed for each point in this 3-D grid. At each point in the grid, from the lead-field matrix and the covariance matrix of the spatio-temporal MEG data, the value of a MUSIC localizer function can then be computed (Sekihara et al., 1997, Equation (4)). Maxima of this localizer function then correspond to the locations of dipolar sources. We used the MUSIC algorithm to compute the localizer functions over a three-dimensional grid in the hemisphere contralateral to the stimulus delivery. The localizer was computed using the evoked magnetic field responses over a time window of 1000 msec following stimulus onset. For each subject, at each point in a 3-D grid (−4 < x < 6, 0 < y < 8, 3 < z < 11) in the left hemisphere, the localizer function was computed using the averaged evoked auditory magnetic field responses.
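In outline, the scan reduces to the following sketch (our paraphrase of conventional MUSIC; the lead fields and grid are placeholders, and the noise-covariance weighting of Sekihara et al., 1997, is omitted):

    import numpy as np

    def music_scan(B, leadfields, n_sources):
        # B: (n_channels, n_times) spatio-temporal evoked field.
        # leadfields: (n_grid, n_channels), one lead-field vector per grid point.
        U, s, Vt = np.linalg.svd(B, full_matrices=False)
        Es = U[:, :n_sources]                  # signal subspace from the SVD
        J = np.empty(len(leadfields))
        for i, l in enumerate(leadfields):
            # Fraction of lead-field power lying in the signal subspace;
            # values near 1 mark candidate dipole locations.
            J[i] = np.sum((Es.T @ l) ** 2) / np.sum(l ** 2)
        return J

Peaks of J over the grid then play the role of the localizer maxima described above.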

Acknowledgments

This work was funded in part by Scientific Learning Co. and a grant from the Whitaker Foundation.

Reprint requests should be sent to John F. Houde, W. M. Keck Center for Integrative Neuroscience, University of California, San Francisco, 513 Parnassus Avenue, S-762, San Francisco, CA 94143, USA, or via e-mail: houde@phy.ucsf.edu.

Note

1. In a recent study of M100 responses to tones, it was found that near detection threshold, a 15-dB change in SPL caused a 34% change in M100 RMS amplitude (Stufflebeam et al., 1998).

REFERENCES

Ahissar, E., Nagarajan, S., Ahissar, M., Protopapas, A., Mahncke, H., & Merzenich, M. M. (2001). Speech comprehension is correlated with temporal response patterns recorded from auditory cortex. Proceedings of the National Academy of Sciences, U.S.A., 98, 13367–13372.

Blakemore, S. J., Frith, C. D., & Wolpert, D. M. (1999). Spatio-temporal prediction modulates the perception of self-produced stimuli. Journal of Cognitive Neuroscience, 11, 551–559.

Blakemore, S. J., Wolpert, D. M., & Frith, C. D. (1999). The cerebellum contributes to somatosensory cortical activity during self-produced tactile stimulation. Neuroimage, 10, 448–459.

Burnett, T. A., Freedland, M. B., Larson, C. R., & Hain, T. C. (1998). Voice F0 responses to manipulations in pitch feedback. Journal of the Acoustical Society of America, 103, 3153–3161.

Cooper, W. E. (1979). Speech perception and production: Studies in selective adaptation. Norwood, NJ: Ablex.

Creutzfeldt, O., & Ojemann, G. (1989). Neuronal activity in the human lateral temporal lobe: III. Activity changes during music. Experimental Brain Research, 77, 490–498.

Creutzfeldt, O., Ojemann, G., & Lettich, E. (1989a). Neuronal activity in the human lateral temporal lobe: I. Responses to speech. Experimental Brain Research, 77, 451–475.

Creutzfeldt, O., Ojemann, G., & Lettich, E. (1989b). Neuronal activity in the human lateral temporal lobe: II. Responses to the subject's own voice. Experimental Brain Research, 77, 476–489.

Curio, G., Neuloh, G., Numminen, J., Jousmaki, V., & Hari, R. (2000). Speaking modifies voice-evoked activity in the human auditory cortex. Human Brain Mapping, 9, 183–191.

Demonet, J. F., Chollet, F., Ramsay, S., Cardebat, D., Nespoulous, J. L., Wise, R., Rascol, A., & Frackowiak, R. (1992). The anatomy of phonological and semantic processing in normal subjects. Brain, 115, 1753–1768.

Diesch, E., & Luce, T. (1997). Magnetic fields elicited by tones and vowel formants reveal tonotopy and nonlinear summation of cortical activation. Psychophysiology, 34, 501–510.

Elman, J. L. (1981). Effects of frequency-shifted feedback on the pitch of vocal productions. Journal of the Acoustical Society of America, 70, 45–50.

Franklin, G. F., Powell, J. D., & Emami-Naeini, A. (1991). Feedback control of dynamic systems (2nd ed.). Reading, MA: Addison-Wesley.

Gracco, V. L., Ross, D., Kalinowski, J., & Stuart, A. (1994). Articulatory changes following spectral and temporal modifications in auditory feedback. Journal of the Acoustical Society of America, 95, 2821.

Gunji, A., Hoshiyama, M., & Kakigi, R. (2000). Identification of auditory evoked potentials of one's own voice. Clinical Neurophysiology, 111, 214–219.

Gunji, A., Kakigi, R., & Hoshiyama, M. (2000). Spatiotemporal source analysis of vocalization-associated magnetic fields. Cognitive Brain Research, 9, 157–163.

Hari, R., Aittoniemi, K., Jarvinen, M. L., Katila, T., & Varpula, T. (1980). Auditory evoked transient and sustained magnetic fields of the human brain: Localization of neural generators. Experimental Brain Research, 40, 237–240.

Hari, R., & Makela, J. P. (1988). Modification of neuromagnetic responses of the human auditory cortex by masking sounds. Experimental Brain Research, 71, 87–92.

Hein, A. V., & Held, R. (1962). A neural model for labile sensorimotor coordination. In E. Bernard & M. Hare (Eds.), Biological prototypes and synthetic systems (Vol. 1, pp. 71–74). New York: Plenum.

Hickok, G., & Poeppel, D. (2000). Towards a functional neuroanatomy of speech perception. Trends in Cognitive Sciences, 4, 131–138.

Hirano, S., Kojima, H., Naito, Y., Honjo, I., Kamoto, Y., Okazawa, H., Ishizu, K., Yonekura, Y., Nagahama, Y., Fukuyama, H., & Konishi, J. (1996). Cortical speech processing mechanisms while vocalizing visually presented languages. NeuroReport, 8, 363–367.

Hirano, S., Kojima, H., Naito, Y., Honjo, I., Kamoto, Y., Okazawa, H., Ishizu, K., Yonekura, Y., Nagahama, Y., Fukuyama, H., & Konishi, J. (1997). Cortical processing mechanism for vocalization with auditory verbal feedback. NeuroReport, 8, 2379–2382.

Hirano, S., Naito, Y., Okazawa, H., Kojima, H., Honjo, I., Ishizu, K., Yenokura, Y., Nagahama, Y., Fukuyama, H., & Konishi, J. (1997). Cortical activation by monaural speech sound stimulation demonstrated by positron emission tomography. Experimental Brain Research, 113, 75–80.

Houde, J. F., & Jordan, M. I. (1998). Sensorimotor adaptation in speech production. Science, 279, 1213–1216.

Jacobs, O. L. R. (1993). Introduction to control theory (2nd ed.). Oxford, UK: Oxford University Press.

Jeannerod, M. (1988). The neural and behavioural organisation of goal-directed movements. Oxford, UK: Oxford University Press.

Jurgens, U. (2002). Neural pathways underlying vocal control. Neuroscience and Biobehavioral Reviews, 26, 235–258.

Kawahara, H. (1993). Transformed auditory feedback: Effects of fundamental frequency perturbation. Journal of the Acoustical Society of America, 94, 1883.

Kuriki, S., Mori, T., & Hirata, Y. (1999). Motor planning center for speech articulation in the normal human brain. NeuroReport, 10, 765–769.

Kuriki, S., & Murase, M. (1989). Neuromagnetic study of the auditory responses in right and left hemispheres of the human brain evoked by pure tones and speech sounds. Experimental Brain Research, 77, 127–134.

Lane, H., & Tranel, B. (1971). The Lombard sign and the role of hearing in speech. Journal of Speech and Hearing Research, 14, 677–709.

Lee, B. S. (1950). Some effects of side-tone delay. Journal of the Acoustical Society of America, 22, 639–640.

Leonardo, A., & Konishi, M. (1999). Decrystallization of adult birdsong by perturbation of auditory feedback. Nature, 399, 466–470.

Levelt, W. J. (1983). Monitoring and self-repair in speech. Cognition, 14, 41–104.

Levelt, W. J. M. (1989). Speaking: From intention to articulation. Cambridge: MIT Press.

Liegeois-Chauvel, C., Musolino, A., Badier, J. M., Marquis, P., & Chauvel, P. (1994). Evoked potentials recorded from the auditory cortex in man: Evaluation and topography of the middle latency components. Electroencephalography and Clinical Neurophysiology, 92, 204–214.

Mahncke, H. (1998). Oscillations and temporal processing in sensory cortex. Unpublished doctoral dissertation, University of California, San Francisco.

McCasland, J. S., & Konishi, M. (1981). Interaction between auditory and motor activities in an avian song control nucleus. Proceedings of the National Academy of Sciences, U.S.A., 78, 7815–7819.

McGuire, P. K., Silbersweig, D. A., & Frith, C. D. (1996). Functional neuroanatomy of verbal self-monitoring. Brain, 119, 907–917.

Mosher, J. C., Lewis, P. S., & Leahy, R. M. (1992). Multiple dipole modeling and localization from spatio-temporal MEG data. IEEE Transactions on Biomedical Engineering, 39, 541–557.

Muller-Preuss, P., & Ploog, D. (1981). Inhibition of auditory cortical neurons during phonation. Brain Research, 215, 61–76.

Numminen, J., & Curio, G. (1999). Differential effects of overt, covert and replayed speech on vowel-evoked responses of the human auditory cortex. Neuroscience Letters, 272, 29–32.

Numminen, J., Salmelin, R., & Hari, R. (1999). Subject's own speech reduces reactivity of the human auditory cortex. Neuroscience Letters, 265, 119–122.

Papanicolaou, A. C., Raz, N., Loring, D. W., & Eisenberg, H. M. (1986). Brain stem evoked response suppression during speech production. Brain and Language, 27, 50–55.

Perkell, J. (1997). Articulatory processes. In W. J. Hardcastle & J. Laver (Eds.), The handbook of phonetic sciences (pp. 333–370). Oxford, UK: Blackwell.

Picton, T. W., Alain, C., Woods, D. L., John, M. S., Scherg, M., Valdes-Sosa, P., Bosch-Bayard, J., & Trujillo, N. J. (1999). Intracerebral sources of human auditory-evoked potentials. Audiology and Neuro-otology, 4, 64–79.

Poeppel, D., Yellin, E., Phillips, C., Roberts, T. P., Rowley, H. A., Wexler, K., & Marantz, A. (1996). Task-induced asymmetry of the auditory evoked M100 neuromagnetic field elicited by speech sounds. Brain Research: Cognitive Brain Research, 4, 231–242.

Reite, M., Adams, M., Simon, J., Teale, P., Sheeder, J., Richardson, D., & Grabbe, R. (1994). Auditory M100 component 1: Relationship to Heschl's gyri. Brain Research: Cognitive Brain Research, 2, 13–20.

Richer, F., Alain, C., Achim, A., Bouvier, G., & Saint-Hilaire, J. M. (1989). Intracerebral amplitude distributions of the auditory evoked potential. Electroencephalography and Clinical Neurophysiology, 74, 202–208.

Robinson, S., & Vrba, J. (1999). Functional neuroimaging by synthetic aperture magnetometry (SAM). In Yoshimoto et al. (Eds.), Recent advances in biomagnetism (pp. 302–305). Sendai, Japan: Tohoku University Press.

Scherg, M., & Von Cramon, D. (1985). Two bilateral sources of the late AEP as identified by a spatio-temporal dipole model. Electroencephalography and Clinical Neurophysiology, 62, 32–44.

Scherg, M., & Von Cramon, D. (1986). Evoked dipole source potentials of the human auditory cortex. Electroencephalography and Clinical Neurophysiology, 65, 344–360.

Schmidt, M. F., & Konishi, M. (1998). Gating of auditory responses in the vocal control system of awake songbirds. Nature Neuroscience, 1, 513–518.

Sekihara, K., Nagarajan, S. S., Poeppel, D., Marantz, A., & Miyashita, Y. (2001). Reconstructing spatio-temporal activities of neural sources using an MEG vector beamformer technique. IEEE Transactions on Biomedical Engineering, 48, 760–771.

Sekihara, K., Poeppel, D., Marantz, A., Koizumi, H., & Miyashita, Y. (1997). Noise covariance incorporated MEG-MUSIC algorithm: A method for multiple-dipole estimation tolerant of the influence of background brain activity. IEEE Transactions on Biomedical Engineering, 44, 839–847.

Stevens, K. N. (1999). Acoustic phonetics. Cambridge: MIT Press.

Stufflebeam, S. M., Poeppel, D., Rowley, H. A., & Roberts, T. P. (1998). Peri-threshold encoding of stimulus frequency and intensity in the M100 latency. NeuroReport, 9, 91–94.

Suga, N., & Schlegel, P. (1972). Neural attenuation of responses to emitted sounds in echolocating bats. Science, 177, 82–84.

Suga, N., & Shimozawa, T. (1974). Site of neural attenuation of responses to self-vocalized sounds in echolocating bats. Science, 183, 1211–1213.

von Bekesy, G. (1949). The structure of the middle ear and the hearing of one's own voice by bone conduction. Journal of the Acoustical Society of America, 21, 217–232.

Weiskrantz, L., Elliott, J., & Darlington, C. (1971). Preliminary observations on tickling oneself. Nature, 230, 598–599.

Woods, D. L., Knight, R. T., & Neville, H. J. (1984). Bitemporal lesions dissociate auditory evoked potentials and perception. Electroencephalography and Clinical Neurophysiology, 57, 208–220.

Yates, A. J. (1963). Delayed auditory feedback. Psychological Bulletin, 60, 213–232.

Zatorre, R. J., Evans, A. C., Meyer, E., & Gjedde, A. (1992). Lateralization of phonetic and pitch discrimination in speech processing. Science, 256, 846–849.

McCasland J S amp Konishi M (1981) Interaction betweenauditory and motor activities in an avian song controlnucleus Proceedings of the National Academy of SciencesUSA 78 7815ndash7819

McGuire P K Silbersweig D A amp Frith C D (1996)Functional neuroanatomy of verbal self-monitoring Brain119 907 ndash917

Mosher J C Lewis P S amp Leahy R M (1992) Multipledipole modeling and localization from spatio-temporal megdata IEEE Transactions on Biomedical Engineering 39541 ndash557

Muller-Preuss P amp Ploog D (1981) Inhibition of auditorycortical neurons during phonation Brain Research 21561ndash76

Numminen J amp Curio G (1999) Differential effects of overtcovert and replayed speech on vowel-evoked responsesof the human auditory cortex Neuroscience Letters 27229ndash32

Numminen J Salmelin R amp Hari R (1999) Subjectrsquos ownspeech reduces reactivity of the human auditory cortexNeuroscience Letters 265 119 ndash122

Papanicolaou A C Raz N Loring D W amp EisenbergH M (1986) Brain stem evoked response suppressionduring speech production Brain and Language 2750ndash55

Perkell J (1997) Articulatory processes In W J Hardcastleamp J Laver (Eds) The handbook of phonetic sciences(pp 333 ndash370) Oxford UK Blackwell Publishers

Picton T W Alain C Woods D L John M S Scherg MValdes-Sosa P Bosch-Bayard J amp Trujillo N J (1999)Intracerebral sources of human auditory-evoked potentialsAudiology and Neuro-otology 4 64ndash79

Poeppel D Yellin E Phillips C Roberts T P Rowley H AWexler K amp Marantz A (1996) Task-induced asymmetry ofthe auditory evoked m100 neuromagnetic field elicited byspeech sounds Brain Research Cognitive Brain Research4 231 ndash242

Reite M Adams M Simon J Teale P Sheeder JRichardson D amp Grabbe R (1994) Auditory m100component 1 Relationship to heschlrsquos gyri BrainResearch Cognitive Brain Research 2 13ndash20

Richer F Alain C Achim A Bouvier G amp Saint-Hilaire J M(1989) Intracerebral amplitude distributions of the auditoryevoked potential Electroencephalography and ClinicalNeurophysiology 74 202 ndash208

Robinson S amp Vrba J (1999) Functional neuroimaging bysynthetic aperture magnetometry (SAM) In Yoshimoto et al(Eds) Recent advances in biomagnetism (pp 302 ndash305)Sendai Japan Tokyo University Press

Scherg M amp Von Cramon D (1985) Two bilateral sources ofthe late aep as identified by a spatio-temporal dipole modelElectroencephalography and Clinical Neurophysiology 6232ndash44

Scherg M amp Von Cramon D (1986) Evoked dipolesource potentials of the human auditory cortexElectroencephalography and Clinical Neurophysiology65 344 ndash360

Schmidt M F amp Konishi M (1998) Gating of auditoryresponses in the vocal control system of awake songbirdsNature Neuroscience 1 513ndash518

Sekihara K Nagarajan S S Poeppel D Marantz Aamp Miyashita Y (2001) Reconstructing spatio-temporalactivities of neural sources using an meg vectorbeamformer technique IEEE Transactions on BiomedicalEngineering 48 760ndash771

Sekihara K Poeppel D Marantz A Koizumi H ampMiyashita Y (1997) Noise covariance incorporatedmeg-music algorithm A method for multiple-dipoleestimation tolerant of the influence of background brainactivity IEEE Transactions on Biomedical Engineering44 839 ndash847

Stevens K N (1999) Acoustic phonetics CambridgeMIT Press

Stufflebeam S M Poeppel D Rowley H A amp RobertsT P (1998) Peri-threshold encoding of stimulus

Houde et al 1137

frequency and intensity in the m100 latency NeuroReport9 91ndash94

Suga N amp Schlegel P (1972) Neural attenuation ofresponses to emitted sounds in echolocating rats Science177 82ndash84

Suga N amp Shimozawa T (1974) Site of neural attenuation ofresponses to self-vocalized sounds in echolocating batsScience 183 1211 ndash1213

von Bekesy G (1949) The structure of the middle ear and thehearing of onersquos own voice by bone conduction Journal ofthe Acoustical Society of America 21 217 ndash232

Weiskrantz L Elliott J amp Darlington C (1971) Preliminaryobservations on tickling oneself Nature 230 598 ndash599

Woods D L Knight R T amp Neville H J (1984)Bitemporal lesions dissociate auditory evoked potentialsand perception Electroencephalography and ClinicalNeurophysiology 57 208 ndash220

Yates A J (1963) Delayed auditory feedback PsychologicalBulletin 60 213 ndash232

Zatorre R J Evans A C Meyer E amp Gjedde A (1992)Lateralization of phonetic and pitch discrimination in speechprocessing Science 256 846 ndash849

1138 Journal of Cognitive Neuroscience Volume 14 Number 8

tape playback conditions can be seen in the figure. At about 100 msec poststimulus onset, most of the traces show a peak in RMS amplitude. This peak is referred to as the M100 response, and there are large differences in its amplitude across conditions in both hemispheres for this subject. In the left hemisphere, the M100 response in the speaking condition is greatly suppressed compared to that seen in the tape playback condition. In the right hemisphere, this suppression is even greater: in the tape playback condition there is quite a large M100 response, but in the speaking condition the M100 is essentially absent.

Figure 5 shows the same response waveforms of the previous figure, averaged across all eight subjects. In this figure, in addition to the average response waveforms, vertical bars indicating standard errors are also shown. The asterisks in each panel indicate latencies at which the speaking and tape playback responses differed significantly (p < .001). These plots show that in both hemispheres, mean M100 responses were significantly smaller in the speaking condition than in the tape playback condition. It appears that left hemisphere responses are larger than right hemisphere responses in the tape playback condition, but that both hemispheres are inhibited to about the same level in the speaking condition. However, even in the speaking condition, the left hemisphere M100 response appears more focal in time than that of the right hemisphere. Less consistent are the longer latency responses. For example, the amplitude of the M200 response in the left hemisphere was significantly larger in the tape playback condition, while in the right hemisphere it was significantly larger in the speaking condition.

The amplitudes of the acoustic signals reaching the ears via air conduction (the so-called "side tone") were adjusted to be the same for both the speaking and tape playback conditions; all our subjects reported that the intensities of the mic and tape signals were perceptually identical. However, in the speaking condition, because the subject hears both the side tone and bone conduction of his voice, the speech heard by a subject is actually about twice the intensity of (i.e., 3 dB louder than) the speech heard in the tape playback condition (von Bekesy, 1949). Thus, a priori, we would expect this amplitude difference to be a possible confound in the experiment. The size of the evoked magnetic field response in the auditory cortex, and in particular the amplitude of the M100 response, has been shown to be a monotonically increasing function of audio input level (Stufflebeam, Poeppel, Rowley, & Roberts, 1998). From this we would expect the M100 response to be larger in the voiced condition than in the tape playback condition. However, this response difference should be small: at the audio levels used in this experiment (80 dBA at the earphones), the effect of audio amplitude on the M100 response has nearly saturated. Additionally, as we have seen, the actual M100 response recorded in the speaking condition was much less than that recorded in the tape playback condition, a result opposite of what would be predicted based on audio amplitude differences.
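A worked version of the level arithmetic, under the assumption (following von Bekesy, 1949) that the bone-conducted component adds roughly the same acoustic intensity as the air-conducted side tone:

\[
\Delta L = 10 \log_{10}\!\frac{I_{\mathrm{speak}}}{I_{\mathrm{tape}}} = 10 \log_{10} 2 \approx 3\ \mathrm{dB},
\]

whereas a doubling of pressure amplitude would instead give \(20 \log_{10} 2 \approx 6\) dB, which is why the doubling is best stated in terms of intensity.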

Source localizations were performed on the spatio-temporal magnetic field responses recorded 50–200 msec following stimulus onset. Across all our stimulus conditions (mic vs. tape vs. tone response), we found no statistically significant difference (p > .5) in the location of an equivalent dipole in each hemisphere that accounts for these magnetic field responses. These localization data suggest that the observed response suppression/attenuation arises from a restricted cortical region, presumably A1 and its immediate environs.

Overall, all subjects' M100 responses were localized to the temporal cortex in the area of the auditory cortex. We infer this because lesion studies, intracranial recordings, and source modeling studies provide converging evidence that the neuronal generators of the M100 response are located in the auditory regions of the temporal lobes and include A1 and its immediate environs (Ahissar et al., 2001; Picton et al., 1999; Liegeois-Chauvel, Musolino, Badier, Marquis, & Chauvel, 1994; Reite et al., 1994; Richer, Alain, Achim, Bouvier, & Saint-Hilaire, 1989; Scherg & Von Cramon, 1985, 1986; Woods, Knight, & Neville, 1984; Hari, Aittoniemi, Jarvinen, Katila, & Varpula, 1980).

Figure 4. RMS responses recorded in Experiment 1, averaged over all 37 detectors in each hemisphere's detector array, from a single representative subject. In each plot, the thick traces are responses in the speaking condition; the thin traces are responses in the tape playback condition. The onset of the audio stimulus is at 0 msec.

Figure 5. RMS response waveforms in Experiment 1 averaged across all subjects. The vertical bars in each trace show standard errors, while the asterisks show intervals where the response waveforms from the microphone condition (thick traces) and tape condition (thin traces) differ significantly (p < .001), for both hemispheres.

Experiment 2

The results of Experiment 1 clearly confirmed that the M100 response is suppressed in the speaking condition. One possible mechanism that could cause this suppression is nonspecific attenuation; that is, neural signals arising as a result of activity in the motor cortex might directly and broadly inhibit auditory cortex activity.

If auditory cortex activity is nonspecifically suppressed during speaking, then it should be less responsive to all auditory signals. In Experiment 2, we tested this possibility by examining the M100 response to tones heard during speech that was either self-produced or played back from tape.

In this experiment, we measured the responses evoked by tone pips under three successive conditions, as shown in Figure 6. In the first condition (tones alone; Figure 6a), subjects remained silent and heard 1-kHz tone pips. In the second condition (tones and speaking; Figure 6b), subjects again heard these tone pips, but this time the tones were presented while subjects produced long utterances of the neutral vowel (again with no movement of their jaw or tongue). In the third condition (tones and tape playback; Figure 6c), subjects again remained silent and heard both tone pips and tape playback of their speech produced in the previous condition. For the last two conditions, in which the subject heard tone pips and speech, the speech was attenuated by 20 dB to reduce the masking of the tone pips.

The results of Experiment 2 are summarized in Figure 7. The figure shows a plot of the mean RMS response across detectors and subjects for each condition and in each hemisphere. Dashed traces show subjects' responses to hearing the tones in silence (tones alone condition). Thick traces show responses to hearing the tones while subjects produced speech (tones and speaking condition). Thin traces show responses to hearing the tones along with the taped speech playback (tones and tape playback condition). As in Figure 5, the vertical bars are standard errors. Asterisks indicate time intervals during which responses to tones heard while producing speech (tones and speaking condition) differed significantly (p < .001) from the responses to tones heard with taped speech playback (tones and tape playback condition).

For most of the traces in Figure 7, the M100 and M200 responses to the tones across conditions stand out clearly. For both hemispheres, these responses are very prominent for tones heard in silence (dashed traces). In the left hemisphere, there was a reduction in the M100 and M200 responses to the tones in the speaking (thick traces) and tape playback (thin traces) conditions. Interestingly, the response reduction in the speaking condition appeared to be greater than the response reduction in the tape playback condition. Indeed, the statistical analysis confirmed that the M100 response in the tones and speaking condition was significantly less than the M100 response in the tones and tape playback condition. Nevertheless, the difference was small. In the right hemisphere, there was an approximately equal reduction in M100 and M200 responses to the tones in the speaking and tape playback conditions. Within the M100–M200 region, responses in these two conditions were only significantly different at one latency, as shown by the asterisk.

Figure 6. Setup for the three conditions of Experiment 2.

It appears that, at most, the act of speaking creates only a limited degree of nonspecific attenuation of the responses to tones in the left hemisphere that is not seen in the tape playback condition. To quantify how much nonspecific attenuation contributes to the suppression of the response to self-produced speech, we can estimate the signal reductions needed to produce the response reductions seen in these experiments. From the results of Experiment 1, we have seen that, compared to speech from the tape playback, self-production of speech created a 30% reduction in M100 amplitude in the left hemisphere and a 15% reduction in the right hemisphere, equivalent to 13 and 7 dB decreases in effective input signal, respectively (see Note 1). In Experiment 2, in the left hemisphere, there was possibly an extra 7% reduction in the M100 response to tones while subjects spoke, as compared to the M100 response to tones during tape playback, which is equivalent to only a 3-dB reduction in effective input signal. Thus, it appears that nonspecific attenuation cannot account for all the response suppression recorded for self-produced speech.
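The dB equivalences above follow from a back-of-envelope inversion of the scaling in Note 1. A minimal sketch, assuming the peri-threshold slope of roughly 34% RMS change per 15 dB (about 2.3% per dB) can be applied linearly; the function name is ours, not the authors':

PCT_PER_DB = 34.0 / 15.0  # assumed linear slope, percent of M100 RMS per dB

def equivalent_db(percent_reduction: float) -> float:
    """Effective input-signal decrease (dB) for a given % M100 reduction."""
    return percent_reduction / PCT_PER_DB

for pct in (30.0, 15.0, 7.0):  # left hemi, right hemi, extra tone suppression
    print(f"{pct:4.0f}% M100 reduction ~ {equivalent_db(pct):4.1f} dB")
# -> roughly 13 dB, 7 dB, and 3 dB, matching the values quoted in the text.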

It has also been argued that any attenuation in the responses during vocalization could occur due to bone conduction and activation of middle-ear muscles, or due to inhibition at the level of the brain stem (Papanicolaou et al., 1986). Thus, to further test the hypothesis of nonspecific attenuation, we measured brainstem evoked responses to click trains in five subjects under three conditions identical to those in Experiment 2: silence, produced speech, and tape-recorded speech. In the latter two conditions, we attenuated the speech background by 20 dB to obtain a clear evoked response to the clicks. For each of these conditions, we recorded the brainstem evoked response and measured the latency and amplitude of Waves I–V. The results from these experiments indicated that there were no statistical differences in either the amplitudes or the latencies of these waves between the three background conditions (p > .2). These results provide further support for the hypothesis that our observed effects were cortical in origin.

Figure 7. RMS waveforms for responses to 1-kHz tone pips averaged across all subjects for the three conditions of Experiment 2: the tones alone condition (dashed traces), the tones and speaking condition (thick traces), and the tones and tape playback condition (thin traces). The vertical bars in each trace show standard errors, while the asterisks show intervals where the response waveforms from the tones and speaking condition and the tones and tape playback condition differed significantly (p < .001), for both hemispheres.

Experiment 3

Another possible cause of the response reduction to self-produced speech is that activity in the auditory cortex is the result of a comparison between the incoming signal and an internally generated prediction of what that signal would be. If this were the case, then altering the incoming signal should create a mismatch with that expectation that would reduce or abolish the response suppression.

In Experiment 3, we tested this possibility by altering the auditory feedback to the subject during self-production of speech. The setup for Experiment 3 is shown in Figure 8. Like Experiment 1, this experiment consisted of two conditions: speaking and tape playback. However, instead of hearing their speech (self-produced or from tape playback), subjects heard a sum of their speech plus the output of a white noise generator that produced noise bursts gated to the duration of the speech utterance. Subjects reported that they could not hear their own speech and that they only heard noise gated to their vowel production.

Figure 8. Setup for the two conditions of Experiment 3.

The results of Experiment 3 are shown in Figure 9. For both hemispheres, the thick traces show the average evoked response of subjects hearing gated noise in the speaking condition, while the thin traces show their responses to gated noise in the tape playback condition. Asterisks mark poststimulus latencies where the evoked responses in the two conditions differ significantly (p < .001), for both hemispheres. The figure shows that subjects' evoked responses in the two conditions are significantly different at a number of latencies. In the left hemisphere, the responses in the speaking and tape playback conditions differ significantly around the M100 responses and at a latency of 150 msec. In the right hemisphere, the speaking and tape playback responses differ significantly at the 50- and 200-msec latencies. Since we are limiting our analysis of these results to M100 responses, the only response differences of interest are the significant differences seen in the M100 region of the left hemisphere responses. It is clear from the figure, however, that these response differences arise principally from a latency difference between M100 responses: the M100 response to gated noise in the tape playback condition is about 20 msec earlier than the M100 response to gated noise in the speaking condition. The amplitudes of the two M100 responses appear nearly identical.

In sum, Figure 9 shows that the M100 amplitude suppression seen in Experiment 1 is abolished in both hemispheres: the amplitude of the M100 response to noise generated from subjects' speech via the microphone is as large as the M100 response to noise generated from the tape playback of their speech. Thus, when we altered subjects' speech feedback in Experiment 3, we found that the response suppression to self-produced speech disappeared, which is consistent with the suppression resulting from auditory input matching an internally expected input.

Figure 9. RMS response waveforms in Experiment 3 averaged across all subjects. The vertical bars in each trace show standard errors, while the asterisks show intervals where the response waveforms from the speaking condition (thick traces) and tape playback condition (thin traces) differ significantly (p < .001), for both hemispheres.

DISCUSSION

The present study demonstrates that the human auditory cortex responds differently to self-produced speech than to externally produced speech. In Experiment 1, significant suppression in the amplitude of M100 responses was recorded in the speaking condition in both hemispheres, confirming that self-production of speech suppresses the response in the auditory cortex to that speech. In Experiment 2, M100 responses to tone pips were seen to be modestly more suppressed by self-produced than by externally produced speech. However, this relatively weak extra suppression was not sufficient to explain the suppression seen in Experiment 1. On the other hand, in Experiment 3, we were able to abolish the suppression of M100 responses to self-produced speech by altering the feedback heard by subjects. This last result is consistent with the hypothesis that the suppression of the response in the auditory cortex to self-produced speech results from a match with expected auditory feedback.

Comparison with Other Studies

Experiment 1 is quite similar to the study by Curio et al. (2000), which also examined M100 response differences between speaking and tape playback conditions. The authors found M100 amplitude differences across listening conditions in both hemispheres, but unlike the present study, these differences were not significant in the right hemisphere. Further, in contradiction to our studies, Curio et al. described significant differences in M100 latencies across listening conditions in both hemispheres. The different results in M100 latencies could be accounted for by the fact that we matched the acoustic waveform presented to each subject, while Curio et al. behaviorally matched the perceptual intensity between the listening conditions. Because subjects could set the tape playback volume to be different from their speech feedback volume, differing audio levels in the two experiment conditions could account for some of the M100 latency differences that they report. With MEG data acquisition triggered by speech signal onset, triggering occurs when the audio signal exceeds some threshold voltage. As a result, the triggering time is sensitive to the amplification of the signal: the more a signal is amplified, the faster it will reach the trigger threshold. Thus, if tape playback volume levels were less than speech feedback volume levels, triggering from tape playback could be delayed compared to the speaking condition. If so, then for the same true M100 latency, M100 peaks would appear to have shorter latencies in the tape playback condition.
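To make the trigger-threshold argument concrete, here is a minimal simulation (our own illustration, not the acquisition software; the envelope shape and threshold are assumptions): the same speech-onset envelope crosses a fixed voltage threshold later when played at a lower gain.

import numpy as np

fs = 1041.0                          # sampling rate used in the study (Hz)
t = np.arange(0, 0.2, 1.0 / fs)      # 200 msec of signal
envelope = 1.0 - np.exp(-t / 0.02)   # idealized speech-onset amplitude rise

def trigger_time(gain, threshold=0.5):
    """First time (msec) the amplified envelope crosses the threshold."""
    idx = np.argmax(gain * envelope >= threshold)
    return 1000.0 * t[idx]

print(trigger_time(gain=1.0))   # louder feedback -> earlier trigger (~14 msec)
print(trigger_time(gain=0.7))   # quieter playback -> later trigger (~25 msec)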

In Experiment 3, the amplitude difference in M100 responses between the speaking and tape playback conditions was abolished by using gated noise to distort the feedback heard by subjects. Interestingly, however, this feedback manipulation created a latency difference between the two conditions: in the left hemisphere, the M100 response to gated noise triggered by tape playback was 15 msec earlier than that for gated noise triggered by the subject's own speech, while in the right hemisphere such a latency difference was not seen. Curio et al. (2000) also found that the reduced latency of M100 responses to tape playback versus self-produced speech was significantly more pronounced in the left hemisphere. Like Curio et al., we attribute this left hemisphere latency difference to two factors. First, this latency difference may represent differential processing of externally produced versus self-produced auditory feedback. Although the gated noise altered the spectral characteristics of the speech feedback such that it was likely to mismatch any internally generated expectations of how that feedback would sound, the gated noise did not alter the temporal characteristics of the speech feedback: subjects still heard auditory feedback begin at the same time as they commenced phonation. This temporal predictability could potentially have allowed subjects to process gated noise arising from their own speech, whose onset they could predict, differently from gated noise arising from tape playback, whose onset they could not predict. Second, the fact that this latency difference was seen primarily in the left hemisphere is consistent with the left hemisphere's dominance in the production of speech in right-handed speakers. Other aspects of our results may also reflect this left hemisphere dominance. In Experiment 1, the M100 response to tape playback speech was larger in the left hemisphere than in the right, and the M100 response to self-produced speech was temporally sharper in the left hemisphere than in the right. Additionally, in Experiment 2, in the left hemisphere, the M100 response to tones was significantly more suppressed by self-produced speech than by tape playback speech, a difference not seen in the right hemisphere.

Results from Experiment 2 are mostly consistent with the recent experiments of Numminen and Curio (1999). They reported that neuromagnetic responses to short recorded vowel sounds were delayed and dampened relative to background-free presentations. They did not report how much they externally attenuated the produced speech signal in order to obtain a clear response to the probe-vowel sound. In our experiments, we used probe tones instead of vowel sounds, and we attenuated the produced speech signal at the ear by 20 dB. Surprisingly, we did not observe any latency differences in the M100 responses to the tones between the three conditions. We believe that the lack of a latency effect could be accounted for by the fact that we used tones instead of vowel sounds (Diesch & Luce, 1997; Poeppel et al., 1996; Kuriki & Murase, 1989). Furthermore, by an argument analogous to that discussed above for the Curio et al. (2000) latency results, it is also reasonable that any amplitude mismatches in the masking sounds could account for discrepancies in the latency effect (Hari & Makela, 1988). However, while Numminen and Curio identified auditory interference as the main cause of these modifications, their results also provide evidence against the nonspecific attenuation hypothesis, although they do not specifically discuss their results in this form. For instance, they found an additional 6–9% decrease in the amplitude of responses to probe-vowel sounds during a produced speech background when compared with a tape-recorded speech background. These amplitude differences are consistent with our Experiment 2 results.

Gunji et al. found that vocalization-related cortical magnetic fields comprised six sources that temporally overlap in the period 0–100 msec after vocalization onset (Gunji, Hoshiyama, & Kakigi, 2000; Gunji, Kakigi, & Hoshiyama, 2000). Sources 1 and 2 were activated approximately 150 msec before vocalization onset and were located in the laryngeal motor areas of the left and right hemispheres, respectively. Kuriki, Mori, and Hirata (1999) have also found a motor planning center for speech articulation that is activated 120–320 msec before the onset of vocalization and located in a region around the superior end of the left insula. Gunji et al.'s Sources 5 and 6 were located in the truncal motor area of each hemisphere and were similar to Sources 1 and 2. In contrast, their Sources 3 and 4 were located in the auditory cortices of the left and right hemispheres, respectively, and were activated after vocalization onset. However, these experiments did not explore the modulation of the responses in the auditory or motor cortex due to vocalization or feedback alteration. In our experiments, because we were unable to simultaneously measure activity from motor and supplementary motor areas using a 37-channel sensor array over each hemisphere, we focus here only on the activity from the auditory cortices of both hemispheres.

More relevant to our results are the studies of Hirano et al. (Hirano et al., 1996; Hirano, Kojima, et al., 1997; Hirano, Naito, et al., 1997), who used PET to look at cortical activation during speaking. In their first study, Hirano et al. found a lack of activation in the auditory cortex while subjects vocalized. In their second study, Hirano et al. altered how subjects heard their own vocalizations, and this time recorded significant activation of the auditory cortex. These results are consistent with our findings in Experiments 1 and 3.

Why Is the Response to Self-Produced Speech Suppressed?

The present study, along with recent MEG studies by Curio et al. and other brain imaging studies that have used PET or fMRI, demonstrates that during speech production the auditory cortex suppresses its response to expected acoustic signals (Hirano et al., 1996; Hirano, Kojima, et al., 1997; Hirano, Naito, et al., 1997). Does that suppression serve any function? We speculate about two possibilities.

Auditory Perception

One possibility is that the suppression results from the process of distinguishing self-produced from externally produced sources. It would be useful for sensory systems to make this distinction, since sensory information from these two types of sources is commonly used for different purposes: sensory information from self-produced motor actions could potentially be used for feedback control, while sensory information from external sources is primarily used for recognition.

But how could the sensory systems distinguish self-produced from externally produced sensory information? It has been proposed by previous researchers that this is done by having motor actions produce expectations of their sensory consequences (Jeannerod, 1988). These sensory outcome predictions are then compared with the actual sensory input: whatever matches the outcome predictions is inhibited. In that way, the sensory consequences of self-produced actions could be filtered from incoming sensory data, thus highlighting any unpredicted, and therefore external (or novel), stimulus that should potentially be attended to.
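A minimal sketch of this comparator idea (our illustration; the signal names and the simple subtraction rule are assumptions, not the authors' model): an efference-copy prediction is subtracted from the incoming signal, so a matching self-produced component is suppressed while an unpredicted external component passes through.

import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 1000)

self_produced = np.sin(2 * np.pi * 5 * t)          # predictable consequence of one's own action
external = (t > 0.5) * np.sin(2 * np.pi * 17 * t)  # unpredicted external event
sensory_input = self_produced + external + 0.05 * rng.standard_normal(t.size)

prediction = self_produced              # efference-copy-based expectation
residual = sensory_input - prediction   # comparator output

# The residual retains the external event but not the self-produced one.
print(np.abs(residual[t < 0.5]).mean())  # small: self-produced part cancelled
print(np.abs(residual[t > 0.5]).mean())  # larger: external part survives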

For a number of reasons, the results of our experiments are consistent with the above account. First, although we found significant suppression of the responses of the auditory cortex to self-produced speech, we found little evidence that the act of producing speech suppresses responses to other sounds (in our case, the 1-kHz tone). This finding is consistent with the hypothesis that only responses to self-produced sounds are being filtered from the auditory cortex. In addition, we found that altering the auditory feedback with gated noise abolished the suppression in the auditory cortex induced by speaking, consistent with the hypothesis that the filtering out of auditory information arising from self-produced speech is done by comparison with predicted auditory feedback.

Recently, Blakemore et al. have recorded the same types of response suppression phenomena in the somatosensory cortex, suggesting that the need to distinguish self-produced from externally generated sensory input may be a general property of sensory systems. In their series of experiments, Blakemore et al. used both behavioral and physiological procedures to look at responses to both self-produced and externally produced somatosensory stimulation. In one experiment, they examined the well-known phenomenon that people cannot tickle themselves (Weiskrantz, Elliott, & Darlington, 1971). This phenomenon could be explained by supposing that somatosensory responses are inhibited when the incoming sensory (touch) information matches a prediction of that input generated by the person's own motor system. To test this hypothesis, these investigators had subjects tickle their palm via an apparatus that could introduce a time delay between their movement and the touch stimulation of their palm (Blakemore, Frith, & Wolpert, 1999). When this was done, subjects reported that with increasing delay the stimulation of their palm felt increasingly ticklish. That the perceived stimulation increased with delay could be explained by supposing that the delay made the incoming sensory information mismatch in time the predicted sensory input. This experiment was followed up in an fMRI study looking at activation of the somatosensory cortex and cerebellum during both self-produced and externally generated palm stimulation (Blakemore, Wolpert, & Frith, 1999). It was found that activation of the somatosensory cortex by self-produced palm stimulation was suppressed compared to externally produced palm stimulation, a direct analog of the results recorded in the auditory cortex in our experiments.

Speech Motor Control

The suppression could also serve to regulate how much auditory feedback is used in the ongoing control of speech.

A problem with using auditory feedback to control speech is the delay incurred by the processing of the auditory input. To control speech, auditory feedback must convey information about the positions of the vocal tract articulators. From speech acoustics, we know that formant frequencies and pitch are the acoustic parameters that convey this information (Stevens, 1999). Extracting these parameters from the type of signal provided by the cochlea requires significant signal processing, which in turn suggests the possibility of significant delay if a succession of auditory brain areas is needed for this processing (Jurgens, 2002; Burnett, Freedland, Larson, & Hain, 1998; Perkell, 1997). In addition, there will be delays associated with the generation of a response in the motor cortex. There will also be intrinsic delays in transmitting the neural signals to the appropriate muscles, as well as delays in the muscle responses to these signals (Jurgens, 2002). Together, these factors could contribute to a significant feedback loop delay between auditory input and motor output.

Unfortunately, we know from control theory that, in general, using feedback to control a system with a large feedback loop delay will not work, because the system becomes unstable (Franklin, Powell, & Emami-Naeini, 1991). Intuitively, this instability arises because the delay makes the feedback uncorrelated with the state of the system: sensory feedback from some time in the past does not entirely reflect the current state of the system. Yet auditory feedback does appear to be involved in the online control of some aspects of speech production. As noted in the Introduction, if the pitch of a speaker's feedback is perturbed, they will typically compensate within 100–200 msec (Burnett et al., 1998; Kawahara, 1993; Elman, 1981). Thus, in spite of a large (100–200 msec) feedback loop delay, auditory feedback does stably regulate a speaker's pitch. How is this possible? Again from control theory, we know that when feedback is delayed, corrupted by noise, or otherwise made uncorrelated with current motor output, a successful approach to feedback control is to attenuate the sensory feedback to the degree that it is uncorrelated. Specifically, in feedback control based on Kalman filtering, the Kalman gain on sensory feedback is set, in part, in inverse proportion to the degree to which the sensory feedback has become uncorrelated with the current system state (Jacobs, 1993). In an analogous way, it may be that the suppression of the auditory cortex during speaking functions to attenuate the auditory feedback to a level commensurate with its delay, such that it can be used properly in the control of speech.
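A minimal scalar illustration of that gain rule (our sketch of the standard Kalman update, not a model from the paper): as the measurement-noise variance R grows, i.e., as the feedback becomes less informative about the current state, the Kalman gain, and hence the weight placed on sensory feedback, shrinks.

# Scalar Kalman gain: K = P / (P + R), where P is the prior state-estimate
# variance and R the measurement-noise variance. Less reliable (less
# correlated) feedback -> larger R -> smaller gain -> stronger attenuation.

def kalman_gain(P: float, R: float) -> float:
    return P / (P + R)

P = 1.0  # assumed prior uncertainty about the state
for R in (0.1, 1.0, 10.0):  # increasingly unreliable feedback
    K = kalman_gain(P, R)
    print(f"R = {R:5.1f}  ->  gain on feedback K = {K:.2f}")
# K falls from ~0.91 to ~0.09 as the feedback becomes less trustworthy.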

Finally, it is possible that both of the above accounts are true. It may be that the suppression of self-produced speech acts as a filter for incoming sensory data, allowing attention to be paid to externally produced sensations, while at the same time properly attenuating the auditory feedback for use in the online control of speech. Future experiments must be designed to distinguish between these hypothesized possibilities.

METHODS

Subjects

Eight male volunteers (ages 25–40) participated in Experiments 1 and 3 of this study. Of these volunteers, six also participated in Experiment 2. All subjects gave their informed consent. All studies were performed with the approval of the UCSF Committee for Human Research.

Experimental Design

Magnetic fields were recorded from both hemispheres in a shielded room using two 37-channel biomagnetometers with SQUID-based first-order gradiometer sensors (Magnes II, 4-D Neuroimaging, San Diego, CA). A schematic of the experimental setup is shown in Figure 1. A directional microphone was placed in the chamber at a distance where it did not distort the recorded magnetic fields. Triggering of the data acquisition system and of the stimuli delivered to the ears was controlled, and it differed for each of the three experiments in the study.

Fiduciary points were marked on the skin for later coregistration with structural magnetic resonance images, and the head shape was digitized to constrain subsequent source modeling. The sensor was initially positioned over the estimated location of the auditory cortices in both hemispheres, such that a dipolar M100 response was evoked by single 400-msec duration tone pips (1 kHz, 5-msec rise/fall ramps, 90 dB SPL). Data acquisition epochs were 600 msec in total duration, with a 100-msec prestimulus period referenced to the onset of the first speech stimulus. Data were acquired at a sampling rate of 1041 Hz. The position of the sensor was then refined so that a single dipole localization model of the averaged evoked magnetic field responses to 100 tones resulted in a correlation and goodness-of-fit greater than 95%.

With satisfactory sensor positioning over the auditory cortices in both hemispheres, but before starting the experiments, subjects practiced producing the neutral vowel sound by phonating while relaxing their upper vocal tract, generating no movement of their jaw or tongue. Preliminary experiments revealed that this form of speech production produced no movement artifacts in these magnetic field recordings.

During the speaking condition of Experiments 1 and 3, the audio onset of each utterance picked up by the microphone triggered the MEG data acquisition system, which recorded 1 sec of data (300 msec pretrigger, 700 msec posttrigger) at a sampling rate of 1041 Hz. Subjects were instructed to produce short vowel sounds by phonating their vocal cords without any movement of their jaw or tongue. Responses to 100 such utterances were recorded, and these magnetic field recordings were then averaged to obtain the evoked magnetic field responses to self-produced speech for each MEG detector channel. In the subsequent tape playback condition, the audio signal from the tape recorder was fed to the subject's earphones and to the trigger input for MEG data acquisition. Again, the magnetic field recordings were averaged to obtain the evoked magnetic field responses.


During the tape playback condition of Experiments 1 and 3, the audio level of the signal fed to the earphones and trigger was adjusted to be the same as that of the speaking condition. Because of this, the trigger of MEG data acquisition (which was based on an audio voltage threshold) occurred at the same time relative to utterance onset in both conditions. The equal audio levels ensured that triggering latency differences could not account for any differences between the evoked responses to self-produced and tape-recorded speech. In Experiment 1, the subject heard these equal audio levels in the earphones, which meant that his side tone amplitude was the same in both the speaking and tape playback conditions. In Experiment 3, the audio signal for each utterance was used to trigger both the MEG data acquisition system and a white noise generator (20 Hz–20 kHz bandwidth, 85 dB SPL). In this case, the sum of the audio signal and the noise was presented to the subject.

In Experiment 2, the onset of the tone pips triggered MEG data acquisition in each of the three conditions (tones alone, tones and speaking, and tones and tape playback). These 1-kHz tones each had a 400-msec duration, 5-msec onset and offset ramps, and a 0.8–1.2-sec ISI between tones. The subject heard the tones at 90 dB SPL. As with Experiment 1, data acquisition epochs were 1.0 sec in total duration, with a 300-msec prestimulus period. Data were acquired at a sampling rate of 1041 Hz. In each condition, the evoked magnetic field responses to the tone pips were obtained by averaging 100 epochs.
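For concreteness, a sketch of the tone-pip stimulus as described (our own synthesis code; the 44.1-kHz audio rate is an assumption, while the 1-kHz frequency, 400-msec duration, and 5-msec ramps come from the text):

import numpy as np

fs = 44100                            # assumed audio sampling rate (not stated in text)
dur, ramp, f0 = 0.400, 0.005, 1000.0  # 400-msec pip, 5-msec ramps, 1-kHz tone

t = np.arange(int(dur * fs)) / fs
pip = np.sin(2 * np.pi * f0 * t)

# Linear onset/offset ramps to avoid spectral splatter at the pip edges.
n_ramp = int(ramp * fs)
env = np.ones_like(pip)
env[:n_ramp] = np.linspace(0.0, 1.0, n_ramp)
env[-n_ramp:] = np.linspace(1.0, 0.0, n_ramp)
pip *= env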

Epochs 1500 msec long, with a 500-msec prestimulus period, were collected at a sampling rate of 1041 Hz. The raw epoch magnetic field data from these 100 utterances were then averaged to obtain the evoked magnetic field responses to gated noise during self-production.

The voltage level of the audio signal fed to the earphones and the trigger in this condition was adjusted to be identical to the voltage levels recorded during the speaking condition, ensuring that triggering differences could not account for any differences between the evoked responses to noise in the self-produced and tape playback blocks.

Data Analysis

Data from the two sensor arrays positioned over each hemisphere were analyzed separately. From the evoked magnetic field response for each condition in each channel, the RMS value of the evoked magnetic field strength averaged across channels was computed. The RMS method of averaging (square root of the mean of squared channel field strengths) was used to avoid the cancellation that occurs when the magnetic fields from detectors on opposite sides of the current dipole (whose fields are roughly equal in magnitude but opposite in sign) are averaged together. This process resulted in one RMS time waveform per hemisphere, per condition, per subject. For statistical comparisons across conditions, a one-way analysis of variance was performed with condition as a factor and time as a repeated measure.
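As a sketch of the RMS-across-channels measure (our code; the array name and placeholder data are hypothetical), for an evoked-field array of shape (channels, timepoints):

import numpy as np

# fields: evoked magnetic field, shape (n_channels, n_timepoints),
# e.g. the 37 channels of one hemisphere's sensor array.
fields = np.random.randn(37, 1041)  # placeholder data for illustration

# RMS across channels at each timepoint: square, average over the channel
# axis, then take the square root. Opposite-polarity channels no longer
# cancel, unlike a plain mean across channels.
rms_waveform = np.sqrt(np.mean(fields ** 2, axis=0))  # shape (n_timepoints,)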

The RMS across channels is a convenient measure of activity that is independent of the position of individual sensors. Following optimal placement of the sensor for a significant M100 dipolar response, RMS has been shown to correlate well with the "equivalent current-dipole strength" (also called the "Q value") of a dipole positioned in and around the auditory cortex. The RMS measure is also correlated with the data obtained from (1) the channel with the biggest signal-to-noise ratio and (2) the first two principal components of the array response (Ahissar et al., 2001; Mahncke, 1998).

In sensor arrays comprising planar gradiometers (like the Neuromag 122 or 306), the channel with the largest signal-to-noise ratio is often closest to the source (as dictated by the physics). However, in sensor arrays comprising axial gradiometers (as in our Magnes II system), the channel with the largest signal-to-noise ratio reflects the peak of the magnetic field, which is not the channel closest to a source. Moreover, the "best" channel could reflect either a peak "inward" or "outward" field, requiring sign inversion when averaging such data across subjects. Finally, since the sensor array position can vary considerably across subjects, it is difficult to use procedures typical of EEG and whole-head studies, wherein single-channel data are often picked for analysis across subjects. These considerations led us to use RMS instead of picking specific channels for our analysis.

Single Dipole Model Fit

A single equivalent current-dipole model was calculated separately for each hemisphere at each poststimulus time point using standard 4-D Neuroimaging software operating on data that had been filtered between 1 and 20 Hz. The localization algorithm used an iterative least-squares minimization to compute the strength and location of a single dipole, in a spherical volume of uniform conductivity, that could account for the sensor data. Dipole fits were accepted based on a local correlation maximum criterion of 95% and goodness-of-fit values greater than 95%.

The MEG anatomical reference frame was established using a digital device called the "sensor position indicator." A set of three table-mounted receivers and one array-mounted receiver triangulates the signal from a stylus transmitter positioned at fiducial reference points on the subject's head surface (typically nasion, left and right preauricular points, Cz, and inion). The same stylus transmitter arrangement served to define the curvature of the head by tracing the surface, from which a "local sphere" model of the head was generated. To coregister the dipole locations on the MRI image, the fiduciary points served as the basis for a common coordinate system. By superimposing these fiducial landmarks on MR images of the subject, it was possible to define the position of the computed point sources with an accuracy of 5 mm.

Multiple Dipole Localization Analysis

Multiple dipole localization analyses of the spatio-temporal evoked magnetic fields were also performed using Multiple Signal Classification (MUSIC) (Sekihara, Poeppel, Marantz, Koizumi, & Miyashita, 1997; Mosher, Lewis, & Leahy, 1992) and beamformer algorithms (Mosher et al., 1992; Sekihara, Nagarajan, Poeppel, Marantz, & Miyashita, 2001; Robinson & Vrba, 1999). MUSIC methods are based on estimation of a signal "subspace" from the entire spatio-temporal MEG data using singular-value decomposition (SVD). A version of the MUSIC algorithm, referred to as the "conventional" MUSIC algorithm, was implemented in MATLAB under the assumption that the sources contributing to the MEG data arose from multiple dipoles (<37 in number) which were stationary and were located within a spherical volume of uniform conductivity (Sekihara et al., 1997). The locations of the dipoles are typically determined by conducting a search over a three-dimensional grid of interest within the head. Given the sensor positions and the coordinates of the origin of a "local sphere" approximation of the head shape for each subject, a lead-field matrix was computed for each point in this 3-D grid. At each point in this grid, from the lead-field matrix and the covariance matrix of the spatio-temporal MEG data, the value of a MUSIC localizer function can then be computed (Sekihara et al., 1997, Equation (4)). Maxima of this localizer function then correspond to the locations of dipolar sources. We used the MUSIC algorithm to compute the localizer functions over a three-dimensional grid in the hemisphere contralateral to the stimulus delivery. The localizer was computed using the evoked magnetic field responses over a time window of 1000 msec following the onset of the sentences. For each subject, at each point in a 3-D grid (−4 < x < 6, 0 < y < 8, 3 < z < 11) in the left hemisphere, the localizer function was computed using the averaged evoked auditory magnetic field responses.
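A compact sketch of the conventional-MUSIC scan described above (our own simplification, not the Sekihara et al. (1997) code; the grid, lead-field routine, and subspace rank are placeholders): the signal subspace is estimated by SVD of the spatio-temporal data, and each grid point is scored by how much of its normalized lead field lies in that subspace.

import numpy as np

def music_scan(B, grid, lead_field, n_sources):
    """B: (n_sensors, n_times) evoked data; grid: iterable of candidate
    locations; lead_field(r): (n_sensors,) lead-field vector for location r
    (fixed orientation assumed for simplicity); n_sources: assumed rank of
    the signal subspace."""
    U, _, _ = np.linalg.svd(B, full_matrices=False)
    Es = U[:, :n_sources]               # signal subspace from the SVD
    scores = []
    for r in grid:
        l = lead_field(r)
        l = l / np.linalg.norm(l)
        # Localizer: squared projection of the normalized lead field onto
        # the signal subspace; it approaches 1 near a true dipole location.
        scores.append(np.linalg.norm(Es.T @ l) ** 2)
    return np.array(scores)             # maxima ~ dipolar source locations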

Acknowledgments

This work was funded in part by Scientific Learning Co. and a grant from the Whitaker Foundation.

Reprint requests should be sent to John F. Houde, W. M. Keck Center for Integrative Neuroscience, University of California, San Francisco, 513 Parnassus Avenue, S-762, San Francisco, CA 94143, USA, or via e-mail: houde@phy.ucsf.edu.

Note

1. In a recent study of M100 responses to tones, it was found that near detection thresholds, a 15-dB change in SPL caused a 34% change in M100 RMS amplitude (Stufflebeam et al., 1998).

REFERENCES

Ahissar, E., Nagarajan, S., Ahissar, M., Protopapas, A., Mahncke, H., & Merzenich, M. M. (2001). Speech comprehension is correlated with temporal response patterns recorded from auditory cortex. Proceedings of the National Academy of Sciences, U.S.A., 98, 13367–13372.

Blakemore, S. J., Frith, C. D., & Wolpert, D. M. (1999). Spatio-temporal prediction modulates the perception of self-produced stimuli. Journal of Cognitive Neuroscience, 11, 551–559.

Blakemore, S. J., Wolpert, D. M., & Frith, C. D. (1999). The cerebellum contributes to somatosensory cortical activity during self-produced tactile stimulation. Neuroimage, 10, 448–459.

Burnett, T. A., Freedland, M. B., Larson, C. R., & Hain, T. C. (1998). Voice F0 responses to manipulations in pitch feedback. Journal of the Acoustical Society of America, 103, 3153–3161.

Cooper, W. E. (1979). Speech perception and production: Studies in selective adaptation. Norwood, NJ: Ablex.

Creutzfeldt, O., & Ojemann, G. (1989). Neuronal activity in the human lateral temporal lobe: III. Activity changes during music. Experimental Brain Research, 77, 490–498.

Creutzfeldt, O., Ojemann, G., & Lettich, E. (1989a). Neuronal activity in the human lateral temporal lobe: I. Responses to speech. Experimental Brain Research, 77, 451–475.

Creutzfeldt, O., Ojemann, G., & Lettich, E. (1989b). Neuronal activity in the human lateral temporal lobe: II. Responses to the subject's own voice. Experimental Brain Research, 77, 476–489.

Curio, G., Neuloh, G., Numminen, J., Jousmaki, V., & Hari, R. (2000). Speaking modifies voice-evoked activity in the human auditory cortex. Human Brain Mapping, 9, 183–191.

Demonet, J. F., Chollet, F., Ramsay, S., Cardebat, D., Nespoulous, J. L., Wise, R., Rascol, A., & Frackowiak, R. (1992). The anatomy of phonological and semantic processing in normal subjects. Brain, 115, 1753–1768.

Diesch, E., & Luce, T. (1997). Magnetic fields elicited by tones and vowel formants reveal tonotopy and nonlinear summation of cortical activation. Psychophysiology, 34, 501–510.

Elman, J. L. (1981). Effects of frequency-shifted feedback on the pitch of vocal productions. Journal of the Acoustical Society of America, 70, 45–50.

Franklin, G. F., Powell, J. D., & Emami-Naeini, A. (1991). Feedback control of dynamic systems (2nd ed.). Reading, MA: Addison-Wesley.

Gracco, V. L., Ross, D., Kalinowski, J., & Stuart, A. (1994). Articulatory changes following spectral and temporal modifications in auditory feedback. Journal of the Acoustical Society of America, 95, 2821.

Gunji, A., Hoshiyama, M., & Kakigi, R. (2000). Identification of auditory evoked potentials of one's own voice. Clinical Neurophysiology, 111, 214–219.

Gunji, A., Kakigi, R., & Hoshiyama, M. (2000). Spatiotemporal source analysis of vocalization-associated magnetic fields. Cognitive Brain Research, 9, 157–163.

Hari, R., Aittoniemi, K., Jarvinen, M. L., Katila, T., & Varpula, T. (1980). Auditory evoked transient and sustained magnetic fields of the human brain: Localization of neural generators. Experimental Brain Research, 40, 237–240.

Hari, R., & Makela, J. P. (1988). Modification of neuromagnetic responses of the human auditory cortex by masking sounds. Experimental Brain Research, 71, 87–92.

Hein, A. V., & Held, R. (1962). A neural model for labile sensorimotor coordination. In E. Bernard & M. Hare (Eds.), Biological prototypes and synthetic systems (Vol. 1, pp. 71–74). New York: Plenum.

Hickok, G., & Poeppel, D. (2000). Towards a functional neuroanatomy of speech perception. Trends in Cognitive Sciences, 4, 131–138.

Hirano, S., Kojima, H., Naito, Y., Honjo, I., Kamoto, Y., Okazawa, H., Ishizu, K., Yonekura, Y., Nagahama, Y., Fukuyama, H., & Konishi, J. (1996). Cortical speech processing mechanisms while vocalizing visually presented languages. NeuroReport, 8, 363–367.

Hirano, S., Kojima, H., Naito, Y., Honjo, I., Kamoto, Y., Okazawa, H., Ishizu, K., Yonekura, Y., Nagahama, Y., Fukuyama, H., & Konishi, J. (1997). Cortical processing mechanism for vocalization with auditory verbal feedback. NeuroReport, 8, 2379–2382.

Hirano, S., Naito, Y., Okazawa, H., Kojima, H., Honjo, I., Ishizu, K., Yenokura, Y., Nagahama, Y., Fukuyama, H., & Konishi, J. (1997). Cortical activation by monaural speech sound stimulation demonstrated by positron emission tomography. Experimental Brain Research, 113, 75–80.

Houde, J. F., & Jordan, M. I. (1998). Sensorimotor adaptation in speech production. Science, 279, 1213–1216.

Jacobs, O. L. R. (1993). Introduction to control theory (2nd ed.). Oxford, UK: Oxford University Press.

Jeannerod, M. (1988). The neural and behavioural organisation of goal-directed movements. Oxford, UK: Oxford University Press.

Jurgens, U. (2002). Neural pathways underlying vocal control. Neuroscience and Biobehavioral Reviews, 26, 235–258.

Kawahara, H. (1993). Transformed auditory feedback: Effects of fundamental frequency perturbation. Journal of the Acoustical Society of America, 94, 1883.

Kuriki, S., Mori, T., & Hirata, Y. (1999). Motor planning center for speech articulation in the normal human brain. NeuroReport, 10, 765–769.

Kuriki, S., & Murase, M. (1989). Neuromagnetic study of the auditory responses in right and left hemispheres of the human brain evoked by pure tones and speech sounds. Experimental Brain Research, 77, 127–134.

Lane, H., & Tranel, B. (1971). The Lombard sign and the role of hearing in speech. Journal of Speech and Hearing Research, 14, 677–709.

Lee, B. S. (1950). Some effects of side-tone delay. Journal of the Acoustical Society of America, 22, 639–640.

Leonardo, A., & Konishi, M. (1999). Decrystallization of adult birdsong by perturbation of auditory feedback. Nature, 399, 466–470.

Levelt, W. J. (1983). Monitoring and self-repair in speech. Cognition, 14, 41–104.

Levelt, W. J. M. (1989). Speaking: From intention to articulation. Cambridge: MIT Press.

Liegeois-Chauvel, C., Musolino, A., Badier, J. M., Marquis, P., & Chauvel, P. (1994). Evoked potentials recorded from the auditory cortex in man: Evaluation and topography of the middle latency components. Electroencephalography and Clinical Neurophysiology, 92, 204–214.

Mahncke, H. (1998). Oscillations and temporal processing in sensory cortex. Unpublished doctoral dissertation, University of California, San Francisco.

McCasland, J. S., & Konishi, M. (1981). Interaction between auditory and motor activities in an avian song control nucleus. Proceedings of the National Academy of Sciences, U.S.A., 78, 7815–7819.

McGuire, P. K., Silbersweig, D. A., & Frith, C. D. (1996). Functional neuroanatomy of verbal self-monitoring. Brain, 119, 907–917.

Mosher, J. C., Lewis, P. S., & Leahy, R. M. (1992). Multiple dipole modeling and localization from spatio-temporal MEG data. IEEE Transactions on Biomedical Engineering, 39, 541–557.

Muller-Preuss, P., & Ploog, D. (1981). Inhibition of auditory cortical neurons during phonation. Brain Research, 215, 61–76.

Numminen, J., & Curio, G. (1999). Differential effects of overt, covert and replayed speech on vowel-evoked responses of the human auditory cortex. Neuroscience Letters, 272, 29–32.

Numminen, J., Salmelin, R., & Hari, R. (1999). Subject's own speech reduces reactivity of the human auditory cortex. Neuroscience Letters, 265, 119–122.

Papanicolaou, A. C., Raz, N., Loring, D. W., & Eisenberg, H. M. (1986). Brain stem evoked response suppression during speech production. Brain and Language, 27, 50–55.

Perkell, J. (1997). Articulatory processes. In W. J. Hardcastle & J. Laver (Eds.), The handbook of phonetic sciences (pp. 333–370). Oxford, UK: Blackwell Publishers.

Picton, T. W., Alain, C., Woods, D. L., John, M. S., Scherg, M., Valdes-Sosa, P., Bosch-Bayard, J., & Trujillo, N. J. (1999). Intracerebral sources of human auditory-evoked potentials. Audiology and Neuro-otology, 4, 64–79.

Poeppel, D., Yellin, E., Phillips, C., Roberts, T. P., Rowley, H. A., Wexler, K., & Marantz, A. (1996). Task-induced asymmetry of the auditory evoked M100 neuromagnetic field elicited by speech sounds. Brain Research: Cognitive Brain Research, 4, 231–242.

Reite, M., Adams, M., Simon, J., Teale, P., Sheeder, J., Richardson, D., & Grabbe, R. (1994). Auditory M100 component 1: Relationship to Heschl's gyri. Brain Research: Cognitive Brain Research, 2, 13–20.

Richer, F., Alain, C., Achim, A., Bouvier, G., & Saint-Hilaire, J. M. (1989). Intracerebral amplitude distributions of the auditory evoked potential. Electroencephalography and Clinical Neurophysiology, 74, 202–208.

Robinson, S., & Vrba, J. (1999). Functional neuroimaging by synthetic aperture magnetometry (SAM). In Yoshimoto et al. (Eds.), Recent advances in biomagnetism (pp. 302–305). Sendai, Japan: Tokyo University Press.

Scherg, M., & Von Cramon, D. (1985). Two bilateral sources of the late AEP as identified by a spatio-temporal dipole model. Electroencephalography and Clinical Neurophysiology, 62, 32–44.

Scherg, M., & Von Cramon, D. (1986). Evoked dipole source potentials of the human auditory cortex. Electroencephalography and Clinical Neurophysiology, 65, 344–360.

Schmidt, M. F., & Konishi, M. (1998). Gating of auditory responses in the vocal control system of awake songbirds. Nature Neuroscience, 1, 513–518.

Sekihara, K., Nagarajan, S. S., Poeppel, D., Marantz, A., & Miyashita, Y. (2001). Reconstructing spatio-temporal activities of neural sources using an MEG vector beamformer technique. IEEE Transactions on Biomedical Engineering, 48, 760–771.

Sekihara, K., Poeppel, D., Marantz, A., Koizumi, H., & Miyashita, Y. (1997). Noise covariance incorporated MEG-MUSIC algorithm: A method for multiple-dipole estimation tolerant of the influence of background brain activity. IEEE Transactions on Biomedical Engineering, 44, 839–847.

Stevens, K. N. (1999). Acoustic phonetics. Cambridge: MIT Press.

Stufflebeam, S. M., Poeppel, D., Rowley, H. A., & Roberts, T. P. (1998). Peri-threshold encoding of stimulus frequency and intensity in the M100 latency. NeuroReport, 9, 91–94.

Suga, N., & Schlegel, P. (1972). Neural attenuation of responses to emitted sounds in echolocating bats. Science, 177, 82–84.

Suga, N., & Shimozawa, T. (1974). Site of neural attenuation of responses to self-vocalized sounds in echolocating bats. Science, 183, 1211–1213.

von Bekesy, G. (1949). The structure of the middle ear and the hearing of one's own voice by bone conduction. Journal of the Acoustical Society of America, 21, 217–232.

Weiskrantz, L., Elliott, J., & Darlington, C. (1971). Preliminary observations on tickling oneself. Nature, 230, 598–599.

Woods, D. L., Knight, R. T., & Neville, H. J. (1984). Bitemporal lesions dissociate auditory evoked potentials and perception. Electroencephalography and Clinical Neurophysiology, 57, 208–220.

Yates, A. J. (1963). Delayed auditory feedback. Psychological Bulletin, 60, 213–232.

Zatorre, R. J., Evans, A. C., Meyer, E., & Gjedde, A. (1992). Lateralization of phonetic and pitch discrimination in speech processing. Science, 256, 846–849.

1138 Journal of Cognitive Neuroscience Volume 14 Number 8

for these magnetic field responses. These localization data suggest that the observed response suppression/attenuation arises from a restricted cortical region, presumably A1 and its immediate environs.

Overall, all subjects' M100 responses were localized to the temporal cortex in the area of the auditory cortex. We infer this because lesion studies, intracranial recordings, and source modeling studies provide converging evidence that the neuronal generators of the M100 response are located in the auditory regions of the temporal lobes and include A1 and its immediate environs (Ahissar et al., 2001; Picton et al., 1999; Liegeois-Chauvel, Musolino, Badier, Marquis, & Chauvel, 1994; Reite et al., 1994; Richer, Alain, Achim, Bouvier, & Saint-Hilaire, 1989; Scherg & Von Cramon, 1985, 1986; Woods, Knight, & Neville, 1984; Hari, Aittoniemi, Jarvinen, Katila, & Varpula, 1980).

Experiment 2

The results of Experiment 1 clearly confirmed that the M100 response is suppressed in the speaking condition. One possible mechanism that could cause this suppression is nonspecific attenuation: that is, neural signals arising as a result of activity in the motor cortex might directly and broadly inhibit auditory cortex activity.

If auditory cortex activity is nonspecifically suppressed during speaking, then it should be less responsive to all auditory signals. In Experiment 2, we tested this possibility by examining the M100 response to tones heard during speech that was either self-produced or played back from tape.

In this experiment, we measured the responses evoked by tone pips under three successive conditions, as shown in Figure 6. In the first condition (tones alone; Figure 6a), subjects remained silent and heard 1-kHz tone pips. In the second condition (tones and speaking; Figure 6b), subjects again heard these tone pips, but this time the tones were presented while subjects produced long utterances of the neutral vowel (again with no movement of their jaw or tongue). In the third condition (tones and tape playback; Figure 6c), subjects again remained silent and heard both tone pips and tape playback of their speech produced in the previous condition. For the last two conditions, in which the subject heard tone pips and speech, the speech was attenuated by 20 dB to reduce the masking of the tone pips.

Figure 6. Setup for the three conditions of Experiment 2.

The results of Experiment 2 are summarized in Figure 7. The figure shows a plot of the mean RMS response across detectors and subjects for each condition and in each hemisphere. Dashed traces show subjects' responses to hearing the tones in silence (tones alone condition). Thick traces show responses to hearing the tones while subjects produced the vowel (tones and speaking condition). Thin traces show responses to hearing the tones along with the taped speech playback (tones and tape playback condition). As in Figure 5, the vertical bars are standard errors. Asterisks indicate time intervals during which responses to tones heard while producing speech (tones and speaking condition) differed significantly (p < .001) from responses to tones heard with taped speech playback (tones and tape playback condition).

Figure 7. RMS waveforms for responses to 1-kHz tone pips averaged across all subjects for the three conditions of Experiment 2: the tones alone condition (dashed traces), the tones and speaking condition (thick traces), and the tones and tape playback condition (thin traces). The vertical bars in each trace show standard errors, while the asterisks show intervals where the response waveforms from the tones and speaking condition (red) and the tones and tape playback condition (blue) differed significantly (p < .001) for both hemispheres.

For most of the traces in Figure 7, the M100 and M200 responses to the tones across conditions stand out clearly. For both hemispheres, these responses are very prominent for tones heard in silence (dashed traces). In the left hemisphere, there was a reduction in M100 and M200 responses to the tones in the speaking (thick traces) and tape playback (thin traces) conditions. Interestingly, the response reduction in the speaking condition appeared to be greater than the response reduction in the tape playback condition. Indeed, the statistical analysis confirmed that the M100 response in the tones and speaking condition was significantly less than the M100 response in the tones and tape playback condition. Nevertheless, the difference was small. In the right hemisphere, there was an approximately equal reduction in M100 and M200 responses to the tones in the speaking and tape playback conditions. Within the M100–M200 region, responses in these two conditions were only significantly different at one latency, as shown by the asterisk.

It appears that, at most, the act of speaking creates only a limited degree of nonspecific attenuation to the tones in the left hemisphere that is not seen in the tape playback condition. To quantify how much nonspecific attenuation contributes to the suppression of the response to self-produced speech, we can estimate the signal reductions needed to produce the response reductions seen in these experiments. From the results of Experiment 1, we have seen that, compared to speech from the tape playback, self-production of speech created a 30% reduction in M100 amplitude in the left hemisphere and a 15% reduction in the right hemisphere, equivalent to 13- and 7-dB decreases in effective input signal, respectively.1 In Experiment 2, in the left hemisphere, there was possibly an extra 7% reduction in M100 response to tones while subjects spoke, as compared to the M100 response to tones during tape playback, which is equivalent to only a 3-dB reduction in effective input signal. Thus, it appears that nonspecific attenuation cannot account for all the response suppression recorded for self-produced speech.
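To make the arithmetic behind these dB equivalences explicit, here is a minimal sketch in Python. It assumes the near-threshold relation reported by Stufflebeam et al. (1998), a 15-dB change in SPL corresponding to a 34% change in M100 RMS amplitude, and further assumes that this relation scales linearly over the range considered:

```python
# Sketch: convert M100 RMS amplitude reductions into equivalent input-level
# changes, assuming the near-threshold relation of Stufflebeam et al. (1998)
# (a 15-dB SPL change <-> a 34% M100 RMS change) holds linearly.

DB_PER_PERCENT = 15.0 / 34.0  # ~0.44 dB of input level per 1% RMS change

for label, percent_reduction in [("left hemisphere, Exp. 1", 30.0),
                                 ("right hemisphere, Exp. 1", 15.0),
                                 ("left hemisphere, Exp. 2 extra", 7.0)]:
    equivalent_db = percent_reduction * DB_PER_PERCENT
    print(f"{label}: {percent_reduction:.0f}% M100 reduction "
          f"~ {equivalent_db:.1f} dB effective input decrease")
# Prints roughly 13 dB, 7 dB, and 3 dB, matching the values in the text.
```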

It has also been argued that any attenuation in the responses during vocalization could occur due to bone conduction and activation of middle-ear muscles, or due to inhibition at the level of the brainstem (Papanicolaou et al., 1986). Thus, to further test the hypothesis of nonspecific attenuation, we measured brainstem evoked responses to click trains in five subjects under three conditions identical to those in Experiment 2: silence, produced speech, and tape-recorded speech. In the latter two conditions, we attenuated the speech background by 20 dB to obtain a clear evoked response to clicks. For each of these conditions, we recorded the brainstem evoked response and measured the latency and amplitude of Waves I–V. The results from these experiments indicated that there were no statistical differences in either the amplitudes or the latencies of these waves between the three background conditions (p > .2). These results provide further support to the hypothesis that our observed effects were cortical in origin.

Experiment 3

Another possible cause of the response reduction to self-produced speech is that activity in the auditory cortex is the result of a comparison between the incoming signal and an internally generated prediction of what that signal would be. If this were the case, then altering the incoming signal should create a mismatch with that expectation that would reduce or abolish the response suppression.

In Experiment 3, we tested this possibility by altering the auditory feedback to the subject during self-production of speech. The setup for Experiment 3 is shown in Figure 8. Like Experiment 1, this experiment consisted of two conditions: speaking and tape playback. However, instead of hearing their speech (self-produced or from tape playback), subjects heard a sum of their speech plus the output of a white noise generator that produced noise bursts gated to the duration of the speech utterance. Subjects reported that they could not hear their own speech and that they heard only noise gated to their vowel production.

Figure 8. Setup for the two conditions of Experiment 3.

The results of Experiment 3 are shown in Figure 9. For both hemispheres, the thick traces show the average evoked response of subjects hearing gated noise in the speaking condition, while the thin traces show their responses to gated noise in the tape playback condition. Asterisks mark poststimulus latencies where the evoked responses in the two conditions differ significantly (p < .001) for both hemispheres. The figure shows that subjects' evoked responses in the two conditions are significantly different at a number of latencies. In the left hemisphere, the responses in the speaking and tape playback conditions differ significantly around the M100 responses and at a latency of 150 msec. In the right hemisphere, the speaking and tape playback responses differ significantly at the 50- and 200-msec latencies. Since we are limiting our analysis of these results to considering only M100 responses, the only response differences of interest are the significant differences seen in the M100 region of the left hemisphere responses. It is clear from the figure, however, that these response differences arise principally from a latency difference between M100 responses: the M100 response to gated noise in the tape playback condition is about 20 msec earlier than the M100 response to gated noise in the speaking condition. The amplitudes of the two M100 responses appear nearly identical.

Figure 9. RMS response waveforms in Experiment 3 averaged across all subjects. The vertical bars in each trace show standard errors, while the asterisks show intervals where the response waveforms from the speaking condition (thick traces) and tape playback condition (thin traces) differ significantly (p < .001) for both hemispheres.

In sum, Figure 9 shows that the M100 amplitude suppression seen in Experiment 1 is abolished in both hemispheres: the amplitude of the M100 response to noise generated from subjects' speech via the microphone is as large as the M100 response to noise generated from the tape playback of their speech. Thus, when we altered subjects' speech feedback in Experiment 3, we found that the response suppression to self-produced speech disappeared, which is consistent with the suppression resulting from auditory input matching an internally expected input.

DISCUSSION

The present study demonstrates that the human auditory cortex responds differently to self-produced speech than to externally produced speech. In Experiment 1, significant suppression in the amplitude of M100 responses was recorded in the speaking condition in both hemispheres, confirming that self-production of speech suppresses the response in the auditory cortex to that speech. In Experiment 2, M100 responses to tone pips were seen to be modestly more strongly suppressed by self-produced than by externally produced speech. However, this relatively weak extra suppression was not sufficient to explain the suppression seen in Experiment 1. On the other hand, in Experiment 3, we were able to abolish the suppression of M100 responses to self-produced speech by altering the feedback heard by subjects. This last result is consistent with the hypothesis that the suppression of response in the auditory cortex to self-produced speech results from a match with expected auditory feedback.

Comparison with Other Studies

Experiment 1 is quite similar to the study by Curio et al. (2000), which also examined M100 response differences between speaking and tape playback conditions. The authors found M100 amplitude differences across listening conditions in both hemispheres, but unlike the present study, these differences were not significant in the right hemisphere. Further, in contradiction to our studies, Curio et al. described significant differences in M100 latencies across listening conditions in both hemispheres. Different results in M100 latencies could be accounted for by the fact that we matched the acoustic waveform presented to each subject, while Curio et al. behaviorally matched the perceptual intensity between the listening conditions. Because subjects could set the tape playback volume to be different from their speech feedback volume, differing audio levels in the two experiment conditions could account for some of the M100 latency differences that they report. With MEG data acquisition triggered by speech signal onset, triggering occurs when the audio signal exceeds some threshold voltage. As a result, triggering time is sensitive to the amplification of the signal: the more a signal is amplified, the faster it will reach the trigger threshold. Thus, if tape playback volume levels were less than speech feedback volume levels, triggering from tape playback could be delayed compared to the speaking condition. If so, then for the same true M100 latency, M100 peaks would appear to have shorter latencies in the tape playback condition.
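The trigger-threshold argument can be illustrated with a small sketch; the synthetic onset ramp, threshold value, and sample rate below are illustrative assumptions, not values from the study:

```python
import numpy as np

# Sketch: why an audio-threshold trigger fires later on a quieter signal.
# A synthetic utterance onset with a linear amplitude ramp stands in for
# real speech; the threshold is illustrative.
fs = 10_000                           # samples per second
t = np.arange(0, 0.2, 1 / fs)         # 200 msec of signal
onset_ramp = np.clip(t / 0.05, 0, 1)  # amplitude grows over the first 50 msec
signal = onset_ramp * np.sin(2 * np.pi * 200 * t)

def trigger_time(x, threshold=0.1):
    """Return the time (sec) at which |x| first exceeds the threshold."""
    idx = np.argmax(np.abs(x) > threshold)
    return idx / fs

quiet, loud = 0.5 * signal, 2.0 * signal
print(trigger_time(loud), "<", trigger_time(quiet))
# The louder signal crosses the fixed threshold sooner, so a response with a
# fixed true latency appears later relative to the trigger; for the quieter
# signal the trigger itself is delayed, shortening the apparent latency.
```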

In Experiment 3, the amplitude difference in M100 responses between the speaking and tape playback conditions was abolished by using gated noise to distort the feedback heard by subjects. Interestingly, however, this feedback manipulation created a latency difference between the two conditions. In the left hemisphere, the M100 response to gated noise triggered by tape playback was 15 msec earlier than that for gated noise triggered by the subject's own speech, while in the right hemisphere such a latency difference was not seen. Curio et al. (2000) also found that the reduced latency of M100 responses to tape playback versus self-produced speech was significantly more pronounced in the left hemisphere. Like Curio et al., we attribute this left hemisphere latency difference to two factors. First, this latency difference may represent differential processing of externally produced versus self-produced auditory feedback. Although the gated noise altered the spectral characteristics of subjects' speech feedback such that it was likely to mismatch any internally generated expectations of how their feedback would sound, the gated noise did not alter the temporal characteristics of their speech feedback: subjects still heard auditory feedback begin at the same time as they commenced phonation. This temporal predictability could potentially have allowed subjects to process gated noise arising from their own speech, whose onset they could predict, differently from gated noise arising from tape playback, whose onset they could not predict. Second, the fact that this latency difference was seen primarily in the left hemisphere is consistent with the left hemisphere's dominance in the production of speech in right-handed speakers. Other aspects of our results may also reflect this left hemisphere dominance. In Experiment 1, the M100 response to tape playback speech was larger in the left hemisphere than in the right, and the M100 response to self-produced speech was temporally sharper in the left hemisphere than in the right. Additionally, in Experiment 2, in the left hemisphere, the M100 response to tones was significantly more suppressed by self-produced speech than by tape playback speech, a difference not seen in the right hemisphere.

Results from Experiment 2 are mostly consistent with the recent experiments by Numminen and Curio (1999). They reported that neuromagnetic responses to short recorded vowel sounds were delayed and dampened relative to background-free presentations. They did not report how much they externally attenuated the produced speech signal in order to obtain a clear response to the probe-vowel sound. In our experiments, we used probe tones instead of vowel sounds, and we attenuated the produced speech signal to the ear by 20 dB. Surprisingly, we did not observe any latency differences in the M100 responses to tones between the three conditions. We believe that the lack of a latency effect could be accounted for by the fact that we used tones instead of vowel sounds (Diesch & Luce, 1997; Poeppel et al., 1996; Kuriki & Murase, 1989). Furthermore, by an argument analogous to that discussed above for the Curio et al. (2000) latency results, it is also reasonable that any amplitude mismatches in the masking sounds could account for discrepancies in the latency effect (Hari & Makela, 1988). However, while Numminen and Curio identified auditory interference as the main cause of these modifications, their results also provide evidence against the nonspecific attenuation hypothesis, although they do not specifically discuss their results in this form. For instance, they found an additional 6–9% decrease in amplitude in response to probe-vowel sounds during a produced speech background when compared with a tape-recorded speech background. These amplitude differences are consistent with our Experiment 2 results.

Gunji et al. have found that vocalization-related cortical magnetic fields reflect six sources that temporally overlap in the period 0–100 msec after vocalization onset (Gunji, Hoshiyama, & Kakigi, 2000; Gunji, Kakigi, & Hoshiyama, 2000). Sources 1 and 2 were activated approximately 150 msec before vocalization onset and were located in laryngeal motor areas of the left and right hemispheres, respectively. Kuriki, Mori, and Hirata (1999) have also found a motor planning center for speech articulation that is activated 120–320 msec before the onset of vocalization and located in a region around the superior end of the left insula. Gunji et al.'s Sources 5 and 6 were located in the truncal motor area in each hemisphere and were similar to Sources 1 and 2. In contrast, their Sources 3 and 4 were located in the auditory cortices of the left and right hemispheres, respectively, and were activated after vocalization onset. However, these experiments did not explore the modulation of responses in the auditory or motor cortex due to vocalization or feedback alteration. In our experiments, because we were unable to simultaneously measure activity from motor and supplementary motor areas using a 37-channel sensor array over each hemisphere, we focus here only on the activity from the auditory cortices of both hemispheres.

More relevant to our results are the studies of Hirano et al. (Hirano et al., 1996; Hirano, Kojima, et al., 1997; Hirano, Naito, et al., 1997), who used PET to look at cortical activation during speaking. In their first study, Hirano et al. found a lack of activation in the auditory cortex while subjects vocalized. In their second study, Hirano et al. altered how subjects heard their own vocalizations and this time recorded significant activation of the auditory cortex. These results are consistent with our findings in Experiments 1 and 3.

Why Is the Response to Self-Produced Speech Suppressed?

The present study, along with recent MEG studies by Curio et al. and other brain imaging studies that have used PET or fMRI, demonstrates that during speech production the auditory cortex suppresses its response to expected acoustic signals (Hirano et al., 1996; Hirano, Kojima, et al., 1997; Hirano, Naito, et al., 1997). Does that suppression serve any function? We speculate about two possibilities.

Auditory Perception

One possibility is that the suppression results from the process of distinguishing self-produced from externally produced sources. It would be useful for sensory systems to make this distinction, since sensory information from these two types of sources is commonly used for different purposes. Sensory information from self-produced motor actions could potentially be used for feedback control; sensory information from external sources, on the other hand, is primarily used for recognition.

But how could the sensory systems distinguish self-produced from externally produced sensory information? It has been proposed by previous researchers that this is done by having motor actions produce expectations of their sensory consequences (Jeannerod, 1988). These sensory outcome predictions are then compared with the actual sensory input: whatever matches the outcome predictions is inhibited. In that way, the sensory consequences of self-produced actions could be filtered from incoming sensory data, thus highlighting any unpredicted, and therefore external (or novel), stimulus that should potentially be attended to.
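As a conceptual illustration only (not a model used in this study), the following toy sketch attenuates input samples that match an internal prediction within a tolerance and passes mismatched input through; the tolerance and suppression gain are made-up values:

```python
import numpy as np

# Toy sketch of the comparison idea: responses to input that matches an
# internally generated prediction are attenuated, while mismatched input
# passes through unattenuated. All parameter values are illustrative.
def residual_response(sensory_input, predicted_input,
                      tolerance=0.1, suppression_gain=0.3):
    mismatch = np.abs(sensory_input - predicted_input)
    matched = mismatch < tolerance        # samples explained by the prediction
    gain = np.where(matched, suppression_gain, 1.0)
    return gain * sensory_input

rng = np.random.default_rng(1)
feedback = rng.standard_normal(1000)                       # self-produced feedback
prediction = feedback + 0.02 * rng.standard_normal(1000)   # accurate prediction
external = rng.standard_normal(1000)                       # unpredicted external sound

print(np.abs(residual_response(feedback, prediction)).mean())  # strongly suppressed
print(np.abs(residual_response(external, prediction)).mean())  # largely unsuppressed
```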

For a number of reasons, the results of our experiments are consistent with the above account. First, although we found significant suppression of the responses of the auditory cortex to self-produced speech, we found little evidence that the act of producing speech suppresses responses to other sounds (in our case, the 1-kHz tone). This finding is consistent with the hypothesis that only responses to self-produced sounds are being filtered from the auditory cortex. In addition, we found that altering the auditory feedback with gated noise abolished the suppression in the auditory cortex induced by speaking, consistent with the hypothesis that the filtering out of auditory information arising from self-produced speech is done by comparison with predicted auditory feedback.

Recently, Blakemore et al. have recorded the same types of response suppression phenomena in the somatosensory cortex, suggesting that the need to distinguish self-produced from externally generated sensory input may be a general property of sensory systems. In their series of experiments, Blakemore et al. used both behavioral and physiological procedures to look at responses to both self-produced and externally produced somatosensory stimulation. In one experiment, they examined the well-known phenomenon that people cannot tickle themselves (Weiskrantz, Elliott, & Darlington, 1971). This phenomenon could be explained by supposing that somatosensory responses are inhibited when the incoming sensory (touch) information matches a prediction of that input generated by the person's own motor system. To test this hypothesis, these investigators had subjects tickle their palm via an apparatus that could introduce a time delay between their movement and the touch stimulation of their palm (Blakemore, Frith, & Wolpert, 1999). When this was done, subjects reported that with increasing delay the stimulation of their palm felt increasingly ticklish. That the perceived stimulation increased with delay could be explained by supposing that the delay made the incoming sensory information mismatch in time the predicted sensory input. This experiment was followed up in an fMRI study looking at activation of the somatosensory cortex and cerebellum to both self-produced and externally generated palm stimulation (Blakemore, Wolpert, & Frith, 1999). It was found that activation of the somatosensory cortex to self-produced palm stimulation was suppressed compared to externally produced palm stimulation, a direct analog of the results recorded in the auditory cortex in our experiments.

Speech Motor Control

The suppression could also serve to regulate how much auditory feedback is used in the ongoing control of speech.

A problem with using auditory feedback to control speech is the delay incurred by the processing of the auditory input. To control speech, auditory feedback must convey information about the positions of the vocal tract articulators. From speech acoustics, we know that formant frequencies and pitch are the acoustic parameters that convey this information (Stevens, 1999). Extracting these parameters from the type of signal provided by the cochlea requires significant signal processing, which in turn suggests the possibility of significant delay if a succession of auditory brain areas is needed for this processing (Jurgens, 2002; Burnett, Freedland, Larson, & Hain, 1998; Perkell, 1997). In addition, there will be delays associated with the generation of a response in the motor cortex. There will also be intrinsic delays in transmitting the neural signals to the appropriate muscles, as well as delays in muscle responses to these signals (Jurgens, 2002). Together, these factors could contribute to a significant feedback loop delay between auditory input and motor output.

Unfortunately, we know from control theory that, in general, using feedback to control a system with a large feedback loop delay will not work because the system becomes unstable (Franklin, Powell, & Emami-Naeini, 1991). Intuitively, this instability arises because the delay makes the feedback uncorrelated with the state of the system: sensory feedback from some time in the past does not entirely reflect the current state of the system. Yet auditory feedback does appear to be involved in the online control of some aspects of speech production. As noted in the Introduction, if the pitch of a speaker's feedback is perturbed, they will typically compensate within 100–200 msec (Burnett et al., 1998; Kawahara, 1993; Elman, 1981). Thus, in spite of a large (100–200 msec) feedback loop delay, auditory feedback does stably regulate a speaker's pitch. How is this possible? Again from control theory, we know that when feedback is delayed, corrupted by noise, or otherwise made uncorrelated with current motor output, a successful approach to using feedback control is to attenuate the sensory feedback to the degree that it is uncorrelated. Specifically, in feedback control based on Kalman filtering, the Kalman gain on sensory feedback is set, in part, to be inversely proportional to the degree to which sensory feedback is correlated with the current system state (Jacobs, 1993). In an analogous way, it may be that the suppression of the auditory cortex during speaking functions to attenuate the auditory feedback to a level commensurate with its delay, such that it can be used properly in the control of speech.
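The inverse relation can be seen in the scalar Kalman gain. The following minimal sketch illustrates the control-theoretic point only; it is not the authors' model, and the variance values are arbitrary:

```python
# Scalar Kalman-gain sketch (illustration, not the authors' model): the gain
# applied to sensory feedback, K = P / (P + R), shrinks as the measurement
# noise variance R grows, i.e., as feedback becomes less correlated with
# the current state (for example, because it is delayed).
def kalman_gain(state_variance, measurement_noise_variance):
    return state_variance / (state_variance + measurement_noise_variance)

P = 1.0  # prior uncertainty about the current state
for R in (0.1, 1.0, 10.0):  # increasingly unreliable feedback
    print(f"R = {R:5.1f}  ->  feedback gain K = {kalman_gain(P, R):.2f}")
# K falls from ~0.91 toward ~0.09: noisy or stale feedback is down-weighted.
```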

Finally, it is possible that both of the above accounts are true. It may be that the suppression of self-produced speech acts as a filter for incoming sensory data, allowing attention to be paid to externally produced sensations, while at the same time properly attenuating the auditory feedback for use in the online control of speech. Future experiments must be designed to distinguish between the above hypothesized possibilities.

METHODS

Subjects

Eight male volunteers (ages 25–40) participated in Experiments 1 and 3 of this study. Of these volunteers, six also participated in Experiment 2. All subjects gave their informed consent. All studies were performed with the approval of the UCSF Committee for Human Research.

Experimental Design

Magnetic fields were recorded from both hemispheres in a shielded room using two 37-channel biomagnetometers with SQUID-based first-order gradiometer sensors (Magnes II, 4-D Neuroimaging, San Diego, CA). A schematic of the experimental setup is shown in Figure 1. A directional microphone was placed in the chamber at a distance where it did not distort the recorded magnetic fields. Triggering of the data acquisition system and of the stimuli delivered to the ears was controlled, and it differed for each of the three experiments in the study.

Fiduciary points were marked on the skin for later coregistration with structural magnetic resonance images, and the head shape was digitized to constrain subsequent source modeling. The sensor was initially positioned over the estimated location of the auditory cortices in both hemispheres, such that a dipolar M100 response was evoked by single 400-msec duration tone pips (1 kHz, 5-msec rise/fall ramps, 90 dB SPL). Data acquisition epochs were 600 msec in total duration, with a 100-msec prestimulus period referenced to stimulus onset. Data were acquired at a sampling rate of 1041 Hz. The position of the sensor was then refined so that a single dipole localization model of the averaged evoked magnetic field responses to 100 tones resulted in a correlation and goodness-of-fit greater than 95%.

With satisfactory sensor positioning over the auditory cortices in both hemispheres, but before starting the experiments, subjects practiced producing the neutral vowel sound by phonating while relaxing their upper vocal tract, generating no movement of their jaw or tongue. Preliminary experiments revealed that this form of speech production produced no movement artifacts in these magnetic field recordings.

During the speaking condition of Experiments 1 and 3, the audio onset of each utterance, picked up by the microphone, triggered the MEG data acquisition system, which recorded 1 sec of data (300 msec pretrigger, 700 msec posttrigger) at a sampling rate of 1041 Hz. Subjects were instructed to produce short vowel sounds by phonating their vocal cords without any movement of their jaw or tongue. Responses to 100 such utterances were recorded, and these magnetic field recordings were then averaged to obtain the evoked magnetic field responses to self-produced speech for each MEG detector channel. In the subsequent tape playback condition, the audio signal from the tape recorder was fed to the subject's earphones and to the trigger input for MEG data acquisition. Again, magnetic field recordings were averaged to obtain the evoked magnetic field responses.


During the tape playback condition of Experiments 1 and 3, the audio level of the signal fed to the earphones and trigger was adjusted to be the same as that of the speaking condition. Because of this, the trigger of MEG data acquisition (which was based on an audio voltage threshold) occurred at the same time relative to utterance onset in both conditions. The equal audio levels ensured that triggering latency differences could not account for any differences between the evoked responses to self-produced and tape-recorded speech. In Experiment 1, the subject heard these equal audio levels in the earphones, which meant that his side tone amplitude was the same in both the speaking and tape playback conditions. In Experiment 3, the audio signal for each utterance was used to trigger both the MEG data acquisition system and a white noise generator (20 Hz–20 kHz bandwidth, 85 dB SPL). In this case, a sum of the subject's audio signal and the noise was presented to the subject.
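The gated-noise feedback manipulation can be sketched as follows; the envelope threshold, smoothing window, and relative noise level are illustrative assumptions, since the study specifies only the noise bandwidth and presentation level:

```python
import numpy as np

def gate_noise_to_speech(speech, fs, threshold=0.02, noise_level=0.5, seed=0):
    """Sum speech with white noise gated to the utterance: noise is on only
    while the speech envelope exceeds a threshold. Threshold, window, and
    level here are illustrative, not the study's values."""
    rng = np.random.default_rng(seed)
    envelope = np.abs(speech)
    # Smooth the rectified signal with a ~10-msec moving average.
    win = max(1, int(0.010 * fs))
    envelope = np.convolve(envelope, np.ones(win) / win, mode="same")
    gate = (envelope > threshold).astype(float)
    noise = noise_level * rng.standard_normal(len(speech))
    return speech + gate * noise  # subjects heard speech plus gated noise
```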

In Experiment 2, onset of the tone pips triggered MEG data acquisition in each of the three conditions (tones alone, tones and speaking, and tones and tape playback). These 1-kHz tones each had a 400-msec duration, 5-msec onset and offset ramps, and a 0.8–1.2-sec ISI between tones. The subject heard the tones at 90 dB SPL. As with Experiment 1, data acquisition epochs were 1.0 sec in total duration, with a 300-msec prestimulus period. Data were acquired at a sampling rate of 1041 Hz. In each condition, the evoked magnetic field responses to the tone pips were obtained by averaging 100 epochs.
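For concreteness, a sketch of the tone-pip synthesis follows, using the parameters stated above; the audio sample rate and the linear ramp shape are assumptions:

```python
import numpy as np

# Sketch: synthesize one tone pip with the parameters given in the text
# (1 kHz, 400-msec duration, 5-msec onset/offset ramps). The sample rate
# is an assumption, not a value from the study.
fs = 44_100                                   # assumed audio sample rate (Hz)
dur, ramp = 0.400, 0.005                      # seconds
t = np.arange(int(dur * fs)) / fs
pip = np.sin(2 * np.pi * 1000.0 * t)          # 1-kHz carrier

env = np.ones_like(pip)                       # linear onset/offset ramps
n_ramp = int(ramp * fs)
env[:n_ramp] = np.linspace(0.0, 1.0, n_ramp)
env[-n_ramp:] = np.linspace(1.0, 0.0, n_ramp)
pip *= env                                    # ramped tone pip, ready to present
```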

Epochs 1500 msec long, with a 500-msec prestimulus period, were collected at a sampling rate of 1041 Hz. The raw epoch magnetic field data from these 100 utterances were then averaged to obtain the evoked magnetic field responses to gated noise during self-production.

The voltage level of the audio signal fed to the earphones and trigger in this condition was adjusted to be identical to the voltage levels recorded during the speaking condition, ensuring that triggering differences could not account for any differences between the evoked responses to noise in the self-produced and tape playback blocks.

Data Analysis

Data from the two sensor arrays positioned over each hemisphere were analyzed separately. From the evoked magnetic field response for each condition in each channel, the RMS value of evoked magnetic field strength averaged across channels was computed. The RMS method of averaging (square root of the mean of squared channel field strengths) was used to avoid the cancellation that occurs when the magnetic fields from detectors on opposite sides of the current dipole (whose fields are roughly equal in magnitude but opposite in sign) are averaged together. This process resulted in one RMS time waveform for each hemisphere, for each condition, for each subject. For statistical comparisons across conditions, a one-way analysis of variance was performed with condition as a factor and time as a repeated measure.
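A minimal sketch of the RMS-across-channels computation described above (the array shapes are illustrative):

```python
import numpy as np

# Sketch of the RMS measure: given an evoked response array of shape
# (n_channels, n_times), compute, at each time point, the square root of
# the mean squared field strength across channels.
def rms_across_channels(evoked):
    """evoked: (n_channels, n_times) array of averaged field strengths."""
    return np.sqrt(np.mean(evoked ** 2, axis=0))  # one RMS time waveform

# Example with simulated data: 37 channels, 600 time samples.
rng = np.random.default_rng(0)
evoked = rng.standard_normal((37, 600))
waveform = rms_across_channels(evoked)            # shape (600,)
```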

The RMS across channels is a convenient measure of activity that is independent of the position of individual sensors. Following optimal placement of the sensor for a significant M100 dipolar response, RMS has been shown to correlate well with the "equivalent current-dipole strength" (also called the "Q value") of a dipole positioned in and around the auditory cortex. The RMS measure is also correlated with the data obtained from (1) the channel with the biggest signal-to-noise ratio and (2) the first two principal components of the array response (Ahissar et al., 2001; Mahncke, 1998).

In sensor arrays composed of planar gradiometers (like the Neuromag 122 or 306), the channel with the largest signal-to-noise ratio is often closest to the source (as dictated by the physics). However, in the case of sensor arrays composed of axial gradiometers (as in our Magnes II system), the channel with the largest signal-to-noise ratio reflects the peak of the magnetic field, which is not the channel closest to a source. Moreover, the "best" channel could reflect either a peak "inward" or "outward" field, requiring sign inversion when averaging such data across subjects. Finally, since the sensor array position can vary considerably across subjects, it is difficult to use procedures typical for EEG and whole-head studies, wherein single-channel data are often picked for analysis across subjects. These considerations led us to use RMS instead of picking specific channels for our analysis.

Single Dipole Model Fit

A single equivalent current-dipole model was calculated separately for each hemisphere for each poststimulus time point, using standard 4-D Neuroimaging software operating on data that had been filtered between 1 and 20 Hz. The localization algorithm used an iterative least-squares minimization to compute the strength and location of a single dipole in a spherical volume of uniform conductivity that could account for the sensor data. Dipole fits were accepted based on a local correlation maximum criterion of 95% and goodness-of-fit values greater than 95%.

The MEG anatomical reference frame was established using a digital device called the "sensor position indicator." A set of three table-mounted receivers and one array-mounted receiver triangulates the signal from a stylus transmitter positioned at fiducial reference points on the subject's head surface (typically nasion, left and right preauricular points, Cz, and inion). The same stylus transmitter arrangement served to define the curvature of the head by tracing the surface, from which a "local sphere" model of the head was generated. To coregister the dipole locations on the MRI image, the fiduciary points serve as the basis for a common coordinate system. By superimposing these fiducial landmarks on MR images of the subject, it was possible to define the position of the computed point sources with an accuracy of 5 mm.

Multiple Dipole Localization Analysis

Multiple dipole localization analysis of the spatio-temporal evoked magnetic fields was also performed using Multiple Signal Classification (MUSIC) (Sekihara, Poeppel, Marantz, Koizumi, & Miyashita, 1997; Mosher, Lewis, & Leahy, 1992) and beamformer algorithms (Mosher et al., 1992; Sekihara, Nagarajan, Poeppel, Marantz, & Miyashita, 2001; Robinson & Vrba, 1999). MUSIC methods are based on estimation of a signal "subspace" from the entire spatio-temporal MEG data using singular-value decomposition (SVD). A version of the MUSIC algorithm, referred to as the "conventional" MUSIC algorithm, was implemented in MATLAB under the assumption that the sources contributing to the MEG data arose from multiple dipoles (<37 in number) that were stationary and located within a spherical volume of uniform conductivity (Sekihara et al., 1997). The locations of dipoles are typically determined by conducting a search over a three-dimensional grid of interest within the head. Given the sensor positions and the coordinates of the origin of a "local sphere" approximation of the head shape for each subject, a lead-field matrix was computed for each point in this 3-D grid. At each point in this grid, from the lead-field matrix and the covariance matrix of the spatio-temporal MEG data, the value of a MUSIC localizer function can then be computed (Sekihara et al., 1997, Equation (4)). Maxima of this localizer function then correspond to the locations of dipolar sources. We used the MUSIC algorithm to compute the localizer functions over a three-dimensional grid in the hemisphere contralateral to the stimulus delivery. The localizer was computed using the evoked magnetic field responses over a time window of 1000 msec following stimulus onset. For each subject, at each point in a 3-D grid (−4 < x < 6, 0 < y < 8, 3 < z < 11) in the left hemisphere, the localizer function was computed using the averaged evoked auditory magnetic field responses.
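A simplified sketch of a conventional MUSIC localizer evaluated at one grid point follows. The notation is simplified from Sekihara et al. (1997), and the reciprocal-of-residual form shown is a standard textbook variant, not necessarily the exact implementation used in this study:

```python
import numpy as np

# Sketch of the conventional MUSIC localizer idea: project a grid point's
# lead-field vector onto the signal subspace estimated from the data
# covariance; points whose lead field lies almost entirely in that
# subspace score highly (peaks mark likely dipole locations).
def music_localizer(leadfield, covariance, n_sources):
    """leadfield: (n_channels,) vector for one grid point;
    covariance: (n_channels, n_channels) data covariance matrix;
    n_sources: assumed dimension of the signal subspace."""
    eigvals, eigvecs = np.linalg.eigh(covariance)      # ascending eigenvalues
    signal_subspace = eigvecs[:, -n_sources:]          # top eigenvectors
    l = leadfield / np.linalg.norm(leadfield)
    projected = signal_subspace.T @ l
    subspace_corr = np.sum(projected ** 2)             # in [0, 1]
    return 1.0 / (1.0 - subspace_corr + 1e-12)         # peaks at source locations
```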

Acknowledgments

This work was funded in part by Scientific Learning Co. and a grant from the Whitaker Foundation.

Reprint requests should be sent to John F. Houde, W. M. Keck Center for Integrative Neuroscience, University of California, San Francisco, 513 Parnassus Avenue, S-762, San Francisco, CA 94143, USA, or via e-mail: houde@phy.ucsf.edu.

Note

1. In a recent study of M100 responses to tones, it was found that near detection thresholds, a 15-dB change in SPL caused a 34% change in M100 RMS amplitude (Stufflebeam et al., 1998).

REFERENCES

Ahissar, E., Nagarajan, S., Ahissar, M., Protopapas, A., Mahncke, H., & Merzenich, M. M. (2001). Speech comprehension is correlated with temporal response patterns recorded from auditory cortex. Proceedings of the National Academy of Sciences, U.S.A., 98, 13367–13372.

Blakemore, S. J., Frith, C. D., & Wolpert, D. M. (1999). Spatio-temporal prediction modulates the perception of self-produced stimuli. Journal of Cognitive Neuroscience, 11, 551–559.

Blakemore, S. J., Wolpert, D. M., & Frith, C. D. (1999). The cerebellum contributes to somatosensory cortical activity during self-produced tactile stimulation. Neuroimage, 10, 448–459.

Burnett, T. A., Freedland, M. B., Larson, C. R., & Hain, T. C. (1998). Voice F0 responses to manipulations in pitch feedback. Journal of the Acoustical Society of America, 103, 3153–3161.

Cooper, W. E. (1979). Speech perception and production: Studies in selective adaptation. Norwood, NJ: Ablex.

Creutzfeldt, O., & Ojemann, G. (1989). Neuronal activity in the human lateral temporal lobe: III. Activity changes during music. Experimental Brain Research, 77, 490–498.

Creutzfeldt, O., Ojemann, G., & Lettich, E. (1989a). Neuronal activity in the human lateral temporal lobe: I. Responses to speech. Experimental Brain Research, 77, 451–475.

Creutzfeldt, O., Ojemann, G., & Lettich, E. (1989b). Neuronal activity in the human lateral temporal lobe: II. Responses to the subject's own voice. Experimental Brain Research, 77, 476–489.

Curio, G., Neuloh, G., Numminen, J., Jousmaki, V., & Hari, R. (2000). Speaking modifies voice-evoked activity in the human auditory cortex. Human Brain Mapping, 9, 183–191.

Demonet, J. F., Chollet, F., Ramsay, S., Cardebat, D., Nespoulous, J. L., Wise, R., Rascol, A., & Frackowiak, R. (1992). The anatomy of phonological and semantic processing in normal subjects. Brain, 115, 1753–1768.

Diesch, E., & Luce, T. (1997). Magnetic fields elicited by tones and vowel formants reveal tonotopy and nonlinear summation of cortical activation. Psychophysiology, 34, 501–510.

Elman, J. L. (1981). Effects of frequency-shifted feedback on the pitch of vocal productions. Journal of the Acoustical Society of America, 70, 45–50.

Franklin, G. F., Powell, J. D., & Emami-Naeini, A. (1991). Feedback control of dynamic systems (2nd ed.). Reading, MA: Addison-Wesley.

Gracco, V. L., Ross, D., Kalinowski, J., & Stuart, A. (1994). Articulatory changes following spectral and temporal modifications in auditory feedback. Journal of the Acoustical Society of America, 95, 2821.

Gunji, A., Hoshiyama, M., & Kakigi, R. (2000). Identification of auditory evoked potentials of one's own voice. Clinical Neurophysiology, 111, 214–219.

Gunji, A., Kakigi, R., & Hoshiyama, M. (2000). Spatiotemporal source analysis of vocalization-associated magnetic fields. Cognitive Brain Research, 9, 157–163.

Hari, R., Aittoniemi, K., Jarvinen, M. L., Katila, T., & Varpula, T. (1980). Auditory evoked transient and sustained magnetic fields of the human brain: Localization of neural generators. Experimental Brain Research, 40, 237–240.

Hari, R., & Makela, J. P. (1988). Modification of neuromagnetic responses of the human auditory cortex by masking sounds. Experimental Brain Research, 71, 87–92.

Hein, A. V., & Held, R. (1962). A neural model for labile sensorimotor coordination. In E. Bernard & M. Hare (Eds.), Biological prototypes and synthetic systems (Vol. 1, pp. 71–74). New York: Plenum.

Hickok, G., & Poeppel, D. (2000). Towards a functional neuroanatomy of speech perception. Trends in Cognitive Sciences, 4, 131–138.

Hirano, S., Kojima, H., Naito, Y., Honjo, I., Kamoto, Y., Okazawa, H., Ishizu, K., Yonekura, Y., Nagahama, Y., Fukuyama, H., & Konishi, J. (1996). Cortical speech processing mechanisms while vocalizing visually presented languages. NeuroReport, 8, 363–367.

Hirano, S., Kojima, H., Naito, Y., Honjo, I., Kamoto, Y., Okazawa, H., Ishizu, K., Yonekura, Y., Nagahama, Y., Fukuyama, H., & Konishi, J. (1997). Cortical processing mechanism for vocalization with auditory verbal feedback. NeuroReport, 8, 2379–2382.

Hirano, S., Naito, Y., Okazawa, H., Kojima, H., Honjo, I., Ishizu, K., Yonekura, Y., Nagahama, Y., Fukuyama, H., & Konishi, J. (1997). Cortical activation by monaural speech sound stimulation demonstrated by positron emission tomography. Experimental Brain Research, 113, 75–80.

Houde, J. F., & Jordan, M. I. (1998). Sensorimotor adaptation in speech production. Science, 279, 1213–1216.

Jacobs, O. L. R. (1993). Introduction to control theory (2nd ed.). Oxford, UK: Oxford University Press.

Jeannerod, M. (1988). The neural and behavioural organisation of goal-directed movements. Oxford, UK: Oxford University Press.

Jurgens, U. (2002). Neural pathways underlying vocal control. Neuroscience and Biobehavioral Reviews, 26, 235–258.

Kawahara, H. (1993). Transformed auditory feedback: Effects of fundamental frequency perturbation. Journal of the Acoustical Society of America, 94, 1883.

Kuriki, S., Mori, T., & Hirata, Y. (1999). Motor planning center for speech articulation in the normal human brain. NeuroReport, 10, 765–769.

Kuriki, S., & Murase, M. (1989). Neuromagnetic study of the auditory responses in right and left hemispheres of the human brain evoked by pure tones and speech sounds. Experimental Brain Research, 77, 127–134.

Lane, H., & Tranel, B. (1971). The Lombard sign and the role of hearing in speech. Journal of Speech and Hearing Research, 14, 677–709.

Lee, B. S. (1950). Some effects of side-tone delay. Journal of the Acoustical Society of America, 22, 639–640.

Leonardo, A., & Konishi, M. (1999). Decrystallization of adult birdsong by perturbation of auditory feedback. Nature, 399, 466–470.

Levelt, W. J. (1983). Monitoring and self-repair in speech. Cognition, 14, 41–104.

Levelt, W. J. M. (1989). Speaking: From intention to articulation. Cambridge: MIT Press.

Liegeois-Chauvel, C., Musolino, A., Badier, J. M., Marquis, P., & Chauvel, P. (1994). Evoked potentials recorded from the auditory cortex in man: Evaluation and topography of the middle latency components. Electroencephalography and Clinical Neurophysiology, 92, 204–214.

Mahncke, H. (1998). Oscillations and temporal processing in sensory cortex. Unpublished doctoral dissertation, University of California, San Francisco.

McCasland, J. S., & Konishi, M. (1981). Interaction between auditory and motor activities in an avian song control nucleus. Proceedings of the National Academy of Sciences, U.S.A., 78, 7815–7819.

McGuire, P. K., Silbersweig, D. A., & Frith, C. D. (1996). Functional neuroanatomy of verbal self-monitoring. Brain, 119, 907–917.

Mosher, J. C., Lewis, P. S., & Leahy, R. M. (1992). Multiple dipole modeling and localization from spatio-temporal MEG data. IEEE Transactions on Biomedical Engineering, 39, 541–557.

Muller-Preuss, P., & Ploog, D. (1981). Inhibition of auditory cortical neurons during phonation. Brain Research, 215, 61–76.

Numminen, J., & Curio, G. (1999). Differential effects of overt, covert and replayed speech on vowel-evoked responses of the human auditory cortex. Neuroscience Letters, 272, 29–32.

Numminen, J., Salmelin, R., & Hari, R. (1999). Subject's own speech reduces reactivity of the human auditory cortex. Neuroscience Letters, 265, 119–122.

Papanicolaou, A. C., Raz, N., Loring, D. W., & Eisenberg, H. M. (1986). Brain stem evoked response suppression during speech production. Brain and Language, 27, 50–55.

Perkell, J. (1997). Articulatory processes. In W. J. Hardcastle & J. Laver (Eds.), The handbook of phonetic sciences (pp. 333–370). Oxford, UK: Blackwell Publishers.

Picton, T. W., Alain, C., Woods, D. L., John, M. S., Scherg, M., Valdes-Sosa, P., Bosch-Bayard, J., & Trujillo, N. J. (1999). Intracerebral sources of human auditory-evoked potentials. Audiology and Neuro-otology, 4, 64–79.

Poeppel, D., Yellin, E., Phillips, C., Roberts, T. P., Rowley, H. A., Wexler, K., & Marantz, A. (1996). Task-induced asymmetry of the auditory evoked M100 neuromagnetic field elicited by speech sounds. Brain Research: Cognitive Brain Research, 4, 231–242.

Reite, M., Adams, M., Simon, J., Teale, P., Sheeder, J., Richardson, D., & Grabbe, R. (1994). Auditory M100 component 1: Relationship to Heschl's gyri. Brain Research: Cognitive Brain Research, 2, 13–20.

Richer, F., Alain, C., Achim, A., Bouvier, G., & Saint-Hilaire, J. M. (1989). Intracerebral amplitude distributions of the auditory evoked potential. Electroencephalography and Clinical Neurophysiology, 74, 202–208.

Robinson, S., & Vrba, J. (1999). Functional neuroimaging by synthetic aperture magnetometry (SAM). In Yoshimoto et al. (Eds.), Recent advances in biomagnetism (pp. 302–305). Sendai, Japan: Tohoku University Press.

Scherg, M., & Von Cramon, D. (1985). Two bilateral sources of the late AEP as identified by a spatio-temporal dipole model. Electroencephalography and Clinical Neurophysiology, 62, 32–44.

Scherg, M., & Von Cramon, D. (1986). Evoked dipole source potentials of the human auditory cortex. Electroencephalography and Clinical Neurophysiology, 65, 344–360.

Schmidt, M. F., & Konishi, M. (1998). Gating of auditory responses in the vocal control system of awake songbirds. Nature Neuroscience, 1, 513–518.

Sekihara, K., Nagarajan, S. S., Poeppel, D., Marantz, A., & Miyashita, Y. (2001). Reconstructing spatio-temporal activities of neural sources using an MEG vector beamformer technique. IEEE Transactions on Biomedical Engineering, 48, 760–771.

Sekihara, K., Poeppel, D., Marantz, A., Koizumi, H., & Miyashita, Y. (1997). Noise covariance incorporated MEG-MUSIC algorithm: A method for multiple-dipole estimation tolerant of the influence of background brain activity. IEEE Transactions on Biomedical Engineering, 44, 839–847.

Stevens, K. N. (1999). Acoustic phonetics. Cambridge: MIT Press.

Stufflebeam, S. M., Poeppel, D., Rowley, H. A., & Roberts, T. P. (1998). Peri-threshold encoding of stimulus frequency and intensity in the M100 latency. NeuroReport, 9, 91–94.

Suga, N., & Schlegel, P. (1972). Neural attenuation of responses to emitted sounds in echolocating bats. Science, 177, 82–84.

Suga, N., & Shimozawa, T. (1974). Site of neural attenuation of responses to self-vocalized sounds in echolocating bats. Science, 183, 1211–1213.

von Bekesy, G. (1949). The structure of the middle ear and the hearing of one's own voice by bone conduction. Journal of the Acoustical Society of America, 21, 217–232.

Weiskrantz, L., Elliott, J., & Darlington, C. (1971). Preliminary observations on tickling oneself. Nature, 230, 598–599.

Woods, D. L., Knight, R. T., & Neville, H. J. (1984). Bitemporal lesions dissociate auditory evoked potentials and perception. Electroencephalography and Clinical Neurophysiology, 57, 208–220.

Yates, A. J. (1963). Delayed auditory feedback. Psychological Bulletin, 60, 213–232.

Zatorre, R. J., Evans, A. C., Meyer, E., & Gjedde, A. (1992). Lateralization of phonetic and pitch discrimination in speech processing. Science, 256, 846–849.

the tones and speaking condition was significantly lessthan the M100 response in the tones and tape playbackcondition Nevertheless the difference was small In theright hemisphere there was an approximately equalreduction in M100 and M200 responses to the tones inthe speaking and tape playback conditions Within theM100ndashM200 region responses in these two conditionswere only significantly different at one latency as shownby the asterisk

It appears that at the most the act of speakingcreates only a limited degree of nonspecific attenuationto the tones in the left hemisphere that is not seen inthe tape playback condition To quantify how muchnonspecific attenuation contributes to the suppressionof the response to self-produced speech we can esti-mate the signal reductions needed to produce theresponse reductions seen in these experiments Fromthe results of Experiment 1 we have seen that com-pared to speech from the tape playback self-productionof speech created a 30 reduction in M100 amplitudein the left hemisphere and a 15 reduction in the righthemisphere equivalent to 13 and 7 dB decreases ineffective input signal respectively1 In Experiment 2 inthe left hemisphere there was possibly an extra 7reduction in M100 response to tones while subjectsspoke as compared to the M100 response to tones duringtape playback which is equivalent to only a 3-dB reduc-tion in effective input signal Thus it appears that non-specific attenuation cannot account for all the responsesuppression recorded for self-produced speech

It has also been argued that any attenuation in theresponses during vocalization could occur due to boneconduction and activation of middle-ear muscles or dueto inhibition at the level of the brain stem (Papanicolaouet al 1986) Thus to further test the hypothesis ofnonspecific attenuation we measured brainstem evokedresponses to click trains in five subjects under threeconditions identical to those in Experiment 2mdashsilenceproduced speech and tape-recorded speech In the lattertwo conditions we attenuated the speech background by20 dB to obtain a clear evoked response to clicks Foreach of these conditions we recorded the brainstemevoked response and measured the latency and ampli-tude of Waves IndashV The results from these experimentsindicated that there were no statistical differences ineither the amplitude or the latencies for all wavesbetween the three background conditions ( p gt 2)These results provide further support to the hypothesisthat our observed effects were cortical in origin

Experiment 3

Another possible cause of the response reduction toself-produced speech is that activity in the auditory

Figure 7 RMS waveforms for responses to 10-kHz tone pipsaveraged across all subjects for the three conditions of Experiment 2the tones alone condition (dashed traces) the tones and speakingcondition (thick traces) and the tones and tape playback condition(thin traces) The vertical bars in each trace show standard errorswhile the asterisks show intervals where the response waveforms fromthe tones and speaking condition (red) and the tones and tapeplayback condition (blue) differed significantly relative to p lt 001 forboth hemispheres

Figure 8 Setup for the two conditions of Experiment 3

1130 Journal of Cognitive Neuroscience Volume 14 Number 8

cortex is the result of a comparison between the incom-ing signal and an internally generated prediction of whatthat signal would be If this were to be the case thenaltering the incoming signal should create a mismatchwith that expectation that would reduce or abolish theresponse suppression

In Experiment 3 we tested this possibility by alter-ing the auditory feedback to the subject during self-production of speech The setup for Experiment 3 isshown in Figure 8 Like Experiment 1 this experimentconsisted of two conditions speaking and tape play-back However instead of hearing their speech (self-produced or from tape playback) subjects heard a sumof their speech plus the output of a white noise gen-erator that produced noise bursts gated to the durationof the speech utterance Subjects reported that theycould not hear their own speech and that they onlyheard noise gated to their vowel production

The results of Experiment 3 are shown in Figure 9 Forboth hemispheres the thick traces show the averageevoked response of subjects hearing gated noise in thespeaking condition while the thin traces show theirresponses to gated noise in the tape playback conditionAsterisks mark poststimulus latencies where the evokedresponses in the two conditions differ significantlyrelative to p lt 001 for both hemispheres The figureshows subjectsrsquo evoked responses in the two conditionsare significantly different at a number of latencies In theleft hemisphere the responses in the speaking and tapeplayback conditions differ significantly around the M100responses and at a latency of 150 msec In the righthemisphere the speaking and tape playback responses

differ significantly at the 50- and 200-msec latenciesSince we are limiting our analysis of these results toconsidering only M100 responses the only responsedifferences of interest are the significant differences seenin the M100 region of the left hemisphere responses Itis clear from the figure however that these responsedifferences arise principally from a latency differencebetween M100 responses the M100 response to gatednoise in the tape playback condition is about 20 msecearlier that the M100 response to gated noise in thespeaking condition The amplitudes of the two M100responses appear nearly identical

In sum Figure 9 shows that the M100 amplitudesuppression seen in Experiment 1 is abolished in bothhemispheres the amplitude of M100 response to noisegenerated from subjectsrsquo speech via the microphone isas large as the M100 response to noise generated fromthe tape playback of their speech Thus when wealtered subjectsrsquo speech feedback in Experiment 3 wefound that the response suppression to self-producedspeech disappeared which is consistent with the sup-pression resulting from auditory input matching aninternally expected input

DISCUSSION

The present study demonstrates that the human audi-tory cortex responds differently to self-produced speechthan to externally produced speech In Experiment 1significant suppression in the amplitude of M100responses was recorded in the speaking condition inboth hemispheres confirming that self-production ofspeech suppresses the response in the auditory cortexto that speech In Experiment 2 M100 responses totone pips were seen to be modestly more stronglysuppressed by self-produced than by externally pro-duced speech However this relatively weak extra sup-pression was not sufficient to explain the suppressionseen in Experiment 1 On the other hand in Experi-ment 3 we were able to abolish the suppression ofM100 responses to self-produced speech by altering thefeedback heard by subjects This last result is consistentwith the hypothesis that the suppression of response inthe auditory cortex to self-produced speech resultsfrom a match with expected auditory feedback

Comparison with Other Studies

Experiment 1 is quite similar to the study by Curio et al. (2000), which also examined M100 response differences between speaking and tape playback conditions. The authors found M100 amplitude differences across listening conditions in both hemispheres, but unlike the present study, these differences were not significant in the right hemisphere. Further, in contradiction to our studies, Curio et al. described significant differences in M100 latencies across listening conditions in both hemispheres. Different results in M100 latencies could be accounted for by the fact that we matched the acoustic waveform presented to each subject, while Curio et al. behaviorally matched the perceptual intensity between the listening conditions. Because subjects could set the tape playback volume to be different from their speech feedback volume, differing audio levels in the two experiment conditions could account for some of the M100 latency differences that they report. With MEG data acquisition triggered by speech signal onset, triggering occurs when the audio signal exceeds some threshold voltage. As a result, triggering time is sensitive to the amplification of the signal: the more a signal is amplified, the faster it will reach the trigger threshold. Thus, if tape playback volume levels were less than speech feedback volume levels, triggering from tape playback could be delayed compared to the speaking condition. If so, then for the same true M100 latency, M100 peaks would appear to have shorter latencies in the tape playback condition.

Figure 9. RMS response waveforms in Experiment 3, averaged across all subjects. The vertical bars in each trace show standard errors, while the asterisks show intervals where the response waveforms from the speaking condition (thick traces) and tape playback condition (thin traces) differ significantly (p < .001) for both hemispheres.
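
The threshold-triggering bias discussed above is easy to demonstrate with a minimal sketch (Python/NumPy; the onset envelope, rise time, and threshold are illustrative assumptions, not the study's acquisition parameters): an attenuated copy of the same onset crosses a fixed trigger voltage later, shifting the apparent latency of everything measured relative to the trigger.

import numpy as np

fs = 10000                                  # samples per second (illustrative)
t = np.arange(int(0.05 * fs)) / fs          # 50 msec of signal
onset = 1.0 - np.exp(-t / 0.010)            # toy onset envelope, 10-msec rise
threshold = 0.5                             # fixed trigger voltage

def trigger_latency_ms(signal):
    idx = int(np.argmax(signal > threshold))    # first threshold crossing
    return 1000.0 * idx / fs

for gain in (1.0, 0.7):                     # unity gain vs. 30% attenuation
    print(gain, trigger_latency_ms(gain * onset))

With this toy 10-msec rise, a 30% attenuation delays the trigger crossing by roughly 5 msec, which is the direction of bias described above.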

In Experiment 3, the amplitude difference in M100 responses between the speaking and tape playback conditions was abolished by using gated noise to distort the feedback heard by subjects. Interestingly, however, this feedback manipulation created a latency difference between the two conditions. In the left hemisphere, the M100 response to gated noise triggered by tape playback was 15 msec earlier than that for gated noise triggered by the subject's own speech, while in the right hemisphere, such a latency difference is not seen. Curio et al. (2000) also found that the reduced latency of M100 responses to tape playback versus self-produced speech was significantly more pronounced in the left hemisphere. Like Curio et al., we attribute this left hemisphere latency difference to two factors. First, this latency difference may represent differential processing of externally produced versus self-produced auditory feedback. Although the gated noise altered the spectral characteristics of subjects' speech feedback such that it was likely to mismatch any internally generated expectations of how their feedback would sound, the gated noise did not alter the temporal characteristics of their speech feedback: subjects still heard auditory feedback begin at the same time as they commenced phonation. This temporal predictability could potentially have allowed subjects to process gated noise arising from their own speech, whose onset they could predict, differently from gated noise arising from tape playback, whose onset they could not predict. Second, the fact that this latency difference was seen primarily in the left hemisphere is consistent with the left hemisphere's dominance in the production of speech in right-handed speakers. Other aspects of our results may also reflect this left hemisphere dominance. In Experiment 1, the M100 response to tape playback speech was larger in the left hemisphere than in the right, and the M100 response to self-produced speech was temporally sharper in the left hemisphere than in the right. Additionally, in Experiment 2, in the left hemisphere, the M100 response to tones was significantly more suppressed by self-produced speech than by tape playback speech, a difference not seen in the right hemisphere.

Results from Experiment 2 are mostly consistent with the recent experiments by Numminen and Curio (1999). They reported that neuromagnetic responses to short recorded vowel sounds were delayed and dampened relative to background-free presentations. They did not report how much they externally attenuated the produced speech signal in order to obtain a clear response to the probe-vowel sound. In our experiments, we used probe tones instead of vowel sounds, and we attenuated the produced speech signal to the ear by 20 dB. Surprisingly, we did not observe any latency differences in the M100 responses for tones between the three conditions. We believe that the lack of a latency effect could be accounted for by the fact that we used tones instead of vowel sounds (Diesch & Luce, 1997; Poeppel et al., 1996; Kuriki & Murase, 1989). Furthermore, by an argument analogous to that discussed above for the Curio et al. (2000) latency results, it is also reasonable that any amplitude mismatches in the masking sounds could account for discrepancies in the latency effect (Hari & Makela, 1988). However, while Numminen and Curio identified auditory interference as the main cause of these modifications, their results also provide evidence against the nonspecific attenuation hypothesis, although they do not specifically discuss their results in this form. For instance, they found an additional 6–9% decrease in amplitude in response to probe-vowel sounds during a produced speech background when compared with a tape-recorded speech background. These amplitude differences are consistent with our Experiment 2 results.

Gunji et al. have found that vocalization-related cortical magnetic fields activated six sources that temporally overlap in the period 0–100 msec after vocalization onset (Gunji, Hoshiyama, & Kakigi, 2000; Gunji, Kakigi, & Hoshiyama, 2000). Sources 1 and 2 were activated approximately 150 msec before the vocalization onset and were located in laryngeal motor areas of the left and right hemispheres, respectively. Kuriki, Mori, and Hirata (1999) have also found a motor planning center for speech articulation that is activated 120–320 msec before the onset of vocalization and located in a region around the superior end of the left insula. Gunji et al.'s Sources 5 and 6 were located in the truncal motor area in each hemisphere and were similar to Sources 1 and 2. In contrast, their Sources 3 and 4 were located in the auditory cortices of the left and right hemispheres, respectively, and were activated after vocalization onset. However, these experiments did not explore the modulation of responses in the auditory or motor cortex due to either vocalization or feedback alteration. In our experiments, because we were unable to simultaneously measure activity from motor and supplementary motor areas using a 37-channel sensor array over each hemisphere, we focus here only on the activity from the auditory cortices of both hemispheres.

More relevant to our results are the studies of Hirano et al. (Hirano et al., 1996; Hirano, Kojima, et al., 1997; Hirano, Naito, et al., 1997), who used PET to look at cortical activation during speaking. In their first study, Hirano et al. found a lack of activation in the auditory cortex while subjects vocalized. In their second study, Hirano et al. altered how subjects heard their own vocalizations and this time recorded significant activation of the auditory cortex. These results are consistent with our findings in Experiments 1 and 3.

Why is the Response to Self-Produced Speech Suppressed?

The present study, along with recent MEG studies by Curio et al. and other brain imaging studies that have used PET or fMRI, demonstrates that during speech production, the auditory cortex suppresses its response to expected acoustic signals (Hirano et al., 1996; Hirano, Kojima, et al., 1997; Hirano, Naito, et al., 1997). Does that suppression serve any function? We speculate about two possibilities.

Auditory Perception

One possibility is that the suppression results from the process of distinguishing self-produced from externally produced sources. It would be useful for sensory systems to make this distinction, since sensory information from these two types of sources is commonly used for different purposes. Sensory information from self-produced motor actions could potentially be used for feedback control. On the other hand, sensory information from external sources is primarily used for recognition.

But how could the sensory systems distinguish self-produced from externally produced sensory information? It has been proposed by previous researchers that this is done by having motor actions produce expectations of their sensory consequences (Jeannerod, 1988). These sensory outcome predictions are then compared with the actual sensory input: whatever matches the outcome predictions is inhibited. In that way, the sensory consequences of self-produced actions could be filtered from incoming sensory data, thus highlighting any unpredicted, and therefore external (or novel), stimulus that should potentially be attended to.
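
A toy comparator makes this account explicit (a hypothetical sketch, not a model from the paper; the tolerance and suppression gain are arbitrary): input that matches the efference-copy prediction is attenuated, while unpredicted input passes through.

import numpy as np

def comparator_response(incoming, predicted, tolerance=0.1, suppression=0.8):
    # Attenuate whatever matches the motor system's prediction (within
    # `tolerance`); unpredicted input passes at full gain. The tolerance
    # and suppression values are arbitrary illustrative assumptions.
    match = np.abs(incoming - predicted) < tolerance
    return np.where(match, 1.0 - suppression, 1.0) * incoming

rng = np.random.default_rng(1)
self_produced = rng.standard_normal(1000)
prediction = self_produced + 0.02 * rng.standard_normal(1000)  # accurate efference copy
external = rng.standard_normal(1000)                           # unpredicted source

print(np.mean(np.abs(comparator_response(self_produced, prediction))))  # strongly suppressed
print(np.mean(np.abs(comparator_response(external, prediction))))       # nearly untouched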

For a number of reasons, the results of our experiments are consistent with the above account. First, although we found significant suppression of the responses of the auditory cortex to self-produced speech, we found little evidence that the act of producing speech suppresses responses to other sounds (in our case, the 1-kHz tone). This finding is consistent with the hypothesis that only responses to self-produced sounds are being filtered from the auditory cortex. In addition, we found that altering the auditory feedback with gated noise abolished the suppression in the auditory cortex induced by speaking, consistent with the hypothesis that the filtering out of auditory information arising from self-produced speech is done by comparison with predicted auditory feedback.

Recently, Blakemore et al. have recorded the same types of response suppression phenomena in the somatosensory cortex, suggesting that the need to distinguish self-produced from externally generated sensory input may be a general property of sensory systems. In their series of experiments, Blakemore et al. used both behavioral and physiological procedures to look at responses to both self-produced and externally produced somatosensory stimulation. In one experiment, they examined the well-known phenomenon that people cannot tickle themselves (Weiskrantz, Elliott, & Darlington, 1971). This phenomenon could be explained by supposing that somatosensory responses are inhibited when the incoming sensory (touch) information matches a prediction of that input generated by the person's own motor system. To test this hypothesis, these investigators had subjects tickle their palm via an apparatus that could introduce a time delay between their movement and the touch stimulation of their palm (Blakemore, Frith, & Wolpert, 1999). When this was done, subjects reported that with increasing delay, the stimulation of their palm was increasingly stimulating (ticklish). That the perceived stimulation increased with delay could be explained by supposing that the delay made the incoming sensory information mismatch in time the predicted sensory input. This experiment was followed up in an fMRI study looking at activation of the somatosensory cortex and cerebellum to both self-produced and externally generated palm stimulation (Blakemore, Wolpert, & Frith, 1999). It was found that activation of the somatosensory cortex to self-produced palm stimulation was suppressed compared to externally produced palm stimulation, a direct analog of the results recorded in the auditory cortex in our experiments.

Speech Motor Control

The suppression could also serve to regulate how much auditory feedback is used in the ongoing control of speech.

A problem with using auditory feedback to control speech is the delay incurred by the processing of the auditory input. To control speech, auditory feedback must convey information about the positions of the vocal tract articulators. From speech acoustics, we know that formant frequencies and pitch are the acoustic parameters that convey this information (Stevens, 1999). Extracting these parameters from the type of signal provided by the cochlea requires significant signal processing, which in turn suggests the possibility of significant delay if a succession of auditory brain areas is needed for this processing (Jurgens, 2002; Burnett, Freedland, Larson, & Hain, 1998; Perkell, 1997). In addition, there will be delays associated with the generation of a response in the motor cortex. There will also be intrinsic delays in transmitting the neural signals to the appropriate muscles, as well as delays in muscle responses to these signals (Jurgens, 2002). Together, these factors could contribute to a significant feedback loop delay between auditory input and motor output.

Unfortunately, we know from control theory that, in general, using feedback to control a system with a large feedback loop delay will not work because the system becomes unstable (Franklin, Powell, & Emami-Naeini, 1991). Intuitively, this instability arises because the delay makes the feedback uncorrelated with the state of the system: sensory feedback from some time in the past does not entirely reflect the current state of the system. Yet auditory feedback does appear to be involved in the online control of some aspects of speech production. As noted in the Introduction, if the pitch of a speaker's feedback is perturbed, they will typically compensate within 100–200 msec (Burnett et al., 1998; Kawahara, 1993; Elman, 1981). Thus, in spite of a large (100–200 msec) feedback loop delay, auditory feedback does stably regulate a speaker's pitch. How is this possible? Again from control theory, we know that when feedback is delayed, corrupted by noise, or otherwise made uncorrelated with current motor output, a successful approach to using feedback control is to attenuate the sensory feedback to the degree that it is uncorrelated. Specifically, in feedback control based on Kalman filtering, the Kalman gain on sensory feedback is set in part to be inversely proportional to the degree to which sensory feedback is correlated with the current system state (Jacobs, 1993). In an analogous way, it may be that the suppression of the auditory cortex during speaking functions to attenuate the auditory feedback to a level commensurate with its delay, such that it can be used properly in the control of speech.
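
Both halves of this argument can be sketched. First, a toy first-order plant under proportional control of a delayed measurement (the gain and delays below are illustrative, chosen only to place the two cases on opposite sides of the stability boundary) goes unstable once the loop delay grows large:

import numpy as np

def simulate(delay_ms, k=20.0, dt_ms=1.0, t_end_ms=2000.0):
    # First-order plant x' = -x + u with proportional feedback on a
    # delayed measurement, u(t) = -k * x(t - delay). Euler integration.
    n, d = int(t_end_ms / dt_ms), int(delay_ms / dt_ms)
    x = np.zeros(n)
    x[0] = 1.0                               # initial disturbance
    for i in range(1, n):
        x_delayed = x[i - 1 - d] if i - 1 - d >= 0 else 0.0
        x[i] = x[i - 1] + (dt_ms / 1000.0) * (-x[i - 1] - k * x_delayed)
    return x

print(np.abs(simulate(10.0)).max())          # short loop delay: disturbance decays
print(np.abs(simulate(150.0)).max())         # 150-msec delay: oscillation grows unstable

Second, the Kalman-gain point can be stated in standard form (see, e.g., Jacobs, 1993): the state estimate weights the measurement residual by

\[
  K = P H^{\mathsf{T}} \left( H P H^{\mathsf{T}} + R \right)^{-1},
  \qquad
  \hat{x}_k = \hat{x}_k^{-} + K \left( y_k - H \hat{x}_k^{-} \right),
\]

so as the effective measurement-noise covariance R grows (feedback that is delayed or otherwise decorrelated from the current state), K shrinks toward zero and the controller relies on its internal prediction rather than on the feedback, which is the attenuation analogy drawn in the text.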

Finally, it is possible that both of the above accounts are true. It may be that the suppression of self-produced speech acts as a filter for incoming sensory data, allowing attention to be paid to externally produced sensations, while at the same time properly attenuating the auditory feedback for use in the online control of speech. Future experiments must be designed to distinguish between the above hypothesized possibilities.

METHODS

Subjects

Eight male volunteers (ages 25–40) participated in Experiments 1 and 3 of this study. Of these volunteers, six also participated in Experiment 2. All subjects gave their informed consent. All studies were performed with the approval of the UCSF Committee for Human Research.

Experimental Design

Magnetic fields were recorded from both hemispheres in a shielded room using two 37-channel biomagnetometers with SQUID-based first-order gradiometer sensors (Magnes II, 4-D Neuroimaging, San Diego, CA). A schematic of the experimental setup is shown in Figure 1. A directional microphone was placed in the chamber at a distance where it did not distort the recorded magnetic fields. Triggering of the data acquisition system and of stimuli delivered to the ears was controlled, and it differed for each of the three experiments in the study.

Fiduciary points were marked on the skin for later coregistration with structural magnetic resonance images, and the head shape was digitized to constrain subsequent source modeling. The sensor was initially positioned over the estimated location of the auditory cortices in both hemispheres, such that a dipolar M100 response was evoked by single 400-msec duration tone pips (1 kHz, 5-msec rise/fall ramps, 90 dB SPL). Data acquisition epochs were 600 msec in total duration, with a 100-msec prestimulus period referenced to the onset of the first speech stimulus. Data were acquired at a sampling rate of 1041 Hz. The position of the sensor was then refined so that a single dipole localization model of the averaged evoked magnetic field responses to 100 tones resulted in a correlation and goodness-of-fit greater than 95%.

With satisfactory sensor positioning over the auditory cortices in both hemispheres, but before starting the experiments, subjects practiced producing the neutral vowel sound by phonating while relaxing their upper vocal tract, generating no movement of their jaw or tongue. Preliminary experiments revealed that this form of speech production produced no movement artifacts in these magnetic field recordings.

During the speaking condition of Experiments 1 and 3, the audio onset of each utterance, picked up by the microphone, triggered the MEG data acquisition system, which recorded 1 sec of data (300 msec pretrigger, 700 msec posttrigger) at a sampling rate of 1041 Hz. Subjects were instructed to produce short vowel sounds by phonating their vocal cords without any movement of their jaw or tongue. Responses to 100 such utterances were recorded, and these magnetic field recordings were then averaged to obtain the evoked magnetic field responses to self-produced speech for each MEG detector channel. In the subsequent tape playback condition, the audio signal from the tape recorder was fed to the subject's earphones and the trigger input for MEG data acquisition. Again, magnetic field recordings were averaged to obtain the evoked magnetic field responses.
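
In outline, the per-channel evoked response is just a trial average over these threshold-triggered epochs. A minimal sketch under the acquisition parameters stated above (the random array below is only a stand-in for real recordings):

import numpy as np

fs = 1041                               # sampling rate (Hz), as in the text
n_trials, n_channels = 100, 37          # 100 utterances, 37 sensors per hemisphere
n_samples = int(1.0 * fs)               # 1-sec epochs (300 msec pretrigger)

rng = np.random.default_rng(0)
epochs = rng.standard_normal((n_trials, n_channels, n_samples))  # stand-in for data

evoked = epochs.mean(axis=0)                          # (37, n_samples) evoked field
time_ms = (np.arange(n_samples) / fs - 0.3) * 1000    # 0 = trigger (utterance onset)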


During the tape playback condition of Experiments 1 and 3, the audio level of the signal fed to the earphones and trigger was adjusted to be the same as that of the speaking condition. Because of this, the trigger of MEG data acquisition (which was based on an audio voltage threshold) occurred at the same time relative to utterance onset in both conditions. The equal audio levels ensured that triggering latency differences could not account for any differences between the evoked responses to self-produced and tape-recorded speech. In Experiment 1, the subject heard these equal audio levels in the earphones, which meant that his side tone amplitude was the same in both the speaking and tape playback conditions. In Experiment 3, the audio signal for each utterance was used to trigger both the MEG data acquisition system and a white noise generator (20 Hz–20 kHz bandwidth, 85 dB SPL). In this case, a sum of the subject's audio signal and noise was presented to the subject.

In Experiment 2, onset of the tone pips triggered MEG data acquisition in each of the three conditions (tones alone, tones and speaking, and tones and tape playback). These 1-kHz tones each had a 400-msec duration, 5-msec onset and offset ramps, and a 0.8–1.2-sec ISI between tones. The subject heard the tones at 90 dB SPL. As with Experiment 1, data acquisition epochs were 1.0 sec in total duration, with a 300-msec prestimulus period. Data were acquired at a sampling rate of 1041 Hz. In each condition, the evoked magnetic field responses to the tone pips were obtained by averaging 100 epochs.

Epochs 1500 msec long, with a 500-msec prestimulus period, were collected at a sampling rate of 1041 Hz. The raw epoch magnetic field data from these 100 utterances were then averaged to obtain the evoked magnetic field responses to gated noise during self-production.

The voltage level of the audio signal fed to the earphone and trigger signal in this condition was adjusted to be identical to the voltage levels recorded during the speaking condition, ensuring that triggering differences could not account for any differences between the evoked responses to noise in the self-produced and tape playback blocks.

Data Analysis

Data from the two sensor arrays positioned over each hemisphere were separately analyzed. From the evoked magnetic field response for each condition in each channel, the RMS value of evoked magnetic field strength averaged across channels was computed. The RMS method of averaging (square root of the mean of squared channel field strengths) was used to avoid the cancellation that occurs when the magnetic fields from detectors on opposite sides of the current dipole (whose fields are roughly equal in magnitude but opposite in sign) are averaged together. This process resulted in one RMS time waveform for each hemisphere, for each condition, for each subject. For statistical comparisons across conditions, a one-way analysis of variance was performed with condition as factor and time as a repeated measure.
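
The RMS-across-channels computation described here is straightforward; a minimal sketch:

import numpy as np

def rms_across_channels(evoked):
    # Square root of the mean of squared channel field strengths at each
    # time point; avoids the cancellation that plain averaging produces
    # when sensors sit on opposite sides of the dipole (opposite signs).
    return np.sqrt(np.mean(evoked ** 2, axis=0))      # evoked: (channels, times)

# Yields one RMS time waveform per hemisphere, condition, and subject.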

The RMS across channels is a convenient measure of activity that is independent of the position of individual sensors. Following optimal placement of the sensor for a significant M100 dipolar response, RMS has been shown to correlate well with the "equivalent current-dipole strength" (also called the "Q value") of a dipole positioned in and around the auditory cortex. The RMS measure is also correlated with the data obtained from (1) the channel with the biggest signal-to-noise ratio and (2) the first two principal components of the array response (Ahissar et al., 2001; Mahncke, 1998).

In sensor arrays composed of planar gradiometers (like the Neuromag 122 or 306), the channel with the largest signal-to-noise ratio is often closest to the source (as dictated by the physics). However, in the case of sensor arrays composed of axial gradiometers (as in our Magnes II system), the channel with the largest signal-to-noise ratio reflects the peak of the magnetic field, which is not the channel closest to a source. Moreover, the "best" channel could reflect either a peak "inward" or "outward" field, requiring signed inversion while averaging such data across subjects. Finally, since the sensor array position can vary considerably across subjects, it is difficult to use procedures typical for EEG and whole-head studies, wherein single channel data are often picked for analysis across subjects. These considerations have led us to use RMS instead of picking specific channels for our analysis.

Single Dipole Model Fit

A single equivalent current-dipole model was calculated separately for each hemisphere for each poststimulus time point, using standard 4-D Neuroimaging software operating on data that had been filtered between 1 and 20 Hz. The localization algorithm used an iterative least-squares minimization to compute the strength and location of a single dipole in a spherical volume of uniform conductivity that can account for the sensor data. Dipole fits were accepted based on a local correlation maximum criterion of 95% and goodness-of-fit values greater than 95%.
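
For orientation, the sketch below shows the shape of such an iterative least-squares dipole fit. It is not the 4-D Neuroimaging implementation: as a simplifying assumption it uses only the primary-current field term (dipole in an unbounded homogeneous conductor), whereas the actual software models a dipole in a conducting sphere, and the sensor geometry here is invented.

import numpy as np
from scipy.optimize import least_squares

MU0_OVER_4PI = 1e-7

def dipole_field(params, sensor_pos, sensor_ori):
    # Field of a current dipole (position r0, moment q) at each sensor,
    # projected onto the sensor orientation. Primary-current term only.
    r0, q = params[:3], params[3:]
    d = sensor_pos - r0
    b = MU0_OVER_4PI * np.cross(q, d) / np.linalg.norm(d, axis=1)[:, None] ** 3
    return np.sum(b * sensor_ori, axis=1)

def fit_dipole(measured, sensor_pos, sensor_ori, x0):
    # Iterative least-squares fit of dipole location and moment.
    res = least_squares(lambda p: dipole_field(p, sensor_pos, sensor_ori) - measured, x0)
    predicted = dipole_field(res.x, sensor_pos, sensor_ori)
    gof = 1.0 - np.sum((measured - predicted) ** 2) / np.sum(measured ** 2)
    return res.x, gof                   # accept, e.g., only fits with gof > 0.95

# Synthetic example: 37 roughly radially oriented sensors above the head.
rng = np.random.default_rng(2)
sensor_pos = rng.standard_normal((37, 3)) * 0.02 + np.array([0.0, 0.0, 0.12])
sensor_ori = sensor_pos / np.linalg.norm(sensor_pos, axis=1, keepdims=True)
true_params = np.array([0.01, 0.02, 0.06, 1e-8, -1e-8, 0.0])
measured = dipole_field(true_params, sensor_pos, sensor_ori)
fitted, gof = fit_dipole(measured, sensor_pos, sensor_ori,
                         x0=np.array([0.0, 0.0, 0.05, 0.0, 0.0, 1e-8]))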

The MEG anatomical reference frame was established using a digital device called the "sensor position indicator." A set of three table-mounted receivers and one array-mounted receiver triangulates the signal from a stylus transmitter positioned at fiducial reference points on the subject's head surface (typically the nasion, left and right preauricular points, Cz, and inion). The same stylus transmitter arrangement served to define the curvature of the head by tracing the surface, from which a "local sphere" model of the head was generated. To coregister the dipole locations on the MRI image, the fiduciary points serve as the basis for a common coordinate system. By superimposing these fiducial landmarks on MR images of the subject, it was possible to define the position of the computed point sources with an accuracy of 5 mm.

Multiple Dipole Localization Analysis

Multiple dipole localization analyses of spatio-temporal evoked magnetic fields were also performed using Multiple Signal Classification (MUSIC) (Sekihara, Poeppel, Marantz, Koizumi, & Miyashita, 1997; Mosher, Lewis, & Leahy, 1992) and beamformer algorithms (Mosher et al., 1992; Sekihara, Nagarajan, Poeppel, Marantz, & Miyashita, 2001; Robinson & Vrba, 1999). MUSIC methods are based on estimation of a signal "subspace" from the entire spatio-temporal MEG data using singular-value decomposition (SVD). A version of the MUSIC algorithm, referred to as the "conventional" MUSIC algorithm, was implemented in MATLAB under the assumption that the sources contributing to the MEG data arose from multiple dipoles (<37 in number) which were stationary and were located within a spherical volume of uniform conductivity (Sekihara et al., 1997). The locations of dipoles are typically determined by conducting a search over a three-dimensional grid of interest within the head. Given the sensor positions and the coordinates of the origin of a "local sphere" approximation of the head shape for each subject, a lead-field matrix was computed for each point in this 3-D grid. At each point in this grid, from the lead-field matrix and the covariance matrix of the spatio-temporal MEG data, the value of a MUSIC localizer function can then be computed (Sekihara et al., 1997, Equation (4)). Maxima of this localizer function then correspond to the locations of dipolar sources. We used the MUSIC algorithm to compute the localizer functions over a three-dimensional grid in the hemisphere contralateral to the stimulus delivery. The localizer was computed using the evoked magnetic field responses over a time window of 1000 msec following the onset of the sentences. For each subject, at each point in a 3-D grid (−4 < x < 6, 0 < y < 8, 3 < z < 11) in the left hemisphere, the localizer function was computed using the averaged evoked auditory magnetic field responses.
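
A compact sketch of the conventional MUSIC scan described here (in Python/NumPy rather than the MATLAB used in the study, and with a fixed-orientation lead field per grid point as a simplifying assumption):

import numpy as np

def music_localizer(data, lead_fields, n_sources):
    # data:        (n_channels, n_times) spatio-temporal evoked field.
    # lead_fields: (n_grid, n_channels), one fixed-orientation lead-field
    #              vector per grid point (a simplifying assumption).
    # The signal subspace is spanned by the first n_sources left singular
    # vectors of the data matrix; the localizer peaks where the normalized
    # lead field lies (almost) entirely inside that subspace.
    u, _, _ = np.linalg.svd(data, full_matrices=False)
    e_s = u[:, :n_sources]
    localizer = np.empty(len(lead_fields))
    for i, l in enumerate(lead_fields):
        l = l / np.linalg.norm(l)
        p = e_s.T @ l
        frac = float(p @ p)             # fraction of l in the signal subspace
        localizer[i] = frac / (1.0 - frac + 1e-12)
    return localizer                    # maxima ~ dipole locations on the grid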

Acknowledgments

This work was funded in part by Scientific Learning Co. and a grant from the Whitaker Foundation.

Reprint requests should be sent to John F. Houde, W. M. Keck Center for Integrative Neuroscience, University of California, San Francisco, 513 Parnassus Avenue, S-762, San Francisco, CA 94143, USA, or via e-mail: houde@phy.ucsf.edu.

Note

1. In a recent study of M100 responses to tones, it was found that near detection thresholds, a 15-dB change in SPL caused a 34% change in M100 RMS amplitude (Stufflebeam et al., 1998).

REFERENCES

Ahissar, E., Nagarajan, S., Ahissar, M., Protopapas, A., Mahncke, H., & Merzenich, M. M. (2001). Speech comprehension is correlated with temporal response patterns recorded from auditory cortex. Proceedings of the National Academy of Sciences, USA, 98, 13367–13372.

Blakemore, S. J., Frith, C. D., & Wolpert, D. M. (1999). Spatio-temporal prediction modulates the perception of self-produced stimuli. Journal of Cognitive Neuroscience, 11, 551–559.

Blakemore, S. J., Wolpert, D. M., & Frith, C. D. (1999). The cerebellum contributes to somatosensory cortical activity during self-produced tactile stimulation. Neuroimage, 10, 448–459.

Burnett, T. A., Freedland, M. B., Larson, C. R., & Hain, T. C. (1998). Voice F0 responses to manipulations in pitch feedback. Journal of the Acoustical Society of America, 103, 3153–3161.

Cooper, W. E. (1979). Speech perception and production: Studies in selective adaptation. Norwood, NJ: Ablex.

Creutzfeldt, O., & Ojemann, G. (1989). Neuronal activity in the human lateral temporal lobe: III. Activity changes during music. Experimental Brain Research, 77, 490–498.

Creutzfeldt, O., Ojemann, G., & Lettich, E. (1989a). Neuronal activity in the human lateral temporal lobe: I. Responses to speech. Experimental Brain Research, 77, 451–475.

Creutzfeldt, O., Ojemann, G., & Lettich, E. (1989b). Neuronal activity in the human lateral temporal lobe: II. Responses to the subject's own voice. Experimental Brain Research, 77, 476–489.

Curio, G., Neuloh, G., Numminen, J., Jousmaki, V., & Hari, R. (2000). Speaking modifies voice-evoked activity in the human auditory cortex. Human Brain Mapping, 9, 183–191.

Demonet, J. F., Chollet, F., Ramsay, S., Cardebat, D., Nespoulous, J. L., Wise, R., Rascol, A., & Frackowiak, R. (1992). The anatomy of phonological and semantic processing in normal subjects. Brain, 115, 1753–1768.

Diesch, E., & Luce, T. (1997). Magnetic fields elicited by tones and vowel formants reveal tonotopy and nonlinear summation of cortical activation. Psychophysiology, 34, 501–510.

Elman, J. L. (1981). Effects of frequency-shifted feedback on the pitch of vocal productions. Journal of the Acoustical Society of America, 70, 45–50.

Franklin, G. F., Powell, J. D., & Emami-Naeini, A. (1991). Feedback control of dynamic systems (2nd ed.). Reading, MA: Addison-Wesley.

Gracco, V. L., Ross, D., Kalinowski, J., & Stuart, A. (1994). Articulatory changes following spectral and temporal modifications in auditory feedback. Journal of the Acoustical Society of America, 95, 2821.

Gunji, A., Hoshiyama, M., & Kakigi, R. (2000). Identification of auditory evoked potentials of one's own voice. Clinical Neurophysiology, 111, 214–219.

Gunji, A., Kakigi, R., & Hoshiyama, M. (2000). Spatiotemporal source analysis of vocalization-associated magnetic fields. Cognitive Brain Research, 9, 157–163.

Hari, R., Aittoniemi, K., Jarvinen, M. L., Katila, T., & Varpula, T. (1980). Auditory evoked transient and sustained magnetic fields of the human brain: Localization of neural generators. Experimental Brain Research, 40, 237–240.

Hari, R., & Makela, J. P. (1988). Modification of neuromagnetic responses of the human auditory cortex by masking sounds. Experimental Brain Research, 71, 87–92.

Hein, A. V., & Held, R. (1962). A neural model for labile sensorimotor coordination. In E. Bernard & M. Hare (Eds.), Biological prototypes and synthetic systems (Vol. 1, pp. 71–74). New York: Plenum.

Hickok, G., & Poeppel, D. (2000). Towards a functional neuroanatomy of speech perception. Trends in Cognitive Sciences, 4, 131–138.

Hirano, S., Kojima, H., Naito, Y., Honjo, I., Kamoto, Y., Okazawa, H., Ishizu, K., Yonekura, Y., Nagahama, Y., Fukuyama, H., & Konishi, J. (1996). Cortical speech processing mechanisms while vocalizing visually presented languages. NeuroReport, 8, 363–367.

Hirano, S., Kojima, H., Naito, Y., Honjo, I., Kamoto, Y., Okazawa, H., Ishizu, K., Yonekura, Y., Nagahama, Y., Fukuyama, H., & Konishi, J. (1997). Cortical processing mechanism for vocalization with auditory verbal feedback. NeuroReport, 8, 2379–2382.

Hirano, S., Naito, Y., Okazawa, H., Kojima, H., Honjo, I., Ishizu, K., Yenokura, Y., Nagahama, Y., Fukuyama, H., & Konishi, J. (1997). Cortical activation by monaural speech sound stimulation demonstrated by positron emission tomography. Experimental Brain Research, 113, 75–80.

Houde, J. F., & Jordan, M. I. (1998). Sensorimotor adaptation in speech production. Science, 279, 1213–1216.

Jacobs, O. L. R. (1993). Introduction to control theory (2nd ed.). Oxford, UK: Oxford University Press.

Jeannerod, M. (1988). The neural and behavioural organisation of goal-directed movements. Oxford, UK: Oxford University Press.

Jurgens, U. (2002). Neural pathways underlying vocal control. Neuroscience and Biobehavioral Reviews, 26, 235–258.

Kawahara, H. (1993). Transformed auditory feedback: Effects of fundamental frequency perturbation. Journal of the Acoustical Society of America, 94, 1883.

Kuriki, S., Mori, T., & Hirata, Y. (1999). Motor planning center for speech articulation in the normal human brain. NeuroReport, 10, 765–769.

Kuriki, S., & Murase, M. (1989). Neuromagnetic study of the auditory responses in right and left hemispheres of the human brain evoked by pure tones and speech sounds. Experimental Brain Research, 77, 127–134.

Lane, H., & Tranel, B. (1971). The Lombard sign and the role of hearing in speech. Journal of Speech and Hearing Research, 14, 677–709.

Lee, B. S. (1950). Some effects of side-tone delay. Journal of the Acoustical Society of America, 22, 639–640.

Leonardo, A., & Konishi, M. (1999). Decrystallization of adult birdsong by perturbation of auditory feedback. Nature, 399, 466–470.

Levelt, W. J. (1983). Monitoring and self-repair in speech. Cognition, 14, 41–104.

Levelt, W. J. M. (1989). Speaking: From intention to articulation. Cambridge: MIT Press.

Liegeois-Chauvel, C., Musolino, A., Badier, J. M., Marquis, P., & Chauvel, P. (1994). Evoked potentials recorded from the auditory cortex in man: Evaluation and topography of the middle latency components. Electroencephalography and Clinical Neurophysiology, 92, 204–214.

Mahncke, H. (1998). Oscillations and temporal processing in sensory cortex. Unpublished doctoral dissertation, University of California, San Francisco.

McCasland, J. S., & Konishi, M. (1981). Interaction between auditory and motor activities in an avian song control nucleus. Proceedings of the National Academy of Sciences, USA, 78, 7815–7819.

McGuire, P. K., Silbersweig, D. A., & Frith, C. D. (1996). Functional neuroanatomy of verbal self-monitoring. Brain, 119, 907–917.

Mosher, J. C., Lewis, P. S., & Leahy, R. M. (1992). Multiple dipole modeling and localization from spatio-temporal MEG data. IEEE Transactions on Biomedical Engineering, 39, 541–557.

Muller-Preuss, P., & Ploog, D. (1981). Inhibition of auditory cortical neurons during phonation. Brain Research, 215, 61–76.

Numminen, J., & Curio, G. (1999). Differential effects of overt, covert and replayed speech on vowel-evoked responses of the human auditory cortex. Neuroscience Letters, 272, 29–32.

Numminen, J., Salmelin, R., & Hari, R. (1999). Subject's own speech reduces reactivity of the human auditory cortex. Neuroscience Letters, 265, 119–122.

Papanicolaou, A. C., Raz, N., Loring, D. W., & Eisenberg, H. M. (1986). Brain stem evoked response suppression during speech production. Brain and Language, 27, 50–55.

Perkell, J. (1997). Articulatory processes. In W. J. Hardcastle & J. Laver (Eds.), The handbook of phonetic sciences (pp. 333–370). Oxford, UK: Blackwell Publishers.

Picton, T. W., Alain, C., Woods, D. L., John, M. S., Scherg, M., Valdes-Sosa, P., Bosch-Bayard, J., & Trujillo, N. J. (1999). Intracerebral sources of human auditory-evoked potentials. Audiology and Neuro-otology, 4, 64–79.

Poeppel, D., Yellin, E., Phillips, C., Roberts, T. P., Rowley, H. A., Wexler, K., & Marantz, A. (1996). Task-induced asymmetry of the auditory evoked M100 neuromagnetic field elicited by speech sounds. Brain Research: Cognitive Brain Research, 4, 231–242.

Reite, M., Adams, M., Simon, J., Teale, P., Sheeder, J., Richardson, D., & Grabbe, R. (1994). Auditory M100 component 1: Relationship to Heschl's gyri. Brain Research: Cognitive Brain Research, 2, 13–20.

Richer, F., Alain, C., Achim, A., Bouvier, G., & Saint-Hilaire, J. M. (1989). Intracerebral amplitude distributions of the auditory evoked potential. Electroencephalography and Clinical Neurophysiology, 74, 202–208.

Robinson, S., & Vrba, J. (1999). Functional neuroimaging by synthetic aperture magnetometry (SAM). In Yoshimoto et al. (Eds.), Recent advances in biomagnetism (pp. 302–305). Sendai, Japan: Tokyo University Press.

Scherg, M., & Von Cramon, D. (1985). Two bilateral sources of the late AEP as identified by a spatio-temporal dipole model. Electroencephalography and Clinical Neurophysiology, 62, 32–44.

Scherg, M., & Von Cramon, D. (1986). Evoked dipole source potentials of the human auditory cortex. Electroencephalography and Clinical Neurophysiology, 65, 344–360.

Schmidt, M. F., & Konishi, M. (1998). Gating of auditory responses in the vocal control system of awake songbirds. Nature Neuroscience, 1, 513–518.

Sekihara, K., Nagarajan, S. S., Poeppel, D., Marantz, A., & Miyashita, Y. (2001). Reconstructing spatio-temporal activities of neural sources using an MEG vector beamformer technique. IEEE Transactions on Biomedical Engineering, 48, 760–771.

Sekihara, K., Poeppel, D., Marantz, A., Koizumi, H., & Miyashita, Y. (1997). Noise covariance incorporated MEG-MUSIC algorithm: A method for multiple-dipole estimation tolerant of the influence of background brain activity. IEEE Transactions on Biomedical Engineering, 44, 839–847.

Stevens, K. N. (1999). Acoustic phonetics. Cambridge: MIT Press.

Stufflebeam, S. M., Poeppel, D., Rowley, H. A., & Roberts, T. P. (1998). Peri-threshold encoding of stimulus frequency and intensity in the M100 latency. NeuroReport, 9, 91–94.

Suga, N., & Schlegel, P. (1972). Neural attenuation of responses to emitted sounds in echolocating bats. Science, 177, 82–84.

Suga, N., & Shimozawa, T. (1974). Site of neural attenuation of responses to self-vocalized sounds in echolocating bats. Science, 183, 1211–1213.

von Bekesy, G. (1949). The structure of the middle ear and the hearing of one's own voice by bone conduction. Journal of the Acoustical Society of America, 21, 217–232.

Weiskrantz, L., Elliott, J., & Darlington, C. (1971). Preliminary observations on tickling oneself. Nature, 230, 598–599.

Woods, D. L., Knight, R. T., & Neville, H. J. (1984). Bitemporal lesions dissociate auditory evoked potentials and perception. Electroencephalography and Clinical Neurophysiology, 57, 208–220.

Yates, A. J. (1963). Delayed auditory feedback. Psychological Bulletin, 60, 213–232.

Zatorre, R. J., Evans, A. C., Meyer, E., & Gjedde, A. (1992). Lateralization of phonetic and pitch discrimination in speech processing. Science, 256, 846–849.


ditions a one-way analysis of variance was performedwith condition as factor and time as a repeated measure

The RMS across channels is a convenient measure ofactivity that is independent of the position of individualsensors Following optimal placement of the sensor for asignificant M100 dipolar response RMS has been shownto correlate well with the lsquolsquoequivalent current-dipolestrengthrsquorsquo (also called the lsquolsquoQ valuersquorsquo) of a dipole posi-tioned in and around the auditory cortex The RMSmeasure is also correlated with the data obtained from(1) the channel with the biggest signal-to-noise ratioand (2) the first two principal components of the arrayresponse (Ahissar et al 2001 Mahncke 1998)

In sensor arrays comprising of planar gradiometers(like the Neuromag 122 or 306) the channel with largestsignal-to-noise ratio is often closest to the source (asdictated by the physics) However in the case of sensorarrays comprising of axial gradiometers (as in ourMagnes II system) the channel with the largest signal-to-noise ratio reflects the peak of the magnetic fieldwhich is not the closest channel to a source Moreoverthe lsquolsquobestrsquorsquo channel could reflect either a peak lsquolsquoinwardrsquorsquoor lsquolsquooutwardrsquorsquo field requiring signed inversion whileaveraging such data across subjects Finally since thesensor array position can varying considerably acrosssubjects it is difficult to use procedures typical for EEGand whole-head studies wherein single channel data isoften picked for analysis across subjects These consid-erations have led us to use RMS instead of pickingspecific channels for our analysis

Single Dipole Model Fit

A single equivalent current-dipole model was calculatedseparately for each hemisphere for each poststimulustime point using standard 4-D Neuroimaging softwareoperating on data that had been filtered between 1 and20 Hz The localization algorithm used an iterative least-squares minimization to compute the strength andlocation of a single dipole in a spherical volume ofuniform conductivity that can account for the sensordata Dipole fits were accepted based on a local corre-lation maximum criteria of 95 and goodness-of-fit valuesgreater than 95

The MEG anatomical reference frame was establishedusing a digital device called the lsquolsquosensor position indica-torrsquorsquo A set of three table-mounted and one array-mounted receivers triangulates the signal from a stylustransmitter positioned at fiducial reference points on thesubjectrsquos head surface (typically nasion left right pre-auricular points Cz and inion) The same stylus trans-mitter arrangement served to define the curvature of thehead by tracing the surface from which a lsquolsquolocal spherersquorsquomodel of the head was generated To coregister thedipole locations on the MRI image the fiduciary pointsserve as the basis for a common coordinate system Bysuperimposing these fiducial landmarks on MR images

Houde et al 1135

of the subject it was possible to define the position ofthe computed point sources with an accuracy of 5 mm

Multiple Dipole Localization Analysis

Multiple dipole localization analyses of spatio-temporalevoked magnetic fields was also performed using Multi-ple Signal Classification (MUSIC) (Sekihara PoeppelMarantz Koizumi amp Miyashita 1997 Mosher Lewisamp Leahy 1992) and beamformer algorithms (Mosheret al 1992 Sekihara Nagarajan Poeppel Marantz ampMiyashita 2001 Robinson amp Vrba 1999) MUSIC meth-ods are based on estimation of a signal lsquolsquosubspacersquorsquofrom entire spatio-temporal MEG data using singular-value decomposition (SVD) A version of the MUSICalgorithm referred to as the lsquolsquoconventionalrsquorsquo MUSICalgorithm was implemented in MATLAB under theassumption that the sources contributing to the MEGdata arose from multiple dipoles (lt37 in number)which were stationary and were located within a spher-ical volume of uniform conductivity (Sekihara et al1997) The locations of dipoles are typically determinedby conducting a search over a three-dimensional grid ofinterest within the head Given the sensor positionsand the coordinates of the origin of a lsquolsquolocal spherersquorsquoapproximation of the head shape for each subject alead-field matrix was computed for each point in this3-D grid At each point in this grid from the lead-fieldmatrix and the covariance matrix of the spatio-temporalMEG data the value of a MUSIC localizer function canthen be computed (Sekihara et al 1997 Equation (4))Maxima of this localizer function then corresponded tothe location of dipolar sources We used the MUSICalgorithm to compute the localizer functions over athree-dimensional grid in the hemisphere contralateralto the stimulus delivery The localizer was computedusing the evoked magnetic field responses over a timewindow of 1000 msec following the onset of thesentences For each subject at each point in a 3-D grid(iexcl4 lt x lt 6 0 lt y lt 8 3 lt z lt 11) in the lefthemisphere the localizer function was computed usingthe averaged evoked auditory magnetic field responses

Acknowledgments

This work was funded in part by Scientific Learning Co and agrant from the Whitaker Foundation

Reprint requests should be sent to John F Houde W M KeckCenter for Integrative Neuroscience University of CaliforniaSan Francisco 513 Parnassus Avenue S-762 San Francisco CA94143 USA or via e-mail houdephyucsfedu

Note

1 In a recent study of M100 responses to tones it was foundthat near detection thresholds a 15-dB change in SPL caused a34 change in M100 RMS amplitude (Stufflebeam et al 1998)

REFERENCES

Ahissar E Nagarajan S Ahissar M Protopapas A MahnckeH amp Merzenich M M (2001) Speech comprehension iscorrelated with temporal response patterns recorded fromauditory cortex Proceedings of the National Academy ofSciences USA 98 13367 ndash13372

Blakemore S J Frith C D amp Wolpert D M (1999)Spatio-temporal prediction modulates the perception ofself-produced stimuli Journal of Cognitive Neuroscience11 551 ndash559

Blakemore S J Wolpert D M amp Frith C D (1999) Thecerebellum contributes to somatosensory cortical activityduring self-produced tactile stimulation Neuroimage 10448 ndash459

Burnett T A Freedland M B Larson C R amp Hain T C(1998) Voice f0 responses to manipulations in pitchfeedback Journal of the Acoustical Society of America103 3153ndash3161

Cooper W E (1979) Speech perception and productionStudies in selective adaptation Norwood NJ Ablex

Creutzfeldt O amp Ojemann G (1989) Neuronal activityin the human lateral temporal lobe III Activity changesduring music Experimental Brain Research 77490 ndash498

Creutzfeldt O Ojemann G amp Lettich E (1989a) Neuronalactivity in the human lateral temporal lobe I Responses tospeech Experimental Brain Research 77 451 ndash475

Creutzfeldt O Ojemann G amp Lettich E (1989b) Neuronalactivity in the human lateral temporal lobe II Responses tothe subjects own voice Experimental Brain Research 77476 ndash489

Curio G Neuloh G Numminen J Jousmaki V amp Hari R(2000) Speaking modifies voice-evoked activity in thehuman auditory cortex Human Brain Mapping 9183 ndash191

Demonet J F Chollet F Ramsay S Cardebat DNespoulous J L Wise R Rascol A amp Frackowiak R(1992) The anatomy of phonological and semanticprocessing in normal subjects Brain 115 1753 ndash1768

Diesch E amp Luce T (1997) Magnetic fields elicited bytones and vowel formants reveal tonotopy and nonlinearsummation of cortical activation Psychophysiology 34501 ndash510

Elman J L (1981) Effects of frequency-shifted feedback onthe pitch of vocal productions Journal of the AcousticalSociety of America 70 45ndash50

Franklin G F Powell J D amp Emami-Naeini A (1991)Feedback control of dynamic systems (2nd ed) ReadingMA Addison-Wesley

Gracco V L Ross D Kalinowski J amp Stuart A (1994)Articulatory changes following spectral and temporalmodifications in auditory feedback Journal of theAcoustical Society of America 95 2821

Gunji A Hoshiyama M amp Kakigi R (2000) Identificationof auditory evoked potentials of onersquos own voice ClinicalNeurophysiology 111 214 ndash219

Gunji A Kakigi R amp Hoshiyama M (2000) Spatiotemporalsource analysis of vocalization-associated magnetic fieldsCognitive Brain Research 9 157 ndash163

Hari R Aittoniemi K Jarvinen M L Katila T amp Varpula T(1980) Auditory evoked transient and sustained magneticfields of the human brain Localization of neural generatorsExperimental Brain Research 40 237 ndash240

Hari R amp Makela J P (1988) Modification of neuromagneticresponses of the human auditory cortex by masking soundsExperimental Brain Research 71 87ndash92

Hein A V amp Held R (1962) A neural model for labile

1136 Journal of Cognitive Neuroscience Volume 14 Number 8

sensorimotor coordination In E Bernard amp M Hare(Eds) Biological prototypes and synthetic systems(vol 1 pp 71ndash74) New York Plenum

Hickok G amp Poeppel D (2000) Towards a functionalneuroanatomy of speech perception Trends in CognitiveSciences 4 131 ndash138

Hirano S Kojima H Naito Y Honjo I Kamoto YOkazawa H Ishizu K Yonekura Y Nagahama YFukuyama H amp Konishi J (1996) Cortical speechprocessing mechanisms while vocalizing visually presentedlanguages NeuroReport 8 363ndash367

Hirano S Kojima H Naito Y Honjo I Kamoto YOkazawa H Ishizu K Yonekura Y Nagahama YFukuyama H amp Konishi J (1997) Cortical processingmechanism for vocalization with auditory verbal feedbackNeuroReport 8 2379ndash2382

Hirano S Naito Y Okazawa H Kojima H Honjo I IshizuK Yenokura Y Nagahama Y Fukuyama H amp Konishi J(1997) Cortical activation by monaural speech soundstimulation demonstrated by positron emission tomographyExperimental Brain Research 113 75ndash80

Houde J F amp Jordan M I (1998) Sensorimotor adaptationin speech production Science 279 1213 ndash1216

Jacobs O L R (1993) Introduction to control theory(2nd ed) Oxford UK Oxford University Press

Jeannerod M (1988) The neural and behaviouralorganisation of goal-directed movements Oxford UKOxford University Press

Jurgens U (2002) Neural pathways underlying vocalcontrol Neuroscience and Biobehavioral Reviews 26235 ndash258

Kawahara H (1993) Transformed auditory feedback Effectsof fundamental frequency perturbation Journal of theAcoustical Society of America 94 1883

Kuriki S Mori T amp Hirata Y (1999) Motor planningcenter for speech articulation in the normal human brainNeuroReport 10 765 ndash769

Kuriki S amp Murase M (1989) Neuromagnetic study of theauditory responses in right and left hemispheres of thehuman brain evoked by pure tones and speech soundsExperimental Brain Research 77 127 ndash134

Lane H amp Tranel B (1971) The lombard sign and the roleof hearing in speech Journal of Speech and HearingResearch 14 677ndash709

Lee B S (1950) Some effects of side-tone delay Journal ofthe Acoustical Society of America 22 639 ndash640

Leonardo A amp Konishi M (1999) Decrystallization of adultbirdsong by perturbation of auditory feedback Nature 399466 ndash470

Levelt W J (1983) Monitoring and self-repair in speechCognition 14 41ndash104

Levelt W J M (1989) Speaking From intention toarticulation Cambridge MIT Press

Liegeois-Chauvel C Musolino A Badier J M Marquis P ampChauvel P (1994) Evoked potentials recorded from theauditory cortex in man Evaluation and topography of themiddle latency components Electroencephalography andClinical Neurophysiology 92 204 ndash214

Mahncke H (1998) Oscillations and temporal processingin sensory cortex Unpublished doctoral dissertationUniversity of California San Francisco

McCasland J S amp Konishi M (1981) Interaction betweenauditory and motor activities in an avian song controlnucleus Proceedings of the National Academy of SciencesUSA 78 7815ndash7819

McGuire P K Silbersweig D A amp Frith C D (1996)Functional neuroanatomy of verbal self-monitoring Brain119 907 ndash917

Mosher J C Lewis P S amp Leahy R M (1992) Multipledipole modeling and localization from spatio-temporal megdata IEEE Transactions on Biomedical Engineering 39541 ndash557

Muller-Preuss P amp Ploog D (1981) Inhibition of auditorycortical neurons during phonation Brain Research 21561ndash76

Numminen J amp Curio G (1999) Differential effects of overtcovert and replayed speech on vowel-evoked responsesof the human auditory cortex Neuroscience Letters 27229ndash32

Numminen J Salmelin R amp Hari R (1999) Subjectrsquos ownspeech reduces reactivity of the human auditory cortexNeuroscience Letters 265 119 ndash122

Papanicolaou A C Raz N Loring D W amp EisenbergH M (1986) Brain stem evoked response suppressionduring speech production Brain and Language 2750ndash55

Perkell J (1997) Articulatory processes In W J Hardcastleamp J Laver (Eds) The handbook of phonetic sciences(pp 333 ndash370) Oxford UK Blackwell Publishers

Picton T W Alain C Woods D L John M S Scherg MValdes-Sosa P Bosch-Bayard J amp Trujillo N J (1999)Intracerebral sources of human auditory-evoked potentialsAudiology and Neuro-otology 4 64ndash79

Poeppel D Yellin E Phillips C Roberts T P Rowley H AWexler K amp Marantz A (1996) Task-induced asymmetry ofthe auditory evoked m100 neuromagnetic field elicited byspeech sounds Brain Research Cognitive Brain Research4 231 ndash242

Reite M Adams M Simon J Teale P Sheeder JRichardson D amp Grabbe R (1994) Auditory m100component 1 Relationship to heschlrsquos gyri BrainResearch Cognitive Brain Research 2 13ndash20

Richer F Alain C Achim A Bouvier G amp Saint-Hilaire J M(1989) Intracerebral amplitude distributions of the auditoryevoked potential Electroencephalography and ClinicalNeurophysiology 74 202 ndash208

Robinson S amp Vrba J (1999) Functional neuroimaging bysynthetic aperture magnetometry (SAM) In Yoshimoto et al(Eds) Recent advances in biomagnetism (pp 302 ndash305)Sendai Japan Tokyo University Press

Scherg M amp Von Cramon D (1985) Two bilateral sources ofthe late aep as identified by a spatio-temporal dipole modelElectroencephalography and Clinical Neurophysiology 6232ndash44

Scherg M amp Von Cramon D (1986) Evoked dipolesource potentials of the human auditory cortexElectroencephalography and Clinical Neurophysiology65 344 ndash360

Schmidt M F amp Konishi M (1998) Gating of auditoryresponses in the vocal control system of awake songbirdsNature Neuroscience 1 513ndash518

Sekihara K Nagarajan S S Poeppel D Marantz Aamp Miyashita Y (2001) Reconstructing spatio-temporalactivities of neural sources using an meg vectorbeamformer technique IEEE Transactions on BiomedicalEngineering 48 760ndash771

Sekihara K Poeppel D Marantz A Koizumi H ampMiyashita Y (1997) Noise covariance incorporatedmeg-music algorithm A method for multiple-dipoleestimation tolerant of the influence of background brainactivity IEEE Transactions on Biomedical Engineering44 839 ndash847

Stevens K N (1999) Acoustic phonetics CambridgeMIT Press

Stufflebeam S M Poeppel D Rowley H A amp RobertsT P (1998) Peri-threshold encoding of stimulus

Houde et al 1137

frequency and intensity in the m100 latency NeuroReport9 91ndash94

Suga N amp Schlegel P (1972) Neural attenuation ofresponses to emitted sounds in echolocating rats Science177 82ndash84

Suga N amp Shimozawa T (1974) Site of neural attenuation ofresponses to self-vocalized sounds in echolocating batsScience 183 1211 ndash1213

von Bekesy G (1949) The structure of the middle ear and thehearing of onersquos own voice by bone conduction Journal ofthe Acoustical Society of America 21 217 ndash232

Weiskrantz L Elliott J amp Darlington C (1971) Preliminaryobservations on tickling oneself Nature 230 598 ndash599

Woods D L Knight R T amp Neville H J (1984)Bitemporal lesions dissociate auditory evoked potentialsand perception Electroencephalography and ClinicalNeurophysiology 57 208 ndash220

Yates A J (1963) Delayed auditory feedback PsychologicalBulletin 60 213 ndash232

Zatorre R J Evans A C Meyer E amp Gjedde A (1992)Lateralization of phonetic and pitch discrimination in speechprocessing Science 256 846 ndash849

1138 Journal of Cognitive Neuroscience Volume 14 Number 8

hemispheres. Different results in M100 latencies could be accounted for by the fact that we matched the acoustic waveform presented to each subject, while Curio et al. behaviorally matched the perceptual intensity between the listening conditions. Because subjects could set the tape playback volume to be different from their speech feedback volume, differing audio levels in the two experiment conditions could account for some of the M100 latency differences that they report. With MEG data acquisition triggered by speech signal onset, triggering occurs when the audio signal exceeds some threshold voltage. As a result, triggering time is sensitive to the amplification of the signal: the more a signal is amplified, the faster it will reach the trigger threshold. Thus, if tape playback volume levels were less than speech feedback volume levels, triggering from tape playback could be delayed compared to the speaking condition. If so, then for the same true M100 latency, M100 peaks would appear to have shorter latencies in the tape playback condition.
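To make the threshold-triggering argument concrete, the following toy simulation (our illustration; the linear onset envelope and the threshold value are assumptions, not measured signal parameters) shows that the same rising waveform crosses a fixed trigger voltage earlier when it is amplified more:

```python
import numpy as np

FS = 1041                                  # sampling rate (Hz)
t = np.arange(int(0.05 * FS)) / FS         # first ~50 msec of an utterance
envelope = t / t.max()                     # toy linear amplitude rise
THRESHOLD = 0.5                            # trigger voltage (arbitrary units)

for gain in (2.0, 1.0):                    # louder vs. quieter playback level
    cross = np.argmax(gain * envelope > THRESHOLD)  # first sample above threshold
    print(f"gain {gain}: trigger fires {1000 * cross / FS:.1f} msec after onset")
```

Running this prints an earlier trigger time for the higher gain, which is all the latency argument requires.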

In Experiment 3, the amplitude difference in M100 responses between the speaking and tape playback conditions was abolished by using gated noise to distort the feedback heard by subjects. Interestingly, however, this feedback manipulation created a latency difference between the two conditions. In the left hemisphere, the M100 response to gated noise triggered by tape playback was 15 msec earlier than that for gated noise triggered by the subject's own speech, while in the right hemisphere such a latency difference was not seen. Curio et al. (2000) also found that the reduced latency of M100 responses to tape playback versus self-produced speech was significantly more pronounced in the left hemisphere. Like Curio et al., we attribute this left hemisphere latency difference to two factors. First, this latency difference may represent differential processing of externally produced versus self-produced auditory feedback. Although the gated noise altered the spectral characteristics of subjects' speech feedback such that it was likely to mismatch any internally generated expectations of how their feedback would sound, the gated noise did not alter the temporal characteristics of their speech feedback: subjects still heard auditory feedback begin at the same time as they commenced phonation. This temporal predictability could potentially have allowed subjects to process gated noise arising from their own speech, whose onset they could predict, differently from gated noise arising from tape playback, whose onset they could not predict. Second, the fact that this latency difference was seen primarily in the left hemisphere is consistent with the left hemisphere's dominance in the production of speech in right-handed speakers. Other aspects of our results may also reflect this left hemisphere dominance. In Experiment 1, the M100 response to tape playback speech was larger in the left hemisphere than in the right, and the M100 response to self-produced speech was temporally sharper in the left hemisphere than in the right. Additionally, in Experiment 2, in the left hemisphere, the M100 response to tones was significantly more suppressed by self-produced speech than by tape playback speech, a difference not seen in the right hemisphere.

Results from Experiment 2 are mostly consistent with the recent experiments by Numminen and Curio (1999). They reported that neuromagnetic responses to short recorded vowel sounds were delayed and dampened relative to background-free presentations. They did not report how much they externally attenuated the produced speech signal in order to obtain a clear response to the probe-vowel sound. In our experiments, we used probe tones instead of vowel sounds, and we attenuated the produced speech signal to the ear by 20 dB. Surprisingly, we did not observe any latency differences in the M100 responses for tones between the three conditions. We believe that the lack of a latency effect could be accounted for by the fact that we used tones instead of vowel sounds (Diesch & Luce, 1997; Poeppel et al., 1996; Kuriki & Murase, 1989). Furthermore, by an argument analogous to that discussed above for the Curio et al. (2000) latency results, it is also reasonable that any amplitude mismatches in the masking sounds could account for discrepancies in the latency effect (Hari & Makela, 1988). However, while Numminen and Curio identified auditory interference as the main cause of these modifications, their results also provide evidence against the nonspecific attenuation hypothesis, although they do not specifically discuss their results in this form. For instance, they found an additional 6–9% decrease in amplitude in response to probe-vowel sounds during a produced speech background when compared with a tape-recorded speech background. These amplitude differences are consistent with our Experiment 2 results.

Gunji et al. have found that vocalization-related cortical magnetic fields activated six sources that were found to temporally overlap in the period 0–100 msec after vocalization onset (Gunji, Hoshiyama, & Kakigi, 2000; Gunji, Kakigi, & Hoshiyama, 2000). Sources 1 and 2 were activated approximately 150 msec before the vocalization onset and were located in laryngeal motor areas of the left and right hemispheres, respectively. Kuriki, Mori, and Hirata (1999) have also found a motor planning center for speech articulation that is activated 120–320 msec before the onset of vocalization and located in a region around the superior end of the left insula. Gunji et al.'s Sources 5 and 6 were located in the truncal motor area in each hemisphere and were similar to Sources 1 and 2. In contrast, their Sources 3 and 4 were located in the auditory cortices of the left and right hemisphere, respectively, and were activated after vocalization onset. However, these experiments did not explore the modulation of responses in the auditory or motor cortex due to either vocalization or feedback alteration. In our experiments, because we were unable to simultaneously measure activity from motor and supplementary motor areas using a 37-channel sensor array over each hemisphere, we focus here only on the activity from auditory cortices of both hemispheres.

More relevant to our results are the studies of Hirano et al. (Hirano et al., 1996; Hirano, Kojima, et al., 1997; Hirano, Naito, et al., 1997), who used PET to look at cortical activation during speaking. In their first study, Hirano et al. found a lack of activation in the auditory cortex while subjects vocalized. In their second study, Hirano et al. altered how subjects heard their own vocalizations, and this time recorded significant activation of the auditory cortex. These results are consistent with our findings in Experiments 1 and 3 of the present study.

Why is the Response to Self-Produced Speech Suppressed?

The present study, along with recent MEG studies by Curio et al. and other brain imaging studies that have used PET or fMRI, demonstrates that during speech production the auditory cortex suppresses its response to expected acoustic signals (Hirano et al., 1996; Hirano, Kojima, et al., 1997; Hirano, Naito, et al., 1997). Does that suppression serve any function? We speculate about two possibilities.

Auditory Perception

One possibility is that the suppression results from the process of distinguishing self-produced from externally produced sources. It would be useful for sensory systems to make this distinction, since sensory information from these two types of sources is commonly used for different purposes: sensory information from self-produced motor actions could potentially be used for feedback control, whereas sensory information from external sources is primarily used for recognition.

But how could the sensory systems distinguish self-produced from externally produced sensory information? It has been proposed by previous researchers that this is done by having motor actions produce expectations of their sensory consequences (Jeannerod, 1988). These sensory outcome predictions are then compared with the actual sensory input: whatever matches the outcome predictions is inhibited. In that way, the sensory consequences of self-produced actions could be filtered from incoming sensory data, thus highlighting any unpredicted, and therefore external (or novel), stimulus that should potentially be attended to.

For a number of reasons, the results of our experiments are consistent with the above account. First, although we found significant suppression of the responses of the auditory cortex to self-produced speech, we found little evidence that the act of producing speech suppresses responses to other sounds (in our case, the 1-kHz tone). This finding is consistent with the hypothesis that only responses to self-produced sounds are being filtered from the auditory cortex. In addition, we found that altering the auditory feedback with gated noise abolished the suppression in the auditory cortex induced by speaking, consistent with the hypothesis that the filtering out of auditory information arising from self-produced speech is done by comparison with predicted auditory feedback.

Recently, Blakemore et al. have recorded the same types of response suppression phenomena in the somatosensory cortex, suggesting that the need to distinguish self-produced from externally generated sensory input may be a general property of sensory systems. In their series of experiments, Blakemore et al. used both behavioral and physiological procedures to look at responses to both self-produced and externally produced somatosensory stimulation. In one experiment, they examined the well-known phenomenon that people cannot tickle themselves (Weiskrantz, Elliott, & Darlington, 1971). This phenomenon could be explained by supposing that somatosensory responses are inhibited when the incoming sensory (touch) information matches a prediction of that input generated by the person's own motor system. To test this hypothesis, these investigators had subjects tickle their palm via an apparatus that could introduce a time delay between their movement and the touch stimulation of their palm (Blakemore, Frith, & Wolpert, 1999). When this was done, subjects reported that with increasing delay the stimulation of their palm was increasingly stimulating (ticklish). That the perceived stimulation increased with delay could be explained by supposing that the delay made the incoming sensory information mismatch in time the predicted sensory input. This experiment was followed up in an fMRI study looking at activation of the somatosensory cortex and cerebellum to both self-produced and externally generated palm stimulation (Blakemore, Wolpert, & Frith, 1999). It was found that activation of the somatosensory cortex to self-produced palm stimulation was suppressed compared to externally produced palm stimulation, a direct analog of the results recorded in the auditory cortex in our experiments.

Speech Motor Control

The suppression could also serve to regulate how much auditory feedback is used in the ongoing control of speech.

A problem with using auditory feedback to control speech is the delay incurred by the processing of the auditory input. To control speech, auditory feedback must convey information about the positions of the vocal tract articulators. From speech acoustics, we know that formant frequencies and pitch are the acoustic parameters that convey this information (Stevens, 1999). Extracting these parameters from the type of signal provided by the cochlea requires significant signal processing, which in turn suggests the possibility of significant delay if a succession of auditory brain areas is needed for this processing (Jurgens, 2002; Burnett, Freedland, Larson, & Hain, 1998; Perkell, 1997). In addition, there will be delays associated with the generation of a response in the motor cortex. There will also be intrinsic delays in transmitting the neural signals to the appropriate muscles, as well as delays in muscle responses to these signals (Jurgens, 2002). Together, these factors could contribute to a significant feedback loop delay between auditory input and motor output.

Unfortunately, we know from control theory that, in general, using feedback to control a system with a large feedback loop delay will not work because the system becomes unstable (Franklin, Powell, & Emami-Naeini, 1991). Intuitively, this instability arises because the delay makes the feedback uncorrelated with the state of the system: sensory feedback from some time in the past does not entirely reflect the current state of the system. Yet auditory feedback does appear to be involved in the online control of some aspects of speech production. As noted in the Introduction, if the pitch of a speaker's feedback is perturbed, they will typically compensate within 100–200 msec (Burnett et al., 1998; Kawahara, 1993; Elman, 1981). Thus, in spite of a large (100–200 msec) feedback loop delay, auditory feedback does stably regulate a speaker's pitch. How is this possible? Again from control theory, we know that when feedback is delayed, corrupted by noise, or otherwise made uncorrelated with current motor output, a successful approach to using feedback control is to attenuate the sensory feedback to the degree that it is uncorrelated. Specifically, in feedback control based on Kalman filtering, the Kalman gain on sensory feedback is set in part to be inversely proportional to the degree to which sensory feedback is correlated with the current system state (Jacobs, 1993). In an analogous way, it may be that the suppression of the auditory cortex during speaking functions to attenuate the auditory feedback to a level commensurate with its delay, such that it can be used properly in the control of speech.
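The Kalman-gain intuition can be made concrete with a minimal scalar sketch (our illustration; the state values and noise variances are arbitrary, not parameters estimated from speech). As the measurement-noise variance R grows, modeling feedback that is delayed and hence decorrelated from the current state, the gain K shrinks, so the filter weights its internal prediction more and the sensory feedback less:

```python
import numpy as np

def kalman_update(x_pred, p_pred, z, r):
    """One scalar Kalman measurement update.

    x_pred, p_pred: predicted state and its variance;
    z: sensory measurement; r: measurement-noise variance.
    """
    k = p_pred / (p_pred + r)            # Kalman gain, between 0 and 1
    x_new = x_pred + k * (z - x_pred)    # blend prediction with feedback
    p_new = (1.0 - k) * p_pred
    return x_new, p_new, k

# Same prediction and measurement, with increasingly unreliable
# (e.g., delayed, hence decorrelated) sensory feedback:
for r in (0.1, 1.0, 10.0):
    x, p, k = kalman_update(x_pred=1.0, p_pred=1.0, z=2.0, r=r)
    print(f"R={r:4.1f}: gain K={k:.2f}, corrected state={x:.2f}")
```

With R = 0.1 the gain is near 1 and the feedback dominates; with R = 10 the gain is near 0 and the prediction dominates, the analog of auditory cortical suppression proposed here.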

Finally, it is possible that both of the above accounts are true. It may be that the suppression of self-produced speech acts as a filter for incoming sensory data, allowing attention to be paid to externally produced sensations, while at the same time properly attenuating the auditory feedback for use in the online control of speech. Future experiments must be designed to distinguish between the above hypothesized possibilities.

METHODS

Subjects

Eight male volunteers (ages 25–40) participated in Experiments 1 and 3 of this study. Of these volunteers, six also participated in Experiment 2. All subjects gave their informed consent. All studies were performed with the approval of the UCSF Committee for Human Research.

Experimental Design

Magnetic fields were recorded from both hemispheres in a shielded room using two 37-channel biomagnetometers with SQUID-based first-order gradiometer sensors (Magnes II, 4-D Neuroimaging, San Diego, CA). A schematic of the experimental setup is shown in Figure 1. A directional microphone was placed in the chamber at a distance where it did not distort the recorded magnetic fields. Triggering of the data acquisition system and of stimuli delivered to the ears was controlled, and it differed for each of the three experiments in the study.

Fiduciary points were marked on the skin for later coregistration with structural magnetic resonance images, and the head shape was digitized to constrain subsequent source modeling. The sensor was initially positioned over the estimated location of the auditory cortices in both hemispheres, such that a dipolar M100 response was evoked by single 400-msec duration tone pips (1 kHz, 5-msec rise/fall ramps, 90 dB SPL). Data acquisition epochs were 600 msec in total duration, with a 100-msec prestimulus period referenced to the onset of the first speech stimulus. Data were acquired at a sampling rate of 1041 Hz. The position of the sensor was then refined so that a single dipole localization model of the averaged evoked magnetic field responses to 100 tones resulted in a correlation and goodness-of-fit greater than 95%.

With satisfactory sensor positioning over the auditory cortices in both hemispheres, but before starting the experiments, subjects practiced producing the neutral vowel sound by phonating while relaxing their upper vocal tract, generating no movement of their jaw or tongue. Preliminary experiments revealed that this form of speech production produced no movement artifacts in these magnetic field recordings.

During the speaking condition of Experiments 1 and 3, the audio onset of each utterance picked up by the microphone triggered the MEG data acquisition system, which recorded 1 sec of data (300 msec pretrigger, 700 msec posttrigger) at a sampling rate of 1041 Hz. Subjects were instructed to produce short vowel sounds by phonating their vocal cords without any movement of their jaw or tongue. Responses to 100 such utterances were recorded, and these magnetic field recordings were then averaged to obtain the evoked magnetic field responses to self-produced speech for each MEG detector channel. In the subsequent tape playback condition, the audio signal from the tape recorder was fed to the subject's earphones and the trigger input for MEG data acquisition. Again, magnetic field recordings were averaged to obtain the evoked magnetic field responses.
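As a sketch of this trigger-locked epoching and averaging step (illustrative only; the function and variable names are our assumptions, not the interface of the actual acquisition software):

```python
import numpy as np

FS = 1041                          # sampling rate (Hz)
PRE, POST = 0.3, 0.7               # 300-msec pretrigger, 700-msec posttrigger

def average_evoked(data, triggers):
    """Average fixed-length epochs around utterance-onset triggers.

    data: (n_channels, n_samples) continuous recording from one array;
    triggers: sample indices at which the audio threshold was crossed.
    """
    pre, post = int(PRE * FS), int(POST * FS)
    epochs = [data[:, s - pre:s + post] for s in triggers
              if s - pre >= 0 and s + post <= data.shape[1]]
    return np.mean(epochs, axis=0)     # (n_channels, pre + post) evoked field

# Hypothetical usage with simulated data and 100 utterance onsets:
rng = np.random.default_rng(0)
raw = rng.standard_normal((37, 200 * FS))              # 37 channels, 200 sec
onsets = np.sort(rng.integers(FS, 199 * FS, size=100))
evoked = average_evoked(raw, onsets)
```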


During the tape playback condition of Experiments 1 and 3, the audio level of the signal fed to the earphones and trigger was adjusted to be the same as that of the speaking condition. Because of this, the trigger of MEG data acquisition (which was based on an audio voltage threshold) occurred at the same time relative to utterance onset in both conditions. The equal audio levels ensured that triggering latency differences could not account for any differences between the evoked responses to self-produced and tape-recorded speech. In Experiment 1, the subject heard these equal audio levels in the earphones, which meant that his side tone amplitude was the same in both the speaking and tape playback conditions. In Experiment 3, the audio signal for each utterance was used to trigger both the MEG data acquisition system and a white noise generator (20 Hz–20 kHz bandwidth, 85 dB SPL). In this case, a sum of their audio signal and noise was presented to the subject.

In Experiment 2, onset of the tone pips triggered MEG data acquisition in each of the three conditions (tones alone, tones and speaking, and tones and tape playback). These 1-kHz tones each had a 400-msec duration, 5-msec onset and offset ramps, and a 0.8–1.2-sec ISI between tones. The subject heard the tones at 90 dB SPL. As with Experiment 1, data acquisition epochs were 1.0 sec in total duration, with a 300-msec prestimulus period. Data were acquired at a sampling rate of 1041 Hz. In each condition, the evoked magnetic field responses to the tone pips were obtained by averaging 100 epochs.

Epochs 1500 msec long, with a 500-msec prestimulus period, were collected at a sampling rate of 1041 Hz. The raw epoch magnetic field data from these 100 utterances were then averaged to obtain the evoked magnetic field responses to gated noise during self-production.

The voltage level of the audio signal fed to the earphones and trigger in this condition was adjusted to be identical to the voltage levels recorded during the speaking condition, ensuring that triggering differences could not account for any differences between the evoked responses to noise in the self-produced and tape playback blocks.

Data Analysis

Data from the two sensor arrays positioned over each hemisphere were separately analyzed. From the evoked magnetic field response for each condition in each channel, the RMS value of evoked magnetic field strength averaged across channels was computed. The RMS method of averaging (square root of the mean of squared channel field strengths) was used to avoid the cancellation that occurs when the magnetic fields from detectors on opposite sides of the current dipole (whose fields are roughly equal in magnitude but opposite in sign) are averaged together. This process resulted in one RMS time waveform for each hemisphere, for each condition, for each subject. For statistical comparisons across conditions, a one-way analysis of variance was performed with condition as factor and time as a repeated measure.
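A minimal sketch of the RMS-across-channels computation described above (the function name and array layout are our assumptions, not the original analysis code):

```python
import numpy as np

def rms_across_channels(evoked):
    """Collapse an (n_channels, n_times) evoked field into one RMS waveform.

    Squaring before averaging keeps fields of opposite polarity (detectors
    on opposite sides of the dipole) from cancelling each other out.
    """
    return np.sqrt(np.mean(evoked ** 2, axis=0))   # (n_times,) waveform

# e.g., one waveform per hemisphere, condition, and subject:
# rms = rms_across_channels(evoked)
```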

The RMS across channels is a convenient measure of activity that is independent of the position of individual sensors. Following optimal placement of the sensor for a significant M100 dipolar response, RMS has been shown to correlate well with the "equivalent current-dipole strength" (also called the "Q value") of a dipole positioned in and around the auditory cortex. The RMS measure is also correlated with the data obtained from (1) the channel with the biggest signal-to-noise ratio and (2) the first two principal components of the array response (Ahissar et al., 2001; Mahncke, 1998).

In sensor arrays comprising planar gradiometers (like the Neuromag 122 or 306), the channel with the largest signal-to-noise ratio is often closest to the source (as dictated by the physics). However, in the case of sensor arrays comprising axial gradiometers (as in our Magnes II system), the channel with the largest signal-to-noise ratio reflects the peak of the magnetic field, which is not the closest channel to a source. Moreover, the "best" channel could reflect either a peak "inward" or "outward" field, requiring signed inversion when averaging such data across subjects. Finally, since the sensor array position can vary considerably across subjects, it is difficult to use procedures typical for EEG and whole-head studies, wherein single-channel data are often picked for analysis across subjects. These considerations have led us to use RMS instead of picking specific channels for our analysis.

Single Dipole Model Fit

A single equivalent current-dipole model was calculated separately for each hemisphere for each poststimulus time point, using standard 4-D Neuroimaging software operating on data that had been filtered between 1 and 20 Hz. The localization algorithm used an iterative least-squares minimization to compute the strength and location of a single dipole in a spherical volume of uniform conductivity that can account for the sensor data. Dipole fits were accepted based on a local correlation maximum criterion of 95% and goodness-of-fit values greater than 95%.
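To illustrate the iterative least-squares idea (a toy sketch under strong assumptions: it uses the free-space primary field of a current dipole, ignoring the volume currents that the conducting-sphere model in the actual 4-D software accounts for, and all names are hypothetical):

```python
import numpy as np
from scipy.optimize import least_squares

MU0_OVER_4PI = 1e-7    # T*m/A

def toy_forward(params, sensor_pos):
    """z-component of the free-space field of a current dipole."""
    r0, q = params[:3], params[3:]                # position and moment
    d = sensor_pos - r0                           # (n_sensors, 3) displacements
    r3 = np.linalg.norm(d, axis=1) ** 3
    b = MU0_OVER_4PI * np.cross(q, d) / r3[:, None]
    return b[:, 2]

def fit_dipole(measured, sensor_pos, x0):
    """Iterative least-squares fit of dipole position and moment."""
    res = least_squares(lambda p: toy_forward(p, sensor_pos) - measured, x0)
    return res.x    # best-fit [x, y, z, qx, qy, qz]
```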

The MEG anatomical reference frame was established using a digital device called the "sensor position indicator." A set of three table-mounted receivers and one array-mounted receiver triangulates the signal from a stylus transmitter positioned at fiducial reference points on the subject's head surface (typically the nasion, left and right preauricular points, Cz, and the inion). The same stylus transmitter arrangement served to define the curvature of the head by tracing the surface, from which a "local sphere" model of the head was generated. To coregister the dipole locations on the MRI image, the fiduciary points serve as the basis for a common coordinate system. By superimposing these fiducial landmarks on MR images of the subject, it was possible to define the position of the computed point sources with an accuracy of 5 mm.

Multiple Dipole Localization Analysis

Multiple dipole localization analyses of spatio-temporal evoked magnetic fields were also performed using Multiple Signal Classification (MUSIC) (Sekihara, Poeppel, Marantz, Koizumi, & Miyashita, 1997; Mosher, Lewis, & Leahy, 1992) and beamformer algorithms (Mosher et al., 1992; Sekihara, Nagarajan, Poeppel, Marantz, & Miyashita, 2001; Robinson & Vrba, 1999). MUSIC methods are based on estimation of a signal "subspace" from the entire spatio-temporal MEG data using singular-value decomposition (SVD). A version of the MUSIC algorithm, referred to as the "conventional" MUSIC algorithm, was implemented in MATLAB under the assumption that the sources contributing to the MEG data arose from multiple dipoles (<37 in number) that were stationary and located within a spherical volume of uniform conductivity (Sekihara et al., 1997). The locations of dipoles are typically determined by conducting a search over a three-dimensional grid of interest within the head. Given the sensor positions and the coordinates of the origin of a "local sphere" approximation of the head shape for each subject, a lead-field matrix was computed for each point in this 3-D grid. At each point in this grid, from the lead-field matrix and the covariance matrix of the spatio-temporal MEG data, the value of a MUSIC localizer function can then be computed (Sekihara et al., 1997, Equation (4)). Maxima of this localizer function then corresponded to the locations of dipolar sources. We used the MUSIC algorithm to compute the localizer functions over a three-dimensional grid in the hemisphere contralateral to the stimulus delivery. The localizer was computed using the evoked magnetic field responses over a time window of 1000 msec following the onset of the sentences. For each subject, at each point in a 3-D grid (−4 < x < 6, 0 < y < 8, 3 < z < 11) in the left hemisphere, the localizer function was computed using the averaged evoked auditory magnetic field responses.
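The study's implementation was in MATLAB; the Python sketch below shows only the generic conventional-MUSIC localizer in its subspace-correlation form, not the noise-covariance-incorporated Equation (4) of Sekihara et al. (1997), and it assumes the lead-field matrix for the local-sphere model has already been computed:

```python
import numpy as np

def music_localizer(cov, leadfield, n_sources):
    """Conventional MUSIC localizer value at one grid point.

    cov: (n_chan, n_chan) covariance of the spatio-temporal data;
    leadfield: (n_chan, 3) lead-field matrix for the grid point;
    n_sources: assumed dimension of the signal subspace.
    """
    _, eigvecs = np.linalg.eigh(cov)            # eigenvalues in ascending order
    e_noise = eigvecs[:, :-n_sources]           # noise-subspace eigenvectors
    num = leadfield.T @ leadfield
    den = leadfield.T @ e_noise @ e_noise.T @ leadfield
    # Minimize the noise-subspace projection over dipole orientation:
    gev = np.linalg.eigvals(np.linalg.solve(num, den)).real
    return 1.0 / gev.min()                      # peaks at dipolar source locations
```

Scanning this value over the 3-D grid and picking local maxima is what the grid search in the paragraph above amounts to.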

Acknowledgments

This work was funded in part by Scientific Learning Co. and a grant from the Whitaker Foundation.

Reprint requests should be sent to John F. Houde, W. M. Keck Center for Integrative Neuroscience, University of California, San Francisco, 513 Parnassus Avenue, S-762, San Francisco, CA 94143, USA, or via e-mail: houde@phy.ucsf.edu.

Note

1. In a recent study of M100 responses to tones, it was found that near detection thresholds, a 15-dB change in SPL caused a 34% change in M100 RMS amplitude (Stufflebeam et al., 1998).

REFERENCES

Ahissar, E., Nagarajan, S., Ahissar, M., Protopapas, A., Mahncke, H., & Merzenich, M. M. (2001). Speech comprehension is correlated with temporal response patterns recorded from auditory cortex. Proceedings of the National Academy of Sciences, U.S.A., 98, 13367–13372.

Blakemore, S. J., Frith, C. D., & Wolpert, D. M. (1999). Spatio-temporal prediction modulates the perception of self-produced stimuli. Journal of Cognitive Neuroscience, 11, 551–559.

Blakemore, S. J., Wolpert, D. M., & Frith, C. D. (1999). The cerebellum contributes to somatosensory cortical activity during self-produced tactile stimulation. Neuroimage, 10, 448–459.

Burnett, T. A., Freedland, M. B., Larson, C. R., & Hain, T. C. (1998). Voice f0 responses to manipulations in pitch feedback. Journal of the Acoustical Society of America, 103, 3153–3161.

Cooper, W. E. (1979). Speech perception and production: Studies in selective adaptation. Norwood, NJ: Ablex.

Creutzfeldt, O., & Ojemann, G. (1989). Neuronal activity in the human lateral temporal lobe: III. Activity changes during music. Experimental Brain Research, 77, 490–498.

Creutzfeldt, O., Ojemann, G., & Lettich, E. (1989a). Neuronal activity in the human lateral temporal lobe: I. Responses to speech. Experimental Brain Research, 77, 451–475.

Creutzfeldt, O., Ojemann, G., & Lettich, E. (1989b). Neuronal activity in the human lateral temporal lobe: II. Responses to the subject's own voice. Experimental Brain Research, 77, 476–489.

Curio, G., Neuloh, G., Numminen, J., Jousmaki, V., & Hari, R. (2000). Speaking modifies voice-evoked activity in the human auditory cortex. Human Brain Mapping, 9, 183–191.

Demonet, J. F., Chollet, F., Ramsay, S., Cardebat, D., Nespoulous, J. L., Wise, R., Rascol, A., & Frackowiak, R. (1992). The anatomy of phonological and semantic processing in normal subjects. Brain, 115, 1753–1768.

Diesch, E., & Luce, T. (1997). Magnetic fields elicited by tones and vowel formants reveal tonotopy and nonlinear summation of cortical activation. Psychophysiology, 34, 501–510.

Elman, J. L. (1981). Effects of frequency-shifted feedback on the pitch of vocal productions. Journal of the Acoustical Society of America, 70, 45–50.

Franklin, G. F., Powell, J. D., & Emami-Naeini, A. (1991). Feedback control of dynamic systems (2nd ed.). Reading, MA: Addison-Wesley.

Gracco, V. L., Ross, D., Kalinowski, J., & Stuart, A. (1994). Articulatory changes following spectral and temporal modifications in auditory feedback. Journal of the Acoustical Society of America, 95, 2821.

Gunji, A., Hoshiyama, M., & Kakigi, R. (2000). Identification of auditory evoked potentials of one's own voice. Clinical Neurophysiology, 111, 214–219.

Gunji, A., Kakigi, R., & Hoshiyama, M. (2000). Spatiotemporal source analysis of vocalization-associated magnetic fields. Cognitive Brain Research, 9, 157–163.

Hari, R., Aittoniemi, K., Jarvinen, M. L., Katila, T., & Varpula, T. (1980). Auditory evoked transient and sustained magnetic fields of the human brain: Localization of neural generators. Experimental Brain Research, 40, 237–240.

Hari, R., & Makela, J. P. (1988). Modification of neuromagnetic responses of the human auditory cortex by masking sounds. Experimental Brain Research, 71, 87–92.

Hein, A. V., & Held, R. (1962). A neural model for labile sensorimotor coordination. In E. Bernard & M. Hare (Eds.), Biological prototypes and synthetic systems (Vol. 1, pp. 71–74). New York: Plenum.

Hickok, G., & Poeppel, D. (2000). Towards a functional neuroanatomy of speech perception. Trends in Cognitive Sciences, 4, 131–138.

Hirano, S., Kojima, H., Naito, Y., Honjo, I., Kamoto, Y., Okazawa, H., Ishizu, K., Yonekura, Y., Nagahama, Y., Fukuyama, H., & Konishi, J. (1996). Cortical speech processing mechanisms while vocalizing visually presented languages. NeuroReport, 8, 363–367.

Hirano, S., Kojima, H., Naito, Y., Honjo, I., Kamoto, Y., Okazawa, H., Ishizu, K., Yonekura, Y., Nagahama, Y., Fukuyama, H., & Konishi, J. (1997). Cortical processing mechanism for vocalization with auditory verbal feedback. NeuroReport, 8, 2379–2382.

Hirano, S., Naito, Y., Okazawa, H., Kojima, H., Honjo, I., Ishizu, K., Yenokura, Y., Nagahama, Y., Fukuyama, H., & Konishi, J. (1997). Cortical activation by monaural speech sound stimulation demonstrated by positron emission tomography. Experimental Brain Research, 113, 75–80.

Houde, J. F., & Jordan, M. I. (1998). Sensorimotor adaptation in speech production. Science, 279, 1213–1216.

Jacobs, O. L. R. (1993). Introduction to control theory (2nd ed.). Oxford, UK: Oxford University Press.

Jeannerod, M. (1988). The neural and behavioural organisation of goal-directed movements. Oxford, UK: Oxford University Press.

Jurgens, U. (2002). Neural pathways underlying vocal control. Neuroscience and Biobehavioral Reviews, 26, 235–258.

Kawahara, H. (1993). Transformed auditory feedback: Effects of fundamental frequency perturbation. Journal of the Acoustical Society of America, 94, 1883.

Kuriki, S., Mori, T., & Hirata, Y. (1999). Motor planning center for speech articulation in the normal human brain. NeuroReport, 10, 765–769.

Kuriki, S., & Murase, M. (1989). Neuromagnetic study of the auditory responses in right and left hemispheres of the human brain evoked by pure tones and speech sounds. Experimental Brain Research, 77, 127–134.

Lane, H., & Tranel, B. (1971). The Lombard sign and the role of hearing in speech. Journal of Speech and Hearing Research, 14, 677–709.

Lee, B. S. (1950). Some effects of side-tone delay. Journal of the Acoustical Society of America, 22, 639–640.

Leonardo, A., & Konishi, M. (1999). Decrystallization of adult birdsong by perturbation of auditory feedback. Nature, 399, 466–470.

Levelt, W. J. (1983). Monitoring and self-repair in speech. Cognition, 14, 41–104.

Levelt, W. J. M. (1989). Speaking: From intention to articulation. Cambridge: MIT Press.

Liegeois-Chauvel, C., Musolino, A., Badier, J. M., Marquis, P., & Chauvel, P. (1994). Evoked potentials recorded from the auditory cortex in man: Evaluation and topography of the middle latency components. Electroencephalography and Clinical Neurophysiology, 92, 204–214.

Mahncke, H. (1998). Oscillations and temporal processing in sensory cortex. Unpublished doctoral dissertation, University of California, San Francisco.

McCasland, J. S., & Konishi, M. (1981). Interaction between auditory and motor activities in an avian song control nucleus. Proceedings of the National Academy of Sciences, U.S.A., 78, 7815–7819.

McGuire, P. K., Silbersweig, D. A., & Frith, C. D. (1996). Functional neuroanatomy of verbal self-monitoring. Brain, 119, 907–917.

Mosher, J. C., Lewis, P. S., & Leahy, R. M. (1992). Multiple dipole modeling and localization from spatio-temporal MEG data. IEEE Transactions on Biomedical Engineering, 39, 541–557.

Muller-Preuss, P., & Ploog, D. (1981). Inhibition of auditory cortical neurons during phonation. Brain Research, 215, 61–76.

Numminen, J., & Curio, G. (1999). Differential effects of overt, covert and replayed speech on vowel-evoked responses of the human auditory cortex. Neuroscience Letters, 272, 29–32.

Numminen, J., Salmelin, R., & Hari, R. (1999). Subject's own speech reduces reactivity of the human auditory cortex. Neuroscience Letters, 265, 119–122.

Papanicolaou, A. C., Raz, N., Loring, D. W., & Eisenberg, H. M. (1986). Brain stem evoked response suppression during speech production. Brain and Language, 27, 50–55.

Perkell, J. (1997). Articulatory processes. In W. J. Hardcastle & J. Laver (Eds.), The handbook of phonetic sciences (pp. 333–370). Oxford, UK: Blackwell Publishers.

Picton, T. W., Alain, C., Woods, D. L., John, M. S., Scherg, M., Valdes-Sosa, P., Bosch-Bayard, J., & Trujillo, N. J. (1999). Intracerebral sources of human auditory-evoked potentials. Audiology and Neuro-otology, 4, 64–79.

Poeppel, D., Yellin, E., Phillips, C., Roberts, T. P., Rowley, H. A., Wexler, K., & Marantz, A. (1996). Task-induced asymmetry of the auditory evoked M100 neuromagnetic field elicited by speech sounds. Brain Research: Cognitive Brain Research, 4, 231–242.

Reite, M., Adams, M., Simon, J., Teale, P., Sheeder, J., Richardson, D., & Grabbe, R. (1994). Auditory M100 component 1: Relationship to Heschl's gyri. Brain Research: Cognitive Brain Research, 2, 13–20.

Richer, F., Alain, C., Achim, A., Bouvier, G., & Saint-Hilaire, J. M. (1989). Intracerebral amplitude distributions of the auditory evoked potential. Electroencephalography and Clinical Neurophysiology, 74, 202–208.

Robinson, S., & Vrba, J. (1999). Functional neuroimaging by synthetic aperture magnetometry (SAM). In Yoshimoto et al. (Eds.), Recent advances in biomagnetism (pp. 302–305). Sendai, Japan: Tohoku University Press.

Scherg, M., & Von Cramon, D. (1985). Two bilateral sources of the late AEP as identified by a spatio-temporal dipole model. Electroencephalography and Clinical Neurophysiology, 62, 32–44.

Scherg, M., & Von Cramon, D. (1986). Evoked dipole source potentials of the human auditory cortex. Electroencephalography and Clinical Neurophysiology, 65, 344–360.

Schmidt, M. F., & Konishi, M. (1998). Gating of auditory responses in the vocal control system of awake songbirds. Nature Neuroscience, 1, 513–518.

Sekihara, K., Nagarajan, S. S., Poeppel, D., Marantz, A., & Miyashita, Y. (2001). Reconstructing spatio-temporal activities of neural sources using an MEG vector beamformer technique. IEEE Transactions on Biomedical Engineering, 48, 760–771.

Sekihara, K., Poeppel, D., Marantz, A., Koizumi, H., & Miyashita, Y. (1997). Noise covariance incorporated MEG-MUSIC algorithm: A method for multiple-dipole estimation tolerant of the influence of background brain activity. IEEE Transactions on Biomedical Engineering, 44, 839–847.

Stevens, K. N. (1999). Acoustic phonetics. Cambridge: MIT Press.

Stufflebeam, S. M., Poeppel, D., Rowley, H. A., & Roberts, T. P. (1998). Peri-threshold encoding of stimulus frequency and intensity in the M100 latency. NeuroReport, 9, 91–94.

Suga, N., & Schlegel, P. (1972). Neural attenuation of responses to emitted sounds in echolocating bats. Science, 177, 82–84.

Suga, N., & Shimozawa, T. (1974). Site of neural attenuation of responses to self-vocalized sounds in echolocating bats. Science, 183, 1211–1213.

von Bekesy, G. (1949). The structure of the middle ear and the hearing of one's own voice by bone conduction. Journal of the Acoustical Society of America, 21, 217–232.

Weiskrantz, L., Elliott, J., & Darlington, C. (1971). Preliminary observations on tickling oneself. Nature, 230, 598–599.

Woods, D. L., Knight, R. T., & Neville, H. J. (1984). Bitemporal lesions dissociate auditory evoked potentials and perception. Electroencephalography and Clinical Neurophysiology, 57, 208–220.

Yates, A. J. (1963). Delayed auditory feedback. Psychological Bulletin, 60, 213–232.

Zatorre, R. J., Evans, A. C., Meyer, E., & Gjedde, A. (1992). Lateralization of phonetic and pitch discrimination in speech processing. Science, 256, 846–849.

1138 Journal of Cognitive Neuroscience Volume 14 Number 8

experiments because we were unable to simultaneouslymeasure activity from motor and supplementary motorareas using a 37-channel sensor array over each hemi-sphere we focus here only on the activity from auditorycortices of both hemispheres

More relevant to our results are the studies of Hiranoet al (Hirano et al 1996 Hirano Kojima et al 1997Hirano Naito et al 1997) who used PET to look atcortical activation during speaking In their first studyHirano et al found a lack of activation in the auditorycortex while subjects vocalized In their second studyHirano et al altered how subjects heard their ownvocalizations and this time recorded significant activa-tion of the auditory cortex These results are consistentwith our findings in Experiments 1 and 3 of our study

Why is the Response to Self-Produced SpeechSuppressed

The present study, along with recent MEG studies by Curio et al. (2000) and other brain imaging studies that have used PET or fMRI (Hirano et al., 1996; Hirano, Kojima, et al., 1997; Hirano, Naito, et al., 1997), demonstrates that during speech production the auditory cortex suppresses its response to expected acoustic signals. Does that suppression serve any function? We speculate about two possibilities.

Auditory Perception

One possibility is that the suppression results from the process of distinguishing self-produced from externally produced sources. It would be useful for sensory systems to make this distinction, since sensory information from these two types of sources is commonly used for different purposes. Sensory information from self-produced motor actions could potentially be used for feedback control. On the other hand, sensory information from external sources is primarily used for recognition.

But how could the sensory systems distinguish self-produced from externally produced sensory information? It has been proposed by previous researchers that this is done by having motor actions produce expectations of their sensory consequences (Jeannerod, 1988). These sensory outcome predictions are then compared with the actual sensory input; whatever matches the outcome predictions is inhibited. In that way, the sensory consequences of self-produced actions could be filtered from incoming sensory data, thus highlighting any unpredicted, and therefore external (or novel), stimulus that should potentially be attended to.
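This comparison-and-inhibition scheme can be stated concretely. The Python sketch below is purely illustrative: the feature representation, the matching tolerance, and the all-or-none gating are our assumptions, not a model proposed in this article.

    import numpy as np

    def gate_self_produced(actual, predicted, tolerance=0.1):
        # Inhibit whatever matches the motor system's outcome
        # prediction; pass the unpredicted (external or novel) part.
        mismatch = np.abs(actual - predicted)
        gate = (mismatch > tolerance).astype(float)
        return actual * gate

    t = np.linspace(0.0, 1.0, 1000)
    predicted = np.sin(2 * np.pi * 5 * t)      # expected feedback
    actual = predicted.copy()
    actual[600:700] += 0.8                     # unexpected external burst
    residual = gate_self_produced(actual, predicted)
    # residual is ~0 wherever the input matched the prediction and is
    # nonzero only during the unpredicted burst.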

For a number of reasons, the results of our experiments are consistent with the above account. First, although we found significant suppression of the responses of the auditory cortex to self-produced speech, we found little evidence that the act of producing speech suppresses responses to other sounds (in our case, the 1-kHz tone). This finding is consistent with the hypothesis that only responses to self-produced sounds are being filtered from the auditory cortex. In addition, we found that altering the auditory feedback with gated noise abolished the suppression in the auditory cortex induced by speaking, consistent with the hypothesis that the filtering out of auditory information arising from self-produced speech is done by comparison with predicted auditory feedback.

Recently, Blakemore et al. have recorded the same types of response suppression phenomena in the somatosensory cortex, suggesting that the need to distinguish self-produced from externally generated sensory input may be a general property of sensory systems. In their series of experiments, Blakemore et al. used both behavioral and physiological procedures to look at responses to both self-produced and externally produced somatosensory stimulation. In one experiment, they examined the well-known phenomenon that people cannot tickle themselves (Weiskrantz, Elliott, & Darlington, 1971). This phenomenon could be explained by supposing that somatosensory responses are inhibited when the incoming sensory (touch) information matches a prediction of that input generated by the person's own motor system. To test this hypothesis, these investigators had subjects tickle their palm via an apparatus that could introduce a time delay between their movement and the touch stimulation of their palm (Blakemore, Frith, & Wolpert, 1999). When this was done, subjects reported that, with increasing delay, the stimulation of their palm was increasingly ticklish. That the perceived stimulation increased with delay could be explained by supposing that the delay made the incoming sensory information mismatch in time the predicted sensory input. This experiment was followed up in an fMRI imaging study looking at activation of the somatosensory cortex and cerebellum to both self-produced and externally generated palm stimulation (Blakemore, Wolpert, & Frith, 1999). It was found that activation of the somatosensory cortex to self-produced palm stimulation was suppressed compared to externally produced palm stimulation: a direct analog of the results recorded in the auditory cortex in our experiments.

Speech Motor Control

The suppression could also serve to regulate how much auditory feedback is used in the ongoing control of speech.

A problem with using auditory feedback to control speech is the delay incurred by the processing of the auditory input. To control speech, auditory feedback must convey information about the positions of the vocal tract articulators. From speech acoustics, we know that formant frequencies and pitch are the acoustic parameters that convey this information (Stevens, 1999). Extracting these parameters from the type of signal provided by the cochlea requires significant signal processing, which in turn suggests the possibility of significant delay if a succession of auditory brain areas are needed for this processing (Jurgens, 2002; Burnett, Freedland, Larson, & Hain, 1998; Perkell, 1997). In addition, there will be delays associated with the generation of a response in the motor cortex. There will also be intrinsic delays in transmitting the neural signals to the appropriate muscles, as well as delays in muscle responses to these signals (Jurgens, 2002). Together, these factors could contribute to a significant feedback loop delay between auditory input and motor output.

Unfortunately, we know from control theory that, in general, using feedback to control a system with a large feedback loop delay will not work because the system becomes unstable (Franklin, Powell, & Emami-Naeini, 1991). Intuitively, this instability arises because the delay makes the feedback uncorrelated with the state of the system: sensory feedback from some time in the past does not entirely reflect the current state of the system. Yet auditory feedback does appear to be involved in the online control of some aspects of speech production. As noted in the Introduction, if the pitch of a speaker's feedback is perturbed, they will typically compensate within 100-200 msec (Burnett et al., 1998; Kawahara, 1993; Elman, 1981). Thus, in spite of a large (100-200 msec) feedback loop delay, auditory feedback does stably regulate a speaker's pitch. How is this possible? Again from control theory, we know that when feedback is delayed, corrupted by noise, or otherwise made uncorrelated with current motor output, a successful approach to using feedback control is to attenuate the sensory feedback to the degree that it is uncorrelated. Specifically, in feedback control based on Kalman filtering, the Kalman gain on sensory feedback is set, in part, to be inversely proportional to the degree to which sensory feedback is correlated with the current system state (Jacobs, 1993). In an analogous way, it may be that the suppression of the auditory cortex during speaking functions to attenuate the auditory feedback to a level commensurate with its delay, such that it can be used properly in the control of speech.
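The stabilizing effect of attenuation can be seen in a toy discrete-time simulation. The sketch below is ours, not a model of the speech apparatus: the scalar plant, the delay of 5 steps, and the two gains are arbitrary choices. A controller that applies full gain to stale feedback oscillates and diverges, while the same loop with an attenuated gain stays bounded.

    import numpy as np

    def simulate(gain, delay, steps=200):
        # State update x[k+1] = x[k] + u[k], where the correction u[k]
        # is computed from feedback observed `delay` steps in the past.
        x = np.zeros(steps)
        x[0] = 1.0                                  # initial error
        for k in range(steps - 1):
            delayed = x[k - delay] if k >= delay else x[0]
            x[k + 1] = x[k] - gain * delayed
        return x

    print(np.abs(simulate(gain=1.0, delay=5)).max())  # grows without bound
    print(np.abs(simulate(gain=0.1, delay=5)).max())  # stays bounded

This mirrors the Kalman-gain prescription cited above: the staler (less correlated) the feedback, the smaller the gain that can safely be applied to it.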

Finally, it is possible that both of the above accounts are true. It may be that the suppression of self-produced speech acts as a filter for incoming sensory data, allowing attention to be paid to externally produced sensations, while at the same time properly attenuating the auditory feedback for use in the online control of speech. Future experiments must be designed to distinguish between the above hypothesized possibilities.

METHODS

Subjects

Eight male volunteers (ages 25-40) participated in Experiments 1 and 3 of this study. Of these volunteers, six also participated in Experiment 2. All subjects gave their informed consent. All studies were performed with the approval of the UCSF Committee for Human Research.

Experimental Design

Magnetic fields were recorded from both hemispheres in a shielded room using two 37-channel biomagnetometers with SQUID-based first-order gradiometer sensors (Magnes II, 4-D Neuroimaging, San Diego, CA). A schematic of the experimental setup is shown in Figure 1. A directional microphone was placed in the chamber at a distance where it did not distort the recorded magnetic fields. Triggering of the data acquisition system and of stimuli delivered to the ears was controlled, and it differed for each of the three experiments in the study.

Fiduciary points were marked on the skin for later coregistration with structural magnetic resonance images, and the head shape was digitized to constrain subsequent source modeling. The sensor was initially positioned over the estimated location of the auditory cortices in both hemispheres, such that a dipolar M100 response was evoked by single 400-msec duration tone pips (1 kHz, 5-msec rise/fall ramps, 90 dB SPL). Data acquisition epochs were 600 msec in total duration, with a 100-msec prestimulus period referenced to the onset of the first speech stimulus. Data were acquired at a sampling rate of 1041 Hz. The position of the sensor was then refined so that a single dipole localization model of the averaged evoked magnetic field responses to 100 tones resulted in a correlation and goodness-of-fit greater than 95%.

With satisfactory sensor positioning over the auditory cortices in both hemispheres, but before starting the experiments, subjects practiced producing the neutral vowel sound by phonating while relaxing their upper vocal tract, generating no movement of their jaw or tongue. Preliminary experiments revealed that this form of speech production produced no movement artifacts in these magnetic field recordings.

During the speaking condition of Experiments 1 and 3, the audio onset of each utterance, picked up by the microphone, triggered the MEG data acquisition system, which recorded 1 sec of data (300 msec pretrigger, 700 msec posttrigger) at a sampling rate of 1041 Hz. Subjects were instructed to produce short vowel sounds by phonating their vocal cords without any movement of their jaw or tongue. Responses to 100 such utterances were recorded, and these magnetic field recordings were then averaged to obtain the evoked magnetic field responses to self-produced speech for each MEG detector channel. In the subsequent tape playback condition, the audio signal from the tape recorder was fed to the subject's earphones and the trigger input for MEG data acquisition. Again, magnetic field recordings were averaged to obtain the evoked magnetic field responses.
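In array form, this averaging step is a mean over trigger-aligned epochs. A minimal sketch, with array shapes taken from the text (100 utterances, 37 channels per hemisphere, roughly 1041 samples per 1-sec epoch at 1041 Hz) and synthetic data standing in for the recordings:

    import numpy as np

    rng = np.random.default_rng(0)
    epochs = rng.standard_normal((100, 37, 1041))  # trials x channels x samples

    evoked = epochs.mean(axis=0)                   # (37, 1041) evoked response

    # Baseline correction against the 300-msec pretrigger interval is a
    # common further step (not stated in the text): ~312 samples at 1041 Hz.
    evoked -= evoked[:, :312].mean(axis=1, keepdims=True)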


During the tape playback condition of Experiments 1 and 3, the audio level of the signal fed to the earphones and trigger was adjusted to be the same as that of the speaking condition. Because of this, the trigger of MEG data acquisition (which was based on an audio voltage threshold) occurred at the same time relative to utterance onset in both conditions. The equal audio levels ensured that triggering latency differences could not account for any differences between the evoked responses to self-produced and tape-recorded speech. In Experiment 1, the subject heard these equal audio levels in the earphones, which meant that his side tone amplitude was the same in both the speaking and tape playback conditions. In Experiment 3, the audio signal for each utterance was used to trigger both the MEG data acquisition system and a white noise generator (20 Hz-20 kHz bandwidth, 85 dB SPL). In this case, a sum of the audio signal and noise was presented to the subject.
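A software analog of such a voltage-threshold trigger is straightforward; the sketch below is ours (the actual triggering was done in hardware), with a refractory period added so that one utterance cannot fire multiple triggers:

    import numpy as np

    def utterance_onsets(audio, threshold, refractory):
        # Indices where |audio| first crosses the voltage threshold,
        # ignoring re-crossings within `refractory` samples of the
        # previous trigger.
        onsets, last = [], -refractory
        for i in np.flatnonzero(np.abs(audio) >= threshold):
            if i - last >= refractory:
                onsets.append(i)
                last = i
        return onsets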

In Experiment 2, onset of the tone pips triggered MEG data acquisition in each of the three conditions (tones alone, tones and speaking, and tones and tape playback). These 1-kHz tones each had a 400-msec duration, 5-msec onset and offset ramps, and a 0.8-1.2-sec ISI between tones. The subject heard the tones at 90 dB SPL. As with Experiment 1, data acquisition epochs were 1.0 sec in total duration, with a 300-msec prestimulus period. Data were acquired at a sampling rate of 1041 Hz. In each condition, the evoked magnetic field responses to the tone pips were obtained by averaging 100 epochs.
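For reference, the tone pip is fully specified by the parameters above; a sketch of its synthesis follows, where the audio sampling rate and the linear ramp shape are our choices (the text does not give them):

    import numpy as np

    def tone_pip(freq=1000.0, dur=0.400, ramp=0.005, fs=44100):
        # 1-kHz, 400-msec tone with 5-msec onset/offset ramps.
        t = np.arange(int(dur * fs)) / fs
        tone = np.sin(2 * np.pi * freq * t)
        n = int(ramp * fs)
        env = np.ones_like(tone)
        env[:n] = np.linspace(0.0, 1.0, n)    # onset ramp
        env[-n:] = np.linspace(1.0, 0.0, n)   # offset ramp
        return tone * env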

Epochs, 1500 msec long with a 500-msec prestimulus period, were collected at a sampling rate of 1041 Hz. The raw epoch magnetic field data from these 100 utterances were then averaged to obtain the evoked magnetic field responses to gated noise during self-production.

The voltage level of the audio signal fed to the earphone and trigger signal in this condition was adjusted to be identical to the voltage levels recorded during the speaking condition, ensuring that triggering differences could not account for any differences between the evoked responses to noise in the self-produced and tape playback blocks.

Data Analysis

Data from the two sensor arrays positioned over each hemisphere were separately analyzed. From the evoked magnetic field response for each condition in each channel, the RMS value of evoked magnetic field strength averaged across channels was computed. The RMS method of averaging (square root of the mean of squared channel field strengths) was used to avoid the cancellation that occurs when the magnetic fields from detectors on opposite sides of the current dipole (whose fields are roughly equal in magnitude but opposite in sign) are averaged together. This process resulted in one RMS time waveform for each hemisphere, for each condition, for each subject. For statistical comparisons across conditions, a one-way analysis of variance was performed with condition as factor and time as a repeated measure.
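The RMS waveform described here reduces to one line of array code; a sketch with our variable names:

    import numpy as np

    def rms_waveform(evoked):
        # evoked: (n_channels, n_samples) evoked field, one hemisphere.
        # Squaring before averaging prevents fields of opposite sign on
        # opposite sides of the dipole from cancelling.
        return np.sqrt((evoked ** 2).mean(axis=0))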

The RMS across channels is a convenient measure of activity that is independent of the position of individual sensors. Following optimal placement of the sensor for a significant M100 dipolar response, RMS has been shown to correlate well with the "equivalent current-dipole strength" (also called the "Q value") of a dipole positioned in and around the auditory cortex. The RMS measure is also correlated with the data obtained from (1) the channel with the biggest signal-to-noise ratio and (2) the first two principal components of the array response (Ahissar et al., 2001; Mahncke, 1998).

In sensor arrays comprising planar gradiometers (like the Neuromag 122 or 306), the channel with the largest signal-to-noise ratio is often closest to the source (as dictated by the physics). However, in the case of sensor arrays comprising axial gradiometers (as in our Magnes II system), the channel with the largest signal-to-noise ratio reflects the peak of the magnetic field, which is not the closest channel to a source. Moreover, the "best" channel could reflect either a peak "inward" or "outward" field, requiring signed inversion while averaging such data across subjects. Finally, since the sensor array position can vary considerably across subjects, it is difficult to use procedures typical for EEG and whole-head studies, wherein single-channel data are often picked for analysis across subjects. These considerations led us to use RMS instead of picking specific channels for our analysis.

Single Dipole Model Fit

A single equivalent current-dipole model was calculated separately for each hemisphere for each poststimulus time point, using standard 4-D Neuroimaging software operating on data that had been filtered between 1 and 20 Hz. The localization algorithm used an iterative least-squares minimization to compute the strength and location of a single dipole, in a spherical volume of uniform conductivity, that can account for the sensor data. Dipole fits were accepted based on a local correlation maximum criterion of 95% and goodness-of-fit values greater than 95%.
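The fitting procedure can be sketched generically. Below, `forward(p)` is a stand-in for the spherical-conductor forward model mapping dipole parameters (location and moment) to predicted sensor fields; the use of scipy's least-squares routine and the goodness-of-fit definition as fraction of field variance explained are our assumptions about what the vendor software computes, not its documented internals.

    import numpy as np
    from scipy.optimize import least_squares

    def fit_single_dipole(b_meas, forward, p0):
        # Iterative least-squares fit of one equivalent current dipole.
        res = least_squares(lambda p: forward(p) - b_meas, p0)
        pred = forward(res.x)
        gof = 1.0 - ((b_meas - pred) ** 2).sum() / (b_meas ** 2).sum()
        return res.x, gof   # accept only if gof exceeds, e.g., 0.95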

The MEG anatomical reference frame was established using a digital device called the "sensor position indicator." A set of three table-mounted and one array-mounted receivers triangulates the signal from a stylus transmitter positioned at fiducial reference points on the subject's head surface (typically nasion, left and right preauricular points, Cz, and inion). The same stylus transmitter arrangement served to define the curvature of the head by tracing the surface, from which a "local sphere" model of the head was generated. To coregister the dipole locations on the MRI image, the fiduciary points serve as the basis for a common coordinate system. By superimposing these fiducial landmarks on MR images of the subject, it was possible to define the position of the computed point sources with an accuracy of 5 mm.
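Fitting the "local sphere" to the traced head surface is a standard linear least-squares problem. A sketch (ours, not the vendor's algorithm): writing |p - c|^2 = r^2 as 2 p.c + d = |p|^2, with d = r^2 - |c|^2, makes the center c and the auxiliary d linear unknowns.

    import numpy as np

    def fit_local_sphere(points):
        # points: (n, 3) digitized head-surface coordinates.
        A = np.hstack([2.0 * points, np.ones((len(points), 1))])
        b = (points ** 2).sum(axis=1)
        sol, *_ = np.linalg.lstsq(A, b, rcond=None)
        center, d = sol[:3], sol[3]
        radius = np.sqrt(d + center @ center)
        return center, radius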

Multiple Dipole Localization Analysis

Multiple dipole localization analyses of spatio-temporal evoked magnetic fields were also performed using Multiple Signal Classification (MUSIC) (Sekihara, Poeppel, Marantz, Koizumi, & Miyashita, 1997; Mosher, Lewis, & Leahy, 1992) and beamformer algorithms (Mosher et al., 1992; Sekihara, Nagarajan, Poeppel, Marantz, & Miyashita, 2001; Robinson & Vrba, 1999). MUSIC methods are based on estimation of a signal "subspace" from the entire spatio-temporal MEG data using singular-value decomposition (SVD). A version of the MUSIC algorithm, referred to as the "conventional" MUSIC algorithm, was implemented in MATLAB under the assumption that the sources contributing to the MEG data arose from multiple dipoles (<37 in number) which were stationary and were located within a spherical volume of uniform conductivity (Sekihara et al., 1997). The locations of dipoles are typically determined by conducting a search over a three-dimensional grid of interest within the head. Given the sensor positions and the coordinates of the origin of a "local sphere" approximation of the head shape for each subject, a lead-field matrix was computed for each point in this 3-D grid. At each point in this grid, from the lead-field matrix and the covariance matrix of the spatio-temporal MEG data, the value of a MUSIC localizer function can then be computed (Sekihara et al., 1997, Equation (4)). Maxima of this localizer function then corresponded to the locations of dipolar sources. We used the MUSIC algorithm to compute the localizer functions over a three-dimensional grid in the hemisphere contralateral to the stimulus delivery. The localizer was computed using the evoked magnetic field responses over a time window of 1000 msec following the onset of the sentences. For each subject, at each point in a 3-D grid (-4 < x < 6, 0 < y < 8, 3 < z < 11) in the left hemisphere, the localizer function was computed using the averaged evoked auditory magnetic field responses.
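The scan itself reduces to a projection onto the signal subspace at each grid point. The sketch below is a simplified, fixed-orientation version of the conventional MUSIC scan; the cited method uses vector lead fields and a noise-covariance correction, both omitted here.

    import numpy as np

    def music_scan(data, lead_fields, n_sources):
        # data: (n_channels, n_samples) spatio-temporal MEG data.
        # lead_fields: list of (grid_point, lead-field vector) pairs,
        # each vector of length n_channels.
        U, _, _ = np.linalg.svd(data, full_matrices=False)
        Es = U[:, :n_sources]               # signal-subspace basis
        values = {}
        for point, lf in lead_fields:
            lf = lf / np.linalg.norm(lf)
            proj = ((Es.T @ lf) ** 2).sum() # fraction of lf in the subspace
            values[point] = 1.0 / (1.0 - proj + 1e-12)
        return values                       # maxima mark dipole locations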

Acknowledgments

This work was funded in part by Scientific Learning Co. and a grant from the Whitaker Foundation.

Reprint requests should be sent to John F. Houde, W. M. Keck Center for Integrative Neuroscience, University of California, San Francisco, 513 Parnassus Avenue, S-762, San Francisco, CA 94143, USA, or via e-mail: houde@phy.ucsf.edu.

Note

1. In a recent study of M100 responses to tones, it was found that near detection thresholds a 15-dB change in SPL caused a 34% change in M100 RMS amplitude (Stufflebeam et al., 1998).

REFERENCES

Ahissar, E., Nagarajan, S., Ahissar, M., Protopapas, A., Mahncke, H., & Merzenich, M. M. (2001). Speech comprehension is correlated with temporal response patterns recorded from auditory cortex. Proceedings of the National Academy of Sciences, U.S.A., 98, 13367-13372.

Blakemore, S. J., Frith, C. D., & Wolpert, D. M. (1999). Spatio-temporal prediction modulates the perception of self-produced stimuli. Journal of Cognitive Neuroscience, 11, 551-559.

Blakemore, S. J., Wolpert, D. M., & Frith, C. D. (1999). The cerebellum contributes to somatosensory cortical activity during self-produced tactile stimulation. Neuroimage, 10, 448-459.

Burnett, T. A., Freedland, M. B., Larson, C. R., & Hain, T. C. (1998). Voice F0 responses to manipulations in pitch feedback. Journal of the Acoustical Society of America, 103, 3153-3161.

Cooper, W. E. (1979). Speech perception and production: Studies in selective adaptation. Norwood, NJ: Ablex.

Creutzfeldt, O., & Ojemann, G. (1989). Neuronal activity in the human lateral temporal lobe: III. Activity changes during music. Experimental Brain Research, 77, 490-498.

Creutzfeldt, O., Ojemann, G., & Lettich, E. (1989a). Neuronal activity in the human lateral temporal lobe: I. Responses to speech. Experimental Brain Research, 77, 451-475.

Creutzfeldt, O., Ojemann, G., & Lettich, E. (1989b). Neuronal activity in the human lateral temporal lobe: II. Responses to the subject's own voice. Experimental Brain Research, 77, 476-489.

Curio, G., Neuloh, G., Numminen, J., Jousmaki, V., & Hari, R. (2000). Speaking modifies voice-evoked activity in the human auditory cortex. Human Brain Mapping, 9, 183-191.

Demonet, J. F., Chollet, F., Ramsay, S., Cardebat, D., Nespoulous, J. L., Wise, R., Rascol, A., & Frackowiak, R. (1992). The anatomy of phonological and semantic processing in normal subjects. Brain, 115, 1753-1768.

Diesch, E., & Luce, T. (1997). Magnetic fields elicited by tones and vowel formants reveal tonotopy and nonlinear summation of cortical activation. Psychophysiology, 34, 501-510.

Elman, J. L. (1981). Effects of frequency-shifted feedback on the pitch of vocal productions. Journal of the Acoustical Society of America, 70, 45-50.

Franklin, G. F., Powell, J. D., & Emami-Naeini, A. (1991). Feedback control of dynamic systems (2nd ed.). Reading, MA: Addison-Wesley.

Gracco, V. L., Ross, D., Kalinowski, J., & Stuart, A. (1994). Articulatory changes following spectral and temporal modifications in auditory feedback. Journal of the Acoustical Society of America, 95, 2821.

Gunji, A., Hoshiyama, M., & Kakigi, R. (2000). Identification of auditory evoked potentials of one's own voice. Clinical Neurophysiology, 111, 214-219.

Gunji, A., Kakigi, R., & Hoshiyama, M. (2000). Spatiotemporal source analysis of vocalization-associated magnetic fields. Cognitive Brain Research, 9, 157-163.

Hari, R., Aittoniemi, K., Jarvinen, M. L., Katila, T., & Varpula, T. (1980). Auditory evoked transient and sustained magnetic fields of the human brain: Localization of neural generators. Experimental Brain Research, 40, 237-240.

Hari, R., & Makela, J. P. (1988). Modification of neuromagnetic responses of the human auditory cortex by masking sounds. Experimental Brain Research, 71, 87-92.

Hein, A. V., & Held, R. (1962). A neural model for labile sensorimotor coordination. In E. Bernard & M. Hare (Eds.), Biological prototypes and synthetic systems (vol. 1, pp. 71-74). New York: Plenum.

Hickok, G., & Poeppel, D. (2000). Towards a functional neuroanatomy of speech perception. Trends in Cognitive Sciences, 4, 131-138.

Hirano, S., Kojima, H., Naito, Y., Honjo, I., Kamoto, Y., Okazawa, H., Ishizu, K., Yonekura, Y., Nagahama, Y., Fukuyama, H., & Konishi, J. (1996). Cortical speech processing mechanisms while vocalizing visually presented languages. NeuroReport, 8, 363-367.

Hirano, S., Kojima, H., Naito, Y., Honjo, I., Kamoto, Y., Okazawa, H., Ishizu, K., Yonekura, Y., Nagahama, Y., Fukuyama, H., & Konishi, J. (1997). Cortical processing mechanism for vocalization with auditory verbal feedback. NeuroReport, 8, 2379-2382.

Hirano, S., Naito, Y., Okazawa, H., Kojima, H., Honjo, I., Ishizu, K., Yonekura, Y., Nagahama, Y., Fukuyama, H., & Konishi, J. (1997). Cortical activation by monaural speech sound stimulation demonstrated by positron emission tomography. Experimental Brain Research, 113, 75-80.

Houde, J. F., & Jordan, M. I. (1998). Sensorimotor adaptation in speech production. Science, 279, 1213-1216.

Jacobs, O. L. R. (1993). Introduction to control theory (2nd ed.). Oxford, UK: Oxford University Press.

Jeannerod, M. (1988). The neural and behavioural organisation of goal-directed movements. Oxford, UK: Oxford University Press.

Jurgens, U. (2002). Neural pathways underlying vocal control. Neuroscience and Biobehavioral Reviews, 26, 235-258.

Kawahara, H. (1993). Transformed auditory feedback: Effects of fundamental frequency perturbation. Journal of the Acoustical Society of America, 94, 1883.

Kuriki, S., Mori, T., & Hirata, Y. (1999). Motor planning center for speech articulation in the normal human brain. NeuroReport, 10, 765-769.

Kuriki, S., & Murase, M. (1989). Neuromagnetic study of the auditory responses in right and left hemispheres of the human brain evoked by pure tones and speech sounds. Experimental Brain Research, 77, 127-134.

Lane, H., & Tranel, B. (1971). The Lombard sign and the role of hearing in speech. Journal of Speech and Hearing Research, 14, 677-709.

Lee, B. S. (1950). Some effects of side-tone delay. Journal of the Acoustical Society of America, 22, 639-640.

Leonardo, A., & Konishi, M. (1999). Decrystallization of adult birdsong by perturbation of auditory feedback. Nature, 399, 466-470.

Levelt, W. J. (1983). Monitoring and self-repair in speech. Cognition, 14, 41-104.

Levelt, W. J. M. (1989). Speaking: From intention to articulation. Cambridge: MIT Press.

Liegeois-Chauvel, C., Musolino, A., Badier, J. M., Marquis, P., & Chauvel, P. (1994). Evoked potentials recorded from the auditory cortex in man: Evaluation and topography of the middle latency components. Electroencephalography and Clinical Neurophysiology, 92, 204-214.

Mahncke, H. (1998). Oscillations and temporal processing in sensory cortex. Unpublished doctoral dissertation, University of California, San Francisco.

McCasland, J. S., & Konishi, M. (1981). Interaction between auditory and motor activities in an avian song control nucleus. Proceedings of the National Academy of Sciences, U.S.A., 78, 7815-7819.

McGuire, P. K., Silbersweig, D. A., & Frith, C. D. (1996). Functional neuroanatomy of verbal self-monitoring. Brain, 119, 907-917.

Mosher, J. C., Lewis, P. S., & Leahy, R. M. (1992). Multiple dipole modeling and localization from spatio-temporal MEG data. IEEE Transactions on Biomedical Engineering, 39, 541-557.

Muller-Preuss, P., & Ploog, D. (1981). Inhibition of auditory cortical neurons during phonation. Brain Research, 215, 61-76.

Numminen, J., & Curio, G. (1999). Differential effects of overt, covert and replayed speech on vowel-evoked responses of the human auditory cortex. Neuroscience Letters, 272, 29-32.

Numminen, J., Salmelin, R., & Hari, R. (1999). Subject's own speech reduces reactivity of the human auditory cortex. Neuroscience Letters, 265, 119-122.

Papanicolaou, A. C., Raz, N., Loring, D. W., & Eisenberg, H. M. (1986). Brain stem evoked response suppression during speech production. Brain and Language, 27, 50-55.

Perkell, J. (1997). Articulatory processes. In W. J. Hardcastle & J. Laver (Eds.), The handbook of phonetic sciences (pp. 333-370). Oxford, UK: Blackwell Publishers.

Picton, T. W., Alain, C., Woods, D. L., John, M. S., Scherg, M., Valdes-Sosa, P., Bosch-Bayard, J., & Trujillo, N. J. (1999). Intracerebral sources of human auditory-evoked potentials. Audiology and Neuro-otology, 4, 64-79.

Poeppel, D., Yellin, E., Phillips, C., Roberts, T. P., Rowley, H. A., Wexler, K., & Marantz, A. (1996). Task-induced asymmetry of the auditory evoked M100 neuromagnetic field elicited by speech sounds. Brain Research: Cognitive Brain Research, 4, 231-242.

Reite, M., Adams, M., Simon, J., Teale, P., Sheeder, J., Richardson, D., & Grabbe, R. (1994). Auditory M100 component: 1. Relationship to Heschl's gyri. Brain Research: Cognitive Brain Research, 2, 13-20.

Richer, F., Alain, C., Achim, A., Bouvier, G., & Saint-Hilaire, J. M. (1989). Intracerebral amplitude distributions of the auditory evoked potential. Electroencephalography and Clinical Neurophysiology, 74, 202-208.

Robinson, S., & Vrba, J. (1999). Functional neuroimaging by synthetic aperture magnetometry (SAM). In Yoshimoto et al. (Eds.), Recent advances in biomagnetism (pp. 302-305). Sendai, Japan: Tohoku University Press.

Scherg, M., & Von Cramon, D. (1985). Two bilateral sources of the late AEP as identified by a spatio-temporal dipole model. Electroencephalography and Clinical Neurophysiology, 62, 32-44.

Scherg, M., & Von Cramon, D. (1986). Evoked dipole source potentials of the human auditory cortex. Electroencephalography and Clinical Neurophysiology, 65, 344-360.

Schmidt, M. F., & Konishi, M. (1998). Gating of auditory responses in the vocal control system of awake songbirds. Nature Neuroscience, 1, 513-518.

Sekihara, K., Nagarajan, S. S., Poeppel, D., Marantz, A., & Miyashita, Y. (2001). Reconstructing spatio-temporal activities of neural sources using an MEG vector beamformer technique. IEEE Transactions on Biomedical Engineering, 48, 760-771.

Sekihara, K., Poeppel, D., Marantz, A., Koizumi, H., & Miyashita, Y. (1997). Noise covariance incorporated MEG-MUSIC algorithm: A method for multiple-dipole estimation tolerant of the influence of background brain activity. IEEE Transactions on Biomedical Engineering, 44, 839-847.

Stevens, K. N. (1999). Acoustic phonetics. Cambridge: MIT Press.

Stufflebeam, S. M., Poeppel, D., Rowley, H. A., & Roberts, T. P. (1998). Peri-threshold encoding of stimulus frequency and intensity in the M100 latency. NeuroReport, 9, 91-94.

Suga, N., & Schlegel, P. (1972). Neural attenuation of responses to emitted sounds in echolocating bats. Science, 177, 82-84.

Suga, N., & Shimozawa, T. (1974). Site of neural attenuation of responses to self-vocalized sounds in echolocating bats. Science, 183, 1211-1213.

von Bekesy, G. (1949). The structure of the middle ear and the hearing of one's own voice by bone conduction. Journal of the Acoustical Society of America, 21, 217-232.

Weiskrantz, L., Elliott, J., & Darlington, C. (1971). Preliminary observations on tickling oneself. Nature, 230, 598-599.

Woods, D. L., Knight, R. T., & Neville, H. J. (1984). Bitemporal lesions dissociate auditory evoked potentials and perception. Electroencephalography and Clinical Neurophysiology, 57, 208-220.

Yates, A. J. (1963). Delayed auditory feedback. Psychological Bulletin, 60, 213-232.

Zatorre, R. J., Evans, A. C., Meyer, E., & Gjedde, A. (1992). Lateralization of phonetic and pitch discrimination in speech processing. Science, 256, 846-849.

1138 Journal of Cognitive Neuroscience Volume 14 Number 8

signal provided by the cochlea requires significant signalprocessing which in turn suggests the possibility ofsignificant delay if a succession of auditory brain areasare needed for this processing (Jurgens 2002 BurnettFreedland Larson amp Hain 1998 Perkell 1997) Inaddition there will be delays associated with the gen-eration of a response in the motor cortex There will alsobe intrinsic delays in transmitting the neural signals tothe appropriate muscles as well as delays in muscleresponses to these signals (Jurgens 2002) Togetherthese factors could contribute to a significant feedbackloop delay between auditory input and motor output

Unfortunately we know from control theory that ingeneral using feedback to control a system with a largefeedback loop delay will not work because the systembecomes unstable (Franklin Powell amp Emami-Naeini1991) Intuitively this instability arises because the delaymakes the feedback uncorrelated with the state of thesystem sensory feedback from some time in the pastdoes not entirely reflect the current state of the systemYet auditory feedback does appear to be involved in theonline control of some aspects of speech production Asnoted in the Introduction if the pitch of a speakerrsquosfeedback is perturbed they will typically compensatewithin 100 ndash200 msec (Burnett et al 1998 Kawahara1993 Elman 1981) Thus in spite of a large (100ndash200 msec) feedback loop delay auditory feedback doesstably regulate a speakerrsquos pitch How is this possibleAgain from control theory we know that when feedbackis delayed corrupted by noise or otherwise madeuncorrelated with current motor output a successfulapproach to using feedback control is to attenuate thesensory feedback to the degree that it is uncorrelatedSpecifically in feedback control based on Kalman filter-ing the Kalman gain on sensory feedback is set in partto be inversely proportional to the degree to whichsensory feedback is correlated with the current systemstate (Jacobs 1993) In an analogous way it may be thatthe suppression of the auditory cortex during speakingfunctions to attenuate the auditory feedback to a levelcommensurate with its delay such that it can be usedproperly in the control of speech

Finally it is possible that both of the above accountsare true It may be that the suppression of self-producedspeech acts as a filter for incoming sensory data allow-ing attention to be paid to externally produced sensa-tions while at the same time properly attenuating theauditory feedback for use in the online control ofspeech Future experiments must be designed to distin-guish between the above hypothesized possibilities

METHODS

Subjects

Eight male volunteers (ages 25ndash40) participated inExperiments 1 and 3 of this study Of these volun-

teers six also participated in Experiment 2 All subjectsgave their informed consent All studies were per-formed with the approval of the UCSF Committee forHuman Research

Experimental Design

Magnetic fields were recorded from both hemispheresin a shielded room using two 37-channel biomagnetom-eters with SQUID-based first-order gradiometer sensors(Magnes II 4-D Neuroimaging San Diego CA) A sche-matic of the experimental setup is shown in Figure 1A directional microphone was placed in the chamber at adistance where it did not distort the recorded magneticfields Triggering of the data acquisition system and ofstimuli delivered to the ears was controlled and itdiffered for each of three experiments in the study

Fiduciary points were marked on the skin for latercoregistration with structural magnetic resonanceimages and the head shape was digitized to constrainsubsequent source modeling The sensor was initiallypositioned over the estimated location of the auditorycortices in both hemispheres such that a dipolar M100response was evoked by single 400-msec duration tonepips (1 kHz 5 msec risefall ramps 90 dB SPL) Dataacquisition epochs were 600 msec in total duration witha 100-msec prestimulus period referenced to the onsetof the first speech stimulus Data were acquired at asampling rate of 1041 Hz The position of the sensorwas then refined so that a single dipole localizationmodel of the averaged evoked magnetic field responsesto 100 tones resulted in a correlation and goodness-of-fitgreater than 95

With satisfactory sensor positioning over the auditorycortices in both hemispheres but before starting theexperiments subjects practiced producing the neutralvowel sound by phonating while relaxing their uppervocal tract generating no movement of their jaw ortongue Preliminary experiments revealed that this formof speech production produced no movement artifactsin these magnetic field recordings

During the speaking condition of Experiments 1 and 3the audio onset of each utterance picked up the micro-phone triggered the MEG data acquisition system whichrecorded 1 sec of data (300 msec pretrigger 700 msecposttrigger) at a sampling rate of 1041 Hz Subjectswere instructed to produce short vowel sounds byphonating their vocal chords without any movement oftheir jaw or tongue Responses to 100 such utteranceswere recorded and these magnetic field recordings werethen averaged to obtain the evoked magnetic fieldresponses to self-produced speech for each MEG detec-tor channel In the subsequent tape playback conditionthe audio signal from the tape recorder was fed to thesubjects earphones and the trigger input for MEG dataacquisition Again magnetic field recordings wereaveraged to obtain the evoked magnetic field responses

1134 Journal of Cognitive Neuroscience Volume 14 Number 8

During the tape playback condition of Experiments 1and 3 the audio level of the signal fed to the earphonesand trigger was adjusted to be the same as that of thespeaking condition Because of this the trigger of MEGdata acquisition (which was based on an audio voltagethreshold) occurred at the same time relative to utter-ance onset in both conditions The equal audio levelsensured that triggering latency differences could notaccount for any differences between the evoked re-sponses to self-produced and tape-recorded speech InExperiment 1 the subject heard these equal audio levelsin the earphones which meant that his side tone ampli-tude was the same in both the speaking and tape play-back conditions In Experiment 3 the audio signal foreach utterance was used to trigger both the MEG dataacquisition system and a white noise generator (20 Hzndash20 kHz bandwidth 85 dB SPL) In this case a sum of theiraudio signal and noise was presented to the subject

In Experiment 2 onset of the tone pips triggered MEGdata acquisition in each of the three conditions (tonesalone tones and speaking and tones and tape playback)These 1-kHz tones each had a 400-msec duration 5-mseconset and offset ramps and a 08ndash12-sec ISI betweentones The subject heard the tones at 90 dB SPL As withExperiment 1 data acquisition epochs were 10 sec intotal duration with a 300-msec prestimulus period Datawere acquired at a sampling rate of 1041 Hz In eachcondition the evoked magnetic field responses to thetone pips were obtained by averaging 100 epochs

Epochs 1500 msec long with a 500-msec prestimulusperiod were collected at a sampling rate of 1041 Hz Theraw epoch magnetic field data from these 100 utteranceswere then averaged to obtain the evoked magnetic fieldresponses to gated noise during self-production

The voltage level of the audio signal fed to earphoneand trigger signal in this condition was adjusted to beidentical to the voltage levels recorded during the speak-ing condition ensuring that triggering differences couldnot account for any differences between the evokedresponses to noise in the self-produced and tape play-back blocks

Data Analysis

Data from the two sensor arrays positioned over eachhemisphere were separately analyzed From the evokedmagnetic field response for each condition in eachchannel the RMS value of evoked magnetic field strengthaveraged across channels was computed The RMS meth-od of averaging (square root of the mean of squaredchannel field strengths) was used to avoid the cancella-tion that occurs when the magnetic fields from detectorson opposite sides of the current dipole (whose fields areroughly equal in magnitude but opposite in sign) areaveraged together This process resulted in one RMStime waveform for each hemisphere for each conditionfor each subject For statistical comparisons across con-

ditions a one-way analysis of variance was performedwith condition as factor and time as a repeated measure

The RMS across channels is a convenient measure ofactivity that is independent of the position of individualsensors Following optimal placement of the sensor for asignificant M100 dipolar response RMS has been shownto correlate well with the lsquolsquoequivalent current-dipolestrengthrsquorsquo (also called the lsquolsquoQ valuersquorsquo) of a dipole posi-tioned in and around the auditory cortex The RMSmeasure is also correlated with the data obtained from(1) the channel with the biggest signal-to-noise ratioand (2) the first two principal components of the arrayresponse (Ahissar et al 2001 Mahncke 1998)

In sensor arrays comprising of planar gradiometers(like the Neuromag 122 or 306) the channel with largestsignal-to-noise ratio is often closest to the source (asdictated by the physics) However in the case of sensorarrays comprising of axial gradiometers (as in ourMagnes II system) the channel with the largest signal-to-noise ratio reflects the peak of the magnetic fieldwhich is not the closest channel to a source Moreoverthe lsquolsquobestrsquorsquo channel could reflect either a peak lsquolsquoinwardrsquorsquoor lsquolsquooutwardrsquorsquo field requiring signed inversion whileaveraging such data across subjects Finally since thesensor array position can varying considerably acrosssubjects it is difficult to use procedures typical for EEGand whole-head studies wherein single channel data isoften picked for analysis across subjects These consid-erations have led us to use RMS instead of pickingspecific channels for our analysis

Single Dipole Model Fit

A single equivalent current-dipole model was calculatedseparately for each hemisphere for each poststimulustime point using standard 4-D Neuroimaging softwareoperating on data that had been filtered between 1 and20 Hz The localization algorithm used an iterative least-squares minimization to compute the strength andlocation of a single dipole in a spherical volume ofuniform conductivity that can account for the sensordata Dipole fits were accepted based on a local corre-lation maximum criteria of 95 and goodness-of-fit valuesgreater than 95

The MEG anatomical reference frame was establishedusing a digital device called the lsquolsquosensor position indica-torrsquorsquo A set of three table-mounted and one array-mounted receivers triangulates the signal from a stylustransmitter positioned at fiducial reference points on thesubjectrsquos head surface (typically nasion left right pre-auricular points Cz and inion) The same stylus trans-mitter arrangement served to define the curvature of thehead by tracing the surface from which a lsquolsquolocal spherersquorsquomodel of the head was generated To coregister thedipole locations on the MRI image the fiduciary pointsserve as the basis for a common coordinate system Bysuperimposing these fiducial landmarks on MR images

Houde et al 1135

of the subject it was possible to define the position ofthe computed point sources with an accuracy of 5 mm

Multiple Dipole Localization Analysis

Multiple dipole localization analyses of spatio-temporalevoked magnetic fields was also performed using Multi-ple Signal Classification (MUSIC) (Sekihara PoeppelMarantz Koizumi amp Miyashita 1997 Mosher Lewisamp Leahy 1992) and beamformer algorithms (Mosheret al 1992 Sekihara Nagarajan Poeppel Marantz ampMiyashita 2001 Robinson amp Vrba 1999) MUSIC meth-ods are based on estimation of a signal lsquolsquosubspacersquorsquofrom entire spatio-temporal MEG data using singular-value decomposition (SVD) A version of the MUSICalgorithm referred to as the lsquolsquoconventionalrsquorsquo MUSICalgorithm was implemented in MATLAB under theassumption that the sources contributing to the MEGdata arose from multiple dipoles (lt37 in number)which were stationary and were located within a spher-ical volume of uniform conductivity (Sekihara et al1997) The locations of dipoles are typically determinedby conducting a search over a three-dimensional grid ofinterest within the head Given the sensor positionsand the coordinates of the origin of a lsquolsquolocal spherersquorsquoapproximation of the head shape for each subject alead-field matrix was computed for each point in this3-D grid At each point in this grid from the lead-fieldmatrix and the covariance matrix of the spatio-temporalMEG data the value of a MUSIC localizer function canthen be computed (Sekihara et al 1997 Equation (4))Maxima of this localizer function then corresponded tothe location of dipolar sources We used the MUSICalgorithm to compute the localizer functions over athree-dimensional grid in the hemisphere contralateralto the stimulus delivery The localizer was computedusing the evoked magnetic field responses over a timewindow of 1000 msec following the onset of thesentences For each subject at each point in a 3-D grid(iexcl4 lt x lt 6 0 lt y lt 8 3 lt z lt 11) in the lefthemisphere the localizer function was computed usingthe averaged evoked auditory magnetic field responses

Acknowledgments

This work was funded in part by Scientific Learning Co and agrant from the Whitaker Foundation

Reprint requests should be sent to John F Houde W M KeckCenter for Integrative Neuroscience University of CaliforniaSan Francisco 513 Parnassus Avenue S-762 San Francisco CA94143 USA or via e-mail houdephyucsfedu

Note

1 In a recent study of M100 responses to tones it was foundthat near detection thresholds a 15-dB change in SPL caused a34 change in M100 RMS amplitude (Stufflebeam et al 1998)

REFERENCES

Ahissar E Nagarajan S Ahissar M Protopapas A MahnckeH amp Merzenich M M (2001) Speech comprehension iscorrelated with temporal response patterns recorded fromauditory cortex Proceedings of the National Academy ofSciences USA 98 13367 ndash13372

Blakemore S J Frith C D amp Wolpert D M (1999)Spatio-temporal prediction modulates the perception ofself-produced stimuli Journal of Cognitive Neuroscience11 551 ndash559

Blakemore S J Wolpert D M amp Frith C D (1999) Thecerebellum contributes to somatosensory cortical activityduring self-produced tactile stimulation Neuroimage 10448 ndash459

Burnett T A Freedland M B Larson C R amp Hain T C(1998) Voice f0 responses to manipulations in pitchfeedback Journal of the Acoustical Society of America103 3153ndash3161

Cooper W E (1979) Speech perception and productionStudies in selective adaptation Norwood NJ Ablex

Creutzfeldt O amp Ojemann G (1989) Neuronal activityin the human lateral temporal lobe III Activity changesduring music Experimental Brain Research 77490 ndash498

Creutzfeldt O Ojemann G amp Lettich E (1989a) Neuronalactivity in the human lateral temporal lobe I Responses tospeech Experimental Brain Research 77 451 ndash475

Creutzfeldt O Ojemann G amp Lettich E (1989b) Neuronalactivity in the human lateral temporal lobe II Responses tothe subjects own voice Experimental Brain Research 77476 ndash489

Curio G Neuloh G Numminen J Jousmaki V amp Hari R(2000) Speaking modifies voice-evoked activity in thehuman auditory cortex Human Brain Mapping 9183 ndash191

Demonet J F Chollet F Ramsay S Cardebat DNespoulous J L Wise R Rascol A amp Frackowiak R(1992) The anatomy of phonological and semanticprocessing in normal subjects Brain 115 1753 ndash1768

Diesch E amp Luce T (1997) Magnetic fields elicited bytones and vowel formants reveal tonotopy and nonlinearsummation of cortical activation Psychophysiology 34501 ndash510

Elman J L (1981) Effects of frequency-shifted feedback onthe pitch of vocal productions Journal of the AcousticalSociety of America 70 45ndash50

Franklin G F Powell J D amp Emami-Naeini A (1991)Feedback control of dynamic systems (2nd ed) ReadingMA Addison-Wesley

Gracco V L Ross D Kalinowski J amp Stuart A (1994)Articulatory changes following spectral and temporalmodifications in auditory feedback Journal of theAcoustical Society of America 95 2821

Gunji A Hoshiyama M amp Kakigi R (2000) Identificationof auditory evoked potentials of onersquos own voice ClinicalNeurophysiology 111 214 ndash219

Gunji A Kakigi R amp Hoshiyama M (2000) Spatiotemporalsource analysis of vocalization-associated magnetic fieldsCognitive Brain Research 9 157 ndash163

Hari R Aittoniemi K Jarvinen M L Katila T amp Varpula T(1980) Auditory evoked transient and sustained magneticfields of the human brain Localization of neural generatorsExperimental Brain Research 40 237 ndash240

Hari R amp Makela J P (1988) Modification of neuromagneticresponses of the human auditory cortex by masking soundsExperimental Brain Research 71 87ndash92

Hein A V amp Held R (1962) A neural model for labile

1136 Journal of Cognitive Neuroscience Volume 14 Number 8

sensorimotor coordination In E Bernard amp M Hare(Eds) Biological prototypes and synthetic systems(vol 1 pp 71ndash74) New York Plenum

Hickok G amp Poeppel D (2000) Towards a functionalneuroanatomy of speech perception Trends in CognitiveSciences 4 131 ndash138

Hirano S Kojima H Naito Y Honjo I Kamoto YOkazawa H Ishizu K Yonekura Y Nagahama YFukuyama H amp Konishi J (1996) Cortical speechprocessing mechanisms while vocalizing visually presentedlanguages NeuroReport 8 363ndash367

Hirano S Kojima H Naito Y Honjo I Kamoto YOkazawa H Ishizu K Yonekura Y Nagahama YFukuyama H amp Konishi J (1997) Cortical processingmechanism for vocalization with auditory verbal feedbackNeuroReport 8 2379ndash2382

Hirano S Naito Y Okazawa H Kojima H Honjo I IshizuK Yenokura Y Nagahama Y Fukuyama H amp Konishi J(1997) Cortical activation by monaural speech soundstimulation demonstrated by positron emission tomographyExperimental Brain Research 113 75ndash80

Houde J F amp Jordan M I (1998) Sensorimotor adaptationin speech production Science 279 1213 ndash1216

Jacobs O L R (1993) Introduction to control theory(2nd ed) Oxford UK Oxford University Press

Jeannerod M (1988) The neural and behaviouralorganisation of goal-directed movements Oxford UKOxford University Press

Jurgens U (2002) Neural pathways underlying vocalcontrol Neuroscience and Biobehavioral Reviews 26235 ndash258

Kawahara H (1993) Transformed auditory feedback Effectsof fundamental frequency perturbation Journal of theAcoustical Society of America 94 1883

Kuriki S Mori T amp Hirata Y (1999) Motor planningcenter for speech articulation in the normal human brainNeuroReport 10 765 ndash769

Kuriki S amp Murase M (1989) Neuromagnetic study of theauditory responses in right and left hemispheres of thehuman brain evoked by pure tones and speech soundsExperimental Brain Research 77 127 ndash134

Lane H amp Tranel B (1971) The lombard sign and the roleof hearing in speech Journal of Speech and HearingResearch 14 677ndash709

Lee B S (1950) Some effects of side-tone delay Journal ofthe Acoustical Society of America 22 639 ndash640

Leonardo A amp Konishi M (1999) Decrystallization of adultbirdsong by perturbation of auditory feedback Nature 399466 ndash470

Levelt W J (1983) Monitoring and self-repair in speechCognition 14 41ndash104

Levelt W J M (1989) Speaking From intention toarticulation Cambridge MIT Press

Liegeois-Chauvel C Musolino A Badier J M Marquis P ampChauvel P (1994) Evoked potentials recorded from theauditory cortex in man Evaluation and topography of themiddle latency components Electroencephalography andClinical Neurophysiology 92 204 ndash214

Mahncke H (1998) Oscillations and temporal processingin sensory cortex Unpublished doctoral dissertationUniversity of California San Francisco

McCasland J S amp Konishi M (1981) Interaction betweenauditory and motor activities in an avian song controlnucleus Proceedings of the National Academy of SciencesUSA 78 7815ndash7819

McGuire P K Silbersweig D A amp Frith C D (1996)Functional neuroanatomy of verbal self-monitoring Brain119 907 ndash917

Mosher J C Lewis P S amp Leahy R M (1992) Multipledipole modeling and localization from spatio-temporal megdata IEEE Transactions on Biomedical Engineering 39541 ndash557

Muller-Preuss P amp Ploog D (1981) Inhibition of auditorycortical neurons during phonation Brain Research 21561ndash76

Numminen J amp Curio G (1999) Differential effects of overtcovert and replayed speech on vowel-evoked responsesof the human auditory cortex Neuroscience Letters 27229ndash32

Numminen J Salmelin R amp Hari R (1999) Subjectrsquos ownspeech reduces reactivity of the human auditory cortexNeuroscience Letters 265 119 ndash122

Papanicolaou A C Raz N Loring D W amp EisenbergH M (1986) Brain stem evoked response suppressionduring speech production Brain and Language 2750ndash55

Perkell J (1997) Articulatory processes In W J Hardcastleamp J Laver (Eds) The handbook of phonetic sciences(pp 333 ndash370) Oxford UK Blackwell Publishers

Picton T W Alain C Woods D L John M S Scherg MValdes-Sosa P Bosch-Bayard J amp Trujillo N J (1999)Intracerebral sources of human auditory-evoked potentialsAudiology and Neuro-otology 4 64ndash79

Poeppel D Yellin E Phillips C Roberts T P Rowley H AWexler K amp Marantz A (1996) Task-induced asymmetry ofthe auditory evoked m100 neuromagnetic field elicited byspeech sounds Brain Research Cognitive Brain Research4 231 ndash242

Reite M Adams M Simon J Teale P Sheeder JRichardson D amp Grabbe R (1994) Auditory m100component 1 Relationship to heschlrsquos gyri BrainResearch Cognitive Brain Research 2 13ndash20

Richer F Alain C Achim A Bouvier G amp Saint-Hilaire J M(1989) Intracerebral amplitude distributions of the auditoryevoked potential Electroencephalography and ClinicalNeurophysiology 74 202 ndash208

Robinson S amp Vrba J (1999) Functional neuroimaging bysynthetic aperture magnetometry (SAM) In Yoshimoto et al(Eds) Recent advances in biomagnetism (pp 302 ndash305)Sendai Japan Tokyo University Press

Scherg M amp Von Cramon D (1985) Two bilateral sources ofthe late aep as identified by a spatio-temporal dipole modelElectroencephalography and Clinical Neurophysiology 6232ndash44

Scherg M amp Von Cramon D (1986) Evoked dipolesource potentials of the human auditory cortexElectroencephalography and Clinical Neurophysiology65 344 ndash360

Schmidt M F amp Konishi M (1998) Gating of auditoryresponses in the vocal control system of awake songbirdsNature Neuroscience 1 513ndash518

Sekihara K Nagarajan S S Poeppel D Marantz Aamp Miyashita Y (2001) Reconstructing spatio-temporalactivities of neural sources using an meg vectorbeamformer technique IEEE Transactions on BiomedicalEngineering 48 760ndash771

Sekihara K Poeppel D Marantz A Koizumi H ampMiyashita Y (1997) Noise covariance incorporatedmeg-music algorithm A method for multiple-dipoleestimation tolerant of the influence of background brainactivity IEEE Transactions on Biomedical Engineering44 839 ndash847

Stevens K N (1999) Acoustic phonetics CambridgeMIT Press

Stufflebeam S M Poeppel D Rowley H A amp RobertsT P (1998) Peri-threshold encoding of stimulus

Houde et al 1137

frequency and intensity in the m100 latency NeuroReport9 91ndash94

Suga N amp Schlegel P (1972) Neural attenuation ofresponses to emitted sounds in echolocating rats Science177 82ndash84

Suga N amp Shimozawa T (1974) Site of neural attenuation ofresponses to self-vocalized sounds in echolocating batsScience 183 1211 ndash1213

von Bekesy G (1949) The structure of the middle ear and thehearing of onersquos own voice by bone conduction Journal ofthe Acoustical Society of America 21 217 ndash232

Weiskrantz L Elliott J amp Darlington C (1971) Preliminaryobservations on tickling oneself Nature 230 598 ndash599

Woods D L Knight R T amp Neville H J (1984)Bitemporal lesions dissociate auditory evoked potentialsand perception Electroencephalography and ClinicalNeurophysiology 57 208 ndash220

Yates A J (1963) Delayed auditory feedback PsychologicalBulletin 60 213 ndash232

Zatorre R J Evans A C Meyer E amp Gjedde A (1992)Lateralization of phonetic and pitch discrimination in speechprocessing Science 256 846 ndash849

1138 Journal of Cognitive Neuroscience Volume 14 Number 8

During the tape playback condition of Experiments 1and 3 the audio level of the signal fed to the earphonesand trigger was adjusted to be the same as that of thespeaking condition Because of this the trigger of MEGdata acquisition (which was based on an audio voltagethreshold) occurred at the same time relative to utter-ance onset in both conditions The equal audio levelsensured that triggering latency differences could notaccount for any differences between the evoked re-sponses to self-produced and tape-recorded speech InExperiment 1 the subject heard these equal audio levelsin the earphones which meant that his side tone ampli-tude was the same in both the speaking and tape play-back conditions In Experiment 3 the audio signal foreach utterance was used to trigger both the MEG dataacquisition system and a white noise generator (20 Hzndash20 kHz bandwidth 85 dB SPL) In this case a sum of theiraudio signal and noise was presented to the subject

In Experiment 2 onset of the tone pips triggered MEGdata acquisition in each of the three conditions (tonesalone tones and speaking and tones and tape playback)These 1-kHz tones each had a 400-msec duration 5-mseconset and offset ramps and a 08ndash12-sec ISI betweentones The subject heard the tones at 90 dB SPL As withExperiment 1 data acquisition epochs were 10 sec intotal duration with a 300-msec prestimulus period Datawere acquired at a sampling rate of 1041 Hz In eachcondition the evoked magnetic field responses to thetone pips were obtained by averaging 100 epochs

Epochs 1500 msec long with a 500-msec prestimulusperiod were collected at a sampling rate of 1041 Hz Theraw epoch magnetic field data from these 100 utteranceswere then averaged to obtain the evoked magnetic fieldresponses to gated noise during self-production

The voltage level of the audio signal fed to earphoneand trigger signal in this condition was adjusted to beidentical to the voltage levels recorded during the speak-ing condition ensuring that triggering differences couldnot account for any differences between the evokedresponses to noise in the self-produced and tape play-back blocks

Data Analysis

Data from the two sensor arrays positioned over eachhemisphere were separately analyzed From the evokedmagnetic field response for each condition in eachchannel the RMS value of evoked magnetic field strengthaveraged across channels was computed The RMS meth-od of averaging (square root of the mean of squaredchannel field strengths) was used to avoid the cancella-tion that occurs when the magnetic fields from detectorson opposite sides of the current dipole (whose fields areroughly equal in magnitude but opposite in sign) areaveraged together This process resulted in one RMStime waveform for each hemisphere for each conditionfor each subject For statistical comparisons across con-

ditions a one-way analysis of variance was performedwith condition as factor and time as a repeated measure

The RMS across channels is a convenient measure ofactivity that is independent of the position of individualsensors Following optimal placement of the sensor for asignificant M100 dipolar response RMS has been shownto correlate well with the lsquolsquoequivalent current-dipolestrengthrsquorsquo (also called the lsquolsquoQ valuersquorsquo) of a dipole posi-tioned in and around the auditory cortex The RMSmeasure is also correlated with the data obtained from(1) the channel with the biggest signal-to-noise ratioand (2) the first two principal components of the arrayresponse (Ahissar et al 2001 Mahncke 1998)

In sensor arrays composed of planar gradiometers (like the Neuromag 122 or 306), the channel with the largest signal-to-noise ratio is often the one closest to the source (as dictated by the physics). However, in sensor arrays composed of axial gradiometers (as in our Magnes II system), the channel with the largest signal-to-noise ratio reflects the peak of the magnetic field, which is not the channel closest to a source. Moreover, the "best" channel could reflect either a peak "inward" or "outward" field, requiring signed inversion when averaging such data across subjects. Finally, since the sensor array position can vary considerably across subjects, it is difficult to use procedures typical of EEG and whole-head studies, wherein single-channel data are often picked for analysis across subjects. These considerations led us to use the RMS instead of picking specific channels for our analysis.

Single Dipole Model Fit

A single equivalent current-dipole model was calculated separately for each hemisphere at each poststimulus time point using standard 4-D Neuroimaging software, operating on data that had been filtered between 1 and 20 Hz. The localization algorithm used an iterative least-squares minimization to compute the strength and location of a single dipole, in a spherical volume of uniform conductivity, that best accounted for the sensor data. Dipole fits were accepted based on a local correlation maximum criterion of 95% and goodness-of-fit values greater than 95%.
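A skeletal version of such a fit is sketched below. This is not the 4-D Neuroimaging implementation: it uses scipy.optimize.least_squares, and for brevity the forward model keeps only the primary-current (Biot-Savart) term, where production software would use the full spherical-conductor solution. The goodness-of-fit formula, however, is the standard one:

```python
import numpy as np
from scipy.optimize import least_squares

MU0_4PI = 1e-7  # mu_0 / (4*pi), SI units

def dipole_field(r0, q, sensors, normals):
    """Field of a current dipole q (A*m) at r0, projected on sensor normals.

    Simplification: primary-current term only; a real MEG fit would use
    the spherical-conductor (Sarvas) forward solution.
    """
    d = sensors - r0                                        # (n_sensors, 3)
    b = MU0_4PI * np.cross(q, d) / (np.linalg.norm(d, axis=1) ** 3)[:, None]
    return np.sum(b * normals, axis=1)

def fit_dipole(b_meas, sensors, normals, x0):
    """Iterative least-squares fit of one dipole (3 position + 3 moment params)."""
    def residual(x):
        return dipole_field(x[:3], x[3:], sensors, normals) - b_meas
    sol = least_squares(residual, x0)
    b_hat = dipole_field(sol.x[:3], sol.x[3:], sensors, normals)
    gof = 1.0 - np.sum((b_meas - b_hat) ** 2) / np.sum(b_meas ** 2)
    return sol.x, gof  # accept the fit only when gof > 0.95
```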

The MEG anatomical reference frame was established using a digital device called the "sensor position indicator." A set of three table-mounted receivers and one array-mounted receiver triangulates the signal from a stylus transmitter positioned at fiducial reference points on the subject's head surface (typically nasion, left and right preauricular points, Cz, and inion). The same stylus transmitter arrangement served to define the curvature of the head by tracing the surface, from which a "local sphere" model of the head was generated. To coregister the dipole locations on the MRI image, the fiduciary points serve as the basis for a common coordinate system. By superimposing these fiducial landmarks on MR images of the subject, it was possible to define the position of the computed point sources with an accuracy of 5 mm.

Multiple Dipole Localization Analysis

Multiple dipole localization analyses of spatio-temporal evoked magnetic fields were also performed using Multiple Signal Classification (MUSIC) (Sekihara, Poeppel, Marantz, Koizumi, & Miyashita, 1997; Mosher, Lewis, & Leahy, 1992) and beamformer algorithms (Mosher et al., 1992; Sekihara, Nagarajan, Poeppel, Marantz, & Miyashita, 2001; Robinson & Vrba, 1999). MUSIC methods are based on estimation of a signal "subspace" from the entire spatio-temporal MEG data using singular-value decomposition (SVD). A version of the MUSIC algorithm, referred to as the "conventional" MUSIC algorithm, was implemented in MATLAB under the assumption that the sources contributing to the MEG data arose from multiple dipoles (<37 in number), which were stationary and located within a spherical volume of uniform conductivity (Sekihara et al., 1997). The locations of the dipoles are typically determined by conducting a search over a three-dimensional grid of interest within the head. Given the sensor positions and the coordinates of the origin of a "local sphere" approximation of the head shape for each subject, a lead-field matrix was computed for each point in this 3-D grid. At each grid point, the value of a MUSIC localizer function can then be computed from the lead-field matrix and the covariance matrix of the spatio-temporal MEG data (Sekihara et al., 1997, Equation 4). Maxima of this localizer function then corresponded to the locations of dipolar sources. We used the MUSIC algorithm to compute the localizer functions over a three-dimensional grid in the hemisphere contralateral to the stimulus delivery. The localizer was computed using the evoked magnetic field responses over a time window of 1000 msec following the onset of the sentences. For each subject, at each point in a 3-D grid (-4 < x < 6, 0 < y < 8, 3 < z < 11) in the left hemisphere, the localizer function was computed using the averaged evoked auditory magnetic field responses.
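For concreteness, the scan described above can be sketched as follows (Python/NumPy rather than the MATLAB of the original implementation; `leadfields` is assumed precomputed from the local-sphere head model, and this plain subspace-correlation localizer omits the noise-covariance weighting of Sekihara et al., 1997):

```python
import numpy as np

def music_localizer(B, leadfields, n_sources):
    """Conventional MUSIC scan over a grid of candidate dipole locations.

    B          : (n_channels, n_times) spatio-temporal evoked field.
    leadfields : (n_grid, n_channels, 3) lead-field matrix per grid point.
    n_sources  : assumed signal-subspace dimension (number of dipoles).

    Returns one localizer value per grid point; maxima mark likely sources.
    """
    U, _, _ = np.linalg.svd(B, full_matrices=False)
    Es = U[:, :n_sources]                  # signal subspace from SVD
    values = np.empty(len(leadfields))
    for i, L in enumerate(leadfields):
        # Largest subspace correlation between span(L) and span(Es):
        Q, _ = np.linalg.qr(L)             # orthonormal basis for L
        s = np.linalg.svd(Es.T @ Q, compute_uv=False)
        values[i] = s[0] ** 2              # approaches 1 at a true source
    return values
```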

Acknowledgments

This work was funded in part by Scientific Learning Co. and a grant from the Whitaker Foundation.

Reprint requests should be sent to John F. Houde, W. M. Keck Center for Integrative Neuroscience, University of California, San Francisco, 513 Parnassus Avenue, S-762, San Francisco, CA 94143, USA, or via e-mail: houde@phy.ucsf.edu.

Note

1. In a recent study of M100 responses to tones, it was found that, near detection thresholds, a 15-dB change in SPL caused a 34% change in M100 RMS amplitude (Stufflebeam et al., 1998).

REFERENCES

Ahissar, E., Nagarajan, S., Ahissar, M., Protopapas, A., Mahncke, H., & Merzenich, M. M. (2001). Speech comprehension is correlated with temporal response patterns recorded from auditory cortex. Proceedings of the National Academy of Sciences, U.S.A., 98, 13367-13372.

Blakemore, S. J., Frith, C. D., & Wolpert, D. M. (1999). Spatio-temporal prediction modulates the perception of self-produced stimuli. Journal of Cognitive Neuroscience, 11, 551-559.

Blakemore, S. J., Wolpert, D. M., & Frith, C. D. (1999). The cerebellum contributes to somatosensory cortical activity during self-produced tactile stimulation. Neuroimage, 10, 448-459.

Burnett, T. A., Freedland, M. B., Larson, C. R., & Hain, T. C. (1998). Voice F0 responses to manipulations in pitch feedback. Journal of the Acoustical Society of America, 103, 3153-3161.

Cooper, W. E. (1979). Speech perception and production: Studies in selective adaptation. Norwood, NJ: Ablex.

Creutzfeldt, O., & Ojemann, G. (1989). Neuronal activity in the human lateral temporal lobe: III. Activity changes during music. Experimental Brain Research, 77, 490-498.

Creutzfeldt, O., Ojemann, G., & Lettich, E. (1989a). Neuronal activity in the human lateral temporal lobe: I. Responses to speech. Experimental Brain Research, 77, 451-475.

Creutzfeldt, O., Ojemann, G., & Lettich, E. (1989b). Neuronal activity in the human lateral temporal lobe: II. Responses to the subject's own voice. Experimental Brain Research, 77, 476-489.

Curio, G., Neuloh, G., Numminen, J., Jousmaki, V., & Hari, R. (2000). Speaking modifies voice-evoked activity in the human auditory cortex. Human Brain Mapping, 9, 183-191.

Demonet, J. F., Chollet, F., Ramsay, S., Cardebat, D., Nespoulous, J. L., Wise, R., Rascol, A., & Frackowiak, R. (1992). The anatomy of phonological and semantic processing in normal subjects. Brain, 115, 1753-1768.

Diesch, E., & Luce, T. (1997). Magnetic fields elicited by tones and vowel formants reveal tonotopy and nonlinear summation of cortical activation. Psychophysiology, 34, 501-510.

Elman, J. L. (1981). Effects of frequency-shifted feedback on the pitch of vocal productions. Journal of the Acoustical Society of America, 70, 45-50.

Franklin, G. F., Powell, J. D., & Emami-Naeini, A. (1991). Feedback control of dynamic systems (2nd ed.). Reading, MA: Addison-Wesley.

Gracco, V. L., Ross, D., Kalinowski, J., & Stuart, A. (1994). Articulatory changes following spectral and temporal modifications in auditory feedback. Journal of the Acoustical Society of America, 95, 2821.

Gunji, A., Hoshiyama, M., & Kakigi, R. (2000). Identification of auditory evoked potentials of one's own voice. Clinical Neurophysiology, 111, 214-219.

Gunji, A., Kakigi, R., & Hoshiyama, M. (2000). Spatiotemporal source analysis of vocalization-associated magnetic fields. Cognitive Brain Research, 9, 157-163.

Hari, R., Aittoniemi, K., Jarvinen, M. L., Katila, T., & Varpula, T. (1980). Auditory evoked transient and sustained magnetic fields of the human brain: Localization of neural generators. Experimental Brain Research, 40, 237-240.

Hari, R., & Makela, J. P. (1988). Modification of neuromagnetic responses of the human auditory cortex by masking sounds. Experimental Brain Research, 71, 87-92.

Hein, A. V., & Held, R. (1962). A neural model for labile sensorimotor coordination. In E. Bernard & M. Hare (Eds.), Biological prototypes and synthetic systems (Vol. 1, pp. 71-74). New York: Plenum.

Hickok, G., & Poeppel, D. (2000). Towards a functional neuroanatomy of speech perception. Trends in Cognitive Sciences, 4, 131-138.

Hirano, S., Kojima, H., Naito, Y., Honjo, I., Kamoto, Y., Okazawa, H., Ishizu, K., Yonekura, Y., Nagahama, Y., Fukuyama, H., & Konishi, J. (1996). Cortical speech processing mechanisms while vocalizing visually presented languages. NeuroReport, 8, 363-367.

Hirano, S., Kojima, H., Naito, Y., Honjo, I., Kamoto, Y., Okazawa, H., Ishizu, K., Yonekura, Y., Nagahama, Y., Fukuyama, H., & Konishi, J. (1997). Cortical processing mechanism for vocalization with auditory verbal feedback. NeuroReport, 8, 2379-2382.

Hirano, S., Naito, Y., Okazawa, H., Kojima, H., Honjo, I., Ishizu, K., Yenokura, Y., Nagahama, Y., Fukuyama, H., & Konishi, J. (1997). Cortical activation by monaural speech sound stimulation demonstrated by positron emission tomography. Experimental Brain Research, 113, 75-80.

Houde, J. F., & Jordan, M. I. (1998). Sensorimotor adaptation in speech production. Science, 279, 1213-1216.

Jacobs, O. L. R. (1993). Introduction to control theory (2nd ed.). Oxford, UK: Oxford University Press.

Jeannerod, M. (1988). The neural and behavioural organisation of goal-directed movements. Oxford, UK: Oxford University Press.

Jurgens, U. (2002). Neural pathways underlying vocal control. Neuroscience and Biobehavioral Reviews, 26, 235-258.

Kawahara, H. (1993). Transformed auditory feedback: Effects of fundamental frequency perturbation. Journal of the Acoustical Society of America, 94, 1883.

Kuriki, S., Mori, T., & Hirata, Y. (1999). Motor planning center for speech articulation in the normal human brain. NeuroReport, 10, 765-769.

Kuriki, S., & Murase, M. (1989). Neuromagnetic study of the auditory responses in right and left hemispheres of the human brain evoked by pure tones and speech sounds. Experimental Brain Research, 77, 127-134.

Lane, H., & Tranel, B. (1971). The Lombard sign and the role of hearing in speech. Journal of Speech and Hearing Research, 14, 677-709.

Lee, B. S. (1950). Some effects of side-tone delay. Journal of the Acoustical Society of America, 22, 639-640.

Leonardo, A., & Konishi, M. (1999). Decrystallization of adult birdsong by perturbation of auditory feedback. Nature, 399, 466-470.

Levelt, W. J. (1983). Monitoring and self-repair in speech. Cognition, 14, 41-104.

Levelt, W. J. M. (1989). Speaking: From intention to articulation. Cambridge: MIT Press.

Liegeois-Chauvel, C., Musolino, A., Badier, J. M., Marquis, P., & Chauvel, P. (1994). Evoked potentials recorded from the auditory cortex in man: Evaluation and topography of the middle latency components. Electroencephalography and Clinical Neurophysiology, 92, 204-214.

Mahncke, H. (1998). Oscillations and temporal processing in sensory cortex. Unpublished doctoral dissertation, University of California, San Francisco.

McCasland, J. S., & Konishi, M. (1981). Interaction between auditory and motor activities in an avian song control nucleus. Proceedings of the National Academy of Sciences, U.S.A., 78, 7815-7819.

McGuire, P. K., Silbersweig, D. A., & Frith, C. D. (1996). Functional neuroanatomy of verbal self-monitoring. Brain, 119, 907-917.

Mosher, J. C., Lewis, P. S., & Leahy, R. M. (1992). Multiple dipole modeling and localization from spatio-temporal MEG data. IEEE Transactions on Biomedical Engineering, 39, 541-557.

Muller-Preuss, P., & Ploog, D. (1981). Inhibition of auditory cortical neurons during phonation. Brain Research, 215, 61-76.

Numminen, J., & Curio, G. (1999). Differential effects of overt, covert and replayed speech on vowel-evoked responses of the human auditory cortex. Neuroscience Letters, 272, 29-32.

Numminen, J., Salmelin, R., & Hari, R. (1999). Subject's own speech reduces reactivity of the human auditory cortex. Neuroscience Letters, 265, 119-122.

Papanicolaou, A. C., Raz, N., Loring, D. W., & Eisenberg, H. M. (1986). Brain stem evoked response suppression during speech production. Brain and Language, 27, 50-55.

Perkell, J. (1997). Articulatory processes. In W. J. Hardcastle & J. Laver (Eds.), The handbook of phonetic sciences (pp. 333-370). Oxford, UK: Blackwell Publishers.

Picton, T. W., Alain, C., Woods, D. L., John, M. S., Scherg, M., Valdes-Sosa, P., Bosch-Bayard, J., & Trujillo, N. J. (1999). Intracerebral sources of human auditory-evoked potentials. Audiology and Neuro-otology, 4, 64-79.

Poeppel, D., Yellin, E., Phillips, C., Roberts, T. P., Rowley, H. A., Wexler, K., & Marantz, A. (1996). Task-induced asymmetry of the auditory evoked M100 neuromagnetic field elicited by speech sounds. Brain Research: Cognitive Brain Research, 4, 231-242.

Reite, M., Adams, M., Simon, J., Teale, P., Sheeder, J., Richardson, D., & Grabbe, R. (1994). Auditory M100 component 1: Relationship to Heschl's gyri. Brain Research: Cognitive Brain Research, 2, 13-20.

Richer, F., Alain, C., Achim, A., Bouvier, G., & Saint-Hilaire, J. M. (1989). Intracerebral amplitude distributions of the auditory evoked potential. Electroencephalography and Clinical Neurophysiology, 74, 202-208.

Robinson, S., & Vrba, J. (1999). Functional neuroimaging by synthetic aperture magnetometry (SAM). In Yoshimoto et al. (Eds.), Recent advances in biomagnetism (pp. 302-305). Sendai, Japan: Tokyo University Press.

Scherg, M., & Von Cramon, D. (1985). Two bilateral sources of the late AEP as identified by a spatio-temporal dipole model. Electroencephalography and Clinical Neurophysiology, 62, 32-44.

Scherg, M., & Von Cramon, D. (1986). Evoked dipole source potentials of the human auditory cortex. Electroencephalography and Clinical Neurophysiology, 65, 344-360.

Schmidt, M. F., & Konishi, M. (1998). Gating of auditory responses in the vocal control system of awake songbirds. Nature Neuroscience, 1, 513-518.

Sekihara, K., Nagarajan, S. S., Poeppel, D., Marantz, A., & Miyashita, Y. (2001). Reconstructing spatio-temporal activities of neural sources using an MEG vector beamformer technique. IEEE Transactions on Biomedical Engineering, 48, 760-771.

Sekihara, K., Poeppel, D., Marantz, A., Koizumi, H., & Miyashita, Y. (1997). Noise covariance incorporated MEG-MUSIC algorithm: A method for multiple-dipole estimation tolerant of the influence of background brain activity. IEEE Transactions on Biomedical Engineering, 44, 839-847.

Stevens, K. N. (1999). Acoustic phonetics. Cambridge: MIT Press.

Stufflebeam, S. M., Poeppel, D., Rowley, H. A., & Roberts, T. P. (1998). Peri-threshold encoding of stimulus frequency and intensity in the M100 latency. NeuroReport, 9, 91-94.

Suga, N., & Schlegel, P. (1972). Neural attenuation of responses to emitted sounds in echolocating bats. Science, 177, 82-84.

Suga, N., & Shimozawa, T. (1974). Site of neural attenuation of responses to self-vocalized sounds in echolocating bats. Science, 183, 1211-1213.

von Bekesy, G. (1949). The structure of the middle ear and the hearing of one's own voice by bone conduction. Journal of the Acoustical Society of America, 21, 217-232.

Weiskrantz, L., Elliott, J., & Darlington, C. (1971). Preliminary observations on tickling oneself. Nature, 230, 598-599.

Woods, D. L., Knight, R. T., & Neville, H. J. (1984). Bitemporal lesions dissociate auditory evoked potentials and perception. Electroencephalography and Clinical Neurophysiology, 57, 208-220.

Yates, A. J. (1963). Delayed auditory feedback. Psychological Bulletin, 60, 213-232.

Zatorre, R. J., Evans, A. C., Meyer, E., & Gjedde, A. (1992). Lateralization of phonetic and pitch discrimination in speech processing. Science, 256, 846-849.
