A MEG study of the neural basis of context-dependent speech categorization

Erika J.C.L. Taylor 1, Lori L. Holt 1, Anto Bagic 2

1 Department of Psychology and Center for the Neural Basis of Cognition, Carnegie Mellon University; 2 Department of Neurology, University of Pittsburgh

Acknowledgments

Research was supported in part by:

National Organization for Hearing Research (NOHR)

National Institutes of Health (NIH)

CNBC / Multimodal Neuroimaging Training Program Training Grant

University of Pittsburgh Medical Center, MR Research Center Pilot Imaging Program

UPMC Center for Advanced Brain Magnetic Source Imaging (CABMSI)

Elekta-Neuromag Oy (Helsinki, Finland)

References

Holt, L. L. (2006). The mean matters: Effects of statistically-defined non-speech spectral distributions on speech categorization. Journal of the Acoustical Society of America, 120, 2801-2817.

Repp, B. H. (1982). Phonetic trading relations and context effects: New experimental evidence for a speech mode of perception. Psychological Bulletin, 92, 81-110.

Repp, B. H., & Liberman, A. M. (1987). Phonetic categories are flexible. In S. Harnad (Ed.), Categorical Perception (pp. 89-112). Cambridge University Press.

Roberts, T. P. L., Ferrari, P., & Poeppel, D. (1998). Latency of evoked neuromagnetic M100 reflects perceptual and acoustic stimulus attributes. Neuroreport, 9, 3265-3269.

Roberts, T. P. L., Flagg, E., & Gage, N. M. (2004). Vowel categorization induces departure of M100 latency from acoustic prediction. Neuroreport, 15, 1679-1682.

Rosch, E., & Lloyd, B. B. (1978). Cognition and Categorization. Hillsdale, NJ: Erlbaum Associates.

Shestakova, A., Brattico, E., Soloviev, A., Klucharev, V., & Huotilainen, M. (2004). Orderly cortical representation of vowel categories presented by multiple exemplars. Cognitive Brain Research, 21(3), 342-350.

Abstract

Previous MEG research has reported an influence of speech category boundaries on the latency of the M100 response, an evoked response that peaks approximately 100 ms post-stimulus onset. Vowels classified by listeners as unambiguously belonging to a vowel category evoked M100 latencies clustered by speech category membership, whereas perceptually ambiguous vowels produced M100 latencies better characterized by detailed acoustic features of the stimuli (Roberts et al., 2004, Neuroreport, 15(10), 1679-1682). The purpose of the current work is to further investigate the relationship of M100 latency to speech categorization by examining context effects, whereby preceding acoustic speech or non-speech context affects how listeners categorize subsequent speech targets. Although such effects are well documented behaviorally, little is known about the underlying neural mechanisms. The pattern of perception in behavioral studies is that speech is categorized relative to the spectral characteristics of neighboring stimuli. We thus hypothesize that the M100 may exhibit context-dependent latency shifts as a function of the context sounds' acoustics. Following previous behavioral research, temporally nonadjacent, distributionally defined sine-wave tone sequences with varying mean frequencies preceded speech targets that listeners categorized. Electrophysiological responses were recorded with concurrent EEG and MEG acquisition. Results are presented as latency analyses and source localizations of the tone-sequence contexts and speech targets, specifically contrasting how the representation of individual speech targets shifts as a function of preceding context.

Latency Effects 4

[Figure: sensor waveforms for the No Context (red) and Non-Speech Context (yellow) conditions; amplitude scale 50 fT/cm. Peak latencies (No Context vs. Non-Speech Context): 50 ms vs. 73 ms, 101 ms vs. 110 ms, and 173 ms vs. 185 ms.]
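To make the latency comparison concrete, the following is a minimal sketch of how condition-wise peak latencies could be read out with MNE-Python. The file names, the choice of MNE-Python rather than the vendor software, and the search window are illustrative assumptions, not the analysis code behind the figure.

```python
# Hedged sketch: extract evoked peak latencies per condition with MNE-Python.
# "no_context-ave.fif" / "nonspeech_context-ave.fif" are hypothetical file names.
import mne

conditions = {
    "No Context": "no_context-ave.fif",
    "Non-Speech Context": "nonspeech_context-ave.fif",
}

for label, fname in conditions.items():
    evoked = mne.read_evokeds(fname, condition=0)  # first (only) evoked in the file
    # Search the gradiometers for the M100 peak in a window around 100 ms post-onset
    ch_name, latency = evoked.get_peak(ch_type="grad", tmin=0.05, tmax=0.20)
    print(f"{label}: peak at {latency * 1e3:.0f} ms on {ch_name}")
```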

Source Analysis 4

[Figure: source waveforms for the No Context and Non-Speech Context conditions, showing Right Superior Temporal activity at around 235 ms and Anterior Cingulate activity at around 531 ms.]
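As a rough illustration of the source-analysis step, the sketch below computes a sparse L1 ("minimum-current"-style) source estimate in MNE-Python. MNE-Python does not implement the Elekta Minimum Current Estimate used here; mixed_norm is shown only as a conceptually related sparse estimator, and the file names and regularization value are placeholder assumptions.

```python
# Illustrative sketch of a sparse (L1) source estimate for one condition.
# mixed_norm is a stand-in for the Minimum Current Estimate, not the poster's actual method;
# file names, alpha, and other parameters are assumptions.
import mne
from mne.inverse_sparse import mixed_norm

evoked = mne.read_evokeds("nonspeech_context-ave.fif", condition=0)   # hypothetical file
forward = mne.read_forward_solution("subject01-fwd.fif")              # hypothetical file
noise_cov = mne.read_cov("subject01-cov.fif")                         # hypothetical file

# L1-regularized inverse yields a sparse set of active sources over time
stc = mixed_norm(evoked, forward, noise_cov, alpha=30.0, loose=0.2, depth=0.8)

# Inspect the estimate, e.g., around a post-stimulus latency of interest (~235 ms)
print(stc.data.shape, stc.times[:5])
```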

Discussion 5

In both left and right auditory cortices we find slightly shorter latencies for speech targets that are not preceded by non-speech context than for those that are.

From these results we conclude that non-speech context immediately preceding a speech token slows the acoustic processing of that target, possibly because of the need to switch from processing tones to processing speech.

When speech sounds are presented in isolation, there is more activity in the Right Superior Temporal area at around 235 ms.

Tone stimuli have been hypothesized to be processed primarily by right-hemisphere temporal structures, whereas language stimuli are hypothesized to be processed by left-hemisphere temporal structures. The larger response in the Right Superior Temporal area in the no-context condition raises the possibility that, in the non-speech context condition, the tone sequences dampen the magnitude of this region's response because its baseline activity is already elevated from processing the tones.

When speech sounds are preceded by acoustic context, there is more activity in the Anterior Cingulate.

One possible reason for Anterior Cingulate activity in the presence of acoustic context is that a form of task switching is involved in transitioning from listening to tone sequences to listening to speech targets; the Anterior Cingulate may be recruited during this switch.

Taken together, these results suggest a lack of speech-specific processing, lending support to a domain-general view of speech processing.

Introduction 1

Spoken communication is deceptively simple. The ease of everyday conversation masks the cognitive and perceptual challenges of translating from acoustic signal to meaning. One fundamental reason for this is that speech acoustics are incredibly complex. With diverse sources of acoustic variability – some linguistically relevant, some not – the mechanisms that transform the acoustic signal into a linguistic representation face a complex task.

Thus, a listener must discriminate some acoustic variability and treat other, potentially discriminable, acoustic variability as functionally equivalent. Another way to say this is that listeners must categorize speech. Categorization is a general process, performed across perceptual modalities (Rosch & Lloyd, 1978). A cardinal characteristic of speech categorization is context dependence. A specific acoustic signal may be categorized as a member of one speech category in one context, but as a member of a different category in another context. As Repp and Liberman (1987) put it, “phonetic categories are flexible.” The traditional account of context dependence in speech categorization relied on tacit knowledge of vocal tract dynamics or motor representations to discover the relationship between context and target (see Repp, 1982). However, research in our lab has demonstrated that non-speech context sounds influence the way that speech is categorized, too, thereby suggesting more general mechanisms not specific to speech. Our research in this area has led us to a view of auditory processing that highlights its adaptive plasticity to the ever-changing sound environment.

The present research will begin to discover the neural basis of this adaptive plasticity using magnetoencephalography (MEG). A significant electromagnetic signal that reflects the processes driving speech categorization is the M100 (labeled N100 in the EEG signal). The M100 is a peak in the electrophysiological signal about 100 ms after the onset of an auditory stimulus. In the past this evoked response was thought to index simple sensory characteristics of the auditory input, not higher-order perceptual processes. However, recent research has found that the M100 reflects category-level, perceptual information as well (e.g., Roberts et al., 1998; Roberts et al., 2004; Shestakova et al., 2004). Specific to our interest in categorization, M100 peak latency shifts as a function of the categorical shifts observed in behavioral forced-choice vowel categorization (Roberts et al., 2004).

The effect of context on the latency of the M100, as an index of the neural processes involved in context-dependent speech categorization, is an important issue that has yet to be investigated. Our paradigm (in which a distribution of tones elicits the effect of context on speech categorization) provides an excellent means of teasing apart the boundaries and constraints on these mechanisms. Additionally, we investigate the impact of this non-speech context in other windows of the time course and in other brain regions.

Design 2

[Trial structure schematic.]

No Context condition: a silent interval followed by a 589 ms speech target.

Non-Speech Context condition: a 2100 ms acoustic history of tones and a 70 ms standard tone, separated from the 589 ms speech target by a 50 ms silent interval.

Speech targets are drawn from a 10-step /da/-/ga/ continuum. Participants are told to choose whether they heard /ga/ or /da/, even when the token is ambiguous.

Contrast: speech targets preceded by a high mean frequency non-speech context vs. the same targets presented with no context (a stimulus sketch follows below).
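For concreteness, here is a small sketch of how a distributionally defined, high mean frequency tone-sequence context of the sort described above might be synthesized. The 2100 ms history and 70 ms tone duration come from the design; the audio sampling rate, tone count, frequency distribution, and its mean are illustrative assumptions rather than the actual stimulus parameters.

```python
# Sketch of a "high mean frequency" non-speech acoustic history built from sine tones.
# Durations (2100 ms history, 70 ms tones) follow the design; everything else is assumed.
import numpy as np

fs = 44100                                    # audio sampling rate (assumed)
tone_dur = 0.070                              # 70 ms per tone
history_dur = 2.1                             # 2100 ms acoustic history
n_tones = int(round(history_dur / tone_dur))  # tone count implied by the durations

rng = np.random.default_rng(0)
mean_freq = 2800.0                            # distribution mean in Hz (assumed value)
freqs = rng.uniform(mean_freq - 500.0, mean_freq + 500.0, size=n_tones)

t = np.arange(int(tone_dur * fs)) / fs
ramp = np.hanning(2 * int(0.005 * fs))        # 5 ms onset/offset ramps to avoid clicks
window = np.ones(t.size)
window[: ramp.size // 2] = ramp[: ramp.size // 2]
window[-(ramp.size // 2):] = ramp[ramp.size // 2:]

# Concatenate the ramped tones into one acoustic-history waveform
history = np.concatenate([np.sin(2 * np.pi * f * t) * window for f in freqs])
```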

Analysis Procedures 3

Participants: 3 right-handed native English speakers with normal hearing

Acquisition: Elekta Neuromag 306-sensor Vectorview; sampling rate 1000 Hz

Spatial filtering: Signal Space Separation (SSS)

Frequency filtering: bandpass 0.5-40 Hz

Rejection parameters: gradiometers 3000 fT/cm; EOG 150 µV

Averaging: time-locked to the onset of the presented word

Source localization: all data transformed to a standard head position; sources localized using the Minimum Current Estimate (MCE). A pipeline sketch follows below.
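The following is a minimal MNE-Python sketch of the preprocessing steps listed above (SSS, 0.5-40 Hz bandpass, artifact rejection, and averaging time-locked to word onset). The poster does not specify the analysis software, so this is only an illustration; the file name, stimulus channel, event code, and epoch window are placeholder assumptions.

```python
# Hedged sketch of the preprocessing pipeline above using MNE-Python.
# File name, stim channel, event code, and epoch window are hypothetical.
import mne

raw = mne.io.read_raw_fif("subject01_raw.fif", preload=True)

# Spatial filtering: Signal Space Separation (SSS)
raw = mne.preprocessing.maxwell_filter(raw)

# Frequency filtering: 0.5-40 Hz bandpass
raw.filter(l_freq=0.5, h_freq=40.0)

# Epochs time-locked to the onset of the speech target, with the stated rejection limits
events = mne.find_events(raw, stim_channel="STI 014")
reject = dict(grad=3000e-13, eog=150e-6)   # 3000 fT/cm and 150 µV in SI units (T/m, V)
epochs = mne.Epochs(raw, events, event_id={"speech_target": 1},
                    tmin=-0.1, tmax=0.6, baseline=(None, 0), reject=reject)

# Average to obtain the evoked response used for latency and source analyses
evoked = epochs.average()
```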