


INTRODUCTION TO REVISED APPLICATION

Overview. This application is a first resubmission. We thank the reviewers for their thoughtful comments on the original submission as they allowed us to significantly strengthen the proposal. The primary concern of the reviewers was that insufficient methodological detail was provided regarding the proposed experiments and modeling projects. They agreed that adequate progress had been made in the previous budget period and that the proposed work was poised to make significant advances in our understanding of speech production. We have therefore focused our revision efforts on providing additional methodological details throughout the Research Plan. To this end, portions of the Progress Report (Section C) that described subprojects in the last budget period which were not directly related to the current application were shortened or removed to allow us to focus more on material relevant for evaluating the proposed research. Section D has also been extensively revised, as detailed in the following paragraphs. Paragraphs of the application text that have changed substantially from the previous submission are indicated with vertical bars in the left margin of the Research Plan.

Concerns of Reviewers 2 and 3. These reviewers were generally enthusiastic about the application and suggested few revisions. The primary concern of Reviewer 3 was that he/she felt there were problems inherent to the design of the two proposed studies involving clinical populations. Reviewer 2 had a similar concern regarding one of these studies. Reviewer 3 felt the clinical population studies should either be dropped from the project or substantially revised. We chose to drop the studies from the project, thereby allowing us to devote more space to methodological details for the remaining experiments.

Reviewer 2 had two additional concerns. The first was a general lack of methodological details; this was the primary concern of Reviewer 1 as well, and below we describe how this has been addressed. The second was a minor concern involving a possible missed opportunity to study the time course of adaptation in Production Experiments 1 and 2 in Section D.1. This has been addressed by adding text in Sections C.2 and D.1 making explicit how we will investigate the time course of learning in the experiments and model simulations.

The remaining paragraphs of this Introduction detail the changes we made to the application to address the concerns of Reviewer 1.

Rationale for studying prosody. Reviewer 1 felt that the prosodic component of the research was not well-motivated and seemed distinct from the rest of the project. This concern has been addressed in several ways. First, we have highlighted the application’s focus on investigating auditory feedback control in speech production. In Sections B and D.1 we describe how the control of prosody is an important component of this line of inquiry. Second, we have added text explicitly stating the prosody-related hypotheses to be tested, both our own and those posited by other researchers, and the manner in which they will be tested. (This was done for all experiments in the application, not just the prosody experiments.) It is important to note that, although many of the hypotheses to be tested are embodied by our proposed neural model, they are not simply our view, but instead reflect current hypotheses proposed by many researchers. The proposed experiments therefore address a wide range of the speech theoretical and experimental literature, not just our own model. This aspect of the proposed research has been highlighted in Section D.

Finally, it should be stressed that speech motor disorders rarely affect only the segmental component of speech. Instead, both prosodic and segmental disturbances typically co-occur (Darley, Aronson, & Brown, 1969, 1975; Kent & Rosenbek, 1982). Yorkston, Beukelman & Bell (1988) have argued that prosodic and phonetic parameters are intertwined in their effect on speech intelligibility. Removal of prosodic cues has been shown to affect intelligibility of impaired (Bunton, Weismer, & Kent, 2000; Patel, 2002b) and non-impaired speech (Bunton, Kent, Kent, & Duffy, 2001; Laures & Weismer, 1999; Laures & Bunton, 2003). Thus a complete model of speech production must provide a unified account of both segmental and prosodic aspects of speech. Relatedly, a common critique by colleagues familiar with the DIVA model is its lack of a prosodic component. For these reasons, we have chosen to spend a significant amount of effort in the coming budget period addressing this shortcoming.

Details concerning the relationship between modeling, behavior, and fMRI. Reviewer 1 felt that the description of how the model simulations will be compared to the results of fMRI and behavioral experiments was not sufficiently detailed. We have addressed this concern in several ways. First, we have expanded the description of the model in C.1 to include a treatment of how we generate simulated fMRI activations from the model. Our method is based on the most recent results concerning the relationship between neural activity and the blood oxygen level dependent (BOLD) signal measured with fMRI, as detailed in Section C.1.



The section also includes a treatment of how inhibitory neurons are modeled in terms of BOLD effects. Second, in Section C.2 we detail how the model’s productions are compared to behavioral results from auditory perturbation experiments. We also detail how model predictions can be tested with fMRI, and how we have successfully done this to verify a model prediction concerning auditory error cells in speech production. Third, we have added further description of how we will compare the model fMRI activations to the results of our fMRI experiments in the modeling portions of Section D.

Reviewer 1 also noted that our previous comparisons between the model and fMRI data appear to be qualitative. Indeed, we have not yet performed quantitative fits of the model activations to fMRI results, though the model does provide quantitative fits to the acoustic trajectories measured while subjects perform speech tasks in the fMRI scanner (see Fig. 5 in the revised Research Plan). Currently there is no accepted method for quantitatively comparing fMRI activations generated from a system-level neural model to the results of an fMRI experiment, because, unlike our model, few system-level neural models make quantitative predictions regarding voxel-level activations across wide areas of the brain. To address this concern we have added text describing how we intend to investigate statistical methods for characterizing the goodness of fit of brain activity predicted by the DIVA model to statistical parametric maps of task-related BOLD signals collected in our fMRI experiments. Despite the qualitative nature of the model/fMRI comparisons to date, we believe the model-based hypotheses tested in our fMRI experiments are more precise and theoretically grounded than those tested in the large majority of fMRI experiments. For example, once a cell type has been localized in our model (note that we assign each cell type to a particular coordinate in the same stereotactic space used to analyze neuroimaging data), we can make a prediction about what should happen in that precise location under a certain speaking condition. We demonstrated this in our auditory perturbation study described in Section C.2. In that study, the model predicted both the existence of a cell type (auditory error cells that respond to a discrepancy between a speaker’s desired speech signal and the actual speech signal), as well as the location of this cell type in stereotactic space. These predictions were supported in the fMRI experiment. Furthermore, the same model provided quantitative fits to behavioral data collected during the fMRI experiment, specifically the acoustic signals produced by subjects under upward and downward perturbations of the first formant frequency. Section C.2 and Section D have been modified to highlight this aspect of the proposed research.
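As one example of the kind of statistical method we might investigate, the sketch below compares a model-generated activation map to an empirical statistical parametric map using a voxelwise correlation and a Dice overlap of suprathreshold voxels. The array names, threshold, and choice of metrics are illustrative assumptions, not a settled method.

```python
# Hypothetical sketch of one candidate goodness-of-fit measure: compare a
# model-generated activation map to an empirical statistical parametric map,
# voxel by voxel, over co-registered volumes. Names/threshold are illustrative.
import numpy as np

def map_fit(model_map: np.ndarray, spm_map: np.ndarray,
            brain_mask: np.ndarray, threshold: float = 3.0):
    """Pearson correlation over in-brain voxels plus Dice overlap of
    suprathreshold voxels for two co-registered 3-D activation maps."""
    m = model_map[brain_mask]                        # flatten to in-brain voxels
    s = spm_map[brain_mask]
    r = np.corrcoef(m, s)[0, 1]                      # voxelwise correlation
    m_sig, s_sig = m > threshold, s > threshold      # suprathreshold voxel sets
    dice = 2 * np.sum(m_sig & s_sig) / (np.sum(m_sig) + np.sum(s_sig))
    return r, dice
```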

fMRI methodological details. Reviewers 1 and 2 felt that more methodological details concerning the fMRI experiments were needed in order to evaluate the proposed research. We have taken several steps to address this concern. First, we have expanded the “fMRI experimental protocol” subsection at the beginning of Section D. This section provides details concerning our event-triggered protocol and our rationale for using this protocol. Second, we have made explicit our hypothesis testing methods in Section D, as described above. Third, we have added a sub-section called “Effective connectivity analysis” in Section D of the proposal that describes the methods we will use for estimating path strengths in order to test predictions generated from the model regarding changes in effective connectivity across experimental conditions. We have also added descriptions in the hypothesis test portions of Section D of how effective connectivity analysis will be used to test predictions in the fMRI experiments. Fourth, we have added power analysis details for the behavioral and fMRI experiments to motivate the subject pool sizes. Included in the fMRI power analysis descriptions are details concerning the number of stimulus presentations for each condition.
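As an illustration of the kind of power analysis referred to above, the sketch below computes the power of a one-sample t-test (e.g., a random-effects group contrast) from an assumed effect size using the noncentral t distribution. The effect size, one-tailed alpha, and target power are placeholder assumptions, not the proposal’s actual figures.

```python
# A minimal sketch of a power calculation used to motivate subject pool sizes:
# power of a one-sample t-test for an assumed effect size (Cohen's d).
import numpy as np
from scipy import stats

def power_one_sample_t(n: int, effect_size: float, alpha: float = 0.05) -> float:
    """Power to detect a mean effect of `effect_size` with n subjects (one-tailed)."""
    df = n - 1
    t_crit = stats.t.ppf(1 - alpha, df)        # critical value under the null
    nc = effect_size * np.sqrt(n)              # noncentrality parameter
    return 1 - stats.nct.cdf(t_crit, df, nc)   # P(reject H0 | effect present)

# e.g., smallest n giving >= 80% power for a hypothetical d = 0.8:
n = 2
while power_one_sample_t(n, 0.8) < 0.80:
    n += 1
print(n, round(power_one_sample_t(n, 0.8), 3))
```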

Reviewer 1 also noted that it may be important to quantify the time courses of activations in the fMRI studies. This type of analysis is not possible on the time scale of a single trial given our event-triggered scanning paradigm, which utilizes sparse sampling and therefore does not contain enough time points of data to reconstruct the whole hemodynamic response time-course within a trial. (Instead our paradigm basically captures the peak activation in each trial and compares the size of this peak across different trial types.) This paradigm is necessary to avoid disruption of the auditory perturbation setup we use for our fMRI experiments: the setup will not work if the subject’s speech is contaminated by scanner noise. Although it is true that some studies in the literature are beginning to analyze time course information within a trial, this is not considered an absolute requirement in the neuroimaging community.

Relatedly, Reviewer 1 felt a need for fMRI analyses beyond “show[ing] just highly filtered images”. We have added text describing several additional analyses that will be performed on the results of the fMRI experiments, including structural equation modeling analyses of effective connectivity between regions, correlation analyses between behavioral data and BOLD response, and quantitative fitting of behavioral data measured during the fMRI experiment with our model.



A. SPECIFIC AIMS

The primary goal of this research project is the continued development, testing, and refinement of a computational model describing the neural processes underlying speech. We believe this model, called the DIVA model1, provides the most comprehensive current account of the neural computations underlying speech articulation, including treatment of acoustic, kinematic, and neuroimaging data. In the upcoming budget period, we propose to focus on the neural circuitry involved in auditory feedback-based control of speech movements. The project involves 3 closely related subprojects aimed at key issues concerning auditory feedback control of speech:

(1) Control of prosodic aspects of the speech signal. This subproject combines neural modeling with psychophysical and functional magnetic resonance imaging (fMRI) experiments to investigate the neural mechanisms responsible for the control of word- and phrase-level prosody. The psychophysical experiments are designed to identify the degree to which prosodic cues are controlled independently vs. in an integrated fashion. The fMRI experiments are designed to identify the neural circuitry responsible for feedback-based control of prosodic cues. The modeling project involves modification of the DIVA model to incorporate mechanisms for controlling prosody. Simulations of the model performing the same speech tasks used in the experiments will be compared to the experimental results to test the model’s account of the neural circuitry responsible for prosodic control.

(2) Representation of speech sounds in auditory cortical areas. In this subproject, we propose fMRI experiments and modeling work designed to further our understanding of the representation of speech sounds in the auditory cortical areas. This issue is central to the DIVA model, which utilizes an auditory reference frame to store speech sound targets for production. The modeling work will extend our model of auditory cortical maps (currently distinct from the DIVA model) and integrate it with the DIVA model. The modeling work will be guided by two fMRI experiments investigating important aspects of the neural representation of speech sounds: phonemic vs. auditory representations and talker-specific vs. talker-independent representations.

(3) Integration of feedforward and feedback commands in different speaking conditions. In this subproject, we propose to investigate the hypothesis that decreased use of auditory feedback control in fast speech and increased use in clear (carefully articulated) and stressed (emphasized) speech are responsible for differences in articulatory and auditory variability in the different speaking conditions. We propose an fMRI experiment to investigate fast, normal, clear, and stressed speech to test two model-based hypotheses regarding brain activity in the different conditions: (i) fast speech will lead to increased inhibition of auditory and somatosensory areas (indicative of less use of feedback control) when compared to normal speech, and (ii) clear and stressed speech will lead to more activation in the auditory and somatosensory cortical areas than normal speech due to increased reliance on feedback control.

The modeling work proposed in the three subprojects will be integrated into a single, improved version of the DIVA model, thus ensuring that they provide a unified account of the neural bases of auditory feedback control of speech. We believe the resulting model will be useful to researchers studying communication disorders by providing a much more detailed description of the functional and neuroanatomical underpinnings of normal speech than currently exists, and by providing a functional description of what to expect when part of the neural circuitry malfunctions. We believe this improved understanding will ultimately help guide improvements to diagnostic tools and treatments for speech disorders.

B. BACKGROUND AND SIGNIFICANCE

Note: Publications listed in boldface are available in the appendix materials of this application.

Combining neural models and functional brain imaging to understand the neural bases of speech. Recent years have witnessed a large number of functional brain imaging experiments studying speech and language, and much has been learned from these studies regarding the brain mechanisms underlying speech and its disorders. For example, functional magnetic resonance imaging (fMRI) and positron emission tomography (PET) studies have identified the cortical and subcortical areas involved in simple speech tasks (e.g., Hickok et al., 2000; Riecker et al., 2000a,b; Wise et al., 1999; Wildgruber et al., 2001). However, these studies do not, by themselves, answer the question of what particular function a particular brain region may play in speech. For example, activity in the anterior insula has been identified in numerous speech neuroimaging studies (e.g., Wise et al., 1999; Riecker et al., 2000a), but much controversy still exists concerning its particular role in the neural control of speech (Dronkers, 1996; Ackermann & Riecker, 2004; Hillis et al., 2004; Shuster & Lemieux, in press).

1 DIVA stands for Directions Into Velocities of Articulators, which characterizes a central component of the model’s control scheme.



We contend that a better understanding of the exact roles of different brain regions in speech requires the formulation of computational neural models whose components model the computations performed by individual brain regions (see Horwitz et al., 1999; Husain et al., 2004; Tagamets & Horwitz, 1997; Fagg & Arbib, 1998 for other examples of this approach). In the past decade we have developed one such model of speech production called the DIVA model (Guenther, 1994, 1995; Guenther et al., 1998, 2005). The model’s components correspond to regions of the cerebral cortex and cerebellum, and they consist of groups of neurons whose activities during speech tasks can be measured and compared to the brain activities of human subjects performing the same task (e.g., Guenther et al., 2005). These neurons are connected by synapses that become tuned during a babbling process and with repetition of learned sounds. Computer simulations of the model provide a simple, unified account of a wide range of observations concerning speech acquisition and production, including data concerning kinematics and acoustics of speech movements (Callan et al., 2000; Guenther, 1994, 1995; Guenther et al., 1998, 2005; Nieto-Castanon et al., 2005; Perkell et al., 2000, 2004a,b) as well as brain activities underlying speech production (Guenther et al., 2005).

In this application we propose a set of modeling, neuroimaging, and psychophysical experiments designed to address several current shortcomings of the DIVA model, in the process improving our understanding of several aspects of speech production, including the neural bases of prosodic control, the auditory representations used for speech production, and the control of speaking rate and clarity.

Neural bases of prosodic control. The speech signal consists of phonetic segments that correspond to the basic sound units (phonemes and/or syllables, or segments) and prosodic cues such as pitch, loudness, duration and rhythm that convey meaningful differences in linguistic or affective state (Bolinger, 1961, 1989; Lehiste, 1970, 1976; Netsell, 1973; Shriberg & Kent, 1982). To date we have focused on the segmental aspects of speech; in the current application we propose to address both prosodic and segmental aspects.

Prosody serves various grammatical, semantic, emotional, social and psychological roles in English. Prosodic cues, often referred to as suprasegmentals, have two defining characteristics: they tend to co-occur with segmental units and they can potentially extend over more than one segmental unit (Cruttenden, 1997; Lehiste, 1970). Young children can modulate the prosody of their cries and babbles well before they can produce any identifiable speech sounds (Bloom, 1973; Kent & Bauer, 1985; Kent & Murray, 1982; MacNeilage & Davis, 1993; Menyuk & Bernholtz, 1969; Snow, 1994). Recent findings indicate that people with severely impaired speech can also control prosody despite little or no segmental clarity (Patel, 2002a,b, 2003, 2004; Vance, 1994; Whitehill et al., 2001). Once regarded as subservient to speech segments, prosody is beginning to be viewed as the ‘scaffolding’ that holds different levels of phonetic description together. Thus far the DIVA model has focused on the segmental aspects of speech; in this application we propose to extend the model to address the neural mechanisms involved in the control of linguistic prosody.

Regarding the neural bases of prosody, there is agreement in the literature that no single brain region is responsible for prosodic control, but there is little consensus as to exactly which regions are involved and in what capacity (see Sidtis & Van Lancker Sidtis, 2003 for a review). Most studies have focused on hemispheric asymmetries in prosodic perception and control, with relatively little attention to different roles played by different regions within the cerebral hemispheres. One of the more consistent findings in the literature concerns perception and production of affective prosody, which appear to rely more on the right cerebral hemisphere than the left hemisphere (Adolphs et al., 2002; Buchanan et al., 2000; George et al., 1996; Ghacibeh & Heilman, 2003; Kotz et al., 2003; Mitchell et al., 2003; Pihan et al., 1997; Ross & Mesulam, 1979; Williamson et al., 2003), though the view of affective prosody as a unitary, purely right hemisphere entity is oversimplified (Sidtis & Van Lancker Sidtis, 2003). Some researchers have concluded that linguistic prosody primarily involves left hemisphere mechanisms (Emmorey, 1987; Walker et al., 2004), while others suggest significant involvement of both hemispheres (Mildner, 2004). Lexical pitch in tonal languages appears to be predominantly processed by the left hemisphere (Gandour et al., 2003). Phrase- or sentence-level aspects of linguistic prosody may be processed preferentially in the right hemisphere (Meyer et al., 2002), though some report bilateral activation for these processes (Doherty et al., 2004; Stiller et al., 1997).

We propose to investigate the neural circuitry involved in the control of linguistic prosodic cues at the word and phrase levels, with the goal of providing a more precise account of the roles of different brain regions in these processes. The proposed studies are novel in their use of a combination of modeling, neuroimaging, and psychophysics, as well as their use of an auditory perturbation paradigm to identify regions of the brain involved in online auditory feedback-based control of prosodic cues. This work, in combination with our previous and proposed research into the control of segmental cues, will also shed light on the interrelation between segmental and prosodic cues and its implications on speech motor control.

Auditory perturbation studies. Below we propose psychophysical and fMRI experiments in which the acoustic feedback of prosodic or segmental cues from the subject’s own voice are modified in near-real-time (less than 18 ms delay) to investigate how the subject compensates and adapts to these modifications.



This paradigm, in which a subject’s perception of his/her own actions is “perturbed” by an experimenter and the subject’s response to the perturbation over time is measured, has been used in a number of prior studies to investigate motor control mechanisms. For example, it has long been known that adding noise to a speaker’s auditory feedback of his/her own speech or attenuating the speaker’s auditory feedback leads to a compensatory increase in speech volume (Lombard, 1911; Lane & Tranel, 1971). The majority of recent auditory perturbation studies have involved the manipulation of vocal pitch. The auditory percept of pitch depends primarily on the frequency of vocal fold vibration, which is referred to as the fundamental frequency (F0). Pitch shifting experiments typically involve shifting of the entire frequency spectrum (including F0 as well as the formant frequencies), which yields the subjective impression of hearing one’s own voice with a different pitch. Elman (1981) showed that speakers respond to frequency-shifted feedback by altering the fundamental frequency of their speech in the opposite direction to the shift. The details of this compensatory response have been investigated in several relatively recent studies by Larson, Burnett, Hain, and colleagues (Burnett et al., 1998; Burnett & Larson, 2002; Hain et al., 2001). In the studies of Larson and colleagues, subjects typically articulated sustained vowels in isolation, a rather unnatural speech task. Natke & Kalveram (2001) and Donath et al. (2002) extended these results by shifting pitch while subjects produced the nonsense word /tatatas/, where the first syllable could be either unstressed or stressed. If the first syllable was stressed (and therefore longer in duration), a compensatory response occurred during that syllable, but not if the first syllable was unstressed. The second syllable showed evidence of a compensatory response in both the stressed and unstressed cases. This suggests that subjects compensate for pitch shifting during production of syllable strings, though the compensation will occur during the first syllable only if this syllable is longer in duration than the latency of the compensatory response.
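As an offline illustration of the frequency-shifted feedback manipulation described above (the actual experiments use a near-real-time apparatus), the sketch below shifts the entire spectrum of a recorded utterance by a fixed amount using the librosa library; the file name and shift size are placeholder assumptions.

```python
# Offline approximation of frequency-shifted feedback: the whole spectrum,
# F0 included, is shifted by a fixed number of cents. Illustrative only; the
# real system applies the shift with < 18 ms delay while the subject speaks.
import librosa
import soundfile as sf

y, sr = librosa.load("speech_sample.wav", sr=None)  # hypothetical recording
cents = 100                                          # e.g., one semitone upward
y_shifted = librosa.effects.pitch_shift(y, sr=sr, n_steps=cents / 100.0)
sf.write("speech_sample_shifted.wav", y_shifted, sr)
```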

In Section D we propose several auditory perturbation experiments that extend beyond existing studies in several important directions. First, unlike previous studies, we will use pitch shifting and amplitude shifting in particular portions of an utterance to perturb specific prosodic cues. Second, we will perform perturbation experiments in the fMRI scanner to help us identify and characterize the brain regions involved in auditory feedback control of speech. Third, the studies will be accompanied by computational modeling projects in which the DIVA model will simulate the experimental tasks and its output will be compared to the experimental results to test the model’s account of the neural circuits responsible for auditory error detection and correction.

C. PROGRESS REPORT / PRELIMINARY STUDIES

This report covers research performed during the current budget period. Our primary aims during this period were: (i) to further develop the DIVA model of speech production, (ii) to further develop our model of neural maps in auditory cortical areas, (iii) to develop laboratory infrastructure and software for performing and analyzing neuroimaging experiments2, and (iv) to carry out fMRI experiments designed to test our models. The following subsections describe our progress on these aims.

C.1. Development of the DIVA model of speech production. During the current grant period, we have continued the development of a neural network model of the brain processes underlying speech acquisition and production called the DIVA model, with emphasis on relating model components to particular regions of the brain. The model is able to produce speech sounds (including articulator movements and a corresponding acoustic signal) by learning mappings between articulations and their acoustic consequences, as well as auditory and somatosensory targets for individual speech sounds. It accounts for a number of speech production phenomena, including aspects of speech acquisition, coarticulation, contextual variability, motor equivalence, velocity/distance relationships, speaking rate effects, and perception-production interactions (Callan et al., 2000; Guenther, 1995; Guenther et al., 1998; Nieto-Castanon et al., 2005; Perkell et al., 2004a,b). The latest version of the model is detailed in Guenther et al. (2005). Here we briefly describe the model with attention to aspects relevant to the current proposal.

A schematic of the DIVA model is shown in Fig. 1. Each box in the diagram corresponds to a set of neurons in the model, and arrows correspond to synaptic projections that form mappings from one type of neural representation to another. The mappings labeled Feedback commands in Fig. 1 are tuned during a babbling phase in which semi-random articulator movements are used to produce auditory and somatosensory feedback; this combination of articulatory, auditory, and somatosensory information is used to tune the synaptic projections between the sensory error maps and the model’s motor cortex.

In the next learning stage, the model learns a time-varying auditory target region for every speech sound presented to it. A “speech sound” as defined herein can be a phoneme, syllable, or word. For the sake of readability, in this application we will typically use the term “syllable” to refer to a single speech sound unit, represented by its own speech sound map cell in the model.

2 We collect our fMRI data at the Massachusetts General Hospital NMR Center.



The target region is effectively stored in the synapses projecting between a Speech Sound Map cell that is chosen to represent the sound and the Auditory Error Map cells. The Speech Sound Map is hypothesized to lie in left hemisphere lateral premotor cortex, specifically in the ventral portion of the inferior frontal gyrus pars opercularis3; we will use the term frontal operculum to refer to this location hereafter. (See Appendix A of Guenther et al., 2005 for justification of the hypothesized anatomical locations of model components.)

Once the target region for a sound has been learned, the model can use the feedback control subsystem to attempt to produce the sound. This is done by activating the Speech Sound Map cell representing the sound, which in turn reads out the sound’s target region to the Auditory Error Map, where it is compared to incoming auditory feedback of the model’s own speech. If the auditory feedback is outside the auditory target region, cells in the Auditory Error Map become active, and this in turn leads to corrective motor commands (Feedback commands in Fig. 1). Additional synaptic projections from Speech Sound Map cells to the model’s motor cortex (both directly and via the cerebellum) form Feedforward commands; these commands are tuned by monitoring the overall motor command, which is formed by combining the Feedforward commands and Feedback commands. Early in development the feedforward commands are inaccurate, and the model depends on feedback control. Over time, the feedforward commands are tuned through monitoring of the movements commanded by the feedback subsystem.
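The control scheme just described can be summarized in a few lines of code. The following is a deliberately minimal sketch, not the model’s actual equations (see Guenther et al., 2005 for those): the toy plant, gains, learning rate, and two-dimensional auditory target are all illustrative assumptions.

```python
# Minimal sketch of combined feedforward/feedback control with feedforward
# tuning driven by monitoring the feedback correction, as described above.
import numpy as np

target = np.array([500.0, 1500.0])       # auditory target (e.g., F1, F2 in Hz)
feedforward = np.array([450.0, 1400.0])  # initially inaccurate feedforward command
alpha_fb, alpha_learn = 0.5, 0.2         # feedback gain; feedforward learning rate

for trial in range(20):
    produced = feedforward.copy()        # toy plant: command maps directly to output
    error = target - produced            # auditory error map activity
    feedback_cmd = alpha_fb * error      # corrective feedback command
    produced += feedback_cmd             # overall command = feedforward + feedback
    # feedforward commands are tuned by monitoring the feedback correction:
    feedforward += alpha_learn * feedback_cmd
    print(trial, produced.round(1))      # productions converge on the target
```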

We have performed a number of computer simulations of the model to verify that it can account for a wide range of experimental observations concerning the articulatory kinematics and acoustics of speech. As part of the current grant, we showed that the model can account for one of the more perplexing observations concerning speech kinematics: the very large amount of articulatory variability seen during production of the phoneme /r/ in different phonetic contexts, even by the same speaker (e.g., Delattre & Freeman, 1968; Westbury et al., 1998). As described in Nieto-Castanon et al. (2005), we collected structural MRI scans and acoustic data from two speakers who had participated in an earlier kinematic study of /r/ production (Guenther et al., 1999), and we used these data to construct two speaker-specific vocal tract models that mimicked the vocal tract shapes, movement degrees of freedom, and acoustics of the modeled speakers. These vocal tract models were embedded into the DIVA model, which then controlled their movements to produce /r/ in different phonetic contexts. The model’s articulatory gestures for /r/ were qualitatively and quantitatively very similar to those of the modeled subjects in each phonetic context, with both the model and speakers displaying a wide range of tongue configurations for /r/ across contexts. Furthermore, differences in the model’s gestures using the two different vocal tract models mimicked the differences in articulations observed for the two speakers. These results, backed by detailed quantitative kinematic and acoustic analyses (for details see Nieto-Castanon et al., 2005), provide very strong support for the type of control scheme used by the DIVA model. We have also shown that this type of controller can account for observations concerning the kinematics of human arm movements (Micci Barreca & Guenther, 2001; Dessing et al., in press).

Generating simulated fMRI activations from model simulations. An important feature of the DIVA model that differentiates it from other computational models of speech production is that all of the model’s components have been associated with specific anatomical locations in the brain. These locations, specified in the Montreal Neurological Institute (MNI) coordinate frame in Table 1 of Guenther et al. (2005), are based on the results of fMRI and PET studies of speech production and articulation carried out by our lab and many others (see Guenther et al., 2005 for details). Since the model’s components correspond to groups of neurons at specific anatomical locations, it is possible to generate simulated fMRI activations from the model’s cell activities during a computer simulation.

3 This region is also part of Broca’s area and Brodmann’s Area 44.


Fig. 1. Block diagram of the DIVA model of speech production.

[Figure 1 components: Speech Sound Map (Left Frontal Operculum); Articulatory Velocity and Position Maps (Motor Cortex); Auditory State Map and Auditory Error Map (Sup. Temporal Cortex); Somatosensory State Map and Somatosensory Error Map (Inf. Parietal Cortex); Cerebellum. Arrows mark Feedforward commands, Feedback commands, auditory and somatosensory target regions, and auditory/somatosensory feedback via subcortical nuclei, organized into a Feedback Control Subsystem and a Feedforward Control Subsystem.]


The relationship between the signal measured in blood oxygen level dependent (BOLD) fMRI and the electrical activity of neurons has been studied by numerous investigators in recent years. It is well known that the BOLD signal is relatively sluggish compared to electrical neural activity. That is, for a very brief burst of neural activity, the BOLD signal will begin to rise and continue rising well after the neural activity stops, peaking about 4-6 seconds after the neural activation burst before falling somewhat below the starting level around 10-12 seconds after the neural burst and slowly rising back to the starting level. We use such a hemodynamic response function (HRF), which is part of the SPM software package for fMRI data analysis, to transform neural activities in our model cells into simulated fMRI activity. However, there are different possible definitions of “neural activity”, and the exact nature of the neural activity that gives rise to the BOLD signal is still under debate (e.g., Caesar et al., 2003; Heeger et al., 2000; Logothetis et al., 2001; Logothetis & Pfeuffer, 2004; Rees et al., 2000; Tagamets & Horwitz, 2001).

In our modeling work, each model cell is hypothesized to correspond to a small population of neurons that fire together. The output of the cell corresponds to neural firing rate (i.e., the number of action potentials per second of the population of neurons). This output is sent to other cells in the network, where it is multiplied by synaptic weights to form synaptic inputs to these cells. The activity level of a cell is calculated as the sum of all the synaptic inputs to the cell (both excitatory and inhibitory), and if the net activity is above zero, the cell’s output is proportional to the activity level. If the net activity is below zero, the cell’s output is zero. It has been shown that the magnitude of the BOLD signal typically scales proportionally with the average firing rate of the neurons in the region where the BOLD signal is measured (e.g., Heeger et al., 2000; Rees et al., 2000). It has been noted elsewhere, however, that the BOLD signal actually correlates more closely with local field potentials, which are thought to arise primarily from averaged postsynaptic potentials (corresponding to the inputs of neurons), than it does to the average firing rate of an area (Logothetis et al., 2001). In particular, whereas the average firing rate may habituate down to zero with prolonged stimulation (greater than 2 sec), the local field potential and BOLD signal do not habituate completely, maintaining non-zero steady state values with prolonged stimulation. In accord with this finding, the fMRI activations that we generate from our models are determined by convolving the total inputs to our modeled neurons (i.e., the activity level as defined above), rather than the outputs4 (firing rates), with an idealized hemodynamic response function generated using default settings of the function ‘spm_hrf’ from the SPM toolbox (see Guenther et al., 2005 for details).
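The following sketch illustrates this convention: a cell’s total synaptic input (rather than its rectified output) is convolved with a canonical double-gamma HRF approximating SPM’s default ‘spm_hrf’. The time constants are the commonly used SPM defaults, but the input time course is an arbitrary assumption and this is not the lab’s actual code.

```python
# Sketch: simulated BOLD from a model cell's total synaptic input, convolved
# with a canonical double-gamma HRF (6 s peak, 16 s undershoot, 1:6 ratio).
import numpy as np
from scipy.stats import gamma

def canonical_hrf(dt: float = 0.1, duration: float = 32.0) -> np.ndarray:
    """Double-gamma HRF approximating SPM's default 'spm_hrf'."""
    t = np.arange(0, duration, dt)
    return gamma.pdf(t, 6) - gamma.pdf(t, 16) / 6.0

dt = 0.1
t = np.arange(0.0, 20.0, dt)
exc_in = np.where((t > 2) & (t < 3), 1.0, 0.0)  # 1 s burst of excitatory input
inh_in = 0.3 * exc_in                           # input arriving via an inhibitory cell
total_input = exc_in - inh_in                   # summed synaptic input ("activity level")
output = np.maximum(total_input, 0.0)           # firing rate: rectified activity
# BOLD is driven by the total input (cf. local field potentials), not the output:
bold = dt * np.convolve(total_input, canonical_hrf(dt))[: t.size]
```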

In our models, an active inhibitory neuron has two effects on the BOLD signal: (i) the total input to the inhibitory neuron will have a positive effect on the local BOLD signal, and (ii) the output of the inhibitory neuron will act as an inhibitory input to excitatory neurons, thereby decreasing their summed input and, in turn, reducing the corresponding BOLD signal. Relatedly, it has been shown that inhibition to a neuron can cause a decrease in the firing rate of that neuron while at the same time causing an increase in cerebral blood flow, which is closely related to the BOLD signal (Caesar et al., 2003). Caesar et al. (2003) conclude that this cerebral blood flow increase probably occurs as the result of excitation of inhibitory neurons, consistent with our model. They also note that the cerebral blood flow increase caused by combined excitatory and inhibitory inputs is somewhat less than the sum of the increases to each input type alone; this is also consistent with our model since the increase in BOLD signal caused by the active inhibitory neurons is somewhat counteracted by the inhibitory effect of these neurons on the total input to excitatory neurons.

Figure 2 illustrates the process of generating fMRI activations from a model simulation and comparing the resulting activation to the results of an fMRI experiment designed to test a model prediction.

4 It is noteworthy, however, that total synaptic input correlates highly with firing rate, both physiologically and in our models. Thus the two measures for estimating neural activity (firing rate vs. total input) are likely to produce similar results.


Fig. 2. Left: Locations of the cell types in the DIVA model. Right: Generating BOLD signals for a jaw-perturbed speech – unperturbed speech contrast. Left panels show cell activities (gray) and resulting BOLD signals (black) for two cell types in the model. The perturbation results in increased somatosensory error cell activation, but no increased auditory error cell activation. The corresponding BOLD signals are spatially smoothed and plotted on the standard single subject brain from the SPM software package (top right panel). The results of an fMRI experiment comparing perturbed to unperturbed speech are located in the bottom right panel.


The left panel of the figure illustrates the locations of the DIVA model components on the “standard” single subject brain from the SPM2 software package. The DIVA model predicts that unexpected perturbation of the jaw during speech will cause a mismatch between somatosensory targets and actual somatosensory inputs, causing activation of somatosensory error cells in higher-order somatosensory cortical areas in the supramarginal gyrus of the inferior parietal cortex. The location of these cells is denoted by S in the left panel of the figure. Simulations of the DIVA model producing speech sounds with and without jaw perturbation were performed. The top middle panel indicates the neural activity (gray) of the somatosensory error signals in the perturbed condition minus activity in the unperturbed condition, along with the resulting BOLD signal (black). Since the somatosensory error cells are more active in the perturbed condition, a relatively large positive response is seen in the BOLD signal. Auditory error cells, on the other hand, show little differential activation in the two conditions since very little auditory error is created by the jaw perturbation (bottom middle panel), and thus the BOLD signal for the auditory error cells in the perturbed – unperturbed contrast is near zero. The derived BOLD signals are Gaussian smoothed spatially and plotted on the standard SPM brain in the top right panel. The bottom right panel shows the results of an fMRI study we carried out to compare perturbed and unperturbed speech (13 subjects, random effects analysis, false discovery rate = 0.05). In this case, the model correctly predicts the existence and location of somatosensory error cell activation, but activation not explained by the model is found in the left frontal operculum.
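For concreteness, the contrast-and-smoothing step described above can be sketched as follows. This is an illustrative reconstruction, not the lab’s actual pipeline; the volume shape, voxel coordinates, effect sizes, and smoothing width are all placeholder assumptions.

```python
# Sketch: place simulated perturbed-minus-unperturbed BOLD differences at the
# modeled cells' coordinates, then Gaussian-smooth the map for display.
import numpy as np
from scipy.ndimage import gaussian_filter

vol = np.zeros((91, 109, 91))                 # voxel grid (2 mm MNI-like space)
somato_err_vox = (60, 40, 55)                 # hypothetical somatosensory error cell site
aud_err_vox = (70, 30, 40)                    # hypothetical auditory error cell site
vol[somato_err_vox] = 1.8                     # large BOLD difference (somatosensory error)
vol[aud_err_vox] = 0.05                       # near-zero difference (little auditory error)
contrast_map = gaussian_filter(vol, sigma=3)  # spatial smoothing before plotting
```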

Fig. 3 compares results from an fMRI study of single syllable production performed by our lab to simulated fMRI data from the DIVA model in the same speech task. Comparison of the top and bottom panels indicates that the model qualitatively accounts for most of the fMRI activations (see Guenther et al., 2005 for details).

Research described in this subsection and funded by the current grant has been accepted or published in Journal of the Acoustical Society of America (Nieto-Castanon et al., 2005), Brain and Language (Guenther et al., 2005), Journal of Motor Behavior (Micci Barreca & Guenther, 2001), Journal of Cognitive Neuroscience (Dessing et al., in press), and Contemporary Issues in Communication Science and Disorders (Max et al., 2004).

C.2. Combining modeling with psychophysics and neuroimaging to investigate auditory feedback control in speech. The DIVA model predicts that unexpected perturbation of speech will cause a mismatch between target sensations and actual sensory inputs. For example, an auditory perturbation such as a shift of the first formant frequency, or F1 (so that a subject hears himself saying “bit” when he is attempting to say “bet”), should activate auditory error cells. These cells are hypothesized to reside in higher auditory cortical areas in the planum temporale and superior temporal gyrus (A in the left panel of Fig. 2). If speech is perturbed in this manner repeatedly, rather than unpredictably, the model predicts that feedforward commands will eventually become tuned to include compensation for the frequency shift. Learning of this sort is sometimes termed sensorimotor adaptation (e.g., Houde & Jordan, 1998). If the shift is then removed, the model will display an “after-effect” while it retunes the feedforward command to work properly under normal conditions. To test these hypotheses, we performed two experiments, accompanied by modeling projects, investigating the effects of shifting F1 of a speaker’s auditory feedback in real time: an unexpected auditory perturbation fMRI study and a sensorimotor adaptation psychophysical study (the latter performed as part of another grant; R01 DC01925, J. Perkell, PI).


Fig. 4. Top: Cortical activation during auditorily perturbed speech (relative to unperturbed speech). [8 subjects; fixed effects; p<0.001 uncorrected.] Bottom: Model activations for the same contrast.


In each trial of the fMRI study, subjects read a one-syllable word (e.g., “neck” or “bet”), and on 1 in 4 trials (randomly dispersed) the subject’s auditory feedback was perturbed by shifting the first formant frequency of his/her own speech upward or downward by 30% in real time5. (See fMRI Experimental Protocol at the beginning of Section D for further details regarding the scanning protocol.) The top part of Fig. 4 shows the areas with significantly more activation during shifted trials as compared to unshifted trials. As predicted by the DIVA model (illustrated by model simulation results in the bottom panel of Fig. 4), increased activation is found in higher-order auditory cortical areas, specifically in the ventral posterior superior temporal gyrus in the right hemisphere and within the posterior portion of the planum temporale in the left hemisphere. In the model these activations arise from auditory error cells becoming active when a discrepancy exists between the auditory target and the incoming auditory signal during speech. It is noteworthy that the DIVA model prediction of the existence of auditory error cell activation, as well as its location, was made prior to the running of the fMRI experiment (e.g., Guenther et al., 2005). These results highlight the effectiveness of our approach in generating predictions to guide fMRI studies, and more generally in furthering our understanding of brain function during speech production.

The speech of subjects in the fMRI study was recorded and analyzed to identify whether subjects were compensating for the perturbation within the perturbed trial. (Note that such within-trial compensation differs from adaptation; compensation refers to on-line corrections in response to a perturbation, whereas adaptation implies a learned compensation that occurs even in the absence of a perturbation.) The gray shaded areas in Fig. 5 represent the 95% confidence interval for normalized6 F1 values during the vowel for upward perturbation trials (darker shading) and downward perturbation trials (lighter shading). Subjects showed clear compensation for the perturbations, starting approximately 100-130 ms after the start of the vowel. Simulation results from the DIVA model are indicated by the dashed line (upward perturbation) and solid line (downward perturbation). The model’s productions fall within the 95% confidence interval of the subjects’ productions, indicating that the model quantitatively accounts for compensation seen in the fMRI subjects.
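The normalization described in footnote 6 and the confidence bands in Fig. 5 can be computed as in the following sketch, assuming arrays of per-trial F1 trajectories (trials × time points); the function name and array shapes are illustrative.

```python
# Sketch of the compensation analysis: F1 on each perturbed trial is divided
# by the mean F1 at the same time point on unperturbed trials (footnote 6),
# and a 95% confidence band is computed across trials.
import numpy as np
from scipy import stats

def normalized_f1_band(f1_perturbed: np.ndarray, f1_unperturbed: np.ndarray):
    """Return mean normalized F1 and its 95% CI half-width per time point."""
    norm = f1_perturbed / f1_unperturbed.mean(axis=0)   # normalize by unperturbed mean
    mean = norm.mean(axis=0)
    sem = stats.sem(norm, axis=0)
    half = sem * stats.t.ppf(0.975, norm.shape[0] - 1)  # 95% CI half-width
    return mean, half
```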

This research was presented at the 2004 Acoustical Society of America conference in San Diego; Jason Tourville was awarded Best Student Paper in Speech Communication for this presentation.

In the psychophysical study (Villacorta, 2005), 20 subjects performed a sensorimotor adaptation experiment that involved four phases: a baseline phase in which the subject produced 15 repetitions of a short list of words (each repetition of the list corresponding to one epoch) with normal auditory feedback (epochs 1-15 in Fig. 6), a ramp phase during which the shift in F1 was gradually introduced to the subject’s auditory feedback (epochs 16-20), a training phase in which the full F1 perturbation (a 30% shift of F1) was applied on every trial (epochs 21-45), and a post-test phase in which the subject received unaltered auditory feedback (epochs 46-65). The subjects’ adaptive response (i.e., the percent change in F1 compared to the baseline phase in the direction opposite the perturbation) is shown by the solid line with error bars in Fig. 6. The shaded band in Fig. 6 represents the 95% confidence interval for simulations of the DIVA model (where different versions of the model were created to correspond to the different subjects; see Villacorta, 2005 for details). Except for only two epochs, the model’s productions were statistically indistinguishable from the experimental results. Notably, subjects showed an after-effect as predicted, and the model provides an accurate quantitative fit to this after-effect.
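For reference, the four-phase perturbation schedule just described can be written out directly. The sketch below assumes the shift is applied as a multiplicative F1 scale factor and that the ramp is linear, which is one natural reading of the text; epoch counts follow the description above.

```python
# Per-epoch F1 scale factor applied to auditory feedback in the adaptation
# experiment (1.0 = unaltered; 1.3 = full 30% upward shift of F1).
import numpy as np

schedule = np.concatenate([
    np.ones(15),                 # baseline: normal feedback (epochs 1-15)
    np.linspace(1.0, 1.3, 5),    # ramp: shift introduced gradually (epochs 16-20)
    np.full(25, 1.3),            # training: full 30% F1 shift (epochs 21-45)
    np.ones(20),                 # post-test: unaltered feedback (epochs 46-65)
])
assert schedule.size == 65       # 65 epochs in total
```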

5 The auditory feedback delay was 18 ms in both perturbed and unperturbed trials; this delay is not noticeable to the subject. Thus on perturbed trials the subject simply hears him/herself producing the wrong word (e.g., “bit” instead of “bet”) as he/she is speaking.

6 The F1 values were normalized by dividing by the mean F1 for the vowel at the same time point in unperturbed trials.


Fig. 5. fMRI subject normalized F1 values (shaded areas) and DIVA model productions (lines) on upward (dark gray band; dashed line) and downward (light gray band; solid line) perturbation trials.

Fig. 6. Adaptive response (AR) in F1 during the sensorimotor adaptation experiment (solid lines) compared to DIVA model simulations (shaded area). The lighter shaded region corresponds to the DIVA SA simulations, and represents the 95% confidence interval about the mean. The vertical dashed lines show the experimental phase transitions, and the horizontal dashed line indicates the baseline. [From Villacorta, 2005.]


C.3. Modeling and neuroimaging of sound category representations in the auditory cortical areas. This subproject involved four fMRI experiments and associated neural modeling designed to investigate the representation of sound categories in human auditory cortex. fMRI Experiment 1 investigated the representation of prototypical (good) and non-prototypical (bad) examples of a vowel sound. We found that listening to prototypical examples of a vowel resulted in less auditory cortical activation than listening to non-prototypical examples (Guenther et al., 2004). fMRI Experiments 2 and 3 investigated the effects of categorization training and discrimination training with novel non-speech sounds on auditory cortical representations. The two training tasks were shown to have opposite effects on the auditory cortical representation of sounds experienced during training: discrimination training led to an increase in the amount of activation caused by the training stimuli, whereas categorization training led to decreased activation (Guenther et al., 2004). These results, which utilized powerful region-of-interest based fMRI analysis techniques that we developed under the current grant (Nieto-Castanon et al., 2003), indicate that the brain efficiently shifts neural resources away from regions of acoustic space where discrimination between sounds is not behaviorally important (e.g., near the center of a sound category) and toward regions where accurate discrimination is needed. They also provide a straightforward neural account of learned aspects of perceptual distortion near sound categories (e.g., the perceptual magnet effect demonstrated by Kuhl, 1991; Iverson & Kuhl, 1996): sounds from the center of a category are more difficult to discriminate from each other than sounds near category boundaries because they are represented by fewer cells in auditory cortical working memory. We noted that activities in inferior parietal and superior temporal regions appeared to correlate well with discrimination scores; that is, stimuli that were relatively easy to discriminate from each other generally caused more activation in these areas than stimuli that were more difficult to discriminate from each other. This result is consistent with the hypothesis that cells in the superior temporal and inferior parietal areas represent the auditory characteristics of the stimuli in the discrimination task, and that larger neural representations in these areas (as evidenced by more activation) are less susceptible to noisy processing in individual neurons than smaller representations (see Guenther et al., 2004). Thus sounds with larger superior temporal and inferior parietal representations are easier to discriminate from each other than sounds with smaller representations. Involvement of superior temporal and inferior parietal areas in auditory discriminability is also compatible with the results of studies performed elsewhere (e.g., Caplan et al., 1995).
In fMRI Experiment 4, we found evidence that the inferior parietal cortex is part of an auditory working memory system: this area was engaged in a discrimination task that required storage of the auditory details of one sound in working memory for comparison to a second sound, but not in an identification task that involved only identifying which phoneme was heard. Participation of inferior parietal cortex in phonological working memory tasks has been posited by numerous researchers (e.g., Baddeley, 2003; Hickok & Poeppel, 2004; Mottaghy et al., 2002; Jonides et al., 1998; Ravizza et al., 2004). Our results indicate that auditory details, as well as phonological information, can be stored in inferior parietal working memory.

Based on these and other results in the literature, we have developed a neural model of speech sound processing in the auditory system (e.g., Guenther & Bohland, 2002; Guenther et al., 2003). We have further elaborated this model as schematized in Fig. 7. We will refer to this elaborated model as the auditory map model[7] in this proposal. It serves as the theoretical starting point for several studies proposed herein. The model consists of three interconnected maps of cells. For each map there is an equation that governs cell activities in the map; these equations[8] are biologically plausible shunting equations (e.g., Grossberg, 1980). The first map in the model, the auditory map in Fig. 7, corresponds to primary and secondary auditory cortical regions of the superior temporal gyrus and supratemporal plane, including Heschl's gyrus, planum temporale, and posterior superior temporal gyrus. Cells in this map represent an acoustic signal in terms of relatively low-level auditory parameters. For example, there are cells tuned to particular frequencies and sweeps of frequencies; cells of this type have been identified in electrophysiological studies of auditory cortex in animals (e.g., Morel et al., 1993; Kosaki et al., 1997). The auditory map thus provides a rich spectral representation of an incoming sound. The second map, labeled auditory working memory, is hypothesized to lie in the inferior parietal cortex. It receives input from the auditory map and is responsible for keeping recent sounds in working memory for a short period after the sounds have stopped. For example, in a same/different discrimination task, the auditory working memory holds the first of a pair of sounds in working memory long enough for it to be compared to the second sound of the pair. This area includes the anterior and posterior supramarginal gyrus as well as the parietal operculum. It overlaps with the inferior parietal region shown to be involved in both speech perception and production tasks by Hickok and colleagues (Buchsbaum et al., 2001; Hickok et al., 2003). The third map, labeled category map, also receives input from the auditory map and is concerned with classification of sounds into behaviorally relevant categories such as phoneme categories. A good example of a phoneme category (such as a prototypical /i/ sound in the study of Kuhl, 1991) causes a large amount of activation in this map during a discrimination task, specifically in cells that represent the vowel /i/. A poor example of /i/ or a sound that is not categorically processed (e.g., an unfamiliar sound) would cause little or no activation. This effect is achieved in the model through a simple, biologically based learning algorithm in which tuning curves of cells are adjusted to align with frequently occurring inputs (e.g., familiar/prototypical sounds in the native language). We hypothesize that the category map is located lateral and ventral to the primary auditory cortex, in the superior temporal sulcus and neighboring middle temporal gyrus. For brevity's sake, we will refer to this general region as the superior temporal sulcus (STS) in this proposal. This location is consistent with neuroimaging studies comparing speech to non-speech stimuli, reporting preferential activation of anterior STS (Mummery et al., 1999; Binder et al., 2000; Scott et al., 2000) and anterior middle temporal gyrus (Zahn et al., 2000) bilaterally. Our own fMRI studies found more activity in this region when performing a phoneme identification task than when performing a discrimination task using the same sounds (Guenther et al., 2003), and that sound category learning leads to an increase in activity in this area (Guenther et al., 2004). These results provide further support for an STS category map. Buchsbaum et al. (2001) report speech-related activity in a more posterior portion of the STS. One of the goals of the current proposal is to refine our understanding of speech maps in STS, including investigating the possibility of different speech maps in different parts of STS.

[7] The auditory map model has thus far been developed independently of the DIVA model. One goal of the current proposal is to better integrate the auditory map model with the DIVA model, as described in Section D.2.
[8] Space constraints prohibit a detailed treatment of model equations in this application; see Guenther & Bohland, 2002 for further details concerning the theoretical underpinnings of the model and computer simulations.
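
Although space constraints prohibit a full treatment of the model equations (footnote [8]), the generic shunting form referenced above is well established (Grossberg, 1980). In the illustrative notation below (our symbols, not the model's actual parameter values), A is a passive decay rate, B and C are upper and lower bounds on cell activity, and E_i and I_i are the net excitatory and inhibitory inputs to cell i:

    \frac{dx_i}{dt} = -A\,x_i + (B - x_i)\,E_i - (x_i + C)\,I_i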

According to the model, nerve impulses due to an incoming sound are first registered in cortex in the auditory map. This map represents relatively low-level details of auditory stimuli, such as the shape of the frequency spectrum and velocities of frequency sweeps. Active cells in the auditory map (schematized by the Gaussian "hump" in the map in Fig. 7) send projections to the other two maps in the model. Projections to the auditory working memory map transmit information about auditory details of the sound so they can be stored in short-term memory, e.g. for comparison to the second incoming sound in a same/different discrimination task[9]. We hypothesize that the detailed auditory representation in this region also plays an important role in the tuning of speech motor programs. In the DIVA model, a simplified version of such a representation is used to monitor the sensory consequences of speech motor commands. The projections to the category map are hypothesized to transform the sound into a categorical representation. A cell in the category map becomes active if a clear example of the cell's preferred stimulus is processed. For instance, a good example of the phoneme /r/ is presumed to heavily activate the cells that represent the /r/ phoneme category in the map, whereas a poor /r/ example will only weakly activate these cells. In addition to connections from the auditory map to the other two maps, the model includes inhibitory connections between the inferior parietal auditory working memory and the category map. The model's account of the perceptual magnet effect is schematized in Fig. 7. When a good category exemplar arrives, the category map is strongly activated, and this leads to relatively strong inhibition of the auditory working memory, resulting in fewer active cells in the working memory map (top panel). When a poor category example arrives (bottom panel), there is relatively weak category map activation and thus weak inhibition of auditory working memory, resulting in many more active cells in the auditory working memory map. Since larger neural representations are less susceptible to noisy processing in individual neurons, the larger working memory activation for poor category exemplars results in better discriminability of these sounds relative to good category exemplars. This model fits well with the dual language processing streams proposed by Hickok & Poeppel (2004). In terms of both function and cortical location, our proposed auditory working memory map corresponds to their hypothesized dorsal stream, while the category map corresponds to their hypothesized ventral stream.

[9] It is highly likely that the working memory representation includes a prefrontal cortex component (e.g., Baddeley & Hitch, 1974; Baddeley, 2003; Smith & Jonides, 1998; Husain et al., 2004). Since our primary interest in this project was auditory cortical representations, we focus here on the temporal and parietal lobes.

The model has been implemented computationally as a neural network whose cell activities and synaptic weight strengths are governed by differential equations, allowing investigation of both the transient and equilibrium behaviors of the system. In the simulations described here, the model was trained by presenting it with sound distributions mimicking those of the American English phonemes /r/ and /l/ (approximated by Gaussian distributions centered on prototypical phoneme examples). Learning took place in the synaptic weights projecting from the superior temporal auditory map to the middle temporal category map using a biologically plausible self-organizing map learning law (Kohonen, 1982) that results in more cells becoming active for frequently occurring sounds (e.g., category prototypes). Following training, we tested the model using artificial stimuli from a stimulus continuum formed by interpolating the third formant frequency (F3) between values for the phonemes /r/ and /l/. To compare the model's discrimination performance to psychophysical measures, we considered the cell activity within the working memory map as an internal psychological variable used for discrimination. Specifically, the perceptual distance (d') between stimuli was calculated as the difference in the mean population response for each stimulus divided by the root mean square of the variances of this response (averaged over many trials). When more cells contribute to the population vector, the signal-to-noise ratio, and in turn discriminability, improves (see also Zohary, 1992). Fig. 8 shows that the model provides an accurate fit to behavioral data we collected in a discrimination experiment in which 5 subjects performed a same/different task on sounds from the same /r/-/l/ continuum presented to the model.

Fig. 8. Model's fit to psychophysical data from a discrimination experiment for an /r/-/l/ continuum.
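
To make the discriminability computation concrete, the following minimal sketch (arbitrary cell counts and noise levels; a drastic simplification of the actual simulations) computes a d'-style measure from simulated working-memory population responses and reproduces the qualitative prediction that stimuli recruiting more cells are easier to discriminate:

    import numpy as np

    rng = np.random.default_rng(0)

    def mean_population_response(n_cells, mu, n_trials=2000, noise_sd=0.3):
        # Each active working-memory cell contributes an independent noisy
        # response; the population response is the mean across cells.
        trials = rng.normal(mu, noise_sd, size=(n_trials, n_cells))
        return trials.mean(axis=1)

    def d_prime(resp_a, resp_b):
        # Difference in mean population response divided by the RMS of the
        # response variances, per the definition given above.
        return abs(resp_a.mean() - resp_b.mean()) / np.sqrt(
            0.5 * (resp_a.var() + resp_b.var()))

    # Prototypes strongly activate the category map, which inhibits the
    # working memory map (fewer active cells); non-prototypes leave more
    # cells active. Cell counts here are arbitrary placeholders.
    for n_cells, label in [(20, "near prototype (strong inhibition)"),
                           (60, "near boundary (weak inhibition)")]:
        a = mean_population_response(n_cells, mu=1.00)
        b = mean_population_response(n_cells, mu=1.05)  # neighboring stimulus
        print(f"{label}: d' = {d_prime(a, b):.2f}")

With these placeholder values the larger (weakly inhibited) population yields a noticeably higher d', mirroring the perceptual magnet account.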

This research was published in Journal of Speech, Language, and Hearing Research (Guenther et al., 2004), Acoustical Science and Technology (Guenther & Bohland, 2002), and NeuroImage (Nieto-Castanon et al., 2003) and was presented at several major conferences.

C.4. Investigation of brain mechanisms underlying sound initiation and sequencing. In this subproject, we conducted an fMRI experiment to explore brain activity underlying the sequencing of speech sounds. We investigated brain responses related to motor preparation and overt production of sequences of memory-guided non-word syllables such as "ba-ba-ba" or "stra-stru-stri". Two parameters determined the linguistic content of the stimuli. The first, syllable complexity, varied the number of phoneme segments constituting each individual syllable in the sequence (i.e., CV vs. CCCV syllables). The second, sequence complexity, varied the number of unique syllables comprising the three-syllable sequence (repetition of the same syllable vs. three different syllables). Each parameter took one of two values (simple or complex), yielding four stimulus types. Comparisons across conditions were used to assess sequence-related networks. Thirteen right-handed adult American English speakers participated (6 female, 7 male; ages 22-50). Stimuli were presented visually on a projection screen in the rear of the scanner (3T Siemens Trio). A single trial (involving stimuli chosen randomly from all conditions) began with this stimulus display. After 2.5 s the stimulus was removed and immediately replaced by a white fixation cross. Subjects were asked to maintain fixation and to prepare to vocalize the stimulus they had just read. After a short random duration (0.5-2.0 s), the white cross became green, signaling the subject to immediately vocalize the most recent sequence. During this 2.5 s period, the scanner remained silent. Subjects were instructed to speak each sequence at a typical volume and rate. Following the full production period, the scanner was triggered to acquire three full brain volumes (see fMRI Experimental Protocol in Section D for further details). At the end of the third volume acquisition, the fixation cross disappeared, and the next stimulus was presented. The subjects' vocal responses were recorded using an MRI-compatible microphone and checked off-line for production errors; error trials were removed from the analysis. A first-level analysis was performed for each subject using the General Linear Model in SPM2. A second-level analysis using non-parametric permutation methods (Nichols & Holmes, 2002) from the SnPM2b toolbox assessed results across subjects.
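
As a brief illustration of the second-level permutation logic (simulated contrast values, not actual data; the sign-flipping scheme shown is the standard one-sample approach of Nichols & Holmes, 2002):

    import numpy as np

    rng = np.random.default_rng(1)

    # Simulated per-subject contrast values (e.g., complex minus simple),
    # one value per subject; 13 subjects as in the experiment.
    contrasts = rng.normal(0.4, 1.0, size=13)
    observed = contrasts.mean()

    # Under the null hypothesis the contrasts are symmetric about zero, so
    # randomly flipping each subject's sign generates the null distribution
    # of the group mean.
    n_perm = 10000
    signs = rng.choice([-1.0, 1.0], size=(n_perm, contrasts.size))
    null = (signs * contrasts).mean(axis=1)

    p = (np.sum(null >= observed) + 1) / (n_perm + 1)
    print(f"observed mean = {observed:.3f}, permutation p = {p:.4f}")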

Complex sequence responses were compared with simple sequence responses within each of the two syllable types. Several cortical regions responded more strongly for complex sequences, including the pre-SMA and SMA, left inferior frontal sulcus, left precentral gyrus, anterior insula and frontal opercular regions, and left superior parietal cortex. Complex sequences also elicited subcortical activation in the right inferior cerebellum (lobule VIII) and left basal ganglia. Differential cortical activations related to sequence complexity were strongly left-lateralized (cf. the widely bilateral activity in overt production vs. baseline), suggesting a hemispheric distinction between areas used for speech sequence planning (which are left-lateralized) and those used simply for motor execution (which are bilateral). These results largely implicate the same regions described in clinical cases showing deficits in sequencing and/or initiation of speech (e.g., Jonas, 1981, 1987; Ziegler et al., 1997; Pickett et al., 1998; Dronkers, 1996; Riva, 1998), but also provide further functional and anatomical specificity regarding the network for speech sequencing.

Complex syllables required subjects to realize additional phonemic/phonetic targets for proper articulation compared to simple syllables. Differential brain responses related to syllable complexity were limited to the primary motor and somatosensory cortices in the left hemisphere around the lip, jaw, and tongue representations, and to the superior cerebellar cortex.

In related modeling work, we have described how a set of neural mechanisms may cooperate to mediate all the major stages of sequential action generation, from initial sequence acquisition to precisely timed force production by muscles in fluent/skilled production (Bullock, 2004a,b; Rhodes et al., 2004). Together, these experimental and modeling results provide an important contribution to our understanding of how the brain plans and drives production of sequences of learned speech sounds.

This work has been presented at the 2004 Meeting of the Organization for Human Brain Mapping and published in the journals Trends in Cognitive Sciences, Motor Control, and Human Movement Science. An additional journal article is in preparation. We have proposed further work to investigate the role of prefrontal cortex and basal ganglia in the sequencing and initiation of speech as part of a recent new R01 application that is currently under review ("Sequencing and Initiation in Speech Production", F. Guenther PI). None of the research proposed herein overlaps with that application.

C.5. Modeling investigation of cerebellar involvement in feedforward control and coarticulation. In this subproject, we are developing and testing a cerebellar component for use in the DIVA model that explores the role of the cerebellum in feedforward control and coarticulation. A difficult problem faced by the speech neural controller concerns tuning of feedforward motor commands, which involves monitoring corrective commands specified by the feedback control subsystem and incorporating them into the feedforward command. However, due to delays inherent in the processing of sensory feedback, these corrective commands arrive significantly later in time (approximately 50-200 ms) than desired, well after the error has occurred. It is preferable for the feedforward controller to preempt sensory errors before they arise. Our computer simulations indicate that, if the delays are not corrected for in the feedforward command, instabilities arise during fast speech. We hypothesize that the cerebellum, specifically the superior medial cerebellum, performs the task of temporal alignment in the learning of feedforward motor commands for speech, and that this same learning process is responsible for anticipatory coarticulation in speech production.

In our cerebellum model, the delayed feedback command generated from perceived errors causes activation of cerebellar climbing fibers, which leads to long-term depression (LTD) in the synapses between parallel fibers and Purkinje cells in the cerebellum. The effect of this LTD in the model is to shift the current corrective command (specified by the feedback control subsystem) to an earlier point in time on the next attempt to produce the same speech sound. This process continues until the cerebellum starts to produce an error in the opposite direction because it activates the command too soon. This causes a decrease in the climbing fiber activation, which leads to long-term potentiation (LTP) in the parallel fiber-to-Purkinje synapses, which in turn shifts the output of the cerebellum to later points in production. This dynamic balance between LTP and LTD results in feedforward commands that are generated as early as possible without interfering with previous sounds, thus providing an account of both anticipatory coarticulation and the proper temporal alignment of corrective commands when adjusting the feedforward command.
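
The LTD/LTP timing dynamic can be illustrated with a toy update loop (placeholder time values and learning rate, not the model's actual parameters): the command shifts earlier under LTD while it arrives late, then oscillates around the target onset once LTP from early arrivals balances it.

    # Toy sketch of the LTD/LTP balance described above.
    t_target = 0.10  # onset (s) at which the corrective command is needed
    t_cmd = 0.25     # initial onset, delayed by sensory feedback latency
    shift = 0.02     # per-trial timing shift (placeholder learning rate)

    for trial in range(12):
        if t_cmd > t_target:
            # Late command -> climbing-fiber error -> LTD: the command
            # moves to an earlier point on the next attempt.
            t_cmd -= shift
        else:
            # Early command -> opposite-sign error -> LTP: the command
            # moves later, establishing the dynamic balance.
            t_cmd += shift
        print(f"trial {trial + 1:2d}: command onset = {t_cmd * 1000:5.1f} ms")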

In other cerebellar modeling work, we have elaborated a model for adaptively timed hand movement control, which shares many mechanisms with speech articulator control and is involved in sign language (Ulloa et al., 2003a,b). This work has been published in the journal Neural Networks, and a journal manuscript describing the speech coarticulation and feedforward control simulations is in preparation.

C.6. fMRI study of cerebellar, premotor and motor cortical contributions to speech in normal speakers and individuals with ataxic dysarthria. In order to investigate the role of the cerebellum in relation to the primary motor and premotor cortical areas during speech production, we conducted an fMRI experiment involving the production of simple speech sounds by neurologically normal subjects and individuals diagnosed with ataxic dysarthria, a speech motor impairment due to cerebellar damage. We examined the differences in brain activations during production of vowels (V), consonant-vowel syllables (CV) and two-syllable nonsense words (CVCV). Ten neurologically normal right-handed speakers of American English (NS group) and 5 subjects diagnosed with ataxic dysarthria (AD group) participated. The data were analyzed using the same methods as the previously described fMRI studies; results from normal subjects were presented previously in the top panel of Figure 3. Additional, more detailed ROI analyses of normal speaker data revealed superior lateral, superior medial and anterior medial cerebellum activity bilaterally for all utterance types (V, CV, CVCV) when compared to a baseline task involving no articulation. The deep cerebellar nuclei were also active in all conditions. We hypothesized that the cerebellar component of the feedforward command would be more important during consonant production, since consonants have stricter timing constraints than vowels and the cerebellum is well known to be involved in timing of motor commands (e.g., Perrett et al., 1993; Medina et al., 2000; Spencer et al., 2003). This hypothesis was supported by the finding of significantly more activity in the CVCV and CV conditions than the V condition in the right superior medial and anterior cerebellum and left superior lateral cerebellum. In the AD group, as expected, the cerebellar hemispheres were not active. We hypothesized that speakers with ataxic dysarthria rely more heavily on premotor cortex than normal speakers, particularly for consonant-heavy utterances. In support of this hypothesis, we observed bilateral activation of premotor cortex in the AD group, but only left-hemisphere premotor activity in normal subjects. This suggests that, in individuals with ataxic dysarthria, right-hemisphere premotor cortex may compensate for cerebellar loss. This research was presented at the 2003 Organization for Human Brain Mapping Meeting, and a journal article is in preparation.

C.7. Publications supported by this grant. The following publications were funded in significant part or in their entirety by the current grant (R01 DC02852) during the current funding period (2/1/2001-2/1/2006). This list includes 13 published or accepted journal articles, 4 book chapters, 1 journal commentary, 8 conference articles, 11 conference abstracts, 4 technical reports, and 2 Ph.D. dissertations. In the following, authors who were supported by the grant are marked with asterisks (*). As described in C.1-C.6, a number of additional journal manuscripts are in preparation; we plan to complete and submit these papers during the remaining year of the current funding period (2/1/2005-2/1/2006).

Callan, D.E., Honda, K., Masaki, S., Kent, R.D., Guenther*, F.H., and Vorperian, H.K. (2001). Robustness of an auditory-to-articulatory mapping for vowel production by the DIVA model to subsequent developmental changes in vocal tract dimensions. ATR Technical Report TR-H-309. Kyoto, Japan: Advanced Telecommunications Research Institute.

Guenther*, F.H. (2001). Neural modeling of speech production. Proceedings of the 4th International Nijmegen Speech Motor Conference, Nijmegen, The Netherlands, June 13-16, 2001.

Guenther*, F.H. (2001). A neural model of cortical and cerebellar interactions in speech. Society for Neuroscience Abstracts.

Guenther*, F.H., Nieto-Castanon*, A., Tourville*, J.A., and Ghosh*, S.S. (2001). The effects of categorization training on auditory perception and cortical representations. Proceedings of the Speech Recognition as Pattern Classification (SPRAAC) Workshop, Nijmegen, The Netherlands, July 11-13, 2001.

Ghosh*, S., Nieto-Castanon*, A., Tourville*, J., and Guenther*, F. (2001). ROI-based analysis of fMRI data incorporating individual differences in brain anatomy. Proceedings of the 7th Annual Meeting of the Organization for Human Brain Mapping, Brighton, UK.

Micci Barreca, D., and Guenther*, F.H. (2001). A modeling study of potential sources of curvature in human reaching movements. Journal of Motor Behavior, 33, pp. 387-400.

Perkell, J.S., Guenther*, F.H., Lane, H., Matthies, M., Vick, J., and Zandipour, M. (2001). Planning and auditory feedback in speech production. Proceedings of the 4th International Nijmegen Speech Motor Conference, Nijmegen, The Netherlands, June 13-16, 2001.

Guenther*, F.H. (2002). Effects of category learning on auditory perception and cortical maps. Program of the 143rd Meeting of the Acoustical Society of America, Journal of the Acoustical Society of America, 111(5) Pt. 2, p. 2383.

Guenther*, F.H., and Bohland*, J.W. (2002). Learning sound categories: A neural model and supporting experiments. Acoustical Science and Technology, 23(4), pp. 213-220. Japanese-language version appeared in Journal of the Acoustical Society of Japan, 58(7), pp. 441-449, July 2002.

Rhodes, B. and Bullock*, D. (2002). Neural dynamics of learning and performance of fixed sequences: Latency pattern reorganizations and the N-STREAMS model. Boston University Technical Report CAS/CNS-02-007. Boston: Boston University.

Ghosh*, S.S., Bohland*, J., and Guenther*, F.H. (2003). Comparisons of brain regions involved in overt production of elementary phonetic units. Proceedings of the 9th Annual Meeting of the Organization for Human Brain Mapping, New York.

Guenther*, F.H. (2003). Introductory remarks on neural modeling in speech perception research. Program of the 145th Meeting of the Acoustical Society of America, Journal of the Acoustical Society of America, 113(4) Pt. 2, p. 2209.



Guenther*, F.H. (2003). Neural control of speech movements. In: A. Meyer and N. Schiller (eds.), Phonetics and Phonology in Language Comprehension and Production: Differences and Similarities. Berlin: Mouton de Gruyter.

Guenther*, F.H. and Ghosh*, S.S. (2003). A model of cortical and cerebellar function in speech. Proceedings of the XVth International Congress of Phonetic Sciences. Barcelona: 15th ICPhS Organizing Committee.

Guenther*, F.H., Ghosh*, S.S., and Nieto-Castanon*, A. (2003). A neural model of speech production. Proceedings of the 6th International Seminar on Speech Production, Sydney, Australia.

Guenther*, F.H., and Perkell*, J.S. (2003). A neural model of speech production and its application to studies of the role of auditory feedback in speech. In: B. Maassen, R. Kent, H. Peters, P. Van Lieshout, and W. Hulstijn (eds.), Speech Motor Control in Normal and Disordered Speech. Oxford: Oxford University Press.

Guenther*, F.H., Tourville*, J.A., and Bohland*, J. (2003). Modeling the representation of speech sounds in auditory cortical areas. Program of the 145th Meeting of the Acoustical Society of America, Journal of the Acoustical Society of America, 113(4) Pt. 2, p. 2210.

Hampson*, M., Guenther*, F.H., Cohen, M.A., and Nieto-Castanon*, A. (2003). Changes in the McGurk Effect across phonetic contexts. Boston University Technical Report CAS/CNS-TR-03-006. Boston: Boston University.

Max, L., Gracco, V.L., Guenther*, F.H., Ghosh*, S.S., and Wallace, M. (2003). A sensorimotor model of stuttering: Insights from the neuroscience of motor control. In A. Packman, A. Meltzer, & H.F.M. Peters et al. (Eds.), Proceedings of the 4th World Congress on Fluency Disorders. Nijmegen, The Netherlands: University of Nijmegen Press.

Nieto-Castanon*, A., Ghosh*, S.S., Tourville*, J.A., and Guenther*, F.H. (2003). Region-of-interest based analysis of functional imaging data. NeuroImage, 19, pp. 1303-1316.

Nieto-Castanon*, A., and Guenther*, F.H. (2003). A model of auditory cortical representations underlying speech perception and production. Society for Neuroscience Abstracts.

Tourville*, J.A. and Guenther*, F.H. (2003). A cortical and cerebellar parcellation system for speech studies. Boston University Technical Report CAS/CNS-03-022. Boston, MA: Boston University.

Ulloa, A., Bullock*, D., and Rhodes, B. (2003a). A model of cerebellar adaptation of grip forces during lifting. Proceedings of the IJCNN, 4, pp. 3167-3172.

Ulloa, A., Bullock*, D., and Rhodes, B. (2003b). Adaptive force generation for precision-grip lifting by a spectral timing model of the cerebellum. Neural Networks, 16, pp. 521-528.

Bohland*, J.W. and Guenther*, F.H. (2004). An fMRI investigation of the neural bases of sequential organization for speech production. Proceedings of the 10th Annual Meeting of the Organization for Human Brain Mapping, Budapest, Hungary.

Brown, J., Bullock*, D., and Grossberg, S. (2004). How laminar frontal cortex and basal ganglia circuits interact to control planned and reactive saccades. Neural Networks, 17, pp. 471-510.

Bullock*, D. (2004a). Adaptive neural models of queuing and timing in fluent action. Trends in Cognitive Sciences, 8, pp. 426-433.

Bullock*, D. (2004b). From parallel sequence representations to calligraphic control: A conspiracy of adaptive neural circuit models. Motor Control, 8, pp. 371-391.

Ghosh*, S.S. (2004). Understanding cortical and cerebellar contributions to speech production through modeling and functional imaging. Boston University Ph.D. Dissertation. Boston, MA: Boston University.

Guenther*, F.H., Nieto-Castanon*, A., Ghosh*, S.S., and Tourville*, J.A. (2004). Representation of sound categories in auditory cortical maps. Journal of Speech, Language, and Hearing Research, 47, pp. 46-57.

Guenther*, F.H., and Perkell, J.S. (2004). A neural model of speech production and supporting experiments. Proceedings of From Sound to Sense: Fifty+ Years of Discoveries in Speech Communication, Cambridge, MA.

Max, L., Guenther*, F.H., Gracco, V.L., Ghosh*, S.S., and Wallace, M.E. (2004). Unstable or insufficiently activated internal models and feedback-biased motor control as sources of dysfluency: A theoretical model of stuttering. Contemporary Issues in Communication Science and Disorders, 31, pp. 105-122.

Nieto-Castanon*, A. (2004). An investigation of articulatory-acoustic relationships in speech production. Boston University Ph.D. Dissertation. Boston, MA: Boston University.

Rhodes, B.J., Bullock*, D., Verwey, W.B., Averbeck, B.B., and Page, M.P.A. (2004). Learning and production of movement sequences: Behavioral, neurophysiological, and modeling perspectives. Human Movement Science, 23, pp. 683-730.



Tourville*, J.A., Guenther*, F.H., Ghosh*, S.S., and Bohland*, J.W. (2004). Effects of jaw perturbation on cortical activity during speech production. Program of the 148th Meeting of the Acoustical Society of America, Journal of the Acoustical Society of America, 116, p. 2631.

Yoo, J.J., Guenther*, F.H., and Perkell, J.S. (2004). Cortical networks underlying audio-visual speech perception in normally hearing and hearing impaired individuals. Program of the 148th Meeting of the Acoustical Society of America, Journal of the Acoustical Society of America, 116, p. 2524.

Civier*, O., and Guenther*, F.H. (2005). Simulations of feedback and feedforward control in stuttering. Abstracts of the Oxford Dysfluency Conference, St. Catherine's College, Oxford, June 29-July 2, 2005.

Dessing, J.C., Peper, C.E., Bullock*, D., and Beek, P.J. (in press). How position, velocity and temporal information combine in prospective control of catching: Data and model. Journal of Cognitive Neuroscience.

Nieto-Castanon*, A., Guenther*, F.H., Perkell*, J.S., and Curtin, H. (2005). A modeling investigation of articulatory variability and acoustic stability during American English /r/ production. Journal of the Acoustical Society of America, 117, pp. 3196-3212.

Guenther*, F.H., Ghosh*, S.S., and Tourville*, J.A. (2005). Neural modeling and imaging of the cortical inter-actions underlying syllable production. Brain and Language, E-print ahead of publication.

Guenther*, F.H., Ghosh*, S.S., Nieto-Castanon*, A., and Tourville*, J.A. (in press). A neural model of speech production. In: J. Harrington & M. Tabain (eds.), Speech Production: Models, Phonetic Processes, and Techniques. London: Psychology Press.

Horwitz, B., Husain, F.T., and Guenther*, F.H. (in press). Auditory object processing and primate biological evolution: Commentary to Arbib’s “From monkey-like action recognition to human language”. Behavioral and Brain Sciences.

Perkell, J.S., Guenther*, F.H., Lane, H., Marrone, N., Matthies, M., Stockmann, E., Tiede, M. and Zandipour, M. (in press). Production and perception of phoneme contrasts covary across speakers. In: J. Harrington & M. Tabain (eds.), Speech Production: Models, Phonetic Processes, and Techniques. London: Psychology Press.

D. RESEARCH DESIGN AND METHODS

The proposed research consists of a combination of functional brain imaging, psychophysical studies, and computational neural modeling to investigate the neural substrates of auditory feedback control in speech production. These studies are organized around the neural models described in Sections C.1 (the DIVA model) and C.3 (the auditory map model). For the sake of clarity, the basic fMRI methods are described first, followed by descriptions of the three subprojects that make up the project.

fMRI experimental protocol. All fMRI sessions will be carried out on a 3 Tesla Siemens scanner at the Massachusetts General Hospital NMR Center. Prior to functional runs, a high-resolution structural image of the brain is collected. This structural image serves as the basis for localizing fMRI activations. The fMRI experiment parameters will be based on the sequences available at the time of scanning[10]. The faculty and research staff at MGH, together with engineers from Siemens, continuously develop and test pulse sequences that optimize T1 and BOLD contrast while providing maximum spatial and temporal resolution for the installed Siemens scanners (Allegra, Sonata and Trio). A potential problem with using fMRI for speech production is that artifacts arise in images collected while the speech articulators are moving, due to changes in the size of the oral cavity (e.g., Munhall, 2001). Furthermore, the fMRI experiments proposed herein involve real-time perturbation of the subject's acoustic signal while speaking; this type of perturbation is currently not possible in the presence of scanner noise since our digital signal processing system cannot reliably track important aspects of the speech signal when it is masked by scanner noise. To avoid these potential problems, we utilize an event-triggered paradigm (schematized in Fig. 9) in which the scanner is triggered to collect 2 whole-brain volumes of images starting approximately 4 seconds[11] after the subject has finished producing the stimulus for a particular trial (total inter-trial interval (ITI) of 10-14 seconds[12]). The scanner is silent during stimulus presentation and speech. Because the blood oxygen level changes induced by the stimulus persist for many seconds, this technique allows us to measure activation changes while avoiding scanner noise during stimulus presentation. Data analysis will correct for summation of blood oxygen level across trials using a general linear model (including correction for the effects of the scanner noise during the previous trial). Each session will consist of approximately 4-6 functional runs of 10-15 minutes each. During a run, stimuli will be presented in a pseudo-random sequence. For each experiment, the task(s) and stimulus type(s) are carefully chosen to address the aspect of speech being studied; these tasks and stimuli are described in the subproject descriptions in Sections D.1-D.3. We have developed software to allow us to send event triggers to the scanner and to analyze the resulting data, and we have successfully used this protocol to measure brain activity during speech production and perception in a number of previous studies (see Section C).

[10] Current scanning parameters are as follows. T1-weighted high resolution anatomical scans: voxel size 1.33mm sagittal x 1mm coronal x 1mm axial. T2-weighted Echo Planar Imaging (EPI) functional scans: 16 slices per second, matrix size 64x64, field of view (FOV) 200x200mm, in-plane resolution 3.125 x 3.125mm. Our experience has shown that a scan volume of 200mm x 200mm x 150mm (e.g., 37 slices of 4.05mm thickness) is sufficient to cover the entire brain. Slices are acquired in an interleaved manner and the volume acquisition time is approximately 2 seconds.
[11] The 4-second offset allows us to scan near the peak of the hemodynamic response (e.g., Le et al., 2001).
[12] This ITI range, which we have now successfully used in several studies, was chosen to optimize the signal-to-noise ratio of the resulting data (a smaller ITI allows more trials per subject), as determined from pilot runs.

Fig. 9. Timeline for a single trial in our fMRI protocol. Subject speaks the stimulus out loud during stimulus presentation. HR = estimated hemodynamic response; TA = time of acquisition of brain volume scans.

Effective connectivity analysis. Whereas commonly used voxel-based analyses of fMRI data rely on the notion of functional specialization, the brain, as well as the neural model proposed herein, is a connected structure in a graph-theoretic sense, and connections between specialized regions bring about functional integration (see e.g. Horwitz et al., 2000; Friston, 2002). Functional integration, or the task-specific interactions between brain regions, can be assessed through various network analyses that measure effective connectivity. In the current proposal, we will use structural equation modeling (SEM) to examine these interactions. SEM is the most widely used method for making effective connectivity inferences from fMRI (Penny et al., 2004), and benefits from a large literature regarding its application to neuroimaging (e.g. McIntosh & Gonzalez-Lima, 1994a; McIntosh et al., 1994b; Büchel & Friston, 1997; Bullmore et al., 2000; Mechelli et al., 2002) as well as its general theory (see e.g. Bollen, 1989). This method requires the specification of a causal, directed anatomical (structural) model, and estimates path coefficients (strength of influence) for each connection in the model that minimize the difference between the measured inter-regional covariance matrix and that implied by the model. We will utilize a single characteristic time-course of the BOLD response from each region-of-interest (ROI) corresponding to a component of our model in the SEM calculations. There exists a natural correspondence between structural equation models and neural network models, as both are specified by connectivity graphs and connection strengths. We can specify the connectivity structure in both models identically (based on known anatomy from primate studies, diffusion tensor imaging studies, etc.), and directly compare the resulting inferred path coefficients with the connectivity in the model. In both cases, inter-regional interaction may be dynamic in the sense that the activity in one region may be driven by different regions (or in different proportions) in varying tasks and contexts; likewise, learning may result in the strengthening or weakening of effective connections (e.g. Büchel et al., 1999).
To assess the overall goodness of fit of the SEM we will use the χ2 statistic corresponding to a likelihood ratio test. If we are unable to obtain proper fits using our theoretical structural models (i.e. P(χ2) < 0.05), we will consider this evidence that the connectivity structure is insufficient, and we will develop and test alternative models. To make inferences about changes in effective connectivity due to task manipulations or learning (see Sections D.1 and D.2 for details), we will utilize a "stacked model" approach; this consists of comparing a 'null model', in which path coefficients are constrained to be the same across conditions, with an 'alternative model', in which the coefficients are unconstrained. A χ2 difference test will be used to determine whether the alternative model provides a significant improvement in the overall goodness-of-fit. If so, the null model can be rejected, indicating that effective connectivity differed across the conditions of interest.
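
As a concrete illustration of the stacked-model comparison (the fit statistics below are hypothetical placeholders; scipy is used only for the χ2 tail probability):

    from scipy.stats import chi2

    # Hypothetical goodness-of-fit statistics: the null model constrains
    # path coefficients to be equal across conditions; the alternative
    # model frees them (more parameters, fewer degrees of freedom).
    chi2_null, df_null = 61.4, 40
    chi2_alt, df_alt = 44.2, 32

    # Chi-squared difference test: under the null hypothesis of no
    # connectivity change, the difference in fit statistics is chi-squared
    # distributed with df_null - df_alt degrees of freedom.
    delta_chi2 = chi2_null - chi2_alt
    delta_df = df_null - df_alt
    p = chi2.sf(delta_chi2, delta_df)
    print(f"delta chi2 = {delta_chi2:.1f} on {delta_df} df, p = {p:.4f}")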

The remainder of this section describes the three subprojects. The first two subprojects investigate the auditory cortical representations underlying feedback-based control of prosodic (Section D.1) and segmental (Section D.2) aspects of speech production. The third subproject (Section D.3) investigates interactions between the feedback-based control system investigated in D.1 and D.2 and feedforward control mechanisms.

D.1 Modeling, neuroimaging, and psychophysical investigation of the neural bases of prosodic control

The primary objective of this subproject is to identify and model the neural mechanisms responsible for the control of prosody in speech production. We propose 7 psychophysical experiments, 4 fMRI experiments, and a modeling project that investigate key issues regarding linguistic prosody. The psychophysical and fMRI experiments involve real-time perturbation of prosodic cues during speech in order to probe the feedback and feedforward control mechanisms involved in prosodic control. The results of the psychophysical and fMRI experiments will be used to guide development of a neural model of prosodic control.

Theoretical background. Acoustic cues that carry prosodic information include fundamental frequency (F0), perceived as pitch; amplitude, perceived as loudness; and duration, perceived as length (Cutler et al., 1997; Shriberg & Kent, 2003; Shattuck-Hufnagel & Turk, 1996). Each prosodic cue has multiple linguistic and communicative roles. In some instances the cues work together, and in others one cue is predominant. For example, a rising terminal pitch contour is used to signal yes/no questions. In contrast, increased loudness, duration and pitch are often used to indicate word stress. F0 and duration are thought to be the main carriers of linguistic and pragmatic information (Fry, 1958; Pierrehumbert, 1980). Others have argued that intensity and vocal effort are also highly informative cues for detecting linguistic stress and emphasis (Denes, 1959; Denes & Milton-Williams, 1962; Fry, 1955). An important open question regarding prosodic control is the degree to which the different prosodic cues are controlled independently or in an integrated fashion. This question will be addressed in the psychophysical and fMRI experiments proposed below using a novel auditory perturbation paradigm, and the experimental results will guide refinement of the DIVA model.

In this proposal we focus on linguistic aspects of prosody, rather than affective prosody (cf. Streeter et al., 1983; Williams & Stevens, 1972). While the acoustic features used to signal linguistic contrasts and those used to signal affective states may be analogous, these two functions of prosody have largely been studied independently. It has been argued that the neural substrates for linguistic and affective prosody differ and that the subset of salient acoustic features also differs. For example, voice quality is an essential feature for signaling affective prosody, yet it is not thought to be salient for linguistic contrasts (Ladd et al., 1985; Lehiste, 1970; Shattuck-Hufnagel & Turk, 1996; Streeter et al., 1983; Williams & Stevens, 1972).

Linguistic prosody shapes speech at lexical, phrasal and discourse levels. Here we address lexical and phrasal prosody. In particular, we explore noun/verb lexical stress and contrastive stress at the phrasal level. Noun/verb distinctions such as PROtest vs. proTEST are signaled by increasing the pitch, loudness and duration of the stressed syllable noted in uppercase. Similarly, at the phrasal level stress can be used to contrast an alternative meaning of the utterance, e.g., "JOHN hid his key" (i.e., not Bill). Once again the stressed word is acoustically marked by a longer duration, a higher fundamental frequency and greater intensity compared to when it is unstressed (Bolinger, 1961; Lehiste, 1970; Morton & Jassem, 1965).

As described in Section B, some researchers have suggested that linguistic prosody primarily involves left hemisphere mechanisms (Emmorey, 1987; Walker et al., 2004), while others suggest significant involvement of the right hemisphere (Riecker et al., 2002) or both hemispheres (Mildner, 2004). Simple hemispheric models of prosody try to break prosodic function into left- vs. right-hemisphere functions (for a review see Sidtis & Van Lancker Sidtis, 2003). One such model suggests that linguistic prosody is primarily processed in the left hemisphere, while emotional/affective prosody is primarily processed by the right (e.g., Van Lancker, 1980). Additional proposals include the suggestion that the left hemisphere is particularly involved in segmental and word-level prosody as opposed to sentence-level prosody (e.g., Baum & Pell, 1999). In their "dynamic dual pathway" model of auditory language comprehension, Friederici and Alter (2004) posit that the left hemisphere primarily processes syntactic and semantic information, whereas the right hemisphere primarily processes sentence-level prosody (see also Boutsen & Christman, 2002). A different type of hemispheric model suggests that the difference between hemispheres has to do with acoustic processing: the left hemisphere is specialized for the processing of timing (rate), whereas the right hemisphere is specialized for pitch processing (see Sidtis & Van Lancker Sidtis, 2003). A variant of this view suggests that the left hemisphere is specialized for analyzing sounds using a short temporal window, while the right hemisphere preferentially processes sounds using a longer temporal window (Poeppel, 2003). This view is supported by observations that left hemisphere damage seems to impair prosodic processing based on timing cues more than pitch cues (e.g., Van Lancker & Sidtis, 1992), whereas right hemisphere damage does not seem to impair processing of timing cues (e.g., Pell, 1999). Furthermore, the right hemisphere has often been implicated in perceptual processing of pitch (Sidtis & Volpe, 1988; Zatorre, 1988; Sidtis & Feldmann, 1990), and right hemisphere damage has been implicated in pitch production impairments (Sidtis, 1984), sometimes in the absence of a pitch perception deficit. Severe left hemisphere damage appears to typically spare the ability to perceive complex pitch (Sidtis & Volpe, 1988; Zatorre, 1988) and to manipulate pitch in singing (Sidtis & Van Lancker Sidtis, 2003).

The proposed experiments will build upon previous auditory perturbation studies, most of which have involved perturbation of pitch without regard to its specific role as a linguistic cue in speech (see Section B). It has been hypothesized that control of F0 at least in part involves feedback control mechanisms (e.g., Elliot & Niemoeller, 1970). Hain et al. (2001) determined that delaying the auditory feedback of a subject's compensatory response to a pitch shift causes an increase in the duration of the initial response peak. This result was interpreted as strongly supporting the use of a closed-loop, negative feedback system for the control of F0 (see also Larson et al., 2000, 2001). Such a system is utilized in the auditory feedback control portion of the DIVA model as described in Section C.1, although prosodic cues are not currently controlled in the model.
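
A minimal discrete-time sketch of such a closed-loop negative feedback controller (arbitrary gain and values; not a claim about actual controller parameters) shows the compensatory response to a sustained downward pitch shift:

    # Toy closed-loop F0 control: the controller corrects a fraction of
    # the auditory error on each step (gain and values are placeholders).
    f0_target = 200.0     # intended pitch (Hz)
    f0_produced = 200.0
    perturbation = -20.0  # apparatus shifts perceived pitch down 20 Hz
    gain = 0.4            # fraction of the error corrected per step

    for step in range(8):
        perceived = f0_produced + perturbation  # what the subject hears
        error = f0_target - perceived           # auditory error signal
        f0_produced += gain * error             # corrective command
        print(f"step {step + 1}: produced F0 = {f0_produced:.2f} Hz")

Produced F0 rises toward 220 Hz, i.e., the controller raises output until the perceived pitch matches the target.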

The perturbation studies described thus far involve the use of closed-loop feedback control mechanisms to compensate for unpredictable changes in pitch. Additional studies indicate that sustained (and thus predictable) perturbations of the acoustic signal during speech can lead to sensorimotor adaptation, wherein compensatory responses continue after feedback is returned to normal. These residual "compensatory" responses result in incorrect-sounding productions for the first few trials after the perturbation is removed. This result is indicative of reorganization of sensory-motor neural mappings, rather than closed-loop feedback control. Houde and Jordan (1998) demonstrated sensorimotor adaptation in speech by perturbing the first two formant frequencies of the speech signal while subjects whispered one-syllable words. Over many trials, subjects modified their productions to compensate for this perturbation. This compensation continued after the perturbation was removed; i.e., the subjects overshot the vowel formant targets in the direction opposite to the now-removed perturbation. Adaptation effects have also been demonstrated with pitch-shifted speech (Jones & Munhall, 2000). Most studies have focused on changes in the acoustic signal; however, Max et al. (2003) showed adaptation in both articulator movements and acoustics in response to shifts in formant frequencies.
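
The DIVA account of these adaptation effects, detailed in the next paragraph, can be previewed with a toy one-dimensional sketch (hypothetical gains and learning rate): the feedforward command gradually absorbs the feedback corrections, and an after-effect remains when the perturbation is removed.

    # Toy 1-D sensorimotor adaptation: the feedforward command slowly
    # incorporates the corrections issued by the feedback controller.
    target = 700.0   # desired auditory value (e.g., F1 in Hz)
    fb_gain = 0.5    # within-trial feedback correction gain (placeholder)
    lr = 0.2         # rate of incorporation into the feedforward command

    def run_phase(ff, n_trials, shift, label):
        for _ in range(n_trials):
            heard = ff + shift                     # (perturbed) feedback
            ff += lr * fb_gain * (target - heard)  # feedforward update
        print(f"{label}: feedforward command = {ff:.1f} Hz")
        return ff

    ff = run_phase(700.0, 30, shift=+100.0, label="end of perturbation phase")
    ff = run_phase(ff, 5, shift=0.0, label="shortly after perturbation removed")

The command adapts downward during the upward shift and remains displaced for the first unperturbed trials, the after-effect signature of remapping rather than pure closed-loop control.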

In the DIVA model, sustained perturbation of auditory feedback will lead to reorganization of the feedforward commands for speech sounds. This occurs because the model's feedforward control subsystem constantly monitors the corrective commands generated by the feedback control subsystem, gradually incorporating repeatedly occurring corrections into the feedforward command (see Section C.2; Guenther et al., 2005; Villacorta, 2005). If the feedback perturbation is then removed, the first few non-perturbed productions will still show "compensation", as in the sensorimotor adaptation experiment described in Section C.2. The model thus contains the basic mechanisms necessary to account for both closed-loop feedback control (as in the unpredictable pitch shift experiments of Larson and colleagues) and sensorimotor adaptation (as in Houde & Jordan, 1998, and Jones & Munhall, 2000) of segmental aspects of speech. Here we propose to extend the model to include the control of prosodic cues in addition to segmental cues.

Methods and hypotheses. The proposed methods involve tightly integrated modeling, psychophysical, and fMRI studies. For purposes of exposition, the model implementation is described first, followed by descriptions of the proposed experiments, concluding with a description of how the modeling work will be integrated with the experimental studies.

Model implementation. Currently the DIVA model does not address the control of prosody; instead it focuses on segmental aspects of speech production. For example, the model does not explicitly control pitch; the pitch profile is specified by the modeler. In the proposed project we plan to extend the model to include neural mechanisms for controlling pitch, duration, and loudness to indicate stress in simple utterances.

Fig. 10 schematizes two hypothetical architectures for feedback control of prosodic cues. These schemes are based in part on the DIVA model circuitry for controlling formant frequencies. In the Integrated Model, cells in higher-order auditory cortex compare a target stress level to the perceived stress level to compute a "stress error", which is then transmitted to controllers for pitch (P), duration (D), and loudness (L). In the Independent Channel Model, errors are calculated for pitch, duration, and loudness separately, with each error projecting to the corresponding controller.

The transfer of informational cues between prosodic features has been referred to as cue trading (Howell, 1993; Lieberman, 1960). It has been shown that even though different speakers may use different combinations of prosodic cues to indicate stress, listeners are able to reliably identify stress (Howell, 1993; Patel, 2003, 2004; Peppé et al., 2000). For example, some speakers may rely more on duration than pitch or loudness to indicate stress, while others may use pitch or loudness more, and naïve listeners appear to be able to leverage this phenomenon to perceive stress even in cases of severely dysarthric speech (Patel, 2002b). Such cross-speaker cue trading is consistent with both the Integrated and Independent Channel models[13]. However, the models make differential predictions regarding the effect of perturbing one of the cues for stress in real time during speech production. In the Integrated Model, if the model's perceived pitch level were perturbed downward while it was speaking a stressed syllable, it would compensate for the resulting Stress Error (Fig. 10, top) by increasing not only pitch but also loudness and duration. In the Independent Channel Model, however, the pitch perturbation would lead to a pitch error (P in bottom of Fig. 10), which will lead to an increase in pitch only, not loudness or duration. These predictions will be tested in the experiments proposed below.

[13] To explain cue trading across speakers in the Independent Channel model, one need only note that different speakers can use different combinations of P, D, and L targets for stress.

Fig. 10. Two models for feedback control of prosodic cues. [Diagram: in the Integrated Model (top), premotor cortex supplies a Stress Target, high-level auditory cortex computes a Stress Error from the Perceived Stress represented in low-level auditory cortex, and this single error is sent to the P, D, and L controllers in motor cortex. In the Independent Channel Model (bottom), premotor cortex supplies separate P, D, and L Targets, high-level auditory cortex computes separate P, D, and L errors, and each error projects only to its own controller.]
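
The differential prediction can be stated compactly in code (toy units; the gain and cue weights are placeholders, not fitted values). Under a downward pitch perturbation the Integrated Model spreads its correction across pitch, duration, and loudness, whereas the Independent Channel Model corrects pitch alone:

    import numpy as np

    cues = ["pitch", "duration", "loudness"]
    target = np.array([1.0, 1.0, 1.0])     # P, D, L stress targets
    perceived = np.array([0.7, 1.0, 1.0])  # pitch perturbed downward
    gain = 0.5

    # Integrated Model: one scalar stress error drives all three cues.
    stress_error = (target - perceived).sum()
    integrated = gain * stress_error * np.full(3, 1.0 / 3.0)

    # Independent Channel Model: each cue corrects only its own error.
    independent = gain * (target - perceived)

    for cue, i_corr, s_corr in zip(cues, integrated, independent):
        print(f"{cue:8s}: integrated {i_corr:+.3f} | independent {s_corr:+.3f}")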

We will create two versions of the DIVA model: one involving integrated control of stress (Fig. 10, top) and one involving independent control of pitch, duration, and loudness (Fig. 10, bottom). The models will be implemented using the Matlab programming environment on Windows and Linux workstations equipped with large amounts of RAM (4 Gigabytes) to allow manipulation of large matrices of synaptic weights, as well as other memory-intensive computations, in the neural network simulations. Model implementation involves the definition of mathematical equations representing the "activity levels" of model neurons (corresponding approximately to neural spiking frequency) as well as the synaptic connections between sets of neurons. Three new sets of neurons are needed in the model to account for prosody (see Fig. 10): premotor cortex neurons representing prosodic targets, low-level auditory cortex neurons representing the pitch, duration and loudness cues available in the acoustic signal, and high-level auditory cortex prosodic cue error cells. The equations representing these neural activities will be constrained to be in general accord with cell properties found in neurophysiological studies in primate auditory cortex (e.g., Bendor & Wang, 2005; Godey et al., 2005) and ventral premotor cortex (e.g., Ferrari et al., 2003; Rizzolatti et al., 2003). For the synaptic weights, we will use biologically plausible synaptic modification equations that rely only on pre- and post-synaptically available information for weight adjustment, as in our prior work (e.g., Guenther, 1994, 1995; Guenther et al., 1998). After equations for the cell activities and synaptic weights have been defined, computer code implementing these equations will be written, tested, and integrated with the existing DIVA model software framework.
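
For the synaptic modification step, a sketch of a learning rule that uses only pre- and postsynaptically available quantities (an instar-type rule in the spirit of the work cited above; layer sizes and learning rate are placeholders) is:

    import numpy as np

    rng = np.random.default_rng(2)

    n_pre, n_post = 50, 10  # placeholder layer sizes
    w = rng.uniform(0.0, 0.1, size=(n_post, n_pre))
    eta = 0.05              # placeholder learning rate

    def update_weights(w, pre, post, eta):
        # Each postsynaptic cell's incoming weights move toward the
        # presynaptic activity pattern, gated by that cell's own activity;
        # only locally available (pre/post) signals are used.
        return w + eta * post[:, None] * (pre[None, :] - w)

    pre = rng.random(n_pre)   # presynaptic activity pattern
    post = np.zeros(n_post)
    post[3] = 1.0             # a single active postsynaptic cell (toy case)
    w = update_weights(w, pre, post, eta)
    print(w[3, :5])           # weights of the active cell drift toward 'pre'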

After computer implementation, the different versions of the model will be simulated on the same speech tasks used in the experimental studies described below, and the models’ adequacy in accounting for the experimental data will be analyzed. The details of this process are provided after the experimental descriptions.

Psychophysical experiments. The proposed psychophysical experiments investigate the adaptive responses of subjects to externally imposed perturbations of prosodic cues in their own speech. We have successfully implemented real-time perturbation of auditory cues using a Texas Instruments (TI DSK 6713) digital signal processing board that is portable and can be used for both psychophysical and fMRI experiments. In addition to F1 perturbations (as in Section C.2; also Villacorta et al., 2004), we have successfully implemented, tested, and validated perturbations to pitch and intensity in pilot studies for this subproject, as well as real-time pitch tracking and syllable segmentation algorithms (see Fig. 11 below for pilot results).
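For the intensity cue, the real-time perturbation amounts to applying a gain to the subject’s auditory feedback during the stressed syllable. The Matlab fragment below sketches an offline equivalent of that operation (the file name, syllable boundaries, and shift size are hypothetical; real-time pitch perturbation additionally requires pitch-synchronous resynthesis on the DSP board):

    % Offline sketch of a +3 dB intensity perturbation on a stressed syllable.
    [x, fs]  = audioread('bob_caught_a_dog.wav');  % hypothetical recording
    shift_dB = 3;                                  % perturbation magnitude
    t0 = round(0.42*fs); t1 = round(0.71*fs);      % syllable boundaries (samples)
    y = x;
    y(t0:t1) = y(t0:t1) * 10^(shift_dB/20);        % convert dB to linear gain
    y = max(min(y, 1), -1);                        % guard against clipping
    sound(y, fs);                                  % play perturbed feedback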

The perturbation experiments are designed to clarify the degree to which the neural control mechanisms for different prosodic cues (pitch, duration, and loudness) are integrated (Fig. 10 top) or independent (Fig. 10 bottom). The psychophysical experiments will be performed and analyzed at Dr. Rupal Patel’s laboratory at Northeastern University (see letter of cooperation). Dr. Patel has experience conducting production and perception experiments on prosodic control in children and in healthy and dysarthric adults (e.g., Patel, 2002a,b, 2003, 2004). Her lab is equipped with a sound treated booth and software infrastructure for perceptual and acoustic analyses (see Resources page for Northeastern University subcontract for details). Subjects. All subjects will be monolingual speakers of American English between the ages of 18 and 55 with no known speech, language, or neurological disorders (older subjects will be excluded due to possible high frequency hearing loss). All subjects will be required to pass a hearing screening with thresholds at or below 25 dB in at least one ear at 500, 1000, 2000, and 4000 Hz. Power analysis. Following the methodology described in Zarahn and Slifstein (2001), we utilized the data from our previous sensorimotor adaptation experiment involving perturbation of the first formant frequency (see Section C.2) to obtain reference parameters from which to derive power estimations for the psychophysical studies proposed here. F1 measures from the last half of the training phase were used to obtain measures of within- and between-subject variability, and the difference between the average F1 of downward-perturbed subjects and that of upward-perturbed subjects was used as a reference effect size. The power estimate indicates that 12 subjects are enough to detect (with probability >.9 at a p<0.01 type I error level) an effect size 50% as large as the reference contrast in Production Experiments 1-4.
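For reference, a two-group sample-size estimate of this kind can be checked against the standard normal-approximation formula n = 2((z_alpha + z_power) * sigma / delta)^2 subjects per group; in Matlab (the variability and effect-size values below are placeholders, not the values from our reference dataset):

    % Sample-size check via the normal approximation (placeholder numbers).
    alpha = 0.01;                 % type I error level
    power = 0.90;                 % desired detection probability
    sigma = 30;                   % between-subject s.d. of the F1 measure (Hz)
    delta = 40;                   % effect to detect: 50% of reference size (Hz)
    n = 2*((norminv(1-alpha) + norminv(power)) * sigma / delta)^2;
    fprintf('subjects per group: %d\n', ceil(n));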

Production Experiment 1 will be a sensorimotor adaptation experiment in which phrase-level stress will be perturbed by decreasing (in half of the subjects) or increasing (in the other half of the subjects) the pitch of the stressed word as heard by the speaker. Each trial will involve the subject reading a four-word phrase with one stressed word (indicated by capital letters on the screen; e.g., Bob CAUGHT a dog). Phrases will be composed of monosyllabic words. Five phrases will be used, with three possible stress locations (on each of the content words) in each phrase. The stimuli are designed to meet several phonotactic and linguistic criteria. First, we designed stimuli for which the pattern of phonemes at adjacent word boundaries would permit automatic segmentation. For this reason we chose to begin each word with a stop consonant. Second, we held the vowel nucleus constant across key words (i.e., subject, verb, and object) within a sentence to control for differences in intrinsic vowel F0. To study differences in stress patterning for different vowels, we designed a sentence for each of the cardinal vowels and an additional sentence with the mid central vowel. Third, each sentence follows the same syntactic construction (subject verb article object). Fourth, only monosyllabic words are used to ensure approximately the same utterance duration in each phrase. Last, we piloted the stimuli to make sure that speakers could consistently produce the targeted stress.

We will use contextual sentences to elicit each stress location. The context sentence will be presented for approximately 2 seconds, followed by the target sentence, with the stressed syllable indicated by capital letters. For example, the context sentence “Who caught a dog?” would precede the test sentence “BOB caught a dog” on the screen. The specific stimuli for the experiment are as follows. Stimulus 1: Bob caught a dog [low back vowel; context sentences: Who caught a dog? What did Bob do to a dog? What did Bob catch?]. Stimulus 2: Bill bit a kid [high front vowel]. Stimulus 3: Doug cut a bud [mid central vowel]. Stimulus 4: Pat planned a dance [low front vowel]. Stimulus 5: Bush took a book [high back vowel]. Since each phrase will be produced with stress in three different locations, the experiment will involve 15 stimuli.

As in our earlier sensorimotor adaptation experiment (Section C.2), each experimental session will involve four phases: a baseline phase with no perturbation; a ramp phase during which the perturbation is added in increments; a training phase involving full perturbation on all stressed words; and a post-test phase with no perturbation. We will use a 5 s trial length and 65 epochs (where each epoch contains each utterance produced once) distributed over the four phases (15 baseline, 5 ramp, 25 training, and 20 post-test), for an overall experimental session length of approximately 75 minutes. The trial length and number of trials per stimulus in each phase are based on our previous study involving perturbations of the first formant frequency (Villacorta et al., 2004; Villacorta, 2005; see Section C.2). Analyses will look for compensation (indicated by significant differences in the training phase productions of upshifted vs. downshifted groups) and adaptation (indicated by significant differences in the post-test productions of the two groups) in intensity, duration, F0, F1[14], and a perceptual measure of stress (see Perception Experiment 1 below). We have piloted this paradigm and verified that the pilot subject compensated for downward shifts in F0 of the stressed syllable by increasing her F0 for that syllable during the ramp and training phases and showed after-effects (i.e., adaptation) during the post-test phase (see Fig. 11).
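To make the session structure concrete, the fragment below builds the epoch-by-epoch perturbation schedule implied by these phase counts (only the epoch counts come from the design; the shift magnitude and linear ramp shape are illustrative assumptions):

    % Perturbation magnitude applied to the stressed word, one value per epoch.
    n_base = 15; n_ramp = 5; n_train = 25; n_post = 20;   % 65 epochs total
    full_shift = -50;          % e.g., downward pitch shift (illustrative units)
    shift = [zeros(1,n_base), ...                               % baseline
             linspace(full_shift/n_ramp, full_shift, n_ramp), ... % incremental ramp
             full_shift*ones(1,n_train), ...                    % full perturbation
             zeros(1,n_post)];                                  % post-test
    assert(numel(shift) == 65);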

Hypothesis Testing. The Independent Channel Model of Figure 10 predicts that perturbation of F0 will lead to compensation in F0 only, not in duration or intensity. To test this prediction, we will look for compensatory changes in the duration, intensity, and F0 values of utterances in the training phase as compared to the baseline phase. The Independent Channel Model’s hypothesis will be supported if F0 is significantly different in the training phase in the direction opposite to the perturbation (p<0.05, one-tailed t test) and intensity and duration do not show a change (p>0.05, two-tailed t test). Further testing of this hypothesis will be carried out by analyzing Production Expts. 1 and 2 together, as described for Production Expt. 2 below. The Integrated Model of Fig. 10 predicts that, in addition to F0, duration and intensity should change in response to perturbation of F0; this prediction will be supported if all three cues show a significant change in the direction opposite the perturbation during the training phase (p<0.05, one-tailed t test). In order to test for adaptation (vs. compensation), we will analyze the post-training data (difference between upshifted and downshifted groups) using a likelihood ratio test to compare an exponential decay approximation (adaptation) to a null hypothesis characterized by an instantaneous return to baseline (no adaptation). In addition, we will test the DIVA model’s ability to account for adaptation and compensation in the experimental data by performing simulations of the model performing the same sensorimotor adaptation paradigm and performing statistical comparisons between the model’s productions and those of the experimental subjects as described in Integrating the Modeling and Experimental Studies below.
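The adaptation test can be sketched as follows in Matlab, assuming Gaussian residuals (the post-test data vector here is synthetic; in the actual analysis it would be the upshifted-minus-downshifted group difference at each post-test epoch):

    % Fit an exponential decay (adaptation) vs. instantaneous return (null).
    t = (1:20)';                          % post-test epochs
    d = 12*exp(-t/6) + randn(20,1);       % synthetic group-difference data
    sse   = @(p) sum((d - p(1)*exp(-t/p(2))).^2);   % decay model d = A*exp(-t/tau)
    p_hat = fminsearch(sse, [10; 5]);               % least-squares fit of [A; tau]
    ss1   = sse(p_hat);
    ss0   = sum(d.^2);                    % null model: d(t) = 0 for all t
    LR    = numel(d) * log(ss0/ss1);      % Gaussian likelihood ratio statistic
    p_val = 1 - chi2cdf(LR, 2);           % 2 df for the 2 extra parameters
    fprintf('LR = %.2f, p = %.4f\n', LR, p_val);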

Production Experiment 2 will be the same as Production Expt. 1 except intensity will be perturbed instead of F0. The experimental protocol, stimuli, analyses, and hypothesis tests will be the same as Expt. 1. We will also analyze the combined results from Experiments 1 and 2 to further test between the Independent Channel and Integrated models. The Integrated Model predicts that the amount of compensation in F0, intensity, and duration will be proportional in the two experiments (with differences between the experiments attributable to overall differences in the perceived stress perturbation), while the Independent Channel Model predicts additional differences indicated by an interaction between perturbation type and the measures of compensation. This will be tested using a repeated-measures ANOVA on the normalized measures of compensation (percentage difference between training and baseline phases on each of the three measures of interest).

[14] F1, which correlates with tongue height, can be used to estimate hyperarticulation.

Fig. 11. Results of pilot study indicating that the subject compensates for downward perturbation of F0 (evidenced by higher peak F0 values in the ramp and full perturbation phases vs. baseline) and shows after-effects when the perturbation is removed in the post-perturbation phase. Error bars indicate standard error.
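The repeated-measures ANOVA just described could be set up in Matlab along the following lines (a sketch assuming the Statistics Toolbox fitrm/ranova interface; the data matrix is a random placeholder):

    % Rows: 12 subjects from Expt. 1 (F0-perturbed) and 12 from Expt. 2
    % (intensity-perturbed); columns: normalized compensation in F0,
    % intensity, and duration (% difference, training vs. baseline).
    comp = randn(24,3);                                     % placeholder data
    pert = [repmat({'F0'},12,1); repmat({'Int'},12,1)];     % between factor
    t  = table(pert, comp(:,1), comp(:,2), comp(:,3), ...
               'VariableNames', {'PertType','cF0','cInt','cDur'});
    w  = table(categorical({'F0';'Int';'Dur'}), 'VariableNames', {'Measure'});
    rm = fitrm(t, 'cF0-cDur ~ PertType', 'WithinDesign', w);
    ranova(rm, 'WithinModel', 'Measure')  % PertType x Measure interaction

The PertType-by-Measure interaction is the quantity of interest: a significant interaction favors the Independent Channel Model, while proportional compensation (no interaction) favors the Integrated Model.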

Production Experiment 3 will investigate word-level stress by perturbing pitch downward/upward on the stressed syllable of two-syllable words. We will use 6 pairs of two-syllable words, where words in a pair are spelled the same but stressed differently (e.g., CONduct, conDUCT). The experiment will involve the same four phases as Expts. 1 and 2. Trial length will be 3 s; the number and distribution of trials, number of subjects, and analyses will be as in Expts. 1 and 2. Stimuli have the same spelling and similar vowels in the two different forms (noun and verb), as well as having each syllable start with a stop consonant or affricate to allow reliable automatic segmentation: CONtest/conTEST; CONduct/conDUCT; PROduce/proDUCE; CONtract/conTRACT; PROtest/proTEST; CONtrast/conTRAST.

Production Experiment 4 will be the same as Production Expt. 3 except intensity will be perturbed instead of pitch. The experimental protocol, stimuli, analyses, and hypothesis tests will be the same as Expt. 3.

In Perception Experiment 1, 12 listeners will rate the stressed syllables produced by subjects in Production Experiments 1 and 2. On each trial, the listener will be presented with a phrase produced by one of the subjects in Experiment 1 or 2. The listener will choose (on a graphical interface) which syllable was stressed and provide a confidence rating of how sure they were that this was the stressed syllable. The resulting perceptual measures will be analyzed in addition to intensity, F0, duration, and F1 for Production Experiments 1 and 2. To reduce the potential negative impact of variation in inter-rater and intra-rater reliability on the results, each listener will hear 1/12 of the utterances produced by every speaker in Experiments 1 and 2. We estimate that this procedure will take approximately one hour. In Perception Experiment 2, 12 listeners will perform the same task as in Perception Experiment 1 except that word-level stress in tokens from Production Experiments 3 and 4 will be rated. In Perception Experiment 3, 12 listeners will rate productions of the DIVA model after it has been run on Production Experiments 1-4. The rating task will be the same as in Perception Experiments 1 and 2. The results will be used along with measures of F0, F1, intensity, and duration to analyze the model productions and compare them to those of the human subjects.

fMRI Experiments. The aim of the fMRI experiments is to elucidate the neural circuits involved in the online (or feedback-based) control of prosodic cues. The studies are also designed to fill in gaps in the currently sparse neuroimaging literature on linguistic prosody. The basic fMRI protocol was described at the beginning of Section D. The studies involve unpredictable auditory perturbations to prosodic cues (F0 or intensity) during stressed syllables on 1 in 4 trials in which the subject produces a word or short phrase from the stimulus lists used for the psychophysical experiments. Comparison of perturbed to unperturbed trials will be used to identify brain areas involved in the detection and online (i.e., feedback-based) correction of auditory errors in prosodic cues, in a manner analogous to the F1 perturbation experiment described in Section C.2. Subjects. The subject selection criteria for the four fMRI experiments will be the same as for the psychophysical experiments described above. 18 subjects will be scanned in each experiment; this pool size was determined from the following power analysis. Power analysis. Following the methodology described in Zarahn and Slifstein (2001), we utilized the data from our auditory perturbation fMRI experiment described in Section C.2 to obtain reference parameters for deriving power estimations for the fMRI studies proposed here. Activation in the planum temporale for the perturbed speech – unperturbed speech contrast provided measures of within- and between-subject variability as well as a reference effect size of the SPM-derived general linear model parameters. The expected within-subject variability for our proposed studies was then computed from the reference value by using the number of conditions and stimulus presentations in the proposed studies compared to the same values for the reference study. For the four fMRI experiments proposed below, we estimate 60 minutes of total functional scanning time per subject (broken into six 10-minute runs) and a 12-second inter-trial interval (ITI), resulting in 300 total trials. 25 of these trials will be “baseline” trials, in which the subject simply rests while viewing “YYY” on the screen; this number provides sufficient power to detect the relatively large signal changes that occur in speech – baseline contrasts. Of the remaining trials, 1/4 (69 trials) involve an auditory perturbation (half perturbed upward, half downward), and 3/4 (206 trials) involve unperturbed speech. The resulting power estimate indicates that 18 subjects are enough to detect (with probability >.8) in a random-effects analysis (at a p<.05 type I error level) an effect size in the perturbed – unperturbed speech contrast that is 70% as large as the effect size of the reference contrast.

In fMRI Experiment 1, subjects will be presented with the same stimuli from the phrase-level production experiments (Production Experiments 1 and 2) and will produce these utterances in the scanner, one stimulus per trial. Downward perturbation of F0 for the stressed syllable will occur on 1/8 of the trials (randomly distributed); upward perturbation will occur on 1/8 of the trials; the remaining 3/4 will be unperturbed[15]. Hypothesis Test 1. Based on our previous experiment involving perturbation of F1 (Section C.2), we hypothesize that perturbation of F0 will activate auditory error cells located in the posterior superior temporal gyrus/sulcus. To test this hypothesis, we will perform a contrast (using the SPM software package) between the perturbed (both upward and downward) and unperturbed conditions (denoted as perturbed – unperturbed hereafter). This contrast will identify any statistically significant activity differences (random effects analysis, statistics controlled at a 0.05 false discovery rate (FDR)) between the two cases. The hypothesis will be supported if there is significant activity in the posterior superior temporal gyrus (including the planum temporale) or superior temporal sulcus in the perturbed – unperturbed contrast. The hypothesis will be rejected if no significant activity is found in these areas. If significant activity is found outside these areas in either the temporal or parietal lobes, we will interpret this as an alternative location for the F0 error cells. If significant activity is found in the motor and/or premotor cortical areas, this activity will be interpreted as constituting a motor corrective response to the perturbation. Hypothesis Test 2. To test between the different hemispheric models of prosodic processing described in the theoretical background above, we will perform a test for laterality on the perturbed – unperturbed contrast limited to regions where significant activation is found in either hemisphere. Specifically, we will perform a paired t-test within subjects across hemispheres of the average activation across the selected regions. If significantly greater activity (p<.05, random effects) is found in the left hemisphere, this will support models positing that linguistic prosody (as conveyed by F0) is primarily controlled by left hemisphere mechanisms (e.g., Emmorey, 1987; Walker et al., 2004; Van Lancker, 1980). If no significant difference is found, the results will support models positing significant involvement of both hemispheres in linguistic prosody (Mildner, 2004). If significantly more activity is found in the right hemisphere, this will support models positing pitch processing primarily in the right hemisphere (e.g., Friederici & Alter, 2004; Boutsen & Christman, 2002; Sidtis & Van Lancker Sidtis, 2003; Poeppel, 2003; Sidtis & Volpe, 1988; Zatorre, 1988; Sidtis & Feldmann, 1990).

[15] Acoustic analysis of the utterances produced in our previous auditory perturbation fMRI study (Section C.2) indicates that perturbing 1/4 of the utterances in this manner will not lead to significant sensorimotor adaptation by the subject (i.e., perturbation occurs infrequently enough that it remains “unexpected”).
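Once per-subject ROI averages have been extracted from the contrast images, the laterality test of Hypothesis Test 2 reduces to a paired comparison; e.g., in Matlab (the data vectors are placeholders):

    % Mean perturbed - unperturbed contrast value per subject, averaged over
    % the significant regions in each hemisphere (18 subjects).
    actL = randn(18,1) + 0.2;    % placeholder left-hemisphere values
    actR = randn(18,1);          % placeholder right-hemisphere values
    [~, p, ~, stats] = ttest(actL, actR);    % paired t-test across hemispheres
    fprintf('t(%d) = %.2f, p = %.3f, mean L-R = %.3f\n', ...
            stats.df, stats.tstat, p, mean(actL - actR));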

In addition to the voxel-based tests just described, we plan to perform structural equation modeling (SEM) analyses to test hypotheses concerning the signaling between brain regions (see Effective connectivity analysis above for methodological details). It should be noted, however, that the validity of effective connectivity methods is not as well established as the validity of voxel-based methods. Thus caution must be exercised in interpreting SEM results as falsifying or strongly supporting a particular hypothesis. Nonetheless, SEM can be expected to provide valuable supplemental information to the voxel-based hypothesis tests described above. Hypothesis Test 3. If auditory error cells are found, we expect larger effective connectivity between these cells and the motor/premotor face area during perturbed speech than during unperturbed speech. To test this hypothesis we will compare effective connectivity path strengths determined using SEM between the perturbed and unperturbed conditions, looking specifically at the path strength(s) between the auditory error cells and the motor/premotor face area. (See description of SEM methods in Effective connectivity analysis above.) A stronger path strength for the perturbed condition would support the model’s hypothesis that signaling between the two regions increases during perturbed speech as part of the auditory feedback control system.

fMRI Experiment 2 will be the same as fMRI Experiment 1 except that subjects will be producing the word-level stimuli from Production Experiments 3 and 4 rather than the phrase-level stimuli of Production Experiments 1 and 2. The number of trials of each stimulus type and statistical power estimate are the same as in fMRI Experiment 1. Hypothesis Tests 1-3 will be the same as in fMRI Experiment 1. In Hypothesis Test 4, comparison of the results of fMRI Experiment 1 to those of Experiment 2 will be used to test the hypothesis that left hemisphere mechanisms are more involved in segmental and word-level prosody, whereas right hemisphere mechanisms are responsible for sentence-level prosody (e.g., Baum & Pell, 1999; Friederici & Alter, 2004; Boutsen & Christman, 2002). This hypothesis will be supported if the perturbed – unperturbed contrast in fMRI Experiment 1 (phrase-level perturbation) shows significant right-lateralized activation while the same contrast in fMRI Experiment 2 (word-level perturbation) shows significant left-lateralized activation.

fMRI Experiments 3 and 4 will be the same as fMRI Experiments 1 and 2 except that intensity will be perturbed rather than F0. The number of trials of each stimulus type and statistical power estimate are the same as in fMRI Experiments 1 and 2. Hypothesis Tests 1, 2, and 3 will be the same as in fMRI Experiments 1 and 2, though it should be noted that, unlike for pitch, the current research literature does not make specific predictions regarding laterality of intensity processing independent of its role in prosodic control. In Hypothesis Test 4, the results of fMRI Experiments 1 and 3 will be compared to investigate the hypothesis that different mechanisms are involved in the control of the phrase-level prosodic cues intensity and F0. For example, if we find activation clusters in the superior temporal gyrus/sulcus in the perturbed – unperturbed contrast in the two experiments, and the stereotactic locations (and/or laterality) of these clusters do not overlap one another, we will conclude that intensity and F0 involve different auditory error cells. This would support the Independent Channel Model over the Integrated Model in Fig. 10. If the clusters do overlap significantly, however, and our behavioral results (Production Experiments 1-4) support the Integrated Model, we will conclude that the Integrated Model best accounts for our results. Hypothesis Test 5 will be the same as Hypothesis Test 4, except that differences in word-level prosodic mechanisms (rather than phrase-level) will be tested by comparing the results of fMRI Experiments 2 and 4.

In addition to fMRI data, behavioral data will be collected during the fMRI runs. As in our previous auditory perturbation study (Section C.2), these data will be used to determine whether subjects compensate within a perturbed trial, i.e., whether their F0 (or intensity or duration) in perturbed trials differs significantly from unperturbed trials in the direction opposite the perturbation (cf. Fig. 5) near the end of the syllable but not at the beginning of the syllable. These data will also be used to determine the latency of this compensation, if it is found. We will also look for correlations between our behavioral data (specifically, amount of compensation as measured acoustically) and the magnitude of the BOLD response in the perturbed – unperturbed contrast. Finally, the data will be quantitatively compared to the results of model simulations, as described below.

Integrating the Modeling and Experimental Studies. As described in Section C.1 and Guenther et al. (2005), the DIVA model is capable of learning a new word presented to it in the form of an acoustic signal. Furthermore, the model can produce sequences of words that it has learned; thus it can produce phrases as in Production Experiments 1 and 2. We propose to train the model to produce the words used as stimuli in Production Experiments 1-4, and then have the model produce the words (Production Experiments 3 and 4) or phrases (Production Experiments 1 and 2) in the same sensorimotor adaptation paradigm undergone by the experimental subjects in the corresponding experiment. For each experiment, the model’s productions will be analyzed in the same fashion as the human subject results and then compared to the human results as in our previous study (see Fig. 6). As described earlier, we will implement both an Integrated and an Independent Channel version of the DIVA model. Each of these models will be optimized in terms of two free parameters: the feedback control loop gain and the learning rate for updating the feedforward command. These parameters determine the amount of compensation and rate of adaptation in the model. The optimized version of each model will be simulated, and the goodness of fit (maximum likelihood χ2) of its acoustic cues (F0, intensity, duration, F1) to the time course of the same acoustic cues measured in Production Expts. 1-4 will be computed (cf. Fig. 6). The best fitting version of each model will be evaluated in terms of its χ2 statistic, as well as compared to the other using a likelihood ratio test. If both versions have shortcomings, the differences between the models’ performance and that of the subjects will be used to guide revisions to the model.
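The optimization itself can be as simple as a grid search over the two parameters, scoring each candidate by its chi-square statistic against the measured time course; a self-contained Matlab sketch (simulate_diva stands in for the actual model code, and the data vectors are placeholders):

    % Fit feedback gain and learning rate by grid search (illustrative only).
    data_F0 = zeros(65,1);  var_F0 = ones(65,1);   % measured mean and variance
    simulate_diva = @(g,r) zeros(65,1);            % stand-in for the model
    gains = 0:0.1:1;                               % loop gain candidates
    rates = logspace(-4,-1,10);                    % learning rate candidates
    best  = struct('chi2', Inf, 'gain', NaN, 'rate', NaN);
    for g = gains
        for r = rates
            sim  = simulate_diva(g, r);                 % model F0 per epoch
            chi2 = sum((sim - data_F0).^2 ./ var_F0);   % goodness of fit
            if chi2 < best.chi2
                best = struct('chi2', chi2, 'gain', g, 'rate', r);
            end
        end
    end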

The proposed fMRI experiments will also inform the model. First, they will test the model hypothesis that there are auditory error cells in the superior temporal gyrus/sulcus that will respond to perturbations of F0 and intensity (Hypothesis Test 1 in fMRI Experiments 1-4). Second, assuming auditory error cells are found, the fMRI experiments test whether auditory error cells for different aspects of the prosodic signal (specifically F0 and intensity) are coded by separate portions of cortex (see Hypothesis Tests 4 and 5 for fMRI Experiments 3 and 4 above). Third, the experiments will indicate the location in stereotactic space, as well as the laterality, of auditory error cells for the prosodic cues of intensity and F0. This information will be used to localize these cells in the model (cf. the left panel of Fig. 2). Finally, if activations are found outside the areas predicted by the model, we will expand the model’s description of the neural processes underlying auditory feedback control of prosodic cues to include these areas.

After the model has been refined in these ways, simulations of the model performing the same production tasks as the subjects in the fMRI experiments will be performed. The model’s response to perturbation, as indicated by the F0, intensity, and duration of its perturbed utterances as compared to its unperturbed utterances, will be compared to the behavioral results collected as part of the fMRI experiments (cf. Fig. 5). The model goodness of fit (mean square error) will be evaluated, and the model will be rejected if it fails to appropriately approximate the experimental results (likelihood ratio test comparing the model's mean square error to the inter-subject variability). We will also generate simulated fMRI data from the simulations as described in Section C.1 and compare them to the fMRI data obtained in fMRI Experiments 1-4. Thus far, our comparisons of the model to fMRI data have been largely qualitative. It should be noted that, although this is a shortcoming of the model, to our knowledge no other laboratory has performed quantitative comparisons of a computational model of speech production to fMRI experimental results, and few other models of system-level nervous system function make even qualitative predictions regarding fMRI data that are as specific and testable as those described herein. Furthermore, there are no standard methods for quantifying fits between a computational neural model and experimentally measured fMRI activations. As part of this project, we will investigate quantitative techniques for comparing model and human activations. For example, we will explore mutual information (e.g., Cover & Thomas, 1991; Maes et al., 1997) as a measure of match between experimental fMRI results and the simulated fMRI activations produced by the model. This measure describes the degree of agreement between two datasets in a way that is more robust than other comparable measures (such as correlation), and it has been extensively used for the coregistration of brain images (e.g., in the SPM2 fMRI analysis software package). Maximizing mutual information across different experimental conditions will be used as a quantitative objective guiding the refinement of the current model. We will also explore the use of non-parametric statistical methods such as the permutation test to obtain statistical confidence values for our model when pooling measures of fit across different experiments and conditions.

D.2 Phonetic and auditory representations of speech sounds in the auditory cortical areas

This subproject concerns the auditory representations responsible for auditory feedback-based control of the segmental aspects of speech. It complements Subproject D.1, which concerns auditory feedback-based control of prosodic cues, and extends our earlier work on auditory representations (Section C.3).

Background. An important property of the DIVA model is the use of an auditory perceptual reference frame for the planning of speech movements; this property is responsible for the model’s ability to account for many important aspects of speech production (e.g., Guenther et al., 1998, 1999; Nieto-Castanon et al., 2005). Currently, however, the model’s auditory representation is rather simplistic, consisting of only three dimensions derived from the first three formant frequencies of the acoustic spectrum. Such a representation is not supported by experimental studies of auditory cortex in animals; instead, primary auditory cortex contains a much more detailed representation of the acoustic spectrum, as well as of changes in the spectrum through time (e.g., frequency sweeps; Mendelson et al., 1993; Tian & Rauschecker, 1998), and neuroimaging studies suggest similar properties in humans (Hall et al., 2002; Hart et al., 2003). Our more detailed model of auditory cortical representations, the auditory map model described in Section C.3, is currently distinct from the DIVA model. Furthermore, neither the DIVA model nor the auditory map model adequately accounts for talker independent speech perception, a term used here to refer to the ability of an infant to infer the same phonological units whether spoken by a child, an adult female, or an adult male, despite large differences in the spectral characteristics of speech from these different speaker populations. A talker independent sound representation is crucial if an infant is to learn to use his/her vocal organs to imitate speech sounds produced by adults.

Hypotheses and Methods. The research in this subproject addresses these issues by refining our auditory map model and integrating it with the DIVA model. The modeling work will be guided by 2 fMRI experiments and an associated psychophysical experiment designed to answer key questions regarding speech sound representation in cerebral cortex.
Since our initial model has been described previously (Section C.3), the experiments will be described first, followed by the modeling project.

fMRI Experiments. Auditory perturbation will be used to investigate the auditory cortical representations utilized in feedback control of speech production in two fMRI experiments. Subjects. The subject selection criteria for the two fMRI experiments will be the same as for the psychophysical experiments described in Section D.1. 18 subjects will be scanned in each experiment; this pool size was determined from the following power analysis. A power analysis as described for the fMRI experiments in Section D.1 indicates that 18 subjects are enough to detect (with probability >.8) in a random-effects analysis (at a p<.05 type I error level) an effect size in a perturbed – unperturbed speech contrast that is 70% as large as the reference effect size.

fMRI Experiment 1: Within-category vs. between-category auditory perturbations during speech production. This experiment investigates the hypothesis that there are two types of error cells in the auditory cortical areas, currently treated as a single auditory error map in the DIVA model (Fig. 1): auditory error cells that are activated when any auditory difference exists between the intended sound and the current auditory feedback, and phonetic error cells that are only activated when the perceived sound is from a different phonetic category than the intended sound. We propose that the auditory error cells lie in the left planum temporale in the Sylvian fissure, while the phonetic error cells lie in the posterior superior temporal sulcus. These locations were activated in our previous auditory perturbation fMRI experiment (Fig. 4, top panel). Here we propose a refinement of that experiment that differentiates within-category auditory perturbations (which are expected to activate only one subset of the auditory error cells, those in the left planum temporale) from between-category perturbations (expected to activate cells in the superior temporal sulcus bilaterally as well as the left planum temporale). In our earlier experiment, subjects attempted to pronounce words they read off the screen, and their auditory feedback of their own speech was shifted such that the sound they heard themselves produce was usually from a different phonetic category than the intended production (e.g., hearing “bit” when saying “bet”). Here we propose to perturb subjects’ auditory feedback of their own speech in two ways: (i) within-category perturbation, and (ii) between-category perturbation. In both cases, the perturbation will be a shift of the first formant frequency of the same size. The within-category perturbation will be chosen to produce perturbed speech that is still perceived as being from the same category as the speaker’s intended utterance. For example, if the unperturbed word is “pet”, the perturbed version will still be classified as “pet” by the speaker, but with an unusual-sounding example of the vowel. The between-category perturbation will be of the same size in mels but will straddle a category boundary so it is heard as a different phoneme than the unperturbed speech (e.g., hearing “pit” instead of “pet”). To determine an appropriate perturbation magnitude for each subject, we will perform an associated psychophysical experiment involving each of the fMRI subjects prior to the fMRI session. First a recording will be made of the subject producing the words “bit” and “bet”. We will then produce up- and down-perturbed versions using perturbations of different sizes (e.g., 0, 50, 100, 150, and 200 mels). The sounds will then be presented to the same subject in an auditory identification experiment (each stimulus heard 5 times). In each trial the subject will simply identify whether he or she heard the word “beat”, “bit”, “bet”, or “bat”. In this way we can assign phonetic labels to each stimulus. This will allow us to determine, for each subject, a perturbation magnitude that, when applied to “bet” (or “bit”) in one direction (upward or downward), will cause the subject to hear a sound from a different phonetic category, but when the same size perturbation is applied in the other direction it leads to the perception of the same phonetic category. These perturbation sizes and directions will then be used in the fMRI experiment to ensure that the subject hears the same phoneme category for the within-category perturbation and a different phoneme category for the between-category perturbation. To control for possible effects due to perturbation direction independent of whether a category boundary is crossed, the fMRI experiment will involve 9 subjects for whom a downward shift is used for the within-category perturbation and 9 for whom an upward shift is used for this perturbation.
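Since the perturbation sizes are specified in mels but implemented as frequency shifts, the standard mel formula m = 2595 log10(1 + f/700) is used for the conversion; e.g., in Matlab (the F1 value is a placeholder):

    hz2mel = @(f) 2595*log10(1 + f/700);        % standard mel scale
    mel2hz = @(m) 700*(10.^(m/2595) - 1);       % inverse mapping
    F1 = 550;                          % subject's F1 for "bet" (Hz), placeholder
    F1_up   = mel2hz(hz2mel(F1) + 100);         % F1 shifted up by 100 mels
    F1_down = mel2hz(hz2mel(F1) - 100);         % F1 shifted down by 100 mels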

Hypothesis Test 1. According to our hypotheses, within-category perturbations should preferentially activate auditory error cells located in the left planum temporale. This hypothesis will be supported if the within-category perturbation – unperturbed speech contrast shows significant activity (random effects analysis, statistics controlled at a 0.05 FDR) in the left planum temporale, but no significant activity in other portions of the superior temporal gyrus/sulcus. Hypothesis Test 2. We further hypothesize that between-category perturbations should activate phonetic error cells located bilaterally in the superior temporal sulcus in addition to cells in the left planum temporale. This will be supported if the between-category perturbation – unperturbed speech contrast shows significant activity in these areas while the within-category perturbation – unperturbed speech contrast shows significant activity in only the left planum temporale. Hypothesis Test 3. An alternative hypothesis is suggested by a recent fMRI study (Dehaene-Lambertz et al., 2005) of auditory and speech perception. During a discrimination task, left supramarginal gyrus activation increased if the speech sounds presented were from different phonemic categories, but not when they were from the same category. Activity in bilateral superior temporal sulcus was not modulated by category membership during the same task. The Dehaene-Lambertz et al. hypothesis will be supported if we find significant activity in the supramarginal gyrus, but not the superior temporal sulcus, for the between-category perturbation – unperturbed speech contrast, while the supramarginal gyrus is not significantly active for the within-category perturbation – unperturbed speech contrast. Hypothesis Test 4. According to our model, the effective connectivity between auditory error cells (i.e., areas active in either of the perturbed – unperturbed speech contrasts) and the motor cortical face area should increase as a result of perturbation, since these pathways are used to generate motor commands that correct for the perturbation. This will be tested by looking for stronger SEM path strengths between these areas in the perturbed conditions as compared to the unperturbed condition.

fMRI Experiment 2: Talker-independent auditory representation of speech sounds. As mentioned above, a central hypothesis of the DIVA model is the existence of a talker-independent auditory representation that, for example, allows an infant to learn auditory targets from an adult despite large variations in the acoustic signals produced by these different-sized vocal tracts (e.g., Callan et al., 2000). This representation must include detailed information regarding speech-relevant auditory dimensions since it must be used to fine-tune the speaker’s own productions. However, this representation is invariant to changes in talker; that is, a man, woman, or child speaking the same word with the same accent should yield the same activation in this representation, unlike in lower levels of the auditory system such as primary auditory cortex, where spectral differences across talkers will lead to different activation patterns. This talker-independent auditory representation for speech production contrasts with a talker-independent phonetic representation for speech recognition (e.g., our model’s category map; see Fig. 7), which excludes auditory details and instead provides a categorical representation of speech sounds.
We hypothesize that the talker-independent auditory representation lies in the left planum temporale, in the speech perception/production interface area referred to as Spt by Hickok and colleagues (e.g., Hickok et al., 2003), whereas the talker-independent phonetic representation lies in anterior STS (cf. Mummery et al., 1999; Binder et al., 2000; Scott et al., 2000). These hypotheses lead to the following predictions. An auditory perturbation that alters important segmental cues in a speech signal (as in our F1 shift experiment described in Section C.2) should lead to auditory error signals in the talker-independent map in Spt. Such activity was found in our study described in C.2. In contrast, an auditory perturbation that does not significantly alter the segmental content of the speech signal, such as a shift in the pitch of the utterance, should not activate this region. Instead, such a perturbation may activate error cells in areas that process pitch and/or talker identity (cf. Belin & Zatorre, 2003; Kriegstein & Giraud, 2004). To test these predictions, we propose an fMRI experiment in which subjects’ auditory feedback of their own speech is unpredictably perturbed in two different ways on randomly distributed trials: (i) segmental perturbation, in which the first formant frequency is perturbed (as in our previous study[16] described in C.2), and (ii) non-segmental perturbation, in which the pitch is shifted as described for the prosodic experiments in Section D.1, except the shift is applied throughout an utterance rather than just on stressed syllables. This shift has minimal effect on the segmental aspects of the speech signal; it sounds correctly articulated and stressed, but with a different pitch (as if spoken by a different voice). We will normalize the perturbation sizes so that the RMS differences between perturbed and unperturbed speech are comparable for the segmental and non-segmental perturbations.

[16] The F1 perturbation condition is included again here (rather than relying on our earlier results) to allow direct comparison of the two types of perturbation in the fMRI analyses of this study. This type of comparison cannot be done across studies due to the many sources of variability across subjects, imaging sequences, etc.

Hypothesis Test 1. We expect that segmental perturbations will cause more activation (compared to unperturbed trials) in the left planum temporale (Spt) due to activation of auditory error cells in the talker independent auditory representation. The non-segmental perturbation is not expected to cause an increase in activation in this area since it only changes the perceived voice (talker), not the segmental or linguistic content. This hypothesis will be supported if we find significant activity in the left planum temporale for the segmental perturbation – unperturbed speech contrast, but not for the non-segmental perturbation – unperturbed speech contrast. Hypothesis Test 2. Based on previous studies involving changes in talker identity (e.g., Belin & Zatorre, 2003; Kriegstein & Giraud, 2004), we expect the non-segmental perturbation to cause increased activity in areas in the right superior temporal sulcus. This hypothesis will be supported if we find significant activity in the right superior temporal sulcus for the non-segmental perturbation – unperturbed speech contrast but not for the segmental perturbation – unperturbed speech contrast. Hypothesis Test 3. According to our model, effective connectivity between auditory error cells (i.e., areas active in either of the perturbed – unperturbed speech contrasts) and motor cortical areas should increase as a result of perturbation since these pathways are used to generate motor commands that correct for the perturbation. This will be tested by looking for stronger SEM path strengths between these areas in the perturbed conditions as compared to the unperturbed condition.

Modeling Project. The modeling project will consist of two stages. In the first stage, we will extend our auditory map model to include a talker independent auditory representation. We will incorporate cepstral normalization (Pelecanos & Sridharan, 2001) as well as vocal tract length normalization techniques (Cohen et al., 1995) that have been shown to produce channel and speaker normalization (Haeb-Umbach, 1999). We will also refine the model to handle voice onset time (VOT); currently we do not have cells that differentiate between voiced and unvoiced consonant segments. This work will be guided by existing theoretical models for the representation of talker independent properties (e.g., Fant, 1975; Miller, 1989; Hirsch et al., 1991; Huang & Lee, 1993; Aikawa et al., 1996; Gouvêa & Stern, 1997; Levin, 2002), and the exact anatomical locations (including hemispheric laterality) and firing properties of these cells will be based on the results of previous neurophysiological studies (e.g., Eggermont, 1995; Liégeois-Chauvel et al., 1999; Steinschneider et al., 2003; Wong et al., 2004) as well as our own fMRI experiments. Finally, the map will include the representation of prosodic cues we propose to develop in Section D.1. To verify the model’s performance, we will present the stimulus types used in the experiments cited above and compare the model’s response pattern for these stimuli to the neurophysiological data from those studies.
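As a concrete illustration of the first of these techniques, cepstral mean normalization removes any stationary talker/channel coloration by subtracting each coefficient's long-term average (a minimal sketch; the cepstrogram here is a random placeholder):

    % C: cepstrogram, one column of cepstral coefficients per analysis frame.
    C = randn(13, 200);          % placeholder: 13 coefficients x 200 frames
    % A fixed linear filtering of the spectrum (talker/channel coloration) is
    % additive in the cepstral domain, so subtracting the per-coefficient
    % temporal mean removes it.
    C_norm = C - repmat(mean(C,2), 1, size(C,2));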

In the second stage, the auditory map model will be embedded into the DIVA model by replacing the DIVA model’s current auditory state map (Fig. 1) with the three maps from the auditory map model (Fig. 6). This process involves modification of the DIVA code to be compatible with the new auditory representation, as well as verification of the updated code through computer simulation of simple utterances. The DIVA software includes over 10,000 lines of code. This code involves a complex combination of modules that implement the neural model itself, the graphical user interface, the simulated fMRI activation generator, and a sophisticated articulatory synthesis system (developed by Shinji Maeda) that generates an acoustic signal based on articulator positions. Thus the merging of the auditory map model and the DIVA model is a substantial task.

Upon completion, the combined model will be tested to verify its ability to extract and successfully produce auditory targets spoken by different talkers, including males, females, and children. (Currently, without the talker-normalized representation proposed above, the model is capable of learning speech sounds only from speakers whose vocal tracts are approximately the same size as the vocal tract of the articulatory synthesizer in the model.) Then the model will be simulated on the same unexpected auditory perturbation tasks used in fMRI Experiments 1 and 2, and its acoustic trajectories quantitatively compared to those of the experimental subjects as illustrated in Section C.2 (see Fig. 5). The model’s simulated fMRI activations from these simulations will be compared to the results of the fMRI studies, as described in Sections C.1 and D.1. At all stages in the modeling process, if the model fails to perform adequately and/or account for aspects of the experimental results, it will be revised as indicated by these results.

D.3. Interactions between auditory feedback control and feedforward control

This subproject concerns the integration of auditory feedback control mechanisms with feedforward commands for speech production.

Theoretical background and hypotheses. According to the DIVA model, projections from speech sound map cells in the left frontal operculum to higher-order auditory and somatosensory cortex constitute sensory expectations (in the form of target regions) for the current sound (Feedback Control Subsystem in Fig. 1). These expectations are effectively subtracted from incoming sensory information; that is, they have an inhibitory effect on the sensory cortical areas. If the incoming sensory information is outside the target region, sensory error cells become active and drive the movement back into the target region via projections to the motor cortex. Inhibition of auditory cortical areas during vocalization has been identified in human subjects (e.g., Houde et al., 2002) as well as animals (e.g., Eliades & Wang, 2003). In previous work we hypothesized that changes in speaking rate are accompanied by changes in the sizes of the sensory target regions for speech sounds (Guenther, 1995). In particular, rapid speech involves an increase in the size of the target region, which leads to more centralized vowel productions, while slower “clear speech” (where subjects are asked to speak as clearly as possible) involves smaller, more extreme target regions. In collaborative work with Dr. Joseph Perkell at MIT, we found direct support for this prediction in an experiment involving vowels produced in a number of phonetic contexts and speaking conditions (Fig. 12).

In past simulations, we modeled the changing size of the target regions with a parameter that explicitly scaled the region size (Guenther, 1995). We propose here that, during fast speech, the feedback control subsystem becomes less engaged since feedback control becomes unstable for fast movements, and that this disengagement is responsible for the changing size of the target region. In the DIVA model, disengagement of the feedback control subsystem can be carried out by increasing the amount of inhibition from the frontal operculum to sensory cortical areas. More inhibition means that a wider range of sensory input will be tolerated without giving rise to error signals; this results in a larger target region. Thus the same mechanism may be responsible for increasing the target region size and disengaging feedback control during fast speech.
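Stated compactly, the assumption is that an error cell fires only when the auditory input escapes the region spanned by the inhibitory target projection, so scaling the inhibition scales the tolerated region. A toy Matlab illustration (the target value and widths are arbitrary placeholders):

    % Error-cell activity as a function of auditory input, target center,
    % and an inhibition level that sets the effective target region size.
    target  = 1500;      % target F1 (Hz), placeholder
    halfwid = 100;       % baseline target half-width (Hz), placeholder
    err = @(aud, inhib) max(0, abs(aud - target) - inhib*halfwid);
    err(1650, 1.0)   % outside baseline region: error drives a correction
    err(1650, 2.0)   % fast speech (more inhibition): same input tolerated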

Only a very small number of previous neuroimaging studies have examined speech at different rates (Wildgruber et al., 2001; Riecker et al., 2005). These studies involved a protocol in which a syllable is repeated at a predetermined pace (between 2 and 6 Hz). This protocol confounds number of syllables with production rate; e.g., the 6 Hz condition involves both a faster speaking rate and more syllables than the 2 Hz condition. Furthermore, the design involves speech during scanning, thereby creating artifacts in the scanned images due to factors such as head movement and the changing size of the oral cavity. More importantly for our purposes, the scanner was creating loud noise during the subjects’ speech, a condition that greatly inhibits use of auditory feedback control and also creates a confound when interpreting auditory cortical activity. We avoid these problems in the experiment proposed below due to our use of an event-triggered sparse sampling paradigm, which involves the same number of words per trial for each speaking rate condition.

Methods. The hypotheses outlined above will be investigated in a modeling project and an fMRI experiment. Modeling Project. We will implement the hypothesis that the larger target regions seen for fast speech vs. slow or clear speech (cf. Fig. 12) result from increased inhibition of the sensory cortical areas. This involves modifying the model’s premotor cortical SSM cells to take on a continuous range of activity (rather than the current binary activation pattern) so that speaking rate can be encoded by the amount of activation in these cells. Furthermore, the strength of the inhibitory projections from the SSM will be made to scale with the amount of activation in the SSM cells. Simulations will be run to verify that these changes result in smaller target regions for slower speech and larger target regions in fast speech, without requiring a separate “region size” parameter as currently used in the DIVA model. After these changes have been implemented, simulations will be run in four different speaking conditions, represented by different levels of SSM activation: clear, stressed, normal, and fast (listed in increasing order of amount of SSM activation). Simulated fMRI activations will be generated from these simulations and compared to the results of the following fMRI experiment, and acoustic trajectories of the model’s productions will be compared to those measured in the fMRI experiment.

Fig. 12. Ellipses indicating the range of formant frequencies (+/- 1 s.d.) used by a speaker to produce five vowels (iy, eh, aa, uh, uw) during fast speech (light gray) and clear speech (dark gray) in a variety of phonetic contexts. [Data collected under R01 DC01925; J. Perkell, PI.]

fMRI Experiment. In this experiment, subjects will produce speech in four different speaking conditions while in the fMRI scanner. In each trial of the normal condition, subjects will read a four-word sentence from a screen; subjects will be instructed to speak normally (i.e., as if having a relaxed conversation) and with all four words unstressed (e.g., “Bob caught a dog”). The stimulus sentences will be the same sentences used for Production Experiment 1 in Section D.1. In the fast condition and clear condition, subjects will produce the same four-word sentences, but this time they will be instructed to produce the speech very rapidly or very clearly (as if someone were trying to transcribe their speech over the telephone) in different trials. In the stressed condition, subjects will produce the same four-word sentences, but this time they will be instructed to stress any word that is in capital letters (“BOB caught a DOG”, spoken as if answering the question “Who caught what?”). Finally, a baseline condition will consist of quietly viewing YYY on the computer screen. Subjects will be given a practice session in advance of the scanning session until they can reliably produce the sentences in the appropriate manner for each condition, as judged by the experimenter. When constructing the stimuli, care will be taken to balance the phonemes used and the number of syllables across conditions to control for length and segmental complexity, and all sentences will use the same subject-verb-object grammatical construction to control for grammatical complexity. As with all experiments described herein, subject productions will be analyzed to ensure that the stimuli were produced properly, and trials containing mispronounced utterances will be removed from subsequent analyses. Subjects and power analysis. 15 subjects (chosen according to the same selection criteria as in the previous experiments) will be scanned. A power analysis indicates that this subject pool size will allow detection (with 0.8 probability) of an effect size that is 70% the size of the reference contrast used in the previous power analyses.

Hypothesis Test 1. As described above, we hypothesize that clear speech involves more auditory feedback control than normal and fast speech, and that this should be evidenced by increased activity in the planum temporale and posterior superior temporal gyrus/sulcus (the location of the auditory error cells in the DIVA model, labeled A in the left panel of Fig. 2). To test this hypothesis, we will perform contrasts between the clear and fast conditions and between the clear and normal conditions. If there is significantly greater activity in the superior temporal gyrus/sulcus and planum temporale in the clear speech condition, we will conclude that our hypothesis is supported.

Hypothesis Test 2. We also hypothesize an increase in the use of somatosensory feedback control for clear speech vs. fast or normal speech, involving increased activity in the supramarginal gyrus (the location of the somatosensory error cells in the model, labeled S in the left panel of Fig. 2). To test this, we will perform contrasts between the clear and fast conditions and the clear and normal conditions. These contrasts will identify any statistically significant activity differences between the contrasted cases. If there is significantly greater activity in the supramarginal gyrus in the clear – normal and clear – fast contrasts, we will conclude that our hypothesis is supported. If instead Hypothesis Tests 1 and 2 reveal that the clear condition produces less activation in the sensory areas than the normal and fast conditions, we will conclude that sensory feedback control mechanisms involve less activation, rather than more activation, in the sensory cortical areas, perhaps due to the reduced amount of sensory error when feedback control supplements feedforward commands. This would cause us to reject our model of speaking rate control outlined in the theoretical background, instead favoring a model in which auditory and somatosensory error cells become more active during fast speech without causing corrective motor commands.

Hypothesis Test 3. We hypothesize that normal speech involves more feedback control than fast speech, and this should be evidenced by more activity in the sensory cortical areas described above. This hypothesis will be tested by contrasting the normal and fast conditions. Significant activity in the sensory areas in the normal – fast contrast will be taken as support for the model's hypothesis, whereas significant activity in these areas for the fast – normal contrast will be taken as support for the alternative hypothesis described in the preceding paragraph.

Hypothesis Test 4. According to Wildgruber et al. (2001) and Riecker et al. (2005), the cerebellum's response to increased speaking rate is a step function: there is no activity for rates below about 3 Hz in their syllable repetition tasks, but both cerebellar hemispheres are active at higher rates. This hypothesis will be supported if we find no cerebellar activation (specifically in the superior paravermal region, which has been implicated in speech motor control in both neuroimaging and lesion studies; e.g., Ackermann et al., 1992; Riecker et al., 2005; Guenther et al., 2005) for the clear speech condition (compared to baseline) but significant activation in the fast and/or normal conditions. In contrast, in our model the cerebellum is active in all conditions; this hypothesis will be supported if superior paravermal cerebellum activity is found in all four conditions vs. baseline.

Hypothesis Test 5. According to the model, path strengths between the sensory error cells and the motor face area should be greatest during clear speech, next greatest in normal speech, and smallest in fast speech (when feedback control is disengaged). This hypothesis will be tested by comparing SEM path strengths between these areas in the different speaking conditions.

Hypothesis Tests 6-8. We hypothesize that, like clear speech, stressed speech involves more sensory feedback control than normal or fast speech. We will thus repeat Hypothesis Tests 1, 2, and 5 with the stressed condition replacing the clear condition.
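To make the contrast logic of Hypothesis Tests 1-3 concrete, the sketch below illustrates an ROI-level paired contrast across the speaking conditions. The data layout and variable names are assumptions for illustration: the synthetic betas stand in for each subject's mean GLM estimate in a sensory region of interest (e.g., planum temporale), and the actual analyses would of course be run within a full fMRI pipeline.

```python
# Hedged sketch of ROI-based condition contrasts (illustrative only; the
# data here are fake, and the layout is an assumption, not our pipeline).
import numpy as np
from scipy import stats

conditions = ["clear", "stressed", "normal", "fast"]
rng = np.random.default_rng(0)
# One row per subject (n = 15), one column per condition: mean ROI betas.
betas = rng.normal(loc=[1.2, 1.1, 0.9, 0.6], scale=0.3, size=(15, 4))

def paired_contrast(data, cond_a, cond_b):
    """Paired t-test of condition A minus condition B across subjects."""
    a, b = conditions.index(cond_a), conditions.index(cond_b)
    return stats.ttest_rel(data[:, a], data[:, b])

for a, b in [("clear", "normal"), ("clear", "fast"), ("normal", "fast")]:
    t, p = paired_contrast(betas, a, b)
    print(f"{a} - {b}: t = {t:.2f}, p = {p:.4f}")
```

A significantly positive clear – normal or clear – fast contrast in the sensory ROIs would support the hypotheses above, whereas the reverse sign would favor the alternative interpretation described for Hypothesis Tests 1 and 2.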

D.4. Timeline of proposed project:


The following subprojects will be scheduled across the five-year project period (Years 1-5):

- D.2 fMRI Experiments (2)
- D.1 fMRI Experiments (4)
- D.1 Psychophysical Experiments (7) (Northeastern Univ)
- D.3 fMRI Expt
- D.2 Modeling Project
- D.1 Modeling Project
- D.3 Modeling Project


E. Protection of Human Subjects

RISKS TO THE SUBJECTS

Human Subjects Involvement and Characteristics: All subjects will be healthy individuals with no history of speech disorders, hearing disorders, seizures, or severe claustrophobia. Subjects will be recruited by local advertisement and consented by procedures approved by the local Institutional Review Boards. The studies proposed are of basic physiology and thus cannot be expected, at least initially, to be influenced by gender or ethnicity. Subjects will thus be recruited in proportion to their ethnic and gender balance in the local community, with the additional constraint that subjects will be native speakers of American English (Massachusetts demographics according to the year 2000 census data: 7% Latino, 93% not Latino; 84% white, 5% black or African American, 4% Asian, 0.2% American Indian and Alaska Native, with similar numbers of men and women). Subjects will be between the ages of 18 and 55, with no form of implant that involves magnetic or electrical parts. Because brain lateralization for different speech tasks is among the issues being studied, we will use right-handed subjects to maximize the likelihood that each subject has left hemisphere dominance for language. A total of approximately 245 subjects will be needed for the experiments proposed herein. In experiments involving visually presented stimuli, subjects will be required to have normal or corrected-to-normal vision (corrected using non-magnetic glasses available at the MGH NMR Center). The proposed dates of enrollment are 8/1/2006-7/31/2011.

1. Inclusion Criteria
(1) Age between 18 and 55 years
(2) Normal physical and normal neurological examination (except as noted in Sections D.3 and D.4)
(3) Ability to participate in fMRI and MEG experiments

2. Exclusion Criteria
(1) Cognitive deficit that could impair ability to give informed consent or competently participate in the study
(2) Active medical, neurological, or psychiatric condition
(3) Presence of an MRI risk factor:
    (a) Known claustrophobia
    (b) An electrically, magnetically, or mechanically activated implant (such as a cardiac pacemaker) or intracerebral vascular clip
    (c) Metal in any part of the body, including metal injury to the eye
    (d) Pregnancy
(4) Left-handedness

Sources of Materials: Research material will be in the form of computer data concerning performance on psychophysical tests conducted at Boston University and Northeastern University, and MRI data recorded at the NMR Center of Massachusetts General Hospital. These data will be obtained specifically for the research purposes described in Sections A-D and will be stored in accordance with HIPAA regulations.

Potential Risks: The planned procedures involve only negligible risk. There are no known or foreseeable physical risks associated with undergoing MRI, except for those individuals who have an electronically, magnetically, or mechanically activated implant or metal in their body, or are pregnant (see MRI risk factors above). All features of the MRI system to be used in the proposed study have been approved by the FDA and will be operated using parameters accepted by the FDA. Subjects will wear earplugs or headsets as hearing protection, as mandated by OSHA. There is a risk of psychological discomfort if a subject has claustrophobia. All potential subjects will be screened for risk factors prior to study enrollment, and all enrolled subjects will again be screened immediately prior to undergoing MRI. These screening procedures should exclude subjects with foreseeable risk.

ADEQUACY OF PROTECTION AGAINST RISKS

Recruitment and Informed Consent: Subjects will be recruited by advertising in the form of paper and web-based posters and electronic mail messages. When an individual volunteers to be a subject, the experimental protocol will be explained in detail verbally, and she/he will be given a copy of the consent form to read and sign. Prior to an fMRI experiment, the subject will fill out a questionnaire designed to identify any potentially dangerous health conditions (such as metal implants). The subject will be told that the experimenters will answer any questions about the procedure (except about aspects of the design or hypotheses that might influence their performance).

Protection Against Risk: The MRI equipment is built to ensure the highest possible degree of subject safety. Subjects will be able to terminate experimental sessions for any reason by signaling the experimenters in a manner explained to the subject before the session begins. All potential MRI subjects will be screened for MRI risk factors prior to study enrollment. If a potential subject cannot rule out the possibility of pregnancy, a pregnancy test will be conducted prior to study enrollment. All enrolled subjects will again be screened just before undergoing MRI. These screening procedures should exclude subjects with foreseeable risks. All subjects will be monitored continuously by research investigators during MRI sessions. Subjects will be able to communicate with research investigators throughout all experimental sessions via a two-way microphone and an alarm system to be used if necessary. All subjects will wear earplugs or headsets during MRI to reduce the transmission of the noises of the MRI scanner (e.g., buzzing, beeping) to a comfortable and safe level. If a subject experiences any discomfort that cannot be alleviated by the research investigators, the experimental session will be terminated. Subject confidentiality will be protected by not using subjects' names or initials in published reports and by protecting their data in accordance with HIPAA regulations.

POTENTIAL BENEFITS OF THE PROPOSED RESEARCH TO THE SUBJECTS AND OTHERS AND IMPORTANCE OF THE KNOWLEDGE TO BE GAINED

Subjects will not directly benefit from their participation in the proposed studies, except for subject payment. However, their participation will contribute very useful information concerning the neural mechanisms of speech perception and production and speech disorders. Thus the risks are negligible relative to the knowledge to be gained.

Collaborating Sites
Boston University, Boston, MA (OHRP assurance number FWA-2457; M1428)
Massachusetts General Hospital, Charlestown, MA (OHRP assurance number FWA-3136)
Northeastern University, Boston, MA (OHRP assurance number FWA-4630)

INCLUSION OF WOMEN

Subject recruitment is via advertisement, and it will be clearly stated that women are encouraged to participate in the research. From our past studies we expect women to make up approximately half of our subject pool (see Planned Enrollment Table below).

INCLUSION OF MINORITIES

Subject recruitment is via advertisement, and it will be clearly stated that members of minority groups are encouraged to participate in the research. From our past studies we expect minorities to be represented in our subject pool in approximate proportion to their representation in the local population (see Planned Enrollment Table below).

INCLUSION OF CHILDREN

Children 18-21 will be enrolled. Younger children will not be enrolled for the following reasons:
(1) the proposed studies are aimed at determining brain function in the fully developed brain;
(2) the fMRI experiments require subjects who can patiently perform repetitive tasks without moving and/or losing concentration.


Targeted/Planned Enrollment Table
This report format should NOT be used for data collection from study participants.

Study Title: Neural Modeling and Imaging of Speech
Total Planned Enrollment: 245

TARGETED/PLANNED ENROLLMENT: Number of Subjects

Ethnic Category                              Females   Males   Total
Hispanic or Latino                                 9       9      18
Not Hispanic or Latino                           114     113     227
Ethnic Category: Total of All Subjects*          123     122     245

Racial Categories                            Females   Males   Total
American Indian/Alaska Native                      1       1       2
Asian                                              6       6      12
Native Hawaiian or Other Pacific Islander          1       1       2
Black or African American                          8       8      16
White                                            107     106     213
Racial Categories: Total of All Subjects*        123     122     245

*The "Ethnic Category Total of All Subjects" must be equal to the "Racial Categories Total of All Subjects."


F. Vertebrate Animals

There are no studies involving vertebrate animals in the current proposal.


G. Literature Cited

Ackermann, H., Vogel, M., Petersen, D., and Poremba, M. (1992). Speech deficits in ischaemic cerebellar lesions. Journal of Neurology, 239, pp. 223-227.
Ackermann, H., and Riecker, A. (2004). The contribution of the insula to motor aspects of speech production: a review and a hypothesis. Brain and Language, 89, pp. 320-328.
Adolphs, R., Damasio, H., and Tranel, D. (2002). Neural systems for recognition of emotional prosody: A 3-D lesion study. Emotion, 2, pp. 23-51.
Aikawa, K., Singer, H., Kawahara, H., and Tohkura, Y. (1996). Cepstral representation of speech motivated by time-frequency masking: an application to speech recognition. Journal of the Acoustical Society of America, 100(1), pp. 603-614.
Baddeley, A. (2003). Working memory and language: An overview. Journal of Communication Disorders, 36, pp. 189-208.
Baddeley, A., and Hitch, G.J. (1974). Working memory. In G.A. Bower (Ed.), Recent Advances in Learning and Motivation, Vol. 8, pp. 47-90. New York: Academic Press.
Baum, S.R., and Pell, M.D. (1999). The neural bases of prosody: Insights from lesion studies and neuroimaging. Aphasiology, 13, pp. 581-608.
Belin, P., and Zatorre, R.J. (2003). Adaptation to speaker's voice in right anterior temporal lobe. Neuroreport, 14(16), pp. 2105-2109.
Bendor, D., and Wang, X. (2005). The neuronal representation of pitch in primate auditory cortex. Nature, 436(7054), pp. 1161-1165.
Binder, J.R., Frost, J.A., Hammeke, T.A., Bellgowan, P.S., Springer, J.A., Kaufman, J.N., and Possing, E.T. (2000). Human temporal lobe activation by speech and nonspeech sounds. Cerebral Cortex, 10, pp. 512-528.
Bloom, L. (1973). One Word at a Time. The Hague: Mouton.
Bolinger, D. (1961). Contrastive accent and contrastive stress. Language, 37, pp. 83-96.
Bolinger, D. (1989). Intonation and Its Uses: Melody in Grammar and Discourse. Stanford: Stanford University Press.
Bollen, K.A. (1989). Structural Equations with Latent Variables. New York: Wiley.
Boutsen, F.R., and Christman, S.S. (2002). Prosody in apraxia of speech. Seminars in Speech and Language, 23, pp. 245-255.
Buchanan, T.W., Lutz, K., Mirzazade, S., Sprecht, K., Shah, N.J., Zilles, K., and Jancke, L. (2000). Recognition of emotional prosody and verbal components of spoken language: An fMRI study. Brain Research. Cognitive Brain Research, 9, pp. 227-238.
Büchel, C., Coull, J.T., and Friston, K.J. (1999). The predictive value of changes in effective connectivity for human learning. Science, 283, pp. 1538-1541.
Büchel, C., and Friston, K.J. (1997). Modulation of connectivity in visual pathways by attention: cortical interactions evaluated with structural equation modeling and fMRI. Cerebral Cortex, 7, pp. 768-778.
Buchsbaum, B.R., Hickok, G., and Humphries, C. (2001). Role of left posterior superior temporal gyrus in phonological processing for speech perception and production. Cognitive Science, 25, pp. 663-678.
Bullmore, E., Horwitz, B., Honey, G., Brammer, M., Williams, S., and Sharma, T. (2000). How good is good enough in path analysis of fMRI data? Neuroimage, 11, pp. 289-301.
Bullock, D. (2004a). Adaptive neural models of queuing and timing in fluent action. Trends in Cognitive Sciences, 8, pp. 426-433.
Bullock, D. (2004b). From parallel sequence representations to calligraphic control: A conspiracy of adaptive neural circuit models. Motor Control, 8, pp. 371-391.
Bunton, K., Kent, R., Kent, J., and Duffy, J. (2001). The effects of flattening fundamental frequency contours on sentence intelligibility in speakers with dysarthria. Clinical Linguistics & Phonetics, 15, pp. 181-193.
Bunton, K., Weismer, G., and Kent, R. (2000). The effects of flattened fundamental frequency contours on sentence intelligibility in dysarthric speakers. Conference on Motor Speech: Motor Speech Disorders and Speech Motor Control. San Antonio, TX.
Burnett, T.A., Freedland, M.B., and Larson, C.R. (1998). Voice F0 responses to manipulation in pitch feedback. Journal of the Acoustical Society of America, 103, pp. 3153-3161.

Burnett, T.A., and Larson, C.R. (2002). Early pitch-shift response is active in both steady and dynamic voice pitch control. Journal of the Acoustical Society of America, 112, pp. 1058-1063.
Caesar, K., Gold, L., and Lauritzen, M. (2003). Context sensitivity of activity-dependent increases in cerebral blood flow. PNAS, 100(7), pp. 4239-4244.
Callan, D.E., Kent, R.D., Guenther, F.H., and Vorperian, H.K. (2000). An auditory-feedback-based neural network model of speech production that is robust to developmental changes in the size and shape of the articulatory system. Journal of Speech, Language, and Hearing Research, 43, pp. 721-736.
Caplan, D., Gow, D., and Makris, N. (1995). Analysis of lesions by MRI in stroke patients with acoustic-phonetic processing deficits. Neurology, 45, pp. 293-298.
Cohen, J., Kamm, K., and Andreou, A. (1995). Vocal tract normalization in speech recognition: compensating for systematic speaker variability. Journal of the Acoustical Society of America, 97(5) Pt. 2, pp. 3246-3247.
Cover, T.M., and Thomas, J.A. (1991). Elements of Information Theory. New York: Wiley.
Cruttenden, A. (1997). Intonation (2nd edition). Cambridge: Cambridge University Press.
Cutler, A., Dahan, D., and van Donselaar, W. (1997). Prosody in the comprehension of spoken language: A literature review. Language and Speech, 40(2), pp. 141-201.
Darley, F.L., Aronson, A.E., and Brown, J.R. (1969). Clusters of deviant speech dimensions in the dysarthrias. Journal of Speech and Hearing Research, 12(3), pp. 462-496.
Darley, F.L., Aronson, A.E., and Brown, J.R. (1975). Motor Speech Disorders. Philadelphia: W.B. Saunders Company.
Dehaene-Lambertz, G., Pallier, C., Serniclaes, W., Sprenger-Charolles, L., Jobert, A., and Dehaene, S. (2005). Neural correlates of switching from auditory to speech perception. Neuroimage, 24(1), pp. 21-33.
Delattre, P., and Freeman, D.C. (1968). A dialect study of American r's by x-ray motion picture. Linguistics, 44, pp. 29-68.
Denes, P. (1959). A preliminary investigation of certain aspects of intonation. Language and Speech, 2, pp. 106-122.
Denes, P., and Milton-Williams, J. (1962). Further studies in intonation. Language and Speech, 5, pp. 1-14.
Dessing, J.C., Peper, C.E., Bullock, D., and Beek, P.J. (in press). How position, velocity and temporal information combine in prospective control of catching: Data and model. Journal of Cognitive Neuroscience.
Doherty, C.P., West, W.C., Dilley, L.C., Shattuck-Hufnagel, S., and Caplan, D. (2004). Question-statement judgments: An fMRI study of intonation processing. Human Brain Mapping, 23, pp. 85-98.
Donath, T.M., Natke, U., and Kalveram, K.T. (2002). Effects of frequency-shifted auditory feedback on voice F0 contours in syllables. Journal of the Acoustical Society of America, 111, pp. 357-366.
Dronkers, N.F. (1996). A new brain region for coordinating speech articulation. Nature, 384, pp. 159-161.
Eggermont, J.J. (1995). Representation of a voice onset time continuum in primary auditory cortex of the cat. Journal of the Acoustical Society of America, 98(2), pp. 911-920.
Eliades, S.J., and Wang, X. (2003). Sensory-motor interaction in the primate auditory cortex during self-initiated vocalizations. Journal of Neurophysiology, 89(4), pp. 2194-2207.
Elliot, L., and Niemoeller, A. (1970). The role of hearing in controlling voice fundamental frequency. International Audiology, 9, pp. 47-52.
Elman, J.L. (1981). Effects of frequency-shifted feedback on the pitch of vocal productions. Journal of the Acoustical Society of America, 70, pp. 45-50.
Emmorey, K.D. (1987). The neurological substrates for prosodic aspects of speech. Brain and Language, 30, pp. 305-320.
Fagg, A.H., and Arbib, M.A. (1998). Modeling parietal-premotor interactions in primate control of grasping. Neural Networks, 11, pp. 1277-1303.
Fant, G. (1975). Non-uniform vowel normalization. Speech Transmission Laboratory Quarterly Progress and Status Report, 2-3/1975, pp. 1-19.
Ferrari, P.F., Gallese, V., Rizzolatti, G., and Fogassi, L. (2003). Mirror neurons responding to the observation of ingestive and communicative mouth actions in the monkey ventral premotor cortex. European Journal of Neuroscience, 17(8), pp. 1703-1714.

Friederici, A.D., and Alter, K. (2004). Lateralization of auditory language functions: A dynamic dual pathway model. Brain and Language, 89, pp. 267-276.

Friston, K.J. (2002). Beyond phrenology: What can neuroimaging tell us about distributed circuitry? Annual Review of Neuroscience, 25, pp. 221-250.
Fry, D. (1955). Duration and intensity as physical correlates of linguistic stress. Journal of the Acoustical Society of America, 27, pp. 765-768.
Fry, D. (1958). Experiments in the perception of stress. Language and Speech, 1(2), pp. 126-152.
Gandour, J., Dzemidzic, M., Wong, D., Lowe, M., Tong, Y., Hsieh, L., Satthamnuwong, N., and Lurito, J. (2003). Temporal integration of speech prosody is shaped by language experience: An fMRI study. Brain and Language, 84, pp. 318-336.
George, M.S., Parekh, P.I., Rosinsky, N., Ketter, T.A., Kimbrell, T.A., Heilman, K.M., Herscovitch, P., and Post, R.M. (1996). Understanding emotional prosody activates right hemisphere regions. Archives of Neurology, 53, pp. 665-670.
Ghacibeh, G.A., and Heilman, K.M. (2003). Progressive affective aprosodia and prosoplegia. Neurology, 60, pp. 1192-1194.
Godey, B., Atencio, C.A., Bonham, B.H., Schreiner, C.E., and Cheung, S.W. (2005). Functional organization of squirrel monkey primary auditory cortex: responses to frequency-modulation sweeps. Journal of Neurophysiology, 94(2), pp. 1299-1311.
Gouvêa, E.B., and Stern, R.M. (1997). Speaker normalization through formant-based warping of the frequency scale. Proceedings of the European Conference on Speech Communication and Technology.
Grossberg, S. (1980). How does a brain build a cognitive code? Psychological Review, 87, pp. 1-57.
Guenther, F.H. (1994). A neural network model of speech acquisition and motor equivalent speech production. Biological Cybernetics, 72, pp. 43-53.
Guenther, F.H. (1995). Speech sound acquisition, coarticulation, and rate effects in a neural network model of speech production. Psychological Review, 102, pp. 594-621.
Guenther, F.H., and Bohland, J.W. (2002). Learning sound categories: A neural model and supporting experiments. Acoustical Science and Technology, 23(4), pp. 213-220.
Guenther, F.H., Hampson, M., and Johnson, D. (1998). A theoretical investigation of reference frames for the planning of speech movements. Psychological Review, 105, pp. 611-633.
Guenther, F.H., Espy-Wilson, C.Y., Boyce, S.E., Matthies, M.L., Zandipour, M., and Perkell, J.S. (1999). Articulatory tradeoffs reduce acoustic variability during American English /r/ production. Journal of the Acoustical Society of America, 105(5), pp. 2854-2865.
Guenther, F.H., Nieto-Castanon, A., Ghosh, S.S., and Tourville, J.A. (2004). Representation of sound categories in auditory cortical maps. Journal of Speech, Language, and Hearing Research, 47, pp. 46-57.
Guenther, F.H., Ghosh, S.S., and Tourville, J.A. (2005). Neural modeling and imaging of the cortical interactions underlying syllable production. Brain and Language, E-print ahead of publication.
Guenther, F.H., Tourville, J.A., and Bohland, J. (2003). Modeling the representation of speech sounds in auditory cortical areas. Program of the 145th Meeting of the Acoustical Society of America, Journal of the Acoustical Society of America, 113(4) Pt. 2, p. 2210.
Haeb-Umbach, R. (1999). Investigations on inter-speaker variability in the feature space. Proceedings of Acoustics, Speech, and Signal Processing, ICASSP '99, IEEE International Conference, Vol. 1, pp. 397-400.
Hain, T.C., Burnett, T.A., Larson, C.R., and Kiran, S. (2001). Effects of delayed auditory feedback (DAF) on the pitch-shift reflex. Journal of the Acoustical Society of America, 109, pp. 2146-2152.
Hall, D.A., Johnsrude, I.S., Haggard, M.P., Palmer, A.R., Akeroyd, M.A., and Summerfield, A.Q. (2002). Spectral and temporal processing in human auditory cortex. Cerebral Cortex, 12(2), pp. 140-149.
Hart, H.C., Palmer, A.R., and Hall, D.A. (2003). Amplitude and frequency-modulated stimuli activate common regions of human auditory cortex. Cerebral Cortex, 13(7), pp. 773-781.
Heeger, D.J., Huk, A.C., Geisler, W.S., and Albrecht, D.G. (2000). Spikes versus BOLD: what does neuroimaging tell us about neuronal activity? Nature Neuroscience, 3(7), pp. 631-633.
Hickok, G., Erhard, P., Kassubek, J., Helms-Tillery, A.K., Naeve-Velguth, S., Strupp, J.P., Strick, P.L., and Ugurbil, K. (2000). A functional magnetic resonance imaging study of the role of left posterior superior temporal gyrus in speech production: Implications for the explanation of conduction aphasia. Neuroscience Letters, 287, pp. 156-160.

Hickok, G., Buchsbaum, B., Humphries, C., and Muftuler, T. (2003). Auditory-motor interaction revealed by fMRI: speech, music, and working memory in area Spt. Journal of Cognitive Neuroscience, 15, pp. 673-682.
Hickok, G., and Poeppel, D. (2004). Dorsal and ventral streams: a framework for understanding aspects of the functional anatomy of language. Cognition, 92(1-2), pp. 67-99.
Hillis, A.E., Work, M., Barker, P.B., Jacobs, M.A., Breese, E.L., and Maurer, K. (2004). Re-examining the brain regions crucial for orchestrating speech articulation. Brain, 127, pp. 1479-1487.
Hirsch, H.G., Meyer, P., and Ruehl, H.W. (1991). Improved speech recognition using high-pass filtering of subband envelopes. Proceedings of the European Conference on Speech Communication and Technology, pp. 413-416.
Horwitz, B., Friston, K.J., and Taylor, J.G. (2000). Neural modeling and functional brain imaging: an overview. Neural Networks, 13(8-9), pp. 829-846.
Horwitz, B., Tagamets, M.A., and McIntosh, A.R. (1999). Neural modeling, functional brain imaging, and cognition. Trends in Cognitive Sciences, 3, pp. 91-98.
Houde, J.F., and Jordan, M.I. (1998). Sensorimotor adaptation in speech production. Science, 279, pp. 1213-1216.
Houde, J.F., Nagarajan, S.S., Sekihara, K., and Merzenich, M.M. (2002). Modulation of the auditory cortex during speech: an MEG study. Journal of Cognitive Neuroscience, 14, pp. 1125-1138.
Howell, P. (1993). Cue trading in the production and perception of vowel stress. Journal of the Acoustical Society of America, 94(4), pp. 2063-2073.
Huang, X.D., and Lee, K.F. (1993). On speaker-independent, speaker-dependent, and speaker-adaptive speech recognition. IEEE Transactions on Speech and Audio Processing, 1(2), pp. 150-157.
Husain, F.T., Tagamets, M.A., Fromm, S.J., Braun, A.R., and Horwitz, B. (2004). Relating neuronal dynamics for auditory object processing to neuroimaging activity: a computational modeling and an fMRI study. Neuroimage, 21, pp. 1701-1720.
Iverson, P., and Kuhl, P.K. (1996). Influences of phonetic identification and category goodness on American listeners' perception of /r/ and /l/. Journal of the Acoustical Society of America, 99(2), pp. 1130-1140.
Jonas, S. (1981). The supplementary motor region and speech emission. Journal of Communication Disorders, 14, pp. 349-373.
Jonas, S. (1987). The supplementary motor region and speech. In E. Perecman (Ed.), The Frontal Lobes Revisited, pp. 241-250. New York: IRBN Press.
Jones, J.A., and Munhall, K.G. (2000). Perceptual calibration of F0 production: Evidence from feedback perturbation. Journal of the Acoustical Society of America, 108, pp. 1246-1251.
Jonides, J., Schumacher, E.H., Smith, E.E., Koeppe, R.A., Awh, E., Reuter-Lorenz, P.A., Marshuetz, C., and Willis, C.R. (1998). The role of parietal cortex in verbal working memory. Journal of Neuroscience, 18, pp. 5026-5034.
Kent, R.D., and Bauer, H.R. (1985). Vocalizations of one-year-olds. Journal of Child Language, 12, pp. 491-526.
Kent, R.D., and Murray, A.D. (1982). Acoustic features of infant vocalic utterances at 3, 6, and 9 months. Journal of the Acoustical Society of America, 72(2), pp. 353-365.
Kent, R.D., and Rosenbek, J. (1982). Prosodic disturbance and neurologic lesion. Brain and Language, 15, pp. 259-291.
Kohonen, T. (1982). Self-organized formation of topologically correct feature maps. Biological Cybernetics, 43, pp. 59-69.
Kosaki, H., Hashikawa, T., He, J., and Jones, E.G. (1997). Tonotopic organization of auditory cortical fields delineated by parvalbumin immunoreactivity in macaque monkeys. Journal of Comparative Neurology, 386(2), pp. 304-316.
Kotz, S.A., Meyer, M., Alter, K., Besson, M., von Cramon, D.Y., and Friederici, A.D. (2003). On the lateralization of emotional prosody: An event-related functional MR investigation. Brain and Language, 86, pp. 366-376.
Kriegstein, K.V., and Giraud, A.L. (2004). Distinct functional substrates along the right superior temporal sulcus for the processing of voices. Neuroimage, 22(2), pp. 948-955.

Kuhl, P.K. (1991). Human adults and human infants show a 'perceptual magnet effect' for the prototypes of speech categories, monkeys do not. Perception & Psychophysics, 50, pp. 93-107.
Ladd, D., Silverman, K.A., Tolkmitt, F., Bergmann, G., and Scherer, K.R. (1985). Evidence for the independent function of intonation contour type, voice quality, and F0 range in signaling speaker affect. Journal of the Acoustical Society of America, 78, pp. 435-444.

Lane, H., and Tranel, B. (1971). The Lombard sign and the role of hearing in speech. Journal of Speech and Hearing Research, 14, pp. 677-709.
Larson, C.R., Burnett, T.A., Kiran, S., and Hain, T.C. (2000). Effects of pitch-shift velocity on voice F0 responses. Journal of the Acoustical Society of America, 107, pp. 559-564.
Larson, C.R., Burnett, T.A., Bauer, J.J., Kiran, S., and Hain, T.C. (2001). Comparison of voice F0 responses to pitch-shift onset and offset conditions. Journal of the Acoustical Society of America, 110, pp. 2845-2848.
Laures, J., and Bunton, K. (2003). Perceptual effects of a flattened fundamental frequency at the sentence level under different listening conditions. Journal of Communication Disorders, 36, pp. 449-464.
Laures, J.S., and Weismer, G. (1999). The effects of a flattened fundamental frequency contour on intelligibility at the sentence level. Journal of Speech, Language and Hearing Research, 42, pp. 1148-1156.
Le, T.H., Patel, S., and Roberts, T.P. (2001). Functional MRI of human auditory cortex using block and event-related designs. Magnetic Resonance in Medicine, 45(2), pp. 254-260.
Lehiste, I. (1970). Suprasegmentals. Cambridge: MIT Press.
Lehiste, I. (1976). Suprasegmental features of speech. In N.J. Lass (Ed.), Contemporary Issues in Experimental Phonetics, pp. 225-239. New York, NY: Academic Press.
Levin, D.N. (2002). Representations of sound that are insensitive to spectral filtering and parametrization procedures. Journal of the Acoustical Society of America, 111(5) Pt. 1, pp. 2257-2271.
Lieberman, P. (1960). Some acoustic correlates of word stress in American English. Journal of the Acoustical Society of America, 32, pp. 451-454.
Liégeois-Chauvel, C., de Graaf, J.B., Laguitton, V., and Chauvel, P. (1999). Specialization of left auditory cortex for speech perception in man depends on temporal coding. Cerebral Cortex, 9(5), pp. 484-496.
Logothetis, N.K., Pauls, J., Augath, M., Trinath, T., and Oeltermann, A. (2001). Neurophysiological investigation of the basis of the fMRI signal. Nature, 412(6843), pp. 150-157.
Logothetis, N.K., and Pfeuffer, J. (2004). On the nature of the BOLD fMRI contrast mechanism. Magnetic Resonance Imaging, 22, pp. 1517-1531.
Lombard, E. (1911). Le signe de l'elevation de la voix. Annales des Maladies de l'Oreille du Larynx, 37, pp. 101-119.
MacNeilage, P.F., and Davis, B.L. (1993). Acquisition of speech production: Frames then content. In M. Jeannerod (Ed.), Attention and Performance XIII: Motor Representation and Control, pp. 453-475. Hillsdale, NJ: Lawrence Erlbaum.
Maes, F., Collignon, A., Vandermeulen, D., Marchal, G., and Seutens, P. (1997). Multimodality image registration by maximization of mutual information. IEEE Transactions on Medical Imaging, 16, pp. 187-197.
Max, L., Wallace, M.E., and Vincent, I. (2003). Sensorimotor adaptation to auditory perturbations during speech: Acoustic and kinematic experiments. Proceedings of the XVth International Congress of Phonetic Sciences, pp. 1053-1056. Barcelona: ICPhS Organizing Committee.
Max, L., Guenther, F.H., Gracco, V.L., Ghosh, S.S., and Wallace, M.E. (2004). Unstable or insufficiently activated internal models and feedback-biased motor control as sources of dysfluency: A theoretical model of stuttering. Contemporary Issues in Communication Science and Disorders, 31, pp. 105-122.
McIntosh, A.R., and Gonzalez-Lima, F. (1994a). Structural equation modeling and its application to network analysis in functional brain imaging. Human Brain Mapping, 2, pp. 2-22.
McIntosh, A.R., Grady, C.L., Ungerleider, L.G., Haxby, J.V., Rapoport, S.I., and Horwitz, B. (1994b). Network analysis of cortical visual pathways mapped with PET. Journal of Neuroscience, 14(2), pp. 655-666.
Mechelli, A., Penny, W.D., Price, C.J., Gitelman, D.R., and Friston, K.J. (2002). Effective connectivity and intersubject variability: using a multisubject network to test differences and commonalities. NeuroImage, 17(3), pp. 1459-1469.
Medina, J.F., Garcia, K.S., Nores, W.L., Taylor, N.M., and Mauk, M.D. (2000). Timing mechanisms in the cerebellum: Testing predictions of a large-scale computer simulation. Journal of Neuroscience, 20, pp. 5516-5525.

Mendelson, J.R., Schreiner, C.E., Sutter, M.L., and Grasse, K.L. (1993). Functional topography of cat primary auditory cortex: responses to frequency-modulated sweeps. Experimental Brain Research, 94(1), pp. 65-87.
Menyuk, P., and Bernholtz, N. (1969). Prosodic features and children's language production. MIT Quarterly Progress Report, 93, pp. 216-219.
Meyer, M., Alter, K., Friederici, A.D., Lohmann, G., and von Cramon, D.Y. (2002). FMRI reveals brain regions mediating slow prosodic modulation of spoken sentences. Human Brain Mapping, 17, pp. 73-88.
Micci Barreca, D., and Guenther, F.H. (2001). A modeling study of potential sources of curvature in human reaching movements. Journal of Motor Behavior, 33, pp. 387-400.
Mildner, V. (2004). Hemispheric asymmetry for linguistic prosody: A study of stress perception in Croatian. Brain and Cognition, 55, pp. 358-361.
Miller, J.D. (1989). Auditory-perceptual interpretation of the vowel. Journal of the Acoustical Society of America, 85, pp. 2114-2134.
Mitchell, R.L., Elliot, R., Barry, M., Cruttenden, A., and Woodruff, P.W. (2003). The neural response to emotional prosody, as revealed by functional magnetic resonance imaging. Neuropsychologia, 41, pp. 1410-1421.
Morel, A., Garraghty, P.E., and Kaas, J.H. (1993). Tonotopic organization, architectonic fields, and connections of auditory cortex in macaque monkeys. Journal of Comparative Neurology, 335(3), pp. 437-459.
Morton, J., and Jassem, W. (1965). Acoustic correlates of stress. Language & Speech, 8, pp. 159-181.
Mottaghy, F.M., Doring, T., Muller-Gartner, H.W., Topper, R., and Krause, B.J. (2002). Bilateral parieto-frontal network for verbal working memory: An interference approach using repetitive transcranial magnetic stimulation (rTMS). European Journal of Neuroscience, 16, pp. 1627-1632.
Mummery, C.J., Ashburner, J., Scott, S.K., and Wise, R.J. (1999). Functional neuroimaging of speech perception in six normal and two aphasic subjects. Journal of the Acoustical Society of America, 106, pp. 449-457.
Munhall, K.G. (2001). Functional imaging during speech production. Acta Psychologica, 107(1-3), pp. 95-117.
Natke, U., and Kalveram, K.T. (2001). Effects of frequency-shifted auditory feedback on fundamental frequency of long stressed and unstressed syllables. Journal of Speech, Language, and Hearing Research, 44, pp. 577-584.
Netsell, R. (1973). Speech physiology. In F. Minifie, T.J. Hixon, and F. Williams (Eds.), Normal Aspects of Speech, Hearing, and Language, pp. 211-234. Englewood Cliffs, NJ: Prentice-Hall.
Nichols, T.E., and Holmes, A.P. (2002). Nonparametric permutation tests for functional neuroimaging: A primer with examples. Human Brain Mapping, 15(1), pp. 1-25.
Nieto-Castanon, A., Ghosh, S.S., Tourville, J.A., and Guenther, F.H. (2003). Region of interest based analysis of functional imaging data. Neuroimage, 19, pp. 1303-1316.
Nieto-Castanon, A., Guenther, F.H., Perkell, J.S., and Curtin, H. (2005). A modeling investigation of articulatory variability and acoustic stability during American English /r/ production. Journal of the Acoustical Society of America, 117, pp. 3196-3212.
Patel, R. (2002a). Phonatory control in adults with cerebral palsy and severe dysarthria. Alternative and Augmentative Communication, 2, pp. 2-10.
Patel, R. (2002b). Prosodic control in severe dysarthria: Preserved ability to mark the question-statement contrast. Journal of Speech, Language and Hearing Research, 45(5), pp. 858-870.
Patel, R. (2003). Acoustic differences in the yes-no question-statement contrast between speakers with and without dysarthria. Journal of Speech, Language and Hearing Research, 46(6), pp. 1401-1415.
Patel, R. (2004). Contrastive prosody in adults with cerebral palsy. Journal of Medical Speech Pathology, 12(4), pp. 189-193.
Pelecanos, J., and Sridharan, S. (2001). Feature warping for robust speaker verification. Proceedings of A Speaker Odyssey, Paper 1038.
Pell, M.D. (1999). The temporal organization of affective and non-affective prosody in patients with right-hemisphere infarcts. Cortex, 35, pp. 455-477.

Peppé, S., Maxim, J., and Wells, B. (2000). Prosodic variation in southern British English. Language and Speech, 43(3), pp. 309-334.
Perkell, J.S., Guenther, F.H., Lane, H., Matthies, M.L., Perrier, P., Vick, J., Wilhelms-Tricarico, R., and Zandipour, M. (2000). A theory of speech motor control and supporting data from speakers with normal hearing and profound hearing loss. Journal of Phonetics, 28, pp. 233-272.

Perkell, J.S., Guenther, F.H., Lane, H., Matthies, M.L., Stockmann, E., Tiede, M., and Zandipour, M. (2004a). Cross-subject correlations between measures of vowel production and perception. Journal of the Acoustical Society of America, 116(4) Pt. 1, pp. 2338-2344.
Perkell, J.S., Matthies, M.L., Tiede, M., Lane, H., Zandipour, M., Marrone, N., Stockmann, E., and Guenther, F.H. (2004b). The distinctness of speakers' /s-sh/ contrast is related to their auditory discrimination and use of an articulatory saturation effect. Journal of Speech, Language, and Hearing Research, 47, pp. 1259-1269.
Perrett, S.P., Ruiz, B.P., and Mauk, M.D. (1993). Cerebellar cortex lesions disrupt learning-dependent timing of conditioned eyelid responses. Journal of Neuroscience, 13, pp. 1708-1718.
Pickett, E., Kuniholm, E., Protopapas, A., Friedman, J., and Lieberman, P. (1998). Selective speech motor, syntax and cognitive deficits associated with bilateral damage to the putamen and the head of the caudate nucleus: a case study. Neuropsychologia, 36, pp. 173-188.
Pierrehumbert, J.B. (1980). The Phonology and Phonetics of English Intonation. MIT PhD dissertation, Cambridge, MA.
Pihan, H., Altenmuller, E., and Ackermann, H. (1997). The cortical processing of perceived emotion: a DC-potential study on affective speech prosody. Neuroreport, 8, pp. 623-627.
Poeppel, D. (2003). The analysis of speech in different temporal integration windows: Cerebral lateralization as 'asymmetric sampling in time'. Speech Communication, 41, pp. 245-255.
Ravizza, S.M., Delgado, M.R., Chein, J.M., Becker, J.T., and Fiez, J.A. (2004). Functional dissociations within the inferior parietal cortex in verbal working memory. NeuroImage, 22, pp. 562-573.
Rees, G., Friston, K., and Koch, C. (2000). A direct quantitative relationship between the functional properties of human and macaque V5. Nature Neuroscience, 3(7), pp. 716-723.
Rhodes, B.J., Bullock, D., Verwey, W.B., Averbeck, B.B., and Page, M.P. (2004). Learning and production of movement sequences: behavioral, neurophysiological, and modeling perspectives. Human Movement Science, 23(5), pp. 699-746.
Riecker, A., Ackermann, H., Wildgruber, D., Dogil, G., and Grodd, W. (2000a). Opposite hemispheric lateralization effects during speaking and singing at motor cortex, insula and cerebellum. Neuroreport, 11(9), pp. 1997-2000.
Riecker, A., Ackermann, H., Wildgruber, D., Meyer, J., Dogil, G., Haider, H., and Grodd, W. (2000b). Articulatory/phonetic sequencing at the level of the anterior perisylvian cortex: a functional magnetic resonance imaging (fMRI) study. Brain and Language, 75(2), pp. 259-276.
Riecker, A., Wildgruber, D., Grodd, W., and Ackermann, H. (2002). Reorganization of speech production at the motor cortex and cerebellum following capsular infarction: a follow-up functional magnetic resonance imaging study. Neurocase, 8(6), pp. 417-423.
Riva, D. (1998). The cerebellar contribution to language and sequential functions: evidence from a child with cerebellitis. Cortex, 34(2), pp. 279-287.
Rizzolatti, G., Fogassi, L., and Gallese, V. (2002). Motor and cognitive functions of the ventral premotor cortex. Current Opinion in Neurobiology, 12(2), pp. 149-154.
Ross, E.D., and Mesulam, M.M. (1979). Dominant language functions of the right hemisphere? Prosody and emotional gesturing. Archives of Neurology, 36, pp. 144-148.
Scott, S.K., Blank, C.C., Rosen, S., and Wise, R.J. (2000). Identification of a pathway for intelligible speech in the left temporal lobe. Brain, 123, pp. 2400-2406.
Shattuck-Hufnagel, S., and Turk, A.E. (1996). A prosody tutorial for investigators of auditory sentence processing. Journal of Psycholinguistic Research, 25(2), pp. 193-247.
Shriberg, L.D., and Kent, R.D. (1982). Clinical Phonetics. New York: Macmillan.
Shriberg, L.D., and Kent, R.D. (2003). Clinical Phonetics (3rd ed.). Boston: Allyn & Bacon.
Shuster, L.I., and Lemieux, S.K. (in press). An fMRI investigation of covertly and overtly produced mono- and multisyllabic words. Brain and Language.

Sidtis, J.J. (1984). Music, pitch perception, and the mechanisms of cortical hearing. In M.S. Gazzaniga (Ed.), Handbook of Cognitive Neuroscience, pp. 91-114. New York: Plenum.
Sidtis, J.J., and Feldmann, E. (1990). Transient ischemic attacks presenting with a loss of pitch perception. Cortex, 26(3), pp. 469-471.
Sidtis, J.J., and Van Lancker Sidtis, D. (2003). A neurobehavioral approach to dysprosody. Seminars in Speech and Language, 24, pp. 93-105.
Sidtis, J.J., and Volpe, B.T. (1988). Selective loss of complex-pitch or speech discrimination after unilateral cerebral lesion. Brain and Language, 34, pp. 235-245.
Smith, E.E., and Jonides, J. (1998). Neuroimaging analyses of human working memory. Proceedings of the National Academy of Sciences of the United States of America, 95(20), pp. 12061-12068.
Snow, D. (1994). Phrase-final syllable lengthening and intonation in early child speech. Journal of Speech and Hearing Research, 37, pp. 831-840.
Spencer, R.M., Zelaznik, H.N., Diedrichsen, J., and Ivry, R.B. (2003). Disrupted timing of discontinuous but not continuous movements by cerebellar lesions. Science, 300, pp. 1437-1439.
Stiller, D., Gaschler-Markefski, B., Baumgart, F., Schindler, F., Tempelmann, C., Heinze, H.J., and Scheich, H. (1997). Lateralized processing of speech prosodies in the temporal cortex: a 3-T functional magnetic resonance imaging study. MAGMA, 5, pp. 275-284.
Steinschneider, M., Fishman, Y.I., and Arezzo, J.C. (2003). Representation of the voice onset time (VOT) speech parameter in population responses within primary auditory cortex of the awake monkey. Journal of the Acoustical Society of America, 114(1), pp. 307-321.
Streeter, L.A., Macdonald, N.H., Apple, W., Krauss, R.M., and Galotti, K.M. (1983). Acoustic and perceptual indicators of emotional stress. Journal of the Acoustical Society of America, 73, pp. 1354-1360.
Tagamets, M.A., and Horwitz, B. (1997). Modeling brain imaging data with neuronal assembly dynamics. In J.M. Bower (Ed.), Computational Neuroscience: Trends in Research 1997, pp. 949-953. New York: Plenum Press.
Tagamets, M.A., and Horwitz, B. (2001). Interpreting PET and fMRI measures of functional neural activity: The effects of synaptic inhibition on cortical activation in human imaging studies. Brain Research Bulletin, 54(3), pp. 267-273.
Tian, B., and Rauschecker, J.P. (1998). Processing of frequency-modulated sounds in the cat's posterior auditory field. Journal of Neurophysiology, 79, pp. 2629-2642.
Ulloa, A., Bullock, D., and Rhodes, B. (2003a). A model of cerebellar adaptation of grip forces during lifting. Proceedings of the IJCNN, 4, pp. 3167-3172.
Ulloa, A., Bullock, D., and Rhodes, B. (2003b). Adaptive force generation for precision-grip lifting by a spectral timing model of the cerebellum. Neural Networks, 16, pp. 521-528.
Van Lancker, D. (1980). Cerebral lateralization of pitch cues in the linguistic signal. International Journal of Human Communication, 13, pp. 201-227.
Van Lancker, D., and Sidtis, J.J. (1992). The identification of affective-prosodic stimuli by left- and right-hemisphere-damaged subjects: All errors are not created equal. Journal of Speech and Hearing Research, 35, pp. 963-970.
Vance, J.E. (1994). Prosodic deviation in dysarthria: a case study. European Journal of Disorders of Communication, 29(1), pp. 61-76.
Villacorta, V., Perkell, J.S., and Guenther, F.H. (2004). Sensorimotor adaptation to acoustic perturbations in vowel formants. Program of the 147th Meeting of the Acoustical Society of America, Journal of the Acoustical Society of America, 115, p. 2430.
Villacorta, V. (2005). Sensorimotor adaptation to perturbations of vowel acoustics and its relation to perception. MIT PhD Dissertation. Cambridge, MA: MIT.
Walker, J.P., Pelletier, R., and Reif, L. (2004). The production of linguistic prosodic structures in subjects with hemisphere damage. Clinical Linguistics & Phonetics, 18, pp. 85-106.
Westbury, J.R., Hashi, M., and Lindstrom, M.J. (1998). Differences among speakers in lingual articulation of American English /r/. Speech Communication, 26, pp. 203-226.
Whitehill, T., Ciocca, V., and Lam, S. (2001). Fundamental frequency control in connected speech in Cantonese speakers with dysarthria. In B. Maassen, W. Hulstijn, R. Kent, H. Peters, & P. van Lieshout (Eds.), Speech Motor Control in Normal and Disordered Speech, pp. 228-231. Nijmegen: University of Nijmegen Press.

Wildgruber, D., Ackermann, H., and Grodd, W. (2001). Differential contributions of motor cortex, basal ganglia, and cerebellum to speech motor control: Effects of syllable repetition rate evaluated by fMRI. NeuroImage, 13(1), pp. 101-109.
Williams, C., and Stevens, K.N. (1972). Emotions and speech: Some acoustical correlates. Journal of the Acoustical Society of America, 52, pp. 1238-1250.
Williamson, J.B., Harrison, D.W., Shenal, B.V., Rhodes, R., and Demaree, H.A. (2003). Quantitative EEG diagnostic confirmation of expressive aprosodia. Applied Neuropsychology, 10, pp. 176-181.
Wise, R.J., Greene, J., Buchel, C., and Scott, S.K. (1999). Brain regions involved in articulation. Lancet, 353, pp. 1057-1061.
Wong, P.C., Nusbaum, H.C., and Small, S.L. (2004). Neural bases of talker normalization. Journal of Cognitive Neuroscience, 16(7), pp. 1173-1184.
Yorkston, K.M., Beukelman, D.R., and Bell, K.R. (1988). Clinical Management of Dysarthric Speakers. Austin, TX: PRO-ED, Inc.
Zahn, R., Huber, W., Drews, E., Erberich, S., Krings, T., Willmes, K., and Schwarz, M. (2000). Hemispheric lateralization at different levels of human auditory word processing: a functional magnetic resonance imaging study. Neuroscience Letters, 287(3), pp. 195-198.
Zarahn, E., and Slifstein, M. (2001). A reference effect approach for power analysis in fMRI. Neuroimage, 14(3), pp. 768-779.
Zatorre, R.J. (1988). Pitch perception of complex tones and human temporal lobe function. Journal of the Acoustical Society of America, 84, pp. 566-572.
Ziegler, W., Kilian, B., and Deger, K. (1997). The role of the left mesial frontal cortex in fluent speech: evidence from a case of left supplementary motor area hemorrhage. Neuropsychologia, 35(9), pp. 1197-1208.
Zohary, E. (1992). Population coding of visual stimuli by cortical neurons tuned to more than one dimension. Biological Cybernetics, 66, pp. 265-272.


H. Consortium/Contractual Arrangements

The proposed research involves a subcontract to Northeastern University (Principal Investigator Dr. Rupal Patel, Assistant Professor of Speech Language Pathology). Dr. Patel and a graduate student will perform the seven prosody-related psychophysical experiments (involving 105 total subjects) described in Section D.1. See letter of cooperation from Dr. Patel in Section J.

In addition to his primary appointment at Boston University, Dr. Guenther has a research appointment at Massachusetts General Hospital, where the fMRI experiments will be performed. Dr. Guenther and his lab members therefore have direct access to the MRI equipment needed to perform the research in this application. Imaging time at the MRI facilities will be invoiced to Prof. Guenther at Boston University; thus no subcontract is necessary for this component of the proposed research.


I. Resource Sharing

The proposal does not involve direct cost amounts in excess of $500,000 in any year, and it does not include model organisms.


J. Consultants

Letters of cooperation from the following individuals are provided in the following pages:

Dr. Rupal Patel, Assistant Professor of Speech Language Pathology, Northeastern University. Dr. Patel and a graduate student will perform the prosody-related auditory perturbation psychophysical experiments described in Section D.1.


8. Appendix Materials

The following documents are included in the Appendix:
1. Guenther, F.H., Ghosh, S.S., and Tourville, J.A. (2005). Neural modeling and imaging of the cortical interactions underlying syllable production. Brain and Language, E-print ahead of publication.
2. Villacorta, V. (2005). Sensorimotor adaptation to perturbations of vowel acoustics and its relation to perception (chapter 5). MIT PhD Dissertation. Cambridge, MA: MIT.
3. Nieto-Castanon, A., Guenther, F.H., Perkell, J.S., and Curtin, H. (2005). A modeling investigation of articulatory variability and acoustic stability during American English /r/ production. Journal of the Acoustical Society of America, 117, pp. 3196-3212.
4. Guenther, F.H., Nieto-Castanon, A., Ghosh, S.S., and Tourville, J.A. (2004). Representation of sound categories in auditory cortical maps. Journal of Speech, Language, and Hearing Research, 47, pp. 46-57.
