Harvard-MIT Division of Health Sciences and Technology
HST.722J: Brain Mechanisms for Hearing and Speech
Course Instructor: Bertrand Delgutte
Quantitative approaches to the study of neural coding
©Bertrand Delgutte, 2003-2005
The Problem
• Neural and behavioral data have different forms that are not easily comparable
 – In some cases, data from different species and states (awake vs. anesthetized) are being compared
 – Even when neural and behavioral data have similar forms (e.g. a threshold), they both depend on arbitrary criteria
• Both neural and behavioral responses are probabilistic: they are not identical for different repetitions of the same stimulus. Therefore, they provide a noisy (imprecise) representation of the stimulus.
• Are there quantitative tools to describe the precision of stimulus representation in neural and behavioral data?
Questions that can be addressed by quantitative methods
• Does a particular class of neurons provide sufficient information to account for performance in a particular behavioral task?
• Does a particular neural code (e.g. rate-place vs. temporal) provide sufficient information for the task?
• Which stimulus feature is a particular neuron or neural population most sensitive to? Are there “feature detector” neurons?
• Is there stimulus information in the correlated firings of groups of neurons?
• How efficient is the neural code?
Outline
• Signal detection theory (a.k.a. ideal observer analysis)
 – Single neuron
 – Combined performance for a neural population
• Shannon information theory
Neural variability limits detection performance
• When a sound stimulus is presented repeatedly, the number of spikes recorded from an auditory neuron differs on each trial. The blue and red surfaces are model spike count distributions for two pure tones differing in intensity by 3 dB.
• The overlap in the spike count distributions limits our accuracy in identifying which of the two stimuli was presented based on the responses of this neuron.
• A measure of the separation between the two distributions (and therefore of the neuron’s ability to discriminate) is the discriminability index d’, which is the difference in means of the two distributions divided by their standard deviation.
• The just noticeable difference (JND) or difference limen (DL) is often taken as the intensity increment for which d’ = 1. This criterion corresponds to 76% correct in a two-interval, two-alternative psychophysical experiment.
Delgutte (unpublished)
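As a quick numerical check of the d′ criterion above, here is a minimal Python sketch. The Gaussian spike-count statistics (means and SD) are illustrative assumptions, not the lecture's model data; the key identity is PC = Φ(d′/√2) for a two-interval, two-alternative task.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical spike-count statistics for two tones 3 dB apart
mu0, mu1, sigma = 50.0, 56.0, 6.0        # assumed means and common SD

d_prime = (mu1 - mu0) / sigma            # d' = difference in means / SD
pc = norm.cdf(d_prime / np.sqrt(2))      # percent correct in 2I-2AFC
print(f"d' = {d_prime:.2f}, PC = {pc:.2%}")   # d' = 1 gives PC ~ 76%
```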
Conditional Probability and Bayes’ Rule
Conditional Probability:
$$P(S|R) = \frac{P(S,R)}{P(R)}$$

Statistically Independent Events:
$$P(S,R) = P(S)\,P(R) \;\Rightarrow\; P(S|R) = P(S)$$

Bayes' Rule:
$$P(S|R) = \frac{P(R|S)\,P(S)}{P(R)}$$
Bayesian Optimal Decision and the Likelihood Ratio
The Problem:
• Choose between two alternatives (stimuli) S0 and S1, with prior probabilities P(S0) and P(S1), given the observation (neural response) R, so as to minimize the probability of error.
• The conditional probabilities (stimulus-response relationships) P(R|S0) and P(R|S1) are known.

Bayes' Optimal Decision Rule:
• Given a specific value of R, choose the alternative which maximizes the posterior probability:
$$P(S_1|R) \;\underset{S_0}{\overset{S_1}{\gtrless}}\; P(S_0|R)$$

Equivalent Likelihood Ratio Test (LRT):
$$LR = \frac{P(R|S_1)}{P(R|S_0)} \;\underset{S_0}{\overset{S_1}{\gtrless}}\; \frac{P(S_0)}{P(S_1)}$$
LRT separates priors (under experimenter control) from neuron’s stimulus-response characteristic.
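A minimal sketch of the LRT for Poisson spike counts (the means λ0, λ1 and the priors are made-up values). Because the Poisson log likelihood ratio is monotone increasing in the count R, the test reduces to a simple count threshold, as noted on the next slide:

```python
import numpy as np

lam0, lam1 = 20.0, 30.0        # assumed mean counts under S0 and S1
p0, p1 = 0.5, 0.5              # prior probabilities (experimenter's choice)

def log_lr(R):
    """Poisson log likelihood ratio, log[P(R|S1)/P(R|S0)]."""
    return R * np.log(lam1 / lam0) - (lam1 - lam0)

def decide(R):
    """Bayes-optimal decision: choose S1 when LR exceeds P(S0)/P(S1)."""
    return "S1" if log_lr(R) > np.log(p0 / p1) else "S0"

# Equivalent count threshold gamma: solve log LR(gamma) = log(p0/p1)
gamma = (np.log(p0 / p1) + lam1 - lam0) / np.log(lam1 / lam0)
print(decide(22), decide(28), f"gamma = {gamma:.1f}")   # threshold ~24.7
```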
Properties of the Likelihood Ratio
• Separates computation (left side) from biases and costs (prior probabilities, right)
• If the conditional probabilities P(R|S_i) are either Poisson or Gaussian with equal variances, the LRT reduces to comparing the response R with a threshold γ:
$$R \;\underset{S_0}{\overset{S_1}{\gtrless}}\; \gamma$$
• The LRT works not only for scalar observations (e.g. a spike count from a single neuron), but also for multidimensional observations (e.g. temporal discharge patterns and ensembles of neurons).
• The decision rule is invariant to monotonic transformations (e.g. logarithm).
• For independent observations (e.g. simultaneous observations from multiple neurons, or multiple observation intervals from the same neuron), the log likelihood ratio is additive:
$$\log LR(R_1, R_2, \ldots, R_N) = \sum_{i=1}^{N} \log LR(R_i)$$
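To illustrate the additivity of the log likelihood ratio across independent observations, here is a small simulation (all parameters are illustrative assumptions): the per-neuron log LRs of Poisson counts are summed, and the error rate is compared for one versus eight neurons.

```python
import numpy as np

rng = np.random.default_rng(1)
lam0, lam1, trials = 20.0, 24.0, 20000   # assumed mean counts per neuron
w = np.log(lam1 / lam0)                  # per-spike log-LR weight

def p_error(n_neurons):
    c0 = rng.poisson(lam0, (trials, n_neurons))   # counts under S0
    c1 = rng.poisson(lam1, (trials, n_neurons))   # counts under S1
    # Summed log LR > 0 (equal priors) <=> total count above a threshold
    thr = n_neurons * (lam1 - lam0) / w
    return 0.5 * ((c0.sum(axis=1) > thr).mean()       # false alarms
                  + (c1.sum(axis=1) <= thr).mean())   # misses

print(p_error(1), p_error(8))   # pooling 8 neurons cuts the error rate
```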
Detection and False Alarm Probabilities
• The neural response (in general, the LR) is compared to a criterion γ to make a decision about which stimulus was presented
• Two types of errors: "misses" and "false alarms"
 – P_F is the probability of false alarm
 – P_D is the probability of detection; P_M = 1 − P_D is the probability of a miss
$$P(\text{Error}) = P(S_0)\,P_F + P(S_1)\,(1 - P_D)$$
(Figure: conditional densities P(R|S0) and P(R|S1) with the criterion γ; the areas beyond γ give P_F and P_D. Poisson example.)
Receiver Operating Characteristic (ROC)
(Figure: conditional probabilities P(R|S0) and P(R|S1), Poisson example, and the resulting ROC curves tracing P_D against P_F as the criterion is swept.)
• The area under the ROC curve gives a distribution-free measure of performance
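A short sketch (with assumed Poisson means) that traces the ROC by sweeping a count criterion and integrates the area under it:

```python
import numpy as np
from scipy.stats import poisson

lam0, lam1 = 10.0, 15.0              # assumed mean counts under S0 and S1
gammas = np.arange(0, 61)            # count criteria: respond S1 if R >= gamma
PF = poisson.sf(gammas - 1, lam0)    # P(R >= gamma | S0): false-alarm rate
PD = poisson.sf(gammas - 1, lam1)    # P(R >= gamma | S1): detection rate

# PF decreases as gamma grows, so reverse the arrays for the trapezoid rule
auc = np.trapz(PD[::-1], PF[::-1])
print(f"ROC area = {auc:.3f}")       # distribution-free performance measure
```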
d’ as a measure of performance
• If the conditional probabilities P(R|S0) and P(R|S1) are Gaussian with equal variances, then d’ = ∆r/σ completely determines the performance (the ROC curve). The probability of a correct response in a two-interval task is
$$P_C = \frac{1}{\sqrt{2\pi}} \int_{-d'/\sqrt{2}}^{\infty} e^{-z^2/2}\, dz$$
• Many probability distributions (including the Poisson) approach a Gaussian when the mean response becomes moderately large.
• If so, performance in non-Gaussian cases can be approximated by d’. If, as in the Poisson case, the variances are unequal under the two alternatives, they can be averaged.
• d’ is a poor measure of performance when the number of spikes is very small or when the two stimuli are widely separated.
Visual discrimination of random-dot patterns by monkeys and cortical neurons
• Single-unit recordings from Area MT (middle temporal) of awake macaques
• Single-unit responses and behavioral performance were recorded simultaneously
• Performance of typical neuron matches behavioral performance
Figures removed due to copyright reasons. Please see: Newsome, W. T., K. H. Britten, C. D. Salzman, and J. A. Movshon. "Neuronal mechanisms of motion perception." Cold Spring Harbor Symp Quant Biol 55 (1990): 697-705.
Forward masking in the auditory nerve?
Figures removed due to copyright reasons.
Please see: Relkin, E. M., and C. W. Turner. "A reexamination of forward masking in the auditory nerve." J Acoust Soc Am 84, no. 2 (Aug 1988): 584-91.
Psychophysical masked thresholds grow much faster with masker level than neural thresholds. Maximum masking can reach 30-50 dB.
Performance based on spike count from single AN fiber severely deviates from Weber’s law
(Figure: discharge rate (sp/s) and intensity DL (dB) as functions of intensity (dB SPL, 0-100 dB), for an auditory-nerve fiber and for a hypothetical neuron verifying Weber's law. Poisson statistics assumed.)
The “lower envelope principle”
Somatosensory: detection of sinusoidal vibration (Mountcastle, 1972). Auditory: pure tone detection.
Figures removed due to copyright reasons. Please see: Delgutte, B. "Physiological models for basic auditory percepts." In Auditory Computation, edited by H. Hawkins and T. McMullen. New York, NY: Springer-Verlag, 1996, pp. 157-220.
Psychophysical performance is determined by the most sensitive neurons in the population.
Optimal pooling of information across neurons
• Performance in discriminating two stimuli can be improved by combining information across neurons. Specifically, if the spike counts from N neurons are either Poisson or Gaussian and statistically independent, the optimum combination rule is to form a weighted sum of the spike counts, with weights
$$w_i = \frac{\Delta r_i}{\sigma_i^2}$$
The discriminability index for this optimum combination is given by $d'^2 = \sum_i d_i'^2$.
• The structure of the optimum detector is identical to that of a single-layer perceptron in artificial neural networks. The weights can be interpreted as the strengths of synaptic connections, and the threshold device as the threshold for all-or-none spike discharges in a postsynaptic neuron.
• When the responses of the different neurons are not statistically independent, the benefits of combining information may be better or worse than in the independent case, depending on the nature of the correlations.
Delgutte (unpublished)
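A minimal numerical sketch of the combination rule above (the ∆r and σ values are made up): under the optimal weighting, the per-neuron d′ values add in quadrature.

```python
import numpy as np

dr = np.array([3.0, 2.0, 1.0])       # assumed mean-count differences
sd = np.array([4.0, 2.5, 2.0])       # assumed SDs (equal under both stimuli)

w = dr / sd**2                            # optimal weights w_i = dr_i / var_i
d_single = dr / sd                        # d' of each neuron alone
d_pooled = np.sqrt(np.sum(d_single**2))   # d'^2 = sum of d'_i^2
print(w, d_single, f"pooled d' = {d_pooled:.2f}")
```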
Rate-Place Model for Intensity Discrimination (Siebert)
• Siebert (1968) developed the first model for predicting the performance of a sensory system based on the activity in primary neurons. His model incorporated all the key features of AN activity known at the time, including logarithmic cochlear frequency map, cochlear tuning, saturating rate-level functions and Poisson discharge statistics.
• The model predicts constant performance in pure-tone intensity discrimination over a wide range of intensities (Weber's law) by relying on the unsaturated fibers on the skirts of the cochlear excitation pattern.
• Psychophysical experiments (Viemeister, 1983) have since ruled out this model, because listeners show good performance even when masking noise restricts information to a narrow band around the tone frequency.
Spread of excitation to remote cochlear places makes model predict Weber’s Law…
(Figure: discharge rate (sp/s) vs. characteristic frequency (100 Hz - 10,000 Hz) excitation patterns for tones at 50 dB and 80 dB SPL.)
…but performance severely degrades in band-reject noise.
(Figure: discharge rate (sp/s) vs. characteristic frequency (Hz) excitation pattern in band-reject noise.)
Rate vs. Temporal Codes
• Temporal code can convey more information than rate code
• Distinction between rate and temporal codes not that easy:
• Rate has to change when new stimulus occurs. How fast can it change and still be called “rate code”?
• Poisson process completely specified by instantaneous firing rate. If instantaneous rate tracks stimulus waveform, is it a rate or a temporal code?
From: Rieke, Fred, David Warland, Rob de Ruyter van Steveninck, and William Bialek. Spikes: Exploring the Neural Code. Cambridge, MA: MIT Press/Bradford Books, 1997. © Used with permission.
Rate-Place Code
• Narrow tuning functions
• Neurons are topographically organized to form a map of the stimulus parameter
• Each stimulus activates a restricted population of neurons
• The stimulus is estimated from the location of maximum activity along the map
• Examples:
 – Retinal map of space
 – Tonotopic map
 – ITD map in the barn owl (Jeffress model)?
A distributed code: the population vector (Georgopoulos, 1986)
• Broad tuning functions around a preferred direction
• A topographic map is not necessary
• Each stimulus activates the entire neural population to some degree
• Each neuron's response can be seen as a vector, with angle equal to the neuron's best direction and magnitude equal to its firing rate
• The vector sum across the neural population accurately points to the stimulus direction
• Example: direction of arm movement in motor cortex.
• Could apply to any angular dimension, e.g. sound azimuth.
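A small sketch of the population-vector readout (cosine tuning and uniformly spaced preferred directions; all numbers are illustrative assumptions):

```python
import numpy as np

prefs = np.linspace(0, 2 * np.pi, 16, endpoint=False)   # preferred directions
stim = np.deg2rad(75.0)                                  # true stimulus direction
rates = np.maximum(0.0, 10 + 20 * np.cos(stim - prefs))  # broad cosine tuning

# Each neuron contributes a vector along its preferred direction, with
# magnitude equal to its firing rate; the vector sum points at the stimulus.
vx = np.sum(rates * np.cos(prefs))
vy = np.sum(rates * np.sin(prefs))
print(f"decoded direction = {np.degrees(np.arctan2(vy, vx)):.1f} deg")  # ~75
```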
Entropy of a Random Variable
Definition:
$$H(X) = -\sum_{x \in X} p(x)\,\log_2 p(x)$$
• Entropy is expressed in bits (binary choices)
• For a uniform probability distribution (p(x) = 1/N for all x), the entropy is the logarithm (base 2) of the number N of possible values of X. All other distributions have a lower entropy.
• The entropy represents the number of independent binary choices required to uniquely specify the value of a random variable, i.e. it quantifies the “uncertainty” about that variable. It is also the average length of the binary string of 1’s and 0’s required to encode the variable.
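A two-line check of the definition (the distributions are chosen arbitrarily): the uniform distribution over N values attains log2 N bits, and any other distribution comes out lower.

```python
import numpy as np

def entropy_bits(p):
    """H(X) = -sum p(x) log2 p(x), with 0*log(0) treated as 0."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

print(entropy_bits([0.25, 0.25, 0.25, 0.25]))   # uniform, N=4 -> 2.0 bits
print(entropy_bits([0.7, 0.1, 0.1, 0.1]))       # non-uniform -> ~1.36 bits
```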
Entropy rates in sounds
For a continuous signal source, the entropy rate (in bits/s) is
$$R = \lim_{T \to \infty} \frac{1}{T} H(T)$$
• CD-quality audio: 44,100 Hz × 16 bits × 2 channels = 1.4 Mb/s
• MP3: 32-128 kb/s
• Telephone: 30 kb/s
• Vocoder: 2.4-9.6 kb/s
• Nonsense speech: 8 phonemes/s × log2(32) bits/phoneme = 40 b/s (an overestimate, because phonemes are not equiprobable)
• This lecture: 2 bits/3600 s ≈ 0.0006 b/s
• What about neuron firings?
Entropy rate of spike trains
Spike times (binomial distribution, MacKay & McCulloch, 1952):
$$R_T \approx r \,\log_2\!\left(\frac{e}{r\,\Delta t}\right)$$
where r is the average firing rate and ∆t the temporal resolution. Example: R_T = 950 bits/s (3.2 bits/spike) for r = 300 spikes/s, ∆t = 1 ms.

Spike counts (exponential distribution – optimal):
$$R_R = \left[\log_2(1+\bar{n}) + \bar{n}\,\log_2(1 + 1/\bar{n})\right] / \,T$$
where n̄ = rT is the mean spike count and T is the recording time. Example: R_R = 26 bits/s (0.1 bits/spike) for r = 300 spikes/s, T = 300 ms.
The entropy rate of a spike train gives an upper bound on the information about the stimulus that can be transmitted by the spike train.
Figures removed due to copyright reasons. Please see: Rieke, F., D. A. Bodnar, and W. Bialek. "Naturalistic stimuli increase the rate and efficiency of information transmission by primary auditory afferents." Proc Biol Sci 262, no. 1365 (December 22, 1995): 259-65. (Courtesy of The Royal Society. Used with permission.)
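Plugging the lecture's example numbers into the two formulas above (a direct check, no new assumptions):

```python
import numpy as np

r, dt = 300.0, 1e-3                       # firing rate (sp/s), resolution (s)
R_T = r * np.log2(np.e / (r * dt))        # spike-time entropy rate
print(f"R_T = {R_T:.0f} bits/s = {R_T / r:.1f} bits/spike")   # ~954, ~3.2

T = 0.3                                   # counting window (s)
n = r * T                                 # mean spike count
R_R = (np.log2(1 + n) + n * np.log2(1 + 1 / n)) / T
print(f"R_R = {R_R:.0f} bits/s = {R_R / r:.2f} bits/spike")   # ~26, ~0.09
```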
www.bsscommunitycollege.in www.bssnewgeneration.in www.bsslifeskillscollege.in
24www.onlineeducation.bharatsevaksamaj.net www.bssskillmission.in
WWW.BSSVE.IN
Conditional Entropy and Mutual Information
(Venn diagram: H(S) and H(R) overlap in the mutual information I(S,R); the non-overlapping parts are H(S|R) and H(R|S).)
• The entropy H(S) represents the uncertainty about the stimulus in the absence of any other information
• The conditional entropy H(S|R) represents the remaining stimulus uncertainty after the neural response has been measured
• I(S,R) = H(S)–H(S|R) is the mutual information between S and R; it represents the reduction in uncertainty achieved by measuring R
• If S and R are statistically independent, then I(S,R)=0 and H(S,R)=H(S)+H(R)
• By symmetry, I(S,R) = I(R,S) = H(R)–H(R|S)
Entropy and Information:
• Bayes' theorem: $p(s,r) = p(s|r)\,p(r)$
• Entropy of S: $H(S) = -\sum_i p(s_i)\,\log_2 p(s_i)$
• Joint entropy of R and S: $H(R,S) = -\sum_i \sum_j p(s_i, r_j)\,\log_2 p(s_i, r_j)$
• Conditional entropy of S given R, or stimulus equivocation: $H(S|R) = -\sum_j p(r_j) \sum_i p(s_i|r_j)\,\log_2 p(s_i|r_j)$
• Conditional entropy of R given S, or neuronal noise: $H(R|S) = -\sum_j p(s_j) \sum_i p(r_i|s_j)\,\log_2 p(r_i|s_j)$
• Equivalent forms for the average information:
 – I(R,S) = H(R) − H(R|S)
 – I(R,S) = H(S) − H(S|R)
 – I(R,S) = H(R) + H(S) − H(R,S)
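The identities above are easy to verify numerically on a toy joint distribution p(s, r) (the values are chosen arbitrarily):

```python
import numpy as np

def H(p):
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

p_sr = np.array([[0.30, 0.10, 0.05],     # p(s, r): rows = stimuli,
                 [0.05, 0.10, 0.40]])    # columns = responses
p_s, p_r = p_sr.sum(axis=1), p_sr.sum(axis=0)

I = H(p_r) + H(p_s) - H(p_sr)            # I = H(R) + H(S) - H(R,S)
H_R_given_S = H(p_sr) - H(p_s)           # chain rule: H(R,S) = H(S) + H(R|S)
H_S_given_R = H(p_sr) - H(p_r)
assert np.isclose(I, H(p_r) - H_R_given_S)
assert np.isclose(I, H(p_s) - H_S_given_R)
print(f"I(S,R) = {I:.3f} bits")
```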
Direct method for computing mutual information between stimulus and neural response
• The most obvious method for measuring I(S,R) would be to subtract H(S|R) from H(S). But H(S|R) is hard to measure, because it requires estimating the stimulus from the neural response.
• Trick: by symmetry, I(S,R) is also H(R) − H(R|S). H(R|S) is the entropy of the part of the neural response that is NOT predictable from the stimulus, i.e. the noise in the response.
Method:
• Present stimulus set S with probability P(S). Measure the neural response to many presentations of each stimulus.
• Estimate P(R) for the entire stimulus set and compute H(R) from P(R).
• To estimate the noise in the response, average the responses to all presentations of the same stimulus, and subtract the average from the response on each trial. Compute H(R|S) from the estimated noise distribution.
• Subtract H(R|S) from H(R).
• Assumptions about the noise distribution (e.g. Gaussian) can be made to simplify the estimation of probabilities.
• In practice, this method becomes prohibitive in its data requirements for large dimensionality, e.g. when computing information for a population of neurons.
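A minimal simulation of the direct method for the simplest case (two stimuli with assumed Poisson spike counts): estimate H(R) from the pooled response histogram and H(R|S) from the per-stimulus histograms, then subtract.

```python
import numpy as np

rng = np.random.default_rng(2)

def hist_entropy(x, bins):
    p, _ = np.histogram(x, bins=bins)
    p = p[p > 0] / p.sum()
    return -np.sum(p * np.log2(p))

lams, trials = [5.0, 12.0], 5000          # assumed mean counts; repeats/stimulus
counts = [rng.poisson(lam, trials) for lam in lams]
bins = np.arange(0, 31)

H_R = hist_entropy(np.concatenate(counts), bins)                # total entropy
H_R_given_S = np.mean([hist_entropy(c, bins) for c in counts])  # noise entropy
print(f"I(S,R) ~ {H_R - H_R_given_S:.2f} bits (at most 1 bit for 2 stimuli)")
```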
Classic Example (Shannon): Information Rate of a Gaussian Channel
S(t) → [+ N(t)] → R(t)
$$R_{INFO} = \int_0^W \log_2\!\left(1 + \frac{S(f)}{N(f)}\right) df$$
Both the stimulus S(t) and the additive noise N(t) have Gaussian probability distributions, which are fully characterized by the power spectra S(f) and N(f).
Figures removed due to copyright reasons. Please see: Borst, A., and F. E. Theunissen. "Information theory and neural coding." Figure 2 in Nat Neurosci 2, no. 11 (Nov 1999): 947-57.
The information rate is entirely determined by the available bandwidth W and the signal-to-noise ratio S/N.
Telephone channel: S/N = 30 dB, W = 3000 Hz ⇒ R_INFO ≈ 30 kb/s
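The telephone figure follows directly from the formula with flat spectra:

```python
import numpy as np

snr = 10 ** (30 / 10)                # S/N = 30 dB as a power ratio
W = 3000.0                           # bandwidth (Hz)
print(f"{W * np.log2(1 + snr) / 1e3:.1f} kb/s")   # ~29.9 kb/s
```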
Natural and synthetic songs are coded more efficiently than noise in zebra finch auditory neurons
Neural recordings from 3 locations:
• MLd: auditory midbrain
• Field L: primary cortex
• CM: secondary cortex involved in song processing
3 types of stimuli:
• Song: 20 natural songs from male zebra finches
• Syn-song: mimics the spectral and temporal modulations in song
• ML-noise: noise with a limited range of modulations
Hsu, A., S. M. Woolley, T. E. Fremouw, and F. E. Theunissen. "Modulation power and phase spectrum of natural sounds enhance neural encoding performed by single auditory neurons." Figures 1A, 4, and 5 in J Neurosci 24, no. 41 (Oct 13, 2004): 9201-11. (Copyright 2004 Society for Neuroscience. Used with permission.)
Using the stimulus reconstruction method to estimate stimulus information in spike trains
• Computing I(S,R) directly from the definition often requires too much data.
• Data processing theorem: if Z = f(R), then I(S,Z) ≤ I(S,R). "Data processing cannot increase information."
 – Special case: if Ŝ is an estimate of S based on the neural response R, then I(S,Ŝ) ≤ I(S,R).
Method:
• Use a Gaussian stimulus S(t) and compute a linear estimate Ŝ(t) from the neural response.
• Define the noise as N(t) = S(t) − Ŝ(t), assumed to be Gaussian.
• Compute the power spectra of N(t) and Ŝ(t).
• Use the formula for the information rate of a Gaussian channel to estimate R_INFO.
• The method gives a lower bound on the information rate. The better the stimulus estimate and the Gaussian assumption, the tighter the bound.
From: Rieke, Fred, David Warland, Rob de Ruyter van Steveninck, and William Bialek. Spikes: Exploring the Neural Code. Cambridge, MA: MIT Press/Bradford Books, 1997. © Used with permission.
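A sketch of the final step only (the spectra below are stand-ins; in practice they come from the measured reconstruction error): apply the Gaussian-channel formula to the stimulus and error spectra to get the lower bound.

```python
import numpy as np

f = np.linspace(0.0, 200.0, 201)          # analysis band in Hz (illustrative)
S_f = 1.0 / (1 + (f / 50.0) ** 2)         # assumed stimulus power spectrum
N_f = 0.3 * np.ones_like(f)               # assumed error (noise) spectrum

R_info = np.trapz(np.log2(1 + S_f / N_f), f)   # bits/s, a lower bound on I
print(f"R_INFO >= {R_info:.0f} bits/s")
```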
Linear reconstruction of a stimulus from the spike train (Bialek et al., 1991)
• Given a stimulus waveform s(t) and a spike train r(t), what is the linear filter h(t) which gives the least-squares estimate of s(t) from r(t)?
$$\hat{s}(t) = h(t) * r(t) = \sum_i h(t - t_i), \qquad \text{find } h(t) \text{ to minimize } E = \int_0^T \big(s(t) - \hat{s}(t)\big)^2\, dt$$
• The solution is given (in the frequency domain) by the Wiener filter:
$$H(f) = \frac{S_{rs}(f)}{S_r(f)}$$
 – H(f) is the Fourier transform of h(t), i.e. the filter frequency response; S_r(f) is the power spectrum of the spike train; and S_rs(f) is the cross-spectrum between r(t) and s(t).
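Below is a self-contained sketch of the whole pipeline, with a toy Poisson encoder (every parameter is invented for illustration): estimate S_rs(f) and S_r(f) from averaged periodograms, form H(f) = S_rs(f)/S_r(f), and apply it to reconstruct the stimulus.

```python
import numpy as np

rng = np.random.default_rng(0)
fs, n = 1000.0, 2 ** 14                  # sample rate (Hz), record length
s = np.convolve(rng.standard_normal(n), np.ones(20) / 20, mode="same")

# Toy encoder: Poisson-like spikes whose rate tracks the stimulus
rate = np.clip(50 * (1 + 0.8 * s / s.std()), 0, None)   # sp/s
r = (rng.random(n) < rate / fs).astype(float)           # 0/1 spike train

# Estimate cross- and power spectra by averaging over segments
nseg = 32
nfft = n // nseg
S_rs = np.zeros(nfft, dtype=complex)
S_r = np.zeros(nfft)
for k in range(nseg):
    seg = slice(k * nfft, (k + 1) * nfft)
    R = np.fft.fft(r[seg] - r[seg].mean())
    S = np.fft.fft(s[seg] - s[seg].mean())
    S_rs += np.conj(R) * S / nseg        # cross-spectrum between r and s
    S_r += np.abs(R) ** 2 / nseg         # power spectrum of the spike train

H = S_rs / np.maximum(S_r, 1e-12)        # Wiener filter, frequency domain
s_hat = np.concatenate([
    np.fft.ifft(np.fft.fft(r[k * nfft:(k + 1) * nfft]
                           - r[k * nfft:(k + 1) * nfft].mean()) * H).real
    for k in range(nseg)])
print("corr(s, s_hat) =", round(np.corrcoef(s, s_hat)[0, 1], 2))
```

The per-segment FFT filtering uses circular convolution, which is adequate for a sketch; a careful implementation would window and overlap the segments.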
Example of linear stimulus reconstruction from a spike train
Figures removed due to copyright reasons. Please see: Borst, A., and F. E. Theunissen. "Information theory and neural coding." Figure 4 in Nat Neurosci 2, no. 11 (Nov 1999): 947-57.
• Fly visual system ("H1 neuron"), moving grating stimulus
• The reconstruction resembles a lowpass-filtered version of the stimulus
Bullfrog advertisement call (waveform and power spectrum)
Male bullfrogs produce an advertisement call to convey information about location and breeding readiness to both sexes.
Figure by MIT OCW. From: Rieke, Fred, David Warland, Rob de Ruyter van Steveninck, and William Bialek. Spikes: Exploring the Neural Code. Cambridge, MA: MIT Press/Bradford Books, 1997. © Used with permission.
Information rates in bullfrog primary auditory fibers are higher for call than for noise
• Linear stimulus reconstruction method was used to estimate (a lower bound on) information rate for both broadband Gaussian noise and call-shaped noise.
• Information rates are higher for call-shaped noise (40-250 bits/s) than for broadband noise (10-50 bits/s).
Figures removed due to copyright reasons. Please see: Rieke, F., D. A. Bodnar, and W. Bialek. "Naturalistic stimuli increase the rate and efficiency of information transmission by primary auditory afferents." Proc Biol Sci 262, no. 1365 (December 22, 1995): 259-65. (Courtesy of The Royal Society. Used with permission.)
Coding efficiency in bullfrog primary auditory fibers
$$\text{Coding Efficiency} = \frac{\text{Information Rate}}{\text{Entropy Rate of the Spike Train}}$$
• The coding efficiency is a measure of the fraction of the information contained in the spike train that is used to code the stimulus.
• Coding efficiency ranged from 0.05 to 0.2 for the broadband stimulus, and from 0.35 to 0.9 for the call-shaped noise.
• These high coding efficiencies were achieved despite the poor quality of the stimulus reconstructions.
Figures removed due to copyright reasons. Please see: Rieke, F., D. A. Bodnar, and W. Bialek. "Naturalistic stimuli increase the rate and efficiency of information transmission by primary auditory afferents." Proc Biol Sci 262, no. 1365 (December 22, 1995): 259-65. (Courtesy of The Royal Society. Used with permission.)
Information rates and coding efficiencies in sensory neurons

| Neural system and species | Information rate (bits/spike) | Information rate (bits/s) | Coding efficiency | Best timing precision; variance:mean |
|---|---|---|---|---|
| Constant stimulus | | | | |
| Cat retinal ganglion cells | ~0.04-0.10 | 0.4-0.8 | - | - |
| Primary visual cortex (V1) of rhesus monkeys | - | 0.62 | - | - |
| Middle temporal area (MT) of rhesus monkeys | 0.025 | 0.89 | - | ~1.3 (variance:mean) |
| Inferior temporal area (IT) of rhesus monkeys | ~0.13 | 0.29 | - | - |
| Hippocampus of rhesus monkeys | ~0.18 | 0.9; 0.32 (max = 1.2) | - | - |
| Variable stimulus: reconstruction method | | | | |
| H1 motion-sensitive neuron of a fly | 0.75 | 64 | 30% | ~2 ms |
| Frog auditory afferents | 0.66 | 23 | 11% | - |
| Vibratory receptors of the bullfrog sacculus | 2.6 | 155 | 50-60% | ~0.4 ms |
| Cricket mechanoreceptors | 0.6-3.2 | 75-294 | 50-60% | - |
| Salamander retinal ganglion cells | 1.9 | 3.7 (up to 10 for a population of >10 cells) | 26% (>79% for >10 cells) | 2-4 ms |
| MT of anesthetized rhesus monkeys | ~0.65 | 6.7 (max = 12.3) | - | - |
| MT of alert rhesus monkeys | 0.6 | 5.5 | <30% | - |
| Variable stimulus: direct method | | | | |
| H1 motion-sensitive neuron of a fly | 2.43 | 80 | 50% | 1.5-3 ms; <0.1 |
| Salamander and rabbit retinal ganglion cells | 3.7 | 16.3 | 59% | >0.70 ms; >0.05 |
| MT of alert rhesus monkeys | 1.5 | 12 (max = 29) | up to 45% | <2 ms; ~1.4 |

Table by MIT OCW (see Buracas and Albright, 1999, in the Selected References).
Information Theory: Pros and Cons
Pros
• Does not assume any particular neural code
• Can be used to identify the stimulus features best encoded by neurons, or to compare the effectiveness of different putative neural codes
• One number summarizes how well the stimulus set is coded in the neural response
Cons
• The information estimate depends on the stimulus set, and stimulus probabilities in the environment are hard to specify.
• Does not specify how to read out the code: the code might be unreadable by the rest of the nervous system.
• For all but the simplest examples, estimation of mutual information requires a huge amount of data. Methods that try to circumvent data limitations (e.g. stimulus reconstruction) make additional assumptions (e.g. linearity or Gaussian distributions) that are not always valid.
Selected References
Slide 25: Borst, A., and F. E. Theunissen. "Information theory and neural coding." Nat Neurosci 2, no. 11 (Nov 1999): 947-57.
Slide 35: Buracas, G. T., and T. D. Albright. "Gauging sensory representations in the brain." Trends Neurosci 22, no. 7 (Jul 1999): 303-9.
Harvard-MIT Division of Health Sciences and Technology
HST.722J: Brain Mechanisms for Hearing and Speech
Course Instructor: Joseph S. Perkell
Motor Control of Speech: Control Variables and Mechanisms
HST 722, Brain Mechanisms for Hearing and Speech
Joseph S. Perkell MIT
11/2/05
Outline
• Introduction
• Measuring speech production
• What are the “controlled variables” for segmental (phonemic) speech movements?
• Segmental motor programming goals
• Producing speech sounds in sequences
• Experiments on feedback control
• Summary
Outline
• Introduction
 – Utterance planning
 – General physiological/neurophysiological features
 – The controlled systems
 – Example of movements of vocal-tract articulators
• Measuring speech production
• What are the “controlled variables” for segmental speech movements?
• Segmental motor programming goals
• Producing speech sounds in sequences
• Experiments on feedback control
• Summary
Utterance Planning
• Objective: generate an intelligible message while providing for “economy of effort” – stages:
 – Form the message (e.g. feel hungry; smell pizza; together with a friend).
 – Select and sequence lexical items (words): “Do you want a pizza?”
 – Assign a syntactically-governed prosodic structure.
 – Determine “postural” parameters of overall rate, loudness and degree of reduction (and settings that convey emotional state, etc.)
  • Extreme reduction: “Dja wanna pizza?”
 – Determine temporal patterns. Sound segment durations depend on:
  • Phoneme length
  • Overall rate
  • Intrinsic characteristics of sounds
  • Position and number of syllables in the word
• Result: an ordered sequence of goals for the production mechanism
Serial Ordering
• Evidence reflecting serial ordering in utterance planning: speech errors
 – Examples from Shattuck-Hufnagel (1979):
  • Substitution: “Anymay” (anyway)
  • Exchange: “emeny” (enemy)
  • Shift: “bad highvway dri_ing” (highway driving)
  • Addition: “the plublicity would be” (publicity)
  • Omission: “sonata _umber ten” (number)
  • ?: “dignal sigital processing” (digital signal processing)
 – See Averbeck et al. for neurophysiological evidence concerning serial ordering
General Physiological/Neurophysiological Features
• Muscles are under voluntary control
• Structures contain feedback receptors that supply sensory information to the CNS:
 – Surfaces: touch/pressure
 – Muscles:
  • Length and length changes: spindles
  • Tension: tendon organs
 – Joints (TMJ): joint angle
• Reflex mechanisms:
 – Stretch
 – Laryngeal (coughing)
 – Startle
• Motor programs (low-level, “hard-wired” neural pattern generators)
 – Breathing
 – Swallowing
 – Chewing
 – Sucking
• Low-level circuitry could be employed in speech motor control. The picture is complex, and a comprehensive account hasn't emerged.
(Figure by MIT OCW: midsagittal anatomy of the vocal tract and respiratory system – nasal cavity, lips, teeth, tongue, soft palate, pharynx, epiglottis, hyoid, larynx with vocal cords and the thyroid, cricoid and arytenoid cartilages, Adam's apple, trachea, esophagus, lungs, diaphragm.)
The Controlled Systems
• The respiratory system
 – Most massive, slowly-moving structures
 – Provides the energy for sound production
  • Fluctuations help signal emphasis
  • Relatively constant level of subglottal pressure
 – Different patterns of respiration: breathing, reading aloud, spontaneous speech, counting
 – Different muscles are active at different phases of the respiratory cycle – a complex, low-level motor program
• The larynx
 – Smallest structures, most rapidly contracting muscles
 – Voicing, turned on and off segment-by-segment
 – F0, breathiness – suprasegmental regulation
• The vocal tract
 – Intermediate-sized, slowly moving structures: tongue, lips, velum, mandible
 – Many muscles do not insert on hard structures
 – Can produce sounds at rates up to 15/s
 – To do so, the movements are coarticulated
Figures removed due to copyright reasons. Please see: Conrad, B., and P. Schonle. "Speech and respiration." Arch Psychiatr Nervenkr 226, no. 4 (1979): 251-68.
Focus of lecture is on movements of vocal-tract articulators
(Figure: midsagittal view labeling the velum, lips, tongue blade, tongue body, mandible, hyoid bone and larynx.)
• Consider the movements of each of these structures
• Approximate number of muscle pairs that move the:
 – Tongue: 9
 – Velum: 3
 – Lips: 12
 – Mandible: 7
 – Hyoid bone: 10
 – Larynx: 8
 – Pharynx: 4
• Not including the respiratory system
• Observations:
 – A large number of degrees of freedom
 – A very complicated control problem
Outline
• Introduction
• Measuring speech production
 – Acoustics
 – Articulatory movement
 – Area functions
• What are the “controlled variables” for segmental speech movements?
• Segmental motor programming goals
• Producing speech sounds in sequences
• Experiments on feedback control
• Summary
Measuring Speech Production
• Acoustics – important for perception
 – Spectral, temporal and amplitude measures
 – Vowels, liquids and glides: time-varying patterns of formant frequencies
 – Consonants: noise bursts, silent intervals, aspiration and frication noises, rapid formant transitions
Figure removed due to copyright reasons (“The yacht was a heavy one”). Please see: Stevens, K. Acoustic Phonetics. Cambridge, MA: MIT Press, 1998, p. 248. ISBN: 026219404X.
• Movements
 – From x-ray tracings. From: Perkell, Joseph S. Physiology of Speech Production: Results and Implications of a Quantitative Cineradiographic Study. Research Monograph No. 53. Cambridge, MA: MIT Press, 1969. © Used with permission.
 – With an Electro-Magnetic Midsagittal Articulometer (EMMA) system: points on the tongue, lips, jaw, (velum). Reprinted with permission from: Perkell, J., M. Cohen, M. Svirsky, M. Matthies, I. Garabieta, and M. Jackson. "Electro-magnetic midsagittal articulometer (EMMA) systems for transducing speech articulatory movements." J Acoust Soc Am 92 (1992): 3078-3096. Copyright 1992, Acoustical Society of America.
• Other parameters: air pressures and flows, muscle activity …
EMMA Data Collection
• Transducer coils are placed on the subject's articulators
• The subject reads text from an LCD screen
• Movement and audio signals are digitized and displayed in real time
• Signals are processed and data are extracted and analyzed
Analysis of EMMA Data
• Algorithmic data extraction at the time of the minimum in absolute velocity during the vowel:
 – Vowel formants
 – Articulatory positions (x, y)
(Figure: audio signal and velocity magnitude (cm/s, 800-1400 ms) for the tongue-dorsum (TD) transducer during an utterance of the form “Say hud, hid it”, with extraction points at the velocity minima during the vowels.)
3-Dimensional Area Function Data
(Figure panels: transverse (horizontal), coronal (vertical) and midline sections of sustained vowels, including /i/ and /u/, in 2 speakers.)
• MR images of sustained vowels (Baer et al., JASA 90: 799-828)
 – Area functions are more complicated than they look in 2 dimensions
 – There are lateral asymmetries, but 2-D midsagittal movement data provide useful information
Reprinted with permission from: Baer, T., J. C. Gore, L. C. Gracco, and P. W. Nye. "Analysis of vocal tract shape and dimensions using magnetic resonance imaging: Vowels." J Acoust Soc Am 90 (1991): 799-828. Copyright 1991, Acoustical Society of America.
Outline
• Introduction
• Measuring speech production
• What are the “controlled variables” for segmental (phonemic) speech movements?
 – Possible controlled variables
 – Modeling to make the problem approachable: DIVA
 – A schematic view of speech movements
• Segmental motor programming goals
• Producing speech sounds in sequences
• Experiments on feedback control
• Summary
“What are the controlled variables?”
• The question has theoretical and practical implications
 – What are the fundamental motor programming units and the most appropriate elements for phonological/phonetic theory?
 – What domains should be the main focus of research for diagnosis and treatment of speech disorders?
Figure removed due to copyright reasons (“The yacht was a heavy one”). Please see: Stevens, K. Acoustic Phonetics. Cambridge, MA: MIT Press, 1998, p. 248.
• Objective of the speaker:
 – To produce sound strings with acoustic patterns that result in intelligible patterns of auditory sensations in the listener
• Acoustic/auditory cues depend on the type of sound segment:
 – Vowels and glides: time-varying patterns of formant frequencies
 – Consonants: noise bursts, silent intervals, aspiration and frication noises, rapid formant transitions
Possible Motor Control Variables
(Figure by MIT OCW: tongue, jaw and hyoid bone, with the genioglossus and hyoglossus muscles.)
• Auditory characteristics of speech sounds are determined by:
 1. Levels of muscle tension
 2. Changing muscle lengths and movements of structures
 3. The vocal-tract shape (area function)
 4. Aerodynamic events and aeromechanical interactions
 5. The acoustic properties of the radiated sound
• Hypothetically, motor control variables could consist of feedback about any combination of the above parameters
Modeling to Make the Problem Approachable: DIVA
“Directions Into Velocities of Articulators” (Guenther and colleagues – next lecture)
(Block diagram: a speech sound representation projects to auditory and somatosensory goal regions; auditory and somatosensory errors drive feedback-based commands, which combine with a feedforward command to the speech muscles; feedback is also used to update the feedforward commands.)
• A neuro-computational model of the relations among cortical activity, motor output and sensory consequences
• Phonemic goals: projections (mappings) from premotor to sensory cortex that encode the expected sensory consequences of produced speech sounds
 – Correspond to regions in multidimensional auditory-temporal and somatosensory-temporal spaces
• The roles of the feedforward and feedback subsystems will be discussed later; a toy sketch of the loop follows below.
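The sketch below is a deliberately minimal toy of the control loop in the diagram. The function name, variables and gains are inventions for illustration, not DIVA's published equations: the motor command is a feedforward term corrected by auditory and somatosensory errors.

```python
def motor_command(ff, aud_goal, aud_fb, som_goal, som_fb,
                  g_aud=0.5, g_som=0.5):
    """Toy feedforward + feedback combination (gains are assumptions)."""
    aud_err = aud_goal - aud_fb      # auditory error signal
    som_err = som_goal - som_fb      # somatosensory error signal
    return ff + g_aud * aud_err + g_som * som_err

# With accurate feedforward commands the error terms stay near zero;
# perturb the auditory feedback and the command compensates:
print(motor_command(1.0, 2.0, 2.0, 0.0, 0.0))   # 1.0 (no correction needed)
print(motor_command(1.0, 2.0, 1.6, 0.0, 0.0))   # 1.2 (feedback correction)
```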
A Schematic View of Speech Movements
• Planned and actual acoustic trajectories illustrate:
 – Auditory/acoustic goal regions
 – Economy of effort (Lindblom)
 – Coarticulation
 – Motor equivalence
 – Biomechanical saturation (quantal) effects
• When controlling an articulatory speech synthesizer, DIVA accounts for the first four, plus
 – Aspects of acquisition
 – Responses to perturbations
(Figure by MIT OCW: schematic trajectories for two repetitions in articulator space (lip protrusion vs. tongue-body and tongue-blade height) and in acoustic space (acoustic dimension 1, e.g. F1, vs. time). The panels show acoustic goal regions for /i/, /u/ and /g/ (closure); economy of effort; a biomechanical effect (inertia) causing actual trajectories to lag the planned trajectory; a saturation effect (stiffened tongue blade contacting the sides of the palatal vault); the effect of a biomechanical constraint (closure) on actual trajectories; and a trading relation between lip protrusion and tongue-body height.)
Outline
• Introduction
• Measuring speech production
• What are the “controlled variables” for segmental speech movements?
• Segmental motor programming goals
 – Anatomical and acoustic constraints: quantal effects
 – Individual differences – anatomy
 – Motor equivalence: a strategy to stabilize acoustic goals
 – Clarity vs. economy of effort
 – Relations between production and perception
  • Vowels
  • Sibilants
• Producing speech sounds in sequences
• Experiments on feedback control
• Summary
Anatomical and Acoustic Constraints on Articulatory Goals
• Properties of speakers' production and perception mechanisms help to define goals for speech sounds that are used in speech motor planning
• Some of these properties are characterized by quantal effects (Stevens), which can also be called “saturation effects”
(Figure by MIT OCW: acoustic parameter vs. articulatory parameter, with stable regions I and III separated by a rapid transition in region II.)
• Schematic example: a continuous change in an articulatory parameter produces two regions of acoustic stability, separated by a rapid transition
• Hypothesis: some goals are auditory and can be characterized in terms of acoustic parameters: formant frequencies, relative sound level, etc.
• Languages “prefer” such stable regions
• The use of those regions by individual speakers helps to produce relatively robust acoustic cues with imprecise motor commands
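A toy numerical illustration of the schematic above (the sigmoid is an arbitrary stand-in for the articulatory-to-acoustic mapping): the same articulatory imprecision produces far less acoustic variation on a plateau (regions I and III) than in the transition (region II).

```python
import numpy as np

x = np.linspace(-3, 3, 601)         # articulatory parameter
y = 1 / (1 + np.exp(-4 * x))        # acoustic parameter (assumed toy mapping)
slope = np.gradient(y, x)           # acoustic sensitivity to articulation

# Sensitivity on a plateau (region I/III) vs. in the transition (region II)
print(slope[np.argmin(np.abs(x + 2))], slope.max())   # ~0.001 vs. ~1.0
```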
Goals for the vowel /i/ – A. An acoustic saturation (quantal) effect for constriction location (Stevens, 1989)
(Figure by MIT OCW: formant frequencies (kHz) of a two-tube vocal-tract model (cavity lengths L1, L2; areas A1, A2) vs. length of the back cavity L (cm), 6-12 cm, with the stable range highlighted in green.)
• There is a range of back cavity lengths over which F1-F3 are relatively stable.
• Many repetitions of /i/ in two subjects show a corresponding variation of constriction location.
• However, as reflected in the articulatory data, the formants of /i/ are sensitive to variation in constriction degree.
Reprinted with permission from: Perkell, J. S., and W. L. Nelson. "Variability in production of the vowels /i/ and /a/." J Acoust Soc Am 77 (1985): 1889-1895. Copyright 1985, Acoustical Society of America.
Quantal and non-quantal articulatory-to-acoustic relations for /i/ and /a/
Figures removed. Reprinted with permission from: Perkell, J. S., and W. L. Nelson. "Variability in production of the vowels /i/ and /a/." J Acoust Soc Am 77 (1985): 1889-1895. Copyright 1985, Acoustical Society of America.
A biomechanical saturation effect for constriction degree for /i/
(Figures by MIT OCW: formant frequencies F1-F3 (kHz) vs. percent contraction of the posterior genioglossus (GGp), for a stiff vs. a lax tongue blade; and tongue cross-sections showing the direction of posterior genioglossus contraction.)
• Constriction degree and the resulting formants can be stabilized by:
 – Stiffening the tongue blade (with intrinsic muscles)
 – Pressing the stiffened tongue blade against the sides of the hard palate through contraction of the posterior genioglossus (GGp) muscles
• The constriction area (shaded) varies little, even with variation in GGp contraction (from a 3D tongue model by Fujimura & Kakita)
Reprinted with permission from: Perkell, J. S., and W. L. Nelson. "Variability in production of the vowels /i/ and /a/." J Acoust Soc Am 77 (1985): 1889-1895. Copyright 1985, Acoustical Society of America.
Tongue Contour Differences Among Four Speakers
(Figure by MIT OCW: midsagittal tongue contours for the vowels ee, I, e, ae, er, uh, a, o, oo and u, with reference coordinates C1-C8. Additional figures removed due to copyright reasons.)
• Note the tongue contour differences among the four speakers.
• An “auditory-motor theory of speech production” (Ladefoged et al., 1972)
Effects of Different Palate Shapes on Vowel Articulations
• Palatal shapes differ among individuals
• Palatal depth can influence spatial differences in vowel targets (/i/, /I/)
Figures removed due to copyright reasons (figures by MIT OCW). Please see: Perkell, J. S. "On the nature of distinctive features: Implications of a preliminary vowel production study." In Frontiers of Speech Communication Research, edited by B. Lindblom and S. Öhman. London, UK: Academic Press, 1979, pp. 372-373.
Production of /u/
• Contractions of the styloglossus and posterior genioglossus
• Note: the place of constriction, and the variation in constriction location
Figure by MIT OCW: a schematic illustration of articulatory data for multiple repetitions of the vowels /a/, /i/ and /u/ (points 1-7 along the tongue surface, dorsal to ventral), showing elliptical distributions of the positions of points on the tongue surface, with the long axes of the ellipses oriented parallel to the vocal-tract midline.
Stabilizing the Sound Output for the Vowel /u/: Motor Equivalence
• Hypothesis: a negative correlation between tongue-body raising and lip protrusion in multiple repetitions of the vowel
(Figures by MIT OCW: tongue y position (cm) vs. upper-lip x position (cm) for many repetitions, showing the trading relation; a midsagittal outline with the palate and transducers UL, LL, LI, TB; and the schematic acoustic goal region for /u/ (acoustic dimension 1, e.g. F1) with planned and actual trajectories for two repetitions.)
• The hypothesis is supported in a number of subjects
• The goal for the articulatory movements for /u/ is in an acoustic/auditory frame of reference, not a spatial one
• Strategy: stay just within the acoustic goal region
Palatal Depth and Motor Equivalence
(Figure by MIT OCW: transducer positions (cm) for Subject 2 and Subject 6, with palate outlines and transducers UL, LL, LI and TB; the two subjects differ in palatal depth.)
• Palatal depth can also influence variability in movement toward vowel targets
Motor Equivalence for /r/
• Speakers use similar articulatory trading relations when producing /r/ in different phonetic contexts (Guenther, Espy-Wilson, Boyce, Matthies, Perkell, and Zandipour, 1999, JASA)
• The acoustic effect of the longer front cavity of the blue outlines is compensated by the effect of the longer and narrower constriction of the red outlines (e.g., Stevens, 1998).
• F3 variability is greatly decreased by these articulatory trading relations.
• Conclusion: the movement goal for /r/ is a low value of F3 – an auditory/acoustic goal
(Figure: midsagittal tongue outlines for subjects S1-S7 producing /warav/ or /wabrav/ vs. /wagrav/.)
Reprinted with permission from: Guenther, F., C. Espy-Wilson, S. Boyce, M. Matthies, M. Zandipour, and J. Perkell. "Articulatory tradeoffs reduce acoustic variability during American English /r/ production." J Acoust Soc Am 105 (1999): 2854-2865. Copyright 1999, Acoustical Society of America.
Clarity vs. Economy of Effort: another principle (continuous, as opposed to quantal) that influences vowel categories (Lindblom, 1971)
Figures removed due to copyright reasons.
• Used an articulatory synthesizer and heuristics to estimate the locations of vowels in F1-F2 space, based on
 – A compromise between “perceptual differentiation” and “articulatory ease”, and
 – The number of vowels in the language
• Approximated vowel distributions for languages containing up to about 7 vowels
• Later discussed in terms of a tradeoff between clarity and economy of effort, i.e., a relation between production and perception
Relations between Production and Perception
• Close linkage between production and perception:
 – Speech acquisition, with and without hearing
 – Speech of cochlear implant users
 – Second-language learning (e.g., Bradlow et al.)
 – Focused studies of production and perception (e.g., Newman)
 – Mirror neurons – a more general action-perception link (e.g., Fadiga et al.)
• Hypothesis:
 – Speakers who discriminate well between vowel sounds with subtle acoustic differences will produce more clear-cut sound contrasts
 – Speakers who are less able to discriminate the same sound stimuli will produce less clear-cut contrasts
Production Experiment
• Data collection
 – Subjects: 19 young-adult speakers of American English
 – For each subject:
  • Recorded articulatory movements and the acoustic signal
  • Subject pronounced “Say ___ hid it.”; ___ = cod, cud, who'd or hood
  • Clear, Normal and Fast conditions
• Analysis
 – Calculated a contrast distance for each vowel pair:
  • Articulatory (TB) contrast distance: distance in mm between the centroids of the cod and cud TB distributions
  • Acoustic contrast distance: distance in Hz between the centroids of the (F1, F2) distributions for cod and cud
(Figure: transducer positions (mm) – UL, LL, LI, TT, TB, TD – relative to the palate outline, with the cud and cod TB distributions marked.)
Reprinted with permission from: Perkell, J. S., F. H. Guenther, H. Lane, M. L. Matthies, E. Stockmann, M. Tiede, and M. Zandipour. "The distinctness of speakers' productions of vowel contrasts is related to their discrimination of the contrasts." J Acoust Soc Am 116 (2004): 2338-44. Copyright 2004, Acoustical Society of America.
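A sketch of the two contrast-distance measures described above. The coordinates below are fabricated placeholders, not the study's data; the computation is just the Euclidean distance between category centroids in articulatory (x, y) space and in (F1, F2) space.

```python
import numpy as np

# Hypothetical tongue-body (x, y) positions in mm for two vowels
tb_cod = np.array([[-1.0, 2.0], [-1.2, 2.1], [-0.9, 1.9]])
tb_cud = np.array([[3.0, 4.0], [3.2, 4.2], [2.8, 3.9]])
artic_dist = np.linalg.norm(tb_cod.mean(axis=0) - tb_cud.mean(axis=0))  # mm

# Hypothetical (F1, F2) values in Hz for the same tokens
f_cod = np.array([[730.0, 1090.0], [750.0, 1110.0]])
f_cud = np.array([[640.0, 1190.0], [620.0, 1210.0]])
acoust_dist = np.linalg.norm(f_cod.mean(axis=0) - f_cud.mean(axis=0))   # Hz

print(f"articulatory contrast = {artic_dist:.1f} mm, "
      f"acoustic contrast = {acoust_dist:.0f} Hz")
```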
Perception Experiment
• Methods
 – Synthesized natural-sounding stimuli in 7-step continua for cod-cud and who'd-hood (male voice)
 – Each subject: labeling and discrimination (ABX) tasks
• Results: ABX scores (2-step)
 – Ceiling effects: some subjects at 100% probably had better discrimination than could be measured
 – For further analysis, subjects were divided into two groups:
  • HI discriminators – at 100% (above the median)
  • LO discriminators – at the median and below
(Figure: histograms of the number of subjects vs. 2-step ABX percent correct (70-100%) for the cod-cud and who'd-hood continua.)
Reprinted with permission from: Perkell, J. S., F. H. Guenther, H. Lane, M. L. Matthies, E. Stockmann, M. Tiede, and M. Zandipour. "The distinctness of speakers' productions of vowel contrasts is related to their discrimination of the contrasts." J Acoust Soc Am 116 (2004): 2338-44. Copyright 2004, Acoustical Society of America.
Results & Conclusions
• HI-discrimination subjects produced greater contrast distances than LO-discrimination subjects (measured in articulation or acoustics)
• The more accurately a speaker discriminates a vowel contrast, the more distinctly the speaker produces the contrast
• * The difference between the HI and LO groups is significant at p < .001
(Figure: articulatory contrast distance (mm, 0-10) and acoustic contrast distance (Hz, 200-400) for the who'd-hood and cod-cud contrasts, for the LO (L) and HI (H) groups, in the Fast (F), Normal (N) and Clear (C) speaking conditions; asterisks mark significant HI-LO differences.)
Reprinted with permission from: Perkell, J. S., F. H. Guenther, H. Lane, M. L. Matthies, E. Stockmann, M. Tiede, and M. Zandipour. "The distinctness of speakers' productions of vowel contrasts is related to their discrimination of the contrasts." J Acoust Soc Am 116 (2004): 2338-44. Copyright 2004, Acoustical Society of America.
A Possible Explanation
• It is advantageous to be as intelligible as possible
• Children will acquire goal regions that are as distinct as possible
 – Speakers who can perceive fine acoustic details learn auditory goal regions that are smaller and spaced further apart than those of speakers with less acute perception, because
 – The speakers with more acute perception are more likely to reject poorly produced tokens when learning the goal regions
Consonants: a saturation effect for /s/ may help define the /s/-/ʃ/ contrast
• Production of /ʃ/ (as in “shed”)
 – Relatively long, narrow groove between the tongue blade and palate
 – Sublingual space
• Production of /s/ (as in “said”)
 – Short narrow groove
 – No sublingual space
• Saturation effect for /s/
 – As the tongue moves forward from /ʃ/, the sublingual cavity volume decreases
 – When the tongue contacts the lower alveolar ridge, the sublingual cavity is eliminated and the resonant frequency of the anterior cavity increases abruptly
 – After contact, muscle activity can increase further; the output is unchanged
Figures removed due to copyright reasons. Please see: Perkell, J. S., M. L. Matthies, M. Tiede, H. Lane, M. Zandipour, N. Marrone, E. Stockmann, and F. H. Guenther. "The distinctness of speakers' /s/-/ʃ/ contrast is related to their auditory discrimination and use of an articulatory saturation effect." J Speech, Language and Hearing Res 47 (2004): 1259-69.
36112/05
www.bsscommunitycollege.in www.bssnewgeneration.in www.bsslifeskillscollege.in
73www.onlineeducation.bharatsevaksamaj.net www.bssskillmission.in
WWW.BSSVE.IN
Relations Between Production and Perception of Sibilants
• Hypothesis: the sibilants /s/ and /ʃ/ have two kinds of sensory goals:
  – Auditory: a particular distribution of energy in the noise spectrum
  – Somatosensory: e.g., patterns of contact of the tongue blade with the palate and teeth
• Speakers will vary in their ability to discriminate /s/ from /ʃ/
• Speakers use contact of the tongue tip with the lower alveolar ridge for /s/ to help differentiate /s/ from /ʃ/
  – This will also vary across speakers
• Across speakers, both factors (the ability to discriminate auditorily between the two sounds, and the use of contact, a possible somatosensory goal) will predict the strength of the produced contrast
Figures removed due to copyright reasons. Please see figures in: Perkell, J. S., et al. (2004), J Speech Lang Hear Res 47: 1259-69.
Methods (with the same 19 subjects as the vowel study)
• Production experiment – each subject:
  – Recorded: the acoustic signal, and contact of the underside of the tongue tip with the lower alveolar ridge (with a custom-made sensor)
  – Subject pronounced "Say ___ hid it.", where ___ = sod, shod, said, or shed
  – Clear, Normal, and Fast conditions
• Analysis – calculated:
  – Proportion of time contact was made during the sibilant interval
  – Spectral median for /s/ and /ʃ/
  – Acoustic contrast distance: the difference in spectral median between /s/ and /ʃ/
• Perception experiment – each subject:
  – Labeled and discriminated (ABX) synthesized stimuli from a seven-step said-shed continuum
Figures removed due to copyright reasons. Please see figures in: Perkell, J. S., et al. (2004), J Speech Lang Hear Res 47: 1259-69.
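To make the spectral-median measure concrete, here is a minimal Python sketch; the sampling rate, analysis window, and band limits are illustrative assumptions, not values from the paper. It computes the frequency that splits a fricative frame's spectral energy in half, and the acoustic contrast distance as the difference between the /s/ and /ʃ/ medians.

```python
# Minimal sketch: spectral median of a fricative frame, and the acoustic
# contrast distance between /s/ and /sh/ tokens. Parameter values are
# illustrative assumptions, not taken from Perkell et al. (2004).
import numpy as np

def spectral_median(frame, fs, fmin=500.0, fmax=10000.0):
    """Return the frequency (Hz) below which half the spectral energy lies."""
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(len(frame)))) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs)
    band = (freqs >= fmin) & (freqs <= fmax)   # restrict to the fricative band
    cum_energy = np.cumsum(spectrum[band])
    idx = np.searchsorted(cum_energy, cum_energy[-1] / 2.0)
    return float(freqs[band][idx])

fs = 22050
s_frame = np.random.randn(1024)    # stand-in for a real /s/ noise frame
sh_frame = np.random.randn(1024)   # stand-in for a real /sh/ noise frame
contrast_hz = spectral_median(s_frame, fs) - spectral_median(sh_frame, fs)
```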
Results: use of tongue-to-lower-ridge contact and discrimination
Figures removed due to copyright reasons. Please see figures in: Perkell, J. S., et al. (2004), J Speech Lang Hear Res 47: 1259-69.
• Discrimination
  – Nine subjects had percent correct = 100; categorized as HI discriminators
  – Ten subjects had percent correct < 100; categorized as LO discriminators
• Use of contact
  – 12 subjects (left of the vertical line in the figure) are classified as Strong (S) for use of a contact difference between /s/ and /ʃ/
  – The remaining subjects are classified as Weak (W) for use of a contact difference
• Produced contrast distance is related to
  – Ability to discriminate the contrast
  – Use of a contact difference
  (* difference is significant, p < .01)
• Interactions
  – Speakers with both good discrimination and use of a contact difference: best contrasts
  – Speakers with one factor or the other: intermediate contrasts
  – Speakers with neither factor: poorest contrasts
Figures removed due to copyright reasons. Please see figures in: Perkell, J. S., et al. (2004), J Speech Lang Hear Res 47: 1259-69.
Outline (break time?)
• Introduction
• Measuring speech production
• What are the "controlled variables" for segmental speech movements?
• Segmental motor programming goals
• Producing speech sounds in sequences
  – An example utterance
  – Movements show context dependence
    • Velar movements
    • Lip rounding for /u/
  – Effects of speaking rate
  – Persistence of inaudible gestures at word boundaries
• Experiments on feedback control
• Summary
Producing Sounds in Sequences
• An example utterance (see the table below)
• * indicates articulations that aren't strongly constrained by communicative needs
• Articulations anticipate upcoming requirements: anticipatory coarticulation
• Coarticulation:
  – asynchronous movements of structures of differing sizes and movement time constants
  – a complicated motor coordination task
Table by MIT OCW: approximate time course of articulator actions during an example utterance containing the consonants /k/, /m/, /b/, and /r/.
• Tongue body: rise to contact the roof of the mouth to achieve /k/ closure & silence; release contact to generate a noise burst; move down to the vowel position; begin movement toward the /r/ position; maintain position.
• Tongue blade: maintain contact with the floor of the mouth to stay out of the way; begin retroflexion or bunching in anticipation of /r/; maintain the retroflexed or bunched configuration.
• Lips: begin spreading for the vowel; maintain position for the vowel, then begin moving toward closure for /m/; achieve & maintain closure; maintain closure through /b/; release rapidly & round somewhat for /r/.
• Mandible: move upward to support tongue movement; move downward to support tongue movement; move upward to support lower-lip movement; move downward slightly to aid lip release.
• Soft palate: maintain closure to contain the pressure buildup for /k/; begin downward movement to open the velopharyngeal port for /m/; begin the closing movement toward the onset of /b/; reach closure at the right instant to begin the pressure buildup; move upward during /r/.
• Vocal-tract walls: stiffen to contain the air pressure buildup; relax, perhaps expand actively, to allow continuation of voicing for /b/.
• Vocal folds: abduct maximally for /k/, with peak opening occurring at release; adduct to the position for voicing; maintain position.
• Tension on vocal folds: begin to raise tension to signal stress on the following vowel; achieve maximum tension for the F0 peak that signals stress; lower tension to lower F0; maintain tension.
• Respiratory system: increase subglottal air pressure to obtain a burst release for the /k/; maintain subglottal air pressure for increased sound level to signal stress; return to the previous value of subglottal pressure; maintain pressure.
What happens when sounds are produced in sequences?
• When individuals speak to one another, additional forces are at play
  – Articulatory movements from one sound to another are influenced by dynamical factors: canonical targets are very rarely reached
  – The speaker knows that the listener can fill in a great deal of missing information, so "reduction" takes place (see Introduction)
  – Speaking style (casual, clear, rapid, etc.) can vary
• The amount of variation can depend on the situation and the interlocutor (a familiar speaker of the same language?)
Movements Show Context Dependence
• Coarticulation
  – At any moment in time, the current state of the vocal tract reflects the influence of preceding sounds (perseveratory coarticulation) and upcoming sounds (anticipatory coarticulation)
  – Such coarticulation is a property of any kind of skilled movement (e.g., tennis, piano playing, etc.)
  – It makes it possible to produce sounds in rapid succession (up to about 15/sec), with smooth, economical movements of slowly-moving structures
• During the /æ/ in "camping" (Kent)
(Figure by MIT OCW: superimposed vocal-tract tracings.)
Effects of Coarticulation and Speaking Rate on Velar Movements
• The velum has to be raised to contain the air pressure increase of obstruent consonants
  – Its height during the /t/ is context (vowel) dependent: coarticulation
  – In the context of a nasal consonant, vowels in American English can be nasalized due to coarticulation
  – This is possible because vowel nasalization isn't contrastive in American English
• The velum (like most other vocal-tract structures) is slowly moving
  – At higher speaking rates, its movements become attenuated
Figure removed due to copyright reasons. From: Perkell, Joseph S. Physiology of Speech Production: Results and Implications of a Quantitative Cineradiographic Study. Research Monograph No. 53. Cambridge, MA: MIT Press, 1969. Used with permission.
Coarticulation of Lip Rounding for /u/
Figure by MIT OCW. From: Perkell, Joseph S. Physiology of Speech Production: Results and Implications of a Quantitative Cineradiographic Study. Research Monograph No. 53. Cambridge, MA: MIT Press, 1969. Used with permission.
• Lip rounding in production of the vowel /u/ in /hə'tu/
  – The first three sounds in the utterance are neutral with respect to lip rounding
  – The lips are fully protruded before the utterance begins
  – Coarticulation takes place whenever it doesn't interfere with transmission of the message
  – It crosses syllable and word boundaries
  – Movements of different structures are asynchronous
Coarticulation and Acoustic Effects (Gay, 1973, J. Phonetics 2: 255-266)
Figures removed due to copyright reasons.
• Cineradiographic measurements: anticipatory and perseveratory coarticulation
  – Tongue movement from /i/ to /a/ can start during the intervening consonant because it can't be heard and it isn't constrained physically
  – The consonants have effects only on the pellet position for the /a/ (not /i/ or /u/)
  – The pellet is at an acoustically critical constriction in the vocal tract for /i/ and /u/, but not for /a/
• Note the vertical variation for /a/ (possible because constriction location is not critical; cf. the quantal nature of speech)
Effects of Speaking Rate
Perkell, J., and M. Zandipour. "Economy of effort in different speaking conditions. II. Kinematic performance spaces for cyclical and speech movements." J Acoust Soc Am 112 (2002): 1642-51.
• Cyclical movements:
  – Higher rates show decreased movement durations and distances, and increased speed (a measure of effort)
• Speech vs. cyclical movements:
  – Compared to cyclical movements, speech movements generally are faster, larger, and shorter, perhaps because they have well-defined phonetic targets
• Vowels produced in fast vs. clear speech:
  – Larger dispersions, and goal-region edges that are closer together; less distinct from one another
(Figure: ellipses indicating the range of formant frequencies (+/- 1 s.d.) used by a speaker to produce five vowels during fast speech (light gray) and clear speech (dark gray) in a variety of phonetic contexts.)
Reprinted with permission. Copyright 2002, Acoustical Society of America.
Persistence of Inaudible Gestures at Word Boundaries
U. Tokyo X-ray µ-beam (Fujimura et al., 1973)
Figure removed due to copyright reasons.
• List production: "perfect, memory"
• Phrasal production: "perfec(t) memory"
• Phrasal: the /m/ closure overlaps the /t/ release, making it inaudible; the /t/ gesture is present nevertheless (cf. Browman & Goldstein; Saltzman & Munhall)
• Findings replicated and expanded with 21 speakers
• Explanation (DIVA): frequently used phonemes, syllables, and words become encoded as feedforward command sequences
Outline
• Introduction
• Measuring speech production
• What are the "controlled variables" for segmental speech movements?
• Segmental motor programming goals
• Producing speech sounds in sequences
• Experiments on feedback control
  – DIVA: feedback & feedforward control
  – Long-term effects: hearing loss and restoration
  – An example of abrupt hearing and then motor loss
  – Responses to perturbations, auditory and articulatory
    • "Steady state" perturbations
    • Gradually increasing perturbations
    • Abrupt, unanticipated perturbations
  – Feedback vs. feedforward mechanisms in error correction
• Summary
Feedback and Feedforward Control in DIVA
(Diagram: a speech sound's representation drives (1) a feedforward command to the speech muscles and (2) feedback-based commands; auditory feedback is compared with the auditory goal region to form (3) an auditory error, somatosensory feedback is compared with the somatosensory goal region to form (4) a somatosensory error, and these errors update the feedforward command.)
• With acquisition, control becomes predominantly feedforward
• Feedback control uses error detection and correction to teach, refine, and update feedforward control mechanisms
• Experiments can shed light on
  – sensory goals
  – error correction
  – mappings between motor/sensory and acoustic/auditory parameters
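As a rough illustration of the control scheme sketched above, the following Python snippet combines a learned feedforward command with auditory and somatosensory feedback-based corrections, and gradually folds those corrections back into the feedforward command. This is a minimal sketch, not the DIVA model's actual equations; all gains and dimensions are assumptions.

```python
# Minimal sketch of mixed feedforward/feedback control in the spirit of the
# diagram above. Not DIVA's actual equations; gains are illustrative.
import numpy as np

class MixedController:
    def __init__(self, dim=2, alpha=0.1, g_aud=0.5, g_som=0.5):
        self.ff = np.zeros(dim)        # learned feedforward command
        self.alpha = alpha             # learning rate for feedforward updates
        self.g_aud, self.g_som = g_aud, g_som

    def step(self, aud_goal, aud_fb, som_goal, som_fb):
        # Feedback-based command from auditory and somatosensory errors
        fb_cmd = (self.g_aud * (np.asarray(aud_goal) - aud_fb)
                  + self.g_som * (np.asarray(som_goal) - som_fb))
        # Corrections gradually transfer into feedforward control
        self.ff += self.alpha * fb_cmd
        return self.ff + fb_cmd        # total motor command
```

With repeated productions the feedback term shrinks as the feedforward command absorbs the correction, matching the slide's point that control becomes predominantly feedforward with acquisition.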
Learning and Maintaining Phonemic Goals: Use of Auditory Feedback
• Audition is crucial for normal speech acquisition
• Postlingual deafness: intelligible speech, but with some abnormalities
• Regaining some hearing with a cochlear implant (CI): users usually show parallel improvements in perception, production, and intelligibility
• Acoustic measures of the contrast between /l/ and /r/, 6 months after receiving a CI
(Figures by MIT OCW: scatter plots of F3-F2 (Hz) vs. F3 transition (Hz) for /l/ and /r/ tokens, pre-CI and post-CI; the /l/ and /r/ clusters separate more clearly post-implant.)
• The phonemic contrast is enhanced from pre- to post-implant; this is typical for CI users, many of whom have somewhat diminished contrasts pre-implant
Long-Term Stability of Auditory-Phonemic Goals for Vowels
• Typical pre- and post-implant formant patterns are generally congruent with normative data
• Subject FA: some irregularity of F2 pre-implant (18 years after the onset of profound hearing loss)
  – One year post-implant: F2 values are more like the normative ones
• Phonemic identity doesn't change; degree of contrast can
• Goals and feedforward commands for vowels generally are stable
  – If they degrade from hearing loss, they can be recalibrated with hearing from a CI
(Figure: formant data by vowel from two cochlear implant users, FA and FB.)
Perkell, J., H. Lane, M. Svirsky, and J. Webster. "Speech of cochlear implant patients: A longitudinal study of vowel production." J Acoust Soc Am 91 (1992): 2961-2979. Reprinted with permission. Copyright 1992, Acoustical Society of America.
Long-Term Stability of Phonemic Goals for Sibilants in CI Users
• Subjects 1 and 3: good distinctions between /s/ and /ʃ/ pre-implant
  – Typical, even decades following the onset of hearing loss
• Subject 2: reversed values and distorted productions pre-implant
  – After about 6 months of implant use, sibilant productions improved
• These precisely differentiated articulations are usually maintained for years without hearing
  – Possibly because of the use of somatosensory goals, e.g., the pattern of contact between tongue, teeth, and palate
(Figure: spectral median (acoustic COG) for /ʃ/ and /s/.)
Matthies, M. L., M. A. Svirsky, H. Lane, and J. S. Perkell. "A preliminary study of the effects of cochlear implants on the production of sibilants." J Acoust Soc Am 96 (1994): 1367-1373. Reprinted with permission. Copyright 1994, Acoustical Society of America.
Responses to Abrupt Changes in Hearing and Motor Innervation
An NF2 patient with sudden hearing loss, followed by some motor loss
• Two surgical interventions
  – OHL: onset of a significant hearing loss (especially spectral) from removal of an acoustic neuroma
  – Hypoglossal nerve transposition surgery → some tongue weakness
• /s/-/ʃ/ contrast: good until the second surgery, when the contrast collapsed
• Hypothesis: feedforward mappings were invalidated by the transposition surgery
  – Without spectral auditory feedback, compensatory adaptation (relearning) was impossible, as might be possible with hearing
  – The somatosensory goal deteriorated without auditory reinforcement
(Figure by MIT OCW: spectral median (Hz, ~3200-7200) for /s/ and /ʃ/ vs. weeks (-20 to 120) in the NF2 patient, with OHL and the hypoglossal transposition surgery marked.)
Vowel Contrasts and Hearing Status
• Compare English-speaking with Spanish-speaking CI users, with the CI processor OFF and ON
• Previous findings: contrasts increase with hearing, decrease without
• Hypothesis: because of the more crowded vowel space in English, turning the CI processor OFF and ON will produce more consistent decreases and increases in vowel contrasts in English than in Spanish
(Figure by MIT OCW: F1 vs. F2 vowel spaces for an English-speaking CI user (M3) and a Spanish-speaking CI user (M5) with the processor ON and OFF, with P&B English reference values.)
• Average Vowel Spacing (AVS): a measure of overall vowel contrast
• Change of AVS from processor ON to processor OFF (for 24 hours)
  – AVS decreases for the English speaker, increases for the Spanish speaker
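A simple way to picture the AVS measure is as a mean pairwise distance between vowel centers in formant space. The Python sketch below is illustrative only: the specific vowels, formant values, and Euclidean metric are assumptions, not the published definition.

```python
# Sketch of Average Vowel Spacing (AVS) as the mean pairwise distance
# between vowel means in (F1, F2) space. All values below are made up.
import numpy as np
from itertools import combinations

def avs(vowel_means):
    """vowel_means: dict mapping vowel label -> (F1, F2) mean in Hz."""
    pts = list(vowel_means.values())
    dists = [np.hypot(a[0] - b[0], a[1] - b[1])
             for a, b in combinations(pts, 2)]
    return float(np.mean(dists))

on_means  = {"i": (300, 2300), "a": (750, 1200), "u": (320, 900)}
off_means = {"i": (340, 2150), "a": (720, 1250), "u": (350, 1000)}
delta_avs = avs(on_means) - avs(off_means)   # predicted > 0 for English CI users
```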
AVS – by subject
• Prediction: AVS increases with the CI processor on (hearing)
• Changes follow the predicted pattern more consistently for English than for Spanish speakers
(Figure by MIT OCW: change in Average Vowel Spacing (Hz, -350 to +350) by subject for English speakers (F1-F5, M1-M4, M7) and Spanish speakers (M5, M6, M8, M9), for OFF-to-ON and ON-to-OFF switches, with bars marked Predicted or Not Predicted.)
Modeling Contrast Changes: Clarity vs. Economy of Effort
• DIVA contains a parameter that changes the sizes of all goal regions simultaneously, to control speaking rate and clarity (e.g., AVS)
• Shrinking goal-region size, like what English speakers do with hearing, produces increased clarity (contrast distance) and decreased dispersion
  – Without hearing, economy of effort dominates
• With fewer vowels in Spanish, clarity demands aren't as stringent
  – Acceptable contrasts may be produced regardless of hearing status, without changing goal-region size
(Diagram: an acoustic/auditory parameter vs. time for a C1-V1-C2-V2 sequence; smaller goal regions yield increased contrast distance.)
Effect of Varying S/N in Auditory Feedback
(Figures by MIT OCW: standardized AVS (A), duration (D), and SPL (S) vs. standardized, binned NSR, averaged across 6 normal-hearing subjects and across CI subjects at 1 month and at 1 year post-implant.)
• Normal-hearing and CI subjects heard their own vowel productions mixed with increasing amounts of noise
• In general, AVS increased, then decreased with increasing N/S
• Possible explanation: with increasing N/S,
  – If auditory feedback is sufficient, clarity is increased
  – As feedback becomes less useful, economy of effort predominates
• Similar result for the /s/-/ʃ/ contrast, but with the peak at a lower NSR
Bite Block Experiments
Figures removed due to copyright reasons.
• Speakers compensate fairly well with the mandible held at unusual degrees of opening
• Compensations may be better for quantal vowels (with better-defined articulatory targets)
• Presumably, the speakers' mappings are not as accurate for the perturbed condition
• Compensation continues to improve, possibly with the help of auditory feedback
Mappings Can Be Temporarily Modified: Auditory Feedback (Houde and Jordan)
• Sensorimotor adaptation
• Methods:
  – Fed-back (whispered) vowel formants were gradually shifted
  – 16 ms delay
  – Subjects were unaware of the shift
• Results:
  – Subjects adapted to the shift by modifying productions in the opposite direction
  – The effect generalized to other consonant environments and to other vowels
  – The effect persisted in the presence of masking noise: "adaptation"
  – Adaptation was exhibited later, simply by putting subjects in the apparatus (no shift)
• Speakers use auditory goals and auditory-motor mappings
Figures removed due to copyright reasons. Please see figures in: Houde, J. F., and M. I. Jordan. "Sensorimotor adaptation of speech I: Compensation and adaptation." J Speech Lang Hear Res 45 (2002): 295-310; and Houde, John Francis. "Sensorimotor adaptation in speech production." Ph.D. thesis, Massachusetts Institute of Technology, Dept. of Brain and Cognitive Sciences, 1997.
Sensorimotor Adaptation
• Thesis project of Virgilio Villacorta; based on the work of Houde & Jordan, but with voiced vowels
• Subjects hear their own vowels with F1 perturbed and are unaware of the perturbation
(Figure: results averaged across 10 subjects.)
• Subjects partially compensate by shifting F1 in the opposite direction; the shift is formant-specific
• Mismatch between expected and produced auditory sensations → error correction
• 20 subjects: varied in the amount of compensation
• Is there a relation between perceptual acuity and amount of compensation?
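One hedged way to quantify "amount of compensation" is as the fraction of the imposed F1 shift that the speaker undoes. The sketch below is illustrative only; the function name, the baseline/hold-phase framing, and the numbers are assumptions, not the thesis's exact Adaptive Response Index.

```python
# Sketch of a compensation measure for an F1 feedback perturbation.
# Illustrative assumption, not the study's published definition.
def adaptive_response(f1_baseline, f1_hold, shift_ratio):
    """
    f1_baseline: mean produced F1 (Hz) before the perturbation
    f1_hold:     mean produced F1 (Hz) late in the perturbation phase
    shift_ratio: multiplicative feedback perturbation (e.g., 1.3 = +30% F1)
    Returns the fraction of the perturbation compensated (1.0 = full).
    """
    imposed_hz = f1_baseline * (shift_ratio - 1.0)   # Hz added to the feedback
    produced_change = f1_hold - f1_baseline          # opposes the shift if negative
    return -produced_change / imposed_hz

# Partial compensation to a +30% shift: speaker lowers F1 from 700 to 665 Hz
print(adaptive_response(f1_baseline=700.0, f1_hold=665.0, shift_ratio=1.3))  # ~0.17
```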
Relation Between Adaptation and Auditory Discrimination
• DIVA and previous studies: production goals for vowels are primarily regions in auditory space
  – Speakers with more acute auditory discrimination have smaller goal regions, spaced further apart
• Hypothesis: more acute perceivers will adapt more to the perturbation
(Figure: hypothetical compensatory responses to an F1 perturbation by high- and low-acuity speakers, shown in F1-F2 space around /ɛ/.)
• Measure of auditory acuity: JNDs on pairs of synthetic vowel stimuli (milestone = center)
• Result: the hypothesis is supported (r² = 0.250, p = 0.040)
(Figure: JND (pert) vs. Adaptive Response Index for the 0.7 and 1.3 training conditions, with first-order partial correlations among jnd, ARI, and F1 separation (jnd,ARI||F1 sep; F1 sep,ARI||jnd; jnd,F1 sep||ARI): r values include 0.661, 0.656, and -0.717; r² values 0.514, 0.436, 0.430; p-scores 0.004, 0.010, 0.021*.)
Modifying a Somatosensory-to-Motor Mapping
A "force field adaptation" experiment (Ostry & colleagues)
Figures by MIT OCW.
• Methods:
  – Velocity-dependent forces, applied (gradually) by a robotic device, act to protrude the jaw; the force is proportional to the instantaneous jaw lowering or raising velocity
  – The jaw motion path over a large number of repetitions (700) is used to assess adaptation, which may be evidence of:
    • Modification of somatosensory-motor mappings
    • Incorporation of information about dynamics in speech movement planning (Ostry's interpretation)
Results
(Figures by MIT OCW: jaw paths (horizontal vs. vertical jaw position, mm) in the null field, at initial exposure, after adaptation, and in the after-effect phase, for the speech condition and a non-speech control.)
Summary and interpretation
• Subjects adapt to a motion-dependent force field applied to the jaw during speech production
• Kinesthetic feedback alone is not sufficient for adaptation; subjects have to be in a "speech mode"
• Control signals (mappings) are updated based on differences between expected and actual feedback
• Information about dynamics is incorporated in speech motor planning
Rapid Drift of the Spectral Median for /ʃ/
• A CI "on-off" experiment
(Figure by MIT OCW: vowel SPL (~75-80 dB), tongue-blade horizontal position, and /ʃ/ spectral median (~3120-3240 Hz) vs. time (s) across alternating processor Off and On periods.)
• Observations
  – Vowel SPL increased rapidly with the CI processor off, decreased with the processor on
  – The spectral median drifted upward toward /s/ during the 1000 seconds with the processor off; surprising, since the goals are usually stable
  – Hearing one aberrant utterance when the processor was turned on, the speaker overcompensated to restore an appropriate /ʃ/
• Extremely narrow dental arches (and a movement-transducer coil on the tongue) may have made it difficult for the speaker to rely on a somatosensory goal
• He may have had to rely predominantly on auditory feedback to maintain feedforward control on an utterance-to-utterance basis
Unanticipated Acoustic Perturbations (Tourville et al., 2005)
• Methods: like sensorimotor adaptation, but with a sudden, unanticipated shift of F1
  – Subjects pronounced /CɛC/ words with auditory feedback through a DSP board
  – In 1 of 4 trials, F1 was shifted up toward /æ/ or down toward /ɪ/
• Results (averaged across 11 subjects)
  – Subjects produced a compensatory modification of F1, in the direction opposite to the shift
  – Delay of about 150 ms, compatible with other results in which F0 was shifted
• The result is compatible with error detection and correction mechanisms in DIVA
(Figure: formant difference (shifted minus unshifted, Hz) vs. time (s) for F1 shifted up and F1 shifted down.)
How long does it take for parameters to change when hearing is turned on or off?
• The subject pronounced a large number of repetitions of four 2-syllable utterances (e.g., "done shed", "don said"; quasi-random order)
• The CI processor state (hearing) was switched between on and off unexpectedly
• Example results for one subject
  – Changes were not evident until the second vowel
  – The change may be more gradual for SPL than for F0
• Results varied among parameters and subjects
  – Perhaps related to subject acuity?
(Figures: F0 (Hz, ~100-200) and SPL (dB, ~75-85) for Vowel 1 and Vowel 2 of "a shed" vs. utterance number relative to the switch (-4 to +5), for on-to-off and off-to-on switches. Apparatus: soundproof room, mixer/amplifier, PC running CI control software, speech processor, and a processor control interface box; the switch control signal came from the printer port.)
Unanticipated Movement Perturbation: Motor Responses
Figures removed due to copyright reasons. Please see: Abbs, J. H., and V. L. Gracco. "Control of complex motor gestures: orofacial muscle responses to load perturbations of lip during speech." Journal of Neurophysiology 51 (1984): 705-723.
• Abbs et al. and others (1980s)
  – In response to a downward perturbation of the lower lip during closure toward a /p/,
  – The upper lip responds with increased downward displacement, accompanied by EMG and velocity increases
  – The response is phoneme-specific
Further Observations and Interpretation (Abbs et al.)
Figures removed due to copyright reasons.
• Coordinated speech gestures are performed by "synergisms"
  – Temporarily recruited combinations of neural and muscular elements that convert a simple input into a relatively complex set of motor commands
• Motor equivalence at the muscle and movement levels
• There are alternative interpretations (Gomi et al.)
Motor and Acoustic Responses to Unanticipated Jaw Perturbations (in collaboration with David Ostry)
• A robot was used to perturb jaw movements
  – Triggered by downward movement
• 50 repetitions per utterance (e.g., "see red"; N = 100 over two 50-repetition trials)
  – 5 perturbed upward (resistive)
  – 5 perturbed downward (assistive)
• Formants begin to recover 60-90 ms after the perturbation; the jaw does not
• Two other subjects were similar
• Evidence of within-movement, closed-loop error correction
(Figure by MIT OCW: F2-F1 (Hz) and vertical jaw position (mm) vs. time (ms, ~700-950) for the assistive/down and resistive/up perturbations, with the ~10 ms perturbation marked.)
Compensatory Responses to Unexpected Palatal Perturbation (Honda, Fujino & Murano)
• A subject pronounced a phrase containing eight successive /ʃa/ syllables
• Movements and the acoustic signal were recorded
• The palatal configuration was perturbed by inflation of a small balloon on 20% of trials (randomly determined)
• Feedback conditions:
  – Feedback not blocked
  – Auditory feedback blocked with masking noise
  – Tactile feedback blocked with topical anesthesia
  – Both types of feedback blocked
• Measures
  – Articulatory compensations
  – Listener judgments of distorted sibilants
Figures removed due to copyright reasons. Please see: Honda, M., and E. Z. Murano. "Effects of tactile and auditory feedback on compensatory articulatory response to an unexpected palatal perturbation." Proceedings of the 6th International Seminar on Speech Production, Sydney, December 7-10, 2003.
Results
Figures removed due to copyright reasons.
• The perturbation caused distortions in /ʃ/ production
• Compensation and feedback:
  – With feedback not blocked, the speaker compensated within about 2 syllables
  – With auditory or tactile feedback blocked, the speaker was much less able to compensate
  – With both forms of feedback blocked, compensation was worst
• The results are compatible with
  – Sensory goals as basic units
  – Use of mismatches between expected and actual sensory consequences to correct feedforward commands

Mean error score for fricative consonant identification (%), by syllable number (1-8):
Normal auditory feedback
  Steady-state deflated:   0   0   0   0   0   0   0   0
  Inflation:              83  14   0   0   0   0   0   0
  Deflation:               8   0   0   0   0   0   0   0
  Steady-state inflated:   0   0   0   0   0   0   0   0
Masked auditory feedback
  Steady-state deflated:   0   0   0   0   0  14  11  11
  Inflation:              72  39  39  44  28  42  33  50
  Deflation:               0   0   0   3   3  11  11  17
  Steady-state inflated:  47  39  44  50  44  44  44  44
Feedforward vs. Feedback Control and Error Correction
• In DIVA, feedback and feedforward control operate simultaneously; feedforward usually predominates
• Feedback control intervenes when there is a large enough mismatch between expected and produced sensory consequences (sensorimotor adaptation results)
• Timing of correction:
  – With a long enough movement, the correction is expressed (closed loop) during the movement (e.g., "see red")
  – Otherwise, the correction is expressed in the feedforward control of following movements (e.g., the /ʃ/ spectrum, vowel SPL, and F0 when the CI is turned on)
  – Correction to an auditory perturbation takes longer than to a somatosensory perturbation (presumably due to different processing times)
• Additional examples of error correction
  – Closed-loop responses to perturbations (see Abbs, others)
  – Feedforward error correction with, e.g., dental appliances
  – Responses to combined perturbations (cf. Honda & Murano)
  – All are compatible with DIVA's use of feedback
Outline
• Introduction
• Measuring speech production
• What are the "controlled variables" for segmental speech movements?
• Segmental motor programming goals
• Producing speech sounds in sequences
• Experiments on feedback control
• Summary
Summary of Main Points
• Highest-level control variables for phonemic movements
  – Auditory-temporal and somatosensory-temporal goal regions
• Goal regions are encoded in the CNS
  – Projections (mappings) from premotor to sensory cortex: expected sensory consequences of producing speech sounds
• Goal regions are defined partly by articulatory and acoustic saturation effects that are properties of vocal-tract anatomy and acoustics
  – Most vowels: goals primarily auditory; saturation effects, acoustic
  – Consonants: both auditory and somatosensory goals; saturation effects, primarily articulatory (e.g., any consonant closure)
• Articulatory-to-acoustic motor equivalence (/u/, /r/)
  – Helps stabilize the output of certain acoustic cues
  – Evidence that goals are auditory
Summary (continued)
• Auditory feedback (CI users)
  – Used to acquire goals and feedforward commands
  – Needed to maintain appropriate motor commands through vocal-tract growth and perturbations
• Goals and feedforward commands are usually stable, even with hearing loss
• Clarity vs. economy of effort
  – The tradeoff is evident when hearing (CI) is turned on or off, and in the presence of noise
• Relations between production and perception
  – Better discriminators produce more distinct sound contrasts
  – Better discriminators may learn smaller, more distinct goal regions
• Feedback and feedforward control
  – Frequently used sounds (syllables, words) are encoded as feedforward commands
  – Responses to perturbations: intra-gesture corrections are closed loop; inter-gesture corrections are via adjustments to subsequent feedforward commands
DIVA (next lecture)
Figures removed due to copyright reasons.
• Components have hypothesized correlates in cortical activation
• Hypotheses can be tested with brain imaging
• Can quantify relations among phonemic specifications, cortical activity, movement, and the speech sound output
Questions?
SOME REFERENCES
• Abbs, J. H., and V. L. Gracco (1984). Control of complex motor gestures: orofacial muscle responses to load perturbations of lip during speech. Journal of Neurophysiology 51, 705-723.
• Bradlow, A. R., Pisoni, D. B., Akahane-Yamada, R., and Tohkura, Y. (1997). Training Japanese listeners to identify English /r/ and /l/: IV. Some effects of perceptual learning on speech production. J. Acoust. Soc. Am. 101, 2299-2310.
• Browman, C. P., and Goldstein, L. (1989). Articulatory gestures as phonological units. Phonology 6, 201-251.
• Fujimura, O., and Kakita, Y. (1979). Remarks on quantitative description of lingual articulation. In B. Lindblom and S. Öhman (eds.), Frontiers of Speech Communication Research. Academic Press.
• Fujimura, O., Kiritani, S., and Ishida, H. (1973). Computer controlled radiography for observation of movements of articulatory and other human organs. Comput. Biol. Med. 3, 371-384.
• Guenther, F. H., Espy-Wilson, C., Boyce, S. E., Matthies, M. L., Zandipour, M., and Perkell, J. S. (1999). Articulatory tradeoffs reduce acoustic variability during American English /r/ production. J. Acoust. Soc. Am. 105, 2854-2865.
• Honda, M., and Murano, E. Z. (2003). Effects of tactile and auditory feedback on compensatory articulatory response to an unexpected palatal perturbation. Proceedings of the 6th International Seminar on Speech Production, Sydney, December 7-10.
• Houde, J. F., and Jordan, M. I. (2002). Sensorimotor adaptation of speech I: Compensation and adaptation. J. Speech, Language, Hearing Research 45, 295-310.
• Lindblom, B. (1990). Explaining phonetic variation: a sketch of the H&H theory. In W. J. Hardcastle and A. Marchal (eds.), Speech Production and Speech Modeling. Dordrecht: Kluwer, 403-439.
• Newman, R. S. (2003). Using links between speech perception and speech production to evaluate different acoustic metrics: A preliminary report. J. Acoust. Soc. Am. 113, 2850-2860.
• Perkell, J. S., and Nelson, W. L. (1985). Variability in production of the vowels /i/ and /a/. J. Acoust. Soc. Am. 77, 1889-1895.
• Saltzman, E. L., and Munhall, K. G. (1989). A dynamical approach to gestural patterning in speech production. Ecological Psychology 1, 333-382.
• Stevens, K. N. (1989). On the quantal nature of speech. J. Phonetics 17, 3-45.
Selected References
Slide 6 (right figure): Denes, Peter B., and Elliot N. Pinson. The Speech Chain: The Physics and Biology of Spoken Language. New York, NY: W. H. Freeman, 1993. ISBN: 0716723441.
Slide 16 (left figure): Grant, J. C. Boileau. Grant's Atlas of Anatomy. 5th ed. Baltimore, MD: Williams & Wilkins, 1962.
Slide 18, Slide 26 (left set), Slide 27, Slide 28, Slide 55, Slide 66: Perkell, J. S., F. H. Guenther, H. Lane, M. L. Matthies, P. Perrier, J. Vick, R. Wilhelms-Tricarico, and M. Zandipour. "A theory of speech motor control and supporting data from speakers with normal hearing and with profound hearing loss." J Phonetics 28 (2000): 233-372.
Slides 20-21: Stevens, K. N. "On the quantal nature of speech." J Phonetics 17 (1989): 3-45.
Slide 23 (left): Fujimura, O., and Y. Kakita. "Remarks on quantitative description of lingual articulation." In Frontiers of Speech Communication Research, edited by B. Lindblom and S. Öhman. Academic Press, 1979.
Slide 23 (mid/lower two), Slide 26 (top right): Perkell, J. S. "Properties of the tongue help to define vowel categories: Hypotheses based on physiologically oriented modeling." Journal of Phonetics 24 (1996): 3-22.
Slide 24: Denes, P. B., and E. N. Pinson. The Speech Chain. New York, NY: W. H. Freeman, 1993, p. 68. ISBN: 0716722569.
Slide 42: Perkell, J. S. "Articulatory Processes." In The Handbook of Phonetic Sciences, edited by W. Hardcastle and J. Laver. Blackwell, 1997, pp. 333-370.
Slide 56, Slide 57: Perkell, J., W. Numa, J. Vick, H. Lane, T. Balkany, and J. Gould. "Language-specific, hearing-related changes in vowel spaces: A preliminary study of English- and Spanish-speaking cochlear implant users." Ear and Hearing 22 (2001): 461-470.
Neuroimaging Correlates of Human Auditory Behavior
HST.722J: Brain Mechanisms for Hearing and Speech
October 27, 2005
Course Instructor: Jennifer R. Melcher
Harvard-MIT Division of Health Sciences and Technology
"The Problem"
Figure by MIT OCW.
(Diagram: the relations among sound perception, neural activity, and noninvasive physiologic measures.)
The figure above illustrates how auditory evoked potentials are measured. The potential shown was evoked by a click stimulus and was recorded in a human subject. The waveform is the averaged response to many click presentations. (AEP record from R. A. Levine.) (Courtesy of Robert Aaron Levine. Used with permission.)
Furst et al. (1985), "Click lateralization is related to the β component of the dichotic brainstem auditory evoked potentials of human subjects"
For binaural clicks with different ITDs and ILDs, they quantified:
• the perception
• the binaural difference potential
Attributes of the binaural difference are correlated with the perception of binaural sound.
Binaural Difference Potential
The binaural difference (BD) is derived from BAEPs evoked by monaural and binaural stimuli (above). BAEP: brainstem auditory evoked potential.
The BD reflects an interaction between converging signals from the two ears at the level of the brainstem.
(Figure by MIT OCW. Fig. 9: Binaural, sum-of-the-monaurals, and binaural difference waveforms for cat and human. The binaural waveforms (solid lines) and the sum of the monaural waveforms (dotted lines) are superimposed; the difference between these two waveforms, the binaural difference (BD) waveform, is plotted below. The recording electrodes were vertex to nape for both species. Stimuli were 10/sec rarefaction clicks at 40 dB SL for the cat and 38 dB HL for the human. From Fullerton et al., 1987.)
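In code, the BD derivation is just a subtraction of averaged waveforms: BD = binaural response minus the sum of the two monaural responses. The sketch below uses synthetic stand-ins for averaged BAEPs; the sampling rate and waveform shapes are assumptions for illustration only.

```python
# Sketch: binaural difference (BD) waveform from averaged evoked potentials.
# The waveforms here are synthetic stand-ins, not real BAEP data.
import numpy as np

fs = 20000                                  # assumed sampling rate (Hz)
t = np.arange(0, 0.012, 1 / fs)             # 12 ms epoch, as in the figure
left_mon = np.sin(2 * np.pi * 500 * t) * np.exp(-t / 0.004)    # fake averaged AEPs
right_mon = np.sin(2 * np.pi * 500 * (t - 0.0004)) * np.exp(-t / 0.004)
binaural = 0.9 * (left_mon + right_mon)     # binaural response < monaural sum

bd = binaural - (left_mon + right_mon)      # nonzero BD -> binaural interaction
```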
Cellular Generators of the Binaural Difference Potential
(Figure by MIT OCW: cellular generators of the binaural difference potential in cat, showing the left and right superior olivary complexes (MSO, LSO), cochlear nucleus subdivisions (DCN, PVCNp, PVCNa, AVCNp, AVCNa; spherical and principal cells), trapezoid body, auditory nerve, and lateral lemniscus leading to NLL/IC. Shading indicates the generators of the first BD peak ('b') and the possible generators of the second peak ('d'); the schematic of the lower auditory system at bottom shows the generators' relationship to other cells. NLL: nuclei of the lateral lemniscus; IC: inferior colliculus. From Melcher, 1996.)
Discussion Questions:
If the generator results are combined with the findings of Furst et al., what can be said about the neural processing underlying sound lateralization and binaural fusion?
We generally think of the MSO as a coincidence detector. Are Furst et al.'s binaural difference data consistent with this idea?
Late Responses: Dependence on Attention and Stimulus Context
• Nd ("processing negativity")
  – Produced when the subject attends to the stimuli
  – Visualized by taking the difference between responses to attended and unattended stimuli
• N2 ("N200", "mismatch negativity")
  – Occurs in response to "rare" stimuli (S2 below) in the oddball paradigm
  – Can occur even when the subject is not attending to the stimuli
  – Dependent on stimulus modality (e.g., auditory vs. visual)
• P3 ("P300")
  – Occurs in response to "rare" stimuli (S2 below) in the oddball paradigm when the subject is attending to the stimuli
  – Independent of stimulus modality
Oddball procedure: two stimuli, S1 & S2, presented in a train with fixed intervals; the probabilities of S1 & S2 are unequal, and S1 & S2 are mixed randomly (e.g., S1 S1 S1 S1 S2 S1 S2 ...). Responses are averaged separately for stimulus type 1 (the frequent stimulus) and stimulus type 2 (the rare, or oddball, stimulus).
(Figure by MIT OCW: idealized auditory event-related potential evoked by transient stimuli, with components P0, N0, Na, Pa, Nb, P1, N1, P2, N2, P3 (P300), and Nd on a log time axis from 10 to 1000 ms; components dependent on stimulus context and subject attention are shown dashed. From Hillyard and Kutas, 1983; also see Hillyard et al., 1973; Donchin et al., 1978. Oddball schematic from Squires & Hecox, 1983, "Electrophysiological Evaluation of Higher Level Auditory Processing," Seminars in Hearing 4(4), p. 422; reprinted by permission; from Hall, 1992.)
Figure by MIT OCW.
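The oddball averaging itself is straightforward to sketch: epochs are averaged separately for standard and deviant trials, and the MMN/P300-bearing difference wave is the deviant average minus the standard average. The data below are synthetic stand-ins; the epoch shapes and the 10% deviant probability are assumptions.

```python
# Sketch of oddball-paradigm averaging and the difference waveform.
# Synthetic single-channel epochs; shapes and probabilities are assumptions.
import numpy as np

rng = np.random.default_rng(0)
n_trials, n_samples = 400, 300
epochs = rng.normal(size=(n_trials, n_samples))   # EEG epochs (trials x samples)
is_deviant = rng.random(n_trials) < 0.10          # ~10% rare (oddball) stimuli

standard_avg = epochs[~is_deviant].mean(axis=0)   # average to frequent stimulus
deviant_avg = epochs[is_deviant].mean(axis=0)     # average to rare stimulus
difference_wave = deviant_avg - standard_avg      # where MMN / P3 would appear
```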
N2 (N200), P3 (P300)
(Figure by MIT OCW. Fig. 1: Mean, for eight subjects, of the non-signal (NS), signal (S), and difference (Δ) waveforms at each electrode site in the auditory condition. Isopotential topographic distributions are expressed as percentages of the maximum response amplitude for the N1 and P2 components of the non-signal response (left) and the negative (N2Δ) and positive (P3Δ) components of the Δ waveform (right). Supraorbital and vertex traces from the 3 runs are superimposed. From Simson et al., 1977. NS: responses to standard stimuli (2000 Hz tone bursts); S: responses to rare stimuli (1000 Hz tone bursts); Δ: response to rare stimuli minus response to standard stimuli.)
(Figure by MIT OCW. Fig. 4: Frontal, vertex, and parietal (across-subjects averaged) difference waveforms obtained by subtracting the ERP to the 1000-Hz standard stimulus from that to the 1044-Hz deviant stimulus at different deviant-stimulus probabilities (2%, 10%, 50%). The continuous line indicates the counting condition and the broken line the ignore condition. The amplitude of the fronto-centrally distributed MMN decreases when the probability is increased from 2% to 10%; when the two stimuli are equiprobable, no MMN is seen. From Sams et al., 1985. MMN: mismatch negativity.)
Kraus et al. (1996), "Auditory neurophysiologic responses and discrimination deficits in children with learning problems"
Stimuli: syllables, varied along two continua
Subjects: children with and without learning problems
Measured: discrimination and mismatch negativity
The children with learning problems showed
• deficits in their ability to discriminate syllables
• abnormally small mismatch negativity
Conclusion: The behavioral deficits in the children with learning problems arose at a processing stage that precedes conscious perception.
PET (radioactive tracer, e.g., radio-labeled H2O):
brain activity increase → metabolic response → blood flow increase → tracer increase
fMRI (blood oxygenation level-dependent, BOLD):
brain activity increase → metabolic response → blood flow increase → blood oxygenation increase → image signal increase
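In fMRI analysis this causal chain is commonly idealized as a linear time-invariant system: the predicted BOLD signal is the neural activity time course convolved with a hemodynamic response function (HRF). The sketch below uses a conventional double-gamma HRF shape; the TR, block timing, and HRF parameters are assumptions, not anything specified on the slide.

```python
# Sketch: predicted BOLD = neural activity convolved with an HRF.
# Double-gamma HRF is a common modeling convention; parameters are assumed.
import numpy as np
from scipy.stats import gamma

tr, n_scans = 2.0, 100                     # repetition time (s), number of scans
t = np.arange(0, 30, tr)                   # HRF support, 0-30 s
hrf = gamma.pdf(t, a=6) - 0.35 * gamma.pdf(t, a=16)   # peak plus undershoot
hrf /= hrf.sum()                           # normalize to unit area

activity = np.zeros(n_scans)
activity[10:15] = activity[50:55] = 1.0    # two blocks of stimulation
bold_pred = np.convolve(activity, hrf)[:n_scans]   # predicted BOLD time course
```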
Computational inflation of the cortical surface. In the inflated format, the cortex of sulci and gyri can be viewed simultaneously. See Fischl et al. (1999), NeuroImage 9: 195-207.
(Figure: (A) folded cortex, lateral view of the human brain showing the Sylvian fissure and superior temporal sulcus; (B) inflated cortex, lateral view; (C) the temporal lobe viewed from above, showing the planum polare, Heschl's gyrus, and planum temporale, with lateral and anterior directions marked. Courtesy of Irina Sigalovsky. Used with permission.)
Scott et al. (2000), "Identification of a pathway for intelligible speech in the left temporal lobe"
Four stimulus conditions that included speech and several forms of degraded speech:
• speech
• rotated speech
• noise-vocoded speech
• rotated, noise-vocoded speech
The stimuli differed in
• intelligibility
• presence of phonetic information
• presence of pitch variations
PET activity was compared between conditions.
Conclusion: Processing unique to intelligible speech is performed anteriorly in the left superior temporal sulcus, while lower-level processing is performed more posteriorly in the left STS and STG.
Discussion Questions:
Scott et al. argue that their choice of stimuli may be better than previous ones for identifying sites of speech-specific processing. Do you agree?
What assumptions have been made about the relationship between brain activity and the functional specificity of a brain region?
Beauchamp et al. (2004) “Unraveling multisensory integration: patchy organization within human STS multisensory cortex” Nat. Neurosci. 7: 1190-1192.
Three stimulus conditions:
- visual (videos of tools or faces)
- auditory (sounds of tools or voices)
- audio-visual (simultaneous images and sounds)
High-resolution fMRI of the superior temporal sulcus, a known region of multimodal convergence
Three types of cortical patches were identified, having:
- auditory > visual response
- visual > auditory response
- auditory = visual
Conclusion: “A model… suggested by our data is that auditory and visual inputs arrive in the STS-MS in separate patches, followed by integration in the intervening cortex.”
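As a concrete (and greatly simplified) illustration of the patch analysis, the sketch below labels voxels by dominant modality given per-voxel auditory and visual response amplitudes. The function name, the `margin` threshold, and the plain amplitude comparison are illustrative assumptions; the study itself used statistical contrasts on high-resolution fMRI data.

```python
import numpy as np

def classify_patches(aud_resp, vis_resp, margin=1.0):
    """Label voxels by dominant modality (toy stand-in for the contrasts
    used in the study). Returns 0 where auditory > visual, 1 where
    visual > auditory, and 2 where the two are comparable (candidate
    integration patches)."""
    aud = np.asarray(aud_resp, dtype=float)
    vis = np.asarray(vis_resp, dtype=float)
    labels = np.full(aud.shape, 2)
    labels[aud - vis > margin] = 0   # auditory-dominant patch
    labels[vis - aud > margin] = 1   # visual-dominant patch
    return labels
```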
Figure 2. Statistical parametric maps for contrasts of interest (group data). a, SPMs are shown as “glass brain” projections in sagittal, coronal, and axial planes. b, SPMs have been rendered on the group mean structural MRI brain image, normalized to the MNI standard stereotactic space (Evans et al., 1993). Tilted axial sections are shown at three levels parallel to the superior temporal plane: 0 mm (center), +2 mm, and -2 mm (insets). The 95% probability boundaries for left and right human PT are outlined (black) (Westbury et al., 1999). Sagittal sections of the left (x = -56 mm) and right (x = +62 mm) cerebral hemispheres are displayed below. All voxels shown are significant at the p < 0.05 level after false discovery rate correction for multiple comparisons; clusters less than eight voxels in size have been excluded. Broadband noise (without pitch) compared with silence activates extensive bilateral superior temporal areas including medial Heschl’s gyrus (HG) (b, center, yellow). In the contrasts between conditions with changing pitch and fixed pitch and between conditions with changing spatial location and fixed location, a masking procedure has been used to identify voxels activated only by pitch change (blue), only by spatial change (red), and by both types of change (magenta). The contrasts of interest activate distinct anatomical regions on the superior temporal plane. Pitch change (but not spatial location change) activates lateral HG, anterior PT, and planum polare (PP) anterior to HG, extending into superior temporal gyrus, whereas spatial change (but not pitch change) produces more restricted bilateral activation involving posterior PT. Within PT (b, axial sections), activation attributable to pitch change occurs anterolaterally whereas activation attributable to spatial change occurs posteromedially. Only a small number of voxels within PT are activated both by pitch change and by spatial change.
Figures 2a, 2b from Warren and Griffiths. “Distinct mechanisms for processing spatial sequences and pitch sequences in the human auditory brain.” J Neurosci 23 (2003): 5799-5804. (Copyright 2003 Society for Neuroscience. Used with permission.)
Zimmer and Macaluso (2005) “High binaural coherence determines successful sound localization and increased activity in posterior auditory areas”
Main Experiment:
- fMRI and behavioral measurements during sound localization
- manipulate sound location using ITD
- manipulate ability to localize by manipulating binaural coherence
- identify brain areas showing a correlation between activation and localization performance
Control Experiments:
- separated activation specifically correlated with localization performance from activation correlated with binaural coherence
Conclusion: Within the superior temporal plane, only planum temporale showed activation specifically correlated with localization performance. It was concluded that binaural coherence cues are used by this region to successfully localize sound.
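To make the stimulus manipulations concrete, here is a minimal NumPy sketch of one standard way to generate binaural noise with a chosen interaural coherence and ITD. The mixing scheme, the wrap-around sample delay, and all names are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def binaural_noise(n, coherence, itd_samples, seed=0):
    """Left/right Gaussian noise with interaural correlation `coherence`
    and an ITD applied as a circular integer-sample delay of one ear."""
    rng = np.random.default_rng(seed)
    common = rng.standard_normal(n)
    w = np.sqrt(coherence)  # mixing weight; interaural correlation = w**2
    left = w * common + np.sqrt(1 - coherence) * rng.standard_normal(n)
    right = w * common + np.sqrt(1 - coherence) * rng.standard_normal(n)
    return left, np.roll(right, itd_samples)

def interaural_coherence(left, right, max_lag=32):
    """Peak of the normalized interaural cross-correlation over +/- max_lag."""
    denom = np.sqrt(np.dot(left, left) * np.dot(right, right))
    return max(np.dot(left, np.roll(right, k)) / denom
               for k in range(-max_lag, max_lag + 1))
```

Lowering `coherence` flattens the cross-correlation peak, which is what degrades ITD-based localization in the experiment.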
Beyond Auditory Cortex: "What" and "Where" Pathways?
Do sound recognition and sound localization involve segregated networks (i.e., "what" and "where" pathways)? This question was addressed by Maeder and coworkers (2001). In an fMRI experiment, subjects were imaged in three conditions: (1) during a localization task, (2) during a recognition task, and (3) at rest (see right).
Ventral cortical areas showed greater activity during the recognition task (green, below), while dorsal areas showed greater activity during localization (red, below).
FIG. 5 Active paradigm: 3-D projections of activation on smoothed normalized brain (group results). Areas more activated in recognition than localization are shown in green; areas more activated in localization than in recognition are shown in red. Adapted from Maeder et al. (2001) NeuroImage 14: 802-816.
FIG. 1 Schematic representation of the experimental paradigm, the blocks, and the temporal structure of the stimuli. L = localization task; R = recognition task; r = rest.
Figures removed due to copyright reasons. Please see: Figures 1 and 5 in Maeder et al. “Distinct pathways involved in sound recognition and localization: a human fMRI study.” NeuroImage 14 (2001): 802-816.
Lewald et al. (2002) “Role of the posterior parietal cortex in spatial hearing” J. Neurosci. 22: RC207.
Subjects performed a sound lateralization task before and after cortical stimulation using transcranial magnetic stimulation (TMS)
TMS: a noninvasive stimulation method that reversibly alters neuronal function
Stimuli: Dichotic tones with various ITDs
Task: indicate perceived location (left or right)
Stimulation site: posterior parietal lobe
Conclusion: TMS produced a shift in sound lateralization, suggesting a role for posterior parietal cortex in spatial hearing.
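For reference, a dichotic pure tone with a given ITD is simple to construct; for a pure tone the delay can be applied exactly as a phase shift. The parameter values below are placeholders, not the stimulus parameters used by Lewald et al.

```python
import numpy as np

def dichotic_tone(freq_hz, itd_us, dur_s=0.3, fs=48000):
    """Two-channel pure tone; one ear is delayed by the ITD (a positive
    ITD here delays the left ear, lateralizing the image to the right;
    the sign convention is arbitrary)."""
    t = np.arange(int(dur_s * fs)) / fs
    delay_s = itd_us * 1e-6
    left = np.sin(2 * np.pi * freq_hz * (t - delay_s))
    right = np.sin(2 * np.pi * freq_hz * t)
    return np.stack([left, right], axis=1)

# e.g., a 500-Hz tone with a 300-microsecond ITD
stim = dichotic_tone(500.0, itd_us=300.0)
```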
“What” and “Where” Pathways of the Visual System
The “what” and “where” pathways in the visual system include areas specialized for processing depth perception (symbolized by a pair of spectacles), form (an angle), color, and direction (the curve-ahead sign). The result is object recognition (the “what” pathway) or object location (the “where” pathway).
Figure removed due to copyright reasons. Please see: Posner, M. I., and M. E. Raichle. Images of the Mind. New York, NY: Scientific American Library, 1994.
REFERENCES
Donchin, E., Ritter, W., and McCallum, W.C. (1978) Cognitive psychophysiology: the endogenous components of the ERP. In "Event-Related Brain Potentials in Man." E. Callaway, P. Tueting, S.H. Koslow, eds. Academic Press, New York.
Fullerton, B.C., Levine, R.A., Hosford-Dunn, H.L. and Kiang, N.Y.S. (1987) Comparison of cat and human brain-stem auditory evoked potentials. Electroenceph. clin. Neurophysiol. 66, 547-570.
Furst, M., Levine, R.A. and McGaffigan, P.M. (1985) Click lateralization is related to the β component of the dichotic brainstem auditory evoked potentials of human subjects. J. Acoust. Soc. Am. 78, 1644-1651.
Goldstein, M.H. Jr. and Kiang, N.Y.S. (1958) Synchrony of neural activity in electric responses evoked by transient acoustic stimuli. J. Acoust. Soc. Am. 30, 107-114.
Hall, James W. III (1992) "Handbook of Auditory Evoked Responses". Allyn and Bacon, Boston.
Hillyard, S.A. and Kutas, M. (1983) Electrophysiology of cognitive processing. Ann. Rev. Psychol. 34: 33-61.
Hillyard, S.A., Hink, R.F., Schwent, V.L. and Picton, T.W. (1973) Electrical signs of selective attention in the human brain. Science, 182, 177-182.
Kraus N, McGee TJ, Carrell TD, Zecker SG, Nicol TG, Koch DB (1996) Auditory neurophysiologic responses and discrimination deficits in children with learning problems. Science 273: 971-973.
Maeder PP, Meuli RA, Adriani M, Bellmann A, Fornari E, Thiran J-P, Pittet A, Clarke S (2001) Distinct pathways involved in sound recognition and localization: a human fMRI study. NeuroImage 14: 802-816.
Melcher, J.R. (1996) Cellular generators of the binaural difference potential. Hear. Res. 95, 144-160.
Melcher, J.R. and Kiang, N.Y.S. (1996) Generators of the brainstem auditory evoked potential in cat III: identified cell populations. Hear. Res. 93, 52-71.
Sams, M., Alho, K., Näätänen, R. (1985) The mismatch negativity and information processing. In "Psychophysiological Approaches to Human Information Processing." F.
Scott SK, Blank SC, Rosen S, Wise RJS (2000) Identification of a pathway for intelligible speech in the left temporal lobe. Brain 123: 2400-2406.
Simson, R., Vaughan, H.G. Jr., and Ritter, W. (1977) The scalp topography of potentials in auditory and visual discrimination tasks. Electroenceph. clin. Neurophysiol. 42: 528-535.
Wang, B. and Kiang, N.Y.S. (1978) Synthesized compound action potentials (CAP's) of the auditory nerve. J. Acoust. Soc. Am. 63: S77.
Wang, B. (1979) The relation between the compound action potential and unit discharges of the auditory nerve. Ph.D. Thesis, Massachusetts Institute of Technology.
fMRI: Mitigation of Scanner Acoustic Noise
Clustered Volume Acquisition: a method for removing the impact of scanner acoustic noise on auditory fMRI activation
Figure removed due to copyright reasons. Please see: Edmister et al. Human Brain Mapping 7 (1999): 89-97.
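The timing logic of clustered volume acquisition is easy to sketch: all slices of a volume are collected in one brief burst at the end of a long TR, leaving a scanner-quiet window for the stimulus. The TR and burst durations below are placeholder values, not the parameters of Edmister et al.

```python
def clustered_schedule(n_trials, tr_s=8.0, acq_s=2.0):
    """Event list for a clustered (sparse) acquisition: a quiet stimulus
    window followed by a burst of slice acquisition in each TR."""
    events = []
    for i in range(n_trials):
        t0 = i * tr_s
        events.append(("stimulus (scanner quiet)", t0, t0 + tr_s - acq_s))
        events.append(("volume acquisition", t0 + tr_s - acq_s, t0 + tr_s))
    return events

for name, start, stop in clustered_schedule(2):
    print(f"{name}: {start:.1f}-{stop:.1f} s")
```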
Selected References
Slide 7: Fullerton, B. C., R. A. Levine, H. L. Hosford-Dunn, and N. Y. Kiang. "Comparison of cat and human brain-stem auditory evoked potentials." Electroencephalogr Clin Neurophysiol 66, no. 6 (Jun 1987): 547-70.
Slide 8: Melcher, J. R. "Cellular generators of the binaural difference potential." Hear Res 95 (1996): 144-160.
Slide 10 (top figure): Hillyard, S. A., and M. Kutas. "Electrophysiology of cognitive processing." Ann Rev Psychol 34 (1983): 33-61.
Slide 10 (bottom figure): Hall, James W. "Handbook of Auditory Evoked Responses." Allyn and Bacon, Boston, I, II, 1992.
Slide 11 (top figure): Simson, R., H. G. Vaughan, Jr., and W. Ritter. "The scalp topography of potentials in auditory and visual discrimination tasks." Electroenceph clin Neurophysiol 42 (1977): 528-535.
Slide 11 (bottom figure): Sams, M., K. Alho, and R. Näätänen. "The mismatch negativity and information processing." Psychophysiological Approaches to Human Information Processing. 1985.
Harvard-MIT Division of Health Sciences and Technology
HST.722J: Brain Mechanisms for Hearing and Speech
Course Instructor: Kenneth E. Hancock
HST.722 Brain Mechanisms of Speech and Hearing, Fall 2005
Dorsal Cochlear Nucleus
September 14, 2005
Ken Hancock
Dorsal Cochlear Nucleus (DCN)
• Overview of the cochlear nucleus and its subdivisions
• Anatomy of the DCN
• Physiology of the DCN
• Functional considerations
The cochlear nucleus
[Figure: the cochlear nucleus; labels: DCN, VCN, AN (auditory nerve).]
AN fibers terminate in a “tonotopic” or “cochleotopic” pattern
[Figure by MIT OCW: labeled auditory-nerve fibers with characteristic frequencies of 0.17, 2.7, 10.2, and 36.0 kHz terminating in an orderly tonotopic sequence across the AVCN, PVCN, and DCN; other labels: ANR, sgb; scale bar, 1 mm.]
Major subdivisions of the cochlear nucleus
[Figure by MIT OCW: major subdivisions (AVCN, PVCN, DCN) shown with the auditory and vestibular nerves and with principal VCN cell types: spherical bushy, globular bushy, multipolar, and octopus cells.]
Summary of pathways originating in the cochlear nucleus
[Figure by MIT OCW: outputs of the cochlear nucleus. Spherical bushy (SBC), globular bushy (GBC), multipolar (M), and octopus (OA) cells of the VCN project toward the superior olive and lateral lemniscus; the DCN projects across the midline toward the inferior colliculus.]
Projections suggest DCN is a different animal than VCN
• (All roads lead to the inferior colliculus)
• VCN projects directly to structures dealing with binaural hearing and olivocochlear feedback
• DCN ???
[Figure by MIT OCW: cochlear nucleus output pathways, repeated from the previous slide.]
Dorsal Cochlear Nucleus
• Overview of the cochlear nucleus and its subdivisions
– DCN projections do not reveal its function
• Anatomy of the DCN
• Physiology of the DCN
• Functional considerations
[Figure by MIT OCW: DCN circuit diagram. Labeled elements include granule, golgi, cartwheel, stellate, vertical, fusiform, and giant cells; auditory-nerve, somatosensory, vestibular, and descending auditory inputs; and projections to the inferior colliculus (I.C.). See Kanold and Young (2001) and Tzounopoulos (2004).]
Dorsal Cochlear Nucleus
• Overview of the cochlear nucleus and its subdivisions
– DCN projections do not reveal its function
• Anatomy of the DCN
– more complex than other CN subdivisions
– nonauditory inputs
– similar organization to cerebellar cortex
• Physiology of the DCN
• Functional considerations
Photograph of Eric Young removed due to copyright reasons. Please see: http://www.bme.jhu.edu/labs/chb/people/index.php?page=ABOUT&user=eyoung
Eric Young
Response Map classification scheme
[Figures by MIT OCW: response maps (discharge rate vs. frequency at sound levels in 10-dB steps) for unit types I through V, ordered from auditory-nerve-like (type I) toward increasing inhibition (types IV and V); excitatory and inhibitory response regions, BF-tone and noise responses, and spontaneous activity indicated.]
DCN: Vertical cells are type II and type III units
Type II: narrow V-shaped region of excitation; no spontaneous activity; tone response >> noise response.
Type III: V-shaped region of excitation with inhibitory sidebands.
Evidence: antidromic stimulation (Young 1980).
[Figures by MIT OCW: type II and type III response maps with rate-level functions for BF tones and broadband noise.]
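A crude sketch of how the features emphasized on these slides separate the unit types; the thresholds and the feature set are invented for illustration and are far simpler than the published classification criteria.

```python
def classify_unit(spont_rate, bf_tone_rates, noise_rates):
    """Toy response-type classifier. `bf_tone_rates` and `noise_rates`
    are rate-level functions ordered from low to high sound level."""
    max_tone = max(bf_tone_rates)
    nonmonotonic = bf_tone_rates[-1] < 0.5 * max_tone  # inhibited at high levels
    weak_noise = max(noise_rates) < 0.3 * max_tone     # tone >> noise
    if spont_rate < 2.5 and weak_noise:
        return "type II"    # no spontaneous activity, weak noise response
    if nonmonotonic and bf_tone_rates[-1] < spont_rate:
        return "type IV"    # BF tone drives rate below spont at high levels
    return "type I/III"     # predominantly excitatory
```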
Antidromic stimulation:
• record from neuron
• shock its axon
[Figures by MIT OCW: antidromic stimulation setup; recording electrodes in the DCN near fusiform, giant, and vertical cells, with stimulating electrodes on the output pathway (DAS) and in the VCN.]
DCN: “Principal” cells are type III and type IV units
• “Island of excitation” and “sea of inhibition”
• BF rate-level curve inhibited at high levels
• Noise rate-level curve ~ monotonic
Evidence: antidromic stimulation (Young 1980); intracellular recording and labeling (Rhode et al. 1983, Ding et al. 1996).
[Figures by MIT OCW: type III and type IV response maps with BF-tone, noise, and spontaneous-rate level functions.]
Neural circuitry underlying DCN physiology: type II units inhibit type IV units
[Figures by MIT OCW: vertical cells (type II) inhibit the fusiform and giant cells that give type IV responses; the type II and type IV responses to BF tones are “reciprocal” (excitation in one mirrors inhibition in the other).]
Classic experiment: type II units inhibit type IV units
• multiunit recording, Voigt & Young (1980, 1990)
[Figures by MIT OCW: simultaneously recorded type II and type IV units with their rate-level functions; the cross-correlogram (type IV firing rate given a type II spike at t = 0) shows an inhibitory trough, the signature of inhibition of the type IV unit by the type II unit.]
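The cross-correlogram itself is straightforward to compute from two spike-time lists; this sketch histograms all pairwise lags within a window (the bin width and window size are arbitrary choices here). Dividing the counts by (number of reference spikes x bin width) converts them to a conditional firing rate.

```python
import numpy as np

def cross_correlogram(spikes_a, spikes_b, bin_s=0.001, max_lag_s=0.01):
    """Histogram of lags (t_b - t_a) for all spike pairs within +/- max_lag_s.
    With a type II train as `spikes_a` and a type IV train as `spikes_b`,
    an inhibitory interaction appears as a trough at small positive lags."""
    spikes_b = np.asarray(spikes_b)
    edges = np.arange(-max_lag_s, max_lag_s + bin_s, bin_s)
    counts = np.zeros(len(edges) - 1)
    for t in spikes_a:
        lags = spikes_b - t
        counts += np.histogram(lags[np.abs(lags) <= max_lag_s], bins=edges)[0]
    centers = edges[:-1] + bin_s / 2
    return centers, counts
```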
DCN physiology so far…
[Schematic: the auditory nerve excites (+) both type II (~ vertical cells) and type IV (~ principal cells) units.]
• type II units inhibit type IV units
• BUT this analysis is based on pure-tone responses
⇒ what happens with more general stimuli???
Inhibition from type II units doesn’t account for everything
• (DCN responses to broadband stimuli cannot be predicted from responses to tones: nonlinear)
• Type II units do not respond to notch noise—whither the inhibition?
• Response map has two inhibitory regions?
[Figure by MIT OCW: type IV unit response map with its upper inhibitory sideband, notch-noise stimulus spectra, and data vs. model rate responses.]
DCN notch noise sensitivity due to wideband inhibition
Broadband noise: the strong AN input dominates ⇒ the type IV response is excitatory.
Notch noise: the type IV unit loses a greater portion of its excitatory input than the broadly tuned wideband inhibitor (WBI) does ⇒ the WBI input dominates ⇒ the type IV response is inhibitory.
(Nelken & Young 1994)
[Figure by MIT OCW: excitatory AN and inhibitory WBI inputs to a type IV unit, with broadband-noise and notch-noise spectra.]
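A toy rate model makes the logic explicit: sum stimulus power through a narrow excitatory kernel and a broad inhibitory kernel and subtract. All bandwidths and weights below are made up for illustration; they are not Nelken and Young's model parameters.

```python
import numpy as np

def type_iv_rate(power, freqs_khz, bf_khz=10.0, an_bw=0.15, wbi_bw=1.5,
                 w_exc=1.0, w_inh=0.8):
    """Rectified difference of narrowband excitation (AN) and wideband
    inhibition (WBI), each a normalized Gaussian average (in log frequency)
    of stimulus power."""
    logf = np.log2(np.asarray(freqs_khz) / bf_khz)
    k_exc = np.exp(-(logf / an_bw) ** 2)
    k_inh = np.exp(-(logf / wbi_bw) ** 2)
    exc = (power * k_exc).sum() / k_exc.sum()
    inh = (power * k_inh).sum() / k_inh.sum()
    return max(0.0, w_exc * exc - w_inh * inh)

freqs = np.linspace(1.0, 40.0, 400)                     # kHz
broadband = np.ones_like(freqs)
notch = np.where(np.abs(freqs - 10.0) < 2.0, 0.0, 1.0)  # notch at BF
print(type_iv_rate(broadband, freqs))  # positive: excitatory response
print(type_iv_rate(notch, freqs))      # 0 after rectification: inhibited
```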
PVCN: is the D-stellate cell the wideband inhibitor?
• broadly-tuned, onset-chopper units are found in the PVCN (Winter & Palmer 1995)
• typically respond better to broadband noise than to tones
• such responses arise from radiate or stellate neurons (Smith & Rhode 1989)
• stellate cells send axons dorsally into the DCN, thus called “D-stellate cells” (Oertel et al. 1990)
• D-stellate cells are inhibitory (Doucet & Ryugo 1997)
[Figures by MIT OCW: onset-chopper tuning curve and rate-level functions (broadband noise vs. BF tone), and the D-stellate projection from PVCN to DCN.]
Summary: Circuitry of DCN deep layer
• Excitatory input: AN fibers?
• Type II inhibitory inputs (strong)
• UIS from ? source (weak)
(Spirou and Young 1991)
[Figures by MIT OCW: type IV response map with excitatory and inhibitory regions; type II and wideband inhibitor (W.B.I.) inputs with their “reciprocal” response properties; BF-tone and noise rate-level functions.]
171www.onlineeducation.bharatsevaksamaj.net www.bssskillmission.in
WWW.BSSVE.IN
Summary of DCN anatomy and physiology
[Summary schematic: nonauditory inputs (somatosensory, auditory, vestibular) reach granule cells, which contact fusiform cells.]
• Vertical cell: type II unit; narrowband inhibition
• D-stellate cell: onset-chopper; wideband inhibition
Dorsal Cochlear Nucleus
• Overview of the cochlear nucleus and its subdivisions
– DCN projections do not reveal its function
• Anatomy of the DCN
– more complex than other CN subdivisions
– nonauditory inputs
– similar organization to cerebellar cortex
• Physiology of the DCN
– diverse response properties
– complex interconnections
– highly nonlinear
• Functional considerations
Filtering by the pinna provides cues to sound source location
[Figures by MIT OCW: human head-related transfer functions (HRTFs): outer-ear gain (dB) vs. frequency (2-20 kHz) at several elevations (-15°, 0°, +15°, 60°, 90°); the “first notch” frequency changes with elevation.]
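Given a measured HRTF, the first-notch frequency can be read off as the lowest-frequency local minimum of the gain within a search band; the band limits below are illustrative assumptions.

```python
import numpy as np

def first_notch_khz(gain_db, freqs_khz, lo_khz=4.0, hi_khz=18.0):
    """Lowest-frequency local minimum of outer-ear gain within [lo, hi]."""
    f = np.asarray(freqs_khz)
    g = np.asarray(gain_db)
    band = (f >= lo_khz) & (f <= hi_khz)
    fb, gb = f[band], g[band]
    is_min = (gb[1:-1] < gb[:-2]) & (gb[1:-1] < gb[2:])  # strict local minima
    idx = np.flatnonzero(is_min)
    return fb[idx[0] + 1] if idx.size else None
```

Tracking this value across elevations recovers the notch-vs-elevation cue shown in the figure.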
Type IV units are sensitive to HRTF first notch
• type IV units are inhibited by notches centered on BF
• null in DCN population response may code for sound source location
[Figures by MIT OCW: physiology (Reiss & Young 2005) and behavior (May 2000); type IV discharge rate vs. stimulus notch frequency at several sound levels shows a null when the notch is centered near BF; HRTF gains shown for El = 30°, Az = ±15°.]
Dorsal Cochlear Nucleus
• Overview of the cochlear nucleus and its subdivisions
– DCN projections do not reveal its function
• Anatomy of the DCN
– more complex than other CN subdivisions
– receives nonauditory inputs
– has similar organization to cerebellar cortex
• Physiology of the DCN
– diverse response properties
– complex interconnections
– highly nonlinear
• Functional considerations
– coding sound source location based on pinna cues
DCN is a “cerebellum-like structure”
Predictive inputs:
- corollary discharge signals
- higher levels of the same modality
- other sensory modalities (e.g., proprioception)
[Figure by MIT OCW: generic cerebellum-like structure with molecular, granule-cell, principal-cell, and sensory-input layers; “plastic” synapses carry the predictive inputs (for the DCN: descending auditory and somatosensory signals), while the sensory input layer receives input from a sensory surface (the cochlea).]
Synaptic plasticity: Long-Term Potentiation (LTP)
• “Classical” LTP demonstration at the hippocampal CA3-CA1 synapse
• LTP evoked by tetanic stimulation (mechanism involves NMDA receptors)
(Tzounopoulos 2004)
Electric fish provide clues to cerebellum-like function
black ghost knifefish (Apteronotus albifrons)
Photograph removed due to copyright reasons. Please see the Nelson Lab home page: http://nelson.beckman.uiuc.edu
electric organ discharge (EOD)
• electrical activity detected by the electric lateral line
• afferent activity transmitted to the electric lateral line lobe (ELL), analogous to the DCN
Electric fields provide information about nearby objects
• BUT the fish generates its own electric fields:
– tail movements
– ventilation
⇒ the cerebellum-like ELL helps solve this problem (Bell 2001)
Figures by MIT OCW.
What do cerebellum-like structures do???
• Subtract the expected input pattern from the actual input pattern to reveal unexpected or novel features of a stimulus.
– DCN: pinna movement is expected to shift the first notch, independent of what the sound source is doing
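One way to see how such a subtraction could be learned is an adaptive filter: weights on the predictive inputs are adjusted so that their weighted sum cancels the predictable part of the sensory input, leaving the residual as the novel signal. The sketch below is a plain LMS filter used purely as an illustration of the idea, not a circuit model of the DCN.

```python
import numpy as np

def predictive_cancellation(sensory, predictors, lr=0.01):
    """LMS sketch of a cerebellum-like negative image: `predictors` is a
    (T, K) array of predictive signals (corollary discharge, proprioception,
    ...); the returned residual is the unexpected part of `sensory`."""
    w = np.zeros(predictors.shape[1])
    novelty = np.empty(len(sensory))
    for t, (x, p) in enumerate(zip(sensory, predictors)):
        novelty[t] = x - p @ w       # actual minus expected input
        w += lr * novelty[t] * p     # adapt toward cancelling the prediction
    return novelty, w
```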
Dorsal Cochlear Nucleus
• Overview of the cochlear nucleus and its subdivisions
– DCN projections do not reveal its function
• Anatomy of the DCN
– more complex than other CN subdivisions
– nonauditory inputs
– similar organization to cerebellar cortex
• Physiology of the DCN
– diverse response properties
– complex interconnections
– highly nonlinear
• Functional considerations: the DCN may…
– code sound source location based on pinna cues
– extract novel components of response
DCN may play a role in tinnitus
• percept of noise, ringing, buzzing, etc.
• affects up to 80% of the population
• 1 in 200 are debilitated
• (not voices in the head)
This makes no sense!
So why DCN? Because tinnitus…
• involves plasticity
• may involve somatosensory effects
(Levine 1999)
Dorsal Cochlear Nucleus
• Overview of the cochlear nucleus and its subdivisions
– DCN projections do not reveal its function
• Anatomy of the DCN
– more complex than other CN subdivisions
– nonauditory inputs
– similar organization to cerebellar cortex
• Physiology of the DCN
– diverse response properties
– complex interconnections
– highly nonlinear
• Functional considerations: the DCN may…
– code sound source location based on pinna cues
– extract novel components of response
– contribute to tinnitus
Selected References
Slide 5: Ryugo DK, May SK (1993) The projections of intracellularly labeled auditory nerve fibers to the dorsal cochlear nucleus of cats. J Comp Neurol 329:20-35.
Slide 6: Ehret G, Romand R, eds (1997) The Central Auditory System. New York: Oxford University Press.
Slide 7: Ehret G, Romand R, eds (1997) The Central Auditory System. New York: Oxford University Press.
Slide 12: Lorente de Nó R (1981) The Primary Acoustic Nuclei. New York: Raven.
Kane ES, Puglisi SG, Gordon BS (1981) Neuronal types in the deep dorsal cochlear nucleus of the cat. I. Giant neurons. J Comp Neurol 198:483-513.
Slide 15: Young ED (1984) Response characteristics of neurons of the cochlear nuclei. In: Hearing Science (Berlin CI, ed), pp 423-460. San Diego: College-Hill. ISBN: 0316091693.
Slides 16, 18, 19: Young ED (1984) Response characteristics of neurons of the cochlear nuclei. In: Hearing Science (Berlin CI, ed), pp 423-460. San Diego: College-Hill.
Young ED, Davis KA (2001) Circuitry and Function of the Dorsal Cochlear Nucleus. In: Integrative Functions in the Mammalian Auditory Pathway (Oertel D, Popper AN, Fay RR, eds). New York: Springer-Verlag.
Selected References, continued
Slide 20: Voigt HF, Young ED (1990) Cross-correlation analysis of inhibitory interactions in dorsal cochlear nucleus. J Neurophysiol 64: 1590-1610.
Young ED, Davis KA (2001) Circuitry and Function of the Dorsal Cochlear Nucleus. In: Integrative Functions in the Mammalian Auditory Pathway (Oertel D, Popper AN, Fay RR, eds). New York: Springer-Verlag.
Slide 22: Spirou GA, Young ED (1991) Organization of dorsal cochlear nucleus type IV unit response maps and their relationship to activation by band-limited noise. J Neurophysiol 66: 1750-1768.
Slide 23: Nelken I, Young ED (1994) Two separate inhibitory mechanisms shape the responses of dorsal cochlear nucleus type IV units to narrowband and wideband stimuli. J Neurophysiol 71: 2446-2462.
Slide 24: Winter IM, Palmer AR (1995) Level dependence of cochlear nucleus onset unit responses and facilitation by second tones or broadband noise. J Neurophysiol 73: 141-159.
Young ED, Davis KA (2001) Circuitry and Function of the Dorsal Cochlear Nucleus. In: Integrative Functions in the Mammalian Auditory Pathway (Oertel D, Popper AN, Fay RR, eds). New York: Springer-Verlag.
Oertel D, Wu SH, Garb MW, Dizack C (1990) Morphology and physiology of cells in slice preparations of the posteroventral cochlear nucleus of mice. J Comp Neurology 295:136-154.
Selected References, continued
Slide 25: Spirou GA, Young ED (1991) Organization of dorsal cochlear nucleus type IV unit response maps and their relationship to activation by band-limited noise. J Neurophysiol 66: 1750-1768.
Nelken I, Young ED (1994) Two separate inhibitory mechanisms shape the responses of dorsal cochlear nucleus type IV units to narrowband and wideband stimuli. J Neurophysiol 71: 2446-2462.
Young ED, Davis KA (2001) Circuitry and Function of the Dorsal Cochlear Nucleus. In: Integrative Functions in the Mammalian Auditory Pathway (Oertel D, Popper AN, Fay RR, eds). New York: Springer-Verlag.
Slide 26: Oertel D, Wu SH, Garb MW, Dizack C (1990) Morphology and physiology of cells in slice preparations of the posteroventral cochlear nucleus of mice. J Comp Neurology 295: 136-154.
Slide 29: Geisler CD (1998) From Sound to Synapse: Physiology of the Mammalian Ear. Oxford University Press.
Slide 30: Young ED, Spirou GA, Rice JJ, Voigt HF (1992) Neural organization and responses to complex stimuli in the dorsal cochlear nucleus. Philos Trans R Soc Lond B Biol Sci 336: 407-413.
Slide 32: Bell CC (2001) Memory-based expectations in electrosensory systems. Curr Opin Neurobiol 11: 481-487.
Slide 35: Zakon HH (2003) Insight into the mechanisms of neuronal processing from electric fish. Curr Opin Neurobiol 13: 744-750.
Harvard-MIT Division of Health Sciences and Technology HST.722J: Brain Mechanisms for Hearing and Speech Course Instructors: Bertrand Delgutte, M. Christian Brown,
Joe C. Adams, Kenneth E. Hancock, Frank H. Guenther, Jennifer R. Melcher, Joseph S. Perkell, David N. Caplan
HST.722 Student Topics Proposal Erik Larsen
Cortical correlates of audio-visual integration
Introduction
Animals have multiple sensory systems with which they can probe the external
environment. Oftentimes, objects or events in the environment produce signals to which more than one sensory system responds, and it is the task of the cerebral cortex to integrate these separate modalities into a unified percept. We take this process for granted, but it is in fact a difficult problem, which becomes obvious when we ask ourselves how we would design a system to do the same. The current state of knowledge described in the literature is quite limited and mostly phenomenological, i.e. describing how auditory and visual inputs can modulate each other and/or activate association areas of the cortex. There does not yet seem to be a mechanistic understanding of how this works.
I have collected a number of papers that look at audio-visual integration at different levels, from single-unit studies to cortical activation patterns to behavioral. The methods used range from intracellular recordings to scalp-recorded ERP (event-related potentials) to fMRI / MEG / PET imaging. For your pleasure and convenience, most of these papers are quite short. I have chosen three of them to discuss in class (Discussion papers), three for background, and quite a few for ‘further reading’ (if you only want to read a few of these, the most interesting for me are marked by ♦).
Behavioral effects of audiovisual integration
Ultimately, multi-modal (or multi-sensory) integration should lead to noticeable
behavioral effects. For humans, the most interesting case is speech perception, which is usually considered an auditory modality but is strongly influenced by visual input. Being able to see the person talking to you increases intelligibility, especially in adverse conditions (high noise, reverberation, competing talkers), as shown by Sumby and Pollack (1954). The visual input gives redundant cues that reinforce the auditory stimulus but also disambiguates speech sounds that differ in place of articulation yet sound similar (such as /ba/ vs. /da/). Sumby and Pollack (1954) showed that especially at low acoustic signal-to-noise ratios, the visual signal can dramatically increase word recognition, in some cases from near zero to 70% or 80%. At higher signal-to-noise ratios, the visual signal still contributes, but the absolute effect is smaller because auditory performance alone is already high.
Besides this synergistic effect of auditory and visual input, there are a few other effects that clearly demonstrate the strong interaction of these two modalities. The first is the so-called ‘ventriloquist’ effect, where a synchronous yet spatially separate audio and visual signal is heard as originating from the visual location. Macaluso et al. (2004) were interested in finding brain areas that mediate this integration of spatially separate yet synchronous information into a single percept, through PET scans of human brains, and identified an area in the right inferior parietal lobule to be activated especially in this
condition. Bushara et al. (2001) used PET scans to identify the right insula as the area most strongly involved in audiovisual synchrony-asynchrony detection.
Another famous audiovisual ‘illusion’ is the ‘McGurk effect’ (McGurk and MacDonald, 1976), where the sound /ba/ is combined with the visual image of a talker articulating /ga/, leading to a robust percept of /da/. The usual explanation for this effect is that the brain tries to find the most likely stimulus given the (in this case conflicting) auditory and visual cues. Note that this effect is extremely robust and not susceptible to extinction, even after hundreds of repetitions. This effect has contributed to the notion that audiovisual integration occurs at a pre-lexical stage, early in the neural processing pathway.
Neural correlates of audiovisual integration for ‘simple stimuli’
Given the strong behavioral effects of audiovisual integration described above, it
would be very interesting to explore the neural basis for these. However, we will first explore some more basic properties of audiovisual integration.
The canonical view of cortical sensory processing is that each sensory modality has a primary unimodal cortex and several higher-order unimodal association cortices, and that finally the various sensory modalities interact in multimodal association cortices. For the auditory case, the primary auditory cortex is located in the superior temporal gyrus or BA41 (see Fig. 1 for an overview of Brodmann areas). This area is surrounded by a belt and parabelt region, which are the auditory association areas. The middle and inferior temporal gyri (BA 20, 21, 37) are multisensory association areas, mainly auditory and visual. In fMRI studies of human brain activation, these multimodal areas activate uniformly in response to multimodal stimuli. Using high-resolution fMRI, however, Beauchamp et al. (2004) have recently shown that these multisensory areas (at least in the superior temporal sulcus) in fact contain a patchwork of auditory, visual, and audiovisual areas, each a few mm in size. It appears that the various unimodal areas send projections to small patches of multisensory cortex, after which the modalities are integrated in the intervening patches.
From human imaging and animal studies it is clear that there are special cortical areas which have multisensory responses. Komura et al. (2005) found that multisensory responses can also be found at lower levels, specifically in the auditory thalamus, the medial geniculate body (MGB). Traditionally, the thalamus is thought of as a relay station between brainstem/spinal cord and cortex, sending signals upward; but it is also known that it receives massive projections from the cortex itself. Komura and coworkers recorded from MGB shell neurons (which receive the cortical projections) in rat during a reward-based auditory spatial discrimination task, which was paired with an irrelevant yet variable light stimulus. Although MGB neurons did not respond to the light stimulus alone, their response was strongly modulated by the visual input: in particular, the response was greater when auditory and visual stimuli were matched and smaller when they were conflicting. The matched and conflicting conditions also led to a decrease and an increase in reaction time, respectively, which again demonstrates the utility of integrating multiple modalities (which would usually be in agreement with one another, leading to a faster reaction). Interestingly, MGB shell neurons were also modulated by the amount of reward, although that is beyond the scope of our discussion.
A conceptual difficulty with studies of multisensory integration is that the paradigm does not always permit one to be sure that integration has in fact occurred. For example, in the mentioned study by Macaluso et al. (2004), where spatially separated audio and video were used, it is in principle possible that there was no integration, even though one would expect it based on prior psychophysical data. Also, conditions that are designed to produce multisensory integration often necessarily use somewhat different stimuli than conditions that are not aimed at producing integration. Bushara et al. (2003) devised an ingenious method to circumvent these problems, and were able to use exactly identical stimuli that sometimes produced integration and sometimes did not. By comparing brain activation (BOLD fMRI) between the two categories, they were able to find a correlation between larger activation in specific areas and the (non-)occurrence of integration. Because of these results, they propose that multisensory association areas work in parallel with primary sensory areas, instead of as a higher-level end-station, which is the usual view. This would agree with Komura et al. (2005), who showed that error rates were sensitive to matched or conflicting simultaneous audiovisual signals, which also had strong neural correlates (enhanced or depressed rate responses).
A completely different aspect of multisensory effects on sensory processing is attentional cueing, i.e. one modality biases the observer such that the other modality will respond preferentially to the cued object. As an example, consider hearing a familiar voice calling your name from a crowd of people; you will reflexively turn towards the sound location and focus your visual attention on the same location, leading you to find your friend more quickly than had he been silent. Such cueing effects are well documented in psychophysical experiments, and it has been proposed that by directing attention towards an object, the neural signal from that object propagates faster through the brain. This is the proposed neural correlate of judging attended objects to appear earlier than unattended objects, even if they appear simultaneously. McDonald et al. (2005) used a sound cue in a visual task where subjects were required to judge which of two lights switched on earlier. They were able to show, by measuring event-related potentials (ERPs) from the scalp, that the attended (cued) light yielded a larger ERP than the unattended light, even if the lights were simultaneous. Somehow this difference in magnitude is translated into a delay in subsequent stages of processing that lead to perception. These findings may again corroborate to some extent the findings of Komura et al. (2005), in that cueing by one modality can influence the response of neurons in the other modality, with subsequent clear-cut behavioral consequences.
Neural correlates of audiovisual integration in speech perception
We have already described how important visual signals are for speech perception, both in the synergistic sense of aiding speech intelligibility (Sumby and Pollack, 1954) as well as in creating illusions such as the McGurk effect (McGurk and MacDonald, 1976). In the previous section we explored how relatively simple audiovisual stimuli that may or may not be fused into single objects can influence neural responses, both for individual neurons as well as for the whole brain. How is this for speech? Does brain activity differ between purely listening to speech as compared to listening and seeing speech (beside predictable activation of purely visual sensory areas)? And what about purely seeing speech (no sound)? Does this activate any of the same cortical areas? The latter question
is not only of academic interest in understanding language processing, but also has a more practical importance in that this is how many deaf people ‘listen’ to others, i.e. by lipreading, also called speechreading. It would have great value to know which areas are the most important for speechreading, and whether differences in speechreading ability (which are large between people) have an identifiable neural basis.
The classic paper in this context is Calvert et al. (1997). They used fMRI to find brain areas of increased activity when either listening to speech or silently lipreading, and the surprising finding was that silent lipreading activates primary and association auditory cortices. This was interpreted to mean that audiovisual integration occurs at a very early level in the neural pathway, even before association areas are activated (although these may cooperate in parallel, instead of hierarchically, cf. Bushara, 2003). Auditory areas were not activated by closed-mouth, non-speech movements of the lower face. A series of subsequent experiments similar to these were conducted, and with improvements in fMRI technology it became controversial whether primary auditory cortex is indeed activated by silent lipreading. Most investigators failed to find group-averaged activation of primary auditory cortex, although auditory association areas (e.g. BA 42, 22) do reliably activate during silent lipreading. Hall et al. (2005) found this kind of result, with the exception that for some proficient lipreaders, superior temporal gyrus did activate. It seems that currently there is no unambiguous answer to the question whether silent lipreading activates primary auditory cortex; there is evidence pro and contra. Interestingly, Hall et al. (2005) describe other cortical areas that vary in activity as a function of speechreading proficiency. For example, high activity in the left inferior frontal gyrus was associated with poor speechreading ability. The explanation is that the greater task difficulty requires more extensive use of cognitive processes, which are located in the frontal lobe. However, the left inferior frontal gyrus is Broca’s area (BA 44, 45), traditionally assumed to support articulatory-based mechanisms of speech production and executive aspects of semantic processing. From these and other recent results, it is becoming increasingly clear that Broca’s area is probably also involved in language comprehension, supporting the formation of syntactic and semantic structure and syntactic working memory.
Paulesu et al. (2003) also found activation of the perisylvian language area and of Broca’s area (BA 44) in lipreading, including non-lexical lipreading (NLLR, formed by playing video backwards). Lexical lipreading (LLR, forward video) differentially activated the more anterior part of Broca’s area (BA 45), and also the left inferior temporal cortex. Therefore, Paulesu and coworkers assumed that these two areas might be particularly important for lexical access in lipreading. As in other recent imaging studies of lipreading, they found activation of auditory association areas (but not primary auditory areas) for both LLR and NLLR.
Conversely, auditory input can also activate visual sensory areas, as studied by Giraud and Truy (2002). In both normal-hearing and cochlear-implant subjects, the fusiform gyrus and early visual cortex (BA 18, 19) were activated by listening to speech (no visual signal). It is assumed that the expectancy of visual correlates of speech is responsible for this effect. In cochlear-implant patients, the visual area activation was much greater than for normal-hearing subjects, showing the greater reliance they presumably place on visual cues. This shows that multimodal integration is flexible and can be strengthened when one modality is degraded.
References
Background
• W.H. Sumby and I. Pollack, “Visual contribution to speech intelligibility in
noise,” J. Acoust. Soc. Am. 26(2): 212-215, 1954. Nice short paper showing the (large) benefit (in terms of words correctly perceived) of seeing the face (mouth) that is talking to you, with an interesting interaction of the size of the vocabulary.
• H. McGurk and J. MacDonald, “Hearing lips and seeing voices,” Nature 264: 746-748, 1976. The famous ‘McGurk effect’ described: an auditory /ba/ paired with a visual /ga/ leads to a perception of /da/.
• G.A. Calvert, E.T. Bullmore, M.J. Brammer, R. Campbell, S.C.R. Williams, P.K. McGuire, P.W.R. Woodruff, S.D. Iversen, A.S. David, “Activation of auditory cortex during silent lipreading,” Science 276: 593-596, 1997. Classic paper on neuroimaging of speechreading; was the first to show auditory cortex activation from speechreading alone. Later studies have usually found that primary auditory cortex is not activated by speechreading alone, throwing some controversy on the issue.
Discussion papers
• K.O. Bushara, T. Hanakawa, I. Immisch, K. Toma, K. Kansaku, and M. Hallett,
“Neural correlates of cross-modal binding,” Nature Neurosci. 6(2): 190-195, 2003. Shows that specific cortical areas are more active (fMRI) when an audiovisual stimulus is perceived as one, compared to when exactly the same audiovisual stimulus is perceived as separate unimodal events.
• G.A. Calvert and R. Campbell, “Reading speech from still and moving faces: The neural substrates of visible speech,” J. Cogn. Neurosci. 15(1): 57-70, 2003. Visible speech aids speech reception both when moving images are presented and when still images are. In this study, fMRI was used to assess whether different cortical areas subserve these two visual categories. They found that similar language-based cortical areas were activated in either case, although stronger activation was seen for the moving images.
• Y. Komura, R. Tamura, T. Uwano, H. Nishijo, and T. Ono, “Auditory thalamus integrates visual inputs into behavioral gains,” Nature Neurosci. 8(9): 1203-1209, 2005. Single-unit responses in rat MGB (belt region) are modulated by visual stimulus and reward, in an audiovisual discrimination task. Shows that neuron response predicts behavioral response in hit/miss/reject/false alarm conditions; also shows neural and behavioral correlates of task difficulty.
Further reading (♦ most interesting)
• T. Raij, K. Uutela, and R. Hari, “Audiovisual integration of letters in the human
brain,” Neuron 28: 617-625, 2000. MEG imaging of the human brain in response to visual, auditory, and audiovisual letters. It was found that in the superior temporal sulcus audiovisual responses are suppressed relative to the unimodal responses, in contrast to single-unit responses, which are usually potentiated for multimodal stimuli.
• K.O. Bushara, J. Grafman, and M. Hallett, “Neural correlates of auditory-visual stimulus onset asynchrony detection,” J. Neurosci. 21(1): 300-304, 2001.
The authors wanted to identify which brain areas are primarily involved in detecting audio-visual asynchrony, and found the right insula to correlate best with the asynchrony detection effort as measured by response latency.
• A.L. Giraud and E. Truy, “The contribution of visual areas to speech comprehension: a PET study in cochlear implant patients and normal-hearing subjects,” Neuropsychologia 40: 1562-1569, 2002. Cochlear-implant patients (but not normal-hearing subjects) produce strong visual cortex activation when hearing speech.
• E. Paulesu, D. Perani, V. Blasi, G. Silani, N.A. Borghese, U. De Giovanni, S. Sensolo and F. Fazio, “A functional-anatomical model for lipreading,” J. Neurophysiol. 90: 2005-2013, 2003. A PET-scanning study of lexical and non-lexical (reversed video) lipreading implicates Broca’s area in speech perception, as well as some other areas. Associative (non-primary) auditory cortex was activated for both stimulus conditions.
• ♦ E. Macaluso, N. George, R. Dolan, C. Spence, and J. Driver, “Spatial and temporal factors during processing of audiovisual speech: A PET study,” NeuroImage 21: 725-732, 2004. Using PET scanning the authors identify cortical regions that are sensitive to the synchrony and/or spatial coincidence of visual and auditory speech.
• ♦ M.S. Beauchamp, B.D. Argall, J. Bodurka, J.H. Duyn, and A. Martin, “Unraveling multisensory integration: patchy organization within human STS multisensory cortex,” Nature Neurosci. 7(11): 1190-1192, 2004. The authors show with high-resolution fMRI that audiovisual association cortex has patches of auditory, visual, and audiovisual cortex, instead of a homogenous audiovisual cortex.
• D.A. Hall, C. Fussell, and A.Q. Summerfield, “Reading fluent speech from talking faces: Typical brain networks and individual differences,” J. Cogn. Neurosci. 17(6): 939-953, 2005. Includes a literature overview of neuroimaging studies of speech/lip-reading. Investigates with fMRI which cortical areas are active in both auditory and visual speech comprehension, and defines some cortical areas whose activity correlates with individual speechreading proficiency.
• ♦ J.J. McDonald and W.A. Teder-Sälejärvi, F. Di Russo, and S.A. Hillyard, “Neural basis of auditory-induced shifts in visual time-order perception,” Nat. Neurosci. 8(9): 1197-1202, 2005. Measuring ERP (event-related potentials) from human scalp the authors show that time-order judgments correlate with the magnitude of ERP, not its timing. In particular, they find no evidence for the prior held belief that attended stimuli propagate faster through the CNS than unattended stimuli. Auditory cues were used to bias attention towards one of two visual cues, which could appear simultaneously or with a time delay.
Figure removed due to copyright considerations.
Figure 1. Outline of Brodmann areas with functional attribution. From http://spot.colorado.edu/~dubin/talks/brodmann/brodmann.html.
Neural Centers and Perceptual Characteristics of Auditory Short-term Memory
Anna A. Dreyer
Studies of auditory perception rarely incorporate findings regarding the storage of short-term auditory memory: investigators rarely incorporate auditory working memory into the neural circuits and anatomical regions they hypothesize underlie perception. This review focuses on the brain regions, neural circuits and perceptual characteristics of human auditory working memory, the storage of auditory percepts necessary to perform short-term, trial-by-trial tasks, in contrast to global, permanent memory formation. The neural areas participating in auditory working memory greatly overlap with those implicated in auditory perception, suggesting the importance of incorporating working memory into our understanding of auditory perception. Investigators interested in elucidating the mechanisms of auditory memory have focused on characterizing its perceptual limitations and neurophysiological responses. Psychophysical studies seek to understand the properties of the representation in humans, single-cell recordings look for responses characteristic of short-term retention, and imaging studies implicate the brain regions necessary to perform various short-term memory tasks. In addition, lesion studies have been important in examining the perceptual and neurophysiological deficits when a key area implicated in short-term auditory memory has been removed. This review will describe how each methodology has shaped the current understanding of the mechanisms underlying auditory memory. Retention of non-verbal, perceptual information needed in pitch and lateralization discrimination is discussed to allow for comparisons between neurophysiological studies in non-human primates and human perceptual and imaging studies.
Psychophysical studies find parametric segregation of short-term memory stores
Psychophysical studies have explored system storage capacities in delayed performance of tasks with respect to variation in sensory parameters, such as duration, pitch, spatial location and memory load of auditory stimuli. In particular, investigators have discovered that auditory memory has separate stores for important auditory parameters. Anourova et al. (1999) conducted experiments testing whether pitch and location short-term memory can be disrupted with pitch- and location-based distracters. Subjects were asked to identify whether two stimuli, separated by a delay, matched in frequency, in cases where either a stimulus of a different pitch or one in a different spatial location served as a distracter during the delay. Anourova et al. found that only distracters of the same kind (pitch or spatial location) degraded performance, concluding that pitch and location are held in different auditory stores. Findings of Clarke et al. (1999) corroborate those findings. Semal and Demany (1991) suggest that pitch and timbre also have distinctly independent representations in auditory working memory. Other studies characterize which pitch- or location-based stimuli are the most distracting. For instance, Deutsch (1972) asked subjects to compare two test tones and played distracter tones between the two test tones. Subjects were told to ignore the distracter tones and indicate whether the two tones were similar or different in pitch. However, as also found by Anourova et al. (1999) and Clarke et al. (1999), performance degraded with distracter tones having similar pitches to the test tones.
Performance worsened dramatically when the distracter tones were 2/3 tone away from the test tone, regardless of the tone frequency, and improved for distracter tones further above or below that point. These findings indicate that distracter tones are
most effective when they are close in frequency to the test tones. This study and another conducted by the author a year later (Deutsch, 1973) also suggest that auditory memory storage is logarithmically organized, since the errors scaled with the tonal distance between the distracter and test tone. As evidence of further parameter segregation in auditory working memory, Clement et al. (1999) suggested that pitch and loudness memory have separate representations and possibly different processing centers. This study measured performance during a frequency and an intensity discrimination task and found that the patterns of performance degradation as a function of delay between the two test stimuli differ substantially, as shown in Figure 1.
Figure 1: Discrimination performance (d’) as a function of separation time between the two stimuli for frequency and intensity discrimination. Results averaged for four subjects in each condition. Results show that working memory degrades at different rates for the two conditions, suggesting different underlying storage mechanisms. Adapted from Clement et al. (1999).
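The logarithmic organization suggested by Deutsch's data above can be made concrete with a minimal sketch (the frequencies below are arbitrary examples, not values from the study): distances between tones are naturally expressed in semitones, i.e. on a log-frequency scale.

```python
import math

def semitone_distance(f1_hz, f2_hz):
    """Distance between two tones in semitones: 12 * log2(f2 / f1)."""
    return 12.0 * math.log2(f2_hz / f1_hz)

# A distracter 2/3 of a whole tone (4/3 semitone) above the test tone is a
# different absolute distance in Hz at 440 Hz than at 880 Hz, but the same
# distance on the logarithmic scale that the memory store appears to use.
for f_test in (440.0, 880.0):
    f_distracter = f_test * 2 ** ((4.0 / 3.0) / 12.0)
    print(f"test {f_test:.0f} Hz, distracter {f_distracter:.1f} Hz, "
          f"distance {semitone_distance(f_test, f_distracter):.2f} semitones")
```

The equal log-distances are what "errors scaled with tonal distance, regardless of tone frequency" implies.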
The Clement et al. (1999) finding is analogous to different working memory domains processing contrast and spatial frequency in the visual system, and provides further evidence of different memory stores for important auditory working memory parameters.

Neurophysiological studies find neural correlates of short-term retention in medial geniculate body and auditory cortex

Relatively few studies have directly recorded neural responses from auditory cortical areas participating in short-term memory tasks. Studies have implicated the auditory cortex in short-term memory processing in cats (Neff et al., 1975), dogs (Chorazyna and Stepien, 1961) and monkeys (Stepien et al., 1960). This relatively primitive stage of cortical processing was thoroughly investigated by Gottlieb et al. (1989), who concluded that cortical neurons retain information about the first tone frequency during interstimulus intervals (ISIs) using a rate code. Gottlieb et al. trained monkeys to discriminate two sounds separated by a delay in a performance and a passive listening condition, and found that the firing rate during the ISI correlates with the first tone in both conditions. In addition, they found that the firing rate of some units depended on whether the frequency of the first tone matched the frequency of the second tone. This study strongly implicates these neurons in the circuitry of short-term working memory (STWM). Participation of auditory cortex in working memory is also supported by a magnetoencephalography study of tone loudness (Lu et al., 1992).
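The core analysis these studies rest on can be sketched simply (a generic illustration with invented spike counts, not the authors' actual data or code): test whether a unit's delay-period firing depends on which sample tone was presented.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical delay-period spike counts (one value per trial) for a single
# unit, following a high-frequency vs. a low-frequency sample tone.
delay_counts_high = rng.poisson(lam=12.0, size=40)  # invented data
delay_counts_low = rng.poisson(lam=8.0, size=40)    # invented data

# If delay activity carries a rate code for the first tone, the two count
# distributions should differ; a rank-sum test is one simple check.
stat, p = stats.mannwhitneyu(delay_counts_high, delay_counts_low)
print(f"Mann-Whitney U = {stat:.1f}, p = {p:.4g}")
# A small p indicates stimulus-dependent delay activity, i.e. a candidate
# neural correlate of short-term retention.
```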
Figure removed due to copyright considerations.
In a similar study, Sakurai (1990) implicated the medial geniculate body (MGB), the thalamic station that relays input to the auditory cortex, in addition to A1 in STWM. In his experiments, rats made discriminatory same/different responses regarding the frequency of two test tones, while Sakurai recorded spiking patterns from the auditory cortex, the medial geniculate body and the inferior colliculus. Although many of the cells in each area showed differences in firing rates during the presentations of the two stimuli, some cells in the auditory cortex and the medial geniculate body showed differential, stimulus-dependent activity during the delay between the two stimuli, indicating involvement in retention. Both this and the Gottlieb et al. (1989) study suggest a neural correlate of retention in the form of a rate code dependent on the first stimulus.
Figure 2: Responses of a medial geniculate body (MGB) unit demonstrating the delay correlate between tone presentation periods and the delay period. Left: diagrams represent peri-stimulus time (PST) histograms during a sample tone (3 s), the interstimulus delay (3 s) and the test tone (1 s). Data are presented for six types of trials with tones identified as H=high, L=low, and responses as G=go and NG=no go. Mismatch conditions are H-L and L-H. Right: bar graphs of median spike frequencies (not Hz) during the sample tone, delay and test tone, and for the first and second halves of the delay. The asterisk shows statistically significant differences (p<.05) between high and low tones during the sample tone, delay and test tone. Differences in the number of spikes also exist in both halves of the delay for different stimuli. Adapted from Sakurai (1990).
Another non-cortical area neurophysiologically implicated in working memory is the hippocampus. Although this area is often associated with long-term, reference memory, Sakurai (1994) finds that some hippocampal neurons are solely involved in either working or reference memory, whereas other neurons are involved in both types of memory. Imaging studies (see below) implicate the prefrontal cortex (PFC) in STWM (Romanski and Goldman-Rakic, 2002). The PFC consists of several cytoarchitectonically defined regions, and there is evidence that these regions are distinct with respect to sensory modality and
Figure removed due to copyright considerations.
stimulus features (Levy and Goldman-Rakic, 2000; Carmichael and Price, 1995) and integrate information across sensory modalities (Fuster et al., 2000; Bodner et al., 1996). Romanski et al. (1999) find distinct regions in the belt and parabelt auditory cortex that have neuronal targets in the prefrontal cortex of the rhesus monkey. Since the belt and parabelt regions have been implicated in STWM (Gottlieb et al., 1989; Sakurai, 1990), these histological studies give further evidence that the prefrontal cortex incorporates the auditory STWM neural substrate downstream. Neurophysiological recordings of PFC activity during auditory STWM tasks would shed light on the neural circuitry involving the MGB, A1 and PFC, but such studies are lacking.

Imaging work implicates auditory cortical, frontal, parietal and cerebellar areas in STWM tasks

Functional imaging studies have shed light on neural correlates of perceptual processing and implicate various cortical areas in auditory working memory. These studies suggest that the auditory cortex, higher cortical areas, and the cerebellum participate in working memory processing, but the degree of activation and the precise neural correlate depend on the type of memory task. Many imaging studies find activation of the prefrontal cortex in STWM tasks (e.g. Romanski and Goldman-Rakic, 2002). Levy and Goldman-Rakic (2000) find “domain-specific” organization of working memory function in the prefrontal cortex, with activation of one or more of several subdivisions of dorsolateral prefrontal cortex depending on the type of information processed. Romanski and Goldman-Rakic (2002) specifically implicate the dorsolateral and ventrolateral prefrontal cortices in retention of auditory stimuli. Celsis et al. (1999) also suggested that frontal and prefrontal regions play a role in auditory working memory in the maintenance of tonal patterns. For pitch comparison memory tasks, Zatorre et al. (1994) found greater activation of the right frontal and temporal lobes, and of the parietal and insular cortex, in high memory load conditions. Temporal lobe activation is expected, given its recognized role in musical processing, as is frontal activation, known for executive functions. Parietal and insular cortical activation probably reflected attention. These findings are in line with those of Griffiths et al. (1999), who also found a more extensive right-lateralized network including the cerebellum, posterior temporal and inferior frontal regions when subjects made “same/different” judgments while comparing the pitch of 6 tones. This finding stands in contrast with a pitch-matching study by Gaab et al. (2003), which found greater pitch working memory activation in the left supramarginal gyrus (SMG) and the left dorsolateral cerebellum, as well as other structures in the temporal cortex.
Figure 3: Brain activation pattern (P<.05, corrected) during the initial imaging time points (0-2 s after the end of the auditory stimulation). The pattern is dominated by strong (left>right) activation of the superior temporal gyrus bilaterally, including Heschl’s gyrus and auditory association cortex anterior, lateral and posterior to Heschl’s gyrus, including the planum temporale and the supramarginal gyrus. Activation also includes bilateral posterior dorsolateral frontal regions, a superior parietal region on the right side, the pre-SMA region (left>right) and lobules V and VI of the left cerebellum.
Celsis et al. (1999) also implicated the left SMG and right frontal and primary and secondary auditory cortices in a task requiring memory judgments between tones of different pitch height or spectral content. Thus, there seems to be some disagreement about the side of the asymmetry of activation patterns in pitch memory tasks. Several studies support non-motor functions for the cerebellum, including auditory verbal memory function (Grasby et al., 1993), tone recognition (Holcomb et al., 1998) and musical duration discrimination (Parsons, 2001). Mathiak et al. (2004) also implicate the cerebellum in non-verbal auditory memory. Although the latter study suggests that the auditory cortex is activated early in the delay between the first and second test tones, and that the SMG and cerebellum are activated later, Gaab et al. (2003) find that the cerebellum activates immediately and remains active throughout the task. However, both studies agree on cerebellar involvement in STWM tasks. For audiospatial memory tasks, Martinkauppi et al. (2000) found bilateral activation in superior, middle and inferior frontal gyri and posterior parietal and middle temporal cortices. These areas were similarly found in the Zatorre et al. (1994) pitch study, but here with bilateral activation. Martinkauppi et al. (2000) also revealed similar activation in audiospatial and visuospatial tasks, indicating that audiospatial and visuospatial performance in STM tasks rely on a common neural pathway.
Figure removed due to copyright considerations.
Lesion studies indicate debilitation of STWM performance with bilateral auditory association cortical lesions

Lesion studies in monkeys implicate superior temporal cortex, a high-level auditory association area, in short-term memory tasks (Colombo et al., 1990; Colombo, 1996). However, Colombo (1996) suggests that only bilateral lesions significantly impair auditory short-term memory performance. This study used a delayed matching-to-sample task with high-frequency and low-frequency tones. Monkeys had to indicate whether two consecutive stimuli (separated by a 2 s pause) were the same or different, and received lesions of association cortex with testing after each unilateral operation. The lesions corresponded to cytoarchitectonic area TA, sparing the primary (area TC) and secondary (area TB) fields of auditory cortex. Visual performance after the first operation was unaffected, but auditory performance was affected in some monkeys after many sessions, and some were unable to regain preoperative levels. With bilateral lesions, monkeys were unable to reach preoperative levels even with extensive testing, and performance deteriorated further as the delay lengthened.
Figure 4: Performance (percent correct) on the pattern-discrimination task preoperatively and after the second (bilateral) cortical lesion. Top and bottom dotted lines represent criterion and chance levels of performance, respectively. Adapted from Colombo et al. (1996).
This study indicates that good performance is possible with one intact superior temporal cortex, corroborating imaging findings of greater unilateral activation (Zatorre et al., 1994; Gaab et al., 2003; Celsis et al., 1999), and clearly implicates this area in STWM processing.

Conclusion: complex interactions of STWM circuitry

Combined psychophysical, neurophysiological, imaging and lesion studies implicate a complex network of neuronal areas involved in STWM processing. Psychophysical work suggests that different memory representations and processing exist for important STWM parameters such as pitch, timbre, spatial location and loudness, with only slight parametric interaction in memory tasks. Neurophysiological studies find neural correlates of rate-based coding and retention of information about prior stimuli in the MGB and A1, but further studies need to map out the connections and coding schemes involving these centers, as well as other centers implicated by imaging work. Functional imaging studies suggest a complex asymmetry in pitch and spatial location memory tasks involving primary and secondary auditory cortices,
Figure removed due to copyright considerations.
frontal and prefrontal areas, parietal areas, as well as the cerebellum. Studies disagree about the temporal patterns of activation and about the right or left asymmetry of activation. Lesion studies in monkeys suggest that only bilateral auditory cortical lesions significantly affect performance, a finding that may corroborate imaging studies reporting asymmetric activation patterns. Because of the wide involvement of many cortical and subcortical areas in STWM processing, as well as disagreement regarding activated areas, further investigations of perceptual auditory phenomena should keep possible involvement of STWM in mind when designing tasks and evaluating results.
Suggested Papers
Background

Pasternak, T., and Greenlee, M.W. (2005). Working memory in primate sensory systems. Nature Reviews Neurosci., 6: 97-107.

Assigned Papers

Clement, S., Demany, L., and Semal, C. (1999). Memory for pitch versus memory for loudness. J. Acoust. Soc. Am., 106(5): 2805-2811.

Gottlieb, Y., Vaadia, E., and Abeles, M. (1989). Single unit activity in the auditory cortex of a monkey performing a short-term memory task. Exp. Brain Res., 74: 139-148.

Gaab, N., Gaser, C., Zaehle, T., Jancke, L., and Schlaug, G. (2003). Functional anatomy of pitch memory—an fMRI study with sparse temporal sampling. NeuroImage, 19: 1417-1426.

Further reading

Colombo, M., Rodman, H.R., and Gross, C.G. (1996). The effects of superior temporal lesions on the processing and retention of auditory information in monkeys (Cebus apella). J. Neurosci., 16: 4501-4517.

Deutsch, D. (1973). Interference in memory between tones adjacent in the musical scale. J. Exp. Psychol., 100: 228-231.

Mathiak, K., Hertrich, I., Grodd, W., and Ackermann, H. (2004). Discrimination of temporal information at the cerebellum: functional magnetic resonance imaging of nonverbal auditory memory. NeuroImage, 21: 154-162.

Sakurai, Y. (1990). Cells in the rat auditory system have sensory-delay correlates during the performance of an auditory working memory task. Behavioral Neurosci., 104(6): 856-868.
Works Cited

Anourova, I., Rama, P., Alho, K., Koivusalo, S., Kahnari, J., and Carlson, S. (1999). Selective interference reveals dissociation between auditory memory for location and pitch. Neuroreport, 10: 3543-3547.

Bodner, M., Kroger, J., and Fuster, J.M. (1996). Auditory memory cells in dorsolateral prefrontal cortex. Neuroreport, 7: 1905-1908.

Carmichael, S.T., and Price, J.L. (1995). Sensory and premotor connections of the orbital and medial prefrontal cortex of macaque monkeys. J. Comp. Neurol., 363: 642-664.

Celsis, P., Boulanouar, K., Doyon, B., Ranjeva, J.P., Berry, I., Nespoulous, J.L., and Chollet, F. (1999). Differential fMRI responses in the left posterior superior temporal gyrus and left supramarginal gyrus to habituation and change detection in syllables and tones. NeuroImage, 9: 135-144.

Chorazyna, H., and Stepien, L. (1961). Impairment of auditory recent memory produced by cortical lesions in dogs. Acta Biol. Exp., 21: 177-187.

Clarke, S., Adriani, M., and Bellmann, A. (1998). Distinct short-term memory systems for sound content and sound localization. Neuroreport, 9(15): 3433-3437.

Clement, S., Demany, L., and Semal, C. (1999). Memory for pitch versus memory for loudness. J. Acoust. Soc. Am., 106(5): 2805-2811.

Colombo, M., D'Amato, M.R., Rodman, H.R., and Gross, C.G. (1990). Auditory association cortex lesions impair auditory short-term memory in monkeys. Science, 247: 336-338.

Colombo, M., Rodman, H.R., and Gross, C.G. (1996). The effects of superior temporal lesions on the processing and retention of auditory information in monkeys (Cebus apella). J. Neurosci., 16: 4501-4517.

Deutsch, D. (1972). Mapping of interactions in the pitch memory store. Science, 175: 1020-1022.

Deutsch, D. (1973). Interference in memory between tones adjacent in the musical scale. J. Exp. Psychol., 100: 228-231.

Fuster, J.M., Bodner, M., and Kroger, J.K. (2000). Cross-modal and cross-temporal association in neurons of frontal cortex. Nature, 405: 347-351.

Gaab, N., Gaser, C., Zaehle, T., Jancke, L., and Schlaug, G. (2003). Functional anatomy of pitch memory—an fMRI study with sparse temporal sampling. NeuroImage, 19: 1417-1426.

Gottlieb, Y., Vaadia, E., and Abeles, M. (1989). Single unit activity in the auditory cortex of a monkey performing a short-term memory task. Exp. Brain Res., 74: 139-148.

Grasby, P.M., Frith, C.D., Friston, K.J., Bench, C., Frackowiak, R.S., and Dolan, R.J. (1993). Functional mapping of brain areas implicated in auditory verbal memory function. Brain, 116(1): 1-20.

Griffiths, T.D., Johnsrude, I., Dean, J.L., and Green, G.G.R. (1999). A common neural substrate for the analysis of pitch and duration pattern in segmented sound? Neuroreport, 10: 3825-3830.

Holcomb, H.H., Medoff, D.R., Caudill, P.J., Zhao, Z., Lahti, A.C., Dannals, R.F., and Tamminga, C.A. (1998). Cerebral blood flow relationships associated with a difficult tone recognition task in trained normal volunteers. Cerebral Cortex, 8: 534-542.

Levy, R., and Goldman-Rakic, P.S. (2000). Segregation of working memory functions within the dorsolateral prefrontal cortex. Exp. Brain Res., 133(1): 23-32.

Lu, Z.L., Williamson, S.J., and Kaufman, L. (1992). Behavioral lifetime of human auditory sensory memory predicted by physiological measures. Science, 258: 1668-1670.

Martinkauppi, S., Rama, P., Aronen, H.J., Korvenoja, A., and Carlson, S. (2000). Working memory of auditory localization. Cerebral Cortex, 10(9): 889-898.

Mathiak, K., Hertrich, I., Grodd, W., and Ackermann, H. (2004). Discrimination of temporal information at the cerebellum: functional magnetic resonance imaging of nonverbal auditory memory. NeuroImage, 21: 154-162.

Neff, W., Diamond, I., and Casseday, J. (1975). Behavioral studies of auditory discrimination: central nervous system. In: Keidel, W.D., and Neff, W.D. (eds), Handbook of Sensory Physiology, Vol. V/2. Springer, Berlin Heidelberg New York, 307-400.

Parsons, L.M. (2001). Exploring the functional neuroanatomy of music performance, perception and comprehension. Ann. NY Acad. Sci., 930: 211-231.

Pasternak, T., and Greenlee, M.W. (2005). Working memory in primate sensory systems. Nature Reviews Neurosci., 6: 97-107.

Romanski, L.M., Bates, J.F., and Goldman-Rakic, P.S. (1999). Auditory belt and parabelt projections to the prefrontal cortex in the rhesus monkey. J. Comp. Neurol., 403: 141-157.
Romanski, L.M., and Goldman-Rakic, P.S. (2002). An auditory domain in primate prefrontal cortex. Nature Neurosci., 5: 15-16.

Sakurai, Y. (1990). Cells in the rat auditory system have sensory-delay correlates during the performance of an auditory working memory task. Behavioral Neurosci., 104(6): 856-868.

Sakurai, Y. (1994). Involvement of auditory cortical and hippocampal neurons in auditory working memory and reference memory in the rat. J. Neurosci., 14(5): 2606-2623.

Semal, C., and Demany, L. (1991). Dissociation of pitch from timbre in auditory short-term memory. J. Acoust. Soc. Am., 89: 2404-2410.

Stepien, L.S., Cordeau, J.P., and Rasmussen, T. (1960). The effect of temporal lobe and hippocampal lesions on auditory and verbal recent memory. Brain, 83: 470-489.

Zatorre, R.J., Evans, A.C., and Meyer, E. (1994). Neural mechanisms underlying melodic perception and memory for pitch. J. Neurosci., 14: 1908-1919.
Harvard-MIT Division of Health Sciences and Technology HST.722J: Brain Mechanisms for Hearing and Speech Course Instructors: Bertrand Delgutte, M. Christian Brown,
Joe C. Adams, Kenneth E. Hancock, Frank H. Guenther, Jennifer R. Melcher, Joseph S. Perkell, David N. Caplan
Student Topic: Brain Attending a Cocktail Party Adrian KC Lee
Brain Attending a Cocktail Party
Introduction
Imagine after our topic presentation, we go to a crowded pub to celebrate. You are trying hard
to ignore those eighties tunes blaring out of the loudspeakers while you are chatting with your
colleagues. Suddenly, a familiar melody from a Mozart piano concerto grabs your attention
– yes, your cell phone alerts you that your partner just called for the third time…
This classic phenomenon, in which our brain analyzes a scene by perceptually organizing
sensory data to form auditory objects (or auditory streams), is often referred to as the “Cocktail
Party Effect” (Cherry, 1953). Many cues have been identified that influence perceptual
organization, but little is known about the actual brain mechanisms underlying this
phenomenon. In this proposed topic, we look at the latest developments in the quest to find the
neural basis for auditory stream segregation.
Background – Auditory grouping mechanisms
The ability to form auditory objects is important in the natural environment where sounds
arriving at our ears are a mixture of spectro-temporal components that may have arisen
from different auditory events. We constantly analyze an auditory scene by trying to group
related components from one source, and segregating out other frequency components that are
not in the attended object (Carlyon, 2004). This ability to bring acoustical events to the
attentional foreground may increase a species' chance of survival
through auditory awareness of the movements of its predators. Humans also rely on auditory
grouping mechanisms for daily communication, especially in the presence of noise and other
competing sources, since we need to group simultaneous components originating from a single
source across frequency, as well as grouping events across time, in order to hear whole words
and messages. In his seminal book, Bregman (1990) provides us with working definitions on the
terminologies used in the world of auditory scene analysis. He described auditory stream
segregation as:
The general process of auditory scene analysis in which links are formed between parts
of the sensory data. These links will affect what is included and excluded from our
perceptual descriptions of distinct auditory events.
HST.722 Brain Mechanisms for Hearing and Speech Page 1 of 6 www.bsscommunitycollege.in www.bssnewgeneration.in www.bsslifeskillscollege.in
206www.onlineeducation.bharatsevaksamaj.net www.bssskillmission.in
WWW.BSSVE.IN
Student Topic: Brain Attending a Cocktail Party Adrian KC Lee
If these links, which are also commonly referred to as streaming cues, connect sensory parts
across time, we refer to such perceptual grouping as sequential. However, if these sensory data
coexist in time, and multiple auditory objects are formed by perceptually partitioning the
spectral content into distinct objects (e.g., harmonics in a vowel), we refer to such grouping
as simultaneous. Apparent spatial location, onset/offset synchrony, frequency proximity, and
fundamental frequency are but some of the common acoustic cues that we employ in auditory
scene analysis.
Experimental paradigm – Buildup of Streaming
How does one systematically investigate the process of stream segregation that occurs in a
complex auditory environment? A popular paradigm is to use an ambiguous auditory figure that
could either be heard as one stream with a galloping rhythm (commonly labeled as “Horse”) or
as two concurrent streams with two different tempi (“Morse”) (See Figure 1). The basic stimulus
consists of a high tone A alternating with a low tone B, in repeated ABA_ sequences. If the
frequency difference (∆f) between the A and B tones is small, then neighboring tones tend to
bind together perceptually, resulting in a “Horse” rhythm. Conversely, if ∆f is large, the A and B
tones no longer bind to each other, resulting in a “Morse” rhythm. The tone repetition rate
also influences the percept: the faster the repetition rate, the more binding there is between the
A and B tones (van Noorden, 1975).
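A minimal sketch of how such an ABA_ sequence can be synthesized (all parameter values below are illustrative, not taken from the studies discussed here); the perceptually relevant knobs are the A-B separation ∆f and the tone repetition rate:

```python
import numpy as np

def aba_sequence(f_a=1000.0, delta_f_semitones=6.0, tone_ms=100.0,
                 n_repeats=10, fs=16000):
    """Synthesize a repeating ABA_ triplet sequence (A = high tone, B = low).

    Small delta_f tends to be heard as one galloping stream ("Horse");
    large delta_f as two isochronous streams ("Morse").
    """
    f_b = f_a / 2 ** (delta_f_semitones / 12.0)     # B sits delta_f below A
    n = int(fs * tone_ms / 1000.0)
    t = np.arange(n) / fs
    ramp = np.minimum(1.0, np.minimum(t, t[::-1]) / 0.005)  # 5 ms on/off ramps
    tone_a = np.sin(2 * np.pi * f_a * t) * ramp
    tone_b = np.sin(2 * np.pi * f_b * t) * ramp
    gap = np.zeros(n)                               # the "_" slot
    return np.tile(np.concatenate([tone_a, tone_b, tone_a, gap]), n_repeats)

# Faster repetition rates (shorter tone_ms) and larger delta_f both push the
# percept toward "Morse"; for example:
morse_like = aba_sequence(delta_f_semitones=9.0, tone_ms=60.0)
```

Listening to the output at small versus large values of delta_f_semitones reproduces the “Horse”/“Morse” ambiguity described above.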
At intermediate values of ∆f and tone repetition rate, the initial galloping “Horse” percept
changes after a few seconds of listening to a “Morse” percept (Anstis and Saida, 1985;
Bregman, 1978; Carlyon et al, 2001). With this systematic change in auditory percept over time,
it is possible to record neural responses at various points during an ongoing sequence of
sounds without any change in the evoking stimulus, and compare the neural responses
associated with dramatically different percepts.
Figure removed due to copyright considerations.
Figure 1 For the correct parameters, these sequences are ambiguous and can be heard with one or two perceptual organizations with different rhythms: (Left): a characteristic galloping rhythm (“Horse”); (Right): 2 isochronous streams, like Morse code (“Morse”). Colored regions correspond to perceptual streams. (Taken from Fig. 2, Cusack, 2005).
Key Areas of Research
Many investigations into the neural correlates of auditory streaming employ the aforementioned
ABA- paradigm, or the variants thereof, but their generic approaches and the specific questions
they address can be divided into several distinct classes. One class of approach is to use
single-unit recordings in animals to infer perceptual effects of stream segregation. Fishman et
al. (2001, 2004) showed that multiunit spiking responses to tone sequences in the primary
auditory cortex of awake monkeys follow the pattern that one might expect on the basis of
published psychophysical data from human subjects. Bee and Klump (2004) performed similar
sequential streaming experiments and recorded neural responses in the auditory forebrain of
awake starlings. They concluded that while there are preattentive auditory processes, such as
frequency selectivity and forward masking, that contribute to the perceptual segregation of
sequential acoustic events having different frequencies into separate auditory streams, there
may be additional processes that are required to account for all known perceptual effects
related to sequential stream segregation. The major confounding factor in interpretations of
these conclusions is that the neural response patterns that are putatively associated with the
one- and two-stream percepts were always induced using physically different stimuli.
Another class of approach combines human neuromagnetic and behavioral measures.
Gutschalk et al (2005) used magnetoencephalography (MEG) to examine the neural bases of
auditory stream formation. They concluded that there is a tight coupling between auditory
cortical activity and streaming perception, suggesting that an explicit representation of auditory
streams may be maintained within nonprimary auditory areas. They hypothesized that selective
adaptation by feature-specific neurons might be a general neural mechanism subserving
perceptual organization. Cusack (2005) combined functional magnetic resonance imaging
(fMRI) with psychophysical experiments to test for an effect of perceptual organization across
the whole brain.
A third class of approach addresses the specific interaction between auditory streaming and
attention. This remains a controversial topic. Due to time constraints, the scope of the proposed
discussion will not include literature that explicitly addresses this issue, but some key results are
summarized here for completeness. It has been suggested that streaming is a preattentive
process based on recording of event-related potentials (ERP) using a mismatch negativity
paradigm (MMN) (Sussman et al, 1999). It was argued by these authors that since MMN, which
can occur in the absence of attention, is elicited only by the infrequent odd-ball presentations
within streams, the two potential streams of sound were segregated without attention being
focused on the auditory stimuli. However, Carlyon et al (2001) and Cusack et al (2003) argue
that the buildup of streaming is an attentive process, and previous experiments by van Noorden
(1975) suggest that listeners have control over their perception in the ambiguous ∆f region. In
other words, whether or not streaming is an attentive process is not conclusive, but the
techniques used to probe such a high cognitive process are, nonetheless, interesting to follow.
The last active area of research is computational modeling of stream segregation (e.g.,
Beauvois and Meddis, 1991; McCabe and Denham, 1997). An interesting neural network model,
known as ARTSTREAM (Grossberg et al., 2004), uses adaptive resonance theory to propose how
the brain achieves auditory scene analysis. Even though this neural network structure does not
have clear neural correlates, its putative structure may inspire neurophysiologists to find neural
units that have similar behaviors. This is perhaps an interesting paper to discuss if the class
would like to be exposed to neural network modeling.
Recent Developments and Proposed Papers for Discussion
In this month’s publication of Neuron, Micheyl and colleagues (Micheyl et al., 2005) reported their
comparative experiments on auditory stream segregation in humans and awake monkeys. Unlike
in previous studies, they used identical stimuli to conduct psychophysical experiments in
humans and single-unit extracellular recordings in the primary auditory cortex of awake
monkeys. Interestingly, using signal detection theory discussed in previous sessions, the
authors can now compare quantitatively the stochastic neural responses with the probabilistic
perceptual judgments.
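That signal-detection logic can be illustrated with a short sketch (invented spike counts; this is a generic illustration, not Micheyl et al.'s actual analysis): a neurometric d' computed from one unit's spike-count distributions under the two conditions can be compared directly with the psychometric d' obtained from listeners' judgments on the same stimuli.

```python
import numpy as np

def d_prime(counts_a, counts_b):
    """Discriminability index: difference in means divided by the RMS of
    the two standard deviations (one common convention)."""
    sa, sb = np.std(counts_a, ddof=1), np.std(counts_b, ddof=1)
    return (np.mean(counts_b) - np.mean(counts_a)) / np.sqrt(0.5 * (sa**2 + sb**2))

rng = np.random.default_rng(1)
# Hypothetical spike counts of one A1 unit under the two stimulus
# conditions (purely illustrative numbers).
counts_1 = rng.poisson(10.0, size=100)
counts_2 = rng.poisson(14.0, size=100)
print(f"neurometric d' = {d_prime(counts_1, counts_2):.2f}")
```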
Earlier this year, Cusack (2005) used fMRI to test for an effect of perceptual organization across
the whole brain and suggested that the intraparietal sulcus (IPS) might play a role in perceptual
organization. This is also exciting in light of the growing evidence in the literature suggesting
that the IPS is also important for binding in vision, touch and cross-modally.
In a sequel to the Wang et al. (2005) paper, Bartlett and Wang (2005) extended the investigation of
recordings of auditory cortical neurons in awake marmosets to 2-sound sequences. Not only do
their results demonstrate that persistent modulations of the responses of an auditory cortical
neuron to a given stimulus can be induced by preceding stimuli, they also find that decreases or
increases of responses to the succeeding stimuli are dependent on the spectral, temporal, and
intensity properties of the preceding stimulus. Such long-lasting modulation by stimulus context
in the cortex suggests these response properties may be important for auditory streaming and
segregation.
All the proposed papers for discussion highlight some new experimental approaches or latest
findings on the neural correlates of auditory stream segregation. This is a very active area of
research on many fronts, ranging from neurophysiology to psychoacoustics. However, we are
still in the infancy of the quest to understand the brain mechanisms behind auditory
scene analysis. Next time you are in a pub drinking a beer, marvel at how the brain
accomplishes such an amazing feat!
Papers for Discussion
Bartlett, E., Wang, X. (2005). “Long-Lasting Modulation by Stimulus Context in Primate Auditory Cortex,” Journal of Neurophysiology, 94, 83-104.
Cusack, R. (2005). “The Intraparietal Sulcus and Perceptual Organization,” Journal of Cognitive Neuroscience, 17, 641-651.
Micheyl, C., Tian, B., Carlyon, R., Rauschecker, J. (2005). “Perceptual Organization of Tone Sequences in the Auditory Cortex of Awake Macaques,” Neuron, 48, 139-148.
Background
Carlyon, R. (2004). “How the brain separates sounds,” Trends in Cognitive Sciences, 8, 465-471.
Darwin, C.J. (1997). “Auditory grouping,” Trends in Cognitive Sciences, 1, 327-333.
Further Reading
Bee, M., Klump, G. (2004). “Primitive Auditory Stream Segregation: A Neurophysiological Study in the Songbird Forebrain,” Journal of Neurophysiology, 92, 1088-1104.
Fishman, Y., Reser, D., Arezzo, J., Steinschneider, M. (2001). “Neural correlates of auditory stream segregation in primary auditory cortex of the awake monkey,” Hearing Research, 151, 167-187.
Fishman, Y., Arezzo, J., Steinschneider, M. (2004). “Auditory stream segregation in monkey auditory cortex: effects of frequency separation, presentation rate, and tone duration,” Journal of the Acoustical Society of America, 116, 1656-1670.
Grossberg, S., Govindarajan, K., Wyse, L., Cohen, M. (2004). “ARTSTREAM: a neural network model of auditory scene analysis and source segregation,” Neural Networks, 17, 511-536.
Gutschalk, A., Micheyl, C., Melcher, J., Rupp, A., Scherg, M., Oxenham, A. (2005). “Neuromagnetic Correlates of Streaming in Human Auditory Cortex,” Journal of Neuroscience, 25, 5382-5388.
Sussman, E., Bregman, A., Wang, W., Khan, F. (2005). “Attentional modulation of electrophysiological activity in auditory cortex for unattended sounds within multistream auditory environments,” Cognitive, Affective, & Behavioral Neuroscience, 5, 93-110.
Winkler, I., Takegata, R., Sussman, E. (2005). “Event-related brain potentials reveal multiple stages in the perceptual organization of sound,” Cognitive Brain Research, 25, 291-299.
Reference
Beauvois, M., Meddis, R. (1991). “A computer model of auditory stream segregation,” Quarterly Journal of Experimental Psychology, 43, 517-541.
Bregman, A. S. (1978). “Auditory streaming is cumulative,” Journal of Experimental Psychology: Human Perception and Performance, 4, 380-387.
Bregman, A. S. (1990). Auditory Scene Analysis: The Perceptual Organization of Sound, Cambridge, MA: MIT Press.
Carlyon, R. P., Cusack, R., Foxton, J. M., Robertson, I. H., (2001). “Effects of attention and unilateral neglect on auditory stream segregation,” Journal of Experimental Psychology: Human Perception and Performance, 27, 115-127.
Cherry, E. C. (1953). “Some experiments on the recognition of speech, with one and with two ears”, Journal of the Acoustical Society of America 25, 975-979.
Cusack, R., Deeks, J., Aikman, G., Carlyon, R. P. (2004). “Effects of Location, Frequency Region, and Time Course of Selective Attention on Auditory Scene Analysis,” Journal of Experimental Psychology: Human Perception and Performance, 30 (4), 643-656.
McCabe, S., Denham, M. (1997). “A model of auditory streaming,” Journal of the Acoustical Society of America, 101, 1611-1621.
Van Noorden, L. (1975). “Temporal coherence in the perception of tone sequences,” doctoral thesis, Eindhoven University of Technology, The Netherlands.
Wang, X., Lu, T., Snider, R., Lian, L. (2005). “Sustained firing in auditory cortex evoked by preferred stimuli,” Nature, 435, 341-346.
Carrie Niziolek
HST.722 topic proposal
October 26, 2005

What the FOXP2 gene can tell us about the neural control of speech and language

In contrast with all other animals, humans have evolved the ability to acquire spoken language. Humans' capacity for speech must, in some respect, be derived from our genes. What is it in our genes that enables us to acquire speech? What would such a gene do, and what neural structures would depend on its expression? What would happen if it were disrupted? There has been speculation about a genetic substrate for speech ability for a long time, but it wasn't until very recently that scientists were able to track down and identify a candidate gene, one that would be necessary for the acquisition of spoken language.

A speech-impaired family known as the KE family afforded geneticists a promising opportunity to look for the "speech gene." Half the members of the KE family have severe impairments in speech and language, implying a hereditary mutation that is passed on in a simple manner. Those with the disorder exhibit deficits in speech motor control: they have trouble with fine movements in the lower half of the face, disrupting their ability to speak. However, their disorder is not entirely articulatory: those with the mutation also have defects unrelated to verbalization, for example, a deficit in making a written list of words starting with a particular letter [1].

In 2001, after years of research with the KE family and CS, an unrelated patient with a similar speech abnormality, geneticists Lai and Fisher alighted on the FOXP2 gene on chromosome 7q31, a region known to be related to brain function. (The name FOXP2 comes from forkhead box, for the forkhead box protein family involved in transcription.) FOXP2 is expressed in many regions in the developing
brain, including the cortical plate, basal ganglia, thalamus, inferior olive, and cerebellum. A single base-pair mutation in FOXP2 caused the speech defects in both CS and the KE family [2].

To determine FOXP2's contributions to speech ability – to establish what it actually does when it is expressed – recent studies have looked for differences in brain structure between normal humans and those with mutated copies of the gene. If we can determine what cortical structures the gene is regulating, and correlate structural irregularity with behavioral impairments, we can learn more about the neural circuits for speech processing and articulation. The papers I have proposed elucidate the role of FOXP2 with data about its expression in speech and motor centers in the brain, as well as the anatomical, functional, and behavioral consequences of its disruption.

PAPER 1: Liégeois et al., 2003 [3], neuroimaging (fMRI). The functional abnormalities in members of the KE family can be revealed in functional imaging studies. Results by Liégeois show a large discrepancy in brain activation between subjects with normal and mutant copies of the FOXP2 allele. Specifically, affected family members showed an atypical distribution of activation during both speech (nonword repetition) and language (word generation) tasks. Particularly striking was an abnormally broad activation throughout cortex and a simultaneous underactivation in Broca's area and other speech-related cortical and subcortical brain regions [4,5]. The study also discusses morphological abnormality in these structures, principally the caudate nucleus and the putamen. Other related imaging studies [1] use voxel-based morphometry to show bilateral reductions in gray matter in the ventral cerebellum. All this is strong evidence that FOXP2 mediates cortical processes related to language.

Figure removed due to copyright considerations.
PAPER 2: Shu et al., 2005 [6], knockout mice & immunohistochemistry. Because FOXP2 is expressed in other animals, albeit in a slightly different form, we can induce mutations in the gene and examine the relations between gene irregularity and the resulting phenotypic differences. Shu et al. engineered two types of knockout mice, a type with two disrupted copies of FOXP2 (homozygous) and a type with one normal and one disrupted copy (heterozygous, the genotype of the affected members of the KE family). Heterozygous knockout mice made many fewer ultrasonic vocalizations in response to separation from the mother than did control mice, whereas homozygous knockout mice made no vocalizations at all. This result implies that FOXP2 plays a role in the development of a neural circuit employed in social communication, not uniquely related to human speech. In addition, heterozygous knockouts exhibited a delay in motor development, while homozygous knockout mice had a severe motor impairment, affirming the gene's involvement in motor and sequencing areas of the brain. Histological analysis of the mice illustrated irregular cerebellar sections in knockout mice, particularly in the Purkinje cells and in the external granular layer. FOXP2 therefore seems to be important for both cerebellar development and mechanisms of social communication.

Figure removed due to copyright considerations.

PAPER 3: Teramitsu et al., 2004 [7], histology & vocal learning in songbirds. FOXP2 is also expressed in songbirds, and current research seems to suggest it plays a role in birds' vocal learning. According to Teramitsu et al., FOXP2 is localized to related subcortical structures in humans and birds (speech areas/song nuclei, cerebellar Purkinje cells). This colocalization suggests associated mechanisms underlying vocal learning in both humans (speech) and birds (song).
FOXP2 and the related gene FOXP1 show parallel expression in the human and songbird brain, predicting a possible role of FOXP1 in speech ability.

This topic is very relevant to studying the brain mechanisms for speech. The FOXP2 gene has the potential to be a powerful tool for exploring the neural circuits that allow us to communicate via spoken language. However, the field would certainly benefit from some spirited conversation about the limitations of these methods, as well as any unfounded conclusions that should be avoided. Furthermore, the function of the gene itself is an interesting topic of study. If FOXP2 is a "language" gene, what is it doing in mice, songbirds, and fungi? If it is merely a gene enabling fine motor movements such as those used in speech, what leads to defects in language development in individuals lacking the normal gene? What is the relationship between the motor mouth control FOXP2 may regulate and the language ability its disruption impairs? Does a hearing- or motor-feedback loop, such as that described in our speech motor control papers, play a part in any of this?

Finally, where did this gene come from? By comparing the gene's expression in other animals, we can gain insight on when and how speech ability evolved in our species (and in no others).

The search for and evaluation of the speech gene, FOXP2, is an exciting and extremely recent issue, with broad-reaching implications for research on the evolution of human speech. Current research combines techniques from genetics, brain imaging, immunohistochemistry, and behavioral science to get at the function of FOXP2 in humans and animals. The FOXP2 gene is critical for the development of the neural substrates of speech and language, and will prove to be important to our understanding of circuits underlying speech and articulation.
1. Belton E et al. Bilateral brain abnormalities associated with dominantly inherited verbal and orofacial dyspraxia. Human Brain Mapping 18(3): 194-200, 2005.
2. ** Lai CS, Fisher SE, Hurst JA, Vargha-Khadem F & Monaco AP. A forkhead-domain gene is mutated in a severe speech and language disorder. Nature 413: 519-523, 2001.
3. * Liégeois F, Baldeweg T, Connelly A, Gadian DG, Mishkin M & Vargha-Khadem F. Language fMRI abnormalities associated with FOXP2 gene mutation. Nature Neuroscience 6: 1230-1237, 2003.
4. ** MacDermot KD et al. Identification of FOXP2 truncation as a novel cause of developmental speech and language deficits. Am. J. Hum. Genet. 76: 1074-1080, 2005.
5. Musso M et al. Broca's area and the language instinct. Nature Neuroscience 6: 774-781, 2003.
6. * Shu W, Cho SY, Buxbaum JD et al. Altered ultrasonic vocalization in mice with a disruption in the Foxp2 gene. Proc Natl Acad Sci USA 102(27): 9643-9648, 2005.
7. * Teramitsu I et al. Parallel FoxP1 and FoxP2 expression in songbird and human brain predicts functional interaction. J Neurosci 24(13): 3152-3163, 2004.

* proposed for discussion articles
** proposed for review/background articles
Harvard-MIT Division of Health Sciences and Technology HST.722J: Brain Mechanisms for Hearing and Speech Course Instructors: Bertrand Delgutte, M. Christian Brown,
Joe C. Adams, Kenneth E. Hancock, Frank H. Guenther, Jennifer R. Melcher, Joseph S. Perkell, David N. Caplan
Absolute Pitch
Jianwen Wendy Gu
October 26, 2005
HST.722 Brain Mechanisms for Hearing and Speech
Introduction
Pitch is a fundamental attribute of sound, which has led to extensive research on pitch processing, categorization, and memory with the goal of elucidating the complex workings of the auditory system. The phenomenon of absolute pitch (AP), the ability to identify or produce a specified pitch without external reference, provides a unique opportunity to study the perception and neural coding of pitch.
Background
Most people have at least some degree of AP, as it is fairly easy to identify a tone as belonging to the range of a piccolo rather than a tuba. Furthermore, even people with no musical training are good at remembering and singing the pitch of a favorite song. True AP possessors, however, can do these tasks with an order of magnitude better accuracy (Figure 1). From cognitive studies, AP seems to require two separable cognitive components: very narrow fixed pitch categories and the association of these categories with verbal labels. [4]
For people with AP ability, pitch categories are as fixed and familiar as color categories are for everyone else. However, this category structure for pitch does not affect their perception of pitch, only their ability to label it. AP possessors and non-possessors have equivalent acuity and perceptual thresholds for pitch differences. [3]
AP is not an all-or-none ability; rather, it exists along a continuum. Some people with AP can label only tones produced by one particular instrument. Others have AP for only a single tone and fail to show the automatic and rapid identification found in true AP possessors. [3]
Figure 1: Distributions of pitch-name responses to randomly presented tones, plotted as distance from correct response.
Key Issues
AP Acquisition
Musical training is essential for AP to manifest itself and, most importantly, this training must happen early in life (Figure 2), which supports the hypothesis of a critical period during which AP can be acquired [3]. The type of training does have some influence, for much musical training involves teaching of relative pitch, but it cannot account for all of the variance in the population. In fact, AP ability often develops without explicit tutoring: mere exposure to tones and their labels can be sufficient. [4]

Figure removed due to copyright considerations.

Figure 2: Data from retrospective reports on age of acquisition of AP, as modeled by a Gamma function.
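The Gamma-function model behind Figure 2 amounts to fitting a Gamma distribution to reported ages of acquisition. The sketch below shows one way to do this with scipy; the ages are invented placeholders, not the retrospective data the figure is based on.

```python
import numpy as np
from scipy import stats

# Hypothetical reported ages (years) of AP acquisition, for illustration only
ages = np.array([3, 4, 4, 5, 5, 5, 6, 6, 7, 7, 8, 9, 10, 12])

# Fit a Gamma distribution with its location fixed at age 0
shape, loc, scale = stats.gamma.fit(ages, floc=0)

# The mode indicates the most likely age of acquisition under the model
print(f"shape = {shape:.2f}, scale = {scale:.2f}, "
      f"modal age = {(shape - 1) * scale:.1f} years")
```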
A fundamental question regarding AP acquisition remains to be answered: which neural events are responsible for this fairly fixed developmental time course? There are many neurodevelopmental processes with critical periods, such as language acquisition and the columnar segregation of neurons in the visual cortex. Understanding the molecular mechanisms that lead to AP acquisition could shed light on neural development in general. [4]
AP and Genetics
Since most people who receive musical training from a young age do not develop AP, early training can be only part of the story. Two recent findings point to a genetic component. First, AP aggregates significantly among siblings, even when environmental factors are controlled. Second, AP may be differentially distributed across human populations. For example, people of Asian descent have a much greater incidence of AP than those of other backgrounds. Moreover, this higher rate is found across several Asian ethnicities (Chinese, Korean, and Japanese) that are culturally distinct from one another, and it is unrelated to speaking a tonal language: neither Korean nor Japanese is tonal, and the higher incidence has also been reported among Asian-Americans who speak only English. [4]
Although these observations support the idea of a genetic component, it is difficult to control all environmental factors. More rigorous methods and larger samples must be utilized to identify genetic influences on AP with certainty. [4]
Neural Correlates of AP
The development of structural and functional imaging techniques has enabled researchers to probe the neural correlates of AP. Anatomically, regions of the right superior temporal cortex and planum temporale (PT) tend to be smaller in AP possessors than in non-possessors. [3] Early studies using event-related potentials (ERPs) indicated that listeners with AP showed an absent or reduced electrical scalp component that is thought to index the updating of working memory. Listeners without AP had a physiological response to a certain pitch change, indicating that some type of online memory system had been refreshed. AP possessors did not show this response, presumably because their memory representations consist of fixed, absolute values for each pitch. Therefore, pitch representation requires encoding into working memory but no updating once it is in memory. [4]
A related phenomenon has been observed using measures of cerebral blood flow (CBF), specifically positron emission tomography (PET) and functional magnetic resonance imaging (fMRI), as shown in Figure 3. An area of the right frontal cortex believed to be important for monitoring pitch information in working memory was more active among musicians lacking AP than in those with AP. AP possessors apparently used their categorical representation of tones in such tasks instead of requiring continuous maintenance of a sensory trace. [4]
Figure removed due to copyright considerations.

Figure 3: Data from a PET study of AP and relative pitch (RP) possessors, who labeled (a) isolated tones and (b) intervals.
Conversely, a different area of the frontal lobe, the posterior dorsolateral cortex, was more active in subjects with AP than in control musicians when they listened to tones. This area is known to be involved in establishing and maintaining conditional associations in memory, making it a logical candidate region for the link between a pitch and its label in AP possessors. Additional support for this idea comes from the finding that the same area is active in subjects both with and without AP when they are asked to label pairs of tones that form musical intervals. [4]
These findings show that the working-memory and associative-learning aspects of AP draw on neural resources that are reasonably well understood. However, they still do not explain the component of AP related to fixed pitch categories. Part of the answer may lie in the function of subcortical nuclei important for periodicity coding and coincidence detection, such as the inferior colliculus, which could provide input based on intrinsic oscillations to the auditory cortex of AP possessors. [4]
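As a signal-level analogy for what a periodicity code could extract, the toy sketch below recovers the fundamental of a synthetic harmonic complex from autocorrelation peaks. It is not a model of inferior colliculus physiology, and every parameter is arbitrary.

```python
import numpy as np

fs = 16000                       # sampling rate (Hz), arbitrary
t = np.arange(0, 0.1, 1 / fs)    # 100 ms of signal
f0 = 220.0                       # fundamental frequency of the complex (Hz)

# Harmonic complex: periodic at 1/f0 even though it contains three partials
x = sum(np.sin(2 * np.pi * k * f0 * t) for k in range(1, 4))

# The autocorrelation peaks at lags equal to multiples of the period
ac = np.correlate(x, x, mode="full")[x.size - 1:]
lags = np.arange(ac.size) / fs

# Search only lags corresponding to plausible pitches (50-1000 Hz)
valid = (lags > 1 / 1000) & (lags < 1 / 50)
best = np.argmax(np.where(valid, ac, -np.inf))
print(f"estimated periodicity: {1 / lags[best]:.1f} Hz (true f0 = {f0} Hz)")
```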
Topic Papers
People with absolute pitch process tones without producing a P300
Klein et al. measured the P300 in seven AP possessors and seven non-possessors in response to auditory and visual stimuli. The P300 is a positive-going ERP component believed to reflect the maintenance or updating of working memory. It is obtained in the “oddball” procedure, in which two discrete stimuli (one frequent and one rare) are presented in a Bernoulli sequence and the subject counts the rare stimulus, which elicits a large P300. The control subjects showed standard ERPs in both the visual and auditory modalities. AP subjects, however, showed a small P300 for the rare auditory stimulus but a normal P300 for the rare visual stimulus. These results suggest that AP possessors have access to permanent representations of the tones, so they do not need to utilize working memory to fetch and compare representations for novel stimuli. [2]
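The logic of the oddball procedure can be made concrete with a small simulation: noisy epochs are generated in a Bernoulli sequence, a P300-like deflection is added only on rare trials, and trial averaging recovers it. All amplitudes, latencies, and probabilities below are invented for illustration and do not model Klein et al.'s data.

```python
import numpy as np

rng = np.random.default_rng(1)
fs = 250                        # sampling rate (Hz), assumed
t = np.arange(0, 0.8, 1 / fs)   # 800 ms epoch

def epoch(is_rare):
    """One simulated ERP epoch: noise, plus a P300-like bump on rare trials."""
    noise = rng.normal(0.0, 2.0, t.size)
    p300 = 5.0 * np.exp(-((t - 0.3) ** 2) / (2 * 0.05 ** 2)) if is_rare else 0.0
    return noise + p300

# Bernoulli ("oddball") sequence: the rare stimulus occurs with probability 0.2
rare = rng.random(400) < 0.2
epochs = np.array([epoch(r) for r in rare])

# Averaging across trials suppresses the noise and reveals the P300
diff = epochs[rare].mean(axis=0) - epochs[~rare].mean(axis=0)
print(f"peak rare-minus-frequent difference: {diff.max():.2f} (arb. units) "
      f"at {t[diff.argmax()] * 1000:.0f} ms")
```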
Functional anatomy of musical processing in listeners with absolute pitch and relative pitch
Zatorre et al. used structural and functional brain imaging techniques to investigate the neural basis of AP. Twenty right-handed musically trained volunteers were included in the study: 10 AP possessors and 10 subjects with good relative pitch (RP) but no AP. PET was used to measure CBF during the presentation of musical tones to the subjects. Similar patterns of increased CBF were seen in auditory cortical areas in both groups. The AP group also demonstrated activation of the left posterior frontal cortex, an area thought to be related to learning conditional associations. This activity was also observed in non-AP subjects when they made relative pitch judgments of intervals. Activity within the right inferior frontal cortex was observed in RP but not in AP subjects during the interval judgment task, suggesting that AP possessors need not access working memory mechanisms in this task. MRI measures of cortical volume indicated a larger left PT in the AP group. [5]
Absolute pitch and planum temporale
In this study by Keenan et al., anatomical MRI was used to measure the PT size of a right-handed group of 27 AP musicians, 27 nonmusicians, and 22 non-AP musicians. The authors hoped to settle the debate over whether the increased leftward asymmetry seen in AP possessors is due to an increase in the left PT, a decrease in the right PT, or a redistribution of resources between the two hemispheres. The AP musicians showed greater leftward asymmetry and a smaller absolute right PT size compared to the other two groups. This increased PT asymmetry was not seen in the non-AP musicians who had begun training early, suggesting that the asymmetry might be genetically based. [1]
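“Leftward asymmetry” in such studies is usually summarized with a laterality index. The sketch below uses one common convention, (L − R) / ((L + R) / 2); both this formula and the surface areas are assumptions for illustration, not necessarily the exact measure or values from Keenan et al.

```python
def pt_asymmetry(left_area, right_area):
    """Laterality index; positive values indicate leftward (L > R) asymmetry.
    One common convention, assumed here for illustration."""
    return (left_area - right_area) / ((left_area + right_area) / 2)

# Invented planum temporale surface areas (mm^2)
print(f"AP-like profile: {pt_asymmetry(1200, 800):+.2f}")   # strongly leftward
print(f"control profile: {pt_asymmetry(1000, 950):+.2f}")   # mildly leftward
```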
Conclusion
In summary, AP is an ability that involves both higher and lower levels of auditory processing. Its manifestation is influenced by the interaction of genes and environment. Thus, it is a neatly packaged phenomenon for studying many aspects of brain function.
References
[1] Julian Paul Keenan, Ven Thangaraj, Andrea R. Halpern, and Gottfried Schlaug. Absolute pitch and planum temporale. NeuroImage, 14:1402–1408, 2001.
[2] Mark Klein, Michael G. H. Coles, and Emanuel Donchin. People with absolute pitch process tones without producing a P300. Science, 223(4642):1306–1309, March 1984.
[3] Daniel J. Levitin and Susan E. Rogers. Absolute pitch: Perception, coding, and controversies. Trends in Cognitive Sciences, 9(1):26–33, January 2005.
[4] Robert J. Zatorre. Absolute pitch: A model for understanding the influence of genes and development on neural and cognitive function. Nature Neuroscience, 6(7):692–695, July 2003.
[5] Robert J. Zatorre, David W. Perry, Christine A. Beckett, Christopher F. Westbury, and Alan C. Evans. Functional anatomy of musical processing in listeners with absolute pitch and relative pitch. Proc. Natl. Acad. Sci. USA, 95:3172–3177, March 1998.