ECE 598: The Speech Chain

Lecture 8: Formant Transitions; Vocal Tract Transfer Function

Today
• Perturbation Theory:
  • A different way to estimate vocal tract resonant frequencies, useful for consonant transitions
• Syllable-Final Consonants: Formant Transitions
• Vocal Tract Transfer Function
  • Uniform Tube (Quarter-Wave Resonator)
  • During Vowels: All-Pole Spectrum
    • Q
    • Bandwidth
  • Nasal Vowels: Sum of two transfer functions gives spectral zeros

Topic #1: Perturbation Theory

Perturbation Theory (Chiba and Kajiyama, The Vowel, 1940)

A(x) is constant everywhere, except for one small perturbation.

Method:
1. Compute formants of the “unperturbed” vocal tract.
2. Perturb the formant frequencies to match the area perturbation.

Conservation of Energy Under Perturbation

“Sensitivity” Functions

Sensitivity Functions for the Quarter-Wave Resonator (Lips Open)

[Figure: sensitivity functions; axis label L; panels /AA/, /ER/, /IY/, /W/]

• Note: low F3 of /er/ is caused in part by a side branch under the tongue – perturbation alone is not enough to explain it.
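The sign rule behind these sensitivity curves can be checked numerically. The sketch below assumes the standard first-order perturbation result for a uniform tube closed at the glottis (x = 0) and open at the lips (x = L): the relative formant shift is (1/L) ∫ S_n(x) ΔA(x)/A dx, with S_n(x) = sin²(k_n x) − cos²(k_n x) and k_n = (2n − 1)π/2L. That formula, the function names, the 17.5 cm length, and the 20% squeeze are illustrative assumptions, not values from the slides.

```python
import math

def sensitivity(n, x, length):
    """Kinetic minus potential energy density of mode n at position x
    (x = 0 at the glottis, x = length at the lips), quarter-wave tube."""
    k = (2 * n - 1) * math.pi / (2 * length)
    return math.sin(k * x) ** 2 - math.cos(k * x) ** 2

def relative_shift(n, center, width, delta_area_ratio, length, steps=2000):
    """First-order dF_n/F_n for a perturbation dA/A = delta_area_ratio
    spread over [center - width/2, center + width/2] (midpoint rule)."""
    lo = max(0.0, center - width / 2)
    hi = min(length, center + width / 2)
    dx = (hi - lo) / steps
    total = sum(sensitivity(n, lo + (i + 0.5) * dx, length) for i in range(steps))
    return delta_area_ratio * total * dx / length

LENGTH = 17.5  # cm, assumed vocal tract length

# A 20% squeeze (dA/A = -0.2) over 2 cm at the lips, then over 2 cm at the glottis.
lips = [relative_shift(n, LENGTH - 1.0, 2.0, -0.2, LENGTH) for n in (1, 2, 3)]
glottis = [relative_shift(n, 1.0, 2.0, -0.2, LENGTH) for n in (1, 2, 3)]

print("squeeze at lips:   ", [f"{100 * s:+.1f}%" for s in lips])
print("squeeze at glottis:", [f"{100 * s:+.1f}%" for s in glottis])
```

Under these assumptions, the lips are a velocity peak for every mode, so a squeeze there lowers F1 to F3, while the glottis is a pressure peak, so a squeeze there raises them.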

Sensitivity Functions for the Half-Wave Resonator (Lips Rounded)

[Figure: sensitivity functions; axis label L; panels /L,OW/, /UW/]

• Note: high F3 of /l/ is caused in part by a side branch above the tongue – perturbation alone is not enough to explain it.

Formant Frequencies of Vowels

From Peterson & Barney, 1952

Topic #2: Formant Transitions, Syllable-Final Consonant

Events in the Closure of a Nasal Consonant

Vowel Nasalization

Formant Transitions

Nasal Murmur

Formant Transitions: A Perturbation Theory Model

Formant Transitions: Labial Consonants

“the mom”

“the bug”


Formant Transitions: Alveolar Consonants

“the tug”

“the supper”

Formant Transitions: Post-alveolar Consonants

“the shoe”

“the zsazsa”


Formant Transitions: Velar Consonants

“the gut”

“sing a song”

Topic #3: Vocal Tract Transfer Functions

Transfer Function
• “Transfer Function” $T(\omega) = \mathrm{Output}(\omega)/\mathrm{Input}(\omega)$
• In speech, it’s convenient to write $T(\omega) = U_L(\omega)/U_G(\omega)$
  • $U_L(\omega)$ = volume velocity at the lips
  • $U_G(\omega)$ = volume velocity at the glottis
  • $T(0) = 1$
• Speech recorded at a microphone = pressure: $P_R(\omega) = R(\omega)T(\omega)U_G(\omega)$
  • $R(\omega) = j\rho f/r$ = “radiation characteristic”
  • $\rho$ = density of air
  • $r$ = distance to the microphone
  • $f$ = frequency in Hertz
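Because the radiation characteristic is proportional to f, it tilts the recorded spectrum upward by about 6 dB per octave. A minimal sketch of that check follows; the function names and the default values for the air density and microphone distance are placeholders chosen for illustration, and they cancel out of the octave ratio.

```python
import math

def radiation(f_hz, rho=1.2, r=0.3):
    """R = j*rho*f/r as written on the slide.
    rho and r are placeholder values, not taken from the lecture."""
    return 1j * rho * f_hz / r

def level_db(x):
    """Magnitude in decibels."""
    return 20 * math.log10(abs(x))

# Level change when the frequency doubles (one octave).
step = level_db(radiation(2000.0)) - level_db(radiation(1000.0))
print(f"level change per octave: {step:.2f} dB")
```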

Transfer Function of an Ideal Uniform Tube
• Ideal Terminations:
  • At glottis: zero velocity, reflection coefficient = 1
  • At lips: zero pressure, reflection coefficient = $-1$
• Obviously, this is an approximation, but it gives…

$T(\omega) = \frac{1}{\cos(\omega L/c)} = \prod_{n=1}^{\infty} \frac{\omega_n^2}{\omega_n^2 - \omega^2}$

$\omega_n = n\pi c/L - \pi c/2L$

$F_n = nc/2L - c/4L$
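As a quick numerical illustration of the quarter-wave formula F_n = nc/2L − c/4L, the sketch below evaluates the first three resonances and the lossless transfer function 1/cos(ωL/c). The tube length of 17.5 cm and the speed of sound of 35000 cm/s are assumed round values, and the function names are hypothetical.

```python
import math

def uniform_tube_formants(length_cm, c_cm_per_s=35000.0, count=3):
    """F_n = n*c/(2L) - c/(4L) for n = 1..count (lossless uniform tube)."""
    return [n * c_cm_per_s / (2 * length_cm) - c_cm_per_s / (4 * length_cm)
            for n in range(1, count + 1)]

def ideal_transfer(f_hz, length_cm, c_cm_per_s=35000.0):
    """T = 1/cos(omega*L/c) with omega = 2*pi*f."""
    return 1.0 / math.cos(2 * math.pi * f_hz * length_cm / c_cm_per_s)

formants = uniform_tube_formants(17.5)
print("formants (Hz):", formants)
print("T at 0 Hz:", ideal_transfer(0.0, 17.5))
# One hertz away from F1 the magnitude is already very large,
# consistent with a peak that is infinite exactly at F1.
print("|T| at F1 + 1 Hz:", abs(ideal_transfer(formants[0] + 1.0, 17.5)))
```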

Transfer Function of an Ideal Uniform Tube

Peaks are actually infinite in height (figure is clipped to fit the display)

Transfer Function of a Non-Ideal Uniform Tube
• Almost ideal terminations:
  • At glottis: velocity almost zero, reflection coefficient ≈ 1
  • At lips: pressure almost zero, reflection coefficient ≈ $-1$

$T(\omega) = \frac{1}{j/Q + \cos(\omega L/c)}$

… at $F_n = nc/2L - c/4L$, …

$T(2\pi F_n) = -jQ$

$20\log_{10}|T(2\pi F_n)| = 20\log_{10}Q$
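The finite peak height of the lossy tube can be verified directly: at a resonance the cosine term vanishes and only j/Q is left in the denominator, so T = 1/(j/Q) = −jQ. The sketch below assumes a 17.5 cm tube, c = 35000 cm/s, and Q = 6.25; all three values and the function name are illustrative choices.

```python
import cmath
import math

def lossy_tube_transfer(f_hz, q, length_cm=17.5, c_cm_per_s=35000.0):
    """T = 1/(j/Q + cos(omega*L/c)) for a nearly ideal uniform tube."""
    omega = 2 * math.pi * f_hz
    return 1.0 / (1j / q + cmath.cos(omega * length_cm / c_cm_per_s))

Q = 6.25            # assumed quality factor
F1 = 35000.0 / (4 * 17.5)  # first quarter-wave resonance, c/4L

t_peak = lossy_tube_transfer(F1, Q)
peak_db = 20 * math.log10(abs(t_peak))
print("T at F1:", t_peak)
print(f"peak level: {peak_db:.2f} dB, 20*log10(Q): {20 * math.log10(Q):.2f} dB")
```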

Transfer Function of a Non-Ideal Uniform Tube

Transfer Function of a Vowel: Height of First Peak is $Q_1 = F_1/B_1$

$T(\omega) = \prod_{n=1}^{\infty} \frac{(2\pi F_n)^2 + (\pi B_n)^2}{(j\omega + j2\pi F_n + \pi B_n)(j\omega - j2\pi F_n + \pi B_n)}$

$T(2\pi F_1) \approx \frac{(2\pi F_1)^2}{j4\pi F_1 \cdot \pi B_1} = -jF_1/B_1$

Call $Q_n = F_n/B_n$

$T(2\pi F_1) \approx -jQ_1$

$20\log_{10}|T(2\pi F_1)| \approx 20\log_{10}Q_1$
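To see how good the approximation |T(2πF1)| ≈ Q1 is, the all-pole product can be evaluated numerically. This is a minimal sketch that truncates the product to a finite list of formants; the function name is hypothetical, and the formant and bandwidth values (500/80, 1500/240, 2500/600 Hz) are the ones used in the lecture's synthetic example.

```python
import math

def all_pole(f_hz, formants, bandwidths):
    """Product over formants of s_n s_n* / ((s - s_n)(s - s_n*)),
    with s = j*2*pi*f and s_n = -pi*B_n + j*2*pi*F_n."""
    s = 2j * math.pi * f_hz
    t = 1.0 + 0j
    for fn, bn in zip(formants, bandwidths):
        pole = complex(-math.pi * bn, 2 * math.pi * fn)
        t *= (pole * pole.conjugate()) / ((s - pole) * (s - pole.conjugate()))
    return t

F = [500.0, 1500.0, 2500.0]
B = [80.0, 240.0, 600.0]
q1 = F[0] / B[0]

single = abs(all_pole(F[0], F[:1], B[:1]))   # first formant alone
full = abs(all_pole(F[0], F, B))             # with F2 and F3 included

print(f"Q1 = {q1:.3f}")
print(f"|T(F1)|, one formant:    {single:.3f}")
print(f"|T(F1)|, three formants: {full:.3f}")
```

With a single formant the peak height is within about one percent of Q1; the higher formants raise the level at F1 somewhat, which is why the slide states the result as an approximation.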

Transfer Function of a Vowel: Bandwidth of a Peak is $B_n$

$T(\omega) = \prod_{n=1}^{\infty} \frac{(2\pi F_n)^2 + (\pi B_n)^2}{(j\omega + j2\pi F_n + \pi B_n)(j\omega - j2\pi F_n + \pi B_n)}$

$T(2\pi F_1 + \pi B_1) \approx \frac{(2\pi F_1)^2}{(j4\pi F_1)(\pi B_1 + j\pi B_1)} = \frac{-jQ_1}{1 + j}$, so $|T(2\pi F_1 + \pi B_1)| \approx Q_1/\sqrt{2}$

At $f = F_n + 0.5B_n$, $|T(\omega)| \approx \sqrt{0.5}\,Q_n$

$20\log_{10}|T(\omega)| \approx 20\log_{10}Q_n - 3\,\mathrm{dB}$
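The 3 dB bandwidth claim can be checked on a single resonance by comparing the level at F1 with the level half a bandwidth away on either side. The sketch below uses F1 = 500 Hz and B1 = 80 Hz from the lecture's synthetic example; the function names are hypothetical.

```python
import math

def resonance(f_hz, fn, bn):
    """One conjugate pole pair, normalized so that the value at f = 0 is 1."""
    s = 2j * math.pi * f_hz
    pole = complex(-math.pi * bn, 2 * math.pi * fn)
    return (pole * pole.conjugate()) / ((s - pole) * (s - pole.conjugate()))

def level_db(x):
    return 20 * math.log10(abs(x))

F1, B1 = 500.0, 80.0
peak = level_db(resonance(F1, F1, B1))
upper_drop = level_db(resonance(F1 + B1 / 2, F1, B1)) - peak
lower_drop = level_db(resonance(F1 - B1 / 2, F1, B1)) - peak

print(f"drop at F1 + B1/2: {upper_drop:.2f} dB")
print(f"drop at F1 - B1/2: {lower_drop:.2f} dB")
```

Both drops come out close to 3 dB. They are not perfectly symmetric because the conjugate pole's contribution also changes slightly across the peak, which the slide's approximation ignores.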

Amplitudes of Higher Formants: Include the Rolloff

$T(\omega) = \prod_{n=1}^{\infty} \frac{(2\pi F_n)^2 + (\pi B_n)^2}{(j\omega + j2\pi F_n + \pi B_n)(j\omega - j2\pi F_n + \pi B_n)}$

At f above $F_1$:

$T(2\pi f) \approx (F_1/f)$

$T(2\pi F_2) \approx (-jF_2/B_2)(F_1/F_2)$

$20\log_{10}|T(2\pi F_2)| \approx 20\log_{10}Q_2 - 20\log_{10}(F_2/F_1)$

1/f Rolloff: 6 dB per octave (per doubling of frequency)

Vowel Transfer Function: Synthetic Example

L1 = 20log10(500/80) = 16 dB

L2 = 20log10(1500/240) – 20log10(F2/F1) = 16 dB – 9.5 dB

L3 = 20log10(2500/600) – 20log10(F3/F1) – 20log10(F3/F2)

B1 = 80 Hz

B2 = 240 Hz

B3 = 600 Hz? (hard to measure because rolloff from F1, F2 turns the F3 peak into a plateau)

F4 peak completely swamped by rolloff from lower formants
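The peak-level arithmetic in this example follows a simple pattern: start from 20log10(Fn/Bn) and subtract 20log10(Fn/Fm) for every lower formant Fm. The sketch below applies that rule of thumb, exactly as the lecture states it, to the example's values; the function name is hypothetical.

```python
import math

def peak_levels_db(formants, bandwidths):
    """Peak level of each formant using the lecture's rule of thumb:
    20*log10(Q_n) minus a 1/f rolloff term for each lower formant."""
    levels = []
    for n, (fn, bn) in enumerate(zip(formants, bandwidths)):
        level = 20 * math.log10(fn / bn)
        for fm in formants[:n]:
            level -= 20 * math.log10(fn / fm)
        levels.append(level)
    return levels

L1, L2, L3 = peak_levels_db([500.0, 1500.0, 2500.0], [80.0, 240.0, 600.0])
print(f"L1 = {L1:.1f} dB, L2 = {L2:.1f} dB, L3 = {L3:.1f} dB")
```

Under this rule the third peak ends up below 0 dB, which fits the remark that the F3 peak looks more like a plateau than a peak.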

Shorthand Notation for the Spectrum of a Vowel

$T(s) = \prod_{n=1}^{\infty} \frac{s_n s_n^*}{(s - s_n)(s - s_n^*)}$

$s = j\omega$

$s_n = -\pi B_n + j2\pi F_n$

$s_n^* = -\pi B_n - j2\pi F_n$

$s_n s_n^* = |s_n|^2 = (2\pi F_n)^2 + (\pi B_n)^2$

$T(0) = 1$

$20\log_{10}|T(0)| = 0\,\mathrm{dB}$

Another Shorthand Notation for the Spectrum of a Vowel

$T(s) = \prod_{n=1}^{\infty} \frac{1}{(1 - s/s_n)(1 - s/s_n^*)}$
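The two shorthand forms are the same function: dividing the numerator and denominator of each factor by s_n s_n* turns one into the other. A minimal numerical check, truncated to three formants, is sketched below; the function names and the test frequency of 1200 Hz are arbitrary illustrative choices, and the pole values reuse the lecture's synthetic example.

```python
import math

def make_poles(formants, bandwidths):
    """s_n = -pi*B_n + j*2*pi*F_n for each formant."""
    return [complex(-math.pi * b, 2 * math.pi * f)
            for f, b in zip(formants, bandwidths)]

def product_form(s, poles):
    """Product of s_n s_n* / ((s - s_n)(s - s_n*))."""
    t = 1.0 + 0j
    for p in poles:
        t *= (p * p.conjugate()) / ((s - p) * (s - p.conjugate()))
    return t

def normalized_form(s, poles):
    """Product of 1 / ((1 - s/s_n)(1 - s/s_n*))."""
    t = 1.0 + 0j
    for p in poles:
        t *= 1.0 / ((1 - s / p) * (1 - s / p.conjugate()))
    return t

poles = make_poles([500.0, 1500.0, 2500.0], [80.0, 240.0, 600.0])
s = 2j * math.pi * 1200.0  # arbitrary test point on the frequency axis

a = product_form(s, poles)
b = normalized_form(s, poles)
print("product form:   ", a)
print("normalized form:", b)
print("T(0):", normalized_form(0j, poles))
```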

Topic #4: Nasalized Vowels

Vowel Nasalization

Nasalized Vowel

Nasal Consonant

Nasalized Vowel

$P_R(\omega) = R(\omega)(U_L(\omega) + U_N(\omega))$

$U_N(\omega)$ = Volume Velocity from Nostrils

$P_R(\omega) = R(\omega)(T_L(\omega) + T_N(\omega))U_G(\omega) = R(\omega)T(\omega)U_G(\omega)$

$T(\omega) = T_L(\omega) + T_N(\omega)$

Nasalized Vowel

$T(s) = T_L(s) + T_N(s)$

$= \frac{1}{(1 - s/s_{Ln})(1 - s/s_{Ln}^*)} + \frac{1}{(1 - s/s_{Nn})(1 - s/s_{Nn}^*)}$

$\approx \frac{2(1 - s/s_{Zn})(1 - s/s_{Zn}^*)}{(1 - s/s_{Ln})(1 - s/s_{Ln}^*)(1 - s/s_{Nn})(1 - s/s_{Nn}^*)}$

$1/s_{Zn} = \frac{1}{2}(1/s_{Ln} + 1/s_{Nn})$

$s_{Zn}$ = nth spectral zero

$T(s) = 0$ if $s = s_{Zn}$

(The factored numerator matches the sum exactly in its constant and linear terms; the quadratic term matches when $s_{Ln}$ and $s_{Nn}$ are close together.)
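The zero location follows directly from the averaging rule 1/s_Z = ½(1/s_L + 1/s_N). The sketch below computes it for one oral pole and one nasal pole; the pole frequencies and bandwidths (500/80 Hz oral, 700/100 Hz nasal) and the function names are hypothetical values chosen only to illustrate the formula.

```python
import math

def pole(f_hz, bw_hz):
    """s = -pi*B + j*2*pi*F."""
    return complex(-math.pi * bw_hz, 2 * math.pi * f_hz)

def nasal_zero(s_oral, s_nasal):
    """Zero from 1/s_Z = 0.5 * (1/s_L + 1/s_N)."""
    return 1.0 / (0.5 * (1.0 / s_oral + 1.0 / s_nasal))

s_l = pole(500.0, 80.0)    # hypothetical oral pole
s_n = pole(700.0, 100.0)   # hypothetical nasal pole
s_z = nasal_zero(s_l, s_n)

zero_hz = s_z.imag / (2 * math.pi)   # frequency of the zero
zero_bw = -s_z.real / math.pi        # bandwidth of the zero
print(f"zero frequency: {zero_hz:.1f} Hz, zero bandwidth: {zero_bw:.1f} Hz")
```

For these assumed poles the zero falls between the oral and nasal pole frequencies, as the lecture's summary states.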

The “Pole-Zero Pair”

$20\log_{10}|T(\omega)| = 20\log_{10}\left|\frac{1}{(1 - s/s_{Ln})(1 - s/s_{Ln}^*)}\right| + 20\log_{10}\left|\frac{(1 - s/s_{Zn})(1 - s/s_{Zn}^*)}{(1 - s/s_{Nn})(1 - s/s_{Nn}^*)}\right|$

= original vowel log spectrum

+ log spectrum of a pole-zero pair

(up to the constant $20\log_{10}2 \approx 6\,\mathrm{dB}$ from the factor of 2 in the sum)

Additive Terms in the Log Spectrum

Transfer Function of a Nasalized Vowel

Pole-Zero Pairs in the Spectrogram

Nasal Pole

Zero

Oral Pole

Summary
• Perturbation Theory:
  • Squeeze near a velocity peak: formant goes down
  • Squeeze near a pressure peak: formant goes up
• Formant Transitions
  • Labial closure: loci near 250, 1000, 2000 Hz
  • Alveolar closure: loci near 250, 1700, 3000 Hz
  • Velar closure: F2 and F3 come together (“velar pinch”)
• Vocal Tract Transfer Function
  • $T(s) = \prod_{n} \frac{s_n s_n^*}{(s - s_n)(s - s_n^*)}$
  • $|T(\omega = 2\pi F_n)| \approx Q_n = F_n/B_n$
  • 3dB bandwidth = $B_n$ Hertz
  • $T(0) = 1$
• Nasal Vowels:
  • Sum of two transfer functions gives a spectral zero between the oral and nasal poles
  • Pole-zero pair is a local perturbation of the spectrum
