ece 598: the speech chain
DESCRIPTION
ECE 598: The Speech Chain. Lecture 8: Formant Transitions; Vocal Tract Transfer Function. Today. Perturbation Theory: A different way to estimate vocal tract resonant frequencies, useful for consonant transitions Syllable-Final Consonants: Formant Transitions Vocal Tract Transfer Function - PowerPoint PPT PresentationTRANSCRIPT
ECE 598: The Speech ECE 598: The Speech ChainChain
Lecture 8: Formant Lecture 8: Formant Transitions; Vocal Tract Transitions; Vocal Tract
Transfer FunctionTransfer Function
TodayToday Perturbation Theory: Perturbation Theory:
A different way to estimate vocal tract resonant A different way to estimate vocal tract resonant frequencies, useful for consonant transitionsfrequencies, useful for consonant transitions
Syllable-Final Consonants: Formant Syllable-Final Consonants: Formant TransitionsTransitions
Vocal Tract Transfer FunctionVocal Tract Transfer Function Uniform Tube (Quarter-Wave Resonator)Uniform Tube (Quarter-Wave Resonator) During Vowels: All-Pole SpectrumDuring Vowels: All-Pole Spectrum
QQ BandwidthBandwidth
Nasal Vowels: Sum of two transfer functions Nasal Vowels: Sum of two transfer functions gives spectral zerosgives spectral zeros
Topic #1:Topic #1:Perturbation TheoryPerturbation Theory
Perturbation TheoryPerturbation Theory(Chiba and Kajiyama, (Chiba and Kajiyama, The VowelThe Vowel, 1940), 1940)
A(x) is constant everywhere, except for one small perturbation.
Method: 1. Compute formants of the “unperturbed” vocal tract. 2. Perturb the formant frequencies to match the area perturbation.
Conservation of Energy Under Conservation of Energy Under PerturbationPerturbation
Conservation of Energy Under Conservation of Energy Under PerturbationPerturbation
““Sensitivity” FunctionsSensitivity” Functions
Sensitivity Functions for the Sensitivity Functions for the Quarter-Wave Resonator (Lips Quarter-Wave Resonator (Lips
Open)Open)
L
/AA/ /ER/ /IY/ /W/
• Note: low F3 of /er/ is caused in part by a side branch under the tongue – perturbation alone is not enough to explain it.
Sensitivity Functions for the Sensitivity Functions for the Half-Wave Resonator (Lips Half-Wave Resonator (Lips
Rounded)Rounded)
L
/L,OW/ /UW/
• Note: high F3 of /l/ is caused in part by a side branch above the tongue – perturbation alone is not enough to explain it.
Formant Frequencies of Formant Frequencies of VowelsVowels
From Peterson & Barney, 1952
Topic #2:Topic #2:Formant Transitions, Formant Transitions,
Syllable-Final Syllable-Final ConsonantConsonant
Events in the Closure of a Events in the Closure of a Nasal ConsonantNasal Consonant
Vowel Nasalization
Formant Transitions
Nasal Murmur
Formant Transitions: A Formant Transitions: A Perturbation Theory ModelPerturbation Theory Model
Formant Formant Transitions: Transitions:
Labial Labial ConsonantsConsonants
“the mom”
“the bug”
Formant Formant Transitions: Transitions:
Alveolar Alveolar ConsonantsConsonants
“the tug”
“the supper”
Formant Formant Transitions: Transitions: Post-alveolar Post-alveolar ConsonantsConsonants
“the shoe”
“the zsazsa”
Formant Formant Transitions: Transitions:
Velar Velar ConsonantsConsonants
“the gut”
“sing a song”
Topic #3:Topic #3:Vocal Tract Transfer Vocal Tract Transfer
FunctionsFunctions
Transfer FunctionTransfer Function ““Transfer Function” T(Transfer Function” T()=Output()=Output()/Input()/Input()) In speech, it’s convenient to write In speech, it’s convenient to write
T(T()=U)=ULL(()/U)/UGG(()) UULL(() = volume velocity at the lips) = volume velocity at the lips UUGG(() = volume velocity at the glottis) = volume velocity at the glottis T(0) = 1T(0) = 1
Speech recorded at a microphone = Speech recorded at a microphone = pressurepressure PPRR(() = R() = R()T()T()U)UGG(()) R(R() = j) = jf/r = “radiation characteristic”f/r = “radiation characteristic”
= density of air= density of air r = distance to the microphoner = distance to the microphone f = frequency in Hertzf = frequency in Hertz
Transfer Function of an Ideal Transfer Function of an Ideal Uniform TubeUniform Tube
Ideal Terminations:Ideal Terminations: Reflection coefficient at glottis: zero velocity, Reflection coefficient at glottis: zero velocity, =1=1 Reflection coefficient at lips: zero pressure, Reflection coefficient at lips: zero pressure, ==11 Obviously, this is an approximation, but it gives… Obviously, this is an approximation, but it gives…
T(T() = 1/cos() = 1/cos(L/c)L/c)
= …= ………
nn = n = nc/L – c/L – c/2Lc/2L
FFnn = nc/2L – c/4L = nc/2L – c/4L
122
232…
Transfer Function of an Ideal Uniform Transfer Function of an Ideal Uniform TubeTube
Peaks are actually infinite in height (figure is clipped to fit the display)
Transfer Function of a Non-Transfer Function of a Non-Ideal Uniform TubeIdeal Uniform Tube
Almost ideal terminations:Almost ideal terminations: At glottis: velocity almost zero, At glottis: velocity almost zero, ≈1≈1 At lips: pressure almost zero, At lips: pressure almost zero, ≈≈1 1
T(T() = 1/(j/Q +cos() = 1/(j/Q +cos(L/c))L/c))
… … at Fat Fnn=nc/2L – c/4L,…=nc/2L – c/4L,…
T(2T(2FFnn) = ) = jQjQ20log20log1010|T(2|T(2FFnn)| = 20log)| = 20log1010QQ
Transfer Function of a Non-Ideal Uniform Transfer Function of a Non-Ideal Uniform TubeTube
Transfer Function of a Vowel: Transfer Function of a Vowel: Height of First Peak is QHeight of First Peak is Q11=F=F11/B/B11
T(T() = ) = (j (j+j2+j2FFnn++BBnn)(j)(jj2j2FFnn++BBnn))
T(2T(2FF11) ≈ (2) ≈ (2FF11))22/(j4/(j4FF11BB11))
= = jFjF11/B/B11
Call QCall Qnn = F = Fnn/B/Bnn
T(2T(2FF11) ≈ ) ≈ jQjQ11
20log10|T(220log10|T(2FF11)| ≈ 20log10Q)| ≈ 20log10Q11
(2Fn)2+(Bn)2
n=1
∞
Transfer Function of a Vowel: Transfer Function of a Vowel: Bandwidth of a Peak is BBandwidth of a Peak is Bnn
T(T() = ) = (j (j+j2+j2FFnn++BBnn)(j)(jj2j2FFnn++BBnn))
T(2T(2FF11++BB11) ≈ (2) ≈ (2FF11))22/((j4/((j4FF11)()(BB11++BB11))))
= = jQjQ11/2/2
At f=FAt f=F11+0.5B+0.5Bnn, ,
|T(|T()|=0.5Q)|=0.5Qnn
20log20log1010|T(|T()| = 20log)| = 20log1010QQ11 – 3dB – 3dB
(2Fn)2+(Bn)2
n=1
∞
Amplitudes of Higher Amplitudes of Higher Formants: Include the RolloffFormants: Include the Rolloff
T(T() = ) = (j (j+j2+j2FFnn++BBnn)(j)(jj2j2FFnn++BBnn))
At f above FAt f above F11
T(2T(2f) ≈ (Ff) ≈ (F11/f)/f)
T(2T(2FF22) ≈ () ≈ (jFjF22/B/B22)(F)(F11/F/F22))
20log10|T(220log10|T(2FF22)| )|
≈ ≈ 20log20log1010QQ22 – 20log – 20log1010(F(F22/F/F11))
1/f Rolloff: 6 dB per octave (per doubling of 1/f Rolloff: 6 dB per octave (per doubling of frequency)frequency)
(2Fn)2+(Bn)2
n=1
∞
Vowel Transfer Function: Synthetic Vowel Transfer Function: Synthetic ExampleExample
L1 = 20log10(500/80)=16dB
L2 = 20log10(1500/240) – 20log10(F2/F1) = 16dB – 9.5dB
L3 = 20log10(2500/600)
– 20log10(F3/F1) – 20log10(F3/F2)
B2 = 240Hz
B1 = 80HzB3 = 600Hz?(hard to measure because rolloff from F1, F2 turns the F3 peak into a plateau)
F4 peak completely swamped by rolloff from lower formants
Shorthand Notation for the Shorthand Notation for the Spectrum of a VowelSpectrum of a Vowel
T(s) = T(s) = (s (sssnn)(s)(sssnn*)*)
s = js = jssn n = = BBnn+j2+j2FFnn
ssnn* = * = BBnnj2j2FFnn
ssnnssnn* = |s* = |snn||22 = (2 = (2FFnn))22+(+(BBnn))22
T(0) = 1T(0) = 1
20log20log1010|T(0)| = 0dB|T(0)| = 0dB
snsn*
n=1
∞
Another Shorthand Notation for Another Shorthand Notation for the Spectrum of a Vowelthe Spectrum of a Vowel
T(s) = T(s) = (1-s/s (1-s/snn)(1-s/s)(1-s/snn*)*)1
n=1
∞
Topic #4:Topic #4:Nasalized VowelsNasalized Vowels
Vowel NasalizationVowel Nasalization
Nasalized Vowel
Nasal Consonant
Nasalized VowelNasalized Vowel
PPRR(() = R() = R()(U)(ULL(()+U)+UNN(())))
UUNN(() = Volume Velocity from Nostrils) = Volume Velocity from Nostrils
PPRR(() = R() = R()(T)(TLL(()+T)+TNN(())U))UGG(())
= R(= R()T()T()U)UGG(())
T(T() = T) = TLL(() + T) + TNN(())
Nasalized VowelNasalized Vowel
T(s) = TT(s) = TLL(s)+T(s)+TNN(s)(s)
= (1-s/s= (1-s/sLnLn)(1-s/s)(1-s/sLnLn*) *) ++ (1-s/s (1-s/sNnNn)(1-s/s)(1-s/sNnNn*)*)
= (1-s/s= (1-s/sLnLn)(1-s/s)(1-s/sLnLn*)(1-s/s*)(1-s/sNnNn)(1-s/s)(1-s/sNnNn*)*)
1/s1/sZnZn = ½(1/s = ½(1/sLnLn+1/s+1/sNnNn))
ssZnZn = n = nthth spectral zero spectral zero
T(s) = 0 if s=sT(s) = 0 if s=sZnZn
1 1
2(1-s/sZn)(1-s/sZn*)
The “Pole-Zero Pair”The “Pole-Zero Pair”
20log20log1010T(T() =) =
20log20log1010(1/(1-s/s(1/(1-s/sLnLn)(1-s/s)(1-s/sLnLn*))*))
+ 20log+ 20log1010((1-s/s((1-s/sZnZn)(1-s/s)(1-s/sZnZn*)/(1-s/s*)/(1-s/sNnNn)(1-s/s)(1-s/sNnNn*))*))
= original vowel log spectrum= original vowel log spectrum
+ log spectrum of a pole-zero pair+ log spectrum of a pole-zero pair
Additive Terms in the Log Additive Terms in the Log SpectrumSpectrum
Transfer Function of a Transfer Function of a Nasalized VowelNasalized Vowel
Pole-Zero Pairs in the Pole-Zero Pairs in the SpectrogramSpectrogram
Nasal Pole
Zero
Oral Pole
SummarySummary Perturbation Theory: Perturbation Theory:
Squeeze near a velocity peak: formant goes downSqueeze near a velocity peak: formant goes down Squeeze near a pressure peak: formant goes upSqueeze near a pressure peak: formant goes up
Formant TransitionsFormant Transitions Labial closure: loci near 250, 1000, 2000 HzLabial closure: loci near 250, 1000, 2000 Hz Alveolar closure: loci near 250, 1700, 3000 HzAlveolar closure: loci near 250, 1700, 3000 Hz Velar closure: F2 and F3 come together (“velar pinch”)Velar closure: F2 and F3 come together (“velar pinch”)
Vocal Tract Transfer FunctionVocal Tract Transfer Function T(s) = T(s) = ssnnssnn*/(s-s*/(s-snn)(s-s)(s-snn*)*) T(T(=2=2Fn) = QFn) = Qnn = F = Fnn/B/Bnn 3dB bandwidth = B3dB bandwidth = Bnn Hertz Hertz T(0) = 1T(0) = 1
Nasal Vowels: Nasal Vowels: Sum of two transfer functions gives a spectral zero Sum of two transfer functions gives a spectral zero
between the oral and nasal polesbetween the oral and nasal poles Pole-zero pair is a local perturbation of the spectrumPole-zero pair is a local perturbation of the spectrum