A formant-trajectory model and its usage in comparing coarticulatory effects in dysarthric and normal speech
Xiaochuan Niu and Jan P. H. van Santen
Center for Spoken Language Understanding, OGI School of Science and Engineering at Oregon Health & Science University, USA
MAVEBA 2003 Florence, Italy December 10-12, 2003
What is Dysarthria?
• Group of speech disorders
  – Weakness / incoordination of speech muscles
  – Result of damage to the brain or nerves
• Results in unintelligible speech
Long Term Project Goal
• Long term goal: Speech transformation
  – Device that works in real time
  – Not:
    • Amplifier, spectral filter
  – But:
    • Correct for dynamic articulatory problems
    • Based on a dynamic model of coarticulation
• Today’s talk:
  – Test (very simple) model of vowel dynamics
Observation: Vowel Formants
• Median Formants in Vowel Centers
[Figure: plots of median formants in vowel centers (images not preserved)]
Framework
• Formant trajectories
  – (Linear or non-linear) interpolation between vowel targets
• Three mechanisms for vowel triangle data:
  1. More coarticulation (interpolation too smooth)
  2. More random variability
  3. Incorrect targets
Mechanism 1: Coarticulation
• Average formants of any given vowel …
  – … more strongly dependent on …
  – … the average of the virtual formants …
  – … of the surrounding consonants
Mechanism 2: Random Variability
• Average formants of any given vowel …
  – … result of broad distributions that are …
  – … skewed by the boundaries of vowel space
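Mechanism 2 can be illustrated with a small simulation. This is only a sketch under assumptions that are not in the slides: the boundary of vowel space is modeled as hard clipping of F1 to a fixed range, the variability is Gaussian, and every number (target, limits, standard deviations) is hypothetical.

```python
import random

def mean_clipped_f1(target_hz, sd_hz, lo_hz, hi_hz, n=20000, seed=0):
    """Average F1 when Gaussian variability is clipped to [lo_hz, hi_hz].

    Hard clipping is an assumed stand-in for the "boundaries of vowel space".
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        sample = rng.gauss(target_hz, sd_hz)
        total += min(max(sample, lo_hz), hi_hz)
    return total / n

TARGET = 300.0          # hypothetical F1 target of a high vowel (Hz)
LO, HI = 250.0, 900.0   # hypothetical limits of the F1 range (Hz)

# Same target, two amounts of random variability.
narrow_mean = mean_clipped_f1(TARGET, sd_hz=30.0, lo_hz=LO, hi_hz=HI)
wide_mean = mean_clipped_f1(TARGET, sd_hz=150.0, lo_hz=LO, hi_hz=HI)
print(round(narrow_mean, 1), round(wide_mean, 1))
```

With the broader distribution the average drifts away from the nearby boundary and toward the interior of the range, even though the target itself is unchanged.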
Mechanism 3: Incorrect Targets
• Average formants of any given vowel …
  – … result of a tendency …
  – … to move articulators in the wrong direction
Linear Coarticulation Model
$$F(t \mid p\,v\,n) = A_{pt} F_p + B_{nt} F_n + (I - A_{pt} - B_{nt})\, F_v$$
• F(t | p v n): observed formant vector (3x1)
• t: time; p: preceding consonant; v: vowel; n: next consonant
• F_p, F_n, F_v: target formants (3x1)
• A_{pt}, B_{nt}: weight matrices (3x3); I: identity matrix (3x3)
• Based on earlier work by Broad, Oehman, Lindblom, Schouten, Pols, Stevens, …
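The model equation can be evaluated directly. The following is a minimal sketch in plain Python, not the authors' implementation; the helper names (`mat_vec`, `diag`, `predict_formants`) and all formant and weight values are hypothetical and chosen only for illustration.

```python
def mat_vec(m, v):
    """Multiply a 3x3 matrix (list of rows) by a length-3 vector."""
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]

def diag(x, y, z):
    """Build a 3x3 diagonal matrix."""
    return [[x, 0.0, 0.0], [0.0, y, 0.0], [0.0, 0.0, z]]

def predict_formants(a_pt, b_nt, f_p, f_n, f_v):
    """F(t|p v n) = A_pt F_p + B_nt F_n + (I - A_pt - B_nt) F_v."""
    vowel_weight = [[(1.0 if i == j else 0.0) - a_pt[i][j] - b_nt[i][j]
                     for j in range(3)] for i in range(3)]
    terms = [mat_vec(a_pt, f_p), mat_vec(b_nt, f_n), mat_vec(vowel_weight, f_v)]
    return [sum(term[i] for term in terms) for i in range(3)]

# Hypothetical targets in Hz for (F1, F2, F3).
f_p = [400.0, 1800.0, 2600.0]   # preceding consonant
f_n = [350.0, 1200.0, 2400.0]   # next consonant
f_v = [700.0, 1200.0, 2500.0]   # vowel

# Zero weights: the observed formants equal the vowel target.
no_coart = predict_formants(diag(0, 0, 0), diag(0, 0, 0), f_p, f_n, f_v)
# Non-zero weights: the formants are pulled toward the consonant targets.
some_coart = predict_formants(diag(0.2, 0.2, 0.2), diag(0.1, 0.1, 0.1),
                              f_p, f_n, f_v)
print(no_coart)
print(some_coart)
```

Larger weights in A_pt and B_nt leave less weight on F_v, which is how the model expresses stronger coarticulation.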
How to use for transformation?
$$F(t \mid p\,v\,n) = A_{pt} F_p + B_{nt} F_n + (I - A_{pt} - B_{nt})\, F_v$$
implies
$$\hat{F}_v = (I - A_{pt} - B_{nt})^{-1} \left( F(t \mid p\,v\,n) - A_{pt} F_p - B_{nt} F_n \right)$$
• F(t | p v n): observed
• Consonant terms (A_{pt} F_p, B_{nt} F_n): from partial consonant recognition
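The inversion step can be sketched as follows. This is an illustration under assumptions, not the authors' procedure: the weights and consonant targets are taken as already known, the 3x3 system is solved with a small hand-written Gauss-Jordan routine (`solve3`, a hypothetical helper), and all numbers are invented.

```python
def mat_vec(m, v):
    """Multiply a 3x3 matrix (list of rows) by a length-3 vector."""
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]

def solve3(m, rhs):
    """Solve m x = rhs (3x3) by Gauss-Jordan elimination with partial pivoting."""
    a = [list(m[i]) + [rhs[i]] for i in range(3)]   # augmented matrix
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("I - A_pt - B_nt is singular or nearly so")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(3):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [x - factor * y for x, y in zip(a[r], a[col])]
    return [a[i][3] / a[i][i] for i in range(3)]

def vowel_weight(a_pt, b_nt):
    """Return I - A_pt - B_nt."""
    return [[(1.0 if i == j else 0.0) - a_pt[i][j] - b_nt[i][j]
             for j in range(3)] for i in range(3)]

def estimate_vowel_target(f_obs, a_pt, b_nt, f_p, f_n):
    """F_v estimate = (I - A_pt - B_nt)^-1 (F_obs - A_pt F_p - B_nt F_n)."""
    ap, bn = mat_vec(a_pt, f_p), mat_vec(b_nt, f_n)
    residual = [f_obs[i] - ap[i] - bn[i] for i in range(3)]
    return solve3(vowel_weight(a_pt, b_nt), residual)

# Hypothetical weights (A_pt deliberately not diagonal) and targets in Hz.
a_pt = [[0.20, 0.02, 0.00], [0.00, 0.25, 0.00], [0.00, 0.00, 0.10]]
b_nt = [[0.10, 0.00, 0.00], [0.00, 0.15, 0.00], [0.00, 0.00, 0.05]]
f_p = [400.0, 1800.0, 2600.0]
f_n = [350.0, 1200.0, 2400.0]
true_f_v = [700.0, 1200.0, 2500.0]

# Synthesize an "observed" vector with the forward model, then invert it.
forward = [mat_vec(a_pt, f_p), mat_vec(b_nt, f_n),
           mat_vec(vowel_weight(a_pt, b_nt), true_f_v)]
f_obs = [sum(term[i] for term in forward) for i in range(3)]
recovered = estimate_vowel_target(f_obs, a_pt, b_nt, f_p, f_n)
print(recovered)
```

When the consonant weights approach a total of 1, the matrix I - A_pt - B_nt approaches singularity and the estimate becomes unreliable, which is why the sketch raises an error in that case.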
Application I
$$F(t \mid p\,v\,n) = A_{pt} F_p + B_{nt} F_n + (I - A_{pt} - B_{nt})\, F_v$$
• Model F(t | p v n) at vowel midpoints
• Each <pvn> token may have different values of A_{pt} and B_{nt}
  – No assumptions about dependency of weights on time
• But: assume synchronicity for formant changes:
$$A_{pt} = \begin{bmatrix} a_{pt} & 0 & 0 \\ 0 & a_{pt} & 0 \\ 0 & 0 & a_{pt} \end{bmatrix}, \qquad B_{nt} = \begin{bmatrix} b_{nt} & 0 & 0 \\ 0 & b_{nt} & 0 \\ 0 & 0 & b_{nt} \end{bmatrix}$$
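Under the synchronicity assumption each token has only two unknown scalars, so the model rearranges to F - F_v = a (F_p - F_v) + b (F_n - F_v), three equations in two unknowns. The sketch below fits them by least squares for one token. It assumes the three targets are already known, which the slides do not state, and the function name and numbers are hypothetical.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))

def fit_token_weights(f_obs, f_p, f_n, f_v):
    """Least-squares scalars (a, b) for one <pvn> token.

    Solves F - F_v = a (F_p - F_v) + b (F_n - F_v) across the three formants
    using the 2x2 normal equations.
    """
    x = [p - v for p, v in zip(f_p, f_v)]
    y = [n - v for n, v in zip(f_n, f_v)]
    d = [o - v for o, v in zip(f_obs, f_v)]
    xx, yy, xy = dot(x, x), dot(y, y), dot(x, y)
    det = xx * yy - xy * xy
    if det <= 1e-12 * xx * yy:
        raise ValueError("consonant and vowel targets do not separate a and b")
    a = (dot(x, d) * yy - dot(y, d) * xy) / det
    b = (dot(y, d) * xx - dot(x, d) * xy) / det
    return a, b

# Hypothetical targets and one midpoint measurement, all in Hz.
f_p = [400.0, 1800.0, 2600.0]
f_n = [350.0, 1200.0, 2400.0]
f_v = [700.0, 1200.0, 2500.0]
f_obs = [605.0, 1320.0, 2510.0]

a, b = fit_token_weights(f_obs, f_p, f_n, f_v)
print(round(a, 3), round(b, 3))
```

For these invented numbers the fit is exact and returns approximately a = 0.2 and b = 0.1; with real measurements the three formants would generally not agree perfectly and the residual would indicate how well synchronicity holds.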
Application II
$$F(t \mid p\,v\,n) = A_{pt} F_p + B_{nt} F_n + (I - A_{pt} - B_{nt})\, F_v$$
• Model F(t | p v n) at vowel midpoints
• A_{pt} and B_{nt} same for all <pvn> tokens
  – Assumptions are made about dependency of weights on time
• But: no synchronicity for formant changes:
$$A_{pt} = \begin{bmatrix} a_{pt} & 0 & 0 \\ 0 & a'_{pt} & 0 \\ 0 & 0 & a''_{pt} \end{bmatrix}, \qquad B_{nt} = \begin{bmatrix} b_{nt} & 0 & 0 \\ 0 & b'_{nt} & 0 \\ 0 & 0 & b''_{nt} \end{bmatrix}$$
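Without synchronicity each formant has its own pair of weights, and because the weights are shared across tokens they can be fitted one formant at a time over many tokens. The sketch below does this by least squares on synthetic data. It assumes known targets and ignores the time dependence mentioned on the slide, since its form is not given; the function name, weights, and formant values are all hypothetical.

```python
def fit_formant_weights(tokens, k):
    """Least-squares (a_k, b_k) for formant index k, shared by all tokens.

    tokens: list of (f_obs, f_p, f_n, f_v), each a length-3 list in Hz.
    """
    xx = yy = xy = xd = yd = 0.0
    for f_obs, f_p, f_n, f_v in tokens:
        x = f_p[k] - f_v[k]
        y = f_n[k] - f_v[k]
        d = f_obs[k] - f_v[k]
        xx += x * x
        yy += y * y
        xy += x * y
        xd += x * d
        yd += y * d
    det = xx * yy - xy * xy
    if det <= 1e-12 * xx * yy:
        raise ValueError("tokens do not vary enough to separate a_k and b_k")
    return (xd * yy - yd * xy) / det, (yd * xx - xd * xy) / det

# Hypothetical per-formant weights used to synthesize the data.
TRUE_A = [0.30, 0.20, 0.10]
TRUE_B = [0.10, 0.15, 0.05]

# Hypothetical (F_p, F_n, F_v) targets for three different <pvn> contexts.
contexts = [
    ([400.0, 1800.0, 2600.0], [350.0, 1200.0, 2400.0], [700.0, 1200.0, 2500.0]),
    ([300.0, 900.0, 2300.0], [400.0, 1800.0, 2600.0], [300.0, 2200.0, 3000.0]),
    ([350.0, 1200.0, 2400.0], [300.0, 900.0, 2300.0], [500.0, 1000.0, 2400.0]),
]

tokens = []
for f_p, f_n, f_v in contexts:
    f_obs = [TRUE_A[k] * f_p[k] + TRUE_B[k] * f_n[k]
             + (1.0 - TRUE_A[k] - TRUE_B[k]) * f_v[k] for k in range(3)]
    tokens.append((f_obs, f_p, f_n, f_v))

fitted = [fit_formant_weights(tokens, k) for k in range(3)]
for k, (a_k, b_k) in enumerate(fitted):
    print(k + 1, round(a_k, 3), round(b_k, 3))
```

Comparing the fitted weights across formants shows whether the formants move toward the consonant targets at the same rate, which is the question the synchronicity assumption of Application I leaves closed.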
Conclusions
• Proposed linear model of vowel dynamics
  – To be used for formant “correction”
• When used as analytic instrument
  – Gave meaningful results
• Strikingly “normal” target values
  – Without any normalizing bias in the estimation process
• Clear evidence for enhanced coarticulation