A formant-trajectory model and its usage in comparing coarticulatory effects in dysarthric and normal speech

Xiaochuan Niu and Jan P. H. van Santen

Center for Spoken Language Understanding, OGI School of Science and Engineering at Oregon Health & Science University, USA

MAVEBA 2003, Florence, Italy, December 10-12, 2003

What is Dysarthria?

• Group of speech disorders
  – Weakness / incoordination of speech muscles
  – Result of damage to the brain or nerves

• Results in unintelligible speech


Long Term Project Goal

• Long term goal: Speech transformation
  – Device that works in real time
  – Not:
    • Amplifier, spectral filter
  – But:
    • Correct for dynamic articulatory problems
    • Based on a dynamic model of coarticulation

• Today’s talk:
  – Test (very simple) model of vowel dynamics


Observation: Vowel Formants

• Median Formants in Vowel Centers

[pics]


Framework

• Formant trajectories
  – (Linear or non-linear) interpolation
  – Between vowel targets

• Three mechanisms for vowel triangle data:
  1. More coarticulation (interpolation too smooth)
  2. More random variability
  3. Incorrect targets


Mechanism 1: Coarticulation

• Average formants of any given vowel …
  – … more strongly dependent on …
  – … the average of the virtual formants …
  – … of the surrounding consonants
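As a rough illustration of this mechanism, the sketch below pulls three vowel targets toward a single consonant average with a scalar weight and measures the area of the resulting vowel triangle. The formant values, the single scalar weight, and the helper names (`triangle_area`, `coarticulated`) are assumptions made for illustration; they are not taken from the talk's data.

```python
import numpy as np

def triangle_area(points):
    """Area of a triangle from three (F1, F2) corner points (shoelace formula)."""
    (x1, y1), (x2, y2), (x3, y3) = points
    return 0.5 * abs(x1 * (y2 - y3) + x2 * (y3 - y1) + x3 * (y1 - y2))

# Hypothetical (F1, F2) vowel targets in Hz for three corner vowels.
vowel_targets = np.array([[300.0, 2300.0], [750.0, 1300.0], [350.0, 800.0]])
# Hypothetical average virtual formants of the surrounding consonants.
consonant_avg = np.array([400.0, 1500.0])

def coarticulated(targets, consonant, weight):
    """Pull every vowel toward the consonant average by a scalar weight in [0, 1)."""
    return weight * consonant + (1.0 - weight) * targets

# A larger consonant weight stands in for "more coarticulation".
area_low = triangle_area(coarticulated(vowel_targets, consonant_avg, 0.1))
area_high = triangle_area(coarticulated(vowel_targets, consonant_avg, 0.4))
print(area_low, area_high)
```

Under these assumptions the triangle area scales with the square of (1 - weight), so stronger coarticulation alone is enough to shrink the observed vowel triangle without any change in the targets.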


Mechanism 2: Random Variability

• Average formants of any given vowel …
  – … result of broad distributions that are …
  – … skewed by the boundaries of vowel space
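A small simulation can make this mechanism concrete. The sketch below assumes a one-dimensional slice of the vowel space, Gaussian variability around a target near the lower boundary, and clipping as a stand-in for the boundary effect. All numbers and names (`f1_min`, `f1_max`, `mean_realized`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical one-dimensional slice of the vowel space (F1 in Hz).
f1_min, f1_max = 250.0, 850.0
target = 300.0  # a target close to the lower boundary

def mean_realized(spread, n=100_000):
    """Mean of n noisy realizations of the target, clipped to the vowel space."""
    samples = rng.normal(target, spread, size=n)
    clipped = np.clip(samples, f1_min, f1_max)  # assumption: boundary modeled as clipping
    return clipped.mean()

narrow = mean_realized(20.0)   # little random variability
broad = mean_realized(150.0)   # large random variability
print(narrow, broad)
```

With the narrow distribution the mean stays close to the target, while the broad distribution is cut off at the nearby boundary and its mean drifts toward the interior of the vowel space, even though the target itself is unchanged.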


Mechanism 3: Incorrect Targets

• Average formants of any given vowel …
  – … result of a tendency …
  – … to move articulators in the wrong direction


Linear Coarticulation Model


$$F(t \mid p\,v\,n) = A_{pt} F_p + B_{nt} F_n + (I - A_{pt} - B_{nt}) F_v$$

• $F(t \mid p\,v\,n)$: observed formant vector (3x1)
• $t$: time; $p$: preceding consonant; $v$: vowel; $n$: next consonant
• $A_{pt}$, $B_{nt}$: weight matrices (3x3); $I$: identity matrix (3x3)
• $F_p$, $F_n$, $F_v$: target formants (3x1)

Based on earlier work by Broad, Oehman, Lindblom, Schouten, Pols, Stevens, …
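The model equation can be evaluated directly once targets and weights are given. The following is a minimal sketch under stated assumptions: the function name `predict_formants`, the target values, and the choice of scalar-times-identity weight matrices are all illustrative and do not come from the talk.

```python
import numpy as np

def predict_formants(A_pt, B_nt, F_p, F_n, F_v):
    """Evaluate F(t|p v n) = A_pt F_p + B_nt F_n + (I - A_pt - B_nt) F_v."""
    I = np.eye(3)
    return A_pt @ F_p + B_nt @ F_n + (I - A_pt - B_nt) @ F_v

# Hypothetical target formants (F1, F2, F3) in Hz.
F_p = np.array([300.0, 1700.0, 2600.0])  # preceding consonant
F_n = np.array([250.0, 1100.0, 2400.0])  # next consonant
F_v = np.array([750.0, 1300.0, 2500.0])  # vowel

# Hypothetical weight matrices at one time point.
A_pt = 0.2 * np.eye(3)
B_nt = 0.1 * np.eye(3)

F_obs = predict_formants(A_pt, B_nt, F_p, F_n, F_v)
print(F_obs)
```

When both weight matrices are zero the prediction reduces to the vowel target, and as the weights grow the predicted formants move toward the consonant targets.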

How to use for transformation?


$$F(t \mid p\,v\,n) = A_{pt} F_p + B_{nt} F_n + (I - A_{pt} - B_{nt}) F_v$$

implies

$$\hat{F}_v = (I - A_{pt} - B_{nt})^{-1} \left( F(t \mid p\,v\,n) - A_{pt} F_p - B_{nt} F_n \right)$$

• $F(t \mid p\,v\,n)$: observed
• Partial consonant recognition (for $p$ and $n$)
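The inversion above can be sketched numerically. The example below assumes the weight matrices and consonant targets are already known, builds a synthetic observation from the model, and recovers the vowel target with a linear solve rather than an explicit matrix inverse. The function name `estimate_vowel_target` and all values are hypothetical.

```python
import numpy as np

def estimate_vowel_target(F_obs, A_pt, B_nt, F_p, F_n):
    """Solve (I - A_pt - B_nt) F_v = F_obs - A_pt F_p - B_nt F_n for F_v."""
    I = np.eye(3)
    return np.linalg.solve(I - A_pt - B_nt, F_obs - A_pt @ F_p - B_nt @ F_n)

# Hypothetical consonant targets and weights (assumed known here).
F_p = np.array([300.0, 1700.0, 2600.0])
F_n = np.array([250.0, 1100.0, 2400.0])
A_pt = np.diag([0.2, 0.3, 0.1])
B_nt = np.diag([0.1, 0.05, 0.15])

# Synthetic "observed" formants generated from a known vowel target.
F_v_true = np.array([750.0, 1300.0, 2500.0])
F_obs = A_pt @ F_p + B_nt @ F_n + (np.eye(3) - A_pt - B_nt) @ F_v_true

F_v_est = estimate_vowel_target(F_obs, A_pt, B_nt, F_p, F_n)
print(F_v_est)
```

The solve is only well defined when $I - A_{pt} - B_{nt}$ is invertible, that is, when the vowel target still contributes to every formant.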

Application I


• Model $F(t \mid p\,v\,n)$ at vowel midpoints
• Each <pvn> token may have different values of $A_{pt}$ and $B_{nt}$
• No assumptions about dependency of weights on time
• But: assume synchronicity for formant changes:

$$A_{pt} = \begin{bmatrix} a_{pt} & 0 & 0 \\ 0 & a_{pt} & 0 \\ 0 & 0 & a_{pt} \end{bmatrix}, \qquad B_{nt} = \begin{bmatrix} b_{nt} & 0 & 0 \\ 0 & b_{nt} & 0 \\ 0 & 0 & b_{nt} \end{bmatrix}$$
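With a single scalar weight per consonant, the model at the vowel midpoint can be rewritten as F - Fv = a (Fp - Fv) + b (Fn - Fv), which gives three equations in two unknowns per token. The sketch below fits a and b for one token by least squares, assuming the three targets are already known. The slides do not state how the weights and targets were actually estimated, so this is only one possible illustration, and the function name `fit_token_weights` is hypothetical.

```python
import numpy as np

def fit_token_weights(F_obs, F_p, F_n, F_v):
    """Least-squares scalars (a, b) in F_obs - F_v = a (F_p - F_v) + b (F_n - F_v)."""
    X = np.column_stack([F_p - F_v, F_n - F_v])  # 3 formants x 2 unknown weights
    y = F_obs - F_v
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[0], coef[1]

# Hypothetical targets (F1, F2, F3) in Hz, assumed known for this sketch.
F_p = np.array([300.0, 1700.0, 2600.0])
F_n = np.array([250.0, 1100.0, 2400.0])
F_v = np.array([750.0, 1300.0, 2500.0])

# Synthetic midpoint observation for one <pvn> token with known weights.
a_true, b_true = 0.25, 0.15
F_obs = a_true * F_p + b_true * F_n + (1.0 - a_true - b_true) * F_v

a_est, b_est = fit_token_weights(F_obs, F_p, F_n, F_v)
print(a_est, b_est)
```

The synchronicity assumption is what makes a per-token fit possible here: sharing one weight across the three formants leaves fewer unknowns than equations.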

Application I: Targets [jj/ll]


Application I: Targets [00/09]


Application I: Weights [jj/ll]


Application I: Weights [00/09]


Application II


• Model $F(t \mid p\,v\,n)$ at vowel midpoints
• $A_{pt}$ and $B_{nt}$ same for all <pvn> tokens
• Assumptions are made about dependency of weights on time
• But: no synchronicity for formant changes:

$$A_{pt} = \begin{bmatrix} a_{pt} & 0 & 0 \\ 0 & a'_{pt} & 0 \\ 0 & 0 & a''_{pt} \end{bmatrix}, \qquad B_{nt} = \begin{bmatrix} b_{nt} & 0 & 0 \\ 0 & b'_{nt} & 0 \\ 0 & 0 & b''_{nt} \end{bmatrix}$$
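When each formant has its own weight but the weights are shared across tokens, the fit can be pooled over tokens and done one formant at a time. The sketch below assumes the targets of every token are known and uses synthetic, noise-free data. The function name `fit_diagonal_weights`, the array layout, and all values are assumptions for illustration, not the estimation procedure used in the talk.

```python
import numpy as np

def fit_diagonal_weights(F_obs, F_p, F_n, F_v):
    """Per-formant weights shared by all tokens; inputs have shape (n_tokens, 3)."""
    a = np.empty(3)
    b = np.empty(3)
    for k in range(3):  # one independent least-squares fit per formant
        X = np.column_stack([F_p[:, k] - F_v[:, k], F_n[:, k] - F_v[:, k]])
        y = F_obs[:, k] - F_v[:, k]
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        a[k], b[k] = coef
    return a, b

rng = np.random.default_rng(1)
n_tokens = 50

# Hypothetical per-token targets in Hz (random values, illustrative only).
F_p = rng.uniform(200.0, 3000.0, size=(n_tokens, 3))
F_n = rng.uniform(200.0, 3000.0, size=(n_tokens, 3))
F_v = rng.uniform(200.0, 3000.0, size=(n_tokens, 3))

# Hypothetical diagonal entries of A_pt and B_nt, one per formant.
a_true = np.array([0.20, 0.30, 0.10])
b_true = np.array([0.10, 0.05, 0.15])
F_obs = a_true * F_p + b_true * F_n + (1.0 - a_true - b_true) * F_v

a_est, b_est = fit_diagonal_weights(F_obs, F_p, F_n, F_v)
print(a_est, b_est)
```

Because the matrices are diagonal, the three formants decouple and each pair of weights can be estimated separately from the pooled tokens.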

Application II: Targets [00/09]


Application II: Weights [00/09]


Conclusions

• Proposed linear model of vowel dynamics
  – To be used for formant “correction”

• When used as analytic instrument
  – Gave meaningful results

• Strikingly “normal” target values
  – Without any normalizing bias in the estimation process

• Clear evidence for enhanced coarticulation

