Atef Ben Youssef, Pierre Badin & Gérard Bailly
GIPSA-lab (DPC / ICP), UMR 5216 CNRS / INP / UJF / U. Stendhal, Grenoble, France
{Atef.Ben-Youssef, Pierre.Badin, Gerard.Bailly}@gipsa-lab.grenoble-inp.fr
[Figure: vocal-tract images for /a/, /i/, /y/ and /u/]
– Visual articulatory feedback
– Estimation of the articulatory movements from the speech signal
– Speech inversion system: HMM-based acoustic recognition and articulatory synthesis
● Acoustic HMMs
– Corpus: one French male speaker, parallel data (17 min, 5100 phones, 36 phonemes): acoustic features x (MFCC + E + ∆) // articulatory features y (12 EMA coordinates + ∆)
– Speech production model: left-to-right, 3-state, multi-stream HMM λ trained using Maximum Likelihood Estimation (MLE)
● Tied states and multi-Gaussian mixtures
– Decision-tree-based state tying
  → improves the statistics when the number of occurrences is low
  → improves the context dependency
– Multiple-mixture-component Gaussian distributions
● Acoustic-to-articulatory inversion
– ŷ = argmax_y p(y|x), where p(y|x) = p(y|λ,q) · p(x|λ,q) · P(λ)
– Acoustic decoding (+ language model): Viterbi algorithm
– Articulatory synthesis: Maximum Likelihood Parameter Generation (MLPG)
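The two inversion stages can be sketched in code: Viterbi decoding of the HMM state sequence from acoustic likelihoods, then MLPG-style trajectory generation from the decoded per-frame state means and variances. This is a simplified toy illustration, not the poster's actual implementation: single-Gaussian states, a hypothetical ±1-frame delta window, and one articulatory dimension are my assumptions.

```python
import numpy as np

def viterbi(log_A, log_pi, log_b):
    """Most likely state path; log_b[t, s] = log p(x_t | state s),
    log_A[i, j] = log transition probability from state i to j."""
    T, S = log_b.shape
    delta = log_pi + log_b[0]
    psi = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + log_A          # (from-state, to-state)
        psi[t] = scores.argmax(axis=0)           # best predecessor
        delta = scores.max(axis=0) + log_b[t]
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t][path[-1]]))
    return path[::-1]

def mlpg_1d(mu_s, mu_d, var_s, var_d):
    """MLPG for one articulatory dimension: find the trajectory y that
    maximizes the Gaussian likelihood of the static and delta features,
    with the (assumed) delta window Δy_t = (y_{t+1} - y_{t-1}) / 2."""
    T = len(mu_s)
    W = np.zeros((2 * T, T))                     # stacks static/delta rows
    for t in range(T):
        W[2 * t, t] = 1.0                        # static row
        W[2 * t + 1, max(t - 1, 0)] -= 0.5       # delta row
        W[2 * t + 1, min(t + 1, T - 1)] += 0.5
    mu = np.ravel(np.column_stack([mu_s, mu_d]))
    prec = np.ravel(np.column_stack([1 / var_s, 1 / var_d]))
    A = W.T @ (prec[:, None] * W)                # normal equations
    b = W.T @ (prec * mu)
    return np.linalg.solve(A, b)
```

In the full system the decoded state sequence q supplies mu_s, mu_d, var_s, var_d frame by frame from the articulatory HMMs; here they would simply be looked up per state and passed to `mlpg_1d`.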
● Minimum Generation Error (MGE) training of the articulatory HMMs
– Generation error defined as the Euclidean distance between the generated Ŷ and the measured Y articulatory trajectories:
  D(Y, Ŷ) = ‖Y − Ŷ‖² = Σ_{t=1}^{T} ‖y_t − ŷ_t‖²
– Articulatory HMM parameters (mean µ and variance σ²) updated by probabilistic descent with step size ε:
  µ_update = µ_old − ε (µ_gen − µ_orig)
  σ²_update = σ²_old − ε · (1 / (N·T)) · Σ_{n=1}^{N} Σ_{t=1}^{T} (o_{n,t} − ô_{n,t}) (ô_{n,t} − µ)
[Figure: acoustic state decoder — chain of phone HMMs λ1 … λk, λl … λn, each left-to-right with states q1, q2, q3]
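A minimal sketch of the MGE quantities above, assuming single-Gaussian states and one articulatory dimension. The gradient of D with respect to a state mean is approximated here by a simple state-occupancy weighting of the per-frame error; the exact gradient goes through the MLPG generation step, so this is an illustrative simplification, not the poster's precise update.

```python
import numpy as np

def generation_error(Y, Y_hat):
    """D(Y, Ŷ) = Σ_t ||y_t - ŷ_t||², summed over the frames of an utterance."""
    return float(np.sum((Y - Y_hat) ** 2))

def gpd_mean_update(mu_old, Y, Y_hat, occupancy, eps=0.01):
    """One probabilistic-descent step on a state mean (one dimension).
    occupancy[t] = 1 where this state is occupied at frame t, else 0.
    dD/dŷ_t = 2 (ŷ_t - y_t) is pushed onto the mean via the occupancy
    weights (a stand-in for the full MLPG chain rule)."""
    grad = np.sum(occupancy * 2.0 * (Y_hat - Y))
    return mu_old - eps * grad
```

When Ŷ already matches Y the gradient vanishes and the mean is left unchanged, which is the fixed point MGE training aims for.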
– French acoustic-articulatory corpus
[Figure: measured (Original) and synthesized (MLE, MGE) articulatory spaces; tongue tip, tongue middle and tongue back y-coordinates for the sequence /a/ /k/ /a/]
[Bar charts: phone recognition accuracy (%) — 85.46, 84.31, 86.19, 86.35 for no-ctx, L-ctx, ctx-R, L-ctx-R — and inversion RMSE (mm) from audio alone and from audio and text, MLE vs MGE, ranging from 1.35 to 1.88 mm across the four contexts]
– English acoustic-articulatory corpus (MOCHA-TIMIT, fsew0): 21 min, 14000 phones, 45 phonemes
[Bar charts: phone recognition accuracy (%) — 55.82, 67.89, 70.20, 66.30 for no-ctx, L-ctx, ctx-R, L-ctx-R — and inversion RMSE (mm) from audio alone and from audio and text, MLE vs MGE, ranging from 1.59 to 1.96 mm across the four contexts]
– Christophe Savariaux for his involvement in the recording of the EMA data.
– Work partially supported by the French ANR-08-EMER-001-02 project ARTIS.