Implementation of a speech Analysis-Synthesis Toolbox using Harmonic plus Noise Model



Page 1: Implementation of a speech Analysis-Synthesis Toolbox using Harmonic plus Noise Model Didier Cadic 1, engineering student supervised by Olivier Cappé

Implementation of a speech Analysis-Synthesis Toolbox using Harmonic plus Noise Model

Didier Cadic¹, engineering student

supervised by Olivier Cappé¹, Maurice Charbit¹, Gérard Chollet¹, Eric Moulines¹

(presented here by Guido Aversano¹,²)

¹ Département TSI, ENST, Paris, France

² IIASS, Vietri sul Mare (SA), Italy

Page 2: Plan of the presentation

Text-to-speech: classic methods

HNM model

Analysis

Synthesis

Analysis-Synthesis examples

Conclusions

Page 3: Text-To-Speech by concatenation

Examples realized on the AT&T web site:

English, male

English, female (vocal server example)

English, female (another vocal server example)

German, male

French, female

Page 4: Text-To-Speech by concatenation

Two major challenges:

smooth connection between acoustic units

flexible prosody

Page 5: TD-PSOLA method

Analysis:

Pitch estimation

Pitch-synchronous windowing

Synthesis:

Rearrangement of frames
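The analysis and synthesis steps above can be illustrated with a minimal time-stretching sketch. It is not the method used for the examples in this talk: it assumes the pitch period is constant and already known (`period`, in samples), so the pitch-estimation step is skipped, and the names `hann`, `psola_stretch` and `stretch` are hypothetical.

```python
import math

def hann(n):
    """Periodic Hann window of length n (two copies at 50% overlap sum to 1)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]

def psola_stretch(x, period, stretch):
    """Time-stretch x by `stretch` (> 1 is slower), assuming a constant pitch period."""
    win = hann(2 * period)
    # Analysis: one pitch mark per period, two-period windowed frame around each mark.
    marks = list(range(period, len(x) - period, period))
    frames = [[x[m - period + i] * win[i] for i in range(2 * period)] for m in marks]
    # Synthesis: keep the mark spacing (pitch unchanged), repeat or drop frames.
    out_len = int(len(x) * stretch)
    y = [0.0] * out_len
    for ts in range(period, out_len - period, period):
        ta = ts / stretch  # matching instant in the original signal
        j = min(range(len(marks)), key=lambda i: abs(marks[i] - ta))
        for i in range(2 * period):
            y[ts - period + i] += frames[j][i]  # overlap-add
    return y
```

On a perfectly periodic input the overlap-add reproduces the waveform exactly, which is why the method works well on voiced sounds; repeating frames of a noise-like sound is what introduces the artifacts mentioned later.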

Page 6: TD-PSOLA method

Some very good-quality results:

Time-scaling: Singing, original; Singing, modified

Pitch-shifting: Cello, original; Cello, modified

Page 7: TD-PSOLA method

Artifacts appearing in non-voiced sounds:

"rain", original

"rain", 0.5 rate

"ss", original

"ss", slowed down (classic method)

"ss", slowed down (improved)

Page 8: Phase Vocoder method

Intuitive description:

Compression/stretching of the (narrow-band) spectrogram's time-frequency scales…

time-scaling

pitch-shifting

Page 9: Phase Vocoder method

Examples:

"rain", male voice

Slow-motion by Vocoder (with the PSOLA result for comparison)

"The quick fox …", female voice

Slow-motion by Vocoder

Main problem: phase coherence is lost in the synthesized signal

Page 10:

TD-PSOLA and Vocoder allow basic prosodic modifications.

The problem of unit concatenation for TTS is not solved.

Other kinds of modifications (timbre, denoising, …) should be considered.

We need a parametric model

Page 11: Harmonic plus Noise Model (HNM)

Main assumption:

stationary segments of a speech signal can always be seen as the superposition of a periodic part and a noisy part

Page 12: HNM Model

Modelling:

$S(t) = H(t) + B(t)$

where: $H(t) = \sum_k A_k \cos(2\pi k f_0 t + \varphi_k)$

and $B(t)$ = white noise passed through an AR filter
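To make the model concrete, here is a minimal sketch that generates one stationary frame as a sum of harmonics plus AR-filtered white Gaussian noise. The function names, the sampling rate `fs` and the convention that the AR polynomial is normalised to a leading coefficient of 1 are assumptions made for illustration, not part of the toolbox.

```python
import math
import random

def harmonic_part(amps, phases, f0, fs, n):
    """H(t) = sum_k A_k cos(2 pi k f0 t + phi_k), sampled at fs for n samples."""
    return [
        sum(a * math.cos(2 * math.pi * k * f0 * t / fs + p)
            for k, (a, p) in enumerate(zip(amps, phases), start=1))
        for t in range(n)
    ]

def noise_part(ar, n, rng):
    """White Gaussian noise through the all-pole filter 1 / (1 + ar[0] z^-1 + ...)."""
    b = []
    for t in range(n):
        v = rng.gauss(0.0, 1.0)
        for i, a in enumerate(ar, start=1):
            if t >= i:
                v -= a * b[t - i]  # feedback on past outputs
        b.append(v)
    return b

def hnm_frame(amps, phases, f0, fs, ar, n, seed=0):
    """S(t) = H(t) + B(t) for one stationary frame."""
    h = harmonic_part(amps, phases, f0, fs, n)
    b = noise_part(ar, n, random.Random(seed))
    return [hi + bi for hi, bi in zip(h, b)]
```

For example, `hnm_frame([1.0, 0.5], [0.0, 0.0], 100.0, 8000, [-0.9], 160)` returns 20 ms of a two-harmonic 100 Hz tone mixed with low-pass coloured noise.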

Page 13: HNM analysis of a frame

1. Pitch estimation

Spectral comb method
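The slide only names the spectral comb method, so the following is a sketch of one plausible variant rather than the toolbox's estimator. For each candidate f0 it sums the windowed spectrum magnitude on the comb teeth (multiples of f0), subtracts the magnitude halfway between teeth, and divides by the number of teeth. The scoring rule, the search range and the name `comb_pitch` are all assumptions.

```python
import cmath
import math

def comb_pitch(x, fs, f_lo=80.0, f_hi=300.0, step=2.0, f_max=1000.0):
    """Return the candidate f0 (Hz) whose harmonic comb best matches the spectrum of x."""
    n = len(x)
    # Hann-window the frame once.
    xw = [v * (0.5 - 0.5 * math.cos(2 * math.pi * i / n)) for i, v in enumerate(x)]

    def mag(freq):
        # Magnitude of the Fourier transform at an arbitrary frequency in Hz.
        return abs(sum(v * cmath.exp(-2j * math.pi * freq * i / fs)
                       for i, v in enumerate(xw)))

    best_f0, best_score = None, float("-inf")
    f0 = f_lo
    while f0 <= f_hi:
        teeth = int(f_max // f0)
        on = sum(mag(k * f0) for k in range(1, teeth + 1))           # on the teeth
        off = sum(mag((k - 0.5) * f0) for k in range(1, teeth + 1))  # between teeth
        score = (on - off) / teeth
        if score > best_score:
            best_f0, best_score = f0, score
        f0 += step
    return best_f0
```

Dividing by the number of teeth penalises sub-multiples of the true pitch, whose comb also hits every true harmonic. Without such a normalisation, a candidate at half the true pitch scores as well as the true pitch itself.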

Page 14: HNM analysis of a frame

1. Pitch estimation

Good results are obtained

In some cases the method erroneously returns f0/2

Possibility of tracking…

"aka…aga"

Page 15: HNM analysis of a frame

2. Harmonic part: extraction of amplitudes

Least squares method

$H(t) = \sum_k \left[ a_k \cos(2\pi k f_0 t) + b_k \sin(2\pi k f_0 t) \right]$

$\min_{a_k, b_k} \lVert s(t) - H(t) \rVert^2$
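The least-squares fit can be sketched as follows, assuming the pitch `f0` of the frame is already known. The cosine and sine basis vectors are built for each harmonic, and the normal equations are solved with a small Gaussian-elimination helper. The names `solve` and `harmonic_fit` are hypothetical, and a production implementation would more likely call a linear-algebra library.

```python
import math

def solve(m, v):
    """Solve the square system m x = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    aug = [row[:] + [v[i]] for i, row in enumerate(m)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for j in range(c, n + 1):
                aug[r][j] -= f * aug[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][j] * x[j] for j in range(r + 1, n))) / aug[r][r]
    return x

def harmonic_fit(s, f0, fs, n_harm):
    """Return (a, b, h): coefficients a_k, b_k minimising sum (s - H)^2, and the fitted H."""
    basis = []
    for k in range(1, n_harm + 1):
        w = 2 * math.pi * k * f0 / fs
        basis.append([math.cos(w * t) for t in range(len(s))])
        basis.append([math.sin(w * t) for t in range(len(s))])
    # Normal equations: (M^T M) coef = M^T s, with the basis vectors as columns of M.
    gram = [[sum(p * q for p, q in zip(u, v)) for v in basis] for u in basis]
    rhs = [sum(p * q for p, q in zip(u, s)) for u in basis]
    coef = solve(gram, rhs)
    h = [sum(c * u[t] for c, u in zip(coef, basis)) for t in range(len(s))]
    return coef[0::2], coef[1::2], h
```

The amplitude and phase of each harmonic follow from $A_k = \sqrt{a_k^2 + b_k^2}$ and $\varphi_k = \operatorname{atan2}(-b_k, a_k)$, and subtracting the fitted `h` from the frame leaves the part that the harmonic model does not explain.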

Page 16: HNM analysis of a frame

2. Extraction of amplitudes

Problem: the noisy part makes a nonzero contribution to the spectral power

Gain correction for the harmonics (using a heuristic formula g(DV), where DV is the estimated voicing degree)

Page 17: HNM analysis of a frame

2. Extraction of amplitudes

Residual: $R(t) = s(t) - H(t)$

Page 18: HNM analysis of a frame

2. Extraction of amplitudes

Possibility of improving harmonic estimation

Page 19: HNM analysis of a frame

3. AR filter estimation for the residual:

Linear prediction method

$R(t) = (B_g * F)(t)$

where $B_g$ = Gaussian white noise

and $F(t)$ = AR filter, $F(z) = \dfrac{1}{a_0 + a_1 z^{-1} + \dots + a_N z^{-N}}$
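One standard way to carry out linear prediction is the autocorrelation method with the Levinson-Durbin recursion, sketched below. The slides do not say which variant the toolbox uses, so this is an illustration under that assumption. The code normalises the polynomial so that its leading coefficient is 1 and returns the prediction-error power separately, whereas the slide's $a_0$ absorbs the gain.

```python
def autocorr(r, max_lag):
    """Autocorrelation of the residual frame r for lags 0..max_lag."""
    return [sum(r[t] * r[t - lag] for t in range(lag, len(r)))
            for lag in range(max_lag + 1)]

def levinson(ac, order):
    """Return (a, err) with a[0] = 1, A(z) = sum_i a[i] z^-i, err = prediction-error power."""
    a = [1.0] + [0.0] * order
    err = ac[0]
    for m in range(1, order + 1):
        acc = ac[m] + sum(a[i] * ac[m - i] for i in range(1, m))
        k = -acc / err  # reflection coefficient
        new = a[:]
        for i in range(1, m):
            new[i] = a[i] + k * a[m - i]
        new[m] = k
        a = new
        err *= 1.0 - k * k
    return a, err
```

A typical call would be `a, err = levinson(autocorr(residual, N), N)` for a residual frame and a model order `N` chosen by the user.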

Page 20: HNM Synthesis

Interpolation for each harmonic between two successive frames

$H(t) = \sum_k \left[ a_k(t) \cos(2\pi k f_0(t)\, t) + b_k(t) \sin(2\pi k f_0(t)\, t) \right] = \sum_k A_k(t) \cos \varphi_k(t)$

$\varphi'_k(t_a) = 2\pi k f_0(t_a)$ (the instantaneous frequency) is known by pitch analysis.

$A_k(t_a)$ and $\varphi_k(t_a)$ are known at analysis instants $t_a$

Page 21: HNM Synthesis

Erroneous pitch (usually f0/2)

The resulting harmonic correspondence problem is solved by introducing fictitious harmonics

Page 22: HNM Synthesis

$A_k(t) \cos \varphi_k(t)$

Amplitude $A_k$: linear interpolation

Phase $\varphi_k$: unwrapping + cubic interpolation
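A common way to combine unwrapping with cubic phase interpolation is the scheme of McAulay and Quatieri, sketched below for one harmonic between two frame boundaries `T` samples apart. Whether the toolbox uses exactly these formulas is an assumption. Phases are in radians, frequencies in radians per sample, and the function names are hypothetical.

```python
import math

def cubic_phase(theta0, omega0, theta1, omega1, T):
    """Phase samples theta[0..T] matching phase and frequency at both frame boundaries."""
    # Unwrapping: choose the multiple of 2*pi that gives the smoothest trajectory.
    m = round(((theta0 + omega0 * T - theta1) + (omega1 - omega0) * T / 2) / (2 * math.pi))
    d = theta1 + 2 * math.pi * m - theta0 - omega0 * T
    alpha = 3 * d / T**2 - (omega1 - omega0) / T
    beta = -2 * d / T**3 + (omega1 - omega0) / T**2
    return [theta0 + omega0 * t + alpha * t**2 + beta * t**3 for t in range(T + 1)]

def synth_harmonic(amp0, amp1, phase):
    """Linearly interpolated amplitude times the cosine of the interpolated phase."""
    T = len(phase) - 1
    return [(amp0 + (amp1 - amp0) * t / T) * math.cos(p) for t, p in enumerate(phase)]
```

When the frequency is the same at both boundaries and the measured phases are consistent with it, the cubic terms vanish and the trajectory reduces to a straight line, so a stationary sinusoid is reproduced unchanged.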

Page 23: HNM Synthesis

Noisy part

Generation of normally distributed random numbers

AR filtering (abrupt changes of coefficients between 2 windows have no noticeable effect…)

Page 24: HNM Synthesis

Results

"Carottes": synthesized, original

"Lawyer": synthesized, original

Tuba: synthesized, original

"wazi": synthesized, original

a-e-i-o-u: synthesized, original

singing: synthesized, original

Page 25: HNM Synthesis

Results

Discours: synthesized, original

"aka aga": synthesized, original

Dussolier: synthesized, original

Andie: synthesized, original, noisy part

"coiffe": synthesized, original

Page 26: Synthesis with time-stretching

Synthesis instants ($t_s$) and analysis instants ($t_a$)

The following parameters remain unchanged:

Noisy part parameters

The pitch

The amplitudes $A_k$ of the harmonics

Page 27: Synthesis with time-stretching

Phase adaptation

Simple phase trajectories resampling

or

"harmonic" rephasing

a-e-i-o-u: original

slow-motion with phase "stretching"

slow-motion with "harmonic" rephasing
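The slides do not detail how the phases are adapted, so the sketch below shows only one plausible reading of the idea. Analysis instants are mapped to synthesis instants by dividing by the rate, and the phase of harmonic k is then rebuilt by integrating k times the unchanged pitch over the new frame spacing. The function names and the trapezoid rule are assumptions.

```python
import math

def stretch_instants(t_analysis, rate):
    """Map analysis instants (seconds) to synthesis instants; rate < 1 slows playback."""
    t0 = t_analysis[0]
    return [t0 + (t - t0) / rate for t in t_analysis]

def rephase(phase0, f0_track, t_synth, k):
    """Phase of harmonic k at each synthesis instant, integrating k * f0 (trapezoid rule)."""
    phases = [phase0]
    for i in range(1, len(t_synth)):
        mean_f0 = 0.5 * (f0_track[i - 1] + f0_track[i])
        step = 2 * math.pi * k * mean_f0 * (t_synth[i] - t_synth[i - 1])
        phases.append(phases[-1] + step)
    return phases
```

With a rate of 0.5, frames analysed every 10 ms are placed every 20 ms. The pitch, the harmonic amplitudes and the noise parameters attached to each frame are reused as they are.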

Page 28: Final results

Original; synthesized with rate: 0.4, 0.5, 0.6, 0.7, 0.8, 1, 1.2, 1.5, 2

Samples: "carottes", "lawyer", tuba, "wazi", singing, "a-e-i-o-u", Dussolier, Discours, Andie, "aka aga", "coiffe"

Page 29: Conclusions

Good results, showing the method's potential for different applications, including TTS

Future work will include other kinds of modifications (pitch shifting, timbre, etc.)