communications & multimedia signal processing formant tracking lp with harmonic plus noise model...

Communications & Multimedia Signal Processing

Formant Tracking LP with Harmonic Plus Noise Model of Excitation for Speech Enhancement

Qin Yan

Communication & Multimedia Signal Processing Group

School of Engineering and Design, Brunel University

6 July, 2005

Outline

• Parameter estimation for the HNM (incl. pitch/harmonic tracking in noise)

• HNM of excitation

• Formant Tracking LP with HNM of excitation in speech enhancement

Overview of Speech Enhancement System

Figure - Block diagram of the speech enhancement system, comprising a Formant Track Restoration Module and a Residual Restoration Module. Blocks shown:

• Noisy Speech (input)

• VAD

• LP Model of Noise

• LP Analysis and LP Spectral Subtraction

• LP Pole Analysis

• Formant Candidate Estimation

• Vowel/Consonant Classification

• Kalman Filter

• LP Spectrum Reconstruction

• HNM of Residual

• Residual De-noising

• Speech Reconstruction

• Enhanced Speech (output)

Harmonic plus Noise Model

In the HNM, speech is decomposed into two parts: harmonic and noise.

Harmonic:

$$S_h(t) = \sum_{k=-L(t)}^{L(t)} A_k(t)\, e^{j k \omega_0(t) t}$$

where L(t) denotes the number of harmonics included in the harmonic part and ω0 denotes the pitch frequency.

Noise:

$$S_n(t) = e(t)\,[h(\tau, t) \star b(t)]$$

where h is a time-varying autoregressive (AR) model and b is white Gaussian noise.

Synthesized speech:

$$\hat{S}(t) = S_h(t) + S_n(t)$$
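The decomposition above can be illustrated with a minimal per-frame sketch. It is not the author's implementation: it assumes a constant pitch and constant amplitudes within the frame, a real-valued signal (so that A_{-k} is the conjugate of A_k and A_0 = 0), and a fixed all-pole AR filter standing in for the time-varying h. The function names and the `envelope` argument for e(t) are hypothetical.

```python
import cmath
import math
import random


def harmonic_part(amps, w0, n_samples):
    """S_h(t) = sum_{k=-L}^{L} A_k exp(j k w0 t), assuming A_{-k} = conj(A_k).

    amps[k-1] is the amplitude A_k for k = 1..L (A_0 is taken as 0);
    w0 is the pitch in radians per sample, held constant over the frame.
    """
    out = []
    for t in range(n_samples):
        # Each conjugate pair (k, -k) collapses to 2 * Re(A_k e^{j k w0 t}).
        out.append(sum(2.0 * (a * cmath.exp(1j * k * w0 * t)).real
                       for k, a in enumerate(amps, start=1)))
    return out


def noise_part(ar_coeffs, envelope, rng):
    """S_n(t) = e(t) * (h filtered b)(t), with b unit white Gaussian noise.

    The AR filter is y[t] = b[t] - sum_i a_i * y[t - i] (fixed coefficients,
    an assumption of this sketch); envelope[t] plays the role of e(t).
    """
    y = []
    for t in range(len(envelope)):
        b = rng.gauss(0.0, 1.0)
        fed_back = sum(a * y[t - i]
                       for i, a in enumerate(ar_coeffs, start=1) if t - i >= 0)
        y.append(b - fed_back)
    return [e * v for e, v in zip(envelope, y)]


def synthesize(amps, w0, ar_coeffs, envelope, seed=0):
    """Synthesized frame: harmonic part plus noise part."""
    rng = random.Random(seed)
    s_h = harmonic_part(amps, w0, len(envelope))
    s_n = noise_part(ar_coeffs, envelope, rng)
    return [h + n for h, n in zip(s_h, s_n)]
```

Setting the envelope to zero removes the noise part, which makes it easy to check the harmonic part on its own.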

HNM - Pitch Tracking

• In Griffin's method, the error function for clean speech is:

$$E(F_0) = E - \sum_{k=1}^{\mathrm{MaxF}/F_0} \; \sum_{l=kF_0-M}^{kF_0+M} \log |X(l)|$$

• In noisy conditions, the error function is modified to include SNR-dependent weights:

$$E(F_0) = E - \sum_{k=1}^{\mathrm{MaxF}/F_0} \; \sum_{l=kF_0-M}^{kF_0+M} W(l) \log |X(l)|$$

The weighting function W(l) is SNR-dependent and is given by

$$W(l) = \frac{\mathrm{SNR}(l)}{1 + \mathrm{SNR}(l)}$$

• The error function can be extended into the frequency domain:

$$E(F_0) = \frac{\sum_t s^2(t)\, w^2(t) - F_0 \sum_l r(l F_0)}{\left(1 - F_0 \sum_t w^4(t)\right) \sum_t s^2(t)\, w^2(t)}$$

where r is defined as

$$r(k) = \sum_t s(t)\, w^2(t)\, s(t+k)\, w^2(t+k)$$

• Each frame outputs several pitch candidates (N = 3), and the Viterbi algorithm generates the final pitch tracks.
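The SNR-weighted error function and the per-frame candidate selection can be sketched as follows. This is an illustration under stated assumptions, not the author's code: frequencies are expressed in FFT bins, SNR(l) is a linear power ratio, and the constant E is taken to be the weighted log-magnitude summed over all bins up to MaxF (the slides do not define E). All function and parameter names are hypothetical.

```python
import math


def snr_weight(snr):
    """W(l) = SNR(l) / (1 + SNR(l)), with SNR as a linear power ratio."""
    return snr / (1.0 + snr)


def pitch_error(mag, snr, f0, half_width, max_f):
    """E(F0) = E - sum_k sum_{l=kF0-M}^{kF0+M} W(l) log|X(l)|.

    mag[l] = |X(l)| (must be > 0) and snr[l] are indexed by bin l;
    f0, half_width (M) and max_f (MaxF) are given in bins.
    E is assumed to be the weighted log-magnitude over bins 0..max_f.
    """
    def term(l):
        return snr_weight(snr[l]) * math.log(mag[l])

    total = sum(term(l) for l in range(max_f + 1))
    comb = 0.0
    for k in range(1, max_f // f0 + 1):
        lo = max(k * f0 - half_width, 0)
        hi = min(k * f0 + half_width, max_f)
        comb += sum(term(l) for l in range(lo, hi + 1))
    return total - comb


def pitch_candidates(mag, snr, f0_range, half_width, max_f, n_best=3):
    """Return the n_best F0 values (in bins) with the smallest error, best first."""
    scored = sorted((pitch_error(mag, snr, f0, half_width, max_f), f0)
                    for f0 in f0_range)
    return [f0 for _, f0 in scored[:n_best]]
```

Because W(l) tends to 0 at low SNR and to 1 at high SNR, bins dominated by noise contribute little to the harmonic comb sum, which is the intent of the weighting described above.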

Pitch Tracking Results (1)

Figure - Illustration of error function from a voiced frame

Pitch Tracking Results (2)

Figure - Illustration of pitch tracks for a speech signal sampled at 16 kHz.

Pitch Tracking         Average RMSE    Average % error
Without SNR weights    23.3            4.9%
With SNR weights       18.7            2.7%

Table - Comparison of average RMSE and average percentage error of pitch tracking for speech in car noise at an SNR of 0 dB.
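The pitch tracks evaluated here are formed by a Viterbi search over the per-frame pitch candidates. A minimal sketch of such a search is given below. The slides do not specify the costs, so the transition cost (proportional to the absolute pitch jump between consecutive frames) and the `jump_weight` parameter are assumptions made for illustration only.

```python
def viterbi_pitch_track(candidates, costs, jump_weight=1.0):
    """Choose one pitch candidate per frame with minimum total cost.

    candidates[t][i] is the i-th pitch candidate in frame t and costs[t][i]
    is its error-function value. The transition cost
    jump_weight * |f_t - f_{t-1}| is an assumption of this sketch.
    """
    n_frames = len(candidates)
    best = [list(costs[0])]                 # best[t][i]: min cost of a path ending at (t, i)
    back = [[None] * len(candidates[0])]    # back[t][i]: predecessor index in frame t - 1
    for t in range(1, n_frames):
        row, ptr = [], []
        for i, f in enumerate(candidates[t]):
            options = [best[t - 1][j] + jump_weight * abs(f - g)
                       for j, g in enumerate(candidates[t - 1])]
            j_min = min(range(len(options)), key=options.__getitem__)
            row.append(options[j_min] + costs[t][i])
            ptr.append(j_min)
        best.append(row)
        back.append(ptr)
    # Backtrack from the cheapest state in the last frame.
    i = min(range(len(best[-1])), key=best[-1].__getitem__)
    path = []
    for t in range(n_frames - 1, -1, -1):
        path.append(candidates[t][i])
        if t > 0:
            i = back[t][i]
    return path[::-1]
```

With three candidates per frame, the search can prefer a slightly worse local candidate when it avoids a large pitch jump, such as an octave error in a single frame.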

HNM - Harmonic Tracking

Figure - Block diagram of harmonic tracking. Blocks shown: Noisy Speech (input); VAD; Noise Model; FFT; Pitch Tracking; Peak Picking; Throw Away Short Trajectories; Harmonic Frequency-Bin Tracks (output).

Experiments show that it is better to perform harmonic tracking on the whole speech signal than on the excitation.
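The peak-picking and short-trajectory steps of the diagram can be sketched as below. This is only an illustration: it takes precomputed magnitude spectra as input (the FFT, VAD and noise model stages are not modelled), assumes the pitch is given in FFT bins with 0 marking an unvoiced frame, and uses hypothetical `tol` and `min_len` parameters for the search width and the minimum track length.

```python
def pick_harmonic_peaks(mag, f0_bin, n_harm, tol):
    """For each harmonic k, return the bin of the largest local maximum
    within +/- tol bins of k * f0_bin, or None if there is none."""
    peaks = []
    for k in range(1, n_harm + 1):
        centre = k * f0_bin
        best = None
        lo = max(centre - tol, 1)
        hi = min(centre + tol, len(mag) - 2)
        for l in range(lo, hi + 1):
            if mag[l] > mag[l - 1] and mag[l] >= mag[l + 1]:
                if best is None or mag[l] > mag[best]:
                    best = l
        peaks.append(best)
    return peaks


def track_harmonics(frames, f0_bins, n_harm, tol=2, min_len=3):
    """Link per-frame harmonic peaks into tracks and discard short ones.

    Returns {k: [(start_frame, [bin, bin, ...]), ...]} for k = 1..n_harm.
    """
    tracks = {k: [] for k in range(1, n_harm + 1)}
    open_runs = {k: None for k in range(1, n_harm + 1)}
    for t, (mag, f0) in enumerate(zip(frames, f0_bins)):
        peaks = pick_harmonic_peaks(mag, f0, n_harm, tol) if f0 else [None] * n_harm
        for k, p in enumerate(peaks, start=1):
            if p is not None:
                if open_runs[k] is None:
                    open_runs[k] = (t, [])
                open_runs[k][1].append(p)
            elif open_runs[k] is not None:
                # The run ended: keep it only if it is long enough.
                if len(open_runs[k][1]) >= min_len:
                    tracks[k].append(open_runs[k])
                open_runs[k] = None
    for k, run in open_runs.items():
        # Flush runs that are still open at the last frame.
        if run is not None and len(run[1]) >= min_len:
            tracks[k].append(run)
    return tracks
```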

Synthesis of Excitation (1)

Figure - Block diagram of excitation synthesis. Blocks shown: Noisy Speech (input); VAD; Noise Model; MMSE; LP Analysis; LP Spectrum; Excitation; Pitch Estimation; Harmonic Tracking; UV Decision; Voiced Excitation; Unvoiced Excitation; WGN; Std & Phase; Synthesized Voiced Excitation; Synthesized Unvoiced Excitation; Enhanced Speech (output).

Synthesis of Excitation (2)

Voiced excitation:

$$\hat{e}(m) = e_h(m) + e_n(m) = \sum_{k=-L(m)}^{L(m)} A_k(m)\, e^{j k \omega_0(m) m} + b(m) \cdot \mathrm{std}(e(m)) \cdot \exp(j \alpha)$$

Unvoiced excitation:

$$\hat{e}(m) = e_n(m) = b(m) \cdot \mathrm{std}(e(m)) \cdot \exp(j \alpha)$$

where b(m) is unit-variance white Gaussian noise, e(m) is the original excitation, and α is the phase of the original excitation.

Audio examples:

• Clean speech

• Speech in car noise at 0 dB SNR

• Speech cleaned by MMSE only

• Speech cleaned by MMSE and HNM of excitation
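A literal reading of the two synthesis formulas is sketched below. Several points are assumptions, because the slides do not state them: e(m) is treated as a sequence of complex values (for example DFT bins of the excitation frame), std is taken as the standard deviation over that sequence, α is taken per element as the phase of e(m), and the harmonic amplitudes and pitch are held constant over the frame. The function names are hypothetical.

```python
import cmath
import math
import random


def complex_std(values):
    """Standard deviation of a complex sequence: sqrt(mean(|v - mean|^2))."""
    mean = sum(values) / len(values)
    return math.sqrt(sum(abs(v - mean) ** 2 for v in values) / len(values))


def synth_unvoiced(e, rng):
    """e_n(m) = b(m) * std(e) * exp(j * alpha(m)).

    b(m) is unit-variance white Gaussian noise and alpha(m) is assumed
    to be the phase of the original excitation value e(m).
    """
    sigma = complex_std(e)
    return [rng.gauss(0.0, 1.0) * sigma * cmath.exp(1j * cmath.phase(v))
            for v in e]


def synth_voiced(e, amps, w0, rng):
    """e_h(m) + e_n(m), with e_h(m) = sum_{k=-L}^{L} A_k exp(j k w0 m).

    amps holds A_k for k = -L..L in that order (length 2L + 1).
    """
    noise = synth_unvoiced(e, rng)
    L = (len(amps) - 1) // 2
    out = []
    for m, n in enumerate(noise):
        harm = sum(a * cmath.exp(1j * k * w0 * m)
                   for k, a in zip(range(-L, L + 1), amps))
        out.append(harm + n)
    return out
```

In both cases the noise term inherits the level and phase of the original excitation, while its amplitude pattern comes from the Gaussian generator.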

Future Work

• Refine voiced/unvoiced classification in pitch estimation, and evaluate the improved pitch estimation method more precisely against manually corrected pitch tracks.

• Integrate FTLP smoothing into the current speech enhancement system.

• Evaluate the whole proposed speech enhancement system using ISD, PESQ and perceptual tests.


Thank You!
