communications & multimedia signal processing formant tracking lp with harmonic plus noise model...
Post on 15-Jan-2016
Communications & Multimedia Signal Processing
Formant Tracking LP with Harmonic Plus Noise Model of Excitation for Speech Enhancement
Qin Yan
Communication & Multimedia Signal Processing Group
School of Engineering and Design, Brunel University
6 July, 2005
Outline
• Parameter estimation of HNM (incl. pitch/harmonic tracking in noise)
• HNM of excitation
• Formant Tracking LP with HNM of excitation in speech enhancement
Overview of Speech Enhancement System
Figure - Block diagram of the speech enhancement system, with the following blocks:
• Noisy Speech
• VAD
• LP Model of Noise
• LP Analysis and LP Spectral Subtraction
• Formant Track Restoration Module
• LP Pole Analysis
• Formant Candidate Estimation
• Kalman Filter
• Vowel/Consonant Classification
• LP Spectrum Reconstruction
• Residual Restoration Module
• Residual De-noising
• HNM of Residual
• Speech Reconstruction
• Enhanced Speech
Harmonic plus Noise Model
In HNM, speech is decomposed into two parts: harmonic and noise.
Synthesized speech:
$$\hat{S}(t) = S_h(t) + S_n(t)$$
Harmonic:
$$S_h(t) = \sum_{k=-L(t)}^{L(t)} A_k(t)\, e^{j k \omega_0(t) t}$$
where L(t) denotes the number of harmonics included in the harmonic part and ω0 denotes the pitch frequency.
Noise:
$$S_n(t) = e(t)\,[h(\tau, t) \star b(t)]$$
where h is a time-varying autoregressive (AR) model and b is white Gaussian noise.
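The decomposition above can be illustrated with a minimal sketch for a single frame. It is not the author's implementation: it assumes the pitch, the harmonic amplitudes and the AR coefficients are constant within the frame, that A_{-k} is the conjugate of A_k with A_0 = 0 (so the two-sided complex sum becomes a real sum of cosines), and that the AR filter has the form y[n] = b[n] - sum_i a_i y[n-i]. All function and parameter names are hypothetical.

```python
import math
import random


def harmonic_part(n_samples, f0_hz, amps, phases, fs_hz):
    """Real harmonic part: sum over k = 1..L of 2*|A_k|*cos(k*w0*n + phase_k).

    Assumes A_{-k} = conj(A_k) and A_0 = 0, and a constant pitch in the frame.
    """
    w0 = 2.0 * math.pi * f0_hz / fs_hz  # pitch in radians per sample
    out = []
    for n in range(n_samples):
        out.append(sum(2.0 * a * math.cos(k * w0 * n + p)
                       for k, (a, p) in enumerate(zip(amps, phases), start=1)))
    return out


def noise_part(n_samples, ar_coeffs, envelope, rng):
    """White Gaussian noise b filtered by an all-pole (AR) filter h,
    then scaled sample by sample by the envelope e.

    Assumed filter convention: y[n] = b[n] - sum_i ar_coeffs[i-1] * y[n-i].
    """
    y = []
    for n in range(n_samples):
        b = rng.gauss(0.0, 1.0)
        feedback = sum(a * y[n - i]
                       for i, a in enumerate(ar_coeffs, start=1) if n - i >= 0)
        y.append(b - feedback)
    return [envelope[n] * y[n] for n in range(n_samples)]


def synthesize(harmonic, noise):
    """Synthesized speech: harmonic part plus noise part."""
    return [h + n for h, n in zip(harmonic, noise)]
```

For example, `synthesize(harmonic_part(160, 100.0, [0.5, 0.25], [0.0, 0.0], 8000.0), noise_part(160, [-0.5], [0.1] * 160, random.Random(0)))` produces one 20 ms frame at an assumed 8 kHz sampling rate.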
HNM - Pitch Tracking
• In Griffin's method, the error function for clean speech is:
$$E(F_0) = E - \sum_{k=1}^{\mathrm{Max}F/F_0} \; \sum_{l=kF_0-M}^{kF_0+M} \log X(l)$$
• In noisy conditions, the error function is modified to include SNR-dependent weights:
$$E(F_0) = E - \sum_{k=1}^{\mathrm{Max}F/F_0} \; \sum_{l=kF_0-M}^{kF_0+M} W(l) \log X(l)$$
The weighting function W(l) is SNR-dependent and is given by
$$W(l) = \frac{SNR(l)}{1 + SNR(l)}$$
• The error function can be extended into the frequency domain:
$$E(F_0) = \frac{\sum_t s^2(t) w^2(t) - F_0 \sum_l r(l F_0)}{\sum_t s^2(t) w^2(t) \left[ 1 - F_0 \sum_t w^4(t) \right]}$$
where r is defined as
$$r(k) = \sum_t s(t) w^2(t)\, s(t+k) w^2(t+k)$$
• Each frame outputs several pitch candidates (N = 3), and the Viterbi algorithm generates the final pitch tracks.
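The SNR-weighted error function and the Viterbi step can be sketched as follows. This is an illustration under stated assumptions, not the method as implemented by the author: `mag` is assumed to be a strictly positive magnitude spectrum indexed by frequency bin, the upper harmonic limit is taken as the length of `mag`, the constant E is passed in as the hypothetical parameter `e_const`, and the Viterbi transition cost is assumed to be an absolute pitch jump scaled by the hypothetical parameter `jump_penalty`, which the slides do not specify.

```python
import math


def snr_weights(snr):
    """W(l) = SNR(l) / (1 + SNR(l)) for linear (not dB) per-bin SNR values."""
    return [s / (1.0 + s) for s in snr]


def pitch_error(mag, weights, f0_bin, half_width, e_const=0.0):
    """E(F0) = E - sum_k sum_l W(l) * log X(l), with l running over
    +/- half_width bins around each harmonic k * f0_bin (f0_bin >= 1)."""
    score = 0.0
    k = 1
    while k * f0_bin < len(mag):
        lo = max(0, k * f0_bin - half_width)
        hi = min(len(mag) - 1, k * f0_bin + half_width)
        score += sum(weights[l] * math.log(mag[l]) for l in range(lo, hi + 1))
        k += 1
    return e_const - score


def viterbi_track(candidates, jump_penalty=1.0):
    """candidates[t] is a list of (f0, local_cost) pairs for frame t.
    Returns the f0 sequence that minimises the sum of local costs plus
    jump_penalty * |f0 change| between consecutive frames."""
    best = [cost for _, cost in candidates[0]]
    back = []
    for t in range(1, len(candidates)):
        new_best, ptr = [], []
        for f0, cost in candidates[t]:
            options = [best[i] + jump_penalty * abs(f0 - prev_f0)
                       for i, (prev_f0, _) in enumerate(candidates[t - 1])]
            i_min = min(range(len(options)), key=options.__getitem__)
            new_best.append(options[i_min] + cost)
            ptr.append(i_min)
        best = new_best
        back.append(ptr)
    j = min(range(len(best)), key=best.__getitem__)
    path = [j]
    for ptr in reversed(back):  # trace the best path backwards
        j = ptr[j]
        path.append(j)
    path.reverse()
    return [candidates[t][j][0] for t, j in enumerate(path)]
```

In use, the N lowest values of `pitch_error` in each frame would supply the `(f0, local_cost)` candidates, and the transition penalty discourages isolated octave jumps between frames.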
Pitch Tracking Results (1)
Figure - Illustration of error function from a voiced frame
Pitch Tracking Results (2)
Table - Comparison of average RMSE of pitch tracking for speech in car noise at an SNR of 0 dB.
Pitch Tracking         Average RMSE    Average % error
Without SNR weights    23.3            4.9%
With SNR weights       18.7            2.7%
Figure - An illustration of pitch tracks of a speech signal sampled at 16 kHz.
HNM - Harmonic Tracking
Figure - Block diagram of harmonic tracking, with the following blocks:
• Noisy Speech
• VAD
• Noise model
• FFT
• Pitch Tracking
• Peak picking
• Throw away short trajectory
• Harmonic frequency bin tracks
Experiments show that it is better to perform harmonic tracking on the whole speech signal than on the excitation.
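Two of the blocks above, peak picking and discarding short trajectories, can be sketched as follows. This is a simplified illustration, not the author's implementation: a fixed `threshold` stands in for the noise model, peaks are linked greedily between consecutive frames when they lie within `max_jump` bins, and the pitch-guided part of the tracking is omitted. All names and default values are hypothetical.

```python
def pick_peaks(mag, threshold):
    """Indices of local maxima of a magnitude spectrum above threshold."""
    return [i for i in range(1, len(mag) - 1)
            if mag[i] > threshold and mag[i] > mag[i - 1] and mag[i] >= mag[i + 1]]


def link_tracks(peaks_per_frame, max_jump=2):
    """Greedily link peaks across frames into tracks.

    Each track is a list of (frame_index, bin_index) pairs. A track is
    continued by the nearest unused peak within max_jump bins, otherwise
    it ends at the previous frame.
    """
    finished, active = [], []
    for t, peaks in enumerate(peaks_per_frame):
        unused = set(peaks)
        still_active = []
        for track in active:
            last_bin = track[-1][1]
            near = [p for p in unused if abs(p - last_bin) <= max_jump]
            if near:
                p = min(near, key=lambda q: (abs(q - last_bin), q))
                unused.discard(p)
                track.append((t, p))
                still_active.append(track)
            else:
                finished.append(track)
        for p in sorted(unused):  # unmatched peaks start new tracks
            still_active.append([(t, p)])
        active = still_active
    return finished + active


def drop_short(tracks, min_len=3):
    """Throw away trajectories shorter than min_len frames."""
    return [track for track in tracks if len(track) >= min_len]
```

A typical call chain would be `drop_short(link_tracks([pick_peaks(frame, threshold) for frame in spectra]))`, where `spectra` is a hypothetical list of per-frame magnitude spectra.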
Synthesis of Excitation (1)
Figure - Block diagram of excitation synthesis, with the following blocks:
• Noisy Speech
• VAD
• Noise model
• MMSE
• LP Analysis
• LP Spectrum
• Excitation
• Pitch estimation
• Harmonic tracking
• UV decision
• Voiced excitation
• Unvoiced excitation
• Std & Phase
• WGN
• Synthesized voiced excitation
• Synthesized unvoiced excitation
• Enhanced Speech
Synthesis of Excitation (2)
Voiced excitation:
$$\hat{e}(m) = e_h(m) + e_n(m) = \sum_{k=-L(m)}^{L(m)} A_k(m)\, e^{j k \omega_0(m) m} + b(m)\, \mathrm{std}(e(m))\, \exp(j\alpha)$$
Unvoiced excitation:
$$\hat{e}(m) = e_n(m) = b(m)\, \mathrm{std}(e(m))\, \exp(j\alpha)$$
where b(m) is unit white Gaussian noise, e(m) is the original excitation and α is the phase of the original excitation.
Figure - Panels: clean speech; car-noise speech at SNR 0 dB; speech cleaned by MMSE only; speech cleaned by MMSE and HNM of excitation.
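The two excitation formulas can be written out as a short sketch. Several points are assumptions, since the slides do not state them: the formulas are applied element by element to a complex-valued sequence e(m), α is taken as the phase of each e(m), std is the population standard deviation over the frame, and the pitch and harmonic amplitudes are constant within the frame. Function names are hypothetical.

```python
import cmath
import math
import random


def std(values):
    """Population standard deviation of a real or complex sequence."""
    mean = sum(values) / len(values)
    return math.sqrt(sum(abs(v - mean) ** 2 for v in values) / len(values))


def noise_excitation(e, rng):
    """e_n(m) = b(m) * std(e) * exp(j * alpha(m)), with b(m) unit white
    Gaussian noise and alpha(m) assumed to be the phase of e(m)."""
    s = std(e)
    return [rng.gauss(0.0, 1.0) * s * cmath.exp(1j * cmath.phase(v)) for v in e]


def harmonic_excitation(n_samples, w0, amps):
    """e_h(m) = sum over k = -L..L of A_k * exp(j*k*w0*m),
    where amps[k + L] holds A_k (so len(amps) == 2*L + 1)."""
    L = (len(amps) - 1) // 2
    return [sum(a * cmath.exp(1j * k * w0 * m)
                for k, a in zip(range(-L, L + 1), amps))
            for m in range(n_samples)]


def synthesize_excitation(e, voiced, w0, amps, rng):
    """Voiced frames: harmonic part plus noise part. Unvoiced: noise part only."""
    e_n = noise_excitation(e, rng)
    if not voiced:
        return e_n
    e_h = harmonic_excitation(len(e), w0, amps)
    return [h + n for h, n in zip(e_h, e_n)]
```

The only difference between the two branches is the harmonic sum, which mirrors the slide: the noise term has the same form in the voiced and unvoiced cases.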
Future Work
• Refining voiced/unvoiced classification in pitch estimation. More precise evaluation of the improved pitch estimation method compared with manually corrected pitch tracks.
• Integrate FTLP smoothing into the current speech enhancement system.
• Evaluation of the whole proposed speech enhancement system by ISD, PESQ and perceptual tests.
Thank You!