![Page 1: Advances in WP1](https://reader036.vdocuments.mx/reader036/viewer/2022062520/56815a12550346895dc75bec/html5/thumbnails/1.jpg)
Advances in WP1
Nancy Meeting – 6-7 July 2006
www.loquendo.com
![Page 2: Advances in WP1](https://reader036.vdocuments.mx/reader036/viewer/2022062520/56815a12550346895dc75bec/html5/thumbnails/2.jpg)
WP1: Environment & Sensor Robustness
T1.2 Noise Independence
Noise Reduction:
– Spectral Subtraction (YEAR 1) and Spectral Attenuation (YEAR 2)
“Automatic Speech Recognition With a Modified Ephraim-Malah Rule”,
Roberto Gemello, Franco Mana and Renato De Mori,
IEEE Signal Processing Letters, Vol. 13, No. 1, January 2006
– Evaluation of HEQ for feature normalization (HEQ study + Revision 2)
![Page 3: Advances in WP1](https://reader036.vdocuments.mx/reader036/viewer/2022062520/56815a12550346895dc75bec/html5/thumbnails/3.jpg)
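Denoising Techniques for Y2 evaluations (1)
Spectral Attenuation (or spectral weighting) is a form of audio signal enhancement in which noise suppression can be viewed as the application of a suppression rule, or non-negative real-valued gain $G_k$, to each bin $k$ of the observed signal magnitude spectrum, in order to form an estimate of the original signal magnitude spectrum:
$$\hat{X}_k = G_k\,Y_k$$
Ephraim–Malah MMSE log estimator rule:
$$G_k = \frac{\xi_k}{1+\xi_k}\,\exp\left(\frac{1}{2}\int_{v_k}^{\infty}\frac{e^{-t}}{t}\,dt\right),\qquad v_k = \frac{\xi_k}{1+\xi_k}\,\gamma_k$$
A priori SNR and its estimate:
$$\xi_k = \frac{|X_k|^2}{|D_k|^2},\qquad \hat{\xi}_k(m) = \eta\,\frac{|\hat{X}_k(m-1)|^2}{|\hat{D}_k(m-1)|^2} + (1-\eta)\,\max\left[0,\ \gamma_k(m)-1\right],\quad \eta\in[0,1)$$
A posteriori SNR and its estimate:
$$\gamma_k = \frac{|Y_k|^2}{|D_k|^2},\qquad \hat{\gamma}_k = \frac{|Y_k|^2}{|\hat{D}_k|^2}$$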
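The gain rule on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Loquendo implementation: the function names, the smoothing value `eta=0.98` and the small floors that keep the exponential integral finite are all hypothetical choices. The exponential integral is written out by hand so the block needs only the standard library.

```python
import math

EULER_GAMMA = 0.5772156649015329
TINY = 1e-12  # assumed floor, keeps the exponential integral finite


def exp1(x: float) -> float:
    """Exponential integral E1(x) = int_x^inf exp(-t)/t dt, for x > 0."""
    if x <= 0.0:
        raise ValueError("x must be positive")
    if x <= 1.0:
        # Power series: E1(x) = -gamma - ln(x) - sum_{n>=1} (-x)^n / (n * n!)
        total = -EULER_GAMMA - math.log(x)
        term = 1.0
        for n in range(1, 40):
            term *= -x / n      # term is now (-x)^n / n!
            total -= term / n
        return total
    # Continued fraction, evaluated with the modified Lentz method
    b = x + 1.0
    c = 1e300
    d = 1.0 / b
    h = d
    for i in range(1, 200):
        a = -float(i * i)
        b += 2.0
        d = 1.0 / (a * d + b)
        c = b + a / c
        delta = c * d
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return h * math.exp(-x)


def decision_directed_xi(x_prev_pow, d_prev_pow, gamma, eta=0.98):
    """A priori SNR estimate from the previous frame and the current a posteriori SNR."""
    return eta * x_prev_pow / d_prev_pow + (1.0 - eta) * max(0.0, gamma - 1.0)


def em_log_mmse_gain(xi, gamma):
    """Ephraim-Malah log-MMSE gain for one bin, given a priori and a posteriori SNR."""
    xi = max(xi, TINY)
    v = max(xi / (1.0 + xi) * gamma, TINY)
    return xi / (1.0 + xi) * math.exp(0.5 * exp1(v))
```

For a bin with `xi = 1` and `gamma = 2` the gain comes out a little above the Wiener value `xi / (1 + xi) = 0.5`, and for very high SNR it approaches 1, so clean bins are left almost untouched.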
![Page 4: Advances in WP1](https://reader036.vdocuments.mx/reader036/viewer/2022062520/56815a12550346895dc75bec/html5/thumbnails/4.jpg)
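Denoising Techniques for Y2 evaluations (2)
Modified Ephraim–Malah MMSE log estimator rule:
$$\tilde{G}_k\big(\xi_k(m),\gamma_k(m)\big) = G_k\big(\tilde{\xi}_k(m),\tilde{\gamma}_k(m)\big)$$
We propose to make the estimation of the a priori and the a posteriori SNR dependent on the noise overestimation factor $\alpha(m)$ and the spectral floor $\beta(m)$ as follows:
$$\xi_k = \frac{|X_k|^2}{|D_k|^2}\ \longrightarrow\ \hat{\tilde{\xi}}_k(m) = \max\left(\eta(m)\,\frac{|\hat{X}_k(m-1)|^2}{\alpha(m)\,|\hat{D}_k(m-1)|^2} + \big(1-\eta(m)\big)\left[\tilde{\gamma}_k(m)-1\right],\ \beta(m)\right)$$
$$\gamma_k = \frac{|Y_k|^2}{|D_k|^2}\ \longrightarrow\ \tilde{\gamma}_k(m) = \max\left(\frac{|Y_k(m)|^2}{\alpha(m)\,|\hat{D}_k(m)|^2},\ \beta(m)+1\right)$$
[Figure: two plots of $\alpha(m)$ and $\beta(m)$ against SNR(m) in dB. Axis values that survive: 1.5, 1.0, 0.01 and 0.001; SNR(m) ticks at 0, 10, 15 and 20 dB.]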
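A minimal Python sketch of the two modified SNR estimates follows. It assumes the overestimation factor `alpha`, the spectral floor `beta` and the smoothing weight `eta` have already been chosen for the current frame; how they depend on SNR(m) is shown only in the slide's plots, so they are plain arguments here. The function and argument names are hypothetical.

```python
def modified_snrs(y_pow, x_prev_pow, d_prev_pow, d_pow, alpha, beta, eta):
    """Return (xi_tilde, gamma_tilde) for one bin of the current frame.

    y_pow       -- |Y_k(m)|^2, noisy power in the current frame
    x_prev_pow  -- |X_k(m-1)|^2, enhanced power from the previous frame
    d_prev_pow  -- |D_k(m-1)|^2, noise power estimate of the previous frame
    d_pow       -- |D_k(m)|^2, noise power estimate of the current frame
    alpha, beta -- overestimation factor and spectral floor for this frame
    eta         -- smoothing weight in [0, 1)
    """
    # A posteriori SNR with overestimated noise, floored at beta + 1
    gamma_t = max(y_pow / (alpha * d_pow), beta + 1.0)
    # A priori SNR: decision-directed mix, floored at beta
    xi_t = max(
        eta * x_prev_pow / (alpha * d_prev_pow) + (1.0 - eta) * (gamma_t - 1.0),
        beta,
    )
    return xi_t, gamma_t
```

Because both estimates are floored, the a priori SNR never reaches zero, so the gain stays strictly positive even in frames dominated by noise.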
![Page 5: Advances in WP1](https://reader036.vdocuments.mx/reader036/viewer/2022062520/56815a12550346895dc75bec/html5/thumbnails/5.jpg)
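Denoising Techniques for Y2 evaluations (3)
The noise spectrum amplitude is obtained by a first-order recursion in conjunction with an energy based Voice Activity Detector (VAD) as follows:
$$|\hat{D}_k(m)| = \begin{cases} \lambda\,|\hat{D}_k(m-1)| + (1-\lambda)\,|Y_k(m)| & \text{if } |Y_k(m)| - |\hat{D}_k(m)| < \kappa\,\sigma(m)\ \wedge\ \text{VAD} = \text{false} \\ |\hat{D}_k(m-1)| & \text{otherwise} \end{cases}$$
where $\lambda$ controls the update speed of the recursion (0.9), $\kappa$ controls the allowed dynamics of noise (4.0), and the noise standard deviation $\sigma(m)$ is estimated as:
$$\sigma^2(m) = \lambda\,\sigma^2(m-1) + (1-\lambda)\,\big(|Y_k(m)| - |\hat{D}_k(m)|\big)^2$$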
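The recursion can be sketched as a per-bin update in Python. This is an illustration under several assumptions rather than the authors' code: the gating test uses the previous frame's noise amplitude and standard deviation so the update stays causal, the deviation is tracked per bin, and the names `update_noise`, `update_speed` and `dynamics` are hypothetical. Only the values 0.9 and 4.0 come from the slide.

```python
import math


def update_noise(d_prev, y, sigma2_prev, speech_detected,
                 update_speed=0.9, dynamics=4.0):
    """One frame of noise amplitude tracking for a single bin.

    d_prev          -- noise amplitude estimate of the previous frame
    y               -- observed amplitude |Y_k(m)| of the current frame
    sigma2_prev     -- noise variance estimate of the previous frame
    speech_detected -- True when the VAD flags the frame as speech
    Returns the new (noise amplitude, noise variance).
    """
    # Assumption: gate on the previous estimates to keep the test causal
    within_dynamics = (y - d_prev) < dynamics * math.sqrt(sigma2_prev)
    if within_dynamics and not speech_detected:
        d_new = update_speed * d_prev + (1.0 - update_speed) * y
    else:
        d_new = d_prev  # freeze the estimate during speech or outliers
    sigma2_new = (update_speed * sigma2_prev
                  + (1.0 - update_speed) * (y - d_new) ** 2)
    return d_new, sigma2_new
```

With an update speed of 0.9 each accepted frame moves the estimate one tenth of the way toward the observation, so the tracker follows slow noise drifts while the VAD and the dynamics test keep speech energy out of the estimate.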
![Page 6: Advances in WP1](https://reader036.vdocuments.mx/reader036/viewer/2022062520/56815a12550346895dc75bec/html5/thumbnails/6.jpg)
Baseline evaluations of Loquendo ASR on Aurora2 speech databases
![Page 7: Advances in WP1](https://reader036.vdocuments.mx/reader036/viewer/2022062520/56815a12550346895dc75bec/html5/thumbnails/7.jpg)
Year 1+2 Performance evaluations

| Condition | Test A (Clean models) | Test A (Multi models) | Test B (Clean models) | Test B (Multi models) | Test C (Clean models) | Test C (Multi models) | A-B-C Avg (Clean models) | A-B-C Avg (Multi models) |
|---|---|---|---|---|---|---|---|---|
| ND | 24.4 | 6.5 | 22.5 | 8.9 | 24.7 | 9.8 | 23.7 | 8.1 |
| WM | 16.0 (34.4) | 6.1 (6.1) | 15.6 (30.7) | 7.9 (11.2) | 16.7 (32.4) | 9.5 (3.0) | 16.0 (32.5) | 7.5 (7.4) |
| EMM | 14.7 (39.7) | 6.0 (7.7) | 15.8 (29.8) | 8.0 (10.1) | 15.2 (38.5) | 8.9 (9.2) | 15.2 (35.9) | 7.4 (8.6) |

The testing conditions used in the experiments are the following:
1) No Denoising (ND): Rasta PLP features (RPLP) are used without any preliminary noise reduction.
2) Wiener modified (WM): RPLP with Wiener filtering dependent on global SNR.
3) Ephraim-Malah modified (EMM): RPLP with noise reduction based on the modified Ephraim-Malah spectral attenuation rule.
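The figures in parentheses in the results are consistent with the relative error reduction of each condition with respect to ND. The slides do not state this explicitly, so the short Python check below treats it as an assumption and reproduces three of the reported values; the function name is hypothetical.

```python
def relative_reduction(baseline_error, system_error):
    """Relative error reduction in percent with respect to a baseline."""
    return 100.0 * (baseline_error - system_error) / baseline_error


# (ND error, denoised error) pairs taken from the result tables
examples = {
    "Aurora2 Test A, clean models, WM": (24.4, 16.0),
    "Aurora3 Ita HM, EMM": (53.4, 17.8),
    "Aurora4 multi models, CLEAN, WM": (15.7, 16.6),
}

for name, (nd, system) in examples.items():
    print(f"{name}: {relative_reduction(nd, system):.1f}")
```

A negative value, as in the last example, means the denoised condition performs worse than ND.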
![Page 8: Advances in WP1](https://reader036.vdocuments.mx/reader036/viewer/2022062520/56815a12550346895dc75bec/html5/thumbnails/8.jpg)
Baseline evaluations of Loquendo ASR on Aurora3 speech databases
![Page 9: Advances in WP1](https://reader036.vdocuments.mx/reader036/viewer/2022062520/56815a12550346895dc75bec/html5/thumbnails/9.jpg)
Year 1+2 Performance evaluations

| Condition | Ita WM | Ita HM | Spa WM | Spa HM |
|---|---|---|---|---|
| ND | 1.8 | 53.4 | 2.7 | 25.4 |
| WM | 1.7 (5.5) | 22.5 (57.9) | 2.4 (11.1) | 10.1 (60.2) |
| EMM | 1.6 (11.1) | 17.8 (66.7) | 2.3 (14.8) | 11.5 (54.7) |
![Page 10: Advances in WP1](https://reader036.vdocuments.mx/reader036/viewer/2022062520/56815a12550346895dc75bec/html5/thumbnails/10.jpg)
Baseline evaluations of Loquendo ASR on Aurora4 speech databases
![Page 11: Advances in WP1](https://reader036.vdocuments.mx/reader036/viewer/2022062520/56815a12550346895dc75bec/html5/thumbnails/11.jpg)
Year 1+2 Performance evaluations
CLEAN models

| Condition | CLEAN | Car | Babble | Restaurant | Street | Airport | Train Station | Noise avg. |
|---|---|---|---|---|---|---|---|---|
| ND | 14.8 | 45.7 | 76.9 | 70.6 | 66.0 | 70.7 | 67.7 | 66.3 |
| WM | 14.8 (0.0) | 33.0 (27.8) | 63.4 (17.5) | 69.3 (1.8) | 56.9 (13.8) | 68.1 (3.7) | 51.2 (24.4) | 57.0 (14.0) |
| EMM | 14.5 (2.02) | 29.6 (35.2) | 62.9 (18.2) | 68.4 (3.1) | 54.2 (17.8) | 68.4 (3.2) | 46.3 (31.6) | 55.0 (17.0) |
![Page 12: Advances in WP1](https://reader036.vdocuments.mx/reader036/viewer/2022062520/56815a12550346895dc75bec/html5/thumbnails/12.jpg)
Year 1+2 Performance evaluations
MULTI models

| Condition | CLEAN | Car | Babble | Restaurant | Street | Airport | Train Station | Noise avg. |
|---|---|---|---|---|---|---|---|---|
| ND | 15.7 | 24.8 | 40.1 | 41.8 | 41.9 | 39.1 | 42.3 | 38.3 |
| WM | 16.6 (-5.7) | 24.1 (2.8) | 39.7 (1.0) | 43.2 (-3.3) | 39.6 (5.5) | 39.5 (-1.0) | 37.1 (12.3) | 37.2 (2.9) |
| EMM | 15.5 (1.3) | 24.7 (0.4) | 40.4 (-0.7) | 44.2 (-5.7) | 39.5 (5.7) | 40.4 (-3.3) | 38.2 (9.7) | 37.9 (1.0) |
![Page 13: Advances in WP1](https://reader036.vdocuments.mx/reader036/viewer/2022062520/56815a12550346895dc75bec/html5/thumbnails/13.jpg)
HEQ + Denoising techniques
![Page 14: Advances in WP1](https://reader036.vdocuments.mx/reader036/viewer/2022062520/56815a12550346895dc75bec/html5/thumbnails/14.jpg)
HEQ Evaluation: Revision 1 (1) (Loquendo & UGR)
[Diagram: HEQ (121) applied to the feature vector E+12CEP, DE+12DEP, DDE+12DDEP (39 coefficients)]
Problems:
(1) Context dependency (whole-utterance CDF estimation works best)
(2) High variability in background noise segments
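As a rough illustration of what histogram equalization (HEQ) over a 121-frame window involves, the Python sketch below equalizes one coefficient track: each value is replaced by the Gaussian quantile of its empirical CDF inside a sliding window. The Gaussian reference distribution, the mid-rank CDF estimate and the function name are assumptions made for this sketch; the slides do not describe the UGR implementation at this level of detail.

```python
from statistics import NormalDist


def heq_track(values, window=121):
    """Equalize one feature coefficient over time with a sliding window.

    values -- list of floats, one coefficient across consecutive frames
    window -- number of frames used to estimate the CDF (odd, centered)
    """
    reference = NormalDist()  # assumed target: zero mean, unit variance
    half = window // 2
    out = []
    for t, x in enumerate(values):
        segment = values[max(0, t - half): t + half + 1]
        # Mid-rank empirical CDF, always strictly between 0 and 1
        below = sum(1 for v in segment if v < x)
        ties = sum(1 for v in segment if v == x)
        cdf = (below + 0.5 * ties) / len(segment)
        out.append(reference.inv_cdf(cdf))
    return out
```

Because the mapping depends only on ranks inside the window, a change of gain or offset in the input leaves the output unchanged. The same property explains the context dependency listed above: with a short window the estimated CDF, and therefore the output, changes with the surrounding frames.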
![Page 15: Advances in WP1](https://reader036.vdocuments.mx/reader036/viewer/2022062520/56815a12550346895dc75bec/html5/thumbnails/15.jpg)
HEQ Integration: Revision 1 (2) (Loquendo & UGR)
[Diagram: Loquendo FE, Denoise (power spectrum level) → UGR HEQ, Feature Normalization (frame level, 39 coefficients) → Loquendo ASR, Phoneme-based Models]
AURORA3 ITA - HM

| System | SA | WA | WI | WD | WS |
|---|---|---|---|---|---|
| Loquendo | 46.6% | 77.5% | 4.8% | 7.2% | 10.4% |
| +HEQ121 | 38.2% | 69.6% | 4.3% | 12.6% | 13.5% |
| HEQ121 | 37.9% | 69.1% | 3.5% | 13.8% | 13.5% |
| +HEQ1001 | 46.5% | 77.7% | 4.0% | 7.3% | 11.0% |
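The column labels are not expanded on the slide. Assuming WA is word accuracy and WI, WD and WS are the word insertion, deletion and substitution rates, the rows should satisfy WA = 100 − (WI + WD + WS) up to rounding. The Python check below tests that reading against the four rows of the table; it is a consistency check on the assumed interpretation, not a definition taken from the source.

```python
# (WA, WI, WD, WS) in percent, copied from the AURORA3 ITA - HM table
rows = {
    "Loquendo": (77.5, 4.8, 7.2, 10.4),
    "+HEQ121": (69.6, 4.3, 12.6, 13.5),
    "HEQ121": (69.1, 3.5, 13.8, 13.5),
    "+HEQ1001": (77.7, 4.0, 7.3, 11.0),
}


def accuracy_from_errors(wi, wd, ws):
    """Word accuracy implied by insertion, deletion and substitution rates."""
    return 100.0 - (wi + wd + ws)


for name, (wa, wi, wd, ws) in rows.items():
    implied = accuracy_from_errors(wi, wd, ws)
    print(f"{name}: reported {wa:.1f}, implied {implied:.1f}")
```

Under this reading, the drop in accuracy with the 121-frame HEQ in this table comes mainly from deletions and substitutions, while insertions go down slightly.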
![Page 16: Advances in WP1](https://reader036.vdocuments.mx/reader036/viewer/2022062520/56815a12550346895dc75bec/html5/thumbnails/16.jpg)
HEQ Evaluation: Revision 2 (3) (Loquendo & UGR)
[Diagram: three HEQ (1573) blocks alongside the streams E+12CEP, DE+12DEP and DDE+12DDEP (39 coefficients)]
Benefits:
(1) Relations in magnitude and dynamics among coefficients are preserved
(2) More stable CDF estimation, similar to extending the HEQ temporal window
![Page 17: Advances in WP1](https://reader036.vdocuments.mx/reader036/viewer/2022062520/56815a12550346895dc75bec/html5/thumbnails/17.jpg)
HEQ Evaluation: Revision 2 (4) (Loquendo & UGR)
AURORA3 ITA - HM

| System | SA | WA | WI | WD | WS |
|---|---|---|---|---|---|
| WM | 46.6% | 77.5% | 4.8% | 7.2% | 10.4% |
| HEQ121 | 47.9% | 77.7% | 5.1% | 6.7% | 10.5% |
| HEQ241 | 49.7% | 79.7% | 4.3% | 6.6% | 9.3% |
| WM+HEQ121 | 49.0% | 79.2% | 5.1% | 5.7% | 10.0% |
| WM+HEQ241 | 50.8% | 79.8% | 4.6% | 6.1% | 9.4% |
![Page 18: Advances in WP1](https://reader036.vdocuments.mx/reader036/viewer/2022062520/56815a12550346895dc75bec/html5/thumbnails/18.jpg)
HEQ for denoising (5) (Loquendo & UGR)
Comparing RPLP / HEQrev1 / HEQrev2 using the same clean and noisy signal
![Page 19: Advances in WP1](https://reader036.vdocuments.mx/reader036/viewer/2022062520/56815a12550346895dc75bec/html5/thumbnails/19.jpg)
HEQ for signal level equalization (6) (Loquendo & UGR)
Comparing RPLP / HEQrev1 / HEQrev2 using the same clean signal at normal gain level and at low gain level
![Page 20: Advances in WP1](https://reader036.vdocuments.mx/reader036/viewer/2022062520/56815a12550346895dc75bec/html5/thumbnails/20.jpg)
WP1: Workplan
• Selection of suitable benchmark databases (m6)
• Completion of LASR baseline experimentation of Spectral Subtraction (Wiener SNR dependent) (m12)
• Discriminative VAD (training + AURORA3 testing) (m16)
• Experimentation of Spectral Attenuation rule (Ephraim-Malah SNR dependent) (m21)
• Preliminary results on spectral subtraction and HEQ techniques (m24)
• Integration of denoising and normalization techniques (m33)
• Noise estimation and reduction for non-stationary noises (m33)