![Page 1: Speech Enhancement for ASR by Hans Hwang 8/23/2000 Reference 1. Alan V. Oppenheim,etc., ” Multi-Channel Signal Separation by Decorrelation ”,IEEE Trans](https://reader036.vdocuments.mx/reader036/viewer/2022062519/5697bfac1a28abf838c9b8c5/html5/thumbnails/1.jpg)
Speech Enhancement for ASR
by Hans Hwang 8/23/2000
Reference
1. Alan V. Oppenheim et al., ”Multi-Channel Signal Separation by Decorrelation”, IEEE Trans. on ASSP, 405-413, 1993
2. Yunxin Zhao et al., ”Adaptive Co-channel Speech Separation and Recognition”, IEEE Trans. on SAP, 138-151, 1999
3. Ing Yang Soon et al., ”Noisy Speech Enhancement Using Discrete Cosine Transform”, Speech Communication, 249-257, 1998
![Page 2: Speech Enhancement for ASR by Hans Hwang 8/23/2000 Reference 1. Alan V. Oppenheim,etc., ” Multi-Channel Signal Separation by Decorrelation ”,IEEE Trans](https://reader036.vdocuments.mx/reader036/viewer/2022062519/5697bfac1a28abf838c9b8c5/html5/thumbnails/2.jpg)
Outline
-Signal Separation by S-ADF/LMS
-Speech Enhancement by DCT
-Residual Signal Reduction
-Experimental Results
![Page 3: Speech Enhancement for ASR by Hans Hwang 8/23/2000 Reference 1. Alan V. Oppenheim,etc., ” Multi-Channel Signal Separation by Decorrelation ”,IEEE Trans](https://reader036.vdocuments.mx/reader036/viewer/2022062519/5697bfac1a28abf838c9b8c5/html5/thumbnails/3.jpg)
Speech Signal Separation (SSS)
Introduction:
-Recover the desired signal and identify the unknown system from the observed signal.
-The speech signal recovered by SSS has a higher SNR, which improves speech recognition accuracy.
-We specifically consider the two-channel case.
![Page 4: Speech Enhancement for ASR by Hans Hwang 8/23/2000 Reference 1. Alan V. Oppenheim,etc., ” Multi-Channel Signal Separation by Decorrelation ”,IEEE Trans](https://reader036.vdocuments.mx/reader036/viewer/2022062519/5697bfac1a28abf838c9b8c5/html5/thumbnails/4.jpg)
SSS (cont’d)
Two-channel model description
A and B are the cross-coupling effects between the channels; the transfer function of each channel itself is ignored. $x_i(t)$ is a source signal and $y_i(t)$ is an acquired signal.
$$H = \begin{bmatrix} 1 & A \\ B & 1 \end{bmatrix}$$
![Page 5: Speech Enhancement for ASR by Hans Hwang 8/23/2000 Reference 1. Alan V. Oppenheim,etc., ” Multi-Channel Signal Separation by Decorrelation ”,IEEE Trans](https://reader036.vdocuments.mx/reader036/viewer/2022062519/5697bfac1a28abf838c9b8c5/html5/thumbnails/5.jpg)
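SSS (cont’d)
Source separation system (separates the source signals from the acquired signals)
$$H^{-1} = \frac{1}{1 - AB}\begin{bmatrix} 1 & -A \\ -B & 1 \end{bmatrix}, \qquad C = 1 - \hat{A}\hat{B}$$
$\hat{A}$ and $\hat{B}$ are called decoupling filters and are modeled as FIR filters.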
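As a rough illustration of the two-channel model and its inverse, the sketch below mixes two signals and separates them again. It assumes the cross-couplings are plain scalar gains rather than FIR filters, and that the decoupling gains are already known exactly; the names `mix`, `separate`, `a_hat` and `b_hat` are hypothetical and not taken from the slides.

```python
import numpy as np

def mix(x1, x2, a, b):
    """Two-channel coupling with scalar gains (assumption): y = H x, H = [[1, a], [b, 1]]."""
    return x1 + a * x2, b * x1 + x2

def separate(y1, y2, a_hat, b_hat):
    """Subtract the estimated cross-talk, then divide by 1 - a_hat * b_hat."""
    c = 1.0 - a_hat * b_hat
    return (y1 - a_hat * y2) / c, (y2 - b_hat * y1) / c

rng = np.random.default_rng(0)
x1 = rng.standard_normal(1000)   # stand-in for source 1
x2 = rng.standard_normal(1000)   # stand-in for source 2
y1, y2 = mix(x1, x2, a=0.5, b=0.3)
x1_hat, x2_hat = separate(y1, y2, a_hat=0.5, b_hat=0.3)
```

With exact decoupling gains the sources are recovered up to rounding error; in practice the gains (or filters) have to be estimated from the observations.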
![Page 6: Speech Enhancement for ASR by Hans Hwang 8/23/2000 Reference 1. Alan V. Oppenheim,etc., ” Multi-Channel Signal Separation by Decorrelation ”,IEEE Trans](https://reader036.vdocuments.mx/reader036/viewer/2022062519/5697bfac1a28abf838c9b8c5/html5/thumbnails/6.jpg)
SSS by ADF
Calculate the FIR coefficients with the adaptive decorrelation filter (ADF) proposed by A. V. Oppenheim in 1993.
-The objective is to design the decoupling filters such that the estimated signals are uncorrelated.
-The decoupling filter coefficients are estimated iteratively from the previously estimated filter coefficients and the current observations.
![Page 7: Speech Enhancement for ASR by Hans Hwang 8/23/2000 Reference 1. Alan V. Oppenheim,etc., ” Multi-Channel Signal Separation by Decorrelation ”,IEEE Trans](https://reader036.vdocuments.mx/reader036/viewer/2022062519/5697bfac1a28abf838c9b8c5/html5/thumbnails/7.jpg)
SSS by ADF (cont’d) The closed form of decoupling filters
where
![Page 8: Speech Enhancement for ASR by Hans Hwang 8/23/2000 Reference 1. Alan V. Oppenheim,etc., ” Multi-Channel Signal Separation by Decorrelation ”,IEEE Trans](https://reader036.vdocuments.mx/reader036/viewer/2022062519/5697bfac1a28abf838c9b8c5/html5/thumbnails/8.jpg)
SSS by ADF (cont’d)
Choice of adaptation gain
-As time goes to infinity, the adaptation gain goes to zero, which keeps the system stable.
-The adaptation gain is chosen optimally for system stability and convergence:
trt )(
![Page 9: Speech Enhancement for ASR by Hans Hwang 8/23/2000 Reference 1. Alan V. Oppenheim,etc., ” Multi-Channel Signal Separation by Decorrelation ”,IEEE Trans](https://reader036.vdocuments.mx/reader036/viewer/2022062519/5697bfac1a28abf838c9b8c5/html5/thumbnails/9.jpg)
SSS by ADF (cont’d)
The experiment on the adaptation gain as a function of time t:
-gain = 5/t
-gain = 0.2/t & gain = …/t
![Page 10: Speech Enhancement for ASR by Hans Hwang 8/23/2000 Reference 1. Alan V. Oppenheim,etc., ” Multi-Channel Signal Separation by Decorrelation ”,IEEE Trans](https://reader036.vdocuments.mx/reader036/viewer/2022062519/5697bfac1a28abf838c9b8c5/html5/thumbnails/10.jpg)
Source Signal Detection (SSD)
Introduction
-If one of the two source signals is inactive, the signals estimated by the ADF will be poor and cause recognition errors.
-So ASR and ADF are performed only within the active region of each target signal.
![Page 11: Speech Enhancement for ASR by Hans Hwang 8/23/2000 Reference 1. Alan V. Oppenheim,etc., ” Multi-Channel Signal Separation by Decorrelation ”,IEEE Trans](https://reader036.vdocuments.mx/reader036/viewer/2022062519/5697bfac1a28abf838c9b8c5/html5/thumbnails/11.jpg)
SSD (cont’d)
![Page 12: Speech Enhancement for ASR by Hans Hwang 8/23/2000 Reference 1. Alan V. Oppenheim,etc., ” Multi-Channel Signal Separation by Decorrelation ”,IEEE Trans](https://reader036.vdocuments.mx/reader036/viewer/2022062519/5697bfac1a28abf838c9b8c5/html5/thumbnails/12.jpg)
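SSD (cont’d)
SSD by coherence function
If $K_{1,E} \approx K_{2,E}$ then the coherence at frequency $k$ is close to 0.
If $K_{1,E} \gg K_{2,E}$ or $K_{1,E} \ll K_{2,E}$ then the coherence at frequency $k$ is close to 1.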
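The idea behind a coherence-based detector can be sketched as follows: when only one source is active, both channels carry filtered copies of the same signal and the magnitude-squared coherence is near 1; when both independent sources are active it drops. This is only a sketch under assumptions: scalar cross-coupling gains, white-noise stand-ins for speech, and a simple frame-averaged DFT estimate. The function name `msc` and all parameter values are hypothetical.

```python
import numpy as np

def msc(y1, y2, frame=256):
    """Magnitude-squared coherence per DFT bin, averaged over non-overlapping frames."""
    n_frames = len(y1) // frame
    w = np.hanning(frame)
    f1 = np.fft.rfft(y1[:n_frames * frame].reshape(n_frames, frame) * w, axis=1)
    f2 = np.fft.rfft(y2[:n_frames * frame].reshape(n_frames, frame) * w, axis=1)
    s12 = np.mean(f1 * np.conj(f2), axis=0)    # cross-spectrum estimate
    s11 = np.mean(np.abs(f1) ** 2, axis=0)     # auto-spectrum of channel 1
    s22 = np.mean(np.abs(f2) ** 2, axis=0)     # auto-spectrum of channel 2
    return np.abs(s12) ** 2 / (s11 * s22 + 1e-12)

rng = np.random.default_rng(2)
x1 = rng.standard_normal(16384)
x2 = rng.standard_normal(16384)
a, b = 0.3, 0.3                                 # scalar cross-coupling (assumption)
coh_one = msc(x1, b * x1).mean()                # only source 1 active
coh_both = msc(x1 + a * x2, b * x1 + x2).mean() # both sources active
```

A decision variable could then be formed by thresholding the coherence averaged over frequency; the threshold itself would have to be tuned on data.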
![Page 13: Speech Enhancement for ASR by Hans Hwang 8/23/2000 Reference 1. Alan V. Oppenheim,etc., ” Multi-Channel Signal Separation by Decorrelation ”,IEEE Trans](https://reader036.vdocuments.mx/reader036/viewer/2022062519/5697bfac1a28abf838c9b8c5/html5/thumbnails/13.jpg)
SSD (cont’d) - decision variable
-Decision Rule:
![Page 14: Speech Enhancement for ASR by Hans Hwang 8/23/2000 Reference 1. Alan V. Oppenheim,etc., ” Multi-Channel Signal Separation by Decorrelation ”,IEEE Trans](https://reader036.vdocuments.mx/reader036/viewer/2022062519/5697bfac1a28abf838c9b8c5/html5/thumbnails/14.jpg)
SSD (cont’d)-Implementation using DFT and Result
![Page 15: Speech Enhancement for ASR by Hans Hwang 8/23/2000 Reference 1. Alan V. Oppenheim,etc., ” Multi-Channel Signal Separation by Decorrelation ”,IEEE Trans](https://reader036.vdocuments.mx/reader036/viewer/2022062519/5697bfac1a28abf838c9b8c5/html5/thumbnails/15.jpg)
SSD (cont’d)
![Page 16: Speech Enhancement for ASR by Hans Hwang 8/23/2000 Reference 1. Alan V. Oppenheim,etc., ” Multi-Channel Signal Separation by Decorrelation ”,IEEE Trans](https://reader036.vdocuments.mx/reader036/viewer/2022062519/5697bfac1a28abf838c9b8c5/html5/thumbnails/16.jpg)
Improved Filter Estimation
Widrow’s LMS algorithm, proposed in 1975
-If A or B cannot be observed (i.e., one of the source signals is inactive), the estimated filters will deviate strongly from the actual filters.
-If we know (using SSD) that source signal 2 is inactive, we estimate only filter B and leave filter A unchanged.
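When source 2 is inactive, channel 1 carries only source 1 and channel 2 carries source 1 filtered by B, so B can be identified from the pair of channels. The sketch below uses a textbook LMS update for this; the filter length, step size `mu`, white-noise input and the name `lms_identify` are assumptions for illustration, not values from the slides.

```python
import numpy as np

def lms_identify(u, d, n_taps, mu):
    """Plain LMS: adapt FIR weights w so that the filtered input u tracks d."""
    w = np.zeros(n_taps)
    buf = np.zeros(n_taps)            # most recent input samples, newest first
    for u_t, d_t in zip(u, d):
        buf = np.roll(buf, 1)
        buf[0] = u_t
        e = d_t - w @ buf             # a-priori error
        w = w + mu * e * buf          # stochastic-gradient step
    return w

rng = np.random.default_rng(1)
x1 = rng.standard_normal(5000)            # active source (white-noise stand-in)
b_true = np.array([0.4, 0.2, -0.1])       # hypothetical cross-coupling filter B
y1 = x1                                   # source 2 inactive: channel 1 is x1
y2 = np.convolve(x1, b_true)[:len(x1)]    # channel 2 is B applied to x1
b_hat = lms_identify(y1, y2, n_taps=3, mu=0.01)
```

In this noise-free setting `b_hat` converges to `b_true`; filter A would be left at its previous value during such a segment.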
![Page 17: Speech Enhancement for ASR by Hans Hwang 8/23/2000 Reference 1. Alan V. Oppenheim,etc., ” Multi-Channel Signal Separation by Decorrelation ”,IEEE Trans](https://reader036.vdocuments.mx/reader036/viewer/2022062519/5697bfac1a28abf838c9b8c5/html5/thumbnails/17.jpg)
Improved Filter Estimation LMS algorithm and result
![Page 18: Speech Enhancement for ASR by Hans Hwang 8/23/2000 Reference 1. Alan V. Oppenheim,etc., ” Multi-Channel Signal Separation by Decorrelation ”,IEEE Trans](https://reader036.vdocuments.mx/reader036/viewer/2022062519/5697bfac1a28abf838c9b8c5/html5/thumbnails/18.jpg)
Experimental Results: evaluated in terms of WRA and SIR
![Page 19: Speech Enhancement for ASR by Hans Hwang 8/23/2000 Reference 1. Alan V. Oppenheim,etc., ” Multi-Channel Signal Separation by Decorrelation ”,IEEE Trans](https://reader036.vdocuments.mx/reader036/viewer/2022062519/5697bfac1a28abf838c9b8c5/html5/thumbnails/19.jpg)
Experimental Result (cont’d)
*717 TIMIT sentences are used to train 62 phone units.
The front-end features are PLP and its dynamic features. The grammar perplexity is 105.
After acoustic normalization
![Page 20: Speech Enhancement for ASR by Hans Hwang 8/23/2000 Reference 1. Alan V. Oppenheim,etc., ” Multi-Channel Signal Separation by Decorrelation ”,IEEE Trans](https://reader036.vdocuments.mx/reader036/viewer/2022062519/5697bfac1a28abf838c9b8c5/html5/thumbnails/20.jpg)
Speech Enhancement using Discrete Cosine Transform
Motivation
-The DCT provides significantly higher energy compaction than the DFT.
![Page 21: Speech Enhancement for ASR by Hans Hwang 8/23/2000 Reference 1. Alan V. Oppenheim,etc., ” Multi-Channel Signal Separation by Decorrelation ”,IEEE Trans](https://reader036.vdocuments.mx/reader036/viewer/2022062519/5697bfac1a28abf838c9b8c5/html5/thumbnails/21.jpg)
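SE Using DCT (cont’d)
-The DCT provides higher spectral resolution than the DFT.
-The DCT is a real transform, so its coefficients have only binary phases (signs). The phase does not change unless the added noise is strong.
$$X(k) = \alpha(k) \sum_{n=0}^{N-1} x(n) \cos\left(\frac{(2n+1)\pi k}{2N}\right)$$
$$x(n) = \sum_{k=0}^{N-1} \alpha(k) X(k) \cos\left(\frac{(2n+1)\pi k}{2N}\right)$$
$$\alpha(0) = \sqrt{\frac{1}{N}} \quad \& \quad \alpha(k) = \sqrt{\frac{2}{N}}, \quad 1 \le k \le N-1, \qquad 0 \le n \le N-1$$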
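The orthonormal DCT pair can be checked numerically with a small matrix implementation. This is a sketch for illustration only; the function name `dct_matrix` and the test vector are made up, and a production system would normally use a fast transform instead of an explicit matrix.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix; row k holds alpha(k) * cos((2m+1) pi k / (2n))."""
    k = np.arange(n)[:, None]             # frequency index (rows)
    m = np.arange(n)[None, :]             # time index (columns)
    c = np.cos((2 * m + 1) * np.pi * k / (2 * n))
    alpha = np.full((n, 1), np.sqrt(2.0 / n))
    alpha[0, 0] = np.sqrt(1.0 / n)
    return alpha * c

N = 8
C = dct_matrix(N)
x = np.arange(N, dtype=float)   # arbitrary real test frame
X = C @ x                       # forward transform: real coefficients
x_rec = C.T @ X                 # inverse transform: the transpose, since C is orthonormal
```

Because the coefficients are real, the only "phase" information is the sign of each `X[k]`, which is what the binary-phase remark refers to.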
![Page 22: Speech Enhancement for ASR by Hans Hwang 8/23/2000 Reference 1. Alan V. Oppenheim,etc., ” Multi-Channel Signal Separation by Decorrelation ”,IEEE Trans](https://reader036.vdocuments.mx/reader036/viewer/2022062519/5697bfac1a28abf838c9b8c5/html5/thumbnails/22.jpg)
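Estimating signal by MMSE
Introduction
-$y(t) = x(t) + n(t)$ and $Y(k) = X(k) + N(k)$. Assume the DCT coefficients are statistically independent, and require the estimated signal to differ as little as possible from the original signal (minimum mean square error).
-The estimate is the conditional mean
$$\hat{X}(k) = E[X(k) \mid Y(k)]$$
By Bayes’ rule and the signal model,
$$\hat{X}(k) = \frac{\xi(k)}{\xi(k) + 1} Y(k), \qquad \xi(k) = \frac{\lambda_x(k)}{\lambda_n(k)} = \frac{E[X(k)^2]}{E[N(k)^2]}$$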
![Page 23: Speech Enhancement for ASR by Hans Hwang 8/23/2000 Reference 1. Alan V. Oppenheim,etc., ” Multi-Channel Signal Separation by Decorrelation ”,IEEE Trans](https://reader036.vdocuments.mx/reader036/viewer/2022062519/5697bfac1a28abf838c9b8c5/html5/thumbnails/23.jpg)
MMSE (cont’d)
Estimating the signal source by Decision Directed Estimation (DDE) (proposed by Ephraim & Malah in ’84)
$$\hat{\lambda}_x(k) = \alpha \hat{\lambda}_{x_p}(k) + (1 - \alpha) \max\{Y^2(k) - \lambda_n(k), 0\}$$
$\alpha$ = 0.98 in computer simulation
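A minimal sketch of how a decision-directed variance estimate can drive the MMSE gain frame by frame is given below. It assumes that the "previous" term is the squared clean-coefficient estimate from the preceding frame (the usual Ephraim and Malah convention), that the noise variance is known, and that the first frame is initialised by power subtraction; these choices and the name `enhance_frames` are assumptions, not details confirmed by the slides.

```python
import numpy as np

def enhance_frames(Y, lam_n, alpha=0.98):
    """Y: (frames, bins) real DCT coefficients; lam_n: (bins,) noise variance, all > 0."""
    X_hat = np.zeros_like(Y)
    lam_prev = np.maximum(Y[0] ** 2 - lam_n, 0.0)   # initial guess (assumption)
    for t in range(Y.shape[0]):
        inst = np.maximum(Y[t] ** 2 - lam_n, 0.0)   # power-subtraction term
        lam_x = alpha * lam_prev + (1.0 - alpha) * inst
        xi = lam_x / lam_n                          # a-priori SNR estimate
        X_hat[t] = xi / (1.0 + xi) * Y[t]           # gain xi / (1 + xi)
        lam_prev = X_hat[t] ** 2                    # previous-frame estimate (assumption)
    return X_hat

rng = np.random.default_rng(3)
Y = 3.0 * rng.standard_normal((20, 16))   # stand-in for noisy DCT frames
lam_n = np.ones(16)                       # assumed known noise variance
X_hat = enhance_frames(Y, lam_n)
```

The gain always lies between 0 and 1, so every coefficient is attenuated and its sign is kept.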
![Page 24: Speech Enhancement for ASR by Hans Hwang 8/23/2000 Reference 1. Alan V. Oppenheim,etc., ” Multi-Channel Signal Separation by Decorrelation ”,IEEE Trans](https://reader036.vdocuments.mx/reader036/viewer/2022062519/5697bfac1a28abf838c9b8c5/html5/thumbnails/24.jpg)
Reduction of Residual Signal
Introduction
-If the source signal is more likely to be present, then the estimate is more reliable.
-Two states of the input: H0: speech absent, H1: speech present
$$A(k) = P(H_1 \mid Y(k)) \hat{X}(k)$$
$A(k)$: modified filter output
![Page 25: Speech Enhancement for ASR by Hans Hwang 8/23/2000 Reference 1. Alan V. Oppenheim,etc., ” Multi-Channel Signal Separation by Decorrelation ”,IEEE Trans](https://reader036.vdocuments.mx/reader036/viewer/2022062519/5697bfac1a28abf838c9b8c5/html5/thumbnails/25.jpg)
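Reduction of Residual Signal
-$$p(H_1 \mid Y(k)) = \frac{p(H_1) p(Y(k) \mid H_1)}{p(H_1) p(Y(k) \mid H_1) + p(H_0) p(Y(k) \mid H_0)}$$
where
$$p(Y(k) \mid H_0) = N(Y(k); 0, \lambda_n(k))$$
$$p(Y(k) \mid H_1) = N(Y(k); 0, \lambda_n(k) + \lambda_x(k))$$
With equal priors $p(H_0) = p(H_1)$ and $\xi(k) = \lambda_x(k) / \lambda_n(k)$, this gives
$$p(H_1 \mid Y(k)) = \frac{1}{1 + \sqrt{1 + \xi(k)} \exp\left(-\frac{\xi(k) Y^2(k)}{2(\lambda_n(k) + \lambda_x(k))}\right)}$$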
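The posterior probability of speech presence can be evaluated directly from Bayes' rule with the two zero-mean Gaussian likelihoods. The sketch below does this for a few coefficient values; the prior `p_h1`, the variances and the function name `speech_presence_prob` are illustrative assumptions.

```python
import numpy as np

def speech_presence_prob(Y, lam_x, lam_n, p_h1=0.5):
    """Posterior P(H1 | Y) for zero-mean Gaussian noise (H0) and speech plus noise (H1)."""
    def gauss(y, var):
        return np.exp(-y ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)
    num = p_h1 * gauss(Y, lam_n + lam_x)          # speech present
    den = num + (1.0 - p_h1) * gauss(Y, lam_n)    # plus speech absent
    return num / den

Y = np.array([0.0, 1.0, 3.0])    # example DCT coefficient values
lam_x, lam_n = 4.0, 1.0          # hypothetical speech and noise variances
p = speech_presence_prob(Y, lam_x, lam_n)

# Closed form for equal priors, for comparison
xi = lam_x / lam_n
p_closed = 1.0 / (1.0 + np.sqrt(1.0 + xi) * np.exp(-xi * Y ** 2 / (2.0 * (lam_n + lam_x))))
```

Multiplying the MMSE estimate by `p` attenuates coefficients that are more likely noise only, which is the residual-reduction step described on the slide.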
![Page 26: Speech Enhancement for ASR by Hans Hwang 8/23/2000 Reference 1. Alan V. Oppenheim,etc., ” Multi-Channel Signal Separation by Decorrelation ”,IEEE Trans](https://reader036.vdocuments.mx/reader036/viewer/2022062519/5697bfac1a28abf838c9b8c5/html5/thumbnails/26.jpg)
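Experimental Results
Measured in segmental SNR, i.e. the per-frame SNR averaged over frames $f$:
$$10 \log_{10} \frac{\sum_n x_f^2(n)}{\sum_n (x_f(n) - \hat{x}_f(n))^2}$$

| * | EMF | DETF | DETF2 |
|---|---|---|---|
| 6.27 | 11.93 | 11.82 | 11.27 |
| -10.17 | -0.07 | 1.93 | 2.09 |
| -1.05 | 11.34 | 13.69 | 13.32 |
| -21.99 | -6.99 | -0.04 | 0.95 |

Noise conditions: white noise added; fan noise added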
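Segmental SNR is commonly computed by splitting the clean and enhanced signals into frames, taking the SNR of each frame in dB, and averaging. The sketch below follows that common definition; the frame length, the small `eps` guard and the name `segmental_snr` are assumptions, and the exact variant used for the reported numbers is not recoverable from the slides.

```python
import numpy as np

def segmental_snr(x, x_hat, frame=256, eps=1e-12):
    """Mean over frames of 10*log10(signal energy / error energy)."""
    n_frames = len(x) // frame
    x = x[:n_frames * frame].reshape(n_frames, frame)
    x_hat = x_hat[:n_frames * frame].reshape(n_frames, frame)
    sig = np.sum(x ** 2, axis=1)               # clean energy per frame
    err = np.sum((x - x_hat) ** 2, axis=1)     # error energy per frame
    return np.mean(10.0 * np.log10((sig + eps) / (err + eps)))

rng = np.random.default_rng(4)
x = rng.standard_normal(1024)    # stand-in for clean speech
x_hat = 1.1 * x                  # estimate with a 10 percent amplitude error
snr_db = segmental_snr(x, x_hat)
```

An amplitude error of 10 percent corresponds to an error energy of 1 percent of the signal energy in every frame, so `snr_db` comes out at about 20 dB.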
![Page 27: Speech Enhancement for ASR by Hans Hwang 8/23/2000 Reference 1. Alan V. Oppenheim,etc., ” Multi-Channel Signal Separation by Decorrelation ”,IEEE Trans](https://reader036.vdocuments.mx/reader036/viewer/2022062519/5697bfac1a28abf838c9b8c5/html5/thumbnails/27.jpg)
Experimental Results