Speech Enhancement for ASR

by Hans Hwang 8/23/2000

References
1. Alan V. Oppenheim et al., "Multi-Channel Signal Separation by Decorrelation", IEEE Trans. on ASSP, pp. 405-413, 1993.
2. Yunxin Zhao et al., "Adaptive Co-channel Speech Separation and Recognition", IEEE Trans. on SAP, pp. 138-151, 1999.
3. Ing Yang Soon et al., "Noisy Speech Enhancement Using Discrete Cosine Transform", Speech Communication, pp. 249-257, 1998.

Outline
- Signal Separation by S-ADF/LMS
- Speech Enhancement by DCT
- Residual Signal Reduction
- Experimental Results

Speech Signal Separation (SSS)
Introduction
- Recover the desired signal and identify the unknown system from the observed signal.
- Speech recovered by SSS has a higher SNR, which improves speech recognition accuracy.
- We specifically consider the two-channel case.

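SSS (cont’d)
Two-channel model description

A and B model the cross-coupling between the two channels; the direct transfer function of each channel is ignored (taken as 1). x_i(t) is the source signal and y_i(t) is the acquired signal.

$$\begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = H \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}, \qquad H = \begin{bmatrix} 1 & A \\ B & 1 \end{bmatrix}$$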

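SSS (cont’d)
Source separation system (separates the source signals from the acquired signals)

$$H^{-1} = \frac{1}{1 - AB} \begin{bmatrix} 1 & -A \\ -B & 1 \end{bmatrix}$$

$\hat{A}$ and $\hat{B}$ are called decoupling filters and are modeled as FIR filters, with

$$C = 1 - \hat{A}\hat{B}$$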
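The two-channel model and its inverse can be sketched in a few lines. This is only an illustration under a simplifying assumption made here: the cross-couplings A and B are treated as memoryless scalar gains `a` and `b`, whereas the slides model them as filters. The function names `mix` and `separate` are hypothetical.

```python
def mix(x1, x2, a, b):
    """Apply H = [[1, a], [b, 1]] sample by sample."""
    y1 = [s1 + a * s2 for s1, s2 in zip(x1, x2)]
    y2 = [b * s1 + s2 for s1, s2 in zip(x1, x2)]
    return y1, y2


def separate(y1, y2, a, b):
    """Apply H^{-1} = 1/(1 - a*b) * [[1, -a], [-b, 1]] sample by sample."""
    det = 1.0 - a * b
    if det == 0.0:
        raise ValueError("H is singular when a * b == 1")
    x1 = [(s1 - a * s2) / det for s1, s2 in zip(y1, y2)]
    x2 = [(s2 - b * s1) / det for s1, s2 in zip(y1, y2)]
    return x1, x2


# Two short source signals, mixed and then separated with the true gains.
x1 = [1.0, -2.0, 0.5]
x2 = [0.3, 0.0, -1.0]
y1, y2 = mix(x1, x2, a=0.4, b=0.2)
r1, r2 = separate(y1, y2, a=0.4, b=0.2)
```

With the true gains, `r1` and `r2` match `x1` and `x2` up to rounding. In practice the gains are unknown, which is why they have to be estimated.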

SSS by ADF
The FIR coefficients are calculated with the adaptive decorrelation filter (ADF) proposed by A. V. Oppenheim in 1993.
- The objective is to design the decoupling filters such that the estimated signals are uncorrelated.
- The decoupling filter coefficients are estimated iteratively from the previously estimated coefficients and the current observations.

SSS by ADF (cont’d)
The closed form of the decoupling filters

SSS by ADF (cont’d)
Choice of adaptation gain
- As time goes to infinity, the adaptation gain goes to zero so that the system stays stable.
- The optimal adaptation gain is chosen for system stability and convergence.

trt )(

SSS by ADF (cont’d)
Experiment on the adaptation gain as a function of t, with gains of the form c/t (including 5/t and 0.2/t)
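The iterative estimation can be sketched as follows. The update equations of the slides are not reproduced here, so the rule below is an assumption: it follows the usual time-domain decorrelation form, in which each decoupling filter is moved along the measured cross-correlation of the two outputs. The names `adf`, `recent` and `gain` are hypothetical, and the constant gain in the example is for illustration only.

```python
def adf(y1, y2, num_taps, gain):
    """Adaptive decorrelation sketch (assumed update form, see text).

    y1, y2   : acquired signals, equal-length lists of floats
    num_taps : FIR length of each decoupling filter
    gain     : function t -> adaptation gain at sample t (t starts at 1)
    Returns the final filter estimates and the decoupled outputs.
    """
    a_hat = [0.0] * num_taps
    b_hat = [0.0] * num_taps
    v1, v2 = [], []

    def recent(sig, t):
        # [sig(t), sig(t-1), ...], zero-padded before the first sample
        return [sig[t - k] if t - k >= 0 else 0.0 for k in range(num_taps)]

    for t in range(len(y1)):
        # Decouple the observations with the previous filter estimates.
        v1_t = y1[t] - sum(a * y for a, y in zip(a_hat, recent(y2, t)))
        v2_t = y2[t] - sum(b * y for b, y in zip(b_hat, recent(y1, t)))
        v1.append(v1_t)
        v2.append(v2_t)
        # Update each filter from the current output cross-products.
        mu = gain(t + 1)
        a_hat = [a + mu * v1_t * v for a, v in zip(a_hat, recent(v2, t))]
        b_hat = [b + mu * v2_t * v for b, v in zip(b_hat, recent(v1, t))]
    return a_hat, b_hat, v1, v2


# Tiny example: one tap per filter and a constant gain.
a_hat, b_hat, v1, v2 = adf([1.0, 2.0], [0.5, 1.0], num_taps=1, gain=lambda t: 0.1)
```

A decaying gain such as `lambda t: 0.2 / t` can be passed in the same way. Whether the estimates converge depends on the signals and on the gain, which this sketch does not address.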

Source Signal Detection (SSD)
Introduction
- If one of the two source signals is inactive, the ADF estimates are poor and cause recognition errors.
- ASR and ADF are therefore performed only within the active region of each target signal.

SSD (cont’d)

SSD (cont’d)
SSD by coherence function
- If $K_{1,E} \approx K_{2,E}$ then $\gamma(k) \approx 0$
- If $K_{1,E} \gg K_{2,E}$ or $K_{1,E} \ll K_{2,E}$ then $\gamma(k) \approx 1$

SSD (cont’d)
- Decision variable
- Decision rule:

SSD (cont’d)
- Implementation using the DFT, and result
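A coherence function can be computed from the DFT as sketched below. This is an illustration only: the decision variable and decision rule of the slides are not reproduced, the frame length and the averaging over non-overlapping frames are assumptions, and the names `dft` and `coherence` are hypothetical. A naive DFT is used to keep the sketch free of dependencies.

```python
import cmath
import math
import random


def dft(frame):
    """Naive DFT of a real frame, bins 0..N/2."""
    n = len(frame)
    return [sum(frame[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n))
            for k in range(n // 2 + 1)]


def coherence(y1, y2, frame_len):
    """Magnitude-squared coherence per DFT bin, averaged over frames."""
    bins = frame_len // 2 + 1
    s11 = [0.0] * bins   # accumulated power spectrum of y1
    s22 = [0.0] * bins   # accumulated power spectrum of y2
    s12 = [0j] * bins    # accumulated cross spectrum
    for start in range(0, len(y1) - frame_len + 1, frame_len):
        f1 = dft(y1[start:start + frame_len])
        f2 = dft(y2[start:start + frame_len])
        for k in range(bins):
            s11[k] += abs(f1[k]) ** 2
            s22[k] += abs(f2[k]) ** 2
            s12[k] += f1[k] * f2[k].conjugate()
    return [abs(s12[k]) ** 2 / (s11[k] * s22[k]) for k in range(bins)]


random.seed(0)
src1 = [random.gauss(0.0, 1.0) for _ in range(512)]
src2 = [random.gauss(0.0, 1.0) for _ in range(512)]
# Only source 1 active: channel 2 is a scaled copy of channel 1.
one_active = coherence(src1, [0.5 * s for s in src1], frame_len=32)
# Both sources active and independent (no cross-coupling in this toy case).
both_active = coherence(src1, src2, frame_len=32)
```

In the first case the coherence is 1 in every bin, because the two channels are linearly related. In the second case it is well below 1 on average.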

SSD (cont’d)

Improved Filter Estimation
Widrow’s LMS algorithm (proposed in 1975)
- If A or B cannot be observed because one of the source signals is inactive, the estimated filters deviate strongly from the actual filters.
- If SSD shows that source signal 2 is inactive, only filter B is estimated and filter A is left unchanged.

Improved Filter Estimation
LMS algorithm and result
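A minimal LMS sketch for the case where source 2 is inactive: channel 1 then carries x1 alone and channel 2 carries B applied to x1, so B can be identified from the two channels. The filter `true_b`, the step size and the function name `lms_estimate` are hypothetical, and the factor 2 of Widrow's original update is absorbed into `mu`.

```python
import random


def lms_estimate(reference, observed, num_taps, mu):
    """Estimate FIR taps w so that observed(t) ~ sum_k w[k] * reference(t - k)."""
    w = [0.0] * num_taps
    for t in range(len(reference)):
        # Most recent reference samples, zero-padded before the start.
        u = [reference[t - k] if t - k >= 0 else 0.0 for k in range(num_taps)]
        error = observed[t] - sum(wk * uk for wk, uk in zip(w, u))
        # Stochastic-gradient step on the squared error.
        w = [wk + mu * error * uk for wk, uk in zip(w, u)]
    return w


random.seed(1)
true_b = [0.5, -0.3]  # hypothetical cross-coupling filter B
# Source 2 inactive: y1 = x1 and y2 = B applied to x1.
y1 = [random.uniform(-1.0, 1.0) for _ in range(2000)]
y2 = [sum(true_b[k] * (y1[t - k] if t - k >= 0 else 0.0) for k in range(2))
      for t in range(2000)]
b_hat = lms_estimate(y1, y2, num_taps=2, mu=0.05)
```

In this noise-free toy case `b_hat` approaches `true_b`. Filter A is not touched, in line with the rule above.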

Experimental Results
- Evaluated in terms of word recognition accuracy (WRA) and signal-to-interference ratio (SIR)

Experimental Results (cont’d)
* 717 TIMIT sentences are used to train 62 phone units. The front-end feature is PLP and its dynamics. The grammar perplexity is 105.
After acoustic normalization

Speech Enhancement Using Discrete Cosine Transform
Motivation
- The DCT provides significantly higher energy compaction than the DFT.

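SE Using DCT (cont’d)
- The DCT provides higher spectral resolution than the DFT.
- The DCT is a real transform, so its coefficients have only binary phases (the sign). The phase does not change unless the added noise is strong.

$$X(k) = \alpha(k) \sum_{n=0}^{N-1} x(n) \cos\left(\frac{(2n+1)k\pi}{2N}\right)$$

$$x(n) = \sum_{k=0}^{N-1} \alpha(k) X(k) \cos\left(\frac{(2n+1)k\pi}{2N}\right)$$

$$\alpha(0) = \sqrt{\frac{1}{N}} \quad \& \quad \alpha(k) = \sqrt{\frac{2}{N}} \ (k \ne 0), \qquad 0 \le n, k \le N-1$$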
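The transform pair can be checked numerically with a direct implementation of the two sums. This is a plain O(N^2) sketch for illustration, not an efficient DCT; the function names are hypothetical.

```python
import math


def alpha(k, n):
    """Normalization factor: sqrt(1/N) for k = 0, sqrt(2/N) otherwise."""
    return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)


def dct(x):
    """Forward DCT: X(k) = alpha(k) * sum_n x(n) cos((2n+1) k pi / 2N)."""
    n = len(x)
    return [alpha(k, n) * sum(x[m] * math.cos((2 * m + 1) * k * math.pi / (2 * n))
                              for m in range(n))
            for k in range(n)]


def idct(coeffs):
    """Inverse DCT: x(n) = sum_k alpha(k) X(k) cos((2n+1) k pi / 2N)."""
    n = len(coeffs)
    return [sum(alpha(k, n) * coeffs[k] * math.cos((2 * m + 1) * k * math.pi / (2 * n))
                for k in range(n))
            for m in range(n)]


x = [1.0, 2.0, 3.0, 4.0]
X = dct(x)
x_back = idct(X)
```

Because the transform is orthonormal, `x_back` reproduces `x` and the energy of `X` equals the energy of `x`. All coefficients are real, so each carries only a sign as its phase.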

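Estimating signal by MMSE
Introduction
- y(t) = x(t) + n(t) and Y(k) = X(k) + N(k). The DCT coefficients are assumed to be statistically independent, and the estimate should differ as little as possible from the original signal in the mean-square sense.
- The MMSE estimate is

$$\hat{X}(k) = E[X(k) \mid Y(k)]$$

  which, by Bayes’ rule and the signal model, becomes

$$\hat{X}(k) = \frac{\xi(k)}{1 + \xi(k)} Y(k), \qquad \xi(k) = \frac{\lambda_x(k)}{\lambda_n(k)} = \frac{E[X^2(k)]}{E[N^2(k)]}$$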

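MMSE (cont’d)
Estimating the source signal by Decision Directed Estimation (DDE), proposed by Ephraim & Malah in ’84

$$\hat{\lambda}_x(k) = \alpha \hat{\lambda}_x^{p}(k) + (1 - \alpha) \max\{Y^2(k) - \lambda_n(k), 0\}$$

where the index p marks the estimate from the previous frame, and $\alpha$ = 0.98 in computer simulation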
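The two steps, decision-directed variance estimation followed by the MMSE gain, can be sketched for a single DCT coefficient. The sketch assumes the noise variance is known and that the previous-frame signal variance estimate is supplied by the caller; the function name `dde_wiener` and its parameter names are hypothetical.

```python
def dde_wiener(y_coef, lambda_n, lambda_x_prev, alpha=0.98):
    """Enhance one DCT coefficient.

    y_coef        : noisy DCT coefficient Y(k)
    lambda_n      : noise variance lambda_n(k), assumed known
    lambda_x_prev : signal variance estimate from the previous frame
    Returns (enhanced coefficient, updated signal variance estimate).
    """
    # Decision-directed estimate of the signal variance.
    lambda_x = alpha * lambda_x_prev + (1.0 - alpha) * max(y_coef ** 2 - lambda_n, 0.0)
    # A priori SNR and the resulting MMSE gain xi / (1 + xi).
    xi = lambda_x / lambda_n
    x_hat = xi / (1.0 + xi) * y_coef
    return x_hat, lambda_x


x_hat, lambda_x = dde_wiener(2.0, lambda_n=1.0, lambda_x_prev=1.0)
```

With alpha close to 1 the variance estimate changes slowly from frame to frame, which smooths the gain over time.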

Reduction of Residual Signal
Introduction
- The more likely it is that the source signal is present, the more reliable the estimate is.
- Two states of the input: H0: speech absent, H1: speech present
- Modified filter output:

$$A(k) = P(H_1 \mid Y(k)) \, \hat{X}(k)$$

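Reduction of Residual Signal

$$P(H_1 \mid Y(k)) = \frac{P(H_1) \, p(Y(k) \mid H_1)}{P(H_1) \, p(Y(k) \mid H_1) + P(H_0) \, p(Y(k) \mid H_0)}$$

- where

$$p(Y(k) \mid H_1) = N(Y(k); 0, \lambda_n(k) + \lambda_x(k))$$

$$p(Y(k) \mid H_0) = N(Y(k); 0, \lambda_n(k))$$

With equal priors P(H_0) = P(H_1) and $\xi(k) = \lambda_x(k) / \lambda_n(k)$ this gives

$$P(H_1 \mid Y(k)) = \frac{1}{1 + \sqrt{1 + \xi(k)} \, \exp\left(-\dfrac{\xi(k) \, Y^2(k)}{2(\lambda_n(k) + \lambda_x(k))}\right)}$$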
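The speech presence probability can be checked against a direct application of Bayes' rule. The sketch assumes equal priors for H0 and H1 and zero-mean Gaussian models with the variances given above; all function names are hypothetical.

```python
import math


def gaussian_pdf(y, variance):
    """Zero-mean Gaussian density N(y; 0, variance)."""
    return math.exp(-y * y / (2.0 * variance)) / math.sqrt(2.0 * math.pi * variance)


def speech_presence_prob(y_coef, lambda_n, lambda_x):
    """Closed form of P(H1 | Y(k)) for equal priors."""
    xi = lambda_x / lambda_n
    exponent = -xi * y_coef ** 2 / (2.0 * (lambda_n + lambda_x))
    return 1.0 / (1.0 + math.sqrt(1.0 + xi) * math.exp(exponent))


def speech_presence_prob_bayes(y_coef, lambda_n, lambda_x):
    """Same quantity computed directly from Bayes' rule with P(H0) = P(H1)."""
    p1 = gaussian_pdf(y_coef, lambda_n + lambda_x)  # speech present
    p0 = gaussian_pdf(y_coef, lambda_n)             # speech absent
    return p1 / (p1 + p0)


def modified_output(y_coef, x_hat, lambda_n, lambda_x):
    """A(k) = P(H1 | Y(k)) * X_hat(k)."""
    return speech_presence_prob(y_coef, lambda_n, lambda_x) * x_hat


p_small = speech_presence_prob(0.0, lambda_n=1.0, lambda_x=3.0)
p_large = speech_presence_prob(4.0, lambda_n=1.0, lambda_x=3.0)
```

A coefficient near zero gets a low presence probability and is attenuated further, while a large coefficient is passed almost unchanged.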

Experimental Results
Measured in segmental SNR

*         EMF      DETF     DETF2
6.27      11.93    11.82    11.27
-10.17    -0.07    1.93     2.09
-1.05     11.34    13.69    13.32
-21.99    -6.99    -0.04    0.95

White noise added

Fan noise added

SNR of frame f:

$$10 \log_{10} \frac{\sum_n x_f^2(n)}{\sum_n \left(x_f(n) - \hat{x}_f(n)\right)^2}$$
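A segmental SNR measure can be sketched as the mean of per-frame SNR values in dB. The frame length, the use of non-overlapping frames and the skipping of frames with zero signal or zero error are assumptions of this sketch; the function name is hypothetical.

```python
import math


def segmental_snr(clean, enhanced, frame_len):
    """Mean over frames of 10*log10(signal energy / error energy), in dB."""
    values = []
    for start in range(0, len(clean) - frame_len + 1, frame_len):
        x = clean[start:start + frame_len]
        x_hat = enhanced[start:start + frame_len]
        signal = sum(s * s for s in x)
        error = sum((s - e) ** 2 for s, e in zip(x, x_hat))
        if signal > 0.0 and error > 0.0:  # skip frames where the ratio is undefined
            values.append(10.0 * math.log10(signal / error))
    if not values:
        raise ValueError("no frame with a defined SNR")
    return sum(values) / len(values)


clean = [1.0] * 8
enhanced = [0.9] * 4 + [0.99] * 4
snr_db = segmental_snr(clean, enhanced, frame_len=4)
```

In this toy case the two frames have SNRs of about 20 dB and 40 dB, so the segmental SNR is about 30 dB.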

Experimental Results
