
Wavelet-Based Speech Enhancement

Mahdi Amiri

April 2003

Sharif University of Technology

Course Project Presentation 1


Presentation Outline

– Motivation and Goals
– Wavelet Transform - Overview
– Basic Denoising in Wavelet Domain
– Literature Survey
– Implementation and Results
– Conclusions and Future Works


Motivation and Goals: Key Applications

Improving perceptual quality of speech
– Reduce listener’s fatigue
– Hearing aids

Improving performance of
– Speech coders
– Voice recognition systems


Motivation and Goals: Goals of SE in Wavelet Domain

Variable window size for different frequency components
– Long time intervals give precise low-frequency information
– Short time intervals give precise high-frequency information

Easy to implement
– Fast WT computational complexity: O(n)
– FFT computational complexity: O(n log2 n)

Denoising by simple thresholding
– Real-time implementation


Wavelet Transform - Overview


Wavelet Transform - Overview: History

Fourier (1807)

Haar (1910)

Math World
– Expansion of a signal over a set of basis functions $\{\psi_i\}$: $f(t) = \sum_i a_i \psi_i(t)$
– Dyadic wavelet family: $\psi_{mn}(t) = 2^{-m/2}\, \psi(2^{-m} t - n)$


What kind of $\psi_i$ could be useful?
– Impulse function (Haar): best time resolution
– Sinusoids (Fourier): best frequency resolution
– We want the best of both resolutions

Heisenberg (1930)
– Uncertainty Principle
  • There is a lower bound for $\Delta t \cdot \Delta\omega$ (an intuitive proof is given in [Mac91])


Gabor (1945)
– Short Time Fourier Transform (STFT)
  • Disadvantage: fixed window size


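Constructing wavelets
– Daubechies (1988)
  • Compactly supported wavelets

Computation of WT coefficients
– Mallat (1989)
  • A fast algorithm using filter banks

[Figure: two-channel filter bank. The signal S is filtered by the analysis filters $h_a(k)$ and $h_d(k)$ and downsampled by 2, giving the approximation coefficients $w^A_1$ and the detail coefficients $w^D_1$. These are upsampled by 2 and filtered by the synthesis filters $g_a(k)$ and $g_d(k)$ to reconstruct S.]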
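As a rough illustration of the filter-bank idea, the sketch below runs one analysis and one synthesis stage with the Haar filters. The Haar choice and the function names `haar_analysis` and `haar_synthesis` are assumptions made for this example, not something the slides specify.

```python
import math

def haar_analysis(s):
    """One filter-bank level: filter with the Haar pair, then downsample by 2."""
    if len(s) % 2:
        raise ValueError("signal length must be even")
    r = 1.0 / math.sqrt(2.0)
    approx = [r * (s[i] + s[i + 1]) for i in range(0, len(s), 2)]  # low-pass branch
    detail = [r * (s[i] - s[i + 1]) for i in range(0, len(s), 2)]  # high-pass branch
    return approx, detail

def haar_synthesis(approx, detail):
    """Inverse stage: upsample by 2 and combine both branches."""
    r = 1.0 / math.sqrt(2.0)
    s = []
    for a, d in zip(approx, detail):
        s.append(r * (a + d))
        s.append(r * (a - d))
    return s

signal = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
w_a1, w_d1 = haar_analysis(signal)      # approximation and detail coefficients
restored = haar_synthesis(w_a1, w_d1)   # matches the input up to rounding
```

Each level touches every sample a fixed number of times and halves the data passed to the next level, which is where the O(n) total cost comes from.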


Multiresolution Signal Representation

Wavelet tree decomposition: Wavelet Transform (WT), Undecimated WT (UWT)

The coarse version (Approximation) is more useful than the Detail:
– Browsing image databases on the web
– Signal transmission for communication
– Denoising

We may lose what is in the Detail.


Full tree decomposition: Wavelet Packet Transform (WPT), Undecimated WPT (UWPT)

S = A1 + D1, or S = A1 + AD2 + DD2, or …
Which decomposition path would be the best choice?
The answer leads us to the Best Basis.


Best Basis Selection Criteria

Cut (keep the parent node and drop its children) if:

$J(\text{parent node}) \le J(\text{child 1}) + J(\text{child 2})$

Choices for the cost function J:
– Entropy: Coifman, Meyer, Wickerhauser (1992)
– Rate-Distortion: Vetterli (1995)
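The pruning rule can be sketched as a bottom-up search over a wavelet packet tree. The example below is only an illustration under stated assumptions: it uses a Haar split, the additive entropy cost $J = -\sum c^2 \log c^2$, and path labels such as `SA` or `SAD`, none of which are fixed by the slides.

```python
import math

def haar_split(s):
    r = 1.0 / math.sqrt(2.0)
    a = [r * (s[i] + s[i + 1]) for i in range(0, len(s), 2)]
    d = [r * (s[i] - s[i + 1]) for i in range(0, len(s), 2)]
    return a, d

def entropy_cost(coeffs):
    """Additive entropy cost, with 0*log(0) taken as 0."""
    return -sum(c * c * math.log(c * c) for c in coeffs if c != 0.0)

def best_basis(node, max_level, path="S"):
    """Return (cost, leaves) of the cheapest basis below this node.
    leaves is a list of (path, coefficients) pairs."""
    cost_here = entropy_cost(node)
    if max_level == 0 or len(node) < 2 or len(node) % 2:
        return cost_here, [(path, node)]
    a, d = haar_split(node)
    cost_a, leaves_a = best_basis(a, max_level - 1, path + "A")
    cost_d, leaves_d = best_basis(d, max_level - 1, path + "D")
    if cost_here <= cost_a + cost_d:
        return cost_here, [(path, node)]      # cut: the parent is cheaper
    return cost_a + cost_d, leaves_a + leaves_d

flat = [1.0 / math.sqrt(8.0)] * 8             # unit-energy constant signal
flat_cost, flat_leaves = best_basis(flat, max_level=3)
spike = [1.0] + [0.0] * 7                     # unit-energy impulse
spike_cost, spike_leaves = best_basis(spike, max_level=3)
```

In this toy case the constant signal is split all the way down its approximation branch, while the impulse keeps the undecomposed root, because splitting it would only spread its energy over more coefficients.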


Basic Denoising in Wavelet Domain


Principle

Only a few coefficients in the lower bands are needed to approximate the main features of the clean signal. Hence, by setting the smaller coefficients to zero, we can nearly optimally eliminate noise while preserving the important information of the clean signal.


Notation

– Clean signal: $x$ (time domain), $X = Wx$ (wavelet domain)
– Noise signal: $v$ (time domain), $V = Wv$ (wavelet domain)
– Noisy signal: $y = x + v$ (time domain), $Y = Wy = X + V$ (wavelet domain)


Algorithm

1. Framing input noisy signal

2. Forward WT of a frame

3. Thresholding (detail) wavelet coefficients

4. Inverse WT

5. Keep center part of the frame

6. Repeat for all of the frames
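The six steps can be sketched as the loop below. This is a minimal sketch under several assumptions not taken from the slides: a single-level Haar transform stands in for the WT, soft thresholding is used in step 3, frames overlap by half, and the signal is zero-padded at both ends. The name `denoise_frames` is hypothetical.

```python
import math

def haar_forward(s):
    r = 1.0 / math.sqrt(2.0)
    a = [r * (s[i] + s[i + 1]) for i in range(0, len(s), 2)]
    d = [r * (s[i] - s[i + 1]) for i in range(0, len(s), 2)]
    return a, d

def haar_inverse(a, d):
    r = 1.0 / math.sqrt(2.0)
    out = []
    for ai, di in zip(a, d):
        out += [r * (ai + di), r * (ai - di)]
    return out

def soft(x, t):
    return math.copysign(max(abs(x) - t, 0.0), x)

def denoise_frames(y, frame_len, threshold):
    if frame_len % 4:
        raise ValueError("frame_len must be a multiple of 4")
    margin = frame_len // 4             # samples dropped at each frame edge
    hop = frame_len - 2 * margin        # kept center parts tile the signal
    n_frames = -(-len(y) // hop)        # ceiling division
    padded = [0.0] * margin + list(y) + [0.0] * (n_frames * hop - len(y) + margin)
    out = []
    for f in range(n_frames):                            # step 6: all frames
        frame = padded[f * hop : f * hop + frame_len]    # step 1: framing
        a, d = haar_forward(frame)                       # step 2: forward WT
        d = [soft(c, threshold) for c in d]              # step 3: threshold details
        rec = haar_inverse(a, d)                         # step 4: inverse WT
        out.extend(rec[margin : margin + hop])           # step 5: keep center
    return out[: len(y)]

y = [1.0, 3.0, 2.0, 6.0, 5.0, 5.0, 0.0, 4.0, 7.0, 9.0]
unchanged = denoise_frames(y, frame_len=8, threshold=0.0)
smoothed = denoise_frames(y, frame_len=8, threshold=1e9)
```

With a zero threshold the output equals the input. With a very large threshold every detail coefficient is removed and each pair of samples is replaced by its mean. Keeping only the frame center matters for longer wavelet filters, whose boundary effects are confined to the discarded margins.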


Basic Denoising in Wavelet Domain: Threshold Value

VisuShrink [DonJ94b]:

$T = \hat\sigma_V \sqrt{2 \log N}$

where $T$ is the threshold, $\hat\sigma_V^2$ is the estimate of the noise variance, and $N$ is the frame length.

For Gaussian white noise:

$\hat\sigma_V = \frac{\mathrm{MAD}_1}{0.6745} = \frac{\mathrm{median}(|w^D_1 - \mathrm{median}(w^D_1)|)}{0.6745}$

MAD: Median Absolute Deviation. Another definition (wden.m):

$\mathrm{MAD}_1 = \mathrm{median}(|w^D_1|)$

Note (Mahdi): The factor 0.6745 in the denominator rescales the numerator so that $\hat\sigma_V$ is also a suitable estimator of the standard deviation for Gaussian white noise (Wavelet Methods for Time Series Analysis).
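A small sketch of the VisuShrink threshold with the MAD-based noise estimate follows. It assumes the natural logarithm in $\sqrt{2 \log N}$ and treats the input list as the first-level detail coefficients. The function names are made up for this example.

```python
import math
import random
import statistics

def mad_sigma(detail, centered=True):
    """Noise standard deviation estimate from first-level detail coefficients."""
    if centered:
        med = statistics.median(detail)
        mad = statistics.median(abs(c - med) for c in detail)
    else:
        # the other definition: median of the coefficient magnitudes
        mad = statistics.median(abs(c) for c in detail)
    return mad / 0.6745

def universal_threshold(detail, frame_len):
    """T = sigma_hat * sqrt(2 * log(N)), natural log assumed."""
    return mad_sigma(detail) * math.sqrt(2.0 * math.log(frame_len))

random.seed(0)
noise = [random.gauss(0.0, 2.0) for _ in range(4096)]   # synthetic white noise, sigma = 2
sigma_hat = mad_sigma(noise)
T = universal_threshold(noise, frame_len=4096)
```

For pure Gaussian noise both MAD definitions give an estimate close to the true standard deviation, which is why the division by 0.6745 is used.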


Basic Denoising in Wavelet Domain: Threshold Value (cont.)

Threshold in the WPT case:

$T = \hat\sigma_V \sqrt{2 \log(N \log_2 N)}$

For the correlated noise situation, use a level-dependent threshold (SureShrink [DonJ94b]):

$T_j = \hat\sigma_{V,j} \sqrt{2 \log N}$, with $\hat\sigma_{V,j} = \frac{\mathrm{MAD}_j}{0.6745}$

where $\mathrm{MAD}_j$ is computed from the detail coefficients at level $j$.
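To make the two variants concrete, the sketch below computes one threshold per decomposition level, $T_j = \hat\sigma_{V,j}\sqrt{2 \log N}$, and the wavelet-packet threshold $\hat\sigma_V\sqrt{2 \log(N \log_2 N)}$. It assumes natural logarithms for the outer log and the magnitude-based MAD. The dictionary layout and the function names are hypothetical.

```python
import math
import statistics

def sigma_from_mad(coeffs):
    """MAD / 0.6745, with MAD taken as the median coefficient magnitude."""
    return statistics.median(abs(c) for c in coeffs) / 0.6745

def level_dependent_thresholds(details, frame_len):
    """details maps level j to the list of detail coefficients at that level."""
    scale = math.sqrt(2.0 * math.log(frame_len))
    return {j: sigma_from_mad(d) * scale for j, d in details.items()}

def wpt_threshold(sigma_v, frame_len):
    """Threshold for the wavelet packet case."""
    return sigma_v * math.sqrt(2.0 * math.log(frame_len * math.log2(frame_len)))

details = {1: [0.5, -1.0, 1.0, -2.0, 1.0], 2: [1.0, -2.0, 2.0, -4.0, 2.0]}
thresholds = level_dependent_thresholds(details, frame_len=512)
T_wpt = wpt_threshold(1.0, frame_len=512)
```

Because each level gets its own noise estimate, a level that carries more noise energy (as happens with colored noise) automatically receives a larger threshold.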


Basic Denoising in Wavelet Domain: How to Threshold

Hard thresholding:

$Thr_H(x, T) = \begin{cases} x & |x| > T \\ 0 & |x| \le T \end{cases}$

Soft thresholding:

$Thr_S(x, T) = \begin{cases} \mathrm{sgn}(x)(|x| - T) & |x| > T \\ 0 & |x| \le T \end{cases}$

Comparison:
– Hard thresholding: discontinuity at $|x| = T$
– Soft thresholding: alteration of the values that are kept
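The two rules are short enough to state directly in code. The sketch below assumes scalar coefficients and uses the function names `hard_threshold` and `soft_threshold` purely for illustration.

```python
import math

def hard_threshold(x, t):
    """Keep coefficients whose magnitude exceeds t, zero the rest."""
    return x if abs(x) > t else 0.0

def soft_threshold(x, t):
    """Zero small coefficients and shrink the remaining ones toward zero by t."""
    return math.copysign(abs(x) - t, x) if abs(x) > t else 0.0

coeffs = [3.0, -3.0, 1.5, -0.5, 2.0]
hard = [hard_threshold(c, 2.0) for c in coeffs]
soft = [soft_threshold(c, 2.0) for c in coeffs]
```

The hard rule jumps from 0 to just above T at the threshold, while the soft rule is continuous but reduces every surviving coefficient by T.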


Literature Survey


[SeoB97], Novelty

Title: Speech enhancement with reduction of noise components in the wavelet domain

Novelty:
– Semisoft thresholding [GaoB95]
– Classification of unvoiced region in WD
– Different thresholding for unvoiced region


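[SeoB97], Thresholding

Semisoft thresholding [GaoB95]:
– Less sensitivity to small perturbations in the data
– Smaller bias

$Thr_{SS}(x, \lambda_1, \lambda_2) = \begin{cases} 0 & |x| \le \lambda_1 \\ \mathrm{sgn}(x)\,\frac{\lambda_2(|x| - \lambda_1)}{\lambda_2 - \lambda_1} & \lambda_1 < |x| \le \lambda_2 \\ x & |x| > \lambda_2 \end{cases}$

[Figure: input-output curves of hard, soft, and semisoft thresholding, with $\lambda_1$ and $\lambda_2$ marked on the axes; annotated "Like [DonJ94b]".]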
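A minimal sketch of semisoft thresholding with a lower threshold `t1` and an upper threshold `t2` (assumed `t1 < t2`) is given below: coefficients below `t1` are zeroed, those above `t2` are kept, and those in between are scaled linearly. The parameter names are illustrative.

```python
import math

def semisoft_threshold(x, t1, t2):
    """Semisoft rule: 0 below t1, identity above t2, linear ramp in between."""
    if not t1 < t2:
        raise ValueError("t1 must be smaller than t2")
    ax = abs(x)
    if ax <= t1:
        return 0.0
    if ax > t2:
        return x
    return math.copysign(t2 * (ax - t1) / (t2 - t1), x)

values = [semisoft_threshold(c, 1.0, 3.0) for c in (0.5, 2.0, -2.0, 3.0, 4.0)]
```

The ramp reaches exactly `t2` at `|x| = t2`, so the rule is continuous like soft thresholding yet leaves large coefficients unchanged like hard thresholding.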


[SeoB97], Unvoiced Regions

Separation of unvoiced region
– Use DWT for finding $w^A_3, w^D_3, w^D_2, w^D_1$
– Calculate the average energy of each subband: $E_{w^A_3}, E_{w^D_1}, E_{w^D_2}, E_{w^D_3}$
– The current speech segment is unvoiced if:
  1. $E_{w^D_1} > E_{w^A_3}$ and $E_{w^D_1} > E_{w^D_2}$ and $E_{w^D_1} > E_{w^D_3}$
  2. $E_{w^A_3} / E_{w^D_1} < 0.9$

[Figure: subbands along the frequency axis f, from low to high: $w^A_3$, $w^D_3$, $w^D_2$, $w^D_1$.]
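The unvoiced test can be sketched as a comparison of average subband energies. The example reads the two conditions as both being required, with the highest band $w^D_1$ dominating and the ratio of the lowest-band energy to the highest-band energy below 0.9. That reading, the function names, and the toy subbands are assumptions.

```python
def average_energy(coeffs):
    return sum(c * c for c in coeffs) / len(coeffs)

def is_unvoiced(w_a3, w_d3, w_d2, w_d1, ratio=0.9):
    """True when the highest-frequency band carries the most average energy."""
    e_a3 = average_energy(w_a3)
    e_d3 = average_energy(w_d3)
    e_d2 = average_energy(w_d2)
    e_d1 = average_energy(w_d1)
    dominant = e_d1 > e_a3 and e_d1 > e_d2 and e_d1 > e_d3
    return dominant and e_a3 / e_d1 < ratio

# toy subbands: a hiss-like segment and a vowel-like segment
hiss = is_unvoiced([0.1, -0.1], [0.2, -0.2], [0.3, -0.3, 0.3, -0.3], [1.0, -1.0] * 4)
vowel = is_unvoiced([2.0, -2.0], [0.5, -0.5], [0.3, -0.3, 0.3, -0.3], [0.1, -0.1] * 4)
```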


[SeoB97], Implementations

If unvoiced, then threshold just the highest frequency band $w^D_1$.

Implementation results
– Additive white Gaussian noise
– SNR: -10 dB to 10 dB
– Test sentence: “Should we chase those cowboys?”

Noisy SNR (dB)    Enhanced SNR (dB)
-10               0.93
-5                3.42
0                 7.12
5                 11.34
10                13.92


Literature Survey: [SooKY97], Novelty

Title: Wavelet for speech denoising

Novelty:
– Evaluation of different wavelets and different orders (db1-10, coif1-5, sym2-8, bior1.3-6.8)
– Spectral Subtraction in WD
– Wiener Filtering in WD (uses two methods for estimating the a priori SNR)
  • Maximum Likelihood approach
  • Decision Directed approach


Literature Survey: [SooKY97], Thresholding 1

1. Spectral Subtraction (SS) in WD

Use DWT and find L levels of decomposition:

$w^A_{y,L}, w^D_{y,L}, w^D_{y,L-1}, w^D_{y,L-2}, \ldots, w^D_{y,1}$

Denoised value:

if $w^D_{y,i}(k) \ge 0$ then
$\hat w^D_{x,i}(k) = \max(0,\; w^D_{y,i}(k) - E(|w^D_{n,i}(k)|))$
else
$\hat w^D_{x,i}(k) = \min(0,\; w^D_{y,i}(k) + E(|w^D_{n,i}(k)|))$

$E(|w^D_{n,i}(k)|)$ is the expected value of the noise magnitude; it could be estimated from silence frames.

Use a similar scheme for $w^A_{y,L}$.
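A minimal sketch of magnitude subtraction on one subband follows: the expected noise magnitude is subtracted from positive coefficients and added to negative ones, and the result is clipped at zero so that no coefficient changes sign. Estimating the noise magnitude as the mean absolute value over silence-frame coefficients is an assumption, as are the function names.

```python
def estimate_noise_magnitude(silence_coeffs):
    """Mean absolute value of coefficients taken from silence frames."""
    return sum(abs(c) for c in silence_coeffs) / len(silence_coeffs)

def magnitude_subtract(w_y, noise_mag):
    """Subtract the noise magnitude from each coefficient, clipping at zero."""
    out = []
    for c in w_y:
        if c >= 0:
            out.append(max(0.0, c - noise_mag))
        else:
            out.append(min(0.0, c + noise_mag))
    return out

noise_mag = estimate_noise_magnitude([1.0, -1.0, 0.5, -1.5])   # gives 1.0
denoised = magnitude_subtract([3.0, -3.0, 0.5, -0.5, 0.0], noise_mag)
```

Written this way, the rule behaves like soft thresholding with the threshold set to the expected noise magnitude of the subband.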


Literature Survey: [SooKY97], Thresholding 2

2. Wiener Filtering in WD

$\hat w^D_{x,i}(k) = \frac{\xi^D_i(k)}{\xi^D_i(k) + 1}\, w^D_{y,i}(k)$

where $\xi^D_i(k)$ is the a priori SNR:

$\xi^D_i(k) = \frac{E(|w^D_{x,i}(k)|^2)}{E(|w^D_{n,i}(k)|^2)}$

Estimating $\xi^D_i(k)$:

a. Maximum Likelihood

$\hat\xi^D_i(k) = \max\left(0,\; \frac{|w^D_{y,i}(k)|^2}{E(|w^D_{n,i}(k)|^2)} - 1\right)$

b. Decision Directed

$\hat\xi^D_i(k) = \alpha\, \hat\xi^D_i(k-1) + (1 - \alpha) \max\left(0,\; \frac{|w^D_{y,i}(k)|^2}{E(|w^D_{n,i}(k)|^2)} - 1\right)$

$\alpha \in [0, 1]$, typically 0.9
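The Wiener gain with a recursively smoothed a priori SNR can be sketched as below. It follows the recursion $\hat\xi(k) = \alpha\,\hat\xi(k-1) + (1-\alpha)\max(0, |w_y(k)|^2 / E|w_n|^2 - 1)$. Starting the recursion from zero is an assumption, and setting `alpha=0` reduces it to the maximum likelihood estimate. The function name is hypothetical.

```python
def wiener_decision_directed(w_y, noise_power, alpha=0.9):
    """Apply the gain xi / (xi + 1) with a smoothed a priori SNR estimate xi."""
    xi_prev = 0.0                      # assumed starting value of the recursion
    out = []
    for c in w_y:
        ml = max(0.0, c * c / noise_power - 1.0)     # maximum likelihood term
        xi = alpha * xi_prev + (1.0 - alpha) * ml    # smoothed a priori SNR
        out.append(xi / (xi + 1.0) * c)
        xi_prev = xi
    return out

smoothed = wiener_decision_directed([2.0, 2.0], noise_power=1.0, alpha=0.9)
ml_only = wiener_decision_directed([2.0, 0.5], noise_power=1.0, alpha=0.0)
```

With `alpha` near 1 the gain reacts slowly to isolated large coefficients, which smooths the estimate at the cost of a slower response to speech onsets.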


[SooKY97], Implementations

Implementation results
– White Gaussian noise
– Both male and female voices
– 10 levels of decomposition

SNR: 5 dB, L: 10

Wavelet type    Method 1 (dB)    Method 2b (dB)
bior3.1         6.569            1.764
bior4.4         19.523           21.981
sym8            19.751           22.215


[SooKY97], Conclusions

– The methods are not particularly sensitive to the wavelet type, with the exception of bior3.1
– Wiener-filtered speech has better SNR values than magnitude subtraction
– For Wiener filtering, the decision directed approach gives better SNR values than the maximum likelihood approach


Literature Survey: [KimYK01], Novelty

Title: Speech enhancement using adaptive wavelet shrinkage

Novelty:
– Adaptive threshold value
  • The threshold value depends on the variance of the estimated clean signal (BayesShrink)
– Classification of unvoiced region using entropy
  • Applies a smaller threshold for the unvoiced region and calls the method “Adaptive BayesShrink”


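Literature Survey: [KimYK01], Threshold Value

BayesShrink [ChaYV00a]: the adaptive threshold value for minimizing the Bayesian risk $E(\hat X - X)^2$, with $\hat X = Thr(Y, T)$, is

$T(\sigma_X) = \frac{\sigma_V^2}{\sigma_X}$

Thus, the estimated threshold value is found as

$\hat T(\hat\sigma_X) = \frac{\hat\sigma_V^2}{\hat\sigma_X}$

where

$\hat\sigma_V = \frac{\mathrm{MAD}_1}{0.6745}$, $\hat\sigma_X = \sqrt{\max(\hat\sigma_Y^2 - \hat\sigma_V^2,\; 0)}$

and if $\hat\sigma_Y^2 \le \hat\sigma_V^2$, then $\hat T(\hat\sigma_X) = \max(|w^D_1|)$.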
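A small sketch of the BayesShrink threshold $\hat\sigma_V^2 / \hat\sigma_X$ for one subband follows. It assumes that $\hat\sigma_Y^2$ is estimated as the mean square of the subband coefficients and that the fallback, used when the estimated signal variance is zero, is the largest coefficient magnitude in that subband. The function name and arguments are illustrative.

```python
import math

def bayes_shrink_threshold(band, sigma_v):
    """Threshold sigma_v^2 / sigma_x for one subband of noisy coefficients."""
    var_y = sum(c * c for c in band) / len(band)   # assumed estimate of sigma_Y^2
    var_x = max(var_y - sigma_v ** 2, 0.0)
    if var_x == 0.0:
        # noise dominates: the threshold removes every coefficient in the band
        return max(abs(c) for c in band)
    return sigma_v ** 2 / math.sqrt(var_x)

t_signal = bayes_shrink_threshold([3.0, -3.0, 3.0, -3.0], sigma_v=1.0)
t_noise = bayes_shrink_threshold([0.5, -0.5], sigma_v=1.0)
```

A band dominated by signal gets a small threshold, while a band that looks like pure noise gets a threshold large enough to zero it entirely.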


[KimYK01], Unvoiced Regions

Current region is unvoiced if:
[Equation: condition on $ent_y$ relative to max_ent]

Unvoiced region has smaller energy, so apply a smaller threshold:
[Equation: $\hat T_U(\hat\sigma_X)$, expressed in terms of $\hat T(\hat\sigma_X)$, $\hat\sigma_V$, $ent_y$ and max_ent]

The constants 0.77 and 0.9 are selected by simulation.

There was no comment about the type of entropy; it could be:

$ent_y = -\sum_k y_k^2 \log(y_k^2)$


[KimYK01], Implementations

Implementation results:
– Additive white Gaussian noise
– SNR: 0 dB, 10 dB and 20 dB

Input SNR    VisuShrink    BayesShrink    Adaptive BayesShrink
0 dB         4.8208 dB     4.4982 dB      5.5733 dB
10 dB        11.5650 dB    12.8456 dB     14.1543 dB
20 dB        16.8488 dB    21.8313 dB     23.8455 dB


Literature Survey: [ChaKYK02], Novelty

Title: Speech enhancement for non-stationary noise environment by adaptive wavelet packet

Novelty:
– Node dependent thresholding for adaptation in colored or non-stationary noise
– Noise estimation based on spectral entropy, not MAD
– Modified hard thresholding to alleviate time-frequency discontinuities


Literature Survey: [ChaKYK02], Threshold Value

Create WPT and find the best basis tree’s leaf nodes.

Node dependent thresholding:

$T_{j,k} = \hat\sigma_{V,j,k} \sqrt{2 \log N}$

Noise estimation could be like:

$\hat\sigma_{V,j,k} = \frac{\mathrm{MAD}_{j,k}}{0.6745}$

or the following proposed method.


Literature Survey: [ChaKYK02], Noise Estimation

1. Estimate spectral pdf of wavelet packet coefficients through a B-bin histogram:

$P_b = \frac{\text{No. of coefficients } w_{j,k} \text{ in bin } b}{\text{length of node}}$

2. Calculate normalized spectral entropy for each node in the adapted wavelet packet tree:

$Entropy(n) = -\sum_{b=1}^{B} P_b \log_B(P_b)$, $n = 1, 2, \ldots,$ No. of best nodes
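Steps 1 and 2 can be sketched as a histogram followed by an entropy with logarithm base B, which keeps the result between 0 and 1. Binning the coefficient magnitudes over the range from 0 to the largest magnitude is an assumption, and `normalized_spectral_entropy` is a name made up for this example.

```python
import math

def normalized_spectral_entropy(coeffs, n_bins):
    """Entropy of a B-bin histogram of coefficient magnitudes, log base B."""
    mags = [abs(c) for c in coeffs]
    top = max(mags)
    if top == 0.0:
        return 0.0
    counts = [0] * n_bins
    for m in mags:
        b = min(int(m / top * n_bins), n_bins - 1)   # largest magnitude goes in the last bin
        counts[b] += 1
    n = len(mags)
    return -sum((k / n) * math.log(k / n, n_bins) for k in counts if k)

spread = normalized_spectral_entropy([0.5, -1.5, 2.5, -3.5], n_bins=4)   # one per bin
peaked = normalized_spectral_entropy([1.0, -1.0, 1.0, -1.0], n_bins=4)   # single bin
```

Coefficients spread evenly over the bins give an entropy of 1, and coefficients concentrated in one bin give 0.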


Literature Survey: [ChaKYK02], Noise Estimation (cont.)

3. Estimate spectral magnitude intensity by histogram

4. Define an auxiliary threshold:

$\alpha(n) = \beta \cdot Entropy(n) \cdot \text{node\_length}$, with $\beta$ = 0.7 ~ 0.9

5. Estimate standard deviation of noise:

$\hat\sigma_{j,k} = [\text{No. of bins bigger than } \alpha(n)] \times \text{bin\_width}$

[Figure: histogram over bins of coefficient magnitudes. The vertical axis is the number of coefficients with magnitude equal to or greater than the bin’s amplitude, up to node_length; the level $\alpha(n)$ is marked.]


Literature Survey: [ChaKYK02], Noise Estimation (cont.)

Greater disorder of wavelet coefficients (less voiced, more unvoiced)
→ More uniform spectral pdf
→ Bigger values for entropy (range 0 to 1)
→ Bigger value for $\alpha$
→ Smaller number of bins bigger than $\alpha$
→ Smaller estimate of the standard deviation of noise


Literature Survey: [ChaKYK02], Thresholding

Modified Hard Thresholding:

$Thr_{Hm}(x, T) = \begin{cases} x & |x| > T \\ \frac{1}{\mu}\left[(1 + \mu)^{|x|/T} - 1\right] T\, \mathrm{sgn}(x) & |x| \le T \end{cases}$
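One reading of modified hard thresholding is a curve that keeps coefficients above T unchanged and, below T, replaces the hard cut-off by $\mathrm{sgn}(x)\, T\, [(1+\mu)^{|x|/T} - 1] / \mu$. The sketch below implements that reading. The exact form, the symbol $\mu$, and the default value 255 are assumptions reconstructed from the slide rather than confirmed details of [ChaKYK02].

```python
import math

def modified_hard_threshold(x, t, mu=255.0):
    """Keep |x| > t unchanged; below t, attenuate smoothly instead of zeroing."""
    ax = abs(x)
    if ax > t:
        return x
    return math.copysign(t * ((1.0 + mu) ** (ax / t) - 1.0) / mu, x)

at_threshold = modified_hard_threshold(1.0, 1.0)     # equals T: no jump at |x| = T
half_way = modified_hard_threshold(-0.5, 1.0)        # strongly attenuated
above = modified_hard_threshold(2.0, 1.0)            # unchanged
```

Small coefficients are still pushed very close to zero, but the output rises continuously to T at the threshold, which avoids the discontinuity of plain hard thresholding.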


[ChaKYK02], Implementations

Implementation results:
– Pink noise, SNR: -5 dB ~ 15 dB
– Parameter values: 0.8 (from the 0.7 ~ 0.9 range of the auxiliary threshold factor) and 255

Noisy speech SNR (dB)    Level dep. with MAD    Node dep. with MAD    Node dep. with proposed    Spectral subtraction
-5                       -3.7                   3.53                  3.31                       0.10
0                        1.11                   5.43                  5.91                       1.77
5                        5.79                   7.44                  8.30                       2.35
10                       10.15                  9.49                  10.47                      2.83
15                       14.15                  11.39                 12.15                      4.08

Subjective tests were in favor of the level dependent thresholding, but not every time! In any case, the proposed method has better spectral performance (spectrogram).


[ChaKYK02], Implementations (cont.)

– SNR (dB) test for various noisy speech: “We like bleu cheese but Victor prefers swiss cheese.” (SNR = 10 dB)

Noise type    Level dep. with MAD    Node dep. with proposed    Spectral subtraction
White         10.29                  10.35                      2.39
Pink          9.47                   10.49                      2.42
F16           9.71                   10.35                      2.18
Car           9.65                   13.50                      1.95
Babble        9.59                   10.18                      2.23


To be continued…

Thank You.


References

[ChaKYK02] S. Chang, Y. Kwon, S. I. Yang, and I. J. Kim, “Speech enhancement for non-stationary noise environment by adaptive wavelet packet,” Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP-2002, Vol. 1, pp. 561-564, 2002.

[ChaYV00a] S. G. Chang, B. Yu, and M. Vetterli, “Adaptive Wavelet Thresholding for Image Denoising and Compression,” IEEE Transactions on Image Processing, Vol. 9, No. 9, pp. 1532-1546, Sep. 2000.

[DonJ94b] D. L. Donoho and I. M. Johnstone, “Threshold selection for wavelet shrinkage of noisy data,” Proceedings of the 16th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, 1994. Engineering Advances: New Opportunities for Biomedical Engineers, Vol. 1, pp. A24-A25, Nov. 1994.

[GaoB95] H. Y. Gao and A. G. Bruce, “WaveShrink with Semisoft Shrinkage,” Research Report No. 39, StatSci Division of MathSoft, Inc., 1995.

[KimYK01] I. J. Kim, S. I. Yang, and Y. Kwon, “Speech enhancement using adaptive wavelet shrinkage,” Proceedings of IEEE International Symposium on Industrial Electronics, ISIE-2001, Vol. 1, pp. 501-504, 2001.



[SeoB97] J. W. Seok and K. S. Bae, “Speech enhancement with reduction of noise components in the wavelet domain,” IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP-97, Vol. 2, pp. 1323-1326, Apr. 1997.

[SooKY97] I. Y. Soon, S. N. Koh and C. K. Yeo, “Wavelet for speech denoising,” Proceedings of IEEE Region 10 Annual Conference on Speech and Image Technologies for Computing and Telecommunications, TENCON-97, Vol. 2, pp. 479-482, Dec. 1997.


FIND OUT MORE AT...

1. http://ce.sharif.edu/~m_amiri/

2. http://www.aictct.com/dml/
