noise reduction

Noise Reduction

Two-Stage Mel-Warped Wiener Filter Approach

Intellectual Property

• Advanced front-end feature extraction algorithm

• ETSI ES 202 050 V1.1.3 (2003-11)

• European Telecommunications Standards Institute

• ETSI Technical Committee Speech Processing, Transmission and Quality Aspects (STQ)


Noise Reduction

• Based on Wiener filter theory

• Noise reduction is performed in two stages

• The input signal is de-noised in the first stage

• Second stage: dynamic noise reduction based on the SNR of the processed signal


First Stage

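[Figure: first-stage block diagram. Spectrum Estimation → PSD Mean → WF Design → Mel Filter-Bank → Mel IDCT → Apply Filter → to Second Stage; a VADNest block supplies the VAD flag used in WF Design.]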

Second Stage

[Figure: second-stage block diagram. From First Stage → Spectrum Estimation → PSD Mean → WF Design → Mel Filter-Bank → Gain Factorization → Mel IDCT → Apply Filter → OFF (offset compensation) → Output.]

Buffering

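[Figure: two buffers of four frames each (slots 0 to 3). Before the shift, Buffer 1 holds frames A B C D and Buffer 2 holds E F G H; after the shift, Buffer 1 holds B C D and a new frame, and Buffer 2 holds F G H. Labels: "De-noised (1st stage)", "De-noised (output)".]

• 1 frame = 80 samples

• 1 buffer = 4 frames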

Spectrum Estimation

• The input signal is divided into overlapping frames of N_in = 200 samples.

• A 25 ms frame length and a 10 ms frame shift (80 samples) are used, which corresponds to an 8 kHz sampling rate.

• Each input frame s_in(n) is windowed with a Hanning window of length N_in to give s_w(n):



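$$s_w(n) = s_{in}(n)\, w_{Hann}(n), \qquad w_{Hann}(n) = 0.5 - 0.5\cos\left(\frac{2\pi (n + 0.5)}{N_{in}}\right)$$

Each windowed frame is zero-padded from N_in up to N_FFT = 256 samples:

$$s_{FFT}(n) = \begin{cases} s_w(n), & 0 \le n \le N_{in} - 1 \\ 0, & N_{in} \le n \le N_{FFT} - 1 \end{cases}$$

• Frequency representation:

$$X(bin) = \mathrm{FFT}\{s_{FFT}(n)\}, \quad bin = \text{frequency bin index}$$

• Power spectrum:

$$P(bin) = |X(bin)|^2, \quad 0 \le bin \le N_{FFT}/2$$

• Smoothing:

$$P_{in}(bin) = \frac{P(2\,bin) + P(2\,bin + 1)}{2}, \quad 0 \le bin < N_{FFT}/4$$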
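The spectrum-estimation steps can be sketched in Python as follows. This is a minimal, dependency-free illustration that assumes an 8 kHz input and uses a naive DFT in place of an FFT. The extra Nyquist value appended after smoothing (giving 65 bins, 0 to 64) is an assumption taken from ETSI ES 202 050 rather than from this slide.

```python
import cmath
import math

N_IN = 200   # samples per frame: 25 ms at 8 kHz
N_FFT = 256  # FFT length after zero padding


def hann_window(n_in: int) -> list[float]:
    """w_Hann(n) = 0.5 - 0.5*cos(2*pi*(n + 0.5)/n_in), 0 <= n < n_in."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * (n + 0.5) / n_in) for n in range(n_in)]


def smoothed_power_spectrum(frame: list[float]) -> list[float]:
    """Return P_in(bin) for one N_IN-sample frame (65 bins)."""
    if len(frame) != N_IN:
        raise ValueError(f"expected {N_IN} samples, got {len(frame)}")
    window = hann_window(N_IN)
    # Window the frame, then zero-pad from N_IN up to N_FFT samples.
    s_fft = [s * w for s, w in zip(frame, window)] + [0.0] * (N_FFT - N_IN)
    # Naive DFT of bins 0..N_FFT/2; a real implementation would call an FFT.
    half = N_FFT // 2
    power = []
    for b in range(half + 1):
        x = sum(s_fft[n] * cmath.exp(-2j * math.pi * b * n / N_FFT) for n in range(N_FFT))
        power.append(abs(x) ** 2)  # P(bin) = |X(bin)|^2
    # Smoothing: average adjacent pairs for 0 <= bin < N_FFT/4.
    p_in = [(power[2 * b] + power[2 * b + 1]) / 2 for b in range(N_FFT // 4)]
    # Assumption (ETSI ES 202 050): bin N_FFT/4 takes the Nyquist value P(N_FFT/2).
    p_in.append(power[half])
    return p_in
```

A 1 kHz tone lands in FFT bin 32 (1000/8000 × 256) and therefore in smoothed bin 16, since each smoothed bin spans 62.5 Hz.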

Power Spectral Density Mean

• Compute, for each P_in(bin), the mean over the last T_PSD = 2 frames:


$$P_{in\_PSD}(bin,t) = \frac{1}{T_{PSD}} \sum_{i=0}^{T_{PSD}-1} P_{in}(bin, t-i)$$
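A running per-bin mean over the last T_PSD frames can be kept with a bounded deque. This is a sketch with illustrative names; how the mean behaves before T_PSD frames have arrived is not stated on the slide, so averaging over the frames seen so far is an assumption.

```python
from collections import deque

T_PSD = 2  # number of frames in the mean


class PsdMean:
    """Per-bin mean of P_in(bin) over the last T_PSD frames."""

    def __init__(self, t_psd: int = T_PSD) -> None:
        self.history = deque(maxlen=t_psd)  # the oldest frame drops out automatically

    def update(self, p_in: list[float]) -> list[float]:
        self.history.append(list(p_in))
        # Assumption: during start-up, average over the frames seen so far.
        count = len(self.history)
        return [sum(frame[b] for frame in self.history) / count for b in range(len(p_in))]
```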

Wiener Filter Design

• A forgetting factor (weight) λ_NSE is computed for each frame t:


$$\lambda_{NSE} = \begin{cases} 1 - 1/t, & t < 100 \\ 0.99, & \text{otherwise} \end{cases}$$


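The first-stage noise spectrum estimate is updated based on the VAD flag:

If flag = 0 (non-speech frame):

$$P^{1/2}_{noise}(bin, t_n) = \max\left(\lambda_{NSE}\, P^{1/2}_{noise}(bin, t_{n-1}) + (1 - \lambda_{NSE})\, P_{in\_PSD}(bin, t_n),\ e^{-10}\right)$$

If flag = 1 (speech frame), the estimate from the last non-speech frame t_n is kept:

$$P^{1/2}_{noise}(bin, t) = P^{1/2}_{noise}(bin, t_n)$$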
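The forgetting factor and the VAD-gated first-stage update can be sketched as below. It follows the formulas on this slide, with the e^{-10} term treated as a lower floor on the estimate; the function and argument names are illustrative and are not taken from the ETSI reference code.

```python
import math

NOISE_FLOOR = math.exp(-10)


def forgetting_factor(t: int) -> float:
    """lambda_NSE for frame index t >= 1: 1 - 1/t below 100 frames, else 0.99."""
    return 1.0 - 1.0 / t if t < 100 else 0.99


def update_noise_stage1(noise_sqrt: list[float], psd_mean: list[float],
                        t: int, vad_flag: int) -> list[float]:
    """Return the new first-stage P^{1/2}_noise per bin.

    noise_sqrt: estimate from the last non-speech frame
    psd_mean:   the PSD-mean term of the current frame
    vad_flag:   1 for speech, 0 for non-speech
    """
    if vad_flag == 1:
        return list(noise_sqrt)  # speech: hold the last non-speech estimate
    lam = forgetting_factor(t)
    return [max(lam * n + (1.0 - lam) * p, NOISE_FLOOR)
            for n, p in zip(noise_sqrt, psd_mean)]
```

At t = 1 the forgetting factor is 0, so the first non-speech frame initializes the estimate directly from the PSD mean.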



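The second-stage noise estimate is updated in every frame, without the VAD:

If t < 11:

$$P_{noise}(bin,t) = \lambda_{NSE}\, P_{noise}(bin,t-1) + (1 - \lambda_{NSE})\, P_{in\_PSD}(bin,t)$$

Otherwise:

$$update = 0.9 + 0.1\,\frac{P_{in\_PSD}(bin,t)}{P_{in\_PSD}(bin,t) + P_{noise}(bin,t-1)}\left(1 + \frac{1}{1 + 0.1\, P_{in\_PSD}(bin,t) / P_{in\_PSD}(bin,t-1)}\right)$$

$$P_{noise}(bin,t) = P_{noise}(bin,t-1) \cdot update$$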
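A sketch of the second-stage update as written on the slide. The names are illustrative, λ_NSE is passed in by the caller, and all spectral values are assumed strictly positive so the divisions are safe.

```python
def update_noise_stage2(noise_prev: list[float], psd: list[float],
                        psd_prev: list[float], t: int, lam_nse: float) -> list[float]:
    """Second-stage P_noise update, run on every frame.

    noise_prev: P_noise(bin, t-1); psd / psd_prev: P_in_PSD at t and t-1.
    """
    if t < 11:
        # Start-up: plain recursive averaging with the forgetting factor.
        return [lam_nse * n + (1.0 - lam_nse) * p for n, p in zip(noise_prev, psd)]
    updated = []
    for n, p, p_prev in zip(noise_prev, psd, psd_prev):
        update = 0.9 + 0.1 * p / (p + n) * (1.0 + 1.0 / (1.0 + 0.1 * p / p_prev))
        updated.append(n * update)  # multiplicative update of the noise estimate
    return updated
```

For t ≥ 11 the update factor always lies in [0.9, 1.1): the estimate shrinks by 10% per frame when the input PSD is negligible next to the noise estimate, and can grow by just under 10% per frame when the input dominates.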



The noiseless spectrum is estimated as:

$$P^{1/2}_{den}(bin,t) = 0.98\, P^{1/2}_{den}(bin,t-1) + (1 - 0.98)\, T\left[P_{in\_PSD}(bin,t) - P^{1/2}_{noise}(bin,t)\right]$$

where the threshold function T is:


$$T[z(bin,t)] = \begin{cases} z(bin,t), & \text{if } z(bin,t) > 0 \\ 0, & \text{otherwise} \end{cases}$$
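The noiseless-spectrum recursion and the threshold function T translate directly into code. This is a minimal sketch with illustrative names; the smoothing constant 0.98 is the value from the slide.

```python
def threshold(z: float) -> float:
    """T[z] = z if z > 0, else 0."""
    return z if z > 0.0 else 0.0


def update_denoised(den_prev: list[float], psd_mean: list[float],
                    noise_sqrt: list[float], beta: float = 0.98) -> list[float]:
    """P^{1/2}_den(bin,t) = beta*P^{1/2}_den(bin,t-1) + (1-beta)*T[psd_mean - P^{1/2}_noise]."""
    return [beta * d + (1.0 - beta) * threshold(p - n)
            for d, p, n in zip(den_prev, psd_mean, noise_sqrt)]
```

Because T clips negative differences to zero, bins where the noise estimate exceeds the input simply decay toward zero at rate 0.98 per frame.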


The a priori SNR is calculated as:

$$\eta(bin,t) = \frac{P_{den}(bin,t)}{P_{noise}(bin,t)}$$

The filter transfer function is:

$$H(bin,t) = \frac{\eta(bin,t)}{1 + \eta(bin,t)}$$


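The filter transfer function is used to improve the noiseless signal estimate:

$$P^{1/2}_{den2}(bin,t) = H(bin,t)\, P^{1/2}_{in\_PSD}(bin,t)$$

The improved a priori SNR is:

$$\eta_2(bin,t) = \max\left(\frac{P_{den2}(bin,t)}{P_{noise}(bin,t)},\ \eta_{TH}^2\right), \qquad \eta_{TH} = -22\ \text{dB}$$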
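The chain from a priori SNR to the improved a priori SNR can be sketched per bin as follows. It works from the square-root (magnitude) estimates, squaring them to form power ratios, and assumes the −22 dB floor is applied to η_2 as a power ratio (η_TH² with η_TH = −22 dB in amplitude terms, i.e. 10^(−2.2) ≈ 0.0063). Noise estimates are assumed nonzero.

```python
def improved_snr(den_sqrt: list[float], noise_sqrt: list[float],
                 psd_sqrt: list[float], floor_db: float = -22.0) -> list[float]:
    """Return eta_2(bin,t) from magnitude-domain estimates.

    den_sqrt = P^{1/2}_den, noise_sqrt = P^{1/2}_noise, psd_sqrt = P^{1/2}_in_PSD.
    """
    eta_floor = 10.0 ** (floor_db / 10.0)  # eta_TH^2 with eta_TH = -22 dB (amplitude)
    out = []
    for d, n, p in zip(den_sqrt, noise_sqrt, psd_sqrt):
        eta = d * d / (n * n)     # a priori SNR: P_den / P_noise
        h = eta / (1.0 + eta)     # Wiener transfer function H
        den2 = h * p              # P^{1/2}_den2 = H * P^{1/2}_in_PSD
        out.append(max(den2 * den2 / (n * n), eta_floor))
    return out
```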

Voice Activity Detection

• The VAD is used to detect noise frames.

• Find the frame energy:


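If t < 10 (frame index):

long-term energy factor LTE = 1 - 1/t

Else LTE = 0.97

Calculate the log frame energy over the M = 80 samples of the current frame:

$$frameEn = 0.5 + \frac{16}{\ln 2}\,\ln\left(\frac{64 + \sum_{n=0}^{M-1} s_{in}(n)^2}{64}\right)$$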
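The log frame energy can be computed as below. This sketch assumes the 16/ln 2 scaling used in ETSI ES 202 050, which places frameEn in the same range as the VAD constants used below (the mean-energy floor of 80 and the 15 and 20 margins).

```python
import math

M = 80  # samples per frame (10 ms at 8 kHz)


def frame_energy(frame: list[float]) -> float:
    """Log frame energy: 0.5 + (16/ln 2) * ln((64 + sum s_in(n)^2) / 64)."""
    energy = 64.0 + sum(s * s for s in frame)
    return 0.5 + (16.0 / math.log(2.0)) * math.log(energy / 64.0)
```

The scale is 16 units per doubling of (64 + energy), so a silent frame gives 0.5 and a frame whose squared samples sum to 64 gives 16.5.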


• Use frame energy to update mean energy:


If frame energy - mean energy < 20 (SNR threshold) or t < 10

Then if (frameEn < meanEn) or (t < 10)

meanEn = meanEn + (1 - LTE ) * (frameEn - meanEn)

Else meanEn = meanEn+(1 - 0.99) * (frameEn - meanEn)

If (meanEn < 80)

meanEn = 80
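The mean-energy update above can be written as a single function. This is a sketch with illustrative names that recomputes LTE from the frame index t as defined above.

```python
def update_mean_energy(mean_en: float, frame_en: float, t: int) -> float:
    """Track the long-term (noise) mean energy used by the VAD."""
    lte = 1.0 - 1.0 / t if t < 10 else 0.97  # long-term energy factor
    if frame_en - mean_en < 20.0 or t < 10:   # 20 = SNR threshold
        if frame_en < mean_en or t < 10:
            mean_en += (1.0 - lte) * (frame_en - mean_en)
        else:
            mean_en += (1.0 - 0.99) * (frame_en - mean_en)
    return max(mean_en, 80.0)  # lower bound on the mean energy
```

The asymmetry is deliberate: after start-up the mean follows falling energy at rate 0.03 but rising energy only at rate 0.01, and frames 20 or more above the mean (likely speech) do not move it at all.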


• Is the current frame speech?

If t > 4

if (frameEn - meanEn) > 15

IT IS SPEECH

nbSpeechFrame++

else if nbSpeechFrame > 4

hangover = 15, nbSpeechFrame = 0

if (hangover != 0)

hangover = hangover - 1, IT IS SPEECH

else IT IS NOT SPEECH
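The speech/non-speech decision with hangover can be kept as a small state machine. Two points are assumptions rather than slide content: frames with t ≤ 4 are reported as non-speech, and the hangover counter is decremented once per frame so that it eventually expires.

```python
class SpeechDecision:
    """Speech / non-speech decision with hangover, following the rules above."""

    def __init__(self) -> None:
        self.nb_speech_frame = 0
        self.hangover = 0

    def is_speech(self, frame_en: float, mean_en: float, t: int) -> bool:
        if t <= 4:
            return False  # assumption: the first frames are treated as non-speech
        if frame_en - mean_en > 15.0:
            self.nb_speech_frame += 1
            return True
        if self.nb_speech_frame > 4:
            # A speech burst of more than 4 frames just ended: start the hangover.
            self.hangover = 15
            self.nb_speech_frame = 0
        if self.hangover != 0:
            self.hangover -= 1  # assumption: the hangover counts down once per frame
            return True
        return False
```

With 6 consecutive speech frames followed by silence, this detector keeps reporting speech for 15 further frames before switching to non-speech.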

Mel Filter Bank

• The linear-frequency Wiener filter coefficients are smoothed and transformed to the Mel-frequency scale.

• The mel scale is a scale of pitches judged by listeners to be equally spaced from one another.
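The slides do not specify the filter shapes, so the following is only a sketch of one common construction, under stated assumptions: 25 triangular Mel filters (matching the k = 0..24 range of the Mel IDCT) with centers equally spaced on the Mel scale m = 2595·log10(1 + f/700) between 0 Hz and 4 kHz, applied to 65 linear bins spaced 62.5 Hz apart. The ETSI standard defines its own center frequencies and weighting, which may differ.

```python
import math

F_SAMP = 8000.0
N_BINS = 65   # linear bins 0..64, spaced F_SAMP / 128 = 62.5 Hz apart (assumption)
N_MEL = 25    # Mel channels incl. the DC and Nyquist edge channels (assumption)


def hz_to_mel(f: float) -> float:
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m: float) -> float:
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_centers() -> list[float]:
    """Center frequencies equally spaced on the Mel scale from 0 Hz to F_SAMP/2."""
    top = hz_to_mel(F_SAMP / 2)
    centers = [mel_to_hz(top * k / (N_MEL - 1)) for k in range(N_MEL)]
    centers[0], centers[-1] = 0.0, F_SAMP / 2  # pin the edges exactly
    return centers


def triangle(f: float, lo: float, c: float, hi: float) -> float:
    """Triangular weight peaking at 1 at the center frequency c."""
    if f == c:
        return 1.0
    if lo < f < c:
        return (f - lo) / (c - lo)
    if c < f < hi:
        return (hi - f) / (hi - c)
    return 0.0


def to_mel(h_lin: list[float]) -> list[float]:
    """Weighted average of linear-frequency gains under each triangular Mel filter."""
    centers = mel_centers()
    freqs = [b * F_SAMP / (2 * (N_BINS - 1)) for b in range(N_BINS)]
    h_mel = []
    for k, c in enumerate(centers):
        lo = centers[k - 1] if k > 0 else c          # DC channel: half triangle
        hi = centers[k + 1] if k < N_MEL - 1 else c  # Nyquist channel: half triangle
        weights = [triangle(f, lo, c, hi) for f in freqs]
        h_mel.append(sum(w * h for w, h in zip(weights, h_lin)) / sum(weights))
    return h_mel
```

Normalizing by the weight sum keeps a flat linear gain flat on the Mel scale, which is what smoothing Wiener gains (rather than energies) calls for.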


Mel IDCT

• The time-domain impulse response of the Wiener filter is computed from the Mel-Wiener filter coefficients using a Mel-warped inverse Discrete Cosine Transform (Mel IDCT):


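$$h_{WF}(n) = \sum_{k=0}^{24} H_{mel}(k)\, IDCT_{mel}(k,n), \qquad 0 \le n \le 24$$

$$IDCT_{mel}(k,n) = \cos\left(\frac{2\pi\, n\, f_{centr}(k)}{f_{samp}}\right) df(k)$$

$$df(k) = \frac{f_{centr}(k+1) - f_{centr}(k-1)}{f_{samp}}$$

where f_centr(k) is the center frequency of Mel band k and f_samp is the sampling frequency.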
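The Mel-warped IDCT can be evaluated directly from the three equations above. This sketch assumes that at the two edge bands, where f_centr(k − 1) or f_centr(k + 1) does not exist, the band's own center frequency is used in its place.

```python
import math

F_SAMP = 8000.0


def mel_idct_impulse_response(h_mel: list[float], f_centr: list[float],
                              f_samp: float = F_SAMP) -> list[float]:
    """h_WF(n) = sum_k H_mel(k) * cos(2*pi*n*f_centr(k)/f_samp) * df(k), n = 0..K."""
    last = len(h_mel) - 1
    df = []
    for k in range(last + 1):
        # Assumption: at the edges the missing neighbor is replaced by the band itself.
        lo = f_centr[k - 1] if k > 0 else f_centr[k]
        hi = f_centr[k + 1] if k < last else f_centr[k]
        df.append((hi - lo) / f_samp)
    return [
        sum(h_mel[k] * math.cos(2 * math.pi * n * f_centr[k] / f_samp) * df[k]
            for k in range(last + 1))
        for n in range(last + 1)
    ]
```

As a sanity check, an all-pass Mel filter (H_mel ≡ 1) with centers spanning 0 Hz to f_samp/2 gives h_WF(0) = Σ df(k) = 2(f_centr(24) − f_centr(0))/f_samp = 1 under this edge convention.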

Gain Factorization

• Factorization of the Wiener filter Mel-warped coefficients is performed to control the aggression of noise reduction in the second stage.

• The de-noised frame signal energy is calculated as:


$$E_{den}(t) = \sum_{bin=0}^{64} P^{1/2}_{den3}(bin,t)$$


• The noise energy of the current frame is estimated as:


$$E_{noise}(t) = \sum_{bin=0}^{64} P^{1/2}_{noise}(bin,t)$$


• The smoothed SNR is evaluated from the de-noised and noise energies of the last 3 frames:


$$Ratio = \frac{E_{den}(t)\, E_{den}(t-1)\, E_{den}(t-2)}{E_{noise}(t)\, E_{noise}(t-1)\, E_{noise}(t-2)}$$

If (Ratio > 0.0001)

Then

SNRavg(t) = 6.67 × log10 (Ratio)

Else

SNRavg(t) = -33.3
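The three-frame energy ratio and SNR_avg(t) can be sketched as follows, using the constants on the slide (6.67 ≈ 20/3, the 0.0001 ratio threshold and the −33.3 floor). E_den and E_noise are sums of the magnitude estimates over all bins; using only the frames available during the first two frames is an assumption.

```python
import math
from collections import deque


class SnrAverage:
    """SNR_avg(t) from the frame energies of the last 3 frames."""

    def __init__(self) -> None:
        self.e_den = deque(maxlen=3)
        self.e_noise = deque(maxlen=3)

    def update(self, den_sqrt: list[float], noise_sqrt: list[float]) -> float:
        self.e_den.append(sum(den_sqrt))      # E_den(t)
        self.e_noise.append(sum(noise_sqrt))  # E_noise(t)
        # Assumption: before 3 frames exist, use the frames seen so far.
        ratio = math.prod(self.e_den) / math.prod(self.e_noise)
        if ratio > 0.0001:
            return 6.67 * math.log10(ratio)  # 6.67 ~ 20/3
        return -33.3
```

The factor 20/3 turns the product of three magnitude ratios into the average of their three values in dB.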


• To decide the degree of aggression, the SNR is tracked:


If {(SNRavg(t) – SNRlow-track(t-1)) < 10 or t < 10}

calculate λSNR(t)

SNRlow-track(t) = λSNR(t)× SNRlow-track(t -1)+(1- λSNR(t))×SNRavg(t)

Else

SNRlow-track(t) = SNRlow-track(t -1)
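The low-SNR tracker is a one-line recursion. The slide says only "calculate λSNR(t)" without giving the rule, so this sketch takes λ_SNR as an argument rather than inventing one.

```python
def track_low_snr(snr_avg: float, snr_low_prev: float, t: int, lam_snr: float) -> float:
    """Update SNR_low-track(t); lam_snr is supplied by the caller."""
    if (snr_avg - snr_low_prev) < 10.0 or t < 10:
        return lam_snr * snr_low_prev + (1.0 - lam_snr) * snr_avg
    return snr_low_prev  # large SNR jump (likely speech): keep the previous track
```

Because frames whose SNR_avg rises 10 dB or more above the track are skipped once t ≥ 10, SNR_low-track follows the SNR of noise-dominated frames.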


• Gain factorization applies more aggressive noise reduction to purely noisy frames and less to frames containing speech.

• The aggression coefficient takes on a value of 10% for speech + noise frames and 80% for noise frames.
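The slides give the two aggression levels but not the formula that applies them. One simple realization, assumed here, is a linear blend between no attenuation and the full Mel-Wiener gain, H'(k) = (1 − α) + α·H_mel(k), with α = 0.8 for noise frames and α = 0.1 for speech + noise frames.

```python
def factorize_gain(h_mel: list[float], alpha: float) -> list[float]:
    """Blend Mel-Wiener gains toward 1: alpha = 1 applies the full filter, 0 bypasses it."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [(1.0 - alpha) + alpha * h for h in h_mel]


NOISE_ONLY_ALPHA = 0.8    # 80% for noise frames
SPEECH_NOISE_ALPHA = 0.1  # 10% for speech + noise frames
```

Under this blend a bin the Wiener filter would remove completely (H_mel = 0) is attenuated to 0.2 in noise frames but only to 0.9 in speech frames.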


Apply Filter

• The causal impulse response is obtained from h_WF(n), then truncated and weighted by a Hanning window.

• The input signal is filtered with the filter impulse response to produce the noise-reduced signal.
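The filtering step can be sketched as below. It relies on h_WF(n) being a cosine sum, hence even in n, so the full response can be mirrored about n = 0 and shifted to make it causal. The 17-tap truncation length and the reuse of the Hanning formula from spectrum estimation are assumptions, since the slide gives neither.

```python
import math


def causal_taps(h_wf: list[float], length: int = 17) -> list[float]:
    """Build a causal, truncated, Hanning-weighted FIR from h_WF(0..K).

    h_WF is even in n, so h(-n) = h(n); the symmetric response is shifted
    by K samples to make it causal, then the central `length` taps are kept.
    """
    k_max = len(h_wf) - 1
    if length % 2 == 0 or length > 2 * k_max + 1:
        raise ValueError("length must be odd and at most 2*K + 1")
    symmetric = [h_wf[abs(n)] for n in range(-k_max, k_max + 1)]
    half = (length - 1) // 2
    trunc = symmetric[k_max - half: k_max + half + 1]
    window = [0.5 - 0.5 * math.cos(2 * math.pi * (n + 0.5) / length) for n in range(length)]
    return [h * w for h, w in zip(trunc, window)]


def fir_filter(signal: list[float], taps: list[float]) -> list[float]:
    """Direct-form FIR: y(n) = sum_k taps[k] * x(n - k), zero initial state."""
    return [
        sum(taps[k] * signal[n - k] for k in range(len(taps)) if n - k >= 0)
        for n in range(len(signal))
    ]
```

The resulting filter is linear-phase, so it delays the signal by (length − 1)/2 = 8 samples without phase distortion.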


Offset Compensation

• A filter is used to remove the DC offset over the frame length interval (80 samples).


$$s_{nr\_of}(n) = s_{nr}(n) - s_{nr}(n-1) + \left(1 - \frac{1}{1024}\right) s_{nr\_of}(n-1)$$

where s_nr is the noise-reduced signal and s_nr_of is the offset-compensated output.
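The offset-compensation recursion can be written as a small function that carries its state between 80-sample frames. This is a sketch with illustrative names; passing the returned state into the next call gives exactly the same output as filtering the whole signal at once.

```python
def offset_compensation(s_nr: list[float], prev_in: float = 0.0,
                        prev_out: float = 0.0) -> tuple[list[float], float, float]:
    """s_nr_of(n) = s_nr(n) - s_nr(n-1) + (1 - 1/1024) * s_nr_of(n-1).

    Returns the output plus the filter state (last input, last output).
    """
    out = []
    for x in s_nr:
        y = x - prev_in + (1.0 - 1.0 / 1024.0) * prev_out
        out.append(y)
        prev_in, prev_out = x, y
    return out, prev_in, prev_out
```

For a constant input the output after the first sample is (1 − 1/1024)^n times that constant, so a DC offset decays with a time constant of about 1024 samples (roughly 0.13 s at 8 kHz).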

Results

Noisy test file:

After de-noise:

Results

Footloose:

Not Footloose:

Results: why didn’t this work?

Hair dryer:

Still there?!?!:

Results

Hair dryer:

Gone:
