

  • 2010 International Conference on Computer, Mechatronics, Control and Electronic Engineering (CMCE)

    The Application of Hilbert-Huang Transform in Speech Enhancement

    Liwei Liu College of Computer Science and Engineering

    Changchun University of Technology Changchun, China

    E-mail: liuliwei@mail.ccut.edu.cn

    Abstract-The Hilbert-Huang Transform (HHT) is a new and powerful method for analyzing nonlinear and non-stationary signals, and it describes the local features of dynamic signals efficiently. This paper briefly introduces the HHT, verifies its validity on a simulated example, presents a speech enhancement algorithm based on the HHT, and compares it with wavelet-based speech denoising. Simulation experiments show that HHT-based denoising improves the SNR, clarity and intelligibility of the speech signal. The HHT is therefore well suited to speech signal processing.

    Keywords-Hilbert-Huang transform; EMD; time-space filter; speech enhancement

    I. INTRODUCTION

    Nonlinear and non-stationary data processing is a necessary part of both pure research and practical applications. In 1998, N. E. Huang [1] presented a new and powerful method for analyzing nonlinear and non-stationary time series data. The method is composed of two parts, Empirical Mode Decomposition (EMD) and the Hilbert Transform (HT). The EMD is an adaptive decomposition with which any complicated signal can be decomposed into its Intrinsic Mode Functions (IMFs). With the HT, the IMFs yield instantaneous frequencies as a function of time. The final result is a three-dimensional energy-frequency-time spectrum, designated the Hilbert spectrum.

    Practical applications of the HHT are spread across numerous scientific disciplines, e.g. the study of gravity wave characteristics in the middle atmosphere to derive useful physical insights into dispersive-dissipative wave phenomena [2], and the ages of large-amplitude coastal seiches on the Caribbean coast [3]. Further, the HHT has been used in other fields of geophysics, e.g. to examine earthquake processes, to determine the dispersion curves of seismic surface waves, and to study the effects of seismic motions on the condition of buildings and structures in civil engineering [4]. Moreover, the HHT is used in tsunami research to detect earthquake-generated water waves in data series recorded by bottom pressure transducers in the Northern Pacific, and to examine the response of New Zealand coastal waters to the Peru tsunami [5]. Additionally, the EMD is used in automatic human gait analysis, which is becoming increasingly important in human gesture recognition, where gait serves as an individual biometric characteristic [6].

    978-1-4244-7956-6/10/$26.00 ©2010 IEEE

    Geheng Chen, Feng Qian College of Computer Science and Engineering

    Changchun University of Technology Changchun, China

    E-mail: [email protected]

    This paper is organized as follows. A basic presentation of the HHT is given in Section II. In Section III, a computer simulation example is analyzed to verify the validity of the HHT in processing nonlinear and non-stationary signals. In Section IV, an approach to speech enhancement based on the HHT is presented and then used to analyze some speech signals, and the corresponding experimental results are shown. Conclusions are given in Section V.

    II. HILBERT-HUANG TRANSFORM

    Compared with other data analysis methods, the innovation of the HHT is the introduction of the IMF, which guarantees a physically meaningful instantaneous frequency. The HHT consists of two processes [1].

    A. Empirical Mode Decomposition

    The EMD procedure sifts the original data series until the signal is adaptively decomposed into a number of IMFs. Every IMF must satisfy two properties: (1) the number of extrema and the number of zero crossings are either equal or differ by one; (2) the mean value of the envelopes defined by the local maxima and the local minima is zero. A special sifting process is employed to extract all of the IMFs. This sifting process is described as follows.

    First, the upper and lower envelopes of the signal $x(t)$ are calculated, together with their mean value $m_1(t)$. The first step of the sifting process is to calculate the difference:

    $$h_1(t) = x(t) - m_1(t) \qquad (1)$$

    However, $h_1(t)$ rarely satisfies the two IMF properties, so it usually cannot be taken as the first IMF of the signal straightaway. Therefore, the sifting usually has to be repeated, with the "difference" obtained in the previous sifting taken as the "signal" in the present sifting. If, after the $(k+1)$th sifting, the corresponding difference $h_{1k}(t)$,

    $$h_{1k}(t) = h_{1(k-1)}(t) - m_{1k}(t) \qquad (2)$$

    satisfies the IMF properties, then it can be taken as the first IMF component, denoted by $c_1(t)$, that is:

    $$c_1(t) = h_{1k}(t) \qquad (3)$$

    In practice, to determine whether or not $h_{1k}(t)$ satisfies the IMF properties well enough, the so-called standard deviation (SD) criterion is usually used, that is, one checks whether the following inequality holds [1]:


  • $$SD(k) = \sum_{t=0}^{T} \frac{\left| h_{1(k-1)}(t) - h_{1k}(t) \right|^2}{h_{1(k-1)}^2(t)} < 0.2 \sim 0.3 \qquad (4)$$

    where $T$ is the length of the data. Next, taking the remaining data

    $$r_1(t) = x(t) - c_1(t) \qquad (5)$$

    as a "new" signal and applying the sifting process to it, we obtain the second IMF $c_2(t)$. This procedure is repeated $n$ times, until the last residue $r_n(t)$ becomes a monotonic function. When the decomposition is finished, the signal can be expressed as:

    $$x(t) = \sum_{i=1}^{n} c_i(t) + r_n(t) \qquad (6)$$

    where $c_1(t), c_2(t), \ldots, c_n(t)$ are all of the IMFs contained in the signal, and $r_n(t)$ is a negligible residue.
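    The sifting loop of Eqs. (1) to (6) can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the implementation used in the paper: the envelopes are built by linear interpolation between extrema (the original method of [1] uses cubic splines), the end points are handled by simply holding the first and last extremum values, and the function names `envelope_mean`, `sift_imf` and `emd` as well as the limits `sd_limit`, `max_iter` and `max_imfs` are hypothetical.

```python
import numpy as np

def envelope_mean(h):
    """Mean m(t) of the upper and lower envelopes of h, or None if h has
    too few extrema. Assumption: linear interpolation between extrema."""
    t = np.arange(len(h))
    d = np.diff(h)
    # interior local maxima / minima from sign changes of the first difference
    maxima = np.where((d[:-1] > 0) & (d[1:] <= 0))[0] + 1
    minima = np.where((d[:-1] < 0) & (d[1:] >= 0))[0] + 1
    if len(maxima) < 2 or len(minima) < 2:
        return None
    # np.interp holds the first/last extremum value beyond the end points
    upper = np.interp(t, maxima, h[maxima])
    lower = np.interp(t, minima, h[minima])
    return 0.5 * (upper + lower)

def sift_imf(x, sd_limit=0.3, max_iter=50):
    """Extract one IMF from x by repeated sifting, Eqs. (1)-(4)."""
    h_prev = x
    for _ in range(max_iter):
        m = envelope_mean(h_prev)
        if m is None:
            break
        h = h_prev - m                                          # Eqs. (1), (2)
        # SD criterion of Eq. (4); the small constant avoids division by zero
        sd = np.sum((h_prev - h) ** 2 / (h_prev ** 2 + 1e-12))
        h_prev = h
        if sd < sd_limit:
            break
    return h_prev                                               # Eq. (3)

def emd(x, max_imfs=10):
    """Decompose x into IMFs c_i and a residue r so that x = sum(c_i) + r."""
    r = np.asarray(x, dtype=float).copy()
    imfs = []
    for _ in range(max_imfs):
        if envelope_mean(r) is None:   # too few extrema left: stop
            break
        c = sift_imf(r)
        imfs.append(c)
        r = r - c                                               # Eq. (5)
    return imfs, r
```

    Because each IMF is subtracted from the running residue, the reconstruction of Eq. (6) holds by construction, whatever envelope interpolation is chosen.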

    B. Hilbert Transform

    As mentioned above, the main purpose of the EMD is to make it possible to apply the HT and obtain the Hilbert spectrum, which is similar to a wavelet spectrum. After applying the HT to every IMF component $c_i(t)$, we have a new data series $y_i(t)$ in the transform domain:

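    $$y_i(t) = \frac{P}{\pi} \int_{-\infty}^{\infty} \frac{c_i(\tau)}{t - \tau} \, d\tau \qquad (7)$$

    where $P$ indicates the Cauchy principal value. With this definition, a complex series $z_i(t)$ is formed:

    $$z_i(t) = c_i(t) + j y_i(t) = a_i(t) e^{j\theta_i(t)} \qquad (8)$$

    where

    $$a_i(t) = \sqrt{c_i^2(t) + y_i^2(t)} \qquad (9)$$

    $$\theta_i(t) = \arctan \frac{y_i(t)}{c_i(t)} \qquad (10)$$

    and the instantaneous frequency (IF) is:

    $$\omega_i(t) = \frac{d\theta_i(t)}{dt} \qquad (11)$$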

    Compared with the traditional FFT, $a_i(t)$ and $\omega_i(t)$ derived by the HHT are functions of time $t$ rather than constants, so the HHT can show how the power varies with time.
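    For sampled data, Eqs. (7) to (11) are commonly evaluated through the discrete analytic signal. The sketch below is one possible way to do this with NumPy, under the following assumptions: the Hilbert transform is computed with the FFT method (which treats the data as periodic), the phase is obtained with the quadrant-aware `np.angle` and unwrapped, and the derivative of Eq. (11) is approximated by a first difference. The function names and the sampling-rate parameter `fs` are hypothetical.

```python
import numpy as np

def analytic_signal(c):
    """z = c + j*y, where y is the Hilbert transform of c (FFT method)."""
    c = np.asarray(c, dtype=float)
    n = len(c)
    spectrum = np.fft.fft(c)
    # keep DC (and Nyquist), double positive frequencies, zero negative ones
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        weights[1:n // 2] = 2.0
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * weights)

def instantaneous_amp_freq(c, fs):
    """Amplitude a(t), Eq. (9), and instantaneous frequency in Hz, Eq. (11)."""
    z = analytic_signal(c)
    amplitude = np.abs(z)                       # sqrt(c^2 + y^2)
    phase = np.unwrap(np.angle(z))              # theta(t), Eq. (10), unwrapped
    freq_hz = np.diff(phase) * fs / (2.0 * np.pi)
    return amplitude, freq_hz
```

    Applied to each IMF in turn, the pairs of amplitude and frequency series are what is plotted against time in a Hilbert spectrum. Note that `freq_hz` is one sample shorter than the input because of the first difference.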

    III. ANALYSIS OF THE COMPUTER SIMULATION EXAMPLE

    In order to verify the effectiveness of the HHT in dealing with nonlinear and non-stationary signals, this paper analyzes a frequency-modulated signal with the analytic expression:

    $$x(t) = \left(1 + 0.2 \sin(2\pi \cdot 7.5 t)\right) \cos\left(2\pi \cdot 30 t + 0.5 \sin(2\pi \cdot 15 t)\right) + \sin(2\pi \cdot 150 t) \qquad (12)$$

    The signal is the superposition of two parts: one is an FM-AM signal with a 30 Hz fundamental frequency and a 15 Hz modulation frequency, the other is a 150 Hz sinusoid. Differentiating the phase of the FM-AM part gives its angular frequency $\omega(t)$:

    $$\omega(t) = 60\pi + 15\pi \cos(30\pi t) \qquad (13)$$

    so the frequency $f(t) = \omega(t) / 2\pi$ is:

    $$f(t) = 30 + 7.5 \cos(30\pi t) \qquad (14)$$

    The frequency fluctuates within [22.5, 37.5] Hz. The amplitude varies within [0.8, 1.2], and it does so at a rate of 7.5 Hz.
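    The test signal and its expected frequency law are easy to generate numerically, which gives a reference against which an HHT result can be checked. The sketch below rests on two assumptions that the text does not state explicitly: the amplitude-modulation term is taken to be a sine, 1 + 0.2 sin(2π·7.5t), and the sampling of "64 points per period" is read as 64 samples per period of the 30 Hz carrier, giving a sampling rate of 1920 Hz and 512 samples in total. All names are hypothetical.

```python
import math

FS = 64 * 30         # assumed sampling rate: 64 samples per 30 Hz period
N_SAMPLES = 8 * 64   # eight periods

def simulated_signal(t):
    """Eq. (12): FM-AM component plus a 150 Hz sinusoid (AM term assumed sine)."""
    am = 1.0 + 0.2 * math.sin(2 * math.pi * 7.5 * t)
    phase = 2 * math.pi * 30 * t + 0.5 * math.sin(2 * math.pi * 15 * t)
    return am * math.cos(phase) + math.sin(2 * math.pi * 150 * t)

def fm_am_frequency_hz(t):
    """Eq. (14): instantaneous frequency of the FM-AM component in Hz."""
    return 30.0 + 7.5 * math.cos(30 * math.pi * t)

times = [k / FS for k in range(N_SAMPLES)]
x = [simulated_signal(t) for t in times]
freqs = [fm_am_frequency_hz(t) for t in times]
```

    The extremes of `freqs` reproduce the stated range of 22.5 Hz to 37.5 Hz, since the cosine in Eq. (14) reaches both +1 and -1 on this sampling grid.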

    Figure 1. The result of EMD (plot titled "Empirical Mode Decomposition")

    Figure 1 shows the IMF components derived from x(t) by EMD. The signal is sampled over eight periods, with 64 points per period. IMF1 corresponds to the 150 Hz sinusoidal part. IMF2 corresponds to the FM-AM part; its waveform varies in both amplitude and spacing. The component labeled res is the residue.

    Figure 2. The energy-frequency-time spectrum (plot titled "Hilbert-Huang spectrum"; horizontal axis: time)

    Figure 2 shows the energy-frequency-time spectrum computed from the obtained IMFs. The horizontal axis is sampling time, the vertical axis is frequency, and the color shows the amplitude. Two frequency components appear in Figure 2. One stays at 150 Hz and does not change with time. The other fluctuates with time within [22.5, 37.5] Hz around the 30 Hz fundamental frequency; its color varies within [0.8, 1.2] and completes two cycles in the eight periods, which shows that the amplitude varies at 7.5 Hz. Thus the energy-frequency-time spectrum can extract how the frequency and amplitude of the signal vary with time.

    This example verifies that the HHT is a powerful method for analyzing nonlinear and non-stationary time series data.

  • IV. SPEECH ENHANCEMENT METHOD BASED ON HHT

    Speech is a typical non-stationary signal, but speech analysis and representation have long been based on the hypothesis of short-term stationarity and have used analysis methods designed for stationary signals. Although these methods have achieved great success in practical applications, their results still differ significantly from human perception. As the requirements on speech signal processing keep rising, analyzing speech with methods suited to nonlinear and non-stationary signals is attracting more and more attention. The HHT is an effective new analysis method that meets these requirements. In this paper, the time-space filtering characteristics of the EMD algorithm are applied to speech enhancement, and simulation experiments demonstrate the effectiveness of the method.

    The core of the HHT is the EMD algorithm. The EMD sifts the signal into a series of IMFs ordered from small scale to large scale. In time, each IMF represents a mode of a certain scale. In frequency, the sifting acts as a filtering process that proceeds from high frequency to low frequency. For example, if the signal is decomposed into $n$ IMF components, then a low-pass filter can be expressed as:

    $$x_l(t) = \sum_{i=l}^{n} c_i(t) + r_n(t) \qquad (15)$$

    a high-pass filter can be expressed as:

    $$x_h(t) = \sum_{i=1}^{h} c_i(t) \qquad (16)$$

    and a band-pass filter can be expressed as:

    $$x_b(t) = \sum_{i=h}^{l} c_i(t) \qquad (17)$$
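    Since the IMFs are ordered from high to low frequency, the three filters of Eqs. (15) to (17) reduce to partial sums over the IMF list. The following sketch illustrates this with plain Python lists. The function name `emd_filters`, the helper `_sum_signals`, and the requirement 1 ≤ h ≤ l ≤ n on the 1-based cut-off indices are assumptions made for the example.

```python
def _sum_signals(signals, length):
    """Sample-wise sum of a list of equally long signals."""
    total = [0.0] * length
    for signal in signals:
        for k, value in enumerate(signal):
            total[k] += value
    return total

def emd_filters(imfs, residue, h_idx, l_idx):
    """Low-pass, high-pass and band-pass outputs of Eqs. (15)-(17).

    imfs    : list of IMFs c_1..c_n, highest frequency first
    residue : r_n
    h_idx, l_idx : 1-based indices h and l, assumed 1 <= h <= l <= n
    """
    n = len(imfs)
    if not 1 <= h_idx <= l_idx <= n:
        raise ValueError("expected 1 <= h <= l <= n")
    size = len(residue)
    low = _sum_signals(imfs[l_idx - 1:] + [residue], size)   # c_l..c_n + r_n
    high = _sum_signals(imfs[:h_idx], size)                  # c_1..c_h
    band = _sum_signals(imfs[h_idx - 1:l_idx], size)         # c_h..c_l
    return low, high, band
```

    The cut-off indices play the role that cut-off frequencies play in a conventional filter, but they are chosen per signal because the frequency content of each IMF depends on the data.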

    Based on the above principles, this paper puts forward an HHT-based speech enhancement algorithm for broadband additive noise. In the experiment, the speech signal $s(t)$ is recorded, sampled at 8 kHz and quantized to 16 bits; the content is the word "open" spoken by a Chinese female speaker. Gaussian white noise is superimposed on this clean speech signal. The noise variance $\sigma^2$ is varied to produce seven groups of signals with SNRs of -10 dB, -6 dB, -3 dB, 0 dB, 3 dB, 6 dB and 10 dB respectively.
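    Building a noisy test signal at a prescribed SNR amounts to choosing the noise variance from the signal power. The sketch below shows one common way to do this; the paper does not give its procedure, so the power-ratio definition of SNR, the function name `add_white_noise` and the fixed `seed` are assumptions.

```python
import math
import random

def add_white_noise(clean, snr_db, seed=0):
    """Add white Gaussian noise so that the expected SNR equals snr_db.

    Assumes SNR = 10*log10(P_signal / sigma^2), with P_signal the mean
    square of the clean samples. Returns the noisy signal and sigma.
    """
    rng = random.Random(seed)
    p_signal = sum(v * v for v in clean) / len(clean)
    sigma = math.sqrt(p_signal / 10 ** (snr_db / 10.0))
    noisy = [v + rng.gauss(0.0, sigma) for v in clean]
    return noisy, sigma
```

    Calling the function with different seeds yields independent noisy samples of the same clean signal, which is what an averaged evaluation needs.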

    Figure 3. (a) Original pure speech signal (b) Noisy speech signal at 3 dB SNR

    The original clean speech signal is plotted in Figure 3(a), and the noisy speech signal at 3 dB SNR is plotted in Figure 3(b). The noisy speech signal is processed with the HHT-based denoising method and with the wavelet soft-threshold and hard-threshold methods, and the results are then compared.

    Figure 4. The IMF components and residue derived from the noisy speech signal

    First, the noisy speech signal is decomposed into IMFs by the EMD method. The IMF components and residue derived from the noisy speech signal (Figure 3(b)) are shown in Figure 4. IMF1 to IMF5 contain the high-frequency components of the signal, and most of the noise is contained in them. However, if a low-pass filter were applied directly, useful speech would be lost, because the spectrum of the broadband noise overlaps the speech spectrum. Therefore the high-frequency IMFs are processed with the soft-threshold method used in wavelet denoising: for each IMF component, a scale-dependent threshold is used to identify the samples that carry little energy. Samples whose magnitude is less than or equal to the threshold are set to zero, and only the part above the threshold is kept. The processing follows the formula below.

    $$IMF'_{i,n} = \begin{cases} \operatorname{sgn}(IMF_{i,n}) \left( |IMF_{i,n}| - \delta \right), & |IMF_{i,n}| > \delta \\ 0, & |IMF_{i,n}| \le \delta \end{cases} \qquad (18)$$

    where sgn is the sign function and $\delta$ is the threshold, calculated as:

    $$\delta = \frac{\sigma \sqrt{2 \log N}}{\ln(i + 1)} \qquad (19)$$

    Here $\sigma^2$ is the estimated noise variance. As the scale $i$ increases, the threshold decreases. After thresholding, the high-frequency IMFs and the low-frequency IMFs are summed to reconstruct the denoised speech signal. For comparison, speech enhancement is also carried out with the wavelet transform method (using the db5 wavelet base), with soft and hard thresholding respectively.
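    The thresholding rule of Eqs. (18) and (19) can be sketched as follows. Two points are assumptions rather than statements from the text: N is taken to be the number of samples in the IMF, and the logarithm under the square root is taken as the natural logarithm. The function names are hypothetical, and how the noise standard deviation `sigma` is estimated is left open.

```python
import math

def imf_threshold(sigma, n_samples, i):
    """Eq. (19): threshold for the i-th IMF (i = 1, 2, ...).

    Assumes N = n_samples and a natural logarithm inside the square root.
    """
    return sigma * math.sqrt(2.0 * math.log(n_samples)) / math.log(i + 1)

def soft_threshold(imf, delta):
    """Eq. (18): shrink samples above delta towards zero, zero the rest."""
    out = []
    for v in imf:
        if abs(v) > delta:
            out.append(math.copysign(abs(v) - delta, v))
        else:
            out.append(0.0)
    return out
```

    For example, with a threshold of 1.0 the samples 3.0 and -2.0 become 2.0 and -1.0, while 0.5 and -1.0 are set to zero. The denoised signal is then the sum of the thresholded high-frequency IMFs, the untouched low-frequency IMFs and the residue.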

    TABLE I. SNR MEASURE FOR THE ENHANCED SPEECH OBTAINED WITH THREE DIFFERENT METHODS

    SNR of Noisy     SNR of Enhanced Speech Signal
    Speech Signal    HHT method    Wavelet soft threshold method    Wavelet hard threshold method
    -10 dB           24 dB         18 dB                            19 dB
    -6 dB            23 dB         21 dB                            22 dB
    -3 dB            23 dB         23 dB                            24 dB
    0 dB             26 dB         26 dB                            26 dB
    3 dB             28 dB         24 dB                            27 dB
    6 dB             30 dB         25 dB                            26 dB
    10 dB            30 dB         24 dB                            28 dB

    Table I lists the SNR of the enhanced speech obtained when each of the three speech enhancement methods is applied to noisy speech at different input SNRs. To account for the randomness of the noise, each reported SNR is the average over 50 different noisy samples. The experimental results demonstrate that the HHT-based speech enhancement method outperforms the classical wavelet soft and hard threshold methods and effectively improves performance.
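    The paper does not state the formula used for the output SNR. A common choice, assumed in the sketch below, is the ratio of the clean-signal energy to the energy of the remaining error, averaged in dB over the noisy samples. The function names `snr_db` and `mean_snr_db` are hypothetical.

```python
import math

def snr_db(clean, estimate):
    """Assumed output SNR: 10*log10(sum(s^2) / sum((s - s_hat)^2))."""
    signal_energy = sum(s * s for s in clean)
    error_energy = sum((s - e) ** 2 for s, e in zip(clean, estimate))
    if error_energy == 0.0:
        return math.inf            # perfect reconstruction
    return 10.0 * math.log10(signal_energy / error_energy)

def mean_snr_db(clean, estimates):
    """Average of the per-trial SNRs in dB (e.g. over 50 noisy samples)."""
    return sum(snr_db(clean, e) for e in estimates) / len(estimates)
```

    Averaging the dB values, rather than the underlying power ratios, is itself an assumption; the two conventions give slightly different numbers when the per-trial SNRs vary.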

    V. CONCLUSION

    In this paper, we introduce the basic method of the HHT and the principle of its EMD. Based on the principle of the EMD and the filtering characteristics of its components (IMFs), a novel noise removal method is developed. The proposed algorithm has been tested and compared with conventional speech enhancement methods, namely the wavelet soft and hard threshold methods. The results show that our algorithm achieves better performance under the tested conditions. It removes more noise and improves the SNR of the speech. After enhancement, the articulation and intelligibility of the speech remain good.

    The HHT is a new theory with important theoretical value and wide application prospects. Nevertheless, it is not yet perfect, and problems such as curve fitting, end effects and mode mixing remain to be solved. The performance of the HHT-based speech enhancement algorithm depends strongly on how these problems are handled. This paper is a pilot study of speech enhancement based on HHT theory. Much work remains to be done to perfect the HHT and to apply it in the speech processing field.

    REFERENCES

    [1] N. E. Huang et al., "The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis", Proc. R. Soc. London A, Vol. 454, 1998, pp. 903-995.

    [2] X. Zhu et al., "Gravity wave characteristics in the middle atmosphere derived from the empirical mode decomposition method", Journal of Geophysical Research, Vol. 102, 1997, pp. 16545-16561.

    [3] N. E. Huang, H. H. Shih, Z. Shen, S. Long, "The ages of large amplitude coastal seiches on the Caribbean Coast of Puerto Rico", Journal of Physical Oceanography, Vol. 30, No. 8, August 2000, pp. 2001-2012.

    [4] A. D. Veltcheva, "Wave and group transformation by a Hilbert spectrum", Coastal Engineering Journal, Vol. 44, No. 4, April 2002, pp. 283-300.

    [5] D. G. Goring, "Response of New Zealand waters to the Peru tsunami of 23 June 2001", The Royal Society of New Zealand, Vol. 36, 2002, pp. 225-232.

    [6] W. Huang et al., "Nonlinear indicial response of complex nonstationary oscillations as pulmonary hypertension responding to step hypoxia", Proc. Natl. Acad. Sci., Vol. 96, No. 3, March 1999, pp. 1834-1839.