a robust algorithm for formant frequency extraction of noisy speech

Upload: rizwan-ishaq

Post on 01-Jun-2018


    A ROBUST ALGORITHM FOR FORMANT FREQUENCY EXTRACTION OF NOISY SPEECH

    Qifang Zhao, Tetsuya Shimamura, Jouji Suzuki

    Department of Information and Computer Sciences, Saitama University

    255 Shimo-Okubo, Urawa, Saitama, 338-8570 Japan

    email: [email protected] u.ac.jp

    ABSTRACT

    In this paper a new method for formant frequency estimation of noisy speech is proposed, based on linear prediction analysis. Algorithms based on linear prediction analysis can usually extract the formant frequencies effectively for clean speech. When speech is corrupted by noise, however, their performance degrades seriously. It is well known that the autocorrelation function concentrates the energy of white noise in the neighborhood of the zero lag. Utilizing this property, the proposed method extracts the formant frequencies from the autocorrelation function of the speech instead of the speech itself. The experimental results show that the proposed method is much more robust to noise than the conventional linear prediction based algorithms.

    1. INTRODUCTION

    Formant frequency estimation of voiced speech is an important part of speech processing and plays a major role in many applications. Various algorithms have been proposed to improve the extraction accuracy or the robustness to noise [1-4]. A frequently used approach for formant frequency estimation is linear prediction analysis (LPC) [5], which can extract the formant frequencies effectively by finding the roots of the prediction polynomial or by peak-picking of the linear prediction spectrum. The LPC based algorithms offer a readily implementable processing paradigm for real-time analysis of the speech waveform. When speech is corrupted by noise, however, the performance of the LPC scheme degrades seriously [6]. It is very difficult for the LPC based algorithms to extract the formant frequencies accurately from noisy speech.

    The speech signal can be divided into voiced speech and unvoiced speech. The formant frequencies are extracted from the voiced part of the speech, which is quasi-periodic. It is well known that the autocorrelation function (ACF) of a periodic signal possesses the same frequency components as the original periodic signal. Thus it is quite possible to extract the formant frequencies from the ACF of the speech instead of the speech itself. Moreover, the ACF concentrates the energy of white noise in the neighborhood of the zero lag. Therefore, by extracting the formant frequencies from the ACF signal of the speech (excluding the neighborhood of the zero lag), the influence of the noise can be reduced greatly.

    In this paper, by utilizing the properties of the ACF described above [7], we propose a new LPC-based method for formant frequency estimation, which is expected to be robust to white noise.

    2. THE PROPOSED METHOD

    Let $f(t)$ be a periodic signal with period $T$. It can be expanded by Fourier series as follows.

    $$f(t) = a_0 + \sum_{n=1}^{\infty} a_n \cos(n\omega t + \theta_n) \qquad (1)$$

    where $\omega = 2\pi/T$. Then the ACF of $f(t)$ is expressed as

    $$\phi(\tau) = a_0^2 + \frac{1}{2}\sum_{n=1}^{\infty} a_n^2 \cos(n\omega\tau) \qquad (2)$$

    The $\phi(\tau)$ satisfies the following properties.

    a) The ACF signal $\phi(\tau)$ is composed of the same frequency components as $f(t)$.

    b) The amplitude of each frequency component of $\phi(\tau)$ is proportional to the square of that of $f(t)$.

    c) If $f(t)$ is white noise, then the energy of $\phi(\tau)$ is concentrated on $\tau = 0$.
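    Properties a) to c) can be checked numerically. The following is a minimal sketch and not part of the original paper: it assumes NumPy, a 10 kHz sampling rate, a two-component test signal, and a biased time-average ACF estimate. The helper name `acf` and all test values are hypothetical.

```python
import numpy as np

def acf(x):
    """Biased time-average autocorrelation for lags 0..len(x)-1 (assumed estimator)."""
    n = len(x)
    full = np.correlate(x, x, mode="full")   # lags -(n-1)..(n-1)
    return full[n - 1:] / n

fs = 10_000                                  # sampling rate in Hz (assumption)
t = np.arange(4096) / fs
rng = np.random.default_rng(0)

# Periodic test signal: components at 500 Hz and 1500 Hz, amplitudes 1.0 and 0.5
clean = 1.0 * np.cos(2 * np.pi * 500 * t) + 0.5 * np.cos(2 * np.pi * 1500 * t + 0.7)
noise = rng.normal(0.0, 1.0, t.size)         # white Gaussian noise

phi_clean = acf(clean)
phi_noise = acf(noise)

# Properties a) and b): for small lags the ACF is close to
# (1/2) * sum a_n^2 * cos(n * omega * tau), i.e. same frequencies, squared amplitudes.
tau = np.arange(200) / fs
expected = (0.5 * 1.0**2 * np.cos(2 * np.pi * 500 * tau)
            + 0.5 * 0.5**2 * np.cos(2 * np.pi * 1500 * tau))
max_dev = np.max(np.abs(phi_clean[:200] - expected))

# Property c): the white-noise ACF is dominated by its zero-lag value.
zero_lag_ratio = phi_noise[0] / np.max(np.abs(phi_noise[1:]))

print(f"max deviation from (1/2) a_n^2 cos terms: {max_dev:.4f}")
print(f"noise ACF zero-lag value / largest other lag: {zero_lag_ratio:.1f}")
```

    The small remaining deviation in the first check comes from the finite frame length, since the biased estimate tapers the ACF linearly with the lag.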

    The application of the above properties is very successful in noise reduction for periodic signals. Since the voiced part of the speech signal is quasi-periodic, we consider that the properties described above are applicable to voiced speech. Thus according to property a), the formant frequencies can be estimated from the ACF signal of the speech. In this case the amplitude of the formants will be emphasized by property b). And by property c), the influence of white noise can be avoided by utilizing the ACF $\phi(\tau)$ from $\tau = \tau_0 > 0$ (we call $\tau_0$ the ACF delay), not from $\tau = 0$, because the ACF signal of a periodic signal is also a periodic signal.

    0-7803-4455-3/98/$10.00 © 1998 IEEE

    Based on these considerations, we propose a new method for formant frequency estimation that applies linear prediction analysis to the speech ACF. The proposed method is expected to be robust to white noise. Fig. 1 is the block-diagram of the proposed method.

    Pre-emphasis coefficient: 0.975. Window for LPC analysis: Hamming, 25.6 ms.

    Fig. 1 Block-diagram of the proposed method. Labels recovered from the diagram: Noisy Speech; Pre-emphasis; LPC Analysis; Peak-Picking; Formant frequencies. The two stages are marked "Pre-processing" and "Formant Frequencies Extraction by LPC Analysis".

    The proposed method is divided into two main steps: pre-processing and formant frequency extraction by the LPC algorithm. In the first step, the noisy speech is first divided into frames and pre-emphasized. Then the ACF of the speech is calculated. To avoid the influence of noise, only the $\tau = \tau_0 \sim N$ ($\tau_0 > 0$, $N$ is the frame length) part of the ACF signal is used as the input of the second step. In the second step, the ACF signal is windowed and pre-emphasized, then the LPC coefficients are computed by the autocorrelation method. Based on the LPC coefficients, the LPC spectrum is calculated by using the FFT. Finally, the first three formant frequencies are determined by peak-picking.
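    The two steps described above can be sketched in Python with NumPy. This is an illustrative sketch under stated assumptions, not the authors' implementation. The pre-emphasis coefficient (0.975) and the ACF delay (4 ms) are the values given in the paper. The LPC order (12), the FFT size (1024), the biased ACF estimator, and all function names are assumptions made here for illustration.

```python
import numpy as np

def pre_emphasize(x, coeff=0.975):
    """First-order pre-emphasis y[n] = x[n] - coeff * x[n-1]."""
    return np.append(x[0], x[1:] - coeff * x[:-1])

def acf(x):
    """Biased autocorrelation for lags 0..len(x)-1 (assumed estimator)."""
    n = len(x)
    return np.correlate(x, x, mode="full")[n - 1:] / n

def lpc_autocorr(x, order):
    """LPC coefficients a[0..order] (a[0] = 1) by the autocorrelation method,
    solved with the Levinson-Durbin recursion."""
    r = acf(x)[:order + 1]
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        k = -np.dot(a[:i], r[i:0:-1]) / err       # reflection coefficient
        a[:i + 1] += k * a[:i + 1][::-1]          # order update of the predictor
        err *= 1.0 - k * k                        # prediction error update
    return a

def lpc_spectrum_db(a, nfft=1024):
    """LPC log-magnitude spectrum 1/|A(e^jw)| in dB, computed with the FFT."""
    return -20.0 * np.log10(np.abs(np.fft.rfft(a, nfft)) + 1e-12)

def pick_peaks(spec_db, fs, nfft, count=3):
    """Frequencies (Hz) of the first `count` local maxima of the spectrum."""
    mid = spec_db[1:-1]
    idx = np.where((mid > spec_db[:-2]) & (mid >= spec_db[2:]))[0] + 1
    return idx[:count] * fs / nfft

def formants_from_acf(frame, fs, delay_s=0.004, order=12, nfft=1024):
    """Step 1: pre-emphasis and ACF with the lags below the ACF delay dropped.
    Step 2: window, pre-emphasis, LPC analysis, and peak-picking."""
    x = pre_emphasize(frame)
    phi = acf(x)
    tau0 = int(round(delay_s * fs))               # ACF delay in samples
    seg = phi[tau0:]                              # keep only lags tau0..N-1
    seg = pre_emphasize(seg * np.hamming(len(seg)))
    a = lpc_autocorr(seg, order)
    return pick_peaks(lpc_spectrum_db(a, nfft), fs, nfft)
```

    For a 25.6 ms frame sampled at 10 kHz (256 samples), `formants_from_acf(frame, fs=10_000)` returns up to three peak frequencies in ascending order. Replacing `seg` by the pre-emphasized, windowed frame itself gives the conventional LPC peak-picking baseline for comparison.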

    The proposal has two main characteristics. One is that the input signal of the LPC analysis is not a speech signal, but a speech ACF signal. The other is that the ACF of speech is computed and utilized from $\tau_0 > 0$ in order to avoid the influence of white noise.

    A pre-emphasis before the ACF calculation is necessary because after the ACF calculation the difference between the amplitude of the low frequency part and that of the high frequency part becomes much larger.
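    The need for pre-emphasis follows directly from property b). As a short worked example (the symbols $a_L$ and $a_H$ are introduced here for illustration only), let a low frequency component have amplitude $a_L$ and a high frequency component have amplitude $a_H$. After the ACF calculation their amplitudes are proportional to $a_L^2$ and $a_H^2$, so the level difference in decibels doubles:

```latex
20\log_{10}\frac{a_L^{2}}{a_H^{2}} = 2 \cdot 20\log_{10}\frac{a_L}{a_H}
```

    For instance, a 20 dB difference between the low and high frequency parts of the speech spectrum becomes a 40 dB difference in the spectrum of the ACF signal.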

    3. EXPERIMENTS AND RESULTS

    Japanese vowels "i, e, a, o, u" spoken by a male speaker are used to evaluate the effectiveness of the proposed method. First, a standard reference of formant frequencies for the speech data is created. The standard frequencies for the first three formants are obtained by peak-picking of the LPC spectrum of the noise-free speech and are verified manually.

    Gaussian white noise at the proper rms level is generated and added to the speech to test the performance of the proposed method. The noisy speech is divided into frames of length 25.6 ms and the frames are shifted by 5 ms.
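    The noise addition and framing can be sketched as follows. This is a minimal sketch, assuming NumPy and that the signal-to-noise ratio is defined over the whole utterance. The frame length (25.6 ms) and shift (5 ms) are the values stated above. The function names are hypothetical.

```python
import numpy as np

def add_white_noise(speech, snr_db, rng):
    """Add Gaussian white noise whose rms level gives the requested SNR (in dB)."""
    signal_power = np.mean(speech ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=speech.shape)
    return speech + noise

def split_frames(x, fs, frame_s=0.0256, shift_s=0.005):
    """Cut x into overlapping frames of frame_s seconds, shifted by shift_s seconds."""
    n = int(round(frame_s * fs))       # samples per frame
    hop = int(round(shift_s * fs))     # samples between frame starts
    if len(x) < n:
        return np.empty((0, n))
    count = 1 + (len(x) - n) // hop
    return np.stack([x[i * hop:i * hop + n] for i in range(count)])
```

    At a 10 kHz sampling rate this gives 256-sample frames with a 50-sample shift, so one second of speech yields 195 frames.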

    From the noisy speech the first three formant frequencies are estimated first by the conventional LPC algorithm, and then by the proposed method. Both utilize peak-picking of the LPC spectrum. The estimation results of each method are compared with the standard formant frequencies. The algorithms are evaluated by the average of the absolute error (in percentage). The average absolute error $\delta$ is defined as follows.

    $$\delta = \frac{|F_i - F_i^{s}|}{F_i^{s}} \quad (i = 1 \sim 3) \qquad (3)$$

    where $F_i$ is the estimated value and $F_i^{s}$ is the standard value of the $i$-th formant frequency.
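    A minimal sketch of this evaluation measure, assuming NumPy and that the per-frame relative errors are averaged over all frames and reported in percent. The array layout (frames by formants) and the function name are assumptions for illustration.

```python
import numpy as np

def average_absolute_error(estimated, standard):
    """Mean of |F_i - F_i^s| / F_i^s over frames, in percent, for each formant.

    Both inputs have shape (num_frames, 3): one row per frame, columns F1..F3.
    """
    estimated = np.asarray(estimated, dtype=float)
    standard = np.asarray(standard, dtype=float)
    rel = np.abs(estimated - standard) / standard
    return 100.0 * rel.mean(axis=0)

# Two frames with made-up estimates against a made-up standard (Hz)
est = [[510.0, 1480.0, 2600.0], [490.0, 1530.0, 2400.0]]
std = [[500.0, 1500.0, 2500.0], [500.0, 1500.0, 2500.0]]
errors = average_absolute_error(est, std)   # approximately [2.0, 1.667, 4.0] percent
```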

    The experimental parameters are set as follows.

    Table 1 Experimental parameter specification

    sampling rate            10 kHz
    LPC order                (illegible)
    ACF delay                4 ms
    Signal-to-Noise ratio    10 dB

    The ACF delay is set as 4 ms. It means that only the $\tau = 40 \sim N$ part of the ACF $\phi(\tau)$ of the speech signal is used.

    Table 2 Average absolute error (%) of the conventional LPC based algorithm

    vowel        F1        F2        F3
    i          6.62     39.58     29.22
    e          1.32     10.59     20.31
    a          4.79      2.59      6.74
    o          9.29    126.70     12.86
    u          7.48     12.64      3.86
    average    5.90     38.42     14.56

    Table 3 Average absolute error (%) of the proposed method

    vowel        F1        F2        F3
    i          2.04      1.19      4.73
    e          4.26      1.07      2.64
    a          5.90      0.69      1.33
    o          1.64      8.18     16.03
    u          3.07      9.57      3.17
    average    3.38      4.14      5.58

    The results for the extraction of the first three formant frequencies by the conventional LPC based algorithm are given in Table 2. The signal-to-noise ratio is 10 dB. Table 3 shows the results of the proposed method. A comparison of Table 2 and Table 3 shows that the proposed method brings a considerable overall improvement over the conventional method in the formant frequency results, especially for the second and third formants, whose frequencies are relatively high and are sensitive to noise.

    It may be noticed that for the first formant frequency of the vowels /a, e/ and the third formant frequency of the vowel /o/, the extraction results of the proposed method are a little worse than those of the conventional LPC algorithm. The reason is probably that voiced speech is a quasi-periodic signal, not a perfectly periodic one. Thus when the properties of the periodic signal described in Section 2 are applied to voiced speech, some error may occur. But in general, the proposed method gives a much more accurate estimation than the conventional LPC based algorithms in a noisy environment.

    Fig. 2 compares the linear prediction spectra of the vowel /a/. "original" denotes the linear prediction spectrum of the noise-free speech, while "conventional" and "proposed" denote those of noisy speech computed by the conventional and the proposed method respectively. Fig. 2 shows clearly that by using the proposed method, the formants are emphasized and the influence of noise is decreased, especially for the third formant.

    Figure 2 Linear prediction spectra of vowel /a/ computed by the conventional method (a) and the proposed method (b). Horizontal axis: Frequency [Hz], 0 to 5000.

    4. CONCLUSION

    A robust algorithm for extracting the formant frequencies from noisy speech is proposed. By applying linear prediction analysis to the speech autocorrelation function (ACF) instead of the speech itself, the formants are emphasized and the influence of noise is decreased. Experimental results show that the proposed method is robust to white noise and that a considerable improvement over the conventional LPC based method is achieved.

    In this paper the experiments are conducted at a signal-to-noise ratio of 10 dB. When the noise level becomes much higher, it is possible to improve the robustness of the proposed method to noise further by taking the autocorrelation function one more time, based on the existing ACF signal.
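    The idea of taking the autocorrelation function one more time can be sketched as below. This is a hedged illustration only: the paper does not specify how the second pass is carried out, so dropping the lags below the ACF delay after every pass, the biased estimator, and the function names are all assumptions.

```python
import numpy as np

def acf(x):
    """Biased autocorrelation for lags 0..len(x)-1 (assumed estimator)."""
    n = len(x)
    return np.correlate(x, x, mode="full")[n - 1:] / n

def iterated_acf(x, delay, times=2):
    """Apply the ACF `times` times, dropping lags below `delay` after each pass."""
    y = np.asarray(x, dtype=float)
    for _ in range(times):
        y = acf(y)[delay:]
    return y
```

    Each pass shortens the signal by `delay` samples, so a 256-sample frame with a 40-sample delay leaves 176 samples after two passes. This limits how often the operation can usefully be repeated on a single frame.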


    The proposed method is developed with the aim of improving the robustness of the conventional LPC based algorithms to white noise. In a non-white noise environment, the performance of our proposal may degrade to some extent, but it is expected to remain better than that of the conventional LPC based methods, because the peak of each formant is emphasized by the autocorrelation function and thus could become more robust to the influence of the noise.

    The authors would like to thank Prof. Yashima for his helpful advice.

    5. REFERENCES

    [1] S.S. McCandless, "An algorithm for automatic formant extraction using linear prediction spectra," IEEE Trans. on Acoustic, Speech and Signal Processing, ASSP-22, No.2, pp.135-141, 1974.

    [2] R.L. Christensen, W.J. Strong and E.P. Palmer, "A comparison of three methods of extracting resonance information from predictor-coefficient coded speech," IEEE Trans. on Acoustic, Speech and Signal Processing, ASSP-24, No.1, pp.8-14, Jan. 1974.

    [3] R.J. Niederjohn, M. Lahat, "A zero-crossing consistency method for formant tracking of voiced speech in high noise levels," IEEE Trans. on Acoustic, Speech and Signal Processing, ASSP-33, No.2, pp.349-355, April 1985.

    [4] G. Duncan, M.A. Jack, "Formant estimation algorithm based on pole focusing offering improved noise tolerance and feature resolution," IEE Proceedings, Vol.135, Pt.F, No.1, pp.18-32, Feb. 1988.

    [5] J.D. Markel, "Digital inverse filtering - A new tool for formant trajectory estimation," IEEE Trans. AU-20, pp.129-137, 1972.

    [6] Tierney, "A study of LPC analysis of speech in additive noise," IEEE Trans. on Acoustic, Speech and Signal Processing, ASSP-28, No.4, pp.389-397, 1980.

    [7] J. Suzuki, "Speech processing by splicing of autocorrelation function," Proc. IEEE Int. Conf. Acoustic, Speech and Signal Processing, pp.713-716, 1976.