a robust algorithm for formant frequency extraction of noisy speech


A ROBUST ALGORITHM FOR FORMANT FREQUENCY EXTRACTION OF NOISY SPEECH

Qifang Zhao, Tetsuya Shimamura, Jouji Suzuki

Department of Information and Computer Sciences, Saitama University
255 Shimo-Okubo, Urawa, Saitama, 338-8570 Japan
email: [email protected] u.ac.jp

ABSTRACT

In this paper a new method for formant frequency estimation of noisy speech is proposed based on linear prediction analysis. Algorithms based on linear prediction analysis can usually extract the formant frequencies effectively from clean speech. When speech is corrupted by noise, however, their performance degrades seriously. It is well known that the autocorrelation function concentrates the energy of white noise in the vicinity of the zero lag. Utilizing this property, the proposed method extracts the formant frequencies from the autocorrelation function of the speech instead of from the speech itself. The experimental results show that the proposed method is much more robust to noise than the conventional linear prediction based algorithms.

1 INTRODUCTION

Formant frequency estimation of voiced speech is an important part of speech processing and plays a major role in many applications. Various algorithms have been proposed with the aim of improving the extraction accuracy or the robustness to noise [1-4]. A frequently used approach for formant frequency estimation is linear prediction analysis (LPC) [5], which can extract the formant frequencies effectively by finding the roots of the prediction polynomial or by peak-picking of the linear prediction spectrum. The LPC based algorithms offer a readily implementable processing paradigm for real-time analysis of the speech waveform. When speech is corrupted by noise, however, the performance of the LPC scheme degrades seriously [6]. It is very difficult for the LPC based algorithms to extract the formant frequencies accurately from noisy speech.

The speech signal can be divided into voiced speech and unvoiced speech. The formant frequencies are extracted from the voiced part of the speech, which is quasi-periodic. It is well known that the autocorrelation function (ACF) of a periodic signal possesses the same frequency components as the original periodic signal. Thus it is quite possible to extract the formant frequencies from the ACF of the speech instead of from the speech itself. In addition, the ACF concentrates the energy of white noise in the vicinity of the zero lag. Therefore, by extracting the formant frequencies from the ACF signal of the speech (excluding the vicinity of the zero lag), the influence of the noise can be reduced greatly.

In this paper, by utilizing the properties of the ACF described above [7], we propose a new LPC-based method for formant frequency estimation, which is expected to be robust to white noise.

2 THE PROPOSED METHOD

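Let $f(t)$ be a periodic signal with period $T$. It can be expanded by Fourier series as follows.

\[ f(t) = A_0 + \sum_{n=1}^{\infty} A_n \cos(n\omega t + \theta_n) \qquad (1) \]

where $\omega = \dfrac{2\pi}{T}$. Then the ACF of $f(t)$ is expressed as

\[ \phi(\tau) = \frac{1}{T}\int_{0}^{T} f(t)\, f(t+\tau)\, dt = A_0^2 + \frac{1}{2}\sum_{n=1}^{\infty} A_n^2 \cos(n\omega\tau) \qquad (2) \]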

The $\phi(\tau)$ satisfies the following properties.

a) The ACF signal $\phi(\tau)$ is composed of the same frequency components as $f(t)$.

b) The amplitude of each frequency component of $\phi(\tau)$ is proportional to the square of that of $f(t)$.

c) If $f(t)$ is white noise, then the energy of $\phi(\tau)$ is concentrated on $\tau = 0$.
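Properties a) to c) can be checked numerically. The following is a minimal sketch, not taken from the paper: the biased ACF estimator, the 500 Hz test tone, its amplitude of 3, the record length and the random seed are all assumptions chosen for illustration.

```python
import numpy as np

def biased_acf(x):
    """Biased autocorrelation estimate for lags 0..len(x)-1."""
    n = len(x)
    return np.correlate(x, x, mode="full")[n - 1:] / n

rng = np.random.default_rng(0)   # assumed seed, for repeatability only
fs, n = 10_000, 4096             # 10 kHz sampling rate, assumed record length
t = np.arange(n) / fs

tone = 3.0 * np.cos(2 * np.pi * 500 * t)   # periodic signal, amplitude A = 3
noise = rng.standard_normal(n)             # white Gaussian noise, unit variance

acf_tone = biased_acf(tone)
acf_noise = biased_acf(noise)

# a) the ACF of the tone repeats with the tone's period (20 samples at 500 Hz)
# b) its amplitude is about A**2 / 2 = 4.5 rather than A = 3
print("tone ACF at lags 0, 10, 20:", acf_tone[[0, 10, 20]])

# c) the noise ACF is concentrated at lag 0; beyond a small delay it is tiny
tau0 = 40                                  # 4 ms at 10 kHz
noise_ratio = acf_noise[0] / np.max(np.abs(acf_noise[tau0:]))
print("noise ACF, lag 0 versus largest value beyond tau0:", noise_ratio)
```

With a long record the tone's ACF keeps nearly its full amplitude at lags of one period and more, while the noise ACF beyond the delay is small compared with its value at the zero lag. This is the reason for discarding the lags near zero.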

The application of the above properties is very successful in noise reduction for periodic signals. Since the voiced part of the speech signal is quasi-periodic, we consider that the properties described above are applicable to voiced speech. Thus, according to property a), the formant frequencies can be estimated from the ACF signal of the speech. In this case the amplitude of the formants will be emphasized by property b). And by property c), the influence of white noise can be avoided by utilizing the ACF $\phi(\tau)$ from $\tau = \tau_0 > 0$ (we call $\tau_0$ the ACF delay), not from $\tau = 0$, because the ACF signal of a periodic signal is also a periodic signal.

0-7803-4455-3/98/$10.00 © 1998 IEEE

Based on these considerations, we propose a new method for formant frequency estimation that applies linear prediction analysis to the speech ACF. The proposed method is expected to be robust to white noise. Fig.1 is the block-diagram of the proposed method.

Pre-emphasis coefficient     0.975
Window for LPC analysis      Hamming, 25.6 ms

Fig. 1 Block-diagram of the proposed method. [Blocks: Noisy speech; Pre-processing (Pre-emphasis); Formant frequencies extraction by LPC analysis (LPC analysis, Peak-picking); Formant frequencies.]

The proposed method is divided into two main steps: the pre-processing and the formant frequency extraction by the LPC algorithm. In the first step, the noisy speech is divided into frames and pre-emphasized, and then the ACF of the speech is calculated. To avoid the influence of noise, only the $\tau = \tau_0 \sim N$ part ($\tau_0 > 0$; $N$ is the frame length) of the ACF signal is used as the input of the second step. In the second step, the ACF signal is windowed and pre-emphasized, and then the LPC coefficients are computed by the autocorrelation method. Based on the LPC coefficients, the LPC spectrum is calculated by using the FFT. Finally the first three formant frequencies are decided by peak-picking.
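The two steps above can be sketched in Python as follows. This is a hedged illustration rather than the authors' implementation. All function names are hypothetical. The LPC order of 12, the FFT size of 1024, the biased ACF estimator and the rule "take the three lowest-frequency local maxima" are assumptions, since the text does not fix them here. The pre-emphasis coefficient 0.975, the Hamming window and the 4 ms ACF delay follow the values quoted in the paper.

```python
import numpy as np

def pre_emphasize(x, coeff=0.975):
    """First-order pre-emphasis: y[n] = x[n] - coeff * x[n-1]."""
    x = np.asarray(x, dtype=float)
    return np.concatenate(([x[0]], x[1:] - coeff * x[:-1]))

def biased_acf(x, max_lag):
    """Biased autocorrelation estimate for lags 0..max_lag."""
    n = len(x)
    full = np.correlate(x, x, mode="full")   # index n-1 holds lag 0
    return full[n - 1:n + max_lag] / n

def levinson_durbin(r, order):
    """Solve the LPC normal equations; returns [1, a1, ..., ap] of A(z)."""
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / err   # reflection coeff.
        prev = a.copy()
        a[1:i] = prev[1:i] + k * prev[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
    return a

def lpc_spectrum_peaks(x, fs, order, n_fft=1024, n_peaks=3):
    """Autocorrelation-method LPC, FFT-based LPC spectrum, peak-picking."""
    a = levinson_durbin(biased_acf(x, order), order)
    spec_db = -20.0 * np.log10(np.abs(np.fft.rfft(a, n_fft)) + 1e-12)
    mid = spec_db[1:-1]
    peaks = np.flatnonzero((mid > spec_db[:-2]) & (mid >= spec_db[2:])) + 1
    return peaks[:n_peaks] * fs / n_fft      # lowest-frequency peaks, in Hz

def formants_conventional(frame, fs, order=12, coeff=0.975):
    """Baseline: pre-emphasis, Hamming window, LPC on the speech frame."""
    x = pre_emphasize(frame, coeff)
    return lpc_spectrum_peaks(x * np.hamming(len(x)), fs, order)

def formants_from_acf(frame, fs, order=12, delay_ms=4.0, coeff=0.975):
    """Sketch of the proposed idea: LPC on the ACF, skipping lags below tau0."""
    x = pre_emphasize(frame, coeff)          # step 1: pre-emphasis, then ACF
    acf = biased_acf(x, len(x) - 1)
    tau0 = int(round(delay_ms * 1e-3 * fs))  # 4 ms at 10 kHz gives 40 samples
    seg = acf[tau0:]                         # keep only lags tau0..N-1
    seg = pre_emphasize(seg * np.hamming(len(seg)), coeff)  # step 2
    return lpc_spectrum_peaks(seg, fs, order)
```

A call such as `formants_from_acf(frame, fs=10_000)` on a 256-sample frame returns up to three peak frequencies in Hz. A silent frame (all zeros) would need a guard against division by zero, which this sketch omits.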

The proposal has two main characteristics. One is that the input signal of the LPC analysis is not a speech signal, but a speech ACF signal. The other is that the ACF of speech is computed and utilized from $\tau_0 > 0$ in order to avoid the influence of white noise.

A pre-emphasis before the ACF calculation is necessary because after the ACF calculation the difference between the amplitude of the low frequency part and that of the high frequency part will become much larger.

3 EXPERIMENTS AND RESULTS

Japanese vowels "i, e, a, o, u" spoken by a male speaker are used to evaluate the effectiveness of the proposed method. First, a standard formant frequency reference for the speech data is created. The standard frequencies for the first three formants are obtained basically by peak-picking of the LPC spectrum of the noise-free speech and are verified manually.

Gaussian white noise at the proper rms level is generated and added to the speech to test the performance of the proposed method. The noisy speech is divided into frames of length 25.6 ms and the frames are shifted by 5 ms.
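The noise generation and framing described above might look like the following sketch. The helper names are hypothetical, and measuring the signal-to-noise ratio over the whole utterance (rather than per frame) is an assumption.

```python
import numpy as np

def add_white_noise(speech, snr_db, rng):
    """Add Gaussian white noise scaled so the overall SNR equals snr_db."""
    speech = np.asarray(speech, dtype=float)
    noise = rng.standard_normal(len(speech))
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    # rescale the noise so that 10*log10(p_speech / p_noise) == snr_db
    noise *= np.sqrt(p_speech / (p_noise * 10.0 ** (snr_db / 10.0)))
    return speech + noise, noise

def split_frames(x, fs, frame_ms=25.6, shift_ms=5.0):
    """Cut x into overlapping frames; returns shape (n_frames, frame_len)."""
    frame_len = int(round(frame_ms * 1e-3 * fs))   # 256 samples at 10 kHz
    shift = int(round(shift_ms * 1e-3 * fs))       # 50 samples at 10 kHz
    n_frames = max(0, 1 + (len(x) - frame_len) // shift)
    idx = np.arange(frame_len)[None, :] + shift * np.arange(n_frames)[:, None]
    return np.asarray(x)[idx]
```

For example, `noisy, _ = add_white_noise(speech, 10.0, np.random.default_rng(0))` followed by `split_frames(noisy, 10_000)` gives 256-sample frames spaced 50 samples apart.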

From the noisy speech the first three formant frequencies are estimated first by the conventional LPC algorithm, and then by the proposed method. Both of them utilize peak-picking of the LPC spectrum. The results of the estimation are compared with the standard formant frequencies separately. The algorithms are evaluated by the average of the absolute error (in percentage). The average absolute error $\delta$ is defined as follows.

\[ \delta = \frac{|F_i - F_{is}|}{F_{is}} \qquad (i = 1 \sim 3) \qquad (3) \]

where $F_i$ is the estimated value and $F_{is}$ is the standard value of the i-th formant frequency respectively.
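As a small worked example of this error measure, the sketch below computes the relative absolute error per formant, averages it over frames and expresses it as a percentage. The function name and the array layout (one row per frame, one column per formant) are assumptions made for illustration.

```python
import numpy as np

def percent_abs_error(estimated, standard):
    """Average of |F_i - F_is| / F_is over frames, in percent, per formant."""
    estimated = np.asarray(estimated, dtype=float)   # shape (n_frames, 3)
    standard = np.asarray(standard, dtype=float)     # shape (n_frames, 3)
    return 100.0 * np.mean(np.abs(estimated - standard) / standard, axis=0)

# Two frames with a made-up reference of F1 = 500, F2 = 1500, F3 = 2500 Hz.
est = [[510.0, 1500.0, 2400.0],
       [490.0, 1530.0, 2500.0]]
ref = [[500.0, 1500.0, 2500.0],
       [500.0, 1500.0, 2500.0]]
print(percent_abs_error(est, ref))   # approximately [2. 1. 2.]
```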

The experimental parameters are set as follows.

Table 1 Experimental parameter specification

sampling rate            10 kHz
LPC order
ACF delay                4 ms
Signal-to-Noise ratio    10 dB

The ACF delay is set as 4 ms. It means that only the $\tau = 40 \sim N$ part (in samples; 4 ms at the 10 kHz sampling rate corresponds to 40 samples) of the ACF $\phi(\tau)$ of the speech signal is used.


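Table 2 Average absolute error (%) of the conventional LPC based algorithm (SNR 10 dB)

vowel        F1       F2       F3
i          6.62    39.58    29.22
e          1.32    10.59    20.31
a          4.79     2.59     6.74
o          9.29   126.70    12.86
u          7.48    12.64     3.86
average    5.90    38.42    14.56

Table 3 Average absolute error (%) of the proposed method (SNR 10 dB)

vowel        F1       F2       F3
i          2.04     1.19     4.73
e          4.26     1.07     2.64
a          5.90     0.69     1.33
o          1.64     8.18    16.03
u          3.07     9.57     3.17
average    3.38     4.14     5.58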

The results for the extraction of the first three formant frequencies by the conventional LPC based algorithm are given in Table 2. The signal-to-noise ratio is 10 dB. Table 3 shows the results of the proposed method. A comparison of Table 2 and Table 3 shows that the proposed method brings a considerable improvement in general over the conventional method in the formant frequency results (the average errors for F1, F2 and F3 fall from 5.90, 38.42 and 14.56 % to 3.38, 4.14 and 5.58 %), especially for the second and the third formants, whose frequencies are relatively high and are sensitive to noise.

It may be noticed that for the first formant frequency of the vowels /a/ and /e/ and the third formant frequency of the vowel /o/, the extraction results of the proposed method are a little worse than those of the conventional LPC algorithm. The reason is probably that the voiced speech is a quasi-periodic signal, not a completely periodic one. Thus when the properties of the periodic signal described in Section 2 are applied to the voiced speech, errors may occur to some extent. But in general, the proposed method gives a much more accurate estimation than the conventional LPC based algorithms in a noisy environment.

Fig.2 compares the linear prediction spectra of the vowel /a/. "Original" denotes the linear prediction spectrum of the noise-free speech, while "conventional" and "proposed" denote those of the noisy speech computed by the conventional and the proposed method respectively. Fig.2 shows clearly that by using the proposed method, the formants are emphasized and the influence of noise is decreased, especially for the third formant.

Fig. 2 Linear prediction spectra of vowel /a/ computed by the conventional method (a) and the proposed method (b). [Horizontal axis: Frequency [Hz]; curves: original and conventional in (a), original and proposed in (b).]

4 CONCLUSION

A robust algorithm for extracting the formant frequencies from noisy speech is proposed. By applying linear prediction analysis to the speech autocorrelation function (ACF) instead of to the speech itself, the formants are emphasized and the influence of noise is decreased. Experimental results show that the proposed method is robust to white noise and that a considerable improvement over the conventional LPC based method is achieved.

In this paper the experiments are conducted at a signal-to-noise ratio of 10 dB. When the noise level becomes much higher, it is possible to improve the robustness of the proposed method to noise further by taking the autocorrelation function one more time based on the existing ACF signal.
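As a rough illustration of what a second autocorrelation does, consider the idealized periodic model of Section 2. The amplitudes $A_n$ and phases $\theta_n$ are notation assumed here, and the sketch ignores both the discarded lags below the ACF delay and the quasi-periodic nature of real speech.

\[ f(t) = A_0 + \sum_{n=1}^{\infty} A_n \cos(n\omega t + \theta_n) \]
\[ \phi_1(\tau) = A_0^2 + \frac{1}{2}\sum_{n=1}^{\infty} A_n^2 \cos(n\omega\tau) \]
\[ \phi_2(\tau) = A_0^4 + \frac{1}{8}\sum_{n=1}^{\infty} A_n^4 \cos(n\omega\tau) \]

Here $\phi_1$ is the ACF of $f$ and $\phi_2$ is the ACF of $\phi_1$. Each pass keeps the same frequency components and squares their amplitudes up to a constant factor, so strong spectral peaks are emphasized further, at the cost of an even larger difference between strong and weak components.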


The proposed method is developed with the aim of improving the robustness of the conventional LPC based algorithms to white noise. In a non-white noise environment, the performance of our proposal may degrade to some extent, but is expected to be better than that of the conventional LPC based methods, because the peak of the formant is emphasized by the autocorrelation function and thus could become more robust to the influence of the noise.

The authors would like to thank Prof. Yashima for his

helpful advice.

5 REFERENCES

[1] S.S. McCandless, "An algorithm for automatic formant extraction using linear prediction spectra," IEEE Trans. on Acoustic, Speech and Signal Processing, ASSP-22, No.2, pp.135-141, 1974.

[2] R.L. Christensen, W.J. Strong and E.P. Palmer, "A comparison of three methods of extracting resonance information from predictor-coefficient coded speech," IEEE Trans. on Acoustic, Speech and Signal Processing, ASSP-24, No.1, pp.8-14, Jan. 1974.

[3] R.J. Niederjohn, M. Lahat, "A zero-crossing consistency method for formant tracking of voiced speech in high noise levels," IEEE Trans. on Acoustic, Speech and Signal Processing, ASSP-33, No.2, pp.349-355, April 1985.

[4] G. Duncan, M.A. Jack, "Formant estimation algorithm based on pole focusing offering improved noise tolerance and feature resolution," IEE Proceedings, vol.135, pt.F, No.1, pp.18-32, Feb. 1988.

[5] J.D. Markel, "Digital inverse filtering-A new tool for formant trajectory estimation," IEEE Trans. AU-20, pp.129-137, 1972.

[6] Tierney, "A study of LPC analysis of speech in additive noise," IEEE Trans. on Acoustic, Speech and Signal Processing, ASSP-28, No.4, pp.389-397, 1980.

[7] J. Suzuki, "Speech processing by splicing of autocorrelation function," Proc. IEEE Int. Conf. Acoustic, Speech and Signal Processing, pp.713-716, 1976.
