a robust algorithm for formant frequency extraction of noisy speech
8/9/2019 A Robust Algorithm for Formant Frequency Extraction of Noisy Speech
A ROBUST ALGORITHM FOR FORMANT FREQUENCY
EXTRACTION OF NOISY SPEECH
Qifang Zhao, Tetsuya Shimamura, Jouji Suzuki
Department of Information and Computer Sciences, Saitama University
255 Shimo-Okubo, Urawa, Saitama, 338-8570 Japan
email: [email protected] u.ac.jp
ABSTRACT
In this paper, a new method for formant frequency
estimation of noisy speech is proposed, based on linear
prediction analysis. Algorithms based on linear prediction
analysis can usually extract the formant frequencies
effectively from clean speech. When speech is corrupted by
noise, however, their performance degrades seriously. It is
well known that the autocorrelation function concentrates
the energy of white noise in the neighborhood of the zero
lag. Utilizing this property, the proposed method extracts
the formant frequencies from the autocorrelation function
of the speech instead of from the speech itself. The
experimental results show that the proposed method is much
more robust to noise than the conventional linear
prediction based algorithms.
1. INTRODUCTION
Formant frequency estimation of voiced speech is an
important part of speech processing and plays a major role
in many applications. Various algorithms have been
proposed to improve the extraction accuracy or the
robustness to noise [1-4]. A frequently used approach to
formant frequency estimation is linear prediction
analysis (LPC) [5], which can extract the formant
frequencies effectively by finding the roots of the
prediction polynomial or by peak-picking of the linear
prediction spectrum. The LPC based algorithms offer a
readily implementable processing paradigm for real-time
analysis of the speech waveform. When speech is
corrupted by noise, however, the performance of the LPC
scheme degrades seriously [6]. It is very difficult
for the LPC based algorithms to extract the
formant frequencies accurately from noisy speech.
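As an illustration of the root-finding route mentioned above, a minimal Python sketch follows. It assumes the prediction polynomial is available as the coefficients [1, a_1, ..., a_p] of A(z) = 1 + a_1 z^-1 + ... + a_p z^-p. The function names and the bandwidth threshold are hypothetical choices for illustration and are not taken from the paper.

```python
import numpy as np

def formants_from_lpc_roots(a, fs, max_bandwidth_hz=400.0):
    """Estimate formant frequencies (Hz) from the roots of A(z).

    a  : LPC polynomial coefficients [1, a_1, ..., a_p] in powers of z^-1
    fs : sampling rate in Hz
    max_bandwidth_hz : assumed threshold; wider poles are discarded
    """
    roots = np.roots(a)
    # Keep one root of each complex-conjugate pair (positive imaginary part).
    roots = roots[np.imag(roots) > 1e-9]
    freqs = np.angle(roots) * fs / (2.0 * np.pi)
    # The pole radius maps to a bandwidth; very wide poles are unlikely formants.
    bandwidths = -np.log(np.abs(roots)) * fs / np.pi
    return np.sort(freqs[bandwidths < max_bandwidth_hz])

def poly_from_resonances(freqs_hz, bandwidths_hz, fs):
    """Build A(z) from resonance frequencies and bandwidths (helper for checking)."""
    a = np.array([1.0])
    for f, b in zip(freqs_hz, bandwidths_hz):
        r = np.exp(-np.pi * b / fs)
        theta = 2.0 * np.pi * f / fs
        # One conjugate pole pair: 1 - 2 r cos(theta) z^-1 + r^2 z^-2
        a = np.convolve(a, [1.0, -2.0 * r * np.cos(theta), r * r])
    return a
```

Building A(z) from known resonances with `poly_from_resonances` and passing it back through `formants_from_lpc_roots` gives a quick consistency check of the frequency mapping.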
The speech signal can be divided into voiced speech and
unvoiced speech. The formant frequencies are extracted
from the voiced part of the speech, which is quasi-periodic.
It is well known that the autocorrelation function (ACF) of
a periodic signal possesses the same frequency
components as the original periodic signal. Thus it is quite
possible to extract the formant frequencies from the ACF
of the speech instead of the speech itself. In addition, the
ACF concentrates the energy of white noise in the
neighborhood of the zero lag. Therefore, by
extracting the formant frequencies from the ACF signal of
the speech (excluding the neighborhood of the zero lag), the
influence of the noise can be reduced greatly.
In this paper, by utilizing the properties of the ACF described
above [7], we propose a new LPC-based method for
formant frequency estimation, which is expected to be
robust to white noise.
2. THE PROPOSED METHOD
Let $f(t)$ be a periodic signal with period $T$. It can be
expanded in a Fourier series with fundamental angular
frequency $\omega_0 = 2\pi / T$.
The ACF of $f(t)$, denoted $\phi(\tau)$, satisfies the following properties.
a) The ACF signal $\phi(\tau)$ is composed of the same
frequency components as $f(t)$.
b) The amplitude of each frequency component of $\phi(\tau)$
is proportional to the square of that of $f(t)$.
c) If $f(t)$ is white noise, then the energy of
$\phi(\tau)$ is concentrated on $\tau = 0$.
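For reference, one standard way to write the expansion and its ACF is sketched below. The amplitude-phase notation ($A_n$, $\theta_n$) and the time-average definition of the ACF are assumptions chosen to be consistent with properties a) to c). They are not taken from the paper's own equations.

```latex
f(t) = A_0 + \sum_{n=1}^{\infty} A_n \cos(n\omega_0 t + \theta_n),
\qquad \omega_0 = \frac{2\pi}{T}

\phi(\tau) = \frac{1}{T}\int_{0}^{T} f(t)\, f(t+\tau)\, dt
           = A_0^2 + \sum_{n=1}^{\infty} \frac{A_n^2}{2} \cos(n\omega_0 \tau)
```

Under these assumptions each component $n\omega_0$ of $f(t)$ reappears in $\phi(\tau)$ with amplitude $A_n^2/2$ and the phases drop out, which matches properties a) and b). For ideal white noise of variance $\sigma^2$ the ACF is $\sigma^2$ at $\tau = 0$ and zero elsewhere, which matches property c).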
These properties have been applied very successfully to
noise reduction for periodic signals. Since the voiced part
of a speech signal is quasi-periodic, we consider that the
properties described above are applicable to voiced speech.
Thus, according to property a), the formant frequencies
can be estimated from the ACF signal of the speech. In
this case the amplitude of the formants will be emphasized
by property b). And by property c), the influence of
white noise can be avoided by utilizing the ACF $\phi(\tau)$
from $\tau = \tau_d > 0$ (we call $\tau_d$ the ACF delay), not from
$\tau = 0$, because the ACF signal of a periodic signal is
also a periodic signal.
0-7803-4455-3/98/$10.00 © 1998 IEEE
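The effect of property c) can be checked numerically. The following Python sketch compares the ACF of a sinusoid with and without added white noise. The biased ACF estimator, the 500 Hz test tone, and the noise variance are assumptions made for this illustration only.

```python
import numpy as np

def biased_acf(x, max_lag):
    """Biased autocorrelation estimate phi[k] = (1/N) * sum_n x[n] * x[n+k]."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    return np.array([np.dot(x[:n - k], x[k:]) / n for k in range(max_lag + 1)])

rng = np.random.default_rng(0)
fs = 10_000                                   # sampling rate in Hz
t = np.arange(4096) / fs
clean = np.sin(2 * np.pi * 500 * t)           # periodic signal, period 20 samples
noise = rng.normal(scale=1.0, size=t.size)    # white noise, variance about 1
noisy = clean + noise

acf_clean = biased_acf(clean, 100)
acf_noisy = biased_acf(noisy, 100)

# Difference between the noisy and clean ACFs at each lag.
# diff[0] is close to the noise power; at lags >= 1 it stays small.
diff = np.abs(acf_noisy - acf_clean)
```

In this sketch almost all of the noise contribution lands at lag 0, while the ACF of the sinusoid remains periodic with the same 20-sample period, which is why discarding the lags near zero removes most of the noise.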
Based on these considerations, we propose a new method
for formant frequency estimation that applies
linear prediction analysis to the speech ACF. The proposed
method is expected to be robust to white noise. Fig. 1 shows the
block diagram of the proposed method.
[Fig. 1 diagram: Noisy Speech -> Pre-processing (Pre-emphasis) -> Formant Frequencies Extraction by LPC Analysis (LPC Analysis, Peak-Picking) -> Formant frequencies. Labels: pre-emphasis coefficient 0.975; window for LPC analysis: Hamming, 25.6 ms.]
Fig. 1 Block diagram of the proposed method
The proposed method is divided into two main steps:
pre-processing and formant frequency extraction by the
LPC algorithm. In the first step, the noisy speech is first
divided into frames and pre-emphasized. Then the ACF of
the speech is calculated. To avoid the influence of noise,
only the $\tau = \tau_d \sim N$ ($\tau_d > 0$, $N$ is the frame length)
part of the ACF signal is used as the input of the second
step. In the second step, the ACF signal is windowed and
pre-emphasized, then the LPC coefficients are computed
by the autocorrelation method. Based on the LPC
coefficients, the LPC spectrum is calculated by using the FFT.
Finally, the first three formant frequencies are determined by
peak-picking.
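The two steps described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the biased ACF estimator, the default LPC order of 12, the FFT size, and all function names are hypothetical choices. The pre-emphasis coefficient, Hamming window, and 4 ms ACF delay follow the values given in the paper.

```python
import numpy as np

def pre_emphasis(x, coeff=0.975):
    """y[n] = x[n] - coeff * x[n-1]; the first sample is passed through."""
    x = np.asarray(x, dtype=float)
    return np.append(x[0], x[1:] - coeff * x[:-1])

def autocorr(x, max_lag):
    """Biased autocorrelation for lags 0..max_lag (assumed estimator)."""
    n = len(x)
    return np.array([np.dot(x[:n - k], x[k:]) / n for k in range(max_lag + 1)])

def levinson(r, order):
    """Levinson-Durbin recursion: returns [1, a_1, ..., a_p] from r[0..p]."""
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        k = -np.dot(a[:i], r[i:0:-1]) / err   # reflection coefficient
        a[1:i + 1] += k * a[i - 1::-1]        # a_j += k * a_{i-j}, a_i = k
        err *= 1.0 - k * k
    return a

def lpc_spectrum_db(a, n_fft=1024):
    """LPC spectrum 20*log10(1/|A(e^jw)|) on n_fft//2 + 1 bins."""
    return -20.0 * np.log10(np.abs(np.fft.rfft(a, n_fft)) + 1e-12)

def pick_peaks(spec_db, fs, n_fft, count=3):
    """Frequencies (Hz) of the first `count` local maxima of the spectrum."""
    idx = [k for k in range(1, len(spec_db) - 1)
           if spec_db[k - 1] < spec_db[k] >= spec_db[k + 1]]
    return np.array(idx[:count]) * fs / n_fft

def formants_from_acf(frame, fs, order=12, delay_ms=4.0, n_fft=1024):
    """Sketch of the two-step procedure: pre-processing, then LPC on the ACF."""
    n = len(frame)
    delay = int(round(delay_ms * 1e-3 * fs))
    # Step 1: pre-emphasis, ACF, and removal of the lags below the ACF delay.
    acf = autocorr(pre_emphasis(frame), n - 1)[delay:]
    # Step 2: window, pre-emphasis, LPC (autocorrelation method),
    # FFT-based LPC spectrum, and peak-picking.
    seg = pre_emphasis(acf * np.hamming(len(acf)))
    a = levinson(autocorr(seg, order), order)
    return pick_peaks(lpc_spectrum_db(a, n_fft), fs, n_fft)
```

A call such as `formants_from_acf(frame, fs=10_000)` on one 25.6 ms frame returns up to three peak frequencies in Hz. The conventional method corresponds to applying the window, pre-emphasis, LPC, and peak-picking directly to the speech frame instead of its ACF.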
The proposal has two main characteristics. One is that the
input signal of the LPC analysis is
not a speech signal, but a speech ACF signal. The other is
that the ACF of speech is computed and utilized from
$\tau_d > 0$ in order to avoid the influence of white noise.
A pre-emphasis before the ACF calculation is necessary
because after the ACF calculation the difference between
the amplitude of the low frequency part and that of the high
frequency part becomes much larger, since by property b)
the amplitude of each component is squared.
3. EXPERIMENTS AND RESULTS
The Japanese vowels “i, e, a, o, u” spoken by a male speaker are used to
evaluate the effectiveness of the proposed method. First,
a standard formant frequency reference for the speech data is
created. The standard frequencies for the first three
formants are obtained by peak-picking of the
LPC spectrum of the noise-free speech and are verified
manually.
Gaussian white noise at the appropriate rms level is generated
and added to the speech to test the performance of the
proposed method. The noisy speech is divided into frames
of length 25.6 ms, and the frames are shifted by 5 ms.
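A minimal Python sketch of this preparation step is given below. It assumes that the signal-to-noise ratio is defined over the whole utterance, which the paper does not specify. The function names are hypothetical.

```python
import numpy as np

def add_white_noise(speech, snr_db, rng=None):
    """Add Gaussian white noise scaled so that the overall SNR equals snr_db."""
    rng = np.random.default_rng() if rng is None else rng
    speech = np.asarray(speech, dtype=float)
    noise = rng.normal(size=speech.shape)
    signal_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    # Scale the noise so that 10*log10(signal_power / scaled_noise_power) = snr_db.
    gain = np.sqrt(signal_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    return speech + gain * noise

def split_frames(x, fs, frame_ms=25.6, shift_ms=5.0):
    """Split x into overlapping frames of shape (n_frames, frame_len).

    Assumes len(x) is at least one frame long.
    """
    frame_len = int(round(frame_ms * 1e-3 * fs))
    shift = int(round(shift_ms * 1e-3 * fs))
    n_frames = 1 + (len(x) - frame_len) // shift
    return np.stack([x[i * shift: i * shift + frame_len] for i in range(n_frames)])
```

At a 10 kHz sampling rate, a 25.6 ms frame is 256 samples and a 5 ms shift is 50 samples, so a one-second signal yields 195 frames.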
From the noisy speech, the first three formant frequencies
are estimated first by the conventional LPC algorithm
and then by the proposed method. Both of them use
peak-picking of the LPC spectrum. The estimation results
are compared with the standard formant
frequencies separately. The algorithms are evaluated
by the average absolute error (in
percentage). The average absolute error $\delta$ is defined as
follows.
$$\delta = \frac{|F_i - F_i^{S}|}{F_i^{S}} \quad (i = 1 \sim 3) \qquad (3)$$
where $F_i$ is the estimated value and $F_i^{S}$ is the standard
value of the i-th formant frequency, respectively.
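The error measure can be computed per formant as in the following Python sketch. It assumes that the average is taken over frames and that the result is scaled by 100 to express it in percent. The function name and the array layout are hypothetical.

```python
import numpy as np

def average_absolute_error_percent(estimated, reference):
    """Per-formant average of |F_i - F_i^S| / F_i^S over frames, in percent.

    estimated, reference : arrays of shape (n_frames, 3) holding F1..F3 in Hz
    """
    estimated = np.asarray(estimated, dtype=float)
    reference = np.asarray(reference, dtype=float)
    relative = np.abs(estimated - reference) / reference
    return 100.0 * relative.mean(axis=0)
```

For example, two frames with estimates (550, 1500, 2500) Hz and (450, 1500, 2750) Hz against a reference of (500, 1500, 2500) Hz give errors of 10 %, 0 % and 5 % for F1, F2 and F3.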
The experimental parameters are set as follows.
Table 1 Experimental parameter specification
sampling rate            10 kHz
LPC order
ACF delay                4 ms
Signal-to-noise ratio    10 dB
The ACF delay is set to 4 ms. This means that only the
part of the ACF $\phi(\tau)$ of the speech signal from
$\tau = 40$ samples onward is used.
Table 2 Average absolute error (%) of the conventional LPC based algorithm
vowel        F1       F2       F3
i           6.62    39.58    29.22
e           1.32    10.59    20.31
a           4.79     2.59     6.74
o           9.29   126.70    12.86
u           7.48    12.64     3.86
average     5.90    38.42    14.56

Table 3 Average absolute error (%) of the proposed method
vowel        F1       F2       F3
i           2.04     1.19     4.73
e           4.26     1.07     2.64
a           5.90     0.69     1.33
o           1.64     8.18    16.03
u           3.07     9.57     3.17
average     3.38     4.14     5.58
The results for the extraction of the first three formant
frequencies by the conventional LPC based algorithm are
given in Table 2. The signal-to-noise ratio is 10 dB. Table
3 shows the results of the proposed method. A comparison
of Table 2 and Table 3 shows that the proposed method
brings a considerable overall improvement over the
conventional method in the formant frequency results
(the average errors for F1, F2 and F3 fall from 5.90, 38.42
and 14.56 % to 3.38, 4.14 and 5.58 %),
especially for the second and third formants, whose
frequencies are relatively high and are sensitive to noise.
It may be noticed that for the first formant frequency of
the vowels /a/ and /e/ and the third formant frequency of the
vowel /o/, the extraction results of the proposed method
are a little worse than those of the conventional LPC algorithm.
The reason is probably that voiced speech is a
quasi-periodic signal, not a completely periodic signal. Thus
when the properties of the periodic signal described in
Section 2 are applied to voiced speech, some errors may
occur. But in general, the
proposed method gives a much more accurate estimation
than the conventional LPC based algorithms in
noisy environments.
Fig. 2 compares the linear prediction spectra of the vowel
/a/. “original” denotes the linear prediction spectrum of the
noise-free speech, while “conventional” and “proposed”
denote those of noisy speech computed by the
conventional and the proposed method, respectively. Fig. 2
shows clearly that by using the proposed method, the
formants are emphasized and the influence of noise is
decreased, especially for the third formant.
[Figure 2 plots: panels (a) and (b); horizontal axis: Frequency [Hz]; legend entries in panel (a): original, conventional.]
Figure 2 Linear prediction spectra of vowel /a/
computed by the conventional method (a) and the
proposed method (b).
4. CONCLUSION
A robust algorithm for extracting the formant frequencies
from noisy speech is proposed. By applying linear
prediction analysis to the speech autocorrelation function
(ACF) instead of the speech itself, the formants are
emphasized and the influence of noise is decreased.
Experimental results show that the proposed method is
robust to white noise and achieves a considerable improvement
over the conventional LPC based method.
In this paper the experiments are conducted at a
signal-to-noise ratio of 10 dB. When the noise level
becomes much higher, it is possible to further improve the
robustness of the proposed method to noise
by taking the autocorrelation function once more, based on
the existing ACF signal.
The proposed method is developed with the aim of
improving the robustness of the conventional LPC based
algorithms to white noise. In non-white noise
environments, the performance of our proposal may
degrade to some extent, but is expected to be better than
that of the conventional LPC based methods, because the peak
of the formant is emphasized by the autocorrelation
function and thus could become more robust to the
influence of the noise.
The authors would like to thank Prof. Yashima for his
helpful advice.
5. REFERENCES
[1] S.S. McCandless, “An algorithm for automatic
formant extraction using linear prediction spectra,”
IEEE Trans. on Acoustics, Speech and Signal
Processing, ASSP-22, No.2, pp.135-141, 1974.
[2] R.L. Christensen, W.J. Strong and E.P. Palmer, “A
comparison of three methods of extracting resonance
information from predictor-coefficient coded speech,”
IEEE Trans. on Acoustics, Speech and Signal
Processing, ASSP-24, No.1, pp.8-14, Jan. 1974.
[3] R.J. Niederjohn, M. Lahat, “A zero-crossing
consistency method for formant tracking of voiced
speech in high noise levels,” IEEE Trans. on Acoustics,
Speech and Signal Processing, ASSP-33, No.2,
pp.349-355, April 1985.
[4] G. Duncan, M.A. Jack, “Formant estimation
algorithm based on pole focusing offering improved
noise tolerance and feature resolution,” IEE
Proceedings, vol.135, Pt.F, No.1, pp.18-32, Feb. 1988.
[5] J.D. Markel, “Digital inverse filtering-A new tool for
formant trajectory estimation,” IEEE Trans. AU-20,
pp.129-137, 1972.
[6] Tierney, “A study of LPC analysis of speech in
additive noise,” IEEE Trans. on Acoustics, Speech and
Signal Processing, ASSP-28, No.4, pp.389-397, 1980.
[7] J. Suzuki, “Speech processing by splicing of
autocorrelation function,” Proc. IEEE Int. Conf.
Acoustics, Speech and Signal Processing, pp.713-716,
1976.