acoustic features transformation using wavelet packets for hearing impaired

Reprint ISSN 0974-1518. INTERNATIONAL JOURNAL OF ENGINEERING RESEARCH AND INDUSTRIAL APPLICATIONS (IJERIA). Ascent Publication, A-63, Puru Coop. Hous. Society, Air Port Road, Pune - 411032, Maharashtra State, INDIA. Tel.: 0091-20-26610466, Mobile: 09850330076, Web: www.ascentjournals.com, E-mail: [email protected] and [email protected]



International J. of Engg. Research & Indu. Appls. (IJERIA), ISSN 0974-1518, Vol. I, No. IV (2008), pp. 235-244

ACOUSTIC FEATURES TRANSFORMATION USING WAVELET PACKETS FOR HEARING IMPAIRED

MAHESH T. KOLTE AND D. S. CHAUDHARI

Abstract

A portion of the speech bandwidth is transformed from its original band into a low-frequency band, in which hearing-impaired subjects with severe high-frequency hearing impairment retain speech perception. The basic transformation is based on a wavelet packet technique, and the resulting transition in formants and power compression is accomplished without relinquishing the essential information contained in speech. Thus, subjects are able to perceive speech of fair intelligibility. In this paper a new processing algorithm based on wavelet packets is presented. Preliminary results with the processed speech material in six impaired listeners suggest that this algorithm could be implemented in hearing aids for severe and moderate-to-severe hearing impairment.


Keywords: Hearing Impairment, Wavelet Packets, Information Transmission Analysis.


1. INTRODUCTION

The articulatory features that characterise the consonants of speech are manner, voicing, duration, and place of articulation [1]. The acoustic characteristics of consonants also depend on several characteristics of the adjacent vowel [2]. Consonants are very significant for speech intelligibility and are very easily confused [3]. With spectral smearing and upward spread of masking, formant F2 and the higher formants are smoothed out, leaving a broadened F1. Hearing-impaired subjects with high-frequency loss have difficulty discriminating fricatives such as /s/, /z/ and /sh/, since the energy in the spectra of these alveolar fricatives lies at 4 kHz and above [4, 5]. Spectral amplitudes in the acoustic signal are measured from the averaged power spectrum, which is obtained by averaging squared spectra over certain time intervals. The advantage of using the power spectrum, in other words of squaring the spectrum before averaging, is that high-amplitude spectral peaks, which are believed to be more informative about the place of articulation, are emphasized more than the smaller peaks. The first three formant frequencies can be traced and, with enough information, the vocal tract configuration at any point in time during the vowel-to-consonant or consonant-to-vowel transition can be inferred from the locations of the formant frequencies at the corresponding time [5, 6].
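The averaging of squared spectra described above can be sketched as follows. This is only an illustration under stated assumptions: the frame length, hop size, Hann window, sampling rate and two-tone test signal are hypothetical choices, not values taken from the paper, and a naive DFT is used so the example needs nothing beyond the Python standard library.

```python
import cmath
import math


def power_spectrum(frame):
    """Squared DFT magnitudes of one frame (naive O(N^2) DFT, bins 0..N/2)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def averaged_power_spectrum(signal, frame_len, hop):
    """Average the squared spectra of successive Hann-windowed frames."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * t / frame_len) for t in range(frame_len)]
    total = [0.0] * (frame_len // 2 + 1)
    count = 0
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + t] * window[t] for t in range(frame_len)]
        for k, p in enumerate(power_spectrum(frame)):
            total[k] += p
        count += 1
    return [p / count for p in total]


# Assumed toy input: a strong 1 kHz tone plus a weak 3 kHz tone, sampled at 8 kHz.
fs = 8000
signal = [math.sin(2 * math.pi * 1000 * t / fs) + 0.1 * math.sin(2 * math.pi * 3000 * t / fs)
          for t in range(512)]
aps = averaged_power_spectrum(signal, frame_len=64, hop=32)
peak_bin = max(range(len(aps)), key=aps.__getitem__)
peak_hz = peak_bin * fs / 64  # bin spacing is fs / frame_len
```

In this toy case the two tones differ by a factor of 10 in amplitude but by a factor of 100 in the averaged power spectrum, which is the peak-emphasis effect that squaring before averaging provides.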

Downsampling and upsampling are used with wavelet-based filter banks to exploit spectral properties such as energy levels and perceptual importance [7]. The signal is transformed so that the power spectrum tends to concentrate into a few bands [8, 9]. The resulting changes in acoustic attributes such as the averaged power spectrum and formant transitions can then be observed.
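As a rough illustration of a wavelet packet filter bank with downsampling, the sketch below splits every band at every level using the Haar filter pair (the simplest Daubechies wavelet). The choice of Haar, the three-level depth and the test signal are assumptions made for brevity; the paper's schemes use Daubechies wavelets of different orders and biorthogonal wavelets, whose filters are longer.

```python
import math


def haar_split(x):
    """One analysis stage: filter with the Haar pair, then downsample by 2."""
    s = 1 / math.sqrt(2)
    low = [(x[i] + x[i + 1]) * s for i in range(0, len(x), 2)]
    high = [(x[i] - x[i + 1]) * s for i in range(0, len(x), 2)]
    return low, high


def wavelet_packet(x, levels):
    """Full wavelet packet tree: split every band at every level."""
    bands = [list(x)]
    for _ in range(levels):
        next_bands = []
        for band in bands:
            next_bands.extend(haar_split(band))
        bands = next_bands
    return bands  # 2**levels bands in natural (filter-bank) order


def band_energies(bands):
    """Sum of squared coefficients in each band."""
    return [sum(c * c for c in band) for band in bands]


# Assumed toy input: a slowly varying sinusoid, 64 samples long.
signal = [math.sin(2 * math.pi * t / 32) for t in range(64)]
bands = wavelet_packet(signal, levels=3)
energies = band_energies(bands)
total = sum(energies)
share_lowest = energies[0] / total  # fraction of energy in the lowest band
```

Because the Haar transform is orthonormal, the total energy across the eight bands equals the energy of the input, and for this low-frequency input most of it lands in the first band, which is the kind of energy concentration the transformation relies on.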

II. Implementation

The processing involved a) spectral splitting with modified wavelet packets based on different frequency bands, and b) two different Simulink models developed from the modified wavelet packets, one with Daubechies and one with biorthogonal wavelet functions. During frequency transformation, compression was achieved as the poles were changed, which is useful for hearing-impaired listeners with high-frequency impairment. The scheme was designed in MATLAB with Simulink models for off-line processing. Figs. 1 and 2 show the power spectrum of the speech signal /asa/ for the biorthogonal WP scheme and the Daubechies WP scheme respectively, while Figs. 3 and 4 show the formants of /asa/ for the two schemes. The changes in acoustic attributes due to the transformation were observed; the averaged power spectrum and the formant transitions are some of these attributes. In this study, the spectral amplitudes were measured from the averaged power spectrum, and the first three formant frequencies were traced.
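One way to picture spectral splitting for two ears is sketched below. The paper does not state how packet bands are allocated between the ears, so the even/odd allocation used here is purely an assumption, as are the Haar filters, the three-level tree and the function names. The sketch decomposes a signal into wavelet packet bands, keeps complementary subsets of bands for each ear, and reconstructs two signals.

```python
import math

S = 1 / math.sqrt(2)


def analyze(x):
    """Haar analysis stage with downsampling by 2."""
    low = [(x[i] + x[i + 1]) * S for i in range(0, len(x), 2)]
    high = [(x[i] - x[i + 1]) * S for i in range(0, len(x), 2)]
    return low, high


def synthesize(low, high):
    """Inverse of analyze: upsample by 2 and combine the two bands."""
    out = []
    for a, d in zip(low, high):
        out.extend([(a + d) * S, (a - d) * S])
    return out


def decompose(x, levels):
    """Full wavelet packet decomposition into 2**levels bands."""
    bands = [list(x)]
    for _ in range(levels):
        bands = [half for band in bands for half in analyze(band)]
    return bands


def reconstruct(bands):
    """Merge adjacent band pairs level by level back into one signal."""
    while len(bands) > 1:
        bands = [synthesize(bands[i], bands[i + 1]) for i in range(0, len(bands), 2)]
    return bands[0]


def dichotic_split(x, levels=3):
    """Assumed allocation: even-indexed bands to the left ear, odd-indexed to the right."""
    bands = decompose(x, levels)
    zeros = [0.0] * len(bands[0])
    left = reconstruct([b if i % 2 == 0 else zeros for i, b in enumerate(bands)])
    right = reconstruct([b if i % 2 == 1 else zeros for i, b in enumerate(bands)])
    return left, right


# Assumed toy input; its length must be divisible by 2**levels.
signal = [math.sin(0.3 * t) + 0.5 * math.cos(1.7 * t) for t in range(16)]
left, right = dichotic_split(signal, levels=3)
recombined = [l + r for l, r in zip(left, right)]
```

Since the transform is linear and the two band sets are complementary, the left and right signals sum back to the original, so no spectral content is discarded by the split itself.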

The experimental evaluation was done by conducting listening tests using test material consisting of fifteen English consonants /p, b, t, d, k, g, m, n, s, z, f, v, r, l, y/ in vowel-consonant-vowel (VCV) context with the vowel /a/ as in farmer. The listening tests involved binaural diotic presentation of unprocessed speech and binaural dichotic presentation of processed speech. The stimuli were presented at the most comfortable listening level of each subject. An experimental set-up based on a personal computer or laptop was used for binaural presentation of the test stimuli, for displaying the response choices, and for recording the subject's responses. The responses were stored as response time statistics, as a stimulus-response confusion matrix giving the number of occurrences of each stimulus-response pair, and as a percentage correct recognition score. Response time statistics were used to compare the effectiveness of the processing schemes in reducing the load on perception [10]. Confusion matrices were subjected to information transmission analysis to find the relative information transmitted for consonant identification and for various consonantal features such as duration, frication, nasality, manner, place, and voicing [1].
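The information transmission analysis of a confusion matrix, in the sense of Miller and Nicely [1], can be sketched as below: the relative information transmitted is the mutual information between stimulus and response divided by the stimulus entropy, and a feature-level figure is obtained by first pooling the matrix over feature classes. The 4x4 matrix and the voicing grouping are hypothetical illustration data, not results from this study, and the function names are our own.

```python
import math


def recognition_score(confusion):
    """Percentage of responses on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return 100.0 * correct / total


def relative_information_transmitted(confusion):
    """T(x; y) / H(x), with stimuli x as rows and responses y as columns."""
    total = sum(sum(row) for row in confusion)
    p_x = [sum(row) / total for row in confusion]
    p_y = [sum(col) / total for col in zip(*confusion)]
    transmitted = 0.0
    for i, row in enumerate(confusion):
        for j, n in enumerate(row):
            if n:
                p_xy = n / total
                transmitted += p_xy * math.log2(p_xy / (p_x[i] * p_y[j]))
    entropy_x = -sum(p * math.log2(p) for p in p_x if p)
    return transmitted / entropy_x


def pool_by_feature(confusion, groups):
    """Collapse a confusion matrix onto feature classes (e.g. voiced/unvoiced)."""
    k = max(groups) + 1
    pooled = [[0] * k for _ in range(k)]
    for i, row in enumerate(confusion):
        for j, n in enumerate(row):
            pooled[groups[i]][groups[j]] += n
    return pooled


# Hypothetical counts for four stimuli /p, b, t, d/ (not data from the paper).
confusion = [
    [8, 0, 2, 0],
    [0, 9, 0, 1],
    [3, 0, 7, 0],
    [0, 2, 0, 8],
]
voicing = [0, 1, 0, 1]  # /p, t/ unvoiced; /b, d/ voiced
score = recognition_score(confusion)
overall = relative_information_transmitted(confusion)
voicing_rit = relative_information_transmitted(pool_by_feature(confusion, voicing))
```

In this made-up example every confusion stays within a voicing class, so voicing is transmitted perfectly even though the overall recognition score is only 80%, which is why feature-level analysis can reveal more than the raw score.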

2. RESULTS

In the experiment with Daubechies WP of different orders and with biorthogonal WP, the listening tests were carried out on six subjects with bilateral 'mild' to 'very severe' sensorineural hearing impairment. The subjects were tested without any masking noise added to the speech stimuli, and presentation was at each subject's comfortable listening level. Response times, recognition scores, and relative information transmitted overall and for consonantal features were analyzed. In the experiment with six subjects having bilateral hearing impairment, as shown in Table 1, most of the subjects showed a highly significant improvement in recognition score. For unprocessed speech, response times varied from 1.89 to 3.28 seconds. With processing, response times decreased. For the Daubechies WP scheme, the relative decrease in response times ranged from 1.22 to 2.73 seconds, while for the biorthogonal WP scheme it ranged from 1.03 to 2.87 seconds. Across subjects, the percentage relative improvement in recognition score ranged from 0 to 15.56 for Daubechies WP and from -2.22 to 10.83 for biorthogonal WP in VCV context. Two subjects (NB, FSM) having severe high-frequency loss showed the maximum relative improvement for the Daubechies WP scheme.
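The R.I. entries of Table 2 equal the processed score minus the unprocessed score, that is, a difference in percentage points. The short sketch below reproduces them from the scores transcribed from Table 2; the helper name `improvement` is our own.

```python
# Recognition scores (%) transcribed from Table 2: subject -> (US, ps-DB, ps-bio).
scores = {
    "BMA": (92.22, 100.0, 92.22),
    "DAA": (78.88, 87.77, 93.33),
    "KST": (82.22, 92.22, 93.33),
    "NB": (57.77, 67.41, 73.33),
    "PHS": (91.11, 97.77, 100.0),
    "VR": (85.55, 83.33, 88.88),
}


def improvement(unprocessed, processed):
    """Improvement in percentage points, rounded to two decimals."""
    return round(processed - unprocessed, 2)


# subject -> (R.I. for ps-DB, R.I. for ps-bio)
ri = {
    subject: (improvement(us, db), improvement(us, bio))
    for subject, (us, db, bio) in scores.items()
}

for subject, (ri_db, ri_bio) in ri.items():
    print(f"{subject}: ps-DB {ri_db:+.2f}, ps-bio {ri_bio:+.2f}")
```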

3. CONCLUSION:

Response time decreased for both processing schemes compared with the unprocessed signal, signifying a reduction in the burden on the perception process. The relative decrease in response time was statistically significant for the processing scheme ps-bio. This indicates that the processing schemes are effective in reducing perceptual load. Recognition scores indicate that binaural dichotic presentation improved consonantal identification, and the improvements were highest with processing scheme ps-bio. From the analysis of recognition scores and information transmission, it is observed that the scheme giving maximum benefit by reducing the effects of increased masking depends on the individual hearing impairment configuration. Reception of the relatively robust consonantal features (voicing, manner, and nasality) also improves because of dichotic presentation. Hence the processing schemes for dichotic presentation have the potential to improve speech perception for persons using binaural hearing aids.

For hearing-impaired subjects, the improvement in consonantal reception and the reduction in response time do not follow the same trend. Therefore, extended tests with hearing-impaired subjects are needed to estimate the detailed advantages of the processing schemes.

REFERENCES:

[1] Miller, G. A., and Nicely, P. E., "An analysis of perceptual confusions among some English consonants," J. Acoust. Soc. Am., vol. 27(2), pp. 338-352, (1955).

[2] Dubno, J. R., and Levitt, H., "Predicting consonant confusions from acoustic analysis," J. Acoust. Soc. Am., vol. 69(1), pp. 249-261, (1981).

[3] Moore, B. C. J., An Introduction to the Psychology of Hearing, 4th ed. London: Academic, (1997).

[4] Pickett, J. M., The Acoustics of Speech Communication: Fundamentals, Speech Perception Theory, and Technology (Allyn and Bacon, Boston, Massachusetts), (1999).

[5] CHABA, "Speech-perception aids for hearing-impaired people: Current status and needed research," J. Acoust. Soc. Am., vol. 90, pp. 637-683, (1991).

[6] Loizou, P. C., "Mimicking the human ear," IEEE Signal Processing Magazine, vol. 15(5), pp. 101-130, (1998).

[7] Burrus, C. S., Gopinath, R. A., and Guo, H., Introduction to Wavelets and the Wavelet Transforms: A Primer (Prentice Hall, Upper Saddle River, NJ), (1998).

[8] Daubechies, I., Ten Lectures on Wavelets, Philadelphia: SIAM, CBMS-NSF Regional Conference in Applied Mathematics 61, (1992).

[9] Vaidyanathan, P. P., Multirate Systems and Filter Banks (Prentice Hall, Englewood Cliffs, NJ), (1993).

[10] Chaudhari, D. S., "Dichotic presentation of speech signal for improving speech perception for the bilateral sensorineural hearing impairment," Ph.D. Thesis, Dept. of Biomedical Engg., IIT Bombay, (2000).

Fig. 1. Power spectrum of speech signal /asa/: a) unprocessed signal, b) processed signal ps-bio (left ear), c) processed signal ps-bio (right ear). Horizontal axis: frequency (kHz), 0 to 5.

Fig. 2. Power spectrum of speech signal /asa/: a) unprocessed signal, b) processed signal ps-DB (left ear), c) processed signal ps-DB (right ear). Horizontal axis: frequency (kHz), 0 to 5.

Fig. 3. Formant transitions of speech signal /asa/: a) unprocessed signal, b) processed signal ps-bio (left ear), c) processed signal ps-bio (right ear). Horizontal axis: time (ms), 0 to 600.

Fig. 4. Formant transitions of speech signal /asa/: a) unprocessed signal, b) processed signal ps-DB (left ear), c) processed signal ps-DB (right ear). Horizontal axis: time (ms), 0 to 600.

Table 1. Response time. Mean: average response time (s); S.D.: standard deviation (s); R.D.: relative decrease in % with respect to unprocessed.

Subject  Signal   Min    Max    Mean   S.D.   R.D.
BMA      US       1.31   2.59   2.08   0.36   -
         PS-DB    1.07   2.48   1.68   0.49   18.91
         PS-bio   1.23   2.26   1.74   0.35   15.91
DAA      US       1.21   2.98   1.76   0.55   -
         PS-DB    1.23   2.54   1.70   0.38   3.528
         PS-bio   1.04   2.75   1.68   0.50   4.34
KST      US       1.87   2.65   2.25   0.21   -
         PS-DB    1.46   2.96   2.13   0.41   5.14
         PS-bio   1.37   2.67   1.86   0.40   17.45
NB       US       1.89   3.56   2.73   0.53   -
         PS-DB    1.21   2.73   1.96   0.51   28.10
         PS-bio   1.25   2.51   1.86   0.36   31.75
PHS      US       1.87   2.59   2.30   0.21   -
         PS-DB    1.48   2.71   1.95   0.38   15.28
         PS-bio   1.70   2.56   2.17   0.23   5.514
VR       US       1.26   2.96   1.88   0.52   -
         PS-DB    1.04   2.53   1.61   0.43   14.32
         PS-bio   1.09   2.73   1.53   0.51   18.49

Table 2. Percentage recognition scores. R.I.: relative improvement in % with respect to unprocessed; Avg.: averaged recognition scores; Std. Dev.: standard deviation.

Subject /    Unprocessed    Processed         Processed
R.I.         Signal (US)    Signal (ps-DB)    Signal (ps-bio)
BMA          92.22          100               92.22
  R.I.       -              7.78              0
DAA          78.88          87.77             93.33
  R.I.       -              8.89              14.45
KST          82.22          92.22             93.33
  R.I.       -              10                11.11
NB           57.77          67.41             73.33
  R.I.       -              9.64              15.56
PHS          91.11          97.77             100
  R.I.       -              6.66              8.89
VR           85.55          83.33             88.88
  R.I.       -              -2.22             3.33
Avg.         81.10          87.88             90.15
Std. Dev.    11.51          10.83             8.22

Mahesh T. Kolte, Professor, Department of Electronics and Telecommunication Engineering, Maharashtra Academy of Engineering, Alandi (D), [email protected]

D. S. Chaudhari, Assistant Professor, Department of Electronics and Telecommunication Engineering, Government College of Engineering, [email protected]
