
N.T. Nguyen, R. Katarzyniak (Eds.): New Chall. in Appl. Intel. Tech., SCI 134, pp. 155–162, 2008. springerlink.com © Springer-Verlag Berlin Heidelberg 2008

On Vowels Segmentation and Identification Using Formant Transitions in Continuous Recitation of Quranic Arabic

Hafiz Rizwan Iqbal, Mian Muhammad Awais, Shahid Masud, and Shafay Shamail

Department of Computer Science, Lahore University of Management Sciences, DHA 54792, Lahore, Pakistan {rizwani,awais,smasud,sshamail}@lums.edu.pk

Abstract. This paper provides an analysis of cues to identify Arabic vowels. A new algorithm for vowel identification has been developed that uses formant frequencies. The algorithm extracts the formants of already segmented recitation audio files and recognizes the vowels on the basis of these extracted formants. The investigation has been carried out in the context of the recitation principles of the Holy Quran, commonly known as Tajweed rules. The primary objective of this work is to identify zabar /a/, zair /e/ and pesh /u/ mistakes made by the reciter during recitation. Acoustic analysis was performed on 150 samples from different reciters, and a corpus comprising recitations of five experts was used to validate the results. The vowel identification system developed here has shown up to 90% average accuracy on continuous speech files comprising around 1000 vowels.

Keywords: Tajweed, Formant transition track(s), Wavelet transforms, Location, Trend, Gradient, Vowels, Zabar, Zair, Pesh, Laam, Meem, Noon, Continuous Arabic speech.

1 Introduction

Keeping in view the emerging demands of speech recognition, a prototype application for understanding Arabic recitation has been developed. This system acts as a language tutor that corrects mistakes in pronunciation and recitation. The application is built around an automated speech recognition system for which speech segmentation and identification are essential components, and high segmentation accuracy is required for such a system to work. Phoneme segmentation in continuous Arabic speech was briefly described in our prior work [1]; this paper focuses on the identification of vowels from segmented speech.

Standard Arabic has 34 phonemes, of which 6 are vowels and 28 are consonants [2, 3]. Vowels are fundamental speech units present in every spoken language. The Arabic vocalic system is composed of three short vowels /a/, /e/, /u/ and three vowels of the same quality but of longer duration /aa/, /ee/, /uu/.

Several features and techniques for vowel identification in Arabic have been discussed in the literature [4–9]. Most existing schemes are based on a standard set of features such as spectral densities, intensities or formant frequencies. These techniques are known to result in a Recognition Error Rate (RER) of around 10% [10].


In this research, formant transition tracks along with phoneme duration cues have been used for vowel identification in Quranic recitation. The identification algorithm can be divided into two stages. In the first phase, the output of the segmentation algorithm is provided to the Praat speech processing tool [11], which returns all formants existing in a particular time slot. In the second phase, the segmented vowels are separated from the nasals (/l/ laam, /m/ meem, /n/ noon) and identified as /a/, /e/, /u/ on the basis of the formants obtained in the first step.

This work aims at establishing the relationship between the formant values corresponding to different vowels in Arabic recitation. Previous research in this area has focused on the first three formants, i.e. F1, F2 and F3. In comparison, this work uses only the first two formants, F1 and F2, for vowel identification. Experimental results obtained using these two formants show 90% accuracy for vowel identification.

The rest of the paper is organized as follows: Section 2 outlines the cues used during the different experiments in the vowel identification system. Section 3 explains the methodology adopted for vowel identification. Section 4 presents the results and analysis of the proposed algorithm. Section 5 gives the conclusion and proposed future work, followed by the references.

2 Features Analysis

Properties related to phonemes are embedded in different types of signal features which can act as cues for vowel identification. Different combinations of these cues generate different results with varying accuracy levels. The features used here are formant transition tracks, which carry useful information in the formant frequency trends of all the phonemes [4]. Each group of formant transition tracks in F1 and F2 possesses features that allow unique identification of the standard Arabic phonemes, as described below.

2.1 Formant Analysis

The vowel identification algorithm developed here is speaker independent. Speaker independence has been achieved through a preprocessing step that relies on Location, Trend and Gradient (LTG) cues taken from the graphical analysis of formant tracks [4].
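As an illustration of how such LTG cues could be computed from a formant track, the following C++ sketch summarizes a track by its location, trend and gradient. The concrete definitions used here (mean frequency for location, start-to-end direction for trend, least-squares slope for gradient) and the function name are assumptions made for illustration only; the exact formulation used by the system follows [4] and is not reproduced in this paper.

```cpp
// Illustrative sketch only: the paper takes Location, Trend and Gradient (LTG)
// cues from graphical analysis of formant tracks [4]; the definitions below
// are assumptions, not the published formulation.
#include <cstddef>
#include <numeric>
#include <vector>

struct LTG {
    double location;  // assumed: mean formant frequency over the segment (Hz)
    int    trend;     // assumed: +1 rising, -1 falling, 0 flat (end vs. start)
    double gradient;  // assumed: least-squares slope of the track (Hz per frame)
};

LTG describeTrack(const std::vector<double>& track) {
    LTG r{0.0, 0, 0.0};
    const std::size_t n = track.size();
    if (n == 0) return r;

    // Location: average frequency of the track.
    r.location = std::accumulate(track.begin(), track.end(), 0.0) / n;

    // Trend: direction of the track from its first to its last frame.
    const double delta = track.back() - track.front();
    if (delta > 0) r.trend = 1; else if (delta < 0) r.trend = -1;

    // Gradient: least-squares slope against the frame index 0..n-1.
    double sx = 0, sy = 0, sxy = 0, sxx = 0;
    for (std::size_t i = 0; i < n; ++i) {
        sx  += i;
        sy  += track[i];
        sxy += i * track[i];
        sxx += static_cast<double>(i) * i;
    }
    const double denom = n * sxx - sx * sx;
    if (denom != 0) r.gradient = (n * sxy - sx * sy) / denom;
    return r;
}
```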

Typical formant transitions for zabar /a/, zair /e/ and pesh /u/ are shown in Figures 1, 2 and 3 respectively. It can be observed from these figures that the trend and gradient sub-cues show almost the same characteristics for all three vowels, whereas a prominent difference appears in the location of formants F1 and F2. When a speaker recites the /a/ (zabar) vowel, the separation between F1 and F2 is about 800-900 Hz, as shown in Figure 1. When /u/ (pesh) is recited, this separation decreases to about 400-500 Hz (roughly half that of zabar), as shown in Figure 3. For the vowel /e/ (zair), the separation between F1 and F2 is about 1700-1900 Hz (roughly twice that of zabar), as depicted in Figure 2. Formant transitions of F1 and F2 were also observed for the nasal (/l/ laam, /m/ meem, /n/ noon) sounds; for all three, F1 ranges from 300-500 Hz while F2 lies between 1250-1650 Hz.
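A minimal sketch of this F2-F1 location cue is given below. The separation ranges are those reported above; treating them as hard decision bands, as well as the function name, are illustrative assumptions rather than the paper's implementation (the system's actual decision logic is given in Fig. 5).

```cpp
// Minimal sketch of the F2-F1 separation cue: the ranges are taken from the
// text (about 800-900 Hz for /a/, 400-500 Hz for /u/, 1700-1900 Hz for /e/).
#include <string>

std::string vowelFromSeparation(double f1, double f2) {
    const double sep = f2 - f1;  // separation between the first two formants (Hz)
    if (sep >= 1700.0 && sep <= 1900.0) return "/e/ (zair)";
    if (sep >=  800.0 && sep <=  900.0) return "/a/ (zabar)";
    if (sep >=  400.0 && sep <=  500.0) return "/u/ (pesh)";
    return "unclassified";       // outside the reported ranges
}
```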


Fig. 1. Representation of /a/ (a) Waveform of signal (b) Formant transitions

Fig. 2. Representation of /e/ (a) Waveform of signal (b) Formant transitions

Fig. 3. Representation of /u/ (a) Waveform of signal (b) Formant transitions



3 Methodology of Vowel Identification

This section describes in detail the settings, constraints, algorithm and calculations involved in the research presented here.

3.1 Experimental Setup

The recordings were conducted in a noise-free environment at an 8 kHz sampling rate. Initially, 12 speakers aged between 15 and 30 years were selected for the recordings. All reciters in this set belonged to the same region and were experts in the recitation of the Holy Quran according to Tajweed rules. The segmentation and identification algorithms were developed in C++. After segmentation, the samples of vowels (/a/, /e/, /u/) and consonants uttered by a particular speaker were obtained as wave files, and the Praat tool was used to analyze the segmented data. Over 150 samples of the vowels /a/, /e/, /u/ were used in this analysis.

3.2 Algorithm Implementation

The segmentation algorithm generates the time boundaries of the vowels. In some cases (e.g. when a vowel and a nasal occur together during recitation), these boundaries contain both a vowel part and a nasal (/l/ laam, /m/ meem, /n/ noon) part. Formant locations are then calculated at each point within the given time slot. Using these formants, the vowels are separated from the nasals and classified automatically by the application developed in C++. Figure 4 shows an abstract-level diagram of the vowels segmentation and identification system.

Fig. 4. Architecture of the Vowels Identification System in Arabic Recitation (Speech Sample → Segmentation Processor → Vowels and Consonants; Vowels → Formants Processor → Separate Vowels & Nasals → /a/ Zabar, /e/ Zair, /u/ Pesh)


An audio file is used as input to the system and stored in numerical form in an array. The input speech is sampled at 8 kHz and windows of 128 samples are taken for further processing. These data are passed to the segmentation processor, which generates the classified phonemes (vowels and consonants). The time boundaries of the vowels are then sent to the formant processor, where the Praat tool calculates all formants (F1, F2, F3 and F4) for each time slot. Finally, the identification module uses these formant values to separate vowels from nasals and to classify the vowel part as /a/, /e/, /u/ (zabar, zair and pesh respectively). Figure 5 shows the algorithm developed for separating the vowels from nasals and identifying them; a minimal code sketch of this decision logic is given after Fig. 5. It has been concluded from the experiments that the following Lower Formant (LF) and Upper Formant (UF) limits consistently correspond to the vowels; they are applied to F1 or F2 as indicated in the algorithm:

LF for /a/ = 550 Hz, UF for /a/ = 900 Hz
LF for /e/ = 1800 Hz, UF for /e/ = 2550 Hz
LF for /u/ = 750 Hz, UF for /u/ = 1100 Hz

1. For i from 1 to 4, get the formants Fi from the Praat tool and read each Fi against a particular time slot.
2. If formant F1 lies between “LF for /a/” and “UF for /a/”, then it is the vowel /a/.
3. If F1 falls below “LF for /a/”, then check F2:
   3.a. If F2 lies between “LF for /e/” and “UF for /e/”, then it is the vowel /e/.
   3.b. If F2 lies between “LF for /u/” and “UF for /u/”, then it is the vowel /u/.
   3.c. If F2 lies between “UF for /u/” and “LF for /e/”, then it is a nasalized vowel.
4. Repeat until all the time slots are finished.

Fig. 5. Algorithm to Identify Vowels in Arabic Recitation
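The following C++ sketch mirrors the decision logic of Fig. 5 using the LF/UF limits listed above. The type and function names are illustrative assumptions; the published system embeds this logic in a larger C++ application driven by per-slot Praat formant output.

```cpp
// Sketch of the Fig. 5 decision logic for one time slot, using the reported
// limits (/a/: 550-900 Hz on F1; /e/: 1800-2550 Hz on F2; /u/: 750-1100 Hz on F2).
enum class Sound { Zabar, Zair, Pesh, NasalizedVowel, Unknown };

Sound classifySlot(double f1, double f2) {
    const double LF_A = 550.0,  UF_A = 900.0;   // /a/ limits on F1
    const double LF_E = 1800.0, UF_E = 2550.0;  // /e/ limits on F2
    const double LF_U = 750.0,  UF_U = 1100.0;  // /u/ limits on F2

    // Step 2: F1 inside the /a/ band -> zabar.
    if (f1 >= LF_A && f1 <= UF_A) return Sound::Zabar;

    // Step 3: F1 below the /a/ band -> decide on F2.
    if (f1 < LF_A) {
        if (f2 >= LF_E && f2 <= UF_E) return Sound::Zair;            // step 3.a
        if (f2 >= LF_U && f2 <= UF_U) return Sound::Pesh;            // step 3.b
        if (f2 >  UF_U && f2 <  LF_E) return Sound::NasalizedVowel;  // step 3.c
    }
    return Sound::Unknown;  // outside all reported bands
}
```

In the full system this function would be applied to every time slot returned by the segmentation processor (step 4 of Fig. 5), with consecutive identical labels merged into one phoneme.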

4 Results and Analysis

The speech signal was divided into time slots and, for each slot, the location of the formants was probed to find the number of consecutive slots specifying a certain vowel or non-vowel. For each detected phoneme (vowel or non-vowel), the starting time, ending time and proposed classification were evaluated. As an example, summarized results for each vowel, generated by the method described in Section 3 for five different speakers, are shown in Tables 1, 2 and 3. The results for non-vowel (/l/ laam, /m/ meem, /n/ noon) classification are shown in Table 4. Each table lists the total number of vowels identified manually, the total number of vowels identified by the proposed algorithm, the number of actual vowels classified as other sounds (VasO), and the number of other sounds (vowels or non-vowels) classified as the particular vowel (OasV). The mathematical relationships for calculating recall and precision are as follows:

\text{Vowel Recall} = \frac{\text{Vowels Identified Correctly}}{\text{Vowels Identified Correctly} + \text{VasO}} \quad (1)

\text{Vowel Precision} = \frac{\text{Vowels Identified Correctly}}{\text{Vowels Identified Correctly} + \text{OasV}} \quad (2)

Precision defines the proportion of classified phonemes that are actually correct, whereas recall depicts the sensitivity, i.e. the proportion of the correct phonemes that are retrieved. The overall accuracy of the system for /a/, /e/ and /u/ is 96%, 92.5% and 84% respectively, and for nasals the accuracy is 87%. The average recall over all vowels and nasalized sounds is 93% and the average precision over both types of sounds is 86%. The accuracy of the whole system is about 90%.
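As a worked check of Eqs. (1) and (2), the short sketch below reproduces the first row of Table 1 (37 vowels identified correctly, VasO = 0, OasV = 1), which yields 100% recall and about 97% precision. Variable names are illustrative, and the correctly identified count is read here from the V1 (Algo) column.

```cpp
// Worked example of Eqs. (1) and (2) on Table 1, row 1.
#include <cstdio>

int main() {
    const int correct = 37;  // vowels of /a/ identified correctly
    const int vAsO    = 0;   // actual /a/ classified as something else
    const int oAsV    = 1;   // other sounds classified as /a/

    const double recall    = 100.0 * correct / (correct + vAsO);  // Eq. (1)
    const double precision = 100.0 * correct / (correct + oAsV);  // Eq. (2)

    std::printf("Recall: %.0f%%  Precision: %.0f%%\n", recall, precision);
    return 0;
}
```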

Table 1. Vowel V1 (/a/) Recall & Precision

Files   V1 (Manual)   V1 (Algo)   V1 as O   O as V1   V1 Recall   V1 Precision
1       37            37          0         1         100%        97%
2       35            35          0         3         100%        92%
3       35            34          1         0         94%         97%
4       40            39          1         1         95%         95%
5       56            55          1         6         96%         89%

Table 2. Vowel V2 (/e/) Recall & Precision

Files   V2 (Manual)   V2 (Algo)   V2 as O   O as V2   V2 Recall   V2 Precision
1       15            14          1         1         87%         87%
2       15            15          0         3         100%        83%
3       20            20          0         0         100%        100%
4       22            22          0         0         100%        100%
5       10            10          0         5         100%        67%

Table 3. Vowel V3 (/u/) Recall & Precision

Files   V3 (Manual)   V3 (Algo)   V3 as O   O as V3   V3 Recall   V3 Precision
1       7             7           0         1         100%        87%
2       13            12          1         3         86%         75%
3       8             8           0         3         100%        73%
4       9             9           0         2         100%        82%
5       13            11          2         5         73%         61%


Table 4. Non-Vowels V4 (Nasals /l/, /m/, /n/) Recall & Precision

Files   V4 (Manual)   V4 (Algo)   V4 as O   O as V4   V4 Recall   V4 Precision
1       63            60          3         1         91%         94%
2       62            53          4         1         80%         84%
3       51            48          3         1         89%         92%
4       63            60          3         1         91%         94%
5       71            57          0         1         80%         79%

5 Conclusion and Future Work

Formant transition tracks are cues which play a major role in the identification of vowels in Arabic recitation. A new algorithm using formant transitions was developed for vowel identification and has been shown to provide about 90% accuracy on continuous speech samples. The approach developed here can be used in speech recognition solutions for the recitation of religious scriptures, for poetry, or for foreign language learning.

The scheme proposed here can be extended to use other features, such as phoneme duration, wavelet transforms, MFCCs and cochleagrams, along with formant transitions to further increase the accuracy of the system. Further investigation into the effects of vowel lengthening, Qalqalah vowels and other Tajweed rules for Quranic Arabic recitation is also being carried out. The objective of this further work is to identify recitation mistakes in real time for use in an interactive learning environment.

Acknowledgments. The authors acknowledge the funding provided by the Lahore University of Management Sciences (LUMS), Lahore for the research conducted.

References

1. Ahmad, W., Awais, M.M., Shamail, S., Masud, S.: Continuous Arabic Speech Segmentation Using FFT Spectrogram. In: Innovations in Information Technology Conference, pp. 1–6. IEEE Press, Dubai (2006)

2. Maryati, M.: Man-Machine Communication and Arabic Language. Technical report, Scientific Studies and Research Center, Syria (1987)

3. Abady, Z.A.: Arabic Speech Processing. In: International Conference on Electronics Circuits and Systems, Jordan, pp. 647–650 (1995)

4. Shoaib, M., Rasheed, F., Akhtar, J., Awais, M., Masud, S., Shamail, S.: A Novel Approach to Increase the Robustness of Speaker Independent Arabic Speech Recognition. In: INMIC Conference, pp. 371–376. IEEE Press, Pakistan (2003)

5. Selouani, S.-A., Caelen, J.: Recognition of Arabic Phonetic Features Using Neural Networks and Knowledge-Based System: a Comparative Study. In: Intelligence and Systems Conference, pp. 404–411. IEEE Press, USA (1998)

162 H.R. Iqbal et al.

6. Gendrot, C., Adda-Decker, M.: Impact of duration on F1/F2 formant values of oral vowels: an automatic analysis of large broadcast news corpora in French and German. In: INTERSPEECH, Portugal, pp. 2453–2456 (2005)

7. Al-Tamimi, J., Carre, R., Marsico, E.: The status of vowels in Jordanian and Moroccan Arabic: Insights from production and perception. In: 48th meeting of the Acoustical Society of America, USA, p. 2629 (2004)

8. Al-Anani, M.: Arabic Vowel Formant Frequencies. In: International Congress of Phonetic Sciences, USA, pp. 2117–2119 (1999)

9. Gendrot, C., Adda-Decker, M.: Impact of duration and vowel inventory size on formant values of oral vowels: an automated formant analysis from eight languages. In: International Congress of Phonetic Sciences, Germany, pp. 1417–1420 (2007)

10. Quran Phonetic Search Engine, http://www.islamicity.com/ps/default.asp

11. Speech Analysis and Processing Tool (Praat), http://www.fon.hum.uva.nl/praat/
12. St. Ignatius High School, http://www2.ignatius.edu/faculty/turner/arabicspanish.htm