2013 International Conference on Advances in Computing, Communications and Informatics (ICACCI), Mysore, 22-25 August 2013

HMM Based Isolated Kannada Digit Recognition System using MFCC

Muralikrishna H, Asst. Professor, Dept. of E&C, MIT, Manipal.

Ananthakrishna T, Asst. Professor, Dept. of E&C, MIT, Manipal.

Dr. Kumara Shama, Professor, Dept. of E&C, MIT, Manipal.

Abstract--In this paper, we implement an isolated Kannada digit recognition system that uses Mel frequency cepstral coefficients (MFCC) as the feature vector and a Hidden Markov Model (HMM) as the pattern recognizer. The system is designed to recognize isolated utterances of Kannada digits. A K-means procedure is applied to the feature vectors to obtain the observation sequence for a discrete HMM. The system is motivated by the need for voice-controlled machines that operate in the Kannada language. Its performance is evaluated and compared for three feature sets: MFCC alone, MFCC with its first-order derivatives, and MFCC with its first- and second-order derivatives.

Keywords--Mel frequency cepstral coefficients (MFCC), Hidden Markov Model (HMM), vector quantization, speech recognition, Kannada language.

I. INTRODUCTION

Despite major advances in signal processing technology, current computers and other electronic devices still require some level of physical interaction with their users. A person who is blind or physically handicapped may not be able to operate these systems. Communicating with computers by speech in one's native language is a good solution to this problem. Hearing-impaired people can also use speech-to-text conversion systems to facilitate human-to-human communication. Speech-to-text conversion can further be applied to automatic machines such as ATMs and to automatic telephone call processing. A voice-controlled system in the native language is therefore desirable.

Automatic speech recognition converts speech into the equivalent text of a specific language. Because the speech signal is non-stationary and carries multi-dimensional information, automatic speech recognition is a complex task. Extracting textual information from speech is challenging because different speakers utter the same word in different ways, and even repeated utterances of the same word by the same person often differ in length and other parameters. Many feature extraction methods have been used in speech processing, including linear predictive coefficients (LPC), Mel frequency cepstral coefficients (MFCC), and perceptual linear prediction (PLP) coefficients [1]. MFCC, together with its derivatives, is the preferred choice of researchers for speech recognition [2]. This is because the MFC coefficients are nearly orthogonal (largely uncorrelated with one another), and the Mel filter bank models the frequency perception of the human ear, which gives better results than other feature extraction methods [3]. In speech recognition, MFCC combined with HMM gives very good results compared with other methods [1].

The originality of this paper lies in the development of a spoken-digit recognition system for the Kannada language. Although considerable work has been done for many Indian languages [4][5], Kannada has not yet been explored by many researchers in the context of HMM-based speech recognition (apart from a few recent publications [6][7]).
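The following sketch makes the Mel filter bank idea concrete. It converts between Hz and mels with the widely used formula $m = 2595 \log_{10}(1 + f/700)$ and places the center frequencies of triangular filters uniformly on the mel scale. It illustrates the idea under stated assumptions and is not the authors' implementation. The filter count (20), the 4 kHz upper edge (which assumes an 8 kHz sampling rate), and the function names are all hypothetical.

```python
import math


def hz_to_mel(f_hz: float) -> float:
    """Convert a frequency in Hz to mels using m = 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m: float) -> float:
    """Inverse of hz_to_mel: convert mels back to Hz."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filter_centers(num_filters: int, f_low: float, f_high: float) -> list[float]:
    """Center frequencies (Hz) of num_filters triangular filters spaced evenly in mels.

    The filter edges occupy num_filters + 2 evenly spaced mel points between
    f_low and f_high; the centers are the num_filters interior points.
    """
    m_low, m_high = hz_to_mel(f_low), hz_to_mel(f_high)
    step = (m_high - m_low) / (num_filters + 1)
    return [mel_to_hz(m_low + step * (i + 1)) for i in range(num_filters)]


if __name__ == "__main__":
    # Hypothetical settings: 20 filters up to 4 kHz (assumes 8 kHz sampling).
    centers = mel_filter_centers(num_filters=20, f_low=0.0, f_high=4000.0)
    gaps = [b - a for a, b in zip(centers, centers[1:])]
    print(f"first gap {gaps[0]:.1f} Hz, last gap {gaps[-1]:.1f} Hz")
```

Uniform spacing in mels places the filters close together at low frequencies and farther apart at high frequencies. This mimics the ear's coarser frequency resolution at high frequencies.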

II. ISOLATED DIGIT RECOGNITION SYSTEM

The speech recognition system operates in two phases: training the word models and testing. In the training phase, we estimate the model parameters $\lambda = (A, B, \pi)$ that best represent a particular class. One model is generated for each Kannada digit. Training includes preprocessing, speech feature extraction, clustering of the features, and building the HMM. Vector quantization (VQ) is used to convert each continuous observation vector into a discrete codebook index.
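The clustering and vector quantization steps can be sketched as plain K-means (Lloyd's algorithm) followed by a nearest-codeword lookup. This is a minimal pure-Python sketch. It assumes squared Euclidean distance and a codebook initialized from randomly chosen training vectors. The codebook size, the iteration count, and the function names (`train_codebook`, `quantize`) are hypothetical, because the paper's exact settings are not stated in the legible text.

```python
import random

Vector = list[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(codebook: list[Vector], v: Vector) -> int:
    """Index of the codeword closest to v."""
    return min(range(len(codebook)), key=lambda k: sq_dist(codebook[k], v))


def train_codebook(vectors: list[Vector], size: int, iters: int = 20, seed: int = 0) -> list[Vector]:
    """Plain K-means (Lloyd's algorithm) over the pooled training feature vectors."""
    rng = random.Random(seed)
    # Initialize the codewords from randomly chosen training vectors.
    codebook = [list(v) for v in rng.sample(vectors, size)]
    for _ in range(iters):
        # Assignment step: group vectors by their nearest codeword.
        clusters: list[list[Vector]] = [[] for _ in range(size)]
        for v in vectors:
            clusters[nearest(codebook, v)].append(v)
        # Update step: move each codeword to the mean of its cluster.
        for k, members in enumerate(clusters):
            if members:  # keep the old codeword if its cluster is empty
                dim = len(members[0])
                codebook[k] = [sum(m[d] for m in members) / len(members) for d in range(dim)]
    return codebook


def quantize(codebook: list[Vector], frames: list[Vector]) -> list[int]:
    """Map each feature frame to its codebook index (the discrete observation sequence)."""
    return [nearest(codebook, f) for f in frames]
```

In a full system, the codebook would be trained once on feature vectors pooled from all training utterances. `quantize` would then turn each utterance into the discrete symbol sequence consumed by a digit's HMM.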

In the testing phase, the incoming speech undergoes the same preprocessing, feature extraction, and vector quantization, followed by recognition [1]. Testing compares the unknown incoming utterance with each of the models in the database and selects the model that best matches it.

A. Sample Preprocessing

The input speech signal uttered by a speaker contains background noise and silent periods in addition to the useful information. The preprocessing step minimizes the noise present in the speech recordings. The beginning of speech in a recorded Kannada digit utterance is detected from the energy of the signal. The silence removal algorithm used in this system is based on the energy of the voiced speech corresponding to a digit utterance [8].

B. Feature extraction

Speech is a quasi-stationary signal that can be assumed stationary over durations of about 25 ms [9]. We therefore divide the speech signal into 20 ms frames with 50% overlap between adjacent frames. This first step in the feature processing stage is known as frame blocking of

978-1-4673-6217-7/13/$31.00 ©2013 IEEE
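The following sketch shows frame blocking (20 ms frames with 50% overlap) together with a simple energy-based trim of leading and trailing silence. It assumes an 8 kHz sampling rate and a threshold of 10% of the peak frame energy. Both values and the function names are hypothetical. The trimming rule is a simple stand-in, not the silence removal algorithm of [8].

```python
import math


def frame_signal(x: list[float], fs: int, frame_ms: float = 20.0, overlap: float = 0.5) -> list[list[float]]:
    """Split x into frames of frame_ms milliseconds with the given fractional overlap."""
    frame_len = int(round(fs * frame_ms / 1000.0))
    hop = int(round(frame_len * (1.0 - overlap)))
    if len(x) < frame_len:
        return []
    return [x[s:s + frame_len] for s in range(0, len(x) - frame_len + 1, hop)]


def frame_energy(frame: list[float]) -> float:
    """Mean squared amplitude of one frame."""
    return sum(s * s for s in frame) / len(frame)


def trim_silence(frames: list[list[float]], rel_threshold: float = 0.1) -> list[list[float]]:
    """Drop leading and trailing frames whose energy is below rel_threshold * peak energy."""
    if not frames:
        return []
    energies = [frame_energy(f) for f in frames]
    threshold = rel_threshold * max(energies)
    voiced = [i for i, e in enumerate(energies) if e >= threshold]
    # Keep everything between the first and last frame above the threshold.
    return frames[voiced[0]:voiced[-1] + 1]


if __name__ == "__main__":
    fs = 8000  # hypothetical sampling rate
    # 100 ms silence, 200 ms of a 500 Hz tone, 100 ms silence.
    x = [0.0] * 800 + [math.sin(2 * math.pi * 500 * n / fs) for n in range(1600)] + [0.0] * 800
    frames = frame_signal(x, fs)
    print(len(frames), "frames,", len(trim_silence(frames)), "after trimming")
```

At 8 kHz, a 20 ms frame holds 160 samples, and 50% overlap gives a hop of 80 samples.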


[The text of page 2 is not legible in the source transcript. The one recoverable formula is the HMM parameter set of Eq. (1):]

$$\lambda = (A, B, \pi) \qquad (1)$$
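Section II describes testing as comparing the incoming utterance with each digit model and choosing the closest one. For a discrete HMM $\lambda = (A, B, \pi)$, a standard way to score an observation sequence $O$ is to compute $P(O \mid \lambda)$ with the forward algorithm [1]. The recognized digit is then the model with the highest score. The sketch below scales the forward variables at each step to avoid numerical underflow. The function names and the dictionary-of-models layout are hypothetical. The legible text does not confirm whether the authors used forward or Viterbi scoring.

```python
import math


def log_likelihood(A, B, pi, obs):
    """log P(O | lambda) for a discrete HMM, via the scaled forward algorithm.

    A[i][j]: transition probability from state i to state j
    B[j][k]: probability of emitting codebook symbol k in state j
    pi[i]:   initial state probability
    obs:     non-empty list of codebook indices
    """
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    log_p = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Induction step: alpha_t(j) = sum_i alpha_{t-1}(i) a_ij * b_j(o_t)
            alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][obs[t]] for j in range(n)]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # sequence impossible under this model
        log_p += math.log(scale)
        alpha = [a / scale for a in alpha]  # renormalize to avoid underflow
    return log_p


def recognize(models, obs):
    """Return the label whose (A, B, pi) model gives the highest log-likelihood."""
    return max(models, key=lambda label: log_likelihood(*models[label], obs))
```

Working with log-likelihoods keeps the scores of long utterances from underflowing to zero. The logarithm is monotonic, so it does not change which model scores highest.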


[The text of page 3, which includes the description of the first two experiments, is not legible in the source transcript.]


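TABLE I. PERCENTAGE OF RECOGNITION ACCURACY.

Recognition accuracy (%) per Kannada digit for each feature set. The digit labels are not legible in the source, so rows are numbered.

Row   MFCC   MFCC, Delta MFCC   MFCC, Delta MFCC, Delta-Delta MFCC
1     80     92                 100
2     76     80                 96
3     72     80                 96
4     84     88                 100
5     72     80                 96
6     72     80                 96
7     76     84                 100
8     68     80                 92
9     76     88                 96
10    80     92                 100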
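To summarize Table I, the per-digit accuracies can be averaged for each feature set. This assumes every digit was tested with the same number of utterances, so that the unweighted mean equals the overall accuracy. The table itself does not state the test-set size.

```python
# Per-digit accuracies (%) transcribed from Table I, one list per feature set.
table_i = {
    "MFCC": [80, 76, 72, 84, 72, 72, 76, 68, 76, 80],
    "MFCC + Delta": [92, 80, 80, 88, 80, 80, 84, 80, 88, 92],
    "MFCC + Delta + Delta-Delta": [100, 96, 96, 100, 96, 96, 100, 92, 96, 100],
}


def mean(values):
    """Unweighted arithmetic mean."""
    return sum(values) / len(values)


for name, accs in table_i.items():
    print(f"{name}: mean {mean(accs):.1f}%, min {min(accs)}%, max {max(accs)}%")
# MFCC: mean 75.6%, min 68%, max 84%
# MFCC + Delta: mean 84.4%, min 80%, max 92%
# MFCC + Delta + Delta-Delta: mean 97.2%, min 92%, max 100%
```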

In the third experiment, we combined MFCC with its first- and second-order derivatives (delta and delta-delta MFCC) to form a 39-dimensional feature vector. In this case, the performance of the recognition system improves significantly. This is because MFCC coefficients alone do not capture the temporal information in speech [11], whereas delta and delta-delta coefficients capture changes across multiple speech frames. Using the delta and double-delta coefficients together with the MFCC coefficients therefore gives the best performance.
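Delta coefficients are commonly computed by regression over neighboring frames:

$$d_t = \frac{\sum_{n=1}^{N} n\,(c_{t+n} - c_{t-n})}{2\sum_{n=1}^{N} n^2}$$

Delta-delta coefficients are obtained by applying the same formula to the deltas. The sketch below assumes this formula with $N = 2$, repeating the first and last frames at the boundaries. The paper does not state its exact delta formula. The sketch also reads the 39-dimensional vector as 13 MFCCs plus 13 deltas plus 13 delta-deltas. The function names are hypothetical.

```python
def deltas(frames: list[list[float]], N: int = 2) -> list[list[float]]:
    """Regression deltas over +/-N frames; edge frames are repeated at the boundaries."""
    T = len(frames)
    denom = 2 * sum(n * n for n in range(1, N + 1))
    out = []
    for t in range(T):
        row = []
        for d in range(len(frames[t])):
            acc = 0.0
            for n in range(1, N + 1):
                nxt = frames[min(t + n, T - 1)][d]  # clamp at the last frame
                prv = frames[max(t - n, 0)][d]      # clamp at the first frame
                acc += n * (nxt - prv)
            row.append(acc / denom)
        out.append(row)
    return out


def stack_features(mfcc: list[list[float]]) -> list[list[float]]:
    """Concatenate static, delta, and delta-delta coefficients frame by frame."""
    d1 = deltas(mfcc)
    d2 = deltas(d1)  # delta-delta = delta of the deltas
    return [c + a + b for c, a, b in zip(mfcc, d1, d2)]
```

With 13 static coefficients per frame, `stack_features` yields 39 values per frame.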

In the design of the HMM, we set the number of states equal to the number of phonemes in the spoken digit. Table I summarizes the percentage recognition rate for the three cases.
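One way to realize "number of states equal to number of phonemes" is to initialize a left-to-right discrete HMM with one state per phoneme, where each state may only stay or advance to the next state. The left-to-right topology, the 0.5 self-loop probability, the uniform emission matrix, and the function name are assumptions for illustration. The legible text does not state the topology or the initial values. These parameters would then be re-estimated during training, for example with Baum-Welch re-estimation [1].

```python
def left_to_right_init(num_states: int, num_symbols: int, self_loop: float = 0.5):
    """Initial (A, B, pi) for a left-to-right discrete HMM (assumed topology).

    num_states:  e.g. the number of phonemes in the digit
    num_symbols: VQ codebook size
    """
    A = [[0.0] * num_states for _ in range(num_states)]
    for i in range(num_states):
        if i < num_states - 1:
            A[i][i] = self_loop            # stay in the current phoneme state
            A[i][i + 1] = 1.0 - self_loop  # advance to the next phoneme state
        else:
            A[i][i] = 1.0                  # the final state only loops on itself
    # Uniform emissions: no codebook symbol is preferred before training.
    B = [[1.0 / num_symbols] * num_symbols for _ in range(num_states)]
    # Every utterance starts in the first state.
    pi = [1.0] + [0.0] * (num_states - 1)
    return A, B, pi
```

For a hypothetical digit with four phonemes, `left_to_right_init(4, codebook_size)` yields a 4 x 4 transition matrix with nonzero entries only on the diagonal and the first superdiagonal.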

V. CONCLUSION

In this study, we implemented an HMM-based isolated speech recognition system for spoken Kannada digits. System performance was evaluated for MFCC features and their derivatives. The best results were obtained by combining the MFCC features with their first- and second-order derivatives. The system needs to be trained on a larger database to further improve recognition accuracy.

REFERENCES

[1] L. Rabiner and B.-H. Juang, Fundamentals of Speech Recognition. Englewood Cliffs, NJ: Prentice Hall, 1993.

[2] D. O'Shaughnessy, "Automatic speech recognition: History, methods and challenges," Pattern Recognition (Elsevier), vol. 41, pp. 2965-2979, 2008.

[3] S. B. Davis and P. Mermelstein, "Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 28, pp. 357-366, Aug. 1980.

[4] T. Pruthi, S. Saksena, and P. K. Das, "Swaranjali: Isolated word recognition for Hindi language using VQ and HMM," a speaker-dependent, real-time, isolated word recognizer for Hindi developed by Hughes Software Systems, Electronic City, Gurgaon, Haryana, India.

[5] M. Kumar et al., "A large-vocabulary continuous speech recognition system for Hindi," IBM Journal of Research and Development, Sep. 2004.

[6] Nagesha, K. Samudravijaya, and G. Hemantha Kumar, "Acoustic-phonetic analysis of Kannada accents," Tata Institute of Fundamental Research, Mumbai.

[7] M. A. Anusuya and S. K. Katti, "Wavelet packet based Kannada speech recognition," International Journal of Computer Applications (IJCA), ISSN 0975-8887, Proc. MPGINMC-2012, 7-8 Apr. 2012.

[8] G. Saha, S. Chakroborty, and S. Senapati, "A new silence removal and endpoint detection algorithm for speech and speaker recognition applications," Indian Institute of Technology, Kharagpur, India.

[9] T. F. Quatieri, Discrete-Time Speech Signal Processing: Principles and Practice. Prentice Hall, 2002.

[10] J. R. Deller Jr., J. G. Proakis, and J. H. L. Hansen, Discrete-Time Processing of Speech Signals. New York: Macmillan, 1993.

[11] R. S. Kurcan, "Isolated word recognition from in-ear microphone data using hidden Markov models (HMM)," M.S. thesis (electrical engineering and systems engineering), Naval Postgraduate School, Mar. 2006.
