a framework for speech recognition development

7/30/2019 A Framework for Speech Recognition Development


A Framework for Speech Application Development

By

Jason Elroy Martis

NMAMIT Nitte

[email protected]



Agenda

Introduction

Applications

Working and Types

FSM

Problems

Proposed Solution

Results

Conclusion

References



Normal Ways of Interaction

Normal interaction takes two basic forms:

Language

Meta language (body language)

Both forms occur simultaneously, which makes the interaction experience richer.



Language Is Communicated Through Speech

Language is communicated in the form of speech.

What is speech?

Speech is the vocalized form of human communication.

It is based upon the syntactic combination of lexicals and names that are drawn from vocabularies.

It is the most natural way for us to interact.

Example: "Hey! How are you?"



Hence Speech Recognition (SR)

Speech recognition is the process of converting a speech signal to a sequence of lexicals by means of an algorithm.

That is, give an instruction as a speech signal and the computer will recognize it.

Is this necessary?

Of course. It makes our communication with the electronic or virtual world more natural.



Applications of SR

There are innumerable applications. Some are:

Military uses

Remote command and control centers (planes, satellites, etc.)

Health care

Automated medical prescriptions

Educational uses

Helps both teachers and students



So How Does SR Work?

A very simple model demonstrates how SR works.



Approaches to SR

SR approaches are basically divided into three:

Acoustic Phonetic Approach (works on phonemes)

Pattern Recognition Approach (works on patterns)

Artificial Intelligence Approach (advanced functionality)



Acoustic Phonetic Approach

Requires knowledge of phonetics (the language of enunciation).

Recognizes phonemes, converts them to lexicals, and matches them to words.
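
The phoneme-to-word step can be pictured with a small sketch. This is only an illustration: the pronunciation dictionary, the phoneme symbols, and the function name phonemes_to_words are assumptions made for the example and do not come from the slides.

```python
# Hypothetical pronunciation dictionary: phoneme sequence -> word.
# The phoneme symbols are illustrative and not taken from the slides.
PRONUNCIATIONS = {
    ("HH", "EY"): "hey",
    ("HH", "AW"): "how",
    ("AA", "R"): "are",
    ("Y", "UW"): "you",
}


def phonemes_to_words(phonemes, lexicon=PRONUNCIATIONS):
    """Greedily match the longest known phoneme sequence at each position."""
    words = []
    i = 0
    max_len = max(len(key) for key in lexicon)
    while i < len(phonemes):
        for length in range(min(max_len, len(phonemes) - i), 0, -1):
            chunk = tuple(phonemes[i:i + length])
            if chunk in lexicon:
                words.append(lexicon[chunk])
                i += length
                break
        else:
            # No dictionary entry starts here: mark it and move on.
            words.append("<unk>")
            i += 1
    return words


print(phonemes_to_words(["HH", "AW", "AA", "R", "Y", "UW"]))
```

A real recognizer would first have to extract the phonemes from audio, which is the hard part and is not shown here.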



Pattern Recognition

Works in two phases:

Pattern training

Comparison

Pattern training is modeled by a finite state machine (FSM).

In simple words, speech templates are created and stored.

The speaker's words are compared with the stored templates and verified.

If matched: accept

If not matched: reject
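
The two phases can be sketched as follows. This is a minimal illustration that assumes each utterance has already been reduced to a fixed-length feature vector. The feature values, the Euclidean distance measure, and the threshold are all assumptions, not details from the slides.

```python
import math


def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def train(samples):
    """Pattern training: average the samples of each word into one template."""
    templates = {}
    for word, vectors in samples.items():
        n = len(vectors)
        templates[word] = [sum(col) / n for col in zip(*vectors)]
    return templates


def recognize(features, templates, threshold=1.0):
    """Comparison: return the closest word, or None (reject) if too far."""
    word, best = min(
        ((w, distance(features, t)) for w, t in templates.items()),
        key=lambda pair: pair[1],
    )
    return word if best <= threshold else None


# Made-up two-dimensional features for two words.
templates = train({
    "yes": [[1.0, 0.2], [0.8, 0.0]],
    "no": [[0.0, 1.0], [0.2, 0.8]],
})
print(recognize([0.9, 0.2], templates))  # close to the "yes" template
print(recognize([5.0, 5.0], templates))  # far from every template
```

The threshold is what turns "closest template" into an accept or reject decision.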



Pattern Recognition (contd.)

Model:

Problem: different accents can cause mismatches.



Artificial Intelligence Approach

This approach overcomes some disadvantages of the template-based approach.

Maintains a knowledge base

Automatically corrects words

E.g., "What your name?" (error)

It overcomes some problems of speaker variance and other constraints of speech

E.g., culture, accent, etc.
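
One very rough way to picture automatic correction against a knowledge base is a closest-match lookup. The sketch below uses plain string similarity from Python's standard difflib module as a stand-in. The phrase list and the function name correct are assumptions, and this is not the AI technique the slides refer to.

```python
import difflib

# Hypothetical knowledge base of phrases the system expects to hear.
KNOWLEDGE_BASE = ["what is your name", "how are you", "where do you live"]


def correct(recognized, knowledge_base=KNOWLEDGE_BASE, cutoff=0.6):
    """Return the closest known phrase, or the input unchanged if none is close."""
    matches = difflib.get_close_matches(
        recognized.lower(), knowledge_base, n=1, cutoff=cutoff
    )
    return matches[0] if matches else recognized


print(correct("What your name"))
```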



Speech Recognition Model



Finite State Machine Based SR Model

It is a very simple approach.

Two main stages are present:

The acceptor

The transducer

The acceptor is used for accepting or rejecting lexicals.

The transducer handles the transition from one set of words to another as the input grows.
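
A minimal sketch of the two stages, under stated assumptions: the transition table, the state names, and the command vocabulary are invented for illustration, and expected_words is only one reading of "transition from one set of words to another".

```python
# Hypothetical transition table: state -> {word: next state}.
TRANSITIONS = {
    "start": {"open": "verb", "close": "verb"},
    "verb": {"file": "done", "window": "done"},
}
ACCEPTING = {"done"}


def accepts(words):
    """Acceptor: True if the word sequence ends in an accepting state."""
    state = "start"
    for word in words:
        state = TRANSITIONS.get(state, {}).get(word)
        if state is None:
            return False  # no transition for this word: reject
    return state in ACCEPTING


def expected_words(words):
    """Transducer-like view: the set of words allowed next, given the input so far."""
    state = "start"
    for word in words:
        state = TRANSITIONS.get(state, {}).get(word)
        if state is None:
            return set()
    return set(TRANSITIONS.get(state, {}))


print(accepts(["open", "file"]))
print(sorted(expected_words(["open"])))
```

As the input grows from nothing to "open", the set of candidate words moves from the first row of the table to the second.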



FSM-Based SR Model (contd.)

What if a match is ambiguous (two words sound the same)?

"Know" and "no" sound the same. How do we overcome this problem?

Solution: attach weights to the candidate words to improve recognition.
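
The idea of weights can be sketched like this. The weight values and the choice to condition on the previous word are assumptions made for the example. The slides do not say how the weights are defined or learned.

```python
# Hypothetical weights: how likely each homophone is after a given previous word.
WEIGHTS = {
    ("i", "know"): 0.9,
    ("i", "no"): 0.1,
    ("say", "know"): 0.2,
    ("say", "no"): 0.8,
}


def pick_word(previous, candidates, weights=WEIGHTS):
    """Choose the candidate with the highest weight after `previous`."""
    return max(candidates, key=lambda word: weights.get((previous, word), 0.0))


print(pick_word("i", ["know", "no"]))
print(pick_word("say", ["know", "no"]))
```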



Performance of Speech-Based Systems

The performance of speech-based systems is measured in two main ways:

WER (Word Error Rate)

WRR (Word Recognition Rate)

WER indicates how many of the spoken words are recognized incorrectly.

WRR indicates how many are recognized correctly. It is commonly defined as 1 - WER.
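
Both measures can be computed from a reference transcript and the recognizer's output. The sketch below uses the standard word-level edit distance, where WER = (substitutions + deletions + insertions) / number of reference words. The function name and the example sentences are assumptions.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # previous[j] holds the edit distance between ref[:i-1] and hyp[:j].
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # match or substitution
            ))
        previous = current
    return previous[-1] / len(ref)


wer = word_error_rate("hey how are you", "hey how you")
print(f"WER = {wer:.2f}, WRR = {1 - wer:.2f}")
```

Here one of four reference words is dropped, so the WER is 0.25 and the WRR is 0.75.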



So What Is New in This?

There is nothing new in this: speech recognition has developed from almost nothing to almost everything by now.

Everyone is attracted to it and is developing lots of apps on it.

This causes an integrity issue:

All apps are built from scratch.

There can be app conflicts (two different apps on the same computer).

Both apps wait for the same word and conflict on the same machine.

Licensing on these machines: a normal developer can do nothing but sit silently until an SDK comes. Yuck!



How Can We Solve This?

We combine both of these approaches:

Allow developers to build from scratch (this makes them independent).

Provide a platform where they can work together.

So, why not build a framework where users can build things easily and from scratch?

We don't lose anything, and we resolve the integrity issues.



How Does This Framework Look?

Notice how the integrity issue is resolved and apps are developed easily.
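
The framework diagram did not survive in this copy, so the sketch below is only a guess at the core idea: a shared registry in which each spoken command has a single owner, so two apps cannot both wait for the same word. The class name SpeechFramework and its methods are hypothetical and are not the author's design.

```python
class SpeechFramework:
    """Hypothetical shared registry: one owner per spoken command."""

    def __init__(self):
        self._handlers = {}  # command word -> (app name, callback)

    def register(self, app, word, callback):
        """Claim `word` for `app`; refuse if another app already owns it."""
        key = word.lower()
        owner = self._handlers.get(key)
        if owner is not None and owner[0] != app:
            raise ValueError(f"'{word}' is already registered by {owner[0]}")
        self._handlers[key] = (app, callback)

    def dispatch(self, word):
        """Route a recognized word to its single owner, if any."""
        owner = self._handlers.get(word.lower())
        if owner is None:
            return None
        return owner[1](word)


framework = SpeechFramework()
framework.register("player", "play", lambda w: "player: playing")
try:
    framework.register("game", "play", lambda w: "game: starting")
except ValueError as error:
    print(error)  # the conflict is reported at registration time
print(framework.dispatch("play"))
```

Detecting the clash when an app registers, rather than when the word is spoken, is a design choice of this sketch.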



Results

Notice how the type of speech affects the accuracy:

Type of Speech              Accuracy
Normal Dictionary Speech    50-90%
Choices (Customized)        90%
Choices (General)           80%
Individual Letters          30%
Customized Phonetics        70%



Conclusion

Speech is a natural way of communication.

Numerous applications of speech exist.

There are various approaches, and each has its own pros and cons.

FSMs are one way to make the job easier and better.

There are lots of problems:

Recognition problems

Integrity issues

So, we need a platform-independent framework that can solve these issues and make the life of speech developers easier.



References

[1] C.J. Weinstein, Military and government applications of human-machine communication by voice. In Proceedings of the Natl. Acad. Sci. USA, Volume 92, pp. 10011-10016, October 1995.

[2] Dat Tat Tran, Fuzzy Approaches to Speech and Speaker Recognition. A thesis submitted for the degree of Doctor of Philosophy of the University of Canberra.

[3] R.K. Moore, Twenty things we still don't know about speech. Proc. CRIM/FORWISS Workshop on Progress and Prospects of Speech Research and Technology, 1994.

[4] Sadaoki Furui, 50 Years of Progress in Speech and Speaker Recognition Research. ECTI Transactions on Computer and Information Technology, Vol. 1, No. 2, November 2005.

[5] Willie Walker et al., Sphinx-4: A Flexible Open Source Framework for Speech Recognition.
http://cmusphinx.sourceforge.net/sphinx4

[6] M.A. Anusuya, Speech Recognition by Machine: A Review. In (IJCSIS) International Journal of Computer Science and Information Security, Vol. 6, No. 3, 2009.
http://arxiv.org/ftp/arxiv/papers/1001/1001.2267.pdf

[7] Neann Mathai, A Literature Survey of Speech Recognition and Hidden Markov Models.
http://shenzi.cs.uct.ac.za/~honsproj/cgi-bin/view/2009/katz_mathai_sobey.zip/Speech_Katz_Mathai_Sobey/Downloads/NeannMathaiLiteratureSurvey.pdf

[8] Pavel Stemberk, Speech recognition based on FSM and HTK toolkits.
http://stembep.wz.cz/!papers/Zilina-dt04/zildt04.pdf

[9] Steve Renals, Speech recognition.
http://dsp-book.narod.ru/rec-notes.pdf

