7/30/2019 A Framework for Speech Recognition Development
A Framework for Speech Application Development
By
Jason Elroy Martis
NMAMIT Nitte
-
Agenda
Introduction
Applications
Working and Types
FSM
Problems
Proposed Solution
Results
Conclusion
References
-
Normal Ways of Interaction
Normal interaction works in two basic forms:
Language
Meta-language (body language)
Both forms occur simultaneously, which makes the interaction experience richer.
-
Language Is Communicated Through Speech
Language is communicated in the form of speech.
What is speech?
Speech is the vocalized form of human communication.
It is based upon the syntactic combination of lexicals and names that are drawn from vocabularies.
It is the most natural way in which we interact.
Example: "Hey! How are you?"
-
Hence Speech Recognition (SR)
Speech recognition is the process of converting a speech signal to a sequence of lexicals by means of an algorithm.
That is, we instruct the computer through speech signals and the computer recognizes them.
Is this necessary?
Of course. It improves our natural way of communicating with the electronic or virtual world.
-
Applications of SR
There are innumerable applications. Some are:
Military uses: remote command and control centers (planes, satellites, etc.)
Health care: automated medical prescriptions
Educational uses: helps both teachers and students
-
So How Does SR Work?
A very simple model demonstrates how SR works.
-
Approaches to SR
SR approaches are broadly divided into three:
Acoustic-phonetic approach (works on phonemes)
Pattern recognition approach (works on patterns)
Artificial intelligence approach (advanced functionality)
-
Acoustic-Phonetic Approach
Requires knowledge of phonetics (the language of enunciation).
Recognize phonemes, convert them to lexicals, and match these to words.
-
Pattern Recognition
Works in two phases:
Pattern training
Comparison
Pattern training is modeled by an FSM (finite state machine).
In simple words, speech templates are created and stored.
The speaker's words and the stored templates are compared and verified.
If matched: accept
If not matched: reject
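
The two phases above (training, then comparison with accept or reject) can be sketched in a few lines. This is only an illustration, not the recognizer used in the talk: the class name TemplateRecognizer, the use of dynamic time warping (DTW) as the comparison measure, the threshold value, and the toy one-number-per-frame features are all assumptions made for the example.

```python
# Hypothetical sketch of template-based recognition.
# Assumption: each utterance is a list of floats (one value per frame);
# real systems would use a feature vector per frame.

def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D sequences."""
    inf = float("inf")
    # cost[i][j] = best alignment cost of a[:i] against b[:j]
    cost = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j],      # stretch b
                                    cost[i][j - 1],      # stretch a
                                    cost[i - 1][j - 1])  # advance both
    return cost[len(a)][len(b)]


class TemplateRecognizer:
    def __init__(self, threshold):
        self.threshold = threshold  # maximum distance that still counts as a match
        self.templates = {}         # word -> stored feature sequence

    def train(self, word, features):
        """Phase 1: create and store a speech template."""
        self.templates[word] = list(features)

    def recognize(self, features):
        """Phase 2: compare with every template; return a word or None (reject)."""
        best_word, best_dist = None, float("inf")
        for word, template in self.templates.items():
            dist = dtw_distance(features, template)
            if dist < best_dist:
                best_word, best_dist = word, dist
        return best_word if best_dist <= self.threshold else None


recognizer = TemplateRecognizer(threshold=2.0)
recognizer.train("yes", [1.0, 3.0, 4.0, 3.0, 1.0])
recognizer.train("no", [5.0, 5.0, 2.0, 1.0])

accepted = recognizer.recognize([1.0, 3.0, 3.0, 4.0, 3.0, 1.0])  # slower "yes"
rejected = recognizer.recognize([9.0, 9.0, 9.0])                 # unlike any template
```

DTW is chosen here because it tolerates the same word being spoken faster or slower; a fixed threshold like the one above is also where accent differences would tend to cause false rejections.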
-
Pattern Recognition (Contd.)
Model:
Problems: Different accents can cause recognition errors.
-
Artificial Intelligence Approach
This approach overcomes some disadvantages of the template-based approach.
Maintains a knowledge base.
Automatically corrects words.
E.g., "What your name?" (error)
It overcomes some problems of speaker variance and other constraints of speech.
E.g., culture, accent, etc.
-
Speech Recognition Model
-
Finite State Machine Based SR Model
It is a very simple approach.
Two main stages are present:
The acceptor
The transducer
The acceptor is used for accepting or rejecting lexicals.
The transducer handles the transition from one set of words to another as the input grows.
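
A minimal sketch of the two stages, under stated assumptions: the Acceptor and Transducer classes, the toy phoneme symbols ("y", "eh", "s", "n", "ow"), and the two-word grammar are all invented for illustration and are not taken from the talk.

```python
# Hypothetical sketch of an FSM acceptor and transducer over toy phonemes.

class Acceptor:
    """Accepts or rejects a symbol sequence."""

    def __init__(self, transitions, start, finals):
        self.transitions = transitions  # {(state, symbol): next_state}
        self.start = start
        self.finals = finals

    def accepts(self, symbols):
        state = self.start
        for symbol in symbols:
            key = (state, symbol)
            if key not in self.transitions:
                return False  # no transition defined: reject
            state = self.transitions[key]
        return state in self.finals


class Transducer:
    """Maps an input symbol sequence to output words as the input grows."""

    def __init__(self, transitions, start):
        self.transitions = transitions  # {(state, symbol): (next_state, output)}
        self.start = start

    def transduce(self, symbols):
        state, outputs = self.start, []
        for symbol in symbols:
            state, output = self.transitions[(state, symbol)]
            if output:  # an empty output means "word not finished yet"
                outputs.append(output)
        return outputs


# Toy grammar: "y eh s" -> "yes", "n ow" -> "no"; state 0 is a word boundary.
RULES = {
    (0, "y"): (1, ""), (1, "eh"): (2, ""), (2, "s"): (0, "yes"),
    (0, "n"): (3, ""), (3, "ow"): (0, "no"),
}
acceptor = Acceptor({k: v[0] for k, v in RULES.items()}, start=0, finals={0})
transducer = Transducer(RULES, start=0)

phonemes = ["y", "eh", "s", "n", "ow"]
words = transducer.transduce(phonemes) if acceptor.accepts(phonemes) else []
```

Running the acceptor first means the transducer only ever sees sequences for which every transition is defined.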
-
FSM-Based SR Model (Contd.)
What if a match is ambiguous because two words sound the same?
"Know" and "no" sound the same. How do we overcome this problem?
Solution: We can attach weights to the competing words to improve recognition.
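
One way to picture the weighting idea: give each competing word a weight that depends on the word before it, and pick the heaviest. The table of weights, the sound key "N OW", and the function pick_word below are all made up for this sketch; the talk does not say how its weights are obtained.

```python
# Hypothetical sketch: resolving homophones with context-dependent weights.
# All weights here are invented for illustration.

HOMOPHONES = {"N OW": ["know", "no"]}  # one sound, two candidate words

WEIGHTS = {
    ("i", "know"): 0.9, ("i", "no"): 0.1,
    ("say", "no"): 0.8, ("say", "know"): 0.2,
}
DEFAULT_WEIGHT = 0.5  # used when the context was never seen


def pick_word(previous_word, sound):
    """Return the candidate with the largest weight given the previous word."""
    candidates = HOMOPHONES[sound]
    return max(candidates,
               key=lambda word: WEIGHTS.get((previous_word, word), DEFAULT_WEIGHT))


after_i = pick_word("i", "N OW")      # "i" favours "know"
after_say = pick_word("say", "N OW")  # "say" favours "no"
```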
-
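Performance of Speech-Based Systems
The performance of a speech system is measured on two main bases:
WER (Word Error Rate)
WRR (Word Recognition Rate)
WER is the fraction of reference words recognized incorrectly: (substitutions + deletions + insertions) / number of words in the reference.
WRR is the complement of WER: WRR = 1 - WER.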
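
As a worked example, WER can be computed as the word-level edit distance (substitutions, deletions, insertions) between what was said and what was recognized, divided by the number of words said; WRR is then taken as 1 - WER. The function names below are chosen for this sketch only.

```python
# Sketch: WER as word-level edit distance over the reference length.

def word_error_rate(reference, hypothesis):
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # dist[i][j] = minimum edits to turn ref[:i] into hyp[:j]
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,                  # deletion
                             dist[i][j - 1] + 1,                  # insertion
                             dist[i - 1][j - 1] + substitution)   # match or substitution
    return dist[len(ref)][len(hyp)] / len(ref)


def word_recognition_rate(reference, hypothesis):
    return 1.0 - word_error_rate(reference, hypothesis)


# The recognizer dropped the word "are": 1 error out of 4 reference words.
wer = word_error_rate("hey how are you", "hey how you")        # 0.25
wrr = word_recognition_rate("hey how are you", "hey how you")  # 0.75
```

Note that WER can exceed 1 when the recognizer inserts many extra words, in which case WRR computed this way becomes negative.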
-
So What Is New in This?
There is nothing new in speech recognition itself: it has grown from almost nothing to being nearly everywhere.
Everyone is attracted to it and is developing lots of apps on it.
This causes an integrity issue:
All apps are built from scratch.
There can be app conflicts (two different apps on the same computer).
Both apps wait for the same word and conflict with each other on the same machine.
Licensing on these machines: an ordinary developer can do nothing but sit and wait until an SDK arrives. Yuck!
-
How Can We Solve This?
We combine both of these approaches:
Allow developers to build from scratch (this keeps them independent).
Provide a platform where they can work together.
So, why not build a framework where users can build things easily and also from scratch?
We don't lose anything, and we resolve the integrity issues.
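
To make the idea concrete, here is one possible shape for such a shared platform: every app registers its spoken commands with a single registry, so two apps can no longer silently wait for the same word. This is a guess at the idea, not the framework presented in the talk; SpeechFramework, CommandConflict, register, and dispatch are all hypothetical names.

```python
# Hypothetical sketch of a shared command registry; not the talk's framework.

class CommandConflict(Exception):
    """Raised when two apps try to claim the same spoken phrase."""


class SpeechFramework:
    def __init__(self):
        self._handlers = {}  # phrase -> (app_name, callback)

    def register(self, app_name, phrase, callback):
        key = phrase.lower()
        if key in self._handlers:
            owner = self._handlers[key][0]
            raise CommandConflict(f"'{phrase}' is already claimed by {owner}")
        self._handlers[key] = (app_name, callback)

    def dispatch(self, phrase):
        """Route a recognized phrase to its owning app; None if nobody claimed it."""
        entry = self._handlers.get(phrase.lower())
        if entry is None:
            return None
        _app_name, callback = entry
        return callback()


framework = SpeechFramework()
framework.register("player", "play", lambda: "player: playing")
framework.register("editor", "save", lambda: "editor: saved")

result = framework.dispatch("Play")  # routed to the player app

try:
    framework.register("game", "play", lambda: "game: started")
    conflict_detected = False
except CommandConflict:
    conflict_detected = True  # the clash is reported at registration time
```

Each app still writes its own handlers from scratch; only the mapping from phrases to apps is shared.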
-
How Does This Framework Look?
Notice how the integrity issue is resolved and apps are developed easily.
-
Results
Notice how the type of speech affects the accuracy.
Type of Speech              Accuracy
Normal Dictionary Speech    50-90%
Choices (Customized)        90%
Choices (General)           80%
Individual Letters          30%
Customized Phonetics        70%
-
Conclusion
Speech is a natural way of communication.
Numerous applications of speech are present.
There are various approaches, each with its own pros and cons.
FSMs are one way to make the job easier and better.
There are lots of problems:
Recognition problems
Integrity issues
So, we need a platform-independent framework that can solve these issues and make the life of speech developers easier.
-
References
[1] C. J. Weinstein, Military and government applications of human-machine communication by voice. In Proceedings of the Natl. Acad. Sci. USA, Volume 92, 10011-10016, October 1995.
[2] Dat Tat Tran, Fuzzy Approaches to Speech and Speaker Recognition. A thesis submitted for the degree of Doctor of Philosophy of the University of Canberra.
[3] R. K. Moore, Twenty things we still don't know about speech. Proc. CRIM/FORWISS Workshop on Progress and Prospects of Speech Research and Technology, 1994.
[4] Sadaoki Furui, 50 years of Progress in Speech and Speaker Recognition Research. ECTI Transactions on Computer and Information Technology, Vol. 1, No. 2, November 2005.
[5] Willie Walker et al., Sphinx-4: A Flexible Open Source Framework for Speech Recognition.
http://cmusphinx.sourceforge.net/sphinx4
[6] M. A. Anusuya, Speech Recognition by Machine: A Review. In (IJCSIS) International Journal of Computer Science and Information Security, Vol. 6, No. 3, 2009.
http://arxiv.org/ftp/arxiv/papers/1001/1001.2267.pdf
[7] Neann Mathai, A Literature Survey of Speech Recognition and Hidden Markov Models.
http://shenzi.cs.uct.ac.za/~honsproj/cgi-bin/view/2009/katz_mathai_sobey.zip/Speech_Katz_Mathai_Sobey/Downloads/NeannMathaiLiteratureSurvey.pdf
[8] Pavel Stemberk, Speech recognition based on FSM and HTK toolkits.
http://stembep.wz.cz/!papers/Zilina-dt04/zildt04.pdf
[9] Steve Renals, Speech recognition.
http://dsp-book.narod.ru/rec-notes.pdf
-
http://www.animationfactory.com/