speech recognition
Introduction
Speech recognition technology has recently reached a level of performance and robustness that lets a user communicate with a machine, or through it with another user, by talking.
Speech recognition is the process of decoding an acoustic speech signal, captured by a microphone or telephone, into a set of words.
The speech is recognized word by word until the whole utterance has been decoded.
Types of SR
There are two main types of speaker models: speaker independent and speaker dependent.
Speaker independent models recognize the speech patterns of a large group of people.
Speaker dependent models recognize speech patterns from only one person. Both models use mathematical and statistical formulas to yield the best word match for speech. A third variation of speaker models, called speaker adaptive, is now emerging.
Speaker adaptive systems usually begin with a speaker independent model and adjust it more closely to each individual during a brief training period.
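As a rough illustration of the speaker adaptive idea, the sketch below starts from speaker independent averages and pulls them toward one speaker's training samples. Everything in it is an assumption made for illustration: the function name adapt_means, the representation of a model as one mean feature vector per sound unit, the numbers, and the fixed interpolation weight. Real systems use richer statistical models.

```python
def adapt_means(independent_means, speaker_samples, weight=0.5):
    """Shift each unit's mean toward the average of this speaker's samples.

    independent_means: {unit: [feature, ...]} from the speaker independent model.
    speaker_samples:   {unit: [[feature, ...], ...]} collected during training.
    weight:            hypothetical interpolation factor (0 = ignore the speaker,
                       1 = use only the speaker's data).
    """
    adapted = {}
    for unit, mean in independent_means.items():
        samples = speaker_samples.get(unit, [])
        if not samples:
            # No training data for this unit: keep the independent model.
            adapted[unit] = list(mean)
            continue
        speaker_mean = [
            sum(sample[d] for sample in samples) / len(samples)
            for d in range(len(mean))
        ]
        adapted[unit] = [
            (1 - weight) * m + weight * sm for m, sm in zip(mean, speaker_mean)
        ]
    return adapted


# Made-up feature values, for illustration only.
independent = {"ah": [1.0, 2.0], "s": [5.0, 5.0]}
training = {"ah": [[2.0, 4.0], [4.0, 4.0]]}
adapted = adapt_means(independent, training)
```

Units the speaker never produced during the brief training period simply keep their speaker independent values.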
Speech produces a sound pressure wave which forms an acoustic signal.
The microphone receives the acoustic signal and converts it to an analogue signal.
To store the analogue signal, it must be converted to a digital signal.
A speech recognizer tries to transform a digitally encoded acoustic signal in a natural language
into text in that language.
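The analogue-to-digital step can be sketched as sampling the signal at regular intervals and rounding each sample to an integer level. This is a minimal sketch, not a description of any particular sound card: the function name digitize, the 8000 Hz sample rate, the 8-bit depth, and the 440 Hz test tone are all assumptions chosen for illustration.

```python
import math


def digitize(analog, duration_s, sample_rate_hz=8000, bits=8):
    """Sample a continuous signal and quantize each sample to a signed integer.

    analog: a function of time (seconds) returning a value in [-1.0, 1.0].
    """
    levels = 2 ** (bits - 1) - 1  # 127 for 8-bit signed samples
    n_samples = round(duration_s * sample_rate_hz)
    samples = []
    for n in range(n_samples):
        t = n / sample_rate_hz                   # time of the n-th sample
        value = max(-1.0, min(1.0, analog(t)))   # clip to the valid range
        samples.append(round(value * levels))    # quantize to an integer level
    return samples


def tone(t):
    """Stand-in for the analogue microphone signal: a 440 Hz sine wave."""
    return math.sin(2 * math.pi * 440 * t)


pcm = digitize(tone, duration_s=0.01)
```

A higher sample rate captures higher frequencies, and more bits per sample reduce the rounding error introduced by quantization.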
How does it work?
Speech Waveform/Spectrogram
The spectrogram is an alternative way to characterize speech.
In the waveform, the louder the sound, the greater the amplitude on the y-axis.
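[Figure: waveform and spectrogram of the words "speech lab", labeled by sound (s p ee ch l a b); frequency axis in Hz, time axis in s]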
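To make the spectrogram idea concrete, the sketch below cuts a digital signal into short overlapping frames and measures how much energy each frame has at each frequency. It is an illustrative, unoptimized version using a plain discrete Fourier transform. The function name, the frame size, the hop, the Hann window, and the 1000 Hz test tone are assumptions, not details from the slides.

```python
import cmath
import math


def spectrogram(samples, frame_size=64, hop=32):
    """Return one list of magnitudes (bins 0..frame_size//2) per frame."""
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = samples[start:start + frame_size]
        # A Hann window tapers the frame edges to reduce spectral leakage.
        windowed = [
            x * (0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_size - 1)))
            for n, x in enumerate(frame)
        ]
        magnitudes = []
        for k in range(frame_size // 2 + 1):
            # Naive DFT for bin k; fine for a short demo, slow for real audio.
            acc = sum(
                x * cmath.exp(-2j * math.pi * k * n / frame_size)
                for n, x in enumerate(windowed)
            )
            magnitudes.append(abs(acc))
        frames.append(magnitudes)
    return frames


sample_rate = 8000  # assumed sample rate in Hz
signal = [math.sin(2 * math.pi * 1000 * n / sample_rate) for n in range(256)]
spec = spectrogram(signal)

# Locate the strongest frequency in the first frame.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * sample_rate / 64
```

Each inner list is one vertical slice of the spectrogram: time runs across the frames, frequency runs across the bins, and the magnitude is what a plot would show as darkness or color.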
It is important to understand that this audio stream is rarely pristine.
It contains not only the speech data (what was said) but also background noise.
This noise can interfere with the recognition process, and the speech engine must handle (and possibly even adapt to) the environment within which the audio is spoken.
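One simple way to picture adapting to the environment is to estimate the background noise level from the start of the recording and treat only clearly louder frames as speech. The sketch below does that with frame energies. It is a toy under stated assumptions: the function names, the frame size, the number of leading frames assumed to contain only noise, and the threshold factor are all hypothetical, and real engines use more sophisticated noise handling.

```python
def frame_energies(samples, frame_size):
    """Average squared amplitude of each non-overlapping frame."""
    return [
        sum(x * x for x in samples[i:i + frame_size]) / frame_size
        for i in range(0, len(samples) - frame_size + 1, frame_size)
    ]


def detect_speech(samples, frame_size=4, noise_frames=2, factor=4.0):
    """Flag frames whose energy clearly exceeds the estimated noise floor.

    Assumes the first `noise_frames` frames contain background noise only.
    """
    energies = frame_energies(samples, frame_size)
    noise_floor = sum(energies[:noise_frames]) / noise_frames
    return [energy > factor * noise_floor for energy in energies]


# Made-up samples: quiet noise, then a loud burst, then quiet noise again.
quiet = [0.1, -0.1, 0.1, -0.1]
loud = [1.0, -1.0, 1.0, -1.0]
flags = detect_speech(quiet * 2 + loud * 2 + quiet)
```

Because the threshold is derived from the recording itself, the same code behaves sensibly in both a quiet room and a noisier one, as long as the leading frames really are noise.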
Audio I/O
Once the speech data is in the proper format, the engine searches for the best match.
It does this by taking into consideration the words and phrases it knows about (the active grammars), along with its knowledge of the environment in which it is operating.
The knowledge of the environment is provided in the form of an acoustic model.
Once it identifies the most likely match for what was said, it returns what it recognized as a text string.
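The search for the best match can be sketched as picking the highest-scoring phrase among those the active grammar allows. In the sketch below the scores stand in for what an acoustic model would produce. The function name best_match, the phrases, and the log-score values are made up for illustration.

```python
def best_match(acoustic_log_scores, active_grammar):
    """Return the best-scoring phrase that the active grammar allows.

    acoustic_log_scores: {phrase: log-likelihood} as an acoustic model might
                         produce (hypothetical values here).
    active_grammar:      set of phrases the application currently accepts.
    """
    candidates = {
        phrase: score
        for phrase, score in acoustic_log_scores.items()
        if phrase in active_grammar
    }
    if not candidates:
        return None  # nothing the engine knows about matched
    return max(candidates, key=candidates.get)


scores = {"call home": -12.0, "call Rome": -11.5, "fall foam": -10.0}
grammar = {"call home", "call Rome"}
recognized = best_match(scores, grammar)  # returned as a text string
```

Note that "fall foam" has the best raw score but is never returned, because it is not in the active grammar. This is how the grammar narrows the search.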
Acoustic + Grammar
About SR Engine
SR requires a software application "engine" with logic built in to decipher and act on the spoken word.
Sound Card: converts the analog signal from the microphone to a digital signal.
Function of SR Engine: the SR engine converts this digital signal to phonemes, and the phonemes to words.
Recognition Process Flow Summary
Step 1: User Input. The system captures the user's voice in the form of an analog acoustic signal.
Step 2: Digitization. The analog acoustic signal is digitized.
Step 3: Phonetic Breakdown. The signal is broken into phonemes.
Step 4: Statistical Modeling. The phonemes are mapped to their phonetic representation using a statistical model.
Step 5: Matching. According to the grammar, the phonetic representation, and the dictionary, the system returns an n-best list (i.e., candidate words, each with a confidence score).
Grammar: the set of words or phrases that constrains the range of input or output in the voice application.
Dictionary: the table that maps phonetic representations to words (e.g., "thu" and "thee" both map to "the").
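The matching step can be sketched as a dictionary lookup followed by a grammar filter and a ranking by confidence. This is a minimal illustration only: the dictionary entries beyond "thu" and "thee", the grammar, the confidence values, and the function name n_best are hypothetical.

```python
# Hypothetical dictionary: phonetic representation -> word.
DICTIONARY = {"thu": "the", "thee": "the", "kat": "cat", "kot": "cot"}

# Hypothetical grammar: the words this voice application accepts.
GRAMMAR = {"the", "cat"}


def n_best(hypotheses, dictionary, grammar, n=3):
    """Return up to n (word, confidence) pairs, best first.

    hypotheses: list of (phonetic_representation, confidence) pairs, as the
                statistical modeling step might produce.
    """
    best = {}
    for phonetic, confidence in hypotheses:
        word = dictionary.get(phonetic)
        if word is None or word not in grammar:
            continue  # unknown pronunciation, or word not allowed by the grammar
        # Several pronunciations can map to one word; keep the best score.
        best[word] = max(confidence, best.get(word, 0.0))
    ranked = sorted(best.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]


hypotheses = [("thu", 0.6), ("thee", 0.8), ("kat", 0.7), ("kot", 0.9), ("zzz", 0.5)]
result = n_best(hypotheses, DICTIONARY, GRAMMAR)
```

Here "kot" scores highest acoustically but is dropped because "cot" is outside the grammar, and the two pronunciations of "the" collapse into a single entry.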
Challenges and Difficulties of SR
Speech recognition is still a very difficult problem. The main problems are the following.
Speaker Variability: two speakers, or even the same speaker, will pronounce the same word differently.
Channel Variability: the quality and position of the microphone and the background environment will affect the output.
Current Software Options for PC
Dragon Systems – NaturallySpeaking
Philips – FreeSpeech
IBM – ViaVoice
Lernout & Hauspie – Voice Xpress