Post on 21-Dec-2015
Spik v1.0
Voice Commands Execution in a Windows Environment
Dekel Abelson, Eliran Dahan
Instructor: Ari Todtfeld
Objectives
• Analysis and exploration of voice-recognition systems, their abilities and their limitations
• Understanding the Windows architecture and programming concepts
• Development and implementation of a tool that enables users to execute voice commands in a Windows environment, including the restructuring of the tool's graphical user interface (GUI)
• Learning the Microsoft Speech SDK 5.1 (Software Development Kit) and its speech engine
Project skills
• C++ programming skills
• XML (Extensible Markup Language) programming skills
• Programming in a Windows environment, including API (Application Programming Interface) commands
Brief history
• 1994 - Release of Dragon Systems' “DragonDictate” for Windows 1.0, using discrete speech recognition technology
• 1996 - Introduction of IBM's “MedSpeak”, the first continuous speech recognition software
• 1997 - Dragon Systems' “NaturallySpeaking”, the first general-purpose continuous speech software program. Two months later IBM released its “ViaVoice”
• 2005 - Thanks to improvements in PC processing speed and in the algorithms used, several speech recognition programs are now on the market.
Voice recognition
• Voice recognition follows these steps:
1. Spoken words enter a microphone
2. Audio is processed by the computer's sound card
3. The software discriminates between lower-frequency vowels and higher-frequency consonants and compares the results with phonemes, the smallest building blocks of speech. The software then compares the results to groups of phonemes, and then to actual words, determining the most likely match
4. The sentence is transferred to a word processing application
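The "most likely match" idea in step 3 can be illustrated with a toy sketch. It assumes the heard audio has already been turned into a sequence of phoneme symbols and picks the vocabulary word whose phonemes differ least (by edit distance). The names `Phonemes`, `editDistance` and `mostLikelyWord` and the phoneme symbols are hypothetical; a real speech engine uses statistical models rather than this simple comparison.

```cpp
#include <algorithm>
#include <cstddef>
#include <limits>
#include <string>
#include <utility>
#include <vector>

// A phoneme sequence, e.g. {"n", "ow", "t"}; the symbols are illustrative only.
using Phonemes = std::vector<std::string>;

// Levenshtein edit distance between two phoneme sequences.
std::size_t editDistance(const Phonemes& a, const Phonemes& b) {
    std::vector<std::size_t> prev(b.size() + 1), curr(b.size() + 1);
    for (std::size_t j = 0; j <= b.size(); ++j) prev[j] = j;
    for (std::size_t i = 1; i <= a.size(); ++i) {
        curr[0] = i;
        for (std::size_t j = 1; j <= b.size(); ++j) {
            std::size_t substitution = prev[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1);
            curr[j] = std::min({prev[j] + 1, curr[j - 1] + 1, substitution});
        }
        std::swap(prev, curr);
    }
    return prev[b.size()];
}

// Returns the vocabulary word whose phonemes are closest to what was heard.
// Returns an empty string if the vocabulary is empty.
std::string mostLikelyWord(
        const Phonemes& heard,
        const std::vector<std::pair<std::string, Phonemes>>& vocabulary) {
    std::string best;
    std::size_t bestDistance = std::numeric_limits<std::size_t>::max();
    for (const auto& entry : vocabulary) {
        std::size_t d = editDistance(heard, entry.second);
        if (d < bestDistance) {
            bestDistance = d;
            best = entry.first;
        }
    }
    return best;
}
```

With a vocabulary holding "calculator" and "notepad", a heard sequence that misses one phoneme of "notepad" would still resolve to "notepad", because its distance to that entry is the smallest.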
Architecture
Voice command by the user
SAPI 5.1 Speech Application Program Interface
Commands execution using API functions
Processing the recognized commands by C++/XML code
GUI
• Executable file - spik.exe
• The GUI - A window that receives the voice commands from the user. This GUI was built in C++ using the basic “Window” class.
SAPI 5.1
• The SAPI provides a high-level interface between the application and the speech engine
• The TTS (Text-To-Speech) system synthesizes text strings and files into spoken audio
• Speech recognizers convert human spoken audio into readable text strings
Processing
Main function contains the infinite loop waiting for messages to process
Main window procedure that handles the messages to the window
Execute commands that have been identified by the speech engine
Microsoft Speech Engine
API functions
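The processing flow above (a main loop that waits for messages, a window procedure that handles them, and execution of recognized commands) can be sketched as a portable simulation. This is not the project's code: `MessageKind`, `Message` and `CommandWindow` are hypothetical names, and the loop drains a prepared queue instead of blocking on `GetMessage()` and `DispatchMessage()` as a real Win32 program would.

```cpp
#include <functional>
#include <map>
#include <queue>
#include <string>
#include <utility>

// Hypothetical message kinds; a real Win32 program receives numeric message IDs.
enum class MessageKind { Recognition, Quit };

struct Message {
    MessageKind kind;
    std::string text;  // recognized phrase, used only by Recognition messages
};

class CommandWindow {
public:
    // Registers the action to run when the given phrase is recognized.
    void onCommand(const std::string& phrase, std::function<void()> action) {
        actions_[phrase] = std::move(action);
    }

    // Plays the role of the window procedure: handles one message.
    // Returns false when the loop should stop.
    bool handle(const Message& msg) {
        if (msg.kind == MessageKind::Quit) return false;
        auto it = actions_.find(msg.text);
        if (it != actions_.end()) it->second();  // unknown phrases are ignored
        return true;
    }

    // Plays the role of the main loop: processes messages until a Quit
    // message arrives or the queue is empty.
    void run(std::queue<Message>& pending) {
        while (!pending.empty()) {
            Message msg = pending.front();
            pending.pop();
            if (!handle(msg)) break;
        }
    }

private:
    std::map<std::string, std::function<void()>> actions_;
};
```

Keeping the phrase-to-action table separate from the loop means new voice commands can be registered without touching the message handling itself.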
Commands Execution
• The Windows API is a set of application programming interfaces available in the Microsoft Windows operating systems that enables developers to create software
• The API consists of C functions implemented in dynamically linked libraries (DLLs), mainly in the core DLLs - kernel32.dll, user32.dll and gdi32.dll
• Main API functions we have used:
  CreateProcess() - runs executable files
  WinExec() - runs the specified application
  ShellExecute() - opens URLs and files
  ShowWindow() - sets the specified window's show state
  SendMessage() - sends the specified message to a window or windows
  keybd_event() - synthesizes a keystroke
  PostMessage() - places (posts) a message in the message queue associated with the thread that created the specified window
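As a rough sketch of how two of the functions listed above might be used, the code below routes a recognized phrase either to `CreateProcessA()` (for an executable) or to `ShellExecuteA()` (for a URL). The phrases, the targets, the URL and the names `LaunchApi`, `LaunchPlan`, `planFor` and `execute` are assumptions made for illustration, not taken from the project. The Windows calls are compiled only on Windows; elsewhere `execute` does nothing and returns false.

```cpp
#include <string>

#ifdef _WIN32
#include <windows.h>
#include <shellapi.h>
#endif

// Which API call a command is routed to (a hypothetical split, for illustration).
enum class LaunchApi { None, CreateProcess, ShellExecute };

struct LaunchPlan {
    LaunchApi api;
    std::string target;  // executable name or URL
};

// Maps a recognized phrase to a launch plan. Phrases and targets are assumptions.
LaunchPlan planFor(const std::string& phrase) {
    if (phrase == "open calculator") return {LaunchApi::CreateProcess, "calc.exe"};
    if (phrase == "open notepad") return {LaunchApi::CreateProcess, "notepad.exe"};
    if (phrase == "search google") return {LaunchApi::ShellExecute, "https://www.google.com"};
    return {LaunchApi::None, ""};
}

// Carries out the plan. Returns true if the launch was started.
// On non-Windows builds nothing is launched and the function returns false.
bool execute(const LaunchPlan& plan) {
#ifdef _WIN32
    if (plan.api == LaunchApi::CreateProcess) {
        std::string commandLine = plan.target;  // CreateProcessA takes a writable buffer
        STARTUPINFOA si = {};
        si.cb = sizeof(si);
        PROCESS_INFORMATION pi = {};
        if (!CreateProcessA(nullptr, &commandLine[0], nullptr, nullptr, FALSE, 0,
                            nullptr, nullptr, &si, &pi)) {
            return false;
        }
        CloseHandle(pi.hThread);  // the handles are not needed after launch
        CloseHandle(pi.hProcess);
        return true;
    }
    if (plan.api == LaunchApi::ShellExecute) {
        HINSTANCE result = ShellExecuteA(nullptr, "open", plan.target.c_str(),
                                         nullptr, nullptr, SW_SHOWNORMAL);
        return reinterpret_cast<INT_PTR>(result) > 32;  // values above 32 mean success
    }
    return false;
#else
    (void)plan;
    return false;
#endif
}
```

A caller would write something like `execute(planFor("open calculator"))`. Separating the lookup from the launch keeps the phrase table easy to check without starting any program.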
The Code
• Source code files in C++
• Header files of the program
• Executable program file
• Text file in XML format, used by the speech recognition engine
• Text file containing strings, used by the program
• Compiled file, used by the speech recognition engine
• Header files of the speech recognition engine
Adaptation & Training
• The speech recognition engine adapts itself to the user's voice, vocabulary and speech style in order to improve speech recognition accuracy
• After adaptation, recognition errors drop to about ¼ of their initial number, so accuracy rises
• As more training is done, accuracy rises to around 95%.
Voice command example
• Calculator usage:
  Say the voice command “Open Calculator” to run the calc.exe program
  Say a simple arithmetic exercise, and then say “Equal” or “Result” to show the solution
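One way the spoken exercise could reach the calculator is by translating each recognized word into a key character, which could then be sent as a keystroke (for example with `keybd_event()`). The slides do not say how Spik does this, so the sketch below is an assumption; the word table and the name `toCalculatorKeys` are hypothetical, and the input is assumed to be lowercase words separated by spaces.

```cpp
#include <map>
#include <sstream>
#include <string>

// Translates a spoken exercise into calculator key characters.
// Words that are not in the table (such as "by") are skipped.
std::string toCalculatorKeys(const std::string& phrase) {
    static const std::map<std::string, char> keys = {
        {"zero", '0'}, {"one", '1'}, {"two", '2'}, {"three", '3'}, {"four", '4'},
        {"five", '5'}, {"six", '6'}, {"seven", '7'}, {"eight", '8'}, {"nine", '9'},
        {"plus", '+'}, {"minus", '-'}, {"times", '*'}, {"divided", '/'},
        {"equal", '='}, {"result", '='}
    };
    std::istringstream words(phrase);
    std::string word;
    std::string output;
    while (words >> word) {
        auto it = keys.find(word);
        if (it != keys.end()) output += it->second;
    }
    return output;
}
```

Here `toCalculatorKeys("five plus three equal")` returns `"5+3="`, and both “equal” and “result” map to the same key.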
Voice command example
• Run programs - notepad, command line
• Internet usage - search google
• Windows navigation - my documents, system properties, start menu, screen saver
Added value of the project
• Advanced versions based on Spik v1.0 will be a helpful tool for people with physical disabilities who use the computer and the web
Future Development
• Advanced OS navigation in order to eliminate the use of the keyboard
• Adding Speech-to-Text capabilities
• Improved GUI to let users enter their own voice commands
Q&A