The Speech Speech
casey chesnut
brains-N-brawn.com
Madison .NET April 2007
Powerpoint
• Page Up
• Page Down
brains-N-brawn.com
• Pervasive Computing
– Tablet PC (MVP 03)
– Compact Framework (MVP 04)
– Advanced Web Services (MVP 05)
– Media Center (MVP 06)
– Speech
– Location Based Services
– Artificial Intelligence
– 3D
Outline
• Speech Overview
• Vista Speech Recognition
• SAPI 5.3 / System.Speech
• Speech Server 2007
Outline : Speech Overview
• Voice User Interface
• How does it work?
– Synthesis (TTS)
– Recognition (SR)
Overview
• Speech is just another presentation system
– Synthesis = Output to user
– Recognition = User input
• Voice User Interface (VUI)
VUI Modes
• Applications
– Multi-modal
– Voice-only
VUI Tips
• Don't replicate the touch-tone-based menu system
• Restrict options on the main (opening) menu to 4 or fewer
• Make sure your opening greeting is short
• Don't design the app solely for the new user
• Focus on task completion above all
• What can I say?
http://blogs.msdn.com/anandis_thoughts/archive/2006/02/08/528181.aspx
Speech Synthesis
• Text to Speech
– Dynamic
– Prompt database
How Synthesis Works
• Text parsing– Sentences, numbers, symbols, pauses
• Natural language processing– Part of speech, tense
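• Phonemes are looked up in a pronunciation dictionary or sounded out with letter-to-sound rules
• Diphones (the transitions between neighboring phonemes) are concatenated
• Post-process the audio to add emphasis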
• Play speech audio
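The synthesis steps above can be sketched as a toy pipeline. It parses text into words and pauses, looks each word up in a tiny lexicon, and turns the phonemes into the list of diphone units a concatenative engine would fetch and join. The lexicon, the ARPAbet-style symbols, and the "one symbol per letter" sound-out fallback are all hypothetical placeholders. Natural language processing, emphasis post-processing, and audio playback are left out.

```python
import re

# Hypothetical toy lexicon: word -> phonemes (ARPAbet-style symbols).
LEXICON = {
    "hello": ["HH", "AH", "L", "OW"],
    "two": ["T", "UW"],
    "cats": ["K", "AE", "T", "S"],
}

DIGIT_WORDS = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]


def normalize(text):
    """Text parsing: lowercase words, read digits one at a time, mark sentence-end pauses."""
    tokens = []
    for raw in re.findall(r"[A-Za-z]+|\d|[.!?]", text):
        if raw.isdigit():
            tokens.append(DIGIT_WORDS[int(raw)])
        elif raw in ".!?":
            tokens.append("<pause>")
        else:
            tokens.append(raw.lower())
    return tokens


def to_phonemes(word):
    """Look the word up; otherwise 'sound it out' (crude placeholder: one symbol per letter)."""
    if word == "<pause>":
        return ["SIL"]
    return LEXICON.get(word, [ch.upper() for ch in word])


def to_diphones(phonemes):
    """Pair neighboring phonemes; each diphone covers the transition between two sounds."""
    padded = ["SIL"] + phonemes
    if padded[-1] != "SIL":
        padded.append("SIL")
    return [f"{a}-{b}" for a, b in zip(padded, padded[1:])]


def synthesize(text):
    """Return the diphone units a concatenative synthesizer would fetch and join."""
    phonemes = [p for word in normalize(text) for p in to_phonemes(word)]
    return to_diphones(phonemes)


print(synthesize("Hello 2 cats."))
# ['SIL-HH', 'HH-AH', 'AH-L', 'L-OW', 'OW-T', 'T-UW', 'UW-K', 'K-AE', 'AE-T', 'T-S', 'S-SIL']
```

Diphones are used rather than single phonemes because the transition between two sounds is where most of the audible coarticulation happens. Recording those transitions makes the joined audio sound less choppy.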
How Synthesis Works
• Demo
– /xnaSynth app
• Article
– http://www.brains-N-brawn.com/ttSpeech/
– http://www.brains-N-brawn.com/xnaSynth/ (codebase from /ttSpeech)
Speech Recognition
• Speech to Text
– Dictation
– Command and Control
How Recognition Works
• Audio signal is processed
• Look for signals which might be speech
• Phonemes are found in audio signals
• Phonemes are mapped to a dictionary of words
– Dictation or grammar-based
• Apply natural language processing
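The "phonemes are mapped to a dictionary of words" step, and the difference between dictation and grammar-based recognition, can be sketched as follows. The sketch assumes the phonemes have already been found in the audio. It splits them into dictionary words, then either returns any split (dictation) or only phrases the grammar allows (command and control). The lexicon and phoneme symbols are hypothetical. Real engines score many competing hypotheses probabilistically rather than matching exactly, so this is only the shape of the idea.

```python
# Hypothetical toy lexicon: pronunciation (tuple of phonemes) -> word.
LEXICON = {
    ("T", "ER", "N"): "turn",
    ("AA", "N"): "on",
    ("AA", "F"): "off",
    ("L", "AY", "T", "S"): "lights",
}


def segment(phonemes, lexicon):
    """Split a phoneme sequence into dictionary words (first split found), or None."""
    n = len(phonemes)
    best = [None] * (n + 1)  # best[i]: word list that covers phonemes[:i]
    best[0] = []
    for i in range(n):
        if best[i] is None:
            continue
        for pron, word in lexicon.items():
            j = i + len(pron)
            if j <= n and best[j] is None and tuple(phonemes[i:j]) == pron:
                best[j] = best[i] + [word]
    return best[n]


def recognize(phonemes, lexicon, grammar=None):
    """Dictation when grammar is None; otherwise accept only phrases in the grammar."""
    words = segment(phonemes, lexicon)
    if words is None:
        return None
    text = " ".join(words)
    if grammar is not None and text not in grammar:
        return None  # command-and-control rejects out-of-grammar speech
    return text


heard = ["T", "ER", "N", "AA", "N", "L", "AY", "T", "S"]
print(recognize(heard, LEXICON, {"turn on lights", "turn off lights"}))
```

A small grammar is why command-and-control tends to be more accurate than dictation. The engine only has to pick among a handful of allowed phrases instead of the whole vocabulary.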
How Recognition Works
• Demo
– /wavReader app
• Article
– http://www.brains-N-brawn.com/noReco/
– http://www.brains-N-brawn.com/speakerVerify/ (codebase from /noReco)
Outline : Vista Speech Recognizer
• Built into Vista’s shell
• Microphone bar
• Language support
• Can be trained to improve accuracy
• Command-and-control, also Dictation
• Automagic application support
• Horrible Office integration
• UAC problems
Demo
• Say what you see
• Show numbers
• Correct
• Spell it
• Mouse grid
http://www.istartedsomething.com/20060808/vista-speech-recognition-screencast/
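The Mouse grid command narrows down a screen position by voice. It overlays a numbered 3×3 grid, and each spoken number shrinks the grid into that cell. The sketch below models that narrowing and returns where the pointer would land. It assumes cells are numbered 1 to 9 row by row from the top-left, like a phone keypad. The Rect type and function names are hypothetical, and nothing here moves a real pointer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    x: float
    y: float
    width: float
    height: float

    def center(self):
        return (self.x + self.width / 2, self.y + self.height / 2)


def zoom(rect, cell):
    """Shrink rect to one cell of a 3x3 grid (assumed numbering: 1-9, row by row from top-left)."""
    if not 1 <= cell <= 9:
        raise ValueError("cell must be between 1 and 9")
    row, col = divmod(cell - 1, 3)
    w, h = rect.width / 3, rect.height / 3
    return Rect(rect.x + col * w, rect.y + row * h, w, h)


def mouse_grid(screen, spoken_cells):
    """Apply each spoken number in turn and return the point the pointer would move to."""
    rect = screen
    for cell in spoken_cells:
        rect = zoom(rect, cell)
    return rect.center()


print(mouse_grid(Rect(0, 0, 900, 900), [1, 9]))
```

Each spoken number divides the target area by nine, so a few utterances are enough to reach any point on a large screen.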
High Risk Demo
Hack
http://news.bbc.co.uk/1/hi/technology/6320865.stm
• /micBarExtend – tap and talk
Narrator
• Vista’s screen reader
Outline : SAPI 5.3 / System.Speech
• Desktop applications
– SAPI 5.3
– System.Speech
SAPI 5.3
• COM based
• Native applications
• Managed apps which need more control
System.Speech
• Part of .NET 3.0 WPF
• Managed wrapper built on SAPI 5.3
• Simple API
• Standards support (SSML, SRGS)
• Language support
• Vista Speech Recognition integration
• Does not work in XBAP
System.Speech.Synthesis
• SpeechSynthesizer
• SSML
• PromptBuilder
• Voices
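SSML is the W3C markup that PromptBuilder ultimately describes: text plus instructions such as emphasis and pauses. SpeechSynthesizer can speak an SSML string directly through SpeakSsml. The sketch below is a hypothetical Python builder in the same spirit as PromptBuilder. It is not PromptBuilder's actual API. It emits a small SSML 1.0 document using the speak, emphasis, break, and say-as elements. Which interpret-as values an engine honors varies by engine.

```python
import xml.etree.ElementTree as ET

SSML_NS = "http://www.w3.org/2001/10/synthesis"


class SsmlBuilder:
    """Hypothetical toy builder in the spirit of PromptBuilder: text, emphasis, pauses, say-as."""

    def __init__(self, lang="en-US"):
        self.root = ET.Element(
            "speak", {"version": "1.0", "xmlns": SSML_NS, "xml:lang": lang})

    def append_text(self, text):
        # Plain text goes after the last child element (its tail), or inside <speak> if empty.
        if len(self.root):
            last = self.root[-1]
            last.tail = (last.tail or "") + text
        else:
            self.root.text = (self.root.text or "") + text
        return self

    def append_emphasis(self, text, level="strong"):
        ET.SubElement(self.root, "emphasis", {"level": level}).text = text
        return self

    def append_break(self, milliseconds):
        ET.SubElement(self.root, "break", {"time": f"{milliseconds}ms"})
        return self

    def append_say_as(self, text, interpret_as):
        ET.SubElement(self.root, "say-as", {"interpret-as": interpret_as}).text = text
        return self

    def to_xml(self):
        return ET.tostring(self.root, encoding="unicode")


prompt = (SsmlBuilder()
          .append_text("Your order ships ")
          .append_emphasis("today")
          .append_text(".")
          .append_break(500)
          .append_text("Call ")
          .append_say_as("555-0100", "telephone"))
print(prompt.to_xml())
```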
System.Speech.Synthesis
• Demo
– /speechSamples - /speechSynth
System.Speech.Recognition
• SpeechRecognizer / SpeechRecognitionEngine
• SRGS
• GrammarBuilder
• Advanced users
– Deep-link functionality
– Mixed initiative
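An SRGS grammar is an XML rule listing the phrases a command-and-control recognizer should accept. Fixed words are items and alternatives go in a one-of, which is roughly what GrammarBuilder and Choices express in code. The sketch below is a hypothetical Python helper, not part of System.Speech. It writes a one-rule SRGS 1.0 grammar from a list of parts and also enumerates every phrase that rule accepts.

```python
import itertools
import xml.etree.ElementTree as ET

SRGS_NS = "http://www.w3.org/2001/06/grammar"


def build_srgs(rule_id, parts, lang="en-US"):
    """parts: sequence of str (fixed words) or list of str (alternatives in a <one-of>)."""
    grammar = ET.Element("grammar", {
        "version": "1.0", "xmlns": SRGS_NS, "xml:lang": lang, "root": rule_id})
    rule = ET.SubElement(grammar, "rule", {"id": rule_id, "scope": "public"})
    for part in parts:
        if isinstance(part, str):
            ET.SubElement(rule, "item").text = part
        else:
            one_of = ET.SubElement(rule, "one-of")
            for alternative in part:
                ET.SubElement(one_of, "item").text = alternative
    return ET.tostring(grammar, encoding="unicode")


def phrases(parts):
    """Enumerate every utterance the rule accepts (fine for small grammars)."""
    options = [[p] if isinstance(p, str) else list(p) for p in parts]
    return {" ".join(combo) for combo in itertools.product(*options)}


parts = ["turn", ["on", "off"], "the lights"]
print(build_srgs("command", parts))
print(sorted(phrases(parts)))
```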
System.Speech.Recognition
• Demo
– /speechSamples - /speechReco
System.Speech
• Demo
– /micBarExtend
– /mceSapiMcpl
• Article
– http://www.brains-N-brawn.com/speechSamples/
– http://www.brains-N-brawn.com/micBarExtend/
– http://www.brains-N-brawn.com/mceSapi/ (not updated for Vista yet)
What about Mobile Devices?
• OEMs can add VoiceCommand
– VoiceCommand is not accessible to developers
• Windows Mobile has the SAPI API, but no engines
• Platform Builder is supposed to have engines
• There are 3rd party engines for purchase
Outline : Speech Server 2007
Speech Server 2007
• Telephony Applications
• Outgoing calls
• Speaker Independent
Speech Server 2007
• VoIP
• Language support
• VoiceXML / SALT
• Workflow development model
• Reports
• Still in beta
Speech Server 2007
• Speech Synthesis
– Inline
– PromptBuilder
– SSML
– Prompt databases
• Speech Recognition
– Inline
– Dynamic Grammar
– SRGS
– Conversational Grammar Builder
– DTMF
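A prompt database lets a telephony app play professionally recorded audio where it has a recording and fall back to synthesized speech for the rest, such as a spoken balance amount. The sketch below shows one plausible way to plan such a prompt: greedy longest-match of recorded phrases, with leftover words grouped into TTS segments. The matching strategy, the function name, and the .wav file names are hypothetical. This is not how Speech Server's prompt engine is documented to work.

```python
def plan_prompt(text, recordings):
    """Split text into recorded segments where possible; leave the rest for TTS.

    recordings: maps a lowercase phrase to a (hypothetical) audio file name.
    Returns a list of ("wave", file) or ("tts", text) tuples in playback order.
    """
    words = text.split()
    plan, pending, i = [], [], 0
    while i < len(words):
        match = None
        for j in range(len(words), i, -1):  # try the longest phrase first
            phrase = " ".join(words[i:j]).lower()
            if phrase in recordings:
                match = (j, recordings[phrase])
                break
        if match is None:
            pending.append(words[i])  # no recording: this word will be synthesized
            i += 1
            continue
        if pending:
            plan.append(("tts", " ".join(pending)))
            pending = []
        i, wav = match
        plan.append(("wave", wav))
    if pending:
        plan.append(("tts", " ".join(pending)))
    return plan


recordings = {"your balance is": "balance.wav", "dollars": "dollars.wav"}
print(plan_prompt("Your balance is 42 dollars", recordings))
```

Preferring the longest recorded phrase keeps natural-sounding recordings intact and limits synthesized audio to the dynamic parts of the prompt.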
VoiceXML
• Declarative language
• Article
– http://www.brains-N-brawn.com/vxml/
– http://www.brains-N-brawn.com/myVoices/
– http://www.brains-N-brawn.com/voiceBio/
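"Declarative" means the VoiceXML document describes the dialog, and an interpreter on the voice platform decides how to run it. The sketch below shows that split with a tiny VoiceXML 2.0 form and a hypothetical Python interpreter for only a sliver of the language. It plays block prompts, asks for a field, and reprompts until a scripted caller says one of the field's options. Real platforms implement the full Form Interpretation Algorithm with events such as nomatch and noinput, which this sketch does not.

```python
import xml.etree.ElementTree as ET

VXML = "{http://www.w3.org/2001/vxml}"

DOCUMENT = """<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="order">
    <block><prompt>Welcome to the pizza line.</prompt></block>
    <field name="size">
      <prompt>Small, medium, or large?</prompt>
      <option>small</option>
      <option>medium</option>
      <option>large</option>
    </field>
  </form>
</vxml>"""


def run_form(document, answers, max_tries=3):
    """Hypothetical mini-interpreter: walk the first <form>, filling fields from scripted answers.

    answers: iterator of caller utterances standing in for the recognizer.
    Returns (transcript, filled_fields).
    """
    root = ET.fromstring(document)
    form = root.find(VXML + "form")
    transcript, filled = [], {}
    for item in form:
        if item.tag == VXML + "block":
            for prompt in item.iter(VXML + "prompt"):
                transcript.append(("system", prompt.text.strip()))
        elif item.tag == VXML + "field":
            prompt = item.find(VXML + "prompt").text.strip()
            options = {o.text.strip() for o in item.findall(VXML + "option")}
            for _ in range(max_tries):
                transcript.append(("system", prompt))
                heard = next(answers, "")  # "" stands in for silence
                transcript.append(("caller", heard))
                if heard in options:
                    filled[item.get("name")] = heard
                    break
    return transcript, filled


transcript, filled = run_form(DOCUMENT, iter(["extra large", "large"]))
for speaker, line in transcript:
    print(f"{speaker}: {line}")
print(filled)
```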
SALT
• Yet another declarative language
• Multimodal support has been dropped
• Article
– http://www.brains-N-brawn.com/noHands/
– http://www.brains-N-brawn.com/speechMulti/
– http://www.brains-N-brawn.com/tabletWeb/
– http://www.brains-N-brawn.com/mceSalt/
Speech Workflow
• Speech Sequence Workflow designer
• Speech activities
– Statement
– QuestionAnswer
• Debugging tools
Speech Workflow
• Demo
– /speechTextAdv
– /speakerVerify
– /mobileRecord
• Article
– http://www.brains-N-brawn.com/speechTextAdv/
– http://www.brains-N-brawn.com/speakerVerify/
Where
• Accessibility
• Telephony
• Telematics
• Home automation
• Mobile Devices / Tablets
• Gaming
• Warehouses
• …
Possible Future
• Telematics
• Service Pack for Office support
• Exchange Server 2007
• Speech Server 2007 release
• Rumors that Windows Mobile will get a public API
• Dictation has room to improve
• Hope that System.Speech will ultimately work in XBAP
Questions