© 2007 IBM Corporation
SpeechTEK, August 21, 2007
Jan Sedivy
IBM, Voice Technologies and Systems, Czech Republic, Prague
Architecture for Web Multimodal Application
IBM, VTS, Czech Republic, Prague
Introduction - need
Design a simple multimodal architecture
The architecture supports all kinds of multimodal applications, from simple form filling to interactive movies with animation
Small resource requirements - runs on a PDA and over the Internet
Use open standards when possible
No compromises in multimodality - let the user switch freely between voice (VUI) and graphical (GUI) interaction
Simple and fast development
Key Components - approach
IBM Embedded ViaVoice link
Embedded VoiceXML Browser (EVB) - research prototype
Standard HTML browser – Internet Explorer or Firefox
The Adobe Flash Player
An XML protocol that lets an external application control the browser
Embedded ViaVoice overview
Embedded ViaVoice® delivers IBM speech technology to mobile devices and automobile components.
Robust speech recognition with a low error rate, and text-to-speech
SLM and action classification supporting freeform commands – no need for a user’s manual
Embedded grammars or large lists of over 100 000 words
N-best, confidence score, out-of-vocabulary detection
Speaker and noisy environment adaptation
Push-to-activate button, automatic gain control, automatic end-of-utterance detection, transient noise detection
Broad range of languages
Eclipse-based, easy-to-use developer toolkit
C/C++ highly portable, scalable, small-footprint, low-CPU-MIPS code
IBM provides porting, integration, testing and consulting services, along with customized development workshops
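The n-best list, confidence score, and out-of-vocabulary detection listed above are typically combined into a single accept / disambiguate / reject decision. The following is only a minimal sketch of that idea, not the Embedded ViaVoice API: the `Hypothesis` type, the `decide` function, the 0-to-1 confidence scale, and all three thresholds are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Hypothesis:
    text: str          # recognized phrase
    confidence: float  # assumed to lie in [0, 1]; the real scale may differ


# Hypothetical thresholds; real values would be tuned per application.
ACCEPT = 0.70   # top hypothesis must reach this to be accepted outright
MARGIN = 0.15   # required lead over the runner-up
REJECT = 0.30   # below this the input is treated as out of vocabulary


def decide(nbest: List[Hypothesis]) -> Tuple[str, List[str]]:
    """Return ("accept" | "disambiguate" | "reject", candidate texts)."""
    ranked = sorted(nbest, key=lambda h: h.confidence, reverse=True)
    if not ranked or ranked[0].confidence < REJECT:
        return ("reject", [])
    best = ranked[0]
    runner_up = ranked[1].confidence if len(ranked) > 1 else 0.0
    if best.confidence >= ACCEPT and best.confidence - runner_up >= MARGIN:
        return ("accept", [best.text])
    # Otherwise offer every hypothesis that is close to the best one.
    # A single candidate here amounts to asking the user for confirmation.
    close = [h.text for h in ranked if best.confidence - h.confidence < MARGIN]
    return ("disambiguate", close)


if __name__ == "__main__":
    print(decide([Hypothesis("call home", 0.62), Hypothesis("call Rome", 0.58)]))
```

In a multimodal application the "disambiguate" outcome is a natural point to hand over to the GUI, for example by showing the close candidates as a pick list.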
IBM Embedded VoiceXML Browser overview
Small, fast, and portable Embedded VoiceXML Browser (EVB)
VoiceXML 2.0 compliant
Written in plain C++ (no templates, etc.)
Compact and portable code
Targeted to small portable devices - PDAs, handhelds, set-top boxes, etc.
Runs on top of IBM's Embedded Speech Engine and TTS
Ported to Win32, WinCE (iPAQ), and Linux
Runs as a viewer; VoiceXML snippets are pushed to the EVB
Flash Player - overview
The Adobe Flash Player is a widely distributed multimedia and application player created by Macromedia and now distributed by Adobe Systems, which acquired Macromedia. Flash Player runs SWF files that can be created with the Adobe Flash authoring tool, with Adobe Flex, or with a number of other Macromedia and third-party tools.
Flash Player has support for an embedded scripting language called ActionScript (AS), which is based on ECMAScript. ActionScript matured from a script without variables to one that supports object-oriented code.
HTML Browsers - overview
HTML browsers
MS IE 6, IE 7
Firefox
Browsers support add-ons
PDA architecture
GUI – Adobe Flash Player
VUI – Embedded VoiceXML Browser – viewer mode
Application control – ActionScript
ActionScript synchronizes the GUI and VUI and generates:
VoiceXML snippets of code
Dynamic grammars, grammars, prompts (links)
All other dialog parameters
Result processing (n-best, disambiguation, similarity, OOV, ...)
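To make the idea of generated VoiceXML snippets concrete, here is a minimal sketch that builds a one-field VoiceXML 2.0 form with a prompt and a dynamic inline grammar. In the architecture described above this generation is done in ActionScript; Python is used here only for illustration. The function name `build_snippet`, the form and rule ids, and the choice of an inline SRGS XML grammar are assumptions, and how the snippet is pushed to the EVB is not covered.

```python
import xml.etree.ElementTree as ET
from typing import List

VXML_NS = "http://www.w3.org/2001/vxml"  # VoiceXML 2.0 namespace


def build_snippet(field_name: str, prompt_text: str, choices: List[str]) -> str:
    """Build a VoiceXML 2.0 form with one field, a prompt, and an inline grammar."""
    vxml = ET.Element("vxml", {"version": "2.0", "xmlns": VXML_NS})
    form = ET.SubElement(vxml, "form", {"id": field_name + "_form"})
    field = ET.SubElement(form, "field", {"name": field_name})
    ET.SubElement(field, "prompt").text = prompt_text

    # Dynamic grammar: one alternative per entry currently shown in the GUI.
    grammar = ET.SubElement(
        field,
        "grammar",
        {"type": "application/srgs+xml", "version": "1.0", "root": "choices"},
    )
    rule = ET.SubElement(grammar, "rule", {"id": "choices"})
    one_of = ET.SubElement(rule, "one-of")
    for choice in choices:
        ET.SubElement(one_of, "item").text = choice

    # ElementTree escapes special characters such as "&" in text nodes.
    return ET.tostring(vxml, encoding="unicode")


if __name__ == "__main__":
    print(build_snippet("city", "Which city?", ["Prague", "Brno", "Ostrava"]))
```

Regenerating the grammar whenever the visible GUI options change is one simple way to keep the VUI and GUI in step.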
Internet Extensions
EVB Life-Cycle Manager add-on: starts, initializes, runs, and shuts down the browser
prevents multiple VXML browsers from running at the same time
version policy mechanism that notifies of new versions
The Security Server permits opening a socket in a different domain.
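Two of the Life-Cycle Manager duties above, preventing a second browser instance and noticing a newer version, can be sketched as follows. This is an illustration under stated assumptions, not the actual add-on: the lock-file approach, the `SingleInstanceLock` class, and the dotted numeric version format are all hypothetical.

```python
import os
import tempfile
from typing import Optional, Tuple


class SingleInstanceLock:
    """Lock file created atomically; a second acquire fails while it exists."""

    def __init__(self, path: str) -> None:
        self.path = path
        self._fd: Optional[int] = None

    def acquire(self) -> bool:
        try:
            # O_EXCL makes creation fail if the file already exists.
            self._fd = os.open(self.path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            return False
        os.write(self._fd, str(os.getpid()).encode())
        return True

    def release(self) -> None:
        if self._fd is not None:
            os.close(self._fd)
            os.remove(self.path)
            self._fd = None


def parse_version(text: str) -> Tuple[int, ...]:
    # Assumes purely numeric dotted versions such as "1.10.0".
    return tuple(int(part) for part in text.split("."))


def update_available(installed: str, latest: str) -> bool:
    return parse_version(latest) > parse_version(installed)


if __name__ == "__main__":
    lock = SingleInstanceLock(os.path.join(tempfile.mkdtemp(), "evb.lock"))
    print(lock.acquire(), update_available("1.2.9", "1.10.0"))
    lock.release()
```

A lock file left behind by a crashed process would block later starts, so a production version would also need stale-lock detection.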
[Diagram labels: Communicate with EVB, Life Cycle Manager, Security Server]
Internet Architecture
[Architecture diagram; labels: Life Cycle Manager, Security Server, EVB, Add-ons, Browser, Client, Internet]
Sample application - Literacy Tutor
IBM, Corporate Citizenship & Corporate Affairs
Project goals
Use speech recognition technology - over the web - to help children and adults improve their literacy skills
Value to customer
Gain literacy skills through practice and positive reinforcement
Improve pronunciation in a private setting
Interaction with tutor character introduces ‘fun’ and increases computer skills
Web = Anywhere/anytime access:
Can resume where left off
Can share progress with family
Build and share books on the web
www.readingcompanion.org
Home page
Functionality
Practice Reading – main application
Flash application that uses EVB+EVV to decode speech
Flash animates a tutor character that interacts with the reader
Reporting – performance reports for teachers indicating strengths as well as problem areas for students
Book Library – add/remove books from classroom, rate books, book browser
Classroom Management – add/delete students, adjust reading level, add/delete classrooms as well as teachers and schools
Book Authoring – separate tool to author new books
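The slides do not say how decoded speech is scored against the book text or how reports are derived. Purely as an illustration of the kind of comparison a reading tutor needs, the sketch below aligns the expected sentence with the recognized words using Python's standard `difflib`. The function name `compare_reading`, the word tokenization, and the three report categories are assumptions, not the Reading Companion implementation.

```python
import difflib
import re
from typing import Dict, List


def words(text: str) -> List[str]:
    # Lowercase and keep only letters and apostrophes.
    return re.findall(r"[a-z']+", text.lower())


def compare_reading(expected: str, recognized: str) -> Dict[str, List[str]]:
    """Align expected and recognized words and classify the differences."""
    exp, rec = words(expected), words(recognized)
    report: Dict[str, List[str]] = {"correct": [], "missed": [], "extra": []}
    matcher = difflib.SequenceMatcher(a=exp, b=rec, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            report["correct"].extend(exp[i1:i2])
        else:
            # "replace", "delete", and "insert" all fall through here:
            # expected words not heard are missed, unexpected words are extra.
            report["missed"].extend(exp[i1:i2])
            report["extra"].extend(rec[j1:j2])
    return report


if __name__ == "__main__":
    print(compare_reading("The cat sat on the mat", "the cat sat on a mat"))
```

Accumulating the "missed" words per student over many sessions would be one way to surface the problem areas mentioned under Reporting.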
Bookshelf
Children’s book/character
Adult book/character
Student Performance
Reading Companion - summary
More than 200 schools and not-for-profit organizations currently participate in the grant program, involving more than 11,000 users (children and adults) in 9 countries: Canada, United States, Spain, United Kingdom, Ireland, South Africa, Mexico, Venezuela, and India.
Community relations managers are reviewing proposals from prospective organizations, as we hope to expand the program to 100 more sites this year.
Market value: US$10,000 per site (regardless of number of users)
Thank You!