Ken Rehor's presentation at eComm 2008
TRANSCRIPT
![Page 1: Ken Rehor's presentation at eComm 2008](https://reader036.vdocuments.mx/reader036/viewer/2022081519/555052c0b4c90574428b46d7/html5/thumbnails/1.jpg)
Alphabet Soup: Sorting out Emerging Telephony and Speech Standards
Ken Rehor
Co-founder, VoiceXML Forum
Founder, Harken Systems, LLC
![Page 2: Ken Rehor's presentation at eComm 2008](https://reader036.vdocuments.mx/reader036/viewer/2022081519/555052c0b4c90574428b46d7/html5/thumbnails/2.jpg)
• Voice Web Telephony Architecture
• Benefits of Open Interfaces, Protocols, Languages
• Status and Deployment
![Page 3: Ken Rehor's presentation at eComm 2008](https://reader036.vdocuments.mx/reader036/viewer/2022081519/555052c0b4c90574428b46d7/html5/thumbnails/3.jpg)
Components of a Voice Solution
Voice Processing and Telephony
Middleware
API Layer
Telephony Interface
Dialog Layer
ASR, TTS, Audio, DTMF, Media
Voice Application
Application Server
Application
• Logic
• Prompts
• Grammars
Database Database
Transaction Server
![Page 4: Ken Rehor's presentation at eComm 2008](https://reader036.vdocuments.mx/reader036/viewer/2022081519/555052c0b4c90574428b46d7/html5/thumbnails/4.jpg)
Break out of the monolithic systems trap
• Modernize existing proprietary applications without starting from scratch
• Develop new apps, and incrementally add features in a modular fashion
• Advantages
  • Faster development
  • Less expensive to develop and maintain
  • Path towards modern, open standards architecture
![Page 5: Ken Rehor's presentation at eComm 2008](https://reader036.vdocuments.mx/reader036/viewer/2022081519/555052c0b4c90574428b46d7/html5/thumbnails/5.jpg)
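Voice / Web Application Architecture
[Diagram; labels recovered from the slide]
• Web user: reaches the app server over HTTP via the Internet or intranet
  • Receives <html>, images, media, scripts
• Phone user: any phone, connected over TDM or VoIP to a VoiceXML platform
  • The VoiceXML platform reaches the app server over HTTP
  • Receives <vxml>, grammars (<grxml>), audio (.wav) / SSML, scripts
• App server
  • Application logic
  • Content and data
  • Transaction processing
  • Database interface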
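The idea on this slide, one app server feeding both a web browser and a VoiceXML platform over HTTP, can be sketched as follows. This is a minimal illustration, not a real deployment: `account_balance`, `render_html`, `render_vxml`, and the sample account data are all hypothetical names invented for the example.

```python
import xml.etree.ElementTree as ET


def account_balance(account_id: str) -> float:
    """Shared application logic (hypothetical): look up a balance."""
    balances = {"1001": 42.50}  # stand-in for the database interface
    return balances[account_id]


def render_html(balance: float) -> str:
    """Presentation for the web user: an HTML page."""
    html = ET.Element("html")
    body = ET.SubElement(html, "body")
    ET.SubElement(body, "p").text = f"Your balance is {balance:.2f} dollars."
    return ET.tostring(html, encoding="unicode")


def render_vxml(balance: float) -> str:
    """Presentation for the phone user: a VoiceXML document with one prompt."""
    vxml = ET.Element("vxml", version="2.1", xmlns="http://www.w3.org/2001/vxml")
    form = ET.SubElement(vxml, "form")
    block = ET.SubElement(form, "block")
    ET.SubElement(block, "prompt").text = f"Your balance is {balance:.2f} dollars."
    return ET.tostring(vxml, encoding="unicode")


balance = account_balance("1001")
print(render_html(balance))
print(render_vxml(balance))
```

Both renderers call the same `account_balance` function, so only the markup differs between the web and voice channels.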
![Page 6: Ken Rehor's presentation at eComm 2008](https://reader036.vdocuments.mx/reader036/viewer/2022081519/555052c0b4c90574428b46d7/html5/thumbnails/6.jpg)
© 2008 Ken Rehor. All Rights Reserved.
Voice App Architecture and Standards
[Diagram; labels recovered from the slide]
Components
• Caller, Phone Network, VoIP Gateway
• CCXML Browser, CCXML Call Control Application
• VoiceXML Browser, VoiceXML Application
• Conference / Media Server, Media Mixer / Server
• MRCP Client; ASR Server, TTS Server, SIV Server
Interfaces and standards
• Phone network: T1 / E1, ISDN, SS7
• VoIP: SIP, RTP, RFC 2833
• Application fetch: HTTP, HTTPS, SOAP
• Application content: CCXML, VXML, GRXML, SSML, scripts, audio
• Media: audio, DTMF
• Telephony Control Interface: SIP, etc.
• Dialog Control Interface: SIP, MSCP, etc.
• Media Control Interface: SIP Netann, MSCML, MOML / MSML, MSCP, DMSP, MGCP, etc.
• Speech resources: MRCP v1, MRCP v2
• Languages: VoiceXML 2.0, VoiceXML 2.1, ECMAScript 262, SSML
• Audio formats: G.711, WAV, .au, mp3, etc.
![Page 7: Ken Rehor's presentation at eComm 2008](https://reader036.vdocuments.mx/reader036/viewer/2022081519/555052c0b4c90574428b46d7/html5/thumbnails/7.jpg)
Why Standards?
• Grow an industry
• Interoperation
• Lower cost of goods
• Innovation and evolution
• Disrupt proprietary markets
• Ecosystems develop around every open interface
• Everyone benefits through joint work: reduces design effort
• Promote technology to the next level
• Sell more due to larger market
![Page 8: Ken Rehor's presentation at eComm 2008](https://reader036.vdocuments.mx/reader036/viewer/2022081519/555052c0b4c90574428b46d7/html5/thumbnails/8.jpg)
Open Interfaces Enable Innovation
• Migration: Proprietary, hardware-based solutions to Proprietary software-based solutions to Open Software
• New Business Models
  • e.g. Voice Service Provider: separate application from telephony/speech resources
• Separation of concerns
• Evolve components without starting from scratch
• Concentrate on innovation rather than duplication
• Move up the value chain
![Page 9: Ken Rehor's presentation at eComm 2008](https://reader036.vdocuments.mx/reader036/viewer/2022081519/555052c0b4c90574428b46d7/html5/thumbnails/9.jpg)
Voice Web Fundamental Concepts
• Leverage open, known technology
  • Web protocols, servers, networks, development tools, expertise
• Distributed Client-Server Architecture
  • Enables new business models and efficient resource utilization
• Standard/Common high-level language
  • Designed for voice dialogs and telephony
• Phone number mapped to URL
  • Phone number associated with URL of voice application
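The "phone number mapped to URL" concept can be sketched as a simple lookup table. This is a hypothetical illustration: the table name, the phone numbers, and the `example.com` URLs are invented, and a real platform would keep this mapping in its provisioning system.

```python
from typing import Optional

# Hypothetical provisioning table: dialed number -> URL of the voice application.
NUMBER_TO_APP_URL = {
    "+18005550100": "https://apps.example.com/banking/start.vxml",
    "+18005550101": "https://apps.example.com/weather/start.vxml",
}


def normalize(dialed: str) -> str:
    """Drop spaces, dashes, dots, and parentheses so lookups are consistent."""
    digits = "".join(ch for ch in dialed if ch.isdigit())
    return "+" + digits


def app_url_for(dialed: str) -> Optional[str]:
    """Return the start URL the voice browser should fetch, or None if unmapped."""
    return NUMBER_TO_APP_URL.get(normalize(dialed))


print(app_url_for("+1 (800) 555-0100"))
```

When a call arrives, the platform would look up the dialed number and fetch the first document of the application from the resulting URL over HTTP.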
![Page 10: Ken Rehor's presentation at eComm 2008](https://reader036.vdocuments.mx/reader036/viewer/2022081519/555052c0b4c90574428b46d7/html5/thumbnails/10.jpg)
Visual vs. Voice markup
Web app UI
• HTML – Structure
• Layout
• Input declaration
• Transitions
• Images
• Audio
• Video
• Text
• Scripts
Voice Web app UI
• VoiceXML – Structure
• Dialog flow
• Input declaration
• Transitions
• Audio
• Video, Images
• Text (for TTS)
• Scripts
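To make the comparison concrete, here is a small VoiceXML document, parsed in Python to pull out the pieces named above: the input declaration, the prompt text (for TTS), the grammar reference, and the transition. The dialog content, the `drink.grxml` file name, and the `example.com` URL are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

VXML_NS = {"v": "http://www.w3.org/2001/vxml"}

# A hypothetical one-field dialog: ask a question, listen, then submit the answer.
DOC = """<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="order">
    <field name="drink">
      <prompt>Would you like coffee or tea?</prompt>
      <grammar src="drink.grxml" type="application/srgs+xml"/>
      <filled>
        <submit next="http://www.example.com/order" namelist="drink"/>
      </filled>
    </field>
  </form>
</vxml>"""

root = ET.fromstring(DOC)
field = root.find("v:form/v:field", VXML_NS)

summary = {
    "input": field.get("name"),                                      # input declaration
    "prompt": field.find("v:prompt", VXML_NS).text,                  # text for TTS
    "grammar": field.find("v:grammar", VXML_NS).get("src"),          # what may be said
    "transition": field.find("v:filled/v:submit", VXML_NS).get("next"),
}
print(summary)
```

The `<field>` plays the role of an HTML form input, and `<submit>` plays the role of a form action: the structure is similar, but the rendering is a spoken dialog rather than a page layout.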
![Page 11: Ken Rehor's presentation at eComm 2008](https://reader036.vdocuments.mx/reader036/viewer/2022081519/555052c0b4c90574428b46d7/html5/thumbnails/11.jpg)
Protocols
Web applications
• HTTP, HTTPS
• SIP
• RTP
• SOAP
• WSDL
• …
Voice Web applications
• HTTP, HTTPS
• SIP
• RTP
• SOAP
• WSDL
• …
![Page 12: Ken Rehor's presentation at eComm 2008](https://reader036.vdocuments.mx/reader036/viewer/2022081519/555052c0b4c90574428b46d7/html5/thumbnails/12.jpg)
The Telecom Trilogy
• User Interaction
  • Voice user interface
  • Multimodal user interface
• Switching
  • Connecting endpoints
  • Moving connections
  • Signaling
• Media processing
  • ASR, SIV, TTS, Record / Play
  • Conferencing, Mixing, Echo cancellation
  • Endpointing, Coding / Format conversion
![Page 13: Ken Rehor's presentation at eComm 2008](https://reader036.vdocuments.mx/reader036/viewer/2022081519/555052c0b4c90574428b46d7/html5/thumbnails/13.jpg)
Ecosystem at Every Interface
[Diagram; labels recovered from the slide]
• Audio Engine
• ASR Engine: <grxml>
• TTS Engine: <ssml>
• MRCP client, MRCP server
• VoiceXML browser
• Application Server: <vxml>
• Code Generator
• GUI Tool / SDE: proprietary dialog XML
• Content: .wav, <xml>, VoiceXML, GRXML, SSML, scripts, etc.
• VSP: telephony, speech, apps
• Application Developers
• VUI designers
• Voice platforms
• Tools
• Service Providers
• Application Servers
![Page 14: Ken Rehor's presentation at eComm 2008](https://reader036.vdocuments.mx/reader036/viewer/2022081519/555052c0b4c90574428b46d7/html5/thumbnails/14.jpg)
Industry Standards – Global Adoption
• VoiceXML Forum
  • Nearly 100 member organizations worldwide
  • Platform Certification
  • Speaker Biometrics
  • Collaborating with W3C, ANSI, ISO
• W3C Speech Interface Framework
  • VoiceXML 2.0/2.1, SRGS 1.0, SSML 1.0, CCXML 1.0
  • SISR 1.0, PLS 1.0
  • Coming: VoiceXML 3.0, SSML 1.1
• IETF
  • Media Resource Control Protocol (MRCPv2)
  • SIP / VoiceXML media server spec (MEDIACTRL)
![Page 15: Ken Rehor's presentation at eComm 2008](https://reader036.vdocuments.mx/reader036/viewer/2022081519/555052c0b4c90574428b46d7/html5/thumbnails/15.jpg)
W3C Speech Interface Framework
• VoiceXML
• SRGS
• SSML
• Semantic Interpretation
• Call Control
• Pronunciation Lexicon
• SCXML
For more information, see:
W3C Voice Browser Working Group http://www.w3.org/Voice/
![Page 16: Ken Rehor's presentation at eComm 2008](https://reader036.vdocuments.mx/reader036/viewer/2022081519/555052c0b4c90574428b46d7/html5/thumbnails/16.jpg)
W3C Speech Interface Framework
• W3C VoiceXML 2.0
  • W3C Recommendation March 2004
  • Widely implemented
    • Approximately 4 dozen platforms
    • Many service providers worldwide
    • Many tools, countless applications
  • VoiceXML Forum Platform Certification Program
    • 24 certified platforms, more coming
• W3C VoiceXML 2.1
  • W3C Recommendation April 2007
  • Most platform vendors support it
  • Certification Program and Test suite in progress
• W3C VoiceXML 3.0
  • Spec in early stages of development
![Page 17: Ken Rehor's presentation at eComm 2008](https://reader036.vdocuments.mx/reader036/viewer/2022081519/555052c0b4c90574428b46d7/html5/thumbnails/17.jpg)
W3C Speech Interface Framework
• Call Control W3C CCXML 1.0
  • W3C Working Draft Jan 2007
  • Implementations increasing
• Pronunciation Lexicon W3C PLS 1.0
  • Used to describe phonetic information for use in speech recognition and synthesis
  • 2nd Last Call Working Draft Oct 2006
![Page 18: Ken Rehor's presentation at eComm 2008](https://reader036.vdocuments.mx/reader036/viewer/2022081519/555052c0b4c90574428b46d7/html5/thumbnails/18.jpg)
W3C Speech Interface Framework
• Input grammars SRGS 1.0
  • W3C Recommendation March 2004
  • Widely implemented
• Output formatting SSML 1.0, 1.1
  • SSML 1.0 - W3C Recommendation March 2004
    • Widely implemented, but with little real support (most TTS engines ignore the SSML instructions)
  • SSML 1.1 – W3C Working Draft June 2007
    • Adds support for Asian, Eastern European, and Middle Eastern languages
• Semantic Interpretation for Speech Recognition SISR 1.0
  • W3C Recommendation April 2007
  • Implementations increasing
  • Required for new Platform Certification
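As a rough illustration of how SRGS and SISR fit together, the sketch below holds a small SRGS XML grammar whose `<tag>` elements carry SISR scripts, and uses Python to list each accepted phrase with its tag. The grammar content (a coffee/tea choice) is invented for the example; a real recognizer, not this parsing code, would evaluate the tags.

```python
import xml.etree.ElementTree as ET

SRGS_NS = {"g": "http://www.w3.org/2001/06/grammar"}

# Hypothetical SRGS grammar: two phrases, each with a SISR semantic tag.
GRAMMAR = """<grammar version="1.0" xmlns="http://www.w3.org/2001/06/grammar"
         xml:lang="en-US" root="drink" mode="voice" tag-format="semantics/1.0">
  <rule id="drink" scope="public">
    <one-of>
      <item>coffee <tag>out = "coffee";</tag></item>
      <item>tea <tag>out = "tea";</tag></item>
    </one-of>
  </rule>
</grammar>"""

root = ET.fromstring(GRAMMAR)
items = root.findall(".//g:item", SRGS_NS)

# The words the caller may say (SRGS) ...
phrases = [item.text.strip() for item in items]
# ... and the script that produces the result handed to the application (SISR).
tags = [item.find("g:tag", SRGS_NS).text for item in items]

for phrase, tag in zip(phrases, tags):
    print(f"{phrase!r} -> {tag}")
```

SRGS defines what the recognizer listens for; SISR defines the value the application receives once a phrase matches.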
![Page 19: Ken Rehor's presentation at eComm 2008](https://reader036.vdocuments.mx/reader036/viewer/2022081519/555052c0b4c90574428b46d7/html5/thumbnails/19.jpg)
What's Next?
• VoiceXML 3.0
  • Video
  • Multimodal integration
  • Speaker Biometrics
  • Cleaner Modularity
• SCXML 1.0
  • State Chart Markup Language
  • Separate logic from presentation
  • W3C Working Draft Feb 2007
  • Several implementations available
    • Commercial, educational, open source
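The "separate logic from presentation" point can be sketched with a small state chart and a toy interpreter. This is only an illustration under stated assumptions: the state and event names are invented, the attribute names follow the later SCXML 1.0 Recommendation (the 2007 Working Draft mentioned on the slide may differ), and the interpreter handles only flat states with event-triggered transitions, a small subset of SCXML.

```python
import xml.etree.ElementTree as ET

SCXML_NS = {"s": "http://www.w3.org/2005/07/scxml"}

# Hypothetical dialog flow: no prompts or grammars here, only control logic.
SCXML = """<scxml xmlns="http://www.w3.org/2005/07/scxml" version="1.0" initial="greeting">
  <state id="greeting">
    <transition event="caller.ready" target="collect"/>
  </state>
  <state id="collect">
    <transition event="input.ok" target="confirm"/>
    <transition event="input.nomatch" target="collect"/>
  </state>
  <state id="confirm">
    <transition event="caller.yes" target="done"/>
    <transition event="caller.no" target="collect"/>
  </state>
  <final id="done"/>
</scxml>"""


def load_transitions(doc):
    """Return the initial state and a (state, event) -> target table."""
    root = ET.fromstring(doc)
    table = {}
    for state in root.findall("s:state", SCXML_NS):
        for transition in state.findall("s:transition", SCXML_NS):
            table[(state.get("id"), transition.get("event"))] = transition.get("target")
    return root.get("initial"), table


def run(doc, events):
    """Feed events in order; events with no matching transition are ignored."""
    state, table = load_transitions(doc)
    for event in events:
        state = table.get((state, event), state)
    return state


print(run(SCXML, ["caller.ready", "input.nomatch", "input.ok", "caller.yes"]))
```

Because the chart contains only states and events, the same flow could in principle drive a voice, web, or mobile presentation layer.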
![Page 20: Ken Rehor's presentation at eComm 2008](https://reader036.vdocuments.mx/reader036/viewer/2022081519/555052c0b4c90574428b46d7/html5/thumbnails/20.jpg)
Web / Voice ++
• Standards enable easy integration with other technologies
• Re-use web technologies
• Multiple modalities / channels: Voice +
  • SMS
  • Web
  • Chat
  • Mobile
  • Voice Control / Search
![Page 21: Ken Rehor's presentation at eComm 2008](https://reader036.vdocuments.mx/reader036/viewer/2022081519/555052c0b4c90574428b46d7/html5/thumbnails/21.jpg)
"Integration" / "Mashups" / "SOA"
• Modular architecture
• Open interfaces
• Common languages, protocols
• Combine data, services, modalities
• Easy adoption of new technologies and features
  • Video
  • Multimodal
  • Biometrics
  • Telephony
![Page 22: Ken Rehor's presentation at eComm 2008](https://reader036.vdocuments.mx/reader036/viewer/2022081519/555052c0b4c90574428b46d7/html5/thumbnails/22.jpg)
Mashups, SOA, Multi-Channel/Modal
[Diagram; labels recovered from the slide]
• Access: POTS over PSTN or VoIP; mobile web over mobile IP; PC over IP
• Presentation logic: VXML Browser with Voice UI App, Mobile UI App, Web UI App
• Business logic
![Page 23: Ken Rehor's presentation at eComm 2008](https://reader036.vdocuments.mx/reader036/viewer/2022081519/555052c0b4c90574428b46d7/html5/thumbnails/23.jpg)
For more information:
http://www.kenrehor.com
http://www.voicexml.org
http://www.w3.org/voice
![Page 24: Ken Rehor's presentation at eComm 2008](https://reader036.vdocuments.mx/reader036/viewer/2022081519/555052c0b4c90574428b46d7/html5/thumbnails/24.jpg)
An eComm 2008 presentation –
http://eCommMedia.com for more