nuance
TRANSCRIPT
CONFIDENTIAL | © 2002-2011 Nuance Communications, Inc. All rights reserved. MOBILE SOLUTIONS1
Voice Control combined with Speech-To-Text and NLU resulting in Smart UI
Reimund Schmald, Nuance
Stefan Seide, T-Systems
Scene from Star Trek IV: The Voyage Home (1986)
This is what we are working on!
Agenda
• Multi-Modal Input UE: Status and Trends in Mobile
• Voice enabled NLU: Requirements + Demo
• Hybrid Architecture, Programming Example
Write
Speak
Type
Swype
Starting with Keyboard
Multi-Modality in Apps
Example: Amazon and iTranslate
Security and Personalization
From completing a financial transaction to accessing sensitive content, Voice Biometrics offers security so you can proceed with confidence.
Through speaker identification, Voice Biometrics delivers a personalized experience wherever several user profiles share one device, e.g. a TV or tablet. Simply speak, and your personal settings are loaded.
“My voice is my password”
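A minimal sketch of the speaker-identification idea on this slide, assuming each enrolled user is represented by a fixed-length voiceprint vector and that matching uses cosine similarity. The vectors, the threshold, the profile contents, and the function names are hypothetical illustrations, not part of any Nuance product or API.

```python
import math

# Hypothetical enrolled voiceprints: user name -> feature vector.
ENROLLED = {
    "alice": [0.9, 0.1, 0.3],
    "bob": [0.2, 0.8, 0.5],
}

# Hypothetical personal settings loaded after a successful match.
PROFILES = {
    "alice": {"language": "en-US", "favorites": ["news"]},
    "bob": {"language": "de-DE", "favorites": ["sports"]},
}


def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def identify_speaker(sample, threshold=0.95):
    """Return the best-matching enrolled user, or None if no match is close enough."""
    best_user, best_score = None, 0.0
    for user, voiceprint in ENROLLED.items():
        score = cosine(sample, voiceprint)
        if score > best_score:
            best_user, best_score = user, score
    return best_user if best_score >= threshold else None


def load_profile(sample):
    """Load the settings of the identified speaker (None for unknown voices)."""
    return PROFILES.get(identify_speaker(sample))
```

On a shared device, `load_profile` would run on each utterance, and an unknown voice falls back to no profile rather than to the nearest enrolled user.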
Text Dictionary (Local)
Speech Dictionary (Cloud)
Personalization – Across Devices
Just the Mic Button
Requirements: High Quality SpeechToText + NLU
High Quality SpeechToText on NDEV
The industry’s FIRST developer program to offer Speech To Text and Text to Speech integration for any mobile app
8000+ developers registered to date
iOS, Android, WP 7
www.ndevmobile.com
NDEV Mobile: Client SDK Technical Aspects
SDK Components
– Recognizer object
– Audio engine
– End of Speech Detection
– Encoding (compresses request to conserve bandwidth)
– Network Transport
Server Components
– Authentication
– Recognizer
– Vocalizer TTS
1. Client application invokes SDK
2. SDK captures request and encodes it
• Might use End of Speech, if enabled
3. SDK Network Transport sends utterance to NVC Servers
4. NVC Server authenticates Client app
5. Recognizer/TTS processes request
6. NVC Server redirects response to Client
7. SDK processes response and sends to Client app
8. Client app plays/shows response
[Diagram: Client Application, Dragon SDK (Recogniser Object, Audio Engine, End Of Speech, Encoding, Network Transport), and NVC Hosted Server (Authentication, MREC, Vocalizer; Search, Dictation), with steps 1 to 8 marked.]
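The request flow on this slide can be sketched as a small simulation. This is an illustration under stated assumptions, not the Dragon SDK API: every function name is hypothetical, `zlib` stands in for the speech codec, end of speech is detected as a run of zero-valued samples, the "server" is a local function, and the recognizer is a mock that only reports how many samples it received.

```python
import zlib

VALID_APP_KEYS = {"demo-app-key"}  # hypothetical credential store on the server


def trim_at_end_of_speech(samples, silence=0, min_run=3):
    """Cut the capture at the first run of `min_run` consecutive silent samples."""
    run = 0
    for i, sample in enumerate(samples):
        run = run + 1 if sample == silence else 0
        if run == min_run:
            return samples[: i - min_run + 1]
    return samples


def encode(samples):
    """Stand-in for the audio codec: compress the request to conserve bandwidth."""
    return zlib.compress(bytes(samples))


def server_handle(app_key, payload):
    """Stand-in for the hosted server: authenticate first, then recognize."""
    if app_key not in VALID_APP_KEYS:
        return {"status": "unauthorized"}
    samples = list(zlib.decompress(payload))
    # A real recognizer returns text; this mock only reports the sample count.
    return {"status": "ok", "text": f"<{len(samples)} samples recognized>"}


def recognize(app_key, samples, end_of_speech=True):
    """Client-side flow: capture, optional end-of-speech trim, encode, send."""
    if end_of_speech:
        samples = trim_at_end_of_speech(samples)
    payload = encode(samples)
    # The network transport is simulated by a direct function call.
    return server_handle(app_key, payload)
```

Calling `recognize("demo-app-key", [5, 7, 0, 9, 0, 0, 0, 4])` trims the capture at the three-sample silence before encoding, so the mock server sees only the first four samples.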
NDev mobile Service Levels
FREE
Feature Comparison: Gold, Silver, Emerald
Silver Gold Emerald

Features
ASR Dictation & Search Models for 18 Languages ✓ ✓ ✓
Network TTS for 30+ Languages ✓ ✓ ✓
Bluetooth Support (8 kHz) ✓ ✓ ✓
SSL ✓ ✓
Customized Features ✓

Flexibility & Customization
UI ✓ ✓ ✓
Platforms Android, iOS, WP7 ✓ ✓ ✓
HTTP ✓ ✓
Consulting Services Available

Availability & Support
Centralized Speech Resource & Support Forums ✓ ✓ ✓
Web Ticketing ✓ ✓
SLA ✓ ✓
Dedicated Support Contact Available

Cost
Development Free for 90 days Payment Options Custom
Production Free w/ cap $0.009 trx or $0.24 flat Custom
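The production price quoted on this slide ($0.009 per transaction or $0.24 flat) implies a break-even point, worked out in the sketch below. The slide does not say what unit the flat fee applies to, so the code simply assumes both options cover the same billing unit. The function and constant names are hypothetical.

```python
import math

PER_TRANSACTION = 0.009  # USD per transaction, from the slide
FLAT_FEE = 0.24          # USD flat, from the slide; the billing unit is an assumption


def cheaper_option(transactions):
    """Return which pricing option costs less for a given transaction count."""
    per_transaction_cost = transactions * PER_TRANSACTION
    return "per-transaction" if per_transaction_cost < FLAT_FEE else "flat"


# Smallest transaction count at which the flat fee is no longer the dearer option.
break_even = math.ceil(FLAT_FEE / PER_TRANSACTION)
```

Under that assumption, `break_even` is 27: below 27 transactions per billing unit the per-transaction price is cheaper, and from 27 upward the flat fee is.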
Different Levels of NLU
Structured NLU
Embedded & Connected speech systems working together to determine what specific phone-related task the user is looking to complete.
“Send text to John Call me shortly”
“Search for New York Yankees”
“Update Facebook I am today in Berlin”
Unstructured NLU
Server-side natural language understanding platform that supports open-ended queries and intent classification.
“Is it raining in Berlin?”
“What movies are playing near me?”
“Make a reservation to Capital Grille in Burlington for 8 pm on Friday for 2 people”
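To illustrate what "structured" means here, the three structured example utterances can be matched with fixed command templates. The sketch below is a toy, not Nuance's implementation: the task names and patterns are hypothetical, and it assumes single-word recipient and network names, where a real system would consult the contact list.

```python
import re

# Hypothetical command templates: (task name, pattern with named slots).
PATTERNS = [
    ("send_text", re.compile(r"^send text to (?P<recipient>\w+) (?P<body>.+)$", re.I)),
    ("web_search", re.compile(r"^search for (?P<query>.+)$", re.I)),
    ("update_status", re.compile(r"^update (?P<network>\w+) (?P<body>.+)$", re.I)),
]


def parse_command(utterance):
    """Return the task and its slots, or None if no template matches."""
    text = utterance.strip()
    for task, pattern in PATTERNS:
        match = pattern.match(text)
        if match:
            return {"task": task, **match.groupdict()}
    return None
```

An open-ended query such as “Is it raining in Berlin?” matches no template and returns `None`, which is the case the unstructured, server-side NLU is meant to cover.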
NUANCE PROPRIETARY NON-DISCLOSURE INFORMATION
Deploying a Comprehensive Speech Solution
Structured NLU: NVC Hybrid allows users to complete core phone functions (dialing, messaging, etc.)
+
Unstructured NLU: Dragon Go! allows for intelligent Web and content access
Both NLU systems can be combined to offer a comprehensive speech experience.
All Web and media related queries can be passed to the unstructured NLU system (e.g. Dragon Go!).
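One way to picture the combination is a front-end router that keeps core phone functions on the structured system and passes everything else on. This is a sketch under assumptions: the prefix list and the `route` function are hypothetical, and the slides do not describe how the real hand-off is decided.

```python
# Hypothetical prefixes for core phone functions handled by structured NLU.
PHONE_TASK_PREFIXES = ("dial ", "send text ", "send message ")


def route(utterance):
    """Pick the NLU system for an utterance: phone tasks stay structured."""
    text = utterance.strip().lower()
    if text.startswith(PHONE_TASK_PREFIXES):
        return "structured"
    # Web and media related queries fall through to the unstructured system.
    return "unstructured"
```

With this rule, “Send text to John Call me shortly” stays on the structured path, while “What movies are playing near me?” is passed to the unstructured system.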
Dragon Go! Directed Search
• A specific site is referenced in the query.
• Today we support 180+ content providers including…
• CNN
• eBay
• Engadget
• New York Times
• TechCrunch
• USA Today
• Regional Newspapers
Dragon Go! Intent Search
• CALL a business
• GET directions
• MAKE reservations
• PLAY music
• BUY tickets, products, music
• More…
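The intents listed above can be approximated with a keyword lookup, shown here only to make the idea of intent classification concrete. The keyword table and function name are hypothetical, and a production classifier would be statistical rather than keyword-based.

```python
import re

# Hypothetical keyword table, one entry per intent on the slide.
INTENT_KEYWORDS = {
    "call": ("call", "phone"),
    "directions": ("directions", "navigate"),
    "reservation": ("reservation", "reserve"),
    "play": ("play",),
    "buy": ("buy", "purchase"),
}


def classify_intent(query):
    """Return the first intent whose keyword appears as a whole word in the query."""
    text = query.lower()
    for intent, keywords in INTENT_KEYWORDS.items():
        if any(re.search(r"\b" + re.escape(word) + r"\b", text) for word in keywords):
            return intent
    return "unknown"
```

For the reservation example quoted earlier in the deck, `classify_intent` returns `"reservation"`. Extracting the restaurant, time, and party size would need slot filling on top of this.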
Dragon Go! Category Search
• Music
• Movies
• Businesses
• Restaurants
• Sports
• News
• Shopping
• Weather
• More…