Speech for Windows Phone 8
DESCRIPTION
A quick introduction to the speech capabilities of Windows Phone 8, presented at the Appsterdam Milan TalkLab on March 14th, 2013.
TRANSCRIPT
Speech for
Windows Phone 8
Marco Massarelli
http://ceoloide.com
Speech for Windows Phone 8
1. Voice commands
2. Speech recognition
3. Text-to-speech (TTS)
4. Q&A
1. Voice commands
Voice commands
[Diagram: your app with voice commands, speech recognition, and text-to-speech (TTS)]
• Application entry point
• Can act as deep links to your application
Voice commands
• Set up your project capabilities:
– ID_CAP_SPEECH_RECOGNITION,
– ID_CAP_MICROPHONE,
– ID_CAP_NETWORKING
• Create a new Voice Command Definition
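The capabilities listed above are declared in the app manifest. A minimal sketch of that fragment, assuming the standard WMAppManifest.xml of a Windows Phone 8 project (the file name is not given on the slide):

```xml
<!-- Fragment of WMAppManifest.xml (assumed location: the project's Properties folder). -->
<Capabilities>
  <!-- Needed for voice commands, speech recognition and text-to-speech. -->
  <Capability Name="ID_CAP_SPEECH_RECOGNITION" />
  <!-- Needed to capture the user's voice. -->
  <Capability Name="ID_CAP_MICROPHONE" />
  <!-- Needed when recognition relies on a network service. -->
  <Capability Name="ID_CAP_NETWORKING" />
</Capabilities>
```

Any capabilities the project already declares stay alongside these entries.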
Voice commands
<?xml version="1.0" encoding="utf-8"?>
<VoiceCommands xmlns="http://schemas.microsoft.com/voicecommands/1.0">
  <CommandSet xml:lang="en-us">
    <CommandPrefix> Contoso Widgets </CommandPrefix>
    <Example> Show today's specials </Example>
    <Command Name="showWidgets">
      <Example> Show today's specials </Example>
      <ListenFor> [Show] {widgetViews} </ListenFor>
      <ListenFor> {*} [Show] {widgetViews} </ListenFor>
      <Feedback> Showing {widgetViews} </Feedback>
      <Navigate Target="/favorites.xaml"/>
    </Command>
    <PhraseList Label="widgetViews">
      <Item> today's specials </Item>
      <Item> best sellers </Item>
    </PhraseList>
  </CommandSet>
  <!-- Other CommandSets for other languages -->
</VoiceCommands>
Voice commands
• Install the Voice Command Definition (VCD) file:
await VoiceCommandService.InstallCommandSetsFromFileAsync(
    new Uri("ms-appx:///ContosoWidgets.xml"));
• VCD files need to be installed again when a backup is restored on a device.
Voice commands
• Voice command parameters are included in the QueryString property of the NavigationContext
• Asterisks in ListenFor phrases are passed as “…”
– In other words, it is not possible to receive the actual text that matched the asterisk.
"/favorites.xaml?voiceCommandName=showWidgets&widgetViews=best%20sellers&reco=Contoso%20Widgets%20Show%20best%20sellers"
2. Speech recognition
Speech recognition
• Natural interaction with your application
• Grammar-based
• Requires an internet connection (for the predefined dictation and web search grammars)
Speech recognition
• Predefined grammars for free-text dictation and web search are included in WP8
• Custom grammars can be defined in two ways:
– Programmatic list grammar (array of strings)
– XML grammar based on the Speech Recognition Grammar Specification (SRGS) 1.0
Speech recognition
• Predefined grammar (web search shown here):
private async void ButtonWeatherSearch_Click(object sender, RoutedEventArgs e)
{
    SpeechRecognizerUI recoWithUI = new SpeechRecognizerUI();

    // Add the predefined web search grammar to the grammar set.
    recoWithUI.Recognizer.Grammars.AddGrammarFromPredefinedType(
        "weatherSearch", SpeechPredefinedGrammar.WebSearch);

    // Display text to prompt the user's input.
    recoWithUI.Settings.ListenText = "Say what you want to search for";

    // Display an example of ideal expected input.
    recoWithUI.Settings.ExampleText = @"Ex. 'weather for London'";

    // Load the grammar set and start recognition.
    SpeechRecognitionUIResult result = await recoWithUI.RecognizeWithUIAsync();
}
Speech recognition
• Programmatic list grammar:
private async void ButtonSR_Click(object sender, RoutedEventArgs e)
{
    SpeechRecognizerUI recoWithUI = new SpeechRecognizerUI();

    // You can create this string array dynamically, for example from a movie queue.
    string[] movies = { "Play The Cleveland Story", "Play The Office", "Play Psych",
                        "Play Breaking Bad", "Play Valley of the Sad", "Play Shaking Mad" };

    // Create a grammar from the string array and add it to the grammar set.
    recoWithUI.Recognizer.Grammars.AddGrammarFromList("myMovieList", movies);

    // Display an example of ideal expected input.
    recoWithUI.Settings.ExampleText = @"ex. 'Play New Mockumentaries'";

    // Load the grammar set and start recognition.
    SpeechRecognitionUIResult result = await recoWithUI.RecognizeWithUIAsync();

    // Play the movie given in result.RecognitionResult.Text.
}
Speech recognition
• XML grammar:
private async void ButtonSR_Click(object sender, EventArgs e)
{
    // Initialize objects ahead of time to avoid delays when starting recognition.
    SpeechRecognizerUI recoWithUI = new SpeechRecognizerUI();

    // Initialize a URI with a path to the SRGS-compliant XML file.
    Uri orderPizza = new Uri("ms-appx:///OrderPizza.grxml", UriKind.Absolute);

    // Add an SRGS-compliant XML grammar to the grammar set.
    recoWithUI.Recognizer.Grammars.AddGrammarFromUri("PizzaGrammar", orderPizza);

    // Preload the grammar set.
    await recoWithUI.Recognizer.PreloadGrammarsAsync();

    // Display text to prompt the user's input.
    recoWithUI.Settings.ListenText = "What kind of pizza do you want?";

    // Display an example of ideal expected input.
    recoWithUI.Settings.ExampleText = "Large combination with Italian sausage";

    // Load the grammar set and start recognition.
    SpeechRecognitionUIResult recoResult = await recoWithUI.RecognizeWithUIAsync();
}
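The slides load OrderPizza.grxml but do not show its content. As a rough idea of what an SRGS 1.0 grammar looks like, here is a minimal sketch; the rule names and phrases are hypothetical and are not the grammar used in the talk:

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- Hypothetical OrderPizza.grxml: rule ids and phrases are illustrative only. -->
<grammar version="1.0" xml:lang="en-US" root="order"
         xmlns="http://www.w3.org/2001/06/grammar">
  <!-- Root rule: optional lead-in, a size, a topping, optional trailing word. -->
  <rule id="order" scope="public">
    <item repeat="0-1">I would like a</item>
    <ruleref uri="#size"/>
    <ruleref uri="#topping"/>
    <item repeat="0-1">pizza</item>
  </rule>
  <!-- Exactly one of the listed sizes must be spoken. -->
  <rule id="size">
    <one-of>
      <item>small</item>
      <item>medium</item>
      <item>large</item>
    </one-of>
  </rule>
  <!-- Exactly one of the listed toppings must be spoken. -->
  <rule id="topping">
    <one-of>
      <item>cheese</item>
      <item>pepperoni</item>
      <item>combination</item>
    </one-of>
  </rule>
</grammar>
```

With a grammar along these lines, a phrase such as "I would like a large combination pizza" would match, while words outside the rules would not.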
3. Text-to-speech (TTS)
Text-to-speech (TTS)
• Output synthesized speech
• Provide the user with spoken instructions
Text-to-speech (TTS)
• TTS requires only the following capability:
– ID_CAP_SPEECH_RECOGNITION
• TTS can output the following text types:
– Unformatted text strings
– Speech Synthesis Markup Language (SSML) 1.0 strings or XML files
Text-to-speech (TTS)
• Outputting unformatted strings is straightforward, and a voice language can also be selected:
// Declare the SpeechSynthesizer object at the class level.
SpeechSynthesizer synth;

private async void ButtonSimpleTTS_Click(object sender, RoutedEventArgs e)
{
    synth = new SpeechSynthesizer();
    await synth.SpeakTextAsync("You have a meeting with Peter in 15 minutes.");
}

private async void SpeakFrench_Click_1(object sender, RoutedEventArgs e)
{
    synth = new SpeechSynthesizer();

    // Query for a voice that speaks French.
    IEnumerable<VoiceInformation> frenchVoices =
        from voice in InstalledVoices.All
        where voice.Language == "fr-FR"
        select voice;

    // Set the voice as identified by the query.
    synth.SetVoice(frenchVoices.ElementAt(0));

    // Count in French.
    await synth.SpeakTextAsync("un, deux, trois, quatre");
}
Text-to-speech (TTS)
• SSML 1.0 text can be spoken from a string or from an XML file:
// Speaks a string of text with SSML markup.
private async void SpeakSsml_Click(object sender, RoutedEventArgs e)
{
    SpeechSynthesizer synth = new SpeechSynthesizer();

    // Build an SSML prompt in a string.
    string ssmlPrompt = "<speak version=\"1.0\" ";
    ssmlPrompt += "xmlns=\"http://www.w3.org/2001/10/synthesis\" xml:lang=\"en-US\">";
    ssmlPrompt += "This voice speaks English. </speak>";

    // Speak the SSML prompt.
    await synth.SpeakSsmlAsync(ssmlPrompt);
}

// Speaks the content of a standalone SSML file.
private async void SpeakSsmlFromFile_Click(object sender, RoutedEventArgs e)
{
    SpeechSynthesizer synth = new SpeechSynthesizer();

    // Set the path to the SSML-compliant XML file.
    string path = Package.Current.InstalledLocation.Path + "\\ChangeVoice.ssml";
    Uri changeVoice = new Uri(path, UriKind.Absolute);

    // Speak the SSML prompt.
    await synth.SpeakSsmlFromUriAsync(changeVoice);
}
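The content of ChangeVoice.ssml is not shown in the slides. A minimal sketch of what such a standalone SSML 1.0 file might contain, assuming the goal is to switch voice language mid-prompt (the sentences are hypothetical):

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- Hypothetical ChangeVoice.ssml: the prompt text is illustrative only. -->
<speak version="1.0"
       xmlns="http://www.w3.org/2001/10/synthesis"
       xml:lang="en-US">
  <!-- Spoken with the default voice for the document language. -->
  This voice speaks English.
  <!-- The voice element requests a voice matching another language. -->
  <voice xml:lang="fr-FR">
    Cette voix parle français.
  </voice>
</speak>
```

The French sentence is only spoken with a French voice if one is installed on the device; otherwise the synthesizer falls back according to its own rules.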
4. Q&A
Questions & Answers
• Speech for Windows Phone 8 API references:
– http://msdn.microsoft.com/en-us/library/windowsphone/develop/jj206958(v=vs.105).aspx
Thank you!