Speech-Enabling Web Apps

Upload: mojo-lingo

Post on 18-Dec-2014

DESCRIPTION

An overview of the technology options for adding speech to web applications. It covers the HTML5 Speech Input API for speech recognition, using the Audio tag with 3rd party APIs for text-to-speech, and an overview of WebRTC application possibilities. Presented at the Atlanta Ruby Users Group meeting on November 13, 2013.

TRANSCRIPT

Page 1: Speech-Enabling Web Apps

Speech-Enabling Web Apps

Page 5: Speech-Enabling Web Apps

CAN YOU SPEAK MAGIC?

Ben Klang

Page 12: Speech-Enabling Web Apps

ADD SPEECH TO THE WEB

• Speech Input API
• Text-To-Speech (<Audio/>)
• WebRTC

http://bit.ly/HTML5_Speech_Input_API
http://www.w3.org/TR/webrtc/

Page 16: Speech-Enabling Web Apps

SPEECH INPUT API

<input type="text" x-webkit-speech />

Page 17: Speech-Enabling Web Apps

ANNYANG!
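annyang is a JavaScript library that maps spoken phrases to callbacks on top of the browser's speech recognition. The sketch below is not annyang's code. It only illustrates the general idea of matching a recognized transcript against command patterns, assuming a ":name" marker captures one word and a "*name" marker captures the rest of the phrase. The names `patternToRegExp`, `dispatch`, and the sample commands are hypothetical.

```typescript
// A handler receives the captured words and returns a description of what it did.
type Handler = (...args: string[]) => string;

// Hypothetical helper: turn "call :name" or "search for *query" into a RegExp.
function patternToRegExp(pattern: string): RegExp {
  const source = pattern
    // Escape regex metacharacters, leaving the ":" and "*" markers alone.
    .replace(/[.+?^${}()|[\]\\]/g, "\\$&")
    // ":name" captures a single word.
    .replace(/:\w+/g, "(\\S+)")
    // "*name" captures everything up to the end of the phrase.
    .replace(/\*\w+/g, "(.+)");
  return new RegExp("^" + source + "$", "i");
}

// Try each command in order and run the first handler whose pattern matches.
function dispatch(transcript: string, commands: Record<string, Handler>): string | null {
  const phrase = transcript.trim();
  for (const pattern of Object.keys(commands)) {
    const match = patternToRegExp(pattern).exec(phrase);
    if (match) {
      return handler(commands, pattern)(...match.slice(1));
    }
  }
  return null; // nothing recognized
}

// Small lookup wrapper so the indexed access stays typed as a Handler.
function handler(commands: Record<string, Handler>, pattern: string): Handler {
  return commands[pattern] as Handler;
}

// Sample commands (made up for illustration).
const sampleCommands: Record<string, Handler> = {
  "hello": () => "hi there",
  "call :name": (name: string) => "calling " + name,
  "search for *query": (query: string) => "searching for " + query,
};
```

In a browser, the transcript passed to `dispatch` would come from the speech recognizer's result event; here it is a plain string so the matching logic can be tried on its own.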

Page 19: Speech-Enabling Web Apps

DEMO

Page 27: Speech-Enabling Web Apps

SPEECH INPUT API CAVEATS

• Chrome Only :(
• Uses Google ASR (duh)
• Partial Firefox implementation from GSoC
• Requires ASR Server
• Only Google runs one today
• serviceURI attribute not yet implemented
• Specification maturity seems slow

Page 28: Speech-Enabling Web Apps

TEXT-TO-SPEECH

Page 29: Speech-Enabling Web Apps

TTS API + <AUDIO/>

Page 33: Speech-Enabling Web Apps

TTS API OPTIONS

• AT&T: http://developer.att.com
• Nuance NDEV: http://nuancemobiledeveloper.com/
• Google: http://translate.google.com/translate_tts?tl=en&q=TEXT
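The Google option above is just a URL with the language in `tl` and the text in `q`, so the "TTS API + <AUDIO/>" idea comes down to building that URL safely and pointing an audio element at it. A minimal sketch, assuming the endpoint still behaves as shown on the slide (it is unofficial and may not); `buildTtsUrl` is a hypothetical helper name.

```typescript
// Endpoint copied from the slide. It is not a supported API, so treat it as
// an assumption rather than something guaranteed to respond today.
const TTS_ENDPOINT = "http://translate.google.com/translate_tts";

// Build the request URL for a piece of text.
// encodeURIComponent escapes spaces, "&", "?" and non-ASCII characters so the
// text cannot break out of the q= parameter.
function buildTtsUrl(text: string, lang: string = "en"): string {
  return TTS_ENDPOINT + "?tl=" + encodeURIComponent(lang) + "&q=" + encodeURIComponent(text);
}

// In a browser, the URL would be handed to an audio element, for example:
//   new Audio(buildTtsUrl("Hello Atlanta")).play();
```

The same helper shape would apply to the AT&T or Nuance services, although those need credentials and different request formats that the slides do not show.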

Page 40: Speech-Enabling Web Apps

<AUDIO/> CAVEATS

• You can’t pay for Google TTS
• No specified Mandatory To Implement (MTI) codecs
• Broad consensus
• Everyone: MP3 (+containers H.264, MP4)
• Except IE: Ogg/Vorbis, Opus, WebM
• http://bit.ly/Browser_Audio_Codecs
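Because no codec is mandatory, a page that plays generated speech usually offers more than one encoding and lets the browser choose. The sketch below shows one way to do that choice in code, assuming the three answers that `HTMLMediaElement.canPlayType()` can give ("", "maybe", "probably"). The `pickSource` helper and the file URLs are hypothetical, and the capability check is passed in as a function so the logic can run outside a browser.

```typescript
// Possible answers from HTMLMediaElement.canPlayType().
type CanPlay = "" | "maybe" | "probably";

interface AudioSource {
  url: string;  // hypothetical file location
  type: string; // MIME type, e.g. "audio/mpeg"
}

// Return the first source the browser says it can "probably" play,
// otherwise the first "maybe", otherwise null.
function pickSource(
  sources: AudioSource[],
  canPlayType: (type: string) => CanPlay
): AudioSource | null {
  let fallback: AudioSource | null = null;
  for (const source of sources) {
    const answer = canPlayType(source.type);
    if (answer === "probably") {
      return source; // best case, stop looking
    }
    if (answer === "maybe" && fallback === null) {
      fallback = source; // remember the first uncertain match
    }
  }
  return fallback;
}

// Example candidates: the same prompt encoded twice (made-up URLs).
const promptSources: AudioSource[] = [
  { url: "/prompts/welcome.ogg", type: 'audio/ogg; codecs="vorbis"' },
  { url: "/prompts/welcome.mp3", type: "audio/mpeg" },
];

// In a browser:
//   const audio = new Audio();
//   const chosen = pickSource(promptSources, (t) => audio.canPlayType(t));
//   if (chosen) { audio.src = chosen.url; audio.play(); }
```

Listing several `<source>` children inside `<audio>` achieves the same result declaratively; the function form is mainly useful when the URL is generated on the fly, as with a TTS request.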

Page 44: Speech-Enabling Web Apps

WHAT IS WEBRTC TO ME?

Telephones in Web Browsers!

Page 45: Speech-Enabling Web Apps

How does WebRTC Work?

Page 56: Speech-Enabling Web Apps

[Diagram labels: http://, Alice, Bob, SRTP, SRTP, X]

Get me Bob please!

SDP:
v=0
o=alice 20518 0 IN IP4 0.0.0.0
s=-
t=0 0
m=audio 54609 RTP/SAVPF 109

SDP:
v=0
o=bob 19915 0 IN IP4 0.0.0.0
s=-
t=0 0
m=audio 61001 RTP/SAVPF 109
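The two SDP blobs in the diagram are plain text, one "type=value" pair per line. As a rough illustration of what each side learns from the other's description, the sketch below pulls the originator name from the `o=` line and the media type, port, transport and payload formats from each `m=` line. `summarizeSdp` and its return shape are hypothetical; a real WebRTC application hands the SDP to the browser rather than parsing it by hand.

```typescript
interface MediaLine {
  kind: string;      // e.g. "audio"
  port: number;      // port from the m= line
  protocol: string;  // e.g. "RTP/SAVPF"
  formats: string[]; // payload type numbers as strings
}

interface SdpSummary {
  username: string;
  sessionId: string;
  media: MediaLine[];
}

// Extract a few fields from an SDP body. Unknown line types are ignored.
function summarizeSdp(sdp: string): SdpSummary {
  const summary: SdpSummary = { username: "", sessionId: "", media: [] };
  for (const rawLine of sdp.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line.length < 2 || line.charAt(1) !== "=") {
      continue; // not a "type=value" line
    }
    const fields = line.slice(2).split(" ");
    if (line.charAt(0) === "o") {
      // o=<username> <sess-id> <sess-version> <nettype> <addrtype> <address>
      summary.username = fields[0] || "";
      summary.sessionId = fields[1] || "";
    } else if (line.charAt(0) === "m") {
      // m=<media> <port> <proto> <fmt> ...
      summary.media.push({
        kind: fields[0] || "",
        port: parseInt(fields[1] || "", 10),
        protocol: fields[2] || "",
        formats: fields.slice(3),
      });
    }
  }
  return summary;
}

// Alice's description, copied from the slide.
const ALICE_SDP = [
  "v=0",
  "o=alice 20518 0 IN IP4 0.0.0.0",
  "s=-",
  "t=0 0",
  "m=audio 54609 RTP/SAVPF 109",
].join("\r\n");
```

Running `summarizeSdp(ALICE_SDP)` yields the username "alice" and a single audio media line on port 54609 using RTP/SAVPF, the secure RTP profile that matches the SRTP labels in the diagram.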

Page 62: Speech-Enabling Web Apps

[Diagram labels: Alice, Bob, SRTP, SRTP]

Get me Bob please!

SDP:
v=0
o=alice 20518 0 IN IP4 0.0.0.0
s=-
t=0 0
m=audio 54609 RTP/SAVPF 109

Alice Calling!

SDP:
v=0
o=freeswitch 19915 0 IN IP4 0.0.0.0
s=-
t=0 0
m=audio 61001 RTP/SAVPF 109

Page 64: Speech-Enabling Web Apps

Example RTC Apps

2 Examples

Page 65: Speech-Enabling Web Apps

“Communicating isn’t going to be what you’re doing - it’s what you’ll be doing while you’re doing something else”

- Geoff Hollingworth, Ericsson Head of AT&T Foundry

Page 66: Speech-Enabling Web Apps

1. Incident Response

Page 69: Speech-Enabling Web Apps

INCIDENT RESPONSE

• Timely, Contextual Information
• Adapt for mobile vs. desktop users
• Group-based communication
• Inherit from existing organizational groups
• Allow ad-hoc participants (“guest” parties)
• Federate with external services
• Incident recording/logging
• “Lessons learned” and process improvement
• Links from/to issue tracking systems

Page 70: Speech-Enabling Web Apps

2. Medical Records Management

Page 73: Speech-Enabling Web Apps

MEDICAL RECORDS MGMT

• Automate Medical Claims
• Secure Caller Authentication
• Reuse primary auth via website
• Verify with voice biometrics
• Cross-check against caller location
• Call recording/transcription
• Medical advice given to patient automatically added to patient file
• Auditing/Service Quality Assurance

Page 74: Speech-Enabling Web Apps

HTTPS://TALKY.IO/ATLRUG

Page 81: Speech-Enabling Web Apps

WEBRTC CAVEATS

• Bleeding edge, developing standard
• Only available on Chrome, Firefox
• Only available on Desktop
• Well funded/backed development
• Expect to see it mainstream (Desktop + Mobile) as soon as 2014
• http://iswebrtcreadyyet.com/

Page 83: Speech-Enabling Web Apps

adhearsionconf.com
Early Bird Discount: atlrug

Page 84: Speech-Enabling Web Apps

http://mojolingo.com
@MojoLingo
@bklang
[email protected]

http://bit.ly/HTML5_Speech_Input_API
http://www.w3.org/TR/webrtc/
http://iswebrtcreadyyet.com/

Early Bird Discount: atlrug