

Page 1: Developing Your Own Wake Word Engine Just Like Alexa and ...on-demand.gputechconf.com/gtc/2017/presentation/s... · Developing Your Own Wake Word Engine Just Like “Alexa” and

Developing Your Own Wake Word Engine

Just Like “Alexa” and “OK Google”

Xuchen Yao, CEO, KITT.AI

Guoguo Chen, CTO, KITT.AI


What’s a “wake word”?

• Wake word

• Hot word

• Offline

• Code runs on CPU/DSP/MCU

• 7x24

• Always listening

• One-shot understanding

• Online

• Code runs on cloud

• On demand

• Explicit permission

Alexa / OK Google / Hey Siri, what’s the weather today?


Conversational UI Pipeline

[Diagram: wake up device | speech → text | text understanding | dialogue management | text → speech. Other labels: text, voice]


a customizable hotword detection engine

a.k.a: deep neural network in 2MB of RAM

hotword.io video blog


Who’s using it (released 5/2016)

10,000+ developers, 7,000+ unique hotwords

Dominating developer community for hotword detection


Use Cases


#1 Hotword: Smart Mirror (credits to Evan Cohen)

https://github.com/evancohen/smart-mirror


Command & Control: GoPiGo (credits to Paul Matz)


Project RePL (credits to Chris Burns)


Conversational UI Pipeline

[Diagram: wake up device | speech → text | text understanding | dialogue management | text → speech. Other labels: text, voice. Callout: Speech Pipeline]


Speech Pipeline

Voice → Microphone Array → Wake Word Detection (local) → Speech Recognition (cloud/local)

Voice

• Close talking

• Far field (3-9 feet)

• Telephone (8 kHz sampling)

• Others (16 kHz)

• Noises: TV, radio, street, café, car, music

• Pitch: children, adults, seniors

• Accent: US/UK/Europe/Asian…

Microphone Array

• 2, 4, or 6 microphones

• Linear/circular

• Voice Activity Detection

• Auto Gain Control

• Adaptive Echo Cancellation

• Beam forming

Wake Word Detection (local)

• Fast response (0.1 second)

• High accuracy

Speech Recognition (cloud/local)

• IBM/Microsoft/Nuance/Google

• Alexa Voice Service

• Kaldi

• PocketSphinx

• HTK

• Command & Control

• Language Understanding
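
As a rough illustration of one item on this slide, voice activity detection, here is a minimal sketch that assumes a plain energy threshold. The function names, the 10 ms frame length, and the -40 dB threshold are illustrative assumptions and do not come from the talk; real front ends generally use more robust detectors.

```python
import math

def frame_energy_db(frame):
    """Mean-square energy of one frame in dB (small floor avoids log(0))."""
    mean_sq = sum(x * x for x in frame) / len(frame)
    return 10 * math.log10(mean_sq + 1e-12)

def simple_vad(samples, sample_rate=16000, frame_ms=10, threshold_db=-40.0):
    """Return one True/False flag per non-overlapping frame.

    A frame counts as 'voice' when its energy exceeds threshold_db.
    All parameter defaults are assumptions chosen for illustration.
    """
    frame_len = sample_rate * frame_ms // 1000
    flags = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        flags.append(frame_energy_db(frame) > threshold_db)
    return flags

# 0.1 s of silence followed by 0.1 s of a 440 Hz tone at 16 kHz sampling
silence = [0.0] * 1600
tone = [0.5 * math.sin(2 * math.pi * 440 * t / 16000) for t in range(1600)]
flags = simple_vad(silence + tone)
```

With this input, the first ten frames (silence) are flagged as non-voice and the last ten (tone) as voice.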


Supported Platforms and Wrappers

• Raspberry Pi

• Mac OS X

• iPhone/iPad/iPod

• x86/64bit Ubuntu

• Android

• Pine 64

• Intel Edison

• Samsung Artik

• Allwinner R-series

• Ingenic X1000

• Rockchip


Personal vs. Universal models

                        Personal       Universal
Voice samples needed    3              At least 1,500
Speaker-independent     No             Yes
Speaker-specific        Sort of        No
Robust against noise    No             Yes
Free                    Yes            No
Time needed             Immediately    2 weeks


Customizing a universal model

[Diagram: define hotword → collect voice → train a model → deliver & evaluate → deploy to beta users → ship & success. Other labels: collect voice from device; hotword web API; Iterate & Improve]

desired performance:

>90% detection rate

<= 3 false alarms in 24 hours
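
The two performance targets can be checked from simple counts collected during evaluation. The sketch below is one way to do that; the function names and the example counts are hypothetical and are not taken from the talk.

```python
def detection_rate(num_detected, num_spoken):
    """Fraction of spoken hotword instances that the engine detected."""
    return num_detected / num_spoken

def false_alarms_per_24h(num_false_alarms, hours_of_audio):
    """False alarms normalized to a 24-hour listening period."""
    return num_false_alarms * 24.0 / hours_of_audio

def meets_target(num_detected, num_spoken, num_false_alarms, hours_of_audio):
    """True when both slide targets hold: >90% detection, <=3 false alarms/24 h."""
    return (detection_rate(num_detected, num_spoken) > 0.90
            and false_alarms_per_24h(num_false_alarms, hours_of_audio) <= 3.0)

# Hypothetical evaluation: 470 of 500 hotwords detected,
# 5 false alarms over 48 hours of non-hotword audio.
ok = meets_target(470, 500, 5, 48)
```

In this made-up example the detection rate is 94% and the false alarm rate is 2.5 per 24 hours, so both targets are met.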


Science behind wake word


Challenges

• High detection rate

• Low false alarm

• Efficient: detect every 0.1 second

• Small RAM: <2MB

• Too much ambiguity, not much context

[Figure: Is this “Alexa”? Panels: short window, longer window]
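
To make these constraints concrete, the arithmetic below works out what the numbers imply. It assumes 2 MB means 2 × 1024 × 1024 bytes and, unrealistically, that the whole budget holds model parameters (buffers and code also need memory). It also reuses the target of at most 3 false alarms in 24 hours stated elsewhere in this deck. The variable and function names are illustrative.

```python
RAM_BUDGET_BYTES = 2 * 1024 * 1024   # assumed meaning of "2MB"
DECISIONS_PER_SECOND = 10            # one detection every 0.1 second
MAX_FALSE_ALARMS_PER_DAY = 3         # target stated elsewhere in the deck

def max_parameters(bytes_per_param, budget_bytes=RAM_BUDGET_BYTES):
    """Upper bound on parameter count if the whole budget held weights."""
    return budget_bytes // bytes_per_param

params_float32 = max_parameters(4)   # 524288
params_float16 = max_parameters(2)   # 1048576
params_int8 = max_parameters(1)      # 2097152

# Number of accept/reject decisions the engine makes in 24 hours.
decisions_per_day = 24 * 3600 * DECISIONS_PER_SECOND   # 864000

# Tolerable false-alarm probability per decision (about 3.5e-6).
max_false_alarm_prob = MAX_FALSE_ALARMS_PER_DAY / decisions_per_day
```

Under these assumptions, an always-listening engine makes 864,000 decisions a day and can afford to be wrong on only a few of them, which is why low false alarm rates are hard to reach.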


Existing Algorithm

• Advantages:

– Simplified pipeline

– Simplified decoder

• Disadvantage:

– Massive hotword-specific training data


Possible Ways to Improve

• Data augmentation

– Adding noise

– Adding reverberation

– And so on…

[Figure panels: original; add noise; add noise and reverberation]
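
A minimal sketch of the two augmentations named above, written in plain Python so it runs without extra packages. It assumes noise is mixed in at a chosen signal-to-noise ratio and that reverberation is simulated by convolving with a room impulse response. The function names, the toy impulse response, and the 10 dB SNR are illustrative assumptions, not details from the talk.

```python
import math
import random

def rms(signal):
    """Root-mean-square amplitude of a signal."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))

def add_noise(clean, noise, snr_db):
    """Mix noise into clean speech at the requested SNR in dB."""
    # Choose gain so that rms(clean) / rms(gain * noise) == 10 ** (snr_db / 20).
    gain = rms(clean) / (rms(noise) * 10 ** (snr_db / 20))
    return [c + gain * n for c, n in zip(clean, noise)]

def add_reverb(signal, impulse_response):
    """Convolve with an impulse response, truncated to the input length."""
    out = [0.0] * len(signal)
    for i, x in enumerate(signal):
        for j, h in enumerate(impulse_response):
            if i + j < len(out):
                out[i + j] += x * h
    return out

def toy_impulse_response(length, decay, seed=0):
    """Hypothetical stand-in for a measured room response: decaying random taps."""
    rng = random.Random(seed)
    taps = [rng.uniform(-1, 1) * math.exp(-decay * k) for k in range(length)]
    taps[0] = 1.0  # direct path
    return taps

rng = random.Random(1)
clean = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(1600)]
noise = [rng.gauss(0, 1) for _ in range(1600)]
noisy = add_noise(clean, noise, snr_db=10)
noisy_reverb = add_reverb(noisy, toy_impulse_response(64, decay=0.1))
```

In practice the noise would come from recordings of the environments listed earlier (TV, street, car, and so on), and the impulse response from measured or simulated rooms.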


Possible Ways to Improve

• Network models

– Model selection

• Feedforward models? Recurrent models?

– Model compression

• 32-bit float → 16-bit float → 8-bit integer

• Parameters with small absolute value
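
The two compression ideas can be sketched as follows, assuming symmetric linear quantization to 8-bit integers and simple magnitude pruning. The talk does not say which schemes were actually used, so the function names, the threshold, and the example weights are illustrative only.

```python
def quantize_int8(weights):
    """Symmetric linear quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [round(w / scale) for w in weights]
    return quantized, scale

def dequantize(quantized, scale):
    """Recover approximate float weights from the integers and the scale."""
    return [q * scale for q in quantized]

def prune_small(weights, threshold):
    """Magnitude pruning: zero out parameters with small absolute value."""
    return [w if abs(w) >= threshold else 0.0 for w in weights]

weights = [0.5, -1.27, 0.003, 0.8, -0.02]
quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)
pruned = prune_small(weights, threshold=0.05)
```

Each 8-bit weight takes a quarter of the memory of a 32-bit float, at the cost of a rounding error of at most half the scale per weight. Pruned weights are exactly zero, so they can be stored sparsely or skipped at inference time.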


Possible Ways to Improve

• Decoder redesigning

– Modeling smaller units

• Syllables, phones, etc.

– False alarm suppression

• Additional classifier?


Training with Tesla K20/K80

• Positive data

– 1,500 hotword samples

• Negative data

– Thousands of hours of speech

• Training time

– Half a day with 4 K80 GPUs


Software Architecture

[Diagram labels: Frontend, Backend]


KITT.AI Scientific Computing

[Diagram: Devices, Production Cloud, Deep Learning Cloud. Other labels: Traffic; ELB; Content; Websocket (audio, msg); HTTPs; Message Queue; Data → Training → Model → Deploy]


Running Your First Snowboy Demo