microsoft research academic services gtm draft...crowdsourcing real-time transcription •benefits:...

15

Upload: others

Post on 27-Aug-2020

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Microsoft Research Academic Services GTM Draft...Crowdsourcing Real-Time Transcription •Benefits: •People understand speech well, in many different domains •People can more robustly
Page 2: Microsoft Research Academic Services GTM Draft...Crowdsourcing Real-Time Transcription •Benefits: •People understand speech well, in many different domains •People can more robustly

Walter S. Assistant Professor, University of Michigan (CSE)

AudioAccessibility

Page 3: Microsoft Research Academic Services GTM Draft...Crowdsourcing Real-Time Transcription •Benefits: •People understand speech well, in many different domains •People can more robustly

Speech/Audio Access

Speech Events Language

Page 4: Microsoft Research Academic Services GTM Draft...Crowdsourcing Real-Time Transcription •Benefits: •People understand speech well, in many different domains •People can more robustly

Speech/Audio Access

Speech Events Language

Page 5: Microsoft Research Academic Services GTM Draft...Crowdsourcing Real-Time Transcription •Benefits: •People understand speech well, in many different domains •People can more robustly

Real-Time Speech Transcription (Captioning)

• Goal: convert spoken language into written text quickly• To align visual / referenced content with speech requires doing this in < 5-10s

• Challenges:• Speakers vary (accent, intonation, speech patterns, prosody, speed, cold/flu/etc., …)

• Environments vary (echo, distance, acoustic properties, background noise, …)

• Settings vary (speaker direction, mic type, mic location/movement, content topic, …)

• Natural language is weird (highly variable/flexible, use of anaphora, vague references, …)

• Spoken language is weirder (stops / restarts / repeats, less grammatical, less formal, …)

Page 6: Microsoft Research Academic Services GTM Draft...Crowdsourcing Real-Time Transcription •Benefits: •People understand speech well, in many different domains •People can more robustly

Crowdsourcing Real-Time Transcription

• Benefits:• People understand speech well, in many different domains

• People can more robustly adapt to changes on the fly

• People can keep up with an evolving conversation over long periods of time

• Challenges:• People cannot type fast enough to capture a useful piece of continuous speech!• (usually people get 10-20% of what is said, depending on typing proficiency)

Page 7: Microsoft Research Academic Services GTM Draft...Crowdsourcing Real-Time Transcription •Benefits: •People understand speech well, in many different domains •People can more robustly

Students

3-4s?

Media Server

[ Lasecki et al., UIST 2012 ]

Crowd

Merging Server

Students

Speaker

Scribe: Real-Time Transcription by Non-Experts

Page 8: Microsoft Research Academic Services GTM Draft...Crowdsourcing Real-Time Transcription •Benefits: •People understand speech well, in many different domains •People can more robustly

1-3 sec 1-3 sec 1-3 sec 1-3 sec

[ Lasecki et al., UIST 2012 ]

Scribe: Real-Time Transcription by Non-Experts

Page 9: Microsoft Research Academic Services GTM Draft...Crowdsourcing Real-Time Transcription •Benefits: •People understand speech well, in many different domains •People can more robustly

[ Lasecki et al., UIST 2012 ]

Scribe: Real-Time Transcription by Non-Experts

Page 10: Microsoft Research Academic Services GTM Draft...Crowdsourcing Real-Time Transcription •Benefits: •People understand speech well, in many different domains •People can more robustly

the brown fox jumped

quick fox lazy dog

Fox jumped over the lazy

Combiner

the quick brown fox jumped over the lazy dog

Final Caption

[ Lasecki et al., UIST 2012 ]

Scribe: Real-Time Transcription by Non-Experts

Page 11: Microsoft Research Academic Services GTM Draft...Crowdsourcing Real-Time Transcription •Benefits: •People understand speech well, in many different domains •People can more robustly

the brown fox jumped

quick fox lazy dog

Fox jumped over the lazy

Combiner

the quick brown fox jumped over the lazy dog

Final Caption

[ Lasecki et al., UIST 2012 ]

95% recall, 87% precision, in 2.9s

Scribe: Real-Time Transcription by Non-Experts

Page 12: Microsoft Research Academic Services GTM Draft...Crowdsourcing Real-Time Transcription •Benefits: •People understand speech well, in many different domains •People can more robustly

Hybrid Intelligence for Real-Time Transcriptionthe brown fox jumped

quick fox lazy dog

Fox jumped over the lazy

Combiner

the quick brown fox jumped over the lazy dog

Final Caption

Page 13: Microsoft Research Academic Services GTM Draft...Crowdsourcing Real-Time Transcription •Benefits: •People understand speech well, in many different domains •People can more robustly

Takeaways: Speech Access Powered by the Crowd• We can create systems that let people collectively

outperform the individual on streaming tasks

• Domain experts (volunteers, students, etc.) can help• More generally: democratize access technology by lowering barriers

• Provides a framework for training ASR on the fly• We can train ASR while testing it, and combine human and machine input as we go

• This intermix ratio will change over time as automation (ASR) becomes more robust

Page 14: Microsoft Research Academic Services GTM Draft...Crowdsourcing Real-Time Transcription •Benefits: •People understand speech well, in many different domains •People can more robustly
Page 15: Microsoft Research Academic Services GTM Draft...Crowdsourcing Real-Time Transcription •Benefits: •People understand speech well, in many different domains •People can more robustly