Crowd-sourcing platform for large-scale
speech data collection
João Freitas, António Calado, Daniela Braga, Pedro Silva,
Miguel Sales Dias
Outline
• Motivation
• Crowd-sourcing
• System description
• Quiz Game and Personalized TTS
• Media and user feedback
• Results
• Conclusions and Future work
Motivation
• ASR systems based on statistical models require vast
amounts of speech data
• Corpora are expensive
• Database quality issues:
– poor recording conditions
– inconsistent sample rates
– missing transcriptions
– etc.
Previous Data Collections
YourSpeech
• What is YourSpeech?
– A platform for collecting desktop speech data in any language at
negligible cost.
– Users receive an entertainment-based reward in exchange for their speech.
Crowd-sourcing
• Act of outsourcing tasks to a community (crowd)
• Collaborative model
• An entity publishes a problem; the crowd finds the solution
• Reward
• Task characteristics:
– Hard to automate
– Vast
– Expensive
Crowd-sourcing examples
Quiz Game
Personalized TTS
System description
[Architecture diagram. Labelled components:]
• Recording Platform
• Recording Application
• ActiveX Recording Control
• Internet
• YourSpeech Server
• Recording Services
• HTTPS Handler (handles wave files, handles sessions)
• YourSpeech Database
• TTS Generation Server
• TTS Queue
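The diagram lists a handler for wave files, and the motivation slide names inconsistent sample rates as a common quality problem. As a rough illustration only, the sketch below shows how a server-side handler might reject uploads that do not match an expected format. The slides do not describe the actual checks: the 16 kHz mono target and the names `check_wave` and `make_wave` are hypothetical.

```python
import io
import wave

# Assumptions for illustration; the slides do not state the recording format.
EXPECTED_RATE_HZ = 16000
EXPECTED_CHANNELS = 1


def check_wave(data: bytes) -> list[str]:
    """Return the problems found in an uploaded WAV payload (empty if none)."""
    problems = []
    try:
        with wave.open(io.BytesIO(data), "rb") as wav:
            if wav.getframerate() != EXPECTED_RATE_HZ:
                problems.append(
                    f"sample rate {wav.getframerate()} Hz, "
                    f"expected {EXPECTED_RATE_HZ} Hz"
                )
            if wav.getnchannels() != EXPECTED_CHANNELS:
                problems.append(
                    f"{wav.getnchannels()} channels, expected {EXPECTED_CHANNELS}"
                )
            if wav.getnframes() == 0:
                problems.append("no audio frames")
    except (wave.Error, EOFError) as exc:
        # Raised when the payload is not a well-formed RIFF/WAVE file.
        problems.append(f"not a readable WAV file: {exc}")
    return problems


def make_wave(rate_hz: int, seconds: float = 0.1) -> bytes:
    """Build a short silent 16-bit mono WAV in memory, for demonstration."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate_hz)
        wav.writeframes(b"\x00\x00" * int(rate_hz * seconds))
    return buf.getvalue()


if __name__ == "__main__":
    print(check_wave(make_wave(16000)))
    print(check_wave(make_wave(8000)))
    print(check_wave(b"not a wav"))
```

Returning a list of problems rather than raising lets a handler log every issue for a rejected utterance in one pass.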
Personalized TTS (2)
Media and User feedback
• Dissemination and advertisement are essential
• Positive feedback
• People in general liked the initiative
• Coverage shown: MSN site, national TV, tech magazine, tech blogs
Results
                       Quiz Game   Personalized TTS   Total
Pure speech (hours)    3.87        21.4               25.27
Total audio (hours)    11.9        48                 59.9
Completed sessions     473         94                 567
Incomplete sessions    205         223                428
Utterances             18300       9463               27763
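Two ratios that the slide does not report follow directly from these figures: the share of recorded audio that is pure speech, and the share of started sessions that were completed. The sketch below derives both from the table values. The dictionary layout and function names are illustrative, not part of the YourSpeech platform.

```python
# Figures copied from the results table; the structure is illustrative.
results = {
    "Quiz Game": {
        "pure_speech_h": 3.87, "total_audio_h": 11.9,
        "completed": 473, "incomplete": 205,
    },
    "Personalized TTS": {
        "pure_speech_h": 21.4, "total_audio_h": 48.0,
        "completed": 94, "incomplete": 223,
    },
}


def speech_fraction(row: dict) -> float:
    """Fraction of the recorded audio that is pure speech."""
    return row["pure_speech_h"] / row["total_audio_h"]


def completion_rate(row: dict) -> float:
    """Fraction of started sessions that were completed."""
    return row["completed"] / (row["completed"] + row["incomplete"])


if __name__ == "__main__":
    for name, row in results.items():
        print(
            f"{name}: {speech_fraction(row):.1%} pure speech, "
            f"{completion_rate(row):.1%} of sessions completed"
        )
```

On these numbers the Quiz Game has the higher completion rate (about 69.8% against 29.7%), while Personalized TTS sessions carry the higher share of pure speech (about 44.6% against 32.5%).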
Results (2)
                 Quiz Game   Personalized TTS   Total
Words            2010        40119              42129
Insertions       79          46                 125
Deletions        92          103                195
Substitutions    36          47                 83
WER              10.3%       0.5%               1%
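The WER row can be recomputed from the error counts, assuming the slide uses the standard definition of word error rate: insertions plus deletions plus substitutions, divided by the number of reference words. The function name below is illustrative.

```python
def word_error_rate(insertions: int, deletions: int,
                    substitutions: int, words: int) -> float:
    """Standard WER: (I + D + S) / N, with N the number of reference words."""
    if words <= 0:
        raise ValueError("words must be positive")
    return (insertions + deletions + substitutions) / words


# Counts copied from the table, as (insertions, deletions, substitutions, words).
counts = {
    "Quiz Game": (79, 92, 36, 2010),
    "Personalized TTS": (46, 103, 47, 40119),
    "Total": (125, 195, 83, 42129),
}

if __name__ == "__main__":
    for name, (ins, dele, sub, words) in counts.items():
        print(f"{name}: {word_error_rate(ins, dele, sub, words):.2%}")
```

With these counts the Quiz Game comes out at about 10.3%, Personalized TTS at about 0.49%, and the total at about 0.96%, which rounds to the 1% quoted in the conclusions.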
Ongoing campaigns
• “Doar a voz”:
http://www.doaravoz.com/
• YourSpeech deployment in 10
other languages
Future work
• Platform expansion to other languages
• Transcribe and annotate all the collected corpora
• Retrain existing acoustic models with the collected data
• Measure any resulting change in ASR accuracy
• Increase the number of questions available in the quiz
• Improve UX
• Create content-specific games
– Focus on certain groups of words (e.g. city names, numbers)
in order to build acoustic models specialized for specific
grammar types
Conclusions
• Crowd-sourcing can be used to expand speech
resources at negligible costs
• Motivation (reward) and good dissemination are
essential
• Media coverage and users create a “snowball” effect
• Games can be used to attract users
• Personalized TTS acts as a “quid pro quo” service for
speech technology
• Positive results (1% total WER)
Thank you very much for your attention!
www.microsoft.com/portugal/mldc
Questions?
FALA 2010
12th November 2010, Vigo, Spain
PT-pt YourSpeech: www.pt.yourspeech.net