Crowd-sourcing platform for large-scale speech data collection

João Freitas, António Calado, Daniela Braga, Pedro Silva, Miguel Sales Dias

Outline

• Motivation

• Crowd-sourcing

• System description

• Quiz Game and Personalized TTS

• Media and user feedback

• Results

• Conclusions and Future work

Motivation

• ASR systems based on statistical models require vast amounts of speech data

• Corpora are expensive

• Database quality issues:

– bad recording conditions

– inconsistent sample rates

– missing transcriptions

– etc.

Previous Data Collections

YourSpeech

• What is YourSpeech?

– Platform that aims to collect desktop speech data at negligible cost, for any language.

– Users receive an entertainment-based reward in exchange for their speech.

Crowd-sourcing

• Act of outsourcing tasks to a community (crowd)

• Collaborative model

• An entity publishes a problem; the crowd finds the solution

• Reward

• Task characteristics:

– Hard to automate

– Vast

– Expensive

Crowd-sourcing examples

Quiz Game

Personalized TTS

System description

[Architecture diagram. Labels: Internet; YourSpeech Server; YourSpeech Database; Recording Services; HTTPS Handler; Handle wave files; Handle sessions; TTS Generation Server; TTS Queue; Recording Platform; Recording Application; ActiveX Recording Control]
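The system description lists a "Handle wave files" step on the server without further detail. As an illustration only, the sketch below shows one way a handler could check the sample rate and channel count of an uploaded recording before storing it, using Python's standard `wave` module. The names `check_recording` and `make_silence`, the 16 kHz mono target and the in-memory test clips are all assumptions made for this example, not details of YourSpeech.

```python
import io
import wave

EXPECTED_RATE = 16000   # assumed target sample rate in Hz, not from the slides
EXPECTED_CHANNELS = 1   # assumed mono recordings

def check_recording(data: bytes) -> dict:
    """Read WAV header fields and flag recordings that miss the assumed format."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        seconds = wav.getnframes() / rate
    return {
        "sample_rate": rate,
        "channels": channels,
        "seconds": seconds,
        "ok": rate == EXPECTED_RATE and channels == EXPECTED_CHANNELS,
    }

def make_silence(rate: int, seconds: float = 0.5) -> bytes:
    """Build a short silent 16-bit mono WAV in memory, for demonstration only."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(rate * seconds))
    return buf.getvalue()

good = check_recording(make_silence(16000))
bad = check_recording(make_silence(8000))
print(good["ok"], bad["ok"])
```

A check of this kind is one way to guard against the inconsistent sample rates named in the Motivation slide.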

Personalized TTS (2)

Media and User feedback

• Dissemination and advertisement are essential

• Positive feedback

• People in general liked the initiative

[Images: MSN site, national TV, tech magazine, tech blogs]

Results

                      Quiz Game   Personalized TTS   Total
Pure speech (hours)   3.87        21.4               25.27
Total audio (hours)   11.9        48                 59.9
Completed sessions    473         94                 567
Incomplete sessions   205         223                428
Utterances            18300       9463               27763
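Two derived figures help in reading these results: the share of started sessions that were completed, and the share of recorded audio that is pure speech. The short Python sketch below works through that arithmetic using only the numbers reported above. The helper name `share` and the dictionary layout are illustrative.

```python
def share(part: float, whole: float) -> float:
    """Return part / whole as a fraction."""
    return part / whole

# Figures from the results: (completed, incomplete, pure speech hours, total audio hours).
results = {
    "Quiz Game":        (473, 205, 3.87, 11.9),
    "Personalized TTS": (94, 223, 21.4, 48.0),
    "Total":            (567, 428, 25.27, 59.9),
}

summary = {}
for name, (done, not_done, speech_h, audio_h) in results.items():
    completion = share(done, done + not_done)   # completed / all started sessions
    speech = share(speech_h, audio_h)           # pure speech / total audio
    summary[name] = (completion, speech)
    print(f"{name}: {completion:.1%} sessions completed, {speech:.1%} pure speech")
```

On these numbers, roughly 70% of Quiz Game sessions were completed, against roughly 30% of Personalized TTS sessions.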

Results (2)

                Quiz Game   Personalized TTS   Total
Words           2010        40119              42129
Insertions      79          46                 125
Deletions       92          103                195
Substitutions   36          47                 83
WER             10.3%       0.05%              1%
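The WER row can be checked against the error counts using the standard definition, WER = (insertions + deletions + substitutions) / reference words. The Python sketch below does that arithmetic with only the counts listed above. The function name `wer` is illustrative.

```python
def wer(insertions: int, deletions: int, substitutions: int, words: int) -> float:
    """Word error rate as a fraction of the reference word count."""
    return (insertions + deletions + substitutions) / words

# Counts taken from the Results (2) figures.
counts = {
    "Quiz Game":        dict(insertions=79,  deletions=92,  substitutions=36, words=2010),
    "Personalized TTS": dict(insertions=46,  deletions=103, substitutions=47, words=40119),
    "Total":            dict(insertions=125, deletions=195, substitutions=83, words=42129),
}

rates = {name: wer(**c) for name, c in counts.items()}
for name, rate in rates.items():
    print(f"{name}: {rate:.2%}")
```

With these counts the sketch gives about 10.3% for the Quiz Game and just under 1% overall, in line with the reported values. For Personalized TTS the counts give about 0.49%, so the 0.05% shown may be a slip for 0.5%.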

Ongoing campaigns

• “Doar a voz”: http://www.doaravoz.com/

• YourSpeech deployment in 10 other languages

Future work

• Platform expansion to other languages

• Transcribe and annotate all the collected corpora

• Retrain existing acoustic models with the collected data

• Verify any changes in the ASR accuracy rate

• Increase the number of questions available in the quiz

• Improve UX

• Create content-specific games

– Focus on certain groups of words (e.g. city names, numbers, etc.) in order to have acoustic models specialized in specific grammar types

Conclusions

• Crowd-sourcing can be used to expand speech resources at negligible cost

• Motivation (reward) and good dissemination are essential

• Media and users “snowball” effect

• Games can be used to attract users

• Personalized TTS acts as a “quid pro quo” service for speech technology

• Positive results (1% total WER)

Thank you very much for your attention!

Crowd-sourcing platform for large-scale speech data collection

www.microsoft.com/portugal/mldc

Questions?

FALA 2010

12th November 2010, Vigo, Spain

PT-pt YourSpeech: www.pt.yourspeech.net
