December 19, 2005, FPMS. Acapela’s corporate profile
Post on 19-Dec-2015
Group Background
Babel Technologies
> Created in 1995 in Mons (Belgium)
> Spin-off of Mons Polytechnical University
> In-house TTS & ASR technologies
> TTS and ASR leader in Embedded environment
Infovox
> Created in 1983 in Stockholm (Sweden)
> Spin-off of KTH (Royal Institute of Technology)
> Integrated into Telia Promotor in 1993
> Acquired by Babel Technologies in 2001
> TTS leader in Nordic, Germany and Netherlands
> Accessibility and Telecom expertise
Elan Speech
> Created in 1980 in Toulouse (France)
> Focused on TTS since 1996
> Launch of in-house high quality TTS in 2002 (Elan Sayso)
> TTS leader in Telecom and Automotive
Acapela’s locations
France, Toulouse
Belgium, Mons
Sweden, Stockholm
3 sites, 50 people
International team
Local support in each site
Merged organization
Acapela’s multilingual offer
ASR & TTS components in 23 languages
Acapela’s technologies
Technologies (TTS)
Architecture
Text Preprocessor: set of rules
Tagger: dictionary based
Phonetizer: phonetic tree + dictionary
Prosody: prosodic patterns
Synthesizer: database (voice)
Text Preprocessor
> Function
– Generation of standard text
> Examples
– Numbers: 100 → one hundred
– Currencies: $20 → twenty dollars
– Abbreviations: tel. → telephone
> Implementation
– Rules are defined in a standard format (BNF format)
> Size of data
– 20 Kbytes
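The three example rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slides say the real rules are written in a BNF format, whereas this sketch swaps in regular expressions, and the table `ABBREVIATIONS` and the functions `number_to_words` and `normalize` are hypothetical names, not part of any Acapela product.

```python
import re

# Hypothetical rule data; the real module loads BNF-style rules instead.
ABBREVIATIONS = {"tel.": "telephone"}
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def number_to_words(n: int) -> str:
    """Spell out an integer from 0 to 999 in English."""
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + ("-" + ONES[ones] if ones else "")
    hundreds, rest = divmod(n, 100)
    words = ONES[hundreds] + " hundred"
    return words + (" " + number_to_words(rest) if rest else "")


def normalize(text: str) -> str:
    """Rewrite currencies, numbers and abbreviations as plain words."""
    # Currency rule runs first so that "$20" is not eaten by the number rule.
    text = re.sub(r"\$(\d{1,3})\b",
                  lambda m: number_to_words(int(m.group(1))) + " dollars",
                  text)
    # Plain numbers of up to three digits; longer numbers are left untouched.
    text = re.sub(r"\b\d{1,3}\b",
                  lambda m: number_to_words(int(m.group(0))),
                  text)
    for abbreviation, expansion in ABBREVIATIONS.items():
        text = text.replace(abbreviation, expansion)
    return text
```

A real preprocessor would also need singular forms ("one dollar"), larger numbers, dates and so on; the sketch covers only the slide's three cases.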
Text Prepro. → Tagger → Phonetizer → Prosody → Speech Synth
Tagger (optional)
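> Function
– Generation of the grammatical function of each word
– Optional: not necessary for all languages
> Examples
– To read / I have read
– Les poules du couvent couvent ("The convent's hens are brooding": the noun "couvent" and the verb "couvent" are spelled alike but pronounced differently)
> Implementation
– Dictionary based + set of rules
> Size of data
– 0 to 20 Kbytes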
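The "to read / I have read" example can be sketched as a dictionary lookup followed by a context rule. Everything named here (`LEXICON`, `PRONUNCIATION`, the tag labels and the function `tag`) is a hypothetical illustration of the "dictionary + set of rules" idea, not the actual tagger; the pronunciations are written in SAMPA.

```python
# Hypothetical mini-dictionary: each word lists its possible grammatical tags.
LEXICON = {
    "i": ["PRON"],
    "to": ["TO"],
    "have": ["AUX"],
    "read": ["VERB_INF", "VERB_PAST_PART"],
}

# The tag decides the pronunciation of the homograph (SAMPA notation).
PRONUNCIATION = {
    ("read", "VERB_INF"): "r i: d",
    ("read", "VERB_PAST_PART"): "r E d",
}


def tag(words):
    """Return one tag per word, using the previous tag to resolve ambiguity."""
    tags = []
    for index, word in enumerate(words):
        candidates = LEXICON.get(word.lower(), ["UNKNOWN"])
        choice = candidates[0]  # default: first dictionary entry
        if len(candidates) > 1 and index > 0:
            previous = tags[-1]
            # Rule 1: after an auxiliary, prefer the past participle.
            if previous == "AUX" and "VERB_PAST_PART" in candidates:
                choice = "VERB_PAST_PART"
            # Rule 2: after "to", prefer the infinitive.
            elif previous == "TO" and "VERB_INF" in candidates:
                choice = "VERB_INF"
        tags.append(choice)
    return tags
```

The phonetizer can then look up `PRONUNCIATION[("read", tag)]` instead of guessing from spelling alone, which is the reason the tagger sits before it in the pipeline.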
Phonetizer
> Function
– Generation of phonetic transcription for each word
> Examples
– Babel: b a b E l
> Implementation
– Decision tree + exception dictionary
> Size of data (language dependent)
– 5 to 350 Kbytes
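A minimal sketch of "decision tree + exception dictionary": the exception dictionary is consulted first, and a small set of hand-written context questions stands in for a trained decision tree. The names `EXCEPTIONS`, `predict_phoneme` and `phonetize`, the entry for "one", and the toy rules themselves are assumptions made for illustration only.

```python
# Hypothetical exception dictionary for words the rules would get wrong.
EXCEPTIONS = {"one": ["w", "V", "n"]}


def predict_phoneme(word: str, index: int) -> str:
    """Toy stand-in for a decision tree: ask questions about letter context."""
    letter = word[index]
    next_letter = word[index + 1] if index + 1 < len(word) else "#"
    if letter == "c":
        # Question on the right-hand context: soft "c" before e, i, y.
        return "s" if next_letter in "eiy" else "k"
    if letter == "e":
        return "E"
    # Default leaf: the letter maps to the phoneme of the same name.
    return letter


def phonetize(word: str):
    """Return the phoneme list for a word: exceptions first, then rules."""
    key = word.lower()
    if key in EXCEPTIONS:
        return list(EXCEPTIONS[key])
    return [predict_phoneme(key, i) for i in range(len(key))]
```

With these toy rules, `phonetize("Babel")` reproduces the slide's transcription `b a b E l`; a trained tree would ask many more questions and be learned from a pronunciation lexicon.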
Prosodic module
> Function
– Generation of intonation:
• Phoneme duration
• Pitch markers
> Examples
– See MBROLI application
> Implementation
– Prosodic patterns extracted from speech corpus
> Size of data (language dependent)
– 30 to 300 Kbytes
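To make the two outputs of this module concrete, the sketch below attaches a duration and a pitch value to each phoneme. It is an assumption-laden toy: the slides say the real patterns are extracted from a speech corpus, while this code uses two fixed durations and a linearly falling pitch line. `ProsodicPhoneme`, `VOWELS`, `assign_prosody` and all numeric values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ProsodicPhoneme:
    phoneme: str
    duration_ms: int   # how long the phoneme lasts
    pitch_hz: float    # target fundamental frequency


# Hypothetical vowel set (SAMPA-like symbols) used to pick a duration.
VOWELS = {"a", "E", "i", "o", "u"}


def assign_prosody(phonemes, start_hz=120.0, end_hz=90.0):
    """Give each phoneme a duration and a pitch on a falling declination line."""
    count = len(phonemes)
    result = []
    for index, phoneme in enumerate(phonemes):
        duration = 100 if phoneme in VOWELS else 60
        # Fraction of the way through the utterance, from 0.0 to 1.0.
        fraction = index / (count - 1) if count > 1 else 0.0
        pitch = start_hz + (end_hz - start_hz) * fraction
        result.append(ProsodicPhoneme(phoneme, duration, pitch))
    return result
```

The list returned for `["b", "a", "b", "E", "l"]` is the kind of "phoneme sequence + intonation" that the synthesizer stage takes as input.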
Synthesizer
> Function
– Generation of speech samples from phoneme sequence + intonation
> Implementation: 3 technologies
– Formant-based = rules
– Diphone concatenation
– Unit Selection
> Size of data: depends on
– Technology
– Sampling frequency
– Compression rate
– From 50 Kbytes to 50 Mb
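Two points on this slide lend themselves to a short sketch: how a phoneme sequence maps to the diphone units that get concatenated, and how sampling frequency and compression rate drive the size of a voice database. The function names, the `_` silence symbol and the example figures are assumptions for illustration, not values from the slides.

```python
def to_diphones(phonemes, silence="_"):
    """List the diphone units (phoneme-to-phoneme transitions) to concatenate."""
    padded = [silence] + list(phonemes) + [silence]
    return [f"{left}-{right}" for left, right in zip(padded, padded[1:])]


def estimate_size_bytes(seconds, sample_rate_hz, bytes_per_sample,
                        compression_ratio=1.0):
    """Rough size of recorded speech: duration x rate x sample width / ratio."""
    return int(seconds * sample_rate_hz * bytes_per_sample / compression_ratio)
```

For example, `to_diphones(["b", "a", "b", "E", "l"])` yields six units starting with `_-b` and ending with `l-_`, and one minute of 16 kHz, 16-bit speech compressed 4:1 comes to 480,000 bytes under this estimate. Unit Selection stores many variants of each unit, which is consistent with the larger footprints quoted later in the deck.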
Technologies (ASR)
Speech Recognition
Hybrid models: Hidden Markov Models / Neural Networks
Acoustic analysis
Neural network
HMM
Discrimination
Dynamic programming (decoder)
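The decoder step can be illustrated with a plain Viterbi search, the standard dynamic-programming algorithm for HMMs. This is a sketch under assumptions, not Babear's decoder: the per-frame scores stand for the neural network outputs, all values are log probabilities, and the function name `viterbi` and its arguments are hypothetical. Hybrid systems usually divide the network posteriors by the state priors before decoding; that step is omitted here for brevity.

```python
import math


def viterbi(frame_scores, log_trans, log_init):
    """Find the most likely HMM state path by dynamic programming.

    frame_scores[t][s]: log score of state s at frame t (network output)
    log_trans[p][s]:    log probability of moving from state p to state s
    log_init[s]:        log probability of starting in state s
    """
    n_states = len(log_init)
    best = [log_init[s] + frame_scores[0][s] for s in range(n_states)]
    backpointers = []
    for t in range(1, len(frame_scores)):
        new_best, pointers = [], []
        for s in range(n_states):
            # Best predecessor for state s at this frame.
            prev = max(range(n_states), key=lambda p: best[p] + log_trans[p][s])
            new_best.append(best[prev] + log_trans[prev][s] + frame_scores[t][s])
            pointers.append(prev)
        best = new_best
        backpointers.append(pointers)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: best[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


def log_matrix(rows):
    """Helper: convert a table of probabilities to log probabilities."""
    return [[math.log(p) for p in row] for row in rows]
```

In a word recognizer the states would belong to phoneme models chained according to the vocabulary's phonetic transcriptions, so the best path also identifies the recognized word.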
Recognition
Vocabulary
– Phonetic transcription
Ex: reconnaissance: R [@] k O n E s a~ s
– Consider all possible transcriptions!
Ex: 10 (French "dix") = dis – diz – di
– Consider synonyms!
Ex: Oui, ouais, ok, c’est cela, … (ways of saying "yes")
Ex: Télévision, TV, poste de télévision ("television", "TV", "television set")
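The two pieces of advice above translate naturally into two lookup tables: one listing every accepted transcription of a vocabulary entry, and one mapping every synonym to a single canonical meaning. The table and function names below are hypothetical, and the phoneme strings simply reuse the slide's examples.

```python
# Every accepted pronunciation of a vocabulary entry (from the slide's example).
PRONUNCIATIONS = {
    "10": {"d i s", "d i z", "d i"},
}

# Every surface form the user may say, grouped under one canonical meaning.
SYNONYMS = {
    "yes": ["oui", "ouais", "ok", "c'est cela"],
    "television": ["télévision", "TV", "poste de télévision"],
}

# Reverse index built once: lower-cased surface form -> canonical meaning.
SURFACE_TO_CANONICAL = {
    surface.lower(): meaning
    for meaning, surfaces in SYNONYMS.items()
    for surface in surfaces
}


def canonical(utterance: str):
    """Return the canonical meaning of an utterance, or None if unknown."""
    return SURFACE_TO_CANONICAL.get(utterance.strip().lower())


def is_valid_pronunciation(entry: str, phones: str) -> bool:
    """Check whether a phoneme string is an accepted variant of an entry."""
    return phones in PRONUNCIATIONS.get(entry, set())
```

The application then reacts to the canonical meaning only, so adding a new synonym or pronunciation variant is a data change rather than a code change.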
Recognition (continued): difficulties
• Noise
• Accents
• Hesitations
• Users
• Incorrect syntax
• Out-of-vocabulary words
ASR : advantage of NN
Acapela’s product overview
Acapela’s Technologies Overview
> High-Quality TTS: the pleasant and natural-sounding voice
Voice enabled by Sayso and BrightSpeech, based on Unit Selection technology
> High-Density TTS: the right choice for high density and small footprints
Voice enabled by Tempo and Babil, based on Diphone technology
> ASR: the robust speech recognizer
Voice enabled by Babear, Speaker Independent ASR based on Hidden Markov Models and Artificial Neural Networks
3 Technologies
Two TTS technologies
Diphone based concatenative TTS
Advantages:
• Small footprint (2 to 6 Mb)
• Flexibility (pitch, speed adjustment, prosody copying)
• High intelligibility
• 21 languages supported
Disadvantage:
• Less natural sounding
Markets/Application targeted:
• Automotive & consumer electronics (low footprint)
• High density, short ROI server based TTS (telephony)
• Multimedia software products
High Density TTS
Voice enabled by Tempo & Babil
High Density TTS language availability
Language                  Gender
French                    Female, Male
US English                Female, Male
UK English                Female, Male
German                    Female, Male
Spanish (Castilian)       Female, Male
Italian                   Female, Male
Polish                    Male
Russian                   Male
Dutch (NL)                Female, Male
Dutch (B)                 Female
Continental Portuguese    Female
Danish                    Female, Male
Swedish                   Female, Male
Norwegian                 Male
Finnish                   Male
Icelandic                 Male
Czech                     Female
Turkish                   Male
Arabic                    Male
South American Spanish    Female
Brazilian Portuguese      Female, Male
Unit selection concatenative TTS
Advantages:
• Very high quality
• Highly natural
• Flexibility (pitch, speed adjustment, timbre alteration, whispering feature)
• Support for custom voice (“SpeechBrand” Program)
Disadvantage:
• Larger footprint (16 to 70 Mb)
Markets/Application targeted:
• High end telephony application (voice portal, news)
• New generation of navigation terminals
• Public address
High Quality TTS
Voice enabled by Sayso & BrightSpeech
High Quality TTS language availability
Language                  Gender             Status
French                    Female, Male       Available
US English                Female, Male       Available
UK English                Female, Male       Available
German                    Female, Q1-2006    Available
Spanish (Castilian)       Female             Available
Italian                   Female             Available
Polish                    Female             Available
Swedish                   Female, Male       Available
Arabic                    Female, Male       Available
Dutch (NL)                Female             Available
Dutch (B)                 Female             Available
Norwegian                 Female             Available
Continental Portuguese    Female
Danish                    ****
Mexican Spanish           ***
Finnish                   **
Canadian French           *
Hybrid technology of Hidden Markov Models and Artificial Neural Networks
Advantages:
• Very high accuracy in difficult contexts
• High dialog flexibility
• Lip-sync and language learning capabilities through phoneme level discrimination
• Speaker independent
• Accurate Voice Activation for noisy environments
Markets/Application targeted:
• Industrial data collection: inventories, picking…
• Automotive
• Name dialing
• Multimedia Command & Control / language learning
ASR
Voice enabled by Babear
Language Robustness Status
US English +++ Available
UK English +++ Available
Spanish + Available
French +++ Available
German +++ Available
Italian ++ Available
Dutch + Available
Greek + Available
Arabic ++ Available
ASR language availability
Acapela’s market coverage
Acapela’s Markets
Solutions for Telecom, Automotive, Accessibility, Mobility, Industry, Multimedia, Consumer Electronics.
Leading 3 major and mature markets: Telecom, Automotive, Accessibility
Acapela’s main Markets
Telecom
Server-based vocalization of content for multiple users over the phone
• For companies: Unified messaging, Auto attendant, CRM
• For telcos: Unified messaging, Voice portal, SMS2Voice, directory and reverse directory
• For contact centers: call automation, FAQ
Automotive
On-board and off-board speech solutions
• On-board & off-board car navigation systems
• Traffic information
• PDA based applications
• Telematics
Accessibility
Assistive technologies
• Screen readers
• Reading machines
• Voice-controlled mobile phones
Creating new speech market opportunities in
>> Mobility
• Cell phones
• Navigation on PDAs
>> Industry
• Public Address
• Alarm & Supervision
• Warehousing, Production Line
>> Multimedia
• Edutainment
• Education
• Language learning
• E-learning
>> Consumer Electronics, …
• Talking dictionary devices
• Toys
giving you the say