december 19, 2005 fpms. acapela’s corporate profile

December 19, 2005

FPMS

Acapela’s corporate profile

Group Background

Babel Technologies
> Created in 1995 in Mons (Belgium)
> Spin-off of Mons Polytechnical University
> In-house TTS & ASR technologies
> TTS and ASR leader in embedded environments

Infovox
> Created in 1983 in Stockholm (Sweden)
> Spin-off of KTH (Royal Institute of Technology)
> Integrated into Telia Promotor in 1993
> Acquired by Babel Technologies in 2001
> TTS leader in the Nordic countries, Germany and the Netherlands
> Accessibility and telecom expertise

Elan Speech
> Created in 1980 in Toulouse (France)
> Focused on TTS since 1996
> Launch of in-house high-quality TTS in 2002 (Elan Sayso)
> TTS leader in telecom and automotive

Acapela’s locations

France, Toulouse

Belgium, Mons

Sweden, Stockholm

3 sites, 50 people

International team

Local support in each site

Merged organization

Acapela’s multilingual offer

ASR & TTS components in 23 languages

Acapela’s technologies

Technologies (TTS)

Architecture

> Text Preprocessor: set of rules
> Tagger: dictionary based
> Phonetizer: phonetic tree + dictionary
> Prosody: prosodic patterns
> Synthesizer: database (voice)
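The architecture is a chain in which each module consumes the output of the previous one. As a minimal sketch, the chain can be modelled as a list of functions applied in order; the stage bodies here are hypothetical placeholders that only show what kind of data is handed from module to module, not Acapela's actual modules.

```python
from typing import Any, Callable, Sequence

# Each module is modelled as a function from the previous module's output to its own.
Stage = Callable[[Any], Any]


def run_pipeline(text: str, stages: Sequence[tuple[str, Stage]]) -> Any:
    """Pass the input text through each named module in order."""
    data: Any = text
    for _name, stage in stages:
        data = stage(data)
    return data


# Hypothetical toy stages (placeholders, not real TTS modules).
TOY_STAGES = [
    ("text preprocessor", lambda text: text.replace("tel.", "telephone").split()),
    ("tagger", lambda words: [(w, "NOUN") for w in words]),
    # Letters stand in for phonemes in this toy phonetizer.
    ("phonetizer", lambda tagged: [list(w) for w, _tag in tagged]),
    # Every phoneme gets a flat 80 ms duration.
    ("prosody", lambda phons: [(p, 80) for word in phons for p in word]),
    # The toy "synthesizer" only reports the total duration in ms.
    ("synthesizer", lambda targets: sum(ms for _p, ms in targets)),
]

print(run_pipeline("tel.", TOY_STAGES))  # 9 letters x 80 ms -> 720
```

Keeping the modules as independent functions mirrors the slide's point that the tagger is optional: a language without one simply omits that stage from the list.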

Text Preprocessor

> Function
– Generation of standard text: raw input is rewritten as fully spelled-out words

> Examples
– Numbers: 100 → one hundred
– Currencies: $20 → twenty dollars
– Abbreviations: tel. → telephone

> Implementation
– Rules are defined in a standard format (BNF)

> Size of data
– 20 Kbytes
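The three example rewrites can be sketched with regular expressions. This is a minimal illustration, assuming hypothetical rule tables (`ABBREVIATIONS`, `ONES`, `TENS`) and handling only numbers from 0 to 999; the actual module uses BNF-defined rules, which are not reproduced here.

```python
import re

# Hypothetical rule tables for illustration only.
ABBREVIATIONS = {"tel.": "telephone"}
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def number_to_words(n: int) -> str:
    """Spell out an integer from 0 to 999 in English."""
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + ("" if ones == 0 else "-" + ONES[ones])
    hundreds, rest = divmod(n, 100)
    words = ONES[hundreds] + " hundred"
    return words if rest == 0 else words + " " + number_to_words(rest)


def spell_currency(match: re.Match) -> str:
    amount = int(match.group(1))
    return number_to_words(amount) + (" dollar" if amount == 1 else " dollars")


def normalize(text: str) -> str:
    """Rewrite abbreviations, dollar amounts and bare numbers as words."""
    for abbr, full in ABBREVIATIONS.items():
        # \b stops "hotel." from being treated as "tel."
        text = re.sub(r"\b" + re.escape(abbr), full, text)
    # Currency rule runs first so "$20" is not read as a bare number.
    text = re.sub(r"\$(\d{1,3})\b", spell_currency, text)
    # Only 1- to 3-digit numbers are handled; longer ones are left untouched.
    text = re.sub(r"\b(\d{1,3})\b", lambda m: number_to_words(int(m.group(1))), text)
    return text


print(normalize("tel. 100"))  # telephone one hundred
print(normalize("$20"))       # twenty dollars
```

Rule order matters: if the bare-number rule ran first, "$20" would become "$twenty" and the currency rule would never match.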

Processing chain: Text Prepro. → Tagger → Phonetizer → Prosody → Speech Synth

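Tagger (optional)

> Function
– Determination of the grammatical function of each word
– Optional: not necessary for all languages

> Examples
– “To read” vs. “I have read” (same spelling, different pronunciation)
– Les poules du couvent couvent (French: “the convent’s hens are brooding”; the noun “couvent” and the verb “couvent” are pronounced differently)

> Implementation
– Dictionary based + set of rules

> Size of data
– 0 to 20 Kbytes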
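The “read” example can be sketched as a dictionary of ambiguous words plus one context rule. This toy tagger is an assumption for illustration: it only looks at the previous word (an auxiliary “have/has/had” signals a past participle), and the tag names and English SAMPA transcriptions are chosen here, not taken from Acapela's tagger.

```python
# Hypothetical toy data: tags "VB" (infinitive/present) and "VBN" (past participle),
# pronunciations in English SAMPA.
PRONUNCIATION = {("read", "VB"): "r i: d", ("read", "VBN"): "r e d"}
AUXILIARIES = {"have", "has", "had"}


def tag_words(words: list[str]) -> list[tuple[str, str]]:
    """Assign a grammatical tag to each word, using the previous word as context."""
    tagged = []
    previous = ""
    for word in words:
        lower = word.lower()
        if lower == "read":
            tag = "VBN" if previous in AUXILIARIES else "VB"
        else:
            tag = "OTHER"
        tagged.append((word, tag))
        previous = lower
    return tagged


def pronounce_read(sentence: str) -> list[str]:
    """Return the pronunciation chosen for every occurrence of 'read'."""
    return [PRONUNCIATION[(w.lower(), t)] for w, t in tag_words(sentence.split())
            if w.lower() == "read"]


print(pronounce_read("I want to read"))  # ['r i: d']
print(pronounce_read("I have read"))     # ['r e d']
```

A single rule like this misreads sentences such as “I read it yesterday”; this is why the slide describes the real module as dictionary based plus a set of rules.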

Text Prepro.

Tagger

Phonetizer

Prosody

Speech Synth

Phonetizer

> Function
– Generation of the phonetic transcription of each word

> Examples
– Babel: b a b E l

> Implementation
– Decision tree + exception dictionary

> Size of data (language dependent)
– 5 to 350 Kbytes
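The decision tree + exception dictionary combination can be sketched as follows: the dictionary is consulted first, and otherwise each letter is converted by walking a small tree of yes/no questions about its neighbours. The trees and the dictionary entry here are hypothetical toy data (two French-like rules), far smaller than a real language model of 5 to 350 Kbytes.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    phoneme: str            # "" means the letter is silent


@dataclass
class Node:
    offset: int             # which neighbouring letter to inspect (+1 = next letter)
    letters: str            # question: is that letter one of these? ("#" = word boundary)
    yes: "Tree"
    no: "Tree"


Tree = Union[Leaf, Node]

# Hypothetical toy data: one tree per letter; letters without a tree map to themselves.
TREES = {
    "c": Node(+1, "eiy", Leaf("s"), Leaf("k")),   # c before e/i/y is pronounced s
    "e": Node(+1, "#", Leaf(""), Leaf("E")),      # word-final e is silent
}
EXCEPTIONS = {"femme": "f a m"}                   # irregular: the rules would give "f E m m"


def letter_at(word: str, i: int) -> str:
    return word[i] if 0 <= i < len(word) else "#"


def walk(tree: Tree, word: str, i: int) -> str:
    """Answer the tree's questions about the context of word[i] until a leaf is reached."""
    while isinstance(tree, Node):
        tree = tree.yes if letter_at(word, i + tree.offset) in tree.letters else tree.no
    return tree.phoneme


def phonetize(word: str) -> str:
    word = word.lower()
    if word in EXCEPTIONS:                        # the exception dictionary wins
        return EXCEPTIONS[word]
    phonemes = [walk(TREES.get(ch, Leaf(ch)), word, i) for i, ch in enumerate(word)]
    return " ".join(p for p in phonemes if p)


print(phonetize("Babel"))  # b a b E l
print(phonetize("cite"))   # s i t
```

The split keeps the data compact: the trees capture regular patterns, and only words the trees get wrong need a dictionary entry.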

Text Prepro.

Speech Synth

Phonetizer

Prosody

Tagger

Prosodic module

> Function
– Generation of intonation:
• Phoneme durations
• Pitch markers

> Examples
– See the MBROLI application

> Implementation
– Prosodic patterns extracted from a speech corpus

> Size of data (language dependent)
– 30 to 300 Kbytes
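The module's output, phoneme durations plus pitch markers, can be illustrated in the MBROLA `.pho` format, where each line gives a phoneme, its duration in ms, and optional (position in % of the duration, pitch in Hz) pairs. The rules below are a swapped-in, deliberately simple technique: fixed base durations, phrase-final lengthening and a linear pitch declination. They are not corpus-extracted patterns like the ones the slide describes, and all numbers are hypothetical.

```python
VOWELS = {"a", "e", "E", "i", "o", "O", "u", "y", "@"}  # French SAMPA vowels (partial)


def prosody(phonemes, start_hz=140.0, end_hz=100.0):
    """Return (phoneme, duration_ms, [(position_percent, pitch_hz), ...]) per phoneme."""
    n = len(phonemes)
    targets = []
    for i, ph in enumerate(phonemes):
        duration = 110 if ph in VOWELS else 70        # hypothetical base durations
        if i == n - 1:
            duration = int(duration * 1.5)            # phrase-final lengthening
        markers = []
        if ph in VOWELS:
            # Linear declination: pitch falls from start_hz to end_hz over the phrase.
            t = i / (n - 1) if n > 1 else 0.0
            markers.append((50, round(start_hz + (end_hz - start_hz) * t)))
        targets.append((ph, duration, markers))
    return targets


def to_pho(targets):
    """Format targets as .pho lines: phoneme, duration, then (position %, pitch Hz) pairs."""
    lines = []
    for ph, duration, markers in targets:
        fields = [ph, str(duration)] + [f"{pos} {hz}" for pos, hz in markers]
        lines.append(" ".join(fields))
    return "\n".join(lines)


print(to_pho(prosody("b a b E l".split())))
# b 70
# a 110 50 130
# b 70
# E 110 50 110
# l 105
```

A real `.pho` file would normally also begin and end with the silence phoneme `_`, omitted here to keep the example short.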

Text Prepro.

Prosody

Tagger

Phonetizer

Speech Synth

Synthesizer

> Function
– Generation of speech samples from the phoneme sequence + intonation

> Implementation: 3 technologies
– Formant-based (rules)
– Diphone concatenation
– Unit selection

> Size of data: depends on
– Technology
– Sampling frequency
– Compression rate
– From 50 Kbytes to 50 Mb
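The slide lists unit selection without describing the search. A common formulation, assumed here and not necessarily how Sayso or BrightSpeech work, picks one recorded unit per target phoneme by minimizing the sum of target costs (distance from the requested prosody) and join costs (audible discontinuity at each splice) with dynamic programming. In this minimal sketch the only feature is a single pitch value per unit. The cost functions, the fixed splice penalty of 10 and `TOY_DATABASE` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    phoneme: str
    pitch_hz: float
    corpus_index: int        # position of the unit in the recorded corpus


def target_cost(unit: Unit, target_pitch: float) -> float:
    """How far the unit is from the pitch the prosody module asked for."""
    return abs(unit.pitch_hz - target_pitch)


def join_cost(left: Unit, right: Unit) -> float:
    """Cost of playing `right` after `left`; units adjacent in the corpus join for free."""
    if right.corpus_index == left.corpus_index + 1:
        return 0.0
    return abs(left.pitch_hz - right.pitch_hz) + 10.0   # hypothetical fixed splice penalty


def select_units(targets: list[tuple[str, float]], database: list[Unit]) -> list[Unit]:
    """Viterbi search for the unit sequence with the lowest total target + join cost."""
    candidates = [[u for u in database if u.phoneme == ph] for ph, _ in targets]
    # best[i][j] = (cheapest cost ending in candidates[i][j], index of predecessor)
    best = [[(target_cost(u, targets[0][1]), -1) for u in candidates[0]]]
    for i in range(1, len(targets)):
        row = []
        for u in candidates[i]:
            cost, back = min((best[i - 1][k][0] + join_cost(prev, u), k)
                             for k, prev in enumerate(candidates[i - 1]))
            row.append((cost + target_cost(u, targets[i][1]), back))
        best.append(row)
    # Backtrack from the cheapest final candidate.
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    path = []
    for i in range(len(targets) - 1, -1, -1):
        path.append(candidates[i][j])
        j = best[i][j][1]
    return path[::-1]


TOY_DATABASE = [Unit("b", 120.0, 0), Unit("a", 125.0, 1),
                Unit("a", 100.0, 7), Unit("b", 102.0, 8)]
chosen = select_units([("b", 110.0), ("a", 110.0)], TOY_DATABASE)
print([u.corpus_index for u in chosen])  # [0, 1]
```

In the toy run, the unit at index 8 matches the first target better on its own, but the pair recorded together (indices 0 and 1) wins because it needs no splice. This trade-off explains why unit selection sounds natural and also why its database is large. The sketch raises `ValueError` if some target phoneme has no candidate unit.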

Text Prepro.

Speech Synth.

Tagger

Phonetizer

Prosody

Technologies (ASR)

Speech Recognition

Hybrid models: Hidden Markov Models / Neural Networks

Acoustic analysis

Neural network

HMM

Discrimination

Dynamic programming (decoder)
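The slide names the building blocks of a hybrid HMM/neural network recognizer without showing how they interact. In the standard hybrid recipe from the literature (assumed here, since Babear's internals are not described), the network outputs phoneme posteriors P(q | frame) for each frame. These are divided by the phoneme priors P(q) to give scaled likelihoods, and a dynamic-programming (Viterbi) decoder finds the best path through a left-to-right HMM. The word model, the self-loop probability of 0.5 and the toy frame values are hypothetical.

```python
import math


def viterbi_hybrid(posteriors, priors, states, self_loop=0.5):
    """Best path through a left-to-right HMM whose emissions come from a neural network.

    posteriors: one dict per frame, phoneme -> P(phoneme | frame) from the network
    priors:     phoneme -> P(phoneme), relative frequency in the training data
    states:     phoneme sequence of the word model, e.g. ["w", "i"]
    Returns (log score, state index per frame); the path starts in the first
    state and must end in the last one.
    """
    log_stay, log_move = math.log(self_loop), math.log(1.0 - self_loop)
    n, neg_inf = len(states), float("-inf")

    def emit(t, s):
        # Scaled likelihood: P(frame | q) is proportional to P(q | frame) / P(q).
        q = states[s]
        return math.log(posteriors[t][q]) - math.log(priors[q])

    score = [emit(0, 0)] + [neg_inf] * (n - 1)
    backpointers = [[0] * n]
    for t in range(1, len(posteriors)):
        new_score, pointers = [neg_inf] * n, [0] * n
        for s in range(n):
            stay = score[s] + log_stay
            move = score[s - 1] + log_move if s > 0 else neg_inf
            best, pointers[s] = (stay, s) if stay >= move else (move, s - 1)
            new_score[s] = best + emit(t, s)
        score = new_score
        backpointers.append(pointers)
    path = [n - 1]
    for t in range(len(posteriors) - 1, 0, -1):
        path.append(backpointers[t][path[-1]])
    return score[n - 1], path[::-1]


frames = [{"w": 0.8, "i": 0.2}, {"w": 0.6, "i": 0.4}, {"w": 0.1, "i": 0.9}]
score, path = viterbi_hybrid(frames, {"w": 0.5, "i": 0.5}, ["w", "i"])
print(path)  # [0, 0, 1]
```

Dividing by the priors keeps frequent phonemes from being favoured just because the network saw them often during training. In a full recognizer, the same decoder would run over every word model in the vocabulary and keep the best-scoring one.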

Recognition

Vocabulary
– Phonetic transcription
Ex: reconnaissance: R [@] k O n E s a~ s
– Consider all possible transcriptions!
Ex: 10 (“dix”) = dis – diz – di
– Consider synonyms!
Ex: Oui, ouais, ok, c’est cela, … (all meaning “yes”)
Ex: Télévision, TV, poste de télévision (all meaning “television”)
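The two vocabulary guidelines (list every pronunciation, group synonyms) can be sketched as a lexicon in which each concept lists its synonyms and each synonym lists its accepted transcriptions. The concept names, the data layout and the SAMPA transcriptions of “ouais” and “ok” are hypothetical illustrations; the three variants of “dix” come from the slide.

```python
# Hypothetical vocabulary: concept -> synonyms -> accepted pronunciations (French SAMPA).
VOCABULARY = {
    "TEN": {"dix": ["d i s", "d i z", "d i"]},
    "YES": {"oui": ["w i"], "ouais": ["w E"], "ok": ["O k E"]},
}


def build_lexicon(vocabulary):
    """Flatten the vocabulary into (pronunciation, word, concept) entries for the recognizer."""
    return [(pron, word, concept)
            for concept, words in vocabulary.items()
            for word, prons in words.items()
            for pron in prons]


def concept_of(pronunciation, lexicon):
    """Map a recognized pronunciation back to its concept, or None if out of vocabulary."""
    for pron, _word, concept in lexicon:
        if pron == pronunciation:
            return concept
    return None


lexicon = build_lexicon(VOCABULARY)
print(len(lexicon), concept_of("d i z", lexicon), concept_of("w E", lexicon))  # 6 TEN YES
```

The application only ever sees the concept, so adding another way of saying “yes” means adding one vocabulary entry rather than changing the dialog logic.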

Recognition (continued): difficulties

• Noise
• Accents
• Hesitations
• Users
• Incorrect syntax
• Out-of-vocabulary words

ASR: advantage of NN

Acapela’s product overview

Acapela’s Technologies Overview

> High-Quality TTS: the pleasant and natural-sounding voice. Voice enabled by Sayso and BrightSpeech, based on Unit Selection technology

> High-Density TTS: the right choice for high density and small footprints. Voice enabled by Tempo and Babil, based on Diphone technology

> ASR: the robust speech recognizer. Voice enabled by Babear, speaker-independent ASR based on Hidden Markov Models and Artificial Neural Networks

3 Technologies

Two TTS technologies

Diphone based concatenative TTS

Advantages:
• Small footprint (2 to 6 Mb)
• Flexibility (pitch and speed adjustment, prosody copying)
• High intelligibility
• 21 languages supported

Disadvantage:
• Less natural-sounding

Markets/Applications targeted:
• Automotive & consumer electronics (low footprint)
• High-density, short-ROI server-based TTS (telephony)
• Multimedia software products

High Density TTS, voice enabled by Tempo & Babil
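Diphone concatenation stores one recording for each phoneme-to-phoneme transition and splices them at synthesis time. This is one reason the footprint stays small: a language with N phonemes needs at most N × N diphones. The sketch below shows only the unit lookup and splicing. The silence symbol `_`, the `inventory` mapping and its dummy samples are hypothetical, and the pitch and speed adjustment the slide mentions is not shown.

```python
def to_diphones(phonemes: list[str], silence: str = "_") -> list[str]:
    """Turn a phoneme sequence into the diphones (phoneme-to-phoneme transitions) to fetch."""
    padded = [silence] + phonemes + [silence]
    return [f"{a}-{b}" for a, b in zip(padded, padded[1:])]


def concatenate(diphones: list[str], inventory: dict[str, list[float]]) -> list[float]:
    """Append the stored samples of each diphone; raises KeyError if one is missing."""
    samples: list[float] = []
    for d in diphones:
        samples.extend(inventory[d])
    return samples


units = to_diphones("b a b E l".split())
print(units)  # ['_-b', 'b-a', 'a-b', 'b-E', 'E-l', 'l-_']

# Hypothetical inventory: 10 silent samples per diphone, just to exercise the splice.
toy_inventory = {d: [0.0] * 10 for d in units}
print(len(concatenate(units, toy_inventory)))  # 60
```

Because each stored diphone spans the middle of one phoneme to the middle of the next, the splice points fall in the stable part of each sound, where joins are least audible.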

High Density TTS language availability

Language                 Gender
French                   Female, Male
US English               Female, Male
UK English               Female, Male
German                   Female, Male
Spanish (Castilian)      Female, Male
Italian                  Female, Male
Polish                   Male
Russian                  Male
Dutch (NL)               Female, Male
Dutch (B)                Female
Continental Portuguese   Female
Danish                   Female, Male
Swedish                  Female, Male
Norwegian                Male
Finnish                  Male
Icelandic                Male
Czech                    Female
Turkish                  Male
Arabic                   Male
South American Spanish   Female
Brazilian Portuguese     Female, Male

Unit selection concatenative TTS

Advantages:
• Very high quality
• Highly natural
• Flexibility (pitch and speed adjustment, timbre alteration, whispering feature)
• Support for custom voices (“SpeechBrand” program)

Disadvantage:
• Larger footprint (16 to 70 Mb)

Markets/Applications targeted:
• High-end telephony applications (voice portals, news)
• New generation of navigation terminals
• Public address

High Quality TTS, voice enabled by Sayso & BrightSpeech

High Quality TTS language availability

Language                 Gender                   Status
French                   Female, Male             Available
US English               Female, Male             Available
UK English               Female, Male             Available
German                   Female (Male: Q1-2006)   Available
Spanish (Castilian)      Female                   Available
Italian                  Female                   Available
Polish                   Female                   Available
Swedish                  Female, Male             Available
Arabic                   Female, Male             Available
Dutch (NL)               Female                   Available
Dutch (B)                Female                   Available
Norwegian                Female                   Available
Continental Portuguese   Female
Danish                                            ****
Mexican Spanish                                   ***
Finnish                                           **
Canadian French                                   *

Hybrid technology of Hidden Markov Models and Artificial Neural Networks

Advantages:
• Very high accuracy in difficult contexts
• High dialog flexibility
• Lip-sync and language-learning capabilities through phoneme-level discrimination
• Speaker independent
• Accurate voice activation in noisy environments

Markets/Applications targeted:
• Industrial data collection: inventories, picking, …
• Automotive
• Name dialing
• Multimedia command & control / language learning

ASR, voice enabled by Babear

Language Robustness Status

US English +++ Available

UK English +++ Available

Spanish + Available

French +++ Available

German +++ Available

Italian ++ Available

Dutch + Available

Greek + Available

Arabic ++ Available

ASR language availability

Acapela’s market coverage

Acapela’s Markets

Solutions for Telecom, Automotive, Accessibility, Mobility, Industry, Multimedia and Consumer Electronics.

Leading in 3 major, mature markets:

Telecom, Automotive, Accessibility

Acapela’s Markets

Acapela’s main Markets

Telecom: server-based vocalization of content for multiple users over the phone
• For companies: unified messaging, auto attendant, CRM
• For telcos: unified messaging, voice portal, SMS2Voice, directory and reverse directory
• For contact centers: call automation, FAQ

Automotive: on-board and off-board speech solutions
• On-board & off-board car navigation systems
• Traffic information
• PDA-based applications
• Telematics

Accessibility: assistive technologies
• Screen readers
• Reading machines
• Voice-controlled mobile phones

Creating new speech market opportunities in:

Acapela’s Markets

>> Mobility

• Cell phones

• Navigation on PDAs


Acapela’s Markets

>> Industry

• Public Address

• Alarm & Supervision

• Warehousing, Production Line


Acapela’s Markets

>> Multimedia

• Edutainment

• Education

• Language learning

• E-learning


Acapela’s Markets

>> Consumer Electronics, …

• Talking dictionary devices

• Toys

giving you the say

december 19, 2005 fpms. acapela’s corporate profile

Documents

acapelas technologies

automotive slide

organization slide

fpms slide

structure of capital

speech solutions

kbytes text prepro

elan sayso tts leader