![Page 1: 1 Galatea: Open-Source Software for Developing Anthropomorphic Spoken Dialog Agents S. Kawamoto, et al. October 27, 2004](https://reader035.vdocuments.mx/reader035/viewer/2022070416/5697c01b1a28abf838ccf971/html5/thumbnails/1.jpg)
Galatea: Open-Source Software for Developing Anthropomorphic Spoken Dialog Agents
S. Kawamoto, et al.
October 27, 2004

Agenda
• Introduction
• Toolkit Design and Outline
  – Speech recognition module
  – Speech synthesis module
  – Facial image synthesis module
  – Agent manager
  – Virtual machine model
  – Task manager
  – Prototyping tools
• Prototype Systems
• Conclusions

Introduction
• An anthropomorphic spoken dialog agent (ASDA) is one of the next-generation human-computer interfaces
• Many ASDA systems have been developed, but developing a high-quality ASDA system is still challenging
  – The challenge: providing an unlimited number of life-like agent characters with different faces and voices, just like humans
• For this reason, Galatea has been developed to provide a platform for building next-generation ASDA systems

Features of the Toolkit
• Easy customization
  – Model-based approaches: once the model parameters are trained, facial expressions and voice quality can be controlled easily
• Key techniques for natural spoken dialog
  – Incremental speech recognition, synchronization between speech and facial animation, etc.
• Modularity of functional units
  – Simple architecture to manage each functional unit
  – Users can develop, improve, and debug each functional unit
• Open-source free software

Toolkit Design and Outline
• The agent manager works as an inter-module communication manager
• Devices are directly managed by the modules which utilize them
• A new function is added by adding a new module for the function and connecting that module to the agent manager
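
The hub-style design described on this slide can be illustrated with a minimal sketch. It is not Galatea's actual implementation: the class `AgentManagerHub`, its methods, and the `GESTURE` module are hypothetical names chosen for illustration, and real modules would be separate processes rather than in-memory callbacks.

```python
from typing import Callable, Dict, List

# A handler receives one text message addressed to its module.
Handler = Callable[[str], None]


class AgentManagerHub:
    """Hypothetical hub: modules exchange messages only through the manager."""

    def __init__(self) -> None:
        self._modules: Dict[str, Handler] = {}

    def connect(self, name: str, handler: Handler) -> None:
        """Register a module under a name so that messages can reach it."""
        self._modules[name] = handler

    def send(self, target: str, message: str) -> None:
        """Deliver a message to the named module, or fail if it is unknown."""
        if target not in self._modules:
            raise KeyError(f"no module connected as {target!r}")
        self._modules[target](message)


# Stand-ins for real modules: each one just records what it receives.
ssm_log: List[str] = []
fsm_log: List[str] = []

hub = AgentManagerHub()
hub.connect("SSM", ssm_log.append)
hub.connect("FSM", fsm_log.append)
hub.send("SSM", "set Speak = now")

# Adding a new function means connecting one more module (name is hypothetical).
gesture_log: List[str] = []
hub.connect("GESTURE", gesture_log.append)
hub.send("GESTURE", "nod")
```

In this sketch the existing modules are untouched when `GESTURE` is added, which is the property the slide attributes to the architecture.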

Speech Recognition Module (SRM)
• Major interfaces of SRM are as follows:
  – Outputs: recognition result (XML format) and engine status ("busy", "waiting", ...)
  – Control commands: reload the grammar, change the settings of the speech recognition engine
  – Grammar representation: transforms the XML grammar into a format that is accepted by the speech recognition engine
[Figure: SRM block diagram. Components: Command Interpreter, Grammar Transformer, Speech Recognition Engine. Signals: Speech input, Grammar, Request, Response.]
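
As a rough illustration of the grammar transformer's role, the sketch below converts a small XML grammar into flat rule lines. Both formats are assumptions: the slides do not show Galatea's XML grammar schema or the engine's native format, so the `<grammar>`, `<rule>`, and `<item>` elements and the `name : a | b` output are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List

# Hypothetical XML grammar; the real Galatea schema is not shown in the slides.
SAMPLE_GRAMMAR = """<grammar>
  <rule id="greeting"><item>hello</item><item>good morning</item></rule>
  <rule id="command"><item>start</item><item>stop</item></rule>
</grammar>"""


def transform_grammar(xml_text: str) -> List[str]:
    """Convert each <rule> into one 'id : alt1 | alt2' line (assumed engine format)."""
    root = ET.fromstring(xml_text)
    lines: List[str] = []
    for rule in root.findall("rule"):
        # Collect the text of every alternative, skipping empty items.
        alternatives = [item.text.strip() for item in rule.findall("item") if item.text]
        lines.append(f"{rule.get('id')} : {' | '.join(alternatives)}")
    return lines


engine_rules = transform_grammar(SAMPLE_GRAMMAR)
```

Keeping the transformation in its own component means the dialog side can stay with one XML representation even if the recognition engine is replaced.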

Speech Synthesis Module (SSM)
• Accepts arbitrary Japanese text
• Synthesizes speech with a human voice
  – An HMM-based speech synthesis method is employed
• Synchronizes lip movement with speech
• SSM can interrupt speech output to cope with any interruption by the user
[Figure: SSM block diagram. Components: Command Interpreter, Text Analyzer, Dictionary, Waveform Generation Engine, Acoustic Models, Speech Output.]

Facial Image Synthesis Module (FSM)
• Supports high-quality facial image synthesis, animation control, and precise lip-sync with voice
• A GUI is provided to fit a generic face wire-frame model onto a full-face snapshot image
• Facial action control
  – Mouth shape
  – Facial expression

Agent Manager (AM)
• Integrator of all the modules of the ASDA system
• Plays a central role in communication
• Acts as the synchronization manager between SSM and FSM to achieve precise lip-sync
[Figure: AM components: Dispatcher, Macro-command interpreter.]
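
One way to picture the lip-sync coordination is as a shared timeline. The sketch below assumes, without confirmation from the slides, that the synthesis side can report per-phoneme durations and that the face side consumes timed mouth-shape keyframes. The function name, the data layout, and the sample values are all hypothetical.

```python
from typing import List, Tuple


def schedule_mouth_shapes(
    phonemes: List[Tuple[str, int]], start_ms: int = 0
) -> List[Tuple[int, str]]:
    """Turn (phoneme, duration_ms) pairs into (onset_ms, phoneme) keyframes.

    Each keyframe starts where the previous phoneme ends, so audio playback
    and face animation started at the same start_ms stay aligned.
    """
    keyframes: List[Tuple[int, str]] = []
    onset = start_ms
    for phoneme, duration_ms in phonemes:
        keyframes.append((onset, phoneme))
        onset += duration_ms
    return keyframes


# Illustrative durations only; real values would come from the synthesizer.
durations = [("k", 50), ("o", 120), ("N", 80)]
keyframes = schedule_mouth_shapes(durations, start_ms=1000)
```

Under these assumptions the manager only needs to pass the same start time to both modules for the mouth shapes to track the speech.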

Virtual Machine Model
• Each module interface is modeled as a machine with slots
  – Each slot indicates machine status
• Slot values are changed by a common command set
  – "set Speak = now" means starting voice synthesis of a given text immediately
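
The slot model can be sketched as a small class that parses `set` commands. Only the `set Speak = now` form comes from the slide. The class name `SlotMachine`, the `Text` slot, the initial value `idle`, and the error handling are assumptions made for illustration.

```python
import re
from typing import Dict

# Matches commands of the form: set <Slot> = <value>
_SET_COMMAND = re.compile(r"^set\s+(\w+)\s*=\s*(.+)$")


class SlotMachine:
    """Hypothetical module interface: named slots updated by text commands."""

    def __init__(self, **slots: str) -> None:
        self.slots: Dict[str, str] = dict(slots)

    def execute(self, command: str) -> None:
        """Apply one 'set Slot = value' command to an existing slot."""
        match = _SET_COMMAND.match(command.strip())
        if match is None:
            raise ValueError(f"unsupported command: {command!r}")
        name, value = match.group(1), match.group(2).strip()
        if name not in self.slots:
            raise KeyError(f"unknown slot: {name}")
        self.slots[name] = value


# Slot names other than Speak are illustrative.
ssm = SlotMachine(Text="", Speak="idle")
ssm.execute("set Text = hello")
ssm.execute("set Speak = now")
```

Because every module accepts the same command shape, a controlling component does not need module-specific code to change a module's state.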

Task Manager (TM)
• Defines the dialog as a set of interactions which can be represented by a dialog description language
• The goal in developing the TM is for the system to support several types of dialog description languages
  – VoiceXML: high-level language; task-oriented information and the intentions of the participants
  – PDOC (primitive dialog operation commands): low-level language; device events and sequence control

Prototyping Tools
• "Galatea Interaction Builder (IB)"
[Figure: Interaction Builder workflow. Actors and components: Application Developer, Interaction Builder, Galatea MMI System, XISL File, web site. Actions: Design Scenario, Create XISL Document, Download and Execute XISL, Check.]

Prototype Systems
• Echo-back task

Conclusions
• A human-like spoken dialog agent is one of the promising man-machine interfaces for the next generation
• Galatea is a software toolkit for developing human-like spoken dialog agents
• Because of its high modularity and simple communication architecture, it will speed up research and application development based on ASDA