eNTERFACE 08 Project #1 “MultiParty Communication with a Tour Guide ECA”

Final presentation

August 29th, 2008

Outline

• Project Overview

• Objectives, Issues & Work Done

• System Overview

• Configuration and Design

• Conclusion

Project Objectives

• Main objective: develop an ECA Tour Guide system which can interact with one or two users

• Research features:

• multiparty dialogue model and scenario between two humans and an ECA

• handling and combining input data: users’ presence and behaviors (speech, tracking)

• gaze behavior control and nonverbal model of the ECA

Work done: Component Functionality Overview

• We implemented components which support a scenario based on narration and interruptions

• The ECA is the narrator; users can ask context-related questions (“where”, “how”, “when”)

• Speaker, addressee and listener identification; ECA gaze model

• The ECA can ask users simple “yes/no” questions to keep their attention

• The system can detect users’ appearance and dynamically initiate/end a session

• The system can detect and handle situations in which users are paying less attention

• The system can recover from failures (e.g. SR does not recognize a user’s speech)

Work done...about to be done...

• Components are implemented

• System is being integrated

• Debugging and full testing are needed

• Not supported:

• Detection of situations in which users start their own conversation

• Detection of speech collisions between users

• Smart scheduling and control of the ECA’s behaviors

System Configuration

[Figure: system configuration diagram with three stages (Input, Central Part, Output). Components shown: Speech Recognition, Okao Vision, OpenCV, NonVerbal Input Understanding, Decision Making Planner (Scenario Component), Animation Player.]

Speech Recognition

• Functionality:

• Detects users’ requests (“Where”, “How”, “When”, “Who”)

• Detects users’ intention to leave the system

• Detects answers to simple “yes/no” questions

• Detects unknown words

• Implementation:

• Keyword detection with confidence score and speech duration, implemented with the Loquendo API
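
The mapping from recognized keywords to the event types listed above could look roughly like the sketch below. It does not call the Loquendo API; the `SpeechResult` structure, the keyword sets, and the 0.5 confidence threshold are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Hypothetical keyword sets; the grammar actually used with Loquendo is not given.
REQUEST_KEYWORDS = {"where", "how", "when", "who"}
LEAVE_KEYWORDS = {"bye", "goodbye"}
ANSWER_KEYWORDS = {"yes", "no"}


@dataclass
class SpeechResult:
    keyword: str       # recognized keyword
    confidence: float  # recognizer confidence, assumed to lie in 0.0 .. 1.0
    duration_s: float  # duration of the detected speech in seconds


def classify(result: SpeechResult, min_confidence: float = 0.5) -> str:
    """Map one recognition result to an event type for the central part."""
    word = result.keyword.lower()
    if result.confidence < min_confidence:
        # Low-confidence results are treated like unknown words (assumption).
        return "unknown"
    if word in REQUEST_KEYWORDS:
        return "request"
    if word in LEAVE_KEYWORDS:
        return "leave"
    if word in ANSWER_KEYWORDS:
        return "answer"
    return "unknown"
```

An "unknown" result is what would let the central part trigger recovery, for example by asking the user to repeat.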

Nonverbal Inputs and Understanding

Nonverbal Inputs: Users’ appearance and face orientation

• Functionality of components:

• Detect motion and users’ appearance/disappearance

• Detect the number of users present

• Detect each user’s face orientation and increased/decreased attention

• left user, right user

• Implementation:

• OpenCV (motion) & Okao Vision (face orientation, gazing)
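
One way to turn per-frame face orientation into an "increased/decreased attention" signal is a sliding window over recent frames, sketched below. No OpenCV or Okao Vision calls are used; the yaw angle input, the window size, and both thresholds are assumptions for illustration, not values from the project.

```python
from collections import deque
from typing import List


class AttentionTracker:
    """Tracks how often one user faced the screen in the last `window` frames."""

    def __init__(self, window: int = 30, max_yaw_deg: float = 20.0,
                 min_ratio: float = 0.5):
        self.samples = deque(maxlen=window)  # True = facing the screen
        self.max_yaw_deg = max_yaw_deg       # assumed tolerance around frontal
        self.min_ratio = min_ratio           # assumed attention threshold

    def update(self, yaw_deg: float) -> None:
        """Record one face-orientation sample (yaw in degrees, 0 = frontal)."""
        self.samples.append(abs(yaw_deg) <= self.max_yaw_deg)

    def attention_ratio(self) -> float:
        if not self.samples:
            return 0.0
        return sum(self.samples) / len(self.samples)

    def is_attentive(self) -> bool:
        return self.attention_ratio() >= self.min_ratio


def label_users(face_xs: List[float], frame_width: float) -> List[str]:
    """Label detected faces as left/right user by horizontal position."""
    if len(face_xs) == 1:
        return ["left" if face_xs[0] < frame_width / 2 else "right"]
    # With two faces, the one with the smaller x coordinate is the left user.
    order = sorted(range(len(face_xs)), key=lambda i: face_xs[i])
    labels = [""] * len(face_xs)
    for rank, index in enumerate(order[:2]):
        labels[index] = "left" if rank == 0 else "right"
    return labels
```

One tracker per user (left, right) would let the central part react when either user's attention drops.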

Decision Making Component

Decision Making Component - Functionalities

• Makes decisions on “when and what to do to whom”:

• Handles multimodal input events (number of users, attention, speech channels)

• Handles user interruptions while the ECA is speaking

• Handles failures from the SR component

• Generates multimodal output and controls the ECA’s gazing

• Simple rule: “First one will be served”

• “Yes/no” questions are the exception

• No domain knowledge or behavior scheduling
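
The “first one will be served” rule and its “yes/no” exception can be illustrated with the minimal sketch below. The `SpeechEvent` structure and the `awaiting_yes_no` flag are hypothetical names introduced here, not part of the project’s code.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SpeechEvent:
    user: str          # "left" or "right" (one speech channel per user)
    text: str          # recognized keyword
    start_time: float  # seconds since session start


def select_events(events: List[SpeechEvent],
                  awaiting_yes_no: bool) -> List[SpeechEvent]:
    """Choose which speech events the ECA should react to."""
    ordered = sorted(events, key=lambda e: e.start_time)
    if not ordered:
        return []
    if awaiting_yes_no:
        # Exception: after a yes/no question, keep the first answer
        # from each user instead of serving only the earliest speaker.
        first_per_user: Dict[str, SpeechEvent] = {}
        for event in ordered:
            first_per_user.setdefault(event.user, event)
        return list(first_per_user.values())
    # Default rule: only the user who spoke first is served.
    return [ordered[0]]
```

With two events from different users, `select_events(events, False)` returns only the earlier one, while `select_events(events, True)` returns one event per user.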

Decision Making Component - Implementation

• The Decision Making Component uses ideas from information state theory [Larsson’00] and AIML:

• The progress of the dialogue is represented by a set of variables

• The most appropriate plans are selected and scheduled by simple inference

• Timing control to collect the messages from both speech channels in the case of “yes/no” questions

• The component is being developed using the MIDIKI toolkit as a reference
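
As a rough illustration of the information-state idea, the sketch below keeps the dialogue progress in a dictionary of variables and picks a plan with the highest-priority rule whose condition holds. It is not based on MIDIKI’s actual classes; every variable name, rule, priority, plan name, and the 3-second timeout is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

State = Dict[str, object]  # the information state: a set of dialogue variables


@dataclass
class Rule:
    name: str
    priority: int
    condition: Callable[[State], bool]
    plan: str  # hypothetical name of the plan to schedule


RULES: List[Rule] = [
    Rule("end_session", 100,
         lambda s: s["users_present"] == 0 and s["session_active"], "end_session"),
    Rule("start_session", 90,
         lambda s: s["users_present"] > 0 and not s["session_active"], "greet"),
    Rule("recover_sr", 80, lambda s: s["sr_failed"], "ask_to_repeat"),
    Rule("answer", 70, lambda s: s["pending_request"] is not None, "answer_request"),
    Rule("regain_attention", 60, lambda s: s["attention_low"], "ask_yes_no"),
    Rule("narrate", 0, lambda s: s["session_active"], "continue_narration"),
]


def select_plan(state: State) -> Optional[str]:
    """Simple inference: the first matching rule in priority order wins."""
    for rule in sorted(RULES, key=lambda r: r.priority, reverse=True):
        if rule.condition(state):
            return rule.plan
    return None


def yes_no_complete(answers: Dict[str, str], users_present: int,
                    asked_at: float, now: float, timeout_s: float = 3.0) -> bool:
    """Wait for an answer on every speech channel, but no longer than timeout_s."""
    return len(answers) >= users_present or (now - asked_at) >= timeout_s
```

Keeping the rules in a flat, prioritized list makes the selection easy to trace, at the cost of having no real scheduling of behaviors.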

Animation Player

• Functionality:

• The animation player uses scripted behaviors (GSML language) to generate speech and animation

• A model of gaze in multiparty communication is supported:

• Gaze is controlled at the utterance level

• The gaze pattern follows conversational rules (who is the addressee, who is a listener)

• Implementation:

• Visage SDK (based on MPEG-4 standard)

• 3ds Max
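
Utterance-level gaze control can be pictured as assigning one gaze target to each utterance, as in the sketch below. It does not use GSML or the Visage SDK; the addressee labels and the rule of alternating between users when both are addressed are assumptions, not the project’s gaze model.

```python
from itertools import cycle
from typing import List, Tuple

USERS = ("left", "right")


def plan_gaze(utterances: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Assign one gaze target per utterance.

    Each input item is (text, addressee), where addressee is "left",
    "right" or "both". The result pairs each text with a gaze target.
    """
    alternate = cycle(USERS)  # used only when both users are addressed
    plan = []
    for text, addressee in utterances:
        if addressee == "both":
            # Assumption: alternate the gaze between the two users.
            target = next(alternate)
        else:
            # The ECA looks at the addressee while speaking to them.
            target = addressee
        plan.append((text, target))
    return plan
```

Because the target is fixed per utterance, the gaze cannot shift in the middle of a sentence, which matches the utterance-level granularity described above.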

Conclusion

• Components to support context-based two-party human-ECA communication are implemented

• The system is being integrated, but is not yet fully tested

• Component issues:

• missing face tracking and domain knowledge about users’ behaviors

• simple dialogue management and control (no smart scheduling and smart gaze control)

• Future directions: system debugging and testing, implementing tracking, improving gaze control, studying users’ behaviors and gazing, system evaluation
