Multimodal Expressive Embodied Conversational Agents


Page 1: Multimodal Expressive  Embodied Conversational Agents

Université Paris 8

Multimodal Expressive Embodied Conversational Agents

Catherine Pelachaud, Elisabetta Bevacqua, Nicolas Ech Chafai (FT), Maurizio Mancini, Magalie Ochs (FT), Christopher Peters, Radek Niewiadomski

Page 2: Multimodal Expressive  Embodied Conversational Agents

ECAs Capabilities

Anthropomorphic autonomous figures
A new form of human-machine interaction
Study of human communication and human-human interaction
ECAs ought to be endowed with dialogic and expressive capabilities
Perception: an ECA must be able to pay attention to and perceive the user and the context she is placed in

Page 3: Multimodal Expressive  Embodied Conversational Agents

ECAs Capabilities

Interaction:
– speaker and addressee emit signals
– speaker perceives feedback from the addressee
– speaker may decide to adapt to the addressee's feedback
– consider the social context

Generation: expressive, synchronized visual and acoustic behaviors
– produce expressive behaviours: words, voice, intonation, gaze, facial expression, gesture, body movements, body posture

Page 4: Multimodal Expressive  Embodied Conversational Agents

Synchrony Tool – BEAT

Cassell et al., MIT Media Lab

Decomposition of text into theme and rheme
Linked to WordNet
Computation of:
– intonation
– gaze
– gesture

Page 5: Multimodal Expressive  Embodied Conversational Agents

Virtual Training Environments – MRE

(J. Gratch, L. Johnson, S. Marsella et al., USC)

Page 6: Multimodal Expressive  Embodied Conversational Agents

Interactive System

Real estate agent
Gesture synchronized with speech and intonation
Small talk
Dialog partner

Page 7: Multimodal Expressive  Embodied Conversational Agents

MAX (S. Kopp, University of Bielefeld)

Gesture understanding and imitation

Page 8: Multimodal Expressive  Embodied Conversational Agents

Gilbert and George at the Bank (UPenn, 1994)

Page 9: Multimodal Expressive  Embodied Conversational Agents


Page 10: Multimodal Expressive  Embodied Conversational Agents

Greta

Page 11: Multimodal Expressive  Embodied Conversational Agents

Problem to Be Solved

Human communication is endowed with three devices to express communicative intention:
– verbs and formulas
– intonation and paralinguistics
– facial expression, gaze, gesture, body movement, posture…

Problem: for any communicative act, the speaker has to decide:
– which nonverbal behaviors to show
– how to execute them

Page 12: Multimodal Expressive  Embodied Conversational Agents

Verbal and Nonverbal Communication

Suppose I want to advise a friend to take her umbrella because it is raining.

Which signals do I use?

Verbal signal: use of a syntactically complex sentence:
"Take your umbrella because it is raining."

Verbal + nonverbal signals:
"Take your umbrella" + point to the window to show the rain by a gesture or by gaze

Page 13: Multimodal Expressive  Embodied Conversational Agents

Multimodal Signals

The whole body communicates by using:
– verbal acts (words and sentences)
– prosody, intonation (nonverbal vocal signals)
– gesture (hand and arm movements)
– facial action (smile, frown)
– gaze (eye and head movements)
– body orientation and posture (trunk and leg movements)

All these systems of signals have to cooperate in expressing the overall meaning of the communicative act.

Page 14: Multimodal Expressive  Embodied Conversational Agents

Multimodal Signals

Accompany the flow of speech
Synchronized at the verbal level
Punctuate accented phonemic segments and pauses
Substitute for word(s)
Emphasize what is being said
Regulate the exchange of speaking turns

Page 15: Multimodal Expressive  Embodied Conversational Agents

Synchronization

There exists an isomorphism between patterns of speech, intonation and facial actions.

Different levels of synchrony:
– phoneme level (blink)
– word level (eyebrow)
– phrase level (hand gesture)

Interactional synchrony: synchrony between speaker and addressee

Page 16: Multimodal Expressive  Embodied Conversational Agents

Taxonomy of Communicative Functions (I. Poggi)

The speaker may provide three broad types of information:
– information about the world: deictic, iconic (adjectival), …
– information about the speaker's mind:
    belief (certainty, adjectival)
    goal (performative, rheme/theme, turn-system, belief relation)
    emotion
    meta-cognitive
– information about the speaker's identity (sex, culture, age…)

Page 17: Multimodal Expressive  Embodied Conversational Agents

Multimodal Signals (Isabella Poggi)

Multimodal signals are characterized by their placement with respect to the linguistic utterance and their significance in transmitting information. E.g.:
– a raised eyebrow may signal surprise, emphasis, a question mark, a suggestion…
– a smile may express happiness, be a polite greeting, be a backchannel signal…

Two pieces of information are needed to characterize a multimodal signal:
– its meaning
– its visual action

Page 18: Multimodal Expressive  Embodied Conversational Agents

Lexicon = (meaning, signal)

Expression meaning:
– deictic: this, that, here, there
– adjectival: small, difficult
– certainty: certain, uncertain…
– performative: greet, request
– topic comment: emphasis
– belief relation: contrast, …
– turn allocation: take/give turn
– affective: anger, fear, happy-for, sorry-for, envy, relief, …

Expression signal:
– deictic: gaze direction
– certainty: certain: palm-up open hand; uncertain: raised eyebrow
– adjectival: small eye aperture
– belief relation: contrast: raised eyebrow
– performative: suggest: small raised eyebrow, head aside; assert: horizontal ring
– emotion: sorry-for: head aside, inner eyebrow up; joy: raising fist up
– emphasis: raised eyebrows, head nod, beat
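As a rough illustration only, such a (meaning, signal) lexicon can be held as a mapping from communicative function and meaning to candidate signals. The entry and function names below are made up for this sketch; they are not the actual Greta lexicon format.

```python
# Sketch of a (meaning, signal) lexicon; entries are illustrative, not the real library.
LEXICON = {
    ("certainty", "certain"): ["palm_up_open_hand"],
    ("certainty", "uncertain"): ["raised_eyebrow"],
    ("belief_relation", "contrast"): ["raised_eyebrow"],
    ("performative", "suggest"): ["small_raised_eyebrow", "head_aside"],
    ("performative", "assert"): ["horizontal_ring_gesture"],
    ("affective", "sorry_for"): ["head_aside", "inner_eyebrow_up"],
    ("topic_comment", "emphasis"): ["raised_eyebrows", "head_nod", "beat"],
}

def signals_for(function: str, meaning: str) -> list[str]:
    """Return the candidate signals expressing a given communicative meaning."""
    return LEXICON.get((function, meaning), [])
```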

Page 19: Multimodal Expressive  Embodied Conversational Agents

Representation Language

Affective Presentation Markup Language (APML):
– describes the communicative functions
– works at the meaning level, not the signal level

<APML>
  <turn-allocation type="take turn">
    <performative type="greet">
      Good Morning, Angela.
    </performative>
    <affective type="happy">
      It is so <topic-comment type="comment">wonderful</topic-comment> to see you again.
    </affective>
    <certainty type="certain">
      I was <topic-comment type="comment">sure</topic-comment> we would do so, one day!
    </certainty>
  </turn-allocation>
</APML>

Page 20: Multimodal Expressive  Embodied Conversational Agents

Facial Description Language

Facial expressions are defined as (meaning, signal) pairs stored in a library.

Hierarchical set of classes:
– Facial basis (FB) class: basic facial movement
– An FB may be represented as a set of MPEG-4 compliant FAPs or, recursively, as a combination of other FBs using the '+' operator:
    FB = {fap3=v1, …, fap69=vk};
    FB' = c1*FB1 + c2*FB2;
  where c1 and c2 are constants and FB1 and FB2 can be:
    – previously defined FBs
    – FBs of the form {fap3=v1, …, fap69=vk}
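A minimal sketch of this algebra, assuming an FB is a plain dictionary mapping FAP indices to values; the FAP indices, values and function names are illustrative, not the system's actual classes.

```python
from collections import defaultdict

def fb_scale(fb: dict[int, float], c: float) -> dict[int, float]:
    """c * FB: scale every FAP value by a constant."""
    return {fap: c * v for fap, v in fb.items()}

def fb_add(*fbs: dict[int, float]) -> dict[int, float]:
    """FB1 + FB2 + ...: sum FAP values, merging the FAPs used by each FB."""
    out: dict[int, float] = defaultdict(float)
    for fb in fbs:
        for fap, v in fb.items():
            out[fap] += v
    return dict(out)

# e.g. FB' = 0.5*FB1 + 1.0*FB2 (FAP indices and values chosen arbitrarily)
raise_eyebrow = {31: 120.0, 32: 120.0}
upper_lid_raise = {19: 80.0, 20: 80.0}
fb_prime = fb_add(fb_scale(raise_eyebrow, 0.5), upper_lid_raise)
```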

Page 21: Multimodal Expressive  Embodied Conversational Agents

Facial Basis Class

Examples of facial basis classes:
– Eyebrow: small_frown, left_raise, right_raise
– Eyelid: upper_lid_raise
– Mouth: left_corner_stretch, left_corner_raise

[Images: two facial bases combined ('+') into a composite expression ('=')]

Page 22: Multimodal Expressive  Embodied Conversational Agents

Facial Displays

Every facial display (FD) is made up of one or more FBs:
– FD = FB1 + FB2 + FB3 + … + FBn;
– surprise = raise_eyebrow + raise_lid + open_mouth;
– worried = (surprise*0.7) + sadness;

Page 23: Multimodal Expressive  Embodied Conversational Agents

Facial Displays

Probabilistic mapping between the tags and signals:
– e.g. happy_for = (smile*0.5, 0.3) + (smile, 0.25) + (smile*2 + raised_eyebrow, 0.35) + (nothing, 0.1)

Definition of a function class for the (meaning, signal) association

Class communicative function:
– certainty
– adjectival
– performative
– affective
– …
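A sketch of how such a probabilistic mapping could be sampled at display time, using the happy_for weights quoted above; the code itself is illustrative, not the actual implementation.

```python
import random

# Each alternative is (signal combination, probability); the probabilities sum to 1.
HAPPY_FOR = [
    (["smile*0.5"], 0.30),
    (["smile"], 0.25),
    (["smile*2", "raised_eyebrow"], 0.35),
    ([], 0.10),  # "nothing": no visible signal
]

def sample_display(alternatives):
    """Pick one signal combination according to its probability."""
    combos = [c for c, _ in alternatives]
    weights = [p for _, p in alternatives]
    return random.choices(combos, weights=weights, k=1)[0]

print(sample_display(HAPPY_FOR))
```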

Page 24: Multimodal Expressive  Embodied Conversational Agents

Facial Temporal Course

Page 25: Multimodal Expressive  Embodied Conversational Agents

Gestural Lexicon

Certainty:
– certain: palm-up open hand
– uncertain: showing empty hands while lowering forearms

Belief-relation:
– list of items of the same class: numbering on fingers
– temporal relation: fist with extended hand moves back and forth behind one's shoulder

Turn-taking:
– hold the floor: raise hand, palm toward hearer

Performative:
– assert: horizontal ring
– reproach: extended index, palm to left, rotating up and down at the wrist

Emphasis: beat

Page 26: Multimodal Expressive  Embodied Conversational Agents

Gesture Specification Language

Scripting language for hand-arm gestures, based on formational parameters [Stokoe]:
– hand shape specified using HamNoSys [Prillwitz et al.]
– arm position: concentric squares in front of the agent [McNeill]
– wrist orientation: palm and finger-base orientation

Gestures are defined by a sequence of timed key poses: gesture frames

Gestures are broken down temporally into distinct (optional) phases:
– gesture phases: preparation, stroke, hold, retraction
– change of formational components over time
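A minimal sketch of how a timed key pose ('gesture frame') and its phases could be represented as data; the field values for the 'certain' gesture are invented for illustration, not taken from the actual specification files.

```python
from dataclasses import dataclass

@dataclass
class GestureFrame:
    """One timed key pose described by formational parameters."""
    time: float                   # seconds relative to gesture start
    hand_shape: str               # e.g. a HamNoSys hand-shape label
    arm_position: str             # sector of the gesture space in front of the agent
    palm_orientation: str
    finger_base_orientation: str

@dataclass
class Gesture:
    """A gesture as optional phases, each an ordered list of key poses."""
    name: str
    phases: dict[str, list[GestureFrame]]  # "preparation", "stroke", "hold", "retraction"

certain = Gesture(
    name="certain",
    phases={
        "preparation": [GestureFrame(0.0, "flat_open", "center-center", "palm up", "away")],
        "stroke":      [GestureFrame(0.4, "flat_open", "center-center", "palm up", "away")],
        "retraction":  [GestureFrame(0.8, "relaxed", "rest", "palm in", "down")],
    },
)
```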

Page 27: Multimodal Expressive  Embodied Conversational Agents

Gesture specification example: Certain

Page 28: Multimodal Expressive  Embodied Conversational Agents

Gesture Temporal Course

rest position → preparation → stroke start – stroke end → retraction → rest position

Page 29: Multimodal Expressive  Embodied Conversational Agents

ECA Architecture

Page 30: Multimodal Expressive  Embodied Conversational Agents

ECA Architecture

Input to the system: APML-annotated text
Output of the system: animation files and a WAV file for the audio

The system:
– interprets APML-tagged dialogs, i.e. all communicative functions
– looks up in a library the mapping between the meaning (specified by the XML tag) and signals
– decides which signals to convey on which modalities
– synchronizes the signals with speech at different levels (word, phoneme or utterance)

Page 31: Multimodal Expressive  Embodied Conversational Agents

Behavioral Engine

Page 32: Multimodal Expressive  Embodied Conversational Agents

Modules

APML Parser: XML parser
TTS Festival: manages the speech synthesis and gives us the list of phonemes and their durations
Expr2Signal Converter: given a communicative function and its meaning, returns the list of facial signals
Conflicts Resolver: resolves the conflicts that may arise when more than one facial signal should be activated on the same facial parts
Face Generator: converts the facial signals into MPEG-4 FAP values
Viseme Generator: converts each phoneme, given by Festival, into a set of FAPs
MPEG4 FAP Decoder: an MPEG-4 compliant facial animation engine
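To make the data flow concrete, here is a toy skeleton whose stages mirror the module list above; every function body is a placeholder and none of the names correspond to the real code base.

```python
def parse_apml(apml_text: str):
    """APML Parser: extract (function, meaning, text) triples (placeholder)."""
    return [("performative", "greet", "Good Morning, Angela.")]

def festival_tts(text: str):
    """TTS Festival: return (phoneme, duration) pairs and audio bytes (placeholder)."""
    return [("g", 0.06), ("u", 0.09), ("d", 0.05)], b"RIFF"

def expr2signal(function: str, meaning: str, lexicon: dict):
    """Expr2Signal Converter: map a communicative function to facial signals."""
    return lexicon.get((function, meaning), [])

def resolve_conflicts(signals):
    """Conflicts Resolver: keep only the first signal claiming each facial part."""
    used, kept = set(), []
    for part, value in signals:
        if part not in used:
            used.add(part)
            kept.append((part, value))
    return kept

def generate_faps(signals, phonemes):
    """Face Generator + Viseme Generator: one FAP frame per phoneme (placeholder)."""
    return [{"phoneme": p, "faps": dict(signals)} for p, _ in phonemes]

def run_engine(apml_text: str, lexicon: dict):
    functions = parse_apml(apml_text)
    phonemes, wav = festival_tts(" ".join(t for _, _, t in functions))
    signals = [s for f, m, _ in functions for s in expr2signal(f, m, lexicon)]
    faps = generate_faps(resolve_conflicts(signals), phonemes)
    return faps, wav  # animation frames + audio, handed to the MPEG-4 FAP Decoder

lexicon = {("performative", "greet"): [("eyebrow", "raise"), ("head", "nod")]}
animation, audio = run_engine("<APML>...</APML>", lexicon)
```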

Page 33: Multimodal Expressive  Embodied Conversational Agents

TTS Festival

Drives the synchronization of facial expressions
Synchronization implemented at the word level
– timing of facial expressions connected to the text embedded between the markers

Use of Festival's tree structure to compute expression durations

Page 34: Multimodal Expressive  Embodied Conversational Agents

Expr2Signal Converter

Instantiation of APML tags: meaning of a given communicative function
Converts markers into facial signals
Uses a library containing the lexicon of the type (meaning, facial expressions)

Page 35: Multimodal Expressive  Embodied Conversational Agents

Gaze Model

Based on the communicative function model of Isabella Poggi.

This model predicts what the value of gaze should be in order to convey a given meaning in a given conversational context.

For example: if the agent wants to emphasize a given word, the model will output that the agent should gaze at her conversant.

Page 36: Multimodal Expressive  Embodied Conversational Agents

Gaze Model

Very deterministic behavior model: to every communicative function associated with a meaning corresponds the same signal (with probabilistic changes).

Event-driven model: the associated signals are computed only when a communicative function is specified, and only then may the corresponding behavior vary.

Page 37: Multimodal Expressive  Embodied Conversational Agents

Gaze Model

Several drawbacks, as there is no temporal consideration:
– no consideration of past and current gaze behavior when computing the new one
– no consideration of how long the current gaze state of S and L has lasted

Page 38: Multimodal Expressive  Embodied Conversational Agents

Gaze Algorithm

Two steps:
1. Communicative prediction:
   – apply the communicative function model to compute the gaze behavior that conveys a given meaning for S (speaker) and L (listener)
2. Statistical prediction:
   – the communicative gaze model is probabilistically modified by a statistical model defined with constraints:
     – what the communicative gaze behavior of S and L is
     – which gaze behavior S and L were in
     – the duration of the current state of S and L

Page 39: Multimodal Expressive  Embodied Conversational Agents

Temporal Gaze Parameters

Gaze behaviors depend on the communicative functions, the general purpose of the conversation (persuasion discourse, teaching...), personality, cultural roots, social relations...

A full model would be very (indeed too) complex, so we propose parameters that control the overall gaze behavior:
– T^{S=1,L=1}_max: maximum duration the mutual gaze state may remain active
– T^{S=1}_max: maximum duration of gaze state S=1
– T^{L=1}_max: maximum duration of gaze state L=1
– T^{S=0}_max: maximum duration of gaze state S=0
– T^{L=0}_max: maximum duration of gaze state L=0
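A sketch of how the two-step gaze algorithm and these duration caps could interact; the state encoding, cap values and flipping rule are assumptions for illustration, not the published model.

```python
import random

# Illustrative duration caps (seconds) for the gaze states listed above.
T_MAX = {"S=1,L=1": 2.0, "S=1": 3.0, "L=1": 3.0, "S=0": 1.5, "L=0": 1.5}

def next_gaze(communicative: dict, current: dict, elapsed: float) -> dict:
    """Step 1 supplies `communicative`, the gaze predicted by the communicative
    function model; step 2 perturbs it so that no state outlives its cap."""
    state = dict(communicative)
    for who in ("S", "L"):
        if elapsed > T_MAX[f"{who}={current[who]}"]:
            state[who] = 1 - current[who]      # force this party to change gaze state
    if current["S"] == current["L"] == 1 and elapsed > T_MAX["S=1,L=1"]:
        state[random.choice(["S", "L"])] = 0   # break a mutual gaze that lasted too long
    return state

# e.g. the model proposes mutual gaze while S has gazed and L averted for 2.1 s
print(next_gaze({"S": 1, "L": 1}, {"S": 1, "L": 0}, 2.1))
```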

Page 40: Multimodal Expressive  Embodied Conversational Agents


Mutual Gaze

Page 41: Multimodal Expressive  Embodied Conversational Agents


Gaze Aversion

Page 42: Multimodal Expressive  Embodied Conversational Agents

Gesture Planner

Adaptive instantiation:
– preparation and retraction phase adjustments
– transition key and rest gesture insertion
– joint-chain follow-through: forward time shifting of children joints in time
– stroke of gesture on the stressed word

Stroke expansion:
– during the planning phase, identify rheme clauses with closely repeated emphases/pitch accents
– indicate secondary accents by repeating the stroke of the primary gesture with decreasing amplitude

Page 43: Multimodal Expressive  Embodied Conversational Agents

Gesture Planner

Determination of gesture:
– look in the dictionary

Selection of gesture:
– gestures associated with the most deeply embedded tags have priority (except beat): adjectival, deictic

Duration of gesture:
– coarticulation between successive gestures close in time
– hold for gestures belonging to tags higher up in the hierarchy (e.g. performative, belief-relation)
– otherwise go to rest position
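The duration rules above can be pictured as a small decision function; the threshold value and tag set are assumptions for illustration, not the planner's actual settings.

```python
REST_GAP = 1.2                   # seconds: gap above which the arm returns to rest
HOLD_TAGS = {"performative", "belief-relation"}

def after_stroke(current_tag: str, gap_to_next_gesture: float | None) -> str:
    """Decide what follows a gesture's stroke."""
    if gap_to_next_gesture is not None and gap_to_next_gesture < REST_GAP:
        return "coarticulate"    # blend directly into the next gesture
    if current_tag in HOLD_TAGS:
        return "hold"            # keep the stroke's end pose
    return "rest"                # retract to the rest position

print(after_stroke("adjectival", 0.4))      # -> coarticulate
print(after_stroke("performative", None))   # -> hold
```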

Page 44: Multimodal Expressive  Embodied Conversational Agents

Behavior Expressivity

Behavior is related to (Wallbott, 1998):
– the quality of the mental state (e.g. emotion) it refers to
– the quantity (somehow linked to the intensity of the mental state)

Behaviors encode:
– content information (the 'what is communicated')
– expressive information (the 'how it is communicated')

Behavior expressivity refers to the manner of execution of the behavior.

Page 45: Multimodal Expressive  Embodied Conversational Agents

Expressivity Dimensions

Spatial: amplitude of movement
Temporal: duration of movement
Power: dynamic property of movement
Fluidity: smoothness and continuity of movement
Repetitiveness: tendency to rhythmic repeats
Overall Activation: quantity of movement across modalities
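One way to picture these dimensions is as a parameter record that biases each gesture's animation parameters; the class, value range and toy mapping below are assumptions for illustration (the 'abrupt' values come from a later slide).

```python
from dataclasses import dataclass

@dataclass
class Expressivity:
    """The six dimensions, each assumed to lie in [-1, 1] for this sketch."""
    overall_activation: float = 0.0
    spatial: float = 0.0
    temporal: float = 0.0
    fluidity: float = 0.0
    power: float = 0.0
    repetitiveness: float = 0.0

def shape_stroke(amplitude_cm: float, duration_s: float, e: Expressivity):
    """Toy mapping: spatial expands or condenses amplitude, temporal shortens or
    lengthens the stroke."""
    return amplitude_cm * (1.0 + 0.5 * e.spatial), duration_s / (1.0 + 0.5 * e.temporal)

abrupt = Expressivity(overall_activation=0.6, spatial=0, temporal=1,
                      fluidity=-1, power=1, repetitiveness=-1)
print(shape_stroke(20.0, 0.5, abrupt))
```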

Page 46: Multimodal Expressive  Embodied Conversational Agents

Overall Activation

• Threshold filter on atomic behaviors during APML tag matching
• Determines the number of nonverbal signals to be executed

Page 47: Multimodal Expressive  Embodied Conversational Agents

Spatial Parameter

• Amplitude of movement controlled through asymmetric scaling of the reach space that is used to find IK goal positions
• Expand or condense the entire space in front of the agent

Page 48: Multimodal Expressive  Embodied Conversational Agents

Temporal Parameter

Stroke shift / velocity control of a beat gesture

[Plot: Y position of wrist w.r.t. shoulder [cm] vs. frame number]

• Determine the speed of the arm movement of a gesture's meaning-carrying stroke phase
• Modify the speed of the stroke

Page 49: Multimodal Expressive  Embodied Conversational Agents

Fluidity

• Continuity control of TCB interpolation splines and of gesture-to-gesture coarticulation
• Continuity of the arms' trajectory paths
• Control of the velocity profiles of an action

[Plot: X position of wrist w.r.t. shoulder [cm] vs. frame number]

Page 50: Multimodal Expressive  Embodied Conversational Agents

Power

• Tension and bias control of TCB splines
• Overshoot reduction
• Acceleration and deceleration of limbs
• Hand-shape control for gestures that do not need hand configuration to convey their meaning (beats)

Page 51: Multimodal Expressive  Embodied Conversational Agents

Repetitivity

• Technique of stroke expansion: consecutive emphases are realized gesturally by repeating the stroke of the first gesture.

Page 52: Multimodal Expressive  Embodied Conversational Agents

Multiple Modality Example: Abrupt

Overall Activity = 0.6
Spatial = 0
Temporal = 1
Fluidity = -1
Power = 1
Repetition = -1

Page 53: Multimodal Expressive  Embodied Conversational Agents

Multiple Modality Example: Vigorous

Overall Activity = 1

Spatial = 1

Temporal = 1

Fluidity = 1

Power = 0

Repetition = 1

Page 54: Multimodal Expressive  Embodied Conversational Agents

Evaluation of Expressive Gesture

(H1) The chosen implementation for mapping single dimensions of expressivity onto animation parameters is appropriate: a change in a single dimension can be recognized and correctly attributed by users.

(H2) Combining parameters in such a way that they reflect a given communicative intent will result in a more believable overall impression of the agent.

106 subjects, 17 to 26 years old

Page 55: Multimodal Expressive  Embodied Conversational Agents

Perceptual Test Studies

Evaluation of the adequacy of the implementation of each parameter:
– check whether subjects could perceive and distinguish the six different expressivity parameters and indicate their direction of change
– result: good recognition of the spatial and temporal parameters; lower recognition of the fluidity and power parameters, as they are inter-dependent

Evaluation task: does setting appropriate values for the expressivity parameters create behaviors that are judged as exhibiting the corresponding expressivity?
– 3 different types of behaviors: abrupt, sluggish, vigorous
– users prefer the coherent performance for vigorous and abrupt

Page 56: Multimodal Expressive  Embodied Conversational Agents

Interaction

In an interaction, two or more parties exchange messages.
Interaction is by no means a one-way communication channel between parties.
Within an interaction, parties take turns playing the roles of speaker and addressee.

Page 57: Multimodal Expressive  Embodied Conversational Agents

Interaction

Speaker and addressee adapt their behaviors to each other:
– the speaker monitors the addressee's attention and interest in what he has to say
– the addressee selects feedback behaviors to show the speaker that he is paying attention

Page 58: Multimodal Expressive  Embodied Conversational Agents

Interaction

Speaker:
– It is pointless for a speaker to engage in an act of communication if the addressee does not pay, or intend to pay, attention.
– It is important for the speaker to assess the addressee's engagement:
    when starting an interaction: assess the possibility of engagement in the interaction (establish phase)
    when the interaction is going on: check whether engagement is lasting and sustaining the conversation (maintain phase)

Page 59: Multimodal Expressive  Embodied Conversational Agents

Interaction

Addressee:
– attention: pay attention to the signals produced by the speaker in order to perceive, process and memorize them
– perception: of the signals
– comprehension: understand the meaning attached to the signals
– internal reaction: comprehension of the meaning may create a cognitive and emotional reaction
– decision: whether or not to communicate the internal reaction
– generation: display behaviors

Page 60: Multimodal Expressive  Embodied Conversational Agents

Backchannel

Types of backchannels (I. Poggi):
– attention
– comprehension
– belief
– interest
– agreement

Each can be positive or negative, and any combination of the above can occur: pay attention but not understand; understand but not believe, etc.

Page 61: Multimodal Expressive  Embodied Conversational Agents

Backchannel

Depending on the type of speech act they respond to, a signal will or will not be interpreted as a backchannel:
– backchannel: a signal of agreement/disagreement that follows the expression of opinions, evaluations, planning
– not a backchannel: a signal of comprehension/incomprehension after an explicit question "Did you understand?"

Page 62: Multimodal Expressive  Embodied Conversational Agents

Backchannel

Polysemy of backchannel signals:
– a signal may provide different types of information
– a frown: negative feedback for understanding, believing and agreeing

Page 63: Multimodal Expressive  Embodied Conversational Agents

Backchannel Signals of Gaze

Gaze:
– shows the direction of attention
– informs on the level of engagement or on the intention to maintain engagement
– indicates the degree of intimacy
but also
– monitors the gaze behavior of others to establish their intention to engage or remain engaged

A shared-attention situation involves mutual gaze at each other or mutual gaze at a same object.

Page 64: Multimodal Expressive  Embodied Conversational Agents

Backchannel Modelling

Reactive model:
– generates an instinctive feedback without reasoning
– simple backchannel or mimicry
– spontaneous, sincere

Cognitive model:
– conscious decision to provide backchannel in order to provoke a particular effect on the speaker or to reach a specific goal
– deliberate, possibly pretended
– it can shift to automatic (e.g. when listening to a bore)

Page 65: Multimodal Expressive  Embodied Conversational Agents

Backchannel Demo

Page 66: Multimodal Expressive  Embodied Conversational Agents

A Reactive Backchannel

Currently, our model is reactive in nature:
– dependent on perception
    the speaker interprets the addressee's behavior
    the speaker generates or alters its own behavior
– our focus: interest and attention at the signal level (not at the cognitive level)
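Purely as an illustration of such a reactive rule (the cue names, use of probability and signal choices are assumptions, not the implemented listener model):

```python
import random

FEEDBACK = {"attention": "head_nod", "comprehension": "head_nod", "agreement": "smile"}

def reactive_backchannel(speaker_cue: str, interest: float):
    """Emit an instinctive feedback signal at a backchannel opportunity (e.g. a pause
    or a pitch fall), more often when the perceived interest level is high."""
    if speaker_cue in ("pause", "pitch_fall") and random.random() < interest:
        kind = random.choice(list(FEEDBACK))
        return kind, FEEDBACK[kind]
    return None

print(reactive_backchannel("pause", 0.8))
```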

Page 67: Multimodal Expressive  Embodied Conversational Agents

Organization of the Communication: Attraction of Attention

Communicative agents: the agents provide information to the user and should guarantee that the user pays attention.

Animation expressivity: principle of "staging", so that a single idea is clearly expressed at each instant of time.

Animation specificity: animators' creativity; no realistic constraints on animators.

What types of gesture properties could guarantee the user's attention?

(France Telecom)

Page 68: Multimodal Expressive  Embodied Conversational Agents

Organization of the Communication: Attraction of Attention

Corpus: videos from traditional animation that illustrate different types of conversational interaction.

The modulations of gesture expressivity over time play a role in managing communication, thus serving as a pragmatic tool.

(France Telecom)

Page 69: Multimodal Expressive  Embodied Conversational Agents

Emotion

Emotions are elicited by the evaluation of events, objects, actions.
Integration of emotions in a dialog system (Artimis, FT).
Identify under which circumstances a dialog agent should express emotions.

(France Telecom)

Page 70: Multimodal Expressive  Embodied Conversational Agents

Emotion

BDI representation based on the OCC model: appraisal variables [Ortony et al. 1988]:
– Desirability/Undesirability: achievement of, or threat to, the agent's choice
– Degree of realization: degree of certainty of the choice's achievement
– Probability of an event: probability of feasibility of an event
– Agency: the agent who is the actor of the event

Emotional mental state: a set of appraisal variables corresponds to a configuration of mental attitudes; appraisal variables are represented by mental attitudes.

(France Telecom)
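A rough sketch of how such appraisal variables might map to an emotion label; the dataclass and the rules are illustrative of the OCC-style approach, not the actual BDI formalization used in Artimis.

```python
from dataclasses import dataclass

@dataclass
class Appraisal:
    """Appraisal variables listed above (illustrative representation)."""
    desirability: float           # > 0: serves the agent's choice, < 0: threatens it
    degree_of_realization: float  # certainty that the choice is achieved, in [0, 1]
    probability: float            # probability/feasibility of the event, in [0, 1]
    agency: str                   # who caused the event: "self", "other", ...

def elicited_emotion(a: Appraisal) -> str:
    """Toy mapping from appraisal variables to an emotion label."""
    if a.desirability > 0:
        return "joy" if a.degree_of_realization >= 1.0 else "hope"
    if a.agency == "other":
        return "anger"
    return "sadness" if a.degree_of_realization >= 1.0 else "fear"

print(elicited_emotion(Appraisal(0.8, 1.0, 1.0, "self")))   # -> joy
print(elicited_emotion(Appraisal(-0.5, 0.0, 0.7, "self")))  # -> fear
```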

Page 71: Multimodal Expressive  Embodied Conversational Agents

Emotion

Complex emotions:
– superposition of two emotions: the evaluation of an event can happen under different angles
– masking of one emotion by another: consideration of the social context

joy + disappointment = masking

Page 72: Multimodal Expressive  Embodied Conversational Agents

Video: Masking of Disappointment by Joy

Page 73: Multimodal Expressive  Embodied Conversational Agents

Conclusion

Creation of a virtual agent able to:
– communicate nonverbally
– show emotions
– use expressive gestures
– perceive and be attentive
– maintain the attention

Two studies on expressivity:
– from manual annotation of a video corpus
– from mimicry of movement analysis