
Socially-Sensitive Interfaces: From Offline Studies to Interactive Experiences

Elisabeth André

Augsburg University, Germany

http://hcm-lab.de

2

Human-Centered Multimedia

Founded: April 2001

Chair: Elisabeth André

Research Topics:

Human-Computer Interaction

Social Signal Processing

Affective Computing

Embodied Conversational Agents

Social Robotics

3

Motivation

There is another level in human communication that is just as important as the spoken message: nonverbal communication.

How can we enrich the precise and useful functions of computers with the human ability to shape the meaning of a message through nonverbal signals?

4

Observation

Social signal processing has developed from a side issue into a major area of research.

Yet the effort undertaken has not translated well into applications. Why is this?

[Timeline 1998-2015 of milestones at ACM MM: Special Session on Face and Gesture Recognition; keynote "Honest Signals"; 1st HCM Workshop; 1/3 of Grand Challenge papers on affective computing; 3 workshops on "social cues"; Brave New Topic: Affective Multimodal HCI]

5

Challenge: Real-Life Applications

Of a total of 434 publications on SSPNet, 10% include the term "real(-)time" and are related to detection; only 2% address multimodal detection.

[Pie chart "Social Signal Processing in the Wild": real-time detection publications by modality: face (15), gesture (9), speech (9), interaction (8), physiological (2), multimodal (13). Meta-analysis by J. Wagner]

6

Organization of the Talk

Analysis of Emotional and Social Signals

Generation of Expressive Behaviors in Virtual Agents and Robots

Applications of Social Signal Processing and Embodied Agents:

Socially Sensitive Robots

Training of Presentation Skills in
• Job Interviews
• Public Speaking

Providing Information on Social Context to Blind People

7

Challenge: Noisy and Corrupted Data

At any point in time, we can rely only on previously seen data, and we have to deal with noisy and corrupted data.

[Diagram: a signal stream running up to "now", with noisy and missing segments]

8

Challenge: Non-Prototypical Behaviors

Previous research focused on the analysis of prototypical samples in as pure a form as possible.

In daily life, we also observe subtle, blended, and suppressed emotions, i.e., non-prototypical emotional displays.

Pictures from Ekman and Friesen’s database of emotional faces

9

Accuracy Drops with Naturalness

Systems developed under laboratory conditions

often perform poorly in real-world scenarios

[Chart: recognition accuracy over naturalness of the data: roughly 100% for acted, 80% for read, 70% for Wizard-of-Oz (WOZ) speech]

10

Contextualized Analysis

Improvement through context-sensitive analysis (a minimal sketch follows below):

Gender-specific information (Vogt & André 2006)

Success/failure of the student in tutoring applications (Conati & McLaren 2009)

Dialogue behavior of the virtual agent/robot (Baur et al. 2014)

Learning context using (B)LSTMs (Metallinou et al. 2014)
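To make the first item concrete, here is a minimal, hypothetical sketch of context-sensitive recognition that conditions on gender, loosely in the spirit of Vogt & André (2006); the class, the choice of SVM classifier, and the data layout are illustrative assumptions, not the original system:

```python
# Hedged sketch: gender-conditioned emotion recognition. One classifier is
# trained per gender and dispatched at prediction time; features, labels,
# and the SVM choice are placeholders, not the original setup.
import numpy as np
from sklearn.svm import SVC

class GenderConditionedClassifier:
    def __init__(self):
        self.models = {"f": SVC(), "m": SVC()}

    def fit(self, features, labels, genders):
        features, labels = np.asarray(features), np.asarray(labels)
        for g, model in self.models.items():
            idx = [i for i, gg in enumerate(genders) if gg == g]
            model.fit(features[idx], labels[idx])

    def predict(self, feature_vector, gender):
        if gender in self.models:
            return self.models[gender].predict([feature_vector])[0]
        # Gender unknown: majority vote over the per-gender models.
        votes = [m.predict([feature_vector])[0] for m in self.models.values()]
        return max(set(votes), key=votes.count)
```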

11

Challenge: Multimodal Fusion

A meta-study by D'Mello and Kory on multimodal affect detection shows that the improvement gained through fusion correlates with the naturalness of the corpus: more than 10% for acted data, but less than 5% for natural data.

In natural interaction, people draw on a mixture of strategies to express emotion, leading to a complementary rather than consistent display of social behaviour.

S.K. D'Mello, J.M. Kory: Consistent but modest: a meta-analysis on unimodal and multimodal affect detection accuracies from 30 studies. ICMI 2012: 31-38

12

Event-Based Fusion

In the case of contradictory cues, fusion methods trust the "right" modality just as often as the "wrong" one.

[Figure: per-sample correct/incorrect classifications for single modalities versus fusion techniques]

J. Wagner, E. André, F. Lingenfelser, J. Kim: Exploring Fusion Methods for Multimodal Emotion Recognition with Missing Data. IEEE Trans. Affective Computing 2(4): 206-218 (2011)

13

Event-Based Fusion

The amount of misclassified samples is significantly higher when the annotations of the individual modalities mismatch.

[Chart: annotations agree for 71% of the samples ("yes") and disagree for 29% ("no"), with classification results of 62% vs. 36%]

14

Event-Based Fusion

[Diagram: when the face shows "happy" but the voice shows "neutral", frame-wise fusion is ambiguous (? Fusion ?); when face and voice both show "happy", fusion yields "happy"]

15

Synchronous Fusion

Synchronous fusion approaches consider multiple modalities within the same time frame.
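A minimal sketch of what synchronous, feature-level fusion can look like, assuming per-frame feature vectors that were extracted over the same time window (all names are illustrative):

```python
# Hedged sketch: synchronous (frame-level) fusion. Feature vectors from all
# modalities cover the same time frame and are simply concatenated before
# classification.
import numpy as np

def synchronous_fusion(audio_features, video_features, classifier):
    """Fuse two per-frame feature vectors that share one time frame."""
    fused = np.concatenate([np.asarray(audio_features),
                            np.asarray(video_features)])
    return classifier.predict([fused])[0]
```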

16

Asynchronous Fusion

Asynchronous fusion algorithms refer back to past time frames with the help of some kind of memory. They are therefore able to capture the asynchronous nature of the observed modalities.
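One simple way to realize such a memory, sketched here under the assumption of per-modality decisions arriving at their own rates; the window length and the confidence-weighted vote are illustrative choices, not a specific published algorithm:

```python
# Hedged sketch: asynchronous fusion over a sliding memory window, so that a
# cue in one channel can be combined with an earlier cue in another channel.
from collections import deque

class AsynchronousFusion:
    def __init__(self, memory_seconds=3.0):
        self.memory_seconds = memory_seconds
        self.history = deque()  # entries: (timestamp, label, confidence)

    def update(self, timestamp, label, confidence):
        self.history.append((timestamp, label, confidence))
        # Forget everything that has fallen out of the memory window.
        while self.history and timestamp - self.history[0][0] > self.memory_seconds:
            self.history.popleft()

    def decision(self):
        # Confidence-weighted vote over all decisions still in memory.
        scores = {}
        for _, label, conf in self.history:
            scores[label] = scores.get(label, 0.0) + conf
        return max(scores, key=scores.get) if scores else None
```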

17

Event-Based Fusion

18

Event-Based Fusion

Take into account temporal relationships between channels and learn when to combine information.

Move from segmentation-based processing to asynchronous, event-driven approaches.

More robust in the case of missing or noisy data (see the sketch after the reference below).

[Diagram: laughter events ("haha", "hehe") detected over time are fused into a positive (+) vs. neutral (0) estimate]

F. Lingenfelser, J. Wagner, E. André, G. McKeown, W. Curran: An Event Driven Fusion Approach for Enjoyment Recognition in Real-time. ACM Multimedia 2014: 377-386
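The following sketch illustrates the event-driven idea in the spirit of the paper above: detectors fire discrete events (e.g., a laughter burst in the audio or a smile onset in the video) whose influence decays over time, so cues reinforce each other even when they do not fall into the same analysis frame. The half-life and scoring are assumptions, not the authors' actual parameters:

```python
# Hedged sketch: event-driven fusion with exponentially decaying events.
import time

class EventFusion:
    def __init__(self, half_life=2.0):
        self.half_life = half_life   # seconds until an event's weight halves
        self.events = []             # (timestamp, score) pairs

    def on_event(self, score, timestamp=None):
        """Register an event, e.g. +1.0 for a detected laugh or smile."""
        ts = time.time() if timestamp is None else timestamp
        self.events.append((ts, score))

    def enjoyment(self, now=None):
        """Fused estimate: sum of event scores, decayed with their age."""
        now = time.time() if now is None else now
        return sum(score * 0.5 ** ((now - ts) / self.half_life)
                   for ts, score in self.events)
```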

19

SSI Framework

The Social Signal Interpretation (SSI) framework is an attempt to provide a general architecture for tackling the challenges discussed above:

the collection of large and rich multimodal corpora

the investigation of advanced fusion techniques

simplifying the development of online systems

(A conceptual sketch of the pipeline architecture follows after the reference.)


Johannes Wagner, Florian Lingenfelser, Tobias Baur, Ionut Damian, Felix Kistler, Elisabeth André: The social signal interpretation (SSI) framework: multimodal signal processing and recognition in real-time. ACM Multimedia 2013: 831-834

SSI is freely available at: http://www.openssi.net
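SSI itself is written in C++ and configured through XML pipeline descriptions; the following Python fragment is only a conceptual sketch of the sensor → transformer → consumer architecture such a framework is built around, with all class names being illustrative assumptions rather than the actual SSI API:

```python
# Hedged sketch: the pipeline concept behind an SSI-style framework.
class Sensor:
    def read(self):                   # deliver the next raw signal chunk
        raise NotImplementedError

class Transformer:
    def transform(self, chunk):       # e.g., extract pitch or facial features
        raise NotImplementedError

class Consumer:
    def consume(self, features):      # e.g., classify and raise events
        raise NotImplementedError

def run_pipeline(sensor, transformers, consumer, steps=1000):
    """Push each chunk through the transformer chain to the consumer."""
    for _ in range(steps):
        data = sensor.read()
        for t in transformers:
            data = t.transform(data)
        consumer.consume(data)
```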

20

SSI Framework

Supported sensors and devices: Mic, Cam, Xsens, Wii, Smartex, Empatica, WAX9, AHM, Emotiv, Kinect, Leap, SensingTex, Touch Mouse, EyeTribe, SMI, Nexus, IOM, eHealth, Myo, KRISTINA

22

Social Robots

23

Affective Feedback Loop

[Loop diagram: Sensors → Behavior Analysis / Emotion Recognition → Generate Implicit Feedback → Mirror Emotional Behavior → Create Rapport]

24

Generation of Facial Expressions

FACS (Facial Action Coding System) can be used to generate and recognize facial expressions. Action Units (AUs) are used to describe emotional expressions.

Seven Action Units were identified for the robotic face (out of the 40 Action Units of the human face):

Lower face: lip corner puller (AU 12), lip corner depressor (AU 15), and lip opening (AU 25).

Upper face: inner brow raiser (AU 1), brow lowerer (AU 4), upper lid raiser (AU 5), and eye closure (AU 43).

A minimal sketch of how such a reduced Action Unit set can be represented follows.
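The sketch below represents the seven robot-face Action Units and composes them into expression presets; the 0..1 intensity convention and the expression recipes are illustrative assumptions, and the mapping to actual servo commands is hardware-specific and omitted:

```python
# Hedged sketch: the seven robot-face Action Units and expression presets.
ROBOT_AUS = {
    1: "inner brow raiser",
    4: "brow lowerer",
    5: "upper lid raiser",
    12: "lip corner puller",
    15: "lip corner depressor",
    25: "lip opening",
    43: "eye closure",
}

def expression(**au_intensities):
    """Build an AU -> intensity map, rejecting AUs the robot cannot move."""
    pose = {}
    for name, value in au_intensities.items():
        au = int(name.lstrip("au"))
        if au not in ROBOT_AUS:
            raise ValueError(f"AU {au} is not available on the robot face")
        pose[au] = max(0.0, min(1.0, value))  # clamp intensity to [0, 1]
    return pose

smile = expression(au12=0.9, au25=0.3)            # lip corners up, lips apart
sadness = expression(au1=0.7, au4=0.4, au15=0.8)  # brows raised/lowered, corners down
```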

25

Generation of Facial Expressions

26

Realization of Social Lies for the Hanson Robokind

Social lies constitute a large part of human conversation. Social lies, as used for politeness reasons, are generally accepted.

Humans often show deceptive cues in their nonverbal behavior while lying. Humanoid robots should therefore show deceptive cues when telling social lies as well.

27

Deceptive Cues

Deceptive cues in human faces, according to Ekman and colleagues:

Micro-expressions: A false emotion is displayed, but the felt emotion is unconsciously expressed for a fraction of a second.

Masks: The felt emotion is intentionally masked by a non-corresponding facial expression.

Timing: The longer an expression is shown, the more likely it is to accompany a lie.

Asymmetry: Voluntarily shown facial expressions tend to be displayed in an asymmetrical way (a minimal sketch of this cue follows below).
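As a concrete illustration of the asymmetry cue, here is a hypothetical sketch that drives the left and right lip corners (AU 12) with different intensities; the per-side "12L"/"12R" convention is an assumption about the control interface, not the Robokind API:

```python
# Hedged sketch: an asymmetric (deceptive) smile via unequal AU 12 intensities.
def asymmetric_smile(intensity=0.9, asymmetry=0.3):
    """Return per-side lip corner intensities; asymmetry in [0, 1]."""
    left = min(1.0, intensity * (1.0 + asymmetry / 2.0))
    right = max(0.0, intensity * (1.0 - asymmetry / 2.0))
    return {"12L": left, "12R": right}
```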

28

Real versus Faked Smile

[Images: Pan Am smile (without the eyes) vs. real smile]

29

Real versus Faked Smile

[Images: asymmetric (Pan Am) smile vs. real smile]

30

Real versus Faked Smile

[Images: smile with blended anger (in the eye region) vs. real smile]

31

Results of a Study

It was easier to detect faked smiles from the mouth region.

Robots with an asymmetrical smile were rated as significantly less happy than robots with a genuine smile.

The results are in line with research on virtual agents (Rehm & André, AAMAS 2005):

• Agents that fake emotions are perceived as less trustworthy and less convincing.

• Subjects were not able to name reasons for their uneasiness with the deceptive agent.

B. Endrass, M. Häring, G. Akila, E. André: Simulating Deceptive Cues of Joy in Humanoid Robots. IVA 2014: 174-177

32

TARDIS: a job interview training system for young adults

33

Social Feedback Loop

[Loop diagram: Sensors → Behavior Analysis of Social Behavior → Generate Feedback (implicit social response, explicit hints on social behavior) → Improve Social Skills]

34

Behavior Analysis

Real-time multimodal analysis and classification of social signals:

Expressivity features (Energy, Openness, Fluidity)

Facial expressions (Smiles, Lip biting)

Speech quality (Speech rate, Loudness, Pitch)

Engagement, Nervousness

(A minimal sketch of one such feature follows.)
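As a hedged illustration of one expressivity feature, the sketch below computes gestural energy as the mean magnitude of hand acceleration over a sliding window; the window length and the exact definition are assumptions, not the system's actual feature set:

```python
# Hedged sketch: gestural "energy" from hand acceleration samples.
import numpy as np

def gesture_energy(accel_xyz, window=30):
    """accel_xyz: (n, 3) array of hand acceleration samples."""
    mags = np.linalg.norm(np.asarray(accel_xyz, dtype=float), axis=1)
    if len(mags) == 0:
        return 0.0
    # Mean magnitude over the most recent window of samples.
    return float(mags[-window:].mean())
```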

35

Evaluation

Location: Parkschule in Stadtbergen, Germany

Participants: 20 pupils (10 m / 10 f), 13-16 years old, job-seeking; two practitioners

I. Damian, T. Baur, B. Lugrin, P. Gebhard, G. Mehlmann, E. André: Games are Better than Books: In-Situ Comparison of an Interactive Job Interview Game with Conventional Training. AIED 2015: 84-94

36

Evaluation

Two conditions: TARDIS versus Book

37

Experimental Setting

Day 1 (Pre-Interviews): 20 pupils, 2 practitioners; task: mock interviews; duration: ~10 min; 2x performance questionnaires (user + practitioner)

Day 2 (Training, Control): 10 pupils; task: reading a job interview guide; duration: ~10 min; user experience questionnaires

Day 2 (Training, TARDIS): 10 pupils; task: interaction with TARDIS + NovA; duration: ~10 min; user experience questionnaires

Day 3 (Post-Interviews): 20 pupils, 2 practitioners; task: mock interviews; duration: ~10 min; 2x performance questionnaires (user + practitioner)

38

Results

The overall behavior of the pupils who had interacted with TARDIS was rated significantly better by the job trainers than the overall behavior of the pupils who had prepared themselves for the job interview using books.

Only for the pupils who trained with TARDIS were we able to measure statistically significant improvements:

Their use of smiles appeared more appropriate.

Their use of eye contact appeared more appropriate.

They appeared significantly less nervous.

[...] using the system, pupils seem to be highly motivated and able to learn how to improve their behaviour [...] they usually lack such motivation during class

[...] transports the experience into the youngster's own world

[...] makes the feedback be much more believable

40

Augmenting Social Interactions

I. Damian, C.S. Tan, T. Baur, J. Schöning, K. Luyten, E. André: Augmenting Social Interactions: Realtime Behavioural Feedback using Social Signal Processing Techniques. CHI 2015: 565-574

41

Social Feedback Loop

[Loop diagram: Sensors → Behavior Analysis of Social Behavior → Explicit Feedback Generation → Improve Social Skills]

42

Social Feedback Loop

[Loop diagram: Sensors → Behavior Analysis of Social Behavior → Explicit Feedback Generation (haptic feedback) → Improve Social Skills]

Study 1: Quantitative study in a controlled environment

15 speakers, 2 observers; task: hold a 5-minute presentation; 2 conditions: system on vs. system off (within subjects, randomized order, 2 weeks apart)

Data acquisition: social signal recordings, questionnaires (speaker/observers)

Objective analysis of the recordings: the amount of inappropriate behaviour decreased when the system was on.

[Bar chart: % inappropriate behaviour (lower is better), system off vs. system on]

44

Example user reaction: every time the user received negative feedback, he quickly adjusted his openness.

45

Study 2: Qualitative study in a real presentation setting

3 speakers, 13 observers; task: present PhD progress; data acquisition: semi-structured interviews

[...] once I saw the feedback that I was talking too fast, I tried to adapt

[...] most of the time I did not perceive the system, only when I consciously looked at the feedback

It was a good feeling seeing everything [the icons] green ... it’s like applause, or as if someone looks at you and nods. However, the green lasts longer than a nod [laughs]

Exploring Eye-Tracking-Driven Sonification for the Visually Impaired (Augmented Human, Geneva, 2016)

51

Feedback Loop

[Loop diagram: Sensors → Behavior Analysis of Social Behavior → Feedback Generation (explicit audio feedback) → Provide Information on Social Context]

52

Facial Expression Sonification

Map facial expressions onto musical instruments: woodblock, piano, guitar, french horn, bells (a hedged sketch follows below).
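A hedged sketch of such a mapping: each detected expression triggers a note on a distinct General MIDI instrument. The instruments mirror those named on the slide, but the specific expression-to-instrument assignments are assumptions, not the study's actual mapping:

```python
# Hedged sketch: facial-expression sonification via General MIDI programs.
INSTRUMENTS = {          # expression -> General MIDI program number
    "happy": 24,         # acoustic guitar (nylon)
    "sad": 0,            # acoustic grand piano
    "surprised": 14,     # tubular bells
    "angry": 60,         # french horn
    "neutral": 115,      # woodblock
}

def sonify(expression, midi_out):
    """Play one note on the instrument assigned to the expression.

    midi_out is any object with set_instrument/note_on methods,
    e.g. a pygame.midi.Output instance.
    """
    program = INSTRUMENTS.get(expression, INSTRUMENTS["neutral"])
    midi_out.set_instrument(program)
    midi_out.note_on(72, 100)   # note C5 at velocity 100
```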

53

User Study

Users: 7 blind and visually impaired participants

Criteria: no nystagmus, unrestricted eye movements

Age | Gender | Visual impairment | Control method
68 | male | Cataract | center point
49 | female | Cataract (early stage) | eye gaze
43 | female | Optic atrophy | eye gaze
73 | male | Congenital blindness | center point
68 | male | Optic nerve damage (accident) | center point
87 | female | Macular degeneration | eye gaze
70 | male | Retinal degeneration | eye gaze

54

Experiment

Scenario: two videos with a speaker giving a monologue are shown

Task: rate the emotional state of the speaker

Results: the videos were rated more accurately with the system on

55

Results

56

Overall Conclusions

Social and emotional sensitivity are key elements of human intelligence.

Social signals are particularly difficult to interpret, requiring us to understand and model their causes and consequences.

Offline applications start from overly optimistic recognition rates.

More work needs to be devoted to interactive online applications.

More information and software is available at: http://www.hcm-lab.de

57

Current Work: Mobile Social Signal Processing

SSJ: Realtime Social Signal Processing for Java/Android

SSI – Unix/Android build compatibility

58

Thank you very much!