
The Empathy Machine

Augmenting human perception of emotional expression through music

David Kadish
University of British Columbia Okanagan
Kelowna, Canada
[email protected]

Nikolai Kummer
University of British Columbia Okanagan
Kelowna, Canada
[email protected]

ABSTRACT

Empathy is a key component of interpersonal interactions that is often neglected by modern communication technologies. This paper presents the prototyping and initial testing of a device that enhances a person's empathic sense. The system uses facial expression recognition to identify the emotional state of a user's conversation partner. It algorithmically generates music in real-time to match the expressive state of the partner and plays the music to the user in a non-destructive manner. User testing indicates that the system can reliably generate music corresponding to the emotions of anger, happiness, fear, and sadness, and that the presence of emotional music augments the emotional response generated by visual cues.

Categories and Subject Descriptors

H.4 [Information Systems Applications]: Miscellaneous; D.2.8 [Software Engineering]: Metrics—complexity measures, performance measures

General Terms

Human Factors

Keywords

Emotional Music, Empathy, Facial Expression Recognition

1. INTRODUCTION

Empathy, the ability to share feelings with others, is an important part of interpersonal communication and plays an essential role in human relations. Many communication technologies such as telephones and e-mail act as barriers between the communicators that reduce the opportunity for empathic interaction. This document describes the development and evaluation of a system that aims to facilitate empathy in live interpersonal interactions.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists requires specific permission and/or a fee.
HIT2012, Vancouver, BC, Canada. © UBC 2012

Figure 1: The empathy machine plays music to the user (wearing the green backpack), reflecting the facial expression of the conversation partner (in the plaid shirt).

The empathy machine attempts to amplify the emotional cues that a person receives from a conversation partner by generating emotion-appropriate music that plays in the background of their interaction. The system uses automatic facial expression recognition (FER) to identify the facial expression of the person opposite the user. A music generation system uses a rule-based approach to select the musical properties that best suit the associate's expressive state. The addition of music to the interaction, much like the addition of music to the soundtrack of a feature film, complements the visual emotional signals that the user receives when observing the facial and gestural expressions of the associate.

The authors contend that the addition of auditory emotional cues constitutes an extension and augmentation of the empathic sense of the user.

Section 2 explores some of the prior art in fields that relate to the proposed system: audio augmented reality displays; intimate interfaces; psychological studies of emotion, empathy and communication; music and emotion; automatic facial expression recognition; and computer generation of music.

Section 3 describes the design of the system in the context of the works in Section 2. The empathy machine's two major subsystems, the FER system and the music generation system, are detailed there.


Figure 2: System diagram showing how the empathy machine augments a user's empathetic understanding of a situation. The unmediated interaction runs from the partner's facial expression through the visual channel to the user's emotion processing; the augmented interaction adds a parallel path from FER through music generation to the auditory channel.

The relationship between the assumptions made in the conception of the empathy machine and the psychological understanding of empathy, emotion, and music is discussed.

In Section 4, the methods used for testing the efficacy of the system are described in detail. The experimental results are presented and discussed in relation to the system prototype. The successes and failures of the prototype are discussed along with opportunities for improvement.

Section 5 states the conclusions of the paper and discusses areas for future research.

2. RELATED WORK

The conceptual basis for the empathy machine lies in the hypothesis that music can trigger an empathic response, similar to the response that occurs when one views an emotional expression. In order to build a theoretical foundation for the system, this section addresses empathy and emotional expression from the psychological viewpoint. The links between music and emotion are discussed from artistic and psychological perspectives. The technical aspects of emotional computer-generated music and facial expression recognition are addressed towards the end of the section.

2.1 Empathy and Emotional Expression

Empathy is the ability to identify with the feelings and emotions of another person or being. The neurological basis for empathy is the existence of so-called "mirror neurons" [18]. Mirror neurons are activated both when a subject sees an action or emotional display and when they perform that same activity. Emotional empathy, then, can manifest as a person feeling the same emotion as someone they are observing, simply by seeing an emotional gesture on the part of the observed.

There are many auditory and visual signals of emotional state; one of the primary sources of emotional state information in people is facial expressions. Ekman et al. [4] argued that certain facial expressions, specifically anger, disgust, fear, joy, sadness, and surprise, are universal and not artefacts of culture. This claim was widely criticized, although the notion of some universal components of facial expression combined with culturally specific features is now generally accepted [17].

The combination of mirror neuron and facial expression theories enables the understanding of facial expressions as vectors for inducing an empathetic response in an observer. This conception is the unmediated interaction shown in Figure 2.

2.2 Emotion and Music

Viewing facial expressions is not the only way to induce emotional empathy; music is often cited as being able to elicit an emotional response from a listener [20]. There are two schools of thought about the effect of music on the emotional state of the listener. The 'emotivists' believe that music creates a real emotional response in the listener, changing their emotional state [20]. The alternative view is held by 'cognitivists', who postulate that music merely represents certain emotions [24]. This issue remains contentious in the artistic and psychological literature [19]. The implications of this duality for the effectiveness of the proposed device have been considered and are discussed in further detail in Section 3.2.

The various properties of music that elicit or represent certain emotions have been studied by scientists since the 1930s [9]. In the initial research, scientists were able to link a limited number of musical features (tempo, rhythm and harmonic mode) to certain emotions. Since then, research into the influence of musical features on emotion has expanded to include a wider range of features.

Many studies have presented mappings of musical features to emotions [12, 7, 8]. It should be noted that these lists focus mainly on western styles of music, but there is evidence to suggest that at least some emotions are universally recognizable in music [6].

Musical parameters corresponding to the emotions anger, happiness and sadness have been well documented in the literature, which tends to use the circumplex model of emotion. In the circumplex model, emotions are mapped onto a 2-dimensional plane according to the strength of their valence and arousal. Negative and positive valence correspond to a person feeling miserable or pleased, respectively. Arousal is a measure of the energy behind the emotion. Low arousal corresponds to a person feeling tired, calm or satisfied. High arousal corresponds to a person feeling alarmed, astonished or aroused.
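To make the circumplex mapping concrete, the short sketch below places the four emotions targeted by the empathy machine at rough valence and arousal coordinates. The numeric values are illustrative assumptions for this sketch only, not figures taken from the paper or from the circumplex literature.

# Illustrative sketch: approximate placement of the four target emotions on
# the circumplex (valence, arousal) plane. Coordinates are assumptions.
CIRCUMPLEX = {
    #            (valence, arousal), each in [-1, 1]
    "anger":     (-0.6,  0.8),   # negative valence, high arousal
    "fear":      (-0.7,  0.6),   # negative valence, high arousal
    "happiness": ( 0.8,  0.5),   # positive valence, fairly high arousal
    "sadness":   (-0.7, -0.6),   # negative valence, low arousal
}

def quadrant(emotion):
    """Return the (valence, arousal) quadrant an emotion falls into."""
    v, a = CIRCUMPLEX[emotion]
    return ("positive" if v >= 0 else "negative",
            "high" if a >= 0 else "low")

if __name__ == "__main__":
    for emotion in CIRCUMPLEX:
        print(emotion, quadrant(emotion))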

The works consulted did not mention musical parameters for the emotions of surprise and disgust, likely because these sentiments are difficult to express musically. Therefore, the empathy machine focuses on emotional expressions of fear, anger, happiness and sadness.

2.3 Emotional Music Synthesis

Emotional music synthesis (EMS) is the computer generation of music that corresponds to a recognizable emotion. Composers and artists use music to evoke emotion, and algorithmic musical composition is widely researched; however, the combination of the two is a relatively new field of musical composition. In EMS, musical parameters are purposefully manipulated to correspond to certain emotions. One example of EMS was presented in [5], in which a generator alters an existing piece of music to correspond to a mood.


In the work done by Wallis et al., an EMS generator was created that uses rule-based algorithmic composition to create music from scratch. The authors are able to generate emotional music that corresponds to the circumplex model of emotions [21]. By manipulating musical features like pitch, register, loudness, rhythmic roughness, tempo and articulation, the authors were able to generate music with emotional content. The presented music [29] corresponded well to the emotions it was supposed to emulate.

AthenaCL [?] is an example of algorithmic music composition software. In the past, it has been used for artistic installations, as well as by researchers as a computer-aided composition tool.

2.4 Facial Expression Recognition

The recognition of facial expressions by computers traces its roots to psychology's treatment of facial expressions. In 1969, Hjortsjo [10] described the idea of encoding movements of the face into basic units known as Action Units (AUs). The 1978 work by Ekman and Friesen [2] created a standard encoding method known as the Facial Action Coding System (FACS). The authors extended the functionality of FACS with a paper describing a method for determining facial expressions of emotion using the FACS coding [3]. The idea of using computers to automatically discern facial expressions was first proposed by Suwa et al. [25] in 1978. Samal and Iyengar [22] completed the first literature review focused on works that could contribute to the task of automatic classification of facial expressions in 1992. Early efforts that followed ([13, 16]) focused on identifying the 6 basic, universal emotions described by Ekman et al. [4] in 1987: happiness, surprise, sadness, fear, disgust and anger.

The first automatic facial expression recognition systems focused on determining the expression displayed in static images. Evidence from psychological studies suggests that the temporal dynamics of facial expressions are an important factor in understanding facial expressions [1]. Recent FER efforts have begun to examine the dynamic motion of a subject's face to label the temporal segments of an expression [14, 27] and determine whether expressions are spontaneous or posed [11].

Much of the work toward recognition of facial expressions and their temporal qualities has concentrated on geometric feature sets, which assess the position and motion of points on a face to distinguish action units from one another [14, 15, 26]. Valstar and Pantic [26] propose a set of 25 feature types that are classified using combinations of Support Vector Machines and Hidden Markov Models, with average F1-scores around 60%. This effort forms the basis for the FER subsystem in the empathy machine, which is described in detail in Section 3.1.

3. PROTOTYPE

The system prototype consists of two primary components: the FER system and the EMS system. The FER system is responsible for acquiring images, tracking faces, extracting features and classifying AUs and emotions from those features. The EMS system takes the emotion output from the FER system and is responsible for generating music clips to match the detected emotion. The following sections deal with the design of these subsystems.
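As a rough illustration of how the two subsystems connect, the following sketch wires them into a capture-classify-play loop. It assumes OpenCV is used for frame capture, and the functions recognize_emotion and play_emotional_music are hypothetical placeholders standing in for the FER and EMS subsystems described below; none of these names come from the actual implementation.

# Minimal sketch of the empathy machine's data flow, under the assumptions above.
import cv2  # OpenCV, used here only to grab camera frames

def run_empathy_machine(recognize_emotion, play_emotional_music, camera_id=0):
    """Continuously classify the partner's expression and play matching music."""
    capture = cv2.VideoCapture(camera_id)
    current_emotion = None
    try:
        while True:
            ok, frame = capture.read()
            if not ok:
                break
            emotion = recognize_emotion(frame)       # FER subsystem (Section 3.1)
            if emotion is not None and emotion != current_emotion:
                play_emotional_music(emotion)        # EMS subsystem (Section 3.2)
                current_emotion = emotion
    finally:
        capture.release()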

3.1 Facial Expression Recognition

Figure 3: A video sequence from the MMI database [28]. A male subject smiles and coloured lines show the change in distance between points from a neutral face (lighter represents a greater deformation). (a) The face is neutral and no deformation is visible. (b) The mouth begins to move and some change is evident. (c) Linkages around the mouth lighten; the smile takes form. (d) The smile is at its apex and the links are at their brightest.

The FER system is a modified version of the system described by Valstar and Pantic [26]. FaceTracker, an OpenCV-based algorithm for face registration and tracking based on Saragih et al. [23], is used to track 66 points on a face across a series of frames.

Features f_1 to f_5 from [26] are extracted from the FaceTracker data, with the first frame in the sequence considered the neutral face. f_1 and f_2 are the vertical and horizontal distances of every tracked point from its neutral position. f_3 is the interpoint distance in a given frame. f_4 is the change in interpoint distance from neutral and is depicted in Figure 3. f_5 is the interpoint angle in a given frame. These features are defined in Equation 1.

f_1(p_i, t) = p_{i,y}(t) - p_{i,y}(1)                                           (1a)

f_2(p_i, t) = p_{i,x}(t) - p_{i,x}(1)                                           (1b)

f_3(p_i, p_j, t) = \| p_i(t) - p_j(t) \|                                        (1c)

f_4(p_i, p_j, t) = f_3(p_i, p_j, t) - f_3(p_i, p_j, 1)                          (1d)

f_5(p_i, p_j, t) = \arctan\left( \frac{p_{i,y}(t) - p_{j,y}(t)}{p_{i,x}(t) - p_{j,x}(t)} \right)   (1e)
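A minimal sketch of these features in Python, assuming the tracked landmarks are available as a NumPy array of shape (T, 66, 2) with frame 0 as the neutral face; that array layout is an assumption made for this sketch.

# Geometric features of Equation 1, computed from tracked landmark positions.
import numpy as np

def f1(points, i, t):
    """Vertical displacement of point i from its neutral position (Eq. 1a)."""
    return points[t, i, 1] - points[0, i, 1]

def f2(points, i, t):
    """Horizontal displacement of point i from its neutral position (Eq. 1b)."""
    return points[t, i, 0] - points[0, i, 0]

def f3(points, i, j, t):
    """Distance between points i and j in frame t (Eq. 1c)."""
    return np.linalg.norm(points[t, i] - points[t, j])

def f4(points, i, j, t):
    """Change in inter-point distance relative to the neutral frame (Eq. 1d)."""
    return f3(points, i, j, t) - f3(points, i, j, 0)

def f5(points, i, j, t):
    """Angle of the line joining points i and j in frame t (Eq. 1e).
    Equation 1e takes the arctangent of the ratio; arctan2 avoids division by zero."""
    dy = points[t, i, 1] - points[t, j, 1]
    dx = points[t, i, 0] - points[t, j, 0]
    return np.arctan2(dy, dx)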

The results of Equation 1 are calculated for every point and pair of points. Then, a polynomial function of the form f(x) = ax^2 + bx + c is fit to the last N values of each feature to represent the feature over time. The values of a, b and c are then used as the feature set for classification.
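A sketch of that temporal summary, assuming NumPy's polyfit is used for the quadratic fit; the window length N = 10 is an arbitrary placeholder, as the paper does not state the value of N.

# Fit f(x) = a*x^2 + b*x + c to the last N values of a feature time series
# and keep (a, b, c) as the classifier input for that feature.
import numpy as np

def summarize_feature(values, n=10):
    """Return the quadratic coefficients describing the last n feature values."""
    window = np.asarray(values[-n:], dtype=float)
    x = np.arange(len(window))
    a, b, c = np.polyfit(x, window, deg=2)   # coefficients of a*x^2 + b*x + c
    return a, b, c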

The Python machine learning package scikit-learn is used to perform classification on the features. For each action unit, principal component analysis (PCA) is performed on the extracted features and a support vector machine (SVM) classifier is trained to determine whether that AU is active based on the principal components. A second classifier is trained to map AU activation to displayed emotion, using a multiclass one-versus-one SVM. The SVMs are trained using a one-versus-one strategy to distinguish between the 6 basic emotions and a neutral state.
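The following scikit-learn sketch shows one way this two-stage classification could be assembled: a PCA-plus-SVM pipeline per action unit, and a one-versus-one SVM mapping AU activations to emotion. The number of principal components, the kernels and other parameters are assumptions, not values reported in the paper.

# Two-stage classification sketch: per-AU PCA+SVM, then AU -> emotion SVM.
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

def train_au_classifiers(X, au_labels, n_components=20):
    """Train one PCA+SVM classifier per action unit.
    au_labels maps each AU number to a 0/1 activation array aligned with X."""
    classifiers = {}
    for au, y in au_labels.items():
        clf = make_pipeline(PCA(n_components=n_components), SVC(kernel="rbf"))
        clf.fit(X, y)
        classifiers[au] = clf
    return classifiers

def train_emotion_classifier(au_activations, emotions):
    """Map AU activation vectors to one of the 6 basic emotions or neutral.
    SVC handles multiclass problems with a one-versus-one strategy."""
    clf = SVC(kernel="linear", decision_function_shape="ovo")
    clf.fit(au_activations, emotions)
    return clf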

A diagram depicting the entire FER system is shown in Figure 4.

3.2 Music Generation

The feedback to the user regarding the emotional state of the associate was selected in the form of emotional music.


Figure 4: Flow of data in the facial expression recognition system: video frames pass through FaceTracker to feature extraction, then through one SVM per action unit (AU1, AU2, ...), and finally through the emotion SVM that outputs the detected emotion.

The music can be played at a low level and therefore interferes little with the social interaction between the user and the associate.

The two schools of thought about the effect of music on the listener have been considered. Assuming the music generator is able to represent an emotion adequately, one of two things will happen.

If music can evoke emotion (i.e. the 'emotivist' theory), the user's emotional state will be altered to feel with the associate. This represents an increase in the empathic feeling with the associate. Since the user's emotional state has been altered, the desired effect will have been achieved.

If the competing theory is true and music simply represents an emotion, the user will become actively aware of what the associate is feeling during their interaction. This represents an increase in situational empathy in the user.

Another advantage of using music that corresponds to emotion is the intuitive feedback to the user. The user needs only a little time to become familiar with the music feedback, reducing learning time.

The emotional music used to represent the emotions was computer generated. Computer-generated music lends itself to continuous generation and does not require in-depth knowledge of musical composition. The music was generated using the AthenaCL algorithmic music generator. Algorithmic composition tends to randomize the music, thereby reducing strain on the user's ears due to repetitive sounds.

Musical parameters for the emotional music were selected from the literature [12, 7, 30]. Musical roughness was controlled by the number of repeated notes within a fixed number of notes: the more repeated notes, the less rough the music was perceived to be.

The notes were chosen from various minor or major chords. Minor chords tend to sound 'darker', whereas major chords are associated with more joyful emotions. The following list describes the final parameters for the music following the first round of user testing; a summary of the parameters can be found in Table 1, and a code sketch of the resulting emotion-to-parameter mapping follows the list.

1. Anger. Music that corresponded to anger used a fast tempo and high pitch. The music created a rough rhythm and consisted of sharp notes (staccato). The instrument chosen was the piano and the Phrygian harmonic mode was selected.

2. Happiness. Music that corresponded to happiness or joy consisted of very high pitched notes, a high rhythmical roughness and a fast tempo. The instrument chosen was the orchestral harp (MIDI instrument #47) and the Lydian harmonic mode was selected.

3. Sadness. Music that corresponded to sadness consisted of long notes (legato) with low pitch and slow tempo. There was little rhythmical roughness and few notes were played. The instrument chosen was the piano, and the notes were drawn from minor chords.

4. Disgust. The literature consulted did not mention disgust as a separate emotion, so the music for disgust was designed mainly to be distinguishable from the other clips. The selected music was low pitched and its tempo alternated between brief fast periods and extended slow notes.

5. Fear. Music that corresponded to fear was implemented by exploiting the principle of atonality. One strategy for creating atonality is to play the notes of an octave in random order, with no note repeated until all the notes in the octave have been played. The music used high pitch and a moderately slow tempo. The instrument chosen was MIDI instrument #100 (FX 4 atmosphere) from the 'Synth Effects' instrument subsection. This gave the music a synthetic and unnatural feel.
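A minimal sketch of the resulting rule base, expressed as a Python mapping that follows Table 1, together with the atonal note-sequence strategy described for fear. The string encoding of the parameters and the helper names are assumptions made for this sketch; the actual generator drives AthenaCL rather than this data structure.

# Rule-based mapping from detected emotion to musical parameters (after Table 1).
import random

EMOTION_PARAMS = {
    "anger": {
        "tempo": "fast", "pitch": "high", "roughness": "few repeated notes",
        "instrument": "#2 Piano", "extras": ["staccato", "Phrygian mode"],
    },
    "joy": {
        "tempo": "fast", "pitch": "very high", "roughness": "few repeated notes",
        "instrument": "#47 Orchestral Harp", "extras": ["staccato", "Lydian mode"],
    },
    "sadness": {
        "tempo": "slow", "pitch": "low", "roughness": "repeated notes",
        "instrument": "#2 Piano", "extras": ["legato", "minor chords"],
    },
    "disgust": {
        "tempo": "slow", "pitch": "low", "roughness": None,
        "instrument": "#2 Piano, #33 Acoustic Bass", "extras": [],
    },
    "fear": {
        # Table 1 lists a fast tempo; the prose above describes it as moderately slow.
        "tempo": "fast", "pitch": "high", "roughness": None,
        "instrument": "#100 FX 4 Atmosphere", "extras": ["atonality"],
    },
}

def atonal_pitches(n_notes=24, base_midi_pitch=72):
    """Pitch sequence for 'fear': shuffle the 12 notes of an octave and never
    repeat a note until every note in the octave has been played.
    The base pitch and sequence length are arbitrary placeholders."""
    pitches = []
    while len(pitches) < n_notes:
        octave = list(range(base_midi_pitch, base_midi_pitch + 12))
        random.shuffle(octave)
        pitches.extend(octave)
    return pitches[:n_notes]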

4. EVALUATION

4.1 FER System Benchmarking

The performance of the FER system is evaluated using an F1-score. The F1-score is a combined measurement of precision (the proportion of positive predictions that are correct) and recall (the proportion of positive instances that are correctly predicted).
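Concretely, the F1-score is the harmonic mean of precision P and recall R:

F_1 = \frac{2PR}{P + R}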

The classifier is trained and tested by splitting the available data from the MMI database [28] into two equal sets. Evaluation is done by training the system using K-fold cross-validation on the training data and then testing the resulting system on the unused testing data. The results of this evaluation are shown in Table 2 and Table 3.
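A sketch of this evaluation protocol using scikit-learn, assuming a 50/50 split and K = 5; the scoring choices and parameters are placeholders rather than the exact settings used in the paper.

# Split the data in half, cross-validate on one half, report F1 on the held-out half.
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.metrics import f1_score

def evaluate(clf, X, y, k=5):
    """Return (mean cross-validation F1, held-out test F1) for a classifier."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.5, stratify=y, random_state=0)
    cv_scores = cross_val_score(clf, X_train, y_train, cv=k, scoring="f1_macro")
    clf.fit(X_train, y_train)
    test_f1 = f1_score(y_test, clf.predict(X_test), average="macro")
    return cv_scores.mean(), test_f1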

The results of classifier testing show that the system performs moderately well. The average F1-score is 37%, far lower than the 60% average of the system in [26]. There are a number of scores in the 60-70% range, and with some further adjustment of the classification parameters, the scores for the other action units might be improved. The performance of the AU-to-emotion classifier is quite a bit better, with its average over 50%.

4.2 Emotional Music User Study

The computer generated music was evaluated in a user study. The main goal of this study was to test whether the listener could discriminate between certain emotions. Two different tests were done in order to evaluate the strength of correlation between the music and the selected emotions.


Table 1: Association between the 5 emotions and musical features for the music generator

Emotion | Tempo | Pitch | Rhythm Roughness   | MIDI Instrument             | Additional Parameters
Anger   | Fast  | High  | Few repeated notes | #2 Piano                    | Staccato, Phrygian mode
Joy     | Fast  |       | Few repeated notes | #47 Orchestral Harp         | Staccato, Lydian mode
Sadness | Slow  | Low   | Repeated notes     | #2 Piano                    | Legato
Disgust | Slow  |       |                    | #2 Piano, #33 Acoustic Bass |
Fear    | Fast  | High  |                    | #100 FX 4 Atmosphere        | Atonality

Table 2: F1-scores for AU classification

AU  | Off  | On   | Avg
1   | 0.89 | 0.66 | 0.78
2   | 0.82 | 0.25 | 0.54
4   | 0.84 | 0.37 | 0.61
5   | 0.88 | 0.65 | 0.77
6   | 0.81 | 0.07 | 0.44
10  | 0.84 | 0.38 | 0.61
11  | 0.85 | 0.49 | 0.67
12  | 0.88 | 0.62 | 0.75
13  | 0.80 | 0.01 | 0.41
15  | 0.83 | 0.35 | 0.59
16  | 0.90 | 0.71 | 0.81
17  | 0.80 | 0.05 | 0.43
18  | 0.80 | 0.02 | 0.41
20  | 0.91 | 0.75 | 0.83
22  | 0.82 | 0.23 | 0.53
24  | 0.81 | 0.16 | 0.49
25  | 0.80 | 0.00 | 0.40
26  | 0.83 | 0.30 | 0.57
27  | 0.89 | 0.68 | 0.79
30  | 0.91 | 0.74 | 0.83
43  | 0.87 | 0.61 | 0.74
45  | 0.80 | 0.01 | 0.41
46  | 0.85 | 0.43 | 0.64
Avg | 0.84 | 0.37 | 0.61

Table 3: F1-scores for emotion classification

Emotion  | Count | Precision | Recall | F1
none     | 1335  | 0.97      | 0.93   | 0.95
anger    | 14    | 0.45      | 0.36   | 0.40
disgust  | 16    | 0.33      | 0.31   | 0.32
fear     | 14    | 0.30      | 0.43   | 0.35
joy      | 22    | 0.25      | 0.82   | 0.38
sadness  | 16    | 0.45      | 0.62   | 0.53
surprise | 20    | 0.59      | 0.80   | 0.68
average  |       | 0.48      | 0.61   | 0.52

Figure 5: Sample music evaluation form

The following five emotions were chosen for the final test: anger, fear, happiness, sadness and disgust. A high correlation between music and emotion would induce a certain emotion in the listener. Alternatively, a strong correlation would reduce the learning time needed to associate a specific sound clip with the corresponding emotion.

The participants in each test were asked to fill out a questionnaire while listening to a sound clip. A sample questionnaire is shown in Figure 5. Participants were also asked to rate how similar the sound clips were to each other on a scale from 1 to 10. The more distinct each sound clip, the less time the user would spend trying to figure out which emotion it corresponds to.

The first test involved four participants and a first batch of sound clips. Preliminary results showed a similarity between the clips of 7.5 out of 10. In order to improve this, additional instruments were added to make the music more distinct. The first test also included a 'confusion' emotion. During music evaluation, confusion was repeatedly the highest or second highest rated emotion; test subjects tended to favor it when they were unsure about a sound clip. For this reason, confusion was removed from the sound clip generation. In order to improve the remaining music, harmonic modes were implemented. The improved sound clips (parameters shown in Table 1) were then tested in the second test on six participants.

The results for the second test are shown in Figure 6, in the order in which the music clips were played. The best correlation between music and emotion was the happiness sound clip at 88%, with little cross correlation to other emotions. The second test shows good results for the emotions fear, anger, sadness and happiness. Low correlation was observed between the emotion of disgust and the corresponding music clip. This is, however, understandable, since there was little reference in the literature to music that corresponds to that emotion. Many participants also commented that they could not imagine music that would correspond to this emotion.


Figure 6: Results of the second user music test with a sample size of six people. Shades show the intended emotion of the clip (anger, joy, fear, sadness, disgust), and bars show the subjects' level of agreement (strongly disagree to strongly agree) that a given clip corresponds to a given perceived emotion.

In this case, the user would have to be trained to recognize the music and the corresponding emotion of the conversation partner.

The similarity between music clips was rated at 6.5 out of 10. This is a small improvement over the previous test; however, due to the small sample size, it may not be significant. The strong similarity is most likely explained by the fact that MIDI music was used. This can be improved in the future by implementing the music in more sophisticated music generation programs.

4.3 Audio-visual Emotion User Study

The audiovisual emotion study was designed to gauge the effectiveness of the empathy machine in a controlled setting. The study is intended to evaluate the following:

1. Does the system distract the user from what is happening in the video?

2. Does the user's situational empathy increase with the use of the empathy machine?

Six subjects were divided into a 5-person test group and a 1-person control. All 6 subjects filled out a pre-test survey asking them to rate their emotional state in terms of 4 emotions (anger, disgust, joy, and fear) on a 5-point scale. The subjects were then asked to watch a video clip of an interview with a grieving mother, with the conversation muted. For the test group, the sad-sounding music was played in the background, while the entire video was silent for the control. Subjects were then asked to re-complete the survey rating their emotional states. The change in the emotional state self-reports is shown in Figure 7 and is used to distinguish the effect of the music on the participants.

In addition to the emotion survey, participants were asked to recall details from the clips: the colour of the woman's shirt and hair, and the text in the bottom left corner of the screen. The difference in correct answers between the test group and the control is used as an indicator of the level of distraction caused by the music.

Figure 7: Self-reports of the change in test subjects' emotional states (from much less to much more, for anger, disgust, joy and sadness) while watching the video, for the test and control groups.

On average, the test subjects responded correctly to 2 of the 3 questions, while the control subject responded correctly to only 1 question.

Finally, the users were asked to identify the emotional state of the people in the video to determine whether the difference in the reported emotional empathy is due to a difference in the subjects' cognitive ability to recognise emotion. Results from this survey are shown in Figure 8.

It is important to consider that with a sample size of only 6 and only a single subject acting as a control, the test should not be considered rigorous. Furthermore, the tests were not designed by people trained in emotional evaluation, and the self-reporting methods used may be subject to bias. That said, some important trends seem to be evident. The control and test subjects seem to have a similar cognitive understanding of the display of emotion in the video. They vary widely, however, in the change of emotional state: the emotional states of the test subjects changed in the direction of the perceived emotional state of the video subjects, while the control subject's emotional state changed in the opposite direction. This suggests that the generated music is an effective empathetic amplifier, though more thorough studies would be required to confirm this.

The music does not appear to distract users, as the control and test groups scored similarly on the recall portion of the test. With this result, it is important to consider that the recall test focused on input to the visual channel. Further testing is required to determine whether the music interferes with users' auditory attention, although it is likely that more interference would be seen in the auditory channel.

5. CONCLUSIONS

Drawing from the fields of psychology, art and engineering, this paper has established a basis for a system that augments the empathetic response in interpersonal interactions. The design of a prototype for such a system has been documented and tested on a small base of potential users.

Preliminary results show that users are generally able to distinguish computer-generated angry, joyful, fearful and sad music clips from each other, but may have difficulty with music that is intended to evoke disgust.


Figure 8: Test subjects' responses to statements affirming the emotional state of the woman in the video as angry, disgusted, joyous and sad, rated from strongly disagree to strongly agree, for the test and control groups.

User testing has also indicated that, in a highly controlled environment, emotional music matched to the image of a visually emotional subject can increase the empathetic response of a viewer, when compared to the same visual cues without music. This is a positive development, but whether the same effect can be seen in a non-sterile environment remains to be tested.

Overall, the empathy machine seems to be feasible in controlled or semi-controlled environments, such as at a table in a coffee shop. Use in day-to-day interactions would require further development of the facial expression recognition system and the emotional music system to render them robust enough for regular use.

References

[1] P. Ekman and E. L. Rosenberg. What the Face Reveals: Basic and Applied Studies of Spontaneous Expression Using the Facial Action Coding System (FACS). Oxford University Press, USA, 1997.

[2] Paul Ekman and W. V. Friesen. Facial Action Coding System: A Technique for the Measurement of Facial Movement, 1978.

[3] Paul Ekman and W. V. Friesen. EMFACS Facial Coding Manual. Human Interaction Laboratory, San Francisco, 1983.

[4] Paul Ekman, W. V. Friesen, M. O'Sullivan, Anthony Chan, I. Diacoyanni-Tarlatzis, Karl Heider, Rainer Krause, W. A. LeCompte, T. Pitcairn, P. E. Ricci-Bitti, and others. Universals and cultural differences in the judgments of facial expressions of emotion. Journal of Personality and Social Psychology, 53(4):712, 1987.

[5] Anders Friberg. Mapping Gestural Input to Musical Output: pDM: An Expressive Sequencer with Real-Time Control of the KTH Music-Performance Rules. Computer Music Journal, 30(1):37-48, 2006. ISSN 0148-9267.

[6] T. Fritz, S. Jentschke, and N. Gosselin. Universal recognition of three basic emotions in music. Current Biology, 2009.

[7] Alf Gabrielsson and Erik Lindstrom. The influence of musical structure on emotional expression. In Music and Emotion: Theory and Research, Series in Affective Science, pages 223-248, 2001.

[8] S. Le Groux. Interactive sonification of the spatial behavior of human and synthetic characters in a mixed-reality environment. Proceedings of the 10th . . . , 2007.

[9] Kate Hevner. Experimental studies of the elements of expression in music. The American Journal of Psychology, 48(2):246, April 1936. ISSN 0002-9556. doi: 10.2307/1415746.

[10] C. H. Hjortsjo. Manniskans ansikte och det mimiska spraket. Studentliteratur, Lund, Sweden, 1969.

[11] M. Hoque, L. P. Morency, and R. Picard. Are you friendly or just polite? Analysis of smiles in spontaneous face-to-face interactions. Affective Computing and Intelligent Interaction, pages 135-144, 2011.

[12] Patrik Juslin and Petri Laukka. Expression, perception, and induction of musical emotions: A review and a questionnaire study of everyday listening. Journal of New Music Research, 33(3):217-238, September 2004. ISSN 0929-8215. doi: 10.1080/0929821042000317813.

[13] H. Kobayashi and F. Hara. The recognition of basic facial expressions by neural network. In Proceedings of the 1991 IEEE International Joint Conference on Neural Networks, pages 460-466. IEEE, 1991. ISBN 0-7803-0227-3. doi: 10.1109/IJCNN.1991.170444.

[14] M. Pantic and I. Patras. Detecting facial actions and their temporal segments in nearly frontal-view face image sequences. In Systems, Man and Cybernetics, 2005 IEEE International Conference on, volume 4, pages 3358-3363. IEEE, 2005.

[15] Maja Pantic and Ioannis Patras. Dynamics of facial expression: recognition of facial actions and their temporal segments from face profile image sequences. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 36(2):433-449, April 2006. ISSN 1083-4419.

[16] Maja Pantic and L. J. M. Rothkrantz. Toward an affect-sensitive multimodal human-computer interaction. Proceedings of the IEEE, 91(9):1370-1390, September 2003. ISSN 0018-9219. doi: 10.1109/JPROC.2003.817122.

[17] P. Philippot. The Social Context of Nonverbal Behavior. 1999.

[18] S. D. Preston and F. B. M. De Waal. Empathy: Its ultimate and proximate bases. Behavioral and Brain . . . , 2002.

[19] Blackwell Publishing. Emotions and Music: A Reply to the Cognitivists. Art Criticism, 47(1):69-76, 2012.


[20] N. S. Rickard. Intense emotional responses to music: a test of the physiological arousal hypothesis. Psychology of Music, 32(4):371-388, October 2004. ISSN 0305-7356. doi: 10.1177/0305735604046096.

[21] James A. Russell. A circumplex model of affect. Journal of Personality and Social Psychology, 39(6):1161-1178, 1980. ISSN 0022-3514. doi: 10.1037/h0077714.

[22] A. Samal and P. A. Iyengar. Automatic recognition and analysis of human faces and facial expressions: A survey. Pattern Recognition, 25(1):65-77, 1992.

[23] J. M. Saragih, S. Lucey, and J. F. Cohn. Deformable model fitting by regularized landmark mean-shift. International Journal of Computer Vision, pages 1-16, 2011.

[24] Klaus R. Scherer and M. R. Zentner. Emotional effects of music: Production rules. Music and Emotion: Theory and Research, 2001.

[25] M. Suwa, N. Sugie, and K. Fujimora. A preliminary note on pattern recognition of human emotional expression. In Proceedings of the 4th International Joint Conference on Pattern Recognition, pages 408-410, Kyoto, Japan, 1978.

[26] M. F. Valstar and M. Pantic. Combined support vector machines and hidden Markov models for modeling facial action temporal dynamics. In Proceedings of the 2007 IEEE International Conference on Human-Computer Interaction, pages 118-127. Springer-Verlag, 2007.

[27] Michel Valstar and Maja Pantic. Fully automatic facial action unit detection and temporal analysis. In Computer Vision and Pattern Recognition Workshop, 2006 Conference on, volume 3, pages 149-149, 2006.

[28] Michel Valstar and Maja Pantic. Induced disgust, happiness and surprise: an addition to the MMI facial expression database. In Proc. Int'l Conf. Language Resources and . . . , pages 65-70, 2010.

[29] I. Wallis, T. Ingalls, and E. Campana. Computer generating emotional music: The design of an affective music algorithm. In DAFx 08, 2008.

[30] Isaac Wallis, Todd Ingalls, and Ellen Campana. A rule-based generative music system controlled by desired valence and arousal. SMC 11, 2011.

APPENDIX

A. SUPPLEMENTAL MULTIMEDIA

Multimedia examples of the work can be found at http://acis.ok.ubc.ca/?page_id=517