

A Survey on Mobile Affective Computing

Shengkai Zhang and Pan Hui

Department of Computer Science and Engineering
The Hong Kong University of Science and Technology

{szhangaj, panhui}@cse.ust.hk

Abstract—This survey presents recent progress on Affective Computing (AC) using mobile devices. AC has been one of the most active research topics for decades. The primary limitation of traditional AC research is what has been referred to as the assumption of impermeable emotions. This criticism is prominent when emotions are investigated outside social contexts. It is problematic because some emotions are directed at other people and arise from interactions with them. The development of smart mobile wearable devices (e.g., Apple Watch, Google Glass, iPhone, Fitbit) enables in-the-wild, natural studies of AC from a computer science perspective. This survey emphasizes AC studies and systems that use smart wearable devices. Various models, methodologies and systems are discussed in order to examine the state of the art. Finally, we discuss remaining challenges and future work.

Index Terms—Affective computing, mobile cloud computing, wearable devices, smartphones

I. INTRODUCTION

Affective Computing (AC) aspires to narrow the gap between the highly emotional human and the emotionally challenged machine by developing computational systems that can recognize, respond to and express emotions [1]. Long-term research on emotion theory in psychology and neuroscience [2]–[4] has confirmed that emotion has a significant effect on human communication, decision making, perception and so on. In other words, emotion is an important part of intelligence. Although research on artificial intelligence and machine learning has made great progress, computer systems still lack proactive interaction with humans because they do not understand users’ implicit affective states. Therefore, computer scientists have put more and more effort into AC to fulfill the dream of constructing a highly intelligent computer system that enriches intelligent and proactive human-computer interaction.

Existing works detect and recognize affect in three major steps: (1) collecting affect-related signals; (2) using machine learning and pattern recognition techniques to classify emotions; (3) evaluating the performance against benchmarks. To evaluate the performance and effectiveness of models and methods, many multimedia emotion databases have been proposed [5]–[13]. However, such databases were built on “artificial” material of deliberately expressed emotions, either by asking subjects to perform a series of emotional expressions in front of a camera and/or microphone or by using film clips to elicit emotions in the lab [14]. Research suggests that deliberate behavior differs from spontaneously occurring behavior. [15] shows that posed emotions in spoken language may differ in the choice of words and timing from corresponding performances in natural settings. [16] addresses facial behavior, demonstrating that spontaneous and deliberately displayed facial behavior differ both in the facial muscles used and in their dynamics.

Hence, there is a great need for data that include spontaneous displays of affective behavior. Several studies have emerged on the analysis of spontaneous facial expressions [17]–[20] and vocal expressions [21], [22]. These studies recruited a few participants to record their speech and facial expressions over a certain period (several days to months). The major problem is the scale of the databases. Recently, researchers have addressed this problem by exploiting crowdsourcing [23]–[28]. Basically, these works designed a mechanism (e.g., a game [27], a web search engine [29]) to collect users’ affective states on a large scale. [25] designed an Amazon Mechanical Turk [30] setup to generate affective annotations for a corpus.

An alternative way to address the problem and create new research opportunities for affective computing is to combine smart mobile wearable devices (e.g., Apple Watch, Google Glass, iPhone, Fitbit) with AC. We simply call this Mobile Affective Computing (MAC). Such devices are widely used around the world. For example, in the third quarter of 2012, one billion smartphones were in use worldwide [31]. The key feature of smart devices is their abundant sensors, which enable unobtrusive monitoring of various affect-related signals. Exploring the use of smart mobile devices for AC can bring benefits from at least three perspectives, owing to the potential for long-term unobtrusive monitoring of users’ affective states:

• Influence the affect-related research literature through in-the-wild, natural and unobtrusive studies

• Establish spontaneous affect databases efficiently to evaluate new AC methods, models and systems more accurately

• Enhance user-centered human-computer interaction (HCI) design for future ubiquitous computing environments [1], [32]–[35]

It is important to note that computer scientists and engineers can also contribute to emotion research. Scientific research in the area of emotion stretches back to the 19th century, when Charles Darwin [2], [36] and William James [3] proposed theories of emotion. Despite the long history of emotion theory research, Parkinson [37] contends that traditional emotion theory research has made three problematic assumptions while studying individual emotions: (1) emotions are instantaneous; (2) emotions are passive; (3) emotions are impermeable. These assumptions simplify the considerably more complex story in reality. There is considerable merit in the argument that some affective phenomena cannot be understood at the level of a small group of people and must always be studied as a social process [37]–[42]. With the widespread usage and rich sensors of smart wearable devices, we are able to verify emotion theories by designing and implementing MAC systems to conduct in-the-wild, natural and unobtrusive studies.

arXiv:1410.1648v2 [cs.HC] 11 Oct 2014

Establishing large-scale, trustworthy labeled data of human affective expressions is a prerequisite for designing an automatic affect recognizer. Although crowdsourcing provides a promising method to generate large-scale affect databases, it is either expensive (participants must be paid) or error prone. In addition, it cannot monitor human affective events in daily life and thus fails to establish spontaneous affect databases. Current spontaneous emotional expression databases used as ground truth are established by manually labeling movies, photos and voice records, which is very time consuming and expensive. MAC research [43], [44] has designed software systems on smartphones that run continuously in the background to monitor users’ mood and emotional states. With properly designed mechanisms to collect feedback from users, we can establish large-scale spontaneous affect databases efficiently and at very low cost.

Furthermore, ubiquitous computing environments have shifted human-computer interaction (HCI) design from computer-centered to human-centered [1], [32]–[35]. Mobile HCI designs currently involve mobile interface devices such as smartphones, smart wristbands and smart watches. These devices transmit explicit messages while ignoring implicit information about users, i.e., their affective state. According to emotion theory, the user’s affective state is a fundamental component of human-human communication and motivates human actions. Consequently, a mobile HCI system that understands the user’s affective states would be able to perceive the user’s affective behaviour and to initiate interactions intelligently rather than simply responding to the user’s commands.

This paper introduces and surveys these recent advances in AC research. In contrast to previously published survey papers and books in the field [33], [45]–[68], it focuses on approaches and systems for unobtrusive emotion recognition using human-computer interfaces and on AC using smart mobile wearable devices. It discusses new research opportunities and examines state-of-the-art methods that have not been reviewed in previous survey papers. Finally, we discuss the remaining challenges and outline future research avenues for MAC.

This paper is organized as follows. Section II describes the significance of AC. Section III reviews unobtrusive emotion recognition based on human-computer interfaces. Section IV introduces the sensing abilities of mobile devices relevant to AC research and reviews in detail the related studies that use smart mobile devices. Section V outlines challenges and research opportunities for MAC. A summary and closing remarks conclude this paper.

II. AFFECTIVE COMPUTING

Affective computing allows computers to recognize, express, and “have” emotions [1]. It is human-computer interaction in which a device has the ability to detect and appropriately respond to its user’s emotions and other stimuli.

Affective computing takes its name from psychology, in which “affect” is a synonym for “emotion”. Giving emotions to computers may seem absurd. Marvin Minsky, however, wrote in The Society of Mind: “The question is not whether intelligent machines can have any emotions, but whether machines can be intelligent without emotions” [69]. Scientists agree that emotion is not logic, and strong emotions can impair rational decision making. Acting emotionally implies acting irrationally, with poor judgment. Computers are supposed to be logical, rational, and predictable. These are regarded as the very foundations of intelligence, and have been the focus of computer scientists seeking to build an intelligent machine. At first blush, emotions seem like the last thing we would want in an intelligent machine.

However, this negative aspect of emotion is less than half of the story. Today we have evidence that emotions are an active part of intelligence, especially in perception, rational thinking, decision making, planning, creativity, and so on. They are critical in social interactions, to the point where psychologists and educators have redefined intelligence to include emotional and social skills. Emotion has a critical role in cognition and in human-computer interaction. In this section, we demonstrate the importance of affect and how it can be incorporated into computing.

A. Background and motivation

Emotions are important in human intelligence, rational decision making, social interaction, perception, memory and more. Emotion theories can be largely examined in terms of two components: 1) emotions are cognitive, emphasizing their mental component; 2) emotions are physical, emphasizing their bodily component. The cognitive component focuses on understanding the situations that give rise to emotions. For example, “My brother is dead, therefore, I am sad.” The physical component focuses on the physiological response that occurs with an emotion. For example, your heart rate increases when you are nervous.

The physical component helps computers understand the mappings between emotional states and emotional expressions so that the former can be inferred from the latter. Physical aspects of emotion include facial expression, vocal intonation, gesture, posture, and other physiological responses (blood pressure, pulse, etc.). Nowadays, the abundant sensors on computers, on mobile devices or worn by humans are able to see, hear, and sense other responses from people. The cognitive component helps computers perceive and reason about situations in terms of the emotions they raise. By observing human behaviour and analyzing a situation, a computer can infer which emotions are likely to be present.

With computers able to recognize, express and “have” emotions, at least entertainment, learning and social development can benefit. In entertainment, a music application can select particular music to match a person’s present mood or to modify that mood. In learning, an affective computer tutor [70] can teach more effectively by attending to the student’s interest, pleasure, and distress. In social development, an affect-sensitive system can instruct people who lack social communication skills to act in a more friendly way. More sophisticated applications can be developed by recognizing the user’s affect, reasoning with emotional cues, and understanding how to respond intelligently given the user’s situation. We next examine the key technologies of AC in more depth.

B. Existing technologies

An affective computing system consists of affective data collection, affect recognition, affect expression, etc. Currently, researchers focus on affect detection and recognition. A computing device could gather cues to user emotion from a variety of sources.

This paper focuses on reviewing the recent progress of Mobile Affective Computing (MAC), that is, affective computing studies that use mobile devices. The biggest difference between traditional AC research and MAC is the data collection method. We therefore review typical works in terms of the different signals used for AC. According to previous studies, the following signals can indicate a user’s emotional state and can be detected and interpreted by computing devices: facial expressions, posture, gestures, speech, the force or rhythm of key strokes, and physiological signal changes (e.g., EEG). Table I summarizes the traditional data collection methods in terms of different signals.

Signals             Collection methods       Typical references
Facial expression   Camera                   [17], [18], [71]–[75]
Voice & speech      Microphone               [7], [21], [76]–[81]
Gesture & posture   Pressure sensors         [82]–[85]
Key strokes         Keyboard                 [86]–[91]
EEG [92]            EEG sensors              [93]–[96]
Text                Human writing material   [97]–[100]

TABLE I
TRADITIONAL SIGNAL COLLECTION METHODS.

The majority of affect detection research has focused on facial expressions, since this signal is very easy to capture with a camera. Microphones are commonly built into computers, smartphones, etc. to record speech and vocal expressions. Gesture and posture can offer information that is sometimes unavailable from conventional nonverbal measures such as the face and speech. Traditionally, researchers install pressure sensors on a pad, keyboard, mouse, chair, etc. to analyze the temporal transitions of posture patterns. The pressure-sensitive keyboard used in [86] is the one described by Dietz et al. [101]. For each keystroke, the keyboard provides readings from 0 (no contact) to 254 (maximum pressure). Implementing a custom-made keyboard logger allows researchers to gather the pressure readings for keystroke pattern analysis. EEG studies often use special hardware to record EEG signals. For example, AlZoubi et al. [93] used a wireless sensor headset developed by Advanced Brain Monitoring, Inc. (Carlsbad, CA). It provides an integrated hardware and software solution for acquisition and real-time analysis of the EEG, and it has demonstrated feasibility for acquiring high-quality EEG in real-world environments including workplace, classroom and military operational settings.

With the data collected by these methods and tools, affect detection studies choose appropriate feature selection techniques and classifiers based on the emotion model used (dimensional [102], [103] or categorical [61]). The performance of these systems and algorithms can be evaluated by subjective self-reports, expert reports and standard databases. In a subjective self-report, participants in the experiment are asked to report their affective states periodically, while an expert report relies on affect recognition experts or peers to report the participants’ affective states from records (videos, photos, etc.). The third evaluation approach relies on existing affect databases. Researchers test algorithms and systems using the data from these labeled databases to examine the classification accuracy.

However, such databases were usually created from deliberate emotions, which differ in visual appearance, audio profile, and timing from spontaneously occurring emotions. In addition, the experiments of traditional AC research are obtrusive, in that participants must be recruited to take part under controlled conditions. Two obvious limitations are: 1) the data scale is limited, since it is not possible to collect affect-related data from a large population; 2) the experiment duration is limited, since it is not possible to involve participants in an experiment for too long.

With the advent of powerful smart devices, e.g., smartphones, smart glasses, smart watches, smart wristbands and so on, AC will enter a new era that eliminates the limitations above, for three reasons:

• Widespread use, in that as of 2013, 65 percent of U.S. mobile consumers own smartphones. The European mobile device market as of 2013 is 860 million. In China, smartphones represented more than half of all handset shipments in the second quarter of 2012 [104];

• “All-the-time” usage, in that 3 in 4 smartphone owners check them when they wake up in the morning, while up to 40% check them every 10 minutes. When they go on vacation, 8 in 10 owners say they take their devices along and use them all the time, compared to slightly more than 1 in 10 who take them but rarely use them [105];

• Powerful sensors and networking, in that smart devices boast an array of sensors, which include an accelerometer, light sensor, proximity sensor and more. The advantage of using smart device sensors is that the device itself can collect the sensor output, then store and process it locally or communicate it to a remote server [106].

The next section reviews state-of-the-art research on unobtrusive emotion recognition.

III. UNOBTRUSIVE EMOTION RECOGNITION

This section elaborates on related work on emotion recognition based on human-computer interfaces such as the keyboard and webcam.

Unlike the settings assumed by existing emotion recognition technologies, users often operate a device with an expressionless face or in silence, and they do not want to feel any burden associated with the recognition process. There have been attempts to satisfy these requirements, that is, to recognise the user’s emotions inconspicuously and at minimum cost.


Poh et al. [107] presented a simple, low-cost method for measuring multiple physiological parameters using a basic webcam. They applied independent component analysis (ICA) to the color channels of facial images captured by a webcam and extracted various vital signs such as heart rate (HR), respiratory rate, and HR variability. To demonstrate the light weight and applicability of their approach, they used a common webcam as the sensing unit. Their approach showed significant potential for affective computing, because bio-signals such as HR correlate closely with emotion and the method does not require any attention from the user.

Independent component analysis (ICA) is a relatively new technique for uncovering independent signals from a set of observations that are composed of linear mixtures of the underlying sources. The underlying source signal of interest in the study is the blood volume pulse (BVP) that propagates throughout the body. During the cardiac cycle, volumetric changes in the facial blood vessels modify the path length of the incident ambient light, such that the subsequent changes in the amount of reflected light indicate the timing of cardiovascular events. When a video of the facial region is recorded with a webcam, the red, green, and blue (RGB) colour sensors pick up a mixture of the reflected plethysmographic signal along with other sources of fluctuations in light due to artefacts. Given that hemoglobin absorptivity differs across the visible and near-infrared spectral range, each color sensor records a mixture of the original source signals with slightly different weights. The observed signals from the RGB color sensors are denoted by $y_1(t)$, $y_2(t)$, $y_3(t)$, respectively, which are the amplitudes of the recorded signals at time point $t$. The three underlying source signals are represented by $x_1(t)$, $x_2(t)$, $x_3(t)$. The ICA model assumes that the observed signals are linear mixtures of the sources, i.e.,

$$y(t) = A\,x(t)$$

where the column vectors $y(t) = [y_1(t), y_2(t), y_3(t)]^T$ and $x(t) = [x_1(t), x_2(t), x_3(t)]^T$, and the square $3 \times 3$ matrix $A$ contains the mixture coefficients $a_{ij}$. The aim of ICA is to find a demixing matrix $W$ that approximates the inverse of the original mixing matrix $A$ and whose output

$$\hat{x}(t) = W\,y(t)$$

is an estimate of the vector $x(t)$ containing the underlying source signals. To uncover the independent sources, $W$ must maximize the non-Gaussianity of each source. In practice, iterative methods are used to maximize or minimize a given cost function that measures non-Gaussianity.
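The idea of choosing a demixing transform that maximizes non-Gaussianity can be illustrated with a minimal sketch. It is not the algorithm used in [107]: it assumes only two observed channels instead of three, synthetic sources (a 1.2 Hz sine standing in for the pulse and uniform noise standing in for artefacts), a hypothetical mixing matrix, and a simple grid search over rotation angles with absolute excess kurtosis as the non-Gaussianity measure. All function names are illustrative.

```python
import math
import random

def center(x):
    """Subtract the sample mean."""
    m = sum(x) / len(x)
    return [v - m for v in x]

def whiten(y1, y2):
    """Cholesky-based whitening: the outputs have identity sample covariance."""
    y1, y2 = center(y1), center(y2)
    n = len(y1)
    a = sum(v * v for v in y1) / n
    b = sum(p * q for p, q in zip(y1, y2)) / n
    d = sum(v * v for v in y2) / n
    l11 = math.sqrt(a)
    l21 = b / l11
    l22 = math.sqrt(d - l21 * l21)
    z1 = [v / l11 for v in y1]
    z2 = [(q - l21 * p) / l22 for p, q in zip(z1, y2)]
    return z1, z2

def excess_kurtosis(u):
    """Excess kurtosis of a zero-mean, unit-variance signal (0 for a Gaussian)."""
    return sum(v ** 4 for v in u) / len(u) - 3.0

def rotate(z1, z2, theta):
    c, s = math.cos(theta), math.sin(theta)
    u1 = [c * p + s * q for p, q in zip(z1, z2)]
    u2 = [-s * p + c * q for p, q in zip(z1, z2)]
    return u1, u2

def ica_two_channels(y1, y2, steps=180):
    """Whiten, then pick the rotation that maximizes total non-Gaussianity."""
    z1, z2 = whiten(y1, y2)
    best = None
    for k in range(steps):
        theta = (math.pi / 2) * k / steps
        u1, u2 = rotate(z1, z2, theta)
        score = abs(excess_kurtosis(u1)) + abs(excess_kurtosis(u2))
        if best is None or score > best[0]:
            best = (score, u1, u2)
    return best[1], best[2]

def abs_corr(a, b):
    """Absolute Pearson correlation (ICA recovers sources up to sign and scale)."""
    a, b = center(a), center(b)
    num = sum(p * q for p, q in zip(a, b))
    den = math.sqrt(sum(p * p for p in a) * sum(q * q for q in b))
    return abs(num / den)

# Synthetic example: all values below are assumptions for illustration.
random.seed(0)
fs, n = 30.0, 1000
pulse = [math.sin(2 * math.pi * 1.2 * i / fs) for i in range(n)]
noise = [random.uniform(-1.0, 1.0) for _ in range(n)]
y1 = [0.8 * p + 0.3 * q for p, q in zip(pulse, noise)]  # hypothetical mixing row
y2 = [0.4 * p + 0.7 * q for p, q in zip(pulse, noise)]  # hypothetical mixing row
u1, u2 = ica_two_channels(y1, y2)
```

After whitening, any remaining ambiguity is a rotation, which is why a one-parameter search suffices in the two-channel case. The recovered components match the sources only up to order, sign and scale, so they are compared with the originals through absolute correlation.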

This study was approved by the Institutional Review Board, Massachusetts Institute of Technology. The sample featured 12 participants of both genders (four females), different ages (18–31 years) and skin colours. All participants provided informed consent. The experiments were conducted indoors, with a varying amount of ambient sunlight entering through windows as the only source of illumination. Participants were seated at a table in front of a laptop at a distance of approximately 0.5 m from the built-in webcam (iSight camera). During the experiment, participants were asked to keep still, breathe spontaneously, and face the webcam while their video was recorded for one minute.

With the proper experimental design, they demonstrated the feasibility of using a simple webcam to measure multiple physiological parameters. These include vital signs, such as HR and respiratory rate (RR), as well as correlates of cardiac autonomic function through HR variability (HRV). The data demonstrate a strong correlation between the parameters derived from webcam recordings and those from standard reference sensors. Regarding the choice of measurement epoch, a recording of 1–2 min is needed to assess the spectral components of HRV, and an averaging period of 60 beats improves the confidence in the single timing measurement from the BVP waveform [108]. The face detection algorithm is subject to head rotation limits.
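One simple way to turn a recovered pulse waveform into a heart rate is to look for the dominant frequency within a plausible cardiac band. The sketch below is an illustration under stated assumptions, not a reproduction of the processing in [107]: the signal is synthetic (a 1.2 Hz pulse plus a slower 0.2 Hz respiration-like component), the 30 Hz frame rate and the 0.75 to 4 Hz search band are assumed values, and `dominant_frequency_hz` is a hypothetical helper that evaluates a direct Fourier sum on a frequency grid.

```python
import math

def dominant_frequency_hz(signal, fs, f_lo=0.75, f_hi=4.0, step=0.01):
    """Return the grid frequency in [f_lo, f_hi] with the largest spectral power."""
    n = len(signal)
    mean = sum(signal) / n
    x = [v - mean for v in signal]          # remove the DC offset
    steps = int(round((f_hi - f_lo) / step))
    best_f, best_p = f_lo, -1.0
    for k in range(steps + 1):
        f = f_lo + k * step
        w = 2 * math.pi * f / fs            # radians per sample
        re = sum(v * math.cos(w * i) for i, v in enumerate(x))
        im = sum(v * math.sin(w * i) for i, v in enumerate(x))
        power = re * re + im * im
        if power > best_p:
            best_f, best_p = f, power
    return best_f

# Synthetic one-minute recording (assumed 30 frames per second).
fs = 30.0
n = int(60 * fs)
signal = [
    math.sin(2 * math.pi * 1.2 * i / fs)          # pulse at 1.2 Hz
    + 0.5 * math.sin(2 * math.pi * 0.2 * i / fs)  # slower respiration-like drift
    for i in range(n)
]
hr_bpm = 60.0 * dominant_frequency_hz(signal, fs)
```

Restricting the search to a cardiac band keeps the slower component from being mistaken for the pulse. HRV measures would additionally need beat-to-beat intervals, which this sketch does not compute.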

Epp et al. [87] proposed a solution to determine the emotions of computer users by analyzing the rhythm of their typing patterns on a standard keyboard. This work highlights two problems with the system approaches mentioned above: they can be invasive and can require costly equipment. Their solution is to detect users’ emotional states through their typing rhythms on a common computer keyboard. Called keystroke dynamics, this is an approach from user authentication research that shows promise for emotion detection in human-computer interaction (HCI). Identifying emotional state through keystroke dynamics addresses the problems of previous methods by using inexpensive, standard equipment that is also non-intrusive to the user. To investigate the efficacy of keystroke dynamics for determining emotional state, they conducted a field study that gathered keystrokes as users performed their daily computer tasks. Using an experience-sampling approach, users labeled the data according to their level of agreement with 15 emotional states and provided additional keystrokes by typing fixed pieces of text. This approach allowed users’ emotions to emerge naturally, with minimal influence from the study and without emotion elicitation techniques.

From the raw keystroke data, they extracted a number of features derived mainly from key duration (dwell time) and key latency (flight time). They created decision-tree classifiers for 15 emotional states using the derived feature set. They successfully modelled six emotional states (confidence, hesitance, nervousness, relaxation, sadness, and tiredness) with accuracies ranging from 77.4% to 87.8%. They also identified two emotional states (anger and excitement) that showed potential for future work in this area.
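The two basic timing features can be sketched as follows. This is a minimal illustration, not the feature set of [87]: it assumes each keystroke is logged with a press and a release timestamp in milliseconds, defines flight time as the gap between releasing one key and pressing the next (one common convention), and uses a hypothetical `KeyEvent` record with made-up timings.

```python
from typing import List, NamedTuple

class KeyEvent(NamedTuple):
    """One logged keystroke (hypothetical record layout)."""
    key: str
    press_ms: float
    release_ms: float

def dwell_times(events: List[KeyEvent]) -> List[float]:
    """Key duration: how long each key was held down."""
    return [e.release_ms - e.press_ms for e in events]

def flight_times(events: List[KeyEvent]) -> List[float]:
    """Key latency: time from releasing one key to pressing the next."""
    return [b.press_ms - a.release_ms for a, b in zip(events, events[1:])]

def mean(values: List[float]) -> float:
    return sum(values) / len(values)

# Made-up timings for three consecutive keystrokes.
events = [
    KeyEvent("h", 0.0, 90.0),
    KeyEvent("i", 150.0, 230.0),
    KeyEvent("!", 400.0, 470.0),
]
dwell = dwell_times(events)     # [90.0, 80.0, 70.0]
flight = flight_times(events)   # [60.0, 170.0]
features = {"mean_dwell_ms": mean(dwell), "mean_flight_ms": mean(flight)}
```

Summary statistics such as these means could then serve as inputs to a classifier; which statistics are used and over which key combinations is a design choice that this sketch leaves open.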

Zimmermann et al. [109] developed and evaluated a method to measure affective states through motor-behavioural parameters from the standard input devices, mouse and keyboard. The experiment was designed to investigate the influence of induced affect on motor-behaviour parameters while completing a computer task. Film clips were used as affect elicitors. The task was an online-shopping task that required participants to shop on an e-commerce website for office supplies. 96 students (46 female, 50 male, aged between 17 and 38) participated in the experiment. During the experiment, all mouse and keyboard actions were recorded to a log file by software running in the background of the operating system, invisible to the subject. Log-file entries contained the exact time and type of each action (e.g., mouse button down, mouse position x and y coordinates, which key was pressed).


After subjects arrived at the laboratory, sensors and respiratory measurement bands were attached and connected, and the subjects were then asked to complete subject and health data questionnaires on the computer. All instructions during the experiment were given at the appropriate stages on the computer interface. The procedure was automated (as shown in Fig. 1). The participants first familiarized themselves with the online-shopping task and then indicated their mood on the graphical and verbal differentials. Afterwards, the first movie clip, expected to be affectively neutral, was presented. Participants then filled out the mood assessment questionnaires, completed the online-shopping task and filled out the questionnaires again. Then the second movie clip was chosen at random, inducing one of the moods PVHA, PVLA, NVHA, NVLA, nVnA (P=positive, N=negative, H=high, L=low, n=neutral, V=valence, A=arousal). The film was followed by the graphical and verbal differential questionnaires, then the task and the two questionnaires again. The experiment ended after 1.5 to 2 hours.


Fig. 1. Procedure of the experiment (PH=positive valence/high arousal, PL=positive valence/low arousal, NH=negative valence/high arousal, NL=negative valence/low arousal).

They analyzed the mouse and keyboard actions from the log files. The following parameters were tested for correlations with mood state (the list may be extended): number of mouse clicks per minute, average duration of mouse clicks (from button-down until button-up event), total distance of mouse movements in pixels, average distance of a single mouse movement, number and length of pauses in mouse movement, number of “heavy mouse movement” events (more than 5 changes in direction in 2 seconds), maximum, minimum and average mouse speed, keystroke rate per second, average duration of keystroke (from key-down until key-up event) and performance.
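To make a few of these parameters concrete, the following is a minimal sketch of how they could be computed from a time-ordered log. The `Event` record and its `kind` values ("down", "up", "move") are hypothetical and are not the log format used in [109]; only three of the listed parameters are covered, and the log is assumed to be non-empty.

```python
import math
from dataclasses import dataclass


@dataclass
class Event:
    t: float      # timestamp in seconds
    kind: str     # "down", "up", or "move" (hypothetical event names)
    x: int = 0    # cursor x position in pixels
    y: int = 0    # cursor y position in pixels


def mouse_features(events):
    """Compute three of the listed mouse parameters from a time-ordered log."""
    duration_min = (events[-1].t - events[0].t) / 60.0
    clicks, click_durations, down_at = 0, [], None
    distance, prev_move = 0.0, None
    for e in events:
        if e.kind == "down":
            down_at = e.t
        elif e.kind == "up" and down_at is not None:
            # A click lasts from the button-down to the button-up event.
            clicks += 1
            click_durations.append(e.t - down_at)
            down_at = None
        elif e.kind == "move":
            # Accumulate the Euclidean distance between consecutive positions.
            if prev_move is not None:
                distance += math.hypot(e.x - prev_move.x, e.y - prev_move.y)
            prev_move = e
    return {
        "clicks_per_minute": clicks / duration_min if duration_min > 0 else 0.0,
        "avg_click_duration": (sum(click_durations) / len(click_durations)
                               if click_durations else 0.0),
        "total_distance_px": distance,
    }
```

The remaining parameters (pauses, direction changes, speed statistics, keystroke durations) could be derived from the same log in a similar single pass.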

Vizer et al. [91] explored the possibility of detecting cognitive and physical stress by monitoring keyboard interactions, with the eventual goal of detecting acute or gradual changes in cognitive and physical function. The research analyzed keystroke and linguistic features of spontaneously generated text. The study was designed with two experimental conditions (physical stress and cognitive stress) and one control condition (no stress). Data were analyzed with machine learning techniques using numerous features, including specific types of words, timing characteristics, and keystroke frequencies, to generate reference models and classify test samples. They achieved correct classifications of 62.5% for physical stress and 75% for cognitive stress (for 2 classes), which they state is comparable to other affective computing solutions. They also stated that their solutions should be tested with varying typing abilities and keyboards, with varying physical and cognitive abilities, and in real-world stressful situations.

Twenty-four participants with no documented cognitive or physical disabilities were recruited for the study. The data collected per keystroke consisted of the type of keyboard event (either key up or key down), timestamp, and key code. The timestamp was recorded to a resolution of 10 ms using the Ticks call to the system clock in Visual Basic. Key codes allowed for subsequent analysis of the keys pressed and words typed. All data were collected using a single desktop computer and standard keyboard. At the conclusion of the experimental session, the participant completed a demographic survey and reported his or her subjective level of stress for each cognitive and physical stress task on an 11-point Likert scale.

The authors extracted features from the timing and keystroke data captured during the experiment. Data were normalized per participant per feature using z-scores. For each participant, the baseline samples were used to calculate means and standard deviations per feature; then the control and experimental samples were normalized using those values. Both the raw and normalized data were analyzed to determine whether accounting for individual differences by normalizing results in this way would yield higher classification accuracy.
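The per-participant normalization can be sketched as follows for a single feature of a single participant. The function name is illustrative, and the use of the sample standard deviation (n − 1 denominator) is an assumption; the source does not state which estimator was used.

```python
from statistics import mean, stdev


def baseline_zscore(baseline, samples):
    """Normalize samples with the mean and standard deviation of the baseline.

    baseline: feature values from one participant's baseline samples
              (at least two values are needed for a standard deviation).
    samples:  the same participant's control or experimental values.
    """
    mu = mean(baseline)
    sigma = stdev(baseline)  # assumption: sample standard deviation
    if sigma == 0:
        # A constant baseline carries no scale information.
        return [0.0 for _ in samples]
    return [(s - mu) / sigma for s in samples]
```

For example, with baseline key-hold times of 100, 110 and 120 ms (mean 110, standard deviation 10), an experimental value of 130 ms maps to a z-score of 2.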

Machine learning methods were employed to classify stress conditions. The machine learning algorithms used in this analysis included Decision Tree (DT), Support Vector Machine (SVM), k-Nearest Neighbor (kNN), AdaBoost, and Artificial Neural Network (ANN). These techniques were selected primarily because they have been previously used to analyze keyboard behaviour (e.g., kNN) or they have shown good performance across a variety of applications (e.g., SVM). The primary goal was to confirm whether the selected features and automated models were able to successfully detect stress rather than to optimize how well they detect stress. Therefore, they adopted the most commonly used parameter settings when training and testing stress detection models and used the same set of features for all five machine learning methods. The parameter settings were chosen based on experience and knowledge about the problem at hand rather than through exhaustive testing of all possible combinations of parameter settings. For example, three-fold cross validation was chosen to strike a balance between the number of folds and the number of data points in each fold.
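As an illustration of the evaluation scheme, the sketch below runs three-fold cross validation with a plain k-nearest-neighbor classifier. It is not the authors' pipeline: the feature vectors, the labels, the contiguous fold split (which assumes the samples are already in random order) and the choice of squared Euclidean distance are all assumptions made for the example.

```python
def k_fold_indices(n, k=3):
    """Split range(n) into k contiguous folds of near-equal size."""
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def knn_predict(train_x, train_y, x, k=1):
    """Return the majority label among the k nearest training points."""
    order = sorted(
        range(len(train_x)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], x)),
    )
    votes = [train_y[i] for i in order[:k]]
    return max(set(votes), key=votes.count)


def cross_val_accuracy(xs, ys, folds=3, k=1):
    """Fraction of samples classified correctly when each fold is held out once."""
    correct = 0
    for test_idx in k_fold_indices(len(xs), folds):
        held_out = set(test_idx)
        train_x = [x for i, x in enumerate(xs) if i not in held_out]
        train_y = [y for i, y in enumerate(ys) if i not in held_out]
        correct += sum(
            knn_predict(train_x, train_y, xs[i], k) == ys[i] for i in test_idx
        )
    return correct / len(xs)
```

With few samples, fewer folds leave more data points in each held-out fold, which is the trade-off the authors describe.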

The primary contribution of this research is to highlight the potential of leveraging everyday computer interactions to detect the presence of cognitive and physical stress. The initial empirical study described above (1) illustrated the use of a unique combination of keystroke and linguistic features for the analysis of free text; (2) demonstrated the use of machine learning techniques to detect the presence of cognitive or physical stress from free text data; (3) confirmed the potential of monitoring keyboard interactions for an application other than security; and (4) highlighted the opportunity to use keyboard interaction data to detect the presence of cognitive or physical stress. Rates for correct classification of cognitive stress samples were consistent with those currently obtained using affective computing methods. The rates for correct classification of physical stress samples, though not as high as those for cognitive stress, also warrant further research.

Hernandez et al. [86] explored the possibility of using a pressure-sensitive keyboard and a capacitive mouse to discriminate between stressful and relaxed conditions in a laboratory study. During a 30-minute session, 24 participants performed several computerized tasks consisting of expressive writing, text transcription, and mouse clicking. Under the stressful conditions, the large majority of the participants showed significantly increased typing pressure (> 79% of the participants) and more contact with the surface of the mouse (75% of the participants).

The purpose of this work was to study whether a pressure-sensitive keyboard and a capacitive mouse can be used to sense the manifestation of stress. Thus, they devised a within-subjects laboratory experiment in which participants performed several computerized tasks under stressed and relaxed conditions.

In order to comfortably and unobtrusively monitor stress, this study examined gathering behavioural activity from the keyboard and the mouse. These devices are not only among the most common channels of communication for computer users but also represent a unique opportunity to non-intrusively capture longitudinal information that can help capture long-term conditions such as chronic stress. Instead of analyzing traditional keyboard and mouse dynamics based on time or frequency of certain buttons, this work focused on pressure. In particular, they used a pressure-sensitive keyboard and a capacitive mouse (see Fig. 2).


Fig. 2. Pressure-sensitive keyboard, and capacitive mouse.

For each keystroke, the keyboard provides readings from 0 (no contact) to 254 (maximum pressure). They implemented a custom-made keyboard logger in C++ that allowed them to gather the pressure readings at a sampling rate of 50 Hz.

The capacitive mouse used in this work is the Touch Mouse from Microsoft, based on the Cap Mouse described in [110]. This mouse has a grid of 13 × 15 capacitive pixels with values that range from 0 (no capacitance) to 15 (maximum capacitance). Higher capacitive readings while handling the mouse are usually associated with an increase of hand contact with the surface of the mouse. Taking a similar approach to the one described by Gao et al. [111], they estimated the pressure on the mouse from the capacitive readings. They made a custom-made mouse logger in Java that allowed gathering information for each capacitive pixel at a sampling rate of 120 Hz.
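A simple way to summarize one frame of such a grid is sketched below. This is only an illustrative proxy for hand contact, not the pressure-estimation method of [86] or [111]; the row/column orientation of the 13 × 15 grid, the function name and the contact threshold are assumptions.

```python
ROWS, COLS = 13, 15   # assumed orientation of the 13 x 15 capacitive grid
MAX_VALUE = 15        # maximum capacitance reading per pixel


def contact_summary(frame, threshold=1):
    """Summarize one capacitive frame given as ROWS lists of COLS readings.

    Returns the fraction of pixels at or above the (hypothetical) contact
    threshold and the mean reading scaled to the range 0..1.
    """
    if len(frame) != ROWS or any(len(row) != COLS for row in frame):
        raise ValueError("expected a 13 x 15 frame")
    values = [v for row in frame for v in row]
    touched = sum(1 for v in values if v >= threshold)
    return {
        "contact_fraction": touched / len(values),
        "mean_intensity": sum(values) / (MAX_VALUE * len(values)),
    }
```

Averaging such summaries over the frames of a task would give one value per participant and condition, which could then be compared between the stressed and relaxed conditions.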

In order to examine whether people under stress use the input devices differently, they designed several tasks that required the use of the keyboard or mouse under two different conditions: stressed and relaxed. The chosen tasks were text transcription, expressive writing, and mouse clicking.

In order to see whether the two versions of the tasks elicited the intended emotions (i.e., stressed or relaxed), participants were requested to report their valence, arousal and stress levels on a 7-point Likert scale after completion of each task. Although stress could be positive or negative, they expected that high stress levels in the experiment would be associated with higher arousal and negative valence. Additionally, throughout the experiment participants wore the Affectiva QTM wristband sensor, which continuously measured electrodermal activity, skin temperature, and 3-axis acceleration at a sampling rate of 8 Hz.

In order to examine the differences between relaxed and stressed conditions, they performed a within-subjects laboratory study. Therefore, all participants performed all the tasks and conditions during the experiment. After providing written consent, participants were seated at an adjustable computer workstation with an adjustable chair, and requested to provide some demographic information. Next, they were asked to wear the QTM biosensor on the left wrist and to adjust the wristband so that it did not disturb them while typing on the computer. All data were collected and synchronized using a single desktop computer with a 30 inch monitor. They used the same pressure-sensitive keyboard and capacitive mouse for all participants. The room and lighting conditions were also the same for all users. Fig. 3 shows the experimental setup.

The results of this study indicate that increased levels of stress significantly influence typing pressure (> 83% of the participants) and amount of mouse contact (100% of the participants) of computer users. While > 79% of the participants consistently showed more forceful typing pressure, 75% showed a greater amount of mouse contact. Furthermore, they determined that considerably small subsets of the collected data (e.g., less than 4 seconds for the mouse clicking task) sufficed to obtain similar results, which could potentially lead to quicker and timelier stress assessments. This work is the first to demonstrate that stress influences keystroke pressure in a controlled laboratory setting, and the first to show the benefit of a capacitive mouse in the context of stress measurement. The findings of this study are very promising and pave the way for the creation of less invasive systems that can continuously monitor stress in real-life settings.

Fig. 3. The experimental setup.

Table II summarizes the main features of the presented solutions. Their conclusions are worth considering when building a new system.

With the rapid development of smartphones and the abundance of sensors they carry, as mentioned in Sec. II, research on smartphone-based affective computing has become increasingly popular. Next, we discuss affect detection using smartphones.

IV. SMARTPHONE-BASED AFFECTIVE COMPUTING

This section discusses the state of the art in affective computing using mobile or wearable devices. First, we briefly introduce the sensors equipped on modern wearable devices. Powerful sensors, along with the long-term usage of such devices, enable unobtrusive affective data collection, which promises to improve traditional affective computing research. Moreover, the widespread use of smartphones has changed human lives. Modern technologies make what was once impossible feel natural [112]. As a result, researchers are seeking to explore new features of interactions with mobile wearable devices (touch behaviors, usage events, etc.) to infer affective states. We summarize the related works in this aspect.

A. Affect sensing

As mentioned above, traditional affective studies rely on cameras, microphones, sensors attached to human bodies and other human-computer interaction interfaces to collect raw data for affect detection and analysis. The widespread use and rich sensors of modern smart wearable devices overcome these limitations. We briefly introduce the affect-related sensors and tools equipped on smart devices.

Accelerometer: It measures proper acceleration (the acceleration it experiences relative to free fall) felt by people or objects, in units of m/s² or g (1g when standing on earth at sea level). Most smartphone accelerometers trade a large value range for high precision; the iPhone 4, for example, has a range of 2g and a precision of 0.018g. Fig. 4 shows a smartphone accelerometer and a space flight accelerometer. This sensor can be used to detect body movements and gestures. Traditional approaches [82]–[85] have to attach pressure sensors to chairs, keyboards, mice, etc. Accelerometers equipped on smartphones benefit unobtrusive and long-term data collection with respect to body movement, gestures, and other human behaviors.

Fig. 4. Accelerometers: (a) smartphone accelerometer; (b) space flight accelerometer.
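As a rough illustration of how accelerometer readings could be turned into a body-movement feature, the sketch below measures how far the acceleration magnitude deviates from the 1g reading expected at rest. The function name and the choice of feature are assumptions made for this example and are not taken from the surveyed works.

```python
import math


def movement_intensity(samples_g):
    """Mean absolute deviation of the acceleration magnitude from 1 g.

    samples_g: non-empty list of (x, y, z) readings in units of g.
    A device at rest reads a magnitude close to 1 g, so larger values
    of this feature suggest more movement.
    """
    deviations = [
        abs(math.sqrt(x * x + y * y + z * z) - 1.0) for x, y, z in samples_g
    ]
    return sum(deviations) / len(deviations)
```

Computed over short windows, such a feature could be logged in the background over long periods without any additional instrumentation.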

Gyroscope: It can be a very useful tool because it moves in peculiar ways and appears to defy gravity. Gyroscopes have been around for a century, and they are now used everywhere from airplanes and toy helicopters to smartphones. A gyroscope allows a smartphone to measure and maintain orientation. Gyroscopic sensors can monitor and control device position, orientation, direction, angular motion and rotation. When applied to a smartphone, a gyroscopic sensor commonly performs gesture recognition functions. Additionally, gyroscopes in smartphones help to determine the position and orientation of the phone. An on-board gyroscope and accelerometer work in combination with a smartphone’s operating system or specific software applications to perform these and other functions.

Fig. 5 gives a view of the digital gyroscope embedded in a smartphone. Some works [113] take advantage of it to record users’ movement and position for affect detection.

Fig. 5. iPhone’s digital gyroscope.

GPS: Location sensors detect the location of the smartphone using 1) GPS; 2) lateration/triangulation of cell towers or WiFi networks (with a database of known locations for towers and networks); 3) the location of the associated cell tower or WiFi network. For localization, connection to 3 satellites is required for a 2D fix (latitude/longitude) and 4 satellites for a 3D fix (altitude), as shown in Fig. 6. More visible satellites increase the precision of positioning. Typical precision is 20–50 m and maximum precision is 10 m. However, GPS does not work indoors and can quickly drain the battery. Smartphones can try to automatically select the best-suited alternative location provider (GPS, cell towers, WiFi), mostly based on the desired precision. With the location, we can study the relationship between life patterns and affective states. For example, most people in a playground feel happy, while most feel sad in a cemetery.

TABLE II. UNOBTRUSIVE EMOTION RECOGNITION.

| References | Sources | Labelling method | Method and classifier | Results |
|---|---|---|---|---|
| Poh et al. [107] | webcam | none | statistical analysis | conclusion: strong correlation between these parameters derived from webcam recordings and standard reference sensors |
| Epp et al. [87] | mouse movements and keystrokes | questionnaire (5-point Likert scale) | C4.5 decision tree | accuracy: 77.4–87.8% for confidence, hesitance, nervousness, relaxation, sadness, tiredness |
| Zimmermann et al. [109] | keystrokes and mouse movements | questionnaire (self-assessment) | statistical analysis | conclusion: possible discrimination between neutral category and four others |
| Vizer et al. [91] | keystrokes, language parameters | questionnaire (11-point Likert scale) | decision trees, support vector machine, k-NN, AdaBoost, neural networks | accuracy: 75.0% for cognitive stress (k-NN), 62.5% for physical stress (AdaBoost, SVM, neural networks); conclusion: number of mistakes made while typing decreases under stress |
| Hernandez et al. [86] | keystrokes, typing pressures, biosensors, mouse movements | labels assigned according to the tasks that elicited the intended emotions | statistical analysis | conclusion: increased levels of stress significantly influence typing pressure (> 83% of the participants) and amount of mouse contact (100% of the participants) of computer users |

Fig. 6. How GPS works.

Location provides additional information to verify the subjective reports from participants in affective studies. It may also help to build a confidence mechanism [114] for subjective reports. Attaching the location to a subjective report would produce a confidence weight that measures the significance of the collected report. For example, suppose a participant reported that he was happy in a cemetery. Common sense suggests that people in a cemetery would be sad. Thus, we could assign a low weight (e.g., 0.2) as the confidence value of that report.
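The confidence mechanism described above can be sketched as a lookup of how plausible a reported affect is at a given kind of place. The prior table, its values (apart from the 0.2 example), the default weight and the function names are hypothetical and serve only to illustrate the idea; they do not come from [114].

```python
# Hypothetical common-sense prior: plausibility of each affect at a place type.
PLACE_PRIOR = {
    "cemetery": {"happy": 0.2, "sad": 0.8},
    "playground": {"happy": 0.8, "sad": 0.2},
}


def report_weight(place, reported, prior=PLACE_PRIOR, default=0.5):
    """Confidence weight for one subjective report made at a given place."""
    return prior.get(place, {}).get(reported, default)


def weighted_share(reports, label):
    """Confidence-weighted share of reports that carry the given label.

    reports: list of (place, reported_label) pairs.
    """
    weights = [report_weight(place, reported) for place, reported in reports]
    total = sum(weights)
    if total == 0:
        return 0.0
    matched = sum(w for w, (_, reported) in zip(weights, reports)
                  if reported == label)
    return matched / total
```

Reports from places without a prior receive the neutral default weight, so missing location knowledge neither boosts nor discounts them.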

Microphone: Traditional affect studies need participants to sit in a lab and speak into a microphone. Now the built-in microphone of a smartphone is able to record voice in the wild. If you are looking to use your phone as a voice recorder, for recording personal notes, meetings, or impromptu sounds around you, then all you need is a recording application. Important things to look for in an application are the ability to: 1) adjust gain levels; 2) change sampling rates; 3) display the recording levels on the screen, so you can make any adjustments necessary; and, perhaps not as important, 4) save the files to multiple formats (at least .wav and .mp3). Also very handy is the ability to email the recording, or save it to cloud storage, such as Dropbox. Some of the most highly recommended applications for Android include Easy Voice Recorder Pro, RecForge Pro, Hi-Q mp3 Voice Recorder, Smart Voice Recorder, and Voice Pro. For iOS, Audio Memos, Recorder Plus and Quick Record appear to be good applications. We can choose some of these applications to collect voice data for affect analysis.

Camera: The principal advantages of camera phones are cost and compactness; indeed, for a user who carries a mobile phone anyway, the additional size and cost are negligible. Smartphones that are camera phones (Fig. 7) may run mobile applications to add capabilities such as geotagging and image stitching. A few high-end phones can use their touch screen to direct their camera to focus on a particular object in the field of view, giving even an inexperienced user a degree of focus control exceeded only by seasoned photographers using manual focus. These properties clearly inspire spontaneous facial expression capture for affective studies (e.g., building spontaneous facial expression databases).


Fig. 7. Samsung Galaxy K Zoom.

Touch screen: The increasing number of people using touch-screen mobile phones (as shown in Fig. 8) raises the question of whether touch behaviors reflect their emotional states. Recent studies in the psychology literature [115] have shown that touch behavior may convey not only the valence of an emotion but also the type of emotion (e.g., happy vs. upset). Unfortunately, these studies have been carried out mainly in a social context (person-person communication) and only through acted scenarios.

Fig. 8. Touch-screen mobile phone.

Gao et al. [111] explored the possibility of using this modality to capture a player’s emotional state in a naturalistic setting of touch-based computer games. The findings could be extended to other application areas where touch-based devices are used. Using stroke behavior to discriminate between a set of affective states in the context of touch-based game devices is very promising, and it paves the way for further explorations with respect to not only different contexts but also different types of tactile behavior. Further studies are needed in a variety of contexts to establish a better understanding of this relationship and to identify if, and how, these models could be generalized over different types of tactile behavior, activity, context and personality traits.

Smartphone usage log: Since smartphone use has become a natural part of daily life, some works [43], [116] have shown the power of application logs for affect detection. Smartphone usage data can indicate personality traits when machine learning approaches are used to extract features. It is also promising to study relationships between users and personalities by building social networks with the rich contextual information available in application usage, call and SMS logs. In addition, analyzing other modalities such as accelerometer and GPS logs remains a topic of mobile affective computing.

B. Mood and emotion detection

After introducing the powerful affect sensing ability of smart devices, we summarize the state-of-the-art affective computing research using smart devices. This part focuses on emotion and mood detection.

First, we need to clarify three terms that are closely intertwined: affect, emotions, and moods. Affect is a generic term that covers a broad range of feelings that people experience. It is an umbrella concept that encompasses both emotions and moods [117]. Emotions are intense feelings that are directed at someone or something [118]. Moods are feelings that tend to be less intense than emotions and that often (though not always) lack a contextual stimulus. Most experts believe that emotions are more fleeting than moods [119]. For example, if someone is rude to you, you will feel angry. That intense feeling of anger probably passes fairly quickly. When you are in a bad mood, though, you can feel bad for several hours.

LiKamWa et al. [43] designed a smartphone software system for mood detection, namely MoodScope, which infers the mood of its user based on how the smartphone is used. Unlike smartphone sensors that measure acceleration, light, and other physical properties, MoodScope is a “sensor” that measures the mental state of the user and provides mood as an important input to context-aware computing. They found that smartphone usage correlates well with the users’ moods. Users use different applications and communicate with different people depending on their moods. Using only six pieces of usage information, namely, SMS, email, phone call, application usage, web browsing, and location, they could build statistical usage models to estimate moods.

In this work, they developed an iOS application that allowed users to report their moods conveniently. Fig. 9(a) shows the primary GUI of the MoodScope application. They also allowed users to see their previous inputs through a calendar mode, shown in Fig. 9(b), and also through a chart mode. They created a logger to collect a participant’s smartphone interaction to link with the collected moods. The logger captured user behaviour by using daemons operating in the background, requiring no user interaction. The data were archived nightly to a server over a cellular data or Wi-Fi connection. The authors divided the collected data into two categories: 1) social interaction records, in that phone calls, text messages (SMS) and emails signal changes in social interactions; 2) routine activity records, in that patterns in browser history, application usage


and location history serve as coarse indicators of routine activity. Then they used a least-squares multiple linear regression to perform the modeling, simply applying the regression to the usage feature table, labeled by the daily averages of mood. Sequential Forward Selection (SFS) [120] was used to choose a subset of relevant features to accelerate the learning process.
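The modelling step described above can be illustrated with a minimal sketch. It is not the MoodScope implementation: the toy feature table, the use of validation mean squared error as the selection criterion, and all function names are assumptions made for illustration only.

```python
def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_least_squares(rows, y, cols):
    """Fit y ~ intercept + chosen columns through the normal equations."""
    design = [[1.0] + [row[c] for c in cols] for row in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * t for d, t in zip(design, y)) for i in range(k)]
    return solve_linear(xtx, xty)


def mean_squared_error(rows, y, cols, weights):
    total = 0.0
    for row, target in zip(rows, y):
        pred = weights[0] + sum(w * row[c] for w, c in zip(weights[1:], cols))
        total += (pred - target) ** 2
    return total / len(y)


def sequential_forward_selection(train, y_train, val, y_val, max_features):
    """Greedily add the feature that most reduces validation error."""
    selected, best_err = [], float("inf")
    remaining = list(range(len(train[0])))
    while remaining and len(selected) < max_features:
        scored = []
        for c in remaining:
            cols = selected + [c]
            w = fit_least_squares(train, y_train, cols)
            scored.append((mean_squared_error(val, y_val, cols, w), c))
        err, best_col = min(scored)
        if err >= best_err:  # no improvement: stop early
            break
        selected.append(best_col)
        remaining.remove(best_col)
        best_err = err
    return selected, best_err


# Made-up usage table: three usage features per day, daily mood average as label.
train = [[1, 5, 2], [2, 3, 7], [3, 8, 1], [4, 1, 4], [5, 6, 9], [6, 2, 3]]
mood_train = [5, 8, 11, 14, 17, 20]          # equals 2 + 3 * feature 0
val = [[1.5, 4, 6], [2.5, 7, 2], [3.5, 2, 8], [4.5, 9, 5]]
mood_val = [6.5, 9.5, 12.5, 15.5]

selected, err = sequential_forward_selection(train, mood_train, val, mood_val, 2)
print(selected, err)
```

In this toy table the mood label depends only on the first feature, so the greedy search picks that column first; real usage logs would be far noisier.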

(a) Primary GUI (b) Mood calendar view

Fig. 9. MoodScope application.

Rachuri et al. [44] proposed EmotionSense, a mobile sensing platform for social psychology studies based on mobile phones. Key characteristics include the ability to sense individual emotions as well as activities, and verbal and proximity interactions among members of social groups. EmotionSense gathers participants’ emotions as well as proximity and patterns of conversation by processing the outputs from the sensors of smartphones. The EmotionSense system consists of several sensor monitors, a programmable adaptive framework based on a logic inference engine, and two declarative databases (Knowledge Base and Action Base). Each monitor is a thread that logs events to the Knowledge Base, a repository of all the information extracted from the on-board sensors of the phones. The system is based on a declarative specification (using first-order logic predicates) of two things: facts, i.e., the knowledge extracted by the sensors about user behaviour and his/her environment (such as the identity of the people involved in a conversation with him/her); and actions, i.e., the set of sensing activities that the sensors have to perform with different duty cycles, such as recording voices (if any) for 10 seconds each minute or extracting the current activity every 2 minutes. By means of the inference engine and a user-defined set of rules (a default set is provided), the sensing actions are periodically generated. The actions that have to be executed by the system are stored in the Action Base. The Action Base is periodically accessed by the EmotionSense Manager, which invokes the corresponding monitors according to the actions scheduled in the Action Base. Users can define sensing tasks and rules that are interpreted by the inference engine in order to dynamically adapt the sensing actions performed by the system.
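To make the interplay of facts, rules and scheduled actions concrete, the following is a rough sketch in plain Python rather than first-order logic. The rule names, the back-off policy and the data structures are hypothetical; only the example duty cycles (10 seconds of audio each minute, activity every 2 minutes) come from the description above.

```python
from dataclasses import dataclass


@dataclass
class Action:
    sensor: str
    interval_s: int       # how often the monitor is invoked
    duration_s: int = 0   # how long each sample lasts (0 = instantaneous)


def rule_quiet_backoff(facts, actions):
    """Hypothetical rule: sample audio less often while nobody is speaking."""
    mic = actions["microphone"]
    if facts.get("speech_detected"):
        mic.interval_s = 60
    else:
        mic.interval_s = min(mic.interval_s * 2, 480)


def rule_motion(facts, actions):
    """Hypothetical rule: check activity more often while the user moves."""
    actions["accelerometer"].interval_s = 120 if facts.get("moving") else 240


def infer_actions(facts, actions, rules):
    """Stand-in for the inference engine: apply each rule to the facts."""
    for rule in rules:
        rule(facts, actions)
    return actions


def due_actions(actions, now_s):
    """Stand-in for the manager: which monitors should run at time now_s?"""
    return sorted(name for name, a in actions.items() if now_s % a.interval_s == 0)


# Action Base seeded with the duty cycles quoted in the text.
action_base = {
    "microphone": Action("microphone", interval_s=60, duration_s=10),
    "accelerometer": Action("accelerometer", interval_s=120),
}
rules = [rule_quiet_backoff, rule_motion]

infer_actions({"speech_detected": False, "moving": True}, action_base, rules)
print(due_actions(action_base, 120))
```

The point of the design is that sensing cost adapts to what the Knowledge Base currently says, without changing the monitors themselves.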

The authors presented the design of two novel subsystems for emotion detection and speaker recognition, which are built on a mobile phone platform. These are based on Gaussian Mixture methods for the detection of emotions and speaker identities. EmotionSense automatically recognizes speakers and emotions by means of classifiers running locally on off-the-shelf mobile phones. A programmable adaptive system is proposed with declarative rules. The rules are expressed using first-order logic predicates and are interpreted by means of a logic engine. The rules can trigger new sensing actions (such as starting the sampling of a sensor) or modify existing ones (such as the sampling interval of a sensor). The results of a real deployment designed in collaboration with social psychologists are presented. It is found that the distribution of the emotions detected through EmotionSense generally reflected the self-reports by the participants.
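As an illustration of classification with Gaussian mixtures, the sketch below scores a sequence of feature values under one mixture per emotion and picks the most probable class. It assumes one-dimensional features and hand-written mixture parameters; a real system would train multi-dimensional mixtures on audio features, and none of the numbers here come from EmotionSense.

```python
import math


def gmm_log_likelihood(x, components):
    """Log-density of x under a 1-D mixture given as (weight, mean, variance)."""
    logs = [
        math.log(w) - 0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)
        for w, mu, var in components
    ]
    top = max(logs)  # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(v - top) for v in logs))


def classify(frames, models, priors=None):
    """Return the label whose mixture (plus optional prior) best explains frames."""
    scores = {}
    for label, components in models.items():
        prior = math.log(priors[label]) if priors else 0.0
        scores[label] = prior + sum(gmm_log_likelihood(x, components) for x in frames)
    return max(scores, key=scores.get)


# Hypothetical per-emotion mixtures over a single made-up acoustic feature.
models = {
    "happy": [(0.5, 2.0, 0.5), (0.5, 3.0, 0.5)],
    "sad": [(1.0, -1.0, 1.0)],
}
print(classify([2.1, 2.8, 3.2], models))
```

The same scoring scheme applies to speaker recognition by keeping one mixture per speaker instead of one per emotion.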

Gao et al. [111] addressed the question of whether touch behaviours reflect players’ emotional states. Since an increasing number of people play games on touch-screen mobile phones, the answer would be valuable not only as an evaluation indicator for game designers, but also for real-time personalization of the game experience. Psychology studies on acted touch behaviour show the existence of discriminative affective profiles. In this paper, finger-stroke features during gameplay on an iPod were extracted and their discriminative power was analyzed. Machine learning algorithms were used to build systems for automatically discriminating between four emotional states (Excited, Relaxed, Frustrated, Bored), two levels of arousal and two levels of valence. Accuracy reached between 69% and 77% for the four emotional states, and higher results (89%) were obtained for discriminating between two levels of arousal and two levels of valence.

The iPhone game Fruit Ninja was selected for this study. Fruit Ninja is an action game that requires the player to squish, slash and splatter fruits. Players swipe their fingers across the screen to slash and splatter fruit, simulating a ninja warrior. Fruit Ninja has been a phenomenon in the app store with large sales, and has inspired the creation of many similar drawing-based games. As Fruit Ninja is not an open source app, an open source variation of it (called Samurai Fruit), developed by Ansca Mobile, was used instead. It has been developed with the Corona SDK. In order to elicit different kinds of emotions from players and to be able to capture players’ touch behaviour, a few modifications to Ansca’s game were implemented.

The developed New Samurai Fruit game software captures and stores players’ finger-stroke behaviour during gameplay: the coordinates of each point of a stroke, the contact area of the finger at each point and the time duration of a stroke, i.e., the time from the moment the finger touches the display to the moment the finger is lifted. This information is gathered directly using the tracking functions of the Corona SDK. The contact area (called the pressure feature) is used here as a measure of the pressure exerted by the participants, as the device does not allow pressure to be measured directly. In order to take into consideration physical differences between individuals, i.e., the dimension of the tip of a finger, a baseline pressure area is computed before the game starts. Each participant is asked to perform a stroke. The average of the contact area of each point of this stroke is used as a pressure baseline for that participant. The pressure values collected during the game are then normalized by subtracting the participant’s baseline value from them.
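The baseline normalization can be sketched as follows. The point layout (dictionaries with x, y, area and t keys) and the derived length and speed features are assumptions for illustration; only the baseline subtraction and the stroke duration follow directly from the description above.

```python
import math
from statistics import mean


def baseline_pressure(calibration_stroke):
    """Average contact area over the points of the calibration stroke."""
    return mean(point["area"] for point in calibration_stroke)


def stroke_features(stroke, baseline):
    """Summarize one stroke; contact areas are shifted by the player's baseline."""
    areas = [point["area"] - baseline for point in stroke]
    duration = stroke[-1]["t"] - stroke[0]["t"]  # touch-down to lift-off
    length = sum(
        math.hypot(b["x"] - a["x"], b["y"] - a["y"])
        for a, b in zip(stroke, stroke[1:])
    )
    return {
        "mean_pressure": mean(areas),
        "max_pressure": max(areas),
        "duration": duration,
        "length": length,
        "speed": length / duration if duration > 0 else 0.0,
    }


# Made-up calibration stroke and gameplay stroke for one participant.
calibration = [{"area": 10}, {"area": 12}, {"area": 14}]
stroke = [
    {"x": 0, "y": 0, "area": 15, "t": 0.0},
    {"x": 3, "y": 4, "area": 13, "t": 0.5},
]
features = stroke_features(stroke, baseline_pressure(calibration))
print(features)
```

Subtracting a per-participant baseline keeps a player with a large fingertip from looking permanently "high pressure" to a person-independent classifier.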

The authors introduced and explained the experiment process, the iPod touch device and the game rules of New Samurai Fruit to participants and answered their questions. The participants were then asked to play the game and get familiar with it. After the training session, the participants were asked to play and try to complete 20 consecutive levels


of the game. As a form of motivation, they were informed that the participant with the highest final score would be rewarded with an iTunes gift card worth 25 pounds. The game was video recorded. In order to decide the ground truth (i.e., the emotional state associated with each level), participants were first asked to fill out a self-assessment questionnaire after every game level. To reduce the effect of the game results on the labelling, the Cued Recall Brief approach [121] was used. At the end of the 20 levels, the player was shown a recorded video of the gameplay s/he had just carried out. While the video was playing, s/he was encouraged to recall his/her thoughts and emotional states and, if necessary, relabel the levels.

The results of visual inspection and feature extraction showed that finger-stroke behaviour allows for the discrimination of the four affective states they investigated, as well as for the discrimination between two levels of arousal and between two levels of valence. The results showed very high performance in both cross-validations on unseen sample data. To investigate whether person-independent emotion recognition models could reach similar results, three modelling algorithms were selected: Discriminant Analysis (DA), Artificial Neural Network (ANN) with Back Propagation and Support Vector Machine (SVM) classifiers. The last two learning algorithms were selected for their popularity and their strong generalization ability. The models for Arousal produced the best recognition results, ranging from 86.7% (kernel SVM) to 88.7% (linear SVM). In the case of Valence, the performance ranged from 82% (linear SVM) to 86% (kernel SVM). The results for the discrimination of the four emotion labels were also well above chance level, ranging from 69% (kernel SVM) to 77% (linear SVM).

Overall, this work has investigated the possibility of using stroke behaviour to discriminate between a set of affective states in the context of touch-based game devices. The results are very promising and pave the way for further explorations with respect to not only different contexts but also different types of tactile behaviour. Further studies are needed in a variety of contexts to establish a better understanding of this relationship and to identify if, and how, these models could be generalized over different types of tactile behaviour, activity, context and personality traits.

Lee et al. [122] proposed an approach to recognize emotions of the user by inconspicuously collecting and analyzing user-generated data from different types of sensors on the smartphone. To achieve this, they adopted a machine learning approach to gather, analyze and classify device usage patterns, and developed a social network service client for Android smartphones which unobtrusively found various behavioural patterns and the current context of users.

Social networking services (SNS) like Twitter, Facebook or Google+ are online service platforms that aim to build and manage social relationships between people. On an SNS, users represent themselves in various ways such as profiles, pictures, or text messages, and interact with other people including their acquaintances. Furthermore, by using various SNS applications on mobile devices, users can share ideas, interests or activities at any time and anywhere with their friends. An SNS user often expresses her/his feeling or emotional state directly with emoticons or indirectly with written text, so that their followers respond to her/his message more actively or even empathize with them. In psychology, this kind of phenomenon is called emotional contagion. That is, communication between users would be newly induced or enriched by the sharing of emotion. They defined these kinds of communicational activities as affective social communication, as depicted in Fig. 10. However, some SNS users hardly show what they feel, so their friends cannot sense or react to their emotional states appropriately.


Fig. 10. Diagram of Affective Social Communication.

This paper presented a machine learning approach to recognize emotional states of a smartphone user without any inconvenience or extra cost for equipping additional sensors. To perceive the current emotion of the user, they gathered various types of sensor data from the smartphone. These sensor data can be categorized into two types, behaviour and context of the user, and were collected while the user used a certain application on her/his smartphone. As a proof of concept, they developed an Android application named affective twitter client (AT Client), which collects the above-mentioned sensor data whenever the user sends a text message (i.e., a tweet) to Twitter. Via a two-week pilot study, they collected a dataset of 314 records including self-reported emotional states of the user, and utilized it to build a Bayesian Network classifier, which is a powerful probabilistic classification algorithm based on Bayes’ theorem. Through repetitive experiments including preprocessing, feature selection, and inference (i.e., supervised learning), the classifier can classify each tweet written by the user into 7 types with a satisfactory accuracy of 67.52% on average. Classifiable types of emotions include Ekman’s six basic emotions [123] and one neutral state.

To determine the current emotion of users, they adopted a machine learning approach consisting of data collection, data analysis, and classification. For these tasks, they used a popular machine learning software toolkit named Weka [124]. The data collection process gathered various sensor data from the smartphone when a participant used AT Client installed on their device. In more detail, if a participant felt a specific emotion at a certain moment in their everyday life, they would write a short text message and were asked to report their current emotion via AT Client. Meanwhile, AT Client collected various sensor data during this period.


It is also possible that some users write a tweet without any emotion; this emotional state is treated as neutral, and they also took it as training data because the neutral emotion itself is a frequently observed state in emotion recognition [125]. From the gathered sensor data, they extracted 14 features which seem to have potential correlations with emotion. All of these features are formalized in the attribute-relation file format (ARFF) for Weka.
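For readers unfamiliar with ARFF, the sketch below writes a tiny file of the kind Weka reads: a relation name, one declaration per attribute, and comma-separated data rows. The attribute names and the two records are hypothetical and stand in for the 14 features and 314 records of the study.

```python
def to_arff(relation, attributes, rows):
    """Render records as ARFF text.

    attributes: list of (name, kind); kind is "numeric" or a list of
    nominal values. Names are assumed to contain no spaces.
    """
    lines = ["@relation " + relation, ""]
    for name, kind in attributes:
        if isinstance(kind, (list, tuple)):
            lines.append("@attribute " + name + " {" + ",".join(kind) + "}")
        else:
            lines.append("@attribute " + name + " " + kind)
    lines += ["", "@data"]
    for row in rows:
        lines.append(",".join(str(value) for value in row))
    return "\n".join(lines) + "\n"


# Hypothetical subset of features; the class attribute holds 7 emotional states.
emotions = ["happiness", "surprise", "anger", "disgust", "sadness", "fear", "neutral"]
attributes = [
    ("typing_speed", "numeric"),
    ("device_shake_count", "numeric"),
    ("location", ["home", "work", "commute"]),
    ("emotion", emotions),
]
rows = [
    [3.2, 0, "home", "happiness"],
    [1.1, 4, "commute", "anger"],
]
arff_text = to_arff("affective_tweets", attributes, rows)
print(arff_text)
```

Each data row corresponds to one tweet, with the self-reported emotion as the last (class) column.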

They analyzed the collected training data using Weka. Through repetitive experiments, they chose the Bayesian Network classifier as the inference model for their system, because it showed the highest classification accuracy among machine learning classifiers such as Naïve Bayes, decision tree, or neural network. Fig. 11 shows a sample Bayesian Network.


Fig. 11. Sample Bayesian Network.

Through a two-week pilot study, they gathered a real-world dataset of 314 records from a participant and utilized it to validate the emotion recognition approach by measuring classification accuracy. One subject from the research group (a male in his 30s) was recruited for the pilot study; he wrote tweets and indicated his emotional state whenever he felt a certain emotion in his daily life.

By evaluating all features via the information gain attribute evaluation algorithm and considering other factors, they selected 10 features as primary attributes of training data for the Bayesian Network classifier. The feature with the highest correlation to emotions was typing speed; in addition, the length of the inputted text, shaking of the device, and user location were also important features for emotion recognition.
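Information gain ranks a discretized feature by how much knowing its value reduces the entropy of the class label. The following is a minimal sketch of that ranking on made-up data; it is not Weka's implementation, and the feature names and values are hypothetical.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(values, labels):
    """Entropy of the labels minus the expected entropy after splitting on values."""
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[value].append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def rank_features(table, labels):
    """Sort feature names by decreasing information gain."""
    gains = {name: information_gain(column, labels) for name, column in table.items()}
    return sorted(gains.items(), key=lambda item: item[1], reverse=True)


# Toy data: typing speed separates the two emotions perfectly, weather does not.
labels = ["happiness", "happiness", "sadness", "sadness"]
table = {
    "typing_speed": ["fast", "fast", "slow", "slow"],
    "weather": ["sunny", "rainy", "sunny", "rainy"],
}
ranking = rank_features(table, labels)
print(ranking)
```

Keeping only the top-ranked features, as the authors did with 10 of their 14, shrinks the model and usually reduces overfitting on small datasets.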

The average classification accuracy is 67.52% for 7 emotions. For emotions such as happiness, surprise, and neutral, the approach performed appreciably better than chance (i.e., 50%), and in the case of anger and disgust, it at least outperformed random classification. Classification accuracy varies across emotion types, but they found a general correlation between the number of observation cases and classification accuracy. This finding means that low accuracy in classification for sadness and fear may be improved with additional data collection.

Petersen et al. [126] rendered the activations in a 3D brain model on a smartphone. They combined a wireless EEG headset with a smartphone to capture brain imaging data reflecting everyday social behaviour in a mobile context. They applied a Bayesian approach to reconstruct the neural sources, demonstrating the ability to distinguish among emotional responses reflected in different scalp potentials when viewing pleasant and unpleasant pictures compared to neutral content. This work may not only facilitate differentiation of emotional responses but also provide an intuitive interface for touch-based interaction, allowing for modelling the mental state of users as well as providing a basis for novel bio-feedback applications.

The setup was based on a portable wireless Emotiv Research Edition neuroheadset (http://emotiv.com), which transmits the EEG and control data to a receiver USB dongle, originally intended for a Windows PC version of the Emotiv Research Edition SDK. They connected the wireless receiver dongle to a USB port on a Nokia N900 smartphone with Maemo 5 OS. Running in USB host mode, they decrypted the raw binary EEG data transmitted from the wireless headset, and in order to synchronize the stimuli with the data they timestamped the first and last packets arriving at the beginning and end of the EEG recording. Eight male volunteers from the Technical University of Denmark, between the ages of 26 and 53, participated in the experiment.
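One way such first-and-last-packet timestamps could be used is to spread the intermediate packets evenly between them and look up the packet nearest to each stimulus onset. This is only a sketch under the assumption of a constant packet rate with no dropped packets; the source does not state how the alignment was actually computed, and the function names are hypothetical.

```python
def packet_times(first_ts, last_ts, n_packets):
    """Assign a time to every packet, assuming a constant packet rate."""
    if n_packets < 2:
        return [first_ts] * n_packets
    step = (last_ts - first_ts) / (n_packets - 1)
    return [first_ts + i * step for i in range(n_packets)]


def packet_index_at(stimulus_ts, first_ts, last_ts, n_packets):
    """Index of the packet closest in time to a stimulus onset."""
    if n_packets < 2 or last_ts <= first_ts:
        return 0
    step = (last_ts - first_ts) / (n_packets - 1)
    index = round((stimulus_ts - first_ts) / step)
    return max(0, min(n_packets - 1, index))  # clamp to the recording


# Made-up recording: 5 packets between t = 0.0 s and t = 1.0 s.
times = packet_times(0.0, 1.0, 5)
print(times, packet_index_at(0.5, 0.0, 1.0, 5))
```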

Fig. 12. The wireless neuroheadset transmits the brain imaging data via a receiver USB dongle connected directly to a Nokia N900 smartphone.

Fig. 12 shows the smartphone-based interface. Applying a Bayesian approach to reconstruct the underlying neural sources may thus provide a differentiation of emotional responses based on the raw EEG data captured online in a mobile context, as the current implementation is able to visualize the activations on a smartphone with a latency of 150 ms. The early and late ERP components are not limited to the stark emotional contrasts characterizing images selected from the IAPS collection. Whether one reads a word with affective connotations, comes across something similar in an image, or recognizes from a facial expression that somebody looks sad, the electrophysiological patterns in the brain seem to suggest that the underlying emotional processes might be the same. The ability to continuously capture these patterns by integrating wireless EEG sets with smartphones for online processing of brain imaging data may offer completely new opportunities for modelling the mental state of users in real-life scenarios as well as providing a basis for novel bio-feedback applications.

Kim et al. [113] investigated human behaviours related to


| References | Data | Sensor monitors | Modeling technique and classifier | Participants |
|---|---|---|---|---|
| LiKamWa et al. [43] | SMS, email, phone call, application usage, web browsing, and location | GPS | Multiple linear regression, Sequential forward selection | 32 |
| Rachuri et al. [44] | Location, voice records, sensing data | GPS, accelerometer, Bluetooth, microphone | Gaussian Mixture Model classifier, Hidden Markov Models, Maximum A Posteriori Estimation | 18 |
| Gao et al. [111] | Touch pressure, duration, etc. | Touch panel | Discriminant Analysis, Artificial Neural Network with Back Propagation, Support Vector Machine | 15 |
| Lee et al. [122] | Touch behaviour, shake, illuminance, text, location, time, weather | Touch panel, GPS, accelerometer | Bayesian Network classifier | 314 real-world records from 1 participant |
| Petersen et al. [126] | EEG data | Wireless EEG headset, touch panel | Bayesian EM approach | 8 |
| Kim et al. [113] | Movement, position and application usage | Accelerometer, touch panel, gyroscope, application logs | Naive Bayes, Bayesian network, simple logistic, best-first decision tree, C4.5 decision tree and naive Bayesian decision tree | N/A |

TABLE III. MOBILE AFFECT DETECTION.

the touch interface on a smartphone as a way to understand users’ emotional states. As modern smartphones have various embedded sensors such as accelerometer and gyroscope, they aimed to utilize data from these embedded sensors for recognizing human emotion and further finding emotional preferences for smartphone applications. They collected 12 attributes from 3 sensors during users’ touch behaviours, and recognized seven basic emotions with off-line analysis. Finally, they generated a preference matrix of applications by calculating the difference between the emotional states before and after application usage. The pilot study showed an average F1-measure score of 0.57 (and 0.82 with decision-tree-based methods) with 455 cases of touch behaviour. They discovered that simple touching behaviour has potential for recognizing users’ emotional states.

This paper proposed an emotion recognition framework for smartphone applications together with the analysis of an experimental study. The framework consists of an emotion recognition process and an emotional preference learning algorithm, which learns the difference between two emotional states, the prior and the posterior emotional state, for each entity in a mobile device, such as downloaded applications, media contents and contacts. Once the entities' emotional preferences have been learned, the system can recommend smartphone applications, media contents and mobile services that fit the user's current emotional state. Through the emotion recognition process, they tracked the temporal changes of the users' emotional states. The proposed emotion-based framework has the potential to help users experience emotional closeness and personalized recommendations.

Fig. 13 shows the overall process, which consists of monitoring, recognition and emotional preference learning modules. The monitoring module collects the device's sensory data, such as touch and accelerometer readings, together with their occurrence times. The emotion recognition module processes the collected data to infer the human's emotional state. When necessary, the emotion recognition module can use additional attributes such as memory, interaction and moods to enhance its accuracy and performance. As shown in the bottom part of Fig. 13, the emotional preference learning module is responsible for building an emotional preference matrix for the smartphone applications (and/or their embedded objects) by learning the difference in quantified emotional state between prior and posterior behaviours.

Fig. 13. The Conceptual Process of Emotion Recognition, Generation and Emotional Preference Learning.

People have their own preferences. To discover these, many technologies depend on user-based or item-based stochastic approaches, but almost all of them ignore the users' emotional factors. Using the emotion recognition module, they gave each application and object an emotional preference by learning the differences in the users' emotional state during the use of the application. Smartphone users interact with their devices through sequential or non-sequential activities such as dragging, flicking, tapping, executing applications, viewing contents and so on.
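The preference learning idea described above can be sketched as follows. This is only an illustration under stated assumptions: emotional states are represented as dictionaries of strengths in [0, 1], the preference of an application is taken to be the mean of (posterior minus prior) over its usage sessions, and the names `EMOTIONS` and `learn_preferences` are hypothetical. The original work recognizes seven basic emotions; three are used here for brevity, and the averaging rule is an assumption rather than the authors' algorithm.

```python
from collections import defaultdict

# Hypothetical emotion factors; the surveyed work uses seven basic emotions.
EMOTIONS = ("happiness", "sadness", "anger")


def learn_preferences(sessions):
    """Build an application-by-emotion preference matrix.

    sessions: iterable of (app, prior, posterior) tuples, where prior and
    posterior map emotion names to strengths in [0, 1]. Missing emotions
    are treated as strength 0.0 (an assumption of this sketch).
    Returns {app: [mean change per emotion, in EMOTIONS order]}.
    """
    sums = defaultdict(lambda: [0.0] * len(EMOTIONS))
    counts = defaultdict(int)
    for app, prior, posterior in sessions:
        for i, emotion in enumerate(EMOTIONS):
            # Change in emotional state attributed to using the application.
            sums[app][i] += posterior.get(emotion, 0.0) - prior.get(emotion, 0.0)
        counts[app] += 1
    return {app: [s / counts[app] for s in sums[app]] for app in sums}


if __name__ == "__main__":
    sessions = [
        ("music", {"happiness": 0.25}, {"happiness": 0.75}),
        ("music", {"happiness": 0.5, "sadness": 0.5},
                  {"happiness": 0.5, "sadness": 0.0}),
    ]
    print(learn_preferences(sessions))
```

A positive entry suggests that the application tends to strengthen that emotion and a negative entry that it tends to weaken it, which is the kind of signal a recommender could match against the user's current state.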


To train the model of the user's emotional state, they periodically collected emotion feedback. They tried to minimize noise during the experiment: because the user's movement and position can influence the accelerometer and gyroscope, they instructed the subject to sit and keep the same position whenever he performed an action with the smartphone. With the emotion recognition module, they found each application's emotional preferences by learning differences in the emotion attributes.

Although the experimental results showed comparatively low performance, they demonstrated the feasibility of a touch-based emotion recognition method, which has wide applicability since modern smart devices depend on touch interaction. They argued that recognition performance could be enhanced by additional data collection and by finding remedies for practical issues in future work. Collecting self-reported emotion data is one of the biggest challenges for this research. Some emotions, such as anger, fear and surprise, do not occur frequently in real-life situations and are hard to capture during experiments.

In summary, this work proposed emotion-recognition-based learning of smartphone application preferences. From the experiment, they discovered that simple touch behaviour can be used to recognize smartphone users' emotional states, and that reducing the number of emotions can improve recognition accuracy. They also found that the window size for recognition depends on the type of emotion and the classification method.

V. CHALLENGES AND OPPORTUNITIES

Unobtrusive affect detection requires sensors to be deployed on commonly used devices, and mobile affective computing takes advantage of the sensors embedded in smart devices. Even though researchers have put in a lot of effort, we have only touched the tip of the iceberg. Although the accuracy of some works has reached a relatively high level, we are still very far from practical usefulness. This section highlights the challenges of MAC and discusses some research potential.

A. Challenges

So far, the research has stayed at the surface of affective computing problems. Studies either reported a high correlation between collected data and results or showed reasonable affect detection accuracy with very few participants. Still, these results give us hope and inspire more and more talented researchers to devote themselves to solving the remaining problems.

The primary challenge is establishing ground truth. To our knowledge, researchers have recruited participants to fill in questionnaires to obtain ground truth. Alternatively, appropriate databases (audio, visual expressions) can be used to train the classifier. However, most databases are built by recording the emotional expressions of actors. Psychological research suggests that deliberate emotional expressions differ from spontaneously occurring expressions [15], [16]. Establishing a database of spontaneous emotional expressions is expensive and inefficient, because the raw data have to be labelled manually. Nevertheless, researchers have overcome these difficulties to build large spontaneous affective data sets.

The existing spontaneous affective data were collected in the following scenarios: human-human conversation, HCI, and use of a video kiosk. Human-human conversation scenarios include face-to-face interviews [17], [127]. HCI scenarios include computer-based dialogue systems [21], [79]. In the video kiosk settings, the subjects' affective reactions are recorded while they watch emotion-inducing videos [47], [128], [129]. In most of the existing databases, discrete emotion categories are used as the emotion descriptors. The labels of prototypical emotions are often used, especially in the databases of deliberate affective behaviour. In databases of spontaneous affective behaviour, coarse affective states such as positive versus negative, dimensional descriptions in the evaluation-activation space, and some application-dependent affective states are usually used as the data labels. Currently, several large audio, visual, and audiovisual sets of human spontaneous affective behaviour have been collected, some of which have been released for public use.

Friendly application design is also challenging for mobile emotion detection. Some works (e.g., [130]) have designed interesting applications (Fig. 9). A friendly design can make users more comfortable when providing affective feedback. The problem that arises is self-interference: such a design could itself affect the user's affective state. It is non-trivial to evaluate this self-interference and compensate for it in the results. Besides, the widespread use of smart devices may change users' social behaviour, which affects traditional emotional social relationships.

The lack of a mechanism for coordinating affective parameters under multi-modal conditions considerably limits affective understanding and affect prompts. Fusing different channels is not just a matter of combining them, but of finding the mutual relations among all channel information. These mutual relations could enable better integration of the different channels during interaction phases, for both recognition/understanding and information generation.

B. Potential research

Smart devices have enough power and popularity to offer ubiquitous affect detection, in that they keep tracking people's everyday lives and constantly collect affective data. Through mobile application markets (e.g., Apple App Store, Google Play), affect detection applications could be deployed to an almost unlimited number of users. This enables individual affective services, in which the application learns an individual's mood and emotion patterns and provides customized affect detection. In this way, we can keep collecting people's spontaneous affective data in their daily lives. This will eventually build scalable, customized spontaneous affective databases efficiently and effectively, which are helpful for extracting common affective features of humans and finding distinguishable fingerprints among different users.

In addition, many new HCI and smart wearable devices have gained popularity, so that affective fingerprints (e.g., voice, facial expression, gesture, heart rate and so on) could be fully investigated by combining multiple wearable devices. Therefore, designing an affective ecosystem that coordinates all devices for data fusion is very promising for boosting affect detection performance and exploring the possibility of further affective computing techniques.

Moreover, the emergence of the World Wide Web, the Internet and new technologies (e.g., mobile augmented reality [131], [132]) has changed people's lifestyles. It is necessary to discover new emotional features, which may exist in application logs, smart device usage patterns, locations, order histories, etc. There is a great need to thoroughly monitor and investigate these new emotional expression fingerprints. In other words, establishing new affective databases in terms of new affective fingerprints should be a very significant research topic.

VI. CONCLUSION

Research on human affect detection has witnessed a good deal of progress. In the past, affective computing research relied on well-controlled experiments in the lab. Available data sets were either deliberate emotional expressions from actors or small-scale spontaneous expressions established at substantial cost. The obtrusive data collection methods used in the lab also restricted the effectiveness of the research.

Today, new technologies bring new opportunities for affect detection in an unobtrusive and smart way. A number of promising methods for human affect detection based on unobtrusive tools (webcam, keystroke, mouse) and smartphones have been proposed. This paper focuses on summarising and discussing these novel approaches to the analysis of human affect and reveals the most important issues yet to be addressed in the field, which include the following:

• Building scalable, customized databases from which common affect features and distinguishable affect fingerprints of humans can be extracted,

• Devising an affective ecosystem that coordinates different smart wearable devices and takes into consideration the most comprehensive affective data to boost affect detection accuracy,

• Investigating potential new affect-related features arising from changing lifestyles.

Since the complexity of these issues, which concern the interpretation of human behaviour at a very deep level, is tremendous and requires highly interdisciplinary collaboration, we believe true breakthroughs in this field can be achieved by establishing an interdisciplinary international program directed toward computer intelligence of human behaviours. Research on unobtrusive and mobile affective computing can help advance research in multiple related fields, including education, psychology and social science.

REFERENCES

[1] R. W. Picard, Affective Computing. Cambridge, MA, USA: MIT Press, 1997.

[2] C. Darwin, The expression of the emotions in man and animals. Oxford University Press, 1998.

[3] W. James, What is an Emotion? Wilder Publications, 2007.

[4] W. D. Taylor and S. K. Damarin, “The second self: Computers and the human spirit,” Educational Technology Research and Development, vol. 33, no. 3, pp. 228–230, 1985.

[5] J. C. Hager, P. Ekman, and W. V. Friesen, “Facial action coding system,” Salt Lake City, UT: A Human Face, 2002.

[6] R. Cowie, E. Douglas-Cowie, and C. Cox, “Beyond emotion archetypes: Databases for emotion modelling using neural networks,” Neural networks, vol. 18, no. 4, pp. 371–388, 2005.

[7] R. Banse and K. R. Scherer, “Acoustic profiles in vocal emotion expression.” Journal of personality and social psychology, vol. 70, no. 3, p. 614, 1996.

[8] L. S.-H. Chen, “Joint processing of audio-visual information for the recognition of emotional expressions in human-computer interaction,” Ph.D. dissertation, Citeseer, 2000.

[9] R. Gross, “Face databases,” in Handbook of Face Recognition. Springer, 2005, pp. 301–327.

[10] T. Kanade, J. F. Cohn, and Y. Tian, “Comprehensive database for facial expression analysis,” in Automatic Face and Gesture Recognition, 2000. Proceedings. Fourth IEEE International Conference on. IEEE, 2000, pp. 46–53.

[11] H. Gunes and M. Piccardi, “A bimodal face and body gesture database for automatic analysis of human nonverbal affective behavior,” in Pattern Recognition, 2006. ICPR 2006. 18th International Conference on, vol. 1. IEEE, 2006, pp. 1148–1153.

[12] M. Pantic, M. Valstar, R. Rademaker, and L. Maat, “Web-based database for facial expression analysis,” in Multimedia and Expo, 2005. ICME 2005. IEEE International Conference on. IEEE, 2005, pp. 5–pp.

[13] L. Yin, X. Wei, Y. Sun, J. Wang, and M. J. Rosato, “A 3d facial expression database for facial behavior research,” in Automatic face and gesture recognition, 2006. FGR 2006. 7th international conference on. IEEE, 2006, pp. 211–216.

[14] J. J. Gross and R. W. Levenson, “Emotion elicitation using films,” Cognition & Emotion, vol. 9, no. 1, pp. 87–108, 1995.

[15] C. Whissell, “The dictionary of affect in language.” Emotion: Theory, Research, and Experience. The Measurement of Emotions, vol. 4, pp. 113–131, 1989.

[16] P. Ekman and E. L. Rosenberg, What the face reveals: Basic and applied studies of spontaneous expression using the Facial Action Coding System (FACS). Oxford University Press, 1997.

[17] M. S. Bartlett, G. Littlewort, M. Frank, C. Lainscsek, I. Fasel, and J. Movellan, “Recognizing facial expression: machine learning and application to spontaneous behavior,” in Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on, vol. 2. IEEE, 2005, pp. 568–573.

[18] A. B. Ashraf, S. Lucey, J. F. Cohn, T. Chen, Z. Ambadar, K. M. Prkachin, and P. E. Solomon, “The painful face–pain expression recognition using active appearance models,” Image and Vision Computing, vol. 27, no. 12, pp. 1788–1796, 2009.

[19] J. F. Cohn and K. L. Schmidt, “The timing of facial motion in posed and spontaneous smiles,” International Journal of Wavelets, Multiresolution and Information Processing, vol. 2, no. 02, pp. 121–132, 2004.

[20] M. F. Valstar, M. Pantic, Z. Ambadar, and J. F. Cohn, “Spontaneous vs. posed facial behavior: automatic analysis of brow actions,” in Proceedings of the 8th international conference on Multimodal interfaces. ACM, 2006, pp. 162–170.

[21] C. M. Lee and S. S. Narayanan, “Toward detecting emotions in spoken dialogs,” Speech and Audio Processing, IEEE Transactions on, vol. 13, no. 2, pp. 293–303, 2005.

[22] A. Batliner, K. Fischer, R. Huber, J. Spilker, and E. Noth, “How to find trouble in communication,” Speech communication, vol. 40, no. 1, pp. 117–143, 2003.

[23] S. Greengard, “Following the crowd,” Communications of the ACM, vol. 54, no. 2, pp. 20–22, 2011.

[24] R. Morris, “Crowdsourcing workshop: the emergence of affective crowdsourcing,” in Proceedings of the 2011 annual conference extended abstracts on Human factors in computing systems. ACM, 2011.

[25] M. Soleymani and M. Larson, “Crowdsourcing for affective annotation of video: Development of a viewer-reported boredom corpus,” in Proceedings of the ACM SIGIR 2010 workshop on crowdsourcing for search evaluation (CSE 2010), 2010, pp. 4–8.

[26] S. Klettner, H. Huang, M. Schmidt, and G. Gartner, “Crowdsourcing affective responses to space,” Kartographische Nachrichten-Journal of Cartography and Geographic Information, vol. 2, pp. 66–73, 2013.

[27] G. Tavares, A. Mourao, and J. Magalhaes, “Crowdsourcing for affective-interaction in computer games,” in Proceedings of the 2nd ACM international workshop on Crowdsourcing for multimedia. ACM, 2013, pp. 7–12.


[28] B. G. Morton, J. A. Speck, E. M. Schmidt, and Y. E. Kim, “Improving music emotion labeling using human computation,” in Proceedings of the acm sigkdd workshop on human computation. ACM, 2010, pp. 45–48.

[29] S. D. Kamvar and J. Harris, “We feel fine and searching the emotional web,” in Proceedings of the fourth ACM international conference on Web search and data mining. ACM, 2011, pp. 117–126.

[30] “Amazon mechanical turk,” https://www.mturk.com/mturk/welcome.

[31] “Worldwide smartphone user base hits 1 billion,” http://www.cnet.com/news/worldwide-smartphone-user-base-hits-1-billion/.

[32] J. F. Cohn, “Foundations of human computing: facial expression and emotion,” in Proceedings of the 8th international conference on Multimodal interfaces. ACM, 2006, pp. 233–238.

[33] R. Cowie, E. Douglas-Cowie, N. Tsapatsoulis, G. Votsis, S. Kollias, W. Fellenz, and J. G. Taylor, “Emotion recognition in human-computer interaction,” Signal Processing Magazine, IEEE, vol. 18, no. 1, pp. 32–80, 2001.

[34] M. Pantic, A. Pentland, A. Nijholt, and T. S. Huang, “Human computing and machine understanding of human behavior: a survey,” in Artifical Intelligence for Human Computing. Springer, 2007, pp. 47–71.

[35] A. Pentland, “Socially aware, computation and communication,” Computer, vol. 38, no. 3, pp. 33–40, 2005.

[36] C. Darwin, The expression of the emotions in man and animals. University of Chicago press, 1965, vol. 526.

[37] B. Parkinson, Ideas and realities of emotion. Psychology Press, 1995.

[38] J. J. Gross, “Emotion regulation: Affective, cognitive, and social consequences,” Psychophysiology, vol. 39, no. 3, pp. 281–291, 2002.

[39] J. J. Gross, “The emerging field of emotion regulation: An integrative review.” Review of general psychology, vol. 2, no. 3, p. 271, 1998.

[40] A. Ohman and J. J. Soares, “‘Unconscious anxiety’: phobic responses to masked stimuli.” Journal of abnormal psychology, vol. 103, no. 2, p. 231, 1994.

[41] A. Ohman and J. J. Soares, “On the automatic nature of phobic fear: conditioned electrodermal responses to masked fear-relevant stimuli.” Journal of abnormal psychology, vol. 102, no. 1, p. 121, 1993.

[42] M. S. Clark and I. Brissette, “Two types of relationship closeness and their influence on people’s emotional lives.” 2003.

[43] R. LiKamWa, Y. Liu, N. D. Lane, and L. Zhong, “Moodscope: Building a mood sensor from smartphone usage patterns,” in Proceeding of the 11th annual international conference on Mobile systems, applications, and services. ACM, 2013, pp. 389–402.

[44] K. K. Rachuri, M. Musolesi, C. Mascolo, P. J. Rentfrow, C. Longworth, and A. Aucinas, “Emotionsense: a mobile phones based adaptive platform for experimental social psychology research,” in Proceedings of the 12th ACM international conference on Ubiquitous computing. ACM, 2010, pp. 281–290.

[45] B. Fasel and J. Luettin, “Automatic facial expression analysis: a survey,” Pattern Recognition, vol. 36, no. 1, pp. 259–275, 2003.

[46] O. Pierre-Yves, “The production and recognition of emotions in speech: features and algorithms,” International Journal of Human-Computer Studies, vol. 59, no. 1, pp. 157–183, 2003.

[47] M. Pantic and M. S. Bartlett, “Machine analysis of facial expressions,” 2007.

[48] M. Pantic and L. J. Rothkrantz, “Toward an affect-sensitive multimodal human-computer interaction,” Proceedings of the IEEE, vol. 91, no. 9, pp. 1370–1390, 2003.

[49] M. Pantic, N. Sebe, J. F. Cohn, and T. Huang, “Affective multimodal human-computer interaction,” in Proceedings of the 13th annual ACM international conference on Multimedia. ACM, 2005, pp. 669–676.

[50] N. Sebe, I. Cohen, and T. S. Huang, “Multimodal emotion recognition,” Handbook of Pattern Recognition and Computer Vision, vol. 4, pp. 387–419, 2005.

[51] A. Samal and P. A. Iyengar, “Automatic recognition and analysis of human faces and facial expressions: A survey,” Pattern recognition, vol. 25, no. 1, pp. 65–77, 1992.

[52] Y.-L. Tian, T. Kanade, and J. F. Cohn, “Facial expression analysis,” in Handbook of face recognition. Springer, 2005, pp. 247–275.

[53] A. Kleinsmith and N. Bianchi-Berthouze, “Affective body expression perception and recognition: A survey,” Affective Computing, IEEE Transactions on, vol. 4, no. 1, pp. 15–33, 2013.

[54] J. Tao and T. Tan, “Affective computing: A review,” in Affective computing and intelligent interaction. Springer, 2005, pp. 981–995.

[55] Z. Zeng, M. Pantic, G. I. Roisman, and T. S. Huang, “A survey of affect recognition methods: Audio, visual, and spontaneous expressions,” Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 31, no. 1, pp. 39–58, 2009.

[56] B. Pang and L. Lee, “Opinion mining and sentiment analysis,” Foundations and trends in information retrieval, vol. 2, no. 1-2, pp. 1–135, 2008.

[57] S. Brave and C. Nass, “Emotion in human-computer interaction,” The human-computer interaction handbook: fundamentals, evolving technologies and emerging applications, pp. 81–96, 2003.

[58] K. R. Scherer, T. Banziger, and E. Roesch, A Blueprint for Affective Computing: A sourcebook and manual. Oxford University Press, 2010.

[59] D. Gokcay, G. Yildirim, and I. Global, Affective computing and interaction: Psychological, cognitive, and neuroscientific perspectives. Information Science Reference, 2011.

[60] R. A. Calvo and S. K. D’Mello, New perspectives on affect and learning technologies. Springer, 2011, vol. 3.

[61] J. A. Russell, “Core affect and the psychological construction of emotion.” Psychological review, vol. 110, no. 1, p. 145, 2003.

[62] J. A. Russell, J.-A. Bachorowski, and J.-M. Fernandez-Dols, “Facial and vocal expressions of emotion,” Annual review of psychology, vol. 54, no. 1, pp. 329–349, 2003.

[63] L. F. Barrett, “Are emotions natural kinds?” Perspectives on psychological science, vol. 1, no. 1, pp. 28–58, 2006.

[64] L. F. Barrett, B. Mesquita, K. N. Ochsner, and J. J. Gross, “The experience of emotion,” Annual review of psychology, vol. 58, p. 373, 2007.

[65] T. Dalgleish, B. D. Dunn, and D. Mobbs, “Affective neuroscience: Past, present, and future,” Emotion Review, vol. 1, no. 4, pp. 355–368, 2009.

[66] T. Dalgleish, M. J. Power, and J. Wiley, Handbook of cognition and emotion. Wiley Online Library, 1999.

[67] M. Lewis, J. M. Haviland-Jones, and L. F. Barrett, Handbook of emotions. Guilford Press, 2010.

[68] R. J. Davidson, K. R. Scherer, and H. Goldsmith, Handbook of affective sciences. Oxford University Press, 2003.

[69] M. Minsky, The Society of Mind. New York, NY, USA: Simon & Schuster, Inc., 1986.

[70] S. D’Mello, R. Picard, and A. Graesser, “Towards an affect-sensitive autotutor,” IEEE Intelligent Systems, vol. 22, no. 4, pp. 53–61, 2007.

[71] M. S. Bartlett, G. Littlewort, B. Braathen, T. J. Sejnowski, and J. R. Movellan, “A prototype for automatic recognition of spontaneous facial actions,” in Advances in neural information processing systems. MIT; 1998, 2003, pp. 1295–1302.

[72] J. F. Cohn, L. I. Reed, Z. Ambadar, J. Xiao, and T. Moriyama, “Automatic analysis and recognition of brow actions and head motion in spontaneous facial behavior,” in Systems, Man and Cybernetics, 2004 IEEE International Conference on, vol. 1. IEEE, 2004, pp. 610–616.

[73] A. Kapoor, W. Burleson, and R. W. Picard, “Automatic prediction of frustration,” International Journal of Human-Computer Studies, vol. 65, no. 8, pp. 724–736, 2007.

[74] G. C. Littlewort, M. S. Bartlett, and K. Lee, “Faces of pain: automated measurement of spontaneous facial expressions of genuine and posed pain,” in Proceedings of the 9th international conference on Multimodal interfaces. ACM, 2007, pp. 15–21.

[75] I. Arroyo, D. G. Cooper, W. Burleson, B. P. Woolf, K. Muldner, and R. Christopherson, “Emotion sensors go to school.” in AIED, vol. 200, 2009, pp. 17–24.

[76] L. Devillers, L. Vidrascu, and L. Lamel, “Challenges in real-life emotion annotation and machine learning based detection,” Neural Networks, vol. 18, no. 4, pp. 407–422, 2005.

[77] L. Devillers and L. Vidrascu, “Real-life emotions detection with lexical and paralinguistic cues on human-human call center dialogs.” in Interspeech, 2006.

[78] B. Schuller, J. Stadermann, and G. Rigoll, “Affect-robust speech recognition by dynamic emotional adaptation,” in Proc. speech prosody, 2006.

[79] D. J. Litman and K. Forbes-Riley, “Predicting student emotions in computer-human tutoring dialogues,” in Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics. Association for Computational Linguistics, 2004, p. 351.

[80] B. Schuller, R. J. Villar, G. Rigoll, M. Lang et al., “Meta-classifiers in acoustic and linguistic feature fusion-based affect recognition,” in Proc. ICASSP, vol. 5, 2005, pp. 325–328.

[81] R. Fernandez and R. W. Picard, “Modeling drivers’ speech under stress,” Speech Communication, vol. 40, no. 1, pp. 145–159, 2003.

[82] S. Mota and R. W. Picard, “Automated posture analysis for detecting learner’s interest level,” in Computer Vision and Pattern Recognition Workshop, 2003. CVPRW’03. Conference on, vol. 5. IEEE, 2003, pp. 49–49.


[83] S. D’Mello and A. Graesser, “Automatic detection of learner’s affectfrom gross body language,” Applied Artificial Intelligence, vol. 23,no. 2, pp. 123–150, 2009.

[84] N. Bianchi-Berthouze and C. L. Lisetti, “Modeling multimodal ex-pression of users affective subjective experience,” User Modeling andUser-Adapted Interaction, vol. 12, no. 1, pp. 49–84, 2002.

[85] G. Castellano, M. Mortillaro, A. Camurri, G. Volpe, and K. Scherer,“Automated analysis of body movement in emotionally expressivepiano performances,” 2008.

[86] J. Hernandez, P. Paredes, A. Roseway, and M. Czerwinski, “Underpressure: sensing stress of computer users,” in Proceedings of the32nd annual ACM conference on Human factors in computing systems.ACM, 2014, pp. 51–60.

[87] C. Epp, M. Lippold, and R. L. Mandryk, “Identifying emotional statesusing keystroke dynamics,” in Proceedings of the SIGCHI Conferenceon Human Factors in Computing Systems. ACM, 2011, pp. 715–724.

[88] I. Karunaratne, A. S. Atukorale, and H. Perera, “Surveillance of human-computer interactions: A way forward to detection of users’ psycho-logical distress,” in Humanities, Science and Engineering (CHUSER),2011 IEEE Colloquium on. IEEE, 2011, pp. 491–496.

[89] P. Khanna and M. Sasikumar, “Recognising emotions from key-board stroke pattern,” International Journal of Computer Applications,vol. 11, no. 9, pp. 8975–8887, 2010.

[90] F. Monrose and A. D. Rubin, “Keystroke dynamics as a biometric forauthentication,” Future Generation computer systems, vol. 16, no. 4,pp. 351–359, 2000.

[91] L. M. Vizer, L. Zhou, and A. Sears, “Automated stress detection usingkeystroke and linguistic features: An exploratory study,” InternationalJournal of Human-Computer Studies, vol. 67, no. 10, pp. 870–886,2009.

[92] E. Niedermeyer and F. L. da Silva, Electroencephalography: basic principles, clinical applications, and related fields. Lippincott Williams & Wilkins, 2005.

[93] O. AlZoubi, R. A. Calvo, and R. H. Stevens, “Classification of eeg for affect recognition: an adaptive approach,” in AI 2009: Advances in Artificial Intelligence. Springer, 2009, pp. 52–61.

[94] A. Heraz and C. Frasson, “Predicting the three major dimensions of the learner’s emotions from brainwaves.”

[95] A. Bashashati, M. Fatourechi, R. K. Ward, and G. E. Birch, “A survey of signal processing algorithms in brain–computer interfaces based on electrical brain signals,” Journal of Neural Engineering, vol. 4, no. 2, p. R32, 2007.

[96] F. Lotte, M. Congedo, A. Lecuyer, F. Lamarche, B. Arnaldi et al., “A review of classification algorithms for eeg-based brain–computer interfaces,” Journal of Neural Engineering, vol. 4, 2007.

[97] C. O. Alm, D. Roth, and R. Sproat, “Emotions from text: machine learning for text-based emotion prediction,” in Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 2005, pp. 579–586.

[98] T. Danisman and A. Alpkocak, “Feeler: Emotion classification of text using vector space model,” in AISB 2008 Convention Communication, Interaction and Social Intelligence, vol. 1, 2008, p. 53.

[99] C. Strapparava and R. Mihalcea, “Learning to identify emotions in text,” in Proceedings of the 2008 ACM symposium on Applied computing. ACM, 2008, pp. 1556–1560.

[100] C. Ma, H. Prendinger, and M. Ishizuka, “A chat system based on emotion estimation from text and embodied conversational messengers,” in Entertainment Computing-ICEC 2005. Springer, 2005, pp. 535–538.

[101] P. H. Dietz, B. Eidelson, J. Westhues, and S. Bathiche, “A practical pressure sensitive computer keyboard,” in Proceedings of the 22nd annual ACM symposium on User interface software and technology. ACM, 2009, pp. 55–58.

[102] I. J. Roseman, M. S. Spindel, and P. E. Jose, “Appraisals of emotion-eliciting events: Testing a theory of discrete emotions.” Journal of Personality and Social Psychology, vol. 59, no. 5, p. 899, 1990.

[103] I. J. Roseman, “Cognitive determinants of emotion: A structural theory.” Review of Personality & Social Psychology, 1984.

[104] “Smartphone,” http://en.wikipedia.org/wiki/Smartphone.

[105] “Natural usage of smartphone,” http://www.marketingcharts.com/wp/online/8-in-10-smart-device-owners-use-them-all-the-time-on-vacation-28979/.

[106] K. Dolui, S. Mukherjee, and S. K. Datta, “Smart device sensing architectures and applications,” in Computer Science and Engineering Conference (ICSEC), 2013 International. IEEE, 2013, pp. 91–96.

[107] M.-Z. Poh, D. J. McDuff, and R. W. Picard, “Advancements in non-contact, multiparameter physiological measurements using a webcam,” Biomedical Engineering, IEEE Transactions on, vol. 58, no. 1, pp. 7–11, 2011.

[108] J. Allen, “Photoplethysmography and its application in clinical physiological measurement,” Physiological Measurement, vol. 28, no. 3, p. R1, 2007.

[109] P. Zimmermann, S. Guttormsen, B. Danuser, and P. Gomez, “Affective computing–a rationale for measuring mood with mouse and keyboard,” International Journal of Occupational Safety and Ergonomics, vol. 9, no. 4, pp. 539–551, 2003.

[110] N. Villar, S. Izadi, D. Rosenfeld, H. Benko, J. Helmes, J. Westhues, S. Hodges, E. Ofek, A. Butler, X. Cao et al., “Mouse 2.0: multi-touch meets the mouse,” in Proceedings of the 22nd annual ACM symposium on User interface software and technology. ACM, 2009, pp. 33–42.

[111] Y. Gao, N. Bianchi-Berthouze, and H. Meng, “What does touch tell us about emotions in touchscreen-based gameplay?” ACM Transactions on Computer-Human Interaction (TOCHI), vol. 19, no. 4, p. 31, 2012.

[112] “Transforming the impossible to the natural,” https://www.youtube.com/watch?v=3jmmIc9GvdY.

[113] H.-J. Kim and Y. S. Choi, “Exploring emotional preference for smartphone applications,” in Consumer Communications and Networking Conference (CCNC), 2012 IEEE. IEEE, 2012, pp. 245–249.

[114] G. Tan, H. Jiang, S. Zhang, Z. Yin, and A.-M. Kermarrec, “Connectivity-based and anchor-free localization in large-scale 2d/3d sensor networks,” ACM Transactions on Sensor Networks (TOSN), vol. 10, no. 1, p. 6, 2013.

[115] M. J. Hertenstein, R. Holmes, M. McCullough, and D. Keltner, “The communication of emotion via touch.” Emotion, vol. 9, no. 4, p. 566, 2009.

[116] G. Chittaranjan, J. Blom, and D. Gatica-Perez, “Who’s who with big-five: Analyzing and classifying personality traits with smartphones,” in Wearable Computers (ISWC), 2011 15th Annual International Symposium on. IEEE, 2011, pp. 29–36.

[117] J. M. George, “State or trait: Effects of positive mood on prosocial behaviors at work.” Journal of Applied Psychology, vol. 76, no. 2, p. 299, 1991.

[118] N. H. Frijda, “The place of appraisal in emotion,” Cognition & Emotion, vol. 7, no. 3-4, pp. 357–387, 1993.

[119] P. E. Ekman and R. J. Davidson, The nature of emotion: Fundamental questions. Oxford University Press, 1994.

[120] A. Jain and D. Zongker, “Feature selection: Evaluation, application, and small sample performance,” Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 19, no. 2, pp. 153–158, 1997.

[121] T. Bentley, L. Johnston, and K. von Baggo, “Evaluation using cued-recall debrief to elicit information about a user’s affective experiences,” in Proceedings of the 17th Australia conference on Computer-Human Interaction: Citizens Online: Considerations for Today and the Future. Computer-Human Interaction Special Interest Group (CHISIG) of Australia, 2005, pp. 1–10.

[122] H. Lee, Y. S. Choi, S. Lee, and I. Park, “Towards unobtrusive emotion recognition for affective social communication,” in Consumer Communications and Networking Conference (CCNC), 2012 IEEE. IEEE, 2012, pp. 260–264.

[123] P. Ekman, “Universals and cultural differences in facial expressions of emotion.” in Nebraska symposium on motivation. University of Nebraska Press, 1971.

[124] “Weka 3,” http://www.cs.waikato.ac.nz/ml/weka/.

[125] R. Tato, R. Santos, R. Kompe, and J. M. Pardo, “Emotional space improves emotion recognition.” in INTERSPEECH, 2002.

[126] M. K. Petersen, C. Stahlhut, A. Stopczynski, J. E. Larsen, and L. K. Hansen, “Smartphones get emotional: mind reading images and reconstructing the neural sources,” in Affective Computing and Intelligent Interaction. Springer, 2011, pp. 578–587.

[127] E. Douglas-Cowie, N. Campbell, R. Cowie, and P. Roach, “Emotional speech: Towards a new generation of databases,” Speech Communication, vol. 40, no. 1, pp. 33–60, 2003.

[128] A. J. O’Toole, J. Harms, S. L. Snow, D. R. Hurst, M. R. Pappas, J. H. Ayyad, and H. Abdi, “A video database of moving faces and people,” Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 27, no. 5, pp. 812–816, 2005.

[129] N. Sebe, M. S. Lew, Y. Sun, I. Cohen, T. Gevers, and T. S. Huang, “Authentic facial expression analysis,” Image and Vision Computing, vol. 25, no. 12, pp. 1856–1863, 2007.

[130] R. LiKamWa, Y. Liu, N. D. Lane, and L. Zhong, “Can your smartphone infer your mood?” in Proc. PhoneSense Workshop, 2011.

[131] T. Hollerer and S. Feiner, “Mobile augmented reality,” Telegeoinformatics: Location-Based Computing and Services. Taylor and Francis Books Ltd., London, UK, vol. 21, 2004.

[132] Z. Huang, P. Hui, C. Peylo, and D. Chatzopoulos, “Mobile augmented reality survey: a bottom-up approach.” CoRR, vol. abs/1309.4413, 2013. [Online]. Available: http://dblp.uni-trier.de/db/journals/corr/corr1309.html#HuangHPC13