

IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS - PART A: SYSTEMS AND HUMANS, VOL. 39, NO. 4, JULY 2009

Emotion Recognition From Facial Expressions and Its Control Using Fuzzy Logic

Aruna Chakraborty, Amit Konar, Member, IEEE, Uday Kumar Chakraborty, and Amita Chatterjee

Abstract: This paper presents a fuzzy relational approach to human emotion recognition from facial expressions and its control. The proposed scheme uses external stimulus to excite specific emotions in human subjects whose facial expressions are analyzed by segmenting and localizing the individual frames into regions of interest. Selected facial features such as eye opening, mouth opening, and the length of eyebrow constriction are extracted from the localized regions, fuzzified, and mapped onto an emotion space by employing Mamdani-type relational models. A scheme for the validation of the system parameters is also presented. This paper also provides a fuzzy scheme for controlling the transition of emotion dynamics toward a desired state. Experimental results and computer simulations indicate that the proposed scheme for emotion recognition and control is simple and robust, with good accuracy.

Index Terms: Emotion control, emotion modeling, emotion recognition, fuzzy logic.

    I. INTRODUCTION

HUMANS often use nonverbal cues such as hand gestures, facial expressions, and tone of the voice to express feelings in interpersonal communications. Unfortunately, currently available human-computer interfaces do not take complete advantage of these valuable communicative media and thus are unable to provide the full benefits of natural interaction to the users. Human-computer interactions could significantly be improved if computers could recognize the emotion of the users from their facial expressions and hand gestures, and react in a friendly manner according to the users' needs and preferences [4].

The phrase "affective computing" [30] is currently gaining popularity in the literature of human-computer interfaces [35], [46]. The primary role of affective computing is to monitor the affective states of people engaged in critical/accident-prone environments to provide assistance in terms of appropriate alerts to prevent accidents.

Manuscript received December 18, 2006; revised March 16, 2008. First published April 17, 2009; current version published June 19, 2009. This work was supported in part by the UGC (UPE) program, Jadavpur University, Calcutta. This paper was recommended by Associate Editor S. Narayanan.

A. Chakraborty is with the Department of Computer Science and Engineering, St. Thomas College of Engineering and Technology, Calcutta 700 023, India, and also with Jadavpur University, Calcutta 700 032, India (e-mail: [email protected]).

A. Konar is with the Department of Electronics and Tele-Communication Engineering, Jadavpur University, Calcutta 700 032, India (e-mail: [email protected]).

U. K. Chakraborty is with the Department of Mathematics and Computer Science, University of Missouri, St. Louis, MO 63121 USA (e-mail: [email protected]).

A. Chatterjee is with the Centre for Cognitive Science, Department of Philosophy, Jadavpur University, Calcutta 700 032, India (e-mail: [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TSMCA.2009.2014645

Li and Ji [25] proposed a probabilistic framework to dynamically model and recognize the users' affective states so as to provide them with corrective assistance in a timely and efficient manner. Picard et al. [31] stressed the significance of human emotions on their affective psychological states. Rani et al. [32] presented a novel scheme for the fusion of multiple psychological indices for real-time detection of a specific affective state (anxiety) of people using fuzzy logic and regression trees, and compared the relative merits of the two schemes. Among the other interesting applications in affective computing, the works of Scheirer et al. [35], Conati [7], Kramer et al. [22], and Rani et al. [32], [33] deserve special mention.

Apart from human-computer interfaces, emotion recognition by computers has interesting applications in computerized psychological counseling and therapy, and in the detection of criminal and antisocial motives. The identification of human emotions from facial expressions by a machine is a complex problem for the following reasons. First, identification of the exact facial expression from a blurred facial image is not an easy task. Second, segmentation of a facial image into regions of interest is difficult, particularly when the regions do not have significant differences in their imaging attributes. Third, unlike humans, machines usually do not have visual perception to map facial expressions into emotions.

Very few works on human emotion detection have so far been reported in the current literature on machine intelligence. Ekman and Friesen [9] proposed a scheme for the recognition of facial expressions from the movements of cheek, chin, and wrinkles. They have reported that there exist many basic movements of human eyes, eyebrows, and mouth that have direct correlation with facial expressions. Kobayashi and Hara [18]-[20] designed a scheme for the recognition of human facial expressions using the well-known backpropagation neural algorithms [16], [39]-[41]. Their scheme is capable of recognizing six common facial expressions depicting happiness, sadness, fear, anger, surprise, and disgust. Among the well-known methods of determining human emotions, Fourier descriptor [40], template matching [2], neural network models [11], [34], [40], and fuzzy integral [15] techniques deserve special mention. Yamada [45] proposed a new method of recognizing emotions through the classification of visual information. Fernandez-Dols et al. proposed a scheme for decoding emotions from facial expression and content [12]. Kawakami et al. [16] analyzed in detail the scope of emotion modeling from facial expressions. Busso and Narayanan compared the scope of facial expressions, speech, and multimodal information in emotion recognition [4].



Cohen et al. [5], [6] considered temporal variations in facial expressions, which are displayed in live video, to recognize emotions. They proposed a new architecture of hidden Markov models to automatically segment and recognize facial expressions. Gao et al. [13] presented a methodology for facial expression recognition from a single facial image using line-based caricatures. Lanitis et al. [23] proposed a novel technique for the automatic interpretation and coding of face images using flexible models. Examples of other important works on the recognition of facial expression for conveying emotions include [3], [8], [10], [24], [26], [34], [37], [38], and [44].

This paper provides an alternative scheme for human emotion recognition from facial images, and its control, using fuzzy logic. Audiovisual stimulus is used to excite the emotions of subjects, and their facial expressions are recorded as video movie clips. The individual video frames are analyzed to segment the facial images into regions of interest. Fuzzy C-means (FCM) clustering [1] is used for the segmentation of the facial images into three important regions containing mouth, eyes, and eyebrows. Next, a fuzzy reasoning algorithm is invoked to map fuzzified attributes of the facial expressions into fuzzy emotions. The exact emotion is extracted from fuzzified emotions by a denormalization procedure similar to defuzzification (fuzzy decoding). The proposed scheme is both robust and insensitive to noise because of the nonlinear mapping of image attributes to emotions in the fuzzy domain. Experimental results show that the detection accuracies of emotions for adult males, adult females, and children of 8-12 years are as high as 88%, 92%, and 96%, respectively, outperforming the percentage accuracies of the existing techniques [26], [43]. This paper also proposes a scheme for controlling emotion [36] by judiciously selecting appropriate audiovisual stimulus for presentation before the subject. The selection of the audiovisual stimulus is undertaken using fuzzy logic. Experimental results show that the proposed control scheme has good experimental accuracy and repeatability.

This paper is organized into eight sections. Section II provides new techniques for the segmentation and localization of important components in a human facial image. In Section III, a set of image attributes, including eye opening (EO), mouth opening (MO), and the length of eyebrow constriction (EBC), is determined online from the segmented images. In Section IV, we fuzzify the measurements of imaging attributes into three distinct fuzzy sets: HIGH, MEDIUM, and LOW; the principles of the fuzzy relational scheme for emotion recognition are also discussed in this section. Experimental issues pertaining to emotion recognition are presented in Section V. Validation of the proposed scheme is undertaken in Section VI, where measures are taken to tune the membership distributions for improving the performance of the overall system. A scheme for emotion control, along with experimental issues, is covered in Section VII. Conclusions are drawn in Section VIII.

II. FILTERING, SEGMENTATION, AND LOCALIZATION OF FACIAL COMPONENTS

The identification of facial expressions by pixel-wise analysis of images is both tedious and time consuming. This paper attempts to extract significant components of facial expressions through segmentation of the image. Because of the differences in the regional profiles on an image, simple segmentation algorithms, such as histogram-based thresholding techniques, do not always yield good results. After conducting several experiments, we concluded that for the segmentation of the mouth region, a color-sensitive segmentation algorithm is most appropriate. Further, because of apparent nonuniformity in the lip color profile, a fuzzy segmentation algorithm is preferred. A color-sensitive FCM clustering algorithm [9] has, therefore, been selected for the segmentation of the mouth region.

Segmentation of the eye regions, however, in most images has successfully been performed by the traditional thresholding method. The hair region in a human face can also easily be segmented by the thresholding technique. Segmentation of the mouth and eye regions is required for the subsequent determination of MO and EO, respectively. Segmentation of the eyebrow region is equally useful in determining the length of EBC. The details of the segmentation techniques of different regions are presented below.

    A. Segmentation of the Mouth Region

Before segmenting the mouth region, we first represent the image in the L*a*b* space from its conventional red-green-blue (RGB) space. The L*a*b* system has the additional benefit of representing a perceptually uniform color space. It defines a uniform metric space representation of color, so that a perceptual color difference is represented by the Euclidean distance. The color information, however, is not adequate to identify the lip region. The position information of pixels together with their color would be a good feature to segment the lip region from the face. The FCM clustering algorithm that we employ to detect the lip region is supplied with both color and pixel-position information of the image.

The FCM clustering algorithm is a well-known technique for unsupervised pattern recognition. However, its use in image segmentation in general and lip region segmentation in particular is a novel area of research. A description of the FCM clustering algorithm can be found in books on fuzzy pattern recognition (see, e.g., [1], [17], and [45]). In this paper, we just demonstrate how to use FCM clustering in the present application.

A pixel in this paper is described by five attributes: three attributes of color information (L*, a*, b*) and two attributes of position information (x, y). The objective of the clustering algorithm is to classify the set of 5-D data points into two classes/partitions: the lip region and the non-lip region. Initial membership values are assigned to each 5-D pixel, such that the sum of the memberships in the two regions is equal to one. That is, for the kth pixel x_k, we have

$L(x_k) + NL(x_k) = 1$   (1)

where $L(x_k)$ and $NL(x_k)$ denote the membership of $x_k$ to fall in the lip and non-lip regions, respectively.

Fig. 1. Original face image.

Fig. 2. Median-filtered image.

Given the initial membership values of $L(x_k)$ and $NL(x_k)$ for k = 1 to n², assuming that the image is of size n × n, we use the FCM algorithm to determine the cluster centers $V_L$ and $V_{NL}$ of the lip and the non-lip regions as

$V_L = \sum_{k=1}^{n^2} [L(x_k)]^m x_k \Big/ \sum_{k=1}^{n^2} [L(x_k)]^m$   (2)

$V_{NL} = \sum_{k=1}^{n^2} [NL(x_k)]^m x_k \Big/ \sum_{k=1}^{n^2} [NL(x_k)]^m$   (3)

Expressions (2) and (3) provide centroidal measures of the lip and non-lip clusters, evaluated over all data points $x_k$ for k = 1 to n². The parameter m (> 1) is any real number that affects the membership grade. The membership values of pixel $x_k$ in the image for the lip and non-lip regions are obtained from the following formulas:

$L(x_k) = \left[\sum_{j=1}^{2} \left(\|x_k - v_L\|^2 / \|x_k - v_j\|^2\right)^{1/(m-1)}\right]^{-1}$   (4)

$NL(x_k) = \left[\sum_{j=1}^{2} \left(\|x_k - v_{NL}\|^2 / \|x_k - v_j\|^2\right)^{1/(m-1)}\right]^{-1}$   (5)

where $v_j$ denotes the jth cluster center for j ∈ {L, NL}. Determination of the cluster centers [by (2) and (3)] and membership evaluation [by (4) and (5)] are repeated several times following the FCM algorithm until the positions of the cluster centers do not further change.

Fig. 1 presents a section of a facial image with a large MO. This image is passed through a median filter, and the resulting image is shown in Fig. 2. Application of the FCM algorithm to the image in Fig. 2 yields the image in Fig. 3. Fig. 4 demonstrates the computation of MO (Section III).
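To make the clustering step concrete, the following Python sketch runs the two-class FCM iteration of (2)-(5) on 5-D pixel feature vectors (L*, a*, b*, x, y). It is only an illustration under assumed feature scaling, random initialization, and a stopping tolerance; it is not the authors' implementation, and all names are illustrative. Thresholding the lip membership at 0.5 afterwards gives a binary lip mask.

```python
import numpy as np

def fcm_lip_segmentation(features, m=2.0, max_iter=100, tol=1e-4):
    """Two-class fuzzy C-means on 5-D pixel features (L*, a*, b*, x, y).

    features: (N, 5) array, one row per pixel.
    Returns (memberships, centers): memberships is (N, 2) with columns
    [lip, non-lip] summing to 1 per pixel, as required by (1).
    """
    n_pixels, c = features.shape[0], 2
    rng = np.random.default_rng(0)
    u = rng.random((n_pixels, c))
    u /= u.sum(axis=1, keepdims=True)            # enforce (1)

    for _ in range(max_iter):
        um = u ** m
        # Cluster centers, eqs. (2) and (3)
        centers = (um.T @ features) / um.sum(axis=0)[:, None]
        # Squared distances of every pixel to each center
        d2 = ((features[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        d2 = np.fmax(d2, 1e-12)                  # avoid division by zero
        # Membership update, eqs. (4) and (5)
        inv = d2 ** (-1.0 / (m - 1.0))
        u_new = inv / inv.sum(axis=1, keepdims=True)
        if np.abs(u_new - u).max() < tol:        # memberships stop changing
            u = u_new
            break
        u = u_new
    return u, centers
```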

    Fig. 3. Image after applying FCM clustering.

    Fig. 4. Measurement of MO from the dips in average intensity plot.

    Fig. 5. Synthetic eye template.

    B. Segmentation of the Eye Region

The eye region in a monochrome image has a sharp contrast to the rest of the face. Consequently, the thresholding method can be employed to segment the eye region from the image. Images grabbed at poor illumination conditions have a very low average intensity value. Segmentation of the eye region in these cases is difficult because of the presence of dark eyebrows in the neighborhood of the eye region. To overcome this problem, we consider images grabbed under good illuminating conditions. After segmentation of the image, we need to localize the left and right eyes on the image. In this paper, we use a template-matching scheme to localize the eyes. The eye template we used is similar to the template shown in Fig. 5. The template-matching scheme, taken from our previous works [2], [21], attempts to minimize the Euclidean distance between a fuzzy descriptor of the template and the fuzzy descriptor of the part of the image where the template is located. Even when the template is not a part of the image, the nearest matched location of the template in the image can be traced.
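As a rough illustration of the localization step, the sketch below slides a template over a grayscale image and keeps the position with the smallest Euclidean distance. It operates directly on pixel intensities rather than on the fuzzy descriptors of [2], [21], so it is a simplified stand-in for the scheme described above; the function name and the assumption of float-valued arrays are ours.

```python
import numpy as np

def locate_eye(image, template):
    """Slide a template over a grayscale image and return the top-left
    corner (row, col) minimizing the Euclidean distance between the
    template and the underlying image patch."""
    image = np.asarray(image, dtype=float)
    template = np.asarray(template, dtype=float)
    H, W = image.shape
    h, w = template.shape
    best, best_pos = np.inf, (0, 0)
    for r in range(H - h + 1):
        for c in range(W - w + 1):
            d = np.linalg.norm(image[r:r + h, c:c + w] - template)
            if d < best:
                best, best_pos = d, (r, c)
    return best_pos, best
```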

    C. Localization of EBC Region

In a facial image, eyebrows are the second darkest region after the hair region. The hair region is easily segmented by setting a very low threshold in the histogram-based thresholding algorithm. The eye regions are also segmented by thresholding. A search for a dark narrow template can easily localize the eyebrows. Note that the localization of the eyebrow is essential for determining its length. This will be undertaken in the next section.


    III. DETERMINATION OF FACIAL ATTRIBUTES

    In this section, we present a scheme for the measurements of facial extracts such as MO, EO, and the length of EBC.

    A. Determination of MO

After segmentation of the mouth region, we plot the average intensity profile against the MO. The dark region in the segmented image represents the lip profile, whereas the white regions embedded in the dark region indicate the teeth. Noisy images, however, may include false white patches. Fig. 4, for instance, includes a white patch on the lip region.

Determination of MO in a black and white image is easier because of the presence of the white teeth. A plot of the average intensity profile against the MO reveals that the curve has several minima, out of which the first and third correspond to the inner region of the top lip and the inner region of the bottom lip, respectively. The difference between the preceding two measurements along the Y-axis gives a measure of the MO. An experimental instance of MO is shown in Fig. 4, where the pixel count between the thick horizontal lines gives a measure of MO. When no white band is detected in the mouth region, MO is set to zero. When only two minima are observed in the plot of average intensity, the gap between the two minima is the measure of MO.
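The rule just described can be written down directly. The Python sketch below computes a row-wise average intensity profile of the segmented mouth region, finds its local minima, and returns the corresponding pixel gap as MO; the profile direction, the simple minimum test, and the names are assumptions made for illustration only.

```python
import numpy as np

def mouth_opening(mouth_region):
    """Estimate MO (in pixels) from the row-wise average intensity of the
    segmented mouth region: with three or more minima, use the first and
    third; with exactly two, use their gap; otherwise return 0."""
    profile = np.asarray(mouth_region, dtype=float).mean(axis=1)  # one value per row (y)
    # indices of simple local minima of the profile
    minima = [i for i in range(1, len(profile) - 1)
              if profile[i] < profile[i - 1] and profile[i] <= profile[i + 1]]
    if len(minima) >= 3:
        return minima[2] - minima[0]   # inner top lip to inner bottom lip
    if len(minima) == 2:
        return minima[1] - minima[0]
    return 0                           # no white band detected
```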

    B. Determination of EO

After the localization of the eyes, the count of dark pixels (intensity < 30) plus the count of white pixels (intensity > 225) is plotted against the x-position. If the peak of this plot occurs at x = a, then the ordinate at x = a provides a measure of the EO (Fig. 6).
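A minimal sketch of this count-profile rule, under the stated thresholds, is given below; the function name and the return convention are illustrative assumptions.

```python
import numpy as np

def eye_opening(eye_region, dark_thresh=30, white_thresh=225):
    """For each x-position (column) of the localized eye region, count the
    dark (< dark_thresh) and white (> white_thresh) pixels; the peak of this
    count profile is taken as the EO measure."""
    eye_region = np.asarray(eye_region)
    dark = eye_region < dark_thresh
    white = eye_region > white_thresh
    counts = (dark | white).sum(axis=0)   # one count per column (x-position)
    a = int(np.argmax(counts))            # x = a where the peak occurs
    return int(counts[a]), a
```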

    C. Determination of the Length of EBC

Constriction in the forehead region can be explained as a collection of white and dark patches called hilly and valley regions, respectively. The valley regions are usually darker than the hilly regions. Usually, the width of the patches is around 10-15 pixels for a given facial image of 512 × 512 pixels. Let I_av be the average intensity in a selected rectangular profile on the forehead, and let I_ij be the intensity of pixel (i, j). To determine the length of EBC on the forehead region, we scan for variation in intensity along the x-axis of the selected rectangular region. The maximum x-width that includes variation in intensity is defined as the length of EBC. The length of the EBC has been measured in Fig. 7 by using the preceding principle. An algorithm for EBC is presented as follows (a code sketch is given after the listed steps).

1) Take a narrow strip over the eyebrow region with thickness two-thirds of the width of the forehead, which is determined by the maximum count of pixels along the length of projections from the hairline edge to the top edges of the eyebrows.

2) The length l of the strip is determined by identifying its intersection with the hair regions at both ends. Determine the center of the strip, and select a window of x-length 2l/3 symmetric with respect to the center.

    Fig. 6. Determination of the EO.

Fig. 7. Determination of EBC in the selected rectangular patch, identified by image segmentation.

3) For x-positions from the window center to the window right end, do the following.
   a) Select nine vertical lines in the window and compute the average intensity on each line.
   b) Calculate the variance of the nine average intensity values.
   c) If the variance is below a threshold, stop. Else, shift one pixel right.


4) Determine the total right shift.

5) Following a procedure similar to step 3, determine the total left shift.

6) Compute length of EBC = total left shift + total right shift.
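The sketch below follows steps 2-6 on a grayscale eyebrow strip assumed to be already clipped at the hair on both ends (step 1). The variance threshold, the equal spacing of the nine lines, and the names are assumptions; the intent is only to show the left/right scanning and the summation of the shifts.

```python
import numpy as np

def ebc_length(strip, var_thresh=20.0):
    """Length of eyebrow constriction from a grayscale strip over the
    eyebrow region (rows x columns), per steps 2-6 of the text."""
    strip = np.asarray(strip, dtype=float)
    l = strip.shape[1]
    win = 2 * l // 3                       # window of x-length 2l/3 (step 2)
    centre = l // 2

    def column_variance(left_edge):
        cols = np.linspace(left_edge, left_edge + win - 1, 9).astype(int)
        line_means = strip[:, cols].mean(axis=0)   # nine average intensities
        return line_means.var()

    def scan(direction):
        shift = 0
        start = centre - win // 2
        while True:
            pos = start + direction * shift
            if pos < 0 or pos + win > l:           # window left the strip
                break
            if column_variance(pos) < var_thresh:  # intensity variation ends
                break
            shift += 1
        return shift

    right_shift = scan(+1)     # steps 3-4
    left_shift = scan(-1)      # step 5
    return left_shift + right_shift                # step 6
```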

IV. FUZZY RELATIONAL MODEL FOR EMOTION DETECTION

In this section, the encoding of facial attributes and their mapping to the emotion space are described using Mamdani-type implication relations.

A. Fuzzification (Encoding) of Facial Attributes

The measurements we obtain on MO, EO, and EBC are encoded into three distinct fuzzy sets: HIGH, LOW, and MODERATE. The typical membership functions [29] that we have used in our simulation are presented below. For any continuous feature x, we have

$HIGH(x) = 1 - \exp(-ax), \quad a > 0$

$LOW(x) = \exp(-bx), \quad b > 0$

$MODERATE(x) = \exp\left(-(x - x_{mean})^2 / 2\sigma^2\right)$

where $x_{mean}$ and $\sigma^2$ are the mean and variance of the parameter x, respectively.

For the best performance, we need to determine the optimal values of a, b, and σ. Details of these are discussed in Section VI.
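A direct Python rendering of these three encoders is given below. The default parameter values are the tuned ones reported later in Section VI (a = 2.2, b = 1.9, x_mean = 2.0, σ = 0.17); how the raw measurement is normalized before encoding is an assumption, and the function names are ours.

```python
import numpy as np

def mu_high(x, a=2.2):
    """HIGH(x) = 1 - exp(-a x), a > 0."""
    return 1.0 - np.exp(-a * np.asarray(x, dtype=float))

def mu_low(x, b=1.9):
    """LOW(x) = exp(-b x), b > 0."""
    return np.exp(-b * np.asarray(x, dtype=float))

def mu_moderate(x, x_mean=2.0, sigma=0.17):
    """MODERATE(x) = exp(-(x - x_mean)^2 / (2 sigma^2))."""
    x = np.asarray(x, dtype=float)
    return np.exp(-((x - x_mean) ** 2) / (2.0 * sigma ** 2))

# Example: fuzzify a (hypothetical) normalized mouth-opening value of 1.4
mo = 1.4
print(mu_low(mo), mu_moderate(mo), mu_high(mo))
```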

    B. Fuzzy Relational Model for Emotion Detection

Examination of a large facial database reveals that the degree of a specific human emotion, such as happiness or anger, greatly depends on the degree of MO, EO, and length of EBC. The following two sample rules describe the problem of mapping from the fuzzified measurement space of facial extracts to the fuzzified emotion space.

Rule 1: IF (eye-opening is MODERATE) & (mouth-opening is SMALL) & (eyebrow-constriction is LARGE)
        THEN emotion is VERY-MUCH-DISGUSTED.

Rule 2: IF (eye-opening is LARGE) & (mouth-opening is SMALL/MODERATE) & (eyebrow-constriction is SMALL)
        THEN emotion is VERY-MUCH-HAPPY.

Since each rule contains antecedent clauses of three fuzzy variables, their conjunctive effect is taken into account to determine the fuzzy relational matrix. The general formulation of a production rule with an antecedent clause of three linguistic variables and one consequent clause of a single linguistic variable is discussed below. Consider, for instance, the following fuzzy rule.

If x is A and y is B and z is C Then w is D.

    Fig. 8. Structure of a fuzzy relational matrix.

Let A(x), B(y), C(z), and D(w) be the membership distributions of linguistic variables x, y, z, and w belonging to A, B, C, and D, respectively. Then, the membership distribution of the clause "x is A and y is B and z is C" is given by t[A(x), B(y), C(z)], where t denotes the fuzzy t-norm operator [29]. Using the Mamdani-type implication operator, the relation between the antecedent and consequent clauses for the given rule is described by

$R(x, y, z; w) = \text{Min}\left[t\left(A(x), B(y), C(z)\right), D(w)\right]$.   (6)

Taking Min as the t-norm, we can rewrite the preceding expression as

$R(x, y, z; w) = \text{Min}\left[\text{Min}\left(A(x), B(y), C(z)\right), D(w)\right] = \text{Min}\left[A(x), B(y), C(z), D(w)\right]$.   (7)

Now, given an unknown distribution of (A′(x), B′(y), C′(z)), where A′ ≈ A, B′ ≈ B, and C′ ≈ C, we can evaluate D′(w) by the following fuzzy relational equation:

$D'(w) = \text{Min}\left(A'(x), B'(y), C'(z)\right) \circ R(x, y, z; w)$.   (8)

For discrete systems, the relation R(x, y, z; w) is represented by a matrix (Fig. 8), where x_i, y_i, z_i, and w_i denote specific arguments (corresponding to the variables x, y, z, and w, respectively).

In our proposed application, the row index of the relational matrix is represented by conjunctive sets of values of MO, EO, and EBC. The column index of the relational matrix denotes the possible values of six emotions: anxiety, disgust, fear, happiness, sadness, and surprise.

For determining the emotion of a person, we define two vectors: the fuzzy descriptor vector F and the emotion vector M. The structural forms of these two vectors are given as

F = [S(eo) M(eo) L(eo) S(mo) M(mo) L(mo) S(ebc) M(ebc) L(ebc)]   (9)

where S, M, and L stand for SMALL, MEDIUM, and LARGE, respectively. We also have

M = [VA(emotion) MA(emotion) N-SoA(emotion) VD(emotion) MD(emotion) N-SoD(emotion) VAf(emotion) MAf(emotion) N-SoAf(emotion) VH(emotion) MH(emotion) N-SoH(emotion) VS(emotion) MS(emotion) N-SoS(emotion) VSr(emotion) MSr(emotion) N-SoSr(emotion)]   (10)


where V, M, and N-So denote VERY, MODERATELY, and NOT-SO, and A, D, Af, H, S, and Sr denote ANXIOUS, DISGUSTED, AFRAID, HAPPY, SAD, and SURPRISED, respectively.

The relational equation used for the proposed system is given by

$M = F' \circ R_{FM}$   (11)

where $R_{FM}$ is the fuzzy relational matrix with the row and column indices as described above, and the ith component of the F′ vector is given by Min{A′(x_i), B′(y_i), C′(z_i)}, where the variables x_i, y_i, z_i ∈ {eo, mo, ebc}, and the fuzzy sets A′, B′, C′ ∈ {S, M, L} are determined by the premise of the ith fuzzy rule.

Given an F′ vector and the relational matrix $R_{FM}$, we can easily compute the encoded emotion vector M using the preceding relational equation. Finally, to determine the membership of the emotions from their fuzzy memberships, we need to employ a decoding scheme [see (12)], where w_1, w_2, and w_3 denote the weights of the respective graded memberships, which in the present context have (arbitrarily) been set to 0.33 each. For example, for happiness,

$\text{HAPPY}(emotion) = \dfrac{VH(emotion)\, w_1 + MH(emotion)\, w_2 + \text{N-SoH}(emotion)\, w_3}{w_1 + w_2 + w_3}$   (12)
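The two computational steps of (11) and (12), max-min composition followed by weighted decoding, are illustrated below. The sizes and numbers in the toy example are made up: the full model has one row per rule in F′ and the 18 graded columns of (10), whereas here 3 rules and 3 columns are used purely for illustration.

```python
import numpy as np

def max_min_compose(f_prime, r):
    """Max-min composition M = F' o R: M[j] = max_i min(F'[i], R[i, j])."""
    return np.max(np.minimum(f_prime[:, None], r), axis=0)

def decode(grades, weights=(0.33, 0.33, 0.33)):
    """Decode one emotion from its VERY / MODERATELY / NOT-SO grades with
    the weighted average of (12)."""
    w = np.asarray(weights, dtype=float)
    return float(np.dot(grades, w) / w.sum())

# Illustrative numbers only (not from the paper)
f_prime = np.array([0.7, 0.2, 0.1])        # min of antecedent memberships per rule
r_fm = np.array([[0.8, 0.6, 0.3],          # rule 1 -> VH, MH, N-SoH
                 [0.2, 0.5, 0.7],
                 [0.1, 0.3, 0.9]])
m = max_min_compose(f_prime, r_fm)
print(m, decode(m))
```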

    V. EXPERIMENTS AND RESULTS

The experiment is conducted in a laboratory environment, where illumination, sounds, and temperature are controlled to maintain uniformity in experimental conditions. Most of the subjects of the experiments are students, young faculty members, and family members of the faculties. The experiment includes two sessions: a presentation session followed by a face-monitoring session. In the presentation session, audiovisual clips from commercial films are projected on a screen in front of individual subjects as a stimulus to excite their brain for arousal of emotion. A computer-controlled pan-tilt-type high-resolution camera is used for the online monitoring of facial expressions of the subjects in the next phase. The grabbed images of facial expressions are stored in a computer for feature analysis in the subsequent phase.

Experiments were undertaken over a period of two years to identify the appropriate audiovisual movie clips that cause arousal of six different emotions: anxiety, disgust, fear, happiness, sadness, and surprise. A questionnaire was prepared to determine the consensus of the observers about the arousal of the first five emotions using a given set of audiovisual clips. It includes questions to a given observer on the percentage level of excitation of different emotions by a set of 60 audiovisual movie clips. The independent responses of 50 observers were collected, and the results are summarized in Table I. Clearly, the percentages in each row total 100.

TABLE I: ASSESSMENT OF THE AROUSAL POTENTIAL OF SELECTED AUDIOVISUAL MOVIE CLIPS IN EXCITING DIFFERENT EMOTIONS

The arousal of surprise, however, requires a subject to possess prior background information about an object or a scene, and arousal starts when the object/scene significantly differs from the general expectation. The main difficulty in getting someone surprised with a movie clip is that the clip should be long enough to prepare the background knowledge before a strange scene is presented. To eliminate possible errors in the selection of the stimulus due to background differences among the subjects, it is reasonable to employ alternative means, rather than audiovisual movie clips, to cause arousal of surprise. Our experiments showed that an attempt at recognizing lost friends (usually schoolmates) from their current photographs causes arousal of surprise.

To identify the right movie clips capable of exciting specific emotions, we need to define a few parameters that would help indicate a consensus of the observers about the arousal of an emotion. We have the following.

O^j_{i,k}   Percentage level of excitation of emotion j by an observer k using audiovisual clip i.
E^j_i       Average percentage score of excitation assigned to emotion j by n observers using clip i.
σ^j_i       Standard deviation of the percentage score assigned to emotion j by all the subjects using clip i.
n           Total number of observers.

E^j_i and σ^j_i are evaluated using the following expressions:

$E^j_i = \sum_{k=1}^{n} O^j_{i,k} \Big/ n$   (13)

$\sigma^j_i = \sqrt{\sum_{k=1}^{n} \left(O^j_{i,k} - E^j_i\right)^2 \Big/ n}$   (14)

The emotion w for which $E^w_i = \max_j \{E^j_i\}$ is the most likely aroused emotion due to excitation by audiovisual clip i. The $E^w_i$ values are obtained for i = 1 to 60.
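A small Python sketch of (13) and (14), together with the argmax rule and the average-to-standard-deviation ratio used below to rank competing clips, is given here. The observer scores are hypothetical numbers invented for illustration (rows sum to 100, as in the questionnaire).

```python
import numpy as np

def clip_statistics(scores):
    """scores: (n_observers, n_emotions) array of percentage excitation
    levels O^j_{i,k} reported for one clip i.  Returns the per-emotion mean
    E^j_i (13), standard deviation sigma^j_i (14), and the index of the most
    likely aroused emotion."""
    e = scores.mean(axis=0)                              # eq. (13)
    sigma = np.sqrt(((scores - e) ** 2).mean(axis=0))    # eq. (14)
    return e, sigma, int(np.argmax(e))

# Hypothetical responses of 4 observers for one clip over
# [anxiety, disgust, fear, happiness, sadness]
scores = np.array([[10, 60, 10, 10, 10],
                   [ 5, 70, 10,  5, 10],
                   [15, 55, 10, 10, 10],
                   [10, 65,  5, 10, 10]], dtype=float)
e, sigma, best = clip_statistics(scores)
ratio = e[best] / sigma[best] if sigma[best] > 0 else np.inf  # ranking criterion
print(e, sigma, best, ratio)
```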

We next select six audiovisual clips from the pool of 60 movie samples such that the selected movies best excite six specific emotions. The selection was made by using the average-to-standard-deviation ratio for competitive audiovisual clips employed to excite the same emotion. The audiovisual clip for which the average-to-standard-deviation ratio is the largest is considered to be the most significant sample to excite a desired emotion.

Fig. 9. Movie clips containing four frames (row-wise) used to excite anxiety, disgust, fear, happiness, and sadness.

Fig. 9 presents the five most significant audiovisual movie clips selected from the pool of 60 movies, where each clip was the most successful in exciting one of the five emotions. Table II summarizes the theme of the selected movies. The selected clips are presented before 300 people, and their facial expressions are recorded. Figs. 10 and 11 show two representative examples of one female and one male in the age group 22-25 years.

TABLE II: THEME OF THE MOVIE CLIPS CAUSING STIMULATION OF SPECIFIC EMOTIONS

Fig. 10. Arousal of anxiety, disgust, fear, happiness, and sadness using the stimulator in Fig. 9 (female subject).

Image segmentation is used to segment the mouth region, eye region, and eyebrow region of individual frames for each clip. MO, EO, and length of EBC are then determined for the individual frames of each clip. The averages of EO, MO, and EBC over all the frames under a recorded emotion clip are then evaluated. The memberships of EO, MO, and EBC in the three fuzzy sets (Low, Medium, and High) are then evaluated using the membership functions given in Section IV. The results of membership evaluation for the emotion clips given in Fig. 10 are presented in Table III.

After evaluation of the memberships, we determine the emotion vector M by using (11), and then employ the decoding rule [see (12)] to determine the membership of different emotions for the five clips. The emotion that comes up with the highest value is regarded as the emotion of the individual clips. The preceding analysis is repeated for 300 people including 100 children, 100 adult males, and 100 adult females, and the results of emotion classification are presented in Tables IV-VI. Each of these tables shows six aroused emotions, whereas the desired emotion refers to the emotion tag of the most significant audiovisual sample.

Fig. 11. Arousal of anxiety, disgust, fear, happiness, and sadness using the stimulator in Fig. 9 (male subject).

TABLE III: SUMMARY OF RESULTS OF MEMBERSHIP EVALUATION FOR THE EMOTIONS IN FIG. 10

TABLE IV: RESULTS OF PERCENTAGE CLASSIFICATION OF AROUSED EMOTIONS IN SIX CLASSES FOR ADULT MALES

TABLE V: RESULTS OF PERCENTAGE CLASSIFICATION OF AROUSED EMOTIONS IN SIX CLASSES FOR ADULT FEMALES

TABLE VI: RESULTS OF PERCENTAGE CLASSIFICATION OF AROUSED EMOTIONS IN SIX CLASSES FOR CHILDREN IN AGE GROUP 8-12 YEARS

The experimental results obtained from Tables IV-VI reveal that the accuracies in the classification of emotion for adult males, adult females, and children (8-12 years) are 88.2%, 92.2%, and 96%, respectively. The classification accuracies obtained in this paper are better than the existing results on accuracy reported elsewhere [19], [20], [26], [43].

    VI. VALIDATION OF THE SYSTEM PERFORMANCE

After a prototype design of an intelligent system is complete, we need to validate its performance. The term validation here refers to building the right system that truly resembles the system intended to be built. In other words, validation refers to the relative performance of the system that has been designed, and suggests reformulation of the problem characteristics and concepts based on the deviation of its performance from that of the desired (ideal) system [18]. It has experimentally been observed that the performance of the proposed system greatly depends on the parameters of the fuzzy encoders [28]. To determine optimal settings of the parameters, a scheme for the validation of the system's performance is proposed in Fig. 12.

In Fig. 12, we tune the parameters a, b, x_mean (or m), and σ of the fuzzy encoders by a supervised learning algorithm, so as to generate the desired emotion from given measurements of the facial extract. The backpropagation algorithm has been employed to experimentally determine the parameters a, b, m, and σ. The feedforward neural network used for the realization of the backpropagation algorithm has three layers with 26 neurons in the hidden layer. The numbers of neurons in the input and output layers are determined by the dimensions of the F′ vector and the M vector, respectively. The root mean square error accuracy of the algorithm was set to 0.001. For a sample space of approximately 100 known emotions of different persons, the experiment is conducted, and the following values of the parameters are found to yield the best results: a = 2.2, b = 1.9, x_mean = 2.0, and σ = 0.17.

Fig. 12. Validation of the proposed system by tuning the parameters a, b, m, and σ of the encoders.

    VII. EMOTION TRANSITION AND ITS CONTROL

The emotion of a person at a given time is determined by the current state of his/her mind. The current state of one's mind is mainly controlled by the positive and negative influences of input sensory data, including voice, video clips, ultrasonic signals, and music. We propose a model of emotion transition dynamics where the strength of an emotion at time t + 1 depends on the strength of all emotions of a subject at time t, and the positive/negative influences that have been applied as stimulation at time t.

    A. Model

We have the following.

m_i(t)                  Positive unnormalized singleton membership of the ith emotion at time t.
[w_ij]                  Weight matrix of dimension n × n, where w_ij denotes a cognitive (memory-based) degree of transition from the ith to the jth emotional state, and is a signed finite real number.
POS-IN(strength_k, t)   Fuzzy membership distribution of an input stimulus with strength k to act as a positive influence at time t.
NEG-IN(strength_l, t)   Fuzzy membership distribution of an input stimulus with strength l to act as a negative influence at time t.
b_ik                    Weight representing the influence of an input with strength k on the ith emotional state.
c_il                    Weight representing the influence of an input with strength l on the ith emotional state.

The unnormalized membership value of an emotional state i at time t + 1 can be expressed as a function of the unnormalized membership values of all possible emotional states j, and the membership distribution of the input positive and negative influences at time t, i.e.,

$m_i(t+1) = \sum_j w_{i,j}\, m_j(t) + \sum_k b_{i,k}\, \text{POS-IN}(strength_k, t) - \sum_l c_{i,l}\, \text{NEG-IN}(strength_l, t)$.   (15)

The first term on the right-hand side of (15) accounts for the cognitive transition of emotional states, which concerns human memory, whereas the second and third terms indicate the effect of external influence on the membership of the ith emotional state. The weights w_ij of the cognitive memory are considered time invariant. Since the w_ij's are time invariant, controlling the transition of emotional states can be accomplished by POS-IN(strength_k, t) and NEG-IN(strength_l, t).

The positive terms on the right side of (15) indicate that with a growth in m_j and POS-IN(strength_k, t), m_i(t+1) also increases. The negative sign of the third term signifies that with a growth in NEG-IN(strength_l, t), m_i(t+1) decreases.

A person's emotional state changes from happy to anxious on a negative input, e.g., when he/she runs the risk of losing something. An anxious person becomes sad when he suffers a loss. A sad person becomes disgusted when he realizes that he is not responsible for his loss/failure. In other words, with increasing negative influence (neg), the human emotion undergoes a transition in the following order:

disgusted ←(neg) sad ←(neg) anxious ←(neg) happy.

Alternatively, with increasing positive influence (pos), the human emotion has a gradual transition from the disgusted state to the happy state, as follows:

disgusted →(pos) sad →(pos) anxious →(pos) happy.

Combining the preceding two state transition schemes, we can represent emotion transitions by a graph (see Fig. 13).

We have the following.

M = [m_i]                       Unnormalized membership vector of dimension n × 1, whose ith element denotes the unnormalized singleton membership of emotion i at time t.
Φ = [POS-IN(strength_k, t)]     Positive influence membership vector of dimension m × 1, whose kth component denotes the fuzzy membership of strength k of the input stimulus.
Φ′ = [NEG-IN(strength_l, t)]    Negative influence membership vector of dimension m × 1, whose lth component denotes the fuzzy membership of strength l of the input stimulus.
B = [b_ij]                      n × m companion matrix to the Φ vector.
C = [c_ij]                      n × m companion matrix to the Φ′ vector.

Fig. 13. Proposed emotion transition graph.

Considering the emotion transition dynamics [see (15)] for i = 1 to n, we can represent the complete system of emotion transition in vector-matrix form as

$M(t+1) = W \cdot M(t) + B\,\Phi - C\,\Phi'$.   (16a)

The weight matrix W in (16a) is the n × n matrix [w_ij] of cognitive transition weights defined above.

To keep the membership vector M normalized, we use the following scaling operation:

$M^S(t+1) = M(t+1) \Big/ \max_{i=1}^{n} \{m_i(t+1)\}$.   (16b)

Normalization of memberships in [0, 1] is needed for convenience of interpretation, but is not directly related to the emotion control problem undertaken here.

A control-theoretic representation of the emotion transition dynamics [(16a) and (16b)] is given in Fig. 14, where the system has an inherent delayed positive feedback along with provision for control with external input vectors Φ and Φ′.
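The following sketch simulates a few steps of (16a) and (16b) for a toy four-state system; the weight matrices, the initial memberships, and the influence vectors (written Φ and Φ′ above, here `pos_in` and `neg_in`) are all invented for illustration and do not come from the paper.

```python
import numpy as np

def step(m, W, B, C, pos_in, neg_in):
    """One step of the transition dynamics (16a) followed by the scaling of (16b)."""
    m_next = W @ m + B @ pos_in - C @ neg_in
    return m_next / np.max(m_next)      # keep the largest membership at 1

# Toy 4-state example over (disgusted, sad, anxious, happy)
W = 0.6 * np.eye(4) + 0.1 * np.ones((4, 4))        # mild cognitive persistence
B = np.array([[0.00, 0.00, 0.00],
              [0.02, 0.02, 0.02],
              [0.05, 0.05, 0.05],
              [0.10, 0.10, 0.10]])                  # positive inputs lift 'happy' most
C = B.copy()                                       # negative inputs suppress 'happy' most
m = np.array([0.9, 0.4, 0.2, 0.1])                 # currently mostly disgusted
pos_in, neg_in = np.array([0.0, 0.2, 0.8]), np.zeros(3)

for _ in range(5):
    m = step(m, W, B, C, pos_in, neg_in)
print(m)                                            # memberships drift toward 'happy'
```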

    B. Properties of the Model

In an autonomous system, the system states change without application of any control inputs. The emotion transition dynamics can be compared with an autonomous system with a setting of Φ = Φ′ = 0. The limit cyclic behavior of an autonomous emotion transition dynamics is given in the following theorem.

Theorem 1: The vector M(t+1) in an autonomous emotion transition dynamics with Φ = Φ′ = 0 exhibits limit cyclic behavior after every k iterations if W^k = I.

Proof: Since Φ = Φ′ = 0, we can rewrite (16a) as

$M(t+1) = W \cdot M(t)$.

Iterating for t = 0 to (k − 1), we have

$M(1) = W \cdot M(0)$
$M(2) = W \cdot M(1) = W^2 M(0)$
...
$M(k) = W^k M(0)$.

Since the membership vector exhibits limit cyclic behavior after every k iterations, we have

$M(k) = M(0)$

which in turn requires $W^k = I$.

Theorem 1 indicates that without external perturbation, the cognitive memory helps in maintaining a recurrent relation in emotional states under a restrictive selection of memory weights, satisfying $W^k = I$.

For steering the system states in Fig. 13 toward the happy state, we need to provide positive influences in the state diagram at any state. Similarly, for controlling the state transitions toward the disgusted state from any state s, we submit a negative influence at state s. This, however, demands a prerequisite of controllability of the membership state vector M. The controllability of a given state vector M to a desired state is examined by Theorem 2.

Theorem 2: The necessary and sufficient condition for the state transition system to be controllable is that the controllability matrices

$P = \left[B \;\; WB \;\; W^2B \;\; \cdots \;\; W^{n-1}B\right]$
$Q = \left[C \;\; WC \;\; W^2C \;\; \cdots \;\; W^{n-1}C\right]$

should have rank equal to n.

Proof: The proof follows from the test criterion of controllability of linear systems [27].
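The rank test of Theorem 2 is the standard Kalman controllability check, which is easy to verify numerically; the sketch below builds the controllability matrix for W and B (the same call applies to W and C) with illustrative random matrices.

```python
import numpy as np

def controllability_matrix(W, B):
    """Stack [B, WB, W^2 B, ..., W^(n-1) B] column-wise."""
    n = W.shape[0]
    blocks, Wk = [], np.eye(n)
    for _ in range(n):
        blocks.append(Wk @ B)
        Wk = W @ Wk
    return np.hstack(blocks)

def is_controllable(W, B):
    """True if the controllability matrix has full rank n (Theorem 2)."""
    return np.linalg.matrix_rank(controllability_matrix(W, B)) == W.shape[0]

# Illustrative check with random W and B (n = 4 states, m = 3 inputs)
rng = np.random.default_rng(1)
W = rng.random((4, 4))
B = rng.random((4, 3))
print(is_controllable(W, B))
```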

C. Emotion Control by Mamdani's Model

In a closed-loop process control system [21], error is defined as the difference of the set point (reference input) and the process response, and the task of a controller is to gradually reduce the error toward zero. When the emotional state transition is regarded as a process, we define error as the qualitative difference between the desired and the current emotional states. To quantitatively represent error, we attach an index to the individual emotional states in Fig. 13, such that when the error is positive (negative), we can guide the error toward zero by applying a positive (negative) influence. One possible indexing scheme that satisfies the above principle is given in Table VII.


    Fig. 14. Emotion transition dynamics.

TABLE VII: INDEXING OF EMOTIONAL STATES

We can now quantify error as the difference between the desired emotional index (DEI) and the current emotional index (CEI). For example, if the desired emotion is happiness, and the current emotion is sadness, then the error is e = DEI − CEI = 4 − 2 = 2. Similarly, when the desired emotion is disgust, and the current emotion is happiness, error e = DEI − CEI = 1 − 4 = −3. Naturally, for the generation of control signals, we need to consider both sign and magnitude of the error. To eliminate the effect of noise in controlling emotion, instead of directly using the signed errors, we fuzzify the magnitude of errors into four fuzzy sets (i.e., SMALL, MODERATE, LARGE, and VERY LARGE) and the sign of errors into two fuzzy sets (i.e., POSITIVE and NEGATIVE) using nonlinear (Gaussian-type) membership functions. The nonlinearity of the membership functions eliminates the small Gaussian noise (with zero mean and small variance) over the actual measurement of error. Further, to generate fuzzy control signals Φ and Φ′, we need to represent the strength of positive and negative influences in four fuzzy sets (i.e., SMALL, MODERATE, LARGE, and VERY LARGE) as well. Tables VIII and IX provide a list of membership functions used for fuzzy control.

In this paper, parameter selection of the membership functions has been achieved by trial and error. To attain the best performance of the controller, we considered 50 single-step control instances and found that the settings in Table X gave the best performance for all the 50 instances.

Let AMPLITUDE and SIGN be two fuzzy universes of error, and let STRENGTH be a fuzzy universe of positive/negative influences. Here, SMALL, MODERATE, LARGE, and VERY LARGE are fuzzy subsets of AMPLITUDE and STRENGTH, whereas POSITIVE and NEGATIVE are fuzzy subsets of the universe SIGN. Let x, y, and z be fuzzy linguistic variables, with x and y denoting error and z denoting positive/negative influence. Let A_i and C_i be any singleton fuzzy subset of {SMALL, MODERATE, LARGE, VERY-LARGE}, and let B_i be a singleton subset of {POSITIVE, NEGATIVE}. The general form of the ith fuzzy control rule is as follows.

Rule R_i: If x is A_i and y is B_i Then z is C_i.

Typical examples of the ith rule are given below.

Rule 1: If error is SMALL and error is POSITIVE
        Then apply positive influence of SMALL strength.

Rule 2: If error is MODERATE and error is NEGATIVE
        Then apply negative influence of MODERATE strength.

The Mamdani-type implication relation for Rule i is now given by

$R_i(x, y; z) = \text{Min}\left[\text{Min}\left(A_i(x), B_i(y)\right), C_i(z)\right]$.   (17)

Now, for a given distribution of A′(x) and B′(y), where A′ ≈ A_i and B′ ≈ B_i, we can evaluate

$C'_i(z) = \text{Min}\left(A'(x), B'(y)\right) \circ R_i(x, y; z)$.   (18)

To take the aggregation of all the n rules, we determine

$C'(z) = \max_{i=1}^{n} C'_i(z)$.   (19)

Example 1: In this example, we illustrate the construction of $R_i(x, y; z)$ and the evaluation of $C'_i(z)$. Given

POSITIVE(error) = {1/0.2, 2/0.5, 3/0.6}
SMALL(error) = {0/0.9, 1/0.1, 2/0.01, 3/0.005}
SMALL(pos-in) = {2/0.9, 10/0.6, 20/0.21, 30/0.01}.


TABLE VIII: MEMBERSHIP FUNCTIONS OF MAGNITUDE OF ERROR AND UNSIGNED STRENGTH OF POSITIVE/NEGATIVE INFLUENCE

TABLE IX: MEMBERSHIP FUNCTIONS OF SIGN OF ERROR

TABLE X: SELECTED PARAMETERS OF THE MEMBERSHIP FUNCTIONS

The relational matrix R(error, error; pos-in) can now be evaluated by Mamdani's implication function as follows:

Min[POSITIVE(error), SMALL(error)]
= {(1, 0)/0.2, (1, 1)/0.1, (1, 2)/0.01, (1, 3)/0.005;
   (2, 0)/0.5, (2, 1)/0.1, (2, 2)/0.01, (2, 3)/0.005;
   (3, 0)/0.6, (3, 1)/0.1, (3, 2)/0.01, (3, 3)/0.005}.

Using (17), the R(error, error; pos-in) matrix is now obtained as

                     pos-influence
  (e, e)      2        10       20       30
  (1, 0)    0.2      0.2      0.2      0.01
  (1, 1)    0.1      0.1      0.1      0.01
  (1, 2)    0.01     0.01     0.01     0.01
  (1, 3)    0.005    0.005    0.005    0.005
  (2, 0)    0.5      0.5      0.2      0.01
  (2, 1)    0.1      0.1      0.1      0.01
  (2, 2)    0.01     0.01     0.01     0.01
  (2, 3)    0.005    0.005    0.005    0.005
  (3, 0)    0.6      0.6      0.2      0.01
  (3, 1)    0.1      0.1      0.1      0.01
  (3, 2)    0.01     0.01     0.01     0.01
  (3, 3)    0.005    0.005    0.005    0.005

Let us now consider the observed membership distribution of error to be positive and small as follows:

POSITIVE′(error) = {1/0.1, 2/0.5, 3/0.7}
SMALL′(error) = {0/0.2, 1/0.1, 2/0.4, 3/0.5}.

We can evaluate

SMALL′(pos-in) = Min[POSITIVE′(error), SMALL′(error)] ∘ R(error, error; pos-in)
= [0.1 0.1 0.1 0.1 0.2 0.1 0.4 0.5 0.2 0.1 0.4 0.5] ∘ R(error, error; pos-in)
= {2/0.2, 10/0.2, 20/0.2, 30/0.01}.
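The max-min machinery of Example 1 can be checked mechanically. The short script below rebuilds the relation from (17), composes it with the observed (primed) distributions as in (18), and reproduces the result {2/0.2, 10/0.2, 20/0.2, 30/0.01}; only the variable names are ours.

```python
import numpy as np

# Fuzzy sets of Example 1, as {support value: membership}
positive_err = {1: 0.2, 2: 0.5, 3: 0.6}
small_err    = {0: 0.9, 1: 0.1, 2: 0.01, 3: 0.005}
small_posin  = {2: 0.9, 10: 0.6, 20: 0.21, 30: 0.01}

rows = [(e1, e2) for e1 in positive_err for e2 in small_err]
cols = list(small_posin)

# Mamdani relation (17): R[(e1,e2), z] = min(POSITIVE(e1), SMALL(e2), SMALL(z))
R = np.array([[min(positive_err[e1], small_err[e2], small_posin[z]) for z in cols]
              for (e1, e2) in rows])

# Observed (primed) distributions
positive_p = {1: 0.1, 2: 0.5, 3: 0.7}
small_p    = {0: 0.2, 1: 0.1, 2: 0.4, 3: 0.5}
f = np.array([min(positive_p[e1], small_p[e2]) for (e1, e2) in rows])

# Max-min composition (18): SMALL'(pos-in)[z] = max over rows of min(f, R[:, z])
result = np.max(np.minimum(f[:, None], R), axis=0)
print(dict(zip(cols, result)))   # -> {2: 0.2, 10: 0.2, 20: 0.2, 30: 0.01}
```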

To determine the strength of audiovisual movies to be selected for presentation to the subject, we need to defuzzify (decode) the control signal C_i(pos-in) or C_i(neg-in). Let

C(pos-in) = {x_1/C(x_1), x_2/C(x_2), ..., x_n/C(x_n)}.

Then, centroidal-type defuzzification (decoding) [21] yields the value of the control signal

$x_{defuzz} = \sum_{i=1}^{n} C(x_i)\, x_i \Big/ \sum_{i=1}^{n} C(x_i)$.

Example 2: Let

C(pos-in) = {1/0.2, 3/0.2, 6/0.2, 9/0.1}.

The defuzzification of the control signal by the center-of-gravity method is obtained as

$x_{defuzz} = \dfrac{1 \times 0.2 + 3 \times 0.2 + 6 \times 0.2 + 9 \times 0.1}{0.2 + 0.2 + 0.2 + 0.1} = 4.14$.
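A two-line center-of-gravity decoder reproduces Example 2; the function name is ours.

```python
def centroid_defuzzify(fuzzy_set):
    """Center-of-gravity decoding: sum(mu * x) / sum(mu)."""
    num = sum(mu * x for x, mu in fuzzy_set.items())
    den = sum(fuzzy_set.values())
    return num / den

# Example 2: C(pos-in) = {1/0.2, 3/0.2, 6/0.2, 9/0.1}
print(round(centroid_defuzzify({1: 0.2, 3: 0.2, 6: 0.2, 9: 0.1}), 2))  # 4.14
```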


    Fig. 15. Complete scheme of Mamdani-type emotion control.

    D. Architecture of the Proposed Emotion Control Scheme

A complete scheme of the Mamdani-type emotion control is presented in Fig. 15. The scheme includes an emotion transition dynamics with provision for control inputs Φ and Φ′, and a fuzzy controller to generate the necessary control signals. The fuzzy controller compares DEI and CEI, and their difference is passed through the AMPLITUDE and SIGN fuzzy encoders. The fuzzy relational approach to automatic reasoning introduced earlier in this paper is then employed to determine the membership distributions of C_i(pos-in) and C′_i(neg-in) using the ith fired rule. The MAX units determine the maximum of C_i(pos-in) and C′_i(neg-in) over all the fired rules, resulting in Φ and Φ′, which are supplied as control inputs to the emotion transition dynamics. The generation of the control signals Φ and Φ′ is continued until the next emotional index (NEI) is equal to the DEI. When NEI is equal to DEI, the control switches S and S′, which were closed since startup, are opened, as further transition of emotion is no longer required.

The decoding units are required to defuzzify the control signals Φ and Φ′. The decoding process yields the absolute strength of audiovisual movies in the range [−a, +b], where integers a and b are determined by system resources, as discussed in Section V. The decoding process thus selects the audiovisual movie of appropriate strength for presentation to the subjects. Note that there are two fuzzy decoders in Fig. 15. When the error is positive, only fuzzy decoder 1 is used. On the other hand, when the error is negative, fuzzy decoder 2 is used.

    E. Experiments and Results

The proposed architecture (Fig. 15) was studied on 300 individuals at Jadavpur University. The experiment began with 100 audiovisual movies labeled with a positive/negative integer in [−a, +b], which represents the strength of the positive/negative external stimulus. The labeling was done by a group of 50 volunteers who assigned a score in [−a, +b]. The average of the scores assigned to an audiovisual movie stimulus is rounded off to the nearest integer and used as its label. When the variance of the assigned scores for a movie is above a selected small threshold (1.8), we drop it from the list. In this manner, we selected only 60 movies, dropping 40 movie samples. This signifies that the scores obtained from the 50 volunteers are close enough for the selected 60 audiovisual movies, which indicates good accuracy of the stimulus.

Fig. 16. Submission of audiovisual stimulus of strength 28, 16, and 8 for controlling the emotion of a subject from the disgusted state (leftmost) to a final happy state through sad and anxious states (in that order).

The current emotion of a subject is detected by the emotion recognition scheme presented in the previous sections. The desired emotion is randomly selected for a given subject. The control scheme outlined in Section IV is then invoked. When fuzzy decoder 1 or 2 (Fig. 15) generates a signed score, the nearest available average-scored audiovisual movie is selected for presentation to the subject. Note that when error e is equal to m (m ≤ 3), we need to select a sequence of at least m audiovisual movies until NEI becomes equal to DEI. An experimental instance of emotion control is shown in Fig. 16, where the current emotion of the subject at time t = 0 is disgust, and the desired emotion is happiness. This requires three state transitions in emotions, which can be undertaken by presenting three audiovisual movies of strength +28 units, +16 units, and +08 units, in succession, to the subject. Here, DEI = 3 and CEI = 0; therefore, error e = DEI − CEI = 3 > 0. Since the error is positive, we apply positive instances of suitable strength as decided by the fuzzy controller. If the error were negative, then the fuzzy controller would have selected audiovisual stimuli of negative strength.

Two interesting points of the experiment include 1) good experimental accuracy and 2) repeatability. Experimental accuracy ensures that we could always control the error to zero. Repeatability ensures that for the same subject and the same pair of current and desired emotional states, the selected set of audiovisual movies is unique. Robustness of the control algorithm is thus established. In Fig. 16, the widths of the control pulses are 3, 2, and 2 s. At time t = 0, the error is large and positive. Therefore, the control signal generated has a positive strength of long duration. Then, with a gradual decrease in error, the strength of the control signal and its duration decrease.

VIII. CONCLUSION

The merits of the proposed scheme for emotion detection lie in the segmentation of the mouth region by FCM clustering and the determination of the MO from the minima in the average-pixel-intensity plot of the mouth region. The proposed EO determination also adds value to the emotion detection system. The fuzzy relational approach to emotion detection from facial feature space to emotion space has a high classification accuracy of around 90%, which is better than other reported results. Because of its good classification accuracy, the proposed emotion detection scheme is expected to have applications in next-generation human-machine interactive systems.

The ranking of audiovisual stimulus considered in this paper also provides a new approach to determining the best movies to excite specific emotions.

The existing emotion detection methods hardly consider the effect of near-past stimulation on the current arousal of emotion. To overcome this problem, we submitted an audiovisual stimulus of relaxation before submission of any other movie clip to excite emotions. A state transition in emotion by an audiovisual movie thus always occurs at a relaxed state of mind, giving full effect of the current stimulus on the excitatory subsystem of the brain and causing arousal of the desired emotion with its full manifestation on the facial expression. Feature extraction from the face becomes easy when the manifestation of facial expression truly resembles the aroused emotion.

An important aspect of this paper is the design of an emotion control scheme. The accuracy of the control scheme ensures convergence of the control algorithm with a zero error, and repeatability ensures the right selection of audiovisual stimulus.

The proposed scheme of emotion recognition and control can be applied in system design for two different problem domains. First, it can serve as an intelligent layer in the next-generation human-machine interactive system. Such a system would have extensive applications in the frontier technology of pervasive and ubiquitous computing [42]. Second, the emotion monitoring and control scheme would be useful for psychological counseling and therapeutic applications. The pioneering works on the structure of emotion by Gordon [14] and the emotional control of cognition by Simon [36] would find a new direction with the proposed automation for emotion recognition and control.

    ACKNOWLEDGMENT

The authors would like to thank the anonymous reviewers for their comments.

    REFERENCES[1] J. C. Bezdek, Fuzzy mathematics in pattern classication, Ph.D. disser-

    tation, Appl. Math. Center, Cornell Univ., Ithaca, NY, 1973.[2] B. Biswas, A. K. Mukherjee, and A. Konar, Matching of digital images

    using fuzzy logic, AMSE Publication , vol. 35, no. 2, pp. 711, 1995.[3] M. T. Black and Y. Yacoob, Recognizing facial expressions in image

    sequences using local parameterized models of image motion, Int. J.Comput. Vis. , vol. 25, no. 1, pp. 2348, Oct. 1997.

    [4] C. Busso and S. Narayanan, Interaction between speech and facial ges-tures in emotional utterances: A single subject study, IEEE Trans. Audio,Speech Language Process. , vol. 15, no. 8, pp. 23312347, Nov. 2007.

  • 8/8/2019 k.facial Recog

    17/18

    742 IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICSPART A: SYSTEMS AND HUMANS, VOL. 39, NO. 4, JULY 2009

[5] I. Cohen, "Facial expression recognition from video sequences," M.S. thesis, Dept. Elect. Eng., Univ. Illinois Urbana-Champaign, Urbana, IL, 2000.
[6] I. Cohen, N. Sebe, A. Garg, L. S. Chen, and T. S. Huang, "Facial expression recognition from video sequences: Temporal and static modeling," Comput. Vis. Image Underst., vol. 91, no. 1/2, pp. 160–187, Jul. 2003.
[7] C. Conati, "Probabilistic assessment of user's emotions in educational games," J. Appl. Artif. Intell., Special Issue Merging Cognition Affect HCT, vol. 16, no. 7/8, pp. 555–575, Aug. 2002.
[8] G. Donato, M. S. Bartlett, J. C. Hager, P. Ekman, and T. J. Sejnowski, "Classifying facial actions," IEEE Trans. Pattern Anal. Mach. Intell., vol. 21, no. 10, pp. 974–989, Oct. 1999.
[9] P. Ekman and W. V. Friesen, Unmasking the Face: A Guide to Recognizing Emotions From Facial Clues. Englewood Cliffs, NJ: Prentice-Hall, 1975.
[10] I. A. Essa and A. P. Pentland, "Coding, analysis, interpretation and recognition of facial expressions," IEEE Trans. Pattern Anal. Mach. Intell., vol. 19, no. 7, pp. 757–763, Jul. 1997.
[11] W. A. Fellenz, J. G. Taylor, R. Cowie, E. Douglas-Cowie, F. Piat, S. Kollias, C. Orovas, and B. Apolloni, "On emotion recognition of faces and of speech using neural networks, fuzzy logic and the ASSESS systems," in Proc. IEEE-INNS-ENNS Int. Joint Conf. Neural Netw., 2000, pp. 93–98.
[12] J. M. Fernandez-Dols, H. Wallbott, and F. Sanchez, "Emotion category accessibility and the decoding of emotion from facial expression and context," J. Nonverbal Behav., vol. 15, no. 2, pp. 107–123, Jun. 1991.
[13] Y. Gao, M. K. H. Leung, S. C. Hui, and M. W. Tananda, "Facial expression recognition from line-based caricatures," IEEE Trans. Syst., Man, Cybern. A, Syst., Humans, vol. 33, no. 3, pp. 407–412, May 2003.
[14] R. N. Gordon, The Structure of Emotions: Investigations in Cognitive Philosophy, ser. Cambridge Studies in Philosophy. Cambridge, U.K.: Cambridge Univ. Press, 1990.
[15] K. Izumitani, T. Mikami, and K. Inoue, "A model of expression grade for face graphs using fuzzy integral," Syst. Control, vol. 28, no. 10, pp. 590–596, 1984.
[16] F. Kawakami, S. Morishima, H. Yamada, and H. Harashima, "Construction of 3-D emotion space using neural network," in Proc. 3rd Int. Conf. Fuzzy Logic, Neural Nets Soft Comput., Iizuka, Japan, 1994, pp. 309–310.
[17] G. J. Klir and B. Yuan, Fuzzy Sets and Fuzzy Logic: Theory and Applications. Englewood Cliffs, NJ: Prentice-Hall, 1995.
[18] H. Kobayashi and F. Hara, "The recognition of basic facial expressions by neural network," Trans. Soc. Instrum. Contr. Eng., vol. 29, no. 1, pp. 112–118, 1993.
[19] H. Kobayashi and F. Hara, "Measurement of the strength of six basic facial expressions by neural network," Trans. Jpn. Soc. Mech. Eng. (C), vol. 59, no. 567, pp. 177–183, 1993.
[20] H. Kobayashi and F. Hara, "Recognition of mixed facial expressions by neural network," Trans. Jpn. Soc. Mech. Eng. (C), vol. 59, no. 567, pp. 184–189, 1993.
[21] A. Konar, Computational Intelligence: Principles, Techniques and Applications. Heidelberg, Germany: Springer-Verlag, 2005.
[22] A. F. Kramer, E. J. Sirevaag, and R. Braune, "A psycho-physiological assessment of operator workload during simulated flight missions," Hum. Factors, vol. 29, no. 2, pp. 145–160, Apr. 1987.
[23] A. Lanitis, C. J. Taylor, and T. F. Cootes, "Automatic interpretation and coding of face images using flexible models," IEEE Trans. Pattern Anal. Mach. Intell., vol. 19, no. 7, pp. 743–756, Jul. 1997.
[24] H. Li, P. Roivainen, and R. Forchheimer, "3-D motion estimation in model-based facial image coding," IEEE Trans. Pattern Anal. Mach. Intell., vol. 15, no. 6, pp. 545–555, Jun. 1993.
[25] X. Li and Q. Ji, "Active affective state detection and user assistance with dynamic Bayesian networks," IEEE Trans. Syst., Man, Cybern. A, Syst., Humans, vol. 35, no. 1, pp. 93–105, Jan. 2005.
[26] K. Mase, "Recognition of facial expression from optical flow," Proc. IEICE Trans., Special Issue Comput. Vis. Appl., vol. 74, no. 10, pp. 3474–3483, 1991.
[27] K. Ogata, Modern Control Engineering. Englewood Cliffs, NJ: Prentice-Hall, 1990.
[28] W. Pedrycz and J. Valente de Oliveira, "A development of fuzzy encoding and decoding through fuzzy clustering," IEEE Trans. Instrum. Meas., vol. 57, no. 4, pp. 829–837, Apr. 2008.
[29] W. Pedrycz and F. Gomide, An Introduction to Fuzzy Sets: Analysis and Design. Cambridge, MA: MIT Press, 1998.
[30] R. Picard, Affective Computing. Cambridge, MA: MIT Press, 1997.
[31] R. W. Picard, E. Vyzas, and J. Healey, "Toward machine emotional intelligence: Analysis of affective psychological states," IEEE Trans. Pattern Anal. Mach. Intell., vol. 23, no. 10, pp. 1175–1191, Oct. 2001.
[32] P. Rani, N. Sarkar, and J. Adams, "Anxiety-based affective communication for implicit human–machine interaction," Adv. Eng. Inf., vol. 21, no. 3, pp. 323–334, Jul. 2007.
[33] P. Rani, N. Sarkar, C. Smith, and L. Kirby, "Anxiety detecting robotic systems: Towards implicit human–robot collaboration," Robotica, vol. 22, no. 1, pp. 83–93, 2004.
[34] M. Rosenblum, Y. Yacoob, and L. Davis, "Human expression recognition from motion using a radial basis function network architecture," IEEE Trans. Neural Netw., vol. 7, no. 5, pp. 1121–1138, Sep. 1996.
[35] J. Scheirer, R. Fernadez, J. Klein, and R. Picard, "Frustrating the user on purpose: A step toward building an affective computer," Interact. Comput., vol. 14, no. 2, pp. 93–118, Feb. 2002.
[36] H. Simon, "Motivational and emotional control of cognition," in Models of Thought. New Haven, CT: Yale Univ. Press, 1979, pp. 29–38.
[37] D. Terzopoulos and K. Waters, "Analysis and synthesis of facial image sequences using physical and anatomical models," IEEE Trans. Pattern Anal. Mach. Intell., vol. 15, no. 6, pp. 569–579, Jun. 1993.
[38] Y. Tian, T. Kanade, and J. Cohn, "Recognizing action units for facial expression analysis," IEEE Trans. Pattern Anal. Mach. Intell., vol. 23, no. 2, pp. 97–115, Feb. 2001.
[39] N. Ueki, S. Morishima, and H. Harashima, "Expression analysis/synthesis system based on emotion space constructed by multilayered neural network," Syst. Comput. Jpn., vol. 25, no. 13, pp. 95–103, 1994.
[40] O. A. Uwechue and S. A. Pandya, Human Face Recognition Using Third-Order Synthetic Neural Networks. Boston, MA: Kluwer, 1997.
[41] P. Vanger, R. Honlinger, and H. Haykin, "Applications of synergetic in decoding facial expressions of emotions," in Proc. Int. Workshop Autom. Face Gesture Recog., Zurich, Switzerland, 1995, pp. 24–29.
[42] A. Vasilakos and W. Pedrycz, Ambient Intelligence, Wireless Networking and Ubiquitous Computing. Norwood, MA: Artech House, Jun. 2006.
[43] Y. Yacoob and L. Davis, "Computing spatio-temporal representations of human faces," in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recog., Jun. 1994, pp. 70–75.
[44] Y. Yacoob and L. Davis, "Recognizing human facial expression from long image sequences using optical flow," IEEE Trans. Pattern Anal. Mach. Intell., vol. 18, no. 6, pp. 636–642, Jun. 1996.
[45] H. Yamada, "Visual information for categorizing facial expression of emotion," Appl. Cogn. Psychol., vol. 7, no. 3, pp. 257–270, 1993.
[46] Z. Zeng, Y. Fu, G. I. Roisman, Z. Wen, Y. Hu, and T. S. Huang, "Spontaneous emotional facial expression detection," J. Multimedia, vol. 1, no. 5, pp. 1–8, Aug. 2006.

    Aruna Chakraborty received the M.A. degree in cognitive science and the Ph.D. degree on emotional intelligence and human-computer interactions from Jadavpur University, Calcutta, India, in 2000 and 2005, respectively.

    She is currently an Assistant Professor with the Department of Computer Science and Engineering, St. Thomas College of Engineering and Technology, Calcutta. She is also a Visiting Faculty with Jadavpur University, where she offers graduate-level courses on intelligent automation and robotics, and cognitive science. She is writing a book with her teacher A. Konar on Emotional Intelligence: A Cybernetic Approach, which is shortly to appear from Springer, Heidelberg, in 2009. She serves as an Editor of the International Journal of Artificial Intelligence and Soft Computing, Inderscience, U.K. Her current research interests include artificial intelligence, emotion modeling, and their applications in next-generation human-machine interactive systems. She is a nature lover, and loves music and painting.


    Amit Konar (M'97) received the B.E. degree from Bengal Engineering and Science University (B.E. College), Howrah, India, in 1983 and the M.E. Tel. E., M. Phil., and Ph.D. (Engineering) degrees from Jadavpur University, Calcutta, India, in 1985, 1988, and 1994, respectively.

    In 2006, he was a Visiting Professor with the University of Missouri, St. Louis. He is currently a Professor with the Department of Electronics and Tele-Communication Engineering (ETCE), Jadavpur University, where he is the Founding Coordinator of the M.Tech. program on intelligent automation and robotics. He has supervised ten Ph.D. theses. He has around 200 publications in international journals and conference proceedings. He is the author of six books, including two popular texts, Artificial Intelligence and Soft Computing (CRC Press, 2000) and Computational Intelligence: Principles, Techniques and Applications (Springer, 2005). He serves as the Editor-in-Chief of the International Journal of Artificial Intelligence and Soft Computing. His research areas include the study of computational intelligence algorithms and their applications to the entire domain of electrical engineering and computer science. Specifically, he has worked on fuzzy sets and logic, neurocomputing, evolutionary algorithms, Dempster-Shafer theory, and Kalman filtering, and has applied the principles of computational intelligence to image understanding, VLSI design, mobile robotics, and pattern recognition.

    Dr. Konar is a member of the editorial boards of five other international journals. He was the recipient of the All India Council for Technical Education (AICTE)-accredited 1997-2000 Career Award for Young Teachers for his significant contribution to teaching and research.

    Uday Kumar Chakraborty received the Ph.D. degree from Jadavpur University, India, for his work on stochastic models of genetic algorithms.

    He held positions with the CAD Center, Calcutta, India; CMC Limited (Calcutta and London); Jadavpur University, Calcutta, India; and the German National Research Center for Computer Science (GMD), Bonn, Germany. He is currently an Associate Professor of computer science with the University of Missouri, St. Louis. His research interests include evolutionary computation, soft computing, scheduling, and computer graphics. He is (co)author/editor of three books and 90 articles in journals and conference proceedings. He is an Area Editor of New Mathematics & Natural Computation and an Editor of the Journal of Computing and Information Technology. He serves on the editorial boards of three other journals. He has guest edited special issues on evolutionary computation of many computer science journals and has served as track chair or program committee member of numerous international conferences.

    Amita Chatterjee received the Ph.D. degree on "The Problems of Counterfactual Conditionals" from the University of Calcutta, West Bengal, India.

    She is currently a Professor of philosophy and the Coordinator of the Center for Cognitive Science, Jadavpur University, Calcutta, India. She has been continuing her personal research and supervising Ph.D. and M. Phil dissertations for the past twenty-six years. Books authored and edited by her include Understanding Vagueness (1994), Perspectives on Consciousness (2003), and Philosophical Concepts Relevant to Sciences, vol. 1 (2006), vol. 2 (2008). She has contributed articles in national and international refereed journals and anthologies of repute. She is on the editorial boards of the Indian Philosophical Quarterly and the International Journal of Artificial Intelligence and Soft Computing. Her areas of interest are logic, analytical philosophy, philosophy of mind, and cognitive science. She is currently engaged in research on inconsistency-tolerant logics, human reasoning ability, consciousness studies, and modeling of perception and emotion.