

CHAPTER 5

APPLICATIONS OF SLBM AND MBWM

ALGORITHMS FOR EXPRESSION DETECTION

5.1 INTRODUCTION

Facial expressions are the essential means by which human beings communicate emotions and feelings. Facial expression analysis is an attention-grabbing and demanding problem with impact on important applications in many areas, including human–computer interaction and data-driven animation. Due to its wide range of applications, automatic facial expression recognition has attracted much attention in recent years. Much progress has been achieved in this field, but recognizing facial expressions with high accuracy remains difficult due to the complexity and variability of facial expressions. Extracting effective facial features from input face images is a vital step for successful facial expression recognition. The two common approaches to extracting facial features are geometric feature-based methods and appearance-based methods. Geometric feature-based methods extract the location and shape of facial components and form a feature vector that represents the face geometry. Valstar et al (2005) have demonstrated that geometric feature-based methods provide better performance than appearance-based approaches in face expression recognition. However, geometric feature-based methods usually require accurate and reliable facial feature detection and tracking, which is difficult to achieve in many situations. In the case of appearance-based methods, image filters, such as Gabor wavelets, are applied to the whole face or to specific face regions to derive the expression changes of the face. Due to their superior performance, the major works on appearance-based methods have focused on Gabor-wavelet representations. Still, convolving face images with a bank of Gabor filters to extract multi-scale and multi-orientation coefficients is intensive in both time and memory.

This chapter analyses facial representations based on Local Binary Pattern (LBP), Simplified Local Binary Mean (SLBM) and Mean Based Weight Matrix (MBWM) features for person-independent facial expression recognition. LBP features were proposed originally for texture analysis and have recently been introduced to represent faces in facial image analysis. The most important properties of LBP features are their tolerance against illumination changes and their computational simplicity. The SLBM features, used for face detection, give better performance than the LBP features. A multiclass SVM is used to estimate the facial expression using the LBP, SLBM and MBWM features.

Figure 5.1 Block diagram for estimation of face expression

LBP features can be derived very fast in a single scan through the raw image and lie in a low-dimensional feature space, while still retaining discriminative facial information in a compact representation. A limitation of existing facial expression recognition methods is that they attempt to recognize facial expressions from data collected in a highly controlled environment with high-resolution frontal faces. In real-world applications, however, only low-resolution images are available, which makes recognition much more difficult. In this work, various features for facial expression recognition are investigated. Experiments show that LBP features perform stably and robustly over a useful range of expression-variant face images.

Figure 5.1 presents the basic block diagram of the facial expression recognition system. Here, the face image is pre-processed by cropping it to a size of 256×256. The face image is then segmented into five horizontal regions to separate the eye and mouth regions. The features of the five regions are estimated, weights are assigned to the eye and mouth regions, and the total feature is obtained by summing all the region features. A multiclass SVM then estimates the expression from the total feature.
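The segmentation and weighted-pooling steps of this pipeline can be sketched as follows; the function names, the crop stand-in, and the specific weight values are illustrative assumptions, not taken from the thesis:

```python
import numpy as np

def segment_face(face, n_regions=5):
    """Pre-process (crop to 256x256) and split into horizontal regions;
    the pipeline of Figure 5.1 then extracts features per region."""
    face = face[:256, :256]                      # crop stand-in
    return np.array_split(face, n_regions, axis=0)

def total_feature(region_feats, weights):
    """Weighted sum of per-region feature vectors, with the eye and
    mouth regions weighted higher than the rest."""
    W = np.asarray(weights, dtype=float)[:, None]
    return (W * np.asarray(region_feats, dtype=float)).sum(axis=0)
```

The resulting total feature vector is what is handed to the multiclass SVM in the final stage of the block diagram.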

5.2 FACIAL EXPRESSION DATABASE

The JAFFE database is used for training and testing. A widely used facial expression description is the Facial Action Coding System (FACS). This is a human-observer-based system developed to capture subtle changes in facial expressions. In this system, facial expressions are decomposed into one or more Action Units (AUs). Psychophysical studies indicate that basic emotions have corresponding universal facial expressions across all cultures. This is reflected by most current facial expression recognition systems, which attempt to recognize a set of prototypic emotional expressions including disgust, fear, joy, surprise, sadness and anger. This work focuses on the same prototypic expression recognition. Both 6-class and 7-class expression recognition are considered, the latter by including the neutral expression. Figure 5.2 shows some sample images from the Japanese female database. The database was planned and assembled by Miyuki Kamachi, Michael Lyons, and Jiro Gyoba with the help of Reiko Kubota as a research assistant.

Figure 5.2 Japanese Female Database

The JAFFE database contains 213 images of female facial expressions. Each image has a resolution of 256×256 pixels. The number of images corresponding to each of the 7 categories of expression (neutral, happiness, sadness, surprise, anger, disgust and fear) is almost the same. The images in the database are grayscale images in the TIFF file format. The expression depicted in each image is provided along with a semantic rating, which makes the database suitable for facial expression research. The heads in the images are mostly in frontal pose. The original images have already been rescaled and cropped such that the eyes are at roughly the same position, with a distance of 60 pixels between them in the final images.

Figure 5.3 represents the 7 expressions for a subject and the extraction of the eye and mouth regions, which shows the variation in expression.


Figure 5.3 Extraction of eye and mouth region by region sub-division method

5.3 FACIAL FEATURES

Weightage-based statistical LBP, SLBM and MBWM features are extracted from the given face images. The LBP, SLBM and MBWM algorithms for feature extraction are described in section 4.3 for recognizing face images. The image is subdivided into five subdivisions as shown in Figure 5.4, and the features are extracted for these five segments of the face image. The features extracted from the mouth and eye regions are given more weightage, and the weightage given to the mouth region is higher than that of the other regions, as the mouth region plays a major role in detecting the expression of the face image. Equation 5.1 gives the features of the image to be compared.

Figure 5.4 Image with Five segments

fos = fos1 + 3 fos2 + fos3 + 5 fos4 + fos5 (5.1)

where fos is the total feature of the image and fos1, …, fos5 are the features of segments 1 to 5. The features of the segments include the first-order statistical features mean and standard deviation, given by Equations 5.2 and 5.3.

m = m1 + 3 m2 + m3 + 5 m4 + m5 (5.2)

sd = sd1 + 3 sd2 + sd3 + 5 sd4 + sd5 (5.3)

where

m is the mean of the image and m1, m2, m3, m4 and m5 are the means of segments 1 to 5,

and

sd is the standard deviation of the image and sd1, sd2, sd3, sd4 and sd5 are the standard deviations of segments 1 to 5.
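The weighted per-segment statistics can be encoded directly; this is a minimal sketch assuming weights of 3 on segment 2 (the eye strip) and 5 on segment 4 (the mouth strip):

```python
import numpy as np

def weighted_mean_std(face):
    """Per-segment mean and standard deviation combined with extra
    weight on the eye strip (segment 2) and mouth strip (segment 4)."""
    s1, s2, s3, s4, s5 = np.array_split(face, 5, axis=0)
    m  = s1.mean() + 3 * s2.mean() + s3.mean() + 5 * s4.mean() + s5.mean()
    sd = s1.std()  + 3 * s2.std()  + s3.std()  + 5 * s4.std()  + s5.std()
    return m, sd
```

For a constant image the standard-deviation term vanishes and the weighted mean is simply (1 + 3 + 1 + 5 + 1) times the pixel value, which gives a quick sanity check on the weighting.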

5.4 FACE EXPRESSION RECOGNITION

Face expression recognition is a multiclass problem, since seven expressions are involved. The support vector machine is used as the tool to recognize the facial expression. The support vector machine (SVM) is a powerful machine learning technique for data classification. SVM performs an implicit mapping of the data into a higher (possibly infinite) dimensional feature space and then finds a linear separating hyperplane with the maximal margin to separate the data in this higher dimensional space. The types of SVM used are detailed below.

5.4.1 Use of Binary Support Vector Machine (SVM)

The training set is labeled as {(xi, yi), i = 1, 2, …, l}, where xi ∈ Rn and yi ∈ {−1, 1}. A new test example x is classified using the function in Equation 5.4:

f(x) = sgn( Σi=1..l αi yi K(xi, x) + b ) (5.4)

where αi is the Lagrange multiplier of the dual optimization problem that describes the separating hyperplane, K(xi, x) is a kernel function and b is the threshold parameter of the hyperplane. The training samples xi with αi > 0 are called support vectors, and SVM finds the hyperplane that maximizes the distance between the support vectors and the hyperplane. Given a non-linear mapping Φ that embeds the input data into the high-dimensional space, kernels have the form K(xi, xj) = ⟨Φ(xi), Φ(xj)⟩. SVM allows domain-specific selection of the kernel function. Though new kernels are being proposed, the most frequently used kernel functions are the linear, polynomial and Radial Basis Function (RBF) kernels. Since SVM makes binary decisions, multi-class classification is accomplished here by the one-against-rest technique, which trains binary classifiers to discriminate one expression from all others and outputs the class with the largest binary-classification output. With regard to the parameter selection of SVM, the mean and standard deviation are used as the two main parameters; these provided the best accuracy.
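The decision rule of Equation 5.4 can be sketched directly. The support vectors, multipliers and bias below are hypothetical toy values; in practice they come from solving the dual optimization problem:

```python
import numpy as np

def rbf_kernel(a, b, gamma=0.5):
    """Radial Basis Function kernel K(a, b)."""
    return np.exp(-gamma * np.sum((a - b) ** 2))

def svm_decide(x, support_vectors, alphas, labels, b, kernel=rbf_kernel):
    """Equation 5.4: f(x) = sgn( sum_i alpha_i * y_i * K(x_i, x) + b )."""
    s = sum(a * y * kernel(sv, x)
            for a, y, sv in zip(alphas, labels, support_vectors))
    return 1 if s + b >= 0 else -1

# Hypothetical trained model: two support vectors with unit multipliers.
svs    = [np.array([0.0, 0.0]), np.array([1.0, 1.0])]
alphas = [1.0, 1.0]
labels = [-1, 1]
bias   = 0.0
```

A test point near the positive support vector is pushed to +1 and one near the negative support vector to −1, which is exactly the sign behaviour the equation describes.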

5.4.2 Use of Multiclass SVM

In the last two decades, the support vector machine (SVM) has become a widely applied machine learning and classification technique. Some of its attractive properties (maximum-margin separation, good generalization performance, resistance to overtraining) have made the SVM classifier a useful machine learning tool in many fields, including text categorization, speech recognition and computational biology, among others.

Expression estimation requires seven classes to specify the various expressions: neutral, happiness, sadness, surprise, anger, disgust and fear. The binary classifier is therefore extended to classify the seven groups using the two regular strategies for forming a multiclass classifier from a series of binary classifiers: OAO (one against one) and OAA (one against all).

For multi-class classification, the most popular strategy is the one-against-all SVM. In the case of k classes of data samples, the OAA-SVM classifier uses k binary SVM models, each separating one class from the rest.
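The one-against-all strategy can be illustrated with a minimal sketch. A simple centroid-based scorer stands in for each binary model here (it is not the SVM the thesis uses); the class whose binary output is largest wins:

```python
import numpy as np

def train_oaa(X, y, n_classes):
    """One binary model per class: class-c samples vs the rest.
    Each 'model' is just the (class centroid, rest centroid) pair;
    a real OAA-SVM would train a binary SVM at this step."""
    return [(X[y == c].mean(axis=0), X[y != c].mean(axis=0))
            for c in range(n_classes)]

def predict_oaa(models, x):
    """Output of each binary model = margin toward its positive class;
    the class with the largest output is chosen."""
    scores = [np.linalg.norm(x - rest) - np.linalg.norm(x - pos)
              for pos, rest in models]
    return int(np.argmax(scores))
```

Only the argmax-over-binary-outputs structure is the point here; swapping the centroid scorer for k binary SVMs yields the OAA-SVM classifier described above.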

5.5 POSE ESTIMATION

The major challenge for face detection techniques lies in handling varying poses, i.e., detecting faces under arbitrary in-depth rotations. The face image differences caused by rotation are often larger than the inter-person differences used in recognition. Research toward pose-invariant face recognition in recent years has produced many prominent approaches; however, several issues in face recognition across pose still remain open.

Head pose estimation is the ability to infer the orientation of a person's head relative to the view of the camera. Alternatively, head pose estimation is the ability to infer the orientation of a head relative to a global coordinate system, but this fine distinction requires knowledge of the intrinsic camera parameters to undo the perceptual bias from perspective distortion. The range of head motion for an average adult male encompasses a sagittal flexion and extension (i.e., forward to backward movement of the neck) from −60.4° to 69.6°, a frontal lateral bending (i.e., right to left bending of the neck) from −40.9° to 36.3°, and a horizontal axial rotation (i.e., right to left rotation of the head) from −79.8° to 75.3° (Gao et al 2001). The combination of muscular rotation and relative orientation is an often overlooked ambiguity (e.g., a profile view of a head does not look exactly the same when the camera views it from the side as when the camera views it from the front with the head turned sideways). Despite this problem, it is often


assumed that the human head can be modeled as a disembodied rigid object. Under this assumption, the human head is limited to three degrees of freedom (DOF) in pose, which can be characterized by the pitch, roll and yaw angles shown in Figure 5.5.

Figure 5.5 Three degrees of freedom of a human head

The general block diagram for pose estimation is shown in Figure 5.6. The frontal face image is used for training, and the histograms of the LBP, SLBM and MBWM matrices are used as features to estimate the pose of the input image. A multiclass SVM is used to estimate the pose as yaw, pitch or roll. The Head Pose face database, which contains 15 subjects in 10 selected poses, is used for training and testing.

Figure 5.6 Block diagram for pose estimation


5.5.1 Histogram Based Head-pose Estimation

The image histogram is a basic method that describes an image in a lower dimension, and the histogram of an image can be considered a feature vector representing the image. The histogram is a statistical description of the distribution of pixel intensities in terms of their occurrence. The size of the image histogram depends on the number of quantization levels of the pixel intensities; for a monochrome image, an 8-bit representation is used, giving 256 gray levels. An image histogram is simply a mapping that counts the number of pixels whose intensity levels fall into various disjoint intervals, known as bins. The bin size determines the size of the histogram vector; in this work, 256 bins are used, so the histogram vector has 256 elements. The bins of the histogram of a monochrome image sum to N, the number of pixels in the image, and the histogram feature vector H is the total histogram of the image. The similarity between two images can be measured using the correlation between their histogram vectors. If H1, H2, …, HM is a set of histograms of training face images with different poses, where M is the number of image samples, then for a given test face image the histogram Ht of the test image can be compared with each training image histogram.
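A minimal sketch of the 256-bin histogram feature and the correlation-based comparison described above (the function names are illustrative):

```python
import numpy as np

def histogram_feature(img, bins=256):
    """256-bin intensity histogram H of a monochrome image."""
    h, _ = np.histogram(img, bins=bins, range=(0, 256))
    return h.astype(float)

def histogram_similarity(h1, h2):
    """Correlation between two histogram feature vectors."""
    return float(np.corrcoef(h1, h2)[0, 1])
```

At test time, Ht would be compared against each stored H1, …, HM and the pose of the best-correlated training histogram chosen.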

Head pose is estimated using the multiclass support vector machine. The face dataset is divided into a training set and a test set; the images used in the test set are not included in the training set. The results of the proposed system are outstanding: even a single image in the training set provides a correct recognition rate as high as 94.89%, compared with 68.89% for the PCA-based face recognition system. The proposed method shows a slight further improvement as the number of training images is increased.


5.5.2 LBPH Based Head-pose Estimation

Head pose estimation involves two tasks: constructing the pose estimators from face images with known pose information, and applying the estimators to a new face image. A multiclass SVM is used to construct three pose estimators: one for tilt (elevation), one for yaw (azimuth) and one for roll. The input to the pose estimators is the LBP histogram of the face images. The input face image is resized to about 128 × 128 pixels. The range of pose of these face images is [−90°, +90°] in yaw and [−30°, +30°] in tilt. The dimensionality of the LBP histogram is reasonably small. The output is the pose angles in tilt (pitch), yaw and roll.

The LBP matrix is constructed as explained in section 4.4.1, and the histogram of the constructed LBP matrix is used for pose estimation. LBPs have been very effective for image representation and have been applied to visual inspection, motion detection and outdoor scene analysis. The most important properties of LBP features are their tolerance against monotonic illumination changes and their computational simplicity. The LBP operator detects texture primitives such as spots, line ends, edges and corners, which are typically accumulated into a histogram over a region to capture local texture information.
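A minimal sketch of the basic 3×3 LBP operator and its histogram; the clockwise bit ordering is a common convention assumed here, not taken from section 4.4.1:

```python
import numpy as np

# Clockwise neighbour offsets around the centre pixel (assumed ordering).
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]

def lbp_image(img):
    """Threshold each pixel's 8 neighbours against the centre pixel
    and pack the comparison results into an 8-bit LBP code."""
    H, W = img.shape
    out = np.zeros((H - 2, W - 2), dtype=np.uint8)
    for i in range(1, H - 1):
        for j in range(1, W - 1):
            code = 0
            for bit, (di, dj) in enumerate(OFFSETS):
                if img[i + di, j + dj] >= img[i, j]:
                    code |= 1 << bit
            out[i - 1, j - 1] = code
    return out

def lbp_histogram(img, bins=256):
    """LBPH feature: histogram of the LBP codes over the image."""
    h, _ = np.histogram(lbp_image(img), bins=bins, range=(0, 256))
    return h
```

On a constant region every neighbour ties with the centre, so every code is 255 and the histogram collapses into a single bin, which shows the operator's invariance to monotonic intensity shifts.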

5.5.3 SLBMH Based Head-pose Estimation

The simplified local binary mean histogram overcomes a disadvantage of the LBPH algorithm by thresholding against the mean of the nine pixels. The construction of the SLBM matrix is explained in section 4.4.2, and the constructed matrix is shown in Figure 5.7.


Figure 5.7 SLBM operator

The 3×3 matrices of the entire image are concatenated and the histogram of the concatenated data is obtained. This gives the SLBMH features.
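A sketch of the SLBM operator for a single 3×3 block; the bit layout (dropping the centre comparison to keep an 8-bit code) is an assumption here, and section 4.4.2 fixes the exact construction:

```python
import numpy as np

def slbm_code(block):
    """SLBM for one 3x3 block: threshold against the mean of all nine
    pixels, instead of the centre pixel used by plain LBP."""
    assert block.shape == (3, 3)
    bits = (block.flatten() >= block.mean()).astype(int)  # 9 comparisons
    bits = np.delete(bits, 4)    # drop the centre comparison -> 8-bit code
    return int("".join(map(str, bits)), 2)
```

Because the threshold is the block mean rather than the centre pixel, a noisy centre value no longer dominates the code, which is the advantage over LBPH noted above.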

5.5.4 MBWM Based Head Pose Estimation

In SLBM, thresholding uses exactly the mean of the nine pixels, so the mean value itself is not given any extra weightage. The SLBM is therefore extended to the Mean Based Weight Matrix (MBWM). The mean based weight matrix involves three steps, namely subdividing, thresholding and weighing, as explained in section 4.4.3.

The histograms of the MBWM features are then calculated.

Figure 5.8 Histogram of the MBWM feature


5.6 POSE ESTIMATION USING MULTICLASS SUPPORT

VECTOR MACHINE

Pose estimation requires three classes, to specify yaw, pitch and roll. The binary classifier is therefore extended to classify the three groups using the two regular strategies for forming a multiclass classifier from a series of binary classifiers: OAO (one against one) and OAA (one against all).

As before, the most popular multi-class strategy is the one-against-all SVM. In the case of k classes of data samples, the OAA-SVM classifier uses k binary SVM models, each separating one class from the rest.

5.7 SUMMARY

In this chapter, applications of the SLBM and MBWM algorithms are discussed. Face expression recognition is performed using the weighted MBWM method and a multiclass SVM, and expression detection with the weighted SLBM and weighted MBWM methods is compared.

Another application of the SLBM and MBWM features is the estimation of pose in multiview face images. A comparison between the LBPH, SLBMH and MBWMH features shows that the MBWMH features are more efficient than the LBPH and SLBMH features in recognizing the pose of a multiview face image, and that the method is robust and efficient in recognizing both the expression and the pose of the face image.