CHAPTER 5
APPLICATIONS OF SLBM AND MBWM
ALGORITHMS FOR EXPRESSION DETECTION
5.1 INTRODUCTION
Facial expressions are the primary means by which human beings
communicate emotions and feelings. Facial expression analysis
is a challenging problem with an impact on important
applications in many areas, including human–computer interaction and data-
driven animation. Owing to its wide range of applications, automatic facial
expression recognition has attracted much attention in recent years. Considerable
progress has been achieved in this field, but recognizing facial expressions with
high accuracy remains difficult due to the intricacy and
variability of facial expressions. Extracting effective facial features from input
face images is a vital step for successful facial expression recognition. The
two common approaches to extract facial features include geometric feature-
based methods and appearance-based methods. Geometric feature-based methods
extract the location and shape of facial components and form a feature vector that
represents the face geometry. Valstar et al (2005) have demonstrated that
geometric feature-based methods provide a better performance than
appearance-based approaches in face expression recognition. However, the
geometric feature-based methods usually require accurate and reliable facial
feature detection and tracking, which is difficult to estimate in many
situations. In the case of appearance-based methods, image filters, such as
Gabor wavelets, are applied to the whole face or to specific face
regions to extract the expression changes of the face. Due to their superior
performance, most work on appearance-based methods has
focused on using Gabor-wavelet representations. Still, it is intensive both in
time and memory to convolve face images with a bank of Gabor filters to
extract multi-scale and multi-orientation coefficients. This chapter analyzes
facial representations based on Local Binary Pattern (LBP), Simplified Local
Binary Mean (SLBM) and Mean Based Weight Matrix (MBWM) features for
person-independent facial expression recognition. LBP features were proposed
originally for texture analysis, and recently have been introduced to represent
faces in facial images analysis. The most important properties of LBP features
are their tolerance against illumination changes and their computational
simplicity. The SLBM features are used for face detection and give
better performance than the LBP features. The multiclass SVM is
used to estimate the facial expression using LBP, SLBM and MBWM
features.
Figure 5.1 Block diagram for estimation of face expression
LBP features can be derived very fast in a single scan through the
raw image and lie in low-dimensional feature space, while still retaining
discriminative facial information in a compact representation. The limitation
of the existing facial expression recognition methods is that they attempt to
recognize facial expressions from data collected in a highly controlled
environment given high resolution frontal faces. However, in real-world
applications often only low-resolution images are available, which makes
recognition much more difficult. In this work, the various features for facial expression
recognition are investigated. Experiments show that LBP features perform
stably and robustly over a useful range of expression variant face images.
Figure 5.1 presents the basic block diagram of the facial expression
recognition system. Here, the face image is pre-processed by cropping the
image to a size of 256×256 pixels. The face image is then segmented into five
regions horizontally to separate the eye and the mouth region. Then the
features of the five regions are estimated and weight is assigned to the eye and
the mouth region and the total feature is obtained by summing all the features.
Multiclass SVM is used to estimate the expression based on the total features.
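The pre-processing and segmentation stage above can be sketched as follows. This is a minimal illustration assuming a grayscale face image already cropped to 256×256 and five equal horizontal strips; the exact strip boundaries used in the actual system are not specified here.

```python
import numpy as np

def segment_face(img, n_segments=5):
    """Split a pre-processed (cropped, grayscale) face image into equal
    horizontal strips, so that the eye and mouth regions fall into
    separate segments."""
    h = img.shape[0] // n_segments
    return [img[i * h:(i + 1) * h, :] for i in range(n_segments)]

# Hypothetical 256x256 input, matching the image size described above.
face = np.random.randint(0, 256, size=(256, 256), dtype=np.uint8)
segments = segment_face(face)
print(len(segments), segments[0].shape)  # prints: 5 (51, 256)
```

Each strip can then be passed to the feature extractor, with the strips covering the eyes and the mouth weighted more heavily before the features are summed.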
5.2 FACIAL EXPRESSION DATABASE
The JAFFE (Japanese Female Facial Expression) database is used for
training and testing. A widely used facial
expression description is the Facial Action Coding System (FACS). This is a
human-observer-based system developed to capture delicate changes in facial
expressions. In this system, facial expressions are decomposed into one or
more Action Units (AUs). Psychophysical studies indicate that basic emotions
have corresponding universal facial expressions across all cultures. This is
reflected by most current facial expression recognition systems that attempt to
recognize a set of prototypic emotional expressions including disgust, fear,
joy, surprise, sadness and anger. This work focuses on the same prototypic
expression recognition. Both 6-class and 7-class expression recognition are
considered by including the neutral expression. Figure 5.2 shows some sample
images from the Japanese female database. The database was planned and
assembled by Miyuki Kamachi, Michael Lyons, and Jiro Gyoba with the
help of Reiko Kubota as a research assistant.
Figure 5.2 Japanese Female Database
The JAFFE database contains 213 images of female facial
expressions. Each image has a resolution of 256×256 pixels. The number of
images corresponding to each of the 7 categories of expression (neutral,
happiness, sadness, surprise, anger, disgust and fear) is almost the same. The
images in the database are grayscale images in the TIFF file format. The
expression depicted in each image, along with a semantic rating, is provided
in the database, which makes the database suitable for facial expression research.
The heads in the images are mostly in frontal pose. Original images have
already been rescaled and cropped such that the eyes are roughly at the same
position with a distance of 60 pixels in the final images.
Figure 5.3 represents the 7 expressions for a subject and the
extraction of the eyes and the mouth region which shows the variation in
expression.
Figure 5.3 Extraction of eye and mouth region by Region sub-division method
5.3 FACIAL FEATURES
Weightage-based statistical LBP, SLBM and MBWM features are
extracted from the given face images. The LBP, SLBM and MBWM
algorithms for feature extraction are described in section 4.3 for recognizing
face images. The image is subdivided into five subdivisions as shown in
Figure 5.4. These features are extracted for these five segments of the face
image. The features extracted from the mouth region and the eye regions are
given more weightage. The weightage given for the mouth region is more
compared to the other regions as the mouth region plays a major role in
detecting the expression of the face image. Equation 5.1 gives the features of
the image to be compared.
Figure 5.4 Image with Five segments
f = fos1 + 3 × fos2 + fos3 + 5 × fos4 + fos5        (5.1)

where fosk = feature of segment k.

The features of the segments include the first-order statistical
features, mean and standard deviation, given by Equations 5.2 and 5.3:

m = m1 + 3 × m2 + m3 + 5 × m4 + m5        (5.2)

sd = sd1 + 3 × sd2 + sd3 + 5 × sd4 + sd5        (5.3)
where m is the mean of the image
m1 is mean of segment1
m2 is mean of segment2
m3 is mean of segment3
m4 is mean of segment4
m5 is mean of segment5
and
sd is the standard deviation of the image
sd1 is the standard deviation of segment1
sd2 is the standard deviation of segment2
sd3 is the standard deviation of segment3
sd4 is the standard deviation of segment4
sd5 is the standard deviation of segment5
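The weighted statistics above can be sketched as follows; the weights of 3 for the eye segment and 5 for the mouth segment are illustrative, and which segment indices correspond to the eye and mouth is an assumption here.

```python
import numpy as np

def weighted_statistics(segments, weights):
    """Weighted sums of per-segment means and standard deviations,
    i.e. the first-order statistical features of the five segments."""
    means = np.array([s.mean() for s in segments])
    sds = np.array([s.std() for s in segments])
    w = np.asarray(weights, dtype=float)
    return float(w @ means), float(w @ sds)

# Five horizontal segments of a hypothetical face image; the weights
# (3 for the eye strip, 5 for the mouth strip) are illustrative.
rng = np.random.default_rng(0)
segments = [rng.integers(0, 256, size=(51, 256)) for _ in range(5)]
m_total, sd_total = weighted_statistics(segments, [1, 3, 1, 5, 1])
```

The pair (m_total, sd_total) then serves as the feature vector fed to the multiclass SVM.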
5.4 FACE EXPRESSION RECOGNITION
Face expression recognition is a multiclass problem, since seven expressions are
involved. The support vector machine is used as a tool to recognize the facial
expression. Support vector machine (SVM) is a powerful machine learning
technique for data classification. SVM performs an implicit mapping of the data
into a higher (possibly infinite) dimensional feature space and then finds a
linear separating hyperplane with the maximal margin to separate the data in this
higher dimensional space. The various types of SVM are detailed below.
5.4.1 Use of Binary Support Vector Machine (SVM)
The training set is labeled as {(xi, yi), i = 1, 2, ..., l}, where xi ∈ Rn
and yi ∈ {−1, +1}. A new test example x is classified using the function in
Equation 5.4:

f(x) = sgn( Σ(i=1 to l) αi yi K(xi, x) + b )        (5.4)
where αi is the Lagrange multiplier of a dual optimization problem that
describes the separating hyperplane, K(xi, x) is a kernel function and b is the
threshold parameter of the hyperplane. The training samples xi with αi > 0 are
called support vectors, and SVM finds the hyperplane that maximizes the
distance between the support vectors and the hyperplane. Given a non-linear
mapping Φ that embeds the input data into the high dimensional space,
kernels have the form K(xi, xj) = ⟨Φ(xi), Φ(xj)⟩. SVM allows domain-
specific selection of the kernel function. Though new kernels are being
proposed, the most frequently used kernel functions are the linear, polynomial
and Radial Basis Function (RBF) kernels. SVM makes binary decisions, so
multi-class classification is accomplished here by using the
one-against-rest technique, which trains binary classifiers to discriminate one
expression from all others and outputs the class with the largest
binary classification output. With regard to the parameter selection of SVM,
the mean and standard deviation features are used; these parameters provided
the best accuracy.
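The one-against-rest scheme can be illustrated with a small self-contained sketch. To keep the example dependency-free, the per-class binary classifiers below are simple least-squares linear scorers standing in for the binary SVMs described above, and the data are synthetic.

```python
import numpy as np

def train_one_vs_rest(X, y, n_classes):
    """Train one linear binary scorer per class (class c vs. the rest).
    Least squares is a lightweight stand-in for a binary SVM here."""
    Xb = np.hstack([X, np.ones((len(X), 1))])   # add a bias column
    weights = []
    for c in range(n_classes):
        target = np.where(y == c, 1.0, -1.0)    # +1 for class c, -1 for the rest
        w, *_ = np.linalg.lstsq(Xb, target, rcond=None)
        weights.append(w)
    return np.array(weights)

def predict_one_vs_rest(W, X):
    """Pick the class whose binary scorer gives the largest output."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return (Xb @ W.T).argmax(axis=1)

# Synthetic, well-separated features for 7 "expression" classes.
rng = np.random.default_rng(1)
centers = 10.0 * np.eye(7)
X = np.vstack([c + rng.normal(scale=0.2, size=(20, 7)) for c in centers])
y = np.repeat(np.arange(7), 20)
W = train_one_vs_rest(X, y, 7)
accuracy = (predict_one_vs_rest(W, X) == y).mean()
```

The decision rule, taking the class with the largest scorer output, is exactly the one-against-rest voting described above, independent of which binary classifier is plugged in.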
5.4.2 Use of Multiclass SVM
In the last two decades, the support vector machine (SVM) has
become a widely applied machine learning and classification technique. Some
of its attractive properties, such as maximum-margin separation, good generalization
performance and resistance to overtraining, have made the SVM classifier a useful
machine learning tool in many fields, including text categorization, speech
recognition and computational biology.
Expression estimation requires seven classes to specify the various
expressions: neutral, happiness, sadness, surprise, anger, disgust and fear. So,
the binary classifier is extended to classify the seven groups using the two regular
strategies for forming a multiclass classifier from a series of binary classifiers:
OAO (one-against-one) and OAA (one-against-all).
For multi-class classification, the most popular strategy is the
one-against-all SVM. In the case of k classes, the OAA-SVM classifier uses
one binary SVM model to separate each class from all the rest.
5.5 POSE ESTIMATION
The major challenge of face detection techniques lies in handling
varying poses, i.e., detection of faces under arbitrary in-depth rotations. The face
image differences caused by rotations are often larger than the inter-person
differences used in recognition. Research toward pose-invariant face
recognition in recent years has produced many prominent approaches. However,
several issues in face recognition across pose still remain open.
Head pose estimation is the ability to infer the orientation of a
person’s head relative to the view of the camera. Alternatively, head pose
estimation is the ability to infer the orientation of a head relative to a global
coordinate system, but this fine difference requires knowledge of the intrinsic
camera parameters to undo the perceptual bias from perspective distortion.
The range of head motion for an average adult male encompasses a sagittal
flexion and extension (i.e., forward to backward movement of the neck) from
−60.4° to 69.6°, a frontal lateral bending (i.e., right to left bending of the neck)
from −40.9° to 36.3°, and a horizontal axial rotation (i.e., right to left rotation
of the head) from −79.8° to 75.3° (Gao et al 2001). The combination of
muscular rotation and relative orientation is an often overlooked ambiguity
(e.g., a profile view of a head does not look exactly the same when a camera
is viewing from the side as compared to when the camera is viewing from the
front and the head is turned sideways). Despite this problem, it is often
assumed that the human head can be modeled as a disembodied rigid object.
Under this assumption, the human head is limited to three Degree of freedom
(DOF) in pose, which can be characterized by pitch, roll, and yaw angles as
shown in Figure 5.5.
Figure 5.5 Three degrees of freedom of a human head
The general block diagram for pose estimation is shown in Figure
5.6. The frontal face image is used for training, and the histograms of the LBP,
SLBM and MBWM matrices are used as features to estimate the pose of the
input image. Multiclass SVM is used to estimate the pose as yaw, pitch or
roll. The Head Pose face database, which contains 15 subjects in 10 selected
different poses, is used for training and testing.
Figure 5.6 Block diagram for pose estimation
5.5.1 Histogram Based Head-pose Estimation
The image histogram is a basic method that describes an image in a lower
dimension. The histogram of an image is considered as a feature vector
representing the image: a statistical description of the distribution of pixel
intensities in terms of their occurrence. The size of the image histogram
depends on the number of quantization levels of the pixel intensities. In the
case of a monochrome image, an 8-bit representation is used for 256 gray
levels. An image histogram is simply a mapping that counts the number of
pixels whose intensity levels fall into various disjoint intervals, known as
bins. The bin size determines the size of the histogram vector. In this work,
256 bins are used and the size of the histogram vector is 256. For a
monochrome image, the sum of all 256 bins equals N, the number of pixels in
the image. The histogram feature vector, H, is the total histogram of the
image. The similarity between two images can be measured using the
correlation between their histograms. If H1, H2, ..., HM is a set of histograms
of training face images with different poses and M is the number of image
samples, then for a given test face image, its histogram Ht can be compared
with each training image histogram.
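The histogram feature and correlation matching just described can be sketched as follows; the normalisation and the use of Pearson correlation as the similarity measure are implementation choices, not details fixed by the text.

```python
import numpy as np

def intensity_histogram(img, bins=256):
    """256-bin gray-level histogram, normalised so that comparisons do
    not depend on the number of pixels N."""
    hist, _ = np.histogram(img, bins=bins, range=(0, 256))
    return hist / hist.sum()

def best_match(test_hist, train_hists):
    """Index of the training histogram with the highest correlation
    to the test histogram."""
    corrs = [np.corrcoef(test_hist, h)[0, 1] for h in train_hists]
    return int(np.argmax(corrs))

# Two hypothetical training images with distinct intensity profiles.
rng = np.random.default_rng(2)
dark = rng.integers(0, 100, size=(64, 64))
bright = rng.integers(150, 256, size=(64, 64))
train = [intensity_histogram(dark), intensity_histogram(bright)]
test = intensity_histogram(rng.integers(0, 100, size=(64, 64)))
```

A test image with a dark intensity profile correlates most strongly with the dark training histogram, so `best_match(test, train)` selects it.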
Head pose is estimated using multiclass support vector machine.
The face dataset is divided into training set and test set. The images used in
the test set are not included in the training set. The results of the proposed
system are outstanding: even a single image in the training set
provides a correct recognition rate as high as 94.89%, whereas this rate drops
to 68.89% in the PCA-based face recognition system. The
proposed method shows a slight further improvement as the number of training
images is increased.
5.5.2 LBPH Based Head-pose Estimation
Head pose estimation involves two tasks, constructing the pose
estimators from face images with known pose information, and applying the
estimators to a new face image. Multiclass SVM is used to construct three
pose estimators: one for tilt (elevation), one for yaw (azimuth) and one for
roll. The input to the pose estimators is the
LBP-histogram of the face images. The input face images are resized to about
128 × 128 pixels. The range of pose of these face images is [−90°, +90°] in yaw
and [−30°, +30°] in tilt. The dimensionality of the LBP-histogram is reasonably
small. The output is the pose angles in tilt, yaw and roll.
The LBP matrix is constructed as explained in section 4.4.1. The
histogram of the constructed LBP matrix is used for pose estimation. LBPs
have been very effective for image representation and have been applied to
visual inspection, motion detection and outdoor scene analysis. The most
important properties of LBP features are their tolerance against monotonic
illumination changes and their computational simplicity. The LBP operator
detects many texture primitives, such as spots, line ends, edges and corners,
which are typically accumulated into a histogram over a region to capture
local texture information.
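A minimal sketch of the basic 3×3 LBP operator and its histogram, to make the description concrete; the clockwise neighbour ordering is one common convention, and the exact variant used in section 4.4.1 may differ.

```python
import numpy as np

def lbp_image(img):
    """Basic 3x3 LBP: threshold the 8 neighbours of each interior pixel
    against the centre pixel and pack the results into an 8-bit code."""
    img = img.astype(np.int32)
    H, W = img.shape
    centre = img[1:H - 1, 1:W - 1]
    # Clockwise neighbour offsets starting at the top-left (one convention).
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = np.zeros_like(centre)
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = img[1 + dy:H - 1 + dy, 1 + dx:W - 1 + dx]
        code |= (neighbour >= centre).astype(np.int32) << bit
    return code

def lbp_histogram(img, bins=256):
    hist, _ = np.histogram(lbp_image(img), bins=bins, range=(0, 256))
    return hist / hist.sum()
```

Because each comparison is only against the centre pixel, adding a constant to every pixel leaves the codes unchanged, which is the tolerance to monotonic illumination change mentioned above.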
5.5.3 SLBMH Based Head-pose Estimation
The simplified local binary mean-histogram overcomes the
disadvantage of the LBPH algorithm by using the mean of the nine pixels for
thresholding. The construction of SLBM matrix is as explained in section
4.4.2 and the constructed matrix is as shown in Figure 5.7.
Figure 5.7 SLBM operator
The 3×3 matrices of the entire image are concatenated and the histogram of
the concatenated data is obtained. This gives the SLBMH features.
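A sketch of the SLBM thresholding step as described: each 3×3 neighbourhood is compared against the mean of its nine pixels rather than against the centre pixel. This is one reading of section 4.4.2, and details such as the neighbour ordering are assumptions.

```python
import numpy as np

def slbm_image(img):
    """SLBM-style codes: threshold the 8 neighbours of each interior pixel
    against the mean of the full 3x3 neighbourhood (nine pixels)."""
    img = img.astype(np.float64)
    H, W = img.shape
    # Mean of the 3x3 neighbourhood around every interior pixel.
    mean9 = sum(img[1 + dy:H - 1 + dy, 1 + dx:W - 1 + dx]
                for dy in (-1, 0, 1) for dx in (-1, 0, 1)) / 9.0
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = np.zeros((H - 2, W - 2), dtype=np.int32)
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = img[1 + dy:H - 1 + dy, 1 + dx:W - 1 + dx]
        code |= (neighbour >= mean9).astype(np.int32) << bit
    return code
```

Using the nine-pixel mean as the threshold makes the code less sensitive to a noisy centre pixel than the plain LBP comparison.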
5.5.4 MBWM Based Head Pose Estimation
In SLBM, thresholding is performed exactly against the mean of the
nine pixels, so the mean value itself is not given much weightage. The
SLBM is therefore extended to the Mean Based Weight Matrix (MBWM). The mean
based weight matrix involves three steps, namely subdividing,
thresholding and weighing, as explained in section 4.4.3.
The histograms of the MBWM features are then calculated.
Figure 5.8 Histogram of the MBWM feature
5.6 POSE ESTIMATION USING MULTICLASS SUPPORT
VECTOR MACHINE
Pose estimation requires three classes to specify yaw, pitch and
roll. So, the binary classifier is extended to classify the three groups using the
two regular strategies for forming a multiclass classifier from a series of binary
classifiers: OAO (one-against-one) and OAA (one-against-all).
For multi-class classification, the most popular strategy is the
one-against-all SVM. In the case of k classes, the OAA-SVM classifier uses
one binary SVM model to separate each class from all the rest.
5.7 SUMMARY
In this chapter, the applications of the SLBM and MBWM algorithms
are discussed. Face expression recognition is performed using the weighted MBWM
method and multiclass SVM. A comparison has been made for expression
detection between the weighted SLBM and weighted MBWM methods.
Another application of the SLBM and MBWM features is the estimation of
pose in multiview face images. A comparison has been made between the LBPH,
SLBMH and MBWMH features, and it is found that the MBWMH features are
more efficient than the LBPH and SLBMH features in recognizing the
pose of a multiview face image. The method is robust and efficient in recognizing
both the expression and the pose of the face image.