Face recognition by generalized two-dimensional FLD method and multi-class support vector machines

Applied Soft Computing 11 (2011) 4282–4292
Contents lists available at ScienceDirect
Applied Soft Computing
journal homepage: www.elsevier.com/locate/asoc
1568-4946/$ – see front matter © 2010 Elsevier B.V. All rights reserved. doi:10.1016/j.asoc.2010.12.002

Shiladitya Chowdhury (a), Jamuna Kanta Sing (b,*), Dipak Kumar Basu (b), Mita Nasipuri (b)

(a) Department of Master of Computer Application, Techno India, EM-4/1, Sector V, Salt Lake, Kolkata 700 091, India
(b) Department of Computer Science & Engineering, Jadavpur University, 188, Raja S. C. Mullick Road, Kolkata, West Bengal 700 032, India

* Corresponding author. E-mail addresses: [email protected] (S. Chowdhury), [email protected] (J.K. Sing), [email protected] (D.K. Basu), [email protected] (M. Nasipuri).

Article history: Received 20 April 2010; received in revised form 27 October 2010; accepted 1 December 2010; available online 15 December 2010.

Keywords: Generalized two-dimensional FLD; Fisher's criteria; Feature extraction; Face recognition; Multi-class SVM; SVM-based classifier

Abstract

This paper presents a novel scheme for feature extraction, namely, the generalized two-dimensional Fisher's linear discriminant (G-2DFLD) method, and its use for face recognition with multi-class support vector machines as classifier. The G-2DFLD method is an extension of the 2DFLD method for feature extraction. Like the 2DFLD method, the G-2DFLD method is also based on the original 2D image matrix. However, unlike the 2DFLD method, which maximizes class separability either from the row or the column direction, the G-2DFLD method maximizes class separability from both the row and column directions simultaneously. To realize this, two alternative Fisher's criteria have been defined corresponding to the row- and column-wise projection directions. Unlike the 2DFLD method, the principal components extracted from an image matrix by the G-2DFLD method are scalars, yielding a much smaller image feature matrix. The proposed G-2DFLD method was evaluated on two popular face recognition databases, the AT&T (formerly ORL) and the UMIST face databases. The experimental results using different experimental strategies show that the new G-2DFLD scheme outperforms the PCA, 2DPCA, FLD and 2DFLD schemes, not only in terms of computation time, but also for the task of face recognition using multi-class support vector machines (SVM) as classifier. The proposed method also outperforms some of the neural network and other SVM-based methods for face recognition reported in the literature.

© 2010 Elsevier B.V. All rights reserved.

1. Introduction

Since the last decade, human face recognition has been an active research area in the field of pattern recognition and computer vision due to its wide range of applications, such as identity authentication, access control, surveillance systems, security, etc. As a result, numerous methods have been proposed in the past. Surveys of these methods can be found in [1–4]. Often, a single method involves techniques motivated by different principles. The usage of a mixture of techniques makes it difficult to classify these methods based purely on the types of techniques used for feature representation or classification. Based on the psychological study of how humans use holistic and local features, face recognition techniques may be classified into three categories: (i) holistic matching methods, (ii) feature-based (structural) matching methods, and (iii) hybrid methods.

1.1. Holistic matching methods

These methods use the whole face region as the raw input to a recognition system. One of the most widely used methods is the eigenface approach, which is based on the principal component analysis (PCA) [5,6]. It generates a set of orthogonal bases that capture the directions of maximum variance in the training images. The eigenface approach can preserve the global structure of the input space and is optimal in terms of image representation and reconstruction. The Fisher's linear discriminant (FLD) method has also been widely used for feature extraction and recognition [7–9]. The key idea of the FLD technique is to find the optimal projection that maximizes the ratio of the between-class and the within-class scatter matrices of the projected samples. However, a difficulty in using the FLD technique in face recognition is the "small sample size (SSS)" problem [10]. This problem usually arises when the number of samples is smaller than the dimension of the samples. In the face recognition domain, the dimension of a face image is generally very high. Therefore, the within-class scatter matrix is almost always singular, thereby making the implementation of the FLD method impossible. One direct solution to the SSS problem is to down-sample the face images to a considerably small size and then apply the FLD technique. However, this process is not computationally efficient as the pre-processing


Journal Identification = ASOC Article Identification = 1029 Date: June 28, 2011 Time: 11:48 am




of images takes a considerable amount of time before the actual application of the FLD technique. Er et al. [11] proposed a PCA + FLD technique to avoid the SSS problem. In [11], face features are first extracted by the principal component analysis (PCA) method and then the resultant features are further processed by the FLD technique to acquire lower-dimensional discriminant features. An improved PCA technique, the two-dimensional PCA (2DPCA), was proposed by Yang et al. [12]. Unlike PCA, which works on the stretched image vector, the 2DPCA works directly on the original 2D image matrix. The 2DPCA is not only computationally more efficient, but also superior for the task of face recognition and image reconstruction compared to the conventional PCA technique [12]. However, the PCA techniques yield projection directions that maximize the total scatter across all classes, i.e., across all face images. Therefore, the PCA retains unwanted variations caused by lighting, facial expression, and other factors [7,11]. The PCA techniques do not provide any information for class discrimination, only dimension reduction [11]. Recently, Xiong et al. [13] proposed a two-dimensional FLD (2DFLD) method, which also works directly on the original 2D image matrix and maximizes class separability either from the row or the column direction. The so-called SSS problem does not arise in the 2DFLD method as the size of its scatter matrices is much smaller. The 2DFLD method is found to be superior to the PCA and 2DPCA in terms of feature extraction and face recognition [13]. Apart from the eigenface and fisherface approaches, Bayesian methods, which use a probabilistic distance metric [14], neural networks [11,15–19] and support vector machine (SVM) methods [20–26] have also been developed. To utilize higher-order statistics, some nonlinear forms of the eigenface and fisherface methods have been developed [27–32] for better recognition performance.

The advantage of using neural networks for face recognition [11,15–19] is that the networks can be trained to capture more knowledge about the variation of face images and thereby achieve good generalization. In recent times, among the neural network approaches, many researchers have used RBF neural networks (RBFNN) for face recognition [11,15–19]. The RBF neural network can be trained faster than the multi-layer perceptron (MLP) because of its locally tuned neurons, and it has a more compact topology compared to other models of neural networks. Er et al. [11] have used the principal component analysis (PCA) method with RBF networks for face recognition. In their recent work [18], the discrete cosine transform (DCT) and Fisher's linear discriminant (FLD) techniques have been employed in an RBFNN for high-speed face recognition. In our earlier work [15], we have used a modified k-means clustering algorithm using the point symmetry distance as a similarity measure to model the hidden layer neurons of an RBFNN for face recognition. In this method we have generated cluster centers from each individual of the database independently to capture more knowledge about the distribution of facial images. Recently, we have proposed a high-speed face recognition method using pixel-based features and an RBFNN [16]. Yang and Paindovoine [17] have down-sampled the face images into 16 × 16 pixels and applied them to an RBFNN for recognition. Haddadnia et al. [19] have combined the shape information and PCA to extract features from a face image and used them in RBF neural networks for face recognition. The main drawback of this technique is that the networks have to be extensively tuned to get exceptional performance.

A few methods for face recognition using SVMs have also been proposed in the past [20–26]. Among the earlier works, Phillips [20] used SVM for face recognition. Zhaohui and Guiming [21] proposed a method based on a multi-class bias SVM (BSVM), where local facial features are automatically extracted and combined to form a single feature vector, which is then classified by the BSVM for recognition. Lee et al. [22] proposed an SVM-based method using the PCA + FLD feature subspace. The method reduces the number of face classes by selecting a few classes closest to the test data after projection into the PCA + LDA feature subspace. Ko and Byun [23] proposed a method combining one-per-class (OPC) and pairwise coupling (PWC) SVMs with rejection criteria. Guo et al. [24] proposed a binary tree-based multi-class SVM for face recognition. Wang and Sun [25] presented a face recognition method using a simple Gabor feature space (SGFS) and SVM. Thakur et al. [26] proposed an SVM-based face recognition technique using FLD features.

More recently, some new developments on the holistic matching methods can be found in the literature [33–36]. Zhi and Ruan [33] proposed a two-dimensional direct and weighted linear discriminant analysis (2D-DWLDA) for feature extraction. The method tries to weaken the overlap between neighbouring classes by introducing a weighting function. Wang et al. [34] proposed a feature extraction method which combines the ideas of the 2D-PCA and 2D maximum scatter difference methods. The method can simultaneously make use of the discriminant and descriptive information of the image. Song et al. [35] proposed a face recognition method based on complete fuzzy linear discriminant analysis (CF-LDA) and decision tree fuzzy support vector machines (DT-FSVM). The method uses a relaxed normalized condition in the definition of the fuzzy membership function to improve the classification results. Jiang et al. [36] proposed a method for facial eigenfeature regularization and extraction. The image space spanned by the eigenvectors of the within-class scatter matrix is decomposed into three subspaces. Then eigenfeatures are regularized differently in these three subspaces based on an eigenspectrum model to address the problems of instability, overfitting and poor generalization. After discriminant assessment, features are extracted from these three subspaces.

1.2. Feature-based (structural) matching methods

Most earlier methods of face recognition belong to this category. Local structural features such as the eyes, nose, mouth, etc. are extracted from the frontal-view images, and their locations, angles, distances, etc. are used for recognition [37–39]. Without finding the exact locations of the facial features, Hidden Markov Model (HMM)-based methods use strips of pixels to cover the forehead, eyes, nose, mouth, and chin [40,41]. One of the most successful methods in this category is the graph matching technique [42], which is based on the Dynamic Link Architecture (DLA). The main disadvantage of these methods is that profile (side-view) images and illumination variations can increase the complexity and running time of the approach.

1.3. Hybrid methods

These types of methods try to realize human perception by integrating holistic and feature-based approaches to recognize a face. Some of the hybrid methods are the modular eigenface method [43], hybrid local feature analysis (LFA) [44], the shape-normalized method [45] and the component-based method [46]. The modular eigenface method [43] uses hybrid features by combining eigenfaces and other eigenmodules such as eigeneyes, eigenmouths, and eigennoses. This method is found to be slightly superior to the holistic eigenface method. The hybrid LFA method [44] uses a set of hybrid features using the PCA and LFA methods. The shape-normalized method uses both shape and gray-level information for automatic face recognition [45]. The component-based method [46] decomposes a face into a set of facial components, such as the mouth and eyes, that are interconnected by a flexible geometrical model. One drawback of this method is that it needs a large number of training images taken from different viewpoints and under different lighting conditions.

In this paper, we have extended the 2DFLD algorithm and present a novel generalized two-dimensional FLD (G-2DFLD) technique, which maximizes class separability from both the row and


column directions simultaneously. Like the 2DFLD method, the G-2DFLD method is also based on the original 2D image matrix. In the G-2DFLD method, two alternative Fisher's criteria have been defined corresponding to the row- and column-wise projection directions. Unlike the 2DFLD method, the principal components extracted from an image matrix by the G-2DFLD method are scalars. Therefore, the size of the resultant image feature matrix is much smaller using the G-2DFLD method than that using the 2DFLD method. A non-linear multi-class SVM has been designed to classify the face images. The experimental results on the AT&T and the UMIST databases show that the new G-2DFLD scheme outperforms the PCA, 2DPCA, FLD and 2DFLD schemes, not only in terms of computation time, but also for the task of face recognition.

The remaining part of the paper is organized as follows. Section 2 describes the procedure of extracting face features using the 2DFLD technique. Section 3 presents the key idea and algorithm of the proposed G-2DFLD method for feature extraction and fisherface calculation. The key idea of SVMs is described in Section 4. The experimental results on the AT&T and the UMIST face databases are presented in Section 5. Finally, Section 6 draws the concluding remarks.

2. Two-dimensional FLD (2DFLD) method for feature extraction

The 2DFLD [13] method is based on the 2D image matrix. It does not need to form a stretched large image vector from the 2D image matrix. The key idea is to project an image matrix X, an m × n random matrix, onto an optimal projection matrix A of dimension n × k (k is the number of projection vectors and k ≤ n) to get an image feature matrix Y of dimension m × k by the following linear transformation [13]:

Y = XA   (1)

Let there be N training images, each denoted by an m × n image matrix Xi (i = 1, 2, . . ., N). The training images contain C classes (subjects), and the cth class Cc has Nc samples (Σ_{c=1}^{C} Nc = N). Let the mean image of the training samples be denoted by μ and the mean image of the cth class by μc. The between-class and within-class scatter matrices Gb and Gw, respectively, are defined as follows:

Gb = Σ_{c=1}^{C} Nc (μc − μ)^T (μc − μ)   (2)

Gw = Σ_{c=1}^{C} Σ_{i∈c} (Xi − μc)^T (Xi − μc)   (3)

Then the two-dimensional Fisher's criterion J(Q) is defined as follows:

J(Q) = |Q^T Gb Q| / |Q^T Gw Q|   (4)

where Q is the projection matrix. It may be noted that the size of both Gb and Gw is n × n. If Gw is a nonsingular matrix, the ratio in (4) is maximized when the column vectors of the projection matrix Q are the eigenvectors of Gb Gw^−1. The optimal projection matrix Qopt is defined as follows:

Qopt = arg max_Q |Gb Gw^−1| = [q1, q2, . . ., qk]   (5)

where {qi | i = 1, 2, . . ., k} is the set of normalized eigenvectors of Gb Gw^−1 corresponding to the k largest eigenvalues {λi | i = 1, 2, . . ., k}.


Now, each face image Xi (i = 1, 2, . . ., N) is projected onto the optimal projection matrix Qopt to obtain its (m × k)-dimensional 2DFLD-based features Yi, which are defined as follows:

Yi = X̄i Qopt;  i = 1, 2, . . ., N   (6)

where X̄i is the mean-subtracted image of Xi, defined as follows:

X̄i = Xi − μ   (7)
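The construction in Eqs. (1)–(7) can be sketched in a few lines of NumPy. This is an illustrative sketch, not the authors' code: the function name and toy shapes are invented, and the eigenvectors are computed from Gw^−1 Gb (the form commonly used in FLD implementations, which shares its eigenvalues with Gb Gw^−1), under the assumption that Gw is nonsingular.

```python
import numpy as np

def two_dfld_features(images, labels, k):
    """Sketch of 2DFLD feature extraction (Eqs. (1)-(7)).

    images: array of shape (N, m, n); labels: length-N class labels.
    Returns the (N, m, k) feature matrices Y_i and the n x k projection Q_opt.
    """
    X = np.asarray(images, dtype=float)
    labels = np.asarray(labels)
    mu = X.mean(axis=0)                      # global mean image
    n = X.shape[2]
    Gb = np.zeros((n, n))                    # between-class scatter, Eq. (2)
    Gw = np.zeros((n, n))                    # within-class scatter, Eq. (3)
    for c in np.unique(labels):
        Xc = X[labels == c]
        mu_c = Xc.mean(axis=0)
        d = mu_c - mu
        Gb += len(Xc) * d.T @ d
        for Xi in Xc:
            e = Xi - mu_c
            Gw += e.T @ e
    # Eigenvectors for the k largest eigenvalues, via Gw^{-1} Gb q = lambda q
    # (Gw assumed nonsingular; eig of a non-symmetric matrix may return
    # complex parts, which are discarded here).
    evals, evecs = np.linalg.eig(np.linalg.solve(Gw, Gb))
    order = np.argsort(evals.real)[::-1][:k]
    Q_opt = evecs[:, order].real             # Eq. (5)
    Y = (X - mu) @ Q_opt                     # Eqs. (6)-(7), batched over i
    return Y, Q_opt
```

Note that the projection reduces only the column dimension (n to k); the row dimension m is untouched, which is exactly the asymmetry the G-2DFLD method of Section 3 removes.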

3. Generalized two-dimensional FLD (G-2DFLD) method for feature extraction

3.1. Key idea and the algorithm

Like the 2DFLD method, the generalized two-dimensional FLD (G-2DFLD) method is also based on the 2D image matrix. The only difference is that it maximizes class separability from both the row and column directions simultaneously by the following linear transformation:

Z = U^T X V   (8)

where U and V are two projection matrices of dimension m × p (p ≤ m) and n × q (q ≤ n), respectively. Therefore, our goal is to find the optimal projection directions U and V so that the projected vector in the (p × q)-dimensional space reaches its maximum class separability.

3.1.1. Alternative Fisher's criteria

We have defined two alternative Fisher's criteria J(U) and J(V) corresponding to the row- and column-wise projection directions as follows:

J(U) = |U^T Gbr U| / |U^T Gwr U|   (9)

and

J(V) = |V^T Gbc V| / |V^T Gwc V|   (10)

where

Gbr = Σ_{c=1}^{C} Nc (μc − μ)(μc − μ)^T   (11)

Gwr = Σ_{c=1}^{C} Σ_{i∈c} (Xi − μc)(Xi − μc)^T   (12)

Gbc = Σ_{c=1}^{C} Nc (μc − μ)^T (μc − μ)   (13)

Gwc = Σ_{c=1}^{C} Σ_{i∈c} (Xi − μc)^T (Xi − μc)   (14)

We call the matrices Gbr, Gwr, Gbc and Gwc the image row between-class scatter matrix, image row within-class scatter matrix, image column between-class scatter matrix and image column within-class scatter matrix, respectively. It may be noted that the size of the scatter matrices Gbr and Gwr is m × m, whereas for Gbc and Gwc the size is n × n. The sizes of these scatter matrices are much smaller than those of the conventional FLD algorithm, whose scatter matrices are mn × mn in size. For a square image, m = n and we have Gbr = Gbc^T and Gwr = Gwc^T, and vice versa.

The ratios in (9) and (10) are maximized when the column vectors of the projection matrices U and V are the eigenvectors of


Gbr Gwr^−1 and Gbc Gwc^−1, respectively. The optimal projection (eigenvector) matrices Uopt and Vopt are defined as follows:

Uopt = arg max_U |Gbr Gwr^−1| = [u1, u2, . . ., up]   (15)

Vopt = arg max_V |Gbc Gwc^−1| = [v1, v2, . . ., vq]   (16)

where {ui | i = 1, 2, . . ., p} is the set of normalized eigenvectors of Gbr Gwr^−1 corresponding to the p largest eigenvalues {λi | i = 1, 2, . . ., p} and {vj | j = 1, 2, . . ., q} is the set of normalized eigenvectors of Gbc Gwc^−1 corresponding to the q largest eigenvalues {αj | j = 1, 2, . . ., q}.

3.1.2. Feature extraction

The optimal projection matrices Uopt and Vopt are used for feature extraction. For a given image sample X, an image feature is obtained by the following linear projection:

zij = ui^T X vj,  i = 1, 2, . . ., p;  j = 1, 2, . . ., q   (17)

The zij (i = 1, 2, . . ., p; j = 1, 2, . . ., q) is called a principal component of the sample image X. It may be noted that each principal component of the 2DFLD method is a vector, whereas the principal component of the G-2DFLD method is a scalar. The principal components thus obtained are used to form a G-2DFLD-based image feature matrix Z of dimension p × q (p ≤ m, q ≤ n), which is much smaller than the 2DFLD-based image feature matrix Y of dimension m × k (k ≤ n). Therefore, in this case an image matrix is reduced considerably in both the row and column directions simultaneously.
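A small numerical check illustrates that collecting the scalars zij of Eq. (17) one by one agrees with computing the whole feature matrix as Z = U^T X V in Eq. (8). The matrices below are arbitrary placeholders, not trained projection matrices:

```python
import numpy as np

X = np.arange(12.0).reshape(3, 4)     # toy 3 x 4 "image" (m = 3, n = 4)
U = np.eye(3)[:, :2]                  # placeholder m x p matrix (p = 2)
V = np.eye(4)[:, :3]                  # placeholder n x q matrix (q = 3)

# Eq. (17): one principal component (a scalar) per (i, j) pair
Z_elem = np.array([[U[:, i] @ X @ V[:, j] for j in range(V.shape[1])]
                   for i in range(U.shape[1])])

# Eq. (8): the whole p x q feature matrix in a single product
Z_mat = U.T @ X @ V

assert np.allclose(Z_elem, Z_mat) and Z_mat.shape == (2, 3)
```

Either way, the m × n image is reduced to a p × q feature matrix, shrinking both dimensions at once.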

3.1.3. Calculating fisherfaces

Let an image Ai (i = 1, 2, . . ., N) be an m × n matrix of intensity values. The dimensions of the row and column scatter matrices Gbr Gwr^−1 and Gbc Gwc^−1 are m × m and n × n, respectively. Since the eigenvectors of these two scatter matrices together define a subspace of the face images, we can combine them linearly to form fisherfaces.

Let Uopt = [u1, u2, . . ., up] and Vopt = [v1, v2, . . ., vq] be the optimal orthonormal eigenvector matrices corresponding to the p and q largest eigenvalues of Gbr Gwr^−1 and Gbc Gwc^−1, respectively. The fisherfaces are generated by linear combination of the eigenvectors as follows:

Iij = ui vj^T,  i = 1, 2, . . ., p;  j = 1, 2, . . ., q   (18)

4. Support vector machines

After the feature extraction, we have designed non-linear multi-class support vector machines (SVMs) to classify and recognize the image samples. Support vector machines were originally designed for binary-class classification problems [47,48]. Several binary-class SVMs can be combined to form a multi-class SVM for multi-class classification problems, like the face recognition problem.

4.1. Key idea of binary-class support vector machines

The key idea of a binary-class SVM [47,48] is to separate the two classes by a function which is induced from the available samples. The SVM finds the hyperplane that separates the largest fraction of samples of the same class on the same side, while maximizing the distance from either class to the hyperplane. This hyperplane is called the optimal separating hyperplane (OSH), which minimizes the risk of misclassification on the training as well as the unknown test set. The basic algorithm of the binary-class SVM can be described as follows:


Given N labeled training samples,

D = {(xi, yi)}_{i=1}^{N},  xi ∈ Z ⊂ R^d,  yi ∈ M = {+1, −1}   (19)

where xi is the G-2DFLD-based image feature matrix of the ith training sample, d (d = p × q) is the dimension of the image feature vector, and yi is the class of the ith sample.

An SVM separates the training samples belonging to two separate classes by forming an optimal hyperplane (w · x) + b = 0, w ∈ R^d, b ∈ R, which maximizes the margin from x to the hyperplane. The constraint of the hyperplane can be written as:

yi((w · xi) + b) ≥ 1,  i = 1, 2, . . ., N   (20)

The discriminant function implemented by a support vector machine for an input sample x is defined as follows:

f(x) = Σ_{i=1}^{N} αi yi (xi · x) + b   (21)

The distance of a sample x from the hyperplane is 1/||w||. Therefore, the total distance between the two classes will be 2/||w||. Hence the optimal separating hyperplane (OSH) minimizes the following function:

Φ(w) = (1/2) ||w||²   (22)

The solution to the optimization problem of (22) subject to the constraint of (20) is given by the saddle point of the following Lagrange function:

L(w, b, α) = (1/2) ||w||² − Σ_{i=1}^{N} αi {yi((w · xi) + b) − 1}   (23)

L(w, b, α) = (1/2) ||w||² − Σ_{i=1}^{N} αi yi (w · xi) − b Σ_{i=1}^{N} αi yi + Σ_{i=1}^{N} αi   (24)

where αi is the Lagrange multiplier of the ith training sample. The Lagrange function has to be minimized with respect to w, b and maximized with respect to αi ≥ 0. The Lagrange function can be transformed into its dual problem, which is easier to solve, as follows:

max_α W(α) = max_α { min_{w,b} L(w, b, α) }   (25)

We can derive two optimality conditions from Eq. (24) as follows:

∂L(w, b, α)/∂w = w − Σ_{i=1}^{N} αi yi xi = 0   (26)

∂L(w, b, α)/∂b = −Σ_{i=1}^{N} αi yi = 0   (27)

Substituting Eqs. (26) and (27) into the right-hand side of the Lagrange function (24) reduces the function into the dual objective function with αi as the dual variable. The dual problem (25) is then defined as follows:

α* = arg max_α Σ_{i=1}^{N} αi − (1/2) Σ_{i=1}^{N} Σ_{j=1}^{N} αi αj yi yj (xi · xj)   (28)

with constraints,

Σ_{i=1}^{N} αi yi = 0   (29)

αi ≥ 0,  i = 1, 2, . . ., N   (30)


Solving Eq. (28) with constraints (29) and (30) determines the Lagrange multipliers αi, and the OSH is defined as follows:

w* = Σ_{i=1}^{N} αi* yi xi   (31)

b* = yi − (w* · xi) for some αi > 0   (32)

For a new sample x, the classification is defined as follows:

f(x) = sign((w* · x) + b*)   (33)

In the face recognition domain, due to variations in illumination, pose, etc., face images are highly non-linear. Therefore, each sample is non-linearly mapped into a high-dimensional feature space with a non-linear function Φ : R^d → R^D, D ≫ d. Then, a linear SVM is implemented in the feature space. To avoid the explicit mapping Φ and the computational overhead in the high-dimensional feature space, a positive definite kernel function K is chosen a priori to perform the inner product of vectors in the feature space as follows:

K(xi, x) = Φ(xi) · Φ(x)   (34)

where Φ(x) is the transformed vector of the sample x by the non-linear function Φ.

Two of the commonly used kernel functions are the polynomial and the Gaussian radial basis function kernels. These kernels are defined as follows:

Polynomial kernel: K(xi, x) = (xi · x)^r   (35)

Gaussian radial basis function: K(xi, x) = exp(−||xi − x||² / (2σ²))   (36)

where r is a positive integer and σ > 0.

The discriminant function implemented by a non-linear support vector machine for an input sample x is defined as follows:

f(x) = Σ_{i=1}^{N} αi yi K(xi, x) + b   (37)
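The kernels of Eqs. (35) and (36) and the discriminant of Eq. (37) translate directly into code. A minimal sketch; the function names, default parameters and the sample values in the usage note below are illustrative, not taken from the paper:

```python
import numpy as np

def polynomial_kernel(xi, x, r=2):
    """Eq. (35): K(xi, x) = (xi . x)^r, with r a positive integer."""
    return np.dot(xi, x) ** r

def gaussian_rbf_kernel(xi, x, sigma=1.0):
    """Eq. (36): K(xi, x) = exp(-||xi - x||^2 / (2 sigma^2)), sigma > 0."""
    d = np.asarray(xi, dtype=float) - np.asarray(x, dtype=float)
    return float(np.exp(-np.dot(d, d) / (2.0 * sigma ** 2)))

def svm_discriminant(x, support, alphas, ys, b, kernel):
    """Eq. (37): f(x) = sum_i alpha_i y_i K(x_i, x) + b."""
    return sum(a * y * kernel(xi, x)
               for a, y, xi in zip(alphas, ys, support)) + b
```

For example, polynomial_kernel([1, 2], [3, 1], r=2) evaluates to (1·3 + 2·1)² = 25, and gaussian_rbf_kernel(x, x) is always 1 regardless of σ.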

The dual objective function (28) in a non-linear SVM becomes as follows:

α* = arg max_α Σ_{i=1}^{N} αi − (1/2) Σ_{i=1}^{N} Σ_{j=1}^{N} αi αj yi yj K(xi, xj)   (38)

with constraints,

Σ_{i=1}^{N} αi yi = 0   (39)

0 ≤ αi ≤ C,  i = 1, 2, . . ., N   (40)

where C is a regularization parameter controlling the compromise between maximizing the margin and minimizing the number of training set errors.

The Karush–Kuhn–Tucker (KKT) conditions are necessary and sufficient conditions for an optimal point of a positive definite dual problem. The dual problem is solved when, for all i:

αi = 0 ⇒ yi f(xi) ≥ 1,   (41)

0 < αi < C ⇒ yi f(xi) = 1,   (42)

αi = C ⇒ yi f(xi) ≤ 1.   (43)
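The three KKT cases of Eqs. (41)–(43) amount to a simple per-sample check. A sketch under the assumption of a small numerical tolerance; the helper name and the sample values in the test are invented for illustration:

```python
import numpy as np

def kkt_satisfied(alphas, ys, f_vals, C, tol=1e-6):
    """Check the KKT cases of Eqs. (41)-(43) for every training sample.

    alphas: Lagrange multipliers, ys: labels (+1/-1),
    f_vals: discriminant values f(x_i), C: regularization parameter of Eq. (40).
    """
    margins = np.asarray(ys, dtype=float) * np.asarray(f_vals, dtype=float)
    for a, m in zip(alphas, margins):
        if a <= tol:                  # Eq. (41): alpha_i = 0
            ok = m >= 1.0 - tol
        elif a >= C - tol:            # Eq. (43): alpha_i = C
            ok = m <= 1.0 + tol
        else:                         # Eq. (42): 0 < alpha_i < C
            ok = abs(m - 1.0) <= tol
        if not ok:
            return False
    return True
```

Checks of this form are what SMO-style solvers use to decide which multipliers still violate optimality and need updating.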

In our work, the dual objective function (38) is solved by the sequential minimal optimization (SMO) algorithm [49].


4.2. Multi-class support vector machines

Support vector machines are originally designed for binary pat-tern classification. Multi-class pattern recognition problems arecommonly solved using a combination of binary SVMs and a deci-sion strategy to decide the class of the input pattern. Each SVM isindependently trained. Multi-class SVM can be implemented usingone-against-all [48] and one-against-one [50] strategy. In our work,we have implemented one-against-all strategy due to its less mem-ory requirement, as discussed below.

Let the training set (xi, ci) consists of N samples of M classes,where ci (ci ∈ 1, 2, . . ., M) represents the class label of the sample xi.An SVM is constructed for each class by discriminating that classagainst the remaining (M − 1) classes. The number of SVMs used inthis approach is M. A test pattern x is classified by using the winner-takes-all decision strategy, i.e., the class with the maximum valueof the discriminant function f(x) is assigned to it. All the N trainingsamples are used in constructing an SVM for a class. The SVM forclass k is constructed using the set of training samples and theirdesired outputs, (xi, yi). The desired output yi for a training samplexi is defined as follows:

y_i = { +1 if c_i = k; −1 if c_i ≠ k }   (44)

The samples with the desired output y_i = +1 are called positive samples and the samples with the desired output y_i = −1 are called negative samples.
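The one-against-all labelling of Eq. (44) and the winner-takes-all decision can be sketched as follows. This is a simplification: the per-class discriminant functions f_k would come from trained SVMs, but here they are hypothetical plain Python callables supplied for illustration.

```python
def ova_labels(c, k):
    # Eq. (44): +1 for samples of class k, -1 for samples of all other classes
    return [+1 if ci == k else -1 for ci in c]

def winner_takes_all(x, discriminants):
    # Assign x to the class whose discriminant f_k(x) is largest
    return max(discriminants, key=lambda k: discriminants[k](x))

c = [1, 1, 2, 3]               # class labels of 4 training samples
print(ova_labels(c, k=1))      # [1, 1, -1, -1]

# Hypothetical trained discriminants for a 3-class problem
discriminants = {
    1: lambda x: -0.2 * x,
    2: lambda x: 0.5 * x - 1.0,
    3: lambda x: 0.1 * x - 0.5,
}
print(winner_takes_all(2.0, discriminants))  # 2, since f2(2.0) = 0.0 is the maximum
```

For M classes this strategy trains M binary machines, each on all N samples, which is what makes its memory footprint smaller than the M(M − 1)/2 machines of one-against-one.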

5. Experimental results

The performance of the proposed method has been evaluated on the AT&T Laboratories Cambridge database (formerly the ORL database) [51] and the UMIST face database [52]. The AT&T database is used to test the performance of the proposed method under the condition of minor variations of rotation and scaling, whereas the UMIST database is used to examine the performance of the method when the angle of rotation of the facial images is quite large. The experiments were carried out using three different strategies: (i) randomly partitioning the database, (ii) n-fold cross validation test and (iii) leave-one-out strategy, to test the performance of the proposed method.

The recognition rate has been defined as the percentage ratio of the total number of correct recognitions by the method to the total number of images in the test set for a single experimental run. Therefore, the average recognition rate, R_avg, of the method is defined as follows:

R_avg = ( Σ_{i=1}^{l} n_i^cls / (l × n_tot) ) × 100   (45)

where l is the number of experimental runs, each one of which has been performed by randomly partitioning the database into two sets: a training set and a test set. The n_i^cls is the number of correctly recognized faces in the ith run and n_tot is the total number of test faces in each run.
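Eq. (45) is simply the mean per-run accuracy expressed as a percentage, which a short check makes concrete (the per-run counts below are made up for illustration):

```python
def avg_recognition_rate(n_cls, n_tot):
    # Eq. (45): R_avg = (sum_i n_cls_i) / (l * n_tot) * 100
    l = len(n_cls)
    return sum(n_cls) / (l * n_tot) * 100.0

# Hypothetical: 3 runs, 40 test faces per run, with 38, 39 and 37 correct recognitions
print(avg_recognition_rate([38, 39, 37], n_tot=40))  # 95.0
```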

Fig. 1. Sample images of a subject from the AT&T database.

Fig. 2. Average recognition rates (sensitivity (%)) of the G-2DFLD algorithm on the AT&T database for different values of s by varying the values of p and q.

The performance of the method has also been evaluated using rejection criteria. We believe that an ideal face recognition system should reject intruders (faces belonging to other classes) while recognizing the faces of its own class. Here, an SVM of a class should recognize all the faces of its own class and reject the faces belonging to the other classes (intruders). To calculate the success rate of the method, two parameters, namely, the sensitivity and specificity, are evaluated. Sensitivity is defined as the probability of correctly recognizing a face, whereas specificity refers to the probability of correctly rejecting an intruder. They can be computed as follows:

Sensitivity = TP / (TP + FN)   (46)

Specificity = TN / (TN + FP)   (47)

where TP is the total number of faces correctly recognized (true positives) and FN is the total number of faces falsely recognized as intruders (false negatives) in each run. TN is the total number of faces of the other classes truly rejected as intruders (true negatives) and FP is the total number of faces of other classes falsely recognized as its own (false positives) in each run. It may be noted that the percentage of the sensitivity is also referred to as the recognition rate.
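Sensitivity and specificity as defined in (46)–(47) follow directly from the per-run counts; the counts below are illustrative only, not results from the paper:

```python
def sensitivity(tp, fn):
    # Eq. (46): fraction of own-class faces correctly recognized
    return tp / (tp + fn)

def specificity(tn, fp):
    # Eq. (47): fraction of intruder faces correctly rejected
    return tn / (tn + fp)

# Hypothetical run: 195 true positives, 5 false negatives,
# 7790 true negatives, 10 false positives
print(round(100 * sensitivity(195, 5), 2))    # 97.5
print(round(100 * specificity(7790, 10), 2))  # 99.87
```

Note that in the one-against-all setting TN and FP are counted over the faces of all other classes, so specificity is naturally high even when sensitivity is modest.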

Fig. 3. Fourteen of the fisherfaces calculated from a training set of the AT&T database.


5.1. Experiments on the AT&T face database

The AT&T database contains 400 gray-scale images of 40 persons. Each person has 10 gray-scale images, having a resolution of 112 × 92 pixels. Images of the individuals have been taken by varying light intensity, facial expressions (open/closed eyes, smiling/not smiling) and facial details (glasses/no glasses) against a dark homogeneous background, with tilt and rotation up to 20° and scale variation up to 10%. Sample face images of a person are shown in Fig. 1.

5.1.1. Randomly partitioning the database

In this experimental strategy, we randomly select s images from each subject to form the training set and the remaining images are included in the test set. To ensure sufficient training and to test the effectiveness of the proposed technique for different sizes of the training sets, we choose the value of s as 3, 4, 5, 6 and 7. It may be noted that there is no overlap between the training and test images. To reduce the influence of the particular training and test sets on the performance, for each value of s, the experiment is repeated 20 times with different training and test sets. Since the numbers of projection vectors p and q have a considerable impact on the performance of the G-2DFLD algorithm, we perform several experiments by varying the values of p and q. Fig. 2 shows the recognition rates (sensitivity (%)) of the G-2DFLD algorithm using a multi-class support vector machine (SVM). For each value of s, average recognition rates are plotted by varying the values of p and q. For s = 3, 4, 5, 6 and 7 the best average recognition rates are found to be 92.82%, 95.94%, 97.68%, 98.72% and 98.42%, respectively, and the dimensions (p × q) of the corresponding image feature matrices are (16 × 16), (16 × 16), (14 × 14), (14 × 14) and (8 × 8), respectively. The average specificities (%) are found to be 99.82%, 99.90%, 99.94%, 99.97% and 99.96% for s = 3, 4, 5, 6 and 7, respectively.
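The per-subject random split used in this strategy can be sketched as follows. The two-subject database and its image indices are hypothetical; in the actual experiments each of the 40 AT&T subjects contributes 10 images.

```python
import random

def split_per_subject(images_per_subject, s, seed=0):
    # Randomly pick s training images per subject; the rest go to the test set,
    # so training and test sets never share an image.
    rng = random.Random(seed)
    train, test = [], []
    for subject, images in images_per_subject.items():
        chosen = set(rng.sample(range(len(images)), s))
        for i, img in enumerate(images):
            (train if i in chosen else test).append((subject, img))
    return train, test

# Two hypothetical subjects with 10 image ids each
db = {"s1": list(range(10)), "s2": list(range(10))}
train, test = split_per_subject(db, s=5)
print(len(train), len(test))        # 10 10
print(set(train) & set(test))       # set() -- no overlap
```

Repeating this with 20 different seeds and averaging via Eq. (45) reproduces the experimental protocol described above.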

We have constructed the fisherfaces using the eigenvectors for s = 5. Some samples of these fisherfaces I_i (i = 1, 2, . . ., 14) are shown in Fig. 3.

5.1.2. n-Fold cross validation test

In this experiment, we divide the AT&T database randomly into 10 folds, taking one image of a person into a fold. Therefore, each fold consists of 40 images, each one corresponding to a different person. For the 10-fold cross validation test, in each experimental run, 9 folds are used to train the multi-class SVM and the remaining 1 fold is used for testing. Therefore, the training and test sets consist of 360 and 40 images, respectively. The average recognition rates (sensitivity (%)) by varying the image feature matrix (i.e. p × q) are shown in Fig. 4. The best average recognition rate is found to be 99.75% using an image feature matrix of size (8 × 8). The average specificity (%) is found to be 99.99%.



4288 S. Chowdhury et al. / Applied Soft Computing 11 (2011) 4282–4292

Fig. 4. Average recognition rates (sensitivity (%)) of the G-2DFLD algorithm on the AT&T database for the 10-fold cross validation test by varying the values of p and q. The upper and lower extrema of the error bars represent the maximum and minimum values, respectively.

Table 1
Experimental results using the leave-one-out strategy on the AT&T database.

Feature matrix   # of features   Avg. recognition rate (sensitivity (%))   Avg. specificity (%)
8 × 8            64              99.00                                     99.97

Fig. 5. Some sample images of a subject from the UMIST database.

5.1.3. Leave-one-out method

To classify an image of a subject, the image is removed from the database of N images and placed into a test set. The remaining N − 1 images are used in the corresponding training set. In this way, experiments were performed N times, removing one image from the database at a time. For the AT&T database, we have performed 400 experimental runs for the database of 400 images. Table 1 shows the average recognition rate (sensitivity (%)) and specificity (%) using the 8 × 8 image feature matrix. We have achieved 99.00% and 99.97% average recognition rate and specificity (%), respectively.

5.1.4. Comparison with other methods

For a fair comparison, we have implemented the PCA, 2DPCA, PCA + FLD and 2DFLD algorithms and used the same multi-class SVM and parameters for classification. The comparisons in terms of best average recognition rates (sensitivity (%)) and specificity (%) of the PCA, 2DPCA, PCA + FLD and 2DFLD algorithms along with the proposed G-2DFLD algorithm using the two different experimental strategies on the AT&T database are shown in Tables 2 and 3, respectively. Table 2 also shows the comparison of performances between the proposed method and the neural network and SVM-based methods reported in [16–18,24,25,53]. It may be noted that the results reported in [16–18,24,25,53] are based on 10, 1, 10, 4, 1 and 4 experimental runs, respectively, whereas the proposed method is based on 20 experimental runs. We can see that in all the cases the performance of the G-2DFLD method is better than the PCA, 2DPCA, PCA + FLD and 2DFLD methods, and also the methods reported in [17,18,24,25,53].

Table 4 shows the average feature extraction, recognition and total times (in s) taken by the G-2DFLD, PCA, 2DPCA, PCA + FLD and 2DFLD methods with 200 training and 200 test images of the AT&T database using an IBM Intel Pentium 4 Hyper-Threading technology, 3.0 GHz, 2 GB DDR-II RAM computer running the Fedora 9 Linux operating system. It may again be noted that the proposed G-2DFLD method is more efficient than the PCA, 2DPCA, PCA + FLD and 2DFLD methods in terms of total computation time.

Fig. 6. Average recognition rates (sensitivity (%)) of the G-2DFLD algorithm on the UMIST database for different values of s by varying the values of p and q.

5.2. Experiments on the UMIST face database

The UMIST1 face database is a multi-view database, consisting of 575 gray-scale images of 20 people (subjects), each covering a wide range of poses from profile to frontal views. Each image has a resolution of 112 × 92 pixels. The subjects also cover a range of race, sex and appearance. Unlike the AT&T database, the number of images per person is not fixed; it varies from 19 to 48. Fig. 5 shows some of the sample images of a subject from the database.

5.2.1. Randomly partitioning the database

Like the AT&T database, we randomly select s images from each subject to form the training set and the remaining images are included in the test set. We choose the value of s as 4, 6, 8 and 10. It may again be noted that there is no overlap between the training and test images. For each value of s, the experiment is repeated 20 times with different training and test sets. Fig. 6 shows the recognition rates (sensitivity (%)) of the G-2DFLD algorithm using a multi-class SVM. For each value of s, average recognition rates are plotted by varying the values of p and q. For s = 4, 6, 8 and 10 the best average recognition rates are found to be 86.22%, 92.28%, 95.54% and 96.92%, respectively, and the dimensions (p × q) of the corresponding image feature matrices are (14 × 14), (14 × 14), (14 × 14) and (18 × 18), respectively. The average specificities (%) are found to be 99.28%, 99.59%, 99.77% and 99.84% for s = 4, 6, 8 and 10, respectively.

5.2.2. n-Fold cross validation test

Since the number of images per subject varies from 19 to 48, we have randomly divided the database into 19 folds, taking one image of a subject into a fold. Therefore, in each fold there are 20 images, each one corresponding to a different subject.

1 At present, the UMIST database contains 475 images. However, we have used the earlier version of the UMIST database to test with a larger number of images.


Table 2
Comparison of different methods in terms of average recognition rates (sensitivity (%)) on the AT&T database.

Randomly partition, s images/subject:
Method            s = 3   s = 4   s = 5   s = 6   s = 7
G-2DFLD           92.82   95.94   97.68   98.72   98.42
PCA               85.58   89.42   93.10   95.28   96.01
2DPCA             91.27   94.33   96.83   97.72   97.79
PCA + FLD         83.65   88.65   92.60   95.30   95.83
2DFLD             92.30   95.08   97.50   98.26   97.88
SA-RBF [16]       93.86   96.25   97.30   –       –
RBF [17]          93.50   –       96.90   –       –
DCT + RBF [18]    –       –       97.55   –       –
PCA + SVM [24]    –       –       97.00   –       –
SGFS + SVM [25]   –       –       95.00   –       –
NFL [53]          –       –       96.87   –       –

10-fold cross validation test:
G-2DFLD           99.75
PCA               97.00
2DPCA             99.25
PCA + FLD         98.25
2DFLD             99.00

Table 3
Comparison of different methods in terms of average specificity (%) on the AT&T database.

Randomly partition, s images/subject:
Method      s = 3   s = 4   s = 5   s = 6   s = 7
G-2DFLD     99.82   99.90   99.94   99.97   99.96
PCA         99.63   99.73   99.82   99.88   99.90
2DPCA       99.78   99.85   99.92   99.94   99.94
PCA + FLD   99.58   99.71   99.81   99.88   99.89
2DFLD       99.80   99.87   99.94   99.96   99.95

10-fold cross validation test:
G-2DFLD     99.99
PCA         99.92
2DPCA       99.98
PCA + FLD   99.96
2DFLD       99.97

Table 4
Comparison of different methods in terms of average feature extraction, recognition and total times (in s) using 200 training and 200 test images on the AT&T database.

Method      # of features     Avg. feature extraction time (s)   Avg. recognition time (s)   Avg. total time (s)
G-2DFLD     14 × 14 = 196     12.95                              53.42                       66.37
PCA         60                55.10                              13.75                       68.85
2DPCA       112 × 14 = 1568   32.55                              313.29                      345.84
PCA + FLD   25                55.75                              13.31                       69.06
2DFLD       112 × 14 = 1568   22.35                              313.03                      335.38

For the 19-fold cross validation test, in each experimental run, 18 folds are used to train the multi-class SVM and the remaining 1 fold is used for testing. Therefore, the training and test sets consist of 360 and 20 images, respectively, in a particular experimental run. The average recognition rates (sensitivity (%)) by varying the image feature matrix (i.e. p × q) are shown in Fig. 7. The best average recognition rate is found to be 98.95% using an image feature matrix of size (14 × 14). The average specificity (%) is found to be 99.95%.

5.2.3. Leave-one-out method

In this experiment, we have performed 575 experimental runs for the database of 575 images. Table 5 shows the average recognition rate (sensitivity (%)) and specificity (%) using the 14 × 14 image feature matrix. We have achieved 98.96% and 99.95% average recognition rate and specificity (%), respectively.

Table 5
Experimental results using the leave-one-out strategy on the UMIST database.

Feature matrix   # of features   Avg. recognition rate (sensitivity (%))   Avg. specificity (%)
14 × 14          196             98.96                                     99.95

Fig. 7. Average recognition rates (sensitivity (%)) of the G-2DFLD algorithm on the UMIST database for the 19-fold cross validation test by varying the values of p and q. The upper and lower extrema of the error bars represent the maximum and minimum values, respectively.


Table 6
Comparison of different methods in terms of average recognition rates (sensitivity (%)) on the UMIST database.

Randomly partition, s images/subject:
Method               s = 4   s = 6   s = 8   s = 10
G-2DFLD              86.22   92.28   95.54   96.92
PCA                  80.72   86.53   94.01   95.11
2DPCA                85.70   91.91   95.07   96.60
PCA + FLD            76.31   85.69   90.93   93.72
2DFLD                86.12   92.16   95.25   96.55
SA-RBF [16]          89.46   92.84   96.36   –
KPCA SVM GSFS [54]   –       92.30   –       –

19-fold cross validation test:
G-2DFLD              98.95
PCA                  98.68
2DPCA                98.95
PCA + FLD            96.36
2DFLD                98.68

Table 7
Comparison of different methods in terms of average specificity (%) on the UMIST database.

Randomly partition, s images/subject:
Method      s = 4   s = 6   s = 8   s = 10
G-2DFLD     99.28   99.59   99.77   99.84
PCA         98.99   99.29   99.68   99.74
2DPCA       99.25   99.57   99.74   99.82
PCA + FLD   98.75   99.25   99.52   99.67
2DFLD       99.27   99.59   99.75   99.83

19-fold cross validation test:
G-2DFLD     99.95
PCA         99.93
2DPCA       99.95
PCA + FLD   99.81
2DFLD       99.93


5.2.4. Comparison with other methods

For a fair comparison, as with the AT&T database, we have implemented the PCA, 2DPCA, PCA + FLD and 2DFLD algorithms and used the same multi-class SVM and parameters for classification. The comparisons in terms of the best average recognition rates (sensitivity (%)) and specificity (%) of the PCA, 2DPCA, PCA + FLD and 2DFLD algorithms along with the proposed G-2DFLD method using the two different experimental strategies are shown in Tables 6 and 7, respectively. Table 6 also shows the comparison of performances between the proposed method and the neural network and SVM-based methods reported in [16,54]. The results reported in [16] are based on 10 experimental runs, whereas in [54] the result is based on 1 experimental run using only 380 images out of the 575 images of the database. It may be recalled that the results of the proposed method are based on 20 experimental runs using all the 575 images. It may again be noted that in all the cases the performance of the G-2DFLD method is better than the PCA, 2DPCA, PCA + FLD and 2DFLD methods, except in the 19-fold cross validation test, where the performance of the 2DPCA matches that of the proposed G-2DFLD method. The G-2DFLD method is also comparable to the KPCA SVM GSFS method [54] in spite of using more experimental runs and images.

6. Conclusion

In this paper, we have presented a face recognition system by proposing a novel feature extraction method, namely, the generalized two-dimensional FLD (G-2DFLD) method, which is based on the original 2D image matrix. The G-2DFLD algorithm maximizes class separability from both the row and column directions simultaneously, resulting in a smaller image feature matrix. To realize this, we have defined two alternative Fisher's criteria. The principal components extracted from an image matrix by the G-2DFLD method are scalars. Since the size of the scatter matrices in the proposed G-2DFLD algorithm is much smaller than those in the conventional PCA and FLD schemes, the computational time for feature extraction is much less. Again, the image feature matrix generated by the G-2DFLD algorithm is much smaller than those generated by the 2DPCA and 2DFLD algorithms. As a result, the overall time (feature extraction time + recognition time) of the G-2DFLD algorithm is also much less than that of the 2DPCA and 2DFLD algorithms. Several experiments were carried out on the AT&T and UMIST databases, using three different experimental strategies, namely, (i) randomly partitioning the database, (ii) n-fold cross validation test, and (iii) leave-one-out method, to test the performance of the proposed method. A non-linear multi-class SVM has been designed to classify the face images. The experimental results show that the G-2DFLD method is more efficient than the PCA, 2DPCA, PCA + FLD, and 2DFLD methods, not only in terms of computation time, but also for the task of face recognition. The proposed method also outperforms some of the neural network and other SVM-based methods for face recognition reported in the literature.

Acknowledgements

This work was supported by the UGC major research project (F. No.: 37-218/2009(SR), dated: 12-01-2010), CMATER and SRUVM projects of the Department of Computer Science & Engineering, Jadavpur University, Kolkata, India. The author, Shiladitya Chowdhury, would like to thank Techno India, Kolkata for providing computing facilities and allowing time for conducting research work. The author, D. K. Basu, would also like to thank the AICTE, New Delhi for providing him the Emeritus Fellowship (F. No.: 1-51/RID/EF(13)/2007-08, dated 28-02-2008). Last but not least, the authors would also like to thank the anonymous reviewers for their constructive suggestions to improve the quality of the paper.


References

[1] A. Samal, P. Iyengar, Automatic recognition and analysis of human faces and facial expressions: a survey, Pattern Recogn. 25 (1992) 65–77.
[2] R. Chellappa, C.L. Wilson, S. Sirohey, Human and machine recognition of faces: a survey, Proc. IEEE 83 (1995) 705–740.
[3] W. Zhao, R. Chellappa, P.J. Phillips, A. Rosenfeld, Face recognition: a literature survey, ACM Comput. Surveys 35 (2003) 399–458.
[4] A.S. Tolba, A.H. El-Baz, A.A. El-Harby, Face recognition: a literature review, Int. J. Signal Process. 2 (2006) 88–103.
[5] L. Sirovich, M. Kirby, Low-dimensional procedure for the characterization of human faces, J. Opt. Soc. Am. 4 (1987) 519–524.
[6] M. Kirby, L. Sirovich, Application of the KL procedure for the characterization of human faces, IEEE Trans. Pattern Anal. Mach. Intell. 12 (1990) 103–108.
[7] P.N. Belhumeur, J.P. Hespanha, D.J. Kriegman, Eigenfaces versus fisherfaces: recognition using class specific linear projection, IEEE Trans. Pattern Anal. Mach. Intell. 19 (1997) 711–720.
[8] C. Liu, H. Wechsler, A shape- and texture-based enhanced Fisher classifier for face recognition, IEEE Trans. Image Process. 10 (2001) 598–608.
[9] W. Zhao, R. Chellappa, A. Krishnaswamy, Discriminant analysis of principal components for face recognition, in: Proceedings of the International Conference on Automatic Face and Gesture Recognition, 1998, pp. 336–341.
[10] K. Fukunaga, Introduction to Statistical Pattern Recognition, Academic Press, New York, 1990.
[11] M.J. Er, S. Wu, J. Lu, H.L. Toh, Face recognition with radial basis function (RBF) neural networks, IEEE Trans. Neural Netw. 13 (2002) 697–710.
[12] J. Yang, D. Zhang, A.F. Frangi, J.Y. Yang, Two-dimensional PCA: a new approach to appearance-based face representation and recognition, IEEE Trans. Pattern Anal. Mach. Intell. 26 (2004) 131–137.
[13] H. Xiong, M.N.S. Swamy, M.O. Ahmad, Two-dimensional FLD for face recognition, Pattern Recogn. 38 (2005) 1121–1124.
[14] B. Moghaddam, A. Pentland, Probabilistic visual learning for object representation, IEEE Trans. Pattern Anal. Mach. Intell. 19 (1997) 696–710.
[15] J.K. Sing, D.K. Basu, M. Nasipuri, M. Kundu, Face recognition using point symmetry distance-based RBF network, Appl. Soft Comput. 7 (2007) 58–70.
[16] J.K. Sing, S. Thakur, D.K. Basu, M. Nasipuri, M. Kundu, High-speed face recognition using self-adaptive radial basis function neural networks, Neural Comput. Appl. 18 (2009) 979–990.
[17] F. Yang, M. Paindavoine, Implementation of an RBF neural network on embedded systems: real-time face tracking and identity verification, IEEE Trans. Neural Netw. 14 (2003) 1162–1175.
[18] M.J. Er, W. Chen, S. Wu, High-speed face recognition based on discrete cosine transform and RBF neural networks, IEEE Trans. Neural Netw. 16 (2005) 679–691.
[19] J. Haddadnia, K. Faez, M. Ahmadi, A fuzzy hybrid learning algorithm for radial basis function neural network with application in human face recognition, Pattern Recogn. 36 (2003) 1187–1202.
[20] P.J. Phillips, Support vector machines applied to face recognition, Adv. Neural Inform. Process. Syst. 11 (1998) 803–809.
[21] C. Zhaohui, H. Guiming, Face recognition using multi-class BSVM with component features, in: Proceedings of the 2005 IEEE International Conference on Neural Networks and Brain, 2005, pp. 1449–1452.
[22] C.-H. Lee, S.-W. Park, W. Chang, J.-W. Park, Improving the performance of multi-class SVMs in face recognition with nearest neighbor rule, in: Proceedings of the 15th IEEE International Conference on Tools with Artificial Intelligence, 2003.
[23] J. Ko, H. Byun, Combining SVM classifiers for multiclass problem: its application to face recognition, in: Proceedings of the 4th International Conference on Audio- and Video-Based Biometric Person Authentication, 2003, pp. 531–539.
[24] G.D. Guo, S.Z. Li, K.L. Chen, Support vector machine for face recognition, J. Image Vis. Comput. 19 (2001) 631–638.
[25] L. Wang, Y. Sun, A new approach for face recognition based on SGFS and SVM, in: Proceedings of IEEE, 2007, pp. 527–530.
[26] S. Thakur, J.K. Sing, D.K. Basu, M. Nasipuri, Face recognition using Fisher linear discriminant analysis and support vector machine, in: Proceedings of the 2nd International Conference on Contemporary Computing, 2009, pp. 318–326.
[27] M.-H. Yang, N. Ahuja, D. Kriegman, Face recognition using kernel eigenfaces, in: Proceedings of the IEEE International Conference on Image Processing, 2000, pp. 37–40.
[28] K.I. Kim, K. Jung, H.J. Kim, Face recognition using kernel principal component analysis, IEEE Signal Process. Lett. 9 (2002) 40–42.
[29] V.D.M. Nhat, S.Y. Lee, Kernel-based 2DPCA for face recognition, in: Proceedings of the IEEE International Symposium on Signal Processing and Information Technology, 2007, pp. 35–39.
[30] S. Mika, G. Ratsch, J. Weston, Fisher discriminant analysis with kernels, in: Proceedings of the Neural Networks for Signal Processing Workshop, 1999, pp. 41–48.
[31] G. Baudat, F. Anouar, Generalized discriminant analysis using a kernel approach, Neural Comput. 12 (2000) 2385–2404.
[32] Q. Liu, X. Tang, H. Lu, S. Ma, Face recognition using kernel scatter-difference-based discriminant analysis, IEEE Trans. Neural Netw. 17 (2006) 1081–1085.
[33] R. Zhi, Q. Ruan, Two-dimensional direct and weighted linear discriminant analysis for face recognition, Neurocomputing 71 (2008) 3607–3611.
[34] J. Wang, W. Yang, Y. Lin, J. Yang, Two-directional maximum scatter difference discriminant analysis for face recognition, Neurocomputing 72 (2008) 352–358.
[35] X.-N. Song, Y.-J. Zheng, X.-J. Wu, X.-B. Yang, J.-Y. Yang, A complete fuzzy discriminant analysis approach for face recognition, Appl. Soft Comput. 10 (2010) 208–214.
[36] X. Jiang, B. Mandal, A. Kot, Eigenfeature regularization and extraction in face recognition, IEEE Trans. Pattern Anal. Mach. Intell. 30 (2008) 383–394.
[37] M.D. Kelly, Visual Identification of People by Computer, Tech. Rep. AI-130, Stanford AI Project, Stanford, CA, 1970.
[38] T. Kanade, Computer Recognition of Human Faces, Birkhauser, Basel, Switzerland, and Stuttgart, Germany, 1973.
[39] I.J. Cox, J. Ghosn, P.N. Yianilos, Feature-based face recognition using mixture distance, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1996, pp. 209–216.
[40] F. Samaria, S. Young, HMM based architecture for face identification, Image Vis. Comput. 12 (1994) 537–583.
[41] A.V. Nefian, M.H. Hayes III, Hidden Markov models for face recognition, in: Proceedings of the International Conference on Acoustics, Speech and Signal Processing, 1998, pp. 2721–2724.
[42] L. Wiskott, J.-M. Fellous, C. von der Malsburg, Face recognition by elastic bunch graph matching, IEEE Trans. Pattern Anal. Mach. Intell. 19 (1997) 775–779.
[43] A. Pentland, B. Moghaddam, T. Starner, View-based and modular eigenspaces for face recognition, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1994, pp. 84–91.
[44] P. Penev, J. Atick, Local feature analysis: a general statistical theory for object representation, Netw. Comput. Neural Syst. 7 (1996) 477–500.
[45] A. Lanitis, C.J. Taylor, T.F. Cootes, Automatic face identification system using flexible appearance models, Image Vis. Comput. 13 (1995) 393–401.
[46] J. Huang, B. Heisele, V. Blanz, Component-based face recognition with 3D morphable models, in: Proceedings of the International Conference on Audio- and Video-Based Person Authentication, 2003, pp. 27–34.
[47] C. Cortes, V. Vapnik, Support-vector networks, Mach. Learn. 20 (1995) 273–297.
[48] V.N. Vapnik, Statistical Learning Theory, John Wiley & Sons, New York, 1998.
[49] J. Platt, Fast training of SVMs using sequential minimal optimization, in: Advances in Kernel Methods: Support Vector Learning, MIT Press, Cambridge, 1999, pp. 185–208.
[50] S. Knerr, L. Personnaz, G. Dreyfus, Single-layer learning revisited: a stepwise procedure for building and training a neural network, in: Neurocomputing, Springer, 1990.
[51] The Database of Faces, AT&T Laboratories, Cambridge, U.K. [Online]. Available: http://www.cl.cam.ac.uk/research/dtg/attarchive/facedatabase.html.
[52] D.B. Graham, N.M. Allinson, Characterizing virtual eigensignatures for general purpose face recognition: from theory to applications, in: H. Wechsler, P.J. Phillips, V. Bruce, F. Fogelman-Soulie, T.S. Huang (Eds.), NATO ASI Series F, Computer and Systems Sciences, vol. 163, 1998, pp. 446–456.
[53] S.Z. Li, J. Lu, Face recognition using the nearest feature line method, IEEE Trans. Neural Netw. 10 (1999) 439–443.
[54] W. Li, W. Gang, Y. Liang, W. Chen, Feature selection based on KPCA, SVM and GSFS for face recognition, in: Proceedings of the International Conference on Advances in Pattern Recognition, 2005, pp. 344–350.

Shiladitya Chowdhury received his Bachelor of Technology degree in Computer Science and Engineering from West Bengal University of Technology, Kolkata, India, in 2005 and the Master of Technology degree in Computer Technology from Jadavpur University, Kolkata, India, in 2009. He has been working as a Lecturer at the Department of Master of Computer Application in Techno India, Kolkata, India since January 2007. He is currently pursuing his Doctorate Degree in Engineering at Jadavpur University. His research interests include face recognition, pattern recognition and image processing.

Jamuna Kanta Sing received his B.E. (Computer Science & Engineering) degree from Jadavpur University in 1992, M.Tech. (Computer & Information Technology) degree from Indian Institute of Technology (IIT), Kharagpur in 1993 and Ph.D. (Engineering) degree from Jadavpur University in 2006. Dr. Sing has been a faculty member of the Department of Computer Science & Engineering, Jadavpur University since March 1997. He did his Post Doctoral research work as a BOYSCAST Fellow of the Department of Science & Technology, Govt. of India, at the University of Pennsylvania and the University of Iowa during 2006. He is a member of the IEEE, USA. His research interests include face recognition/detection, medical image processing, and pattern recognition.


Dipak Kumar Basu received his B.E.Tel.E., M.E.Tel.E., and Ph.D. (Engg.) degrees from Jadavpur University, in 1964, 1966 and 1969, respectively. Prof. Basu was a faculty member of the Department of Computer Science & Engineering, Jadavpur University from 1968 to January 2008. He is presently an A.I.C.T.E. Emeritus Fellow at the Department of Computer Science & Engineering, Jadavpur University. His current fields of research interest include pattern recognition, image processing, and multimedia systems. He is a senior member of the IEEE, USA, Fellow of IE (India) and WBAST, Kolkata, India and a former Fellow of the Alexander von Humboldt Foundation, Germany.


Mita Nasipuri received her B.E.E., M.E.Tel.E. and Ph.D. (Engg.) degrees from Jadavpur University, in 1979, 1981 and 1990, respectively. Prof. Nasipuri has been a faculty member of the Department of Computer Science & Engineering, Jadavpur University since 1987. Her current research interests include image processing, pattern recognition, and multimedia systems. She is a senior member of the IEEE, USA, Fellow of IE (India) and WBAST, Kolkata, India.