
150 IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 18, NO. 1, JANUARY 2007

A Method of Face Recognition Based on Fuzzy c-Means Clustering and Associated Sub-NNs

Jianming Lu, Xue Yuan, and Takashi Yahagi, Senior Member, IEEE

Abstract—The face is a complex multidimensional visual model and developing a computational model for face recognition is difficult. In this paper, we present a method for face recognition based on parallel neural networks. Neural networks (NNs) have been widely used in various fields. However, the computing efficiency decreases rapidly if the scale of the NN increases. In this paper, a new method of face recognition based on fuzzy clustering and parallel NNs is proposed. The face patterns are divided into several small-scale neural networks based on fuzzy clustering and they are combined to obtain the recognition result. In particular, the proposed method achieved a 98.75% recognition accuracy for 240 patterns of 20 registrants and a 99.58% rejection rate for 240 patterns of 20 nonregistrants. Experimental results show that the performance of our new face-recognition method is better than those of the backpropagation NN (BPNN) system, the hard c-means (HCM) and parallel NNs system, and the pattern-matching system.

Index Terms—Face recognition, fuzzy clustering, parallel neural networks (NNs).

I. INTRODUCTION

FACE recognition plays an important role in many applications such as building/store access control, suspect identification, and surveillance [1], [2], [4]–[7], [16]–[23]. Over the past 30 years, many different face-recognition techniques have been proposed, motivated by the increased number of real-world applications requiring the recognition of human faces. There are several problems that make automatic face recognition a very difficult task. The face image of a person input to a face-recognition system is usually acquired under different conditions from those of the face image of the same person in the database. Therefore, it is important that the automatic face-recognition system be able to cope with numerous variations of images of the same face. The image variations are mostly due to changes in the following parameters: pose, illumination, expression, age, disguise, facial hair, glasses, and background [18]–[23].

In many pattern-recognition systems, the statistical approach is frequently used [18]–[23]. Although this paradigm has been successfully applied to various problems in pattern classification, it is difficult to express structural information unless an appropriate choice of features is possible. Furthermore, this approach requires much heuristic information to design a classifier. Neural-network (NN)-based paradigms, as new means of implementing various classifiers based on statistical and structural approaches, have been proven to possess many advantages

Manuscript received January 28, 2005; revised September 22, 2005.
The authors are with the Graduate School of Science and Technology, Chiba University, Chiba 263-8522, Japan (e-mail: [email protected]).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TNN.2006.884678

for classification because of their learning ability and good generalization [5]–[9]. Generally speaking, multilayered networks (MLNs), usually coupled with the backpropagation (BP) algorithm, are most widely used for face recognition [24]. The BP algorithm is a gradient-based method; hence, some inherent problems (or difficulties) are frequently encountered in the use of this algorithm, e.g., very slow convergence speed in training and difficulty in escaping from a local minimum. Therefore, some techniques have been introduced to resolve these drawbacks; however, to date, all of them are still far from satisfactory. A structurally adaptive intelligent neural tree (SAINT) was proposed by Lin et al. [7]. The basic idea is to hierarchically partition the input pattern space using a tree-structured NN composed of subnetworks with topology-preserving mapping ability. The self-growing NN CombNet-II was proposed by Nugroho et al. [28]. The stem network divides the input space by a vector quantizing network into several subspaces. Each output neuron of the stem network is associated with a branch network, which is a feedforward three-layered network that performs a refined classification of the input vector in a specific subspace. The radial basis function NN (RBFNN) is widely used for function approximation and pattern-recognition systems [14], [16], [17]. In this paper, we propose a new method of face recognition based on fuzzy clustering and parallel NNs.

As one drawback of the BP algorithm, when the scale of the NN increases, the computing efficiency decreases rapidly for various reasons, such as the appearance of local minima. Therefore, we propose a method in which the individuals in the training set are divided into several small-scale parallel NNs, and they are combined to obtain the recognition result. The HCM is the most well-known conventional (hard) clustering method [12]. The HCM algorithm executes a sharp classification, in which each object is either assigned to a cluster or not. Because the HCM restricts each point of a data set to exactly one cluster and the individuals belonging to each cluster are not overlapped, some similar individuals cannot be assigned to the same cluster and, hence, they are not learned or recognized in the same NN. In this paper, fuzzy c-means (FCM) is used [13]–[15]. In contrast to HCM, the application of fuzzy sets in a classification function causes the class membership to become a relative one, and an object can belong to several clusters at the same time but to different degrees. FCM introduces the idea of uncertainty of belonging, described by a membership function, and it enables an individual to belong to several networks. Then, all similar patterns can be thoroughly learned and recognized in one NN.

Details of this system are described in the remainder of this paper. Section II covers preprocessing of the system. In Section III, we present a method for face recognition based on

1045-9227/$20.00 © 2006 IEEE


LU et al.: FACE RECOGNITION BASED ON FCM CLUSTERING AND SUB-NNS 151

Fig. 1. Original face image.

fuzzy clustering and parallel NNs. In Section IV, experimental results of evaluating the developed techniques are presented. Discussion is presented in Section V. Finally, conclusions are summarized in Section VI.

II. PREPROCESSING

A. Facial-Image Acquisition

In our research, original images were obtained using a charge-coupled device (CCD) camera with image dimensions of 384 × 243 pixels encoded using 256 gray-scale levels.

In image acquisition, the subject sits 2.5 m away from a CCD camera. On each side of the camera, two 200-W lamps are placed at 30° angles to the camera horizontally. The original images are shown in Fig. 1.

B. Lighting Compensation

We adjusted the locations of the lamps to change the lighting conditions. The total energy of an image is the sum of the squares of the intensity values. The average energy of all the face images in the database is calculated. Then, each face image is normalized to have energy equal to the average energy:

Energy = \sum_{x} \sum_{y} [Intensity(x, y)]^2    (1)
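The normalization above can be sketched in a few lines; the function name and the choice of a global multiplicative scale factor are our own assumptions about how the paper applies (1).

```python
import numpy as np

def normalize_energy(image, target_energy):
    """Scale an image so that its total energy, i.e. the sum of
    squared intensity values as in (1), equals target_energy."""
    energy = np.sum(image.astype(float) ** 2)
    return image.astype(float) * np.sqrt(target_energy / energy)
```

In the paper's pipeline, `target_energy` would be the average energy computed over all face images in the database.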

C. Facial-Region Extraction

We adopt the face-detection method presented in [25]. The method of detecting and extracting the facial features in a gray-scale image is divided into two stages. First, the possible human eye regions are detected by testing all the valley regions in an image. A pair of eye candidates is selected by means of the genetic algorithm to form a possible face candidate. In our method, a square block is used to represent the detected face region. Fig. 2 shows an example of a selected face region based on the location of an eye pair. The relationships between the eye pair and the face size are defined by the head-model geometry shown in Fig. 2.

Then, the symmetrical measure of the face is calculated. The nose centerline (the perpendicular bisector of the line linking the two eyes) in each facial image is calculated. The difference between the left half and the right half from the nose centerline of a face region should be small due to its symmetry. If the value of

Fig. 2. Geometry of our head model.

Fig. 3. Windows for facial feature extraction.

the symmetrical measure is less than a threshold value, the face candidate will be selected for further verification.

After measuring the symmetry of a face candidate, the existences of the different facial features are also verified. The positions of the facial features are verified by analyzing the projection of the face candidate region. The facial feature regions will exhibit a low value on the projection. A face region is divided into three parts, each of which contains the respective facial features. The projection is the average of gray-level intensities along each row of pixels in a window. In order to reduce the effect of the background in a face region, only the white windows, as shown in Fig. 3, are considered in computing the projections. The top window should contain the eyebrows and the eyes, the middle window should contain the nose, and the bottom window should contain the mouth. When a face candidate satisfies the aforementioned constraints, it will be extracted as a face region. The extracted face image is shown in Fig. 4.
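The row-wise projection described above can be sketched as follows; the function name and the use of slice objects to describe a window are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def window_projection(gray, rows, cols):
    """Average gray-level intensity along each row of a window.
    `rows` and `cols` are slice objects selecting the window.
    Facial features (eyes, nose, mouth) show up as low values
    in the resulting profile."""
    window = gray[rows, cols].astype(float)
    return window.mean(axis=1)
```

A feature position would then be verified by locating a sufficiently deep minimum in the profile returned for the corresponding window.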

D. Principal Component Analysis (PCA)

Let a pattern x be a two-dimensional (2-D) array of intensity values. A pattern may also be considered as a vector of dimension equal to the number of pixels. Denote the database of M patterns by {x_1, x_2, ..., x_M}. Define the covariance matrix as follows [4]:

C = (1/M) \sum_{i=1}^{M} (x_i - \mu)(x_i - \mu)^T    (2)


Fig. 4. Extracted face image.

where \mu = (1/M) \sum_{i=1}^{M} x_i is the mean pattern. Then, the eigenvalues and eigenvectors of the covariance matrix C are calculated. Let E = [e_1, e_2, ..., e_K] be the eigenvectors corresponding to the K largest eigenvalues. Thus, for a set of patterns x_i, their corresponding eigenface-based features y_i can be obtained by projecting x_i into the eigenface space as follows:

y_i = E^T (x_i - \mu)    (3)

For the PCA method, results are shown for the case of using 32 principal components (K = 32). In other words, faces from a high-dimensional image space are projected to a 32-dimensional feature vector.
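The eigenface projection of (2) and (3) can be sketched with NumPy as follows; the function name is an assumption, and for realistic image sizes one would diagonalize the smaller M × M Gram matrix instead of the full pixel-space covariance.

```python
import numpy as np

def eigenface_features(patterns, k=32):
    """Project flattened face patterns onto the k leading
    eigenvectors of their covariance matrix (eigenfaces).
    `patterns` is an (M, N) array of M flattened images."""
    mu = patterns.mean(axis=0)
    centered = patterns - mu                       # x_i - mu
    cov = centered.T @ centered / len(patterns)    # (2)
    # eigh returns eigenvalues in ascending order,
    # so the last k columns are the leading eigenvectors
    _, vecs = np.linalg.eigh(cov)
    E = vecs[:, -k:]
    return centered @ E, E, mu                     # (3)
```

A test pattern would be projected with the same `E` and `mu` before being fed to the sub-NNs.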

III. FUZZY CLUSTERING AND NEURAL NETWORKS

The “clusters” are functions that assign to each object a number between zero and one, which is called the membership of the object in the cluster. Objects which are similar to each other are identified by having high membership degrees in the same cluster. It is also assumed that the membership degrees are chosen so that their sum for each object is one; therefore, fuzzy clustering is also a partition of the set of objects. The most widely used fuzzy clustering algorithm is the FCM algorithm [13]–[15].

A. FCM

FCM is a data clustering algorithm in which each data point is associated with a cluster through a membership degree. This technique divides a collection of n data points into c fuzzy groups and finds a cluster center in each group such that a cost function of a dissimilarity measure is minimized. The algorithm employs fuzzy partitioning such that a given data point can belong to several groups with a degree specified by membership grades between 0 and 1. A fuzzy c-partition of the input feature vectors X = {x_1, x_2, ..., x_n} is represented by a c × n matrix U = [u_{ik}], where X is an n-element set of 32-dimensional feature vectors. The entries satisfy the following constraints:

u_{ik} \in [0, 1],  1 ≤ i ≤ c,  1 ≤ k ≤ n    (4)

\sum_{i=1}^{c} u_{ik} = 1,  1 ≤ k ≤ n    (5)

0 < \sum_{k=1}^{n} u_{ik} < n,  1 ≤ i ≤ c    (6)

x_k represents the feature coordinate of the kth data point, and u_{ik} is the membership degree of x_k to cluster i. A proper partition of X may be defined by the minimization of the following cost function:

J(U, V) = \sum_{i=1}^{c} \sum_{k=1}^{n} (u_{ik})^m d_{ik}^2    (7)

where m \in (1, \infty) is a weighting exponent, called a fuzzifier, that is chosen according to the case. When m → 1, the process converges to a generalized classical c-means. When m → ∞, all clusters tend towards the center of gravity of the whole data set. That is, the partition becomes fuzzier with increasing m.

V = {v_1, v_2, ..., v_c} is the vector of the cluster centers, and d_{ik} is the distance between x_k and the ith cluster. Bezdek [13] proved that if m ≥ 1 and x_k ≠ v_i for all i and k, then U and V minimize J(U, V) only if their entries are computed as

u_{ik} = 1 / \sum_{j=1}^{c} (d_{ik} / d_{jk})^{2/(m-1)}    (8)

v_i = \sum_{k=1}^{n} (u_{ik})^m x_k / \sum_{k=1}^{n} (u_{ik})^m    (9)

One of the major factors that influence the determination of appropriate clusters of points is the dissimilarity measure chosen for the problem. Indeed, the computation of the membership degree u_{ik} depends on the definition of the distance measure d_{ik}, which is an inner-product norm (quadratic norm). The squared quadratic norm (distance) between a pattern vector x_k and the center v_i of the ith cluster is defined as

d_{ik}^2 = ||x_k - v_i||_A^2 = (x_k - v_i)^T A (x_k - v_i)    (10)

where A is any positive-definite matrix. The identity matrix is the simplest and most popular choice for A.

B. Distributing Algorithm of the Facial Images by FCM

The FCM algorithm consists of a series of iterations using (8) and (9). This algorithm converges to a local minimum point of J(U, V). We use the FCM as follows to determine the cluster centers v_i and the membership matrix U.

Step 1) Initially, the membership matrix is constructed using random values between 0 and 1 such that constraints (4), (5), and (6) are satisfied.

Step 2) The membership function is computed as follows.
a) For each cluster i, the fuzzy cluster center v_i is computed using (9).
b) All cluster centers which are too close to each other are eliminated: for each pair of clusters, the distance between their centers is computed, and when it is less than the average inter-center distance, the two clusters are merged into one.
c) For each cluster i, the distance d_{ik} is computed using (10).
d) The cost function is computed using (7). Stop if its improvement over the previous iteration is below a threshold.
e) A new membership matrix U is computed using (8), and Step 2) is repeated.

Step 3) The number of membership functions is decreased based on defuzzification.
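The core iteration of the steps above (without the center-merging refinement of Step 2b) can be sketched as follows, assuming the identity-matrix norm A = I of (10); the function name, initialization, and stopping details are our own.

```python
import numpy as np

def fcm(X, c, m=2.0, tol=1e-5, max_iter=300, seed=0):
    """Sketch of fuzzy c-means per (7)-(9): alternate the center
    update (9) and the membership update (8) until the cost (7)
    stops improving. X is (n, d); returns (U, V) with U of shape
    (c, n) and columns summing to one, per constraint (5)."""
    rng = np.random.default_rng(seed)
    U = rng.random((c, len(X)))
    U /= U.sum(axis=0)                                  # (5)
    prev_cost = np.inf
    for _ in range(max_iter):
        Um = U ** m
        V = Um @ X / Um.sum(axis=1, keepdims=True)      # (9)
        d2 = ((X[None, :, :] - V[:, None, :]) ** 2).sum(-1)
        d2 = np.maximum(d2, 1e-12)                      # avoid /0
        cost = (Um * d2).sum()                          # (7)
        if prev_cost - cost < tol:
            break
        prev_cost = cost
        w = 1.0 / d2 ** (1.0 / (m - 1.0))
        U = w / w.sum(axis=0)                           # (8)
    return U, V
```

With m = 2 and two well-separated groups of points, the memberships become nearly crisp and the centers land near the group means.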


Fig. 5. Structure of proposed parallel NNs.

C. Parallel NNs

In this paper, the parallel NNs are composed of three-layer BPNNs. A connected NN with 32 input neurons and six output neurons has been simulated (six individuals are permitted to belong to each subnet, as presented in Section IV). The structure of the proposed parallel NNs is illustrated in Fig. 5.

The number of hidden units was selected by sixfold cross validation from 6 to 300 units [29]. The algorithm added three nodes to the growing network at a time. The number of hidden units is selected based on the maximum recognition rate.

1) Learning Algorithm: A standard pattern (average pattern) is obtained from the 12 training patterns of each registrant. Based on the FCM algorithm, the 20 standard patterns are divided into several clusters. Similar patterns in one cluster are entered into one subnet.

Then, 12 patterns of a registrant are entered into the input layer of the NN to which the registrant belongs. On each subnet, the weights are adapted according to the negative gradient of the squared Euclidean distance between the desired and obtained outputs.

2) Recognition Algorithm: When a test pattern is input into the parallel NNs, as illustrated in Fig. 5, based on the outputs in each subnet and the similarity values, the final result can be obtained as follows.

Step 1) Exclusion by the negation ability of the NN. First, all the registrants are regarded as candidates. Then, only the candidate with the maximum output remains in each subnet. If the maximum output values are less than the threshold value, the corresponding candidates are deleted. The threshold value is set to 0.5, which is determined based on the maximum output value of the patterns of the nonregistrants. Since similar individuals are distributed into one subnet, based on this step, the candidates similar to the desired individual are excluded.

Step 2) Exclusion by the negation ability of parallel NNs. Among the candidates remaining after Step 1), any candidate that has been excluded in one subnet will be deleted from the other subnets. If all the candidates are excluded in this step, the test pattern is judged as a nonregistrant. When a candidate similar to the desired individual is assigned to several clusters at the same time, it may become the maximum output of the subnets to which the desired individual does not belong and may be selected as the final answer by mistake. By performing Step 2), this possibility is avoided.

Step 3) Judgment by the similarity method. If some candidates remain after Step 2), then the similarity measure is used for judgment. The similarity value between the patterns of each remaining candidate and the test pattern is calculated. The candidate having the greatest similarity value is regarded as the final answer. If this value is less than the threshold value of similarity, the test pattern is regarded as a nonregistrant.
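The three-step decision rule can be sketched as follows; the function name, the dictionary layout (subnet id → candidate outputs, candidate id → similarity), and the return convention are assumptions for illustration, not the paper's implementation.

```python
def classify(subnet_outputs, similarity, out_thresh=0.5, sim_thresh=0.97):
    """Sketch of the three-step recognition rule of Sec. III-C.
    Returns the accepted candidate id, or None for a nonregistrant."""
    # Step 1: in each subnet, keep only the maximum output,
    # and only if it clears the 0.5 output threshold
    winners, losers = set(), set()
    for outputs in subnet_outputs.values():
        best = max(outputs, key=outputs.get)
        for cand, value in outputs.items():
            if cand == best and value >= out_thresh:
                winners.add(cand)
            else:
                losers.add(cand)
    # Step 2: a candidate excluded in any subnet is excluded everywhere
    remaining = winners - losers
    if not remaining:
        return None                       # judged a nonregistrant
    # Step 3: highest similarity wins, subject to the similarity threshold
    best = max(remaining, key=lambda c: similarity[c])
    return best if similarity[best] >= sim_thresh else None
```

For example, a candidate that wins every subnet it appears in and has similarity above 0.97 is accepted; a pattern whose outputs all fall below 0.5 is rejected at Step 1.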

We illustrate the overall recognition rates for different threshold values in Fig. 11. Lowering the threshold value raises the recognition rate but lowers the rejection rate, causing nonregistrants to be judged as registrants. In contrast, raising the threshold lowers the recognition rate but raises the rejection rate, causing registrants to be judged as nonregistrants. From Fig. 11, it can be seen that in our experiment, the best performance is achieved when the threshold is set to 0.97. The similarity is calculated as

similarity(x, j) = \max_{1 ≤ k ≤ K} (x^T p_{jk}) / (||x|| \, ||p_{jk}||),  1 ≤ j ≤ N    (11)

where N is the number of individuals, K is the number of trained patterns for each individual, x is the test pattern, and p_{jk} is the kth trained pattern of individual j.

The system architecture of our experiment is illustrated in Fig. 6.

IV. EXPERIMENTS

Experiments have been carried out using patterns of 40 individuals at Chiba University, Chiba, Japan (20 individuals were


Fig. 6. System architecture of our experiment.

selected as registrants and 20 individuals as nonregistrants). Each individual provided 24 frontal patterns which show different facial expressions: blank expressions and smiles. In our system, for each individual, 12 patterns were selected as the training set, and 12 patterns were used as the test patterns for recognition. Patterns of 40 different individuals were obtained over a two-month period. The 40 individuals consisted of 26 males and 14 females. The age of the subjects ranged from 12 to 40 years old.

TABLE I
SUBNETS DETERMINED USING FCM

Fig. 7. Recognition rate as a function of max-cluster-number.

A. Computation by FCM

Based on the FCM algorithm presented in Section III, if the difference between two cluster centers is less than the average difference value, the two clusters will be incorporated and one cluster will be deleted. The n individuals divided into c clusters are described by a c × n matrix U, in which each entry is a membership between 0 and 1 and the sum of the entries in each column is one. The final number of clusters is ten. In other words, a total of ten clusters were formed around individuals 1–5, 7, 10, 11, 15, and 19.

B. Defuzzification

1) Defuzzification of Columns: The maximum number of clusters to which an individual may belong (max-to-cluster) was set to 5. In order to reduce the number of elements, the threshold of defuzzification was set based on the number of clusters c. If an element's value is less than this threshold, the value of this element is set to 0. If there are more than five elements in one column, only the top five elements remain in the column.


Fig. 8. Learning time as a function of max-cluster-number.

Fig. 9. Recognition rate as a function of max-to-cluster.

2) Defuzzification of Lines: The lines (rows) determine how many elements one cluster may contain. The maximum number of elements in a cluster (max-cluster-member) is set to six. The top six nonzero elements are saved and the other elements are excluded. In order to guarantee that an element belongs to at least one cluster, an element that is not among the top six is saved when the value of this element in every other cluster is 0 in the same column. The obtained subnets are presented in Table I.
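The column and row defuzzification above can be sketched as follows; the function name is an assumption, and the small-value thresholding step is omitted for brevity, so only the top-k capping and the at-least-one-cluster guarantee are shown.

```python
import numpy as np

def defuzzify(U, max_to_cluster=5, max_cluster_member=6):
    """Sketch of the defuzzification of Sec. IV-B: cap the number
    of clusters per individual (columns of U) and the number of
    members per cluster (rows of U), while guaranteeing that every
    individual keeps at least one nonzero membership."""
    U = U.copy()
    # Columns: keep each individual's top max_to_cluster memberships
    for k in range(U.shape[1]):
        order = np.argsort(U[:, k])[::-1]
        U[order[max_to_cluster:], k] = 0.0
    # Rows: drop memberships beyond each cluster's top
    # max_cluster_member, unless that would leave the
    # individual with no cluster at all
    for i in range(U.shape[0]):
        order = np.argsort(U[i])[::-1]
        for k in order[max_cluster_member:]:
            if np.count_nonzero(U[:, k]) > 1:
                U[i, k] = 0.0
    return U
```

After this step every column has between one and five nonzero entries, matching the max-to-cluster setting.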

In order to determine the values of max-to-cluster and max-cluster-member, we performed various experiments. The experimental results are illustrated in Figs. 7–10. The values were determined when the system achieved the highest recognition rate and took the shortest learning time.

Fig. 10. Learning time as a function of max-to-cluster.

TABLE II
SUBNETS AFTER PARTIAL UNIFICATION

3) Merging of Clusters: In order to reduce the amount of calculation, clusters are integrated (merged). However, the maximum number of elements per cluster is six. We integrate the clusters automatically. The algorithm for this step is presented as follows.

for i = 1, ..., c

    if net-count(i) ≥ max-cluster-member, continue

    for j = i + 1, ..., c

        if net-count(i) + net-count(j) > max-cluster-member, continue

        merge cluster j into cluster i

        net-count(i) ← net-count(i) + net-count(j)

Here, net-count(i) denotes the number of elements in the ith cluster, and c denotes the number of clusters after integration.
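The merging step can be made concrete as below; the function name and the use of sets (which automatically keep only one copy of an element shared by two merged clusters) are our own choices, not the paper's code.

```python
def merge_clusters(clusters, max_cluster_member=6):
    """Sketch of the cluster-merging step of Sec. IV-B3: greedily
    merge clusters while keeping at most max_cluster_member
    elements each; duplicate elements collapse on merge."""
    clusters = [set(c) for c in clusters]
    i = 0
    while i < len(clusters):
        if len(clusters[i]) >= max_cluster_member:
            i += 1
            continue
        j = i + 1
        while j < len(clusters):
            merged = clusters[i] | clusters[j]
            if len(merged) <= max_cluster_member:
                clusters[i] = merged       # absorb cluster j
                del clusters[j]
            else:
                j += 1
        i += 1
    return clusters
```

For example, two small clusters sharing an element merge into one cluster containing each element once, while a full six-element cluster is left untouched.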


TABLE III
OUTPUT RESULTS OF SUBNET 1 AFTER LEARNING

Furthermore, when the same elements are integrated into one cluster, only one element remains. Here, integrating the clusters can reduce the amount of calculation, as shown in Table II, which enhances the efficiency.

C. Learning and Recognition by Parallel NNs

In the learning procedure, let us consider patterns 1, 8, 9, 12, 14, and 16 in subnet 1 of Table II. Table III gives some of the actual output after learning. On the basis of Table III, learning by the parallel NNs is judged to be correct.

For 20 registrants, 240 patterns are used for recognition. An additional 240 patterns are prepared for 20 nonregistrants to determine whether the parallel NNs system can judge that the individuals are not registered. The recognition procedure is based on the algorithm presented in Section III.

D. Experimental Results

1) Registrant: Here, the results of recognition of a pattern (pattern 1116, registrant 11, image 16) are shown as an example. Table IV gives the outputs from each subnet for pattern 1116. The element of the greatest output is extracted from each

TABLE IV
OUTPUTS OF EACH SUBNET FOR PATTERN 1116

subnet as the first answer to each subnet. For subnet 6, no answer was obtained because all element values were lower than the threshold of 0.5. Table V lists the results. For subnets 1–7, patterns 6, 11, 12, and 18 were selected. Pattern 6 was excluded from subnets 3, 5, and 6. Pattern 18 was excluded from subnets 5 and 6 as well as from the answers of all subnets. Patterns 11 and 12 remained after recognition based on the negation ability of the parallel NNs system. The results are presented in Table VI. Here, recognition by the similarity measure was applied. The


TABLE V
RESULTS OF EACH SUBNET FOR PATTERN 1116

TABLE VI
RECOGNITION RESULTS OF PARALLEL NNS BASED ON REJECTION RULES FOR PATTERN 1116

TABLE VII
EXAMPLE OF SIMILARITY WITH PATTERN 1116

TABLE VIII
OUTPUTS OF EACH SUBNET FOR PATTERN N0101

TABLE IX
OUTPUTS OF EACH SUBNET FOR PATTERN N0311

input pattern had a similarity of 0.994677 with registrant 11, which was better than that of 0.974347 with registrant 12. The similarities are presented in Table VII. Therefore, the face pattern was judged to be that of the eleventh registrant.

2) Nonregistrants:
a) Exclusion based on the negation ability of the NN: The results of recognition of a pattern (pattern N0101, nonregistrant 1, image 1) are used as an example. Table VIII gives the outputs

TABLE X
RESULTS OF EACH SUBNET FOR PATTERN N0311

TABLE XI
OUTPUT OF EACH SUBNET FOR PATTERN N0307

from each subnet for pattern N0101. An element of the greatest output is extracted from each subnet as the first answer to each subnet. For subnets 1–7, no answer was obtained because all element values were lower than the threshold of 0.5. Therefore, pattern N0101 was identified as a nonregistrant.

b) Exclusion based on the negation ability of the parallel NNs: The results of recognition of a pattern (pattern N0311, nonregistrant 3, image 11) are used as an example. Table IX lists the outputs from each subnet for pattern N0311. An element of the greatest output is extracted from each subnet as the first answer to each subnet. Table X gives the results of each subnet. For subnet 7, pattern 8 was selected. However, it was excluded from subnet 1. No element remained after recognition based on the negation ability of the parallel NNs system. Therefore, the face pattern was judged to be a nonregistrant.

c) Exclusion based on similarity: Here, the results of recognition of a pattern (pattern N0307, nonregistrant 3, pattern 7) are used as an example. Table XI gives the outputs from each subnet for pattern N0307. An element of the greatest output is extracted from each subnet as the first answer to each subnet. Table XII gives the results. For subnets 1–7, patterns 6 and 18 were selected. Pattern 6 was excluded from subnets 3, 5, and 6. Pattern 18 was left after recognition based on the negation ability of the parallel NNs system. However, the similarity with pattern 18 was 0.9602, which was less than the threshold of 0.97. Therefore, the pattern was judged to be a nonregistrant.

We illustrate the overall recognition rates for different threshold values in Fig. 11, where this method was applied to frontal patterns. The horizontal axis indicates the threshold value used, and the vertical axis represents the recognition rate. When the threshold is set to 0.97, the recognition rate is 98.75% (two errors and one rejection among 240 patterns) for registrants and 99.58% (one error among 240 patterns) for nonregistrants. The false rejection rate (FRR) is 0.42% and the false acceptance rate (FAR) is 0.42%. Since the FRR and FAR



TABLE XII
RESULTS OF EACH SUBNET FOR PATTERN N0307

Fig. 11. Recognition results.

are equal, the crossover error rate (CER) is 0.42%. Recognition rate and rejection rate are defined as follows.

1) Recognition rate: For patterns of registrants in the database, the rate of correct recognition at the testing stage.

2) Recognition rate = 100 × (number of correctly recognized patterns)/(total number of patterns).

3) Rejection rate: For patterns of nonregistrants, the correct rejection rate; for patterns of registrants, the false rejection rate at the testing stage.

4) Rejection rate = 100 × (number of rejected patterns)/(total number of patterns).
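The rate definitions above translate directly into code. The following sketch is illustrative only; the sample values reproduce the figures reported at threshold 0.97 (two errors and one rejection among 240 registrant patterns, and one error among 240 nonregistrant patterns).

```python
def recognition_rate(n_correct, n_total):
    """Recognition rate = 100 * (correctly recognized patterns) / (total patterns)."""
    return 100.0 * n_correct / n_total

def rejection_rate(n_rejected, n_total):
    """Rejection rate = 100 * (rejected patterns) / (total patterns)."""
    return 100.0 * n_rejected / n_total

# Registrants: 240 - 2 errors - 1 rejection = 237 correctly recognized.
registrant_rate = recognition_rate(237, 240)    # 98.75
# Nonregistrants: 240 - 1 error = 239 correctly rejected.
nonregistrant_rate = rejection_rate(239, 240)   # 99.58 (rounded)
```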

V. DISCUSSION

In this paper, an efficient approach for face recognition was presented. In order to assess this system, we tested three existing approaches for face recognition on the same database and compared their performances with that of our method. The BPNN system, the HCM and parallel NNs system, and the pattern-matching system were used as the three approaches. The three experiments were carried out using facial patterns of 40 individuals (20 individuals were selected as registrants and 20 individuals as nonregistrants), the same group as that used in our research. The processes and experimental results of the three experiments are as follows.

A BPNN with 32 input neurons and 20 output neurons (the number of classes) was simulated in this experiment. The training algorithm was the same as the algorithm used in each subnet

TABLE XIII
ERROR RATES OF DIFFERENT APPROACHES

of our proposed system. The optimum number of hidden units was selected by the cross-validation procedure presented in Section III. The recognition rate was 93.75%. Furthermore, the BPNN required approximately twice the learning time of the proposed system.

The experiment using parallel NNs and HCM [5] was performed on the same database as ours. First, the 20 registrants were divided into four clusters, with a maximum of six individuals per cluster. The training algorithm was performed in each subnet. When a test pattern was input into the parallel NNs, the maximum output of each subnet was extracted. Then, the similarity between each extracted candidate and the test pattern was calculated, and the candidate with the greatest similarity was judged to be the final answer. Since the individuals belonging to each subnet do not overlap, some similar patterns cannot be assigned to the same subnet; as a result, the recognition rate for registrants (95.83%) was lower than that of our system.

Lam and Yan [2] proposed an analytic-to-holistic approach based on point matching, in which the feature points and the eye, nose, and mouth windows are compared with those in the database by correlation. The pattern-matching (PM) method is based on distances in a multidimensional feature space and is widely used in various fields. However, the PM method is vulnerable to image fluctuation. Comparisons with Lam and Yan's method on the same database are shown in Table XIII.

This technique uses the similarity measure to select the final answer from all candidates extracted by the parallel NNs system. Since the parallel NNs system has already excluded candidates similar to the desired individual, through the processes of extracting the candidate with the maximum output in each subnet at Step 1) (Section III) and excluding all candidates that have been excluded in any subnet at Step 2), a simple method is sufficient for the subsequent judgment. This method need only compare mutually dissimilar patterns for the final judgment. The similarity measure is a good means of judging an answer from a small number of dissimilar patterns by a simple process. Furthermore, using the similarity measure takes much less learning and recognition time than using an NN at the final step.

VI. CONCLUSION

In this paper, we proposed a fuzzy clustering and parallel NNs method for face recognition. The patterns are divided into several small-scale subnets based on FCM. Due to the negation ability of the NN and the parallel NNs, some candidates are excluded. The similarities between the remaining candidates and the test patterns are calculated. We judged the candidate with the greatest similarity to be the final answer when the similarity



value was above the threshold value. Otherwise, the pattern was judged to be nonregistered. The proposed method achieved a 98.75% recognition accuracy for 240 patterns of 20 registrants and a 99.58% rejection rate for 240 patterns of 20 nonregistrants.

APPENDIX

COMPARISONS WITH OTHER APPROACHES

To compare our proposed recognition system against popular face-recognition methods, we evaluated it on the Olivetti Research Laboratory (ORL) database, Cambridge University, Cambridge, U.K.1 All 400 patterns from the ORL database are used to evaluate the face-recognition performance of our proposed method. The ORL face database is composed of 400 patterns, ten for each of 40 distinct individuals. The patterns vary in pose, size, time, and facial expression. "All the images were taken against a dark homogeneous background with the subjects in an upright, frontal position, with tolerance for some tilting and rotation of up to about 20°. There is some variation in scale of up to about 10%" [27]. The spatial and gray-scale resolutions of the patterns are 92 × 112 and 256, respectively.

The training set and test set are derived in the same way as in [6], [16], [24], and [25]. A total of 200 patterns were randomly selected as the training set and another 200 patterns as the testing set, with five patterns per individual in each. Next, the training and testing patterns were exchanged and the experiment was repeated. Such procedures were carried out several times. In the following experiments, the error rate reported for each method was the average of the error rates over several runs (three runs [6], four runs [26], six runs [16], and five runs [27]).
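The split protocol above can be sketched concisely. This is a hypothetical illustration of the procedure, not the authors' code; the function name `orl_split` and the use of `random.Random` are assumptions.

```python
import random

def orl_split(n_subjects=40, n_per_subject=10, n_train=5, seed=0):
    """For each subject, pick half of the ten images at random for training
    and use the other half for testing; swapping the two halves afterwards
    gives a second run of the experiment."""
    rng = random.Random(seed)
    train, test = [], []
    for subject in range(n_subjects):
        images = list(range(n_per_subject))
        rng.shuffle(images)
        train += [(subject, i) for i in images[:n_train]]
        test += [(subject, i) for i in images[n_train:]]
    return train, test

train, test = orl_split()  # 200 training and 200 testing patterns
```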

The face-recognition procedure consists of 1) a feature extraction step, in which the feature representation of each training or test pattern is extracted by PCA + FDA (Fisher discriminant analysis) [16], and 2) a classification step, in which each feature representation is input into the proposed fuzzy clustering and parallel NN system. A 1% error rate was obtained when 25 features were used. This is better than the result (error rate of 1.92%) reported by Er et al. [16], where the feature extraction step was the same as ours and an RBFNN was used in the classification step. It should be noted that PCA + FDA is used in Step 1) because the facial patterns in the ORL database vary in pose and facial expression. As mentioned by Er et al. [16], PCA retains unwanted variations caused by lighting, facial expression, and other factors; the fisherface paradigm aims at overcoming this drawback of the eigenface paradigm by integrating FDA criteria. On the other hand, Lu et al. [27] mention that fisherfaces may lose significant discriminant information due to the intermediate PCA step. Therefore, PCA alone is used to extract features in our aforementioned experiment, since the variation of the patterns in our database is slight. Comparisons with the convolutional NN (CNN) [6], RBFNN [16], nearest feature line (NFL) [26], and direct fractional LDA (DF-LDA) [27] methods performed on the same ORL database are shown in Table XIV. It

1The ORL database is available from http://www.cam-orl.co.uk/facedatabase.html

TABLE XIV
ERROR RATES OF RECENTLY PERFORMED EXPERIMENTS ON THE ORL DATABASE

can be seen that the overall performance of our proposed method is superior to those of the other known methods on the ORL database.
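The two-step procedure above (feature extraction followed by classification) can be outlined in code. The sketch below is a hypothetical illustration, not the authors' implementation: it implements only the intermediate PCA projection via SVD and substitutes a 1-nearest-neighbour rule for the fuzzy-clustering and parallel-NN classifier; the FDA stage is omitted for brevity, and all function names are assumptions.

```python
import numpy as np

def pca_features(X, n_components):
    """Minimal PCA via SVD: center the data and project it onto the top
    principal axes. Returns features, the mean, and the components."""
    mean = X.mean(axis=0)
    Xc = X - mean
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]
    return Xc @ components.T, mean, components

def nearest_neighbor_label(train_feats, train_labels, test_feat):
    """1-nearest-neighbour stand-in for the classification step."""
    dists = np.linalg.norm(train_feats - test_feat, axis=1)
    return train_labels[int(np.argmin(dists))]
```

A test pattern is projected with `(x - mean) @ components.T` before being passed to the classifier.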

REFERENCES

[1] A. Z. Kouzani, F. He, and K. Sammut, "Towards invariant face recognition," Inf. Sci., vol. 123, pp. 75–101, 2000.

[2] K. Lam and H. Yan, "An analytic-to-holistic approach for face recognition based on a single frontal view," IEEE Trans. Pattern Anal. Mach. Intell., vol. 20, no. 7, pp. 673–686, Jul. 1998.

[3] P. J. Phillips, "Matching pursuit filters applied to face identification," IEEE Trans. Image Process., vol. 7, no. 8, pp. 1150–1164, Aug. 1998.

[4] M. A. Turk and A. P. Pentland, "Eigenfaces for recognition," J. Cognitive Neurosci., vol. 3, pp. 71–86, 1991.

[5] T. Yahagi and H. Takano, "Face recognition using neural networks with multiple combinations of categories," J. Inst. Electron. Inf. Commun. Eng., vol. J77-D-II, no. 11, pp. 2151–2159, 1994 (in Japanese).

[6] S. Lawrence, C. L. Giles, A. C. Tsoi, and A. D. Back, "Face recognition: A convolutional neural-network approach," IEEE Trans. Neural Netw., vol. 8, no. 1, pp. 98–113, Jan. 1997.

[7] S. H. Lin, S. Y. Kung, and L. J. Lin, "Face recognition/detection by probabilistic decision-based neural network," IEEE Trans. Neural Netw., vol. 8, no. 1, pp. 114–132, Jan. 1997.

[8] H. H. Song and S. W. Lee, "A self-organizing neural tree for large-set pattern classification," IEEE Trans. Neural Netw., vol. 9, no. 3, pp. 369–380, May 1998.

[9] C. M. Bishop, Neural Networks for Pattern Recognition. London, U.K.: Oxford Univ. Press, 1995.

[10] J. L. Yuan and T. L. Fine, "Neural-network design for small training sets of high dimension," IEEE Trans. Neural Netw., vol. 9, no. 1, pp. 266–280, Jan. 1998.

[11] X. Xie, R. Sudhakar, and H. Zhuang, "Corner detection by a cost minimization approach," Pattern Recognit., vol. 26, no. 12, pp. 1235–1243, 1993.

[12] S. K. Oh and W. Pedrycz, "Multi-FNN identification based on HCM clustering and evolutionary fuzzy granulation," Simulation Modelling Practice and Theory, vol. 11, no. 7–8, pp. 627–642, 2003.

[13] J. Bezdek, "A convergence theorem for the fuzzy ISODATA clustering algorithms," IEEE Trans. Pattern Anal. Mach. Intell., vol. PAMI-2, no. 1, pp. 1–8, Jan. 1981.

[14] W. Pedrycz, "Conditional fuzzy clustering in the design of radial basis function neural networks," IEEE Trans. Neural Netw., vol. 9, no. 4, pp. 601–612, Jul. 1998.

[15] X. Wu and M. J. Er, "Dynamic fuzzy neural networks: A novel approach to function approximation," IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 30, no. 2, pp. 358–364, Apr. 2000.

[16] M. J. Er, S. Wu, J. Lu, and H. L. Toh, "Face recognition with radial basis function (RBF) neural networks," IEEE Trans. Neural Netw., vol. 13, no. 3, pp. 697–710, May 2002.

[17] F. Yang and M. Paindavoine, "Implementation of an RBF neural network on embedded systems: Real-time face tracking and identity verification," IEEE Trans. Neural Netw., vol. 14, no. 5, pp. 1162–1175, Sep. 2003.

[18] J. Lu, K. N. Plataniotis, and A. N. Venetsanopoulos, "Face recognition using kernel direct discriminant analysis algorithms," IEEE Trans. Neural Netw., vol. 14, no. 1, pp. 117–126, Jan. 2003.

[19] B. K. Gunturk, A. U. Batur, and Y. Altunbasak, "Eigenface-domain super-resolution for face recognition," IEEE Trans. Image Process., vol. 12, no. 5, pp. 597–606, May 2003.

[20] B. L. Zhang, H. Zhang, and S. S. Ge, "Face recognition by applying wavelet subband representation and kernel associative memory," IEEE Trans. Neural Netw., vol. 15, no. 1, pp. 166–177, Jan. 2004.


[21] Q. Liu, X. Tang, H. Lu, and S. Ma, "Face recognition using kernel scatter-difference-based discriminant analysis," IEEE Trans. Neural Netw., vol. 17, no. 4, pp. 1081–1085, Jul. 2006.

[22] W. Zheng, X. Zhou, C. Zou, and L. Zhao, "Facial expression recognition using kernel canonical correlation analysis (KCCA)," IEEE Trans. Neural Netw., vol. 17, no. 1, pp. 233–238, Jan. 2006.

[23] X. Tan, S. Chen, Z. H. Zhou, and F. Zhang, "Recognizing partially occluded, expression variant faces from single training image per person with SOM and soft k-NN ensemble," IEEE Trans. Neural Netw., vol. 16, no. 4, pp. 875–886, Jul. 2005.

[24] D. Valentin, H. Abdi, A. J. O'Toole, and G. W. Cottrell, "Connectionist models of face processing: A survey," Pattern Recognit., vol. 27, no. 9, pp. 1209–1230, 1994.

[25] K. W. Wong, K. M. Lam, and W. C. Siu, "An efficient algorithm for face detection and facial feature extraction under different conditions," Pattern Recognit., vol. 34, no. 10, pp. 1993–2004, 2001.

[26] S. Z. Li and J. Lu, "Face recognition using the nearest feature line method," IEEE Trans. Neural Netw., vol. 10, no. 2, pp. 439–443, Mar. 1999.

[27] J. Lu, K. N. Plataniotis, and A. N. Venetsanopoulos, "Face recognition using LDA-based algorithms," IEEE Trans. Neural Netw., vol. 14, no. 1, pp. 195–200, Jan. 2003.

[28] A. S. Nugroho, S. Kuroyanagi, and A. Iwata, "Efficient subspace learning using a large scale neural network CombNET-II," in Proc. 9th Int. Conf. Neural Inf. Process., Nov. 2002, vol. 1, pp. 447–451.

[29] R. Setiono, "Feedforward neural network construction using cross validation," Neural Comput., vol. 13, pp. 2865–2877, 2001.

Jianming Lu received the M.S. and Ph.D. degrees from the Graduate School of Science and Technology, Chiba University, Chiba, Japan, in 1990 and 1993, respectively.

In 1993, he joined Chiba University as an Associate in the Department of Information and Computer Sciences. Since 1994, he has been with the Graduate School of Science and Technology, Chiba University, where, in 1998, he became an Associate Professor. His current research interests are in the theory and applications of digital signal processing and control theory.

Dr. Lu is a member of the Institute of Electronics, Information and Communication Engineers (IEICE-Japan), the Society of Instrument and Control Engineers (SICE-Japan), the Institute of Electrical Engineers of Japan (IEEJ), the Japan Society of Mechanical Engineers (JSME), and the Research Institute of Signal Processing, Japan.

Xue Yuan received the B.S. degree from the School of Information Science and Engineering, Northeastern University, Shenyang, China, in 1999 and the M.S. degree from the Graduate School of Science and Technology, Chiba University, Chiba, Japan, in 2004, where she is currently working towards the Ph.D. degree.

Her current research interests include image analysis and pattern recognition.

Takashi Yahagi (M'78–SM'05) received the B.S., M.S., and Ph.D. degrees in electronics from the Tokyo Institute of Technology, Tokyo, Japan, in 1966, 1968, and 1971, respectively.

In 1971, he joined Chiba University, Chiba, Japan, as a Lecturer in the Department of Electronics. From 1974 to 1984, he was an Associate Professor, and in 1984 he became a Professor in the Department of Electrical Engineering. From 1989 to 1998, he was with the Department of Information and Computer Sciences. Since 1998, he has been with the Department of Information Science of the Graduate School of Science and Technology, Chiba University. He is the author of Theory of Digital Signal Processing, volumes 1–3 (1985, 1985, and 1986), Digital Signal Processing and Basic Theory (1996), and Digital Filters and Signal Processing (2001), and the coauthor of Digital Signal Processing of Speech and Images (1996), VLSI and Digital Signal Processing (1997), Multimedia and Digital Signal Processing (1997), Neural Network and Fuzzy Signal Processing (1998), Communications and Digital Signal Processing (1999), Fast Algorithms and Parallel Signal Processing (2000), and Digital Filters and Signal Processing (Corona: Tokyo, Japan, 2001). He is the Editor of the Library of Digital Signal Processing (Corona: Tokyo, Japan). His current research interests are in the theory and applications of digital signal processing and other related areas.

Dr. Yahagi is a Fellow of the Institute of Electronics, Information and Communication Engineers (IEICE-Japan) and of the Research Institute of Signal Processing, Japan. He has been the President of the Research Institute of Signal Processing, Japan, since 1997, and the Editor-in-Chief of the Journal of Signal Processing.