
Pattern Recognition 38 (2005) 1009–1020
www.elsevier.com/locate/patcog

Robust visual similarity retrieval in single model face databases

Yongsheng Gao∗, Yutao Qi
School of Microelectronic Engineering, Nathan Campus, Griffith University, QLD 4111, Australia

Received 20 May 2004; received in revised form 20 December 2004; accepted 20 December 2004

Abstract

In this paper, we introduce a novel visual similarity measuring technique to retrieve face images in photo album databases for law enforcement. Though much work is being done on face similarity matching techniques, little attention has been given to the design of face matching schemes suitable for visual retrieval in single model databases, where accuracy, robustness to scale and environmental changes, and computational efficiency are three important issues to be considered. This paper presents a robust face retrieval approach using structural and spatial point correspondence, in which directional corner points (DCPs) are generated for efficient face coding and retrieval. A complete investigation of the proposed method is conducted, covering face retrieval under controlled/ideal conditions, scale variations, environmental changes and subject actions. The system performance is compared with that of the eigenface method. It is an attractive finding that the proposed DCP retrieval technique performed better than the eigenface method in most of the comparison experiments. This research demonstrates that the proposed DCP approach provides a new way, both robust to scale and environmental changes and efficient in computation, of retrieving human faces in single model databases.
© 2005 Pattern Recognition Society. Published by Elsevier Ltd. All rights reserved.

Keywords: Content-based image retrieval; Face database; Single model database; Face matching; Similarity measuring; Directional corner point

1. Introduction

In recent years, image retrieval by visual content from databases has become a major research area due to the ever-increasing rate at which images are generated in many information systems [1]. A wide range of image data has established its relevance in the areas of medicine, industry, art and law enforcement. Most practices in querying and retrieving images are based on queries by filenames, captions and keywords, which does not satisfy the requirements of modern image retrieval. Content-based image retrieval, which retrieves images by the visual contents of an image, has attracted much attention in recent research. Visual retrieval

∗ Corresponding author. Tel.: +61 7 3875 3652; fax: +61 7 3875 5198.

E-mail address: [email protected] (Y. Gao).

0031-3203/$30.00 © 2005 Pattern Recognition Society. Published by Elsevier Ltd. All rights reserved. doi:10.1016/j.patcog.2004.12.006

of face images, such as album search that helps police officers identify a suspect by searching a country or state photo database, is of particular interest in law enforcement applications. Bach et al. [2] first attempted this problem in an interactive manner. Automatic similarity retrieval in a face database presents a significant challenge to researchers due to the small inter-class variation in the face database. Human faces are very similar in structure, with minor differences from person to person. Furthermore, scale variations and appearance changes due to environmental changes (i.e., lighting condition changes) and subject actions (i.e., head pose variations and facial expressions) increase the intra-class variation. These factors further complicate the face retrieval task, making it one of the most difficult problems in image retrieval.

Computerized human face recognition has been an active research area for the last few decades because it has many practical applications, such as access control, bankcard verification, mug shot searching, security monitoring, and


surveillance systems. Different applications present different constraints and different processing requirements. Though much work is being done on face similarity matching techniques [3–7], little attention has been given to the design of face matching schemes suitable for visual retrieval in single model databases, where accuracy, robustness to scale and environmental changes, and computational efficiency are three important issues to be considered. For image retrieval, the method employed needs to satisfy the following conditions: (1) It must be suitable for retrieval from single model databases, because the face databases in law enforcement applications contain only one photo per person. (2) It should be robust to scale and environmental changes. Face detection and localization is a prior stage for automatic recognition of faces, and even when a face is localized correctly, minor scale variation is inevitable. Thus, it is an important requirement that a face matching algorithm be robust to scale variations. The robustness to environmental changes (i.e., lighting condition changes) is another critical issue for face image retrieval, such as photo album searching. When collecting the image data, subjects are usually required to be cooperative and to hold a fronto-parallel head pose with a neutral expression, so that the effect of appearance changes due to subject actions is minimized. However, the lighting conditions of the query image and the models are usually different because they are taken at different times and in different places. The environmental changes could thus be the key remaining problem in this situation. (3) It must be fast in computation.

One of the pioneering studies on automated face recognition, using geometrical features, was done by Kanade [8] in 1973. His system achieved a peak performance of 75% recognition rate on a database of 20 people using two images per person, one as the model and the other as the test image. Goldstein et al. [9], and Kaya and Kobayashi [10], showed that a face recognition program provided with manually extracted features could perform recognition with apparently satisfactory results. Brunelli and Poggio [5] automatically extracted a set of geometrical features from the picture of a face, such as nose width and length, mouth position and chin shape. Thirty-five features were extracted to form a 35-dimensional vector, and recognition was then performed with a Bayes classifier. They reported a recognition rate of 90% on a database of 47 people. Cox et al. [11] introduced a mixture-distance technique, which achieved a 95% recognition rate on a query database of 685 individuals; each face was represented by 30 manually extracted distances. Manjunath et al. [12] used Gabor wavelet decomposition to detect feature points for each face image, which greatly reduced the storage requirement for the database. Typically 35–45 feature points per face were generated. The matching process utilized the information present in a topological graph representation of the feature points. After compensating for different centroid locations, two cost values, the topological cost and the similarity cost, were evaluated. The recognition accuracy was 86% in terms of the best match to the right person, and in 94% of the cases the correct person's face was among the top three candidate matches. Geometrical feature matching based on precisely measured distances between features may be most useful for finding possible matches in a large database such as a mug shot album.

Eigenface [6] is one of the best-known face matching approaches; it is also known as principal component analysis (PCA). Kirby and Sirovich [13] initially used PCA to efficiently represent pictures of human faces. Turk and Pentland introduced eigenfaces [6] and eigenspaces [14] for face detection and identification. Any face image can be approximately reconstructed by a collection of weights for the face and a standard face picture, i.e., an eigenpicture. The weights describing each face are used for similarity matching. The eigenface approach appears to be a fast and practical method for face image retrieval. However, the basis images developed by PCA depend only on second-order image statistics [15]. Bartlett et al. [15] used independent component analysis (ICA), which is sensitive to higher-order image statistics, in their face recognition system, and better performance was reported under various testing conditions.

Neural networks [4,16–18] have been successfully applied to face recognition to tackle the problem of appearance changes and scale variations. Lawrence et al. [4] proposed a hybrid neural network, which combined local image sampling, a self-organizing map (SOM) neural network, and a convolutional neural network. The SOM provides a quantization of the image samples into a topological space. The convolutional network extracts successively larger features in a hierarchical set of layers, and provides partial invariance to translation, rotation, scale and deformation. The authors reported 96.2% correct recognition on the ORL database of 400 images of 40 individuals. The classification time was less than 0.5 s, but the training time was as long as 4 h. Lin et al. [17] used a probabilistic decision-based neural network (PDBNN), which inherited its modular structure from its predecessor, the decision-based neural network (DBNN) [18]. A hierarchical neural network structure with non-linear basis functions and a competitive credit-assignment scheme was adopted. A PDBNN-based biometric identification system has the merits of both neural networks and statistical approaches, and its distributed computing principle is relatively easy to implement on a parallel computer. In Ref. [17], it was reported that the PDBNN face recognizer had the capability of recognizing up to 200 people, and could achieve up to a 96% correct recognition rate in approximately 1 s. However, as the number of persons increases, the computing expense becomes more demanding. For face verification, a multiresolution pyramid structure called the Cresceptron [16] was developed. In general, neural network approaches encounter problems when the number of classes (i.e., individuals) increases. Moreover, they are not suitable for single model image retrieval tasks, because multiple model images per person are required to train the systems to an optimal parameter setting. The line edge map (LEM) matching technique [19] was recently used to deal with the difficulty of


face recognition under lighting condition changes. Though the LEM technique is relatively tolerant to lighting condition changes and suitable for retrieval in single model databases, its computational expense is still too high for image retrieval in face databases.

This paper proposes a new face description and similarity measuring technique for visual similarity retrieval in single model face databases, which is relatively robust to scale and environmental changes, and efficient in computation. Instead of using local operations on isolated points, the proposed method employs directional corner point (DCP) matching, in which directional information showing connectivity to neighboring points is utilized in the point correspondence. Unlike neural network approaches, which require multiple model faces per person to train the system to an optimal setting, the DCP method is suitable for single model face database retrieval and is fast in computation. A complete feasibility investigation and evaluation of the proposed face retrieval method is conducted using two standard, publicly available face databases, covering all conditions of human face retrieval: face retrieval under controlled/ideal conditions, scale variations, environmental changes and subject actions. The system performance is compared with that of the eigenface method. This research demonstrates that the proposed DCP approach provides a new solution for visual similarity retrieval in single model face databases.

In the following, Section 2 presents a lighting-insensitive face descriptor, which incorporates structural information with spatial features. In Section 3, an efficient warping algorithm for visual similarity retrieval is proposed, with particular attention given to the discriminative power needed to distinguish similar objects. In Section 4, the proposed system is extensively examined under various experimental conditions and compared with a benchmark system. Finally, the paper concludes in Section 5.

2. Describing a face by DCP

Edges are the most fundamental features of objects in the 3-D world. The edges in an image reflect large local intensity changes due to the geometrical structure of the object, the characteristics of the surface reflectance of the object, and the viewing direction. They have the advantages of a low demand on storage space and low sensitivity to illumination changes. However, edge maps capture the spatial information of an image but lack structural representation. For face image retrieval, the representation efficiency of edge curves needs to be further improved to cater for the speed requirement of database retrieval. On the other hand, we believe that structural information indicating the connectivity of these edge points could enhance the face identity description for similarity matching. In this study, a new face feature descriptor, the DCP, is proposed to integrate structural connectivity information with the spatial features of a face image. After detecting the edge curves [20], a corner detection process [21] is applied to generate the DCPs of a face.

Fig. 1. An illustration of DCPs: a corner point $P(x, y, \theta_1, \theta_2)$ with "horn 1" pointing to its anterior neighbor $M(x, y, \text{null}, \theta_2)$ and "horn 2" pointing to its posterior neighbor $N(x, y, \theta_1, \theta_2)$; $L(x, y, \theta_1, \text{null})$ is an end point.

A DCP $P(x, y, \theta_1, \theta_2)$ is represented by its Cartesian coordinates $(x, y)$ and two directional attributes $\theta_1$ and $\theta_2$ (Fig. 1). $\theta_1$ is the angular value of "horn 1", which points to the anterior neighboring corner point $M$. Similarly, $\theta_2$ is the angular value of "horn 2", which points to the posterior neighboring corner point $N$. $\theta_1$ and $\theta_2$ range from $0^\circ$ to $360^\circ$. If a DCP is the start point of a curve, such as point $M$ in Fig. 1, a null is assigned to $\theta_1$. If a DCP is the end point of a curve, such as point $L$ in Fig. 1, a null is assigned to $\theta_2$. An example of human face DCPs is illustrated in Fig. 2. A DCP is either a corner point with two "horns" pointing to its two neighboring DCPs, or a start/end point of an edge curve with a single "horn" pointing to its neighboring DCP. These "horns" provide isolated feature points with additional structural information about the connectivity to their neighbors. The DCP descriptor, using sparse points, further reduces the storage demand of an edge map and thus improves the computational efficiency to meet the high-speed requirement of visual retrieval in face databases. On the other hand, the structural attributes on the points enhance the discriminative power of the descriptor and improve the retrieval accuracy. The DCP descriptor is expected to be less sensitive to illumination changes because it is derived from the low-level, illumination-insensitive edge map representation.
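To make the descriptor concrete, here is a minimal sketch of a DCP as a data structure, assuming Python and using `None` for the paper's null horn; the class name and fields are our own illustration, not code from the paper.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class DCP:
    """A directional corner point: a location plus up to two horn directions.

    theta1 points to the anterior neighboring corner point and theta2 to the
    posterior one; a start (end) point of an edge curve has theta1 (theta2)
    set to None, mirroring the paper's 'null' attribute.
    """
    x: float
    y: float
    theta1: Optional[float]  # degrees in [0, 360), or None for a curve start
    theta2: Optional[float]  # degrees in [0, 360), or None for a curve end

    @property
    def is_single_horn(self) -> bool:
        # A start/end point of an edge curve carries exactly one horn.
        return (self.theta1 is None) != (self.theta2 is None)
```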


Fig. 2. An illustration of face DCPs. The green points with one or two "horns" are DCPs superimposed on the face image.

3. DCP correspondence

Based on the above image coding, a face is represented by a DCP descriptor, that is, a set of DCPs in a 4-D feature space of location and directions. The face retrieval process locates the face in the query image, generates the DCP descriptor of the query face and calculates the differences between the query DCP descriptor and the model descriptors in the database. The model in the database with the minimum difference is considered the correct return. Here, we propose a warping process to establish the correspondence between two DCP descriptors. The difference between two faces is measured by a cost function of global warping.

3.1. Point-to-point warping

Let $A(x_A, y_A, \theta_1^A, \theta_2^A)$ and $B(x_B, y_B, \theta_1^B, \theta_2^B)$ be two DCPs. A three-step warping process, which consists of translation, rotation and open/close operations, is used to establish the correspondence from $A$ to $B$.

(1) Translation operation: a translation operation from $A$ to $B$, denoted as $T(A \to B)$, moves $A$ to the location of $B$ such that $x_A = x_B$ and $y_A = y_B$ (Figs. 3(a) and (b)). The cost function for a translation operation from $A$ to $B$ is defined as

$$C[T(A \to B)] = \sqrt{(x_A - x_B)^2 + (y_A - y_B)^2}. \qquad (1)$$

(2) Rotation operation: a rotation operation from $A$ to $B$, denoted as $R(A \to B)$, rotates $A$ anticlockwise by $\alpha$ degrees until the right "horn" of $A$ coincides with the right "horn" of $B$ (Figs. 3(b) and (c)). A "horn" of a double-horn DCP is defined as the right "horn" if one can rotate that "horn" anticlockwise to the other by an angle of less than $180^\circ$. The cost function for a rotation operation from $A$ to $B$ is defined as

$$C[R(A \to B)] = \begin{cases} \alpha & \text{if } \alpha \le 180^\circ,\\ 360^\circ - \alpha & \text{if } 180^\circ < \alpha \le 360^\circ. \end{cases} \qquad (2)$$

Fig. 3. An illustration of DCP warping: (a) two DCPs before warping, (b) after the translation operation, (c) after the rotation operation and (d) after the open/close operation.

(3) Open/close operation: an open (or close) operation from $A$ to $B$, denoted as $O/C(A \to B)$, opens (or closes) the two "horns" of $A$ by $\beta$ degrees until the two "horns" of $A$ coincide with the corresponding "horns" of $B$; it is an open operation if the intersecting angle between the two "horns" of $A$ is smaller than that of $B$, and a close operation otherwise (Figs. 3(c) and (d)). The cost function for an open/close operation from $A$ to $B$ is defined as

$$C[O/C(A \to B)] = \beta. \qquad (3)$$

Let $A \to B$ denote a warping from $A$ to $B$. The cost function for warping $A$ to $B$ is defined as a combined cost of the above three operations:

$$C(A \to B) = \sqrt{C^2[T(A \to B)] + f^2\{C[R(A \to B)] + C[O/C(A \to B)]\}}. \qquad (4)$$

Here $f(\cdot)$ is a non-linear function that maps an angle to a scalar. It is desirable to ignore small angle variations, which are most likely segmentation errors or intra-class variations, but to penalize heavily large deviations, which are most likely inter-class differences. In this study, the quadratic function

$$f(x) = \frac{x^2}{W} \qquad (5)$$

is used, where $W$ is a weight to be determined experimentally in Section 4.1.


By substituting $A(x_A, y_A, \theta_1^A, \theta_2^A)$ and $B(x_B, y_B, \theta_1^B, \theta_2^B)$ into Eq. (4), we have

$$C(A \to B) = \sqrt{(x_A - x_B)^2 + (y_A - y_B)^2 + f^2\{\min[\Delta(\theta_1^A, \theta_1^B) + \Delta(\theta_2^A, \theta_2^B),\; \Delta(\theta_1^A, \theta_2^B) + \Delta(\theta_2^A, \theta_1^B)]\}}, \qquad (6)$$

where

$$\Delta(\theta_i^A, \theta_j^B) = \begin{cases} |\theta_i^A - \theta_j^B| & \text{if } |\theta_i^A - \theta_j^B| \le 180^\circ,\\ 360^\circ - |\theta_i^A - \theta_j^B| & \text{if } 180^\circ < |\theta_i^A - \theta_j^B| \le 360^\circ, \end{cases} \quad i, j = 1, 2 \qquad (7)$$

and $\theta_l^k \in [0^\circ, 360^\circ)$ for $k = A, B$ and $l = 1, 2$.

For warping between a single-horn DCP and a double-horn DCP, one of the four $\theta_l^k$ is null, which represents an indefinite direction. Thus $\Delta(\text{null}, \theta_j^B)$ or $\Delta(\theta_i^A, \text{null})$ is required for the calculation of Eq. (6). Since $\Delta(\theta_i^A, \theta_j^B)$ ranges from $0^\circ$ to $180^\circ$ as defined in Eq. (7), the maximum value, $180^\circ$, is assigned to $\Delta(\text{null}, \theta_j^B)$ and $\Delta(\theta_i^A, \text{null})$ to penalize warping a single-horn DCP to a double-horn DCP, or vice versa. It is desirable to prohibit a warping between two DCPs of different types. The cost function for warping between a single-horn DCP and a double-horn DCP is otherwise the same as Eq. (6).

For warping between two single-horn DCPs, a rotation operation from $A$ to $B$ rotates $A$ anticlockwise by $\alpha$ degrees until the single "horn" of $A$ coincides with the single "horn" of $B$. The cost function for a rotation operation between two single-horn DCPs is of the same form as Eq. (2). It can be rewritten as Eq. (8) by substituting $A(x_A, y_A, \theta_1^A, \theta_2^A)$ and $B(x_B, y_B, \theta_1^B, \theta_2^B)$ into Eq. (2):

$$C[R(A \to B)] = \Delta(\theta_{\mathrm{nonull}}^A, \theta_{\mathrm{nonull}}^B), \qquad (8)$$

where $\theta_{\mathrm{nonull}}^A$ and $\theta_{\mathrm{nonull}}^B$ are the non-null values indicating the directions of the single horns of $A$ and $B$, respectively.

The cost function for an open/close operation in this case is defined to be the same as $C[R(A \to B)]$, so that the value of $C[R(A \to B)] + C[O/C(A \to B)]$ lies within the same range $[0^\circ, 360^\circ)$ in all three warping cases (i.e., double-horn to double-horn warping, single-horn to double-horn warping or vice versa, and single-horn to single-horn warping). Thus the cost function for warping a single-horn DCP ($A$) to another single-horn DCP ($B$) is defined as

$$C(A \to B) = \sqrt{(x_A - x_B)^2 + (y_A - y_B)^2 + f^2\{2\,\Delta(\theta_{\mathrm{nonull}}^A, \theta_{\mathrm{nonull}}^B)\}}. \qquad (9)$$

3.2. Set-to-set correspondence

Since a DCP is modeled as a point in the 4-D feature space of location and directions, the representation of a generic face is a set of points in this space. The dissimilarity between two faces can be characterized by the warping cost between the two sets in the 4-D space, as shown in Fig. 4.

Given two finite DCP sets, $Q(A_1, A_2, \ldots, A_p)$ representing a query face and $M(B_1, B_2, \ldots, B_q)$ representing a model in the face database, where $p$ and $q$ are the numbers of DCPs in $Q$ and $M$, a DCP set-to-set warping process is proposed to establish every DCP correspondence between the two sets by minimizing the global warping cost. For each DCP $A_i$ in $Q$, its corresponding DCP $B_j$ in $M$ is identified as the one with the minimum warping cost from $A_i$ to $B_j$ among all $B_j \in M$. The cost of establishing the pairing for $A_i$ is

$$\min_{B_j \in M} C(A_i \to B_j). \qquad (10)$$

The cost of warping the whole set $Q$ to the set $M$ (i.e., establishing pairings for all DCPs in $Q$), denoted as $Q \Rightarrow M$, is defined as

$$C(Q \Rightarrow M) = \frac{1}{p} \sum_{A_i \in Q} \min_{B_j \in M} C(A_i \to B_j). \qquad (11)$$

Finally, the dissimilarity between $Q$ and $M$ is defined as the maximum of the two minimum costs of establishing bilateral correspondences from $Q$ to $M$ and vice versa:

$$D(Q, M) = \max[C(Q \Rightarrow M), C(M \Rightarrow Q)]. \qquad (12)$$

Fig. 4. An example pair of DCP sets (in red and green, respectively) from different face images of the same person.

For a given query face, the face retrieval process calculates the above dissimilarity between the query face and each model in the database. The model in the database with the minimum dissimilarity is considered the correct return.
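Eqs. (10)–(12) and this retrieval rule compose directly; below is a brief sketch (names ours) building on the `warp_cost` function sketched in Section 3.1.

```python
def set_warp_cost(Q, M, W=900.0):
    """C(Q => M) of Eq. (11): the mean, over the DCPs of Q, of each
    point's cheapest warp into M (Eq. (10))."""
    return sum(min(warp_cost(a, b, W) for b in M) for a in Q) / len(Q)

def dissimilarity(Q, M, W=900.0):
    """D(Q, M) of Eq. (12): the larger of the two directed costs,
    which makes the measure symmetric in Q and M."""
    return max(set_warp_cost(Q, M, W), set_warp_cost(M, Q, W))

def retrieve(query, models, W=900.0):
    """Rank model identities by increasing dissimilarity to the query.

    `models` maps an identity to its single model DCP set; the first
    entry of the returned list is taken as the retrieval result."""
    return sorted(models, key=lambda mid: dissimilarity(query, models[mid], W))
```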

4. Experimental results

A complete system performance examination that covers all aspects of face retrieval was conducted. The following issues for face database retrieval are investigated:

1. Retrieval accuracy under controlled conditions.

2. Sensitivity to scale variations.

3. Sensitivity to environmental changes. When collecting the image data in law enforcement, the subjects are usually asked to have a neutral expression, and fronto-parallel face images are taken so that the effect of subject actions is minimized. However, the lighting conditions of the query image and the models are usually different because they are taken at different times and in different places.

4. Computational speed.

In order to make the investigation complete, the sensitivities to subject actions are also tested, though their effect has been minimized in the data collection stage. The system was compared with the eigenface method [6], which is widely used as the benchmark approach.

In this study, two well-known and publicly available face databases were tested: one database [22] contains faces under controlled conditions, lighting condition changes (environmental changes) and expression changes (subject actions), and the other [24] contains faces under controlled conditions and rotated faces (subject actions). The AR face database [22] from Purdue University was used to evaluate the system performance under controlled conditions, scale variations, lighting condition changes and facial expression changes. The database consists of over 3200 color images of the frontal view faces of 126 people (70 men and 56 women). There are 26 different images per person, recorded in two different sessions with a two-week time interval, each session consisting of 13 images. However, some images were found lost or corrupted after downloading through the Internet, leaving one hundred and twelve sets of images (61 men and 51 women) usable. No restrictions on wear (clothes, glasses, etc.), make-up, hairstyle, etc. were imposed on the participants. For details on the collection of the images in the AR face database, readers can refer to Martinez and Benavente [23]. Since the number of face images per person in the AR database is larger than in many other available face databases, and only one image is used for training purposes, we are able to test the proposed approach under a large variety of conditions. The database from the University of Bern [24] was used to examine the system performance under controlled conditions and head pose variations. This database contains frontal views of 30 people with different head poses (two fronto-parallel poses, two looking to the right, two looking to the left, two looking downwards and two looking upwards). In all the experiments, a preprocessing step to locate the faces was applied. Original images were normalized (in scale and orientation) such that the two eyes were aligned roughly at the same position with a distance of 80 pixels. The facial areas were then cropped into the final images for query and retrieval. Some automatic face detection and eye location algorithms can be found in Refs. [25–29]. A few examples of the cropped faces are shown in Fig. 5.

Fig. 5. Examples of cropped faces.
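The normalization described above (eyes aligned to fixed positions 80 pixels apart, followed by cropping) is essentially a similarity transform. The sketch below is our reading of that step, assuming OpenCV and eye coordinates supplied by a prior eye-location algorithm (e.g., Refs. [25–29]); the output size and the vertical eye placement are illustrative choices, not values from the paper.

```python
import cv2
import numpy as np

def normalize_face(img, left_eye, right_eye, eye_dist=80, out_size=(160, 160)):
    """Rotate and scale `img` so the eye line is horizontal with the
    requested inter-eye distance, then crop around the eye midpoint.

    left_eye/right_eye are (x, y) pixel coordinates assumed to come
    from a prior eye-location step; img is a NumPy image array."""
    (lx, ly), (rx, ry) = left_eye, right_eye
    angle = np.degrees(np.arctan2(ry - ly, rx - lx))   # tilt of the eye line
    scale = eye_dist / np.hypot(rx - lx, ry - ly)      # bring eyes 80 px apart
    center = ((lx + rx) / 2.0, (ly + ry) / 2.0)
    M = cv2.getRotationMatrix2D(center, angle, scale)
    # Shift so the eye midpoint lands at a fixed spot in the output crop
    # (horizontally centered, roughly 35% from the top; an assumed layout).
    M[0, 2] += out_size[0] / 2.0 - center[0]
    M[1, 2] += out_size[1] * 0.35 - center[1]
    return cv2.warpAffine(img, M, out_size)
```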

4.1. Determination of W

The effect of $W$ in Eq. (5) was investigated using the AR database. All the neutral expression faces under the background lighting condition taken in the first session were used to construct the model database. The neutral expression faces under the background lighting condition taken in the second session were used as query images. Sample query images and models in the database are illustrated in Fig. 6. The retrieval rate is plotted against $W$ in Fig. 7. $W$ was found to be easily tuned, since the retrieval rate remained higher than 93.75% for $W$ ranging from 500 to 1200 (Fig. 7). The algorithm could only achieve a 2.68% retrieval rate when $W = 1$. The rate improved quickly as $W$ increased, reached the optimal value of 94.64% when $W$ was 800, and remained unchanged up to 950. For all the other experiments in this study, $W$ was set to 900. Readers can use this experimental parameter determination method to achieve an optimal setting in other applications under different conditions.

Fig. 6. Example pairs of face images in the AR face database [22]. One is used as the model in the single model database; the other is used as the query image.

Fig. 7. The effect of W on retrieval rate (retrieval rate, 70–100%, plotted against W from 0 to 1400).
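The sweep just described can be reproduced as a simple grid search; the sketch below (names and candidate grid are ours) reuses the `retrieve` ranking from Section 3.2 and assumes `queries` and `models` map identities to DCP sets.

```python
def tune_W(queries, models, candidates=range(100, 1401, 100)):
    """Grid-search the weight W of Eq. (5) by top-1 retrieval rate,
    mirroring the parameter study of Section 4.1."""
    def top1_rate(W):
        # A query is a hit when the best-ranked model shares its identity.
        hits = sum(1 for pid, q in queries.items()
                   if retrieve(q, models, W)[0] == pid)
        return hits / len(queries)
    return max(candidates, key=top1_rate)
```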

4.2. Face retrieval under normal condition

The face images under controlled conditions in the database from the University of Bern [24] were also used to evaluate the performance of the proposed approach. The retrieval rates are summarized in Table 1. The DCP experiments were conducted under three retrieval conditions, namely top 1, top 5 and top 10 retrieval. In image retrieval,

Table 1
Retrieval results of the proposed DCP and the eigenface (20-eigenvectors) methods

Database        Method                    Retrieval rate (%)
Bern database   Eigenface                 100
                DCP                       100
AR database     Eigenface                 55.4
                DCP (top 1 retrieval)     94.64
                DCP (top 5 retrieval)     99.11
                DCP (top 10 retrieval)    100

the performance of a system is reflected not only by the top 1 (or rank 1) retrieval rate but also by the top N retrieval rate, i.e., whether the correct image is among the best N retrieved images if it is not the best-matched image. In the top 1 retrieval, a correct retrieval was counted when the best returned face in the model database was from the same person as the query face. In the top 5 or top 10 retrieval, a correct retrieval was counted when a face image from the same person as the query face was among the best 5 or 10 returned faces in the model database, respectively. It is found that the DCP approach performed better than (or equally well as) the eigenface method. The DCP and eigenface approaches both achieved 100% accuracy in retrieving faces from the Bern database. However, the DCP method significantly outperformed the eigenface method on the AR face database. Detailed eigenface data are tabulated in Table 2.
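The top-N counting rule just described is short to state in code; a sketch (names ours), again reusing the `retrieve` ranking from Section 3.2:

```python
def top_n_rate(queries, models, n=1, W=900.0):
    """Fraction of queries whose correct identity appears among the
    best n returned models (top 1 when n=1, top 5 when n=5, ...)."""
    hits = sum(1 for pid, q in queries.items()
               if pid in retrieve(q, models, W)[:n])
    return hits / len(queries)
```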

The performance of the eigenface approach depends on the number of eigenvectors. If this number is too small, important information about the identity is likely to be lost. If it is too large, the weights corresponding to small eigenvalues might be noise. The number of eigenvectors is limited by the rank of the training set matrix. One hundred and twelve is its upper bound in the experiment on the AR database, and thus 78.57% is the best performance that the eigenface approach can achieve here (Table 2). One way to interpret this is that the eigenface approach works well as long as the test image is "similar" to the ensemble of images used in the calculation of the eigenfaces [30], and the training set should include a number of images for each person with some variations [6] to obtain a better performance. Here, only one model image per person was used for training. Another reason is that the differences between two face images of the same person are larger in the AR database than in the database from the University of Bern. In particular, the illuminations of the query images and the models are slightly different (see the last pair in Fig. 6) because the query images were taken two weeks after the models.

Table 2
Performance comparisons on the AR database

Method                        Retrieval rate (%)
DCP method                    94.64
Eigenface (20-eigenvectors)   55.36
Eigenface (60-eigenvectors)   71.43
Eigenface (112-eigenvectors)  78.57

4.3. Sensitivity to scale variations

The sensitivity to scale variations is an important issue that has not attracted much attention to date. Face detection and localization is a prior stage for face recognition, and even when a face is localized correctly, minor scale variation is inevitable [31]. Recently, Martinez [32] highlighted and investigated the effect of the imprecise localization problem on face recognition accuracy. It should be noted that the localization problem does not mean a failure in the face detection stage; rather, the face localization step succeeds, but small-scale variation may cause mismatches in the identification process. Therefore, it is an important requirement that a face similarity measuring algorithm be robust to scale variations.

A sensitivity analysis to scale variations was conducted using the AR database. The scale variations were generated by applying a random scaling factor, uniformly distributed within [1 − 10%, 1 + 10%], to the query images. Four faces with different scales were generated from each query image. Thus, we had 448 query faces for the retrieval test, with scale variations ranging from −10% to +10%. The scales of the models were not changed.
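As an illustration of this perturbation protocol, the sketch below generates the four rescaled copies per query image with the uniform ±10% factor; the use of OpenCV for resampling and all names are our assumptions.

```python
import random
import cv2

def scaled_queries(img, n=4, max_dev=0.10, seed=None):
    """Generate n rescaled copies of a query image, with scale factors
    drawn uniformly from [1 - max_dev, 1 + max_dev] as in Section 4.3."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        s = rng.uniform(1.0 - max_dev, 1.0 + max_dev)
        h, w = img.shape[:2]  # img is a NumPy image array
        out.append(cv2.resize(img, (round(w * s), round(h * s))))
    return out
```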

The experimental results are tabulated in Table 3. The results show that the proposed DCP approach outperformed the eigenface approach by 23.4%, which means that it is much more robust to scale variations than the eigenface method. This is a very attractive property that can alleviate the difficulty of precisely locating faces in the prior face detection stage.

Table 3
Retrieval accuracies with scale variations

Method                        Top 1 (%)   Top 5 (%)
Eigenface (112-eigenvectors)  44.9        68.8
DCP method                    68.30       73.22

4.4. Sensitivity to environmental changes

The robustness to environmental changes (i.e., lighting condition changes) is one of the critical issues for face image retrieval systems, such as photo album searching systems. When collecting the image data, subjects are usually required to be cooperative and to have a neutral expression, and fronto-parallel view faces are taken so that the effect of subject actions (i.e., head pose variations and facial expressions) is minimized. Usually, passport/IC photos are used to construct a model face database. Thus the environmental changes could be the only remaining appearance variability problem in face retrieval. In general, the query image and the model image of each person are taken at different times and in different places; their lighting conditions are different and unknown. Moreover, the model images of different subjects are taken under different lighting conditions. It is impossible to obtain a query image under the same lighting condition as all the model images in the photo database. Hence, the retrieval algorithm has to be robust to the variability in appearance due to lighting condition changes.

The issue addressed in this section is whether the DCP representation is sufficient, and how well it performs, for retrieving faces under varying lighting conditions. The experiment was designed using face images taken under different lighting conditions from the AR database (Fig. 8). The faces in neutral expression with background illumination were used as the single models of the subjects. The images under three different lighting conditions were used as query images.

The experimental results on query images with the three different lighting conditions are given in Table 4, together with the retrieval results under the controlled condition as comparison benchmarks. In all three experiments, the proposed DCP method significantly outperformed the eigenface approach. For the eigenface method, it has been suggested that the first three principal components respond primarily to lighting variations, so the system error rate can be reduced by discarding these three most significant principal components [33]. Though the accuracies of the eigenface approach increased without the first three eigenvectors, the DCP approach still significantly outperformed it when one light was on.

The variations of lighting condition did affect the system performance. Nevertheless, the DCP approach is much more tolerant to lighting condition changes than the


Fig. 8. Examples of model faces and query faces under different lighting condition changes.

Table 4
Sensitivity to lighting condition changes

Query faces        Eigenface retrieval rate (%)                       DCP method
                                                                      Top 1 (%)   Decrease (%)   Top 5 (%)   Decrease (%)

Normal condition   20-eigenvectors: 55.36                             94.64       —              99.11       —
(benchmark)        60-eigenvectors: 71.43
                   112-eigenvectors: 78.57

Left light on      20-eigenvectors: 6.25                              87.50       7.14           96.43       2.68
                   60-eigenvectors: 9.82
                   112-eigenvectors: 9.82
                   112-eigenvectors w/o first 3 components: 26.79

Right light on     20-eigenvectors: 4.46                              89.29       5.35           96.43       2.68
                   60-eigenvectors: 7.14
                   112-eigenvectors: 7.14
                   112-eigenvectors w/o first 3 components: 49.11

Both lights on     20-eigenvectors: 1.79                              61.61       33.03          81.25       17.86
                   60-eigenvectors: 2.68
                   112-eigenvectors: 2.68
                   112-eigenvectors w/o first 3 components: 64.29


eigenface method. When one light was on, the DCP approach's retrieval accuracy decreased by only 7.14% and 5.35%. When both lights were on, the error rate became much higher than with only one light on. This evidence shows that the DCP would still be affected by extreme lighting condition changes, such as over-illumination, though to a lesser extent. Over-illumination causes strong specular reflection on the facial skin (it is no longer a Lambertian surface), so the shape information of the face is suppressed or lost, which results in the increase of the error rate.

4.5. Computational efficiency

An experiment was conducted to evaluate the computational efficiency of DCP matching using the AR face database [22]. The average computational time for one match is 57 ms. The experiment was conducted on a PC under the Windows platform with a 1.3 GHz CPU and 512 MB RAM. In a face retrieval system, searching is the most computationally expensive operation due to the large number of images available in the database. Therefore, it is a prerequisite of image retrieval systems to use efficient visual similarity matching algorithms. In most systems, face features are extracted/coded off-line from the original images and stored in the face feature database. In the querying process, the same features are extracted from the query face, and the features of the query image are compared with the features of each model image in the database. In practice, apart from adopting a fast face matching algorithm, a pre-filtering operation [19] can be employed to further speed up the search by reducing the number of candidates.

4.6. Sensitivity to subject actions

Thus far, the DCP approach has demonstrated very attractive retrieval accuracy, robustness to scale and environmental changes, and computational efficiency. For a complete investigation covering all conditions of face similarity measuring, it is also interesting to examine the remaining sensitivity to subject actions (i.e., facial expressions and head pose variations), though these are of less importance in database retrieval systems, as mentioned earlier. Similar experiments were conducted to evaluate the effects of different facial expressions (smile, anger and scream) using the AR database. The experimental results are summarized in Table 5. The smile expression caused the recognition rate to drop by 31.25% compared to the neutral expression in Table 1, while the anger expression caused only a 0.89% drop. This was not unexpected, because the anger expression produces less physical variation from the neutral expression than the smile. The scream expression could be the extreme case of deformation among human facial expressions, i.e., most facial features were distorted. The eigenface approach was found to be less sensitive to significant facial expression changes such as smiling and screaming.

Table 5
Retrieval rates under different facial expressions

Query faces            Eigenface retrieval rate (%)                     DCP method (%)

Smiling expression     20-eigenvectors: 87.85                           63.39
                       60-eigenvectors: 94.64
                       112-eigenvectors: 93.97
                       112-eigenvectors w/o first 3 components: 82.04

Angry expression       20-eigenvectors: 78.57                           93.75
                       60-eigenvectors: 84.82
                       112-eigenvectors: 87.50
                       112-eigenvectors w/o first 3 components: 73.21

Screaming expression   20-eigenvectors: 34.82                           27.68
                       60-eigenvectors: 41.96
                       112-eigenvectors: 45.54
                       112-eigenvectors w/o first 3 components: 32.14

Table 6
Retrieval rates under different head pose variations

Query faces        Eigenface (20-eigenvectors) (%)   Eigenface (30-eigenvectors) (%)   DCP method (%)
Looks left/right   70.00                             75.00                             70.83
Looks up           51.67                             56.67                             65.00
Looks down         45.00                             55.00                             70.00
Average            59.17                             65.12                             68.61

The face database [24] was also used to evaluate the system performance on query face images with different head poses. One fronto-parallel face per person was used as the model face. The system was tested using the eight poses looking to the left, right, up and down for each person, giving 240 query images in total. The retrieval results are summarized in Table 6. It can be observed that head pose variations degraded the recognition rate of all the investigated methods, but DCP was more robust to head pose variations than the eigenface approach. The virtual view technique [34] and the head pose recovery technique [35] can be employed to further improve the system performance when retrieving a face under pose variation.


5. Conclusion

In this paper, we present a novel visual similarity measuring technique to retrieve face images in photo album databases for law enforcement. The proposed DCP is particularly designed to address the problem of image querying and retrieval from single model face databases, where accuracy, robustness to scale and environmental changes, and computational efficiency are three important issues to be considered. In order to meet the efficiency requirements of an image descriptor in high-dimensional image spaces, we extract DCPs from edge maps to further reduce the storage demand and the computational expense of edge map matching, while keeping the advantage of insensitivity to illumination changes. On the other hand, the directional attributes are introduced to enhance the discriminative capability to cater for the high accuracy requirement.

The algorithm has been evaluated using two well-known and publicly available face databases of over 3000 images, and compared with the eigenface approach, one of the best face retrieval techniques. It is a very encouraging finding that the proposed DCP approach performed better than the eigenface approach in most of the comparison experiments. DCP correctly returned 100% and 94.64% of the queries on the face databases [24] and [22], respectively. The DCP approach significantly outperformed the eigenface method, by 23.4% in retrieval rate, when querying a face with scale variation. When one light was on, the retrieval rates degraded by only 7.14% and 5.35%. This study demonstrates that the proposed DCP method provides a new solution, both robust to scale and environmental changes and efficient in computation, for retrieving human faces in single model databases.

Acknowledgements

This work is partially supported by the Australian Research Council (ARC) Discovery Grant DP0451091.

References

[1] Y.A. Aslandogan, C.T. Yu, Techniques and systems for image and video retrieval, IEEE Trans. Knowledge Data Eng. 11 (1999) 56–63.

[2] J.R. Bach, S. Paul, R. Jain, A visual information management system for the interactive retrieval of faces, IEEE Trans. Knowledge Data Eng. 5 (1993) 619–628.

[3] V. Blanz, T. Vetter, Face recognition based on fitting a 3D morphable model, IEEE Trans. Pattern Anal. Mach. Intell. 25 (9) (2003).

[4] S. Lawrence, C.L. Giles, A.C. Tsoi, A.D. Back, Face recognition: a convolutional neural-network approach, IEEE Trans. Neural Networks 8 (1997) 98–113.

[5] R. Brunelli, T. Poggio, Face recognition: features versus templates, IEEE Trans. Pattern Anal. Mach. Intell. 15 (1993) 1042–1052.

[6] M. Turk, A. Pentland, Eigenfaces for recognition, J. Cognitive Neurosci. 3 (1991) 71–86.

[7] M. Lades, J.C. Vorbrüggen, J. Buhmann, J. Lange, C. von der Malsburg, R.P. Würtz, M. Konen, Distortion invariant object recognition in the dynamic link architecture, IEEE Trans. Comput. 42 (1993) 300–311.

[8] T. Kanade, Picture processing by computer complex and recognition of human faces, Technical Report, Kyoto University, Department of Information Science, 1973.

[9] A.J. Goldstein, L.D. Harmon, A.B. Lesk, Identification of human faces, Proc. IEEE 59 (1971) 748.

[10] Y. Kaya, K. Kobayashi, A basic study on human face recognition, in: S. Watanabe (Ed.), Frontiers of Pattern Recognition, 1972, p. 265.

[11] I.J. Cox, J. Ghosn, P.N. Yianilos, Feature-based face recognition using mixture-distance, in: Computer Vision and Pattern Recognition, IEEE Press, Piscataway, NJ, 1996.

[12] B.S. Manjunath, R. Chellappa, C. von der Malsburg, A feature based approach to face recognition, in: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 1992, pp. 373–378.

[13] M. Kirby, L. Sirovich, Application of the Karhunen–Loève procedure for the characterisation of human faces, IEEE Trans. Pattern Anal. Mach. Intell. 12 (1990) 831–835.

[14] A. Pentland, B. Moghaddam, T. Starner, View-based and modular eigenspaces for face recognition, in: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 1994, pp. 84–91.

[15] M.S. Bartlett, J.R. Movellan, T.J. Sejnowski, Face recognition by independent component analysis, IEEE Trans. Neural Networks 13 (6) (2002).

[16] J. Weng, J.S. Huang, N. Ahuja, Learning recognition and segmentation of 3D objects from 2D images, in: Proceedings of the IEEE International Conference on Computer Vision, 1993, pp. 121–128.

[17] S.H. Lin, S.Y. Kung, L.J. Lin, Face recognition/detection by probabilistic decision-based neural network, IEEE Trans. Neural Networks 8 (1997) 114–132.

[18] S.Y. Kung, J.S. Taur, Decision-based neural networks with signal/image classification applications, IEEE Trans. Neural Networks 6 (1995) 170–181.

[19] Y. Gao, M.K.H. Leung, Face recognition using line edge map, IEEE Trans. Pattern Anal. Mach. Intell. 24 (6) (2002) 764–779.

[20] R. Nevatia, K.R. Babu, Linear feature extraction and description, Comput. Graphics Image Process. 13 (1980) 257–269.

[21] M.K.H. Leung, Y.H. Yang, Dynamic two-strip algorithm in curve fitting, Pattern Recognition 23 (1990) 69–79.

[22] Face database, Purdue University. http://rvl1.ecn.purdue.edu/~aleix/aleix_face_DB.html

[23] A.M. Martinez, R. Benavente, The AR face database, CVC Technical Report #24, June 1998.

[24] Face database, University of Bern. ftp://iamftp.unibe.ch/pub/Images/FaceImages/

[25] E. Saber, A.M. Tekalp, Frontal-view face detection and facial feature extraction using color, shape and symmetry based cost functions, Pattern Recognition Lett. 19 (1998) 669–680.

[26] K.M. Lam, H. Yan, Locating and extracting the eye in human face images, Pattern Recognition 29 (1996) 771–779.


[27] H.A. Rowley, S. Baluja, T. Kanade, Neural network-based face detection, IEEE Trans. Pattern Anal. Mach. Intell. 20 (1998) 23–38.

[28] C.H. Lee, J.S. Kim, K.H. Park, Automatic human face location in a complex background using motion and color information, Pattern Recognition 29 (1996) 1877–1889.

[29] K.K. Sung, T. Poggio, Example-based learning for view-based human face detection, IEEE Trans. Pattern Anal. Mach. Intell. 20 (1998) 39–51.

[30] R. Chellappa, C.L. Wilson, S. Sirohey, Human and machine recognition of faces: a survey, Proc. IEEE 83 (1995) 705–740.

[31] M.H. Yang, D.J. Kriegman, N. Ahuja, Detecting faces in images: a survey, IEEE Trans. Pattern Anal. Mach. Intell. 24 (1) (2002) 34–58.

[32] A.M. Martinez, Recognizing imprecisely localized, partially occluded, and expression variant faces from a single sample per class, IEEE Trans. Pattern Anal. Mach. Intell. 24 (6) (2002) 748–763.

[33] P.N. Belhumeur, J.P. Hespanha, D.J. Kriegman, Eigenfaces vs. fisherfaces: recognition using class specific linear projection, IEEE Trans. Pattern Anal. Mach. Intell. 19 (1997) 711–720.

[34] D. Beymer, T. Poggio, Face recognition from one example view, A.I. Memo No. 1536, Artificial Intelligence Lab, MIT, 1995.

[35] Y. Gao, M.K.H. Leung, W. Wang, S.C. Hui, Fast face identification under varying pose from a single 2D model view, IEE Proc. Vision, Image Signal Process. 148 (4) (2001) 248–254.

About the Author—YONGSHENG GAO received the B.Sc. and M.Sc. degrees in Electronic Engineering from Zhejiang University, China, in 1985 and 1988, respectively, and the Ph.D. degree in Computer Engineering from Nanyang Technological University, Singapore. Currently, he is a Senior Lecturer with the School of Microelectronic Engineering, Faculty of Engineering and Information Technology, Griffith University, Australia. His research interests include face recognition, biometrics, image retrieval, computer vision, and pattern recognition. He is a member of the IEEE.