Neurocomputing 97 (2012) 44–51

journal homepage: www.elsevier.com/locate/neucom

An improved biometrics technique based on metric learning approach

Xianye Ben a,*, Weixiao Meng b, Rui Yan c, Kejun Wang d

a School of Information Science and Engineering, Shandong University, No. 27, Shanda South Road, Jinan 250100, China

b School of Electronics Information Engineering, Harbin Institute of Technology, Harbin 150080, China

c Computer Science Department, Rensselaer Polytechnic Institute, Troy, NY 12180, USA

d College of Automation, Harbin Engineering University, Harbin 150001, China

Article info

Article history:
Received 18 May 2011
Received in revised form 31 May 2012
Accepted 16 June 2012
Communicated by J. Zhang
Available online 1 July 2012

Keywords:
Gait recognition
Face recognition
Metric learning
Different walking states
Variant face pose

* Corresponding author. E-mail address: [email protected] (X. Ben).

http://dx.doi.org/10.1016/j.neucom.2012.06.022

Abstract

A biometrics technique based on a metric learning approach is proposed in this paper to achieve higher correct classification rates when the feature of the query is very different from that of the register for a given individual. Inspired by the definition of generalized distance, the criterion of this new metric learning is defined by finding an embedding that preserves local information and obtaining a subspace that best detects the essential manifold structure. Furthermore, the two transformation matrices for the query and the register are obtained by a generalized eigen-decomposition. Experiments on the CASIA(B) gait database and the UMIST face database demonstrate that the proposed method performs better than classical metric learning methods and the current radial basis function (RBF) algorithms.

© 2012 Elsevier B.V. All rights reserved.

1. Introduction

Biometrics [1] is an authentication mechanism that relies on the automated identification or verification of an individual based on unique physiological or behavioral characteristics. Identification, also known as recognition, attempts to establish a person's identity by asking "who is this person?". Verification confirms or denies a person's claimed identity by asking "is this person who he or she claims to be?". The majority of research studies well-established physical biometrics such as fingerprint [2], iris [3], face [4] and vein [5]. Behavioral biometrics, such as keystrokes [6], gait [7] or signature [8], provide a number of advantages over traditional physical biometric technologies. They can be collected unobtrusively or even without the knowledge of the user. Collecting behavioral data often requires no special hardware and is therefore very cost effective [9].

Vast quantities of biometric information are used for identity management and access control, but they also pose unprecedented challenges: it is difficult to reach the valuable information hidden in a large amount of redundant data. Analyzing the data and learning the knowledge it contains are common requirements in biometric applications. Metric learning [10], which can be applied in a variety of biometrics, aims to learn an appropriate distance or similarity function for a given problem. The vast majority of recent work has focused on learning so-called Mahalanobis distance functions, which may be viewed as constructing a linear transformation of the data and then applying the Euclidean distance to the transformed data. Metric learning techniques can be roughly divided into unsupervised and supervised methods. Representative unsupervised approaches include principal component analysis (PCA) [11] and independent component analysis (ICA) [12]. Since these linear subspace methods cannot effectively capture nonlinear variation, isometric feature mapping (ISOMAP) [13], locally linear embedding (LLE) [14] and Laplacian eigenmap (LE) [15] have been proposed to discover the nonlinear structure of the manifold. He et al. then proposed locality preserving projections (LPP) [16] and neighborhood preserving embedding (NPE) [17] as linear approximations of LE and LLE, respectively. The most representative supervised approach is linear discriminant analysis (LDA), which searches for projection axes on which data points of the same class are close together while those of different classes are far apart. Its deficiency is that the scatter matrix becomes singular when the data points come from a high dimensional space whose dimensionality exceeds the number of samples. PCA+LDA [18] is a common solution to this singularity problem. Local LDA (LLDA) [19], uncorrelated LDA (ULDA) [20], orthogonal LDA (OLDA) [21] and diagonal LDA (DiaLDA) [22] are extensions of LDA. These dimensionality reduction or subspace analysis techniques have also


been proposed and applied for face recognition. In addition, face images are naturally second-order tensors with column and row modes. Although a gait image sequence can be viewed as a third-order tensor with column, row and time modes, many researchers, assuming that silhouettes have been extracted from the original walking sequences, apply a silhouette preprocessing procedure to the extracted silhouette sequences. It includes size normalization (proportionally resizing each silhouette image so that all silhouettes have the same height) and horizontal alignment (centering the upper half of the silhouette with respect to its horizontal centroid). Then, gait templates such as the motion energy image (MEI) [23], motion history image (MHI) [23], single step history image (SSHI) [24], colored gait history image (CGHI) [25], gait history image (GHI) [26], moving silhouette image (MSI) [27], difference gait image (DGI) [28], gait energy image (GEI) [29], enhanced GEI (EGEI) [30], X-T plane energy image (X-TPEI) [31], gait entropy image (GEnI) [32] and gait flow image (GFI) [33] have been used for individual recognition. An original gait silhouette sequence is thereby represented as a matrix-like second-order image. Once a series of templates has been obtained for each individual, the problem of their excessive dimensionality arises. Fortunately, the learning procedure following such subspace analysis approaches can achieve good data representation and good class separability simultaneously.

Relevant component analysis (RCA) [34] makes use of equivalence relation constraints to achieve metric learning. Goldberger et al. proposed neighborhood component analysis (NCA) [35], which directly maximizes a stochastic variant of the leave-one-out KNN score on the training set and is non-parametric, making no assumptions about the shape of the class distributions or the boundaries between them. Weinberger et al. [36] proposed large margin nearest neighbor (LMNN), an extension of NCA under the maximum-margin framework. Xiang et al. [37] learned a Mahalanobis distance metric such that the distances between point pairs in must-links are as small as possible and those between point pairs in cannot-links are as large as possible. Wang et al. [38] proposed distance metric learning with feature decomposition (DMLFD), which decomposes the high-dimensional feature space into a set of low-dimensional feature spaces with minimal dependencies and thereby reduces the computational cost.

The works mentioned above generally focus on a single data collection of one dimensionality and rely on a single map to achieve distance metric learning and dimensionality reduction at the same time. However, these methods do not work when dimensionality reduction and distance metric learning must operate across different kinds of collections. Different collections refer to data collected under different conditions, such as face images captured from different views or gait recorded in different walking states. That is to say, one cannot directly measure either the distance between the frontal and lateral poses of a human face or the distance between normal gait and gait while carrying a bag. These are daunting problems. They require a new definition of the distance metric across different kinds of collections, one that transforms the data into a unified space where such a measure of similarity is valid. We therefore propose an improved biometrics technique based on a metric learning approach and apply it to gait and face recognition. The main contribution of this paper is a novel approach to the distance metric across different kinds of collections that incorporates the prior knowledge that the appearance features of the same individual should stay as close as possible across largely diverse conditions. The proposed method attempts to solve biometric matching across large variations by seeking a unified representation of the query and registered samples.

The rest of the paper is organized as follows. Section 2 describes the improved metric learning approach for biometrics in detail, including the definition of generalized distance, feature extraction and classification. Section 3 reports a series of experiments conducted on the CASIA(B) gait database and the UMIST face database. Section 4 presents the conclusions.

2. Improved metric learning approach for biometrics

2.1. Distance metric

A distance metric is a function $X \times X \to \mathbb{R}$, where $X \subset \mathbb{R}^d$ denotes a set of $d$-dimensional vectors. The simplest distance between any two vectors $x_i$, $x_j$ is the Euclidean distance, expressed as follows:

$$D(x_i, x_j) = \|x_i - x_j\|_2 = \sqrt{(x_i - x_j)^T (x_i - x_j)} \tag{1}$$

where $\|\cdot\|_2$ denotes the $L_2$ norm.

A generalized distance $D_C(x_i, x_j)$ should satisfy:

(1) Nonnegativity: $D_C(x_i, x_j) \ge 0$
(2) Symmetry: $D_C(x_i, x_j) = D_C(x_j, x_i)$
(3) Triangle inequality: $D_C(x_i, x_k) + D_C(x_k, x_j) \ge D_C(x_i, x_j)$, where $x_k$ denotes another vector.

If

$$D_C(x_i, x_j) = \|x_i - x_j\|_C = \sqrt{(x_i - x_j)^T C (x_i - x_j)} \tag{2}$$

then $C$ is required to be a real symmetric and positive semidefinite matrix, namely $C = PP^T$. Therefore, distance metric learning is implemented by learning the linear transformation $P$.
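As a quick numerical illustration of Eq. (2), the following NumPy sketch checks that the generalized distance with $C = PP^T$ coincides with the Euclidean distance measured after the linear map $P^T$. The function name `generalized_distance`, the sizes and the random data are assumptions made for illustration only; they do not come from the paper.

```python
import numpy as np

def generalized_distance(xi, xj, C):
    """Generalized distance of Eq. (2): sqrt((xi - xj)^T C (xi - xj))."""
    diff = xi - xj
    # Clip tiny negative values that floating-point round-off can produce.
    return float(np.sqrt(max(diff @ C @ diff, 0.0)))

rng = np.random.default_rng(0)
d, m = 6, 3                          # hypothetical input and output dimensions
P = rng.standard_normal((d, m))      # hypothetical linear transformation
C = P @ P.T                          # real symmetric, positive semidefinite
xi, xj = rng.standard_normal(d), rng.standard_normal(d)

d_general = generalized_distance(xi, xj, C)
d_projected = float(np.linalg.norm(P.T @ xi - P.T @ xj))  # Euclidean after projection
```

With $C$ set to the identity matrix the same function reduces to the Euclidean distance of Eq. (1).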

The discussion above concerns the distance metric in an ideal case. By contrast, direct matching does not work for the distance metric between different kinds of collections $X, Y \subset \mathbb{R}^d$, for example, the distances between two different walking states and between two variant face poses shown in Fig. 1(1) and (2), respectively. For gait recognition, we want the projected features of the normal gait energy image (GEI) [29] to be as close as possible to those of the GEI of the same individual carrying a bag (GEI with a bag). Given the preprocessed binary gait silhouette images $B_t(x, y)$ at time $t$ in a sequence, the GEI is defined as

$$G(x, y) = \frac{1}{N} \sum_{t=1}^{N} B_t(x, y)$$

where $N$ is the number of frames in the complete cycle(s) of a silhouette sequence, and $x$ and $y$ are coordinates in the 2D image. Similarly, for face recognition, we expect the projected features of the profile face to be as close as possible to those of the frontal face of the same person.
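The GEI definition above amounts to a pixel-wise average over the frames of a gait cycle. A minimal NumPy sketch follows; the function name `gait_energy_image` and the toy 2x3 frames are illustrative assumptions, and real silhouettes would first go through the size normalization and alignment described in the introduction.

```python
import numpy as np

def gait_energy_image(silhouettes):
    """Average N binary silhouettes B_t(x, y) into one GEI G(x, y)."""
    frames = np.asarray(silhouettes, dtype=np.float64)
    if frames.ndim != 3 or frames.shape[0] == 0:
        raise ValueError("expected a non-empty stack of shape (N, height, width)")
    return frames.mean(axis=0)

# Toy sequence: N = 4 frames of 2x3 binary silhouettes (illustrative values only).
frames = np.array([
    [[1, 0, 0], [1, 1, 0]],
    [[1, 0, 0], [1, 0, 0]],
    [[1, 1, 0], [1, 1, 0]],
    [[1, 0, 0], [0, 1, 0]],
])
gei = gait_energy_image(frames)
# gei is [[1.0, 0.25, 0.0], [0.75, 0.75, 0.0]]
```

Pixels that are foreground in every frame get the value 1, and pixels that are covered only during part of the cycle get fractional values.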

Taking gait recognition as an example, we attempt to project the data points of the two original feature spaces into a unified space. The measured distance can be represented mathematically as

$$D(x_i, y_j) = D_C(\tilde{x}_i, \tilde{y}_j) = D_C(f_x(x_i), f_y(y_j)) = \|f_x(x_i) - f_y(y_j)\|_C = \sqrt{(f_x(x_i) - f_y(y_j))^T P P^T (f_x(x_i) - f_y(y_j))} \tag{3}$$

where $x_i \in X$, $y_j \in Y$, and $\tilde{x}_i = f_x(x_i)$ and $\tilde{y}_j = f_y(y_j)$ denote the features in the unified space.

Minimizing Eq. (3) achieves the defined generalized distance. To make the problem easier to solve, $f_x$ and $f_y$ are taken to be linear, that is, $f_x(x_i) = W_1^T x_i$ and $f_y(y_j) = W_2^T y_j$. Finally, substituting them into Eq. (3) and letting $P_1 = W_1 P$ and $P_2 = W_2 P$, we get

$$D(x_i, y_j) = \|P_1^T x_i - P_2^T y_j\|_2 \tag{4}$$

Therefore, our method aims to learn two linear transformations $P_1$ and $P_2$, and then applies the Euclidean distance to the transformed data.
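The step from Eq. (3) to Eq. (4) can be written out explicitly. The derivation below uses only the linearity assumption on $f_x$, $f_y$ and the definitions $P_1 = W_1 P$, $P_2 = W_2 P$ given above:

```latex
\begin{aligned}
D(x_i, y_j)^2
  &= (W_1^T x_i - W_2^T y_j)^T P P^T (W_1^T x_i - W_2^T y_j) \\
  &= \bigl\| P^T W_1^T x_i - P^T W_2^T y_j \bigr\|_2^2 \\
  &= \bigl\| (W_1 P)^T x_i - (W_2 P)^T y_j \bigr\|_2^2 \\
  &= \bigl\| P_1^T x_i - P_2^T y_j \bigr\|_2^2 .
\end{aligned}
```

Taking the square root of both sides gives Eq. (4), so the two maps $W_1$, $W_2$ and the metric factor $P$ never need to be estimated separately.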

Fig. 1. The processes of training and testing: (1) gait recognition; (2) face recognition. [Diagram: the training and testing samples of the two collections are projected by P1 and P2 into a unified space.] Fig. 1(1) shows the processes of the proposed method for gait recognition, which mainly consists of two phases: training and testing. In the training phase, we learn two linear transformation matrices for the collections of different walking states, so that all the individuals can be represented in a unified space. In the testing phase, the query sample and the registered set are projected into a common feature space, and the transformed coefficients are used as unified features for classification. Fig. 1(2) shows the similar processes of the proposed method for face recognition.

2.2. Improved metric learning approach

To find an embedding that preserves local information and to obtain a subspace that best detects the essential manifold structure, the criterion is defined as

$$J(P_1, P_2) = \sum_{i,j} \|P_1^T x_i - P_2^T y_j\|^2 S_{ij} \tag{5}$$

where the indices $i$ and $j$ run over the data points.

There are two ways to define the matrix $S$ with elements $S_{ij}$. One definition is the cosine similarity matrix

$$S_{ij} = \frac{x_i \cdot x_j}{\|x_i\| \, \|x_j\|} \tag{6}$$

The other is

$$S_{ij} = \exp\left(-\frac{\|x_i - x_j\|^2}{t}\right) \tag{7}$$

if $x_i$ is among the $k$ nearest neighbors of $x_j$ or $x_j$ is among the $k$ nearest neighbors of $x_i$; otherwise, $S_{ij} = 0$. Here, $k$ defines the local neighborhood.
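The two definitions of $S$ can be sketched in NumPy as follows. This is an illustrative sketch, not the authors' code: the function names, the column-per-sample layout and the toy data are assumptions, and the cosine variant is shown without the $k$-nearest-neighbor restriction for brevity.

```python
import numpy as np

def cosine_similarity(X):
    """Eq. (6): S_ij = x_i . x_j / (||x_i|| ||x_j||); columns of X are samples."""
    norms = np.linalg.norm(X, axis=0)          # assumes no all-zero sample
    return (X.T @ X) / np.outer(norms, norms)

def gaussian_knn_similarity(X, k, t):
    """Eq. (7): heat-kernel weights kept only for k-nearest-neighbour pairs."""
    M = X.shape[1]
    sq = np.sum(X ** 2, axis=0)
    D2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X.T @ X), 0.0)
    np.fill_diagonal(D2, 0.0)
    # k nearest neighbours of every point, excluding the point itself
    masked = D2 + np.diag(np.full(M, np.inf))
    nn = np.argsort(masked, axis=1)[:, :k]
    keep = np.zeros((M, M), dtype=bool)
    keep[np.repeat(np.arange(M), k), nn.ravel()] = True
    keep |= keep.T                  # x_i in kNN(x_j) or x_j in kNN(x_i)
    np.fill_diagonal(keep, True)    # diagonal entries exp(0) = 1 are kept
    return np.where(keep, np.exp(-D2 / t), 0.0)

rng = np.random.default_rng(1)
X = rng.standard_normal((5, 8))     # 8 hypothetical samples of dimension 5
S_cos = cosine_similarity(X)
S_gauss = gaussian_knn_similarity(X, k=2, t=50.0)
```

A larger $t$ flattens the Gaussian weights toward 1, while a smaller $k$ makes $S$ sparser; both parameters are tuned in the experiments of Section 3.1.1.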

The training set contains samples of normal GEIs and GEIs with a bag. They are vectorized and arranged as $X = [x_1, x_2, \ldots, x_M]$ and $Y = [y_1, y_2, \ldots, y_M]$, respectively, where $M$ is the number of samples of the normal GEIs or of the GEIs with a bag. $S$ can be obtained from two kinds of matrices: one is computed from the normal GEIs, the other from the GEIs with a bag. The selection of $S$ is discussed in the experiments of Section 3.1.1.

$F_h(S)$ and $F_v(S)$ are diagonal matrices whose entries are the row sums and the column sums of $S$, respectively. They provide natural measures on the data points.

$$F_h(S) = \begin{bmatrix} \sum_j S_{1j} & 0 & 0 \\ 0 & \ddots & 0 \\ 0 & 0 & \sum_j S_{Mj} \end{bmatrix} \tag{8}$$

$$F_v(S) = \begin{bmatrix} \sum_i S_{i1} & 0 & 0 \\ 0 & \ddots & 0 \\ 0 & 0 & \sum_i S_{iM} \end{bmatrix} \tag{9}$$

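Eq. (5) can be rewritten as follows:

$$J(P_1, P_2) = \operatorname{tr}\left( \begin{bmatrix} P_1 \\ P_2 \end{bmatrix}^T \begin{bmatrix} X & 0 \\ 0 & Y \end{bmatrix} \begin{bmatrix} F_h(S) & -S \\ -S^T & F_v(S) \end{bmatrix} \begin{bmatrix} X & 0 \\ 0 & Y \end{bmatrix}^T \begin{bmatrix} P_1 \\ P_2 \end{bmatrix} \right) \tag{10}$$

Letting

$$Q = \begin{bmatrix} P_1 \\ P_2 \end{bmatrix}, \quad Z = \begin{bmatrix} X & 0 \\ 0 & Y \end{bmatrix}, \quad \Theta = \begin{bmatrix} F_h(S) & -S \\ -S^T & F_v(S) \end{bmatrix},$$

Eq. (11) is an alternative expression of Eq. (10):

$$J(P_1, P_2) = \operatorname{tr}(Q^T Z \Theta Z^T Q) \tag{11}$$

The dimensions of $Q$, $Z$ and $\Theta$ are $2d \times m$, $2d \times 2M$ and $2M \times 2M$; therefore, the $d$-dimensional normal GEIs or GEIs with a bag are reduced to $m$ ($d > m$) dimensions. The solution that minimizes Eq. (11) is obtained by the generalized eigen-decomposition

$$(Z \Theta Z^T) q = \lambda (Z Z^T) q$$

taking the eigenvectors $q_2, q_3, \ldots, q_{m+1}$ corresponding to the second to $(m+1)$th smallest eigenvalues $\lambda_2, \lambda_3, \ldots, \lambda_{m+1}$, where $Z \Theta Z^T$ and $Z Z^T$ are both $2d \times 2d$. As the generalized eigen-decomposition is computed in a high dimensional space, the proposed method is likely to overfit the data. A regularization term is therefore used: $Z Z^T$ is replaced by $Z Z^T + tI$, where $t$ is an adjustment factor with a small positive real value, such as $t = 10^{-6}$. $P_1$ corresponds to the 1st to $d$th rows of $Q$, and $P_2$ corresponds to the $(d+1)$th to $2d$th rows of $Q$.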
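The solution procedure above can be sketched with NumPy alone. This is a hedged sketch under stated assumptions, not the authors' implementation: the function names, the block-diagonal layout of `Z`, the name `Theta` for the block matrix of Eq. (10), the Cholesky-based reduction to a standard eigenproblem and the toy data are all choices made here for illustration.

```python
import numpy as np

def learn_projections(X, Y, S, m, reg=1e-6):
    """Sketch of minimizing Eq. (11); X, Y are d x M, S is M x M."""
    d, M = X.shape
    Fh = np.diag(S.sum(axis=1))            # row sums of S, Eq. (8)
    Fv = np.diag(S.sum(axis=0))            # column sums of S, Eq. (9)
    Theta = np.block([[Fh, -S], [-S.T, Fv]])
    Z = np.block([[X, np.zeros((d, M))], [np.zeros((d, M)), Y]])
    A = Z @ Theta @ Z.T
    B = Z @ Z.T + reg * np.eye(2 * d)      # regularized right-hand matrix
    # Reduce A q = lambda B q to a standard symmetric problem via Cholesky.
    L = np.linalg.cholesky(B)
    C = np.linalg.solve(L, np.linalg.solve(L, A).T).T
    lam, V = np.linalg.eigh((C + C.T) / 2.0)   # eigenvalues in ascending order
    Q_all = np.linalg.solve(L.T, V)
    Q = Q_all[:, 1:m + 1]                  # 2nd to (m+1)th smallest eigenvalues
    return Q[:d], Q[d:], lam[1:m + 1], A, B

def objective(P1, P2, X, Y, S):
    """Direct evaluation of the criterion in Eq. (5)."""
    U, V = P1.T @ X, P2.T @ Y              # projected samples, m x M each
    D2 = ((U[:, :, None] - V[:, None, :]) ** 2).sum(axis=0)
    return float((D2 * S).sum())

rng = np.random.default_rng(2)
d, M, m = 4, 15, 2                         # toy sizes, not the paper's
X = rng.standard_normal((d, M))
Y = X + 0.3 * rng.standard_normal((d, M))  # hypothetical second collection
S = np.exp(-rng.random((M, M)))            # toy non-negative similarity
S = (S + S.T) / 2.0
P1, P2, lam, A, B = learn_projections(X, Y, S, m)
Q = np.vstack([P1, P2])
```

On this toy input, evaluating Eq. (5) directly with `objective` and evaluating the trace form `np.trace(Q.T @ A @ Q)` of Eq. (11) should agree up to round-off, which is a convenient sanity check on the block matrices.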

2.3. Feature extraction

We use the optimal projection vectors $Q = [q_2, q_3, \ldots, q_{m+1}] = \begin{bmatrix} P_1 \\ P_2 \end{bmatrix}$ for feature extraction. For a given normal GEI $A$ and a given GEI with a bag $B$, with vectorized versions $a$ and $b$, respectively, we have

$$y_a = P_1^T a \tag{12}$$

$$y_b = P_2^T b \tag{13}$$

We call $y_a$ the feature vector of the normal GEI $A$. Accordingly, $y_b$ is the feature vector of the GEI with a bag $B$.

2.4. Classification

The register samples are supposed to be normal GEIs. We use Eq. (12) to obtain the features of the register samples $y_{a1}, y_{a2}, \ldots, y_{aM}$. For a given test GEI with a bag $B'$, whose walking state must be known a priori, the vectorized version is $b'$ and its feature is $y_{b'} = P_2^T b'$. If $D(b', y_{ak}) = \min_j D(b', y_{aj})$, where the distance is evaluated in the unified space as in Eq. (4), then the resulting decision is that $B'$ belongs to the class of $y_{ak}$. The processes of feature extraction (training) and classification (testing) for gait recognition are shown in Fig. 1(1); the analogous processes for face recognition are shown in Fig. 1(2).
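Feature extraction and the nearest-neighbor decision can be sketched together as follows. The function name `classify`, the labels and the hand-picked matrices `P1`, `P2` are hypothetical and chosen only so that the result is easy to follow; in practice the two matrices would come from the eigen-decomposition of Section 2.2.

```python
import numpy as np

def classify(query, register, register_labels, P1, P2):
    """Nearest-neighbour decision in the unified space (Eqs. (12), (13), (4))."""
    y_reg = P1.T @ register            # features of the registered samples, m x M
    y_q = P2.T @ query                 # feature of the query sample, length m
    dists = np.linalg.norm(y_reg - y_q[:, None], axis=0)
    k = int(np.argmin(dists))
    return register_labels[k], dists

# Toy data: 3 registered samples in d = 4 dimensions, projected to m = 2.
P1 = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [0.0, 0.0]])
P2 = np.array([[0.0, 0.0], [0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
register = np.array([[0.0, 5.0, 9.0],
                     [0.0, 5.0, 1.0],
                     [0.0, 0.0, 0.0],
                     [0.0, 0.0, 0.0]])
labels = ["subject_a", "subject_b", "subject_c"]
query = np.array([7.0, 7.0, 4.8, 5.1])   # only its last two entries are used by P2
label, dists = classify(query, register, labels, P1, P2)
# label is "subject_b": the query projects to (4.8, 5.1), closest to (5, 5)
```

Note that the register and the query are projected by different matrices, which is what allows samples from two different collections to be compared.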

2.5. Computational burden analysis

We finish this section by analyzing the time complexity of the proposed method. The most expensive step is the generalized eigen-decomposition. Let $M$ be the number of data points in the training dataset and $d$ the dimension of the data. Since the two $2d \times 2d$ matrices of the generalized eigenproblem in Section 2.2 are built from correlated variables, PCA can be applied for dimension reduction before the proposed method, with time complexity $O(Md \cdot \min(M, d))$. Suppose the dimensionality of each datum is reduced to $d'$; the proposed method is then applied in the $d'$-dimensional space and takes $O(Md'^2)$, assuming $M > d'$. Hence, the total time complexity is $O(M(d \cdot \min(M, d) + d'^2))$.

3. Experiment and analysis

The proposed metric learning approach is used for gait and face recognition. It is tested on the CASIA(B) gait database [39] and the UMIST face database [40]. The CASIA(B) gait database is used to discuss the selection of S and to evaluate the performance of the proposed method under different walking states. The UMIST face database is used to evaluate the performance of the proposed method under variant face poses. Both databases are divided into two non-overlapping sets for training and testing.

3.1. Experiments on the CASIA(B) gait database

The CASIA(B) gait database contains gait video sequences of 124 subjects collected under 11 views and under clothing and bag-carrying conditions. Among all the views, the lateral view is selected as the research object. For each individual, two sequences are walking with a bag, two are walking in a coat, and the rest are normal gait. In our experiments, gait period detection is achieved by the robust dual-ellipse fitting approach [41], and GEIs of size 64x64 pixels are computed for all the gait sequences as in our previous work [42]. It is therefore difficult to compute the generalized eigen-decomposition of Section 2.2 directly, because both matrices involved are 8192x8192. This problem can be solved by applying PCA before distance metric learning. The dimension of the PCA projection space is M-1 in all the experiments, because mean subtraction is necessary for performing PCA, and the maximal rank of the covariance matrix computed by the singular value decomposition is M-1.
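The PCA preprocessing step and the M-1 rank bound can be illustrated with a short NumPy sketch. The function name `pca_fit` and the toy sizes are assumptions for illustration; the experiments in the paper use 4096-dimensional GEI vectors rather than the 50-dimensional random data below.

```python
import numpy as np

def pca_fit(data, n_components):
    """PCA via SVD; columns of `data` are samples (d x M)."""
    mean = data.mean(axis=1, keepdims=True)
    centered = data - mean                       # mean subtraction
    U, s, _ = np.linalg.svd(centered, full_matrices=False)
    return mean, U[:, :n_components], s

rng = np.random.default_rng(3)
d, M = 50, 8                                     # toy sizes with d > M
data = rng.standard_normal((d, M))
mean, W, s = pca_fit(data, n_components=M - 1)
reduced = W.T @ (data - mean)                    # (M - 1) x M features
```

After mean subtraction the M centered samples sum to zero, so they span at most M-1 directions; the last singular value in `s` is numerically zero and keeping M-1 components loses nothing about the training samples.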

3.1.1. Selection of S

In order to discuss the selection of S, we select the first gait sequence with a bag and the first normal gait sequence of each individual as the training set, with the normal ones as the register, while the second gait sequence with a bag of each individual is used as the query.

Let S take the 'Gauss' form of Eq. (7); the matrices calculated from the normal gait and from the gait with a bag are denoted S1 and S2, respectively. In addition, S3 = min{S1, S2}, where min{·} denotes the operation of taking the smaller value of the corresponding elements of the two matrices.
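The element-wise minimum that defines S3 is a one-line operation in NumPy. The two 3x3 matrices below are made-up values used only to show the effect; they are not taken from the experiments.

```python
import numpy as np

# Toy 3 x 3 similarity matrices (illustrative values, not from the experiments).
S1 = np.array([[1.0, 0.8, 0.0],
               [0.8, 1.0, 0.3],
               [0.0, 0.3, 1.0]])
S2 = np.array([[1.0, 0.5, 0.2],
               [0.5, 1.0, 0.6],
               [0.2, 0.6, 1.0]])
S3 = np.minimum(S1, S2)   # element-wise minimum of corresponding entries
# S3 is [[1.0, 0.5, 0.0], [0.5, 1.0, 0.3], [0.0, 0.3, 1.0]]
```

The diagonal stays at 1, while every off-diagonal entry is no larger than in either input, so a pair of samples keeps a nonzero weight only if it is a neighbor pair in both collections.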

The correct classification rates (CCRs) obtained with S set to S1, S2 and S3, for varying number k of nearest neighbors and scaling factor t, are given in Fig. 2(1)-(3). The similar results in Fig. 2(1) and (2) show that the top CCRs are both 90.32% under the condition of t = [50, 100] and different values of k. With the more flexible selection S3, a better CCR is obtained than with S1 or S2, reaching the best CCR of 95.16% at t = 150 and k = 4. The selection of S3 learns the best distance metric and improves the recognition performance for two reasons. (1) The intrinsic geometry and local structure of X are captured by S1, and those of Y by S2; a similarity matrix built from only X or only Y underfits. (2) All entries on the principal diagonal of the similarity matrix are equal to 1, and the others are non-negative but less than 1. S3 = min{S1, S2} restricts the off-diagonal entries to be sufficiently small, so that the similarity between one individual and another is small enough, while the similarity among samples of the same individual remains comparatively large. Hence, S3 yields the best recognition performance.

Let S take the 'cosine similarity' form of Eq. (6), again calculated as S1, S2 and S3 = min{S1, S2}. The CCR of S1, S2 and S3 over the number k of nearest neighbors is plotted in Fig. 3. Fig. 3 shows that 'cosine similarity' achieves a maximal CCR of 92.74%, which is inferior to that of the 'Gauss' form.

3.1.2. Test results

If S in Eq. (10) degenerates to an identity matrix, Eq. (10) takes the general form of side information. A comparative experiment compares the performance of the proposed method, with S in the 'Gauss' form and in the 'Side information' form, with that of Huang's method (PCA-CCA-RBF, where RBF denotes radial basis function) [43], PCA-RBF [43], NPE [17], PCA and LPP [16]. At classification, the query sample is matched against the register set. PCA is a mathematical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of uncorrelated variables. This transformation is defined in such a way that the first principal component has as high a variance as possible (that is, it accounts for as much of the variability in the data as possible), and each succeeding component in turn has the highest variance possible under the constraint that it be orthogonal to (uncorrelated with) the preceding components. Canonical correlation analysis (CCA) is a way of making sense of cross-covariance matrices; it finds linear combinations of two sets of variables that have maximum correlation with each other. By contrast, the proposed method aims to find transformation matrices that make the distances between point pairs of the same individual in different states (views or wearing conditions) as small as possible.

The training data consist of 248 GEIs: one registered normal GEI and one GEI with a bag per person, randomly selected. The remaining 124 GEIs with a bag are used for testing. We also use the cumulative match score (CMS) [44] across 30 random realizations of the training set to evaluate the performance of the above-mentioned methods. The CMS curves are plotted in Fig. 4; they answer the question "is the correct answer in the top n matches?" and show how many images have to be examined to reach a desired level of performance. The horizontal axis of the figure is the rank and the vertical axis is the probability of identification. We can see that the proposed method is indeed a more effective strategy for improving the recognition performance when there is a great difference in walking states between the register and the query.

Fig. 2. CCR over the variations of number k of nearest neighbors and scaling factor t in the form of 'Gauss': (1) S = S1; (2) S = S2; (3) S = S3. [Surface plots with axes k, t and CCR. Marked peaks: CCR = 0.9032 at k = 1, t = 50 in (1) and (2); CCR = 0.9516 at k = 4, t = 150 in (3).]

Fig. 3. CCR over the variations of number k of nearest neighbors in the form of 'cosine' similarity.

Fig. 4. CMS of the CASIA(B) gait database. [Axes: Rank (horizontal), CMS (vertical). Curves: our proposed method + 'Gauss', our proposed method + 'Side', Huang's method, PCA-RBF, NPE, PCA, LPP.]
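The cumulative match score described above can be computed from a matrix of query-to-register distances. The sketch below is an assumption-laden illustration: the function name `cumulative_match_scores`, the integer labels and the 3x3 distance matrix are made up, and the averaging over 30 random realizations used in the paper is omitted.

```python
import numpy as np

def cumulative_match_scores(dist, query_labels, gallery_labels, max_rank):
    """CMS(n): fraction of queries whose true identity is among the n closest gallery entries."""
    query_labels = np.asarray(query_labels)
    gallery_labels = np.asarray(gallery_labels)
    order = np.argsort(dist, axis=1)                 # gallery indices, nearest first
    ranked = gallery_labels[order]                   # labels sorted by distance
    hits = ranked == query_labels[:, None]
    first_hit = hits.argmax(axis=1)                  # 0-based rank of the correct match
    found = hits.any(axis=1)
    return np.array([np.mean(found & (n > first_hit)) for n in range(1, max_rank + 1)])

# Toy example: 3 queries against a gallery of 3 registered subjects.
dist = np.array([[0.1, 0.9, 0.5],    # query 0: correct match at rank 1
                 [0.4, 0.6, 0.2],    # query 1: correct match at rank 3
                 [0.7, 0.3, 0.5]])   # query 2: correct match at rank 2
cms = cumulative_match_scores(dist, [0, 1, 2], [0, 1, 2], max_rank=3)
# cms is [1/3, 2/3, 1.0]
```

The first entry of the curve is the rank-1 score, which for a nearest-neighbor classifier is the same quantity as the CCR reported elsewhere in this section.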

Side information, which transmits only 0 or 1 link messages for distance metric learning, is less effective than the Gauss kernel, whose values are decimals between 0 and 1; consequently, its recognition performance is inferior to that of the Gauss kernel.

The labels 'nm', 'bg' and 'cl' denote normal gait, gait with a bag, and gait wearing a coat, respectively. Because the feature clusters of 'nm' and 'bg' differ greatly, NPE, PCA and LPP, which match the query directly against the register set, have poor recognition performance.

Table 1 summarizes the CCR of our proposed method and the other methods on the CASIA(B) gait database, where the number in brackets is the number of samples per individual. The register sets include normal gait, gait with a bag, and gait wearing a coat. Every training set consists of two different walking states. Despite the lack of similarity between the register and the query, our proposed method has a higher recognition performance, whereas NPE, PCA and LPP perform poorly and cannot serve in this case. The CCR computed across 30 random realizations of the training set with side information is lower than or equal to that with Gauss in all experiments in Table 1. As can be seen, the CCR of Huang's method generally increases as the number of training samples increases. By contrast, there is no significant improvement for PCA-RBF, because only one 'bg' or 'cl' sample per person is paired with another sample, and at most 123 (i.e., M-1) features are employed to learn the RBF mappings even though the number of training samples increases.

Table 1. The top CCR of our proposed method and other methods for the CASIA(B) gait database. Columns Register, Training and Testing describe the sample set; Gauss and Side information are the two variants of our proposed method; the remaining columns are the other methods.

Register  Training     Testing  Gauss   Side information  Huang   PCA-RBF  NPE     PCA     LPP
nm        nm(1) bg(1)  bg       0.9516  0.9032            0.3710  0.3145   0.3306  0.3468  0.3629
nm        nm(1) cl(1)  cl       0.9516  0.9355            0.6532  0.3952   0.1694  0.1371  0.2177
nm        nm(2) bg(1)  bg       0.9516  0.9274            0.7661  0.2581   0.4758  0.3871  0.4597
nm        nm(2) cl(1)  cl       0.9758  0.9516            0.8145  0.3145   0.2177  0.1210  0.2581
nm        nm(3) bg(1)  bg       0.9355  0.8629            0.8387  0.2903   0.4435  0.3952  0.4758
nm        nm(3) cl(1)  cl       0.9597  0.9597            0.9113  0.3629   0.2339  0.1210  0.2903
nm        nm(4) bg(1)  bg       0.9194  0.8952            0.8387  0.2823   0.4597  0.3790  0.5081
nm        nm(4) cl(1)  cl       0.9597  0.9435            0.9274  0.2823   0.2581  0.1371  0.3387
nm        nm(5) bg(1)  bg       0.9274  0.8710            0.8387  0.2339   0.4516  0.3629  0.4919
nm        nm(5) cl(1)  cl       0.9758  0.9355            0.9274  0.2984   0.2581  0.1452  0.3226
nm        nm(6) bg(1)  bg       0.9274  0.8710            0.8387  0.2742   0.4919  0.3629  0.5242
nm        nm(6) cl(1)  cl       0.9597  0.9113            0.9274  0.2903   0.2742  0.1532  0.3548
bg        bg(1) cl(1)  cl       0.9516  0.8790            0.5645  0.3306   0.0645  0.0565  0.1210
bg        bg(2) cl(1)  cl       0.9677  0.9597            0.8387  0.2984   0.1129  0.0726  0.2500
cl        cl(1) bg(1)  bg       0.8871  0.8710            0.5726  0.2661   0.0645  0.0484  0.0887
cl        cl(2) bg(1)  bg       0.8871  0.8548            0.7258  0.2339   0.0968  0.0645  0.1048

Fig. 5. CMS of the UMIST face database: (1) Experiment 1; (2) Experiment 2. [Axes: Rank (horizontal), CMS (vertical).]

3.2. Experiments on the UMIST face database

The UMIST face database consists of 564 images of 20 people; every image is cropped to 112x92 pixels, and the images cover a range of poses from profile to frontal views. We select 9 face images with poses from profile to frontal view for each person. The first three, the middle three and the remaining three are grouped into the classes lateral face, oblique face and frontal face, labeled 'lf', 'of' and 'ff', respectively.

Experiment 1: one frontal face image and one lateral face image for each person are randomly selected as the training set, with the frontal ones as the register, while the remaining lateral face images are used as the queries.

Experiment 2: two frontal face images and two lateral face images for each person are randomly selected as the training set, again with the frontal ones as the register, while the remaining lateral image of each person is used as the query.

The difference between Experiment 2 and Experiment 1 is the number of similar pairs. Both experiments are conducted to test the CMS across 30 random realizations of the training set.

The CMS curves of the comparative experiments using our proposed method, Huang's method, PCA-RBF, NPE, PCA and LPP are plotted in Fig. 5, where (1) is for Experiment 1 and (2) is for Experiment 2.

With the register, training and corresponding testing sets varied, the top CCRs across 30 random realizations of the training set are listed in Table 2 for our proposed method (with the similarity matrix in the 'Gauss' form and as side information), Huang's method, PCA-RBF, NPE, PCA and LPP, where the number in brackets is the number of samples per individual. The results in Table 2 reconfirm our previous finding: the classification accuracies of our proposed method with 'Gauss' are at least as high as those of the other methods. The CCR with two similar pairs is higher than that with a single similar pair; as the number of similar pairs increases, there are more correlations between the register and the query, which possibly contributes to the higher CCR.

From Tables 1 and 2, we can see that the difference between the proposed method and the other metric learning algorithms is much bigger for the CASIA(B) gait problem than for the UMIST face problem, which is related to database scale and clustering. Moreover, it should be noted that PCA-RBF performs very differently on these two problems. In the gait problem, only one 'bg' or 'cl' sample per person, which is not enough, is paired with another sample, and at most C-1 (where C is the number of classes) features are employed to learn the RBF mappings. In the face recognition task, the face images of both poses are doubled for each individual, and 2C-1 features can be used to train the RBF model efficiently and accurately; as a result, PCA-RBF achieves better prediction and yields better performance.

Table 2. The top CCR of our proposed method and other methods for the UMIST face database. Columns Register, Training and Testing describe the sample set; Gauss and Side information are the two variants of our proposed method; the remaining columns are the other methods.

Register  Training     Testing  Gauss   Side information  Huang   PCA-RBF  NPE     PCA     LPP
ff        ff(1) lf(1)  lf       0.7500  0.6000            0.5250  0.3000   0.2750  0.3000  0.3750
ff        ff(1) of(1)  of       0.9750  0.9250            0.6250  0.6250   0.8250  0.7750  0.8500
lf        lf(1) of(1)  of       0.9750  0.8750            0.6500  0.7000   0.3250  0.3500  0.2750
lf        lf(1) ff(1)  ff       0.7750  0.6750            0.6500  0.4750   0.1750  0.1750  0.1250
of        of(1) ff(1)  ff       0.8250  0.6250            0.6000  0.4250   0.4500  0.5000  0.3750
of        of(1) lf(1)  lf       0.8000  0.5250            0.4500  0.5750   0.8000  0.7750  0.8000
ff        ff(2) lf(2)  lf       0.9500  0.6000            0.9500  0.9500   0.2000  0.4000  0.3500
ff        ff(2) of(2)  of       1.0000  0.9500            0.9000  0.9000   0.8500  0.8000  0.8500
lf        lf(2) of(2)  of       1.0000  1.0000            0.9500  0.9000   0.5500  0.4000  0.5000
lf        lf(2) ff(2)  ff       0.9000  0.6000            0.9000  0.8500   0.2500  0.2000  0.2500
of        of(2) ff(2)  ff       0.9000  0.6000            0.9000  0.7500   0.5000  0.5000  0.4500
of        of(2) lf(2)  lf       0.9500  0.8000            0.9500  0.9500   0.7500  0.8000  0.8000

Table 3. The average time (s) consumed by the 6 methods (CPU: Intel(R) Core(TM)2 Duo T8300 @ 2.40 GHz, 2.39 GHz; RAM: 1 GB).

Task              Our proposed method  Huang  PCA-RBF  NPE   PCA   LPP
Gait recognition  0.55                 15.75  15.67    0.54  0.26  0.40
Face recognition  0.08                 0.25   0.23     0.18  0.06  0.14

In addition, given a pair of samples for each individual in the training set and the remaining samples as queries, the average CPU time consumed by each method for the gait recognition and face recognition tasks is measured and given in Table 3. The proposed method needs less CPU time than Huang's method and PCA-RBF, and a comparable amount to NPE, PCA and LPP.

4. Conclusions

We have presented a distance metric learning method for gait recognition with different walking states and face recognition with variant face poses. The major advantage of the proposed method is that, unlike other metric learning methods, it requires computing only one generalized eigen-decomposition to obtain two linear transformation matrices simultaneously. One transformation matrix is used for the query sample, and the other is used for the register set. Therefore, even when the register and the query lack similarity, the proposed method achieves a higher recognition performance. Experimental results reveal that the proposed approach has a far better recognition rate than other classical metric learning approaches and the RBF-based methods available, and that it is suitable for real-time gait or face recognition applications. The multi-version (feature-space) generalization problem is decomposed into a series of pairwise problems. Of course, the way in which these separate pairwise subproblems are integrated is what transfers and establishes the relationship.
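The structure summarized above, two transformations obtained from a single generalized eigen-decomposition, can be written out schematically. The symbols $P_q$, $P_r$, $A$, $B$ and $d$ are notational assumptions introduced here for illustration; the actual matrices are built from the training pairs by the method's criterion and are not reproduced in this sketch.

```latex
% Generalized distance between a query sample x and a register sample y,
% each mapped by its own linear transformation (assumed notation).
d(x, y) = \left\| P_q^{\top} x - P_r^{\top} y \right\|_2

% Stack both transformations into a single unknown:
P = \begin{bmatrix} P_q \\ P_r \end{bmatrix}

% A criterion of the generic form (A, B symmetric, B positive definite)
\min_{P} \; \operatorname{tr}\!\left( P^{\top} A P \right)
\quad \text{s.t.} \quad P^{\top} B P = I

% is solved by the generalized eigenvectors with the smallest eigenvalues of
A \, p = \lambda \, B \, p

% The upper and lower blocks of each eigenvector p give the corresponding
% columns of P_q and P_r, so one decomposition yields both matrices.
```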

Although the extensive experiments on the benchmark databases show the superiority of the proposed method, there remains room to further improve the performance of the presented algorithm. The basic idea of this work can be further extended by using either Fisher discriminant analysis or large margin information. However, like other linear methods, the proposed algorithm is also limited by the assumption of linearity, and thus its performance depends fundamentally on the distribution of the nonlinear patterns. In ongoing work, we will study an improved biometrics technique based on a kernel metric learning approach and plan to explore the relationship between kernel methods and the proposed metric learning method.

Acknowledgment

We sincerely thank the Institute of Automation, Chinese Academy of Sciences for granting us permission to use the CASIA(B) gait database. This project is supported by the Natural Science Foundation of China (Grant No. 61172167), the National Science Foundation for Post-doctoral Scientists of China (Grant No. 20110491087), the Doctoral Fund of Ministry of Education of China (Grant No. 20102304110004), the Research Award Fund for Outstanding Middle-aged and Young Scientist of Shandong Province (Grant No. BS2010DX019) and the Independent Innovation Foundation of Shandong University (Grant No. 2012DX007). We also would like to thank H. Huang for providing the code of his method (Huang and He, 2011).

References

[1] K. Delac, M. Grgic, A survey of biometric recognition methods, Proceedings Elmar 2004, 46th International Symposium on Electronics in Marine, 2004, pp. 184–193.

[2] Q. Zhao, David Zhang, L. Zhang, N. Luo, High resolution partial fingerprint alignment using pore-valley descriptors, Pattern Recognit. 43 (3) (2010) 1050–1061.

[3] C. Sanchez-Avila, R. Sanchez-Reillo, Two different approaches for iris recognition using Gabor filters and multiscale zero-crossing representation, Pattern Recognit. 38 (2) (2005) 231–240.

[4] W. Yang, C. Sun, L. Zhang, A multi-manifold discriminant analysis method for image feature extraction, Pattern Recognit. 44 (8) (2011) 1649–1657.

[5] Z. Liu, Y. Yin, H. Wang, S. Song, Q. Li, Finger vein recognition with manifold learning, J. Network Comput. Appl. 33 (3) (2010) 275–282.

[6] A. Subramanya, M.L. Seltzer, A. Acero, Automatic removal of typed keystrokes from speech signals, IEEE Signal Process. Lett. 14 (5) (2007) 363–366.

[7] S. Sarkar, Zongyi Liu, Improved gait recognition by gait dynamics normalization, IEEE Trans. Pattern Anal. Mach. Intell. 28 (6) (2006) 863–876.

[8] C. Quek, R.W. Zhou, Antiforgery: a novel pseudo-outer product based fuzzy neural network driven signature verification system, Pattern Recognit. Lett. 23 (14) (2002) 1795–1816.

[9] Roman V. Yampolskiy, Venu Govindaraju, Behavioural biometrics: a survey and classification, Int. J. Biometrics 1 (1) (2008) 81–113.

[10] C. Domeniconi, D. Gunopulos, Jing Peng, Large margin nearest neighbor classifiers, IEEE Trans. Neural Networks 16 (4) (2005) 899–909.

[11] R. Duda, P. Hart, D. Stork, Pattern Classification, Wiley, 2000.

[12] P. Comon, Independent component analysis, a new concept, Signal Process. 36 (3) (1994) 287–314.


[13] J.B. Tenenbaum, V. de Silva, J.C. Langford, A global geometric framework for nonlinear dimensionality reduction, Science 290 (2000) 2319–2323.

[14] S. Roweis, L. Saul, Nonlinear dimensionality reduction by locally linear embedding, Science 290 (2000) 2323–2326.

[15] M. Belkin, P. Niyogi, Laplacian eigenmaps for dimensionality reduction and data representation, Neural Comput. 15 (6) (2003) 1373–1396.

[16] X. He, P. Niyogi, Locality preserving projections, Adv. Neural Inf. Process. Syst. (2004).

[17] X. He, D. Cai, S. Yan, H. Zhang, Neighborhood preserving embedding, In Proc. ICCV 2 (2005) 1208–1213.

[18] J. Yang, J. Yang, Why can LDA be performed in PCA transformed space, Pattern Recognit. 36 (2) (2003) 563–566.

[19] M. Sugiyama, Local fisher discriminant analysis for supervised dimensionality reduction, In Proc. ICML (2006) 905–912.

[20] J. Ye, T. Li, T. Xiong, R. Janardan, Using uncorrelated discriminant analysis for tissue classification with gene expression data, IEEE/ACM Trans. Comput. Biol. Bioinf. 4 (1) (2004) 181–190.

[21] J. Ye, Characterization of a family of algorithms for generalized discriminant analysis on undersampled problems, J. Mach. Learn. Res. 6 (4) (2005) 483–502.

[22] S. Noushath, G.H. Kumar, P. Shivakumar, Diagonal Fisher linear discriminant analysis for efficient face recognition, Neurocomputing 69 (13–15) (2006) 1711–1716.

[23] A.F. Bobick, J.W. Davis, The recognition of human movement using temporal templates, IEEE Trans. PAMI 23 (3) (2001) 257–267.

[24] Gao Youxing, Chen Shi, Gait recognition with wavelet moments of silhouette change images, J. Xi’an Jiaotong Univ. 43 (1) (2009) 90–94.

[25] Chen Shi, Ma Tian-jun, Huang Wan-hong, et al., A multi-layer windows method of moments for gait recognition, J. Electron. Inf. Technol. 31 (1) (2009) 116–119.

[26] Jianyi Liu, Nanning Zheng, Gait history image: a novel temporal template for gait, IEEE Int. Conf. Recognit. Multimedia Expo (2007) 663–666.

[27] T. Lam, R.A. Lee, New representation for human gait recognition: motion silhouette image MSI, Int. Conf. Biometrics (2006) 612–618.

[28] J. Yang, X. Wu, Z. Peng, Gait recognition based on difference motion slice, Proc. 8th Int. Conf. Signal Process. (2006) 16–20.

[29] J. Han, B. Bhanu, Individual recognition using gait energy image, IEEE Trans. Pattern Anal. Mach. Intell. 28 (2) (2006) 316–322.

[30] Xiaochao Yang, Yue Zhou, Tianhao Zhang, et al., Gait recognition based on dynamic region analysis, Signal Process. 88 (9) (2008) 2350–2356.

[31] Guo Chang Huang, Yun-Hong Wang, Human gait recognition based on X-T plane energy images, Int. Conf. Wavelet Anal. Pattern Recognit. (ICWAPR’07) 3 (2007) 1134–1138.

[32] K. Bashir, Tao Xiang, Shaogang Gong, Gait recognition without subject cooperation, Pattern Recognit. Lett. 31 (13) (2010) 2052–2060.

[33] H.W.L. Toby, K.H. Cheung, N.K.L. James, Gait flow image: a silhouette-based gait representation for human identification, Pattern Recognit. 44 (4) (2011) 973–987.

[34] A. Bar-Hillel, T. Hertz, N. Shental, D. Weinshall, Learning distance functions using equivalence relations, In Proc. ICML (2003) 11–18.

[35] G. Jacob, R. Sam, H. Geoffrey, S. Ruslan, Neighbourhood components analysis, In Proc. NIPS (2005) 13–18.

[36] K. Weinberger, J. Blitzer, L. Saul, Distance metric learning for large margin nearest neighbor classification, Adv. Neural Inf. Process. Syst. 18 (2006) 1473–1480.

[37] S. Xiang, F. Nie, C. Zhang, Learning a Mahalanobis distance metric for data clustering and classification, Pattern Recognit. 41 (12) (2008) 3600–3612.

[38] M. Wang, B. Liu, J. Tang, X. Hua, Metric learning with feature decomposition for image categorization, Neurocomputing 73 (10–12) (2010) 1562–1569.

[39] S. Yu, D. Tan, T. Tan, A framework for evaluating the effect of view angle, clothing and carrying condition on gait recognition, Proc. 18th Int. Conf. Pattern Recognit., Hong Kong, China (2006) 441–444.

[40] H. Wechsler, P.J. Phillips, V. Bruce, F. Fogelman-Soulie, T.S. Huang (Eds.), NATO ASI Series F, Vol. 163, 1998, pp. 446–456.

[41] X.Y. Ben, W.X. Meng, R. Yan, Dual-ellipse fitting approach for robust gait periodicity detection, Neurocomputing 79 (2012) 173–178, March.

[42] BEN Xianye, WANG Kejun, YAN Rui, POPOOLA Oluwatoyin Pius, Subpattern-based complete two dimensional principal component analysis for gait recognition, Proc. China Assoc. Sci. Technol. 7 (2011) 16–22.

[43] H. Huang, H. He, Super-resolution method for face recognition using nonlinear mappings on coherent features, IEEE Trans. Neural Networks 22 (1) (2011) 121–130.

[44] P.J. Phillips, Hyeonjoon Moon, S.A. Rizvi, P.J. Rauss, The FERET evaluation methodology for face-recognition algorithms, IEEE Trans. Pattern Anal. Mach. Intell. 22 (10) (2000) 1090–1104.

Xianye Ben was born in Harbin, China, in 1983. She received the B.S. degree in electrical engineering and automation from the College of Automation, Harbin Engineering University, Harbin, China, in 2006, and the Ph.D. degree in pattern recognition and intelligent system from the College of Automation, Harbin Engineering University, Harbin, in 2010. She is currently working as an Assistant Professor in the School of Information Science and Engineering, Shandong University, Jinan, China. She has published more than 40 papers in major journals and conferences. Her current research interests include pattern recognition, digital image processing and analysis, machine learning.

Weixiao Meng was born in Harbin, China, in 1968. He received his B.S. degree in Electronic Instrument and Measurement Technology from Harbin Institute of Technology (HIT), China, in 1990. And then he obtained the M.S. and Ph.D. degree, both in Communication and Information System, HIT, in 1995 and 2000, respectively. Now he is a professor in School of Electronics and Communication Engineering, HIT. Besides, he is a senior member of IEEE, a senior member of China Institute of Electronics, China Institute of Communication and Expert Advisory Group on Harbin E-Government. His research interests mainly focus on adaptive signal processing. In recent years, he has published 1 authored book and more than 100 academic papers on journals and international conferences, more than 60 of which were indexed by SCI, EI and ISTP. Up to now, he has totally completed more than 20 research projects and holds 6 China patents. 1 standard proposal was accepted by IMT-Advanced technical group.

Rui Yan was born in Jilin, China, in 1988. He received the B.S. degree in automation from the College of Automation, Harbin Engineering University, Harbin, China, in 2011. He is currently a Ph.D. student in the Computer Science Department, Rensselaer Polytechnic Institute. His current research interests include pattern recognition and semantic web.

Kejun Wang was born in Jilin, China, in 1962. He received his Ph.D. degree in Special auxiliary ships, marine equipment and systems from Harbin Engineering University in 1995. From 1996 to 1998, he was a Postdoctoral Research Fellow in Fluid Power Transmission and Control at Harbin Institute of Technology. He is now a professor and doctoral supervisor at College of Automation in Harbin Engineering University. He has held and participated in many projects such as fingerprinting recognition and has published more than 80 refereed journal papers. His current research interests include biometrics and pattern recognition and intelligent system.
