

Neurocomputing 120 (2013) 577–589


Kernel coupled distance metric learning for gait recognition and face recognition

Xianye Ben a,*, Weixiao Meng b, Rui Yan c, Kejun Wang d

a School of Information Science and Engineering, Shandong University, Jinan 250100, China
b School of Electronics and Information Engineering, Harbin Institute of Technology, Harbin 150080, China
c Computer Science Department, Rensselaer Polytechnic Institute, Troy, NY 12180, USA
d College of Automation, Harbin Engineering University, Harbin 150001, China

Article info

Article history: Received 20 February 2012; Received in revised form 3 April 2013; Accepted 7 April 2013; Available online 24 May 2013.

Communicated by L. Shao

Keywords: Kernel coupled distance metric learning (KCDML); Gait recognition; Face recognition; Different walking states; Variant face pose; Variant resolution

http://dx.doi.org/10.1016/j.neucom.2013.04.012

* Corresponding author. Tel.: +86 15254130623. E-mail address: [email protected] (X. Ben).

Abstract

The performance of biometrics may be adversely impacted by different walking states, walking directions and resolutions of gait sequence images, as well as by pose variation and low resolution of face images. To address these problems, we present a kernel coupled distance metric learning (KCDML) method that considers matching among different data collections. Using a kernel trick and a specialized locality preserving criterion, we formulate kernel coupled distance metric learning as an optimization problem whose aims are to keep pair-wise samples as close as possible and to preserve the local structure of the intrinsic data geometry. Instead of an iterative solution, a single generalized eigen-decomposition can be leveraged to compute the two transformation matrices for the two classes of data sets. The effectiveness of the proposed method is demonstrated empirically on gait and face recognition tasks, where it outperforms four linear subspace solutions (i.e., CDML, PCA, LPP, LDA) and four nonlinear subspace solutions (i.e., Huang's method, PCA-RBF, KPCA, KLPP).

1. Introduction

A suitable distance function or metric is significant for many real-world applications involving high-dimensional data, such as image annotation [1], image retrieval [2], image segmentation [3], text document classification [4], handwritten digit recognition [5], content-based copy detection [6], phoneme classification [5], mass spectrum data mining [7], gene classification [8], bioinformatics applications [9], fingerprint identification [10], face identification [11], behavior recognition [12] and many others. If the input space is assumed to be homogeneous, the Euclidean distance is simple and commonly used in the original space. However, due to the curse of dimensionality this assumption rarely holds, and recent years have witnessed intensive research on metric learning, which does not rely on it.

A simple but effective strategy is to replace Euclidean distances with so-called Mahalanobis distances. Work in this area incorporates various linear and nonlinear methods; the former includes principal component analysis (PCA), linear discriminant analysis (LDA) and locality preserving projection (LPP) [13], while the latter involves isometric feature mapping (ISOMAP) [14], locally linear embedding (LLE) [15] and Laplacian Eigenmaps (LE) [16]. Lu et al. [17] presented a general framework that offers a unified view for understanding and explaining dimension reduction algorithms such as PCA, LDA, ISOMAP, LLE, LE, LPP, Neighborhood Preserving Embedding (NPE) and Marginal Fisher Analysis (MFA). The transfer matrix is usually viewed as a distance metric. The research of Li et al. [18] showed that the problem of distance metric learning can be solved by spectral dimensionality reduction methods with label information injected under the Euclidean assumption. Learning a Riemannian metric is also related to finding a lower dimensional representation of a data set.

Among the advantages of learning a global Mahalanobis metric is that the learning can often be formulated as a convex optimization problem with no local optima; such problems can be solved by efficient algorithms [19]. Class label information is available for metric learning in supervised learning tasks [20-23], whereas such information is not available in unsupervised learning tasks [24]. In addition, semi-supervised tasks [25-28] can learn with both labeled and unlabeled data. Xing et al. [29] proposed a convex optimization problem to learn a global Mahalanobis metric according to pair-wise constraints. Bar-Hillel et al. [30] devised relevant component analysis (RCA) for learning a Mahalanobis metric; it is a non-iterative algorithm but only incorporates positive constraints.


Bing et al. [31] reformulated the semi-definite programming (SDP) problems as smooth convex nonlinear programming (NLP) problems with much simpler constraints for performing metric learning tasks. Kumar et al. [32] proposed a metric learning algorithm that simultaneously learns the underlying dissimilarity measure while clustering the given data set, using triplet constraints expressing relative comparisons. He et al. [33] adopted an extension of non-negative matrix factorization to measure the between-class similarity of two patterns. Cevikalp [26] used pair-wise equivalence (similarity and dissimilarity) constraints to improve the original distance metric in lower dimensional input spaces, formulating distance metric learning as a quadratic optimization problem that returns a globally optimal solution. Li et al. [34] added more weight to sample pairs on the boundary, which are hard to classify, and defined a new objective function for optimization.

However, constructing a Mahalanobis distance metric may also run into performance problems because it involves estimating many parameters. Metric learning is generally accomplished based on the pair-wise constraints among training samples, so over-fitting may be induced if the number of training samples is not large enough, and performance will then degrade significantly [35]. Moreover, although lower dimensional representations are useful for visualizing high-dimensional data, the obtained sub-manifold is tuned to the training data, and new data points will likely lie outside the sub-manifold due to noise, cropping, distortion, and even samples artificially captured from multiple camera views. The aforementioned approaches do not take into account the relationship between the modalities, which can actually benefit the learning process. Therefore, it is necessary to specify some way of projecting the off-manifold points onto the manifold via metric learning.

Li et al. [36] proposed coupled locality preserving mappings, but the shortcoming of that algorithm is that its generalized eigen-decomposition is too computationally expensive for practical use. Deng et al. [37] proposed regularized coupled mappings; however, this algorithm is sensitive to the balance controlling parameter. In our earlier work [38], we presented an improved biometrics technique based on a metric learning approach for gait recognition and face recognition, and showed basic results on the recognition performance under the condition that the feature of the query is very different from that of the register for a given individual. The criterion of that metric learning method [38] was defined by finding an embedding that preserves local information and obtaining a subspace that best detects the essential manifold structure, in the linear space. The approach is therefore limited by the assumption of linearity and is not applicable to nonlinear situations; in addition, its performance depends fundamentally on the distribution of nonlinear patterns. This paper largely extends [38] by proposing a new kernel coupled distance metric learning approach for gait recognition and face recognition, and reports recognition performance on inconsistent matching issues such as matching between any two kinds of gaits with different walking states, matching between any two kinds of gaits with different walking directions, matching between gaits of various resolutions, matching between face images under variant poses, and matching between face images of various resolutions, even combined with different poses.

Compared with the conventional methods mentioned above and with our previous work [38], the difference of the proposed approach lies in computing two transformation matrices for two different kinds of modalities (for example, gait images with different walking directions, gait images of various resolutions, face images under variant poses, and face images of various resolutions, even combined with different poses) in the nonlinear space. The major contributions of this paper are highlighted as follows:

1. We propose a novel kernel coupled distance metric learning approach, KCDML, supported by side information and by the inherent neighborhood structures among examples with a set of similar pair-wise constraints. Moreover, this distance metric is able to describe nonlinear pattern distributions.

2. We conduct a comprehensive study applying the KCDML technique to two applications: gait recognition and face recognition. As is well known, both are difficult because face images vary with scale, pose, expression, etc., and gait images vary with scale, view angle, clothing and carrying conditions. This research therefore focuses on these influence factors. We apply KCDML to make gait images with different walking directions, gait images of various resolutions, face images under variant poses, face images of various resolutions, and even face images with both various resolutions and different poses more consistent, respectively.

The rest of this paper is organized as follows: in Section 2 we give a brief mathematical description of three definitions related to distance metrics. The proposed KCDML, including the problem definition, the distance metric criterion, the decoupling, the algorithmic procedure and the whole classification process, is detailed in Section 3. We compare the proposed method with other approaches on the CASIA(B) gait database and the UMIST face database in Section 4. Finally, Section 5 presents conclusions and future work.

2. Preliminaries

In this section, we introduce three definitions of distance functions. We use the notation $\mathbb{R}_+ = [0, +\infty)$.

Definition 1. A metric is a function $D : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}_+$ that satisfies the following conditions for all $x_1, x_2, x_3 \in \mathbb{R}^n$:

1. $D(x_1, x_2) = 0 \Leftrightarrow x_1 = x_2$
2. $D(x_1, x_2) = D(x_2, x_1)$
3. Triangle inequality: $D(x_1, x_2) + D(x_2, x_3) \ge D(x_1, x_3)$

Definition 2. The more general metric $D_A$ rescales the coordinates or their combinations. Such a metric can be expressed by a positive semi-definite matrix $A$, with $x^T A x \ge 0$ for all $x$. A positive semi-definite matrix can always be expressed as $A = S^T S$ for some $S$, and hence the metric is

$$D_A(x, y) = (x - y)^T A (x - y) = (x - y)^T S^T S (x - y) = (Sx - Sy)^T (Sx - Sy) = (x' - y')^T (x' - y') \quad (1)$$

where $x' = Sx$ and $y' = Sy$. Hence, the metric is equivalent to linear feature extraction with a matrix $S$, followed by the standard Euclidean metric.

Definition 3. The Riemannian metric is the most general metric, where the matrix $A$ depends on the location, and the distance is

$$D_{A(x)}(x, y) = (x - y)^T A(x) (x - y) \quad (2)$$

The metric is constructed by the learning metrics principle. Therefore, a key problem in metric learning for pattern recognition and signal processing is to find a suitable objective function.


3. Kernel coupled distance metric learning

In this section, we propose a novel kernel coupled distance metric learning method that uses side information as well as the inherent neighborhood structures among examples with a set of similar pair-wise constraints to find an appropriate distance function. We first address the problem of computing a generalized distance between samples from two different sets of image matrices. Afterwards, we present a kernel coupled distance metric criterion and establish the optimization objective, which can be transformed into an eigen-decomposition problem in the nonlinear space. The algorithmic procedure is then formally stated. Finally, we choose the nearest neighbor classifier or SVM (linear SVM and RBF SVM) as the final classification tool to predict the class of a query sample from the registration list in the presence of inconsistent matching.

3.1. Problem definition

Suppose we are given two sets of image matrices $X_1, X_2, \ldots, X_M$ and $Y_1, Y_2, \ldots, Y_M$. Conventionally, each image matrix is converted into a vector before metric learning. The data $x_1, x_2, \ldots, x_M$ and $y_1, y_2, \ldots, y_M$ are mapped to a higher dimensional feature space by a map $\phi : \mathbb{R}^N \to F$, giving $\phi(x_1), \phi(x_2), \ldots, \phi(x_M)$ and $\phi(y_1), \phi(y_2), \ldots, \phi(y_M)$. Assume that both sample sets are centered, i.e., $\sum_{i=1}^{M} \phi(x_i) = 0$ and $\sum_{i=1}^{M} \phi(y_i) = 0$.

Before formally describing the kernel coupled distance metric learning (KCDML) algorithm, we give some terminology on distance metric operations.

A generalized distance $D_C(x_i, y_j)$ between $x_i$ and $y_j$ is a function $X \times Y \to \mathbb{R}$, where $X \subset \mathbb{R}^m$ and $Y \subset \mathbb{R}^n$ are two different sample sets of $m$-dimensional and $n$-dimensional vectors respectively:

$$D_C(x_i, y_j) = \|x_i - y_j\|_C = \sqrt{[f(x_i) - g(y_j)]^T [f(x_i) - g(y_j)]} \quad (3)$$

Here, $\|\cdot\|_C$ is defined as a certain norm form, and $f(\cdot)$, $g(\cdot)$ are two transformation functions which unify $X$ and $Y$ into the same coupled space, i.e., they transform $x_i$ and $y_j$ from $\mathbb{R}^m$ and $\mathbb{R}^n$ respectively into $\mathbb{R}^d$. Our proposed distance metric is learnt by finding these two transformation functions $f(\cdot)$ and $g(\cdot)$.

3.2. Kernel coupled distance metric criterion

To obtain a coupled subspace, an effective way is to keep pair-wise samples as close as possible while preserving the local structure of the intrinsic data geometry. For historical and computational reasons, side information [25,39] that optimally establishes associations between two classes of data sets has been broadly applied. In this paper, the side information as well as the inherent neighborhood structures among examples with a set of similar pair-wise constraints $(i, j) \in \mathbb{C}$ is exploited; $(i, j) \in \mathbb{C}$ means that $i$ and $j$ belong to the same class. Choosing the similarity matrix $S = I$, whose $(i, j)$-th entry is $S_{ij}$, we define a specialized locality preserving criterion as follows:

$$J(P_1, P_2) = \sum_{i,j} \|P_1^T \phi(x_i) - P_2^T \phi(y_j)\|^2 S_{ij} \quad (4)$$

where $P_1$ and $P_2$ are transformation matrices in the kernel space. The objective function of the proposed algorithm is optimized to minimize Eq. (4), which can be rewritten as

$$J(P_1, P_2) = \operatorname{tr}\left( \begin{bmatrix} P_1 \\ P_2 \end{bmatrix}^T \begin{bmatrix} \phi(X) & 0 \\ 0 & \phi(Y) \end{bmatrix} \begin{bmatrix} I & -I \\ -I & I \end{bmatrix} \begin{bmatrix} \phi(X) & 0 \\ 0 & \phi(Y) \end{bmatrix}^T \begin{bmatrix} P_1 \\ P_2 \end{bmatrix} \right) \quad (5)$$

where $\phi(X) = [\phi(x_1), \phi(x_2), \ldots, \phi(x_M)]$ and $\phi(Y) = [\phi(y_1), \phi(y_2), \ldots, \phi(y_M)]$.

Following the conventional approach in kernel analysis, the $M$ samples $\{\phi(x_i), i = 1, 2, \ldots, M\}$ span the kernel feature space for $P_1$, and the $M$ samples $\{\phi(y_j), j = 1, 2, \ldots, M\}$ span the kernel feature space for $P_2$. Therefore there exist coefficients $\alpha_1, \alpha_2, \ldots, \alpha_M$ and $\beta_1, \beta_2, \ldots, \beta_M$ such that

$$P_1 = \sum_{i=1}^{M} \alpha_i \phi(x_i) \quad (6)$$

$$P_2 = \sum_{i=1}^{M} \beta_i \phi(y_i) \quad (7)$$

Substituting Eqs. (6) and (7) into Eq. (5), and noting that $\phi(X)^T \phi(X) = K_x$ and $\phi(Y)^T \phi(Y) = K_y$, the objective function becomes

$$J(\alpha, \beta) = \operatorname{tr}\left( \begin{bmatrix} \alpha \\ \beta \end{bmatrix}^T \begin{bmatrix} K_x & 0 \\ 0 & K_y \end{bmatrix} \begin{bmatrix} I & -I \\ -I & I \end{bmatrix} \begin{bmatrix} K_x & 0 \\ 0 & K_y \end{bmatrix}^T \begin{bmatrix} \alpha \\ \beta \end{bmatrix} \right) \quad (8)$$

We obtain $K_x$ and $K_y$ by using the kernel function defined in Eqs. (9) and (10), which implicitly computes the dot products of the mapped samples in the higher dimensional space:

$$K_x = \begin{bmatrix} \langle\phi(x_1), \phi(x_1)\rangle & \langle\phi(x_1), \phi(x_2)\rangle & \cdots & \langle\phi(x_1), \phi(x_M)\rangle \\ \langle\phi(x_2), \phi(x_1)\rangle & \langle\phi(x_2), \phi(x_2)\rangle & \cdots & \langle\phi(x_2), \phi(x_M)\rangle \\ \vdots & \vdots & \ddots & \vdots \\ \langle\phi(x_M), \phi(x_1)\rangle & \langle\phi(x_M), \phi(x_2)\rangle & \cdots & \langle\phi(x_M), \phi(x_M)\rangle \end{bmatrix} \quad (9)$$

$$K_y = \begin{bmatrix} \langle\phi(y_1), \phi(y_1)\rangle & \langle\phi(y_1), \phi(y_2)\rangle & \cdots & \langle\phi(y_1), \phi(y_M)\rangle \\ \langle\phi(y_2), \phi(y_1)\rangle & \langle\phi(y_2), \phi(y_2)\rangle & \cdots & \langle\phi(y_2), \phi(y_M)\rangle \\ \vdots & \vdots & \ddots & \vdots \\ \langle\phi(y_M), \phi(y_1)\rangle & \langle\phi(y_M), \phi(y_2)\rangle & \cdots & \langle\phi(y_M), \phi(y_M)\rangle \end{bmatrix} \quad (10)$$

It should be noted that real-world sample data do not always satisfy $\sum_{i=1}^{M} \phi(x_i) = 0$ or $\sum_{i=1}^{M} \phi(y_i) = 0$; in that case $K_x$ and $K_y$ in Eqs. (9) and (10) are replaced by

$$\tilde{K}_x = K_x - 1_M K_x - K_x 1_M + 1_M K_x 1_M \quad (11)$$

$$\tilde{K}_y = K_y - 1_M K_y - K_y 1_M + 1_M K_y 1_M \quad (12)$$

where $1_M$ is the $M \times M$ matrix whose every entry is $1/M$.

The most commonly used kernel functions are the Gaussian kernel, the polynomial kernel and the sigmoid kernel. The Gaussian kernel is used in the experiments of this work, and each entry of $K_x$ and $K_y$ is calculated by

$$K_x(i, j) = \exp\left( -\frac{\|x_i - x_j\|^2}{2s^2} \right) \quad (13)$$

$$K_y(i, j) = \exp\left( -\frac{\|y_i - y_j\|^2}{2s^2} \right) \quad (14)$$
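For concreteness, here is a minimal NumPy sketch (our illustration, not the authors' code) of Eqs. (9)-(14): building a Gaussian kernel matrix and centering it as in Eqs. (11)-(12). The function names and the toy data are assumptions.

```python
import numpy as np

def gaussian_kernel(X, s):
    """Gaussian kernel matrix, Eq. (13): X is M x N, one sample per row."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T   # pairwise ||x_i - x_j||^2
    return np.exp(-d2 / (2.0 * s ** 2))

def center_kernel(K):
    """Kernel centering, Eq. (11): K~ = K - 1M K - K 1M + 1M K 1M."""
    M = K.shape[0]
    one = np.full((M, M), 1.0 / M)                   # 1M: all entries 1/M
    return K - one @ K - K @ one + one @ K @ one

# toy usage with assumed data
rng = np.random.default_rng(0)
X = rng.standard_normal((6, 4))
Kx_t = center_kernel(gaussian_kernel(X, s=1.0))      # centered K~_x
```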

The basic idea of the kernel coupled distance is in some ways similar to that of context-dependent kernels [40]; their criterion also measures a kind of alignment, namely that pair-wise samples stay as close as possible. However, the context-dependent similarity function is defined as the fixed point of (1) an energy function used to balance a fidelity term, (2) a context criterion and (3) an entropy term. The first term is inversely proportional to the expectation of the Euclidean distance between the most likely aligned spatial interest points. The second is proportional to the alignment scores of all pair-wise samples with respect to the given spatial interest points. The last can be viewed as a smoothing regularizer which controls the uncertainty and decision thresholds. By contrast, the proposed specialized locality preserving criterion keeps pair-wise samples as close as possible and preserves the local structure of the intrinsic data geometry at the same time.

Finally, letting

$$W = \begin{bmatrix} \alpha \\ \beta \end{bmatrix}, \quad Z = \begin{bmatrix} K_x & 0 \\ 0 & K_y \end{bmatrix}, \quad \Theta = \begin{bmatrix} I & -I \\ -I & I \end{bmatrix},$$

the objective function in Eq. (8) can be rewritten as

$$J(\alpha, \beta) = \operatorname{tr}(W^T Z \Theta Z^T W) \quad (15)$$

3.3. Decoupling of the criterion function

Commonly, $ZZ^T$ may not be invertible, so we adjust $ZZ^T$ to $ZZ^T + \tau I$, where $\tau$ is an adjusting factor with a small positive real value, such as $\tau = 10^{-7}$. The dimensions of $Z\Theta Z^T$ and $ZZ^T$ are both $2M \times 2M$. The transformation matrix $W$ that minimizes Eq. (15) is given by the minimum eigenvalue solutions of the generalized eigen-decomposition problem

$$(Z\Theta Z^T) W = \lambda (ZZ^T) W \quad (16)$$

Eq. (15) is minimized when $W$ is composed of the eigenvectors corresponding to the $d$ smallest eigenvalues of Eq. (16). Clearly, $\alpha \in \mathbb{R}^{M \times d}$ corresponds to rows 1 through $M$ of $W$, and $\beta \in \mathbb{R}^{M \times d}$ corresponds to rows $M + 1$ through $2M$ of $W$.
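A minimal sketch of this decoupling step (our illustration; variable names are assumptions), using SciPy's symmetric generalized eigensolver for Eq. (16):

```python
import numpy as np
from scipy.linalg import eigh

def kcdml_train(Kx_t, Ky_t, d, tau=1e-7):
    """Solve (Z Theta Z^T) W = lambda (Z Z^T) W, Eq. (16), and split W into alpha, beta.
    Kx_t, Ky_t: centered M x M kernel matrices from Eqs. (11)-(12)."""
    M = Kx_t.shape[0]
    Z = np.block([[Kx_t, np.zeros((M, M))],
                  [np.zeros((M, M)), Ky_t]])          # block-diagonal Z
    I = np.eye(M)
    Theta = np.block([[I, -I], [-I, I]])
    A = Z @ Theta @ Z.T
    B = Z @ Z.T + tau * np.eye(2 * M)                 # regularization, Section 3.3
    vals, vecs = eigh(A, B)                           # eigenvalues in ascending order
    W = vecs[:, :d]                                   # d smallest eigenvalues
    alpha, beta = W[:M, :], W[M:, :]                  # rows 1..M and M+1..2M
    return alpha, beta
```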

3.4. The algorithm

The algorithmic procedure is formally stated below:

1. All the image matrices are converted into vectors and then realigned as $X = [x_1, x_2, \ldots, x_M]$ and $Y = [y_1, y_2, \ldots, y_M]$ for the two different kinds of data.

2. $X$ and $Y$ are reformulated in terms of dot products only, as in Eqs. (9) and (10), and $\tilde{K}_x$ and $\tilde{K}_y$ are computed by Eqs. (11) and (12).

3. The projections are computed by solving the generalized eigen-decomposition problem of Eq. (16). It is easy to check that $W = \begin{bmatrix} \alpha \\ \beta \end{bmatrix}$.

4. $X$ and $Y$ are coupled into the same unified kernel space, namely

$$F_x = P_1^T \phi(X) = \alpha^T \tilde{K}_x \quad (17)$$

$$F_y = P_2^T \phi(Y) = \beta^T \tilde{K}_y \quad (18)$$

where $F_x = [f_{x1}, f_{x2}, \ldots, f_{xM}]$ and $F_y = [f_{y1}, f_{y2}, \ldots, f_{yM}]$, whose entries $f_{xi}$ $(i = 1, \ldots, M)$ and $f_{yj}$ $(j = 1, \ldots, M)$ correspond to the features of $x_i$ and $y_j$.

These steps are illustrated in the training section of Fig. 1.

Fig. 1. The flowchart of KCDML for classification. (In the training section, the bigger gait energy images (GEIs) are register samples, while in the testing section, the smaller GEI is a query sample.)

3.5. Kernel coupled distance metric learning for classification

The classification process includes training and testing phases. The former is discussed in Section 3.4. The testing phase consists of query sample transformation and feature matching. In detail, the query sample is first realigned into a single vector recorded as $y'$, and then mapped from the original input space to a higher (or even infinite) dimensional feature space as $\phi(y')$. The kernel coupled feature $f'_y$ for this sample is derived as follows:

$$f'_y = P_2^T \phi(y') = \beta^T \tilde{K}'_y \quad (19)$$

where $\tilde{K}'_y$, the centralized form of $K'_y = [\phi(y_1) \cdot \phi(y'), \phi(y_2) \cdot \phi(y'), \ldots, \phi(y_M) \cdot \phi(y')]^T$, is computed as

$$\tilde{K}'_y = K'_y - K_y 1_{1 \times M}^T - 1_M K'_y + 1_M K_y 1_{1 \times M}^T \quad (20)$$

where $1_{1 \times M}$ is the $1 \times M$ matrix whose every entry is $1/M$.

After feature extraction by KCDML, the nearest neighbor classifier is used to predict the class of the query sample via

$$c = \arg\min_j D(f_{xj}, f'_y) \quad (21)$$

i.e., $f'_y$ is closest to $f_{xc}$, and therefore its class is taken to be the class of $f_{xc}$. The whole process of classification is shown in Fig. 1.

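A companion sketch of the testing phase, Eqs. (19)-(21) (again our illustration, under the same assumed names as the training sketch above; the Gaussian kernel of Eq. (13) is used for the query-to-training evaluations):

```python
import numpy as np

def kcdml_test_feature(y_query, Y_train, Ky, beta, s):
    """Project a query sample into the coupled space, Eqs. (19)-(20).
    Y_train: M x N training samples (rows); Ky: uncentered training kernel."""
    M = Y_train.shape[0]
    # K'_y: kernel between each training sample and the query, Eq. (13)
    k = np.exp(-np.sum((Y_train - y_query) ** 2, axis=1) / (2.0 * s ** 2))
    one_col = np.full(M, 1.0 / M)                     # 1_{1xM}^T as a vector
    one_MM = np.full((M, M), 1.0 / M)                 # 1_M
    # centering of the test kernel vector, Eq. (20)
    k_t = k - Ky @ one_col - one_MM @ k + one_MM @ (Ky @ one_col)
    return beta.T @ k_t                               # f'_y, Eq. (19)

def nearest_neighbor(Fx, f_query):
    """Eq. (21): index of the register feature closest to the query feature.
    Fx: register features, one row per sample, e.g. (alpha.T @ Kx_t).T from Eq. (17)."""
    d = np.linalg.norm(Fx - f_query[None, :], axis=1)
    return int(np.argmin(d))
```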

Fig. 3. Effects of feature dimension and s on the recognition precision in KCDML.

4. Experiments

We report a set of experiments carried out on two benchmark databases, the CASIA(B) gait database [41] and the UMIST face database [42], to evaluate the performance of the proposed KCDML on recognition tasks involving inconsistent matching issues.

The CASIA(B) gait database [41] consists of 13 640 sequences from 124 subjects captured from 11 walking views, classified by the angle between the camera optical axis and the walking direction, namely 0°, 18°, 36°, 54°, 72°, 90°, 108°, 126°, 144°, 162° and 180°; moreover, three variations (view angle, clothing and carrying condition changes) are separately considered.

The UMIST face database [42] consists of 564 images of 20 individuals (mixed race/gender/appearance), and each individual is shown in a range of poses from profile to frontal views.

In all experiments, the training samples were randomly selected, the remainder was used for testing, and the nearest neighbor classifier was employed for classification, aiming at validating the effectiveness of the feature extraction rather than of the classifier. The experiments for each case were repeated 30 times, and the recognition precision (called the correct classification rate, CCR, in this paper) is reported as the average ratio of the number of correctly classified test samples to the total number of test samples.

We compare KCDML not only with state-of-the-art methods including our previous work CDML [38], Huang's method [43] and PCA combined with RBF (PCA-RBF) [43], but also with important baseline methods such as PCA, Kernel PCA (KPCA), LPP, Kernel LPP (KLPP) [44] and LDA. CDML, PCA, LPP and LDA are linear subspace solutions, while Huang's method, PCA-RBF, KPCA and KLPP are nonlinear subspace approaches.

4.1. CASIA(B) gait database experimental results

We fitted the two regions of the whole silhouette, divided by its centroid, with two ellipses in each frame of the gait video. We then modeled the gait fluctuation as a periodic function of the eccentricities of the two halves of the silhouette over time, and achieved gait period detection by analyzing the variation of these eccentricities [45]. Let $x$ and $y$ be coordinates in the 2D image. The mean image $G(x, y) = \frac{1}{N} \sum_{t=1}^{N} B_t(x, y)$, named the gait energy image (GEI), of the silhouettes $B_1(x, y), \ldots, B_N(x, y)$ over a complete gait cycle ($N$ frames) within a sequence was used to represent the features of a gait cycle, because the GEI is very robust against errors in individual frames [46]. Each GEI is of size 64×64 pixels.

In this section, we evaluate KCDML on inconsistent matching problems such as different walking states, walking directions and resolutions, using the CASIA(B) gait database.

4.1.1. Different walking states

In this section, we discuss the selection of the kernel parameter of the proposed method and present the recognition performance of our method, compared with state-of-the-art and baseline methods, when the register and query sets come from different walking states. In Fig. 2, 'nm', 'bg' and 'cl' denote normal gaits, gaits with a bag and gaits with clothing condition changes, respectively.

Fig. 2. Different walking states.

First, one normal gait and one gait with a bag of every individual were randomly selected for training, while the remaining gaits with a bag served as the probe set for testing. In addition, class labels of normal gaits were given as the register set. This case is denoted 'nm&bg'. Fig. 3 shows the effects of the feature dimension and the kernel parameter s on the recognition precision of KCDML. The x-coordinate is the feature dimension, the y-coordinate is the kernel parameter s, and the z-coordinate is the correct classification rate (CCR). The feature dimension changes from 10 to 120 with step 10, and s changes from 1000 to 100 000 with step 1000. For each probe, the parameter s was chosen to achieve the best performance. The detailed values of s are given in Table 2, and the corresponding feature dimensions are all less than 120. In each experiment there is a range of s for which KCDML achieves good performance; we only show one value for each probe.

Fig. 4 compares the CCRs of linear and kernel methods under different numbers of features on the CASIA(B) gait database. Here, 13 methods were compared: (1) our method (KCDML), (2) CDML, (3) Huang's method, (4) PCA combined with RBF (PCA-RBF), (5) PCA, (6) Kernel PCA (KPCA), (7) LPP, (8) Kernel LPP (KLPP) [44], (9) PCA(*), (10) KPCA(*), (11) LPP(*), (12) KLPP(*), and (13) LDA(*), where '*' denotes that the query was matched against gaits with a bag instead.

Fig. 4. CCRs with different dimensionalities for 'nm&bg'.


Fisher LDA defines the separation between two distributions as the ratio of the between-class variance to the within-class variance; we cannot compare our method with LDA directly because there is no within-class variance in this 'nm&bg' experiment. However, when comparing with LDA(*), we can assume that the classes of the gaits with a bag are known; there are then two samples in each class, so within-class variance exists. Huang's method reconstructed the input gait from a walking state different from the registered one in the coherent feature space for recognition, while PCA-RBF did not interpolate the input gait in the coherent space. Apparently, methods (9)-(13) represent an ideal case, but it is difficult to ensure consistency between the register and query sets. It can be seen from Fig. 4 that, except for PCA, LPP, Huang's method and PCA-RBF, the other eight methods are effective. Although PCA, KPCA, LPP and KLPP are employed in an ideal case, matching against training gaits whose states are the same as the probe ones, KCDML and CDML outperform PCA, KPCA, LPP and KLPP when enough features are retained. KCDML yields a CCR of 92.74% with 70D features, which is better than its linear version, CDML. KCDML in a kernel space needs a few more features than CDML in a linear space, so KCDML achieves lower CCRs than CDML when the number of retained features is no more than 50. A possible reason is that the kernel method works in a high-dimensional kernel space and is not good at representing samples in a low reduced dimensionality, whereas linear methods can operate directly in a low-dimensional linear space.

Fig. 5. Identification performance in terms of rank order statistics for 'nm&bg' (the CCR corresponds to rank = 1).

Table 1
The top CCRs of the proposed method and other methods under different walking states (the number in brackets is the number of samples per individual).

Rank = 1:
Register:        nm           nm           bg           cl
Training set:    nm(1),bg(1)  nm(1),cl(1)  bg(1),cl(1)  cl(1),bg(1)
Testing:         bg           cl           cl           bg
Our method       0.9274       0.9355       0.9113       0.8790
CDML             0.9032       0.9355       0.8790       0.8710
Huang's method   0.3710       0.6532       0.5645       0.5726
PCA-RBF          0.3145       0.3952       0.3306       0.2661
PCA              0.3468       0.1371       0.0565       0.0484
KPCA             0.7661       0.8710       0.7984       0.7823
LPP              0.3629       0.2177       0.1210       0.0887
KLPP             0.8468       0.9032       0.9113       0.8710

Rank = 10:
Register:        nm           nm           bg           cl
Training set:    nm(1),bg(1)  nm(1),cl(1)  bg(1),cl(1)  cl(1),bg(1)
Testing:         bg           cl           cl           bg
Our method       0.9839       0.9677       0.9919       0.9435
CDML             0.9839       0.9677       0.9758       0.9435
Huang's method   0.7097       0.8629       0.7903       0.8306
PCA-RBF          0.7339       0.7258       0.5887       0.6290
PCA              0.5968       0.4194       0.3065       0.3548
KPCA             0.9435       0.9758       0.9839       0.9516
LPP              0.5968       0.5242       0.4919       0.3629
KLPP             0.9274       0.9677       0.9758       0.9435

Then, we used the rank order statistic to evaluate the proposed method. This is defined as the cumulative probability that the actual class of a test measurement is among its k top matches, where k is called the rank. This performance statistic is reported as cumulative match scores (CMS) and effectively characterizes the filtering capability of the features. The rank is plotted along the horizontal axis, and the vertical axis is the percentage of correct matches. Fig. 5 shows the CMS for ranks up to 20 of KCDML, again compared with the aforementioned 11 methods. As can be seen, our method slightly outperforms CDML at some minor ranks, while PCA and LPP again perform poorly. The filtering capability of both Huang's method and PCA-RBF is weak. Though PCA, KPCA, LPP and KLPP are employed to ideally match against 'bg', they are inferior to KCDML and CDML.
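A minimal sketch of how a CMS curve of this kind can be computed (our illustration; the distance matrix and label arrays are assumed inputs):

```python
import numpy as np

def cumulative_match_scores(dist, gallery_labels, probe_labels, max_rank=20):
    """CMS: fraction of probes whose true class appears among the k closest
    gallery samples, for k = 1..max_rank. dist: (num_probes, num_gallery)."""
    order = np.argsort(dist, axis=1)                 # gallery indices, nearest first
    ranked = np.asarray(gallery_labels)[order]       # labels in rank order
    hits = ranked == np.asarray(probe_labels)[:, None]
    first = hits.argmax(axis=1)                      # rank of first correct match
    found = hits.any(axis=1)                         # mask probes with no match at all
    return np.array([np.mean(found & (first < k)) for k in range(1, max_rank + 1)])
```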

Finally, further experiments were carried out with different walking states. There were four cases in total, nm vs. bg, nm vs. cl, bg vs. cl and cl vs. bg, denoted 'nm&bg', 'nm&cl', 'bg&cl' and 'cl&bg' respectively. The corresponding values of s for KCDML are given in the 'Different walking states' row of Table 2. We did not compare our method with LDA because there was no within-class variance in these experiments. The CCR results for rank equal to 1 and 10 are summarized in Table 1. The results clearly demonstrate that KCDML is superior to CDML, better than KLPP and KPCA, and much better than PCA, LPP, Huang's method and PCA-RBF.

4.1.2. Different walking directions

The proposed method was also tested on gait recognition with different walking directions to demonstrate its validity. Fig. 6 shows normal gait energy images extracted from sequences recorded at 0°, 18°, 36°, 54°, 72°, 90°, 108°, 126°, 144°, 162° and 180° angles.

Ten experiments were conducted, and 90° angle gaits were always enrolled. The training sets were composed of 0°, 18°, 36°, 54°, 72°, 108°, 126°, 144°, 162° and 180° angle gaits respectively, together with the gaits recorded at a 90° angle, and the query sets were the other gaits recorded at 0°, 18°, 36°, 54°, 72°, 108°, 126°, 144°, 162° and 180° angles correspondingly, denoted '90&0', '90&18', '90&36', '90&54', '90&72', '90&108', '90&126', '90&144', '90&162' and '90&180' in turn. The corresponding values of s for KCDML are given in the 'Different walking directions' rows of Table 2. Considering that there is no within-class variance in these experiments, we did not compare our method with LDA. Fig. 7 reports the performance of gait recognition with different walking directions compared with the seven aforementioned methods.



Fig. 6. Different walking directions.

Table 2
Parameters in KCDML for gait recognition.

Different walking states     nm&bg    nm&cl    bg&cl     cl&bg
s                            61 000   50 000   530 000   640 000

Different walking directions 90&0     90&18    90&36     90&54    90&72     90&108   90&126
s                            3000     83 000   343 000   8000     280 000   49 000   750 000
                             90&144   90&162   90&180
s                            97 000   870 000  250 000

Different resolutions '1&&1' 56×56    48×48    40×40     32×32    24×24     16×16    8×8
s                            64 000   870 000  432 000   570 000  430 000   5000     110

Different resolutions '2&&2' 56×56    48×48    40×40     32×32    24×24     16×16    8×8
s                            35 000   3200     730       280 000  68 000    100 000  6000

Fig. 7. The CCR vs. different walking directions.

Fig. 8. Different resolutions of GEI.


In Fig. 7, the horizontal axis represents the angle of the tested gaits, while the vertical axis represents the CCR. It is evident that the recognition performance for gaits recorded at 18° and 36° angles is comparatively poor, probably due to severe shape variations in such gait patterns. It can also be seen that KCDML consistently outperforms the other seven methods. Although the results as a whole are very encouraging, more experiments on gaits with various resolutions still need to be investigated in order to be more conclusive.

4.1.3. Different resolutions

Classification performance was also tested on gaits of various resolutions. The high-resolution GEI was originally of size 64×64 pixels, and low-resolution GEIs of sizes 56×56, 48×48, 40×40, 32×32, 24×24, 16×16 and 8×8 pixels were generated by smoothing and down-sampling, as shown sequentially in Fig. 8. The robustness of the proposed method to low resolution and the effect of the number of training samples per class on recognition performance were investigated.
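A minimal sketch of generating such low-resolution GEIs by smoothing and down-sampling (our illustration; the paper does not specify the exact filter, so the Gaussian pre-filter width is an assumption):

```python
import numpy as np
from scipy.ndimage import gaussian_filter, zoom

def downsample_gei(gei, size):
    """Smooth then down-sample a square GEI to size x size pixels."""
    factor = size / gei.shape[0]
    smoothed = gaussian_filter(gei, sigma=0.5 / factor)  # anti-aliasing pre-filter
    return zoom(smoothed, factor, order=1)               # bilinear resampling

# e.g. produce the 56x56 ... 8x8 versions of a 64x64 GEI:
# low_res = {s: downsample_gei(gei, s) for s in (56, 48, 40, 32, 24, 16, 8)}
```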


First, for each individual, the first two GEIs were chosen for training and the remaining low-resolution ones for testing. Moreover, one low-resolution GEI was coupled with one high-resolution GEI to constitute a training pair, and the first high-resolution GEI of 64×64 pixels was registered; this case is denoted '1&&1'. We compare our method with CDML, Huang's method, PCA-RBF, PCA, KPCA, LPP and KLPP. Because of the different resolutions between the query and register samples, we employed interpolation algorithms, including nearest, bilinear, bicubic, lanczos2 and lanczos3 [47] (abbreviated 'n', 'l', 'c', 'l2' and 'l3' respectively), when applying PCA, KPCA, LPP and KLPP. The functions for lanczos2 and lanczos3 are shown in Fig. 9.

From the results, we can see that the lanczos3 interpolation algorithm achieves the best performance; we therefore give the results of LPP under the full variety of interpolations, while for PCA, KPCA and KLPP only the results after lanczos3 interpolation are displayed. In the left half of Fig. 10, the CCRs of the different methods are drawn. The CCRs achieved by our method on testing GEIs of sizes 56×56, 48×48, 40×40, 32×32, 24×24, 16×16 and 8×8 pixels are 0.9395, 0.9254, 0.9274, 0.9536, 0.9214, 0.875 and 0.7762 respectively, with the corresponding values of s for KCDML, obtained by cross validation, given in the '1&&1' rows of Table 2. The reason why our method achieves its highest recognition rate at 32×32 is that the original GEI size is 64×64, and every 4 pixels are interpolated into one new pixel, so the geometry of the GEI is unchanged.

We also changed the number of training samples: for each individual, four GEIs were randomly selected for training, and the other low-resolution ones were used for testing. Furthermore, two GEIs of 64×64 pixels were registered and coupled with another two low-resolution GEIs to constitute pairs; this case is denoted '2&&2'. As can be seen in the right half of Fig. 10, the CCRs achieved by our method on testing GEIs of sizes 56×56, 48×48, 40×40, 32×32, 24×24, 16×16 and 8×8 pixels are 0.9516, 0.9395, 0.9274, 0.9798, 0.9516, 0.9113 and 0.8831 respectively, with the corresponding values of s for KCDML given in the '2&&2' rows of Table 2. We can see that the CCR rises with the increasing number of training samples. Moreover, the CCRs for lower resolutions such as 16×16 and 8×8 pixels do not decline much, so our method is robust to resolution variations. The performance of CDML is worse than that of our method because CDML applies a linear mapping to obtain features. Low resolution severely affects the performance of PCA, KPCA, LPP, KLPP and LDA. Though the performance drops of Huang's method and PCA-RBF are not as large as those of PCA, KPCA, LPP, KLPP and LDA, they cannot work well when the similarity between the query and register samples is low.

Fig. 9. The functions for lanczos2 and lanczos3. (1) Lanczos kernel function for a = 2 and (2) Lanczos kernel function for a = 3.

Fig. 10. The CCRs under different resolutions for gait recognition.
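For reference, the Lanczos kernel plotted in Fig. 9 has the standard form $L(x) = \operatorname{sinc}(x)\operatorname{sinc}(x/a)$ for $|x| < a$ and 0 otherwise; a minimal sketch (our illustration):

```python
import numpy as np

def lanczos_kernel(x, a):
    """Lanczos reconstruction kernel: sinc(x) * sinc(x/a) inside |x| < a, else 0.
    a = 2 gives lanczos2, a = 3 gives lanczos3 (cf. Fig. 9)."""
    x = np.asarray(x, dtype=float)
    return np.where(np.abs(x) < a, np.sinc(x) * np.sinc(x / a), 0.0)
```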

Fig. 11. Face images for one individual.

Fig. 12. The CCRs with different numbers of features for 'ff&lf'.

Fig. 13. Identification performance in terms of rank order statistics for 'ff&lf'.

Table 3
Comparisons of several recent algorithms for face recognition under variant face poses.

Register:        ff      ff      lf      lf      of      of
Training set:    ff,lf   ff,of   lf,of   lf,ff   of,ff   of,lf
Testing:         lf      of      of      ff      ff      lf
Our method       0.875   1       1       0.925   0.9     0.925
CDML             0.6     0.925   0.875   0.675   0.625   0.525
Huang's method   0.525   0.625   0.65    0.65    0.6     0.45
PCA-RBF          0.3     0.625   0.7     0.475   0.425   0.575
PCA              0.3     0.775   0.35    0.175   0.5     0.775
KPCA             0.6     0.95    0.9     0.7     0.75    0.625
LPP              0.375   0.85    0.275   0.125   0.375   0.8
KLPP             0.75    1       0.975   0.85    0.85    0.7


4.2. UMIST face database experimental results

In this section, we evaluated KCDML on variant face poses and different resolutions using the UMIST face database. Fig. 11 shows the nine samples, of original size 112×92 pixels, used for each individual to cover the gradual change of the face from profile to frontal orientation; the first, middle and last three are labeled as lateral, oblique and frontal faces, with the short names 'lf', 'of' and 'ff' respectively. In the transformed space, a nearest neighbor classifier is employed.

4.2.1. Face recognition under variant face poses

First, to test the recognition performance with respect to different numbers of features, one 'ff' of each individual was regarded as the register; these 'ff' together with one 'lf' of each individual randomly composed the training set, while the remaining two 'lf' were treated as probes. This case is denoted 'ff&lf'. Fig. 12 shows the CCRs of KCDML with the value of s being 5 008 000, compared with those of CDML, Huang's method, PCA-RBF, PCA, KPCA, LPP and KLPP. From Fig. 12 we can see that KCDML obtains the best result of 0.875 with a feature dimension of 16, outperforming all the other methods. KCDML in the kernel space needs a few more features than CDML in a linear space, so KCDML achieves lower CCRs than the other methods when the number of retained features is no more than 10; the kernel method works in a high-dimensional kernel space and is not good at representing samples in a low reduced dimensionality. In particular, KCDML needs more feature dimensions to make face images under variant poses consistent. The CCRs achieved by CDML, Huang's method, PCA-RBF, KPCA, LPP and KLPP are 0.6, 0.525, 0.3, 0.6, 0.375 and 0.75 respectively, all with 18D features, while for PCA the recognition rate with 14D features is 0.3.

Second, in order to evaluate face recognition performance further, we also use the CMS. Using the best 16 features for KCDML, identification is 100% correct at rank = 3 (Fig. 13). For the dimensionalities selected above, the corresponding CMSs of the above-mentioned algorithms are also shown for contrast in Fig. 13.

We also tested other recognition performances by varying the register, training and testing sets, denoted 'ff&lf', 'ff&of', 'lf&of', 'lf&ff', 'of&ff' and 'of&lf'; the CCRs of the above-mentioned algorithms are listed in Table 3. The corresponding values of s for KCDML are given in the 'Variant face poses' row of Table 4. The results indicate that the classification performance of KCDML is very robust to variation in face pose.


Table 4
Parameters in KCDML for face recognition.

Variant face poses           ff&lf        ff&of        lf&of      lf&ff        of&ff        of&lf
s                            5 008 000    12 000       3 161 000  4 482 000    3 665 000    4 518 000

Different resolutions 'lf'   28×24        12×10        8×6        6×4
s                            15 000       24 000       1000       90

Different resolutions 'of'   28×24        12×10        8×6        6×4
s                            1000         2000         5000       38

Different resolutions 'ff'   28×24        12×10        8×6        6×4
s                            1500         577          220        86

Combined factors of
different resolutions
and poses                    ff&lf 28×24  ff&lf 12×10  ff&lf 8×6  ff&of 28×24  ff&of 12×10  ff&of 8×6
s                            6 873 000    1 347 000    2 836 000  2 286 000    228 000      4 530 000
                             lf&of 28×24  lf&of 12×10  lf&of 8×6  lf&ff 28×24  lf&ff 12×10  lf&ff 8×6
s                            2 082 000    1 080 000    240 000    858 000      7 696 000    174 000
                             of&ff 28×24  of&ff 12×10  of&ff 8×6  of&lf 28×24  of&lf 12×10  of&lf 8×6
s                            5 091 000    8 854 000    5 314 000  5 844 000    7 826 000    203 000

Fig. 14. Different resolutions for one individual.

Fig. 15. Face recognition results under various resolutions.

4.2.2. Face recognition under variant resolutions

Three kinds of experiments were designed: lateral, oblique and frontal face recognition under different resolutions of 28×24, 12×10, 8×6 and 6×4 pixels respectively, as shown in Fig. 14. We made sure the facial orientations of all samples in the register and test sets were the same, but with different resolutions. We tested the performance of the proposed method compared with CDML, Huang's method, PCA-RBF, PCA, KPCA, LPP and KLPP; the results are summarized in Fig. 15. It can be seen from the three sub-figures of Fig. 15 that the rank-one match scores are 0.95, 1 and 0.95 for lateral, oblique and frontal face recognition respectively under all the various resolutions. The corresponding values of s for KCDML are given in the 'Different resolutions' rows of Table 4. The experimental results show that the CCR of the proposed method does not decrease at low resolutions, whereas the CCRs of the other methods fall dramatically. The proposed method is therefore very robust to variation in face resolution.

4.2.3. Combined factors of different resolutions and poses on face recognition

To investigate the combined influence of different face poses and resolutions, six kinds of experiments were designed, each with paired samples for every individual: frontal face coupled with lateral face, frontal face coupled with oblique face, lateral face coupled with oblique face, lateral face coupled with frontal face, oblique face coupled with frontal face, and oblique face coupled with lateral face. The register sample had a resolution of 112×92 pixels, and the query samples had resolutions of 28×24, 12×10 and 8×6 pixels respectively. The experimental results, compared with CDML, Huang's method, PCA-RBF, PCA, KPCA, LPP and KLPP, are shown in Fig. 16(1)-(6). The corresponding values of s for KCDML are given in the 'Combined factors' rows of Table 4. Unlike the resolution-only case, the CCR of KCDML descends as the resolution of the query sample descends, except in cases (2) and (3). However, the performance of KCDML declines less than that of the other methods, demonstrating its effectiveness, robustness and superiority.

It is noted from the parameter results of KCDML for each probe of face recognition and gait recognition that 1000 or 10 000 can be chosen as the search step for s to achieve the best performance, except for very low resolutions such as 6×4 pixels, for which a step of 1 or 10 should be used.

4.3. Time complexity analysis

In this section, the time complexity of each method for gait recognition and face recognition under the various influence factors is discussed. Denoting by s and t the height and width of a training sample, KCDML involves eigen-decomposition problems of size $2M \times 2M$, compared to size $2st \times 2st$ in CDML, where $st$ is far larger than $M$. Therefore, PCA should be employed before applying CDML to gait recognition and face recognition. Time complexity is essentially determined by the dimensionality of the coupled samples, so we take the highest resolutions, 64×64 for gait and 112×92 for face, as examples for the two recognition tasks. Table 5 gives the mean runtime of each method for gait recognition and face recognition under the various conditions. The time consumed by gait recognition is longer than that by face recognition because the number of individuals in the CASIA(B) database is larger than in the UMIST database. Because there is no within-class variance in our experiments on the UMIST database, we do not list the time consumed by LDA. For all the experiments, the complexity of KCDML, which obtains the highest CCR, is lower than that of CDML. It can also be seen that the computational time of PCA, KPCA, LPP, KLPP and LDA is comparable to that of the proposed method and CDML: PCA, KPCA, LPP, KLPP and LDA compute a single transformation matrix, while KCDML and CDML need to calculate two matrices for two different kinds of data sets. As the number of samples increases, the time consumed by Huang's method and PCA-RBF increases dramatically as compared to KCDML and CDML.


Fig. 16. Face recognition results under combined factors of different resolutions and poses. (1) Register: ff & query: lf, (2) Register: ff & query: of, (3) Register: lf & query: of, (4) Register: lf & query: ff, (5) Register: of & query: ff and (6) Register: of & query: lf.

Table 5
The average time (s) consumed by the compared methods (CPU: Intel(R) Core(TM)2 Duo T8300 at 2.40 GHz, RAM: 1 GB).

Task              KCDML  CDML  Huang  PCA-RBF  PCA   KPCA  LPP   KLPP  LDA
Gait recognition  0.35   0.55  15.75  15.67    0.26  0.23  0.40  0.34  0.53
Face recognition  0.06   0.08  0.25   0.23     0.06  0.07  0.14  0.13  —

Table 6
CCRs of different classifiers.

Case                                   NN      Linear SVM  RBF SVM
Gait: different walking states         0.9133  0.9194      0.9194
Gait: different walking directions     0.8258  0.8274      0.8274
Gait: different resolutions            0.9188  0.9210      0.9210
Face: variant face poses               0.9375  0.9392      0.9392
Face: variant resolutions              0.9667  0.9667      0.9667
Face: different resolutions and poses  0.8583  0.8592      0.8592


4.4. Extra classifier discussion

This paper has presented a method called kernel coupled distance metric learning (KCDML) to address the metric learning problem in inconsistent-matching biometric recognition. It models a graph according to the relationships among the training data nodes in the kernel space, and designs the constraint to minimize the difference between dual samples with different appearances of an identical subject by employing a similarity matrix. This paper focuses on the feature extraction approach rather than on the classifier. Therefore, the nearest neighbor classifier was chosen to evaluate the superiority of the proposed method, which can make different appearances of an identical subject more consistent. Indeed, choosing a suitable classifier is a useful way to obtain better recognition performance; however, such a performance improvement mixes the feature extraction effects with the contribution from the classifier.

SVM [48] can also be chosen as the classifier. To keep the computational load reasonable, the mappings used by SVM schemes are designed so that dot products and the RBF kernel function can be computed easily in terms of the variables in the original space, by defining them through a kernel function $K(x_i, x_j)$ selected to suit the problem:

Linear SVM kernel: $K(x_i, x_j) = x_i^T x_j$

RBF SVM kernel: $K(x_i, x_j) = \exp(-\gamma \|x_i - x_j\|^2)$, $\gamma > 0$

The CCRs for the two databases are given in Table 6. They indicate that linear SVM and RBF SVM obtain slightly better performance than the nearest neighbor classifier.
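For illustration, the KCDML features could be classified with scikit-learn's SVM as follows (our sketch; the feature array, labels and the gamma value are assumptions standing in for outputs of the KCDML pipeline above):

```python
import numpy as np
from sklearn.svm import SVC

# Fx: training features in the coupled space (one row per sample); labels: class ids
Fx = np.random.default_rng(0).standard_normal((40, 16))   # placeholder features
labels = np.repeat(np.arange(20), 2)                      # two samples per class

linear_svm = SVC(kernel="linear").fit(Fx, labels)
rbf_svm = SVC(kernel="rbf", gamma=0.1).fit(Fx, labels)    # gamma > 0, cf. RBF kernel above

# predicted = rbf_svm.predict(query_features)
```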

5. Conclusions and future work

A metric learning method, namely kernel coupled distance metric learning (KCDML), has been proposed in this paper. KCDML improves on existing metric learning methods due to the following facts: (1) kernel tricks allow effective representation of images with an underlying nonlinear spatial structure; (2) KCDML has a simple, iteration-free procedure; (3) a key insight behind KCDML is to transform the input data into a higher-dimensional feature space, which alleviates the small sample size problem occurring in conventional supervised learning algorithms; and (4) the decoupling procedure of KCDML can be solved by a single generalized eigen-decomposition. To empirically justify the properties of KCDML, we applied it to gait recognition and face recognition. Based on a number of experiments, we make the following observations: (1) KCDML performs better than linear methods such as CDML, PCA and LPP, and nonlinear methods such as Huang's method, PCA-RBF, KPCA and KLPP, for gait recognition under different walking states, walking directions or resolutions. (2) KCDML achieves highly competitive performance with respect to CDML, Huang's method, PCA-RBF, PCA, LPP, KPCA and KLPP for face recognition under variant face poses, resolutions or their combination. (3) It is not difficult to obtain reasonably good performance from the proposed method by adjusting the parameter s via cross validation.

A possible extension of our work is to exploit discriminative information. Since LDA is believed to encode discriminative information in a linearly separable space using bases that are not necessarily orthogonal, we believe that discriminative information is of great value, and we are currently exploring these problems in theory and practice. In addition to cross validation for determining the kernel parameter, other means of parameter selection will also be explored.

Another direction for future work is to extend the coupled distance metric learning algorithm to its matrix and multilinear versions, i.e., a tensorization of CDML through the coupled distance metric criterion that pair-wise samples should stay as close as possible while preserving the local structure of the intrinsic data geometry. In particular, for gait recognition each sample is a sequence tensor, and we plan to further explore the performance of motion features such as 3D Haar-like features [49] for representing gait.

Acknowledgment

We sincerely thank the Institute of Automation, Chinese Academy of Sciences, for granting us permission to use the CASIA(B) gait database. We would like to thank H. Huang for providing the code of his method [43]. This project is supported by the Natural Science Foundation of China (Grant Nos. 61201370, 61100103), the Independent Innovation Foundation of Shandong University (Grant Nos. 2012GN043, 2012DX007, 2012ZD039), the Specialized Research Fund for the Doctoral Program of Higher Education of China (Grant No. 20120131120030), the Natural Science Foundation of Shandong Province (Grant No. ZR2010FM040), the Research Award Fund for Outstanding Middle-aged and Young Scientists of Shandong Province (Grant No. 2013BSE27058) and the National Science Foundation for Post-doctoral Scientists of China (Grant No. 2013M530321).

References

[1] L. Wu, S.C.H. Hoi, Enhancing bag-of-words models with semantics-preserving metric learning, IEEE Multimedia 18 (1) (2011) 24–37.

[2] S.C.H. Hoi, W. Liu, S.F. Chang, Semi-supervised distance metric learning for collaborative image retrieval, in: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2008, pp. 1–7.

[3] Y. Jia, C. Zhang, Learning distance metric for semi-supervised image segmentation, in: Proceedings of 15th IEEE International Conference on Image Processing (ICIP), 2008, pp. 3204–3207.

[4] G. Lebanon, Metric learning for text documents, IEEE Trans. Pattern Anal. Mach. Intell. 28 (4) (2006) 497–508.

[5] K.Q. Weinberger, F. Sha, L.K. Saul, Convex optimizations for distance metric learning and pattern classification, IEEE Signal Process. Mag. 27 (3) (2010) 146–158.

[6] T.K. Ates, E. Esen, A. Saracoglu, M. Soysal, Y. Turgut, O. Oktay, A.A. Alatan, Content based video copy detection with local descriptors, in: Proceedings of IEEE 18th Signal Processing and Communications Applications Conference, 2010, pp. 49–52.

[7] Q. Liu, M. Qiao, A.H. Sung, Distance metric learning and support vector machines for classification of mass spectrometry proteomics data, in: Proceedings of Seventh International Conference on Machine Learning and Applications, 2008, pp. 631–636.

[8] J. Lee, C. Zhang, Classification of gene-expression data: the manifold-based metric learning way, Pattern Recognition 39 (12) (2006) 2450–2463.

[9] K. Gopal, T.R. Ioerger, Distance metric learning through optimization of ranking, in: Proceedings of Seventh IEEE International Conference on Data Mining Workshops, 2007, pp. 201–206.

[10] J. Dalwon, C.D. Yoo, Fingerprint matching based on distance metric learning, in: Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing, 2009, pp. 1529–1532.

[11] M. Guillaumin, J. Verbeek, C. Schmid, Is that you? Metric learning approaches for face identification, in: Proceedings of IEEE 12th International Conference on Computer Vision, 2009, pp. 498–505.

[12] P. Zhu, W. Hu, C. Yuan, L. Li, Prototype learning using metric learning based behavior recognition, in: Proceedings of 20th International Conference on Pattern Recognition (ICPR), 2010, pp. 2604–2607.

[13] X. He, P. Niyogi, Locality preserving projections, Adv. Neural Inf. Process. Syst. (2004).

[14] J.B. Tenenbaum, V. Silva, J.C. Langford, A global geometric framework for nonlinear dimensionality reduction, Science 290 (2000) 2319–2323.

[15] S. Roweis, L. Saul, Nonlinear dimensionality reduction by locally linear embedding, Science 290 (2000) 2323–2326.

[16] M. Belkin, P. Niyogi, Laplacian eigenmaps for dimensionality reduction and data representation, Neural Comput. 15 (6) (2003) 1373–1396.

[17] C. Lu, G. Feng, J. Jiang, P. Wang, Metric learning: a general dimension reduction framework for classification and visualization, in: Proceedings of 19th International Conference on Pattern Recognition, 2008, pp. 1–4.

[18] F. Li, J. Yang, J. Wang, A transductive framework of distance metric learning by spectral dimensionality reduction, in: Proceedings of 24th International Conference on Machine Learning, 2007, pp. 513–520.

[19] D.Y. Yeung, H. Chang, Locally smooth metric learning with application to image retrieval, in: Proceedings of IEEE 11th International Conference on Computer Vision (ICCV), 2007, pp. 1–7.

[20] J. Peltonen, A. Klami, S. Kaski, Improved learning of Riemannian metrics for exploratory analysis, Neural Networks 17 (8–9) (2004) 1087–1100.

[21] H. Tang, M. Hasegawa-Johnson, T. Huang, A novel vector representation of stochastic signals based on adapted ergodic HMMs, IEEE Signal Process. Lett. 17 (8) (2010) 715–718.

[22] J. Sun, D. Sow, J. Hu, S. Ebadollahi, Localized supervised metric learning on temporal physiological data, in: Proceedings of 20th International Conference on Pattern Recognition (ICPR), 2010, pp. 4149–4152.

[23] C.C. Chang, Generalized iterative RELIEF for supervised distance metric learning, Pattern Recognition 43 (8) (2010) 2971–2981.

[24] K. Abou-Moustafa, F.P. Ferrie, Regularized minimum volume ellipsoid metric for query-based learning, in: Proceedings of Seventh International Conference on Machine Learning and Applications, 2008, pp. 188–193.

[25] B. Kulis, P. Jain, K. Grauman, Fast similarity search for learned metrics, IEEE Trans. Pattern Anal. Mach. Intell. 31 (12) (2009) 2143–2157.

[26] H. Cevikalp, Semi-supervised distance metric learning by quadratic programming, in: Proceedings of 20th International Conference on Pattern Recognition (ICPR), 2010, pp. 3352–3355.

[27] G. Beliakov, S. James, G. Li, Learning Choquet-integral-based metrics for semisupervised clustering, IEEE Trans. Fuzzy Syst. 19 (3) (2011) 562–574.

[28] B. Xie, M. Wang, D. Tao, Toward the optimization of normalized graph Laplacian, IEEE Trans. Neural Networks 22 (4) (2011) 660–666.

[29] E.P. Xing, A.Y. Ng, M.I. Jordan, S. Russell, Distance metric learning with application to clustering with side-information, in: Advances in Neural Information Processing Systems 15, MIT Press, Cambridge, MA, USA, 2003, pp. 505–512.


[30] A. Bar-Hillel, T. Hertz, N. Shental, D. Weinshall, Learning a Mahalanobis metric from equivalence constraints, J. Mach. Learn. Res. 6 (2005) 937–965.

[31] J. Bing, B.C. Vemuri, Metric learning using Iwasawa decomposition, in: Proceedings of IEEE 11th International Conference on Computer Vision (ICCV), 2007, pp. 1–6.

[32] N. Kumar, K. Kummamuru, Semisupervised clustering with metric learning using relative comparisons, IEEE Trans. Knowl. Data Eng. 20 (4) (2008) 496–503.

[33] X. He, Z. Zhang, Distance metric learning for ameliorated nonnegative matrix factorization, in: Proceedings of Second International Workshop on Computer Science and Engineering, 2009, pp. 511–515.

[34] S. Li, S. Shan, Margin emphasized metric learning and its application to Gabor feature based face recognition, in: Proceedings of IEEE International Conference on Automatic Face & Gesture Recognition and Workshops, 2011, pp. 579–584.

[35] M. Wang, B. Liu, J. Tang, X. Hua, Metric learning with feature decomposition for image categorization, Neurocomputing 73 (10–12) (2010) 1562–1569.

[36] B. Li, H. Chang, S. Shan, X. Chen, Low-resolution face recognition via coupled locality preserving mappings, IEEE Signal Process. Lett. 17 (1) (2010) 20–23.

[37] Z. Deng, D. Dai, X. Li, Low-resolution face recognition via color information and regularized coupled mappings, in: Proceedings of 2010 Chinese Conference on Pattern Recognition, 2010.

[38] X. Ben, W. Meng, R. Yan, K. Wang, An improved biometrics technique based on metric learning approach, Neurocomputing 97 (11) (2012) 44–51.

[39] W. Huang, L.C. Kap, Y. Gao, V. Chong, Nasopharyngeal carcinoma lesion extraction using clustering via semi-supervised metric learning with side-information, in: Proceedings of 5th International Conference on Visual Information Engineering, 2008, pp. 539–543.

[40] H. Sahbi, L. Ballan, G. Serra, A. Del Bimbo, Context-dependent logo matching and recognition, IEEE Trans. Image Process. 22 (3) (2013) 1018–1031.

[41] S. Yu, D. Tan, T. Tan, A framework for evaluating the effect of view angle, clothing and carrying condition on gait recognition, in: Proceedings of 18th International Conference on Pattern Recognition, Hong Kong, China, 2006, pp. 441–444.

[42] B.G. Daniel, M.A. Nigel, Face recognition: from theory to applications, in: H. Wechsler, P.J. Phillips, V. Bruce, F. Fogelman-Soulie, T.S. Huang (Eds.), NATO ASI Series F, Computer and Systems Sciences, 1998, pp. 446–456.

[43] H. Huang, H. He, Super-resolution method for face recognition using nonlinear mappings on coherent features, IEEE Trans. Neural Networks 22 (1) (2011) 121–130.

[44] C. Deng, X. He, J. Han, Spectral regression for dimensionality reduction, Department of Computer Science Technical Report No. 2856 (UIUCDCS-R-2007-2856), University of Illinois at Urbana-Champaign, 2007.

[45] X. Ben, W. Meng, R. Yan, Dual-ellipse fitting approach for robust gait periodicity detection, Neurocomputing 79 (3) (2012) 173–178.

[46] X. Ben, K. Wang, R. Yan, O.P. Popoola, Subpattern-based complete two-dimensional principal component analysis for gait recognition, vol. 7 (2), The China Association for Science and Technology, 2011, pp. 16–22.

[47] W. Burger, M.J. Burge, Principles of Digital Image Processing: Core Algorithms, Springer, 2009, pp. 231–232.

[48] S. Theodoridis, K. Koutroumbas, Pattern Recognition, 4th ed., Academic Press, 2009.

[49] X. Cui, Y. Liu, S. Shan, X. Chen, W. Gao, 3D Haar-like features for pedestrian detection, in: Proceedings of IEEE International Conference on Multimedia and Expo, 2007, pp. 1263–1266.

Xianye Ben was born in Harbin, China, in 1983. She received the B.S. degree in electrical engineering and automation from the College of Automation, Harbin Engineering University, Harbin, China, in 2006, and the Ph.D. degree in pattern recognition and intelligent systems from the College of Automation, Harbin Engineering University, Harbin, in 2010. She is currently working as an Assistant Professor in the School of Information Science and Engineering, Shandong University, Jinan, China. She has published more than 40 papers in major journals and conferences. Her current research interests include pattern recognition, digital image processing and analysis, and machine learning.

Weixiao Meng was born in Harbin, China, in 1968. He received his B.Sc. degree in Electronic Instrument and Measurement Technology from Harbin Institute of Technology (HIT), China, in 1990, and the M.S. and Ph.D. degrees, both in Communication and Information Systems, from HIT in 1995 and 2000, respectively. He is now a professor in the School of Electronics and Communication Engineering, HIT. He is a senior member of IEEE, a senior member of the China Institute of Electronics and the China Institute of Communication, and a member of the Expert Advisory Group on Harbin E-Government. His research interests mainly focus on adaptive signal processing. In recent years, he has published one authored book and more than 100 academic papers in journals and at international conferences, more than 60 of which are indexed by SCI, EI and ISTP. He has completed more than 20 research projects and holds six China patents. One of his standard proposals was accepted by the IMT-Advanced technical group.

Rui Yan was born in Jilin, China, in 1988. He received the B.S. degree in automation from the College of Automation, Harbin Engineering University, Harbin, China, in 2011. He is currently a Ph.D. student in the Computer Science Department, Rensselaer Polytechnic Institute. His current research interests include pattern recognition and the semantic web.

Kejun Wang was born in Jilin, China, in 1962. He received his Ph.D. degree in special auxiliary ships, marine equipment and systems from Harbin Engineering University in 1995. From 1996 to 1998, he was a Postdoctoral Research Fellow in fluid power transmission and control at Harbin Institute of Technology. He is now a professor and doctoral supervisor at the College of Automation, Harbin Engineering University. He has led and participated in many projects, such as fingerprint recognition, and has published more than 80 refereed journal papers. His current research interests include biometrics, pattern recognition and intelligent systems.