Incremental pairwise discriminant analysis based visual tracking



Neurocomputing 74 (2010) 428–438

Contents lists available at ScienceDirect

Neurocomputing

journal homepage: www.elsevier.com/locate/neucom

0925-2312/$ - see front matter © 2010 Elsevier B.V. All rights reserved.
doi:10.1016/j.neucom.2010.07.014

Corresponding author. E-mail address: [email protected] (X. Li).

Incremental pairwise discriminant analysis based visual tracking

Jing Wen a, Xinbo Gao a, Xuelong Li b,n, Dacheng Tao c, Jie Li a

a School of Electronic Engineering, Xidian University, No. 2, South Taibai Road, Xi'an 710071, Shaanxi, P. R. China
b Center for OPTical IMagery Analysis and Learning (OPTIMAL), State Key Laboratory of Transient Optics and Photonics, Xi'an Institute of Optics and Precision Mechanics, Chinese Academy of Sciences, Xi'an 710119, Shaanxi, P. R. China
c School of Computer Engineering, Nanyang Technological University, Singapore

Article info

Article history:

Received 5 February 2010

Received in revised form 28 April 2010

Accepted 26 July 2010

Communicated by Qingshan Liu

Available online 27 August 2010

Keywords:

Pairwise discriminant analysis

Log-Euclidean Riemannian

Incremental learning

Visual tracking


Abstract

The distinction between the object appearance and the background provides useful cues for visual tracking, and discriminant analysis is widely applied to exploit it. However, owing to the diversity of the background observation, adequate negative samples from the background are often unavailable, which usually leads discriminant methods to tracking failure. A natural solution is to construct an object–background pair, constrained by the spatial structure, which not only reduces the number of negative samples but also makes full use of the background information surrounding the object. However, this idea is threatened by the variation of both the object appearance and the spatially constrained background observation, especially when the background shifts as the object moves. Thus, an incremental pairwise discriminant subspace is constructed in this paper to delineate the variation of this distinction. In order to maintain the ability of the subspace to correctly describe this distinction, we enforce two novel constraints for the optimal adaptation: (1) a pairwise data discriminant constraint and (2) subspace smoothness. The experimental results demonstrate that the proposed approach can alleviate adaptation drift and achieve better visual tracking results for a large variety of nonstationary scenes.

© 2010 Elsevier B.V. All rights reserved.

1. Introduction

Visual tracking is a fundamental and challenging task in pattern recognition and computer vision, with wide applications in video surveillance [52], robotics, human–machine interaction [42,45,46], and object recognition [31,50]. Influenced by viewpoint and illumination variation, shape deformation, etc., changes of the object may ruin the prespecified visual measurement (or observation) model and lead to tracking failure.

Most existing tracking methods can be classified into two types of approaches. One is to exploit invariant features [3] of the object. However, it is very difficult to find invariants, although learning methods [1,2,6,12,16] can be employed; moreover, these methods usually need an off-line training process. The other type of approach adapts the visual model to the changes, e.g., by online updating of the appearance models [5,7,10,14,15] or by selecting the best visual features [8,17–19,24] during tracking. Compared to the invariant-based methods, the adaptation-based methods are more flexible, since the measurement models are adaptive or the features used for tracking can be adaptively selected [20,21,28].


However, adaptation drift, i.e., the appearance model adapting to image regions other than the object of interest and thereby causing tracking failure, is commonly seen in most existing adaptation-based methods. Many methods have been proposed to alleviate the drift, e.g., by enforcing similarity to the initial model [8,10]. In most existing adaptive tracking methods, the model at the current time instant is updated by the new data that are closest to the model at the previous time step, with a hidden assumption that the model optimal up to time t−1 is also suitable for time t. Unfortunately, this assumption may not hold when the new data are far away from the model.

The nature of the adaptive tracking problem lies in a chicken-and-egg dilemma [21]: the right data at time t are found by the right model at time t, while the right model can only be adapted by using the right data at time t.

Thus, a supervised mechanism may be required to introduce negative information, here the background in the tracking problem, to constrain the correct object. If no constraints are enforced, any new data can lead to a valid and stable adaptation, since the adapted model tends to best fit the new data. Therefore, we introduce a discriminative scheme, which usually needs effective and typical negative samples from the background, as well as good data-driven constraints from the image observations at the current time instant; these constraints should be reasonable and allow a wide range of adaptation.



In this paper, the general adaptation problem is substantialized as a pairwise discriminant subspace adaptation problem in nonstationary appearance tracking. By discriminating the positive and negative data, the optimal object state is estimated as the one with the largest separation. Moreover, the discriminant subspace can not only represent the object appearance variation, but also exclude background observations far from the object. This subspace is obtained under the assumption that there exists an object–background data pair, with the characteristic that in the discriminant subspace only the object–background data pair attains a large measurement; otherwise the data pair must be of another type, e.g., a background–background pair. Here we can also regard the object–background pair as a positive–negative data pair. In order to obtain the discriminant subspace, the data pairs used to construct it should satisfy a constrained relationship between the positive and negative samples. In the tracking problem, this relationship can be spatial, for example, the relative location. Compared with the positive class, the negative class is usually diverse, and it is hard to obtain sufficient samples from it. However, for visual tracking the negative class is only distributed in the non-object region, more exactly, the region surrounding and near the object of interest. Therefore, the object can be tracked once an effective discriminant scheme is built, since the object of interest is found exactly among the sample pairs satisfying the assumption above. Moreover, with the assumption that the variation over a short interval is linear, subspace smoothness is adopted to constrain the discriminant subspace update, lest the subspace bend to arbitrary disturbances.

In the next section, we briefly review some related tracking algorithms in terms of different observation models, and then investigate the dilemma in traditional adaptation schemes. Our approach to resolving this adaptation dilemma is elaborated in Section 3. In Section 4, we present the tracking flow based on the incremental pairwise discriminant analysis. The experimental results and discussions are presented in Section 5. Concluding remarks are given in Section 6.

2. Related work and motivation

Object tracking can be formulated as a dynamic system, which mainly depends on two components: the dynamic association and the measurement matching. As the state transformation is usually formulated as a first-order Markov process, the dynamic association predicts the state parameters of the object at time t based on those at the previous time t−1. In the measurement matching, the similarity between the observed evidence and the visual model is computed to estimate the optimal state. In tracking, the term observation model is usually used instead of measurement matching. Generally, tracking performance depends much more on the observation model than on the dynamic association. Without any prior about the moving object, appearance-based methods are more stable and general than the others, whereas whether the optimal observation can be obtained usually depends on the ability of the object appearance model to describe the object.

Here, we investigate visual observation models from the perspective of whether the appearance model is updated as tracking proceeds. For models without update, i.e., fixed models, the tracking procedure is mainly directed by the similarity between features of the image observations z, which can be edges, color histograms, feature points, etc. However, the image features of the object appearance are also influenced by many factors, e.g., illumination and occlusion, as well as deformation, which leads the fixed model to two kinds of limitations: (1) weak generalization ability and (2) possibly complex computation for training the appearance model. Moreover, an off-line training process is usually required. Owing to the impact of these factors, a simple appearance model cannot cover all the variations of the object appearance. Although the first limitation can be improved by enumerating as many appearances as possible, the construction of such an appearance model incurs large computational cost, especially when a nonlinear manifold is concerned.

Thus, updating schemes for the appearance model are exploited during tracking. In general, there is a common assumption that the manifold over a short time interval is linear [10,23]. The nonlinear manifold is approximated by piecewise linear subspaces [9] or mapped to a low-dimensional manifold [25] using a nonlinear mapping; alternatively, a learned general subspace can be updated to a specific one during tracking [22], or multilinear methods [38–41,51,52] can be employed for modeling the object. Among these methods, model drift is one of the common and fundamental challenges.

On the basis of the assumption of linearity over a short time interval, we assume the object appearances (or visual features) $z \in \mathbb{R}^m$ lie in a linear subspace spanned by the r linearly independent columns of a linear transform $A \in \mathbb{R}^{m \times r}$, i.e., z is a linear combination of the columns of A, $z = Ab$.

The projection of z on the subspace $\mathbb{R}^r$ is given by the least-squares solution of $z = Ab$, i.e.,

$$b = (A^T A)^{-1} A^T z \quad (1)$$

where $(A^T A)^{-1} A^T$ is the pseudo-inverse of A. The reconstruction of the projection in $\mathbb{R}^r$ is given by

$$\tilde{z} = A A^T z = Pz \quad (2)$$

where $P = A A^T \in \mathbb{R}^{m \times m}$ is called the projection matrix. The subspace delineated by a random vector process {z} is given by the following optimization problem:

$$P^* = \arg\min_{P} \|z - Pz\|^2 \quad (3)$$

This optimization problem is equivalent to applying principal component analysis to the data. In the tracking scenario, the problem [5,8,11,13] becomes

$$x_t^* = \arg\min_{x_t} \|z(x_t) - P_{t-1} z(x_t)\|^2$$
$$P_t^* = \arg\min_{P_t} \|z(x_t^*) - P_t z(x_t^*)\|^2 \quad (4)$$

where $x_t$ denotes the motion parameters to be tracked and $x_t^*$ is the optimal motion parameter estimated by $P_{t-1}$. With this setting, we face a dilemma: if $\{x_t\}$ cannot be determined, then neither can P, and vice versa. Namely, given any tracking result, good or bad, we can always find an optimal subspace that best explains this particular result. Even if discriminant information is applied, the dilemma still exists. Moreover, it is hard to select typical negative samples. The reason is that there are no constraints on the relationship between the positive and negative samples for the discriminant, nor on P for the subspace update.
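The subspace fitting of Eqs. (1)–(3) can be sketched as follows. This is an illustrative sketch rather than the authors' implementation, assuming an orthonormal basis A, so that the pseudo-inverse reduces to $A^T$ and $P = AA^T$:

```python
import numpy as np

def fit_subspace(Z, r):
    """Solve Eq. (3) by PCA: find the rank-r projection P minimizing ||z - Pz||^2.
    Z is an (m, n) matrix with one observation per column."""
    Zc = Z - Z.mean(axis=1, keepdims=True)           # remove the data mean
    U, _, _ = np.linalg.svd(Zc, full_matrices=False)
    A = U[:, :r]                                      # r leading principal directions
    return A @ A.T                                    # projection matrix P = A A^T

def reconstruction_error(z, P):
    """||z - Pz||^2, the residual minimized in Eqs. (3) and (4)."""
    return float(np.sum((z - P @ z) ** 2))

rng = np.random.default_rng(0)
Z = rng.normal(size=(8, 200))                         # toy data cloud
P = fit_subspace(Z, r=3)
```

In the tracking loop of Eq. (4), `reconstruction_error` would be evaluated at each candidate motion parameter $x_t$ to select $x_t^*$.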

3. Incremental pairwise discriminant analysis

According to the analysis in Section 2, it is clear that we should make full use of the background observation surrounding the object as discriminative information, and set reasonable constraints on the adaptive appearance model. An appearance model is reasonable if the following characteristics are met: (1) it can discriminate the object from the background; (2) it maintains continuity between successive models when the incremental scheme is involved. Therefore, we impose the following two constraints on the visual appearance model:

Pairwise discriminant constraint: Given a pair of data points, it is much easier to determine whether or not they belong to the same class. Enlightened by this idea, the observed evidence is provided in a pairwise manner. If the data pair projected onto the discriminant subspace is separated far from each other, then it is possible that the true object exists within the data pair; the farther the separation of the data pair, the more likely the positive datum is the true object. It should be noticed that the data pair should maintain some structural relationship.

Subspace smoothness constraint: The smoothness constraint is very important for the discriminant subspace, since the subspace at time t is updated on the basis of the subspace at time t−1, under the assumption that the difference between consecutive subspaces is small.

As shown in Fig. 1, the region in the red rectangle denotes the object of interest, and the region between the red and the blue dashed rectangles corresponding to the object is the background observation, which is the only negative information that should be considered at the current time/frame. By projecting data pairs into the discriminant subspace, we can determine which pair is most likely the object–background pair. The green line denotes the discriminant subspace updated online: the set within the green line is the positive class, while the set outside it is the negative class. With the help of the pairwise discriminant constraint, the discriminant subspace can be delineated and updated by the newly arrived data pairs. Since the current subspace takes advantage of the previous subspace, consecutive subspaces should differ little; this smoothness is the powerful basis for constraining the discriminant subspace when it is updated. Note that the data pairs are obtained by a specific spatial structure relationship, as in Fig. 1.

3.1. Formulation of the appearance model

According to the demonstration above, an optimal subspace should have three features: firstly, the positive data should have a large projection on the subspace, that is, the larger the projection $\|A_t^T C_t^+ A_t\|^2$, the better the ability of the subspace to express the positive class; secondly, the negative data should have a small projection $\|A_t^T C_t^- A_t\|^2$, i.e., the negative data are far away from their projection on the subspace; thirdly, the current subspace should be close to the previous one. The optimal subspace at the current time t is formulated as

$$\min_{A_t} J_0(A_t) = \min_{A_t}\left\{\|A_t^T C_t^- A_t\|^2 - \alpha\|A_t^T C_t^+ A_t\|^2 + \beta\|P_t - P_{t-1}\|_F^2\right\} \quad (5)$$

Fig. 1. Pairwise discriminant constraints.

where $C_t^+ = z_t^+ z_t^{+T}$ and $C_t^- = z_t^- z_t^{-T}$ are the positive and negative covariance matrices at time t, respectively, $\beta > 0$ is a weighting factor, $\alpha > 0$ is a tuning parameter, and $\|\cdot\|_F$ denotes the Frobenius norm. The aforementioned properties are reflected in the terms of Eq. (5): the optimal subspace ensures that $z_t^+$ has a large projection and $z_t^-$ a small one, and that the projection matrices $P_t$ and $P_{t-1}$ in successive frames are close. Note that both the positive and negative data should have their respective means removed. Eq. (5) can also be approximately rewritten as

$$\min_{A_t} J_1(A_t) = \min_{A_t}\left\{\mathrm{trace}(A_t^T C_t^- A_t) - \alpha\,\mathrm{trace}(A_t^T C_t^+ A_t) + \beta\|P_t - P_{t-1}\|_F^2\right\} \quad (6)$$

for the purpose of computational convenience. The solution to the problem in Eq. (6) is given by $P_t = UU^T$, where U is constituted by the r eigenvectors corresponding to the r smallest eigenvalues of the matrix

$$C = C_t^- - \alpha C_t^+ + \beta(I - P_{t-1}) \quad (7)$$

When we require that $A_t$ be spanned by r orthogonal vectors, then $A_t = U$, since $A_t$ might not be unique without this assumption.
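The eigenvector solution of Eq. (7) can be sketched as below; this is a minimal NumPy illustration (the values of α and β are placeholders, not the paper's settings):

```python
import numpy as np

def update_discriminant_subspace(C_neg, C_pos, P_prev, r, alpha=1.0, beta=1.0):
    """Minimize Eq. (6): take the r eigenvectors with the smallest eigenvalues
    of C = C^- - alpha*C^+ + beta*(I - P_{t-1}) from Eq. (7)."""
    m = C_neg.shape[0]
    C = C_neg - alpha * C_pos + beta * (np.eye(m) - P_prev)
    eigvals, eigvecs = np.linalg.eigh(C)   # ascending eigenvalues for symmetric C
    A_t = eigvecs[:, :r]                   # orthonormal columns, so A_t = U
    return A_t, A_t @ A_t.T                # basis A_t and projection matrix P_t

# toy usage with rank-one covariance matrices built from single samples
rng = np.random.default_rng(1)
z_pos = rng.normal(size=(6, 1))
z_neg = rng.normal(size=(6, 1))
A_t, P_t = update_discriminant_subspace(z_neg @ z_neg.T, z_pos @ z_pos.T,
                                        np.zeros((6, 6)), r=2)
```

Because `np.linalg.eigh` returns eigenvalues in ascending order, taking the first r columns directly yields the minimizing subspace.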

3.2. Incremental learning for the pairwise discriminant subspace

Many discriminant analysis methods have been proposed [29,30,33–37,44,47], as well as matrix decomposition methods [43,48,49] developed to improve discriminant analysis. However, most of these discriminant methods are computationally and memory intensive, since all the data must be kept to determine the discriminant subspace. At the same time, the negative samples would be far more numerous than the positive samples, owing to the diversity of the negative class. Therefore, an incremental updating scheme is adopted to approximate the true distributions of both the object and the background observations, and to keep the discrimination between the positive and negative classes. Note that the background (or negative) data have the same number of samples as the object, since the pairwise constraint requires a one-to-one form, i.e., the object–background pair.

Both the object and the background are represented in this paper by the covariance matrices described in the following section. As a result, one image observation generates both a positive and a negative sample, as shown in Fig. 2. In Section 3.2.2, an incremental scheme for the pairwise discriminant subspace is introduced.

3.2.1. Covariance variable based feature descriptor

The observed evidence (or image feature) is represented here by the covariance matrix descriptor proposed by Tuzel et al. [27]. Denote I as a W×H one-dimensional intensity or three-dimensional color image, and F as the W×H×d-dimensional feature image extracted from I:

$$F(x,y) = \phi(I, x, y) \quad (8)$$

Fig. 2. Covariance descriptors for the object and background.


Table 1. Incremental learning for the pairwise discriminant subspace.

Input:
- new data: $z^{(i)}_{object}, z^{(i)}_{background} \in \mathbb{R}^{d^2}$, i = 1, …, l
- new data number: l
- old covariance matrices: $C^{\pm}_{old}$
- old discriminant matrix: $U_{old}$
- old data means: $\bar{I}^{\pm}_{old}$
- old data number: n

Output:
- new covariance matrices: $C^{\pm}_{new}$
- new discriminant matrix: $U_{new}$
- new data means: $\bar{I}^{\pm}_{new}$
- updated data number: n

1. Compute the new data means:
$$\bar{I}^{\pm}_{new} = \frac{\lambda n}{\lambda n + l}\,\bar{I}^{\pm}_{old} + \frac{l}{\lambda n + l}\,\bar{I}'^{\pm}_{new}$$
2. Compute the new covariance matrices:
$$C^{\pm}_{new} = \lambda^2 C^{\pm}_{old} + C'^{\pm}_{new} + \frac{\lambda n l}{\lambda n + l}\,(\bar{I}'^{\pm}_{new} - \bar{I}^{\pm}_{old})(\bar{I}'^{\pm}_{new} - \bar{I}^{\pm}_{old})^T,$$
where
$$\bar{I}'^{+}_{new} = \frac{1}{l}\sum_{i=1}^{l} z^{(i)}_{object}, \qquad \bar{I}'^{-}_{new} = \frac{1}{l}\sum_{i=1}^{l} z^{(i)}_{background},$$
$$C'^{+}_{new} = \sum_{i=1}^{l} \big(z^{(i)}_{object} - \bar{I}'^{+}_{new}\big)\big(z^{(i)}_{object} - \bar{I}'^{+}_{new}\big)^T,$$
$$C'^{-}_{new} = \sum_{i=1}^{l} \big(z^{(i)}_{background} - \bar{I}'^{-}_{new}\big)\big(z^{(i)}_{background} - \bar{I}'^{-}_{new}\big)^T.$$
3. Compute the new discriminant matrix:
$$[U, D] = \mathrm{SVD}\big(C^{-}_{new} - \alpha C^{+}_{new} + \beta(I - P_{old})\big),$$
where $P_{old} = U_{old} U_{old}^T$ and I is the identity matrix. $U_{new}$ consists of the r eigenvectors of U corresponding to the r smallest eigenvalues in D.
4. Compute the updated data number: $n = \lambda n + l$.
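The per-class statistics update of Table 1 (steps 1, 2, and 4) can be sketched as follows; this sketch is run once for the object stream and once for the background stream, the forgetting-factor value is illustrative, and the λ² weighting on the old covariance follows the table:

```python
import numpy as np

def incremental_update(Z_new, mean_old, C_old, n, lam=0.95):
    """Steps 1, 2, and 4 of Table 1 for one class.
    Z_new: (d2, l) matrix whose columns are new unfolded feature vectors."""
    l = Z_new.shape[1]
    mean_batch = Z_new.mean(axis=1, keepdims=True)              # I'_new
    D = Z_new - mean_batch
    C_batch = D @ D.T                                            # C'_new
    w = lam * n + l
    mean_new = (lam * n / w) * mean_old + (l / w) * mean_batch   # step 1
    diff = mean_batch - mean_old
    C_new = lam ** 2 * C_old + C_batch + (lam * n * l / w) * (diff @ diff.T)  # step 2
    return mean_new, C_new, lam * n + l                          # step 4

# starting from an empty model (n = 0), the update reduces to batch statistics
rng = np.random.default_rng(2)
Z = rng.normal(size=(4, 5))
mean1, C1, n1 = incremental_update(Z, np.zeros((4, 1)), np.zeros((4, 4)), n=0)
```

Step 3 of the table would then feed `C1` for both classes into the eigen-solver of Eq. (7) to refresh the discriminant matrix.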


where $\phi$ is a function extracting image features. For a given rectangular region $R \subset I$, denote $\{f_i\}_{i=1,\ldots,L}$ as the d-dimensional feature points obtained by $\phi$ within R. Consequently, the image region R can be represented as a d×d covariance matrix $C_R = \frac{1}{L-1}\sum_{i=1}^{L}(f_i - \mu)(f_i - \mu)^T$, where $\mu$ is the mean of $\{f_i\}_{i=1,\ldots,L}$. For our tracking issue, there are two covariance matrices, for the object and background observations in an image region, as shown in Fig. 2. We define $\phi(I,x,y)$ as

$$\phi(I,x,y) = \left[\,x,\ y,\ |I_x|,\ |I_y|,\ \sqrt{I_x^2 + I_y^2},\ |I_{xx}|,\ |I_{yy}|,\ \arctan\frac{|I_y|}{|I_x|}\,\right] \quad (9)$$

where x and y are the pixel location, $I_x, I_{xx}, \ldots$ are intensity derivatives, and the last term is the edge orientation.

The covariance descriptor is used in this paper because the shape of the image region does not matter when the covariance matrix is computed. The background region is selected with the same center as the object region but a larger size; in this paper, the background region is 2.5 times the size of the object region. Thus, an observed evidence has two parts, $z_t = \{z_t^+, z_t^-\}$: the image feature for a sample is $\{C_{object}, C_{background}\}$, obtained by computing the covariance matrices of the object region and of the background region with the object part removed.
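A Tuzel-style region covariance with the feature map of Eq. (9) can be sketched as below. This is an illustrative reading of the (garbled) source equation, so the exact composition of the feature vector is an assumption; the toy image is synthetic:

```python
import numpy as np

def feature_image(I):
    """Feature image F of Eq. (8) for a gray image, using the map of Eq. (9):
    pixel coordinates, gradient terms, second derivatives, edge orientation."""
    I = I.astype(float)
    y, x = np.mgrid[0:I.shape[0], 0:I.shape[1]].astype(float)
    Iy, Ix = np.gradient(I)
    Iyy, _ = np.gradient(Iy)
    _, Ixx = np.gradient(Ix)
    mag = np.sqrt(Ix ** 2 + Iy ** 2)
    ori = np.arctan2(np.abs(Iy), np.abs(Ix))   # edge orientation
    return np.stack([x, y, np.abs(Ix), np.abs(Iy), mag, np.abs(Ixx), np.abs(Iyy), ori])

def region_covariance(F, y0, y1, x0, x1):
    """C_R = 1/(L-1) * sum_i (f_i - mu)(f_i - mu)^T over the rectangle R."""
    f = F[:, y0:y1, x0:x1].reshape(F.shape[0], -1)
    return np.cov(f)    # np.cov removes the mean and divides by L-1

I = np.outer(np.arange(20.0), np.ones(20))     # toy vertical-gradient image
F = feature_image(I)
C_obj = region_covariance(F, 5, 15, 5, 15)
```

The background descriptor would be computed the same way over the enlarged rectangle with the object pixels excluded.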

3.2.2. Incremental pairwise discriminant analysis

From research on Riemannian metrics, it can easily be concluded that both the object and background covariance matrices are symmetric positive definite (SPD) matrices [26] lying on a connected Riemannian manifold. Enlightened by the work of Arsigny et al. [26] on the log-Euclidean Riemannian metric for statistics on SPD matrices, this section develops incremental learning for pairwise discriminant analysis in detail.

In our tracking framework, the data are represented by the two covariance matrices $C_{object}$ and $C_{background}$ of Fig. 2, assumed to be the object and background observations, respectively. By the log-Euclidean mapping, the two covariance tensors are transformed into

$$Lg_{object} = \log C_{object}, \qquad Lg_{background} = \log C_{background} \quad (10)$$

which are called the log-Euclidean covariance tensors. Owing to the vector space structure of $\log C$ under the log-Euclidean Riemannian metric, $\log C$ is unfolded into a $d^2$-dimensional vector z, formulated as

$$z_{object} = \mathrm{UT}(\log C_{object}), \qquad z_{background} = \mathrm{UT}(\log C_{background}) \quad (11)$$

where $\mathrm{UT}(\cdot)$ is an operator unfolding a matrix into a column vector.

The classic R-SVD algorithm [4] efficiently computes the singular value decomposition of a dynamic matrix with newly added columns or rows, based on the existing SVD. However, the R-SVD algorithm is based on a zero-mean assumption, which leads to failure in tracking subspace variabilities. Based on [4], Lim et al. [14] improved the R-SVD algorithm to compute the eigenbasis of a scatter matrix with the mean update. Based on the improved R-SVD [14], we apply the update method to the object and background subspaces, so as to keep $C^+$ and $C^-$ instead of maintaining all the data so far. Table 1 gives the pseudo code for updating the pairwise discriminant subspace.

In Table 1, $\alpha, \beta > 0$ are weighting factors, and $0 < \lambda \le 1$ is the forgetting factor used to alleviate the influence of old data on the subspace update. Notice that the old covariance matrices $C^{\pm}_{old}$ can also be substituted by the eigenvectors from the SVD decomposition of the old data, ignoring the minor energy of $C^{\pm}_{old}$, for the sake of memory saving.

Complexity: The space consumption of the incremental pairwise discriminant analysis algorithm is O(d⁴) if the old data are stored in $C^{\pm}_{old}$, while it is O(d²) if the old data are stored as {eigenvector, eigenvalue} pairs, of which only those carrying most of the energy, say 90%, are kept. The computational cost is O(d⁴) for updating the covariance matrix and O(d⁶) for the SVD decomposition. In this paper, the feature number d is 7 for gray images and 23 for color images.

4. Visual tracking based on incremental pairwise discriminant subspace

The tracking procedure is in the framework of Bayesian state inference, assuming the motion between consecutive frames to be affine. Let $x_t$ denote the state variable describing the affine motion parameters of an object at time t. Given a set of observed evidence $Z_t = \{z_1, \ldots, z_t\}$, the posterior probability is formulated by the Bayesian theorem as

$$p(x_t \mid Z_t) \propto p(z_t \mid x_t) \int p(x_t \mid x_{t-1})\, p(x_{t-1} \mid Z_{t-1})\, dx_{t-1} \quad (12)$$

where $p(z_t \mid x_t)$ denotes the observation model and $p(x_t \mid x_{t-1})$ the dynamic model. In the tracking framework, we apply an affine image warping to model the object motion between two consecutive frames. The six parameters of the affine transform are used to model $p(x_t \mid x_{t-1})$ of a tracked object. Let $x_t = (x_t, y_t, r_t, s_t, a_t, k_t)$, where the six parameters denote the x, y translations, rotation angle, scale, aspect ratio, and skew direction at time t, respectively. Since the motion of the object from one frame to the next can be modeled by a first-order Markov model, the state parameter at time t depends only on time t−1, and a Gaussian distribution is used to describe the state transition:

$$p(x_t \mid x_{t-1}) = \mathcal{N}(x_t;\, x_{t-1}, \Sigma) \quad (13)$$



where $\Sigma$ is a diagonal covariance matrix whose elements are the corresponding variances of the affine parameters, i.e., $\sigma_x^2, \sigma_y^2, \sigma_r^2, \sigma_s^2, \sigma_a^2, \sigma_k^2$.

The observation model $p(z_t \mid x_t)$ reflects the probability that a sample is generated from the subspace. In this paper, the evidence consists of two sets of data with the positive and negative samples, $z_t = \{z_t^+, z_t^-\}$, as shown in Fig. 2. The similarity of the sample to the discriminant subspace is likewise constituted by two parts:

$$p(z_t \mid x_t) \propto \exp\Big\{-\big[\|(z_t^+ - \bar{I}^+) - U_{old}U_{old}^T(z_t^+ - \bar{I}^+)\|^2 + \|U_{old}^T(z_t^- - \bar{I}^-)\|^2\big]\Big\} \quad (14)$$

where the first term in the exponential of Eq. (14) denotes the reconstruction error of the positive sample on the subspace, while the second term represents the projection of the negative sample on the subspace. According to the analysis in Section 3, the subspace keeps the positive sample close to the subspace with a small reconstruction error, and the corresponding negative sample far away from the subspace with a small projection.
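The observation weight of Eq. (14) can be sketched as follows; the function returns an unnormalized likelihood, and all names are illustrative:

```python
import numpy as np

def observation_likelihood(z_pos, z_neg, U, mean_pos, mean_neg):
    """Unnormalized p(z_t | x_t) of Eq. (14): reconstruction error of the
    positive sample plus projection energy of the negative sample."""
    dp = z_pos - mean_pos
    recon_err = np.sum((dp - U @ (U.T @ dp)) ** 2)   # ||(z+ - I+) - U U^T (z+ - I+)||^2
    dn = z_neg - mean_neg
    proj = np.sum((U.T @ dn) ** 2)                   # ||U^T (z- - I-)||^2
    return float(np.exp(-(recon_err + proj)))

# a positive sample lying in the subspace paired with a negative sample
# orthogonal to it obtains the maximal weight of 1
U = np.array([[1.0], [0.0], [0.0]])                  # 1-D subspace along e1
w_best = observation_likelihood(np.array([2.0, 0.0, 0.0]),
                                np.array([0.0, 3.0, 0.0]),
                                U, np.zeros(3), np.zeros(3))
```

In Step 2 of the tracking loop below, this weight would be evaluated for every affine-warped candidate patch, and the candidate with the maximum weight kept as the state estimate.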

The entire procedure of the proposed algorithm is summarizedas follows:


Initialization: At t = 0, the object state x is specified by user input. The covariance descriptors for the object and background are computed, and the prior for the affine parameters is provided.

Iteration: For t > 0, perform the following three steps until the video is over:
- Step 1: Generate the samples with the affine parameters, warp the frame by the affine parameters, and compute the covariance descriptor of each sample image patch.
- Step 2: Compute the weights of the samples by Eq. (14), keep the state parameter with the maximum weight as the optimal estimation, and draw the tracking result.
- Step 3: Keep the feature data of the optimal estimation in a buffer. While the number of data in the buffer is

Table 2. The test video sequences.

Sequence     Colorful   Object type   No. of frames   Feature number d
Dudek        No         Face          500             7
Indoor       No         Face          340             7
Toy          No         Toy           1100            7
Basketball   Yes        Human body    280             23
Corridor     Yes        Pedestrian    400             23
Skiing       Yes        Human body    300             23

Fig. 3. The tracking results in video sequence ‘‘dudek’’ with the frame numbe

more than l, apply the incremental learning in Section 3 toupdate the pairwise discriminant subspace.

r #4

In this paper, the sample number is set to 100 and l is 5.

5. Experimental results and discussion

In order to evaluate the performance of the proposed trackingalgorithm, we collected six videos as shown in Table 2. The humanface, pedestrians and toy as the tracking objects, where the firstthree videos are captured indoor undergoing large pose variantand drastic illumination, the last three video sequences arerecorded with moving human in shopping center, basketball courtand water-skiing in a lake. Among the six video sequences, thefirst three video are gray and the others are colorful.

5.1. Tracking a human face

In this section, we desire two experiments to evaluate thetracking performance mainly undergoing pose variant andocclusion, in Exp1 and Exp2, respectively.

Exp1: In this experiment, the test video sequence is the‘‘dudek’’, which is widely used in various tracking research, withthe characteristics: occlusion, fast motion, pose, and appearancevariant. The appearance change in this video sequence is drasticand challenging for many tracking method. Our experimentshows that the discriminant subspace stick to the difference ofthe true object and background. The pairwise condition constrainsthe discriminant subspace close to the object subspace and faraway from the background subspace as shown in Fig. 3.

Exp2: The video sequence ‘‘indoor’’ is taken indoor with twopeople walking around the camera. The human face of interesthas much appearance variant because of the drastic pose changeand occlusion. Our experimental results show the good trackingperformance even when the face recovers from the large posechange. Generally, the tracker would drift the appearance to themost likelihood sample and lose the track when the drasticappearance variant happens. In our method, the tracker could findthe object by the pairwise discriminant analysis with backgroundinformation as shown in Fig. 4.

5.2. Tracking a toy object

The video sequence ‘‘sylv’’ is also a challenging video for thetracking task because the cumulated error during the long-time

6, #120, #133, #158, #165, # 193, #207, and #215, in order.

Page 6: Incremental pairwise discriminant analysis based visual tracking

Fig. 4. The tracking results in video sequence ‘‘indoor’’ with the frame number #12, #43, #54, #62, #72, #82, #88, #106, #111, #170, #180, #220, #229, #280, #285, and

#297, in order.

J. Wen et al. / Neurocomputing 74 (2010) 428–438 433

process. Though many trackers could get good performance in theforepart frame of the sequence ‘‘sylv’’, they usually could not bearthe long duration of the tracking, due to that the appearancemodel has been gradually adapted to the observed evidencewhich is so different from the object. As shown in Fig. 5, ourapproach could keep good tracking performance with thedicriminant analysis from the background information.

5.3. Tracking a human body

In this section, the test video sequences are colorful, so the memory and computational consumption are much larger than for gray videos, because the image features are computed in each channel of the color images. In these experiments, the objects of interest are human bodies, which undergo more appearance deformation than faces, since the human body has many more degrees of freedom. Tracking a human body in natural video with substantial non-rigid motion is therefore more challenging.

Exp1: In the video sequence shown in Fig. 6, the object of interest is a person playing basketball, whose appearance undergoes pose and scale variation as well as occlusion by other players. Note that the background around the object has a color similar to the object around frames #106, #118, and #126. However, our tracker still tracks the object effectively, because it keeps the discriminant subspace consistent during the subspace update.

Exp2: As shown in Fig. 7, the object of interest is a pedestrian walking away from the camera. Although the object appearance itself changes little, the observed image evidence is still influenced by illumination and by occlusion from other people. The results around frames #100 and #216 show the robustness of our tracking method.

Exp3: In the ''skiing'' sequence, the object of interest is a water-skiing person who performs many actions during the run, with appearance deformation resulting from the motion variation. As shown in Fig. 8, the proposed tracking method achieves good performance despite the deformation of the object appearance.

5.4. Comparison results

In this section, we use the sequences ''indoor'' and ''sylv'' to compare the tracking performance of the trackers in [14,32] and our approach. In [14], improved incremental subspace learning (ISL) is proposed to compute the eigenvectors of the updated scatter matrix with the mean updated. The ISL tracker performs well in short-time tracking. However, this is achieved under the condition that the optimally estimated object appearance data in the early period are not polluted by drastic deformation or by disturbances in the surroundings, such as illumination changes and occlusion; otherwise, the error accumulates during the subspace update and the track is lost. In ISL, the subspace used for tracking depends only on the maximum likelihood of the sample with respect to the subspace; once the tracker starts to adapt the subspace to an observed image region that does not cover the true object, the track is lost. In [32], the subspaces describing the object of interest are constructed from five types of covariance matrices transformed into log-Euclidean Riemannian matrices, and the five covariance matrices are updated by the same mechanism as in [14]. The incremental log-Euclidean Riemannian subspace learning (IRSL) tracker is intended to cover most possible conditions of the object appearance with the five covariance matrices. However, without considering background knowledge, the tracker in [32] cannot obtain stable tracking results on videos whose background texture and intensity are similar to the object's. Moreover, due to the update of five subspaces and their measurement in the observation, the tracker in [32] is computationally very expensive.

Fig. 5. The tracking results in video sequence ''sylv'' with the frame numbers #67, #98, #136, #156, #169, #259, #328, #423, #593, #654, #839, #849, #855, #976, #998, and #1016, in order.

Fig. 6. The tracking results in video sequence ''basketball'' with the frame numbers #18, #76, #81, #104, #118, #126, #148, and #182, in order.

Fig. 7. The tracking results in video sequence ''corridor'' with the frame numbers #11, #68, #100, #142, #177, #192, #218, and #270, in order.

Fig. 8. The tracking results in video sequence ''skiing'' with the frame numbers #6, #46, #71, #91, #99, #127, #158, and #169, in order.

Exp1: As shown in Fig. 9, the top, middle, and bottom rows are the tracking results of [14], [32], and our approach, named IPDA, respectively. As analyzed above, because the requirement on the early-period subspace update in ISL is not satisfied, the ISL tracker loses the object (top row in Fig. 9). The IRSL tracker produces unstable results, although it can cover the object region (middle row in Fig. 9). The proposed method achieves stable tracking results, even though the object undergoes drastic motion and pose variation.

Exp2: In the comparison on the video sequence ''sylv'' in Fig. 10, the ISL tracker performs well in the early stage of tracking. However, it starts to lose the object around frame #620. IRSL covers only part of the object region throughout the tracking. The proposed method maintains excellent tracking performance even over the long tracking process.

Exp3: As shown in Fig. 11, the two plots compare the location error of ISL [14], IRSL [32], and IPDA on the sequences ''indoor'' and ''toy'', respectively. The left plot in Fig. 11 shows the result on ''indoor'', where the object undergoes drastic pose variation with fast motion and scale change; this causes both ISL and IRSL to drift from the true location, as the blue and black curves in Fig. 11 show. Similarly, ISL and IRSL also lose the track on the sequence ''toy'', because the object has the same characteristics as the one in ''indoor'', together with complicated illumination. The red curves in Fig. 11 show that IPDA not only locates the object accurately, but also keeps the track over a rather long duration.

Fig. 9. The comparison results of [14,32] and the proposed method on the video sequence ''indoor''. The top, middle, and bottom rows are the tracking results by ISL [14], IRSL [32], and the proposed method, respectively.

Fig. 10. The comparison results of [14,32] and the proposed method on the video sequence ''sylv''. The top, middle, and bottom rows are the tracking results by ISL [14], IRSL [32], and the proposed method, respectively.

Fig. 11. The comparison results of the proposed IPDA, ISL [14], and IRSL [32] on the video sequences ''indoor'' and ''toy''. The left and right plots show location error versus frame number on the ''indoor'' and ''toy'' sequences, respectively.
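The location error plotted in Fig. 11 can be read as the per-frame Euclidean distance between the tracked object center and the ground-truth center; the paper does not spell out the formula, so this is our interpretation.

```python
import numpy as np

def location_error(tracked_centers, gt_centers):
    """Per-frame Euclidean distance between tracked and
    ground-truth object centers (one (x, y) pair per frame)."""
    tracked = np.asarray(tracked_centers, dtype=float)
    gt = np.asarray(gt_centers, dtype=float)
    return np.linalg.norm(tracked - gt, axis=1)

err = location_error([[10, 10], [13, 14]], [[10, 10], [10, 10]])
print(err.tolist())  # [0.0, 5.0]
```

Plotting this value against the frame number reproduces the style of curves shown in Fig. 11.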

5.5. Discussion

All the experiments above validate the proposed approach. When a model adapts to false object features, the updated model usually drifts far from the true model, especially when a forgetting factor is included, and the tracker drifts to non-object evidence. Moreover, once the drift starts, most methods have no remedy to pull it back. The drift is therefore unstable and catastrophic.

In contrast, exploiting the discriminant information between the object and the background provides feedback on the uncertainty of the object estimate; moreover, the subspace-consistency constraint prevents the updated subspace from bending to suddenly polluted appearance, since the object subspace over a short interval is supposed to be linear.

6. Conclusion

In this paper, incremental pairwise discriminant analysis based object tracking is proposed to deal with drift of the object model. The proposed method achieves good tracking results by considering the following factors: (1) preserving the ability to delineate the object and push the background away from it; (2) the pairwise adaptation of the object–background pair to the discriminant subspace; and (3) the subspace consistency across successive frames. The proposed method can thereby prevent tracking drift.

Our further work will focus on the fusion of the pairwise discriminant and subspace-consistency constraints in the case where the two constraints conflict.

Acknowledgment

This research is supported by the National Basic Research Program of China (973 Program) (Grant No. 2011CB707000), the National Natural Science Foundation of China (Grant Nos. 60771068, 60702061, 60832005, and 61072093), the Open-End Fund of the National Laboratory of Pattern Recognition of CAS, the National Laboratory of Automatic Target Recognition of Shenzhen University, and the Program for Chang-Jiang Scholars and Innovative Research Team in University of China.

References

[1] M.J. Black, A.D. Jepson, Eigentracking: robust matching and tracking of articulated objects using view-based representation, in: Proceedings of the Fourth European Conference on Computer Vision, Cambridge, UK, vol. 1, April 15–18, 1996, pp. 329–342.

[2] G.D. Hager, P.N. Belhumeur, Real-time tracking of image regions with changes in geometry and illumination, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, San Francisco, CA, USA, vol. 1, June 18–20, 1996, pp. 403–410.

[3] S. Birchfield, Elliptical head tracking using intensity gradients and color histograms, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Santa Barbara, CA, USA, vol. 1, June 23–25, 1998, pp. 232–237.

[4] A. Levy, M. Lindenbaum, Sequential Karhunen–Loeve basis extraction and its application to images, IEEE Trans. Image Process. 9 (8) (2000) 1371–1374.

[5] A.D. Jepson, D.J. Fleet, T.R. El-Maraghi, Robust online appearance models for visual tracking, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Kauai, HI, vol. 1, December 9–14, 2001, pp. 415–422.

[6] K. Toyama, A. Blake, Probabilistic tracking in a metric space, in: Proceedings of the IEEE International Conference on Computer Vision, Vancouver, BC, Canada, vol. 2, July 7–14, 2001, pp. 50–57.

[7] J. Vermaak, P. Perez, M. Gangnet, A. Blake, Towards improved observation models for visual tracking: selective adaptation, in: Proceedings of the Seventh European Conference on Computer Vision, Copenhagen, Denmark, vol. 1, May 2002, pp. 645–660.

[8] R.T. Collins, Y. Liu, On-line selection of discriminative tracking features, in: Proceedings of the IEEE International Conference on Computer Vision, Nice, France, vol. 1, October 13–16, 2003, pp. 346–352.

[9] K.C. Lee, J. Ho, M.H. Yang, D. Kriegman, Video-based face recognition using probabilistic appearance manifolds, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Madison, WI, vol. 1, June 18–20, 2003, pp. 313–320.

[10] J. Ho, K.C. Lee, M.H. Yang, D.J. Kriegman, Visual tracking using learned linear subspace, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, vol. 1, June 27–July 2, 2004, pp. 782–789.

[11] D. Ross, J. Lim, M.H. Yang, Adaptive probabilistic visual tracking with incremental subspace update, in: Proceedings of the European Conference on Computer Vision, Prague, Czech Republic, vol. 1, May 2004, pp. 215–227.

[12] S. Avidan, Support vector tracking, IEEE Trans. Pattern Anal. Mach. Intell. 26 (8) (2004) 1064–1072.

[13] J. Ho, K.C. Lee, M.H. Yang, D.J. Kriegman, Visual tracking using learned linear subspace, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Grand Hyatt, Washington, vol. 1, June 27–July 2, 2004, pp. 782–789.

[14] J. Lim, D. Ross, R.-S. Lin, M.-H. Yang, Incremental learning for visual tracking, in: Advances in Neural Information Processing Systems 17, Vancouver, BC, Canada, December 13–18, 2004, pp. 801–808.

[15] S.K. Zhou, R. Chellappa, B. Moghaddam, Visual tracking and recognition using appearance-adaptive models in particle filters, IEEE Trans. Image Process. 13 (11) (2004) 1491–1506.

[16] A. Elgammal, Learning to track: conceptual manifold map for closed-form tracking, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, San Diego, CA, vol. 1, June 20–26, 2005, pp. 724–730.

[17] J. Wang, X. Chen, W. Gao, Online selecting discriminative tracking features using particle filter, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, San Diego, CA, vol. 2, June 20–26, 2005, pp. 1037–1042.

[18] S. Avidan, Ensemble tracking, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, San Diego, CA, vol. 2, June 20–25, 2005, pp. 494–501.

[19] A.P. Leung, S. Gong, Online feature selection using mutual information for real-time multi-view object tracking, in: Proceedings of the IEEE International Workshop on Analysis and Modeling of Faces and Gestures, Beijing, China, October 16, 2005, pp. 184–197.

[20] F. Tang, H. Tao, Object tracking with dynamic feature graph, in: Proceedings of the IEEE Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance, Beijing, China, October 15–16, 2005, pp. 25–32.

[21] M. Yang, Y. Wu, Tracking non-stationary appearances and dynamic feature selection, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, San Diego, CA, vol. 2, June 20–26, 2005, pp. 1059–1066.

[22] K.C. Lee, D.J. Kriegman, Online learning of probabilistic appearance manifolds for video-based recognition and tracking, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, San Diego, CA, vol. 1, June 20–25, 2005, pp. 852–859.

[23] X. He, D. Cai, S. Yan, H. Zhang, Neighborhood preserving embedding, in: Proceedings of the IEEE Conference on Computer Vision, Beijing, China, vol. 2, 2005, pp. 1208–1213.

[24] H. Grabner, M. Grabner, H. Bischof, Real-time tracking via on-line boosting, in: Proceedings of the Conference on British Machine Vision, Edinburgh, vol. 1, September 4–7, 2006, pp. 47–56.

[25] H. Lim, V.I. Morariu, O.I. Camps, M. Sznaier, Dynamic appearance modeling for human tracking, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, New York, NY, vol. 1, June 17–22, 2006, pp. 751–757.

[26] V. Arsigny, P. Fillard, X. Pennec, N. Ayache, Geometric means in a novel vector space structure on symmetric positive-definite matrices, SIAM J. Matrix Anal. Appl. 29 (1) (2007) 328–347.

[27] O. Tuzel, F. Porikli, P. Meer, Human detection via classification on Riemannian manifolds, in: Proceedings of the Conference on Computer Vision and Pattern Recognition, Minneapolis, MN, June 17–22, 2007, pp. 1–8.

[28] Z. Yin, R. Collins, On-the-fly object modeling while tracking, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Minneapolis, MN, vol. 1, June 17–22, 2007, pp. 1–8.

[29] D. Cai, X. He, J. Han, Efficient kernel discriminant analysis via spectral regression, in: Proceedings of the International Conference on Data Mining, Omaha, Nebraska, USA, October 28–31, 2007, pp. 427–432.

[30] D. Cai, X. He, J. Han, Semi-supervised discriminant analysis, in: Proceedings of the International Conference on Computer Vision, Rio de Janeiro, Brazil, October 14–20, 2007, pp. 1–7.


[31] D. Xu, S. Lin, S. Yan, X. Tang, Rank-one projections with adaptive margin for face recognition, IEEE Trans. Syst. Man Cybern. Part B 37 (5) (2007) 1226–1236.

[32] X. Li, W. Hu, Z. Zhang, et al., Visual tracking via incremental log-Euclidean Riemannian subspace learning, in: Proceedings of the Conference on Computer Vision and Pattern Recognition, Anchorage, AK, vol. 1, June 23–28, 2008, pp. 1–8.

[33] D. Cai, X. He, J. Han, SRDA: an efficient algorithm for large-scale discriminant analysis, IEEE Trans. Knowledge Data Eng. 20 (1) (2008) 1–12.

[34] Y. Yuan, Y. Pang, Discriminant adaptive edge weights for graph embedding, in: Proceedings of the IEEE Conference on Acoustics, Speech, and Signal Processing, Las Vegas, NV, USA, 2008, pp. 1993–1996.

[35] Y. Yuan, Y. Pang, Boosting simple projections for multi-class dimensionality reduction, in: Proceedings of the IEEE Conference on Systems, Man, and Cybernetics, Singapore, 2008, pp. 2231–2235.

[36] X. He, D. Cai, J. Han, Learning a maximum margin subspace for image retrieval, IEEE Trans. Knowledge Data Eng. 20 (2) (2008) 189–201.

[37] X. Li, S. Lin, S. Yan, D. Xu, Discriminant locally linear embedding with high-order tensor data, IEEE Trans. Syst. Man Cybern. Part B 38 (2) (2008) 342–352.

[38] D. Xu, S. Yan, S. Lin, T.S. Huang, Convergent 2-D subspace learning with null space analysis, IEEE Trans. Circuits Syst. Video Technol. 18 (12) (2008) 1753–1759.

[39] D. Xu, S. Yan, L. Zhang, H. Zhang, T.S. Huang, Reconstruction and recognition of tensor-based objects with concurrent subspaces analysis, IEEE Trans. Circuits Syst. Video Technol. 18 (1) (2008) 36–47.

[40] D. Xu, S. Yan, S. Lin, T.S. Huang, S.F. Chang, Enhancing bilinear subspace learning by element rearrangement, IEEE Trans. Pattern Anal. Mach. Intell. 31 (10) (2009) 1913–1920.

[41] D. Xu, S. Yan, Semi-supervised bilinear subspace learning, IEEE Trans. Image Process. 18 (7) (2009) 1671–1676.

[42] Y. Yuan, Y. Pang, J. Pan, X. Li, Scene segmentation based on IPCA for visual surveillance, Neurocomputing 72 (10–12) (2009) 2450–2454.

[43] Y. Yuan, X. Li, Y. Pang, X. Lu, D. Tao, Binary sparse nonnegative matrix factorization, IEEE Trans. Circuits Syst. Video Technol. 19 (5) (2009) 772–777.

[44] Y. Lu, Q. Tian, Discriminant subspace analysis: an adaptive approach for image classification, IEEE Trans. Multimedia 11 (7) (2009) 1289–1300.

[45] H. Zhou, Y. Yuan, C. Shi, Object tracking using SIFT features and mean shift, Comput. Vision Image Understanding 113 (2) (2009) 345–352.

[46] H. Zhou, Y. Yuan, Y. Zhang, C. Shi, Non-rigid object tracking in complex scenes, Pattern Recognition Lett. 30 (2) (2009) 98–102.

[47] T. Zhang, B. Fang, Y. Tang, Z. Shang, B. Xu, Generalized discriminant analysis: a matrix exponential approach, IEEE Trans. Syst. Man Cybern. Part B: Cybern. 40 (1) (2010) 186–197.

[48] X. Li, Y. Pang, Deterministic column-based matrix decomposition, IEEE Trans. Knowledge Data Eng. 22 (1) (2010) 145–149.

[49] X. He, Laplacian regularized D-optimal design for active learning and its application to image retrieval, IEEE Trans. Image Process. 19 (1) (2010) 254–263.

[50] Y. Yuan, Y. Pang, X. Li, Footwear for gender recognition, IEEE Trans. Circuits Syst. Video Technol. 20 (1) (2010) 131–135.

[51] J. Wen, X. Gao, Y. Yuan, D. Tao, Incremental tensor biased discriminant analysis: a new color-based visual tracking, Neurocomputing 73 (4–6) (2010) 827–839.

[52] X. Li, Y. Pang, Y. Yuan, L1-norm-based 2DPCA, IEEE Trans. Syst. Man Cybern. Part B 40 (4) (2010) 1170–1175.

Jing Wen received the B.Sc. degree in Electronic Information Science and Technology from Shanxi University, Taiyuan, China, in 2003, and the M.Eng. degree in Signal and Information Processing from Xidian University, Xi'an, China, in 2006. Since August 2006, she has been pursuing her Ph.D. degree in Pattern Recognition and Intelligent System at Xidian University. Her research interests include pattern recognition and computer vision.

Xinbo Gao received his Bachelor degree in Electronic Engineering and his Master and Ph.D. degrees in Signal and Information Processing from Xidian University, Xi'an, China, in 1994, 1996, and 1999, respectively. From 1997 to 1998, he was a Research Fellow in Dr. Hiroyuki Iida's Group, Department of Computer Science, Shizuoka University, Hamamatsu, Japan. From 2000 to 2001, he was a Postdoctoral Fellow in Dr. Xiaoou Tang's Group, Department of Information Engineering, the Chinese University of Hong Kong, Shatin, NT, Hong Kong SAR, China. Since 2003, Dr. Gao has been a full Professor in the School of Electronic Engineering at Xidian University, Xi'an, China, and the Director of the Video & Image Processing System Laboratory (VIPSL). Since 2005, he has been the Director of the Office of Cooperation and Exchange, Xidian University. Since 2008, he has concurrently served as the Dean of the School of International Education, Xidian University. His research interests include visual information processing and analysis, pattern recognition, machine learning, and computational intelligence. In 2004, Dr. Gao was selected as a member of the program for New Century Excellent Talents in University of China by the Ministry of Education (MOE). He was awarded the title Pacemaker of Ten Excellent Young Teachers of Shaanxi Province in 2005. In 2006, he received the Young Teacher Award of High School from the Fok Ying Tung Education Foundation. From 2006, he was selected as an Expert enjoying the Government Special Subsidy. In 2007, as one of the principal members, he and his colleagues founded an Innovative Research Team in University, MOE, China. In 2008, he was awarded one of the 10 distinguished teachers of Xidian University. This year, he was selected as a candidate of the One Hundred plus One Thousand plus Ten Thousand Talents Project of the New Century. Dr. Gao is a Fellow of IET/IEE and Vice Chairman of the IET Xi'an Network; a Senior Member of IEEE; a Member of the IEEE Xi'an Section Executive Committee and Chair of its Membership Development Committee; Vice President of the Computational Intelligence Chapter, IEEE Xi'an Section, and a Member of the Technical Committee on Cognitive Computing, IEEE SMC Society; a Senior Member of the China Computer Federation (CCF) and an Academic Committee Member of YOCSEF, Xi'an; a Senior Member of the Chinese Institute of Electronics (CIE); an Executive Member of the China Society of Image and Graphics (CSIG) Council; and a Member of the editorial boards of the EURASIP Signal Processing Journal, Neurocomputing, the International Journal of Multimedia Intelligence and Security, and the International Journal of Image and Graphics.

Xuelong Li is a Researcher (i.e., full professor) with the State Key Laboratory of Transient Optics and Photonics and the Director of the Center for OPTical IMagery Analysis and Learning (OPTIMAL), Xi'an Institute of Optics and Precision Mechanics, Chinese Academy of Sciences, Xi'an 710119, Shaanxi, P.R. China.

Dacheng Tao received the B.Eng. degree from the University of Science and Technology of China (USTC), the M.Phil. degree from the Chinese University of Hong Kong (CUHK), and the Ph.D. degree from the University of London. Currently, he is a Nanyang Assistant Professor with the School of Computer Engineering at Nanyang Technological University, a Visiting Professor at Xidian University, a Guest Professor at Wuhan University, and a Visiting Research Fellow at Birkbeck, University of London. His research mainly applies statistics and mathematics to data analysis problems in data mining, computer vision, machine learning, multimedia, and visual surveillance. He has published nearly 100 scientific papers, including in IEEE TPAMI, TKDE, TIP, TMM, TCSVT, TIFS, TSMC-B, TSMC, TITB, CVPR, ECCV, ICDM, ACM TKDD, Multimedia, KDD, etc., with one best paper runner-up award. Previously, he gained several Meritorious Awards from the International Interdisciplinary Contest in Modeling, the highest-level mathematical modeling contest in the world, organized by COMAP. He is an associate editor of IEEE Transactions on Knowledge and Data Engineering, Neurocomputing (Elsevier), and the official journal of the International Association for Statistical Computing, Computational Statistics and Data Analysis (Elsevier). He has authored/edited six books and eight special issues, including in CVIU, PR, PRL, SP, and Neurocomputing. He has (co-)chaired special sessions, invited sessions, workshops, and conferences. He has served for more than 50 major international conferences, including CVPR, ICCV, ECCV, ICDM, KDD, and Multimedia, and for more than 15 top international journals, including TPAMI, TKDE, TOIS, TIP, TCSVT, TMM, TIFS, TSMC-B, Computer Vision and Image Understanding (CVIU), and Information Science. He is a member of IEEE, the IEEE Computer Society, the IEEE Signal Processing Society, the IEEE SMC Society, and the IEEE SMC Technical Committee on Cognitive Computing.

Jie Li received the B.Sc., M.Sc., and Ph.D. degrees in Circuits and Systems from Xidian University, China, in 1995, 1998, and 2005, respectively. In 1998, she joined the School of Electronic Engineering at Xidian University. Currently, she is a Professor at Xidian University. Her research interests include computational intelligence, machine learning, and image processing. In these areas, she has published over 30 technical articles in refereed journals and proceedings, including IEEE TCSVT, IJFS, etc.