
2013 Sixth International Conference on Advanced Computational Intelligence

October 19-21, 2013, Hangzhou, China

Distance Metric Learning for Multi-Camera People Matching

Haoxiang Wang, Ferdinand Shkjezi, and Ela Hoxha

Abstract- In this paper, we propose a supervised distance metric learning method for matching people across different, non-overlapping camera views, which is an important and challenging problem for behavior understanding. Unlike previous methods, which try to extract good visual features, we model the task as a distance metric learning problem. We formulate the problem so that the learned distance between a pair of images of the same person (a true match) is smaller than that of a wrongly matched pair. We conducted experiments on one benchmark dataset and demonstrate the advantage of the proposed distance learning model over state-of-the-art multi-camera people matching techniques.

I. INTRODUCTION

Matching a person seen by one camera to the same person seen by another, non-overlapping camera is very important for behaviour understanding. This task is defined as the multi-camera people matching problem, as shown in Figure 1. Although much work has been done recently, the problem is still open [1], [2]. Assume we already have an image of a target person and many candidate images from another camera; we try to find the right match for the target person among these candidate images. Three phases are usually needed for this problem [3]:

Fig. 1. Multi-Camera People Matching Problem.

1) First, visual features are extracted from both the target person image and the candidate images [4];

2) Then, the distance between each pair of possible matches is computed [5];

H. Wang is with the School of Electronic Engineering, Xidian University, Xi'an 710126, China. F. Shkjezi and E. Hoxha are with the Universiteti Sevasti & Parashqevi Qirjazi (USPQ), Koder Kamez, Tirane, Albania.

This work was supported by Chongqing Key Laboratory of Computational Intelligence (Grant No. CQ-LCI-2013-02).

978-1-4673-6343-3/13/$31.00 ©2013 IEEE

3) Finally, we decide which candidate image shows the same person as the target image, based on these distances.
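The three phases above can be sketched as follows. This is a minimal illustration rather than the implementation used in the experiments: the toy vectors stand in for the output of phase 1, and `l1_distance` and `rank_candidates` are hypothetical names introduced here.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]

def l1_distance(a: Vector, b: Vector) -> float:
    """Placeholder distance for phase 2 (sum of absolute differences)."""
    return sum(abs(p - q) for p, q in zip(a, b))

def rank_candidates(
    target: Vector,
    candidates: List[Vector],
    distance: Callable[[Vector, Vector], float] = l1_distance,
) -> List[int]:
    """Phases 2 and 3: score every candidate, then order by increasing distance."""
    distances = [distance(target, c) for c in candidates]
    return sorted(range(len(candidates)), key=lambda j: distances[j])

# Phase 1 is assumed to be done already: these are stand-in feature vectors.
target = [0.2, 0.5, 0.3]
candidates = [[0.6, 0.1, 0.3], [0.25, 0.45, 0.3], [0.0, 0.0, 1.0]]

order = rank_candidates(target, candidates)
best_match = order[0]  # index of the candidate closest to the target
```

Any other distance function, including a learned one, can be passed through the `distance` argument without changing the ranking step.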

Fig. 2. Three phases of Multi-Camera People Matching: visual feature extraction, distance measure, and ordering.

Almost all current work addresses the first phase, the extraction of visual features, and usually relies on simple distance functions for the second phase, which is not an optimal procedure.

The problem considered in this paper is the second phase of multi-camera people matching: distance metric learning [6]. We assume that visual features have already been extracted from each candidate image; our goal is to learn a good distance metric so that the correct matches can be found based on these features. To this end, inspired by a large margin based visual vocabulary learning and weighting method [7], we formulate the problem as a distance metric learning problem in which the distance between a true match (images of the same person) should always be smaller than that between a wrong match (images of different persons) [8]. In this way, the true match is ordered before the wrong matches when the candidates are returned to the user by distance. We develop an iterative algorithm [9] based on the logistic function [10] to learn the distance metric. We also carry out experiments on a multi-camera people matching database, and the encouraging results show that our algorithm achieves high matching accuracy.

The contributions of this paper are as follows:

1) We formulate the multi-camera people matching problem as a distance metric learning problem.

2) We propose a distance learning model based on a logistic objective function for this problem.

3) We also develop a new iterative algorithm based on this objective function.

Fig. 3. Illustration of the distance metric learning problem, with the distances d(x, x') and d(x, x'').

II. PROPOSED METHOD

The multi-camera people matching problem is transformed into a distance metric learning problem. We first represent each image of a person as a feature vector $x$. Given an image $x$, a distance measure is sought so that another image $x'$ of the same person from another camera can be found. To this end, we also take an image $x''$ of a different person and learn a distance metric function $d(\cdot, \cdot)$ such that $d(x, x') < d(x, x'')$, as shown in Figure 3. A triplet set $T = \{(x_i, x_i', x_i'')\}_{i=1}^{n}$ is defined for the learning problem.
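One way to assemble such a triplet set from labelled feature vectors is sketched below. It assumes that every image carries a person identifier and, for brevity, ignores camera labels; `build_triplets` is a hypothetical helper, not part of the method description.

```python
from itertools import permutations
from typing import List, Sequence, Tuple

def build_triplets(features: Sequence, person_ids: Sequence[int]) -> List[Tuple]:
    """Return every (x, x_pos, x_neg): x and x_pos share a person, x_neg does not."""
    triplets = []
    n = len(features)
    for i, j in permutations(range(n), 2):
        if person_ids[i] != person_ids[j]:
            continue  # (i, j) must be a true match
        for k in range(n):
            if person_ids[k] != person_ids[i]:
                triplets.append((features[i], features[j], features[k]))
    return triplets

# Strings stand in for feature vectors so the pairing is easy to read.
features = ["a1", "a2", "b1", "b2"]
person_ids = [0, 0, 1, 1]
triplets = build_triplets(features, person_ids)
```

Enumerating all triplets grows quickly with the number of images, so in practice a random subset may be sampled instead.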

We parameterize the distance metric function $d(x, x')$ as a squared Mahalanobis distance function [11]:

$$d(x, x') = \|W^\top x - W^\top x'\|^2 = \|W^\top (x - x')\|^2 = (x - x')^\top W W^\top (x - x') \qquad (1)$$

where $W = [w_1, \cdots, w_m]$ is the metric matrix. The learning problem thus becomes that of learning the matrix $W$ so that

$$d(x_i, x_i') < d(x_i, x_i''), \quad \forall (x_i, x_i', x_i'') \in T \qquad (2)$$
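The distance of Eq. (1) can be checked numerically with a short sketch. The dimensions, the random `W`, and the function name `metric_distance` are assumptions made only for illustration.

```python
import numpy as np

def metric_distance(W: np.ndarray, x: np.ndarray, x_other: np.ndarray) -> float:
    """Squared distance ||W^T (x - x_other)||^2 from Eq. (1)."""
    projected = W.T @ (x - x_other)
    return float(projected @ projected)

rng = np.random.default_rng(0)
d, m = 5, 3                      # hypothetical feature and projection sizes
W = rng.standard_normal((d, m))  # columns play the role of w_1, ..., w_m
x = rng.standard_normal(d)
x_pos = rng.standard_normal(d)

# The last form in Eq. (1): (x - x')^T W W^T (x - x').
v = x - x_pos
quadratic_form = float(v @ (W @ W.T) @ v)
```

Both forms give the same value up to rounding, and the distance is never negative because $W W^\top$ is positive semidefinite.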

This is further modeled as a logistic objective function [12] to be minimized as follows:

$$O(W) = \sum_{i=1}^{n} \log\{1 + \exp[d(x_i, x_i') - d(x_i, x_i'')]\} = \sum_{i=1}^{n} \log\{1 + \exp[\|W^\top (x_i - x_i')\|^2 - \|W^\top (x_i - x_i'')\|^2]\} \qquad (3)$$
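A minimal sketch of the objective in Eq. (3) follows, assuming the triplets are stored row-wise in three arrays. The function name and the toy values are hypothetical; `np.logaddexp(0, z)` is used as a numerically stable way to evaluate log(1 + exp(z)).

```python
import numpy as np

def logistic_objective(W, X, X_pos, X_neg):
    """O(W) from Eq. (3); rows of X, X_pos, X_neg are x_i, x_i', x_i''."""
    d_pos = np.sum(((X - X_pos) @ W) ** 2, axis=1)  # d(x_i, x_i')
    d_neg = np.sum(((X - X_neg) @ W) ** 2, axis=1)  # d(x_i, x_i'')
    return float(np.sum(np.logaddexp(0.0, d_pos - d_neg)))

W = np.eye(2)
X = np.array([[0.0, 0.0]])
X_pos = np.array([[1.0, 0.0]])   # close to x: a true match
X_neg = np.array([[3.0, 0.0]])   # far from x: a wrong match

good = logistic_objective(W, X, X_pos, X_neg)  # constraint (2) satisfied
bad = logistic_objective(W, X, X_neg, X_pos)   # roles swapped: constraint violated
tie = logistic_objective(W, X, X_pos, X_pos)   # equal distances give log 2
```

A correctly ordered triplet contributes a value close to zero, while a violated one contributes roughly the size of the violation, which is why minimizing the sum pushes the metric towards satisfying (2).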

In this way, we transform the learning of the distance measure into the learning of $W$ as a minimization problem [13]:


Input: Training set $T = \{(x_i, x_i', x_i'')\}_{i=1}^{n}$.

Input: The iteration number T.

Initialize W

For t = 1, ..., T

    Update W_t based on Eq. (6);

Endfor

Output: W_T

Fig. 4. Algorithm for the metric matrix learning.

$$\min_{W} O(W) \quad \text{s.t.} \quad W^\top W = I, \qquad (4)$$

where $W^\top W = I$ constrains the columns of $W$ to be a set of mutually orthogonal, unit-length vectors [14].

The orthogonality constraints imposed on $W$ make the problem difficult to optimize. Instead of imposing the hard constraint $W^\top W = I$, we convert it into a penalty term $\|W^\top W - I\|^2$ that is added to the objective function. The learning problem then becomes:

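$$\min_{W} Q(W), \quad Q(W) = \sum_{i=1}^{n} \log\{1 + \exp[\|W^\top (x_i - x_i')\|^2 - \|W^\top (x_i - x_i'')\|^2]\} + \rho \|W^\top W - I\|^2 \qquad (5)$$

To solve (5), we develop an iterative algorithm that learns $W$ by updating it with the gradient descent algorithm [15], [16]:

$$W^{new} \leftarrow W^{old} - \eta \left. \frac{\partial Q(W)}{\partial W} \right|_{W = W^{old}} \qquad (6)$$

Here $\eta$ is the step size, and $\rho$ in (5) is the weight of the penalty term.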

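The gradient in (6) is

$$\frac{\partial Q(W)}{\partial W} = \sum_{i=1}^{n} \frac{\exp[\|W^\top (x_i - x_i')\|^2 - \|W^\top (x_i - x_i'')\|^2]}{1 + \exp[\|W^\top (x_i - x_i')\|^2 - \|W^\top (x_i - x_i'')\|^2]} \times [(x_i - x_i')(x_i - x_i')^\top - (x_i - x_i'')(x_i - x_i'')^\top] W + \rho (I - W). \qquad (7)$$

The algorithm is summarised in Figure 4.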
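The procedure of Figure 4 can be sketched as below. This is an illustration under stated assumptions, not the implementation used in the experiments: the penalty in (5) is taken as a squared Frobenius norm, and the gradient is obtained by differentiating (5) directly, so its constant factors and its penalty term, $4\rho W (W^\top W - I)$, may differ from the printed form of (7). The step size, the penalty weight, the number of iterations, the initialisation, and the toy data are all hypothetical choices.

```python
import numpy as np

def _margins(W, A, B):
    """z_i = ||W^T a_i||^2 - ||W^T b_i||^2 for rows a_i of A and b_i of B."""
    return np.sum((A @ W) ** 2, axis=1) - np.sum((B @ W) ** 2, axis=1)

def objective(W, X, X_pos, X_neg, rho):
    """Q(W) of Eq. (5), with the penalty taken as a squared Frobenius norm."""
    z = _margins(W, X - X_pos, X - X_neg)
    G = W.T @ W - np.eye(W.shape[1])
    return float(np.sum(np.logaddexp(0.0, z)) + rho * np.sum(G ** 2))

def gradient(W, X, X_pos, X_neg, rho):
    """Analytic gradient of objective() with respect to W."""
    A, B = X - X_pos, X - X_neg
    z = _margins(W, A, B)
    s = 0.5 * (1.0 + np.tanh(0.5 * z))  # logistic factor exp(z) / (1 + exp(z))
    data = 2.0 * (A.T @ (s[:, None] * (A @ W)) - B.T @ (s[:, None] * (B @ W)))
    penalty = 4.0 * rho * W @ (W.T @ W - np.eye(W.shape[1]))
    return data + penalty

def learn_metric(W_init, X, X_pos, X_neg, rho=1.0, eta=0.01, iterations=100):
    """Fixed-step gradient descent: the update of Eq. (6), repeated."""
    W = W_init.copy()
    for _ in range(iterations):
        W = W - eta * gradient(W, X, X_pos, X_neg, rho)
    return W

# Toy histogram-like features: rows are x_i, x_i', x_i'' of three triplets.
X = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.2, 0.1, 0.7]])
X_pos = np.array([[0.6, 0.3, 0.1], [0.2, 0.7, 0.1], [0.1, 0.2, 0.7]])
X_neg = np.array([[0.1, 0.8, 0.1], [0.2, 0.1, 0.7], [0.7, 0.2, 0.1]])

W_init = np.eye(3)[:, :2]  # orthonormal starting point (an assumption)
W_learned = learn_metric(W_init, X, X_pos, X_neg)
```

With a sufficiently small step size the objective decreases from its starting value; on larger or differently scaled data the step size would need to be tuned.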

III. EXPERIMENTS

A. Databases and experiment protocols


We used the ETHZ people matching database [17] to test the proposed algorithm. This database contains 8555 images of 146 people. Seven images are selected randomly for each person to construct the training set. To make the setting closer to a multi-camera setup, these images are chosen at random for each person in the dataset. Some samples are given in Figure 5. Many images in the database suffer from occlusion and illumination change (Figure 6) [18].

Fig. 5. Samples of ETHZ database.

Fig. 6. Problems of occlusion (a) and illumination change (b) of ETHZ database.

We randomly select the images of several people from the dataset to construct the test set. The remaining images are used as training samples. The number of selected test people is varied so that the matching performance can be evaluated for different numbers of training samples.
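The protocol above can be sketched as a split by person, so that all images of a selected test person leave the training set. This reading of the protocol, the helper name `split_by_person`, the fixed seed, and the toy identifiers are assumptions for illustration.

```python
import random
from typing import List, Sequence, Tuple

def split_by_person(
    person_ids: Sequence[int], num_test_people: int, seed: int = 0
) -> Tuple[List[int], List[int]]:
    """Return (train_indices, test_indices); each person lands entirely on one side."""
    people = sorted(set(person_ids))
    rng = random.Random(seed)
    test_people = set(rng.sample(people, num_test_people))
    train_idx = [i for i, p in enumerate(person_ids) if p not in test_people]
    test_idx = [i for i, p in enumerate(person_ids) if p in test_people]
    return train_idx, test_idx

# Toy labels: 10 people with 3 images each.
person_ids = [p for p in range(10) for _ in range(3)]
train_idx, test_idx = split_by_person(person_ids, num_test_people=4)
```

Repeating the split for several values of `num_test_people` gives one evaluation point per test-set size.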

B. Experiment results

We compare our distance metric learning method with several existing distance and similarity measures:

• L1-norm distance [19];

• Bhattacharyya distance [20];

• Graph regularized distance [21].
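For reference, the first two baselines can be written in a few lines. The sketch assumes that the features are histograms normalised to sum to one and uses the standard definition of the Bhattacharyya distance, the negative logarithm of the Bhattacharyya coefficient; the exact variants used in the comparison are not specified here, and the graph regularized distance is omitted.

```python
import math
from typing import Sequence

def l1_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """Sum of absolute differences between two feature vectors."""
    return sum(abs(a - b) for a, b in zip(p, q))

def bhattacharyya_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """-ln(BC) with BC = sum_k sqrt(p_k * q_k), for histograms summing to 1."""
    bc = sum(math.sqrt(a * b) for a, b in zip(p, q))
    return -math.log(bc) if bc > 0 else math.inf

h = [0.5, 0.3, 0.2]
g = [0.2, 0.3, 0.5]
d_l1 = l1_distance(h, g)
d_bh = bhattacharyya_distance(h, g)
```

Neither baseline has parameters to learn, which is the main contrast with the metric matrix $W$ of the proposed method.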


The results are given in Figure 7. They clearly show that our method outperforms the other methods, with a significant improvement in matching performance. Using a graph, as in the graph regularized distance, also improves the matching results. These encouraging results verify that learning the distance is very important for the matching problem.

IV. CONCLUSIONS

In this paper, the problem of people matching is formulated as a distance metric learning problem. We also proposed a novel distance metric learning algorithm for this task. It enlarges the distance between a pair of images of different people and reduces the distance between a pair of images of the same person. The experiments show that the proposed method outperforms the other distance learning methods.

REFERENCES

[1] O. Javed, K. Shafique, and M. Shah, "Appearance modeling for tracking in multiple non-overlapping cameras," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2, 2005, pp. 26-33.

[2] S. Kutty, R. Nayak, and L. Chen, "A people-to-people matching system using graph mining techniques," World Wide Web, pp. 1-39, 2013.

[3] Y. Chae, Y.-J. Choi, Y.-H. Seo, and H. Yang, "Robust people tracking using an adaptive sensor fusion between a laser scanner and video camera," International Journal of Distributed Sensor Networks, vol. 2013, 2013.

[4] X.-Y. Wang, H.-Y. Yang, Y.-W. Li, and F.-Y. Yang, "Robust color image retrieval using visual interest point feature of significant bit-planes," Digital Signal Processing: A Review Journal, vol. 23, no. 4, pp. 1136-1153, 2013.

[5] D. Koloseni, J. Lampinen, and P. Luukka, "Differential evolution based nearest prototype classifier with optimized distance measures for the features in the data sets," Expert Systems with Applications, vol. 40, no. 10, pp. 4075-4082, 2013.

[6] Y. Mu, W. Ding, and D. Tao, "Local discriminative distance metrics ensemble learning," Pattern Recognition, vol. 46, no. 8, pp. 2337-2349, 2013.

[7] J. J.-Y. Wang, H. Bensmail, and X. Gao, "Joint learning and weighting of visual vocabulary for bag-of-feature based tissue classification," Pattern Recognition, vol. 46, no. 12, pp. 3249-3255, 2013.

[8] H. Cevikalp and B. Triggs, "Hyperdisk based large margin classifier," Pattern Recognition, vol. 46, no. 6, pp. 1523-1531, 2013.

[9] Z. Ma, G. Zhang, C. Zhao, J. Xu, and L. Hu, "Iterative algorithm of conjugate depth for parabolic channels," Flow Measurement and Instrumentation, vol. 32, pp. 1-4, 2013.

Fig. 7. Experiment results. Curves: proposed method, graph regularized distance, Bhattacharyya distance, and L1-norm distance. Horizontal axis: number of test people.

[10] N. Kudryashov, "Polynomials in logistic function and solitary waves of nonlinear differential equations," Applied Mathematics and Computation, vol. 219, no. 17, pp. 9245-9253, 2013.

[11] R. Todeschini, D. Ballabio, V. Consonni, F. Sahigara, and P. Filzmoser, "Locally centred mahalanobis distance: A new distance measure with salient features towards outlier detection," Analytica Chimica Acta, 2013.

[12] J. Zhang and K. Wang, "Stream line optimization model with matching degree between logistics supply and demand as objective function," Xinan Jiaotong Daxue Xuebao/Journal of Southwest Jiaotong University, vol. 45, no. 2, pp. 324-330, 2010.

[13] D. Sahu, Q. Ansari, and J. Yao, "A unified hybrid iterative method for hierarchical minimization problems," Journal of Computational and Applied Mathematics, vol. 253, pp. 208-221, 2013.

[14] A. Sang, T. Sun, M. Chen, H. Chen, and L. Liu, "6d vector orthogonal transformation and its application in multi view video coding," Imaging Science Journal, vol. 61, no. 4, pp. 341-350, 2013.

[15] Y. Xu, X. Zeng, L. Han, and J. Yang, "A supervised multi-spike learning algorithm based on gradient descent for spiking neural networks," Neural Networks, vol. 43, pp. 99-113, 2013.

[16] X. Duan, H. Sun, L. Peng, and X. Zhao, "A natural gradient descent algorithm for the solution of discrete algebraic lyapunov equations based on the geodesic distance," Applied Mathematics and Computation, vol. 219, no. 19, pp. 9899-9905, 2013.

[17] A. Ess, B. Leibe, and L. Van Gool, "Depth and appearance for mobile scene analysis," in Proceedings of the IEEE International Conference on Computer Vision, 2007.

[18] X. Wei, C.-T. Li, and Y. Hu, "Robust face recognition under varying illumination and occlusion considering structured sparsity," in 2012 International Conference on Digital Image Computing Techniques and Applications, DICTA 2012, 2012.

[19] Z. Wang, S. Gao, and L.-T. Chia, "Learning class-to-image distance via large margin and l1-norm regularization," Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 7573 LNCS, no. PART 2, pp. 230-244, 2012.

[20] S. Liang, N. Liu, G. Wang, and W. Guo, "Camera sabotage detection based on log histogram and bhattacharyya distance," Lecture Notes in Electrical Engineering, vol. 143 LNEE, no. VOL. 1, pp. 197-203, 2012.

[21] J. J.-Y. Wang, H. Bensmail, and X. Gao, "Multiple graph regularized protein domain ranking," BMC Bioinformatics, vol. 13, no. 1, 2012.
