combining hausdorff distance, hsv histogram and...

Combining Hausdorff Distance, HSV Histogram and Nonextensive Entropy for Object Tracking

PAULO S. RODRIGUES1 AND GILSON A. GIRALDI 1 AND JASJIT S. SURI 2

1LNCC–National Laboratory for Scientific Computing -Av. Getulio Vargas, 333, 25651-070 Rio de Janeiro, RJ, Brazil

{pssr,gilson}@lncc.br2Biomedical Research Institute, Idaho, ID, USA

{suri0256}@msn.com

Abstract. In Computational Vision, object tracking in a sequence of frames is one ofthe most important problems. Among the most used approaches there is the model-targetone, which matches a model object against a candidate target region in a frame sequence.To accomplish this task, the Hausdorff distance has an attractiveness dueto its simplicityof implementation and possibility of matching between two sets with different cardinal-ity. Viewing images as non-extensive systems, we may apply the Tsallis Entropy(whichworks with only one parameter, called entropic parameter) to segment the frames in orderto find the target object. In this work we propose a methodology which combines Haus-dorff distance, Bayesian network, HSV histogram and Tsallis non-extensive entropy forobjects recognition and tracking in a frame sequence. With this proposal, wereduce theHausdorff’s noise sensitive and the high parameter dependence of thetracking task. Weapply our method in experiments with one object over a moving background in a sequenceof 300 frames.

1 Introduction

In the context of multimedia applications the detec-tion of objects under changes in illumination, directlyaffect the object tracking task. A fail due to changesin illumination in the detection of the object of inter-est may propagate the error through out the remain-der frames, imposing a need for human interventionor system reinitialization. In this paper we focus onrobust object tracking against changes in illuminationconditions [13, 8].

In a simple way, the problem of object trackingcan be posed as a correspondence problem betweentwo regions. The general idea is to use a model, calledModel-Object(MO), in a framei and to find the cor-respondent Target-Object in the framei + 1 . But, be-fore achieving the correspondence, generally we needa segmentation process which may increase the com-putational complexity and may yield problems for pa-rameter choice [13]. Among these methods, the onewhich uses the Tsallis entropy for segmentation is ofspecial interest for our work [1]. It is based on the hy-pothesis that we can find out a threshold between thebackground and the foreground through the maximiza-tion of the image information (entropy). The main ad-vantage (which we are interested) of this new theoryis that it is based on a parameter, calledq, which may

be handled according to the image characteristics.In this paper, we firstly segment the objects through

the approach based on Tsallis entropy. Then, the taskis handled through correspondence between segmentedregions (objects in scenes) and a Model Object (MO).In this step we use the Hausdorff distance [13, 8] forregion matching and a similarity measure between theHSV Histograms. Our main contribution is the com-bination of the Hausdorff distance and the Tsallis non-extensive entropy in order to reduce the Hausdorff’snoise sensitive and the high parameter dependence ofthe tracking task. To validate our approach, we exper-imentally test it under hard illumination conditions.

The paper is organized as follows. The sections2 and 3 present related works and the background ofthe used theory. Then, in section 4 we describe our al-gorithm which was tested in the experimental resultspresented in section 5. Finally, we end with conclu-sions and final comments (section 6).

2 Related Work

In this paper we focus on region-based object track-ing by combining Hausdorff distance and Tsallis non-extensive entropy. Recently, H. Xu [13] and collab-orators proposed a work for object detection with a

Proceedings of the 6th WSEAS International Conference on Multimedia Systems & Signal Processing, Hangzhou, China, April 16-18, 2006 (pp225-230)

combination of Hausdorff distance and segmentationbased on Watershed approaches. Their method con-cerns of three steps. Firstly, the target objects are ex-tracted with the use of only the two first frames. Withthe extracted object taken as a model, the second stepof the method uses the Hausdorff distance for trackingthe objects of interest in the remainder frames. Theobject found in the last frame is used as a new modelto find the object in the current one. Then, the modelis updated at each frame during the tracking task. Thisapproach has the advantage of letting the system im-mune against possible error propagation. The finalstep uses a Watershed algorithm, not for a segmenta-tion as usual, but for upper bounding the object bound-ary. This strategy avoids the influence of external re-gions, as well as noises, in the object segmentation.The main disadvantage, outlined by the authors, is thattheir proposed approach fails to separate regions withlow contrast; then, it is sensitive to changes in illumi-nation.

Another approach which uses Hausdorff distancefor object tracking is that one proposed by N. Paragiosand R. Deriche [8], which minimizes the interframedifference density function.

Recently, we introduce ongoing works [10, 11]using a Bayesian Network for parameter estimationin a framework for object recognition under an aug-mented reality environment. In these works, we usedfeature extraction to generate several candidate regionsin a frame. Then, the proposed Bayesian Network wasapplied in order to choose, among all candidates, thetarget object.

3 Background

In this section, we present the two main theories un-derlying our proposed approach for object tracking:the Hausdorff distance and the Tsallis NonextensiveEntropy.

For the context of interest, given two point setsA, B ⊂ <2, the Hausdorff distance between them canbe formally defined as [3],[2]:

H(A, B) = max(h(A, B), h(B, A)) (1)

where:

h(A, B) = maxa∈Aminb∈B‖a − b‖. (2)

‖ • ‖ represents some underlying norm defined in<2.The same can be defined for<n, n > 2. In general, itis used aLp norm, usually theL2 or Euclidean norm.The functionh(A, B) is called the direct Hausdorff

distance fromA to B. Intuitively, if h(A, B) = d,each point inA must be within a distanced of somepoint inB.

The maximum value for the Hausdorff distanceof two regionsA andB in an image with dimensionM ×N is the half of its diagonal,η =

√M2 + N2/2.

Therefore, to get the Hausdorff distance between0 and1, we normalize the equation (1) byη, and define thefollowing extensible Haudorff distance:

He(A, B) = 1 − 2H(A, B)

η(3)

The Hausdorff distance can be used for matchingtwo data sets of different cardinalities, it is sensitive tosmall rotations and smooth deformations and is sim-ple for implementation. These are the motivations forapplying Hausdorf distance in this work.

The Tsallis entropy is defined as:

Sq =1 − ∑k

i=1(pi)q

q − 1(4)

where k is the total number of states of the sys-tem, pi is the probability of finding the system inthe statei, and the real numberq is an entropic pa-rameter that characterizes the degree of nonextensiv-ity. This expression meets the Shannon entropy (S =−∑

i pi ln(pi)) in the limit q → 1. The Tsallis entropyis nonextensive in the sense that for two statisticallyindependent subsystems,A andB, the probability ofthe composed system will be:

Sq(A+B) = Sq(A)+Sq(B)+(1−q) ·Sq(A) ·Sq(B)(5)

The concept of entropy has been used for imagesegmentation through thresholding [6][7]. The fol-lowing method, proposed in [1], is such an exampleand will be used in our method described in the nextsection. For an image withk gray-levels, letpi =p1; p2; . . . ; pk be the probability distribution of the lev-els. From this distribution we derive two probabilitydistributions, one for the object (class A) and anotherfor the background (class B). The probability distribu-tions of the object and background classes, A and B,are given by

pA :p1

pA,p2

pA, . . . ,

pt

pA(6)

pB :pt+1

pB,pt+2

pB, . . . ,

pk

pB(7)

wherepA =∑t

i=1 pi andpB =∑k

i=t+1 pi.


The a priori Tsallis entropy for each distributionis defined as

SAt =

1 − ∑ti=1(

pi

pA )q

q − 1(8)

SBt =

1 − ∑ki=t+1(

pi

pB )q

q − 1(9)

The Tsallis entropySq(t), of the composite system,is dependent upon the threshold valuet for the fore-ground and background. If we suppose that A and Bare statistically independent, its value can be obtainedby equation (5):

Sq(t) =1 − ∑t

i=1(pA)q

q − 1+

1 − ∑ki=t+1(pB)q

q − 1+

(1 − q).1 − ∑t

i=1(pA)q

q − 1.1 − ∑k

i=t+1(pB)q

q − 1(10)

We maximize the information measure betweenthe two classes (object and background). WhenSq(t)is maximized, the luminance levelt is considered tobe the optimum threshold value. This can be achievedwith a cheap computational effort. Note that the valuet which maximizes equation (5) depends on the pa-rameterq also. This is an advantage because, by vary-ing q it is possible to obtain a value oft adapted to cur-rent illumination conditions. Certainly, the range forsearchingq values is a key issue in this proposal. For-tunately, the experiments have shown that this range issmall. This fact, makes the nonextensive segmentationadequate to applications which need a frame sequencesegmentation at real time under changes in the illumi-nation conditions.

4 The Proposed Tsallis-Hausdorff-HSV Algorithmfor Tracking

In our work we proposed a framework which com-bines the nonextensive Tsallis entropy, Hausdorff dis-tance and HSV Histogram for tracking objects in aframe sequence. We follow the well known idea ofusing a model, called Model-Object, to find the corre-spondent Target-Object in the next frame.

The segmentation algorithm of section 3 dependson the parameterq. By changingq we can adaptthe measure of information in the image given by theTsallis entropy (equation ( 4)). Thus, to get a com-putational feasible method we sample theq parame-ter to obtain a set{q1, q2, ..., qm} of values. Thus,given a framef , the optimization of expression(10 ) will produce a sequence of optimum thresholds

{

tq1

opt, tq2

opt, ..., tqm

opt

}

and, consequently, a set of seg-

mented (binary) imagesIc = {f1, f2, . . . , fm}, wherefi is the image segmented by usingtqi

opt as a threshold.Fortunately, the experiments have shown that a suit-able small range ofq can be found.

Eachfi may have several segmented regions —among foreground and background ones — which wecall Target-Regions. Then, letri,k ⊂ fi be thek-thTarget-Region of the binary imagefi. In this paper, weassume that the Model-ObjectMO was defined in thefirst frame through user interaction or some automatictechnique [5, 11].

The object recognition in a scene is accomplished,in this work, with the combination of color and shapecharacteristics. For shape characteristic we use theHausdorff distance throughout equation (3), and forcolor characteristics we used the probability distribu-tion of color in the HSV System. Then, we use

Hc(MO, rjk) =

∑

i αi × vi,jk√

∑

i α2i ×

√

∑

i v2i,jk

(11)

as a similarity measure between the HSV Histogramsof the Model-Object, MO, and the HSV Histograms ofthe region of interestrj . In the equation ( 11),αi andvi,jk stand for the bins of the histograms of the regionsMO andrjk, respectively.

Finally, we define the similarity measure,S ∈[0, 1], which uses color and shape between the Model-Object, MO, and the Target-Object,rjk, as follows:

S(MO, rjk) = 1 − [(1 − He(MO, rjk))×

(1 − Hc(MO, rjk))] (12)

This equation stands for the probability of theTarget-Object,rjk, occurs since the Model-Object,MO, has occurred with probability distribution givenby He andHc. Expression (12) is based on the equa-tion defined in [9] and [12], for text information re-trieval, and in [11] for image retrieval, where the prob-ability of occurrence of a documentA since documentB has occurred, is defined as:

P (A|B) = 1 − [(1 − Pi(A|B)) × (1 − Pj(A|B))×

(1 − Pk(A|B))] (13)

This equation, which has been defined from aBayesian Theory [4], also can be interpreted as theprobability of the document A be equals to documentB. Each termi, j andk stands for a document char-acteristic (also called evidence). In the present paper,


we have adapted this equation by considering the ev-idencesi, j and k, as object characteristics such ascolor and shape, and so, obtaining a similarity mea-sure between the Model-Object and the Target-Objectgiven by equation (12).

Our proposed Hausdorff-Tsallis algorithm for track-ing can be summarized as follows: Given a sampleset{q1, q2, ..., qm} and a framef , we computeIc ={f1, f2, . . . , fm}, the set of segmented (binary) framesobtained through the application of the Tsallis entropyalgorithm over the framef . Then, the Hausdorff dis-tance is computed with equation (3) for allrjk ⊂ fj

and the color distance is computed with equation (11)for the MO. Finally, theMO is updated with therjk which minimizes the equation (12). With the newMO, the process is repeated for the next frame.

5 Experimental Results

In order to validate the proposed algorithm, several ex-periments were carried out. The Fig. 1 shows an ex-ample of the problem at hand. For simplification, theFig. 1-(a) shows a unique object under a homogeneousbackground. We assume that before the frame capture,the Model Object’ contour is a known data. The valueof q = 0.35 was chosen since it generates good seg-mentation yielding the contour of Fig. 1-(b) (frame1). However, in the Fig. 1-(c) (frame 7), the illumina-tion conditions changed, and the parameterq = 0.35is now inadequate yielding a wrong segmentation (ascan be seen by the curves outlining the Target-Object).In this case, there will be error propagation to the re-mainder frames.

Even if there is an automatic mechanism avoidingerror propagation — such as, for example, if an objectdata base is requested periodically to check if the cur-rent contour is according to the Model Object — theproblem holds yet, due to the fixed value forq underchanges in illumination.

The Fig. 2 shows a frame sequence with the sameobject of Fig. 1. However, we have applied differentvalues forq, taken according to different scene con-ditions. Moreover we observe changes in the cameraview to show the method’s robustness against changesin the object scale. We carried out the complete ex-periment using our approach with 300 frames for thesame scene of the Fig. 1. From this experiment,4frames are presented in Fig. 3 with the Target-Objectinside a rectangle to indicate a positive finding. Fromthese frames we can see strongly variation of illumi-nation. However, we observe a correct recognition ofthe Target-Object. In this sequence, we can also see

(a) (b)

(c) (d)

Figure 1: A frame sequence illustrating the problemat hand. (a) shows the image of Model-Object; (b)-(d) show some remainder frames under different il-lumination conditions. The white curves around theobjects (b)-(d) bound the best matched regions. Theparameterq = 0.35 is the same for all frames. Un-der changes in illumination conditions the parameterq = 0.35 becomes inadequate to segment the remain-der Target-Regions.

(a) (b)

(c) (d)

(e) (f)

Figure 2: A frame sequence illustrating the applicationof the proposed model when changingq value. (a)-(f)shows the same object model of Fig. 1, under differentillumination and viewer conditions.


a strongly variation in the object viewpoint, e.g: Figs.3-(a), (c). In the performed experiments, the entropicparameterq had a range0 ≤ q ≤ 1.

(a) (b)

(c) (d)

Figure 3: A sequence of frames for which the pro-posed method is applied. The rectangular box showsthe area found with a target object. In this experimentthere are changes in view and illumination.

In order to measure the precision of the proposedmethod, we may define the following equation:

Precision =A ∩ B

A ∪ B(14)

whereA andB are two sets of pixels. This equationcaptures the matching precision underlying both theposition and shape of the regions. We can observe that,when the shape ofA resembles the shape ofB andalso they have geometrical centers near each other, theobtained value tends to1.0. Inversely, when they haveno intersection, the precision tends to0. This equationmatches the shape and position at the same time.

In our experiment, we have used this equationto measure the precision of our proposed algorithmat each frame, where the Model-Object is matchedagainst a manual segmentation, and the equation (14)is used to get the precision. The graphic frames× pre-cision for300 frames are presented in Fig. 4.

6 Conclusion

We presented a methodology for object tracking ina frame sequence. This methodology uses a combi-nation of Hausdorff distance for shape matching andHSV histograms for color comparison. Also, it usesTsallis nonextensive entropy for image segmentation.

The usefulness of Tsallis nonextensive entropywas demonstrated to set the segmentation according

Figure 4: This graphic shows the frames× precisionfor 300 frames.

to changes in illumination. When comparing the seg-mentation using Tsallis entropy with other methodsfor segmentation that use spatial filters and/or math-ematical morphology, the handle of only one parame-ter is one advantage, since the method is efficient androbust against changes in the illuminations conditions.The proposed model is easily extensible to others fea-tures and should be tested for real-time applications.

Acknowledgment

The authors are acknowledged to CNPq (Conselho Na-cional de Desenvolvimento Cientıfico e Tecnologico),a Brazilian agency for scientific financing.

References

[1] M. P. Albuquerque, M. P. Albuquerque, I.A. Es-quef, and A.R.G. Mello. Image thresholding us-ing tsallis entropy.Pattern Recognition Letters,25:1059–1065, 2004.

[2] D. P. Huttenlocher, G. A. Klanderman, and W. J.Rucklidge. Comparing images using the haus-dorff distance. IEEE Transactions on MedicalImaging, 15(9):850–863, 1993.

[3] D. P. Huttenlocher, J. J. Noh, and W. J. Ruck-lidge. Tracking non-rigid objects in complexscenes. Technical Report 1320, Dept. of Com-puter Science, Cornell Univ., 1992.

[4] F. V. Jensen.Bayesian Networks and DecisionGraphs. Statistics for Engineering and Informa-tion Science. Springer, 2001.


[5] C. Kim and J. Hwang. Fast and automaticvideo object segmentation and tracking forcontent-based applications.IEEE Transactionson Circuits and Systems for Video Technology,12(3):122–129, February 2002.

[6] C. H. Li and C. K. Lee. Minimum cross entropythresholding.Pattern Recognition, 26:617–625,1993.

[7] N. R. Pal. On minimum cross entropy threshold-ing. Pattern Recognition, 26:575–580, 1996.

[8] N. Paragios and R. Deriche. Geodesic active con-tours and level sets for detection and tracking ofmoving objects. IEEE Transaction on PatternAnalysis and Machine Intelligence, 22:266–280,March 2000.

[9] B. Ribeiro-Neto and R. R. Muntz. A belief net-work model for IR. In ACM Conference onResearch and Development in Information Re-trieval - SIGIR96, pages 253–260, 1996.

[10] P. S. Rodrigues, D. M. C. Rodrigues, G. A. Gi-raldi, and R. L. S. Silva. A bayesian networkmodel for object recognition environment usingcolor features. InEletronic Proceedings of Sib-grapi’04, 2004.

[11] P. S. Rodrigues, D. M. C. Rodrigues, G. A. Gi-raldi, and R. L. S. Silva. Object recognition us-ing bayesian networks for augmented reality ap-plications. InProceedings of VII Symposium onVirtual Reality (SVR’04), 2004.

[12] I. Silva, B. Ribeiro-Neto, P. P. Calado, E. S.Moura, and N. Ziviani. Link-based and content-based evidential information retrieval in a beliefnetwork model.Proceedings of the 23rd AnnualACM SIGIR Conference on Research and Devel-opment in Information Retrieval, pages 96–103,July 2000.

[13] H. Xu, A. Younis, and M. R. Kabuka. Automaticmoving object extraction for content-based ap-plications. IEEE Transaction on Circuits andSystems for Video Technology, 14(6):796–812,June 2004.


combining hausdorff distance, hsv histogram and...

Documents