probabilistic boundary-based contour tracking with snakes in natural cluttered video sequences


July 23, 2003 13:18 WSPC/INSTRUCTION FILE IJIG˙DM˙10

International Journal of Image and Graphics
© World Scientific Publishing Company

PROBABILISTIC BOUNDARY-BASED CONTOUR TRACKING WITH SNAKES IN NATURAL CLUTTERED VIDEO SEQUENCES

GABRIEL TSECHPENAKIS

School of Electrical and Computer Engineering, National Technical University of Athens, 9 Iroon Polytechniou Str., Zografou, 15773 Athens, Greece

[email protected]

NICOLAS TSAPATSOULIS

School of Electrical and Computer Engineering, National Technical University of Athens, 9 Iroon Polytechniou Str., Zografou, 15773 Athens, Greece

[email protected]

STEFANOS KOLLIAS

School of Electrical and Computer Engineering, National Technical University of Athens, 9 Iroon Polytechniou Str., Zografou, 15773 Athens, Greece

[email protected]

Received 2 August 2002
Revised 19 February 2003
Accepted 31 March 2003

Moving object detection and tracking in video sequences is a task that emerges in various research fields of video processing, including video analysis and understanding, object-based coding and many related applications, such as content-based retrieval, remote surveillance and object recognition. This work revisits one of the most popular deformable templates for shape modeling and object tracking, the Snakes, and proposes a modified snake model and a probabilistic utilization of it for object tracking. Special attention has been paid to complex natural (indoor and outdoor) sequences, where temporal clutter, abrupt motion and external lighting changes are crucial for the accuracy of the results. We also focus on the ability of the proposed approach to handle specific HCI problems, such as face and facial feature tracking. A variety of image sequences are used to illustrate the method's capability, providing theoretical explanation as well as experimental verification in specific tracking problems.

Keywords: Model-based Snakes, rule-driven tracking, object partial occlusion

1. Introduction

Shape modeling and object tracking are very important for the fields of image and video processing and for the numerous applications related to them. Many approaches have been proposed in the literature, focusing on either the highest possible accuracy or the lowest computational cost, depending on the application criteria. Among the most popular approaches, the deformable templates [17] known as active contours have drawn special attention in the last decade, dealing with problems like image and motion segmentation [7,22,4,5,10,32,30], and object detection, localization and tracking in video sequences [35,33,42,8]. A variety of applications are involved, such as object-based video coding, remote surveillance, content-based retrieval and object recognition.

Regarding the general problem of object localization and tracking, existing approaches cope with two basic cases: the examined sequences are captured by (a) a static camera [21] or (b) a mobile camera [31,43]. For the special case of static cameras, the proposed methods can be divided into two main classes: the feature-based ones, which depend on the extraction of general sequence characteristics, and the pixel-based ones, which examine the differences between successive frames using pixels as input features. Based on the motion estimation results, mobile objects are then extracted using a variety of motion segmentation algorithms. For sequences acquired by a moving camera, existing approaches can be divided into two categories: one deriving constraints for the object's 2D motion parameters from the 3D motion of the camera [39], and another constructing a 2D parametric motion model for the dominant background motion [27,21,40,43]. In both cases (static and moving camera), using the extracted motion features, various active contour models [41] are then employed to estimate the contours of the mobile objects, dealing with object distortion due to temporal clutter or changes in viewing geometry.

A major category of active contours, the Snakes, has been successfully applied to a variety of problems, such as edge detection and tracking [13]. Snakes are based on energy minimization along a curve; that is, a curve (or snake) deforms its shape so as to minimize an "internal" and an "external" energy along its boundary. However, various problems arise with these models due to the strong presence of noise, mostly in natural cluttered sequences [37], the requirement of an appropriate shape initialization [6], and/or the parameter tuning [29]. Although snakes, due to their linearity, are computationally efficient and thus applicable even to real-time problems, handling the above problems automatically increases the complexity of the algorithms significantly.

In this work we try to balance the tradeoff between computational complexity and the efficient solution of the issues mentioned in the previous paragraph. We present a modified snake model, which is relatively robust to parameter tuning and at the same time handles image clutter through an appropriate definition of the external energy term, and a probabilistic utilization of it for tracking contours of moving objects, which minimizes the effort devoted to shape initialization. The proposed method consists of two main steps: the extraction of the "uncertainty regions" of each object in a sequence, and the estimation of the mobile object contours. The term "uncertainty regions" describes the regions in a frame where moving contours may be located, whereas the estimation of the contours consists of an energy minimization procedure based on the proposed snake energy terms. More specifically, the contour of a moving object is estimated first in a few successive frames of a sequence; this can be achieved with appropriate parameter initialization utilizing the proposed snake model. Then, in the current frame, a family of curves is generated by oscillating each point of the last estimated curve inside the respective uncertainty regions; these regions are extracted using the displacement history of each point of the contour. The correct curve, representing the contour of the moving object in the current frame, is picked out as the solution of an energy minimization problem. In order to reduce the computational complexity of the proposed method, we follow a force-based approach, converting energy terms to respective forces, and the individual terms that form the overall model are appropriately defined and justified.

Special attention has been given to accuracy, enabling the proposed approach to provide efficient solutions in a variety of applications involving different classes of objects and object movements, even in sequences with complex noisy backgrounds. Moreover, computational complexity is kept low, aiming at applications that require fast implementation of object tracking techniques.

The paper is organized as follows. In Section 2, the theoretical background of the Snakes is given, while in Section 3 the proposed modified snake model is described. In subsection 3.1, we describe some morphological operations that are used for image enhancement purposes and for the extraction of some useful image features that improve the accuracy of the proposed snake model. In Section 4, the object tracking problem is posed and the proposed approach is extensively described and theoretically justified in a variety of cases, where common problems of contour estimation and tracking emerge. In Section 5, the performance of the proposed method is experimentally tested in natural sequences, where objects of different size and shape complexity are tracked in cluttered environments. Finally, in Section 6, conclusions are drawn about the efficiency of the proposed method and a brief description of further work is given.

2. Snake Model

Active contour models were first introduced by Kass et al. [18] through the so-called snake models. In general, snakes concern model and image data analysis, through the definition of a linear energy function and a set of regularization parameters. This energy function consists of two parts: (a) the data-driven component (external energy), which depends on the image data according to a chosen criterion, and (b) the smoothness-driven one (internal energy), which enforces smoothness along the snake. The goal is to minimize the total snake energy, and this is achieved iteratively, after considering an initial estimate for the object shape (prototype). Once such an appropriate initialization is specified, the snake can converge to the nearby energy minimum, using gradient descent techniques.

According to the above formulation, a snake is modeled as being able to deform elastically, but any deformation increases its internal energy, causing a "restitution" force which tries to bring it back to its original shape. At the same time, the snake is immersed in an energy field (created by the examined image), which causes a force acting on the snake. These two forces balance each other and the contour actively adjusts its shape and position until it reaches a local minimum of its total energy. Fig. 1 presents an example of image segmentation using a snake. In Fig. 1(a) the snake is initialized (manually) near the head, which is the object of interest. The snake gets attracted to the salient edges of high gradient (Fig. 1(b)) and is finally locked at the head boundary (Fig. 1(c)).

Fig. 1. Image segmentation using the snake approach. The image energy is defined inversely proportional to the image gradient. (a) Input image: the snake is initialized around the object of interest, (b) intermediate image: the snake is attracted to the salient edges by image gradient, (c) final configuration: the snake converges to the object boundary of high image gradient.

The main problem with snakes is that the initial approximation (snake initialization) must usually be close to the object boundaries; otherwise the procedure may be trapped in local minima and give insufficient results. Moreover, snakes turn out to be inefficient in the presence of strong noise (temporal clutter and/or external lighting changes), when the edges at object boundaries are relatively smooth, or when the background is complex and does not provide smoothness close to object boundaries.

Let us consider a snake $C_{snake}$, which represents a curve defined by a position vector $X = (x(p), y(p))$ on the image plane, and is parameterized by $p$, $0 \le p \le P$. The total energy function of the snake $C_{snake}$ is then a weighted summation of an internal energy factor $E_{int}$, corresponding to the summation of a bending and a stretching energy term, and an external energy factor $E_{ext}$, which denotes how the snake evolves according to the examined image features:

$$E_{snake} = a \cdot E_{int} + b \cdot E_{ext}, \tag{1}$$

$$E_{int} = \int_0^P e_{int}(p)\,dp, \tag{2}$$

$$E_{ext} = \int_0^P e_{ext}(p)\,dp, \tag{3}$$

where $e_{int}(p)$ and $e_{ext}(p)$ are the respective internal and external energy terms defined for each point $p$ of the curve, whereas $a$ and $b$ are the normalization parameters.

For the internal energy $E_{int}$, a variety of definitions have been proposed, according to the application at hand, and thus several examples of such energy terms can be found in the literature; indicatively, three approaches are mentioned: the B-snake [9], where the curve is approximated by B-spline polynomials, the affine-invariant (AI-) snake [15] and the curvature-based snake [36] models. In contrast, in most approaches the external energy term for each point $p$ of the snake is defined as:

$$e_{ext}(p) = 1 - |\nabla G_\sigma * I(x(p), y(p))| \cdot |\mathbf{g}(p) \cdot \mathbf{n}(p)|, \tag{4}$$

where $|\nabla G_\sigma * I(x(p), y(p))|$ (a gradient-of-Gaussian operator) denotes the magnitude of the gradient of the image $I$ convolved with a Gaussian filter of variance $\sigma$, at point $(x(p), y(p))$, corresponding to the curve point $p$. The unit vectors $\mathbf{g}(p)$ and $\mathbf{n}(p)$ denote the image gradient direction and the normal vector of the snake at the point $p$, respectively.

From eq. (4) it can be easily seen that in smooth regions of the image $I$, where the image gradient is relatively low, the external energy term takes values close to 1 (maximum), whereas in regions including (or close to) edges, the image gradient (normalized so as to have maximum value equal to 1) is high and the external energy term takes values close to 0. In this sense, the external energy forces the snake towards the most salient edges of the image during the total energy minimization procedure. In natural sequences with clutter and complex backgrounds, however, such a definition of external energy usually leads to inadequate results, due to background edges that lie close to objects with smoother edges, or due to the presence of edges in the interior of the object boundaries; especially in noisy images, even if the snake initialization is very close to the desired contour, such a criterion leads to insufficient results.

After the definition of the energy terms, the convergence of the snake to the object boundary is given by the solution of its total energy minimization:

$$C_{snake} = \arg\min [a \cdot E_{int} + b \cdot E_{ext}] \tag{5}$$

Fig. 2 illustrates the snake self-adjustment so as to minimize its total energy. The prototype (snake initialization) is denoted by a continuous line, while the respective new positions of the snaxels, which provide the optimal shape corresponding to the energy minimum, are presented by dots.

The minimization of the snake total energy function is usually achieved by following a gradient descent approach, utilizing dynamic programming algorithms.
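As an illustration of eq. (4) and of the discrete counterpart of eqs. (1)-(3), the following minimal Python sketch evaluates the external energy at a single snaxel and sums per-point energies over a contour. The function names (`external_energy`, `total_energy`) and the argument `max_grad_mag`, used to normalize the gradient magnitude to [0, 1], are assumptions made for this example; the gradient vector is assumed to be already computed from the Gaussian-smoothed image.

```python
import math


def unit(v):
    """Return v scaled to unit length; the zero vector is returned unchanged."""
    norm = math.hypot(v[0], v[1])
    return (v[0] / norm, v[1] / norm) if norm > 0 else (0.0, 0.0)


def external_energy(grad, normal, max_grad_mag):
    """Eq. (4): e_ext = 1 - |grad| * |g . n|, with |grad| normalized to [0, 1].

    grad         -- image gradient vector at the snaxel (assumed precomputed)
    normal       -- snake normal vector at the snaxel
    max_grad_mag -- largest gradient magnitude in the image (hypothetical
                    normalization constant, not named in the text)
    """
    mag = math.hypot(grad[0], grad[1]) / max_grad_mag
    g = unit(grad)
    n = unit(normal)
    alignment = abs(g[0] * n[0] + g[1] * n[1])
    return 1.0 - mag * alignment


def total_energy(e_int, e_ext, a, b):
    """Discrete counterpart of eqs. (1)-(3): weighted sums over the snaxels."""
    return a * sum(e_int) + b * sum(e_ext)


# Strong edge aligned with the snake normal, flat region, edge along the tangent.
edge = external_energy(grad=(10.0, 0.0), normal=(1.0, 0.0), max_grad_mag=10.0)
flat = external_energy(grad=(0.0, 0.0), normal=(1.0, 0.0), max_grad_mag=10.0)
tangent = external_energy(grad=(0.0, 10.0), normal=(1.0, 0.0), max_grad_mag=10.0)
print(edge, flat, tangent)
```

Under these assumptions the energy is lowest where a strong gradient is aligned with the snake normal, and stays at its maximum in flat regions or where the gradient is perpendicular to the normal.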

3. Proposed Snake Model

The proposed model adopts the snake formulation described above, regarding the total energy minimization, but gives different definitions to the respective energy terms. The energies are defined as presented below with the aim of making optimum use of the snake model for tracking a contour in natural video sequences, even in the presence of strong temporal clutter.


Fig. 2. Face contour example: snake prototype (continuous line) and new snaxel positions (dots) that minimize the total snake energy.

Let us consider again a snake $C_{snake}$, defined by the position vector $X = (x(p), y(p))$ on the image plane, and parameterized by $p$. The internal energy of the snake is then defined in terms of the snake curvature $K_{snake}$ and its point density distribution $DV_{snake}$, considering that the snake points are not equally spaced along the curve:

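$$K_{snake} = \frac{\dot{x} \cdot \ddot{y} - \ddot{x} \cdot \dot{y}}{(\dot{x}^2 + \dot{y}^2)^{3/2}}, \tag{6}$$

$$DV_{snake} = \sqrt{\dot{x}^2 + \dot{y}^2}, \tag{7}$$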

The first and second derivatives of $(x, y)$ define the velocity and the acceleration along the curve:

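$$\dot{X} = [\dot{x}, \dot{y}] = \left[\frac{dx}{dp}, \frac{dy}{dp}\right], \tag{8}$$

$$\ddot{X} = [\ddot{x}, \ddot{y}] = \left[\frac{d^2x}{dp^2}, \frac{d^2y}{dp^2}\right] \tag{9}$$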

Practically, in the discrete case, as in our implementation, the snake is defined by $N$ ordered points $V_i$, i.e. $C_{snake} = [C_{snake}(V_i) \mid i = 1, \ldots, N]$, corresponding to the spatial coordinates $(x_i, y_i)$ on the image plane. Thus, the derivatives of $(x, y)$ at each site $V_i$ are obtained using the neighboring points $V_{i-1}$ and $V_{i+1}$, e.g. for the first derivatives:

$$\left\{ \dot{x}|_i = \frac{x_{i+1} - x_{i-1}}{2}, \quad \dot{y}|_i = \frac{y_{i+1} - y_{i-1}}{2} \right\} \tag{10}$$
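The discrete quantities of eqs. (6), (7) and (10) can be sketched in a few lines of Python. This is only an illustrative sketch for a closed contour: the first derivatives follow eq. (10), whereas the second derivatives use the standard central second difference, which is an assumption of this example since the text gives the discrete form of the first derivatives only. The function name `curvature_and_density` is hypothetical.

```python
import math


def curvature_and_density(xs, ys):
    """Return (K, DV) per snaxel of a closed contour given as coordinate lists."""
    n = len(xs)
    curvature, density = [], []
    for i in range(n):
        prev_i, next_i = i - 1, (i + 1) % n  # index -1 wraps to the last point
        # First derivatives, eq. (10).
        dx = (xs[next_i] - xs[prev_i]) / 2.0
        dy = (ys[next_i] - ys[prev_i]) / 2.0
        # Second derivatives: central second difference (assumed discretization).
        ddx = xs[next_i] - 2.0 * xs[i] + xs[prev_i]
        ddy = ys[next_i] - 2.0 * ys[i] + ys[prev_i]
        speed_sq = dx * dx + dy * dy
        density.append(math.sqrt(speed_sq))  # eq. (7)
        if speed_sq > 0:
            curvature.append((dx * ddy - ddx * dy) / speed_sq ** 1.5)  # eq. (6)
        else:
            curvature.append(0.0)  # coincident neighbors: curvature undefined
    return curvature, density


# Circle of radius 5 sampled counterclockwise with 100 equally spaced points.
N, R = 100, 5.0
xs = [R * math.cos(2 * math.pi * i / N) for i in range(N)]
ys = [R * math.sin(2 * math.pi * i / N) for i in range(N)]
K, DV = curvature_and_density(xs, ys)
print(K[0], DV[0])
```

For this test circle the curvature estimate is close to 1/R at every point, and the point density is constant because the samples are equally spaced; traversing the same points in the opposite direction flips the sign of the curvature.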

Also, according to its definition, the sign of the curvature is positive if the contour is locally convex, and negative if it is concave. Moreover, the curvature distribution uniquely defines a curve and does not change significantly as the desired object moves, so it can also uniquely define a propagating curve at different time instances. However, it should be stated that curvature is not affine-invariant and thus it is inappropriate for shape modeling in object recognition problems [15,1]. In the proposed snake model the points constituting a curve are not equally spaced, and thus the distances between successive points represent the local elasticity of the snake. As can be easily seen from eqs. (7) and (8), what we call "local elasticity" is actually the norm of the velocity along the curve, i.e. $DV_{snake} = \|\dot{X}\|$. In our implementation, a slight variation in the curvature of the snake, corresponding to relatively smooth parts of it, leads to low point densities, whereas greater variation in the curvature distribution leads to higher point densities. Thus, the point density distribution represents the constraint that the snake elasticity should not change significantly in successive frames of a sequence.

However, it should be noted that the curvature and elasticity terms, not being affine-invariant, cannot determine snake point correspondences between two successive frames of a video sequence. That is, when the object undergoes affine transformations due to the projected motion on the image plane, such as motion towards the camera, rotations, or even camera zooming, these terms are not adequate to define the correspondences between snake points. This problem is crucial to object recognition, but in our method these terms are used as curve similarity criteria for the object's contour in successive frames, as will be explained in the following sections. Fig. 3 illustrates these two distributions for a given curve. In Fig. 3(a), the snake is locked at the boundary of the desired object (car). The curve points shown on the snake correspond to indicative parts along the curve, where the variance of the curvature and point density distributions is high. The respective distributions are illustrated in Figs. 3(b,c), where the marked points are also shown as dots. As shown, smooth parts of the snake correspond to curvature values near zero, whereas rough parts of it correspond to high variance of the curvature distribution.

Using the definitions of eqs. (6) and (7), the snake internal energy term for a point $p$ is given by

$$e_{int}(p) = |K_{snake}(p)| + DV_{snake}(p), \tag{11}$$

In cluttered sequences, when the edges at object boundaries are relatively smooth, or when the background is complex and does not provide smoothness close to object boundaries, the external energy term defined in eq. (4) turns out to be inefficient, as explained above. Another term is therefore introduced, which minimizes the local variations of the image gradient while preserving the most "important" image regions. This is achieved through morphological operations leading to a modified image gradient. In particular, in eq. (4), the expression $\nabla G_\sigma * I$ is replaced by a morphological modified image gradient $G_m$, and the image-data criterion is strengthened through the square of $G_m$:

$$e_{ext}(p) = 1 - G_m^2(p) \cdot |\mathbf{g}_m(p) \cdot \mathbf{n}(p)|, \tag{12}$$

where $\mathbf{g}_m(p)$ and $\mathbf{n}(p)$ denote the modified image gradient unit vector and the normal vector of the snake at the point $p$, respectively.

To obtain the morphological modified image gradient $G_m$, we follow the procedure described in the next subsection.
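A minimal Python sketch of the per-point terms of eqs. (11) and (12) is given below. It assumes that the curvature, the point density and the modified gradient magnitude (normalized to [0, 1]) are already available at the snaxel; the function names are hypothetical and chosen only for this example.

```python
import math


def internal_energy(curvature, density):
    """Eq. (11): e_int = |K_snake| + DV_snake at one snaxel."""
    return abs(curvature) + density


def modified_external_energy(gm, g_dir, n_dir):
    """Eq. (12): e_ext = 1 - Gm^2 * |g_m . n|.

    gm    -- modified gradient magnitude at the snaxel, assumed in [0, 1]
    g_dir -- direction of the modified gradient (any nonzero length)
    n_dir -- direction of the snake normal (any nonzero length)
    """
    g_norm = math.hypot(g_dir[0], g_dir[1])
    n_norm = math.hypot(n_dir[0], n_dir[1])
    if g_norm == 0 or n_norm == 0:
        return 1.0  # no usable direction: treat the point as a flat region
    alignment = abs(g_dir[0] * n_dir[0] + g_dir[1] * n_dir[1]) / (g_norm * n_norm)
    return 1.0 - gm ** 2 * alignment


# A weak edge (Gm = 0.5) is penalized more than it would be without the square.
print(modified_external_energy(0.5, (1.0, 0.0), (1.0, 0.0)))  # 0.75
print(modified_external_energy(1.0, (1.0, 0.0), (1.0, 0.0)))  # 0.0
```

Squaring a normalized magnitude leaves the strongest edges unchanged while pushing weak responses towards zero, which is one way to read the statement that the image-data criterion is strengthened through the square of $G_m$.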


Fig. 3. Curvature and point density distribution of a given contour: (a) the snake is locked at car boundaries, whereas the circled areas denote parts of the curve of high curvature and point density, (b) curvature distribution and (c) point density distribution.

3.1. Morphological Operations for the Definition of the Snake External Energy

In order to preserve the most important regions, and thus the most salient edges, of each frame of a video sequence, we utilize some morphological operations leading to the proposed modified image gradient used in eq. (12), exploiting some well-known advantages that mostly appear in morphological watershed image segmentation [28,24].

3.1.1. Image Pre-filtering:

To obtain the modified image gradient, we first pre-smooth the image (video frame) with a non-linear morphological filter, called ASF (Alternating Sequential Filter) [23], and we extract the morphological image gradient. The ASF used in our model is based on morphological area opening ($\circ$) and closing ($\bullet$) operations with structure elements of increasing scales. More specifically, let us denote by $S_i$, $i = 1, \ldots, 8$, the 2-pixel connected line segments oriented at $45(i-1)$ degrees, and by $nS_i$ the corresponding $(n+1)$-pixel elements of size $n = 1, 2, 3, \ldots$. The openings $\alpha_n$ and closings $\beta_n$ that make up the filter are the following:

$$\alpha_n(I)(x, y) = \max_{i \in [1,4]} [I \circ nS_i(x, y)], \tag{13}$$


Fig. 4. Structure elements of orientations 0, 45, 90, 135, 180, 225, 270 and 315 degrees and increasing scale: (a) 5 × 5, (b) 7 × 7 and (c) 9 × 9, for ASF construction.

Fig. 5. Comparative results of different filters: (a) proposed ASF, (b) median and (c) Wiener filter.

$$\beta_n(I)(x, y) = \min_{i \in [1,4]} [I \bullet nS_i(x, y)], \tag{14}$$

and the ASF ψn is given by the following cascade:

$$\psi_n = \beta_n \alpha_n \ldots \beta_2 \alpha_2 \beta_1 \alpha_1 \tag{15}$$

Fig. 4 illustrates an example of structure elements of different orientation and increasing scale, used for the construction of the ASF utilized in our implementation. It should also be noted that structure elements of only four orientations, i.e. 0°, 90°, 180° and 270°, could be used to reduce time complexity, but the obtained results would be of lower accuracy.

The main advantage of such filters is that they preserve line-type image structures, which cannot be achieved with methods such as median filtering [23]. Also, this filter is considered to be the most appropriate one, since it can handle random noise without smoothing edges. Fig. 5 illustrates a comparison between the proposed ASF, a median and a Wiener filter; as can be clearly seen, noise is successfully eliminated in all cases, but only the ASF preserves the most important edges of the examined image.
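The cascade of eqs. (13)-(15) can be sketched in plain Python on a small image stored as a list of rows. This is a minimal sketch under several assumptions that the text does not fix: the line segments are anchored at one end, four directions (0, 45, 90 and 135 degrees) are used for the index range of eqs. (13) and (14), and pixels falling outside the image are simply ignored. All function names are hypothetical.

```python
# Row/column steps for segments at 0, 45, 90 and 135 degrees (assumed set).
DIRECTIONS = [(0, 1), (-1, 1), (-1, 0), (-1, -1)]


def segment(n, direction):
    """Offsets of the (n+1)-pixel line segment nS_i, starting at the origin."""
    dr, dc = direction
    return [(k * dr, k * dc) for k in range(n + 1)]


def _apply(img, offsets, sign, pick):
    rows, cols = len(img), len(img[0])
    out = []
    for r in range(rows):
        row = []
        for c in range(cols):
            vals = [img[r + sign * dr][c + sign * dc] for dr, dc in offsets
                    if 0 <= r + sign * dr < rows and 0 <= c + sign * dc < cols]
            row.append(pick(vals))  # offset (0, 0) keeps vals non-empty
        out.append(row)
    return out


def erode(img, offsets):
    return _apply(img, offsets, +1, min)


def dilate(img, offsets):
    return _apply(img, offsets, -1, max)


def opening(img, offsets):
    return dilate(erode(img, offsets), offsets)


def closing(img, offsets):
    return erode(dilate(img, offsets), offsets)


def pointwise(pick, images):
    """Pixel-wise max or min over a list of equally sized images."""
    return [[pick(vals) for vals in zip(*rows)] for rows in zip(*images)]


def asf(img, n_max):
    """psi_n = beta_n alpha_n ... beta_1 alpha_1, eqs. (13)-(15)."""
    out = img
    for n in range(1, n_max + 1):
        out = pointwise(max, [opening(out, segment(n, d)) for d in DIRECTIONS])
        out = pointwise(min, [closing(out, segment(n, d)) for d in DIRECTIONS])
    return out


# A horizontal line plus one isolated noisy pixel on a 7 x 7 background.
clean = [[1 if r == 3 else 0 for _ in range(7)] for r in range(7)]
noisy = [row[:] for row in clean]
noisy[1][1] = 1
print(asf(noisy, 1) == clean)
```

In this toy case the isolated pixel fits none of the oriented segments and is removed by the openings, whereas the line survives because it contains the horizontal segment, which is the line-preserving behavior described above.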


Fig. 6. Binary markers extraction for morphological modified image gradient calculation.

3.1.2. Morphological Modified Image Gradient Extraction:

The next step is to extract binary markers of the image, which is a part of the Watershed transformation in image segmentation problems [24]. These markers $m$ are used for the extraction of the modified image gradient $G_m$, through a geodesic erosion reconstruction of the image gradient $G$; the procedure is based on successive morphological conditional erosions of the markers, so that they constitute the only local minima of the image gradient:

$$G_m = \lim_{k \to \infty} \left[ (m \ominus B) \vee G \right]^{(k)} \tag{16}$$

where $B$ is a symmetric structuring element of radius 1, $k$ is the number of successive operations, and "$\ominus$" and "$\vee$" denote the flat erosion and supremum operations, respectively [25].

More specifically, a constant $h$ is subtracted from the initial image $I$ (Fig. 6) and a morphological procedure called "grayscale reconstruction opening" is applied to the initial and the resulting image; what actually happens is the expansion of the intensity function $I - h$ until it reaches the function $I$. The obtained image is subtracted from the initial one, $I$, and the result is an image $J$ that contains only the local maxima of $I$. In order to preserve only the most important maxima of $I$, a threshold, usually equal to $h/2$, is used, and the resulting image consists of the binary markers of the image $I$, representing its most important maxima. The same procedure is followed to extract the binary markers that represent the most important minima of the image. Finally, after combining the binary marker images into one, a geodesic erosion reconstruction of the image gradient $G$ is computed according to eq. (16), as illustrated in Fig. 7.
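The marker extraction described above can be sketched on a one-dimensional intensity profile. The following Python sketch is only an illustration: it works on a 1D list instead of an image, uses a 3-sample neighborhood as the structuring element, and the function names (`reconstruct_by_dilation`, `maxima_markers`) are assumptions of this example.

```python
def dilate1d(f):
    """Flat dilation with a 3-sample window (radius 1), clipped at the borders."""
    return [max(f[max(i - 1, 0):i + 2]) for i in range(len(f))]


def reconstruct_by_dilation(marker, mask):
    """Expand `marker` under `mask` until it no longer changes."""
    current = list(marker)
    while True:
        grown = [min(d, m) for d, m in zip(dilate1d(current), mask)]
        if grown == current:
            return grown
        current = grown


def maxima_markers(signal, h):
    """Binary markers of the most important maxima of `signal`.

    The profile lowered by h is expanded until it reaches the original one;
    the residue J = I - reconstruction is then thresholded at h / 2.
    """
    lowered = [v - h for v in signal]
    reconstruction = reconstruct_by_dilation(lowered, signal)
    residue = [v - r for v, r in zip(signal, reconstruction)]  # image J
    return [j >= h / 2 for j in residue]


# Two pronounced peaks (0.5 and 0.9) and a small bump (0.15), intensities in [0, 1].
profile = [0.1, 0.5, 0.1, 0.15, 0.1, 0.9, 0.1]
print(maxima_markers(profile, h=0.2))
```

With $h = 0.2$ only the two pronounced peaks exceed the $h/2$ threshold, while the small bump is discarded. Markers of the most important minima could be obtained, under the same assumptions, by applying the function to the negated profile.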


Fig. 7. Geodesic erosion reconstruction of the image gradient with the extracted binary markers, resulting in the modified image gradient.
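The iteration of eq. (16) can be illustrated on a one-dimensional gradient profile. The sketch below is a simplified reading of the procedure and not the authors' implementation: the marker function is assumed to equal the gradient at marker positions and the maximum gradient value elsewhere, the structuring element is a 3-sample window, and the name `impose_minima` is hypothetical.

```python
def erode1d(f):
    """Flat erosion with a 3-sample window (radius 1), clipped at the borders."""
    return [min(f[max(i - 1, 0):i + 2]) for i in range(len(f))]


def impose_minima(gradient, markers):
    """Iterate f <- (f eroded by B) v G until stability, as in eq. (16)."""
    top = max(gradient)
    # Assumed marker function: gradient value at markers, maximum elsewhere.
    current = [g if is_marker else top for g, is_marker in zip(gradient, markers)]
    while True:
        updated = [max(e, g) for e, g in zip(erode1d(current), gradient)]
        if updated == current:
            return updated
        current = updated


# Gradient profile with three local minima; only the one at index 3 is marked.
G = [4, 1, 3, 0, 2, 1, 4]
m = [False, False, False, True, False, False, False]
print(impose_minima(G, m))  # [4, 3, 3, 0, 2, 2, 4]
```

In the result, the unmarked minima at indices 1 and 5 are filled up to the level of their lowest surrounding pass, so that the marked position remains the only local minimum of the modified profile.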

Regarding the constant $h$, a reasonable choice is $0.1 \le h \le 0.3$, considering the image intensity function normalized in the interval [0, 1]. This constant is only used for the extraction of the binary markers, representing the most important maxima and minima of the examined image, and it can be experimentally verified that the choice of $h$ is not crucial for the results of the geodesic erosion reconstruction, which follows the binary markers extraction.

Fig. 8 illustrates the differences between the two image-data criteria $|\nabla G_\sigma * I(x_i, y_i)|$ and $G_m$, presented in eqs. (4) and (12). It can be seen in Figs. 8(b,c) that the proposed procedure clearly suppresses noise and retains the most important edges of the examined image, whereas Figs. 8(d,e) illustrate the difference between the image gradient and the proposed modified gradient distributions, computed along a randomly selected image row. Since only the most important image regions are preserved in the proposed modified gradient, some details inside the object region are eliminated, especially when pre-smoothing is applied. This is of no practical importance, even if some regions are merged together, because even in such cases the snake internal energy does not allow great local deformations of the moving object contour, as will be explained in the proposed tracking approach.

Fig. 8. Differences between the image-data criterion using the image gradient and the proposed one: (a) original image, (b) image gradient and (c) modified image gradient, (d) image gradient computed along a randomly chosen row shown in (a), and (e) modified image gradient computed along the same row.

4. The Proposed Tracking Approach

This Section addresses tracking of moving contours in a video stream. Specifically, given a sequence of images in which a known object of interest is in motion, the goal is to track the object's silhouette across time-varying images. A variety of applications take advantage of object contour tracking: video analysis, crowd surveillance, human-computer interaction, robotics, automated analysis of human organ performance, as well as standardization issues related to MPEG-4 and MPEG-7. Besides, tracking provides a good basis for high-level computer vision tasks like 3D reconstruction and 3D representation.

During the last decade, a great variety of tracking algorithms have been proposed. They can be divided into two main classes: (a) the motion-based approaches that rely on grouping motion information over time [27] and (b) the model-based approaches that impose high-level semantic representation and knowledge [11,20]. In both cases, tracking is performed utilizing geometrical or region-based properties of the tracked object. In this context, two approaches are mainly followed: boundary-based (edge-based) methods [16,19,26], which rely on the information provided by object boundaries, and region-based ones [27,2], which rely on the information provided by the interior area of the tracked object.

The main issues that tracking approaches are called upon to cope with are the following: (a) non-rigid (deformable) moving objects, (b) moving objects with a complicated (not smooth) contour, especially for boundary-based methods, (c) object movements which are not simple translations, but also include rotations and motion towards the camera capturing the sequence, (d) sequences with complex background (a common case in natural sequences), and (e) sequences with strong temporal clutter and external lighting changes. The latter has motivated many researchers, especially in recent years, to follow probabilistic approaches, e.g. [34]. In the following we describe the proposed approach, which aims to cope with the above mentioned problems, and especially with the presence of clutter and changes in external conditions.

The proposed method mainly consists of two steps: the extraction of the "uncertainty regions" of each object in a sequence, and the estimation of the mobile object contours. The term "uncertainty regions" describes the regions in a frame where moving contours may be located, whereas the estimation of the contours consists of an energy minimization procedure based on the proposed snake energy terms, described in Section 3. More specifically, the contour of a moving object is estimated first in a few successive frames of a sequence; this can be achieved with appropriate parameter initialization utilizing the proposed snake model. Then, for the next frames, a force-based approach is followed to minimize the total snake energy inside the respective uncertainty regions, which are extracted using the motion history of each point of the contour. The force-based approach is adopted as an alternative to direct energy minimization, to reduce computational complexity.

Given the proposed snake model presented in Section 3, the first step is to extract the uncertainty regions around the snake initialization, using the motion history of the tracked contour (the displacements of the curve points at previous time instances): the previously estimated contour is deformed according to the previously calculated point movements, to obtain the snake initialization for the next frame, and the standard deviation of each point's mean motion is calculated; the uncertainty region around each initialized point is then defined in terms of its corresponding standard deviation. The next step is to oscillate each point of the curve inside its corresponding uncertainty region, until it reaches the minimum of a criterion, which is defined by the snake energy terms described above. As will be explained, the direct energy minimization procedure is of high complexity, even if constrained to a narrow band around the snake initialization, and thus we convert the energy terms into respective forces, applied on the curve in order to make it converge to the moving object boundaries. It should be mentioned that this force-based approach is followed in a steepest descent manner and does not ensure that the final position of the snake corresponds to the global minimum of its energy, but it provides a good approximation of it.
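The first step described above (initializing the snake for the next frame and attaching an uncertainty region to each point) can be sketched as follows. This is one possible reading, not the authors' implementation: contour points are stored as complex numbers, the initialization adds the mean of the recorded displacements to the last position, and the uncertainty region of a point is summarized by the population standard deviation of its horizontal and vertical displacements. The function name and the data layout are assumptions of this example.

```python
from statistics import mean, pstdev


def predict_and_uncertainty(last_contour, displacement_history):
    """Return (initialization, radii) for the next frame.

    last_contour         -- list of complex points x + 1j*y of the last contour
    displacement_history -- list over past frames; each entry is a list with
                            one complex displacement per contour point
    radii                -- per point, (std of x-motion, std of y-motion),
                            an assumed summary of the uncertainty region
    """
    initialization, radii = [], []
    for k, position in enumerate(last_contour):
        steps = [frame[k] for frame in displacement_history]
        xs = [s.real for s in steps]
        ys = [s.imag for s in steps]
        initialization.append(position + complex(mean(xs), mean(ys)))
        radii.append((pstdev(xs), pstdev(ys)))
    return initialization, radii


# Two contour points observed over two past frames.
contour = [0 + 0j, 10 + 5j]
history = [[1 + 0j, 0 + 0j],
           [1 + 0j, 2 + 0j]]
init, radii = predict_and_uncertainty(contour, history)
print(init, radii)
```

A point that has moved steadily gets a narrow region, whereas a point with irregular motion gets a wider one, so the subsequent search for the energy minimum is confined to a band whose width follows the observed motion variability.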

4.1. Snake Points Motion Estimation

In order to extract the snake initialization and the respective uncertainty regions around it for each next frame of an examined video sequence, as briefly described above, we estimate the snake motion using an appropriate existing technique. The correct extraction of moving edges, in terms of position and direction, is important and aids the accurate estimation of an object’s position from the current to the next frame [38]. Several existing techniques cope adequately with the difficult problem of optical flow recovery, provided that their assumptions hold. The challenge is to achieve high robustness against the strong assumption violations commonly met in real sequences. We adopt the motion estimation technique proposed by Black et al. [3] as an efficient tool for overcoming these violations. They reformulate the objective function, which consists of the optical flow equation and the spatial coherence constraint, so as to include robust statistics tools [14] in an almost straightforward way: they take the standard least-squares formulation of optical flow and use a robust estimator instead of the quadratic one. This approximation is then minimized using a coarse-to-fine (multiresolution) simultaneous over-relaxation technique. The reformulation results in an area-based regression technique that is robust to multiple motions due to occlusion, transparency or specular reflections, and compensates for over-smoothing and noise sensitivity.

Let us consider the contour of a moving object, located in the I-th frame (I > 1) of a video sequence, as a vector of complex numbers, i.e.,

$$C^{(I)} = \left[\, x^{(I)}_i + j \cdot y^{(I)}_i \;\middle|\; i = 1, \ldots, N \,\right] = \left[\, C^{(I)}_{(1)}, \ldots, C^{(I)}_{(N)} \,\right], \tag{17}$$

where $C^{(I)}_{(k)} = x^{(I)}_k + j \cdot y^{(I)}_k$ is the location of the k-th point of the contour, and $(x_k, y_k)$ is the respective position on the image plane. We define the instant motion of the k-th point of the object contour, computed in the I-th frame, as:

$$m^{(I)}_{c,k} = MF^{(I-1,I)}(x_k, y_k), \tag{18}$$

where $MF^{(I-1,I)}(x_k, y_k)$ is the motion vector of the point $(x_k, y_k)$ estimated with the adopted technique between the successive frames I − 1 and I. Thus, the snake instant motion between these frames is

$$m^{(I)}_{c} = \left[\, MF^{(I-1,I)}(x_i, y_i) \;\middle|\; i = 1, \ldots, N \,\right] = \left[\, m^{(I)}_{c,1}, \ldots, m^{(I)}_{c,N} \,\right]. \tag{19}$$
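The sampling step of eqs. (18) and (19) can be sketched as follows. This is only an illustration: it assumes that a dense motion field between frames I − 1 and I is already available as an array (the robust estimator of [3] is not implemented here), and the function name `instant_motion` and the nearest-pixel lookup are choices made for this sketch, not part of the original method.

```python
import numpy as np

def instant_motion(flow, contour):
    """Sample a dense motion field at the contour points (eqs. (18)-(19)).

    flow    : array (H, W, 2); flow[y, x] = (dx, dy) between frames I-1 and I
              (assumed to come from some optical flow estimator)
    contour : complex array (N,), one x + j*y entry per point (eq. (17))
    returns : complex array (N,), one dx + j*dy entry per point
    """
    h, w = flow.shape[:2]
    # Nearest-pixel lookup, clipped to the image borders (a simplification).
    xs = np.clip(np.rint(contour.real).astype(int), 0, w - 1)
    ys = np.clip(np.rint(contour.imag).astype(int), 0, h - 1)
    return flow[ys, xs, 0] + 1j * flow[ys, xs, 1]

# Toy example: a uniform motion of (+2, -1) pixels everywhere.
flow = np.zeros((4, 5, 2))
flow[..., 0] = 2.0
flow[..., 1] = -1.0
contour = np.array([1 + 1j, 3 + 2j])
m = instant_motion(flow, contour)
```

Storing each displacement as a complex number keeps the result in the same representation as the contour of eq. (17), so the two can be added directly.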

4.2. Uncertainty Regions Extraction

The minimization of the snake’s total energy is actually a problem of picking out the “correct” curve in the image, i.e., the curve which corresponds to the object of interest among a set of candidate curves, given an initial estimate of the object’s contour. In this subsection we propose a way to determine a region/band around the snake initialization, for each frame of a video sequence, in which the correct curve is located. This idea is not new, as stochastic models have lately been proposed in the literature, mostly as shape prior knowledge [35], to define possible positions of the curve points around an initialization. In the same direction, we introduce here the term “uncertainty region”, which denotes that the minimization procedure (or the picking out of the correct curve) takes place inside that region, constraining the problem to a narrow band around the snake initialization. Such regions are extracted from statistical measurements of the motion history of the tracked contour, which is obtained using the technique described in the previous subsection.

Based on the definitions of eqs. (17), (18) and (19), given in the previous subsection, we define the mean movement of contour C up to frame I as:

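$$\bar{m}^{(I)}_{c} = \left[\, \bar{m}^{(I)}_{c,1}, \ldots, \bar{m}^{(I)}_{c,N} \,\right], \tag{20}$$

where

$$\bar{m}^{(I)}_{c,k} = \bar{m}^{(I)}_{x,k} + j \cdot \bar{m}^{(I)}_{y,k} = \frac{1}{I-1} \sum_{i=1}^{I-1} m^{(i+1)}_{c,k} \tag{21}$$

is the corresponding mean movement of the k-th point of the contour. Similarly, the standard deviation of the contour’s mean movement is defined as:

$$s^{(I)}_{c} = \left[\, s^{(I)}_{c,1}, \ldots, s^{(I)}_{c,N} \,\right], \tag{22}$$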


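where

$$s^{(I)}_{c,k} = \frac{1}{I-1} \left\{ \sqrt{\sum_{i=1}^{I-1} \left( \bar{m}^{(I)}_{x,k} - m^{(i+1)}_{x,k} \right)^2} + j \cdot \sqrt{\sum_{i=1}^{I-1} \left( \bar{m}^{(I)}_{y,k} - m^{(i+1)}_{y,k} \right)^2} \right\} \tag{23}$$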

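is the standard deviation of the k-th point’s mean movement.

In practice, eqs. (21) and (23) are computed over the last L frames, so as to take into account only the recent history of the contour’s movement, i.e.,

$$\bar{m}^{(I)}_{c,k} = \frac{1}{L} \sum_{i=I-L}^{I-1} m^{(i+1)}_{c,k} \tag{24}$$

$$s^{(I)}_{c,k} = \frac{1}{L} \left\{ \sqrt{\sum_{i=I-L}^{I-1} \left( \bar{m}^{(I)}_{x,k} - m^{(i+1)}_{x,k} \right)^2} + j \cdot \sqrt{\sum_{i=I-L}^{I-1} \left( \bar{m}^{(I)}_{y,k} - m^{(i+1)}_{y,k} \right)^2} \right\} \tag{25}$$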
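A minimal sketch of the windowed statistics of eqs. (24) and (25) is given below. It assumes that the instant motions of all contour points are stacked row by row in a complex array; the name `motion_statistics` and this storage layout are hypothetical. The 1/L factor is applied outside the square root, exactly as the equations are printed, which differs from the more common definition of a standard deviation.

```python
import numpy as np

def motion_statistics(history, L):
    """Windowed mean and spread of the per-point motion (eqs. (24)-(25)).

    history : complex array (T, N); each row holds the instant motion of the
              N contour points for one frame transition, most recent last
    L       : number of most recent transitions taken into account
    """
    recent = history[-L:]          # fewer than L rows are used if T < L
    n = recent.shape[0]
    mean = recent.mean(axis=0)     # eq. (24)
    dev = recent - mean
    # eq. (25) as printed: (1/L) * sqrt(sum of squared deviations), per axis
    sx = np.sqrt((dev.real ** 2).sum(axis=0)) / n
    sy = np.sqrt((dev.imag ** 2).sum(axis=0)) / n
    return mean, sx + 1j * sy

# Point 0 moves with constant velocity, point 1 oscillates along x.
history = np.array([[1, 2], [1, -2], [1, 2]], dtype=complex)
mean, s = motion_statistics(history, L=3)
```

In this toy case the spread of the constant-velocity point is zero, while the oscillating point gets a non-zero spread along x only.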

The initial estimation (initialization) of the object’s contour $C^{(I+1)}_{init}$ in the next frame I + 1 is computed based on the contour’s current location and its mean motion, i.e.,

$$C^{(I+1)}_{init} = C^{(I)} + \bar{m}^{(I)}_{c} = \left[\, C^{(I+1)}_{init(1)}, \ldots, C^{(I+1)}_{init(N)} \,\right], \tag{26}$$

while its exact location $C^{(I+1)} = [\, C^{(I+1)}_{(k)} \mid k = 1, \ldots, N \,]$ is obtained by solving the following equations:

$$C^{(I+1)} = \arg\min_{r \in R} \left[\, w_1 \cdot \left( D^{(r)}_{K} + D^{(r)}_{DV} \right) + w_2 \cdot E^{(r)}_{ext} \,\right], \tag{27}$$

$$D^{(r)}_{K} = \sum_{k=1}^{N} \left[\, K_{(C^{(I)})}(k) - K_{(r)}(k) \,\right]^2, \tag{28}$$

$$D^{(r)}_{DV} = \sum_{k=1}^{N} \left[\, DV_{(C^{(I)})}(k) - DV_{(r)}(k) \,\right]^2, \tag{29}$$

$$E^{(r)}_{ext} = \sum_{k=1}^{N} \left[\, e^{(r)}_{ext}(k) \,\right], \tag{30}$$

where $K_{(C^{(I)})}(k)$, $K_{(r)}(k)$ and $DV_{(C^{(I)})}(k)$, $DV_{(r)}(k)$ are the curvature and the point density values of the contour $C^{(I)}$ and the curve $r \in R$ at the k-th point, respectively. Parameters $w_1$ and $w_2$ are the weights with which the energy-based terms of eq. (27) participate in the minimization procedure.

R is a set of curves created by the stochastic process:

$$Q = C^{(I+1)}_{init} + z, \tag{31}$$

where $z \in Z^N$ is a vector of random complex variables, each of which follows a zero-mean Gaussian distribution. In particular:

$$z_k \sim g\left(0, s^{(I)}_{c,k}\right) \tag{32}$$

Fig. 9. Point oscillation following a Gaussian distribution defined by its motion history.

The set of all possible curves R emerges by oscillating the points of the curve $C^{(I+1)}_{init}$ according to the standard deviation from their mean movement, as shown in Fig. 9, where the x-axis represents the displacement from the point’s current position and the y-axis represents the respective probabilities. The Gaussian formulation for the point oscillations is mainly adopted to express that each point of the curve is likely to move in the same way (amplitude and direction) that it has been moving until the current frame, rather than in a much different way. In this way, an uncertainty region is defined for each contour point k. If $C^{(I)}_{(k)}$ is the location of the k-th point of the contour in frame I and this point was static during the previous L frames, then $s^{(I)}_{c,k} = 0$ and its uncertainty region shrinks to a single point whose location coincides with $C^{(I+1)}_{init(k)}$. If point k was moving with constant velocity, then the standard deviation of its movement is again $s^{(I)}_{c,k} = 0$ and the previous case holds regarding its uncertainty region. On the other hand, if point k was oscillating in the previous L frames, the standard deviation of its movement is high and consequently its uncertainty region is wide.

Fig. 10 illustrates the proposed approach in steps, in the case of car tracking. Figs. 10(a) and (b) present two successive frames, I and I + 1, of a car sequence; in (a) the car contour $C^{(I)}$ is shown and in (b) the snake initialization $C^{(I+1)}_{init}$ is estimated according to eq. (26). Fig. 10(c) illustrates the amplitude ($3 s^{(I)}_{c,k}$) and the direction (normal to the snake initialization) of the oscillation of each point k. In Figs. 10(d) and (e) the snake’s uncertainty regions are extracted, and in Fig. 10(f) the object contour $C^{(I+1)}$ is estimated, after the minimization of eq. (27).

However, there is a case where the extraction of the uncertainty regions may lead to false results in the localization of the possible positions of the object contour in the next frame. This problem arises when rapid changes occur in the examined scene, e.g. when an object remains static for a large number of frames (≥ L) and suddenly starts moving, or when a moving object rapidly accelerates. This problem could be solved using higher-order statistics, which is left for future work.

Fig. 10. Proposed tracking method in steps: (a) object contour in the previous frame, (b) snake initialization in the next frame, (c) amplitude and direction of snake point oscillations, (d) uncertainty region boundaries, (e) extracted uncertainty region and (f) final solution (object contour in the current frame).
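The generation of candidate curves in eqs. (26), (31) and (32) can be illustrated with the short sketch below. It is a simplified reading rather than the authors’ implementation: the real and imaginary parts of the spread are treated as independent standard deviations along x and y, whereas Fig. 10(c) restricts the oscillation to the normal direction with amplitude three times the spread. The function name `candidate_curves` and the fixed random seed are assumptions of this sketch.

```python
import numpy as np

def candidate_curves(contour, mean_motion, spread, n_curves, seed=0):
    """Draw candidate curves around the snake initialization.

    contour     : complex array (N,), contour in frame I
    mean_motion : complex array (N,), mean motion per point (eq. (24))
    spread      : complex array (N,), per-axis spread per point (eq. (25))
    returns     : (init, curves) with curves of shape (n_curves, N)
    """
    rng = np.random.default_rng(seed)
    init = contour + mean_motion                          # eq. (26)
    n = contour.shape[0]
    # Zero-mean Gaussian offsets per point and axis (eq. (32)).
    zx = rng.normal(0.0, spread.real, size=(n_curves, n))
    zy = rng.normal(0.0, spread.imag, size=(n_curves, n))
    return init, init + (zx + 1j * zy)                    # eq. (31)

contour = np.array([0 + 0j, 10 + 0j])
mean_motion = np.array([1 + 1j, 1 + 1j])
spread = np.array([0 + 0j, 2 + 2j])   # point 0 has no motion uncertainty
init, curves = candidate_curves(contour, mean_motion, spread, n_curves=50)
```

A point with zero spread stays at its initialization in every candidate, which mirrors the static and constant-velocity cases discussed above.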

4.3. Force-Based Approach

The minimization of eq. (27) is a procedure of high complexity: if N is the number of points forming the examined curve C and M is the number of all possible positions of each curve point $C^{(I+1)}_{init(k)}$ inside the extracted uncertainty region U, assuming that M is the same for all points, then the number of all possible curves $r \in R$ generated by the points’ oscillations is $M^N$. In order to avoid that problem, and to avoid re-parametrization along the snake, i.e. re-arrangement of the snake points along the curve so as to establish point correspondences between successive frames, we propose a force-based approach instead of a dynamic programming algorithm: the energy terms participating in the snake energy function are transformed into forces applied to each curve point, so that the curve converges to the desired object boundaries.

Let us consider the initialization of the object’s contour in frame I + 1, i.e. $C^{(I+1)}_{init} = [\, C^{(I+1)}_{init(k)} \mid k = 1, \ldots, N \,]$. Let also $t^{(I+1)}$ be the set of the tangential unit vectors and $n^{(I+1)}$ the set of the normal vectors of curve $C^{(I+1)}_{init}$, given by eqs. (33):

$$t^{(I+1)} = \left[\, t^{(I+1)}_k \;\middle|\; k = 1, \ldots, N \,\right], \qquad n^{(I+1)} = \left[\, n^{(I+1)}_k \;\middle|\; k = 1, \ldots, N \,\right],$$

$$t^{(I+1)}_k = \frac{\nabla C^{(I+1)}_{init}\big|_k}{\left\| \nabla C^{(I+1)}_{init}\big|_k \right\|}, \qquad n^{(I+1)}_k = \frac{\nabla t^{(I+1)}\big|_k}{\left\| \nabla t^{(I+1)}\big|_k \right\|} \tag{33}$$

Fig. 11. Curvature-based and point density-based (internal) forces $F_c$ and $F_d$, respectively, along the initialization of a curve C in frame I + 1.

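We define the following forces at each point k:

$$F_d(k) = \left[\, DV\!\left(C^{(I)}_{(k)}\right) - DV\!\left(C^{(I+1)}_{init(k)}\right) \right] \cdot t^{(I+1)}_k, \qquad F_c(k) = \left[\, K\!\left(C^{(I)}_{(k)}\right) - K\!\left(C^{(I+1)}_{init(k)}\right) \right] \cdot n^{(I+1)}_k \tag{34}$$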
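As a rough illustration of eq. (34), the sketch below computes both internal forces for closed curves stored as complex arrays. Several ingredients are assumptions of this sketch, since the exact discretizations belong to Section 2: the point density at point k is taken as the distance to point k + 1, the curvature is a finite-difference estimate, and the normal is obtained by rotating the tangent by 90 degrees instead of differentiating it as in eqs. (33). All function names are hypothetical.

```python
import numpy as np

def tangents(curve):
    """Unit tangents of a closed curve by central differences."""
    d1 = (np.roll(curve, -1) - np.roll(curve, 1)) / 2.0
    return d1 / np.maximum(np.abs(d1), 1e-12)

def point_density(curve):
    """Assumed DV: distance from each point to the next one."""
    return np.abs(np.roll(curve, -1) - curve)

def curvature(curve):
    """Assumed K: signed finite-difference curvature of a closed curve."""
    d1 = (np.roll(curve, -1) - np.roll(curve, 1)) / 2.0
    d2 = np.roll(curve, -1) - 2.0 * curve + np.roll(curve, 1)
    return (np.conj(d1) * d2).imag / np.maximum(np.abs(d1), 1e-12) ** 3

def internal_forces(prev, init):
    """F_d and F_c of eq. (34) for the previous contour and the initialization."""
    t = tangents(init)
    n = 1j * t                     # tangent rotated by 90 degrees (assumption)
    f_d = (point_density(prev) - point_density(init)) * t
    f_c = (curvature(prev) - curvature(init)) * n
    return f_d, f_c

theta = np.linspace(0.0, 2.0 * np.pi, 32, endpoint=False)
init = np.exp(1j * theta)          # unit circle as initialization
prev = 2.0 * init                  # previous contour with wider point spacing
f_d, f_c = internal_forces(prev, init)
```

When the two curves coincide both forces vanish, and when the previous contour has the wider point spacing the tangential component of $F_d$ is positive.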

$F_d = [\, F_d(k) \mid k = 1, \ldots, N \,]$ represents the stretching component that forces points to come closer to or draw away from each other along the curve, and it is always tangential to it. Thus, if the distance between two curve points $C^{(I)}_{(k)}$ and $C^{(I)}_{(k+1)}$ is greater than the distance between $C^{(I+1)}_{init(k)}$ and $C^{(I+1)}_{init(k+1)}$, then $F_d(k) \cdot t^{(I+1)}_k > 0$ and the k-th point of $C^{(I+1)}_{init}$ tends to draw away from the (k+1)-th one; otherwise, $F_d(k) \cdot t^{(I+1)}_k < 0$ and the k-th point is forced to come closer to the (k+1)-th one.

$F_c = [\, F_c(k) \mid k = 1, \ldots, N \,]$ represents the deformation of the curve along its normal direction. Because the curvature distribution takes low values where the curve is relatively smooth and high values where the curve has strong variations, $F_c$ forces the curve towards its initial shape (the one in the previous frame) and not towards a smoother form. Moreover, we exploit the property of the curvature to be positive where the curve is convex and negative where it is concave. Fig. 11 illustrates the directions of $F_c$ and $F_d$ along a curve.

These forces represent the internal snake forces that deform the curve $C^{(I+1)}_{init}$ according to the shape of the contour $C^{(I)}$ in the previous frame. The constraint on such a deformation is the external energy $E_{ext}$, which is transformed into a force as described in the following.

Given the snake’s uncertainty region U, which depends on the standard deviation of the curve’s mean movement, for each point k we define a function $g_{m,k}(p)$, consisting of all pixels $p = x_p + j \cdot y_p$ of the modified image gradient $G_m$ inside the uncertainty region U that lie on the line segment defined by the normal direction of the curve $C^{(I+1)}_{init}$ at point k:

$$g_{m,k}(p) = \left[\, G_m(p) \;\middle|\; \left( C^{(I+1)}_{init(k)} - p \right)^{T} \cdot n^{(I+1)}_k = 1,\; p \in U \,\right] \tag{35}$$

The maximum of this function determines the most salient edge pixel on the line segment defined above and thus defines the direction of the external snake force:

$$p_k = \arg\max_{p} \left[\, g_{m,k}(p) \,\right], \tag{36}$$

$$\mathrm{sgn}_k = \begin{cases} +, & \text{if } p_k \text{ is inside the area defined by } C^{(I+1)}_{init} \\ -, & \text{otherwise,} \end{cases} \tag{37}$$

where $\mathrm{sgn}_k$ denotes the sign/direction of the external force to be applied to the k-th point. Then, the external snake force for each point k is given by:

$$F_e(k) = \mathrm{sgn}_k \cdot e_{ext}(k) \cdot n^{(I+1)}_k \tag{38}$$

From the definition of the external energy term $E_{ext} = [\, e_{ext}(k) \mid k = 1, \ldots, N \,]$ it can be seen that it takes values close to zero in regions of high image gradient ($G_m^2(k) \simeq 1$) and values close to unity in regions with relatively constant intensity (smooth regions, $G_m^2(k) \simeq 0$). Thus, the term $F_e = [\, F_e(k) \mid k = 1, \ldots, N \,]$ is proportional to $G_m$ and forces the curve to the salient edges inside the extracted uncertainty region. In the definition of this force we exploit the advantage of $G_m$ over $|\nabla G_\sigma * I|$ in preserving the most important edges, as shown before, and thus the problem of the existence of many local maxima in eq. (36) is eliminated.

In the force-based approach described above, the minimization of eq. (27) can be approximated using the internal and external snake forces in an iterative manner similar to the steepest descent approach [12], as summarized below. In particular, let $C^{(I+1),(\xi)}$ be the estimated contour in the ξ-th iteration; then, with the initial condition $C^{(I+1),(0)} = C^{(I+1)}_{init}$, the following equations hold:

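$$C^{(I+1),(\xi)} = C^{(I+1),(\xi-1)} + \Delta C^{(\xi)} \tag{39}$$

$$\Delta C^{(\xi)} = \left[\, C^{(I+1),(\xi-1)}_{(k)} \cdot F^{(\xi-1)}_{tot}(k) \;\middle|\; k = 1, \ldots, N \,\right] \tag{40}$$

$$F^{(\xi-1)}_{tot}(k) = w_1 \cdot \left( F^{(\xi-1)}_{c}(k) + F^{(\xi-1)}_{d}(k) \right) + w_2 \cdot F^{(\xi-1)}_{e}(k), \tag{41}$$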

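where $F^{(\xi-1)}_{d}(k)$, $F^{(\xi-1)}_{c}(k)$ and $F^{(\xi-1)}_{e}(k)$ are estimated according to eqs. (34) and (38) respectively, on the basis of $C^{(I+1),(\xi-1)}$ (instead of $C^{(I+1)}_{init}$). The final contour $C^{(I+1)}$ is obtained when one of the following criteria is satisfied:

(a) $F^{(\xi)}_{\tau} < a \cdot F^{(\xi+1)}_{\tau}$, where

$$F^{(\xi)}_{\tau} = w_1 \cdot \left( \left\| \sum_{k=1}^{N} F^{(\xi)}_{c}(k) \right\| + \left\| \sum_{k=1}^{N} F^{(\xi)}_{d}(k) \right\| \right) + w_2 \cdot \left\| \sum_{k=1}^{N} F^{(\xi)}_{e}(k) \right\| \tag{42}$$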


Here a is a positive constant in the range 0 < a < 1. When a is selected to be close to one, $C^{(I+1)}$ is more likely to correspond to a local minimum solution; lower values of a increase the number of iterations and, therefore, the execution time. The statistical approach we follow to estimate the regions of uncertainty allows for the use of a close to 1.

(b) The maximum number of iterations is reached. In this case

$$C^{(I+1)} = C^{(I+1),(\xi)}, \qquad \xi = \arg\min_{\xi} \left[\, F^{(\xi)}_{\tau} \,\right] \tag{43}$$

It must be noted that the use of the proposed steepest descent approach does not ensure that the final contour corresponds to the solution of eq. (27). However, under the constraints we pose, even if $C^{(I+1)}$ corresponds to a local minimum, it is close to the desired solution (global minimum).
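The iteration of eqs. (39) to (43) can be sketched as the loop below. This is an illustration under stated assumptions, not the authors’ implementation: the forces are supplied by a user-defined callable, the update is a plain additive step with a hypothetical `step` parameter rather than the exact form of eq. (40), and when criterion (a) fires the current iterate is returned. The toy force used in the example simply pulls every point towards a known target.

```python
import numpy as np

def evolve(init, forces, w1, w2, a=0.9, step=1.0, max_iter=100):
    """Force-based curve evolution with stopping criteria (a) and (b).

    forces(curve) must return (f_c, f_d, f_e), complex arrays of length N.
    """
    def total(curve):
        f_c, f_d, f_e = forces(curve)
        f_tot = w1 * (f_c + f_d) + w2 * f_e                       # eq. (41)
        f_tau = (w1 * (abs(f_c.sum()) + abs(f_d.sum()))
                 + w2 * abs(f_e.sum()))                           # eq. (42)
        return f_tot, f_tau

    curve = np.asarray(init, dtype=complex)
    f_tot, f_tau = total(curve)
    best_curve, best_tau = curve, f_tau
    for _ in range(max_iter):
        nxt = curve + step * f_tot          # additive update (assumption)
        nxt_tot, nxt_tau = total(nxt)
        if f_tau < a * nxt_tau:             # criterion (a)
            return curve
        curve, f_tot, f_tau = nxt, nxt_tot, nxt_tau
        if f_tau < best_tau:
            best_curve, best_tau = curve, f_tau
    return best_curve                       # criterion (b), eq. (43)

# Toy forces: no internal forces, external force pulls towards a target.
init = np.array([0, 1, 1 + 1j, 1j], dtype=complex)
target = init + 3
pull = lambda c: (np.zeros_like(c), np.zeros_like(c), 0.5 * (target - c))
result = evolve(init, pull, w1=1.0, w2=1.0)
```

With these toy forces the total force halves at every iteration, so criterion (a) never fires and the loop ends through criterion (b) with the curve essentially on the target.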

4.4. Implementation Issues

In this subsection, some clarifications about the proposed approach are given, so as to make its implementation straightforward.

Time window L: As shown in eqs. (24) and (25), the mean motion and standard deviation of each point of the tracked contour are computed in order to extract the uncertainty regions. If I is the current frame of an examined sequence, then the contour of the desired object in the previous L successive frames is needed in order to calculate the L − 1 displacements of each point of the curve (eq. (18)). The frame window L is usually set to a value between 5 and 15 frames, according to the number of available frames. For a given number L of past frames, if the movement of the desired object is rapid, the extracted uncertainty regions are large due to the high standard deviation of its motion; otherwise, if the object movement is slow, the extracted uncertainty regions are small and the new positions of the curve points do not vary much from the previously estimated ones.

First L frames: Regarding the first L frames of the examined sequence, for the estimation of the snake initialization, the mean motion of each point of the snake is calculated according to eq. (21), given the motion fields estimated in the past (≤ L) available frames. Regarding the snake initialization in the second frame of the sequence, and assuming that the examined object contour is known for the first one, the mean motion is actually the motion between these two frames. In the same direction, and to ensure that the extracted uncertainty regions contain the solutions, the motion standard deviation is directly (manually) initialized to a relatively high value (the same for all points of the curve), and consequently the corresponding uncertainty regions extend widely around the snake initializations. This may result in inaccurate contours, mainly due to some salient background edges close to object boundaries, which remain after the image pre-filtering and the morphological operations described in subsection 3.1 for the extraction of the modified image gradient. This problem occurs when the motion history information is poor, but it is overcome as the motion history gets “enriched” over time.


Weights [w1, w2]: In eqs. (27) and (41) the internal and external energy/force-based terms participate in the minimization procedure/convergence with weights w1 and w2. The choice of appropriate values for these weights is not crucial for the method’s performance, but they should be set depending on the background complexity and the smoothness of the object silhouette. For sequences with a relatively smooth background (without any significant edges, or with edges far from the object boundaries) the external energy/force is a reliable criterion and thus w2 is set to a higher value. Moreover, if the contour of the tracked object is complicated (not smooth) or noisy, the internal energy/force terms (based on curvature and point density) are not reliable and thus w1 is set to a lower value. Practically, it suffices that these weights take values proportional to 1, 10 or 100. A practical way to estimate these weights automatically is to calculate the snake/curve zero crossings (zero points of the second derivative of the curve, or of the curvature) for w1, and the mean value of the modified image gradient inside the extracted uncertainty regions for w2, and to threshold the results appropriately. Thus, noisy or complicated contours result in large numbers of curvature zero points, whereas remaining background edges inside the extracted uncertainty regions result in higher values of the modified image gradient. In this way, these weights are not fixed for the whole sequence, but are adjusted in each frame.

5. Experimental Results

The performance of the proposed approach was tested on a large number of natural sequences. In the experiments, a Matlab implementation on an Intel PIII 500 MHz PC was used.

Table 1 presents the execution time of the proposed method when applied to two successive frames of the sequences (corresponding to Figs. 12-16), with respect to the frame size, the tracked object (its rigidity), the adopted time window L (number of past successive frames) and the existence of clutter and background complexity. The respective weights [w1, w2] of the energy minimization, or the force-based snake convergence, are also shown in this table.

Frame size   Object     L    [w1, w2]   Exec. time (sec/frame)
384 × 288    aircraft   10   [1, 100]   9.593
384 × 288    car        10   [1, 10]    7.407
384 × 288    dolphin    5    [100, 1]   6.846
352 × 288    man        10   [100, 1]   6.473
352 × 288    ball       10   [100, 1]   6.743

Table 1. Performance of the proposed method: parameter tuning and computational cost.

Fig. 12 illustrates the case of tracking an object with a complicated contour (low smoothness) moving in front of a relatively smooth background. In such a case, weight w2 of eqs. (27) and (41) must be significantly greater than w1 ([w1, w2] = [1, 100]). Here the desired object (aircraft) is moving towards the camera and, even though the object is rigid, its projection on the image plane deforms (its contour expands) as time passes.

Fig. 12. Example of tracking an aircraft approaching the airport. The chosen weights are [w1, w2] = [1, 100] due to the tracked contour complexity and the background smoothness.

In Fig. 13 the case of car tracking in six successive frames of a traffic sequence is presented. In this example the desired object (car) is moving towards the camera and, although it is rigid, its projection on the image plane slowly deforms over time, as in the previous example. In this case, the background is relatively smooth close to the object boundaries, and the car silhouette is not complicated. Therefore, the weights are [w1, w2] = [1, 10] and the frame window is set to L = 10.

Fig. 13. Tracking of a moving car in six successive frames (a)-(f) of a traffic sequence. The background is relatively smooth close to the car’s boundary, while the car’s contour is not very complicated; thus [w1, w2] = [1, 10].

Fig. 14 illustrates the method’s performance in a strongly cluttered sequence, where the object is non-rigid and its projected motion is both rotational and translational rather than a simple translation or expansion/shrinkage. The accuracy of the method in such a case relies mainly on the definition of the snake’s external energy through the ASF filtering and the estimation of the modified image gradient, although the extracted uncertainty regions are wide. It must be pointed out that in cases with strong noise and weak object edges, w1 must be significantly greater than w2 ([w1, w2] = [100, 1]).

Fig. 14. Object tracking in the presence of strong clutter. For each of the transitions (a)-(b) and (b)-(c), the chosen weights are [w1, w2] = [100, 1], since the background is very noisy and contains salient edges.

In Fig. 15, the proposed approach is applied to a strongly cluttered sequence, where the desired object is a man walking in one direction. The contour of the moving human body deforms over time, and therefore the choice of L = 10 results in large uncertainty regions, whereas the weights w1 and w2 are set to 100 and 1 respectively, due to the existence of temporal clutter. The accuracy of the method again relies on the definition of the snake’s external energy through the modified image gradient and the ASF pre-filtering.

Fig. 15. Example of a man walking in a cluttered sequence. The main source of inaccuracies is the weak edges of the human body (especially close to the head). We choose [w1, w2] = [100, 1] to reduce the effect of the background complexity.

Finally, Fig. 16 illustrates twelve successive frames of a sequence presenting a soccer player receiving a handoff and getting ready to give a quick kick. The object being tracked in this case is the ball, which rapidly changes its direction of movement when the player receives and kicks it. The desired object is rigid with a smooth contour and the background is smooth close to the object boundary; on the other hand, the intense shadows of the ball and the player, due to the strong lighting of the soccer field, are the main source of possible inaccuracies. To reduce the side effect of the shadows, we choose [w1, w2] = [100, 1], whereas, due to the rapid change of the ball’s direction of movement, the choice of L = 10 results in large uncertainty regions at that time instance. Automatic estimation of the values of w1 and w2 is an important task and the authors are currently working in this direction, taking into account the description given in subsection 4.4.

Fig. 16. A ball being tracked in a soccer sequence. The background is relatively smooth close to the ball, but the direction of the ball’s movement changes rapidly and there are intense player and ball shadows; therefore, since the ball remains rigid (circular shape) over time, the chosen weights are [w1, w2] = [100, 1].

6. Conclusions and Further Work

In this work we deal with the challenge of object tracking in clutter with an edge-based model. We proposed a probabilistic snake-based approach for tracking contours of moving objects even in cluttered natural sequences. The introduced internal snake energy, based on curvature and point density distribution, gives a description of object shape and deformation, while the external energy, based on morphological operations, ensures that the resulting curve stops at the desired object boundaries. The tracking problem is posed as a procedure of stochastic generation of curves and picking out the one that minimizes the sum of three terms, defined by the snake’s internal and external energies. Statistical measurements of the object contour’s motion history are extracted to obtain snake initializations and uncertainty regions, in which the estimated contours are to be localized. In this way we constrain the solution to narrow bands around the snake initializations of the next frames. The snake motion is estimated according to an existing technique, which eliminates the effect of noise. In order to avoid computational complexity and to solve the point correspondence problem, we follow a force-based approach, converting energy-based terms into respective forces, and thus the minimization procedure is approximated by a force-based curve evolution inside the extracted uncertainty regions. The proposed method has been successfully applied to natural sequences with strong clutter and to sequences where objects move arbitrarily, and it is currently being applied to a video analysis framework utilizing MPEG-7 visual descriptors.

Since in the last few years it has become clear to researchers that the utilization of both edge- and region-based information performs better than the use of only one of them, we are currently working in that direction to cope with various fundamental and common problems in object tracking; thus, we are working on some modifications of the proposed method in order to cope with partial object occlusions and abrupt motions. In particular, we are examining a rule-based approach, where the separation between moving objects and background, as well as occlusion detection, are achieved by introducing region motion information into the proposed model. A further step is to introduce prior knowledge about the shape and the possible deformations of the tracked objects for specific applications, such as human lip tracking for lip reading. In the same direction, motion detection/segmentation, texture-based and motion-based features (optical flow) can be considered, to yield the correspondence and free the model from the initial conditions. A coupled Geodesic Active Contour framework that incorporates different information forms (boundary, region) of different nature (edges, intensities, texture, motion) is the future direction of our work.

Acknowledgements: We wish to thank the Directorate of Greek National Television (ERT) for allowing us to use audiovisual material obtained from its TV programs.

References

1. Y. Avrithis, Y. Xirouhakis and S. Kollias, “Affine-Invariant Curve Normalization for Object Shape Representation, Classification and Retrieval,” Machine Vision and Applications, 13(2), pp. 80-94, 2001.

2. B. Bascle and R. Deriche, “Region Tracking through Image Sequences,” Proc. of the 5th International Conference on Computer Vision (ICCV’95), pp. 302-307, Boston, USA, 1995.

3. M.J. Black and P. Anandan, “The Robust Estimation of Multiple Motions: Parametric and Piecewise-Smooth Flow Fields,” CVIU, 63(1), pp. 75-104, 1996.

4. V. Caselles, R. Kimmel, G. Sapiro, and C. Sbert, “Minimal Surfaces: A Geometric Three Dimensional Segmentation Approach,” Numerische Mathematik 77(4), pp. 423-425, 1997.

5. V. Caselles, R. Kimmel, G. Sapiro, and C. Sbert, “Minimal Surfaces Based Object Segmentation,” IEEE Trans. on Pattern Analysis and Machine Intelligence, 19(4), pp. 394-398, April 1997.

6. C. Xu and J. L. Prince, “Snakes, shapes, and gradient vector flow,” IEEE Trans. on Image Processing, 7(3), pp. 359-369, 1998.

7. L. D. Cohen, “On Active Contour Models and Balloons,” CVGIP: Image Understanding 53(2), pp. 211-218, 1991.

8. M. Daoudi, F. Ghorbel, A. Mokadem, O. Avaro and H. Sanson, “Shape Distances for Contour Tracking and Motion Estimation,” Pattern Recognition 32, pp. 1297-1306, 1998.

9. P. Delagnes, J. Benois and D. Barba, “Active Contours Approach to Object Tracking in Image Sequences with Complex Background,” Pattern Recognition Letters, 16, pp. 171-178, 1995.

10. O. Faugeras and R. Keriven, “Variational Principles, Surface Evolution PDE’s, Level Set Methods, and the Stereo Problem,” IEEE Trans. on Image Processing, 7(3), pp. 336-344, March 1998.

11. D. Gavrila and L. Davis, “3-D Model-based Tracking of Humans in Action: A Multi-view Approach,” Proc. of IEEE Conference on Computer Vision and Pattern Recognition (CVPR’96), San Francisco, USA, pp. 73-80, 1996.

12. S. Haykin, “The Steepest Descent Method,” Neural Networks, Macmillan College Publishing Company, Chapt. 5, pp. 124-126, 1994.

13. M. Hoch and P. Litwinowicz, “A Practical Solution for Tracking Edges in Image Sequences with Snakes,” The Visual Computer, 12(2), pp. 75-83, 1996.

14. P. Huber, Robust Statistics, Wiley eds., NY, 1981.

15. H. S. Ip and S. Dinggang, “An Affine-Invariant Active Contour Model (AI-Snake) for Model-Based Segmentation,” Image and Vision Computing, 16(2), pp. 135-146, 1998.

16. M. Isard and A. Blake, “Contour Tracking by Stochastic Propagation of Conditional Density,” Proc. of European Conf. on Computer Vision (ECCV’96), 1, Cambridge, UK, pp. 343-356, 1996.

17. A. K. Jain, Y. Zhong and M.-P. Dubuisson-Jolly, “Deformable Template Models: A review,” Signal Processing 71, pp. 109-129, 1998.

18. M. Kass, A. Witkin and D. Terzopoulos, “Snakes: Active Contour Models,” International Journal of Computer Vision 1(4), pp. 321-331, 1988.

19. F. Leymarie and M. Levine, “Tracking Deformable Objects in the Plane using an Active Contour Model,” Trans. Pattern Recognition and Machine Intelligence, 33, pp. 617-634, 1993.

20. D.G. Lowe, “Robust Model-Based Motion-Tracking Through the Integration of Search and Estimation,” International Journal of Computer Vision 8(2), pp. 113-122, 1992.

21. A. Makarov, “Comparison of Background Extraction Based Intrusion Detection Algorithms,” In Proc. International Conference on Image Processing, Lausanne, Switzerland, pp. 521-524, 1996.

22. A. Mansouri, T. Chomaud and J. Konrad, “A comparative evaluation of algorithms for fast computation of level set PDEs with applications to motion segmentation,” In Proc. of International Conference on Image Processing (ICIP’01), pp. 636-639, vol. 3, Thessaloniki, Greece, 2001.

23. P. Maragos, “Noise Suppression,” The Digital Signal Processing Handbook, V.K. Madisetti and D.B. Williams Eds., CRC Press, 1998, Chapt. 74, pp. 20-26.

24. P. Maragos, “Image Segmentation,” The Digital Signal Processing Handbook, V.K Madisetti and D.B Williams Eds., CRC Press, 1998, Chapt. 74, pp. 25-26.

25. P. Maragos, “Computer Vision,” Textbook from the post-graduate course “Computer Vision” in Electrical & Computer Eng. Dept. of National Technical University of Athens, instructor P. Maragos, chapt. 5, 2002.

26. D. Metaxas and D. Terzopoulos, “Constrained Deformable Superquadrics and Non-rigid Motion Tracking,” Proc. of IEEE Conference on Computer Vision and Pattern Recognition (CVPR’91), pp. 337-343, 1991.

27. F. Meyer and P. Bouthemy, “Region-Based Tracking Using Affine Motion Models in Long Image Sequences,” CVGIP: Image Understanding 60(2), pp. 119-140, 1994.

28. F. Meyer and P. Maragos, Multiscale Morphological Segmentations Based on Watershed, Flooding, and Eikonal PDE, eds. M. Nielsen et al., pp. 351-362, 1999.

29. F. Mohanna and F. Mokhtarian, “Improved curvature estimation for accurate localisation of active contours,” In Proc. of International Conference on Image Processing (ICIP’01), pp. 781-784, vol. 2, Thessaloniki, Greece, 2001.

30. K. P. Ngoi and J.C. Jia, “An Active Contour Model for Color Region Extraction in Natural Scenes,” Image and Vision Computing, 17(13), pp. 955-966, 1999.

31. J.M Odobez and P. Bouthemy, “Separation of Moving Regions from Background in an Image Sequence Acquired with a Mobile Camera,” Video Data Compression for Multimedia Computing, eds. H.H Li, S. Sun and H. Derin, Kluwer Academic Publisher, Chapt. 8, pp. 283-311.

32. S. Osher and J. Sethian, “Fronts Propagating with Curvature-Dependent Speed: Algorithms Based on the Hamilton-Jacobi Formulation,” Journal of Computational Physics 79, pp. 12-49, 1988.

33. N. Paragios and R. Deriche, “A PDE-Based Level Set Approach for Detection and Tracking of Moving Objects,” In Proc. of the Sixth International Conference on Computer Vision (ICCV’98), pp. 1139-1145, Bombay, India, 1998.

34. N. Paragios and R. Deriche, “Geodesic Active Regions for Motion Estimation and Tracking,” 7th IEEE Conference in Computer Vision, Greece, 1999.

35. N. Paragios and R. Deriche, “Geodesic Active Contours and Level Sets for the Detection and Tracking of Moving Objects,” IEEE Trans. Pattern Analysis and Machine Intelligence 22(3), pp. 266-280, 2000.

36. J.S. Park and J.H. Han, “Contour Matching: A Curvature-based Approach,” Image and Vision Computing, 16, pp. 181-189, 1998.

37. N. Peterfreund, “Robust Tracking of Position and Velocity with Kalman Filters,” IEEE Trans. Pattern Analysis and Machine Intelligence 21(6), pp. 564-569, 1999.

38. A. M. Tekalp, Digital Video Processing, Prentice Hall PTR.

39. W.B Thompson, P. Lechleider and E.R. Stuck, “Detecting Moving Objects Using the Rigidity Constraint,” IEEE Trans. Pattern Analysis and Machine Intelligence 2(15), pp. 162-166, 1993.

40. G. Tsechpenakis, Y. Xirouhakis and A. Delopoulos, “A Multiresolution Approach for Main Mobile Object Localization in Video Sequences,” International Workshop on Very Low Bitrate Video Coding (VLBV01), Athens, Greece, October 2001.

41. G. Tsechpenakis, Y. Avrithis and S. Kollias, “Efficient Moving Object Detection and Tracking in Video Sequences,” submitted to IEEE Image Vision and Computing, http://www.image.ntua.gr/ gtsech/.

42. C. Vieren, F. Cabestaing and J.G. Postaire, “Catching Moving Objects with Snakes for Motion Tracking,” Pattern Recognition Letters, 16, pp. 679-685, 1995.

43. Y. Xirouhakis, G. Tsechpenakis and A. Delopoulos, “Fast Mobile Object Detection and Localization in Video Sequences,” submitted to IEEE Computer Vision and Image Understanding, http://www.image.ntua.gr/ gtsech/.


Photo and Bibliography

Gabriel Tsechpenakis was born in Athens in 1975. He graduated from the School of Electrical and Computer Engineering, National Technical University of Athens, in 1999, and obtained his PhD in 2003 from the same University. His current research interests focus on the fields of image and video processing, computer vision and human computer interaction. So far, he has authored 3 papers in international journals and over 10 papers in proceedings of international conferences. He is a member of the Technical Chambers of Greece and a member of the IEEE Signal Processing Society. He is currently a member of the Image, Video and Multimedia Systems Laboratory of NTUA, also collaborating with the Center for Computational Biomedicine Imaging and Modeling at Rutgers State University of New Jersey.

Nicolas Tsapatsoulis was born in Limassol, Cyprus in 1969. He graduated from the Department of Electrical and Computer Engineering, the National Technical University of Athens in 1994 and received his Ph.D degree in 2000 from the same University. His current research interests lie in the areas of human computer interaction, machine vision, image and video processing, neural networks and biomedical engineering. He is a member of the Technical Chambers of Greece and Cyprus and a member of the IEEE Signal Processing and Computer societies. Dr. Tsapatsoulis has published thirteen papers in international journals and more than 35 in proceedings of international conferences. He served as Technical Program Co-Chair for the VLBV’01 workshop. He is a reviewer for the IEEE Transactions on Neural Networks and IEEE Transactions on Circuits and Systems for Video Technology journals.

Stefanos Kollias was born in Athens in 1956. He obtained his Diploma from NTUA in 1979, his M.Sc. in Communication Engineering in 1980 from UMIST in England and his Ph.D in Signal Processing from the Computer Science Division of NTUA. He has been with the Electrical Engineering Department of NTUA since 1986, where he now serves as a Professor. Since 1990 he has been Director of the Image, Video and Multimedia Systems Laboratory of NTUA. He has published more than 120 papers in the above fields, 50 of them in international journals. He has been a member of the Technical or Advisory Committee or invited speaker in 40 International Conferences. He and his team have participated in 38 European and National projects.