WP4.2.2: Semantic Video Segmentation and Tracking
Video Segmentation and Tracking Literature Review, June-September 2004
Compiled by Aristotle University of Thessaloniki




Object Tracking

A. Journal Papers

Contour-Based Object Tracking with Occlusion Handling in Video Acquired Using Mobile Cameras Yilmaz, A., Xin Li, Shah, M., Pattern Analysis and Machine Intelligence, IEEE Transactions on, Volume: 26, Issue: 11, Nov. 2004, Pages: 1531 – 1536 Abstract: We propose a tracking method which tracks the complete object regions, adapts to changing visual features, and handles occlusions. Tracking is achieved by evolving the contour from frame to frame by minimizing some energy functional evaluated in the contour vicinity defined by a band. Our approach has two major components related to the visual features and the object shape. Visual features (color, texture) are modeled by semiparametric models and are fused using independent opinion polling. Shape priors consist of shape level sets and are used to recover the missing object regions during occlusion. We demonstrate the performance of our method on real sequences with and without object occlusions. Link: http://csdl.computer.org/comp/trans/tp/2004/11/i1531abs.htm Fast occluded object tracking by a robust appearance filter Nguyen, H.T., Smeulders, A.W.M., Pattern Analysis and Machine Intelligence, IEEE Transactions on, Volume: 26, Issue: 8, Aug. 2004, Pages: 1099 – 1104 Abstract: We propose a new method for object tracking in image sequences using template matching. To update the template, appearance features are smoothed temporally by robust Kalman filters, one to each pixel. The resistance of the resulting template to partial occlusions enables the accurate detection and handling of more severe occlusions. Abrupt changes of lighting conditions can also be handled, especially when photometric invariant color features are used. The method has only a few parameters and is computationally fast enough to track objects in real time.
Link: http://ieeexplore.ieee.org/xpl/abs_free.jsp?arNumber=1307017 Automatic Moving Object Extraction for Content-Based Applications Xu, H., Younis, A.A., Kabuka, M.R., Circuits and Systems for Video Technology, IEEE Transactions on, Volume: 14, Issue: 6, June 2004, Pages: 796 – 812 Abstract: Rapid developments in the Internet and multimedia applications allow us to access large amounts of image and video data. While significant progress has been made in digital data compression, content-based functionalities are still quite limited. Many existing techniques in content-based retrieval are based on global visual features extracted from the entire image. In order to provide more efficient content-based functionalities for video applications, it is necessary to extract meaningful video objects from scenes to enable object-based representation of video content. Object-based representation is also introduced by MPEG-4 to enable content-based functionality and high coding efficiency. In this paper, we propose a new algorithm that automatically extracts meaningful video objects from video sequences. The algorithm begins with robust motion segmentation on the first two successive frames. To detect moving objects, segmented regions are grouped together according to their spatial similarity. A binary object model for each moving object is automatically derived and tracked in subsequent frames using the generalized Hausdorff distance. The object model is updated for each frame to accommodate complex motions and shape changes of the object. Experimental results using different types of video sequences are presented to demonstrate the efficiency and accuracy of our proposed algorithm. Link: http://ieeexplore.ieee.org/xpl/abs_free.jsp?arNumber=1302161
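Xu et al. above track a binary object model with the generalized (rank-based) Hausdorff distance, which tolerates outliers and partial occlusion better than the classical maximum. The following is a minimal illustrative sketch over plain 2-D point sets, not the authors' binary-model implementation; the 0.8 rank fraction and the candidate-shift search are illustrative assumptions.

```python
import math

def partial_hausdorff(a, b, frac=0.8):
    """Directed partial (rank-based) Hausdorff distance from point set `a`
    to point set `b`: the frac-quantile of nearest-neighbour distances
    rather than the classical maximum, which makes the measure tolerant
    of outliers and partially occluded object models."""
    nn = sorted(min(math.dist(p, q) for q in b) for p in a)
    k = min(len(nn) - 1, max(0, int(frac * len(nn)) - 1))
    return nn[k]

def best_shift(model, image_pts, candidates, frac=0.8):
    """Slide the model over candidate translations and keep the one
    minimising the partial Hausdorff distance to the observed points."""
    def shifted(dx, dy):
        return [(x + dx, y + dy) for x, y in model]
    return min(candidates,
               key=lambda c: partial_hausdorff(shifted(*c), image_pts, frac))
```

In the paper's setting the model points come from a binary object mask and the minimising translation gives the object's new location; the model is then re-derived each frame to follow shape changes.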


A Survey on Visual Surveillance of Object Motion and Behaviors Hu, W., Tan, T., Wang, L., Maybank, S., Systems, Man and Cybernetics, Part C, IEEE Transactions on, Volume: 34, Issue: 3, Aug. 2004, Pages: 334 – 352 Abstract: Visual surveillance in dynamic scenes, especially for humans and vehicles, is currently one of the most active research topics in computer vision. It has a wide spectrum of promising applications, including access control in special areas, human identification at a distance, crowd flux statistics and congestion analysis, detection of anomalous behaviors, and interactive surveillance using multiple cameras, etc. In general, the processing framework of visual surveillance in dynamic scenes includes the following stages: modeling of environments, detection of motion, classification of moving objects, tracking, understanding and description of behaviors, human identification, and fusion of data from multiple cameras. We review recent developments and general strategies of all these stages. Finally, we analyze possible research directions, e.g., occlusion handling, a combination of two- and three-dimensional tracking, a combination of motion analysis and biometrics, anomaly detection and behavior prediction, content-based retrieval of surveillance videos, behavior understanding and natural language description, fusion of information from multiple sensors, and remote surveillance. Link: http://ieeexplore.ieee.org/xpl/abs_free.jsp?arNumber=1310448 Stable Real-Time 3D Tracking Using Online and Offline Information Vacchetti, L., Lepetit, V., Fua, P., Pattern Analysis and Machine Intelligence, IEEE Transactions on, Volume: 26, Issue: 10, Oct. 2004, Pages: 1385 – 1391 Abstract: We propose an efficient real-time solution for tracking rigid objects in 3D using a single camera that can handle large camera displacements, drastic aspect changes, and partial occlusions. 
While commercial products are already available for offline camera registration, robust online tracking remains an open issue because many real-time algorithms described in the literature still lack robustness and are prone to drift and jitter. To address these problems, we have formulated the tracking problem in terms of local bundle adjustment and have developed a method for establishing image correspondences that can equally well handle short and wide-baseline matching. We can then merge the information from preceding frames with that provided by a very limited number of keyframes created during a training stage, which results in a real-time tracker that does not jitter or drift and can deal with significant aspect changes. Link: http://cvlab.epfl.ch/publications/vacchetti-et-al-pami04.pdf A Real-Time Articulated Human Motion Tracking Using Tri-Axis Inertial/Magnetic Sensors Package Zhu, R., Zhou, Z., Neural Systems and Rehabilitation Engineering, IEEE Transactions on [see also IEEE Trans. on Rehabilitation Engineering], Volume: 12, Issue: 2, June 2004, Pages: 295 – 302 Abstract: A basic requirement in virtual environments is the tracking of objects, especially humans. A real-time motion-tracking system was presented and evaluated in this paper. System sensors were built using tri-axis microelectromechanical accelerometers, rate gyros, and magnetometers. A Kalman-based fusion algorithm was applied to obtain dynamic orientations and further positions of segments of the subject's body. The system with the proposed algorithm was evaluated by dynamically measuring Euler orientation and comparing it with two other conventional methods. An arm motion experiment was demonstrated using the developed system and algorithm. The results validated the effectiveness of the proposed method.
Link: http://ieeexplore.ieee.org/xpl/abs_free.jsp?arNumber=1304870 A multiobject tracking framework for interactive multimedia applications Yeasin, M., Polat, E., Sharma, R., Multimedia, IEEE Transactions on Volume: 6, Issue: 3, June 2004, Pages: 398 – 405


Abstract: Automatic initialization and tracking of multiple people and their body parts is one of the first steps in designing interactive multimedia applications. The key problems in this context are robust detection and tracking of people and their body parts in an unconstrained environment. This paper presents an integrated framework to address detection and tracking of multiple objects in a computationally efficient manner. In particular, a neural network-based face detector was employed to detect faces and compute a person-specific statistical model for skin color from the face regions. A probabilistic model was proposed to fuse the color and motion information to localize the moving body parts (hands). A multiple hypothesis tracking (MHT) algorithm was adopted to track face and hands. In real-world scenes, extracted features (face and hands) usually contain spurious measurements that create unconvincing trajectories and needless computations. To deal with this problem, a path coherence function was incorporated along with MHT to reduce the number of hypotheses, which in turn reduces the computational cost and improves the structure of trajectories. The performance of the framework was validated using experiments on synthetic and real sequences of images. Link: http://ieeexplore.ieee.org/xpl/abs_free.jsp?arNumber=1298812 Model-based global and local motion estimation for videoconference sequences Calvagno, G., Fantozzi, F., Rinaldo, R., Viareggio, A., Circuits and Systems for Video Technology, IEEE Transactions on, Volume: 14, Issue: 9, Sept. 2004, Pages: 1156 – 1161 Abstract: In this work, we present an algorithm for face 3-D motion estimation in videoconference sequences. The algorithm is able to estimate both the position of the face as an object in 3-D space (global motion) and the movements of portions of the face, like the mouth or the eyebrows (local motion).
The algorithm uses a modified version of the standard 3-D face model CANDIDE. We present various techniques to increase robustness of the global motion estimation, which is based on feature tracking and an extended Kalman filter. Global motion estimation is used as a starting point for local motion detection in the mouth and eyebrow areas. To this purpose, synthetic images of these areas (templates) are generated with texture mapping techniques, and then compared to the corresponding regions in the current frame. A set of parameters, called action unit vectors (AUVs), influences the shape of the synthetic mouth and eyebrows. The optimal AUV values are determined via a gradient-based minimization procedure of the error energy between the templates and the actual face areas. The proposed scheme is robust and was tested with success on sequences of many hundreds of frames. Link: http://ieeexplore.ieee.org/xpl/abs_free.jsp?arNumber=1325199 Tracking Multiple Humans in Complex Situations Tao Zhao, Nevatia, R., Pattern Analysis and Machine Intelligence, IEEE Transactions on, Volume: 26, Issue: 9, Sept. 2004, Pages: 1208 – 1221 Abstract: Tracking multiple humans in complex situations is challenging. The difficulties are tackled with appropriate knowledge in the form of various models in our approach. Human motion is decomposed into its global motion and limb motion. In the first part, we show how multiple human objects are segmented and their global motions are tracked in 3D using ellipsoid human shape models. Experiments show that it successfully applies to the cases where a small number of people move together, have occlusion, and cast shadow or reflection. In the second part, we estimate the modes (e.g., walking, running, standing) of the locomotion and 3D body postures by making inference in a prior locomotion model. Camera model and ground plane assumptions provide geometric constraints in both parts. Robust results are shown on some difficult sequences.
Link: http://iris.usc.edu/Outlines/papers/2004/zhao-pami-04.pdf Robust real-time face tracker for cluttered environments Keith Anderson and Peter W. McOwan, Computer Vision and Image Understanding, Volume 95, Issue 2, (August 2004), Pages: 184-200.


Abstract: A multi-stage system for single face tracking in cluttered scenes is presented. Initial candidate face locations are identified using a modified version of the ratio template algorithm [Proceedings of the Fifteenth National Conference on Artificial Intelligence (1998)]. This operates by matching ratios of averaged luminance using a spatial face model adapted here to incorporate biological proportions (the golden ratio). We find the inclusion in the face template of these golden ratio proportions increases tolerance to illumination changes. Subsequent processing stages enable rejection of false positives. These stages include a novel ratio-ratios operator that improves recognition rates by examining higher order relationships within the initial ratio template measures, and simple morphological eye and mouth feature detection. This frontal-face tracking architecture runs in real-time and is applicable to problem domains where face localisation is required. Link: http://www.sciencedirect.com/science?_ob=JournalURL&_cdi=6750&_auth=y&_acct=C000006498&_version=1&_urlVersion=0&_userid=604493&md5=a8e19d5d73f2f45a76de408d7c1f7029&chunk=95#95 A robust hardware algorithm for real-time object tracking in video Mahmoud Meribout, Lazher Khriji and Mamoru Nakanishi, Real-Time Imaging, Volume 10, Issue 3, (June 2004), Pages: 145-159. Abstract: Most of the emerging content-based multimedia technologies are based on efficient methods to solve machine early vision tasks. Among other tasks, object segmentation is perhaps the most important problem in single image processing. The solution of this problem is the key technology of the development of the majority of leading-edge interactive video communication technology and telepresence systems. The aim of this paper is to present a robust framework for real-time object segmentation and tracking in video sequences taken simultaneously from different perspectives. 
The other contribution of the paper is to present a new dedicated parallel hardware architecture. It is composed of a mixture of Digital Signal Processing and Field Programmable Gate Array technologies and uses the Content Addressable Memory as a main processing unit. Experimental results indicate that a small amount of hardware can deliver real-time performance and high accuracy. This is an improvement over previously proposed systems, which required a greater amount of hardware to achieve comparable execution times. Link: http://www.sciencedirect.com/science?_ob=JournalURL&_cdi=6997&_auth=y&_acct=C000006498&_version=1&_urlVersion=0&_userid=604493&md5=d05bf7520900d6b84e7666bce5c95dae Learning switching dynamic models for objects tracking Gilles Celeux, Jacinto Nascimento and Jorge Marques, Pattern Recognition, Volume 37, Issue 9, (September 2004), Pages: 1841-1853. Abstract: Many recent tracking algorithms rely on model learning methods. A promising approach consists of modeling the object motion with switching autoregressive models. This article is concerned with parametric switching dynamical models governed by a hidden Markov chain. The maximum likelihood estimation of the parameters of those models is described. The formulas of the EM algorithm are detailed. Moreover, the problem of choosing a good and parsimonious model with the BIC criterion is considered. Emphasis is put on choosing a reasonable number of hidden states. Numerical experiments on both simulated and real data sets highlight the ability of this approach to describe properly object motions with sudden changes. The two applications on real data concern object and heart tracking. Link: http://www.sciencedirect.com/science?_ob=JournalURL&_cdi=5664&_auth=y&_acct=C000006498&_version=1&_urlVersion=0&_userid=604493&md5=6e7468cd8d671a07c832cc08f2c9ee21
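Celeux et al. above fit switching autoregressive models by EM and pick the number of hidden states with BIC. The full switching-model EM is too long to sketch here; the snippet below shows only the two building blocks it rests on, a least-squares Gaussian AR(1) fit and the BIC score, as a single-regime illustration with no hidden chain (all function names are ours, not the paper's).

```python
import math

def fit_ar1(x):
    """Least-squares fit of x_t = a * x_{t-1} + e_t; returns the
    AR coefficient and the residual (noise) variance."""
    num = sum(x[t] * x[t - 1] for t in range(1, len(x)))
    den = sum(x[t - 1] ** 2 for t in range(1, len(x)))
    a = num / den
    resid = [x[t] - a * x[t - 1] for t in range(1, len(x))]
    var = sum(r * r for r in resid) / len(resid)
    return a, var

def ar1_log_lik(x, a, var):
    """Gaussian conditional log-likelihood of the series under the AR(1) model."""
    n = len(x) - 1
    sq = sum((x[t] - a * x[t - 1]) ** 2 for t in range(1, len(x)))
    return -0.5 * n * math.log(2 * math.pi * var) - sq / (2 * var)

def bic(log_lik, n_params, n_obs):
    """Bayesian information criterion; the model with the smallest BIC wins."""
    return -2.0 * log_lik + n_params * math.log(n_obs)
```

In the switching setting, one such AR model is fitted per hidden state inside the EM loop, and BIC is evaluated for each candidate number of states to choose a parsimonious model.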


People tracking based on motion model and motion constraints with automatic initialization Huazhong Ning, Tieniu Tan, Liang Wang and Weiming Hu, Pattern Recognition, Volume 37, Issue 7, (July 2004), Pages: 1423-1440. Abstract: Human motion analysis is currently one of the most active research topics in computer vision. This paper presents a model-based approach to recovering motion parameters of walking people from monocular image sequences in a CONDENSATION framework. From the semi-automatically acquired training data, we learn a motion model represented as Gaussian distributions, and explore motion constraints by considering the dependency of motion parameters and represent them as conditional distributions. Then both of them are integrated into a dynamic model to concentrate factored sampling in the areas of the state-space with most posterior information. To measure the observation density with accuracy and robustness, a pose evaluation function (PEF) combining both boundary and region information is proposed. The function is modeled with a radial term to improve the efficiency of the factored sampling. We also address the issue of automatic acquisition of the initial model pose and recovery from severe failures. A large number of experiments carried out in both indoor and outdoor scenes demonstrate that the proposed approach works well. Link: http://www.sciencedirect.com/science?_ob=JournalURL&_cdi=5664&_auth=y&_acct=C000006498&_version=1&_urlVersion=0&_userid=604493&md5=6e7468cd8d671a07c832cc08f2c9ee21 Conics-enhanced vision approach for easy and low-cost 3D tracking Tien-Lung Sun, Pattern Recognition, Volume 37, Issue 7, (July 2004), Pages: 1441-1450. Abstract: This paper presents a conics-enhanced vision approach for low-cost and easy-to-operate 3D tracking.
The idea is to use paper disks as markers and recover the 3D positions of these paper disks by a property of conics: the 3D rotation and translation information of a planar circle with known radius could be recovered from its elliptic projection in one image (Int. J. Comput. Vision 10(1) (1993) 7). This property implies that tracking a paper disk in 3D could be done by tracking its elliptic projection in a sequence of 2D images. Since ellipse tracking has to consider many factors such as occlusion, background, light and so on, it is difficult to develop a general algorithm that works for all situations. In this paper, we discuss algorithms for two types of ellipse tracking: real-time tracking of a single, non-occluded ellipse and off-line tracking of multiple ellipses. They are used to develop a real-time tracker to control the camera movement in a virtual environment and a multiple-tracker system that works off-line to acquire human motion data to animate a 3D human model. Implementation details and experimental results of the tracking systems are presented. The proposed 3D tracking approach is valuable for applications where accuracy and speed are not very critical but affordability and ease of operation are the main concerns, e.g., educational-purpose virtual environments for kids. Link: http://www.sciencedirect.com/science?_ob=JournalURL&_cdi=5664&_auth=y&_acct=C000006498&_version=1&_urlVersion=0&_userid=604493&md5=6e7468cd8d671a07c832cc08f2c9ee21 Detection and tracking of eyes for gaze-camera control Shinjiro Kawato and Nobuji Tetsutani, Image and Vision Computing, Volume 22, Issue 12, (1 October 2004), Proceedings from the 15th International Conference on Vision Interface, Pages: 1031-1038. Abstract: A head-off gaze-camera needs eye location information for head-free usage. For this purpose, we propose new algorithms to extract and track the positions of eyes in a real-time video stream.
For extraction of eye positions, we detect blinks based on the differences between successive images. However, eyelid regions are fairly small. To distinguish them from dominant head movement, we elaborate a head movement cancellation process. For eye-position tracking, we use a template of ‘Between-the-Eyes,’ which is updated frame-by-frame, instead of the eyes themselves. Eyes are searched based on the current position of ‘Between-the-Eyes’ and their geometrical relations to the position in the previous frame. The ‘Between-the-Eyes’ pattern is easier to locate accurately than eye patterns. We implemented the system on a PC with a Pentium III 866-MHz CPU. The system runs at 30 frames/s and robustly detects and tracks the eyes. Link: http://www.sciencedirect.com/science?_ob=JournalURL&_cdi=5641&_auth=y&_acct=C000006498&_version=1&_urlVersion=0&_userid=604493&md5=56610a87257482f48c0347dc7c816b1f Real-time tracking for visual interface applications in cluttered and occluding situations David J. Bullock and John S. Zelek, Image and Vision Computing, Volume 22, Issue 12, Pages: 1083-1091. Abstract: Visual interface systems require object tracking techniques with real-time performance for ubiquitous interaction. A probabilistic framework for a visual tracking system, which robustly tracks targets in real-time using color and motion cues, is presented. The algorithm is based on particle filtering techniques of the I-Condensation filter. An innovation of the paper is the use of motion cues to guide the propagation of particle samples which are being evaluated using color cues. This results in a probabilistic blob tracking method which is shown to greatly outperform conventional blob trackers when in the presence of occlusion and clutter. A second innovation presented is the use of motion-based temporal signatures for the visual recognition of an initialization cue. This allows for passive initialization of the tracking system. The application presented here is the task of digital video annotation using a hand-held marking device. Link: http://www.sciencedirect.com/science?_ob=JournalURL&_cdi=5641&_auth=y&_acct=C000006498&_version=1&_urlVersion=0&_userid=604493&md5=56610a87257482f48c0347dc7c816b1f Tightly integrated sensor fusion for robust visual tracking G. S. W. Klein and T. W. Drummond, Image and Vision Computing, Volume 22, Issue 10, (1 September 2004), British Machine Vision Computing 2002, Pages: 769-776.
Abstract: This paper presents a novel method for increasing the robustness of visual tracking systems by incorporating information from inertial sensors. We show that more can be achieved than simply combining the sensor data within a statistical filter: besides using inertial data to provide predictions for the visual sensor, this data can be used to dynamically tune the parameters of each feature detector in the visual sensor. This allows the visual sensor to provide useful information even in the presence of substantial motion blur. Finally, the visual sensor can be used to calibrate the parameters of the inertial sensor to eliminate drift. Link: http://www.sciencedirect.com/science?_ob=JournalURL&_cdi=5641&_auth=y&_acct=C000006498&_version=1&_urlVersion=0&_userid=604493&md5=56610a87257482f48c0347dc7c816b1f An integrated surveillance system—human tracking and view synthesis using multiple omni-directional vision sensors Kim C. Ng, Hiroshi Ishiguro, Mohan Trivedi and Takushi Sogo, Image and Vision Computing, Volume 22, Issue 7, (1 July 2004) Visual Surveillance, Pages 551-561. Abstract: Accurate and efficient monitoring of dynamically changing environments is one of the most important requirements for visual surveillance systems. This paper describes the development of an integrated system for this monitoring purpose. The system consists of multiple omnidirectional vision sensors and was developed to address two specific surveillance tasks: (1) robust tracking and profiling of human activities; (2) dynamic synthesis of virtual views for observing the environment from arbitrary vantage points.


Link: http://www.sciencedirect.com/science?_ob=JournalURL&_cdi=5641&_auth=y&_acct=C000006498&_version=1&_urlVersion=0&_userid=604493&md5=56610a87257482f48c0347dc7c816b1f Robust tracking of persons in real-world scenarios using a statistical computer vision approach Gerhard Rigoll, Harald Breit and Frank Wallhoff, Image and Vision Computing, Volume 22, Issue 7, (1 July 2004), Visual Surveillance, Pages 571-582. Abstract: In the following work, we present a novel approach to robust and flexible person tracking using an algorithm that combines two powerful stochastic modeling techniques: the first one is the technique of so-called Pseudo-2D Hidden Markov Models (P2DHMMs), used for capturing the shape of a person within an image frame, and the second technique is the well-known Kalman-filtering algorithm, which uses the output of the P2DHMM for tracking the person by estimation of a bounding box trajectory indicating the location of the person within the entire video sequence. Both algorithms cooperate in an optimal way, and with this cooperative feedback, the proposed approach even makes the tracking of persons possible in the presence of background motion, for instance caused by moving objects such as cars, or by camera operations such as panning or zooming. We consider this a major advantage compared to most other tracking algorithms, which are mostly not capable of dealing with background motion. Furthermore, the person to be tracked is not required to wear special equipment (e.g. sensors) or special clothing. Additionally, we show how our approach can be effectively extended to include on-line background adaptation. Our results are confirmed by several tracking examples in real scenarios, shown at the end of the article and provided on the web server of our institute.
Link: http://www.sciencedirect.com/science?_ob=JournalURL&_cdi=5641&_auth=y&_acct=C000006498&_version=1&_urlVersion=0&_userid=604493&md5=56610a87257482f48c0347dc7c816b1f Performance Measures for Video Object Segmentation and Tracking Erdem, C.E., Sankur, B., Tekalp, A.M., Image Processing, IEEE Transactions on, Volume: 13, Issue: 7, July 2004, Pages: 937 – 951 Abstract: We propose measures to evaluate quantitatively the performance of video object segmentation and tracking methods without ground-truth (GT) segmentation maps. The proposed measures are based on spatial differences of color and motion along the boundary of the estimated video object plane and temporal differences between the color histogram of the current object plane and its predecessors. They can be used to localize (spatially and/or temporally) regions where segmentation results are good or bad; and/or they can be combined to yield a single numerical measure to indicate the goodness of the boundary segmentation and tracking results over a sequence. The validity of the proposed performance measures without GT has been demonstrated by canonical correlation analysis with another set of measures with GT on a set of sequences (where GT information is available). Experimental results are presented to evaluate the segmentation maps obtained from various sequences using different segmentation approaches. Link: http://www.busim.ee.boun.edu.tr/~sankur/SankurFolder/VCIP03.pdf Robust Segmentation and Tracking of Colored Objects in Video Gevers, T., Circuits and Systems for Video Technology, IEEE Transactions on, Volume: 14, Issue: 6, June 2004, Pages: 776 – 781 Abstract: Segmenting and tracking of objects in video is of great importance for video-based encoding, surveillance, and retrieval. However, the inherent difficulty of object segmentation and tracking is to distinguish changes in the displacement of objects from disturbing effects such as noise and illumination changes.
Therefore, in this paper, we formulate a color-based deformable model which is robust against noisy data and changing illumination. Computational methods are presented to measure color constant gradients. Further, a model is given to estimate the amount of sensor noise through these color constant gradients. The obtained uncertainty is subsequently used as a weighting term in the deformation process. Experiments are conducted on image sequences recorded from three-dimensional scenes. From the experimental results, it is shown that the proposed color constant deformable method successfully finds object contours robust against illumination, and noisy, but homogeneous regions. Link: http://ieeexplore.ieee.org/xpl/abs_free.jsp?arNumber=1302159

B. Conference Papers

An HMM/MRF-Based Stochastic Framework for Robust Vehicle Tracking Kato, J., Watanabe, T., Joga, S., Liu, Y., Hase, H., Intelligent Transportation Systems, IEEE Transactions on, Volume: 5, Issue: 3 Sept. 2004, Pages: 142 – 154. Abstract: Shadows of moving objects often obstruct robust visual tracking. In this paper, we present a car tracker based on a hidden Markov model/Markov random field (HMM/MRF)-based segmentation method that is capable of classifying each small region of an image into three different categories: vehicles, shadows of vehicles, and background from a traffic-monitoring movie. The temporal continuity of the different categories for one small region location is modeled as a single HMM along the time axis, independently of the neighboring regions. In order to incorporate spatial-dependent information among neighboring regions into the tracking process, at the state-estimation stage, the output from the HMMs is regarded as an MRF and the maximum a posteriori criterion is employed in conjunction with the MRF for optimization. At each time step, the state estimation for the image is equivalent to the optimal configuration of the MRF generated through a stochastic relaxation process. Experimental results show that, using this method, foreground (vehicles) and nonforeground regions including the shadows of moving vehicles can be discriminated with high accuracy. Link: http://ieeexplore.ieee.org/xpl/abs_free.jsp?arNumber=1331385 Managing Uncertainty: Modeling Users in Location-Tracking Applications Abdelsalam, W., Ebrahim, Y., Pervasive Computing, IEEE, Volume: 03, Issue: 3, July 2004, Pages: 60– 65. Abstract: Applications based on location tracking have been growing in number and sophistication over the past few years. These applications include those that can track elderly people, provide targeted advertising to mobile users, and track moving objects. Here, the authors discuss human-controlled moving objects under the general category of "roving users." 
A typical use of a roving user--RU for short--location tracking system is to track the location of each RU to answer queries about the person's whereabouts at any particular time. Regardless of the time frame, any research project seeking to address the RU problem must deal with the uncertainty caused by the difficulty of predicting each RU’s location at any given time. Link: http://csdl.computer.org/comp/mags/pc/2004/03/b3060abs.htm Probabilistic object tracking using multiple features Serby, D., Koller-Meier, E., Van Gool, L., Pattern Recognition, 2004. ICPR 2004. Proceedings of the 17th International Conference on, Volume: 2, Aug. 23-26, 2004, Pages: 184 – 187. Abstract: We present a generic tracker which can handle a variety of different objects. For this purpose, groups of low-level features like interest points, edges, and homogeneous and textured regions are combined on a flexible and opportunistic basis. They sufficiently characterize an object and allow robust tracking, as they are complementary sources of information which describe both the shape and the appearance of an object. These low-level features are integrated into a particle filter framework, as this has proven very successful for non-linear and non-Gaussian estimation problems. In this paper we concentrate on rigid objects under affine transformations. Results on real-world scenes demonstrate the performance of the proposed tracker.
Link: http://cs.gmu.edu/~fli/papers/Probabilistic%20Object%20Tracking%20Using%20Multiple%20Features.pdf Real-time, 3-D-multi object position estimation and tracking Kaszubiak, J., Tornow, M., Kuhn, R.W., Michaelis, B., Pattern Recognition, 2004. ICPR 2004. Proceedings of the 17th International Conference on, Volume: 1, Aug. 23-26, 2004, Pages: 785 – 788. Abstract: For autonomously acting robots and driver assistance systems, powerful optical stereo sensor systems are required. Object positions and environmental conditions have to be acquired in real-time. In this paper a hardware-software co-design is applied, acting within the presented stereophotogrammetric system. For calculation of the depth map an optimized algorithm is implemented as a hierarchical parallel hardware solution. By adapting the image resolution to the distance, real-time processing is possible. The object clustering and the tracking are realized in a processor. The density distribution of the disparity in the depth map (disparity histogram) is used for object detection. A Kalman filter stabilizes the parameters of the results. Link: http://csdl.computer.org/comp/proceedings/icpr/2004/2128/01/212810785abs.htm Robust real time tracking of 3D objects Masson, L., Dhome, M., Jurie, F., Pattern Recognition, 2004. ICPR 2004. Proceedings of the 17th International Conference on, Volume: 4, Aug. 23-26, 2004, Pages: 252 – 255. Abstract: In this article the problem of tracking rigid 3D objects is addressed. The contribution of the proposed approach is an algorithm which combines: efficiency (i.e., the algorithm is designed above all to be real-time using standard architecture), robustness (occlusions are allowed) and accuracy (sub-pixel accuracy is obtained). It is devoted to the tracking of 3D rigid objects, assuming that the 3D geometry as well as the texture of the surface is known. 
Such performance can be obtained through a two-level scheme: the core of the approach is an efficient 2D patch tracker whose results are combined robustly to compute the 3D object pose. This article provides experimental results proving the soundness of the proposed approach. Link: Not Available. Robust appearance-based tracking of moving object from moving platform Jie Shao, Shaohua Kevin Zhou, Qinfen Zheng, Pattern Recognition, 2004. ICPR 2004. Proceedings of the 17th International Conference on, Volume: 4, Aug. 23-26, 2004, Pages: 215 – 218. Abstract: We present a robust algorithm for tracking moving objects from a moving platform. Robustness is achieved by incorporating temporal differencing and shape detection in an appearance-based object tracking algorithm. In addition, the incorporation of these two methods also improves accuracy and computational efficiency of detection. Some experimental results using airborne-video tracking are given to illustrate the effectiveness of this method. Link: http://www.cfar.umd.edu/~shaohua/publications.html Object tracking using incremental fisher discriminant analysis Ruei-Sung Lin, Ming-Hsuan Yang, Levinson, S.E., Pattern Recognition, 2004. ICPR 2004. Proceedings of the 17th International Conference on, Volume: 2, Aug. 23-26, 2004, Pages: 757 – 760. Abstract: This paper presents a novel object tracking algorithm using an incremental Fisher Linear Discriminant (FLD) algorithm. The sample distribution of the target class is modeled by a single Gaussian and the non-target background class is modeled by a mixture of Gaussians. To facilitate a multiclass classification problem, we recast the classic FLD algorithm so that the number of classes does not need to be pre-determined. The most discriminant projection matrix that best separates the samples in the projected space is computed using FLD at each frame. Based on the current target
location, an efficient sampling algorithm is used to predict the possible locations in the next frame. Using the current projection matrix computed by FLD, the most likely candidate, which is closest to the center of the target class in the projected space, is selected. Since the FLD is repeatedly computed at each frame, we develop an incremental and efficient method to compute the projection matrix based on the previous results. Experimental results show that our tracker is able to follow the target under large lighting, pose and expression variations. Link: http://csdl.computer.org/comp/proceedings/icpr/2004/2128/02/212820757abs.htm Reinforcement learning-based feature learning for object tracking Fang Liu, Jianbo Su, Pattern Recognition, 2004. ICPR 2004. Proceedings of the 17th International Conference on, Volume: 2, Aug. 23-26, 2004, Pages: 748 – 751. Abstract: Feature learning in object tracking is important because the choice of features significantly affects the system's performance. In this paper, a novel online feature learning approach based on reinforcement learning is proposed. Reinforcement learning has been extensively used as a generative model of sequential decision-making that interacts with an uncertain environment. We extend this technique to feature selection for object tracking, and further add human-computer interaction to reinforcement learning to reduce the learning complexity and speed up convergence. Experiments on object tracking are provided to verify the effectiveness of the proposed approach. Link: http://csdl.computer.org/comp/proceedings/icpr/2004/2128/02/212820748abs.htm Object tracking by the mean-shift of regional color distribution combined with the particle-filter algorithm Deguchi, K., Kawanaka, O., Okatani, T., Pattern Recognition, 2004. ICPR 2004. Proceedings of the 17th International Conference on, Volume: 3, Aug. 23-26, 2004, Pages: 506 – 509. 
Abstract: This paper presents a method for tracking a person in a video sequence in real time. In this method, the profile of the color distribution characterizes the target's features. It is invariant to rotation and scale changes, and robust to nonrigidity and partial occlusion of the target. We employ the mean-shift algorithm to track the target and to reduce the computational cost. Moreover, we incorporate the particle filter into it to cope with temporal occlusion of the target, and largely reduce the computational cost of the original particle filter. Experiments demonstrate the effectiveness of this method. Link: Not Available. A non causal Bayesian framework for object tracking and occlusion handling for the synthesis of stereoscopic video Moustakas, K., Tzovaras, D., Strintzis, M.G., 3D Data Processing, Visualization and Transmission, 2004. 3DPVT 2004. Proceedings. 2nd International Symposium on, 6-9 Sept. 2004, Pages: 147 – 154. Abstract: This paper presents a framework for the synthesis of stereoscopic video using as input only a monoscopic image sequence. Initially, bi-directional 2D motion estimation is performed, which is followed by an efficient method for the reliable tracking of object contours. Rigid 3D motion and structure is recovered utilizing extended Kalman filtering. Finally, occlusions are dealt with using a novel Bayesian framework, which exploits future information to correctly reconstruct occluded areas. Experimental evaluation shows that the layered object scene representation, combined with the proposed methods for object tracking throughout the sequence and occlusion handling, yields very accurate results. Link: http://csdl.computer.org/comp/proceedings/3dpvt/2004/2223/00/22230147abs.htm Estimation of the Bayesian network architecture for object tracking in video sequences Jorge, P.M., Marques, J.S., Abrantes, A.J., Pattern Recognition, 2004. ICPR 2004. Proceedings of the 17th International Conference on, Volume: 2, Aug. 
23-26, 2004, Pages: 732 – 735.
Abstract: The use of Bayesian networks for object tracking was recently proposed. Bayesian networks allow modeling of the interaction among detected trajectories, in order to obtain reliable object identification in the presence of occlusions. However, the architecture of the Bayesian network has been defined using simple heuristic rules which fail in many cases. This paper addresses the above problem and presents a new method to estimate the network architecture from the video sequences using supervised learning techniques. Experimental results are presented showing that significant performance gains (increase of accuracy and decrease of complexity) are achieved by the proposed methods. Link: http://csdl.computer.org/comp/proceedings/icpr/2004/2128/02/212820732abs.htm Kernel-based method for tracking objects with rotation and translation Haihong Zhang, Zhiyong Huang, Weimin Huang, Liyuan Li, Pattern Recognition, 2004. ICPR 2004. Proceedings of the 17th International Conference on, Volume: 2, Aug. 23-26, 2004, Pages: 728 – 731. Abstract: This paper addresses the issue of tracking translation and rotation simultaneously. Starting with a kernel-based spatial-spectral model for object representation, we define an l2-norm similarity measure between the target object and the observation, and derive a new formulation for the tracking of translational and rotational objects. Based on the tracking formulation, an iterative procedure is proposed. We also develop an adaptive kernel model to cope with varying appearance. Experimental results are presented for both synthetic data and real-world traffic video. Link: http://csdl.computer.org/comp/proceedings/icpr/2004/2128/02/212820728abs.htm A particle filter for tracking densely populated objects based on explicit multiview occlusion analysis Otsuka, K., Mukawa, N., Pattern Recognition, 2004. ICPR 2004. Proceedings of the 17th International Conference on, Volume: 4, Aug. 23-26, 2004, Pages: 745 – 750. 
Abstract: A novel particle filter is presented for tracking densely populated objects moving on a two-dimensional plane; it is based on a probabilistic framework of explicit multiview occlusion analysis. The spatial structure of the 2-D occlusion process between objects is modeled as a hidden process controlled by a Markov probability structure. The tracking problem is then formulated as a recursive Bayesian framework for solving the simultaneous estimation problem of two interactive processes: hypothesis generation/testing of the occlusion structure and the computation of the posterior probability distribution of object states such as position and pose. For efficient implementation of the formulated framework, we develop a novel particle filter in which each particle can support multiple posterior distributions of object states on different occlusion hypotheses. Experiments using synthetic and real data confirm the robustness of the proposed method even in the face of severe occlusion. Link: Not Available. An ubiquitous architectural framework and protocol for object tracking using RFID tags Pradip De, Basu, K., Das, S.K., Mobile and Ubiquitous Systems: Networking and Services, 2004. MOBIQUITOUS 2004. The First Annual International Conference on, Aug. 22-26, 2004, Pages: 174 – 182. Abstract: A completely visible Pervasive Transaction Environment, in which it is possible to link all related transactions of physical objects and trace their mobility through their entire life process, has been elusive. With the emergence of Radio Frequency Identification (RFID) based tags, it is now practicable to automatically collect information pertaining to any object’s place, time, transaction, etc. Based on the pervasive deployment of RFID tags, we propose in this paper a novel ubiquitous architecture followed by a protocol for tracking mobile objects in real-time. 
Our delay analysis and simulation results indicate that the delay incurred in the database update of current tag location is very low, on the order of seconds, thus providing a very fine granularity in time for consistent location update requests.
Link: http://csdl.computer.org/comp/proceedings/mobiquitous/2004/2208/00/22080174abs.htm Efficient tracking of moving objects with precision guarantees Civilis, A., Jensen, C.S., Nenortaite, J., Pakalnis, S., Mobile and Ubiquitous Systems: Networking and Services, 2004. MOBIQUITOUS 2004. The First Annual International Conference on, Aug. 22-26, 2004, Pages: 164 – 173. Abstract: Sustained advances in wireless communications, geo-positioning, and consumer electronics pave the way to a kind of location-based service that relies on the tracking of the continuously changing positions of an entire population of service users. This type of service is characterized by large volumes of updates, giving prominence to techniques for location representation and update. This paper presents several representations, along with associated update techniques, that predict the present and future positions of moving objects. An update occurs when the deviation between the predicted and the actual position of an object exceeds a given threshold. For the case where the road network, in which an object is moving, is known, we propose a so-called segment-based policy that predicts an object’s movement according to the road’s shape. Map matching is used for determining the road on which an object is moving. Empirical performance studies based on a real road network and GPS logs from cars are reported. Link: http://csdl.computer.org/comp/proceedings/mobiquitous/2004/2208/00/22080164abs.htm Dual prediction-based reporting for object tracking sensor networks Yingqi Xu, Winter, J., Wang-Chien Lee, Mobile and Ubiquitous Systems: Networking and Services, 2004. MOBIQUITOUS 2004. The First Annual International Conference on, Aug. 22-26, 2004, Pages: 154 – 163. Abstract: As one of the wireless sensor network killer applications, object tracking sensor networks (OTSNs) disclose many opportunities for energy-aware system design and implementations. 
In this paper, we investigate prediction-based approaches for performing energy efficient reporting in OTSNs. We propose a dual prediction-based reporting mechanism (called DPR), in which both sensor nodes and the base station predict the future movements of the mobile objects. Transmissions of sensor readings are avoided as long as the predictions are consistent with the real object movements. DPR achieves energy efficiency by intelligently trading off multi-hop/long-range transmissions of sensor readings between sensor nodes and the base station with one-hop/short-range communications of object movement history among neighbor sensor nodes. We explore the impact of several system parameters and moving behavior of tracked objects on DPR performance, and also study two major components of DPR: prediction models and location models through simulations. Our experimental results show that DPR is able to achieve considerable energy savings under various conditions and outperforms existing reporting mechanisms. Link: http://csdl.computer.org/comp/proceedings/mobiquitous/2004/2208/00/22080154abs.htm Multiview occlusion analysis for tracking densely populated objects based on 2-D visual angles Otsuka, K., Mukawa, N. Computer Vision and Pattern Recognition, 2004. CVPR 2004. Proceedings of the 2004 IEEE Computer Society Conference on, Volume: 1, 27 June – 2 July 2004, Pages: 90 – 97. Abstract: A novel framework of multiview occlusion analysis is presented for tracking densely populated objects moving on a two-dimensional plane. This paper explicitly models the spatial structure of the occlusion process between objects and its uncertainty, based on 2-D silhouette-based visual angles from fixed viewpoints. The occlusion structure is defined as tangency combination between the objects and the edges of the visual angles, based on geometric constraints inherent in the visual angles. 
The problem is then formulated as recursive Bayesian estimation consisting of hypothesis generation/testing of the occlusion structure and the estimation of posterior probability distribution for the object states including position and posture, on each hypothesis of the occlusion structure. For implementing the proposed framework, we develop a novel type of particle filter that supports multiple
state distributions. Experiments using synthetic and real data show the robustness of the framework even in the face of severe occlusions. Link: http://csdl.computer.org/comp/proceedings/cvpr/2004/2158/01/215810090abs.htm An algorithm for multiple object trajectory tracking Mei Han, Wei Xu, Hai Tao, Yihong Gong, Computer Vision and Pattern Recognition, 2004. CVPR 2004. Proceedings of the 2004 IEEE Computer Society Conference on, Volume: 1, 27 June - 2 July 2004, Pages: 864 – 871. Abstract: Most tracking algorithms are based on the maximum a posteriori (MAP) solution of a probabilistic framework called Hidden Markov Model, where the distribution of the object state at current time instance is estimated based on current and previous observations. However, this approach is prone to errors caused by temporal distractions such as occlusion, background clutter and multi-object confusion. In this paper we propose a multiple object tracking algorithm that seeks the optimal state sequence which maximizes the joint state-observation probability. We name this algorithm trajectory tracking since it estimates the state sequence or "trajectory" instead of the current state. The algorithm is capable of tracking multiple objects whose number is unknown and varies during tracking. We introduce an observation model which is composed of the original image, the foreground mask given by background subtraction and the object detection map generated by an object detector. The image provides the object appearance information. The foreground mask enables the likelihood computation to consider the multi-object configuration in its entirety. The detection map consists of pixel-wise object detection scores, which drives the tracking algorithm to perform joint inference on both the number of objects and their configurations efficiently. 
Link: http://csdl.computer.org/comp/proceedings/cvpr/2004/2158/01/215810864abs.htm An EM-like algorithm for color-histogram-based object tracking Zivkovic, Z., Krose, B., Computer Vision and Pattern Recognition, 2004. CVPR 2004. Proceedings of the 2004 IEEE Computer Society Conference on, Volume: 1, 27 June - 2 July 2004, Pages: 798 – 803. Abstract: The iterative procedure called ’mean-shift’ is a simple robust method for finding the position of a local mode (local maximum) of a kernel-based estimate of a density function. A new robust algorithm is given here that presents a natural extension of the ’mean-shift’ procedure. The new algorithm simultaneously estimates the position of the local mode and the covariance matrix that describes the approximate shape of the local mode. We apply the new method to develop a new 5-degrees of freedom (DOF) color histogram based non-rigid object tracking algorithm. Link: http://carol.science.uva.nl/~zivkovic/Publications/zivkovic2004CVPR.pdf Incremental density approximation and kernel-based Bayesian filtering for object tracking Bohyung Han, Comaniciu, D., Ying Zhu, Davis,L., Computer Vision and Pattern Recognition, 2004. CVPR 2004. Proceedings of the 2004 IEEE Computer Society Conference on, Volume: 1, 27 June - 2 July 2004, Pages: 638 – 644. Abstract: Statistical density estimation techniques are used in many computer vision applications such as object tracking, background subtraction, motion estimation and segmentation. The particle filter (Condensation) algorithm provides a general framework for estimating the probability density functions (pdf) of general non-linear and non-Gaussian systems. However, since this algorithm is based on a Monte Carlo approach, where the density is represented by a set of random samples, the number of samples Link: http://www.cs.umd.edu/~bhhan/papers/cvpr2004_final.pdf Parallel tracking of all soccer players by integrating detected positions in multiple view images
S. Iwase, H. Saito, 17th International Conference on Pattern Recognition, 2004, Volume: 4, Pages: 751-754. Abstract: Soccer, one of the popular sports around the world, is often broadcast on TV, and much research has been done on soccer scene images, such as strategy analysis, scene recovery, automatic indexing of soccer scenes, and automatic intelligent sportscasting. As robust player tracking is fundamental to this research, there is a demand for an automatic player tracking system using soccer imaging data. In this paper, we propose a method of tracking soccer players using multiple views. Tracking is done by integrating the tracking data from all cameras, using the geometric relationship between cameras called a homography. Integrating information from all cameras enables stable tracking of the scene, where tracking by a single camera often fails in the case of occlusion. Link: Not Available. A color-based tracking by Kalman particle filter Satoh, Y., Okatani, T., Deguchi, K., 17th International Conference on Pattern Recognition, Volume: 3, Pages: 502-505. Abstract: In this paper, a method for real-time tracking of moving objects is proposed. We applied a Kalman particle filter (KPF) to color-based tracking. The KPF is a particle filter that incorporates the principle of the Kalman filter and was originally applied to object contour tracking. We modified this KPF for color-based tracking. The modified KPF can properly approximate the probability density of the position of the tracked object and needs fewer particles for tracking than conventional particle filters. We conducted experiments to confirm the effectiveness of this method. Link: Not Available. Multiple target tracking by appearance-based condensation tracker using structure information Satake, J., Shakunaga, T., 17th International Conference on Pattern Recognition, Volume: 3, Pages: 294-297. Abstract: Multiple target tracking is a challenging problem, especially when targets are frequently crossing each other. 
It becomes very difficult and confusing when some targets are often occluded by other targets. This paper proposes a novel tracking method for the problem using an appearance-based condensation tracker. In order to overcome difficulties in the occlusion problem, a target object is regarded as a set of parts that constrain each other in the target structure. While each part is tracked basically by the condensation method, all the parts cooperate in the drift step of the condensation. Experimental results show the effectiveness of the proposed method for multiple person tracking and human face tracking. Link: Not Available. Human tracking using floor sensors based on the Markov chain Monte Carlo method Murakita, T., Ikeda, T., Ishiguro, H., 17th International Conference on Pattern Recognition, Volume: 4, Pages: 917-920. Abstract: The aim of this paper is to develop a human tracking system that is resistant to environmental changes and covers a wide area. Simply structured floor sensors are low-cost and can track people in a wide area. However, the sensor readings are discrete and may be missing; therefore, footsteps do not represent the precise location of a person. A Markov Chain Monte Carlo method (MCMC) is a promising tracking algorithm for these kinds of signals. We applied two prediction models to the MCMC: a linear Gaussian model and a highly nonlinear bipedal model. The Gaussian model was efficient in terms of computational cost, while the bipedal model discriminated people more accurately than the Gaussian model. The Gaussian model can be used to track a number of people, and the bipedal model can be used in situations where more accurate tracking is required. Link: Not Available.
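Several of the entries above (the appearance-based condensation tracker, the color-based Kalman particle filter) build on the same predict-weight-resample cycle of the condensation/particle filter. The sketch below illustrates that generic cycle only; it is not any cited author's implementation, and the Gaussian motion model, the `observe` callable, and all function names are assumptions made for the example.

```python
import numpy as np

def condensation_step(particles, weights, observe, motion_std=2.0, rng=None):
    """One predict-weight-resample cycle of a condensation-style tracker.

    particles : (N, 2) array of hypothesised (x, y) object positions
    weights   : (N,) normalised particle weights from the previous step
    observe   : callable mapping an (N, 2) array to per-particle likelihoods
    """
    rng = np.random.default_rng() if rng is None else rng
    n = len(particles)
    # Select: resample hypotheses in proportion to their previous weights.
    idx = rng.choice(n, size=n, p=weights)
    particles = particles[idx]
    # Predict: diffuse each hypothesis with Gaussian process noise.
    particles = particles + rng.normal(0.0, motion_std, particles.shape)
    # Measure: re-weight by the observation likelihood and normalise.
    weights = observe(particles)
    weights = weights / weights.sum()
    return particles, weights

def estimate(particles, weights):
    """Weighted posterior mean, taken as the tracked position."""
    return (particles * weights[:, None]).sum(axis=0)
```

In a color-based tracker, `observe` would score each hypothesised window by comparing its color histogram against a reference model; here it is left as a plain callable so the skeleton stays generic.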
Object boundary edge selection using normal direction derivatives of a contour in a complex scene Kim, T.Y.[Tae-Yong], Park, J.[Jihun], Lee, S.W.[Seong-Whan], 17th International Conference on Pattern Recognition, Volume: 4, Pages: 755-758. Abstract: Recently, Nguyen proposed a method [1] for tracking a nonparameterized object (subject) contour in a single video stream. Nguyen’s approach combined the outputs of two steps: creating a predicted contour and removing background edges. In this paper, we propose a method to increase object tracking accuracy by improving the background edge removal process. Nguyen’s background edge removal method leaves many irrelevant edges, which leads to inaccurate contour tracking. Our accurate tracking is based on reducing the effects of irrelevant edges by selecting only the boundary edges. We select high-valued edge pixels of average image intensity gradients in the contour normal direction. Our experimental results show that our tracking approach is robust enough to handle a complex-textured scene. Link: Not Available. Evaluation of tracking reliability metrics based on information theory and normalized correlation Loutas, E., Nikolaidis, N., Pitas, I., 17th International Conference on Pattern Recognition, Volume: 4, Pages: 653-656. Abstract: The efficiency of three tracking reliability metrics based on information theory and normalized correlation is examined in this paper. The two information theory tools used to construct the metrics are mutual information and the Kullback-Leibler distance. The metrics are applicable to any feature-based tracking scheme. In the context of this work they are applied for comparison purposes on an object tracking scheme using multiple feature point correspondences. Experimental results have shown that the information theory based metrics perform better than the normalized correlation metric. Link: Not Available. 
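The two kinds of reliability measures named in the Loutas et al. abstract, the Kullback-Leibler distance and normalized correlation, can be sketched as simple comparisons between feature histograms of the tracked patch and a reference. This is a generic illustration rather than the paper's exact formulation; the histogram inputs and the `eps` smoothing term are assumptions added to keep the example self-contained and avoid log(0).

```python
import numpy as np

def kl_distance(p, q, eps=1e-10):
    """Kullback-Leibler distance between two histograms (normalised inside).
    Values near zero suggest the tracked patch still matches the reference."""
    p = p / p.sum()
    q = q / q.sum()
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

def normalized_correlation(p, q):
    """Zero-mean normalised correlation in [-1, 1]; 1 means identical shape."""
    p = p - p.mean()
    q = q - q.mean()
    return float((p * q).sum() / (np.linalg.norm(p) * np.linalg.norm(q)))
```

A tracker could evaluate either measure every frame and flag a drop in similarity (rising KL distance, falling correlation) as a likely tracking failure.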
A probabilistic framework for joint head tracking and pose estimation Ba, S.O., Odobez, J.M., 17th International Conference on Pattern Recognition, Volume: 4, Pages: 264-267. Abstract: Head tracking and pose estimation are usually considered as two sequential and separate problems: pose is estimated on the head patch provided by a tracking module. However, precision in head pose estimation is dependent on tracking accuracy, which itself could benefit from knowledge of the head orientation. Therefore, this work considers head tracking and pose estimation as two coupled problems in a probabilistic setting. Head pose models are learned and incorporated into a mixed-state particle filter framework for joint head tracking and pose estimation. Experimental results on real sequences show the effectiveness of the method in estimating more stable and accurate pose values. Link: Not Available. Implementation of a modular real-time feature-based architecture applied to visual face tracking Castaneda, B., Luzanov, Y., Cockburn, J.C., 17th International Conference on Pattern Recognition, Volume: 4, Pages: 167-170. Abstract: This paper presents a modular real-time feature-based visual tracking architecture where each feature of an object is tracked by one module. A data fusion stage collects the information from various modules, exploiting the relationship among features to achieve robust detection and visual tracking. This architecture takes advantage of the temporal and spatial information available in a video
stream. Its effectiveness is demonstrated in a face tracking system that uses eyes and lips as features. In the architecture implementation, each module has a pre-processing stage that reduces the number of image regions that are candidates for eyes and lips. Support Vector Machines are then used in the classification process, whereas a combination of Kalman filters and template matching is used for tracking. The geometric relation between features is used in the data fusion stage to combine the information from different modules to improve tracking. Link: Not Available. Combined face-body tracking in indoor environment Song, X.[Xuefeng], Nevatia, R., 17th International Conference on Pattern Recognition, Volume: 4, Pages: 159-162. Abstract: Background subtraction is commonly used for tracking objects in outdoor environments. However, it does not work as well indoors, because of problems caused by illumination change, shadows, occlusion and targets' changing appearances. In contrast, color tracking is relatively resistant to these problems, but suffers from the need for initialization. To complete the specific human tracking task in an indoor environment, this paper utilizes the specific human face-body structure, and tracks face and body simultaneously and cooperatively. The advantage of this approach is that it can keep tracking in difficult situations when one of the parts is missing, which makes it more robust than single-part tracking. Experimental tracking results on meeting room video data are given. Link: Not Available. Eye tracking using Markov models Bagci, A.M., Ansari, R., Khokhar, A., Cetin, E., 17th International Conference on Pattern Recognition, Volume: 3, Pages: 818-821. Abstract: We propose an eye detection and tracking method based on color and geometrical features of the human face using a monocular camera. 
In this method, a decision is made on whether the eyes are closed and, using a Markov chain framework to model temporal evolution, the subject’s gaze is determined. The method can successfully track facial features even while the head assumes various poses, so long as the nostrils are visible to the camera. We compare our method with recently proposed techniques and the results show that it provides more accurate tracking and robustness to variations in the view of the face. A procedure for detecting tracking errors is employed to recover lost feature points in the case of occlusion or very fast head movement. The method may be used in monitoring a driver’s alertness and detecting drowsiness, and also in applications requiring non-contact human computer interaction. Link: Not Available. Probabilistic tracking with adaptive feature selection Chen, H.T.[Hwann-Tzong], Fuh, C.S.[Chiou-Shann], 17th International Conference on Pattern Recognition, Volume: 2, Pages: 736-739. Abstract: We propose a color-based tracking framework that alternately infers an object’s configuration and good color features via particle filtering. The tracker adaptively selects discriminative color features that distinguish foregrounds from backgrounds well. The effectiveness of a feature is weighted by the Kullback-Leibler observation model, which measures dissimilarities between the color histograms of foregrounds and backgrounds. Experimental results show that the probabilistic tracker with adaptive feature selection is resilient to lighting changes and background distractions. Link: Not Available.
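The adaptive feature selection idea in the last entry, scoring color features by how strongly they separate foreground from background histograms, can be illustrated with a per-channel divergence score. The sketch below uses a symmetric Kullback-Leibler score over 16-bin channel histograms; those details, and all function names, are assumptions for the example, not the authors' exact observation model.

```python
import numpy as np

def channel_histogram(pixels, bins=16):
    """Normalised histogram of one colour channel (values in [0, 255])."""
    h, _ = np.histogram(pixels, bins=bins, range=(0, 256))
    return h / max(h.sum(), 1)

def feature_discriminability(fg_pixels, bg_pixels, bins=16, eps=1e-10):
    """Symmetric KL divergence between foreground and background histograms;
    larger values mean the channel separates object from background better."""
    p = channel_histogram(fg_pixels, bins) + eps
    q = channel_histogram(bg_pixels, bins) + eps
    return float(np.sum(p * np.log(p / q)) + np.sum(q * np.log(q / p)))

def select_feature(fg, bg):
    """Pick the colour channel (column index, e.g. R/G/B) with the highest
    discriminability score; returns the index and all per-channel scores."""
    scores = [feature_discriminability(fg[:, c], bg[:, c])
              for c in range(fg.shape[1])]
    return int(np.argmax(scores)), scores
```

Re-running the selection as the particle filter tracks lets the feature set adapt when lighting or background changes erode a channel's discriminative power.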
An effective and fast soccer ball detection and tracking method Tong, X.F.[Xiao-Feng], Lu, H.Q.[Han-Qing], Liu, Q.S.[Qing-Shan], 17th International Conference on Pattern Recognition, Volume: 4, Pages: 795-798. Abstract: A ball detection and tracking approach for real soccer games is proposed in this paper. Because direct detection is difficult, an indirect strategy based on non-ball elimination is applied, distinguishing the ball through a coarse-to-fine process. The game field is first extracted and subsequent operations are restricted to it. At the coarse step, some distinctly non-ball regions are removed by evaluating color and shape. At the fine step, the remaining regions are examined further and the optimal one is selected as the ball. Afterwards, the CONDENSATION algorithm is used to track the ball. Region optimization is appended to adapt to changes in the ball's size and color/texture as it moves along sequential frames, by maximizing the normalized sum of the intensity gradient around its perimeter. Moreover, a confidence measure representing the reliability of the ball region is presented to guide possible re-detection for continuous tracking. Experiments demonstrate that the method is effective and fast on real soccer sequences. Link: Not Available.

Multiple pedestrian detection and tracking based on weighted temporal texture features Yang, H.D.[Hee-Deok], Lee, S.W.[Seong-Whan], 17th International Conference on Pattern Recognition, Volume: 4, Pages: 248-251. Abstract: This paper presents a novel method for detecting and tracking pedestrians in video images taken by a fixed camera. A pedestrian may be totally or partially occluded in a scene for some period of time. The proposed approach uses an appearance model for the identification of pedestrians, together with weighted temporal texture features. We compared the proposed method with other related methods using color and shape features, and analyzed the stability of the features. Experimental results on various real video data show that real-time pedestrian detection and tracking is possible, with stability increased by 5–15% even under occasional occlusions in video surveillance applications. Link: Not Available.

Robust real-time detection, tracking, and pose estimation of faces in video streams Huang, K.S., Trivedi, M.M., 17th International Conference on Pattern Recognition, Volume: 3, Pages: 965-968. Abstract: Robust human face analysis has been recognized as a crucial part of intelligent systems. In this paper we present the development of a computational framework for robust detection, tracking, and pose estimation of faces captured by video arrays. We discuss the development of a multi-primitive skin-tone and edge-based detection module embedded in a tracking module for efficient and robust face detection and tracking. A continuous-density HMM-based pose estimator is developed for an accurate estimate of the face orientation motions. Experimental evaluations of these algorithms suggest the validity of the proposed framework and its computational modules. Link: http://cvrr.ucsd.edu/publications/2004/17th International Conference on Pattern Recognition _Huang_Title.pdf

A dynamic bayesian network approach to multi-cue based visual tracking Wang, T.[Tao], Diao, Q.[Qian], Zhang, Y.[Yimin], Song, G.[Gang], Lai, C.[Chunrong], Bradski, G., 17th International Conference on Pattern Recognition, Volume: 2, Pages: 167-170.
Abstract: Visual tracking has been an active research field of computer vision. However, robust tracking remains far from satisfactory under real-world conditions of background clutter, pose variation, and occlusion. To increase reliability, this paper presents a novel Dynamic Bayesian Network (DBN) approach to multi-cue based visual tracking. The method first extracts multi-cue observations such as skin color, ellipse shape, and face detection, and then integrates them with hidden motion states in a compact DBN model. By using particle-based inference with multiple cues, our method works well even in background clutter, without resorting to simplifying linear and Gaussian assumptions. The experimental results are compared against the widely used CONDENSATION and Kalman filter approaches. Our better tracking results, along with the ease of fusing new cues in the DBN framework, suggest that this technique is a fruitful basis for building top-performing visual tracking systems. Link: http://csdl.computer.org/comp/proceedings/icpr/2004/2128/02/212820167abs.htm

Lie algebra template tracking Bayro-Corrochano, E., Ortegon-Aguilar, J., 17th International Conference on Pattern Recognition, Volume: 2, Pages: 56-59. Abstract: Visual cues are often very difficult to track. We use an effective least-squares estimation of the Lie algebra parameters to find the affine transformation involved in visual region tracking. These parameters represent the geodesics of the optimal transformation orbit. Our experiments validate the effectiveness of the method. Link: http://csdl.computer.org/comp/proceedings/icpr/2004/2128/02/212820056abs.htm

Augmented reality through real-time tracking of video sequences using a panoramic view Dehais, C., Douze, M., Morin, G., Charvillat, V., 17th International Conference on Pattern Recognition, Volume: 4, Pages: 995-998. Abstract: We propose a 2D approach for Augmented Reality (AR) applications where the real scene is modelled as a static panorama. We adapt a sparse tracking method based on homographies to track the orientation and zooming parameters of the camera during a video sequence. AR scenarios (synthetic object insertion, real object or character extraction) can be performed in arbitrary static environments, from wide outdoor scenes to virtually augmented desktops or conference rooms. Link: Not Available.

Real time tracking with occlusion and illumination variations Chateau, T., Lapreste, J.T., 17th International Conference on Pattern Recognition, Volume: 4, Pages: 763-766. Abstract: The authors present robust real-time object tracking by image processing. A planar model defined by a center and a few interest points of the object is used. This choice of model provides robustness to light variations and partial occlusions, and a particle-filter-based tracker allows recovery from total occlusion of the object lasting a few seconds. The algorithm runs at __Hz on a personal computer under Linux and is robust to geometric changes such as planar rotation and scale modification. Real experiments demonstrate the validity of the method for outdoor applications such as vehicle tracking for adaptive cruise control. Link: Not Available.

A multi-object tracking system for surveillance video analysis Xie, D., Weiming Hu, Tieniu Tan, Junyi Peng, Proceedings of the 17th International Conference on Pattern Recognition, Volume: 4, Aug. 23-26, 2004, Pages: 767 – 770.
Link: Not Available.

Hierarchical probabilistic models for video object segmentation and tracking Thirde, D., Jones, G., Proceedings of the 17th International Conference on Pattern Recognition, Volume: 1, Aug. 23-26, 2004, Pages: 636 – 639. Abstract: When tracking and segmenting semantic video objects, different forms of representational model can be used to find the object region on a per-frame basis. We propose a novel hierarchical technique that uses parametric models to describe the appearance and location of an object, and then uses non-parametric methods to model the sub-object regions for accurate pixel-wise segmentation. Our motivation is to use parametric models to locate the object, reducing the sensitivity of the non-parametric sub-object region models to background clutter. The results indicate this is a promising approach to extracting video objects. Link: http://dircweb.king.ac.uk/papers/Thirde_D.J.2004_617535/1757_Thirde_D.pdf
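Several of the trackers surveyed above (the CONDENSATION-based ball tracker of Tong et al., the particle-filter trackers of Chen and Fuh and of Chateau and Lapreste, and the particle-based inference in the DBN tracker) share the same predict-weight-resample loop at their core. A minimal one-dimensional sketch of that loop, with the state space, motion noise, and observation likelihood chosen purely for illustration:

```python
import numpy as np

def condensation_step(particles, weights, observe, rng, motion_std=1.0):
    """One predict-weight-resample iteration of a CONDENSATION-style tracker.

    particles : (N,) array of state hypotheses (here: 1-D object position)
    weights   : (N,) normalized weights from the previous frame
    observe   : function mapping a state to its observation likelihood
    """
    n = len(particles)
    # 1. Resample hypotheses in proportion to their weights (factored sampling).
    idx = rng.choice(n, size=n, p=weights)
    particles = particles[idx]
    # 2. Predict: propagate each hypothesis through a stochastic motion model.
    particles = particles + rng.normal(0.0, motion_std, size=n)
    # 3. Weight: score each hypothesis against the current frame's observation.
    weights = np.array([observe(p) for p in particles])
    weights = weights / weights.sum()
    return particles, weights
```

Running the loop with a likelihood peaked at the true target position concentrates the particle set around it; the weighted mean of the particles then serves as the state estimate.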

Video Segmentation

A. Journal Papers

Segmentation for robust tracking in the presence of severe occlusion Gentile, C., Camps, O., Sznaier, M., Image Processing, IEEE Transactions on, Volume: 13, Issue: 2, Feb. 2004, Pages: 166 – 178. Abstract: Tracking an object in a sequence of images can fail due to partial occlusion or clutter. Robustness to occlusion can be increased by tracking the object as a set of "parts" such that not all of them are occluded at the same time. However, successful implementation of this idea hinges upon finding a suitable set of parts. In this paper we propose a novel segmentation, specifically designed to improve robustness against occlusion in the context of tracking. The main result shows that tracking the parts produced by this segmentation outperforms both tracking parts obtained through traditional segmentations and tracking the entire target. Additional results include a statistical analysis of the correlation between features of a part and the tracking error, and the identification of a cost function that exhibits a high degree of correlation with the tracking error. Link: http://www.antd.nist.gov/wctg/manet/docs/Camillo_TIP04.pdf

Video object segmentation using Bayes-based temporal tracking and trajectory-based region merging Mezaris, V., Kompatsiaris, I., Strintzis, M.G., Circuits and Systems for Video Technology, IEEE Transactions on, Volume: 14, Issue: 6, June 2004, Pages: 782 – 795. Abstract: A novel unsupervised video object segmentation algorithm is presented, aiming to segment a video sequence into objects: spatiotemporal regions representing a meaningful part of the sequence. The proposed algorithm consists of three stages: initial segmentation of the first frame using color, motion, and position information, based on a variant of the K-means-with-connectivity-constraint algorithm; a temporal tracking algorithm, using a Bayes classifier and rule-based processing to reassign changed pixels to existing regions and to efficiently handle the introduction of new regions; and a trajectory-based region merging procedure that employs the long-term trajectory of regions, rather than the motion at the frame level, to group them into objects with different motion. As shown by experimental evaluation, this scheme can efficiently segment video sequences with fast-moving or newly appearing objects. A comparison with other methods shows segmentation results corresponding more accurately to the real objects appearing in the image sequence.
Link: http://www.iti.gr/files/mezaris_csvt04_06.pdf

Real-time compressed-domain spatiotemporal segmentation and ontologies for video indexing and retrieval Mezaris, V., Kompatsiaris, I., Boulgouris, N.V., Strintzis, M.G., Circuits and Systems for Video Technology, IEEE Transactions on, Volume: 14, Issue: 5, May 2004, Pages: 606 – 621. Abstract: In this paper, a novel algorithm is presented for the real-time, compressed-domain, unsupervised segmentation of image sequences and is applied to video indexing and retrieval. The segmentation algorithm uses motion and color information directly extracted from the MPEG-2 compressed stream. An iterative rejection scheme based on the bilinear motion model is used to effect foreground/background segmentation. Following that, meaningful foreground spatiotemporal objects are formed by first examining the temporal consistency of the output of iterative rejection, then clustering the resulting foreground macroblocks into connected regions, and finally performing region tracking. Background segmentation into spatiotemporal objects is additionally performed. MPEG-7 compliant low-level descriptors describing the color, shape, position, and motion of the resulting spatiotemporal objects are extracted and automatically mapped to appropriate intermediate-level descriptors, forming a simple vocabulary termed the object ontology. This, combined with a relevance feedback mechanism, allows the qualitative definition of the high-level concepts the user queries for (semantic objects, each represented by a keyword) and the retrieval of relevant video segments. Desired spatial and temporal relationships between the objects in multiple-keyword queries can also be expressed, using the shot ontology. Experimental results of applying the segmentation algorithm to known sequences demonstrate the efficiency of the proposed approach. Sample queries reveal the potential of employing this segmentation algorithm as part of an object-based video indexing and retrieval scheme. Link: http://www.iti.gr/files/mezaris_csvt04_05.pdf

Skin color-based video segmentation under time-varying illumination Sigal, L., Sclaroff, S., Athitsos, V., Pattern Analysis and Machine Intelligence, IEEE Transactions on, Volume: 26, Issue: 7, July 2004, Pages: 862 – 877. Abstract: A novel approach for real-time skin segmentation in video sequences is described. The approach enables reliable skin segmentation despite wide variation in illumination during tracking. An explicit second-order Markov model is used to predict the evolution of the skin-color (HSV) histogram over time. Histograms are dynamically updated based on feedback from the current segmentation and the predictions of the Markov model. The evolution of the skin-color distribution at each frame is parameterized by translation, scaling, and rotation in color space. Consequent changes in the geometric parameterization of the distribution are propagated by warping and resampling the histogram. The parameters of the discrete-time dynamic Markov model are estimated using maximum likelihood estimation and also evolve over time. The accuracy of the new dynamic skin-color segmentation algorithm is compared to that obtained with a static color model. Segmentation accuracy is evaluated using labeled ground-truth video sequences taken from staged experiments and popular movies. An overall increase in segmentation accuracy of up to 24 percent is observed in 17 out of 21 test sequences. In all but one case, the skin-color classification rates for our system were higher, with background classification rates comparable to those of the static segmentation. Link: http://www.cs.bu.edu/techreports/pdf/2003-006.SkinColor.pdf

Reconstruction of segmentally articulated structure in freeform movement with low density feature points Baihua Li, Qinggang Meng and Horst Holstein, Image and Vision Computing, Volume 22, Issue 10, (1 September 2004), Pages 749-759. Abstract: Though a large body of research has focused on tracking and identifying objects in the domain of colour or grey-scale images, there is a relative dearth in the literature on complex articulated/non-rigid motion reconstruction from a collection of low-density feature points. In this paper, we propose a
segment-based articulated matching algorithm to establish a crucial self-initialising identification in model-based point-feature tracking of articulated motion with near-rigid segments. We avoid common assumptions such as pose similarity or small motion with respect to the model, and assume no prior knowledge of a specific movement from which to restrict pose identification. Experimental results based on synthetic poses and real-world human motion capture data demonstrate the ability of the algorithm to perform the identification task. Link: Not Available.

Dynamic learning from multiple examples for semantic object segmentation and search Yaowu Xu, Eli Saber and A. Murat Tekalp, Computer Vision and Image Understanding, Volume 95, Issue 3, (September 2004), Pages 334-353. Abstract: We present a novel "dynamic learning" approach for an intelligent image database system to automatically improve object segmentation and labeling without user intervention, as new examples become available, for object-based indexing. The proposed approach is an extension of our earlier work on "learning by example," which addressed labeling of similar objects in a set of database images based on a single example. The proposed dynamic learning procedure utilizes multiple example object templates to improve the accuracy of existing object segmentations and labels. Multiple example templates may be images of the same object from different viewing angles, or images of related objects. This paper also introduces a new shape similarity metric called the normalized area of symmetric differences (NASD), which has desirable properties for use in the proposed "dynamic learning" scheme and is more robust against boundary noise resulting from automatic image segmentation. The performance of the dynamic learning procedures has been demonstrated by experimental results. Link: http://www.ece.rochester.edu/users/tekalp/papers/cviu_yaxu_dynamic.pdf

B. Conference Papers

Semantic object segmentation by a spatio-temporal MRF model Zeng, W.[Wei], Gao, W.[Wen], Proceedings of the 17th International Conference on Pattern Recognition, Volume: 4, Aug. 23-26, 2004, Pages: 775-778. Abstract: In this paper, a region-based spatio-temporal Markov random field (STMRF) model is proposed to segment moving objects semantically. The STMRF model combines the segmentation results of four successive frames and integrates temporal continuity into a uniform energy function. The segmentation procedure is composed of two stages: short-term classification and temporal integration. At the first stage, moving objects are extracted by a region-based MRF model between two frames in a group of four successive frames. At the second stage, the final semantic object is labeled by minimizing the energy function of the STMRF model. This phased segmentation process corresponds to a multi-level simulated annealing strategy. Experimental results show that the proposed algorithm can efficiently capture the semantic motion of objects and accurately extract moving objects. Link: http://www.jdl.ac.cn/doc/2004/ICPR2004-WeiZeng-MotionSegment.pdf

Background removal system for object movies Tsai, Y.P.[Yu-Pao], Hung, Y.P.[Yi-Ping], Shih, Z.C.[Zen-Chung], Su, J.J.[Jin-Jen], Tsai, S.R.[Shang-Ru], Proceedings of the 17th International Conference on Pattern Recognition, Volume: 1, Aug. 23-26, 2004, Pages: 608-611. Abstract: In this paper, we present an interactive system for removing the backgrounds from object movies. Our system automatically extracts initial segmentation results based on observed characteristics of object movies. These characteristics are: (1) the distribution of background color is Gaussian, (2) the color difference between foreground and background is distinct, and (3) the background of the images
set with the same tilt angle is static. The user can modify misclassified pixels in only a few frames. The corrected result is propagated to all frames through spatial and temporal coherence. After user manipulation, the alpha estimation process is performed to obtain the alpha values for pixels that are composed of both background and foreground. Our automatic process for obtaining initial segmentation results extracts most foreground and background pixels, and thus more accurate results are obtained with little user intervention. Link: http://csdl.computer.org/comp/proceedings/icpr/2004/2128/01/212810608abs.htm
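Characteristic (1) above, that the background color distribution is Gaussian, underlies many automatic initial-segmentation steps of this kind: fit a Gaussian to pixels known to be background and label as foreground any pixel whose Mahalanobis distance from that model is large. A minimal sketch of this idea (a generic illustration under that assumption, not Tsai et al.'s actual procedure; the 3-sigma threshold is likewise an illustrative choice):

```python
import numpy as np

def fit_background_model(bg_samples):
    """Fit a single Gaussian to (N, 3) RGB samples known to be background."""
    mean = bg_samples.mean(axis=0)
    # Small diagonal term keeps the covariance invertible.
    cov = np.cov(bg_samples, rowvar=False) + 1e-6 * np.eye(3)
    return mean, np.linalg.inv(cov)

def foreground_mask(image, mean, inv_cov, threshold=9.0):
    """Label a pixel foreground when its squared Mahalanobis distance from
    the background Gaussian exceeds the threshold (9.0 ~ 3 sigma)."""
    diff = image.reshape(-1, 3) - mean
    # Per-pixel quadratic form diff . inv_cov . diff.
    d2 = np.einsum('ij,jk,ik->i', diff, inv_cov, diff)
    return (d2 > threshold).reshape(image.shape[:2])
```

An interactive system would then let the user correct the few misclassified pixels in this initial mask before refining alpha values along the boundary.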