[ieee 2013 sixth international conference on contemporary computing (ic3) - noida, india...

Classification and Indexing in Highway Traffic Videos

Shruti V Kamath Dept. of Computer Science RV College of Engineering

Bangalore, India

Mayank Darbari Dept. of Computer Science RV College of Engineering


Dr. Rajashree Shettar Dept. of Computer Science RV College of Engineering


Abstract - This paper addresses automatic object detection, object tracking, object classification and indexing in vehicle surveillance videos. Objects are detected using the optical flow method and tracked using the Kalman filtering method. The extracted objects are classified using ten features, including shape-based features such as area, height, width, compactness factor, elongation factor, skewness, perimeter, orientation, aspect ratio and extent. A comparative analysis is presented for the classification of objects (car, truck, auto, human, motorcycle, none) using Multi-class SVM (Support Vector Machine, one vs. all), Back-propagation, and Adaptive Hierarchical Multi-class SVM. The accuracies obtained are 92 percent for Multi-class SVM (one vs. all), 87.8 percent for Adaptive Hierarchical Multi-class SVM, and 82 percent for Back-propagation. Using the trained Multi-class SVM (one vs. all) classifier, objects are classified and counted in real time. In addition, objects are automatically indexed by type, size, color and the portion of the video in which they appear.

Keywords - Computer Vision; Object Classification; Object Detection; Object Indexing; Object Tracking; Feature Extraction

I. INTRODUCTION

Detecting and recognizing moving vehicles in traffic scenes for traffic surveillance, traffic control, and road traffic information systems is an emerging research area for Intelligent Transportation Systems. A common requirement is to find a vehicle involved in a violation of traffic rules. Without an effective content-based method for indexing and retrieval, a user would have to rewind the video to look for an event that happened at an earlier time. This is a labor-intensive and tedious process, and events could be overlooked due to human error. The proposed traffic monitoring system greatly reduces this human effort.

Visual vehicle surveillance is one of the fastest growing segments of the security industry. Some of the prominent commercial vehicle surveillance systems include the IBM S3 [1] and the Hitachi Data Systems Solutions for Video Surveillance [2]. These systems provide the capability to automatically monitor a scene, manage surveillance data and perform event-based retrieval. In India, Aftek, Logica and Traffline [3] are some of the widely used traffic systems. The most recent research work in visual vehicle surveillance includes real-time vehicle detection by parts [4], integrated lane and vehicle detection and tracking [5] and occluded vehicle recognition and tracking [6].

The proposed system is a smart surveillance system which works for both real-time and prerecorded traffic videos. The combination of methods used is unique, with a suitable approach at each step. Thus, a complete system for content-based indexing and retrieval for highway traffic videos is provided. Moving objects are detected and tracked in the given input video. Detected objects are classified by type using a robust selection of features. For large prerecorded videos, to save time and resources, shots are first detected using the colour histogram difference method, followed by key frame extraction as described in [21]. For detection of objects, the Optical Flow Model [23] is used. The objects detected in the video are tracked by the Kalman Filtering method [22].

Detected objects are tracked and a number of low-level features, including shape-based features, are extracted. A set of 10 features is used, in contrast to the common practice of using only a subset of low-level features. The objects extracted from the video are manually labelled for training. Multi-class Support Vector Machine (one vs. all), Adaptive Hierarchical Multi-class SVM and the Back-propagation algorithm are trained on these samples and later tested on another set of samples. The trained classifiers are used to predict the class each object belongs to, and the classification results of these algorithms are compared. The trained classifier of the algorithm with the highest accuracy is used for classifying objects in real time. The classified objects are also automatically indexed and stored for any number of videos.

The remainder of this paper is organized as follows. Section II is a Literature Survey, discussing the related work in the field of accident traffic analysis. Section III gives a brief system overview. Section IV describes the Optical Flow Model for object detection. In Section V, the Kalman Filtering Method for object tracking is explained. The features extracted for the classification algorithms are explained in Section VI. Section VII describes the object classification process and algorithms. Indexing and Querying is explained in Section VIII. Experimental Analysis and Results are discussed in Section IX. Finally, the conclusion and future work are presented in Section X.

978-1-4799-0192-0/13/$31.00 ©2013 IEEE

II. LITERATURE SURVEY

Surveillance and monitoring systems generally require all the moving objects to be segmented in a video sequence. Background subtraction is a simple approach to detect moving objects in video sequences [7]. To deal with the difficulties arising during background subtraction, several methods have been proposed in [8]. A Gaussian Mixture Model [9] may be used to detect objects. Another set of algorithms is based on spatio-temporal segmentation of the video signal. These methods try to detect moving regions taking into account not only the temporal evolution of the pixel intensities and colour but also their spatial properties.

Various models and methods have been proposed for appearance-based object detection, in particular vehicle detection. Examples include the seminal work of Viola and Jones [10] and many extensions using different features, such as edgelets and strip features, as well as different boosting algorithms like Real Adaboost and GentleBoost. Support vector machines with histograms of oriented gradients have also been a popular choice for object detection.

A survey of the research in object tracking [11] discusses recent work on moving object tracking and, more broadly, in computer vision. Research on object tracking can be classified as point tracking, kernel tracking and contour tracking, according to the representation of the target object. In the point tracking approach, statistical filtering methods are used to estimate the state of the target object. In the kernel tracking approach, various estimation methods are used to find the region corresponding to the target object. Contour tracking is based on the fact that when an object deforms arbitrarily, each contour point can move independently. In the proposed system, Kalman Filtering is used for object tracking due to its predictive nature and its ability to handle occlusions.

Feature extraction in terms of supervised learning [12] can be described as follows: given a set of candidate features, select the subset of features most suitable for the classification algorithm to be used. Features are generally divided into those describing the shape and those describing the texture of the object. Shape features are based on the object's geometry, captured by both the boundary and the interior region. Texture features, on the other hand, depend on the grayscale values of the interior. Some of the latest feature extraction methods include the SIFT descriptor [13], the SURF descriptor [14], GLOH features [15] and HOG features [16]. The features used in this system include area, perimeter, height, width, orientation, compactness, extent, skewness, elongation and aspect ratio. These low-level features are robust when compared to [27][28], where only a subset of them is used to classify objects.

In recent years, multi-class SVM has been an active research topic, and multi-class classification based on clustering is one of the strategies. Support vector machines (SVM), based on the structural risk minimization principle, demonstrate good learning ability for decision-making. Since the standard SVM is formulated for two-class problems, it faces difficulty in solving multi-class problems such as the shift decision of an engineering vehicle. A shift decision algorithm based on SVM binary-tree multi-class classification [17] assigns a classifier to every node to construct the multi-class SVM. After analysing the advantages and disadvantages of existing multi-class support vector machines, an improved multi-class support vector machine based on a binary tree structure is presented in [18]. Linear discriminant analysis can also be introduced into the binary tree [19]: the training samples are pre-treated before clustering to find an optimal feature space in which samples of the same class are gathered together. For the features used in this paper, Adaptive Hierarchical Multi-class SVM was used since its testing phase is quicker than that of Multi-class SVM (one vs. all).

For indexing purposes [20], the vehicles are tracked over time, and each vehicle is given a unique ID. In addition, the location and the bounding box of each vehicle are output for each frame. This data format includes the ID, the first frame in which the vehicle appears, the vehicle's position and size for each frame until it disappears, and its average size and type. In the proposed approach, additional parameters such as the dimensions of vehicles are also used for indexing and retrieval, which is done automatically for a set of videos so that they can be queried later. Also, these vehicle parameters are calculated just once for the majority of vehicles.

After thorough research, the following algorithms were judged best suited for this system: the Optical Flow Model [21] for object detection and Kalman Filtering [22] for tracking objects, including in real time. A comparative analysis of the classification algorithms Multi-class SVM (one vs. all), Adaptive Hierarchical Multi-class SVM and Back-propagation is presented for categorizing detected objects. The set of features used to train and test the classification algorithms is a unique combination of shape and texture features, namely perimeter, height, width, area, extent, compactness, elongation, orientation, aspect ratio and skewness.

III. SYSTEM OVERVIEW

Fig. 1 shows the organization of the various system components. Pre-processing, as described in [21], may be required for pre-recorded video sequences: shot boundary detection and key frame extraction remove redundant frames for faster processing if there are long periods of inactivity in the video. Object detection and tracking are done using the Optical Flow Model and Kalman Filtering respectively. After this, features are extracted and used to train the classification algorithms, which classify the objects into pre-determined categories. Modules for Indexing and Querying are present as well.

Figure 1: Organisation of System Components

IV. OBJECT DETECTION USING OPTICAL FLOW MODEL

Optical flow is the distribution of apparent velocities of movement of brightness patterns in an image. Discontinuities in the optical flow can help in the segmentation of images into regions that correspond to different objects [21]. The flow field is thresholded using a velocity threshold obtained from the optical flow method, and median filtering is applied to remove speckle noise. Morphological operations are then applied to remove small objects and holes.
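The thresholding and speckle-removal steps can be illustrated with a short sketch. It is written in Python rather than the MATLAB used for the actual system, and it is not the authors' implementation. It assumes that a dense flow field is already available as two 2-D lists `u` and `v` (hypothetical names) and that the velocity threshold is supplied by the caller; the 3x3 window is also an assumption. The morphological clean-up is omitted.

```python
def motion_mask(u, v, threshold):
    """Mark pixels whose flow magnitude exceeds the velocity threshold."""
    return [[1 if (ux * ux + vy * vy) ** 0.5 > threshold else 0
             for ux, vy in zip(row_u, row_v)]
            for row_u, row_v in zip(u, v)]


def median_filter3(mask):
    """3x3 median filter on a binary mask.

    For 0/1 values the median equals a majority vote over the window.
    Border pixels use only the neighbours that exist; ties count as 0.
    """
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [mask[j][i]
                      for j in range(max(0, y - 1), min(h, y + 2))
                      for i in range(max(0, x - 1), min(w, x + 2))]
            out[y][x] = 1 if 2 * sum(window) > len(window) else 0
    return out


# Toy flow field: a 3x3 moving block plus one isolated noisy pixel.
u = [[0.0] * 6 for _ in range(6)]
v = [[0.0] * 6 for _ in range(6)]
for y in range(1, 4):
    for x in range(1, 4):
        u[y][x] = 2.0
u[5][5] = 2.0

mask = motion_mask(u, v, threshold=1.0)
cleaned = median_filter3(mask)
```

In this toy field the isolated pixel at (5, 5) passes the velocity threshold but is removed by the filter, while the interior of the moving block survives. The filter also erodes the corners of the block, which is one reason a morphological step usually follows.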

V. KALMAN FILTERING FOR OBJECT TRACKING

The Kalman filter [22] tracking algorithm is used to track multiple objects. For each track, the position estimate in the next frame is predicted. The weights are then updated once the position in the next frame is known (that is, when it becomes the present frame). Higher weights are given to those objects with a higher certainty of belonging to that track, and vice versa. The predicted tracks are assigned to the detections (obtained using the optical flow method described in Section IV) using an assignment algorithm called the Hungarian algorithm [24]. Thus the optimal tracks are obtained. The bounding box and the trajectory of each object are shown.
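The predict, update and assign cycle can be sketched as follows. This is a minimal illustration under stated assumptions, not the system's implementation. It uses the textbook Kalman equations for a one-dimensional constant-velocity model with state [position, velocity], and the noise values `q` and `r` are arbitrary. The function `assign` is a brute-force stand-in for the Hungarian algorithm: it finds the same minimum-cost matching, but at factorial cost, so it is only suitable for a handful of tracks. All function names are hypothetical.

```python
from itertools import permutations


def kalman_predict(x, P, q=0.01):
    """Predict the next state with F = [[1, 1], [0, 1]] and Q = q * I."""
    pos, vel = x
    (a, b), (c, d) = P
    x_pred = [pos + vel, vel]
    # P_pred = F P F^T + Q, written out for the 2x2 case
    P_pred = [[a + b + c + d + q, b + d],
              [c + d, d + q]]
    return x_pred, P_pred


def kalman_update(x, P, z, r=1.0):
    """Correct the prediction with a measured position z (H = [1, 0])."""
    (a, b), (c, d) = P
    s = a + r                  # innovation covariance
    k0, k1 = a / s, c / s      # Kalman gain
    y = z - x[0]               # innovation
    x_new = [x[0] + k0 * y, x[1] + k1 * y]
    # P_new = (I - K H) P
    P_new = [[(1 - k0) * a, (1 - k0) * b],
             [c - k1 * a, d - k1 * b]]
    return x_new, P_new


def assign(predicted, detected):
    """Match each predicted position to a distinct detection so that the
    total distance is minimal. Requires len(detected) >= len(predicted)."""
    best = min(permutations(range(len(detected)), len(predicted)),
               key=lambda perm: sum(abs(p - detected[j])
                                    for p, j in zip(predicted, perm)))
    return list(best)


# Track an object that moves 2 units per frame, starting from a vague guess.
x, P = [0.0, 0.0], [[100.0, 0.0], [0.0, 100.0]]
for k in range(1, 21):
    x, P = kalman_predict(x, P)
    x, P = kalman_update(x, P, z=2.0 * k)
next_pos, _ = kalman_predict(x, P)

# Two predicted track positions matched against two detections.
matches = assign([10.0, 50.0], [49.0, 11.0])
```

After twenty noise-free measurements the velocity estimate is close to 2 and the predicted position for frame 21 is close to 42. In `matches`, entry i is the index of the detection assigned to track i, so here the first track takes the second detection and vice versa.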

VI. FEATURE EXTRACTION

When the object is close to the camera, it is captured/cropped so that a clear image is obtained. The features selected for classification are stored in a feature vector. In this system, 10 features are extracted from each detected object for classification. Area is the actual number of pixels in the region. Perimeter is computed by summing the distances between each adjoining pair of pixels around the border of the region. Extent is the proportion of the pixels in the bounding box that belong to the object. It is computed by dividing the area of the object (blob) by the area of the bounding box. Aspect Ratio is obtained by dividing the width of the bounding box by its height. The height and width of the bounding box are also used. Compactness [30] indicates how round the region is and is given by equation 1. Elongation factor [31] is given by equation 2.

(1)

(2)

Orientation of an object is defined as the angle between the x-axis and the principal axis, that is, the axis around which the object can be rotated with minimum inertia. It is the angle, ranging from -90 to 90 degrees, between the x-axis and the major axis of the ellipse that has the same second moments as the region [29]. It is given by equation 3:

(3)

Skewness is a measure of symmetry, or more precisely, the lack of symmetry. A distribution, or data set, is symmetric if it looks the same to the left and right of the center point. It is given by equation 4:

(4)

l here defines the length as explained above.
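Several of the features defined in words above can be computed directly from a binary object mask, as the following sketch shows. It covers only area, bounding-box width and height, extent, aspect ratio and orientation. The orientation uses the common central second-moment formula, which is an assumption about the form of equation 3, and the angle is expressed in image coordinates (y axis pointing down). Compactness, elongation, skewness and perimeter are left out because their exact definitions are not stated in the text. The function name `shape_features` is hypothetical.

```python
import math


def shape_features(mask):
    """Compute simple shape features of the foreground (non-zero) pixels."""
    pixels = [(x, y) for y, row in enumerate(mask)
              for x, val in enumerate(row) if val]
    area = len(pixels)
    xs = [x for x, _ in pixels]
    ys = [y for _, y in pixels]
    width = max(xs) - min(xs) + 1
    height = max(ys) - min(ys) + 1

    # Central second-order moments of the region
    cx, cy = sum(xs) / area, sum(ys) / area
    mu20 = sum((x - cx) ** 2 for x in xs) / area
    mu02 = sum((y - cy) ** 2 for y in ys) / area
    mu11 = sum((x - cx) * (y - cy) for x, y in pixels) / area

    # Angle between the x-axis and the principal axis, in (-90, 90] degrees
    theta = 0.5 * math.atan2(2.0 * mu11, mu20 - mu02)

    return {
        "area": area,
        "width": width,
        "height": height,
        "extent": area / (width * height),
        "aspect_ratio": width / height,
        "orientation_deg": math.degrees(theta),
    }


# A horizontal 6x2 bar inside a 4x8 image.
bar = [[0] * 8 for _ in range(4)]
for y in (1, 2):
    for x in range(1, 7):
        bar[y][x] = 1
features = shape_features(bar)
```

For the horizontal bar, the region fills its bounding box (extent 1), is three times as wide as it is high, and has an orientation of 0 degrees.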

VII. OBJECT CLASSIFICATION

Once the objects are detected and tracked, they are categorized into one of the following six categories, namely “car”, “bike”, “truck/bus”, “human”, “auto” and “junk”.

In Multi-class SVM (one vs. all), the training samples belong to 6 classes. During training, one binary classifier is trained per class, taking the samples from that class as positive samples and the remaining samples as negative samples.
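The one-vs-all decomposition itself is independent of the binary learner, which the sketch below makes explicit. To stay self-contained it does not train an SVM. The binary scorer is a deliberately simple stand-in (distance to the negative centroid minus distance to the positive centroid), and the toy 2-D samples are made up for illustration. In the paper's setting each scorer would be a binary SVM trained on the 10-feature vectors.

```python
def centroid(points):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]


def dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def train_one_vs_all(samples, labels):
    """Build one binary model per class: that class is positive, the rest negative."""
    models = {}
    for cls in sorted(set(labels)):
        pos = [s for s, l in zip(samples, labels) if l == cls]
        neg = [s for s, l in zip(samples, labels) if l != cls]
        # Stand-in for a trained binary SVM: keep the centroid of each side
        models[cls] = (centroid(pos), centroid(neg))
    return models


def predict(models, sample):
    """Return the class whose binary model gives the largest decision value."""
    def score(model):
        pos_c, neg_c = model
        return dist(sample, neg_c) - dist(sample, pos_c)
    return max(models, key=lambda cls: score(models[cls]))


# Made-up, well-separated 2-D samples for three of the classes.
samples = [[0.0, 0.0], [1.0, 0.0],
           [10.0, 0.0], [11.0, 0.0],
           [0.0, 10.0], [0.0, 11.0]]
labels = ["car", "car", "bike", "bike", "truck/bus", "truck/bus"]

models = train_one_vs_all(samples, labels)
first = predict(models, [0.5, 0.2])
second = predict(models, [10.2, 0.5])
```

With six classes the same loop yields six binary models, and a test sample is assigned to the class whose model responds most strongly.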


The Adaptive Hierarchical Multi-class SVM method in [25], which uses a top-down approach for training and testing, has been implemented using a decision tree. The dataset is divided into two non-overlapping subsets using the k-means clustering algorithm (k=2), with the mean of each class as the input to the clustering algorithm. An SVM, which is a binary classification algorithm, is then trained with one cluster given as positive samples and the other as negative samples. This is repeated for each cluster until only one class remains. For testing, a binary decision tree is built and traversed top-down: starting from the root node, a sample goes down the left or right sub-tree until it reaches a leaf, which is the class it belongs to. For the six classes, this tree classifier has 5 internal SVM nodes. The Gaussian Radial Basis Function kernel is used for both SVM methods described above.
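The structure of such a tree can be sketched without training any SVM. The code below only builds the hierarchy by recursively splitting class means with 2-means clustering; each internal node marks where a binary SVM would be trained. The class means are invented 2-D values for illustration (the system uses 10 features), and seeding k-means with the two means farthest apart is an assumption made to keep the example deterministic. It also assumes that all class means are distinct.

```python
def dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_vec(vectors):
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def two_means(means):
    """Split {class name: mean vector} into two groups with k-means (k = 2)."""
    names = sorted(means)
    pairs = [(a, b) for a in names for b in names if a < b]
    seed_a, seed_b = max(pairs, key=lambda p: dist2(means[p[0]], means[p[1]]))
    groups = [[seed_a], [seed_b]]
    for _ in range(20):
        centers = [mean_vec([means[n] for n in g]) for g in groups]
        new = [[], []]
        for n in names:
            d0 = dist2(means[n], centers[0])
            d1 = dist2(means[n], centers[1])
            new[0 if d0 <= d1 else 1].append(n)
        if not new[0] or not new[1] or new == groups:
            break
        groups = new
    return groups


def build_tree(means):
    """Leaf = class name; internal node = (left, right), one binary SVM each."""
    if len(means) == 1:
        return next(iter(means))
    left, right = two_means(means)
    return (build_tree({n: means[n] for n in left}),
            build_tree({n: means[n] for n in right}))


def count_internal(tree):
    return 0 if isinstance(tree, str) else 1 + sum(count_internal(t) for t in tree)


# Hypothetical class means, for illustration only.
class_means = {"car": [4.0, 2.0], "bike": [1.5, 1.0], "truck/bus": [9.0, 4.0],
               "human": [0.5, 1.8], "auto": [3.0, 2.2], "junk": [6.0, 6.0]}
tree = build_tree(class_means)
n_svms = count_internal(tree)
```

Whatever the class means are, a binary tree with six leaves has five internal nodes, which matches the five SVM nodes reported above. A test sample needs at most as many SVM evaluations as the depth of the tree, rather than one per class.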

The Levenberg-Marquardt Back-propagation [26] method is also used for classification. The number of input nodes is 10, which is the number of features; the number of hidden layer nodes is 12 and the number of output nodes is 6. The number of patterns used is 1016 and the learning rate used is 0.01.

The accuracy of the three algorithms is compared. The algorithm with the highest accuracy is used for classifying objects in real time.

VIII. INDEXING AND QUERYING

The moving objects in the video that have been classified are automatically indexed by video ID, a position ID within the respective video, the type they have been classified into, dimensions, color and the frames of the respective video in which they appear. This index is stored in MATLAB's mat files.

Color determination: The dominant color is computed by initially converting each input video frame into (hue, saturation, luminance) HSL space from its original RGB space, and then quantizing the HSL space into six bins. Before this quantization is done, a convex hull of the foreground mask (using blob analysis) is superimposed over the original vehicle image and the pixels within the contour of object are considered.
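A minimal sketch of the dominant-color step follows. The text does not say how the six bins are laid out, so the sketch assumes six hue bins of 60 degrees each, centred on red, yellow, green, cyan, blue and magenta; that layout and the bin names are assumptions. It also assumes that the object's pixels have already been selected by the convex hull mask and are passed in as (R, G, B) tuples with values from 0 to 255. The conversion uses `colorsys.rgb_to_hls` from the Python standard library.

```python
import colorsys
from collections import Counter

# Assumed bin layout: six hue sectors of 60 degrees each.
BIN_NAMES = ["red", "yellow", "green", "cyan", "blue", "magenta"]


def hue_bin(r, g, b):
    """Map an RGB pixel (0-255 per channel) to one of six hue bins."""
    h, _l, _s = colorsys.rgb_to_hls(r / 255.0, g / 255.0, b / 255.0)
    # Shift by 30 degrees so that each bin is centred on its named hue
    return int(((h * 360.0 + 30.0) % 360.0) // 60.0)


def dominant_color(pixels):
    """Return the name of the most frequent hue bin among the pixels."""
    counts = Counter(hue_bin(*p) for p in pixels)
    return BIN_NAMES[counts.most_common(1)[0][0]]


# Five reddish pixels and two bluish ones, standing in for a masked vehicle.
vehicle_pixels = [(200, 10, 10)] * 5 + [(10, 10, 200)] * 2
color = dominant_color(vehicle_pixels)
```

Grey, black and white pixels have no meaningful hue and fall into the red bin in this sketch; a fuller version would likely treat low-saturation pixels separately.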

Size determination: To estimate the length and width of the detected cars, their orientation and the width and height of their bounding boxes are considered. Using a suitable scale factor, these dimensions are converted from pixels to centimeters.

The proposed system also has a querying module which can be used to query objects based on their dimension, color or type. The user may view the frames in which a particular queried object appears.

IX. EXPERIMENTAL ANALYSIS AND RESULTS

The entire system was developed using MATLAB, since MATLAB environment is convenient for testing purposes.

A training set of 1016 samples was prepared using 13 different videos, having a combined duration of 12 minutes and 20880 frames in total. Similarly, a testing set of 500 samples was created using 4 different videos, having a combined duration of 4 minutes and 6960 frames in total. As shown in Table 4, Multi-class SVM has the highest accuracy, so its trained classifier is used to classify objects in real time, with the count determined for each type of object as shown in Fig. 2. The objects (vehicles and humans) which are indexed and stored can be queried using type, dimension and/or color, depending on the object type.

Figure 2: GUI with count displayed for objects in real time

A. Confusion Matrices

The confusion matrices for the classification algorithms are shown in tables 1, 2 & 3.

Predicted class   Actual class
                  Car    Bike   Bus/Truck   Human   Auto   Junk
Car               199    2      1           1       0      22
Bike              4      86     0           0       0      5
Bus/Truck         0      0      15          0       1      1
Human             0      0      0           10      0      0
Auto              0      0      0           0       2      0
Junk              3      0      0           0       0      148

Table 1: Confusion Matrix for Multi-class SVM (Accuracy = 92%)



Predicted class   Actual class
                  Car    Bike   Bus/Truck   Human   Auto   Junk
Car               171    0      0           1       0      12
Bike              3      82     0           0       0      1
Bus/Truck         5      0      13          0       0      0
Human             0      3      0           9       0      1
Auto              5      0      3           0       3      1
Junk              22     3      0           1       0      161

Table 2: Confusion Matrix for Adaptive Hierarchical SVM (Accuracy = 87.80%)




Predicted class   Actual class
                  Car    Bike   Bus/Truck   Human   Auto   Junk
Car               178    7      5           1       0      28
Bike              7      75     0           2       0      1
Bus/Truck         2      0      10          0       3      2
Human             0      0      0           3       0      0
Auto              0      0      0           0       0      0
Junk              19     6      1           5       0      145

Table 3: Confusion Matrix for Back-propagation (Accuracy = 82.20%)

B. Accuracy

Accuracy [9] is the overall correctness of the model and is calculated as the sum of correct classifications divided by the total number of classifications. It is shown in Table 4.

S/No.   Classification Algorithm                 Accuracy   Percentage
1       Multi-class SVM (one vs. all)            460/500    92%
2       Adaptive Hierarchical Multi-class SVM    439/500    87.80%
3       Back-propagation                         411/500    82.20%

Table 4: Accuracy of Classification Algorithms
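The percentages in Table 4 can be recomputed from the confusion matrices in Tables 1 to 3 by dividing the sum of the diagonal by the sum of all entries. The sketch below does this in Python; the variable and function names are hypothetical, and the matrix entries are copied from the three tables.

```python
# Rows and columns follow the order Car, Bike, Bus/Truck, Human, Auto, Junk.
svm_one_vs_all = [[199, 2, 1, 1, 0, 22],
                  [4, 86, 0, 0, 0, 5],
                  [0, 0, 15, 0, 1, 1],
                  [0, 0, 0, 10, 0, 0],
                  [0, 0, 0, 0, 2, 0],
                  [3, 0, 0, 0, 0, 148]]

adaptive_hierarchical = [[171, 0, 0, 1, 0, 12],
                         [3, 82, 0, 0, 0, 1],
                         [5, 0, 13, 0, 0, 0],
                         [0, 3, 0, 9, 0, 1],
                         [5, 0, 3, 0, 3, 1],
                         [22, 3, 0, 1, 0, 161]]

back_propagation = [[178, 7, 5, 1, 0, 28],
                    [7, 75, 0, 2, 0, 1],
                    [2, 0, 10, 0, 3, 2],
                    [0, 0, 0, 3, 0, 0],
                    [0, 0, 0, 0, 0, 0],
                    [19, 6, 1, 5, 0, 145]]


def accuracy(matrix):
    """Correct classifications (the diagonal) divided by all classifications."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


percentages = [round(100 * accuracy(m), 1)
               for m in (svm_one_vs_all, adaptive_hierarchical, back_propagation)]
```

Each matrix sums to the 500 test samples, and the resulting values of 92.0, 87.8 and 82.2 agree with the Percentage column of Table 4.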

C. Precision

Precision [9] is a measure of the accuracy provided that a specific class has been predicted. It is defined by equation (5), where tp and fp are the numbers of true positive and false positive predictions for the considered class.

Precision = tp / (tp + fp)    (5)

                   Multi-class SVM    Adaptive Hierarchical    Back-propagation
                   (one vs. all)      Multi-class SVM          Algorithm
Precision Cars     96.60%             83.00%                   86.40%
Precision Bike     97.72%             93.18%                   85.22%
Precision Truck    93.75%             81.25%                   62.50%
Precision Human    90.90%             81.81%                   27.27%
Precision Auto     66.66%             100%                     0%
Precision Junk     84.09%             91.47%                   82.38%

Table 5: Precision for Classified Objects
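The per-class figures in Table 5 can be checked against the confusion matrices. The sketch below uses the Table 2 matrix and divides each diagonal entry by its column total, which reproduces the Adaptive Hierarchical Multi-class SVM column of Table 5 to within rounding. Reading the column total as tp + fp assumes that the columns index the predicted class. If the columns instead hold the actual classes, as the identical column totals across Tables 1 to 3 suggest, the same ratio is tp / (tp + fn), which is more commonly called recall. The names in the code are hypothetical.

```python
classes = ["Car", "Bike", "Bus/Truck", "Human", "Auto", "Junk"]

# Table 2: Adaptive Hierarchical Multi-class SVM
adaptive_hierarchical = [[171, 0, 0, 1, 0, 12],
                         [3, 82, 0, 0, 0, 1],
                         [5, 0, 13, 0, 0, 0],
                         [0, 3, 0, 9, 0, 1],
                         [5, 0, 3, 0, 3, 1],
                         [22, 3, 0, 1, 0, 161]]


def column_totals(matrix):
    n = len(matrix)
    return [sum(matrix[r][c] for r in range(n)) for c in range(n)]


def per_class_ratio(matrix):
    """Diagonal entry divided by its column total, for every class."""
    totals = column_totals(matrix)
    return [matrix[c][c] / totals[c] for c in range(len(matrix))]


totals = column_totals(adaptive_hierarchical)
ratios = dict(zip(classes, per_class_ratio(adaptive_hierarchical)))
```

For example, the Car entry is 171 / 206 and the Auto entry is 3 / 3.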

X. CONCLUSION & FUTURE WORKS

In this paper, a novel traffic analysis and monitoring system that is capable of operating in real-time is presented. Objects detected are tracked, and suitable features are extracted for object classification. A comparative analysis of three algorithms, namely Multi-class SVM (one vs. all), Back-propagation, and Adaptive Hierarchical Multi-class SVM is presented, since they are essential to extract suitable vehicle features and vehicle parameters that can be used for object classification.

The accuracy was highest for Multi-SVM, 92%, followed by Adaptive Hierarchal Multi-SVM with 88% and then the Back-propagation Algorithm, 82%. Classification of objects is done in real time providing the count for each type of object.

The system also has a Querying module, which is able to query objects based on their dimension, color and type. It also displays the portion of the video which contains the queried object. The accuracy of this module depends on the classification mentioned above. A suitable system for indexing and retrieval of videos is provided, which works well in medium traffic conditions. The combination of low-level features used has proved to be efficient in classifying objects into different types in highway traffic videos.

Future work includes improving the performance of the detection and tracking algorithms. Problems created by shadows and occlusion in heavy traffic conditions are planned to be addressed by using better background modelling techniques and re-segmentation of the segmented vehicles. It is also planned to make these modules operate under night conditions, and to collect more traffic data from different camera angles to make these modules robust to various conditions and situations.

REFERENCES

[1] Arun Hampapur, “S3-R1: The IBM Smart Surveillance System-Release 1,” IBM T.J. Watson Research Centre, New York, U.S.A, 2006.

[2] Scott Bradley and Peter DeCoursey, “Hitachi Data Systems Solutions for Video Surveillance,” Hitachi Data Systems Corporation, 2011.

[3] Rijurekha Sen and Bhaskaran Raman, “Intelligent Transport Systems for Indian Cities,” 6th USENIX/ACM Workshop on Networked Systems for Developing Regions at Boston, 2012.

[4] Sayanan Sivaraman and Mohan M. Trivedi, “Vehicle Detection by Independent Parts for Urban Driver Assistance,” IEEE Transactions on Intelligent Transportation Systems, Anchorage, Alaska, U.S.A., 2013.

[5] Sayanan Sivaraman and Mohan M. Trivedi, “Integrated Lane and Vehicle Detection, Localization, and Tracking: A Synergistic Approach,” IEEE Transactions on Intelligent Transportation Systems, Anchorage, Alaska, U.S.A., 2013.

[6] Eshed Ohn-Bar, Sayanan Sivaraman, and Mohan M. Trivedi, “Partially Occluded Vehicle Recognition and Tracking in 3D,” IEEE Intelligent Vehicles Symposium, San Diego, California, 2013.

[7] Kirk James, F. O’Brien and David A. Forsyth, “Skeletal Parameter Estimation from Optical Motion Capture Data,” University of California, Berkeley, 2008.

[8] J. Komala Lakshmi and M. Punithavalli, “A Survey on Performance Evaluation of Object Detection Techniques in Digital Image Processing,” International Journal of Computer Science, pp. 562-568, Vol. 7(6), 2010.

[9] Shruti V Kamath, Mayank Darbari and Rajashree Shettar, “Content Based Indexing and Retrieval from Vehicle Surveillance Videos Using Gaussian Mixture Method,” International Journal of Computer Engineering & Technology, Vol. 4(1), pp. 420-429, 2013.

[10] Paul Viola and Michael Jones, “Robust Real-time Object Detection,” International Workshop on Statistical and Computing Theories of Vision, Vancouver, Canada, 2001.


[11] Kinjal A Joshi and Darshak G Thakore, “A Survey on Moving Object Detection and Tracking in Video Surveillance System,” International Journal of Soft Computing and Engineering, Vol. 2(3), 2012.

[12] Luis Carlos Molina, Lluís Belanche and Àngela Nebot, “Feature Selection Algorithms: A Survey and Experimental Evaluation,” University of Barcelona, Spain.

[13] David Lowe, "Object recognition from local scale-invariant features," Proceedings of the International Conference on Computer Vision, Vol. 2, pp. 1150-1157, 1999.

[14] Herbert Bay, Andreas Ess, Tinne Tuytelaars, Luc Van Gool, "SURF: Speeded Up Robust Features," Computer Vision and Image Understanding, Vol. 110, No. 3, pp. 346-359, 2008.

[15] Krystian Mikolajczyk and Cordelia Schmid, "A performance evaluation of local descriptors," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 10(27), pp. 1615-1630, 2005.

[16] N. Dalal and B. Triggs, “Histograms of Oriented Gradients for Human Detection,” Computer Vision and Pattern Recognition, 2005.

[17] Shunjie Han, Wen You and Hui Li, "Application of Binary Tree Multi-class Classification Algorithm Based on SVM in Shift Decision for Engineering Vehicle," IEEE International Conference on Control and Automation at Guangzhou, 2007.

[18] Chaobin Liu, Yuexiang Yang and Chuan Tang, "An Improved Method for Multi-class Support Vector Machines," International Conference on Measuring Technology and Mechatronics Automation at Changsha City, 2010.

[19] Junjie Chen, Wei Zhao and Haifang Li, "Research on Multi-Class SVM Combining with LDA," International Symposium on Intelligent Information Technology Application Workshops at Shanghai, 2008.

[20] Chang Liu, George Chen, Yingdong Ma and Xiankai Chen, “A System for Indexing and Retrieving Vehicle Surveillance Videos,” 4th International Congress on Image and Signal Processing, 2011.

[21] Shruti V Kamath, Mayank Darbari and Rajashree Shettar, “Content Based Indexing and Retrieval from Vehicle Surveillance Videos Using Optical Flow Method,” International Journal of Scientific Research, Vol. 2(4), 2013.

[22] R.E. Kalman, “A new approach to linear filtering and prediction problems,” Journal of Basic Engineering, Vol. 82(1), pp. 35–45, 1960.

[23] B. K. P. Horn and B. G. Schunck, “Determining optical flow,” Artificial Intelligence, vol. 17, pp. 185–203, August 1981.

[24] András Frank, “On Kuhn's Hungarian Method – A tribute from Hungary,” Egervary Research Group, Budapest, Hungary, 2004.

[25] Song Liu, Haoran Yi, Liang-Tien Chia, and Deepu Rajan, “Adaptive Hierarchical Multi-class SVM Classifier for Texture-based Image Classification,” IEEE International Conference on Multimedia & Expo, NTU, Singapore, 2005.

[26] Raúl Rojas, “The backpropagation algorithm of Neural Networks - A Systematic Introduction,” Springer-Verlag, Berlin, 1996.

[27] Jun-Wei Hsieh, Shih-Hao Yu, Yung-Sheng Chen and Wen-Fong Hu, “Automatic Traffic Surveillance System for Vehicle Tracking and Classification”, IEEE Transactions on Intelligent Transportation Systems, Vol. 7, No. 2, 2006.

[28] Deng-Yuan Huang, Chao-Ho Chen, Wu-Chih Hu, Shu-Chung Yi, and Yu-Feng Lin, “Feature-Based Vehicle Flow Analysis and Measurement for a Real-Time Traffic Surveillance System”, Journal of Information Hiding and Multimedia Signal Processing, Vol. 3, No. 3, July 2012.

[29] Dmitrij Csetverikov, “Basic Algorithms for Digital Image Analysis: a course”, Institute of Informatics, Eötvös Loránd University, Budapest, Hungary.

[30] Berna Erol and Faouzi Kossentini, “Retrieval of video objects by compressed domain shape features”, The 7th IEEE International Conference on Electronics, Circuits and System, Vol 2, pp. 667-670, 2000.

[31] Valdek Mikli, Helmo Käerdi, Priit Kulu, and Michal Besterci, “Characterization of Powder Particle Morphology”, Proc. Estonian Acad. Sci. Eng., 7, 1, pp. 22–34, 2001.
