
Full-Body Occlusion Handling and Density Analysis in Traffic Video-Surveillance Systems

Authors:

Evangelos Palinginis, M.Sc. 1
Graduate Research Assistant, [email protected]

Man-Woo Park, Ph.D. 2 *
Assistant Professor, [email protected]

Keitaro Kamiya, M.Sc. 1
Graduate Research Assistant, [email protected]

Jorge Laval, Ph.D. 1
Associate Professor, [email protected]

Ioannis Brilakis, Ph.D. 3
Laing O'Rourke Lecturer, [email protected]

Randall Guensler, Ph.D. 1
Professor, [email protected]

1 Phone: 404-894-2201 / Fax: 404-894-2278
School of Civil and Environmental Engineering
Georgia Institute of Technology
790 Atlantic Drive, Atlanta, GA 30332

2 Phone: +82-31-330-6411
Department of Civil and Environmental Engineering
Myongji University
116 Myongji-ro, Cheoin-gu, Yongin, Gyeonggi-do
449-728, Korea

3 Phone: +44-(0)-1223-332718
Department of Engineering
University of Cambridge
Trumpington Street, Cambridge,
Cambridgeshire CB2 1PZ

* Corresponding author

Word Count: 7423 (text: 6923; tables: 1 x 250: 250; figures: 1 x 250: 250)
Submission Date: August 01, 2013


ABSTRACT
Vision-based traffic surveillance systems are among the most reliable, inexpensive and widely applicable methodologies for surveying traffic conditions. The implementation of these strategies, however, is limited under certain conditions, such as the presence of vehicle occlusions or poor illumination, which lead to either over-counted or under-counted traffic data. The proposed motion-based methodology aims at overcoming these limitations by employing a new technique for full-body occlusion handling of vehicles. The methodology is based on five main steps and three main methods: background subtraction, Histogram of Oriented Gradients (HOG) features trained by a linear Support Vector Machine (SVM), and Haar-Like features (HL) trained by AdaBoost, in addition to vehicle counting. The proposed methodology is tested with various 30-minute videos and 452 pre-identified cases of occlusion. Preliminary results indicate that the proposed methodology is reliable and robust in providing traffic density analysis. Future work may extend the proposed methodology to handle the detection of vehicles moving in multiple directions.


INTRODUCTION
Vision-based traffic surveillance systems have been widely implemented as one of the most efficient and reliable alternatives to the traditional means of surveying traffic conditions (1). Precise detection of every vehicle, however, is not possible with current technology (1-2) using pre-existing algorithms, given varying environmental conditions, such as illumination, and traffic conditions that result in occlusion of vehicles. The vast majority of existing research-oriented or commercialized traffic counting systems still suffer from over-counting, by detecting multiple smaller parts of larger vehicles, or under-counting, due to the implementation of non-robust detectors. In heavily congested traffic, incorrect detection can be severely exaggerated because most existing systems rely on motion-based detection; thus, many overlapped vehicles may be counted as one. Additionally, motion-based detection may cause alignment errors as an effect of the occluded object. In particular, when the detected object is occluded, regardless of the pre-occluded and post-occluded state, the centroid at the center of the vehicle is shifted, altering and transposing the trajectory of the vehicle. Hence, the detected vehicle is disassociated from its movement trajectory.

The proposed full-body occlusion handling framework allows a wider area of detection zones to be pre-set so that only reliable data are matched with the total number of counted vehicles. In this way, the algorithms can detect a vehicle that would otherwise be missed by traditional traffic counting methods, because a clear view of the fully occluded vehicle can be recovered before and immediately after the state of occlusion. The five-step methodology is implemented to automatically detect each vehicle as faithfully to the ground truth as possible.

Contrary to motion-based detection, one of the advantages of feature-based detection is its robustness against alignment error, which facilitates more accurate counting per lane. Each detected vehicle triggers a tracking algorithm based on eigenimages that estimates the vehicle's location through particle filtering. The tracking data are then converted into real-world road coordinates through calibration of the video field of view. In this particular framework, the detected vehicle may subsequently be completely covered by an adjacent vehicle (e.g., a heavy-duty truck), depending on the angle of the stationary camera with respect to the road surface, leading to tracking failure and requiring re-detection of the vehicle.

In this paper, two mitigation strategies are employed to remove the previously detected tracking data and to consider only the re-detected object, so as to avoid over-counting. A methodology to re-detect the occluded vehicle by the time it re-appears is also presented for use across a spectrum of traffic-state scenarios. Initially, various cases where full-vehicle occlusions occur in the sample video are examined to enable the extraction of workable samples. The extraction process is achieved by recognizing the irregular movement pattern of the labeled counts, using the speed and acceleration data obtained from the object's trajectory in the calibrated field of view. Accordingly, the data corresponding to entirely occluded objects are stored automatically in databases and trained using feature-based detection to create learning classifiers that are iteratively applied to each occurrence. Finally, occlusion handling based on speed and acceleration data, as well as on "tweaking" several tracking parameters, is compared to assess which method provides the most reliable tracking under occlusion.

The processing was conducted on surveillance video data provided by the Georgia Navigator System in the State of Georgia, USA, which covers most of the highway corridor in metropolitan Atlanta. Benchmarking is conducted between the performance results of the classifier obtained from automatic and manual data collection, to assess the feasibility of the automated image-collection method. The proposed methodology is tested using various thirty-minute-long videos recorded at different locations along the Georgia highway corridor. Because the credibility of the methodology relies greatly on the precision of the results, manual counts are employed as ground-truth data for comparison in the post-processing analysis. The detection rate of feature-based detection as well as the removal rate of occluded vehicles is weighted individually. The results indicate the potential of the proposed occlusion-handling technique to provide reliable traffic flow results. Though it relies on either the instantaneous or the average speed, the method is effective because of its versatility in adapting to various environmental variables such as camera view angle.

LITERATURE REVIEW
Motion-based detection considers a descriptive pattern of moving objects, separated from the background model, to locate objects of interest. In the application of vision-based traffic surveillance tools, many methods have been implemented, including adaptive median filters (3), color median (4-5), frame differentiation (6), mixture of Gaussians (MoG) (7-8), wavelet differentiation (9), kernel-based density estimation (10-11) and sigma-delta filters (12). MoG has also been implemented with shadow elimination based on color reflectance and gradient features (8). Recently, a sigma-delta filter was proposed (12) to continuously recognize a stopped vehicle as a foreground object and reduce the computational cost. Traditionally, background subtraction methods cannot cope with the major intrinsic problems of vision-based systems, such as occlusion, shadows, illumination, camera distortion and weather conditions (13); therefore, they are not considered reliable when used alone. Moreover, background subtraction by itself is an ineffective detector of vehicle objects' centers, which are critical for initiating accurate tracking records.

Many researchers have employed feature-based detection on top of computationally inexpensive background subtraction models to constrain the region of interest (ROI). This strategy works well in dealing with shadow effects that cause over-counting, because the detected window fits the vehicle's appearance. In particular, various features of vehicle objects are targeted for appearance-based approaches, including, but not limited to, shape (14), points (15), edges (16) and symmetry (17), which are uniquely distinguishable. In approaches where machine learning is implemented to classify the object, detection capability relies on the visual features selected for training. Furthermore, the extraction of point- and edge-related information in low-quality images contains heavy noise, and it is difficult to achieve precision because the correlation among neighboring pixels is not clearly defined.

More descriptive pattern-based object features, such as Haar-wavelet features, HoG and Local Binary Patterns (LBP), have proven effective for vehicle detection applications. Haar-wavelet features classify each object by comparing the intensities of adjacent pixels and differentiating them depending on the presence of intensity gradients (18-19). Haar features are known to be robust for low-resolution images (18, 20) and yield fast processing times. Recently, Haar-Like features (HL) without gradient information were trained with an SVM to attain satisfactory results (21). Color-based Haar detection was also demonstrated to be more effective than traditional gray-scale Haar feature detection (22). Another well-known feature, the HoG, is widely used to capture gradient information and orientation around the key-point locations of all pixels in a vehicle image (15, 23). This methodology was further investigated by implementing unsupervised learning on images collected through background subtraction (24). Benchmarking of HL and HoG features for vehicle detection was conducted (23), and both were demonstrated to be effective. In a method described by Avery et al. (15), point-based features called Multi-Block LBP are considered to detect flat areas, corner points and edges in the trained images using AdaBoost, along with an integral image for faster computation. In addition to the benefits of the appearance-based features described above, precise acquisition of the centroid of a vehicle object becomes possible because the detected window is appropriately fit to the vehicle's appearance.
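As an illustration of the HoG-plus-classifier machinery referenced above, the following minimal sketch extracts HoG descriptors with OpenCV and trains a linear SVM with scikit-learn. It is a hedged example rather than any of the cited implementations; the folder names, patch size and SVM parameter are assumptions.

```python
# Hypothetical HoG + linear SVM vehicle/non-vehicle classifier (illustrative only).
import cv2
import numpy as np
from glob import glob
from sklearn.svm import LinearSVC

# winSize, blockSize, blockStride, cellSize, nbins (assumed 64x64 training patches)
hog = cv2.HOGDescriptor((64, 64), (16, 16), (8, 8), (8, 8), 9)

def hog_features(path):
    img = cv2.resize(cv2.imread(path, cv2.IMREAD_GRAYSCALE), (64, 64))
    return hog.compute(img).ravel()

pos = [hog_features(p) for p in glob("patches/vehicle/*.png")]     # positive samples
neg = [hog_features(p) for p in glob("patches/background/*.png")]  # negative samples
X = np.vstack(pos + neg)
y = np.array([1] * len(pos) + [0] * len(neg))

clf = LinearSVC(C=0.01)   # linear SVM, in the spirit of the HoG/SVM detectors cited above
clf.fit(X, y)

def is_vehicle(patch_gray_64x64):
    """Classify a 64x64 grayscale patch as vehicle (True) or background (False)."""
    return clf.predict([hog.compute(patch_gray_64x64).ravel()])[0] == 1
```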

Many of the existing detection algorithms used in traffic surveillance applications recognize the importance of identifying the presence of occlusion; however, they do not address how the blocked view causes problems. Numerous tracking algorithms have also been implemented and improved, notably the Kalman filter (25), the particle filter (7, 26) and mean-shift (27). Within one important research initiative (25), the corner points of the detected vehicle are recognized by specific user-defined eigenvalue thresholds and tracked using a Kalman filter. Some known hybrid models, as seen in a method described by Prevost et al. (23), use a projective Kalman filter combined with a mean-shift tracker. Others simply link independent detections in consecutive frames for tracking vehicles, assuming constant acceleration (24).

The majority of existing tracking strategies involve tracking of local feature points instead of capturing the object in its entirety; this process is widely known as feature-cluster tracking. In particular, Canny edge detection algorithms and the Hough transformation were used to obtain the spatiotemporal information of moving vehicles (16). The clustered Hough feature points are grouped together to represent the vehicle's travel path, however, under the assumption that the vehicle always travels linearly, which does not necessarily represent the real world: vehicles pass through curves and change lanes during periods of occlusion.

Similar concepts are implemented by considering clusters of point-based features that are tracked using the Lucas-Kanade feature tracker provided as open source (OpenCV) (28). It is critical to have a precise spatiotemporal travel path, because even a minor flickering movement can cause great variability in instantaneous speed and acceleration. Although not transportation related, some researchers have successfully demonstrated that a localized centroid-update method can sufficiently alleviate small fluctuations in object movement (29). This method, however, requires additional post-processing work depending on the measurements between the target model and the detected candidate. More detailed information regarding some of the tracking algorithms presented in this document can be found in the survey by Yilmaz et al. (30).

Recently, state-of-the-art commercialized products work similarly by localizing the detection zone to detect different types of vehicles (31). The structure of these systems is computationally inexpensive and can run in real time while retaining very respectable detection rates; however, they contain some inevitable problems caused by object occlusion, where a larger vehicle and a partially occluded smaller vehicle are typically considered as one object. This happens because foreground detection methods are not intrinsically designed to segregate multiple occluded vehicles (16). In other cases, the appearance of a larger vehicle or a vehicle's shadow occluding the adjacent lanes also triggers false detections in commercial systems (28). Some detection-and-tracking systems are known to perform in real time, as seen in a method proposed by Rodrigue et al. (32); however, the traffic speed is assumed to be constant for each vehicle. Given the aforementioned, the merit of using computer vision as a surveillance tool has been limited, focusing strictly on building reliable systems that can perform in real time.

Last but not least, most of the existing methods have been employed at either ground level, observed from an overpass bridge, or on high ground, where the detected vehicles' size remains large with less noise relative to the video frame size (6, 23, 26, 28, 33). However, most Closed-Circuit Television (CCTV) cameras in surveillance networks are located very high above the ground; thus, each vehicle "occupies" a small area within the camera field of view. The reduced number of pixels representing the vehicle in the field of view may not affect the detection capability of the system when state-of-the-art background subtraction algorithms that cope well with the motion variables are applied. However, degraded quality and smaller pixel-wise objects can certainly affect the tracking accuracy of vehicle objects, because a single-pixel fluctuation in tracking quality in 2-D image coordinates results in a much greater difference in the real-world vehicle trajectory.

Considering the CCTV cameras' field of view on highways, severe full-body occlusion has a significant impact on the tracking accuracy of the software. The proposed framework is able to detect and track each vehicle independently, even if it is occluded. The only major constraint is the requirement that the vehicle-to-be-occluded be at least partially visible at the beginning. Moreover, if the scaling factor of each tracker diminishes significantly per frame, it is more probable that the detected element is occluded. Conversely, full-body occlusion is sometimes imminent when the clear view of a vehicle is completely obscured by another vehicle, such as a large truck, in the adjacent lane. In particular, the feasibility of capturing the ground-truth movement of the vehicle is examined using its trajectory estimation, given the initial speed and acceleration, to estimate the position and speed of the vehicle once the state of occlusion has passed. Finally, the averaged acceleration data are used on top of the scaling factor to distinguish between false detections and true occlusions.

The proposed framework can benefit transportation researchers and practitioners by providing more accurate results associated with traffic density, evaluation of drivers' behavior with respect to existing real-time conditions (i.e., maintenance projects, weather, and incidents), and evaluation of the causes of crashes. These improvements can be made by applying new software to existing CCTV video camera systems (i.e., at no additional capital cost for data collection). The accurately extracted data can be a stepping-stone for further assessment of the systems' performance.

METHODOLOGY
This paper introduces a method that, to the authors' knowledge, has not been attempted before: full-body occlusion handling and density analysis in traffic video-surveillance systems where the object to be occluded is a moving vehicle. The proposed method consists of five main stages: camera calibration, detection of the vehicle-to-be-occluded, tracking of the vehicle-to-be-occluded, and the occlusion phase, which includes the sub-phases of vehicle counting and full occlusion. Figure 1 illustrates the main steps and the appearance of the output at each step.


FIGURE 1: The main steps of our framework.

Camera Calibration
Camera calibration provides the relationship between the 2-D image coordinates and the ground-truth coordinates. The CCTV cameras used in the Georgia Highway Transportation System provide a unique configuration in which the field of observation is static; thus, the calibration is robust for surveillance applications. During calibration, each video sample is calibrated to its unique characteristics using the Vanishing Point, Length and Width calibration strategy (34). Considering that the width of each traffic lane is 12 ft and the length between the ends of the tick marks of the dotted lane line is 40 ft, both depicted in Figure 1, the intrinsic camera properties, including the focal length (f), and the external variables, such as the tilt angle (φ) and pan angle (θ), can be computed.

The real-world three-dimensional coordinates (x_r, y_r, z_r) are converted into image coordinates (μ_i, v_i) in the following form:

$$\begin{bmatrix} \mu_i \\ v_i \\ 1 \end{bmatrix} = \begin{bmatrix} f & 0 & 0 \\ 0 & -f\sin(\varphi) & f\,h\cos(\varphi) \\ 0 & \cos(\varphi) & h\sin(\varphi) \end{bmatrix} \begin{bmatrix} x_r \\ y_r \\ 1 \end{bmatrix}, \quad \text{(Eq. 1)}$$

assuming every point on the road is a planar object, which is not the case for vehicle objects. To incorporate this, the transformation matrix is modified (Eq. 2) to include the height of each vehicle:


$$\begin{bmatrix} \mu_i \\ v_i \\ 1 \end{bmatrix} = \begin{bmatrix} f & 0 & 0 \\ 0 & -f\sin(\varphi) & f\,(h-h_z)\cos(\varphi) \\ 0 & \cos(\varphi) & (h-h_z)\sin(\varphi) \end{bmatrix} \begin{bmatrix} x_i \\ y_i \\ 1 \end{bmatrix}, \quad \text{(Eq. 2)}$$

where the height is approximated as an array of size 1 x N (Eq. 3):

$$h_z \in [\,h_o,\; h_o + h_c,\; h_o + 2h_c,\; h_o + 3h_c,\; \dots,\; h_f\,]. \quad \text{(Eq. 3)}$$

The height parameters are specified as the initial height, h_o, the final height, h_f, and the increment value, h_c. Moreover, the user specifies the total number of lanes and identifies an area in the video frame, called the entry zone (EZ), where the detection phase will occur. The explicit selection of the EZ is critical for the steps that follow in the methodology. The details of this selection are described later in this paper, in the vehicle-counting section.

Detection
The detection phase is subdivided into two steps: background subtraction and Haar-feature detection. The method implements two types of cues to characterize the appearance of the vehicles: motion and shape. These cues are exploited separately in a sequence of two steps. In the first step, the foreground blobs of moving objects are obtained. Then, the vehicles are separated from the foreground blobs based on their shape. The shape of the vehicles is defined by patterns of HoG features. Since the CCTV cameras are fixed and the field of view is static, the background can be estimated; therefore, the regions of the vehicles can be detected with lower processing time and fewer errors. To classify the areas of interest, the AdaBoost classifier (35) is used.
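A minimal sketch of this two-step detection is given below, in Python/OpenCV rather than the EmguCV/C# prototype described later; the cascade file name, blob-size threshold and parameter values are assumptions.

```python
# Illustrative two-step detection: motion cue (background subtraction), then a
# shape cue (trained cascade applied only inside the moving foreground blobs).
import cv2

bg = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16, detectShadows=True)
cascade = cv2.CascadeClassifier("vehicle_cascade.xml")    # placeholder trained cascade
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))

def detect_vehicles(frame, min_neighbors=3):
    """Return bounding boxes of vehicles found inside moving foreground regions."""
    fg = bg.apply(frame)                                    # motion cue: foreground mask
    _, fg = cv2.threshold(fg, 200, 255, cv2.THRESH_BINARY)  # drop shadow pixels
    fg = cv2.morphologyEx(fg, cv2.MORPH_OPEN, kernel)       # suppress speckle noise
    contours, _ = cv2.findContours(fg, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    boxes = []
    for c in contours:
        x, y, w, h = cv2.boundingRect(c)
        if w * h < 100:                                     # skip tiny blobs (assumed threshold)
            continue
        # Shape cue restricted to the blob; minNeighbors plays the role of the
        # "closest neighbor" parameter discussed below.
        hits = cascade.detectMultiScale(gray[y:y + h, x:x + w],
                                        scaleFactor=1.1, minNeighbors=min_neighbors)
        boxes += [(x + dx, y + dy, dw, dh) for (dx, dy, dw, dh) in hits]
    return boxes
```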

To construct a classifier of vehicle shapes, which determines whether a sample of HoG features represents a vehicle or not, a large number of training images is required. Through the training, the classifier learns the patterns of HoG descriptors that can discriminate vehicles from other objects. The AdaBoost classifier is used to train a Haar cascade by grouping the HL from the truly detected vehicles and segregating them from the falsely detected images. During the process, each HL is applied separately to the same vehicle within the region of interest. The Haar-cascade implementation by default uses a parameter called the closest neighbor (37). The value of the closest neighbor specifies the minimum number of overlapping detections required for an object to be declared detected. Whenever this value decreases, more objects are detected, because fewer overlapping detections are required. The output of the detection step is a set of rectangles, one surrounding each vehicle.

Tracking
The detection and matching of each image pixel in consecutive frames to corresponding pixels allows the determination of the position of each vehicle at each time interval, or "timestamp". To minimize computational effort and provide a real-time tracking algorithm, the matched and marked entities are used as input to the existing vision-based tracking algorithms in every frame. Given the automated input of these entities, the tracking methods can be used for real-time tracking without manual re-identification of the vehicle.

The appearance of the vehicle is modeled with the principal components of its eigenimages, which are stored in a temporal subspace. The principal components are repeatedly updated during tracking. The implementation of this tracking algorithm works smoothly with the Haar-cascade detection because the detection establishes a good starting point for the initial eigenimage capture without any subsequent processing. This method is also fundamentally advantageous in detection-tracking systems, as it tracks a memorized object rather than a cluster of identical characteristics denoted by point or line features. In previous work (1), tracking precision was highest when kernel-based methods were implemented; however, occlusion and changes in illumination and scale were considered independently. It is critical, though, to determine not only the number of occluded vehicles but also their behavior throughout the phase of occlusion.

Vehicle Counting and False Detection
The first step is to specify the entry region (ER), the area where detection and tracking begin. The selection of the ER is the most important step, because the precision of each subsequent step depends on this selection. Thus, the ER needs to include all the areas where detection and tracking work best and simultaneously exclude all the stationary objects that can hinder the tracking capability.

For every detected vehicle, the tracking algorithm captures the x and y coordinates of the center of the box that surrounds the vehicle as long as it remains in the ER. Simultaneously, the x-y pairs "draw" the trace of its trajectory; thus, information about the speed and location of the vehicle can be obtained. The field-of-view calibration then enables the conversion of the 2-D image plane to 2-D road coordinates. Whenever the vehicle departs from the ER, a counter placed over the specific lane increments the count by one. This process is repeated for all the lanes.
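To make this step concrete, the sketch below converts a tracked centroid from image to road coordinates using the planar projection of Eq. 1 and increments a per-lane counter when the vehicle leaves the entry region. The calibration values, lane indexing and exit test are illustrative assumptions, not the values used in the experiments.

```python
# Illustrative image-to-road conversion (inverse of the Eq. 1 projection) and
# per-lane exit counting; f, phi, h and ER_EXIT_Y below are assumed values.
import numpy as np

f, phi, h = 800.0, np.radians(12.0), 10.0            # focal length (px), tilt, camera height (m)
P = np.array([[f, 0.0, 0.0],
              [0.0, -f * np.sin(phi), f * h * np.cos(phi)],
              [0.0, np.cos(phi), h * np.sin(phi)]])
P_inv = np.linalg.inv(P)

def image_to_road(u, v):
    """Map an image point (u, v) to road-plane coordinates (x_r, y_r)."""
    xr, yr, w = P_inv @ np.array([u, v, 1.0])
    return xr / w, yr / w

LANE_WIDTH = 3.66                                     # 12 ft in metres
ER_EXIT_Y = 60.0                                      # assumed downstream edge of the ER (m)
lane_counts = {}

def update_count(centroid_px, track_id, counted_ids):
    """Increment the counter of the occupied lane when a tracked vehicle leaves the ER."""
    x, y = image_to_road(*centroid_px)
    lane = int(x // LANE_WIDTH)                       # lane index from the lateral coordinate
    if y > ER_EXIT_Y and track_id not in counted_ids:
        lane_counts[lane] = lane_counts.get(lane, 0) + 1
        counted_ids.add(track_id)
    return lane
```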

One of the predominant constraints of counting in the ER is that various interfering traffic-related and visual conditions may occur within it. These include partial or full-body occlusion, lane changes, closely following cars and visibility issues. The first and most important issue is the recognition of falsely detected vehicles.

For the purpose of identifying false detections, the relationship between average and individual speed versus density is examined. In general, traffic density is inversely proportional to traffic speed (36), meaning that if the flow speed is high, then density is generally low, and vice versa. If this relationship is not met, the detected object is more likely to be a non-vehicle. However, the variability of speed versus density is not linear and cannot be expressed as a single function; therefore, a generalized relationship cannot serve as a precise reference. Also, each lane behaves differently (forthcoming research results). The authors therefore decided to store the local behavior per lane at specific timestamps. Equation 4 describes the average local speed at timestamp t as follows:

$$v_{min,t,l} = \bar{v}_{t,l} = \frac{\sum_{ID=1}^{N} v_{k,t_{end},l}}{N} - v_{th} \quad \text{(Eq. 4)}$$

Additionally, the scale factor serves as a criterion for false detection. One contour box circumscribes each vehicle and is thus assumed to decrease in size at a specific rate as the vehicle moves away from the camera. If the decreasing rate fluctuates, it is likely that the box has slipped, indicating that the detected object is a non-vehicle. The formula in this case is presented in Equation 5.

$$S_t > \frac{S_{avg}}{4} \quad \text{(Eq. 5)}$$
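A minimal sketch of the false-detection test combining the per-lane speed reference of Eq. 4 with the scale-factor criterion of Eq. 5 follows; the two-strike persistence rule it uses is the one described in the next paragraph, and the track attributes and thresholds are assumptions.

```python
# Illustrative false-detection filter: a track accumulates "strikes" whenever it
# violates the lane speed reference (Eq. 4) or the scale-factor criterion (Eq. 5);
# two strikes mark it as a non-vehicle (assumed track object with speed/scale/strikes).
def lane_speed_floor(exit_speeds_in_lane, v_th):
    """Eq. 4: average exit speed of the N vehicles in the lane, minus a tolerance v_th."""
    return sum(exit_speeds_in_lane) / len(exit_speeds_in_lane) - v_th

def violates_criteria(track, v_min, s_avg):
    """True if the track currently violates Eq. 4 or Eq. 5."""
    speed_violation = track.speed < v_min             # slower than the lane reference
    scale_violation = track.scale > s_avg / 4.0       # Eq. 5: abnormal contour-box scale
    return speed_violation or scale_violation

def is_false_detection(track, v_min, s_avg, strike_threshold=2):
    if violates_criteria(track, v_min, s_avg):
        track.strikes += 1
    return track.strikes >= strike_threshold          # finally flagged as a non-vehicle
```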


Finally, both the instantaneous and averaged speed data are affected by the varying frame rate of the video. For this reason, there is a slight probability that a vehicle can be falsely identified as a non-vehicle. To resolve this problem, the tracking of each vehicle includes a counter that is incremented by 1 each time the speed falls within the filter range specified by Equation 4 or 5. If this counter meets or exceeds a pre-determined threshold (in this framework, a constant threshold of 2 is used), the track is finally considered a non-vehicle. This threshold of 2 implies that the acceleration of the vehicle in one frame doubles the instantaneous acceleration of the vehicle in the immediately preceding and following frames. Given that the total duration of 3 consecutive frames cannot exceed 0.1 s with any current technology (i.e., videos with more than 30 frames per second), such an abrupt change in acceleration can be caused only by occlusion. The numeric value of the threshold was selected after performing hierarchical cluster analysis on over two hundred thousand (200,000) abrupt acceleration/deceleration changes.

Occlusion Handling
Occlusion handling is more complicated than the identification of false detections because:

1. There is high variance in speed fluctuation at the instants when the clear view of a vehicle is hindered, as for example in queue-dissipation conditions;
2. Different types of occlusion occur, including occlusion of one car by another in free-flow conditions, multiple occlusions by one car under stop-and-go conditions, and more; and
3. The relationship between the acceleration and the occluding vehicle just before, during and after the occlusion must be accounted for.

Therefore, pattern recognition across different types of occlusion scenarios, as well as their relation to the general traffic movement, is hard to generalize. The clear view of a single vehicle moving at constant speed can be hindered by traffic and vehicle types moving at various speeds in the neighboring lane, anywhere from 0 mph to the design speed (e.g., 75 mph on a highway), depending on the congestion level.

The tracking algorithm presented in (5) offers a unique solution to the phenomenon of a vehicle image "disappearing" from the scene when the vehicle is completely occluded by an adjacent larger vehicle (such as a large truck). The precision of this algorithm, though, depends on the presence of clear trained images of the vehicle that is initialized in the detection phase. The algorithm continuously recaptures and updates the view of this vehicle throughout the vehicle's movement and stores the spatial data in the temporary subspace, while managing to retain accuracy even as the vehicle image is rotated or skewed. However, when the vehicle's clear view is disrupted by the occlusion, tracking based on the past-stored subspace becomes unreliable over time, as the current view departs from the stored spatial view. This essentially causes the tracking of the same vehicle to become unstable, as the tracked element "wobbles" around the region where the object was last successfully tracked.

The proposed method can recognize this wobbling movement as reflected by the abrupt change in the scaling of the tracked area where the algorithm is trying to maintain the subspace data. The following criteria first detect the occluded candidates:

$$1.\quad \frac{S_t}{S_{t-1}} < S_{th} \quad \text{(Eq. 6)}$$

$$2.\quad S_t > \frac{S_{avg}}{4} \quad \text{(Eq. 7)}$$

where $S_{th}$ is the threshold scale factor used for occlusion and $S_{avg}$ is the average scale factor over all tracked objects.

Similar to the identification of false detections, the occlusion-handling framework recognizes each possible occlusion, and a counter for each such tracking event is incremented by 1. If the value exceeds the preset threshold, the system finally recognizes the vehicle as being occluded. Empirical study indicates that the method segregates the occluded data from the rest fairly well, but the result also contains extraneous data from the occluding vehicle or object. The extraneous data misclassified as the occluded vehicle typically shrink in scale value and show extremely high, unrealistic fluctuations in average speed. True occluded vehicles typically shrink in scaling factor gradually and steadily while keeping their average speed fairly constant. Thus, additional processing was added to sort out true occlusions by reinforcing the algorithm with acceleration data.

$$a_{avg,t} < a_{th} \quad \text{(Eq. 8)}$$

The acceleration data are highly unreliable without additional smoothing, due to the varying frame rate; however, they serve effectively as a check on the result produced by the first occlusion-handling step based on the scale factor. This low-pass filter compares the current vehicle acceleration to the threshold. If the criterion is met more than twice, the label is officially registered as an occluded vehicle.
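Putting the scale-factor criteria (Eqs. 6 and 7) and the acceleration low-pass filter (Eq. 8) together, a hedged sketch of the occlusion test might look as follows; the thresholds, attribute names and the two-strike persistence rule are assumptions that mirror the description above.

```python
# Illustrative occlusion test combining Eqs. 6-8: abrupt shrinkage of the tracked
# window, persistence over at least two frames, and a steady (low) average acceleration.
def occlusion_candidate(scale_t, scale_prev, s_th, s_avg):
    """Eqs. 6-7: the window shrinks abruptly relative to its previous size while
    remaining above a quarter of the average scale of all tracked objects."""
    return (scale_t / scale_prev) < s_th and scale_t > s_avg / 4.0

def confirm_occlusion(track, s_th, s_avg, a_th, strike_threshold=2):
    """Two-strike confirmation: Eqs. 6-7 flag candidates, Eq. 8 (a_avg < a_th)
    separates steadily moving true occlusions from wobbling false detections."""
    if occlusion_candidate(track.scale, track.prev_scale, s_th, s_avg):
        track.occlusion_strikes += 1
    steady = track.avg_acceleration < a_th            # Eq. 8
    return track.occlusion_strikes >= strike_threshold and steady
```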

There are four possible occlusion scenarios that can be observed in highway traffic, with numerous subcases for each of them. The main scenarios include:

1. The vehicle is initially visible, then becomes occluded (V-O),
2. The vehicle's view is clear, then becomes occluded, and the vehicle reappears once the occlusion is finished (V-O-V),
3. The vehicle is initially occluded and then appears somewhere in the ER (O-V),
4. The vehicle is initially occluded, becomes visible, and is occluded again (O-V-O), which may not be detected if the vehicle first becomes visible outside of the EZ, and
5. The vehicle is occluded for its entire movement (O), which cannot be detected, tracked or counted.

For scenario 3, occlusion-handling techniques are unnecessary, as the vehicle-count algorithm can add to the counter at the moment the vehicle enters the counting zone. On the contrary, in scenarios 1 and 2, the output may cause under-counting or over-counting. As an example, if the detected vehicle is tracked with the same configuration as before, the result is often under-counting, because the wobbling movement increasingly affects the tracking accuracy; in this scenario, a rapid increase in scaling is observed.

By applying conservation of vehicles, the total number of cars that enter the system equals the total number of cars that exit the system, even under conditions of occlusion. The following paragraphs explain how the neighboring traffic environment, detection training, and lane changes affect the outcome of occlusion for cases 1 (V-O) and 2 (V-O-V).

In scenario 1, the algorithm retains the vehicle's motion trace for use in subsequent frames, given the trained pre-occlusion images that memorize its appearance. It is also assumed that the vehicle moves parallel to the traffic, that its speed remains constant, and that the perpendicular-to-traffic speed component remains within a specific range during the occlusion, which differs from what was assumed in the previous literature. The range of the perpendicular-to-traffic speed varies depending on the actual speed of the car and is calculated based on the 95% precision that needs to be achieved, plus the consideration that the resulting alteration in the x-y speed does not exceed 2 mph. Given the aforementioned, the formula takes the following form (Equation 9):

$$V(\text{occ. car}) = V_a \pm 2, \quad \text{and} \quad V_a - 1.96\,\sigma \le V(\text{occ. car}) \le V_a + 1.96\,\sigma, \quad \text{(Eq. 9)}$$

where $V_a$ is the average speed.
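As a small illustration of how Eq. 9 can be used when a vehicle re-appears after occlusion, the sketch below accepts a re-detection only if its speed stays within the 2 mph band and the 95% band around the average speed of the surrounding vehicles; taking σ as the standard deviation of the neighbouring speeds is an assumption of this sketch.

```python
# Illustrative Eq. 9 check for accepting a vehicle that re-appears after occlusion.
def plausible_reappearance(v_candidate, neighbour_speeds):
    v_a = sum(neighbour_speeds) / len(neighbour_speeds)              # average neighbour speed
    sigma = (sum((v - v_a) ** 2 for v in neighbour_speeds)
             / len(neighbour_speeds)) ** 0.5                         # assumed spread term
    within_2mph = abs(v_candidate - v_a) <= 2.0                      # V(occ. car) = V_a +/- 2
    within_95 = (v_a - 1.96 * sigma) <= v_candidate <= (v_a + 1.96 * sigma)
    return within_2mph and within_95
```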

The average speed, $V_a$, is based on the cumulative averaged speed of the cars that surround the occluded vehicle, using the minimum neighbor value (MNV) discussed earlier. The lower the minimum-neighbor value, the better the precision, because it is directly related to the occlusion and the possibility of lane changing.

RESULTS AND VERIFICATION
The proposed method was implemented in a prototype created with Microsoft Visual C#. This prototype uses OpenCV as its main image processing library and EmguCV as a .NET wrapper for OpenCV. The team recorded 350 pre-occlusion vehicle images (640 by 480 pixels) from forty-five videos recorded by the CCTV cameras of the Georgia Department of Transportation on I-85 to "train" the algorithm. To check the accuracy of the method in different situations, pre-occlusion images were obtained under different weather and lighting conditions as well as under various neighboring and traffic conditions.

Using these images, the uniqueness threshold was varied to find the value that detects and reliably matches the HoG of the vehicle to be occluded at eight or more points. The threshold value of 0.0625 was the smallest distance value that achieved the required number of matching points; therefore, the threshold was set to 0.0625. Under this threshold, the average error is 1 pixel. After calibrating the uniqueness threshold, the accuracy of the method was tested under the aforementioned varying conditions; however, since the objects were captured using the CCTV auto-focus cameras, the method was not tested on poorly focused images. The camera was calibrated using the MAPSAC algorithm (32). After camera calibration, information regarding detection, tracking, vehicle counting and occlusion handling was extracted. The output data include direct and indirect information regarding:

1. The number of vehicles per lane,
2. The number of cars per lane,
3. The number of lane changes by lane of origin and the exact location where each occurs,
4. Speed data for all the vehicles, and
5. The number of occluded vehicles and the location of their occlusion.

To verify the robustness of the proposed methodology, the output counting data were compared to manual counts of the same videos. The performance of the system for each video is evaluated with the following testing parameters:

$$1.\ \text{Correct Counting Rate (CCR)} = \frac{\text{Counting data that are vehicles}}{\text{Manual counting data}}$$

$$2.\ \text{False Counting Rate (FCR)} = \frac{\text{Counting data that are non-vehicles}}{\text{Total counting data}}$$

$$3.\ \text{Removal Rate (RR)} = \frac{\text{Falsely detected objects removed from counting}}{\text{Total falsely detected objects}}$$


$$4.\ \text{Occlusion Handling Rate (OHR)} = \frac{\text{Correctly handled occlusions}}{\text{Total detected occlusions}}$$
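Read literally, the four rates are simple ratios; the sketch below expresses them as percentages and does not attempt to reproduce the exact bookkeeping behind Table 1.

```python
# Straightforward reading of the four evaluation rates, expressed as percentages.
def ccr(vehicle_counts, manual_counts):
    return 100.0 * vehicle_counts / manual_counts                      # Correct Counting Rate

def fcr(non_vehicle_counts, total_counts):
    return 100.0 * non_vehicle_counts / total_counts                   # False Counting Rate

def rr(false_detections_removed, total_false_detections):
    return 100.0 * false_detections_removed / total_false_detections   # Removal Rate

def ohr(correctly_handled_occlusions, detected_occlusions):
    return 100.0 * correctly_handled_occlusions / detected_occlusions  # Occlusion Handling Rate
```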

The aggregated results of the comparison over the forty-five 30-minute videos, as well as the independent variables and measures of effectiveness, are presented in Table 1 for MNV = 2 (and in parentheses for MNV = 5). The results from each video sample are also compared in terms of observed environmental variables such as illumination variance, congestion level and the difference in camera angles (θ and φ, the horizontal and vertical angles respectively). The most significant results are the total number of automated counts and the correct versus false occlusion counts, which validate the performance of the algorithm.

TABLE 1: Counting Data for Minimum Neighbor = 2 (in parentheses, MN = 5)

Lane Number                                | 0             | 1             | 2             | 3             | 4             | Total
True (manual) Counts                       | 380           | 562           | 421           | 512           | 301           | 2176
Automated Counts                           | 377 (352)     | 554 (511)     | 413 (408)     | 500 (469)     | 299 (289)     | 2142 (2028)
Over-counts: Double Count                  | 0 (0)         | 4 (4)         | 2 (2)         | 1 (1)         | 0 (2)         | 7 (9)
Over-counts: False Count                   | 0 (0)         | 2 (2)         | 4 (4)         | 0 (1)         | 1 (1)         | 7 (8)
Over-counts: Falsely Added to the Counter  | 0 (0)         | 0 (0)         | 0 (0)         | 0 (3)         | 0             | 0 (3)
Undercounts: Falsely Removed by Filter     | 2 (3)         | 9 (11)        | 8 (10)        | 3 (2)         | 7 (7)         | 29 (33)
False Occlusion (Correct Counts)           | 0 (0)         | 2 (3)         | 1 (3)         | 0 (1)         | 0 (0)         | 3 (7)
Correct Occlusion Counts                   | 14 (14)       | 21 (20)       | 8 (8)         | 1 (0)         | 0 (0)         | 44 (42)
Total Correct Vehicle                      | 377 (352)     | 548 (505)     | 412 (402)     | 500 (464)     | 297 (286)     | 2134 (2009)
Total False Count                          | 0 (0)         | 6 (6)         | 6 (6)         | 1 (5)         | 1 (3)         | 14 (20)
Total Missed Vehicle                       | 3 (26)        | 14 (57)       | 9 (19)        | 12 (48)       | 4 (15)        | 42 (165)
Total Removed                              | 8 (10)        | 25 (31)       | 44 (41)       | 46 (39)       | 10 (12)       | 133 (133)
Total Correctly Added by OH                | 14 (14)       | 23 (23)       | 9 (11)        | 1 (1)         | 0 (0)         | 44 (46)
CCR (%)                                    | 98.95 (92.63) | 97.69 (89.86) | 97.86 (95.49) | 97.66 (90.63) | 98.67 (95.02) | 98.29 (92.73)
FCR (%)                                    | 0.0 (0.0)     | 1.1 (1.2)     | 1.46 (1.49)   | 0.2 (1.08)    | 0.3 (1.05)    | 0.612 (0.964)
RR (%)                                     | 0.0 (0.0)     | 96.2 (91.4)   | 99.6 (96.2)   | 99.7 (93.4)   | 99.6 (93.3)   | 98.9 (93.6)
OHR (%)                                    | 100 (100)     | 100 (100)     | 97.1 (92.3)   | 100 (100)     | NA (NA)       | 99.42 (98.46)


In Table 1, it is notable that the accuracy of the occlusion-handling algorithm ranges from 97.1% to 100% in the case of MN = 2, while the accuracy drops slightly in the case of MN = 5, given the higher number of potential matching elements within the image. It is also clear that the RR has been significantly improved; therefore, the number of over-counts and under-counts is greatly reduced.

LIMITATIONS
There are three main limitations to the occlusion-handling methods developed in this work: 1) speed variability under various congestion levels causes errors, 2) the spillover effect, and 3) the propagation of errors within vehicle platoons.

Regarding speed variability, the initial assumption is that drivers' characteristics approach a normal distribution; however, this assumption does not adequately account for timid and aggressive drivers. Also, whenever the sight distance is greater (i.e., in free flow), the error is greater, because the potential acceleration or deceleration of the vehicle cannot be estimated precisely while it is occluded. On the other hand, the average difference in speed in the aforementioned case should not exceed two miles per hour, with the speed defined by the geometric position of the occluded car with respect to the camera, given that the assumed average lengths of a car and a truck are 18 ft and 40 ft respectively. Finally, the same issue is observed when a series of vehicles is not successfully detected by the system: the stored reference speed fails to cope with the abrupt change in the speed differentials.

Most issues arise when the detection fails to capture the vehicle's entire view; thus, the occlusion does not match any of the major categories described above. In this case, vehicle counts are often misplaced into adjacent lanes. This is defined as the spillover effect and results from:

1. Partial detection of the moving object: the centroid, which corresponds to the x-y pair of the vehicle's location at every timestamp, is shifted from the center of the lane and appears in a different lane; in that lane, lane changes cannot be observed.
2. Counting zones placed in the same transition zone: false vehicle counts are added to the adjacent lane; therefore, the calculation of density is not precise.

Finally, when a vehicle platoon is observed, the probability of error in occlusion handling increases because of the speed fluctuation (stop-and-go). In particular, a potential error in the detection of occlusion for one vehicle propagates to the vehicles that follow behind in the same lane, causing a series of errors.

CONCLUSION AND RECOMMENDATIONS
The purpose of this paper is to open a new field of research by introducing an automated, reliable and robust framework for full-body occlusion handling of vehicles using monocular CCTV cameras mounted along the Georgia Interstate System. A novel and completely automated approach for vehicle detection is also introduced as part of the process. The benefits of the proposed framework are summarized below:

1. The precision in detecting and tracking occlusion cases, in conjunction with the reliability of the removal algorithm, leads to a cutting-edge, automated and zero-cost method for analyzing drivers' behavior and extracting significant information related to the impact of on-ramps and off-ramps, HOV-to-HOT lanes and toll stations under different traffic flow, density and demand levels throughout the day.


2. The accuracy of the output is the same regardless of the scale of the experiment. The data collection process is zero-cost, since the new software can be applied to the existing CCTV cameras.

3. When the trained classifier is relatively weak, an increase in the area that the ER covers leads to greater detection rates.

4. Recognition of false positives is straightforward; thus, the vehicle counting output is precise, and the performance requirements of the classifier, based on the region-of-interest curve, can be loosened to allow for higher detection rates.

5. Occlusion detection can also be tested and utilized in order to add missing vehicles to the overall count.

Recommendations for future work include, but are not limited to, the definition of sub-components of the algorithm to deal with more complicated movements. For example, the occlusion of one vehicle by another in a transit hub or a crossing area may require adaptation. With respect to image processing, further assessment of the potential use of the proposed methods with cheaper cameras with lower resolution or poor video quality must be examined. The reliability of the proposed framework may be further improved by reinforcing the filtering operation with more advanced microscopic traffic theory in the aforementioned challenging environments. These propositions will serve as a significant component in the dissemination of this research.

ACKNOWLEDGEMENTS
This material is based upon work supported by the Georgia Department of Transportation. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the Georgia Department of Transportation.

REFERENCES
[1] Park, M. W., E. Palinginis, and I. Brilakis. Detection of Construction Workers in Video Frames for Automated Initialization of Vision Trackers. Presented at the CRC Annual Meeting, 2012.
[2] Brilakis, I., and M. W. Park. Automated Vision Tracking of Project Related Entities. Advances and Challenges in Computing in Civil and Building Engineering, Vol. 25, No. 4, 2011, pp. 713-724.
[3] Sankari, M., and C. Meena. Estimation of Dynamic Background and Object Detection in Noisy Visual Surveillance. International Journal of Advanced Computer Science & Applications, Vol. 2, No. 6, 2011, pp. 77-83.
[4] Zhang, L., S. Z. Liu, X. Yuan, and S. Xiang. Real-Time Object Classification in Video Surveillance Based on Appearance Learning. Presented at the IEEE Conference on Computer Vision and Pattern Recognition, 2007.
[5] Avery, R. P., G. Zhang, Y. Wang, and N. L. Nihan. An Investigation into Shadow Removal from Traffic Images. In Transportation Research Record: Journal of the Transportation Research Board, No. 1, Transportation Research Board of the National Academies, Washington, D.C., 2000, pp. 70-77.
[6] Chaiyawatana, N., B. Uyyanonvara, T. Kondo, and P. Dubey. Robust Object Detection on Video Surveillance. Presented at the 8th International Joint Conference on Computer Science and Software Engineering, IEEE, 2011.


[7] Bas, E., A. M. Tekalp, and F. S. Salman. Automated Vehicle Counting from Video for Traffic Flow Analysis. Presented at the Intelligent Vehicles Symposium, IEEE, 2007.
[8] Tsai, L.-W., Y. C. Chean, C. P. Ho, H. Z. Gu, and S. Y. Lee. Lane Detection and Road Traffic Congestion Classification for Intelligent Transportation Systems. Presented at the 3rd International Conference on Machine Learning and Computing, 2011.
[9] Crnojevic, V., B. Antic, and D. Culibrk. Optimal Wavelet Differencing Method for Robust Motion Detection. Presented at the 16th International Conference on Image Processing, 2009.
[10] Elgammal, A., R. Duraiswami, D. Harwood, and L. S. Davis. Background and Foreground Modeling Using Nonparametric Kernel Density Estimation for Visual Surveillance. Proceedings of the IEEE, Vol. 90, No. 7, 2002, pp. 1151-1163.
[11] Sheikh, Y., and M. Shah. Bayesian Modeling of Dynamic Scenes for Object Detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 27, No. 11, 2005, pp. 1778-1792.
[12] Vargas, M., J. M. Milla, S. L. Toral, and F. Barrero. An Enhanced Background Estimation Algorithm for Vehicle Detection in Urban Traffic Scenes. IEEE Transactions on Vehicular Technology, Vol. 59, No. 8, 2010, pp. 3694-3709.
[13] Robert, K. Video-Based Traffic Monitoring at Day and Night: Vehicle Features Detection and Tracking. Presented at the 12th International IEEE Conference on Intelligent Transportation Systems, 2009.
[14] Tsuchiya, M., and H. Fujiyoshi. Evaluating Feature Importance for Object Classification in Visual Surveillance. Presented at the 18th International Conference on Pattern Recognition, Vol. 2, IEEE, 2006.
[15] Zhang, G., R. P. Avery, and Y. Wang. Video-Based Vehicle Detection and Classification System for Real-Time Traffic Data Collection Using Uncalibrated Video Cameras. In Transportation Research Record: Journal of the Transportation Research Board, No. 1, Transportation Research Board of the National Academies, Washington, D.C., 2007, pp. 138-147.
[16] Malinovskiy, Y., Y. J. Wu, and Y. Wang. Video-Based Vehicle Detection and Tracking Using Spatiotemporal Maps. In Transportation Research Record: Journal of the Transportation Research Board, No. 2121, Transportation Research Board of the National Academies, Washington, D.C., 2009, pp. 81-89.
[17] Liu, T., N. Zheng, L. Zhao, and H. Cheng. Learning Based Symmetric Features Selection for Vehicle Detection. Presented at the Intelligent Vehicles Symposium, IEEE, 2005.
[18] Cabrera, R. R., T. Tuytelaars, and L. Van Gool. Efficient Multi-Camera Detection, Tracking, and Identification Using a Shared Set of Haar-Features. Presented at the Conference on Computer Vision and Pattern Recognition, IEEE, 2011.
[19] Feris, R., J. Petterson, B. Siddiquie, L. Brown, and S. Pankanti. Large-Scale Vehicle Detection in Challenging Urban Surveillance Environments. Presented at the Workshop on Applications of Computer Vision, IEEE, 2011.
[20] Enzweiler, M., and D. M. Gavrila. Monocular Pedestrian Detection: Survey and Experiments. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 31, No. 12, 2009, pp. 2179-2195.
[21] Wen, X., H. Yuan, W. Liu, and H. Zhao. An Improved Wavelet Feature Extraction Approach for Vehicle Detection. Presented at the International Conference on Vehicular Electronics and Safety, IEEE, 2007.


[22] Chang, W. C., and C. W. Cho. Multi-Class Boosting with Color-Based Haar-Like Features. Presented at the 3rd International IEEE Conference on Signal-Image Technologies and Internet-Based Systems, IEEE, 2007.
[23] Negri, P., X. Clady, and L. Prevost. Benchmarking Haar and Histograms of Oriented Gradients Features Applied to Vehicle Detection. Proceedings of the 4th International Conference on Informatics in Control, Automation and Robotics, 2007.
[24] Tamersoy, B., and J. Aggarwal. Robust Vehicle Detection for Tracking in Highway Surveillance Videos Using Unsupervised Learning. Presented at the 6th International Conference on Video and Signal Based Surveillance, IEEE, 2009.
[25] Huang, L. Real-Time Multi-Vehicle Detection and Sub-Feature Based Tracking for Traffic Surveillance Systems. Presented at the 2nd International Asia Conference on Informatics in Control, Automation and Robotics, IEEE, 2010.
[26] Hsieh, J. W., S. H. Yu, Y. S. Chen, and W. F. Hu. Automatic Traffic Surveillance System for Vehicle Tracking and Classification. IEEE Transactions on Intelligent Transportation Systems, Vol. 7, No. 2, 2006, pp. 175-187.
[27] Ritchie, D. Georgia's Regional Signal Operations Program, 2012.
[28] Kanhere, N. K., and S. T. Birchfield. A Taxonomy and Analysis of Camera Calibration Methods for Traffic Monitoring Applications. IEEE Transactions on Intelligent Transportation Systems, Vol. 11, No. 2, 2010, pp. 441-452.
[29] Mehmood, R., M. U. Ali, and I. A. Taj. Applying Centroid Based Adjustment to Kernel Based Object Tracking for Improving Localization. Presented at the International Conference on Information and Communication Technologies, IEEE, 2009.
[30] Yilmaz, A., O. Javed, and M. Shah. Object Tracking: A Survey. ACM Computing Surveys, Vol. 38, 2006, pp. 4-13.
[31] Ross, D., J. Lim, R. S. Lin, and M. H. Yang. Incremental Learning for Robust Visual Tracking. International Journal of Computer Vision, Vol. 77, No. 1, 2008, pp. 125-141.
[32] Torr, P. H. S. Bayesian Model Estimation and Selection for Epipolar Geometry and Generic Manifold Fitting. International Journal of Computer Vision, Vol. 50, No. 1, 2002, pp. 35-61.
[33] Cheung, S., and C. Kamath. Robust Techniques for Background Subtraction in Urban Traffic Video. In Proceedings of Video Communications and Image Processing, SPIE Electronic Imaging, 2004, pp. 881-892.
[34] Guiducci, A. Camera Calibration for Road Applications. Computer Vision and Image Understanding, Vol. 79, No. 2, 2000, pp. 250-266.
[35] Wohler, C., and J. K. Anlauf. An Adaptable Time-Delay Neural-Network Algorithm for Image Sequence Analysis. IEEE Transactions on Neural Networks, Vol. 10, No. 6, 1999, pp. 1531-1536.
[36] Roess, R. P., E. S. Prassas, and W. R. McShane. Traffic Engineering, 4th Edition. Pearson Education Inc., New Jersey, 2010.
[37] Haralick, R. M., K. Shanmugam, and I. Dinstein. Textural Features for Image Classification. IEEE Transactions on Systems, Man, and Cybernetics, Vol. 3, No. 6, 1973, pp. 610-621.
