
Robust Template Matching Based Obstacle Tracking For Autonomous Rovers

Prabhakar Mishra†, J.K.Kishore††
†Department of Telecommunication Engineering, PES Institute of Technology, Bangalore, India
[email protected]
††Indian Space Research Organization, Bangalore, India

Rakshith Shetty, Akshay Malhotra, Rahul S Kukreja
Department of Electronics & Communication Engineering, PES Institute of Technology, Bangalore, India
[email protected]

Abstract—Detection and tracking of moving obstacles is central to collision free navigation of autonomous rovers in dynamic environments. Template matching based methods for obstacle tracking have been proposed in the literature. These methods have limitations in tracking dynamic obstacles owing to scale and rotation variations, which arise from the relative velocity between the rover and the obstacle being tracked. Due to this relative velocity, the correlation between the template of the obstacle and the region of the image corresponding to the obstacle degrades in successive frames. In this paper we present three schemes targeted at improving the robustness of template matching based tracking using monocular vision. The algorithms presented account for the geometric constraints imposed on an image captured from a fixed camera mounted on a mobile platform. We present experimental results comparing the performance of our technique with existing template matching based techniques. We also present two real-time applications, an object following behavior and an obstacle avoidance behavior, to demonstrate its efficacy and computational feasibility.

Keywords: Autonomous Rovers, Detection and Tracking of dynamic obstacles, Monocular vision, Template Matching.

I. INTRODUCTION

Autonomous navigation in unknown environments must account for the presence of dynamic obstacles. Detection and tracking of dynamic obstacles is central to the design and implementation of a control system which ensures collision free navigation. Vision based techniques have been applied for detection and tracking of dynamic obstacles [1]. Detection and tracking of dynamic obstacles from a stationary observer has the advantage of characterizing the background using adaptive methods [2] and using background elimination to detect and track moving obstacles. Background subtraction techniques have been proposed in the literature for simplifying the process of object detection and tracking [3]. Detection and tracking of dynamic obstacles from a moving observer is a non-trivial problem owing to relative motion between the observer and the objects being tracked. Moving object detection based on optical flow is commonly used in vision based navigation techniques [4], wherein the motion vectors obtained from the optical flow are used to steer the rover. The mean-shift technique [5] compares a color histogram of the object to be tracked with the histogram of the candidate object in the successive image. Furthermore, techniques such as Speeded Up Robust Features (SURF) [6] and the Scale Invariant Feature Transform (SIFT) [7], in combination with the Harris detector [8], have been proposed to identify and track features in the image. These techniques are computationally expensive and work well in scenarios where a rich set of features can be extracted in consecutive frames.

Template matching [9] based schemes have been reported in the literature which use templates of objects, created by a separate system, for matching in consecutive frames in order to track them. These are computationally inexpensive techniques which work well for regularly shaped objects. However, when template matching is used to track objects from a camera mounted on a moving platform such as a rover, the performance of the algorithm is unsatisfactory due to relative motion between the vision sensor and the object being tracked. The apparent image of the object in subsequent frames is subject to scaling and rotation, which significantly degrades the performance of template matching and leads to loss of tracking.

In this paper, we present an enhanced template matching scheme which is robust to scaling and can handle slow rotational motion. We use three schemes to achieve this: a) Template Resizing, b) Template Correction and c) Color Correlation. The main idea is to exploit the geometric relationship that exists between an object and its representation in the image to estimate the variations in scale and rotation, and then correct the template for these variations before the process of template matching. The advantage of this scale corrected template matching over methods like SIFT is that it works well even on homogeneously colored objects, where SIFT suffers due to the lack of features.

II. OVERVIEW OF THE PROPOSED TRACKING SYSTEM

The tracking system consists of two distinct modules. The first module, an object detector, detects any new moving object within the visual scope of the camera and stores a snapshot of this object, which is referred to as its "Template". This module returns the template of the object, its position in the frame it was initially detected in, and its size. The second module, the tracking module, takes the output of the first module to track the object in subsequent frames. This module outputs the current position of the object and its velocity vector, consisting of the x and y components. In the tracking module, a Kalman filter [10] based approach is used to predict the object's position. For the prediction step, a linear discrete time model is used. Using this model along with the velocities of the object and its current position, the position of the object in the next frame is predicted. In the subsequent frame, in a window around the predicted position, we use template matching to measure the position of the object. The Kalman filter, in its update phase, combines the measured and the predicted positions to obtain noise filtered estimates of the object's position. In this paper we restrict the discussion to the measurement part of the tracking module.
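To make the predict-measure-update cycle concrete, the following is a minimal sketch of a constant-velocity Kalman tracker in Python. The state layout, noise values and function names are our own illustrative assumptions; the paper specifies only that a linear discrete time model is used.

```python
import numpy as np

# State: [x, y, vx, vy] with a constant-velocity discrete time model (assumed form).
dt = 1.0  # time step of one frame
F = np.array([[1, 0, dt, 0],
              [0, 1, 0, dt],
              [0, 0, 1,  0],
              [0, 0, 0,  1]], dtype=float)   # state transition
H = np.array([[1, 0, 0, 0],
              [0, 1, 0, 0]], dtype=float)    # template matching measures position only
Q = np.eye(4) * 1e-2    # process noise covariance (tuning assumption)
R = np.eye(2) * 4.0     # measurement noise covariance (tuning assumption)

def predict(x, P):
    """Predict the object's state and covariance one frame ahead."""
    return F @ x, F @ P @ F.T + Q

def update(x, P, z):
    """Fuse the measured position z = [px, py] with the prediction."""
    y = z - H @ x                        # innovation
    S = H @ P @ H.T + R                  # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)       # Kalman gain
    return x + K @ y, (np.eye(4) - K @ H) @ P
```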

III. TEMPLATE MATCHING

A normalized cross correlation based template matching technique has been used as the measurement scheme. The object's template and the rectangular window in the image centered at the predicted position form the two inputs to the correlation system. The equations for normalized cross correlation are as follows:

R(x,y) = \frac{\sum_{x',y'} T'(x',y') \, I'(x+x', y+y')}{\sqrt{\sum_{x',y'} T'(x',y')^2 \cdot \sum_{x',y'} I'(x+x', y+y')^2}}   (1)

T'(x',y') = T(x',y') - \frac{1}{w h} \sum_{x'',y''} T(x'',y'')   (2)

I'(x+x', y+y') = I(x+x', y+y') - \frac{1}{w h} \sum_{x'',y''} I(x+x'', y+y'')   (3)

where T is the template of the object to be matched, T' the zero mean template, I the search window in the image, I' the zero mean search window, and R the resultant matrix.

The position of the maxima of the correlation output is taken as the position of the object. The search window is a rectangular window centered at the predicted position of the object. Its dimensions are taken as twice the dimensions of the object, to trade off computational cost against the probability of correct measurement.
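In practice, equations (1)-(3) correspond to OpenCV's zero-mean normalized correlation (cv2.TM_CCOEFF_NORMED). A minimal sketch of the measurement step, cropping a search window of twice the object's dimensions around the predicted position, is given below; variable names are illustrative.

```python
import cv2

def measure_position(frame, template, pred_x, pred_y):
    """Locate the template near the predicted center (pred_x, pred_y)."""
    th, tw = template.shape[:2]
    # Search window of size 2*tw x 2*th centered on the prediction, clipped to the frame.
    x0, y0 = max(0, pred_x - tw), max(0, pred_y - th)
    x1, y1 = min(frame.shape[1], pred_x + tw), min(frame.shape[0], pred_y + th)
    window = frame[y0:y1, x0:x1]
    # Zero-mean normalized cross correlation, as in equations (1)-(3).
    result = cv2.matchTemplate(window, template, cv2.TM_CCOEFF_NORMED)
    _, peak_val, _, peak_loc = cv2.minMaxLoc(result)
    # Position of the maxima, converted back to frame coordinates.
    return (x0 + peak_loc[0], y0 + peak_loc[1]), peak_val
```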

This method works well as long as the object's representation in new frames is similar in size to the initial template. However, even with a slight change in the size of the object in new images, the correlation peak starts dipping rapidly and soon gets submerged in the noise floor. This leads to false detections, due to which the tracking mechanism breaks down, causing the rover to react erroneously. Hence both of these problems, i.e. the rapid decrease in the peak value of the correlation output and false peak detection, need to be addressed.

A. Template Resizing

One way to avoid errors due to the scaling of object size in subsequent frames would be to resize the template by a scaling factor, so that it mirrors the change in the apparent size of the object in the image. To calculate the scaling factor, we have to quantify the geometric constraints involved. The geometric constraints in this case are imposed because the camera projects a 3D space onto a 2D plane, thereby making the area of an object's projection a function of its position. In order to empirically obtain this function, only those parts of the image are considered which correspond to the floor or the surface in the real world. This is done assuming that the majority of the objects tracked by the system will be in direct contact with the ground plane. With this constraint, two key observations can be made:

a) The real world coordinate x is directly proportional to the image coordinate p along the P-axis, as shown in figure 1. The proportionality constant, α, is a function of the image co-ordinate q along the Q-axis, as given in equation (4).

x = \alpha(q) \cdot p   (4)

b) All lines on the floor in the real world that are parallel to the camera axis, when projected onto the image, converge at one point (p_0, q_0) in the image, as shown in figure 1. Now, given a template obtained at initial co-ordinates (p_T, q_T), and a point of interest (p, q) where the template has to be matched, we arrive at equations (5) and (6), which describe the width and height to which the template should be scaled. Essentially, this is the expected size of the object in the image at the co-ordinates (p, q), given that the template of size (w, h) was obtained at the initial co-ordinates (p_T, q_T). It is to be noted that (p_T, q_T) and (p, q) are the bottom left corner co-ordinates of the template, not its center's co-ordinates.

w_k = w \cdot \frac{q_0 - q}{q_0 - q_T}   (5)

h_k = h \cdot \frac{q_0 - q}{q_0 - q_T}   (6)

where (w, h) is the size of the template obtained at (p_T, q_T), (w_k, h_k) is the expected size of the template at (p, q), and (p_0, q_0) is the point of convergence (vanishing point) of the lines parallel to the camera axis.
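A helper along the following lines evaluates equations (5) and (6). The linear scaling towards the vanishing point is our reading of observations (a) and (b); the vanishing point row q0 would be calibrated offline for the fixed camera, and all names are illustrative.

```python
def expected_size(w, h, q_T, q, q0):
    """Expected template size at row q, for a template of size (w, h) whose
    bottom edge was at row q_T when captured; q0 is the calibrated row of the
    vanishing point (assumption: consistent image row coordinates throughout)."""
    s = (q0 - q) / (q0 - q_T)            # scale factor from equations (5)-(6)
    return max(1, round(w * s)), max(1, round(h * s))
```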

2

Page 3: [IEEE 2013 IEEE International Conference on Electronics, Computing and Communication Technologies (CONECCT) - Bangalore, India (2013.01.17-2013.01.19)] 2013 IEEE International Conference

1 2 3 4 5 6 7 8 91011121314151617181920212223242526272829303132333435363738394041424344454647484950515253545556576061

Figure 1. Illustration of the resizing process. A template of width w at q_T gets scaled to w_k at q. The bold lines are parallel in the real world.

Using equations (5) and (6), the template can be resized to its expected size as per its y co-ordinate in the image. Resizing can be applied before the process of correlation to obtain a significant improvement in matching accuracy. This is done by starting the correlation at the bottom of the window. As the q value increases, the template T is resized using the new value q as the modified co-ordinate of the bottom of the template. Thus we arrive at the following equation:

R(x,y) = \frac{\sum_{x',y'} T'_q(x',y') \, I'(x+x', y+y')}{\sqrt{\sum_{x',y'} T'_q(x',y')^2 \cdot \sum_{x',y'} I'(x+x', y+y')^2}}   (7)

where T_q is the template resized, per equations (5) and (6), for a bottom edge at row q, and T'_q is its zero mean form. Equation (7) can be modeled along the lines of equation (1) to obtain normalized cross correlation with resizing. To improve computational performance, we apply resizing only for every n pixels of change in the y-index. Using regression over different values of n, we obtain an optimal value of n = 10.
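A sketch of the resizing based correlation with step-size n follows: the search window is scanned from the bottom upwards, and for each band of n rows the template is resized once (using the expected_size helper sketched above) and matched against placements whose bottom edge falls inside that band. This decomposition is our own; the paper describes the behavior but not an implementation.

```python
import cv2

def resizing_correlation(window, template, q_T, q_bottom, q0, n=10):
    """Normalized cross correlation with a row-dependent resized template.
    q_bottom is the image row of the bottom of the search window."""
    h_win, w_win = window.shape[:2]
    h0, w0 = template.shape[:2]
    best_val, best_pos = -2.0, (0, 0)
    for band_bottom in range(h_win - 1, 0, -n):
        q = q_bottom - (h_win - 1 - band_bottom)   # image row of this band's bottom
        w_k, h_k = expected_size(w0, h0, q_T, q, q0)
        if w_k > w_win:
            continue                               # resized template wider than window
        resized = cv2.resize(template, (w_k, h_k))
        # Placements whose bottom edge lies inside this n-row band.
        y_top = band_bottom - h_k + 1              # top-left row for bottom at band_bottom
        if y_top < 0:
            continue
        y_lo = max(0, y_top - n + 1)
        strip = window[y_lo:band_bottom + 1, :]
        res = cv2.matchTemplate(strip, resized, cv2.TM_CCOEFF_NORMED)
        _, val, _, loc = cv2.minMaxLoc(res)
        if val > best_val:
            best_val, best_pos = val, (loc[0], y_lo + loc[1])
    return best_pos, best_val
```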

To experimentally verify the efficacy of resizing based correlation, a video segment of a single object moving away from the rover is considered. Figure 2 contains some snapshots of the video segment. The initial template of the object was selected manually, and tracking results from both methods were obtained. Variations of the resizing algorithm, namely setting the step-size to 10 and resizing only once per correlation at the bottom of the window, were also evaluated on the same video segment. The results are shown in figure 3, where the maximum value of each correlation output is plotted against the frame number. As can be seen from the figure, correlation without resizing degrades rapidly in successive frames (object moving away), whereas correlation with resizing is robust to this scaling and the peak degrades only slightly. Another point to be noted is that a step-size of 10 is almost as good as resizing at every step. Hence we adopt the step-size of 10, with its added benefit of reduced computational complexity.

Figure 2. Sequence of images showing the rover tracking a moving object. Image order from (a) to (d).

Figure 3. Graph comparing the normalised cross correlation, correlation with resizing and the optimized correlation techniques: peak value of the correlation versus time instants, without resizing, with 1-step resizing, with 10-step resizing, and with only one resizing.

B. Template Correction

When the object's motion vectors have a significant component perpendicular to the camera axis or when the object is rotating, the maximum correlation value dips suddenly and drastically as the apparent image of the object changes with respect to the original template captured.

To overcome this problem, we use template correction, where the performance of matching is tracked over time and the template of the object is changed when there is significant degradation. We track the peak value of the correlation and capture a new template when the change in this peak value between any two successive frames is above a threshold, or when the peak correlation value falls below a threshold, as shown in equation (8).

|\mathrm{peak}(i+1) - \mathrm{peak}(i)| > \Omega \;\; || \;\; \mathrm{peak}(i) < \mu   (8)

where peak(i) = \max_{x,y} R_i(x,y) is the correlation peak in frame i, Ω limits the rate of degradation of the correlation, and μ is the lower limit on the peak correlation value.

These two values have been tuned so as to detect the degradation in template matching early, in order to correct the template before we lose track of the object. New templates are captured by applying our object detection algorithm only around the search window. Figure 4 shows template correction being applied at two different points during a single tracking process. As seen from the figure, there is significant improvement in the templates leading to more accurate tracking.
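The check in equation (8) reduces to a few lines; the actual values of Ω and μ were tuned empirically by the authors, so the constants below are placeholders.

```python
OMEGA = 0.15   # limit on the frame-to-frame change in the peak (placeholder)
MU = 0.55      # lower limit on the peak correlation value (placeholder)

def needs_new_template(prev_peak, cur_peak, omega=OMEGA, mu=MU):
    """Equation (8): trigger a template re-capture when the correlation peak
    degrades too quickly between successive frames or falls below the floor mu."""
    return abs(cur_peak - prev_peak) > omega or cur_peak < mu
```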

Figure 4. Template change mechanism in action. Templates are changed, using differencing, to obtain a better template and counter the peak degradation.

C. Three Channel Correlation

As a further enhancement, we implemented color correlation, i.e. correlation is performed on all three channels (RGB) and the product of the three correlation outputs is taken as the final output. This reduces the value of the maxima, but the noise filtering achieved is more pronounced. Figure 5 compares the performance of grey scale correlation with color correlation. Color based correlation helps reduce the number of false peaks that appear in the correlation output and thereby reduces the probability of error. A comparison of the number of false peaks between three-channel correlation and grey scale correlation can be found in figure 6. As is evident from the graph, the number of false peaks above the threshold is reduced significantly with three channel correlation, which reduces the possibility of false detection.
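A sketch of the three channel scheme: each RGB channel is correlated separately and the three correlation surfaces are multiplied elementwise, so a location must score well in every channel to survive as a peak. Clipping negatives before the product is our own guard against sign artifacts in TM_CCOEFF_NORMED outputs.

```python
import cv2
import numpy as np

def color_correlation(window, template):
    """Per-channel zero-mean NCC combined by elementwise product."""
    product = None
    for c in range(3):  # OpenCV stores channels as B, G, R
        res = cv2.matchTemplate(window[:, :, c], template[:, :, c],
                                cv2.TM_CCOEFF_NORMED)
        res = np.clip(res, 0.0, 1.0)   # guard: avoid sign flips in the product
        product = res if product is None else product * res
    return product  # maxima are lower in value, but false peaks are suppressed
```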

Figure 5. Improved template matching using multi channel correlation

Figure 6. Graph comparing the number of false peaks between multi channel correlation and grey scale correlation: percentage of peaks above the threshold versus time instants.

IV. TRACKING DEMONSTRATION

As a prototype to demonstrate the efficiency of our tracking algorithm in a real world scenario, we have implemented an object follower application. Here the rover has to keep track of an object whose template has been provided as an input to the object follower system. This process is illustrated in the flowchart in figure 7.

Figure 7. Block diagram of the object follower system


Object recognition is done using the enhanced correlation techniques. The object velocities obtained from the Kalman filter's output in our tracking subsystem are used to calculate the wheel speeds for the left and right wheels. The speeds are chosen so as to smoothly steer the rover towards the object being followed. The results are shown in figure 8, where the rover follows a ball along a curved trajectory.
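The paper does not give the control law, but a plausible mapping from the tracker's output to differential drive commands looks like the sketch below: a cruise speed split between the wheels by a steering term driven by the object's horizontal offset and its velocity from the Kalman filter. All gains and names are assumptions.

```python
def wheel_speeds(obj_x, obj_vx, frame_width, v_cruise=0.3, k_p=0.004, k_d=0.02):
    """Map the tracked object's horizontal offset (pixels) and horizontal
    velocity (pixels/frame) to left/right wheel speeds (m/s)."""
    error = obj_x - frame_width / 2.0     # positive when the object is to the right
    turn = k_p * error + k_d * obj_vx     # PD-style steering term (assumed)
    return v_cruise + turn, v_cruise - turn   # (left, right): turns towards the object
```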

Figure 8. Sequence of images showing the rover following a moving object. Image order from (a) to (d).

As another illustration validating the tracking capabilities of our system, a dynamic obstacle avoidance system was implemented. The tracking system detects and tracks any moving object in the rover's field of view and determines its velocities. Once the tracker has stabilized, it triggers an obstacle avoidance mechanism which controls the actuators to steer the rover away from the collision path. The obstacle avoidance is based on the velocity obstacles algorithm [11].
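For reference, the core test of the velocity obstacles method [11] is a collision-cone check: a candidate rover velocity is unsafe if the rover's velocity relative to the obstacle points inside the cone of directions that intersect the obstacle disc. A minimal sketch, under the usual circular-footprint assumptions (names and the disc model are ours):

```python
import math

def in_velocity_obstacle(v_rover, v_obs, p_rel, r_combined):
    """True if v_rover leads to collision with an obstacle moving at v_obs,
    where p_rel = p_obstacle - p_rover and r_combined is the sum of the radii."""
    rx, ry = p_rel
    dist = math.hypot(rx, ry)
    if dist <= r_combined:
        return True                       # already overlapping
    vx, vy = v_rover[0] - v_obs[0], v_rover[1] - v_obs[1]
    if vx == 0.0 and vy == 0.0:
        return False                      # zero relative velocity never collides
    # Compare the relative-velocity heading with the line of sight to the obstacle.
    half_cone = math.asin(r_combined / dist)
    angle = math.atan2(vy, vx) - math.atan2(ry, rx)
    angle = abs((angle + math.pi) % (2 * math.pi) - math.pi)  # wrap to [0, pi]
    return angle < half_cone
```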

Figure 9 demonstrates the rover avoiding an obstacle moving towards it.

Figure 9. Sequence of images illustrating the rover avoiding a ball moving towards it. Image order from (a) to (d).

V. CONCLUSION

In this paper we presented improvements to a traditional template matching based tracking system which make tracking robust to scale variations and slow rotations. Experimental results were presented to show the efficacy of the algorithm, along with results from a real-time implementation of the algorithm on the rover.

The scope for future work would be to extend the method to cameras with a pan-tilt mechanism, where the camera axis can change. Another option is to explore the use of stereo vision to derive a more generalized set of constraints to assist in object tracking on uneven surfaces.

ACKNOWLEDGMENT

We thank the PES Centre for Intelligent Systems for facilitating our work. We thank Dr. K Koshy George, the director of the PES Centre for Intelligent Systems, for the encouragement and motivation provided to us.

REFERENCES

[1] A. Yilmaz, O. Javed, and M. Shah, "Object tracking: A survey," ACM Computing Surveys, vol. 38, no. 4, 2006.
[2] H. Liu, W. Pi, and H. Zha, "Motion detection for multiple moving targets by using an omnidirectional camera," in Conf. on Intelligent Systems and Signal Processing, vol. 1, 2003, pp. 422-426.
[3] R. J. Radke, S. Andra, O. Al-Kofahi, and B. Roysam, "Image change detection algorithms: a systematic survey," IEEE Trans. on Image Processing, 2005, pp. 294-307.
[4] A. Talukder, S. Goldberg, L. Matthies, and A. Ansar, "Real-time detection of moving objects in a dynamic scene from moving robotic vehicles," in Proc. of Intelligent Robots and Systems, vol. 2, 2003, pp. 1308-1313.
[5] D. Comaniciu and P. Meer, "Mean shift: A robust approach toward feature space analysis," IEEE Trans. on Pattern Analysis and Machine Intelligence, 2002, pp. 603-619.
[6] H. Bay, T. Tuytelaars, and L. Van Gool, "SURF: Speeded up robust features," in ECCV, 2006.
[7] D. Lowe, "Distinctive image features from scale-invariant keypoints," Int. J. of Computer Vision, 2004, pp. 91-110.
[8] C. Harris and M. Stephens, "A combined corner and edge detector," in Proc. of the 4th Alvey Vision Conference, 1988, pp. 147-151.
[9] R. Brunelli, Template Matching Techniques in Computer Vision: Theory and Practice, Wiley Publishing, 2009.
[10] C. K. Chui and G. Chen, Kalman Filtering with Real-Time Applications, Springer Series in Information Sciences, Springer, Berlin, Heidelberg, New York, 1987.
[11] P. Fiorini and Z. Shiller, "Motion planning in dynamic environments using velocity obstacles," Int. J. of Robotics Research, 1998, pp. 760-772.
