modified surf based object tracking system - · pdf fileabstract — a speeded up robust...
Post on 07-Feb-2018
231 Views
Preview:
TRANSCRIPT
Abstract—A Speeded Up Robust Feature (SURF) algorithm is
modified and applied to an objectobject tracking problem on fisheye
lens images for an automobile safety system. The modified version of
SURF algorithm locates objects on images by the matching points
distribution. The proposed object tracking system intends to achieve
more accurate and faster object tracking results robust to variations in
color and shape by correcting the problems with MeanShift algorithm.
In order to evaluate the proposed system, experiments on sets of image
sequence data obtained from fisheye lens images are performed and
compared with the conventional MeanShift algorithm. The
preliminary results show that the modified SURF-based object
tracking method can be a valuable alternative over existing methods
based on MeanShift algorithm in terms of tracking accuracy and speed
for fisheye lens images.
Keywords— Object detection, Speed Up Robust Feature, ,
MeanShift, object, Moments.
I. INTRODUCTION
UTOMATC DETECTION and tracking of objects is one
of the most important topics in designing security system.
Especially, detecting object is an essential task for
autonomous vehicles. When computer vision is used for this
purpose, it becomes a very challenging problem because
objects have different appearance and shapes [1]-[4]. A simple
and powerful tool for this problem is to transform this problem
to a binary classification problem, where a local region is
classified either a region with object or not with a sliding
window strategy. Most important achievements in the field of
object detection problem include the method of gradient-based
features with SIFT [5]- [7] and Histogram of Oriented Gradient
(HOG) features [8]. Various types of sensors including various
types of camera lenses are utilized in different object detection
and tracking systems for autonomous vehicles. Among
different camera lenses, fisheye lens produces a wide
panoramic image with strong visual distortion. Because of this
strong visual distortion, it sometimes limits its applicability to
image processing and understanding tasks even though its wide
angles of view. If, however, the task does not require any
sophisticated information related with image processing and
understanding results. the fisheye lens becomes an excellent
candidate to produce images with a very wide angle of view[3].
The object tracking methods used in this paper adopts SURF
algorithm with MeanShift algorithm [9][10]. MeanShift is a
procedure for locating the maxima of a density function given
discrete data sampled from that function [5][6] and SURF
algorithm is a local feature detector and descriptor that can be
Miso Jang and Dong-Chul Park are with Department of Electronics
Engineering, Myong Ji University, Yong In, Gyeonggi-do, 449-728, South
Korea (e-mail: parkd@mju.ac.kr).
used for tasks such as object recognition or registration or
classification or 3D reconstruction. The modified SURF
algorithm used in this paper utilizes MeanShift and SURF for
accurate object tracking purpose with real time operation on
fisheye lens images. In this paper, we take the advantages of
MeanShift and SURF algorithm for object detection and
tracking task on fisheye lens images.
The rest of this paper is organized as follows: A brief
summary of MeanShift and SURF in Section 2. Section 3
summarizes a object tracking method based on SURF
algorithm. Experiments and results are given in Section 4.
Finally, Section 5 concludes this paper.
II. MEAN SHIFT AND SURF ALGORITHMS
MeanShift is a procedure for finding the maxima of a density
function given discrete data samples [5][6]. MeanShift
considers data space as a probability density function. If the
input is a set of points then MeanShift considers them as
sampled from the underlying probability density function.
Dense regions are considered as the local maxima of the
probability density function. A confidence map based on the
color information between current image frame and previous
image frame in image data is introduced when the mean shift is
utilized for object tracking problems. The probability
information from the color pixels of the target object in the
previous image frame can allow us to find the most probable
location near the target object’s previous location with the
mean shift information. The MeanShift algorithm can find the
location of target object iteratively beginning with the initial
estimate of the target object.
When a Gaussian kernel on the distance to the current
estimate is used and is the neighborhood of , the
weighted mean of the density in the window estimated by the
following equation:
∑
∑
For each data point, a gradient ascent method on the local
estimated density is performed until convergence. The
stationary points obtained via gradient ascent are considered as
the modes of the density function. Note that all points
Modified SURF-based Object Tracking System
Miso Jang, and Dong-Chul Park
A
International Journal of Computer Science and Electronics Engineering (IJCSEE) Volume 3, Issue 4 (2015) ISSN 2320–4028 (Online)
319
Fig. 1 The second order differentiation box filter
associated with the same stationary point belong to the same
cluster. In order to find the maximum density, MeanShift
algorithm uses the ratio of the current distribution over the
target iteratively. When is the intensity of the discrete
probability image at within the search window, mean value of
locations of the discrete probability image and the size of
search window are found by using the following moments[9]:
∑∑
∑∑
∑∑
By using the moments in Eq. (2), the followings can be
found:
√
(3)
where denotes the center of the moving object and is
the estimated size of search window.
SURF(Speeded Up Robust Feature) is an improved version
of SIFT (Scale-Invariant Feature Transform)[5][6]. SIFT is an
algorithm to detect and describe local features in images. The
SIFT feature descriptor is invariant to uniform scaling,
orientation, and partially invariant to affine distortion and
illumination changes and SIFT can identify target objects
distorted with the conditions of clutter and under partial
occlusion. Given target objects, keypoints are first extracted
from reference images in SIFT. An object in a new image is
then recognized by individually matching each feature from the
new image to this reference image. Based on the matching
information for the entire feature space, subsets of keypoints
with its location, scale, and orientation in the new image
decides matches with reference images. In SIFT, Laplacian of
Gaussian with Difference of Gaussian is approximated in order
to find the scale-space.
Fig. 2 Searching for maxima in scale space[11]
In SURF[10], Laplacian of Gaussian (LoG) is approximated
with Box Filter. With this approximation, the convolution
process with box filter can be calculated with integral images
through relatively light computational effort. The SURF
algorithm utilizes the determinant of Hessian matrix for both
scale and location. The Hessian matrix is defined as follows
|
|
where , and are the convolutions of the Gaussian
second order derivative with the image at [9].
For faster computation, the following Hessian determinant is
Utilized[8]:
( ) ( )
In order to achieve the scale invariant feature[11], an image
pyramid of scale spaces is adopted as shown in Fig. 2. As the
difference between two differently low-pass filtered images,
the DoG actually performs as a band-pass filter, which removes
high frequency noise, and also some low frequency
components in the image. The frequency components in the
passing band can be considered as the edges in the images.
Pyramid layers are subtracted and the DoG (Difference of
Gaussian) images are obtained. Each octave in the scale space
represents a group of filter responses obtained by the
convolution of the input image and a set of filters. The interest
points are selected when the determinant value of a point is
larger than a predetermined threshold in the window.
The orientation of rotation invariant interest points can be
obtained by using the wavelet responses in horizontal and
vertical direction for a neighborhood of size 6s. The orientation
is determined by finding the sum of all responses within a
sliding orientation window of 60 degree angle.
Local feature descriptor in SURF can also be extracted by
using Haar wavelet responses in horizontal and vertical
direction. A neighborhood of size is considered
International Journal of Computer Science and Electronics Engineering (IJCSEE) Volume 3, Issue 4 (2015) ISSN 2320–4028 (Online)
320
Fig. 3 Building the Interest point descriptor [10]
around the keypoint where s is the size as shown in Fig. 3. It is
then divided into 4x4 subregions. For each subregion,
horizontal and vertical wavelet responses are calculated and a
vector is formed. That is, Haar wavelet responses of
∑ ∑ ∑ ∑ are obtained and the resulting
64-dimensional features on subblocks are
obtained for the region.
In order to reduce the computational burden, the sign of
Laplacian for underlying interest point is utilized because it is
already computed during detection. The sign of the Laplacian
can detect bright blobs on dark backgrounds. In the matching
stage, we only compare features of types of contrast by using
the sign of Laplacian. This simplification results in faster
matching and can help the real-time operation of the systems
adopting this algorithm.
An accurate and stable operation of objects tracking task can
be accomplished by combining SURF algorithm and MeanShift
algorithm. However, in order to estimate the center of moving
objects, the color distribution information used in MeanShift
algorithm is replaced by the matching point distribution
information. That is, is utilized instead of in
calculating the moments of Eq. (2). After finding interest
points, a matching process is then performed. If the signs of
Laplacian match, the Euclidean distances are then calculated.
By using the following nearest neighbor search method, the
matching points are estimated with a predetermined threshold
value, .
Note that is set to be 0.8 in our experiments.
III. MODIFIED SURF-BASED OBJECT TRACKING SYSTEM
An accurate and stable object tracking system utilized the
keypoint information obtained by using SURF algorithm in this
paper. In order to estimate the center of moving objects, the
Modified SURF-based Object Tracking System utilizes the
information of matching points distribution by calculating the
moments of Eq. (2) for instead of while
Fig. 4 Tracking results on video frame No. 1
MeanShift algorithm utilizes the color distribution. Local
feature descriptor is constructed with a square region (size of
) centered at interest points and by rotating along the
orientation. The resulting 64-dimensional features
are then found by calculating Haar wavelet responses of the
following four descriptors: ∑ ∑ ∑ ∑ .
International Journal of Computer Science and Electronics Engineering (IJCSEE) Volume 3, Issue 4 (2015) ISSN 2320–4028 (Online)
321
Fig. 5 Tracking results on video frame No. 2
In order to achieve an adaptive estimation of object size, the
ratio between the distance among matching points of the target
object and the distance among matching points of the object in
the image space is calculated. The ratio and the size of moving
object can then be found.
IV. EXPERIMENTS AND RESULTS
The Modified SURF-based Object Tracking System intends
to overcome the problems involved with the object tracking
system based on MeanShift algorithm for accurate and stable
object tracking performance. Since the features adopted in the
proposed object tracking system are based on the SURF
algorithm, the proposed object tracking system can produce
accurate tracking results. The proposed object tracking system
is evaluated with sets of image sequence data in terms of
tracking accuracy and tracking speed. The image sequences
obtained with a fisheye lens used for our experiments and
tracking results are shown in Fig. 3 and Fig. 4. Since the images
obtained through fisheye lens, the images include severe
distortion [3]. As can be seen from Fig. 4 and Fig. 5, the objects
on the image sequence has very strong visual distortions
including color changes. The image sequence has 320 frames
with 640 480 resolution. The computing environment used in
our experiments is: Intel Core i7-2600 CPU 3.40GHz, RAM
8.00GB, Windows 7 Enterprise 64bit OS.
V. CONCLUSIONS
An object tracking system based on modified SURF
algorithm and Meanshift algorithm is proposed in this paper.
Once an object in the proposed object tracking system is
detected, feature points inside a window are calculated by
using SURF algorithm. In the following image frame, matching
points with the target object are calculated and the center of
object is then estimated by using the moment method. When the
matching points of an object between frames are satisfied with
the conditions of the nearest neighbor search method, the size
of moving object is then estimated by using the ratio between
the distances of matching points in the two sequential image
frames. The proposed system improves the tracking inaccuracy
in CAMShift when changes including color, illumination,
rotation, and shape are present. In order to increases its
operation speed for real-time operation, the proposed system
utilizes strategies for minimizing arithmetic operations: using
the moment generating function for the estimation of target
objects on image data. Experiments on sets of image sequence
data obtained from fisheye lens for the system evaluation
purpose show that the proposed system yields very accurate
tracking results with real-time operation speed. This
preliminary results implies that the proposed object tracking
system is far more accurate than the conventional MeanShift
algorithm. The results also show that the operation speed is
suitable for real-time object tracking purpose.
ACKNOWLEDGMENT
This work was supported by the IT R&D program of The
MKE/KEIT (10040191, The development of Automotive
Synchronous Ethernet combined IVN/OVN and Safety control
system for 1Gbps class).
REFERENCES
[1] S. S. Paisitkriangkrai, C. Shen, and J. Zhang, “ Fast pedestrian detection
using a cascade of boosted covariance features,” IEEE Transactions on
Circuits and Systems for Video Technology. Vol. 18, no.8, pp.1140-1151, 2008.
[2] S. Wu, R. Laganiere, and P. Payeur “Improving pedestrian detection
with selective gradient self-similarity feature,” Pattern Recognition vol. 48, no. 8, pp. 2364-2376, August 2015.
[3] Y. Kubo, T. Kitaguchi and J. Yamaguchi, “Human tracking using fisheye
images,” in Proc. IEEE Conf. SICE, 2007, pp. 17-20. [4] N. Tuong, T. Muller, and A. Knoll, “Robust pedestrian detection and
tracking from a moving vehicle,” in Proc. SPIE7878. Intelligent Robots
and Computer Vision XXVIII: Algorithms and Techniques, doi:10.1117/12.871994, 2011.
[5] Y. Cheng, “Mean Shift, Mode Seeking, and Clustering,” IEEE Trans. Pattern Analysis and Machine Intelligence, vo.17, no.8, pp. 790-799,
1995.
[6] D. Comaniciu and P. Meer, “Mean shift: a robust approach toward feature space analysis,” IEEE Trans. Pattern Analysis and Machine Intelligence,
vol. 24, no. 5, pp. 603-619, May, 2002
[7] D. G. Lowe, “Distinctive image features from scale-invariant keypoints,” International journal of computer vision, vol.60, no.2, pp.91-110, 2004.
[8] N. Dalal and B. Triggs, “Histograms of oriented gradients for human
detection,” Proc. of IEEE Conference on Computer Vision and Pattern Recognition, vol. 1, pp. 886-893, 2005.
[9] R. Mukundan and K. R. Ramakrishman, Moment Functions in Image
Analysis: Theory and Application, Singapore: World Scientific, 1998. [10] H. Bay, A. Ess, T. Tuytelaars, and L. V. Gool, “SURF: Speeded up robust
features,” Computer Vision and Image Understanding (CVIU), vol. 110,
no. 3, pp. 346-359, June, 2008. [11] D. Lowe, “Distinctive Image Features from Scale-Invariant Keypoints,” .
Int. J. of Computer Vision, vol. 60, no. 2, pp. 91-110, 2004.
International Journal of Computer Science and Electronics Engineering (IJCSEE) Volume 3, Issue 4 (2015) ISSN 2320–4028 (Online)
322
Miso Jang received the B.S. degree in Electronics Engineering from MyongJi University, Korea, in 2012. She is pursuing his M.S. degree in
Electronics Engineering at Intelligent Computing Research Lab at MyongJi
University. Her research interests include intelligent computing, neural networks, and pattern recognition.
Dong-Chul Park (M’90-SM’99) received the B.S. degree in electronics engineering from Sogang University, Seoul, Korea, in 1980, the M.S. degree in
electrical and electronics engineering from the Korea Advanced Institute of
Science and Technology, Seoul, Korea, in 1982, and the Ph.D. degree in electrical engineering, with a dissertation on system identifications using
artificial neural networks, from the University of Washington (UW), Seattle, in
1990. From 1990 to 1994, he was with the Department of Electrical and Computer Engineering, Florida International University, The State University
of Florida, Miami. Since 1994, he has been with the Department of Electronics
Engineering, MyongJi University, Korea, where he is a Professor. From 2000 to 2001, he was a Visiting Professor at UW. He is a pioneer in the area of
electrical load forecasting using artificial neural networks. He has published
more than 130 papers, including 40 archival journals in the area of neural network algorithms and their applications to various engineering problems
including financial engineering, image compression, speech recognition,
time-series prediction, and pattern recognition. Dr. Park was a member of the Editorial Board for the IEEE TRANSACTIONS ON NEURAL NETWORKS
from 2000 to 2002.
International Journal of Computer Science and Electronics Engineering (IJCSEE) Volume 3, Issue 4 (2015) ISSN 2320–4028 (Online)
323
top related