
Research Article
A Study of Visual Descriptors for Outdoor Navigation Using Google Street View Images

L. Fernández, L. Payá, O. Reinoso, L. M. Jiménez, and M. Ballesta

Department of Systems Engineering and Automation, Miguel Hernández University, Avda. de la Universidad s/n, Elche, 03202 Alicante, Spain

Correspondence should be addressed to L. Payá; lpaya@umh.es

Received 22 March 2016; Revised 24 August 2016; Accepted 2 November 2016

Academic Editor: Jose R. Martinez-de Dios

Copyright © 2016 L. Fernández et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Hindawi Publishing Corporation, Journal of Sensors, Volume 2016, Article ID 1537891, 12 pages. http://dx.doi.org/10.1155/2016/1537891

A comparative analysis between several methods to describe outdoor panoramic images is presented. The main objective consists in studying the performance of these methods in the localization process of a mobile robot (vehicle) in an outdoor environment, when a visual map that contains images acquired from different positions of the environment is available. With this aim, we make use of the database provided by Google Street View, which contains spherical panoramic images captured in urban environments, together with their GPS positions. The main benefit of using these images resides in the fact that they permit testing any novel localization algorithm in countless outdoor environments anywhere in the world and under realistic capture conditions. The main contribution of this work consists in performing a comparative evaluation of different methods to describe images, to solve the localization problem in an outdoor dense map using only visual information. We have tested our algorithms using several sets of panoramic images captured in different outdoor environments. The results obtained in this work can be useful to select an appropriate description method for visual navigation tasks in outdoor environments using the Google Street View database, taking into consideration both the accuracy in localization and the computational efficiency of the algorithm.

1. Introduction

Designing vehicles capable of navigating autonomously, in a previously unknown environment and with no human intervention, is a fundamental objective in mobile robotics. To achieve this objective, the vehicle must be able to build a model (or map) of the environment and to estimate its position within this model. A great variety of localization approaches can be found in the literature. In general, the position and orientation of the robot can be obtained from proprioceptive (odometer) or exteroceptive (laser, camera, or sonar) sensors, as presented in the works of Thrun et al. [1] and Gil et al. [2].

With the exteroceptive approach, the use of computer vision to create a representation of the environment is widespread, due to the good trade-off between quantity of information and cost that cameras offer. The research developed during the last years on the topic of map creation using visual information is enormous, and new algorithms are presented continuously. Usually, one of the key points of these algorithms is the description of the visual information, to extract relevant information which is useful for the robot to estimate its position and orientation. In general, the problem can be approached from two points of view: local features extraction and global-appearance approaches. In the first one, a number of landmarks (distinctive points or regions) are extracted from each scene, and each landmark is described to obtain a descriptor which is invariant against changes in the robot position and orientation. Murillo et al. [3] presented an algorithm that made use of the SURF (Speeded Up Robust Features) description method [4] to improve the performance of appearance-based localization methods using omnidirectional images in large data sets. On the other hand, global-appearance approaches consist in representing each scene by a single descriptor which is computed working with the scene as a whole, with no local feature extraction. This approach has recently become popular, and some examples can be found. Rossi et al. [5] present a metric to compute the image similarity using the Fourier Transform of spherical omnidirectional images, in order to carry out the localization of a mobile robot. Payá et al. [6] present a framework to carry out multirobot route following, using an appearance-based approach with omnidirectional images to represent the environment and a probabilistic method to estimate the localization of the robot. Finally, Fernández et al. [7] deal with the problem of robot localization using the visual information provided by a single omnidirectional camera mounted on the robot, using techniques based on the global appearance of panoramic images and a Monte Carlo Localization (MCL) algorithm [8].

The availability of spherical images that represent outdoor environments is nowadays almost unlimited, thanks to the services of Google Street View. Furthermore, these images provide a complete 360-degree view of the scenery in the ground plane and a 180-degree view vertically. Thanks to this great amount of information, these images can be used to carry out autonomous navigation tasks robustly. Using a set of these previously available spherical images as a dense visual map of an environment, it is possible to develop an autonomous localization and navigation system, employing the images captured by a mobile robot or vehicle and comparing them with the map information in order to resolve the localization problem. This way, in this paper we consider the use of the images provided by Google Street View as a visual map of the environment, in which a mobile robot must be localized using the image acquired from an unknown position.

The literature regarding the navigation problem using Google Street View information is somewhat sparse, but growing in recent years. For example, Gamallo et al. [9] proposed the combination of a low cost GPS with a particle filter to implement a vision-based localization system that compares traversable regions detected by a camera with regions previously labeled in a map (composed of Google Maps images). The main contribution of this work is that a synthetic image of what the robot should see from the predicted position is generated and compared with the real observation to calculate the weight of each particle. Torii et al. [10] tried to predict the GPS location of a query image given the Google Street View database. This work presents the design of a matching procedure that considers linear combinations of bag-of-feature vectors of database images. With respect to indoor pose estimation, Aly and Bouguet [11] present an algorithm that takes as input spherical Google Street View images and gives as output their relative pose, up to a global scale. Finally, Taneja et al. [12] proposed a method to refine the calibration of the Google Street View images leveraging cadastral 3D information.

The localization of the vehicle/robot can be formulated as the problem of matching the currently captured image with the images previously stored in the dense map (images in the database). Nowadays, a great variety of detection and description methods have been proposed in the context of visual navigation but, in our opinion, there exists no consensus on this matter when outdoor images are used.

Amorós et al. [13] carried out a review and comparison of different global-appearance methods to create descriptors of panoramic scenes in order to extract the most relevant information. The authors of this work developed a set of experiments with panoramic images captured in indoor environments, to demonstrate the applicability of some appearance descriptors to robotic navigation tasks and to measure their quality in position and orientation estimation. However, as far as outdoor scenarios are concerned, there is no revision of methods that offer good results. This situation, combined with the fact that using Google Street View images has barely been tested in autonomous navigation systems, has motivated the work presented here. Following this philosophy, we made a comparison between different descriptors of panoramic images, but in this case we used Google Street View images captured in outdoor environments. This is a more challenging problem due to several features: the openness of the images (i.e., the degree of dominance of some structures, such as the sky and the road, which do not add distinctiveness to the image), their changing lighting conditions, and the large geometrical distance between the points where the images were captured.

Taking these features into account, we consider that it is worth carrying out a comparative evaluation of the performance of different image descriptors under real conditions of autonomous outdoor localization, since it is a necessary step prior to the implementation of a visual navigation framework. In this paper, we evaluate two different approaches: approaches based on local features and approaches based on global appearance. In both cases, we test the performance of the descriptor depending on the main parameters that configure it, and we make a graphical representation of the precision of each method versus the recall [14].

When a robot has to navigate autonomously outdoors, very often a rough estimation of the area where the robot moves is available, and the robot must be able to estimate its position in this wide zone. This work focuses on this task: we assume the zone where the robot navigates is approximately known, and the robot must estimate its position more accurately within this area. With this aim, two different wide areas have been chosen to evaluate the performance of the localization algorithms, and a set of images per area has been obtained from the Google Street View database.

The remainder of the paper is organized as follows. In Section 2, we present the description methods evaluated in this work. In Section 3, the experimental setup and the databases we have used are described. Section 4 describes the method we have followed to evaluate the descriptors in a localization process. Section 5 presents the experimental results. Finally, in Section 6, we outline the conclusions and the future works.

2. Image Descriptors

In this section, we present five different image descriptors that are suitable to build a compact description of the appearance of each scene [13–15]. One of the methods, previously denoted as a feature-based approach, consists in representing the image as a set of landmarks extracted from the scene, along with the description of such landmarks. The method selected for this landmark description is SURF (Speeded Up Robust Features). The other methods chosen to carry out the comparative analysis are the following appearance-based methods: the two-dimensional Discrete Fourier Transform (DFT), the Fourier Signature (FS), gist, and the Histogram of Oriented Gradients (HOG). Each method uses a different mechanism to express the global information of the scene. First, DFT and FS are based on analysis in the frequency domain, in two dimensions and one dimension, respectively. Second, the gist approach we use is built from edge information obtained through Gabor filtering and analyzed at several scales. Finally, HOG gathers systematic information from the orientation of the gradient in local areas of the image. The choice of these description methods will permit analyzing the influence of each kind of information on the localization process.

The initial objective of this study was to compare some global-appearance methods. However, we have decided to include a local features description method in this comparative evaluation to make the study more complete. With this aim, we have chosen SURF, due to its relatively low computational cost compared with other classical feature-based approaches.

The next subsections briefly present the description methods included in the comparative evaluation.

2.1. SURF and Harris Corner Detector. The Speeded Up Robust Features (SURF) were introduced by Bay et al. [4]. That study showed that SURF outperforms existing methods with respect to repeatability, robustness, and distinctiveness of the descriptors. The detection method uses integral images to reduce the computational time and is based on the Hessian matrix. The descriptor, in turn, represents a distribution of Haar-wavelet responses within the interest point neighborhood and makes an efficient use of integral images. In this work, we only include the standard SURF descriptor, which has a dimension of 64 components per landmark, but there are two more versions: the extended version (E-SURF), with 128 elements, and the upright version (U-SURF), which is not invariant to rotation and has a length of 64 elements. On the other hand, we perform the detection of the features using the Harris corner detector (based on the eigenvalues of the second moment matrix [16]), because our experiments showed that this method extracted the most robust points in outdoor images, compared to the SURF extraction method.

This way, the method we use in this work is a combination of these two algorithms. More specifically, the Harris corner detector is used to extract the features from the image, and the standard SURF descriptor is used to characterize and describe each of the landmarks previously detected.
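As an illustrative sketch (not the authors' exact implementation), the following Python code combines these two steps using OpenCV; SURF is only available in the opencv-contrib build, and the corner count, quality level, and keypoint patch size are assumptions:

```python
import cv2

def surf_harris_descriptors(gray, max_corners=500):
    # Extract interest points with the Harris corner detector.
    corners = cv2.goodFeaturesToTrack(gray, maxCorners=max_corners,
                                      qualityLevel=0.01, minDistance=5,
                                      useHarrisDetector=True, k=0.04)
    # Wrap each corner as a keypoint (the 20-pixel patch size is an assumption).
    kps = [cv2.KeyPoint(float(x), float(y), 20) for [[x, y]] in corners]
    # Describe each landmark with the standard 64-component SURF descriptor.
    surf = cv2.xfeatures2d.SURF_create(extended=False)  # requires opencv-contrib
    kps, descriptors = surf.compute(gray, kps)
    return kps, descriptors
```

Wrapping the Harris corners as keypoints lets SURF characterize landmarks that its own detector would not necessarily have selected, which is the combination described above.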

2.2. Two-Dimensional Discrete Fourier Transform. From an image $f(x,y)$ with $N_x$ rows and $N_y$ columns, the 2D Discrete Fourier Transform (DFT) can be defined as follows:

$$F[f(x,y)] = F(u,v) = \frac{1}{N_x N_y} \sum_{x=0}^{N_x-1} \sum_{y=0}^{N_y-1} f(x,y) \cdot e^{-2\pi j (ux/N_x + vy/N_y)},$$
$$u = 0, \ldots, N_x - 1; \quad v = 0, \ldots, N_y - 1, \tag{1}$$

where $(u,v)$ are the frequency variables, and the transformed function $F(u,v)$ is a complex function which can be decomposed into a magnitudes matrix and an arguments matrix. This transformation presents some interesting properties which are helpful in robot localization tasks. First, the most relevant information in the Fourier domain concentrates in the low frequency components, so it is possible to reduce the amount of memory and to optimize the computational cost by retaining only the first $k_x$ rows and $k_y$ columns of the transform. Second, when $f(x,y)$ is a panoramic scene, a translation in the rows and/or columns of the original image produces a change only in the arguments matrix [15]. This way, the magnitudes matrix contains information which is invariant to rotations of the robot in the ground plane, and the arguments matrix contains information that can be useful to estimate the orientation of the robot in this plane with respect to a reference image (using the DFT shift theorem).

Taking these facts into account, the global description of the image $f(x,y)$ consists of the magnitudes matrix $A(u,v)$ and the arguments matrix $\Phi(u,v)$ of its two-dimensional DFT. The dimensions of both matrices are $k_x < N_x$ rows and $k_y < N_y$ columns. On the one hand, $A(u,v)$ is useful to estimate the robot position and, on the other hand, the information in $\Phi(u,v)$ can be used to estimate the robot orientation.
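A minimal NumPy sketch of this descriptor follows; the truncation sizes kx and ky are free parameters chosen by the user:

```python
import numpy as np

def dft_descriptor(img, kx, ky):
    # 2D DFT of the panoramic image, normalized as in (1).
    F = np.fft.fft2(img) / img.size
    A = np.abs(F[:kx, :ky])      # magnitudes A(u, v): rotation-invariant part
    Phi = np.angle(F[:kx, :ky])  # arguments Phi(u, v): orientation information
    return A, Phi
```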

2.3. Fourier Signature. The third image description method used in this comparative analysis is the Fourier Signature (FS), described initially by Menegatti et al. [17]. From an image $f(x,y)$ with $N_x$ rows and $N_y$ columns, the FS consists in obtaining the one-dimensional DFT of each row. This method presents some advantages, such as its simplicity, its low computational cost, and the fact that it better exploits the invariance against rotations of the robot in the ground plane when we work with panoramic views.

More specifically, the process to compute the FS consists in transforming each row $x$ of the original panoramic image, $\{f_{x,0}, f_{x,1}, \ldots, f_{x,N_y-1}\}$, $x = 0, \ldots, N_x - 1$, into a sequence of complex numbers, $\{F_{x,0}, F_{x,1}, \ldots, F_{x,N_y-1}\}$, $x = 0, \ldots, N_x - 1$, according to the 1D-DFT expression:

$$F_{x,k} = \sum_{n=0}^{N_y-1} f_{x,n} \cdot e^{-j(2\pi/N_y)kn}, \quad k = 0, \ldots, N_y - 1; \; x = 0, \ldots, N_x - 1. \tag{2}$$

The result is a complex matrix $F(x,v)$, where $v$ is a frequency variable, which can be decomposed into a magnitudes matrix and an arguments matrix.

Thanks to the 1D-DFT properties, it is possible to represent each row of $F(x,v)$ with its first coefficients, since the most relevant information is concentrated in the low frequency components of each row of the descriptor, so it is possible to reduce the amount of memory by retaining only the first $k_y$ columns of the signature $F(x,v)$. Also, when $f(x,y)$ is a panoramic scene, the magnitudes matrix is invariant against robot rotations in the ground plane, and the arguments matrix permits estimating the change in the robot orientation using the DFT shift theorem [15, 17, 18].


Taking these facts into account, the global description of the image $f(x,y)$ consists of the magnitudes matrix $A(x,v)$ and the arguments matrix $\Phi(x,v)$ of the Fourier Signature. The dimensions of both matrices are $N_x$ rows and $k_y < N_y$ columns. First, the position of the robot can be estimated using the information in $A(x,v)$, since it is invariant to changes in robot orientation, and, second, $\Phi(x,v)$ can be used to estimate the robot orientation.
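A corresponding NumPy sketch of the Fourier Signature, keeping only the first ky columns of each transformed row:

```python
import numpy as np

def fourier_signature(img, ky):
    # 1D DFT of each row of the panoramic image, as in (2).
    F = np.fft.fft(img, axis=1)
    A = np.abs(F[:, :ky])        # magnitudes A(x, v): invariant to rotations
    Phi = np.angle(F[:, :ky])    # arguments Phi(x, v): orientation estimation
    return A, Phi
```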

2.4. Gist. The concept of the gist of an image can be defined as an abstract representation that activates the memory of scene categories [19]. Gist-based descriptors try to represent the image by obtaining its essential information, simulating the human perception system and its ability to recognize a scene through the identification of color, saliency, or remarkable structures. Torralba [20] presents a model to obtain global scene features, working at several spatial frequencies and using different scales, based on Gabor filtering; these features are used in a scene recognition and classification task. In previous works [13], we employed a gist-Gabor descriptor in order to obtain frequency and orientation information. Due to the good results obtained in indoor environments when the mobile robot presents 3 DOF (degrees of freedom) movements on the ground plane, the fourth method employed in the comparative analysis presented in this paper is the gist descriptor of panoramic images.

The method starts with two versions of the initial panoramic image $f(x,y)$: the original one, with $N_x$ rows and $N_y$ columns, and a new version obtained after applying a Gaussian low-pass filter and subsampling to a new size equal to $0.5 N_x \times 0.5 N_y$. After that, both images are filtered with a bank of $n_f$ Gabor filters whose orientations are evenly distributed to cover the whole circle. Then, to reduce the amount of information, the pixels of both images are grouped into $k_1$ horizontal blocks per image, whose width is equal to $N_y$ in the first image and $0.5 N_y$ in the second one. The average value of the pixels in each group is calculated, and all this information is arranged into a final descriptor, which is a column vector with $2 k_1 n_f$ components. This descriptor is invariant against rotations of the vehicle on the ground plane. More information about the method can be found in [13].
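The following sketch illustrates this construction with OpenCV Gabor kernels; the kernel size, sigma, wavelength, and the default values of n_f and k1 are illustrative assumptions, not the parameter values used in the paper:

```python
import cv2
import numpy as np

def gist_descriptor(gray, n_f=8, k1=16):
    # Two scales: the original image and a low-pass filtered, subsampled version.
    small = cv2.resize(cv2.GaussianBlur(gray, (5, 5), 1.0),
                       (gray.shape[1] // 2, gray.shape[0] // 2))
    desc = []
    for im in (np.float32(gray), np.float32(small)):
        for i in range(n_f):
            # Gabor orientations evenly distributed over the whole circle.
            kern = cv2.getGaborKernel((15, 15), 4.0, np.pi * i / n_f, 10.0, 0.5)
            resp = np.abs(cv2.filter2D(im, -1, kern))
            # Average the response over k1 full-width horizontal blocks.
            desc += [blk.mean() for blk in np.array_split(resp, k1, axis=0)]
    return np.array(desc)  # 2 * k1 * n_f components
```

Because each block spans the full image width, a ground-plane rotation of the panorama only permutes pixels within a block and leaves the averages, and hence the descriptor, unchanged.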

2.5. Histogram of Oriented Gradients. The Histogram of Oriented Gradients (HOG) descriptors are based on the orientation of the gradient in local areas of an image. They were described initially by Dalal and Triggs [21]. More concisely, the method consists, first, in obtaining the magnitude and orientation of the gradient at each pixel of the original image. This image is then divided into a set of cells, and a histogram of gradient orientations is compiled for each cell, aggregating the gradient orientation of each pixel within the cell, weighted with the magnitude of that pixel.

The omnidirectional images captured from a specific position on the ground plane contain the same pixels in a row independently of the orientation of the robot in this plane, but in a different order. Taking this fact into account, if we calculate the histogram over cells that have the same width as the original image, we obtain a descriptor which is invariant against rotations of the robot.

The method we use is described in depth in [22] and can be summarized as follows. The initial panoramic image $f(x,y)$, with $N_x$ rows and $N_y$ columns, is first filtered to obtain two images with the information of the horizontal and vertical edges, $f_x(x,y)$ and $f_y(x,y)$. From these two images, the magnitude of the gradient and its orientation are obtained pixel by pixel, and the results are stored in the matrices $M(x,y)$ and $\Theta(x,y)$, respectively. The matrix $\Theta(x,y)$ is then divided into $k_2$ horizontal cells whose width is equal to $N_y$. For each cell, an orientation histogram with $b$ bins is compiled. During this process, each pixel in $\Theta(x,y)$ is weighted with the magnitude of the corresponding pixel in $M(x,y)$. At the end of the process, the set of histograms constitutes the final descriptor $\vec{h}$, which is a column vector with $k_2 \cdot b$ components.
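A compact sketch of this rotation-invariant HOG variant follows; the default numbers of cells (k2) and bins (b) are assumptions:

```python
import cv2
import numpy as np

def hog_descriptor(gray, k2=32, b=16):
    # Gradient magnitude M(x, y) and orientation Theta(x, y), pixel by pixel.
    gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0)
    gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1)
    mag, ang = cv2.cartToPolar(gx, gy)
    desc = []
    # k2 full-width horizontal cells, each compiled into a b-bin histogram
    # of orientations weighted by the gradient magnitude.
    for m, a in zip(np.array_split(mag, k2, axis=0),
                    np.array_split(ang, k2, axis=0)):
        hist, _ = np.histogram(a, bins=b, range=(0, 2 * np.pi), weights=m)
        desc.extend(hist)
    return np.array(desc)  # k2 * b components
```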

3. Experiments Setup

The main objective of this work consists in carrying out an exhaustive evaluation of the performance of the description methods presented in the previous section. All these methods will be included in a localization algorithm, and their performance will be evaluated and compared both in terms of computational cost and localization accuracy. The results of this comparative evaluation will give us an idea of which description method offers the best results in outdoor environments when using Google Street View images.

With this aim, two different regions in the city of Elche (Spain) have been selected, and the Google Maps images of these two areas have been obtained and stored in two data sets. Each of these data sets will constitute a map and will be used subsequently to estimate the position of the vehicle within the map, by comparing the image captured by the vehicle from the unknown position with the images previously stored in each map.

The main features of the two sets of images are as follows.

Set 1. Set 1 consists of 177 full spherical panoramic images, with resolution generally up to 3328 × 1664 pixels. Each image covers a field of view of 360 degrees in the ground plane and 180 degrees vertically. Figure 1 shows the GPS position where each image was captured (blue dots) and two examples of the panoramic images after a preprocessing step. This set corresponds to a mesh topography database that contains images of various streets and open areas. The images cover an area of approximately 700 m × 300 m.

Set 2. Set 2 consists of 144 full spherical panoramic images. The images have been captured along the same street, with a linear topology, covering approximately 1700 m. The appearance of these images is more urban. Figure 2 shows the GPS position where each image was captured (blue dots) and three examples of the panoramic images after a preprocessing step.

Figure 1: Bird's eye view of the region chosen as map 1, prior to the localization experiment. The blue dots represent the coordinates where the images of Set 1 were captured. Two examples of Google Street View images after a preprocessing step are also shown.

Figure 2: Bird's eye view of the region chosen as map 2, prior to the localization experiment. The blue dots represent the coordinates where the images of Set 2 were captured. Three examples of Google Street View images after a preprocessing step are also shown.

3.1. Image Preprocessing and Map Creation. Due to the wide vertical field of view of the acquisition system, the sky often occupies a big portion of the Google Street View images. The appearance of this area is very prone to change when the localization process is carried out at a different time of day with respect to the time when the map was captured. Taking this fact into account, a preprocessing step has been carried out to remove part of the sky from the scenes.

Once part of the sky has been removed from all the scenes, the images are converted into grayscale, and their resolution is reduced to 512 × 128 pixels to ensure the computational viability of the algorithms.
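A minimal sketch of this preprocessing step, assuming OpenCV and a fixed number of sky rows to crop (the actual number used in the paper is not specified):

```python
import cv2

def preprocess(pano_bgr, sky_rows=400):
    # Crop part of the sky (the number of rows removed is an assumption),
    # convert to grayscale, and reduce the resolution to 512 x 128 pixels.
    img = pano_bgr[sky_rows:, :]
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    return cv2.resize(gray, (512, 128))
```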

After that, each image will be described using the five description methods presented in Section 2. At the end, one map will be available per image set and per description method. Each map will be composed of the set of descriptors of each panoramic scene.

3.2. Localization Process. Once the maps are available, in order to evaluate the different visual descriptors introduced in Section 2 to solve the localization problem, we also make use of Google Street View images.

To carry out the localization process, first we choose one of the images of the database (named the test image). At this moment, this image is removed from the map. Second, we compute the descriptor of the test image (using one of the methods presented in Section 2) and obtain the distance between this descriptor and the descriptors of the rest of the images stored in the corresponding map. As a result, the images of the map are arranged from the nearest to the farthest, using the image distance as arranging criterion.

The result of the localization algorithm is considered correct if the first image it returns was captured at the map point which is geometrically the closest to the test image capture point (the GPS coordinates are used with this aim). We will refer to this case as a correct localization in zone 1. However, since this is a quite challenging and restrictive problem, it is also interesting to know if the first image that the algorithm returns was captured at one of the two geometrically closest points to the test image capture point (zone 2), or even at one of the three geometrically closest points (zone 3). The first case is the ideal one, but we are also interested in the other cases, as they indicate whether the algorithm is returning an image in the surroundings of the actual position of the test image (i.e., the localization algorithm detects that the robot is in a zone close around its actual position).

This process is repeated for each description method, using all the images of Sets 1 and 2 as test images. In brief, the procedure to test the localization methods previously explained consists in the following steps, for each image and description method:

(1) Extracting one image of the set (denoted as the test image); this test image is then eliminated from the map.

(2) Calculating the descriptor of the test image.

(3) Calculating the distance between this descriptor and all the map descriptors, which we name the image distance.

(4) Retaining the most similar descriptor and studying whether it corresponds to an image that was captured in the surroundings of the test image capture point (zone 1, 2, or 3).

As a result, the next data are retained for an individual test image: the image distance between the test image descriptor and the most similar map descriptor, $D_t$, and the localization results (correct or wrong match) in zone 1, $m_{t1}$, in zone 2, $m_{t2}$, and in zone 3, $m_{t3}$. After repeating this process with all the test images, the results consist of four vectors whose dimension is equal to the number of test images. The first vector, $\vec{D}$, contains the distances $D_t$, and the other three, $\vec{m}_1$, $\vec{m}_2$, and $\vec{m}_3$, contain, respectively, the information of correct or incorrect matches in zones 1, 2, and 3.

4. Evaluation Methods

In this work, the localization results are expressed by means of recall and precision curves [14]. To build them, the components of the vectors $\vec{m}_1$, $\vec{m}_2$, and $\vec{m}_3$ are equally sorted in ascending order of the distances that appear in the first vector. The resulting sorted vectors of correct and wrong matches are then used to calculate the values of recall and precision. Let us focus on the sorted vector of matches in zone 1, $\vec{s}_1$. First, for each component in this vector, the recall is calculated as the number of correct matches obtained so far with respect to the total number of test images. Second, for each component in the same vector, the precision is obtained as the number of correct matches obtained so far with respect to the number of test images considered so far. Then, with the information contained in these vectors, a precision versus recall curve is built, corresponding to the localization in zone 1. This is repeated with the sorted vectors $\vec{s}_2$ and $\vec{s}_3$ to obtain the localization results in zone 2 and zone 3.

Figure 3: Two sample recall-precision graphs obtained after carrying out the localization experiments with two different description methods.

In our experiments, the most important piece of information in this type of graph is the final point, because it shows the global result of the experiment (the final precision after considering all the test images). However, additional relevant information can be extracted from them, because the graph also shows the ability of the localization algorithm to find the correct match while considering a specific image distance threshold. As explained in the previous paragraph, the results have been arranged considering the ascending value of distances. Taking this into account, as the recall increases, the threshold also does. For this reason, the evolution of the recall-precision curves contains information about the robustness of the algorithm with respect to a specific image distance threshold. If the precision values stay high independently of the recall, this shows the presence of a lower number of wrong results under this distance threshold. Figure 3 shows two sample recall-precision curves obtained after running the localization algorithm with all the test images and two different description methods, considering zone 1. Both curves show a similar final precision value, between 0.6 and 0.65. However, the evolutions present a different behavior. As an example, if we consider as threshold the distance associated with recall = 0.25, according to the graph, the precision of descriptor 1 is 100% but the precision of descriptor 2 is 90%. This means that, considering the selected image distance threshold, 25% of correct localizations are achieved with 100% probability using descriptor 1 and with 90% probability using descriptor 2. This study can be carried out considering any value for the image distance threshold.
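A sketch of this construction: the match vector is sorted by ascending image distance, and recall and precision are accumulated component by component:

```python
import numpy as np

def recall_precision(D, matches):
    # Sort the correct/wrong match vector by ascending image distance D_t.
    order = np.argsort(D)
    m = np.asarray(matches, dtype=float)[order]
    tp = np.cumsum(m)                          # correct matches obtained so far
    recall = tp / len(m)                       # over the total number of test images
    precision = tp / np.arange(1, len(m) + 1)  # over the test images considered so far
    return recall, precision
```

Applied in turn to the zone 1, 2, and 3 match vectors, this yields the three families of curves discussed in Section 5.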


Before running the algorithm, it is necessary to define the image distance. We use two different distance measures, depending on the kind of descriptor used.

First, in the case of the feature-based method (SURF-Harris), it is necessary to extract the interest points prior to describing the appearance of the image. We propose using the Harris corner detector [16] to extract visual landmarks from the panoramic images. After that, each interest point is described using standard SURF. To compare the test image with the map images, first we extract and describe the interest points from all the images. After that, a matching process is carried out with these points. The points detected in the test image, captured with a particular position and orientation of the vehicle, are searched in the map images. The performance of the matching method is not within the scope of this paper; we only employ it as a tool. Once the matching process has been carried out, we evaluate the performance of the descriptor taking into account the number of matched points, so that we consider as the closest image the one that presents more matching points with the test image. More concisely, we compute the distance between the test image $t$ and any other image of the map $j$ as

$$D_{\mathrm{fea}}^{t,j} = 1 - \frac{NM_{t,j}}{\max_j \left( \overrightarrow{NM}_t \right)}, \tag{3}$$

where $NM_{t,j}$ is the number of matches between the images $t$ and $j$, $\overrightarrow{NM}_t = [NM_{t,1}, \ldots, NM_{t,n_{\mathrm{map}}}]$ is a vector that contains the number of matches between the image $t$ and every image of the map, and $n_{\mathrm{map}}$ is the number of images in the map.

Second, in the case of the appearance-based methods (2D DFT, FS, gist, and HOG), no local information needs to be extracted from the images. Instead, the appearance of the whole images is compared. This way, the global descriptor of the test image is calculated, and the distances between it and the descriptors of the map images are obtained. The Euclidean distance is used in this case, defined as

$$D_E^{t,j} = \sqrt{\sum_{m=1}^{M} \left( d_m^t - d_m^j \right)^2}, \tag{4}$$

where $\vec{d}^{\,t}$ is the descriptor of the test image $t$, $\vec{d}^{\,j}$ is the descriptor of the map image $j$, and $M$ is the size of the descriptors. This distance is normalized to obtain the final distance between the images $t$ and $j$, according to the next expression:

$$D_{\mathrm{app}}^{t,j} = \frac{D_E^{t,j}}{\max_j \left( \vec{D}_E^t \right)}, \tag{5}$$

where $D_E^{t,j}$ is the Euclidean distance between the descriptor of the test image $t$ and that of the map image $j$, $\vec{D}_E^t = [D_E^{t,1}, \ldots, D_E^{t,n_{\mathrm{map}}}]$ is a vector that contains the Euclidean distances between the descriptor of the image $t$ and those of all the images in the map, and $n_{\mathrm{map}}$ is the number of images in the map.
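Both distances can be sketched in a few lines; nm_t (the match counts) and map_descriptors (one row per map image) are assumed to have been gathered beforehand:

```python
import numpy as np

def feature_distance(nm_t):
    # Eq. (3): nm_t[j] is the number of SURF matches between test image t
    # and map image j; more matches means a smaller distance.
    nm_t = np.asarray(nm_t, dtype=float)
    return 1.0 - nm_t / nm_t.max()

def appearance_distance(d_t, map_descriptors):
    # Eqs. (4) and (5): Euclidean distance between the global descriptor of
    # the test image and each map descriptor, normalized by its maximum.
    e = np.linalg.norm(np.asarray(map_descriptors) - d_t, axis=1)
    return e / e.max()
```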

It is important to note that the algorithm must be able to estimate the position of the robot with accuracy, but it is also important that the computational cost is adequate, to know whether it would be feasible to solve the problem in real time. To estimate the computational cost, we have computed, considering both maps in the experiments, the necessary time to calculate the descriptor of each test image, to compute the distance to the map descriptors, and to detect the most similar descriptor. We must take into account that the descriptors of all the map images can be computed prior to the localization, in an off-line process. Apart from the time, we have also estimated the amount of memory needed to store each image descriptor.

To finish, we also propose to study the relationship between the distance between two image descriptors and the geometric distance between the capture points of these two images. Ideally, the distance between the descriptors must increase as the geometric distance between capture points does (i.e., it must not present any local minima). This information is very interesting in applications such as map building, where the robot must be able to build a map using as input information only the distance between image descriptors. It is also important when it is necessary to estimate the position of the vehicle at halfway points within the grid map. Additionally, it may help to detect whether the problem of visual aliasing is present in the environment (i.e., two zones which are geometrically far apart may present a similar visual appearance, which might lead to errors in the mapping and localization process).

5. Experimental Results

As stated in the previous section, with the purpose of establishing the capacity of each descriptor to correctly localize the robot (or vehicle), we have built recall-precision curves to reflect the results of each experiment. Figure 4 shows this graphical representation using (a) the first and (b) the second set of images (denoted as Sets 1 and 2 in previous sections). To build this figure, we consider the localization results in zone 1. This way, the figure shows the ability of the localization algorithm to correctly detect which image of the map was captured closest to the test image. This is the most restrictive case.

Apart from it, the performance of the localization algorithm in zones 2 and 3 has also been studied. This way, Figure 5 shows the results of the localization process in zone 2, using (a) Set 1 and (b) Set 2. Finally, Figure 6 shows the localization results in zone 3, using (a) Set 1 and (b) Set 2. This is the least restrictive case among the three studied.

In all cases, the results show that the SURF-Harris descriptor presents a relatively better performance compared to the other descriptors, in terms of accuracy and using both image sets. As far as the methods based on global appearance are concerned, the good behavior of HOG can be highlighted: in the case of the localization in zone 2, it reaches 60% and 50% precision in Sets 1 and 2, respectively. These results can be considered relatively good, taking into account the fact that the localization process is solved in an absolute way (i.e., we consider that no information about the previous position of the robot is available, and the test image is compared with all the images stored in the data sets). In a real application, it is usual to make use of some kind of probabilistic algorithm to estimate the position of the robot, taking into account its previous estimated position. This is expected to provide a higher accuracy. We expect to develop this type of algorithms and tests in a future work.

Figure 4: Results of the localization algorithm considering the correct matches in zone 1, using (a) Set 1 and (b) Set 2. The results of each description method are shown as different recall-precision curves.

Figure 5: Results of the localization algorithm considering the correct matches in zone 2, using (a) Set 1 and (b) Set 2. The results of each description method are shown as different recall-precision curves.


Some additional conclusions can be reached by comparing the performance of the methods in open areas (Set 1) and urban areas (Set 2). In open areas, the performance of SURF-Harris, HOG, and gist is quite similar and relatively good in all cases, and the methods based on the Discrete Fourier Transform tend to present worse results. However, in the case of urban areas, SURF-Harris outperforms the other methods, and gist is the one that presents the worst results.

Apart from the localization accuracy, it is also important to study the computational cost of the process since, in a real application, it would have to run in real time as the robot navigates through the environment. This way, we have obtained, in all cases, the necessary time to calculate the descriptor of the test image, on the one hand, and to compare it with the descriptors stored in the map, to detect the most similar descriptor, and to analyze the results, on the other hand. The average computational time of the localization process, after considering all the test images, is shown in Table 1. To obtain the results of this table, the algorithms have been implemented using Matlab.

Figure 6: Results of the localization algorithm considering the correct matches in zone 3, using (a) Set 1 and (b) Set 2. The results of each description method are shown as different recall-precision curves.

Table 1: Average computational cost of the description algorithms studied, per test image. For each description method and data set, the table shows, first, the necessary time to obtain the test image descriptor and, second, the time to compare it with the descriptors of the map and to obtain the final localization result.

                          2D Fourier   Fourier Signature   Gist       HOG        SURF-Harris
Data Set 1, Descriptor    0.0087 s     0.0080 s            0.4886 s   0.0608 s   0.5542 s
Data Set 1, Match         0.0015 s     0.0058 s            0.0006 s   0.0008 s   25.8085 s
Data Set 2, Descriptor    0.0085 s     0.0079 s            0.4828 s   0.0621 s   0.5389 s
Data Set 2, Match         0.0012 s     0.0047 s            0.0005 s   0.0006 s   19.3931 s


With respect to the computational cost, the methods based on the Fourier Transform are significantly faster than the rest, while SURF-Harris presents a considerably high computational cost. Regarding the necessary time to compare two descriptors, gist and HOG are the fastest methods. In the case of SURF-Harris, the brute-force matching method implemented results in a relatively high computational cost. This method has been chosen to make a homogeneous comparison with the other global-appearance methods. However, in a real implementation, a bag-of-words based approach [23] would improve the computational efficiency of the algorithm.

At last, we have obtained the average memory size needed to store each descriptor. The results are shown in Table 2. Gist is the most compact descriptor (it is able to compress the information of each scene significantly), while SURF-Harris needs more memory size.

Considering these results jointly with the precision in localization, we could say that the SURF-Harris descriptor shows very good results in localization accuracy, but its computational cost makes it unfeasible for a real application. HOG, which is the second in terms of accuracy, also has a very good computational cost, so we consider it interesting to study this descriptor more thoroughly as future work and to implement more advanced versions of this method to try to optimize the accuracy. Likewise, other types of distances to compare images could also be studied, apart from the Euclidean distance. For the same reasons, we also consider it appropriate to examine the gist descriptors more thoroughly, as well as to use other methods to extract the gist of a scene apart from the orientation information (e.g., from the color information).

As a final experiment, we have studied the relationship between the distance between two image descriptors and the geometric distance between the capture points of these two images. As stated at the beginning of the section, this information is very interesting in some applications, such as the construction of maps from the images with geometric precision, or the localization of the vehicle at halfway points of the grid of the map. It is important that the distance between descriptors grows as the geometric distance does. Figure 7 shows the results obtained using (a) Set 1 and (b) Set 2. To obtain these figures, an image has been set as a reference image, and the distance between the reference image descriptor and the other descriptors has been calculated. The figure shows this distance versus the geometric distance between the capture points of each image and the capture point of the reference image. In both cases, this relationship is monotonically increasing up to a geometric distance of approximately 100 meters. From this point, it tends to stabilize, with a relatively high variance. The exception is the local features descriptor, which stabilizes at its final value from a very small geometric distance. In contrast, the appearance-based descriptors exhibit a more linear behavior around each image.


Table 2: Necessary memory to store each descriptor.

            2D Fourier    Fourier Signature   Gist         HOG          SURF-Harris
Descriptor  16384 bytes   32768 bytes         4096 bytes   8192 bytes   110400 bytes

Figure 7: Relationship between the distance between two image descriptors and the geometric distance between the capture points of these two images, using (a) Set 1 and (b) Set 2.

6. Conclusions and Future Works

In this paper, we have carried out a comparative evaluation of several scene description methods, considering the performance of these methods to accurately solve an absolute localization problem in a large, real outdoor environment. We evaluated two different approaches of visual descriptors: local features descriptors (SURF-Harris) and global-appearance descriptors (2D Discrete Fourier Transform, Fourier Signature, HOG, and gist).

All the tests have been carried out with images from Google Street View captured under realistic conditions. Two different areas of a city have been considered: an open area and a purely urban area with narrower streets. The capture points of each area present different topography: the first one is a grid map that covers several streets and avenues, and the second one is a linear map (i.e., the images were captured as the mobile traversed a linear path on a narrow street).

Some different studies have been performed. First, we have evaluated the accuracy of the localization process. To do this, recall and precision curves have been computed to compare the performance of each description method. We plot the recall and precision curves for both areas, taking into account different levels of accuracy to consider that the localization result is correct. In these experiments, the computational cost of the localization process has also been analyzed.

We have also studied each descriptor in terms of the behavior of the descriptor distance compared to the geometrical distance between image capture points. To do this, we plot a curve that represents the descriptor distance versus the geometrical distance between capture points. This measure is very useful for performing navigation tasks since, thanks to it, we can estimate the range of use of the descriptor.

It is noticeable that the SURF-Harris descriptor is the most suitable descriptor in terms of precision in localization, but it presents a smaller working zone in terms of Euclidean distance between descriptors. The HOG descriptor has shown a relatively good performance to solve the localization problem and presents a good response of the descriptor distance versus the geometrical distance between capture points. If we analyze jointly the results of both experiments and take into account the computational cost (Tables 1 and 2), we conclude that, although the SURF-Harris descriptor presents the best results in terms of recall and precision curves, it does not allow us to work in real time. Therefore, taking into account that HOG is the descriptor that presents the second best results in terms of recall and precision curves and allows us to work in real time, we can conclude that HOG is the most suitable descriptor.

We plan to extend this work to (a) capture a real outdoor trajectory, traveling along several streets and capturing omnidirectional images using a catadioptric vision system, (b) combine the information provided by this vision system and the images of Google Street View, and (c) evaluate the performance of the best descriptors in a probabilistic localization process.

Competing Interests

The authors declare that they have no competing interests.

Acknowledgments

This work has been supported by the Spanish Government through the project DPI 2013-41557-P, Navegación de Robots en Entornos Dinámicos Mediante Mapas Compactos con Información Visual de Apariencia Global, and by the Generalitat Valenciana through the projects AICO/2015/021, Localización y Creación de Mapas Visuales para Navegación de Robots con 6 GDL, and GV/2015/031, Creación de Mapas Topológicos a Partir de la Apariencia Global de un Conjunto de Escenas.

References

[1] S. Thrun, D. Fox, W. Burgard, and F. Dellaert, "Robust Monte Carlo localization for mobile robots," Artificial Intelligence, vol. 128, no. 1-2, pp. 99–141, 2001.

[2] A. Gil, O. Reinoso, M. A. Vicente, C. Fernández, and L. Payá, "Monte Carlo localization using SIFT features," in Pattern Recognition and Image Analysis, vol. 3522 of Lecture Notes in Computer Science, pp. 623–630, 2005.

[3] A. Murillo, J. Guerrero, and C. Sagüés, "SURF features for efficient robot localization with omnidirectional images," in Proceedings of the IEEE International Conference on Robotics & Automation, San Diego, Calif, USA, 2007.

[4] H. Bay, T. Tuytelaars, and L. Van Gool, "SURF: speeded up robust features," in Proceedings of the 9th European Conference on Computer Vision, Graz, Austria, 2006.

[5] F. Rossi, A. Ranganathan, F. Dellaert, and E. Menegatti, "Toward topological localization with spherical Fourier transform and uncalibrated camera," in Proceedings of the International Conference on Simulation, Modeling and Programming for Autonomous Robots (SIMPAR '08), pp. 319–330, Venice, Italy, 2008.

[6] L. Payá, O. Reinoso, F. Amorós, L. Fernández, and A. Gil, "Probabilistic map building, localization and navigation of a team of mobile robots: application to route following," in Multi-Robot Systems: Trends and Development, pp. 191–210, 2011.

[7] L. Fernández, L. Payá, D. Valiente, A. Gil, and O. Reinoso, "Monte Carlo localization using the global appearance of omnidirectional images: algorithm optimization to large indoor environments," in Proceedings of the 9th International Conference on Informatics in Control, Automation and Robotics (ICINCO '12), pp. 439–442, Rome, Italy, July 2012.

[8] M. Montemerlo, FastSLAM: a factored solution to the simultaneous localization and mapping problem with unknown data association [Ph.D. thesis], Robotics Institute, Carnegie Mellon University, Pittsburgh, Pa, USA, 2003.

[9] C. Gamallo, P. Quintía, R. Iglesias-Rodríguez, J. V. Lorenzo, and C. V. Regueiro, "Combination of a low cost GPS with visual localization based on a previous map for outdoor navigation," in Proceedings of the 11th International Conference on Intelligent Systems Design and Applications (ISDA '11), pp. 1146–1151, Córdoba, Spain, November 2011.

[10] A. Torii, J. Sivic, and T. Pajdla, "Visual localization by linear combination of image descriptors," in Proceedings of the IEEE International Conference on Computer Vision Workshops (ICCV Workshops '11), pp. 102–109, Barcelona, Spain, November 2011.

[11] M. Aly and J.-Y. Bouguet, "Street view goes indoors: automatic pose estimation from uncalibrated unordered spherical panoramas," in Proceedings of the IEEE Workshop on the Applications of Computer Vision (WACV '12), pp. 1–8, Breckenridge, Colo, USA, January 2012.

[12] A. Taneja, L. Ballan, and M. Pollefeys, "Registration of spherical panoramic images with cadastral 3D models," in Proceedings of the International Conference on 3D Imaging, Modeling, Processing, Visualization and Transmission (3DIMPVT '12), pp. 479–486, Zurich, Switzerland, 2012.

[13] F. Amorós, L. Payá, O. Reinoso, and L. Jiménez, "Comparison of global-appearance techniques applied to visual map building and localization," in Proceedings of the International Conference on Computer Vision Theory and Applications, pp. 395–398, Rome, Italy, 2012.

[14] A. Gil, O. M. Mozos, M. Ballesta, and O. Reinoso, "A comparative evaluation of interest point detectors and local descriptors for visual SLAM," Machine Vision and Applications, vol. 21, no. 6, pp. 905–920, 2010.

[15] L. Payá, L. Fernández, O. Reinoso, A. Gil, and D. Úbeda, "Appearance-based dense maps creation: comparison of compression techniques with panoramic images," in Proceedings of the International Conference on Informatics in Control, Automation and Robotics (INSTICC '09), pp. 238–246, Milan, Italy, 2009.

[16] C. Harris and M. Stephens, "A combined corner and edge detector," in Proceedings of the Alvey Vision Conference, pp. 231–236, Manchester, UK, 1988.

[17] E. Menegatti, T. Maeda, and H. Ishiguro, "Image-based memory for robot navigation using properties of omnidirectional images," Robotics and Autonomous Systems, vol. 47, no. 4, pp. 251–267, 2004.

[18] L. Payá, L. Fernández, A. Gil, and O. Reinoso, "Map building and Monte Carlo localization using global appearance of omnidirectional images," Sensors, vol. 10, no. 12, pp. 11468–11497, 2010.

[19] A. Friedman, "Framing pictures: the role of knowledge in automatized encoding and memory for gist," Journal of Experimental Psychology: General, vol. 108, no. 3, pp. 316–355, 1979.

[20] A. Torralba, "Contextual priming for object detection," International Journal of Computer Vision, vol. 53, no. 2, pp. 169–191, 2003.

[21] N. Dalal and B. Triggs, "Histograms of oriented gradients for human detection," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '05), pp. 886–893, Washington, DC, USA, June 2005.

[22] F. Amorós, L. Payá, O. Reinoso, L. Fernández, and J. Marín, "Visual map building and localization with an appearance-based approach: comparisons of techniques to extract information of panoramic images," in Proceedings of the 7th International Conference on Informatics in Control, Automation and Robotics, pp. 423–426, 2010.

[23] D. Filliat, "A visual bag of words method for interactive qualitative localization and mapping," in Proceedings of the IEEE International Conference on Robotics and Automation (ICRA '07), pp. 3921–3926, Roma, Italy, April 2007.

International Journal of

AerospaceEngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Active and Passive Electronic Components

Control Scienceand Engineering

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

RotatingMachinery

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

Journal ofEngineeringVolume 2014

Submit your manuscripts athttpwwwhindawicom

VLSI Design

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Shock and Vibration

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawi Publishing Corporation httpwwwhindawicom

Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

SensorsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Navigation and Observation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

DistributedSensor Networks

International Journal of

Page 2: Research Article A Study of Visual Descriptors for Outdoor ...downloads.hindawi.com/journals/js/2016/1537891.pdf · Research Article A Study of Visual Descriptors for Outdoor Navigation

2 Journal of Sensors

of a mobile robot. Payá et al. [6] present a framework to carry out multirobot route following using an appearance-based approach, with omnidirectional images to represent the environment and a probabilistic method to estimate the localization of the robot. Finally, Fernández et al. [7] deal with the problem of robot localization using the visual information provided by a single omnidirectional camera mounted on the robot, using techniques based on the global appearance of panoramic images and a Monte Carlo Localization (MCL) algorithm [8].

The availability of spherical images that represent outdoor environments is nowadays almost unlimited thanks to the services of Google Street View. Furthermore, these images provide a complete 360-degree view of the scenery in the ground plane and a 180-degree view vertically. Thanks to this great amount of information, these images can be used to carry out autonomous navigation tasks robustly. Using a set of these previously available spherical images as a dense visual map of an environment, it is possible to develop an autonomous localization and navigation system that employs the images captured by a mobile robot or vehicle and compares them with the map information in order to solve the localization problem. This way, in this paper we consider the use of the images provided by Google Street View as a visual map of the environment in which a mobile robot must be localized using the image acquired from an unknown position.

The literature regarding the navigation problem using Google Street View information is somewhat sparse, but it has grown in recent years. For example, Gamallo et al. [9] proposed the combination of a low cost GPS with a particle filter to implement a vision-based localization system that compares traversable regions detected by a camera with regions previously labeled in a map (composed of Google Maps images). The main contribution of this work is that a synthetic image of what the robot should see from the predicted position is generated and compared with the real observation to calculate the weight of each particle. Torii et al. [10] tried to predict the GPS location of a query image given the Google Street View database. This work presents the design of a matching procedure that considers linear combinations of bag-of-features vectors of database images. With respect to indoor pose estimation, Aly and Bouguet [11] present an algorithm that takes spherical Google Street View images as input and outputs their relative pose up to a global scale. Finally, Taneja et al. [12] proposed a method to refine the calibration of the Google Street View images leveraging cadastral 3D information.

The localization of the vehicle/robot can be formulated as the problem of matching the currently captured image with the images previously stored in the dense map (the images in the database). Nowadays, a great variety of detection and description methods have been proposed in the context of visual navigation but, in our opinion, there is no consensus on this matter when outdoor images are used.

Amorós et al. [13] carried out a review and comparison of different global-appearance methods to create descriptors of panoramic scenes in order to extract their most relevant information. The authors of this work developed a set of experiments with panoramic images captured in indoor environments to demonstrate the applicability of some appearance descriptors to robotic navigation tasks and to measure their quality in position and orientation estimation. However, as far as outdoor scenarios are concerned, there is no review of methods that offer good results. This situation, combined with the fact that Google Street View images have barely been tested in autonomous navigation systems, has motivated the work presented here. Following this philosophy, we made a comparison between different descriptors of panoramic images, but in this case we used Google Street View images captured in outdoor environments. This is a more challenging problem due to several features: the openness of the images (i.e., the degree of dominance of some structures, such as the sky and the road, which do not add distinctiveness to the image), their changing lighting conditions, and the large geometric distance between the points where the images were captured.

Taking these features into account, we consider that it is worth carrying out a comparative evaluation of the performance of different image descriptors under real conditions of autonomous outdoor localization, since it is a necessary step prior to the implementation of a visual navigation framework. In this paper, we evaluate two different approaches: approaches based on local features and approaches based on global appearance. In both cases, we test the performance of the descriptor depending on the main parameters that configure it, and we make a graphical representation of the precision of each method versus the recall [14].

When a robot has to navigate autonomously outdoors, very often a rough estimation of the area where the robot moves is available, and the robot must be able to estimate its position within this wide zone. This work focuses on this task: we assume that the zone where the robot navigates is approximately known, and the robot must estimate its position more accurately within this area. With this aim, two different wide areas have been chosen to evaluate the performance of the localization algorithms, and a set of images per area has been obtained from the Google Street View database.

The remainder of the paper is organized as follows. In Section 2, we present the description methods evaluated in this work. In Section 3, the experimental setup and the databases we have used are described. Section 4 describes the method we have followed to evaluate the descriptors in a localization process. Section 5 presents the experimental results. Finally, in Section 6, we outline the conclusions and future works.

2. Image Descriptors

In this section, we present five different image descriptors that are suitable to build a compact description of the appearance of each scene [13-15]. One of the methods, previously denoted as a feature-based approach, consists in representing the image as a set of landmarks extracted from the scene, along with the description of such landmarks. The method selected for this landmark description is SURF (Speeded Up Robust Features). The other methods chosen to carry out the comparative analysis are the following appearance-based methods: the two-dimensional Discrete Fourier Transform (DFT), the Fourier Signature (FS), gist, and the Histogram of Oriented Gradients (HOG). Each method uses a different mechanism to express the global information of the scene. First, DFT and FS are based on the analysis in the frequency domain, in two dimensions and one dimension, respectively. Second, the gist approach we use is built from edge information obtained through Gabor filtering and analyzed at several scales. Finally, HOG gathers systematic information from the orientation of the gradient in local areas of the image. The choice of these description methods permits analyzing the influence of each kind of information on the localization process.

The initial objective of this study was to compare some global-appearance methods. However, we have decided to include a local features description method in this comparative evaluation to make the study more complete. With this aim, we have chosen SURF, due to its relatively low computational cost compared with other classical feature-based approaches.

The next subsections briefly present the description methods included in the comparative evaluation.

2.1. SURF and Harris Corner Detector. The Speeded Up Robust Features (SURF) were introduced by Bay et al. [4]. That study showed that SURF outperforms existing methods with respect to repeatability, robustness, and distinctiveness of the descriptors. The detection method uses integral images to reduce the computational time and is based on the Hessian matrix. The descriptor, in turn, represents a distribution of Haar-wavelet responses within the interest point neighborhood and also makes an efficient use of integral images. In this work, we only include the standard SURF descriptor, which has a dimension of 64 components per landmark, but there are two more versions: the extended version (E-SURF), with 128 elements, and the upright version (U-SURF), which is not invariant to rotation and has a length of 64 elements. On the other hand, we perform the detection of the features using the Harris corner detector (based on the eigenvalues of the second moment matrix [16]), because our experiments showed that this method extracted more robust points in outdoor images compared to the SURF extraction method.

This way, the method we use in this work is a combination of these two algorithms. More specifically, the Harris corner detector is used to extract the features from the image, and the standard SURF descriptor is used to characterize and describe each of the landmarks previously detected.
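As an illustration, the following listing combines both steps with OpenCV. The experiments in this paper were implemented in Matlab (cf. Section 5); the sketches in this article use Python only for illustration. This is a minimal sketch, not the authors' original implementation: it assumes a grayscale input image and an opencv-contrib build compiled with the nonfree xfeatures2d module (which provides SURF), and the function name and all parameter values are illustrative.

import cv2

def harris_surf(gray, max_corners=500):
    # Harris corner detection, based on the second-moment matrix [16]
    corners = cv2.goodFeaturesToTrack(gray, maxCorners=max_corners,
                                      qualityLevel=0.01, minDistance=5,
                                      useHarrisDetector=True, k=0.04)
    pts = corners.reshape(-1, 2)
    keypoints = [cv2.KeyPoint(float(x), float(y), 9) for x, y in pts]
    # Standard SURF: extended=False gives 64 components per landmark
    surf = cv2.xfeatures2d.SURF_create(extended=False)
    keypoints, descriptors = surf.compute(gray, keypoints)
    return keypoints, descriptors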

2.2. Two-Dimensional Discrete Fourier Transform. From an image $f(x,y)$ with $N_x$ rows and $N_y$ columns, the 2D Discrete Fourier Transform (DFT) can be defined as follows:

\[
F[f(x,y)] = F(u,v) = \frac{1}{N_x N_y} \sum_{x=0}^{N_x-1} \sum_{y=0}^{N_y-1} f(x,y) \, e^{-2\pi j \left( u x / N_x + v y / N_y \right)}, \quad u = 0, \ldots, N_x - 1, \; v = 0, \ldots, N_y - 1 \tag{1}
\]

where $(u,v)$ are the frequency variables, and the transformed function $F(u,v)$ is a complex function which can be decomposed into a magnitudes matrix and an arguments matrix. This transformation presents some interesting properties which are helpful in robot localization tasks. First, the most relevant information in the Fourier domain concentrates in the low frequency components, so it is possible to reduce the amount of memory and to optimize the computational cost by retaining only the first $k_x$ rows and $k_y$ columns of the transform. Second, when $f(x,y)$ is a panoramic scene, a translation in the rows and/or columns of the original image produces a change only in the arguments matrix [15]. This way, the magnitudes matrix contains information which is invariant to rotations of the robot in the ground plane, and the arguments matrix contains information that can be useful to estimate the orientation of the robot in this plane with respect to a reference image (using the DFT shift theorem).

Taking these facts into account, the global description of the image $f(x,y)$ consists of the magnitudes matrix $A(u,v)$ and the arguments matrix $\Phi(u,v)$ of its two-dimensional DFT. The dimensions of both matrices are $k_x < N_x$ rows and $k_y < N_y$ columns. On the one hand, $A(u,v)$ is useful to estimate the robot position; on the other hand, the information in $\Phi(u,v)$ can be used to estimate the robot orientation.
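As a sketch of how this descriptor can be obtained, the following NumPy fragment computes the truncated magnitudes and arguments matrices. It is a minimal illustration under the paper's definitions; the truncation sizes kx and ky are illustrative values, not the ones tuned in the experiments.

import numpy as np

def dft_descriptor(img, kx=16, ky=16):
    # 2D DFT with the 1/(Nx*Ny) scaling of (1)
    F = np.fft.fft2(img) / img.size
    A = np.abs(F[:kx, :ky])      # magnitudes: invariant to ground-plane rotation
    Phi = np.angle(F[:kx, :ky])  # arguments: useful to estimate orientation
    return A, Phi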

2.3. Fourier Signature. The third image description method used in this comparative analysis is the Fourier Signature (FS), initially described by Menegatti et al. [17]. From an image $f(x,y)$ with $N_x$ rows and $N_y$ columns, the FS consists in obtaining the one-dimensional DFT of each row. This method presents some advantages, such as its simplicity, its low computational cost, and the fact that it better exploits the invariance against rotations of the robot in the ground plane when we work with panoramic views.

More specifically, the process to compute the FS consists in transforming each row $x$ of the original panoramic image, $\{f_{x,0}, f_{x,1}, \ldots, f_{x,N_y-1}\}$, $x = 0, \ldots, N_x-1$, into a sequence of complex numbers $\{F_{x,0}, F_{x,1}, \ldots, F_{x,N_y-1}\}$, $x = 0, \ldots, N_x-1$, according to the 1D-DFT expression:

\[
F_{x,k} = \sum_{n=0}^{N_y-1} f_{x,n} \, e^{-j (2\pi / N_y) k n}, \quad k = 0, \ldots, N_y - 1, \; x = 0, \ldots, N_x - 1 \tag{2}
\]

The result is a complex matrix $F(x,v)$, where $v$ is a frequency variable, which can be decomposed into a magnitudes matrix and an arguments matrix.

Thanks to the 1D-DFT properties, it is possible to represent each row of $F(x,v)$ with its first coefficients, since the most relevant information is concentrated in the low frequency components of each row of the descriptor; thus, it is possible to reduce the amount of memory by retaining only the first $k_y$ columns of the signature $F(x,v)$. Also, when $f(x,y)$ is a panoramic scene, the magnitudes matrix is invariant against robot rotations in the ground plane, and the arguments matrix permits estimating the change in the robot orientation using the DFT shift theorem [15, 17, 18].


Taking these facts into account, the global description of the image $f(x,y)$ consists of the magnitudes matrix $A(x,v)$ and the arguments matrix $\Phi(x,v)$ of the Fourier Signature. The dimensions of both matrices are $N_x$ rows and $k_y < N_y$ columns. First, the position of the robot can be estimated using the information in $A(x,v)$, since it is invariant to changes in robot orientation; second, $\Phi(x,v)$ can be used to estimate the robot orientation.
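A minimal NumPy sketch of the Fourier Signature follows; it implements (2) row by row and keeps only the first ky columns, with ky an illustrative value.

import numpy as np

def fourier_signature(img, ky=32):
    F = np.fft.fft(img, axis=1)   # 1D DFT of each row, as in (2)
    A = np.abs(F[:, :ky])         # magnitudes: invariant to robot rotation
    Phi = np.angle(F[:, :ky])     # arguments: encode the orientation change
    return A, Phi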

2.4. Gist. The concept of the gist of an image can be defined as an abstract representation that activates the memory of scene categories [19]. Gist-based descriptors try to represent the image by obtaining its essential information, simulating the human perception system and its ability to recognize a scene through the identification of color, saliency, or remarkable structures. Torralba [20] presents a model to obtain global scene features, working at several spatial frequencies and using different scales, based on Gabor filtering; these features are used in a scene recognition and classification task. In previous works [13], we employed a gist-Gabor descriptor in order to obtain frequency and orientation information. Due to the good results obtained in indoor environments, where the mobile robot presents 3-DOF (degrees of freedom) movements on the ground plane, the fourth method employed in the comparative analysis presented in this paper is the gist descriptor of panoramic images.

The method starts with two versions of the initial panoramic image $f(x,y)$: the original one, with $N_x$ rows and $N_y$ columns, and a new version obtained after applying a Gaussian low-pass filter and subsampling to a new size equal to $0.5 N_x \times 0.5 N_y$. After that, both images are filtered with a bank of $n_f$ Gabor filters whose orientations are evenly distributed to cover the whole circle. Then, to reduce the amount of information, the pixels of both images are grouped into $k_1$ horizontal blocks per image, whose width is equal to $N_y$ in the first image and $0.5 N_y$ in the second one. The average value of the pixels in each group is calculated, and all this information is arranged into a final descriptor, which is a column vector with $2 \cdot k_1 \cdot n_f$ components. This descriptor is invariant against rotations of the vehicle on the ground plane. More information about the method can be found in [13].
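The following sketch illustrates this gist-Gabor pipeline with OpenCV. It is an approximation under stated assumptions: the Gabor kernel parameters (kernel size, sigma, wavelength) are illustrative and not the values tuned in [13].

import cv2
import numpy as np

def gist_descriptor(img, n_f=8, k1=8):
    # Gaussian low-pass filtering and subsampling to 0.5*Nx x 0.5*Ny
    half = cv2.pyrDown(img)
    descriptor = []
    for scale in (img, half):
        scale = scale.astype(np.float32)
        for i in range(n_f):
            theta = i * np.pi / n_f  # orientations evenly cover the circle
            kernel = cv2.getGaborKernel((9, 9), 2.0, theta, 8.0, 1.0)
            response = cv2.filter2D(scale, -1, kernel)
            # k1 horizontal blocks spanning the full image width
            for block in np.array_split(response, k1, axis=0):
                descriptor.append(np.abs(block).mean())
    return np.array(descriptor)  # 2 * k1 * n_f components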

2.5. Histogram of Oriented Gradients. The Histogram of Oriented Gradients (HOG) descriptor is based on the orientation of the gradient in local areas of an image. It was initially described by Dalal and Triggs [21]. More concisely, it consists first in obtaining the magnitude and orientation of the gradient at each pixel of the original image. This image is then divided into a set of cells, and a histogram of gradient orientations is compiled for each cell, aggregating the gradient orientation of each pixel within the cell, weighted by its gradient magnitude.

The omnidirectional images captured from a specific position on the ground plane contain the same pixels in each row independently of the orientation of the robot in this plane, but in a different order. Taking this fact into account, if we compute the histograms over cells that have the same width as the original image, we obtain a descriptor which is invariant against rotations of the robot.

The method we use is described in depth in [22] and can be summarized as follows. The initial panoramic image $f(x,y)$, with $N_x$ rows and $N_y$ columns, is first filtered to obtain two images with the information of the horizontal and vertical edges, $f_x(x,y)$ and $f_y(x,y)$. From these two images, the magnitude and orientation of the gradient are obtained pixel by pixel, and the results are stored in the matrices $M(x,y)$ and $\Theta(x,y)$, respectively. Matrix $\Theta(x,y)$ is then divided into $k_2$ horizontal cells whose width is equal to $N_y$. For each cell, an orientation histogram with $b$ bins is compiled. During this process, each pixel in $\Theta(x,y)$ is weighted with the magnitude of the corresponding pixel in $M(x,y)$. At the end of the process, the set of histograms constitutes the final descriptor $h$, which is a column vector with $k_2 \cdot b$ components.
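A compact NumPy version of this rotation-invariant HOG can be sketched as follows; the number of cells k2 and bins b are illustrative values, and np.gradient stands in for the edge filters of [22].

import numpy as np

def hog_descriptor(img, k2=16, b=32):
    img = img.astype(np.float32)
    fx = np.gradient(img, axis=1)   # horizontal edges f_x(x, y)
    fy = np.gradient(img, axis=0)   # vertical edges f_y(x, y)
    M = np.hypot(fx, fy)            # gradient magnitude M(x, y)
    Theta = np.arctan2(fy, fx)      # gradient orientation Theta(x, y)
    descriptor = []
    # k2 horizontal cells whose width equals the image width Ny
    for cM, cT in zip(np.array_split(M, k2, axis=0),
                      np.array_split(Theta, k2, axis=0)):
        hist, _ = np.histogram(cT, bins=b, range=(-np.pi, np.pi), weights=cM)
        descriptor.extend(hist)     # magnitude-weighted orientation histogram
    return np.array(descriptor)     # k2 * b components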

3. Experimental Setup

The main objective of this work consists in carrying out an exhaustive evaluation of the performance of the description methods presented in the previous section. All these methods will be included in a localization algorithm, and their performance will be evaluated and compared both in terms of computational cost and of localization accuracy. The results of this comparative evaluation will give us an idea of which description method offers the best results in outdoor environments when using Google Street View images.

With this aim, two different regions of the city of Elche (Spain) have been selected, and the Google Maps images of these two areas have been obtained and stored in two data sets. Each of these data sets will constitute a map and will subsequently be used to estimate the position of the vehicle within the map, by comparing the image captured by the vehicle from an unknown position with the images previously stored in each map.

The main features of the two sets of images are as follows.

Set 1. Set 1 consists of 177 full spherical panoramic images, with resolutions generally up to 3328 × 1664 pixels. Each image covers a field of view of 360 degrees in the ground plane and 180 degrees vertically. Figure 1 shows the GPS position where each image was captured (blue dots) and two examples of the panoramic images after the preprocessing step. This set corresponds to a database with a mesh topology that contains images of several streets and open areas. The images cover an area of approximately 700 m × 300 m.

Set 2. Set 2 consists of 144 full spherical panoramic images. The images have been captured along the same street, with a linear topology, covering approximately 1700 m. The appearance of these images is more urban. Figure 2 shows the GPS position where each image was captured (blue dots) and three examples of the panoramic images after the preprocessing step.

Figure 1: Bird's eye view of the region chosen as map 1, prior to the localization experiment. The blue dots represent the coordinates where the images of Set 1 were captured. Two examples of Google Street View images after a preprocessing step are also shown.

Figure 2: Bird's eye view of the region chosen as map 2, prior to the localization experiment. The blue dots represent the coordinates where the images of Set 2 were captured. Three examples of Google Street View images after a preprocessing step are also shown.

3.1. Image Preprocessing and Map Creation. Due to the wide vertical field of view of the acquisition system, the sky often occupies a big portion of the Google Street View images. The appearance of this area is very prone to change when the localization process is carried out at a different time of day with respect to the time of day when the map was captured. Taking this fact into account, a preprocessing step has been carried out to remove part of the sky from the scenes.

Once part of the sky has been removed from all the scenes, the images are converted to grayscale and their resolution is reduced to 512 × 128 pixels, to ensure the computational viability of the algorithms.
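The preprocessing chain can be sketched in a few lines with OpenCV. The fraction of rows cropped as sky (sky_fraction) is an assumption, since the paper does not state the exact crop; the function name is also illustrative.

import cv2

def preprocess(panorama, sky_fraction=0.25):
    h = panorama.shape[0]
    cropped = panorama[int(h * sky_fraction):, :]     # remove part of the sky
    gray = cv2.cvtColor(cropped, cv2.COLOR_BGR2GRAY)  # grayscale conversion
    return cv2.resize(gray, (512, 128), interpolation=cv2.INTER_AREA)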

After that, each image will be described using the five description methods presented in Section 2. At the end, one map will be available per image set and per description method. Each map will be composed of the set of descriptors of each panoramic scene.

3.2. Localization Process. Once the maps are available, in order to evaluate the different visual descriptors introduced in Section 2 to solve the localization problem, we also make use of Google Street View images.

To carry out the localization process, first we choose one of the images of the database (named the test image). At this moment, this image is removed from the map. Second, we compute the descriptor of the test image (using one of the methods presented in Section 2) and obtain the distance between this descriptor and the descriptors of the rest of the images stored in the corresponding map. As a result, the images of the map are arranged from the nearest to the farthest, using the image distance as the arranging criterion.

The result of the localization algorithm is considered correct if the first image it returns was captured at the map point which is geometrically closest to the test image capture point (the GPS coordinates are used with this aim). We will refer to this case as a correct localization in zone 1. However, since this is a quite challenging and restrictive problem, it is also interesting to know whether the first image that the algorithm returns was captured at one of the two geometrically closest points to the test image capture point (zone 2), or even at one of the three geometrically closest points (zone 3). The first case is the ideal one, but we are also interested in the other cases, as they indicate whether the algorithm is returning an image in the surroundings of the actual position of the test image (i.e., the localization algorithm detects that the robot is in a zone close around its actual position).

This process is repeated for each description method, using all the images of Sets 1 and 2 as test images. In brief, the procedure to test the localization methods explained previously consists of the following steps, for each image and description method:

(1) Extracting one image of the set (denoted as the test image); this test image is then eliminated from the map.

(2) Calculating the descriptor of the test image.

(3) Calculating the distance between this descriptor and all the map descriptors; we name this distance the image distance.

(4) Retaining the most similar descriptor and studying whether it corresponds to an image that was captured in the surroundings of the test image capture point (zone 1, 2, or 3).

As a result, the following data are retained for each individual test image: the image distance between the test image descriptor and the most similar map descriptor, $D_t$, and the localization results (correct or wrong match) in zone 1, $m_{t1}$, in zone 2, $m_{t2}$, and in zone 3, $m_{t3}$. After repeating this process with all the test images, the results consist of four vectors whose dimension is equal to the number of test images: the first vector contains the distances $D_t$, and the other three, $\vec{m}_1$, $\vec{m}_2$, and $\vec{m}_3$, contain, respectively, the information of correct or incorrect matches in zones 1, 2, and 3.
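The complete leave-one-out procedure above can be sketched as follows. Here distance_fn stands for one of the image distances defined in Section 4, and map_xy for the capture coordinates (e.g., GPS projected to meters); both names are placeholders, not the authors' code.

import numpy as np

def run_localization(descriptors, distance_fn, map_xy):
    xy = np.asarray(map_xy, dtype=float)
    n = len(descriptors)
    D = np.zeros(n)                # distance to the most similar map image
    zone = np.zeros(n, dtype=int)  # 1, 2, 3, or 0 if outside zone 3
    for t in range(n):
        others = [j for j in range(n) if j != t]   # step 1: remove the test image
        dists = [distance_fn(descriptors[t], descriptors[j]) for j in others]
        best = int(np.argmin(dists))               # steps 2-3: nearest descriptor
        D[t] = dists[best]
        # step 4: geometric rank of the retained match (zone 1, 2, or 3)
        geo = np.linalg.norm(xy[others] - xy[t], axis=1)
        rank = 1 + int(np.sum(geo < geo[best]))
        zone[t] = rank if rank <= 3 else 0
    return D, zone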

4. Evaluation Methods

In this work, the localization results are expressed by means of recall and precision curves [14]. To build them, the components of the vectors $\vec{m}_1$, $\vec{m}_2$, and $\vec{m}_3$ are equally sorted in ascending order of the distances that appear in the first vector. The resulting sorted vectors of correct and wrong matches are then used to calculate the values of recall and precision. Let us focus on the sorted vector of matches in zone 1, $\vec{m}_{s1}$. First, for each component in this vector, the recall is calculated as the number of correct matches obtained so far with respect to the total number of test images. Second, for each component in the same vector, the precision is obtained as the number of correct matches obtained so far with respect to the number of test images considered so far. Then, with the information contained in these vectors, a precision versus recall curve is built, corresponding to the localization in zone 1. This is repeated with the sorted vectors $\vec{m}_{s2}$ and $\vec{m}_{s3}$ to obtain the localization results in zones 2 and 3.

In our experiments, the most important piece of information in this type of graph is the final point, because it shows the global result of the experiment (the final precision after considering all the test images). However, additional relevant information can be extracted from these graphs, because they also show the ability of the localization algorithm to find the correct match when a specific image distance threshold is considered. As explained in the previous paragraph, the results have been arranged in ascending order of distance; hence, as the recall increases, so does the threshold. For this reason, the evolution of the recall-precision curves contains information about the robustness of the algorithm with respect to a specific image distance threshold: if the precision values stay high independently of the recall, there is a lower number of wrong results under this distance threshold. Figure 3 shows two sample recall-precision curves obtained after running the localization algorithm with all the test images and two different description methods, considering zone 1. Both curves show a similar final precision value, between 0.6 and 0.65. However, their evolutions present a different behavior. As an example, if we take as threshold the distance associated with recall = 0.25, according to the graph the precision of descriptor 1 is 100% but the precision of descriptor 2 is 90%. This means that, with the selected image distance threshold, 25% of the correct localizations are achieved with 100% probability using descriptor 1 and with 90% probability using descriptor 2. This study can be carried out for any value of the image distance threshold.

Figure 3: Two sample recall-precision graphs obtained after carrying out the localization experiments with two different description methods.
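The construction of these curves follows directly from the definitions above, as in this short sketch; D is the vector of image distances and correct the boolean match vector of one zone.

import numpy as np

def recall_precision(D, correct):
    order = np.argsort(D)                       # ascending distance threshold
    c = np.asarray(correct, dtype=float)[order]
    cum = np.cumsum(c)                          # correct matches so far
    recall = cum / len(c)                       # over all test images
    precision = cum / np.arange(1, len(c) + 1)  # over images considered so far
    return recall, precision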


Before running the algorithm, it is necessary to define the image distance. We use two different distance measures, depending on the kind of descriptor used.

First, in the case of the feature-based method (SURF-Harris), it is necessary to extract the interest points prior to describing the appearance of the image. We propose using the Harris corner detector [16] to extract visual landmarks from the panoramic images. After that, each interest point is described using standard SURF. To compare the test image with the map images, first we extract and describe the interest points of all the images. After that, a matching process is carried out with these points: the points detected in the test image, captured with a particular position and orientation of the vehicle, are searched in the map images. The performance of the matching method is not within the scope of this paper; we only employ it as a tool. Once the matching process has been carried out, we evaluate the performance of the descriptor taking into account the number of matched points, so that we consider as the closest image the one that presents more matching points with the test image. More concisely, we compute the distance between the test image $t$ and any other image of the map $j$ as

\[
D_{\mathrm{fea}}^{t,j} = 1 - \frac{NM_{t,j}}{\max_j \left( \overrightarrow{NM}_t \right)} \tag{3}
\]

where $NM_{t,j}$ is the number of matches between the images $t$ and $j$, $\overrightarrow{NM}_t = [NM_{t,1}, \ldots, NM_{t,n_{\mathrm{map}}}]$ is a vector that contains the number of matches between the image $t$ and every image of the map, and $n_{\mathrm{map}}$ is the number of images in the map.
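In code, (3) reduces to a single normalization once the match counts are available; a minimal sketch:

import numpy as np

def feature_distance(nm_t):
    # nm_t: number of SURF matches between the test image and each map image
    nm = np.asarray(nm_t, dtype=float)
    return 1.0 - nm / nm.max()   # the best-matched map image gets distance 0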

Second, in the case of the appearance-based methods (2D DFT, FS, gist, and HOG), no local information needs to be extracted from the images. Instead, the appearance of the whole images is compared. This way, the global descriptor of the test image is calculated, and the distances between it and the descriptors of the map images are obtained. The Euclidean distance is used in this case, defined as

\[
D_{E}^{t,j} = \sqrt{\sum_{m=1}^{M} \left( d_m^t - d_m^j \right)^2} \tag{4}
\]

where $\vec{d}^t$ is the descriptor of the test image $t$, $\vec{d}^j$ is the descriptor of the map image $j$, $d_m^t$ and $d_m^j$ are their $m$-th components, and $M$ is the size of the descriptors. This distance is normalized to obtain the final distance between the images $t$ and $j$, according to the following expression:

\[
D_{\mathrm{app}}^{t,j} = \frac{D_{E}^{t,j}}{\max_j \left( \vec{D}_{E}^{t} \right)} \tag{5}
\]

where $D_{E}^{t,j}$ is the Euclidean distance between the descriptors of the test image $t$ and the map image $j$, $\vec{D}_{E}^{t} = [D_{E}^{t,1}, \ldots, D_{E}^{t,n_{\mathrm{map}}}]$ is a vector that contains the Euclidean distances between the descriptor of the image $t$ and those of all the images in the map, and $n_{\mathrm{map}}$ is the number of images in the map.
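Both (4) and (5) can be evaluated at once for all map images; a minimal sketch:

import numpy as np

def appearance_distance(d_test, map_descriptors):
    # Euclidean distances (4) between the test descriptor and every map descriptor
    d_E = np.linalg.norm(np.asarray(map_descriptors) - np.asarray(d_test), axis=1)
    return d_E / d_E.max()   # normalization (5)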

It is important to note that the algorithm must be able to estimate the position of the robot with accuracy, but it is also important that the computational cost is adequate, in order to know whether it would be feasible to solve the problem in real time. To estimate the computational cost, we have measured, considering both maps in the experiments, the time necessary to calculate the descriptor of each test image, to compute the distance to the map descriptors, and to detect the most similar descriptor. We must take into account that the descriptors of all the map images can be computed prior to the localization, in an off-line process. Apart from the time, we have also estimated the amount of memory needed to store each image descriptor.

To finish, we also propose to study the relationship distance between two image descriptors versus geometric distance between the capture points of these two images. Ideally, the distance between the descriptors must increase as the geometric distance between capture points does (i.e., it must not present any local minima). This information is very interesting in applications such as map building, where the robot must be able to build a map using only the distance between image descriptors as input information. It is also important when it is necessary to estimate the position of the vehicle at halfway points within the grid map. Additionally, it may help to detect whether the problem of visual aliasing is present in the environment (i.e., two zones which are geometrically far apart may present a similar visual appearance, which might lead to errors in the mapping and localization processes).

5. Experimental Results

As stated in the previous section, with the purpose of establishing the capacity of each descriptor to correctly localize the robot (or vehicle), we have built recall-precision curves to reflect the results of each experiment. Figure 4 shows this graphical representation using (a) the first and (b) the second set of images (denoted as Sets 1 and 2 in previous sections). To build this figure, we consider the localization results in zone 1. This way, the figure shows the ability of the localization algorithm to correctly detect which image of the map was captured closest to the test image. This is the most restrictive case.

Apart from it, the performance of the localization algorithm in zones 2 and 3 has also been studied. This way, Figure 5 shows the results of the localization process in zone 2, using (a) Set 1 and (b) Set 2. Finally, Figure 6 shows the localization results in zone 3, using (a) Set 1 and (b) Set 2. This is the least restrictive case among the three studied.

Figure 4: Results of the localization algorithm considering the correct matches in zone 1, using (a) Set 1 and (b) Set 2. The results of each description method are shown as different recall-precision curves.

Figure 5: Results of the localization algorithm considering the correct matches in zone 2, using (a) Set 1 and (b) Set 2. The results of each description method are shown as different recall-precision curves.

In all cases, the results show that the SURF-Harris descriptor presents a relatively better performance compared to the other descriptors, in terms of accuracy and using both image sets. As far as the methods based on global appearance are concerned, the good behavior of HOG can be highlighted: in the case of the localization in zone 2, it reaches 60% and 50% of precision in Sets 1 and 2, respectively. These results can be considered relatively good, taking into account the fact that the localization process is solved in an absolute way (i.e., we consider that no information about the previous position of the robot is available, and the test image is compared with all the images stored in the data sets). In a real application, it is usual to make use of some kind of probabilistic algorithm to estimate the position of the robot taking into account its previous estimated position, which is expected to provide a higher accuracy. We expect to develop this type of algorithms and tests in a future work.

Some additional conclusions can be reached by comparing the performance of the methods in open areas (Set 1) and urban areas (Set 2). In open areas, the performance of SURF-Harris, HOG, and gist is quite similar and relatively good in all cases, and the methods based on the Discrete Fourier Transform tend to present worse results. However, in the case of urban areas, SURF-Harris outperforms the other methods, and gist is the one that presents the worst results.

Figure 6: Results of the localization algorithm considering the correct matches in zone 3, using (a) Set 1 and (b) Set 2. The results of each description method are shown as different recall-precision curves.

Apart from the localization accuracy, it is also important to study the computational cost of the process, since in a real application it would have to run in real time as the robot navigates through the environment. This way, we have obtained in all cases the time necessary to calculate the descriptor of the test image, on the one hand, and to compare it with the descriptors stored in the map, to detect the most similar descriptor, and to analyze the results, on the other hand. The average computational time of the localization process, after considering all the test images, is shown in Table 1. To obtain the results of this table, the algorithms have been implemented using Matlab.

Table 1: Average computational cost of the description algorithms studied, per test image. For each description method and data set, the table shows first the time necessary to obtain the test image descriptor and second the time to compare it with the descriptors of the map and to obtain the final localization result.

                          2D Fourier   Fourier Signature   Gist       HOG        SURF-Harris
Data Set 1, Descriptor    0.0087 s     0.0080 s            0.4886 s   0.0608 s   0.5542 s
Data Set 1, Match         0.0015 s     0.0058 s            0.0006 s   0.0008 s   25.8085 s
Data Set 2, Descriptor    0.0085 s     0.0079 s            0.4828 s   0.0621 s   0.5389 s
Data Set 2, Match         0.0012 s     0.0047 s            0.0005 s   0.0006 s   19.3931 s

With respect to the computational cost, the methods based on the Fourier Transform are significantly faster than the rest, while SURF-Harris presents a considerably high computational cost. Regarding the time necessary to compare two descriptors, gist and HOG are the fastest methods. In the case of SURF-Harris, the brute-force matching method implemented results in a relatively high computational cost. This method has been chosen to make a homogeneous comparison with the other global-appearance methods. However, in a real implementation, a bag-of-words based approach [23] would improve the computational efficiency of the algorithm.

Finally, we have obtained the average memory size needed to store each descriptor. The results are shown in Table 2. Gist is the most compact descriptor (it is able to compress the information in each scene significantly), while SURF-Harris needs more memory.

Considering these results jointly with the precision in localization, we could say that the SURF-Harris descriptor shows very good results in localization accuracy, but its computational cost makes it unfeasible for a real application. HOG, which is the second in terms of accuracy, also has a very good computational cost, so we consider it interesting to study this descriptor more thoroughly as future work and to implement more advanced versions of this method to try to optimize the accuracy. Likewise, other types of distances to compare images could also be studied, apart from the Euclidean distance. For the same reasons, we also consider it appropriate to examine the gist descriptors more thoroughly, as well as to use other methods to extract the gist of a scene apart from the orientation information (e.g., from the color information).

Table 2: Memory necessary to store each descriptor.

              2D Fourier    Fourier Signature    Gist         HOG          SURF-Harris
Descriptor    16384 bytes   32768 bytes          4096 bytes   8192 bytes   110400 bytes

As a final experiment, we have studied the relationship distance between two image descriptors versus geometric distance between the capture points of these two images. As stated at the beginning of the section, this information is very interesting in some applications, such as the construction of maps from the images with geometric precision, or the localization of the vehicle at halfway points of the grid of the map. It is important that the distance between descriptors grows as the geometric distance does. Figure 7 shows the results obtained using (a) Set 1 and (b) Set 2. To obtain these figures, an image has been set as the reference image, and the distance between the reference image descriptor and the other descriptors has been calculated. The figure shows this distance versus the geometric distance between the capture point of each image and the capture point of the reference image. In both cases, this relationship is monotonically increasing up to a geometric distance of approximately 100 meters; from this point on, it tends to stabilize, with a relatively high variance. The exception is the local features descriptor, which stabilizes at its final value from a very small geometric distance onwards, whereas the appearance-based descriptors exhibit a more linear behavior around each image.

Figure 7: Relationship distance between two image descriptors versus geometric distance between the capture points of these two images, using (a) Set 1 and (b) Set 2.

6. Conclusions and Future Works

In this paper, we have carried out a comparative evaluation of several scene description methods, considering the performance of these methods to accurately solve an absolute localization problem in a large, real outdoor environment. We evaluated two different approaches to visual description: local features descriptors (SURF-Harris) and global-appearance descriptors (2D Discrete Fourier Transform, Fourier Signature, HOG, and gist).

All the tests have been carried out with Google Street View images captured under realistic conditions. Two different areas of a city have been considered: an open area and a purely urban area with narrower streets. The capture points of each area present a different topology: the first one is a grid map that covers several streets and avenues, and the second one is a linear map (i.e., the images were captured while the mobile traversed a linear path along a narrow street).

Several studies have been performed. First, we have evaluated the accuracy of the localization process. To do this, recall and precision curves have been computed to compare the performance of each description method. We plot the recall and precision curves for both areas, taking into account different levels of accuracy when considering a localization result correct. In these experiments, the computational cost of the localization process has also been analyzed.

We have also studied the behavior of each descriptor's distance compared to the geometrical distance between image capture points. To do this, we plot a curve that represents the descriptor distance versus the geometrical distance between capture points. This measure is very useful for navigation tasks since, thanks to it, we can estimate the range of use of the descriptor.

It is noticeable that the SURF-Harris descriptor is the most suitable descriptor in terms of precision in localization, but it presents a smaller working zone in terms of Euclidean distance between descriptors. The HOG descriptor has shown a relatively good performance in solving the localization problem and presents a good response of the descriptor distance versus the geometrical distance between capture points. If we analyze jointly the results of both experiments and take into account the computational cost (Tables 1 and 2), we conclude that, although the SURF-Harris descriptor presents the best results in terms of recall and precision curves, it does not allow us to work in real time. Therefore, taking into account that HOG is the descriptor that presents the second best results in terms of recall and precision curves and does allow us to work in real time, we can conclude that HOG is the most suitable descriptor.

We plan to extend this work to (a) capture a real outdoor trajectory, traveling along several streets and capturing omnidirectional images using a catadioptric vision system; (b) combine the information provided by this vision system and the images of Google Street View; and (c) evaluate the performance of the best descriptors in a probabilistic localization process.

Competing Interests

The authors declare that they have no competing interests.

Acknowledgments

This work has been supported by the Spanish Government through the project DPI 2013-41557-P, Navegación de Robots en Entornos Dinámicos Mediante Mapas Compactos con Información Visual de Apariencia Global, and by the Generalitat Valenciana through the projects AICO/2015/021, Localización y Creación de Mapas Visuales para Navegación de Robots con 6 GDL, and GV/2015/031, Creación de Mapas Topológicos a Partir de la Apariencia Global de un Conjunto de Escenas.

References

[1] S. Thrun, D. Fox, W. Burgard, and F. Dellaert, "Robust Monte Carlo localization for mobile robots," Artificial Intelligence, vol. 128, no. 1-2, pp. 99–141, 2001.

[2] A. Gil, O. Reinoso, M. A. Vicente, C. Fernández, and L. Payá, "Monte Carlo localization using SIFT features," in Pattern Recognition and Image Analysis, vol. 3522 of Lecture Notes in Computer Science, pp. 623–630, 2005.

[3] A. Murillo, J. Guerrero, and C. Sagüés, "SURF features for efficient robot localization with omnidirectional images," in Proceedings of the IEEE International Conference on Robotics and Automation, San Diego, Calif, USA, 2007.

[4] H. Bay, T. Tuytelaars, and L. Van Gool, "SURF: speeded up robust features," in Proceedings of the 9th European Conference on Computer Vision, Graz, Austria, 2006.

[5] F. Rossi, A. Ranganathan, F. Dellaert, and E. Menegatti, "Toward topological localization with spherical Fourier transform and uncalibrated camera," in Proceedings of the International Conference on Simulation, Modeling and Programming for Autonomous Robots (SIMPAR '08), pp. 319–330, Venice, Italy, 2008.

[6] L. Payá, O. Reinoso, F. Amorós, L. Fernández, and A. Gil, "Probabilistic map building, localization and navigation of a team of mobile robots: application to route following," in Multi-Robot Systems: Trends and Development, pp. 191–210, 2011.

[7] L. Fernández, L. Payá, D. Valiente, A. Gil, and O. Reinoso, "Monte Carlo localization using the global appearance of omnidirectional images: algorithm optimization to large indoor environments," in Proceedings of the 9th International Conference on Informatics in Control, Automation and Robotics (ICINCO '12), pp. 439–442, Rome, Italy, July 2012.

[8] M. Montemerlo, FastSLAM: a factored solution to the simultaneous localization and mapping problem with unknown data association [Ph.D. thesis], Robotics Institute, Carnegie Mellon University, Pittsburgh, Pa, USA, 2003.

[9] C. Gamallo, P. Quintía, R. Iglesias-Rodríguez, J. V. Lorenzo, and C. V. Regueiro, "Combination of a low cost GPS with visual localization based on a previous map for outdoor navigation," in Proceedings of the 11th International Conference on Intelligent Systems Design and Applications (ISDA '11), pp. 1146–1151, Córdoba, Spain, November 2011.

[10] A. Torii, J. Sivic, and T. Pajdla, "Visual localization by linear combination of image descriptors," in Proceedings of the IEEE International Conference on Computer Vision Workshops (ICCV Workshops '11), pp. 102–109, Barcelona, Spain, November 2011.

[11] M. Aly and J.-Y. Bouguet, "Street view goes indoors: automatic pose estimation from uncalibrated unordered spherical panoramas," in Proceedings of the IEEE Workshop on the Applications of Computer Vision (WACV '12), pp. 1–8, Breckenridge, Colo, USA, January 2012.

[12] A. Taneja, L. Ballan, and M. Pollefeys, "Registration of spherical panoramic images with cadastral 3D models," in Proceedings of the International Conference on 3D Imaging, Modeling, Processing, Visualization and Transmission (3DIMPVT '12), pp. 479–486, Zurich, Switzerland, 2012.

[13] F. Amorós, L. Payá, O. Reinoso, and L. Jiménez, "Comparison of global-appearance techniques applied to visual map building and localization," in Proceedings of the International Conference on Computer Vision Theory and Applications, pp. 395–398, Rome, Italy, 2012.

[14] A. Gil, O. M. Mozos, M. Ballesta, and O. Reinoso, "A comparative evaluation of interest point detectors and local descriptors for visual SLAM," Machine Vision and Applications, vol. 21, no. 6, pp. 905–920, 2010.

[15] L. Payá, L. Fernández, O. Reinoso, A. Gil, and D. Úbeda, "Appearance-based dense maps creation: comparison of compression techniques with panoramic images," in Proceedings of the International Conference on Informatics in Control, Automation and Robotics (INSTICC '09), pp. 238–246, Milan, Italy, 2009.

[16] C. Harris and M. Stephens, "A combined corner and edge detector," in Proceedings of the Alvey Vision Conference, pp. 231–236, Manchester, UK, 1988.

[17] E. Menegatti, T. Maeda, and H. Ishiguro, "Image-based memory for robot navigation using properties of omnidirectional images," Robotics and Autonomous Systems, vol. 47, no. 4, pp. 251–267, 2004.

[18] L. Payá, L. Fernández, A. Gil, and O. Reinoso, "Map building and Monte Carlo localization using global appearance of omnidirectional images," Sensors, vol. 10, no. 12, pp. 11468–11497, 2010.

[19] A. Friedman, "Framing pictures: the role of knowledge in automatized encoding and memory for gist," Journal of Experimental Psychology: General, vol. 108, no. 3, pp. 316–355, 1979.

[20] A. Torralba, "Contextual priming for object detection," International Journal of Computer Vision, vol. 53, no. 2, pp. 169–191, 2003.

[21] N. Dalal and B. Triggs, "Histograms of oriented gradients for human detection," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '05), pp. 886–893, Washington, DC, USA, June 2005.

[22] F. Amorós, L. Payá, O. Reinoso, L. Fernández, and J. Marín, "Visual map building and localization with an appearance-based approach: comparisons of techniques to extract information of panoramic images," in Proceedings of the 7th International Conference on Informatics in Control, Automation and Robotics, pp. 423–426, 2010.

[23] D. Filliat, "A visual bag of words method for interactive qualitative localization and mapping," in Proceedings of the IEEE International Conference on Robotics and Automation (ICRA '07), pp. 3921–3926, Roma, Italy, April 2007.

International Journal of

AerospaceEngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Active and Passive Electronic Components

Control Scienceand Engineering

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

RotatingMachinery

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

Journal ofEngineeringVolume 2014

Submit your manuscripts athttpwwwhindawicom

VLSI Design

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Shock and Vibration

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawi Publishing Corporation httpwwwhindawicom

Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

SensorsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Navigation and Observation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

DistributedSensor Networks

International Journal of

Page 3: Research Article A Study of Visual Descriptors for Outdoor ...downloads.hindawi.com/journals/js/2016/1537891.pdf · Research Article A Study of Visual Descriptors for Outdoor Navigation

Journal of Sensors 3

Fourier Signature (FS) gist and the Histogram of OrientedGradients (HOG) Each method uses a different mechanismto express the global information of the scene First DFTand FS are based on the analysis in the frequency domainin two dimensions and one dimension respectively Secondthe approach of gist we use is built from edges informationobtained through Gabor filtering and analyzed in severalscales Finally HOG gathers systematic information from theorientation of the gradient in local areas of the image Thechoice of these description methods will permit analyzingthe influence of each kind of information in the localizationprocess

The initial objective of this study was to compare someglobal-appearance methods However we have decided toinclude in this comparative evaluation a local features descri-ption method to make a more complete study With this aimwe have chosen SURF due to its relatively low computa-tional cost comparing with other classical feature-basedapproaches

The next subsections present briefly the descriptionmethods included in the comparative evaluation

21 SURF and Harris Corner Detector The Speeded UpRobust Features (SURF) were introduced by Bay et al [4]This study showed that SURF outperform existing methodswith respect to repeatability robustness and distinctivenessof the descriptorsThe detectionmethod uses integral imagesto reduce the computational time and is based on the Hessianmatrix On the other hand the descriptor represents adistribution of Haar-wavelet responses within the interestpoint neighborhood and makes an efficient use of integralimages In this work we only include the standard SURFdescriptor which has a dimension of 64 components perlandmark but there are two more versions the extendedversion (E-SURF) with 128 elements and the upright version(U-SURF) that is not invariant to rotation and has a lengthof 64 elements On the other hand we perform the detectionof the features using the Harris corner detector (based on theeigenvalues of the second moment matrix [16]) because ourexperiments showed that this method extracted most robustpoints in outdoor images comparing to the SURF extractionmethod

This way themethod we use in this work is a combinationof these two algorithms More specifically the Harris cornerdetector is used to extract the features from the image and thestandard SURFdescriptor is used to characterize anddescribeeach one of the landmarks previously detected

22 Two-Dimensional Discrete Fourier Transform From animage119891(119909 119910)with119873119909 rows and119873119910 columns the 2DDiscreteFourier Transform (DFT) can be defined as follows119865 [119891 (119909 119910)] = 119865 (119906 V)

= 1119873119910119873119909 119873119909minus1sum119909=0119873119910minus1sum119910=0

119891 (119909 119910) sdot 119890minus2120587119895(119906119909119873119909+V119910119873119910)119906 = 0 119873119909 minus 1 V = 0 119873119910 minus 1

(1)

where (119906 V) are the frequency variables and the transformedfunction 119865(119906 V) is a complex function which can be decom-posed into a magnitudes matrix and an arguments matrixThis transformation presents some interesting propertieswhich are helpful in robot localization tasks First the mostrelevant information in the Fourier domain concentrates inthe low frequency components so it is possible to reducethe amount of memory and to optimize the computationalcost by retaining only the first 119896119909 rows and 119896119910 columns inthe transform Second when 119891(119909 119910) is a panoramic scene atranslation in the rows andor columns of the original imageproduces a change only in the arguments matrix [15] Thisway the magnitudes matrix contains information which isinvariant to rotations of the robot in the groundplane and thearguments matrix contains information that can be useful toestimate the orientation of the robot in this plane with respectto a reference image (using the DFT shift theorem)

Taking these facts into account, the global description of the image f(x, y) consists of the magnitudes matrix A(u, v) and the arguments matrix Φ(u, v) of its two-dimensional DFT. The dimensions of both matrices are k_x < N_x rows and k_y < N_y columns. On the one hand, A(u, v) is useful to estimate the robot position; on the other hand, the information in Φ(u, v) can be used to estimate the robot orientation.
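A minimal NumPy sketch of this descriptor follows (the experiments in this paper were implemented in Matlab, so this is only an illustrative transcription); k_x and k_y are the free truncation parameters.

```python
import numpy as np

def dft2_descriptor(img, kx, ky):
    # 2D DFT with the 1/(Nx*Ny) normalization of Eq. (1)
    F = np.fft.fft2(img) / img.size
    A = np.abs(F[:kx, :ky])      # magnitudes: invariant to ground-plane rotation
    Phi = np.angle(F[:kx, :ky])  # arguments: carry the orientation information
    return A, Phi
```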

2.3. Fourier Signature. The third image description method used in this comparative analysis is the Fourier Signature (FS), described initially by Menegatti et al. [17]. From an image f(x, y) with N_x rows and N_y columns, the FS consists in obtaining the one-dimensional DFT of each row. This method presents some advantages, such as its simplicity, its low computational cost, and the fact that it better exploits the invariance against rotations of the robot in the ground plane when we work with panoramic views.

More specifically, the process to compute the FS consists in transforming each row x of the original panoramic image, f_x = {f_{x,0}, f_{x,1}, ..., f_{x,N_y-1}}, x = 0, ..., N_x - 1, into the sequence of complex numbers F_x = {F_{x,0}, F_{x,1}, ..., F_{x,N_y-1}}, x = 0, ..., N_x - 1, according to the 1D-DFT expression

\[
F_{x,k} = \sum_{n=0}^{N_y - 1} f_{x,n}\, e^{-j (2\pi / N_y) k n}, \quad k = 0, \ldots, N_y - 1, \; x = 0, \ldots, N_x - 1. \tag{2}
\]

The result is a complex matrix F(x, v), where v is a frequency variable, which can be decomposed into a magnitudes matrix and an arguments matrix.

Thanks to the 1D-DFT properties, it is possible to represent each row of F(x, v) with its first coefficients, since the most relevant information is concentrated in the low-frequency components of each row of the descriptor; it is thus possible to reduce the amount of memory by retaining only the first k_y columns of the signature F(x, v). Also, when f(x, y) is a panoramic scene, the magnitudes matrix is invariant against robot rotations in the ground plane, and the arguments matrix permits estimating the change in the robot orientation using the DFT shift theorem [15, 17, 18].


Taking these facts into account, the global description of the image f(x, y) consists of the magnitudes matrix A(x, v) and the arguments matrix Φ(x, v) of the Fourier Signature. The dimensions of both matrices are N_x rows and k_y < N_y columns. First, the position of the robot can be estimated using the information in A(x, v), since it is invariant to changes in the robot orientation; second, Φ(x, v) can be used to estimate the robot orientation.
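The FS can be sketched in a few lines of NumPy, applying Eq. (2) row by row and truncating to the first k_y columns:

```python
import numpy as np

def fourier_signature(img, ky):
    # 1D DFT of each row of the panoramic image (Eq. (2))
    F = np.fft.fft(img, axis=1)
    A = np.abs(F[:, :ky])        # magnitudes: rotation-invariant part
    Phi = np.angle(F[:, :ky])    # arguments: used with the shift theorem
    return A, Phi
```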

2.4. Gist. The concept of the gist of an image can be defined as an abstract representation that activates the memory of scene categories [19]. Gist-based descriptors try to represent the image by obtaining its essential information, simulating the human perception system and its ability to recognize a scene through the identification of color, saliency, or remarkable structures. Torralba [20] presents a model to obtain global scene features, working at several spatial frequencies and using different scales, based on Gabor filtering; these features are used in a scene recognition and classification task. In previous works [13] we employed a gist-Gabor descriptor in order to obtain frequency and orientation information. Due to the good results obtained in indoor environments when the mobile robot presents 3 DOF (degrees of freedom) movements on the ground plane, the fourth method employed in the comparative analysis presented in this paper is the gist descriptor of panoramic images.

The method starts with two versions of the initial panoramic image f(x, y): the original one, with N_x rows and N_y columns, and a new version obtained after applying a Gaussian low-pass filter and subsampling to a new size equal to 0.5·N_x × 0.5·N_y. After that, both images are filtered with a bank of n_f Gabor filters whose orientations are evenly distributed to cover the whole circle. Then, to reduce the amount of information, the pixels of both images are grouped into k_1 horizontal blocks per image, whose width is equal to N_y in the first image and 0.5·N_y in the second one. The average value of the pixels in each group is calculated, and all this information is arranged into a final descriptor, g, which is a column vector with 2·k_1·n_f components. This descriptor is invariant against rotations of the vehicle on the ground plane. More information about the method can be found in [13].
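The following sketch illustrates the procedure with OpenCV; the Gabor kernel parameters (size, sigma, wavelength, aspect ratio) and the smoothing kernel are assumptions for illustration, as the paper does not fix them here.

```python
import cv2
import numpy as np

def gist_descriptor(img, nf=8, k1=16):
    # Two scales: the original image and a low-pass filtered, subsampled copy
    small = cv2.resize(cv2.GaussianBlur(img, (5, 5), 1.0), None, fx=0.5, fy=0.5)
    desc = []
    for scale in (img, small):
        scale = scale.astype(np.float64)
        for i in range(nf):
            # Gabor orientations evenly distributed (theta and theta + pi
            # give equivalent filters, so pi/nf spacing covers the circle)
            theta = i * np.pi / nf
            kern = cv2.getGaborKernel((15, 15), 4.0, theta, 10.0, 0.5)
            resp = np.abs(cv2.filter2D(scale, -1, kern))
            # k1 full-width horizontal blocks, each summarized by its mean
            desc.extend(b.mean() for b in np.array_split(resp, k1, axis=0))
    return np.asarray(desc)  # 2 * k1 * nf components
```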

2.5. Histogram of Oriented Gradients. The Histogram of Oriented Gradients (HOG) descriptors are based on the orientation of the gradient in local areas of an image. The method was described initially by Dalal and Triggs [21]. More concisely, it consists, first, in obtaining the magnitude and orientation of the gradient at each pixel of the original image. The image is then divided into a set of cells, and a histogram of gradient orientations is compiled for each cell, aggregating the gradient orientation of each pixel within the cell, weighted by its gradient magnitude.

The omnidirectional images captured from a specific position on the ground plane contain the same pixels in each row independently of the orientation of the robot in this plane, only in a different order. Taking this fact into account, if we compile the histograms over cells that have the same width as the original image, we obtain a descriptor which is invariant against rotations of the robot.

The method we use is described in depth in [22] and can be summarized as follows. The initial panoramic image f(x, y), with N_x rows and N_y columns, is first filtered to obtain two images with the information of the horizontal and vertical edges, f_x(x, y) and f_y(x, y). From these two images, the magnitude of the gradient and its orientation are obtained pixel by pixel, and the results are stored in the matrices M(x, y) and Θ(x, y), respectively. The matrix Θ(x, y) is then divided into k_2 horizontal cells whose width is equal to N_y. For each cell, an orientation histogram with b bins is compiled. During this process, each pixel in Θ(x, y) is weighted with the magnitude of the corresponding pixel in M(x, y). At the end of the process, the set of histograms constitutes the final descriptor, h, which is a column vector with k_2·b components.
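A compact sketch of this rotation-invariant HOG variant, with illustrative default values for k_2 and b, could look as follows:

```python
import cv2
import numpy as np

def hog_descriptor(img, k2=32, b=8):
    # Horizontal and vertical edge responses, then gradient magnitude
    # M(x, y) and orientation Theta(x, y), pixel by pixel
    gx = cv2.Sobel(img, cv2.CV_64F, 1, 0)
    gy = cv2.Sobel(img, cv2.CV_64F, 0, 1)
    M = np.hypot(gx, gy)
    Theta = np.mod(np.arctan2(gy, gx), np.pi)  # unsigned orientation in [0, pi)
    h = []
    # k2 horizontal cells whose width equals the full image width
    for m_cell, t_cell in zip(np.array_split(M, k2, axis=0),
                              np.array_split(Theta, k2, axis=0)):
        hist, _ = np.histogram(t_cell, bins=b, range=(0.0, np.pi),
                               weights=m_cell)  # magnitude-weighted histogram
        h.extend(hist)
    return np.asarray(h)  # k2 * b components
```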

3. Experimental Setup

The main objective of this work is to carry out an exhaustive evaluation of the performance of the description methods presented in the previous section. All these methods will be included in a localization algorithm, and their performance will be evaluated and compared in terms of both computational cost and localization accuracy. The results of this comparative evaluation will give us an idea of which description method offers the best results in outdoor environments when using Google Street View images.

With this aim, two different regions in the city of Elche (Spain) have been selected, and the Google Maps images of these two areas have been obtained and stored in two data sets. Each one of these data sets will constitute a map and will be used subsequently to estimate the position of the vehicle within the map, by comparing the image captured by the vehicle from an unknown position with the images previously stored in each map.

The main features of the two sets of images are as follows.

Set 1. Set 1 consists of 177 full spherical panoramic images, with resolutions generally up to 3328 × 1664 pixels. Each image covers a field of view of 360 degrees in the ground plane and 180 degrees vertically. Figure 1 shows the GPS position where each image was captured (blue dots) and two examples of the panoramic images after the preprocessing step. This set corresponds to a mesh-topology database that contains images of various streets and open areas. The images cover an area of approximately 700 m × 300 m.

Set 2. Set 2 consists of 144 full spherical panoramic images. The images have been captured along the same street, with a linear topology, covering approximately 1700 m. The appearance of these images is more urban. Figure 2 shows the GPS position where each image was captured (blue dots) and three examples of the panoramic images after the preprocessing step.

Figure 1: Bird's-eye view of the region chosen as map 1, prior to the localization experiment. The blue dots represent the coordinates where the images of Set 1 were captured. Two examples of Google Street View images after the preprocessing step are also shown.

Figure 2: Bird's-eye view of the region chosen as map 2, prior to the localization experiment. The blue dots represent the coordinates where the images of Set 2 were captured. Three examples of Google Street View images after the preprocessing step are also shown.

3.1. Image Preprocessing and Map Creation. Due to the wide vertical field of view of the acquisition system, the sky often occupies a big portion of the Google Street View images. The appearance of this area is very prone to changes when the localization process is carried out at a different time of day with respect to the time when the map was captured. Taking this fact into account, a preprocessing step has been carried out to remove part of the sky in the scenes.

Once part of the sky has been removed from all the scenes, the images are converted to grayscale and their resolution is reduced to 512 × 128 pixels to ensure the computational viability of the algorithms.
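A minimal sketch of this preprocessing chain is shown below; the fraction of sky removed (here 25%) is an assumption for illustration, as the exact crop is not specified.

```python
import cv2

def preprocess(panorama, sky_crop=0.25):
    # Remove the upper part of the panorama (sky), whose appearance
    # changes with the time of day; the crop ratio is an assumption
    h = panorama.shape[0]
    cropped = panorama[int(sky_crop * h):, :]
    gray = cv2.cvtColor(cropped, cv2.COLOR_BGR2GRAY)
    # Reduce the resolution to 512 x 128 pixels (dsize is width x height)
    return cv2.resize(gray, (512, 128))
```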

After that, each image is described using the five description methods presented in Section 2. In the end, one map is available per image set and per description method. Each map is composed of the set of descriptors of each panoramic scene.

3.2. Localization Process. Once the maps are available, in order to evaluate the different visual descriptors introduced in Section 2 to solve the localization problem, we also make use of Google Street View images.

To carry out the localization process, first we choose one of the images of the database (named the test image). At this moment, this image is removed from the map. Second, we compute the descriptor of the test image (using one of the methods presented in Section 2) and obtain the distance between this descriptor and the descriptors of the rest of the images stored in the corresponding map. As a result, the images of the map are arranged from the nearest to the farthest, using the image distance as the arranging criterion.

The result of the localization algorithm is considered correct if the first image it returns was captured at the map point which is geometrically closest to the test-image capture point (the GPS coordinates are used with this aim). We will refer to this case as a correct localization in zone 1. However, since this is a quite challenging and restrictive problem, it is also interesting to know if the first image that the algorithm returns was captured at one of the two geometrically closest points to the test-image capture point (zone 2), or even at one of the three geometrically closest points (zone 3). The first case is the ideal one, but we are also interested in the other cases, as they indicate whether the algorithm is returning an image in the surroundings of the actual position of the test image (i.e., the localization algorithm detects that the robot is in a zone close to its actual position).

This process is repeated for each description method, using all the images of Sets 1 and 2 as test images. In brief, the procedure to test the localization methods previously explained consists of the following steps, for each image and description method:

(1) Extract one image of the set (denoted the test image); this test image is then eliminated from the map.

(2) Calculate the descriptor of the test image.

(3) Calculate the distance between this descriptor and all the map descriptors, which we name the image distance.

(4) Retain the most similar descriptor and study whether it corresponds to an image that was captured in the surroundings of the test-image capture point (zone 1, 2, or 3).

As a result, the following data are retained for an individual test image: the image distance between the test-image descriptor and the most similar map descriptor, D_t, and the localization results in zone 1 (correct or wrong match), m_{t1}, in zone 2, m_{t2}, and in zone 3, m_{t3}. After repeating this process with all the test images, the results consist of four vectors whose dimension is equal to the number of test images. The first vector contains the distances D_t, and the other three, m_1, m_2, and m_3, contain, respectively, the information of correct or incorrect matches in zones 1, 2, and 3.
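The following sketch summarizes this leave-one-out procedure; the lists of descriptors and GPS coordinates and the distance function dist_fn are assumed to be given, and the zone parameter selects zone 1, 2, or 3.

```python
import numpy as np

def localize_all(descriptors, gps, dist_fn, zone=1):
    n = len(descriptors)
    D_t = np.zeros(n)
    m_t = np.zeros(n, dtype=bool)
    for t in range(n):
        # Leave-one-out: the test image is removed from the map
        others = [j for j in range(n) if j != t]
        dists = np.array([dist_fn(descriptors[t], descriptors[j])
                          for j in others])
        best = others[int(np.argmin(dists))]
        D_t[t] = dists.min()
        # Zone-k localization is correct if the retrieved image is among
        # the k map images captured closest to the test capture point
        geo = sorted(others, key=lambda j: np.linalg.norm(gps[j] - gps[t]))
        m_t[t] = best in geo[:zone]
    return D_t, m_t
```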

4. Evaluation Methods

In this work, the localization results are expressed by means of recall and precision curves [14]. To build them, the components of the vectors m_1, m_2, and m_3 are equally sorted in ascending order of the distances that appear in the first vector. The resulting sorted vectors of correct and wrong matches are then used to calculate the values of recall and precision. Let us focus on the sorted vector of matches in zone 1, s_1. First, for each component in this vector, the recall is calculated as the number of correct matches obtained so far with respect to the total number of test images. Second, for each component in the same vector, the precision is obtained as the number of correct matches obtained so far with respect to the number of test images considered so far. Then, with the information contained in these vectors, a precision-versus-recall curve is built, corresponding to the localization in zone 1. This is repeated with the sorted vectors s_2 and s_3 to obtain the localization results in zone 2 and zone 3.

Figure 3: Two sample recall-precision graphs (precision versus recall, both axes from 0 to 1) obtained after carrying out the localization experiments with two different description methods.
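A sketch of this computation, taking the distance vector D_t and the Boolean match vector m_t produced by the previous procedure, is:

```python
import numpy as np

def recall_precision(D_t, m_t):
    # Sort the match results by ascending image distance
    order = np.argsort(D_t)
    correct = np.cumsum(m_t[order].astype(float))
    n = len(m_t)
    recall = correct / n                       # w.r.t. all test images
    precision = correct / np.arange(1, n + 1)  # w.r.t. images seen so far
    return recall, precision
```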

In our experiments, the most important piece of information in this type of graph is the final point, because it shows the global result of the experiment (the final precision after considering all the test images). However, additional relevant information can be extracted from them, because the graph also shows the ability of the localization algorithm to find the correct match when considering a specific image-distance threshold. As explained in the previous paragraph, the results have been arranged in ascending order of distance; consequently, as the recall increases, so does the threshold. For this reason, the evolution of the recall-precision curves contains information about the robustness of the algorithm with respect to a specific image-distance threshold: if the precision values stay high independently of the recall, this shows the presence of a lower number of wrong results under this distance threshold. Figure 3 shows two sample recall-precision curves obtained after running the localization algorithm with all the test images and two different description methods, considering zone 1. Both curves show a similar final precision value, between 0.6 and 0.65. However, their evolutions present a different behavior. As an example, if we consider as threshold the distance associated with recall = 0.25, according to the graph the precision of descriptor 1 is 100% but the precision of descriptor 2 is 90%. This means that, considering the selected image-distance threshold, 25% of correct localizations are achieved with 100% probability using descriptor 1 and with 90% probability using descriptor 2. This study can be carried out considering any value of the image-distance threshold.


Before running the algorithm, it is necessary to define the image distance. We use two different distance measures, depending on the kind of descriptor used.

First, in the case of the feature-based method (SURF-Harris), it is necessary to extract the interest points prior to describing the appearance of the image. We propose using the Harris corner detector [16] to extract visual landmarks from the panoramic images. After that, each interest point is described using standard SURF. To compare the test image with the map images, first we extract and describe the interest points of all the images. After that, a matching process is carried out with these points: the points detected in the test image, captured from a particular position and orientation of the vehicle, are searched for in the map images. The performance of the matching method is not within the scope of this paper; we only employ it as a tool. Once the matching process has been carried out, we evaluate the performance of the descriptor taking into account the number of matched points, so that we consider as the closest image the one that presents the most matching points with the test image. More concisely, we compute the distance between the test image t and any other image of the map, j, as

\[
D_{tj}^{\mathrm{fea}} = 1 - \frac{NM_{tj}}{\max_j \left( \overrightarrow{NM}_t \right)}, \tag{3}
\]

where NM_{tj} is the number of matches between the images t and j, NM_t = [NM_{t1}, ..., NM_{t n_map}] is a vector that contains the number of matches between the image t and every image of the map, and n_map is the number of images in the map.
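In code, Eq. (3) reduces to a one-line normalization of the vector of match counts:

```python
import numpy as np

def feature_distance(NM_t):
    # NM_t[j]: number of SURF matches between test image t and map image j
    NM_t = np.asarray(NM_t, dtype=float)
    return 1.0 - NM_t / NM_t.max()  # Eq. (3); 0 for the best-matching image
```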

Second, in the case of the appearance-based methods (2D DFT, FS, gist, and HOG), no local information needs to be extracted from the images. Instead, the appearance of the whole images is compared. In this way, the global descriptor of the test image is calculated, and the distances between it and the descriptors of the map images are obtained. The Euclidean distance is used in this case, defined as

\[
D_{tj}^{E} = \sqrt{ \sum_{m=1}^{M} \left( d_m^{t} - d_m^{j} \right)^{2} }, \tag{4}
\]

where d^t is the descriptor of the test image t, d^j is the descriptor of the map image j, and M is the size of the descriptors. This distance is normalized to obtain the final distance between the images t and j, according to the following expression:

\[
D_{tj}^{\mathrm{app}} = \frac{D_{tj}^{E}}{\max_j \left( \vec{D}_{E}^{t} \right)}, \tag{5}
\]

where D_{tj}^E is the Euclidean distance between the descriptor of the test image t and that of the map image j, D_E^t = [D_{t1}^E, ..., D_{t n_map}^E] is a vector that contains the Euclidean distances between the descriptor of the image t and those of all the images in the map, and n_map is the number of images in the map.
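Equations (4) and (5) can be sketched together as follows, where d_t is the test-image descriptor and map_descriptors stacks the descriptors of the map:

```python
import numpy as np

def appearance_distance(d_t, map_descriptors):
    # Eq. (4): Euclidean distance to every map descriptor
    E = np.linalg.norm(np.asarray(map_descriptors) - d_t, axis=1)
    # Eq. (5): normalization by the largest distance in the map
    return E / E.max()
```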

It is important to note that the algorithm must be able to estimate the position of the robot accurately, but it is also important to assess whether the computational cost would make it feasible to solve the problem in real time. To estimate the computational cost, we have computed, considering both maps in the experiments, the time necessary to calculate the descriptor of each test image, to compute the distances to the map descriptors, and to detect the most similar descriptor. We must take into account that the descriptors of all the map images can be computed prior to the localization, in an off-line process. Apart from the time, we have also estimated the amount of memory needed to store each image descriptor.

Finally, we also propose to study the relationship between the distance between two image descriptors and the geometric distance between the capture points of these two images. Ideally, the distance between the descriptors should increase as the geometric distance between the capture points does (i.e., it should not present any local minima). This information is very interesting in applications such as map building, where the robot must be able to build a map using only the distance between image descriptors as input information. It is also important when it is necessary to estimate the position of the vehicle at halfway points within the grid map. Additionally, it may help to detect whether the problem of visual aliasing is present in the environment (i.e., two zones which are geometrically far apart may present a similar visual appearance, which might lead to errors in the mapping and localization processes).
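A sketch of this study, returning pairs of geometric and descriptor distances with respect to a chosen reference image, is:

```python
import numpy as np

def distance_profile(descriptors, gps, ref, dist_fn):
    # Pairs (geometric distance, descriptor distance) w.r.t. a reference image
    pairs = [(float(np.linalg.norm(gps[j] - gps[ref])),
              dist_fn(descriptors[ref], descriptors[j]))
             for j in range(len(descriptors)) if j != ref]
    return sorted(pairs)  # ascending geometric distance
```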

5. Experimental Results

As stated in the previous section, with the purpose of establishing the capacity of each descriptor to correctly localize the robot (or vehicle), we have built recall-precision curves to reflect the results of each experiment. Figure 4 shows this graphical representation using (a) the first and (b) the second set of images (denoted as Sets 1 and 2 in previous sections). To build this figure, we consider the localization results in zone 1. In this way, the figure shows the ability of the localization algorithm to correctly detect which image of the map was captured closest to the test image. This is the most restrictive case.

Apart from this, the performance of the localization algorithm in zones 2 and 3 has also been studied. Figure 5 shows the results of the localization process in zone 2 using (a) Set 1 and (b) Set 2. Finally, Figure 6 shows the localization results in zone 3 using (a) Set 1 and (b) Set 2. This is the least restrictive case among the three studied.

In all cases, the results show that the SURF-Harris descriptor presents a relatively better performance compared to the other descriptors in terms of accuracy, using both image sets. As far as the methods based on global appearance are concerned, the good behavior of HOG can be highlighted: in the case of localization in zone 2, it reaches 60% and 50% precision in Sets 1 and 2, respectively. These results can be considered relatively good, taking into account the fact that the localization problem is solved in an absolute way (i.e., we consider that no information about the previous position of the robot is available, and the test image is compared with all the images stored in the data sets).

Figure 4: Results of the localization algorithm considering the correct matches in zone 1, using (a) Set 1 and (b) Set 2. The results of each description method (2D Fourier, Fourier Signature, HOG, gist, and SURF-Harris) are shown as different recall-precision curves.

Figure 5: Results of the localization algorithm considering the correct matches in zone 2, using (a) Set 1 and (b) Set 2. The results of each description method are shown as different recall-precision curves.

In a real application, it is usual to make use of some kind of probabilistic algorithm to estimate the position of the robot taking into account its previously estimated position, which is expected to provide higher accuracy. We expect to develop this type of algorithm and the corresponding tests in a future work.

Some additional conclusions can be reached by comparing the performance of the methods in open areas (Set 1) and urban areas (Set 2). In open areas, the performance of SURF-Harris, HOG, and gist is quite similar and relatively good in all cases, and the methods based on the Discrete Fourier Transform tend to present worse results. However, in the case of urban areas, SURF-Harris outperforms the other methods, and gist is the one that presents the worst results.

Apart from the localization accuracy, it is also important to study the computational cost of the process, since in a real application it would have to run in real time as the robot navigates through the environment. In this way, we have obtained, in all cases, the time necessary to calculate the descriptor of the test image, on the one hand, and to compare it with the descriptors stored in the map, detect the most similar descriptor, and analyze the results, on the other hand.


Figure 6: Results of the localization algorithm considering the correct matches in zone 3, using (a) Set 1 and (b) Set 2. The results of each description method are shown as different recall-precision curves.

Table 1: Average computational cost of the description algorithms studied, per test image. For each description method and data set, the table shows, first, the time necessary to obtain the test-image descriptor and, second, the time to compare it with the descriptors of the map and to obtain the final localization result.

                          2D Fourier   Fourier Signature   Gist       HOG        SURF-Harris
Data Set 1, Descriptor    0.0087 s     0.0080 s            0.4886 s   0.0608 s   0.5542 s
Data Set 1, Match         0.0015 s     0.0058 s            0.0006 s   0.0008 s   25.8085 s
Data Set 2, Descriptor    0.0085 s     0.0079 s            0.4828 s   0.0621 s   0.5389 s
Data Set 2, Match         0.0012 s     0.0047 s            0.0005 s   0.0006 s   19.3931 s

The average computational time of the localization process, after considering all the test images, is shown in Table 1. To obtain the results of this table, the algorithms have been implemented in Matlab.

With respect to the computational cost, the methods based on the Fourier Transform are significantly faster than the rest, while SURF-Harris presents a considerably high computational cost. Regarding the time necessary to compare two descriptors, gist and HOG are the fastest methods. In the case of SURF-Harris, the brute-force matching method implemented results in a relatively high computational cost. This method was chosen to make a homogeneous comparison with the other global-appearance methods; however, in a real implementation, a bag-of-words-based approach [23] would improve the computational efficiency of the algorithm.

Finally, we have obtained the average memory size needed to store each descriptor. The results are shown in Table 2. Gist is the most compact descriptor (it is able to compress the information in each scene significantly), while SURF-Harris needs the most memory.

Considering these results jointly with the precision in localization, we could say that the SURF-Harris descriptor shows very good results in localization accuracy, but its computational cost makes it unfeasible for a real application. HOG, which is second in terms of accuracy, also has a very good computational cost, so we consider it interesting to study this descriptor more thoroughly as future work and to implement more advanced versions of this method to try to optimize the accuracy. Likewise, other types of distances to compare images could also be studied, apart from the Euclidean distance. For the same reasons, we also consider it appropriate to examine the gist descriptors more thoroughly, as well as to use other methods to extract the gist of a scene apart from the orientation information (e.g., from the color information).

As a final experiment, we have studied the relationship between the distance between two image descriptors and the geometric distance between the capture points of these two images. As stated at the beginning of the section, this information is very interesting in some applications, such as the construction of maps from the images with geometric precision or the localization of the vehicle at halfway points of the grid of the map. It is important that the distance between descriptors grows as the geometric distance does. Figure 7 shows the results obtained using (a) Set 1 and (b) Set 2. To obtain these figures, an image has been set as a reference image, and the distance between the reference-image descriptor and the other descriptors has been calculated. The figure shows this distance versus the geometric distance between the capture point of each image and the capture point of the reference image.


Table 2: Memory necessary to store each descriptor.

             2D Fourier    Fourier Signature   Gist         HOG          SURF-Harris
Descriptor   16384 bytes   32768 bytes         4096 bytes   8192 bytes   110400 bytes

Figure 7: Relationship between the distance between two image descriptors and the geometric distance between the capture points of these two images, using (a) Set 1 and (b) Set 2.

In both cases, this relationship is monotonically increasing up to a geometric distance of approximately 100 meters. From this point, it tends to stabilize, with a relatively high variance. The exception is the local-features descriptor, which stabilizes at its final value from a very small geometric distance onwards. The appearance-based descriptors, in contrast, exhibit a more linear behavior around each image.

6. Conclusions and Future Work

In this paper, we have carried out a comparative evaluation of several scene description methods, considering the performance of these methods in accurately solving an absolute localization problem in a large, real outdoor environment. We evaluated two different approaches to visual description: local-features descriptors (SURF-Harris) and global-appearance descriptors (2D Discrete Fourier Transform, Fourier Signature, HOG, and gist).

All the tests have been carried out with Google Street View images captured under realistic conditions. Two different areas of a city have been considered: an open area and a purely urban area with narrower streets. The capture points of each area present a different topology: the first one is a grid map that covers several streets and avenues, and the second one is a linear map (i.e., the images were captured as the vehicle traversed a linear path along a narrow street).

Several studies have been performed. First, we have evaluated the accuracy of the localization process. To do this, recall and precision curves have been computed to compare the performance of each description method. We plot the recall and precision curves for both areas, taking into account different levels of accuracy when considering a localization result correct. In these experiments, the computational cost of the localization process has also been analyzed.

We have also studied each descriptor in terms of the behavior of the descriptor distance compared to the geometric distance between image capture points. To do this, we plot a curve that represents the descriptor distance versus the geometric distance between capture points. This measure is very useful for performing navigation tasks since, thanks to it, we can estimate the working range of the descriptor.

It is noticeable that the SURF-Harris descriptor is the most suitable descriptor in terms of localization precision, but it presents a smaller working zone in terms of Euclidean distance between descriptors. The HOG descriptor has shown a relatively good performance in solving the localization problem and presents a good response of the descriptor distance versus the geometric distance between capture points. If we analyze the results of both experiments jointly and take into account the computational cost (Tables 1 and 2), we conclude that, although the SURF-Harris descriptor presents the best results in terms of recall and precision curves, it does not allow us to work in real time. Therefore, taking into account that HOG is the descriptor that presents the second-best results in terms of recall and precision curves and allows us to work in real time, we can conclude that HOG is the most suitable descriptor.

We plan to extend this work to (a) capture a real outdoor trajectory, traveling along several streets and capturing omnidirectional images using a catadioptric vision system; (b) combine the information provided by this vision system with the Google Street View images; and (c) evaluate the performance of the best descriptors in a probabilistic localization process.

Competing Interests

The authors declare that they have no competing interests.

Acknowledgments

This work has been supported by the Spanish Government through the project DPI 2013-41557-P, Navegacion de Robots en Entornos Dinamicos Mediante Mapas Compactos con Informacion Visual de Apariencia Global, and by the Generalitat Valenciana through the projects AICO/2015/021, Localizacion y Creacion de Mapas Visuales para Navegacion de Robots con 6 GDL, and GV/2015/031, Creacion de Mapas Topologicos a Partir de la Apariencia Global de un Conjunto de Escenas.

References

[1] S. Thrun, D. Fox, W. Burgard, and F. Dellaert, "Robust Monte Carlo localization for mobile robots," Artificial Intelligence, vol. 128, no. 1-2, pp. 99–141, 2001.

[2] A. Gil, O. Reinoso, M. A. Vicente, C. Fernandez, and L. Paya, "Monte Carlo localization using SIFT features," in Pattern Recognition and Image Analysis, vol. 3522 of Lecture Notes in Computer Science, pp. 623–630, 2005.

[3] A. Murillo, J. Guerrero, and C. Sagues, "SURF features for efficient robot localization with omnidirectional images," in Proceedings of the IEEE International Conference on Robotics and Automation, San Diego, Calif, USA, 2007.

[4] H. Bay, T. Tuytelaars, and L. Van Gool, "SURF: speeded up robust features," in Proceedings of the 9th European Conference on Computer Vision, Graz, Austria, 2006.

[5] F. Rossi, A. Ranganathan, F. Dellaert, and E. Menegatti, "Toward topological localization with spherical Fourier transform and uncalibrated camera," in Proceedings of the International Conference on Simulation, Modeling and Programming for Autonomous Robots (SIMPAR '08), pp. 319–330, Venice, Italy, 2008.

[6] L. Paya, O. Reinoso, F. Amoros, L. Fernandez, and A. Gil, "Probabilistic map building, localization and navigation of a team of mobile robots: application to route following," in Multi-Robot Systems, Trends and Development, pp. 191–210, 2011.

[7] L. Fernandez, L. Paya, D. Valiente, A. Gil, and O. Reinoso, "Monte Carlo localization using the global appearance of omnidirectional images: algorithm optimization to large indoor environments," in Proceedings of the 9th International Conference on Informatics in Control, Automation and Robotics (ICINCO '12), pp. 439–442, Rome, Italy, July 2012.

[8] M. Montemerlo, FastSLAM: a factored solution to the simultaneous localization and mapping problem with unknown data association [Ph.D. thesis], Robotics Institute, Carnegie Mellon University, Pittsburgh, Pa, USA, 2003.

[9] C. Gamallo, P. Quintia, R. Iglesias-Rodriguez, J. V. Lorenzo, and C. V. Regueiro, "Combination of a low cost GPS with visual localization based on a previous map for outdoor navigation," in Proceedings of the 11th International Conference on Intelligent Systems Design and Applications (ISDA '11), pp. 1146–1151, Cordoba, Spain, November 2011.

[10] A. Torii, J. Sivic, and T. Pajdla, "Visual localization by linear combination of image descriptors," in Proceedings of the IEEE International Conference on Computer Vision Workshops (ICCV Workshops '11), pp. 102–109, Barcelona, Spain, November 2011.

[11] M. Aly and J.-Y. Bouguet, "Street view goes indoors: automatic pose estimation from uncalibrated unordered spherical panoramas," in Proceedings of the IEEE Workshop on the Applications of Computer Vision (WACV '12), pp. 1–8, Breckenridge, Colo, USA, January 2012.

[12] A. Taneja, L. Ballan, and M. Pollefeys, "Registration of spherical panoramic images with cadastral 3D models," in Proceedings of the International Conference on 3D Imaging, Modeling, Processing, Visualization and Transmission (3DIMPVT '12), pp. 479–486, Zurich, Switzerland, 2012.

[13] F. Amoros, L. Paya, O. Reinoso, and L. Jimenez, "Comparison of global-appearance techniques applied to visual map building and localization," in Proceedings of the International Conference on Computer Vision Theory and Applications, pp. 395–398, Rome, Italy, 2012.

[14] A. Gil, O. M. Mozos, M. Ballesta, and O. Reinoso, "A comparative evaluation of interest point detectors and local descriptors for visual SLAM," Machine Vision and Applications, vol. 21, no. 6, pp. 905–920, 2010.

[15] L. Paya, L. Fernandez, O. Reinoso, A. Gil, and D. Ubeda, "Appearance-based dense maps creation: comparison of compression techniques with panoramic images," in Proceedings of the International Conference on Informatics in Control, Automation and Robotics (INSTICC '09), pp. 238–246, Milan, Italy, 2009.

[16] C. Harris and M. Stephens, "A combined corner and edge detector," in Proceedings of the Alvey Vision Conference, pp. 231–236, Manchester, UK, 1988.

[17] E. Menegatti, T. Maeda, and H. Ishiguro, "Image-based memory for robot navigation using properties of omnidirectional images," Robotics and Autonomous Systems, vol. 47, no. 4, pp. 251–267, 2004.

[18] L. Paya, L. Fernandez, A. Gil, and O. Reinoso, "Map building and Monte Carlo localization using global appearance of omnidirectional images," Sensors, vol. 10, no. 12, pp. 11468–11497, 2010.

[19] A. Friedman, "Framing pictures: the role of knowledge in automatized encoding and memory for gist," Journal of Experimental Psychology: General, vol. 108, no. 3, pp. 316–355, 1979.

[20] A. Torralba, "Contextual priming for object detection," International Journal of Computer Vision, vol. 53, no. 2, pp. 169–191, 2003.

[21] N. Dalal and B. Triggs, "Histograms of oriented gradients for human detection," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '05), pp. 886–893, Washington, DC, USA, June 2005.

[22] F. Amoros, L. Paya, O. Reinoso, L. Fernandez, and J. Marin, "Visual map building and localization with an appearance-based approach: comparisons of techniques to extract information of panoramic images," in Proceedings of the 7th International Conference on Informatics in Control, Automation and Robotics, pp. 423–426, 2010.

[23] D. Filliat, "A visual bag of words method for interactive qualitative localization and mapping," in Proceedings of the IEEE International Conference on Robotics and Automation (ICRA '07), pp. 3921–3926, Rome, Italy, April 2007.

International Journal of

AerospaceEngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Active and Passive Electronic Components

Control Scienceand Engineering

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

RotatingMachinery

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

Journal ofEngineeringVolume 2014

Submit your manuscripts athttpwwwhindawicom

VLSI Design

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Shock and Vibration

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawi Publishing Corporation httpwwwhindawicom

Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

SensorsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Navigation and Observation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

DistributedSensor Networks

International Journal of

Page 4: Research Article A Study of Visual Descriptors for Outdoor ...downloads.hindawi.com/journals/js/2016/1537891.pdf · Research Article A Study of Visual Descriptors for Outdoor Navigation

4 Journal of Sensors

Taking these facts into account the global description ofthe image 119891(119909 119910) consists of the magnitudes matrix 119860(119909 V)and the arguments matrix Φ(119909 V) of the Fourier SignatureThe dimensions of both matrices are 119873119909 rows and 119896119910 lt 119873119910columns First the position of the robot can be estimatedusing the information in 119860(119909 V) since it is invariant tochanges in robot orientation and second Φ(119909 V) can be usedto estimate the robot orientation

24 Gist Theconcept of the gist of an image can be defined asan abstract representation that activates the memory of scenecategories [19]The gist-based descriptors try to represent theimage by obtaining its essential information simulating thehuman perception system and its ability to recognize a scenethrough the identification of color saliency or remarkablestructures Torralba [20] presents a model to obtain globalscene features working in several spatial frequencies andusing different scales based onGabor filteringThey use thesefeatures in a scene recognition and classification task Inprevious works [13] we employed a gist-Gabor descriptor inorder to obtain frequency and orientation information Dueto the good results obtained in indoor environments whenthemobile robot presents 3 DOF (degrees of freedom)move-ments on the ground plane the fourth method employed inthe comparative analysis presented in this paper is the gistdescriptor of panoramic images

The method starts with two versions of the initialpanoramic image 119891(119909 119910) the original one with119873119909 rows and119873119910 columns and a new version after applying a Gaussianlow-pass filter and subsampling to a new size equal to 05 sdot119873119909 times 05 sdot 119873119910 After that both images are filtered with a bankof 119899119891 Gabor filters whose orientations are evenly distributedto cover the whole circle Then to reduce the amount ofinformation the pixels into both images are grouped into 1198961horizontal blocks per imagewhosewidth is equal to119873119910 in thefirst image and 05sdot119873119910 in the second oneThe average value ofthe pixels in each group is calculated and all this informationis arranged into a final descriptor which is a column vector with 2 sdot 1198961 sdot 119899119891 components This descriptor is invariantagainst rotations of the vehicle on the ground plane Moreinformation about the method can be found in [13]

25 Histogram of Oriented Gradients The Histogram ofOriented Gradients (HOG) descriptors are based on theorientation of the gradient in local areas of an image It wasdescribed initially by Dalal and Triggs [21] More conciselyit consists first in obtaining the magnitude and orientationof the gradient of each pixel of the original image Thisimage is divided then into a set of cells and a histogram ofgradient orientation is compiled for each cell aggregating theinformation of the gradient orientation of each pixel withinthe cell weighting with the magnitude of the pixel

The omnidirectional images captured from a specificposition of the ground plane contain the same pixels in a rowindependently on the orientation of the robot in this planebut in a different order Taking this fact into account if wecalculate the histogram of cells that have the same width of

the original image we obtain a descriptor which is invariantagainst rotations of the robot

The method we use is described in depth in [22] andcan be summarized as follows The initial panoramic image119891(119909 119910) with 119873119909 rows and 119873119910 columns is first filtered toobtain two images with the information of the horizontal andvertical edges 119891119909(119909 119910) and 119891119910(119909 119910) From these two imagesthe magnitude of the gradient and its orientation is obtainedpixel by pixel and the results are stored in matrices119872(119909 119910)and Θ(119909 119910) respectively Matrix Θ(119909 119910) is then divided into1198962 horizontal cells whose width is equal to119873119910 For each cellan orientation histogramwith 119887 bins is compiled During thisprocess each pixel inΘ(119909 119910) is weighted with the magnitudeof the corresponding pixel in 119872(119909 119910) At the end of theprocess the set of histograms constitutes the final descriptorℎ which is a column vector with 1198962 sdot 119887 components

3 Experiments Setup

The main objective of this work consists in carrying out anexhaustive evaluation of the performance of the descriptionmethods presented in the previous section All thesemethodswill be included in a localization algorithm and their per-formance will be evaluated and compared both in terms ofcomputational cost and localization accuracy The results ofthis comparative evaluation will give us an idea of which isthe description method that offers the best results in outdoorenvironments when using Google Street View images

With this aim two different regions in the city of Elche(Spain) have been selected and the Google Maps images ofthese two areas have been obtained and stored in two datasets Each one of these data sets will constitute a map andwill be used subsequently to estimate the position of thevehicle within the map by comparing the image capturedby the vehicle from the unknown position with the imagespreviously stored in each map

Themain features of the two sets of images are as follows

Set 1 Set 1 consists of 177 full spherical panorama imageswithresolution generally up to 3328 times 1664 pixels Each imagecovers a field of view of 360 degrees in the ground plane and180 degrees vertically Figure 1 shows the GPS position whereeach image was captured (blue dots) and two examples ofthe panoramic images after a preprocessing process This setcorresponds with a mesh topography database that containsimages of various streets and open areasThe images cover anarea of approximately 700m times 300m

Set 2 Set 2 consists of 144 full spherical panorama imagesThe images have been captured along the same street with alinear topology covering approximately 1700m The appear-ance of these images is more urban Figure 2 shows the GPSposition where each image was captured (blue dots) andthree examples of the panoramic images after a preprocessingprocess

31 Image Preprocessing and Map Creation Due to the widevertical field of view of the acquisition system the sky is

Journal of Sensors 5

Figure 1 Bird eyersquos view of the region chosen as map 1 prior to the localization experiment The blue dots represent the coordinates wherethe images of the Set 1 were captured Two examples of Google Street View images after a preprocessing step are also shown

Figure 2 Bird eyersquos view of the region chosen as map 2 prior to the localization experiment The blue dots represent the coordinates wherethe images of Set 2 were captured Three examples of Google Street View images after a preprocessing step are also shown

often a big portion of the Google Street View images Theappearance of this area will be very prone to changes whenthe localization process is carried out in a different time of daywith respect to the time of day when the map was capturedTaking this fact into account a preprocessing step has beencarried out to remove part of the sky in the scenes

Once part of the sky has been removed fromall the scenesthe images are converted into grayscale and their resolutionis reduced to 512 times 128 pixels to ensure the computationalviability of the algorithms

After that each image will be described using the fivedescription methods presented in Section 2 At the endone map will be available per image set and per description

method Each map will be composed of the set of descriptorsof each panoramic scene

32 Localization Process Once the maps are available inorder to evaluate the different visual descriptors introducedin Section 2 to solve the localization problem we also makeuse of Google Street View images

To carry out the localization process first we choose oneof the images of the database (named as test image) In thismoment this image is removed from the map Second wecompute the descriptor of the test image (using one of themethods presented in Section 2) and obtain the distancebetween this descriptor and the descriptors of the rest of

6 Journal of Sensors

the images stored in the corresponding map As a resultthe images of the map are arranged from the nearest to thefarthest using the image distance as arranging criterion

The result of the localization algorithm is consideredcorrect if the first image it returns was captured on the mappoint which is geometrically the closest to the test imagecapture point (the GPS coordinates are used with this aim)We will refer to this case as a correct localization in zone 1However since this is a quite challenging and restrictive prob-lem it is also interesting to know if the first image that thealgorithm returns was captured on one of the two geometri-cally closest points to the test image capture point (zone 2) oreven on one of the three geometrically closest points (zone 3)The first case is the ideal one but we are also interested in theother cases as they will indicate if the algorithm is returningan image in the surroundings of the actual position of the testimage (ie the localization algorithm detects that the robot isin a zone close around its actual position)

This process is repeated for each description methodusing all the images of Sets 1 and 2 as test images In briefthe procedure to test the localization methods previouslyexplained consists in the following steps for each image anddescription method

(1) Extracting one image of the set (denoted as testimage) then this test image is eliminated from themap

(2) Calculating the descriptor of the test image(3) Calculating the distance between this descriptor and

all the map descriptors which we named imagedistance

(4) Retaining the most similar descriptor and studying ifit corresponds to one image that has been captured inthe surroundings of the test image capture point (zone1 2 or 3)

As a result the next data are retained for an individual testimage the image distance between the test image descriptorand the most similar map descriptor119863119905 and the localizationresults in zone 1 (correct or wrong match) 1198981199051 in zone 2119898t2 and in zone 3 1198981199053 After repeating this process with all

the test images the results will consist of four vectors whosedimension is equal to the number of test images The firstvector contains the distances119863119905 and the other three 12 and 3 contain respectively the information of corrector incorrect matches in zones 1 2 and 3

4 Evaluation Methods

In this work the localization results are expressed by meansof recall and precision curves [14] To build them the com-ponents of the vectors 1 2 and 3 are equally sorted inascending order of the distances that appear in the first vectorThe resulting sorted vectors of correct andwrongmatches arethen used to calculate the values of recall and precision Letus focus on the sorted vector of matches in zone 1 1199041 Firstfor each component in this vector the recall is calculated asthe number of correct matches obtained so far with respect to

0 02 04 06 08 10

02

04

06

08

1

Recall

Prec

ision

Descriptor 1Descriptor 2

Figure 3 Two sample recall-precision graphs obtained after carryingout the localization experiments with two different descriptionmethods

the total number of test images Second for each componentin the same vector the precision is obtained as the number ofcorrect matches obtained so far with respect to the numberof test images considered so far Then with the informationcontained in these vectors a precision versus recall curve isbuilt corresponding to the localization in zone 1 This isrepeated with the sorted vectors 1199042 and 1199043 to obtain thelocalization results in zone 2 and zone 3

In our experiments the most important piece of infor-mation of this type of graphs is the final point because itshows the global result of the experiment (final precision afterconsidering all the test images) However additional relevantinformation can be extracted from them because the graphalso shows the ability of the localization algorithm to findthe correct match while considering a specific image distancethreshold As explained in the previous paragraph the resultshave been arranged considering the ascending value of dis-tances Taking it into account as the recall increases thethreshold also does For this reason the evolution of therecall-precision curves contains information about the robust-ness of the algorithm with respect to a specific image distancethreshold If the precision values stay high independentlyof the recall this shows the presence of a lower number ofwrong results under this distance threshold Figure 3 showstwo sample recall-precision curves obtained after running thelocalization algorithm with all the test images and two dif-ferent description methods considering zone 1 Both curvesshow a similar final precision value between 06 and 065However the evolutions present a different behavior As anexample if we consider as threshold the distance associatedto recall = 025 according to the graph the precision ofdescriptor 1 is 100 but the precision of descriptor 2 is90Thismeans that considering the selected image distancethreshold 25 of correct localizations are achieved with100 of probability using descriptor 1 and with a 90 ofprobability using descriptor 2 This study can be carried outconsidering any value for the image distance threshold

Journal of Sensors 7

Before running the algorithm it is necessary to definethe image distance We use two different distance measuresdepending on the kind of descriptor used

First in the case of the feature-based method (SURF-Harris) it is necessary to extract the interest points prior todescribing the appearance of the image We propose usingthe Harris corner detector [16] to extract visual landmarksfrom the panoramic images After that each interest point isdescribed using standard SURF To compare the test imagewith themap images first we extract and describe the interestpoints from all the images After that a matching process iscarried out with these points The points detected in the testimage captured with a particular position and orientation ofthe vehicle are searched in themap imagesThe performanceof the matching method is not the scope of this paper weonly employ it as a tool Once the matching process has beencarried out we evaluate the performance of the descriptortaking into account the number of matched points so thatwe will consider as closest image the one that presents morematching points with the test image More concisely wecompute the distance between the test image 119905 and any otherimage of the map 119895 as

119863119905119895fea = 1 minus( 119873119872119905119895max119895 (997888997888997888997888rarr119873119872119905)) (3)

where119873119872119905119895 is the number of matches between the images 119905and 119895 997888997888997888997888rarr119873119872119905 = [1198731198721199051 119873119872119905119899map] is a vector that containsthe number of matches between the image 119905 and every imageof the map and 119899map is the number of images in the map

Second in the case of appearance-based methods (2DDFT FS gist and HOG) no local information needs to beextracted from the images Instead the appearance of thewhole images is compared This way the global descriptor ofthe test image is calculated and the distances between it andthe descriptors of themap images are obtainedTheEuclideandistance is used in this case defined as

119863119905119895119864 = radic 119872sum119898=1

(119898119905 minus 119898119895 )2 (4)

where 119905 is the descriptor of the test image 119905 119895 is thedescriptor of the map image 119895 and 119872 is the size of thedescriptors This distance is normalized to obtain the finaldistance between the images 119905 and 119895 according to the nextexpression

119863119905119895app = 119863119905119895119864max119895 (119905119864) (5)

where119863119905119895119864 is the Euclidean distance between the descriptor ofthe test image 119905 and the map image 119895 119905119864 = [1198631199051119864 119863119905119899map

119864 ]is a vector that contains the Euclidean distance between thedescriptor of the image 119905 and all the images in the map and119899map is the number of images in the map

It is important to note that the algorithm must be ableto estimate the position of the robot with accuracy but it isalso important that the computational cost is adequate toknowwhether it would be feasible to solve the problem in realtime To estimate the computational cost we have computedconsidering bothmaps in the experiments the necessary timeto calculate the descriptor of each test image to compute thedistance to themap descriptors and to detect themost similardescriptor We must take into account that the descriptors ofall the map images can be computed prior to the localizationin an off-line process Apart from the time we have alsoestimated the amount of memory needed to store each imagedescriptor

To finish, we also propose to study the relationship distance between two image descriptors versus geometric distance between the capture points of these two images. Ideally, the distance between the descriptors must increase as the geometric distance between capture points does (i.e., it must not present any local minima). This information is very interesting in applications such as map building, where the robot must be able to build a map using as input information only the distance between image descriptors. It is also important when it is necessary to estimate the position of the vehicle at halfway points within the grid map. Additionally, it may help to detect if the problem of visual aliasing is present in the environment (i.e., two zones which are geometrically far apart may present a similar visual appearance, which might lead to errors in the mapping and localization process).
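This study only needs the GPS coordinates stored with each Street View image. One possible way to build the curve, sketched below under the assumption that capture points come as (latitude, longitude) pairs in degrees, is to convert each pair of fixes into a metric great-circle distance and sort the descriptor distances accordingly (as plotted later in Figure 7):

```python
import numpy as np

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two GPS fixes given in
    degrees (Earth radius approximated as 6371 km)."""
    p1, l1, p2, l2 = np.radians([lat1, lon1, lat2, lon2])
    a = (np.sin((p2 - p1) / 2.0) ** 2
         + np.cos(p1) * np.cos(p2) * np.sin((l2 - l1) / 2.0) ** 2)
    return 2.0 * 6371000.0 * np.arcsin(np.sqrt(a))

def distance_profile(ref_idx, gps, descriptor_dists):
    """Pair every image with (geometric distance to the reference capture
    point, descriptor distance to the reference descriptor), sorted by
    geometric distance; a monotonic profile means no local minima."""
    lat0, lon0 = gps[ref_idx]
    geo = np.array([haversine_m(lat0, lon0, lat, lon) for lat, lon in gps])
    order = np.argsort(geo)
    return geo[order], np.asarray(descriptor_dists)[order]
```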

5. Experimental Results

As stated in the previous section, with the purpose of establishing the capacity of each descriptor to correctly localize the robot (or vehicle), we have built recall-precision curves to reflect the results of each experiment. Figure 4 shows this graphical representation using (a) the first and (b) the second set of images (denoted as Sets 1 and 2 in previous sections). To build this figure, we consider the localization results in zone 1. This way, the figure shows the ability of the localization algorithm to correctly detect which image of the map was captured closer to the test image. This is the most restrictive case.

Apart from it, the performance of the localization algorithm in zones 2 and 3 has also been studied. This way, Figure 5 shows the results of the localization process in zone 2, using (a) Set 1 and (b) Set 2. Finally, Figure 6 shows the localization results in zone 3, using (a) Set 1 and (b) Set 2. This is the least restrictive case among the three studied.

In all cases, the results show that the SURF-Harris descriptor presents a relatively better performance than the other descriptors in terms of accuracy, using both image sets. As far as the methods based on global appearance are concerned, the good behavior of HOG can be highlighted. In the case of the localization in zone 2, it reaches 60% and 50% of precision in Sets 1 and 2, respectively. These results can be considered relatively good, taking into account the fact that the localization process is solved in an absolute way (i.e., we consider that no information about the previous position of the robot is available and the test image is compared with all the images stored in the data sets). In a real application, it


Figure 4: Results of the localization algorithm considering the correct matches in zone 1, using (a) Set 1 and (b) Set 2. The results of each description method (2D Fourier, Fourier Signature, HOG, gist, and SURF-Harris) are shown as different recall-precision curves.

Figure 5: Results of the localization algorithm considering the correct matches in zone 2, using (a) Set 1 and (b) Set 2. The results of each description method (2D Fourier, Fourier Signature, HOG, gist, and SURF-Harris) are shown as different recall-precision curves.

is usual to make use of some kind of probabilistic algorithm to estimate the position of the robot, taking into account its previous estimated position. This is expected to provide a higher accuracy. We expect to develop this type of algorithms and tests in a future work.

Some additional conclusions can be reached by comparing the performance of the methods in open areas (Set 1) and urban areas (Set 2). In open areas, the performance of SURF-Harris, HOG, and gist is quite similar and relatively good in all cases, and the methods based on the Discrete Fourier Transform tend to present worse results. However, in the case of urban areas, SURF-Harris outperforms the other methods and gist is the one that presents the worst results.

Apart from the localization accuracy, it is also important to study the computational cost of the process, since in a real application it would have to run in real time as the robot navigates through the environment. This way, we have obtained, in all cases, the necessary time to calculate the descriptor of the test image, on the one hand, and to compare it with the descriptors stored in the map, to detect the most similar descriptor, and to analyze the results, on the other hand. The average computational time of the localization process


Figure 6: Results of the localization algorithm considering the correct matches in zone 3, using (a) Set 1 and (b) Set 2. The results of each description method (2D Fourier, Fourier Signature, HOG, gist, and SURF-Harris) are shown as different recall-precision curves.

Table 1: Average computational cost of the description algorithms studied, per test image. For each description method and data set, the table shows, first, the necessary time to obtain the test image descriptor and, second, the time to compare it with the descriptors of the map and to obtain the final localization result.

                         2D Fourier   Fourier Signature   Gist       HOG        SURF-Harris
Data Set 1, Descriptor   0.0087 s     0.0080 s            0.4886 s   0.0608 s   0.5542 s
Data Set 1, Match        0.0015 s     0.0058 s            0.0006 s   0.0008 s   25.8085 s
Data Set 2, Descriptor   0.0085 s     0.0079 s            0.4828 s   0.0621 s   0.5389 s
Data Set 2, Match        0.0012 s     0.0047 s            0.0005 s   0.0006 s   19.3931 s

after considering all the test images, is shown in Table 1. To obtain the results of this table, the algorithms have been implemented using Matlab.

With respect to the computational cost, the methods based on the Fourier Transform are significantly faster than the rest, while SURF-Harris presents a considerably high computational cost. Regarding the necessary time to compare two descriptors, gist and HOG are the fastest methods. In the case of SURF-Harris, the brute-force match method implemented results in a relatively high computational cost. This method has been chosen to make a homogeneous comparison with the other global-appearance methods. However, in a real implementation, a bag-of-words based approach [23] would improve the computational efficiency of the algorithm.
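As a rough idea of why a bag-of-words scheme is cheaper, the sketch below (our simplification in the spirit of [23], with an assumed precomputed codebook, e.g. obtained by k-means over the map's SURF descriptors) reduces every image to a fixed-length histogram of visual words; comparing two images then costs one histogram distance instead of a quadratic point-to-point matching.

```python
import numpy as np

def bow_histogram(descriptors, codebook):
    """Quantize an image's local descriptors against a visual-word
    codebook and return its normalized word histogram.

    descriptors: (n_points, d) SURF descriptors of one image
    codebook:    (n_words, d) cluster centers (assumed precomputed)
    """
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)                       # nearest visual word
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()

# Map images are then compared through their histograms, e.g. with the
# Euclidean distance of Eq. (4), in O(n_words) per comparison.
```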

Finally, we have obtained the average memory size needed to store each descriptor. The results are shown in Table 2. Gist is the most compact descriptor (it is able to compress the information in each scene significantly), while SURF-Harris needs the most memory.

Considering these results jointly with the precision in localization, we could say that the SURF-Harris descriptor shows very good results in localization accuracy, but its computational cost makes it unfeasible for a real application. HOG, which is the second in terms of accuracy, also has a very good computational cost, so we consider it interesting to study this descriptor more thoroughly as future work and to implement more advanced versions of this method to try to optimize the accuracy. Likewise, other types of distances to compare images could also be studied, apart from the Euclidean distance. For the same reasons, we also consider it appropriate to examine the gist descriptors more thoroughly, as well as to use other methods to extract the gist of a scene apart from the orientation information (e.g., from the color information).

As a final experiment, we have studied the relationship distance between two image descriptors versus geometric distance between the capture points of these two images. As stated at the beginning of the section, this information is very interesting in some applications, such as the construction of maps from the images with geometric precision, or the localization of the vehicle at halfway points of the grid of the map. It is important that the distance between descriptors grows as the geometric distance does. Figure 7 shows the results obtained using (a) Set 1 and (b) Set 2. To obtain these figures, an image has been set as a reference image, and the distance between the reference image descriptor and the other descriptors has been calculated. The figure shows this distance versus the geometric distance between the capture points of each image and the capture point of the reference image. In both cases, this relationship is monotonically increasing up to a geometric


Table 2: Necessary memory to store each descriptor.

             2D Fourier    Fourier Signature   Gist         HOG          SURF-Harris
Descriptor   16384 bytes   32768 bytes         4096 bytes   8192 bytes   110400 bytes

Figure 7: Relationship distance between two image descriptors versus geometric distance between the capture points of these two images, using (a) Set 1 and (b) Set 2. Each curve plots the descriptor distance (2D Fourier, Fourier Signature, HOG, gist, or SURF-Harris) against the geometrical distance in meters.

distance of approximately 100 meters. From this point, it tends to stabilize, with a relatively high variance. The exception is the local features descriptor, which stabilizes at the final value from a very small geometric distance onwards. However, the appearance-based descriptors exhibit a more linear behavior around each image.

6. Conclusions and Future Works

In this paper, we have carried out a comparative evaluation of several scene description methods, considering the performance of these methods to accurately solve an absolute localization problem in a large, real outdoor environment. We evaluated two different approaches of visual descriptors: local features descriptors (SURF-Harris) and global-appearance descriptors (2D Discrete Fourier Transform, Fourier Signature, HOG, and gist).

All the tests have been carried out with images of Google Street View, captured under realistic conditions. Two different areas of a city have been considered: an open area and a purely urban area with narrower streets. The capture points of each area present a different topography. The first one is a grid map that covers several streets and avenues, and the second one is a linear map (i.e., the images were captured while the mobile traversed a linear path along a narrow street).

Some different studies have been performed. First, we have evaluated the accuracy of the localization process. To do this, recall and precision curves have been computed to compare the performance of each description method. We plot the recall and precision curves for both areas, taking into account different levels of accuracy to consider that the localization result is correct. In these experiments, the computational cost of the localization process has also been analyzed.

We have also studied each descriptor in terms of the behavior of the descriptor distance compared to the geometrical distance between image capture points. To do this, we plot a curve that represents the descriptor distance versus the geometrical distance between capture points. This measure is very useful for performing navigation tasks since, thanks to it, we can estimate the range of use of the descriptor.

It is noticeable that the SURF-Harris descriptor is the most suitable descriptor in terms of precision in localization, but it presents a smaller working zone in terms of Euclidean distance between descriptors. The HOG descriptor has shown a relatively good performance to solve the localization problem and presents a good response of the descriptor distance versus the geometrical distance between capture points. If we analyze jointly the results of both experiments and take into account the computational cost (Tables 1 and 2), we conclude that, although the SURF-Harris descriptor presents the best results in terms of recall and precision curves, it does not allow us to work in real time. Therefore, taking into account that HOG is the descriptor that presents the second best results in terms of recall and precision curves and allows us to work in real time, we can conclude that HOG is the most suitable descriptor.

We plan to extend this work to (a) capture a real outdoor trajectory, traveling along several streets and capturing omnidirectional images using a catadioptric vision system; (b) combine the information provided by this vision system and the images of Google Street View; and (c) evaluate the performance of the best descriptors in a probabilistic localization process.

Competing Interests

The authors declare that they have no competing interests.

Acknowledgments

This work has been supported by the Spanish Government through the Project DPI 2013-41557-P (Navegacion de Robots en Entornos Dinamicos Mediante Mapas Compactos con Informacion Visual de Apariencia Global) and by the Generalitat Valenciana through the Projects AICO/2015/021 (Localizacion y Creacion de Mapas Visuales para Navegacion de Robots con 6 GDL) and GV/2015/031 (Creacion de Mapas Topologicos a Partir de la Apariencia Global de un Conjunto de Escenas).

References

[1] S. Thrun, D. Fox, W. Burgard, and F. Dellaert, "Robust Monte Carlo localization for mobile robots," Artificial Intelligence, vol. 128, no. 1-2, pp. 99–141, 2001.

[2] A. Gil, O. Reinoso, M. A. Vicente, C. Fernandez, and L. Paya, "Monte Carlo localization using SIFT features," in Pattern Recognition and Image Analysis, vol. 3522 of Lecture Notes in Computer Science, pp. 623–630, 2005.

[3] A. Murillo, J. Guerrero, and C. Sagues, "SURF features for efficient robot localization with omnidirectional images," in Proceedings of the IEEE International Conference on Robotics & Automation, San Diego, Calif, USA, 2007.

[4] H. Bay, T. Tuytelaars, and L. Van Gool, "SURF: speeded up robust features," in Proceedings of the 9th European Conference on Computer Vision, Graz, Austria, 2006.

[5] F. Rossi, A. Ranganathan, F. Dellaert, and E. Menegatti, "Toward topological localization with spherical Fourier transform and uncalibrated camera," in Proceedings of the International Conference on Simulation, Modeling and Programming for Autonomous Robots (SIMPAR '08), pp. 319–330, Venice, Italy, 2008.

[6] L. Paya, O. Reinoso, F. Amoros, L. Fernandez, and A. Gil, "Probabilistic map building, localization and navigation of a team of mobile robots: application to route following," in Multi-Robot Systems: Trends and Development, pp. 191–210, 2011.

[7] L. Fernandez, L. Paya, D. Valiente, A. Gil, and O. Reinoso, "Monte Carlo localization using the global appearance of omnidirectional images: algorithm optimization to large indoor environments," in Proceedings of the 9th International Conference on Informatics in Control, Automation and Robotics (ICINCO '12), pp. 439–442, Rome, Italy, July 2012.

[8] M. Montemerlo, FastSLAM: a factored solution to the simultaneous localization and mapping problem with unknown data association [Ph.D. thesis], Robotics Institute, Carnegie Mellon University, Pittsburgh, Pa, USA, 2003.

[9] C. Gamallo, P. Quintia, R. Iglesias-Rodriguez, J. V. Lorenzo, and C. V. Regueiro, "Combination of a low cost GPS with visual localization based on a previous map for outdoor navigation," in Proceedings of the 11th International Conference on Intelligent Systems Design and Applications (ISDA '11), pp. 1146–1151, Cordoba, Spain, November 2011.

[10] A. Torii, J. Sivic, and T. Pajdla, "Visual localization by linear combination of image descriptors," in Proceedings of the IEEE International Conference on Computer Vision Workshops (ICCV Workshops '11), pp. 102–109, Barcelona, Spain, November 2011.

[11] M. Aly and J.-Y. Bouguet, "Street view goes indoors: automatic pose estimation from uncalibrated unordered spherical panoramas," in Proceedings of the IEEE Workshop on the Applications of Computer Vision (WACV '12), pp. 1–8, Breckenridge, Colo, USA, January 2012.

[12] A. Taneja, L. Ballan, and M. Pollefeys, "Registration of spherical panoramic images with cadastral 3D models," in Proceedings of the International Conference on 3D Imaging, Modeling, Processing, Visualization and Transmission (3DIMPVT '12), pp. 479–486, Zurich, Switzerland, 2012.

[13] F. Amoros, L. Paya, O. Reinoso, and L. Jimenez, "Comparison of global-appearance techniques applied to visual map building and localization," in Proceedings of the International Conference on Computer Vision Theory and Applications, pp. 395–398, Rome, Italy, 2012.

[14] A. Gil, O. M. Mozos, M. Ballesta, and O. Reinoso, "A comparative evaluation of interest point detectors and local descriptors for visual SLAM," Machine Vision and Applications, vol. 21, no. 6, pp. 905–920, 2010.

[15] L. Paya, L. Fernandez, O. Reinoso, A. Gil, and D. Ubeda, "Appearance-based dense maps creation: comparison of compression techniques with panoramic images," in Proceedings of the International Conference on Informatics in Control, Automation and Robotics (INSTICC '09), pp. 238–246, Milan, Italy, 2009.

[16] C. Harris and M. Stephens, "A combined corner and edge detector," in Proceedings of the Alvey Vision Conference, pp. 231–236, Manchester, UK, 1988.

[17] E. Menegatti, T. Maeda, and H. Ishiguro, "Image-based memory for robot navigation using properties of omnidirectional images," Robotics and Autonomous Systems, vol. 47, no. 4, pp. 251–267, 2004.

[18] L. Paya, L. Fernandez, A. Gil, and O. Reinoso, "Map building and Monte Carlo localization using global appearance of omnidirectional images," Sensors, vol. 10, no. 12, pp. 11468–11497, 2010.

[19] A. Friedman, "Framing pictures: the role of knowledge in automatized encoding and memory for gist," Journal of Experimental Psychology: General, vol. 108, no. 3, pp. 316–355, 1979.

[20] A. Torralba, "Contextual priming for object detection," International Journal of Computer Vision, vol. 53, no. 2, pp. 169–191, 2003.

[21] N. Dalal and B. Triggs, "Histograms of oriented gradients for human detection," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '05), pp. 886–893, Washington, DC, USA, June 2005.

[22] F. Amoros, L. Paya, O. Reinoso, L. Fernandez, and J. Marin, "Visual map building and localization with an appearance-based approach: comparisons of techniques to extract information of panoramic images," in Proceedings of the 7th International Conference on Informatics in Control, Automation and Robotics, pp. 423–426, 2010.

[23] D. Filliat, "A visual bag of words method for interactive qualitative localization and mapping," in Proceedings of the IEEE International Conference on Robotics and Automation (ICRA '07), pp. 3921–3926, Roma, Italy, April 2007.

International Journal of

AerospaceEngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Active and Passive Electronic Components

Control Scienceand Engineering

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

RotatingMachinery

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

Journal ofEngineeringVolume 2014

Submit your manuscripts athttpwwwhindawicom

VLSI Design

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Shock and Vibration

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawi Publishing Corporation httpwwwhindawicom

Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

SensorsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Navigation and Observation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

DistributedSensor Networks

International Journal of

Page 5: Research Article A Study of Visual Descriptors for Outdoor ...downloads.hindawi.com/journals/js/2016/1537891.pdf · Research Article A Study of Visual Descriptors for Outdoor Navigation

Journal of Sensors 5

Figure 1 Bird eyersquos view of the region chosen as map 1 prior to the localization experiment The blue dots represent the coordinates wherethe images of the Set 1 were captured Two examples of Google Street View images after a preprocessing step are also shown

Figure 2 Bird eyersquos view of the region chosen as map 2 prior to the localization experiment The blue dots represent the coordinates wherethe images of Set 2 were captured Three examples of Google Street View images after a preprocessing step are also shown

often a big portion of the Google Street View images Theappearance of this area will be very prone to changes whenthe localization process is carried out in a different time of daywith respect to the time of day when the map was capturedTaking this fact into account a preprocessing step has beencarried out to remove part of the sky in the scenes

Once part of the sky has been removed fromall the scenesthe images are converted into grayscale and their resolutionis reduced to 512 times 128 pixels to ensure the computationalviability of the algorithms

After that each image will be described using the fivedescription methods presented in Section 2 At the endone map will be available per image set and per description

method Each map will be composed of the set of descriptorsof each panoramic scene

32 Localization Process Once the maps are available inorder to evaluate the different visual descriptors introducedin Section 2 to solve the localization problem we also makeuse of Google Street View images

To carry out the localization process first we choose oneof the images of the database (named as test image) In thismoment this image is removed from the map Second wecompute the descriptor of the test image (using one of themethods presented in Section 2) and obtain the distancebetween this descriptor and the descriptors of the rest of

6 Journal of Sensors

the images stored in the corresponding map As a resultthe images of the map are arranged from the nearest to thefarthest using the image distance as arranging criterion

The result of the localization algorithm is consideredcorrect if the first image it returns was captured on the mappoint which is geometrically the closest to the test imagecapture point (the GPS coordinates are used with this aim)We will refer to this case as a correct localization in zone 1However since this is a quite challenging and restrictive prob-lem it is also interesting to know if the first image that thealgorithm returns was captured on one of the two geometri-cally closest points to the test image capture point (zone 2) oreven on one of the three geometrically closest points (zone 3)The first case is the ideal one but we are also interested in theother cases as they will indicate if the algorithm is returningan image in the surroundings of the actual position of the testimage (ie the localization algorithm detects that the robot isin a zone close around its actual position)

This process is repeated for each description methodusing all the images of Sets 1 and 2 as test images In briefthe procedure to test the localization methods previouslyexplained consists in the following steps for each image anddescription method

(1) Extracting one image of the set (denoted as testimage) then this test image is eliminated from themap

(2) Calculating the descriptor of the test image(3) Calculating the distance between this descriptor and

all the map descriptors which we named imagedistance

(4) Retaining the most similar descriptor and studying ifit corresponds to one image that has been captured inthe surroundings of the test image capture point (zone1 2 or 3)

As a result the next data are retained for an individual testimage the image distance between the test image descriptorand the most similar map descriptor119863119905 and the localizationresults in zone 1 (correct or wrong match) 1198981199051 in zone 2119898t2 and in zone 3 1198981199053 After repeating this process with all

the test images the results will consist of four vectors whosedimension is equal to the number of test images The firstvector contains the distances119863119905 and the other three 12 and 3 contain respectively the information of corrector incorrect matches in zones 1 2 and 3

4 Evaluation Methods

In this work the localization results are expressed by meansof recall and precision curves [14] To build them the com-ponents of the vectors 1 2 and 3 are equally sorted inascending order of the distances that appear in the first vectorThe resulting sorted vectors of correct andwrongmatches arethen used to calculate the values of recall and precision Letus focus on the sorted vector of matches in zone 1 1199041 Firstfor each component in this vector the recall is calculated asthe number of correct matches obtained so far with respect to

0 02 04 06 08 10

02

04

06

08

1

Recall

Prec

ision

Descriptor 1Descriptor 2

Figure 3 Two sample recall-precision graphs obtained after carryingout the localization experiments with two different descriptionmethods

the total number of test images Second for each componentin the same vector the precision is obtained as the number ofcorrect matches obtained so far with respect to the numberof test images considered so far Then with the informationcontained in these vectors a precision versus recall curve isbuilt corresponding to the localization in zone 1 This isrepeated with the sorted vectors 1199042 and 1199043 to obtain thelocalization results in zone 2 and zone 3

In our experiments the most important piece of infor-mation of this type of graphs is the final point because itshows the global result of the experiment (final precision afterconsidering all the test images) However additional relevantinformation can be extracted from them because the graphalso shows the ability of the localization algorithm to findthe correct match while considering a specific image distancethreshold As explained in the previous paragraph the resultshave been arranged considering the ascending value of dis-tances Taking it into account as the recall increases thethreshold also does For this reason the evolution of therecall-precision curves contains information about the robust-ness of the algorithm with respect to a specific image distancethreshold If the precision values stay high independentlyof the recall this shows the presence of a lower number ofwrong results under this distance threshold Figure 3 showstwo sample recall-precision curves obtained after running thelocalization algorithm with all the test images and two dif-ferent description methods considering zone 1 Both curvesshow a similar final precision value between 06 and 065However the evolutions present a different behavior As anexample if we consider as threshold the distance associatedto recall = 025 according to the graph the precision ofdescriptor 1 is 100 but the precision of descriptor 2 is90Thismeans that considering the selected image distancethreshold 25 of correct localizations are achieved with100 of probability using descriptor 1 and with a 90 ofprobability using descriptor 2 This study can be carried outconsidering any value for the image distance threshold

Journal of Sensors 7

Before running the algorithm it is necessary to definethe image distance We use two different distance measuresdepending on the kind of descriptor used

First in the case of the feature-based method (SURF-Harris) it is necessary to extract the interest points prior todescribing the appearance of the image We propose usingthe Harris corner detector [16] to extract visual landmarksfrom the panoramic images After that each interest point isdescribed using standard SURF To compare the test imagewith themap images first we extract and describe the interestpoints from all the images After that a matching process iscarried out with these points The points detected in the testimage captured with a particular position and orientation ofthe vehicle are searched in themap imagesThe performanceof the matching method is not the scope of this paper weonly employ it as a tool Once the matching process has beencarried out we evaluate the performance of the descriptortaking into account the number of matched points so thatwe will consider as closest image the one that presents morematching points with the test image More concisely wecompute the distance between the test image 119905 and any otherimage of the map 119895 as

119863119905119895fea = 1 minus( 119873119872119905119895max119895 (997888997888997888997888rarr119873119872119905)) (3)

where119873119872119905119895 is the number of matches between the images 119905and 119895 997888997888997888997888rarr119873119872119905 = [1198731198721199051 119873119872119905119899map] is a vector that containsthe number of matches between the image 119905 and every imageof the map and 119899map is the number of images in the map

Second in the case of appearance-based methods (2DDFT FS gist and HOG) no local information needs to beextracted from the images Instead the appearance of thewhole images is compared This way the global descriptor ofthe test image is calculated and the distances between it andthe descriptors of themap images are obtainedTheEuclideandistance is used in this case defined as

119863119905119895119864 = radic 119872sum119898=1

(119898119905 minus 119898119895 )2 (4)

where 119905 is the descriptor of the test image 119905 119895 is thedescriptor of the map image 119895 and 119872 is the size of thedescriptors This distance is normalized to obtain the finaldistance between the images 119905 and 119895 according to the nextexpression

119863119905119895app = 119863119905119895119864max119895 (119905119864) (5)

where119863119905119895119864 is the Euclidean distance between the descriptor ofthe test image 119905 and the map image 119895 119905119864 = [1198631199051119864 119863119905119899map

119864 ]is a vector that contains the Euclidean distance between thedescriptor of the image 119905 and all the images in the map and119899map is the number of images in the map

It is important to note that the algorithm must be ableto estimate the position of the robot with accuracy but it isalso important that the computational cost is adequate toknowwhether it would be feasible to solve the problem in realtime To estimate the computational cost we have computedconsidering bothmaps in the experiments the necessary timeto calculate the descriptor of each test image to compute thedistance to themap descriptors and to detect themost similardescriptor We must take into account that the descriptors ofall the map images can be computed prior to the localizationin an off-line process Apart from the time we have alsoestimated the amount of memory needed to store each imagedescriptor

To finish we also propose to study the relationship dis-tance between two image descriptors versus geometric distancebetween the capture points of these two images Ideally the dis-tance between the descriptors must increase as the geometricdistance between capture points does (ie it must not presentany local minima) This information is very interesting inapplications such as map building where the robot mustbe able to build a map using as input information only thedistance between image descriptors It is also important whenit is necessary to estimate the position of the vehicle at halfwaypoints within the gridmap Additionally it may help to detectif the problem of visual aliasing is present in the environment(ie two zones which are geometrically far may present asimilar visual appearance which might lead to errors in themapping and localization process)

5 Experimental Results

As stated in the previous section with the purpose of estab-lishing the capacity of each descriptor to correctly localizethe robot (or vehicle) we have built recall-precision curvesto reflect the results of each experiment Figure 4 shows thisgraphical representation using (a) the first and (b) the secondset of images (denoted as Sets 1 and 2 in previous sections) Tobuild this figure we consider the localization results in zone 1This way the figure shows the ability of the localization algo-rithm to correctly detect which image of the map was cap-tured closer to the test imageThis is themost restrictive case

Apart from it the performance of the localization algo-rithm in zones 2 and 3 has also been studied This wayFigure 5 shows the results of the localization process in zone2 using (a) Set 1 and (b) Set 2 Finally Figure 6 shows thelocalization results in zone 3 using (a) Set 1 and (b) Set 2Thisis the least restrictive case among the three studied

In all cases the results show that the SURF-Harris des-criptor presents a relatively better performance comparing tothe other descriptors in terms of accuracy and using bothimage sets As far as the methods based on global appearanceare concerned the good behavior ofHOG can be highlightedIn the case of the localization in zone 2 it reaches 60and 50of precision in Sets 1 and 2 respectively These results can beconsidered relatively good taking into account the fact thatthe localization process is solved in an absolute way (ie weconsider that no information about the previous position ofthe robot is available and the test image is compared with allthe images stored in the data sets) In a real application it

8 Journal of Sensors

0 01 02 03 04 050

02

04

06

08

1

Recall

Prec

ision

2D FourierFourier SignatureHOG

GistSURF-Harris

(a)

0 01 02 03 04 05 060

02

04

06

08

1

Recall

Prec

ision

2D FourierFourier SignatureHOG

GistSURF-Harris

(b)

Figure 4 Results of the localization algorithm considering the correct matches in zone 1 using (a) Set 1 and (b) Set 2 The results of eachdescription method are shown as different recall-precision curves

0 01 02 03 04 05 06 070

02

04

06

08

1

Recall

Prec

ision

2D FourierFourier SignatureHOG

GistSURF-Harris

(a)

0 02 04 06 080

02

04

06

08

1

Recall

Prec

ision

2D FourierFourier SignatureHOG

GistSURF-Harris

(b)

Figure 5 Results of the localization algorithm considering the correct matches in zone 2 using (a) Set 1 and (b) Set 2 The results of eachdescription method are shown as different recall-precision curves

is usual to make use of any kind of probabilistic algorithmto estimate the position of the robot taking into account itsprevious estimated position This is expected to provide ahigher accuracyWe expect to develop this type of algorithmsand tests in a future work

Some additional conclusions can be reached by compar-ing the performance of the methods in open areas (Set 1) andurban areas (Set 2) In open areas the performance of SURF-Harris HOG and gist is quite similar and relatively goodin all cases and the methods based on the Discrete FourierTransform tend to present worse results However in the case

of urban areas SURF-Harris outperforms the other methodsand gist is the one that presents the worst results

Apart from the localization accuracy it is also importantto study the computational cost of the process since in a realapplication it would have to run in real time as the robotis navigating through the environment This way we haveobtained in all cases the necessary time to calculate thedescriptor of the test image on the one hand and to compare itwith the descriptors stored in themap to detect themost sim-ilar descriptor and to analyze the results on the other handThe average computational time of the localization process

Journal of Sensors 9

0 02 04 06 080

02

04

06

08

1

Recall

Prec

ision

2D FourierFourier SignatureHOG

GistSURF-Harris

(a)

0 02 04 06 080

02

04

06

08

1

Recall

Prec

ision

2D FourierFourier SignatureHOG

GistSURF-Harris

(b)

Figure 6 Results of the localization algorithm considering the correct matches in zone 3 using (a) Set 1 and (b) Set 2 The results of eachdescription method are shown as different recall-precision curves

Table 1 Average computational cost of the description algorithms studied per test image For each descriptionmethod and data set the tableshows first the necessary time to obtain the test image descriptor and second the time to compare it with the descriptors of the map and toobtain the final localization result

2D Fourier Fourier Signature Gist HOG SURF-HarrisData Set 1 Descriptor 00087 s 00080 s 04886 s 00608 s 05542 sData Set 1 Match 00015 s 00058 s 00006 s 00008 s 258085 sData Set 2 Descriptor 00085 s 00079 s 04828 s 00621 s 05389 sData Set 2 Match 00012 s 00047 s 00005 s 00006 s 193931 s

after considering all the test images is shown in Table 1 Toobtain the results of this table the algorithms have beenimplemented using Matlab

With respect to the computational cost the methodsbased on the Fourier Transform are significantly faster thanthe rest while SURF-Harris presents a considerably highcomputational cost About the necessary time to compare twodescriptors gist andHOG are the fastest methods In the caseof SURF-Harris the brute force match method implementedresults in a relatively high computational cost This methodhas been chosen to make a homogeneous comparison withthe other global-appearance methods However in a realimplementation a bag-of-words based approach [23] wouldimprove the computational efficiency of the algorithm

At last we have obtained the averagememory size neededto store each descriptorThe results are shown in Table 2Gistis the most compact descriptor (it is able to compress theinformation in each scene significantly) while SURF-Harrisneeds more memory size

Considering these results jointly with the precision inlocalization we could say that the SURF-Harris descriptorshows very good results in location accuracy but its compu-tational cost makes it unfeasible to solve a real applicationHOG which is the second in terms of accuracy also has avery good computational cost so we consider it interesting to

study more thoroughly this descriptor as future work and toimplement more advanced versions of this method to try tooptimize the accuracy Likewise other types of distances tocompare images could also be studied apart from the Euc-lidean distance For the same reasons we also consider itappropriate to examine more thoroughly the gist descriptorsas well as using other methods to extract the gist of a sceneapart from the orientation information (eg from the colorinformation)

As a final experiment we have studied the relationshipdistance between two image descriptors versus geometric dista-nce between the capture points of these two images As statedat the beginning of the section this information is very inter-esting in some applications such as the construction of mapsfrom the images with geometric precision or the localizationof the vehicle at halfway points of the grid of the map It isimportant that the distance between descriptors grows as thegeometric distance does Figure 7 shows the results obtainedusing (a) Set 1 and (b) Set 2 To obtain these figures an imagehas been set as a reference image and the distance betweenthe reference image descriptor and the other descriptors hasbeen calculatedThefigure shows this distance versus the geo-metric distance between the capture points of each image andthe capture point of the reference image In both cases thisrelationship is monotonically increasing up to a geometric

10 Journal of Sensors

Table 2 Necessary memory to store each descriptor

2D Fourier Fourier Signature Gist HOG SURF-HarrisDescriptor 16384 bytes 32768 bytes 4096 bytes 8192 bytes 110400 bytes

0 200 400 600 8000

02

04

06

08

1

Des

crip

tor d

istan

ce

2D Fourier Fourier SignatureHOG

GistSURF-Harris

Geometrical distance (m)

12

(a)

Des

crip

tor d

istan

ce0 200 400 600

0

02

04

06

08

1

12

2D FourierFourier SignatureHOG

GistSURF-Harris

Geometrical distance (m)

(b)

Figure 7 Relationship distance between two image descriptors versus geometric distance between the capture points of these two images using(a) Set 1 and (b) Set 2

distance of approximately 100meters From this point it tendsto stabilize with a relatively high variance The exceptionis the local features descriptor which stabilizes at the finalvalue from a very small geometric distance However theappearance-based descriptors exhibit a more linear behavioraround each image

6 Conclusions and Future Works

In this paper we have carried out a comparative evaluationof several description methods of scenes considering theperformance of these methods to accurately solve an absolutelocalization problem in a large real outdoor environmentWeevaluated two different approaches of visual descriptors localfeatures descriptors (SURF-Harris) and global-appearancedescriptors (2D Discrete Fourier Transform Fourier Signa-ture HOG and gist)

All the tests have been carried out with images ofGoogle Street View captured under realistic conditions Twodifferent areas of a city have been considered an open areaand a purely urban area with narrower streets The capturepoints of each area present different topographyThe first oneis a grid map that covers several streets and avenues and thesecond one a linear map (ie the images were captured whenthe mobile traversed a linear path on a narrow street)

Some different studies have been performed First wehave evaluated the accuracy of the localization process To

do this recall and precision curves have been computed tocompare the performance of each description method Weplot the recall and precision curves for both areas takinginto account different levels of accuracy to consider thatthe localization result is correct In these experiments thecomputational cost of the localization process has also beenanalyzed

We have also studied each descriptor in terms of behaviorof the descriptor distance comparing to geometrical distancebetween image capture points To do this we plot a curvethat represents the descriptor distance versus the geometricaldistance between capture points This measure is very usefulfor performing navigation tasks since thanks to it we canestimate the range of use of the descriptor

It is noticeable that the SURF-Harris descriptor is themost suitable descriptor in terms of precision in localizationbut it presents a smaller zone of work in terms of Euclideandistance between descriptorsTheHOGdescriptor has showna relatively good performance to solve the localization prob-lem and presents a good response of the descriptors distanceversus geometrical distance between capture points If weanalyze jointly the results of both experiments and take intoaccount the computational cost (Tables 1 and 2) we concludethat although the SURF-Harris descriptor presents the bestresults in terms of recall and precision curves it does notallow us to work in real time Therefore taking into account

Journal of Sensors 11

that HOG is the descriptor that presents the second bestresults in terms of recall and precision curves and allows us towork in real time we can conclude that the HOG is the mostsuitable descriptor

We plan to extend this work to (a) capture a real out-door trajectory traveling along several streets and capturingomnidirectional images using a catadioptric vision system(b) combine the information provided by this vision systemand the images of the Google Street View and (c) evaluatethe performance of the best descriptors in a probabilisticlocalization process

Competing Interests

The authors declare that they have no competing interests

Acknowledgments

This work has been supported by the Spanish Governmentthrough the Project DPI 2013-41557-P Navegacion de Robotsen Entornos DinamicosMedianteMapas Compactos con Infor-macion Visual de Apariencia Global and by the GeneralitatValenciana through the ProjectsAICO2015021 Localizaciony Creacion de Mapas Visuales para Navegacion de Robots con6 GDL and GV2015031 Creacion de Mapas Topologicos aPartir de la Apariencia Global de un Conjunto de Escenas

References

[1] S Thrun D Fox W Burgard and F Dellaert ldquoRobust MonteCarlo localization for mobile robotsrdquo Artificial Intelligence vol128 no 1-2 pp 99ndash141 2001

[2] A Gil O Reinoso M A Vicente C Fernandez and LPaya ldquoMonte carlo localization using SIFT featuresrdquo in PatternRecognition and Image Analysis vol 3522 of Lecture Notes inComputer Science pp 623ndash630 2005

[3] AMurillo J Guerrero andC Sagues ldquoSurf features for efficientrobot localization with omnidirectional imagesrdquo in Proceedingsof the IEEE International Conference on Robotics amp AutomationSan Diego Calif USA 2007

[4] H Bay T Tuytelaars and L Van Gool ldquoSurf speeded up robustfeaturesrdquo in Proceedings of the 9th European Conference onComputer Vision Graz Austria 2006

[5] F Rossi A Ranganathan F Dellaert and EMenegatti ldquoTowardtopological localization with spherical fourier transform anduncalibrated camerardquo in Proceedings of the International Con-ference on Simulation Modeling and Programming for Auto-nomous Robots (SIMPAR rsquo08) pp 319ndash330 Venice Italy 2008

[6] L Paya O Reinoso F Amoros L Fernandez and A Gil ldquoPro-babilistic map building localization and navigation of a team ofmobile robots application to route followingrdquo in Multi-RobotSystems Trends and Development pp 191ndash210 2011

[7] L Fernandez L Paya D Valiente A Gil and O ReinosoldquoMonte Carlo localization using the global appearance of omni-directional images algorithmoptimization to large indoor envi-ronmentsrdquo in Proceedings of the 9th International Conference onInformatics in Control Automation and Robotics (ICINCO rsquo12)pp 439ndash442 Rome Italy July 2012

[8] M Montemerlo FastSLAM a factored solution to the simul-taneous localization and mapping problem with unknown dataassociation [PhD thesis] Robotics Institute Carnegie MellonUniversity Pittsburgh Pa USA 2003

[9] C Gamallo P Quintıa R Iglesias-Rodrıguez J V Lorenzo andC V Regueiro ldquoCombination of a low cost GPS with visuallocalization based on a previous map for outdoor navigationrdquoin Proceedings of the 11th International Conference on IntelligentSystems Design and Applications (ISDA rsquo11) pp 1146ndash1151Cordoba Spain November 2011

[10] A Torii J Sivic and T Pajdla ldquoVisual localization by linearcombination of image descriptorsrdquo in Proceedings of the IEEEInternational Conference on Computer VisionWorkshops (ICCVWorkshops rsquo11) pp 102ndash109 Barcelona Spain November 2011

[11] M Aly and J-Y Bouguet ldquoStreet view goes indoors automaticpose estimation fromuncalibrated unordered spherical panora-masrdquo in Proceedings of the IEEEWorkshop on the Applications ofComputer Vision (WACV rsquo12) pp 1ndash8 Breckenridge Colo USAJanuary 2012

[12] A Taneja L Ballan andM Pollefeys ldquoRegistration of sphericalpanoramic images with cadastral 3d modelsrdquo in Proceedings ofthe International Conference on 3D Imaging Modeling Process-ing Visualization and Transmission (3DIMPVT rsquo12) pp 479ndash486 Zurich Switzerland 2012

[13] F Amoros L Paya O Reinoso and L Jimenez ldquoComparisonof global-appearance techniques applied to visual map buildingand localizationrdquo in Proceedings of the International Conferenceon Computer Vision Theory and Applications pp 395ndash398Rome Italy 2012

[14] A Gil O M Mozos M Ballesta and O Reinoso ldquoA compara-tive evaluation of interest point detectors and local descriptorsfor visual SLAMrdquo Machine Vision and Applications vol 21 no6 pp 905ndash920 2010

[15] L Paya L FernandezO ReinosoAGil andDUbeda ldquoAppea-rance-based dense maps creation Comparison of compressiontechniques with panoramic imagesrdquo in Proceedings of the Inter-national Conference on Informatics in Control Automation andRobotics (INSTICC rsquo09) pp 238ndash246 Milan Italy 2009

[16] C Harris and M Stephens ldquoA combined corner and edgedetectorrdquo in Proceedings of the Alvey Vision Conference pp 231ndash236 Manchester UK 1988

[17] E Menegatti T Maeda and H Ishiguro ldquoImage-based mem-ory for robot navigation using properties of omnidirectionalimagesrdquo Robotics and Autonomous Systems vol 47 no 4 pp251ndash267 2004

[18] L Paya L Fernandez A Gil and O Reinoso ldquoMap build-ing and Monte Carlo localization using global appearance ofomnidirectional imagesrdquo Sensors vol 10 no 12 pp 11468ndash114972010

[19] A Friedman ldquoFraming pictures the role of knowledge in auto-matized encoding andmemory for gistrdquo Journal of ExperimentalPsychology General vol 108 no 3 pp 316ndash355 1979

[20] A Torralba ldquoContextual priming for object detectionrdquo Interna-tional Journal of Computer Vision vol 53 no 2 pp 169ndash1912003

[21] N Dalal and B Triggs ldquoHistograms of oriented gradients forhuman detectionrdquo in Proceedings of the IEEE Computer SocietyConference on Computer Vision and Pattern Recognition (CVPRrsquo05) pp 886ndash893 Washington DC USA June 2005

12 Journal of Sensors

[22] F Amoros L Paya O Reinoso L Fernandez and J MarınldquoVisual map building and localization with an appearance-based approachmdashcomparisons of techniques to extract infor-mation of panoramic imagesrdquo in Proceedings of the 7th Inter-national Conference on Informatics in Control Automation andRobotics pp 423ndash426 2010

[23] D Filliat ldquoA visual bag of words method for interactive quali-tative localization and mappingrdquo in Proceedings of the IEEEInternational Conference on Robotics and Automation (ICRArsquo07) pp 3921ndash3926 Roma Italy April 2007

International Journal of

AerospaceEngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Active and Passive Electronic Components

Control Scienceand Engineering

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

RotatingMachinery

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

Journal ofEngineeringVolume 2014

Submit your manuscripts athttpwwwhindawicom

VLSI Design

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Shock and Vibration

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawi Publishing Corporation httpwwwhindawicom

Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

SensorsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Navigation and Observation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

DistributedSensor Networks

International Journal of

Page 6: Research Article A Study of Visual Descriptors for Outdoor ...downloads.hindawi.com/journals/js/2016/1537891.pdf · Research Article A Study of Visual Descriptors for Outdoor Navigation

6 Journal of Sensors

the images stored in the corresponding map As a resultthe images of the map are arranged from the nearest to thefarthest using the image distance as arranging criterion

The result of the localization algorithm is consideredcorrect if the first image it returns was captured on the mappoint which is geometrically the closest to the test imagecapture point (the GPS coordinates are used with this aim)We will refer to this case as a correct localization in zone 1However since this is a quite challenging and restrictive prob-lem it is also interesting to know if the first image that thealgorithm returns was captured on one of the two geometri-cally closest points to the test image capture point (zone 2) oreven on one of the three geometrically closest points (zone 3)The first case is the ideal one but we are also interested in theother cases as they will indicate if the algorithm is returningan image in the surroundings of the actual position of the testimage (ie the localization algorithm detects that the robot isin a zone close around its actual position)

This process is repeated for each description methodusing all the images of Sets 1 and 2 as test images In briefthe procedure to test the localization methods previouslyexplained consists in the following steps for each image anddescription method

(1) Extracting one image of the set (denoted as testimage) then this test image is eliminated from themap

(2) Calculating the descriptor of the test image(3) Calculating the distance between this descriptor and

all the map descriptors which we named imagedistance

(4) Retaining the most similar descriptor and studying ifit corresponds to one image that has been captured inthe surroundings of the test image capture point (zone1 2 or 3)

As a result the next data are retained for an individual testimage the image distance between the test image descriptorand the most similar map descriptor119863119905 and the localizationresults in zone 1 (correct or wrong match) 1198981199051 in zone 2119898t2 and in zone 3 1198981199053 After repeating this process with all

the test images the results will consist of four vectors whosedimension is equal to the number of test images The firstvector contains the distances119863119905 and the other three 12 and 3 contain respectively the information of corrector incorrect matches in zones 1 2 and 3

4 Evaluation Methods

In this work the localization results are expressed by meansof recall and precision curves [14] To build them the com-ponents of the vectors 1 2 and 3 are equally sorted inascending order of the distances that appear in the first vectorThe resulting sorted vectors of correct andwrongmatches arethen used to calculate the values of recall and precision Letus focus on the sorted vector of matches in zone 1 1199041 Firstfor each component in this vector the recall is calculated asthe number of correct matches obtained so far with respect to

0 02 04 06 08 10

02

04

06

08

1

Recall

Prec

ision

Descriptor 1Descriptor 2

Figure 3 Two sample recall-precision graphs obtained after carryingout the localization experiments with two different descriptionmethods

the total number of test images Second for each componentin the same vector the precision is obtained as the number ofcorrect matches obtained so far with respect to the numberof test images considered so far Then with the informationcontained in these vectors a precision versus recall curve isbuilt corresponding to the localization in zone 1 This isrepeated with the sorted vectors 1199042 and 1199043 to obtain thelocalization results in zone 2 and zone 3

In our experiments the most important piece of infor-mation of this type of graphs is the final point because itshows the global result of the experiment (final precision afterconsidering all the test images) However additional relevantinformation can be extracted from them because the graphalso shows the ability of the localization algorithm to findthe correct match while considering a specific image distancethreshold As explained in the previous paragraph the resultshave been arranged considering the ascending value of dis-tances Taking it into account as the recall increases thethreshold also does For this reason the evolution of therecall-precision curves contains information about the robust-ness of the algorithm with respect to a specific image distancethreshold If the precision values stay high independentlyof the recall this shows the presence of a lower number ofwrong results under this distance threshold Figure 3 showstwo sample recall-precision curves obtained after running thelocalization algorithm with all the test images and two dif-ferent description methods considering zone 1 Both curvesshow a similar final precision value between 06 and 065However the evolutions present a different behavior As anexample if we consider as threshold the distance associatedto recall = 025 according to the graph the precision ofdescriptor 1 is 100 but the precision of descriptor 2 is90Thismeans that considering the selected image distancethreshold 25 of correct localizations are achieved with100 of probability using descriptor 1 and with a 90 ofprobability using descriptor 2 This study can be carried outconsidering any value for the image distance threshold


Before running the algorithm, it is necessary to define the image distance. We use two different distance measures, depending on the kind of descriptor used.

First, in the case of the feature-based method (SURF-Harris), it is necessary to extract the interest points prior to describing the appearance of the image. We propose using the Harris corner detector [16] to extract visual landmarks from the panoramic images. After that, each interest point is described using standard SURF. To compare the test image with the map images, first we extract and describe the interest points from all the images. After that, a matching process is carried out with these points: the points detected in the test image, captured with a particular position and orientation of the vehicle, are searched in the map images. The performance of the matching method is beyond the scope of this paper; we only employ it as a tool. Once the matching process has been carried out, we evaluate the performance of the descriptor taking into account the number of matched points, so that we consider as closest image the one that presents more matching points with the test image. More concisely, we compute the distance between the test image $t$ and any other image of the map $j$ as

$$D_{tj}^{\text{fea}} = 1 - \frac{NM_{tj}}{\max_{j}\left(\overrightarrow{NM}_{t}\right)} \qquad (3)$$

where $NM_{tj}$ is the number of matches between the images $t$ and $j$, $\overrightarrow{NM}_{t} = [NM_{t1}, \ldots, NM_{t\,n_{\text{map}}}]$ is a vector that contains the number of matches between the image $t$ and every image of the map, and $n_{\text{map}}$ is the number of images in the map.
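A minimal sketch of this distance in Python with OpenCV follows. It assumes the SURF descriptors have already been extracted (in OpenCV, SURF lives in the nonfree xfeatures2d contrib module); the brute-force matcher and the Lowe ratio test used to accept a match are our assumptions, since the paper does not detail the matching criterion.

```python
import numpy as np
import cv2

def count_matches(desc_test, desc_map, ratio=0.8):
    """Brute-force matching between two sets of SURF descriptors.
    The ratio test used to accept a match is an assumption."""
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    pairs = matcher.knnMatch(desc_test, desc_map, k=2)
    return sum(1 for pair in pairs
               if len(pair) == 2 and pair[0].distance < ratio * pair[1].distance)

def feature_distance(test_desc, map_descs):
    """Eq. (3): D_fea(t, j) = 1 - NM_tj / max_j(NM_t), one value
    per map image (map_descs is a list of descriptor arrays)."""
    nm = np.array([count_matches(test_desc, d) for d in map_descs], dtype=float)
    top = nm.max()
    return 1.0 - nm / top if top > 0 else np.ones_like(nm)  # guard: no matches at all
```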

Second, in the case of appearance-based methods (2D DFT, FS, gist, and HOG), no local information needs to be extracted from the images. Instead, the appearance of the whole images is compared. This way, the global descriptor of the test image is calculated and the distances between it and the descriptors of the map images are obtained. The Euclidean distance is used in this case, defined as

$$D_{tj}^{E} = \sqrt{\sum_{m=1}^{M} \left(d_{t}^{m} - d_{j}^{m}\right)^{2}} \qquad (4)$$

where $\vec{d}_{t}$ is the descriptor of the test image $t$, $\vec{d}_{j}$ is the descriptor of the map image $j$, and $M$ is the size of the descriptors. This distance is normalized to obtain the final distance between the images $t$ and $j$ according to the next expression:

$$D_{tj}^{\text{app}} = \frac{D_{tj}^{E}}{\max_{j}\left(\vec{D}_{t}^{E}\right)} \qquad (5)$$

where $D_{tj}^{E}$ is the Euclidean distance between the descriptor of the test image $t$ and the map image $j$, $\vec{D}_{t}^{E} = [D_{t1}^{E}, \ldots, D_{t\,n_{\text{map}}}^{E}]$ is a vector that contains the Euclidean distance between the descriptor of the image $t$ and all the images in the map, and $n_{\text{map}}$ is the number of images in the map.
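Computed over the whole map at once, Eqs. (4) and (5) reduce to a few vectorized NumPy operations; a minimal sketch (array names are ours) follows.

```python
import numpy as np

def appearance_distance(d_test, map_descriptors):
    """Eqs. (4)-(5): normalized Euclidean distance between the global
    descriptor of the test image and the descriptor of every map image.

    d_test          : (M,) global descriptor of the test image.
    map_descriptors : (n_map, M) descriptors of the map images, stacked.
    """
    d_e = np.linalg.norm(map_descriptors - d_test, axis=1)  # Eq. (4), one value per map image
    return d_e / d_e.max()                                  # Eq. (5), normalized to [0, 1]

# The most similar map image is then simply the argmin of this vector:
# j_star = int(np.argmin(appearance_distance(d_test, map_descriptors)))
```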

It is important to note that the algorithm must be able to estimate the position of the robot with accuracy, but it is also important that the computational cost is adequate, to know whether it would be feasible to solve the problem in real time. To estimate the computational cost, we have measured, considering both maps in the experiments, the necessary time to calculate the descriptor of each test image, to compute the distance to the map descriptors, and to detect the most similar descriptor. We must take into account that the descriptors of all the map images can be computed prior to the localization, in an off-line process. Apart from the time, we have also estimated the amount of memory needed to store each image descriptor.
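A simple way to reproduce this kind of measurement (mirroring what Table 1 later reports, with the map descriptors precomputed off-line) is sketched below; `describe` stands for whichever description function is being evaluated and is an assumption of the sketch.

```python
import time
import numpy as np

def profile_localization(test_image, map_descriptors, describe):
    """Time the on-line stage of the localization for one test image:
    descriptor computation, then comparison against the whole map."""
    t0 = time.perf_counter()
    d_test = describe(test_image)                   # descriptor computation time
    t1 = time.perf_counter()
    d_e = np.linalg.norm(map_descriptors - d_test, axis=1)
    j_star = int(np.argmin(d_e / d_e.max()))        # most similar map image
    t2 = time.perf_counter()
    return {"descriptor_s": t1 - t0,
            "match_s": t2 - t1,
            "descriptor_bytes": np.asarray(d_test).nbytes,  # memory per descriptor
            "best_match": j_star}
```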

To finish, we also propose to study the relationship between the distance between two image descriptors and the geometric distance between the capture points of these two images. Ideally, the distance between the descriptors must increase as the geometric distance between capture points does (i.e., it must not present any local minima). This information is very interesting in applications such as map building, where the robot must be able to build a map using as input information only the distance between image descriptors. It is also important when it is necessary to estimate the position of the vehicle at halfway points within the grid map. Additionally, it may help to detect whether the problem of visual aliasing is present in the environment (i.e., two zones which are geometrically far apart may present a similar visual appearance, which might lead to errors in the mapping and localization process).
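The curve of descriptor distance against geometric distance can be obtained directly from the map data; the sketch below assumes the GPS capture positions have been converted to a local metric frame (in meters), which is our assumption.

```python
import numpy as np

def distance_profile(descriptors, positions, ref=0):
    """Descriptor distance versus geometric distance with respect to a
    reference image (as plotted later in Figure 7).

    descriptors : (n, M) one global descriptor per image.
    positions   : (n, 2) capture points in a local metric frame [m].
    ref         : index of the reference image.
    """
    d_desc = np.linalg.norm(descriptors - descriptors[ref], axis=1)
    d_desc = d_desc / d_desc.max()               # normalize as in Eq. (5)
    d_geo = np.linalg.norm(positions - positions[ref], axis=1)
    order = np.argsort(d_geo)                    # sort by geometric distance
    return d_geo[order], d_desc[order]           # ready to plot one curve
```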

5. Experimental Results

As stated in the previous section, with the purpose of establishing the capacity of each descriptor to correctly localize the robot (or vehicle), we have built recall-precision curves to reflect the results of each experiment. Figure 4 shows this graphical representation using (a) the first and (b) the second set of images (denoted as Sets 1 and 2 in previous sections). To build this figure, we consider the localization results in zone 1. This way, the figure shows the ability of the localization algorithm to correctly detect which image of the map was captured closest to the test image. This is the most restrictive case.

Apart from it, the performance of the localization algorithm in zones 2 and 3 has also been studied. This way, Figure 5 shows the results of the localization process in zone 2 using (a) Set 1 and (b) Set 2. Finally, Figure 6 shows the localization results in zone 3 using (a) Set 1 and (b) Set 2. This is the least restrictive case among the three studied.

In all cases, the results show that the SURF-Harris descriptor presents a relatively better performance than the other descriptors in terms of accuracy, using both image sets. As far as the methods based on global appearance are concerned, the good behavior of HOG can be highlighted: in the case of the localization in zone 2, it reaches 60% and 50% of precision in Sets 1 and 2, respectively. These results can be considered relatively good, taking into account the fact that the localization process is solved in an absolute way (i.e., we consider that no information about the previous position of the robot is available and the test image is compared with all the images stored in the data sets). In a real application, it



Figure 4: Results of the localization algorithm considering the correct matches in zone 1, using (a) Set 1 and (b) Set 2. The results of each description method are shown as different recall-precision curves.


Figure 5: Results of the localization algorithm considering the correct matches in zone 2, using (a) Set 1 and (b) Set 2. The results of each description method are shown as different recall-precision curves.

is usual to make use of some kind of probabilistic algorithm to estimate the position of the robot taking into account its previous estimated position. This is expected to provide a higher accuracy. We expect to develop this type of algorithms and tests in a future work.

Some additional conclusions can be reached by comparing the performance of the methods in open areas (Set 1) and urban areas (Set 2). In open areas, the performance of SURF-Harris, HOG, and gist is quite similar and relatively good in all cases, and the methods based on the Discrete Fourier Transform tend to present worse results. However, in the case of urban areas, SURF-Harris outperforms the other methods, and gist is the one that presents the worst results.

Apart from the localization accuracy, it is also important to study the computational cost of the process since, in a real application, it would have to run in real time as the robot navigates through the environment. This way, we have obtained, in all cases, the necessary time to calculate the descriptor of the test image, on the one hand, and to compare it with the descriptors stored in the map, to detect the most similar descriptor, and to analyze the results, on the other hand. The average computational time of the localization process,



Figure 6: Results of the localization algorithm considering the correct matches in zone 3, using (a) Set 1 and (b) Set 2. The results of each description method are shown as different recall-precision curves.

Table 1: Average computational cost of the description algorithms studied, per test image. For each description method and data set, the table shows, first, the necessary time to obtain the test image descriptor and, second, the time to compare it with the descriptors of the map and to obtain the final localization result.

                         2D Fourier   Fourier Signature   Gist       HOG        SURF-Harris
Data Set 1, Descriptor   0.0087 s     0.0080 s            0.4886 s   0.0608 s   0.5542 s
Data Set 1, Match        0.0015 s     0.0058 s            0.0006 s   0.0008 s   25.8085 s
Data Set 2, Descriptor   0.0085 s     0.0079 s            0.4828 s   0.0621 s   0.5389 s
Data Set 2, Match        0.0012 s     0.0047 s            0.0005 s   0.0006 s   19.3931 s

after considering all the test images, is shown in Table 1. To obtain the results of this table, the algorithms have been implemented using Matlab.

With respect to the computational cost, the methods based on the Fourier Transform are significantly faster than the rest, while SURF-Harris presents a considerably high computational cost. Regarding the necessary time to compare two descriptors, gist and HOG are the fastest methods. In the case of SURF-Harris, the brute-force matching method implemented results in a relatively high computational cost. This method has been chosen to make a homogeneous comparison with the other global-appearance methods. However, in a real implementation, a bag-of-words based approach [23] would improve the computational efficiency of the algorithm.
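As an illustration of that idea (in the spirit of [23], not the implementation evaluated in this paper), a bag-of-words scheme replaces pairwise descriptor matching with a comparison of compact word histograms. A minimal sketch using scikit-learn's KMeans follows; the vocabulary size is an arbitrary choice.

```python
import numpy as np
from sklearn.cluster import KMeans

def build_vocabulary(local_descriptor_sets, k=256):
    """Cluster all local (e.g., SURF) descriptors of the map images
    into k visual words; k = 256 is an arbitrary choice here."""
    return KMeans(n_clusters=k, n_init=4).fit(np.vstack(local_descriptor_sets))

def bow_histogram(vocabulary, image_descriptors):
    """Represent one image as a normalized histogram of visual words,
    so two images can be compared in O(k) instead of by brute force."""
    words = vocabulary.predict(image_descriptors)
    hist = np.bincount(words, minlength=vocabulary.n_clusters).astype(float)
    return hist / hist.sum()
```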

Finally, we have obtained the average memory size needed to store each descriptor. The results are shown in Table 2. Gist is the most compact descriptor (it is able to compress the information in each scene significantly), while SURF-Harris requires the most memory.

Considering these results jointly with the precision in localization, we could say that the SURF-Harris descriptor shows very good results in localization accuracy, but its computational cost makes it unfeasible for a real application. HOG, which is the second in terms of accuracy, also has a very good computational cost, so we consider it interesting to study this descriptor more thoroughly as future work and to implement more advanced versions of this method to try to optimize the accuracy. Likewise, other types of distances to compare images could also be studied, apart from the Euclidean distance. For the same reasons, we also consider it appropriate to examine the gist descriptor more thoroughly, as well as to use other methods to extract the gist of a scene apart from the orientation information (e.g., from the color information).

As a final experiment, we have studied the relationship between the distance between two image descriptors and the geometric distance between the capture points of these two images. As stated at the beginning of the section, this information is very interesting in some applications, such as the construction of maps from the images with geometric precision or the localization of the vehicle at halfway points of the grid of the map. It is important that the distance between descriptors grows as the geometric distance does. Figure 7 shows the results obtained using (a) Set 1 and (b) Set 2. To obtain these figures, an image has been set as a reference image and the distance between the reference image descriptor and the other descriptors has been calculated. The figure shows this distance versus the geometric distance between the capture points of each image and the capture point of the reference image. In both cases, this relationship is monotonically increasing up to a geometric


Table 2: Necessary memory to store each descriptor.

             2D Fourier    Fourier Signature   Gist         HOG          SURF-Harris
Descriptor   16384 bytes   32768 bytes         4096 bytes   8192 bytes   110400 bytes


Figure 7: Relationship between the distance between two image descriptors and the geometric distance between the capture points of these two images, using (a) Set 1 and (b) Set 2.

distance of approximately 100 meters. From this point, it tends to stabilize, with a relatively high variance. The exception is the local features descriptor, which stabilizes at the final value from a very small geometric distance. However, the appearance-based descriptors exhibit a more linear behavior around each image.

6. Conclusions and Future Works

In this paper, we have carried out a comparative evaluation of several scene description methods, considering the performance of these methods to accurately solve an absolute localization problem in a large, real outdoor environment. We evaluated two different approaches of visual descriptors: local features descriptors (SURF-Harris) and global-appearance descriptors (2D Discrete Fourier Transform, Fourier Signature, HOG, and gist).

All the tests have been carried out with images of Google Street View captured under realistic conditions. Two different areas of a city have been considered: an open area and a purely urban area with narrower streets. The capture points of each area present different topography: the first one is a grid map that covers several streets and avenues, and the second one is a linear map (i.e., the images were captured while the vehicle traversed a linear path on a narrow street).

Several studies have been performed. First, we have evaluated the accuracy of the localization process. To do this, recall and precision curves have been computed to compare the performance of each description method. We plot the recall and precision curves for both areas, taking into account different levels of accuracy to consider that the localization result is correct. In these experiments, the computational cost of the localization process has also been analyzed.

We have also studied each descriptor in terms of the behavior of the descriptor distance compared to the geometrical distance between image capture points. To do this, we plot a curve that represents the descriptor distance versus the geometrical distance between capture points. This measure is very useful for performing navigation tasks since, thanks to it, we can estimate the range of use of the descriptor.

It is noticeable that the SURF-Harris descriptor is the most suitable descriptor in terms of precision in localization, but it presents a smaller working zone in terms of Euclidean distance between descriptors. The HOG descriptor has shown a relatively good performance to solve the localization problem and presents a good response of the descriptor distance versus the geometrical distance between capture points. If we analyze jointly the results of both experiments and take into account the computational cost (Tables 1 and 2), we conclude that, although the SURF-Harris descriptor presents the best results in terms of recall and precision curves, it does not allow us to work in real time. Therefore, taking into account


that HOG is the descriptor that presents the second best results in terms of recall and precision curves and allows us to work in real time, we can conclude that HOG is the most suitable descriptor.

We plan to extend this work to (a) capture a real outdoor trajectory, traveling along several streets and capturing omnidirectional images using a catadioptric vision system, (b) combine the information provided by this vision system with the images of Google Street View, and (c) evaluate the performance of the best descriptors in a probabilistic localization process.

Competing Interests

The authors declare that they have no competing interests.

Acknowledgments

This work has been supported by the Spanish Government through the Project DPI 2013-41557-P, Navegacion de Robots en Entornos Dinamicos Mediante Mapas Compactos con Informacion Visual de Apariencia Global, and by the Generalitat Valenciana through the Projects AICO/2015/021, Localizacion y Creacion de Mapas Visuales para Navegacion de Robots con 6 GDL, and GV/2015/031, Creacion de Mapas Topologicos a Partir de la Apariencia Global de un Conjunto de Escenas.

References

[1] S. Thrun, D. Fox, W. Burgard, and F. Dellaert, "Robust Monte Carlo localization for mobile robots," Artificial Intelligence, vol. 128, no. 1-2, pp. 99–141, 2001.

[2] A. Gil, O. Reinoso, M. A. Vicente, C. Fernandez, and L. Paya, "Monte Carlo localization using SIFT features," in Pattern Recognition and Image Analysis, vol. 3522 of Lecture Notes in Computer Science, pp. 623–630, 2005.

[3] A. Murillo, J. Guerrero, and C. Sagues, "SURF features for efficient robot localization with omnidirectional images," in Proceedings of the IEEE International Conference on Robotics & Automation, San Diego, Calif, USA, 2007.

[4] H. Bay, T. Tuytelaars, and L. Van Gool, "SURF: speeded up robust features," in Proceedings of the 9th European Conference on Computer Vision, Graz, Austria, 2006.

[5] F. Rossi, A. Ranganathan, F. Dellaert, and E. Menegatti, "Toward topological localization with spherical Fourier transform and uncalibrated camera," in Proceedings of the International Conference on Simulation, Modeling and Programming for Autonomous Robots (SIMPAR '08), pp. 319–330, Venice, Italy, 2008.

[6] L. Paya, O. Reinoso, F. Amoros, L. Fernandez, and A. Gil, "Probabilistic map building, localization and navigation of a team of mobile robots: application to route following," in Multi-Robot Systems: Trends and Development, pp. 191–210, 2011.

[7] L. Fernandez, L. Paya, D. Valiente, A. Gil, and O. Reinoso, "Monte Carlo localization using the global appearance of omnidirectional images: algorithm optimization to large indoor environments," in Proceedings of the 9th International Conference on Informatics in Control, Automation and Robotics (ICINCO '12), pp. 439–442, Rome, Italy, July 2012.

[8] M. Montemerlo, FastSLAM: a factored solution to the simultaneous localization and mapping problem with unknown data association [Ph.D. thesis], Robotics Institute, Carnegie Mellon University, Pittsburgh, Pa, USA, 2003.

[9] C. Gamallo, P. Quintia, R. Iglesias-Rodriguez, J. V. Lorenzo, and C. V. Regueiro, "Combination of a low cost GPS with visual localization based on a previous map for outdoor navigation," in Proceedings of the 11th International Conference on Intelligent Systems Design and Applications (ISDA '11), pp. 1146–1151, Cordoba, Spain, November 2011.

[10] A. Torii, J. Sivic, and T. Pajdla, "Visual localization by linear combination of image descriptors," in Proceedings of the IEEE International Conference on Computer Vision Workshops (ICCV Workshops '11), pp. 102–109, Barcelona, Spain, November 2011.

[11] M. Aly and J.-Y. Bouguet, "Street view goes indoors: automatic pose estimation from uncalibrated unordered spherical panoramas," in Proceedings of the IEEE Workshop on the Applications of Computer Vision (WACV '12), pp. 1–8, Breckenridge, Colo, USA, January 2012.

[12] A. Taneja, L. Ballan, and M. Pollefeys, "Registration of spherical panoramic images with cadastral 3D models," in Proceedings of the International Conference on 3D Imaging, Modeling, Processing, Visualization and Transmission (3DIMPVT '12), pp. 479–486, Zurich, Switzerland, 2012.

[13] F. Amoros, L. Paya, O. Reinoso, and L. Jimenez, "Comparison of global-appearance techniques applied to visual map building and localization," in Proceedings of the International Conference on Computer Vision Theory and Applications, pp. 395–398, Rome, Italy, 2012.

[14] A. Gil, O. M. Mozos, M. Ballesta, and O. Reinoso, "A comparative evaluation of interest point detectors and local descriptors for visual SLAM," Machine Vision and Applications, vol. 21, no. 6, pp. 905–920, 2010.

[15] L. Paya, L. Fernandez, O. Reinoso, A. Gil, and D. Ubeda, "Appearance-based dense maps creation: comparison of compression techniques with panoramic images," in Proceedings of the International Conference on Informatics in Control, Automation and Robotics (INSTICC '09), pp. 238–246, Milan, Italy, 2009.

[16] C. Harris and M. Stephens, "A combined corner and edge detector," in Proceedings of the Alvey Vision Conference, pp. 231–236, Manchester, UK, 1988.

[17] E. Menegatti, T. Maeda, and H. Ishiguro, "Image-based memory for robot navigation using properties of omnidirectional images," Robotics and Autonomous Systems, vol. 47, no. 4, pp. 251–267, 2004.

[18] L. Paya, L. Fernandez, A. Gil, and O. Reinoso, "Map building and Monte Carlo localization using global appearance of omnidirectional images," Sensors, vol. 10, no. 12, pp. 11468–11497, 2010.

[19] A. Friedman, "Framing pictures: the role of knowledge in automatized encoding and memory for gist," Journal of Experimental Psychology: General, vol. 108, no. 3, pp. 316–355, 1979.

[20] A. Torralba, "Contextual priming for object detection," International Journal of Computer Vision, vol. 53, no. 2, pp. 169–191, 2003.

[21] N. Dalal and B. Triggs, "Histograms of oriented gradients for human detection," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '05), pp. 886–893, Washington, DC, USA, June 2005.

[22] F. Amoros, L. Paya, O. Reinoso, L. Fernandez, and J. Marin, "Visual map building and localization with an appearance-based approach: comparisons of techniques to extract information of panoramic images," in Proceedings of the 7th International Conference on Informatics in Control, Automation and Robotics, pp. 423–426, 2010.

[23] D. Filliat, "A visual bag of words method for interactive qualitative localization and mapping," in Proceedings of the IEEE International Conference on Robotics and Automation (ICRA '07), pp. 3921–3926, Roma, Italy, April 2007.


Page 7: Research Article A Study of Visual Descriptors for Outdoor ...downloads.hindawi.com/journals/js/2016/1537891.pdf · Research Article A Study of Visual Descriptors for Outdoor Navigation

Journal of Sensors 7

Before running the algorithm it is necessary to definethe image distance We use two different distance measuresdepending on the kind of descriptor used

First in the case of the feature-based method (SURF-Harris) it is necessary to extract the interest points prior todescribing the appearance of the image We propose usingthe Harris corner detector [16] to extract visual landmarksfrom the panoramic images After that each interest point isdescribed using standard SURF To compare the test imagewith themap images first we extract and describe the interestpoints from all the images After that a matching process iscarried out with these points The points detected in the testimage captured with a particular position and orientation ofthe vehicle are searched in themap imagesThe performanceof the matching method is not the scope of this paper weonly employ it as a tool Once the matching process has beencarried out we evaluate the performance of the descriptortaking into account the number of matched points so thatwe will consider as closest image the one that presents morematching points with the test image More concisely wecompute the distance between the test image 119905 and any otherimage of the map 119895 as

119863119905119895fea = 1 minus( 119873119872119905119895max119895 (997888997888997888997888rarr119873119872119905)) (3)

where119873119872119905119895 is the number of matches between the images 119905and 119895 997888997888997888997888rarr119873119872119905 = [1198731198721199051 119873119872119905119899map] is a vector that containsthe number of matches between the image 119905 and every imageof the map and 119899map is the number of images in the map

Second in the case of appearance-based methods (2DDFT FS gist and HOG) no local information needs to beextracted from the images Instead the appearance of thewhole images is compared This way the global descriptor ofthe test image is calculated and the distances between it andthe descriptors of themap images are obtainedTheEuclideandistance is used in this case defined as

119863119905119895119864 = radic 119872sum119898=1

(119898119905 minus 119898119895 )2 (4)

where 119905 is the descriptor of the test image 119905 119895 is thedescriptor of the map image 119895 and 119872 is the size of thedescriptors This distance is normalized to obtain the finaldistance between the images 119905 and 119895 according to the nextexpression

119863119905119895app = 119863119905119895119864max119895 (119905119864) (5)

where119863119905119895119864 is the Euclidean distance between the descriptor ofthe test image 119905 and the map image 119895 119905119864 = [1198631199051119864 119863119905119899map

119864 ]is a vector that contains the Euclidean distance between thedescriptor of the image 119905 and all the images in the map and119899map is the number of images in the map

It is important to note that the algorithm must be ableto estimate the position of the robot with accuracy but it isalso important that the computational cost is adequate toknowwhether it would be feasible to solve the problem in realtime To estimate the computational cost we have computedconsidering bothmaps in the experiments the necessary timeto calculate the descriptor of each test image to compute thedistance to themap descriptors and to detect themost similardescriptor We must take into account that the descriptors ofall the map images can be computed prior to the localizationin an off-line process Apart from the time we have alsoestimated the amount of memory needed to store each imagedescriptor

To finish we also propose to study the relationship dis-tance between two image descriptors versus geometric distancebetween the capture points of these two images Ideally the dis-tance between the descriptors must increase as the geometricdistance between capture points does (ie it must not presentany local minima) This information is very interesting inapplications such as map building where the robot mustbe able to build a map using as input information only thedistance between image descriptors It is also important whenit is necessary to estimate the position of the vehicle at halfwaypoints within the gridmap Additionally it may help to detectif the problem of visual aliasing is present in the environment(ie two zones which are geometrically far may present asimilar visual appearance which might lead to errors in themapping and localization process)

5 Experimental Results

As stated in the previous section with the purpose of estab-lishing the capacity of each descriptor to correctly localizethe robot (or vehicle) we have built recall-precision curvesto reflect the results of each experiment Figure 4 shows thisgraphical representation using (a) the first and (b) the secondset of images (denoted as Sets 1 and 2 in previous sections) Tobuild this figure we consider the localization results in zone 1This way the figure shows the ability of the localization algo-rithm to correctly detect which image of the map was cap-tured closer to the test imageThis is themost restrictive case

Apart from it the performance of the localization algo-rithm in zones 2 and 3 has also been studied This wayFigure 5 shows the results of the localization process in zone2 using (a) Set 1 and (b) Set 2 Finally Figure 6 shows thelocalization results in zone 3 using (a) Set 1 and (b) Set 2Thisis the least restrictive case among the three studied

In all cases the results show that the SURF-Harris des-criptor presents a relatively better performance comparing tothe other descriptors in terms of accuracy and using bothimage sets As far as the methods based on global appearanceare concerned the good behavior ofHOG can be highlightedIn the case of the localization in zone 2 it reaches 60and 50of precision in Sets 1 and 2 respectively These results can beconsidered relatively good taking into account the fact thatthe localization process is solved in an absolute way (ie weconsider that no information about the previous position ofthe robot is available and the test image is compared with allthe images stored in the data sets) In a real application it

8 Journal of Sensors

0 01 02 03 04 050

02

04

06

08

1

Recall

Prec

ision

2D FourierFourier SignatureHOG

GistSURF-Harris

(a)

0 01 02 03 04 05 060

02

04

06

08

1

Recall

Prec

ision

2D FourierFourier SignatureHOG

GistSURF-Harris

(b)

Figure 4 Results of the localization algorithm considering the correct matches in zone 1 using (a) Set 1 and (b) Set 2 The results of eachdescription method are shown as different recall-precision curves

0 01 02 03 04 05 06 070

02

04

06

08

1

Recall

Prec

ision

2D FourierFourier SignatureHOG

GistSURF-Harris

(a)

0 02 04 06 080

02

04

06

08

1

Recall

Prec

ision

2D FourierFourier SignatureHOG

GistSURF-Harris

(b)

Figure 5 Results of the localization algorithm considering the correct matches in zone 2 using (a) Set 1 and (b) Set 2 The results of eachdescription method are shown as different recall-precision curves

is usual to make use of any kind of probabilistic algorithmto estimate the position of the robot taking into account itsprevious estimated position This is expected to provide ahigher accuracyWe expect to develop this type of algorithmsand tests in a future work

Some additional conclusions can be reached by compar-ing the performance of the methods in open areas (Set 1) andurban areas (Set 2) In open areas the performance of SURF-Harris HOG and gist is quite similar and relatively goodin all cases and the methods based on the Discrete FourierTransform tend to present worse results However in the case

of urban areas SURF-Harris outperforms the other methodsand gist is the one that presents the worst results

Apart from the localization accuracy it is also importantto study the computational cost of the process since in a realapplication it would have to run in real time as the robotis navigating through the environment This way we haveobtained in all cases the necessary time to calculate thedescriptor of the test image on the one hand and to compare itwith the descriptors stored in themap to detect themost sim-ilar descriptor and to analyze the results on the other handThe average computational time of the localization process

Journal of Sensors 9

0 02 04 06 080

02

04

06

08

1

Recall

Prec

ision

2D FourierFourier SignatureHOG

GistSURF-Harris

(a)

0 02 04 06 080

02

04

06

08

1

Recall

Prec

ision

2D FourierFourier SignatureHOG

GistSURF-Harris

(b)

Figure 6 Results of the localization algorithm considering the correct matches in zone 3 using (a) Set 1 and (b) Set 2 The results of eachdescription method are shown as different recall-precision curves

Table 1 Average computational cost of the description algorithms studied per test image For each descriptionmethod and data set the tableshows first the necessary time to obtain the test image descriptor and second the time to compare it with the descriptors of the map and toobtain the final localization result

2D Fourier Fourier Signature Gist HOG SURF-HarrisData Set 1 Descriptor 00087 s 00080 s 04886 s 00608 s 05542 sData Set 1 Match 00015 s 00058 s 00006 s 00008 s 258085 sData Set 2 Descriptor 00085 s 00079 s 04828 s 00621 s 05389 sData Set 2 Match 00012 s 00047 s 00005 s 00006 s 193931 s

after considering all the test images is shown in Table 1 Toobtain the results of this table the algorithms have beenimplemented using Matlab

With respect to the computational cost the methodsbased on the Fourier Transform are significantly faster thanthe rest while SURF-Harris presents a considerably highcomputational cost About the necessary time to compare twodescriptors gist andHOG are the fastest methods In the caseof SURF-Harris the brute force match method implementedresults in a relatively high computational cost This methodhas been chosen to make a homogeneous comparison withthe other global-appearance methods However in a realimplementation a bag-of-words based approach [23] wouldimprove the computational efficiency of the algorithm

At last we have obtained the averagememory size neededto store each descriptorThe results are shown in Table 2Gistis the most compact descriptor (it is able to compress theinformation in each scene significantly) while SURF-Harrisneeds more memory size

Considering these results jointly with the precision inlocalization we could say that the SURF-Harris descriptorshows very good results in location accuracy but its compu-tational cost makes it unfeasible to solve a real applicationHOG which is the second in terms of accuracy also has avery good computational cost so we consider it interesting to

study more thoroughly this descriptor as future work and toimplement more advanced versions of this method to try tooptimize the accuracy Likewise other types of distances tocompare images could also be studied apart from the Euc-lidean distance For the same reasons we also consider itappropriate to examine more thoroughly the gist descriptorsas well as using other methods to extract the gist of a sceneapart from the orientation information (eg from the colorinformation)

As a final experiment we have studied the relationshipdistance between two image descriptors versus geometric dista-nce between the capture points of these two images As statedat the beginning of the section this information is very inter-esting in some applications such as the construction of mapsfrom the images with geometric precision or the localizationof the vehicle at halfway points of the grid of the map It isimportant that the distance between descriptors grows as thegeometric distance does Figure 7 shows the results obtainedusing (a) Set 1 and (b) Set 2 To obtain these figures an imagehas been set as a reference image and the distance betweenthe reference image descriptor and the other descriptors hasbeen calculatedThefigure shows this distance versus the geo-metric distance between the capture points of each image andthe capture point of the reference image In both cases thisrelationship is monotonically increasing up to a geometric

10 Journal of Sensors

Table 2 Necessary memory to store each descriptor

2D Fourier Fourier Signature Gist HOG SURF-HarrisDescriptor 16384 bytes 32768 bytes 4096 bytes 8192 bytes 110400 bytes

0 200 400 600 8000

02

04

06

08

1

Des

crip

tor d

istan

ce

2D Fourier Fourier SignatureHOG

GistSURF-Harris

Geometrical distance (m)

12

(a)

Des

crip

tor d

istan

ce0 200 400 600

0

02

04

06

08

1

12

2D FourierFourier SignatureHOG

GistSURF-Harris

Geometrical distance (m)

(b)

Figure 7 Relationship distance between two image descriptors versus geometric distance between the capture points of these two images using(a) Set 1 and (b) Set 2

distance of approximately 100meters From this point it tendsto stabilize with a relatively high variance The exceptionis the local features descriptor which stabilizes at the finalvalue from a very small geometric distance However theappearance-based descriptors exhibit a more linear behavioraround each image

6 Conclusions and Future Works

In this paper we have carried out a comparative evaluationof several description methods of scenes considering theperformance of these methods to accurately solve an absolutelocalization problem in a large real outdoor environmentWeevaluated two different approaches of visual descriptors localfeatures descriptors (SURF-Harris) and global-appearancedescriptors (2D Discrete Fourier Transform Fourier Signa-ture HOG and gist)

All the tests have been carried out with images ofGoogle Street View captured under realistic conditions Twodifferent areas of a city have been considered an open areaand a purely urban area with narrower streets The capturepoints of each area present different topographyThe first oneis a grid map that covers several streets and avenues and thesecond one a linear map (ie the images were captured whenthe mobile traversed a linear path on a narrow street)

Some different studies have been performed First wehave evaluated the accuracy of the localization process To

do this recall and precision curves have been computed tocompare the performance of each description method Weplot the recall and precision curves for both areas takinginto account different levels of accuracy to consider thatthe localization result is correct In these experiments thecomputational cost of the localization process has also beenanalyzed

We have also studied each descriptor in terms of behaviorof the descriptor distance comparing to geometrical distancebetween image capture points To do this we plot a curvethat represents the descriptor distance versus the geometricaldistance between capture points This measure is very usefulfor performing navigation tasks since thanks to it we canestimate the range of use of the descriptor

It is noticeable that the SURF-Harris descriptor is themost suitable descriptor in terms of precision in localizationbut it presents a smaller zone of work in terms of Euclideandistance between descriptorsTheHOGdescriptor has showna relatively good performance to solve the localization prob-lem and presents a good response of the descriptors distanceversus geometrical distance between capture points If weanalyze jointly the results of both experiments and take intoaccount the computational cost (Tables 1 and 2) we concludethat although the SURF-Harris descriptor presents the bestresults in terms of recall and precision curves it does notallow us to work in real time Therefore taking into account

Journal of Sensors 11

that HOG is the descriptor that presents the second bestresults in terms of recall and precision curves and allows us towork in real time we can conclude that the HOG is the mostsuitable descriptor

We plan to extend this work to (a) capture a real out-door trajectory traveling along several streets and capturingomnidirectional images using a catadioptric vision system(b) combine the information provided by this vision systemand the images of the Google Street View and (c) evaluatethe performance of the best descriptors in a probabilisticlocalization process

Competing Interests

The authors declare that they have no competing interests

Acknowledgments

This work has been supported by the Spanish Governmentthrough the Project DPI 2013-41557-P Navegacion de Robotsen Entornos DinamicosMedianteMapas Compactos con Infor-macion Visual de Apariencia Global and by the GeneralitatValenciana through the ProjectsAICO2015021 Localizaciony Creacion de Mapas Visuales para Navegacion de Robots con6 GDL and GV2015031 Creacion de Mapas Topologicos aPartir de la Apariencia Global de un Conjunto de Escenas

References

[1] S Thrun D Fox W Burgard and F Dellaert ldquoRobust MonteCarlo localization for mobile robotsrdquo Artificial Intelligence vol128 no 1-2 pp 99ndash141 2001

[2] A Gil O Reinoso M A Vicente C Fernandez and LPaya ldquoMonte carlo localization using SIFT featuresrdquo in PatternRecognition and Image Analysis vol 3522 of Lecture Notes inComputer Science pp 623ndash630 2005

[3] AMurillo J Guerrero andC Sagues ldquoSurf features for efficientrobot localization with omnidirectional imagesrdquo in Proceedingsof the IEEE International Conference on Robotics amp AutomationSan Diego Calif USA 2007

[4] H Bay T Tuytelaars and L Van Gool ldquoSurf speeded up robustfeaturesrdquo in Proceedings of the 9th European Conference onComputer Vision Graz Austria 2006

[5] F Rossi A Ranganathan F Dellaert and EMenegatti ldquoTowardtopological localization with spherical fourier transform anduncalibrated camerardquo in Proceedings of the International Con-ference on Simulation Modeling and Programming for Auto-nomous Robots (SIMPAR rsquo08) pp 319ndash330 Venice Italy 2008

[6] L Paya O Reinoso F Amoros L Fernandez and A Gil ldquoPro-babilistic map building localization and navigation of a team ofmobile robots application to route followingrdquo in Multi-RobotSystems Trends and Development pp 191ndash210 2011

[7] L Fernandez L Paya D Valiente A Gil and O ReinosoldquoMonte Carlo localization using the global appearance of omni-directional images algorithmoptimization to large indoor envi-ronmentsrdquo in Proceedings of the 9th International Conference onInformatics in Control Automation and Robotics (ICINCO rsquo12)pp 439ndash442 Rome Italy July 2012

[8] M Montemerlo FastSLAM a factored solution to the simul-taneous localization and mapping problem with unknown dataassociation [PhD thesis] Robotics Institute Carnegie MellonUniversity Pittsburgh Pa USA 2003

[9] C Gamallo P Quintıa R Iglesias-Rodrıguez J V Lorenzo andC V Regueiro ldquoCombination of a low cost GPS with visuallocalization based on a previous map for outdoor navigationrdquoin Proceedings of the 11th International Conference on IntelligentSystems Design and Applications (ISDA rsquo11) pp 1146ndash1151Cordoba Spain November 2011

[10] A Torii J Sivic and T Pajdla ldquoVisual localization by linearcombination of image descriptorsrdquo in Proceedings of the IEEEInternational Conference on Computer VisionWorkshops (ICCVWorkshops rsquo11) pp 102ndash109 Barcelona Spain November 2011

[11] M Aly and J-Y Bouguet ldquoStreet view goes indoors automaticpose estimation fromuncalibrated unordered spherical panora-masrdquo in Proceedings of the IEEEWorkshop on the Applications ofComputer Vision (WACV rsquo12) pp 1ndash8 Breckenridge Colo USAJanuary 2012

[12] A Taneja L Ballan andM Pollefeys ldquoRegistration of sphericalpanoramic images with cadastral 3d modelsrdquo in Proceedings ofthe International Conference on 3D Imaging Modeling Process-ing Visualization and Transmission (3DIMPVT rsquo12) pp 479ndash486 Zurich Switzerland 2012

[13] F Amoros L Paya O Reinoso and L Jimenez ldquoComparisonof global-appearance techniques applied to visual map buildingand localizationrdquo in Proceedings of the International Conferenceon Computer Vision Theory and Applications pp 395ndash398Rome Italy 2012

[14] A Gil O M Mozos M Ballesta and O Reinoso ldquoA compara-tive evaluation of interest point detectors and local descriptorsfor visual SLAMrdquo Machine Vision and Applications vol 21 no6 pp 905ndash920 2010

[15] L Paya L FernandezO ReinosoAGil andDUbeda ldquoAppea-rance-based dense maps creation Comparison of compressiontechniques with panoramic imagesrdquo in Proceedings of the Inter-national Conference on Informatics in Control Automation andRobotics (INSTICC rsquo09) pp 238ndash246 Milan Italy 2009

[16] C Harris and M Stephens ldquoA combined corner and edgedetectorrdquo in Proceedings of the Alvey Vision Conference pp 231ndash236 Manchester UK 1988

[17] E Menegatti T Maeda and H Ishiguro ldquoImage-based mem-ory for robot navigation using properties of omnidirectionalimagesrdquo Robotics and Autonomous Systems vol 47 no 4 pp251ndash267 2004

[18] L Paya L Fernandez A Gil and O Reinoso ldquoMap build-ing and Monte Carlo localization using global appearance ofomnidirectional imagesrdquo Sensors vol 10 no 12 pp 11468ndash114972010

[19] A Friedman ldquoFraming pictures the role of knowledge in auto-matized encoding andmemory for gistrdquo Journal of ExperimentalPsychology General vol 108 no 3 pp 316ndash355 1979

[20] A Torralba ldquoContextual priming for object detectionrdquo Interna-tional Journal of Computer Vision vol 53 no 2 pp 169ndash1912003

[21] N Dalal and B Triggs ldquoHistograms of oriented gradients forhuman detectionrdquo in Proceedings of the IEEE Computer SocietyConference on Computer Vision and Pattern Recognition (CVPRrsquo05) pp 886ndash893 Washington DC USA June 2005

12 Journal of Sensors

[22] F Amoros L Paya O Reinoso L Fernandez and J MarınldquoVisual map building and localization with an appearance-based approachmdashcomparisons of techniques to extract infor-mation of panoramic imagesrdquo in Proceedings of the 7th Inter-national Conference on Informatics in Control Automation andRobotics pp 423ndash426 2010

[23] D Filliat ldquoA visual bag of words method for interactive quali-tative localization and mappingrdquo in Proceedings of the IEEEInternational Conference on Robotics and Automation (ICRArsquo07) pp 3921ndash3926 Roma Italy April 2007

International Journal of

AerospaceEngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Active and Passive Electronic Components

Control Scienceand Engineering

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

RotatingMachinery

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

Journal ofEngineeringVolume 2014

Submit your manuscripts athttpwwwhindawicom

VLSI Design

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Shock and Vibration

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawi Publishing Corporation httpwwwhindawicom

Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

SensorsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Navigation and Observation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

DistributedSensor Networks

International Journal of

Page 8: Research Article A Study of Visual Descriptors for Outdoor ...downloads.hindawi.com/journals/js/2016/1537891.pdf · Research Article A Study of Visual Descriptors for Outdoor Navigation

8 Journal of Sensors

0 01 02 03 04 050

02

04

06

08

1

Recall

Prec

ision

2D FourierFourier SignatureHOG

GistSURF-Harris

(a)

0 01 02 03 04 05 060

02

04

06

08

1

Recall

Prec

ision

2D FourierFourier SignatureHOG

GistSURF-Harris

(b)

Figure 4 Results of the localization algorithm considering the correct matches in zone 1 using (a) Set 1 and (b) Set 2 The results of eachdescription method are shown as different recall-precision curves

0 01 02 03 04 05 06 070

02

04

06

08

1

Recall

Prec

ision

2D FourierFourier SignatureHOG

GistSURF-Harris

(a)

0 02 04 06 080

02

04

06

08

1

Recall

Prec

ision

2D FourierFourier SignatureHOG

GistSURF-Harris

(b)

Figure 5 Results of the localization algorithm considering the correct matches in zone 2 using (a) Set 1 and (b) Set 2 The results of eachdescription method are shown as different recall-precision curves

is usual to make use of any kind of probabilistic algorithmto estimate the position of the robot taking into account itsprevious estimated position This is expected to provide ahigher accuracyWe expect to develop this type of algorithmsand tests in a future work

Some additional conclusions can be reached by compar-ing the performance of the methods in open areas (Set 1) andurban areas (Set 2) In open areas the performance of SURF-Harris HOG and gist is quite similar and relatively goodin all cases and the methods based on the Discrete FourierTransform tend to present worse results However in the case

of urban areas SURF-Harris outperforms the other methodsand gist is the one that presents the worst results

Apart from the localization accuracy it is also importantto study the computational cost of the process since in a realapplication it would have to run in real time as the robotis navigating through the environment This way we haveobtained in all cases the necessary time to calculate thedescriptor of the test image on the one hand and to compare itwith the descriptors stored in themap to detect themost sim-ilar descriptor and to analyze the results on the other handThe average computational time of the localization process

Journal of Sensors 9

0 02 04 06 080

02

04

06

08

1

Recall

Prec

ision

2D FourierFourier SignatureHOG

GistSURF-Harris

(a)

0 02 04 06 080

02

04

06

08

1

Recall

Prec

ision

2D FourierFourier SignatureHOG

GistSURF-Harris

(b)

Figure 6 Results of the localization algorithm considering the correct matches in zone 3 using (a) Set 1 and (b) Set 2 The results of eachdescription method are shown as different recall-precision curves

Table 1 Average computational cost of the description algorithms studied per test image For each descriptionmethod and data set the tableshows first the necessary time to obtain the test image descriptor and second the time to compare it with the descriptors of the map and toobtain the final localization result

2D Fourier Fourier Signature Gist HOG SURF-HarrisData Set 1 Descriptor 00087 s 00080 s 04886 s 00608 s 05542 sData Set 1 Match 00015 s 00058 s 00006 s 00008 s 258085 sData Set 2 Descriptor 00085 s 00079 s 04828 s 00621 s 05389 sData Set 2 Match 00012 s 00047 s 00005 s 00006 s 193931 s

after considering all the test images is shown in Table 1 Toobtain the results of this table the algorithms have beenimplemented using Matlab

With respect to the computational cost the methodsbased on the Fourier Transform are significantly faster thanthe rest while SURF-Harris presents a considerably highcomputational cost About the necessary time to compare twodescriptors gist andHOG are the fastest methods In the caseof SURF-Harris the brute force match method implementedresults in a relatively high computational cost This methodhas been chosen to make a homogeneous comparison withthe other global-appearance methods However in a realimplementation a bag-of-words based approach [23] wouldimprove the computational efficiency of the algorithm

At last we have obtained the averagememory size neededto store each descriptorThe results are shown in Table 2Gistis the most compact descriptor (it is able to compress theinformation in each scene significantly) while SURF-Harrisneeds more memory size

Considering these results jointly with the precision inlocalization we could say that the SURF-Harris descriptorshows very good results in location accuracy but its compu-tational cost makes it unfeasible to solve a real applicationHOG which is the second in terms of accuracy also has avery good computational cost so we consider it interesting to

study more thoroughly this descriptor as future work and toimplement more advanced versions of this method to try tooptimize the accuracy Likewise other types of distances tocompare images could also be studied apart from the Euc-lidean distance For the same reasons we also consider itappropriate to examine more thoroughly the gist descriptorsas well as using other methods to extract the gist of a sceneapart from the orientation information (eg from the colorinformation)

As a final experiment we have studied the relationshipdistance between two image descriptors versus geometric dista-nce between the capture points of these two images As statedat the beginning of the section this information is very inter-esting in some applications such as the construction of mapsfrom the images with geometric precision or the localizationof the vehicle at halfway points of the grid of the map It isimportant that the distance between descriptors grows as thegeometric distance does Figure 7 shows the results obtainedusing (a) Set 1 and (b) Set 2 To obtain these figures an imagehas been set as a reference image and the distance betweenthe reference image descriptor and the other descriptors hasbeen calculatedThefigure shows this distance versus the geo-metric distance between the capture points of each image andthe capture point of the reference image In both cases thisrelationship is monotonically increasing up to a geometric

10 Journal of Sensors

Table 2 Necessary memory to store each descriptor

2D Fourier Fourier Signature Gist HOG SURF-HarrisDescriptor 16384 bytes 32768 bytes 4096 bytes 8192 bytes 110400 bytes

0 200 400 600 8000

02

04

06

08

1

Des

crip

tor d

istan

ce

2D Fourier Fourier SignatureHOG

GistSURF-Harris

Geometrical distance (m)

12

(a)

Des

crip

tor d

istan

ce0 200 400 600

0

02

04

06

08

1

12

2D FourierFourier SignatureHOG

GistSURF-Harris

Geometrical distance (m)

(b)

Figure 7 Relationship distance between two image descriptors versus geometric distance between the capture points of these two images using(a) Set 1 and (b) Set 2

distance of approximately 100meters From this point it tendsto stabilize with a relatively high variance The exceptionis the local features descriptor which stabilizes at the finalvalue from a very small geometric distance However theappearance-based descriptors exhibit a more linear behavioraround each image

6 Conclusions and Future Works

In this paper we have carried out a comparative evaluationof several description methods of scenes considering theperformance of these methods to accurately solve an absolutelocalization problem in a large real outdoor environmentWeevaluated two different approaches of visual descriptors localfeatures descriptors (SURF-Harris) and global-appearancedescriptors (2D Discrete Fourier Transform Fourier Signa-ture HOG and gist)

All the tests have been carried out with images ofGoogle Street View captured under realistic conditions Twodifferent areas of a city have been considered an open areaand a purely urban area with narrower streets The capturepoints of each area present different topographyThe first oneis a grid map that covers several streets and avenues and thesecond one a linear map (ie the images were captured whenthe mobile traversed a linear path on a narrow street)

Some different studies have been performed First wehave evaluated the accuracy of the localization process To

do this recall and precision curves have been computed tocompare the performance of each description method Weplot the recall and precision curves for both areas takinginto account different levels of accuracy to consider thatthe localization result is correct In these experiments thecomputational cost of the localization process has also beenanalyzed

We have also studied the behavior of each descriptor in terms of the descriptor distance as a function of the geometrical distance between the image capture points. To do this, we plotted a curve that represents the descriptor distance versus the geometrical distance between capture points. This measure is very useful for navigation tasks, since it allows us to estimate the range of use of the descriptor.

It is noticeable that the SURF-Harris descriptor is the most suitable in terms of localization precision, but it presents a smaller working zone in terms of the Euclidean distance between descriptors. The HOG descriptor has shown a relatively good performance when solving the localization problem and presents a good response of the descriptor distance versus the geometrical distance between capture points. If we analyze the results of both experiments jointly and take into account the computational cost (Tables 1 and 2), we conclude that, although the SURF-Harris descriptor presents the best results in terms of recall-precision curves, it does not allow us to work in real time. Therefore, taking into account that HOG presents the second best results in terms of recall-precision curves and does allow us to work in real time, we can conclude that HOG is the most suitable descriptor.

We plan to extend this work to (a) capture a real outdoor trajectory, traveling along several streets and capturing omnidirectional images with a catadioptric vision system; (b) combine the information provided by this vision system with the Google Street View images; and (c) evaluate the performance of the best descriptors in a probabilistic localization process.

Competing Interests

The authors declare that they have no competing interests.

Acknowledgments

This work has been supported by the Spanish Government through the project DPI 2013-41557-P, Navegación de Robots en Entornos Dinámicos Mediante Mapas Compactos con Información Visual de Apariencia Global, and by the Generalitat Valenciana through the projects AICO/2015/021, Localización y Creación de Mapas Visuales para Navegación de Robots con 6 GDL, and GV/2015/031, Creación de Mapas Topológicos a Partir de la Apariencia Global de un Conjunto de Escenas.

References

[1] S. Thrun, D. Fox, W. Burgard, and F. Dellaert, "Robust Monte Carlo localization for mobile robots," Artificial Intelligence, vol. 128, no. 1-2, pp. 99–141, 2001.

[2] A. Gil, O. Reinoso, M. A. Vicente, C. Fernandez, and L. Paya, "Monte Carlo localization using SIFT features," in Pattern Recognition and Image Analysis, vol. 3522 of Lecture Notes in Computer Science, pp. 623–630, 2005.

[3] A. Murillo, J. Guerrero, and C. Sagues, "SURF features for efficient robot localization with omnidirectional images," in Proceedings of the IEEE International Conference on Robotics & Automation, San Diego, Calif, USA, 2007.

[4] H. Bay, T. Tuytelaars, and L. Van Gool, "SURF: speeded up robust features," in Proceedings of the 9th European Conference on Computer Vision, Graz, Austria, 2006.

[5] F. Rossi, A. Ranganathan, F. Dellaert, and E. Menegatti, "Toward topological localization with spherical Fourier transform and uncalibrated camera," in Proceedings of the International Conference on Simulation, Modeling and Programming for Autonomous Robots (SIMPAR '08), pp. 319–330, Venice, Italy, 2008.

[6] L. Paya, O. Reinoso, F. Amoros, L. Fernandez, and A. Gil, "Probabilistic map building, localization and navigation of a team of mobile robots: application to route following," in Multi-Robot Systems: Trends and Development, pp. 191–210, 2011.

[7] L. Fernandez, L. Paya, D. Valiente, A. Gil, and O. Reinoso, "Monte Carlo localization using the global appearance of omnidirectional images: algorithm optimization to large indoor environments," in Proceedings of the 9th International Conference on Informatics in Control, Automation and Robotics (ICINCO '12), pp. 439–442, Rome, Italy, July 2012.

[8] M. Montemerlo, FastSLAM: a factored solution to the simultaneous localization and mapping problem with unknown data association [Ph.D. thesis], Robotics Institute, Carnegie Mellon University, Pittsburgh, Pa, USA, 2003.

[9] C. Gamallo, P. Quintía, R. Iglesias-Rodríguez, J. V. Lorenzo, and C. V. Regueiro, "Combination of a low cost GPS with visual localization based on a previous map for outdoor navigation," in Proceedings of the 11th International Conference on Intelligent Systems Design and Applications (ISDA '11), pp. 1146–1151, Cordoba, Spain, November 2011.

[10] A. Torii, J. Sivic, and T. Pajdla, "Visual localization by linear combination of image descriptors," in Proceedings of the IEEE International Conference on Computer Vision Workshops (ICCV Workshops '11), pp. 102–109, Barcelona, Spain, November 2011.

[11] M. Aly and J.-Y. Bouguet, "Street view goes indoors: automatic pose estimation from uncalibrated unordered spherical panoramas," in Proceedings of the IEEE Workshop on the Applications of Computer Vision (WACV '12), pp. 1–8, Breckenridge, Colo, USA, January 2012.

[12] A. Taneja, L. Ballan, and M. Pollefeys, "Registration of spherical panoramic images with cadastral 3D models," in Proceedings of the International Conference on 3D Imaging, Modeling, Processing, Visualization and Transmission (3DIMPVT '12), pp. 479–486, Zurich, Switzerland, 2012.

[13] F. Amoros, L. Paya, O. Reinoso, and L. Jimenez, "Comparison of global-appearance techniques applied to visual map building and localization," in Proceedings of the International Conference on Computer Vision Theory and Applications, pp. 395–398, Rome, Italy, 2012.

[14] A. Gil, O. M. Mozos, M. Ballesta, and O. Reinoso, "A comparative evaluation of interest point detectors and local descriptors for visual SLAM," Machine Vision and Applications, vol. 21, no. 6, pp. 905–920, 2010.

[15] L. Paya, L. Fernandez, O. Reinoso, A. Gil, and D. Ubeda, "Appearance-based dense maps creation: comparison of compression techniques with panoramic images," in Proceedings of the International Conference on Informatics in Control, Automation and Robotics (INSTICC '09), pp. 238–246, Milan, Italy, 2009.

[16] C. Harris and M. Stephens, "A combined corner and edge detector," in Proceedings of the Alvey Vision Conference, pp. 231–236, Manchester, UK, 1988.

[17] E. Menegatti, T. Maeda, and H. Ishiguro, "Image-based memory for robot navigation using properties of omnidirectional images," Robotics and Autonomous Systems, vol. 47, no. 4, pp. 251–267, 2004.

[18] L. Paya, L. Fernandez, A. Gil, and O. Reinoso, "Map building and Monte Carlo localization using global appearance of omnidirectional images," Sensors, vol. 10, no. 12, pp. 11468–11497, 2010.

[19] A. Friedman, "Framing pictures: the role of knowledge in automatized encoding and memory for gist," Journal of Experimental Psychology: General, vol. 108, no. 3, pp. 316–355, 1979.

[20] A. Torralba, "Contextual priming for object detection," International Journal of Computer Vision, vol. 53, no. 2, pp. 169–191, 2003.

[21] N. Dalal and B. Triggs, "Histograms of oriented gradients for human detection," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '05), pp. 886–893, Washington, DC, USA, June 2005.

[22] F. Amoros, L. Paya, O. Reinoso, L. Fernandez, and J. Marín, "Visual map building and localization with an appearance-based approach: comparisons of techniques to extract information of panoramic images," in Proceedings of the 7th International Conference on Informatics in Control, Automation and Robotics, pp. 423–426, 2010.

[23] D. Filliat, "A visual bag of words method for interactive qualitative localization and mapping," in Proceedings of the IEEE International Conference on Robotics and Automation (ICRA '07), pp. 3921–3926, Roma, Italy, April 2007.


