
Ann. Telecommun. (2012) 67:297–311
DOI 10.1007/s12243-012-0305-z

Autonomous robot exploration in smart environments exploiting wireless sensors and visual features

Andrea Bardella · Matteo Danieletto · Emanuele Menegatti · Andrea Zanella · Alberto Pretto · Pietro Zanuttigh

Received: 14 May 2011 / Accepted: 13 February 2012 / Published online: 12 June 2012
© Institut Mines-Télécom and Springer-Verlag 2012

Abstract This paper presents a complete solution for the integration of robots and wireless sensor networks in an ambient intelligence scenario. The basic idea consists in shifting from the paradigm of a very skilled robot interacting with standard objects to a simpler robot able to communicate with smart objects, i.e., objects capable of interacting among themselves and with robots. A smart object is a standard item equipped with a wireless sensor node (or mote) that provides sensing, communication, and computational capabilities. The mote's memory is preloaded with object information, such as the name, size, and visual descriptors of the object. In this paper, we show how the orthogonal advantages of wireless sensor network technology and of mobile robots can be synergically combined in our approach. We detail the design and the implementation of the interaction of the robot with the smart objects in the environment. Our approach encompasses three main phases: (a) discovery, in which the robot discovers the smart objects in the area by using wireless communication; (b) mapping, in which the robot, moving in the environment, roughly maps the objects in space using wireless communication; and (c) recognition, in which the robot recognizes and precisely locates the smart object of interest by asking the object to transmit its visual appearance. The robot then matches this appearance against its visual perception and reaches the object for fine-grain interaction. Experimental validation of each of the three phases in a real environment is presented.

A. Bardella · M. Danieletto · E. Menegatti · A. Zanella (B) · A. Pretto · P. Zanuttigh
Dept. of Information Engineering, University of Padova, via Gradenigo 6/B, 35131 Padova, Italy
e-mail: [email protected]

A. Bardella, e-mail: [email protected]
M. Danieletto, e-mail: [email protected]
E. Menegatti, e-mail: [email protected]
A. Pretto, e-mail: [email protected]
P. Zanuttigh, e-mail: [email protected]

Keywords Wireless sensor network · Mobile robot · Ambient intelligence · Object recognition · MoBIF · Mapping · Localization

1 Introduction

Service robots are looking for their killer applications to leave research laboratories and enter our daily lives, being progressively deployed in houses, on roads, and in public spaces. The same trend is experienced by wireless sensor network (WSN) technologies, which are breaking through the academic boundaries to spread over the market. The complementarity of the WSN and robot technologies results in the synthesis of a novel network paradigm, generally called wireless sensor and robot networks (WSRN), where the two technologies are intimately interconnected in order to enable a large set of advanced and innovative services ([1–3] and www.scat.or.jp/nrf/English/). A similar trend is followed by the networked robotics community, which is investigating the idea of exploiting the communication between the robot and sensors distributed in the environment to lower the computational burden and the intelligence required of the robot to operate effectively in complex environments.

Inspired by the PEIS ecology developed by Saffiotti et al. [4], where intelligent and complex behaviors are achieved through the cooperation of many simple robots and sensors, we here propose a system that combines in a synergetic way the complementary strengths of the robot and WSN technologies to enable the autonomous exploration of unknown intelligent environments by robots. The key ingredient of our system is the presence in the environment of objects tagged with wireless sensor nodes, or motes for short, which provide communication, processing, and data storage capabilities, thus turning a dummy object into a smart object. The same mote is applied to the robot in order to enable radio communication with the smart objects. In our solution, the robot does not have any prior knowledge about the shapes of the objects, their positions, or their number: all this information is obtained by interacting with the objects. Therefore, unlike most of the solutions proposed in the literature, we can deal with any object, from the simplest to the most complex (like those in Fig. 1), without the need for any a priori knowledge about the objects.

We foresee the application of this system in many different scenarios, including industrial, domestic, and assisted-living ones. As an example, teams of robots may be deployed in automated warehouses to catalog, localize, and retrieve objects upon request, without knowing in advance either the type or the location of the objects to be collected. Similarly, robots may be used in libraries to put books back on the shelves after closing time. In a home scenario, the same system can be used to tidy up the playroom of kids: at the end of the day, a robot can locate all the toys in the room and store them back into the proper containers, provided both toys and containers are smart objects. In the assisted-living domain, we may envision a system where an intelligent portable device drives a visually impaired person through a partially unknown environment by vocally describing the smart objects recognized along the path. The audio description of the smart objects may be embedded in the motes' memory by the object manufacturer and wirelessly transmitted to the portable smart speakerphone upon request. Likewise, a personal robot may display to a motion-impaired user the list of smart objects (remote controls, mobile phones, books, and so on) that it discovers in the environment thanks to wireless and, possibly, multihop communication. Then, the robot may go and grab an object for the user.

Fig. 1 The robot and some of the smart objects used in the presented experiments. In order to demonstrate the flexibility of the system, we chose objects that people usually move around in a room

The realization of this vision requires the robot to be able to identify, locate, and, finally, recognize the different objects in the environment. Whereas radio communication may be successfully exploited to discover which smart objects are located in the environment, the standard localization schemes based on the radio signal strength (RSS) do not provide a precise geographical location of the nodes [5]. This is particularly true in indoor environments because of the self-interference effects due to multiple reflections of the radio signals. Currently, the most popular RSS-based localization algorithms can locate the motes with respect to the robot with a precision of a meter or so, which is not enough for a reliable localization of the object based only on its ID. We hence propose to complement the radio-based localization with a visual detection of the object by the robot. To this end, we store locally, in the memory of the smart object, the appearance of the object itself. This information is then transmitted to the robot upon request, so that the robot can visually recognize that object in the image taken by the onboard camera.

Object recognition from visual features is a well-known problem in computer vision, for which many different solutions have been proposed, based both on the analysis of the global appearance of the object and on the extraction of a relevant set of features. Concerning the first approach, one of the most popular methods was proposed by Viola and Jones [6], where an adaptive boosting (AdaBoost) learning algorithm is applied to a set of Haar-like features efficiently extracted from images. Lowe [7] proposed an object recognition system based on local image features (the scale-invariant feature transform (SIFT)) invariant to image scaling, translation, and rotation. SIFT features have proved to be very robust and effective in many practical object recognition applications. In recent years, bag-of-words (BoW) methods have received great attention for object recognition and categorization tasks (e.g., [8]): these methods represent images with a set of clustered visual descriptors (e.g., SIFT features) taken from a visual vocabulary generated off-line. Recently, BoW approaches have been improved by indexing descriptors (visual words) in a vocabulary tree with a hierarchical quantization of the feature descriptors [9] and by using quantization methods based on randomized trees [10]. Despite their efficiency and effectiveness, BoW methods require an accurate off-line learning step. Moreover, all the object signatures (i.e., the collections of visual words that describe the objects) must share the same dictionary: this dependency somewhat limits the portability of these methods. For these reasons, we chose to employ an object recognition system similar to the one proposed in [7], successfully exploited in other recent robotic recognition frameworks (e.g., [11]).

We observe that some authors proposed the use of RFID technology to ease the recognition of objects by the robot. Although RFID technology may boost the cooperation and interaction between the object and the robot, its scope is limited by the intrinsic constraints of the technology, such as the rather short operational range (especially for passive RFIDs) and the extremely limited computational capabilities of the RF tags. Conversely, the use of active motes also makes it possible to provide measurements involving either the object itself, like object temperature, level of filling, inclination, weight, or deterioration, or the surrounding environment, such as temperature, pollution, humidity, light, and so on. The smart objects may form a self-established multihop wireless network that can be used to enlarge the sensorial and communication range of the robot, augmenting its environmental awareness. The synergetic combination of the two systems is finally realized by means of a suitable suite of communication protocols and algorithms, whose aim is to enable the interaction of the robot with the smart objects located in the area.

In summary, in our approach, the robot can create a map with a coarse localization of the smart objects of interest obtained via radio and move toward them. When the robot is in the room where the object of interest is located, it can search for the object's appearance in the images it has acquired. In this way, the robot can locate the object much more precisely and navigate toward it in order to perform the desired interaction. This process develops along three main phases:

– Object discovery The smart objects, which are usually dozing to save energy, are awakened, identified, and time-synchronized by the robot using the radio interface.

– Object mapping The robot starts navigating the environment and uses the information provided by the onboard odometers and the radio signals received from the smart objects to map the objects in space.

– Object recognition Once the smart objects' mapping is sufficiently accurate, the robot can move toward the estimated location of a target object. When in proximity, the robot asks the smart object to send its visual appearance descriptors, which are continuously matched against those extracted from the stereo camera images. When the matching exceeds a given threshold, the object is recognized in the images taken by the onboard cameras. The robot can then exactly localize the object in space, approach it, and start fine-grain interaction.

In the following, we describe in greater detail the algorithms that we used to realize the object discovery, object mapping, and object recognition services. We observe that these services could actually be realized by using state-of-the-art solutions taken from the WSN, autonomous robotics, or WSRN areas. However, the approaches proposed in the literature generally fail to catch the potential synergies among the three different phases, in particular between object mapping and recognition, whereas in this manuscript we offer a more complete and organic vision of the system as a whole. Furthermore, we present a large selection of experimental results obtained by using a proof-of-concept testbed that we developed to prove the feasibility of the idea and to identify pitfalls and technical challenges.

Summing up, the strength of the proposed approach with respect to the state of the art consists of the following main points. (a) We bring the concept of smart object into the WSRN picture: a smart object is a common object tagged with a mote that stores object-related information, such as weight, size, appearance, status, and so on. This information can be wirelessly transmitted to other nodes, thus enabling the seamless inclusion of new objects into the system, without the need for database updates or similar operations that are instead required in classical object-recognition systems. (b) The use of motes to realize the smart objects, rather than passive RFIDs, makes it possible to establish a multihop wireless network that, besides providing the usual WSN services, may relay messages from remote nodes to the robot in case direct communication is not available. This feature improves the flexibility and robustness of the system with respect to passive RFID solutions. We observe that, concerning communication and computational capabilities, WSNs and active RFIDs are rather similar. Nonetheless, the WSN technology natively supports environmental monitoring functionalities that can be integrated into the system in different ways. For instance, the data collected by light sensors can be used to suitably tune the sensitivity of the camera to the environmental brightness or to select the best sets of image features to be transmitted to the robot. (c) Complementing the localization information obtained by radio-based localization schemes with information extracted from the visual appearance of the smart objects improves the discrimination capability of the robot in the presence of similar objects and the accuracy of the localization of the objects with respect to the robot, and finally enhances the robot's capability of interaction with the objects.

Part of the results presented in this manuscript appeared in the proceedings of some international conferences, namely [12–14]. In this paper, however, we offer a more complete and organic vision of the system as a whole, which was missing in each of the previous publications. Furthermore, we detail the communication protocol between smart objects and robot and propose an innovative multichannel strategy for radio-based localization. Moreover, we assess the improvements with respect to classical single-channel methods by providing extensive experimental results and, as a side result, we compare the localization techniques considered in our previous publications [13, 14] with another algorithm, based on the multidimensional scaling technique. For this last technique, we also investigate the accuracy gain obtained by incorporating interobject measurements into the localization algorithm, which was not presented in our previous publications. Finally, we extend the analysis of the smart object visual recognition by means of motion blur invariant feature detectors (MoBIFs), as presented in [12], by introducing a stereoscopic visual recognition setup that makes it possible to estimate the object distances by applying triangulation to the correspondences found in MoBIF descriptor clouds.

The rest of the paper is organized as follows. In Section 2, we describe the object discovery process. In Section 3, we discuss the object mapping problem, and we present the multichannel ranging technique and the communication protocol that we designed to coordinate the nodes during this phase; furthermore, we describe the multidimensional scaling method for object mapping. In Section 4, we describe the object appearance descriptor method we used to realize the visual object recognition module. In Section 5, we present experiments in which the robot performs all the steps to discover and approach the smart objects. Finally, in Section 6, we draw conclusions and discuss future extensions of the work.

2 Object discovery

As we said, the number of smart objects and their positions in the environment are initially unknown to the robot. The robot thus needs first to discover and then to localize the objects in the environment.

The problem in object discovery is that motes are not always awake and ready to reply to inquiry messages. In fact, the radio transceiver is the most energy-hungry component of a sensor node, so motes generally operate according to a periodic ON–OFF pattern: in ON periods, all the sensing and communication capabilities are active, whereas in OFF periods the radio transceiver and, possibly, other hardware modules are powered off to save energy. Hence, a node is active for only a fraction of time d, typically referred to as the duty cycle. The ON–OFF patterns followed by the different nodes are, in general, asynchronous, since achieving time synchronization in multihop WSNs requires rather sophisticated algorithms (see, e.g., [15]). Clearly, the duty cycle d and the ON–OFF cycle period have a direct impact on the node's lifetime, i.e., the time before a sensor node exhausts its battery charge, and on the reactiveness of the node to the occurrence of events or to solicitations by other nodes. These two aspects are obviously in contrast, and striking the most convenient balance between them is a subtle design problem that has stimulated different solutions (see, for instance, [16] and references therein). In our scenario, however, the node discovery problem is greatly simplified by the presence of the robot, which can keep its wireless interface always active, thus intercepting any transmission by other nodes in its coverage range. This feature makes it possible to adopt simple rendezvous strategies that, on the one hand, allow motes to implement ON–OFF patterns with minimal duty cycles, thus saving energy, and, on the other hand, provide quick reaction times, compatibly with the length of the cycle time Tc.
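To give a quantitative feel for this tradeoff, the back-of-the-envelope sketch below estimates the mote lifetime as a function of the duty cycle d. The battery capacity, current draws, and cycle period are illustrative assumptions, not measurements from the paper; with an always-listening robot, the worst-case discovery delay is simply bounded by the cycle period Tc.

```python
# Back-of-the-envelope duty-cycle tradeoff. All numeric values below
# (battery capacity, ON/sleep currents, cycle period) are assumed for
# illustration; they are not taken from the paper.

def lifetime_hours(capacity_mah, d, i_on_ma, i_sleep_ma):
    """Expected battery lifetime under a periodic ON-OFF pattern with duty cycle d."""
    avg_current_ma = d * i_on_ma + (1.0 - d) * i_sleep_ma
    return capacity_mah / avg_current_ma

TC_S = 2.0  # assumed ON-OFF cycle period: the always-listening robot hears a
            # HELLO at most one cycle after entering the mote's coverage range

for d in (0.01, 0.05, 0.10):
    hours = lifetime_hours(capacity_mah=2500.0, d=d, i_on_ma=20.0, i_sleep_ma=0.02)
    print(f"d = {d:.2f}: lifetime ~ {hours:8.0f} h, worst-case discovery delay <= {TC_S} s")
```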

More specifically, in this work, we propose the following object discovery strategy, which is graphically exemplified in Fig. 2. The motes attached to the smart objects can operate in two states, namely active and quiet. In the active state, the smart objects keep their transceiver switched on and ready to receive command messages, whereas in the quiet state, each mote follows a periodic ON–OFF pattern. In the ON phase, the mote switches on its radio interface and broadcasts a HELLO message containing basic information about the attached object. Channel access is managed according to the classic CSMA algorithm, so that the message transmission is deferred to a later time if the channel is occupied by other transmissions. The ON phase ends when the channel remains idle for a certain time interval Tlisten. During the ON period, the mote processes all the received packets and acts accordingly.

Fig. 2 Example of the object discovery procedure, followed by part of the RSSI harvesting procedure

When a robot wishes to discover the smart objects in its proximity, it switches on its radio interface and continuously listens for HELLO messages sent by nearby objects. The robot retrieves and stores the MAC address, the object profile, and the other information contained in the received HELLO messages. Furthermore, the robot replies to each message by sending a SYNC message within the time interval Tlisten. The SYNC message contains two main fields: the temporary address and the rendezvous time. The temporary address is a short identifier that the robot assigns to the addressed smart object and that will be used in subsequent communications to refer to that object. The rendezvous time, instead, denotes the time interval after which the robot expects the smart object to be in the active state. The smart object that receives the SYNC message then stores its temporary address and schedules the transition from quiet to active mode after the rendezvous time interval. The rendezvous time value is updated at each SYNC message in order to synchronize the activation of the smart objects around the same time instant.

The communication protocol is unreliable, since it does not entail any explicit acknowledgment mechanism: if the SYNC message is not correctly decoded by the addressed smart object, the object will stay in the quiet state and keep performing the ON–OFF cycle. In turn, the information collected by the robot is stored in a soft form and will be deleted after a certain time period if not refreshed by the reception of new messages from the corresponding smart object. However, the robot can reply to HELLO messages at any time, even when performing tasks other than object discovery, in order to identify objects that were initially out of the robot's range.
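The sketch below puts the handshake together. The message names (HELLO, SYNC), the Tlisten rule, and the soft-state and rendezvous behaviors follow the text; the radio interface, the field layout, and all timer values are our assumptions for illustration.

```python
# Minimal sketch of the discovery handshake. `radio` is an assumed interface
# exposing broadcast(msg) and listen(timeout) -> iterable of received messages;
# CSMA deferral is assumed to happen inside broadcast().
import time

T_LISTEN_S = 0.1  # assumed channel-idle interval that closes an ON phase

class Mote:
    def __init__(self, mac, profile):
        self.mac, self.profile = mac, profile
        self.state = "quiet"      # quiet: periodic ON-OFF; active: radio always on
        self.tmp_addr = None
        self.wakeup_at = None     # when to switch from quiet to active

    def on_phase(self, radio):
        # ON phase: broadcast a HELLO with basic object information, then
        # process whatever arrives until the channel stays idle for T_LISTEN_S.
        radio.broadcast({"type": "HELLO", "mac": self.mac, "profile": self.profile})
        for msg in radio.listen(T_LISTEN_S):
            if msg.get("type") == "SYNC" and msg.get("dst") == self.mac:
                self.tmp_addr = msg["tmp_addr"]         # short id for later rounds
                self.wakeup_at = time.time() + msg["rendezvous"]

class RobotDiscovery:
    SOFT_STATE_TTL_S = 60.0       # assumed: entries expire if not refreshed

    def __init__(self, radio):
        self.radio, self.known, self.next_addr = radio, {}, 0

    def handle_hello(self, msg, rendezvous_s):
        # Store the object info in soft state and reply with a SYNC within Tlisten.
        self.known[msg["mac"]] = {"profile": msg["profile"], "seen": time.time()}
        self.radio.broadcast({"type": "SYNC", "dst": msg["mac"],
                              "tmp_addr": self.next_addr,
                              "rendezvous": rendezvous_s})
        self.next_addr += 1
```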

3 Object mapping

The object discovery procedure provides the robot with information concerning the objects in its coverage range. The following step is to map the objects in the area, i.e., to determine their geographical coordinates, in order to enable fine-grain interaction. The problem of node localization in WSNs has long been recognized as an important and challenging issue, and a lot of research has been carried out in this context. Many solutions assume the presence of a limited number of nodes, called beacons or anchors, that know their own position and are used by the other nodes to locate themselves through triangulation techniques. Many of these schemes make use of the received signal strength indicator (RSSI) to determine a rough estimate of the distance between transmitter and receiver, an operation referred to as ranging. This approach offers the advantage of being readily employable in any radio device, since the RSSI is supported by basically all radio transceivers. Another advantage of RSSI-based ranging is that it does not require the node to be in line of sight with the robot, since the radio signal passes through obstacles such as people, furniture, or even walls. Unfortunately, the range estimate based on RSSI measurements is unreliable and subject to random fluctuations due to a number of environmental factors. Therefore, the accuracy that can be obtained with RSSI-based localization techniques in indoor environments is rather poor, with errors of the order of 1 to 6 m [5], depending on the number of beacons.

The presence of the robot, however, can drastically enhance the performance of the localization techniques. For instance, in [17], we showed that the robot, which is fairly well localized by virtue of the onboard odometers and navigation system, can act as a sort of mobile beacon, drastically augmenting the number of reference signals to be used in classical localization algorithms. Moreover, the presence of robots opens the way to much more advanced and sophisticated localization methods. A very flexible and powerful technique is the simultaneous localization and mapping (SLAM) algorithm realized by means of an extended Kalman filter (EKF) approach, which merges the information provided by the robot's odometers with the RSSI samples provided by the surrounding objects to simultaneously track the motion of the robot in the environment and refine the mapping of the objects in the area. We experimented with this approach in [13] and observed that the accuracy of the mapping provided by EKF-SLAM is strongly affected by the initial guess of the mote position, which is required at the beginning of the SLAM procedure to initialize the system state $\Theta$, a vector containing the current estimates of the robot and object locations. Therefore, in [14], we proposed to couple the EKF-SLAM algorithm with a mote position initialization based on particle filters. According to our experimental results, this approach reduces both the mean and the variance of the final location estimate error with respect to the simple EKF approach.

In this work, we further advance the investigation of the object-mapping problem along three directions: first, we propose a method to increase the accuracy of RSSI-based ranging by exploiting the capability of the sensor nodes to operate on different RF channels;¹ second, we include in our experiments another localization method, namely weighted multidimensional scaling (MDS), which is computationally lighter than EKF and is based on the same engine used for the object recognition functionalities described in the next section; third, we evaluate the extent to which interobject RSSI measurements may improve the object mapping. As a side result, in Section 5, we provide an experimental performance comparison among the various mapping techniques considered in our work, namely EKF with delayed initialization, particle filter only (PF), MDS, and MDS with interobject measurements.

¹Some preliminary results obtained with this method were presented in [18].

In the remainder of this section, we explain the principle of multichannel ranging and describe the communication protocol we designed to collect RSSI samples over different RF channels. Then, for the reader's convenience, we give an overview of the basics of the MDS approach proposed by Costa et al. in [19]; for the details of the other aforementioned localization techniques, we refer the reader to our previous publications [13, 14].

3.1 Multichannel RSSI-based ranging

The most widely used RSSI-based ranging model is based on the well-known path loss plus shadowing signal propagation model, according to which the received power at distance d from the transmitter can be expressed (in dBm) as

$$P_{rx} = P_{tx} + K - 10\,\eta\,\log_{10}\!\left(\frac{d}{d_0}\right) + \Psi\,, \qquad (1)$$

where $P_{tx}$ is the transmitted power in dBm, $K$ is a unitless constant that depends on the environment, $d_0$ is the reference distance for the far-field model to be valid, $\eta$ is the path loss coefficient, and $\Psi$ is a random variable that takes fading effects into account [20]. In general, the $\Psi$ term is assumed to be a zero-mean Gaussian random variable, though this model is not always the most appropriate, especially in the presence of line of sight between transmitter and receiver [21]. In this case, in fact, the variability of the received power is mainly due to the random phase shifts between the direct path and the strongest reflections, typically off the floor, the ceiling, and close-by objects. At the frequency of 2.4 GHz, which is typically used by sensor nodes, moving the transmitter or the receiver by a few centimeters can result in a totally different combination of the signal reflections at the receiver, with a significant variation of the received signal power. We observe that the same effect can be obtained by changing the carrier frequency without moving the nodes. As an example, by increasing the carrier frequency of the radio signal from 2.4 to 2.45 GHz, the phase shift between the direct signal and a copy that follows a path 3 m longer (e.g., a ceiling reflection) will be ≈ π. This suggests that it is possible to reduce the impact of $\Psi$ in Eq. 1 by collecting RSSI samples on different RF channels and then using their mean value $\bar{P}_{rx}$ in the ranging equation:

$$\hat{d} = d_0\, 10^{\frac{P_{tx} + K - \bar{P}_{rx}}{10\,\eta}}\,. \qquad (2)$$
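As a minimal illustration of Eq. 2, the function below averages per-channel RSSI samples and inverts the path-loss model. The default parameter values ($P_{tx}$, $K$, $d_0$, $\eta$) are placeholders to be calibrated for the actual deployment, not values from the paper.

```python
# Multichannel ranging via Eq. 2: average the RSSI over the RF channels, then
# invert the path-loss model. Default parameters are illustrative placeholders.
from statistics import mean

def estimate_distance(rssi_dbm_per_channel, ptx_dbm=0.0, k_db=-40.0,
                      d0_m=1.0, eta=3.0):
    """d_hat = d0 * 10^((Ptx + K - mean(Prx)) / (10*eta)).
    Averaging over channels suppresses the fading term Psi of Eq. 1."""
    prx_mean = mean(rssi_dbm_per_channel)
    return d0_m * 10.0 ** ((ptx_dbm + k_db - prx_mean) / (10.0 * eta))

# One RSSI sample per RF channel for the same robot-object pair:
print(f"{estimate_distance([-62.0, -70.0, -65.0, -68.0]):.1f} m")  # ~7.5 m
```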

Clearly, to gather RSSI measurements on different channels, the nodes need to coordinate in order to change the carrier frequency concordantly. To this end, we designed the following protocol, which is initiated by the robot when the objects contacted during the object discovery phase enter active mode at the rendezvous time.

The multichannel RSSI harvesting process occurs in successive rounds. Each round is initiated by the robot, which broadcasts an RSSI_GET message. This message contains the list of smart objects that are required to collect RSSI samples and the transmission order of the nodes. For compactness, nodes are identified by means of the temporary addresses assigned during the object discovery phase, rather than by their MAC addresses, which are typically longer. Channel access occurs according to a time division multiple access scheme: time is partitioned into transmission slots of constant duration (slightly longer than the transmission time of a full data packet), and each node is assigned to a single slot in an exclusive manner. Each node listed in the RSSI_GET message waits for its assigned slot and then broadcasts an RSSI_REPORT message that contains the vector of RSSI values collected in the previous slots, including the robot's. Furthermore, the RSSI_GET and RSSI_REPORT messages also carry an indication of the RF channel that will be used in the following round. In this way, nodes that miss the robot's packet but overhear a report message can synchronize again in the following round. We observe that the number of RSSI samples reported by the nodes is not homogeneous across the round, since the first nodes that transmit have not yet received messages from the others. To overcome this drawback, the robot may permute the transmission order of the nodes in each subsequent round. Furthermore, each round may be repeated multiple times without changing channel.

Once again, the communication is unreliable, and no ACK mechanism is considered. If a smart object fails to reply to the robot's message for two consecutive rounds, its entry is deleted from the robot's memory and the temporary address is released. On the other hand, if a smart object does not receive any message (either from the robot or from other objects) for an interval Ttimeout, it switches back to the quiet mode.

When the multichannel RSSI harvesting is complete, the robot can move to a new location and repeat the full process.
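A sketch of a single harvesting round, as seen from the robot, is given below. The message names, the TDMA slotting, the channel announcement, and the order permutation follow the text; the robot_radio interface and the data structures are our assumptions.

```python
# One multichannel RSSI-harvesting round, as seen from the robot.
# `robot_radio` is an assumed interface with broadcast(msg) and
# receive_in_slot(slot_index, slot_s) -> message or None.
import random

def run_round(robot_radio, tmp_addrs, next_channel, slot_s=0.010):
    # Announce the transmission order and the RF channel of the *next* round,
    # so that nodes missing this packet can resynchronize from the reports.
    order = random.sample(tmp_addrs, len(tmp_addrs))  # permute to balance samples
    robot_radio.broadcast({"type": "RSSI_GET", "order": order,
                           "next_channel": next_channel})
    reports = {}
    for slot, addr in enumerate(order):
        # Each node waits for its exclusive slot and broadcasts an RSSI_REPORT
        # carrying the RSSI it measured on all packets heard so far in the
        # round, including the robot's RSSI_GET.
        msg = robot_radio.receive_in_slot(slot, slot_s)
        if msg and msg.get("type") == "RSSI_REPORT" and msg.get("src") == addr:
            reports[addr] = msg["rssi_vector"]
    return reports  # missing nodes are handled by the two-round deletion rule
```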

3.2 MDS

In the following, we describe the MDS algorithm for object mapping that we used in our experiments. Let us enumerate from 1 to n the smart objects included in the mapping process. Furthermore, let n + 1, n + 2, . . . , n + k denote the locations where the robot stopped to collect RSSI samples from the surrounding objects. In the following, we refer to these positions as virtual beacon nodes. Let $\theta_i = (x_i, y_i)^T$ be the vector of Cartesian coordinates of node i. Our aim is to determine an estimate of $\theta_i$ for i = 1, 2, . . . , n, knowing the exact positions of the virtual beacons and the ranging values given by Eq. 2. The MDS approach consists in minimizing the following cost function:

$$S(\Theta_k) = \sum_{i=1}^{n} \sum_{j=n+1}^{n+k} 2\,w_{i,j}\left(\hat{d}_{i,j} - d_{i,j}(\Theta_k)\right)^2 \qquad (3)$$

where $\Theta_k = [\theta_1, \ldots, \theta_{n+k}]$ is the state vector, $\hat{d}_{i,j}$ is the estimated distance between smart object i and virtual beacon j, and $d_{i,j}(\Theta_k)$ is the distance between the same nodes given the state vector $\Theta_k$. Finally, the scalar $w_{i,j} = e^{-\bar{P}_{rx\,i,j}^2/P_{th}^2}$ accounts for the accuracy of $\hat{d}_{i,j}$, where $\bar{P}_{rx\,i,j}$ and $P_{th}$ are, respectively, the power received by node i from node j, averaged over the different channels, and the power threshold for ranging. The cost function $S(\Theta_k)$ can also be modified to include the measurements between smart objects in the following way:

$$S(\Theta_k) = \sum_{i=1}^{n}\left(\, \sum_{\substack{j=1 \\ j\neq i}}^{n} w_{i,j}\left(\hat{d}_{i,j} - d_{i,j}(\Theta_k)\right)^2 + \sum_{j=n+1}^{n+k} 2\,w_{i,j}\left(\hat{d}_{i,j} - d_{i,j}(\Theta_k)\right)^2 \right). \qquad (4)$$

The minimization of $S(\Theta_k)$ cannot be performed in closed form, but the problem can be solved iteratively. Given the state vector at iteration h, $\Theta_k^{(h)} = [\theta_1^{(h)}, \ldots, \theta_{n+k}^{(h)}]$, the next state can be computed by applying the following simple updating function (see [19] for the details):

$$\theta_i^{(h+1)} = a_i\, \Theta_k^{(h)}\, b_i^{(h)} \qquad (5)$$

where

$$a_i = \left(\, \sum_{\substack{j=1 \\ j\neq i}}^{n} w_{i,j} + \sum_{j=n+1}^{n+k} 2\,w_{i,j} \right)^{-1} \qquad (6)$$

and $b_i^{(h)} = [b_{i,1}^{(h)}, \ldots, b_{i,n+k}^{(h)}]^T$ is a vector whose entries are given by

$$b_{i,j}^{(h)} = \begin{cases} \alpha\, w_{i,j}\left(1 - \dfrac{\hat{d}_{i,j}}{d_{i,j}(\Theta_k^{(h)})}\right) & j \neq i\,, \\[2ex] \displaystyle \sum_{\substack{\ell=1 \\ \ell\neq i}}^{n} \frac{w_{i,\ell}\,\hat{d}_{i,\ell}}{d_{i,\ell}(\Theta_k^{(h)})} + \sum_{\ell=n+1}^{n+k} \frac{2\,w_{i,\ell}\,\hat{d}_{i,\ell}}{d_{i,\ell}(\Theta_k^{(h)})} & j = i\,, \end{cases} \qquad (7)$$

with α = 1 if j ≤ n and α = 2 otherwise. The iterative procedure stops when $S(\Theta_k^{(h-1)}) - S(\Theta_k^{(h)}) < \varepsilon$ for a certain ε. We observe that, although the updating equations are simple to compute, the number of operations grows linearly with the number of virtual beacons, so that the execution of the MDS algorithm progressively slows down as the number of sampling positions increases. The same scalability problem, however, affects the other localization algorithms considered in our previous work. In particular, the complexity of the EKF is roughly O(m n³), where m is the number of steps of the robot and n the number of objects in the area, while the complexity of the MDS algorithm is O(n m L), where L is the number of iterations performed by the algorithm to converge to the solution. The value of L grows with the number of sampling positions m, though the dependence of L on m is not available in explicit form. Nonetheless, we experimentally found that MDS is lighter than EKF for reasonable values of n and m.
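For concreteness, the sketch below transcribes the update of Eqs. 5–7, including the internode terms of Eq. 4. The initialization of the unknown positions (e.g., by particle filter) and the computation of the weights from the received powers happen outside; the matrix layout, the stopping constants, and the simultaneous (rather than sequential) update are our implementation choices.

```python
# Iterative weighted-MDS object mapping (Eqs. 3-7). theta holds one (x, y) row
# per node: rows 0..n-1 are the smart objects (updated), rows n..n+k-1 are the
# virtual beacons (kept fixed). d_hat and w are (n+k) x (n+k) matrices of
# ranging estimates and weights (w[i, j] = 0 where no measurement exists).
import numpy as np

def mds_map(theta, d_hat, w, n, eps=1e-3, max_iter=500):
    m = theta.shape[0]                    # m = n + k
    alpha = np.ones(m)
    alpha[n:] = 2.0                       # beacon terms are counted twice

    def cost(th):                         # Eq. 4, with alpha folding in the 2x
        d = np.linalg.norm(th[:, None, :] - th[None, :, :], axis=2)
        return sum(alpha[j] * w[i, j] * (d_hat[i, j] - d[i, j]) ** 2
                   for i in range(n) for j in range(m) if j != i)

    prev = cost(theta)
    for _ in range(max_iter):
        d = np.linalg.norm(theta[:, None, :] - theta[None, :, :], axis=2)
        new_theta = theta.copy()
        for i in range(n):
            a_i = 1.0 / sum(alpha[j] * w[i, j] for j in range(m) if j != i)  # Eq. 6
            b = np.zeros(m)
            for j in range(m):            # Eq. 7, off-diagonal entries
                if j != i and d[i, j] > 0.0:
                    b[j] = alpha[j] * w[i, j] * (1.0 - d_hat[i, j] / d[i, j])
            b[i] = sum(alpha[j] * w[i, j] * d_hat[i, j] / d[i, j]   # Eq. 7, j = i
                       for j in range(m) if j != i and d[i, j] > 0.0)
            new_theta[i] = a_i * (theta.T @ b)                      # Eq. 5
        theta = new_theta
        cur = cost(theta)
        if prev - cur < eps:              # stopping rule from the text
            break
        prev = cur
    return theta
```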

4 Visual object recognition and interaction

The mapping algorithm described in the previous section is generally capable of localizing the smart objects in the environment with a residual error of the order of magnitude of a meter. Although this precision may be sufficient to correctly steer the robot towards a target destination, it is not enough to enable physical interaction between the robot and the object. To this end, we need a much more precise localization of the object, which can be obtained by recognizing its appearance in the images provided by the robot's camera. Operatively, when the robot is in the surroundings of the object of interest, it starts sending DescriptorRequest messages to that object. The object replies by sending packets containing descriptors of its appearance, which are passed to the object identification module in the robot. Note that each object stores only its own descriptors, so the memory requirements are very limited and the descriptors fit inside the motes' memory. The robot controller then executes the following steps (a minimal sketch follows the list):

– it gets images from the onboard cameras;
– it extracts the descriptors of these images;
– it compares these descriptors with those transmitted by the object's mote;
– if the object is recognized in the camera images, its position is computed from the descriptors' locations and is passed to the robot navigation module.
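The loop below illustrates these steps. The pipeline callables (extract, match, locate) and the acceptance threshold are placeholders standing in for the MoBIF machinery described next, not an API defined in the paper.

```python
# One iteration of the recognition loop sketched in the list above.
# `extract`, `match`, and `locate` are placeholder callables for the robot's
# feature pipeline; `navigation` is the robot navigation module.

def recognition_step(cameras, object_descriptors, extract, match, locate,
                     navigation, match_threshold=15):
    frames = [cam.grab() for cam in cameras]                   # 1. get images
    clouds = [extract(f) for f in frames]                      # 2. extract descriptors
    matches = [match(c, object_descriptors) for c in clouds]   # 3. compare with mote's
    if sum(len(mm) for mm in matches) >= match_threshold:      # 4. recognized?
        navigation.set_target(locate(matches, cameras))        # position from matches
        return True
    return False
```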

Clearly, the performance of the object recognition module depends on the method used to represent the object appearance. Ideally, the descriptors should allow the robot to perform a fast and reliable recognition of the object in its visual perspective, irrespective of its distance, orientation, and light exposure. The robot should also be able to recognize the object even if it is partly occluded by other elements of the scene. Many of these properties are possessed by the scale-invariant feature transform descriptors [7]. Furthermore, each SIFT feature descriptor occupies just 128 bytes of memory, and thus the set of feature descriptors corresponding to an object can be stored in commercial motes. The extraction of SIFTs from the images grabbed by the camera and their comparison with descriptors taken from a sample image is also fast enough to allow object recognition in about a second.² Moreover, the extracted features are particularly robust to affine transformations and occlusions. Unfortunately, in indoor environments, the ambient light is often too dim to grab clear images. In this case, the images taken by the robot while moving will likely be affected by motion blur. To solve this problem, we propose the use of a new feature detection scheme, called motion blur invariant feature detector (MoBIF), that was originally developed for humanoid robots [22]. Instead of trying to restore the original, unblurred images, the MoBIF approach uses an adapted scale–space representation of the image that tries to overcome the negative effect of motion blur in the invariant feature detection and description process. This approach can also deal with nonuniform, nonlinear motion blur. Like SIFT, MoBIF descriptors are particularly robust to affine transformations and occlusions, thus allowing our approach to work correctly even in the case of partially occluded objects. Furthermore, MoBIF descriptors proved to perform similarly to SIFT on standard datasets and to outperform SIFT on images affected by motion blur or taken in dim light.

²This value refers to the SIFT feature extraction and matching on a 1032 × 778 image using the hardware platform described in Section 5.1.


Unfortunately, MoBIF descriptors (like SIFT) are robust neither to large perspective transformations nor to large rotations of the object along the vertical axis, which occur when the robot observes the objects from very different points of view (as reported in [7], performance decreases when the rotation is larger than 30 degrees). Moreover, SIFT and MoBIF descriptors show good scale invariance only up to a certain limit (on the order of a couple of meters for the objects considered in this study). We addressed these two problems by taking many pictures of each object from different points of view, separated by 20–30 degrees (thus ensuring that there is always a view corresponding to a rotation for which the descriptors work properly) and at several distances (i.e., 1, 3, and 5 m). We then extracted the descriptors from each image and added them to a single descriptor cloud in an incremental way: if a descriptor is too similar to another descriptor already present in the cloud, it is rejected. For example, in our experimental setting, we took images of each object from 18 different viewing directions and at three different distances. For each image, we selected approximately the ten most informative MoBIFs, so that the total memory footprint of the object description is at most about 70 kB, which fits in the 1 MB flash memory of the TmoteSky nodes. The set of descriptors resulting from this process completely describes the external surface of the object, and searching for the object in a frame is hence done by looking for correspondences in this set. We remark that merging the information coming from all the pictures and deleting redundant descriptors makes it possible to recognize the object from any point of view while maintaining a reasonable computational complexity.
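A minimal sketch of this incremental cloud construction follows; treating descriptors as plain numeric vectors, as well as the specific similarity test and threshold value, are our assumptions.

```python
# Incremental descriptor-cloud construction with redundancy rejection.
# Descriptors are treated as plain numeric vectors; the similarity test and
# the threshold value are illustrative assumptions.
import numpy as np

def build_cloud(images, extract, keep_per_image=10, sim_threshold=0.2):
    """Merge the most informative descriptors of all views (e.g., 18 viewing
    directions x 3 distances), rejecting descriptors too similar to ones
    already in the cloud."""
    cloud = []
    for img in images:
        for desc in extract(img)[:keep_per_image]:
            if all(np.linalg.norm(desc - c) > sim_threshold for c in cloud):
                cloud.append(desc)        # novel enough: keep it
    return cloud
```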

Note that visual recognition is also very useful to distinguish between different objects placed too close together to be separated by the RSSI technique: MoBIF descriptors have a very good matching accuracy and are able to distinguish between two objects that are not too similar. The only critical case is when almost identical objects are placed very close to each other; here the object recognition algorithm may fail because the visual features of the objects are very similar.

In order to improve the precision of the robot localization, we also tested an improved version of the visual recognition system in which a second camera is added to the proposed setup. Each robot is thus provided with two cameras arranged in a binocular stereo setup. This not only improves the recognition performance but also, as is well known from computer vision theory, makes it possible to compute the object distance by triangulation from the positions of its features in the images of the two cameras.

We exploited the stereoscopic camera setup during the autonomous navigation using the approach depicted in Fig. 3. First of all, before starting the autonomous navigation, the two cameras are jointly calibrated, a rectification transform between the two images is computed [23], and the resulting rectification map is stored on the robot. This allows the robot to rectify in real time the images acquired by the cameras during the navigation, using the precomputed map, in order to make the descriptor matching easier. Then, as previously introduced, when the robot comes close to an object, the MoBIF descriptors are extracted from the rectified images of both cameras, and the descriptors extracted from each of the two cameras are matched with the object's ones. Note that some of the object descriptors will be present in the images of both cameras, while others may appear in just one of them, because they could be out of the field of view, occluded by other objects, or simply not detected by the feature extraction algorithm. Following this observation, a first advantage of using two cameras is that it is possible to get more feature points and obtain a slight improvement in the object detection performance.

Fig. 3 Architecture of the stereoscopic feature extraction system


However, the main improvement offered by the stereo setup is the additional information on the feature locations. More precisely, as is well known from computer vision, the projection of the same 3-D point onto two different views corresponding to a pair of cameras is shifted by an amount inversely proportional to the distance between the object and the cameras. In particular, in the simple configuration of Fig. 4 (referring to rectified images), it can be easily shown that the relation between the object distance Z and the position shift of the feature location between the view of the first camera and that of the second camera is given by [23]:

$$Z = \frac{|C_1 - C_2|\, f}{d_2 - d_1} = \frac{b\, f}{d_2 - d_1} \qquad (8)$$

where $b = |C_1 - C_2|$ is the distance between the two cameras' optical centers $C_1$ and $C_2$ (the stereo system baseline) and f is the cameras' focal length (we used the same focal length for both cameras).
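A direct transcription of Eq. 8 is shown below; the baseline and focal length are placeholder values, not the actual calibration of the robot's stereo rig.

```python
# Depth from disparity on a rectified stereo pair (Eq. 8):
# Z = b*f / (d2 - d1), with d1 and d2 the horizontal positions (in pixels)
# of the matched feature in the two rectified images.

def depth_from_disparity(d1_px, d2_px, baseline_m=0.12, focal_px=700.0):
    disparity_px = d2_px - d1_px
    if disparity_px <= 0.0:
        raise ValueError("non-positive disparity: bad match or point at infinity")
    return baseline_m * focal_px / disparity_px

print(f"{depth_from_disparity(310.0, 370.0):.2f} m")  # 0.12 * 700 / 60 = 1.40 m
```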

Fig. 4 Geometry of a binocular stereo setup

After this computation, for each pair of corresponding descriptors we have a distance measure between the feature point and the robot. The usefulness of this information is twofold. First, all these distance values can be used independently when the robot is very close to an object, to help the robot in its interaction with the object. Furthermore, when the robot is coming into the proximity of one of the objects, it is also possible to estimate the distance between the robot and the object by taking the average of the distances between the various features on that object and the robot. This corresponds to assuming that all the features are located at the object centroid, thus introducing an error in the computed distance; however, at least for small objects, this approximation is reasonable. In computing the distance, we also used the RANSAC [24] robust estimator in order to exclude from the computation the outliers due to incorrect matches that can arise from errors in the feature matching. These mismatches can appear, for example, if the object has symmetrical or repeating patterns and a point seen by one of the cameras is matched with another point in the other camera's image corresponding to another instance of the same pattern, or if some features of the object get matched with others not belonging to it. The estimate of the distance between the robot and the centroid of the object obtained from the stereo setup can then be used as initialization information for the WSN localization algorithm of Section 3 when the robot starts moving again towards a new object.
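The sketch below shows the kind of RANSAC-style consensus used to reject outlier feature depths before averaging; the inlier tolerance and the number of trials are our assumptions.

```python
# RANSAC-style robust estimate of the robot-object distance from the
# per-feature depths of Eq. 8. Tolerance and trial count are assumed values.
import random
from statistics import mean

def robust_distance(depths_m, tol_m=0.10, trials=100, seed=0):
    """Pick the depth hypothesis with the largest consensus set, then average
    its inliers; mismatches (e.g., repeated patterns) fall outside the set."""
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(trials):
        hypothesis = rng.choice(depths_m)      # 1-point model: a single depth
        inliers = [z for z in depths_m if abs(z - hypothesis) <= tol_m]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    return mean(best_inliers)

print(robust_distance([1.32, 1.35, 1.33, 2.10, 1.34, 0.60]))  # averages the four inliers near 1.33 m
```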

5 Experiments

In order to prove the feasibility of the proposed system and to identify drawbacks and problems, we developed a proof-of-concept prototype that we used to run some experiments. Below, we briefly describe the platform and present a selection of results.

5.1 The hardware platforms

Smart objects have been realized by gluing TmoteSky wireless sensor nodes to sample items. The TmoteSky radio transceiver is the Chipcon CC2420, whose PHY and MAC layers are compliant with the IEEE 802.15.4 standard, operating in the ISM band at 2.4 GHz and providing a bit rate of 250 kbit/s. The module also provides an 8-bit register, named received signal strength indicator, whose value is proportional to the power of the received radio signal. The core of the mote is the MSP430, a Texas Instruments low-power microcontroller, which is used to control the communication and sensing peripherals. The microcontroller is provided with 10 kB of RAM and 48 kB of integrated flash memory, used to host the operating system and programs, whereas an additional 1 MB of flash memory is available for data storage. Besides, the board is equipped with integrated light and humidity sensors. Motes have been programmed in NesC, using the TinyOS open-source operating system.

The robot, named Bender, is a custom-built wheeled differential drive platform based on the Pioneer 2 by MobileRobots Inc., depicted in Fig. 1. The robot is equipped with a standard ATX motherboard with a 1.6 GHz Intel Pentium 4, 256 MB of RAM, and a 160 GB hard disk, running Linux OS. The only onboard sensors are a stereoscopic camera and the odometers connected to the two driven wheels. Communication with the laboratory intranet is provided by a PCMCIA wireless Ethernet card, whereas the connection with the WSN is obtained by a TmoteSky connected to one of the robot's USB ports.

5.2 Object mapping experiments

In this section, we compare the performance of the three localization algorithms introduced in Section 3, namely EKF, PF, and MDS, in terms of mean localization error. We also considered MDS with interobject RSSI measurements, here denoted as MDS internode. In Figs. 5 and 6, we report the results obtained in the indoor and outdoor scenarios, respectively. Figures 5a and 6a show the locations of the nodes in the experiments and the trajectory followed by the mobile robot across the area; each RSSI harvesting station along the path is marked by a cross. Figures 5b, c and 6b, c show the final localization error of each node for the different algorithms, when using single-channel (Figs. 5b and 6b) and multichannel ranging (Figs. 5c and 6c), respectively. Furthermore, in Table 1, we report the mean localization error over all the nodes in the different cases.

First of all, by comparing the results achieved with the same setting in the two scenarios, we observe that all the algorithms provide better location estimates outdoors, because of the less severe multipath fading. Second, and more interestingly, observing the results reported in Figs. 5b, c and 6b, c in the respective scenarios, we see that in almost all the cases, the localization error of all the considered algorithms is reduced when using multichannel ranging rather than single-channel ranging. This experimental evidence confirms the intuition according to which averaging the RSSI samples over multiple channels reduces the uncertainty of the ranging estimate. The counterpart is that the collection of RSSI samples over multiple channels requires a more sophisticated communication algorithm and, in general, may take a longer time. However, we observe that with single-channel ranging, it is still necessary to collect multiple RSSI samples for each pair of nodes, in order to average out the fast fading term. Conversely, with multichannel ranging, we collect one or a few RSSI samples in each RF channel, but we repeat the operation at successive time instants on different channels, so that the fast fading is still averaged out when taking the mean RSSI value. Therefore, in the end, multichannel RSSI ranging takes approximately the same time as single-channel ranging. In particular, note how the time taken to collect RSSI samples over k different channels can be roughly estimated as M = k (n T + S), where n is the number of in-range nodes, T is the slot duration, and S is the switching delay that accounts for the time taken by the nodes to switch to the next channel and receive the next RSSI_GET packet from the robot.

Fig. 5 Indoor scenario. a Experimental setup (X/Y coordinates in meters, showing the five smart objects, the robot path, and the virtual beacons), b mean estimate error using single-channel RSSI ranging, c mean estimate error using multichannel RSSI-based ranging (localization error in meters per node ID, for EKF, PF, MDS, and MDS internode)


Fig. 6 Outdoor scenario. a Experimental setup (X/Y coordinates in meters, showing the five smart objects, the robot path, and the virtual beacons), b mean estimate error using single-channel RSSI ranging, c mean estimate error using multichannel RSSI-based ranging (localization error in meters per node ID, for EKF, PF, MDS, and MDS internode)

Table 1 Mean localization errors for indoor (first and second rows) and outdoor (third and fourth rows) environments, using single-channel and multichannel ranging

                     EKF (m)   PF (m)   MDS (m)   MDS internode (m)
IN single-channel    3.16      3.44     1.92      1.95
IN multi-channel     1.25      1.9      1.32      0.87
OUT single-channel   2.37      1.82     1.88      1.87
OUT multi-channel    1.14      0.8      0.95      0.9

IN indoor, OUT outdoor

With the TmoteSky sensor nodes we used, the slot time turns out to be approximately T ≈ 10 ms, while the switching time is S ≈ 50 ms. Hence, collecting RSSI samples over k = 4 maximally spaced-apart RF channels from n = 10 nodes takes approximately M = 600 ms.
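As a quick sanity check of this figure, with the slot model and the values just given:

```python
# Harvesting time M = k * (n*T + S), with the values reported in the text.
k, n = 4, 10            # RF channels, in-range nodes
T, S = 0.010, 0.050     # slot duration [s], channel-switching delay [s]
M = k * (n * T + S)
print(f"M = {M * 1e3:.0f} ms")   # -> M = 600 ms
```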

Finally, we note that the MDS internode algorithm yields, in a few cases, slightly worse localization accuracy than the standard MDS. In the other cases, however, the MDS internode scheme may provide significant improvements, as, for instance, for node 2 in Fig. 5. The reason is that rough internode ranging estimates may generally impact negatively on the localization accuracy provided by the MDS algorithm when the first guess of the node positions is good. However, in case the nodes are severely misplaced at the beginning of the MDS algorithm, the availability of internode ranging information makes it possible to correct these deficiencies. This is the case of node 2 in Fig. 5. In fact, observing the time evolution of the state vector $\Theta_k$ during the execution of the mapping algorithms (not reported here for space constraints), we could see that the initial guess for this node's position, obtained by applying the particle filter initialization approach, was close to the position of node 1, which is actually symmetric with respect to the robot trajectory. With the path followed by the robot in this experiment, the EKF, PF, and MDS algorithms were not able to recover node 2 from that erroneous initialization, so that the final localization error was large. Conversely, using the internode ranging information between nodes 1 and 2, the MDS internode algorithm was able to correct the initial error and enhance the accuracy of the final position estimate of node 2.

5.3 Visual recognition experiments

When the robot receives the MoBIF-based object descriptors from the mote, it starts looking for the corresponding object in the surrounding environment. In our experiments, we used several types of smart objects, such as those depicted in Fig. 1.


Fig. 7 MoBIF descriptor match test in a complex environment and at different distances. The blue box is the object to recognize, and in both images there are only a few incorrect descriptor matches. In the images, the dots have different colors to identify the cloud distance at which each MoBIF descriptor is associated

Fig. 8 Example of correct matches in the presence of motion blur. Top: a zoom of an image grabbed by the robot without motion blur; bottom: the image grabbed while moving, thus with motion blur

Fig. 9 Feature matching with a stereoscopic setup: a box experiment, b ball experiment, c occluded object (plush) experiment. Green dots represent features present in the object and in both cameras, while blue and red dots mark features present in the object but only in the left or right camera, respectively. Yellow features do not belong to the object

in Fig. 1. In all the experiments we performed, the robot, in addition to correctly localizing itself and building a map of the perceived motes, was able to correctly recognize the smart objects in its visual perspective. Figure 7 exemplifies the result of MoBIF descriptor matching (red dots) for a smart object, the blue box, in a cluttered environment. We notice that the matching is correct irrespective of the distance and orientation of the target object. In Fig. 8, we show how, by using the MoBIF descriptors, we obtain a correct recognition even in the presence of motion blur.
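For concreteness, the matching step can be sketched as a nearest-neighbor search with a distance-ratio test in the spirit of Lowe [7]; the sketch below (in Python, with hypothetical array layouts, not our actual implementation) assumes the descriptors are fixed-length real vectors:

import numpy as np

def match_descriptors(obj_desc, img_desc, ratio=0.8):
    """Match object descriptors (received over the WSN) against image
    descriptors. obj_desc: (m, d) array; img_desc: (n, d) array, n >= 2.
    Returns (object_index, image_index) pairs passing the ratio test."""
    matches = []
    for i, d in enumerate(obj_desc):
        dist = np.linalg.norm(img_desc - d, axis=1)  # distances to all image descriptors
        first, second = np.argsort(dist)[:2]         # two nearest neighbors
        if dist[first] < ratio * dist[second]:       # keep only unambiguous matches
            matches.append((i, int(first)))
    return matches

The ratio test discards a descriptor whose best match is nearly as close as its second-best, which is what keeps the number of incorrect matches low in cluttered scenes such as Fig. 7.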

As described in Section 4, we also introduced a second camera to improve the performance of the visual recognition module. Figure 9 shows the extracted

Table 2 Number of matched descriptors for the image of Fig. 9a

Camera   Obj. and both im.   Obj. and left im.   Obj. and right im.   Not belonging to the obj.
         (green)             (blue)              (red)                (yellow)
Left     26                  105                 –                    982
Right    26                  –                   136                  1,117


Table 3 Comparison between the distance estimates and the ground truth data

Object   Estimated distance (cm)   Real distance (cm)   Error (cm)
Box      133.7                     140                  6.3
Ball     136.1                     140                  3.9
Plush    130.3                     133                  3.3

features: it can be clearly seen that the matching of the extracted MoBIF features is very reliable. Table 2 reports the number of extracted features for the example in Fig. 9a: some features are present in both cameras (shown in green), but some of the object features are present only in the left or in the right camera image (shown in blue and red, respectively). This shows how the stereoscopic setup makes it possible to extract and match more features than either of the two cameras alone, thus allowing a more reliable object matching. Furthermore, Fig. 9c shows an example of occluded object detection. Even if fewer features are available due to the occlusion, the robustness of the MoBIF descriptors to occlusions and the additional information provided by the second camera make it possible to correctly detect and localize even the partially occluded object.

Following the approach introduced in Section 4, the features visible in both cameras can be matched together (shown in green in Fig. 9), and their respective positions can be used to estimate the object position. The stability of the MoBIF descriptor matching and the RANSAC robust estimator provide a reliable distance estimate, as can be seen from Table 3. The table reports the estimated distance between the object and the robot in the three example cases of Fig. 9 and compares it with a ground truth measure acquired by a time-of-flight sensor. The error in the estimates is just a few centimeters.
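As an illustration of this step, the sketch below (in Python; the rectified-pair geometry, the parameter names, and the consensus-on-depth simplification are our assumptions for the example, not the system's actual estimator) triangulates one depth per green match from its stereo disparity and then keeps the largest consensus set, RANSAC-style:

import numpy as np

def stereo_depths(x_left, x_right, focal_px, baseline_m):
    """Per-feature depth from a rectified stereo pair: z = f * b / disparity.
    x_left, x_right: x-coordinates [px] of the same feature in both images."""
    disparity = np.asarray(x_left, float) - np.asarray(x_right, float)
    return focal_px * baseline_m / disparity

def ransac_distance(depths, n_iter=200, tol_m=0.05, seed=0):
    """Robust object distance: repeatedly pick a candidate depth, count
    the depths within tol_m of it, and average the largest inlier set."""
    rng = np.random.default_rng(seed)
    depths = np.asarray(depths, float)
    best = np.empty(0)
    for _ in range(n_iter):
        cand = rng.choice(depths)
        inliers = depths[np.abs(depths - cand) < tol_m]
        if inliers.size > best.size:
            best = inliers
    return float(best.mean())

Averaging only the consensus set makes the estimate insensitive to the few incorrect matches, in the spirit of the centimeter-level accuracy reported in Table 3.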

6 Conclusions and future work

In this paper, we present a system that enables the autonomous exploration of smart environments by a robot. The application stems from the capability of the robot to closely interact with objects that are enabled for wireless communication and capable of simple computational tasks and limited data storage. The proposed approach allows the robot to progressively acquire environmental awareness by interacting with the smart objects located in the space. The feasibility of this vision has been proved by means of an experimental prototype of the system, in which a robot proved able to discover the objects in radio range by using

RF communication, then to roughly map them in the area through an RSSI-based localization algorithm coupled with a proper initialization scheme based on particle filters, and finally to recognize the objects in its visual perspective by matching the information transmitted by the objects with the appearance descriptors obtained from the onboard cameras.

This prototype can be improved in many different ways. For instance, we plan to integrate into the SLAM algorithm the localization information that may be extracted from the robot's camera images. The next step will be to map not only the smart objects but also the rest of the environment, so that the smart objects will be located in a 3-D visual map of the environment. In this connection, we are also planning to extend the proposed approach to obtain a 3-D localization of the robots inside a 3-D scene representation, in order to allow better interaction between the robots and complex environments. The final step will be to integrate this complete system with a robotic brain-computer interface (BCI) we are developing in collaboration with IRCCS San Camillo of Venice (Italy) and the University of Palermo (Italy). The BCI system will make it possible to select the smart object we want the robot to interact with just by "thinking it". This will extend the possibility of interacting with a domotic house or an intelligent environment also to people with severe disabilities, such as those caused by amyotrophic lateral sclerosis.

References

1. Lee J, Hashimoto H (2002) Intelligent space: concept and contents. Adv Robot 16(3):265–280

2. Kim J, Kim Y, Lee K (2004) The third generation of robotics: ubiquitous robot. In: Proc of the 2nd int conf on autonomous robots and agents, Palmerston North, New Zealand

3. Dressler F (2006) Self-organization in autonomous sensor and actuator networks. In: Proceedings of the 19th IEEE int conf on architecture of computing systems

4. Saffiotti A, Broxvall M, Gritti M, LeBlanc K, Lundh R, Rashid J, Seo B, Cho Y (2008) The PEIS-ecology project: vision and results. In: IROS 2008: intelligent robots and systems. North-Holland Publishing Co., Amsterdam, The Netherlands, pp 2329–2335

5. Zanca G, Zorzi F, Zanella A, Zorzi M (2008) Experimental comparison of RSSI-based localization algorithms for indoor wireless sensor networks. In: REALWSN '08: proceedings of the workshop on real-world wireless sensor networks. ACM, New York, NY, USA, pp 1–5

6. Viola P, Jones M (2001) Robust real-time object detection. In: Second international workshop on statistical and computational theories of vision

7. Lowe DG (2004) Distinctive image features from scale-invariant keypoints. Int J Comput Vis 60:91–110

8. Sivic J, Zisserman A (2003) Video Google: a text retrieval approach to object matching in videos. In: Proceedings of the international conference on computer vision


9. Nister D, Stewenius H (2006) Scalable recognition with a vocabulary tree. In: Proceedings of the IEEE computer society conference on computer vision and pattern recognition

10. Philbin J, Chum O, Isard M, Sivic J, Zisserman A (2007) Object retrieval with large vocabularies and fast spatial matching. In: Proceedings of the IEEE computer society conference on computer vision and pattern recognition

11. Meger D, Forssén P, Lai K, Helmer S, McCann S, Southey T, Baumann M, Little J, Lowe D, Dow B (2007) Curious George: an attentive semantic robot. In: IROS 2007 workshop: from sensors to human spatial concepts

12. Pretto A, Menegatti E, Pagello E (2007) Reliable features matching for humanoid robots. In: 7th IEEE-RAS international conference on humanoid robots, 2007, pp 532–538

13. Menegatti E, Zanella A, Zilli S, Zorzi F, Pagello E (2009) Range-only SLAM with a mobile robot and a wireless sensor network. In: IEEE international conference on robotics and automation, 2009. ICRA '09, pp 8–14

14. Menegatti E, Danieletto M, Mina M, Pretto A, Bardella A, Zanella A, Zanuttigh P (2010) Discovery, localization and recognition of smart objects by a mobile robot. In: Simulation, modeling, and programming for autonomous robots. Lecture notes in computer science, vol 6472. Springer, Berlin, Heidelberg, pp 436–448

15. Schenato L, Fiorentin F (2009) Average TimeSync: a consensus-based protocol for time synchronization in wireless sensor networks. In: Proceedings of the 1st IFAC workshop on estimation and control of networked systems (NecSys09)

16. Dutta P, Culler D (2008) Practical asynchronous neighbor discovery and rendezvous for mobile sensing applications. In: SenSys '08: proceedings of the 6th ACM conference on embedded network sensor systems. ACM, New York, NY, USA, pp 71–84

17. Zanella A, Menegatti E, Lazzaretto L (2007) Self localization of wireless sensor nodes by means of autonomous mobile robots. In: Proceedings of the 19th Tyrrhenian international workshop on digital communications, Ischia, Italy, 9–12 Sept 2007

18. Menegatti E, Danieletto M, Mina M, Pretto A, Bardella A, Zanconato S, Zanuttigh P, Zanella A (2010) Autonomous discovery, localization and recognition of smart objects through WSN and image features. In: IEEE international workshop towards SmArt communications and network technologies applied on autonomous systems (SaCoNAS), Miami, USA

19. Costa JA, Patwari N, Hero III AO (2006) Distributed weighted-multidimensional scaling for node localization in sensor networks. ACM Trans Sens Netw 2:39–64

20. Goldsmith A (2005) Wireless communications. Cambridge University Press, New York, NY, USA

21. Bardella A, Bui N, Zanella A, Zorzi M (2010) An experimental study on IEEE 802.15.4 multichannel transmission to improve RSSI-based service performance. In: Fourth workshop on real-world wireless sensor networks (REALWSN 2010), Colombo, Sri Lanka

22. Pretto A, Menegatti E, Bennewitz M, Burgard W, Pagello E (2009) A visual odometry framework robust to motion blur. In: IEEE international conference on robotics and automation, 2009. ICRA '09, pp 2250–2257

23. Hartley RI, Zisserman A (2004) Multiple view geometry in computer vision, 2nd edn. Cambridge University Press, Cambridge. ISBN: 0521540518

24. Fischler MA, Bolles RC (1981) Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Commun ACM 24(6):381–395