
Multi-objective Exploration and Search for Autonomous Rescue Robots

Daniele Calisi, Alessandro Farinelli, Luca Iocchi, and Daniele Nardi
Dipartimento Informatica e Sistemistica
Università di Roma “La Sapienza”
Via Salaria, 113 - 00198 Roma, Italy
e-mail: [email protected], [email protected], [email protected], [email protected]

Received 8 January 2007; accepted 23 July 2007

“Exploration and search” is a typical task for autonomous robots performing in rescue missions, specifically addressing the problem of exploring the environment and at the same time searching for interesting features within the environment. In this paper, we model this problem as a multi-objective exploration and search problem and present a prototype system, featuring a strategic level, which can be used to adapt the task of exploration and search to specific rescue missions. Specifically, we make use of a high-level representation of the robot plans through a Petri Net formalism that allows representing in a coherent framework decisions, loops, interrupts due to unexpected events or action failures, concurrent actions, and action synchronization. While autonomous exploration has been investigated in the past, we specifically focus on the problem of searching for interesting features in the environment during the map building process. We discuss performance evaluation of exploration and search strategies for rescue robots by using an effective performance metric, and present an evaluation of our system through a set of experiments. © 2007 Wiley Periodicals, Inc.

1. INTRODUCTION

In recent years increasing attention has been devoted to rescue robotics, both from the research community and from rescue operators. Robots can help human operators considerably in dangerous tasks during rescue operations in several ways. Indeed, one of the main services that mobile robots can provide to rescue operators is to work as remote sensing devices, reporting information from dangerous places that human operators cannot easily and/or safely reach.

A considerable part of rescue robotics research is focused on providing robots with high mobility capabilities and complex sensing devices. Such robots are usually designed to be teleoperated during the rescue mission, and the knowledge about the mission scenario is gathered by the human operator through the displayed output of the sensors (e.g., camera images). Performance evaluation for these kinds of rescue robots is consequently focused on the ability to drive the robot through terrains of measurable complexity (e.g., random stepfields (Jacoff & Messina, 2006)), while taking into account operational constraints, such as lack of visibility of the robot in action or control by a single operator. Communication in this case must be guaranteed at all times; otherwise the robots go out of control.

Another branch of rescue robotics focuses instead on providing mobile bases with a certain degree of autonomy. Autonomous and semi-autonomous robots can process acquired data and build a high-level representation of the surrounding environment. In this mode, robots can act in the environment (e.g., navigate) through a limited interaction with the human operator. This also allows a human operator to easily control multiple robots by providing high-level commands (e.g., “explore this area,” “reach this point,” etc.). Moreover, in case of a temporary network breakdown, the mobile robotic platforms can continue the execution of the ongoing task and return to a predefined base position.

In this work, we consider autonomous exploration and search in an indoor unstructured environment. The goal is to explore and map the environment while searching for mission-relevant information that should be reported to the human operators. Such a search task is typically targeted towards the detection and analysis of “interesting” features: human victims, temperature, the presence of gas or any other substance, etc. Moreover, the relevant features should be located inside the environment; therefore the robot has to build a map of the explored area and localize itself inside the map. In the following, we refer to the above task as “exploration and search.” The expected output of “exploration and search” is a 2D map of the planar environment, annotated with interesting features (e.g., victims). We also assume the environment to be static.

More specifically, we focus on multi-objective exploration and search. In fact, in rescue missions, heterogeneous sensing devices are often used for the map building system and for the victim detection system, and the two processes have different and, sometimes, conflicting goals. For example, the map building process must be accurate and fast: a laser can accurately acquire information from a long distance (up to 80 m). On the other hand, victim detection can be very demanding from a computational point of view and it requires a very short range of operation.

Autonomous exploration has been deeply investigated in the mobile robot literature (Latombe, 1991; Stachniss, Grisetti, & Burgard, 2005; Yamauchi, 1997; Jin, Liao, Polycarpou, & Minai, 2004). Most approaches, however, do not consider the problem of searching for interesting features inside the environment while doing the exploration. On the other hand, several approaches have been proposed for searching the environment, using coverage techniques (see (Choset, 2001) for a survey) or aggregation-based techniques (DasGupta, Hespanha, Riehl, & Sontag, 2006). However, such approaches generally assume a known environment and do not address uncertainty in robot actions and perceptions.

The aim of the present paper is to introduce a novel approach to the multi-objective exploration and search problem, based on an explicit high-level representation of the robot plans, and to assess its performance through a systematic evaluation.

Our approach is based on a strategic level component, which is in charge of monitoring and driving the search and exploration process. Specifically, we model the strategic level using a formalism called Petri Net Plans (PNP) (Ziparo & Iocchi, 2006), which is based on Petri Nets (Peterson, 1981; Murata, 1989), a graphical modeling language for representing dynamic systems. The PNP formalism allows for an effective representation of concurrent processes and sensing actions, as well as interrupts and failures, which are needed to promptly react to new events and to handle the faults due to the high uncertainty of the available information. The introduction of a strategic level allows for an effective and flexible solution to the multi-objective exploration problem, as opposed to approaches based on the optimization of a weighted function (e.g., (Stachniss et al., 2005)).

The powerful formalism used for the strategic level allows specifying complex strategies that can address and resolve the conflicting goals of the different subsystems, while easily incorporating possible a priori knowledge about missions and environments into the system. For example, consider the case where a rescue system enters a building knowing that most of the people inside it are concentrated in one room (without knowing the exact map of the building and, therefore, without knowing how to reach such a room). The system should quickly explore the building; as soon as a relevant number of people is detected, the mapping process should be given less priority and the system should focus on precisely assessing the status of the found people. Such an exploration strategy is very hard to encode by writing down a function to optimize, while it can be easily expressed using our strategic level.

The evaluation of the system is based on quantitative experiments in different operative scenarios using a Pioneer 3-AT mobile base equipped with a laser range finder and a pan-tilt sensor unit including a stereo camera and a thermo sensor. A 3D simulator of rescue scenarios (USARSim, http://usarsim.sf.net) has also been used in the experiments. The aim of the experiments is to show that the introduction of a strategic level allows for high flexibility in defining different strategies for rescue missions and for an easy integration of available a priori knowledge about the operational scenario.

The metric used to evaluate the system takes into account both the need to be easily applicable to real systems and the requirement of being totally objective (i.e., not requiring the definition of arbitrary weighting factors to account for the different types of information to be gathered from the environment). Consequently, unlike other metrics (for example, those used in the RoboCup Rescue competitions), we propose a metric based on a set of functions, each one measuring the ability of the system to discover information of a particular kind, without trying to merge these results into a single value by weighting the importance of each feature. This metric does not directly compare two systems/strategies when it is not evident that one outperforms the other, i.e., when the result would be situation dependent. In these cases, evaluation by human operators seems best suited.

The paper is organized as follows. In the next section we present related work and compare our solution with previous approaches. Section 3 describes our proposal for multi-objective exploration and search; Section 4 describes our robotic system and implementation details about the software modules that implement mapping, localization, navigation, and victim detection. Section 5 describes the experimental setting and the performance evaluation of the proposed system. Finally, Section 6 draws the conclusions.

2. RELATED WORK

The exploration and search problem in rescue missions is the main focus of the RoboCup Rescue competitions (www.rescuesystem.org/robocuprescue/), and therefore all the participating teams have to deal with this problem and have implemented specific solutions. In the case of tele-operated robots, the multi-objective exploration and search is solved by the human operator, who decides the activities of the robot. The teams that have implemented autonomous behaviors usually adopt a strategy that prefers victim analysis to map exploration, also according to the specific scoring system used in the competitions. Therefore, to the best of our knowledge, a formalization of the problem as a multi-objective search task and a systematic analysis of the developed solutions have not been presented. In addition, there are several works in the literature addressing exploration and search problems, but not considering some relevant issues encountered in rescue missions.

The exploration problem is usually seen as an optimization problem, in which the expected total time of the exploration or a related value has to be optimized (e.g., the information gain has to be maximized, the total travel cost minimized, etc.). Several approaches address the problem by defining utility functions for the choice of the next position to travel to. In these utility functions, one has to consider both the costs of the actions and their information gains. Typically, one can take into account the minimization of the total time along with other features, such as map precision, and include them in the utility function. For example, Yamauchi (Yamauchi, 1997) proposes a frontier-based exploration, where the robot chooses the next point to reach based on its distance from the available frontiers. A frontier is a part of the environment which divides explored from unexplored space. Following such an approach, the robot tries to minimize the total exploration time. González-Baños and Latombe (González-Baños & Latombe, 2002) propose an extension of the next best view (NBV) method applied to the exploration task. The proposed approach extends previous techniques (e.g., (Pito, 1996)) developed for computer vision, considering issues specific to mobile robots (e.g., localization and obstacle avoidance).

While the task of exploring an unknown environment with the goal of building a map can be suitably modelled as an optimization problem, as presented above, exploration and search usually entails a multi-objective optimization (Coello, 1999). Several approaches address the problem by defining a single utility function, which depends on several parameters to optimize. For example, Stachniss et al. (Stachniss et al., 2005) propose an approach that combines mapping, localization, and exploration. Their method evaluates the cost and information gain of the actions that the robot has to perform and combines them with a weighted function to be maximized. The work by Mathews and Durrant-Whyte (Mathews & Durrant-Whyte, 2006) addresses a similar problem. The authors focus on the control of multiple platforms involved in an information gathering task. The goal of the approach is to find the optimal joint actions for the mobile platforms in order to minimize the entropy of the probability distribution over the system state. The authors apply their approach to a distributed indoor mapping scenario. In both of these approaches, actions are chosen mostly for the construction of a precise and accurate map, and other kinds of interesting features to be analyzed and reported on the map are not considered.

On the other hand, Amigoni and Gallo (Amigoni & Gallo, 2005) address the multi-objective nature of the exploration problem by considering a Pareto-efficient approach: instead of finding an optimal solution for a weighted function, they suggest finding a solution that is good enough for all the measures to be optimized.

Also, the work by Jin et al. (Jin et al., 2004) explicitly addresses the trade-off between search and response to interesting targets. The work focuses on a team of UAVs searching for moving targets. Some of the targets are known beforehand and others appear dynamically. UAVs should thus search for possible new targets, while at the same time acting in response to already found ones. This trade-off is addressed using a hybrid algorithm that switches between an instantaneous task assignment technique, which considers only the current world state in the task assignment process, and a predictive task assignment technique, which tries to predict future states of the targets. The authors show that the hybrid algorithm successfully addresses the mentioned trade-off; however, the approach seems to be strongly dependent on the reference scenario. In particular, the determination of the threshold for switching between the two task assignment techniques is performed using a heuristic based on the results obtained in previous simulations.

The problem of searching for an object with a searcher that has limited perception but is given an a priori probability distribution of the target position is addressed by DasGupta et al. (DasGupta et al., 2006). This problem amounts to defining a path that maximizes the probability of finding the target. The authors use an aggregation-based technique to approximate the problem and solve it in polynomial time. However, the results apply only to a scenario where the map is given beforehand and a probability function of the target position is known.

All the approaches mentioned above implement multi-objective exploration either as a numeric optimization process or with a task-specific solution. The use of an explicit high-level representation of the information and of the plan to be executed has not been addressed.

On the other hand, multi-objective optimization can be addressed by making use of Markov Decision Processes (MDPs, POMDPs) (Pineau & Gordon, 2005). Markov Decision Processes allow for a high-level representation of actions and strategies and for devising optimal policies (i.e., sequences of actions) to be executed, maximizing a reward function defined over the domain. Such approaches, however, are highly computationally demanding, especially for complex environments, where several actions and states have to be modeled. These approaches can address multi-objective problems only by including each objective reward (using weighting factors) in the global reward function. Moreover, in this paper we are not interested in determining optimal policies, but in providing a flexible and efficient way of implementing exploration and search strategies.

In this paper, we adopt an approach to multi-objective exploration based on an explicit high-level representation of the strategy to be implemented during the mission. The use of Petri Net Plans allows for an expressive representation language and an efficient implementation. In addition, the ability to represent in a qualitative way the knowledge acquired during the mission, the decision conditions that are used to implement a multi-objective strategy, and a priori knowledge about the rescue mission is fundamental for an effective implementation of autonomous rescue robots. Such a high-level representation has the following advantages over a numerical optimization technique: (1) it does not require the definition of empirical numerical values to guide the exploration and search task; (2) it allows for an easy characterization of the strategy to be adopted in a given scenario; (3) it allows for an easy injection of a priori knowledge about the mission, which is usually given in qualitative terms.

3. HIGH-LEVEL REPRESENTATION OF MULTI-OBJECTIVE EXPLORATION AND SEARCH STRATEGIES

Autonomous robots performing exploration and search tasks must be able not only to build a map of the environment and to localize themselves with respect to it (SLAM), but also to decide the sequences of movements needed to explore, in an optimal way, the whole environment.

It is useful to characterize exploration as a next-best-view (NBV) problem (González-Baños & Latombe, 2002), i.e., computing a sequence of sensing positions based on the data acquired at previous locations. An optimal NBV algorithm moves the robot to positions where it can achieve the maximum information gain (given the current knowledge of the environment). The algorithm structure is given by the following steps (a minimal sketch of this loop is given after the list):

1. select or choose the next view point;
2. navigate or move the sensor to that view point;
3. acquire information from the sensor;
4. integrate this information with the global knowledge of the object/environment.
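As a concrete illustration of this generic loop, the sketch below encodes the four steps as a plain control loop. It is only an illustrative skeleton written for this text, not the actual system: the step functions, their signatures, and the stop criterion are placeholder assumptions.

```python
# Minimal sketch of the generic NBV loop described above. The step functions
# are passed in as parameters; their names and the stop criterion are
# illustrative assumptions, not part of the actual system.

def nbv_loop(select_next_view, move_to, sense, integrate, max_steps=100):
    for _ in range(max_steps):
        view_point = select_next_view()   # 1. choose the next view point
        if view_point is None:            #    stop when nothing informative is left
            return
        move_to(view_point)               # 2. navigate / move the sensor there
        observation = sense()             # 3. acquire information from the sensor
        integrate(observation)            # 4. integrate it into the global knowledge

# Toy usage with stub functions: visit three predefined view points.
pending = [(1, 0), (2, 1), (3, 3)]
world = []
nbv_loop(select_next_view=lambda: pending.pop(0) if pending else None,
         move_to=lambda p: None,
         sense=lambda: "scan",
         integrate=world.append)
print(world)   # ['scan', 'scan', 'scan']
```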

This process does not take into account multi-objective searches. When the exploration task and the search task are performed simultaneously, it is often convenient to interrupt the action to reach a given target if a better (e.g., more promising, closer) one is detected during navigation. For example, in a rescue mission, it is possible to notice a victim while reaching the next frontier for exploration and map building. In this case, we would like the robot to get information about the victim as soon as possible, possibly before reaching the current frontier target. The NBV strategy, as stated above, does not implement such a behavior, since decisions are taken only when the current target has been reached.

Moreover, the obvious approach of reconsidering the choice of the next target at each cycle (or with very high frequency) can be unfeasible due to the computational cost and the uncertainty in robot perception, which may cause instability of behavior and a consequent loss of performance. In addition, one needs to face unexpected navigation failures, which can often happen in a rescue mission.

In this section we present a highly flexible mechanism for realizing the decision level of an autonomous robot involved in a multi-objective exploration and search task. In particular, we describe an approach that provides a very flexible way to decide (1) when it is necessary to reconsider decisions and (2) how to choose among different targets.

The basic structure of our exploration and search strategy is inspired by the general NBV algorithm structure and is divided into two steps: (1) detection, evaluation, and selection of target candidates and (2) navigation towards the chosen target. However, effective implementations of complex exploration and search strategies are achieved by exploiting the high flexibility of the high-level strategic component, which allows for explicitly modelling interrupt conditions and reactions to navigation failures.

To this end, we have chosen to explicitly model the strategic level by developing a Plan Executor Module based on Petri Net Plans. Petri Net Plans exploit the Petri Net ability to represent concurrent execution of discrete state systems and are thus more powerful than representations based on finite state machines. In particular, PNP allows for the representation of different kinds of actions (ordinary and sensing actions) as well as many standard control structures: conditional execution, loops, concurrent execution, interrupts, communication and synchronization actions for multi-agent systems, etc.

The use of PNP provides the developer with an easy-to-use high-level formalism for defining complex behaviors, with graphical tools for writing and debugging plans, with verification tools to check consistency of the plans with respect to the intended behaviors, and with simulations of plan execution that are used to test correct behavior. These are important advantages with respect to hard-coding the robot behaviors: they speed up development time and increase implementation correctness and system reliability.

While a detailed presentation of the Petri Net Plan formalism is outside the scope of the present paper (see (Ziparo & Iocchi, 2006) for details), we provide intuitions of the PNP structures and present examples of multi-objective exploration and search strategies developed with this formalism.

One instance of such plans is depicted in Figure 1. All the actions begin with a start transition, have an execution state exec, and terminate with an end, a fail, or an interrupt transition. The token contained in the place P1 indicates the initial marking, representing the beginning of the execution. This token is then propagated through the places of the network as specified by the transitions, which can be fired when the input places are active, i.e., when they contain at least one token, and the attached conditions (written between square brackets) are true.
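The firing rule just described (a transition fires when all of its input places hold a token and its attached condition is true) can be summarized by the following self-contained sketch. It is not the actual PNP executor; the data structures and the example transition name are invented here purely for illustration.

```python
# Minimal sketch of the Petri net firing rule used by the plan executor:
# a transition fires when every input place holds at least one token and
# its attached condition evaluates to true. Illustrative only.

class Transition:
    def __init__(self, name, inputs, outputs, condition=lambda: True):
        self.name = name
        self.inputs = inputs      # names of input places
        self.outputs = outputs    # names of output places
        self.condition = condition

def enabled(marking, t):
    return all(marking.get(p, 0) > 0 for p in t.inputs) and t.condition()

def fire(marking, t):
    """Consume one token from each input place, add one to each output place."""
    for p in t.inputs:
        marking[p] -= 1
    for p in t.outputs:
        marking[p] = marking.get(p, 0) + 1

# Tiny example: start a "computeTargets" action from the initial place P1.
marking = {"P1": 1}
start = Transition("computeTargets.start", ["P1"], ["computeTargets.exec"])
if enabled(marking, start):
    fire(marking, start)
print(marking)   # {'P1': 0, 'computeTargets.exec': 1}
```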

The plan begins with a computeTargets action. This action performs three activities: (1) it computes map frontiers, (2) it evaluates candidate features to be analyzed, and (3) it chooses one of them as the next target position. The next action, getTargetType, is used as a conditional construct to guide the plan according to the target type. If a possible victim has been chosen as the target, the subplan navigateToCandidateVictim is performed; otherwise the subplan navigateAndSearch is executed. In this plan, an interrupt transition is present in the execution step of navigateAndSearch. It is controlled by the condition newVictimFound and is used to switch from the navigateAndSearch behavior to the navigateToCandidateVictim one. In other words, it allows for implementing a strategy where victim analysis is preferred to map exploration.

The subplan navigateToCandidateVictim, shown in Figure 2, starts two parallel actions, pantiltSearch and navigate, using the fork operator before their activation. While the robot navigates to the chosen target, the pan-tilt unit moves, searching for new interesting places (using analysis of data coming from the thermo and stereo sensors, which is explained in the next sections). When the robot approaches the victim, the pantiltSearch action is stopped and the stereo sensor is pointed towards the candidate victim (action pantiltPoint). When the target position is reached, both the navigate and pantiltPoint actions are stopped: this allows for a more stable acquisition of data and for using all the computational power required by the complex analysis of 3D points extracted with the stereo sensor (action analyzeVictim). The subplan terminates either after the victim analysis action or in the presence of navigation failures. In the latter case, action synchronization is used to interrupt the pan-tilt search action as well. Notice that, when this subplan terminates, the main plan loops to compute a new target and go to it.

4. SYSTEM DESCRIPTION

In this section, we briefly explain some implementation details of our robotic system for autonomous search and exploration. Our system has been validated through a set of experiments performed both in simulation and in real rescue arenas.

Figure 1. One of the plans used in the experiments, called “Victims First.”


It is worth pointing out that we use the same software modules and the same strategic plans both in the simulated and in the real scenarios. We modeled our world in the simulated environment to be identical to the real one; in this way, the comparison between the behavior in the two environments is straightforward. In the following subsections we provide an overview of the system and of the components that are necessary to perform a search and rescue mission. Moreover, we detail important issues regarding the modeling of our rescue scenario in USARSim.

4.1. Robotic System

Our robot is a wheeled mobile robotic base, a Pioneer 3-AT, equipped with a laser range finder. An on-board computational unit is responsible for running all the software modules needed by the robot. Since moving the robot can be very time consuming, the sensors used for feature detection (a stereo camera and a nontouch thermo sensor) are mounted on a pan-tilt unit. In this way, while the robot is moving, these sensors can point towards targets that are not in front of the robot.

The possible targets to be explored are computed using three different algorithms, which aim at reaching three different (and concurrent) goals:

• The first goal is to build the map of the environment using laser range finder data; therefore, the first algorithm computes frontiers between free explored space and unknown portions of the map using a wavefront-expansion-like algorithm (Latombe, 1991). The algorithm computes frontiers incrementally; as pointed out in (Yamauchi, 1997), since frontiers move while the robot acquires new information about the environment, the method will eventually guide the robot to explore all of the accessible environment (although a Zeno-like paradox could in principle prevent the map from being fully explored, this is very unlikely to happen). A minimal sketch of this frontier computation is given after this list.

• The second goal is to find “interesting” features in the environment: in our implementation, we assume that the interesting features are always intercepted by a laser scan. Therefore, we need to search for victims only along the obstacles detected by the laser range finder, thus avoiding unnecessary search in open spaces.

• A search algorithm scans the areas given by the method above and returns a list of possible features. These need to be analyzed in order to be confirmed or declared false positives. This analysis is computationally intensive and requires the robot to stand still; this list of possible features represents the third set of candidate targets.
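The following sketch illustrates the frontier computation mentioned in the first item above: on an occupancy grid, a frontier cell is a free cell adjacent to unknown space. The grid encoding and the 4-neighborhood are assumptions made here for illustration; the actual system computes frontiers incrementally with a wavefront-expansion-like algorithm, whereas this batch version only conveys the idea.

```python
# Minimal sketch of frontier detection on an occupancy grid (assumption:
# 0 = free, 1 = occupied, -1 = unknown). A frontier cell is a free cell with
# at least one unknown 4-neighbor.

def frontier_cells(grid):
    rows, cols = len(grid), len(grid[0])
    frontiers = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != 0:          # only free cells can be frontiers
                continue
            neighbors = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
            if any(0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == -1
                   for nr, nc in neighbors):
                frontiers.append((r, c))
    return frontiers

example = [
    [0, 0, -1],
    [0, 1, -1],
    [0, 0,  0],
]
print(frontier_cells(example))   # [(0, 1), (2, 2)]
```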

In addition, the search and exploration subsystem relies on other modules, which are common among robotic systems and represent the low-level components that allow the robot to realize the decisions taken by the strategic level.

Figure 2. The subplan navigateToCandidateVictim.


These modules are integrated in a framework that enables concurrent processes and communication among them (details can be found in (Farinelli, Grisetti, & Iocchi, 2006)). In the following we provide a brief description of the most important software modules.

• The SLAM Module deals with map building and localization at the same time. We use a scan matching algorithm for on-line localization in conjunction with an occupancy grid for map building. The map built using such an approach is locally consistent and the accumulated error is small enough to allow for safe navigation, good-quality maps, and quite precise victim localization.

• The Navigation Module enables the robot to reach a target position; a coarse “topological” path-planner computes a path and a lower-level module is in charge of maneuvering the robot to follow the provided path (see (Calisi, Farinelli, Iocchi, & Nardi, 2005) for details). The lower-level module uses two modalities: an accurate motion planner based on Randomized Kinodynamic Trees (see (LaValle & Kuffner, 1999)) to maneuver the robot in cluttered spaces, and a fast reactive navigation module based on the Vector Field Histogram that is able to steer the robot at high speed when far from obstacles. A sketch of this mode selection is given after this list.

• The Human Body Detection Module (HBD) performs two processing steps. The first step is fast, but not accurate, and is used to identify interesting places where a human body could be located. The main sensor used in this step is a nontouch temperature sensor. The second step uses temperature information, stereo vision, and a human body model database (see (Bahadori Ghouchani, 2006) for details). It is very computationally intensive and slow, but provides good accuracy. For this reason, it should be activated only in those areas that are declared interesting by the first step; moreover, its range of operation is short (between 0.5 and 1.5 m).
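The switch between the two navigation modalities described in the Navigation Module item can be pictured with the sketch below. The paper does not state the exact switching criterion, so the clearance-based rule and the 0.8 m threshold are assumptions introduced only to illustrate the idea of using the accurate planner in clutter and the fast reactive controller in open space.

```python
# Illustrative sketch of switching between the two navigation modalities:
# an accurate kinodynamic planner in cluttered space and a fast VFH-style
# reactive controller when far from obstacles. The clearance threshold is an
# assumption, not a value taken from the paper.

CLEARANCE_THRESHOLD_M = 0.8

def choose_navigation_mode(min_obstacle_distance_m):
    """Return which lower-level navigation modality to use."""
    if min_obstacle_distance_m < CLEARANCE_THRESHOLD_M:
        return "kinodynamic_planner"   # accurate maneuvering in clutter
    return "vfh_reactive"              # high-speed reactive steering

print(choose_navigation_mode(0.3))   # kinodynamic_planner
print(choose_navigation_mode(2.5))   # vfh_reactive
```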

4.2. Simulated Robotic System

The experiments in the simulated scenario are performed using USARSim, a 3D simulation environment based on a game engine. This framework allows for a realistic modeling of robots, sensors, and actuators, as well as complex, unstructured, dynamic environments. Recently, it has been used to measure the performance of robotic platforms and multi-robot systems in USAR (Urban Search And Rescue) scenarios (Wang, Lewis, & Gennari, 2003; Balakirsky, Scrapper, Carpin, & Lewis, 2006).

In USARSim, we can use exactly the same software modules used on the real robot (SLAM, Navigation, etc.) by replacing the interfaces to the real sensors (drivers) with interfaces to the simulator.

The simulation environment provides models of our mobile base and of the pan-tilt unit. It also includes models of laser range finders, odometry, and sonar sensors. The only missing device is the nontouch temperature sensor. However, we are not interested in using the simulator to test and evaluate the HBD system; we rather want the simulated system to behave similarly to the real robot when executing the exploration process. To this end, in the simulator, we have used RFID tags located near victims and put an RFID reader on the robot. When an RFID tag is detected by the reader, the robot receives data similar to those provided by the thermo sensor on the real robot, i.e., a position in the environment where a possible victim was detected. The second step (i.e., victim confirmation) is replaced by requesting the simulated robot to approach the victim and take a snapshot of the scene.

5. EXPERIMENTS

A set of experiments has been performed in order to show and analyze the effectiveness of using the high-level strategic module. More specifically, we have evaluated the ability to introduce some qualitative a priori knowledge about the mission into the system. In the remainder of this section, we first present the performance metric used to evaluate the different strategies and then show the results in four different scenarios.

5.1. Performance Metrics

An important aspect of the search and rescue tasks described in this article is the definition of effective metrics for measuring the performance of autonomous robotic systems.


The goal is to choose the actions that gain the maximum information in the minimum time (or with minimum energy). Performance metrics can be divided into two groups: measures of the time (or energy) needed to acquire all the information from the environment, and measures of the information acquired with limited (time or energy) resources.

The first group considers a complete discovery of all the information from the environment and measures the time (or energy) needed to accomplish such a task. This is a good and objective metric, but it requires that the task be completely finished, and it cannot be used to evaluate partial executions of the task. Notice that in many cases a complete discovery of the information from the environment is not feasible, due to time or resource constraints in large environments (e.g., battery life), or to the inability of the robot to detect some features.

Another set of metrics determines a bound on the resources available to the robot and measures the performance of the search task once the limit has been reached. This bound can be given both in terms of time and in the form of some kind of energy consumed by the robot (which can be approximated by limiting the total path the robot can drive along). This metric is easily applicable to real systems, since it takes into account intrinsic limitations on system resources, but it requires measuring how much information has been gathered from the environment at a given point. This is not easy in the presence of multiple objectives (e.g., a map to be explored and victims to be found), and in many cases (see, for example, the evaluation function used within the RoboCup Rescue competitions (Jacoff, Weiss, & Messina, 2003); http://www.isd.mel.nist.gov/projects/USAR/competitions.htm) it requires the introduction of weighting factors that make the measure not objective.

In order to use an objective performance metric that is applicable to real robotic systems, in the following we will consider several functions, one for each feature that must be discovered and measured from the environment.

More specifically, our performance metric is given by a set of functions

$$V = \{\, v_i(t) \mid i = 1, \ldots, n \,\}, \qquad (1)$$

one for each feature i to be measured from the environment. Each function v_i(t) → [0, 1] measures the percentage of information related to feature i that has been acquired at time t. For example, in the case of map building, v_i(t) is the amount of explored map with respect to the total size of the environment, while in the case of victim detection it is the accuracy in determining the number of discovered victims versus the total number of victims present in the environment. More specifically, for victim detection the evaluation function is

$$v_i(t) = \max\!\left(0,\; 1 - \frac{\lvert \mathit{found\_victims} - \mathit{total\_victims} \rvert}{\mathit{total\_victims}}\right),$$

where found_victims counts both candidate and confirmed victims. This function guarantees that v_i is always between 0 and 1, even in the presence of false positives.
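The two evaluation functions above translate directly into the short sketch below. The variable and function names are introduced here for illustration; the map-coverage function is written from the textual description (explored area over total area).

```python
# Sketch of the evaluation functions described above. Each v_i(t) lies in [0, 1].

def v_map(explored_area_m2, total_area_m2):
    """Fraction of the environment explored at time t."""
    return explored_area_m2 / total_area_m2

def v_victims(found_victims, total_victims):
    """Victim-detection accuracy: 1 - |found - total| / total, clipped at 0.
    'found_victims' counts both candidate and confirmed victims, so the value
    stays in [0, 1] even in the presence of false positives."""
    return max(0.0, 1.0 - abs(found_victims - total_victims) / total_victims)

# Example: 2 of 3 victims reported, and 21 m^2 explored out of a 35 m^2 arena.
print(v_victims(found_victims=2, total_victims=3))        # 0.666...
print(v_map(explored_area_m2=21.0, total_area_m2=35.0))   # 0.6
```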

This metric has the advantage of being easily applicable to real systems; moreover, it does not require the definition of weighting factors. However, it does not allow for a direct comparison between different strategies or different systems in cases where the individual functions of one system/strategy do not all outperform those of the other.

5.2. Experimental Scenarios

Experiments have been performed in four different scenarios that require different mission strategies to accomplish the goal. In the following subsections we describe each scenario and the particular strategy implemented for it. Moreover, we discuss the performance of our approach with respect to the particular goal of the mission. All the experiments have been performed within a limited time (10 min), and we measured the values of information v_i(t) for the interesting features: v_1(t) measures the explored map, v_2(t) the candidate victims, v_3(t) the confirmed victims, and v_4(t) (used only in Scenario 2) the fire detection.

In Scenarios 2–4, qualitative a priori knowledge about the mission is provided. This is taken into account in our approach by modifying the high-level Petri Net Plan in order to better fit the desired behavior.

In Scenarios 1 and 2, the implemented strategies are compared with a greedy policy (we call it “Simplest”) that uses a utility function to decide the next position to explore. This strategy can interrupt the navigation to the current goal if another candidate target with a higher utility value has been found.


The utility function computes the path distance to each candidate target and then multiplies this value by a fixed weighting coefficient that characterizes the importance of the target type.
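A possible reading of this greedy policy is sketched below. The paper states only that the utility combines the path distance to a candidate with a fixed per-type importance coefficient and that navigation is interrupted when a candidate with a higher utility appears; the specific functional form weight / (1 + distance) and the numeric weights used here are illustrative assumptions, not the formula of the actual system.

```python
# Sketch of the greedy "Simplest" policy: pick the candidate target with the
# highest utility and interrupt navigation when a better candidate appears.
# The functional form and the weights below are assumptions for illustration.

TYPE_WEIGHTS = {"frontier": 1.0, "candidate_victim": 5.0}  # assumed importance

def utility(target_type, path_distance_m):
    return TYPE_WEIGHTS[target_type] / (1.0 + path_distance_m)

def choose_target(candidates):
    """candidates: list of (target_type, path_distance_m, position)."""
    return max(candidates, key=lambda c: utility(c[0], c[1]))

def should_interrupt(current, new_candidate):
    """Interrupt navigation when a candidate with higher utility shows up."""
    return utility(new_candidate[0], new_candidate[1]) > utility(current[0], current[1])

current = choose_target([("frontier", 4.0, (2, 3)), ("frontier", 6.0, (5, 1))])
print(should_interrupt(current, ("candidate_victim", 5.0, (4, 4))))  # True
```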

Scenarios 3 and 4 are more complex, and the a priori knowledge cannot simply be expressed by defining a function to optimize. Therefore, in these scenarios, we show how the high-level representation of the robot plan can be suitably modified in order to take complex qualitative knowledge into account.

The experiments have been performed using both our rescue robot operating in a test-bed arena and a simulated robot in USARSim. More specifically, the experiments of Scenarios 1 and 2 have been performed both in the real and in the simulated environment, while those of Scenarios 3 and 4 have been performed only in simulation. For each experiment, ten runs have been performed (both with the real robot and in the simulator) starting from the same initial configuration of the environment and of the robot. The results reported below are averaged over these ten runs.

As described below, the experiments in the simulated scenarios are performed using two different maps (one for Scenarios 1 and 2 and one for Scenarios 3 and 4), with two different robot starting poses and victim configurations for each of them (the simulated scenarios, robot configurations, and results are available for experimental comparison with other approaches).

5.2.1. Scenario 1

In the first scenario, the goal is to find as many victims as possible with no a priori knowledge about the environment. For this scenario we have performed experiments in both the virtual and the real environments, aiming at assessing the basic behaviors of the system. The arena is built following the RoboCup Rescue Yellow Arena specifications: it is planar and has a maze-like structure. The map used in the real and the simulated environment is the same and is shown in Figure 3. In both cases there are three victims in an area of 7 × 5 m². The robot goal is thus to find and confirm all potential victims. The plan that determines the strategy for this scenario is the one already described in Section 3: the robot stops the exploration as soon as a new potential victim has been found and then moves to a location suitable for the confirmation step. We call this strategy “Victims First.”

The results of the experiments are shown in Figure 4, where we report, for each time step, the number of victims detected (both potential and confirmed), the number of victims confirmed, and the portion of the total map explored. As already mentioned, the reported values are averaged over a set of ten experiments. The left graph refers to the experiments in the real scenario and can be compared with the right one, which shows the results in the simulated environment. The results of the two sets of experiments show a similar trend of the evaluation functions, indicating a consistent behavior in the two environments.


Figure 3. The robot in the simulated and real arenas used for the experiments in Scenarios 1 and 2.


In fact, it is possible to see that victims are analyzed as soon as they are discovered. Notice that, while the trend of the functions is similar, their values present some differences, which are due to the different behavior of the simulator with respect to the real environment.

This strategy has also been compared with the “Simplest” one, configured so that the importance factor for confirming a candidate victim is much higher than the others (i.e., exploring other parts of the environment and finding other candidate victims). The two strategies have the same behavior, hence the graphs are very similar to the ones above.

5.2.2. Scenario 2

In this scenario we introduce a new feature that represents the source of the incident (it can be the source of a fire, a chemical agent, etc.). There is only one such feature, and the priority of discovering it is considered higher than finding the victims, since it may still represent an active threat. For the simulated experiments, we used a particular kind of RFID tag to model this feature, while in the real experiments a heat source with a very high temperature (about 100 °C) is used. In the real system, fire detection is performed by the same software module that is used for detecting potential victims, using a different interpretation of the data coming from the thermo sensor. In both cases, we assumed that a further detection step to confirm this observation is not needed.

The strategy implemented for this scenario (called “Fire First”) is a slight variation of the one described in the previous subsection. We want the robot to find the threat as soon as possible, avoiding the time-consuming analysis of candidate victims during the search. Therefore, potential victims are not considered when computing the next target to explore or search (in the computeTargets action of Figure 1), though we may still find new potential victims. As soon as the fire has been found, the priority of the strategy changes and, afterwards, the robot can focus on analyzing potential victims (i.e., those previously found and new ones as soon as they are detected).

To this end, we changed the computeTargets action by transforming it into the subplan shown in Figure 5. Here a conditional action checks whether the fire has already been found and selects one of two different instances of the computeTargets action: the first considers victims to be analyzed, while the second does not.
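The effect of this conditional can be summarized by the brief sketch below; the function and argument names are placeholders standing in for the two computeTargets instances of the subplan.

```python
# Sketch of the conditional in the "Fire First" computeTargets subplan: until
# the fire has been found, candidate victims are ignored when choosing the
# next target; afterwards they are considered again. Names are illustrative.

def compute_targets(frontiers, candidate_victims, fire_found):
    if not fire_found:
        return list(frontiers)                         # search for the threat only
    return list(frontiers) + list(candidate_victims)   # victims analyzed again

print(compute_targets(["f1", "f2"], ["v1"], fire_found=False))  # ['f1', 'f2']
print(compute_targets(["f1", "f2"], ["v1"], fire_found=True))   # ['f1', 'f2', 'v1']
```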

The “Simplest” strategy for this scenario has been configured by ordering the importance factors of the features as follows: detecting the fire has higher priority than confirming victims and exploration. The results of the set of experiments in the real environment are shown in Figure 6. The ones from the simulated environment have a similar trend and are thus omitted. The event of the discovery of the fire is also included in the graphs.

Figure 4. Results of the experiments in Scenario 1: (left) results in the real scenario using the “Victims First” strategy; (right) results of the same strategy in the simulated scenario.


In a single experiment, this value can be either 0 (fire not found) or 1 (fire found), but since the graph shows the average over ten experiments, intermediate values are present (for each “step” of the function measuring fire discovery, we know that at least one experiment found the fire at that time).

The results of the experiments show that the “Fire First” strategy handles the priority of the fire search effectively: the robot never stops the navigation to perform the second, time-consuming step of the human body detection algorithm before the fire is discovered. The main difference with respect to the “Simplest” strategy is that, since the latter uses a fixed utility function for the choice of the next target to explore, the system behaves as if the search for the threat remained the primary goal of the mission even after the threat has been found. In other words, with the “Simplest” strategy, the second step of the victim analysis (and the consequent confirmation) can be performed only when the whole environment has been explored and all possible threats and victims have been found, while this process can be initiated immediately after the fire discovery with the “Fire First” strategy. The outcome of this behavior is that the “Fire First” strategy confirms a higher number of victims. In this scenario, we note that a dynamic reallocation of the weights of the “Simplest” strategy would result in a behavior similar to that of “Fire First.” However, this solution does not provide the flexibility of the high-level representation.

5.2.3. Scenario 3

The goal here is to find as many victims as possible, with a priori knowledge of their location. This knowledge can be represented, for example, by a probability distribution over a metric map or by some high-level topological statement (e.g., victims are mostly in the third room on the left). Although we have such information, we do not yet know the (metric) map of the environment, which has to be built on-line.

Figure 5. The subplan computeTargets, used in the “Fire First” strategy.

Figure 6. Results of the experiments in Scenario 2: on the left the “Fire First” strategy, on the right the “Simplest” strategy, both in real environments.


The plan-based strategy here is another variation of the “Victims First” strategy, in which the computeTargets action takes the probability distribution into account when deciding the next target to explore, giving more chances to candidate targets that lie in areas where the probability distribution is higher. We call this strategy “Biased.”
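One simple way to realize this bias is sketched below, assuming the prior is available as a function over map positions. Weighting the base utility by the prior probability at the candidate's location is an illustrative choice made here, not the exact formula used in the system.

```python
# Sketch of the "Biased" target selection: candidate targets located in areas
# where the a priori victim-location probability is higher get more chances
# of being chosen. Multiplying the base utility by the prior is an assumption.

def biased_utility(base_utility, prior_probability_at_target):
    return base_utility * prior_probability_at_target

def choose_biased_target(candidates, prior):
    """candidates: list of (position, base_utility); prior: position -> probability."""
    return max(candidates, key=lambda c: biased_utility(c[1], prior(c[0])))

# Toy prior: the area with x > 5 (e.g., one particular room) is believed to
# contain most of the victims.
prior = lambda pos: 0.8 if pos[0] > 5 else 0.2
candidates = [((2, 1), 0.9), ((7, 3), 0.5)]
print(choose_biased_target(candidates, prior))   # ((7, 3), 0.5)
```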

The “Simplest” strategy, instead, cannot be used as it is; it would have to be rewritten in order to take the notion of a probability distribution into account.

The simulated scenario represents an office-like environment (see Figure 7), with a long corridor and five rooms. The a priori knowledge gives a higher probability of finding victims in one of the rooms, and the robot should concentrate its search in that room, although it is possible to find other victims elsewhere. In particular, we put four victims inside a room and one in the corridor.

The results of this experiment, performed in the simulated environment, are shown in Figure 8. In this figure, the results are compared with the outcome of the “Victims First” strategy in the same environment, in order to show how the a priori knowledge about the victim locations leads to a consistent performance improvement, i.e., almost all the victims are confirmed by exploring only a portion of the environment.

5.2.4. Scenario 4

In the last scenario, the a priori information about the mission is the knowledge that victims are probably grouped in clusters. However, we do not know anything about their locations in the map. The implemented strategy, called “Go Away from Confirmed,” follows the usual structure of the other experiments, except that, when a victim is found and confirmed, the computeTargets action no longer considers other potential victims that are in the same area. This allows the robot to search for other clusters as soon as a victim (which can be considered as the representative of its cluster) has been detected and confirmed.
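The filtering step of this strategy can be illustrated with the following sketch, which discards potential victims lying close to an already confirmed one. The notion of “same area” is not quantified in the paper, so the 2.0 m radius used here is an assumption for illustration only.

```python
# Sketch of the "Go Away from Confirmed" filtering step: once a victim has
# been confirmed, potential victims in the same area are no longer considered
# as targets. The 2.0 m "same area" radius is an illustrative assumption.

import math

SAME_AREA_RADIUS_M = 2.0

def far_from_confirmed(candidate_pos, confirmed_positions):
    return all(math.dist(candidate_pos, c) > SAME_AREA_RADIUS_M
               for c in confirmed_positions)

def filter_candidates(candidate_victims, confirmed_positions):
    """Keep only potential victims outside the area of any confirmed victim."""
    return [p for p in candidate_victims
            if far_from_confirmed(p, confirmed_positions)]

confirmed = [(3.0, 4.0)]
candidates = [(3.5, 4.2), (9.0, 1.0)]
print(filter_candidates(candidates, confirmed))   # [(9.0, 1.0)]
```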

In this scenario, we use a map similar to that of Scenario 3, in which we put five victims in one room and another one far away from them, so that we have two clusters of victims.

The results in Figure 9 show that this strategy pushes the robot to explore unknown areas as soon as it detects and confirms the first victim (remember that, since there are five victims, 20% corresponds to one victim).

Figure 7. The office-like environment used for Scenarios 3 and 4.

Figure 8. Results of the experiments in Scenario 3: on the left the “Biased” strategy, on the right the results of the “Victims First” strategy.


In this case, we are not interested in the number of confirmed victims, but in finding and confirming both the cluster of victims in the room (by finding and confirming one of its members) and the victim in the corridor, isolated from the rest. To show the correct behavior of this strategy, we also show the results of the “Victims First” strategy, which, correctly following its goal, confirms as many victims as possible. Since the room with the clustered victims is nearer than the isolated victim, the “Victims First” strategy discovers and confirms, in all experiments, the victims in the room, but does not have enough time to discover the second cluster.

6. CONCLUSIONS AND FUTURE WORK

In this paper we have investigated multi-objective exploration and search in a rescue environment. Specifically, we focused our attention on autonomous robots that can accomplish missions without direct control from the operator. We have described a solution based on a high-level framework for representing plans. In this way, it is possible to implement exploration and search strategies that depend on the features of the operational scenario and on the objectives of the mission and can be effectively applied by the rescue robot.

Different experimental scenarios have been devised to suitably evaluate the performance of exploration and search strategies. Moreover, a comparison of the proposed approach with methods based on utility function optimization has been performed. The reported results show that our formalism allows for easy implementation of different strategies that are effective in the considered rescue scenarios. Moreover, based on the experimental analysis, we argue that a proper evaluation of the performance in multi-objective search and exploration requires user-oriented criteria to assess the effectiveness of the proposed methods.

As future work, the approach could be extended by addressing more complex scenarios, where some of the assumptions made in the present work do not apply: for example, dynamic scenarios where events occur during the mission (e.g., doors are opened/closed), 3D environments (e.g., multi-level environments), the presence of moving victims, and additional features to be measured in the environment. In our opinion, more complex scenarios should further emphasize the features of the proposed approach.

Finally, the possibility of deploying teams of robots, which seems particularly attractive from a practical viewpoint, cannot simply be handled by summing the results achieved individually by the robots. Both the quality and the correctness of the information depend on how the results are combined.

Figure 9. Results of the experiments in Scenario 4: on the left the “Go Away from Confirmed” strategy, on the right the results of the “Victims First” strategy.


Specifically, the overall result of search and exploration clearly depends on whether and how the maps built by the individual robots are merged. Moreover, the search and exploration strategy is also influenced by the cooperation method adopted by the team: the analysis of the experiments and the use of more refined evaluation metrics with teams of robots must therefore include an additional dimension to account for cooperation.

REFERENCES

Amigoni, F., & Gallo, A. (2005). A multi-objective exploration strategy for mobile robots. In Proceedings of the 2005 IEEE International Conference on Robotics and Automation, Barcelona, Spain.

Bahadori Ghouchani, S. (2006). Human body detection in search and rescue missions. Ph.D. thesis, University of Rome “La Sapienza”, Dipartimento di Informatica e Sistemistica.

Balakirsky, S., Scrapper, C., Carpin, S., & Lewis, M. (2006). USARSim: Providing a framework for multi-robot performance evaluation. In Proceedings of the International Workshop on Performance Metrics for Intelligent Systems (PerMIS).

Calisi, D., Farinelli, A., Iocchi, L., & Nardi, D. (2005). Autonomous navigation and exploration in a rescue environment. In Proceedings of the 2nd European Conference on Mobile Robotics (ECMR), pp. 110–115, Edizioni Simple s.r.l., Macerata, Italy.

Choset, H. (2001). Coverage for robotics—A survey of recent results. Annals of Mathematics and Artificial Intelligence, 31, 113–126.

Coello, C. A. C. (1999). An updated survey of evolutionary multiobjective optimization techniques: State of the art and future trends. In P. J. Angeline, Z. Michalewicz, M. Schoenauer, X. Yao, & A. Zalzala (Eds.), Proceedings of the Congress on Evolutionary Computation, Vol. 1, pp. 3–13, Washington, DC. New York: IEEE Press.

DasGupta, B., Hespanha, J. P., Riehl, J., & Sontag, E. (2006). Honey-pot constrained searching with local sensory information. Journal of Nonlinear Analysis: Hybrid Systems and Applications, 65(9), 1773–1793.

Farinelli, A., Grisetti, G., & Iocchi, L. (2006). Design and implementation of modular software for programming mobile robots. International Journal of Advanced Robotic Systems, 3(1), 37–42.

González-Baños, H. H., & Latombe, J.-C. (2002). Navigation strategies for exploring indoor environments. International Journal of Robotics Research, 21(10–11), 829–848.

Jacoff, A., & Messina, E. (2006). DHS/NIST response robot evaluation exercises. In IEEE International Workshop on Safety, Security and Rescue Robotics, Gaithersburg, MD.

Jacoff, A., Weiss, B., & Messina, E. (2003). Evolution of a performance metric for urban search and rescue robots. In Proceedings of the Performance Metrics for Intelligent Systems (PerMIS) Workshop, Gaithersburg, MD.

Jin, Y., Liao, Y., Polycarpou, M. M., & Minai, A. A. (2004). Balancing search and target response in cooperative UAV teams. In Proceedings of the 43rd IEEE Conference on Decision and Control, pp. 2923–2928.

Latombe, J. C. (1991). Robot motion planning. Dordrecht: Kluwer Academic Publishers.

LaValle, S., & Kuffner, J. (1999). Randomized kinodynamic planning. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), pp. 473–479.

Mathews, G. M., & Durrant-Whyte, H. F. (2006). Scalable decentralised control for multi-platform reconnaissance and information gathering tasks. In Proceedings of the 9th International Conference on Information Fusion, pp. 1–8.

Murata, T. (1989). Petri nets: Properties, analysis and applications. Proceedings of the IEEE, 77(4), 541–580.

Peterson, J. L. (1981). Petri net theory and the modelling of systems. Englewood Cliffs, NJ: Prentice-Hall.

Pineau, J., & Gordon, G. (2005). POMDP planning for robust robot control. In Robotics Research, Springer Tracts in Advanced Robotics, Vol. 28, pp. 69–82.

Pito, R. (1996). A sensor based solution to the next best view problem. In Proceedings of the International Conference on Pattern Recognition (ICPR), pp. 941–945.

Stachniss, C., Grisetti, G., & Burgard, W. (2005). Information gain-based exploration using Rao-Blackwellized particle filters. In Proceedings of Robotics: Science and Systems (RSS), Cambridge, MA.

Wang, J., Lewis, M., & Gennari, J. (2003). Interactive simulation of the NIST USAR arenas. In Proceedings of the IEEE International Conference on Systems, Man, and Cybernetics, pp. 1350–1354.

Yamauchi, B. (1997). A frontier-based approach for autonomous exploration. In IEEE International Symposium on Computational Intelligence in Robotics and Automation.

Ziparo, V. A., & Iocchi, L. (2006). Petri Net Plans. In Proceedings of the Fourth International Workshop on Modelling of Objects, Components, and Agents (MOCA), pp. 267–290, Turku, Finland. Bericht 272, FBI-HH-B-272/06.
