

J Intell Robot Syst (2013) 71:361–389
DOI 10.1007/s10846-012-9792-4

On Confinement of the Initial Location of an Intruder in a Multi-robot Pursuit Game

Soheil Keshmiri · Shahram Payandeh

Received: 28 July 2012 / Accepted: 9 October 2012 / Published online: 1 November 2012
© Springer Science+Business Media Dordrecht 2012

Abstract Research in multi-robot pursuit-evasion demonstrates that three pursuers are sufficient to capture an intruder in a polygonal environment. However, this result requires the confinement of the initial location of the intruder within the convex hull of the locations of the pursuers. In this study, we extend this result to relax this convexity constraint through the application of a set of virtual goals that are independent of the locations of the pursuers. These virtual goals are calculated solely from the location information of the intruder, so that their locations confine the intruder within their convex hull at every execution cycle. We propose two strategies to coordinate the pursuers: the agents votes maximization strategy and the profile matrix permutations strategy. We consider the time, the energy expended, and the distance traveled by the pursuers as metrics to analyze the performance of these strategies in contrast to three different allocation strategies: the probabilistic, the leader-follower, and the prioritization coordination strategies.

S. Keshmiri (B) · S. Payandeh
Experimental Robotics Laboratory, School of Engineering Science,
Simon Fraser University, Burnaby, BC, Canada
e-mail: [email protected]

S. Payandeh
e-mail: [email protected]

Keywords Multi-robot system · Pursuit-evasion · Strategic planning · Multi-agent coordination

1 Introduction

Jankovic [9] proves that in a polygonal environment three pursuers are sufficient to capture an intruder if the initial location of the intruder is within the convex hull of the locations of the pursuers. We extend this result to demonstrate that this confinement requirement on the location of the intruder can be relaxed through the incremental computation of a set of virtual goals that are independent of the locations of the pursuers. We take the location information of the intruder to represent the isogonic point of an isosceles triangle and use it to generate this set of virtual goals; the virtual goals form the vertices of this triangle. We use variable lengths for the side and the height of this conceptual triangle to provide flexibility in the positioning of these virtual goals. The set of virtual goals reflects the relocation of the intruder at every execution cycle. This keeps the location of the intruder within the convex hull of these goals.

We equip the pursuers with a decision mechanism to cast their votes for the available virtual goals. The proposed decision engine includes an opportunistic ranking module that enables the pursuers to track the evolution of their votes throughout the pursuit mission. We study the effect of the opportunistic ranking module of the decision engine of the pursuers on the decision process. Furthermore, we study the performance of the agents votes maximization and the profile matrix permutations strategies in coordinating the robotic agents. We consider the time, the energy expended, and the distance traveled by the pursuers as metrics to analyze the performance of these strategies in contrast to three different allocation strategies: the probabilistic, the leader-follower, and the prioritization coordination strategies.

The remainder of this article is organized as follows. Section 2 provides an overview of the research in the field of multi-robot pursuit. Section 3 elaborates the generation process of a set of virtual goals. The formulation of the decision engine of the pursuers is explained in Section 4, and Section 5 presents a formal analysis of this decision engine. The agents votes maximization and the profile matrix permutations coordination strategies are presented in Section 6. Simulation results are provided in Section 7. Conclusions and some insights on future directions are presented in Section 8.

2 Literature Review

Pursuit and evasion has received special attention in multi-robot systems research. It offers a wide range of real-life applications, from gaming and the military to border patrol and other domains. Chung and Hollinger [1] classify the approaches to pursuit-evasion into two main categories: differential and combinatorial techniques. The differential approach is based on non-cooperative differential games [2]. It uses the solutions to the differential equations of motion of the players as control inputs to achieve the objective of the game. This approach further allows the physical constraints of a robot (e.g., turning velocity and acceleration) to be incorporated into these equations. However, the complexity of the differential equations is proportional to the environmental complexity, which limits their scope to locally or heuristically valid, rather than globally optimal, solutions.

Kim and Sugie [3] introduce a target-enclosing strategy based on cyclic pursuit algorithms for a stationary target in an obstacle-free environment modeled as a 3D space. Guo et al. [4] extend this approach to enclose a non-stationary target in a 2D space whose velocity is piecewise constant but unknown to the pursuers. In these approaches the coordination of the mission is achieved by instructing every pursuer i to pursue pursuer i + 1 modulo n, where n represents the total number of pursuers. Undeger and Polat [5] employ two coordination strategies to capture a prey. The blocking escape directions coordination calculates the approaching directions of the predators to the prey. It is followed by the use of alternative proposals to determine the closest path of a predator to the prey in conjunction with the direction of the prey. As a result, decision-making is simplified to finding which agent is closest to the prey, and coordination amounts to instructing the pursuers to follow that agent.

In contrast, the combinatorial techniques represent the environment geometrically using polygonal or graphical models and directly employ these representations to address pursuit games. The cops and robbers game, originally introduced by Nowakowski and Winkler [6] and Aigner and Fromme [7], is a classic example of the graphical approach to pursuit-evasion. In this setting, the game is accomplished if a cop moves onto the vertex occupied by a robber. The complexity of the algorithm for the single-pursuer case is O(n^4) and is, in general, exponential in the number of vertices and pursuers. More specifically, the complexity of the algorithm grows in the order of O(n^{2(k+1)}), where n and k represent the number of vertices of the graph and the number of pursuers, respectively. Isler and Karnad [8] show that the duration of the game is bounded by the number of vertices in the case of a complete graph. However, the representation of the environment in the form of a complete graph K_n is an oversimplification of the problem. This is particularly true if the behavior of the intruder is deterministic; in a K_n graph, a single move of the pursuer is sufficient to capture the intruder. Jankovic [9] shows that in a polygonal environment three pursuers are sufficient to capture an intruder if the initial location of the intruder is within the convex hull of the locations of the pursuers. Kopparty and Ravishankar [10] generalize this result and prove that the number of pursuers is proportional to the dimension of the environment. They show that in an R^d, d ≥ 2, polygonal environment, d + 1 pursuers achieve the same result. Isler et al. [11] demonstrate that a pursuer equipped with a randomized strategy can locate an intruder in any simply-connected polygon. Bopardikar et al. [12] study pursuit-evasion under a limited sensing condition where the sensing capability of the players is limited to a threshold range. However, the intruder exhibits a reactive behavior and changes its location only if it senses the presence of a pursuer. Thunberg and Ögren [13] use a mixed integer linear programming approach to address a visibility-based pursuit scenario in a polygonal environment.

One of the limitations of the combinatorial approaches is due to the representation of the paths of the players as edges of a graph. Chung and Hollinger [1] note that a large portion of the fundamental work in pursuit-evasion examined the problem of edge search, where the evader resides on the edges of a graph. Edge search does not apply directly to many robotics problems: the possible paths of an evader in many indoor and outdoor environments often cannot be accurately represented as the edges of a graph. In some cases it is possible to construct a dual graph by replacing the nodes with edges, but these translations do not necessarily yield the same results as the original problem (ibid., p. 310). In addition, the polygonal and graphical representations suffer from the free movement of the players among the available nodes; in other words, players are allowed to move between nodes without following edges (ibid., p. 310). Comprehensive surveys of the combinatorial approaches are found in [14, 15].

Game-theoretic and probabilistic approaches are widely used to address pursuit-evasion. In matrix games [2] utility functions are represented as payoff matrices, one per agent, in which the joint actions of the agents are evaluated. Joint actions correspond to particular entries in the payoff matrices, and the agents play the same matrix game repeatedly. The gains and losses of the system through these actions are calculated and, if the outcome does not violate some predefined criteria, the agents are assigned to their corresponding actions.

In the works of Levy and Rosenschein [16] and Harmati and Skrzypczyk [17], utility functions are used in a game-theoretic framework to incorporate the global goal of the system into the locally obtained goals of the agents. The utilities are generally used to achieve a predetermined stable state such as the equilibrium of the system.¹ For instance, in a multi-robot target pursuit, the controller picks the combination of actions in which the total distance traveled by the agents at the system level and at every iteration is minimized. As a result, the equilibrium or stable state is a travel distance that stays within a threshold condition. An important limitation of the game-theoretic approach to utility functions is the lack of any guarantee that the proposed solution is optimal in a given decision cycle. The sole concern of the controller is to keep the loss of an agent (e.g., additional travel distance) below a predetermined value, or its gain above a given threshold, in order to maintain stability.

Keshmiri and Payandeh [19] apply the Bayesian formalism to capture a non-evasive intruder in a cluttered environment. This approach utilizes the measurement data (e.g., distance, sensor readings) and the control data (e.g., velocity) to calculate the belief distribution of an agent in conjunction with a given task. It is a recursive algorithm in which the belief at time t is calculated based on its value at time t − 1. In this framework, decision-makers use priors to reason about the manner in which their actions influence the behaviors of other agents. These priors vary from the probability distribution function of resource densities over the field of operation to the preassigned ranking of the different actions available to the agents. As a result, some prior densities over possible dynamics (e.g., the probability of agent r_j taking action a_i if the current decision-maker r_1 chooses a_1) and reward distributions have to be known a priori by a decision-maker.

Seminal works by Koopman [20, 21] outline analytical principles of applied probability and optimization techniques for maritime warfare strategies. Assaf and Zamir [22] utilize the distributions of the locations of multiple objects to construct the prior distributions in a Bayesian framework. Washburn [23] presents an iterative algorithm based on a Markov decision process (MDP) to pursue a target in a discretized search environment.

¹A robotic system is considered to meet the equilibrium condition if there is a dynamic working balance among its interdependent parts (see [18] for further details).

The basic assumption in the probabilistic framework is the ability of an agent to observe the actions taken by all agents, the resulting outcomes, and the rewards received by the other agents. In this framework an agent incorporates information from the actions of the other agents. In formulating their strategies, it is assumed that agents can keep track of their own history as well as the previous actions of their teammates to make future decisions. Furthermore, the probabilistic approaches are applicable only for special distributions and rely on the absence of false-positive detections. Additionally, they require an analytical specification of a prior (e.g., the probability distribution of the location of the target within the field). These approaches also require a measure of the density of the search (e.g., search time, resources expended) and the probability of detection given this density.

3 Virtual Goals Generation

A mission delegated to a robotic team is either a composition of several subgoals (e.g., foraging, a rescue mission) or monolithic in nature but demanding the cooperative engagement of several agents for its fulfillment (e.g., box-pushing, enclosing an intruder). Mission decomposition is a step towards subdividing the high-level description of a mission into a set of virtual goals such that their incremental execution leads the system to the accomplishment of the mission. Virtual goals are the means through which agents engage with the overall mission. An important aspect of a set of virtual goals VG is that it forms the common knowledge of the members of the robotic team. More specifically, all robotic agents utilize the same set of virtual goals to make their decisions.

Definition 1 (Virtual Goals) The set of virtual goals VG of a robotic team is a non-empty, finite set of disjoint elements ρj ∈ VG, where every element ρj is representative of a subgoal performed by a robotic agent:

$$ VG \neq \emptyset \quad (1) $$

$$ VG \to \{1 \ldots m\} \equiv |VG| = m \quad (2) $$

$$ \forall \rho_i, \rho_j \in VG, \quad \rho_i = \rho_j \Leftrightarrow i = j \quad (3) $$

Fig. 1 The first isogonic formation of the virtual goals

3.1 Isogonic Decomposition

Some problem domains, such as the pursuit game, formation, and large-object transportation, require the system to determine the positioning of the individual agents with respect to the other robots within a certain approximation. The underlying structure of this configuration of agents provides a foundation to determine their future displacement and relocation. In other words, it is possible to adopt a top-down approach to the formulation of the final desirable configuration in order to decompose a mission into a set of virtual goals. Figure 1 depicts a scenario where a mission is decomposed into a set of virtual goals ρi ∈ VG that form the vertices and the first isogonic point of an isosceles triangle.² This isogonic point resides on the intersection of the lines that connect the vertex of each of the three equilateral triangles formed on the sides of the given triangle to the vertex of the given triangle on the opposite side.

²The isogonic point of a triangle minimizes the cumulative sum of the distances to the vertices of the triangle. ρ4 is the isogonic point of ρ2ρ1ρ3 in Fig. 1. There are two isogonic points associated with every triangle.

As shown through the following theorem, an interesting property of an isosceles triangle is the alignment of its isogonic point with its leading vertex.

Theorem 1 (Isogonic Alignment) The isogonic point of an isosceles triangle is always aligned with its leading vertex.

Proof In the equilateral triangle ρ2s′ρ3 of Fig. 1, we have:

$$ \|\rho_2 s'\| = \|\rho_3 s'\| \ \ \&\ \ ss' \perp \rho_2\rho_3 \;\Rightarrow\; \|\rho_2 s\| = \|\rho_3 s\| \quad (4) $$

Similarly, in the triangle ρ2ρ1ρ3 of Fig. 1, we get:

$$ \|\rho_2\rho_1\| = \|\rho_3\rho_1\| \ \ \&\ \ \rho_1 s \perp \rho_2\rho_3 \;\Rightarrow\; \|\rho_2 s\| = \|\rho_3 s\| \quad (5) $$

Equations 4 and 5 imply that ρ1s and ss′ are aligned. Therefore:

$$ \rho_1 s' \perp \rho_2\rho_3 \ \ \&\ \ \|\rho_2 s\| = \|\rho_3 s\| \quad (6) $$

□

Corollary 1 The isogonic point of an isosceles triangle is always equidistant from its two side vertices.

Proof The proof of this corollary is apparent from Eq. 6. □

Theorem 1 and Corollary 1 demonstrate that the location of ρ4 is well-defined with regard to the locations of ρ1, ρ2, and ρ3 (see Fig. 1). This reduces the amount of information required to decompose a mission into a set of virtual goals. More specifically, we only need to assume the initial location information of one of the virtual goals in order to generate the entire set VG. We assume that the initial location information of ρ1 is known in order to compute the set of virtual goals; however, there is no restriction on the choice of this virtual goal.

We exploit the location information of ρ1 to compute the set of virtual goals. In Fig. 1, the locations of ρ2 and ρ3 are computed as:

$$ \rho_2 = \begin{bmatrix} x_{\rho_1} - \left(\|\rho_3\rho_1\| \times \sin\left(\frac{\lambda}{2}\right)\right) \\[4pt] y_{\rho_1} - \left(\|\rho_3\rho_1\| \times \cos\left(\frac{\lambda}{2}\right)\right) \end{bmatrix} \quad (7) $$

$$ \rho_3 = \begin{bmatrix} x_{\rho_1} + \left(\|\rho_3\rho_1\| \times \sin\left(\frac{\lambda}{2}\right)\right) \\[4pt] y_{\rho_1} - \left(\|\rho_3\rho_1\| \times \cos\left(\frac{\lambda}{2}\right)\right) \end{bmatrix} \quad (8) $$

where:

$$ \|\rho_1 s\| = \|\rho_1\rho_2\| \cos\left(\frac{\gamma}{2}\right) = \|\rho_1\rho_3\| \cos\left(\frac{\gamma}{2}\right) \quad (9) $$

$$ \|\rho_2 s\| = \|\rho_1\rho_2\| \sin\left(\frac{\gamma}{2}\right) = \|\rho_3 s\| = \|\rho_1\rho_3\| \sin\left(\frac{\gamma}{2}\right) \quad (10) $$

We utilize the location information of these virtual goals to confine the location of the virtual goal ρ4 within the convex hull of ρ2ρ1ρ3. This constraint on the location of the virtual goal ρ4 is satisfied if and only if ∠ρ2ρ1ρ3 < 120°, as shown in the following theorem.

Theorem 2 (Coincidental Case) ρ1 and ρ4 coincide if ∠ρ2ρ1ρ3 ≥ 120°.

Proof Let ρ4 represent the isogonic point of the isosceles triangle ρ2ρ1ρ3, where ∠ρ2ρ1ρ3 ≥ 120° and ‖ρ2ρ1‖ = ‖ρ3ρ1‖. Let:

$$ \frac{\rho_4\rho_2}{\|\rho_4\rho_2\|} + \frac{\rho_4\rho_3}{\|\rho_4\rho_3\|} = \begin{pmatrix} a_1 \\ a_2 \end{pmatrix} \quad (11) $$

Furthermore, ρ4 is the isogonic point of ρ2ρ1ρ3 if ρ4 satisfies (Kupitz and Martini [24], p. 58):

$$ \frac{\rho_4\rho_2}{\|\rho_4\rho_2\|} + \frac{\rho_4\rho_3}{\|\rho_4\rho_3\|} + \frac{\rho_4\rho_1}{\|\rho_4\rho_1\|} = 0 \quad (12) $$

Substituting Eq. 11 in Eq. 12, we get:

$$ \begin{pmatrix} a_1 \\ a_2 \end{pmatrix} + \frac{\rho_4\rho_1}{\|\rho_4\rho_1\|} = \begin{pmatrix} 0 \\ 0 \end{pmatrix} $$

$$ \Rightarrow \left(1 - a_1^2\right)\left(x_{\rho_1} - x_{\rho_4}\right)^2 - a_1^2\left(y_{\rho_1} - y_{\rho_4}\right)^2 = 0 \quad (13) $$

$$ \left(1 - a_2^2\right)\left(y_{\rho_1} - y_{\rho_4}\right)^2 - a_2^2\left(x_{\rho_1} - x_{\rho_4}\right)^2 = 0 \quad (14) $$


Solving Eqs. 13 and 14 for x_{ρ4}, we get:

$$ \left(1 - a_1^2\right)\left(1 - a_2^2\right)\left(x_{\rho_1} - x_{\rho_4}\right)^2 - a_1^2 a_2^2 \left(x_{\rho_1} - x_{\rho_4}\right)^2 = 0 \;\Rightarrow\; x_{\rho_4} = x_{\rho_1} \quad (15) $$

Substituting Eq. 15 in Eq. 14, we get:

$$ \left(1 - a_2^2\right)\left(y_{\rho_1} - y_{\rho_4}\right)^2 - a_2^2\left(x_{\rho_1} - x_{\rho_1}\right)^2 = 0 \;\Rightarrow\; y_{\rho_4} = y_{\rho_1} \quad (16) $$

□

Corollary 2 ρ4 is within the convex hull of the virtual goals ρ1, ρ2, and ρ3 if ∠ρ2ρ1ρ3 < 120°.

Proof In Fig. 1, let λ = ∠ρ2ρ1ρ3 < 120°. This yields (Boltyanski et al. [25], p. 236):

$$ \angle\rho_1\rho_4\rho_3 = \angle\rho_1\rho_4\rho_2 = \angle\rho_2\rho_4\rho_3 = 120^{\circ} \quad (17) $$

Hence, ρ4 is within the convex hull of ρ1, ρ2, and ρ3. □

The location of ρ4 with regard to the location information of ρ1 and ρ3 is computed as:³

$$ \rho_4 = \begin{bmatrix} x_{\rho_1} \\[4pt] y_{\rho_1} - \|\rho_3\rho_1\| \times \cos\left(\frac{\lambda}{2}\right) + \dfrac{\|\rho_3\rho_1\| \times \sin\left(\frac{\lambda}{2}\right) \times \cos(\alpha)}{\sin(\alpha)} \end{bmatrix} \quad (18) $$

³ρ2ρ1ρ3 is an isosceles triangle, hence ‖ρ3ρ1‖ = ‖ρ2ρ1‖.
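To make the construction concrete, the following sketch computes ρ2, ρ3, and ρ4 from the location of ρ1 using Eqs. 7, 8, and 18. It is only an illustration of the formulas above, not the authors' implementation: the side length, the apex angle λ, the angle α (which appears in Eq. 18 but is not defined in this excerpt), and all function names are hypothetical choices.

```python
import math

def generate_virtual_goals(rho1, side, lam, alpha):
    """Place rho2 and rho3 (Eqs. 7-8) and the isogonic point rho4 (Eq. 18)
    relative to the leading virtual goal rho1 of an isosceles triangle."""
    x1, y1 = rho1
    # Eq. 7: rho2 sits below rho1, shifted to the left of the axis of symmetry.
    rho2 = (x1 - side * math.sin(lam / 2.0),
            y1 - side * math.cos(lam / 2.0))
    # Eq. 8: rho3 mirrors rho2 about the vertical line through rho1.
    rho3 = (x1 + side * math.sin(lam / 2.0),
            y1 - side * math.cos(lam / 2.0))
    # Eq. 18: rho4 keeps the x-coordinate of rho1 (Theorem 1).
    y4 = (y1 - side * math.cos(lam / 2.0)
          + side * math.sin(lam / 2.0) * math.cos(alpha) / math.sin(alpha))
    return rho2, rho3, (x1, y4)

if __name__ == "__main__":
    # Hypothetical numbers: apex angle lambda = 60 degrees (< 120 degrees,
    # so Corollary 2 applies and rho4 stays inside the triangle).
    print(generate_virtual_goals(rho1=(0.0, 10.0), side=4.0,
                                 lam=math.radians(60.0),
                                 alpha=math.radians(75.0)))
```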

3.2 Transformation of VG Elements

Section 3.1 demonstrates a procedure to calculate a set of virtual goals based on their final desirable configuration. However, this procedure does not reflect the effect of the rotation of the final configuration on the location information of these virtual goals. We introduce the effect of this rotation into the location information of the virtual goals through the application of the transformation matrix:

$$ R_z(\theta) = \begin{pmatrix} \cos(\theta) & -\sin(\theta) & 0 \\ \sin(\theta) & \cos(\theta) & 0 \\ 0 & 0 & 1 \end{pmatrix} \quad (19) $$

where θ represents the angle of rotation of the configuration.

We use Eqs. 7 and 8 along with the transformation matrix in Eq. 19 to update the location information of ρ2 and ρ3 with regard to ρ1. This yields:

$$ \begin{pmatrix} \cos(\theta) \times \left(x_{\rho_1} - \|\rho_3\rho_1\| \times \sin\left(\frac{\lambda}{2}\right)\right) + \sin(\theta) \times \left(y_{\rho_1} - \|\rho_3\rho_1\| \times \cos\left(\frac{\lambda}{2}\right)\right) \\[6pt] \cos(\theta) \times \left(y_{\rho_1} - \|\rho_3\rho_1\| \times \cos\left(\frac{\lambda}{2}\right)\right) - \sin(\theta) \times \left(x_{\rho_1} - \|\rho_3\rho_1\| \times \sin\left(\frac{\lambda}{2}\right)\right) \\[6pt] 1 \end{pmatrix} \quad (20) $$

Similarly, we use Eqs. 18 and 19 to update the location information of ρ4 to:

$$ \begin{pmatrix} \sin(\theta) \times \left(y_{\rho_1} - \|\rho_3\rho_1\| \times \cos\left(\frac{\lambda}{2}\right) + \dfrac{\|\rho_3\rho_1\| \times \sin\left(\frac{\lambda}{2}\right) \times \cos(\alpha)}{\sin(\alpha)}\right) + x_{\rho_1} \times \cos(\theta) \\[8pt] \cos(\theta) \times \left(y_{\rho_1} - \|\rho_3\rho_1\| \times \cos\left(\frac{\lambda}{2}\right) + \dfrac{\|\rho_3\rho_1\| \times \sin\left(\frac{\lambda}{2}\right) \times \cos(\alpha)}{\sin(\alpha)}\right) - x_{\rho_1} \times \sin(\theta) \\[8pt] 1 \end{pmatrix} \quad (21) $$
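A minimal sketch of the rotation step follows. It applies the rotation matrix R_z(θ) of Eq. 19 directly to each goal expressed in homogeneous coordinates; the goal coordinates and the angle used below are illustrative only.

```python
import math

def rotate_z(point, theta):
    """Apply the planar part of R_z(theta) from Eq. 19 to a point (x, y),
    treating it as the homogeneous coordinate (x, y, 1)."""
    x, y = point
    return (math.cos(theta) * x - math.sin(theta) * y,
            math.sin(theta) * x + math.cos(theta) * y)

# Rotate a (hypothetical) set of virtual goals by 30 degrees about the origin.
theta = math.radians(30.0)
goals = [(2.0, 6.54), (-2.0, 6.54), (0.0, 7.26)]   # e.g. rho2, rho3, rho4
print([rotate_z(g, theta) for g in goals])
```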

4 Robotic Agents Decision Mechanism

The decision engine is the backbone of a multi-robot system. Its proper design and implementation are paramount to the successful accomplishment of a mission. In particular, this component enables the system to exploit the available information to infer further steps such as group consensus and the coordination of the agents.

We assert that there are two exclusive states associated with an agent that is situated in the world and engaged in a mission: the external state and the internal state. The external state relates an agent to the mission and the surrounding environment. In contrast, the internal state relates the agent to itself. This approach to the formulation of the decision engine empowers an agent to realize its spatial relation to a mission. Furthermore, it enables the agent to determine its capability to fulfill the delegated tasks.

Incorporation of the state of an agent into the decision mechanism has been the subject of rigorous research. Probabilistic approaches such as [26, 27] focus mainly on the external state and discard the role of the internal state (see [28] for a review of the topic). The requirement of a priori information in the form of probability density functions (e.g., a uniform probability distribution of success assigned to every region of the environment at the commencement of a mission) is another subtlety of these approaches [29, 30]. However, the availability of a priori information is questionable if the environment and the task space are highly dynamic. Furthermore, the precision of the prediction is highly dependent on the amount of a priori information, and the result of the decision is misleading if this information is scarce.

The game-theoretic frameworks attempt to model priors through a set of predetermined payoff values (e.g., Amigoni and Troiani [31]). In other words, they transform the inherently dynamic decision-making problem into a highly deterministic formulation. They also have the shortcoming of high computational complexity due to the requirement of an exhaustive search of all possible outcomes of the game.

In the remainder of this section we formulate a decision mechanism that utilizes the internal and the external states of the agents to estimate their contributions at the individual level. The overall formulation of the decision engine is presented in Section 4.1. We alleviate the requirement of a priori information through the incorporation of an opportunistic ranking module into the external state component of the decision engine in Section 4.2.

4.1 Formal Representation

Although there is a correspondence between the results of the estimates of the internal and the external states of an agent, these results are conditionally independent. In particular, the result of the estimate of one state does not imply the computational outcome of the other state. For instance, the estimate of the external state that allocates a task to an agent may contradict the estimate of the internal state of the agent for the same task. This contradiction is plausible if the energy available to an agent is insufficient to navigate the path to the designated task. It is also possible for an agent to lack a specific piece of equipment (e.g., an end-effector) needed to perform a task. Hence, it is crucial for a decision mechanism to explicitly recognize the independence of the results of the estimates of the internal and the external states during the decision process. This makes the additive incorporation of the internal and the external states unsuitable; more specifically, the additive incorporation of the estimates of these states provides the agent with an inaccurate and misleading result. We further elaborate the conditional independence of the states of an agent through the following example.

Let us assume the external state of the robotic agent ri estimates a 100 % chance of success if ri is assigned the virtual goal ρj ∈ VG. However, the internal state of ri realizes the lack of sufficient energy to reach ρj and hence ranks this virtual goal with a 0 % estimate of success. It is apparent that the additive incorporation of the internal and the external states of ri would provide the agent with a misleading estimate that ranks the virtual goal ρj with a 100 % chance of success.

We address this incompatibility through the multiplicative incorporation of the internal and the external states into the decision mechanism:

$$ \pi_i(\rho_j) = \frac{1}{\eta} \prod \psi_i(r_i, \rho_j)\, \phi_i(r_i, \rho_j), \quad \forall \rho_j \in VG \quad (22) $$

where
ri: the ith robotic agent.
ρj ∈ VG: the jth element of the set of virtual goals VG.
πi(ρj): the vote of the ith agent for the virtual goal ρj ∈ VG.
ψi(ri, ρj): the external state component.
φi(ri, ρj): the internal state component.


The normalization factor 1/η ensures that the votes of an agent for the available virtual goals sum to 1:

$$ \sum_{\rho_j \in VG} \pi_i(\rho_j) = 1, \quad \forall \pi_i(\rho_j) \geq 0, \quad i = 1 \ldots n \quad (23) $$

where n represents the total number of robots. Robotic agents use the decision engine in Eq. 22 to independently rank the virtual goals ρj ∈ VG, incrementally and at every decision cycle. As a result, every individual agent maintains a number of vote values that is directly proportional to the cardinality of the set of virtual goals, |VG|. These values form the vote profile Φi of a robotic agent.

Definition 2 (Vote Profile) The vote profile Φi of the ith robotic agent comprises the votes of the agent for the virtual goals ρj ∈ VG:

$$ \Phi_i = \{\pi_i(\rho_j) : \rho_j \in VG,\ \pi_i(\rho_j) \geq 0\} \quad (24) $$

4.2 External State Component

The external state component ψi(ri, ρj) in Eq. 22 ranks the contribution of an agent to a mission with respect to a given virtual goal using the location information of the robot and the virtual goal. It consists of the default and the opportunistic ranking modules.

Definition 3 (Default ranking module) Given a set of virtual goals VG at a decision cycle t, the default ranking represents the estimate of the success of the agent in participating in the mission through the virtual goal ρj ∈ VG at decision cycle t.

The default ranking module of the ith agent, π^t_i(ri ↦ ρj), calculates the vote for a given virtual goal ρj ∈ VG based on the current distance of the agent to the virtual goal, d^cur_i(ri, ρj), and the desired agent-to-virtual-goal distance, d^desired_i(ri, ρj), at decision cycle t:

$$ \pi_i^t(r_i \mapsto \rho_j) = \begin{cases} 1 & d_i^{cur}(r_i, \rho_j) \leq d_i^{desired}(r_i, \rho_j) \\[4pt] \dfrac{d_i^{desired}(r_i, \rho_j)}{d_i^{cur}(r_i, \rho_j)} & \text{otherwise} \end{cases} \quad (25) $$

The desired distance d^desired_i(ri, ρj) is a predetermined, fixed integer that corresponds to the interval:

$$ 0 < d_i^{desired}(r_i, \rho_j) \leq \left\lfloor d_i^{cur}(r_i, \rho_j) - \left( d_i^{cur}(r_i, \rho_j) - \left\lfloor d_i^{cur}(r_i, \rho_j) \right\rfloor \right) \right\rfloor \quad (26) $$

Equation 26 indicates that the desired distance d^desired_i(ri, ρj) is bounded by an upper limit that is equal to the integer portion of the current distance of the robotic agent to the virtual goal. Furthermore, Eq. 26 expresses that zero is not a permissible choice for the value of the desired distance. This is due to the fact that a desired distance set to zero would make the result of the computation of Eq. 25 steadily zero whenever d^cur_i(ri, ρj) > d^desired_i(ri, ρj).

The choice of the value of the desired distance is highly domain-specific. This value is influenced by the type of information that is represented by the virtual goals. For example, the value of the desired distance is kept as close as possible to zero (e.g., it is set to one) when a virtual goal marks a specific location for an agent to attend. On the other hand, an approximation of the vicinity of the agent to the virtual goal suffices if the virtual goal represents a region.

The superscript t in Eq. 25 refers to the decision cycle t. This implies that agent ri considers the distance to a virtual goal ρj as encountered at time t. This distance is calculated based on the location information of the robotic agent and the virtual goal. The current distance of an agent to a virtual goal depends on its ability to detect obstacles; in particular, this distance is unaffected by the presence of an obstacle that is beyond the detection range of the agent. Therefore, the current distance is calculated as:

$$ d_i^{cur}(r_i, \rho_j) = \begin{cases} w_i \|\rho_j - r_i\| & \text{no obstacle} \\[4pt] w_i \|p - r_i\| + w_i \|\rho_j - p\| & \text{otherwise} \end{cases} \quad (27) $$


where p represents the next temporary location that ri selects reactively to avoid collision with an obstacle.

Definition 4 (Opportunistic ranking module) Given a set of virtual goals VG at a decision cycle t, the opportunistic ranking represents the vote values of the agent, ∀ρj ∈ VG, that were calculated at t − 1.

The opportunistic ranking module ωi(ρj) incorporates the estimate of the success of an agent for a given virtual goal ρj ∈ VG that was acquired in the previous decision cycle:

$$ \omega_i(\rho_j) = C + \pi_i^{t-1}(\rho_j) \quad (28) $$

where C ∈ [0, 1] is a constant that initializes the opportunistic ranking module of the external state component of the decision engine at the commencement of a mission. This initialization value is necessary to avoid unexpected behavior of the decision mechanism in the first decision cycle due to an undefined value of the opportunistic ranking module. There is no restriction on the choice of this initialization value from the given interval. However, values larger than zero suggest a prioritization of the virtual goals. Therefore, we intentionally use the value zero to prevent any contingent assumption of the availability of a priori information. As a result, the confidence of the robotic agents is built solely on the evolution of their votes. This evolution reflects the result of the independent decisions of the agents at a given decision cycle.

We use Eqs. 25 and 28 to express the external state component as:

$$ \psi_i(r_i, \rho_j) = \pi_i^t(r_i \mapsto \rho_j) + \omega_i(\rho_j) \quad (29) $$

$$ \psi_i(r_i, \rho_j) = \pi_i^t(r_i \mapsto \rho_j) + C + \pi_i^{t-1}(\rho_j), \quad \forall \rho_j \in VG \quad (30) $$

The state diagram of the external state component of the decision engine is presented in Fig. 2. It calculates the default ranking of an agent in conjunction with a set of virtual goals VG. Next, it obtains the final vote for a virtual goal ρj ∈ VG through the cumulative sum of its default and opportunistic rankings. In addition, it updates the opportunistic ranking of the agent using its final votes at a given decision cycle. More specifically, this component overwrites the opportunistic rankings acquired in the previous decision cycle with the votes of the present decision cycle. As a result, it turns the decision engine into an evolving mechanism that determines the confidence of the agents in a set of virtual goals as the state of a mission progresses.

Fig. 2 The conceptual diagram of the external state component ψi(ri, ρj)
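The sketch below mirrors the external state component as described above: a default ranking based on the current and desired distances (Eqs. 25 and 27, with obstacle handling reduced to an optional way-point p), plus the opportunistic ranking carried over from the previous cycle (Eqs. 28 and 30). The function names and the numeric goal coordinates are hypothetical; the choices C = 0 and a desired distance of one follow the examples given in the text.

```python
import math

def current_distance(robot, goal, waypoint=None, w=1.0):
    """Eq. 27: straight-line distance to the goal, or the distance through a
    temporary way-point p chosen reactively to avoid a detected obstacle."""
    if waypoint is None:
        return w * math.dist(goal, robot)
    return w * math.dist(waypoint, robot) + w * math.dist(goal, waypoint)

def default_ranking(d_cur, d_desired):
    """Eq. 25: a vote of 1 once the goal is within the desired distance,
    otherwise the ratio d_desired / d_cur."""
    return 1.0 if d_cur <= d_desired else d_desired / d_cur

def external_state(robot, goals, previous_votes, d_desired=1.0, C=0.0):
    """Eq. 30: default ranking plus the opportunistic ranking C + pi^{t-1}."""
    return {name: default_ranking(current_distance(robot, xy), d_desired)
                  + C + previous_votes.get(name, 0.0)
            for name, xy in goals.items()}

# One robot, three virtual goals, and no votes carried over from an earlier cycle.
goals = {"rho1": (5.0, 5.0), "rho2": (20.0, 1.0), "rho3": (4.0, 9.0)}
print(external_state(robot=(0.0, 0.0), goals=goals, previous_votes={}))
```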

4.2.1 Effect of the Opportunistic Ranking Module

Let Table 1 represent the vote values of the robotic agent r1 for a set of virtual goals VG = {ρ1, ρ2, ρ3} at the first decision cycle, t = 1. This implies that the opportunistic ranking module ω1(ρj) = 0, ∀ρj ∈ VG, which explains the requirement of the initialization value C in Eq. 28. Therefore, the entries of Table 1 represent the vote profile of r1 calculated solely using its default ranking module. These vote values are calculated using the location information of r1 and ρj ∈ VG.

Table 1 The estimate of the external state component of agent r1 for a set of virtual goals VG = {ρ1, ρ2, ρ3} at decision cycle t = 1

r1                 ρ1     ρ2     ρ3
π^t_1(r1 ↦ ρj)     0.41   0.16   0.43
ω1(ρj)             0.00   0.00   0.00
π1(ρj)             0.41   0.16   0.43

The entries of Table 1 show that the decision engine of r1 ranks the virtual goal ρ3 with the highest estimate of success, since π1(ρ3) = 0.43 > π1(ρ1) = 0.41 > π1(ρ2) = 0.16.

Let Table 2 represent the vote profile of r1 calculated using the default ranking module of the agent in the next decision cycle, t + 1. The entries of Table 2 indicate that the decision mechanism of r1 ranks ρ1 with the highest estimate in this decision cycle. However, a comparison of the entries of Tables 1 and 2 reveals that this modification of the estimate of the success of r1 is due to only a slight change of the votes between these two consecutive decision cycles.

The opportunistic ranking module enables the decision mechanism to reduce the possibility of such an unnecessary modification of the ranking of the virtual goals. In particular, this module represents the incremental evolution of the confidence of the robotic agents in the available virtual goals. The last row entry of Table 1 corresponds to the vote values of r1 at decision cycle t = 1. These values constitute the opportunistic ranking of the virtual goals in decision cycle t + 1 (see Fig. 2). The decision engine of r1 utilizes these values along with the default ranking of the virtual goals at decision cycle t + 1 (i.e., the entries of Table 2) to calculate the vote profile of r1 using Eq. 30:

π1(ρ1) = π^{t+1}_1(r1 ↦ ρ1) + ω1(ρ1) = 0.840
π1(ρ2) = π^{t+1}_1(r1 ↦ ρ2) + ω1(ρ2) = 0.305
π1(ρ3) = π^{t+1}_1(r1 ↦ ρ3) + ω1(ρ3) = 0.855

Table 2 The estimate of the default ranking module π^{t+1}_1(r1 ↦ ρj) of r1 for a set of virtual goals VG = {ρ1, ρ2, ρ3} at decision cycle t + 1

r1                  ρ1     ρ2      ρ3
π^{t+1}_1(r1 ↦ ρj)  0.43   0.145   0.425

Table 3 The estimate of the external state component of agent r1 after the incorporation of the opportunistic ranking module ωi(ρj)

r1                  ρ1       ρ2       ρ3
π^{t+1}_1(r1 ↦ ρj)  0.4300   0.145    0.4250
ω^t_1(ρj | r1)      0.4100   0.160    0.4300
π1(ρj)              0.5100   −0.025   0.5250
π1(ρj)              0.4925   0.000    0.5075

These votes are normalized to obtain:

π1(ρ1) = 0.510
π1(ρ2) = −0.025
π1(ρ3) = 0.525

This normalization procedure continues until the votes comply with Eq. 23. This yields the values:

π1(ρ1) = 0.4925
π1(ρ2) = 0.0000
π1(ρ3) = 0.5075

Table 3 shows the vote profile of r1 at decision cycle t + 1. The entries of Table 3 indicate that the decision engine of r1 ranks ρ3 with the highest vote at decision cycle t + 1 as well.
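The text above does not spell out the normalization rule explicitly; one procedure consistent with the numbers in Tables 1 to 3 is to repeatedly spread the surplus (sum minus one) evenly over the non-zero votes and clip negative votes to zero until Eq. 23 holds. The sketch below is an inferred reconstruction under that assumption, not the authors' stated algorithm; it does reproduce 0.4925, 0.000, and 0.5075 from the raw votes 0.840, 0.305, and 0.855.

```python
def normalize_votes(votes, tol=1e-9, max_iters=100):
    """Spread the surplus (sum - 1) evenly over the non-zero votes and clip
    negative votes to zero, repeating until the votes satisfy Eq. 23."""
    v = dict(votes)
    for _ in range(max_iters):
        active = [k for k, x in v.items() if x > 0.0]
        surplus = sum(v.values()) - 1.0
        if abs(surplus) <= tol or not active:
            break
        share = surplus / len(active)
        for k in active:
            v[k] = max(v[k] - share, 0.0)
    return v

print(normalize_votes({"rho1": 0.840, "rho2": 0.305, "rho3": 0.855}))
# -> roughly {'rho1': 0.4925, 'rho2': 0.0, 'rho3': 0.5075}, matching Table 3
```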

4.3 Internal State Component

The internal state component φi(ri, ρj) ranks a set of virtual goals VG based on the amount of energy that is available to the robotic agents. This component utilizes the location information of the virtual goals ρj ∈ VG along with the energy information to determine whether the agents are able to reach these virtual goals:⁴

$$ \phi_i(r_i, \rho_j) = \pi_i^t(e_i \mapsto \rho_j) = \begin{cases} 0 & e_i \leq EN_i \\[4pt] \dfrac{e_i - EN_i}{FE_i} & \text{otherwise} \end{cases} \quad (31) $$

⁴Although it is possible to model the internal state component using the formal system of propositional logic, a numerical representation of the ranking is more desirable. This is due to the fact that rankings of the virtual goals that are represented quantitatively provide the system with better estimates of the final vote values at every decision cycle.


where
ei: the current energy level of the ith robotic agent.
FEi: the energy available to the ith robotic agent when it is fully charged (i.e., full energy).
ENi: the energy needed by the ith robotic agent to reach ρj ∈ VG.

Equation 31 verifies the necessity of the multiplicative incorporation of the internal and the external states in the decision engine of Eq. 22. In particular, this equation expresses that the result of the computation of the internal state component is zero if the current energy level ei of the agent is less than the energy needed to reach a virtual goal ρj. This forces the final vote of the agent for ρj, i.e., πi(ρj), to be zero.

We use the approach introduced by Mei et al. [32] to calculate the energy consumption of the robotic agents. More specifically, we use the following energy model [32]:

$$ p_{m_i}(v_i) = 0.29 + 7.4 \times v_i \quad (32) $$

4.3.1 Effect of the Internal State Component

We formulated the external state component and illustrated the effect of its opportunistic ranking module in Section 4.2.1. The next step in finalizing the vote profile of the robotic agents is the calculation of the estimate of the internal state component for a set of virtual goals. These components are incorporated multiplicatively to rank the virtual goals at every decision cycle.

Let Table 4 represent the estimates of the external state component of the robotic agent r1 for a set of virtual goals VG = {ρ1, ρ2, ρ3}. Let us assume that the battery of the robotic agent r1 holds 10,000 Joules when it is fully charged and that the current energy level of r1 is 9,000 Joules. Let us further assume that the current distance of the robotic agent r1 from the virtual goal ρ1 is 37.76 m, and that the velocity of the robotic agent r1 has the upper bound v1 = 20 m/s. The estimate of the internal state component of r1 for the virtual goal ρ1, using Eqs. 31 and 32, is:

$$ \phi_1(r_1, \rho_1) = \frac{9{,}000 - \left[(0.29 + 7.4 \times 20) \times 37.76\right]}{10{,}000} = 0.34 \quad (33) $$

Table 4 The estimate of the external state component of agent r1 for a set of virtual goals VG = {ρ1, ρ2, ρ3} at decision cycle t + 1

r1           ρ1       ρ2    ρ3
ψ1(r1, ρj)   0.4925   0.0   0.5075

Table 5 The estimate of the internal state component of agent r1 for a set of virtual goals VG = {ρ1, ρ2, ρ3} at decision cycle t + 1

r1           ρ1     ρ2     ρ3
φ1(r1, ρj)   0.34   0.33   0.33

We similarly compute the estimate of the internal state component of r1 for all available virtual goals using its current distances to the virtual goals.
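A short sketch of the internal state component follows, combining Eq. 31 with the energy model of Eq. 32 and reproducing Eq. 33 with the numbers quoted above (9,000 J available, 10,000 J when full, 37.76 m to ρ1, and an upper-bound velocity of 20 m/s). The function names are illustrative only.

```python
def motion_power(v):
    """Eq. 32: the energy model of Mei et al. [32]."""
    return 0.29 + 7.4 * v

def internal_state(e_current, e_full, distance, v):
    """Eq. 31: zero when the energy needed to reach the goal exceeds the
    current level, otherwise the normalized remaining energy (e - EN) / FE."""
    e_needed = motion_power(v) * distance
    if e_current <= e_needed:
        return 0.0
    return (e_current - e_needed) / e_full

# Eq. 33: the worked example in the text.
print(round(internal_state(9_000.0, 10_000.0, 37.76, 20.0), 2))  # -> 0.34
```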

Let Table 5 represent the estimate of the internal state component of r1 for the virtual goals VG = {ρ1, ρ2, ρ3}. The entries of Table 5 indicate that the internal state component of r1 ranks the virtual goal ρ1 with the highest estimate of success. This estimate apparently contradicts the estimate of the external state component of r1, which ranks ρ3 with the highest estimate. However, the multiplicative incorporation of the external and the internal states of r1 provides the agent with a better estimate of success for the available virtual goals.

We use the entries of Tables 4 and 5 along with Eq. 22 to finalize the vote profile of r1:

π1(ρ1) = 0.4925 × 0.34 = 0.167450
π1(ρ2) = 0.0 × 0.33 = 0.0
π1(ρ3) = 0.5075 × 0.33 = 0.167475

Furthermore, we normalize these vote values to obtain a vote profile that satisfies Eq. 23. Table 6 shows the final vote profile of r1 after the normalization of the multiplicative incorporation of the internal and the external states of the agent. The entries of Table 6 indicate that the decision mechanism of r1 ranks the virtual goal ρ3 with a slightly higher estimate of success. However, the difference between the vote values of the virtual goals ρ1 and ρ3 is negligible.

Table 6 The vote profile Φ1 of agent r1 for a set of virtual goals VG = {ρ1, ρ2, ρ3} after the multiplicative incorporation of the internal and the external state components at decision cycle t + 1

r1        ρ1        ρ2    ρ3
π1(ρj)    0.49998   0.0   0.50002

5 Formal Analysis

In this section, we formally analyze the performance of the decision engine of the robotic agents. In particular, we demonstrate the capability of this mechanism to infer the best choice of virtual goal of the agents at the individual level. We define the best choice of virtual goal of an agent as follows.

Definition 5 (Agent Best Choice) Given a set of virtual goals VG = {ρ1 . . . ρm} and the vote profile Φi that comprises the vote values πi(ρj) of the ith robotic agent, the best choice of virtual goal ρ̂ ∈ VG satisfies:

$$ \pi_i(\hat{\rho}) \geq \pi_i(\rho_j), \quad \forall \rho_j \in VG, \ \hat{\rho} \in VG \quad (34) $$

This implies that:

$$ \hat{\rho} \in \operatorname*{argmax}_{\rho_j \in VG} \pi_i(\rho_j) \quad (35) $$

Theorem 3 (Optimal Choice) The vote profile Φi consists of at least one vote value that corresponds to the best choice of virtual goal of the ith agent at every decision cycle.

Proof Let VG represent a set of virtual goals. Then, ∀ρj ∈ VG, πi(ρj) is the vote of the ith robotic agent, calculated using Eq. 22. Let Φi = {πi(ρj) : ρj ∈ VG, πi(ρj) ≥ 0} represent the vote profile of the ith robotic agent. If Φi is a singleton set, then the proof is trivial. However, if |Φi| > 1, where |Φi| denotes the cardinality of Φi, there exists at least one element πi′(ρj) ∈ Φi such that:

1. πi′(ρj) > πi(ρj), ∀πi(ρj) ∈ Φi: then πi′(ρj) is the vote value that strongly dominates all the elements of Φi. Hence it is the best choice of the agent.
2. πi′(ρj) ≥ πi(ρj), ∀πi(ρj) ∈ Φi: then πi′(ρj) weakly dominates all the elements of Φi. This vote satisfies Eq. 34 and hence it is the best choice of the agent.
3. ∃πi(ρj) ∈ Φi, πi′(ρj) = πi(ρj): then the ith agent is indifferent between the two vote values and either choice is the best voted virtual goal. □

Theorem 3 underlines a situation where the robotic agents have equal confidence in multiple virtual goals. This condition may suggest the necessity of further analysis of the vote profile before inferring the best choice of virtual goal of an agent. However, the following theorem proves that such an analysis of the solution space is not required, since any other highly voted choice of virtual goal is at most as good as the virtual goal that is determined to be the best choice of the agent.

Theorem 4 For a robot r with its best choice of virtual goal denoted by ρ̂ ∈ VG, |VG| ≥ 1, and given any ρ* ∈ VG with πr(ρ*) ≥ 0, it is true that πr(ρ*) ≤ πr(ρ̂), ∀ρ* ∈ VG.

Proof If VG is a singleton set, then the proof is trivial. Let |VG| > 1 and ρ̂ ∈ VG. If there exists ρ* ∈ VG with a vote value higher than that of ρ̂, then using Eq. 22 we have:

$$ \begin{aligned} \pi_r(\hat{\rho}) &= \operatorname*{argmax}_{\rho \in VG} \left( \frac{1}{\eta} \prod \psi(r, \rho)\, \phi(r, \rho) \right) \\ &= \operatorname*{argmax}_{\rho \neq \rho^*} \left[ \frac{1}{\eta} \prod \psi(r, \rho)\, \phi(r, \rho) \right] \times \left[ \frac{1}{\eta} \prod \psi(r, \rho^*)\, \phi(r, \rho^*) \right] \\ &= \operatorname*{argmax}_{\rho \neq \rho^*} \left[ \left( \pi_r^t(r \mapsto \rho) + \omega_r(\rho) \right) \times \pi_r(e_r \mapsto \rho) \right] \times \left[ \left( \pi_r^t(r \mapsto \rho^*) + \omega_r(\rho^*) \right) \times \pi_r(e_r \mapsto \rho^*) \right] \\ &< \operatorname*{argmax}_{\rho \neq \rho^*} \left[ \left( \pi_r^t(r \mapsto \rho^*) + \omega_r(\rho) \right) \times \pi_r(e_r \mapsto \rho^*) \right] \times \left[ \left( \pi_r^t(r \mapsto \rho^*) + \omega_r(\rho^*) \right) \times \pi_r(e_r \mapsto \rho^*) \right] \\ &= \pi_r(\rho^*) \end{aligned} \quad (36) $$

This is a contradiction to the original assumption that ρ̂ is the best choice of virtual goal of the agent. □


Lemma 1 (Agent Best Choice Complexity) It takes no longer than O(m) for an agent to find its best choice of virtual goal.

Proof Let |Φi| = m represent the cardinality of the vote profile of the ith robotic agent. Finding the best choice of virtual goal is equivalent to ascertaining the highest voted virtual goal ρj ∈ VG in Φi. This is done in linear time, hence O(m). □

6 Coordination of the Assignments

The assignment of two or more robots to the same task, or any constraint on the number of attendees of a specific virtual goal, are examples of situations where a multi-robot system needs to coordinate these independent decisions. Furthermore, the distribution of the robots or the virtual goals can be biased towards a particular region of the field of operation. As a result, the virtual goals or the agents that are relatively far from the high-density region are discarded, since the decision engines of the robotic agents rank these outliers unfavorably. Therefore, it is crucial for the system to bring the independent decisions of the individual robots together to infer the coordination strategy at the group level in every decision cycle.

Definition 6 (Profile Matrix) The profile matrix Π_{n×m} of a group of robotic agents ri, i = 1 . . . n, is a matrix in which every row entry corresponds to the vote profile of an individual agent ri:

$$ \Pi_{n \times m} = \begin{bmatrix} \Phi_1 \\ \Phi_2 \\ \vdots \\ \Phi_n \end{bmatrix} = \begin{bmatrix} \pi_1(\rho_1) & \pi_1(\rho_2) & \cdots & \pi_1(\rho_m) \\ \pi_2(\rho_1) & \pi_2(\rho_2) & \cdots & \pi_2(\rho_m) \\ \vdots & & \ddots & \vdots \\ \pi_n(\rho_1) & \pi_n(\rho_2) & \cdots & \pi_n(\rho_m) \end{bmatrix} \quad (37) $$

where n and m represent the number of robotic agents and the cardinality of the set of virtual goals, |VG|, respectively.

6.1 Agents Votes Maximization Strategy

This strategy coordinates a multi-robot system by redirecting agents with lower vote values to the virtual goals that are least favored by the robotic agents that achieve the highest votes in the system. Algorithm 1 summarizes the agents votes maximization coordination strategy. The profile matrix is the input to this algorithm. It allocates the virtual goals based on the highest votes of the individual robots. More specifically, agents with higher vote values are treated with higher priority when assigning the virtual goals at a given decision cycle. Subsequently, the vote of the robotic agent that is allocated to a virtual goal in an iteration is set to −1, which prevents the same agent from being reselected in consecutive iterations. Likewise, the entry of an assigned virtual goal ρj is removed from the profile matrix Π_{n×m}.

6.1.1 The Agents Votes Maximization Strategy: An Example

Let the profile matrix for a group of robotic agents ri, i = 1 . . . 3, and the set of virtual goals VG = {ρ1, ρ2, ρ3} be:

$$ \Pi_{3 \times 3} = \begin{bmatrix} \Phi_1 \\ \Phi_2 \\ \Phi_3 \end{bmatrix} = \begin{bmatrix} 0.43 & 0.17 & 0.40 \\ 0.10 & 0.35 & 0.55 \\ 0.75 & 0.15 & 0.10 \end{bmatrix} \quad (38) $$

Let the value −1 mark an agent ri whose allocation to a virtual goal is completed. Hence, an entry of the profile matrix equal to −1 implies a virtual goal that has been allocated.


In Eq. 38, the highest vote corresponds to π3(ρ1) = 0.75. Therefore, the virtual goal ρ1 is allocated to the agent r3 (i.e., r3 ← ρ1). Subsequently, the entries of r3 and ρ1 are set to −1. This updates the entries of the profile matrix Π_{3×3} to:

$$ \begin{bmatrix} -1 & 0.17 & 0.40 \\ -1 & 0.35 & 0.55 \\ -1 & -1 & -1 \end{bmatrix} \quad (39) $$

In the matrix of Eq. 39, it is apparent that r2 ← ρ3, since π2(ρ3) = 0.55. Therefore:

$$ \begin{bmatrix} -1 & 0.17 & -1 \\ -1 & -1 & -1 \\ -1 & -1 & -1 \end{bmatrix} \quad (40) $$

This results in r1 ← ρ2. Hence, the final allocation of the agents becomes:

$$ r_3 \leftarrow \rho_1, \quad r_2 \leftarrow \rho_3, \quad r_1 \leftarrow \rho_2 \quad (41) $$
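Algorithm 1 itself is not reproduced in this excerpt, so the following is only a hedged reconstruction of the greedy rule described above: repeatedly pick the globally highest remaining vote, assign that agent to that goal, and mark the agent's row and the goal's column with −1. On the profile matrix of Eq. 38 it yields r3 ← ρ1, r2 ← ρ3, and r1 ← ρ2, as in the worked example.

```python
def votes_maximization(profile):
    """Greedy allocation: the globally highest remaining vote wins; the chosen
    agent's row and the chosen goal's column are then marked with -1
    (a reconstruction of Algorithm 1 as described in the text)."""
    profile = [row[:] for row in profile]              # work on a copy
    n, m = len(profile), len(profile[0])
    allocation = {}
    for _ in range(min(n, m)):
        agent, goal = max(((i, j) for i in range(n) for j in range(m)),
                          key=lambda ij: profile[ij[0]][ij[1]])
        allocation[agent] = goal
        for j in range(m):                             # retire the agent
            profile[agent][j] = -1.0
        for i in range(n):                             # retire the goal
            profile[i][goal] = -1.0
    return allocation

profile_38 = [[0.43, 0.17, 0.40],   # r1
              [0.10, 0.35, 0.55],   # r2
              [0.75, 0.15, 0.10]]   # r3
print(votes_maximization(profile_38))   # -> {2: 0, 1: 2, 0: 1}
```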

Lemma 2 It takes no longer than O(m³) to complete a one-to-one allocation of the robots to the virtual goals.

Proof This setting requires:

$$ n = |VG| = m \quad (42) $$

Assuming Eq. 42 is satisfied, one agent and one virtual goal require no comparison for the allocation. Hence:

$$ n = 1, \ |VG| = 1 \ \Rightarrow \ T_1 = 0 \quad (43) $$

Increasing the number of agents and the cardinality of VG to two results in:

$$ n = 2, \ |VG| = 2 \ \Rightarrow \ T_2 = 3 = 2^2 + T_1 - 1 \quad (44) $$

Using Eqs. 43 and 44, we get:

$$ \begin{aligned} T_1 &= 0 \\ T_2 &= 3 = 2^2 + T_1 - 1 \\ T_3 &= 11 = 3^2 + T_2 - 1 \\ &\ \ \vdots \\ T_n &= n^2 + T_{n-1} - 1 \\ &= n^2 + (n-1)^2 + T_{n-2} - 2 \\ &= n^2 + (n-1)^2 + (n-2)^2 + \cdots - n \\ &= \frac{n(n+1)(2n+1)}{6} - n \\ &= \frac{1}{3}n^3 + \frac{1}{2}n^2 - \frac{5}{6}n \\ &\leq \frac{5}{6}n^3 \leq n^3 \ \Rightarrow \ O(n^3), \quad n \geq 1 \end{aligned} \quad (45) $$

□

6.2 Profile Matrix Permutations Strategy

This algorithm utilizes the profile matrix Π_{n×m} to calculate a coordination strategy in which the allocated virtual goals are distinct. The result of the allocation strategy is an assignment of the virtual goals such that the cumulative sum of the votes of the robotic agents is maximum. This is achieved through the calculation of the different permutations of the entries of the profile matrix Π_{n×m}. These permutations are used to ascertain the permutation in which the cumulative sum of the votes of the agents is maximum. Moreover, the calculated permutations are required to satisfy the following necessary conditions:

1. Every row of Π_{n×m} is considered in every permutation.
2. Every column of Π_{n×m} is considered exactly once in every permutation.

Condition 1 guarantees that every agent is included in the allocation process. On the other hand, condition 2 ensures that the allocated virtual goals of these agents are distinct. The profile matrix permutations strategy stores these permutations in a matrix Λ_{p×n}. The number of row entries p of this matrix equals the number of possible permutations of the entries of the profile matrix Π_{n×m}. Additionally, the number of columns of this matrix equals the cardinality of the set of virtual goals |VG|. As a result, the best allocation strategy in a decision cycle is the row entry of Λ_{p×n} in which the cumulative sum of the votes is maximum:

$$ \upsilon \in \operatorname*{argmax}_{i = 1 \ldots p} \sum_{j=1}^{n} \Lambda_{ij} \quad (46) $$

In Eq. 46, υ is a 1 × n vector in which every column entry corresponds to an individual agent.

6.3 Permutations Calculation

Algorithm 2 shows the process of calculating the permutations of the profile matrix Π_{n×m}. These permutations are calculated through recursive invocation of the Permutations function. It finds all the possible permutations of the array PermuteIndices, whose size equals the total number of virtual goals. The entries of this array are initialized to zero at the commencement of the computation. These elements represent the column indices of the profile matrix Π_{n×m}. Algorithm 2 passes these values one by one to the Permutations function for every entry of the array that is still zero. Furthermore, it ensures that the elements of the PermuteIndices array are not repeated by tracking the entered values through the parameter level. After all the elements of the PermuteIndices array are considered in a given permutation (i.e., level == m), the new sequencing of the elements of this array is exploited to populate the corresponding row entry of the permutation matrix Λ_{p×n} using the votes in the profile matrix Π_{n×m}.

Next, the permutation matrix Λ_{p×n} is used by Algorithm 3 to find the optimal coordination strategy to allocate the virtual goals. It returns the index of the row entry of Λ_{p×n} in which the cumulative sum of the votes is maximum.
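Algorithm 2 is likewise not reproduced in this excerpt. The recursive sketch below plays the same role under the description above (an index array filled level by level, with each column of the profile matrix used exactly once per permutation); the function and variable names are illustrative, not the authors' pseudocode.

```python
def row_permutations(profile):
    """Recursively enumerate every assignment that uses each column of the
    profile matrix exactly once, returning one row of the permutation matrix
    (the votes of that assignment) per enumerated permutation."""
    n = len(profile)
    rows, indices = [], [None] * n          # indices[i]: column chosen for row i

    def permute(level, used):
        if level == n:                      # every row has received a column
            rows.append([profile[i][indices[i]] for i in range(n)])
            return
        for col in range(n):
            if col not in used:             # each column appears exactly once
                indices[level] = col
                permute(level + 1, used | {col})

    permute(0, frozenset())
    return rows

profile_38 = [[0.43, 0.17, 0.40],
              [0.10, 0.35, 0.55],
              [0.75, 0.15, 0.10]]
for row in row_permutations(profile_38):
    print(row)                              # six rows, one per permutation
```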

Theorem 5 The profile matrix permutations strategy results in an optimal coordination strategy.

Proof Let Λ_{p×n} represent the permutation matrix of the robotic agents ri, i = 1 . . . n. Let υi denote the cumulative sum of the ith row entry of Λ_{p×n}. Then V = {υ1, . . . , υp} is a set that consists of all the possible cumulative sums of the entries of Λ_{p×n}. It is apparent that the proof presented in Theorem 3 holds if we replace the vote profile Φi of the agents in Theorem 3 with V. □

6.4 Coordination Through Permutation: An Example

Consider a team of three robotic agents and a set of virtual goals VG = {ρ1, ρ2, ρ3}. Let πi(ρj) represent the vote of the ith agent for the virtual goal ρj ∈ VG. This implies that every agent ri has a vote profile with entries Φi = {πi(ρ1), πi(ρ2), πi(ρ3)}. Therefore, the profile matrix Π_{3×3} becomes:

$$ \Pi_{3 \times 3} = \begin{bmatrix} \Phi_1 \\ \Phi_2 \\ \Phi_3 \end{bmatrix} = \begin{bmatrix} \pi_1(\rho_1) & \pi_1(\rho_2) & \pi_1(\rho_3) \\ \pi_2(\rho_1) & \pi_2(\rho_2) & \pi_2(\rho_3) \\ \pi_3(\rho_1) & \pi_3(\rho_2) & \pi_3(\rho_3) \end{bmatrix} \quad (47) $$

The possible permutations of the entries of Π_{3×3} are:

$$ \Lambda_{6 \times 3} = \begin{bmatrix} \pi_1(\rho_1) & \pi_2(\rho_2) & \pi_3(\rho_3) \\ \pi_1(\rho_1) & \pi_2(\rho_3) & \pi_3(\rho_2) \\ \pi_1(\rho_2) & \pi_2(\rho_1) & \pi_3(\rho_3) \\ \pi_1(\rho_2) & \pi_2(\rho_3) & \pi_3(\rho_1) \\ \pi_1(\rho_3) & \pi_2(\rho_2) & \pi_3(\rho_1) \\ \pi_1(\rho_3) & \pi_2(\rho_1) & \pi_3(\rho_2) \end{bmatrix} \quad (48) $$

In Eq. 48, every row entry of Λ_{6×3} contains one vote value from every robotic agent. Moreover, these votes correspond to distinct columns of Π_{3×3}. The optimal allocation strategy, in which the cumulative sum of the votes of the robotic agents is maximum, is calculated as:

$$ \begin{aligned} \upsilon \leftarrow \mathrm{MAX}\big( &\{\pi_1(\rho_1) + \pi_2(\rho_2) + \pi_3(\rho_3)\}, \{\pi_1(\rho_1) + \pi_2(\rho_3) + \pi_3(\rho_2)\}, \\ &\{\pi_1(\rho_2) + \pi_2(\rho_1) + \pi_3(\rho_3)\}, \{\pi_1(\rho_2) + \pi_2(\rho_3) + \pi_3(\rho_1)\}, \\ &\{\pi_1(\rho_3) + \pi_2(\rho_2) + \pi_3(\rho_1)\}, \{\pi_1(\rho_3) + \pi_2(\rho_1) + \pi_3(\rho_2)\} \big) \end{aligned} \quad (49) $$

This returns the index of the entry of Λ_{6×3} in which the cumulative sum of the votes is maximum. For instance, if Eq. 38 represents the profile matrix associated with the votes of the robotic agents r1, r2, and r3 for the set of virtual goals VG = {ρ1, ρ2, ρ3}, the permutation matrix of these agents becomes:

$$ \Lambda_{6 \times 3} = \begin{bmatrix} 0.43 & 0.35 & 0.10 \\ 0.43 & 0.55 & 0.15 \\ 0.17 & 0.10 & 0.10 \\ 0.17 & 0.55 & 0.75 \\ 0.40 & 0.35 & 0.75 \\ 0.40 & 0.10 & 0.15 \end{bmatrix} \quad (50) $$

Hence, the optimum allocation using the profile matrix permutations strategy is:

$$ \upsilon \leftarrow \left[ \pi_1(\rho_3),\ \pi_2(\rho_2),\ \pi_3(\rho_1) \right] \quad (51) $$

This is the fifth row entry of Λ_{6×3}, since 0.40 + 0.35 + 0.75 = 1.50 has the highest cumulative sum of the votes. Therefore, the allocated virtual goals are:

$$ r_1 \leftarrow \rho_3, \quad r_2 \leftarrow \rho_2, \quad r_3 \leftarrow \rho_1 \quad (52) $$
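As a cross-check of this example, the sketch below scores every one-to-one assignment of goals to agents (here with itertools.permutations rather than the recursive Algorithm 2) and keeps the maximum cumulative sum of Eq. 46; the function name is illustrative.

```python
from itertools import permutations

def permutation_allocation(profile):
    """Score every one-to-one assignment of goals to agents (the rows of
    Eq. 48) and keep the one with the maximum cumulative vote sum (Eq. 46)."""
    n = len(profile)
    best_sum, best = float("-inf"), None
    for goals in permutations(range(n)):       # goals[i] is the goal of agent i
        total = sum(profile[i][goals[i]] for i in range(n))
        if total > best_sum:
            best_sum, best = total, goals
    return best_sum, best

profile_38 = [[0.43, 0.17, 0.40],   # r1
              [0.10, 0.35, 0.55],   # r2
              [0.75, 0.15, 0.10]]   # r3
print(permutation_allocation(profile_38))
# -> sum 1.50 with goals (2, 1, 0): r1 <- rho3, r2 <- rho2, r3 <- rho1
```

The resulting sum of 1.50 exceeds the 1.47 obtained by the greedy allocation of Section 6.1.1, which is the point of searching over all permutations.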

7 Simulation Setup

The simulation environment consists of a number of stationary rectangular obstacles. There are two exits in this environment. We consider a multi-robot, single-intruder pursuit scenario. There are three pursuers involved in every experiment. The linear velocity of the intruder and the pursuers varies between 0 and 10 m/s. In addition, these robotic agents interact with the environment using their respective on-board simulated sensors. They perform simple reactive collision avoidance to avoid collisions with the obstacles and the robotic agents in their vicinity.

1. Task of the intruder: The intruder enters this environment from one of the exits and trespasses to the other exit. This evasive agent attempts to escape if it detects a pursuer. However, the intruder does not follow any specific escape plan. It changes its navigational direction until the pursuers are out of its sensing range. We choose the initial location of the intruder as follows (see Fig. 3).

   – The intruder enters the environment from Exit 1 and trespasses to Exit 2.
   – The intruder enters the environment from Exit 2 and trespasses to Exit 1.
   – The initial location of the intruder is selected arbitrarily in the environment. The intruder chooses the closest exit and moves towards this exit.

2. Location of the pursuers: We locate the pursuers in the bottom-left corner compound of the simulation environment (referred to as the base of the pursuers in Fig. 3). We use the same initial location information of the pursuers in all the experiments.

3. Experiments: We use the initial locations of the intruder to determine the effect of the placement of the intruder on the pursuit missions. A pursuit mission is successful if the intruder falls within the convex hull of the locations of the pursuers such that any further movement of the intruder results in collisions with the pursuers.

Fig. 3 Multi-robot, single intruder pursuit. The intruder is the red-colored agent. The pursuers are depicted in green. The stationary obstacles are shown in black. The environment consists of two exits. The virtual goals are shown by the orange-colored circles

7.1 Isogonic Decomposition Pursuit

Figure 3 shows a snapshot of the simulation environment. The pursuers are shown in green. The red-colored agent is the intruder. The two exits are labeled. The black-colored rectangles are the stationary obstacles. The base of the pursuers is shown in the bottom-left corner of this figure. A pursuit mission consists of two phases: the ambush phase and the capture phase.

7.2 Ambush Phase

The ambush phase instructs the pursuers to leave their base compound and spread out in the environment. This is achieved through the application of a set of virtual goals that are calculated using the location information of the two exit ways and the pursuers. We use the location information of the pursuers to calculate the center of mass of the initial locations of these agents as:

$$ P_r = \operatorname*{argmin}_{p} \sum_{i=1}^{3} \|r_i - p\| = \frac{1}{3} \sum_{i=1}^{3} r_i \quad (53) $$

where ri and Pr denote the location information of the ith robotic agent and the center of mass of the initial locations of the pursuers, respectively. We utilize Pr and the location information of the two exits to calculate a set of virtual goals VG = {ρ1, ρ2, ρ3} as:

$$ \rho_j = \frac{1}{2}\left(P_r + ext_j\right), \quad j = 1, 2 \quad (54) $$

$$ \rho_3 = \operatorname*{argmin}_{p} \left[ \sum_{j=1}^{2} \|ext_j - p\| + \|P_r - p\| \right] = \frac{1}{3}\left[ \sum_{j=1}^{2} ext_j + P_r \right] \quad (55) $$


where ext_j represents the location information of the jth exit. As a result, the virtual goals ρ1 and ρ2 are placed between the base compound and the two exits. In contrast, the virtual goal ρ3 is placed approximately in the middle of the environment. The virtual goals ρ1 and ρ2 relocate the pursuers closer to the exits. The virtual goal ρ3 relocates one of the pursuers to the middle of the environment to facilitate the entrapment of the intruder during the capture phase.
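A rough implementation of this decomposition, written as a sketch under the midpoint reading of Eq. 54 given above and with placeholder coordinates (not the authors' code), is:

import numpy as np

def ambush_virtual_goals(pursuers, exits):
    # pursuers: three initial pursuer locations r_i; exits: the two exit locations ext_j.
    pursuers = np.asarray(pursuers, dtype=float)
    exits = np.asarray(exits, dtype=float)
    p_r = pursuers.mean(axis=0)                  # Eq. 53: centre of mass of the pursuers
    rho_1 = 0.5 * (p_r + exits[0])               # Eq. 54: between the base and Exit 1
    rho_2 = 0.5 * (p_r + exits[1])               # Eq. 54: between the base and Exit 2
    rho_3 = (exits[0] + exits[1] + p_r) / 3.0    # Eq. 55: roughly the middle of the environment
    return rho_1, rho_2, rho_3

# Placeholder geometry: base compound near the origin, exits on opposite walls.
goals = ambush_virtual_goals(pursuers=[[1.0, 1.0], [2.0, 1.0], [1.0, 2.0]],
                             exits=[[0.0, 20.0], [30.0, 10.0]])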

7.2.1 Decision Engine of the Pursuers During the Ambush Phase

Figure 4 shows the evolution of the votes of pursuers r1, r2, and r3 for the set of virtual goals VG = {ρ1, ρ2, ρ3} calculated in Section 7.2. Subplots a, b, and c correspond to the votes that are computed by the default ranking module π_i^t(r_i ↦ ρ_j) of the external state component of the decision engine of the pursuers. Subplots d, e, and f represent the votes that are computed by the decision engine of the agents after the incorporation of their respective opportunistic ranking modules ω_i(ρ_j).

Subplots a, b, and c indicate that the decision engine of the pursuers takes a long time to rank the available virtual goals. This delay in the ranking of the virtual goals is verified through the neutral votes of the pursuers. The default ranking module of the external state component of the pursuers ranks the virtual goals with almost the same fixed value of 0.33 for more than 400 execution cycles. The neutrality of the votes of the default ranking module changes only after the distances of the pursuers to the virtual goals decrease. However, the maximum value of these votes does not exceed 0.75.

The incorporation of the opportunistic ranking module of the external state component of the decision engine addresses the delays in the evolution of votes. In subplots d, e, and f, it is apparent that the votes associated with different virtual goals evolve from the early stages of the ambush phase. In particular, the decision engines of r2 and r3 specify the best choices of the virtual goals of these agents before 400 execution cycles. Furthermore, they distinguish their best choices of virtual goals where these goals are equally ranked. However, r1 takes longer to choose its best virtual goal. Subplot d in Fig. 4 indicates that the decision engine of r1 favors the virtual goal ρ2 (i.e., the blue-colored curve) with a higher rank after 300 execution cycles. However, this virtual goal is voted on by r2 with a higher value at this execution cycle. This causes the coordination strategy to assign the virtual goal ρ2 to r2 and redirect the decision engine of r1 to the unallocated virtual goal ρ3.

Fig. 4  The evolution of the votes of pursuers r1, r2, and r3 during the ambush phase of pursuit. Subplots a, b, and c represent the votes of the default ranking module π_i^t(r_i ↦ ρ_j) of the agents. Subplots d, e, and f are the votes of the pursuers after the incorporation of the opportunistic ranking module ω_i(ρ_j) (Eq. 30).

Figures 5 and 6 show the distributions of the votes of the pursuers for the set of virtual goals VG = {ρ1, ρ2, ρ3} during the ambush phase. They show the frequencies of the votes of the default and the opportunistic ranking modules for each virtual goal ρ_j ∈ VG. Figure 5 clarifies the neutrality of the votes of the default ranking module during this phase. The frequencies of these votes mostly spread over the [0.30, 0.40] interval. The exceptions to this observation are the subplots that represent the virtual goals with the highest votes. These are π1(ρ2), π2(ρ3), and π3(ρ1). However, the value of these votes does not exceed 0.75, in accordance with Fig. 4.

The votes of the pursuers exhibit wider distributions when the opportunistic ranking module is incorporated into the decision engine. Figure 6 indicates that the values of these votes cover the [0.0, 1.0] interval. The best choices for the virtual goals of r2 and r3 are apparent in this figure. They are subplots π2(ρ3) and π3(ρ1). The best choice for the virtual goal of r1 shows a shorter bar at the value 1.0 in subplot π1(ρ2). An investigation of the subplots of r1 (i.e., π1(ρ1), π1(ρ2), and π1(ρ3) in Fig. 6) reveals that the decision engine of this agent favors the virtual goal ρ3 for an extended period of the ambush phase. This is evident in subplot π1(ρ3). However, r2 ranks this virtual goal with a higher vote. As a result, the vote of r1 exhibits a slower evolution for the virtual goal ρ2. This situation arises when the decision engines of two or more pursuers favor the same virtual goal as the best choice. We resolve this issue in Section 7.4.1.

Fig. 5  Distributions of the votes of the default ranking module π_i^t(r_i ↦ ρ_j) of the pursuers for the set of virtual goals VG = {ρ1, ρ2, ρ3} during the ambush phase of a pursuit mission. These histograms provide the frequencies of the votes for the virtual goal ρ_j ∈ VG. π_i(ρ_j) represents the vote of the ith pursuer for the jth virtual goal (Eq. 22).

Fig. 6  Distributions of the votes of the pursuers for the set of virtual goals VG = {ρ1, ρ2, ρ3} after the incorporation of the opportunistic ranking module ω_i(ρ_j) into the decision mechanism during the ambush phase of a pursuit mission. These histograms provide the frequencies of the votes for the virtual goal ρ_j ∈ VG. π_i(ρ_j) represents the vote of the ith pursuer for the jth virtual goal (Eq. 22).

7.3 Capture Phase

Theorem 1 and Corollary 1 demonstrate that the displacement of the isogonic point ρ4 of an isosceles triangle is always at an equal distance from the vertices ρ2 and ρ3 (see Fig. 1). The location of this virtual goal is confined within the convex hull of the vertices of this triangle if the necessary condition in Theorem 2 is satisfied.5 This reduces the amount of information required to decompose a mission into a set of virtual goals. Specifically, the location information of the intruder suffices to generate the set of virtual goals VG = {ρ1, ρ2, ρ3}.

5 In Fig. 1, we bound the value of ∠λ within the boundary limit 0 ≤ λ < 120 to satisfy this condition.

We denote the location information of the intruder with ρ4 (i.e., the isogonic point of the triangle ρ1ρ2ρ3 in Fig. 1). We utilize ρ4 and Eq. 18 to calculate the location of the virtual goal ρ1. Subsequently, we use the location information of ρ1 and Eqs. 7 and 8 to calculate the location information of the virtual goals ρ2 and ρ3 with regard to the location information of the intruder. In addition, we reflect the relocations of the intruder during the pursuit mission using Eqs. 20 and 21. We use the location information of the intruder (i.e., ρ4) and Eq. 21 to update the location information of ρ1 at every execution cycle. Next, we utilize ρ1 and Eq. 20 to update the location information of the virtual goals ρ2 and ρ3.
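Since Eqs. 7, 8, 18, 20, and 21 are developed earlier in the paper, the sketch below does not reproduce them; it only illustrates one concrete construction with the stated property that the intruder location ρ4 is the isogonic point of the isosceles triangle ρ1ρ2ρ3: every pair of vertices subtends 120° at ρ4. The heading vector, apex_dist, and base_dist are assumed free parameters for illustration.

import numpy as np

def capture_virtual_goals(rho4, heading, apex_dist=5.0, base_dist=5.0):
    # Place virtual goals so that rho4 (the intruder) is the isogonic point of
    # the resulting isosceles triangle: the three rays from rho4 toward the
    # vertices are mutually separated by 120 degrees, which is the defining
    # property of the isogonic (Fermat) point.
    rho4 = np.asarray(rho4, dtype=float)
    u = np.asarray(heading, dtype=float)
    u = u / np.linalg.norm(u)

    def rot(v, deg):
        t = np.deg2rad(deg)
        c, s = np.cos(t), np.sin(t)
        return np.array([c * v[0] - s * v[1], s * v[0] + c * v[1]])

    rho1 = rho4 + apex_dist * u                 # apex of the isosceles triangle
    rho2 = rho4 + base_dist * rot(u, 120.0)     # base vertices, symmetric about u
    rho3 = rho4 + base_dist * rot(u, -120.0)
    return rho1, rho2, rho3

# Re-evaluating this at every execution cycle with the intruder's latest
# position makes the virtual goals track its relocations.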

7.3.1 Decision Engine of the Pursuers During the Capture Phase

Figure 7 illustrates the performance of the decision engine of the pursuers during the capture phase. Subplots a, b, and c correspond to the default ranking module π_i^t(r_i ↦ ρ_j) of the pursuers. The votes of the pursuers after the incorporation of the opportunistic ranking module are presented in subplots d, e, and f. Figure 7 signifies the influence of the relocations of the intruder on the decision-making process. The location information of the virtual goals is updated at every execution cycle to reflect the relocation of the intruder. This results in the fluctuation of the rankings of these virtual goals between consecutive execution cycles. A comparison between Figs. 4 and 7 indicates more engagement of the default ranking module in this phase. However, further investigation of Fig. 7 reveals extreme indecisiveness of this module in the ranking of the virtual goals. The values of these votes remain unstable after 900 execution cycles. For example, the maximum vote value achieved by the default ranking module of r1 is less than 0.36. In addition, the default ranking module of r3 is indifferent to the available virtual goals for more than 500 execution cycles. Subplot c in Fig. 7 verifies that the values of these votes are fixed at 0.33. Moreover, the votes of r2 and r3 alternate frequently between the virtual goals. These votes do not stabilize after 900 execution cycles.

Fig. 7  The evolution of the votes of pursuers r1, r2, and r3 during the capture phase of the pursuit. Subplots a, b, and c represent the votes of the default ranking module π_i^t(r_i ↦ ρ_j) of the agents. Subplots d, e, and f are the votes of the pursuers after the incorporation of the opportunistic ranking module ω_i(ρ_j) of the pursuers into their decision engine (Eq. 30).

Figure 8 provides the distributions of the votes of the pursuers during the capture phase. This figure indicates that the votes of r1 are mostly distributed over the [0.32, 0.35] interval. The votes of r2 and r3 cover a wider range of values and occasionally reach 0.80 and 1.0, respectively. However, the occurrence of these values is negligible.

The performance of the decision engine of the pursuers changes dramatically after the incorporation of the opportunistic ranking module. In subplots d, e, and f of Fig. 7, the peaks and valleys of the curves of votes reveal the active engagement of the decision engine with the relocations of the virtual goals. The relocations of these virtual goals are highly dependent on the behavior of the intruder. The behavior of the intruder changes in response to various events. For example, the intruder changes its navigational direction to avoid collision with a stationary obstacle. A change of direction in the navigation of the intruder also occurs if the intruder detects a pursuer in its vicinity, in order to evade capture. Consequently, the changes in the relocations of the virtual goals result from the variations of their rankings by the decision engines of the pursuers. Although the evolution of the votes of r1 and r2 exhibits slow progress between 100 and 300 execution cycles, these votes are consistent with the best choices of these robotic agents. These virtual goals are ρ1 and ρ3 for the pursuers r1 and r2, respectively. Furthermore, their votes are stabilized after 300 execution cycles. In contrast, it takes longer for r3 to ascertain its best choice of virtual goal. Subplot f in Fig. 7 shows that the vote of this agent is stabilized after 400 execution cycles. The extended time exhibited by r3 is due to the choice of its decision engine between 100 and 400 execution cycles. In particular, this agent favors the virtual goal ρ3 during this time interval. However, r2 ranks this virtual goal with a higher vote between 100 and 400 execution cycles. This influences the allocation of the virtual goals during the coordination of the vote profiles of the agents. The distributions of the votes of the pursuers after the incorporation of the opportunistic ranking module into the decision process are elaborated in Fig. 9.

Fig. 8  Distributions of the votes of the default ranking module π_i^t(r_i ↦ ρ_j) of the pursuers for a set of virtual goals during the capture phase of a pursuit mission. These histograms provide the frequencies of the votes for the virtual goal ρ_j ∈ VG. π_i(ρ_j) represents the vote of the ith pursuer for the jth virtual goal (Eq. 22).

7.4 Coordination of the Pursuers

Although Fig. 4 through Fig. 9 illustrate the ability of the decision engines of the pursuers to elect their best choices, the system requires that every virtual goal be allocated to a distinct pursuer. For example, Figs. 4 and 6 illustrate a scenario where the decision engines of r1 and r2 demand the same virtual goal ρ3. This results in a situation where the virtual goal ρ2 is unallocated. This unattended virtual goal provides the intruder with the opportunity to escape if the pursuers are allowed to act solely on the outcome of their respective decision engines. We prevent this situation through the application of the agents votes maximization and the profile matrix permutations strategies.

Fig. 9  Distributions of the votes of the pursuers for the set of virtual goals VG = {ρ1, ρ2, ρ3} after the incorporation of the opportunistic ranking module ω_i(ρ_j) into the decision mechanism during the capture phase of a pursuit mission. These histograms provide the frequencies of the votes for the virtual goal ρ_j ∈ VG. π_i(ρ_j) represents the vote of the ith pursuer for the jth virtual goal (see Eq. 22).

7.4.1 Agents Votes Maximization Strategy

Figure 10 illustrates the process of allocating the virtual goals to the pursuers using this strategy. The left subplots correspond to the vote profiles of the individual pursuers. The right subplots show the allocated virtual goals of the individual pursuers at different execution cycles. Subplots (1) and (2) are associated with the ambush phase of the pursuit. Subplots (3) and (4) show the result of the agents votes maximization strategy during the capture phase. The x-coordinates of these subplots are labeled r1, r2, and r3 to distinguish the vote profiles and the allocated virtual goals of the different agents.

This figure verifies that the allocated virtual goals of the pursuers are distinct. In addition, the virtual goals are allocated based on the highest vote values. For example, the virtual goal ρ3 (i.e., the blue-colored bar) is allocated to r2 in subplot (1). This is due to the fact that the vote value of r2 for this virtual goal is higher than the votes of r1 and r3. This characteristic of the agents votes maximization strategy is apparent in every subplot of Fig. 10. The change of the assignments of r1 and r3 is illustrated in subplots (3) and (4). These subplots show that the allocated virtual goals of r1 and r3 are exchanged. However, r2 continues with the same virtual goal ρ3 throughout the pursuit mission.
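Read as a greedy rule, the strategy can be sketched as follows (an interpretation for illustration, not the authors' code): the largest remaining vote in the profile matrix fixes one pursuer-goal pair, and the losing pursuers are redirected to the goals that are still unallocated, which guarantees distinct assignments.

import numpy as np

def agents_votes_maximization(profile):
    # profile[i][j] is the vote of pursuer i for virtual goal j. Repeatedly take
    # the largest remaining vote, assign that goal to that pursuer, then exclude
    # both from further consideration so that every goal gets a distinct pursuer.
    votes = np.asarray(profile, dtype=float).copy()
    n = votes.shape[0]
    assignment = [None] * n
    for _ in range(n):
        i, j = np.unravel_index(np.argmax(votes), votes.shape)
        assignment[i] = int(j)
        votes[i, :] = -np.inf      # pursuer i has been assigned
        votes[:, j] = -np.inf      # goal j has been taken
    return assignment              # assignment[i] is the goal index of pursuer i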

7.4.2 Profile Matrix Permutations Strategy

The profile matrix permutations strategy is based on the calculation of the possible permutations of the votes of the pursuers. This strategy utilizes the profile matrix of the pursuers to calculate these permutations. Furthermore, it selects the permutation where the cumulative sum of the votes of the agents is maximized. Figure 11 illustrates the evolution of the selected permutation in contrast to the possible permutations of the votes of the pursuers during the pursuit mission.6 Although some of the permutations of the votes occasionally rise (e.g., at about 1,000 execution cycles), these permutations are dominated by the growth of the cumulative sum of the votes of the selected permutation. Figure 11 reveals that the allocated virtual goals of the pursuers determined by the profile matrix permutations strategy are unaltered throughout the mission. This is verified by the linearity of the evolution of the selected permutation in Fig. 11. Furthermore, the variation of the value of this permutation is negligible across different execution cycles. This results in an allocation strategy where the assignments of the pursuers remain the same throughout the mission. Figure 12 illustrates this effect of the linearity of the evolution of the selected permutation on the assignment of the pursuers. This figure verifies that the allocated virtual goals of the pursuers are fixed throughout the mission. Additionally, Figs. 11 and 12 show that the profile matrix permutations strategy takes longer to complete a pursuit mission. A comparison between Figs. 10 and 11 verifies that the agents votes maximization strategy completes a pursuit mission in 800 execution cycles. However, it takes 1,500 execution cycles for the profile matrix permutations strategy to capture the intruder.

6 There are six possible permutations of the votes of the pursuers.

Fig. 10  The mediation of the vote profiles of the pursuers using the agents votes maximization strategy. The left subplots correspond to the vote profiles of the pursuers r1, r2, and r3 for the set of virtual goals VG = {ρ1, ρ2, ρ3} in different execution cycles. The right subplots show the allocation of the virtual goals of the pursuers in a given execution cycle. The x-coordinates of these subplots indicate the pursuers r_i, i = 1 . . . 3.

7.5 Further Analysis

Fig. 11  The evolution of the selected permutation of the votes of the pursuers.

Fig. 12  The allocated virtual goals of the pursuers based on the profile matrix permutations strategy. These allocated virtual goals are fixed throughout the pursuit mission.

In this section, we study the performance of the agents votes maximization and the profile matrix permutations strategies in contrast to the leader-follower, the prioritization, and the probabilistic approaches. The leader-follower strategy (e.g., Undeger and Polat [5]) designates one of the robots as the leader of a multi-robot system. The remaining robots follow the strategy that is formulated using the information of the leader (e.g., its location information). On the other hand, the prioritization strategy implies a deterministic assignment of the tasks at the commencement of a mission. In addition, we use a Bayes filter to represent the probabilistic approach [33, 34] (a generic single-step update is sketched below). We use the elapsed time, the energy expended, and the distance traveled by the pursuers to compare the performance of these strategies. We use the same simulation setup in all the experiments.
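For reference, one step of a generic discrete Bayes filter of the kind described in [34] looks as follows; this is a textbook sketch over an abstract state space (e.g., candidate intruder cells), not the specific formulation used in our experiments.

import numpy as np

def bayes_filter_step(belief, transition, likelihood):
    # belief:     prior probability over states (e.g., candidate intruder cells)
    # transition: transition[k, l] = P(next state l | current state k)
    # likelihood: P(latest observation | state l)
    belief = np.asarray(belief, dtype=float)
    predicted = belief @ np.asarray(transition, dtype=float)      # prediction step
    posterior = predicted * np.asarray(likelihood, dtype=float)   # measurement update
    return posterior / posterior.sum()                            # normalization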

Table 7 shows the means and standard deviations of the time expended by the pursuers to complete a pursuit mission using the different strategies. In addition, this table provides the percentage of successful pursuit missions under these strategies. Table 7 indicates that the agents votes maximization and the probabilistic strategies perform better on average. The mean value entries of this table verify this improvement in the elapsed time of the pursuers using these two strategies. Furthermore, the standard deviations support the differences in the means of the elapsed time. Although the profile matrix permutations strategy achieves a better result than the leader-follower strategy, its average expended time is higher than that of the agents votes maximization and the probabilistic strategies. This result is explained by the linearity in the growth of the selected permutation in Fig. 11. More specifically, this linearity indicates that the choice of the selected permutation by the profile matrix permutations strategy is unaffected by the relocations of the intruder at different execution cycles. Therefore, the allocated virtual goals of the pursuers under the profile matrix permutations strategy resemble the fixed assignments of the prioritization strategy. Table 7 indicates that the difference between the mean elapsed times of the profile matrix permutations and the prioritization strategies is negligible. The mean elapsed times of these strategies are within one standard deviation of each other. Hence, the difference in their mean elapsed times is not statistically significant. Furthermore, the percentage of successful pursuits in this table supports the similarity of the performance of the profile matrix permutations and the prioritization strategies. However, the prioritization strategy achieves a slightly higher rate of success than the profile matrix permutations strategy.

Table 7  The mean and the standard deviation of the elapsed time to complete a pursuit mission. The percentage of successfully completed pursuit missions is provided in the last column.

Adapted strategy               Time (s)             Success (%)
                               Mean      STD
Leader-Follower                44.21     4.58       23.8
Prioritization                 38.99     7.96       65.7
Probabilistic                  37.72     6.17       70.3
Agents Votes Maximization      31.09     7.01       72.9
Profile Matrix Permutations    41.02     8.67       64.5

Table 8  The mean, the median, and the standard deviation of the travel distance of the pursuers at the group level, in the absence and in the presence of obstacles in the environment. AVM and PMP stand for the Agents Votes Maximization and the Profile Matrix Permutations strategies.

Adapted strategy    Obstacle-free                    Obstacles
                    Mean      Median    STD          Mean      Median    STD
Leader-Follower     185.22    163.53    12.34        232.75    196.61    13.49
Prioritization      139.56    119.05    11.16        192.00    164.18    15.25
Probabilistic       116.76     94.13     8.23        161.76    139.13     8.93
AVM                 114.65     94.07     8.34        162.75    141.44     9.40
PMP                 140.82    123.29     7.79        187.62    167.15     8.56

Table 8 presents the group-level travel distance of the pursuers in the absence and the presence of obstacles in the environment. This table verifies that the same differences in the performance of these strategies observed in Table 7 carry over to the distance traveled by the pursuers. However, the performance of the probabilistic and the agents votes maximization strategies is almost indistinguishable. The means of the distances traveled by the pursuers using these strategies are within one standard deviation of each other. This result holds regardless of the absence or the presence of obstacles in the environment. Therefore, any conclusion from a comparative analysis of the improvement of the distances traveled by the pursuers using these strategies is not warranted. Although the profile matrix permutations strategy exhibits a fixed allocation of the virtual goals during the pursuit (see Fig. 11), these assignments result in a shorter travel distance compared to the prioritization strategy. This is verified by a comparison of the standard deviations of these strategies.
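As a worked illustration of these within-one-standard-deviation arguments, using only the values already reported in Tables 7 and 8: for the elapsed time, |41.02 − 38.99| = 2.03 s, which is smaller than either standard deviation (8.67 s and 7.96 s); for the obstacle-free travel distance, |116.76 − 114.65| = 2.11, which is likewise smaller than 8.23 and 8.34. Hence neither difference in means is treated as significant.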

We mentioned earlier that the pursuers and the intruder are capable of accelerating and decelerating during the pursuit mission. Furthermore, we introduced a maximum limit of 10 m/s on the velocity of the agents. Therefore, the results presented in Tables 7 and 8 require further investigation to determine the effect of the different strategies on the energy consumption of the pursuers. These tables show that the agents votes maximization and the probabilistic strategies exhibit a similar trend of improvement in elapsed time and travel distance. However, there is a significant difference in their energy consumption. Table 9 indicates that the allocation of the virtual goals using the agents votes maximization strategy yields a better average energy consumption. Moreover, the leader-follower strategy does not exhibit a convincing performance. This is due to the fact that the pursuers follow a relatively identical strategy during the pursuit mission. Specifically, the pursuers follow the strategy that is inferred based on the information of one of the agents (i.e., the leader of the team). The results of the profile matrix permutations and the prioritization strategies on the energy consumption of the pursuers are relatively similar. However, the prioritization strategy imposes less energy consumption on the pursuers.

Table 9  The mean, the median, and the standard deviation of the energy expended by the pursuers at the group level using different strategies (out of 5,000 energy units)

Adapted strategy               Mean        Median      STD
Leader-Follower                2567.50     1515.40      96.41
Prioritization                 2381.80     2568.00     323.19
Probabilistic                  1525.50     1557.10      68.73
Agents Votes Maximization      1482.40     1434.90     123.04
Profile Matrix Permutations    2451.72     2484.56      90.89

Fig. 13  The percentage of successful pursuits in conjunction with the increase of the velocity of the intruder. V_intruder and V_pursuers are the velocities of the intruder and the pursuers, respectively. The x-axis entries indicate the instances where the velocity of the intruder is 0.50, 0.20, 0.10, 1.0, and 1.1 times the velocity of the pursuers.

Figure 13 shows the effect of an increase in the velocity of the intruder on the successful completion of the pursuit mission. This figure shows the instances where the velocity of the intruder is 0.50, 0.20, 0.10, 1.0, and 1.1 times the velocity of the pursuers. The leader-follower strategy exhibits an abrupt drop after the velocity of the intruder is set to ≤ 0.50 times the velocity of the pursuers and continues to fall thereafter. In contrast, the prioritization, the probabilistic, the agents votes maximization, and the profile matrix permutations strategies exhibit a relatively similar trend in the percentage of successful pursuits.

8 Conclusion

Research in multi-robot pursuit-evasion demonstrates that three pursuers are sufficient to capture an intruder in a polygonal environment. However, this result requires the confinement of the initial location of the intruder within the convex hull of the locations of the pursuers. In this study, we extended this result to demonstrate that this convexity requirement is alleviated through the application of a set of virtual goals that are independent of the locations of the pursuers. These virtual goals are calculated solely from the location information of the intruder, such that their locations confine the intruder within their convex hull at every execution cycle.

We studied the profile matrix permutations and the agents votes maximization strategies to coordinate the independent decisions of the pursuers. This study shows the better performance of the agents votes maximization strategy compared to the profile matrix permutations strategy. The result suggests that satisfying the gain of the pursuers at the group level is an insufficient assumption to yield an optimal strategy if the group dynamics have a significant effect on the performance of the individuals. More specifically, it compromises the opportunities of some agents in order to maintain the higher gain of the entire group. In contrast, the agents votes maximization strategy results in pursuer performance that is equivalent to that of the probabilistic framework. This shows the efficiency of the opportunistic ranking module of the external state component of the decision engine mechanism. This module provides a multi-robot system with the capability of inferring decisions where a priori information on the state of the mission is absent.

In this research, the continuity and accuracy of the location information of the tasks used to calculate the set of virtual goals is the basic assumption in the decomposition process. However, this information is error prone in real-life problem domains. Moreover, it is possible for a system to acquire this information at discontinuous intervals. An extension of this research is to study the effect of such uncertainties in the location information of the tasks on the decomposition process. For example, it is possible to incorporate a tracking mechanism to estimate these locations based on the available imperfect information.

The formulation of the decision engine in the current research assumes that the set of virtual goals is received synchronously and without any further delays. Moreover, the set is assumed to form the common knowledge of the robotic agents during the voting process. The instantaneous availability of a set of virtual goals can be challenged through the introduction of communication failures between the decomposition mechanism and the decision engines in different decision cycles.

The behavior of the intruder is an important factor that significantly influences the performance of the pursuers. Although the intruder in this study performs evasion, its evasive behavior is based solely on the vicinity of the pursuers. Moreover, the intruder evades the pursuers reactively. It is possible to enable the intruder to respond strategically to the detection of pursuers in its vicinity. Furthermore, the decline in the rate of successful pursuits under the profile matrix permutations and the agents votes maximization strategies can be analyzed to determine a theoretical bound on this decline as a function of the velocity of the intruder.

References

1. Chung, T.H., Hollinger, G.A.: Search and pursuit-evasion in mobile robotics. Auton. Robots 31, 299–316 (2011)
2. Basar, T., Olsder, G.J.: Dynamic Noncooperative Game Theory. Society for Industrial Mathematics, Philadelphia (1999)
3. Kim, T.H., Sugie, T.: Cooperative control for target-capturing task based on a cyclic pursuit strategy. Automatica 43(8), 1426–1431 (2007)
4. Guo, J., Yan, G., Lin, Z.: Cooperative control synthesis for moving-target-enclosing with changing topologies. In: IEEE International Conference on Robotics and Automation (ICRA10) (2010)
5. Undeger, C., Polat, F.: Multi-agent real-time pursuit. Auton. Agent. Multi-Agent Syst. 21, 69–107 (2010)
6. Nowakowski, R., Winkler, P.: Vertex-to-vertex pursuit in a graph. Discrete Math. 43(2–3), 235–239 (1983)
7. Aigner, M., Fromme, M.: A game of cops and robbers. Discrete Appl. Math. 8(1), 1–12 (1984)
8. Isler, V., Karnad, N.: The role of information in the cop-robber game. Theor. Comp. Sci. 3(399), 179–190 (2008). Special issue on graph searching
9. Jankovic, V.: About a man and lions. Mat. Vesn. 2, 359–361 (1978)
10. Kopparty, S., Ravishankar, C.V.: A framework for pursuit evasion games in R^n. Inf. Process. Lett. 96(3), 114–122 (2005)
11. Isler, V., Kannan, S., Khanna, S.: Randomized pursuit-evasion in a polygonal environment. IEEE Trans. Robot. 21(5), 875–884 (2005)
12. Bopardikar, S.D., Bullo, F., Hespanha, J.P.: Sensing limitations in the Lion and Man problem. In: IEEE American Control Conference (2007)
13. Thunberg, J., Ogren, P.: An iterative mixed integer linear programming approach to pursuit evasion problems in polygonal environments. In: International Conference on Robotics and Automation (ICRA) (2010)
14. Alspach, B.: Searching and sweeping graphs: a brief survey. Matematiche 59, 5–37 (2004)
15. Fomin, F.V., Thilikos, D.M.: An annotated bibliography on guaranteed graph searching. Theor. Comp. Sci. 399(3), 236–245 (2008)
16. Levy, R., Rosenschein, J.: A game theoretic approach to the pursuit problem. In: 11th International Workshop on Distributed Artificial Intelligence (1992)
17. Harmati, I., Skrzypczyk, K.: Robot team coordination for target tracking using fuzzy logic controller in game theoretic framework. Robot. Auton. Syst. 57, 75–86 (2009)
18. Shoham, Y., Brown, K.L.: Multiagent Systems: Algorithmic, Game-Theoretic, and Logical Foundations (2009)
19. Keshmiri, S., Payandeh, S.: Multi-robot target pursuit: towards an opportunistic control architecture. In: 8th International Multi-Conference on Systems, Signals & Devices (SSD11) (2011)
20. Koopman, B.O.: The theory of search. Part I. Kinematic bases. Oper. Res. 4(5), 324–346 (1956)
21. Koopman, B.O.: The theory of search. Part II. Target detection. Oper. Res. 4(5), 503–531 (1956)
22. Assaf, D., Zamir, S.: Optimal sequential search: a Bayesian approach. Ann. Stat. 13(3), 1213–1221 (1985)
23. Washburn, A.R.: Search for a moving target: the FAB algorithm. Oper. Res. 31(4), 739–751 (1983)
24. Kupitz, Y., Martini, H.: Geometric aspects of the generalized Fermat-Torricelli problem. Bolyai Soc. Math. Stud. 6, 55–129 (1997)
25. Boltyanski, V., Martini, H., Soltan, V.: Geometric Methods and Optimization Problems. Kluwer Academic Publishers, Boston (1999)
26. Charniak, E.: Bayesian networks without tears. AI Mag. 12, 50–63 (1991)
27. Thrun, S., Fox, D., Burgard, W., Dellaert, F.: Robust Monte Carlo localization for mobile robots. Artif. Intell. 128, 99–141 (2001)
28. Borenstein, J., Everett, B., Feng, L.: Navigating Mobile Robots: Systems and Techniques. Wellesley, MA (1996)
29. Chung, C.F., Furukawa, T.: Coordinated pursuer control using particle filters for autonomous search-and-capture. Robot. Auton. Syst. 57, 700–711 (2009)
30. Furukawa, T., Bourgault, F., Lavis, B., Durrant-Whyte, H.F.: Recursive Bayesian search-and-tracking using coordinated UAVs for lost targets. In: IEEE International Conference on Robotics and Automation (ICRA06) (2006)
31. Amigoni, F., Basilico, N., Gatti, N., Saporiti, A., Troiani, S.: Moving game theoretical patrolling strategies from theory to practice: an USARSim simulation. In: IEEE International Conference on Robotics and Automation (ICRA10) (2010)
32. Mei, Y., Lu, Y.H., Hu, Y.C., Lee, C.: A case study of mobile robot's energy consumption and conservation techniques. In: 12th International Conference on Advanced Robotics (ICAR05) (2005)
33. Thrun, S., Fox, D., Burgard, W., Dellaert, F.: Robust Monte Carlo localization for mobile robots. Artif. Intell. 128, 99–141 (2001)
34. Thrun, S., Burgard, W., Fox, D.: Probabilistic Robotics. The MIT Press, Cambridge, MA (2006)
